**Shuvendu K. Lahiri Chao Wang (Eds.)**

# LNCS 12225

# **Computer Aided Verification**

**32nd International Conference, CAV 2020 Los Angeles, CA, USA, July 21–24, 2020 Proceedings, Part II**

## Lecture Notes in Computer Science 12225

#### Founding Editors

- Gerhard Goos, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Juris Hartmanis, Cornell University, Ithaca, NY, USA

#### Editorial Board Members

- Elisa Bertino, Purdue University, West Lafayette, IN, USA
- Wen Gao, Peking University, Beijing, China
- Bernhard Steffen, TU Dortmund University, Dortmund, Germany
- Gerhard Woeginger, RWTH Aachen, Aachen, Germany
- Moti Yung, Columbia University, New York, NY, USA

More information about this series at http://www.springer.com/series/7407


Editors Shuvendu K. Lahiri Microsoft Research Lab Redmond, WA, USA

Chao Wang University of Southern California Los Angeles, CA, USA

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-030-53290-1 ISBN 978-3-030-53291-8 (eBook) https://doi.org/10.1007/978-3-030-53291-8

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2020. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## Preface

It was our privilege to serve as the program chairs for CAV 2020, the 32nd International Conference on Computer-Aided Verification. CAV 2020 was held as a virtual conference during July 21–24, 2020. The tutorial day was on July 20, 2020, and the pre-conference workshops were held during July 19–20, 2020. Due to the coronavirus disease (COVID-19) outbreak, all events took place online.

CAV is an annual conference dedicated to the advancement of the theory and practice of computer-aided formal analysis methods for hardware and software systems. The primary focus of CAV is to extend the frontiers of verification techniques by expanding to new domains such as security, quantum computing, and machine learning. This puts CAV at the cutting edge of formal methods research, and this year's program is a reflection of this commitment.

CAV 2020 received a very high number of submissions (240). We accepted 18 tool papers, 4 case studies, and 43 regular papers, which amounts to an acceptance rate of roughly 27%. The accepted papers cover a wide spectrum of topics, from theoretical results to applications of formal methods. These papers apply or extend formal methods to a wide range of domains such as concurrency, machine learning, and industrially deployed systems. The program featured invited talks by David Dill (Calibra) and Pushmeet Kohli (Google DeepMind) as well as invited tutorials by Tevfik Bultan (University of California, Santa Barbara) and Sriram Sankaranarayanan (University of Colorado at Boulder). Furthermore, we continued the tradition of Logic Lounge, a series of discussions on computer science topics targeting a general audience.

In addition to the main conference, CAV 2020 hosted the following workshops: Numerical Software Verification (NSV), Verified Software: Theories, Tools, and Experiments (VSTTE), Verification of Neural Networks (VNN), Democratizing Software Verification, Synthesis (SYNT), Program Equivalence and Relational Reasoning (PERR), Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), Formal Methods for Blockchains (FMBC), and Verification Mentoring Workshop (VMW).

Organizing a flagship conference like CAV requires a great deal of effort from the community. The Program Committee (PC) for CAV 2020 consisted of 85 members – a committee of this size ensures that each member has to review a reasonable number of papers in the allotted time. In all, the committee members wrote over 960 reviews while investing significant effort to maintain and ensure the high quality of the conference program. We are grateful to the CAV 2020 PC for their outstanding efforts in evaluating the submissions and making sure that each paper got a fair chance. Like last year's CAV, we made the artifact evaluation mandatory for tool paper submissions and optional but encouraged for the rest of the accepted papers. The Artifact Evaluation Committee consisted of 40 reviewers who put in significant effort to evaluate each artifact. The goal of this process was to provide constructive feedback to tool developers and help make the research published in CAV more reproducible. The Artifact Evaluation Committee was generally quite impressed by the quality of the artifacts, and, in fact, all accepted tools passed the artifact evaluation. Among the accepted regular papers, 67% of the authors submitted an artifact, and 76% of these artifacts passed the evaluation. We are also very grateful to the Artifact Evaluation Committee for their hard work and dedication in evaluating the submitted artifacts. The evaluation and selection process involved thorough online PC discussions using the EasyChair conference management system, resulting in more than 2,000 comments.

CAV 2020 would not have been possible without the tremendous help we received from several individuals, and we would like to thank everyone who helped make CAV 2020 a success. First, we would like to thank Xinyu Wang and He Zhu for chairing the Artifact Evaluation Committee and Jyotirmoy Deshmukh for local arrangements. We also thank Zvonimir Rakamaric for chairing the workshop organization, Clark Barrett for managing sponsorship, Thomas Wies for arranging student fellowships, and Yakir Vizel for handling publicity. We also thank Roopsha Samanta for chairing the Mentoring Committee. Last but not least, we would like to thank members of the CAV Steering Committee (Kenneth McMillan, Aarti Gupta, Orna Grumberg, and Daniel Kroening) for helping us with several important aspects of organizing CAV 2020.

We hope that you will find the proceedings of CAV 2020 scientifically interesting and thought-provoking!

June 2020 Shuvendu K. Lahiri Chao Wang

## Organization

#### Program Chairs



- Sagar Chaki, Mentor Graphics, USA
- Ankush Desai, Amazon, USA
- Cezara Dragoi, Inria, France
- Michael Emmi, Amazon, USA
- Malay Ganai, Synopsys, USA
- Franjo Ivancic, Google, USA
- Dejan Jovanović, SRI International, USA
- Akash Lal, Microsoft, India
- Francesco Logozzo, Facebook, USA
- Kenneth McMillan, Microsoft, USA
- Ruzica Piskac, Yale University, USA
- Xiaokang Qiu, Purdue University, USA
- Philipp Ruemmer, Uppsala University, Sweden

- Swarat Chaudhuri, University of Texas, Austin, USA
- Hana Chockler, King's College London, UK
- Maria Christakis, Max Planck Institute, Germany
- Eva Darulova, Max Planck Institute, Germany
- Cristina David, University of Cambridge, UK
- Jyotirmoy Deshmukh, University of Southern California, USA
- Kerstin Eder, University of Bristol, UK
- Constantin Enea, Université de Paris, France
- Lu Feng, University of Virginia, USA
- Yu Feng, University of California, Santa Barbara, USA
- Bernd Finkbeiner, Saarland University, Germany
- Dana Fisman, Ben-Gurion University, Israel
- Daniel J. Fremont, University of California, Santa Cruz, USA
- Ganesh Gopalakrishnan, University of Utah, USA
- Orna Grumberg, Technion - Israel Institute of Technology, Israel
- Arie Gurfinkel, University of Waterloo, Canada
- Alan J. Hu, The University of British Columbia, Canada
- Laura Humphrey, Air Force Research Laboratory, USA
- Joxan Jaffar, National University of Singapore, Singapore
- Zachary Kincaid, Princeton University, USA
- Laura Kovacs, Vienna University of Technology, Austria
- Daniel Kroening, University of Oxford, UK
- Ori Lahav, Tel Aviv University, Israel
- Anthony Lin, TU Kaiserslautern, Germany
- Yang Liu, Nanyang Technological University, Singapore
- Ruben Martins, Carnegie Mellon University, USA
- Anastasia Mavridou, NASA Ames Research Center, USA
- Jedidiah McClurg, Colorado School of Mines, USA
- Kuldeep S. Meel, National University of Singapore, Singapore
- Sayan Mitra, University of Illinois at Urbana-Champaign, USA
- Mukund Raghothaman, University of Southern California, USA
- Jan Reineke, Saarland University, Germany
- Kristin Yvonne Rozier, Iowa State University, USA


#### Artifact Evaluation Committee



#### Mentoring Workshop Chair


#### Steering Committee


#### Additional Reviewers


## Contents – Part II

#### Model Checking




## Contents – Part I

#### AI Verification


#### Blockchain and Security





#### Concurrency


#### Hardware Verification and Decision Procedures




## **Model Checking**

## **Automata Tutor v3**

Loris D'Antoni<sup>1</sup>, Martin Helfrich<sup>2</sup>, Jan Kretinsky<sup>2</sup>, Emanuel Ramneantu<sup>2</sup>, and Maximilian Weininger<sup>2(B)</sup>

<sup>1</sup> University of Wisconsin, Madison, USA loris@cs.wisc.edu <sup>2</sup> Technical University of Munich, Munich, Germany {martin.helfrich,jan.kretinsky,emanuel.ramneantu,maxi.weininger}@tum.de

**Abstract.** Computer science class enrollments have rapidly risen in the past decade. With current class sizes, standard approaches to grading and providing personalized feedback are no longer possible and new techniques become both feasible and necessary. In this paper, we present the third version of Automata Tutor, a tool for helping teachers and students in large courses on automata and formal languages. The second version of Automata Tutor supported automatic grading and feedback for finite-automata constructions and has already been used by thousands of users in dozens of countries. This new version of Automata Tutor supports automated grading and feedback generation for a greatly extended variety of new problems, including problems that ask students to create regular expressions, context-free grammars, pushdown automata and Turing machines corresponding to a given description, and problems about converting between equivalent models, e.g., from regular expressions to nondeterministic finite automata. Moreover, for several problems, this new version also enables teachers and students to automatically generate new problem instances. We also present the results of a survey run on a class of 950 students, which shows very positive results about the usability and usefulness of the tool.

**Keywords:** Theory of computation · Automata theory · Personalized education · Automata tutor · Automated grading

#### **1 Introduction**

Computer science (CS) class enrollments have been rapidly rising, e.g., CS enrollment roughly triples per decade at Berkeley and Stanford [12] or TU Munich. Both online and offline courses and degrees are being created to educate students and professionals in computer science and these courses may soon have thousands of students attending a lecture, or tens of thousands following a Massive Open Online Course (MOOC). At these scales, standard approaches to grading and providing personalized feedback are no longer possible and new techniques become both feasible and necessary. Current approaches for handling this growing student volume include reducing the complexity of assignments or relying on imprecise feedback and grading mechanisms. Simpler assessment mechanisms, e.g., multiple-choice questions, are easier to grade automatically but lack realism [8]. Designing better techniques for automated grading and feedback generation is therefore a necessity.

We thank Emil Ratko-Dehnert from ProLehre TUM for the professional help with the student survey; Tobias Nipkow and his team for allowing us to conduct the user survey in his class; Christian Backs, Vadim Goryainov, Sebastian Mair and Jan Wagener for the exercises they added as part of their Bachelor's theses; Julia Eisentraut and Salomon Sickert-Zehnter for their help in developing this project; the TUM fund "Verbesserung der Lehrmittelsituation" and the CAV community for caring about good teaching. Loris D'Antoni was supported, in part, by NSF under grants CNS-1763871, CCF-1750965, CCF-1744614, and CCF-1704117; and by the UW-Madison OVRGE with funding from WARF.

Recent advances in formal methods, including program synthesis and verification, can help teachers and students in verifiably correct ways that statistical or rule-based techniques cannot. For example, formal methods have been used to identify student errors and provide feedback for problems related to introductory Python programming assignments [17], geometry [9,11], algebra [16], logic [2], and automata [3,6]. In particular, for this last topic, the tool Automata Tutor v2 [7] has already been used by more than 9,000 students at more than 30 universities in North America, South America, Europe, and Asia.

In this paper, we present Automata Tutor v3, an online<sup>1</sup> tool that extends Automata Tutor v2 and uses techniques from program synthesis and decision procedures to improve the quality and effectiveness of teaching courses on automata and formal languages. Besides being part of the standard CS curriculum, the concepts taught in these courses are rich in structure and applications, e.g., in control theory, text editors, lexical analyzers, or models of software interfaces. Concrete topics in such curricula include automata, regular expressions, context-free grammars, and Turing machines. For problems and assignments related to these topics, Automata Tutor v3 can automatically: (1) Detect whether the student's solution is correct. (2) Detect different types of student mistakes and translate them into explanatory feedback. (3) If possible, generate new problems together with the corresponding solutions for teachers to use in class.

Automata Tutor v3 greatly expands its predecessor Automata Tutor v2, which only provides ways to pose and solve problems for deterministic and nondeterministic finite automata constructions. This paper describes the new components introduced by Automata Tutor v3 and how this new version improves on the previous one. Its key advantages over its competitors are its breadth, the automatic generation and grading of exercises, an infrastructure that supports large courses, and useful feedback to the students; in comparison, Autotool [13] uses text-based interfaces, JFLAP [14] gives rudimentary feedback, and Gradience [1] gives none.

Since Automata Tutor has already been well received by teachers around the world, we believe that the readers from the CAV community will find great value in knowing about this new and fundamentally richer version of the tool and how it can extensively help with teaching the automata and formal languages courses, a task we know many of the attendees have to face on a yearly basis.

<sup>1</sup> https://automata.model.in.tum.de.

Our contributions are the following:


#### **2 Automata Tutor in a Nutshell**

Automata Tutor is an online education tool created to support courses teaching basic concepts in automata and formal languages [7]. In this section, we describe how Automata Tutor helps teachers run large courses and students learn efficiently in such courses.

*Learning Without Automata Tutor.* Figure 1 schematically shows a student-teacher interaction in a course taught without an online tutoring system. The teacher creates exercises, grades them manually, and (sometimes) manually provides personalized feedback to the students. This type of interaction has many limitations: (1) it is asynchronous (i.e., the student has to wait a long time for what is often little feedback) and does not scale to large classrooms, imposing a strenuous amount of work on teachers, (2) it does not guarantee consistency in the assigned grades and feedback, and (3) it does not allow students to revise their solutions upon receiving feedback, as the teachers often release a solution to all students as part of the feedback and do not grade new submissions.

**Fig. 1.** Common structure of practical sessions for CS classes.

Another drawback of this interaction is the limited number of problems students can practice on. Because teachers do not have the resources to create many practice problems and provide feedback for them, students are often forced to search the Internet for old exams and practice sheets or even exercises from other universities. Due to the lack of feedback, this chaotic search for practice problems often ends up confusing the students rather than helping them.

**Fig. 2.** Overview of Automata Tutor v3 (our contributions in green). The teacher creates exercises on various topics. The students solve the exercises in a feedback cycle: After each attempt they are automatically graded and get personalized feedback. The teacher has access to the grade overview. For additional practice, students can generate an unlimited number of new exercises using the automatic problem generation. (Color figure online)

*Learning with Automata Tutor.* Figure 2 shows the improved interaction offered by Automata Tutor v3. Here, a teacher creates the problem instances with the help of the tool. The problems are then posed to the students and, *no matter how large a class is*, Automata Tutor automatically grades the solution attempts of students right when they are submitted and immediately gives detailed and personalized feedback for each submission. If required, e.g., for a graded homework, it is possible to restrict the number of attempts. Using this feedback, the students can immediately try the problem again and learn from their mistakes. As shown in a large user study run on the first version of Automata Tutor [6], this fast feedback cycle is encouraging for students and results in students spontaneously exploring more practice problems and engaging with the course material. Additional practice is supported by the automatic problem generation, with the same level of detailed and personalized feedback as before without increasing the workload of the teacher. Furthermore, automatic problem generation can assist the teacher in creating new exercises. Finally, whenever necessary, the teacher can download an overview of all the grades.

**Fig. 3.** Creating a new problem of type "PDA Construction".

**Fig. 4.** Feedback received when solving the problem created in Fig. 3.

*Improved User Interface.* Automata Tutor is an online tool that runs in the most widely used browsers. A new collapsible navigation bar groups problems by topic, facilitating quick access to exercises and displaying the structure of the course (see Figure 6 in [5, Appendix B]). To create a new exercise, a teacher clicks the "+"-button and is presented with the view of Fig. 3. In this case, the drawing canvas allows the teacher to easily specify the sample solution pushdown automaton. Similarly, when students solve this exercise, they also draw their solution attempt on the canvas. After submitting, they receive their personalized feedback and grade (see example in Fig. 4). For the automatic problem generation, a dropdown menu to select the problem type and a slider to select the difficulty are displayed together with the list of all problems the user has generated so far (see the screenshot in Figure 7 in [5, Appendix B]).

#### **3 Design**

#### **3.1 University and Course Management**

While Automata Tutor can be used for independent online practice, one of the main advantages is its infrastructure for large university courses. To this end, it is organized in *courses*. A course is created and supervised by one or more teachers. Together, they can create, test and edit exercises. The students cannot immediately see the problems, but only after the teachers have decided to pose them. This involves setting the maximum number of points, the number of allowed attempts as well as the start and end date.

To use Automata Tutor, students must have an account. One can either register by email or, in case the university supports it, log in with an external login service like *LDAP* or *OAuth*. When students use the login service of their university, teachers get a certified mapping from users to students, which enables them to use Automata Tutor v3 for grading homework or exams.

Students can enroll in a course using a password. Enrolled students see all posed problems and can solve them (using the allowed number of attempts). The final grade can be accessed by the teachers in the grade overview.
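The settings fixed when a problem is posed can be pictured as a small record. The following is a minimal sketch, not the tool's actual data model: the class name, field names, and the example dates are hypothetical and only mirror the four settings named above (maximum points, allowed attempts, start date, end date).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PosedProblem:
    """Settings a teacher fixes when posing a problem (hypothetical field names)."""
    max_points: int
    allowed_attempts: int
    start: datetime
    end: datetime

    def accepts_submission(self, now: datetime, attempts_used: int) -> bool:
        """A submission counts only inside the time window and within the attempt limit."""
        in_window = self.start <= now <= self.end
        return in_window and attempts_used < self.allowed_attempts


# Illustrative values only: a homework worth 10 points with 3 attempts, open for one week.
homework = PosedProblem(10, 3, datetime(2019, 5, 1), datetime(2019, 5, 8))
```

A submission would then be rejected either because the window has closed or because the student has used up all attempts, which matches the restriction described for graded homework.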

#### **3.2 New Problem Types**

In this section, we list the problem types newly added to Automata Tutor v3. They are all part of the course [10] and a detailed description of each problem can be found in [5, Appendix A], including the basic theoretical concept, how a student can solve such a problem, what a teacher has to provide to create a problem, the idea of the grading algorithm, and what feedback the tool gives.


#### **3.3 Automatic Problem Generation**

Automatic Problem Generation (APG) allows one to generate new exercises of a requested *difficulty* level and problem type. This allows students to practice independently and supports teachers when creating new exercises. While APG is currently implemented for four CFG problem types and for the problem type "While to TM", it can be easily extended to other problem types by providing the following components:


Given these components, Automata Tutor generates a new problem with a given minimum difficulty $d_{\min}$ and maximum difficulty $d_{\max}$ as follows. Firstly, 100 random exercises are generated. Secondly, Automata Tutor chooses exercises $E$ with the best quality such that $d_{\min} \le \mathit{diff}(E) \le d_{\max}$.
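The two-step procedure can be sketched as a generic generate-and-filter loop. This is an illustration under stated assumptions rather than the tool's code: the function and parameter names are ours, the sketch returns a single best exercise, and the toy instantiation (random words over {a, b}, length as difficulty, number of distinct letters as quality) merely stands in for a real problem type.

```python
import random
from typing import Callable, Optional, TypeVar

E = TypeVar("E")


def generate_problem(
    random_exercise: Callable[[random.Random], E],
    difficulty: Callable[[E], int],
    quality: Callable[[E], float],
    d_min: int,
    d_max: int,
    candidates: int = 100,
    seed: Optional[int] = None,
) -> Optional[E]:
    """Return a best-quality candidate whose difficulty lies in [d_min, d_max]."""
    rng = random.Random(seed)
    # Step 1: generate a fixed number of random candidate exercises.
    pool = [random_exercise(rng) for _ in range(candidates)]
    # Step 2: keep those in the requested difficulty range and pick the best one.
    in_range = [e for e in pool if d_min <= difficulty(e) <= d_max]
    if not in_range:
        return None  # no candidate matched; a caller could simply retry
    return max(in_range, key=quality)


def random_word(rng: random.Random) -> str:
    """Toy exercise generator: a random word over {a, b} of length 1 to 8."""
    return "".join(rng.choice("ab") for _ in range(rng.randint(1, 8)))


exercise = generate_problem(
    random_word, difficulty=len, quality=lambda w: len(set(w)),
    d_min=3, d_max=5, seed=0,
)
```

Keeping the three components as plain callables reflects the extension mechanism described above: a new problem type only has to supply its own generator, difficulty metric, and quality metric.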

Concretely, for the CFG problem types, CFGs with random productions are generated and sanitized. Resulting CFGs that do not accept any words or have too few productions are excluded using the quality metric. The difficulty metric always depends on the number of productions; additionally, depending on the exact problem type, further criteria are taken into account.
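To make the sanitization and the emptiness filter concrete, here is a minimal sketch using the textbook fixpoint computation of productive nonterminals. The grammar representation, the function names, and the `min_productions` threshold are assumptions of this sketch; the paper does not specify how the tool implements these steps.

```python
from typing import Dict, List, Set, Tuple

# A grammar maps each nonterminal to its right-hand sides; any symbol that is
# not a key of the dict is treated as a terminal (an assumption of this sketch).
Grammar = Dict[str, List[Tuple[str, ...]]]


def productive_nonterminals(g: Grammar) -> Set[str]:
    """Nonterminals that derive at least one terminal word (fixpoint iteration)."""
    productive: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            if head in productive:
                continue
            if any(all(s in productive or s not in g for s in body) for body in bodies):
                productive.add(head)
                changed = True
    return productive


def sanitize(g: Grammar, start: str) -> Grammar:
    """Keep only productions over productive symbols that are reachable from start."""
    productive = productive_nonterminals(g)
    kept = {
        head: [b for b in bodies if all(s in productive or s not in g for s in b)]
        for head, bodies in g.items() if head in productive
    }
    reachable: Set[str] = set()
    stack = [start] if start in kept else []
    while stack:
        head = stack.pop()
        if head in reachable:
            continue
        reachable.add(head)
        stack.extend(s for body in kept[head] for s in body if s in kept)
    return {head: kept[head] for head in kept if head in reachable}


def passes_quality(g: Grammar, start: str, min_productions: int) -> bool:
    """Reject grammars with an empty language or too few productions."""
    clean = sanitize(g, start)
    return start in clean and sum(len(b) for b in clean.values()) >= min_productions


# S -> a S b | A | (empty word),  A -> A a (unproductive),  B -> b (unreachable)
example = {"S": [("a", "S", "b"), ("A",), ()], "A": [("A", "a")], "B": [("b",)]}
```

In the example, `A` never derives a terminal word and `B` cannot be reached from `S`, so sanitization leaves only the two productive productions of `S`.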

For the problem type "While to TM" we use an approach similar to the one suggested in existing tools for automatic problem generation [15,18]: We handcrafted several *base programs* of different difficulty levels. In the generation process, the syntax tree of such a base program is abstracted and certain modifying operations are executed; these change the program without affecting the difficulty too much. For example, we choose different variables, switch the order of if-else branches, or change arithmetic operators. Then several programs are generated and those of bad quality are filtered out. A program is of bad quality if its language is trivially small or if it contains infinite loops; since detecting these properties is undecidable, we employ heuristics such as checking that the loops terminate for all inputs up to a certain size with a certain timeout.
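The termination heuristic can be illustrated with a bounded check. The sketch below makes two simplifying assumptions that are not taken from the paper: a program is abstracted as a step function on states that returns `None` once it halts, and a step budget stands in for the timeout. All names are hypothetical.

```python
from typing import Callable, Dict, Optional

State = Dict[str, int]
Step = Callable[[State], Optional[State]]  # returns None once the program has halted


def halts_within(step: Step, initial: State, max_steps: int) -> bool:
    """Run the program from `initial` and report whether it halts within the budget."""
    state: Optional[State] = initial
    for _ in range(max_steps):
        state = step(state)
        if state is None:
            return True
    return False


def probably_terminates(step: Step, max_input: int, max_steps: int) -> bool:
    """Heuristic only: check every input 0..max_input against the step budget."""
    return all(halts_within(step, {"x": n}, max_steps) for n in range(max_input + 1))


def countdown(state: State) -> Optional[State]:
    """while x != 0: x := x - 1"""
    return None if state["x"] == 0 else {"x": state["x"] - 1}


def stuck_on_three(state: State) -> Optional[State]:
    """Halts immediately unless x == 3, in which case it loops forever."""
    return dict(state) if state["x"] == 3 else None
```

A passing check is not a proof of termination, which is why the text calls it a heuristic: a program that diverges only on inputs larger than `max_input` would still be accepted.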

#### **4 Implementation and Scalability**

Automata Tutor v3 is open source and it consists of a frontend, a backend, and a database. It also provides a developer's manual for creating new exercises.

The frontend, written in Scala, renders the webpage. The drawing canvases for the different automata and the Turing machines rely on JavaScript. The frontend and backend communicate using XML objects.

The backend, written in C#, contains methods to unpack the XML of the frontend to compute the grade and feedback for solutions. It is also used to check the syntax of exercises and for the automatic problem generation. It relies on AutomataDotNet<sup>2</sup>, a library that provides efficient algorithms for automata and regular expressions.

The database keeps track of existing users, problems and courses. It uses the H2 Database Engine.

All the new parts of Automata Tutor v3 were developed and tested over the last 3 years at TU Munich, where they were used to support the introductory theoretical computer science course. This local deployment served as an important test-bed before publicly deploying the tool online at large scale. Due to its modular structure, the tool is easily scalable by having multiple frontends and backends together with a load distributor. This approach has successfully scaled to 950 concurrent student users; for this, we used 7 virtual machines: 3 hosting frontends, 3 hosting backends (each with 2 cores 2.60 GHz Intel(R) Xeon(R) CPU and 4 GB RAM), and 1 for load distribution and the database (with 4 such cores and 8 GB RAM). We will scale the number of machines based on need.

#### **5 Evaluation and User Study**

**Large-Class Deployment.** In the latest iteration of the TU Munich course in 2019, we used Automata Tutor v3 (in the following denoted as AT) in a mandatory homework system for a course with about 950 students; the homework system also included written and programming exercises. In total, we posed 79 problems consisting of 18 homework and 61 practice problems. The teachers saved themselves the effort of correcting 26,535 homework exercises, and the students used AT to get personalized feedback for their work 76,507 times. On average, each student who used AT did so 107 times.

**Student Survey Results.** At the end of the course, we conducted an anonymized survey, based on the System Usability Scale [4]. 14.6% of the students in the course answered the survey, which is an ordinary rate of return for an online questionnaire, especially given that there was no incentive. The students were given statements to judge on a Likert scale from 1 to 5 (strongly disagree to strongly agree). We define "The students agreed with the following statement" to mean that the average and median scores were at least 4 and less than 10% of the students chose a score below 3. Dually, if the average and median scores were at most 2 and less than 10% of the students chose a score greater than 3, we say that they "agreed with the negation of the statement". For all statements that satisfy neither criterion, we report mixed answers. The full survey results can be found in [5, Appendix C].
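The three-way classification defined above is mechanical, so a short sketch may help readers apply it to the raw scores in the appendix. The function name and the sample score lists are ours and purely illustrative; they are not survey data.

```python
from statistics import mean, median
from typing import Sequence


def classify(scores: Sequence[int]) -> str:
    """Classify Likert scores (1 to 5) for one statement by the criteria above."""
    avg, med = mean(scores), median(scores)
    share_below_3 = sum(1 for s in scores if s < 3) / len(scores)
    share_above_3 = sum(1 for s in scores if s > 3) / len(scores)
    if avg >= 4 and med >= 4 and share_below_3 < 0.10:
        return "agreed"
    if avg <= 2 and med <= 2 and share_above_3 < 0.10:
        return "agreed with the negation"
    return "mixed"


# Made-up score lists, one per outcome.
clear_agreement = [5, 4, 4, 5, 4, 4, 5, 4, 4, 3]
clear_disagreement = [1, 2, 1, 2, 2, 1, 2, 1, 2, 2]
polarized = [5, 5, 5, 5, 5, 5, 5, 5, 1, 1]
```

The `polarized` list shows why the 10% condition matters: its average (4.2) and median (5) are high, but 20% of the respondents chose a score below 3, so the statement is reported as mixed.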

*Usability.* Regarding the usability of the tool, the students agreed with the following statements:

<sup>2</sup> https://github.com/AutomataDotNet/Automata.


However, there were many valuable suggestions for improvements, many of which we have implemented since then. Moreover, the survey also revealed room for improvement, in particular for streamlining, as documented by the following statements where the answers were more mixed:


*Usefulness.* Regarding how useful AT was for learning, the students agreed with the following statements:


Note that there are no statements with mixed or negative answers regarding the usefulness. Additionally, as shown in Fig. 5, when we asked students about their preferred means of learning, AT gets the highest approval rate, being preferred to written or programming exercises as well as lectures.

Overall, this class deployment of Automata Tutor v3 and the accompanying surveys were successful and showed that the tool is of great value for both students and teachers, in particular for such a large course.

#### **6 Conclusion**

This paper presents the third version of Automata Tutor, an online tool helping teachers and students in large automata/computation theory courses. Automata Tutor v3 now supports automated grading and feedback generation for a wide variety of problems and, for some of them, even automatic generation of new problem instances. Furthermore, it is easy to extend and we invite the community to contribute by implementing further exercises. Finally, our experience shows that Automata Tutor v3 greatly improves the economic aspects of teaching, as it scales effortlessly with the number of students.

Earlier versions of Automata Tutor have already been adopted by thousands of students at dozens of schools and we hope this paper allows Automata Tutor v3 to help even more students and teachers around the world.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Seminator 2 Can Complement Generalized Büchi Automata via Improved Semi-determinization**

František Blahoudek<sup>1</sup>, Alexandre Duret-Lutz<sup>2(B)</sup>, and Jan Strejček<sup>3</sup>

<sup>1</sup> University of Texas at Austin, Austin, USA frantisek.blahoudek@gmail.com <sup>2</sup> LRDE, EPITA, Le Kremlin-Bicêtre, France adl@lrde.epita.fr <sup>3</sup> Masaryk University, Brno, Czech Republic strejcek@fi.muni.cz

**Abstract.** We present the second generation of the tool Seminator that transforms transition-based generalized Büchi automata (TGBAs) into equivalent semi-deterministic automata. The tool has been extended with numerous optimizations and produces considerably smaller automata than its first version. In connection with the state-of-the-art LTL to TGBAs translator Spot, Seminator 2 produces smaller (on average) semi-deterministic automata than the direct LTL to semi-deterministic automata translator ltl2ldgba of the Owl library. Further, Seminator 2 has been extended with an improved NCSB complementation procedure for semi-deterministic automata, providing a new way to complement automata that is competitive with state-of-the-art complementation tools.

#### **1 Introduction**

*Semi-deterministic* [24] automata are automata where each accepting run makes only finitely many nondeterministic choices. The merit of this interstage between deterministic and nondeterministic automata comes from two facts known since the late 1980s. First, every nondeterministic Büchi automaton with $n$ states can be transformed into an equivalent semi-deterministic Büchi automaton with at most $4^n$ states [7,24]. Note that asymptotically optimal determinization procedures transform nondeterministic Büchi automata to deterministic automata with $2^{O(n \log n)}$ states [24] and with a more complex (typically Rabin) acceptance condition, as deterministic Büchi automata are strictly less expressive. Second, some algorithms cannot handle nondeterministic automata, but they can handle semi-deterministic ones; for example, algorithms for qualitative model checking of *Markov decision processes* (MDPs) [7,29].
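One common structural reading of this definition, assumed here for a Büchi automaton with accepting states, is that every state reachable from an accepting state is deterministic. The sketch below checks exactly that condition; the representation of the transition relation and all names are our own choices and are not taken from Seminator.

```python
from typing import Dict, Iterable, Set, Tuple

# Transition relation: (state, letter) -> set of successor states.
Delta = Dict[Tuple[str, str], Set[str]]


def reachable_from(sources: Iterable[str], delta: Delta) -> Set[str]:
    """All states reachable from `sources`, including the sources themselves."""
    seen = set(sources)
    stack = list(seen)
    while stack:
        state = stack.pop()
        for (src, _letter), targets in delta.items():
            if src == state:
                for target in targets - seen:
                    seen.add(target)
                    stack.append(target)
    return seen


def is_semi_deterministic(delta: Delta, accepting: Set[str]) -> bool:
    """True if no state reachable from an accepting state has a nondeterministic choice."""
    deterministic_part = reachable_from(accepting, delta)
    return all(
        len(targets) <= 1
        for (src, _letter), targets in delta.items()
        if src in deterministic_part
    )


# q0 guesses when to move to the accepting state q1; q1 itself is deterministic.
guess_once = {("q0", "a"): {"q0", "q1"}, ("q1", "a"): {"q1"}}
# Here q1 can return to q0, so nondeterminism is reachable from an accepting state.
guess_forever = {("q0", "a"): {"q0", "q1"}, ("q1", "a"): {"q0", "q1"}}
```

In `guess_once`, an accepting run makes its last nondeterministic choice when it enters `q1`, which matches the "finitely many nondeterministic choices" wording above.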

For theoreticians, the difference between the complexity of determinization and semi-determinization is not dramatic: both constructions are exponential. However, the difference is important for authors and users of practical automata-based tools, because automata size and the complexity of their acceptance condition often have a significant impact on tool performance. This latter perspective has recently initiated another wave of research on semi-deterministic automata. Since 2015, many new results have been published: several direct translations of LTL to semi-deterministic automata [11,15,16,26], specialized complementation constructions for semi-deterministic automata [4,6], algorithms for quantitative model checking of MDPs based on semi-deterministic automata [13,25], a transformation of semi-deterministic automata to deterministic parity automata [10], and reinforcement learning of control policy using semi-deterministic automata [21].

In 2017, we introduced Seminator 1.1 [5], a tool that transforms nondeterministic automata to semi-deterministic ones. The original semi-determinization procedure of Courcoubetis and Yannakakis [7] works with standard *Büchi automata* (BAs). Seminator 1.1 extends this construction to handle more compact automata, namely *transition-based Büchi automata* (TBAs) and *transition-based generalized Büchi automata* (TGBAs). TBAs use accepting transitions instead of accepting states, and TGBAs have several sets of accepting transitions, each of which must be visited infinitely often by accepting runs. The main novelty of Seminator 1.1 was that it performed degeneralization and semi-determinization of a TGBA simultaneously. As a result, it could translate TGBAs to smaller semi-deterministic automata than (to the best of our knowledge) the only other tool for automata semi-determinization, called nba2ldba [26]. This tool only accepts BAs as input, and thus TGBAs must be degeneralized before nba2ldba is called.

Moreover, in connection with the LTL to TGBAs translator ltl2tgba of Spot [8], Seminator 1.1 provided a translation of LTL to semi-deterministic automata that can compete with the direct LTL to semi-deterministic TGBAs translator ltl2ldba [26]. More precisely, our experiments [5] showed that the combination of ltl2tgba and Seminator 1.1 outperforms ltl2ldba on LTL formulas that ltl2tgba translates directly to deterministic or semi-deterministic TGBA (i.e., when Seminator has no work to do), while ltl2ldba produced (on average) smaller semi-deterministic TGBAs on the remaining LTL formulas (i.e., when the TGBA produced by ltl2tgba has to be semi-determinized by Seminator).

This paper presents Seminator 2, which changes the situation. With many improvements in semi-determinization, the combination of ltl2tgba and Seminator 2 now translates LTL to smaller (on average) semi-deterministic TGBAs than ltl2ldba even for the cases when ltl2tgba produces a TGBA that is not semi-deterministic. Moreover, this holds even when we compare to ltl2ldgba, which is the current successor of ltl2ldba distributed with Owl [19].

Further, Seminator 2 now provides a new feature: *complementation of TGBAs*. Seminator 2 chains semi-determinization with the complementation algorithm called NCSB [4,6], which is tailored for semi-deterministic BAs. Our experiments show that the complementation in Seminator 2 is fully competitive with complementations implemented in state-of-the-art tools [1,8,20,23,30].

#### **2 Improvements in Semi-determinization**

First of all, we recall the definition of semi-deterministic automata and principles of the semi-determinization procedure implemented in Seminator 1.1 [5].

**Fig. 1.** Structure of a semi-deterministic automaton. The deterministic part contains all accepting transitions and states reachable from them. Cut-transitions are magenta.

Let $A = (Q, \Sigma, \delta, q_0, \{F_1, \ldots, F_n\})$ be a TGBA over alphabet $\Sigma$, with a finite set of states $Q$, a transition relation $\delta \subseteq Q \times \Sigma \times Q$, an initial state $q_0 \in Q$, and sets of accepting transitions $F_1, \ldots, F_n \subseteq \delta$. Then $A$ is *semi-deterministic* if there exists a subset $Q_D \subseteq Q$ such that (i) each transition from $Q_D$ goes back to $Q_D$ (i.e., $\delta \cap (Q_D \times \Sigma \times (Q \setminus Q_D)) = \emptyset$), (ii) all states of $Q_D$ are deterministic (i.e., for each $q \in Q_D$ and $a \in \Sigma$ there is at most one $q'$ such that $(q, a, q') \in \delta$), and (iii) each accepting transition starts in a state of $Q_D$ (i.e., $F_1, \ldots, F_n \subseteq Q_D \times \Sigma \times Q_D$).

The part of $A$ delimited by states of $Q_D$ is called *deterministic*, while the part formed by the remaining states $Q \setminus Q_D$ is called *nondeterministic*, although it could contain deterministic states too. The transitions leading from the nondeterministic part to the deterministic one are called *cut-transitions*. The structure of a semi-deterministic automaton is depicted in Fig. 1.
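The three conditions of the definition can be checked mechanically for a given candidate set of deterministic states. The following Python fragment is only a minimal sketch: the encoding of transitions as triples and the name `is_semi_deterministic` are assumptions made for illustration and are not part of Seminator or Spot.

```python
from itertools import chain


def is_semi_deterministic(delta, acc_sets, q_d):
    """Check whether q_d witnesses conditions (i)-(iii) for a TGBA.

    delta    -- set of transitions (source, letter, target)
    acc_sets -- list of sets of accepting transitions (subsets of delta)
    q_d      -- candidate set of states of the deterministic part
    (Hypothetical encoding chosen for this sketch.)
    """
    # (i) no transition leaves q_d
    closed = all(dst in q_d for (src, _, dst) in delta if src in q_d)

    # (ii) every state of q_d has at most one successor per letter
    seen = set()
    deterministic = True
    for (src, letter, _) in delta:
        if src in q_d:
            if (src, letter) in seen:
                deterministic = False
            seen.add((src, letter))

    # (iii) accepting transitions lie entirely inside q_d
    accepting_inside = all(src in q_d and dst in q_d
                           for (src, _, dst) in chain(*acc_sets))
    return closed and deterministic and accepting_inside


# Small made-up example: state 0 is nondeterministic, states 1 and 2 are not.
delta = {(0, "a", 0), (0, "a", 1), (1, "a", 1), (1, "b", 2), (2, "a", 2)}
acc_sets = [{(1, "a", 1)}]
print(is_semi_deterministic(delta, acc_sets, {1, 2}))     # True
print(is_semi_deterministic(delta, acc_sets, {0, 1, 2}))  # False
```

The second call fails on condition (ii), because state 0 has two successors under the letter a.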

Intuitively, a TGBA $A$ with a set of states $Q$ and a single set of accepting transitions $F$ can be transformed into a semi-deterministic TBA $B$ as follows. First, we use a copy of $A$ as the nondeterministic part of $B$. The deterministic part of $B$ has states of the form $(M, N)$ such that $Q \supseteq M \supseteq N$ and $M \neq \emptyset$. Every accepting transition $(q, a, q') \in F$ induces a cut-transition $(q, a, (\{q'\}, \emptyset))$ of $B$. The deterministic part is then constructed to track all runs of $A$ from each such state $q'$ using the powerset construction. More precisely, the first element of $(M, N)$ tracks all runs while the second element tracks only the runs that passed some accepting transition of $F$. Each transition of the deterministic part that would reach a state where $M = N$ (a so-called *breakpoint*) is replaced with an accepting transition of $B$ leading to state $(M, N')$, where $N'$ tracks only the runs of $A$ passing an accepting transition of $F$ in the current step.

Seminator 1.1 extended this procedure to construct a semi-deterministic TBA even for a TGBA with multiple acceptance sets $F_1, \ldots, F_n$. States of the deterministic part are now triples $(M, N, i)$, where $i \in \{0, \ldots, n-1\}$ is called the *level* and has semantics similar to the level in degeneralization. Cut-transitions are induced by transitions of $F_n$ and they lead to states of the form $(\{q'\}, \emptyset, 0)$. The level $i$ says that $N$ tracks runs that passed a transition of $F_{i+1}$ since the last level change. When the deterministic part reaches a state $(M, N, i)$ with $M = N$, we change the level to $i' = (i + 1) \bmod n$ and modify $N$ to track only runs passing $F_{i'+1}$ in the current step. Transitions changing the level are accepting.
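To make the breakpoint construction with levels more concrete, the following Python fragment sketches a single step of the deterministic part. It follows only the informal description given here, not the Seminator source code; the function names `post`, `det_step`, and `cut_transitions`, as well as the encoding of transitions as triples, are assumptions made for illustration. With a single acceptance set, the level is always 0 and the step reduces to the (M, N) construction.

```python
def post(states, letter, transitions):
    """Targets reached from `states` by `letter` using only `transitions`."""
    return frozenset(dst for (src, a, dst) in transitions
                     if a == letter and src in states)


def det_step(delta, acc_sets, state, letter):
    """One step of the deterministic part from the state (M, N, i).

    acc_sets[i] plays the role of F_{i+1} (Python lists are 0-indexed).
    Returns (next_state, is_accepting), or None if no run survives.
    """
    M, N, i = state
    M2 = post(M, letter, delta)
    if not M2:
        return None
    # Marked runs stay marked; runs taking a transition of F_{i+1} get marked.
    N2 = post(N, letter, delta) | post(M, letter, acc_sets[i])
    if M2 == N2:  # breakpoint: change the level, restart the marking
        i2 = (i + 1) % len(acc_sets)
        N2 = post(M, letter, acc_sets[i2])
        return (M2, N2, i2), True
    return (M2, N2, i), False


def cut_transitions(acc_sets):
    """Cut-transitions induced by the transitions of the last set F_n."""
    return {(q, a, (frozenset({q2}), frozenset(), 0))
            for (q, a, q2) in acc_sets[-1]}


# Made-up two-state example with one acceptance set.
delta = {("p", "a", "p"), ("p", "a", "q"), ("q", "a", "p")}
acc_sets = [{("q", "a", "p")}]
state = (frozenset({"p"}), frozenset(), 0)
for _ in range(3):
    state, accepting = det_step(delta, acc_sets, state, "a")
    print(sorted(state[0]), sorted(state[1]), state[2], accepting)
```

In this example the third step hits a breakpoint, so it is the only one reported as accepting.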

A precise description of these semi-determinization procedures and proofs of their correctness can be found in Blahoudek's dissertation [3]. Now we briefly explain the most important optimizations added in Seminator 2 (we work on a journal paper with their formal description). Each optimization can be enabled/disabled by the corresponding option. All of them are enabled by default.


Note that Seminator 1.1 can produce a semi-deterministic TGBA with multiple acceptance sets only when it gets a semi-deterministic TGBA as input. Seminator 2 produces such automata more often due to --reuse-deterministic.

## **3 Implementation and Usage**

Seminator 2 is an almost complete rewrite of Seminator [5], and is still distributed under the GNU GPL 3.0 license. Its distribution tarball and source code history are hosted on GitHub (https://github.com/mklokocka/seminator). The package contains sources of the tool with two user-interfaces (a command-line tool and Python bindings), a test-suite, and some documentation.

**Fig. 2.** Workflow for the two operation modes of seminator: semi-determinizing and complementing via semi-determinization.

Seminator is implemented in C++ on top of the data-structures provided by the Spot library [8], and reuses its input/output functions, simplification algorithms, and the NCSB complementation. The main implementation effort lies in the optimized semi-determinization and an alternative version of NCSB.

The first user interface is a command-line tool called seminator. Its high-level workflow is pictured in Fig. 2. By default (top part of Fig. 2) it takes a TGBA (or TBA or BA) as input and produces a semi-deterministic TGBA (or TBA or BA if requested). Figure 2 details various switches that control the optional simplifications and acceptance transformations that occur before the semi-determinization itself. The pre- and post-processing are provided by the Spot library. The semi-determinization algorithm can be adjusted by additional command-line options (not shown in Fig. 2) that enable or disable optimizations of Sect. 2. As Spot simplification routines are stronger on automata with simpler acceptance conditions, it sometimes pays off to convert the automaton to TBA or BA first. If the input is a TGBA, seminator attempts three semi-determinizations, one on the input TGBA, one on its TBA equivalent, and one on its BA equivalent; only the smallest result is retained. If the input is already a TBA (resp. a BA), only the last two (resp. one) routes are attempted.

The --complement option activates the bottom part of Fig. 2 with two variants of the NCSB complementation [4]: "spot" stands for a transition-based adaptation of the original algorithm (implemented in Spot); "pldi" refers to its modification based on the optimization by Chen et al. [6, Section 5.3] (implemented in Seminator 2). Both variants take a TBA as input and produce a TBA. The options --tba and --ba apply on the final complement automaton only.

The seminator tool can now process automata in batch, making it possible to build pipelines with other commands. For instance, the pipeline

`ltl2tgba <input.ltl | seminator | autfilt --states=3.. >output.hoa`

uses Spot's ltl2tgba command to read a list of LTL formulas from input.ltl and transform it into a stream of TGBAs that is passed to seminator, which transforms them into semi-deterministic TGBAs; finally, Spot's autfilt saves into output.hoa the automata with 3 states or more.

Python bindings form the second user-interface and are installed by the Seminator package as an extension of Spot's own Python bindings. They offer several functions, all working with Spot's automata (twa\_graph objects):

semi\_determinize() implements the semi-determinization procedure;


The Python bindings integrate well with the interactive notebooks of Jupyter [17]. Figure 3 shows an example of such a notebook, using the seminator() and highlight\_components() functions. Additional Jupyter notebooks, distributed with the tool, document the effect of the various optimization options.<sup>1</sup>

## **4 Experimental Evaluation**

We evaluate the performance of Seminator 2 for both semi-determinization and complementation of TGBAs. We compare our tool against several tools listed in Table 1. As ltl2ldgba needs LTL as input, we used the set of 221 LTL formulas already considered for benchmarking in the literature [9,12,14,22,27]. To provide TGBAs as input for Seminator 2, we use Spot's ltl2tgba to convert the LTL formulas. Based on the automata produced by ltl2tgba, we distinguish three categories of formulas: *deterministic* (152 formulas), *semi-deterministic* but not deterministic (49 formulas), and *not semi-deterministic* (20 formulas). This division is motivated by the fact that Seminator 2 applies its semi-determinization only on automata that are not semi-deterministic, and that some complementation tools use different approaches to deterministic automata. We have also generated 500 random LTL formulas of each category.

<sup>1</sup> https://nbviewer.jupyter.org/github/mklokocka/seminator/tree/v2.0/notebooks/.

**Fig. 3.** Jupyter notebook illustrating a case where a nondeterministic TBA (nba, left) has an equivalent semi-deterministic TBA (sdba, middle) that is smaller than a minimal deterministic TBA (dba, right). Accepting transitions are labeled by **<sup>0</sup>**.

The scripts and formulas used in those experiments can be found online,<sup>2</sup> as well as a Docker image with these scripts and all the tools installed.<sup>3</sup> All experiments were run inside the supplied Docker image on a Dell XPS13 laptop with an Intel i7-1065G7, 16 GB RAM, and running Linux.

<sup>2</sup> https://github.com/xblahoud/seminator-evaluation/.

<sup>3</sup> https://hub.docker.com/r/gadl/seminator.


**Table 1.** Versions and references to the other tools used in our evaluation.

**Fig. 4.** Comparison of the sizes of the semi-deterministic automata produced by Seminator 2 and Owl for the *not semi-deterministic* random set.

**Table 2.** Comparison of semi-determinization tools. A benchmark set marked with x + y consists of x formulas for which all tools produced some automaton, and y formulas leading to some timeouts. A cell of the form s (m) shows the cumulative number s of states of automata produced for the x formulas, and the number m of formulas for which the tool produced the smallest automaton out of the obtained automata. The best results in each column are highlighted.


#### **4.1 Semi-determinization**

We compare Seminator 2 to its older version 1.1 and to ltl2ldgba of Owl. We do not include Buchifier [16] as it is available only as a binary for Windows. Also, we did not include nba2ldba [26] due to the lack of space and the fact that even Seminator 1.1 performs significantly better than nba2ldba [5].

Recall that Seminator 2 calls Spot's automata simplification routines on constructed automata. To get a fair comparison, we apply these routines also to the results of other tools, indicated by *+Spot* in the results. Further, ltl2ldgba of Owl can operate in two modes: --symmetric and --asymmetric. For each formula, we run both settings and pick the better result, indicated by *+best*.

**Table 3.** Comparison of tools complementing Büchi automata, using the same conventions as Table 2.

Table 2 presents the cumulative results for each semi-determinization tool and each benchmark set (we actually merged *deterministic* and *semi-deterministic* benchmark sets). The timeout of 30 s was reached by Owl for one formula in the *(semi-)deterministic* category and by Seminator 1.1 for one formula in the *not semi-deterministic* category. Besides timeouts, the running times of all tools were always below 3 s, with a few exceptions for Seminator 1.1.
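The cell format s (m) used in Table 2 can be reproduced from per-formula automaton sizes. The following Python fragment is a hedged sketch of that bookkeeping, not the evaluation script of the paper. It assumes that timeouts are recorded as `None`, that ties count as a win for every tied tool, and that m is counted over all formulas among the automata actually obtained. The tool names and numbers are made up.

```python
def summarize(sizes):
    """sizes maps a tool name to a list of automaton sizes, one per formula.

    A size of None marks a timeout. Returns (x, y, cells), where x formulas
    were solved by all tools, y formulas led to some timeout, and
    cells[tool] = (s, m) as in the caption of Table 2.
    """
    tools = list(sizes)
    n_formulas = len(sizes[tools[0]])
    solved = [k for k in range(n_formulas)
              if all(sizes[t][k] is not None for t in tools)]
    cells = {}
    for t in tools:
        # s: cumulative number of states over the commonly solved formulas
        s = sum(sizes[t][k] for k in solved)
        # m: formulas where t is smallest among the automata obtained
        m = 0
        for k in range(n_formulas):
            obtained = [sizes[u][k] for u in tools if sizes[u][k] is not None]
            if sizes[t][k] is not None and sizes[t][k] == min(obtained):
                m += 1
        cells[t] = (s, m)
    return len(solved), n_formulas - len(solved), cells


# Made-up data: tool_a times out on the third formula.
example = {"tool_a": [3, 5, None], "tool_b": [4, 5, 7]}
print(summarize(example))
```

For this made-up input, the benchmark set would be reported as 2 + 1, with cells 8 (2) for tool_a and 9 (2) for tool_b.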

In the *(semi-)deterministic* category, the automaton produced by ltl2tgba and passed to both versions of Seminator is already semi-deterministic. Hence, both versions of Seminator have nothing to do. This category, in fact, compares ltl2tgba of Spot against ltl2ldgba of Owl.

Figure 4 shows the distribution of differences between semi-deterministic automata produced by Owl+best+Spot and Seminator 2 for the *not semi-deterministic* random set. A dot at coordinates (x, y) represents a formula for which Owl and Seminator 2 produced automata with x and y states, respectively.

We can observe a huge improvement brought by Seminator 2 in *not semi-deterministic* benchmarks: while in 2017 Seminator 1.1 produced a smaller automaton than Owl in only a few cases in this category [5], Seminator 2 is now more than competitive even though Owl has also been improved over time.

#### **4.2 Complementation**

We compare Seminator 2 with the complementation of ROLL based on automata learning (formerly presented as Buechic), the determinization-based algorithm [23] implemented in GOAL, the asymptotically optimal Fribourg complementation implemented as a plugin for GOAL, and with Spot (autfilt --complement). We apply the simplifications from Spot to all results and we use Spot's ltl2tgba to create the input Büchi automata for all tools, using transition-based generalized acceptance or state-based acceptance as appropriate (only Seminator 2 and Spot can complement transition-based generalized Büchi automata). The timeout of 120 s was reached once by both Seminator 2 and Spot, 6 times by Fribourg, and 13 times by GOAL and ROLL.

**Fig. 5.** Comparison of Seminator 2 against Spot and Fribourg+Spot in terms of the sizes (i.e., number of states) of complement automata produced for the *not semi-deterministic* random benchmark. Note that axes are logarithmic.

**Fig. 6.** Running times of complementation tools on the 83 hard cases of the *not semi-deterministic* random benchmark. The running times of each tool on these cases are sorted increasingly before being plotted.

Table 3 shows results for complementation in the same way as Table 2 does for semi-determinization. For the *deterministic* benchmark, we can see quite similar results from all tools but ROLL. This is caused by the fact that complementation of deterministic automata is easy. Some tools (including Spot) even apply a dedicated complementation procedure. It comes as no surprise that the specialized algorithm of Seminator 2 performs better than most other complementations in the *semi-deterministic* category. Interestingly, this carries over to the *not semi-deterministic* category. The results demonstrate that the 2-step approach of Seminator 2 to complementation performs well in practice. Figure 5 offers more detailed insight into the distribution of automata sizes created by Seminator 2, Spot, and Fribourg+Spot for random benchmarks in this category.

Finally, Fig. 6 compares the running times of these tools over the 83 hard cases of the *not semi-deterministic* random benchmark (a case is *hard* if at least one tool did not finish in 10 s). We can see that Seminator 2 and Spot run significantly faster than the other tools.
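The selection of hard cases and the sorted running-time curves of Fig. 6 can be derived from raw timings along the following lines. This is a sketch under stated assumptions rather than the plotting script of the paper: timeouts are encoded as `None`, the function name `hard_case_curves` is hypothetical, and the timings in the example are invented.

```python
def hard_case_curves(times, threshold=10.0):
    """times maps a tool name to running times in seconds (None = timeout).

    A case is hard if at least one tool did not finish within `threshold`
    seconds. Returns the indices of the hard cases and, for each tool, its
    running times on those cases sorted increasingly (timeouts omitted).
    """
    tools = list(times)
    n_cases = len(times[tools[0]])
    hard = [k for k in range(n_cases)
            if any(times[t][k] is None or times[t][k] > threshold
                   for t in tools)]
    curves = {t: sorted(times[t][k] for k in hard if times[t][k] is not None)
              for t in tools}
    return hard, curves


# Invented timings for two tools on three cases.
timings = {"tool_x": [0.5, 12.0, 3.0], "tool_y": [0.2, 1.0, None]}
print(hard_case_curves(timings))
```

Sorting each tool's times independently means that the k-th point of one curve need not correspond to the same case as the k-th point of another curve.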

#### **5 Conclusion**

We have presented Seminator 2, which is a substantially improved version of Seminator 1.1. The tool now offers a competitive complementation of TGBA. Furthermore, the semi-determinization code was rewritten and offers new optimizations that significantly reduce the size of produced automata. Finally, new user-interfaces enable convenient processing of large automata sets thanks to the support of pipelines and batch processing, and versatile applicability in education and research thanks to the integration with Spot's Python bindings.

**Acknowledgment.** F. Blahoudek has been supported by the DARPA grant D19AP00004 and by the F.R.S.-FNRS grant F.4520.18 (ManySynth). J. Strejček has been supported by the Czech Science Foundation grant GA19-24397S.

#### **References**



## **RTLola Cleared for Take-Off: Monitoring Autonomous Aircraft**

Jan Baumeister<sup>1</sup>, Bernd Finkbeiner<sup>1</sup>, Sebastian Schirmer<sup>2</sup>, Maximilian Schwenger<sup>1(B)</sup>, and Christoph Torens<sup>2</sup>

<sup>1</sup> Department of Computer Science, Saarland University, 66123 Saarbrücken, Germany {jbaumeister,finkbeiner,schwenger}@react.uni-saarland.de <sup>2</sup> German Aerospace Center (DLR), 38108 Braunschweig, Germany {sebastian.schirmer,christoph.torens}@dlr.de

**Abstract.** The autonomous control of unmanned aircraft is a highly safety-critical domain with great economic potential in a wide range of application areas, including logistics, agriculture, civil engineering, and disaster recovery. We report on the development of a dynamic monitoring framework for the DLR ARTIS (Autonomous Rotorcraft Testbed for Intelligent Systems) family of unmanned aircraft based on the formal specification language RTLola. RTLola is a stream-based specification language for real-time properties. An RTLola specification of hazardous situations and system failures is statically analyzed in terms of consistency and resource usage and then automatically translated into an FPGA-based monitor. Our approach leads to highly efficient, parallelized monitors with formal guarantees on the noninterference of the monitor with the normal operation of the autonomous system.

**Keywords:** Runtime verification · Stream monitoring · FPGA · Autonomous aircraft

#### **1 Introduction**

An unmanned aerial vehicle, commonly known as a drone, is an aircraft without a human pilot on board. While usually connected via radio transmissions to a base station on the ground, such aircraft are increasingly equipped with decision-making capabilities that allow them to autonomously carry out complex missions in applications such as transport, mapping and surveillance, or crop and irrigation monitoring. Despite the obvious safety-criticality of such systems, it is impossible to foresee all situations an autonomous aircraft might encounter and thus make a safety case purely by analyzing all of the potential behaviors in advance. A critical part of the safety engineering of a drone is therefore to carefully monitor the actual behavior during the flight, so that the health status of the system can be assessed and mitigation procedures (such as a return to the base station or an emergency landing) can be initiated when needed.

In this paper, we report on the development of a dynamic monitoring framework for the DLR ARTIS (Autonomous Rotorcraft Testbed for Intelligent Systems) family of aircraft based on the formal specification language RTLola. The development of a monitoring framework for an autonomous aircraft differs significantly from a monitoring framework in a more standard setting, such as network monitoring. A key consideration is that while the specification language needs to be highly *expressive*, the monitor must operate within strictly limited resources, and the monitor itself needs to be highly *reliable*: any interference with the normal operation of the aircraft could have fatal consequences.

A high level of expressiveness is necessary because the assessment of the health status requires complex analyses, including a cross-validation of different sensor modules such as the agreement between the GPS module and the accelerometer. This is necessary in order to discover a deterioration of a sensor module. At the same time, the expressiveness and the precision of the monitor must be balanced against the available computing resources. The reliability requirement goes beyond pure correctness and robustness of the execution. Most importantly, reliability requires that the peak resource consumption of the monitor in terms of energy, time, and space needs to be known ahead of time. This means that it must be possible to compute these resource requirements statically based on an analysis of the specification. The determination whether the drone is equipped with sufficient hardware can then be made before the flight, and the occurrence of dynamic failures such as running out of memory or sudden drops in voltage can be ruled out. Finally, the collection of the data from the on-board architecture is a non-trivial problem: While the monitor needs access to almost the complete system state, the data needs to be retrieved non-intrusively such that it does not interfere with the normal system operation.

Our monitoring approach is based on the formal stream specification language RTLola [11]. In an RTLola specification, input streams that collect data from sensors, networks, etc., are filtered and combined into output streams that contain data aggregated from multiple sources and over multiple points in time such as over sliding windows of some real-time length. Trigger conditions over these output streams then identify critical situations. An RTLola specification is translated into a monitor defined in a hardware description language and subsequently realized on an FPGA. Before deployment, the specification is checked for consistency and the minimal requirements on the FPGA are computed. The hardware monitor is then placed in a central position where as much sensor data as possible can be collected; during the execution, it then extracts the relevant information. In addition to requiring no physical changes to the system architecture, this integration incurs no further traffic on the bus.

Our experience has been extremely positive: Our approach leads to highly efficient, parallelized monitors with formal guarantees on the non-interference of the monitor with the normal operation of the autonomous system. The monitor is able to detect violations of complex specifications without intruding into the system execution, and operates within narrow resource constraints. RTLola is cleared for take-off.

#### **1.1 Related Work**

Stream-based monitoring approaches focus on an expressive specification language while handling non-binary data. Their roots lie in synchronous, declarative stream processing languages like Lustre [13] and Lola [9]. The *Copilot* framework [19] features a declarative data-flow language from which constant space and constant time C monitors are generated; these guarantees enable usage on an embedded device. Rather than focusing on data-flow, the family of Lola-languages puts an emphasis on statistical measures and has successfully been used to monitor synchronous, discrete time properties of autonomous aircraft [1,23]. In contrast to that, RTLola [12,22] supports real-time capabilities and efficient aggregation of data occurring with arbitrary frequency, while forgoing parametrization for efficiency [11]. RTLola can also be compiled to VHDL and subsequently realized on an FPGA [8].

Apart from stream-based monitoring, there is a rich body of work on monitoring based on real-time temporal logics [2,10,14–16,20] such as Signal Temporal Logic (STL) [17]. Such languages are a concise way to describe temporal behaviors with the shortcoming that they are usually limited to qualitative statements, i.e. boolean verdicts. This limitation was addressed for STL [10] by introducing a quantitative semantics indicating the robustness of a satisfaction. To specify continuous signal patterns, specification languages based on regular expressions can be beneficial, e.g. Signal Regular Expressions (SRE) [5]. The R2U2 tool [18] stands out in particular as it successfully brought a logic closely related to STL onto unmanned aerial systems as an external hardware implementation.

#### **2 Setup**

The Autonomous Rotorcraft Testbed for Intelligent Systems (ARTIS) is a platform used by the Institute of Flight Systems of the German Aerospace Center (DLR) to conduct research on autonomous flight. It consists of a set of unmanned helicopters and fixed-wing aircraft of different sizes which can be used to develop new techniques and evaluate them under real-world conditions.

The case study presented in this paper revolves around the superARTIS, a large helicopter with a maximum payload of 85 kg, depicted in Fig. 1. The high payload capabilities allow the aircraft to carry multiple sensor systems, computational resources, and data links. This extensive range of avionic equipment plays an important role in improving the situational awareness of the aircraft [3] during the flight. It facilitates safe autonomous research missions which include flying in urban or maritime areas, alone or with other aircraft. Before an actual flight test, software- and hardware-in-the-loop simulations, as well as real-time logfile replays strengthen confidence in the developed technology.

#### **2.1 Mission**

One field of application for unmanned aerial vehicles (UAVs) is reconnaissance missions. In such missions, the aircraft is expected to operate within a fixed area in which it can cause no harm. The polygonal boundary of this area is called a geo-fence. As soon as the vehicle passes the geo-fence, mitigation procedures need to be initiated to ensure that the aircraft does not stray further away from the safe area.
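Checking whether a position lies inside a polygonal geo-fence is a standard point-in-polygon test. The following Python fragment sketches the well-known ray-casting variant; it is an illustration only, since the text does not state how the fence check is implemented on the aircraft. The function name `inside_geofence` and the use of planar (x, y) coordinates are assumptions.

```python
def inside_geofence(point, fence):
    """Ray-casting test: is `point` inside the polygon `fence`?

    point -- (x, y) position in a local planar frame (assumed)
    fence -- list of (x, y) vertices in order; the polygon closes implicitly
    Points exactly on the boundary are not treated specially.
    """
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(fence, fence[1:] + fence[:1]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # one more crossing to the right
    return inside


# Made-up square fence with side length 10.
fence = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(inside_geofence((5, 5), fence))   # True
print(inside_geofence((15, 5), fence))  # False
```

A monitor would evaluate such a test on every position update and raise an alarm as soon as the result becomes false.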

The case study presented in this paper features a reconnaissance mission. Figure 2 shows the flight path (blue line) within a geo-fence (red line). Evidently, the aircraft violates the fence several times temporarily. A reason for this can be flawed position estimation: An aircraft estimates its position based on several factors such as landmarks detected optically or GPS sensor readings. In the latter case, GPS satellites send position and time information to earth. The GPS module uses this data to compute the aircraft's absolute position with trilateration. However, signal reflection or a low number of GPS satellites in range can result in imprecisions in the position approximation. If the aircraft is continuously exposed to imprecise position updates, the error adds up and results in a strong deviation from the expected flight path.

The impact of this effect can be seen in Fig. 3. It shows the velocity of a ground-borne aircraft in an enclosed backyard according to its GPS module.<sup>1</sup> During the reported period of time, the aircraft was pushed across the backyard by hand. While the expected graph is a smooth curve, the actual measurements show an erratic curve with errors of up to ±1.5 m s<sup>−1</sup>, which can be mainly attributed to signals being reflected on the enclosure. The strictly positive trend of the horizontal velocity can explain strong deviations from the desired flight path seen in Fig. 3.

A counter-measure to these imprecisions is the cross-validation of several redundant sensors. As an example, rather than just relying on the velocity reported by a GPS module, its measured velocity can be compared to the integrated output of an accelerometer. When the values deviate strongly, the values can be classified as less reliable than when both sensors agree.
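The cross-validation idea can be illustrated with a few lines of Python. The fragment below is a minimal sketch under simplifying assumptions: both sensors are sampled at the same instants with a fixed period, the first GPS sample is taken as the initial velocity, and the tolerance is a hypothetical threshold. It ignores the drift that accumulates when integrating real accelerometer data.

```python
def cross_validate(gps_velocity, acceleration, dt, tolerance):
    """Flag samples where GPS velocity and integrated acceleration disagree.

    gps_velocity -- velocities in m/s reported by the GPS module
    acceleration -- accelerometer readings in m/s^2 at the same instants
    dt           -- sampling period in seconds
    tolerance    -- maximal accepted deviation in m/s (hypothetical value)
    """
    flags = []
    integrated = gps_velocity[0]  # assumption: first GPS sample is trusted
    for k, v_gps in enumerate(gps_velocity):
        if k > 0:
            # rectangle rule using the previous acceleration sample
            integrated += acceleration[k - 1] * dt
        flags.append(abs(v_gps - integrated) > tolerance)
    return flags


# Invented data: constant acceleration of 1 m/s^2, last GPS sample jumps.
gps = [0.0, 1.0, 2.0, 5.0]
acc = [1.0, 1.0, 1.0, 1.0]
print(cross_validate(gps, acc, dt=1.0, tolerance=0.5))
# [False, False, False, True]
```

Samples flagged `True` would be classified as less reliable than those on which both sensors agree.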

#### **2.2 Non-Intrusive Instrumentation**

When integrating the monitor into an existing system, the system architecture usually cannot be altered drastically. Moreover, the monitor should not interfere with the regular execution of the system, e.g. by requiring the controller to send explicit messages to it. Such a requirement could offset the timing behavior and thus have a negative impact on the overall performance of the system.

The issue can be circumvented by placing the monitor at a point where it can access all data necessary for the monitoring process non-intrusively. In the case of the superARTIS, the logger interface provides such a place as it compiled the data of all position-related sensors as well as the output of the position estimation [3,4]. Figure 4 outlines the relevant data lines of the aircraft. Sensors were polled with fixed frequencies of up to 100 Hz. The schematic shows that the logger explicitly sends data to the monitor. This is not a strict requirement of the monitor as it could be connected to the data buses leading to the logger and passively read incoming data packets. However, in the present setting, the logger did not run at full capacity. Thus sending information to the monitor came at no relevant cost while requiring few hardware changes to the bus layout.

<sup>1</sup> GPS modules only provide absolute position information; the first derivative thereof, however, is the velocity.

**Fig. 1.** DLR's autonomous superARTIS equipped with optical navigation.

**Fig. 2.** Reconnaissance mission for a UAV. The thin blue line represents its trajectory, the thick red line a geo-fence.

In turn, the monitor provides feedback regarding violations of the specification. Here, we distinguish between different timing behaviors of triggers. The monitor evaluates event-based triggers whenever the system passes new events to the monitor and immediately replies with the results. For periodic triggers, i.e., those annotated with an evaluation frequency, the evaluation is decoupled from the communication between monitor and system. Thus, the monitor needs to wait until it receives another event before reporting the verdict. This incurs a short delay between detection and report.

#### **2.3 StreamLAB**

StreamLAB<sup>2</sup> [11] is a monitoring framework revolving around the stream-based specification language RTLola. It emphasizes analyses conducted before deployment of the monitor. This increases the confidence in a successful execution by providing information to aid the specifier. To this end, it detects inconsistencies in the specification such as type errors, e.g., a lossy conversion of a floating point number to an integer, or timing errors, e.g., accessing values that might not exist. Further, it provides two execution modes: an interpreter and an FPGA compilation. The interpreter allows the specifier to validate their specification. For this, it requires a *trace*, i.e., a series of data that is expected to occur during an execution of the system. It then checks whether the trace complies with the specification and reports the points in time when specified bounds are violated. After successfully validating the specification, it can be compiled into VHDL code. The compiled code can again be analyzed with respect to space and power consumption. This information allows for evaluating whether the available hardware suffices for running the RTLola monitor.

<sup>2</sup> www.stream-lab.eu.

**Fig. 3.** Line plot of the horizontal and vertical speed calculated by a GPS receiver.

**Fig. 4.** Overview of data flow in system architecture.

An RTLola specification consists of input and output streams, as well as trigger conditions. *Input* streams describe data the system produces asynchronously and provides to the monitor. *Output* streams use this data to assess the health state of the system e.g. by computing statistical information. *Trigger* conditions distinguish desired and undesired behavior. A violation of the condition issues an alarm to the system.

The following specification declares a floating point input stream height representing sensor readings of an altimeter. The output stream avg\_height computes the average value of the height stream over two minutes. The aggregation is a sliding window computed once per second, as indicated with the @1Hz annotation.<sup>3</sup> The stream δheight computes the difference between the average and the current height. A strong deviation of these values constitutes a suspicious jump in sensor readings, which might indicate a faulty sensor or an unexpected loss or gain in height. In this case, the trigger in the specification issues a warning to the system, which can initiate mitigation measures.

```
input height: Float32
output avg_height @1Hz := height.aggregate(over: 2min, using: avg)
output δheight := abs(avg_height.hold().defaults(to: height) - height)
trigger δheight > 50.0 "WARNING: Suspicious jump in height."
```
Note that this is just a brief introduction to RTLola and the StreamLAB framework. For more details, the authors refer to [8,11,12,22].
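To make the timing of the example specification more concrete, the following is a minimal Python sketch of how its three streams could be evaluated on a recorded trace. It is not part of StreamLAB; the function name `monitor_height`, the treatment of readings that coincide with a 1 Hz tick, and the choice to keep the last average when the window is empty are assumptions made for illustration only.

```python
from collections import deque

def monitor_height(events, window=120.0, threshold=50.0):
    """Replay time-ordered (timestamp_seconds, height) events.

    Returns the timestamps at which the trigger of the example
    specification would fire (illustrative semantics, see lead-in).
    """
    recent = deque()      # readings that may still lie inside the window
    avg_height = None     # last value of the periodic stream, if any
    next_tick = 1.0       # next 1 Hz evaluation point
    alarms = []
    for ts, height in events:
        # Evaluate all periodic ticks that lie strictly before this event.
        while next_tick < ts:
            while recent and recent[0][0] <= next_tick - window:
                recent.popleft()
            if recent:
                avg_height = sum(h for _, h in recent) / len(recent)
            next_tick += 1.0
        # hold() with default: fall back to the current reading.
        reference = height if avg_height is None else avg_height
        if abs(reference - height) > threshold:
            alarms.append(ts)
        recent.append((ts, height))
    return alarms
```

For instance, three readings of 100.0 followed by a reading of 200.0 one second later raise a single alarm at the time of the last reading.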

#### **2.4 FPGA as Monitoring Platform**

An RTLola specification can be compiled into the hardware description language VHDL and subsequently realized on an FPGA as proposed by Baumeister et al. [8]. An FPGA as target platform for the monitor has several advantages in terms of improving the development process, reducing its cost, and increasing the overall confidence in the execution.

<sup>3</sup> Details on how such a computation can cope with a statically-bounded amount of memory can be found in [12,22].

Since the FPGA is a separate module and thus decoupled from the control software, these components do not share processor time or memory. In particular, control and monitoring computations happen in parallel. Further, the monitor itself parallelizes the computation of independent RTLola output streams with almost no additional overhead. This significantly accelerates the monitoring process [8]. The compiled VHDL specification allows for extensive static analyses. Most notably, the results include whether the board is sufficiently large in terms of look-up tables and storage capabilities to host the monitor, and the power consumption when idle or at peak performance. Lastly, an FPGA is the sweet spot between generality and specificity: it runs faster, is lighter, and consumes less energy than general purpose hardware while retaining a similar time-to-deployment. The latter, combined with a drastically lower cost, renders the FPGA superior to application-specific integrated circuits (ASICs) during the development phase. After that, when the specification is fixed, an ASIC might be considered for its even higher performance.

#### **2.5 RTLola Specifications**

The entire specification for the mission is comprised of three sub-specifications. This section briefly outlines each of them and explains representative properties in Fig. 5. The complete specifications as well as a detailed description were presented in earlier work [6,21] and the technical report of this paper [7].


Figure 5 shows some representative sub-properties of the previously described RTLola specification, which is too long to discuss in detail. It contains a validation of GPS readings as well as a cross-validation of the GPS module against the Inertial Measurement Unit (IMU). The specification declares three input streams: the x-position and the number of GPS satellites in range from the GPS module, and the acceleration in x-direction according to the IMU.

```
input gps_x: Float16      // Absolute x position from GPS module
input num_sat: UInt8      // Number of GPS satellites in range
input imu_acc_x: Float32  // Acceleration in x direction from IMU

// Check if the GPS module emitted few readings in the last 3s.
trigger @1Hz gps_x.aggregate(over: 3s, using: count) < 10
    "VIOLATION: Few GPS updates"
// 1 if there are few GPS Satellites in range, otherwise 0.
output few_sat: UInt8 := Int(num_sat < 9)
// Check if there rarely were enough GPS satellites in range.
trigger @1Hz few_sat.aggregate(over: 5s, using: Σ) > 12
    "WARNING: Unreliable GPS data."
// Integrate acceleration twice to obtain absolute position.
output imu_vel_x @1Hz := imu_acc_x.aggregate(over: ∞, using: ∫)
output imu_x @1Hz := imu_vel_x.aggregate(over: ∞, using: ∫)
// Issue an alarm if readings from GPS and IMU disagree.
trigger abs(imu_x − gps_x) > 0.5
    "VIOLATION: GPS and IMU readings deviate."
```

**Fig. 5.** An RTLola specification validating GPS sensor data and cross validating readings from the GPS module and IMU.

The first trigger counts the number of updates received from the GPS module by counting how often the input stream gps\_x gets updated to validate the timing behavior of the module.

The output stream few\_sat computes the indicator function for num\_sat < 9, which indicates that the GPS module might report unreliable data due to few satellites in reach. If this happens more than 12 times within five seconds, the next trigger issues a warning to indicate that the incoming GPS values might be inaccurate. The last trigger checks whether the double integral of the IMU acceleration coincides with the GPS position up to a threshold of 0.5 m.
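The two checks described above can be mimicked on recorded data with a few lines of Python. This is only a sketch under stated assumptions: the helper names `few_sat_warning` and `integrate` are hypothetical, and the trapezoidal rule is an illustrative choice for the integral aggregation, not necessarily the scheme used by the RTLola monitor.

```python
def few_sat_warning(readings, now, window=5.0, min_sats=9, limit=12):
    """readings: list of (timestamp, num_sat) pairs.

    True if more than `limit` readings within the last `window` seconds
    report fewer than `min_sats` satellites.
    """
    few = sum(1 for ts, n in readings
              if now - window < ts <= now and n < min_sats)
    return few > limit

def integrate(samples):
    """Cumulative trapezoidal integral of (timestamp, value) samples.

    The result starts at 0 at the first timestamp.
    """
    if not samples:
        return []
    total = 0.0
    out = [(samples[0][0], 0.0)]
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        total += 0.5 * (v0 + v1) * (t1 - t0)
        out.append((t1, total))
    return out

def gps_imu_deviate(imu_acc, gps_x, threshold=0.5):
    """Integrate acceleration twice and compare with a GPS position."""
    position = integrate(integrate(imu_acc))
    return abs(position[-1][1] - gps_x) > threshold
```

With a constant acceleration of 2.0 sampled at 0, 1, 2, and 3 seconds, the double integral yields a position of 9.0, so a GPS reading of 9.2 would pass the cross-validation while a reading of 10.0 would not.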

#### **2.6 VHDL Synthesis**

The specifications mentioned above were compiled into VHDL and realized on the Xilinx ZC702 Base Board<sup>4</sup>. The following table details the resource consumption of each sub-specification reported by the synthesis tool Vivado. The number of flip-flops (FF) indicates the memory consumption in bits; none of the specifications requires more than 600 B of memory. The number of LUTs (Lookup Tables) is an indicator for the complexity of the logic. The sensor validation, despite being significantly longer than the cross-validation, requires the fewest LUTs. The reason is that its computations are simple in comparison: Rather than computing sliding window aggregations or line intersections, it mainly consists of simple thresholding. The number of multiplexers (MUX) reflects this as well: Since thresholding requires comparisons, which translate to multiplexers, the validation requires twice as many of them. Lastly, the power consumption of the monitor is extremely low: When idle, none of the specifications requires more than 156 mW and even under peak pressure, the power consumption does not exceed 2.1 W. For comparison, a Raspberry Pi needs between 1.1 W (Model 2B) and 2.7 W (Model 4B) when idle and roughly twice as much under peak pressure, i.e., 2.1 W and 6.4 W, respectively.<sup>5</sup>

<sup>4</sup> https://www.xilinx.com/support/documentation/boards_and_kits/zc702_zvik/ug850-zc702-eval-bd.pdf.

Note that the geo-fence specification checks for 12 intersections in parallel, one for each face of the fence (cf. Fig. 2). Adapting the number of faces allows for scaling the amount of FPGA resources required, as can be seen in Fig. 6a. The graph does not grow linearly because the realization problem of VHDL code onto an FPGA is a multi-dimensional optimization problem with several Pareto-optimal solutions. Under default settings, the optimizer found a solution for four faces that required fewer LUTs than for three faces. At the same time, the worst negative slack time (WNST) of the four-face solution was lower than the WNST for the three-face solution as well (cf. Fig. 6b), indicating that the former performs worse in terms of running time.

#### **3 Results**

As the title of the paper suggests, the superARTIS with the RTLola monitor component is cleared to fly and a flight test is already scheduled. In the meantime, the monitor was validated on log files from past missions of the superARTIS replayed under realistic conditions. During a flight, the controller polls samples from sensors, estimates the current position, and sends the respective data to the logger and monitor. In the replay setting, the process remains the same except for one detail: Rather than receiving data from the actual sensors, the data sent to the controller is read from a past log file at the same frequency at which it was recorded. The timing and logging behavior is equivalent to a real execution. In particular, the replayed data points will be recorded again in the same way. Control computations take place on a machine identical to the one on the actual aircraft. As a result, from the point of view of the monitor, the replay mode and the actual flight are indistinguishable. Note that the setup is open-loop, i.e., the monitor cannot influence the running system. Therefore, the replay mode using real data is more realistic than a high-fidelity simulation.

<sup>5</sup> Information collected from https://www.pidramble.com/wiki/benchmarks/powerconsumption in January, 2020.

**Fig. 6.** Result of the static analysis for different numbers of faces of the geo-fence.

When monitoring the geo-fence of the reconnaissance mission in Fig. 2, all twelve face crossings were detected successfully. Additionally, when replaying the sensor data of the experiment in the enclosed backyard from Sect. 2.1, the erratic GPS sensor data led to 113 violations regarding the GPS module on its own. Note that many of these violations point to the same culprit: a low number of available GPS satellites, for example, correlates with the occurrence of peaks in the GPS velocity. Moreover, the cross validation issued another 36 alarms due to a divergence of IMU and GPS readings. Other checks, for example detecting a deterioration of the GPS module based on its output frequency, were not violated in either flight and thus not reported.

#### **4 Conclusion**

We have presented the integration of a hardware-based monitor into the superARTIS UAV. The distinguishing features of our approach are the high level of expressiveness of the RTLola specification language combined with the formal guarantees on the resource usage. The comprehensive tool framework facilitates the development of complex specifications, which can be validated on log data before they get translated into a hardware-based monitor. The automatic analysis of the specification derives the minimal requirements on the development board needed for safe operation. If they are met, the specification is realized on an FPGA and integrated into the superARTIS architecture. Our experience shows that the overall system works correctly and reliably, even without thorough system-level testing. This is due to the non-interfering instrumentation, the validated specification, and the formal guarantees on the absence of dynamic failures of the monitor.

**Acknowledgments.** This work was partially supported by the German Research Foundation (DFG) as part of the Collaborative Research Center Foundations of Perspicuous Software Systems (TRR 248, 389792660), and by the European Research Council (ERC) Grant OSARES (No. 683300).

## **References**



## **Realizing** *ω***-regular Hyperproperties**

Bernd Finkbeiner, Christopher Hahn, Jana Hofmann, and Leander Tentrup

> Reactive Systems Group, Saarland University, Saarbrücken, Germany {finkbeiner,hahn,hofmann,tentrup}@react.uni-saarland.de

**Abstract.** We study the expressiveness and reactive synthesis problem of HyperQPTL, a logic that specifies ω-regular hyperproperties. HyperQPTL is an extension of linear-time temporal logic (LTL) with explicit trace and propositional quantification and therefore *truly* combines trace relations and ω-regularity. As such, HyperQPTL can express promptness, which states that there is a common bound on the number of steps up to which an event must have happened. We demonstrate how the HyperQPTL formulation of promptness differs from the type of promptness expressible in the logic Prompt-LTL. Furthermore, we study the realizability problem of HyperQPTL by identifying decidable fragments, where one decidable fragment contains formulas for promptness. We show that, in contrast to the satisfiability problem of HyperQPTL, propositional quantification has an immediate impact on the decidability of the realizability problem. We present a reduction to the realizability problem of HyperLTL, which immediately yields a bounded synthesis procedure. We implemented the synthesis procedure for HyperQPTL in the bounded synthesis tool BoSy. Our experimental results show that a range of arbiters satisfying promptness can be synthesized.

#### **1 Introduction**

Hyperproperties [5], which are mainly studied in the area of secure information flow control, are a generalization from trace properties to *sets* of trace properties. That is, they relate multiple execution traces with each other. Examples are noninterference [20], observational determinism [34], symmetry [16], or promptness [24], i.e., properties whose satisfaction cannot be determined by analyzing each execution trace in isolation.

A number of logics have been introduced to express hyperproperties (examples are [4,19,25]). They either add explicit trace quantification to a temporal logic or build on monadic first-order or second-order logics and add an equal-level predicate, which connects traces with each other. A comprehensive study comparing such hyperlogics has been initiated in [6].

This work was partially supported by the Collaborative Research Center "Foundations of Perspicuous Software Systems" (TRR 248, 389792660) and by the European Research Council (ERC) Grant OSARES (No. 683300).

The most prominent hyperlogic is HyperLTL [4], which extends classic linear-time temporal logic (LTL) [26] with trace variables and explicit trace quantification. HyperLTL has been successfully applied in (runtime) verification (e.g., [15,21,32]), specification analysis [11,14], synthesis [12,13], and program repair [1] of hyperproperties. As an example specification, the following HyperLTL formula expresses observational determinism by stating that for every pair of traces, if the observable inputs I are the same on both traces, then the observable outputs O also have to agree:

$$\forall \pi \forall \pi'.\ \Box(I_{\pi} = I_{\pi'}) \to \Box(O_{\pi} = O_{\pi'}) \ . \tag{1}$$
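As an illustration of how formula (1) relates pairs of traces, the following Python sketch checks observational determinism on a finite set of finite trace prefixes of equal length. This finite approximation and the name `observational_determinism` are assumptions for illustration; the formula itself ranges over infinite traces.

```python
from itertools import combinations

def observational_determinism(traces, inputs, outputs):
    """traces: list of traces, each a list of sets of atomic propositions.

    Returns True if every pair of traces that agrees on `inputs` at all
    positions also agrees on `outputs` at all positions.
    """
    for t1, t2 in combinations(traces, 2):
        same_in = all((a & inputs) == (b & inputs) for a, b in zip(t1, t2))
        same_out = all((a & outputs) == (b & outputs) for a, b in zip(t1, t2))
        if same_in and not same_out:
            return False
    return True
```

Checking unordered pairs of distinct traces suffices here because the condition is symmetric in the two traces and trivially true when both quantifiers pick the same trace.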

Thus, hyperlogics can not only specify functional correctness, but may also enforce the absence of information leaks or presence of information propagation. There is a great practical interest in information flow control, which makes synthesizing implementations that satisfy hyperproperties highly desirable. Recently [12], it was shown that the synthesis problem of HyperLTL, although undecidable in general, remains decidable for many fragments, such as the $\exists^{*}\forall$ fragment. Furthermore, a *bounded synthesis* procedure was developed, for which a prototype implementation based on BoSy [7,9,12] showed promising results.

HyperLTL is, however, intrinsically limited in expressiveness. For example, promptness is not expressible in HyperLTL. Promptness is a property stating that there is a bound b, common for all traces, on the number of steps up to which an event e must have happened. Additionally, just like LTL, HyperLTL can express neither ω-regular nor epistemic properties [2,29]. Epistemic properties are statements about the transfer of knowledge between several components. An exemplary epistemic specification is described by the *dining cryptographers problem* [3]: three cryptographers sit at a table in a restaurant. Either one of the cryptographers or, alternatively, the NSA must pay for their meal. The question is whether there is a protocol where each cryptographer can find out whether the NSA or one of the cryptographers paid the bill, without revealing the identity of the paying cryptographer.

In this paper, we explore HyperQPTL [6,29], a hyperlogic that is more expressive than HyperLTL. Specifically, we study its expressiveness and reactive synthesis problem. HyperQPTL extends HyperLTL with quantification over sequences of new propositions. What makes the logic particularly expressive is the fact that the trace quantifiers and propositional quantifiers can be freely interleaved. With this mechanism, HyperQPTL can not only express all ω-regular properties over sequences of n-tuples; it truly interweaves trace quantification and ω-regularity. For example, promptness can be stated as the following HyperQPTL formula:

$$
\exists b.\ \forall \pi.\ \Diamond b \land \left(\neg b \; \mathcal{U} \, e_{\pi}\right) \ . \tag{2}
$$

The formula states that there exists a sequence $s \in (2^{\{b\}})^\omega$ such that event $e$ holds on all traces before the first occurrence of $b$ in $s$. In this paper, we argue that the type of promptness expressible in HyperQPTL is incomparable to the expressiveness of Prompt-LTL [24], a logic introduced to express promptness properties. It is further known that HyperQPTL also subsumes epistemic extensions of temporal logics such as LTL$_\mathsf{K}$ [22], as well as the first-order hyperlogic FO[<, E] [6,19,29]. Its expressiveness makes HyperQPTL particularly interesting. The model checking problem of HyperQPTL is, despite the logic being quite expressive, decidable [29]. We also explore an alternative definition of HyperQPTL that would result in an even more expressive logic. However, we show that the logic would have an undecidable model checking problem, which constitutes a major drawback in the context of computer-aided verification. Furthermore, satisfiability is decidable for large fragments of the logic [6]. Decidable HyperQPTL fragments can be described solely in terms of their *trace* quantifier prefix. This indicates that propositional quantification has no negative impact on the decidability, although it greatly increases the expressiveness. We establish that propositional quantification, in contrast to the satisfiability problem, has an impact on the realizability problem: it becomes undecidable when combining a propositional ∀∃ quantifier alternation with a single universal trace quantifier. However, we show that the synthesis problem of large HyperQPTL fragments remains decidable, where one of these fragments contains promptness properties. We partially obtain these results by reducing the HyperQPTL realizability problem to the HyperLTL realizability problem. Based on this reduction, we extended the BoSy bounded synthesis tool to also synthesize systems respecting HyperQPTL specifications. We provide promising experimental results of our prototype implementation: using BoSy and HyperQPTL specifications, we were able to synthesize arbiters that respect promptness.

**Fig. 1.** The realizability problem of HyperQPTL. Left of and below the solid line are the decidable fragments; right of and above the solid line are the undecidable fragments.
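The promptness formula (2) can be illustrated on finite data with a small Python sketch. It is a simplification under stated assumptions: traces are finite prefixes, the helper name `common_prompt_bound` is hypothetical, and the returned index plays the role of the first position at which the quantified proposition $b$ may hold.

```python
def common_prompt_bound(traces, event="e"):
    """traces: list of traces, each a list of sets of atomic propositions.

    Returns the smallest position at which b can first hold so that the
    event occurs on every trace at or before that position, or None if
    some trace prefix never shows the event.
    """
    firsts = []
    for trace in traces:
        pos = next((k for k, step in enumerate(trace) if event in step), None)
        if pos is None:
            return None
        firsts.append(pos)
    return max(firsts, default=0)
```

For a finite set of traces such a bound exists whenever the event occurs on every trace. The formula becomes restrictive for infinite trace sets: if, for every n, some trace shows the event for the first time at position n, each trace satisfies the eventuality individually, yet no common bound exists.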

This paper is structured as follows. In Sect. 2, we give necessary preliminaries. In Sect. 3, we define HyperQPTL. We discuss an alternative approach to define a logic expressing ω-regular hyperproperties, before pointing out that its model checking problem is undecidable. Subsequently, we give examples for the expressiveness of HyperQPTL, namely by characterizing the type of promptness properties HyperQPTL can express. Additionally, we recapitulate how HyperQPTL also subsumes epistemic properties. Section 4 discusses the realizability problem of HyperQPTL. We describe HyperQPTL fragments in terms of their quantifier prefixes. To present our results, we use the following notation. We write $\forall_\pi$ and $\forall_q$ for a single universal trace and propositional quantifier, respectively. To denote a sequence of universal trace and propositional quantifiers, we write $\forall^*_\pi$ and $\forall^*_q$. Furthermore, we use $\forall^*_{\pi/q}$ for a sequence of mixed universal quantification. We use the analogous notation for existential quantifiers. Lastly, $Q^*_\pi$ and $Q^*_q$ denote a sequence of mixed universal and existential trace and propositional quantifiers, respectively. As an example, the $\forall^*_\pi Q^*_q$ fragment denotes all formulas of the form $\forall \pi_1 \ldots \forall \pi_m.\ \exists/\forall q_1 \ldots \exists/\forall q_n.\ \varphi$, where $\varphi$ is quantifier-free. Figure 1 summarizes our results. We establish that a major factor for the decidability of the realizability problem is the number of universal trace quantifiers occurring in a formula. Realizability of HyperQPTL formulas without $\forall_\pi$ quantifiers is decidable (Sect. 4.1). Formulas with a single $\forall_\pi$ are decidable if they belong to the $\exists^*_{q/\pi} \forall^*_q \forall_\pi Q^*_q$ fragment. This fragment also contains promptness. For more than one universal trace quantifier, we show that decidability can be guaranteed for a fragment that we call the linear $\forall^*_\pi Q^*_q$ fragment. We also show that all the above fragments are tight, i.e., realizability of all other formulas is in general undecidable. Lastly, Sect. 5 presents experiments for the prototype implementation of our bounded synthesis algorithm for HyperQPTL.
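The prefix notation above lends itself to a simple syntactic check. The following Python sketch classifies a quantifier prefix with respect to the first two decidable cases named in this overview. The token encoding (`"Ap"`, `"Ep"` for trace quantifiers, `"Aq"`, `"Eq"` for propositional quantifiers) and the function name are assumptions for illustration; the linear fragment for several universal trace quantifiers is not covered, since its definition is not part of this overview.

```python
import re

# Hypothetical encoding: "Ap"/"Ep" = universal/existential trace quantifier,
# "Aq"/"Eq" = universal/existential propositional quantifier.
# Pattern for the fragment  E*_{q/pi} A*_q A_pi Q*_q  described in the text.
SINGLE_FORALL_TRACE = re.compile(r"(?:E[pq])*(?:Aq)*Ap(?:[AE]q)*")

def realizability_fragment(prefix):
    """prefix: list of tokens such as ["Eq", "Ap"].

    Returns a label for the decidable fragment the prefix falls into,
    or None if neither of the two cases handled by this sketch applies.
    """
    if "Ap" not in prefix:
        return "no universal trace quantifier"
    if SINGLE_FORALL_TRACE.fullmatch("".join(prefix)):
        return "single universal trace quantifier fragment"
    return None
```

The promptness formula (2) has the prefix `["Eq", "Ap"]` and is therefore classified into the single universal trace quantifier fragment, whereas `["Aq", "Eq", "Ap"]`, a propositional ∀∃ alternation in front of a universal trace quantifier, is not. A result of `None` only means that this sketch makes no statement about the prefix.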

#### **2 Preliminaries**

We use AP for a set of atomic propositions. A *trace* over AP is an infinite sequence $t \in (2^{\text{AP}})^\omega$. For $i \in \mathbb{N}$, we write $t[i]$ for the $i$th element of $t$ and $t[i,\infty]$ for the suffix of $t$ starting from position $i$. For two traces $t, t'$ over AP and a set $\text{AP}' \subseteq \text{AP}$, we write $t =_{\text{AP}'} t'$ to indicate that $t$ and $t'$ agree on all $a \in \text{AP}'$, and respectively $T =_{\text{AP}'} T'$ for two sets of traces $T$ and $T'$. Furthermore, we define a replacement function $t[q \mapsto t_q]$ that, given a trace $t$ and a trace $t_q \in (2^{\{q\}})^\omega$, replaces the occurrences of $q$ in $t$ according to $t_q$, such that $t[q \mapsto t_q] =_{\{q\}} t_q$ and $t[q \mapsto t_q] =_{\text{AP} \setminus \{q\}} t$. We also lift this notation to sets of traces and define $T[q \mapsto t_q] = \{t[q \mapsto t_q] \mid t \in T\}$.
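The agreement relation and the replacement function can be sketched on finite trace prefixes as follows. The representation of a trace as a Python list of sets and the names `agree_on`, `replace_prop`, and `replace_prop_all` are assumptions made for illustration.

```python
def agree_on(t1, t2, props):
    """True if two equally long trace prefixes agree on all propositions in props."""
    return all((a & props) == (b & props) for a, b in zip(t1, t2))

def replace_prop(trace, q, q_trace):
    """Replace the occurrences of q in `trace` according to `q_trace`.

    Both arguments are equally long lists of sets of propositions.
    """
    return [(step - {q}) | (q_step & {q})
            for step, q_step in zip(trace, q_trace)]

def replace_prop_all(traces, q, q_trace):
    """Lift the replacement to a collection of traces."""
    return [replace_prop(t, q, q_trace) for t in traces]
```

By construction, the result of `replace_prop(t, q, t_q)` agrees with `t_q` on `{q}` and with `t` on all other propositions, which mirrors the two defining equations above.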

QPTL [31] extends Linear Temporal Logic (LTL) with quantification over propositions. QPTL formulas ϕ are defined as follows.

$$\begin{aligned} \varphi &::= \exists q.\, \varphi \mid \forall q.\, \varphi \mid \psi \\ \psi &::= q \mid \neg \psi \mid \psi \lor \psi \mid \bigcirc \psi \mid \Diamond \psi \end{aligned}$$

where $q \in \text{AP}$ and AP is a set of atomic propositions. For simplicity, we assume that variable names in formulas are cleared of double occurrences. The semantics of $\varphi$ over AP is defined with respect to a trace $t \in (2^{\text{AP}})^\omega$.


We did not define the until operator $\mathcal{U}$ as a native part of the logic. It can be derived using propositional quantification [23]. The boolean connectives $\land$, $\to$, $\leftrightarrow$ and the temporal operators globally $\Box$ and release $\mathcal{R}$ are derived as usual.

#### **3** *ω***-Regular Hyperproperties**

Just like LTL, HyperLTL cannot express ω-regular languages [29]. LTL can be extended to QPTL by adding quantification over atomic propositions. In QPTL, ω-regular languages become expressible. We therefore study HyperQPTL [6,29], the extension of HyperLTL with propositional quantification, to express ω-regular hyperproperties. Given a set AP of atomic propositions and a set $\mathcal{V}$ of trace variables, the syntax of HyperQPTL is defined as follows:

$$\begin{aligned} \varphi &::= \forall \pi.\, \varphi \mid \exists \pi.\, \varphi \mid \forall q.\, \varphi \mid \exists q.\, \varphi \mid \psi\\ \psi &::= a_{\pi} \mid q \mid \neg\psi \mid \psi \lor \psi \mid \bigcirc \psi \mid \Diamond \psi \ , \end{aligned}$$

where $a, q \in \text{AP}$ and $\pi \in \mathcal{V}$. As for QPTL, we assume that formulas are cleared of double occurrences of variable names. We require that in well-defined HyperQPTL formulas, each $a_\pi$ is in the scope of a trace quantifier binding $\pi$ and each $q$ is in the scope of a propositional quantifier binding $q$. Note that atomic propositions $a_\pi$ refer to a quantified trace $\pi$, whereas quantified propositional variables $q$ are independent of the traces. The semantics of a well-defined HyperQPTL formula over AP is defined with respect to a set of traces $T \subseteq (2^{\text{AP}})^\omega$ and an assignment function $\Pi : \mathcal{V} \to T$. We define the satisfaction relation $\Pi, i \models_T \varphi$ as follows:


Note that the semantics of propositional quantification is defined in such a way that in the scope of a quantifier binding $q$, all traces agree on their $q$-sequence. We say that a set of traces $T$ satisfies a HyperQPTL formula $\varphi$ if $\emptyset, 0 \models_T \varphi$, where $\emptyset$ is the empty trace assignment. QPTL formulas can be expressed in HyperQPTL using a single universal trace quantifier. Furthermore, HyperLTL [4] is the syntactic subset of HyperQPTL that does not contain propositional quantification.

While HyperQPTL can express a wide range of properties (see Sect. 3.1), its model checking problem is still decidable [29]. Furthermore, the syntactic fragments for which satisfiability is decidable can be expressed solely in terms of the occurring trace quantifiers: Just like for HyperLTL, satisfiability of a HyperQPTL formula is decidable if no $\forall_\pi$ is followed by an $\exists_\pi$ [6].
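The syntactic criterion for decidable satisfiability only inspects the trace quantifiers of the prefix, which the following Python sketch makes explicit. The token encoding (`"Ap"`, `"Ep"` for universal and existential trace quantifiers, `"Aq"`, `"Eq"` for propositional quantifiers) and the function name are assumptions for illustration.

```python
def sat_prefix_in_decidable_fragment(prefix):
    """prefix: list of tokens such as ["Ep", "Aq", "Ap", "Eq"].

    True if no universal trace quantifier is followed by an existential
    trace quantifier; propositional quantifiers are ignored.
    """
    seen_forall_trace = False
    for token in prefix:
        if token == "Ap":
            seen_forall_trace = True
        elif token == "Ep" and seen_forall_trace:
            return False
    return True
```

A result of `False` only states that the prefix lies outside the fragment named above; it does not decide anything about a particular formula with that prefix.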

The definition of HyperQPTL is straightforward; however, one could argue that it is not the only way to extend QPTL to a hyperlogic. The original idea of QPTL is to "color" the trace by introducing additional atomic propositions. The way HyperQPTL is defined, that idea is translated to sets of traces by coloring the traces uniformly. An alternative approach could be to color every trace individually by introducing a full atomic proposition for every propositional quantification. This resembles full second-order quantification and would therefore result in a considerably more expressive logic. In particular, we show that the model checking problem would become undecidable, which is unfavorable, especially in the context of automatic verification. For the remainder of this section, we call the logic resulting from the alternative definition HyperQPTL<sup>+</sup>. The syntax of HyperQPTL<sup>+</sup> is similar to the one of HyperQPTL, just without the rule $q$ for the evaluation of the propositional variables. This accounts for the idea that the propositional quantification can freely reassign atomic propositions; thus, there is no need to distinguish between free atomic propositions and quantified atomic propositions:

$$\begin{aligned} \varphi &::= \forall \pi.\, \varphi \mid \exists \pi.\, \varphi \mid \forall a.\, \varphi \mid \exists a.\, \varphi \mid \psi\\ \psi &::= a_{\pi} \mid \neg\psi \mid \psi \lor \psi \mid \bigcirc \psi \mid \Diamond \psi \ . \end{aligned}$$

Semantically, only the rules for the quantification of the propositional quantifiers change:

$$\begin{aligned} \Pi, i \models_T \exists a.\, \varphi \quad &\text{iff} \quad \exists T' \subseteq (2^{\text{AP}})^\omega.\; T' =_{\text{AP} \setminus \{a\}} T \land \Pi, i \models_{T'} \varphi\\ \Pi, i \models_T \forall a.\, \varphi \quad &\text{iff} \quad \forall T' \subseteq (2^{\text{AP}})^\omega.\; T' =_{\text{AP} \setminus \{a\}} T \to \Pi, i \models_{T'} \varphi \ . \end{aligned}$$

**Lemma 1.** *The HyperQPTL<sup>+</sup> model checking problem is undecidable.*

*Proof.* Given a finite Kripke structure $K$ and a HyperQPTL<sup>+</sup> formula $\varphi$, the model checking problem asks whether the trace set $T$ produced by $K$ satisfies $\varphi$. The proof follows the undecidability proof for the model checking problem of S1S[E] [6], a logic which lifts S1S to the level of hyperlogics. We describe a reduction from the halting problem of 2-counter machines (which are Turing complete) to the HyperQPTL<sup>+</sup> model checking problem. A 2-counter machine (2CM) consists of a finite set of serially numbered instructions that modify two counters. A configuration of a 2CM is a triple $(n, v_1, v_2) \in \mathbb{N}^3$, where $n$ determines the next instruction to be executed, and $v_1$ and $v_2$ assign the counter values. Each instruction can either increase or decrease one of the counters, or test either of the counters for zero and, depending on the outcome, jump to another instruction. Furthermore, we assume a special instruction $i_{\mathit{halt}}$, which indicates that the machine has reached a halting state. A 2CM halts from initial configuration $s_0$ if there is a finite sequence $s_0, \ldots, s_n$ of configurations such that $s_n$ is a halting configuration and $s_{i+1}$ is a result of applying the instruction in $s_i$ to configuration $s_i$. Let $M$ be a 2CM. We describe $T$ and $\varphi$ such that $T \models \varphi$ iff $M$ halts. We choose $\text{AP} = \{i, c_1, c_2\}$ and $T$ is the set of all traces where each atomic proposition holds exactly once. That way, a trace $t$ encodes a configuration of the machine: If $i \in t[n]$, $c_1 \in t[v_1]$, and $c_2 \in t[v_2]$, the machine is in configuration $(n, v_1, v_2)$. It is easy to see that $T$ can be produced by a finite Kripke structure. To describe $\varphi$, we make two helpful observations. First, using propositional quantification, we can quantify a trace set $T_q \subseteq T$: a trace $t$ is in $T_q$ iff the quantified proposition $q$ eventually occurs on $t$. Second, for two traces $t, t' \in T$, we can state that $t'$ encodes a configuration which is the successor of the configuration encoded by $t$. Using these observations, we define $\varphi = \exists q.\, \varphi'$, where $q$ encodes a set $T_q \subseteq T$ that is supposed to describe a halting computation. To ensure that $T_q$ describes a halting computation, $\varphi'$ is a conjunction of the following requirements: $T_q$ must


Finiteness of T<sub>q</sub> can be expressed by stating that there is an upper bound on the values of i, c<sub>1</sub>, and c<sub>2</sub> on the traces in T<sub>q</sub>. With the observations made before, stating the above requirements in HyperQPTL<sup>+</sup> is now a straightforward exercise. Since the model checking problem of HyperQPTL<sup>+</sup> is undecidable, we focus on HyperQPTL to express ω-regular hyperproperties. In particular, we show that HyperQPTL can express a range of relevant properties that are expressible neither in HyperLTL nor in QPTL.
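The encoding used in the proof can be made concrete with a small sketch. The Python code below is only an illustration under stated assumptions: the instruction format (`"inc"`, `"dec"`, `"jz"`, `"halt"`), the helper names, and the example program are hypothetical and not taken from the proof. It encodes a configuration (n, v<sub>1</sub>, v<sub>2</sub>) as a finite trace prefix in which each of i, c<sub>1</sub>, c<sub>2</sub> holds exactly once, and collects the traces of a halting run, which play the role of T<sub>q</sub>.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

Config = Tuple[int, int, int]          # (n, v1, v2)
Letter = FrozenSet[str]                # subset of AP = {"i", "c1", "c2"}


def encode(config: Config) -> List[Letter]:
    """Finite prefix of the trace for `config`; every later position is the empty set."""
    n, v1, v2 = config
    letters = [set() for _ in range(max(config) + 1)]
    letters[n].add("i")
    letters[v1].add("c1")
    letters[v2].add("c2")
    return [frozenset(letter) for letter in letters]


def decode(trace: Sequence[Letter]) -> Config:
    """Recover (n, v1, v2), insisting that no proposition holds more than once."""
    position = {}
    for k, letter in enumerate(trace):
        for ap in letter:
            if ap in position:
                raise ValueError(f"{ap} holds more than once")
            position[ap] = k
    return position["i"], position["c1"], position["c2"]


def step(program, config: Config) -> Optional[Config]:
    """Successor configuration, or None at the (hypothetical) halt instruction."""
    n, v1, v2 = config
    counters = [v1, v2]
    op = program[n]
    if op[0] == "halt":
        return None
    if op[0] == "inc":
        counters[op[1]] += 1
        n += 1
    elif op[0] == "dec":
        counters[op[1]] = max(0, counters[op[1]] - 1)
        n += 1
    elif op[0] == "jz":                # jump to op[2] if the tested counter is zero
        n = op[2] if counters[op[1]] == 0 else n + 1
    return n, counters[0], counters[1]


def halting_run(program, start: Config, max_steps: int = 1000):
    """Configurations s_0, ..., s_n up to the halting one, or None if the budget runs out."""
    run = [start]
    for _ in range(max_steps):
        successor = step(program, run[-1])
        if successor is None:
            return run
        run.append(successor)
    return None


# Drain counter 0, then halt (counter 1 stays zero, so instruction 2 always jumps).
program = [("jz", 0, 3), ("dec", 0), ("jz", 1, 0), ("halt",)]
run = halting_run(program, (0, 2, 0))
trace_set = [encode(c) for c in run]   # plays the role of T_q
bound = max(len(t) for t in trace_set) # an upper bound witnessing finiteness
```

The step budget `max_steps` is a reminder that such a simulation can only confirm halting, never refute it, which is exactly why the reduction yields undecidability.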

#### **3.1 The Expressiveness of HyperQPTL**

HyperQPTL combines trace quantification with ω-regularity. The interplay between the two features enables HyperQPTL to express a variety of properties. In Sect. 1, we showed how HyperQPTL can express a form of promptness. In this section, we further elaborate on the type of properties HyperQPTL can express. In particular, we compare it to Prompt-LTL, a logic that extends LTL with bounded eventualities. Furthermore, HyperQPTL is also able to express epistemic properties by emulating the knowledge operator known from LTLK.

A straightforward class of properties HyperQPTL can express are ω-regular properties over n-tuples of quantified traces. Formulas expressing this type of property first have a trace quantifier prefix followed by a QPTL formula, i.e., they lie in the Q<sup>∗</sup><sub>π</sub>Q<sup>∗</sup><sub>q</sub> fragment. This fragment of HyperQPTL corresponds to the extension of QPTL with *prenex* trace quantification. However, the true expressive power of HyperQPTL originates from the fact that we allow the trace quantifiers and propositional quantifiers to alternate.

*Promptness Properties.* Promptness properties are an example of HyperQPTL's interplay between trace quantification and propositional quantification. Promptness expresses that eventualities are fulfilled within a bounded number of steps. One way to express promptness properties is the logic Prompt-LTL, which extends LTL with the promptness operator ◇<sub>p</sub>. A system satisfies a Prompt-LTL formula ϕ if there is a bound k such that all traces of the system fulfill the formula where each ◇<sub>p</sub> in ϕ is replaced by ◇<sub>≤k</sub>, i.e., the system must fulfill all prompt eventualities within k steps. For example, ϕ = □◇<sub>p</sub> ψ holds in a system if there is a bound k such that all traces of the system at all times satisfy ψ within k steps. HyperQPTL can express a different type of promptness property. In Sect. 1, Formula 2, we showed how one can state in HyperQPTL that there is a bound, common for all traces, until which an eventuality has to be fulfilled. The idea is to quantify a new proposition b, such that the first position in which b is true serves as the bound. Compared to Prompt-LTL, HyperQPTL thus expresses a weaker form of promptness, while still being stronger than pure eventuality. This type of promptness only becomes meaningful when comparing several traces of the system: HyperQPTL can enforce that there is a common bound for all traces (the system cannot starve), but it does not make the bound explicit. The following example shows a more involved promptness property expressible in HyperQPTL.

*Example 1.* HyperQPTL can express *bounded waiting for a grant*. It states that if the system requests access to a shared resource at point in time t, then it will be granted access within a bounded amount of time. The bound may depend on the point in time t where access to the resource was requested. However, it may not depend on the current trace. We express this property in HyperQPTL as follows, also adding that the system will not request access twice without being granted access in between.

$$\forall \pi.\, \Box (r_{\pi} \to \bigcirc(\neg r_{\pi} \, \mathcal{W} \, g_{\pi})) \tag{1}$$

$$\forall \pi.\, \exists b.\, \forall \pi'.\, \Box (r_{\pi} \land r_{\pi'} \to \bigcirc(\Diamond b \land (\neg b \, \mathcal{U} \, g_{\pi}) \land (\neg b \, \mathcal{U} \, g_{\pi'})))\tag{2}$$

Formula 1 states that no second request is posed before being given a grant. Formula 2 expresses the bounded waiting property by universally quantifying a trace, then existentially quantifying a sequence of bounds b. Now, for every trace π′, whenever π and π′ pose a request at the same point in time, both have to get access to the resource before b holds next. Therefore, for each point in time, there is a bound such that all traces posing a request at that point in time get access within a bounded number of steps. Note that this property differs from saying "all traces are eventually granted access", where the bound may also depend on the trace under consideration. In this scenario, each of the infinitely many traces could wait arbitrarily long for the grant. In particular, it could happen that with each trace the waiting time is longer than before.
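To illustrate the role of the bound sequence b, the following Python sketch checks the body of Formula 2 on finite trace prefixes for a fixed trace π and a fixed candidate b. It is a sketch under assumptions: we read the body as □(r<sub>π</sub> ∧ r<sub>π′</sub> → ◯(◇b ∧ (¬b U g<sub>π</sub>) ∧ (¬b U g<sub>π′</sub>))), we require all witnesses to lie inside the prefix, and we ignore requests at the last position. The function names are hypothetical.

```python
from typing import List, Set

Trace = List[Set[str]]   # each position holds a subset of {"r", "g"}


def not_b_until(b: List[bool], trace: Trace, ap: str, start: int) -> bool:
    """(not b) U ap at position `start`, with the witness required inside the prefix."""
    for j in range(start, len(trace)):
        if ap in trace[j]:
            return True          # ap reached; b was false at all earlier positions
        if b[j]:
            return False         # the bound b came first
    return False


def bounded_waiting(pi: Trace, others: List[Trace], b: List[bool]) -> bool:
    """Check the body of Formula 2 for fixed pi and b against every pi' in `others`."""
    for pi2 in others:
        for k in range(min(len(pi), len(pi2)) - 1):
            if "r" in pi[k] and "r" in pi2[k]:
                start = k + 1    # the "next" operator
                if not (any(b[start:])
                        and not_b_until(b, pi, "g", start)
                        and not_b_until(b, pi2, "g", start)):
                    return False
    return True


pi = [{"r"}, set(), {"g"}, set()]
fast = [{"r"}, {"g"}, set(), set()]
slow = [{"r"}, set(), set(), {"g"}]

late_bound = [False, False, False, True]    # b first holds at position 3
early_bound = [False, False, True, True]    # b first holds at position 2
```

With `late_bound`, all three traces are granted access no later than the first b, whereas `early_bound` is violated by `slow`. The existential quantifier over b in Formula 2 corresponds to searching for some sequence like `late_bound` that works for every π′ at once.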

The above example shows how the interplay of trace quantifiers and propositional quantifiers can be leveraged to express a new class of promptness properties. We finally note that, in contrast to Prompt-LTL, HyperQPTL cannot express that all eventualities must be fulfilled within a fixed number k of steps.

**Corollary 1.** *The expressiveness of HyperQPTL and Prompt-LTL is incomparable.*

*Epistemic Properties.* Another interesting class of properties that are not expressible in HyperLTL are epistemic properties. Epistemic properties describe the knowledge of agents that interact with each other in a system. Logics that express epistemic properties are often equipped with a so-called knowledge operator, e.g., LTLK, which is LTL extended with the knowledge operator K<sub>A</sub> ϕ. The operator denotes that an agent A ⊆ AP knows ϕ. An agent A is characterized in terms of the atomic propositions it can observe. The semantics of the operator is described by the following rule:

$$t, i \models \mathcal{K}_A \, \varphi \quad \text{iff} \quad \forall t'.\ t[0, i] =_A t'[0, i] \to t', i \models \varphi \; .$$

The formula is evaluated with respect to a trace t and a position i. We omit the semantic definition for the rest of the logic, which corresponds to plain LTL. The semantic definition of the operator captures the idea that an agent knows some fact ϕ if ϕ holds on all traces that are indistinguishable for the agent.
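The semantic rule can be read operationally on a finite set of finite trace prefixes. The following Python sketch is an illustration only: the names `agree_on` and `knows`, the representation of ϕ as a Python predicate, and the example propositions `a` and `s` are assumptions made for this example.

```python
from typing import Callable, FrozenSet, List, Sequence

Trace = Sequence[FrozenSet[str]]
Predicate = Callable[[Trace, int], bool]   # stands in for "t, i |= phi"


def agree_on(t: Trace, t2: Trace, agent: FrozenSet[str], i: int) -> bool:
    """t[0, i] =_A t2[0, i]: both prefixes look the same through the agent's propositions."""
    return all(t[k] & agent == t2[k] & agent for k in range(i + 1))


def knows(traces: List[Trace], t: Trace, i: int,
          agent: FrozenSet[str], phi: Predicate) -> bool:
    """t, i |= K_A phi: phi holds at i on every trace the agent cannot tell apart from t."""
    return all(phi(t2, i) for t2 in traces if agree_on(t, t2, agent, i))


t1 = [frozenset({"a"}), frozenset({"a", "s"})]
t2 = [frozenset({"a"}), frozenset({"a"})]
secret_now = lambda t, i: "s" in t[i]

blind_agent = frozenset({"a"})         # cannot observe s
seeing_agent = frozenset({"a", "s"})   # observes s directly
```

On the trace set `[t1, t2]`, the agent observing only `a` does not know `s` at position 1 of `t1`, because `t2` is indistinguishable and violates `s`; the agent that also observes `s` does know it.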

**Fig. 2.** The dining cryptographers problem with three cryptographers.

*Example 2 (Dining Cryptographers).* The dining cryptographers problem [3] is an interesting example of how epistemic properties can characterize non-trivial protocols. The problem describes the following situation (see Fig. 2): three cryptographers C<sub>1</sub>, C<sub>2</sub>, and C<sub>3</sub> sit at a table in a restaurant, and either one of the cryptographers or, alternatively, the NSA paid for their meal. The task for the cryptographers is to figure out whether the NSA or one of the cryptographers paid. However, if one of the cryptographers paid, then the others must not be able to infer who it was. Each cryptographer C<sub>i</sub> receives several bits of information: *paid<sub>i</sub>*, indicating whether or not he pays the bill, and two secrets, each shared with one of the other cryptographers. The secrets can be used to encode the information they share as output *out<sub>i</sub>*. By combining the outputs of all cryptographers, it must become clear whether the NSA or one of the group paid. The specification of the protocol can be easily formalized in LTLK. The following formula describes the desired behavior of agent C<sub>1</sub>:

$$\begin{aligned} \textit{DC agent1} &:= \\ & \quad (paid_{group} \wedge \neg paid_1 \to (\mathcal{K}_{C_1}(paid_2 \lor paid_3) \land \neg \mathcal{K}_{C_1}(paid_2) \land \neg \mathcal{K}_{C_1}(paid_3))) \\ & \quad \wedge \ (paid_{NSA} \to \mathcal{K}_{C_1}(\neg paid_1 \land \neg paid_2 \land \neg paid_3)) \ . \end{aligned}$$
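For intuition, the following Python sketch simulates one round of the classic XOR-based protocol for this problem and checks the two facts the specification talks about. It assumes the standard protocol in which each cryptographer announces the XOR of its two shared secrets and its own *paid* bit; the function names and the exact notion of "view" (own bit, own two secrets, all announced outputs) are choices made for this illustration.

```python
from itertools import product


def outputs(paid, s12, s23, s13):
    """Announced bits out_1, out_2, out_3; s_ij is the secret shared by C_i and C_j."""
    out1 = s12 ^ s13 ^ paid[0]
    out2 = s12 ^ s23 ^ paid[1]
    out3 = s13 ^ s23 ^ paid[2]
    return (out1, out2, out3)


def group_paid(outs):
    """Every secret occurs twice and cancels, so the XOR reveals whether a member paid."""
    return (outs[0] ^ outs[1] ^ outs[2]) == 1


def views_of_c1(paid):
    """Everything C_1 can observe for a fixed payer, over all choices of the secrets."""
    return {(paid[0], s12, s13, outputs(paid, s12, s23, s13))
            for s12, s23, s13 in product((0, 1), repeat=3)}


NSA_PAID = (0, 0, 0)
C2_PAID = (0, 1, 0)
C3_PAID = (0, 0, 1)
```

Since `views_of_c1(C2_PAID)` and `views_of_c1(C3_PAID)` coincide, C<sub>1</sub> cannot distinguish the two payers, which matches the conjuncts ¬K(paid<sub>2</sub>) and ¬K(paid<sub>3</sub>); since both are disjoint from `views_of_c1(NSA_PAID)`, C<sub>1</sub> does learn whether the group paid.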

The knowledge operator can also be defined for hyperlogics [29]. It receives an additional parameter π, indicating the trace the knowledge refers to. When added to HyperQPTL, it has the following semantics:

$$\Pi, i \models_T \mathcal{K}_{A, \pi} \, \varphi \qquad \text{iff} \qquad \forall t' \in T.\ \Pi(\pi)[0, i] =_A t'[0, i] \to \Pi[\pi \mapsto t'], i \models_T \varphi \; .$$

The knowledge operator, however, can be encoded in HyperQPTL using propositional quantification. Epistemic problems, such as the dining cryptographers problem, can thus be expressed in HyperQPTL.

**Theorem 1 (***[29]* **).** *HyperQPTL can emulate the knowledge operator.*

*Proof.* We recap the proof from [29]: Let ϕ = Q<sub>π/q</sub> … Q<sub>π/q</sub>. ϕ′ be a HyperQPTL formula, equipped with the knowledge operator as defined above. We assume that ϕ is given in negation normal form, i.e., each K<sub>A,π</sub> occurs either in positive position or in negated form. Let u and r be fresh propositions and let π′ be a fresh trace variable. Recursively, we replace each knowledge operator K<sub>A,π</sub> occurring in ϕ in positive position with the following formula

$$\begin{aligned} &Q_{\pi/q} \dots Q_{\pi/q}.\, \exists u.\, \forall r.\, \forall \pi'.\; \varphi'[\mathcal{K}_{A,\pi}\psi \mapsto u] \wedge \\ &\big( (r \, \mathcal{U} \, (u \wedge r \wedge \bigcirc \Box \neg r)) \wedge \Box (r \rightarrow A_{\pi} = A_{\pi'}) \rightarrow \Box (r \wedge \bigcirc \Box \neg r \rightarrow \psi[\pi \mapsto \pi']) \big) \end{aligned}$$

and each K<sub>A,π</sub> occurring negatively with the following formula

$$\begin{split} &Q_{\pi/q} \dots Q_{\pi/q}.\, \exists u.\, \forall r.\, \exists \pi'.\; \varphi'[\neg \mathcal{K}_{A,\pi} \psi \mapsto u] \land \\ &\big( (r \, \mathcal{U} \, (u \land r \land \bigcirc \Box \neg r)) \to \Box (r \rightarrow A_{\pi} = A_{\pi'}) \land \Box (r \land \bigcirc \neg r \rightarrow \neg \psi[\pi \mapsto \pi']) \big), \end{split}$$

where we use ϕ′[K<sub>A,π</sub>ψ ↦ u] to denote that in ϕ′, *a single* occurrence of the knowledge operator is replaced by u, and ψ[π ↦ π′] to denote the formula where π is replaced by π′. The existentially quantified proposition u indicates the points in time where the knowledge operator is supposed to hold (or not hold, respectively). The universally quantified proposition r is assumed to change once from r to ¬r and thereby point at one of the points in time picked by u. It is then used to compare the prefix of the old trace π and an alternative trace quantified by the trace variable π′.

## **4 HyperQPTL Realizability**

In reactive synthesis, the task is, given a specification ϕ, to construct a system that satisfies the specification. More precisely, the system is assumed to receive some inputs from an environment and has to react with outputs such that the specification is fulfilled. The realizability problem asks for the existence of a so-called *strategy tree*, where the edges are labeled with all possible inputs and the task is to find a function f that labels the nodes with the corresponding outputs. Figure 3 shows a strategy tree for a single input bit i. We define strategies following [12]. Let a set AP = I ∪̇ O be given. A *strategy* f : (2<sup>I</sup>)<sup>∗</sup> → 2<sup>O</sup> maps sequences of input valuations from 2<sup>I</sup> to an output valuation from 2<sup>O</sup>. For an infinite word w = w<sub>0</sub>w<sub>1</sub>w<sub>2</sub> ··· ∈ (2<sup>I</sup>)<sup>ω</sup>, the trace corresponding to a strategy f is defined as (f(ε) ∪ w<sub>0</sub>)(f(w<sub>0</sub>) ∪ w<sub>1</sub>)(f(w<sub>0</sub>w<sub>1</sub>) ∪ w<sub>2</sub>) … ∈ (2<sup>I∪O</sup>)<sup>ω</sup>. For any trace w = w<sub>0</sub>w<sub>1</sub>w<sub>2</sub> … ∈ (2<sup>I∪O</sup>)<sup>ω</sup> and strategy f : (2<sup>I</sup>)<sup>∗</sup> → 2<sup>O</sup>, we lift the set containment operator ∈ by defining that w ∈ f iff f(ε) = w<sub>0</sub> ∩ O and f((w<sub>0</sub> ∩ I) ··· (w<sub>i</sub> ∩ I)) = w<sub>i+1</sub> ∩ O for all i ≥ 0. We say that a strategy f satisfies a HyperQPTL formula ϕ over AP = I ∪̇ O iff {w | w ∈ f} satisfies ϕ.
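The two definitions above (the trace corresponding to a strategy, and the lifted containment w ∈ f) can be sketched on finite prefixes. The Python code below is a minimal illustration; the helper names and the example strategy `delay`, which echoes the input bit one step later, are assumptions made for this example.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Valuation = FrozenSet[str]
Strategy = Callable[[Tuple[Valuation, ...]], Valuation]


def trace_prefix(f: Strategy, inputs: Sequence[Valuation]) -> List[Valuation]:
    """(f(eps) | w_0)(f(w_0) | w_1)... for a finite input word w_0 w_1 ..."""
    return [f(tuple(inputs[:k])) | inputs[k] for k in range(len(inputs))]


def is_consistent(f: Strategy, w: Sequence[Valuation],
                  I: Valuation, O: Valuation) -> bool:
    """Finite-prefix version of w in f: outputs at k match f on the inputs before k."""
    return all(f(tuple(w[j] & I for j in range(k))) == w[k] & O
               for k in range(len(w)))


I = frozenset({"i"})
O = frozenset({"o"})


def delay(sigma: Tuple[Valuation, ...]) -> Valuation:
    """Hypothetical strategy: output o iff the most recent input contained i."""
    return frozenset({"o"}) if sigma and "i" in sigma[-1] else frozenset()


word = [frozenset({"i"}), frozenset(), frozenset({"i"})]
prefix = trace_prefix(delay, word)
tampered = [frozenset({"i"}), frozenset(), frozenset({"i"})]   # output o is missing at position 1
```

Note that the output at position k depends only on the inputs strictly before k, which is the reason f(ε) determines the outputs of the first letter.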

With the definition of a strategy at hand, we can define the realizability problem of HyperQPTL formally.

**Definition 1 (HyperQPTL Realizability).** *A HyperQPTL formula* ϕ *over atomic propositions AP* = I ∪̇ O *is realizable if there is a strategy* f : (2<sup>I</sup>)<sup>∗</sup> → 2<sup>O</sup> *that satisfies* ϕ*.*

**Fig. 3.** A strategy tree for the reactive realizability problem.

For technical reasons, we assume (without loss of generality) that quantified atomic propositions are classified as outputs, not inputs. This complies with the intuition that propositional quantifiers should be a means for additional expressiveness; they should not overwrite the inputs received from the environment. The definition of realizability of QPTL and HyperLTL specifications is inherited from the definition for HyperQPTL.

Compared to the standard realizability problem, the distributed realizability problem is defined over an architecture, containing a number of processes interacting with each other. The goal is to find a strategy for each of the processes. In the following proofs, we will make use of the distributed realizability problem of QPTL, which we therefore also define formally.

A *distributed architecture* [17,27] 𝒜 over atomic propositions AP is a tuple ⟨P, p<sub>*env*</sub>, ℐ, 𝒪⟩, where P is a finite set of processes and p<sub>*env*</sub> ∈ P is a designated environment process. The functions ℐ : P → 2<sup>AP</sup> and 𝒪 : P → 2<sup>AP</sup> define the inputs and outputs of processes. The output of one process can be the input of another process. The outputs of the processes must be pairwise disjoint, i.e., for all p ≠ p′ ∈ P it holds that 𝒪(p) ∩ 𝒪(p′) = ∅. We assume that the environment process forwards inputs to the processes and has no input of its own, i.e., ℐ(p<sub>*env*</sub>) = ∅.

**Definition 2 (Distributed QPTL Realizability** *[17]***).** *A QPTL formula* ϕ *over free atomic propositions AP is realizable in an architecture* 𝒜 = ⟨P, p<sub>*env*</sub>, ℐ, 𝒪⟩ *if for each process* p ∈ P*, there is a strategy* f<sub>p</sub> : (2<sup>ℐ(p)</sup>)<sup>∗</sup> → 2<sup>𝒪(p)</sup> *such that the combination of all* f<sub>p</sub> *satisfies* ϕ*.*

The distributed realizability problem for QPTL is (inherited from LTL) in general undecidable [27]. However, we will use the result that the problem remains decidable for architectures without *information forks* [17]. The notion of information forks captures the flow of data in the system. Intuitively, an architecture contains an information fork if the processes cannot be ordered linearly according to their informedness. Formally, an information fork in an architecture 𝒜 = ⟨P, p<sub>*env*</sub>, ℐ, 𝒪⟩ is defined as a tuple (P′, V′, p, p′), where p, p′ are two different processes, P′ ⊆ P, and V′ ⊆ AP is disjoint from ℐ(p) ∪ ℐ(p′). (P′, V′, p, p′) is an information fork if P′ together with the edges that are labeled with at least one variable from V′ forms a subgraph rooted in the environment and there exist two nodes q, q′ ∈ P′ that have edges to p, p′, respectively, such that 𝒪(q) ∩ ℐ(p) ⊈ ℐ(p′) and 𝒪(q′) ∩ ℐ(p′) ⊈ ℐ(p). The definition formalizes the intuition that p and p′ receive incomparable input bits, i.e., they have incomparable information.

(a) Information fork: An architecture with two processes; process p produces output o from input i and p′ produces output o′ from input i′. (b) No information fork: The same architecture as on the left, where the inputs of process p′ are changed to i and i′.

**Fig. 4.** Distributed architectures

*Example 3.* Two example architectures are depicted in Fig. 4 [12]. The processes in Fig. 4a receive distinct inputs and thus neither process is more informed than the other. The architecture therefore contains an information fork with P′ = {*env*, p, p′}, V′ = {i, i′}, q = *env*, q′ = *env*. The processes in Fig. 4b can be ordered linearly according to the subset relation on the inputs and thus the architecture contains no information fork.
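For flat architectures like those in Fig. 4, where every system process reads its inputs directly from the environment, the intuition of ordering processes linearly by informedness can be checked with a few lines of Python. This is a deliberately simplified sketch of the intuition only, not an implementation of the formal information-fork definition; the function name and the dictionary representation of the input sets are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet


def linearly_ordered_by_inputs(inputs: Dict[str, FrozenSet[str]]) -> bool:
    """True iff every two processes have comparable input sets (subset order)."""
    return all(a <= b or b <= a
               for a, b in combinations(inputs.values(), 2))


# Fig. 4a: p reads i, p' reads i' (incomparable, hence an information fork).
fig_4a = {"p": frozenset({"i"}), "p'": frozenset({"i'"})}

# Fig. 4b: p reads i, p' reads i and i' (p' is at least as informed as p).
fig_4b = {"p": frozenset({"i"}), "p'": frozenset({"i", "i'"})}
```

For general architectures, where processes also read each other's outputs, the subgraph condition of the formal definition has to be checked as well.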

In the following sections, we identify tight syntactic fragments of HyperQPTL for which the standard realizability problem is decidable. We give decidability proofs and show that formulas outside the decidable fragments are in general undecidable. An important aspect for decidability is the number of universal trace quantifiers that appear in the formula. We thus present our findings in three categories, depending on the number of universal trace quantifiers a formula has.

#### **4.1 No Universal Trace Quantifier**

We show that the realizability problem of any HyperQPTL formula without a ∀<sub>π</sub> quantifier is decidable. The problem is reduced to QPTL realizability.

**Theorem 2.** *Realizability of the* (∃<sup>∗</sup><sub>π</sub>Q<sup>∗</sup><sub>q</sub>)<sup>∗</sup> *fragment of HyperQPTL is decidable.*

*Proof.* Let a (∃<sup>∗</sup><sub>π</sub>Q<sup>∗</sup><sub>q</sub>)<sup>∗</sup> HyperQPTL formula ϕ over AP = I ∪̇ O = {a<sup>0</sup>, …, a<sup>k</sup>} with trace quantifiers π<sub>0</sub>, …, π<sub>n</sub> be given. We reduce the problem to the realizability problem of QPTL, which is known to be decidable (since QPTL formulas can be translated to Büchi automata). The idea is to replace each existential trace quantifier ∃π<sub>i</sub> with quantification of propositions a<sup>0</sup><sub>π<sub>i</sub></sub>, a<sup>1</sup><sub>π<sub>i</sub></sub>, …, a<sup>k</sup><sub>π<sub>i</sub></sub>, one for each a<sup>j</sup> ∈ AP, thereby mimicking the quantification of a trace. To make sure that only traces from an actual strategy tree are chosen, we add a dependency formula which forces the outputs to be dependent on the inputs. The following QPTL formula implements the idea.

$$\varphi_{QPTL} := \varphi[i \le n : \exists \pi_i \mapsto \exists a_{\pi_i}^0 \dots \exists a_{\pi_i}^k .] \; \wedge$$

$$\bigwedge_{i \le n} \bigwedge_{j \le n} (I_{\pi_i} \ne I_{\pi_j}) \, \mathcal{R} \, (O_{\pi_i} = O_{\pi_j})$$

We use the notation [i ≤ n : ∃π<sub>i</sub> ↦ ∃a<sup>0</sup><sub>π<sub>i</sub></sub> … ∃a<sup>k</sup><sub>π<sub>i</sub></sub>.] to indicate that each ∃π<sub>i</sub> for 0 ≤ i ≤ n is replaced with the respective series of existential propositional quantifiers. Furthermore, we write I<sub>π<sub>i</sub></sub> ≠ I<sub>π<sub>j</sub></sub> as syntactic sugar for ⋁<sub>a∈I</sub> (a<sub>π<sub>i</sub></sub> ↮ a<sub>π<sub>j</sub></sub>) (and similarly for O<sub>π<sub>i</sub></sub> = O<sub>π<sub>j</sub></sub>). We show that ϕ and ϕ<sub>*QPTL*</sub> are equirealizable. For the first direction, assume that ϕ is realizable by a strategy f. Notice that all atomic propositions in ϕ<sub>*QPTL*</sub> are bound by a propositional quantifier. Therefore, if the witness sequences for the quantified propositions can be chosen correctly, any strategy realizes ϕ<sub>*QPTL*</sub>. Propositions a<sup>j</sup><sub>π<sub>i</sub></sub> are chosen according to the witness traces of f ⊨ ϕ. Witnesses for the remaining atomic propositions are also chosen according to their witnesses from f ⊨ ϕ. Now, the first conjunct of ϕ<sub>*QPTL*</sub> is fulfilled since f ⊨ ϕ holds. The second conjunct is fulfilled since any two traces π<sub>i</sub>, π<sub>j</sub> of a strategy tree fulfill by construction (I<sub>π<sub>i</sub></sub> ≠ I<sub>π<sub>j</sub></sub>) R (O<sub>π<sub>i</sub></sub> = O<sub>π<sub>j</sub></sub>). For the other direction, assume that ϕ<sub>*QPTL*</sub> is realizable (by construction, independently of the strategy).
Let t<sub>a<sub>π<sub>i</sub></sub></sub>, for a ∈ AP and i ≤ n, be the witness sequences for the respective quantified atomic propositions. The following strategy realizes ϕ.

$$f(\sigma) = \begin{cases} \{ t_{a_{\pi_i}}[|\sigma|] \mid a \in O \} & \text{if for some } i \le n, \\ & \sigma = \{ t_{a_{\pi_i}}[0] \mid a \in I \} \dots \{ t_{a_{\pi_i}}[|\sigma| - 1] \mid a \in I \} \\ \emptyset & \text{otherwise} \end{cases}$$

Strategy f chooses the outputs according to the witnesses for the propositions encoding the traces. Note that because of the second conjunct in ϕ<sub>*QPTL*</sub>, the output is always unique, even if several encoded traces start with the same input sequence. Now, f ⊨ ϕ holds because of the first conjunct of ϕ<sub>*QPTL*</sub>.
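The construction of f from the witnesses can be sketched on finite prefixes. In the Python code below, a witness trace is a list of (input valuation, output valuation) pairs; `consistent` checks the release condition (I ≠ I′) R (O = O′) on two prefixes, and `strategy_from_witnesses` mirrors the case distinction above, reading the inputs at positions 0 to |σ| − 1 and the output at position |σ|. The representation and all names are assumptions made for this illustration, and positions beyond the stored prefix simply fall into the "otherwise" case.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Valuation = FrozenSet[str]
Witness = Sequence[Tuple[Valuation, Valuation]]   # (inputs, outputs) per position


def consistent(t1: Witness, t2: Witness) -> bool:
    """(I1 != I2) R (O1 = O2): outputs agree up to and including the first input difference."""
    for (i1, o1), (i2, o2) in zip(t1, t2):
        if o1 != o2:
            return False
        if i1 != i2:
            return True      # released: later outputs may differ
    return True


def strategy_from_witnesses(
        witnesses: List[Witness]) -> Callable[[Tuple[Valuation, ...]], Valuation]:
    """f(sigma): outputs at position |sigma| of a witness whose inputs so far equal sigma."""
    def f(sigma: Tuple[Valuation, ...]) -> Valuation:
        n = len(sigma)
        for t in witnesses:
            if n < len(t) and tuple(i for i, _ in t[:n]) == tuple(sigma):
                return t[n][1]
        return frozenset()   # the "otherwise" case
    return f


EMPTY = frozenset()
IN = frozenset({"i"})
OUT = frozenset({"o"})

w1 = [(IN, EMPTY), (EMPTY, OUT)]       # input i first, answered by o one step later
w2 = [(EMPTY, EMPTY), (EMPTY, EMPTY)]  # no input, no output
bad = [(IN, OUT), (EMPTY, EMPTY)]      # disagrees with w1 on the very first output

f = strategy_from_witnesses([w1, w2])
```

Because `w1` and `w2` are consistent, the order in which `f` scans the witnesses does not matter; with an inconsistent pair such as `w1` and `bad`, the result would depend on that order, which is what the second conjunct of ϕ<sub>*QPTL*</sub> rules out.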

#### **4.2 Single Universal Trace Quantifier**

In this fragment, we allow exactly one universal trace quantifier. It is particularly interesting as it contains many promptness properties. For example, the following promptness formulation mentioned in the introduction lies within the fragment:

$$
\exists b.\, \forall \pi.\; \Diamond b \land (\neg b \; \mathcal{U} \; e_{\pi})\; .
$$

**Theorem 3.** *Realizability of the* ∃<sup>∗</sup><sub>q/π</sub>∀<sup>∗</sup><sub>q</sub>∀<sub>π</sub>Q<sup>∗</sup><sub>q</sub> *fragment is decidable.*

We show the theorem in two steps. First, we generalize a proof from [12], showing that realizability of the ∃<sup>∗</sup><sub>π</sub>∀<sub>π</sub>Q<sup>∗</sup><sub>q</sub> fragment is decidable. Second, we show that we can reduce the realizability problem of any HyperQPTL formula to a formula where some propositional quantifiers are replaced with trace quantifiers.

**Fig. 5.** Distributed architecture encoding existential choice of traces.

**Lemma 2.** *Realizability of the* ∃<sup>∗</sup><sub>π</sub>∀<sub>π</sub>Q<sup>∗</sup><sub>q</sub> *fragment is decidable.*

*Proof.* The reasoning generalizes the proof in [12] showing that realizability of ∃<sup>∗</sup><sub>π</sub>∀<sub>π</sub> HyperLTL formulas is decidable. We reduce the problem to the distributed realizability problem of QPTL without information forks, which is decidable [17], since QPTL is subsumed by the μ-calculus. Let a HyperQPTL formula ϕ = ∃π<sub>1</sub>. … ∃π<sub>n</sub>. ∀π. ψ over AP = I ∪̇ O be given, where ψ is from the Q<sup>∗</sup><sub>q</sub> fragment. We define a distributed architecture 𝒜 over an extended set of atomic propositions AP′ = I ∪ O ∪ I′ ∪ O′. Similarly to the proof of Theorem 2, I′ and O′ are composed of a copy of the atomic propositions for each existentially quantified variable π<sub>j</sub>. Formally, I′ = ⋃<sub>1≤j≤n</sub> {i<sub>π<sub>j</sub></sub> | i ∈ I} and O′ = ⋃<sub>1≤j≤n</sub> {o<sub>π<sub>j</sub></sub> | o ∈ O}. Now we define 𝒜 as follows.

$$\begin{aligned} \mathcal{A} &:= \langle \{p_{env}, p_1, p_2\}, p_{env}, \mathcal{I}, \mathcal{O} \rangle \\ \mathcal{I} &:= (p_1 \mapsto \emptyset, p_2 \mapsto I) \\ \mathcal{O} &:= (p_{env} \mapsto I, p_1 \mapsto I' \cup O', p_2 \mapsto O) \end{aligned}$$

The architecture is displayed in Fig. 5. The idea is that process p<sub>1</sub> sets the values of all i<sub>π<sub>j</sub></sub> and o<sub>π<sub>j</sub></sub> (for j ≤ n) and thereby determines the choice for the existentially quantified traces. Process p<sub>1</sub> receives no input and therefore needs to make a deterministic choice. Process p<sub>2</sub> then solves the realizability of formula ∀π. ψ. The following QPTL formula ϕ′ encodes the idea.

$$\varphi' := \psi' \land \Big(\bigwedge_{1 \le j \le n} (I_{\pi_j} \ne I) \, \mathcal{R} \, (O_{\pi_j} = O)\Big),$$

where ψ′ is defined as ψ, where all a<sub>π</sub> are replaced by a (but atomic propositions a<sub>π<sub>j</sub></sub> are still part of ψ′!). Note that QPTL formulas implicitly quantify over all traces universally. Similarly to the proof of Theorem 2, the second conjunct ensures that process p<sub>1</sub> encodes actual paths from the strategy tree of process p<sub>2</sub> (which is also the strategy tree for formula ϕ). Thus, ϕ′ is realizable for the distributed architecture 𝒜 iff ϕ is realizable.

To state the second lemma, we need to define what it means to replace quantifiers in a formula. Let ϕ = Q<sub>π/q</sub>, …, Q<sub>π/q</sub>. ψ be a HyperQPTL formula, and J be a set of indices such that for all j ∈ J, there exists a propositional quantifier ∃q<sub>j</sub> or ∀q<sub>j</sub> in ϕ. Furthermore, assume that no π<sub>j</sub> with j ∈ J occurs in ϕ and that a ∈ AP. We denote by ϕ[J ↦<sub>a</sub> π] the formula where each propositional quantifier ∃q<sub>j</sub> (or ∀q<sub>j</sub>, respectively) with j ∈ J is replaced with the corresponding trace quantifier ∃π<sub>j</sub> (or ∀π<sub>j</sub>, respectively), and each q<sub>j</sub> in ψ is replaced by a<sub>π<sub>j</sub></sub>.

**Lemma 3.** *Let any HyperQPTL formula* ϕ *over AP* = I ∪̇ O *and a set of indices* J *be given. If* ϕ[J ↦<sub>i</sub> π] *is realizable, then so is* ϕ*, where* i ∈ I *is an arbitrary input, assuming w.l.o.g. that* I *is non-empty.*

*Proof.* Let ϕ and J be given. Formula ϕ[J ↦<sub>i</sub> π] replaces the quantification over sequences (2<sup>{q}</sup>)<sup>ω</sup> with trace quantification, where the trace is only used for statements about a single input i. We thus exploit the fact that in the realizability problem, there is a trace for every input sequence. Therefore, the transformed formula is equirealizable.

Now, we have everything we need to prove Theorem 3.

*Proof (of Theorem 3).* Let ϕ be a HyperQPTL formula of the ∃<sup>∗</sup><sub>q/π</sub>∀<sup>∗</sup><sub>q</sub>∀<sub>π</sub>Q<sup>∗</sup><sub>q</sub> fragment. First, observe that in the quantifier prefix of ϕ, the ∀<sup>∗</sup><sub>q</sub> quantifiers and the ∀<sub>π</sub> can be swapped. The resulting formula belongs to the ∃<sup>∗</sup><sub>q/π</sub>∀<sub>π</sub>Q<sup>∗</sup><sub>q</sub> fragment. By Lemma 3, the formula can be transformed to an equirealizable formula of the ∃<sup>∗</sup><sub>π</sub>∀<sub>π</sub>Q<sup>∗</sup><sub>q</sub> fragment, for which realizability is decidable by Lemma 2.

Lemma 3 allows us to decide realizability of a HyperQPTL formula by replacing propositional quantifiers with trace quantifiers. Thus, we can reduce HyperQPTL realizability to HyperLTL realizability, a fact that we use in Sect. 5 to describe a bounded synthesis algorithm for HyperQPTL.

**Corollary 2.** *The realizability problem of HyperQPTL can be soundly reduced to the realizability problem of HyperLTL.*

Lastly, we show that the decidable fragment is tight in the class of formulas with a single universal trace quantifier. We do so by showing that a propositional ∀<sup>∗</sup><sub>q</sub>∃<sup>∗</sup><sub>q</sub> quantifier alternation followed by a single trace quantifier ∀<sub>π</sub> leads to an undecidable realizability problem. The proof is carried out by a reduction from Post's Correspondence Problem.

**Theorem 4.** *Realizability is undecidable for HyperQPTL formulas with a single* ∀<sub>π</sub> *quantifier outside the* ∃<sup>∗</sup><sub>q/π</sub>∀<sup>∗</sup><sub>q</sub>∀<sub>π</sub>Q<sup>∗</sup><sub>q</sub> *fragment.*

**Fig. 6.** A sketch of the strategy tree of our PCP reduction: relevant traces are marked in green. (Color figure online)

*Proof.* Inherited from HyperLTL, realizability of formulas with a ∀<sub>π</sub> quantifier followed by an ∃<sub>π</sub> quantifier is undecidable [12]. It remains to show that realizability of formulas from the ∀<sup>∗</sup><sub>q</sub>∃<sup>∗</sup><sub>q</sub>∀<sub>π</sub> fragment is in general undecidable. We give a reduction from Post's Correspondence Problem (PCP) [28] to a HyperQPTL formula from the ∀<sup>∗</sup><sub>q</sub>∃<sup>∗</sup><sub>q</sub>∀<sub>π</sub> fragment. In PCP, we are given two equally long lists α and β consisting of finite words from some alphabet Σ of size n. PCP is the problem to find an index sequence (i<sub>k</sub>)<sub>1≤k≤K</sub> with K ≥ 1 and 1 ≤ i<sub>k</sub> ≤ n, such that α<sub>i<sub>1</sub></sub> … α<sub>i<sub>K</sub></sub> = β<sub>i<sub>1</sub></sub> … β<sub>i<sub>K</sub></sub>. Intuitively, PCP is the problem of choosing a finite sequence of domino stones (with finitely many different stones), where each stone consists of two words α<sub>i</sub> and β<sub>i</sub>. Let a PCP instance with Σ = {a<sub>1</sub>, a<sub>2</sub>, …, a<sub>n</sub>} and two lists α and β be given. We choose our set of atomic propositions as follows: AP := I ∪̇ O with I := {i} and O := (Σ ∪ {ȧ<sub>1</sub>, ȧ<sub>2</sub>, …, ȧ<sub>n</sub>} ∪ {#})<sup>2</sup>, where we use the dot symbol to encode that a stone starts at this position of the trace. We write ã to denote either a or ȧ. The single input i spans a binary strategy tree. We encode the PCP instance into a HyperQPTL formula that is realizable if and only if the PCP instance has a solution:

$$\begin{aligned} \forall q\_i. \forall \mathbf{q}. \exists p\_i. \exists \mathbf{p}. \forall \pi. \left( (\Box \pi = p\_i) \to (\Box \pi = \mathbf{p}) \right) \land \\ \left( (\Box \pi = (q\_i, \mathbf{q})) \to \varphi\_{reduc}(q\_i, \mathbf{q}, p\_i, \mathbf{p}) \right) \; , \end{aligned}$$

where $\mathbf{q}$ and $\mathbf{p}$ are sequences of universally and existentially quantified propositional variables, such that for each $(o, o') \in O$, there is a $q\_{(o,o')} \in \mathbf{q}$ and a $p\_{(o,o')} \in \mathbf{p}$. Together with $q\_i$ and $p\_i$ for the input $i$, they simulate a universally and an existentially quantified trace from the model. The notation $\pi = \mathbf{q}$ denotes that for every $q\_a \in \mathbf{q}$, it holds that $a\_{\pi} \leftrightarrow q\_a$. As seen before, the premise $(\Box\, \pi = (q\_i, \mathbf{q}))$ and the conjunct $(\Box\, \pi = p\_i) \to (\Box\, \pi = \mathbf{p})$ ensure that the propositions $(q\_i, \mathbf{q})$ and $(p\_i, \mathbf{p})$ are chosen to represent actual traces from the model. The universal quantification over $\pi$ thus only ensures that $(q\_i, \mathbf{q})$ and $(p\_i, \mathbf{p})$, which are used for the main reduction, are chosen correctly. The reduction is implemented in the formula $\varphi\_{reduc}$ and follows the construction in [10], where it is shown that the satisfiability and realizability problems of HyperLTL are undecidable for a $\forall\exists$ trace quantifier prefix.

$$\begin{aligned} \varphi\_{reduc}(q\_i, \mathbf{q}, p\_i, \mathbf{p}) &:= \varphi\_{rel}(q\_i) \to \varphi\_{is++}(q\_i, p\_i) \\ &\wedge \varphi\_{start}(\varphi\_{stone\&shift}(\mathbf{q}, \mathbf{p}), q\_i) \wedge \varphi\_{sol}(q\_i, \mathbf{q}) \end{aligned}$$


For example, let $\alpha$ with $\alpha\_1 = a$, $\alpha\_2 = ab$, $\alpha\_3 = bba$, and $\beta$ with $\beta\_1 = baa$, $\beta\_2 = aa$, and $\beta\_3 = bb$ be given. A possible solution for this PCP instance is $(3, 2, 3, 1)$, since $bbaabbbaa = i\_{\alpha} = i\_{\beta}$, i.e., $\alpha\_3 \alpha\_2 \alpha\_3 \alpha\_1 = \beta\_3 \beta\_2 \beta\_3 \beta\_1$. The full sequence at the trace $i$ represents the solution with the outputs

$$(\dot{b}, \dot{b})(b, b)(a, \dot{a})(\dot{a}, a)(b, \dot{b})(\dot{b}, b)(b, \dot{b})(a, a)(\dot{a}, a)(\#, \#)(\#, \#)\dots$$

The next relevant trace, therefore, contains

$$(\dot{a}, \dot{a})(b, a)(\dot{b}, \dot{b})(b, b)(a, \dot{b})(\dot{a}, a)(\#, a)(\#, \#)(\#, \#)\dots$$

Continuing this, the following relevant traces are:

$$\begin{aligned} &(\dot{b}, \dot{b})(b, b)(a, \dot{b})(\dot{a}, a)(\#, a)(\#, \#)(\#, \#)\dots \\ &(\dot{a}, \dot{b})(\#, a)(\#, a)(\#, \#)(\#, \#)\dots \\ &(\#, \#)(\#, \#)\dots \end{aligned}$$

The relevant traces verify the solution provided on the $i$ trace by removing one stone after the other. Thus, the formula is realizable iff the PCP instance has a solution.
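The example can be replayed mechanically. The following Python sketch checks the index sequence (3, 2, 3, 1) and rebuilds the finite prefix of each relevant trace before the (#, #) padding starts. The function names and the encoding of a dotted letter such as ȧ by the string `a.` are illustrative assumptions and not part of the construction in the proof.

```python
# Illustrative sketch only: names and the "a." encoding of dotted letters are assumptions.
ALPHA = ["a", "ab", "bba"]   # alpha_1, alpha_2, alpha_3 from the example
BETA = ["baa", "aa", "bb"]   # beta_1, beta_2, beta_3 from the example


def concat(words, indices):
    """Concatenate the words selected by a 1-based index sequence."""
    return "".join(words[i - 1] for i in indices)


def is_solution(alpha, beta, indices):
    """Check whether a non-empty index sequence solves the PCP instance."""
    if len(alpha) != len(beta) or not indices:
        return False
    if any(not 1 <= i <= len(alpha) for i in indices):
        return False
    return concat(alpha, indices) == concat(beta, indices)


def dotted(words, indices):
    """Spell out the selected words letter by letter, marking each stone start with a dot."""
    letters = []
    for i in indices:
        for pos, ch in enumerate(words[i - 1]):
            letters.append(ch + "." if pos == 0 else ch)
    return letters


def trace_prefix(alpha, beta, indices):
    """Pair up both components position by position, padding the shorter one with '#'."""
    top, bottom = dotted(alpha, indices), dotted(beta, indices)
    width = max(len(top), len(bottom))
    top += ["#"] * (width - len(top))
    bottom += ["#"] * (width - len(bottom))
    return list(zip(top, bottom))


def relevant_prefixes(alpha, beta, indices):
    """Prefixes of the successive relevant traces: drop one leading stone per step."""
    return [trace_prefix(alpha, beta, indices[k:]) for k in range(len(indices) + 1)]


if __name__ == "__main__":
    print(is_solution(ALPHA, BETA, [3, 2, 3, 1]))
    for row in relevant_prefixes(ALPHA, BETA, [3, 2, 3, 1]):
        print(row)
```

Each printed row corresponds to one relevant trace of the example, with one more stone removed from the front than in the row before.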

#### **4.3 Multiple Universal Trace Quantifiers**

When considering multiple universal trace quantifiers $\forall^{\ast}\_{\pi}$, the problem becomes undecidable. This is because in HyperLTL, one can encode distributed architectures – for which the problem is undecidable – directly into the formula without using any propositional quantification [12].

**Corollary 3.** *Realizability of the* $\forall^{\ast}\_{\pi}$ *fragment is in general undecidable.*

However, we show that the realizability problem for formulas with more than one universal trace quantifier is decidable if we restrict ourselves to formulas in the so-called *linear fragment*, i.e., the fragment that does not allow an encoding of a distributed architecture. We define the linear fragment of HyperQPTL, where the definitions are adapted from [12].

Let $A, C \subseteq AP$. We express that the atomic propositions $c \in C$ depend solely on the propositions $a \in A$ with the HyperQPTL formula

$$D\_{A \hookrightarrow C} := \forall \pi \forall \pi'. \left(\bigvee\_{a \in A} (a\_{\pi} \nleftrightarrow a\_{\pi'})\right) \mathcal{R}\left(\bigwedge\_{c \in C} (c\_{\pi} \leftrightarrow c\_{\pi'})\right) .$$
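To make the intended meaning of the dependency formula concrete, the following Python sketch checks it on a finite set of finite trace prefixes. It assumes the reading that the propositions in C must agree on any two traces up to and including the first position at which the propositions in A differ. The representation of a trace as a list of sets of atomic propositions and the function name are assumptions for illustration; the formula itself is interpreted over infinite traces.

```python
from itertools import combinations


def depends_only_on(traces, inputs, outputs):
    """Finite-prefix check of the dependency property (illustrative assumption).

    traces:  list of equally long lists of sets of atomic propositions
    inputs:  the set A of propositions the outputs may depend on
    outputs: the set C of propositions that must depend solely on A
    """
    for first, second in combinations(traces, 2):
        for step_a, step_b in zip(first, second):
            # The outputs must still agree at the current position ...
            if (step_a & outputs) != (step_b & outputs):
                return False
            # ... and the obligation is released once the inputs differ.
            if (step_a & inputs) != (step_b & inputs):
                break
    return True


if __name__ == "__main__":
    good = [[{"i"}, {"o"}], [set(), set()]]      # inputs differ at position 0
    bad = [[{"i"}, {"o"}], [{"i"}, set()]]       # same inputs, different outputs
    print(depends_only_on(good, {"i"}, {"o"}))
    print(depends_only_on(bad, {"i"}, {"o"}))
```

The check is symmetric in the two traces, so it suffices to consider unordered pairs.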

We define a *collapse* function, which collapses a HyperQPTL formula with a $\forall^{\ast}\_{\pi}$ universal quantifier prefix into a formula with a single $\forall\_{\pi}$ quantifier. Propositional quantifiers are preserved by the operation. Let $\varphi$ be $\forall \pi\_1 \cdots \forall \pi\_n.\, Q^{\ast}\_{q}.\, \psi$. We define the collapsed formula of $\varphi$ as $\mathit{collapse}(\varphi) := \forall \pi.\, Q^{\ast}\_{q}.\, \psi[\pi\_1 \to \pi][\pi\_2 \to \pi] \dots [\pi\_n \to \pi]$, where $\psi[\pi\_i \to \pi]$ replaces all occurrences of $\pi\_i$ in $\psi$ with $\pi$.
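The substitution performed by the collapse function can be sketched on a small formula representation. The nested-tuple encoding below, in which an atom indexed by a trace variable is written as `("ap", name, trace_variable)`, is a hypothetical choice made only for this illustration; propositional quantifiers and propositional variables are left untouched.

```python
def collapse(trace_vars, body, fresh="pi"):
    """Replace every trace variable from the universal prefix by one fresh variable.

    trace_vars: names of the universally quantified trace variables pi_1, ..., pi_n
    body:       the remaining formula as nested tuples (hypothetical encoding)
    Returns the new trace quantifier prefix and the rewritten body.
    """
    def substitute(node):
        if isinstance(node, tuple) and len(node) == 3 and node[0] == "ap":
            _, name, var = node
            return ("ap", name, fresh if var in trace_vars else var)
        if isinstance(node, tuple):
            # Operators and propositional quantifiers: keep the head, rewrite children.
            return tuple(substitute(child) if isinstance(child, tuple) else child
                         for child in node)
        return node

    return [fresh], substitute(body)


if __name__ == "__main__":
    # G (o_pi1 <-> o_pi2), with a propositional quantifier kept in place.
    psi = ("exists_q", "q", ("G", ("iff", ("ap", "o", "pi1"), ("ap", "o", "pi2"))))
    print(collapse({"pi1", "pi2"}, psi))
```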

**Lemma 4.** *Either* $\varphi \equiv \mathit{collapse}(\varphi)$ *or* $\varphi$ *has no equivalent* $\forall^{1}\_{\pi}.\, Q^{\ast}\_{q}$ *formula.*

*Proof.* The collapse function works solely on the trace quantification of the HyperQPTL formula, reducing the trace quantifiers to a single universal quantifier. The lemma has been proven for $\forall^{\ast}$ HyperLTL formulas in [12]. Inner propositional quantification does not interfere with this mechanism; hence, the proof can be carried out identically.

Now we can formally define the linear $\forall^{\ast}\_{\pi}$ fragment. Intuitively, we require that every input-output dependency can be ordered linearly, i.e., we are restricted to linear architectures without information forks (see Example 3).

**Definition 3.** *Let* $O = \{o\_1, \dots, o\_n\}$*. A HyperQPTL formula* $\varphi$ *is called* linear *if for all* $o\_i \in O$ *there is a* $J\_i \subseteq I$ *such that* $\varphi \land D\_{I \hookrightarrow O} \equiv \mathit{collapse}(\varphi) \land \bigwedge\_{o\_i \in O} D\_{J\_i \hookrightarrow \{o\_i\}}$ *and* $J\_i \subseteq J\_{i+1}$ *for all* $i \le n$*.*

This results in the following corollary. Since the universal quantifiers can be collapsed, the resulting problem is the realizability problem of QPTL in a linear architecture, which is decidable [17].

**Corollary 4.** *Realizability of the linear* $\forall^{\ast}\_{\pi} Q^{\ast}\_{q}$ *fragment is decidable.*

*Remark on Complexities.* Our aim was to work out the largest possible fragments for which the realizability problem of HyperQPTL remains decidable. The three fragments for which we could prove decidability all subsume the logic QPTL, for which the realizability problem is known to be non-elementary (already its satisfiability problem is non-elementary [30]). Hence, realizability of the discussed HyperQPTL fragments has a non-elementary lower bound. Finding interesting fragments for which the problem has a more feasible complexity therefore remains an open challenge.

#### **5 Experiments**

We have implemented a prototype tool that can solve the HyperQPTL realizability problem using the bounded synthesis approach [18]. More concretely, we extended the HyperLTL synthesis tool BoSy [7,9,12]. BoSy reduces the HyperLTL synthesis problem to an SMT constraint system, which is then solved by z3 [8] (for more details, see [12]). We implemented the reduction of HyperQPTL synthesis to HyperLTL synthesis (Corollary 2) in BoSy, such that the tool can also handle HyperQPTL formulas. We evaluated the tool against a range of benchmark sets, shown in Table 1. The first column indicates the parameterized benchmark name. The second and third columns indicate the bounds given to the bounded synthesis procedure. The second column is the bound on the size of the system. The newest version of BoSy also bounds the size of the strategy for the existential player; this bound is given in column three. For a detailed explanation of how existential strategies are bounded in BoSy, we refer to [7].

**Table 1.** Experimental results for prompt arbiter

We synthesized a range of resource arbiters. Our benchmark set is parametric in the number of clients that can request access to the shared resource (written arbiter-k-prompt in Table 1, where k is the number of clients). Unlike normal arbiters, we require the arbiter to fulfill promptness for some of the clients, i.e., requests must be answered within a bounded number of steps [33]. We state the promptness requirement in HyperQPTL by applying the *alternating-color technique* from [24]. Intuitively, the alternating-color technique works as follows: We quantify a q-sequence that "changes color" between $q$ and $\neg q$. Each change of color is used as a potential bound. Once a request occurs, the grant must be given within two changes of color. Thus, the HyperQPTL formulation amounts to the following specifications, shown here for 2 clients, where we require promptness only for client 1.

$$\forall \pi. \Box \neg (g\_{\pi}^1 \land g\_{\pi}^2) \tag{1}$$

$$\forall \pi. \Box (r\_{\pi}^2 \to \Diamond g\_{\pi}^2) \tag{2}$$

$$\begin{aligned} \exists q. \forall \pi.\; & \Box \Diamond q \land \Box \Diamond \neg q \\ & \land \Box (r\_{\pi}^{1} \to (q \to (q \,\mathcal{U}\, (\neg q \,\mathcal{U}\, g\_{\pi}^{1}))) \\ & \qquad \land (\neg q \to (\neg q \,\mathcal{U}\, (q \,\mathcal{U}\, g\_{\pi}^{1})))) \end{aligned} \tag{3}$$

$$\forall \pi. (\neg g\_{\pi}^{1} \,\mathcal{W}\, r\_{\pi}^{1}) \land (\neg g\_{\pi}^{2} \,\mathcal{W}\, r\_{\pi}^{2}) \tag{4}$$

Formula 1 states mutual exclusion. Formula 2 states that client 2 must be served eventually (but not within a bounded number of steps). Formula 3 states the promptness requirement for client 1. It quantifies an alternating q-sequence, which serves as a sequence of global bounds that must be respected on all traces π. Then, if client 1 poses a request, the grant must be given within two changes of the value of q. Formula 4 is only added in benchmarks named arbiter-k-fullprompt. It specifies that no spurious grants should be given.
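The promptness conjunct of Formula 3 can be illustrated on finite trace prefixes. The sketch below evaluates the two nested until operators with a strong until that has to be fulfilled inside the given prefix, which is a simplifying assumption (the formula itself is interpreted over infinite traces). The trace encoding and all names are hypothetical.

```python
def make_trace(q_bits, r_bits, g_bits):
    """Build a finite trace prefix from three equally long strings of 0/1."""
    return [
        {"q": q == "1", "r": r == "1", "g": g == "1"}
        for q, r, g in zip(q_bits, r_bits, g_bits)
    ]


def until(left, right, trace, start):
    """Strong until on a finite prefix: right must hold at some position k >= start
    inside the prefix, and left must hold at all positions from start up to k - 1."""
    for k in range(start, len(trace)):
        if right(trace, k):
            return True
        if not left(trace, k):
            return False
    return False


def prompt_on_prefix(trace):
    """Check that every request is granted within two changes of the color q."""
    is_q = lambda tr, k: tr[k]["q"]
    not_q = lambda tr, k: not tr[k]["q"]
    grant = lambda tr, k: tr[k]["g"]
    for k, step in enumerate(trace):
        if not step["r"]:
            continue
        # Start in the current color, then allow exactly one further color.
        first, second = (is_q, not_q) if step["q"] else (not_q, is_q)
        inner = lambda tr, j, second=second: until(second, grant, tr, j)
        if not until(first, inner, trace, k):
            return False
    return True


if __name__ == "__main__":
    colors = "110011"
    print(prompt_on_prefix(make_trace(colors, "100000", "000100")))  # grant in 2nd color
    print(prompt_on_prefix(make_trace(colors, "100000", "000001")))  # grant in 3rd color
```

In the first call the grant arrives while the color has changed only once since the request; in the second call the color has already changed twice, so the check fails.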

BoSy successfully synthesizes prompt arbiters with up to 3 states. For a 4-state prompt arbiter, BoSy did not return in reasonable time.

#### **6 Conclusion**

We studied the hyperlogic HyperQPTL, which combines the concepts of trace relations and ω-regularity. We showed that HyperQPTL is very expressive: it can express properties like *promptness*, *bounded waiting for a grant*, *epistemic* properties, and, in particular, any ω-*regular* property. Those properties are not expressible in previously studied hyperlogics like HyperLTL. At the same time, we argued that the expressiveness of HyperQPTL is optimal in the sense that a more expressive logic for ω-regular hyperproperties would have an undecidable model checking problem. We furthermore studied the realizability problem of HyperQPTL. We showed that realizability is decidable for HyperQPTL fragments that contain properties like promptness. Still, in contrast to the satisfiability problem, propositional quantification does make the realizability problem of hyperlogics harder. More specifically, realizability of the HyperQPTL fragment of formulas with a universal-existential propositional quantifier alternation followed by a single trace quantifier is undecidable in general, even though the projection of the fragment to HyperLTL has a decidable realizability problem. Lastly, we implemented the bounded synthesis problem for HyperQPTL in the prototype tool BoSy. Using BoSy with HyperQPTL specifications, we have been able to synthesize several resource arbiters. The synthesis problem of non-linear-time hyperlogics is still open. For example, it is not yet known how to synthesize systems from specifications given in branching-time hyperlogics like HyperCTL$^{\ast}$.

#### **References**



## **AdamMC: A Model Checker for Petri Nets with Transits against Flow-LTL**

Bernd Finkbeiner<sup>1</sup>, Manuel Gieseking<sup>2(B)</sup>, Jesko Hecking-Harbusch<sup>1</sup>, and Ernst-Rüdiger Olderog<sup>2</sup>

<sup>1</sup> Saarland University, Saarbrücken, Germany {finkbeiner,hecking-harbusch}@react.uni-saarland.de

<sup>2</sup> University of Oldenburg, Oldenburg, Germany {gieseking,olderog}@informatik.uni-oldenburg.de

**Abstract.** The correctness of networks is often described in terms of the individual data flow of components instead of their global behavior. In software-defined networks, it is far more convenient to specify the correct behavior of packets than the global behavior of the entire network. Petri nets with transits extend Petri nets and Flow-LTL extends LTL such that the data flows of tokens can be tracked. We present the tool AdamMC as the first model checker for Petri nets with transits against Flow-LTL. We describe how AdamMC can automatically encode concurrent updates of software-defined networks as Petri nets with transits and how common network specifications can be expressed in Flow-LTL. Underlying AdamMC is a reduction to a circuit model checking problem. We introduce a new reduction method that results in tremendous performance improvements compared to a previous prototype. Thereby, AdamMC can handle software-defined networks with up to 82 switches.

#### **1 Introduction**

In networks, it is difficult to specify correctness in terms of the global behavior of the entire system. Instead, it is far more convenient to specify correct behavior in terms of the individual *flow* of components. For example, loop and drop freedom can be easily specified for the flow of each packet. Petri nets and LTL lack this local view. Petri nets with transits and Flow-LTL have been introduced to overcome this restriction [10]. A transit relation is introduced to follow the *flow* induced by tokens. *Flow-LTL* is a temporal logic to specify both the *local* flow of data and the *global* behavior of markings. The global behavior as in Petri nets and LTL is still important for maximality and fairness assumptions. In this paper, we present the tool AdamMC<sup>1</sup> as the first model checker for Petri nets with transits against Flow-LTL and its application to software-defined networking.

<sup>1</sup> AdamMC is available online at https://uol.de/en/csd/adammc [12].

© The Author(s) 2020 S. K. Lahiri and C. Wang (Eds.): CAV 2020, LNCS 12225, pp. 64–76, 2020. https://doi.org/10.1007/978-3-030-53291-8\_5

This work was supported by the German Research Foundation (DFG) Grant Petri Games (392735815) and the Collaborative Research Center "Foundations of Perspicuous Software Systems" (TRR 248, 389792660), and by the European Research Council (ERC) Grant OSARES (683300).

**Fig. 1.** Access control at an airport modeled as Petri net with transits. Colored arrows display the transit relation and define flow chains to model the passengers.

In Fig. 1, we present an example of a Petri net with transits that models the security check at an airport where passengers are checked by a security guard. The number of passengers entering the airport is unknown in advance. Rather than introducing the complexity of an infinite number of tokens, we use a fixed number of tokens to model possibly infinitely many *flow chains*. This is done by the transit relation which is depicted with colored arrows.

The left-hand side of Fig. 1 models passengers who want to reach the terminal. There are three tokens in the places *airport*, *queue*, and *terminal*. Thus, transitions *start* and *en* are always enabled. Each firing of *start* creates a new flow chain as depicted by the green arrow. This models a new person arriving at the *airport*. Meanwhile, the double-headed blue arrow maintains all flow chains that are still in place *airport*. Passengers have to *en*ter the *queue* and wait until the security *check* is performed. Therefore, transition *en* continues every flow chain in *airport* to *queue*. Checking the passengers is carried out by transition *check* which becomes enabled if the security guard *work*s. Thus, passengers residing in *queue* have to wait until the guard *check*s them. Afterwards, they reach the *terminal*. The security guard is modeled on the right-hand side of Fig. 1. By firing *comeToWork* and thus moving the token in place *home*, her flow chain starts and she can repeatedly either *idle* or *work*, *check* passengers, and *ret*urn. Her transit relation is depicted in orange and models exactly one flow chain.

In Fig. 1, we define the checkpoints $\mathit{cp}\_1$ and $\mathit{cp}\_2$ and the *booth* as a security zone and require that passengers never enter the security zone and eventually reach the *terminal*. The flow formula $\varphi = A(\mathit{airport} \to (\Box \neg(\mathit{cp}\_1 \lor \mathit{cp}\_2 \lor \mathit{booth}) \land \Diamond\, \mathit{terminal}))$ specifies this. AdamMC verifies the example from Fig. 1 against the formula $\Box \Diamond\, \mathit{check} \to \varphi$, specifying that if passengers are checked regularly, then they cannot access the security zone and eventually reach the terminal.

In this paper, we present AdamMC as a full-fledged tool. First, AdamMC can handle Petri nets with transits and Flow-LTL formulas in general. Second, AdamMC has an input interface for a concurrent update and a software-defined network and encodes both of them as a Petri net with transits. Common assumptions on fairness and requirements for network correctness are also provided as Flow-LTL formulas. This allows users of the tool to model check the correctness of concurrent updates and to prevent packet loss, routing loops, and network congestion. Third, AdamMC provides algorithms to check safe Petri nets against LTL with *both* places and transitions as atomic propositions, which makes it especially easy to specify fairness and maximality assumptions.

The tool reduces the model checking problem for safe Petri nets with transits against Flow-LTL to the model checking problem for safe Petri nets against LTL. We develop the new *parallel approach* to check global and local behavior in parallel instead of sequentially. This approach yields a tremendous speed-up for a few local requirements and realistic fairness assumptions in comparison to the sequential approach of a previous prototype [10]. In general, the parallel approach has worst-case complexity inferior to the sequential approach even though the complexities of both approaches are the same when using only one flow formula.

As the last step, AdamMC reduces the model checking problem of safe Petri nets against LTL to a circuit model checking problem. This is solved by ABC [2,4] with effective verification techniques like IC3 and bounded model checking. AdamMC verifies concurrent updates of software-defined networks with up to 38 switches (31 more than the prototype) and falsifies concurrent updates of software-defined networks with up to 82 switches (44 more than the prototype).

The paper is structured as follows: In Sect. 2, we recall Petri nets with transits and Flow-LTL. In Sect. 3, we outline the three application areas of AdamMC: checking safe Petri nets with transits against Flow-LTL, checking concurrent updates of software-defined networks against common assumptions and specifications, and checking safe Petri nets against LTL. In Sect. 4, we algorithmically encode concurrent updates of software-defined networks in Petri nets with transits. In Sect. 5, we introduce the parallel approach for the underlying circuit model checking problem. In Sect. 6, we present our experimental evaluation.

Further details can be found in the full paper [13].

#### **2 Petri Nets with Transits and Flow-LTL**

A safe *Petri net with transits* $N = (P, T, F, \mathit{In}, \Upsilon)$ [10] contains the set of *places* $P$, the set of *transitions* $T$, the *flow relation* $F \subseteq (P \times T) \cup (T \times P)$, and the *initial marking* $\mathit{In} \subseteq P$ as in safe Petri nets [27]. In a *safe* Petri net, reachable markings contain at most one token per place. The *transit relation* $\Upsilon$ is for every transition $t \in T$ of type $\Upsilon(t) \subseteq (\mathit{pre}^{N}(t) \cup \{\rhd\}) \times \mathit{post}^{N}(t)$. With $p\, \Upsilon(t)\, q$, we define that firing transition $t$ *transits* the flow in place $p$ to place $q$. The symbol $\rhd$ denotes a *start* and $\rhd\, \Upsilon(t)\, q$ defines that firing transition $t$ *starts* a new flow for the token in place $q$. Note that the transit relation can split, merge, and end flows. A sequence of flows leads to a *flow chain* which is a sequence of the current place and the fired outgoing transition. Thus, Petri nets with transits can describe both the global progress of tokens and the local flow of data.
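The interplay of markings and flow chains can be sketched in a few lines of Python. The sketch is not AdamMC's implementation: the class and function names are hypothetical, a flow chain is simplified to the list of places it has visited (without the fired transitions), and the transit relations of *start* and *en* are an assumed reading of the airport example, since Fig. 1 itself is not reproduced here.

```python
from dataclasses import dataclass

START = "START"  # stands for the start symbol of the transit relation (assumed encoding)


@dataclass(frozen=True)
class Transition:
    name: str
    pre: frozenset       # preset places
    post: frozenset      # postset places
    transits: frozenset  # pairs (p, q) with p in pre or p == START, and q in post


def fire(marking, chains, t):
    """Fire t in a safe net: update the marking and extend, start, or end flow chains."""
    if not t.pre <= marking:
        raise ValueError(f"transition {t.name} is not enabled")
    new_marking = (marking - t.pre) | t.post
    new_chains = []
    for chain in chains:
        here = chain[-1]
        if here not in t.pre:
            new_chains.append(chain)          # chain is not affected by t
            continue
        for p, q in sorted(t.transits):       # a chain may split into several
            if p == here:
                new_chains.append(chain + [q])
        # a chain in the preset without any transit ends here
    for p, q in sorted(t.transits):
        if p == START:
            new_chains.append([q])            # a new flow chain starts in q
    return new_marking, new_chains


if __name__ == "__main__":
    start = Transition("start", frozenset({"airport"}), frozenset({"airport"}),
                       frozenset({(START, "airport"), ("airport", "airport")}))
    en = Transition("en", frozenset({"airport", "queue"}), frozenset({"airport", "queue"}),
                    frozenset({("airport", "queue"), ("queue", "queue")}))
    marking, chains = frozenset({"airport", "queue", "terminal"}), []
    for t in (start, start, en):
        marking, chains = fire(marking, chains, t)
    print(sorted(marking), chains)
```

The marking never changes in this run, while the number of flow chains grows with every firing of *start*, which mirrors how a fixed number of tokens can carry arbitrarily many flow chains.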

Flow-LTL [10] extends Linear-time Temporal Logic (LTL) and uses places and transitions as atomic propositions. It introduces $A$ as a new operator which uses LTL to specify the flow of data for *all* flow chains. For Fig. 1, the formula $A(\mathit{booth} \to \Diamond\, \mathit{check})$ specifies that the guard performs at least one check. We call formulas starting with $A$ *flow formulas*. Formulas around flow formulas specify the global progress of tokens in the form of markings and fired transitions to formalize maximality and fairness assumptions. These formulas are called *run formulas*. Often, Flow-LTL formulas have the form *run formula* → *flow formula*.

**Fig. 2.** Overview of the workflow of AdamMC: The application areas of the tool are given by three different input domains: software-defined network/Flow-LTL (Input I), Petri nets with transits/Flow-LTL (Input II), and Petri nets/LTL (Input III). AdamMC performs all unlabeled steps. MCHyper creates the final circuit which ABC checks to answer the initial model checking problem.

#### **3 Application Areas**

AdamMC consists of modules for three application areas: checking safe Petri nets with transits against Flow-LTL, checking concurrent updates of software-defined networks against common assumptions and specifications, and checking safe Petri nets against LTL. The general architecture and workflow of the model checking procedure is given in Fig. 2. AdamMC is based on the tool Adam [14].

**Petri Nets with Transits.** Petri nets with transits follow the progress of tokens and the flow of data. Flow-LTL allows requirements on both to be specified. For Petri nets with transits and Flow-LTL (Input II), AdamMC extends a parser for Petri nets provided by APT [30], provides a parser for Flow-LTL, and implements two reduction methods to create a safe Petri net and an LTL formula. The sequential approach is outlined in [10] and the parallel approach in Sect. 5.

**Software-Defined Networks.** Concurrent updates of software-defined networks are the second application area of AdamMC. The tool automatically encodes an initially configured network topology and a concurrent update as a Petri net with transits. The concurrent update renews the forwarding table. We provide parsers for the *network topology*, the *initial configuration*, the *concurrent update*, and Flow-LTL (Input I). In Sect. 4, we present the creation of a Petri net with transits from the input and Flow-LTL formulas for *common network properties* like *connectivity*, *loop freedom*, *drop freedom*, and *packet coherence*.

**Petri Nets.** AdamMC supports the model checking of safe Petri nets against LTL with both places *and* transitions as atomic propositions. It provides dedicated algorithms to check *interleaving-maximal* runs of the system. A run is interleaving-maximal if a transition is fired whenever a transition is enabled. Furthermore, AdamMC allows a concurrent view on runs and can check *concurrency-maximal* runs which demand that each subprocess of the system has to progress maximally rather than only the entire system. State-of-the-art tools like LoLA [32] and ITS-Tools [29] are restricted to interleaving-maximal runs and places as atomic propositions. For Petri net model checking (Input III), we allow Petri nets in APT and PNML format as input and provide a parser for LTL formulas.

The construction of the circuit in Aiger format [3] is defined in [11]. MCHyper [15] is used to create a circuit from a given circuit and an LTL formula. This circuit is given to ABC [2,4], which provides a toolbox of modern hardware verification algorithms like IC3 and bounded model checking to decide the initial model checking question. As output for all three modules, AdamMC transforms a possible counterexample (CEX) from ABC into a counterexample to the Petri net (with transits) and visualizes the net with Graphviz and the dot language [9]. When no counterexample exists, AdamMC has verified the input successfully.

#### **4 Verifying Updates of Software-Defined Networks**

We show how AdamMC can check concurrent updates of realistic examples from software-defined networking (SDN) against typical specifications [19]. SDN [6,25] separates the *data plane* for forwarding packets and the *control plane* for the routing configuration. A central controller initiates updates which can cause problems like routing loops or packet loss. AdamMC provides an input interface to automatically encode software-defined networks and concurrent updates of their configuration as Petri nets with transits. The tool checks requirements like loop and drop freedom to find erroneous updates before they are deployed.

#### **4.1 Network Topology, Configurations, and Updates**

A *network topology* T is an undirected graph T = (*Sw*, *Con*) with *switches* as vertices and *connections* between switches as edges. Packets enter the network at *ingress* switches and they leave at *egress* switches. *Forwarding* rules are of the form x.fwd(y) with x, y ∈ *Sw*. A concurrent *update* has the following syntax:


where a switch update can renew the forwarding rule of switch x from switch z to switch y, introduce a new forwarding rule from switch x to switch y, or remove an existing forwarding rule from switch x to switch z.
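The three kinds of switch updates and their effect on loop and drop freedom can be illustrated on a plain forwarding table. The following Python sketch is independent of AdamMC's input format: the function names, the dictionary representation with one next hop per switch, and the example network are assumptions made only for this illustration.

```python
def renew(fwd, x, new_hop, old_hop):
    """Renew the forwarding rule of switch x from old_hop to new_hop."""
    if fwd.get(x) != old_hop:
        raise ValueError(f"{x} does not forward to {old_hop}")
    table = dict(fwd)
    table[x] = new_hop
    return table


def introduce(fwd, x, new_hop):
    """Introduce a new forwarding rule from switch x to new_hop."""
    if x in fwd:
        raise ValueError(f"{x} already has a forwarding rule")
    table = dict(fwd)
    table[x] = new_hop
    return table


def remove(fwd, x, old_hop):
    """Remove the existing forwarding rule from switch x to old_hop."""
    if fwd.get(x) != old_hop:
        raise ValueError(f"{x} does not forward to {old_hop}")
    table = dict(fwd)
    del table[x]
    return table


def follow(fwd, ingress, egress):
    """Follow one packet through a fixed table: 'delivered', 'dropped', or 'loop'."""
    seen, current = set(), ingress
    while current not in egress:
        if current in seen:
            return "loop"
        seen.add(current)
        if current not in fwd:
            return "dropped"
        current = fwd[current]
    return "delivered"


if __name__ == "__main__":
    table = {"a": "b", "b": "c"}                              # a forwards to b, b to c
    print(follow(table, "a", {"c"}))
    print(follow(renew(table, "b", "a", "c"), "a", {"c"}))    # b now forwards back to a
    print(follow(remove(table, "b", "c"), "a", {"c"}))        # b has no rule left
```

The sketch only inspects one fixed table at a time; the point of the encoding described in this section is that packets can also be in flight while a concurrent update changes the table.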

#### **4.2 Data Plane and Control Plane as Petri Net with Transits**

For a network topology T = (*Sw*, *Con*), a set of *ingress* switches, a set of *egress* switches, an initial *forwarding* table, and a concurrent *update*, we show how data and control plane are encoded as Petri net with transits. Switches are modeled by tokens remaining in corresponding places s whereas the flow of packets is modeled by the transit relation Υ. Specific transitions $i\_s$ model ingress switches where new data flows begin. Tokens in places of the form x.fwd(y) configure the forwarding. Data flows are extended by firing transitions (x,y) corresponding to configured forwarding without moving any tokens. Thus, we model any order of newly generated packets and their forwarding. Assuming that each existing direction of a connection between two switches is explicitly given in *Con*, we obtain Algorithm 1 which calls Algorithm 2 to obtain the control plane.

For the *update*, let *SwU* be the set of switch updates in it, *SeU* the set of sequential updates in it, and *PaU* the set of parallel updates in it. Depending on *update*'s type, it is also added to the respective set. The subnet for the *update* has an empty transit relation but moves tokens from and to places of the form x.fwd(y). Tokens in these places correspond to the forwarding table. The order of the switch updates is defined by the nesting of sequential and parallel updates. The *update* is realized by a specific token moving through unique places of the form <sup>u</sup>*s*, u*<sup>f</sup>* , s*s*, s*<sup>f</sup>* , p*s*, p*<sup>f</sup>* for start and finish of each switch update <sup>u</sup> <sup>∈</sup> *SwU* , each sequential update s ∈ *SeU* , and each parallel update p ∈ *PaU* . A parallel update temporarily increases the number of tokens and reduces it upon completion to one. Algorithm 2 defines the update behavior between start and finish places and connects finish and start places depending on the subexpression structure.

**Fig. 3.** Overview of the *sequential approach*: Each firing of a transition of the original net is split into first firing a transition in the subnet for the run formula and subsequently firing a transition in each subnet tracking a flow formula. The constructed LTL formula skips the additional steps with until operators.

**Fig. 4.** Overview of the *parallel approach*: The n subnets are connected such that for every transition $t \in T$ there are $(|\Upsilon(t)| + 1)^{n}$ transitions, i.e., there is one transition for every combination of which transit of t (or none) is tracked by which subnet. We use until operators in the constructed LTL formula to only skip steps not involving the tracking of the guessed chain in the flow formula.
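The count given in the caption of Fig. 4 can be reproduced by enumerating, for one original transition, every assignment of a tracked transit (or none) to each of the n subnets. The Python sketch below does this with `itertools.product`; the function names and the example transit sets are illustrative assumptions and do not reflect AdamMC's internal data structures.

```python
from itertools import product


def tracking_combinations(transits, n):
    """All ways in which n subnets can each track one transit of a transition, or none."""
    choices = list(transits) + [None]          # None: the subnet tracks no transit
    return list(product(choices, repeat=n))


def parallel_transition_count(transit_sizes, n):
    """Sum of (|transits(t)| + 1) ** n over all original transitions t."""
    return sum((size + 1) ** n for size in transit_sizes.values())


if __name__ == "__main__":
    en_transits = [("airport", "queue"), ("queue", "queue")]   # assumed example transits
    for n in (1, 2, 3):
        print(n, len(tracking_combinations(en_transits, n)))
    # Hypothetical net with three transitions and 2, 2, and 3 transits, respectively.
    print(parallel_transition_count({"start": 2, "en": 2, "check": 3}, n=2))
```

The number of combinations grows exponentially in the number n of flow subformulas, which is why the parallel approach pays off mainly for few flow subformulas.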

#### **4.3 Assumptions and Requirements**

We use the run formula $\Diamond \Box\, \mathit{pre}(t) \to \Box \Diamond\, t$ to assume weak fairness for every transition $t$ in our encoding $N$. Transitions, which are always enabled after some point, are ensured to fire infinitely often. Thus, packets are eventually forwarded and the routing table is eventually updated. We use flow formulas to test specific requirements for all packets. Connectivity ($A(\Diamond \bigvee\_{s \in \mathit{egress}} s)$) ensures that all packets reach an egress switch. Packet coherence ($A(\Box(\bigvee\_{s \in \mathit{initial}} s) \lor \Box(\bigvee\_{s \in \mathit{final}} s))$) tests that packets are either routed according to the initial or final configuration. Drop freedom ($A\, \Box(\bigwedge\_{e \in \mathit{egress}} \neg e \to \bigvee\_{f \in \mathit{Con}} f)$) forbids dropped packets whereas loop freedom ($A\, \Box(\bigwedge\_{s \in \mathit{Sw} \setminus \mathit{egress}} s \to (s\, \mathcal{U}\, \Box \neg s))$) forbids routing loops. We combine run and flow formula into *fairness* → *requirement*.

#### **5 Algorithms and Optimizations**

Central to model checking a Petri net with transits $N$ against a Flow-LTL formula $\varphi$ is the reduction to a safe Petri net $N^{>}$ and an LTL formula $\varphi^{>}$. The infinite state space of the Petri net with transits due to possibly infinitely many flow chains is reduced to a finite state model. The key idea is to guess and track a violating flow chain for each flow subformula $A\, \psi\_i$, for $i \in \{1, \dots, n\}$, and to check the equivalent future of flow chains merging into a common place only once.

AdamMC provides two approaches for this reduction: Fig. 3 and Fig. 4 give an overview of the *sequential* approach and the *parallel* approach, respectively. Both algorithms create one subnet $N^{>}\_{i}$ for each flow subformula $A\, \psi\_i$ to track the corresponding flow chain and have one subnet $N^{>}\_{O}$ to check the run part of the formula. The places of $N^{>}\_{O}$ are copies of the places in $N$ such that the current state of the system can be memorized. The subnets $N^{>}\_{i}$ also consist of the original places of $N$ but only use one token (initially residing on an additional place) to track the current state of the considered flow chain. The approaches differ in how these nets are connected to obtain $N^{>}$.

**Sequential Approach.** The places in each subnet $\mathcal{N}^{>}_i$ are connected with one transition for each transit ($T_{fl} = \bigcup_{t \in T} \Upsilon(t)$). An additional token iterates sequentially through the subnets to activate or deactivate the subnet. This allows each subnet to track a flow chain corresponding to firing a transition in $\mathcal{N}^{>}_O$. The formula $\varphi^{>}$ takes care of these additional steps by means of the until operator: In the run part of the formula, all steps corresponding to moves in a subnet $\mathcal{N}^{>}_i$ are skipped and, for each subformula $\mathbb{A}\,\psi_i$, all steps are skipped until the next transition of the corresponding subnet is fired which transits the tracked flow chain. This technique results in a polynomial increase of the size of the Petri net and the formula: $\mathcal{N}^{>}$ has $\mathcal{O}(|\mathcal{N}| \cdot n + |\mathcal{N}|)$ places and $\mathcal{O}(|\mathcal{N}|^3 \cdot n + |\mathcal{N}|)$ transitions and the size of $\varphi^{>}$ is in $\mathcal{O}(|\mathcal{N}|^3 \cdot n \cdot |\varphi| + |\varphi|)$. We refer to [11] for formal details.

**Parallel Approach.** The $n$ subnets are connected such that the current chain of each subnet is tracked simultaneously while firing an original transition $t \in T$. Thus, there are $(|\Upsilon(t)| + 1)^n$ transitions. Each of these transitions stands for exactly one combination of which subnet is tracking which (or no) transit. Hence, firing one transition of the original net is directly tracked in one step for all subnets. This significantly reduces the complexity of the run part of the constructed formula, since no until operator is needed to skip sequential steps. A disjunction over all transitions corresponding to an original transition suffices to ensure correctness of the construction. Transitions and next operators in the flow parts of the formula still have to be replaced by means of the until operator to ensure that the next step of the tracked flow chain is checked at the corresponding step of the global timeline of $\varphi^{>}$. In general, the parallel approach results in an exponential blow-up of the net and the formula: $\mathcal{N}^{>}$ has $\mathcal{O}(|\mathcal{N}| \cdot n + |\mathcal{N}|)$ places and $\mathcal{O}(|\mathcal{N}|^{3n} + |\mathcal{N}|)$ transitions and the size of $\varphi^{>}$ is in $\mathcal{O}(|\mathcal{N}|^{3n} \cdot |\varphi| + |\varphi|)$. For the practical examples, however, the parallel approach allows for model checking Flow-LTL with few flow subformulas with a tremendous speed-up in comparison to the sequential approach. Formal details are in the full version of the paper [13].

**Table 1.** Overview of optimization parameters of AdamMC: The three reduction steps depicted in the first column can each be executed by different algorithms. The first step allows combining the optimizations of the first and second row.
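The difference in growth between the two approaches can be illustrated with a small counting sketch. It is not AdamMC code: the transit labels and the function names `sequential_copies` and `parallel_copies` are hypothetical, and the sketch only counts the tracking copies created for a single original transition, ignoring the additional activation transitions of the sequential approach.

```python
from itertools import product

def sequential_copies(transits, n):
    """Sequential approach (simplified): one tracking transition per transit
    in each of the n subnets."""
    return [(i, tr) for i in range(n) for tr in transits]

def parallel_copies(transits, n):
    """Parallel approach: each copy of the original transition fixes, for every
    subnet, which transit it tracks, or None if it tracks no transit."""
    choices = list(transits) + [None]
    return list(product(choices, repeat=n))

# Hypothetical transition with three transits, checked for n = 1, 2, 3 subnets.
transits = ["p->q", "p->r", "start->q"]
for n in (1, 2, 3):
    seq = len(sequential_copies(transits, n))
    par = len(parallel_copies(transits, n))
    print(f"n={n}: sequential={seq}, parallel={par}")
```

The parallel count is $(|\Upsilon(t)| + 1)^n$ by construction, which is why the approach pays off mainly for formulas with few flow subformulas.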


**Optimizations.** Various optimization parameters can be applied to the model checking routine described in Sect. 3 to tweak the performance. Table 1 gives an overview of the major parameters.

We found that the versions of the sequential and the parallel approach with inhibitor arcs to track flow chains are generally faster than the versions without. Furthermore, the reduction step from a Petri net into a circuit with logarithmically encoded transitions had oftentimes better performance than the same step with explicitly encoded transitions. However, several possibilities to reduce the number of gates of the created circuit worsened the performance of some benchmark families and improved the performance of others. Consequently, all parameters are selectable by the user and a script is provided to compare different settings. An overview of the selectable optimization parameters can be found in the documentation of AdamMC [12]. Our main improvement claims can be retraced by the case study in Sect. 6.

## **6 Evaluation**

We conduct a case study based on SDN with a corresponding artifact [16]. The performance improvements of AdamMC compared to the prototype [10] are summarized in Table 2. For realistic software-defined networks [19], one ingress and one egress switch are chosen at random. Two forwarding tables between the two switches and an update from the first to the second configuration are chosen at random. AdamMC verifies that the update maintained *connectivity* between ingress and egress switch. The results are depicted in rows starting with T. For rows starting with F, we required *connectivity* of a random switch which is not in the forwarding tables. AdamMC falsified this requirement for the update.

The prototype implementation based on an *explicit encoding* can verify updates of networks with 7 switches and falsify updates of networks with 38 switches. We optimize the explicit encoding to a *logarithmic encoding* and the number of switches for which updates can be verified increases to 17. More significantly, the *parallel approach* in combination with the logarithmic encoding leads to tremendous performance gains. The performance gains of an approach with inferior worst-case complexity are mainly due to the smaller complexity of the LTL formula created by the reduction. The encoding of SDN requires fairness assumptions for each transition. These assumptions (encoded in the run part of the formula) experience a blow-up with until operators by the sequential approach but only need a disjunction in the parallel approach. Hence, the size of networks for which AdamMC can verify updates increases to 38 switches and the size for which it can falsify updates increases to 82 switches. For rather small networks, the tool needs only a few seconds to verify and falsify updates which makes it a great option for operators when updating networks.

**Table 2.** We compare the explicit and logarithmic encoding of the sequential approach with the parallel approach. The results are the average over five runs from an Intel i7-2700K CPU with 3.50 GHz, 32 GB RAM, and a timeout (TO) of 30 min. The runtimes are given in seconds.

#### **7 Related Work**

We refer to [21] for an introduction to SDN. Solutions for correctness of updates of software-defined networks include *consistent updates* [7,28], *dynamic scheduling* [17], and *incremental updates* [18]. Both explicit and SMT-based model checking [1,5,22,23,26,31] is used to verify software-defined networks. Closest to our approach are models of networks as Kripke structures to use model checking for synthesis of correct network updates [8,24]. The model checking subroutine of the synthesizer assumes that each packet sees at most one updated switch. Our model checking routine does not make such an assumption.

There is a significant number of model checking tools (e.g., [29,32]) for Petri nets and an annual model checking contest [20]. AdamMC is restricted to safe Petri nets whereas other tools can handle bounded and colored Petri nets. At the same time, only AdamMC accepts LTL formulas with places *and* transitions as atomic propositions. This is essential to express fairness in our SDN encoding.

## **8 Conclusion**

We presented the tool AdamMC with its three application domains: checking safe Petri nets with transits against Flow-LTL, checking concurrent updates of software-defined networks against common assumptions and specifications, and checking safe Petri nets against LTL. New algorithms allow AdamMC to model check software-defined networks of realistic size: it can verify updates of networks with up to 38 switches and can falsify updates of networks with up to 82 switches.

## **References**



## **Action-Based Model Checking: Logic, Automata, and Reduction**

Stephen F. Siegel and Yihao Yan

University of Delaware, Newark, DE 19716, USA {siegel,yihaoyan}@udel.edu

**Abstract.** Stutter-invariant properties play a special role in state-based model checking: they are the properties that can be checked using partial order reduction (POR), an indispensable optimization. There are algorithms to decide whether an LTL formula or Büchi automaton (BA) specifies a stutter-invariant property, and to convert such a BA to a form that is appropriate for on-the-fly POR-based model checking.

The *interruptible* properties play the same role in action-based model checking that stutter-invariant properties play in the state-based case. These are the properties that are invariant under the insertion or deletion of "invisible" actions. We present algorithms to decide whether an LTL formula or BA specifies an interruptible property, and show how a BA can be transformed to an *interrupt normal form* that can be used in an on-the-fly POR algorithm. We have implemented these algorithms in a new model checker named McRERS, and demonstrate their effectiveness using the RERS 2019 benchmark suite.

**Keywords:** Model checking · Action · Event · LTL · Stutter-invariant

#### **1 Introduction**

To apply model checking to a concurrent system, one must formulate properties that the system is expected to satisfy. A property may be expressed by specifying acceptable sequences of states, or by specifying acceptable sequences of actions, i.e., the events that cause the state to change. Each approach has advantages and disadvantages, and in any particular context one may be more appropriate than the other.

In the state-based context, there is a rich theory involving automata, logic, and reduction for model checking. Some of the core ideas in this theory can be summarized as follows. First, the behavior of the concurrent system is represented by a state-transition system T. One identifies a set AP of atomic propositions, and each state of T is labeled by the set of propositions which hold at that state. An execution passes through an infinite sequence of states, which defines a *trace*, i.e., a sequence of subsets of AP. A *property* P is a set of traces, and T satisfies P if every trace of T is in P.

Y. Yan—Currently employed at Google.

© The Author(s) 2020

S. K. Lahiri and C. Wang (Eds.): CAV 2020, LNCS 12225, pp. 77–100, 2020. https://doi.org/10.1007/978-3-030-53291-8\_6

Properties may be specified by formulas in a temporal logic, such as LTL [26]. There are algorithms (e.g., [37]) to convert an LTL formula $\varphi$ to an equivalent Büchi automaton (BA) $B_\varphi$ with alphabet $2^{\mathsf{AP}}$. (Properties may also be specified directly using BAs.) The system $T$ satisfies $\varphi$ if and only if the language of the synchronous product $T \otimes B_{\neg\varphi}$ is empty. The emptiness of the language can be determined on-the-fly, i.e., while the reachable states of the product are being constructed.

A property $P$ is *stutter-invariant* if it is closed under the insertion and deletion of repetitions, i.e., $s_0 s_1 \cdots \in P \Leftrightarrow s_0^{i_0} s_1^{i_1} \cdots \in P$ holds for any positive integers $i_0, i_1, \ldots$. Many algorithms are known for deciding whether an LTL formula or a BA specifies a stutter-invariant property [22,24]. There is also an argument that only stutter-invariant properties should be used in practice. For example, suppose that a trace is formed by sampling the state of a system once every millisecond. If we sample the same system twice each millisecond, and there are no state changes in the sub-millisecond intervals, the second trace will be stutter-equivalent to the first. A meaningful property should be invariant under this choice of time resolution.
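The sampling argument above can be made concrete with a small sketch. It works on finite trace prefixes only, which is an assumption made for illustration (the definition concerns infinite traces), and the helper names `destutter` and `stutter_equivalent` are hypothetical.

```python
from itertools import groupby

def destutter(trace):
    """Collapse every maximal run of equal consecutive states into one state."""
    return [state for state, _ in groupby(trace)]

def stutter_equivalent(trace1, trace2):
    """Two finite prefixes are stutter-equivalent if they collapse to the same sequence."""
    return destutter(trace1) == destutter(trace2)

# States are sets of atomic propositions that hold at that step.
p, q = frozenset({"p"}), frozenset({"q"})

sampled_1ms = [p, q, q, p]                 # one sample per millisecond
sampled_half_ms = [p, p, q, q, q, q, p, p]  # two samples per millisecond

print(stutter_equivalent(sampled_1ms, sampled_half_ms))
```

A stutter-invariant property cannot distinguish `sampled_1ms` from `sampled_half_ms`, which is the sense in which it is independent of the time resolution.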

Stutter-invariant properties are desirable for another reason: they admit the most significant optimization in model checking, partial order reduction (POR, [15,23,25]). At each state encountered in the exploration of the product space, an on-the-fly POR scheme produces a subset of the enabled transitions. Restricting the search to the transitions in those subsets does not affect the language emptiness question. Recent work has revealed that the BA must have a certain form—"SI normal form"—when POR is used with on-the-fly model checking, but any BA with a stutter-invariant language can be easily transformed into SI normal form [27].

The purpose of this paper is to elaborate an analogous theory for event-based models. Event-based models of concurrency are widely used and have been extremely influential for over three decades. For example, process algebras, such as CSP, are event-based and use *labeled transition systems* (LTSs) for the semantic model. Event-based models are the main formalism used in assume-guarantee reasoning (e.g., [10]), and in many other areas. There are mature model checking and verification tools for process algebras and LTSs, which have significant industrial applications; see, e.g., [13]. Temporal logics, including LTL, CTL, and CTL\*, have long been used to specify event-based systems [3,7,12].

We call the class of properties in the action context that are analogous to the stutter-invariant properties in the state context the *interruptible* properties (Sect. 3). These properties are invariant under "action stuttering" [34], i.e., the insertion or deletion of "invisible" actions. We present algorithms for deciding whether an LTL formula or a BA specifies an interruptible property (Theorems 1 and 2); to the best of our knowledge, these are the first published algorithms for deciding this property of formulas or automata.

Interruptible properties play the same role in action-based POR that stutter-invariant properties play in state-based POR. In particular, we present an action-based on-the-fly POR algorithm that works for interruptible properties (Sect. 4). As with the state-based case, the algorithm requires that the BA be in a certain normal form. We introduce a novel *interrupt normal form* (Definition 11) for this purpose, and show how any BA with an interruptible language can be transformed into that form. The relation to earlier work is discussed in Sect. 5. The effectiveness of these reduction techniques is demonstrated by applying them to problems in the 2019 RERS benchmark suite (Sect. 6).

#### **2 Preliminaries**

Let $S$ be a set. $2^S$ denotes the set of all subsets of $S$. $S^*$ denotes the set of finite sequences of elements of $S$; $S^\omega$ the infinite sequences. Let $\zeta = s_0 s_1 \cdots$ be a (finite or infinite) sequence and $i \geq 0$. If $\zeta$ is finite of length $n$, assume $i < n$. Then $\zeta(i)$ denotes the element $s_i$. For any $i \geq 0$, $\zeta^i$ denotes the suffix $s_i s_{i+1} \cdots$. ($\zeta^i$ is empty if $\zeta$ is finite and $i \geq n$.)

For $\zeta \in S^*$ and $\eta \in S^* \cup S^\omega$, $\zeta \circ \eta$ denotes the concatenation of $\zeta$ and $\eta$.

If $S \subseteq T$ and $\eta$ is a sequence of elements of $T$, $\eta|_S$ denotes the sequence obtained by deleting from $\eta$ all elements not in $S$.
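For readers who prefer code to notation, the operations just introduced can be mirrored on finite sequences. This is only a sketch: Python lists stand in for finite sequences, infinite sequences are not modeled, and the function names are chosen here for illustration.

```python
def element(zeta, i):
    """zeta(i): the element at index i of the sequence."""
    return zeta[i]

def suffix(zeta, i):
    """zeta^i: the suffix starting at index i (empty if i >= len(zeta))."""
    return zeta[i:]

def concat(zeta, eta):
    """zeta ∘ eta: concatenation, with zeta finite."""
    return list(zeta) + list(eta)

def project(eta, S):
    """eta|_S: delete from eta every element that is not in S."""
    return [x for x in eta if x in S]

word = ["a", "c", "b", "c", "a"]
print(element(word, 2))          # the element at index 2
print(suffix(word, 3))           # the suffix from index 3
print(project(word, {"a", "b"})) # projection onto {a, b}
```

Projection is the operation that matters most later on: interruptibility is defined entirely in terms of $\zeta|_V$.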

#### **2.1 Linear Temporal Logic**

Let Act be a universal set of actions. We assume Act is infinite.

**Definition 1.** Form (the *LTL formulas over* Act) is the smallest set satisfying:

– true $\in$ Form,

– if $a \in$ Act then $a \in$ Form, and

– if $f$ and $g$ are in Form, so are $\neg f$, $f \wedge g$, $\mathbf{X} f$, and $f \mathbf{U} g$.

Additional operators are defined as shorthand for other formulas: $\mathsf{false} = \neg\mathsf{true}$, $f \vee g = \neg((\neg f) \wedge \neg g)$, $f \rightarrow g = (\neg f) \vee g$, $\mathbf{F} f = \mathsf{true}\,\mathbf{U} f$, $\mathbf{G} f = \neg\mathbf{F}\neg f$, and $f \mathbf{W} g = (f \mathbf{U} g) \vee \mathbf{G} f$.

**Definition 2.** The *alphabet* of an LTL formula f, denoted αf, is the set of actions that occur syntactically within f. 

**Definition 3.** The *action-based semantics* of LTL is defined by the relation $\zeta \models_{\mathsf{A}} f$, where $\zeta \in \mathsf{Act}^\omega$ and $f \in \mathsf{Form}$, which is defined as follows:

$$\begin{array}{l}
\zeta \models_{\mathsf{A}} \mathsf{true},\\
\zeta \models_{\mathsf{A}} a \text{ iff } \zeta(0) = a,\\
\zeta \models_{\mathsf{A}} \neg f \text{ iff } \zeta \not\models_{\mathsf{A}} f,\\
\zeta \models_{\mathsf{A}} f \wedge g \text{ iff } \zeta \models_{\mathsf{A}} f \text{ and } \zeta \models_{\mathsf{A}} g,\\
\zeta \models_{\mathsf{A}} \mathbf{X} f \text{ iff } \zeta^{1} \models_{\mathsf{A}} f, \text{ and}\\
\zeta \models_{\mathsf{A}} f \mathbf{U} g \text{ iff } \exists i \geq 0 \,.\, (\zeta^{i} \models_{\mathsf{A}} g \wedge \forall j \in 0..i-1 \,.\, \zeta^{j} \models_{\mathsf{A}} f).
\end{array}$$

When using the action-based semantics, the logic is sometimes referred to as "Action LTL" or ALTL [11,12].
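The action-based semantics can be executed directly on ultimately periodic words of the form $u \circ v^\omega$. The following is a minimal sketch under stated assumptions: formulas are encoded as nested tuples (an encoding chosen here, not taken from the paper), the word is given as a finite `prefix` and a non-empty `loop`, and the until operator is evaluated as a least fixpoint over the finitely many positions of the lasso.

```python
def positions_satisfying(f, prefix, loop):
    """Positions i of the lasso word prefix . loop^omega whose suffix satisfies f."""
    word = list(prefix) + list(loop)
    n = len(word)

    def succ(i):
        # The position after the last one wraps around to the start of the loop.
        return i + 1 if i + 1 < n else len(prefix)

    op = f[0]
    if op == "true":
        return set(range(n))
    if op == "act":                      # atomic formula: the current action equals f[1]
        return {i for i in range(n) if word[i] == f[1]}
    if op == "not":
        return set(range(n)) - positions_satisfying(f[1], prefix, loop)
    if op == "and":
        return positions_satisfying(f[1], prefix, loop) & positions_satisfying(f[2], prefix, loop)
    if op == "X":
        sub = positions_satisfying(f[1], prefix, loop)
        return {i for i in range(n) if succ(i) in sub}
    if op == "U":
        left = positions_satisfying(f[1], prefix, loop)
        sat = set(positions_satisfying(f[2], prefix, loop))
        changed = True
        while changed:                   # least fixpoint: g, or f now and (f U g) next
            changed = False
            for i in left - sat:
                if succ(i) in sat:
                    sat.add(i)
                    changed = True
        return sat
    raise ValueError(f"unknown operator: {op}")

def holds(f, prefix, loop):
    """True iff prefix . loop^omega satisfies f under the action-based semantics."""
    return 0 in positions_satisfying(f, prefix, loop)

# Derived operators, following the shorthand definitions.
def F(f): return ("U", ("true",), f)
def G(f): return ("not", F(("not", f)))

a = ("act", "a")
print(holds(F(a), ["b", "b"], ["a", "c"]))   # does a eventually occur?
print(holds(a, ["b"], ["a"]))                # is a the first action?
```

Because exactly one action occurs at each position, the atomic case compares the action at that position with `f[1]` rather than testing set membership, which is the only point where this evaluator differs from a state-based one.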

The *state-based semantics* is defined by a relation $\xi \models_{\mathsf{S}} f$, where $\xi \in (2^{\mathsf{Act}})^\omega$. The definition of $\models_{\mathsf{S}}$ is well-known, and is exactly the same as Definition 3, except that $\xi \models_{\mathsf{S}} a$ iff $a \in \xi(0)$. The action semantics are consistent with the state semantics in the following sense. Let $f \in \mathsf{Form}$, and $\zeta = a_0 a_1 \cdots \in \mathsf{Act}^\omega$. Let $\xi = \{a_0\}\{a_1\} \cdots \in (2^{\mathsf{Act}})^\omega$. Then $\zeta \models_{\mathsf{A}} f$ iff $\xi \models_{\mathsf{S}} f$. The main difference between the state- and action-based formalisms is that in the state-based formalism, any number of atomic propositions can hold at each step. In the action-based formalism, precisely one action occurs in each step.

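**Definition 4.** Let $f, g \in \mathsf{Form}$. Define

– (*action equivalence*) $f \equiv_{\mathsf{A}} g$ if $\zeta \models_{\mathsf{A}} f \Leftrightarrow \zeta \models_{\mathsf{A}} g$ for all $\zeta \in \mathsf{Act}^\omega$,

– (*state equivalence*) $f \equiv_{\mathsf{S}} g$ if $\xi \models_{\mathsf{S}} f \Leftrightarrow \xi \models_{\mathsf{S}} g$ for all $\xi \in (2^{\mathsf{Act}})^\omega$.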


The following fact about the state-based semantics can be proved by induction on the formula structure:

**Lemma 1.** *Let* $f \in \mathsf{Form}$ *and* $\xi = s_0 s_1 \cdots \in (2^{\mathsf{Act}})^\omega$*. Let* $\xi' = s'_0 s'_1 \cdots$*, where* $s'_i = \alpha f \cap s_i$*. Then* $\xi \models_{\mathsf{S}} f$ *iff* $\xi' \models_{\mathsf{S}} f$*.*

The following shows that action LTL, like ordinary state-based LTL, is a decidable logic:

**Proposition 1.** *Let* $f, g \in \mathsf{Form}$*,* $A = \alpha f \cup \alpha g$*, and*

$$h = \mathbf{G}\Big[\left(\bigwedge\_{a \in A} \neg a\right) \lor \bigvee\_{a \in A} \left(a \land \bigwedge\_{b \in A\backslash\{a\}} \neg b\right)\Big].$$

*Then* $f \equiv_{\mathsf{A}} g \Leftrightarrow f \wedge h \equiv_{\mathsf{S}} g \wedge h$*. In particular, action equivalence is decidable.*

*Proof.* Note the meaning of h: at each step in a state-based trace, at most one element of A is true.

Suppose $f \wedge h \equiv_{\mathsf{S}} g \wedge h$. Let $\zeta = a_0 a_1 \cdots \in \mathsf{Act}^\omega$. Let $\xi = \{a_0\}\{a_1\} \cdots$. We have $\xi \models_{\mathsf{S}} h$. By the consistency of the state and action semantics, we have

$$\zeta \models_{\mathsf{A}} f \iff \xi \models_{\mathsf{S}} f \iff \xi \models_{\mathsf{S}} f \land h \iff \xi \models_{\mathsf{S}} g \land h \iff \xi \models_{\mathsf{S}} g \iff \zeta \models_{\mathsf{A}} g,$$

hence $f \equiv_{\mathsf{A}} g$.

Suppose instead that $f \equiv_{\mathsf{A}} g$. We wish to show $\xi \models_{\mathsf{S}} f \wedge h \Leftrightarrow \xi \models_{\mathsf{S}} g \wedge h$ for any $\xi = s_0 s_1 \cdots \in (2^{\mathsf{Act}})^\omega$. By Lemma 1, it suffices to assume $s_i \subseteq A$ for all $i$.

Let $\tau$ be any element of $\mathsf{Act} \setminus A$. (Here we are using the fact that Act is infinite, while $A$ is finite.) If $|s_i| > 1$ for some $i$, then $\xi$ violates $h$ and therefore violates both $f \wedge h$ and $g \wedge h$. So suppose $|s_i| \leq 1$ for all $i$, which means $\xi \models_{\mathsf{S}} h$. Let $\zeta = a_0 a_1 \cdots$, where $a_i$ is the sole member of $s_i$ if $|s_i| = 1$, or $\tau$ if $|s_i| = 0$. By Lemma 1, $\xi \models_{\mathsf{S}} f$ iff $\{a_0\}\{a_1\} \cdots \models_{\mathsf{S}} f$. By the consistency of the action and state semantics, this is equivalent to $\zeta \models_{\mathsf{A}} f$. A similar statement holds for $g$. Hence

$$\xi \models_{\mathsf{S}} f \land h \iff \xi \models_{\mathsf{S}} f \iff \zeta \models_{\mathsf{A}} f \iff \zeta \models_{\mathsf{A}} g \iff \xi \models_{\mathsf{S}} g \iff \xi \models_{\mathsf{S}} g \land h.$$

The proposition reduces the question of action equivalence to one of ordinary (state) equivalence of LTL formulas, which is known to be decidable ([26], see also [36, Thm. 24]).

**Definition 5.** For $A \subseteq \mathsf{Act}$ and $f \in \mathsf{Form}$ with $\alpha f \subseteq A$, let

$$\mathcal{L}(f, A) = \{ \zeta \in A^{\omega} \mid \zeta \models_{\mathsf{A}} f \}.$$

#### **2.2 Büchi Automata**

**Definition 6.** A *Büchi Automaton* (BA) over Act is a tuple $(S, \Sigma, \rightarrow, S^0, F)$ where


We will use the following notation and terminology for a BA $B$. The *source* of a transition $(s, a, s')$ is $s$, the *destination* is $s'$, and the *label* is $a$. We write $s \xrightarrow{a} s'$ as shorthand for $(s, a, s') \in {\rightarrow}$, and $s \xrightarrow{a_0 a_1 \ldots a_n} s'$ for $\exists s_1, s_2, \ldots, s_n \in S \,.\, s \xrightarrow{a_0} s_1 \xrightarrow{a_1} s_2 \cdots s_n \xrightarrow{a_n} s'$. For $a \in A$ and $s \in S$, we say $a$ is *enabled at* $s$ if $s \xrightarrow{a} s'$ for some $s' \in S$. The set of all actions enabled at $s$ is denoted $\mathsf{enabled}(B, s)$.

For $s \in S$, a *path in* $B$ *starting from* $s$ is a (finite or infinite) sequence $\pi$ of transitions such that (1) if $\pi$ is not empty, the source of $\pi(0)$ is $s$, and (2) the destination of $\pi(i)$ is the source of $\pi(i + 1)$ for all $i$ for which these are defined. If $\pi$ is not empty, define $\mathsf{first}(\pi)$ to be $s$; if $\pi$ is finite, define $\mathsf{last}(\pi)$ to be the destination of the last transition of $\pi$. We say $\pi$ *spells the word* $a_0 a_1 \cdots$, where $a_i$ is the label of $\pi(i)$.

An infinite path is *accepting* if it visits a state in F infinitely often. An *(accepting) trace starting from* s is a word spelled by an (accepting) path starting from s. An *(accepting) trace of* B is an (accepting) trace starting from an initial state. The *language of* B, denoted L(B), is the set of all accepting traces of B.

**Proposition 2.** *There is an algorithm that consumes any finite subset* A *of* Act *and an* f ∈ Form *with* αf ⊆ A*, and produces a BA* B *with alphabet* A *such that* L(B) = L(f,A)*.*

*Proof.* There are well-known algorithms to produce a BA $C$ with alphabet $2^A$ which accepts exactly the words satisfying $f$ under the state semantics (e.g., [37]). Let $B$ be the same as $C$, except the alphabet is $A$ and there is a transition $s \xrightarrow{a} s'$ in $B$ iff there is a transition $s \xrightarrow{\{a\}} s'$ in $C$. We have

$$\begin{aligned} a_0 a_1 \cdots \in \mathcal{L}(B) &\Leftrightarrow \{a_0\} \{a_1\} \cdots \in \mathcal{L}(C) \\ &\Leftrightarrow \{a_0\} \{a_1\} \cdots \models_{\mathsf{S}} f \\ &\Leftrightarrow a_0 a_1 \cdots \in \mathcal{L}(f, A). \end{aligned}$$

 

 

In practice, tools that convert LTL formulas to BAs produce an automaton in which an edge is labeled by a propositional formula φ over αf. Such an edge represents a set of transitions, one for each P ⊆ A for which φ holds for the valuation that assigns *true* to each element of P and *false* to each element of A \ P. In this case, the conversion to B entails creating one transition for each a ∈ A for which φ holds when *true* is assigned to a and *false* is assigned to all other actions.
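The conversion described in the previous paragraph can be sketched as follows. The representation is an assumption made for illustration: each edge guard is a Python predicate over the set of propositions that are true, standing in for the propositional formula φ that a real LTL-to-BA translator would emit, and `to_action_transitions` is a hypothetical helper name.

```python
def to_action_transitions(edges, alphabet):
    """Expand formula-labeled edges into action-labeled transitions.

    edges: iterable of (source, guard, destination), where guard(P) tells whether
           the edge formula holds when exactly the propositions in P are true.
    An action a yields a transition iff the guard holds for the valuation that
    makes a true and every other action false.
    """
    return {
        (src, a, dst)
        for (src, guard, dst) in edges
        for a in alphabet
        if guard(frozenset([a]))
    }

alphabet = {"a", "b"}
edges = [
    (0, lambda P: "a" not in P, 0),                 # guard: not a
    (0, lambda P: "a" in P and "b" not in P, 1),    # guard: a and not b
    (0, lambda P: "a" in P and "b" in P, 2),        # guard: a and b (never a single action)
    (1, lambda P: True, 1),                         # guard: true
]
action_transitions = to_action_transitions(edges, alphabet)
print(sorted(action_transitions))
```

The edge guarded by "a and b" contributes no transition, since no single action makes two distinct propositions true at once; this is the practical effect of the constraint $h$ from Proposition 1.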

**Definition 7.** Let $B_i = (S_i, \Sigma_i, \rightarrow_i, S_i^0, F_i)$ ($i = 1, 2$) denote two BAs over Act. The *parallel composition of* $B_1$ *and* $B_2$ is the BA

$$B\_1 \parallel B\_2 \equiv (S\_1 \times S\_2, \Sigma\_1 \cup \Sigma\_2, \to, S\_1^0 \times S\_2^0, F\_1 \times F\_2),$$

where → is defined by

$$\frac{s\_1 \xrightarrow{a} s\_1' \quad a \notin \Sigma\_2}{\langle s\_1, s\_2 \rangle \xrightarrow{a} \langle s\_1', s\_2 \rangle} \quad \frac{s\_2 \xrightarrow{a} s\_2' \quad a \notin \Sigma\_1}{\langle s\_1, s\_2 \rangle \xrightarrow{a} \langle s\_1, s\_2' \rangle} \quad \frac{s\_1 \xrightarrow{a} s\_1' \quad s\_2 \xrightarrow{a} s\_2'}{\langle s\_1, s\_2 \rangle \xrightarrow{a} \langle s\_1', s\_2' \rangle}. \tag{7}$$

If we flatten all tuples (e.g., identify $(S_1 \times S_2) \times S_3$ with $S_1 \times S_2 \times S_3$) then $\parallel$ is an associative operator.

Note that in the special case where the two automata have the same alphabet ($\Sigma_1 = \Sigma_2$), every action is synchronizing, and the parallel composition is the usual "synchronous product." In this case, $\mathcal{L}(B_1 \parallel B_2) = \mathcal{L}(B_1) \cap \mathcal{L}(B_2)$.
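The three rules of Definition 7 translate almost literally into code. The sketch below assumes a dictionary-based automaton representation with keys `states`, `alphabet`, `trans`, `init`, and `accepting`; this layout and the example automata are illustrative choices, not part of the paper.

```python
from itertools import product

def compose(b1, b2):
    """Parallel composition of two automata given as dictionaries."""
    trans = set()
    # Rule 1: b1 moves alone on an action outside the alphabet of b2.
    for (s1, a, t1) in b1["trans"]:
        if a not in b2["alphabet"]:
            for s2 in b2["states"]:
                trans.add(((s1, s2), a, (t1, s2)))
    # Rule 2: b2 moves alone on an action outside the alphabet of b1.
    for (s2, a, t2) in b2["trans"]:
        if a not in b1["alphabet"]:
            for s1 in b1["states"]:
                trans.add(((s1, s2), a, (s1, t2)))
    # Rule 3: both automata move together on a common action.
    for (s1, a, t1) in b1["trans"]:
        for (s2, b, t2) in b2["trans"]:
            if a == b:
                trans.add(((s1, s2), a, (t1, t2)))
    return {
        "states": set(product(b1["states"], b2["states"])),
        "alphabet": b1["alphabet"] | b2["alphabet"],
        "trans": trans,
        "init": set(product(b1["init"], b2["init"])),
        "accepting": set(product(b1["accepting"], b2["accepting"])),
    }

# Two small automata that share only the action "c".
b1 = {"states": {0, 1}, "alphabet": {"a", "c"},
      "trans": {(0, "a", 1), (1, "c", 0)}, "init": {0}, "accepting": {0, 1}}
b2 = {"states": {"x", "y"}, "alphabet": {"b", "c"},
      "trans": {("x", "b", "y"), ("y", "c", "x")}, "init": {"x"}, "accepting": {"x", "y"}}

both = compose(b1, b2)
print(sorted(both["trans"], key=str))
```

In the example, `a` and `b` interleave freely while `c` can only fire from the product state `(1, "y")`, where both components enable it.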

#### **2.3 Labeled Transition Systems**

**Definition 8.** A *labeled transition system* (LTS) over Act is a tuple $(Q, A, \rightarrow, q^0)$ for which $(Q, A, \rightarrow, \{q^0\}, Q)$ is a BA over Act. In other words, it is a BA in which all states are accepting and there is only one initial state.

**Definition 9.** Let M be an LTS with alphabet A, and f an LTL formula with αf ⊆ A. We write M |= f if L(M) ⊆ L(f,A). 

The following observation is the basis of the automata-theoretic approach to model checking (cf. [36, §4.2]):

**Proposition 3.** *Let* $M$ *be an LTS with alphabet* $A$ *and* $f$ *an LTL formula with* $\alpha f \subseteq A$*. Let* $B$ *be a BA with* $\mathcal{L}(B) = \mathcal{L}(\neg f, A)$*. Then* $M \models f \Leftrightarrow \mathcal{L}(M \parallel B) = \emptyset$*.*

*Proof.* $M$ and $B$ have the same alphabet, so $\mathcal{L}(M \parallel B) = \mathcal{L}(M) \cap \mathcal{L}(B)$, hence

$$
\mathcal{L}(M \parallel B) = \mathcal{L}(M) \cap \mathcal{L}(\neg f, A) = \mathcal{L}(M) \cap (A^{\omega} \backslash \mathcal{L}(f, A)) = \mathcal{L}(M) \backslash \mathcal{L}(f, A).
$$

This set is empty iff L(M) ⊆ L(f,A). 

There are various algorithms to determine language emptiness of a BA; in this paper we use the well-known Nested Depth First Search (NDFS) algorithm [2].
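For orientation, a textbook rendition of nested depth-first search is sketched below. It is not the McRERS implementation: the recursive formulation, the `successors` callback, and the function name are assumptions made to keep the example short, and a production version would use explicit stacks to avoid Python's recursion limit.

```python
def has_accepting_cycle(init, successors, accepting):
    """Nested DFS: True iff an accepting state that lies on a cycle is reachable.

    init: iterable of initial states
    successors: function mapping a state to an iterable of successor states
    accepting: set of accepting states
    The language of the automaton is non-empty exactly when this returns True.
    """
    visited_outer, visited_inner = set(), set()

    def inner(state, seed):
        # Second search: look for a path from the seed back to itself.
        for nxt in successors(state):
            if nxt == seed:
                return True
            if nxt not in visited_inner:
                visited_inner.add(nxt)
                if inner(nxt, seed):
                    return True
        return False

    def outer(state):
        # First search: explore, then start the nested search in post-order.
        visited_outer.add(state)
        for nxt in successors(state):
            if nxt not in visited_outer and outer(nxt):
                return True
        return state in accepting and inner(state, state)

    return any(s not in visited_outer and outer(s) for s in init)

# Example graph: 0 -> 1 -> 2 -> 1.
graph = {0: [1], 1: [2], 2: [1]}
print(has_accepting_cycle([0], lambda s: graph[s], {2}))
print(has_accepting_cycle([0], lambda s: graph[s], {0}))
```

Launching the nested search in post-order is what allows `visited_inner` to be shared across all nested searches, keeping the total work linear in the size of the explored graph.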

#### **3 Interruptible Properties**

#### **3.1 Definition and Examples**

An LTS comes with an alphabet, which is a subset $A$ of Act. By a *property over* $A$ we simply mean a subset $P$ of $A^\omega$. We say a trace $\zeta \in A^\omega$ *satisfies* $P$ if $\zeta \in P$. We have already seen two ways to specify properties. An LTL formula $f$ with $\alpha f \subseteq A$ specifies the property $\mathcal{L}(f, A)$. A Büchi automaton $B$ with alphabet $A$ specifies the property $\mathcal{L}(B)$. We next define a special class of properties:

**Definition 10.** Given sets $V \subseteq A \subseteq \mathsf{Act}$, we say a property $P$ over $A$ is $V$-*interruptible* if

$$\zeta|_V = \eta|_V \Rightarrow (\zeta \in P \iff \eta \in P) \qquad \text{ for all } \zeta, \eta \in A^\omega.$$

An LTL formula $f$ is $V$-*interruptible* if $\mathcal{L}(f, \mathsf{Act})$ is $V$-interruptible. We say $f$ is *interruptible* if $f$ is $\alpha f$-interruptible. The set of all interruptible LTL formulas is denoted Intrpt.

The set V is known as the *visible set*. The definition essentially says that the insertion or deletion of invisible actions (those in A\V ) has no bearing on whether a trace satisfies P. Put another way, the question of whether a trace belongs to P is determined purely by its visible actions. The following collects some basic facts about interruptibility. All follow immediately from the definitions.

**Proposition 4.** *Let* $V \subseteq A \subseteq \mathsf{Act}$*,* $P \subseteq A^\omega$ *and* $f, g \in \mathsf{Form}$*. Then all of the following hold:*


$$\forall \zeta, \eta \in \mathsf{Act}^{\omega} \; . \; (\zeta|_{\alpha f} = \eta|_{\alpha f} \land \zeta \models_{\mathsf{A}} f) \Rightarrow \eta \models_{\mathsf{A}} f.$$

*5. If* αf = αg *and* f ≡<sup>A</sup> g *then* f *is interruptible iff* g *is interruptible.*

Many, if not most, properties that arise in practice are $V$-interruptible for the set $V$ of actions that are mentioned in the property. Assuming $a$, $b$, and $c$ are distinct actions, we have:

– For any $n \geq 0$, the property "$a$ occurs at most $n$ times" is $\{a\}$-interruptible, since the insertion or deletion of actions other than $a$ cannot affect whether a word satisfies that property. The same is true for the properties "$a$ occurs at least $n$ times" and "$a$ occurs exactly $n$ times." These are examples of the *bounded existence pattern with global scope* in a widely used property specification pattern system [5]. LTL formulas in this category include $\mathbf{G}\neg a$ ($a$ occurs 0 times), $\mathbf{F} a$ ($a$ occurs at least once), and $\mathbf{F}(a \wedge \mathbf{X}\mathbf{F} a)$ ($a$ occurs at least twice).


On the other hand, the property "$a$ occurs at time 0" (LTL formula $a$) is not $\{a\}$-interruptible. Neither is "an event other than $a$ occurs at least once" ($\mathbf{F}\neg a$) nor "only $a$ occurs" ($\mathbf{G} a$). The property "every occurrence of $a$ is followed immediately by $b$," formula $\mathbf{G}(a \rightarrow \mathbf{X} b)$, is not $\{a, b\}$-interruptible. The property "after any occurrence of $a$, $c$ eventually occurs and until then only $b$ occurs," $\mathbf{G}(a \rightarrow \mathbf{X}(b \mathbf{U} c))$, is not $\{a, b, c\}$-interruptible.
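The contrast between the positive and negative examples can be checked on a concrete pair of words. The sketch below assumes words of the form prefix followed by an infinitely repeated loop, and it picks loops without visible actions so that the projection onto $V$ is just the projection of the prefix; the predicates `a_at_time_zero` and `a_at_most` are hand-written stand-ins for the properties discussed above.

```python
def project(word, V):
    """Projection of a finite sequence onto the visible set V."""
    return [x for x in word if x in V]

def a_at_time_zero(prefix, loop):
    """The property "a occurs at time 0" on the word prefix . loop^omega."""
    return (list(prefix) + list(loop))[0] == "a"

def a_at_most(n):
    """The property "a occurs at most n times" on the word prefix . loop^omega."""
    def prop(prefix, loop):
        if "a" in loop:          # a would occur infinitely often
            return False
        return list(prefix).count("a") <= n
    return prop

V = {"a"}
zeta = (["a"], ["b"])        # a b b b ...
eta = (["b", "a"], ["b"])    # b a b b ...   (an invisible b inserted in front)

# Both loops contain no visible action, so the projections of the infinite
# words are determined by the prefixes alone.
same_projection = (
    project(zeta[0], V) == project(eta[0], V)
    and not project(zeta[1], V)
    and not project(eta[1], V)
)

print(same_projection)
print(a_at_time_zero(*zeta), a_at_time_zero(*eta))
print(a_at_most(1)(*zeta), a_at_most(1)(*eta))
```

The two words agree on their visible actions, yet "a occurs at time 0" separates them, which witnesses that the property is not $\{a\}$-interruptible; the counting property gives the same answer on both.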

The following provides a useful way to show that two interruptible properties are equal:

**Lemma 2.** *Suppose* <sup>V</sup> <sup>⊆</sup> <sup>A</sup> <sup>⊆</sup> Act *and* <sup>P</sup><sup>1</sup> *and* <sup>P</sup><sup>2</sup> *are* <sup>V</sup> *-interruptible properties over* <sup>A</sup>*. Let* <sup>F</sup> <sup>=</sup> <sup>V</sup> <sup>ω</sup> <sup>∪</sup> <sup>V</sup> <sup>∗</sup> ◦ (<sup>A</sup> \ <sup>V</sup> )<sup>ω</sup>. *Then* <sup>P</sup><sup>1</sup> <sup>=</sup> <sup>P</sup><sup>2</sup> *iff* <sup>P</sup><sup>1</sup> ∩ F <sup>=</sup> <sup>P</sup><sup>2</sup> ∩ F*.*

*Proof.* The forward implication is immediate. For the converse, assume P<sub>1</sub> ∩ F = P<sub>2</sub> ∩ F; by symmetry it suffices to show P<sub>1</sub> ⊆ P<sub>2</sub>. Let ζ ∈ P<sub>1</sub>. If ζ|<sub>V</sub> is infinite, then since (ζ|<sub>V</sub>)|<sub>V</sub> = ζ|<sub>V</sub> and P<sub>1</sub> is V-interruptible, ζ|<sub>V</sub> ∈ P<sub>1</sub>. But ζ|<sub>V</sub> ∈ V<sup>ω</sup>, so ζ|<sub>V</sub> ∈ P<sub>1</sub> ∩ F, and therefore ζ|<sub>V</sub> ∈ P<sub>2</sub>. Since P<sub>2</sub> is V-interruptible, ζ ∈ P<sub>2</sub>.

If ζ|<sub>V</sub> is finite, there is a prefix θ of ζ such that ζ = θ ◦ η, with η ∈ (A \ V)<sup>ω</sup>. Let ξ = θ|<sub>V</sub> ◦ η. We have ξ ∈ V<sup>∗</sup> ◦ (A \ V)<sup>ω</sup> and ξ|<sub>V</sub> = ζ|<sub>V</sub>, hence ξ ∈ P<sub>1</sub> ∩ F. Therefore ξ ∈ P<sub>2</sub>, and since P<sub>2</sub> is V-interruptible, ζ ∈ P<sub>2</sub>.

The elements of F are known as the V-*interrupt-free* words over A.

#### **3.2 Decidability of Interruptibility of LTL Formulas**

We next show that interruptibility is a decidable property of LTL formulas. Define intrpt: Form → Form as follows. Given f ∈ Form, let V = αf and V̂ = ⋁<sub>a∈V</sub> a, and define β: Form → Form by

$$\begin{aligned} \beta(\text{true}) &= \text{true} \\ \beta(a) &= (\neg \hat{V}) \mathbf{U} a \\ \beta(\neg f\_1) &= \neg \beta(f\_1) \\ \beta(f\_1 \land f\_2) &= \beta(f\_1) \land \beta(f\_2) \\ \beta(\mathbf{X} f\_1) &= ((\neg \hat{V}) \mathbf{U} (\hat{V} \land \mathbf{X} \beta(f\_1))) \lor ((\mathbf{G} \neg \hat{V}) \land \mathbf{X} \beta(f\_1)) \\ \beta(f\_1 \mathbf{U} f\_2) &= \beta(f\_1) \mathbf{U} \beta(f\_2) .\end{aligned}$$

for a ∈ Act and f<sub>1</sub>, f<sub>2</sub> ∈ Form. Let intrpt(f) = β(f).
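The clauses of β translate directly into a recursive function. The following is a minimal sketch, assuming formulas are encoded as nested tuples; the encoding, the function names, and the use of derived `"or"` and `"G"` nodes in the output are illustrative choices, not part of the original development.

```python
# Formulas as nested tuples (an encoding assumed for this sketch):
#   ("true",), ("ap", a), ("not", f), ("and", f, g), ("X", f), ("U", f, g)
# The output of beta additionally uses the derived nodes ("or", f, g) and ("G", f).

def alphabet(f):
    """Actions occurring in f (written αf in the text)."""
    if f[0] == "ap":
        return {f[1]}
    acts = set()
    for sub in f[1:]:
        acts |= alphabet(sub)
    return acts


def v_hat(V):
    """Disjunction of the actions in V; 'not true' stands in for an empty disjunction."""
    acts = sorted(V)
    if not acts:
        return ("not", ("true",))
    result = ("ap", acts[0])
    for a in acts[1:]:
        result = ("or", result, ("ap", a))
    return result


def beta(f, vh):
    """Apply the six defining clauses of beta, with vh playing the role of V-hat."""
    tag = f[0]
    if tag == "true":
        return f
    if tag == "ap":
        return ("U", ("not", vh), f)
    if tag == "not":
        return ("not", beta(f[1], vh))
    if tag == "and":
        return ("and", beta(f[1], vh), beta(f[2], vh))
    if tag == "X":
        nxt = ("X", beta(f[1], vh))
        return ("or",
                ("U", ("not", vh), ("and", vh, nxt)),
                ("and", ("G", ("not", vh)), nxt))
    if tag == "U":
        return ("U", beta(f[1], vh), beta(f[2], vh))
    raise ValueError(f"unsupported formula node: {tag}")


def intrpt(f):
    """intrpt(f) = beta(f), with V taken to be the alphabet of f."""
    return beta(f, v_hat(alphabet(f)))
```

For instance, with V = {a} the atom a is mapped to (¬a)**U**a, and **X**a is mapped to the two-clause disjunction from the definition.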

**Theorem 1.** *Let* f *be an LTL formula over* Act*. The following hold:*

1. intrpt(f) *is interruptible.*
2. f *is interruptible iff* intrpt(f) ≡<sub>A</sub> f*.*

*In particular, interruptibility of LTL formulas is decidable.*

Before proving Theorem 1, we give some intuition regarding the definition of intrpt. Function β can be thought of as consuming a property on V-interrupt-free words (i.e., words in V<sup>ω</sup> ∪ V<sup>∗</sup> ◦ (A \ V)<sup>ω</sup>) and extending it to a property on all words (A<sup>ω</sup>). It is designed so that β(g) is V-interruptible and agrees with g on V-interrupt-free words. For example, the formula a means "a is the first action" (in an interrupt-free word), which extends to the property "a is the first visible action" (in an arbitrary word). The formula **X**f<sub>1</sub> states "f<sub>1</sub> holds after removing the first action," so β(**X**f<sub>1</sub>) should declare "β(f<sub>1</sub>) holds after removing the prefix ending in the first visible action." That is almost correct, but there is also the possibility that an element of A<sup>ω</sup> has no visible action, which is the reason for the second clause in the definition of β(**X**f<sub>1</sub>).

The remainder of this subsection is devoted to the proof of Theorem 1. First note that intrpt(f) and f have the same alphabet, i.e., αintrpt(f) = V.

**Proof of Part 1.** Say a subformula g of f is *good* if β(g) is V-interruptible, i.e.,

$$
\forall \zeta, \eta \in \mathsf{Act}^{\omega}.\; \zeta|_{V} = \eta|_{V} \Rightarrow (\zeta \models_{\mathsf{A}} \beta(g) \Leftrightarrow \eta \models_{\mathsf{A}} \beta(g)).
$$

We show by induction on formula structure that every subformula of f is good. The case g = f will show that intrpt(f) is interruptible. Assume throughout that ζ|<sub>V</sub> = η|<sub>V</sub>.

If g = true then β(g) = true, so g is clearly good.

If g = a for some a ∈ Act, then β(g) = (¬V̂)**U**a, and ζ ⊨<sub>A</sub> β(g) iff ζ|<sub>V</sub> is non-empty and ζ|<sub>V</sub>(0) = a. Since this depends only on ζ|<sub>V</sub>, g is good.

If g = ¬f<sub>1</sub> and f<sub>1</sub> is good, then g is good because

$$
\zeta \models_{\mathsf{A}} \beta(g) \Leftrightarrow \zeta \not\models_{\mathsf{A}} \beta(f_1) \Leftrightarrow \eta \not\models_{\mathsf{A}} \beta(f_1) \Leftrightarrow \eta \models_{\mathsf{A}} \beta(g).
$$

If g = f<sub>1</sub> ∧ f<sub>2</sub>, and f<sub>1</sub> and f<sub>2</sub> are good, then g is good because

$$\begin{aligned} \zeta \models_{\mathsf{A}} \beta(g) &\Leftrightarrow \zeta \models_{\mathsf{A}} \beta(f_1) \wedge \zeta \models_{\mathsf{A}} \beta(f_2) \\ &\Leftrightarrow \eta \models_{\mathsf{A}} \beta(f_1) \wedge \eta \models_{\mathsf{A}} \beta(f_2) \Leftrightarrow \eta \models_{\mathsf{A}} \beta(g) .\end{aligned}$$

Suppose g = **X**f<sub>1</sub> and f<sub>1</sub> is good. There are two cases:

– **Case 1:** ζ|<sub>V</sub> is empty. Then no suffix of ζ or η satisfies V̂. Hence

$$\theta \models_{\mathsf{A}} \beta(g) \Leftrightarrow \theta \models_{\mathsf{A}} \mathbf{X}\beta(f_1) \Leftrightarrow \theta^1 \models_{\mathsf{A}} \beta(f_1) \qquad (\theta \in \{\zeta, \eta\}).$$

Moreover, ζ<sup>1</sup>|<sub>V</sub> = η<sup>1</sup>|<sub>V</sub> (as both are empty), and f<sub>1</sub> is good, so we have ζ<sup>1</sup> ⊨<sub>A</sub> β(f<sub>1</sub>) ⇔ η<sup>1</sup> ⊨<sub>A</sub> β(f<sub>1</sub>). These show ζ ⊨<sub>A</sub> β(g) ⇔ η ⊨<sub>A</sub> β(g).

– **Case 2:** ζ|<sub>V</sub> is nonempty. Let i be the index of the first occurrence of an element of V in ζ, and j the corresponding index for η. We have

$$
\zeta^{i+1}|_V = \left(\zeta|_V\right)^1 = \left(\eta|_V\right)^1 = \eta^{j+1}|_V.
$$

As f<sub>1</sub> is good, it follows that ζ<sup>i+1</sup> ⊨<sub>A</sub> β(f<sub>1</sub>) ⇔ η<sup>j+1</sup> ⊨<sub>A</sub> β(f<sub>1</sub>). Hence

$$
\zeta \models_{\mathsf{A}} \beta(g) \Leftrightarrow \zeta^{i+1} \models_{\mathsf{A}} \beta(f_1) \Leftrightarrow \eta^{j+1} \models_{\mathsf{A}} \beta(f_1) \Leftrightarrow \eta \models_{\mathsf{A}} \beta(g) .
$$

Suppose g = f<sub>1</sub>**U**f<sub>2</sub> and f<sub>1</sub> and f<sub>2</sub> are good. We have β(g) = β(f<sub>1</sub>)**U**β(f<sub>2</sub>). If ζ ⊨<sub>A</sub> β(g) then there exists i ≥ 0 such that ζ<sup>i</sup> ⊨<sub>A</sub> β(f<sub>2</sub>) and ζ<sup>j</sup> ⊨<sub>A</sub> β(f<sub>1</sub>) for j < i. Now there is some i′ ≥ 0 such that η<sup>i′</sup>|<sub>V</sub> = ζ<sup>i</sup>|<sub>V</sub> and, for all j′ < i′, there is some j < i such that η<sup>j′</sup>|<sub>V</sub> = ζ<sup>j</sup>|<sub>V</sub>. Since f<sub>1</sub> and f<sub>2</sub> are good, it follows that η ⊨<sub>A</sub> β(g); the converse holds by symmetry. Hence g is good.

**Proof of Part 2.** Suppose first that intrpt(f) ≡<sub>A</sub> f. From part 1, intrpt(f) is interruptible, so Proposition 4(5) implies f is interruptible.

Suppose instead that f is interruptible. We wish to show intrpt(f) ≡<sub>A</sub> f. By Lemma 2, it suffices to show the two formulas agree on V-interrupt-free words. We will show by induction that for each subformula g of f, ζ ⊨<sub>A</sub> g ⇔ ζ ⊨<sub>A</sub> β(g) for all V-interrupt-free ζ. The case g = f will complete the proof.

If g = true, β(g) = true and the condition clearly holds.

If g = a for some a ∈ Act, ζ ⊨<sub>A</sub> β(g) ⇔ ζ ⊨<sub>A</sub> (¬V̂)**U**a ⇔ ζ ⊨<sub>A</sub> a, as ζ is V-interrupt-free.

If g = ¬f<sub>1</sub> and the inductive hypothesis holds for f<sub>1</sub>, then

$$\zeta \models_{\mathsf{A}} \beta(g) \Leftrightarrow \zeta \not\models_{\mathsf{A}} \beta(f_1) \Leftrightarrow \zeta \not\models_{\mathsf{A}} f_1 \Leftrightarrow \zeta \models_{\mathsf{A}} g.$$

If g = f<sub>1</sub> ∧ f<sub>2</sub> and the inductive hypothesis holds for f<sub>1</sub> and f<sub>2</sub>, then

$$\zeta \models_{\mathsf{A}} \beta(g) \Leftrightarrow \zeta \models_{\mathsf{A}} \beta(f_1) \land \zeta \models_{\mathsf{A}} \beta(f_2) \Leftrightarrow \zeta \models_{\mathsf{A}} f_1 \land \zeta \models_{\mathsf{A}} f_2 \Leftrightarrow \zeta \models_{\mathsf{A}} g.$$

Suppose g = **X**f<sub>1</sub> and the inductive hypothesis holds for f<sub>1</sub>. Note that any suffix of a V-interrupt-free word, e.g., ζ<sup>1</sup>, is also V-interrupt-free. If ζ|<sub>V</sub> is empty,

$$\zeta \models_{\mathsf{A}} \beta(g) \Leftrightarrow \zeta \models_{\mathsf{A}} \mathbf{X}\beta(f_1) \Leftrightarrow \zeta^{1} \models_{\mathsf{A}} \beta(f_1) \Leftrightarrow \zeta^{1} \models_{\mathsf{A}} f_1 \Leftrightarrow \zeta \models_{\mathsf{A}} g.$$

If ζ|<sub>V</sub> is nonempty, then ζ ⊨<sub>A</sub> V̂, so

$$\begin{aligned} \zeta \models_{\mathsf{A}} \beta(g) &\Leftrightarrow \zeta \models_{\mathsf{A}} (\neg \hat{V}) \mathbf{U} (\hat{V} \wedge \mathbf{X} \beta(f_1)) \Leftrightarrow \zeta \models_{\mathsf{A}} \mathbf{X} \beta(f_1) \\ &\Leftrightarrow \zeta^{1} \models_{\mathsf{A}} \beta(f_1) \Leftrightarrow \zeta^{1} \models_{\mathsf{A}} f_1 \Leftrightarrow \zeta \models_{\mathsf{A}} g. \end{aligned}$$

If g = f<sub>1</sub>**U**f<sub>2</sub>, then applying the inductive hypothesis to f<sub>1</sub> and f<sub>2</sub> yields

$$\begin{aligned} \zeta \models_{\mathsf{A}} g &\Leftrightarrow \exists i \geq 0 \, . \, \zeta^{i} \models_{\mathsf{A}} f_2 \land \forall j < i \, . \, \zeta^{j} \models_{\mathsf{A}} f_1 \\ &\Leftrightarrow \exists i \geq 0 \, . \, \zeta^{i} \models_{\mathsf{A}} \beta(f_2) \land \forall j < i \, . \, \zeta^{j} \models_{\mathsf{A}} \beta(f_1) \\ &\Leftrightarrow \zeta \models_{\mathsf{A}} \beta(g) . \end{aligned}$$

Decidability follows from part 2 and Proposition 1. This completes the proof of Theorem 1.

*Remark 1.* The definition of β(**X**f<sub>1</sub>) is convenient for the proof but shorter definitions also work. If the formula f<sub>1</sub> is satisfied by some word ζ ∈ (A \ V)<sup>ω</sup>, then all such ζ satisfy f<sub>1</sub>, and the clause (**G**¬V̂) ∧ **X**β(f<sub>1</sub>) can be replaced by **G**¬V̂. Otherwise, that clause can be removed altogether. One can determine whether a formula is satisfied by such a word by replacing every occurrence of every action with false.
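One way to read the last sentence of Remark 1 is as a purely Boolean evaluation. On a word consisting only of invisible actions, every action atom is false at every position, so all suffixes satisfy the same subformulas; under that observation **X**f reduces to f and f<sub>1</sub>**U**f<sub>2</sub> reduces to f<sub>2</sub>. The sketch below assumes a nested-tuple encoding of formulas and a hypothetical function name; it is an illustration of the idea rather than the authors' procedure.

```python
# Formulas as nested tuples (an encoding assumed for this sketch):
#   ("true",), ("ap", a), ("not", f), ("and", f, g), ("X", f), ("U", f, g)

def holds_on_invisible_word(f):
    """Truth value of f on any word in which no action of f ever occurs."""
    tag = f[0]
    if tag == "true":
        return True
    if tag == "ap":
        return False                      # every action atom is replaced by false
    if tag == "not":
        return not holds_on_invisible_word(f[1])
    if tag == "and":
        return holds_on_invisible_word(f[1]) and holds_on_invisible_word(f[2])
    if tag == "X":
        return holds_on_invisible_word(f[1])   # all suffixes look alike
    if tag == "U":
        return holds_on_invisible_word(f[2])   # witness i = 0 works iff f2 holds
    raise ValueError(f"unsupported formula node: {tag}")
```

With **F**g encoded as true **U** g and **G**h as ¬(true **U** ¬h), the function reports that **G**¬a holds on such words while **F**a does not.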

#### **3.3 Generation of Interruptible LTL Formulas**

The following can be used to show that many formulas are interruptible. It establishes a kind of parity pattern involving a class of *positive* formulas (Pos) and a class of *negative* formulas (Neg). It is proved in [28].

**Proposition 5.** *There exist* Pos, Neg ⊆ Form *such that (i) for all* f, f′ ∈ Form*,*

$$\begin{aligned} (f \in \mathsf{Pos} \land f' \equiv\_{\mathsf{A}} f) &\Rightarrow f' \in \mathsf{Pos} \\ (f \in \mathsf{Neg} \land f' \equiv\_{\mathsf{A}} f) &\Rightarrow f' \in \mathsf{Neg}, \end{aligned}$$

*and (ii) for all* a ∈ Act*,* f<sub>1</sub>, f<sub>2</sub> ∈ Intrpt*,* g<sub>1</sub>, g<sub>2</sub> ∈ Pos*, and* h<sub>1</sub>, h<sub>2</sub> ∈ Neg*,*

$$\begin{aligned}
\mathit{false},\ a,\ \neg h_1,\ g_1 \land g_2,\ g_1 \lor g_2,\ a \land f_1,\ a \land \mathbf{X}f_1 &\in \mathsf{Pos} \\
\mathit{true},\ \neg a,\ \neg g_1,\ h_1 \land h_2,\ h_1 \lor h_2,\ \neg a \lor f_1,\ \neg a \lor \mathbf{X}f_1 &\in \mathsf{Neg} \\
\mathit{true},\ \mathit{false},\ f_1 \land f_2,\ f_1 \lor f_2,\ \neg f_1,\ \mathbf{F}g_1,\ \mathbf{G}h_1,\ f_1\mathbf{U}f_2,\ h_1\mathbf{U}g_1,\ h_1\mathbf{U}f_1 &\in \mathsf{Intrpt}.
\end{aligned}$$

Consider the examples from Sect. 3.1. The formula a is positive, so **F**a is interruptible. Since ¬a is negative, **G**¬a is interruptible. Since **F**a is interruptible, a ∧ **XF**a is positive, hence **F**(a ∧ **XF**a) is interruptible.

Formula **G**(a → **F**b) is seen to be interruptible as follows. Since b ∈ Pos, **F**b ∈ Intrpt, whence ¬a ∨ **F**b ∈ Neg. Since this last formula is action-equivalent to a → **F**b, we have a → **F**b ∈ Neg. Therefore **G**(a → **F**b) ∈ Intrpt.

Similarly, (¬b)**U**c ∈ Intrpt, so a → **X**((¬b)**U**c) ∈ Neg. This negative formula is action-equivalent to a → ((¬b)**U**c), whence **G**(a → ((¬b)**U**c)) ∈ Intrpt.
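The membership rules of Proposition 5(ii) can be applied mechanically. The sketch below is a purely syntactic classifier under an assumed tuple encoding of formulas; it ignores closure under action-equivalence (part (i)), so a `False` flag only means "not derivable by these rules", not "not in the class". The function name and encoding are hypothetical.

```python
# Formulas as nested tuples (an encoding assumed for this sketch):
#   ("true",), ("false",), ("ap", a), ("not", f), ("and", f, g), ("or", f, g),
#   ("X", f), ("F", f), ("G", f), ("U", f, g)

def classify(f):
    """Return (pos, neg, intr): which memberships follow from the rules in (ii)."""
    tag = f[0]
    if tag == "true":
        return (False, True, True)
    if tag == "false":
        return (True, False, True)
    if tag == "ap":
        return (True, False, False)
    if tag == "not":
        p, n, i = classify(f[1])
        return (n, p, i)          # ¬h ∈ Pos, ¬g ∈ Neg, ¬f1 ∈ Intrpt
    if tag in ("and", "or"):
        p1, n1, i1 = classify(f[1])
        p2, n2, i2 = classify(f[2])
        pos, neg, intr = p1 and p2, n1 and n2, i1 and i2
        # Rules a ∧ f1, a ∧ X f1 (Pos) and ¬a ∨ f1, ¬a ∨ X f1 (Neg), f1 ∈ Intrpt.
        rest_intr = classify(f[2][1])[2] if f[2][0] == "X" else i2
        if tag == "and" and f[1][0] == "ap" and rest_intr:
            pos = True
        if tag == "or" and f[1][0] == "not" and f[1][1][0] == "ap" and rest_intr:
            neg = True
        return (pos, neg, intr)
    if tag == "F":
        return (False, False, classify(f[1])[0])
    if tag == "G":
        return (False, False, classify(f[1])[1])
    if tag == "U":
        p1, n1, i1 = classify(f[1])
        p2, n2, i2 = classify(f[2])
        return (False, False, (i1 and i2) or (n1 and (p2 or i2)))
    if tag == "X":
        return (False, False, False)
    raise ValueError(f"unsupported formula node: {tag}")
```

Applied to the examples above, the classifier derives interruptibility of **F**a, **G**¬a, **F**(a ∧ **XF**a), and **G**(¬a ∨ **F**b), and derives nothing for a, **X**a, or **G**a.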

Note that Intrpt and the set of stutter-invariant formulas are not comparable. For example, f = **F**(a ∧ **XF**a) is interruptible, but not stutter-invariant. In fact f is not action-equivalent to any stutter-invariant formula g, since if there were such a g, the sequence aab<sup>ω</sup> would satisfy g, but the stutter-equivalent sequence ab<sup>ω</sup> cannot satisfy g. Conversely, the formulas a and **G**a are both stutter-invariant, but neither is interruptible. The formula **F**a is both stutter-invariant and interruptible. Finally, the formula **X**a is neither stutter-invariant nor interruptible.

#### **3.4 Decidability of Interruptibility of Büchi Automata**

**Definition 11.** Let B be a BA with alphabet A, V ⊆ A (the *visible* actions), and I = A \ V (the *invisible* actions). We say B is in V*-interrupt normal form* if the following hold for any x ∈ I, a ∈ A, and states s<sub>1</sub>, s<sub>2</sub>, and s<sub>3</sub>:

1. If s<sub>1</sub> <sup>a</sup>→ s<sub>2</sub> then B has a state s′<sub>1</sub> such that s<sub>1</sub> <sup>x</sup>→ s′<sub>1</sub> <sup>a</sup>→ s<sub>2</sub>.
2. If s<sub>1</sub> <sup>x</sup>→ s<sub>2</sub> <sup>a</sup>→ s<sub>3</sub> then s<sub>1</sub> <sup>a</sup>→ s<sub>3</sub>, and if s<sub>2</sub> is accepting then s<sub>1</sub> or s<sub>3</sub> is accepting.
3. If s<sub>1</sub> <sup>x</sup>→ s<sub>2</sub> then s<sub>1</sub> <sup>y</sup>→ s<sub>2</sub> for all y ∈ I.
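The three conditions of Definition 11 can be checked directly on a finite automaton. The following is a minimal sketch, assuming the transition relation is given as a set of (source, label, target) triples; the function name and data representation are illustrative assumptions.

```python
def in_interrupt_normal_form(T, accepting, A, V):
    """Check conditions 1-3 of V-interrupt normal form on a finite transition set T."""
    T = set(T)
    I = set(A) - set(V)                      # invisible actions

    def succ(s, a):
        return {t for (u, b, t) in T if u == s and b == a}

    # Condition 1: every transition can be preceded by any invisible action.
    for (s1, a, s2) in T:
        for x in I:
            if not any((mid, a, s2) in T for mid in succ(s1, x)):
                return False

    # Condition 2: an invisible step followed by any step can be short-cut,
    # without losing an accepting state.
    for (s1, x, s2) in T:
        if x not in I:
            continue
        for (u, a, s3) in T:
            if u != s2:
                continue
            if (s1, a, s3) not in T:
                return False
            if s2 in accepting and s1 not in accepting and s3 not in accepting:
                return False

    # Condition 3: invisible labels are interchangeable.
    for (s1, x, s2) in T:
        if x in I and any((s1, y, s2) not in T for y in I):
            return False
    return True
```

The quantification over all x ∈ I in condition 1 is taken literally here; for automata with several invisible actions this is what forces every state with an outgoing transition to accept each of them.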

**Proposition 6.** *Suppose* B *is in* V*-interrupt normal form. Then* L(B) *is* V*-interruptible.*

*Proof.* Suppose ζ, η ∈ A<sup>ω</sup>, ζ ∈ L(B), and ζ|<sub>V</sub> = η|<sub>V</sub>. We wish to show η ∈ L(B). Let π be an accepting path for ζ.

Assume ζ|<sub>V</sub> is infinite. By Definition 11(2), we can remove all invisible transitions from the accepting path π, and the result is an accepting path that spells ζ|<sub>V</sub>. By Definition 11(1), we can insert an arbitrary finite sequence of invisible transitions between two consecutive visible transitions; we can therefore construct an accepting path for η.

If ζ|<sub>V</sub> is finite, proceed as above to form an accepting path which spells a finite prefix of η followed by an infinite word of invisible actions. By Definition 11(3), that infinite suffix can be transformed to spell any infinite word of invisibles, and in that way one obtains an accepting path for η.

Given any BA B = (S, A, T, S<sup>0</sup>, F) and a visible set V ⊆ A, define a BA norm(B, V) as follows: if V = A, norm(B, V) = B; otherwise norm(B, V) is B̂ = (Ŝ, A, T̂, Ŝ<sup>0</sup>, F̂), where

$$\begin{aligned}
D &= \{ s \in S \mid \text{there is an accepting path from } s \text{ with all labels in } I \} \\
\hat{S} &= \{ \hat{u} \mid u \in S \} \cup \{ u^{\sharp} \mid u \in F \setminus D \} \cup \{ \mathrm{DIV} \} \\
\hat{S}^0 &= \{ \hat{u} \mid u \in S^0 \} \\
\hat{F} &= \{ \hat{u} \mid u \in F \} \cup \{ \mathrm{DIV} \} \\
\hat{T} &= \{ (\hat{u}, a, \hat{v}) \mid a \in V \land u, v \in S \land (u, a, v) \in T \} \\
&\quad \cup \{ (\hat{u}, x, \hat{u}) \mid x \in I \land u \in D \cup (S \setminus F) \} \\
&\quad \cup \{ (\mathrm{DIV}, x, \mathrm{DIV}) \mid x \in I \} \\
&\quad \cup \{ (\hat{u}, x, \mathrm{DIV}) \mid x \in I \land u \in D \setminus F \} \\
&\quad \cup \{ (\hat{u}, x, u^{\sharp}), (u^{\sharp}, x, u^{\sharp}) \mid x \in I \land u \in F \setminus D \} \\
&\quad \cup \{ (u^{\sharp}, a, \hat{v}) \mid a \in V \land u \in F \setminus D \land v \in S \land (u, a, v) \in T \}
\end{aligned}$$

The set Ŝ consists of the *original states* û, the *sharp states* u<sup>♯</sup>, and one additional state DIV. The mapping from S to Ŝ defined by u ↦ û is injective and preserves acceptability and visible transitions, i.e., for any u, v ∈ S and a ∈ V, u <sup>a</sup>→ v ⇔ û <sup>a</sup>→ v̂. It follows that paths in B in which all labels are visible correspond one-to-one with paths through original states in B̂ in which all labels are visible. Note that every invisible transition in B̂ is a self-loop or ends in a sharp state or DIV. Moreover, all transitions in B̂ ending in a sharp state or DIV are invisible.
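The construction of norm(B, V) can be sketched for finite automata as follows. The representation is an assumption of this sketch: original states are tagged tuples `("orig", u)`, sharp states are `("sharp", u)`, and the set D is computed by looking for accepting states that lie on a cycle of invisible transitions. All function and variable names are hypothetical.

```python
DIV = "DIV"

def norm(S, A, T, S0, F, V):
    """Return (S_hat, A, T_hat, S0_hat, F_hat) following the definition of norm(B, V)."""
    S, A, T, S0, F, V = map(set, (S, A, T, S0, F, V))
    I = A - V
    if not I:                               # V = A: B is returned unchanged
        return S, A, T, S0, F

    # Successor map restricted to invisible transitions.
    inv = {}
    for (u, x, v) in T:
        if x in I:
            inv.setdefault(u, set()).add(v)

    def reach(start):
        """States reachable from start by one or more invisible transitions."""
        seen, todo = set(), list(inv.get(start, ()))
        while todo:
            s = todo.pop()
            if s not in seen:
                seen.add(s)
                todo.extend(inv.get(s, ()))
        return seen

    reach1 = {s: reach(s) for s in S}
    # An accepting state on an invisible cycle can be visited infinitely often.
    recurrent = {f for f in F if f in reach1[f]}
    D = {s for s in S if s in recurrent or reach1[s] & recurrent}

    hat = lambda u: ("orig", u)
    sharp = lambda u: ("sharp", u)
    T_hat = set()
    for (u, a, v) in T:
        if a in V:
            T_hat.add((hat(u), a, hat(v)))
            if u in F - D:
                T_hat.add((sharp(u), a, hat(v)))
    for x in I:
        T_hat.add((DIV, x, DIV))
        for u in S:
            if u in D or u not in F:
                T_hat.add((hat(u), x, hat(u)))
            if u in D - F:
                T_hat.add((hat(u), x, DIV))
            if u in F - D:
                T_hat.add((hat(u), x, sharp(u)))
                T_hat.add((sharp(u), x, sharp(u)))

    S_hat = {hat(u) for u in S} | {sharp(u) for u in F - D} | {DIV}
    S0_hat = {hat(u) for u in S0}
    F_hat = {hat(u) for u in F} | {DIV}
    return S_hat, A, T_hat, S0_hat, F_hat
```

For a two-state automaton for **GF**b over {a, b} with V = {b}, the set D is empty, the accepting state receives a sharp copy, and DIV is present but unreachable.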

**Proposition 7.** *For any BA* B *with alphabet* A*, and any visible set* V ⊆ A*,* norm(B, V) *is in* V*-interrupt normal form.*

*Proof.* To see Definition 11(1), suppose s<sub>1</sub> <sup>a</sup>→ s<sub>2</sub>. If s<sub>1</sub> <sup>x</sup>→ s<sub>1</sub>, take s′<sub>1</sub> = s<sub>1</sub>. Otherwise, s<sub>1</sub> = û for some u ∈ F \ D, and we can take s′<sub>1</sub> = u<sup>♯</sup>.

For Definition 11(2), suppose s<sub>1</sub> <sup>x</sup>→ s<sub>2</sub> <sup>a</sup>→ s<sub>3</sub>. We need to show s<sub>1</sub> <sup>a</sup>→ s<sub>3</sub> and if s<sub>2</sub> is accepting then s<sub>1</sub> or s<sub>3</sub> is accepting. If s<sub>1</sub> = s<sub>2</sub>, the result is clear, so assume s<sub>1</sub> ≠ s<sub>2</sub>. There are then two cases: s<sub>2</sub> = DIV or s<sub>2</sub> = u<sup>♯</sup> for some u ∈ F \ D.

If s<sub>2</sub> = DIV, then a ∈ I and s<sub>3</sub> = DIV, and we have s<sub>1</sub> <sup>a</sup>→ DIV. As DIV is accepting, the desired conclusion holds.

If s<sub>2</sub> = u<sup>♯</sup>, then s<sub>1</sub> = û, which is accepting. There are again two cases: either s<sub>3</sub> = u<sup>♯</sup> or s<sub>3</sub> = v̂ for some v ∈ S. If s<sub>3</sub> = u<sup>♯</sup> then a ∈ I and û <sup>a</sup>→ u<sup>♯</sup>, as required. If s<sub>3</sub> = v̂, then a ∈ V and therefore u <sup>a</sup>→ v, hence û <sup>a</sup>→ v̂, as required.

Definition 11(3) is clear from the definition of T̂.

**Theorem 2.** L(B) *is* V*-interruptible iff* L(norm(B, V)) = L(B)*. In particular, interruptibility for Büchi Automata is decidable.*

*Proof.* Let P<sub>1</sub> = L(B) and P<sub>2</sub> = L(norm(B, V)). By Proposition 7, norm(B, V) is in V-interrupt normal form, so by Proposition 6, P<sub>2</sub> is V-interruptible. Hence one direction is clear: if P<sub>1</sub> = P<sub>2</sub>, then P<sub>1</sub> is V-interruptible.

So suppose P<sub>1</sub> is V-interruptible. We wish to show P<sub>1</sub> = P<sub>2</sub>. By Lemma 2, it suffices to show the two languages contain the same V-interrupt-free words.

Suppose ζ is a V-interrupt-free word in P<sub>1</sub>. If ζ ∈ V<sup>ω</sup> then an accepting path θ in B maps to the accepting path θ̂ in B̂, and ζ ∈ P<sub>2</sub>. So assume ζ ∈ V<sup>∗</sup>I<sup>ω</sup>. Then an accepting path in B has a prefix θ of visible transitions ending in a state u ∈ D. That prefix corresponds to a path θ̂ in B̂ ending in û. As u ∈ D, û <sup>x</sup>→ û for all x ∈ I. If u is accepting, we get an accepting path for ζ that follows θ̂ and then loops at û. If u is not accepting then u ∈ D \ F, and û <sup>x</sup>→ DIV for all x ∈ I. Since DIV is accepting and DIV <sup>x</sup>→ DIV for all x ∈ I, we again get an accepting path for ζ in B̂.

Suppose now that ζ is a V-interrupt-free word in P<sub>2</sub>. Assume ζ ∈ V<sup>ω</sup>. An accepting path for ζ cannot pass through a sharp state or DIV, because only invisible transitions end in those states. So the path passes through only original states, and therefore corresponds to an accepting path in B.

Suppose ζ ∈ V<sup>∗</sup>I<sup>ω</sup>. An accepting path for ζ in B̂ consists of a prefix θ̂ of visible transitions followed by an infinite accepting path ξ of invisible transitions. As above, θ̂ corresponds to a path θ in B ending in a state u.

We claim that ξ cannot pass through a sharp state. This is because all invisible transitions departing from a sharp state are self loops. But sharp states are not accepting, while ξ is an accepting path of invisible transitions. It follows that each transition in ξ is a self-loop or terminates in DIV.

We now claim u ∈ D. For suppose the first transition in ξ is a self-loop on û. According to the definition of T̂, this implies u ∈ D ∪ (S \ F). Hence, if u ∉ D then u is not accepting, and all invisible transitions departing from û are self-loops, contradicting the fact that ξ is an accepting path. If, on the other hand, the first transition in ξ is û <sup>x</sup>→ DIV, for some x ∈ I, then the definition of T̂ implies u ∈ D, establishing the claim.

So u ∈ D, i.e., there is an accepting path ρ in B starting from u and consisting of all invisible transitions. The accepting path obtained by concatenating θ and ρ spells a word which, projected onto V, equals ζ|<sub>V</sub>. Since P<sub>1</sub> is V-interruptible, ζ ∈ P<sub>1</sub>. This completes the proof that P<sub>1</sub> = P<sub>2</sub>.

The theorem reduces the problem of determining V-interruptibility to a problem of determining equivalence of two Büchi Automata, which can be done using language intersection, complement, and emptiness algorithms for BAs [37].

#### **4 On-the-Fly Partial Order Reduction**

#### **4.1 General Theory and Soundness Theorem**

Let M = (Q, A, T, q<sup>0</sup>) be an LTS, V ⊆ A, and B = (S, A, δ, S<sup>0</sup>, F) a V-interruptible BA. The goal of on-the-fly POR is to explore a sub-automaton R′ of R = M ⊗ B with the property that L(R) = ∅ ⇔ L(R′) = ∅.

A function amp: Q × S → 2<sup>A</sup> is an *ample selector* if amp(q, s) ⊆ enabled(M, q) for all q ∈ Q, s ∈ S. Each amp(q, s) is an *ample set*. An ample selector determines a BA R′ = reduced(R, amp) which has the same states, accepting states, and initial states as R, but only a subset of the transitions:

$$\begin{aligned} R' &= (Q \times S, A, \delta', \{q^0\} \times S^0, Q \times F) \\ \delta' &= \{ ((q, s), a, (q', s')) \mid a \in \mathsf{amp}(q, s) \land (q, a, q') \in T \land (s, a, s') \in \delta \}. \end{aligned}$$
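The definition of δ′ amounts to filtering the product transitions by the ample selector. A minimal sketch, assuming transition relations are sets of triples and `amp` is any Python callable returning a set of actions (both representation choices are assumptions of this sketch):

```python
def reduced_transitions(T, delta, amp):
    """Transitions of reduced(R, amp): product steps whose label lies in amp(q, s)."""
    return {((q, s), a, (q2, s2))
            for (q, a, q2) in T            # LTS step on action a
            for (s, b, s2) in delta        # BA step on the same action
            if a == b and a in amp(q, s)}
```

For example, if the LTS offers a and b from state 0 and the selector always returns {a}, only the a-labeled product transition survives.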

We now define some constraints on an ample selector that will be used to guarantee the reduced product space has nonempty language if the full space does. First we need the usual notion of independence:

**Definition 12.** Let M be an LTS with alphabet A, and a, b ∈ A. We say a and b are *independent* if both of the following hold for all states q and q′ of M:

1. (q <sup>a</sup>→ q′ ∧ b ∈ enabled(M, q)) ⇒ b ∈ enabled(M, q′)
2. q <sup>ab</sup>→ q′ ⇔ q <sup>ba</sup>→ q′.

We say a and b are *dependent* if they are not independent. 
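For a finite LTS, the two conditions of Definition 12 can be checked by enumeration. The sketch below takes the definition literally, including the orientation of condition 1, and allows nondeterministic actions by working with sets of successor states; the helper names are hypothetical.

```python
def enabled(T, q):
    """Actions labelling some transition out of q."""
    return {a for (p, a, _) in T if p == q}


def after(T, q, word):
    """States reachable from q by following the given sequence of actions."""
    current = {q}
    for a in word:
        current = {r for (p, b, r) in T if p in current and b == a}
    return current


def independent(Q, T, a, b):
    """Check conditions 1 and 2 of the independence definition for all states."""
    for q in Q:
        # Condition 1: executing a does not disable b.
        if b in enabled(T, q):
            for q2 in after(T, q, [a]):
                if b not in enabled(T, q2):
                    return False
        # Condition 2: ab and ba lead to the same set of states.
        if after(T, q, [a, b]) != after(T, q, [b, a]):
            return False
    return True
```

On a four-state "diamond" in which a and b commute, the check succeeds; if the b-transition after a is missing, it fails at condition 1.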

Note that, in contrast with [1], we do not assume actions are deterministic. We can now define the four constraints:


**Theorem 3.** *Let* M *be an LTS with alphabet* A*,* V ⊆ A*,* B *a BA with alphabet* A *in* V*-interrupt normal form,* R = M ⊗ B*, and* amp *an ample selector satisfying* **C0***–***C3***. Then* L(reduced(R, amp)) = ∅ ⇔ L(R) = ∅*.*

The requirement that B be in interrupt normal form is necessary. A counterexample when that condition is not met is given in Fig. 1. Note a and b are independent, and a is invisible. The ample set for product states 0 and 1 is {a}; the ample set for product state 2 is {a, b}. Hence **C3** holds because a state on the sole cycle is fully enabled. After normalizing B (and removing unreachable states), this problem goes away: in any reduced space, the ample sets must retain the a-transitions, and state 0 must be fully enabled since it has an a-self-loop, so the accepting cycle involving the two states will remain.

The remainder of this section is devoted to the proof of Theorem 3. The proof is similar to that of the analogous theorem in the state-based case [27], but some changes are necessary and we include the proof for completeness.

Let θ be an accepting path in R. An infinite sequence of accepting paths π<sub>0</sub>, π<sub>1</sub>, ... will be constructed, where π<sub>0</sub> = θ. For each i ≥ 0, π<sub>i</sub> will be decomposed as η<sub>i</sub> ◦ θ<sub>i</sub>, where η<sub>i</sub> is a finite path of length i in R′, θ<sub>i</sub> is an infinite path, and η<sub>i</sub> is a prefix of η<sub>i+1</sub>. For i = 0, η<sub>0</sub> is empty and θ<sub>0</sub> = θ.

Assume i ≥ 0 and we have defined η<sub>j</sub> and θ<sub>j</sub> for j ≤ i. Write

$$\theta\_i \;= \; \langle q\_0, s\_0 \rangle \xrightarrow{a\_1} \langle q\_1, s\_1 \rangle \xrightarrow{a\_2} \cdots \tag{1}$$

Then η<sub>i+1</sub> and θ<sub>i+1</sub> are defined as follows. Let E = amp(q<sub>0</sub>, s<sub>0</sub>). There are two cases:

*Case 1:* a<sub>1</sub> ∈ E. Let η<sub>i+1</sub> be the path obtained by appending the first transition of θ<sub>i</sub> to η<sub>i</sub>, and θ<sub>i+1</sub> the path obtained by removing the first transition from θ<sub>i</sub>.

*Case 2:* a<sub>1</sub> ∉ E. Then there are two sub-cases:

*Case 2a:* Some operation in E occurs in θ<sub>i</sub>. Let n be the index of the first such occurrence. By **C1**, a<sub>j</sub> and a<sub>n</sub> are independent for 1 ≤ j < n. By repeated application of the independence property, there is a path in M of the form

$$q_0 \xrightarrow{a_n} q_1' \xrightarrow{a_1} q_2' \xrightarrow{a_2} \cdots \xrightarrow{a_{n-2}} q_{n-1}' \xrightarrow{a_{n-1}} q_n \xrightarrow{a_{n+1}} q_{n+1} \xrightarrow{a_{n+2}} \cdots$$

By **C2**, a<sub>n</sub> is invisible. By Definition 11, B has an accepting path of the form

$$s_0 \xrightarrow{a_n} s_0' \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_{n-2}} s_{n-2} \xrightarrow{a_{n-1}} s_{n-1} \xrightarrow{a_{n+1}} s_{n+1} \xrightarrow{a_{n+2}} \cdots$$

Composing these two paths yields a path in R. Removing the first transition (labeled a<sub>n</sub>) yields θ<sub>i+1</sub>. Appending that transition to η<sub>i</sub> yields η<sub>i+1</sub>.

**Fig. 1.** Counterexample to Theorem 3 if B is not in interrupt normal form: (a) the LTS M, (b) the BA B representing **GF**b, (c) the product space—dashed edges are in the full, but not reduced, space, and (d) the result of normalizing B and removing unreachable states, which also depicts the resulting full product space.

*Case 2b:* No operation in E occurs in θ<sub>i</sub>. By **C0**, E is nonempty. Let b ∈ E. By **C1**, every action in θ<sub>i</sub> is independent of b. As in the case above, we obtain a path in R

$$\langle q\_0, s\_0 \rangle \stackrel{b}{\to} \langle q\_1', s\_0' \rangle \stackrel{a\_1}{\to} \langle q\_2', s\_1 \rangle \stackrel{a\_2}{\to} \langle q\_3', s\_2 \rangle \stackrel{a\_3}{\to} \dots \dots$$

and define θ<sub>i+1</sub> and η<sub>i+1</sub> as above.

Let η be the limit of the η<sub>i</sub>, i.e., η(i) = η<sub>i+1</sub>(i). It is clear that η is an infinite path in R′, but we must show it passes through an accepting state infinitely often. To see this, define integers d<sub>i</sub> for i ≥ 0 as follows. Let ξ<sub>i</sub> = s<sub>0</sub>s<sub>1</sub> ··· be the sequence of BA states traced by θ<sub>i</sub>. Let d<sub>i</sub> be the minimum j ≥ 0 such that s<sub>j</sub> is accepting. Note that d<sub>i</sub> = 0 iff last(η<sub>i</sub>) is accepting.

Suppose i ≥ 0 and d<sub>i</sub> > 0. If Case 1 holds, then d<sub>i+1</sub> = d<sub>i</sub> − 1, since ξ<sub>i+1</sub> = ξ<sub>i</sub><sup>1</sup>. It is not hard to see that if Case 2 holds, d<sub>i+1</sub> ≤ d<sub>i</sub>. Note that in Case 2a, if d<sub>i</sub> = n, the accepting state s<sub>n</sub> is removed, but Definition 11(2) guarantees that at least one of s<sub>n−1</sub> and s<sub>n+1</sub> is accepting. In the worst case (s<sub>n−1</sub> is not accepting), we still have d<sub>i+1</sub> = n.

We claim there are an infinite number of i ≥ 0 such that Case 1 holds. Otherwise, there is some i > 0 such that Case 2 holds for all j ≥ i. Let a be the first action in θ<sub>i</sub>. Then for all j ≥ i, a is the first action of θ<sub>j</sub> and a is not in the ample set of last(η<sub>j</sub>). Since the number of states of R is finite, there is some k > i such that last(η<sub>k</sub>) = last(η<sub>i</sub>). Hence there is a cycle in R′ for which a is always enabled but never in the ample set, contradicting **C3**.

If η does not pass through an accepting state infinitely often, there is some i ≥ 0 such that for all j ≥ i, first(θ<sub>j</sub>) is not accepting. But then (d<sub>j</sub>)<sub>j≥i</sub> is a nonincreasing sequence of positive integers which strictly decreases infinitely often, a contradiction.

#### **4.2 Ample Sets for a Parallel Composition of LTSs**

We now describe the specific method used by McRERS to select ample sets. Since this method is similar to existing approaches, such as [32, Algorithm 4.3], we just outline the main ideas.

Let n ≥ 1, P = {1, ..., n}, and let M<sub>1</sub>, ..., M<sub>n</sub> be LTSs over Act. Write M<sub>i</sub> = (Q<sub>i</sub>, A<sub>i</sub>, →<sub>i</sub>, q<sup>0</sup><sub>i</sub>) and

$$M = M\_1 \parallel \dots \parallel M\_n = (Q, A, \to, q^0).$$

For a ∈ A, let procs(a) = {i ∈ P | a ∈ A<sub>i</sub>}. It can be shown that if a and b are dependent actions, then procs(a) ∩ procs(b) ≠ ∅.

Let q = (q<sub>1</sub>, ..., q<sub>n</sub>) ∈ Q and E<sub>i</sub> = enabled(M<sub>i</sub>, q<sub>i</sub>) for i ∈ P. Let

$$R\_q = \{(i, j) \in P \times P \mid E\_i \cap A\_j \neq \emptyset\}.$$

Suppose C ⊆ P is closed under R<sub>q</sub>, i.e., for all i ∈ C and j ∈ P, (i, j) ∈ R<sub>q</sub> ⇒ j ∈ C. This implies that if a ∈ E<sub>i</sub> for some i ∈ C then procs(a) ⊆ C. Define

$$\mathsf{enabled}(C, q) = \mathsf{enabled}(M, q) \cap \bigcup\_{i \in C} A\_i.$$

Let E = enabled(C, q). Note E ⊆ ⋃<sub>i∈C</sub> E<sub>i</sub>. Hence for any a ∈ E, procs(a) ⊆ C.

**Lemma 3.** *On any trace in* M *starting from* q*, no action outside of* E *but dependent on an action in* E *can occur without an action in* E *occurring first.*

*Proof.* Let ζ be a trace in M starting from q, such that no element of E occurs in ζ. We claim no action involving C (i.e., an action a for which procs(a) ∩ C ≠ ∅) can occur in ζ. Otherwise, let x be the first such action. Then x ∈ E<sub>i</sub> for some i ∈ C, so procs(x) ⊆ C. As x ∉ E, x ∉ enabled(M, q). So some earlier action y in ζ caused x to become enabled, and therefore procs(x) ∩ procs(y) ≠ ∅, hence procs(y) ∩ C ≠ ∅, contradicting the assumption that x was the first action involving C in ζ.

Now any action b dependent on an action a ∈ E must satisfy procs(a) ∩ procs(b) is nonempty. Since procs(a) ⊆ C, procs(b) ∩ C is nonempty. Hence no action dependent on an action in E can occur in ζ. 
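The definitions of R<sub>q</sub>, closure under R<sub>q</sub>, and enabled(C, q) can be illustrated with a minimal Python sketch. The dictionary encoding of components, all function names, and the three-component example are hypothetical and not taken from McRERS. The sketch also assumes that an action is enabled in the composition exactly when every component in procs(a) enables it.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding (not from the paper): component i is a dict that maps
# a local state to {action: successor state}; alphabets[i] plays the role of A_i.
Component = Dict[str, Dict[str, str]]


def procs(a: str, alphabets: List[Set[str]]) -> Set[int]:
    """procs(a): indices of the components whose alphabet contains a."""
    return {i for i, alpha in enumerate(alphabets) if a in alpha}


def local_enabled(comps: List[Component], q: Tuple[str, ...]) -> List[Set[str]]:
    """E_i = enabled(M_i, q_i) for every component i."""
    return [set(comps[i].get(q[i], {})) for i in range(len(comps))]


def relation_rq(comps, alphabets, q) -> Set[Tuple[int, int]]:
    """R_q = {(i, j) | E_i and A_j share at least one action}."""
    e = local_enabled(comps, q)
    n = len(comps)
    return {(i, j) for i in range(n) for j in range(n) if e[i] & alphabets[j]}


def close_under(c: Set[int], rq: Set[Tuple[int, int]]) -> Set[int]:
    """Smallest superset of c that is closed under R_q."""
    closed, work = set(c), list(c)
    while work:
        i = work.pop()
        for (x, j) in rq:
            if x == i and j not in closed:
                closed.add(j)
                work.append(j)
    return closed


def enabled_global(comps, alphabets, q) -> Set[str]:
    """enabled(M, q), assuming a is enabled iff all of procs(a) enable it."""
    e = local_enabled(comps, q)
    actions = set().union(*alphabets)
    return {a for a in actions if all(a in e[i] for i in procs(a, alphabets))}


def enabled_in(c: Set[int], comps, alphabets, q) -> Set[str]:
    """enabled(C, q) = enabled(M, q) intersected with the union of A_i, i in C."""
    in_c = set().union(*(alphabets[i] for i in c))
    return enabled_global(comps, alphabets, q) & in_c


# Three components; the action "s" is shared by components 0 and 1.
ALPHABETS = [{"a", "s"}, {"s", "b"}, {"c"}]
COMPS = [
    {"p0": {"a": "p1"}, "p1": {"s": "p0"}},
    {"r0": {"s": "r0", "b": "r0"}},
    {"t0": {"c": "t0"}},
]
Q = ("p0", "r0", "t0")
RQ = relation_rq(COMPS, ALPHABETS, Q)
C = close_under({1}, RQ)                  # {0, 1}
E = enabled_in(C, COMPS, ALPHABETS, Q)    # {"a", "b"}
```

In this example component 1 locally enables the shared action s, so closing {1} under R<sub>q</sub> pulls in component 0, and every action in the resulting set E involves only components of C.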

We now describe how to find an ample set in the context of NDFS. Let (q, s) be a new product state that has just been pushed onto the outer DFS stack. The relation R<sub>q</sub> defined above gives P the structure of a directed graph. Suppose that graph has a strongly connected component C<sub>0</sub> such that all of the following hold for E = enabled(C<sub>0</sub>, q):


Then set amp(q, s) = E. If no such SCC exists, set amp(q, s) = enabled(M, q). It follows that **C0**–**C4** hold. Note that the union C of all SCCs reachable from C<sub>0</sub> is closed under R<sub>q</sub>, and enabled(C, q) = E, so Lemma 3 guarantees **C1**. For **C3**, we actually have the stronger condition that in any cycle in the reduced space, at least one state is fully enabled. In our implementation, the SCCs are computed using Tarjan's algorithm. Among all SCCs C<sub>0</sub> satisfying the conditions above, we choose one for which |enabled(C<sub>0</sub>, q)| is minimal.
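The selection step can be sketched as follows. This is only an illustration under stated assumptions, not the McRERS implementation: the graph is given as a successor map over component indices, `enabled_of` is a hypothetical callback returning enabled(C<sub>0</sub>, q) for an SCC, and `acceptable` is a hypothetical predicate standing in for the conditions on E, which are not reproduced here.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set


def tarjan_sccs(nodes: Iterable[int], succ: Dict[int, List[int]]) -> List[FrozenSet[int]]:
    """Strongly connected components via Tarjan's algorithm (recursive form)."""
    index: Dict[int, int] = {}
    low: Dict[int, int] = {}
    on_stack: Set[int] = set()
    stack: List[int] = []
    sccs: List[FrozenSet[int]] = []
    counter = [0]

    def visit(v: int) -> None:
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of an SCC: pop the component off the stack.
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(frozenset(comp))

    for v in nodes:
        if v not in index:
            visit(v)
    return sccs


def choose_ample(
    nodes: Iterable[int],
    succ: Dict[int, List[int]],
    enabled_of: Callable[[FrozenSet[int]], Set[str]],
    acceptable: Callable[[Set[str]], bool],
    all_enabled: Set[str],
) -> Set[str]:
    """Pick a smallest acceptable enabled(C0, q); fall back to enabled(M, q)."""
    best = None
    for scc in tarjan_sccs(nodes, succ):
        e = enabled_of(scc)
        if acceptable(e) and (best is None or len(e) < len(best)):
            best = e
    return best if best is not None else all_enabled


# Hypothetical data: edges of R_q over three components and enabled(C0, q) per SCC.
SUCC = {0: [0], 1: [0, 1], 2: [2]}
ENABLED = {frozenset({0}): {"a"}, frozenset({1}): {"b"}, frozenset({2}): {"c"}}
AMPLE = choose_ample(
    [0, 1, 2], SUCC,
    enabled_of=lambda c: ENABLED[c],
    acceptable=lambda e: "c" not in e,   # stand-in for the omitted conditions
    all_enabled={"a", "b", "c"},
)
```

The fallback to the full enabled set mirrors the case in which no SCC satisfies the conditions, so the state is fully expanded.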

One known issue when combining NDFS with on-the-fly POR is that the inner DFS must explore the same subspace as the outer DFS, i.e., amp must be a deterministic function of its input (q, s) [18]. To accomplish this, McRERS stores one additional integer j in the state: j is the root node of the SCC C<sub>0</sub>, or −1 if the state is fully enabled. The outer search saves j in the state, and the inner search uses j to reconstruct the SCC C<sub>0</sub> and the ample set E.

#### **5 Related Work**

There has been significant earlier research on the use of partial order reduction to model check LTSs (or the closely related concept of process algebras); see, e.g., [14,16,30–33,35]. To understand how this previous work relates to this paper, we must explain a subtle, but important, distinction concerning how a property is specified. In much of this literature, a property of an LTS with alphabet A is essentially a pair π = (V,T), where V ⊆ A is a set of visible actions and T is a set of (finite and infinite) words over V . A property in this sense specifies acceptable behaviors *after invisible actions have been removed*. (See, e.g., Def. 2.4 and preceding comments in [32].) We can translate π to a property P in our sense by taking its inverse image under the projection map:

$$P = \{ \zeta \in A^{\omega} \mid \zeta|\_{V} \in T \}.$$

Note that P *is* V *-interruptible by definition*. Hence the need to distinguish interruptible properties does not arise in this context.

Much of the earlier work on POR for LTSs deals with the "offline" case, i.e., the construction of a subspace of M that preserves certain classes of properties. In contrast, Theorem 3 deals with an on-the-fly algorithm, i.e., the construction of a subspace of M B. The on-the-fly approach is an essential optimization in model checking, but recent work in the state-based formalism has shown that offline POR schemes do not always generalize easily to on-the-fly algorithms [27].

One work that does describe an on-the-fly model checking algorithm for LTSs is [32] (see also [17], which deals with the same ideas in a state formalism). The property is specified by a *tester process* B. Consistent with the notion of *property* described above, the alphabet of B does not include the invisible actions. Hence, in the parallel composition M ∥ B, the tester does not move when M executes an invisible action. In order to specify both finite and infinite words of visible actions, the tester has two kinds of accepting states: "livelock monitor states" and "infinite trace monitor states." (Two additional classes of states for detecting other kinds of violations are not relevant to the discussion here.) A version of the stubborn set theory is used to define the reduced space, and a special condition is used to solve the "ignoring problem" (instead of our **C3**). It would be interesting to compare this algorithm with the one described here.

There are many algorithms for reducing or even minimizing the size of an LTS while preserving various properties, e.g., *bisimulation equivalence* [8] or *divergence preserving bisimilarity* [6]. These algorithms could be applied to the individual components of a parallel composition (taking all visible and communication actions to be "visible"), as a preprocessing step before beginning the model checking search. An exploration of these algorithms, and how they impact POR, is beyond the scope of this paper, but we hope to explore that avenue in future work.

The RERS Challenge [9,19–21] is an annual event involving a number of different categories of large model checking problems. The "parallel LTL category," offered from 2016 on, is directly relevant to this paper. Each problem in that category consists of a Graphviz "dot" file specifying an LTS as a parallel composition, and a text file containing 20 LTL formulas. The goal is to identify the formulas satisfied by the LTS. The solutions are initially known only to the organizers, and are published after the event. The RERS semantics for LTSs, LTL, and satisfiability are exactly the same as in this paper.

The methods for generating the LTS and the properties are complicated, and have varied over the years, but are designed to satisfy certain hardness guarantees. The approach described in [29] is ". . . based on the weak refinement . . . of convergent systems which preserves an interesting class of temporal properties." It can be seen that the properties preserved by weak refinement are exactly the interruptible properties. While [29] does not describe a method for determining whether a property is interruptible, the authors have informed us that they developed a sufficient condition for an LTL formula to be interruptible, and used this in combination with a random method to generate the formulas for 2016 and 2019. Our analysis (Sect. 6) confirms that all formulas from 2016 and 2019 are interruptible, while 2017 and 2018 contain some non-interruptible formulas.

There is a well-known way to translate a system and property expressed in an action-based formalism to a state-based formalism. The idea is to add a shared variable *last* which records the last action executed. An LTL formula over actions can be transformed to one over states by replacing each action a with the predicate *last* = a. This is the approach taken in the Promela representations of the parallel problems provided with the RERS challenges.

This translation is semantics-preserving but performance-destroying. Every transition writes to the shared variable *last*, so any state-based POR scheme will assume that no two transitions commute. Furthermore, since the property references *last*, all transitions are visible. This effectively disables POR, even when the property is stutter-invariant, as can be seen in the poor performance of Spin on the RERS Promela models (Sect. 6). It is possible that there are more effective Spin translations; [34, §2.2], for example, suggests not updating *last* on invisible actions, and adding a global boolean variable that is flipped on every visible action (in addition to updating *last*). We note that this would also require modifying the LTL formula, or specifying the property in some other way. In any case, it suggests another interesting avenue for future work.

#### **6 Experimental Results and Conclusions**

We implemented a model checker named McRERS based on the algorithms described in this paper. McRERS is a library and set of command line tools. It is written in sequential C and uses the Spot library [4] for several tasks: (1) determining equivalence of LTL formulas, (2) determining language equivalence of BAs, and (3) converting an LTL formula to a BA. The source code for McRERS, as well as all artifacts related to the experiments discussed in this section, is available at https://vsl.cis.udel.edu/cav2020. The experiments were run on an 8-core 3.7 GHz Intel Xeon W-2145 Linux machine with 256 GB RAM, though McRERS is a sequential program and most experiments required much less memory.

As described in Sect. 5, each edition of RERS includes a number of problems, each of which comes with 20 LTL formulas. The numbers of problems for years 2016–2019 are, in order, 20, 15, 3, and 9, for a total of 47 problems, or 47 ∗ 20 = 940 distinct model checking tasks. (Some formulas become identical after renaming propositions.) We used the McRERS *property analyzer* to analyze these formulas to determine which are interruptible; the algorithm used is based on Theorem 1. The results show that all formulas from 2016 and 2019 are interruptible, which agrees with the expectations of the RERS organizers. In 2017, 22 of the 300 formulas are not interruptible; these include


In 2018, 3 of the 60 formulas are not interruptible. In summary, only 25 of the 940 tasks involve non-interruptible formulas. The total runtime for the analysis of all 940 formulas was 6 s.

We next used the McRERS *automaton analyzer* to create BAs from each of the interruptible formulas, and then to determine which of these Spot-generated BAs were not in interrupt normal form. This uses a straightforward algorithm that iterates over all states and checks the conditions of Definition 11. For each BA not in normal form, the analyzer transforms it to normal form using function norm of Sect. 3.4. Interestingly, all of the Spot-generated BAs in 2016 and 2019 were already in normal form. Four of the BAs from interruptible formulas in 2017 were not in normal form; all of these formulas had the form **F**[a ∨ ((¬b)**W**c)]. In 2018, 6 interruptible formulas have non-normal BAs; these formulas have several different non-isomorphic forms, some of which are quite complex. The details can be seen on the online archive. The total runtime for this analysis (including writing all BAs to a file) was 11 s.

The McRERS model checker parses RERS "dot" and property files to construct an internal representation of a parallel composition M = M<sub>1</sub> ∥ ··· ∥ M<sub>n</sub> of LTSs and a list of LTL formulas. Each formula f is converted to a BA B; if f is interruptible and B is not already in normal form, B is transformed to normal form. The NDFS algorithm is used to determine language emptiness, and if f is interruptible, the POR scheme described in Sect. 4 is also used. States are saved in a hash table.

One other simple optimization is used regardless of whether f is interruptible. Let αM denote the set of actions labeling at least one transition in M, and define αB similarly. If αM ≠ αB, then all transitions labeled by an action in (αM \ αB) ∪ (αB \ αM) are removed from the M<sub>i</sub> and B; all unreachable states and transitions in the M<sub>i</sub> and B are also removed. This is repeated until αM = αB.
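The pruning loop just described can be sketched in a few lines of Python. The encoding is hypothetical: each LTS and the automaton B are given as a pair (initial state, list of (source, action, target) transitions), and αM is approximated by the union of the labels that occur in the components. This is an illustration of the fixpoint, not the McRERS code.

```python
from typing import List, Set, Tuple

Transition = Tuple[str, str, str]          # (source, action, target)
Graph = Tuple[str, List[Transition]]       # (initial state, transitions)


def reachable_part(init: str, trans: List[Transition]) -> List[Transition]:
    """Keep only transitions whose source is reachable from init."""
    succ = {}
    for (s, _, t) in trans:
        succ.setdefault(s, []).append(t)
    seen, work = {init}, [init]
    while work:
        s = work.pop()
        for t in succ.get(s, ()):
            if t not in seen:
                seen.add(t)
                work.append(t)
    return [(s, a, t) for (s, a, t) in trans if s in seen]


def alphabet(graphs: List[Graph]) -> Set[str]:
    """Actions labeling at least one transition in the given graphs."""
    return {a for (_, trans) in graphs for (_, a, _) in trans}


def prune(components: List[Graph], automaton: Graph) -> Tuple[List[Graph], Graph]:
    """Repeat the removal step until both alphabets coincide."""
    comps, b = list(components), automaton
    while True:
        am, ab = alphabet(comps), alphabet([b])
        if am == ab:
            return comps, b
        drop = am ^ ab  # symmetric difference of the two alphabets
        comps = [
            (init, reachable_part(init, [e for e in trans if e[1] not in drop]))
            for (init, trans) in comps
        ]
        b = (b[0], reachable_part(b[0], [e for e in b[1] if e[1] not in drop]))


# Removing "x" makes "b" unreachable in the component, which in turn
# removes "b" from the automaton in the second round.
M1: Graph = ("p0", [("p0", "a", "p1"), ("p1", "x", "p2"), ("p2", "b", "p0")])
B: Graph = ("s0", [("s0", "a", "s0"), ("s0", "b", "s1"), ("s1", "a", "s1")])
PRUNED_COMPS, PRUNED_B = prune([M1], B)
```

Each round with αM ≠ αB deletes at least one transition, so the loop terminates on finite inputs. The example shows why a single pass is not enough: removals on one side can shrink the alphabet of the other.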

We applied the model checker to all problems in the 2019 benchmarks. Interestingly, all 180 tasks completed, with the correct results, using at most 8 GB RAM; the times are given in Fig. 2.

We also ran these problems with POR turned off, to measure the impact of that optimization. As is often the case with POR schemes, the difference is dramatic. The non-POR tests ran out of memory on our 256 GB machine after problem 106. We show the resources consumed for a representative task in Fig. 3; this property holds, so a complete search is required. In terms of number of states or time, the performance differs by about 5 orders of magnitude.


**Fig. 2.** Time to solve RERS 2019 parallel LTL problems using McRERS. Each problem comprises 20 LTL formulas. Memory limited to 8 GB. Rows: problem number, number of components in the LTS, and total McRERS wall time rounded up to nearest second.


**Fig. 3.** Performance impact of POR on solving RERS 2019 problem 106, formula 1, (a6 → **F**a7)**W**(a7 ∨ a88).


**Fig. 4.** Performance of Spin v6.5.1 and McRERS on RERS 2019 problem 101, property 1. Both tools used POR. Spin used -DCOLLAPSE for state compression and -m100000000 for search depth bound.

As explained in Sect. 5, the RERS Spin models cannot be expected to perform well. We ran the latest version of Spin on these using -DCOLLAPSE compression. We show the result for just the first task in Fig. 4. There is a performance difference of at least four orders of magnitude (measured in states or time) between the tools. An examination of Spin's output in verbose mode reveals the problem to be as described in Sect. 5: the full set of enabled transitions is explored at each transition due to the update of the shared variable.

The 2016 RERS problems are more challenging for McRERS. The problems are numbered from 101 to 120. To scale beyond problem 111, with a memory bound of 256 GB, additional reduction techniques, such as the component minimization methods discussed in Sect. 5, must be used. We plan to carry out a thorough study of those methods and how they interact with POR.

**Acknowledgements.** We are grateful to Marc Jasper of TU Dortmund for answering many of our questions about the RERS benchmarks, and for coining the term "interruptible" to describe the class of properties that are the topic of this paper. This material is based upon work by the RAPIDS Institute, supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program. Funding was also provided by DoE award DE-SC0012566, and by the U.S. National Science Foundation award CCF-1319571.

## **References**



## **Global Guidance for Local Generalization in Model Checking**

Hari Govind Vediramana Krishnan<sup>1(B)</sup>, YuTing Chen<sup>2</sup>, Sharon Shoham<sup>3</sup>, and Arie Gurfinkel<sup>1</sup>

> <sup>1</sup> University of Waterloo, Waterloo, Canada hgvk94@gmail.com <sup>2</sup> Chalmers University of Technology, Gothenburg, Sweden <sup>3</sup> Tel Aviv University, Tel Aviv, Israel

**Abstract.** SMT-based model checkers, especially IC3-style ones, are currently the most effective techniques for verification of infinite state systems. They infer *global* inductive invariants via *local* reasoning about a single step of the transition relation of a system, while employing SMT-based procedures, such as interpolation, to mitigate the limitations of local reasoning and allow for better generalization. Unfortunately, these mitigations intertwine model checking with heuristics of the underlying SMT-solver, negatively affecting stability of model checking.

In this paper, we propose to tackle the limitations of locality in a systematic manner. We introduce explicit *global guidance* into the local reasoning performed by IC3-style algorithms. To this end, we extend the SMT-IC3 paradigm with three novel rules, designed to mitigate fundamental sources of failure that stem from locality. We instantiate these rules for the theory of Linear Integer Arithmetic and implement them on top of the Spacer solver in Z3. Our empirical results show that GSpacer, Spacer extended with global guidance, is significantly more effective than both Spacer and sole global reasoning, and, furthermore, is insensitive to interpolation.

#### **1 Introduction**

SMT-based Model Checking algorithms that combine SMT-based search for bounded counterexamples with interpolation-based search for inductive invariants are currently the most effective techniques for verification of infinite state systems. They are widely applicable, including for verification of synchronous systems, protocols, parameterized systems, and software.

The Achilles heel of these approaches is the mismatch between the *local* reasoning used to establish absence of bounded counterexamples and a *global* reason for absence of unbounded counterexamples (i.e., existence of an inductive invariant). This is particularly apparent in IC3-style algorithms [7], such as Spacer [18]. IC3-style algorithms establish bounded safety by repeatedly computing predecessors of error (or bad) states, blocking them by local reasoning about a single step of the transition relation of the system, and, later, using the resulting *lemmas* to construct a candidate inductive invariant for the global safety proof. The whole process is driven by the choice of local lemmas. Good lemmas lead to quick convergence, bad lemmas make even simple-looking problems difficult to solve.

The effect of local reasoning is somewhat mitigated by the use of interpolation in lemma construction. In addition to the usual inductive generalization by dropping literals from a blocked bad state, interpolation is used to further generalize the blocked state using theory-aware reasoning. For example, when blocking a bad state x = 1 ∧ y = 1, inductive generalization would infer a subclause of x ≠ 1 ∨ y ≠ 1 as a lemma, while interpolation might infer x ≠ y – a predicate that might be required for the inductive invariant. Spacer, which is based on this idea, is extremely effective, as demonstrated by its performance in recent CHC-COMP competitions [10]. The downside, however, is that the approach leads to a highly unstable procedure that is extremely sensitive to syntactic changes in the system description, changes in interpolation algorithms, and any algorithmic changes in the underlying SMT-solver.

An alternative approach, often called *invariant inference*, is to focus on the global safety proof, i.e., an inductive invariant. This has long been advocated by such approaches as Houdini [15], and, more recently, by a variety of machine-learning inspired techniques, e.g., FreqHorn [14], LinearArbitrary [28], and ICE-DT [16]. The key idea is to iteratively generate positive (i.e., reachable states) and negative (i.e., states that reach an error) examples and to compute a candidate invariant that separates these two sets. The reasoning is more focused towards the invariant, and the search is restricted by either predicates, templates, grammars, or some combination. Invariant inference approaches are particularly good at finding simple inductive invariants. However, they do not generalize well to a wide variety of problems. In practice, they are often used to complement other SMT-based techniques.

In this paper, we present a novel approach that extends what we call the *local reasoning* of IC3-style algorithms with *global guidance* inspired by the invariant inference algorithms described above. Our main insight is that the set of lemmas maintained by IC3-style algorithms hints towards a potential global proof. However, these hints are lost in existing approaches. We observe that letting the current set of lemmas, which represent candidate global invariants, guide local reasoning by introducing new lemmas and states to be blocked is often sufficient to direct IC3 towards a better global proof.

We present and implement our results in the context of Spacer—a solver for Constrained Horn Clauses (CHC)—implemented in the Z3 SMT-solver [13]. Spacer is used by multiple software model checking tools, performed remarkably well in CHC-COMP competitions [10], and is open-sourced. However, our results are fundamental and apply to any other IC3-style algorithm. While our implementation works with arbitrary CHC instances, we simplify the presentation by focusing on infinite state model checking of transition systems.

We illustrate the pitfalls of local reasoning using three examples shown in Fig. 1. All three examples are small, simple, and have simple inductive invariants. All three are challenging for Spacer. While these examples are based on Spacer-specific design choices, each exhibits a fundamental deficiency that stems from local reasoning. We believe they can be adapted for any other IC3-style verification algorithm. The examples assume basic familiarity with the IC3 paradigm. Readers who are not familiar with it may find it useful to read the examples after reading Sect. 2.

**Fig. 1.** Verification tasks to illustrate sources of divergence for Spacer. The call nd() non-deterministically returns a Boolean value.

*Myopic Generalization.* Spacer diverges on the example in Fig. 1(a) by iteratively learning lemmas of the form (a − c ≤ k) ⇒ (b − d ≤ k) for different values of k, where a, b, c, d are the program variables. These lemmas establish that there are no counterexamples of longer and longer lengths. However, the process never converges to the desired lemma (a − c) ≤ (b − d), which excludes counterexamples of any length. The lemmas are discovered using interpolation, based on proofs found by the SMT-solver. A close examination of the corresponding proofs shows that the relationship between (a − c) and (b − d) does not appear in the proofs, making it impossible to find the desired lemma by tweaking local interpolation reasoning. On the other hand, looking at the global proof (i.e., the set of lemmas discovered to refute a bounded counterexample), it is almost obvious that (a − c) ≤ (b − d) is an interesting generalization to try. Amusingly, a small, syntactic, but semantics-preserving change of swapping line 2 for line 3 in Fig. 1(a) changes the SMT-solver proofs, affects local interpolation, and makes the instance trivial for Spacer.

*Excessive (Predecessor) Generalization.* Spacer diverges on the example in Fig. 1(b) by computing an infinite sequence of lemmas of the form a + k<sub>1</sub> × b ≥ k<sub>2</sub>, where a and b are program variables, and k<sub>1</sub> and k<sub>2</sub> are integers. The root cause is excessive generalization in predecessor computation. The *Bad* states are a < 0, and their predecessors are states such as (a = 1 ∧ b = −10), (a = 2 ∧ b = −10), etc., or, more generally, regions (a + b < 0), (a + 2b < −1), etc. Spacer always attempts to compute the most general predecessor states. This is the best local strategy, but blocking these regions by learning their negation leads to the aforementioned lemmas. According to the global proof these lemmas do not converge to a linear invariant. An alternative strategy that under-approximates the problematic regions by (numerically) simpler regions and, as a result, learns simpler lemmas is desired (and is effective on this example). For example, region a + 3b ≤ −4 can be under-approximated by a ≤ 32 ∧ b ≤ −12, eventually leading to a lemma b ≥ 0, which is part of the final invariant: (a ≥ 0 ∧ b ≥ 0).
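The under-approximation in the last example follows from a one-line calculation: if a ≤ 32 and b ≤ −12, then a + 3b ≤ 32 + 3·(−12) = −4. The Python snippet below is only a sanity check of that containment on a bounded integer grid; the grid bounds and function names are arbitrary choices made for illustration, and the check says nothing about how Spacer or GSpacer computes such regions.

```python
from itertools import product


def in_region(a: int, b: int) -> bool:
    """The predecessor region a + 3b <= -4."""
    return a + 3 * b <= -4


def in_under_approx(a: int, b: int) -> bool:
    """The numerically simpler box a <= 32 and b <= -12."""
    return a <= 32 and b <= -12


GRID = range(-100, 101)  # arbitrary finite window used only for this check

# Every grid point of the box lies in the region ...
CONTAINED = all(
    in_region(a, b) for a, b in product(GRID, GRID) if in_under_approx(a, b)
)
# ... and the box is strictly smaller, e.g. (a, b) = (-4, 0) is only in the region.
STRICT = any(
    in_region(a, b) and not in_under_approx(a, b) for a, b in product(GRID, GRID)
)
```

Both flags evaluate to true on this grid: the box is contained in the half-plane, and the half-plane contains points outside the box.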

*Stuck in a Rut.* Finally, Spacer converges on the example in Fig. 1(c), but only after unrolling the system for 100 iterations. During the first 100 iterations, Spacer learns that program states with (a ≥ 100 ∧ b = c) are not reachable because a is bounded by 1 in the first iteration, by 2 in the second, and so on. In each iteration, the global proof is updated by replacing a lemma of the form a < k by a lemma of the form a < (k + 1) for different values of k. Again, the strategy is good locally – the total number of lemmas does not grow and the bounded proof is improved. Yet, globally, it is clear that no progress is made since the same set of bad states is blocked again and again in slightly different ways. An alternative strategy is to abstract the literal a ≥ 100 from the formula that represents the bad states, and, instead, conjecture that no states in b = c are reachable.

*Our Approach: Global Guidance.* As shown in the examples above, in all the cases where Spacer diverges, the missteps are not obvious locally, but are clear when the overall proof is considered. We propose three new rules, Subsume, Concretize, and Conjecture, that provide global guidance, by considering existing lemmas, to mitigate the problems illustrated above. Subsume introduces a lemma that generalizes existing ones, Concretize under-approximates partially-blocked predecessors to focus on repeatedly unblocked regions, and Conjecture over-approximates a predecessor by abstracting away regions that are repeatedly blocked. The rules are generic, and apply to arbitrary SMT theories. Furthermore, we propose an efficient instantiation of the rules for the theory of Linear Integer Arithmetic.

We have implemented the new strategy, called GSpacer, in Spacer and compared it to the original implementation of Spacer. We show that GSpacer outperforms Spacer in benchmarks from CHC-COMP 2018 and 2019. More significantly, we show that the performance is independent of interpolation. While Spacer is highly dependent on interpolation parameters, and performs poorly when interpolation is disabled, the results of GSpacer are virtually unaffected by interpolation. We also compare GSpacer to LinearArbitrary [28], a tool that *infers invariants* using global reasoning. GSpacer outperforms LinearArbitrary on the benchmarks from [28]. These results indicate that global guidance mitigates the shortcomings of local reasoning.

The rest of the paper is structured as follows. Sect. 2 presents the necessary background. Sect. 3 introduces our *global guidance* as a set of abstract inference rules. Sect. 4 describes an instantiation of the rules to Linear Integer Arithmetic (LIA). Sect. 5 presents our empirical evaluation. Finally, Sect. 7 describes related work and concludes the paper.

#### **2 Background**

*Logic.* We consider first order logic modulo theories, and adopt the standard notation and terminology. A first-order language modulo theory T is defined over a signature Σ that consists of constant, function and predicate symbols, some of which may be *interpreted* by T . As always, *terms* are constant symbols, variables, or function symbols applied to terms; *atoms* are predicate symbols applied to terms; *literals* are atoms or their negations; *cubes* are conjunctions of literals; and *clauses* are disjunctions of literals. Unless otherwise stated, we only consider *closed* formulas (i.e., formulas without any free variables). As usual, we use sets of formulas and their conjunctions interchangeably.

*MBP.* Given a set of constants *v*, a formula ϕ and a model M |= ϕ, Model Based Projection (MBP) of ϕ over the constants *v*, denoted MBP(*v*, ϕ, M), computes a model-preserving under-approximation of ϕ projected onto Σ \ *v*. That is, MBP(*v*, ϕ, M) is a formula over Σ \ *v* such that M |= MBP(*v*, ϕ, M) and any model M′ |= MBP(*v*, ϕ, M) can be extended to a model M″ |= ϕ by providing an interpretation for *v*. There are polynomial time algorithms for computing MBP in Linear Arithmetic [5,18].

*Interpolation.* Given an unsatisfiable formula A ∧ B, an interpolant, denoted ITP(A, B), is a formula I over the shared signature of A and B such that A ⇒ I and I ⇒ ¬B.
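The two interpolant conditions can be checked by brute force on a small example. The formulas A and B below, the variable names, and the finite domain are assumptions made for illustration. A bounded enumeration is not a proof of validity over the integers, and this is not how an SMT solver computes ITP(A, B).

```python
from itertools import product
from typing import Callable

D = range(-3, 4)  # small finite window, chosen arbitrarily for the check


def a_holds(x: int, y: int, z: int) -> bool:
    """A: x = z and y = z (z is local to A)."""
    return x == z and y == z


def b_holds(x: int, y: int, w: int) -> bool:
    """B: x < w < y (w is local to B). A and B together are unsatisfiable."""
    return x < w < y


def is_interpolant(i: Callable[[int, int], bool]) -> bool:
    """Check A => I and I => not B; I may mention only the shared x and y."""
    a_implies_i = all(
        i(x, y) for x, y, z in product(D, D, D) if a_holds(x, y, z)
    )
    i_refutes_b = all(
        not b_holds(x, y, w) for x, y, w in product(D, D, D) if i(x, y)
    )
    return a_implies_i and i_refutes_b


EQ_OK = is_interpolant(lambda x, y: x == y)     # a valid candidate on D
GE_OK = is_interpolant(lambda x, y: x >= y)     # a weaker candidate, also valid on D
TRUE_OK = is_interpolant(lambda x, y: True)     # too weak: does not refute B
```

Restricting the candidate to a function of x and y encodes the shared-signature requirement. The example also shows that interpolants are not unique, which is one reason the lemmas a solver learns depend on the proof it happens to find.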

*Safety Problem.* A *transition system* is a pair ⟨*Init*, *Tr*⟩, where *Init* is a formula over Σ and *Tr* is a formula over Σ ∪ Σ′, where Σ′ = {s′ | s ∈ Σ}.<sup>1</sup> The states of the system correspond to structures over Σ, *Init* represents the initial states and *Tr* represents the transition relation, where Σ is used to represent the pre-state of a transition, and Σ′ is used to represent the post-state. For a formula ϕ over Σ, we denote by ϕ′ the formula obtained by substituting each s ∈ Σ by s′ ∈ Σ′. A *safety problem* is a triple ⟨*Init*, *Tr*, *Bad*⟩, where ⟨*Init*, *Tr*⟩ is a transition system and *Bad* is a formula over Σ representing a set of bad states.

The safety problem ⟨*Init*, *Tr*, *Bad*⟩ has a *counterexample of length* k if the following formula is satisfiable:

$$\mathit{Init}^0 \wedge \bigwedge\_{i=0}^{k-1} \mathit{Tr}^i \wedge \mathit{Bad}^k,$$

where ϕ<sup>i</sup> is defined over Σ<sup>i</sup> = {s<sup>i</sup> | s ∈ Σ} (a copy of the signature used to represent the state of the system after the execution of i steps) and is obtained from ϕ by substituting each s ∈ Σ by s<sup>i</sup> ∈ Σ<sup>i</sup>, and *Tr*<sup>i</sup> is obtained from *Tr* by substituting s ∈ Σ by s<sup>i</sup> ∈ Σ<sup>i</sup> and s′ ∈ Σ′ by s<sup>i+1</sup> ∈ Σ<sup>i+1</sup>. The transition system is *safe* if the safety problem has no counterexample, of any length.
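For a finite, explicitly enumerated state space, the existence of a counterexample of length k can be decided by computing forward images. The sketch below is an explicit-state illustration of the definition only. The predicates, the state range, and the function name are hypothetical; SMT-based tools check satisfiability of the unrolled formula symbolically instead.

```python
from typing import Callable, Iterable


def has_counterexample(
    init: Callable[[int], bool],
    tr: Callable[[int, int], bool],
    bad: Callable[[int], bool],
    states: Iterable[int],
    k: int,
) -> bool:
    """True iff there are s_0..s_k with Init(s_0), Tr(s_i, s_{i+1}) and Bad(s_k)."""
    universe = list(states)
    frontier = {s for s in universe if init(s)}   # states satisfying Init^0
    for _ in range(k):                            # apply Tr exactly k times
        frontier = {t for s in frontier for t in universe if tr(s, t)}
    return any(bad(s) for s in frontier)          # conjoin Bad^k


# Hypothetical system: a counter that starts at 0 and increments by one;
# the bad state is x = 3.
STATES = range(0, 8)
INIT = lambda s: s == 0
TR = lambda s, t: t == s + 1
BAD = lambda s: s == 3

CEX_AT_2 = has_counterexample(INIT, TR, BAD, STATES, 2)
CEX_AT_3 = has_counterexample(INIT, TR, BAD, STATES, 3)
```

Here a counterexample exists for length 3 but not for length 2, matching the fact that the counter needs exactly three steps to reach the bad value.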

<sup>1</sup> In fact, a primed copy is introduced in Σ′ only for the uninterpreted symbols in Σ. Interpreted symbols remain the same in Σ′.

**Algorithm 1:** Spacer algorithm as a set of guarded commands. We use the shorthand F(ϕ) = U′ ∨ (ϕ ∧ *Tr*). Each rule is listed as its name, its guard in square brackets, and its command.

**function** Spacer

**In:** ⟨*Init*, *Tr*, *Bad*⟩

**Out:** ⟨safe, *Inv*⟩ or unsafe

- Q := ∅ // pob queue
- N := 0 // maximum safe level
- O<sub>0</sub> := *Init*, O<sub>i</sub> := ⊤ **for all** i > 0 // lemma trace
- U := *Init* // reachable states

**forever do**

- Candidate [isSat(O<sub>N</sub> ∧ *Bad*)] Q := Q ∪ ⟨*Bad*, N⟩
- Predecessor [⟨ϕ, i + 1⟩ ∈ Q, M ⊨ O<sub>i</sub> ∧ *Tr* ∧ ϕ′] Q := Q ∪ ⟨MBP(*x*′, *Tr* ∧ ϕ′, M), i⟩
- Successor [⟨ϕ, i + 1⟩ ∈ Q, M ⊨ F(U) ∧ ϕ′] U := U ∨ MBP(*x*, F(U), M)[*x*′ ↦ *x*]
- Conflict [⟨ϕ, i + 1⟩ ∈ Q, F(O<sub>i</sub>) ⇒ ¬ϕ′] O<sub>j</sub> := (O<sub>j</sub> ∧ ITP(F(O<sub>i</sub>), ϕ′)[*x*′ ↦ *x*]) **for all** j ≤ i + 1
- Induction [ℓ ∈ O<sub>i+1</sub>, ℓ = (ϕ ∨ ψ), F(ϕ ∧ O<sub>i</sub>) ⇒ ϕ′] O<sub>j</sub> := O<sub>j</sub> ∧ ϕ **for all** j ≤ i + 1
- Propagate [ℓ ∈ O<sub>i</sub>, O<sub>i</sub> ∧ *Tr* ⇒ ℓ′] O<sub>i+1</sub> := (O<sub>i+1</sub> ∧ ℓ)
- Unfold [O<sub>N</sub> ⇒ ¬*Bad*] N := N + 1
- Safe [O<sub>i+1</sub> ⇒ O<sub>i</sub> **for some** i < N] **return** ⟨safe, O<sub>i</sub>⟩
- Unsafe [isSat(*Bad* ∧ U)] **return** unsafe

**Algorithm 2:** Global guidance rules for Spacer. Each rule is listed as its name, its guard in square brackets, and its command.

- Subsume [L ⊆ O<sub>i</sub>, k ≥ i, F(O<sub>k</sub>) ⇒ ψ′, ∀ℓ ∈ L. ψ ⇒ ℓ] O<sub>j</sub> := (O<sub>j</sub> ∧ ψ) **for all** j ≤ k + 1
- Concretize [L ⊆ O<sub>i</sub>, ⟨ϕ, j⟩ ∈ Q, ∀ℓ ∈ L. isSat(ϕ ∧ ¬ℓ), isSat(ϕ ∧ ⋀L), γ ⇒ ϕ, isSat(γ ∧ ⋀L)] Q := Q ∪ ⟨γ, k + 1⟩ **where** k = max{j : O<sub>j</sub> ⇒ ¬γ}
- Conjecture [L ⊆ O<sub>i</sub>, ⟨ϕ, j⟩ ∈ Q, ϕ ≡ α ∧ β, ∀ℓ ∈ L. ℓ ⇒ ¬β ∧ isSat(ℓ ∧ α), U ⇒ ¬α] Q := Q ∪ ⟨α, k + 1⟩ **where** k = max{j : O<sub>j</sub> ⇒ ¬α}

*Inductive Invariants.* An *inductive invariant* is a formula *Inv* over Σ such that (i) *Init* ⇒ *Inv*, (ii) *Inv* ∧ *Tr* ⇒ *Inv*′, and (iii) *Inv* ⇒ ¬*Bad*. If such an inductive invariant exists, then the transition system is safe.
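The three conditions can be checked mechanically. The following is a minimal sketch, assuming a hypothetical finite-state system (a counter modulo 8, not taken from the paper) and explicit state enumeration in place of an SMT solver; the names `init`, `tr`, `bad`, and `is_inductive_invariant` are illustrative only.

```python
from itertools import product

# Hypothetical toy transition system (an assumption): a counter x in 0..7.
STATES = range(8)

def init(x):            # Init: x = 0
    return x == 0

def tr(x, x_next):      # Tr: x' = (x + 2) mod 8
    return x_next == (x + 2) % 8

def bad(x):             # Bad: x = 5
    return x == 5

def is_inductive_invariant(inv):
    """Check conditions (i)-(iii) by enumerating all states and state pairs."""
    initiation = all(inv(x) for x in STATES if init(x))          # Init => Inv
    consecution = all(inv(y) for x, y in product(STATES, STATES)
                      if inv(x) and tr(x, y))                    # Inv /\ Tr => Inv'
    safety = all(not bad(x) for x in STATES if inv(x))           # Inv => not Bad
    return initiation and consecution and safety

def even(x):
    return x % 2 == 0

def below_five(x):
    return x < 5
```

Here `even` satisfies all three conditions, whereas `below_five` excludes the bad state but is not inductive, since the step from 4 to 6 leaves it.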

*Spacer.* The safety problem defined above is an instance of a more general problem, CHC-SAT, of satisfiability of Constrained Horn Clauses (CHC). Spacer is a semi-decision procedure for CHC-SAT. However, to simplify the presentation, we describe the algorithm only for the particular case of the safety problem. We stress that Spacer, as well as the developments of this paper, apply to the more general setting of CHCs (both linear and non-linear). We assume that the only uninterpreted symbols in Σ are constant symbols, which we denote *x*. Typically, these represent program variables. Without loss of generality, we assume that *Bad* is a cube.

Algorithm 1 presents the key ingredients of Spacer as a set of guarded commands (or rules). It maintains the following: (1) the current unrolling depth N at which a counterexample is searched (there are no counterexamples with depth less than N); (2) a *trace* O = (O<sub>0</sub>, O<sub>1</sub>, ...) of *frames*, such that each frame O<sub>i</sub> is a set of *lemmas*, and each lemma ℓ ∈ O<sub>i</sub> is a clause; (3) a queue of *proof obligations* Q, where each proof obligation (pob) in Q is a pair ⟨ϕ, i⟩ of a cube ϕ and a level number i, 0 ≤ i ≤ N; and (4) an under-approximation U of reachable states. Intuitively, each frame O<sub>i</sub> is a candidate inductive invariant such that O<sub>i</sub> over-approximates the states reachable in up to i steps from *Init*. The latter is ensured since O<sub>0</sub> = *Init*, the trace is monotone, i.e., O<sub>i+1</sub> ⊆ O<sub>i</sub>, and each frame is inductive *relative* to its previous one, i.e., O<sub>i</sub> ∧ *Tr* ⇒ O′<sub>i+1</sub>. Each pob ⟨ϕ, i⟩ in Q corresponds to a suffix of a potential counterexample that has to be blocked in O<sub>i</sub>, i.e., has to be proven unreachable in i steps.

The Candidate rule adds an initial pob ⟨*Bad*, N⟩ to the queue. If a pob ⟨ϕ, i⟩ cannot be blocked because ϕ is reachable from frame (i − 1), the Predecessor rule generates a predecessor ψ of ϕ using MBP and adds ⟨ψ, i − 1⟩ to Q. The Successor rule updates the set of reachable states if the pob is reachable. If the pob is blocked, the Conflict rule strengthens the trace O by using interpolation to learn a new lemma ℓ that blocks the pob, i.e., ℓ implies ¬ϕ. The Induction rule strengthens a lemma by inductive generalization and the Propagate rule pushes a lemma to a higher frame. If the *Bad* state has been blocked at N, the Unfold rule increments the depth of unrolling N. In practice, the rules are scheduled to ensure progress towards finding a counterexample.

#### **3 Global Guidance of Local Proofs**

As illustrated by the examples in Fig. 1, while Spacer is generally effective, its local reasoning is easily confused. Its effectiveness depends heavily on the local computation of predecessors using model-based projection, and of lemmas using interpolation. In this section, we extend Spacer with three additional *global* reasoning rules. The rules are inspired by the deficiencies illustrated by the motivating examples in Fig. 1. We present the rules abstractly, independent of any underlying theory, focusing on pre- and post-conditions. In Sect. 4, we specialize the rules for Linear Integer Arithmetic, and show how they are scheduled with the other rules of Spacer in an efficient verification algorithm. The new global rules are summarized in Algorithm 2. We use the same guarded command notation as in the description of Spacer in Algorithm 1. Note that the rules supplement, rather than replace, the ones in Algorithm 1.

*Subsume* is the most natural rule to explain. It says that if there is a set of lemmas L at level i, and there exists a formula ψ such that (a) ψ is stronger than every lemma in L, and (b) ψ over-approximates states reachable in at most k steps, where k ≥ i, then ψ can be added to the trace to subsume L. This rule reduces the size of the global proof, that is, the total number of lemmas that are not subsumed. Note that the rule allows ψ to be at a level k that is higher than i. The choice of ψ is left open. The details are likely to be specific to the theory involved. For example, when instantiated for LIA, Subsume is sufficient to solve the example in Fig. 1(a). Interestingly, Subsume is not likely to be effective for propositional IC3. In that case, ψ is a clause and the only way for it to be stronger than L is for ψ to be a syntactic sub-sequence of every lemma in L, but such a ψ is already explored by local inductive generalization (rule Induction in Algorithm 1).

*Concretize* applies to a pob, unlike Subsume. It is motivated by the example in Fig. 1(b) that highlights the problem of excessive local generalization. Spacer always computes predecessors that are as general as possible. This is necessary for refutational completeness since in an infinite state system there are infinitely many potential predecessors. Computing the most general predecessor ensures that Spacer finds a counterexample, if it exists. However, this also forces Spacer to discover more general, and sometimes more complex, lemmas than might be necessary for an inductive invariant. Without a global view of the overall proof, it is hard to determine when the algorithm generalizes too much. The intuition for Concretize is that generalization is excessive when there is a single pob ⟨ϕ, j⟩ that is not blocked, yet there is a set of lemmas L such that every lemma ℓ ∈ L partially blocks ϕ. That is, for any ℓ ∈ L, there is a sub-region ϕ<sub>ℓ</sub> of pob ϕ that is blocked by ℓ (i.e., ℓ ⇒ ¬ϕ<sub>ℓ</sub>), and there is at least one state s ∈ ϕ that is not blocked by any existing lemma in L (i.e., s ⊨ ϕ ∧ ⋀L). In this case, Concretize computes an under-approximation γ of ϕ that includes some not-yet-blocked state s. The new pob is added to the lowest level at which γ is not yet blocked. Concretize is useful to solve the example in Fig. 1(b).

*Conjecture* guides the algorithm away from being stuck in the same part of the search space. A single pob ϕ might be blocked by a different lemma at each level that ϕ appears in. This indicates that the lemmas are too strong, and cannot be propagated successfully to a higher level. The goal of the Conjecture rule is to identify such a case to guide the algorithm to explore alternative proofs with a better potential for generalization. This is done by abstracting away the part of the pob that has been blocked in the past. The pre-condition for Conjecture is the existence of a pob ⟨ϕ, j⟩ such that ϕ is split into two (not necessarily disjoint) sets of literals, α and β. Second, there must be a set of lemmas L, at a (typically much lower) level i < j such that every lemma ℓ ∈ L blocks ϕ, and, moreover, blocks ϕ by blocking β. Intuitively, this implies that while there are many different lemmas (i.e., all lemmas in L) that block ϕ at different levels, all of them correspond to a *local* generalization of ¬β that could not be propagated to block ϕ at higher levels. In this case, Conjecture abstracts the pob ϕ into α, hoping to generate an alternative way to block ϕ. Of course, α is conjectured only if it is not already blocked and does not contain any known reachable states. Conjecture is necessary for a quick convergence on the example in Fig. 1(c). In some respect, Conjecture is akin to widening in Abstract Interpretation [12]: it abstracts a set of states by dropping constraints that appear to prevent further exploration. Of course, it is also quite different since it does not guarantee termination. While Conjecture is applicable to propositional IC3 as well, it is much more significant in an SMT-based setting since in many FOL theories a single literal in a pob might result in infinitely many distinct lemmas.

Each of the rules can be applied by itself, but they are most effective in combination. For example, Concretize creates less general predecessors that, in the worst case, lead to many simple lemmas. At the same time, Subsume combines lemmas together into more complex ones. The interaction of the two produces lemmas that neither one can produce in isolation. Meanwhile, Conjecture helps the algorithm get unstuck from a single unproductive pob, allowing the other rules to take effect.

#### **4 Global Guidance for Linear Integer Arithmetic**

In this section, we present a specialization of our general rules, shown in Algorithm 2, to the theory of Linear Integer Arithmetic (LIA). This requires solving two problems: identifying subsets of lemmas for pre-conditions of the rules (clearly using all possible subsets is too expensive), and applying the rule once its pre-condition is met. For lemma selection, we introduce a notion of syntactic clustering based on anti-unification. For rule application, we exploit basic properties of LIA for an effective algorithm. Our presentation is focused on LIA exclusively. However, the rules extend to combinations of LIA with other theories, such as the combined theory of LIA and Arrays.

The rest of this section is structured as follows. We begin with a brief background on LIA in Sect. 4.1. We then present our lemma selection scheme, which is common to all the rules, in Sect. 4.2, followed by a description of how the rules Subsume (in Sect. 4.3), Concretize (in Sect. 4.4), and Conjecture (in Sect. 4.5) are instantiated for LIA. We conclude in Sect. 4.6 with an algorithm that integrates all the rules together.

#### **4.1 Linear Integer Arithmetic: Background**

In the theory of Linear Integer Arithmetic (LIA), formulas are defined over a signature that includes interpreted function symbols +, −, ×, interpreted predicate symbols <, ≤, |, interpreted constant symbols 0, 1, 2, ..., and uninterpreted constant symbols a, b, ..., x, y, .... We write ℤ for the set of interpreted constant symbols, and call them *integers*. We use *constants* to refer exclusively to the uninterpreted constants (these are often called *variables* in LIA literature). Terms (and accordingly formulas) in LIA are restricted to be *linear*, that is, multiplication is never applied to two constants.

We write LIA<sup>−div</sup> for the fragment of LIA that excludes divisibility (d | h) predicates. A literal in LIA<sup>−div</sup> is a linear inequality; a cube is a conjunction of such inequalities, that is, a polytope. We find it convenient to use matrix-based notation for representing cubes in LIA<sup>−div</sup>. A ground cube c ∈ LIA<sup>−div</sup> with p inequalities (literals) over k (uninterpreted) constants is written as A · *x* ≤ *n*, where A is a p × k matrix of coefficients in ℤ<sup>p×k</sup>, *x* = (x<sub>1</sub> ··· x<sub>k</sub>)<sup>T</sup> is a column vector that consists of the (uninterpreted) constants, and *n* = (n<sub>1</sub> ··· n<sub>p</sub>)<sup>T</sup> is a column vector in ℤ<sup>p</sup>. For example, the cube x ≥ 2 ∧ 2x + y ≤ 3 is written as

$$\begin{bmatrix} -1 & 0 \\ 2 & 1 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} \le \begin{bmatrix} -2 \\ 3 \end{bmatrix}$$

In the sequel, all vectors are column vectors, superscript T denotes transpose, dot is used for a dot product, and [*n*<sub>1</sub>; *n*<sub>2</sub>] stands for a matrix of column vectors *n*<sub>1</sub> and *n*<sub>2</sub>.
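The matrix notation maps directly onto a data structure. As a small illustrative sketch (the function name `satisfies` and the list-of-rows encoding are assumptions, not part of the paper), the example cube can be stored as a coefficient matrix and a bound vector and evaluated at a point:

```python
def satisfies(A, n, point):
    """Return True iff A . point <= n holds row by row."""
    return all(sum(a * v for a, v in zip(row, point)) <= bound
               for row, bound in zip(A, n))

# x >= 2 /\ 2x + y <= 3, with x >= 2 rewritten as -x <= -2.
A = [[-1, 0],
     [2, 1]]
n = [-2, 3]

inside = satisfies(A, n, (2, -1))    # -2 <= -2 and 3 <= 3
outside = satisfies(A, n, (1, 0))    # violates -x <= -2
```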

#### **4.2 Lemma Selection**

A common pre-condition for all of our global rules in Algorithm 2 is the existence of a subset of lemmas L of some frame O<sub>i</sub>. Attempting to apply the rules for every subset of O<sub>i</sub> is infeasible. In practice, we use syntactic similarity between lemmas as a predictor that one of the global rules is applicable, and restrict L to subsets of syntactically similar lemmas. In the rest of this section, we formally define what we mean by *syntactic similarity*, and how syntactically similar subsets of lemmas, called *clusters*, are maintained efficiently throughout the algorithm.

*Syntactic Similarity.* A formula π with free variables is called a *pattern*. Note that we do not require π to be in LIA. Let σ be a substitution, i.e., a mapping from variables to terms. We write πσ for the result of replacing all occurrences of free variables in π with their mapping under σ. A substitution σ is called *numeric* if it maps every variable to an integer, i.e., the range of σ is ℤ. We say that a formula ϕ *numerically matches* a pattern π iff there exists a numeric substitution σ such that ϕ = πσ. Note that, as usual, the equality is syntactic. For example, consider the pattern π = v<sub>0</sub>a + v<sub>1</sub>b ≤ 0 with free variables v<sub>0</sub> and v<sub>1</sub> and uninterpreted constants a and b. The formula ϕ<sub>1</sub> = 3a + 4b ≤ 0 matches π via a numeric substitution σ<sub>1</sub> = {v<sub>0</sub> ↦ 3, v<sub>1</sub> ↦ 4}. However, ϕ<sub>2</sub> = 4b + 3a ≤ 0, while semantically equivalent to ϕ<sub>1</sub>, does not match π. Similarly, ϕ<sub>3</sub> = a + b ≤ 0 does not match π either.

Matching is extended to patterns in the usual way by allowing a substitution σ to map variables to variables. We say that a pattern π<sub>1</sub> is more general than a pattern π<sub>2</sub> if π<sub>2</sub> matches π<sub>1</sub>. A pattern π is a *numeric anti-unifier* for a pair of formulas ϕ<sub>1</sub> and ϕ<sub>2</sub> if both ϕ<sub>1</sub> and ϕ<sub>2</sub> match π numerically. We write *anti*(ϕ<sub>1</sub>, ϕ<sub>2</sub>) for a most general numeric anti-unifier of ϕ<sub>1</sub> and ϕ<sub>2</sub>. We say that two formulas ϕ<sub>1</sub> and ϕ<sub>2</sub> are *syntactically similar* if there exists a numeric anti-unifier between them (i.e., *anti*(ϕ<sub>1</sub>, ϕ<sub>2</sub>) is defined). Anti-unification is extended to sets of formulas in the usual way.
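Numeric matching and anti-unification can be illustrated on a deliberately simple encoding. The sketch below assumes formulas are flat token tuples (integers for numerals, strings for everything else) and introduces a hypothetical `Var` class for pattern variables; it gives every differing numeral position its own fresh variable, which is one possible choice and not necessarily the one made in the implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """A pattern variable (hypothetical representation)."""
    index: int

def match(formula, pattern):
    """Return a numeric substitution {Var: int} with pattern*sigma == formula, or None."""
    if len(formula) != len(pattern):
        return None
    sigma = {}
    for f, p in zip(formula, pattern):
        if isinstance(p, Var):
            # A variable may only be bound to a numeral, and consistently.
            if not isinstance(f, int) or sigma.setdefault(p, f) != f:
                return None
        elif f != p:
            return None
    return sigma

def anti_unify(f1, f2):
    """A numeric anti-unifier of two token tuples, or None if none exists."""
    if len(f1) != len(f2):
        return None
    pattern, count = [], 0
    for a, b in zip(f1, f2):
        if a == b:
            pattern.append(a)
        elif isinstance(a, int) and isinstance(b, int):
            pattern.append(Var(count))   # fresh variable per differing numeral
            count += 1
        else:
            return None                  # symbols differ: not syntactically similar
    return tuple(pattern)

pi = (Var(0), "*", "a", "+", Var(1), "*", "b", "<=", 0)
phi1 = (3, "*", "a", "+", 4, "*", "b", "<=", 0)    # 3a + 4b <= 0
phi2 = (4, "*", "b", "+", 3, "*", "a", "<=", 0)    # 4b + 3a <= 0
phi3 = ("a", "+", "b", "<=", 0)                    # a + b <= 0
```

On these inputs, `phi1` matches `pi`, while `phi2` and `phi3` do not, mirroring the purely syntactic behaviour described above.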

*Clusters.* We use anti-unification to define *clusters* of syntactically similar formulas. Let Φ be a fixed set of formulas, and π a pattern. A *cluster*, C<sub>Φ</sub>(π), is a subset of Φ such that every formula ϕ ∈ C<sub>Φ</sub>(π) numerically matches π. That is, π is a numeric anti-unifier for C<sub>Φ</sub>(π). In the implementation, we restrict the pre-conditions of the global rules so that a subset of lemmas L ⊆ O<sub>i</sub> is a cluster for some pattern π, i.e., L = C<sub>O<sub>i</sub></sub>(π).

*Clustering Lemmas.* We use the following strategy to efficiently keep track of available clusters. Let ℓ<sub>new</sub> be a new lemma to be added to O<sub>i</sub>. Assume there is at least one lemma ℓ ∈ O<sub>i</sub> that numerically anti-unifies with ℓ<sub>new</sub> via some pattern π. If such an ℓ does not belong to any cluster, a new cluster C<sub>O<sub>i</sub></sub>(π) = {ℓ<sub>new</sub>, ℓ} is formed, where π = *anti*(ℓ<sub>new</sub>, ℓ). Otherwise, for every lemma ℓ ∈ O<sub>i</sub> that numerically matches ℓ<sub>new</sub> and every cluster C<sub>O<sub>i</sub></sub>(π̂) containing ℓ, ℓ<sub>new</sub> is added to C<sub>O<sub>i</sub></sub>(π̂) if ℓ<sub>new</sub> matches π̂, or a new cluster is formed using ℓ, ℓ<sub>new</sub>, and any other lemmas in C<sub>O<sub>i</sub></sub>(π̂) that anti-unify with them. Note that a new lemma ℓ<sub>new</sub> might belong to multiple clusters.

For example, suppose ℓ<sub>new</sub> = (a ≤ 6 ∨ b ≤ 6), and there is already a cluster C<sub>O<sub>i</sub></sub>(a ≤ v<sub>0</sub> ∨ b ≤ 5) = {(a ≤ 5 ∨ b ≤ 5), (a ≤ 8 ∨ b ≤ 5)}. Since ℓ<sub>new</sub> anti-unifies with each of the lemmas in the cluster, but does not match the pattern a ≤ v<sub>0</sub> ∨ b ≤ 5, a new cluster that includes all of them is formed w.r.t. a more general pattern: C<sub>O<sub>i</sub></sub>(a ≤ v<sub>0</sub> ∨ b ≤ v<sub>1</sub>) = {(a ≤ 6 ∨ b ≤ 6), (a ≤ 5 ∨ b ≤ 5), (a ≤ 8 ∨ b ≤ 5)}.
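The cluster update in this example can be sketched as follows. This is a simplification under stated assumptions: lemmas are flat token tuples, pattern variables are positional and written with a hypothetical `HOLE` marker, and the function returns only the widened cluster (the scheme described above would keep the original cluster as well).

```python
HOLE = "?"   # hypothetical marker for a (positional) pattern variable

def generalize(pattern, lemma):
    """Most specific pattern matched by both `pattern` and `lemma`, or None."""
    if len(pattern) != len(lemma):
        return None
    out = []
    for p, t in zip(pattern, lemma):
        if p == t:
            out.append(p)
        elif isinstance(t, int) and (p == HOLE or isinstance(p, int)):
            out.append(HOLE)         # numerals differ, or p is already a variable
        else:
            return None              # non-numeric mismatch: no anti-unifier
    return tuple(out)

def matches(pattern, lemma):
    """A lemma matches a pattern iff generalizing does not change the pattern."""
    return generalize(pattern, lemma) == pattern

def add_lemma(pattern, members, lemma):
    """Return (pattern, members) of the cluster that results from adding `lemma`."""
    if matches(pattern, lemma):
        return pattern, members + [lemma]
    wider = generalize(pattern, lemma)
    if wider is None:
        return pattern, members      # lemma is not syntactically similar
    return wider, members + [lemma]

old_pattern = ("a", "<=", HOLE, "or", "b", "<=", 5)
old_members = [("a", "<=", 5, "or", "b", "<=", 5),
               ("a", "<=", 8, "or", "b", "<=", 5)]
new_lemma = ("a", "<=", 6, "or", "b", "<=", 6)
new_pattern, new_members = add_lemma(old_pattern, old_members, new_lemma)
```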

In the presentation above, we assumed that anti-unification is completely syntactic. This is problematic in practice since it significantly limits the applicability of the global rules. Recall, for example, that a + b ≤ 0 and 2a + 2b ≤ 0 do not anti-unify numerically according to our definitions, and, therefore, do not cluster together. In practice, we augment syntactic anti-unification with simple rewrite rules that are applied greedily. For example, we normalize all LIA terms, take care of implicit multiplication by 1, and of associativity and commutativity of addition. In the future, it would be interesting to explore how advanced anti-unification algorithms, such as [8,27], can be adapted for our purpose.

#### **4.3 Subsume Rule for LIA**

Recall that the Subsume rule (Algorithm 2) takes a cluster of lemmas L = C<sub>O<sub>i</sub></sub>(π) and computes a new lemma ψ that subsumes all the lemmas in L, that is, ψ ⇒ ⋀L. We find it convenient to dualize the problem. Let S = {¬ℓ | ℓ ∈ L} be the dual of L; clearly, ψ ⇒ ⋀L iff (⋁S) ⇒ ¬ψ. Note that L is a set of clauses, S is a set of cubes, ψ is a clause, and ¬ψ is a cube. In the case of LIA<sup>−div</sup>, this means that ⋁S represents a union of convex sets, and ¬ψ represents a convex set that the Subsume rule must find. The strongest such ¬ψ in LIA<sup>−div</sup> exists, and is the convex closure of S. Thus, applying Subsume in the context of LIA<sup>−div</sup> is reduced to computing a convex closure of a set of (negated) lemmas in a cluster. Full LIA extends LIA<sup>−div</sup> with divisibility constraints. Therefore, Subsume obtains a stronger ¬ψ by adding such constraints.

*Example 1.* Consider the following cluster:

$$\begin{aligned} \mathcal{L} &= \{ (x > 2 \lor x < 2 \lor y > 3), (x > 4 \lor x < 4 \lor y > 5), (x > 8 \lor x < 8 \lor y > 9) \} \\ \mathcal{S} &= \{ (x \le 2 \land x \ge 2 \land y \le 3), (x \ge 4 \land x \le 4 \land y \le 5), (x \ge 8 \land x \le 8 \land y \le 9) \} \end{aligned}$$

The convex closure of S in LIA<sup>−div</sup> is 2 ≤ x ≤ 8 ∧ y ≤ x + 1. However, a stronger over-approximation exists in LIA: 2 ≤ x ≤ 8 ∧ y ≤ x + 1 ∧ (2 | x).
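As a sanity check of Example 1 (not a proof), the two over-approximations can be compared against S by brute force over a bounded grid of integer points. The grid bounds and the function names below are assumptions chosen for illustration.

```python
def in_S(x, y):
    """Membership in the union of the three cubes of S from Example 1."""
    return (x == 2 and y <= 3) or (x == 4 and y <= 5) or (x == 8 and y <= 9)

def closure_no_div(x, y):
    """2 <= x <= 8 /\\ y <= x + 1."""
    return 2 <= x <= 8 and y <= x + 1

def closure_with_div(x, y):
    """2 <= x <= 8 /\\ y <= x + 1 /\\ (2 | x)."""
    return closure_no_div(x, y) and x % 2 == 0

# Bounded sample of the integer plane (an assumption; S itself is unbounded in y).
GRID = [(x, y) for x in range(-5, 15) for y in range(-20, 15)]

covers_S = all(closure_with_div(x, y) for x, y in GRID if in_S(x, y))
only_in_weaker = [p for p in GRID if closure_no_div(*p) and not closure_with_div(*p)]
```

On this grid every point of S lies in both formulas, and points with odd x, such as (3, 0), are admitted only by the formula without the divisibility constraint.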

In the sequel, we describe subsumeCube (Algorithm 3), which computes a cube ϕ that over-approximates (⋁S). Subsume is then implemented by removing from L lemmas that are already subsumed by existing lemmas in L, dualizing the result into S, invoking subsumeCube on S, and returning ¬ϕ as a lemma that subsumes L.

Recall that Subsume is tried only in the case L = C<sub>O<sub>i</sub></sub>(π). We further require that the negated pattern, ¬π, is of the form A · *x* ≤ *v*, where A is a coefficients matrix, *x* is a vector of constants, and *v* = (v<sub>1</sub> ··· v<sub>p</sub>)<sup>T</sup> is a vector of p free variables. Under this assumption, S (the dual of L) is of the form {(A · *x* ≤ *n*<sub>i</sub>) | 1 ≤ i ≤ q}, where q = |S|, and for each 1 ≤ i ≤ q, *n*<sub>i</sub> is a numeric substitution to *v* from which one of the negated lemmas in S is obtained. That is, |*n*<sub>i</sub>| = |*v*|. In Example 1, ¬π = x ≤ v<sub>1</sub> ∧ −x ≤ v<sub>2</sub> ∧ y ≤ v<sub>3</sub> and

$$A = \begin{bmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \end{bmatrix} \quad \mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix} \quad \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \quad \mathbf{n}_1 = \begin{bmatrix} 2 \\ -2 \\ 3 \end{bmatrix} \quad \mathbf{n}_2 = \begin{bmatrix} 4 \\ -4 \\ 5 \end{bmatrix} \quad \mathbf{n}_3 = \begin{bmatrix} 8 \\ -8 \\ 9 \end{bmatrix}$$

Each cube (A · *x* ≤ *n*<sub>i</sub>) ∈ S is equivalent to ∃*v*. A · *x* ≤ *v* ∧ (*v* = *n*<sub>i</sub>). Finally, (⋁S) ≡ ∃*v*. (A · *x* ≤ *v*) ∧ (⋁(*v* = *n*<sub>i</sub>)). Thus, computing the over-approximation of S is reduced to (a) computing the convex hull H of a set of points {*n*<sub>i</sub> | 1 ≤ i ≤ q}, (b) computing divisibility constraints D that are satisfied by all the points, (c) substituting H ∧ D for the disjunction in the equation above, and (d) eliminating variables *v*. Both the computation of H ∧ D and the elimination of *v* may be prohibitively expensive. We, therefore, over-approximate them. Our approach for doing so is presented in Algorithm 3, and explained in detail below.

*Computing the convex hull of* {*n*<sub>i</sub> | 1 ≤ i ≤ q}. Lines 3 to 8 compute the convex hull of {*n*<sub>i</sub> | 1 ≤ i ≤ q} as a formula over *v*, where variable v<sub>j</sub>, for 1 ≤ j ≤ p, represents the j-th coordinate in the vectors (points) *n*<sub>i</sub>. Some of the coordinates, v<sub>j</sub>, in these vectors may be linearly dependent upon others. To simplify the problem, we first identify such dependencies and compute a set of linear equalities that expresses them (L in line 4). To do so, we consider a matrix N<sub>q×p</sub>, where the i-th row consists of *n*<sub>i</sub><sup>T</sup>. The j-th column in N, denoted N<sub>∗j</sub>, corresponds to the j-th coordinate, v<sub>j</sub>. The rank of N is the number of linearly independent columns (and rows). The other columns (coordinates) can be expressed by linear combinations of the linearly independent ones. To compute these linear combinations we use the kernel of [N; **1**] (N appended with a column vector of 1's), which is the set of all vectors *y* such that [N; **1**] · *y* = **0**, where **0** is the zero vector. Let B = kernel([N; **1**]) be a basis for the kernel of [N; **1**]. Then |B| = p + 1 − rank([N; **1**]), and for each vector *y* ∈ B, the linear equality [v<sub>1</sub> ··· v<sub>p</sub> 1] · *y* = 0 holds in all the rows of N (i.e., all the given vectors satisfy it). We accumulate these equalities, which capture the linear dependencies between the coordinates, in L. Further, the equalities are used to compute p − |B| coordinates (columns in N) that are linearly independent and, modulo L, uniquely determine the remaining coordinates. We denote by *v*<sup>L↓</sup> the subset of *v* that consists of the linearly independent coordinates. We further denote by *n*<sub>i</sub><sup>L↓</sup> the projection of *n*<sub>i</sub> to these coordinates and by N<sup>L↓</sup> the projection of N to the corresponding columns. We have that (⋁(*v* = *n*<sub>i</sub>)) ≡ L ∧ (⋁(*v*<sup>L↓</sup> = *n*<sub>i</sub><sup>L↓</sup>)).

In Example 1, the numeral matrix is

$$N = \begin{bmatrix} 2 & -2 & 3 \\ 4 & -4 & 5 \\ 8 & -8 & 9 \end{bmatrix}$$

for which kernel([N; **1**]) = {(1 1 0 0)<sup>T</sup>, (1 0 −1 1)<sup>T</sup>}. Therefore, L is the conjunction of equalities v<sub>1</sub> + v<sub>2</sub> = 0 ∧ v<sub>1</sub> − v<sub>3</sub> + 1 = 0, or, equivalently, v<sub>3</sub> = v<sub>1</sub> + 1 ∧ v<sub>2</sub> = −v<sub>1</sub>, *v*<sup>L↓</sup> = (v<sub>1</sub>)<sup>T</sup>, and

$$\mathbf{n}_1^{L\downarrow} = \begin{bmatrix} 2 \end{bmatrix} \quad \mathbf{n}_2^{L\downarrow} = \begin{bmatrix} 4 \end{bmatrix} \quad \mathbf{n}_3^{L\downarrow} = \begin{bmatrix} 8 \end{bmatrix} \quad N^{L\downarrow} = \begin{bmatrix} 2 \\ 4 \\ 8 \end{bmatrix}$$
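The kernel computation for Example 1 can be reproduced with exact rational arithmetic. The following is a minimal sketch using Gauss-Jordan elimination over Python's `fractions.Fraction`; the function name `kernel` and the list-of-rows encoding are assumptions, and the paper does not prescribe a particular elimination procedure.

```python
from fractions import Fraction

def kernel(rows):
    """Basis of {y | M . y = 0} for a matrix M given as a list of rows."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0])
    pivots = []                       # pivot column of each reduced row
    r = 0
    for c in range(ncols):
        if r == len(m):
            break
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                  # no pivot in this column: it is free
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][c]
        m[r] = [v / lead for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        y = [Fraction(0)] * ncols
        y[free] = Fraction(1)
        for row, p in zip(m, pivots):
            y[p] = -row[free]         # solve each pivot variable for this free one
        basis.append(y)
    return basis

# [N; 1] for Example 1: each row is n_i^T followed by 1.
N1 = [[2, -2, 3, 1],
      [4, -4, 5, 1],
      [8, -8, 9, 1]]
B = kernel(N1)
```

Each basis vector *y* yields one equality [v<sub>1</sub> v<sub>2</sub> v<sub>3</sub> 1] · *y* = 0; for this input the two vectors correspond to v<sub>1</sub> + v<sub>2</sub> = 0 and v<sub>1</sub> − v<sub>3</sub> + 1 = 0.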

Next, we compute the convex closure of ⋁(*v*<sup>L↓</sup> = *n*<sub>i</sub><sup>L↓</sup>), and conjoin it with L to obtain H, the convex closure of (⋁(*v* = *n*<sub>i</sub>)).

If the dimension of *v*<sup>L↓</sup> is one, as is the case in the example above, the convex closure, C, of ⋁(*v*<sup>L↓</sup> = *n*<sub>i</sub><sup>L↓</sup>) is obtained by bounding the sole element of *v*<sup>L↓</sup> based on its values in N<sup>L↓</sup> (line 6). In Example 1, we obtain C = 2 ≤ v<sub>1</sub> ≤ 8.

If the dimension of *v*<sup>L↓</sup> is greater than one, just computing the bounds of one of the constants is not sufficient. Instead, we use the concept of syntactic convex closure from [2] to compute the convex closure of ⋁(*v*<sup>L↓</sup> = *n*<sub>i</sub><sup>L↓</sup>) as ∃*α*. C, where *α* is a vector that consists of q fresh *rational* variables and C is defined as follows (line 8): C = *α* ≥ 0 ∧ Σ*α* = 1 ∧ *α*<sup>T</sup> · N<sup>L↓</sup> = (*v*<sup>L↓</sup>)<sup>T</sup>. C states that (*v*<sup>L↓</sup>)<sup>T</sup> is a convex combination of the rows of N<sup>L↓</sup>, or, in other words, *v*<sup>L↓</sup> is a convex combination of {*n*<sub>i</sub><sup>L↓</sup> | 1 ≤ i ≤ q}.

To illustrate the syntactic convex closure, consider a second example with a set of cubes: S = {(x ≤ 0 ∧ y ≤ 6), (x ≤ 6 ∧ y ≤ 0), (x ≤ 5 ∧ y ≤ 5)}. The coefficient matrix A and the numeral matrix N are then:

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad N = \begin{bmatrix} 0 & 6 \\ 6 & 0 \\ 5 & 5 \end{bmatrix}$$

Here, kernel([N; **1**]) is empty: all the columns are linearly independent, hence L = *true* and *v*<sup>L↓</sup> = *v*. Therefore, syntactic convex closure is applied to the full matrix N, resulting in

$$\begin{aligned} C = {} & (\alpha_1 \ge 0) \land (\alpha_2 \ge 0) \land (\alpha_3 \ge 0) \land (\alpha_1 + \alpha_2 + \alpha_3 = 1) \land {} \\ & (6\alpha_2 + 5\alpha_3 = v_1) \land (6\alpha_1 + 5\alpha_3 = v_2) \end{aligned}$$

The convex closure of ⋁(*v* = *n*<sub>i</sub>) is then L ∧ ∃*α*. C, which is ∃*α*. C here.

*Divisibility Constraints.* Inductive invariants for verification problems often require divisibility constraints. We, therefore, use such constraints, denoted D, to obtain a stronger over-approximation of ⋁(*v* = *n*<sub>i</sub>) than the convex closure. To add a divisibility constraint for v<sub>j</sub> ∈ *v*<sup>L↓</sup>, we consider the column N<sup>L↓</sup><sub>∗j</sub> that corresponds to v<sub>j</sub> in N<sup>L↓</sup>. We find the largest positive integer d such that each integer in N<sup>L↓</sup><sub>∗j</sub> leaves the same remainder when divided by d; namely, there exists 0 ≤ r < d such that n mod d = r for every n ∈ N<sup>L↓</sup><sub>∗j</sub>. This means that d | (v<sub>j</sub> − r) is satisfied by all the points *n*<sub>i</sub>. Note that such r always exists for d = 1. To avoid this trivial case, we add the constraint d | (v<sub>j</sub> − r) only if d ≠ 1 (line 12). We repeat this process for each v<sub>j</sub> ∈ *v*<sup>L↓</sup>.

In Example 1, all the elements in the (only) column of the matrix N<sup>L↓</sup>, which corresponds to v<sub>1</sub>, are divisible by 2, and no larger d has a corresponding r. Thus, line 12 of Algorithm 3 adds the divisibility condition (2 | v<sub>1</sub>) to D.
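One way to find the largest such d, sketched below as an assumption rather than as the implementation's method, uses the fact that all entries of a column share a remainder modulo d exactly when d divides every pairwise difference; the largest d is therefore the gcd of the differences from the first entry. The function name `divisibility` is hypothetical.

```python
from functools import reduce
from math import gcd

def divisibility(column):
    """Return (d, r) with the largest d > 0 such that n mod d == r for all n,
    or None when all entries are equal (then every d works and no largest exists)."""
    d = reduce(gcd, (abs(n - column[0]) for n in column), 0)
    if d == 0:
        return None
    return d, column[0] % d

example_1 = divisibility([2, 4, 8])     # the column of N^{L↓} in Example 1
shifted = divisibility([3, 5, 9])       # the values of v_3 in Example 1
trivial = divisibility([1, 2, 4])       # d = 1: the caller would skip the constraint
```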

*Eliminating Existentially Quantified Variables Using MBP.* By combining the linear equalities exhibited by N, the convex closure of N<sup>L↓</sup>, and the divisibility constraints on *v*, we obtain ∃*α*. L ∧ C ∧ D as an over-approximation of ⋁(*v* = *n*<sub>i</sub>). Accordingly, ∃*v*. ∃*α*. ψ, where ψ = (A · *x* ≤ *v*) ∧ L ∧ C ∧ D, is an over-approximation of (⋁S) ≡ ∃*v*. (A · *x* ≤ *v*) ∧ (⋁(*v* = *n*<sub>i</sub>)) (line 13). In order to get a LIA cube that over-approximates ⋁S, it remains to eliminate the existential quantifiers. Since quantifier elimination is expensive, and does not necessarily generate convex formulas (cubes), we approximate it using MBP. Namely, we obtain a cube ϕ that under-approximates ∃*v*. ∃*α*. ψ by applying MBP on ψ and a model M<sub>0</sub> ⊨ ψ. We then use an SMT solver to drop literals from ϕ until it over-approximates ∃*v*. ∃*α*. ψ, and hence also ⋁S (lines 16 to 19). The result is returned by Subsume as an over-approximation of ⋁S.

Models M<sub>0</sub> that satisfy ψ and do not satisfy any of the cubes in S are preferred when computing MBP (line 14) as they ensure that the result of MBP is not subsumed by any of the cubes in S.

Note that the *α* are rational variables and *v* are integer variables, which means we require MBP to support a mixture of integer and rational variables. To achieve this, we first relax all constants to be rationals and apply MBP over LRA to eliminate *α*. We then adjust the resulting formula back to integer arithmetic by multiplying each atom by the least common multiple of the denominators of the coefficients in it. Finally, we apply MBP over the integers to eliminate *v*.
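The adjustment back to integer arithmetic can be illustrated on a single atom. The sketch below assumes an atom of the form Σ c<sub>i</sub>x<sub>i</sub> ≤ b with rational coefficients and, as an additional assumption not stated above, also includes the denominator of the bound b in the least common multiple so that the scaled atom is entirely integral. The function name `clear_denominators` is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def clear_denominators(coeffs, bound):
    """Scale sum(coeffs[i] * x_i) <= bound by the lcm of all denominators.

    The scale is positive, so the direction of the inequality is preserved.
    """
    values = [Fraction(c) for c in coeffs] + [Fraction(bound)]
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (v.denominator for v in values), 1)
    scaled = [int(v * scale) for v in values]
    return scaled[:-1], scaled[-1]

# (1/2) x + (1/3) y <= 5/6   becomes   3 x + 2 y <= 5
int_coeffs, int_bound = clear_denominators([Fraction(1, 2), Fraction(1, 3)],
                                           Fraction(5, 6))
```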

Considering Example 1 again, we get that ψ = (x ≤ v<sub>1</sub>) ∧ (−x ≤ v<sub>2</sub>) ∧ (y ≤ v<sub>3</sub>) ∧ (v<sub>3</sub> = 1 + v<sub>1</sub>) ∧ (v<sub>2</sub> = −v<sub>1</sub>) ∧ (2 ≤ v<sub>1</sub> ≤ 8) ∧ (2 | v<sub>1</sub>) (the first three conjuncts correspond to A · (x y)<sup>T</sup> ≤ (v<sub>1</sub> v<sub>2</sub> v<sub>3</sub>)<sup>T</sup>). Note that in this case we do not have rational variables *α* since |*v*<sup>L↓</sup>| = 1. Depending on the model, the result of MBP can be one of

$$\begin{aligned} y \le x + 1 \land 2 \le x \le 8 \land (2 \mid y - 1) \land (2 \mid x) & \quad x \ge 2 \land x \le 2 \land y \le 3\\ y \le x + 1 \land 2 \le x \le 8 \land (2 \mid x) & \quad x \ge 8 \land x \le 8 \land y \le 9\\ y \ge x + 1 \land y \le x + 1 \land 3 \le y \le 9 \land (2 \mid y - 1) \end{aligned}$$

However, we prefer a model that does not satisfy any cube in S = {(x ≥ 2 ∧ x ≤ 2 ∧ y ≤ 3), (x ≤ 4 ∧ x ≥ 4 ∧ y ≤ 5), (x ≤ 8 ∧ x ≥ 8 ∧ y ≤ 9)}, which rules out the two possibilities on the right. None of these cubes covers ψ, hence generalization is used.

If the first cube is obtained by MBP, it is generalized into y ≤ x + 1 ∧ x ≥ 2 ∧ x ≤ 8 ∧ (2|x); the second cube is already an over-approximation; the third cube is generalized into y ≤ x + 1 ∧ y ≤ 9. Indeed, each of these cubes overapproximates S.

#### **4.4 Concretize Rule for LIA**

The Concretize rule (Algorithm 2) takes a cluster of lemmas L = C<sub>O<sub>i</sub></sub>(π) and a pob ⟨ϕ, j⟩ such that each lemma in L partially blocks ϕ, and creates a new pob γ that is still not blocked by L, but γ is more concrete, i.e., γ ⇒ ϕ. In our implementation, this rule is applied when ϕ is in LIA<sup>−div</sup>. We further require that the pattern, π, of L is non-linear, i.e., some of the constants appear in π with free variables as their coefficients. We denote these constants by U. An example is the pattern π = v<sub>0</sub>x + v<sub>1</sub>y + z ≤ 0, where U = {x, y}. Having such a cluster is an indication that attempting to block ϕ in full with a single lemma may require tracking non-linear correlations between the constants, which is impossible to do in LIA. In such cases, we identify the coupling of the constants in U in pobs (and hence in lemmas) as the potential source of non-linearity. Hence, we concretize (strengthen) ϕ into a pob γ where the constants in U are no longer coupled to any other constant.

*Coupling.* Formally, constants u and v are *coupled* in a cube c, denoted u ⋈<sub>c</sub> v, if there exists a literal *lit* in c such that both u and v appear in *lit* (i.e., their coefficients in *lit* are non-zero). For example, x and y are coupled in x + y ≤ 0 ∧ z ≤ 0, whereas neither of them is coupled with z. A constant u is said to be *isolated* in a cube c, denoted Iso(u, c), if it appears in c but it is not coupled with any other constant in c. In the above cube, z is isolated.
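Both notions depend only on which constants occur together in a literal, so they can be sketched without any arithmetic. The encoding below (a cube as a list of coefficient dictionaries, bounds omitted) and the function names are assumptions made for illustration.

```python
def coupled(u, v, cube):
    """u and v are coupled if some literal has non-zero coefficients for both."""
    return any(lit.get(u, 0) != 0 and lit.get(v, 0) != 0 for lit in cube)

def isolated(u, cube):
    """u is isolated if it appears in the cube and is coupled with no other constant."""
    appears = any(lit.get(u, 0) != 0 for lit in cube)
    constants = {c for lit in cube for c, k in lit.items() if k != 0}
    return appears and not any(coupled(u, v, cube) for v in constants - {u})

# x + y <= 0 /\ z <= 0, keeping only the coefficients of each literal.
cube = [{"x": 1, "y": 1}, {"z": 1}]
```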

*Concretization by Decoupling.* Given a pob ϕ (a cube) and a cluster L, Algorithm 4 presents our approach for concretizing ϕ by decoupling the constants in U, i.e., those that have variables as coefficients in the pattern of L (line 2). Concretization is guided by a model M ⊨ ϕ ∧ ⋀L, representing a part of ϕ that is not yet blocked by the lemmas in L (line 3). Given such M, we concretize ϕ into a *model-preserving* under-approximation that isolates all the constants in U and preserves all other couplings. That is, we find a cube γ, such that

$$\gamma \Rightarrow \varphi \qquad M \models \gamma \qquad \forall u \in U.\ \mathrm{Iso}(u, \gamma) \qquad \forall u, v \notin U.\ (u \bowtie_{\varphi} v) \Rightarrow (u \bowtie_{\gamma} v) \tag{1}$$

Note that γ is not blocked by L since M satisfies both ⋀L and γ. For example, if ϕ = (x + y ≤ 0) ∧ (x − y ≤ 0) ∧ (x + z ≥ 0) and M = [x = 0, y = 0, z = 1], then γ = 0 ≤ y ≤ 0 ∧ x ≤ 0 ∧ x + z ≥ 1 is a model-preserving under-approximation that isolates U = {y}.

Algorithm 4 computes such a cube γ by a point-wise concretization of the literals of ϕ followed by the removal of subsumed literals. Literals that do not contain constants from U remain unchanged. A literal of the form $\mathit{lit} = t \leq b$, where $t = \sum_i n_i x_i$ (recall that every literal in LIA<sup>−div</sup> can be normalized to this form), that includes constants from U is concretized into a *cube* by (1) isolating each of the summands $n_i x_i$ in $t$ that include constants from U from the rest, and (2) for each of the resulting sub-expressions creating a literal that uses its value in M as a bound. Formally, $t$ is decomposed to $s + \sum_{x_i \in U} n_i x_i$, where $s = \sum_{x_i \notin U} n_i x_i$. The concretization of *lit* is the cube $\gamma^{\mathit{lit}} = s \leq M[s] \wedge \bigwedge_{x_i \in U} n_i x_i \leq M[n_i x_i]$, where $M[t']$ denotes the interpretation of $t'$ in M. Note that $\gamma^{\mathit{lit}} \Rightarrow \mathit{lit}$ since the bounds are stronger than the original bound on $t$: $M[s] + \sum_{x_i \in U} M[n_i x_i] = M[t] \leq b$. This ensures that γ, obtained by the conjunction of literal concretizations, implies ϕ. It trivially satisfies the other conditions of Eq. (1).

For example, the concretization of the literal (x + y ≤ 0) with respect to U = {y} and M = [x = 0, y = 0, z = 1] is the cube x ≤ 0 ∧ y ≤ 0. Applying concretization in a similar manner to all the literals of the cube ϕ = (x+y ≤ 0)∧ (x−y ≤ 0)∧(x+z ≥ 0) from the previous example, we obtain the concretization x ≤ 0 ∧ 0 ≤ y ≤ 0 ∧ x + z ≥ 0. Note that the last literal is not concretized as it does not include y.
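The literal-wise concretization described above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's Algorithm 4: literals are encoded as (coefficients, bound) pairs meaning Σ n<sub>i</sub>x<sub>i</sub> ≤ b, the helper names are hypothetical, and subsumption is approximated syntactically by keeping the tightest bound among literals with identical left-hand sides.

```python
from typing import Dict, List, Set, Tuple

Literal = Tuple[Dict[str, int], int]  # assumed encoding of  sum_i n_i * x_i <= b
Cube = List[Literal]
Model = Dict[str, int]


def evaluate(coeffs: Dict[str, int], model: Model) -> int:
    """Value M[t] of the linear term t under the model M."""
    return sum(n * model[x] for x, n in coeffs.items())


def concretize_literal(lit: Literal, U: Set[str], model: Model) -> Cube:
    """Split t <= b into  s <= M[s]  and one literal per summand over U."""
    coeffs, _ = lit
    in_U = {x: n for x, n in coeffs.items() if x in U and n != 0}
    if not in_U:
        return [lit]  # literals without constants from U stay unchanged
    rest = {x: n for x, n in coeffs.items() if x not in U and n != 0}
    result: Cube = []
    if rest:
        result.append((rest, evaluate(rest, model)))  # s <= M[s]
    for x, n in in_U.items():
        result.append(({x: n}, n * model[x]))         # n_i x_i <= M[n_i x_i]
    return result


def concretize(cube: Cube, U: Set[str], model: Model) -> Cube:
    """Concretize every literal; keep the tightest bound per left-hand side."""
    tightest: Dict[Tuple[Tuple[str, int], ...], int] = {}
    for lit in cube:
        for coeffs, bound in concretize_literal(lit, U, model):
            key = tuple(sorted(coeffs.items()))
            tightest[key] = min(bound, tightest.get(key, bound))
    return [(dict(key), bound) for key, bound in tightest.items()]


# phi = (x + y <= 0) /\ (x - y <= 0) /\ (x + z >= 0), the last written as -x - z <= 0
phi = [({"x": 1, "y": 1}, 0), ({"x": 1, "y": -1}, 0), ({"x": -1, "z": -1}, 0)]
M = {"x": 0, "y": 0, "z": 1}
for coeffs, bound in concretize(phi, {"y"}, M):
    print(coeffs, "<=", bound)
```

On the running example this yields x ≤ 0, y ≤ 0, −y ≤ 0 and −x − z ≤ 0, which is the cube x ≤ 0 ∧ 0 ≤ y ≤ 0 ∧ x + z ≥ 0 from the text.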

#### **4.5 Conjecture Rule for LIA**

The Conjecture rule (see Algorithm 2) takes a set of lemmas L and a pob ϕ ≡ α ∧ β such that all lemmas in L block β, but none of them blocks α, where α does not include any known reachable states. It returns α as a new pob.

For LIA, Conjecture is applied when the following conditions are met: (1) the pob $\varphi$ is of the form $\varphi_1 \wedge \varphi_2 \wedge \varphi_3$, where $\varphi_3 = (\boldsymbol{n}^T \cdot \boldsymbol{x} \leq b)$, and $\varphi_1$ and $\varphi_2$ are any cubes. The sub-cube $\varphi_1 \wedge \varphi_2$ acts as α, while the sub-cube $\varphi_2 \wedge \varphi_3$ acts as β. (2) The cluster L consists of $\{bg \vee (\boldsymbol{n}^T \cdot \boldsymbol{x} \geq b_i) \mid 1 \leq i \leq q\}$, where $b_i > b$ and $bg \Rightarrow \neg\varphi_2$. This means that each of the lemmas in L blocks $\beta = \varphi_2 \wedge \varphi_3$, and they may be ordered as a sequence of increasingly stronger lemmas, indicating that they were created by trying to block the pob at different levels, leading to too strong lemmas that failed to propagate to higher levels. (3) The formula $(bg \vee (\boldsymbol{n}^T \cdot \boldsymbol{x} \geq b_i)) \wedge \varphi_1 \wedge \varphi_2$ is satisfiable for each $i$, that is, none of the lemmas in L blocks $\alpha = \varphi_1 \wedge \varphi_2$, and (4) $U \Rightarrow \neg(\varphi_1 \wedge \varphi_2)$, that is, no state in $\varphi_1 \wedge \varphi_2$ is known to be reachable. If all four conditions are met, we conjecture $\alpha = \varphi_1 \wedge \varphi_2$. This is implemented by conjecture, which returns α (or ⊥ when the pre-conditions are not met).


For example, consider the pob $\varphi = x \geq 10 \wedge (x + y \geq 10) \wedge y \leq 10$ and a cluster of lemmas $L = \{(x + y \leq 0 \vee y \geq 101), (x + y \leq 0 \vee y \geq 102)\}$. In this case, $\varphi_1 = x \geq 10$, $\varphi_2 = (x + y \geq 10)$, $\varphi_3 = y \leq 10$, and $bg = x + y \leq 0$. Each of the lemmas in L blocks $\varphi_2 \wedge \varphi_3$ but none of them blocks $\varphi_1 \wedge \varphi_2$. Therefore, we conjecture $\varphi_1 \wedge \varphi_2$: $x \geq 10 \wedge (x + y \geq 10)$.
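The blocking claims in this example can be sanity-checked with a small sketch. It is not how conjecture is implemented: instead of an SMT solver it enumerates a bounded integer grid (an assumption made to keep the example self-contained), so it only illustrates conditions (2) and (3) and proves nothing outside the grid. Condition (4), which needs the set of known reachable states, is omitted. The names `GRID`, `satisfiable`, `blocks` and `conjecture_applies` are hypothetical.

```python
from itertools import product

GRID = range(-150, 151)  # bounded search space, for illustration only


def satisfiable(formula) -> bool:
    """True if some grid point (x, y) satisfies the formula."""
    return any(formula(x, y) for x, y in product(GRID, GRID))


def blocks(lemma, pob) -> bool:
    """A lemma blocks a pob if lemma /\\ pob has no model (on GRID only)."""
    return not satisfiable(lambda x, y: lemma(x, y) and pob(x, y))


phi1 = lambda x, y: x >= 10
phi2 = lambda x, y: x + y >= 10
phi3 = lambda x, y: y <= 10
lemmas = [
    lambda x, y: x + y <= 0 or y >= 101,
    lambda x, y: x + y <= 0 or y >= 102,
]

alpha = lambda x, y: phi1(x, y) and phi2(x, y)  # candidate conjecture
beta = lambda x, y: phi2(x, y) and phi3(x, y)   # part blocked by every lemma


def conjecture_applies(lemmas, alpha, beta) -> bool:
    """Every lemma blocks beta, and no lemma blocks alpha."""
    return (all(blocks(lemma, beta) for lemma in lemmas)
            and not any(blocks(lemma, alpha) for lemma in lemmas))


print(conjecture_applies(lemmas, alpha, beta))  # True
```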

#### **4.6 Putting It All Together**

Having explained the implementation of the new rules for LIA, we now put all the ingredients together into an algorithm, GSpacer. In particular, we present our choices as to when to apply the new rules, and on which clusters of lemmas and pobs. As can be seen in Sect. 5, this implementation works very well on a wide range of benchmarks.

Algorithm 5 presents GSpacer. The comments to the right side of a line refer to the abstract rules in Algorithms 1 and 2. Just like Spacer, GSpacer iteratively computes predecessors (line 10) and blocks them (line 14) in an infinite loop. Whenever a pob is proven to be reachable, the reachable states are updated (line 38). If *Bad* intersects with a reachable state, GSpacer terminates and returns unsafe (line 12). If one of the frames is an inductive invariant, GSpacer terminates with safe (line 20).

When a pob $\langle \varphi, i \rangle$ is handled, we first apply the Concretize rule, if possible (line 7). Recall that Concretize (Algorithm 4) takes as input a cluster that partially blocks ϕ and has a non-linear pattern. To obtain such a cluster, we first find, using $C_{pob}(\langle \varphi, i \rangle)$, a cluster $\langle \pi_1, L_1 \rangle = C_{O_k}(\pi_1)$, where k ≤ i, that includes *some* lemma (from frame k) that blocks ϕ; if none exists, $L_1 = \emptyset$. We then filter out from $L_1$ lemmas that completely block ϕ as well as lemmas that are irrelevant to ϕ, i.e., we obtain $L_2$ by keeping only lemmas that partially block ϕ. We apply Concretize on $\langle \pi_1, L_2 \rangle$ to obtain a new pob that under-approximates ϕ if (1) the remaining sub-cluster, $L_2$, is non-empty, (2) the pattern, $\pi_1$, is non-linear, and (3) $\bigwedge L_2 \wedge \varphi$ is satisfiable, i.e., a part of ϕ is not blocked by any lemma in $L_2$.

Once a pob is blocked, and a new lemma that blocks it, $\ell$, is added to the frames, an attempt is made to apply the Subsume and Conjecture rules on a cluster that includes $\ell$. To that end, the function $C_{lemma}(\ell)$ finds *a* cluster $\langle \pi_3, L_3 \rangle = C_{O_i}(\pi_3)$ to which $\ell$ belongs (Sect. 4.2). Note that the choice of cluster is arbitrary. The rules are applied on $\langle \pi_3, L_3 \rangle$ if the required pre-conditions are met (line 49 and line 53, respectively). When applicable, Subsume returns a new lemma that is added to the frames, while Conjecture returns a new pob that is added to the queue. Note that the latter is a *may* pob, in the sense that some of the states it represents *may not* lead to safety violation.

*Ensuring Progress.* Spacer always makes progress: as its search continues, it establishes the absence of counterexamples at greater and greater depths. However, GSpacer does not ensure progress. Specifically, unrestricted application of the Concretize and Conjecture rules can make GSpacer diverge even on executions of a fixed bound. In our implementation, we ensure progress by allotting a fixed amount of *gas* to each pattern, π, that forms a cluster. Each time Concretize or Conjecture is applied to a cluster with π as the pattern, π loses some gas. Whenever π runs out of gas, the rules are no longer applied to any cluster with π as the pattern. There are finitely many patterns (assuming LIA terms are normalized). Thus, in each bounded execution of GSpacer, the Concretize and Conjecture rules are applied only a finite number of times, thereby ensuring progress. Since the Subsume rule does not hinder progress, it is applied without any restriction on gas.
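The gas mechanism can be pictured with the following minimal sketch. The class name `GasBudget`, the initial amount of gas and the cost per application are all assumptions chosen for illustration; the text does not state the values used in the implementation.

```python
from collections import defaultdict


class GasBudget:
    """Per-pattern budget limiting Concretize/Conjecture applications."""

    def __init__(self, initial_gas: int = 3, cost: int = 1):
        self.cost = cost
        # Every pattern starts with the same (assumed) amount of gas.
        self.gas = defaultdict(lambda: initial_gas)

    def try_spend(self, pattern: str) -> bool:
        """Charge the pattern and return True if a gas-limited rule may fire."""
        if self.gas[pattern] < self.cost:
            return False  # out of gas: the rule is skipped for this pattern
        self.gas[pattern] -= self.cost
        return True


budget = GasBudget(initial_gas=2)
pattern = "v0*x + v1*y + z <= 0"
print([budget.try_spend(pattern) for _ in range(4)])  # [True, True, False, False]
```

Because the number of normalized patterns is finite and each one can be charged only a bounded number of times, the total number of gas-limited rule applications is bounded, which is the argument made above. Subsume would simply bypass `try_spend`.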

### **5 Evaluation**

We have implemented<sup>2</sup> GSpacer (Algorithm 5) as an extension to Spacer. To reduce the dimension of a matrix (in subsume, Sect. 4.3), we compute pairwise linear dependencies between all pairs of columns instead of computing the full kernel. This does not necessarily reduce the dimension of the matrix to its rank, but it is sufficient for our benchmarks. We have experimented with computing the full kernel using SageMath [25], but the overall performance did not improve. Clustering is implemented by anti-unification. LIA terms are normalized using default Z3 simplifications. Our implementation also supports global generalization for non-linear CHCs. We have also extended our work to the theory of LRA. We defer the details of this extension to an extended version of the paper.

<sup>2</sup> https://github.com/hgvk94/z3/tree/gspacer-cav-ae.

To evaluate our implementation, we have conducted two sets of experiments<sup>3</sup>. All experiments were run on Intel E5-2690 V2 CPU at 3 GHz with 128 GB memory with a timeout of 10 min. First, to evaluate the performance of local reasoning with global guidance against pure local reasoning, we have compared GSpacer with the latest Spacer, to which we refer as the *baseline*. We took the benchmarks from CHC-COMP 2018 and 2019 [10]. We compare to Spacer because it dominated the competition by solving 85% of the benchmarks in CHC-COMP 2019 (20% more than the runner up) and 60% of the benchmarks in CHC-COMP 2018 (10% more than runner up). Our evaluation shows that GSpacer outperforms Spacer both in terms of number of solved instances and, more importantly, in overall robustness.

Second, to examine the performance of local reasoning with global guidance compared to solely global reasoning, we have compared GSpacer with an ML-based data-driven invariant inference tool LinearArbitrary [28]. Compared to other similar approaches, LinearArbitrary stands out by supporting invariants with arbitrary Boolean structure over arbitrary linear predicates. It is completely automated and does not require user-provided predicates, grammars, or any other guidance. For the comparison with LinearArbitrary, we have used both the CHC-COMP benchmarks and the benchmarks from the artifact evaluation of [28]. The machine and timeout remain the same. Our evaluation shows that GSpacer is superior in this case as well.

*Comparison with* Spacer. Table 1 summarizes the comparison between Spacer and GSpacer on CHC-COMP instances. Since both tools can use a variety of interpolation strategies during lemma generalization (Line 45 in Algorithm 5), we compare three different configurations of each: *bw* and *fw* stand for two interpolation strategies, *backward* and *forward*, respectively, already implemented in Spacer, and *sc* stands for turning interpolation off and generalizing lemmas only by *subset clauses* computed by inductive generalization.

Any configuration of GSpacer solves significantly more instances than even the best configuration of Spacer. Figure 2 provides a more detailed comparison between the best configurations of both tools in terms of running time and depth of convergence. There is no clear trend in terms of running time on instances solved by both tools. This is not surprising—SMT-solving run time is highly nondeterministic and any change in strategy has a significant impact on performance of SMT queries involved. In terms of depth, it is clear that GSpacer converges at the same or lower depth. The depth is significantly lower for instances solved only by GSpacer.

Moreover, the performance of GSpacer is not significantly affected by the interpolation strategy used. In fact, the configuration *sc*, in which interpolation is disabled, performs the best in CHC-COMP 2018, and only slightly worse in CHC-COMP 2019! In comparison, disabling interpolation hurts Spacer significantly.

<sup>3</sup> Detailed experimental results including the effectiveness of each rule, and the extensions to non-linear CHCs and LRA can be found at https://hgvk94.github.io/gspacer/.

Figure 3 provides a detailed comparison of GSpacer with and without interpolation. Interpolation makes no difference to the depth of convergence. This implies that lemmas that are discovered by interpolation are discovered as efficiently by the global rules of GSpacer. On the other hand, interpolation significantly increases the running time. Interestingly, the time spent in interpolation itself is insignificant. However, the lemmas produced by interpolation tend to slow down other aspects of the algorithm. Most of the slow down is in increased time for inductive generalization and in computation of predecessors. The comparison between the other interpolation-enabled strategy and GSpacer (*sc*) shows a similar trend.


**Table 1.** Comparison between Spacer and GSpacer on CHC-COMP.

**Fig. 2.** Best configurations: GSpacer versus Spacer.

*Comparison with* LinearArbitrary. In [28], the authors show that LinearArbitrary, to which we refer as LArb for short, significantly outperforms Spacer on a curated subset of benchmarks from the SV-COMP [24] competition.

At first, we attempted to compare LArb against GSpacer on the CHC-COMP benchmarks. However, LArb did not perform well on them. Even the baseline Spacer has outperformed LArb significantly. Therefore, for a more meaningful comparison, we have also compared Spacer, LArb and GSpacer on the benchmarks from the artifact evaluation of [28]. The results are summarized in Table 2. As expected, LArb outperforms the baseline Spacer on the safe benchmarks. On unsafe benchmarks, Spacer is significantly better than LArb. In both categories, GSpacer dominates, solving more safe benchmarks than either Spacer or LArb, while matching the performance of Spacer on unsafe instances. Furthermore, GSpacer remains orders of magnitude faster than LArb on benchmarks that are solved by both. This comparison shows that incorporating local reasoning with global guidance not only mitigates its shortcomings but also surpasses global data-driven reasoning.

**Fig. 3.** Comparing GSpacer with different interpolation tactics.

**Table 2.** Comparison with LArb.


#### **6 Related Work**

The limitations of local reasoning in SMT-based infinite state model checking are well known. Most commonly, they are addressed with either (a) different strategies for local generalization in interpolation (e.g., [1,6,19,23]), or (b) shifting the focus to *global* invariant inference by learning an invariant of a restricted shape (e.g., [9,14–16,28]).

*Interpolation Strategies.* Albarghouthi and McMillan [1] suggest minimizing the number of literals in an interpolant, arguing that simpler interpolants (i.e., those with fewer half-spaces) are more likely to generalize. This helps with myopic generalizations (Fig. 1(a)), but not with excessive generalizations (Fig. 1(b)). On the contrary, Blicha et al. [6] decompose interpolants to be numerically simpler (but with more literals), which helps with excessive, but not with myopic, generalizations. Deciding *locally* between these two techniques or on their combination (i.e., some parts of an interpolant might need to be split while others combined) seems impossible. Schindler and Jovanovic [23] propose local interpolation that bounds the number of lemmas generated from a single pob (which helps with Fig. 1(c)), but only if inductive generalization is disabled. Finally, [19] suggests using external guidance, in the form of predicates or terms, to guide interpolation. In contrast, GSpacer uses global guidance, based on the current proof, to direct different local generalization strategies. Thus, the guidance is automatically tuned to the specific instance at hand rather than to a domain of problems.

*Global Invariant Inference.* An alternative to inferring lemmas for the inductive invariant by blocking counterexamples is to enumerate the space of potential candidate invariants [9,14–16,28]. This does not suffer from the pitfall of local reasoning. However, it is only effective when the search space is constrained. While these approaches perform well on their target domain, they do not generalize well to a diverse set of benchmarks, as illustrated by results of CHC-COMP and our empirical evaluation in Sect. 5.

*Locality in SMT and* IMC. Local reasoning is also a known issue in SMT, and, in particular, in DPLL(T) (e.g., [22]). However, we are not aware of global guidance techniques for SMT solvers. Interpolation-based Model Checking (IMC) [20,21], which uses interpolants from proofs, inherits the problem. Compared to IMC, the propagation phase and inductive generalization of IC3 [7] can be seen as providing global guidance using lemmas found in other parts of the search space. In contrast, GSpacer magnifies such global guidance by exploiting patterns within the lemmas themselves.

*IC3-SMT-based Model Checkers.* There are a number of IC3-style SMT-based infinite state model checkers, including [11,17,18]. To our knowledge, none extend the IC3-SMT framework with global guidance. A rule similar to Subsume is suggested in [26] for the theory of bit-vectors and in [4] for LRA, but in both cases without global guidance. In [4], it is implemented via a combination of syntactic closure with interpolation, whereas we use MBP instead of interpolation. Refinement State Mining in [3] uses similar insights to our Subsume rule to refine predicate abstraction.

### **7 Conclusion and Future Work**

This paper introduces *global guidance* to mitigate the limitations of the local reasoning performed by SMT-based IC3-style model checking algorithms. Global guidance is necessary to redirect such algorithms from divergence due to persistent local reasoning. To this end, we present three general rules that introduce new lemmas and pobs by taking a global view of the lemmas learned so far. The new rules are not theory-specific and, as demonstrated by Algorithm 5, can be incorporated into IC3-style solvers without modifying their existing architecture. We instantiate and implement the rules for LIA in GSpacer, which extends Spacer.

Our evaluation shows that global guidance brings significant improvements to local reasoning, and surpasses invariant inference based solely on global reasoning. More importantly, global guidance decouples Spacer's dependency on interpolation strategy and performs almost equally well under all three interpolation schemes we consider. As such, using global guidance in the context of theories for which no good interpolation procedure exists, with bit-vectors being a primary example, arises as a promising direction for future research.

**Acknowledgements.** We thank Xujie Si for running the LArb experiments and collecting results. We thank the ERC starting Grant SYMCAR 639270 and the Wallenberg Academy Fellowship TheProSE for supporting the research visit. This research was partially supported by the United States-Israel Binational Science Foundation (BSF) grant No. 2016260, and the Israeli Science Foundation (ISF) grant No. 1810/18. This research was partially supported by grants from Natural Sciences and Engineering Research Council Canada.

## **References**


28. Zhu, H., Magill, S., Jagannathan, S.: A data-driven CHC solver. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, 18–22 June 2018, pp. 707–721 (2018)


## **Towards Model Checking Real-World Software-Defined Networks**

Vasileios Klimis, George Parisis, and Bernhard Reus

University of Sussex, Brighton, UK {v.klimis,g.parisis,bernhard}@sussex.ac.uk

**Abstract.** In software-defined networks (SDN), a controller program is in charge of deploying diverse network functionality across a large number of switches, but this comes at a great risk: deploying buggy controller code could result in network and service disruption and security loopholes. The automatic detection of bugs or, even better, verification of their absence is thus most desirable, yet the size of the network and the complexity of the controller make this a challenging undertaking. In this paper, we propose MOCS, a highly expressive, optimised SDN model that allows capturing subtle real-world bugs in a reasonable amount of time. This is achieved by (1) analysing the model for possible partial order reductions, (2) statically pre-computing packet equivalence classes and (3) indexing packets and rules that exist in the model. We demonstrate its superiority compared to the state of the art in terms of expressivity, by providing examples of realistic bugs that a prototype implementation of MOCS in Uppaal caught, and performance/scalability, by running examples on various sizes of network topologies, highlighting the importance of our abstractions and optimisations.

#### **1 Introduction**

Software-Defined Networking (SDN) [16] has brought about a paradigm shift in designing and operating computer networks. A logically centralised controller implements the control logic and 'programs' the data plane, which is defined by flow tables installed in network switches. SDN enables the rapid development of advanced and diverse network functionality; e.g. in designing next-generation inter-data centre traffic engineering [10], load balancing [19], firewalls [24], and Internet exchange points (IXPs) [15]. SDN has gained noticeable ground in the industry, with major vendors integrating OpenFlow [37], the de-facto SDN standard maintained by the Open Networking Foundation, in their products. Operators deploy it at scale [27,38]. SDN presents a unique opportunity for innovation and rapid development of complex network services by enabling all players, not just vendors, to develop and deploy control and data plane functionality in networks. This comes at a great risk; deploying buggy code at the controller could result in problematic flow entries at the data plane and, potentially, service disruption [13,18,47,49] and security loopholes [7,26]. Understanding and fixing such bugs is far from trivial, given the distributed and concurrent nature of computer networks and the complexity of the control plane [44].

With the advent of SDN, a large body of research on verifying network properties has emerged [33]. Static network analysis approaches [2,11,30,34,45,51] can only verify network properties on a given fixed network configuration but this may be changing very quickly (e.g. as in [1]). Another key limitation is the fact that they cannot reason about the controller program, which, itself, is responsible for the changes in the network configuration. Dynamic approaches, such as [23,29,31,40,48,50], are able to reason about network properties as changes happen (i.e. as flow entries in switches' flow tables are being added and deleted), but they cannot reason about the controller program either. As a result, when a property violation is detected, there is no straightforward way to fix the bug in the controller code, as these systems are oblivious of the running code. Identifying bugs in large and complex deployments can be extremely challenging.

Formal verification methods that include the controller code in the model of the network can solve this important problem. Symbolic execution methods, such as [5,8,11,12,14,28,46], evaluate programs using symbolic variables, accumulating path conditions along the way that can then be solved logically. However, they suffer from the path explosion problem caused by loops and function calls, which means that verification does not scale to larger controller programs (bug finding still works but is limited). Model checking SDNs is a promising area even though only a few studies have been undertaken [3,8,28,35,36,43]. Networks and controllers can be naturally modelled as transition systems. State explosion is always a problem but can be mitigated by using abstraction and optimisation techniques (e.g. partial order reductions). At the same time, modern model checkers [6,9,20,21,25] are very efficient.

netsmc [28] uses a bespoke *symbolic* model checking algorithm for checking properties given a subset of computation tree logic that allows quantification only over all paths. As a result, this approach scales relatively well, but the requirement that only one packet can travel through the network at any time is very restrictive and ignores race conditions. nice [8] employs model checking but only looks at a limited number of input packets that are extracted through symbolically executing the controller code. As a result, it is a bug-finding tool only. The authors in [43] propose a model checking approach that can deal with dynamic controller updates and an arbitrary number of packets but requires manually inserted non-interference lemmas that constrain the set of packets that can appear in the network. This significantly limits its applicability in realistic network deployments. Kuai [35] overcomes this limitation by introducing model-specific partial order reductions (PORs) that result in pruning the state space by avoiding redundant explorations. However, it has limitations explained at the end of this section.

In this paper, we take a step further towards the full realisation of model checking real-world SDNs by introducing MOCS (MOdel Checking for Software defined networks)<sup>1</sup>, a highly expressive, optimised SDN model which we implemented in Uppaal<sup>2</sup> [6]. MOCS, compared to the state of the art in model checking SDNs, can model network behaviour more realistically and verify larger deployments using fewer resources. The main contributions of this paper are:

<sup>1</sup> A release of MOCS is publicly available at https://tinyurl.com/y95qtv5k.

**Model Generality.** The proposed network model is closer to the OpenFlow standard than previous models (e.g. [35]) to reflect commonly exhibited behaviour between the controller and network switches. More specifically, it allows for race conditions between control messages and includes a significant number of OpenFlow interactions, including barrier response messages. In our experimentation section, we present families of elusive bugs that can be efficiently captured by MOCS.

**Model Checking Optimisations.** To tackle the state explosion problem we propose context-dependent *partial order reductions* by considering the concrete control program and specification in question. We establish the soundness of the proposed optimisations. Moreover, we propose *state representation optimisations*, namely packet and rule indexing, identification of packet equivalence classes and bit packing, to improve performance. We evaluate the benefits from all proposed optimisations in Sect. 4.

Our model has been inspired by Kuai [35]. According to the contributions above, however, we consider MOCS to be a considerable improvement. We model more OpenFlow messages and interactions, enabling us to check for bugs that [35] cannot even express (see discussion in Sect. 4.2). Our context-dependent PORs systematically explore possibilities for optimisation. Our optimisation techniques still allow MOCS to run at least as efficiently as Kuai, often with even better performance.

### **2 Software-Defined Network Model**

A key objective of our work is to enable the verification of network-wide properties in real-world SDNs. In order to fulfill this ambition, we present an extended network model to capture complex interactions between the SDN controller and the network. Below we describe the adopted network model, its state and transitions.

#### **2.1 Formal Model Definition**

The formal definition of the proposed SDN model is by means of an action-deterministic transition system. We parameterise the model by the underlying network topology λ and the controller program cp in use, as explained further below (Sect. 2.2).

**Definition 1.** *An SDN model is a 6-tuple* $\mathcal{M}_{(\lambda,\mathrm{cp})} = (S, s_0, A, \rightarrow, AP, L)$*, where* $S$ *is the set of all states the SDN may enter,* $s_0$ *the initial state,* $A$ *the set of actions which encode the events the network may engage in,* ${\rightarrow} \subseteq S \times A \times S$ *the transition relation describing which execution steps the system undergoes as it performs actions,* $AP$ *a set of atomic propositions describing relevant state properties, and* $L : S \rightarrow 2^{AP}$ *is a labelling function, which relates to any state* $s \in S$ *a set* $L(s) \in 2^{AP}$ *of those atomic propositions that are true for* $s$*. Such an SDN model is composed of several smaller systems, which model network components (hosts, switches and the controller) that communicate via queues and, combined, give rise to the definition of* $\rightarrow$*. The states of an SDN transition system are 3-tuples* $(\pi, \delta, \gamma)$*, where* $\pi$ *represents the state of each host,* $\delta$ *the state of each switch, and* $\gamma$ *the controller state. The components are explained in Sect. 2.2 and the transitions* $\rightarrow$ *in Sect. 2.3.*

<sup>2</sup> Uppaal has been chosen as future plans include extending the model to timed actions such as timeouts. Note that the model can be implemented in any model checker.

Figure 1 illustrates a high-level view of OpenFlow interactions (left side) and of the modelled actions and queues (right side).

**Fig. 1.** A high-level view of OpenFlow interactions using OpenFlow specification terminology (left half) and the modelled actions (right half). A red solid-line arrow depicts an action which, when fired, (1) dequeues an item from the queue the arrow begins at, and (2) adds an item in the queue the arrowhead points to (or multiple items if the arrow is double-headed). Deleting an item from the target queue is denoted by a reverse arrowhead. A forked arrow denotes multiple targeted queues. (Color figure online)

#### **2.2 SDN Model Components**

Throughout we will use the common "dot-notation" ( . ) to refer to components of composite gadgets (tuples), e.g. queues of switches, or parts of the state. We use obvious names for the projection functions, like s.δ.sw.pq for the packet queue of the switch *sw* in state s. At times we will also use t<sub>1</sub> and t<sub>2</sub> for the first and second projection of tuple t.

**Network Topology.** A location $(n, pt)$ is a pair of a node (host or switch) $n$ and a port $pt$. We describe the network topology as a bijective function $\lambda : (\mathit{Switches} \cup \mathit{Hosts}) \times \mathit{Ports} \rightarrow (\mathit{Switches} \cup \mathit{Hosts}) \times \mathit{Ports}$ consisting of a set of directed edges $\langle (n, pt), (n', pt') \rangle$, where $pt'$ is the input port of the switch or host $n'$ that is connected to port $pt$ at host or switch $n$. *Hosts*, *Switches* and *Ports* are the (finite) sets of all hosts, switches and ports in the network, respectively. The topology function is used when a packet needs to be forwarded in the network. The location of the next hop node is decided when a *send*, *match* or *fwd* action (all defined further below) is fired. Every SDN model is defined w.r.t. a fixed topology λ that does not change.
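As a rough illustration of the topology function, the sketch below stores λ as a finite map between locations and checks that it is a bijection on the locations in use. The three-node line topology, the node names and the helper names `is_bijective` and `next_hop` are assumptions made for this example; listing every link in both directions is likewise an assumption of the sketch.

```python
from typing import Dict, Tuple

Location = Tuple[str, int]  # (node, port)


def is_bijective(topology: Dict[Location, Location]) -> bool:
    """No two locations share an image, and images cover all locations."""
    images = list(topology.values())
    return len(set(images)) == len(images) and set(images) == set(topology)


def next_hop(topology: Dict[Location, Location], loc: Location) -> Location:
    """Location reached when a packet leaves through the given port."""
    return topology[loc]


# Hypothetical line topology: h1 -- s1 -- s2 -- h2
topology = {
    ("h1", 1): ("s1", 1), ("s1", 1): ("h1", 1),
    ("s1", 2): ("s2", 1), ("s2", 1): ("s1", 2),
    ("s2", 2): ("h2", 1), ("h2", 1): ("s2", 2),
}
print(is_bijective(topology))           # True
print(next_hop(topology, ("s1", 2)))    # ('s2', 1)
```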

**Packets.** Packets are modelled as finite bit vectors and transferred in the network by being stored to the queues of the various network components. A packet in *Packets* (the set of all packets that can appear in the network) contains bits describing the proof-relevant header information and its location loc.

**Hosts.** Each host in *Hosts* has a packet queue (*rcvq*) and a finite set of ports which are connected to ports of other switches. A host can send a packet to one or more switches it is connected to (*send* action in Fig. 1) or receive a packet from its own *rcvq* (*recv* action in Fig. 1). Sending occurs repeatedly in a non-deterministic fashion, which we model implicitly via the (0, ∞) abstraction at switches' packet queues, as discussed further below.

**Switches.** Each switch in *Switches* has a flow table (*ft*), a packet queue (*pq*), a control queue (*cq*), a forwarding queue (*fq*) and one or more ports, through which it is connected to other switches and/or hosts. A flow table $ft \subseteq \mathit{Rules}$ is a set of forwarding rules (with *Rules* being the set of all rules). Each rule is a tuple $(\mathit{priority}, \mathit{pattern}, \mathit{ports})$, where $\mathit{priority} \in \mathbb{N}$ determines the priority of the rule over others, $\mathit{pattern}$ is a proposition over the proof-relevant header of a packet, and $\mathit{ports}$ is a subset of the switch's ports. Switches match packets in their packet queues against rules (i.e. their respective pattern) in their flow table (*match* action in Fig. 1) and forward packets to a connected device (or final destination), accordingly. Packets that cannot be matched to any rule are sent to the controller's request queue (*rq*) (*nomatch* action in Fig. 1); in OpenFlow, this is done by sending a *PacketIn* message. The forwarding queue *fq* stores packets forwarded by the controller in *PacketOut* messages. The control queue stores messages sent by the controller in *FlowMod* and *BarrierReq* messages. *FlowMod* messages contain instructions to add or delete rules from the flow table (that trigger *add* and *del* actions in Fig. 1). *BarrierReq* messages contain barriers to synchronise the addition and removal of rules. MOCS conforms to the OpenFlow specifications and always executes instructions in an interleaved fashion obeying the ordering constraints imposed by barriers.
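The following Python sketch illustrates how a flow table of (priority, pattern, ports) rules can be searched for the highest-priority matching rule. It is an illustration only: the names `Rule` and `best_match`, the header field `tcp_dst`, and the convention that a larger number means a higher priority are assumptions made for this example.

```python
from typing import Callable, FrozenSet, NamedTuple, Optional


class Rule(NamedTuple):
    priority: int                     # larger value = higher priority (assumed)
    pattern: Callable[[dict], bool]   # proposition over the packet header
    ports: FrozenSet[str]             # output ports; may contain "drop"


def best_match(flow_table, header) -> Optional[Rule]:
    """Return the highest-priority rule whose pattern holds for `header`,
    or None when no rule matches (the situation handled by nomatch)."""
    candidates = [r for r in flow_table if r.pattern(header)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.priority)


# Hypothetical flow table: drop one class of traffic, forward the rest.
block_rule = Rule(10, lambda h: h.get("tcp_dst") == 22, frozenset({"drop"}))
default_rule = Rule(1, lambda h: True, frozenset({"2"}))
flow_table = {block_rule, default_rule}
```

If two matching rules share the same priority, this sketch returns an arbitrary one of them; how such ties are resolved is not specified here.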

**OpenFlow Controller.** The controller is modelled as a finite state automaton embedded into the overall transition system. A controller program $cp$, as used to parametrise an SDN model, consists of $(CS, \mathit{pktIn}, \mathit{barrierIn})$. It uses its own local state $cs \in CS$, where $CS$ is the finite set of control program states. Incoming *PacketIn* and *BarrierRes* messages from the SDN model are stored in separate queues (*rq* and *brq*, respectively) and trigger *ctrl* or *bsync* actions (see Fig. 1) which are then processed by the controller program in its current state. The controller's corresponding handler, *pktIn* for *PacketIn* messages and *barrierIn* for *BarrierRes* messages, responds by potentially changing its local state and sending messages to a subset of *Switches*, as follows. A number of *PacketOut* messages (pairs of *pkt*, *ports*) can be sent to a subset of *Switches*. Such a message is stored in a switch's forwarding queue and instructs it to forward packet *pkt* along the ports *ports*. The controller may also send any number of *FlowMod* and *BarrierReq* messages to the control queue of any subset of *Switches*. A *FlowMod* message may contain an *add* or *delete* rule modification instruction. These are executed in an arbitrary order by switches, and *barriers* are used to synchronise their execution. Barriers are sent by the controller in *BarrierReq* messages. OpenFlow requires that a response message (*BarrierRes*) is sent to the controller by a switch when a barrier is consumed from its control queue so that the controller can synchronise subsequent actions. Our model includes a *brepl* action that models the sending of a *BarrierRes* message from a switch to the controller's barrier reply queue (*brq*), and a *bsync* action that enables the controller program to react to barrier responses.

**Queues.** All queues in the network are modelled as *finite* state. Packet queues *pq* for switches are modelled as multisets, and we adopt the $(0, \infty)$ abstraction [41]; i.e. a packet is assumed to appear either zero or an arbitrary (unbounded) number of times in the respective multiset. This means that once a packet has arrived at a switch or host, (infinitely) many other packets of the same kind repeatedly arrive at this switch or host. Switches' forwarding queues *fq* are, by contrast, modelled as sets; therefore, if multiple identical packets are sent by the controller to a switch, only one will be stored in the queue and eventually forwarded by the switch. The controller's request queue *rq* and barrier reply queue *brq* are modelled as sets as well. Hosts' receive queues *rcvq* are also modelled as sets. Control queues *cq* at switches are modelled as a finite sequence of sets of control messages (representing add and remove rule instructions), interleaved by any number of barriers. As the number of barriers that can appear in any execution is finite, this sequence is finite.
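The difference between the two kinds of queue can be sketched in Python as follows. This is only an illustration of the abstraction, with hypothetical class and method names: under the zero-or-unboundedly-many view a packet queue needs to record only presence, whereas a forwarding queue behaves like an ordinary set from which entries are really removed.

```python
class AbstractPacketQueue:
    """Packet queue where a packet is either absent or present in
    unboundedly many copies, so recording presence in a set suffices."""

    def __init__(self):
        self._present = set()

    def add(self, pkt):
        self._present.add(pkt)  # adding an existing packet changes nothing

    def consume(self, pkt):
        # Taking one copy leaves unboundedly many behind, so the packet
        # stays in the queue; we only report whether it was available.
        return pkt in self._present

    def __contains__(self, pkt):
        return pkt in self._present


class ForwardingQueue(set):
    """Forwarding queue modelled as a plain set: duplicates collapse and a
    forwarded entry is removed."""

    def forward(self, entry):
        self.remove(entry)
        return entry
```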

#### **2.3 Guarded Transitions**

Here we provide a detailed breakdown of the transition relation $s \xrightarrow{\alpha(a)} s'$ for each action $\alpha(a) \in A(s)$, where $A(s)$ is the set of all enabled actions in $s$ in the proposed model (see Fig. 1). Transitions are labelled by action names $\alpha$ with arguments $a$. The transitions are only enabled in state $s$ if $s$ satisfies certain conditions called *guards* that can refer to the arguments $a$. In guards, we make use of the predicate $\mathit{bestmatch}(sw, r, pkt)$ that expresses that $r$ is the highest priority rule in $sw.ft$ that matches $pkt$'s header. Below we list all possible actions with their respective guards.

*send*(*h*, *pt*, *pkt*). Guard: *true*. This transition models packets arriving in the network in a non-deterministic fashion. When it is executed, $pkt$ is added to the packet queue of the network switch connected to the port $pt$ of host $h$ (or, formally, to $\lambda(h, pt)_1.pq$, where $\lambda$ is the topology function described above). As described in Sect. 3.2, only relevant representatives of packets are actually sent by end-hosts. This transition is unguarded, therefore it is always enabled.

*recv*(*h*, *pkt*). Guard: $pkt \in h.rcvq$. This transition models hosts receiving (and removing) packets from the network and is enabled if $pkt$ is in $h$'s receive queue.

*match*(*sw*, *pkt*, *r*). Guard: $pkt \in sw.pq \wedge r \in sw.ft \wedge \mathit{bestmatch}(sw, r, pkt)$. This transition models matching and forwarding packet $pkt$ to zero or more next hop nodes (hosts and switches), as a result of highest priority matching of rule $r$ with $pkt$. The packet is then copied to the packet queues of the connected hosts and/or switches, by applying the topology function to the port numbers in the matched rule; i.e. $\lambda(sw, pt)_1.pq$, $\forall pt \in r.ports$. Dropping packets is modelled by having a special 'drop' port that can be included in rules. The location of the forwarded packet(s) is updated with the respective destination (switch/host, port) pair; i.e. $\lambda(sw, pt)$. Due to the $(0, \infty)$ abstraction, the packet is not removed from $sw.pq$.

*nomatch*(*sw*, *pkt*). Guard: $pkt \in sw.pq \wedge \nexists r \in sw.ft \,.\, \mathit{bestmatch}(sw, r, pkt)$. This transition models forwarding a packet to the OpenFlow controller when a switch does not have a rule in its flow table that can be matched against the packet header. In this case, $pkt$ is added to *rq* for processing. $pkt$ is not removed from $sw.pq$ due to the $(0, \infty)$ abstraction.

*ctrl*(*sw*, *pkt*, *cs*). Guard: $pkt \in \mathit{controller}.rq$. This transition models the execution of the packet handler by the controller when packet $pkt$ that was previously sent by $sw$ is available in *rq*. The controller's packet handler function $\mathit{pktIn}(sw, pkt, cs)$ is executed which, in turn, (i) reads the current controller state $cs$ and changes it according to the controller program, (ii) adds a number of rules, interleaved with any number of barriers, into the *cq* of zero or more switches, and (iii) adds zero or more forwarding messages, each one including a packet along with a set of ports, to the *fq* of zero or more switches.

*fwd*(*sw*, *pkt*, *ports*). Guard: $(pkt, ports) \in sw.fq$. This transition models forwarding packet $pkt$ that was previously sent by the controller to $sw$'s forwarding queue $sw.fq$. In this case, $pkt$ is removed from $sw.fq$ (which is modelled as a set), and added to the *pq* of a number of network nodes (switches and/or hosts), as defined by the topology function: $\lambda(sw, pt)_1.pq$, $\forall pt \in ports$. The location of the forwarded packet(s) is updated with the respective destination (switch/host, port) pair; i.e. $\lambda(sw, pt)$.

*FM*(*sw*, *r*), where $\mathit{FM} \in \{\mathit{add}, \mathit{del}\}$. Guard: $(\mathit{FM}, r) \in \mathit{head}(sw.cq)$. These transitions model the addition and deletion, respectively, of a rule in the flow table of switch $sw$. They are enabled when one or more *add* and *del* control messages are in the set at the head of the switch's control queue. In this case, $r$ is added to (or deleted from, respectively) $sw.ft$ and the control message is deleted from the set at the head of *cq*. If the set at the head of *cq* becomes empty, it is removed. If the next item in *cq* is then a barrier, a *brepl* transition becomes enabled (see below).

*brepl*(*sw*, *xid*). Guard: $b(xid) = \mathit{head}(sw.cq)$. This transition models a switch sending a barrier response message upon consuming a barrier from the head of its control queue; i.e. if $b(xid)$ is the head of $sw.cq$, where $xid \in \mathbb{N}$ is an identifier for the barrier set by the controller, $b(xid)$ is removed and the barrier reply message $br(sw, xid)$ is added to the controller's *brq*.

*bsync*(*sw*, *xid*, *cs*). Guard: $br(sw, xid) \in \mathit{controller}.brq$. This transition models the execution of the barrier response handler by the controller when a barrier response sent by switch $sw$ is available in *brq*. In this case, $br(sw, xid)$ is removed from the *brq*, and the controller's barrier handler $\mathit{barrierIn}(sw, xid, cs)$ is executed which, in turn, (i) reads the current controller state $cs$ and changes it according to the controller program, (ii) adds a number of rules, interleaved with any number of barriers, into the *cq* of zero or more switches, and (iii) adds zero or more forwarding messages, each one including a packet along with a set of ports, to the *fq* of zero or more switches.
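To make the interplay of rule updates and barriers concrete, the sketch below models a control queue as a sequence of sets of control messages separated by barriers, together with the guards of the rule-update and barrier-reply transitions. It is a simplified Python illustration; the class `Barrier` and the function names are hypothetical.

```python
from collections import deque


class Barrier:
    """A barrier b(xid) in a control queue."""

    def __init__(self, xid):
        self.xid = xid


def enabled_flow_mods(cq):
    """Control messages that may fire now: those in the set at the head of cq."""
    if cq and isinstance(cq[0], set):
        return set(cq[0])
    return set()


def apply_flow_mod(flow_table, cq, msg):
    """Fire add(sw, r) or del(sw, r) for msg = (kind, rule)."""
    kind, rule = msg
    assert msg in enabled_flow_mods(cq), "guard: (FM, r) in head(sw.cq)"
    if kind == "add":
        flow_table.add(rule)
    else:
        flow_table.discard(rule)
    cq[0].remove(msg)
    if not cq[0]:  # an emptied head set is removed from the queue
        cq.popleft()


def barrier_reply(sw, cq, brq):
    """Fire brepl(sw, xid): consume the barrier at the head of cq and put
    the reply br(sw, xid) into the controller's barrier reply queue."""
    assert cq and isinstance(cq[0], Barrier), "guard: b(xid) = head(sw.cq)"
    barrier = cq.popleft()
    brq.add((sw, barrier.xid))
```

Messages within one set may fire in any order, while a message behind a barrier cannot fire before the barrier has been consumed, which is the ordering constraint described above.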

**An Example Run.** In Fig. 2, we illustrate a sequence of MOCS transitions through a simple packet forwarding example. The run starts with a *send* transition; packet $p$ is copied to the packet queue of the switch in black. Initially, switches' flow tables are empty, therefore $p$ is copied to the controller's request queue (*nomatch* transition); note that $p$ remains in the packet queue of the switch in black due to the $(0, \infty)$ abstraction. The controller's packet handler is then called (*ctrl* transition) and, as a result, (1) $p$ is copied to the forwarding queue of the switch in black, (2) rule $r_1$ is copied to the control queue of the switch in black, and (3) rule $r_2$ is copied to the control queue of the switch in white. Then, the switch in black forwards $p$ to the packet queue of the switch in white (*fwd* transition). The switch in white installs $r_2$ in its flow table (*add* transition) and then matches $p$ with the newly installed rule and forwards it to the receive queue of the host in white (*match* transition), which removes it from the network (*recv* transition).

#### **2.4 Specification Language**

In order to specify properties of packet flow in the network, we use LTL formulas without the "next-step" operator $\bigcirc$<sup>3</sup>, where atomic formulae denote properties of states of the transition system, i.e. the SDN network. In the case of safety properties, i.e. an invariant w.r.t. states, the $LTL_{\setminus\{\bigcirc\}}$ formula is of the form $\Box\varphi$, i.e. has only an outermost temporal connective.

Let $P$ denote unary predicates on packets which encode a property of a packet based on its fields. An atomic *state condition* (proposition) in $AP$ is either of the following: (i) existence of a packet $pkt$ located in a packet queue (*pq*) of a switch or in a receive queue (*rcvq*) of a host that satisfies $P$ (we denote this by $\exists pkt \in n.pq \,.\, P(pkt)$ with $n \in \mathit{Switches}$, and $\exists pkt \in h.rcvq \,.\, P(pkt)$ with $h \in \mathit{Hosts}$)<sup>4</sup>; (ii) the controller is in a specific *controller* state $q \in CS$, denoted by a unary predicate symbol $Q(q)$ which holds in system state $s \in S$ if $q = s.\gamma.cs$. The specification logic comprises first-order formulae with equality on the finite domains of switches, hosts, rule priorities, and ports, which are *state-independent* (and decidable).

<sup>3</sup> This is the largest set of formulae supporting the partial order reductions used in Sect. 3, as stutter equivalence does not preserve the truth value of formulae with the $\bigcirc$ operator.

**Fig. 2.** Forwarding *p* from the sending host to the receiving host. Non greyed-out icons are the ones whose state changes in the current transition.

For example, $\exists pkt \in sw.pq \,.\, P(pkt)$ represents the fact that the packet predicate $P(\cdot)$ is true for at least one packet $pkt$ in the *pq* of switch $sw$. For every atomic packet proposition $P(pkt)$, its negation $\neg P(pkt)$ is also an atomic proposition, which simplifies the syntactic checks of formulae in Table 1 in the next section. Note that universal quantification over packets in a queue is a derived notion. For instance, $\forall pkt \in n.pq \,.\, P(pkt)$ can be expressed as $\nexists pkt \in n.pq \,.\, \neg P(pkt)$. Universal and existential quantification over switches or hosts can be expressed by finite iterations of $\wedge$ and $\vee$, respectively.

In order to be able to express that a condition holds when a certain event happened, we add to our propositions instances of *propositional dynamic logic* [17,42]. Given an action $\alpha(\cdot) \in A$ and a proposition $P$ that may refer to any variables in $x$, $[\alpha(x)]P$ is also a proposition, and $[\alpha(x)]P$ is true if, and only if, after firing transition $\alpha(a)$ (to get to the current state), $P$ holds with the variables in $x$ bound to the corresponding values in the actual arguments $a$. With the help of those basic modalities one can then also specify that more complex events occurred. For instance, dropping of a packet due to a *match* or *fwd* action can be expressed by $[\mathit{match}(sw, pkt, r)](r.\mathit{fwd\_port} = \mathit{drop}) \wedge [\mathit{fwd}(sw, pkt, pt)](pt = \mathit{drop})$. Such predicates derived from modalities are used in [32] (extended version of this paper, with proofs and controller programs), Appendix B-CP5.

<sup>4</sup> Note that these are *atomic* propositions despite the use of the existential quantifier notation.

The meaning of temporal LTL operators is standard depending on the trace of a transition sequence $s_0 \xrightarrow{\alpha_1} s_1 \xrightarrow{\alpha_2} \ldots$. The trace $L(s_0)L(s_1)\ldots L(s_i)\ldots$ is defined as usual. For instance, trace $L(s_0)L(s_1)L(s_2)\ldots$ satisfies invariant $\Box\varphi$ if each $L(s_i)$ implies $\varphi$.
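As an informal illustration of these notions, the Python sketch below evaluates an existential packet proposition on a state, derives the universal variant from it, and checks a state formula on every state of a finite trace prefix. The state layout (nested dictionaries) and all function names are assumptions made for this example; a finite prefix can refute an invariant but cannot establish it for infinite traces.

```python
def exists_packet(state, node, queue, pred):
    """Atomic proposition: some packet in state[node][queue] satisfies pred."""
    return any(pred(pkt) for pkt in state[node][queue])


def forall_packets(state, node, queue, pred):
    """Derived notion: there is no packet in the queue that violates pred."""
    return not exists_packet(state, node, queue, lambda pkt: not pred(pkt))


def holds_invariantly(trace, phi):
    """Check that the state formula phi holds in every state of a finite
    trace prefix."""
    return all(phi(state) for state in trace)
```

For instance, with a hypothetical predicate `is_blocked` on packets, the formula `lambda s: not exists_packet(s, "h2", "rcvq", is_blocked)` states that no blocked packet is ever delivered to host `h2`.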

#### **3 Model Checking**

In order to verify desired properties of an SDN, we use its model as described in Definition 1 and apply model checking. In the following we propose optimisations that significantly improve the performance of model checking.

#### **3.1 Contextual Partial-Order Reduction**

*Partial order reduction* (POR) [39] reduces the number of interleavings (traces) one has to check. Here is a reminder of the main result (see [4]) where we use a stronger condition than the regular (*C4* ) to deal with cycles:

**Theorem 1 (Correctness of POR).** *Given a finite transition system* $M = (S, A, \to, s_0, AP, L)$ *that is action-deterministic and without terminal states, let* $A(s)$ *denote the set of actions in* $A$ *enabled in state* $s \in S$*. Let* $\mathit{ample}(s) \subseteq A(s)$ *be a set of actions for a state* $s \in S$ *that satisfies the following conditions:*


*where* $M^{\mathit{ample}} = (S_a, A, \Rightarrow, s_0, AP, L_a)$ *is the new, optimised, model defined as follows: let* $S_a \subseteq S$ *be the set of states reachable from the initial state* $s_0$ *under* $\Rightarrow$*, let* $L_a(s) = L(s)$ *for all* $s \in S_a$*, and define* $\Rightarrow \; \subseteq S_a \times A \times S_a$ *inductively by the rule*

$$\frac{s \stackrel{\alpha}{\longrightarrow} s'}{s \stackrel{\alpha}{\Longrightarrow} s'} \qquad \text{if} \quad \alpha \in \mathit{ample}(s)$$

*If* $\mathit{ample}(s)$ *satisfies conditions (C1)–(C4) as outlined above, then for each path in* $M$ *there exists a stutter-trace equivalent path in* $M^{\mathit{ample}}$*, and vice-versa, denoted* $M \stackrel{st}{\equiv} M^{\mathit{ample}}$*.*

The intuitive reason for this theorem to hold is the following: Assume an action sequence $\alpha_i \ldots \alpha_{i+n}\beta$ that reaches the state $s$, and $\beta$ is *independent* of $\{\alpha_i, \ldots, \alpha_{i+n}\}$. Then, one can permute $\beta$ with $\alpha_{i+n}$ through $\alpha_i$ successively $n$ times. One can therefore construct the sequence $\beta\alpha_i \ldots \alpha_{i+n}$ that also reaches the state $s$. If this shift of $\beta$ does not affect the labelling of the states with atomic propositions ($\beta$ is called *invisible* in this case), then it is not detectable by the property to be shown; the permuted and the original sequence are equivalent w.r.t. the property, and thus only one of them has to be checked. One must, however, ensure that in case of loops (infinite execution traces) the ample sets do not *preclude* some actions from being fired altogether, which is why one needs (*C4*).
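The commutation argument can be tried out on a toy example. In the Python sketch below, actions are functions on states and `commute` tests whether two actions reach the same state in either order. The actions and the state layout are hypothetical, and the sketch ignores the second half of independence (that neither action disables the other).

```python
def commute(state, alpha, beta):
    """True if firing alpha then beta reaches the same state as beta then alpha."""
    return beta(alpha(state)) == alpha(beta(state))


# Toy state: (set of queued packets, log kept by a controller handler).
def send_p(state):
    pq, log = state
    return (pq | {"p"}, log)


def send_q(state):
    pq, log = state
    return (pq | {"q"}, log)


def handler(state):
    # Toy order-sensitive handler: records how many packets were queued
    # at the moment it ran.
    pq, log = state
    return (pq, log + (len(pq),))


initial = (frozenset(), ())
```

Here `send_p` and `send_q` commute from `initial`, so exploring one interleaving suffices, whereas `send_p` and `handler` do not, so both orders have to be kept.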

The more actions there are that are both stutter actions and provably independent (also referred to as *safe actions* [22]), the smaller the transition system and the more efficient the model checking. One of our contributions is that we attempt to identify *as many safe actions as possible* to make PORs more widely applicable to our model.

The PORs in [35] consider only dependency and invisibility of *recv* and *barrier* actions, whereas we explore systematically all possibilities for applications of Theorem 1 to reduce the search space. When identifying safe actions, we consider (1) the actual controller program cp, (2) the topology λ and (3) the state formula ϕ to be shown invariant, which we call the *context* ctx of actions. It turns out that two actions may be dependent in a given context of abstraction while independent in another context, and similarly for invisibility, and we exploit this fact. The argument of the action thus becomes relevant as well.

**Definition 2 (Safe Actions).** *Given a context* $ctx = (cp, \lambda, \varphi)$*, and SDN model* $M_{(\lambda,cp)} = (S, A, \to, s_0, AP, L)$*, an action* $\alpha(\cdot) \in A(s)$ *is called 'safe' if it is independent of any other action in* $A$ *and invisible for* $\varphi$*. We write safe actions* $\check{\alpha}(\cdot)$*.*

**Definition 3 (Order-sensitive Controller Program).** *A controller program* $cp$ *is order-sensitive if there exists a state* $s \in S$ *and two actions* $\alpha, \beta$ *in* $\{\mathit{ctrl}(\cdot), \mathit{bsync}(\cdot)\}$ *such that* $\alpha, \beta \in A(s)$ *and* $s \xrightarrow{\alpha} s_1 \xrightarrow{\beta} s_2$ *and* $s \xrightarrow{\beta} s_3 \xrightarrow{\alpha} s_4$ *with* $s_2 \neq s_4$*.*

**Definition 4.** *Let* $\varphi$ *be a state formula. An action* $\alpha \in A$ *is called '*$\varphi$*-invariant' if* $s \models \varphi$ *iff* $\alpha(s) \models \varphi$ *for all* $s \in S$ *with* $\alpha \in A(s)$*.*

**Lemma 1.** *For transition system* $M_{(\lambda,cp)} = (S, A, \to, s_0, AP, L)$ *and a formula* $\varphi \in LTL_{\setminus\{\bigcirc\}}$*,* $\alpha \in A$ *is safe iff* $\bigwedge_{i=1}^{3} \mathit{Safe}_i(\alpha)$*, where* $\mathit{Safe}_i$*, given in Table 1, are per-row.*

*Proof.* See [32] Appendix A.

**Theorem 2 (POR instance for SDN).** *Let* $(cp, \lambda, \varphi)$ *be a context such that* $M_{(\lambda,cp)} = (S, A, \to, s_0, AP, L)$ *is an SDN network model from Definition 1; and let safe actions be as in Definition 2. Further, let* $\mathit{ample}(s)$ *be defined by:*

$$\mathit{ample}(s) = \begin{cases} \{ \alpha \in A(s) \mid \alpha \text{ safe} \} & \text{if } \{ \alpha \in A(s) \mid \alpha \text{ safe} \} \neq \emptyset \\ A(s) & \text{otherwise} \end{cases}$$

*Then, ample satisfies the criteria of Theorem 1 and thus* $M_{(\lambda,cp)} \stackrel{st}{\equiv} M^{\mathit{ample}}_{(\lambda,cp)}$*.*<sup>5</sup>

**Table 1.** Safeness predicates

#### *Proof.*


*Case 1. A sequence of safe actions of same type*. Let us consider the different safe actions:

• Let $\rho$ be an execution of $M^{\mathit{ample}}_{(\lambda,cp)}$ which consists of only one type of *ctrl*-actions:

$$\rho = s_1 \xrightarrow{ctrl(pkt_1, cs_1)} s_2 \xrightarrow{ctrl(pkt_2, cs_2)} \dots s_{i-1} \xrightarrow{ctrl(pkt_{i-1}, cs_{i-1})} s_i$$

Suppose $\rho$ is a cycle. According to the *ctrl* semantics, for each transition $s \stackrel{ctrl(pkt, cs)}{\Longrightarrow} s'$, where $s = (\pi, \delta, \gamma)$ and $s' = (\pi', \delta', \gamma')$, it holds that $\gamma'.rq = \gamma.rq \setminus \{pkt\}$ as we use sets to represent *rq* buffers. Hence, for the execution $\rho$ it holds that $\gamma_i.rq = \gamma_1.rq \setminus \{pkt_1, pkt_2, \ldots, pkt_{i-1}\}$, which implies that $s_1 \neq s_i$. Contradiction.

<sup>5</sup> Stutter equivalence here implicitly is defined w.r.t. the atomic propositions appearing in ϕ, but this suffices as we are just interested in the validity of ϕ.


*Case 2. A sequence of different safe actions*. Suppose there exists a cycle with mixed safe actions starting in $s_1$ and ending in $s_i$. Distinguish the following cases.

- a) There is a *fwd* and/or *brepl* in the cycle: *fwd* will always switch to a state with smaller *fq* and *brepl* will always switch to a state with smaller *cq* (*brepl* and *recv* do not interfere with *fwd*). This implies that $s_1 \neq s_i$. Contradiction.
- b) There is neither *fwd* nor *brepl* in the cycle. This means that only *recv* is in the cycle, which is already covered by the first case.

 

Due to the definition of the transition system via ample sets, each safe action is immediately executed after its enabling one. Therefore, one can merge every transition of a safe action with its precursory enabling one. Intuitively, the semantics of the merged action is defined as the successive execution of its constituent actions. This process can be repeated if there is a chain of safe actions; for instance, in the case of $s \stackrel{nomatch(sw,pkt)}{\Longrightarrow} s' \stackrel{ctrl(sw,pkt,cs)}{\Longrightarrow} s'' \stackrel{fwd(sw,pkt,ports)}{\Longrightarrow} s'''$ where each transition enables the next and the last two are assumed to be safe. These transitions can be merged into one, yielding a stutter equivalent trace as the intermediate states are invisible (w.r.t. the context and thus the property to be shown) by definition of safe actions.
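The ample-set construction of Theorem 2 can be sketched in a few lines of Python. The sketch below is only an illustration on a hypothetical toy system (two counters, where incrementing the first is assumed to be safe); it is not the MOCS implementation, and the function names are our own.

```python
def ample(enabled, is_safe):
    """Ample set as in Theorem 2: the safe enabled actions if there are
    any, otherwise all enabled actions."""
    safe = {a for a in enabled if is_safe(a)}
    return safe if safe else set(enabled)


def reachable(initial, enabled_actions, step, is_safe):
    """Explore the reduced transition system induced by `ample`."""
    seen, stack = {initial}, [initial]
    while stack:
        s = stack.pop()
        for a in ample(enabled_actions(s), is_safe):
            t = step(s, a)
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


# Toy system: state (x, y); action "a" increments x, "b" increments y,
# both bounded by 2. Action "a" is assumed to be safe.
def toy_enabled(s):
    x, y = s
    return {a for a, ok in (("a", x < 2), ("b", y < 2)) if ok}


def toy_step(s, a):
    x, y = s
    return (x + 1, y) if a == "a" else (x, y + 1)
```

In this toy system the reduced exploration fires all safe `a` actions first and visits 5 states, compared with 9 states when no action is treated as safe.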

#### **3.2 State Representation**

Efficient state representation is crucial for minimising MOCS's memory footprint and enabling it to scale up to relatively large network setups.

**Packet and Rule Indexing.** In MOCS, only a single instance of each packet and rule that can appear in the modelled network is kept in memory. An index is then used to associate queues and flow tables with packets and rules, with a single bit indicating their presence (or absence). This data structure is illustrated in Fig. 3. For a data packet, a value of 1 in the pq section of the entry indicates that infinite copies of it are stored in the packet queue of the respective switch. A value of 1 in the *fq* section indicates that a single copy of the packet is stored in the forward queue of the respective switch. A value of 1 in the *rq* section indicates that a copy of the packet sent by the respective switch (when a *nomatch* transition is fired) is stored in the controller's request queue. For a rule, a value of 1 in the *ft* section indicates that the rule is installed in the respective switch's flow table. A value of 1 in the cq section indicates that the rule is part of a *FlowMod* message in the respective switch's control queue.

**Fig. 3.** Packet (left) and rule (right) indices

The proposed optimisation enables scaling up the network topology by minimising the required memory footprint. For every switch, MOCS only requires a few bits in each packet and rule entry in the index.

**Discovering Equivalence Classes of Packets.** Model checking with all possible packets, including all specified fields in the OpenFlow standard, would entail a huge state space that would render any approach unusable. Here, we propose the discovery of equivalence classes of packets that are then used for model checking. We first remove all fields that are not referenced in a statement or rule creation or deletion in the controller program. Then, we identify packet classes that would result in the same controller behaviour. Currently, as with the rest of the literature, we focus on simple controller programs where such equivalence classes can be easily identified by analysing static constraints and rule manipulation in the controller program. We then generate one representative packet from each class and assign it to all network switches that are directly connected to end-hosts; i.e. modelling clients that can send an arbitrarily large number of packets in a non-deterministic fashion. We use the minimum possible number of bits to represent the identified equivalence classes. For example, if the controller program behaves differently depending on whether the destination TCP port of a packet is 22 (i.e. destined to an ssh server) or not, we only use a 1-bit field to model this behaviour.
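The idea of collapsing packets into equivalence classes can be illustrated with a small Python sketch. It assumes a controller whose behaviour depends only on whether the destination port equals one port of interest; the field name `tcp_dst` and both function names are hypothetical.

```python
def packet_class(header, port_of_interest):
    """Abstract a concrete header to the single bit the controller inspects."""
    return 1 if header.get("tcp_dst") == port_of_interest else 0


def representatives(headers, port_of_interest):
    """Keep one concrete representative per equivalence class."""
    reps = {}
    for header in headers:
        reps.setdefault(packet_class(header, port_of_interest), header)
    return reps
```

However many concrete headers are supplied, at most two representatives remain, one for each value of the 1-bit field.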

**Bit Packing.** We reduce the size of each recorded state by employing bit packing using the *int* type supported by Uppaal, and bit-level operations for the entries in the packet and rule indices as well as for the packets and rules themselves.
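Bit packing itself can be sketched as follows. The Python functions below pack several small fields into one integer and unpack them again; the field widths in the usage note are arbitrary, and the sketch does not model the value range of any particular integer type.

```python
def pack(fields):
    """Pack (value, width) pairs into one integer, first field in the low bits."""
    word, shift = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in its field")
        word |= value << shift
        shift += width
    return word


def unpack(word, widths):
    """Inverse of pack: extract the fields with the given bit widths."""
    values = []
    for width in widths:
        values.append(word & ((1 << width) - 1))
        word >>= width
    return values
```

For example, `pack([(1, 1), (5, 3), (2, 2)])` stores three fields in six bits, and `unpack` with widths `[1, 3, 2]` recovers them.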

#### **4 Experimental Evaluation**

In this section, we experimentally evaluate MOCS by comparing it with the state of the art, in terms of performance (verification throughput and memory footprint) and model expressivity. We have implemented MOCS in Uppaal [6] as a network of parallel automata for the controller and network switches, which communicate asynchronously by writing/reading packets to/from queues that are part of the model discussed in Sect. 2. As discussed in Sect. 3, this is implemented by directly manipulating the packet and rule indices.

Throughout this section we will be using three examples of network controllers: (1) A *stateless firewall* ([32] Appendix B-CP1) requires the controller to install rules to network switches that enable them to decide whether to forward a packet towards its destination or not; this is done in a stateless fashion, i.e. without having to consider any previously seen packets. For example, a controller could configure switches to block all packets whose destination tcp port is 22 (i.e. destined to an ssh server). (2) A *stateful firewall* ([32] Appendix B-CP2) is similar to the stateless one but decisions can take into account previously seen packets. A classic example of this is to allow bi-directional communication between two end-hosts, when one host opens a tcp connection to the other. Then, traffic flowing from the other host back to the connection initiator should be allowed to go through the switches on the reverse path. (3) A *MAC learning application* ([32] Appendix B-CP3) enables the controller and switches to learn how to forward packets to their destinations (identified with respective MAC addresses). A switch sends a *PacketIn* message to the controller when it receives a packet that it does not know how to forward. By looking at this packet, the controller learns a mapping of a source switch (or host) to a port of the requesting switch. It then installs a rule (by sending a *FlowMod* message) that will allow that switch to forward packets back to the source switch (or host), and asks the requesting switch (by sending a *PacketOut* message) to flood the packet to all its ports except the one it received the packet from. This way, the controller eventually learns all mappings, and network switches receive rules that enable them to forward traffic to their neighbours for all destinations in the network.

#### **4.1 Performance Comparison**

We measure MOCS's performance, and also compare it against Kuai [35]<sup>6</sup> using the examples described above, and we investigate the behaviour of MOCS as we scale up the network (switches and clients/servers). We report three metrics: (1) *verification throughput* in visited states per second, (2) number of visited states, and (3) required memory. We have run all verification experiments on an 18-core iMac Pro, 2.3 GHz Intel Xeon W with 128 GB DDR4 memory.

<sup>6</sup> Note that parts of Kuai's source code are not publicly available, therefore we implemented its model in Uppaal.

**Fig. 4.** Performance comparison – verification throughput

**Verification Throughput.** We measure the verification throughput when running a single experiment at a time on one CPU core and report the average and standard deviation for the first 30 min of each run. In order to assess how MOCS's different optimisations affect its performance, we report results for the following system variants: (1) MOCS, (2) MOCS without POR, (3) MOCS without any optimisations (neither POR nor state representation), and (4) Kuai. Figure 4 shows the measured throughput (with error bars denoting standard deviation).

For the MAC learning and stateless firewall applications, we observe that MOCS performs significantly better than Kuai for all different network setups and sizes<sup>7</sup>, achieving at least double the throughput Kuai does. The throughput performance is much better for the stateful firewall, too. This is despite the fact that, for this application, Kuai employs the unrealistic optimisation where the *barrier* transition forces the immediate update of the forwarding state. In other words, MOCS is able to explore significantly more states and identify bugs that Kuai cannot (see Sect. 4.2).

The computational overhead induced by our proposed PORs is minimal. This overhead occurs when PORs require dynamic checks through the safeness predicates described in Table 1. This is shown in Fig. 4a, where, in order to decide about the (in)visibility of *fwd(sw,pk,pt)* actions, a lookup is performed in the history-array of packet *pk*, checking whether the bit which corresponds to switch $sw'$, which is connected with port *pt* of *sw*, is set. On the other hand, if a POR does not require any dynamic checks, no penalty is induced, as shown in Figs. 4b and 4c, where the throughput when the PORs are disabled is almost identical to the case where PORs are enabled. This is because it has been statically established at a pre-analysis stage that all actions of a particular type are always safe for any argument/state. It is important to note that even when computational overhead is induced, PORs enable MOCS to scale up to larger networks because the number of visited states can be significantly reduced, as discussed below.

<sup>7</sup> $S \times H$ in Figs. 4 to 6 indicates the number of switches $S$ and hosts $H$.

**Fig. 6.** Performance comparison – memory footprint (logarithmic scale)

In order to assess the contribution of the state representation optimisation to MOCS's performance, we measure the throughput when both PORs and state representation optimisations are disabled. It is clear that they contribute significantly to the overall throughput; without them, the measured throughput was less than half of the throughput obtained when they were enabled.

**Number of Visited States and Required Memory.** Minimising the number of visited states and required memory is crucial for scaling up verification to larger networks. The proposed partial order reductions (Sect. 3.1) and identification of packet equivalence classes aim at the former, while packet/rule indexing and bit packing aim at the latter (Sect. 3.2). In Fig. 5, we present the results for the various setups and network deployments discussed above. We stopped scaling up the network deployment for each setup when the verification process required more than 24 h or started swapping memory to disk. For these cases we killed the process and report a topped-up bar in Figs. 5 and 6.

For the MAC learning application, MOCS can scale up to larger network deployments compared to Kuai, which could not verify networks consisting of more than 2 hosts and 6 switches. For that network deployment, Kuai visited ∼7 M states, whereas MOCS visited only ∼193 k states. At the same time, Kuai required around 48 GB of memory (7061 bytes/state) whereas MOCS needed ∼43 MB (228 bytes/state). Without the partial order reductions, MOCS can only verify tiny networks. The contribution of the proposed state representation optimisations is also crucial; in our experiments (results not shown due to lack of space), for the 6 × 2 network setups (the largest we could do without these optimisations), we observed a reduction in state space (due to the identification of packet equivalence classes) and memory footprint (due to packet/rule indexing and bit packing) from ∼7 M to ∼200 k states and from ∼6 KB per state to ∼230 B per state. For the stateless and stateful firewall applications, MOCS scales up as well as Kuai does.

#### **4.2 Model Expressivity**

The proposed model is significantly more expressive compared to Kuai as it allows for more asynchronous concurrency. To begin with, in MOCS, controller messages sent before a barrier request message can be interleaved with all other enabled actions, other than the control messages sent after the barrier. By contrast, Kuai always flushes all control messages until the last barrier in one go, masking a large number of interleavings and, potentially, buggy behaviour. Next, in MOCS *nomatch*, *ctrl* and *fwd* can be interleaved with other actions. Kuai, in contrast, enforces a mutual exclusion concurrency control policy through the *wait* semaphore: whenever a *nomatch* occurs the mutex is locked, and it is unlocked by the *fwd* action of the thread *nomatch-ctrl-fwd* which refers to the same packet; all other threads are forced to wait. Moreover, MOCS does not impose any limit on the size of the *rq* queue, in contrast to Kuai where only one packet can exist in it. In addition, Kuai does not support notifications from the data plane to the controller for completed operations, as it does not support reply messages; as a result, any bug related to the controller not being synced to data-plane state changes is hidden.<sup>8</sup> Also, our specification language for states is more expressive than Kuai's, as we can use any property in LTL without "next", whereas Kuai only uses invariants with a single outermost $\Box$.

The MOCS extensions, however, are conservative with respect to Kuai; that is, we have the following theorem (stated without proof, which is straightforward):

<sup>8</sup> There are further small extensions; for instance, in MOCS the controller can send multiple *PacketOut* messages (as OpenFlow prescribes).

**Theorem 3 (MOCS Conservativity).** *Let* $\mathcal{M}_{(\lambda,cp)} = (S, A, \rightarrow, s_0, AP, L)$ *and* $\mathcal{M}^{K}_{(\lambda,cp)} = (S_K, A_K, \rightarrow_K, s_0, AP, L)$ *be the original SDN models of MOCS and Kuai, respectively, using the same topology and controller. Furthermore, let* $\mathit{Traces}(\mathcal{M}_{(\lambda,cp)})$ *and* $\mathit{Traces}(\mathcal{M}^{K}_{(\lambda,cp)})$ *denote the set of all initial traces in these models, respectively. Then* $\mathit{Traces}(\mathcal{M}^{K}_{(\lambda,cp)}) \subseteq \mathit{Traces}(\mathcal{M}_{(\lambda,cp)})$*.*

For each of the extensions mentioned above, we briefly describe an example (controller program and safety property) that exhibits a bug which cannot occur in Kuai.

**Control Message Reordering Bug.** Let us consider the stateless firewall in Fig. 7a (controller is not shown), which is supposed to block incoming ssh packets from reaching the server (see [32] Appendix B-CP1). Formally, the safety property to be checked here is $(\forall\, pkt \in S.rcvq \,.\, \neg pkt.ssh)$. Initially, flow tables are empty. Switch A sends a *PacketIn* message to the controller when it receives the first packet from the client (as a result of a *nomatch* transition). The controller, in response to this request (and as a result of a *ctrl* transition), sends the following *FlowMod* messages to switch A: rule r1 has the highest priority and drops all ssh packets, rule r2 sends all packets from port 1 to port 2, and rule r3 sends all packets from port 2 to port 1. If the packet that triggered the transition above is an ssh one, the controller drops it; otherwise, it instructs A (through a *PacketOut* message) to forward the packet to S. A bug-free controller should ensure that r1 is installed before any other rule; therefore it must send a barrier request after the *FlowMod* message that contains r1. If, by mistake, the *FlowMod* message for r2 is sent before the barrier request, A may install r2 before r1, which will result in violating the given property. MOCS is able to capture this buggy behaviour as its semantics allows control messages prior to the barrier to be processed in an interleaved manner.
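The effect of the barrier placement can be illustrated with a small sketch. It is not the MOCS model: it only assumes that rules sent between two barriers may be installed in any order, while a barrier separates consecutive groups. The names `install_orders`, `opens_unsafe_window` and the `"BARRIER"` marker are hypothetical.

```python
from itertools import permutations, product

def install_orders(messages):
    """Yield every order in which a switch may install the given rules.

    `messages` is the controller's send order; "BARRIER" splits it into
    segments. Rules within a segment may be installed in any order, but a
    whole segment is installed before any rule of the next segment.
    """
    segments, current = [], []
    for msg in messages:
        if msg == "BARRIER":
            segments.append(current)
            current = []
        else:
            current.append(msg)
    segments.append(current)
    for choice in product(*(permutations(seg) for seg in segments)):
        yield [rule for seg in choice for rule in seg]

def opens_unsafe_window(messages):
    """True if some order installs forwarding rule r2 before drop rule r1."""
    return any(order.index("r2") < order.index("r1")
               for order in install_orders(messages))

correct = ["r1", "BARRIER", "r2", "r3"]   # barrier right after r1
buggy = ["r1", "r2", "BARRIER", "r3"]     # r2 sent before the barrier

print(opens_unsafe_window(correct), opens_unsafe_window(buggy))
```

With the barrier directly after r1, no install order places r2 first; in the buggy variant one such order exists, and during that window an ssh packet could be forwarded to S.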

**Fig. 7.** Two networks with (a) two switches, and (b) n stateful firewall replicas

**Wrong Nesting Level Bug.** Consider a correct controller program that enforces that server S (Fig. 7a) is not accessible through ssh. Formally, the safety property to be checked here is $(\forall\, pkt \in S.rcvq \,.\, \neg pkt.ssh)$. For each incoming *PacketIn* message from switch A, it checks if the enclosed packet is an ssh one and destined to S. If not, it sends a *PacketOut* message instructing A to forward the packet to S. It also sends a *FlowMod* message to A with a rule that allows packets of the same protocol (not ssh) to reach S. In the opposite case (ssh), it checks (a Boolean flag) whether it had previously sent drop rules for ssh packets to the switches. If not, it sets the flag to true, sends a *FlowMod* message with a rule that drops ssh packets to A and drops the packet. Note that this inner block does not have an else statement.

A fairly common error is to write a statement at the wrong nesting level ([32] Appendix B-CP4). Such a mistake can be built into the above program by nesting the outer else branch in the inner if block, such that it is executed any time an ssh packet is encountered but the ssh drop rule has already been installed (i.e. flag f is true). Now, the ssh drop rule, once installed in switch A, immediately disables a potential *nomatch*(A, p) with p.ssh = true that would have sent packet p to the controller, but if it has not yet been installed, a second incoming ssh packet would lead to the execution of the else statement of the inner branch. This would violate the property defined above, as p will be forwarded to S<sup>9</sup>.

MOCS can uncover this bug because of the correct modelling of the controller request queue and the asynchrony between the concurrent executions of control messages sent before a barrier. Otherwise, the second packet that triggers the execution of the wrong branch would not have appeared in the buffer before the first one had been dealt with by the controller. Furthermore, if all rules in messages up to a barrier were installed synchronously, the second packet would be dealt with correctly, so no bug could occur.
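The two controller variants can be sketched as follows. This is a simplified illustration rather than the controller program of [32]: the handler names, the packet dictionary layout and the action strings are assumptions made for the example.

```python
def handle_packet_in_correct(pkt, state):
    """Controller actions for one PacketIn, with the intended nesting."""
    actions = []
    if pkt["ssh"] and pkt["dst"] == "S":
        if not state["drop_rule_sent"]:
            state["drop_rule_sent"] = True
            actions.append("FlowMod(A, drop ssh)")
        # No inner else: the ssh packet is dropped by the controller.
    else:
        actions.append("PacketOut(A, forward to S)")
        actions.append("FlowMod(A, allow protocol)")
    return actions

def handle_packet_in_buggy(pkt, state):
    """Same program, but the outer else is nested inside the ssh branch."""
    actions = []
    if pkt["ssh"] and pkt["dst"] == "S":
        if not state["drop_rule_sent"]:
            state["drop_rule_sent"] = True
            actions.append("FlowMod(A, drop ssh)")
        else:  # BUG: runs for ssh packets once the flag is set
            actions.append("PacketOut(A, forward to S)")
            actions.append("FlowMod(A, allow protocol)")
    return actions

# Two ssh packets reach the controller before the drop rule is installed.
ssh_pkt = {"ssh": True, "dst": "S"}
state = {"drop_rule_sent": False}
first = handle_packet_in_buggy(ssh_pkt, state)
second = handle_packet_in_buggy(ssh_pkt, state)
print(first, second)
```

The second call only happens if the request queue can hold a second ssh packet while the drop rule is still in flight, which is the interleaving discussed above.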

**Inconsistent Update Bug.** OpenFlow's barrier and barrier reply mechanisms allow for updating multiple network switches in a way that enables *consistent packet processing*, i.e., a packet cannot see a partially updated network where only a subset of switches have changed their forwarding policy in response to this packet (or any other event), while others have not done so. MOCS is expressive enough to capture this behaviour and related bugs. In the topology shown in Fig. 7a, let us assume that, by default, switch B drops all packets destined to S. Any attempt to reach S through A is examined separately by the controller and, when access is granted, a relevant rule is installed at both switches (e.g. allowing all packets from C destined to S for given source and destination ports). Updates must be consistent; therefore the packet cannot be forwarded by A and dropped by B. Both switches must have the new rules in place before the packet is forwarded. To do so, the controller ([32] Appendix B-CP5), upon receiving a *PacketIn* message from the client's switch, sends the relevant rule to switch B (*FlowMod*) along with the respective barrier (*BarrierReq*) and temporarily stores the packet that triggered this update. Only after receiving a *BarrierRes* message from B does the controller forward the previously stored packet back to A along with the relevant rule. This update is consistent and the packet is guaranteed to reach S. A (rather common) bug would be one where the controller installs the rules to both switches and at the same time forwards the packet to A. In this case, the packet may end up being dropped by B, if it arrives and gets processed before the relevant rule is installed, and therefore the invariant $\Box\,[drop(pkt, sw)]\,.\,\neg(pkt.dest = S)$, where $[drop(pkt, sw)]$ is a quantifier that binds dropped packets (see definition in [32] Appendix B-CP5), would be violated. For this example, it is crucial that MOCS supports barrier response messages.

<sup>9</sup> Here, we assume that the controller looks up a static forwarding table before sending *PacketOut* messages to switches.
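The difference between the consistent and the buggy update can be sketched by enumerating event orderings. The event names and the `outcomes` helper below are hypothetical and abstract away everything except the ordering constraints described above.

```python
from itertools import permutations

EVENTS = [
    "B_installs_rule",
    "B_sends_BarrierRes",
    "ctrl_sends_PacketOut_to_A",
    "pkt_arrives_at_B",
]

# Orderings that hold for any controller: the barrier reply follows the
# installation, and the packet reaches B only after it was released to A.
CAUSAL = [
    ("B_installs_rule", "B_sends_BarrierRes"),
    ("ctrl_sends_PacketOut_to_A", "pkt_arrives_at_B"),
]

# The consistent controller additionally waits for the barrier reply.
CONSISTENT = CAUSAL + [("B_sends_BarrierRes", "ctrl_sends_PacketOut_to_A")]

def outcomes(must_precede):
    """Collect the possible fates of the packet over all allowed orderings."""
    fates = set()
    for order in permutations(EVENTS):
        if all(order.index(a) < order.index(b) for a, b in must_precede):
            installed_first = (order.index("B_installs_rule")
                               < order.index("pkt_arrives_at_B"))
            fates.add("delivered" if installed_first else "dropped")
    return fates

print(outcomes(CONSISTENT))   # consistent update
print(outcomes(CAUSAL))       # buggy update: packet released immediately
```

Only the variant that waits for the barrier response rules out the ordering in which the packet arrives at B before the rule is installed.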

## **5 Conclusion**

We have shown that an OpenFlow-compliant SDN model, with the right optimisations, can be model checked to discover subtle real-world bugs. We demonstrated that MOCS can capture real-world bugs under a more complex semantics without sacrificing performance.

But this is not the end of the line. One could automatically compute equivalence classes of packets that cover all behaviours (which we still computed manually). To what extent the size of the topology can be restricted to find bugs in a given controller is another interesting research question, as is the analysis of the number and length of interleavings necessary to detect certain bugs. In our examples, all bugs were found in less than a second.

## **References**



## **Software Verification**

## **Code2Inv: A Deep Learning Framework for Program Verification**

Xujie Si<sup>1(✉)</sup>, Aaditya Naik<sup>1</sup>, Hanjun Dai<sup>2</sup>, Mayur Naik<sup>1</sup>, and Le Song<sup>3</sup>

<sup>1</sup> University of Pennsylvania, Philadelphia, USA
xsi@cis.upenn.edu
<sup>2</sup> Google Brain, Mountain View, USA
<sup>3</sup> Georgia Institute of Technology, Atlanta, USA

**Abstract.** We propose a general end-to-end deep learning framework Code2Inv, which takes a verification task and a proof checker as input, and automatically learns a valid proof for the verification task by interacting with the given checker. Code2Inv is parameterized with an embedding module and a grammar: the former encodes the verification task into numeric vectors while the latter describes the format of solutions Code2Inv should produce. We demonstrate the flexibility of Code2Inv by means of two small-scale yet expressive instances: a loop invariant synthesizer for C programs, and a Constrained Horn Clause (CHC) solver.

#### **1 Introduction**

A central challenge in automating program verification lies in effective proof search. Counterexample-guided Inductive Synthesis (CEGIS) [3,4,17,31,32] has emerged as a promising paradigm for solving this problem. In this paradigm, a *generator* proposes a candidate solution, and a *checker* determines whether the solution is correct or not; in the latter case, the checker provides a counterexample to the generator, and the process repeats.
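The generator/checker interaction can be sketched as a short loop. The toy task, the enumerating checker and all function names are assumptions made for illustration; a real checker would typically be symbolic (e.g. an SMT solver) rather than an explicit enumeration.

```python
def cegis(generate, check, max_rounds=100):
    """Generic CEGIS loop: return a candidate the checker accepts, or None."""
    counterexamples = []
    for _ in range(max_rounds):
        candidate = generate(counterexamples)
        cex = check(candidate)
        if cex is None:               # checker accepts the candidate
            return candidate
        counterexamples.append(cex)   # feed the counterexample back
    return None

# Toy task: find a bound c such that x <= c holds for every value taken by
#   x = 0; while x < 10: x += 1
REACHABLE = range(0, 11)

def generate(counterexamples):
    """Propose the smallest bound consistent with all counterexamples."""
    return max(counterexamples, default=0)

def check(c):
    """Return a reachable state violating x <= c, or None if none exists."""
    return next((x for x in REACHABLE if x > c), None)

bound = cegis(generate, check)
print(bound)
```

Each rejected candidate contributes one counterexample, so the generator's proposals become progressively more constrained until the checker accepts.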

Finding loop invariants is arguably the most crucial part of proof search in program verification. Recent works [2,9,10,26,29,38] have instantiated the CEGIS paradigm for synthesizing loop invariants. Since *checking* loop invariants is a relatively standard process, these works target *generating* loop invariants using various approaches, such as stochastic sampling [29], syntax-guided enumeration [2,26], and decision trees with templates [9,10] or linear classifiers [38]. Although these works have greatly advanced the state of the art in program verification, there remains significant room for improvement in practice.

We set out to build a CEGIS-based program verification framework and identified five key objectives that it must address to be useful:

– The proof search should automatically evolve according to a given verification task as opposed to using exhaustive enumeration or a fixed set of search heuristics common in existing approaches.

© The Author(s) 2020

X. Si, A. Naik—Both authors contributed equally to the paper.

S. K. Lahiri and C. Wang (Eds.): CAV 2020, LNCS 12225, pp. 151–164, 2020. https://doi.org/10.1007/978-3-030-53291-8_9


We present Code2Inv, an end-to-end deep learning framework which aims to realize the above objectives. Code2Inv has two key differences compared to existing CEGIS-based approaches. First, instead of simply focusing on counterexamples but ignoring program structure, Code2Inv learns a neural representation of program structure by leveraging graph neural networks [8,11,19,28], which enable it to capture structural information and thereby generalize to different but structurally similar programs. Secondly, Code2Inv reduces loop invariant generation to a deep reinforcement learning problem [22,34]. No search heuristics or training labels are needed from human experts; instead, a neural policy for loop invariant generation can be automatically learned by interacting with the given proof checker on the fly. The learnable neural policy generates a loop invariant by taking a sequence of actions, which can be flexibly controlled by a grammar that defines the structure of loop invariants. This decoupling of the action definition from policy learning enables Code2Inv to adapt to different loop invariants or other reasoning tasks in a new domain with almost no changes except for adjusting the grammar or the underlying checker.

We summarize our contributions as follows:


## **2 Background**

In this section, we introduce artificial neural network concepts used by Code2Inv. A multilayer perceptron (MLP) is a basic neural network model which can approximate an arbitrary continuous function **y** = f<sup>∗</sup>(**x**), where **x** and **y** are numeric vectors. An MLP defines a mapping **y** = f(**x**; θ), where θ denotes weights of connections, which are usually trained using gradient descent methods.
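As a minimal sketch of the mapping **y** = f(**x**; θ), the following forward pass uses hand-picked weights and a tanh non-linearity; both are assumptions for illustration, and no training is shown.

```python
import math

def mlp(x, theta):
    """Forward pass y = f(x; theta); theta is a list of (weights, biases)."""
    h = x
    for i, (W, b) in enumerate(theta):
        # Affine transformation of the current layer's activations.
        h = [sum(w * v for w, v in zip(row, h)) + bias
             for row, bias in zip(W, b)]
        if i < len(theta) - 1:        # non-linearity on hidden layers only
            h = [math.tanh(v) for v in h]
    return h

theta = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),   # hidden layer: 2 -> 2
    ([[1.0, 1.0]], [0.0]),                     # output layer: 2 -> 1
]
y = mlp([1.0, 2.0], theta)
print(y)
```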

Recurrent neural networks (RNNs) approximate the mapping from a sequence of inputs **x**(1), ..., **x**(*t*) to either a single output **y** or a sequence of outputs **y**(1), ..., **y**(*t*). An RNN defines a mapping **h**(*t*) = f(**h**(*t*−1), **x**(*t*); θ), where **h**(*t*) is the hidden state, from which the final output **y**(*t*) can be computed (e.g. by a non-linear transformation or an MLP). A common RNN model is the long short-term memory network (LSTM) [16] which is used to learn long-term dependencies. Two common variants of LSTM are gated recurrent units (GRUs) [7] and tree-structured LSTM (Tree-LSTM) [35]. The former simplifies the LSTM for efficiency while the latter extends the modeling ability to tree structures.
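The recurrence **h**(*t*) = f(**h**(*t*−1), **x**(*t*); θ) can be sketched with a plain (non-gated) cell. The tanh update and the fixed weights are assumptions for illustration; LSTM and GRU cells replace this update with gated variants.

```python
import math

def rnn_step(h_prev, x, theta):
    """One step h(t) = tanh(W_h h(t-1) + W_x x(t) + b)."""
    W_h, W_x, b = theta
    return [
        math.tanh(sum(w * h for w, h in zip(wh_row, h_prev))
                  + sum(w * v for w, v in zip(wx_row, x))
                  + bias)
        for wh_row, wx_row, bias in zip(W_h, W_x, b)
    ]

def run_rnn(xs, theta, h0):
    """Fold a sequence x(1..t) into the list of hidden states h(1..t)."""
    states, h = [], h0
    for x in xs:
        h = rnn_step(h, x, theta)
        states.append(h)
    return states

theta = ([[0.5]], [[1.0]], [0.0])      # 1-d hidden state, 1-d input
states = run_rnn([[1.0], [0.0], [-1.0]], theta, h0=[0.0])
print(states)
```

The second hidden state is non-zero although its input is zero, because it still depends on the first input through **h**(1).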

In many domains, graphs are used to represent data with rich structure, such as programs, molecules, social networks, and knowledge bases. Graph neural networks (GNNs) [1,8,11,19,36] are commonly used to learn over graph-structured data. A GNN learns an embedding (i.e. real-valued vector) for each node of the given graph using a recursive neighborhood aggregation (or neural message passing) procedure. After training, a node embedding captures the structural information within the node's K-hop neighborhood, where K is a hyper-parameter. A simple aggregation of all node embeddings or pooling [37] according to the graph structure summarizes the entire graph into an embedding. GNNs are parameterized with other models such as MLPs, which are the learnable non-linear transformations used in message passing, and GRUs, which are used to update the node embedding.
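The K-hop property of neighborhood aggregation can be seen in a small sketch. Here the learnable MLP and GRU components are replaced by fixed averaging, which is an assumption made to keep the example self-contained; only the message-passing structure is illustrated.

```python
def message_passing(adjacency, features, rounds):
    """Run `rounds` steps of mean-neighbour aggregation.

    adjacency: dict node -> list of neighbour nodes
    features:  dict node -> list of floats (initial embedding)
    """
    emb = {n: list(v) for n, v in features.items()}
    for _ in range(rounds):
        new_emb = {}
        for node, neighbours in adjacency.items():
            dim = len(emb[node])
            if neighbours:
                agg = [sum(emb[m][i] for m in neighbours) / len(neighbours)
                       for i in range(dim)]
            else:
                agg = [0.0] * dim
            # Fixed update: average of own embedding and aggregated message.
            new_emb[node] = [(a + b) / 2 for a, b in zip(emb[node], agg)]
        emb = new_emb
    return emb

def readout(emb):
    """Summarise the whole graph by averaging all node embeddings."""
    dim = len(next(iter(emb.values())))
    return [sum(v[i] for v in emb.values()) / len(emb) for i in range(dim)]

# Path graph a - b - c; only node a carries a non-zero feature.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0], "b": [0.0], "c": [0.0]}
one_hop = message_passing(adjacency, features, rounds=1)
two_hop = message_passing(adjacency, features, rounds=2)
print(one_hop["c"], two_hop["c"], readout(two_hop))
```

After one round, node c has seen nothing of node a; after two rounds, information from a has reached c through b.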

Lastly, the generalization ability of neural networks can be improved by an external memory [12,13,33] which can be accessed using a differentiable *attention mechanism* [5]. Given a set of neural embeddings, which form the external memory, an attention mechanism assigns a likelihood to each embedding, under a given neural context. These likelihoods guide the selection of decisions that are represented by the chosen embeddings.
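A dot-product attention over a small external memory can be sketched as follows. The softmax normalisation and the example vectors are assumptions for illustration, not the exact mechanism of [5].

```python
import math

def attention(ctx, memory):
    """Return (likelihoods, index of the most likely embedding)."""
    scores = [sum(c * m for c, m in zip(ctx, emb)) for emb in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # shift for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return probs, best

memory = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]    # three stored embeddings
probs, choice = attention([0.2, 0.9], memory)
print(probs, choice)
```

The context vector is closest in dot product to the second embedding, so that entry receives the highest likelihood and is selected.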

#### **3 Framework**

We first describe the general framework, Code2Inv, and then illustrate two instances, namely, a loop invariant synthesizer for C programs and a CHC solver.

Figure 1 defines the domains of program structures and neural structures used in Code2Inv. The framework is parameterized by graph constructors *G* that produce graph representations of verification instance T and invariant grammar A, denoted G<sub>inst</sub> and G<sub>inv</sub>, respectively. The invariant grammar uses placeholder symbols H, which represent *abstract* values of entities such as variables, constants, and operators, and will be replaced by *concrete* values from the verification instance during invariant generation. The framework requires a black-box function *check* that takes a verification instance T and a candidate invariant inv, and returns success (denoted ⊥) or a counterexample cex.

#### **Domains of Program Structures:**


#### **Domains of Neural Structures:**


**Fig. 1.** Semantic domains. *L*(*A*) denotes the set of all sentential forms of *A*.

The key component of the framework is a neural policy π which comprises four neural networks. Two graph neural networks, η<sub>T</sub> and η<sub>A</sub>, are used to compute neural embeddings, ν<sub>T</sub> and ν<sub>A</sub>, for graph representations G<sub>inst</sub> and G<sub>inv</sub>, respectively. The neural network α<sub>ctx</sub>, implemented as a GRU, maintains the attention context *ctx* which controls the selection of the production rule to apply or the concrete value to replace a placeholder symbol at each step of invariant generation. The neural network ε<sub>inv</sub>, implemented as a Tree-LSTM, encodes the partially generated invariant into a numeric vector denoted *state*, which captures the state of the generation that is used to update the attention context *ctx*.

Algorithm 1 depicts the main algorithm underlying Code2Inv. It takes a verification instance and a proof checker as input and produces an invariant that suffices to verify the given instance<sup>1</sup>. At a high level, Code2Inv learns a neural policy in lines 1–5. The algorithm first initializes the neural policy and the set of counterexamples (lines 1–2). It then iteratively samples a candidate invariant (line 4) and improves the policy using a reward for the new candidate based on the accumulated counterexamples (line 5). We next elaborate on the initialization, policy sampling, and policy improvement procedures.

<sup>1</sup> Fuzzers may be applied first so that the confidence of existence of a proof is high.

```
Algorithm 1. Code2Inv Framework

Input: a verification instance T and a proof checker check
Output: an invariant inv satisfying check(T, inv) = ⊥
Parameter: graph constructor G and invariant grammar A
 1 π ← initPolicy(T, A)
 2 C ← ∅
 3 while true do
 4     inv ← sample(π, T, A)
 5     π, C ← improve(π, inv, C)
 6 Function initPolicy(T, A)
 7     Initialize weights of ηT, ηA, αctx, εinv with random values
 8     νT ← ηT(G(T))
 9     νA ← ηA(G(A))
10     return νT, νA, ηT, ηA, αctx, εinv
11 Function sample(π, T, A)
12     inv ← A.S
13     ctx ← aggregate(π.νT)
14     while inv is partially derived do
15         x ← leftmost non-terminal or placeholder symbol in inv
16         state ← π.εinv(inv)
17         ctx ← π.αctx(ctx, state)
18         if x is non-terminal then
19             p ← attention(ctx, π.νA[x], G(A))
20             expand inv according to p
21         else
22             v ← attention(ctx, π.νT[x], G(T))
23             replace x in inv with v
24     return inv
25 Function improve(π, inv, C)
26     n ← number of counter-examples C that inv can satisfy
27     if n = |C| then
28         cex ← check(T, inv)
29         if cex = ⊥ then
30             save inv and weights of π
31             exit // a sufficient invariant is found
32         else
33             C ← C ∪ {cex}
34     r ← n/|C|
35     π ← updatePolicy(π, r)
36     return π, C
37 Function updatePolicy(π, r)
38     Update weights of π.ηT, π.ηA, π.αctx, π.εinv, π.νT, π.νA by
39     standard policy gradient [34] using reward r
40 Function attention(ctx, ν, G)
41     Return node t in G such that dot product of ctx and ν[t] is maximum over all nodes of G
```

**Initialization.** The initPolicy procedure (lines 6–10) initializes the neural policy. All four neural networks are initialized with random weights (line 7), and graph embeddings ν<sub>T</sub>, ν<sub>A</sub> for verification task T and invariant grammar A are computed by applying the corresponding graph neural networks η<sub>T</sub>, η<sub>A</sub> to their graph representations *G*(T), *G*(A), respectively (lines 8–9). Alternatively, the neural networks can be initialized with pre-trained weights, which can boost overall performance.

**Neural Policy Sampling.** The sample procedure (lines 11–24) generates a candidate invariant by executing the current neural policy. The candidate is first initialized to the start symbol of the given grammar (line 12), and then updated iteratively (lines 14–23) until it is complete (i.e. it contains no non-terminals or placeholder symbols). Specifically, the candidate is updated by either expanding its leftmost non-terminal according to one of its production rules (lines 19–20) or by replacing its leftmost placeholder symbol with some concrete value from the verification instance (lines 22–23). The selection of a production rule or concrete value is done through an *attention mechanism*, which picks the most likely one according to the current context and corresponding region of external memory. The neural context is initialized to the aggregation of embeddings of the given verification instance (line 13), and then maintained by α<sub>ctx</sub> (line 17) which, at each step, incorporates the neural state of the partially generated candidate invariant (line 16), where the neural state is encoded by ε<sub>inv</sub>.
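The derivation loop of the sample procedure can be sketched with the neural policy replaced by an arbitrary choice function. The toy grammar, the instance values and the `choose` parameter are hypothetical; in Code2Inv the choice is made by attention over the learned embeddings.

```python
# Hypothetical toy grammar: keys are non-terminals. VAR, OP and CONST are
# placeholder symbols filled with concrete values from the instance.
GRAMMAR = {
    "S": [["C"], ["C", "&&", "S"]],
    "C": [["VAR", "OP", "CONST"]],
}
INSTANCE = {"VAR": ["x", "y"], "OP": ["<", "<="], "CONST": ["1", "1000"]}

def sample(choose, max_steps=50):
    """Derive one candidate, always rewriting the leftmost open symbol."""
    inv = ["S"]
    for _ in range(max_steps):
        idx = next((i for i, s in enumerate(inv)
                    if s in GRAMMAR or s in INSTANCE), None)
        if idx is None:
            return " ".join(inv)              # fully derived
        sym = inv[idx]
        if sym in GRAMMAR:                    # expand a non-terminal
            inv[idx:idx + 1] = choose(GRAMMAR[sym])
        else:                                 # fill a placeholder
            inv[idx] = choose(INSTANCE[sym])
    return None                               # derivation did not finish

# Stand-in policy: always pick the first option.
candidate = sample(lambda options: options[0])
print(candidate)
```

Changing the grammar or the instance values changes the space of candidates without touching the derivation loop, which mirrors the decoupling of action definition and policy described earlier.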

**Neural Policy Improvement.** The improve procedure (lines 25–36) improves the current policy by means of a *continuous* reward. Simply checking whether the current candidate invariant is sufficient or not yields a discrete reward of 1 (yes) or 0 (no). This reward is too sparse to improve the policy, since most candidate invariants generated are insufficient, thereby almost always yielding a zero reward. Code2Inv addresses this problem by accumulating counterexamples provided by the checker. Whenever a new candidate invariant is generated, Code2Inv tests the number of counterexamples it can satisfy (line 26), and uses the fraction of satisfied counterexamples as the reward (line 34). If all counterexamples are satisfied, Code2Inv queries the checker to validate the candidate (line 28). If the candidate is accepted by the checker, then a sufficient invariant was found, and the learned weights of the neural networks are saved for speeding up similar verification instances in the future (lines 29–31). Otherwise, a new counterexample is accumulated (line 33). Finally, the neural policy (including the neural embeddings) is updated based on the reward.
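The reward computation can be sketched as a single function. This is a simplification under stated assumptions: a counterexample is modelled as a program state on which the candidate must evaluate to true, the checker is a toy enumeration, and the value returned on success is our own convention (Algorithm 1 simply exits at that point).

```python
def improve_step(candidate, counterexamples, check):
    """Compute the reward for one candidate (lines 26-34 of Algorithm 1).

    candidate: predicate over a state
    counterexamples: list of states, extended in place
    check: returns None on success, otherwise a new counterexample
    Returns (reward, verified).
    """
    n = sum(1 for cex in counterexamples if candidate(cex))
    if n == len(counterexamples):      # query the checker only if all pass
        cex = check(candidate)
        if cex is None:
            return 1.0, True           # assumption: reward 1 on success
        counterexamples.append(cex)
    return n / len(counterexamples), False

# Toy checker: states are the values of x in
#   x = 0; while x < 10: x += 1
def check(candidate):
    return next((x for x in range(11) if not candidate(x)), None)

C = []
r1, done1 = improve_step(lambda x: x <= 5, C, check)    # C becomes [6]
r2, done2 = improve_step(lambda x: x <= 8, C, check)    # C becomes [6, 9]
r3, done3 = improve_step(lambda x: x <= 10, C, check)   # checker accepts
print(r1, r2, r3, C)
```

Because a failed check always adds a counterexample before the division, the denominator is never zero, and candidates that satisfy more of the accumulated counterexamples receive a proportionally higher reward.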

**Framework Instantiations.** We next show two instantiations of Code2Inv by customizing the graph constructor *G*. Specifically, we demonstrate two scenarios of graph construction: 1) by carefully exploiting task-specific knowledge, and 2) with minimal information about the given task.

**Fig. 2.** (a) C program snippet in SSA form; (b) its graph representation.

*Instantiation to Synthesize Loop Invariants for C Programs.* An effective graph representation for a C program should reflect its control-flow and data-flow information. We leverage the static single assignment (SSA) transformation for this purpose. Figure 2 illustrates the graph construction process. Given a C program, we first apply SSA transformation as shown in Fig. 2a, from which a graph is constructed as shown in Fig. 2b. The graph is essentially abstract syntax trees (ASTs) augmented with control-flow (black dashed) edges and data-flow (blue dashed) edges. Different types of edges will be modeled as different message passing channels used in graph neural networks so that rich structural information can be captured more effectively by the neural embeddings. Furthermore, certain nodes (marked black) are annotated with placeholder symbols and will be used to fill corresponding placeholders during invariant generation. For instance, variables x and y are annotated with VAR, integer values 1000 and 1 are annotated with CONST, and the operator < is annotated with OP.

**Fig. 3.** (a) CHC instance snippet; (b) node representation for the CHC example; (c) example of invariant grammar; (d) node representation for the grammar.

*Instantiation to Solve Constrained Horn Clauses (CHC).* CHC are a uniform way to represent recursive, inter-procedural, and multi-threaded programs, and serve as a suitable basis for automatic program verification [6] and refinement type inference [21]. Solving a CHC instance involves determining unknown predicates that satisfy a set of logical constraints. Figure 3a shows a simple example of a CHC instance where itp is the unknown predicate. It is easy to see that itp in fact represents an invariant of a loop. Thus, CHC solving can be viewed as a generalization of finding loop invariants [6].

Unlike C programs, which have explicit control-flow and data-flow information, a CHC instance is a set of *un-ordered* Horn rules. The graph construction for Horn rules is not as obvious as for C programs. Therefore, instead of deliberately constructing a graph that incorporates detailed domain-specific information, we use a *node representation*, which is a degenerate case of graph representation and requires only necessary nodes but no edges. Figure 3b shows the node representation for the CHC example from Fig. 3a. The top two nodes are derived from the signature of unknown predicate itp and represent the first and the second arguments of itp. The bottom two nodes are constants extracted from the Horn rule. We empirically show that node representation works reasonably well. The downside of node representation is that no structural information is captured by the neural embeddings which in turn prevents the learned neural policy from generalizing to other structurally similar instances.

*Embedding Invariant Grammar.* Lastly, both instantiations must define the embedding of the invariant grammar. The grammar can be arbitrarily defined and, similar to CHCs, there is no obvious information such as control- or data-flow to leverage. Thus, we use node representation for the invariant grammar as well. Figure 3c and Fig. 3d show an example of invariant grammar and its node representation, respectively. Each node in the graph represents either a terminal or a production rule for a non-terminal. Note that this representation does not prevent the neural policy from generalizing to similar instances as long as they share the same invariant grammar. This is feasible because the invariant grammar does not contain instance-specific details, which are abstracted away by placeholder symbols like VAR, CONST, and OP.

## **4 Evaluation**

We first discuss the implementation, particularly the improvement over our previous prototype [30], and then evaluate our framework in a number of aspects, such as performance, transferability, flexibility, and naturalness.

**Implementation.** Code2Inv<sup>2</sup> consists of a frontend, which converts an instance into a graph, and a backend, which maintains all neural components (i.e. neural embeddings and policy) and interacts with a checker. Our previous prototype had a very limited frontend based on CIL [24] and no notion of invariant grammar in the backend. We made significant improvements in both the frontend and the backend. We re-implemented the frontend for C programs based on Clang and implemented a new frontend for CHCs. We also re-implemented the backend to accept a configurable invariant grammar. Furthermore, we developed a standard graph format, which decouples the frontend and backend, and a clean interface between the backend and the checker. No changes are needed in the backend to support new instantiations.

**Evaluation Setup.** We evaluate both instantiations of Code2Inv by comparing each instantiation with corresponding state-of-the-art solvers. For the task of synthesizing loop invariants for C programs, we use the same suite of benchmarks from our previous work [30], which consists of 133 C programs from SyGuS [2]. We compare Code2Inv with our previous specialized prototype and three other state-of-the-art verification tools: C2I [29], LoopInvGen [26] and ICE-DT [10]. For the CHC solving task, we collect 120 CHC instances using SeaHorn [14] to reduce the C benchmark programs into CHCs.<sup>3</sup> We compare Code2Inv with two state-of-the-art CHC solvers: Spacer [18], which is the default fixed-point engine of Z3, and LinearArbitrary [38]. We run all solvers on a single 2.4 GHz AMD CPU core for up to 12 h, using up to 4 GB of memory. Unless specified otherwise, Code2Inv is always initialized randomly, that is, untrained.

<sup>2</sup> Our artifacts are available on GitHub: https://github.com/PL-ML/code2inv.

**Performance.** Given that both the hardware and the software environments could affect the absolute running time and that all solvers for loop invariant generation for C programs rely on the same underlying SMT engine, Z3 [23], we compare the performance in terms of number of Z3 queries. We note that this is an imperfect metric but a relatively objective one that also highlights salient features of Code2Inv. Figure 4a shows the plot of verification cost (i.e. number of Z3 queries) by each solver and the number of C programs successfully verified within the corresponding cost. Code2Inv significantly outperforms other state-of-the-art solvers in terms of verification cost and the general framework Code2Inv-G achieves performance comparable to (slightly better than) the previous specialized prototype Code2Inv-S.

**Fig. 4.** (a) Comparison of Code2Inv with state-of-the-art solvers; (b) comparison between untrained model and pre-trained model.

**Transferability.** Another hallmark of Code2Inv is that, along with the desired loop invariant, it also learns a neural policy. To evaluate the performance benefits of the learned policy, we randomly perturb the C benchmark programs by various edits (e.g. renaming existing variables and injecting new variables and statements). For each program, we obtain 100 variants, and use 90 for training and 10 for testing. Figure 4b shows the performance difference between the untrained model (i.e. initialized with random weights) and the pre-trained model (i.e. initialized with pre-trained weights). Our results indicate that the learned neural policy can be transferred to accelerate the search for loop invariants for similar programs. This is especially useful in the CI/CD setting [25] where programs evolve incrementally and quick turnaround time is indispensable.

<sup>3</sup> SeaHorn produces empty Horn rules on 13 (out of 133) C programs due to optimizations during VC generation that result in proving the assertions of interest.

**Flexibility.** Code2Inv can be instantiated or extended in a very flexible manner. For instance, with a simple frontend (e.g. node representation as discussed above), Code2Inv can be customized as a CHC solver. Our evaluation shows that, without any prior knowledge about Horn rules, Code2Inv can solve 94 (out of 120) CHC instances. Although it is not on a par with state-of-the-art CHC solvers Spacer and LinearArbitrary, which solve 112 and 118 instances, respectively, Code2Inv provides new insights for solving CHCs and could be further improved by better embeddings and reward design.

As another example, by simply adjusting the invariant grammar, Code2Inv is immediately ready for solving CHC tasks involving *non-linear* arithmetic. Our case study shows that Code2Inv successfully solves 5 (out of 7) non-linear instances we created<sup>4</sup>, while both Spacer and LinearArbitrary failed to solve any of them. Tasks involving non-linear arithmetic are particularly challenging because the underlying checker is more likely to get stuck, and no feedback (e.g. counterexample) can be provided, which is critical for existing solvers like Spacer and LinearArbitrary to make progress. This highlights another strength of Code2Inv—even if the checker gets stuck, the learning process can still continue by simply assigning zero or negative reward.


**Fig. 5.** Comparison of solution naturalness.

<sup>4</sup> The non-linear instances we created are available in the artifact.

**Naturalness.** Our final case study concerns the naturalness of solutions. As illustrated in Fig. 5, solutions discovered by Code2Inv tend to be more natural, whereas Spacer and LinearArbitrary tend to find solutions that unnecessarily depend on constants from the given verification instance. Such *overfitted* solutions may become invalid when these constants change. Note that expressions such as (+ 0 0) in Code2Inv's solutions can be eliminated by post-processing simplification akin to peephole optimization in compilers. Alternatively, the reward mechanism in Code2Inv could incorporate a regularizer on the naturalness.
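The kind of post-processing simplification mentioned above can be sketched as a small bottom-up rewrite over an expression tree. This is only an illustration under our own assumptions: the `Expr` type and the `simplify` function are hypothetical and are not taken from the Code2Inv artifact, and only addition is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression tree: integer constants, variables, binary '+'. */
typedef enum { CONST, VAR, ADD } Kind;

typedef struct Expr {
    Kind kind;
    int value;              /* used when kind == CONST */
    const char *name;       /* used when kind == VAR   */
    struct Expr *lhs, *rhs; /* used when kind == ADD   */
} Expr;

static bool is_zero(const Expr *e) {
    return e->kind == CONST && e->value == 0;
}

/* Bottom-up peephole pass: folds constant sums and drops additions of 0.
 * Nodes are rewritten in place; the returned pointer is the new root.
 * Overflow of folded constants is not handled in this sketch. */
Expr *simplify(Expr *e) {
    assert(e != NULL);
    if (e->kind != ADD) {
        return e;
    }
    e->lhs = simplify(e->lhs);
    e->rhs = simplify(e->rhs);
    if (e->lhs->kind == CONST && e->rhs->kind == CONST) {
        /* (+ c1 c2) becomes the constant c1 + c2, e.g. (+ 0 0) becomes 0 */
        e->value = e->lhs->value + e->rhs->value;
        e->kind = CONST;
        e->lhs = NULL;
        e->rhs = NULL;
        return e;
    }
    if (is_zero(e->lhs)) {
        return e->rhs;      /* (+ 0 t) becomes t */
    }
    if (is_zero(e->rhs)) {
        return e->lhs;      /* (+ t 0) becomes t */
    }
    return e;
}
```

Applied to `(+ x (+ 0 0))`, the inner sum is first folded to the constant 0 and the outer addition is then dropped, which leaves the variable node `x`.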

**Limitations.** Code2Inv does not support finding loop invariants for programs with multiple loops, function calls, or recursion. Code2Inv generally runs slower than other contemporary approaches. Specifically, 90% of the solved C instances took 2 h or less, and the rest could take up to 12 h to solve. This could be improved upon by leveraging GPUs, developing more efficient training algorithms, or leveraging templates [27].

## **5 Conclusion**

We presented a framework Code2Inv which automatically learns invariants (or more generally unknown predicates) by interacting with a proof checker. Code2Inv is a general and learnable tool for solving many different verification tasks and can be flexibly configured with a grammar and a graph constructor. We compared its performance with state-of-the-art solvers for both C programs and CHC formulae, and showed that it can adapt to different types of inputs with minor changes. We also showed, by simply varying the input grammar, how it can tackle non-linear invariant problems which other solvers are not equipped to work with, while still giving results that are relatively natural to read.

**Acknowledgements.** We thank the reviewers for insightful comments. We thank Elizabeth Dinella, Pardis Pashakhanloo, and Halley Young for feedback on improving the paper. This research was supported by grants from NSF (#1836936 and #1836822), ONR (#N00014-18-1-2021), AFRL (#FA8750-20-2-0501), and Facebook.

#### **References**



## **MetaVal: Witness Validation via Verification**

Dirk Beyer and Martin Spiessl

LMU Munich, Munich, Germany

**Abstract.** Witness validation is an important technique to increase trust in verification results, by making descriptions of error paths (violation witnesses) and important parts of the correctness proof (correctness witnesses) available in an exchangeable format. This way, the verification result can be validated independently from the verification in a second step. The problem is that there are unfortunately not many tools available for witness-based validation of verification results. We contribute to closing this gap with the approach of *validation via verification*, which is a way to automatically construct a set of validators from a set of existing verification engines. The idea is to take as input a specification, a program, and a verification witness, and produce a new specification and a transformed version of the original program such that the transformed program satisfies the new specification if the witness is useful to confirm the result of the verification. Then, an 'off-the-shelf' verifier can be used to validate the previously computed result (as witnessed by the verification witness) via an ordinary verification task. We have implemented our approach in the validator MetaVal, and it was successfully used in SV-COMP 2020 and confirmed 3 653 violation witnesses and 16 376 correctness witnesses. The results show that MetaVal improves the effectiveness (167 uniquely confirmed violation witnesses and 833 uniquely confirmed correctness witnesses) of the overall validation process, on a large benchmark set. All components and experimental data are publicly available.

**Keywords:** Computer-aided verification · Software verification · Program analysis · Software model checking · Certification · Verification witnesses · Validation of verification results · Reducer

## **1 Introduction**

Formal software verification becomes more and more important in the development process for software systems of all types. There are many verification tools available to perform verification [4]. One of the open problems that was addressed only recently is the topic of results validation [10–12,37]: The verification work is often done by untrusted verification engines, on untrusted computing infrastructure, or even on approximating computation systems, and static-analysis tools suffer from false positives that engineers in practice hate because they are tedious to refute [20]. Therefore, it is necessary to validate verification results, ideally by an independent verification engine that likely does not have the same weaknesses as the original verifier. Witnesses also serve as an interface to the verification engine, in order to overcome integration problems [1].

This work was funded by the Deutsche Forschungsgemeinschaft (DFG) – 378803395.
© The Author(s) 2020

S. K. Lahiri and C. Wang (Eds.): CAV 2020, LNCS 12225, pp. 165–177, 2020. https://doi.org/10.1007/978-3-030-53291-8\_10

The idea to witness the correctness of a program by annotating it with assertions is as old as programming [38], and from the beginning of model checking it was felt necessary to witness counterexamples [21]. Certifying algorithms [30] not only compute a solution but also produce a witness that can be used by a computationally much less expensive checker to (re-)establish the correctness of the solution. In software verification, witnesses became standardized<sup>1</sup> and exchangeable about five years ago [10,11]. Meanwhile, the exchangeable witnesses can also be used for deriving tests from witnesses [12], such that an engineer can additionally study an error report with a debugger. The ultimate goal of this direction of research is to obtain witnesses that are certificates and can be checked by a fully trusted validator based on trusted theorem provers, such as Coq and Isabelle, as done already for computational models that are 'easier' than C programs [40].

Yet, although considered very useful, there are not many witness validators available. For example, the most recent competition on software verification (SV-COMP 2020)<sup>2</sup> showcases 28 software verifiers but only 6 witness validators. Two were published in 2015 [11], two more in 2018 [12], the fifth in 2020 [37], and the sixth is MetaVal, which we describe here. Witness validation is an interesting problem to work on, and there is a large, yet unexplored field of opportunities. It involves many different techniques from program analysis and model checking. However, it seems that this also requires a lot of engineering effort.

Our solution *validation via verification* is a construction that takes as input an off-the-shelf software verifier and a new program transformer, and composes a witness validator in the following way (see Fig. 1): First, the transformer takes the original input program and transforms it into a new program. In case of a violation witness, which describes a path through the program to a specific program location, we transform the program such that all parts that are marked as unnecessary for the path by the witness are pruned. This is similar to the reducer for a condition in reducer-based conditional model checking [14]. In case of a correctness witness, which describes invariants that can be used in a correctness proof, we transform the program such that the invariants are asserted (to check that they really hold) and assumed (to use them in a re-constructed correctness proof). A standard verification engine is then asked to verify that (1) the transformed program contains a feasible path that violates the original specification (violation witness) or (2) the transformed program satisfies the original specification and all assertions added to the program hold (correctness witness).

MetaVal is an implementation of this concept. It performs the transformation according to the witness type and specification, and can be configured to use any of the available software verifiers<sup>3</sup> as verification backend.

<sup>1</sup> Latest version of standardized witness format: https://github.com/sosy-lab/sv-witnesses

<sup>2</sup> https://sv-comp.sosy-lab.org/2020/systems.php

<sup>3</sup> https://gitlab.com/sosy-lab/sv-comp/archives-2020/tree/master/2020

**Fig. 1.** Validator construction using readily available verifiers

**Contributions.** MetaVal contributes several important benefits:


### **2 Preliminaries**

For the theoretical part, we will have to set a common ground for the concepts of verification witnesses [10,11] as well as reducers [14]. In both cases, programs are represented as control-flow automata (CFAs). A *control-flow automaton* $C = (L, l_0, G)$ consists of a set $L$ of control locations, an initial location $l_0 \in L$, and a set $G \subseteq L \times \mathit{Ops} \times L$ of control-flow edges that are labeled with the operations in the program. In the mentioned literature on witnesses and reducers, a simple programming language is used in which operations are either assignments or assumptions over integer variables. Operations $\mathit{op} \in \mathit{Ops}$ in such a language can be represented by formulas in first-order logic over the sets $V, V'$ of program variables before and after the transition, which we denote by $\mathit{op}(V, V')$. In order to simplify our construction later on, we will also allow mixed operations of the form $f(V) \land (x' = g(V))$ that combine assumptions with an assignment, which would otherwise be represented as an assumption followed by an assignment operation.

```
1 void fun(uint x, uint y, uint z) {
2 if (x > y) {
3 z=2*x-y;
4 } else {
5 z=2*y-x+1;
6 }
7 if (z>y || z>x) {
8 return;
9 } else {
10 error();
11 }
12 }
```
**Fig. 2.** Example program for both correctness and violation witness validation

**Fig. 3.** CFA *C* of example program from Fig. 2

The conversion from the source code into a CFA and vice versa is straightforward, provided that the CFA is deterministic. A CFA is called *deterministic* if, whenever there are multiple outgoing CFA edges from a location $l$, the assumptions in those edges are mutually exclusive (but not necessarily exhaustive).

Since our goal is to validate (i.e., prove or falsify) the statement that a program fulfills a certain specification, we need to additionally model the property to be verified. For properties that can be translated into non-reachability, this can be done by defining a set $T \subseteq L$ of target locations that shall not be reached. For the example program in Fig. 2 we want to verify that the call in line 10 is not reachable. In the corresponding CFA in Fig. 3 this is represented by the reachability of the location labeled with 10. Depending on whether or not a verifier accounts for the overflow in this example program, it will either consider the program safe or unsafe, which makes it a perfect example that can be used to illustrate both correctness and violation witnesses.
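To make the role of the overflow concrete, the following sketch restates the program from Fig. 2 so that it reports whether the call in line 10 would be reached instead of performing it. It assumes that `uint` is a 32-bit unsigned type (the figure does not fix the width), and the name `reaches_error` is ours.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors fun() from Fig. 2, but returns true exactly where the original
 * would call error(). Assumption: uint is 32 bits wide, so the stored
 * result wraps modulo 2^32. z is a local variable here because the
 * original never reads its incoming value. */
bool reaches_error(uint32_t x, uint32_t y) {
    uint32_t z;
    if (x > y) {
        z = 2 * x - y;
    } else {
        z = 2 * y - x + 1;
    }
    return !(z > y || z > x);
}
```

With x = 4294967295 and y = 4294967294, the value 2x − y equals 2^32 mathematically and wraps to z = 0, so neither z > y nor z > x holds and the error location is reached. Over unbounded integers, z exceeds y on both branches, so a verifier that ignores overflow considers the program safe.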

In order to reason about the soundness of our approach, we need to also formalize the program semantics. This is done using the concept of concrete data states. A *concrete data state* is a mapping from the set $V$ of program variables to their domain $\mathbb{Z}$, and a *concrete state* is a pair of control location and concrete data state. A *concrete program path* is then defined as a sequence $\pi = (c_0, l_0) \xrightarrow{g_1} \ldots \xrightarrow{g_n} (c_n, l_n)$ where $c_0$ is the initial concrete data state, $g_i = (l_{i-1}, \mathit{op}_i, l_i) \in G$, and $c_{i-1}(V), c_i(V') \models \mathit{op}_i$. A *concrete execution* $ex(\pi)$ is then derived from a path $\pi$ by only looking at the sequence $(c_0, l_0) \ldots (c_n, l_n)$ of concrete states from the path. Note that we deviate here from the definition given in [14], where concrete executions do not contain information about the program locations. This is necessary here since we want to reason about the concrete executions that fulfill a given non-reachability specification, i.e., that never reach certain locations in the original program.

Witnesses are formalized using the concept of protocol automata [11]. A *protocol automaton* $W = (Q, \Sigma, \delta, q_0, F)$ consists of a set $Q$ of states, a set of transition labels $\Sigma = 2^G \times \Phi$, a transition relation $\delta \subseteq Q \times \Sigma \times Q$, an initial state $q_0$, and a set $F \subseteq Q$ of final states. A state is a pair that consists of a name to identify the state and a predicate over the program variables $V$ to represent the state invariant.<sup>4</sup> A transition label is a pair that consists of a subset of control-flow edges and a predicate over the program variables $V$ to represent the guard condition for the transition to be taken. An *observer automaton* [11,13,32,34,36] is a protocol automaton that does not restrict the state space, i.e., for each state $q \in Q$ the disjunction of the guard conditions of all outgoing transitions is a tautology. Violation witnesses are represented by protocol automata in which all state invariants are *true*. Correctness witnesses are represented by observer automata in which the set of final states is empty.

#### **3 Approach**

#### **3.1 From Witnesses to Programs**

When given a CFA $C = (L, l_0, G)$, a specification $T \subseteq L$, and a witness automaton $W = (Q, \Sigma, \delta, q_0, F)$, we can construct a product automaton $A_{C \times W} = (L \times Q, (l_0, q_0), \Gamma, T \times F)$ where $\Gamma \subseteq (L \times Q) \times (\mathit{Ops} \times \Phi) \times (L \times Q)$. The new transition relation $\Gamma$ is defined by allowing for each transition $g$ in the CFA only those transitions $(S, \varphi)$ from the witness where $g \in S$ holds:

$$\Gamma = \{\, ((l_i, q_i), (\mathit{op}, \varphi), (l_j, q_j)) \mid \exists S : (q_i, (S, \varphi), q_j) \in \delta,\ (l_i, \mathit{op}, l_j) \in S \,\}$$
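The definition of the transition relation of the product automaton translates directly into a nested loop over CFA edges and witness transitions. The sketch below is an illustration only and not MetaVal's implementation: all type and function names are hypothetical, operations and guards are kept as opaque strings, and the edge set S of a witness transition is encoded as a Boolean array indexed by CFA edge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* CFA edge (l_i, op, l_j); locations are integers, op is an opaque string. */
typedef struct { int from, to; const char *op; } CfaEdge;

/* Witness transition (q_i, (S, phi), q_j); in_S[e] is true iff CFA edge
 * number e belongs to the edge set S. */
typedef struct { int from, to; const bool *in_S; const char *guard; } WitTrans;

/* Product edge ((l_i, q_i), (op, phi), (l_j, q_j)). */
typedef struct {
    int l_from, q_from, l_to, q_to;
    const char *op, *guard;
} ProdEdge;

/* Writes all product edges to out (capacity cap) and returns their number.
 * A pair (CFA edge, witness transition) yields a product edge exactly when
 * the CFA edge is contained in the transition's edge set S. */
size_t build_product(const CfaEdge *g, size_t ng,
                     const WitTrans *d, size_t nd,
                     ProdEdge *out, size_t cap) {
    size_t n = 0;
    for (size_t e = 0; e < ng; e++) {
        for (size_t t = 0; t < nd; t++) {
            if (d[t].in_S[e]) {
                assert(n < cap);
                out[n++] = (ProdEdge){ g[e].from, d[t].from,
                                       g[e].to,   d[t].to,
                                       g[e].op,   d[t].guard };
            }
        }
    }
    return n;
}
```

The sketch enumerates every matching pair and does not restrict the result to states reachable from the initial product state; such a restriction could be added as a separate pass.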

We can now define the semantics of a witness by looking at the paths in the product automaton and mapping them to concrete executions in the original program. A path of the product automaton $A_{C \times W}$ is a sequence $(l_0, q_0) \xrightarrow{\alpha_0} \ldots \xrightarrow{\alpha_{n-1}} (l_n, q_n)$ such that $((l_i, q_i), \alpha_i, (l_{i+1}, q_{i+1})) \in \Gamma$ and $\alpha_i = (\mathit{op}_i, \varphi_i)$.

It is evident that the automaton $A_{C \times W}$ can easily be mapped to a new program $C_{C \times W}$ by reducing the pair $(\mathit{op}, \varphi)$ in its transition relation to an operation $\overline{\mathit{op}}$. In case $\mathit{op}$ is a pure assumption of the form $f(V)$, then $\overline{\mathit{op}}$ will simply be $f(V) \land \varphi(V)$. If $\mathit{op}$ is an assignment of the form $f(V) \land (x' = g(V))$, then $\overline{\mathit{op}}$ will be $(f(V) \land \varphi(V)) \land (x' = g(V))$. This construction has the drawback that the resulting CFA might be non-deterministic, but this is actually not a problem when the corresponding program is only used for verification. The non-determinism can be expressed in the source code by using non-deterministic values, which are already formalized by the community and established in the SV-COMP rules, and therefore also supported by all participating verifiers. The concrete executions of $C_{C \times W}$ can be identified with concrete executions of $C$ by projecting their pairs $(l, q)$ on their first element. Let $\mathit{proj}_C(ex(C_{C \times W}))$ denote the set of concrete executions that is derived this way. Due to how the relation $\Gamma$ of $A_{C \times W}$ is constructed, it is guaranteed that this is a subset of the executions of $C$, i.e., $\mathit{proj}_C(ex(C_{C \times W})) \subseteq ex(C)$. In this respect the witness acts in very much the same way as a reducer [14], and the reduction of the search space is also one of the desired properties of a validator for violation witnesses.

<sup>4</sup> These invariants are the central piece of information in correctness witnesses. While invariants that prove a program correct can be hard to come up with, they are usually easier to check.

**Fig. 4.** Violation witness $W_V$

**Fig. 5.** Product automaton $A_{C \times W_V}$

#### **3.2 Programs from Violation Witnesses**

For explaining the validation of results based on a violation witness, we consider the witness in Fig. 4 for our example C program in Fig. 2. The program $C_{C \times W_V}$ resulting from the product automaton $A_{C \times W_V}$ in Fig. 5 can be passed to a verifier. If this verification finds an execution that reaches a specification violation, then this violation is guaranteed to be also present in the original program. There is, however, one caveat: In the example in Fig. 5, a reachable state $(10, q_0)$ at program location 10 (i.e., a state that violates the specification) can be found that is not marked as accepting state in the witness automaton $W_V$. For a strict version of witness validation, we can remove all states that are in $T \times Q$ but not in $T \times F$ from the product automaton, and thus, from the generated program. This will ensure that if the verifier finds a violation in the generated program, the witness automaton also accepts the found error path. The version of MetaVal that was used in SV-COMP 2020 did not yet support strict witness validation.
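The pruning step for strict witness validation can be sketched as a filter over product states. This is a minimal illustration under our own assumptions, not code from MetaVal: product states are pairs of integers, the sets T and F are plain arrays, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Product state (l, q): control location l and witness state q. */
typedef struct { int l; int q; } ProdState;

static bool contains(const int *set, size_t n, int v) {
    for (size_t i = 0; i < n; i++) {
        if (set[i] == v) {
            return true;
        }
    }
    return false;
}

/* Removes, in place, every state that lies in T x Q but not in T x F,
 * i.e. target locations paired with a non-final witness state.
 * Returns the number of states kept. */
size_t prune_strict(ProdState *s, size_t n,
                    const int *T, size_t nT,
                    const int *F, size_t nF) {
    assert(s != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        bool is_target = contains(T, nT, s[i].l);
        bool is_final = contains(F, nF, s[i].q);
        if (!is_target || is_final) {
            s[kept++] = s[i];
        }
    }
    return kept;
}
```

With T containing location 10 and a single final witness state, a state such as $(10, q_0)$ is dropped while location 10 paired with the final state is kept. A complete transformation would also have to drop the edges leading into removed states.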

#### **3.3 Programs from Correctness Witnesses**

Correctness witnesses are represented by observer automata. Figure 6 shows a potential correctness witness $W_C$ for our example program $C$ in Fig. 2, where the invariants are annotated in bold font next to the corresponding state. The construction of the product automaton $A_{C \times W_C}$ in Fig. 7 is a first step towards reestablishing the proof of correctness: the product states tell us to which control locations of the CFA for the program the invariants from the witness belong.

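The idea of a result validator for correctness witnesses is to (1) check that the invariants in the witness really hold and (2) use the invariants to re-establish that the original specification holds.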


**Fig. 6.** Correctness witness $W_C$

**Fig. 7.** Product automaton $A_{C \times W_C}$

We can achieve the second goal by extracting the invariants from each state in the product automaton $A_{C \times W_C}$ and adding them as conditions to all edges by which the state can be reached. This will then be semantically equivalent to assuming that the invariants hold at the state and potentially make the subsequent proof easier. For soundness we need to also ensure the first goal. To achieve that, we add transitions into a (new) accepting state from $T \times F$ whenever we transition into a state $q$ and the invariant of $q$ does not hold, and we add self-loops such that the automaton stays in the new accepting state forever. In sum, for each invariant, there are two transitions, one with the invariant as guard (to assume that the invariant holds) and one with the negation of the invariant as guard (to assert that the invariant holds, going to an accepting (error) state if it does not hold). This transformation ensures that the resulting automaton after the transformation is still a proper observer automaton.
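At the source level, the two transitions per invariant correspond to a check of the negated invariant that leads to the error state, followed by code that may rely on the invariant. The sketch below applies this pattern to the program from Fig. 2. It is an illustration under our own assumptions: the invariant `inv` and its placement before line 7 are hypothetical and are not taken from the witness in Fig. 6, `uint` is taken to be 32 bits wide, and the function returns a flag instead of calling `error()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical witness invariant for the location before line 7 of Fig. 2. */
static bool inv(uint32_t x, uint32_t y, uint32_t z) {
    return z > y || z > x;
}

/* Instrumented variant of fun(): returns true iff the accepting (error)
 * state is reached, either because the invariant is violated or because
 * the original error() call in line 10 is reached. */
bool fun_instrumented(uint32_t x, uint32_t y) {
    uint32_t z;
    if (x > y) {
        z = 2 * x - y;
    } else {
        z = 2 * y - x + 1;
    }
    if (!inv(x, y, z)) {
        /* Edge guarded by the negated invariant: the assertion fails and
         * the program stays in the error state. */
        return true;
    }
    /* Edge guarded by the invariant: it is assumed from here on. */
    if (z > y || z > x) {
        return false;
    }
    return true; /* original error() call in line 10 */
}
```

A backend verifier that models wrap-around finds the inputs x = 4294967295 and y = 4294967294, for which z = 0 violates the hypothetical invariant, and would therefore reject such a witness. A verifier that treats the variables as unbounded integers would confirm it.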

## **4 Evaluation**

This section describes the results that were obtained in the 9th Competition on Software Verification (SV-COMP 2020), in which MetaVal participated as validator. We did not perform a separate evaluation because the results of SV-COMP are complete, accurate, and reproducible; all data and tools are publicly available for inspection and replication studies (see data availability in Sect. 6).

#### **4.1 Experimental Setup**

**Execution Environment.** In SV-COMP 2020, the validators were executed in a benchmark environment that makes use of a cluster with 168 machines, each of them having an Intel Xeon E3-1230 v5 CPU with 8 processing units, 33 GB of RAM, and the GNU/Linux operating system Ubuntu 18.04. Each validation run was limited to 2 processing units and 7 GB of RAM, in order to allow up to 4 validation runs to be executed on the same machine at the same time. The time limit for a validation run was set to 15 min for correctness witnesses and to 90 s for violation witnesses. The benchmarking framework BenchExec 2.5.1 was used to ensure that the different runs do not influence each other and that the resource limits are measured and enforced reliably [15]. The exact information to replicate the runs of SV-COMP 2020 can be found in Sect. 3 of the competition report [4].
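The bound of 4 parallel validation runs per machine follows from the per-run limits stated above, taking the more restrictive of the CPU and memory budgets:

$$\min\left(\left\lfloor \frac{8}{2} \right\rfloor, \left\lfloor \frac{33}{7} \right\rfloor\right) = \min(4, 4) = 4$$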

**Benchmark Tasks.** The verification tasks<sup>5</sup> of SV-COMP can be partitioned with respect to their specification into ReachSafety, MemSafety, NoOverflows, and Termination. Validators can be configured using different options for each specification.

<sup>5</sup> https://github.com/sosy-lab/sv-benchmarks/tree/svcomp20


**Table 1.** Overview of validation for violation witnesses in SV-COMP 2020

**Table 2.** Overview of validation for correctness witnesses in SV-COMP 2020


**Validator Configuration.** Since our architecture (cf. Fig. 1) allows for a wide range of verifiers to be used for validation, there are many interesting configurations for constructing a validator. Exploring all of these in order to find the best configuration, however, would require significant computational resources, and also be susceptible to over-fitting. Instead, we chose a heuristic based on the results of the competition from the previous year, i.e., SV-COMP 2019 [3]. The idea is that a verifier which performed well at *verifying* tasks for a specific specification is also a promising candidate to be used in *validating* results for that specification. Therefore, the configuration of our validator MetaVal uses CPA-Seq as verifier for tasks with specification ReachSafety, Ultimate Automizer for NoOverflows and Termination, and Symbiotic for MemSafety.

#### **4.2 Results**

The results of the validation phase in SV-COMP 2020 [5] are summarized in Table 1 (for violation witnesses) and Table 2 (for correctness witnesses). For each specification, MetaVal was able to not only confirm a large number of results that were also validated by other tools, but also to confirm results that were not previously validated by any of the other tools.<sup>6</sup>

For violation witnesses, we can observe that MetaVal confirms significantly fewer witnesses than the other validators. This can be explained partially by the restrictive time limit of 90 s. Our approach not only adds overhead when generating the program from the witness, but this new program can also be harder to parse and analyze for the verifier we use in the backend. It is also the case that the verifiers that we use in MetaVal are not tuned for such a short time limit, as a verifier in the competition will always get the full 15 min. For specification ReachSafety, for example, we use CPA-Seq, which starts with a very simple analysis and switches verification strategies after a fixed time that also happens to be 90 s. So in this case we will never benefit from the more sophisticated strategies that CPA-Seq offers.

For validation of correctness witnesses, where the time limit is higher, this effect is less noticeable such that the number of results confirmed by MetaVal is more in line with the numbers achieved by the other validators. For specification MemSafety, MetaVal even confirms more correctness witnesses than Ultimate Automizer. This indicates that Symbiotic was a good choice in our configuration for that specification. Symbiotic generally performs much better in verification of MemSafety tasks than Ultimate Automizer, so this result was expected.

Before the introduction of MetaVal, there was only one validator for correctness witnesses in the categories NoOverflows and MemSafety, while constructing a validator for those categories with our approach did not require any additional development effort.

#### **5 Related Work**

*Programs from Proofs.* Our approach for generating programs can be seen as a variant of the Programs from Proofs (PfP) framework [27,41]. Both generate programs from an abstract reachability graph of the original program. The difference is that PfP tries to remove all specification violations from the graph, while we just encode them into the generated program as violation of the standard reachability property. We do this for the original specification and the invariants in the witness, which we treat as additional specifications.

*Automata-Based Software Model Checking.* Our approach is also similar to that of the validator Ultimate Automizer [10]. For violation witnesses, it also constructs the product of CFA and witness. For correctness witnesses, it instruments the invariants directly into the CFA of the program (see [10], Sect. 4.2) and passes the result to its verification engine, while MetaVal constructs the product of CFA and witness, and applies a similar instrumentation. In both cases, MetaVal's transformer produces a C program, which can be passed to an independent verifier.

*Reducer-Based Conditional Model Checking.* The concept of generating programs from an abstract reachability graph (ARG) has also been used to successfully construct conditional verifiers [14]. Our approach for correctness witnesses can be seen as a special case of this technique, where MetaVal acts as initial verifier that does not try to reduce the search space and instead just instruments the invariants from the correctness witness as additional specification into the program.

<sup>6</sup> In the statistics, a witness is only counted as confirmed if the verifier correctly stated whether the input program satisfies the respective specification.

*Verification Artifacts and Interfacing.* The problem that verification results are not treated well enough by the developers of verification tools is known [1] and there are also other works that address the same problem, for example, the work on execution reports [19] or on cooperative verification [17].

*Test-Case Generation.* The idea to generate test cases from verification counterexamples is more than ten years old [8,39], has since been used to create debuggable executables [31,33], and was extended and combined to various successful automatic test-case generation approaches [24,25,29,35].

*Execution.* Other approaches [18,22,28] focus on creating tests from concrete and tool-specific counterexamples. In contrast, witness validation does not require full counterexamples, but works on more flexible, possibly abstract, violation witnesses from a wide range of verification tools.

*Debugging and Visualization.* Besides executing a test, it is important to understand the cause of the error path [23], and there are tools and methods to debug and visualize program paths [2,9,26].

## **6 Conclusion**

We address the problem of constructing a tool for witness validation in a systematic and generic way: We developed the concept of *validation via verification*, which is a two-step approach that first applies a program transformation and then applies an off-the-shelf verification tool, without development effort.

The concept is implemented in the witness validator MetaVal, which has already been successfully used in SV-COMP 2020. The validation results are impressive: the new validator enriches the competition's validation capabilities by 164 uniquely confirmed violation results and 834 uniquely confirmed correctness results, based on the witnesses provided by the verifiers. This paper does not contain its own evaluation, but refers to results from the recent competition in the field.

The major benefit of our concept is that it is now possible to configure a spectrum of validators with different strengths, based on different verification engines. The 'time to market' of new verification technology into validators is negligibly small because there is no development effort necessary to construct new validators from new verifiers. A potential technology bias is also reduced.

**Data Availability Statement.** All data from SV-COMP 2020 are publicly available: witnesses [7], verification and validation results as well as log files [5], and benchmark programs and specifications [6] <sup>7</sup>. The validation statistics in Tables 1 and 2 are available in the archive [5] and on the SV-COMP website<sup>8</sup>. MetaVal 1.0 is available on GitLab<sup>9</sup> and in our AEC-approved virtual machine [16].

<sup>7</sup> https://github.com/sosy-lab/sv-benchmarks/tree/svcomp20

<sup>8</sup> https://sv-comp.sosy-lab.org/2020/results/results-verified/validatorStatistics.html

<sup>9</sup> https://gitlab.com/sosy-lab/software/metaval/-/tree/1.0

#### **References**


## **Recursive Data Structures in SPARK**

Claire Dross and Johannes Kanig

AdaCore, 75009 Paris, France {dross,kanig}@adacore.com

**Abstract.** SPARK is both a deductive verification tool for the Ada language and the subset of Ada on which it operates. In this paper, we present a recent extension of the SPARK language and toolset to support pointers. This extension is based on an ownership policy inspired by Rust to enforce non-aliasing through a move semantics of assignment. In particular, we consider pointer-based recursive data structures, and discuss how they are supported in SPARK. We explain how iteration over these structures can be handled using a restricted form of aliasing called local borrowing. To avoid introducing a memory model and to stay in the first-order logic background of SPARK, the relation between the iterator and the underlying structure is encoded as a predicate which is maintained throughout the program control flow. Special first-order contracts, called pledges, can be used to describe this relation. Finally, we give examples of programs that can be verified using this framework.

**Keywords:** Deductive verification · Recursive structures · Ownership

#### **1 Introduction**

The programming language SPARK [8] has been designed to be amenable to formal verification, and one of the most impactful design choices was the exclusion of aliasing. While this choice vastly simplified the tool design and improved the expected proof performance, it also meant that pointers, as a major source of aliasing, were excluded from the language. Although SPARK had seen the addition of many language features over the years, adding pointers seemed impossible without violating the non-aliasing property. Then came Rust [11], which popularized a type system based on ownership [5]. Taking inspiration from it, it became possible to add pointers to the language in a way that still excludes aliasing. We will give an overview of the rules in this paper.

However, it was unclear if programs traversing recursive data structures such as lists and trees could be supported in this setting. In particular, iteration using a loop requires an alias between the traversed structure and the iterator. In this paper, we detail an approach, inspired by recent work by Astrauskas et al. [1], that enables proofs about recursive pointer-based data structures in SPARK. We have implemented this approach in the industrial formal verification tool SPARK, and, using this tool, developed a number of examples. Some important restrictions remain; we will also discuss them in this paper.

Ada [2] is a general-purpose procedural programming language. The design of the Ada language puts great emphasis on the safety and correctness of the program. This objective is realized by using a readable syntax that uses keywords instead of symbols where reasonable. The type system is strong and strict and many potential violations of type constraints can be detected statically by the compiler. If not, a run-time check is inserted into the program, to guarantee the detection of incorrect situations.

```
declare -- Block introducing new declarations
  type My_Int is range -100 .. 100;
  -- User-defined integer type ranging from -100 to 100
  subtype My_Nat is My_Int range 0 .. My_Int'Last;
  -- Subtype of My_Int with additional constraints
  X : My_Int := 50; -- Static check that 50 is in the bounds of My_Int
  Y : My_Nat;
begin -- Part of the block containing statements
  ...
  Y := X; -- Dynamic check that X is in the bounds of My_Nat
end; -- End of scope of the entities declared in the block
```
Ada 2012 introduced contract based programming to Ada. In particular, it is possible to attach pre- and postconditions to subprograms<sup>1</sup>. These conditions can be checked during the execution of the program, just like assertions.
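As a minimal sketch of such a contract (the subprogram `Increment` and the subtype `Small` are hypothetical and not taken from the paper), a precondition can state the caller's obligation and a postcondition the subprogram's guarantee. The block uses plain ASCII `=>` so that it compiles as written; with GNAT, the contracts are only evaluated at run time when assertions are enabled (for example with the `-gnata` switch).

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Contract_Demo is
   subtype Small is Integer range 0 .. 1_000;

   --  Hypothetical subprogram used only for illustration.
   procedure Increment (X : in out Small) with
     Pre  => X < Small'Last,   --  obligation of the caller
     Post => X = X'Old + 1;    --  guarantee given by the subprogram

   procedure Increment (X : in out Small) is
   begin
      X := X + 1;              --  cannot leave Small thanks to Pre
   end Increment;

   V : Small := 41;
begin
   Increment (V);              --  Pre holds here: 41 < 1_000
   pragma Assert (V = 42);     --  consequence of Post
   Put_Line ("V =" & Integer'Image (V));
end Contract_Demo;
```

When the same subprogram is analyzed statically, the precondition becomes an assumption inside `Increment` and a proof obligation at each call site, which is the reading used in the rest of this paper.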

SPARK is the name of a tool that provides formal verification for Ada. It uses the user-provided contracts and attempts to prove that the runtime checks cannot fail and that postconditions are established by the corresponding subprograms. As formal verification for the whole Ada language would be intractable, SPARK is also the name of the subset of the Ada language that is supported by the SPARK tool<sup>2</sup>. This subset contains almost all features of Ada, though sometimes in a restricted form. In particular, expressions should be free from side effects, and aliasing is forbidden (no two variables should share the same memory location or overlap in memory). This restriction greatly simplifies the memory model used in the SPARK tool: any program variables can be reasoned about independently from other variables.

The SPARK tool uses the Why3 platform to generate verification conditions for SMT solvers via a weakest-precondition calculus [4].

#### **2 Support for Pointers**

Pointers in Ada are called *access types*. It is possible to declare an access type using the access keyword. Objects of an access type are null if no initial values are supplied. It is possible to allocate an object on the heap using the keyword new. An initial value can be supplied for the allocated object. A dereference of a pointer is written as a record component access, but using the keyword all.

<sup>1</sup> In Ada, a distinction is made between functions that return a value, and procedures, which do not. *Subprogram* is the term that designates both.

<sup>2</sup> http://docs.adacore.com/spark2014-docs/html/ug/.

```
declare
  type Int_Acc is access Integer; -- Declare a new access type
  X : Int_Acc; -- Declare an object of this type
  pragma Assert (X = null); -- No initial values provided, X is null
  Y : Integer;
begin
  X := new Integer; -- Allocation of uninitialized data
  X := new Integer'(3); -- Allocation of initialized data
  Y := X.all; -- Dereference the access
end;
```
When a pointer is dereferenced, a runtime check is introduced to make sure that it is not null. Ada does not mandate garbage collection. Memory allocated on the heap can be reclaimed manually by the user through an instance of the generic procedure Unchecked_Deallocation, which also sets its argument pointer to null. There are several kinds of access types. The basic access types, like Int_Acc defined above, are called pool-specific access types. They can only designate objects allocated on the heap. General access types, introduced by the keyword all, can also be used to designate objects allocated on the stack or global data.

Pointers were excluded from the SPARK subset until recently. Indeed, allowing pointers in a straightforward way would break the absence of aliasing in SPARK. In addition, pointers are associated with several classes of bugs, such as memory leaks, use-after-free, and null pointer dereferences.
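The two kinds of access types and manual deallocation can be sketched as follows. This is plain Ada written for illustration: the names `Int_Ptr`, `Free`, `V` and `P` are assumptions of the sketch, and general access types such as `Int_Ptr` fall outside the SPARK subset described below.

```ada
with Ada.Unchecked_Deallocation;

procedure Access_Kinds_Demo is
   type Int_Acc is access Integer;      --  pool-specific: heap objects only
   type Int_Ptr is access all Integer;  --  general: may designate variables

   --  Deallocation is obtained by instantiating the predefined generic.
   procedure Free is new Ada.Unchecked_Deallocation (Integer, Int_Acc);

   X : Int_Acc := new Integer'(3);      --  allocated on the heap
   V : aliased Integer := 5;            --  declared on the stack
   P : Int_Ptr := V'Access;             --  designates V itself
begin
   P.all := P.all + 1;                  --  modifies V through P
   pragma Assert (V = 6);

   Free (X);                            --  reclaims the heap memory
   pragma Assert (X = null);            --  Free resets its argument to null
end Access_Kinds_Demo;
```

The first assertion shows why general access types are a source of aliasing: `V` changes although it is never assigned by name.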

To support pointers in SPARK, we designed a subset of Ada's access types which does not introduce aliasing and avoids some pointer-specific issues, while retaining as much expressivity as possible. The first restriction we selected is the exclusion of general access types. This means that SPARK can only create pointers designating memory allocated on the heap, and not on the stack. As a result, pointers can only be made invalid by explicit deallocation, and deallocation of a valid pointer is always legal. To eliminate aliasing between (heap) pointers, ownership rules inspired by Rust have been added on top of Ada's legality rules. These rules enforce a single writer/multiple readers policy. They ensure that, when a value designated by a pointer is modified, all other objects can be considered to be preserved.

The basis of the ownership policy of SPARK is the move semantics of assignments. When a pointer is assigned to a variable, both the source and the target of the assignment designate the same memory region: assigning an object containing a pointer creates an alias. To alleviate this problem, when an object containing a pointer is assigned, the memory region designated by the pointer is said to be *moved*. The source of the assignment loses the ownership of the designated data while the target of the assignment gains it. The ownership system makes sure that the designated data is not accessed again through the source of the assignment.

```
Y : Int_Acc := X; -- Ownership of the data designated by X is moved to Y
Y.all := Y.all + 1; -- The data can be read and modified through Y
Z := X.all; -- Illegal: Reading or modifying X.all is not allowed
```
As the ownership policy ensures that no aliasing can occur between access objects, it is possible to reason about the program almost as if the pointer was replaced by the data it points to. When an object containing a pointer is assigned to another variable, it is safe to consider that the designated data is copied by the assignment. Indeed, any effects that could occur because variables are sharing a substructure cannot be observed because of the ownership rules.

Pointers are handled in the verification model of the SPARK proof tool as *maybe*, or *option* types: access objects are either null, or they contain a value. In addition, access objects also contain an address, which can be used to handle comparison (two pointers may not be equal even if the values they designate are equal). When a pointer is dereferenced, a verification condition is generated to make sure that the pointer is not null, so that its value can be accessed.

```
X : Int_Acc; -- X is null
X := new Integer'(3); -- X has a value which is 3
Y := X; -- Y has a value which is 3
Z := Y.all; -- Check that Y is not null, Z is 3
```
Note that the ownership policy is key for this translation to be correct, as it prevents the program from observing side-effects caused by the modification of a shared reference, which would not be accounted for in the verification model.
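One way to picture this verification model is to write it down as an ordinary Ada type. The sketch below is an assumption made for illustration, not the encoding actually used by the tool: the record `Int_Acc_Model`, its components, and the function `Deref` are hypothetical names. A pointer is modeled as a variant record that is either null or carries an address and a value, and dereferencing has a precondition matching the generated null check.

```ada
procedure Option_Model_Demo is
   --  Hypothetical model of an access-to-Integer object as an option type.
   type Int_Acc_Model (Is_Null : Boolean := True) is record
      case Is_Null is
         when True  =>
            null;                 --  the null pointer carries no data
         when False =>
            Address : Natural;    --  only used to compare pointers
            Value   : Integer;    --  the designated value
      end case;
   end record;

   --  Dereference: the precondition plays the role of the null check.
   function Deref (P : Int_Acc_Model) return Integer is (P.Value)
     with Pre => not P.Is_Null;

   X, Y : Int_Acc_Model;          --  both start as null
   Z    : Integer;
begin
   pragma Assert (X.Is_Null);
   X := (Is_Null => False, Address => 1, Value => 3);  --  X := new Integer'(3)
   Y := X;                        --  the move is modeled as a plain copy
   Z := Deref (Y);                --  check that Y is not null
   pragma Assert (Z = 3);
end Option_Model_Demo;
```

Modeling the move as a copy is only sound because the ownership policy forbids any later access through `X`, as noted above.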

#### **3 Recursive Data Structures**

In Ada, recursive types can only be introduced through pointers. The idea is to first declare a type, but without giving its definition. This declaration, called an *incomplete declaration*, introduces a place-holder for the type, which can only be used in restricted circumstances. In particular, this place-holder can be used to declare an access type designating pointers to values of this type. Using this mechanism, it is possible to declare a recursive data structure, since the access type can be used in the type definition as it comes afterward.

```
type List_Cell;
type List is access List_Cell;
type List_Cell is record
  Data : Integer;
  Next : List;
end record;
```
There are no specific restrictions concerning recursive types in SPARK. However, the ownership policy of SPARK implies that it will not be possible to create a structure which has either cycles (e.g. doubly linked lists) or shared substructures (e.g. DAGs) in it. The ownership policy may also impact how recursive structures can be manipulated. In general, working with such structures involves a traversal, which can be done either recursively, or iteratively using a loop. Algorithms working in a recursive way are generally compliant with the ownership policy of SPARK. Indeed, the recursive calls will allow reading or modifying the structure in depth without having to deconstruct it<sup>3</sup>.

```
function Length (L : access constant List_Cell) return My_Nat is
  (if L = null then 0 else Length (L.Next) + 1);
function Nth (L : access constant List_Cell; N : My_Pos) return Integer is
  (if N = 1 then L.Data else Nth (L.Next, N - 1))
with Pre ⇒ N ≤ Length (L);
```
<sup>3</sup> In Length and Nth, addition on My_Nat and My_Pos has been redefined to saturate so as to avoid the overflow checking mandated by Ada.

Algorithms involving loops are trickier. The declaration of the iterator used for the loop creates an alias of the traversed data structure. As per SPARK's ownership policy, this is considered to be a move, so it becomes illegal to access the initial structure. Each further assignment to the iterator during the traversal permanently gives up the ownership of one more node of the structure, making it impossible to restore the ownership at the end.
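A self-contained variant of the recursive style can be sketched as follows. It is an illustration in plain Ada rather than code from the paper: it uses the predefined subtype `Natural` instead of the saturating `My_Nat`, and the function `Sum` and the three-element list are assumptions of the sketch.

```ada
procedure Recursive_List_Demo is
   type List_Cell;
   type List is access List_Cell;
   type List_Cell is record
      Data : Integer;
      Next : List;
   end record;

   --  Read-only recursive traversals through access-to-constant parameters.
   function Length (L : access constant List_Cell) return Natural is
     (if L = null then 0 else Length (L.Next) + 1);

   function Sum (L : access constant List_Cell) return Integer is
     (if L = null then 0 else L.Data + Sum (L.Next));

   --  The list {1, 2, 3}, built from nested allocators.
   X : List :=
     new List_Cell'(Data => 1,
                    Next => new List_Cell'(Data => 2,
                                           Next => new List_Cell'
                                             (Data => 3, Next => null)));
begin
   pragma Assert (Length (X) = 3);
   pragma Assert (Sum (X) = 6);
   pragma Assert (X.Data = 1);   --  X is still usable after the calls
end Recursive_List_Demo;
```

Each recursive call only observes the tail of the list for the duration of the call, so the caller keeps the whole structure once the call returns.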

```
procedure Set_All_To_Zero (X : in out List) is
   Y : List := X; -- The ownership of X is transferred to Y
begin
   while Y /= null loop
      Y.Data := 0;
      Y := Y.Next; -- Ownership of the first cell of Y is lost for good
   end loop; -- The ownership of X cannot be restored
end Set_All_To_Zero;
```
To traverse recursive data structures, a move is not what we want. Here we need a way to lend the ownership of a memory region for a period of time and automatically restore it at the end. A similar mechanism, called *borrowing*, is available in the Rust language. We have adapted it to SPARK.

## **4 Borrowing Ownership**

As Ada is an imperative language, losing the possibility to traverse a linked data structure using a loop was deemed too restrictive. To alleviate this problem, a notion of ownership borrowing was introduced in SPARK. It allows the users to declare a variable, called a borrower, which is initialized with a reference to a part of an existing data structure. To state that this initialization should not be considered a move, an *anonymous access type* is used for the borrower<sup>4</sup>. During the scope of the borrower, the borrowed part of the underlying structure is frozen, meaning that it is illegal to read or modify it. Once the borrower has gone out of scope, the ownership automatically returns to the borrowed object, so that it is again fully accessible.

```
X := ...; -- X is initialized to the list {1,2,3,4}
declare
  Y : access List_Cell := X; -- Y has an anonymous access type.
  -- Ownership of X is transferred to Y for the duration of its lifetime.
begin
  Y.Data := Y.Data + 1; -- Y can be used to read or modify X
  pragma Assert (X.Data = 2); -- Illegal, during the lifetime of Y, X
                                    -- cannot be read or modified directly
end;
pragma Assert (X.Data = 2); -- Afterwards, the ownership returns to X
```
A borrower can be used to modify the underlying structure. This makes it effectively an alias of the borrowed object. To allow the tool to statically determine the cases of aliasing, SPARK restricts the initial value of a local borrower to be the name of a part of an existing object. This forbids for example borrowing one of two structures depending on a condition.

<sup>4</sup> A type is said to be anonymous if it does not have a previous declaration. Here access List_Cell is anonymous while List is named.

It is possible to update a borrower to change the part of the object it designates (as opposed to modifying the designated object). This is called a reborrow. In SPARK, the value assigned to the borrower in a reborrow should be rooted at the borrower. This means that reborrows only go deeper into the structure.

```
declare
  Y : access List_Cell := X; -- Y is X
begin
  Y := Y.Next; -- This is a reborrow, Y is now X.Next
end;
```
Borrowing can be used to allow simple iterative traversals of a recursive data structure like the loop of Set_All_To_Zero. More complex traversals, involving stacks for example, cannot be written iteratively in SPARK.

```
procedure Set_All_To_Zero (X : in out List) is
   Y : access List_Cell := X;
   -- The ownership of X is transferred to Y for the duration of its lifetime
begin
   while Y /= null loop
      Y.Data := 0;
      Y := Y.Next; -- Reborrow: Y designates something deeper
   end loop;
end Set_All_To_Zero; -- The ownership of X is restored
```
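The borrowing traversal can be exercised with a small harness such as the one below. It is a sketch in plain Ada under stated assumptions: the enclosing procedure, the three-element list, and the assertions are illustrative, and nothing is claimed here about what the SPARK tool reports on it.

```ada
procedure Borrow_Traversal_Demo is
   type List_Cell;
   type List is access List_Cell;
   type List_Cell is record
      Data : Integer;
      Next : List;
   end record;

   procedure Set_All_To_Zero (X : in out List) is
      Y : access List_Cell := X;   --  local borrower of X
   begin
      while Y /= null loop
         Y.Data := 0;              --  modifies X through the borrower
         Y := Y.Next;              --  reborrow: one cell deeper
      end loop;
   end Set_All_To_Zero;

   --  The list {1, 2, 3}.
   X : List :=
     new List_Cell'(Data => 1,
                    Next => new List_Cell'(Data => 2,
                                           Next => new List_Cell'
                                             (Data => 3, Next => null)));
begin
   Set_All_To_Zero (X);
   --  After the call, X is fully accessible again and every cell is 0.
   pragma Assert (X /= null);
   pragma Assert (X.Data = 0 and X.Next.Data = 0 and X.Next.Next.Data = 0);
   pragma Assert (X.Next.Next.Next = null);   --  the shape is unchanged
end Borrow_Traversal_Demo;
```

The last assertion reflects the fact that reborrows only move the borrower deeper into the structure and never relink its cells.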
Using reborrows, local borrowers allow one to indirectly modify a data structure at an arbitrarily-deep position, which may not be statically-known. While in the scope of the borrower, these indirect modifications can be ignored by the analysis, as the ownership policy makes them impossible to observe. However, after the end of the borrow, ownership is transferred back to the borrowed object, and SPARK needs to take into account whatever modifications may have occurred through the borrower.

```
X := ...; -- X is initialized to the list {1,2,3,4}
declare
  Y : access List_Cell := X; -- Y is X
begin
  Y := Y.Next.Next;
  -- Through reborrows, Y designates an arbitrarily-deep part of X
  Y.Data := 42; -- Y is used to indirectly modify X
end;
pragma Assert (X.Next.Next.Data = 42); -- The assertion should hold
```
To be able to reconstruct the borrowed object from the value of the borrower, we must track the relation between them. As this relation cannot be statically determined because of reborrows, SPARK handles it as an additional object in the program. This allows us to take advantage of the normal mechanism for handling value dependent control-flow in SPARK (the weakest-precondition calculus of Why3). The idea is the following. When a borrower is declared in Ada, we create two objects: the borrower itself, which is considered as a stand-alone structure, independent of the borrowed object, and a predicate. The predicate, which we call the borrow relation, encodes the most precise relation between the borrower and the borrowed object which does not depend on the actual value designated by the borrower. The value of the *borrow relation* is computed by the tool from the definition of the borrower, and is updated at each reborrow. Modifications of the underlying data structure don't impact this relation. At the end of the borrow, the borrowed object is reconstructed using both the borrow relation and the current value of the borrower.

```
X := ...; -- X is initialized to the list {1,2,3,4}
declare
  Y : access List_Cell := X; -- Create borrow relation to relate X and Y
  -- b_rel := λ new_x, new_y. new_x ≠ null ∧ new_x = new_y
begin
  Y := Y.Next.Next; -- Update the predicate to model the new relation
  -- b_rel := λ new_x, new_y. new_x ≠ null ∧ new_x.data = 1 ∧
  -- new_x.next ≠ null ∧ new_x.next.data = 2 ∧ new_x.next.next ≠ null
  -- ∧ new_x.next.next = new_y
  Y.Data := 42; -- The borrow relation is not modified
end;
pragma Assert (X.Next.Next.Data = 42);
```
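The reconstruction at the end of the borrow can be spelled out for this example. What follows is only a worked reading of the comments in the listing, with the nullness conjuncts written as disequalities; $x'$ and $y'$ stand for new_x and new_y, and the tool's internal encoding may differ. After the reborrow, the relation is

$$
\begin{aligned}
b\_rel(x', y') \;\equiv\;\; & x' \neq \mathrm{null} \;\wedge\; x'.\mathit{data} = 1 \;\wedge\; x'.\mathit{next} \neq \mathrm{null} \;\wedge\; x'.\mathit{next}.\mathit{data} = 2 \\
& \wedge\; x'.\mathit{next}.\mathit{next} \neq \mathrm{null} \;\wedge\; x'.\mathit{next}.\mathit{next} = y'
\end{aligned}
$$

Let $y_{end}$ be the value of the borrower when it goes out of scope, so that $y_{end}.\mathit{data} = 42$. The borrowed object is then some $x_{end}$ with $b\_rel(x_{end}, y_{end})$, which gives

$$
x_{end}.\mathit{data} = 1, \qquad x_{end}.\mathit{next}.\mathit{data} = 2, \qquad x_{end}.\mathit{next}.\mathit{next}.\mathit{data} = y_{end}.\mathit{data} = 42 .
$$

The last equality is exactly the final assertion of the listing, and the first two express that the cells above the borrower were frozen during the borrow.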
#### **5 Describing the Borrow Relation**

SPARK performs deductive verification, which relies on user-specified invariants to handle loops. When traversing a linked data structure, the loop body contains a reborrow, which means that the borrow relation is modified in the loop. As a general rule, if a variable is modified in a loop, it should be described in the loop invariant; otherwise nothing is known about its value afterward. Thus, we need a way to describe the borrow relation in the loop invariant.

As part of their work on the Prusti proof tool for Rust, Astrauskas et al. found the need for a similar annotation that they call *pledges* [1]. In Rust, a pledge is an assertion associated with a borrower which is guaranteed to hold at the time when the borrow expires, no matter what may happen in between. In SPARK, a property guaranteed to hold at the end of the borrow must be a consequence of the borrow relation, since the borrow relation is the most precise relation which does not depend on the actual value of the borrower. Therefore, the user-visible notion of a pledge is suitable to approximate the internally computed borrow relation. Similar to user-provided postconditions, which must be implied by the strongest postcondition computed by a verifying tool, the user-provided pledge should follow from the borrow relation.

Since the Ada syntax has no support for pledges, we have resorted in SPARK to introducing special functions (dedicated to each access type) called pledge functions, which mark expressions which should be considered as pledge expressions by the tool. A pledge function is a *ghost* function (meaning that it is not allowed to have any effect on the output of the program) which has two parameters. The first one is used to identify the borrower on which the pledge should apply, while the second holds the assertion. Note that a call to a pledge function isn't really a call for the SPARK analyzer. It is simply a marker that the expression in argument is a pledge.

```
function Pledge
  (L : access constant Cell; -- The borrower to which the pledge applies
   P : Boolean) -- The property we want to assert in the pledge
     return Boolean
is (P) -- For execution, the function evaluates the property
with Ghost,
  Annotate ⇒ (GNATprove, Pledge); -- Identifies a pledge function for SPARK
```
When a pledge function is called in an assertion, SPARK recognizes it and identifies its parameter as a pledge. It therefore attempts to show that the property is implied by the borrow relation (as opposed to implied by the current value of the borrower).

```
X := ...; -- X is initialized to the list {1,2,3,4}
declare
  Y : access List_Cell := X;
begin
  Y := Y.Next.Next;
  pragma Assert (Pledge (Y, Y = X.Next.Next));
  -- True as this is implied by borrow relation
  pragma Assert (Pledge (Y, X.Data = 1 and X.Next.Data = 2));
  -- True again as the first 2 elements of X are frozen
  pragma Assert (Pledge (Y, X.Next.Next.Data = 3));
  -- False, though this is true at the current program point, as it is not
  -- guaranteed to hold at the end of the borrow.
  ...
end;
```

Using pledges, we can formally verify the Set_All_To_Zero procedure. Its postcondition uses the Nth function to state that all elements of the list have been set to 0. To be able to express the loop invariant in a similar way, we have introduced a ghost variable C to count the number of iterations. Its value is maintained by the first loop invariant. The second and third invariants are pledges, describing how the value of X can be reconstructed from the value of the iterator Y. The second invariant gives the length of the list, while the third describes the value of its elements using the Nth function. Elements which have already been processed are frozen by the borrow. Their value is known to be 0. Other elements can be linked to the corresponding position in the iterator Y.

```
procedure Set_All_To_Zero (X : List) with
  Pre ⇒ Length (X) < My_Nat'Last,
  Post ⇒ Length (X) = Length (X)'Old
    and (for all I in 1 .. Length (X) ⇒ Nth (X, I) = 0);
  -- All elements of X are 0 after the call
procedure Set_All_To_Zero (X : List) is
   C : My_Nat := 0 with Ghost;
   Y : access List_Cell := X;
begin
   while Y /= null loop
      pragma Loop_Invariant (C = Length (Y)'Loop_Entry - Length (Y));
      -- C elements have been traversed
      pragma Loop_Invariant
        (Pledge (Y, Length (X) = Length (Y) + C));
      pragma Loop_Invariant
        (Pledge (Y, (for all I in 1 .. Length (X) ⇒
           Nth (X, I) = (if I ≤ C then 0 else Nth (Y, I - C)))));
      -- All elements are 0 up to C, others are elements of Y
      Y.Data := 0;
      Y := Y.Next;
      C := C + 1;
   end loop;
end Set_All_To_Zero;
```
Note that, in general, it is not necessary to write a pledge to verify a program using a local borrower. Indeed, the analysis tool is able to precisely track the borrow relation through successive reborrows. Pledges need only be provided when the borrow relation itself cannot be tracked by the tool, for example because of a loop, like in our example.

#### **6 Evaluation**

We could not try the tool on any pre-existing benchmark since SPARK codebases do not have pointers, and Ada codebases usually violate some SPARK rules. In particular, Ada codebases have no reason to abide by the ownership policy of SPARK. So instead, we mostly had to write new tests to assess the correctness and performance of our implementation. The public testsuite of SPARK contains more than 150 tests mentioning access types, be they supported cases or not.

To assess expressivity and provability on programs dealing with recursive data structures, we have written 6 examples, none of them very big, but ranging over various levels of complexity<sup>5</sup>. On all of these examples, we have shown that the runtime checks imposed by the Ada language are guaranteed to pass and that no uninitialized value can be read. In addition, we have manually supplied functional properties.

Figure 1 gives some metrics over these examples. Under the tab Loc are listed the total number of lines of code in the example, the number of lines of specification (including contracts and specification functions), and the number of additional ghost annotations (assertions, loop invariants, ghost variables...). The #Checks column gives the number of checks generated by the tool (contracts, assertions, invariants, language defined checks...). In the last three columns, we can see the total running time of SPARK, both from scratch using its default strategy and only replaying the proofs through the replay facility, as well as the maximal time needed to prove a single verification condition.


**Fig. 1.** Overview of the examples involving recursive data structures

Though these examples are small, we think they demonstrate that it is possible to define recursive data structures in SPARK, and to verify iterative programs using them. When writing the algorithms, we found that the limitations mostly come from the ownership policy of SPARK. Some data structures are not supported, which requires either switching to full Ada for their implementation or changing the algorithm to work around the missing links. In general, we found that the annotation effort required to describe the borrow relations, though non-negligible, was acceptable. In particular, it uses the standard SPARK expressions, with no mentions of memory separation or permission.

<sup>5</sup> https://github.com/AdaCore/spark2014/tree/master/papers/Pledge2020/examples.

#### **7 Related Work**

Program verification tools for mainstream languages such as C or Java generally support aliasing, because the concept of pointer or reference is more central. They deal with it by modeling the heap. The WP plugin of Frama-C uses by default a *typed memory model* where different arrays are used for the basic types of C [6]. The VerCors [3] toolset handles high-level programming languages, such as Java, by extending the annotation language with separation logic with permission [10]. In SPARK we have chosen a different approach, as we avoid modeling the heap completely by using ownership rules to enforce non-aliasing.

The ownership rules introduced in SPARK are largely inspired by the Rust language [11]. The differences are mostly motivated by the need to comply with the preexisting Ada semantics of pointers. In addition, SPARK was aiming at coming up with a subset as easy to verify as possible. The resulting model is simpler because it does not make lifetime of borrowers explicit, and aliases created through borrows are always statically known.

The Prusti verification tool for Rust [1] allows users to verify that a program complies with its specification. Both tools provide similar guarantees and require similar annotations. However, they differ in their implementation. Indeed, Prusti works by translating separation constraints enforced by the Rust type system to the intermediate verification language of the Viper tool [9]. Our work differs here, as we use the ownership system to abstract away memory related concerns, so that the verification process does not need to be aware of them.

In a recent work [7], Matsushita et al. propose a translation to CHCs for Rust programs. Like in our approach, the restrictions imposed by the ownership policy are key for the soundness of their method. However, while we introduce the notion of borrow relation to be able to use a standard WP calculus, they present a new calculus specifically tailored to Rust references.

#### **8 Conclusion**

We have presented a recent extension of the SPARK language and toolset to support pointers. It is based on an ownership policy enforcing non-aliasing. To support pointer-based recursive data structures, a restricted form of aliasing is introduced in SPARK through local borrowers, which can be used to iterate through a linked data structure in an imperative way. We have described how local borrowers can be supported by the verification tool, without introducing a memory model, by using a mutable predicate named the borrow relation. This borrow relation can be described when necessary using special annotations named pledges, which solely consist of SPARK standard expressions, and do not expose the underlying verification technique. Our work is available in the 20.1 release of SPARK Pro and will be part of the next community release.

As for future work, we would like to extend the subset of Ada pointers supported in SPARK. In particular, we would like to introduce function pointers to model callbacks, pointers to constants with a more permissive ownership policy, and local borrowing of objects allocated on the stack.

## **References**



## **Ivy: A Multi-modal Verification Tool for Distributed Algorithms**

Kenneth L. McMillan<sup>1</sup> and Oded Padon<sup>2</sup>

<sup>1</sup> Microsoft Research, Redmond, USA kenmcmil@microsoft.com <sup>2</sup> Stanford University, Stanford, USA padon@cs.stanford.edu

**Abstract.** Ivy is a multi-modal verification tool for correct design and implementation of distributed protocols and algorithms, supporting modular specification, implementation and proof. Ivy supports proving safety and liveness properties of parameterized and infinite-state systems via three modes: deductive verification using an SMT solver, abstraction and model checking, and manual proofs using natural deduction. It supports light-weight formal methods via compositional specification-based testing and bounded model checking. Ivy can extract executable distributed programs by translation to efficient C++ code. It is designed to support decidable automated reasoning, to improve proof stability and to provide transparency in the case of proof failures. For this purpose, it presents concrete finite counterexamples, automatically audits proofs for decidability of verification conditions, and provides modular hiding of theories.

## **1 Introduction**

Ivy is an open-source [16] multi-modal verification tool for correct design and implementation of distributed algorithms, supporting modular specification, implementation and proof. The motivating principles of Ivy are *predictability*, *stability* and *transparency*. That is, automated proof steps should provide complexity bounds, should be insensitive to small perturbations, and when they fail should provide actionable feedback. To the extent consistent with these principles, Ivy aims to maximize expressiveness and proof automation, and thus to achieve a high level of user productivity in designing, implementing and proving programs. A major goal of Ivy is to support *decidable reasoning*. That is, automated proof should be restricted to logical fragments for which the tool is a decision procedure. This greatly improves the stability of automated provers, which otherwise rely on fragile heuristics to avoid divergence [28]. This is important for the maintenance of large proofs, to prevent small changes from creating unpredictable proof failures. Moreover, on decidable problems, provers fail transparently by providing true counterexamples, which greatly simplifies the iterative development of proofs. Ivy supports the decomposition of proofs to decidable theories by the use of modular abstraction.

The architecture of Ivy is depicted in Fig. 1. The figure shows the major components of the tool and the information flow between them. Ivy provides a language (also called "Ivy") for the modular description of distributed programs, along with their specifications and proofs (see Sect. 2). Ivy is a synchronous, reactive programming language [3], meaning that the program only executes actions in response to input from its environment, and these actions appear to execute atomically. From an Ivy program, the tool can extract an asynchronous, distributed implementation. A program is made up of reactive modules [1], each having a temporal assume/guarantee-style specification. After parsing of this description and elaboration of templates, the program is decomposed into its component modules, each with associated assumptions and proof obligations, according to a system of proof rules for circular assume/guarantee reasoning (see Sect. 2.1).

These proof obligations are passed on to the tactics engine (see Sect. 3). This engine orchestrates the use of various built-in proof tactics, including decidable invariant checking with an SMT solver (Sect. 3.1), model checking with eager abstraction [19] (Sect. 3.2), liveness proof by translation to safety (Sect. 3.3) and logical deduction rules (Sect. 3.4). Each tactic works by reducing a given proof goal to a (possibly empty) set of sub-goals, from which the original goal can be proved. Combined with modular reasoning, the tactics engine makes it possible to use a variety of proof approaches and proof automation tools in constructing a proof.

Ivy extracts executable distributed programs by translation to C++ (see Sect. 5). From the specifications of a module, Ivy can also generate a modular randomized specification-based tester [7] (see Sect. 4.1). This also makes it possible to test infrastructure not written in Ivy (including hardware) against Ivy specifications.

#### **1.1 Related Work**

Ivy can be thought of as a hybrid between program verification tools such as ESC-Java [11] and Dafny [14], based on the Floyd/Hoare approach, compositional model checking tools, such as Mocha [2] and Cadence SMV [17], and proof assistants based on the LCF model, such as Isabelle [26] or Coq [4]. Compared to program verification tools that support only procedure modularity, Ivy provides a richer form of specification that allows complete hiding of internal state, and provides architectural support for decidable reasoning (see Sect. 2.1). Compared to compositional tools, Ivy integrates a richer variety of reasoning techniques (see Sect. 3). Compared to proof assistants, Ivy provides domain-specific support for decidable proof automation, supporting a greater degree of proof automation [28]. On the other hand, Ivy relies on a vastly larger trusted computing base than typical proof assistants. Moreover, Ivy has no mechanism of reflection, and thus cannot be used for meta-reasoning about programs and program transformations. In principle, all the techniques in Ivy could be integrated into a tool such as Isabelle or Coq but the effort would be large. A less foundational tool such as Ivy makes it possible to rapidly experiment with new proof and proof automation strategies. Compared to all of these tools, Ivy differs in providing native support for extracting distributed programs, and specification-based testing. A related tool, mypyvy, focuses on more powerful invariant inference techniques, but lacks the other features of Ivy [10,29].

**Fig. 1.** Ivy architecture, showing flow between major components. Red, solid arrows represent flow of proof goals and assumptions. Green, dashed arrows represent flow of proofs and/or counterexamples. Not shown is VC generator, shared between Invariant Checking/BMC and Eager Abstraction components. (Color figure online)

### **2 A Modular Language for Decidable Reasoning**

The primary design goal of Ivy's language is to support decidable reasoning while maximizing expressiveness and performance. Figure 2 is an example of the basic unit of verification in Ivy, called an *isolate*. An isolate is a reactive module that hides internal state and provides a temporal (that is, stateful) specification of its interface. An isolate has named traits that include types, properties, variables and actions. It is divided into a *specification* part and an *implementation* part. The figure shows an example of a simple module that inputs a sequence of numbers and outputs an upper bound on the numbers received thus far.

**Types, Variables and Actions.** The native datatypes in Ivy include just the Boolean type, uninterpreted types, records (structs) over datatypes, and pure first-order functions. In the figure, line 2 declares an uninterpreted type t. Line 6 declares a state variable 'seen' holding a predicate over t. This variable is initialized at line 9. This assigns 'seen(X)' to be the function that returns false for all values of X.

Procedures in Ivy are called *actions* and may have side effects on variables. Parameters are passed by value and there are no references. This greatly simplifies modular reasoning (see Sect. 2.1) and also allows for aggressive compiler optimizations due to the absence of aliasing (see Sect. 5).

In the figure, line 3 declares an action 'ub' that takes an input x of type t and outputs y of type t. Its implementation is given at lines 24 to 27. It updates a state variable 'max' holding the maximum value received thus far, and returns this value by assigning it to the output variable y.

#### **2.1 Modularity and Decidability**

The specification part of the isolate (lines 5 to 18) consists of *ghost* variables and code that are *visible* outside the isolate. The implementation part (lines 19 to 30) consists of *real* variables and code that are *invisible* outside the module. At line 15 the ghost predicate 'seen' is updated to reflect the fact that value x has been seen as an input. Specification code contains assume/guarantee specifications in terms of require and ensure statements. For example, line 12 represents an assumption that input values are non-negative. Line 16 represents a guarantee that output values will be an upper bound on all seen values.

Ghost and real code are kept syntactically separate in Ivy. The specification code is interleaved with the implementation code using the directives 'before' (line 11) and 'after' (line 14). Thus, in the figure, the 'require' statement acts as a precondition, while the 'ensure' statement acts as a postcondition. The implementation code is not allowed to side effect any externally visible state, so it is sound to erase (or 'slice') this code when verifying other modules. Other modules see only the ghost code, which provides an abstract model of the isolate. Similarly, when extracting executable code, it is safe to erase the ghost code (which must be proven to be terminating). This makes it possible, for example, to provide a pure, functional specification of a module interface, even though internally it has state.
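Since the listing of Fig. 2 is not reproduced here, the following Python sketch models the isolate as the prose describes it. It is an illustration under stated assumptions, not Ivy code: the class name `UpperBoundIsolate`, the helper `_impl_ub` and the initial value 0 for `max` are hypothetical, and Python assertions stand in for the 'require' and 'ensure' statements.

```python
class UpperBoundIsolate:
    """Hypothetical Python model of the upper-bound isolate described in the text."""

    def __init__(self):
        self.seen = set()  # ghost state: 'seen(X)' is false for all X initially
        self.max = 0       # real state (the initial value 0 is an assumption)

    def _impl_ub(self, x):
        # Implementation part: touches only the hidden variable 'max'.
        if x > self.max:
            self.max = x
        return self.max

    def ub(self, x):
        # 'before ub' monitor: the assumption acts as a precondition.
        assert x >= 0, "assumption violated: input must be non-negative"
        y = self._impl_ub(x)
        # 'after ub' monitor: ghost update, then the guarantee as a postcondition.
        self.seen.add(x)
        assert all(z <= y for z in self.seen), "guarantee violated"
        return y


module = UpperBoundIsolate()
outputs = [module.ub(v) for v in (3, 1, 7, 5)]  # [3, 3, 7, 7]
```

Erasing `_impl_ub` and `max` leaves only the ghost view that other modules would see, while erasing `seen` and the two assertions leaves the code that would be extracted.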

Theories can also be hidden inside modules. For example, the implementation of our example interprets the type t as the integers (line 28). For verification purposes, this instantiates the theory of Peano arithmetic for type t. This theory is used *only* to prove correctness of the isolate, and is invisible to other isolates. The theory can be used to prove properties (such as the irreflexivity property at line 7) that provide an abstraction of the type externally. The ability to hide theories behind abstractions provides an important strategy for keeping proof obligations decidable.

An isolate with no implementation part (that is, a "ghost" module) can act as an abstract model of a protocol. Using Ivy's modular rules, an abstract model can be *refined* to an implementation, using properties of the abstract model as lemmas. In addition to simplifying the proof, abstract models provide another useful strategy to hide functions, properties or theories that break decidability. This approach, in combination with theory hiding, was used to verify implementations of distributed consensus protocols [28]. Modularity provides the primary means in Ivy of keeping the automated reasoning decidable.


**Fig. 2.** Example of an Ivy isolate.

#### **3 Verification Tactics**

Ivy provides a range of automated tactics for discharging proof goals that are selected for their relatively predictable and stable performance, and for the ability to fail transparently.

#### **3.1 Invariant Checking with SMT**

The default tactic for proving safety properties is proof by inductive invariant, using the SMT solver Z3 [21]. For example, in Fig. 2, the guarantee at line 16 is proved using the auxiliary inductive invariant at line 29. The invariant relates the hidden implementation state variable 'max' with the visible specification state variable 'seen'. An invariant is a property that is required to hold only between executions of actions of the isolate. That is, actions may temporarily violate an invariant, but must re-establish it before terminating. The VC (verification condition) for the isolate holds if all invariants are established by the initializers and preserved by the interface actions, and if the invariant implies that no assertion in the code fails. These conditions are verified modulo the visible theories.
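The three parts of the VC can be illustrated by brute force on a small finite domain. The sketch below is a stand-in for the SMT-based check, not Ivy's actual procedure: it assumes that the auxiliary invariant has the form "every seen value is at most `max`" (the figure is not reproduced here, so this form is a guess from the prose), and all function names are hypothetical.

```python
from itertools import combinations

DOMAIN = range(4)  # small finite stand-in for the non-negative integers


def subsets(values):
    values = list(values)
    for size in range(len(values) + 1):
        for combo in combinations(values, size):
            yield frozenset(combo)


def step(seen, mx, x):
    """One call of 'ub': next ghost state, next 'max', and the output y."""
    new_max = max(mx, x)
    return seen | {x}, new_max, new_max


def guarantee(seen, y):
    return all(z <= y for z in seen)


def aux_invariant(seen, mx):
    # Assumed shape of the auxiliary invariant relating 'max' and 'seen'.
    return all(z <= mx for z in seen)


def check(invariant):
    """Return None if the VC holds on DOMAIN, else a counterexample."""
    if not invariant(frozenset(), 0):           # established by the initializer
        return ("initiation", frozenset(), 0, None)
    for seen in subsets(DOMAIN):
        for mx in DOMAIN:
            if not invariant(seen, mx):
                continue                         # only states satisfying the invariant
            for x in DOMAIN:
                seen2, mx2, y = step(seen, mx, x)
                if not guarantee(seen2, y):      # no assertion fails
                    return ("guarantee", seen, mx, x)
                if not invariant(seen2, mx2):    # preserved by the action
                    return ("consecution", seen, mx, x)
    return None


with_invariant = check(aux_invariant)
without_invariant = check(lambda seen, mx: True)
```

With the auxiliary invariant the check succeeds. With the trivial invariant it reports a state in which a value larger than `max` has already been seen; that state is unreachable, which is exactly what the auxiliary invariant rules out.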

Before attempting to prove the VC, the invariance tactic sends it to the *fragment checker*, which determines whether the VC is in a logical fragment called FAU [12] for which Z3 is a decision procedure. If the VC is not in FAU, Ivy provides an explanation to the user, by pointing to formulas that create a *function cycle* or that violate rules for the use of quantifiers and interpreted operators of the visible theories. A function cycle is a cycle in a graph whose vertices are types and whose edges are functions (including Skolem functions). This transparent mode of failure helps the user to reorganize the proof to keep the VC's in the decidable fragment.
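The function-cycle condition can be sketched as ordinary cycle detection on the type graph. The following Python fragment is a simplified illustration and not Ivy's fragment checker: it ignores the additional rules on quantifiers and interpreted operators, and the signature encoding and the names `find_function_cycle` and `mgr_skolem` are hypothetical.

```python
def find_function_cycle(functions):
    """functions maps a name to (domain_types, range_type).

    Returns the names of functions forming a cycle, or None if there is none.
    """
    edges = {}
    for name, (domains, range_type) in functions.items():
        for domain in domains:
            edges.setdefault(domain, []).append((name, range_type))

    state = {}  # type -> "active" (on the DFS stack) or "done"

    def dfs(current, path):
        state[current] = "active"
        for name, target in edges.get(current, []):
            if state.get(target) == "active":
                full = path + [(current, name)]
                start = next(i for i, (src, _) in enumerate(full) if src == target)
                return [fn for _, fn in full[start:]]
            if target not in state:
                found = dfs(target, path + [(current, name)])
                if found:
                    return found
        state[current] = "done"
        return None

    for start_type in list(edges):
        if start_type not in state:
            found = dfs(start_type, [])
            if found:
                return found
    return None


# A function from employees to ids alone creates no cycle.
acyclic = find_function_cycle({"eid": (("emp",), "id")})

# Skolemizing a formula of the shape "forall X:emp. exists Y:emp. ..." adds
# a function from emp to emp, which closes a cycle.
cyclic = find_function_cycle({
    "eid": (("emp",), "id"),
    "mgr_skolem": (("emp",), "emp"),
})
```

Reporting the functions on the cycle, rather than a bare yes or no, mirrors the kind of explanation that lets a user decide which formula to hide or instantiate manually.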

If a VC in the decidable fragment is false, Z3 fails transparently, producing a true finite counter-model, which is in turn translated into an execution trace that violates an invariant or guarantee. Ivy provides a graphical interactive tool to help the user in strengthening invariants [25] based on counterexamples. If the VC is valid, the tactic discharges the proof goal, returning the empty set of subgoals.

#### **3.2 Eager Abstraction and Model Checking**

An alternative tactic to prove safety properties is model checking with eager abstraction [19]. This technique allows parameterized and infinite-state systems to be verified with a finite-state model checker. The tactic first propositionally strengthens the symbolic transition relation by adding instances of axioms of the logic and theories, or of proved properties. It then propositionally abstracts the transition relation by converting the atomic predicates to Boolean variables. The resulting finite-state abstraction is verified by the ABC model checker [8]. If the property is false, the user is presented with an abstract counterexample expressed in terms of the truth values of the atomic propositions. The user may refine the abstraction by adding instantiation terms or auxiliary invariants. In [19] it was shown that this technique can reduce the burden of constructing auxiliary invariants, simplifying the overall proof of distributed protocols. As an example, the isolate of Fig. 2 can be proved without the auxiliary invariant. With eager abstraction, one need not be concerned with function cycles, but on the other hand, diagnosing abstract counterexamples can be challenging.

This approach is consistent with Ivy's philosophy of using stable and transparent automation, since the finite-state model checker has a single-exponential upper complexity bound and terminates with a proof or a counterexample. This is in contrast to more powerful proof engines such as Horn solvers [6] that suffer from unpredictable divergence. In practice, although eager abstraction is not fully automated, it can handle problems that are substantially beyond the capabilities of current Horn solvers.

#### **3.3 Liveness-to-Safety Transformation**

Ivy supports proofs of temporal properties, e.g., liveness properties, via a liveness-to-safety transformation. Temporal properties are specified in first-order linear temporal logic (FO-LTL). The liveness-to-safety tactic reduces a temporal proof goal into a safety proof goal, which can then be proven using an inductive invariant. For finite-state or parameterized systems, any temporal property can be proven by showing the absence of fair cycles, which is a safety property [27]. For infinite-state systems such an argument is not sound, and Ivy implements *dynamic abstraction* which generalizes the notion of fair cycles to infinite-state systems in a sound and powerful way [23,24]. With dynamic abstraction, Ivy's liveness-to-safety tactic supports temporal proofs of infinite-state systems, including both distributed systems with infinite state per process and systems with *unbounded parallelism*, where new processes can be dynamically created so an infinite trace may involve an infinite set of processes.

```
1  isolate bar = {
2    finite type t
3    action step(x:t)
4    specification {
5      relation pending(X:t)
6      instance enter : signal
7
8      after init {
9        pending(X) := true;
10     }
11     before step {
12       require pending(x);
13       call enter.raise;
14       pending(x) := false;
15     }
16     temporal property (□♦ enter.now) →
17       ♦ ∀X. ¬pending(X)
18     proof {
19       tactic l2s with
20         invariant ♦ enter.now
21         invariant ($was$ ¬pending(X)) → ¬pending(X)
22         invariant ($happened$ enter.now) →
23           ∃X. ($was$ pending(X)) ∧ ¬pending(X)
24     }
25   }
26 }
```
**Fig. 3.** Example of an Ivy isolate with a temporal property.

The liveness-to-safety tactic fits within Ivy's philosophy of using decidable reasoning. The more standard way of proving liveness properties is to use ranking functions, but for distributed systems, the required rankings often involve cardinalities of sets defined via first-order formulas, resulting in verification conditions that fall outside FAU and other decidable fragments. In contrast, the transformation to safety based on fair cycles and dynamic abstraction results in verification conditions which are often in the FAU fragment. Furthermore, since the temporal proof is transformed to a safety verification problem, it is possible to leverage for liveness proofs all the tactics and mechanisms that Ivy contains for safety verification.

When the liveness-to-safety tactic is applied, Ivy constructs a symbolic *cycle detection transition system*, which tracks fairness constraints and includes a *shadow* or *saved copy* of the state variables, similar to [5]. For finite-state or parameterized systems, it is enough to show that it is not possible to revisit the saved state while satisfying all fairness constraints. This can be shown by an inductive invariant, and Ivy contains special syntax for writing the invariant of the cycle detection system (e.g., to access the saved copy of state variables). For infinite-state systems, Ivy's cycle detection system includes dynamic abstraction, and invariants may also refer to the state of the abstraction [23].
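For the isolate of Fig. 3, the cycle detection construction can be mimicked by explicit-state search on a small instance. The sketch below is only an illustration of the saved-state idea and not Ivy's symbolic construction: it fixes the type t to three elements, adds a hypothetical idle move for steps in which the environment does not call step, and uses a flag `clear` to inject a fault for comparison.

```python
from collections import deque

T = frozenset(range(3))  # hypothetical finite instantiation of type t


def moves(pending, clear=True):
    """Yield (next_pending, enter_now) pairs for one transition."""
    yield pending, False                        # idle move: 'step' is not called
    for x in pending:                           # 'require pending(x)'
        nxt = pending - {x} if clear else pending
        yield nxt, True                         # 'call enter.raise'


def has_fair_cycle(clear=True):
    """Search the product of the system with a saved copy of its state."""
    init = (T, None, False)                     # (pending, saved, happened)
    visited, queue = {init}, deque([init])
    while queue:
        pending, saved, happened = queue.popleft()
        if saved is not None and saved == pending and happened:
            return True                         # saved state revisited fairly
        successors = []
        for nxt, enter_now in moves(pending, clear):
            if saved is None:
                successors.append((nxt, None, False))        # do not save yet
                successors.append((nxt, pending, enter_now))  # save the state now
            else:
                successors.append((nxt, saved, happened or enter_now))
        for succ in successors:
            if succ not in visited:
                visited.add(succ)
                queue.append(succ)
    return False


correct_has_cycle = has_fair_cycle(clear=True)
faulty_has_cycle = has_fair_cycle(clear=False)
```

In the correct model every call of step strictly shrinks `pending`, so the saved state can only be revisited through idle moves, on which the fairness flag is never set. If the update at line 14 is dropped, the search finds a fair cycle immediately.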

Figure 3 shows an example of a simple liveness proof of an abstract model in Ivy. The type t (line 2) is declared as finite, which means it is sound to use a fair cycle argument without dynamic abstraction. The specification state of the system consists of a single unary relation, pending, which is initialized to true for all values of type t. The step action (line 11) removes a single value from the pending relation. This can model, e.g., execution of tasks from a finite pool of pending tasks. The temporal property that we prove (line 16) is that if step is called infinitely often, then eventually nothing is pending. At line 13, we detect the call by raising a flag enter.now. The proof applies the liveness-to-safety (l2s) tactic (line 19), and supplies inductive invariants for the cycle detection system. The special operators \$was\$ and \$happened\$ are used to refer to the saved state, and the fairness constraints, respectively. The crux of the invariant is that after enter.now has happened, there is some element which was pending in the saved state and is not pending anymore, showing that the system has no fair cycle.

```
1  axiom eid(X) = eid(Y) → X = Y
2  axiom mgr(X, Y) ∧ mgr(X, Z) → Y = Z
3  explicit axiom [mgr_total] ∃Y. mgr(X, Y)
4  axiom mgr(X, X) → X = ceo
5
6  invariant mgr(X, Y) ∧ scanned(Y) → mid(X) = eid(Y)
7
8  action get_mid(x:emp) returns (res:id) = {
9    require ∀Y. scanned(Y);
10   res := mid(x);
11   ensure x = ceo → res = eid(x);
12   proof {
13     assume mgr_total with X = x
14   }
15 }
```
**Fig. 4.** Example of manual quantifier instantiation with a tactic

#### **3.4 Logical Tactics**

Though most of a proof in Ivy is done with the above automated proof tactics, there are occasional situations in which a small amount of detailed manually guided proof is needed, or is preferable to restructuring the proof. For this purpose, Ivy provides logical proof tactics that can be applied to properties, invariants or code assertions, either to complete the proof or to reduce it to subgoals that can be discharged by the automated tactics. A simple example is shown in Fig. 4. Here, mgr(X, Y) indicates that the manager of employee X is Y and eid(X) is the employee id of X. We assume that employee ids are unique, each employee has exactly one manager and that only the CEO is her own manager (lines 1 to 4). Action get_mid(x) returns the id of the manager of employee x. For this purpose, a procedure (not shown) scans the employees m and sets mid(x) = eid(m) for each x managed by m, establishing the invariant at line 6. Action get_mid(x) requires that all employees have been scanned and ensures that the return value is not the id of x, unless x is the CEO.

Axiom mgr_total states that for all employees there exists a manager (the universal quantifier on X is implicit). Ivy complains that this quantifier alternation puts the VC outside the decidable fragment. We can solve this with a manual quantifier instantiation. We first tag the axiom *explicit*, meaning that it is not used by the default tactic. We then apply the tactic 'assume' (line 13) to instantiate this axiom for X = x. The resulting assumption ∃Y. mgr(x, Y) has no alternation. The modified proof goal is discharged by the default tactic using Z3. Ivy's proof engine is based on the λΠ calculus [13] and a deterministic second-order matching algorithm [30]. The Ivy standard library uses this framework to define proof rules for natural deduction, similarly to Isabelle/FOL [26]. Logical tactics also make it possible to perform theory reasoning outside the decidable fragment, for example, applying the Peano induction axiom.

## **4 Light-Weight Formal Methods**

#### **4.1 Compositional Specification-Based Testing**

Before attempting a formal proof that an isolate satisfies its specification, it is useful to debug it using testing. For this purpose, Ivy provides compositional specification-based testing. The testers that Ivy produces generate randomized input sequences for an isolate that satisfy its assumptions and check the outputs against the isolate's guarantees. This is similar in principle to specification-based testing tools such as QuickCheck [9], but is reactive and compositional. Compositionality provides a kind of completeness for unit testing. That is, if a system fails its specification, then there is a local test of some component that fails. Unlike QuickCheck, Ivy does not require the user to provide generators for datatypes, instead relying on SMT solving for this purpose. Ivy can also be used to generate specification-based tests for hardware or software systems not written in Ivy. For example, it has been used to find bugs in memory hierarchy components for RISC-V processors [18], and the QUIC secure Internet transport protocol [20].
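The shape of such a tester can be sketched for the upper-bound module of Sect. 2. This is a minimal illustration under stated assumptions rather than the tester Ivy generates: inputs are drawn with Python's `random` module instead of an SMT solver, the names `make_ub` and `run_test` are hypothetical, and the `buggy` flag injects a fault so that a failing run can be observed.

```python
import random


def make_ub(buggy=False):
    """Return an 'ub' action with hidden state; 'buggy' injects a fault."""
    state = {"max": 0}

    def ub(x):
        if buggy:
            state["max"] = x          # fault: forgets earlier, larger inputs
        elif x > state["max"]:
            state["max"] = x
        return state["max"]

    return ub


def run_test(ub, steps=200, seed=0):
    """Drive 'ub' with random inputs; return the failing step index or None."""
    rng = random.Random(seed)
    seen = set()                      # ghost state maintained by the tester
    for index in range(steps):
        x = rng.randint(0, 1000)      # inputs satisfy the assumption x >= 0
        y = ub(x)
        seen.add(x)
        if not all(z <= y for z in seen):
            return index              # guarantee violated at this step
    return None


passing = run_test(make_ub())
failing = run_test(make_ub(buggy=True))
```

The tester plays the role of the environment: it generates only inputs permitted by the assumption and checks only the guarantee, so the same harness could drive any implementation of the interface, including one not written in Ivy.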

#### **4.2 Bounded and Finite-State Model Checking**

For debugging, Ivy supports bounded model checking. This is decidable if the VC's are in the decidable fragment. It also allows uninterpreted types to be finitely instantiated, allowing under-approximate model checking in the style of TLC [31].

#### **5 Extracting Efficient Executable Code**

*Compilation.* The implementation part of an Ivy program can be extracted as executable code in C++. To be extractable, the implementation must satisfy certain computability conditions, for example that all quantifiers in conditionals be bounded. For functions, the compiler can choose among several representations: a closure, a dense representation as an array, or a sparse representation as a hash table. The dense representation is unboxed, allowing a cache-efficient contiguous representation of an array of structures and reducing allocation overhead.

Because there are no references in Ivy, there is a risk of copying large structures passed as arguments. However, the lack of aliasing makes it relatively easy for the compiler to detect linear use of data, allowing call and return by reference in the extracted code, and in-place update of structures. Subtype polymorphism in Ivy is implemented by the compiler using smart pointers, allowing structure sharing (and potentially copy-on-write, though this is not yet implemented). In addition, the compiler borrows a technique from the Rust language [22] to introduce references. Consider the Ivy code on the left of Fig. 5 that looks up a value in a map, operates on it, then writes it back into the map. The compiler recognizes this as an instance of the "borrowing" pattern and renders it as the C++ code on the right, which operates on the value in the map by reference.


**Fig. 5.** Updating a map in place using the borrow pattern.

This is possible because of the lack of aliasing and the fact that the compiler understands the underlying data structures. A C++ compiler cannot accomplish this optimization because of the difficulty of pointer analysis in the map implementation and the called operator f. Benchmarks of an older Ivy compiler [28] on distributed protocols showed comparable performance to implementation in OCaml and Go, though Ivy is purely value-based, while these languages support references.

*Concurrency.* Although Ivy is a synchronous reactive language, the compiler can extract parameterized distributed programs from Ivy programs in a sound way. In a parameterized module, each action and state variable has a first parameter representing a *location*. The compiler verifies that different locations do not interfere with each other, and then extracts an executable process that takes its location as a parameter. Ivy guarantees that executing the locations concurrently is observably equivalent to sequential execution, based on a left-mover/right-mover argument [15,28].

*Run-Time Support.* Ivy provides a standard library that includes useful abstractions, such as ordered datatypes and arrays, as well as formally specified interfaces to networking services provided by operating systems. In addition, the compiler automatically generates marshaling and unmarshaling code for user-defined datatypes. These facilities make it relatively straightforward to implement verified networked protocols in Ivy.

#### **6 Conclusion**

Ivy has been designed to provide predictability, stability and transparency in the process of developing verified systems. For this purpose, it integrates a collection of verification techniques that provide these properties, while attempting to maximize the expressiveness of the language, the degree of proof automation, and the efficiency of extracted code. By setting the division of labor between the human and automated provers appropriately, it aims to increase the productivity of the overall process of formal development.

#### **References**

1. Alur, R., Henzinger, T.A.: Reactive modules. In: Proceedings, 11th Annual IEEE Symposium on Logic in Computer Science, New Brunswick, New Jersey, USA, 27–30 July 1996, pp. 207–218. IEEE Computer Society (1996)



## **Reasoning over Permissions Regions in Concurrent Separation Logic**

James Brotherston<sup>1(B)</sup>, Diana Costa<sup>1</sup>, Aquinas Hobor<sup>2</sup>, and John Wickerson<sup>3</sup>

<sup>1</sup> University College London, London, UK, J.Brotherston@ucl.ac.uk
<sup>2</sup> National University of Singapore, Singapore, Singapore
<sup>3</sup> Imperial College London, London, UK

**Abstract.** We propose an extension of separation logic with fractional permissions, aimed at reasoning about concurrent programs that share arbitrary *regions* or data structures in memory. In existing formalisms, such reasoning typically either fails or is subject to stringent side conditions on formulas (notably *precision*) that significantly impair automation. We suggest two formal syntactic additions that collectively remove the need for such side conditions: first, the use of both "weak" and "strong" forms of separating conjunction, and second, the use of nominal labels from hybrid logic. We contend that our suggested alterations bring formal reasoning with fractional permissions in separation logic considerably closer to common pen-and-paper intuition, while imposing only a modest bureaucratic overhead.

**Keywords:** Separation logic · Permissions · Concurrency · Verification

#### **1 Introduction**

*Concurrent separation logic* (CSL) is a version of separation logic designed to enable compositional reasoning about concurrent programs that manipulate memory possibly shared between threads [6,26]. Like standard separation logic [28], CSL is based on *Hoare triples* $\{A\}\,C\,\{B\}$, where $C$ is a program and $A$ and $B$ are formulas (called the *precondition* and *postcondition* of the code respectively). The heart of the formalism is the following *concurrency rule*:

$$\frac{\{A_1\}\,C_1\,\{B_1\} \quad \{A_2\}\,C_2\,\{B_2\}}{\{A_1 \circledast A_2\}\,C_1 \parallel C_2\,\{B_1 \circledast B_2\}}$$

where $\circledast$ is a so-called *separating conjunction*. This rule says that if two threads $C_1$ and $C_2$ are run on spatially separated resources $A_1 \circledast A_2$ then the result will be the spatially separated result, $B_1 \circledast B_2$, of running the two threads individually.

However, since many or perhaps even most interesting concurrent programs do share some resources, $\circledast$ typically does not denote strict disjoint separation of memories, as it does in standard separation logic (where it is usually written as $*$). Instead, it usually denotes a weaker sort of "separation" designed to ensure that the two threads at least cannot interfere with each other's data. This gives rise to the idea of *fractional permissions*, which allow us to divide writeable memory into multiple read-only copies by adding a permission value to each location in heap memory. In the usual model, due to Boyland [5], permissions are rational numbers in the half-open interval $(0, 1]$, with 1 denoting the write permission, and values in $(0, 1)$ denoting read-only permissions. We write the formula $A^{\pi}$, where $\pi$ is a permission, to denote a "$\pi$ share" of the formula $A$. For example, $(x \mapsto a)^{0.5}$ (typically written as $x \stackrel{0.5}{\mapsto} a$ for convenience) denotes a "half share" of a single heap cell, with address $x$ and value $a$. The separating conjunction $A \circledast B$ then denotes heaps realising $A$ and $B$ that are "compatible", rather than disjoint: where the heaps overlap, they must agree on the data value, and one adds the permissions at the overlapping locations [4]. E.g., at the logical level, we have the entailment:

$$x \stackrel{0.5}{\mapsto} a \circledast x \stackrel{0.5}{\mapsto} b \models a = b \land x \mapsto a. \tag{1}$$

Happily, the concurrency rule of CSL is still sound in this setting (see e.g. [29]).
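The compatibility-based reading of the weak separating conjunction can be made concrete with a small model of permission heaps. The Python sketch below is an illustration under stated assumptions, not the paper's formal semantics: a heap is a dictionary from addresses to (value, permission) pairs, permissions are `Fraction` values in (0, 1], and the function name `compose` and the address 100 are hypothetical.

```python
from fractions import Fraction


def compose(h1, h2):
    """Weak composition of permission heaps; None if they are incompatible."""
    result = dict(h1)
    for addr, (value, perm) in h2.items():
        if addr in result:
            old_value, old_perm = result[addr]
            # Overlapping cells must agree on the value, and the summed
            # permission may not exceed the full permission 1.
            if old_value != value or old_perm + perm > 1:
                return None
            result[addr] = (value, old_perm + perm)
        else:
            result[addr] = (value, perm)
    return result


half = Fraction(1, 2)
x = 100  # hypothetical address

agreeing = compose({x: ("a", half)}, {x: ("a", half)})
clashing = compose({x: ("a", half)}, {x: ("b", half)})
```

Two half shares of the same cell combine only when the stored values agree, and the result then carries the full permission, which is the content of entailment (1).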

However, the use of this weaker notion of separation causes complications for formal reasoning in separation logic, especially if one wishes to reason over arbitrary regions of memory rather than individual pointers. There are two particular difficulties, as identified by Le and Hobor [24]. The first is that, since $\circledast$ denotes possibly-overlapping memories, one loses the main useful feature of separation logic: its nonambiguity about separation, which means that desirable entailments such as $A^{0.5} \circledast B^{0.5} \models (A \circledast B)^{0.5}$ turn out to be false. E.g.:

$$x \stackrel{0.5}{\mapsto} a \circledast y \stackrel{0.5}{\mapsto} b \not\models (x \mapsto a \circledast y \mapsto b)^{0.5}.$$

Here, the two "half-pointers" on the LHS might be aliased (x = y and a = b), meaning they are two halves of the same pointer, whereas on the RHS they must be non-aliased (because we cannot combine two "whole" pointers). This ambiguity becomes quite annoying when one adds arbitrary predicate symbols to the logic, e.g. to support inductively defined data structures.
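The aliasing argument can be replayed on concrete heaps. The following Python sketch is a hedged illustration: it uses the same dictionary model of permission heaps as a stand-in for the formal semantics, relies on the fact that each side has at most one model once the addresses and values are fixed, and all function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def compose(h1, h2):
    """Weak composition; None if values disagree or permissions exceed 1."""
    if h1 is None or h2 is None:
        return None
    result = dict(h1)
    for addr, (value, perm) in h2.items():
        if addr in result:
            old_value, old_perm = result[addr]
            if old_value != value or old_perm + perm > 1:
                return None
            result[addr] = (value, old_perm + perm)
        else:
            result[addr] = (value, perm)
    return result


def points_to(addr, value, perm=Fraction(1)):
    return {addr: (value, perm)}


def scale(heap, pi):
    """Multiply every permission by pi; undefined heaps stay undefined."""
    if heap is None:
        return None
    return {addr: (value, perm * pi) for addr, (value, perm) in heap.items()}


def lhs_model(x, a, y, b):
    # Model of the weak conjunction of two half pointers.
    return compose(points_to(x, a, HALF), points_to(y, b, HALF))


def rhs_model(x, a, y, b):
    # Model of the half share of the weak conjunction of two whole pointers.
    return scale(compose(points_to(x, a), points_to(y, b)), HALF)


separate = (lhs_model(1, "a", 2, "b"), rhs_model(1, "a", 2, "b"))
aliased = (lhs_model(1, "a", 1, "a"), rhs_model(1, "a", 1, "a"))
```

When the addresses differ, both sides describe the same heap. When they coincide, the left-hand side is satisfied by a single cell held with full permission, while the right-hand side has no model at all, since two whole pointers to the same cell cannot be combined.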

The second difficulty is that although recombining single pointers is straightforward, as indicated by Eq. (1), recombining the shares of arbitrary formulae is challenging. E.g., $A^{0.5} \circledast A^{0.5} \not\models A$, as shown by the counterexample

$$(x \mapsto 1 \lor y \mapsto 2)^{0.5} \circledast (x \mapsto 1 \lor y \mapsto 2)^{0.5} \not\models x \mapsto 1 \lor y \mapsto 2.$$

The LHS can be satisfied by a heap with a 0.5-share of x and a 0.5-share of y, whereas the RHS requires a full (1) share of either x or y.
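This counterexample can be checked mechanically by enumerating models. The Python sketch below is an illustration under stated assumptions: a formula is represented simply by the list of heaps that satisfy it, with distinct addresses chosen for x and y, and the helper names `share` and `star` are hypothetical.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
X, Y = "x", "y"  # two distinct addresses


def compose(h1, h2):
    """Weak composition; None if values disagree or permissions exceed 1."""
    result = dict(h1)
    for addr, (value, perm) in h2.items():
        if addr in result:
            old_value, old_perm = result[addr]
            if old_value != value or old_perm + perm > 1:
                return None
            result[addr] = (value, old_perm + perm)
        else:
            result[addr] = (value, perm)
    return result


def share(models, pi):
    """Models of the pi share of a formula: scale every permission by pi."""
    return [{addr: (value, perm * pi) for addr, (value, perm) in heap.items()}
            for heap in models]


def star(models1, models2):
    """Models of the weak conjunction: all defined pairwise compositions."""
    combined = []
    for h1, h2 in product(models1, models2):
        heap = compose(h1, h2)
        if heap is not None:
            combined.append(heap)
    return combined


# Models of the disjunction: a whole pointer at x, or a whole pointer at y.
disjunction = [{X: (1, Fraction(1))}, {Y: (2, Fraction(1))}]

lhs = star(share(disjunction, HALF), share(disjunction, HALF))
mixed = {X: (1, HALF), Y: (2, HALF)}
```

The heap `mixed` arises by taking the x disjunct from one half share and the y disjunct from the other, so it satisfies the left-hand side without satisfying the original disjunction.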

Le et al. [24] address these problems by a combination of the use of *tree shares* (essentially Boolean binary trees) rather than rational numbers as permissions, and semantic restrictions on when the above sorts of permissions reasoning can be applied. For example, recombining permissions ($A^{0.5} \circledast A^{0.5} \models A$) is permitted only when the formula is *precise* in the usual separation logic sense (cf. [28]). The chief drawback with this approach is the need to repeatedly check these side conditions on formulas when reasoning, as well as that said reasoning cannot be performed on imprecise formulas.

Instead, we propose to resolve these difficulties by a different, two-pronged extension to the syntax of the logic. First, we propose that the usual "strong" separating conjunction $*$, which enforces the strict disjointness of memory, *should be retained* in the formalism in addition to the weaker $\circledast$. The stronger $*$ supports entailments such as $A^{0.5} * B^{0.5} \models (A * B)^{0.5}$, which does not hold when $\circledast$ is used instead. Second, we introduce *nominal labels* from hybrid logic (cf. [3,10]) to remember that two copies of a formula have the same origin. We write a nominal $\alpha$ to denote a unique heap, in which case entailments such as $(\alpha \wedge A)^{0.5} \circledast (\alpha \wedge A)^{0.5} \models \alpha \wedge A$ become valid. We remark that labels have been adopted for similar "tracking" purposes in several other separation logic proof systems [10,21,23,25].

The remainder of this paper aims to demonstrate that our proposed extensions are (i) weakly *necessary*, in that expected reasoning patterns fail under the usual formalism, (ii) *correct*, in that they recover the desired logical principles, and (iii) *sufficient* to verify typical concurrent programming patterns that use sharing. Section 2 gives some simple examples that motivate our extensions. Section 3 then formally introduces the syntax and semantics of our extended formalism. In Sect. 4 we show that our logic obeys the logical principles that enable us to reason smoothly with fractional permissions over arbitrary formulas, and in Sect. 5 we give some longer worked examples. Finally, in Sect. 6 we conclude and discuss directions for future work.

#### **2 Motivating Examples**

In this section, we aim to motivate our extensions to separation logic with permissions by showing, firstly, how the failures of the logical principles described in the introduction actually arise in program verification examples and, secondly, how these failures are remedied by our proposed changes.

The overall context of our work is reasoning about concurrent programs that share some data structure or region in memory, which can be described as a formula in the assertion language. If $A$ is such a formula then we write $A^{\pi}$ to denote a "$\pi$ share" of the formula $A$, meaning informally that all of the pointers in the heap memory satisfying $A$ are owned with share $\pi$. The main question then becomes how this notion interacts with the separating conjunction $\circledast$. There are two key desirable logical equivalences:

$$(A \circledast B)^{\pi} \equiv A^{\pi} \circledast B^{\pi} \tag{I}$$

$$A^{\pi \oplus \sigma} \equiv A^{\pi} \circledast A^{\sigma} \tag{\text{II}}$$

Equivalence (I) describes distributing a fractional share over a separating conjunction, whereas equivalence (II) describes combining two pieces of a previously split resource. Both equivalences are true in the left-to-right ($\models$) direction but, as we have seen in the Introduction, false in the right-to-left one. Generally speaking, $\circledast$ is like Humpty Dumpty: easy to break apart, but not so easy to put back together again.

The key to understanding the difficulty is the following equivalence:

$$x \overset{\pi}{\mapsto} a \circledast y \overset{\sigma}{\mapsto} b \equiv (x \overset{\pi}{\mapsto} a \ast y \overset{\sigma}{\mapsto} b) \vee (x = y \wedge a = b \wedge x \overset{\pi \oplus \sigma}{\mapsto} a)$$

In other words, either $x$ and $y$ are not aliased, or they *are* aliased and the permissions combine (the additive operation ⊕ on rational shares is simply normal addition when the sum is ≤ 1 and undefined otherwise). This disjunction undermines the notational economies that have led to separation logic's great successes in scalable verification [11]; in particular, (I) fails because the left disjunct might be true, and (II) fails because the right disjunct might be. At a high level, $\circledast$ is a bit too easy to introduce, and therefore also a bit too hard to eliminate.

#### **2.1 Weak vs. Strong Separation and the Distribution Principle**

One of the challenges of the weak separating conjunction $\circledast$ is that it interacts poorly with inductively defined predicates. Consider porting the usual separation logic definition of a possibly-cyclic linked list segment from $x$ to $y$ from a sequential setting to a concurrent one by a simple substitution of $\circledast$ for ∗:

$$
\mathsf{ls}\ x\ y\ =_{\mathsf{def}}\ (x = y \land \mathsf{emp}) \lor (\exists z.\ x \mapsto z \circledast \mathsf{ls}\ z\ y).
$$

Now consider a simple recursive procedure foo(x,y) that traverses a linked list segment from x to y:

```
foo(x,y) { if x=y then return; else foo([x],y); }
```
It is easy to see that foo leaves the list segment unchanged, and therefore satisfies the following Hoare triple:

```
{(ls x y)^0.5} foo(x,y); {(ls x y)^0.5}.
```
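
The claim that `foo` only reads the list segment can be exercised on a concrete heap. The following is a minimal sketch, not the paper's programming language: it assumes the heap is modelled as a Python dictionary from locations to their contents, so that the dereference `[x]` becomes `heap[x]`; the locations 1 to 4 are hypothetical.

```python
def foo(heap, x, y):
    """Follow next-pointers from x until reaching y; never writes to heap."""
    if x == y:
        return
    foo(heap, heap[x], y)  # heap[x] plays the role of the dereference [x]


# A three-cell list segment 1 -> 2 -> 3 -> 4, ending at location 4.
heap = {1: 2, 2: 3, 3: 4}
snapshot = dict(heap)
foo(heap, 1, 4)
unchanged = heap == snapshot
```

Because the traversal performs no writes, `unchanged` holds for any list segment, which is why a read-only 0.5-share is enough for the specification.
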
The intuitive proof of this fact would run approximately as follows:

```
{(ls x y)^0.5} foo(x,y) {
    if x=y then return;   {(ls x y)^0.5}
    else                  {x ≠ y ∧ (x ↦ z ⊛ ls z y)^0.5}
                          {x ↦^0.5 z ⊛ (ls z y)^0.5}
    foo([x],y);           {x ↦^0.5 z ⊛ (ls z y)^0.5}
                      ×   {(x ↦ z ⊛ ls z y)^0.5}
                          {(ls x y)^0.5}
} {(ls x y)^0.5}
```
However, because of the use of $\circledast$, the highlighted inference step (marked ×) is not sound:

$$x \stackrel{0.5}{\mapsto} z \circledast (\mathsf{ls}\, z\, y)^{0.5} \not\models (x \mapsto z \circledast \mathsf{ls}\, z\, y)^{0.5}. \tag{2}$$

To see this, consider a heap with the following structure, viewed in two ways:

$$x \stackrel{0.5}{\mapsto} z \circledast z \stackrel{0.5}{\mapsto} x \circledast x \stackrel{0.5}{\mapsto} z \quad = \quad x \mapsto z \circledast z \stackrel{0.5}{\mapsto} x$$

This heap satisfies the LHS of the entailment in (2), as it is the $\circledast$-composition of a 0.5-share of $x \mapsto z$ and a 0.5-share of $\mathsf{ls}\, z\, z$, a cyclic list segment from $z$ back to itself (note that here $z = y$). However, it does not satisfy the RHS, since it is not a 0.5-share of the $\circledast$-composition of $x \mapsto z$ with $\mathsf{ls}\, z\, z$, which would require the pointer to be disjoint from the list segment.
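
The countermodel can be replayed concretely. The sketch below assumes the rational permission model, represents a p-heap as a dictionary from locations to (value, permission) pairs, and picks the hypothetical locations 1 and 2 for $x$ and $z$; the helper names are illustrative only.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
X, Z = 1, 2  # two distinct locations standing for x and z (here y = z)


def weak_compose(h1, h2):
    """Union of p-heaps, adding permissions on overlap; None if undefined."""
    out = dict(h1)
    for loc, (val, perm) in h2.items():
        if loc in out:
            val1, perm1 = out[loc]
            if val1 != val or perm1 + perm > 1:
                return None  # clashing contents or permission overflow
            out[loc] = (val, perm1 + perm)
        else:
            out[loc] = (val, perm)
    return out


def is_share_of_some_heap(h, pi):
    """With rational permissions, h = pi . h' for some p-heap h' exactly
    when every permission occurring in h is at most pi."""
    return all(perm <= pi for _, perm in h.values())


pointer_half = {X: (Z, HALF)}              # 0.5-share of x |-> z
cycle_half = {Z: (X, HALF), X: (Z, HALF)}  # 0.5-share of the cyclic ls z z
h = weak_compose(pointer_half, cycle_half)
rhs_possible = is_share_of_some_heap(h, HALF)
```

The composite `h` owns the cell at $x$ with full permission, so it cannot be a 0.5-share of any heap at all, and in particular not of one satisfying the formula inside the RHS of (2).
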

The underlying reason for the failure of this example is that, in going from $(x \mapsto z \circledast \mathsf{ls}\, z\, z)^{0.5}$ to $x \stackrel{0.5}{\mapsto} z \circledast (\mathsf{ls}\, z\, z)^{0.5}$, we have lost the information that the pointer and the list segment are actually disjoint. This is reflected in the general failure of the distribution principle $A^{\pi} \circledast B^{\pi} \models (A \circledast B)^{\pi}$, of which the above is just one instance. Accordingly, our proposal is that the "strong" separating conjunction ∗ from standard separation logic, which forces disjointness of the heaps satisfying its conjuncts, should *also* be retained in the logic alongside $\circledast$, on the grounds that (I) *is* true for the stronger connective:

$$(A \ast B)^{\pi} \equiv A^{\pi} \ast B^{\pi}. \tag{3}$$

If we then define our list segments using ∗ in the traditional way, namely

$$\mathsf{ls}\ x\ y\ =_{\text{def}}\ (x = y \land \mathsf{emp}) \lor (\exists z.\ x \mapsto z \ast \mathsf{ls}\ z\ y),$$

then we can observe that this second definition of ls is identical to the first on permission-free formulas, since $\circledast$ and ∗ coincide in that case. However, when we replay the verification proof above with the new definition of ls, every $\circledast$ in the proof above becomes a ∗, and the proof then becomes sound. Nevertheless, we can still use $\circledast$ to describe permission-decomposition of list segments at a higher level; e.g., $\mathsf{ls}\, x\, y$ can still be decomposed as $(\mathsf{ls}\, x\, y)^{0.5} \circledast (\mathsf{ls}\, x\, y)^{0.5}$.

#### **2.2 Nominal Labelling and the Combination Principle**

Unfortunately, even when we use the strong separating conjunction ∗ to define list segments ls, a further difficulty still remains. Consider a simple concurrent program that runs two copies of foo in parallel on the same list segment:

$$\mathsf{foo(x,y)}; \ || \ \mathsf{foo(x,y)};$$

Since foo only reads from its input list segment, and satisfies the specification $\{(\mathsf{ls}\, x\, y)^{0.5}\}$ foo(x,y); $\{(\mathsf{ls}\, x\, y)^{0.5}\}$, this program satisfies the specification

$$\{\mathsf{ls}\, x\, y\}\ \mathsf{foo(x,y)};\ ||\ \mathsf{foo(x,y)};\ \{\mathsf{ls}\, x\, y\}.$$

Now consider constructing a proof of this specification in CSL. First we view the list segment $\mathsf{ls}\, x\, y$ as the $\circledast$-composition of two read-only copies, with permission 0.5 each; then we use CSL's concurrency rule (see Sect. 1) to compose the specifications of the two threads; last we recombine the two read-only copies to obtain the original list segment. The proof diagram is as follows:

$$\begin{array}{c}
\{\mathsf{ls}\, x\, y\} \\
\{(\mathsf{ls}\, x\, y)^{0.5} \circledast (\mathsf{ls}\, x\, y)^{0.5}\} \\
\begin{array}{c||c}
\{(\mathsf{ls}\, x\, y)^{0.5}\} & \{(\mathsf{ls}\, x\, y)^{0.5}\} \\
\mathsf{foo(x,y)}; & \mathsf{foo(x,y)}; \\
\{(\mathsf{ls}\, x\, y)^{0.5}\} & \{(\mathsf{ls}\, x\, y)^{0.5}\}
\end{array} \\
\{(\mathsf{ls}\, x\, y)^{0.5} \circledast (\mathsf{ls}\, x\, y)^{0.5}\} \\
\times\ \{\mathsf{ls}\, x\, y\}
\end{array}$$

However, again, the highlighted inference step (marked ×) in this proof is not correct:

$$(\mathsf{ls}\, x\, y)^{0.5} \circledast (\mathsf{ls}\, x\, y)^{0.5} \not\models \mathsf{ls}\, x\, y. \tag{4}$$

A countermodel is a heap with the following structure, again viewed in two ways:

$$(x \stackrel{0.5}{\mapsto} y \ast y \stackrel{0.5}{\mapsto} y) \circledast x \stackrel{0.5}{\mapsto} y \quad = \quad x \mapsto y \circledast y \stackrel{0.5}{\mapsto} y$$

According to the first view of such a heap, it satisfies the LHS of (4), as it is the $\circledast$-composition of two 0.5-shares of $\mathsf{ls}\, x\, y$ (one of two cells, and one of a single cell). However, it does not satisfy $\mathsf{ls}\, x\, y$, since that would require every cell in the heap to be owned with permission 1.

As in our previous example, the reason for the failure of this example is that we have lost information. In going from $\mathsf{ls}\, x\, y$ to $(\mathsf{ls}\, x\, y)^{0.5} \circledast (\mathsf{ls}\, x\, y)^{0.5}$, we have forgotten that the two formulas $(\mathsf{ls}\, x\, y)^{0.5}$ are in fact *copies of the same region*. For formulas $A$ that are *precise* in that they uniquely describe part of any given heap [12], e.g. formulas $x \mapsto a$, this loss of information does not happen and we do have $A^{0.5} \circledast A^{0.5} \models A$; but for non-precise formulas such as $\mathsf{ls}\, x\, y$, this principle fails.

However, we regard this primarily as a technical shortcoming of the formalism, rather than a failure of our intuition. It *ought* to be true that we can take any region of memory, split it into two read-only copies, and then later merge the two copies to re-obtain the original region. Were we conducting the above proof on pen and paper, we would very likely explain the difficulty away by adopting some kind of labelling convention, allowing us to remember that two formulas have been obtained from the same memory region by dividing permissions.

In fact, that is almost exactly our proposed remedy to the situation. We introduce *nominals*, or *labels*, from hybrid logic, where a nominal $\alpha$ is interpreted as denoting a unique heap. Any formula of the form $\alpha \wedge A$ is then precise (in the above sense), and so obeys the combination principle

$$(\alpha \wedge A)^{\pi} \circledast (\alpha \wedge A)^{\sigma} \models (\alpha \wedge A)^{\sigma \oplus \pi}, \tag{5}$$

where ⊕ is addition on permissions. Thus we can repair the faulty CSL proof above by replacing every instance of the formula $\mathsf{ls}\, x\, y$ by the "labelled" formula $\alpha \wedge \mathsf{ls}\, x\, y$ (and adding an initial step in which we introduce the fresh label $\alpha$).
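
Both the countermodel to (4) and the effect of labelling can be checked on small heaps. The following sketch assumes rational permissions, p-heaps as dictionaries from locations to (value, permission) pairs, and hypothetical locations 1 and 2 for $x$ and $y$; `is_ls` is an illustrative checker for the ∗-based definition of ls, not part of the paper.

```python
from fractions import Fraction

HALF, ONE = Fraction(1, 2), Fraction(1)
X, Y = 1, 2  # distinct locations standing for x and y


def scale(pi, h):
    """pi . h: multiply every permission in h by pi."""
    return {loc: (val, pi * perm) for loc, (val, perm) in h.items()}


def weak_compose(h1, h2):
    """Union of p-heaps, adding permissions on overlap; None if undefined."""
    out = dict(h1)
    for loc, (val, perm) in h2.items():
        if loc in out:
            val1, perm1 = out[loc]
            if val1 != val or perm1 + perm > 1:
                return None
            out[loc] = (val, perm1 + perm)
        else:
            out[loc] = (val, perm)
    return out


def is_ls(h, x, y):
    """Does h satisfy ls x y (defined with *), with full permissions?"""
    if x == y and not h:
        return True
    if x not in h or h[x][1] != 1:
        return False
    rest = {loc: cell for loc, cell in h.items() if loc != x}
    return is_ls(rest, h[x][0], y)


one_cell = {X: (Y, ONE)}               # x |-> y
two_cells = {X: (Y, ONE), Y: (Y, ONE)}  # x |-> y * y |-> y

# Unlabelled: the two half-shares come from different models of ls x y.
unlabelled = weak_compose(scale(HALF, two_cells), scale(HALF, one_cell))
# Labelled: both half-shares come from the single heap denoted by alpha.
labelled = weak_compose(scale(HALF, two_cells), scale(HALF, two_cells))
```

Here `is_ls(unlabelled, X, Y)` fails because the cell at $y$ is owned only with permission 0.5, whereas `labelled` is exactly `two_cells` again, as the combination principle (5) predicts.
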

#### **2.3 The Jump Modality**

However, this is not quite the end of the story. Readers may have noticed that replacing $\mathsf{ls}\, x\, y$ by the "labelled" version $\alpha \wedge \mathsf{ls}\, x\, y$ also entails establishing a slightly stronger specification for the function foo, namely:

$$\{ (\alpha \land \mathsf{ls} \, x \, y)^{0.5} \} \, \mathsf{foo} \, \mathsf{(x, y)} \; ; \, \{ (\alpha \land \mathsf{ls} \, x \, y)^{0.5} \} \, .$$

This introduces an extra difficulty in the proof (cf. Sect. 2.1); at the recursive call to foo([x],y), the precondition now becomes $\alpha^{0.5} \wedge (x \stackrel{0.5}{\mapsto} z \ast (\mathsf{ls}\, z\, y)^{0.5})$, which means that we cannot apply separation logic's *frame rule* [32] to the pointer formula without first weakening away the label-share $\alpha^{0.5}$.

For this reason, we shall also employ hybrid logic's "jump" modality $@$, where the formula $@_{\alpha} A$ means that $A$ is true of the heap denoted by the label $\alpha$. In the above, we can introduce labels $\beta$ and $\gamma$ for the list components $x \mapsto z$ and $\mathsf{ls}\, z\, y$ respectively, whereby we can represent the decomposition of the list by the assertion $@_{\alpha}(\beta \ast \gamma)$. Since this is a *pure* assertion that does not depend on the heap, it can be safely maintained when applying the frame rule, and used after the function call to restore the label $\alpha$, using the easily verifiable fact that

$$
@_{\alpha}(\beta \ast \gamma) \wedge (\beta \ast \gamma) \models \alpha.
$$

Similar reasoning over labelled decompositions of data structures is seemingly necessary whenever treating recursion; we return to it in more detail in Sect. 5.

#### **3 Separation Logic with Labels and Permissions (SLLP)**

Following the motivation given in the previous section, here we give the syntax and semantics of a separation logic, SLLP, with permissions over arbitrary formulas, making use of both strong *and* weak separating conjunctions, and nominal labels (from hybrid logic [3,10]). First, we define a suitable notion of permissions and associated operations.

**Definition 3.1.** *A* permissions algebra *is a tuple* $\langle \mathsf{Perm}, \oplus, \otimes, 1 \rangle$*, where* Perm *is a set (of "permissions"),* 1 ∈ Perm *is called the* write permission*, and* ⊕ *and* ⊗ *are respectively partial and total binary functions on* Perm*, satisfying associativity, commutativity, cancellativity and the following additional axioms:*


The most common example of a permissions algebra is the Boyland fractional permission model $\langle (0, 1] \cap \mathbb{Q}, \oplus, \times, 1 \rangle$, where permissions are rational numbers in $(0, 1]$, $\times$ is standard multiplication, and $\oplus$ is standard addition, with $p \oplus p'$ undefined if $p + p' > 1$. From now on, we assume a fixed but arbitrary permissions algebra.
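
As an illustration, the Boyland model can be sketched in a few lines using exact rational arithmetic. The function names are hypothetical, and `None` is an assumed encoding of "undefined"; the final check instantiates the distributivity law that the proofs in Sect. 4 refer to as (left-dist).

```python
from fractions import Fraction


def perm_add(p, q):
    """Partial addition on (0, 1]: returns None when p + q exceeds 1."""
    total = p + q
    return total if total <= 1 else None


def perm_mul(p, q):
    """Total multiplication on (0, 1]."""
    return p * q


WRITE = Fraction(1)  # the write permission
half, third = Fraction(1, 2), Fraction(1, 3)

whole = perm_add(half, half)        # two read shares recombine
overflow = perm_add(WRITE, third)   # nothing can be added to 1

# One instance of (pi1 + pi2) x pi = (pi1 x pi) + (pi2 x pi).
lhs = perm_mul(perm_add(half, third), half)
rhs = perm_add(perm_mul(half, half), perm_mul(third, half))
```

Using `Fraction` rather than floating point keeps equalities such as `lhs == rhs` exact, which matters when permissions are repeatedly split and rejoined.
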

With the permissions structure in place, we can now define the syntax of our logic. We assume disjoint, countably infinite sets Var of variables, Pred of predicate symbols (with associated arities) and Label of labels.

**Definition 3.2.** *We define* formulas *of* SLLP *by the grammar:*

$$\begin{array}{rcll} A & ::= & x = y \mid \neg A \mid A \land A \mid A \lor A \mid A \to A & \\ & \mid & \mathsf{emp} \mid x \mapsto y \mid P(\mathbf{x}) \mid A \ast A \mid A \circledast A \mid A \mathrel{-\!\!\ast} A \mid A \mathrel{-\!\!\circledast} A & \text{(spatial)} \\ & \mid & A^{\pi} \mid \alpha \mid @_{\alpha} A & \end{array}$$

*where* $x, y$ *range over* Var*,* $\pi$ *ranges over* Perm*,* $P$ *ranges over* Pred*,* $\alpha$ *ranges over* Label *and* $\mathbf{x}$ *ranges over tuples of variables of length matching the arity of the predicate symbol* $P$*. We write* $x \stackrel{\pi}{\mapsto} y$ *for* $(x \mapsto y)^{\pi}$*, and* $x \neq y$ *for* $\neg(x = y)$*.*

The "magic wands" $-\!\!\ast$ and $-\!\!\circledast$ are the implications adjoint to ∗ and $\circledast$, as usual in separation logic. We include them for completeness, but we use $-\!\!\ast$ only for fairly complex examples (see Sect. 5.3) and in fact do not use $-\!\!\circledast$ at all.

*Semantics.* We interpret formulas in a standard model of stacks and heaps-with-permissions (cf. [4]), except that our models also incorporate a valuation of nominal labels. We assume an infinite set Val of *values* of which an infinite subset Loc ⊂ Val are considered addressable *locations*. A *stack* is as usual a map $s : \mathsf{Var} \to \mathsf{Val}$. A *heap-with-permissions*, which we call a *p-heap* for short, is a finite partial function $h : \mathsf{Loc} \rightharpoonup_{\mathrm{fin}} \mathsf{Val} \times \mathsf{Perm}$ from locations to value-permission pairs. We write $\text{dom}(h)$ for the *domain* of $h$, i.e. the set of locations on which $h$ is defined. Two p-heaps $h_1$ and $h_2$ are called *disjoint* if $\text{dom}(h_1) \cap \text{dom}(h_2) = \emptyset$, and *compatible* if, for all $\ell \in \text{dom}(h_1) \cap \text{dom}(h_2)$, we have $h_1(\ell) = (v, \pi_1)$ and $h_2(\ell) = (v, \pi_2)$ and $\pi_1 \oplus \pi_2$ is defined. (Thus, trivially, disjoint heaps are also compatible.) We define the multiplication $\pi \cdot h$ of a p-heap $h$ by permission $\pi$ by extending ⊗ pointwise:

$$(\pi \cdot h)(\ell) = (v, \pi \otimes \pi') \quad \Leftrightarrow \quad h(\ell) = (v, \pi').$$

We also assume that each predicate symbol $P$ of arity $k$ is given a fixed interpretation $[\![P]\!] \in \mathcal{P}(\mathsf{Val}^k \times \mathsf{PHeaps})$, where PHeaps is the set of all p-heaps. Here we allow an essentially free interpretation of predicate symbols, but they could also be given by a suitable inductive definition schema, as is done in many papers on separation logic (e.g. [7,8]). Finally, a *valuation* is a function $\rho : \mathsf{Label} \to \mathsf{PHeaps}$ assigning a single p-heap $\rho(\alpha)$ to each label $\alpha$.

**Definition 3.3 (Strong and weak heap composition).** *The* strong composition $h_1 \circ h_2$ *of two disjoint p-heaps* $h_1$ *and* $h_2$ *is defined as their union:*

$$(h\_1 \circ h\_2)(\ell) = \begin{cases} h\_1(\ell) & \text{if } \ell \notin \text{dom}\,(h\_2), \\ h\_2(\ell) & \text{if } \ell \notin \text{dom}\,(h\_1). \end{cases}$$


**Fig. 1.** Definition of the satisfaction relation s, h, ρ |= A for SLLP.

*If* $h_1$ *and* $h_2$ *are not disjoint then* $h_1 \circ h_2$ *is undefined.*

*The* weak composition $h_1 \mathbin{\overline{\circ}} h_2$ *of two compatible p-heaps* $h_1$ *and* $h_2$ *is defined as their union, adding permissions at overlapping locations:*

$$(h_1 \mathbin{\overline{\circ}} h_2)(\ell) = \begin{cases} (v, \pi_1 \oplus \pi_2) & \text{if } h_1(\ell) = (v, \pi_1) \text{ and } h_2(\ell) = (v, \pi_2), \\ h_1(\ell) & \text{if } \ell \notin \text{dom}\,(h_2), \\ h_2(\ell) & \text{if } \ell \notin \text{dom}\,(h_1). \end{cases}$$

*If* $h_1$ *and* $h_2$ *are not compatible then* $h_1 \mathbin{\overline{\circ}} h_2$ *is undefined.*
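
The two compositions of Definition 3.3 can be sketched as follows, assuming the rational permission model and p-heaps represented as dictionaries from locations to (value, permission) pairs. The function names and the use of `None` for "undefined" are choices made for this illustration.

```python
from fractions import Fraction


def strong_compose(h1, h2):
    """Strong composition: union of disjoint p-heaps, else None."""
    if h1.keys() & h2.keys():
        return None
    return {**h1, **h2}


def weak_compose(h1, h2):
    """Weak composition: union of compatible p-heaps, adding permissions
    at overlapping locations; None when the heaps are incompatible."""
    out = dict(h1)
    for loc, (val, perm) in h2.items():
        if loc in out:
            val1, perm1 = out[loc]
            if val1 != val or perm1 + perm > 1:
                return None
            out[loc] = (val, perm1 + perm)
        else:
            out[loc] = (val, perm)
    return out


HALF, ONE = Fraction(1, 2), Fraction(1)
reader_a = {1: ("a", HALF)}
reader_b = {1: ("a", HALF), 2: ("b", ONE)}
writer = {1: ("a", ONE)}

overlap_strong = strong_compose(reader_a, reader_b)  # overlap at location 1
overlap_weak = weak_compose(reader_a, reader_b)      # permissions add up
too_much = weak_compose(writer, reader_a)            # 1 + 0.5 is undefined
disjoint_pair = ({1: ("a", HALF)}, {2: ("b", HALF)})
```

On disjoint heaps the two operations agree, which is the semantic counterpart of being able to weaken ∗ to the weak conjunction.
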

**Definition 3.4.** *The satisfaction relation* $s, h, \rho \models A$*, where* $s$ *is a stack,* $h$ *a p-heap,* $\rho$ *a valuation and* $A$ *a formula, is defined by structural induction on* $A$ *in Fig. 1. We write the* entailment $A \models B$*, where* $A$ *and* $B$ *are formulas, to mean that whenever* $s, h, \rho \models A$ *we also have* $s, h, \rho \models B$*. We write the* equivalence $A \equiv B$ *to mean that* $A \models B$ *and* $B \models A$*.*

#### **4 Logical Principles of SLLP**

In this section, we establish the main logical entailments and equivalences of SLLP that capture the various interactions between the separating conjunctions $\circledast$ and ∗, permissions and labels. As well as being of interest in their own right, many of these principles will be essential in treating the practical verification examples in Sect. 5. In particular, the permission distribution principle for ∗ (cf. (3), Sect. 2) is given in Lemma 4.3, and the permission combination principle for labelled formulas (cf. (5), Sect. 2) is given in Lemma 4.4.

**Proposition 4.1.** *The following equivalences all hold in* SLLP*:*

$$\begin{array}{c} A \circledast B \equiv B \circledast A\\ A \circledast (B \circledast C) \equiv (A \circledast B) \circledast C\\ A \circledast \mathsf{emp} \equiv A \end{array} \qquad \begin{array}{c} A \ast B \equiv B \ast A\\ A \ast (B \ast C) \equiv (A \ast B) \ast C\\ A \ast \mathsf{emp} \equiv A \end{array}$$

*Additionally, the following residuation laws hold:*

$$A \models B \mathrel{-\!\!\circledast} C \iff A \circledast B \models C \quad \text{and} \quad A \models B \mathrel{-\!\!\ast} C \iff A \ast B \models C.$$

*In addition, we can always weaken* ∗ *to* $\circledast$*:* $A \ast B \models A \circledast B$*.*

Next, we establish an additional connection between the two separating conjunctions $\circledast$ and ∗.

**Lemma 4.2 ($\circledast$/∗ distribution).** *For all formulas* $A$*,* $B$*,* $C$ *and* $D$*,*

$$(A \circledast B) \ast (C \circledast D) \models (A \ast C) \circledast (B \ast D). \tag{$\circledast/\ast$}$$

*Proof.* First we show a corresponding model-theoretic property: for any p-heaps $h_1, h_2, h_3$ and $h_4$ such that $(h_1 \mathbin{\overline{\circ}} h_2) \circ (h_3 \mathbin{\overline{\circ}} h_4)$ is defined,

$$(h_1 \mathbin{\overline{\circ}} h_2) \circ (h_3 \mathbin{\overline{\circ}} h_4) = (h_1 \circ h_3) \mathbin{\overline{\circ}} (h_2 \circ h_4). \tag{6}$$

Since $(h_1 \mathbin{\overline{\circ}} h_2) \circ (h_3 \mathbin{\overline{\circ}} h_4)$ is defined by assumption, we have that $h_1 \mathbin{\overline{\circ}} h_2$ and $h_3 \mathbin{\overline{\circ}} h_4$ are disjoint and that $h_1$ and $h_2$, as well as $h_3$ and $h_4$, are compatible. In particular, $h_1$ and $h_3$ are disjoint, so $h_1 \circ h_3$ is defined; the same reasoning applies to $h_2$ and $h_4$. Moreover, since $h_1$ and $h_2$ are compatible, as are $h_3$ and $h_4$, while $h_1$ is disjoint from $h_4$ and $h_3$ from $h_2$, the heaps $h_1 \circ h_3$ and $h_2 \circ h_4$ must be compatible and so $(h_1 \circ h_3) \mathbin{\overline{\circ}} (h_2 \circ h_4)$ is defined.

Now, writing $h$ for $(h_1 \mathbin{\overline{\circ}} h_2) \circ (h_3 \mathbin{\overline{\circ}} h_4)$, and letting $\ell \in \text{dom}(h)$, we have

$$h(\ell) = \begin{cases} h\_1(\ell) & \text{if } \ell \notin \text{dom}\left(h\_3\right), \ell \notin \text{dom}\left(h\_4\right) \text{ and } \ell \notin \text{dom}\left(h\_2\right) \\ h\_2(\ell) & \text{if } \ell \notin \text{dom}\left(h\_3\right), \ell \notin \text{dom}\left(h\_4\right) \text{ and } \ell \notin \text{dom}\left(h\_1\right) \\ \left(v, \pi\_1 \oplus \pi\_2\right) & \text{if } \ell \notin \text{dom}\left(h\_3\right), \ell \notin \text{dom}\left(h\_4\right) \text{ and } h\_1(\ell) = \left(v, \pi\_1\right) \\ & \text{and } h\_2(\ell) = \left(v, \pi\_2\right) \\ h\_3(\ell) & \text{if } \ell \notin \text{dom}\left(h\_1\right), \ell \notin \text{dom}\left(h\_2\right) \text{ and } \ell \notin \text{dom}\left(h\_4\right) \\ h\_4(\ell) & \text{if } \ell \notin \text{dom}\left(h\_1\right), \ell \notin \text{dom}\left(h\_2\right) \text{ and } \ell \notin \text{dom}\left(h\_3\right) \\ \left(u, \pi\_3 \oplus \pi\_4\right) & \text{if } \ell \notin \text{dom}\left(h\_1\right), \ell \notin \text{dom}\left(h\_2\right) \text{ and } h\_3(\ell) = \left(u, \pi\_3\right) \\ & \text{and } h\_4(\ell) = \left(u, \pi\_4\right) \end{cases}$$

We can merge the first and fourth cases by noting that $h(\ell) = (h_1 \circ h_3)(\ell)$ if $\ell \notin \text{dom}(h_2 \circ h_4)$, and similarly for the second and fifth cases. We can also rewrite the two remaining cases by observing that $\ell \notin \text{dom}(h_3)$ implies $h_1(\ell) = (h_1 \circ h_3)(\ell)$, and so on, resulting in

$$h(\ell) = \begin{cases} (h_1 \circ h_3)(\ell) & \text{if } \ell \notin \text{dom}\,(h_2 \circ h_4) \\ (h_2 \circ h_4)(\ell) & \text{if } \ell \notin \text{dom}\,(h_1 \circ h_3) \\ (w, \sigma_1 \oplus \sigma_2) & \text{if } (h_1 \circ h_3)(\ell) = (w, \sigma_1) \text{ and } (h_2 \circ h_4)(\ell) = (w, \sigma_2) \end{cases}$$

$$= ((h_1 \circ h_3) \mathbin{\overline{\circ}} (h_2 \circ h_4))(\ell).$$

Now we show the main result. Suppose $s, h, \rho \models (A \circledast B) \ast (C \circledast D)$. This gives us $h = (h_1 \mathbin{\overline{\circ}} h_2) \circ (h_3 \mathbin{\overline{\circ}} h_4)$, where $s, h_1, \rho \models A$ and $s, h_2, \rho \models B$ and $s, h_3, \rho \models C$ and $s, h_4, \rho \models D$. By Eq. (6), we have $h = (h_1 \circ h_3) \mathbin{\overline{\circ}} (h_2 \circ h_4)$, which gives us exactly that $s, h, \rho \models (A \ast C) \circledast (B \ast D)$, as required.
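
Equation (6), and the reason Lemma 4.2 is only an entailment, can be observed on small examples. This sketch assumes rational permissions and p-heaps as dictionaries from locations to (value, permission) pairs; the heaps are hypothetical, `None` encodes "undefined", and `None` arguments are propagated.

```python
from fractions import Fraction


def strong_compose(h1, h2):
    """Union of disjoint p-heaps; None if undefined."""
    if h1 is None or h2 is None or h1.keys() & h2.keys():
        return None
    return {**h1, **h2}


def weak_compose(h1, h2):
    """Union of compatible p-heaps, adding permissions; None if undefined."""
    if h1 is None or h2 is None:
        return None
    out = dict(h1)
    for loc, (val, perm) in h2.items():
        if loc in out:
            val1, perm1 = out[loc]
            if val1 != val or perm1 + perm > 1:
                return None
            out[loc] = (val, perm1 + perm)
        else:
            out[loc] = (val, perm)
    return out


HALF, ONE = Fraction(1, 2), Fraction(1)

# Left-hand side of (6) defined: both sides then agree.
h1, h2 = {1: ("a", HALF)}, {1: ("a", HALF)}
h3, h4 = {2: ("b", HALF)}, {2: ("b", HALF), 3: ("c", ONE)}
lhs = strong_compose(weak_compose(h1, h2), weak_compose(h3, h4))
rhs = weak_compose(strong_compose(h1, h3), strong_compose(h2, h4))

# Right-hand side defined but left-hand side undefined: no converse.
g1, g2 = {1: ("a", HALF)}, {2: ("b", HALF)}
g3, g4 = {2: ("b", HALF)}, {1: ("a", HALF)}
lhs_bad = strong_compose(weak_compose(g1, g2), weak_compose(g3, g4))
rhs_ok = weak_compose(strong_compose(g1, g3), strong_compose(g2, g4))
```

In the second example the inner weak compositions overlap at both locations, so the outer strong composition fails even though the regrouped heaps combine without difficulty.
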

Next, we establish principles for distributing permissions over various connectives, in particular over the strong ∗, stated earlier as (3) in Sect. 2.

**Lemma 4.3 (Permission distribution).** *The following equivalences hold for all formulas* A *and* B*, and permissions* π *and* σ*:*

$$\left(A^{\sigma}\right)^{\pi} \equiv A^{\sigma \otimes \pi} \tag{\otimes}$$

$$(A \lor B)^{\pi} \equiv A^{\pi} \lor B^{\pi} \tag{\vee^{\pi}}$$

$$(A \land B)^{\pi} \equiv A^{\pi} \land B^{\pi} \tag{\wedge^{\pi}}$$

$$(A \ast B)^{\pi} \equiv A^{\pi} \ast B^{\pi} \tag{\ast^{\pi}}$$

*Proof.* We just show the most interesting case, ($\ast^{\pi}$). First of all, we establish a corresponding model-theoretic property: for any permission $\pi$ and disjoint p-heaps $h_1$ and $h_2$, meaning $h_1 \circ h_2$ is defined,

$$
\pi \cdot (h\_1 \circ h\_2) = (\pi \cdot h\_1) \circ (\pi \cdot h\_2). \tag{7}
$$

To see this, we first observe that for any $\ell \in \text{dom}(h_1 \circ h_2)$, we have that either $\ell \in \text{dom}(h_1)$ or $\ell \in \text{dom}(h_2)$. We just show the case $\ell \in \text{dom}(h_1)$, since the other is symmetric. Writing $h_1(\ell) = (v_1, \pi_1)$, and using the fact that $\ell \notin \text{dom}(h_2)$,

$$
\pi \cdot (h\_1 \circ h\_2)(\ell) = (v\_1, \pi \otimes \pi\_1) = (\pi \cdot h\_1)(\ell) = ((\pi \cdot h\_1) \circ (\pi \cdot h\_2))(\ell).
$$

Now for the main result, let s, h and ρ be given. We have

$$\begin{array}{cll} & s, h, \rho \models (A \ast B)^{\pi} & \\ \Leftrightarrow & h = \pi \cdot h' \text{ and } s, h', \rho \models A \ast B & \\ \Leftrightarrow & h = \pi \cdot h' \text{ and } h' = h_1 \circ h_2 \text{ and } s, h_1, \rho \models A \text{ and } s, h_2, \rho \models B & \\ \Leftrightarrow & h = \pi \cdot (h_1 \circ h_2) \text{ and } s, h_1, \rho \models A \text{ and } s, h_2, \rho \models B & \\ \Leftrightarrow & h = (\pi \cdot h_1) \circ (\pi \cdot h_2) \text{ and } s, h_1, \rho \models A \text{ and } s, h_2, \rho \models B & \text{by (7)} \\ \Leftrightarrow & h = h'_1 \circ h'_2 \text{ and } s, h'_1, \rho \models A^{\pi} \text{ and } s, h'_2, \rho \models B^{\pi} & \\ \Leftrightarrow & s, h, \rho \models A^{\pi} \ast B^{\pi}. & \end{array}$$

We now establish the main principles for dividing and combining permissions formulas using $\circledast$. As foreshadowed in Sect. 2, the combination principle holds only for formulas that are conjoined with a nominal label (cf. Eq. (5)).

**Lemma 4.4 (Permission division and combination).** *For all formulas* $A$*, nominals* $\alpha$*, and permissions* $\pi_1, \pi_2$ *such that* $\pi_1 \oplus \pi_2$ *is defined:*

$$A^{\pi_1 \oplus \pi_2} \models A^{\pi_1} \circledast A^{\pi_2} \tag{Split $\circledast$}$$

$$(\alpha \wedge A)^{\pi_1} \circledast (\alpha \wedge A)^{\pi_2} \models (\alpha \wedge A)^{\pi_1 \oplus \pi_2} \tag{Join $\circledast$}$$

*Proof.* **Case (Split $\circledast$):** Suppose that $s, h, \rho \models A^{\pi_1 \oplus \pi_2}$. We have $h = (\pi_1 \oplus \pi_2) \cdot h'$, where $s, h', \rho \models A$. That is, for any $\ell \in \text{dom}(h)$, we have $h'(\ell) = (v, \pi)$ say and, using the permissions algebra axiom (left-dist) from Definition 3.1,

$$h(\ell) = (v, (\pi\_1 \oplus \pi\_2) \otimes \pi) = (v, (\pi\_1 \otimes \pi) \oplus (\pi\_2 \otimes \pi)).$$

Now we define p-heaps $h_1$ and $h_2$, both with domain exactly $\text{dom}(h)$, by

$$h\_i(\ell) = (v, \pi\_i \otimes \pi) \iff h'(\ell) = (v, \pi) \qquad \text{for } i \in \{1, 2\}.$$

By construction, $h_1 = \pi_1 \cdot h'$ and $h_2 = \pi_2 \cdot h'$. Since $s, h', \rho \models A$, this gives us $s, h_1, \rho \models A^{\pi_1}$ and $s, h_2, \rho \models A^{\pi_2}$. Furthermore, also by construction, $h_1$ and $h_2$ are compatible, with $h = h_1 \mathbin{\overline{\circ}} h_2$. Thus $s, h, \rho \models A^{\pi_1} \circledast A^{\pi_2}$, as required.

**Case (Join $\circledast$):** First of all, we show that for any p-heap $h$,

$$(\pi_1 \cdot h) \mathbin{\overline{\circ}} (\pi_2 \cdot h) = (\pi_1 \oplus \pi_2) \cdot h. \tag{8}$$

To see this, we observe that for any $\ell \in \text{dom}(h)$, writing $h(\ell) = (v, \pi)$ say,

$$\begin{array}{cl} & ((\pi_1 \oplus \pi_2) \cdot h)(\ell) \\ = & (v, (\pi_1 \oplus \pi_2) \otimes \pi) \\ = & (v, (\pi_1 \otimes \pi) \oplus (\pi_2 \otimes \pi)) \qquad \text{by (left-dist)} \\ = & (h_1 \mathbin{\overline{\circ}} h_2)(\ell) \text{ where } h_1(\ell) = (v, \pi_1 \otimes \pi) \text{ and } h_2(\ell) = (v, \pi_2 \otimes \pi) \\ = & ((\pi_1 \cdot h) \mathbin{\overline{\circ}} (\pi_2 \cdot h))(\ell). \end{array}$$

Now, for the main result, suppose $s, h, \rho \models (\alpha \wedge A)^{\pi_1} \circledast (\alpha \wedge A)^{\pi_2}$. We have $h = h_1 \mathbin{\overline{\circ}} h_2$ where $s, h_1, \rho \models (\alpha \wedge A)^{\pi_1}$ and $s, h_2, \rho \models (\alpha \wedge A)^{\pi_2}$. That is, $h = (\pi_1 \cdot h'_1) \mathbin{\overline{\circ}} (\pi_2 \cdot h'_2)$, where $s, h'_1, \rho \models \alpha \wedge A$ and $s, h'_2, \rho \models \alpha \wedge A$. Thus $h'_1 = h'_2 = \rho(\alpha)$ and so, by (8), we have $h = (\pi_1 \oplus \pi_2) \cdot h'_1$, where $s, h'_1, \rho \models \alpha \wedge A$. This gives us $s, h, \rho \models (\alpha \wedge A)^{\pi_1 \oplus \pi_2}$, as required.
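
Equation (8) can be sanity-checked numerically. The sketch below assumes rational permissions and p-heaps as dictionaries from locations to (value, permission) pairs; the heap `h` plays the role of $\rho(\alpha)$, and `other` is a hypothetical second heap standing for what can happen when no label ties the two shares together.

```python
from fractions import Fraction


def scale(pi, h):
    """pi . h: multiply every permission in h by pi."""
    return {loc: (val, pi * perm) for loc, (val, perm) in h.items()}


def weak_compose(h1, h2):
    """Union of compatible p-heaps, adding permissions; None if undefined."""
    out = dict(h1)
    for loc, (val, perm) in h2.items():
        if loc in out:
            val1, perm1 = out[loc]
            if val1 != val or perm1 + perm > 1:
                return None
            out[loc] = (val, perm1 + perm)
        else:
            out[loc] = (val, perm)
    return out


p1, p2 = Fraction(1, 3), Fraction(2, 3)
h = {1: ("a", Fraction(1)), 2: ("b", Fraction(1, 2))}

# Both shares taken from the same heap: equation (8).
joined = weak_compose(scale(p1, h), scale(p2, h))
expected = scale(p1 + p2, h)

# Shares taken from two different heaps: not a (p1 + p2)-share of either.
other = {1: ("a", Fraction(1))}
mixed = weak_compose(scale(p1, h), scale(p2, other))
```

Here `joined` coincides with `expected`, whereas `mixed` is a share of neither `h` nor `other`, which is the information that the label in (Join $\circledast$) preserves.
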

Lastly, we state some useful principles for labels and the "jump" modality.

**Lemma 4.5 (Labelling and jump).** *For all formulas* A *and labels* α*,*

$$@_{\alpha} A \wedge \alpha^{\pi} \models A^{\pi} \tag{@ Elim}$$

$$(\alpha \wedge A)^{\pi} \models @_{\alpha} A \tag{@ Intro}$$

$$@_{\alpha}(\beta_1^{\pi} \ast \beta_2^{\sigma}) \wedge (\beta_1^{\pi} \circledast \beta_2^{\sigma}) \models \alpha \wedge (\beta_1^{\pi} \ast \beta_2^{\sigma}) \tag{@/$\ast$/$\circledast$}$$

*Proof.* We just show the case (@/∗/$\circledast$), the others being easy. Suppose $s, h, \rho \models @_{\alpha}(\beta_1^{\pi} \ast \beta_2^{\sigma}) \wedge (\beta_1^{\pi} \circledast \beta_2^{\sigma})$, meaning that $s, \rho(\alpha), \rho \models \beta_1^{\pi} \ast \beta_2^{\sigma}$ and $s, h, \rho \models \beta_1^{\pi} \circledast \beta_2^{\sigma}$. Then we have $\rho(\alpha) = (\pi \cdot \rho(\beta_1)) \circ (\sigma \cdot \rho(\beta_2))$, while $h = (\pi \cdot \rho(\beta_1)) \mathbin{\overline{\circ}} (\sigma \cdot \rho(\beta_2))$. Since $\circ$ is defined only when its arguments are disjoint p-heaps, we obtain that $h = \rho(\alpha) = (\pi \cdot \rho(\beta_1)) \circ (\sigma \cdot \rho(\beta_2))$. Thus $s, h, \rho \models \alpha \wedge (\beta_1^{\pi} \ast \beta_2^{\sigma})$.

$$\frac{\{A_1\}\,C_1\,\{B_1\} \quad \{A_2\}\,C_2\,\{B_2\}}{\{A_1 \circledast A_2\}\,C_1 \parallel C_2\,\{B_1 \circledast B_2\}}\ (\P)\ \text{(Par)} \qquad \frac{\{\alpha \wedge A\}\,C\,\{B\}}{\{A\}\,C\,\{B\}}\ (\S)\ \text{(Label)}$$

$$\frac{\{A\}\,C\,\{B\}}{\{A * F\}\,C\,\{B * F\}}\ (\dagger, \ddagger)\ \text{(Frame $*$)} \qquad \frac{\{A\}\,C\,\{B\}}{\{A \circledast F\}\,C\,\{B \circledast F\}}\ (\dagger)\ \text{(Frame $\circledast$)}$$

$$\begin{aligned} (\P)\ & \text{ModVars}(C_2) \cap \text{FreeVars}(A_1, B_1) = \text{ModVars}(C_1) \cap \text{FreeVars}(A_2, B_2) = \emptyset \\ (\S)\ & \alpha \text{ fresh} \qquad (\dagger)\ \text{ModVars}(C) \cap \text{FreeVars}(F) = \emptyset \end{aligned}$$

**Fig. 2.** The key CSL proof rules used in our examples; not shown are standard rules for consequence, conditionals, load/store, etc. The fresh-labelling rule (Label) and combination of both weak (Frame ⊛) and strong (Frame ∗) frame rules are novel to our approach. We require weak conjunction for the parallel rule (Par).

#### **5 Concurrent Program Verification Examples**

In this section, we demonstrate how SLLP can be used in conjunction with the usual principles of CSL to construct verification proofs of concurrent programs, taking three examples of increasing complexity.

Our examples all operate on *binary trees* in memory, defined as usual in separation logic (again note the use of ∗ rather than ⊛):

$$\mathsf{tree}(x) =_{\text{def}} (x = null \land \mathsf{emp}) \lor (\exists d, l, r.\ x \mapsto (d, l, r) * \mathsf{tree}(l) * \mathsf{tree}(r)).$$

Our proofs employ (a subset of) the standard rules of CSL, the most important being the concurrency rule from the Introduction, the separation logic *frame rules* for both ∗ and ⊛, and a new rule enabling us to introduce fresh labels into the precondition of a triple (similar to the way Hoare logic usually handles existential quantifiers). These key rules are shown in Fig. 2. We simplify our Hoare triple to remove elements to handle function call/return and furthermore omit the presentation of the standard collection of rules for consequence, load, store, if-then-else, assignment, etc.; readers interested in such aspects can consult [1]. Both of our frame rules have the usual side condition on modified program variables. The strong frame rule (Frame ∗) has an additional side condition that will be discussed in Sect. 5.3; until then it is trivially satisfied.

#### **5.1 Parallel Read**

Consider the following program:

```
check(x) {
  if (x == null) { return; }
  read(x) || read(x);
}
```
This is intended to be a straightforward example: we take a tree rooted at x and, if x is non-null, split into two parallel threads that each run the program read on x, whose specification is $\{\alpha^{\pi} \wedge \mathsf{tree}(x)^{\sigma}\}$ read(x) $\{\alpha^{\pi} \wedge \mathsf{tree}(x)^{\sigma}\}$. We prove that check satisfies the specification $\{\mathsf{tree}(x)^{\pi}\}$ check(x) $\{\mathsf{tree}(x)^{\pi}\}$; the verification proof is in Fig. 3. The proof makes use of the basic operations of our theory: labelling, splitting and joining. The example follows precisely these steps, starting by labelling the formula $\mathsf{tree}(x)^{\pi} \wedge x \neq null$ with $\alpha$. The concurrency rule (Par) allows us to put formulas back together after the parallel call, and the two copies $(\alpha \wedge \mathsf{tree}(x)^{\pi})^{0.5}$ that were obtained are glued back together to yield $\mathsf{tree}(x)^{\pi}$, since they have the same label.

**Fig. 3.** Verification proof of program check in Example 5.1.
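The split and join steps used in this example can be illustrated on a toy model of permission heaps. The following C++ sketch rests on stated assumptions rather than on the formal semantics of SLLP: shares are modelled as `double` fractions in (0, 1], and the names `Cell`, `PHeap`, `scale` and `weak_compose` are hypothetical. It shows that scaling the same heap by 0.5 twice and composing the two halves weakly restores the original heap, which is the semantic content of gluing two identically labelled copies back together.

```cpp
#include <cassert>
#include <map>
#include <optional>

// Toy permission heap (illustrative names): address -> (value, share).
struct Cell {
    int value;
    double perm;  // fractional share in (0, 1]
    bool operator==(const Cell& other) const {
        return value == other.value && perm == other.perm;
    }
};
using PHeap = std::map<int, Cell>;

// pi . h : multiply every share in h by pi.
PHeap scale(double pi, const PHeap& h) {
    assert(pi > 0.0 && pi <= 1.0);
    PHeap out;
    for (const auto& [addr, cell] : h) {
        out[addr] = Cell{cell.value, cell.perm * pi};
    }
    return out;
}

// Weak composition: heaps may overlap, but overlapping cells must agree on
// the value and their shares are added. The result is undefined (nullopt)
// when values disagree or a summed share would exceed 1.
std::optional<PHeap> weak_compose(const PHeap& h1, const PHeap& h2) {
    PHeap out = h1;
    for (const auto& [addr, cell] : h2) {
        auto it = out.find(addr);
        if (it == out.end()) {
            out[addr] = cell;
            continue;
        }
        if (it->second.value != cell.value) return std::nullopt;
        double total = it->second.perm + cell.perm;
        if (total > 1.0) return std::nullopt;
        it->second.perm = total;
    }
    return out;
}
```

In this model, `weak_compose(scale(0.5, h), scale(0.5, h))` returns `h` for any heap `h` with full shares. Two half-share heaps that disagree on a stored value do not compose at all, which is why the proof needs the label to know that both copies describe the same heap.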

#### **5.2 Parallel Tree Processing (Le and Hobor [24])**

Consider the following program, which was also employed as an example in [24]:

```
proc(x) {
  if (x == null) { return; }
  print(x->d);       print(x->d);
  proc(x->l);   ||   proc(x->l);
  proc(x->r);        proc(x->r);
}
```
This code takes a tree rooted at x and, if x is non-null, splits into parallel threads that call proc recursively on its left and right branches. We prove, in Fig. 4, that proc satisfies the specification $\{\alpha \wedge \mathsf{tree}(x)^{\pi}\}$ proc(x) $\{\alpha \wedge \mathsf{tree}(x)^{\pi}\}$. First we unroll the definition of tree(x) and distribute the permission over Boolean connectives and ∗. If the tree is empty the process stops. Otherwise, we label each component with a new label and introduce the "jump" statement $@_{\alpha}(\beta_1 * \beta_2 * \beta_3)$, recording the decomposition of the tree into its three components. Since such statements are *pure*, i.e. independent of the heap, we can "carry" this formula along our computation without interfering with the frame rule(s). Now that every subregion is labelled, we split the formula into two copies, each with half share, but after distributing 0.5 over ∗ and ∧ we end up with half shares in the labels as well. We relabel each subregion with new "whole" labels, and again introduce pure @-formulas that record the relation between the old and the new labels. At this moment we enter the parallel threads and recursively apply proc to the left and right subtrees of x. Assuming the specification of proc for subtrees of x, we then retrieve the original label $\alpha$ from the trail of crumbs left by the @-formulas. We can then recombine the $\alpha$-labelled threads using (Join ⊛) to arrive at the desired postcondition.

#### **5.3 Cross-thread Data Transfer**

Our previous examples involve only "isolated tank" concurrency: a program has some resources and splits them into parallel threads that do not communicate with each other before (remembering Humpty Dumpty!) ultimately re-merging. For our last example, we will show that our technique is expressive enough to handle more sophisticated kinds of sharing, in particular inter-thread coarse-grained communication. We will show that we can not only share read-only data, but in fact prove that one thread has acquired the full ownership of a structure, even when the associated root pointers are not easily exposed.

To do so, we add some communication primitives to our language, together with their associated Hoare rules. Coarse-grained concurrency primitives such as locks, channels, and barriers have been well investigated in various flavours of concurrent separation logic [19,26,31]. We will use a channel for our example in this section but with simplified rules: the Hoare rule for a channel $c$ to send message number $i$, whose message invariant is $R^c_i$, is $\{R^c_i(x)\}$ send(c, x) $\{\mathsf{emp}\}$, while the corresponding rule to receive is $\{\mathsf{emp}\}$ receive(c) $\{\lambda ret.\ R^c_i(ret)\}$. We ignore details such as identifying which party is allowed to send/receive at a given time [14] or the resource ownership of the channel itself [18].

These rules interact poorly with the strong frame rule from Fig. 2:

$$\frac{\{A\}\,C\,\{B\}}{\{A * F\}\,C\,\{B * F\}}\ (\dagger, \ddagger)\ \text{(Frame $*$)} \qquad (\dagger)\ \text{ModVars}(C) \cap \text{FreeVars}(F) = \emptyset$$

**Fig. 4.** Verification proof of Le and Hobor's program from [24] in Example 5.2.

**Fig. 5.** Verification proof of the top and bottom of transfer in Example 5.3.

The revealed side condition (‡) means that $C$ does not contain any subcommands that "transfer in" resources, such as unlock, receive, etc.; this side condition is a bit stronger than necessary but has a simple definition and can be checked syntactically. Without (‡), we can reach a contradiction. Assume that the current message invariant $R^c_i$ is $x \overset{0.5}{\mapsto} a$, which has been sent by thread B. Now thread A, which had the other half of $x \overset{0.5}{\mapsto} a$, can reason as follows:

$$\frac{\{\mathsf{emp}\}\ \mathsf{receive}(c)\ \{x \overset{0.5}{\mapsto} a\}}{\{\mathsf{emp} * x \overset{0.5}{\mapsto} a\}\ \mathsf{receive}(c)\ \{x \overset{0.5}{\mapsto} a * x \overset{0.5}{\mapsto} a\}}\ \text{(Frame $*$), without } (\ddagger)$$

The postcondition is a contradiction as no location strongly separates from itself. However, given (‡) the strong frame rule can be proven by induction.
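Semantically, the contradiction arises because strong composition is defined only on heaps with disjoint domains, however small the shares involved. A minimal C++ sketch of that check follows; the names `Cell`, `PHeap` and `strong_compose` are hypothetical, and shares are modelled as `double` purely for illustration.

```cpp
#include <map>
#include <optional>

// Toy permission heap (illustrative names): address -> (value, share).
struct Cell {
    int value;
    double perm;  // fractional share in (0, 1]
};
using PHeap = std::map<int, Cell>;

// Strong composition: defined only for heaps with disjoint domains,
// regardless of the shares held on either side.
std::optional<PHeap> strong_compose(const PHeap& h1, const PHeap& h2) {
    PHeap out = h1;
    for (const auto& [addr, cell] : h2) {
        if (out.count(addr) != 0) {
            return std::nullopt;  // the same address occurs on both sides
        }
        out.emplace(addr, cell);
    }
    return out;
}
```

Under these assumptions, composing a half-share cell at some address with another half-share cell at the same address yields `std::nullopt`, so no heap can satisfy the framed postcondition; two half-share cells at different addresses compose without difficulty.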

The consequence of (‡), from a verification point of view, is that when resources are transferred in they arrive *weakly separated*, by ⊛, since we must use the weak frame rule around the receiving command. The troublesome issue is that this newly "arriving" state can thus ⊛-overlap awkwardly with the existing state. Fortunately, judicious use of labels can sort things out.

Consider the code in Fig. 5. The basic idea is simple: we create some data at the top (line 101) and then split its ownership 50-50 to two threads. The left thread finds a subtree, and passes its half of that subtree to the right via a channel. The right thread receives the root of that subtree, and thus has full ownership of that subtree along with half-ownership of the rest of the tree. Accordingly, the right thread can modify that subtree before notifying the left thread and passing half of the modified subtree back. After merging, full ownership of the entire tree is restored and so on line 401 the program can delete it. Figure 5 only contains the proof and line numbers for the top and bottom shared portions. The left and the right thread's proofs appear in Fig. 6.

By this point the top and bottom portions of the verification are straightforward. After creating the tree tree(rt) at line 102, we introduce the label $\alpha$, split the formula using (Split ⊛), and then pass $(\alpha \wedge \mathsf{tree}(rt))^{0.5}$ to both threads. After the parallel execution, due to the call to modify(sub) in the right thread, the tree has changed in memory. Accordingly, the label for the tree must also change, as indicated by the $(\varepsilon \wedge \mathsf{tree}(rt))^{0.5}$ in both threads after parallel processing. These are then recombined on line 400 using the re-combination principle (Join ⊛), before the tree is deallocated via standard sequential techniques.

Let us now examine the more interesting proofs of the individual threads in Fig. 6. Line 201 calls the find function, which searches a binary tree for a subtree rooted with key key. Following Cao *et al.* [13] we specify find as follows:

$$\{\mathsf{tree}(x)^{\pi}\}\ \mathsf{find}(x)\ \{\lambda ret.\ (\mathsf{tree}(ret) * (\mathsf{tree}(ret) -\!\!* \mathsf{tree}(x)))^{\pi}\}$$
Here *ret* is bound to the return value of find, and the postcondition can be considered to represent the returned subtree tree(*ret*) separately from the tree-with-a-hole tree(*ret*) −∗ tree(x), using a ∗/−∗ style to represent replacement as per Hobor and Villard [20]. This is the invariant on line 202.
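The ∗/−∗ style relies on the standard elimination principle for the magic wand. As a brief reminder (a general separation logic fact stated here for orientation, with $A$ and $B$ standing for arbitrary formulas):

$$A * (A -\!\!* B) \models B, \qquad \text{so in particular} \qquad \mathsf{tree}(ret) * (\mathsf{tree}(ret) -\!\!* \mathsf{tree}(x)) \models \mathsf{tree}(x).$$

Read operationally, the wand is a tree with a hole: plugging any heap satisfying tree(*ret*) into the hole yields a heap satisfying tree(x). This is what allows a subtree to be detached, modified so that it again satisfies tree(*ret*), and then reattached.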

Line 203 then attaches the fresh labels $\beta$ and $\gamma$ to the ∗-separated subparts, and line 204 snapshots the formula current at label $\alpha$ using the @ operator; $@^{\pi}_{\alpha} P$ should be read as "when one has a $\pi$-fraction of $\alpha$, $P$ holds"; it is definable using @ and an existential quantifier over labels. On line 205 we forget (in the left thread) the label $\alpha$ for the current heap for housekeeping purposes, and then on line 206 we weaken the strong separating conjunction ∗ to the weak one ⊛ before sending the root of the subtree sub on line 207.

In the transfer program, the invariant for the first channel message is

$$(\beta \wedge \mathsf{tree}(sub))^{0.5} \wedge @^{0.5}_{\alpha} \big((\beta \wedge \mathsf{tree}(sub)) * (\gamma \wedge (\mathsf{tree}(sub) -\!\!* \mathsf{tree}(rt)))\big)^{0.5}$$

In other words, half of the ownership of the tree rooted at sub plus the (pure) @-fact about the shape of the heap labeled by α. Comparing lines 206 and 208 we can see that this information has been shipped over the wire (the @-information has been dropped since no longer needed). The left thread then continues to process until synchronizing again with the receive in line 211.

Before we consider the second synchronization, however, let us instead jump to the corresponding receive in the right thread at line 303. After the receive, the invariant on line 304 has the (weakly separated) resources sent from the left thread on line 206. We then "jump" label $\alpha$ using the @-information to reach line 305. We can redistribute the $\beta$ inside the ∗ on line 306 since we already know that $\beta$ and $\gamma$ are disjoint. On line 307 we reach the payoff by combining both halves of the subtree sub, enabling the modification of the subtree in line 308.

On line 310 we label the two subheaps, and specialize the magic wand so that given the specific heap $\delta$ it will yield the specific heap $\varepsilon$; we also record the pure fact that $\gamma$ and $\delta$ are disjoint, written $\gamma \perp \delta$. On line 311 we snapshot $\gamma$ and split the tree sub 50-50; then on line 312 we push half of sub out of the strong ∗. On line 313 we combine the subtree and the tree-with-hole to reach the final tree $\varepsilon$. We then send on line 314 with the channel's second resource invariant:

$$(\delta \wedge \mathsf{tree}(sub))^{0.5} \wedge \gamma \perp \delta \wedge @^{0.5}_{\gamma} \big((\delta \wedge \mathsf{tree}(sub)) -\!\!* (\varepsilon \wedge \mathsf{tree}(rt))\big)^{0.5}$$

After the send, on line 315 we have reached the final fractional tree $\varepsilon$.

Back in the left-hand thread, the second send is received in line 211, leading to the weakly-separated postcondition in line 212. In line 213 we "jump" label $\gamma$, and then in line 214 we use the known disjointness of $\gamma$ and $\delta$ to change the ⊛ to ∗. Finally in line 215 we apply the magic wand to reach the postcondition.

#### **6 Conclusions and Future Work**

We propose an extension of separation logic with fractional permissions [4] in order to reason about sharing over arbitrary regions of memory. We identify two fundamental logical principles that fail when the "weak" separating conjunction ⊛ is used in place of the usual "strong" ∗, the first being distribution of permissions, $A^{\pi} \circledast B^{\pi} \models (A \circledast B)^{\pi}$, and the second being the re-combination of permission-divided formulas, $A^{\pi} \circledast A^{\sigma} \models A^{\pi \oplus \sigma}$. We avoid the former difficulty by *retaining* the strong ∗ in the formalism alongside ⊛, and the latter by using nominal *labels*, from hybrid logic, to record exact aliasing between read-only copies of a formula.

The main previous work addressing these issues, by Le and Hobor [24], uses a combination of permissions based on *tree shares* [17] and semantic side conditions on formulas to overcome the aforementioned problems. The *rely-guarantee* separation logic in [30] similarly restricts concurrent reasoning to structures described by precise formulas only. In contrast, our logic is a little more complex, but we can use permissions of any kind, and do not require side conditions. In addition, our use of labelling enables us to handle examples involving the transfer of data structures between concurrent threads.

On the other hand, we think it probable that the kind of examples we consider in this paper could also be proven by hand in at least some of the verification formalisms derived from CSL (e.g. [16,22,27]). For example, using the "concurrent abstract predicates" in [16], one can explicitly declare shared regions of memory in a fairly ad-hoc way. However, such program logics are typically very complicated and, we believe, quite unlikely to be amenable to automation.

We feel that the main appeal of the present work lies in its relative simplicity—we build on standard CSL with permissions and invoke only a modest amount of extra syntax—which bodes well for its potential automation (at least for simpler examples). In practical terms, an obvious way to proceed would be to develop a prototype verifier for concurrent programs based on our logic SLLP. An important challenge in this area is to develop heuristics—e.g., for splitting, labelling and combining formulas—that work acceptably well in practice.

An even greater challenge is to move from *verifying* user-provided specifications to *inferring* them automatically, as is done e.g. by Facebook Infer. In separation logic, this crucially depends on solving the *biabduction* problem, which aims to discover "best fit" solutions for applications of the frame rule [9,11]. In the CSL setting, a further problem seems to lie in deciding how applications of the concurrency rule should divide resources between threads.

Finally, automating the verification approach set out in this paper will likely necessitate restricting our full logic to some suitably tractable fragment, e.g. one analogous to the well-known *symbolic heaps* in standard separation logic (cf. [2,15]). The identification of such tractable fragments is another important theoretical problem in this area. It is our hope that this paper will serve to stimulate interest in the automation of concurrent separation logic in particular, and permission-sensitive reasoning in general.

#### **References**

1. Appel, A.W., et al.: Program Logics for Certified Compilers. Cambridge University Press, New York (2014)



## **Local Reasoning About the Presence of Bugs: Incorrectness Separation Logic**

Azalea Raad<sup>1(✉)</sup>, Josh Berdine<sup>2</sup>, Hoang-Hai Dang<sup>1</sup>, Derek Dreyer<sup>1</sup>, Peter O'Hearn<sup>2,3</sup>, and Jules Villard<sup>2</sup>

> <sup>1</sup> Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern and Saarbrücken, Germany {azalea,haidang,dreyer}@mpi-sws.org <sup>2</sup> Facebook, London, UK {jjb,peteroh,jul}@fb.com <sup>3</sup> University College London, London, UK

**Abstract.** There has been a large body of work on local reasoning for proving the *absence* of bugs, but none for proving their *presence*. We present a new formal framework for local reasoning about the presence of bugs, building on two complementary foundations: 1) separation logic and 2) incorrectness logic. We explore the theory of this new *incorrectness separation logic* (ISL), and use it to derive a begin-anywhere, intra-procedural symbolic execution analysis that has no false positives *by construction*. In so doing, we take a step towards transferring modular, scalable techniques from the world of program verification to bug catching.

**Keywords:** Program logics · Separation logic · Bug catching

## **1 Introduction**

There has been significant research on sound, local reasoning about the state for proving the absence of bugs (e.g., [2,13,26,29,30,41]). Locality leads to techniques that are compositional *both* in code (concentrating on a program component) and in the resources accessed (spatial locality), without tracking the entire global state or the global program within which a component sits. Compositionality enables reasoning to scale to large teams and codebases: reasoning can be done even when a global program is not present (e.g., a library, or during program construction), without having to write the analogue of a test or verification harness, and the results of reasoning about components can be composed efficiently [11].

Meanwhile, many of the practical applications of symbolic reasoning have aimed at proving the *presence* of bugs (i.e., bug catching), rather than proving their absence (i.e., correctness). Logical bug catching methods include symbolic model checking [7,12] and symbolic execution for testing [9]. These methods are usually formulated as global analyses; but, the rationale of local reasoning holds just as well for bug catching as it does for correctness: it has the potential to benefit scalability, reasoning about incomplete code, and continuous incremental reasoning about a changing codebase within a continuous integration (CI) system [34]. Moreover, local evidence of a bug without usually-irrelevant contextual information can be more convincing and easier to understand and correct.

There do exist symbolic bug catchers that, at least partly, address scalability and continuous reasoning. Tools such as Coverity [5,32] and Infer [18] hunt for bugs in large codebases with tens of millions of LOC, and they can even run incrementally (within minutes for small code changes), which is compatible with deployment in CI to detect regressions. However, although such tools intuitively share ideas with correctness-based compositional analyses [16], the existing foundations of correctness-based analyses do not adequately explain what these bug-catchers do, why they work, or the extent to which they work in practice.

A notable such example is the relation between *separation logic* (SL) and Infer. SL provides novel techniques for local reasoning [28], with concise specifications that focus only on the memory accessed [36]. Using SL, symbolic execution need not begin from a "main" program, but rather can "begin anywhere" in a codebase, with constraints on the environment synthesized along the way. When analyzing a component, SL's frame rule is used in concert with abductive inference to isolate a description of the memory utilized by the component [11]. Infer was closely inspired by SL, and demonstrates the power of SL's local reasoning: the ability to begin anywhere supports incremental analysis in CI, and compositionality leads to highly scalable methods. These features have led to non-trivial impact: a recent paper quotes over 100,000 Infer-reported bugs fixed in Facebook's codebases, and thousands of security bugs found by a compositional taint analyzer, Zoncolan [18]. However, Infer reports bugs using *heuristics based on failed proofs*, whereas the SL theory behind Infer is based on *overapproximation* [11]. Thus, a critical aspect of Infer's successful deployment is not supported by the theory that inspired it. This is unfortunate, especially given that the begin-anywhere and scalable aspects of Infer's algorithms do not appear to be fundamentally tied to over-approximation.

In this paper, we take a step towards transferring the local reasoning techniques from the world of program verification to that of bug catching. To approach the problem from first principles, we do not try to understand tools such as Coverity and Infer as they are. Instead, we take their existence and reported impact as motivation for revisiting the foundations of SL, this time re-casting it as a formalism for proving the *presence* of bugs rather than their absence.

Our new logic, *incorrectness separation logic* (ISL), marries local reasoning based on SL's frame rule with the recently-advanced incorrectness logic [35], a formalism for reasoning about errors based on an *under-approximate* analogue of Hoare triples [43]. We observe that the original SL model, based on partial heaps, is incompatible with local, under-approximate reasoning. The problem is that the original model does not distinguish a pointer known to be dangling from one about which we have no knowledge; this in turn contradicts the frame rule for under-approximate reasoning. However, we recover the frame rule for a refined model with negative heap assertions of the form $x \not\mapsto$, read "invalidated x", stating that the location at x has been deallocated (and not re-allocated). Negative heaps were present informally in the original Infer, unsupported by theory but added for reporting use-after-free bugs (i.e., not for proving correctness). Interestingly, this semantic feature is needed in ISL for logical (and not merely pragmatic) reasons, in that it yields a *sound* logic for proving the presence of bugs: when ISL identifies a bug, then there is indeed a bug (no false positives), given the assumptions of the underlying ISL model. (That is, as usual, soundness is a relationship between assumptions and conclusions, and whether those assumptions match reality (i.e., running code) is a separate concern, outside the purview of logic.)
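The clash with the frame rule can be illustrated with a small worked example. This is a sketch for intuition only: the stored value $v$ and the exact shape of the free axiom are assumptions of the illustration, not quotations from the formal development. In the original partial-heap model, deallocation simply removes the cell, so the following under-approximate triple is valid, and framing on $x \mapsto v$ would give the triple on the right:

$$[x \mapsto v]\ \mathsf{free}(x)\ [ok: \mathsf{emp}] \qquad \leadsto \qquad [x \mapsto v * x \mapsto v]\ \mathsf{free}(x)\ [ok: \mathsf{emp} * x \mapsto v]$$

The framed presumption is unsatisfiable, yet the framed result is satisfiable, so the result describes states that cannot be reached from any presumption state: the triple is invalid. Recording the deallocation with a negative assertion avoids this, assuming that an invalidated location still occupies its address in the heap:

$$[x \mapsto v]\ \mathsf{free}(x)\ [ok: x \not\mapsto] \qquad \leadsto \qquad [x \mapsto v * x \mapsto v]\ \mathsf{free}(x)\ [ok: x \not\mapsto * x \mapsto v]$$

Here the framed result is itself unsatisfiable, so the framed triple holds vacuously.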

As well as being superior for bug reporting, our new model has a pleasant fundamental property in that it meshes better with intuitions originally expressed of SL. Specifically, our model admits a *footprint theorem*, stating that the meaning of a command is solely determined by its transitions on input-output heaplets of minimal size (including only the locations accessed), a theorem that was not true in full generality for the original SL model. Interestingly, ISL supports local reasoning for technically simpler reasons than the original SL (see Sect. 4.2).

We validate part of the ISL promise using an illustrative program analysis, Pulse, and use it to detect *memory safety bugs*, namely null-pointer-dereference and use-after-free bugs. Pulse is written inside Infer [18] and deployed at Facebook where it is used to report issues to C++ developers. Pulse is currently under active development. In this paper, we explore the *intra-procedural* analysis, i.e., how it provides purely local reasoning about one procedure at a time without using results from other procedures; we defer formalising its *inter-procedural* (between procedures) analysis to future work. While leaving out the inter-procedural capabilities of Pulse only partly validates the promise of the ISL theory, it already demonstrates how ISL can scale to large codebases, and run incrementally in a way compatible with CI. Pulse thus has the capability to begin anywhere, and it achieves scalability while embracing under- rather than over-approximation.

**Outline.** In Sect. 2 we present an intuitive account of ISL. In Sect. 3 we present the ISL proof system. In Sect. 4 we present the semantic model of ISL. In Sect. 5 we present our ISL-based Pulse analysis. In Sect. 6 we discuss related work and conclude. The full proofs of all stated theorems are given in the technical appendix [38].

#### **2 Proof of a Bug**

We proceed with an intuitive description of ISL for detecting memory safety bugs. To do this, in Fig. 1 we present an example of a C++ use-after-lifetime bug, abstracted from real occurrences we have observed at Facebook, where use-after-lifetime bugs were one of the leading developer requests for C++ analysis. Given a vector v, a call to push_back(v) in the std::vector library may cause the internal array backing v to be (deallocated and subsequently) reallocated when v needs to grow to accommodate new elements. If the internal array is reallocated during the v->push_back(42) call, a use-after-lifetime bug occurs on the next line as x points into the previous array. Note how the Pulse error message (at the bottom of Fig. 1) refers to memory that has been invalidated. As we describe shortly, this information is tracked in Pulse with an invalidated heap assertion.

```
#include <iostream>
#include <vector>

void deref_after_pb(std::vector<int> *v) {
 int *x = &v->at(1);
 v->push_back(42);
 std::cout << *x << "\n"; }
```

```
push_back.cpp:7: error: VECTOR_INVALIDATION. accessing memory that was
potentially invalidated by 'std::vector::push_back()' on line 6.
 5. int *x = &(v->at(1));
 6. v->push_back(42);
 7. > std::cout << *x << "\n"; }
```
**Fig. 1.** The C++ use-after-lifetime bug (above); the Pulse error message (below).

For the theory in this paper, we do not want to descend into the details of C++, vectors, and so forth. Thus, for illustrative purposes, in Fig. 2 we present an adaptation of such use-after-lifetime bugs in C rather than C++, alongside its representation in the ISL language used in this paper. In this adaptation, the array at v is of size 1, and is reallocated in push_back non-deterministically to model its dynamic reallocation when growing. We next demonstrate how we can use ISL to detect the use-after-lifetime bug in the client procedure in Fig. 2.

**ISL Triples.** The ISL theory uses *under-approximate triples* [35] of the form [presumption] C [$\epsilon$: result], interpreted as: the result assertion describes a *subset* of the states that can be reached from the presumption assertion by executing C, where $\epsilon$ denotes an *exit condition* indicating either normal or exceptional (erroneous) termination. The under-approximate triples can be equivalently interpreted as: every state in result can be obtained by executing C on a starting state in presumption. By contrast, given a Hoare triple {pre} C {post}, the postcondition post describes a *superset* of states that are reachable from the precondition pre, and may include states unreachable from pre. Hoare logic is about over-approximation, allowing false positives but not negatives, whereas ISL is about under-approximation, allowing false negatives but not positives.
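The contrast between the two kinds of triple can be made concrete on a finite toy semantics. The C++ sketch below is an illustration under simplifying assumptions, not the semantics of ISL: states are plain integers, a command maps a state to the set of states it may terminate in, only the normal exit condition is modelled, and every name (`post`, `subset`, `hoare_valid`, `under_valid`, `skip_or_increment`) is hypothetical.

```cpp
#include <algorithm>
#include <functional>
#include <set>

// Toy semantics: a state is an int, and a nondeterministic command maps a
// state to the set of states it may terminate in (normal exits only).
using State = int;
using States = std::set<State>;
using Command = std::function<States(State)>;

// All states reachable by running c from some state in `start`.
States post(const Command& c, const States& start) {
    States reached;
    for (State s : start) {
        States next = c(s);
        reached.insert(next.begin(), next.end());
    }
    return reached;
}

// Is `inner` a subset of `outer`? std::set iterates in sorted order, which
// is what std::includes requires.
bool subset(const States& inner, const States& outer) {
    return std::includes(outer.begin(), outer.end(),
                         inner.begin(), inner.end());
}

// Hoare triple {pre} C {q}: q is a superset of the reachable states.
bool hoare_valid(const States& pre, const Command& c, const States& q) {
    return subset(post(c, pre), q);
}

// Under-approximate triple [p] C [ok: q]: q is a subset of the reachable
// states, i.e. every state in q really can be reached.
bool under_valid(const States& p, const Command& c, const States& q) {
    return subset(q, post(c, p));
}

// Hypothetical command: nondeterministically skip or increment.
States skip_or_increment(State s) { return States{s, s + 1}; }
```

Starting from the single state 0, the reachable states are {0, 1}. The Hoare triple with postcondition {0, 1, 2} is valid although 2 is unreachable, whereas the under-approximate triple with result {1} is valid although the reachable state 0 is not mentioned; a result of {1, 2} is rejected because 2 would be a false positive.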

**Bug Specification of** client(v)**.** Using ISL, we can specify the use-after-lifetime bug in client(v) as follows:

$$[v \mapsto a * a \mapsto -]\ \mathsf{client}(v)\ [er(l_{rx}):\ \exists a'.\ v \mapsto a' * a' \mapsto - * a \not\mapsto] \tag{PB-Client}$$
We make several remarks to illustrate the crucial features of ISL:


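```
#include <stdlib.h>

int nondet(void); /* assumed external source of non-determinism */

void push_back(int **v)
{
  if (nondet()) {
    free(*v);                  /* the old array is deallocated ... */
    *v = malloc(sizeof(int));  /* ... and a fresh one is allocated */
  }
}

void client(int **v) {
  int* x = *v;   /* x aliases the current array */
  push_back(v);  /* may free that array */
  *x = 88; }     /* use-after-lifetime if the array was freed */
```

```
push_back(v) ≜
  local z, y in
    z := *;
    (assume(z ≠ 0); l_rv: y := [v];
      l_f: free(y);
      y := malloc(); [v] := y)
    + (assume(z = 0); skip)

client(v) ≜
  local x in
    x := [v];
    push_back(v);
    l_rx: [x] := 88
```
**Fig. 2.** The push_back example in C (above); and in the ISL language (below).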


Let us next consider how we reason symbolically about this bug. Note that for the client(v) execution to reach an error at line l*rx* , the push back(v) call within it must not cause an error. That is, in contrast to PB-Client, we need a specification for push back(v) that describes normal, non-erroneous termination. We specify this normal execution with the *ok* exit condition as follows:

$$[v \mapsto a \ast a \mapsto -]\ \mathtt{push\_back}(v)\ [ok : \exists a'.\ v \mapsto a' \ast a' \mapsto - \ast a \not\mapsto] \tag{PB-Ok}$$
PB-Ok describes the case when push_back(v) frees the internal array of v at a (denoted by a ↛ in the result), and subsequently reallocates it at a'. Consequently, as a is invalidated after the push_back(v) call, the instruction following the call in client(v) dereferences invalidated memory at l<sub>rx</sub>, causing an error.
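The scenario behind PB-Ok and PB-Client can be replayed concretely. The following is a minimal sketch, not the paper's formalization: heaps are Python dictionaries, a freed cell keeps its address but holds the sentinel `BOT`, the nondeterministic choice is passed in as a flag, and allocation always succeeds (the null-returning branch of malloc is ignored). The names `BOT`, `fresh` and the concrete addresses are assumptions made for illustration.

```python
import itertools

BOT = "BOT"  # sentinel: the location exists but has been deallocated


def push_back(heap, v, nondet, fresh):
    """Model of push_back: on the taken branch, free *v and reallocate."""
    if nondet:
        old = heap[v]
        heap[old] = BOT      # free(*v): the old array is invalidated
        new = next(fresh)
        heap[new] = 0        # malloc: a fresh cell (contents are arbitrary)
        heap[v] = new        # *v = new
    return heap


def client(heap, v, nondet):
    """Return (exit condition, final heap) for one execution of client."""
    fresh = itertools.count(max(heap) + 1)
    x = heap[v]              # x = *v
    push_back(heap, v, nondet, fresh)
    if heap[x] == BOT:       # l_rx: *x = 88 touches an invalidated cell
        return ("er(l_rx)", heap)
    heap[x] = 88
    return ("ok", heap)


# v is location 1 and points to the array cell a = 2.
print(client({1: 2, 2: 7}, 1, nondet=True))
print(client({1: 2, 2: 7}, 1, nondet=False))
```

With `nondet=True` the final heap has the shape described by the PB-Client result: v points to a fresh cell, and the old cell a is still present but invalidated.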

Note that the result assertion in PB-Ok is strictly under-approximate in that it is smaller (stronger) than the exact "strongest post". Given the assertion in the presumption, the strongest post must also consider the else clause of the conditional, when nondet() returns zero and push_back(v) does nothing. That is, the strongest post is the disjunction of the given result and the presumption. The ability to go below the strongest post soundly is a hallmark of under-approximate reasoning: it allows for compromise in an analyzer, where we might choose, e.g., to limit the number of paths explored for efficiency reasons, or to concretize an assertion partially when symbolic reasoning becomes difficult [35].

We present proof outlines for PB-Ok and PB-Client in Fig. 3, where we annotate each step with a proof rule to connect to the ISL theory in Sect. 3. For legibility, uses of the Frame rule are omitted as it is used in almost every step, and the consequence rule Cons is usually omitted when rewriting a formula to an equivalent one. For the moment, we encourage the reader to attempt to follow, prior to formalization, by mentally executing the program instructions on the assertions and asking: does the assertion at each program point under-approximate the states that can be obtained from the prior state? Note that each step updates assertions in-place, just as concrete execution does on concrete memory. For example, l<sub>f</sub>: free(y) replaces a → − with a ↛. In-place reasoning is a capability that the separating conjunction brings to symbolic execution; formally, this in-place aspect is achieved in the logic by applying the frame rule.

#### **3 Incorrectness Separation Logic (ISL)**

As a first attempt, it is tempting to obtain ISL straightforwardly by composing the standard semantics of SL [41] and the semantics of incorrectness logic [35]. Interestingly, this simplistic approach does not work. To see this, consider the following axiom for freeing memory, adapted from the corresponding SL axiom:

$$[x \mapsto -]\ \mathtt{free}(x)\ [ok : \mathsf{emp} \land \mathsf{loc}(x)]$$

Here, emp describes the empty heap and loc(x) states that x is an addressable location; e.g., x cannot be null. Note that this ISL triple is valid in that any state satisfying the result assertion can be obtained from one satisfying the presumption assertion, and thus we do have a true under-approximate triple.

However, in SL one can arbitrarily extend the state using the frame rule:

$$\frac{\vdash [p]\ \mathbb{C}\ [\epsilon : q] \qquad \mathsf{mod}(\mathbb{C}) \cap \mathsf{fv}(r) = \emptyset}{\vdash [p \ast r]\ \mathbb{C}\ [\epsilon : q \ast r]} \; \text{(FRAME)}$$

Intuitively, the state described by the *frame* assertion r lies outside the footprint of C and thus remains unchanged when executing C. However, if we do this with the free(x) axiom above, choosing x → − as our frame, we run into a problem:

$$[x \mapsto - \ast x \mapsto -]\ \mathtt{free}(x)\ [ok : (\mathsf{emp} \land \mathsf{loc}(x)) \ast x \mapsto -]$$

Here, the presumption is inconsistent but the result is not, and thus there is no way to get back to the presumption from the result; i.e., the triple is invalid. In over-approximate reasoning this does not cause a problem since an inconsistent precondition renders an over-approximate triple vacuously valid. By contrast, an inconsistent presumption does not validate under-approximate reasoning.

Our way out of this conundrum is to consider a modified model in which the knowledge that a location was previously freed is a resource-oriented fact, using negative heap assertions. The negative heap assertion x ↛ conveys more knowledge than the loc(x) assertion. Specifically, x ↛ conveys: 1) the *knowledge* that x is an addressable location; 2) the knowledge that x has been deallocated; and 3) the *ownership* of location x. In other words, x ↛ is analogous to the points-to assertion x → − and is thus manipulated similarly, taking up space in ∗-conjuncts. That is, we cannot consistently ∗-conjoin x ↛ either with x → − or with itself: x → − ∗ x ↛ ⇔ false and x ↛ ∗ x ↛ ⇔ false.

```
[v → a ∗ a → −]
local y, z in
  z := *;                          // Havoc
  [ok: z=1 ∗ v → a ∗ a → −]
  ( assume(z ≠ 0);                 // Assume
    [ok: z=1 ∗ z≠0 ∗ v → a ∗ a → −]
    l_rv: y := [v];                // Load
    [ok: z=1 ∗ y=a ∗ v → a ∗ a → −]
    l_f: free(y);                  // Free
    [ok: z=1 ∗ y=a ∗ v → a ∗ a ↛]
    y := malloc();                 // Alloc1, Choice
    [ok: z=1 ∗ v → a ∗ a ↛ ∗ y → −]
    [v] := y;                      // Store
    [ok: z=1 ∗ v → y ∗ a ↛ ∗ y → −]
  ) + (...)                        // Choice
  [ok: z=1 ∗ v → y ∗ a ↛ ∗ y → −]
// Local
[ok: ∃a'. v → a' ∗ a' → − ∗ a ↛]
```

```
[v → a ∗ a → −]
local x in
  x := [v];                        // Load
  [ok: x=a ∗ v → a ∗ a → −]
  push_back(v);                    // PB-Ok
  [ok: ∃a'. x=a ∗ v → a' ∗ a' → − ∗ a ↛]   // Cons
  [ok: ∃a'. x=a ∗ v → a' ∗ a' → − ∗ x ↛]
  l_rx: [x] := 88;                 // StoreEr
  [er(l_rx): ∃a'. x=a ∗ v → a' ∗ a' → − ∗ x ↛]
// Local
[er(l_rx): ∃a'. v → a' ∗ a' → − ∗ a ↛]
```
**Fig. 3.** The proof sketches of PB-Ok (top) and PB-Client (bottom).

With such negative assertions, we can specify free() as the Free axiom in Fig. 5. Note that this allows us to recover the frame rule: when we frame x → − on both sides, we obtain the inconsistent assertion x → − ∗ x ↛ (i.e., false) in the result, which always makes an under-approximate triple vacuously valid.
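The inconsistency of these ∗-conjunctions can be checked on a small concrete model. The sketch below is illustrative only: heaps are dictionaries, an assertion is represented by the finite list of heaps satisfying it, `BOT` stands for a deallocated cell, and the address 10 and the value range are arbitrary assumptions.

```python
BOT = "BOT"  # marker for a deallocated location


def compose(h1, h2):
    """Separating composition of heaps: defined only on disjoint domains."""
    if h1.keys() & h2.keys():
        return None          # both sides claim ownership of some location
    return {**h1, **h2}


def sat_star(p_heaps, q_heaps):
    """All heaps satisfying p * q, given the heaps satisfying p and q."""
    out = []
    for h1 in p_heaps:
        for h2 in q_heaps:
            h = compose(h1, h2)
            if h is not None:
                out.append(h)
    return out


x = 10                                    # hypothetical address denoted by x
points_to = [{x: v} for v in (0, 1, 2)]   # x points to some value
freed = [{x: BOT}]                        # x is invalidated

print(sat_star(points_to, freed))         # no heap: the conjunction is false
print(sat_star(freed, freed))             # no heap: the conjunction is false
print(len(sat_star(points_to, [{11: 5}])))  # a disjoint frame composes fine
```

Because the freed cell still occupies its address, framing x → − onto a result that owns x ↛ leaves no satisfying heap, which is exactly what makes the framed triple vacuous rather than invalid.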

We demonstrated how we arrived at negative heaps as a theoretical solution to recover the frame rule. However, negative heaps are more than a technical curiosity. In particular, a similar idea was informally present in Infer and has been used formally to reason about JavaScript [21]. Moreover, as we show in Sect. 4, negative heaps give rise to a *footprint theorem* (see Theorem 2).

Negative heap assertions were previously used informally in Infer. They were also independently and formally introduced in a separation logic for JavaScript [21] to state that a field is not present in a JavaScript object, which is a natural property to express when reasoning about JavaScript.

```
Comm ∋ C ::= skip | x := e | x := * | assume(B) | local x in C | C1; C2 | C1 + C2 | C⋆
           | x := alloc() | l: free(x) | l: x := [y] | l: [x] := y | l: error

if B then C1 else C2  ≜  (assume(B); C1) + (assume(!B); C2)
          while(B) C  ≜  (assume(B); C)⋆; assume(!B)
           assert(B)  ≜  (assume(!B); error) + assume(B)
       x := malloc()  ≜  x := alloc() + x := null
```
**Fig. 4.** The ISL Language (above); encoding standard constructs in ISL (below).

**Programming Language.** To keep our presentation concise, we employ a simple heap-manipulating language as shown in Fig. 4. We assume an infinite set Val of *values*; a finite set Var of (program) *variables*; a standard interpreted language for *expressions*, Exp, containing variables and values; and a standard interpreted language for *Boolean expressions*, BExp. We use v as a metavariable for values; x, y, z for program variables; e for expressions; and B for Boolean expressions.

Our language is given by the C grammar and includes the standard constructs of skip, assignment (x := e), non-deterministic assignment (x := \*, where \* denotes a non-deterministically picked value), assume statements (assume(B)), scoped variable declaration (local x in C), sequential composition (C<sub>1</sub>; C<sub>2</sub>), non-deterministic choice (C<sub>1</sub> + C<sub>2</sub>) and loops (C⋆), as well as error statements (error) and heap-manipulating instructions. Note that deterministic choice and loops (e.g., if and while statements) can be encoded using their non-deterministic counterparts and assume statements, as shown in Fig. 4.
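The encodings of Fig. 4 amount to a small desugaring pass. As a rough sketch, assuming commands are represented as nested Python tuples and Boolean expressions as opaque strings (both representation choices, and all function names, are illustrative and not part of the paper):

```python
def seq(c1, c2):
    return ("seq", c1, c2)


def choice(c1, c2):
    return ("choice", c1, c2)


def star(c):
    return ("star", c)


def assume(b):
    return ("assume", b)


def neg(b):
    return ("not", b)


def if_then_else(b, c1, c2):
    # if B then C1 else C2  =  (assume(B); C1) + (assume(!B); C2)
    return choice(seq(assume(b), c1), seq(assume(neg(b)), c2))


def while_do(b, c):
    # while(B) C  =  (assume(B); C)*; assume(!B)
    return seq(star(seq(assume(b), c)), assume(neg(b)))


def assert_(b, label):
    # assert(B)  =  (assume(!B); error) + assume(B)
    return choice(seq(assume(neg(b)), ("error", label)), assume(b))


def malloc(x):
    # x := malloc()  =  x := alloc() + x := null
    return choice(("alloc", x), ("assign", x, "null"))


print(while_do("i < n", ("skip",)))
```

Only the core constructs then need proof rules and semantics; the derived forms are handled by unfolding them first.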

To better track errors, we annotate instructions that may cause an error with a label l ∈ Label. When an error is encountered (e.g., in l: error), we report the label of the offending instruction (e.g., l). As such, we only consider *well-formed* programs: those with unique labels across their constituent instructions. For brevity, we drop the instruction labels when they are immaterial to the discussion.

As is standard practice, we use error statements as test oracles to detect violations. In particular, error statements can be used to encode *assert* statements as shown in Fig. 4. Heap-manipulating instructions include allocation, deallocation, lookup and mutation. The x := alloc() instruction allocates a new (unused) location on the heap and returns it in x, and can be used to represent the standard, possibly null-returning malloc() from C as shown in Fig. 4. Dually, free(x) deallocates the location denoted by x. Heap lookup x := [y] reads the contents of the location denoted by y and returns it in x; heap mutation [x]:= y overwrites the contents of the location denoted by x with y.

**Assertions.** The *ISL assertion language* is given by the grammar below, where ⊕ ∈ {=, ≠, <, ≤, ...}. We use p, q, r as metavariables for assertions.

$$\begin{array}{rcll} \mathrm{Ast} \ni p, q, r & ::= & \mathsf{false} \mid p \Rightarrow q \mid \exists x.\, p \mid e \oplus e' & \text{classical and Boolean assertions} \\ & \mid & \mathsf{emp} \mid e \mapsto e' \mid e \not\mapsto \mid p \ast q & \text{structural assertions} \end{array}$$

As we describe formally in Sect. 4, assertions describe sets of *states*, where each state comprises a (variable) store and a heap. The classical (first-order logic) and Boolean assertions are standard. Other classical connectives can be encoded using existing ones (e.g., ¬p ≜ p ⇒ false). Aside from the negative assertion e ↛, structural assertions are as defined in SL [28], and describe a set of states by constraining the shape of the underlying heap. More concretely, emp describes states in which the heap is empty; e → e' describes states in which the heap comprises a single location denoted by e containing the value denoted by e'; and p ∗ q describes states in which the heap can be split into two disjoint sub-heaps, one satisfying p and the other q. We often write e → − as a shorthand for ∃v. e → v.

As described above, we extend our structural assertions with the *negative* heap assertion e ↛ (read "e is invalidated"). As with its positive counterpart e → e', the negative assertion e ↛ describes states in which the heap comprises a single location at e. However, whilst e → e' states that the location at e is allocated (and contains the value e'), e ↛ states that the location at e is *deallocated*.

**ISL Proof Rules (Syntactic ISL Triples).** We present the ISL proof rules in Fig. 5. As in incorrectness logic [35], the ISL triples are of the form [p] C [ε: q], denoting that *every* state in the *result* assertion q is reachable from *some* state in the *presumption* assertion p with *exit condition* ε. That is, for each state σ<sub>q</sub> in q, there exists σ<sub>p</sub> in p such that executing C on σ<sub>p</sub> terminates with ε and yields σ<sub>q</sub>. As such, since false describes an empty state set, [p] C [ε: false] is vacuously valid for all p, C, ε. Dually, [false] C [ε: q] is always invalid when q ⇏ false.
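The reading of a triple as backward reachability can be made concrete on a finite model. In the sketch below, a command under a fixed exit condition is an explicit set of (pre-state, post-state) pairs and assertions are sets of states; the relation `rel` is a made-up example, and the over-approximate check is included only for contrast.

```python
def valid_under(p, rel, q):
    """[p] C [eps: q]: every state in q is reached from some state in p."""
    return all(any((sp, sq) in rel for sp in p) for sq in q)


def valid_over(p, rel, q):
    """{p} C {q}: every state reached from a state in p lies in q."""
    return all(sq in q for (sp, sq) in rel if sp in p)


# Hypothetical command: from state 0 it may end in 1 or 2; from 3 it ends in 4.
rel = {(0, 1), (0, 2), (3, 4)}

print(valid_under({0}, rel, {1}))     # True: the result may drop the path to 2
print(valid_under({0}, rel, {1, 4}))  # False: 4 is not reachable from 0
print(valid_under({0}, rel, set()))   # True: an empty result is vacuous
print(valid_under(set(), rel, {1}))   # False: nothing is reachable from false
print(valid_over(set(), rel, {1}))    # True: an empty precondition is vacuous
```

The last two lines show the asymmetry: an inconsistent precondition validates an over-approximate triple, whereas an inconsistent presumption invalidates an under-approximate one unless the result is also empty.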

An exit condition, ε ∈ Exit, may be: 1) *ok*, denoting a successful execution; or 2) *er*(l), denoting an erroneous execution with the error encountered at the l-labeled instruction. Compared to [35], we further annotate our error conditions to track the offending instructions. Moreover, whilst the rules of [35] only detect explicit errors caused by error statements, ISL rules additionally allow us to track errors caused by *memory safety violations*, namely "use-after-free" violations, where a previously deallocated location is subsequently accessed in the program, and "null-pointer-dereference" violations. Although it is straightforward to distinguish between explicit and memory safety errors, for brevity we use *er*(l) for both.

Thanks to the separation afforded by ISL assertions, compared to incorrectness triples in [35], ISL triples are *local* in that the states described by their presumptions only contain the resources needed by the program. For instance, as skip requires no resource for successful execution, the presumption of Skip is simply given by emp, which remains unchanged in the result. Similarly, assume(B) requires no resource and results in a state satisfying B. The Assign rule is analogous to its SL counterpart. Similarly, x := \* in Havoc assigns a non-deterministic value to x. Although these axioms (and Alloc1, Alloc2) ask for a single equality x = x' in their presumption, one can derive more general triples starting from any presumption p by picking a fresh x' and applying the axiom, and the Frame and Cons rules on the equivalent presumption x = x' ∗ p[x'/x].


**Fig. 5.** The ISL proof rules, where x and x' are distinct variables.

Note that skip, assignments and assume statements always terminate successfully (with *ok*). By contrast, l: error always terminates erroneously (with *er*(l)) and requires no resource. The ISL rules Seq1, Seq2, Choice, Loop1, Loop2, Cons, Disj and Subst are as in [35]. The Seq1 rule captures short-circuiting when the first statement (C<sub>1</sub>) encounters an error and thus the program terminates erroneously. Analogously, Seq2 states that when C<sub>1</sub> executes successfully, the program terminates with ε when the subsequent C<sub>2</sub> statement terminates with ε. The Choice rule states that the states in q are reachable from p when executing C<sub>1</sub> + C<sub>2</sub> if they are reachable from p when executing either branch. Loop1 captures immediate exit from the loop; Loop2 states that q is reachable from p when executing C⋆ if it is reachable after a non-zero number of C iterations.

The Cons rule allows us to strengthen the result and weaken the presumption: if q' is reachable from p', then the smaller q is reachable from the bigger p. Note that compared to SL, the direction of the implications in the Cons premise is flipped. Using Cons, we can rewrite the premises of Disj as [p<sub>1</sub> ∨ p<sub>2</sub>] C [ε: q<sub>1</sub>] and [p<sub>1</sub> ∨ p<sub>2</sub>] C [ε: q<sub>2</sub>]. As such, if both q<sub>1</sub> and q<sub>2</sub> are reachable from p<sub>1</sub> ∨ p<sub>2</sub>, then q<sub>1</sub> ∨ q<sub>2</sub> is also reachable from p<sub>1</sub> ∨ p<sub>2</sub>, as shown in Disj. The Exist rule is derived from Disj; Subst is standard and allows us to substitute x with a fresh variable y; Local is equivalent to that in [35] but uses the Barendregt variable convention, renaming variables in formulas instead of in commands to avoid clashes.
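For reference, the flipped consequence rule described in this paragraph can be written out as follows. This is a reconstruction from the prose above and from the standard incorrectness-logic formulation, not a copy of Fig. 5:

$$\frac{p' \Rightarrow p \qquad \vdash [p']\ \mathbb{C}\ [\epsilon : q'] \qquad q \Rightarrow q'}{\vdash [p]\ \mathbb{C}\ [\epsilon : q]}\ \text{(Cons)}$$

A typical use is dropping a disjunct from a result. Taking $p' = p$, $q' = q_1 \lor q_2$ and $q = q_1$ gives:

$$\frac{p \Rightarrow p \qquad \vdash [p]\ \mathbb{C}\ [ok : q_1 \lor q_2] \qquad q_1 \Rightarrow q_1 \lor q_2}{\vdash [p]\ \mathbb{C}\ [ok : q_1]}$$

This is the step that lets a specification such as PB-Ok keep only the reallocation path and go below the strongest post.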

As in SL, the crux of ISL reasoning lies in the Frame rule, allowing one to extend the presumption and the result of a triple with disjoint resources in r. The fv(r) function returns the set of free variables in r, and mod(C) returns the set of (program) variables modified by C (i.e., those on the left-hand of ':=' in assignment, lookup and allocation). These definitions are standard and elided.

Negative assertions allow us to detect memory safety violations when accessing deallocated locations. For instance, FreeEr states that attempting to deallocate x causes an error when x is already deallocated; *mutatis mutandis* for LoadEr and StoreEr. As shown in Alloc2, we can use negative assertions to allocate a previously-deallocated location: if y is deallocated (y ↛ holds in the presumption), then it may be reallocated. The FreeNull, LoadNull and StoreNull rules state that accessing x causes an error when x is null. Finally, Load and Store describe the successful execution of heap lookup and mutation, respectively.

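*Remark 1.* Note that mutation and deallocation rules in SL are given as {x → −} [x] := y {x → y} and {x → −} free(x) {emp}; i.e., the value of x is existentially quantified in the precondition. We can similarly rewrite the ISL rules as:

$$[x \mapsto -]\ [x] := y\ [ok : x \mapsto y]\ \text{(StoreWeak)} \qquad\qquad [x \mapsto -]\ \mathtt{free}(x)\ [ok : x \not\mapsto]$$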


However, these rules are too weak. For instance, we cannot use StoreWeak to prove [x → 7] [x] := y [*ok*: x → y]. This is because the implications in the premise of the Cons rule are flipped from those in their SL counterpart, and thus to use StoreWeak we must show x → − ⇒ x → 7, which we cannot. Put differently, StoreWeak states that for *some* value v, executing [x] := y on a state satisfying x → v yields a state satisfying x → y. However, this statement is valid for *all* values of v. As such, we strengthen the presumption of Store to x → e, allowing for an arbitrary (universally quantified) expression e at x.

In general, in over-approximate logics (e.g., SL) the aim is to *weaken* the preconditions and *strengthen* the postconditions of specifications as much as possible. This is to ensure that we can optimally apply the Cons rule to adapt the specifications to broader contexts. Conversely, in under-approximate logics (e.g., ISL) we should strengthen the presumptions and weaken the results as much as possible, since the implication directions in the premise of Cons are flipped.

*Remark 2.* The backward reasoning rules of SL [28] are generally unsound for ISL, just as the backward reasoning rules of Hoare logic are unsound for incorrectness logic [35]. For instance, the backward axiom for store is {x → − ∗ (x → y −∗ p)} [x] := y {p}. However, taking p = emp yields an inconsistent precondition, resulting in the triple {false} [x] := y {emp}, which is valid in SL but not in ISL.

**Proving PB-Ok and PB-Client.** We next return to the proof sketch of PB-Ok in Fig. 3. For brevity, rather than giving full derivations, we follow the classical Hoare logic proof outline, annotating each line of the code with its presumption and result. We further comment on each proof step and write, e.g., //Choice to denote an application of Choice. Note that when applying Choice, we *pick* a branch (e.g., the left branch in PB-Ok) to execute. Observe that unlike in SL where one needs to reason about *all* branches, in ISL it suffices to pick and reason about a *single* branch, and the remaining branches are ignored.

As in Hoare logic proof outlines, we assume that Seq2 is applied at every step; i.e., later instructions are executed only if the earlier ones execute successfully. In most steps, we apply Frame to frame off the unused resource r, carry out the instruction effect, and subsequently frame on r. For instance, when verifying z := \* in the proof sketch of PB-Ok, we apply Havoc to pick a non-zero value for z (in this case 1) after the assignment. As such, since the presumption of Havoc is emp, we use Frame to frame off the resource v → a ∗ a → − in the presumption, apply Havoc to obtain z = 1, and subsequently frame on v → a ∗ a → −, yielding z = 1 ∗ v → a ∗ a → −. For brevity, we keep the applications of Frame and Seq2 implicit and omit them in our annotations. The proof of PB-Client in Fig. 3 is then straightforward and applies the PB-Ok specification when calling push_back(v). We refer the reader to the technical appendix [38] where we apply ISL to a further example to detect a null-pointer-dereference bug in OpenSSL.

#### **4 The ISL Model**

**Denotational Semantics.** We present the ISL semantics in Fig. 6. The semantics of a statement C ∈ Comm under an exit condition ε ∈ Exit, written ⟦C⟧ε, is described as a relation on *program states*. A program state, σ ∈ State, is a pair of the form (*s*, *h*), comprising a (variable) *store* *s* ∈ Store and a *heap* *h* ∈ Heap.

**Fig. 6.** The ISL denotational semantics (top); the ISL assertion semantics (bottom).

A store is a function from variables to values. Given a store *s*, expression e and Boolean expression B, we write *s*(e) and *s*(B) for the values to which e and B evaluate under *s*, respectively. These definitions are standard and omitted.

A heap is a partial function from *locations*, Loc, to Val ⊎ {⊥}. We model heaps as partial functions as they may grow gradually by allocating additional locations. We use the designated value ⊥ ∉ Val to track those locations that have been deallocated. That is, given l ∈ Loc, if *h*(l) ∈ Val then l is allocated in *h* and holds value *h*(l); and if *h*(l) = ⊥ then l has been deallocated. As we demonstrate shortly, we use ⊥ to model invalidated assertions such as x ↛.

The semantics in Fig. 6 closely corresponds to the ISL rules in Fig. 5. For instance, ⟦x := [y]⟧*ok* underpins Load, while ⟦x := [y]⟧*er*(−) underpins LoadEr and LoadNull; e.g., if the location at y is deallocated (*h*(*s*(y)) = ⊥), then executing x := [y] terminates erroneously as captured by ⟦x := [y]⟧*er*(−). The semantics of mutation, allocation and deallocation are defined analogously. As shown, skip, assignment and assume(B) never terminate erroneously (e.g., ⟦skip⟧*er*(−) = ∅), and the semantics of their successful execution is standard. The two disjuncts in ⟦C<sub>1</sub>; C<sub>2</sub>⟧ε capture Seq1 and Seq2, respectively. The semantics of C<sub>1</sub> + C<sub>2</sub> is defined as the union of those of its two branches. The semantics of C⋆ is defined as the union of the semantics of zero or more C iterations.
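To make the shape of this relational semantics tangible, here is a small executable sketch for a fragment of the language (skip, assume, sequencing, choice, free, load and store). It is an approximation written from the prose description, not a transcription of Fig. 6: commands are tuples, stores and heaps are dictionaries, `BOT` plays the role of ⊥, and null is modelled as address 0 (an assumption). Allocation, loops and local variables are left out.

```python
BOT = "BOT"   # marks a deallocated location
NULL = 0      # assumption: null is address 0 and is never in the heap


def run(c, s, h):
    """Yield (exit, store, heap) for every execution of c from (s, h).

    If nothing is yielded, the relation has no pair for this input
    (e.g. the address is outside the heap); it does not mean success.
    """
    op = c[0]
    if op == "skip":
        yield ("ok", s, h)
    elif op == "assume":                       # ("assume", predicate on store)
        if c[1](s):
            yield ("ok", s, h)
    elif op == "seq":                          # ("seq", c1, c2)
        for e, s1, h1 in run(c[1], s, h):
            if e != "ok":
                yield (e, s1, h1)              # Seq1: an error short-circuits
            else:
                yield from run(c[2], s1, h1)   # Seq2: continue after ok
    elif op == "choice":                       # ("choice", c1, c2)
        yield from run(c[1], s, h)
        yield from run(c[2], s, h)
    elif op in ("free", "load", "store"):      # (op, label, x[, y])
        label = c[1]
        addr = s[c[3]] if op == "load" else s[c[2]]
        if addr == NULL or h.get(addr) == BOT:
            yield ("er(%s)" % label, s, h)     # null or use-after-free
        elif addr in h:
            if op == "free":                   # ("free", label, x)
                yield ("ok", s, {**h, addr: BOT})
            elif op == "load":                 # ("load", label, x, y): x := [y]
                yield ("ok", {**s, c[2]: h[addr]}, h)
            else:                              # ("store", label, x, y): [x] := y
                yield ("ok", s, {**h, addr: s[c[3]]})


prog = ("seq", ("free", "l1", "x"),
        ("choice", ("free", "l2", "x"), ("skip",)))
for exit_, s, h in run(prog, {"x": 7}, {7: 42}):
    print(exit_, h)
```

Note that freeing a cell rewrites its contents to `BOT` instead of removing the key, so the heap domain is the same before and after every step of this fragment.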

**Heap Monotonicity.** Note that for all C, ε and (σ<sub>p</sub>, σ<sub>q</sub>) ∈ ⟦C⟧ε, the (domain of the) underlying heap *monotonically grows* from σ<sub>p</sub> to σ<sub>q</sub> and *never shrinks*. In particular, whilst the heap domain grows via allocation, all other base cases (including deallocation) leave the domain of the heap (i.e., the heap size) unchanged: deallocation merely updates the value of the given location in the heap to ⊥ and thus does not alter the heap domain. This is in contrast to the original SL model [28], where deallocation *removes* the given location from the heap, and thus the underlying heap may grow or shrink. As we discuss shortly, this monotonicity is the key reason why our model supports a footprint theorem.

**ISL Assertion Semantics.** The *semantics of ISL assertions* is given at the bottom of Fig. 6 via the function ⦇.⦈ : Ast → P(State), interpreting each assertion as a set of states. The semantics of classical and Boolean assertions are standard and omitted. As described in Sect. 3, emp describes states in which the heap is empty; and e → e' describes states of the form (*s*, *h*) in which *h* contains a single location at *s*(e) with value *s*(e'). Analogously, e ↛ describes states of the form (*s*, *h*) in which *h* contains a single deallocated location at *s*(e). Finally, the interpretation of p ∗ q contains a state σ iff it can be split into two parts, σ = σ<sub>p</sub> • σ<sub>q</sub>, such that σ<sub>p</sub> and σ<sub>q</sub> are included in the interpretations of p and q, respectively. The function • : State × State ⇀ State given at the bottom of Fig. 6 denotes *state composition*, and is defined when the constituent stores agree and the heaps are disjoint. For brevity, we often write σ ∈ p for σ ∈ ⦇p⦈.

**Semantic Incorrectness Triples.** We next present the formal interpretation of ISL triples. Recall from Sect. 3 that an ISL triple [p] C [ε: q] states that every state in q is reachable from some state in p under ε. Put formally:

$$\models [p] \; \mathbb{C} \; [\epsilon : q] \stackrel{\text{def}}{\iff} \forall \sigma_q \in q. \; \exists \sigma_p \in p. \; (\sigma_p, \sigma_q) \in [\![\mathbb{C}]\!]\epsilon$$

Finally, in the following theorem we show that the ISL proof rules are *sound*: if a triple [p] C [ε: q] is derivable using the rules in Fig. 5, then ⊨ [p] C [ε: q] holds.

**Theorem 1 (Soundness).** *For all* p, C, ε, q*, if* ⊢ [p] C [ε: q]*, then* ⊨ [p] C [ε: q]*.*

#### **4.1 The Footprint Theorem**

The frame rule of SL enables *local* reasoning about a command C by concentrating only on the parts of the memory that are accessed by C, i.e., the C *footprint*:

'To understand how a program works, it should be possible for reasoning and specification to be confined to the cells that the program actually accesses. The value of any other cell will automatically remain unchanged.' [36]

Local reasoning is then enabled by semantic observations about the local effect of heap accesses. In what follows we describe some of the semantic structure underpinning under-approximate local reasoning, including how it differs from the classic over-approximate theory. Our main result is a footprint theorem, stating that the meaning of a command C is determined by its action on the "small" part of the memory accessed by C (i.e., the C footprint). The overall meaning of C can then be obtained by "fleshing out" its footprint.

To see this, consider the following example:

$$\begin{array}{l} 1.\ \mathtt{free}(y); \\ 2.\ l_2\colon \mathtt{free}(y) + \mathtt{free}(x); \\ 3.\ l_3\colon \mathtt{free}(x) + \mathtt{skip} \end{array} \tag{foot}$$

For simplicity, let us ignore variable stores for the moment and consider the executions of foot from an initial heap *h* ≜ [l<sub>x</sub> → 1, l<sub>y</sub> → 2, l<sub>z</sub> → 3], containing locations l<sub>x</sub>, l<sub>y</sub> and l<sub>z</sub>, corresponding to variables x, y and z, respectively. Note that starting from *h*, foot gives rise to four executions depending on the + branches taken at lines 2 and 3. Let us consider the successful execution from *h* that first frees y, then frees x (the right branch of + on line 2), and finally executes skip (the right branch of + on line 3). The footprint of this execution from *h* is then given by (*ok*: [l<sub>x</sub> → 1, l<sub>y</sub> → 2], [l<sub>x</sub> → ⊥, l<sub>y</sub> → ⊥]), denoting an *ok* execution from the initial sub-heap [l<sub>x</sub> → 1, l<sub>y</sub> → 2], yielding the final sub-heap [l<sub>x</sub> → ⊥, l<sub>y</sub> → ⊥] upon termination. That is, the initial and final sub-heaps in the footprint do not include the untouched location l<sub>z</sub> as it remains unchanged, and the overall effect of foot is obtained from its footprint by adding l<sub>z</sub> → 3 to both the initial and final sub-heaps; i.e., by "fleshing out" the footprint.

Next, consider the execution in which the left branch of + on line 2 is taken, resulting in a use-after-free error. The footprint of this second execution from *h* is given by (*er*(l<sub>2</sub>): [l<sub>y</sub> → 2], [l<sub>y</sub> → ⊥]), denoting an error at l<sub>2</sub>. Note that as this execution terminates erroneously at l<sub>2</sub>, unlike in the first execution, location l<sub>x</sub> remains untouched by foot and is thus not included in the footprint.

Put formally, let foot(.) : Comm → Exit → P(State × State) denote a *footprint function* such that foot(C) ε describes the *minimal* state needed for *some* C execution under ε: if ((*s*, *h*), (*s'*, *h'*)) ∈ foot(C) ε, then *h* contains only the locations accessed by some C execution, yielding *h'* on termination. In Fig. 7 we present an excerpt of foot(.), with its full definition given in [38].

$$\begin{aligned} \mathsf{foot}(\mathbb{C}_1 + \mathbb{C}_2)\,\epsilon & \triangleq \mathsf{foot}(\mathbb{C}_1)\,\epsilon \cup \mathsf{foot}(\mathbb{C}_2)\,\epsilon \\ \mathsf{foot}(\mathrm{L}\colon \mathtt{free}(x))\,ok & \triangleq \left\{ ((s, [l \mapsto v]), (s, [l \mapsto \bot])) \mid s(x) = l \wedge v \in \mathrm{Val} \right\} \\ \mathsf{foot}(\mathrm{L}\colon \mathtt{free}(x))\,er(\mathrm{L}') & \triangleq \left\{ ((s, [l \mapsto \bot]), (s, [l \mapsto \bot])) \mid \mathrm{L} = \mathrm{L}' \wedge s(x) = l \right\} \\ & \qquad \cup \left\{ ((s, h_0), (s, h_0)) \mid \mathrm{L} = \mathrm{L}' \wedge s(x) = \mathsf{null} \right\} \end{aligned}$$

**Fig. 7.** The foot(.) function (excerpt), where *h*<sub>0</sub> denotes an empty heap (*dom*(*h*<sub>0</sub>) = ∅).

Our footprint theorem (Theorem 2) then states that any pair (σ<sub>p</sub>, σ<sub>q</sub>) resulting from executing C (i.e., (σ<sub>p</sub>, σ<sub>q</sub>) ∈ ⟦C⟧ε) can be obtained by fleshing out a pair (σ'<sub>p</sub>, σ'<sub>q</sub>) in the C footprint (i.e., (σ'<sub>p</sub>, σ'<sub>q</sub>) ∈ foot(C) ε): (σ<sub>p</sub>, σ<sub>q</sub>) = (σ'<sub>p</sub> • σ<sub>r</sub>, σ'<sub>q</sub> • σ<sub>r</sub>) for some σ<sub>r</sub>.

**Theorem 2 (Footprints).** *For all* C *and* ε*:* ⟦C⟧ε = frame(foot(C) ε)*, where* frame(R) ≜ {(σ<sub>p</sub> • σ<sub>r</sub>, σ<sub>q</sub> • σ<sub>r</sub>) | (σ<sub>p</sub>, σ<sub>q</sub>) ∈ R}*.*
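The "fleshing out" step can be tried on the successful free from the Fig. 7 excerpt. The sketch below ignores stores (as the foot example does), uses dictionaries for heaps with `BOT` for ⊥, and picks arbitrary integer addresses for l<sub>x</sub>, l<sub>y</sub>, l<sub>z</sub>; the helper names are illustrative assumptions. It checks one instance of the equation in Theorem 2, not the theorem itself.

```python
BOT = "BOT"   # marks a deallocated location


def compose(h1, h2):
    """Heap composition; None when the domains overlap."""
    return None if h1.keys() & h2.keys() else {**h1, **h2}


def free_ok(addr, h):
    """Full-heap outcomes of a successful free of addr from heap h."""
    return [{**h, addr: BOT}] if addr in h and h[addr] != BOT else []


def foot_free_ok(addr, values):
    """Footprint pairs of a successful free: only the freed cell appears."""
    return [({addr: v}, {addr: BOT}) for v in values]


def frame(pairs, frames):
    """Flesh out footprint pairs with every compatible frame heap."""
    out = []
    for hp, hq in pairs:
        for hr in frames:
            before, after = compose(hp, hr), compose(hq, hr)
            if before is not None and after is not None:
                out.append((before, after))
    return out


lx, ly, lz = 1, 2, 3                       # hypothetical addresses
h = {lx: 1, ly: 2, lz: 3}
fleshed = frame(foot_free_ok(ly, values=[2]), frames=[{lx: 1, lz: 3}])
print(fleshed == [(h, after) for after in free_ok(ly, h)])
print(frame(foot_free_ok(ly, values=[2]), frames=[{ly: 9}]))
```

The second call shows why the frame must be disjoint from the footprint: a frame that also claims l<sub>y</sub> cannot be composed, so it contributes no execution.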

We note that our footprint theorem is a positive by-product of the ISL *model* and *not* the ISL logic. That is, the footprint theorem is an added bonus of the heap monotonicity in the ISL model, brought about by negative heap resources, and is orthogonal to the notion of under-approximation. As such, the footprint theorem would be analogously valid in the original SL model, were we to alter its model to achieve heap monotonicity through negative heaps. That said, there are important differences with the classic SL theory, which we discuss next.

#### **4.2 Differences with the Classic (Over-Approximate) Theory**

Existing work [14,40] presents footprint theorems for classical SL based on the notion of *safe states*; i.e., those that do not lead to erroneous executions. This is understandable as the informal reasoning which led to the frame rule for SL was based on safety [36,45]. According to the *fault-avoiding interpretation* of an SL triple $\{p\}\ \mathbb{C}\ \{q\}$, deemed invalid when a state in $p$ leads to an error, if $\mathbb{C}$ accesses a location outside $p$, then this leads to a safety violation. As such, any location not guaranteed to exist in $p$ must remain unchanged, thereby yielding the frame rule. The existing footprint theorems were for safe states only.

By contrast, our theorem considers footprints involving both unsafe and safe states. For instance, given the foot program and an initial state (e.g., $h$ in Sect. 4.1), we distinguished a footprint leading to an erroneous execution (e.g., $(\mathit{er}(\mathsf{L}_2)\colon [l_y \mapsto 2], [l_y \mapsto \bot])$) from one leading to a safe execution (e.g., $(\mathit{ok}\colon [l_x \mapsto 1, l_y \mapsto 2], [l_x \mapsto \bot, l_y \mapsto \bot])$). This distinction is important, as otherwise we could not distinguish further bugs that follow a safe execution. To see this, consider a second error in foot, namely the possible use-after-free of x on line 3, following a successful execution of lines 1 and 2.

For reasoning about incorrectness, it is essential that we consider unsafe states when accounting for why things work; this is a technical difference with the classic footprint results. But it also points to a deeper conceptual difference between the correctness and incorrectness theories. Above, we explained how safety, and its violation, played a crucial role in justifying the frame rule of over-approximate SL. However, as we describe below, ISL and its frame rule do not rely on safety.

As shown in [35], an under-approximate triple can be equivalently defined as: $[p]\ \mathbb{C}\ [\epsilon\colon q] \overset{\text{def}}{\iff} \mathsf{post}(\mathbb{C}, p) \supseteq q$, where $\mathsf{post}(\mathbb{C}, p)$ describes the states obtained by executing $\mathbb{C}$ on $p$. While this under-approximate definition equivalently justifies the frame rule, the analogous over-approximate (Hoare) triple obtained by flipping $\supseteq$ (i.e., $\{p\}\ \mathbb{C}\ \{q\} \overset{\text{def}}{\iff} \mathsf{post}(\mathbb{C}, p) \subseteq q$) invalidates the frame rule:

$$\frac{\{\text{true}\}[x] := 23\{\text{true}\}}{\{x \mapsto 17 \ast \text{true}\}[x] := 23\{x \mapsto 17 \ast \text{true}\}} \text{ (FRAME)}$$

The premise of this derivation is valid according to the standard interpretation of over-approximate triples, but its conclusion (obtained by framing on $x \mapsto 17$) certainly is not, as it states that the value of x remains unchanged after mutation.
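The failed derivation can be replayed on a small finite model. The Python sketch below is only an illustration under stated assumptions: heaps range over two locations and the values 17 and 23, x is fixed to location 1, and `post_store` collects only the successful outcomes of [x] := 23, so faulting executions simply contribute nothing (which is exactly what the naive reading post(C, p) ⊆ q ignores).

```python
from itertools import product

X = 1                           # assumed: x denotes location 1
LOCS, VALS = (1, 2), (17, 23)   # assumed tiny universe


def all_heaps():
    """All partial maps from LOCS to VALS, each as a frozenset of (loc, value)."""
    return {
        frozenset((l, v) for l, v in zip(LOCS, choice) if v is not None)
        for choice in product((None,) + VALS, repeat=len(LOCS))
    }


TRUE = all_heaps()                                     # the assertion true
X_IS_17 = {h for h in TRUE if dict(h).get(X) == 17}    # x ↦ 17 ∗ true


def post_store(states, value=23):
    """Successful outcomes of [x] := value; states where x is unallocated fault."""
    out = set()
    for h in states:
        cells = dict(h)
        if X in cells:
            cells[X] = value
            out.add(frozenset(cells.items()))
    return out


def hoare_naive(p, q):
    """Naive over-approximate triple: post(C, p) ⊆ q, with faults ignored."""
    return post_store(p) <= q


premise_valid = hoare_naive(TRUE, TRUE)            # {true} [x] := 23 {true}
conclusion_valid = hoare_naive(X_IS_17, X_IS_17)   # after framing on x ↦ 17
print(premise_valid, conclusion_valid)
```

In this model the premise holds while the framed conclusion does not, since every successful outcome stores 23 at x.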

The frame rule is then recovered by strengthening the $\{p\}\ \mathbb{C}\ \{q\}$ interpretation, *either* by requiring that executing $\mathbb{C}$ on $p$ not fault (fault avoidance), *or* by "baking in" frame preservation: $\forall r.\ \mathsf{post}(\mathbb{C}, p \ast r) \subseteq q \ast r$. Both solutions then invalidate the premise of the above derivation. We found it remarkable that our ISL theory is consistent with the technically simpler interpretation of triples – namely as $\mathsf{post}(\mathbb{C}, p) \supseteq q$, the dual of Hoare's interpretation – and that it supports a simple footprint theorem at once, again in contrast to the over-approximate theory.

#### **5 Begin-Anywhere, Intra-procedural Symbolic Execution**

ISL lends itself naturally to the definition of forward symbolic execution analyses. We demonstrate that using the ISL rules, it is straightforward to derive a *begin-anywhere*, *intra-procedural* analysis that allows us to infer valid ISL triples *automatically* for a given piece of code, with the goal of finding only true bugs reachable from an initial state. This is implemented in the intraprocedural-only mode of the Pulse analysis inside Infer [18] (accessible by passing --pulse --pulse-intraprocedural-only to infer). The analysis follows principles from bi-abduction [11], but takes its most successful application – bug catching [18] – as the sole objective. This allows us to make a number of adjustments and to obtain an analysis that is a much closer fit to the ISL theory of under-approximation than the original bi-abduction analysis was to the SL theory of over-approximation.

The original bi-abduction analysis in Abductor [11] and Infer [18] aimed at discovering fault-avoiding specifications for procedures. Ideally, one would find specifications for *all* procedures in the codebase, all the way to an entry-point (e.g., the main() function), thus proving the program safe. In practice, however, virtually all sizable codebases have bugs, and known abstract domains are imprecise when proving memory safety for large codebases. As such, specifications were found for only 40–70% of the procedures in the experiments of [11]. Nonetheless, proof failures, a by-product of proof search, became practically more valuable than proofs, as they can indicate errors. Complex heuristics came into play to classify proof failures and to report to the programmer those more likely to be errors. These heuristics have not been given a formal footing, contributing to the gap between the theory of proofs and the practice of bug catching.

**Fig. 8.** Symbolic heaps (above) and selected symbolic execution rules (below).

Pulse approaches bug reporting more directly: by looking for bugs. It infers under-approximate specifications, while recording invalidated addresses. If such an address is later accessed, a bug is reported soundly, in line with the theory.
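The idea of recording invalidated addresses can be sketched in a few lines of Python. This is a hypothetical toy, not the Pulse implementation: commands are straight-line (operation, address) pairs, the names `analyze`, `presumption` and `reports` are ours, and an address never seen before is optimistically assumed to be allocated on entry.

```python
def analyze(commands):
    """Toy straight-line analysis over ("store" | "load" | "free", address) pairs.

    Returns the addresses presumed allocated on entry and the list of reports.
    """
    valid, invalid = set(), set()
    presumption = []              # addresses we had to assume allocated on entry
    reports = []
    for line, (op, addr) in enumerate(commands, start=1):
        if addr in invalid:
            # The address is known to be invalidated: the error is manifest.
            reports.append((line, op, addr))
            break                 # an error ends this path
        if addr not in valid:
            # Unknown address: assume it is allocated and remember the assumption.
            valid.add(addr)
            presumption.append(addr)
        if op == "free":
            valid.discard(addr)
            invalid.add(addr)     # record the invalidation for later accesses
    return presumption, reports


print(analyze([("store", "x"), ("free", "x"), ("store", "x")]))
print(analyze([("store", "x")]))
```

The first call reports the store on line 3 because x was freed on line 2; the second reports nothing, since no instruction in the analyzed fragment invalidated x.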

**Symbolic Execution.** In Fig. 8 we present our symbolic execution as big-step, syntax-directed inference rules of the form $[p_0]\ \mathbb{C}_0\ [\epsilon_0\colon q_0]\ \mathbb{C} \rightsquigarrow [p]\ \mathbb{C}_0; \mathbb{C}\ [\epsilon\colon q]$, which can be read as: "having already executed $\mathbb{C}_0$ yielding (discovering) the presumption $p_0$ and the result $q_0$, then executing $\mathbb{C}$ yields the presumption $p$ and result $q$". As is standard in SL-based tools [4,11], our abstract states consist of $\ast$-conjoined predicates, with the notable addition of the invalidated assertion and omission of inductive predicates. The latter are not needed because we never perform the over-approximation steps that would introduce them.

SE-Seq describes how the symbolic execution goes forward step by step. SE-Choice describes how the analysis computes one specification per path taken in the program. To ensure termination, loops are unrolled up to a fixed bound $N_{loops}$, borrowing from symbolic bounded model checking [12]. These two ideas avoid the arduous task of inventing join and widen operators [15]. For added efficiency, in practice we also limit the maximum number of paths leading to the same program point to a fixed bound $N_{disjuncts}$. The $N_{loops}$ and $N_{disjuncts}$ bounds give us easy "knobs" to tune the precision of the analysis. Note that pruning paths by limiting disjuncts is also sound for under-approximate reasoning [35].
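A minimal Python sketch of the two knobs follows. It is an assumption-laden illustration rather than the analysis itself: programs are nested tuples, a "path" is just the sequence of atomic command names it executes (no symbolic states are computed), and the disjunct bound is approximated by truncating the list of paths returned for each sub-command.

```python
def paths(cmd, n_loops, n_disjuncts):
    """Enumerate explored paths of cmd as tuples of atomic command names.

    cmd is one of ("atom", name), ("seq", c1, c2), ("choice", c1, c2),
    ("loop", body). Loops are unrolled 0..n_loops times, and at most
    n_disjuncts paths are kept per sub-command.
    """
    kind = cmd[0]
    if kind == "atom":
        found = [(cmd[1],)]
    elif kind == "seq":
        firsts = paths(cmd[1], n_loops, n_disjuncts)
        seconds = paths(cmd[2], n_loops, n_disjuncts)
        found = [a + b for a in firsts for b in seconds]
    elif kind == "choice":
        # One result per branch: no join operator is needed.
        found = paths(cmd[1], n_loops, n_disjuncts) + paths(cmd[2], n_loops, n_disjuncts)
    elif kind == "loop":
        body = paths(cmd[1], n_loops, n_disjuncts)
        found, frontier = [()], [()]          # zero iterations is always a path
        for _ in range(n_loops):
            frontier = [p + b for p in frontier for b in body][:n_disjuncts]
            found += frontier
    else:
        raise ValueError(f"unknown command: {kind}")
    return found[:n_disjuncts]                # dropping paths only loses coverage


prog = ("seq", ("choice", ("atom", "a"), ("atom", "b")), ("loop", ("atom", "c")))
full = paths(prog, n_loops=2, n_disjuncts=10)
pruned = paths(prog, n_loops=2, n_disjuncts=2)
print(len(full), pruned)
```

Every path kept under the tighter bound is also found under the looser one, which mirrors why limiting disjuncts trades coverage for speed without introducing spurious paths.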

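To analyze a program $\mathbb{C}$, we start from $\mathbb{C}_0 = \mathsf{skip}$ and produce $[\mathsf{emp}]\ \mathsf{skip}\ [\mathit{ok}\colon \mathsf{emp}]\ \mathbb{C} \rightsquigarrow [p]\ \mathsf{skip}; \mathbb{C}\ [\epsilon\colon q]$. As $\models [\mathsf{emp}]\ \mathsf{skip}\ [\mathit{ok}\colon \mathsf{emp}]$ holds and symbolic execution rules preserve validity, we then obtain valid triples for $\mathbb{C}$ by Theorem 3.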

**Theorem 3 (Soundness of Symbolic Execution).** *If* $\models [p_0]\ \mathbb{C}_0\ [\epsilon_0\colon q_0]$ *and* $[p_0]\ \mathbb{C}_0\ [\epsilon_0\colon q_0]\ \mathbb{C} \rightsquigarrow [p]\ \mathbb{C}_0; \mathbb{C}\ [\epsilon\colon q]$*, then* $\models [p]\ \mathbb{C}_0; \mathbb{C}\ [\epsilon\colon q]$*.*

Symbolic execution of individual commands follows the derived SymbExec rule below, with the side-condition that $\mathsf{mod}(\mathbb{C}_0) \cap \mathsf{fv}(M) = \mathsf{mod}(\mathbb{C}) \cap \mathsf{fv}(F) = \emptyset$:

$$\frac{\dfrac{[p_0]\ \mathbb{C}_0\ [\mathit{ok}\colon q_0]}{[p_0 \ast M]\ \mathbb{C}_0\ [\mathit{ok}\colon q_0 \ast M]} \qquad q_0 \ast M \dashv p \ast F \qquad \dfrac{[p]\ \mathbb{C}\ [\epsilon\colon q]}{[p \ast F]\ \mathbb{C}\ [\epsilon\colon q \ast F]}}{[p_0 \ast M]\ \mathbb{C}_0; \mathbb{C}\ [\epsilon\colon q \ast F]} \text{ (SymbExec)}$$

If executing $\mathbb{C}_0$ yields the presumption $p_0$ and the current state $q_0$, then SymbExec allows us to execute the next command $\mathbb{C}$ with specification $[p]\ \mathbb{C}\ [\epsilon\colon q]$. This may 1) materialize a state $M$ that is *missing* from $q_0$ (and is needed to execute $\mathbb{C}$); and 2) carry over an unchanged *frame* $F$. The unknowns $M$ and $F$ in the *bi-abduction question* $p \ast F \vdash q_0 \ast M$ have analogous counterparts in over-approximate bi-abduction; but, as in the Cons rule, their roles have flipped: the *frame* $F$ is *abduced*, while the missing $M$ is *framed* (or *anti-abduced*).

**Bi-abduction and ISL.** Bi-abduction is arguably a better fit for ISL than SL: in SL adding the missing $M$ to the overall precondition $p_0$ is only valid for straight-line code, and not across control flow branches. Intuitively, there is no guarantee that a safe precondition for one path is safe for the other. This is especially the case in the presence of non-determinism or over-approximation of Boolean conditions, where one cannot find definitive predicates to force the analysis down one path. It is thus necessary to *re-execute* the whole procedure on the inferred preconditions, eliminating those that are not safe for all paths. By contrast, in our setting SE-Choice is *sound*, and this re-execution is not needed!

We allow the analysis to abduce information only for *successful* execution; *erroneous* executions have to be *manifest* and realizable using only the information at hand. We do this by requiring $M$ to be $\mathsf{emp}$ in SymbExec when applied to error triples. We go even further and require that the implication be in both directions, i.e., that the current state *force* the error – note that if $q \vdash x \not\mapsto \ast\, \mathsf{true}$ then there exists $F$ such that $x \not\mapsto \ast\, F \vdash q$, and similarly for $q \vdash x = \mathsf{null} \ast \mathsf{true}$. This is a practical choice and only one of many ways to decide *where* to report, trying to avoid blaming the code for issues it did not itself cause. For instance, thanks to this restriction, we do not report on [x] := 10 (which has error specifications through StoreEr and StoreNull) unless a previous instruction actively invalidated x. This choice also chimes well with the fact that the analysis can *start anywhere* in a program and give results relevant to the code analyzed.

Solving the bi-abduction entailment in SymbExec can be done using the techniques developed for SL [11, §3]. We do not detail them here as they are straightforwardly adapted to our simpler setting without inductive predicates.

**Finding a Bug in** client**, Automatically.** We now describe how Pulse automatically finds a proof of the bug in the unannotated code of client from Fig. 3, by automatically applying the only possible symbolic execution rule at each step. Starting from $\mathsf{emp}$ and going past the first instruction x := [v] requires solving $v \mapsto u \ast F \vdash \mathsf{emp} \ast M$. The bi-abduction entailment solver then answers with $F = \mathsf{emp}$ and $M = v \mapsto u$, yielding the inferred presumption $v \mapsto u$ and the next current state $v \mapsto u \ast x = u$. The next instruction is the call to push_back(v). For ease of presentation, let us consider this library call as an axiomatized instruction that has been given the specification in Fig. 3. This corresponds to writing a model for it in the analyzer, which is actually the case in the implementation, although the analysis would work equally well if we were to inline the code inside client. Applying SymbExec requires solving the entailment $v \mapsto a \ast a \mapsto w \ast F \vdash v \mapsto u \ast x = u \ast M$. The solver then answers with the solution $F = (x = u \ast a = u)$ and $M = u \mapsto w$. Finally, the following instance of SE-StoreEr is used to report an error, where $\mathbb{C}$ = skip; x := [v]; push_back(v) and $q_{rx} = v \mapsto a' \ast a' \mapsto w \ast a \not\mapsto \ast\, x = u \ast a = u$:

$$\begin{aligned} & [v \mapsto u \ast u \mapsto w]\ \mathbb{C}\ [\mathit{ok}\colon q_{rx}]\ \ \mathsf{L}_{rx}\colon [x] := 88 \\ & \quad \rightsquigarrow\ [v \mapsto u \ast u \mapsto w]\ \mathbb{C};\ \mathsf{L}_{rx}\colon [x] := 88\ [\mathit{er}(\mathsf{L}_{rx})\colon q_{rx}] \end{aligned}$$
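The first bi-abduction step of this walkthrough can be mimicked on a drastically simplified fragment. The Python sketch below is an assumption-heavy illustration, not the solver of [11]: symbolic heaps are dictionaries of exact points-to cells, there are no pure facts, and logical variables are never unified, so it cannot reproduce the push_back step (which needs to equate $a$ with $u$). The function name `biabduce` is hypothetical.

```python
def biabduce(current, required):
    """Solve  required * F |- current * M  for flat points-to heaps.

    current:  cells we already hold (q0), as {address: value}
    required: cells the next command's presumption needs (p)
    Returns (F, M), or None if a shared address disagrees on its value.
    """
    for addr in current.keys() & required.keys():
        if current[addr] != required[addr]:
            return None                       # no solution in this fragment
    # M: cells the command needs but we do not have (added to the presumption).
    missing = {a: v for a, v in required.items() if a not in current}
    # F: cells we have but the command does not touch (carried over unchanged).
    frame = {a: v for a, v in current.items() if a not in required}
    return frame, missing


# First step of the client example: from emp, x := [v] needs v ↦ u.
print(biabduce({}, {"v": "u"}))
```

With these definitions both sides of the entailment denote the same set of cells, so the implication holds in both directions in this fragment.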

**Preliminary Results.** Our analysis handles the examples in this paper, modulo function inlining. While our analysis shows how to derive a sound static analysis from first principles, it does not yet fully exploit the theory, as it does not handle function calls, and in particular *summarization*. Under-approximate triples pave the way towards succinct summaries. However, this is a subtle problem, requiring significant theoretical and empirical work out of the scope of this initial paper.

Pragmatically, we can make Pulse scale by skipping over procedure calls instead of inlining them, in effect assuming that the call has no effect beyond assigning fresh (non-deterministic) values to the return address and the parameters passed by reference – note that such fresh values are treated optimistically by Pulse as we do not know them to be invalid. In theory, this may cause false positives and false negatives, but in practice we observed that such an analysis reports very few issues. For instance, it reports no issues on OpenSSL 1.0.2d (with 8681 C functions) at the time of writing, and only 17 issues on our proprietary C++ codebase of hundreds of thousands of procedures. As expected, the analysis is very fast and scales well (6 s for OpenSSL, running on a Linux machine with 24 cores). Moreover, 30 disjuncts suffice to detect all 17 issues (in comparison, using 20 disjuncts misses 1 issue, while using 100 disjuncts detects no more issues than using 30 disjuncts), and varying loop unrollings between 1–10 has no effect.

We also ran Pulse in production at Facebook and reported issues to developers as they submit code changes, where bugs are more likely than in mature codebases. Over the course of 4 months, Pulse reported 20 issues to developers, of which 15 were fixed. This deployment relies crucially on the begin-anywhere capability: though the codebase in question has 10s of MLOC, analysing a code change starts from the changed files and usually visits only a small fraction of the codebase.

**Under-Approximation in Pulse.** Pulse achieves under-approximate reasoning in several ways. First, Pulse uses the under-approximate Choice, Loop1 and Loop2 rules in Fig. 5 which prune paths by considering one execution branch (Choice) or finite loop unrollings (Loop1 and Loop2). Second, Pulse does not use Alloc2, and thus prunes further paths. Third, Pulse uses under-approximate models of certain library procedures; e.g., the vector::push_back() model assumes the internal array is always deallocated. Finally, our bi-abduction implementation assumes that memory locations are distinct unless known otherwise, thus leading to further path pruning. These choices are all sound thanks to the under-approximate theory of ISL; it is nevertheless possible to make different pragmatic choices.

Although our implementation does not do it, we can use ISL to derive strongest posts for primitive statements, using a combination of their axioms and the Frame, Disj and Exist rules. Given the logic fragment we use (which excludes inductive predicates) and a programming language with Boolean conditions restricted to a decidable fragment, there is likely a bounded decidability result obtained by unrolling loops up to a given bound and then checking the strongest post on each path. However, the ability to under-approximate (by forgetting paths/disjuncts) gives us the leeway to tune a deployment for optimizing the bugs/minute rate: in one experiment, we found that running Pulse on a codebase with 100s of kLOC and a limit of 20 disjuncts was ∼3.1x faster in user time than running it with a limit of 50 disjuncts, and yet found 97% of the issues found in the 50-disjuncts case.

*Remark 3.* Note that although the underlying heaps in ISL grow monotonically, the impact on the size of the manipulated states in our analysis is comparable to that of the original bi-abductive analysis for SL [11]. This is in part thanks to the compositionality afforded by ISL and its footprint property (Theorem 2), especially when individual procedures analyzed are not too big. In particular, the original bi-abduction work for SL already tracks the allocated memory; in ISL we additionally track deallocated memory which is of the same order of magnitude.

#### **6 Context, Related Work and Conclusions**

Although the foundations of program verification have been mostly developed with correctness in mind, industrial uses of symbolic reasoning often derive value from their deployment as *bug catchers* rather than *provers* of bug absence. There is a fundamental tension in correctness-based techniques, most thoroughly explored in the model checking field, between compact representations versus strength and utility of counter-examples. Abstraction techniques are typically used to increase compactness. This has the undesired side-effect that counterexamples become "abstract": they may be infeasible, in that they may not actually witness a concrete execution that violates a given property. Using proofs of bugs, this paper aims to provide a symbolic mechanism to express the *definite* existence of a concrete counter-example, without committing to a particular one, while simultaneously enabling sound, compositional, local reasoning. Our working hypothesis is that bugs are a fundamental enough phenomenon to warrant a fundamental compositional theory for reasoning positively about their existence, rather than only being about failed proofs. We hope that future work will explore the practical ramifications of these foundational ideas more thoroughly.

Amongst static bug-catching techniques, there is a dichotomy between the highly scalable, compositional static tools such as Coverity [5], Facebook Infer [18] and those deployed at Google [42], which suffer from false positives as well as negatives, and the under-approximating global bug hunters such as fuzzers [23] and symbolic executors [9], which suffer from scalability limitations but not false positives (at least, ideally). In a recent survey, Godefroid remarks "How to engineer exhaustive symbolic testing (that is, a form of verification) in a cost-effective manner is still an open problem for large applications" [23]. The ability to apply compositional analyses incrementally to large codebases has led to considerable impact that is complementary to that of the global analyses. But, compositional techniques can have less precision compared to global ones: examining all call sites of a procedure can naturally lead to more precise results.

Our illustrative analysis, Pulse, starts from the scalable end of the spectrum and moves towards the under-approximate end. An equally valid research direction would be to start from existing under-approximate analyses and make them more scalable and with lower start-up-cost. There has indeed been valuable research in this direction. For example, SMART [22] tries to make symbolic execution more scalable by using summaries as in inter-procedural static analysis, and UC-KLEE [39] allows symbolic execution to begin anywhere, and thus does not need a complete program. UC-KLEE uses a "lazy initialization" mechanism to synthesize assumptions about data structures; this is not unlike the bi-abductive approach here and in [10]. An interesting research question is whether this similarity can be made rigorous. There are many papers on marrying under- and over-approximation e.g., [1], but they often lack the scalability that is crucial to the impact of modular bug catchers. In general, there is a large unexplored territory, relevant to Godefroid's open problem stated above, between the existing modular but not-quite-under-approximate bug catchers such as Infer and Coverity, and the existing global and under-approximate tools such as KLEE [8], CBMC [12] and DART [24]. This paper provides not a solution, but a step in the exploration.

Gillian [20] is a platform for developing symbolic analysis tools using a symbolic execution engine based on separation logic. Gillian has C and JavaScript instantiations for precise reasoning about a finite unwinding of a program, similar to symbolic bounded model checking. Gillian's execution engine is currently exact for primitive commands (it is both over- and under-approximate); however, it uses over-approximate bi-abduction for function calls, and is thus open to false positives (Petar Maksimović, personal communication). We believe Gillian can be modified to embrace under-approximation more strongly, serving as a general engine for proving ISL specifications. Aiming for under-approximate results rather than exact ones gives additional flexibility to the analysis designer, just as aiming for over-approximate rather than exact results does for correctness tools.

Many assertion languages for heap reasoning have been developed, including ones not based on SL (e.g., [3,27,31,46]). We do not claim that, compared to these alternatives, the ISL assertion language in this paper is particularly advantageous for reasoning along individual paths, or exhaustive (but bounded) reasoning about complete programs. Rather, the key point is that our analysis solves abduction and anti-abduction problems, which in turn facilitates its application to large codebases. In particular, as our analysis synthesizes contextual heap assumptions (using anti-abduction), it can begin anywhere in a codebase instead of starting from main(). For example, it can start on a modified function that is part of a larger program: this capability enables continuous deployment in codebases with millions of LOC [18,34]. To our knowledge, the cited assertion languages have only ever been applied in a whole-program fashion on small codebases (with low thousands of LOC). We speculate that this is not because of the assertion languages *per se*: if methods to solve analogues of abduction and anti-abduction queries were developed, perhaps they too could be applied to large codebases.

It is natural to consider how the ideas of ISL extend to concurrency. The RacerD analyzer [25] provided a static analysis for data races in concurrent programs; this analysis was provably under-approximate under certain assumptions. RacerD was intuitively inspired by concurrent separation logic (CSL [6]), but did not match the over-approximate CSL theory (just as Infer did not match SL). We speculate that RacerD and other concurrency analyses might be seen as constructing proofs in a yet-to-be-defined incorrectness version of CSL, a logic which would aim at finding bugs in concurrent programs via modular reasoning.

Our approach supports reasoning that is local not only in code, but also in state (spatial locality). Spatially local symbolic heap update has led to advances in scalability of global shape analyses of mutable data structures, where heap predicates are modified in-place in a way reminiscent of operational in-place update, and where transfer functions need not track global heap information [44]. Mutable data structures have been suggested as one area where classic symbolic execution has scaling challenges, and SL has been employed with human-directed proof on heap-intensive components to aid the overall scalability of symbolic execution [37]. An interesting question is whether spatial locality in the analysis can benefit scalability of fully automatic, global, under-approximate analyses.

We probed the semantic fundamentals underpinning local reasoning in Sect. 4, including a footprint theorem (Theorem 2) that is independent of the logic. The semantic principles are more deeply fundamental than the surface syntax of the logic. Indeed, in the early days of work on SL, it was remarked that local reasoning flows from locality properties of the semantics, and that separation logic is but one convenient syntax to exploit these [45]. Since then, a number of correctness logics with non-SL syntax have been proposed for local reasoning (e.g., [33] and its references) that exploit the semantic locality of heap update, and it stands to reason that the same will be possible for incorrectness logics.

Relating this paper to the timeline of SL for correctness, we have developed the basic logic (like [36] but under-approximate) and a simple local intraprocedural analysis (like [19] but under-approximate). We have not yet made the next steps to relatively-scalable global analyses [44] or extremely-scalable interprocedural, compositional ones [11]. These future directions are challenging for theory and especially practice, and are the subject of ongoing and future work.

**Conclusions.** Long ago, Dijkstra (in)famously remarked that "testing can be quite effective for showing the presence of bugs, but is hopelessly inadequate for showing their absence" [17], and he advocated the use of logic for the latter. As noted by others, many of the benefits of logic hold for both bug catching and verification, particularly the ability to cover many states and paths succinctly, even if not the alluring all. But there remains a frustrating division between testing and verification, where e.g., distinct tools are used for each. With more research on the fundamentals of symbolic bug catching and correctness, division may be replaced by unified foundations and toolsets in the future. For under-approximate reasoning in particular, we hope that bug catching eventually becomes more modular, scalable, easier to deploy and with elegant foundations similar to those of verification. This paper presents but one modest step towards that goal.

**Acknowledgments.** We thank Petar Maksimović, Philippa Gardner, and the CAV reviewers for their feedback, and Ralf Jung for fruitful discussions in early stages of this work. This work was supported in part by a European Research Council (ERC) Consolidator Grant for the project "RustBelt", funded under the European Union's Horizon 2020 Framework Programme (grant no. 683289).

## **References**



## **Stochastic Systems**

## **Maximum Causal Entropy Specification Inference from Demonstrations**

Marcell Vazquez-Chanlatte and Sanjit A. Seshia

University of California, Berkeley, USA marcell.vc@eecs.berkeley.edu

**Abstract.** In many settings, such as robotics, demonstrations provide a natural way to specify tasks. However, most methods for learning from demonstrations either do not provide guarantees that the learned artifacts can be safely composed or do not explicitly capture temporal properties. Motivated by this deficit, recent works have proposed learning Boolean *task specifications*, a class of Boolean non-Markovian rewards which admit well-defined composition and explicitly handle historical dependencies. This work continues this line of research by adapting maximum *causal* entropy inverse reinforcement learning to estimate the posterior probability of a specification given a multi-set of demonstrations. The key algorithmic insight is to leverage the extensive literature and tooling on reduced ordered binary decision diagrams to efficiently encode a time-unrolled Markov Decision Process. This enables transforming a naïve algorithm with running time exponential in the episode length into a polynomial time algorithm.

### **1 Introduction**

In many settings, episodic demonstrations provide a natural and robust mechanism to partially specify a task, even in the presence of errors. For example, consider the agent operating in the gridworld illustrated in Fig. 1. Blue arrows denote intended actions and the solid black arrow shows the agent's actual path. This path can stochastically differ from the blue arrows due to a downward wind. One might naturally ask: "What task was this agent attempting to perform?" Even without knowing if this was a positive or negative example, based on the agent's state/action sequence, one can reasonably infer the agent's intent, namely, "reach the yellow tile while avoiding the red tiles." Compared with traditional learning from positive and negative examples, this is somewhat surprising, particularly given that the task is never actually demonstrated in Fig. 1.

This problem, inferring intent from demonstrations, has received a fair amount of attention over the past two decades, particularly within the robotics community [5,22,30,33]. In this literature, one traditionally models the demonstrator as operating within a dynamical system whose transition relation only depends on the current state and action (called the Markov condition). However, even if the dynamics are Markovian, many tasks are naturally modeled in history-dependent (non-Markovian) terms, e.g., "if the robot enters a blue tile, then it must touch a brown tile *before* touching a yellow tile". Unfortunately, most methods for learning from demonstrations either do not provide guarantees that the learned artifacts (e.g. rewards) can be safely composed or do not explicitly capture history dependencies [30].

**Fig. 1.** Example of an agent unsuccessfully demonstrating the task "reach a yellow tile while avoiding red tiles". (Color figure online)

Motivated by this deficit, recent works have proposed specializing to **task specifications**, a class of Boolean non-Markovian rewards induced by formal languages. This additional structure admits well-defined compositions and explicitly captures temporal dependencies [15,30]. A particularly promising direction has been to adapt maximum entropy inverse reinforcement learning [33] to task specifications, enabling a form of robust specification inference, even in the presence of unlabeled demonstration errors [30].

However, while powerful, the principle of maximum entropy is limited to settings where the dynamics are deterministic or where agents use open-loop policies [33]. This is because the principle of maximum entropy incorrectly allows the agent's predicted policy to depend on future state values, resulting in an overly optimistic agent [19]. For instance, in our gridworld example (Fig. 1), the principle of maximum entropy would discount the possibility of slipping, and thus we would not forecast the agent to correct its trajectory after slipping once.

This work continues this line of research by instead using the principle of maximum *causal* entropy, which generalizes the principle of maximum entropy to general stochastic decision processes [32]. While this is a conceptually straightforward extension, a naïve application of maximum *causal* entropy inverse reinforcement learning to non-Markovian rewards results in an algorithm with run-time exponential in the episode length, a phenomenon sometimes known as the **curse of history** [24]. The key algorithmic insight in this paper is to leverage the extensive literature and tooling on Reduced Ordered Binary Decision Diagrams (BDDs) [3] to efficiently encode the time-unrolled composition of the dynamics and task specification. This allows us to translate a naïve exponential time algorithm into a polynomial time algorithm. In particular, we shall show that this BDD has size at most linear in the episode length, making inference comparatively efficient.

#### **1.1 Related Work**

Our work is intimately related to the fields of Inverse Reinforcement Learning and Grammatical Inference. **Grammatical inference** [8] refers to the well-developed literature on learning a formal grammar (often an automaton) from data. Examples include learning the smallest automaton that is consistent with a set of positive and negative strings [7,8] or learning an automaton using membership and equivalence queries [1]. This and related work can be seen as extending these methods to unlabeled and potentially noisy demonstrations, where demonstrations differ from examples due to the existence of a dynamics model. This notion of demonstration derives from the Inverse Reinforcement Learning literature.

In **Inverse Reinforcement Learning** (IRL) [22] the demonstrator, operating in a stochastic environment, is assumed to attempt to (approximately) optimize some unknown reward function over the trajectories. In particular, one traditionally assumes a trajectory's reward is the sum of state rewards of the trajectory. This formalism offers a succinct mechanism to encode and generalize the goals of the demonstrator to new and unseen environments.

In the IRL framework, the problem of learning from demonstrations can then be cast as a Bayesian inference problem [25] to predict the most probable reward function. To make this inference procedure well-defined and robust to demonstration/modeling noise, Maximum Entropy [33] and Maximum Causal Entropy [32] IRL appeal to the principles of maximum entropy [13] and maximum causal entropy respectively [32]. This results in a likelihood over the demonstrations which is no more committed to any particular behavior than what is required to match observed statistical features, e.g., average distance to an obstacle. While this approach was initially limited to rewards represented as linear combinations of scalar features, IRL has been successfully adapted to arbitrary function approximators such as Gaussian processes [20] and neural networks [5]. As stated in the introduction, while powerful, traditional IRL provides no principled mechanism for composing the resulting rewards.

**Compositional RL:** To address this deficit, composition using soft optimality has recently received a fair amount of attention; however, the compositions are limited to either strict disjunction (do X *or* Y) [26,27] or conjunction (do X *and* Y) [6]. Further, this soft optimality only bounds the deviation from simultaneously optimizing both rewards. Thus, optimizing the composition does not preclude violating safety constraints embedded in the rewards (e.g., do not enter the red tiles).

**Logic Based IRL:** Another promising approach for introducing compositionality has been the recent research on automata and logic based encodings of rewards [11,14] which admit well defined compositions. To this end, work has been done on inferring Linear Temporal Logic (LTL) formulas by finding the specification that minimizes the expected number of violations by an optimal agent compared to the expected number of violations by an agent applying actions uniformly at random [15]. The computation of the optimal agent's expected violations is done via dynamic programming on the explicit product of the deterministic Rabin automaton [4] of the specification and the state dynamics. A fundamental drawback of this procedure is that due to the curse of history, it incurs a heavy run-time cost, even on simple two state and two action Markov Decision Processes. Additionally, as with early work on grammatical inference and IRL, these techniques do not produce likelihood estimates amenable to Bayesian inference.

**Maximum Entropy Specification Inference:** In our previous work [30], we adapted maximum entropy IRL to learn task specifications. Similar to standard maximum entropy IRL, this technique produces robust likelihood estimates. However, due to the use of the principle of maximum entropy, rather than maximum *causal* entropy, this model is limited to settings where the dynamics are deterministic or where agents use open-loop policies [33].

**Inference Using BDDs:** This work makes heavy use of Binary Decision Diagrams (BDDs) [3] which are frequently used in symbolic value iteration for Markov Decision Processes [9] and reachability analysis for probabilistic systems [18]. However, the literature has largely relied on Multi-Terminal BDDs to encode the transition probabilities for a **single** time step. In contrast, this work introduces a two-terminal encoding based on the finite unrolling of a probabilistic circuit. To the best of our knowledge, the most similar usage of BDDs for inference appears in the independently discovered literal weight based encoding of [10] - although their encoding does not directly support non-determinism or state-indexed random variables.

**Contributions:** The primary contributions of this work are twofold. First, we leverage the principle of maximum causal entropy to provide the likelihood of a specification given a set of demonstrations. This formulation removes the deterministic and/or open-loop restriction imposed by prior work based on the principle of maximum entropy. Second, to mitigate the curse of history, we propose using a BDD to encode the time-unrolled Markov Decision Process that the maximum causal entropy forecaster is defined over. We prove that this BDD has size that grows linearly with the horizon and quasi-linearly with the number of actions. Furthermore, we prove that our derived likelihood estimates are robust to the particular reward associated with satisfying the specification. Finally, we provide an initial experimental validation of our method. An overview of this pipeline is provided in Fig. 8.

## **2 Problem Setup**

We seek to learn task specifications from demonstrations provided by a teacher who executes a sequence of actions that probabilistically change the system state. For simplicity, we assume that the sets of actions and states are finite and fully observed. Further, until Sect. 5.3, we shall assume that all demonstrations have a fixed length, *τ* ∈ ℕ. Formally, we begin by modeling the underlying dynamics as a probabilistic automaton.

**Definition 1** *A probabilistic automaton (PA) is a tuple (S, s<sub>0</sub>, A, δ), where S is the finite set of states, s<sub>0</sub> ∈ S is the initial state, A is a finite set of actions, and δ specifies the transition probability of going from state s to state s′ given action a, i.e., δ(s, a, s′) = Pr(s′ | s, a). A trace<sup>a</sup>, ξ, is a sequence of (action, state) pairs implicitly starting from s<sub>0</sub>. A trace of length τ ∈ ℕ is an element of (A × S)<sup>τ</sup>.*

*<sup>a</sup> sometimes referred to as a trajectory or behavior.*

Note that probabilistic automata are equivalently characterized as *1½ player games* where each round has the agent choose an action and then the environment samples a state transition outcome. In fact, this alternative characterization is implicitly encoded in the directed bipartite graph used to visualize probabilistic automata (see Fig. 2b). In this language, we refer to the nodes where the agent makes a decision as a **decision node** and the nodes where the environment samples an outcome as a **chance node**.
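To make the decision node / chance node reading concrete, the following is a minimal sketch of a probabilistic automaton as a Python dictionary together with a trace sampler. The two-state automaton, the action names, and the helper `sample_trace` are hypothetical and chosen only for illustration; they are not taken from the gridworld of Fig. 2.

```python
import random

# Hypothetical two-state PA: DELTA[(s, a)] maps each successor s' to Pr(s' | s, a).
DELTA = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 0.8, "s1": 0.2},
}


def sample_trace(delta, s0, actions, rng):
    """Roll out a fixed action sequence and return a list of (action, state) pairs."""
    trace, s = [], s0
    for a in actions:  # decision node: the agent commits to action a
        successors = delta[(s, a)]
        # chance node: the environment samples the next state
        s = rng.choices(list(successors), weights=list(successors.values()))[0]
        trace.append((a, s))
    return trace


trace = sample_trace(DELTA, "s0", ["go", "go", "stay"], random.Random(0))
print(trace)
```

As in Definition 1, the sampled trace omits the initial state and has one (action, state) pair per time step.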

Next, we develop machinery to distinguish between desirable and undesirable traces. For simplicity, we focus on finite trace properties, referred to as specifications, that are decidable within some fixed *τ* ∈ ℕ time steps, e.g., "Recharge before t = 20."

**Fig. 2.** Example of gridworld probabilistic automata (PA).

**Definition 2** *A task specification, ϕ, (or simply specification) is a subset of traces. For simplicity, we shall assume that each trace is of a fixed length τ ∈ ℕ, i.e.,*

$$
\varphi \subseteq (A \times S)^{\tau} \tag{1}
$$

*A collection of specifications, Φ, is called a concept class. Further, we define true ≝ (A × S)<sup>τ</sup>, ¬ϕ ≝ true \ ϕ, and false ≝ ¬true.*

Often specifications are not directly given as sets, but induced by abstract descriptions of a task. For example, the task "avoid lava" induces a concrete set of traces that never enter lava tiles. If the workspace/world/dynamics change, this abstract specification would map to a different set of traces.

#### **2.1 Specification Inference from Demonstrations**

The primary task in this paper is to find the specification that best explains/forecasts the behavior of an agent. As in our prior work [30], we formalize our problem statement as:

**Definition 3** *The specification inference from demonstrations problem is a tuple (M, X, Φ, D) where M = (S, s<sub>0</sub>, A, δ) is a probabilistic automaton, X is a (multi-)set of τ-length traces drawn from an unknown distribution induced by a teacher attempting to demonstrate (satisfy) some unknown task specification within M, Φ is a concept class of specifications, and D is a prior distribution over Φ. A solution to (M, X, Φ, D) is:*

$$\varphi^\* \in \operatorname\*{arg\,max}\_{\varphi \in \Phi} \text{Pr}(X \mid M, \varphi) \cdot \Pr\_{\varphi \sim D}(\varphi) \tag{2}$$

*where* Pr(*X* | *M*, *ϕ*) *denotes the likelihood that the teacher would have demonstrated X given the task ϕ.*

Of course, by itself, the above formulation is ill-posed as Pr(*X* | *M*, *ϕ*) is left undefined. Below, we shall propose leveraging Maximum Causal Entropy Inverse Reinforcement Learning (IRL) to select the demonstration likelihood distribution in a regret-minimizing manner.

## **3 Leveraging Inverse Reinforcement Learning**

The key idea of Inverse Reinforcement Learning (IRL), or perhaps more accurately Inverse Optimal Control, is to find the reward structure that best explains the actions of a reward optimizing agent operating in a Markov Decision Process. We formalize below.

**Definition 4** *A Markov Decision Process (MDP) is a probabilistic automaton endowed with a reward map from states to reals, r : S → ℝ. This reward mapping is lifted to traces via,*

$$R(\xi) \stackrel{\text{def}}{=} \sum\_{s \in \xi} r(s). \tag{3}$$

*Remark 1.* Note that a temporal discount factor, *γ* ∈ [0, 1], can be added into (3) by introducing a sink state, \$, to the MDP, where *r*(\$) = 0 and each step outside the sink continues with probability *γ*, i.e.,

$$\Pr(s'=\$ \mid s,a) = \begin{cases} 1 - \gamma & \text{if } s \neq \$ \\ 1 & \text{otherwise} \end{cases}.\tag{4}$$

Given an MDP, the goal of an agent is to maximize the expected trace reward. In this work, we shall restrict ourselves to rewards that are given as a linear combination of **state features**, **f** : *S* → ℝ<sup>*n*</sup><sub>≥0</sub>, i.e.,

$$r(s) = \theta \cdot \mathbf{f}(s) \tag{5}$$

for some *θ* ∈ ℝ<sup>*n*</sup>. Note that since state features can themselves be rewards, such a restriction does not actually restrict the space of possible rewards.

*Example 1.* Let the components of **f**(*s*) be distances to various locations on a map. Then the choice of *θ* characterizes the relative preferences in avoiding/reaching the respective locations.

Formally, we model an agent as acting according to a **policy**.

**Definition 5** *A policy, π, is a state-indexed distribution over actions,*

$$\Pr(a \mid s) = \pi(a \mid s). \tag{6}$$

In this language, the agent's goal is equivalent to finding a policy which maximizes the expected trace reward. We shall refer to a trace generated by such an agent as a **demonstration**. Due to the Markov requirement, the likelihood of a demonstration, *ξ*, given a particular policy, *π*, and probabilistic automaton, *M*, is easily stated as:

$$\Pr(\xi \mid M, \pi) = \prod\_{s', a, s \in \xi} \Pr(s' \mid s, a) \cdot \Pr(a \mid s). \tag{7}$$

Thus, the likelihood of a multi-set of i.i.d. demonstrations, *X*, is given by:

$$\Pr(X \mid M, \pi) = \prod\_{\xi \in X} \Pr(\xi \mid M, \pi). \tag{8}$$
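The products in (7) and (8) are conveniently evaluated in log space. The sketch below assumes a dictionary-based encoding of the automaton and policy; the names `DELTA`, `UNIFORM`, `trace_log_likelihood`, and `demos_log_likelihood` are hypothetical and serve only to illustrate the computation.

```python
import math

# Hypothetical two-state PA: DELTA[(s, a)] maps each successor s' to Pr(s' | s, a).
DELTA = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 0.8, "s1": 0.2},
}
# Hypothetical policy: POLICY[s][a] = pi(a | s); here uniform over both actions.
UNIFORM = {s: {"stay": 0.5, "go": 0.5} for s in ("s0", "s1")}


def trace_log_likelihood(trace, s0, delta, policy):
    """log Pr(xi | M, pi) as in (7): one policy term and one transition term per step."""
    logp, s = 0.0, s0
    for a, s_next in trace:
        # math.log raises ValueError on a zero-probability step (an impossible trace).
        logp += math.log(policy[s][a]) + math.log(delta[(s, a)][s_next])
        s = s_next
    return logp


def demos_log_likelihood(demos, s0, delta, policy):
    """log Pr(X | M, pi) as in (8), assuming i.i.d. demonstrations."""
    return sum(trace_log_likelihood(xi, s0, delta, policy) for xi in demos)


xi = [("go", "s1"), ("stay", "s1")]
print(math.exp(trace_log_likelihood(xi, "s0", DELTA, UNIFORM)))
```

For this trace the product is 0.5 · 0.8 · 0.5 · 1.0, which the sketch recovers up to floating point error.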

#### **3.1 Inverse Reinforcement Learning (IRL)**

As previously stated, the main motivation in introducing the MDP formalism has been to discuss the inverse problem. Namely, given a set of demonstrations, find the reward that best "explains" the agent's behavior, where by "explain" one typically means that under the conjectured reward, the agent's behavior was approximately optimal. Notice however, that many undesirable rewards satisfy this property. For example, consider the following reward in which every demonstration is optimal,

$$r: s \mapsto 0. \tag{9}$$

Furthermore, observe that given a fixed reward, many policies are approximately optimal! For instance, using (9), an optimal agent could pick actions uniformly at random or select a single action to always apply.

#### **3.2 Maximum Causal Entropy IRL**

A popular, and in practice effective, solution to the lack of unique policy conundrum is to appeal to the **principle of maximum causal entropy** [32]. To formalize this principle, we recall the definitions of causally conditioned probability [17] and causal entropy [17,23].

**Definition 6** *Let X<sub>1:τ</sub> ≝ X<sub>1</sub>, …, X<sub>τ</sub> denote a temporal sequence of τ ∈ ℕ random variables. The probability of a sequence Y<sub>1:τ</sub> causally conditioned on sequence X<sub>1:τ</sub> is:*

$$\Pr(Y\_{1:\tau} \mid \mid X\_{1:\tau}) \stackrel{\text{def}}{=} \prod\_{t=1}^{\tau} \Pr(Y\_t \mid X\_{1:t}, Y\_{1:t-1}) \tag{10}$$

*The causal entropy of Y<sub>1:τ</sub> given X<sub>1:τ</sub> is defined as,*

$$H(Y\_{1:\tau} \mid \mid X\_{1:\tau}) \stackrel{\text{def}}{=} \underset{Y\_{1:\tau}, X\_{1:\tau}}{\mathbb{E}} [-\log(\Pr(Y\_{1:\tau} \mid \mid X\_{1:\tau}))] \tag{11}$$

In the case of inverse reinforcement learning, the principle of maximum causal entropy suggests forecasting using the policy whose action sequence, *A*<sub>1:*τ*</sub>, has the highest causal entropy, conditioned on the state sequence, *S*<sub>1:*τ*</sub>. That is, find the policy that maximizes

$$H(A\_{1:\tau} \parallel S\_{1:\tau}),\tag{12}$$

subject to feature matching constraints, 𝔼[**f**], e.g., does the resulting policy, *π*<sup>∗</sup>, complete the task as seen in the data. Compared to all other policies, this policy (i) minimizes regret with respect to model/reward uncertainty, (ii) ensures that the agent's predicted policy does not depend on the future, and (iii) is consistent with observed feature statistics [32].

Concretely, as proved in [32], when an agent is attempting to maximize the sum of feature state rewards, ∑<sub>*t*=1</sub><sup>*T*</sup> *θ* · **f**(*s<sub>t</sub>*), the principle of maximum causal entropy prescribes the following policy:

#### **Maximum Causal Entropy Policy:**

$$\log\left(\pi\_{\theta}(a\_t \mid s\_t)\right) \stackrel{\text{def}}{=} Q\_{\theta}(a\_t, s\_t) - V\_{\theta}(s\_t) \tag{13}$$

where

$$\begin{split} Q\_{\theta}(a\_{t},s\_{t}) \stackrel{\text{def}}{=} & \mathbb{E} \left[ V\_{\theta}(s\_{t+1}) \mid s\_{t}, a\_{t} \right] + \theta \cdot \mathbf{f}(s\_{t}) \\ V\_{\theta}(s\_{t}) \stackrel{\text{def}}{=} & \ln \sum\_{a\_{t}} e^{Q\_{\theta}(a\_{t},s\_{t})} \stackrel{\text{def}}{=} & \text{softmax}\_{a\_{t}} Q\_{\theta}(a\_{t},s\_{t}). \end{split} \tag{14}$$

where *θ* is chosen such that (14) results in a policy which matches the feature statistics of the demonstrations.

*Remark 2.* Note that replacing softmax with max in (14) yields the standard Bellman Backup [2] used to compute the optimal policy in tabular reinforcement learning. Further, it can be shown that maximizing causal entropy corresponds to believing that the agent is exponentially biased towards high reward policies [32]:

$$\Pr(\pi\_{\theta} \mid M) \propto \exp\left(\mathbb{E}[R\_{\theta}(\xi) \mid \pi\_{\theta}, M]\right),\tag{15}$$

where (14) is the most likely policy under (15).

*Remark 3.* In the special case of scalar state features, **f** : *S* → ℝ<sub>≥0</sub>, the maximum causal entropy policy (14) becomes increasingly optimal as *θ* ∈ ℝ increases (since softmax monotonically approaches max). In this setting, we shall refer to *θ* as the agent's **rationality coefficient**.
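The recursion in (13) and (14) can be sketched as a finite-horizon backward pass. The code below is an illustration under stated assumptions: a scalar state feature, a hypothetical two-state automaton, and a boundary condition in which, once no actions remain, the value of a state is just its own reward *θ* · **f**(*s*). The function name `soft_backup` is likewise hypothetical.

```python
import math

STATES = ("s0", "s1")
ACTIONS = ("stay", "go")
# Hypothetical dynamics and scalar feature (only s1 is rewarding).
DELTA = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 0.8, "s1": 0.2},
}
F = {"s0": 0.0, "s1": 1.0}


def soft_backup(states, actions, delta, f, theta, tau):
    """Backward pass of (14) over tau decision steps; returns (V_0, per-step policies)."""
    V = {s: theta * f[s] for s in states}  # assumed boundary: no actions remain
    policy = []
    for _ in range(tau):
        Q = {s: {a: theta * f[s] + sum(p * V[s2] for s2, p in delta[(s, a)].items())
                 for a in actions}
             for s in states}
        V = {s: math.log(sum(math.exp(q) for q in Q[s].values())) for s in states}
        # (13): log pi(a | s) = Q(a, s) - V(s); earlier steps go to the front.
        policy.insert(0, {s: {a: math.exp(Q[s][a] - V[s]) for a in actions}
                          for s in states})
    return V, policy


_, pi_low = soft_backup(STATES, ACTIONS, DELTA, F, theta=0.0, tau=2)
_, pi_mid = soft_backup(STATES, ACTIONS, DELTA, F, theta=2.0, tau=1)
_, pi_high = soft_backup(STATES, ACTIONS, DELTA, F, theta=4.0, tau=1)
print(pi_low[0]["s0"]["go"], pi_mid[0]["s0"]["go"], pi_high[0]["s0"]["go"])
```

With *θ* = 0 the policy is uniform, and the probability of the rewarding action grows with *θ*, in line with the rationality coefficient reading of Remark 3.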

#### **3.3 Non-Markovian Rewards**

The MDP formalism traditionally requires that the reward map be Markovian (i.e., state based); however, in practice, many tasks are history dependent, e.g. touch a red tile and then a blue tile.

A common trick within the reinforcement learning literature is to simply change the MDP and add the necessary history to the state so that the reward is Markovian, e.g. a flag for touching a red tile. However, in the case of inverse reinforcement learning, by definition, one does not know what the reward is. Therefore, one cannot assume to a priori know what history suffices.

Further exacerbating the situation is the fact that naïvely including the entire history into the state results in an exponential increase in the number of states. Nevertheless, as we shall soon see, by restricting the class of rewards to represent task specifications, this curse can be mitigated to only result in a blow-up that is at most **linear** in the state space size and in the trace length!

To this end, we shall find it fruitful to develop machinery for embedding the full trace history into the state space. Explicitly, we shall refer to the process of adding all history to a probabilistic automaton's (or MDP's) state as **unrolling**.

**Definition 7** *Let M = (S, s<sub>0</sub>, A, δ) be a PA. The unrolling of M is a PA, M′ = (S′, s<sub>0</sub>, A, δ′), where*

$$\begin{aligned} S' &= \{s\_0\} \times \bigcup\_{i=0}^{\infty} (A \times S)^i \\ \delta'(\xi\_{n+1}, a, \xi\_n) &= \delta(s\_{n+1}, a, s\_n) \\ \xi\_n &= s\_0, \dots, (a\_{n-1}, s\_n) \\ \xi\_{n+1} &= s\_0, \dots, (a\_n, s\_{n+1}) \end{aligned} \tag{16}$$

If *R* : *S*<sup>*τ*</sup> → ℝ is a non-Markovian reward over *τ*-length traces, then we endow the corresponding unrolled PA with the now Markovian reward,

$$r'(s\_0, \dots, (a\_{n-1}, s\_n)) \stackrel{\text{def}}{=} \begin{cases} R(s\_0, \dots, s\_n) & \text{if } n = \tau \\ 0 & \text{otherwise} \end{cases} \tag{17}$$

Further, by construction the reward is Markovian in *S′* and depends only on *τ*-length state sequences,

$$\sum\_{t=0}^{\infty} r'(s\_0, \dots, (a\_{t-1}, s\_t)) = R(s\_0, \dots, s\_\tau). \tag{18}$$

Next, observe that for *τ*-length traces, the 1½ player game formulation's bipartite graph forms a tree of depth *τ* (see Fig. 3). Further, observe that each leaf corresponds to a unique *τ*-length trace. Thus, to each leaf, we associate the corresponding trace's reward, *R*(*ξ*). We shall refer to this tree as a **decision tree**, denoted 𝕋.

**Fig. 3.** Decision tree generated by the PA shown in Fig. 2 and specification "By *τ* = 2, reach a yellow tile while avoiding red tiles.". Here a binary reward is given depending on whether or not the agent satisfies the specification. (Color figure online)

Finally, observe that the trace reward depends only on the sequence of agent actions, *A*, and environment actions, *A<sub>e</sub>*. That is, 𝕋 can be interpreted as a function:

$$\mathbb{T}: (A \times A\_e)^{\tau} \to \mathbb{R}.\tag{19}$$

#### **3.4 Specifications as Non-Markovian Rewards**

Next, with the intent to frame our specification inference problem as an inverse reinforcement learning problem, we shall overload notation and denote by *ϕ* the following non-Markovian reward corresponding to a specification *ϕ* ⊆ (*A* × *S*)<sup>*τ*</sup>,

$$\varphi(\xi) \stackrel{\text{def}}{=} \begin{cases} 1 & \text{if } \xi \in \varphi \\ 0 & \text{otherwise} \end{cases} . \tag{20}$$

Note that the corresponding decision tree is then a Boolean predicate:

$$\mathbb{T}\_{\varphi}: (A \times A\_e)^{\tau} \to \{0, 1\}. \tag{21}$$

#### **3.5 Computing Maximum Causal Entropy Specification Policies**

Now let us return to the problem of computing the policy prescribed by (14). In particular, note that viewing the unrolled reward (17) as a scalar state feature results in the following soft-Bellman Backup:

$$\begin{aligned} Q\_{\theta}(a\_t, \xi\_t) &= \mathbb{E}\left[V\_{\theta}(\xi\_{t+1}) \mid \xi\_t, a\_t\right] \\ V\_{\theta}(\xi\_t) &= \begin{cases} \theta \cdot \varphi(\xi\_t) & \text{if } t = \tau \\ \text{softmax}\_{a\_t} Q\_{\theta}(a\_t, \xi\_t) & \text{otherwise} \end{cases} \end{aligned} \tag{22}$$

where *ξ<sub>i</sub>* ∈ {*s*<sub>0</sub>} × (*A* × *S*)<sup>*i*</sup> denotes a state in the unrolled MDP.

Equation (22) thus suggests a naïve dynamic programming scheme over 𝕋 starting at the *t* = *τ* leaves to compute *Q<sub>θ</sub>* and *V<sub>θ</sub>* (and thus *π<sub>θ</sub>*).

Namely, in 𝕋, the chance nodes, which correspond to action/state pairs, are responsible for computing *Q* values and the decision nodes, which correspond to states waiting for an action to be applied, are responsible for computing *V* values. For decision nodes this is done by taking the softmax of the values of the child nodes. Similarly, for chance nodes, this is done by taking a weighted average of the child nodes, where the weights correspond to the probability of a given transition. This, at least conceptually, corresponds to transforming 𝕋 into a bipartite computation graph (see Fig. 4).

**Fig. 4.** Computation graph generated from applying (14) to the decision tree shown in Fig. 3. Here smax and avg denote the softmax and weighted average respectively.
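The naïve scheme suggested by (22) can be sketched as a recursion over the unrolled tree: leaves return *θ* · *ϕ*(*ξ*), chance nodes take a probability-weighted average, and decision nodes take a softmax. The automaton, the example specification `eventually_s1`, and the function `soft_value` are hypothetical names introduced only for this illustration, and the cost is exponential in *τ* as discussed in the text.

```python
import math

ACTIONS = ("stay", "go")
# Hypothetical two-state PA: DELTA[(s, a)] maps each successor s' to Pr(s' | s, a).
DELTA = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 0.8, "s1": 0.2},
}


def eventually_s1(history):
    """Hypothetical specification: the trace visits s1 at least once."""
    return any(s == "s1" for _, s in history)


def soft_value(history, s, tau, actions, delta, spec, theta):
    """V_theta(xi_t) from (22), where history is the tuple of (action, state) pairs so far."""
    if len(history) == tau:  # leaf of the unrolled tree
        return theta * float(spec(history))
    qs = []
    for a in actions:  # decision node: one Q value per action
        # chance node: weighted average over sampled successors
        q = sum(p * soft_value(history + ((a, s2),), s2, tau, actions, delta, spec, theta)
                for s2, p in delta[(s, a)].items())
        qs.append(q)
    return math.log(sum(math.exp(q) for q in qs))  # softmax over actions


v_indifferent = soft_value((), "s0", 2, ACTIONS, DELTA, eventually_s1, theta=0.0)
v_always = soft_value((), "s0", 2, ACTIONS, DELTA, lambda h: True, theta=3.0)
v_spec = soft_value((), "s0", 2, ACTIONS, DELTA, eventually_s1, theta=3.0)
print(v_indifferent, v_spec, v_always)
```

Two sanity checks follow from the definitions: with *θ* = 0 every leaf is 0 and each decision level contributes ln |*A*|, and a specification that always holds simply shifts that value by *θ*.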

Next, note that (i) the above dynamic programming scheme can be trivially modified to compute the expected trace reward of the maximum causal entropy policy and (ii) the expected reward increases<sup>1</sup> with the rationality coefficient *θ*.

Observe then that, due to monotonicity, bisection (binary search) approximates *θ* to tolerance *ε* in *O*(log(1/*ε*)) time. Additionally, notice that the likelihood of each demonstration can be computed by traversing the path of length *τ* in 𝕋 corresponding to the trace and multiplying the corresponding policy and transition probabilities (7). Therefore, if |*A<sub>e</sub>*| ∈ ℕ denotes the maximum number of outcomes the environment can choose from (i.e., the branching factor for chance nodes), it follows that the run-time of this naïve scheme is:

$$O\left(\overbrace{\left(|A|\cdot|A\_{e}|\right)^{\tau}}^{\text{compute policy}} \cdot \underbrace{\log(1/\epsilon)}\_{\text{Feature Matching}} + \underbrace{\tau|X|}\_{\text{evaluate elements}}\right).\tag{23}$$

<sup>1</sup> Formally, this is due to (a) softmax and average being monotonic, (b) trajectory rewards only increasing with *θ*, and (c) *π* exponentially biasing towards high Q-values.
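The bisection step can be sketched in a deliberately small setting. The code below assumes a single decision (*τ* = 1) in which action *a* satisfies the specification with probability *p<sub>a</sub>*, so that (22) gives *Q*(*a*) = *θ* · *p<sub>a</sub>* and a policy proportional to exp(*θ* · *p<sub>a</sub>*). The functions `expected_sat` and `fit_theta`, the search interval, and the tolerance are hypothetical choices for illustration.

```python
import math


def expected_sat(theta, success_probs):
    """Satisfaction probability of the one-step maximum causal entropy policy."""
    weights = [math.exp(theta * p) for p in success_probs]  # pi(a) ∝ exp(theta * p_a)
    return sum(w * p for w, p in zip(weights, success_probs)) / sum(weights)


def fit_theta(target, success_probs, lo=0.0, hi=50.0, eps=1e-6):
    """Bisection on theta; relies on expected_sat being non-decreasing in theta."""
    while hi - lo > eps:  # O(log(1 / eps)) iterations
        mid = 0.5 * (lo + hi)
        if expected_sat(mid, success_probs) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


probs = [0.2, 0.8]  # hypothetical per-action satisfaction probabilities
theta = fit_theta(0.7, probs)
print(theta, expected_sat(theta, probs))
```

The target must lie between the satisfaction probability of the uniform policy (*θ* = 0) and that of the best action; otherwise the search simply returns an endpoint of the interval.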

#### **3.6 Task Specification Rewards**

Of course, the problem with this naïve approach is that explicitly encoding the unrolled tree, T, results in an exponential blow-up in the space and time complexity. The key insight in this paper is that the additional structure of task specifications enables avoiding such costs while still being expressive. In particular, as is exemplified in Fig. 4, the computation graphs for task specifications are often highly redundant and apt for compression.

In particular, we shall apply the following two semantics-preserving transformations: (i) eliminate nodes whose children are isomorphic sub-graphs, i.e., inconsequential decisions; and (ii) combine all isomorphic sub-graphs, i.e., equivalent decisions. We refer to the limit of applying these two operations as a **reduced ordered probabilistic decision diagram** and shall denote<sup>2</sup> the reduced variant of 𝕋 as 𝒯.

**Fig. 5.** Reduction of the decision tree shown in Fig. 3.

*Remark 4.* For those familiar, we emphasize that these decision diagrams are MDPs, not Binary Decision Diagrams (see Sect. 4). Importantly, more than two actions can be taken from a node if max(|*A*|, |*A<sub>e</sub>*|) > 2 and *A<sub>e</sub>* has a state-dependent probability distribution attached to it. That said, the above transformations are **exactly** the reduction rules for BDDs [3].

As Fig. 5 illustrates, reduced decision diagrams can be much smaller than their corresponding decision tree. Nevertheless, we shall briefly postpone characterizing |𝒯| until developing some additional machinery in Sect. 4. Computationally, three problems remain.


<sup>2</sup> Mnemonic: 𝒯 is a (typographically) slimmed down variant of 𝕋.

We shall postpone discussing solutions to the second and third problems until Sect. 4. The first problem, however, can readily be addressed with the tools at hand. Recall that in the variable ordering, nodes alternate between decision and chance nodes (i.e., agent and environment decisions), and thus alternate between taking a softmax and expectations of child values in (22). Next, by definition, if a node is skipped in 𝒯, then it must have been inconsequential. Thus the trace reward must have been independent of the decision made at that node. Therefore, the softmaxes/expectations corresponding to eliminated nodes must have been over a constant value; otherwise the eliminated sequences would be distinguishable w.r.t. *ϕ*. The result is summarized in the following identities, where *α* denotes the value of an eliminated node's children.

$$\text{softmax}(\overbrace{\alpha, \dots, \alpha}^{|A|}) = \ln(e^{\alpha} + \dots + e^{\alpha}) = \ln(|A|) + \alpha \tag{24}$$

$$\mathbb{E}[\alpha] = \sum\_{x} p(x)\alpha = \alpha \tag{25}$$

Of course, it could also be the case that a sequence of nodes is skipped in 𝒯. Using (24) and (25), one can compute the change in value, *Δ*, that the eliminated sequence of *n* decision nodes and any number of chance nodes would have applied in 𝕋:

$$\Delta(n,\alpha) = \ln(|A|^n) + \alpha = n\ln(|A|) + \alpha \tag{26}$$
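Identities (24) to (26) are easy to confirm numerically. The sketch below explicitly evaluates a chain of inconsequential decision and chance nodes and compares the result against the closed form; the helper names `skipped_value` and `delta_shortcut`, as well as the specific numbers, are hypothetical.

```python
import math


def softmax(values):
    """Log-sum-exp, i.e., the softmax operator of (14)."""
    return math.log(sum(math.exp(v) for v in values))


def skipped_value(n, alpha, num_actions, chance_probs):
    """Evaluate n inconsequential decision nodes, each with a chance node below it."""
    value = alpha
    for _ in range(n):
        value = sum(p * value for p in chance_probs)  # (25): expectation of a constant
        value = softmax([value] * num_actions)        # (24): adds ln|A|
    return value


def delta_shortcut(n, alpha, num_actions):
    """Closed form (26)."""
    return n * math.log(num_actions) + alpha


explicit = skipped_value(3, 0.7, 4, [0.25, 0.75])
shortcut = delta_shortcut(3, 0.7, 4)
print(explicit, shortcut)
```

The two values agree up to floating point error, which is what allows eliminated nodes to be accounted for in constant time.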

Crucially, evaluation of this compressed computation graph is linear in |𝒯| which, as we shall later prove, is often much smaller than |𝕋|.

## **4 Constructing and Characterizing** *T*

Let us now consider how to avoid the construction of 𝕋 and characterize the size of the reduced ordered decision diagram, 𝒯. We begin by assuming that the underlying dynamics is well-approximated in the random-bit model.

**Definition 8** *For q* ∈ ℕ*, let* **c** ∼ {0, 1}<sup>*q*</sup> *denote the random variable representing the result of flipping q fair coins. We say a probabilistic automaton M = (S, s<sub>0</sub>, A, δ) is (ε, q) approximated in the random bit model if there exists a mapping,*

$$
\hat{\delta} : S \times A \times \{0, 1\}^q \to S \tag{27}
$$

*such that for all (s, a, s′) ∈ S × A × S:*

$$\left| \delta(s, a, s') - \Pr\_{\mathbf{c} \sim \{0, 1\}^q} \left( \hat{\delta}(s, a, \mathbf{c}) = s' \right) \right| \leq \epsilon. \tag{28}$$

For example, in our gridworld example (Fig. 2a), if **c** ∈ {0, 1}<sup>3</sup>, elements of *S* are interpreted as pairs in ℝ<sup>2</sup>, and the right/down actions are interpreted as the addition of the unit vectors (1, 0) and (0, 1) then,

$$\hat{\delta}(s, a, \mathbf{c}) = \begin{cases} s & \text{if } \max\_{i} [(s+a)\_{i}] > 1 \\ s + (0, 1) & \text{else if } \mathbf{c} = 0 \\ s + a & \text{otherwise} \end{cases},\tag{29}$$

As can be easily confirmed, (29) satisfies (28) with *ε* = 0. In the sequel, we shall take access to *δ̂* as given<sup>3</sup>. Further, to simplify exposition, until Sect. 5.1, we shall additionally require that the number of actions, |*A*|, be a power of 2. This assumption implies that *A* can be encoded using exactly log<sub>2</sub>(|*A*|) bits.

<sup>3</sup> See [31] for an explanation on systematically deriving such encodings.
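Condition (28) can be checked by brute force for small *q* by enumerating all coin-flip outcomes. The sketch below uses exact rational arithmetic; the two-state dynamics `delta`, the bit-level mapping `delta_hat`, and the helper names are hypothetical stand-ins rather than the gridworld encoding of (29).

```python
from fractions import Fraction
from itertools import product


def delta_hat(s, a, c):
    """Hypothetical bit-level dynamics: 'go' flips the state unless every coin is 0 (a slip)."""
    if a == "go" and any(c):
        return 1 - s
    return s


def delta(s, a, s2):
    """Hypothetical exact transition probabilities that delta_hat should approximate."""
    if a == "go":
        return Fraction(7, 8) if s2 == 1 - s else Fraction(1, 8)
    return Fraction(1) if s2 == s else Fraction(0)


def induced_distribution(delta_hat, s, a, q):
    """Distribution of delta_hat(s, a, c) when c is uniform over {0, 1}^q."""
    dist = {}
    for c in product((0, 1), repeat=q):
        s_next = delta_hat(s, a, c)
        dist[s_next] = dist.get(s_next, Fraction(0)) + Fraction(1, 2 ** q)
    return dist


def approximation_error(delta, delta_hat, states, actions, q):
    """Smallest epsilon for which (28) holds."""
    err = Fraction(0)
    for s in states:
        for a in actions:
            dist = induced_distribution(delta_hat, s, a, q)
            for s2 in states:
                err = max(err, abs(delta(s, a, s2) - dist.get(s2, Fraction(0))))
    return err


print(approximation_error(delta, delta_hat, (0, 1), ("go", "stay"), q=3))
print(approximation_error(delta, delta_hat, (0, 1), ("go", "stay"), q=2))
```

With three coins the slip probability of 1/8 is represented exactly, whereas with only two coins the best this particular mapping achieves is an error of 1/8.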

Under the above two assumptions, the key observation is to recognize that 𝕋 (and thus 𝒯) can be viewed as a Boolean predicate over an alternating sequence of action bit strings and coin flip outcomes determining if the task specification is satisfied, i.e.,

$$\mathbb{T}: \{0, 1\}^n \to \{0, 1\}, \tag{30}$$

where *n* ≝ *τ* · log<sub>2</sub>(|*A* × *A<sub>e</sub>*|) = *τ* · (*q* + log<sub>2</sub>(|*A*|)). That is to say, the resulting decision diagram can be re-encoded as a reduced ordered **binary** decision diagram [3].

**Definition 9** *A reduced ordered binary decision diagram (BDD) is a representation of a Boolean predicate h(x<sub>1</sub>, x<sub>2</sub>, …, x<sub>n</sub>) as a reduced ordered (deterministic) decision diagram, where each decision corresponds to testing a bit x<sub>i</sub> ∈ {0, 1}. We denote the BDD encoding of 𝕋 as 𝔹.*

Binary decision diagrams are well developed both in a theoretical and practical sense. Before exploring these benefits, we first note that this change has introduced an additional problem. First, note that in 𝔹, decision and chance nodes from 𝕋 are now encoded as sequences of decision and chance nodes. For example, if *a* ∈ *A* is encoded by the 4-length bit sequence *b*<sub>1</sub>*b*<sub>2</sub>*b*<sub>3</sub>*b*<sub>4</sub>, then four decisions are made by the agent before selecting an action. Notice however that the original semantics are preserved due to associativity of the softmax and 𝔼 operators. In particular, recall that by definition,

$$\begin{aligned} \text{softmax}(\alpha\_1, \dots, \alpha\_4) &= \ln(\sum\_{i=1}^4 e^{\alpha\_i}) = \ln(e^{\ln(e^{\alpha\_1} + e^{\alpha\_2})} + e^{\ln(e^{\alpha\_3} + e^{\alpha\_4})}) \\ &\stackrel{\text{def}}{=} \text{softmax}(\text{softmax}(\alpha\_1, \alpha\_2), \text{softmax}(\alpha\_3, \alpha\_4)) \end{aligned} \tag{31}$$

and thus the semantics of the sequence of decision nodes is equivalent to the decision node in 𝕋. Similarly, recall that the coin flips are fair, and thus expectations are computed via avg(*α*<sub>1</sub>, …, *α<sub>n</sub>*) = (1/*n*) ∑<sub>*i*=1</sub><sup>*n*</sup> *α<sub>i</sub>*. Therefore, averaging over two sequential coin flips yields,

$$\begin{aligned} \text{avg}(\alpha\_1, \dots, \alpha\_4) &\stackrel{\text{def}}{=} \frac{1}{4} \sum\_{i=1}^4 \alpha\_i = \frac{1}{2} \left(\frac{1}{2}(\alpha\_1 + \alpha\_2) + \frac{1}{2}(\alpha\_3 + \alpha\_4)\right) \\ &\stackrel{\text{def}}{=} \text{avg}(\text{avg}(\alpha\_1, \alpha\_2), \text{avg}(\alpha\_3, \alpha\_4)) \end{aligned} \tag{32}$$

which by assumption (28), is the same as applying 𝔼 on the original chance node. Finally, note that skipping over decisions needs to be adjusted slightly to account for sequences of decisions. Recall that via (26), the corresponding change in value, *Δ*, is a function of the initial value, *α*, and the number of agent actions skipped, i.e., |*A*|<sup>*n*</sup> for *n* skipped decision nodes. In the BDD, since each decision node has two actions, skipping *k* decision bits corresponds to skipping 2<sup>*k*</sup> actions. Hence, if *k* decision bits are skipped over in the BDD, the change in value, *Δ*, becomes,

$$
\Delta(k,\alpha) = \alpha + k \ln(2). \tag{33}
$$

Further, note that *Δ* can be computed in constant time while traversing the BDD. Thus, the dynamic programming scheme is linear in the size of 𝔹.
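The bit-level decompositions in (31) and (32), together with the shortcut (33), can be checked numerically. The sketch below folds an operator pairwise over a list whose length is a power of two, mimicking one binary decision per level; `fold_pairs` and the sample values are hypothetical.

```python
import math


def softmax(values):
    """Log-sum-exp over a list of values."""
    return math.log(sum(math.exp(v) for v in values))


def avg(values):
    """Uniform average, i.e., the expectation over fair coin flips."""
    return sum(values) / len(values)


def fold_pairs(op, values):
    """Evaluate a balanced binary tree of `op`; len(values) must be a power of two."""
    values = list(values)
    while len(values) > 1:
        values = [op(values[i:i + 2]) for i in range(0, len(values), 2)]
    return values[0]


alphas = [0.1, -1.3, 2.0, 0.5]
print(fold_pairs(softmax, alphas), softmax(alphas))  # (31)
print(fold_pairs(avg, alphas), avg(alphas))          # (32)
print(fold_pairs(softmax, [0.7] * 8), 0.7 + 3 * math.log(2))  # (33) with k = 3
```

Each pair of printed values agrees up to floating point error, so evaluating one bit at a time preserves the values computed at the original decision and chance nodes.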

#### **4.1 Size of** *B*

Next we return to the question of how big the compressed decision diagram can actually be. To this aim, we cite the following (conservative) bound on the size of a BDD given an encoding of the corresponding Boolean predicate in the linear model of computation illustrated in Fig. 6 (for more details, we refer the reader to [16]).

**Fig. 6.** Generic network of Boolean modules for which Theorem 1 holds.

In particular, consider an arbitrary Boolean predicate

$$f: \{0, 1\}^n \to \{0, 1\} \tag{34}$$

and a sequential arrangement of *n* Boolean modules, *f*<sub>1</sub>, *f*<sub>2</sub>, …, *f*<sub>*n*</sub>, where each *f*<sub>*i*</sub> has shape:

$$f\_i: \{0, 1\} \times \{0, 1\}^{a\_{i-1}} \times \{0, 1\}^{b\_i} \to \{0, 1\}^{a\_i} \times \{0, 1\}^{b\_{i-1}},\tag{35}$$

and takes as input *x*<sub>*i*</sub> as well as *a*<sub>*i*−1</sub> outputs of its left neighbor and *b*<sub>*i*</sub> outputs of the right neighbor (*b*<sub>0</sub> = 0, *a*<sub>*n*</sub> = 1). Further, assume that this arrangement is well defined, i.e., for each assignment to *x*<sub>1</sub>, …, *x*<sub>*n*</sub> there exists a unique way to set each of the inter-module wires. We say these modules compute *f* if the final output is equal to *f*(*x*<sub>1</sub>, …, *x*<sub>*n*</sub>).

**Theorem 1** *If f can be computed by a linear arrangement of such modules, ordered x<sub>1</sub>, x<sub>2</sub>, …, x<sub>n</sub>, then the size, S ∈ ℕ, of its BDD (in the same order) is upper bounded [3] by:*

$$S \le \sum\_{k=1}^{n} 2^{a\_k \cdot 2^{b\_k}}.\tag{36}$$

To apply this bound to our problem, recall that B computes a Boolean function where the decisions are temporally ordered and alternate between sequences of agent and environment decisions. Next, observe that because the traces are bounded (and all finite sets are regular), there exists a finite state machine which can monitor the satisfaction of the specification.

*Remark 5.* In the worst case, the monitor could be the unrolled decision tree, T. This monitor would have an exponential number of states. In practice, the composition of the dynamics and the monitor is expected to be much smaller.

Further, note that because this composed system is causal, no backward wires are needed, i.e., ∀*k*. *b*<sub>*k*</sub> = 0. In particular, observe that because the composition of the dynamics and the monitor is Markovian, the entire system can be uniquely described using the monitor/dynamics state and agent/environment action (see Fig. 7). This description can be encoded in log<sub>2</sub>(2<sup>*q*</sup>|*A* × *S* × *S*<sub>*ϕ*</sub>|) bits, where *q* denotes the number of coin flips tossed by the environment and *S*<sub>*ϕ*</sub> denotes the set of monitor states. Therefore, *a*<sub>*k*</sub> is upper bounded by log<sub>2</sub>(2<sup>*q*</sup>|*A* × *S* × *S*<sub>*ϕ*</sub>|). Combined with (36), this results in the following bound on the size of B.

**Corollary 1** *Let M = (S, s<sub>0</sub>, A, δ) be a probabilistic automaton whose probabilistic transitions can be approximated using q coin flips and let ϕ be a specification defined for horizon τ and monitored by a finite automaton with states S<sub>ϕ</sub>. The corresponding BDD,* B*, has size bounded by:*

$$|\mathcal{B}| \le \tau \cdot \left(\log(|A|) + q\right) \cdot \left(2^q |A \times S \times S\_\varphi|\right) \tag{37}$$

Notice that the above argument implies that as the episode length grows, |B| grows linearly in the horizon/states and quasilinearly in the agent/environment actions!
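To make the bound (37) concrete, the following sketch evaluates it for given problem sizes. The function name and the example numbers are hypothetical, and the logarithm is assumed to be base 2 (the number of bits needed to encode an action).

```python
import math

def bdd_size_bound(horizon, n_actions, n_states, n_monitor_states, n_coins):
    """Evaluate the right-hand side of (37), assuming log means log base 2."""
    # Number of Boolean inputs: per time step, action bits plus coin-flip bits.
    n_inputs = horizon * (math.log2(n_actions) + n_coins)
    # Per-input bound on the layer width, i.e., 2**a_k with no backward wires.
    layer_width = 2**n_coins * n_actions * n_states * n_monitor_states
    return n_inputs * layer_width

# Hypothetical sizes: 10 steps, 4 actions, 64 states, 4 monitor states, 5 coin flips.
bound = bdd_size_bound(10, 4, 64, 4, 5)
# Doubling the horizon doubles the bound, illustrating linearity in the horizon.
bound_double_horizon = bdd_size_bound(20, 4, 64, 4, 5)
```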

*Remark 6.* Note that this bound actually holds for the minimal representation of the composed dynamics/monitor (even if it is unknown a priori!). For example, if the property is *true*, the BDD requires only one state (always evaluate true). This also illustrates that the above bound is often very conservative. In particular, note that for *ϕ* = *true*, |B| = 1, independent of the horizon or dynamics. However, the above bound will always be linear in *τ*. In general, the size of the BDD will depend on the particular symmetries compressed.

**Fig. 7.** Generic module in linear model of computation for B. Note that backward edges are not required.

*Remark 7.* With hindsight, Corollary 1 is not too surprising. In particular, if the monitor is known, then one could explicitly compose the dynamics MDP with the monitor, with the resulting MDP having at most |*S* × *S*<sub>*ϕ*</sub>| states. If one then includes the time step in the state, one could perform the soft-Bellman backup directly on this automaton. In this composed automaton each (action, state) pair would need to be recorded. Thus, one would expect *O*(|*S* × *S*<sub>*ϕ*</sub> × *A*|) space to be used. In practice, this explicit representation is much bigger than B due to the BDD's ability to skip over time steps and automatically compress symmetries.

#### **4.2 Constructing** *B*

One of the biggest benefits of the BDD representation of a Boolean function is the ability to build BDDs from Boolean combinations of other BDDs. Namely, given two BDDs with *n* and *m* nodes respectively, it is well known that the conjunction or disjunction of the BDDs has at most *n* · *m* nodes. Thus, in practice, if the combined BDDs remain relatively small, Boolean combinations remain efficient to compute and one does not construct the full binary decision tree! Further, note that BDDs support function composition. Namely, given a predicate *f*(*x*<sub>1</sub>, …, *x*<sub>*n*</sub>) and *n* predicates *g*<sub>*i*</sub>(*y*<sub>1</sub>, …, *y*<sub>*k*</sub>), the function

$$f\left(g\_1(y\_1,\ldots,y\_k),\ldots,g\_n(y\_1,\ldots,y\_k)\right)\tag{38}$$

can be computed in time [16]:

$$O(n \cdot |B\_f|^2 \cdot \max\_i |B\_{g\_i}|),\tag{39}$$

where *B*<sub>*f*</sub> is the BDD for *f* and *B*<sub>*g*<sub>*i*</sub></sub> are the BDDs for *g*<sub>*i*</sub>. Now, suppose *δ̂*<sub>1</sub>, …, *δ̂*<sub>log(|*S*|)</sub> are Boolean predicates such that:

$$\hat{\delta}(\mathbf{s}, \mathbf{a}, \mathbf{c}) = (\hat{\delta}\_1(\mathbf{s}, \mathbf{a}, \mathbf{c}), \dots, \hat{\delta}\_{\log(|S|)}(\mathbf{s}, \mathbf{a}, \mathbf{c})). \tag{40}$$

Theorem 1 and an argument similar to that for Corollary 1 then imply that constructing B, using repeated composition, takes time bounded by a low degree polynomial in |*A* × *S* × *S*<sub>*ϕ*</sub>| and the horizon. Moreover, the space complexity before and after composition is bounded by Corollary 1.
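The *n* · *m* bound on Boolean combinations mentioned at the start of this section can be illustrated with a deliberately tiny reduced ordered BDD manager. This is a self-contained sketch written for exposition only; it is not the `dd` library used in the experiments, and the class and method names are assumptions. The `apply` method memoizes on pairs of nodes, which is where the product bound comes from.

```python
import operator

class BDD:
    """Minimal reduced ordered BDD with hash-consing (illustrative only)."""

    def __init__(self):
        self.nodes = {0: None, 1: None}  # node id -> (var, low, high); 0 and 1 are terminals
        self.unique = {}                 # (var, low, high) -> node id

    def mk(self, var, low, high):
        if low == high:                  # redundant test: skip the node entirely
            return low
        key = (var, low, high)
        if key not in self.unique:       # share structurally identical nodes
            self.unique[key] = len(self.nodes)
            self.nodes[self.unique[key]] = key
        return self.unique[key]

    def var(self, index):
        return self.mk(index, 0, 1)

    def level(self, u):
        return self.nodes[u][0] if u > 1 else float("inf")

    def cofactors(self, u, var):
        if self.level(u) != var:         # u does not test var at its root
            return u, u
        _, low, high = self.nodes[u]
        return low, high

    def apply(self, op, u, v, memo=None):
        """Combine two BDDs with a binary Boolean operator on 0/1."""
        memo = {} if memo is None else memo
        if u <= 1 and v <= 1:
            return int(op(u, v))
        if (u, v) not in memo:           # at most one result node per pair (u, v)
            var = min(self.level(u), self.level(v))
            (u0, u1), (v0, v1) = self.cofactors(u, var), self.cofactors(v, var)
            memo[u, v] = self.mk(var, self.apply(op, u0, v0, memo),
                                 self.apply(op, u1, v1, memo))
        return memo[u, v]

    def size(self, u):
        """Number of nodes reachable from u, terminals included."""
        seen, stack = set(), [u]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                if n > 1:
                    stack.extend(self.nodes[n][1:])
        return len(seen)

    def evaluate(self, u, assignment):
        while u > 1:
            var, low, high = self.nodes[u]
            u = high if assignment[var] else low
        return u

bdd = BDD()
x = [bdd.var(i) for i in range(4)]
f = bdd.apply(operator.and_, x[0], x[1])   # x0 and x1
g = bdd.apply(operator.or_, x[2], x[3])    # x2 or x3
h = bdd.apply(operator.and_, f, g)         # conjunction of the two BDDs
```

Because results are cached per pair of input nodes, `bdd.size(h)` cannot exceed `bdd.size(f) * bdd.size(g)`; in this example the conjunction is in fact far smaller than the product.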

#### **4.3 Evaluating Demonstrations**

Next let us return to the question of how to evaluate the likelihood of a concrete demonstration in our compressed BDD. The key problem is that the BDD can only evaluate (binary) sequences of actions/coin flips, whereas demonstrations are given as sequences of action/state pairs. That is, we need to algorithmically perform the following transformation.

$$\mathbf{s}\_0 \mathbf{a}\_0 \mathbf{s}\_1 \dots \mathbf{a}\_n \mathbf{s}\_{n+1} \mapsto \mathbf{a}\_0 \mathbf{c}\_0 \dots \mathbf{a}\_n \mathbf{c}\_n \tag{41}$$

Given the random bit model assumption, this transformation can be rewritten as a series of Boolean Satisfiability problems:

$$\exists \text{ } \mathbf{c}\_i \; . \; \hat{\delta}(\mathbf{s}\_i, \mathbf{a}\_i, \mathbf{c}\_i) = \mathbf{s}\_{i+1} \tag{42}$$

While potentially intimidating, in practice such problems are quite simple for modern SAT solvers, particularly if the number of coin flips used is small. Furthermore, many systems are translation invariant. In such systems, the results of a single query (42) can be reused on other queries. For example, in (29), **c** = **0** always results in the agent moving to the right. Nevertheless, in general, if *q* coin flips are used, encoding all the demonstrations takes at most *O*(|*X*| · *τ* · 2<sup>*q*</sup>), in the worst case.
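The transformation (41) can be sketched as follows. For illustration only, the SAT query (42) is replaced by brute-force enumeration of the 2<sup>*q*</sup> coin-flip assignments, which matches the worst-case count above; the one-dimensional dynamics `delta_hat`, the constant `Q`, and the example demonstration are hypothetical stand-ins rather than the dynamics used in the paper. Memoization reuses the answers to repeated queries.

```python
import itertools
from functools import lru_cache

Q = 2  # number of coin-flip bits per time step (hypothetical)

def delta_hat(state, action, coins):
    """Toy random-bit dynamics on positions 0..7: slip left iff all coins are 1."""
    step = -1 if all(coins) else action
    return min(7, max(0, state + step))

@lru_cache(maxsize=None)
def find_coins(state, action, next_state):
    """Return some coin assignment consistent with the transition, or None."""
    for coins in itertools.product((0, 1), repeat=Q):
        if delta_hat(state, action, coins) == next_state:
            return coins
    return None

def encode(demo):
    """Map s0 a0 s1 ... an s(n+1) to [(a0, c0), ..., (an, cn)]."""
    states, actions = demo[0::2], demo[1::2]
    return [(a, find_coins(s, a, s_next))
            for s, a, s_next in zip(states, actions, states[1:])]

demo = [3, +1, 4, +1, 3, -1, 2]  # the second transition can only be explained by a slip
encoded = encode(demo)
```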

#### **4.4 Run-Time Analysis**

We are finally ready to provide a run-time analysis for our new inference algorithm. The high-level likelihood estimation procedure is described in Fig. 8. First, the user specifies a dynamical system and a (multi-) set of demonstrations. Then, using a user-defined mechanism, a candidate task specification is selected. The system then creates a compressed representation of the composition of the dynamical system with the task specification. Then, in parallel, the maximum causal entropy policy is estimated and the demonstrations are themselves encoded as bit-vectors. Finally, the likelihood of generating the encoded demonstrations is computed.

**Fig. 8.** High level likelihood estimation procedure described in this paper.

There are three computational bottlenecks in the compressed scheme. First, given a candidate specification, *ϕ*, one needs to construct B. As argued in Sect. 4.2, this takes time at most polynomial in the horizon, monitoring automaton size, and MDP size (in the random-bit model). Second is the process of computing *Q* and *V* values by tuning the rationality coefficient to match a particular satisfaction probability. Just as with the naïve run-time (23), this process takes time linear in the size of |B| and logarithmic in the inverse tolerance 1/*ε*. Further, using Corollary 1, we know that |B| is at most linear in the horizon and quasi-linear in the MDP size. Thus, the policy computation takes time polynomial in the MDP size and logarithmic in the inverse tolerance. Finally, as before, evaluating the likelihoods takes time linear in the number of demonstrations and the horizon. However, we now require an additional step of finding coin flips which are consistent with the demonstrations. Thus, the compressed run-time is bounded by:

$$O\left(\left(\underbrace{|X|}\_{\#\text{Demos}} + \overbrace{\log(1/\epsilon)}^{\text{Feature Matching}}\right) \cdot \text{POLY}\left(\underbrace{\tau,\ |S|,\ |S\_{\varphi}|,\ |A|}\_{\text{Compressed}},\ \overbrace{2^{q}}^{\#\text{Coin Flips}}\right)\right) \tag{43}$$

*Remark 8.* In practice, this analysis is fairly conservative since BDD composition is often fast, the bound given by Corollary 1 is loose, and the SAT queries under consideration are often trivial.

#### **5 Additional Model Refinements**

#### **5.1 Conditioning on Valid Actions**

So far, we have assumed that the number of actions is a power of 2. Functionally, this assumption makes it so each assignment to the action decision bits corresponds to a valid action. Of course, general MDPs have non-power of 2 action sets, and so it behooves us to adapt our method for such settings. The simplest way to do so is to use a 3-terminal Binary Decision Diagram. In particular, while each decision is still Boolean, there are now three possible types of leaves, 0, 1, and ⊥. In the adapted algorithm, edges leading to ⊥ are simply ignored, as they semantically correspond to invalid assignments to action or coin flip bits. A similar analysis can be done using these three valued decision diagrams, and as with BDDs, there exist efficient implementations of multi-terminal BDDs.
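One way to read "edges leading to ⊥ are simply ignored" is sketched below for a single agent decision with three actions encoded in two bits. The encoding, the leaf values, and the function names are illustrative assumptions, not the reference implementation: invalid encodings carry no value, and the softmax at each decision bit ranges only over the children that are not ⊥.

```python
import math

BOTTOM = None  # stands for the third terminal, ⊥ (invalid assignment)

def softmax_valid(values):
    """Log-sum-exp over the children that are not ⊥."""
    valid = [v for v in values if v is not BOTTOM]
    top = max(valid)
    return top + math.log(sum(math.exp(v - top) for v in valid))

# Three actions encoded with two decision bits; the encoding 0b11 is invalid.
action_values = {0b00: 1.0, 0b01: 0.2, 0b10: -0.5, 0b11: BOTTOM}

def node_value(prefix, bits_left):
    """Soft value of a decision node, reading one action bit at a time."""
    if bits_left == 0:
        return action_values[prefix]
    children = [node_value((prefix << 1) | bit, bits_left - 1) for bit in (0, 1)]
    if all(c is BOTTOM for c in children):
        return BOTTOM  # the whole subtree is invalid
    return softmax_valid(children)

value = node_value(0, 2)
```

Under this reading, `value` coincides with the softmax over the three valid actions only, so the invalid fourth encoding does not inflate the agent's value.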

*Remark 9.* This generalization also opens up the possibility of state dependent action sets, where *A* is now the union of all possible actions, e.g., disabling the action for moving to the right when the agent is on the right edge of the grid.

#### **5.2 Choice of Binary Co-Domain**

One might wonder how sensitive this formulation is to the choice of *R*(*ξ*) = *θ* · *ϕ*(*ξ*). In particular, how does changing the co-domain of *ϕ* from {0, 1} to any other two real values, i.e.,

$$
\varphi' : (A \times S)^{\tau} \to \{a, b\},
$$

change the likelihood estimates in our maximum causal entropy model? We briefly remark that, subject to some mild technical assumptions, almost any two real values could be used for *ϕ*'s co-domain. Namely, observe that unless *a* = *b*, the expected satisfaction probability, *p*, is in one-to-one correspondence with the expected value of *ϕ*′, i.e.,

$$\mathbb{E}[\varphi'] = a \cdot p + b \cdot (1 - p).$$

Thus, if a policy is feature matching for *ϕ*, it must be feature matching for *ϕ*′ (and vice-versa). Therefore, the space of consistent policies is invariant under such transformations. Finally, because the space of policies is unchanged, the maximum causal entropy policies must remain unchanged. In practice, we prefer the use of {0, 1} as the co-domain for *ϕ* since it often simplifies many calculations.
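For completeness, the one-to-one correspondence used above can be spelled out by rearranging the expectation; this is a sketch of the omitted step, and it requires the two values to be distinct:

$$\mathbb{E}[\varphi'] = b + (a - b) \cdot p \quad\Longleftrightarrow\quad p = \frac{\mathbb{E}[\varphi'] - b}{a - b} \qquad (a \neq b).$$

Hence matching the expected value of *ϕ*′ pins down *p*, and matching *p* pins down the expected value of *ϕ*′.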

#### **5.3 Variable Episode Lengths (with Discounting)**

As promised earlier, we shall now discuss how to extend our model to include variable length episodes. For simplicity, we shall limit our discussion to the setting where at each time step, the probability that the episode will end is *γ* ∈ (0, 1]. As we previously discussed, this can be modeled by introducing a sink state, \$, representing the end of an episode (4). In the random bit model, this simply adds a few additional environment coin flips, corresponding to the environment's new transitions to the sink state.

*Remark 10.* Note that when unrolled, once the end of episode transition happens, all decisions are assumed inconsequential w.r.t. *ϕ*. Thus, all subsequent decisions will be compressed away in the BDD, B.

Finally, observe that the probability that the episode has not yet ended decays exponentially, implying that the planning horizon need not be too big, i.e., the probability that the episode has not ended by timestep *τ* ∈ ℕ is (1 − *γ*)<sup>*τ*</sup>. Thus, letting *τ* = ln(*ε*)/ln(1 − *γ*) ensures that with probability at least 1 − *ε* the episode has ended.
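The horizon computation can be sketched as follows, writing `eps` for the tolerated probability that the episode is still running (assumed to lie strictly between 0 and 1). The function name and the example parameters are hypothetical; the result is rounded up since the horizon must be an integer.

```python
import math

def planning_horizon(gamma, eps):
    """Integer horizon tau such that (1 - gamma)**tau <= eps."""
    if gamma >= 1:
        return 1  # the episode ends after the first step with certainty
    return math.ceil(math.log(eps) / math.log(1 - gamma))

# Example: a 10% chance of ending per step, 1% tolerance.
tau = planning_horizon(gamma=0.1, eps=0.01)
still_running = (1 - 0.1) ** tau  # probability the episode has not ended by tau
```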

#### **6 Experiment**

Below we report empirical results that provide evidence that our proposed technique is robust to demonstration errors and that the produced BDDs are smaller than a naïve dynamic programming scheme. To this end, we created a reference implementation [29] in Python. BDD and SAT solving capabilities are provided via dd [21] and pySAT [12] respectively. To encode the task specifications and the random-bit model MDP, we leveraged the py-aiger ecosystem [28] which includes libraries for modeling Markov Decision Processes and encoding Past Tense Temporal Logic as sequential circuits.

**Problem:** Consider a gridworld where an agent can attempt to move up, down, left, or right; however, with probability 1/32, the agent slips and moves left. Further, suppose a demonstrator has provided the six unlabeled demonstrations shown in Fig. 9 for the task: "Within 10 time steps, touch a yellow (recharge) tile while avoiding red (lava) tiles. Additionally, if a blue (water) tile is stepped on, the agent must step on a brown (drying) tile before going to a yellow (recharge) tile." All of the solid paths satisfy the task. The dotted path fails because the agent keeps slipping left and thus cannot dry off by *t* = 10. Note that due to slipping, all the demonstrations that did not enter the water are sub-optimal.

**Fig. 9.** Example Gridworld (Color figure online)


**Results:** For a small collection of specifications, we have computed the size of the BDD, the time it took to construct the BDD, and the *relative* log likelihoods of the demonstrations<sup>4</sup>,

$$\text{RelativeLogLikelihood}(\varphi) \stackrel{\text{def}}{=} \ln \left( \frac{\Pr(\text{demos} \mid \varphi)}{\Pr(\text{demos} \mid \text{true})} \right), \tag{44}$$

where each maximum entropy policy was fit to match the corresponding specification's empirical satisfaction probability. We remark that the computed BDDs are small compared to other straw-man approaches. For example, an explicit construction of the product of the monitor, dynamics, and the current time step would require space given by:

$$
\tau \cdot |S| \cdot |A| \cdot |S\_{\varphi}| = (10 \cdot 8 \cdot 8 \cdot 4) \cdot |S\_{\varphi}| = 2560 \cdot |S\_{\varphi}| \tag{45}
$$

The resulting BDDs are much smaller than (45) and the naïve unrolled decision tree. We note that the likelihoods appear to (qualitatively) match expectations. For example, **despite** an unlabeled negative example, the demonstrated task, *ϕ*<sup>∗</sup>, is the most likely specification. Moreover, under the second most likely specification, which omits the avoid lava constraint, the sub-optimal traces that do not enter the water appear more attractive.
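The relative log likelihood (44) can be sketched as below, under two assumptions stated here rather than taken from the reference implementation: the likelihood of a demonstration factors into policy terms and dynamics terms, so the dynamics terms cancel in the ratio, and the policy for *ϕ* = true is uniform over actions. The function name and the example probabilities are hypothetical.

```python
import math

def relative_log_likelihood(demo_action_probs, n_actions):
    """Sum of log(pi(a_t) * |A|) over all steps of all demonstrations.

    demo_action_probs[d][t] is the probability that the fitted policy
    assigns to the action actually taken at step t of demonstration d.
    """
    total = 0.0
    for action_probs in demo_action_probs:
        for p in action_probs:
            # Numerator: candidate policy; denominator: uniform policy.
            total += math.log(p) - math.log(1.0 / n_actions)
    return total

# Two short hypothetical demonstrations in a 4-action MDP.
probs = [[0.7, 0.4], [0.25, 0.25]]
score = relative_log_likelihood(probs, n_actions=4)
```

A policy that is itself uniform scores 0, matching the role of *true* as the baseline in (44).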

Finally, to emphasize the need for our causal extension, we compute the likelihoods of *ϕ*<sup>∗</sup>, *ϕ*<sub>1</sub>, *ϕ*<sub>2</sub> for our opening example (Fig. 1) using both our causal model and the prior non-causal model [30]. Concretely, we take *τ* = 15, a slip probability of 1/32, and fix the expected satisfaction probability to 0.9. The trace shown in Fig. 1 acts as the sole (failed) demonstration for *ϕ*<sup>∗</sup>. As desired, our causal extension assigned more than 3 times the relative likelihood to *ϕ*<sup>∗</sup> compared to *ϕ*<sub>1</sub>, *ϕ*<sub>2</sub>, and *true*. By contrast, the non-causal model assigns relative log likelihoods (−2.83, −3.16, −3.17) for (*ϕ*<sub>1</sub>, *ϕ*<sub>2</sub>, *ϕ*<sup>∗</sup>). This implies that (i) *ϕ*<sup>∗</sup> is the least likely specification and (ii) each specification is less likely than *true*!

<sup>4</sup> The maximum entropy policy for *ϕ* = true applies actions uniformly at random.

#### **7 Conclusion and Future Work**

Motivated by the problem of learning specifications from demonstrations, we have adapted the principle of maximum causal entropy to provide a posterior probability to a candidate task specification given a multi-set of demonstrations. Further, to exploit the structure of task specifications, we proposed an algorithm that computes this likelihood by first encoding the unrolled Markov Decision Process as a reduced ordered binary decision diagram (BDD). As illustrated on a few toy examples, BDDs are often much smaller than the unrolled Markov Decision Process and thus could enable efficient computation of maximum causal entropy likelihoods, at least for well behaved dynamics and specifications.

Nevertheless, two major questions remain unaddressed by this work. First is the question of how to select which specifications to compute likelihoods for. For example, is there a way to systematically mutate a specification to make it more likely, and/or is it possible to systematically reuse computations for previously evaluated specifications to propose new specifications?

Second is how to set prior probabilities. Although we have largely ignored this question, we view the problem of setting good prior probabilities as essential to avoid overfitting and/or to make this technique work with only one or two demonstrations. However, we note that prior probabilities can make inference arbitrarily more difficult since any structure useful for optimization imposed by our likelihood estimate can be overpowered.

Finally, additional future work includes extending the formalism to infinite horizon specifications, continuous dynamics, and characterizing the optimal set of teacher demonstrations.

**Acknowledgments.** We would like to thank the anonymous referees as well as Daniel Fremont, Ben Caulfield, Marissa Ramirez de Chanlatte, Gil Lederman, Dexter Scobee, and Hazem Torfah for their useful suggestions and feedback. This work was supported in part by NSF grants 1545126 (VeHICaL) and 1837132, the DARPA BRASS program under agreement number FA8750-16-C0043, the DARPA Assured Autonomy program, Toyota under the iCyPhy center, and Berkeley Deep Drive.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Certifying Certainty and Uncertainty in Approximate Membership Query Structures**

Kiran Gopinathan<sup>1(B)</sup> and Ilya Sergey<sup>1,2</sup>

<sup>1</sup> School of Computing, National University of Singapore, Singapore, Singapore {kirang,ilya}@comp.nus.edu.sg <sup>2</sup> Yale-NUS College, Singapore, Singapore

**Abstract.** Approximate Membership Query structures (AMQs) rely on randomisation for time- and space-efficiency, while introducing a possibility of false positive and false negative answers. Correctness proofs of such structures involve subtle reasoning about bounds on probabilities of getting certain outcomes. Because of these subtleties, a number of unsound arguments in such proofs have been made over the years.

In this work, we address the challenge of building rigorous and reusable computer-assisted proofs about probabilistic specifications of AMQs. We describe a framework for systematic decomposition of AMQs and their properties into a series of interfaces and reusable components. We implement our framework as a library in the Coq proof assistant and showcase it by encoding in it a number of non-trivial AMQs, such as Bloom filters, counting filters, quotient filters and blocked constructions, and mechanising the proofs of their probabilistic specifications.

We demonstrate how AMQs encoded in our framework guarantee the absence of false negatives *by construction*. We also show how the proofs about probabilities of false positives for complex AMQs can be obtained by means of *verified reduction* to the implementations of their simpler counterparts. Finally, we provide a library of domain-specific theorems and tactics that allow a high degree of automation in probabilistic proofs.

#### **1 Introduction**

Approximate Membership Query structures (AMQs) are probabilistic data structures that compactly implement (multi-)sets via hashing. They are a popular alternative to traditional collections in algorithms whose utility is not affected by some fraction of wrong answers to membership queries. Typical examples of such data structures are Bloom filters [6], quotient filters [5,38], and count-min sketches [12]. In particular, versions of Bloom filters find many applications in security and privacy [16,18,36], static program analysis [37], databases [17], web search [22], suggestion systems [45], and blockchain protocols [19,43].

Hashing-based AMQs achieve efficiency by means of losing precision when answering queries about membership of certain elements. Luckily, most of the applications listed above can tolerate *some* loss of precision. For instance, a static points-to analysis may consider two memory locations as aliases even if they are not (a *false positive*), still remaining sound. However, it would be unsound for such an analysis to claim that two locations do not alias in the case they do (a *false negative*). Even if it increases the number of false positives, a randomised data structure can be used to answer aliasing queries in a sound way, as long as it does not have false negatives [37]. But *how much* precision would be lost if, *e.g.*, a Bloom filter with certain parameters is chosen to answer these queries? Another example, in which quantitative properties of false positives are critical, is the security of Bitcoin's Nakamoto consensus [35] that depends on the counts of block production per unit time [19].

In light of the applications described above, two kinds of properties specifying the behaviour of AMQs are of particular interest: the absence of false negatives, and bounds on the rate of false positives.


Given the importance of such claims for practical applications, it is desirable to have machine-checked formal proofs of their validity. And, since many of the existing AMQs share a common design structure, one may expect that a large portion of those validity proofs can be reused across different implementations.

Computer-assisted reasoning about the absence of *false negatives* in a particular AMQ (Bloom filter) has been addressed to some extent in the past [7]. However, to the best of our knowledge, mechanised proofs of probabilistic bounds on the *rates of false positives* did not extend to such structures. Furthermore, to the best of our knowledge, no other existing AMQs have been formally verified to date, and no attempts were made towards characterising the commonalities in their implementations in order to allow efficient proof reuse.

In this work, we aim to advance the state of the art in machine-checked proofs of probabilistic theorems about false positives in randomised hash-based data structures. As recent history demonstrates, when done in a "paper-and-pencil" way, such proofs may contain subtle mistakes [8,10] due to misinterpreted assumptions about relations between certain kinds of events. These mistakes are not surprising, as the proofs often need to perform a number of complicated manipulations with expressions that capture probabilities of certain events. Our goal is to factor out these reasoning patterns into a standalone library of *reusable* program- and specification-level definitions and theorems, implemented in a proof assistant enabling computer-aided verification of a variety of AMQs.

*Our Contributions.* The key novel observation we make in this work is the decomposition of the common AMQ implementations into the following components: (a) a hashing strategy and (b) a state component that operates over hash outcomes, together capturing most AMQs that provide fixed constant-time insertion and query operations. Any AMQ that is implemented as an instance of those components enjoys the *no-false-negatives* property *by construction*. Furthermore, such a decomposition streamlines the proofs of structure-specific bounds on false positive rates, while allowing for proof reuse for complex AMQ implementations, which are built on top of simpler AMQs [40]. Powered by those insights, this work makes the following technical contributions:


For ordinary Bloom filters, we provide the first mechanised proof that the probability of a false positive in a Bloom filter can be written as a closed form expression in terms of the input parameters; a bound that has often been mischaracterised in the past due to oversight of subtle dependencies between the components of the structure [6,34]. For Counting Bloom filters, we provide the first mechanised proofs of several of their properties: that they have no false negatives, their false positive rate, that an element can be removed without affecting queries for other elements, and the fact that Counting Bloom filters preserve the number of inserted elements irrespective of the randomness of the hash outputs. For quotient filters, we provide a mechanised proof of the false positive rate and of the absence of false negatives. Finally, alongside the standard Blocked Bloom filter [40], we derive two novel AMQ data structures: *Counting Blocked Bloom filters* and *Blocked Quotient filters*, and prove the corresponding no-false-negatives properties and false positive rates for all of them. Our case studies illustrate that Ceramist can be repurposed to verify hash-based AMQ structures, including entirely new ones that have not been described in the literature, but rather have been obtained by composing existing AMQs via the "blocked" construction.

Our mechanised development [24] is entirely *axiom-free*, and is compatible with Coq 8.11.0 [11] and MathComp 1.10 [31]. It relies on the infotheo library [2] for encoding discrete probabilities.

*Paper Outline.* We start by providing the intuition on Bloom filters, our main motivating example, in Sect. 2. We proceed by explaining the encoding of their semantics, auxiliary hash-based structures, and key properties in Coq in Sect. 3. Section 4 generalises that encoding to a general AMQ interface, and provides an overview of Ceramist and its embedding into Coq, showcasing it by another example instance: Counting Bloom filters. Section 5 describes the specific techniques that help to structure our mechanised proofs. In Sect. 6, we report on the evaluation of Ceramist on various case studies, explaining in detail our compositional treatment of blocked AMQs and their properties. Section 7 provides a discussion on the state of the art in reasoning about probabilistic data structures.

<sup>1</sup> Ceramist stands for **Cer**tified **A**pproximate **M**embersh**i**p **St**ructures.

#### **2 Motivating Example**

Ceramist is a library specialised for reasoning about AMQ data structures in which the underlying randomness arises from the interaction of one or more hashing operations. To motivate this development, we thus consider applying it to the classical example of such an algorithm—a Bloom filter [6].

#### **2.1 The Basics of Bloom Filters**

Bloom filters are probabilistic data structures that provide compact encodings of mathematical sets, trading increased space efficiency for a weaker membership test [6]. Specifically, when testing membership for a value *not* in the Bloom filter, there is a possibility that the query may be answered as positive. Thus a property of direct practical importance is the exact probability of this event, and how it is influenced by the other parameters of the implementation.

A Bloom filter *bf* is implemented as a binary vector of m bits (all initially zeros), paired with a sequence of k hash functions f<sub>1</sub>, …, f<sub>k</sub>, collectively mapping each input value to a vector of k indices from {1 … m}; the indices determine the bits set to true in the m-bit array. Assuming an ideal selection of hash functions, we can treat the output of f<sub>1</sub>, …, f<sub>k</sub> on new values as a uniformly-drawn random vector.

To insert a value x into the Bloom filter, we can treat each element of the "hash vector" produced from f<sub>1</sub>, …, f<sub>k</sub> as an index into *bf* and set the corresponding bits to ones. Similarly, to test membership for an element x, we can check that all k bits specified by the hash-vector are raised.
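The insert and query operations just described can be sketched in a few lines of Python. This is an informal illustration, not the Coq encoding developed in this paper: salted SHA-256 digests stand in for the ideal hash functions f<sub>1</sub>, …, f<sub>k</sub>, indices run over 0 … m−1 rather than 1 … m, and the class and method names are assumptions.

```python
import hashlib

class BloomFilter:
    """Bit vector of m bits together with k (simulated) hash functions."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [False] * m  # all bits initially zero

    def _indices(self, value):
        # The j-th salted digest plays the role of the hash function f_j.
        return [
            int.from_bytes(hashlib.sha256(f"{j}:{value}".encode()).digest(), "big") % self.m
            for j in range(self.k)
        ]

    def insert(self, value):
        for i in self._indices(value):
            self.bits[i] = True  # bits are only ever raised, never cleared

    def query(self, value):
        return all(self.bits[i] for i in self._indices(value))

bf = BloomFilter(m=64, k=3)
for word in ["alpha", "beta", "gamma"]:
    bf.insert(word)
inserted_found = all(bf.query(word) for word in ["alpha", "beta", "gamma"])
```

Since `insert` raises exactly the bits that `query` later inspects and no operation clears a bit, every inserted value is reported as present; values that were never inserted may or may not be.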

#### **2.2 Properties of Bloom Filters**

Given this model, there are two obvious properties of practical importance: that of false positives and of false negatives.

*False Negatives.* It turns out that these definitions are sufficient to guarantee the lack of false-negatives with complete certainty, *i.e.*, irrespective of the random outcome of the hash functions. This follows from the fact that once a bit is raised, there are no permitted operations that will unset it.

**Theorem 1 (No False Negatives).** *If* x ∈ *bf*, *then* Pr[x ∈<sup>?</sup> *bf*] = 1, *where* x ∈<sup>?</sup> *bf* *stands for the approximate membership test, while the relation* x ∈ *bf* *means that* x *has been previously inserted into bf.*

*False Positives.* This property is more complex as the occurrence of a false positive is entirely dependent on the particular outcomes of the hash functions f<sub>1</sub>, …, f<sub>k</sub> and one needs to consider situations in which the hash functions happen to map some values to *overlapping* sets of indices. That is, after inserting a series of values *xs*, subsequent queries for y ∉ *xs* might incorrectly return true.

This leads to subtle dependencies that can invalidate the analysis, and have led to a number of incorrect probabilistic bounds on the event, including in the analysis by Bloom in his original paper [6]. Specifically, Bloom first considered the probability that inserting l distinct items into the Bloom filter will set a particular bit b<sub>i</sub>. From the independence of the hash functions, he was able to show that the probability of this event has a simple closed-form representation:

**Lemma 1 (Probability of a single bit being set).** *If the only values previously inserted into bf are* x<sub>1</sub>, …, x<sub>l</sub>*, then the probability of a particular single bit at the position* i *being set is*

$$\Pr\left[i^{\text{th}} \text{ bit in } \mathit{bf} \text{ is set}\right] = 1 - \left(1 - \frac{1}{m}\right)^{kl}.$$
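For very small parameters, Lemma 1 can be checked by exhaustively enumerating all equally likely hash outcomes. The sketch below is an informal sanity check in Python with exact rational arithmetic, not part of the mechanised development; the function names are illustrative, and hash outputs are modelled as independent uniform indices in 0 … m−1.

```python
from fractions import Fraction
from itertools import product

def bit_set_probability(m, k, l):
    """Closed form from Lemma 1."""
    return 1 - (1 - Fraction(1, m)) ** (k * l)

def bit_set_probability_brute_force(m, k, l, i=0):
    """Fraction of the m**(k*l) hash outcomes in which bit i ends up raised."""
    outcomes = list(product(range(m), repeat=k * l))
    hits = sum(1 for hashes in outcomes if i in hashes)
    return Fraction(hits, len(outcomes))

closed_form = bit_set_probability(3, 2, 2)
enumerated = bit_set_probability_brute_force(3, 2, 2)
```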

Bloom then claimed that the probability of a false positive was simply the probability of a single bit being set, raised to the power of k, reasoning that a false positive for an element $y \notin \mathit{bf}$ only occurs when all the k bits corresponding to the hash outputs are set.

Unfortunately, as was later pointed out by Bose *et al.* [8], since the bits specified by $f\_1(x), \ldots, f\_{k-1}(x)$ may overlap, we cannot guarantee the independence that is required for any simple relation between the probabilities. Bose *et al.* rectified the analysis by instead interpreting the bits within a Bloom filter as maintaining a set $\mathsf{bits}(\mathit{bf}) \subseteq \mathbb{N}\_{[0, \ldots, m-1]}$, corresponding to the indices of raised bits. With this interpretation, an element y only tests positive if the random set of indices produced by the hash functions on y is such that $\mathsf{inds}(y) \subseteq \mathsf{bits}(\mathit{bf})$. Therefore, the chance of a positive result for $y \notin \mathit{bf}$ resolves to the chance that the random set of indices from hashing y is a subset of the union of $\mathsf{inds}(x)$ for each $x \in \mathit{bf}$. The probability of this reduced event is described by the following theorem:

**Theorem 2 (Probability of False Positives).** *If the only values inserted into bf are* $x\_1, \ldots, x\_l$*, then for any* $y \notin \mathit{bf}$*,*

$$\Pr\left[y \in\_? \mathit{bf}\right] = \frac{1}{m^{k(l+1)}} \sum\_{i=1}^{m} i^k\, i!\, \binom{m}{i} {kl \brace i},$$

*where* ${s \brace t}$ *stands for the* Stirling number of the second kind*, counting the partitions of a set of size* s *into* t *non-empty subsets, so that* $t!\,{s \brace t}$ *is the number of surjections from a set of size* s *onto a set of size* t*.*
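The gap between Bloom's estimate and the closed form of Theorem 2 is easy to observe numerically. The following Python sketch is an illustration with hypothetical function names; it compares both against an exhaustive enumeration, assuming every hash outcome is an independent uniform draw.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def stirling2(n, t):
    """Stirling number of the second kind via the standard recurrence."""
    if n == t:
        return 1
    if n == 0 or t == 0 or t > n:
        return 0
    return t * stirling2(n - 1, t) + stirling2(n - 1, t - 1)


def false_positive_exact(m, k, l):
    """Closed form of Theorem 2."""
    total = sum(
        i**k * factorial(i) * comb(m, i) * stirling2(k * l, i)
        for i in range(1, m + 1)
    )
    return Fraction(total, m ** (k * (l + 1)))


def false_positive_bloom(m, k, l):
    """Bloom's original estimate: (1 - (1 - 1/m)^(k*l))^k."""
    return (1 - (1 - Fraction(1, m)) ** (k * l)) ** k


def false_positive_by_enumeration(m, k, l):
    """Brute force over every hash outcome for l inserted values and y."""
    hits = total = 0
    for inserted in product(range(m), repeat=k * l):
        raised = set(inserted)
        for query in product(range(m), repeat=k):
            total += 1
            hits += all(index in raised for index in query)
    return Fraction(hits, total)
```

For m = 2, k = 2 and l = 1, the closed form and the enumeration both give 5/8, whereas Bloom's estimate gives 9/16.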

The key step in capturing these program properties is in treating the outcomes of hashes as *random variables* and then propagating this randomness to the results of the other operations. A formal treatment of program outcomes requires a suitable semantics, representing programs as distributions of such random variables. In moving to mechanised proofs, we must first fully characterise this semantics, formally defining a notion of a probabilistic computation in Coq.

#### **3 Encoding AMQs in Coq**

To introduce our encoding of AMQs and their probabilistic behaviours in Coq, we continue with our running example, transitioning from mathematical notation to Gallina, Coq's language. The rest of this section will introduce each of the key components of this encoding through the lens of Bloom filters.

#### **3.1 Probability Monad**

Our formalisation represents probabilistic computations using an embedding following the style of the FCF library [39]. We do not use FCF directly, due to its primary focus on cryptographic proofs, wherein it provides little support for proving probabilistic bounds directly, instead prioritising a reduction-based approach of expressing arbitrary computations as compositions of known distributions.

Following the adopted FCF notation, a term of type Comp A represents a probabilistic computation returning a value of type A, and is constructed using the standard monadic operators, with an additional primitive rand n that allows sampling from a uniform distribution over the range $\mathbb{Z}\_n$:

$$\begin{aligned} \mathsf{ret} &: A \to \mathsf{Comp}\ A\\ \mathsf{bind} &: \mathsf{Comp}\ A \to (A \to \mathsf{Comp}\ B) \to \mathsf{Comp}\ B\\ \mathsf{rand} &: (n: \mathsf{N}) \to \mathsf{Comp}\ (\mathbb{Z}\_n) \end{aligned}$$

We implement a Haskell-style do-notation over this monad to allow descriptions of probabilistic computations within Gallina. For example, the following code is used to implement the query operation for the Bloom filter:

```
hash_res <-$ hash_vec_int x hashes; (* hash x using the hash functions *)
let (new_hashes, hash_vec) := hash_res in
(* check if all the corresponding bits are set *)
let qres := bf_query_int hash_vec bf in
(* return the query result and the new hashes *)
ret (new_hashes, qres).
```
In the above listing, we pass the queried value x along with the hash functions hashes to a probabilistic hashing operation hash\_vec\_int to hash x over each function in hashes. The result of this random operation is then bound to hash\_res and split into its constituent components—a sequence of hash outputs hash\_vec and an updated copy new\_hashes of the hash functions, now incorporating the mapping for x. Then, having mapped our input into a sequence of indices, we can query the Bloom filter for membership using a corresponding deterministic operation bf\_query\_int to check that all the bits specified by hash\_vec are set. Finally, we complete the computation by returning the query outcome qres and the updated hash functions new\_hashes using the ret operation to lift our result to a probabilistic outcome.

Using the code snippet above, we can define the query operation bf\_query as a function that maps a Bloom filter, a value to query, and a collection of hash functions to a probabilistic computation returning the query result and an updated set of hash functions. However, because our computation type does not impose any particular semantics, this result only encodes the *syntax* of the probabilistic query and has no actual meaning without a separate interpretation.

Thus, given a Gallina term of type Comp A, we must first evaluate it into a distribution over possible results to state properties on the probabilities of its outcomes. We interpret our monadic encoding in terms of Ramsey's probability monad [42], which decomposes a complex distribution into a composition of primitive ones bound together via conditional distributions. To capture this interpretation within Coq, we then use the encoding of this monad from the infotheo library [1,2], and provide a function eval\_dist : Comp A → dist A that evaluates computations into distributions by recursively mapping them to the probability monad. Here, dist A represents infotheo's encoding of distributions over a finite support A, defined as being composed of a measure function pmf : A → $\mathbb{R}^+$, and a proof that the sum of the measure over the support A produces 1.

This mapping from computations to distributions must be applied to a program e (involving, *e.g.*, a Bloom filter) before stating its probability bound. Therefore, we hide this evaluation process behind a notation that allows stating probabilistic properties in a form closer to their mathematical counterparts:

$$\Pr\left[e = v\right] \triangleq \left(\mathsf{eval\\_dist}\ e\right)\ v$$

$$\Pr\left[e\right] \triangleq \left(\mathsf{eval\\_dist}\ e\right)\ \mathsf{true}$$

Above, v is an arbitrary element in the support of the distribution induced by e. Finally, we introduce a binding operator to allow concise representation of dependent distributions: $e \rhd f \triangleq \mathsf{bind}\ e\ f$.
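The separation between the syntax of a computation and its interpretation as a distribution can be mimicked outside Coq. The Python sketch below is an illustration only and not the Ceramist encoding: the classes `Ret`, `Rand` and `Bind` and the functions `eval_dist` and `pr` are hypothetical stand-ins, with finite distributions represented as dictionaries from outcomes to exact fractions.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Any, Callable


@dataclass
class Ret:
    value: Any


@dataclass
class Rand:
    n: int


@dataclass
class Bind:
    comp: Any
    cont: Callable[[Any], Any]


def eval_dist(comp):
    """Interpret a computation as a finite distribution {value: probability}."""
    if isinstance(comp, Ret):
        return {comp.value: Fraction(1)}
    if isinstance(comp, Rand):
        return {i: Fraction(1, comp.n) for i in range(comp.n)}
    if isinstance(comp, Bind):
        dist = {}
        for a, p in eval_dist(comp.comp).items():
            for b, q in eval_dist(comp.cont(a)).items():
                dist[b] = dist.get(b, Fraction(0)) + p * q
        return dist
    raise TypeError(f"not a computation: {comp!r}")


def pr(comp, value=True):
    """Pr[comp = value]; with the default, Pr[comp] for boolean computations."""
    return eval_dist(comp).get(value, Fraction(0))


# Two independent draws from Z_4; does the second hit the same index?
collision = Bind(Rand(4), lambda a: Bind(Rand(4), lambda b: Ret(a == b)))
```

Here `collision` is pure syntax until `eval_dist` interprets it, and `pr(collision)` evaluates to 1/4. In this sketch, `Bind(e, f)` plays the role of the binding operator applied to e and f.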

#### **3.2 Representing Properties of Bloom Filters**

We define the state of a Bloom filter (BF) in Coq as a binary vector of a fixed length m, using Ssreflect's m.-tuple data type:

```
Record BF := mkBF { bloomfilter_state: m.-tuple bool }.
Definition bf_new : BF := (* construct a BF with all bits cleared *).
Definition bf_get_int i : BF → bool := (* retrieve BF's ith bit *).
```
We define the deterministic components of the Bloom filter implementation as pure functions taking an instance of BF and a series of indices assumed to be obtained from earlier calls to the associated hash functions:

$$\begin{aligned} \mathsf{bf\\_add\\_int} &: \mathsf{BF} \to \mathsf{seq}\ \mathbb{Z}\_m \to \mathsf{BF}\\ \mathsf{bf\\_query\\_int} &: \mathsf{BF} \to \mathsf{seq}\ \mathbb{Z}\_m \to \mathsf{bool} \end{aligned}$$

That is, bf\_add\_int takes the Bloom filter state and a sequence of indices to insert and returns a new state with the requested bits also set. Conversely, bf\_query\_int returns true *iff* all the queried indices are set. These pure operations are then called within a probabilistic wrapper that handles hashing the input and the book-keeping associated with hashing to provide the standard interface for AMQs:

$$\begin{aligned} \mathsf{bf\\_add} &: B \to (\mathsf{HashVec}\ B \ast \mathsf{BF}) \to \mathsf{Comp}\ (\mathsf{HashVec}\ B \ast \mathsf{BF})\\ \mathsf{bf\\_query} &: B \to (\mathsf{HashVec}\ B \ast \mathsf{BF}) \to \mathsf{Comp}\ (\mathsf{HashVec}\ B \ast \mathsf{bool}) \end{aligned}$$

The component HashVec B (to be defined in Sect. 3.3), parameterised over an input type B, keeps track of *known results* of the involved hash functions and is provided as an external parameter to the function rather than being a part of the data structure to reflect typical uses of AMQs, wherein the hash operation is pre-determined and shared by *all* instances.

With these definitions and notation, we can now state the main theorems of interest about Bloom filters directly within Coq:<sup>2</sup>

**Theorem 3 (No False Negatives).** *For any Bloom filter state bf, a vector of hash functions hs, after having inserted an element* x *into bf, followed by a series xs of other inserted elements, the result of query* $x \in\_? \mathit{bf}$ *is always* true*. That is, in terms of probabilities:*

$$\Pr\left[\mathsf{bf\\_add}\ x\ (\mathit{hs}, \mathit{bf}) \rhd \mathsf{bf\\_addm}\ \mathit{xs} \rhd \mathsf{bf\\_query}\ x\right] = 1.$$

**Lemma 2 (Probability of Flipping a Single Bit).** *For a vector of hash functions hs of length* k*, after inserting a series of* l *distinct values xs, all unseen in hs, into an empty Bloom filter bf, represented by a vector of* m *bits, the probability of any index* i *being set is*

$$\Pr\left[\mathsf{bf\\_addm}\ \mathit{xs}\ (\mathit{hs}, \mathsf{bf\\_new}) \rhd \mathsf{bf\\_get}\ i\right] = 1 - \left(1 - \frac{1}{m}\right)^{kl}.$$

*Here,* bf\_get *is a simple embedding of the pure function* bf\_get\_int *into a probabilistic computation.*

**Theorem 4 (Probability of a False Positive).** *After having inserted a series of* l *distinct values xs, all unseen in hs, into an empty Bloom filter bf, for any unseen* $y \notin \mathit{xs}$*, the probability of a subsequent query* $y \in\_? \mathit{bf}$ *for* y *returning true is given as*

$$\Pr\left[\mathsf{bf\\_addm}\ \mathit{xs}\ (\mathit{hs}, \mathsf{bf\\_new}) \rhd \mathsf{bf\\_query}\ y\right] = \frac{1}{m^{k(l+1)}} \sum\_{i=1}^{m} i^k\, i!\, \binom{m}{i} {kl \brace i}.$$

The proof of this theorem required us to provide *the first axiom-free mechanised proof* for the closed form for Stirling numbers of the second kind [26].

In the definitions above, we used the output of the hashing operation as the boundary between the deterministic and probabilistic components of the Bloom filter. For instance, in our earlier description of the Bloom filter query operation in Sect. 3.1, we were able to implement the entire operation with the only probabilistic operation being the call hash\_vec\_int x hashes. In general, structuring AMQ operations as manipulations with hash outputs via *pure* deterministic functions allows us to decompose reasoning about the data structure into a series of specialised properties about its deterministic primitives and a separate set of reusable properties on its hash operations.

<sup>2</sup> bf\_addm is a trivial generalisation of the insertion to multiple elements.

#### **3.3 Reasoning About Hash Operations**

We encode hash operations within our development using a random oracle-based implementation. In particular, in order to keep track of *seen* hashes learnt by hashing previously observed values, we represent a *state* of a hash function from elements of type B to a range $\mathbb{Z}\_m$ using a finite map to ensure that previously hashed values produce the same hash output:

```
Definition HashState B := FixedMap B 'I_m.
```
The state is paired with a hash function that generates uniformly random outputs for unseen values and otherwise returns the value produced by its prior invocation:

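```
Definition hash value state : Comp (HashState B * 'I_m) :=
  match find value state with
  (* seen before: replay the recorded output, state unchanged *)
  | Some(output) ⇒ ret (state, output)
  (* unseen: draw a fresh uniform output and record it *)
  | None ⇒ rnd <-$ rand m;
           new_state <- put value rnd state;
           ret (new_state, rnd)
  end.
```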

A *hash vector* is a generalisation of this structure to represent a vector of states of k independent hash functions:

```
Definition HashVec B := k.-tuple (HashState B).
```
The corresponding hash operation over the hash vector, hash\_vec\_int, is then defined as a function taking a value and the current hash vector and then returning a pair of the updated hash vector and associated random vector, internally calling out to hash to compute individual hash outputs.

This random oracle-based implementation allows us to formulate several helper theorems for simplifying probabilistic computations using hashes by considering whether the hashed values *have been seen before or not*. For example, if we knew that a value x had not been seen before, we would know that the probability of obtaining any particular choice of a vector of indices would be equal to that of obtaining the same vector by a draw from a corresponding uniform distribution. We can formalise this intuition in the form of the following theorem:

**Theorem 5 (Uniform Hash Output).** *For any two hash vectors* $\mathit{hs}$, $\mathit{hs}'$ *of length* k*, a value* x *that has not been hashed before, and an output vector* $\iota s$ *of length* k *obtained by hashing* x *via* $\mathit{hs}$*, if the state of* $\mathit{hs}'$ *has the same mappings as* $\mathit{hs}$ *and also maps* x *to* $\iota s$*, the probability of obtaining the pair* $(\mathit{hs}', \iota s)$ *is uniform:*

$$\Pr\left[\mathsf{hash\\_vec\\_int}\ x\ \mathit{hs} = (\mathit{hs}', \iota s)\right] = \left(\frac{1}{m}\right)^{k}.$$

Similarly, there are also often cases where we are hashing a value that we *have already seen*. In these cases, if we know the exact indices a value hashes to, we can prove a certainty on the value of the outcome:

**Theorem 6 (Hash Consistency).** *For any hash vector hs , a value* x*, if hs maps* x *to outputs* ι*s , then hashing* x *again will certainly produce* ι*s and not change hs , that is,* Pr [hash\_vec\_int x *hs* = (*hs*, ι*s*)] = 1*.*
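Both behaviours of the random oracle can be observed on a toy model. The Python sketch below is an illustration under stated assumptions rather than the Coq definition: hash states are tuples of (value, output) pairs standing in for the finite map, distributions are dictionaries of exact fractions, and the names `hash_dist` and `hash_vec_dist` are hypothetical.

```python
from fractions import Fraction


def hash_dist(m, value, state):
    """Distribution over (new_state, output) for one random-oracle hash.

    `state` is a tuple of (value, output) pairs recording earlier results.
    """
    seen = dict(state)
    if value in seen:
        return {(state, seen[value]): Fraction(1)}
    return {
        (state + ((value, out),), out): Fraction(1, m)
        for out in range(m)
    }


def hash_vec_dist(m, value, states):
    """Hash `value` with each of the k independent states in turn."""
    dist = {((), ()): Fraction(1)}
    for state in states:
        step = hash_dist(m, value, state)
        new_dist = {}
        for (done, outs), p in dist.items():
            for (new_state, out), q in step.items():
                key = (done + (new_state,), outs + (out,))
                new_dist[key] = new_dist.get(key, Fraction(0)) + p * q
        dist = new_dist
    return dist


m, k = 4, 3
fresh = ((),) * k                          # k hash states with nothing seen yet
uniform = hash_vec_dist(m, "x", fresh)     # the situation of Theorem 5
(new_states, outs), _ = next(iter(uniform.items()))
repeat = hash_vec_dist(m, "x", new_states) # the situation of Theorem 6
```

Hashing an unseen value yields each of the m^k possible output vectors with probability (1/m)^k, while hashing it again from any resulting state reproduces the recorded outputs with probability 1 and leaves the state unchanged.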

By combining these types of probabilistic properties about hashes with the earlier Bloom filter operations, we are able to prove the prior theorems about Bloom filters by reasoning primarily about the core logical interactions of the *deterministic components* of the data structure. This decomposition is not just applicable to the case of Bloom filters, but can be extended into a general framework for obtaining modular proofs of AMQs, as we will show in the next section.

#### **4 Ceramist at Large**

Zooming out from the previous discussion of Bloom filters, we now present Ceramist in its full generality, describing the high-level design in terms of the interfaces that must be instantiated to obtain verified AMQ implementations.

The core of our framework revolves around the decomposition of an AMQ data structure into separate interfaces for hashing (AMQHash) and state (AMQ), generalising the specific decomposition used for Bloom filters (hash vectors and bit vectors respectively). More specifically, the AMQHash interface captures the probabilistic properties of the hashing operation, while the AMQ interface captures the deterministic interactions of the state with the hash outcomes.

#### **4.1 AMQHash Interface**

The AMQHash interface generalises the behaviours of hash vectors (Sect. 3.3) to provide a generic description of the hashing operation used in AMQs.

The interface first abstracts over the specific types used in the prior hashing operations (*e.g.*, HashVec B) by treating them as opaque parameters: a parameter AMQHashState to represent the state of the hash operation; types Key and Value encoding the hash inputs and outputs respectively; and finally, a deterministic operation AMQHash\_add\_internal : AMQHashState → Key → Value → AMQHashState to encode the interaction of the state with the outputs and inputs. For example, in the case of a single hash, the state parameter AMQHashState would be HashState B, while for a hash vector this would instead be HashVec B.

To use this hash state in probabilistic computations, the interface assumes a separate probabilistic operation that will take the hash state and randomly generate an output (*e.g.*, hash for single hashes and hash\_vec\_int for hash vectors):

Parameter AMQHash\_hash: Key → AMQHashState → Comp (AMQHashState \* Value).

Then, to abstractly capture the kinds of reasoning about the outcomes of hash operations done with Bloom filters in Sect. 3.3, the interface assumes a few predicates on the hash state to provide information about its contents:

Parameter AMQHash\_hashstate\_contains: AMQHashState → Key → Value → bool.

Parameter AMQHash\_hashstate\_unseen: AMQHashState → Key → bool.

These components are then combined together to produce more abstract formulations of the previous Theorems 5 and 6 on hash operations.

**Property 1 (Generalised Uniform Hash Output).** *There exists a probability* $p\_{hash}$*, such that for any two AMQ hash states* $\mathit{hs}$, $\mathit{hs}'$*, a value* x *that is unseen, and an output* $\iota s$ *obtained by hashing* x *via* $\mathit{hs}$*, if the state of* $\mathit{hs}'$ *has the same mappings as* $\mathit{hs}$ *and also maps* x *to* $\iota s$*, the probability of obtaining the pair* $(\mathit{hs}', \iota s)$ *is given by:* $\Pr\left[\mathsf{AMQHash\\_hash}\ x\ \mathit{hs} = (\mathit{hs}', \iota s)\right] = p\_{hash}$*.*

**Property 2 (Generalised Hash Consistency).** *For any AMQ hash state hs, a value* x*, if hs maps* x *to an output* ι*s, then hashing* x *again will certainly produce* ι*s and not change hs:* Pr [AMQHash\_hash x *hs* = (*hs*, ι*s*)] = 1*.*

Proofs of these corresponding properties must also be provided to instantiate the AMQHash interface. Conversely, components operating over this interface can assume their existence, and use them to abstractly perform the same kinds of simplifications as done with Bloom filters, resolving many probabilistic proofs to dealing with deterministic properties on the AMQ states.

#### **4.2 The AMQ Interface**

Building on top of an abstract AMQHash component, the AMQ interface then provides a unified view of the state of an AMQ and how it deterministically interacts with the output type Value of a particular hashing operation.

As before, the interface begins by abstracting the specific types and operations of the previous analysis of Bloom filters, first introducing a type AMQState to capture the state of the AMQ, and then assuming deterministic implementations of the typical *add* and *query* operations of an AMQ:

```
Parameter AMQ_add_internal: AMQState → Value → AMQState.
Parameter AMQ_query_internal: AMQState → Value → bool.
```

In the case of Bloom filters, these would be instantiated with the BF, bf\_add\_int and bf\_query\_int operations respectively (*cf.* Sect. 3.2), thereby setting the associated hashing operation to the hash vector (Sect. 3.3).
As we move on to reason about the behaviours of these operations, the interface diverges slightly from that of the Bloom filter by conditioning the behaviours on the assumption that the state has sufficient capacity:

```
Parameter AMQ_available_capacity: AMQState → nat → bool.
```
While the Bloom filter has no real deterministic notion of a capacity, this cannot be said of all AMQs in general, such as the Counting Bloom filter or Quotient filter, as we will discuss later.

With these definitions in hand, the behaviours of the AMQ operations are characterised using a series of associated assumptions:

**Property 3 (AMQ insertion validity).** *For a state* s *with sufficient capacity, inserting any hash output* $\iota s$ *into* s *via* AMQ\_add\_internal *will produce a new state* $s'$ *for which any subsequent queries for* $\iota s$ *via* AMQ\_query\_internal *will return* true*.*

**Property 4 (AMQ query preservation).** *For any AMQ state* s *with sufficient remaining capacity, if queries for a particular hash output* $\iota s$ *in* s *via* AMQ\_query\_internal *happen to return* true*, then inserting any further outputs* $\iota s'$ *into* s *will return a state for which queries for* $\iota s$ *will* still *return* true*.*

Even though these assumptions seemingly place strict restrictions on the permitted operations, we found that these properties are satisfied by most common AMQ structures. One potential reason is that they are in fact *sufficient* to ensure the No-False-Negatives property that is standard for most AMQs:

**Theorem 7 (Generalised No False Negatives).** *For any AMQ state* s*, a corresponding hash state hs, after having inserted an element* x *into* s*, followed by a series xs of other inserted elements, the result of query for* x *is always* true*. That is,*

$$\Pr\left[\mathsf{AMQ\\_add}\ x\ (\mathit{hs}, s) \rhd \mathsf{AMQ\\_addm}\ \mathit{xs} \rhd \mathsf{AMQ\\_query}\ x\right] = 1.$$

Here, AMQ\_add, AMQ\_addm, and AMQ\_query are generalisations of the probabilistic wrappers of Bloom filters (*cf.* Sect. 3.1) for doing the bookkeeping associated with hashing and delegating to the internal deterministic operations.

The generalised Theorem 7 illustrates one of the key facilities of our framework, wherein by simply providing components satisfying the AMQHash and AMQ interfaces, it is possible to obtain proofs of certain standard probabilistic properties or simplifications *for free*.
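Because Properties 3 and 4 only mention the deterministic operations, they can be checked exhaustively for a tiny instance. The Python sketch below does so for a 4-bit Bloom filter with two hashes. It is an illustration only: the function names are hypothetical, states are tuples of booleans, and the capacity side condition is dropped since the Bloom filter has no capacity limit.

```python
from itertools import product

M, K = 4, 2  # bits and hashes; tiny so that all cases can be enumerated


def bf_add_internal(state, indices):
    """Raise every bit named in `indices`; `state` is a tuple of bools."""
    return tuple(bit or (i in indices) for i, bit in enumerate(state))


def bf_query_internal(state, indices):
    """True iff every bit named in `indices` is raised."""
    return all(state[i] for i in indices)


def check_amq_properties(add, query, states, outputs):
    """Exhaustively check Properties 3 and 4 for the given operations."""
    for s, v in product(states, outputs):
        # Property 3: a freshly inserted output is found by a query.
        if not query(add(s, v), v):
            return False
        # Property 4: positive queries survive any further insertion.
        if query(s, v) and not all(query(add(s, w), v) for w in outputs):
            return False
    return True


all_states = list(product([False, True], repeat=M))
all_outputs = list(product(range(M), repeat=K))
```

Calling `check_amq_properties(bf_add_internal, bf_query_internal, all_states, all_outputs)` succeeds, whereas an insertion that could clear bits would fail the check.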

The diagram in Fig. 1 provides a high-level overview of the interfaces of Ceramist, their specific instances, and dependencies between them, demonstrating Ceramist's take on compositional reasoning and proof reuse. For instance, the Bloom filter implementation instantiates the AMQ interface and uses, as a component, hash vectors, which themselves instantiate the AMQHash interface used by AMQ. The Bloom filter itself is also used as a proof reduction target by the Counting Bloom filter. We will elaborate on this and the other noteworthy dependencies between interfaces and instances of Ceramist in the following sections.

#### **4.3 Counting Bloom Filters Through Ceramist**

To provide a concrete demonstration of the use of the AMQ interface, we now switch over to a new running example: Counting Bloom filters [46]. A Counting Bloom filter is a variant of the Bloom filter in which individual bits are replaced with counters, thereby allowing the removal of elements. The implementation of the structure closely follows the Bloom filter, generalising the logic from bits to counters: insertion increments the counters specified by the hash outputs, while queries treat counters as set if greater than 0. In the remainder of this section, we will show how to encode and verify the Counting Bloom filter for the standard AMQ properties. We have also proven two novel domain-specific properties of Counting Bloom filters (*cf.* Appendix A of the extended paper version [25]).

**Fig. 1.** Overview of Ceramist and the dependencies between its components.

First, as the Counting Bloom filter uses the same hashing strategy as the Bloom filter, the hash interface can be instantiated with the Hash Vector structure used for the Bloom filter, entirely reusing the earlier proofs on hash vectors. Next, in order to instantiate the AMQ interface, the state parameter can be defined as a vector of bounded integers, all initially set to 0:

```
Record CF := mkCF { countingbloomfilter_state: m.-tuple Zp }.
Definition cf_new : CF := (* a new CF with all counters set to 0 *).
```
As mentioned before, the *add* operation increments counters rather than setting bits, and the *query* operation treats counters greater than 0 as raised.

$$\begin{aligned} \mathsf{cf\\_add\\_int} &: \mathsf{CF} \to \mathsf{seq}\ \mathbb{Z}\_m \to \mathsf{CF}\\ \mathsf{cf\\_query\\_int} &: \mathsf{CF} \to \mathsf{seq}\ \mathbb{Z}\_m \to \mathsf{bool} \end{aligned}$$

To prevent integer overflows, the counters in the Counting Bloom filter are bounded to some range $\mathbb{Z}\_p$, so the overall data structure too has a maximum capacity. It would not be possible to insert any values if doing so would raise any of the counters above their maximum. To account for this, the capacity parameter of the AMQ interface is instantiated with a simple predicate cf\_available\_capacity that verifies that the structure can support l further inserts by ensuring that each counter has at least $k \cdot l$ spaces free (where k is the number of hash functions used by the data structure).

The add operation can be shown to be monotone on the value of any counter when there is sufficient capacity (Property 3). The remaining properties of the operations also trivially follow, thereby completing the instantiation, and allowing the automatic derivation of the No-False-Negatives result via Theorem 7.
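The deterministic part of this instantiation can be sketched in Python as follows. This is an illustration, not the Coq code: the bound `P`, the tuple representation of states, and the choice to increment a counter once per occurrence of its index are assumptions of the sketch.

```python
P = 4  # counters range over 0..P-1 (an assumed bound for this sketch)
K = 2  # number of hash functions


def cf_new(m):
    """A new state with all m counters set to 0."""
    return (0,) * m


def cf_available_capacity(state, l):
    """True iff every counter has at least K*l increments of headroom."""
    return all(count + K * l <= P - 1 for count in state)


def cf_add_internal(state, indices):
    """Increment the counter at each index (once per occurrence)."""
    counters = list(state)
    for i in indices:
        counters[i] += 1
    return tuple(counters)


def cf_query_internal(state, indices):
    """A counter counts as raised when it is greater than 0."""
    return all(state[i] > 0 for i in indices)


s0 = cf_new(3)
s1 = cf_add_internal(s0, (0, 0))  # both hashes land on counter 0
```

In this run `s1` is `(2, 0, 0)`: a query for `(0, 0)` succeeds, a query for `(0, 1)` fails, and `cf_available_capacity(s1, 1)` is false because counter 0 can no longer absorb two further increments without exceeding `P - 1`.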

#### **4.4 Proofs About False Positive Probabilities by Reduction**

As the observable behaviour of the Counting Bloom filter almost exactly matches that of the Bloom filter, it seems reasonable that the same probabilistic bounds should also apply to the data structure. To facilitate these proof arguments, we provide the AMQMap interface that allows the derivation of probabilistic bounds by reducing one AMQ data structure to another.

The AMQMap interface is parameterised by two AMQ data structures, AMQ A and B, using the same hashing operation. It is assumed that corresponding bounds on False Positive rates have already been proven for AMQ B, but not yet for AMQ A. The interface first assumes the existence of some mapping from the state of AMQ A to AMQ B, which satisfies a number of properties:

Parameter AMQ\_state\_map: A.AMQState → B.AMQState.

In the case of our Counting Bloom filter example, this mapping would convert the Counting Bloom filter state to a bit vector by mapping each counter to a raised bit if its value is greater than 0. To support the derivation of the false positive rate bound, the AMQMap interface then requires the behaviour of this mapping to satisfy a number of additional assumptions:

**Property 5 (AMQ Mapping Add Commutativity).** *Adding a hash output to the AMQ B obtained by applying the mapping to an instance of AMQ A produces the same result as first adding a hash output to AMQ A and then applying the mapping to the result.*

**Property 6 (AMQ Mapping Query Preservation).** *Applying B's query operation to the result of mapping an instance of AMQ A produces the same result as applying A's query operation directly.*

In the case of reducing Counting Bloom filters (A) to Bloom filters (B), both results follow from the fact that after incrementing some counters, all of them will have values greater than 0 and thus be mapped to raised bits.

Having instantiated the AMQMap interface with the corresponding function and proofs about it, it is now possible to derive the false positive rate of Bloom filters for Counting Bloom filters for free through the following generalised lemma:

**Theorem 8 (AMQ False Positive Reduction).** *For any two AMQs A, B, related by the* AMQMap *interface, if the false positive rate for B after inserting* l *items is given by the function* f *on* l*, then the false positive rate for* A *is also given by* f *on* l*. That is, in terms of probabilities:*

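$$\begin{aligned} &\Pr\left[\mathsf{B.AMQ\\_addm}\ \mathit{xs}\ (\mathit{hs}, \mathsf{B.AMQ\\_new}) \rhd \mathsf{B.AMQ\\_query}\ y\right] = f(\mathsf{length}\ \mathit{xs}) \implies\\ &\Pr\left[\mathsf{A.AMQ\\_addm}\ \mathit{xs}\ (\mathit{hs}, \mathsf{A.AMQ\\_new}) \rhd \mathsf{A.AMQ\\_query}\ y\right] = f(\mathsf{length}\ \mathit{xs}). \end{aligned}$$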
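Properties 5 and 6 for the reduction from Counting Bloom filters to Bloom filters can likewise be checked exhaustively on a tiny instance. The Python sketch below is an illustration with hypothetical names and assumed parameters; it only enumerates counter states that still have room for one more insertion.

```python
from itertools import product

M, K, P = 3, 2, 4  # counters, hashes, counter bound (all assumed, tiny)


def bf_add(state, indices):
    return tuple(bit or (i in indices) for i, bit in enumerate(state))


def bf_query(state, indices):
    return all(state[i] for i in indices)


def cf_add(state, indices):
    counters = list(state)
    for i in indices:
        counters[i] += 1
    return tuple(counters)


def cf_query(state, indices):
    return all(state[i] > 0 for i in indices)


def state_map(cf_state):
    """Map each counter to a raised bit iff it is greater than 0."""
    return tuple(count > 0 for count in cf_state)


def check_reduction():
    outputs = list(product(range(M), repeat=K))
    # Only states with room for one more insertion, so no counter overflows.
    states = product(range(P - K), repeat=M)
    for s, v in product(states, outputs):
        # Property 5: mapping commutes with insertion.
        if state_map(cf_add(s, v)) != bf_add(state_map(s), v):
            return False
        # Property 6: mapping preserves query results.
        if cf_query(s, v) != bf_query(state_map(s), v):
            return False
    return True
```

`check_reduction()` returns `True` for these parameters, which mirrors the argument in the text: once a counter has been incremented it is positive, and the mapping sends it to a raised bit.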

#### **5 Proof Automation for Probabilistic Sums**

We have, until now, avoided discussing details of how facts about the probabilistic computations can be composed, and thereby also the specifics of how our proofs are structured. As it turns out, most of this process resolves to reasoning about summations over real values as encoded by Ssreflect's bigop library. Our development also relies on the tactic library by Martin-Dorel and Soloviev [32].

In this section, we outline some of the most essential proof principles facilitating the proofs-by-rewriting about probabilistic sums. While most of the provided rewriting primitives are standalone general equality facts, some of our proof techniques are better understood as combining a series of rewritings into a more general rewriting pattern. To delineate these two cases, we will use the terminology **Pattern** to refer to a general pattern our library supports by means of a dedicated Coq tactic, while **Lemma** will refer to standalone proven equalities.

#### **5.1 The Normal Form for Composed Probabilistic Computations**

When stating properties on outcomes of a probabilistic computation (*cf.* Sect. 3.1), the computation must first be recursively evaluated into a distribution, where the intermediate results are combined using the probabilistic bind operator. Therefore, when decomposing a probabilistic property into smaller subproofs, we must rely on its semantics, which is defined for discrete distributions as follows:

$$\mathtt{bind\\_dist}\ (P:\mathtt{dist}\ A)\ (f:A\to\mathtt{dist}\ B)\ \triangleq \sum\_{a:\ A}\sum\_{b:\ B}P\ a\times (f\ a)\ b$$

Expanding this definition, one can represent any statement on the outcome of a probabilistic computation in a *normal form* composed of only nested summations over a product of the probabilities of each intermediate computational step. This paramount transformation is captured as the following pattern:

#### **Pattern 1 (Bind normalisation)**

$$\Pr\left[\left(c\_1 \rhd \dots \rhd c\_m\right) = v\right] = \sum\_{v\_1} \dots \sum\_{v\_{m-1}} \Pr\left[c\_1 = v\_1\right] \times \dots \times \Pr\left[c\_m \ v\_{m-1} = v\right]$$

Here, by $c\_i\ v\_{i-1} = v\_i$, we denote the event in which the result of evaluating the command $c\_i\ v\_{i-1}$ is $v\_i$, where $v\_{i-1}$ is the result of evaluating the previous command in the chain. This transformation then allows us to resolve the proof of a given probabilistic property into proving simpler statements on its substeps. For instance, consider the implementation of Bloom filter's query operation from Sect. 3.1. When proving properties of the result of a particular query (as in Theorem 3), we use this rule to decompose the program into its component parts, namely as being the product of a hash invocation Pr [hash\_vec\_int x *hs*] and the deterministic query operation bf\_query\_int. This allows dealing with the hash operation and the deterministic component *separately* by applying subsequent rewritings to each factor on the right-hand side of the above equality.
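As a small worked instance of Pattern 1, with $c\_1$ and $c\_2$ chosen purely for illustration, take $c\_1 = \mathsf{rand}\ 2$ and $c\_2\ v\_1 = \mathsf{ret}\ (v\_1 = 0)$. The normal form then consists of a single summation over the outcome of the random draw:

$$\Pr\left[(c\_1 \rhd c\_2) = \mathsf{true}\right] = \sum\_{v\_1 \in \lbrace 0, 1 \rbrace} \Pr\left[\mathsf{rand}\ 2 = v\_1\right] \times \Pr\left[\mathsf{ret}\ (v\_1 = 0) = \mathsf{true}\right] = \frac{1}{2} \times 1 + \frac{1}{2} \times 0 = \frac{1}{2}.$$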

#### **5.2 Probabilistic Summation Patterns**

Having resolved a property into our normal form via a tactic implementing Pattern 1, the subsequent reductions rely on the following patterns and lemmas.

*Sequential Composition.* When reasoning about the properties of composite programs, it is common for some subprogram e to return a probabilistic result that is then used as the argument of a probabilistic function f. This composition is encapsulated by the operation $e \rhd f$, as used by Theorem 3, Lemma 2, and Theorem 4. The corresponding programs, once converted to the normal form, are characterised by having factors within their internal product that simply evaluate the probability of the final statement $\mathsf{ret}\ v'$ to produce a particular value $v\_k$:

$$\sum\_{v\_1} \cdots \sum\_{v\_{m-1}} \underbrace{\Pr\left[c\_1 = v\_1\right] \times \cdots \times \Pr\left[\text{ret } v' = v\_k\right]}\_{\text{e}} \underbrace{\cdots \times \Pr\left[c\_m \, v\_{m-1} = v\right]}\_{\text{f}}$$

Since the return operation is defined as a delta distribution with a peak at the return value $v'$, we can simplify the statement by removing the summation over $v\_k$, and replacing all occurrences of $v\_k$ with $v'$, via the following pattern:

**Pattern 2 (Probability of a Sequential Composition).**

$$\sum\_{v\_1} \cdots \sum\_{v\_{m-1}} \Pr\left[\text{ret } v'=v\_1\right] \cdots \times \Pr\left[c\_m \, v\_{m-1}=v\right]$$

$$= \sum\_{v\_2} \cdots \sum\_{v\_{m-1}} \Pr\left[ [v'/v\_1](c\_2 \, v\_1) = v\_2 \right] \times \cdots \times \Pr\left[ [v'/v\_1]c\_m \, v\_{m-1} = v \right]$$

Notice that, without loss of generality, Pattern 2 assumes that the $v'$-containing factor is in the head. Our tactic implicitly rewrites the statement to this form.
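To see why the summation over the bound variable disappears, recall that $\Pr[\mathsf{ret}\ v' = v\_1]$ is 1 when $v\_1 = v'$ and 0 otherwise. For a chain of length two, and assuming nothing beyond this delta-distribution semantics of ret, only the term with $v\_1 = v'$ survives:

$$\sum\_{v\_1} \Pr\left[\mathsf{ret}\ v' = v\_1\right] \times \Pr\left[c\_2\ v\_1 = v\right] = \Pr\left[\mathsf{ret}\ v' = v'\right] \times \Pr\left[c\_2\ v' = v\right] = \Pr\left[c\_2\ v' = v\right].$$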

*Plausible Statement Sequencing.* One common issue with the normal form is that, as each statement is evaluated over the entirety of its support, some of the dependencies between statements are obscured. That is, the outputs of one statement may in fact be constrained to *some subset* of the complete support. To recover these dependencies, we provide the following lemma, which allows reducing computations under the assumption that their inputs are plausible:

**Lemma 3 (Plausible Sequencing).** *For any computation sequence* c<sub>1</sub> ▷ c<sub>2</sub>*, if it is possible to reduce the computation* c<sub>2</sub> x *to a simpler form* c<sub>3</sub> x *when* x *is amongst the plausible outcomes of* c<sub>1</sub> *(*i.e.*,* Pr [c<sub>1</sub> = x] ≠ 0 *holds), then it is possible to rewrite* c<sub>2</sub> *to* c<sub>3</sub> *without changing the resulting distribution:*

$$\sum\_{x} \sum\_{y} \Pr\left[c\_1 = x\right] \times \Pr\left[c\_2 \; x = y\right] = \sum\_{x} \sum\_{y} \Pr\left[c\_1 = x\right] \times \Pr\left[c\_3 \; x = y\right]$$
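As a sanity check of Lemma 3, the sketch below compares the composed distributions for two continuations that agree only on the plausible outcomes of the first computation. The computations `c1`, `c2`, and `c3` are hypothetical toy examples chosen for illustration.

```python
from fractions import Fraction

# c1 is uniform over the even values {0, 2, 4}; odd values are implausible.
c1 = {0: Fraction(1, 3), 2: Fraction(1, 3), 4: Fraction(1, 3)}

def c2(x):
    # Original computation: branches on the parity of its input.
    return {x // 2: Fraction(1)} if x % 2 == 0 else {-1: Fraction(1)}

def c3(x):
    # Simpler form: agrees with c2 only when x is even.
    return {x // 2: Fraction(1)}

xs = range(6)
ys = range(-1, 6)

def composed(c_next):
    """Distribution of the sequence c1 followed by c_next, one entry per outcome y."""
    return {
        y: sum(c1.get(x, Fraction(0)) * c_next(x).get(y, Fraction(0)) for x in xs)
        for y in ys
    }

same_distribution = composed(c2) == composed(c3)
differ_on_implausible_input = c2(3) != c3(3)
```

The two continuations differ on the input 3, yet the composed distributions coincide because Pr [c<sub>1</sub> = 3] = 0.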

*Plausible Outcomes.* As was demonstrated in the previous paragraph, it is sometimes possible to gain knowledge that a particular value v is a plausible outcome for a composite probabilistic computation c<sub>1</sub> ▷ ... ▷ c<sub>m</sub>:

$$\sum\_{v\_1} \cdots \sum\_{v\_{m-1}} \Pr\left[c\_1 = v\_1\right] \times \cdots \times \Pr\left[c\_m \ v\_{m-1} = v\right] \neq 0$$

This fact in itself is not particularly helpful as it does not immediately provide any usable constraints on the value v. However, we can now turn this inequality into a conjunction of inequalities for individual probabilities, thus getting more information about the intermediate steps of the computation:

**Pattern 3.** If

$$\sum\_{v\_1} \cdots \sum\_{v\_{m-1}} \Pr\left[c\_1 = v\_1\right] \times \cdots \times \Pr\left[c\_m \ v\_{m-1} = v\right] \neq 0,$$

then there exist v<sub>1</sub>, ..., v<sub>m−1</sub> such that

$$\Pr\left[c\_1 = v\_1\right] \neq 0 \wedge \cdots \wedge \Pr\left[c\_m \ v\_{m-1} = v\right] \neq 0.$$

This transformation is possible due to the fact that probabilities are always nonnegative, thus if a summation is positive, there must exist at least one element in the summation that is also positive.
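The argument behind Pattern 3 can be illustrated by enumerating witnesses on a small example. In the Python sketch below, the three computations are hypothetical; the point is only that a non-zero total forces at least one tuple of intermediate values whose factors are all non-zero.

```python
from fractions import Fraction
from itertools import product

c1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def c2(v1):
    # Deterministic doubling.
    return {2 * v1: Fraction(1)}

def c3(v2):
    # Keeps v2 with probability 1/4, increments it otherwise.
    return {v2: Fraction(1, 4), v2 + 1: Fraction(3, 4)}

def factors(v1, v2, v):
    return (c1.get(v1, 0), c2(v1).get(v2, 0), c3(v2).get(v, 0))

def total(v, support=range(4)):
    """Summation in normal form for the outcome v."""
    result = Fraction(0)
    for v1, v2 in product(support, repeat=2):
        a, b, c = factors(v1, v2, v)
        result += a * b * c
    return result

def witnesses(v, support=range(4)):
    """Intermediate values (v1, v2) for which every factor is non-zero."""
    return [
        (v1, v2)
        for v1, v2 in product(support, repeat=2)
        if all(f != 0 for f in factors(v1, v2, v))
    ]
```

For instance, `total(3)` is non-zero and `witnesses(3)` exposes the single path through v<sub>1</sub> = 1 and v<sub>2</sub> = 2, whereas an implausible outcome such as 5 has no witnesses.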

*Summary of the Development.* By composing these components together, we obtain a comprehensive toolbox for effectively reasoning about probabilistic computations. We find that our summation patterns end up encapsulating most of the book-keeping associated with our encoding of probabilistic computations, which, combined with the AMQ/AMQHash decomposition from Sect. 4, allows for a fairly straightforward approach for verifying properties of AMQs.

#### **5.3 A Simple Proof of Generalised No False Negatives Theorem**

To showcase the fluid interaction of our proof principles in action, let us consider the proof of the generalised No-False-Negatives Theorem 7, stating the following:

$$\Pr\left[\underbrace{\mathsf{AMQ\\_add}\ x\ (hs, s)}\_{(a),(b)} \rhd \underbrace{\mathsf{AMQ\\_addm}\ xs}\_{(c)} \rhd \underbrace{\mathsf{AMQ\\_query}\ x}\_{(d),(e)}\right] = 1 \tag{1}$$

As with most of our probabilistic proofs, we begin by applying normalisation Pattern 1 to reduce the computation into our normal form:

$$\sum\_{\iota s\_0, hs\_0} \sum\_{s\_0} \sum\_{s\_1, hs\_1} \sum\_{\iota s\_2, hs\_2} \begin{pmatrix} (a)\ \Pr[\mathsf{AMQHash\\_hash}\ x\ hs = (\iota s\_0, hs\_0)] & \times \\ (b)\ \Pr[\mathsf{ret}\ (\mathsf{AMQ\\_add\\_internal}\ s\ \iota s\_0) = s\_0] & \times \\ (c)\ \Pr[\mathsf{AMQ\\_addm}\ xs\ (s\_0, hs\_0) = (s\_1, hs\_1)] & \times \\ (d)\ \Pr[\mathsf{AMQHash\\_hash}\ x\ hs\_1 = (\iota s\_2, hs\_2)] & \times \\ (e)\ \Pr[\mathsf{ret}\ (\mathsf{AMQ\\_query\\_internal}\ s\_1\ \iota s\_2)] & \end{pmatrix}$$

We label the factors to be rewritten as (a)–(e) for the convenience of the presentation, indicating the correspondence to the components of the statement (1). From here, as all values are assumed to be unseen, we can use Property 1 in conjunction with the sequencing Pattern 2 to reduce factors (a) and (b) as follows:

$$\sum\_{\iota s\_0} \sum\_{s\_1, hs\_1} \sum\_{\iota s\_2, hs\_2} \begin{pmatrix} (a)\ p\_{\text{hash}} & \times \\ (c)\ \Pr\left[\mathsf{AMQ\\_addm}\ xs\ \left((s \leftarrow\_{\text{add}} \iota s\_0), (hs \leftarrow\_{\text{hash}} (x : \iota s\_0))\right) = (s\_1, hs\_1)\right] & \times \\ (d)\ \Pr\left[\mathsf{AMQHash\\_hash}\ x\ hs\_1 = (\iota s\_2, hs\_2)\right] & \times \\ (e)\ \Pr\left[\mathsf{AMQ\\_query\\_internal}\ s\_1\ \iota s\_2\right] & \end{pmatrix}$$

Here, p<sub>hash</sub> is the probability from the statement of Property 1. We also introduce the notations *s* ←<sub>add</sub> *ιs*<sub>0</sub> and *hs* ←<sub>hash</sub> (*x* : *ιs*<sub>0</sub>) to denote the deterministic operations AMQ\_add\_internal and AMQHash\_add\_internal respectively. Then, using Pattern 3 for decomposing plausible outcomes, it is possible to separately show that any plausible *hs*<sub>1</sub> from AMQ\_addm must map x to *ιs*<sub>0</sub>, as hash operations preserve mappings. Combining this fact with Lemma 3 (plausible sequencing) and Hash Consistency (Property 2), we can derive that the execution of AMQHash\_hash on x in (d) must return *ιs*<sub>0</sub>, simplifying the summation even further:

$$\sum\_{\iota s\_0} \sum\_{s\_1, hs\_1} \begin{pmatrix} (a)\ p\_{\text{hash}} & \times \\ (c)\ \Pr\left[\mathsf{AMQ\\_addm}\ xs\ \left((s \leftarrow\_{\text{add}} \iota s\_0), (hs \leftarrow\_{\text{hash}} (x : \iota s\_0))\right) = (s\_1, hs\_1)\right] & \times \\ (e)\ \Pr\left[\mathsf{AMQ\\_query\\_internal}\ s\_1\ \iota s\_0\right] & \end{pmatrix}$$

Finally, as *s*<sub>1</sub> is a plausible outcome from AMQ\_addm called on *s* ←<sub>add</sub> *ιs*<sub>0</sub>, we can then show, using Property 4 (query preservation), that querying for *ιs*<sub>0</sub> on *s*<sub>1</sub> must succeed. Therefore, the entire summation reduces to the summation of distributions over their support, which can be trivially shown to be 1.

#### **6 Overview of the Development and More Case Studies**

The Ceramist mechanised framework is implemented as a library in the Coq proof assistant [24]. It consists of three main sub-parts, each handling a different aspect of constructing and reasoning about AMQs: (*i*) a library of *bounded-length data structures*, enhancing MathComp's [31] support for reasoning about finite sequences with varying lengths; (*ii*) a library of *probabilistic computations*, extending the infotheo probability theory library [2] with definitions of deeply embedded probabilistic computations and a collection of tactics and lemmas on summations described in Sect. 5; and (*iii*) the *AMQ interfaces and instances* representing the core of our framework described in Sect. 4.

Alongside these core components, we also include four specific case studies to provide concrete examples of how the library can be used for practical verification. Our first two case studies are the mechanisation of the Bloom filter [6] and the Counting Bloom filter [46], as discussed earlier. In proving the false-positive rate for Bloom filters, we follow the proof by Bose *et al.* [8], also providing the first mechanised proof of the closed expression for Stirling numbers of the second kind. Our third case study provides mechanised verification of the quotient filter [5]. Our final case study is a mechanisation of the Blocked AMQ, a family of AMQs with a common aggregation strategy. We instantiate this abstract structure with each of the prior AMQs, obtaining, among others, a mechanisation of Blocked Bloom filters [40]. The sizes of each library component, along with the references to the sections that describe them, are given in the table above.

Of particular note, owing to the extensive proof reuse supported by Ceramist, the proof size for each of our case studies *progressively decreases*, with around a 50% reduction in size from our initial proofs of Bloom filters to the final case studies of different Blocked AMQ instances.

#### **6.1 Quotient Filter**

A quotient filter [5] is a type of AMQ data structure optimised to be more cache-friendly than other typical AMQs. In contrast to the relatively simple internal vector-based states of the Bloom filters, a quotient filter works by internally maintaining a hash table to track its elements.

The internal operations of a quotient filter build upon a fundamental notion of *quotienting*, whereby a single p-bit hash outcome is split into two by treating the upper q bits (the quotient) and the lower r bits (the remainder) separately. Whenever an element is inserted or queried, the item is first hashed with a single hash function and the output is then quotiented. The operations of the quotient filter then work by using the q-bit quotient to specify a bucket of the hash table, and the r-bit remainder as a proxy for the element, such that a query for an element will succeed if its remainder can be found in the corresponding bucket.
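The quotienting scheme described above can be sketched in a few lines of Python. This is a toy model for illustration only: the class `ToyQuotientFilter`, the use of SHA-256 as the hash function, and the representation of buckets as Python sets are all assumptions, and do not reflect the compact table layout of [5] or the Coq encoding in Ceramist.

```python
import hashlib

def quotienting(hash_value, r):
    """Split a hash into (upper bits, lower r bits) = (quotient, remainder)."""
    return divmod(hash_value, 2 ** r)

class ToyQuotientFilter:
    def __init__(self, q, r):
        self.q, self.r = q, r
        # One bucket per possible q-bit quotient.
        self.buckets = [set() for _ in range(2 ** q)]

    def _hash(self, item):
        # Hypothetical p-bit hash (p = q + r), truncated from SHA-256.
        digest = hashlib.sha256(repr(item).encode()).digest()
        return int.from_bytes(digest, "big") % (2 ** (self.q + self.r))

    def add(self, item):
        quotient, remainder = quotienting(self._hash(item), self.r)
        self.buckets[quotient].add(remainder)

    def query(self, item):
        quotient, remainder = quotienting(self._hash(item), self.r)
        return remainder in self.buckets[quotient]

qf = ToyQuotientFilter(q=4, r=4)
for word in ["alpha", "beta", "gamma"]:
    qf.add(word)
inserted_found = all(qf.query(w) for w in ["alpha", "beta", "gamma"])
```

Inserted elements are always found, while a query for an element that was never inserted may still succeed if its full p-bit hash coincides with that of an inserted element.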

A false positive can occur if the outputs of the hash function happen to exactly collide for two particular values (collisions in just the quotient or remainder are not sufficient to produce an incorrect result). Therefore, it is possible to reduce the event of a false positive in a quotient filter to the event that at least one in several draws from a uniform distribution produces a particular value. We encode quotient filters by instantiating the AMQHash interface from Sect. 4.1 with a *single* hash function, rather than a vector of hash functions, which is used by the Bloom filter variants (Sect. 2). The size of the output of this hashing operation is defined to be 2<sup>q</sup> · 2<sup>r</sup>, and a corresponding quotienting operation is defined by taking the quotient and remainder from dividing the hash output by 2<sup>q</sup>. With this encoding, we are able to provide a mechanised proof of the false positive rate for the quotient filter implemented using a p-bit hash as being:

**Theorem 9 (Quotient filter False Positive Rate).** *For a hash-function hs, after inserting a series of* l *unseen distinct values xs into an empty quotient filter qf, for any unseen* y ∉ *xs, the probability of a query* y ∈? *qf for* y *returning true is given by:*

$$\Pr\left[\mathsf{qf\\_addm}\ xs\ (hs, \mathsf{qf\\_new}) \rhd \mathsf{qf\\_query}\ y\right] = 1 - \left(1 - \frac{1}{2^p}\right)^l.$$
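The closed form of Theorem 9 can be cross-checked against a brute-force count for small parameters. The sketch below assumes, as in the informal argument above, that each inserted value receives an independent uniform p-bit hash; the function names are hypothetical and the enumeration is not part of the mechanised proof.

```python
from fractions import Fraction
from itertools import product

def quotient_filter_fp_rate(p, l):
    """Closed form: 1 - (1 - 1/2^p)^l."""
    return 1 - (1 - Fraction(1, 2 ** p)) ** l

def brute_force_fp_rate(p, l):
    """Fraction of hash assignments for l values that hit the hash of y.

    By symmetry, the hash of the queried value y is fixed to 0.
    """
    outcomes = list(product(range(2 ** p), repeat=l))
    hits = sum(1 for hashes in outcomes if 0 in hashes)
    return Fraction(hits, len(outcomes))

rates_agree = all(
    quotient_filter_fp_rate(p, l) == brute_force_fp_rate(p, l)
    for p in (1, 2, 3)
    for l in (0, 1, 2, 3)
)
```

For example, with p = 2 and l = 3, both computations give 37/64.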

#### **6.2 Blocked AMQ**

Blocked Bloom filters [40] are a cache-efficient variant of Bloom filters where a single instance of the structure is composed of a vector of m independent Bloom filters, using an additional "meta"-hash operation to distribute values between the elements. When querying for a particular element, the meta-hash operation would first be consulted to select a particular instance to delegate the query to.

While prior research has only focused on applying this blocking design to Bloom filters, we found that this strategy is in fact generic over the choice of AMQ, allowing us to formalise an abstract Blocked AMQ structure, and later instantiate it for particular choices of "basic" AMQs. As such, this data structure highlights the scalability of Ceramist with respect to the composition of programs and proofs.

Our encoding of Blocked AMQs within Ceramist is done by means of two higher-order modules as in Fig. 1: (*i*) a *multiplexed-hash* component, parameterised over an arbitrary hashing operation, and (*ii*) a *blocked-state* component, parameterised over some instantiation of the AMQ interface. The multiplexed hash captures the relation between the meta-hash and the hashing operations of the basic AMQ, randomly multiplexing hashes to particular hashing operations of the sub-components. We construct a multiplexed hash as a composition of the hashing operation H used by the AMQ in each of the m blocks, and a meta-hash function to distribute queries between the m blocks. The state of this structure is defined as a pairing of m states of the hashing operation H, one for each of the m blocks of the AMQ, with the state of the meta-hash function. As such, hashing a value v with this operation produces a *pair* of type (ℤ<sub>m</sub>, Value), where the first element is obtained by hashing v over the meta-hash to select a particular block, and the second element is produced by hashing v again over the hash operation H for this selected block. With this custom hashing operation, the state component of the Blocked AMQ is defined as a sequence of m states of the AMQ, one for each block. The insertion and query operations work on the output of the multiplexed hash by using the first element to select a particular element of the sequence, and then using the second element as the value to be inserted into or queried on this selected state.

Having instantiated the data structure as described above, we proved the following abstract result about the false positive rate for blocked AMQs:

**Theorem 10 (Blocked AMQ False Positive Rate).** *For any AMQ* A *with a false positive rate after inserting* l *elements estimated as* f(l)*, for a multiplexed hash-function hs, after having inserted* l *distinct values xs, all unseen in hs, into an empty Blocked AMQ filter bf composed of* m *instances of* A*, for any unseen* y ∉ *xs, the probability of a subsequent query* y ∈? *bf for* y *returning* true *is given by:*

$$\Pr\left[\mathsf{BA\\_addm}\ xs\ (hs, \mathsf{BA\\_new}) \rhd \mathsf{BA\\_query}\ y\right] = \sum\_{i=0}^{l} \binom{l}{i} \left(\frac{1}{m}\right)^{i} \left(1 - \frac{1}{m}\right)^{l-i} f(i).$$
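The bound of Theorem 10 is a binomial mixture of the underlying rate f, weighted by how many of the l insertions land in the block selected for the query. A minimal Python sketch of the formula follows; the function `blocked_fp_rate` and the example rate `qf_rate` (the quotient filter rate of Theorem 9 with a hypothetical p = 4) are illustrative names only.

```python
from fractions import Fraction
from math import comb

def blocked_fp_rate(f, l, m):
    """Sum over i of C(l, i) * (1/m)^i * (1 - 1/m)^(l - i) * f(i)."""
    block = Fraction(1, m)
    return sum(
        comb(l, i) * block ** i * (1 - block) ** (l - i) * f(i)
        for i in range(l + 1)
    )

def qf_rate(i, p=4):
    # False positive rate of a quotient filter after i insertions (Theorem 9).
    return 1 - (1 - Fraction(1, 2 ** p)) ** i

single_block = blocked_fp_rate(qf_rate, 5, 1)
eight_blocks = blocked_fp_rate(qf_rate, 5, 8)
```

Two sanity checks follow directly from the formula: with a single block (m = 1) the rate is f(l), and a constant f is left unchanged because the binomial weights sum to 1.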

We instantiated this interface with each of the previously defined AMQ structures, obtaining the Blocked Bloom filters, Counting Blocked Bloom filters and Blocked Quotient filter along with proofs of similar properties for them, for free.

#### **7 Discussion and Related Work**

*Proofs About AMQs.* While there has been a wealth of prior research into approximate membership query structures and their probabilistic bounds, the prevalence of paper-and-pencil proofs has meant that errors in analysis have gone unnoticed and propagated throughout the literature.

The most notable example is in Bloom's original paper [6], wherein dependencies between setting bits lead to an incorrect formulation of the bound (equation (17)), which has since been repeated in several papers [9,14,15,33] and even textbooks [34]. While this error was later identified by Bose *et al.* [8], their own analysis was also marred by an error in their definition of Stirling numbers of the second kind, resulting in yet another incorrect bound. This was corrected two years later by Christensen *et al.* [10], who avoided the error by eliding Stirling numbers altogether and deriving the bound directly. Furthermore, despite these corrections, many subsequent papers [13,28–30,40,41,46] still use Bloom's original incorrect bounds. For example, in their analysis of a Blocked Bloom filter, Putze *et al.* [40] derive an incorrect bound on the false positive rate by assuming that the false positive rates of the constituent Bloom filters are given by Bloom's bound. While Ceramist is, to the best of our knowledge, the first development that provides a mechanised proof of the probabilistic properties of Bloom filters, prior research has considered their deterministic properties. In particular, Blot *et al.* [7] provided a mechanised proof of the absence of false negatives for their implementation of a Bloom filter.

*Mechanically Verified Probabilistic Algorithms.* Past research has also focused on the verification of probabilistic algorithms, and our work builds on the results and ideas from several of these developments. The ALEA library tackles the task of proving properties of probabilistic algorithms [3]; however, in contrast to our deep embedding of computations, ALEA uses a shallow embedding through a Giry monad [20], representing probabilistic programs as measures over their outcomes. ALEA also axiomatises a custom type to represent reals between 0 and 1, which means they must independently prove any properties on reals they use, increasing the proof effort. The Foundational Cryptography Framework (FCF) [39] was developed for proving the security properties of cryptographic programs and provides an encoding of probabilistic algorithms. Rather than developing tooling for solving probabilistic obligations, their library proves probabilistic properties by reducing them to standard programs with known distributions. While this strategy follows the structure of cryptographic proofs, the simple tooling makes directly proving probabilistic bounds challenging. Tassarotti *et al.*'s Polaris [47] library for reasoning about probabilistic concurrent algorithms also uses the same reduction strategy, and thereby inherits the same issues with proving standalone bounds. Hölzl considers mechanised verification of probabilistic programs in Isabelle/HOL [27], using a similar composition of probability and computation monads to encode probabilistic programs. However, his construction defines the semantics of programs as infinite Markov chains represented as co-inductive streams, making it unsuitable for capturing terminating programs. Our previous effort on mechanising the probabilistic properties of blockchains also considered the encoding of probabilistic computations in Coq [23]. While that work also relied on infotheo's probability monad, it only considered a restricted form of probabilistic properties, and did not deliver reusable tooling for the task.

*Proofs of Differential Privacy.* A popular motivation for reasoning about probabilistic computations is for the purposes of demonstrating differential privacy. Barthe *et al.*'s CertiPriv framework [4] extends ALEA to support reasoning using a Probabilistic Relational Hoare logic, and uses this fragment to prove probabilistic non-interference arguments. More recently, Barthe *et al.* [44] have developed a mechanisation that supports a more general coupling between distributions. Given the focus on relational properties, these developments are not suited for proving explicit numerical bounds as Ceramist is.

## **8 Conclusion**

The key properties of Approximate Membership Query structures are inherently probabilistic. Formalisations of those properties are frequently stated incorrectly, due to the complexity of the underlying proofs. We have demonstrated the feasibility of conducting such proofs in a machine-assisted framework. The main ingredients of our approach are a principled decomposition of structure definitions and proof automation for manipulating probabilistic sums. Together, they enable scalable and reusable mechanised proofs about a wide range of AMQs.

**Acknowledgements.** We thank Georges Gonthier, Karl Palmskog, George Pîrlea, Prateek Saxena, and Anton Trunov for their comments on the preliminary versions of the paper. We thank the CPP'20 referees (especially Reviewer D) for pointing out that the formulation of the closed form for Stirling numbers of the second kind, which we adopted as an axiom from the work by Bose *et al.* [8] who used it in the proof of Theorem 4, implied False. This discovery forced us to prove the closed form statement in Coq from first principles, thus getting rid of the corresponding axiom and eliminating all potentially erroneous assumptions. Finally, we are grateful to the CAV'20 reviewers for their feedback.

Ilya Sergey's work has been supported by the grant of Singapore NRF National Satellite of Excellence in Trustworthy Software Systems (NSoE-TSS) and by Crystal Centre at NUS School of Computing.

## **References**



## **Global PAC Bounds for Learning Discrete Time Markov Chains**

Hugo Bazille<sup>1</sup>, Blaise Genest<sup>1</sup>, Cyrille Jegourel<sup>2(B)</sup>, and Jun Sun<sup>3</sup>

<sup>1</sup> Univ Rennes, CNRS & Rennes 1, Rennes, France, {hbazille,bgenest}@irisa.fr

<sup>2</sup> Singapore University of Technology and Design, Singapore, Singapore, cyrille.jegourel@gmail.com

<sup>3</sup> Singapore Management University, Singapore, Singapore, junsun@smu.edu.sg

**Abstract.** Learning models from observations of a system is a powerful tool with many applications. In this paper, we consider learning Discrete Time Markov Chains (DTMC), with different methods such as *frequency estimation* or *Laplace smoothing*. While models learnt with such methods converge asymptotically towards the exact system, a more practical question in the realm of trusted machine learning is how accurate a model learnt with a limited time budget is. Existing approaches provide bounds on how close the model is to the original system, in terms of bounds on *local* (transition) probabilities, which has unclear implication on the *global* behavior.

In this work, we provide *global bounds on the error* made by such a learning process, in terms of global behaviors formalized using *temporal logic*. More precisely, we propose a learning process ensuring a bound on the error in the probabilities of these properties. While such a learning process cannot exist for the full LTL logic, we provide one ensuring a bound that is uniform over all the formulas of CTL. Further, given one time-to-failure property, we provide an improved learning algorithm. Interestingly, frequency estimation is sufficient for the latter, while Laplace smoothing is needed to ensure non-trivial uniform bounds for the full CTL logic.

#### **1 Introduction**

All authors have contributed equally.

Discrete-Time Markov Chains (DTMC) are commonly used in model checking to model the behavior of stochastic systems [3,4,7,26]. A DTMC is described by a set of states and transition probabilities between these states. The main issue with modeling stochastic systems using DTMCs is to obtain the transition probabilities. One appealing approach to overcome this issue is to observe the system and to *learn automatically* these transition probabilities [8,30], e.g., using frequency estimation or Laplace (or additive) smoothing [12]. Frequency estimation works by observing a long run of the system and estimating each individual transition by its empirical frequency. However, in this case, the unseen transitions are estimated as zeros. Once the probability of a transition is set to zero, the probability to reach a state could be tremendously changed, e.g., from 1 to 0 if the probability of this transition in the system is small but non-zero. To overcome this problem, when the set of transitions with non-zero probability is known (but not their probabilities), Laplace smoothing assigns a positive probability to the unseen transitions, i.e., by adding a small quantity both to the numerator and the denominator of the estimate used in frequency estimation. Other smoothing methods exist, such as Good-Turing [15] and Kneser-Ney estimations [7], notably used in natural language processing. Although smoothing introduces estimation biases, all these methods converge asymptotically to the exact transition probabilities.
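The difference between the two estimators can be made concrete with a short Python sketch. It assumes the standard additive form of Laplace smoothing, (n<sub>ij</sub> + α) / (n<sub>i</sub> + k<sub>i</sub>α), where k<sub>i</sub> is the number of known successors of state i; the function names, the parameter `alpha`, and the observed run are hypothetical and chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction

def count_transitions(run):
    """Number of occurrences of each transition (s, t) along one observed run."""
    return Counter(zip(run, run[1:]))

def frequency_estimate(run, support):
    counts = count_transitions(run)
    estimate = {}
    for s, successors in support.items():
        n_s = sum(counts[(s, t)] for t in successors)
        # Unseen transitions get probability 0 (and so do unvisited states).
        estimate[s] = {
            t: Fraction(counts[(s, t)], n_s) if n_s else Fraction(0)
            for t in successors
        }
    return estimate

def laplace_estimate(run, support, alpha=Fraction(1)):
    counts = count_transitions(run)
    estimate = {}
    for s, successors in support.items():
        n_s = sum(counts[(s, t)] for t in successors)
        # alpha is added to each numerator, alpha * k to the denominator.
        estimate[s] = {
            t: (counts[(s, t)] + alpha) / (n_s + alpha * len(successors))
            for t in successors
        }
    return estimate

# support[s] lists the successors of s known to have non-zero probability.
support = {"s1": ["s1", "s2", "s3"], "s2": ["s1"], "s3": ["s3"]}
run = ["s1", "s1", "s2", "s1", "s1", "s1"]
freq = frequency_estimate(run, support)
lap = laplace_estimate(run, support)
```

On this run the transition from s1 to s3 is never observed: frequency estimation assigns it probability 0, whereas Laplace smoothing with α = 1 assigns it 1/7.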

In practice, however, there is often limited budget in observing and learning from the system, and the validity of the learned model is in question. In trusted machine learning, it is thus crucial to measure how the learned model differs from the original system and to provide practical guidelines (e.g., on the number of observations) to guarantee some control of their divergence.

Comparing two Markov processes is a common problem that relies on a notion of divergence. Most existing approaches focus on deviations between the probabilities of local transitions (e.g., [5,10,27]). However, a single deviation in a transition probability between the original system and the learned model may lead to large differences in their global behaviors, even when no transitions are overlooked, as shown in our Example 1. For instance, the probability of reaching a certain state may be magnified by paths which go through the same deviated transition many times. It is thus important to use a measure that quantifies the differences over global behaviors, rather than simply checking whether the differences between the individual transition probabilities are low enough.

Technically, the knowledge of a lower bound on the transition probabilities is often assumed [1,14]. While it is a soft assumption in many cases, such as when all transition probabilities are large enough, it is less clear how to obtain such a lower bound in other cases, such as when a very unlikely transition exists (e.g., a very small error probability). We show how to handle this in several cases: learning a Markov chain accurate w.r.t. this error rate, or learning a Markov chain accurate over all its global behaviors, which is possible if we know the underlying structure of the system (e.g., because we designed it, although we do not know the precise transition probabilities which are governed by uncertain forces). For the latter, we define a new concept, namely *conditioning* of a DTMC.

In this work, we model global behaviors using temporal logics. We consider Linear Temporal Logic (LTL) [24] and Computation Tree Logic (CTL) [11]. Agreeing on all formulas of LTL means that the first order behaviors of the system and the model are the same, while agreeing on CTL means that the system and the model are bisimilar [2]. Our goal is to provide stopping rules in the learning process of DTMCs that provide Probably Approximately Correct (PAC) bounds on the error in probabilities of every property in the logic between the model and the system. In Sect. 2, we recall useful notions on DTMCs and PAC-learning. We point out related works in Sect. 3. Our main contributions are as follows:


In Sect. 4, we formally state the problem and the specification that the learning process must fulfill. We also show our first contribution: the impossibility of learning a DTMC that is accurate for all LTL formulas. Nevertheless, we prove in Sect. 5 our second contribution: the existence of a global bound for the time-to-failure properties, notably used to compute the mean time between failures of critical systems (see e.g., [25]), and provide an improved learning process, based on frequency estimation. In Sect. 6, we present our main contribution: a global bound guaranteeing that the original system and a model learned by Laplace smoothing have similar behaviors for all the formulas in CTL. We show that the error bound that we provide on the probabilities of properties is close to optimal. We evaluate our approach in Sect. 7 and conclude in Sect. 8.

#### **2 Background**

In this section, we introduce the notions and notations used throughout the paper. A stochastic system S is interpreted as a set of interacting components in which the state is determined randomly with respect to a global probability measure described below.

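**Definition 1 (Discrete-Time Markov Chains).** *A Discrete-Time Markov Chain is a triple* M = (S, μ, A) *where:*

- *S is a finite set of states;*
- *μ : S → [0, 1] is an initial probability distribution over S;*
- *A : S × S → [0, 1] is a transition probability matrix, such that for every s ∈ S, the probabilities A(s, s′) over all s′ ∈ S sum to 1.*
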
We denote by m the cardinality of S and by A = (a<sub>ij</sub>)<sub>1≤i,j≤m</sub> = (A(i, j))<sub>1≤i,j≤m</sub> the probability matrix. Figures 1 and 2 show the graphs of two DTMCs over 3 states {s<sub>1</sub>, s<sub>2</sub>, s<sub>3</sub>} (with μ(s<sub>1</sub>) = 1). A run is an infinite sequence ω = s<sub>0</sub>s<sub>1</sub> ⋯ and a path is a finite sequence ω = s<sub>0</sub> ⋯ s<sub>l</sub> such that μ(s<sub>0</sub>) > 0 and A(s<sub>i</sub>, s<sub>i+1</sub>) > 0 for all i, 0 ≤ i < l. The length |ω| of a path ω is its number of transitions.

**Fig. 1.** An example of DTMC M<sub>1</sub>

**Fig. 2.** DTMC M<sub>2</sub>

The cylinder set of ω, denoted C(ω), consists of all the runs starting with the path ω. The Markov chain M underlies a probability space (Ω, F, ℙ), where Ω is the set of all runs from M; F is the sigma-algebra generated by all the cylinders C(ω); and ℙ is the unique probability measure [32] such that

$$\mathbb{P}(C(s\_0 \cdots s\_l)) = \mu(s\_0) \prod\_{i=1}^{l} A(s\_{i-1}, s\_i).$$

For simplicity, we assume a unique initial state s<sub>0</sub> and denote ℙ(ω) = ℙ(C(ω)). Finally, we sometimes use the notation ℙ<sup>A</sup><sub>i</sub> to emphasize that the probability distribution is parameterized by the probability matrix A, and the starting state is i.
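The probability of a cylinder is simply the initial probability of the first state multiplied by the transition probabilities along the path. A minimal Python sketch follows; the three-state chain and its probabilities are hypothetical and are not those of Figs. 1 and 2.

```python
from fractions import Fraction

def path_probability(mu, A, path):
    """P(C(s_0 ... s_l)) = mu(s_0) * product of A(s_{i-1}, s_i) for i = 1..l."""
    prob = mu.get(path[0], Fraction(0))
    for prev, nxt in zip(path, path[1:]):
        prob *= A[prev].get(nxt, Fraction(0))
    return prob

# Hypothetical DTMC over three states, starting in s1 with probability 1.
mu = {"s1": Fraction(1)}
A = {
    "s1": {"s1": Fraction(1, 2), "s2": Fraction(1, 4), "s3": Fraction(1, 4)},
    "s2": {"s2": Fraction(1)},
    "s3": {"s3": Fraction(1)},
}

p_reach_s2 = path_probability(mu, A, ["s1", "s1", "s2"])
p_impossible = path_probability(mu, A, ["s1", "s2", "s3"])
```

A sequence that uses a transition of probability 0, such as the second one, is not a path in the sense defined above and receives probability 0.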

#### **2.1 PAC-Learning for Properties**

To analyze the behavior of a system, properties are specified in temporal logic (e.g., LTL or CTL, respectively introduced in [24] and [11]). Given a logic L and ϕ a property of L, decidable in finite time, we write ω ⊨ ϕ if a path ω satisfies ϕ. Let z : Ω × L → {0, 1} be the function that assigns 1 to a path ω if ω ⊨ ϕ and 0 otherwise. In what follows, we assume that we have a procedure that draws a path ω with respect to ℙ<sup>A</sup> and outputs z(ω, ϕ). Further, we denote by γ(A, ϕ) the probability that a path drawn with respect to ℙ<sup>A</sup> satisfies ϕ. We omit the property or the matrix in the notation when it is clear from the context. Finally, note that the behavior of z(., ϕ) can be modeled as a Bernoulli random variable Z<sub>ϕ</sub> parameterized by the mean value γ(A, ϕ).

Probably Approximately Correct (PAC) learning [28] is a framework for mathematical analysis of machine learning. Given ε > 0 and 0 < δ < 1, we say that a property ϕ of L is PAC-learnable if there is an algorithm A such that, given a sample of n paths drawn according to the procedure, with probability of at least 1 − δ, A outputs in polynomial time (in 1/ε and 1/δ) an approximation of the average value for Z<sub>ϕ</sub> close to its exact value, up to an error less than or equal to ε. Formally, ϕ is PAC-learnable if and only if A outputs an approximation γ̂ such that:

$$\mathbb{P}\left(|\gamma-\hat{\gamma}|>\varepsilon\right)\leq\delta\tag{1}$$

Moreover, if the above statement for algorithm A is true for every property in L, we say that A is a PAC-learning algorithm for L.

#### **2.2 Monte-Carlo Estimation and Algorithm of Chen**

Given a sample W of n paths drawn according to ℙ<sup>A</sup> until ϕ is satisfied or violated (for ϕ such that with probability 1, ϕ is eventually satisfied or violated), the crude Monte-Carlo estimator, denoted γ̂<sub>W</sub>(A, ϕ), of the mean value for the random variable Z<sub>ϕ</sub> is given by the empirical frequency:

$$\hat{\gamma}\_W(A, \varphi) = \frac{1}{n} \sum\_{i=1}^{n} z(\omega\_i) \approx \gamma(A, \varphi).$$

The Okamoto inequality [23] (also called the Chernoff bound in the literature) is often used to guarantee that the probability that a Monte-Carlo estimator $\hat{\gamma}\_W$ deviates from the exact value $\gamma$ by more than $\varepsilon > 0$ is bounded by a predefined confidence parameter $\delta$. However, several sequential algorithms have recently been proposed to guarantee the same confidence and accuracy with fewer samples<sup>1</sup>. In what follows, we use the Massart bound [22], implemented in the algorithm of Chen [6].

**Theorem 1 (Chen bound).** *Let* $\varepsilon > 0$*,* $\delta$ *such that* $0 < \delta < 1$ *and* $\hat{\gamma}\_W$ *be the crude Monte-Carlo estimator, based on* $n$ *samples, of probability* $\gamma$*. If* $n \ge \frac{2}{\varepsilon^2} \log\left(\frac{2}{\delta}\right) \left[\frac{1}{4} - \left(\left|\frac{1}{2} - \hat{\gamma}\_W\right| - \frac{2}{3}\varepsilon\right)^2\right]$*, then*

$$\mathbb{P}(|\gamma - \hat{\gamma}\_W| > \varepsilon) \le \delta.$$

To ease readability, we write $n\_{succ} = \sum\_{i=1}^{n} z(\omega\_i)$ and $H(n, n\_{succ}, \varepsilon, \delta) = \frac{2}{\varepsilon^2} \log\left(\frac{2}{\delta}\right) \left[\frac{1}{4} - \left(\left|\frac{1}{2} - \hat{\gamma}\_W\right| - \frac{2}{3}\varepsilon\right)^2\right]$, where $\hat{\gamma}\_W = n\_{succ}/n$. When it is clear from the context, we only write $H(n)$. Then, the algorithm $\mathcal{A}$ that stops sampling as soon as $n \ge H(n)$ and outputs the crude Monte-Carlo estimator for $\gamma(A, \varphi)$ is a PAC-learning algorithm for $\varphi$. The condition over $n$ is called the stopping criterion of the algorithm. As far as we know, this algorithm requires fewer samples than the other sequential algorithms (see e.g., [18]). Note that the estimation of a probability close to 1/2 likely requires more samples since $H(n)$ is maximized at $\hat{\gamma}\_W = 1/2$.
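The stopping rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [6]: the names `chen_threshold` and `chen_estimate` are introduced here, and the callable `draw` stands for the procedure that samples a fresh path $\omega$ and returns $z(\omega, \varphi)$.

```python
import math
import random


def chen_threshold(n, n_succ, eps, delta):
    """H(n, n_succ, eps, delta) as written in Theorem 1."""
    gamma_hat = n_succ / n
    bracket = 0.25 - (abs(0.5 - gamma_hat) - 2.0 * eps / 3.0) ** 2
    return (2.0 / eps ** 2) * math.log(2.0 / delta) * bracket


def chen_estimate(draw, eps, delta):
    """Sample until n >= H(n); return the crude estimator and n.

    `draw` is a hypothetical zero-argument callable returning 0 or 1.
    """
    n = 0
    n_succ = 0
    while True:
        n += 1
        n_succ += draw()
        if n >= chen_threshold(n, n_succ, eps, delta):
            return n_succ / n, n


# Usage on a synthetic Bernoulli source (success probability 0.3 is made up).
rng = random.Random(0)
gamma_hat, n_used = chen_estimate(lambda: int(rng.random() < 0.3), 0.05, 0.05)
```

Because the bracket in $H$ never exceeds $1/4$, the loop is guaranteed to stop after at most $\lceil \frac{1}{2\varepsilon^2} \log\frac{2}{\delta} \rceil$ samples, whatever the outcomes are.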

#### **3 Related Work**

Our work shares similar statistical results (see Sect. 2.3) with Statistical Model Checking (SMC) [32]. However, the context and the outputs are different. SMC is a simulation-based approach that aims to estimate one probability for a given property [9,29], within acceptable margins of error and confidence [17,18,33]. A challenge in SMC is posed by unbounded properties (e.g., fairness) since the sampled executions are finite. Some algorithms have been proposed to handle unbounded properties but they require the knowledge of the minimal probability transition of the system [1,14], which we avoid. While this restriction is light in many contexts, such as when every state and transition appears with a sufficiently high probability, contexts where probabilities are unknown and some are very small seem much harder to handle. In the following, we propose two solutions that do not require this assumption. The first one is the closest to SMC: we learn a Markov chain accurate for a given time-to-error property, and it does not require knowledge of the Markov chain. The second one is much more ambitious than SMC as it learns a Markov chain accurate for *all* its global behaviors, formalized as all properties of a temporal logic; it needs the assumption that the set of transitions is known, but not their probabilities nor a lower bound on them. This assumption may seem heavy, but it is reasonable for designers of systems, for which (a lower bound on) transition probabilities are not known (e.g. some error rate of components, etc).

<sup>1</sup> We recall the Okamoto-Chernoff bound in the extended version (as well as the Massart bound), but we do not use it in this work.

In comparison with SMC, our final output is the (approximated) transition matrix of a DTMC rather than one (approximated) probability of a given property. This learned DTMC can be used for different purposes, e.g. as a component in a bigger model or as a simulation tool. In terms of performance, we will show that we can learn a DTMC w.r.t. a given property with the same number of samples as we need to estimate this property using SMC (see Sect. 5). That is, there is no penalty for estimating a DTMC rather than one probability, and we can scale as well as SMC. In terms of expressivity, we can handle unbounded properties (e.g. fairness properties). Even better, we can learn a DTMC accurate uniformly over a possibly infinite set of properties, e.g. all formulas of CTL. This is something SMC is not designed to achieve.

Other related work includes the following. In [13], the authors investigate several distances for estimating the difference between DTMCs, but they do not propose learning algorithms. In [16], the authors propose to analyze the learned model a posteriori to test whether it has some good properties. If not, they tweak the model in order to enforce these properties. Also, several PAC-learning algorithms have been proposed for the estimation of stochastic systems [5,10], but these works focus on local transitions instead of global properties.

#### **4 Problem Statement**

In this work, we are interested in learning a DTMC model from a stochastic system S such that the behaviors of the system and the model are similar. We assume that the original system is a DTMC parameterized by a matrix A of transition probabilities. The transition probabilities are unknown, but the set of states of the DTMC is assumed to be known.

Our goal is to provide a learning algorithm $\mathcal{A}$ that guarantees an accurate estimation of S with respect to certain global properties. For that, a sampling process is defined as follows. A path (i.e., a sequence of states from $s\_0$) of S is observed, and at steps specified by the sampling process, a reset action is performed, setting S back to its initial state $s\_0$. Then another path is generated. This process generates a set $W$ of paths, called traces, used to learn a matrix $\hat{A}\_W$. Formally, we want to provide a learning algorithm that guarantees the following specification:

$$\mathbb{P}(\mathcal{D}(A, \hat{A}\_W) > \varepsilon) \le \delta \tag{2}$$

where $\varepsilon > 0$ and $\delta > 0$ are respectively the *accuracy* and *confidence* parameters and $\mathcal{D}(A, \hat{A}\_W)$ is a measure of the divergence between $A$ and $\hat{A}\_W$.

There exist several ways to specify the divergence between two transition matrices, e.g., the Kullback-Leibler divergence [19] or a distance based on a matrix norm. However, the existing notions remain heuristic because they are based on the difference between the individual probabilistic transitions of the matrix. We argue that what matters in practice is often to quantify the similarity between the global behaviors of the system and of the learned model.

In order to specify the behaviors of interest, we use a property $\varphi$ or a set of properties $\Psi$ on the set of states visited. We are interested in the difference between the probabilities of $\varphi$ (i.e., the measure of the set of runs satisfying $\varphi$) with respect to $A$ and $\hat{A}\_W$. We want to ensure that this difference is less than some predefined $\varepsilon$ with (high) probability $1 - \delta$. Hence, we define:

$$\mathcal{D}\_{\varphi}(A, \hat{A}\_W) = |\gamma(A, \varphi) - \gamma(\hat{A}\_W, \varphi)| \tag{3}$$

$$\mathcal{D}\_{\Psi}(A, \hat{A}\_W) = \max\_{\varphi \in \Psi} (\mathcal{D}\_{\varphi}(A, \hat{A}\_W)) \tag{4}$$

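Our problem is to construct an algorithm which takes the following as inputs:

- a confidence level $\delta$, with $0 < \delta < 1$,
- an absolute error $\varepsilon > 0$, and
- a property $\varphi$ (or a set of properties $\Psi$),
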
and provides a learning procedure sampling a set $W$ of paths, outputs $\hat{A}\_W$, and terminates the sampling procedure while fulfilling Specification (2), with $\mathcal{D} = \mathcal{D}\_\varphi$ ($= \mathcal{D}\_\Psi$).

In what follows, we assume that the confidence level δ and absolute error ε are fixed. We first start with a negative result: if Ψ is the set of LTL formulas [2], such a learning process is impossible.

**Theorem 2.** *Given* $\varepsilon > 0$*,* $0 < \delta < 1$*, and a finite set* $W$ *of paths randomly drawn with respect to a DTMC* $A$*, there is no learning strategy such that, for every LTL formula* $\varphi$*,*

$$\mathbb{P}(|\gamma(A,\varphi) - \gamma(\hat{A}\_W, \varphi)| > \varepsilon) \le \delta \tag{5}$$

Note that, contrary to Theorem 1, the deviation in Theorem 2 is a difference between two exact probabilities (of the original system and of a learned model). The theorem holds as long as $\hat{A}\_W$ and $A$ are not strictly equal, no matter how $\hat{A}\_W$ is learned. To prove this theorem, we show that, for any number of observations, we can always define a sequence of LTL properties that violates the specification above. It only exploits a single deviation in one transition. The proof, inspired by a result from [13], is given in the extended version.

*Example 1.* We show in this example that, in general, one needs to have some knowledge of the system in order to perform PAC learning: either a positive lower bound $\ell > 0$ on the lowest probability transition, as in [1,14], or the support of transitions (but no knowledge of their probabilities), as we use in Sect. 6. Further, we show that the latter assumption does not imply the former: even if no transitions are overlooked, the error in some reachability property can be arbitrarily close to 0.5 even with an arbitrarily small error on the transition probabilities.

**Fig. 3.** Three DTMCs $A$, $\hat{A}$, $\hat{B}$ (from left to right), with $0 < \eta < 2\tau < 1$

Let us consider DTMCs $A$, $\hat{A}$, $\hat{B}$ in Fig. 3, and formula $\mathbf{F}\, s\_2$ stating that $s\_2$ is eventually reached. The probabilities to satisfy this formula in $A$, $\hat{A}$, $\hat{B}$ are respectively $\mathbb{P}^{A}(\mathbf{F}\, s\_2) = \frac{1}{2}$, $\mathbb{P}^{\hat{A}}(\mathbf{F}\, s\_2) = \frac{2\tau - \eta}{4\tau} = \frac{1}{2} - \frac{\eta}{4\tau}$ and $\mathbb{P}^{\hat{B}}(\mathbf{F}\, s\_2) = 0$.

Assume that $A$ is the real system and that $\hat{A}$ and $\hat{B}$ are DTMCs we learned from $A$. Obviously, one wants to avoid learning $\hat{B}$ from $A$, as the probability of $\mathbf{F}\, s\_2$ is very different in $\hat{B}$ and in $A$ (0 instead of 0.5). If one knows that $\tau > \ell$ for some lower bound $\ell > 0$, then one can generate enough samples from $s\_1$ to evaluate $\tau$ with an arbitrarily small error $\frac{\eta}{2} \ll \ell$ on probability transitions with an arbitrarily high confidence, and in particular learn a DTMC similar to $\hat{A}$.

On the other hand, if one knows there are transitions from $s\_1$ to $s\_2$ and to $s\_3$, then immediately, one does not learn DTMC $\hat{B}$, but a DTMC similar to DTMC $\hat{A}$ (using e.g. Laplace smoothing [12]). While this part is straightforward with this assumption, evaluating $\tau$ is much harder when one does not know a priori a lower bound $\ell > 0$ such that $\tau > \ell$. That is very important: while one can make sure that the error $\frac{\eta}{2}$ on probability transitions is arbitrarily small, if $\tau$ is unknown, then it could be the case that $\tau$ is as small as $\frac{\eta}{2(1-\varepsilon)} > \frac{\eta}{2}$, for a small $\varepsilon > 0$. This gives us $\mathbb{P}^{\hat{A}}(\mathbf{F}\, s\_2) = \frac{1}{2} - \frac{1-\varepsilon}{2} = \frac{\varepsilon}{2}$, which is arbitrarily small, whereas $\mathbb{P}^{A}(\mathbf{F}\, s\_2) = 0.5$, leading to a huge error in the probability to reach $s\_2$. We work around that problem in Sect. 6 by defining and computing the *conditioning* of DTMC $\hat{A}$. In some particular cases, such as the one discussed in the next section, one can avoid that altogether (actually, the conditioning in these cases is perfect (=1), and it need not be computed explicitly).

#### **5 Learning for a Time-to-failure Property**

In this section, we focus on the property $\varphi$ of reaching a failure state $s\_F$ from an initial state $s\_0$ without re-passing by the initial state, which is often used for assessing the failure rate of a system and the mean time between failures (see e.g., [25]). We assume that, with probability 1, the runs eventually re-pass by $s\_0$ or reach $s\_F$. Also, without loss of generality, we assume that there is a unique failure state $s\_F$ in $A$. We denote by $\gamma(A, \varphi)$ the probability, given DTMC $A$, of satisfying property $\varphi$, i.e., the probability of a failure between two visits of $s\_0$.

Assume that the stochastic system S is observed from state $s\_0$. Between two visits of $s\_0$, property $\varphi$ can be monitored. If $s\_F$ is observed between two instances of $s\_0$, we say that the path $\omega = s\_0 \cdot \rho \cdot s\_F$ satisfies $\varphi$, with $s\_0, s\_F \notin \rho$. Otherwise, if $s\_0$ is visited again from $s\_0$, then we say that the path $\omega = s\_0 \cdot \rho \cdot s\_0$ violates $\varphi$, with $s\_0, s\_F \notin \rho$. We call *traces* the paths of the form $\omega = s\_0 \cdot \rho \cdot (s\_0 \vee s\_F)$ with $s\_0, s\_F \notin \rho$. In the following, we show that it is sufficient to use a *frequency estimator* to learn a DTMC which provides a good approximation for such a property.

#### **5.1 Frequency Estimation of a DTMC**

Given a set $W$ of $n$ traces, we denote by $n\_{ij}^W$ the number of times a transition from state $i$ to state $j$ has occurred and by $n\_i^W$ the number of times a transition has been taken from state $i$.

The *frequency estimator* of $A$ is the DTMC $\hat{A}\_W = (\hat{a}\_{ij})\_{1 \le i,j \le m}$ given by $\hat{a}\_{ij} = \frac{n\_{ij}^W}{n\_i^W}$ for all $i, j$, with $\sum\_{i=1}^{m} n\_i^W = \sum\_{i=1}^{m} \sum\_{j=1}^{m} n\_{ij}^W = |W|$. In other words, to learn $\hat{A}\_W$, it suffices to count the number of times a transition from $i$ to $j$ occurred, and divide by the number of times state $i$ has been observed. The matrix $\hat{A}\_W$ is trivially a DTMC, except for states $i$ which have not been visited. In this case, one can set $\hat{a}\_{ij} = \frac{1}{m}$ for all states $j$ and obtain a DTMC. This has no impact on the behavior of $\hat{A}\_W$ as $i$ is not reachable from $s\_0$ in $\hat{A}\_W$.
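The counting scheme can be sketched as follows. The representation is an assumption made for illustration only: a trace is a list of state names, `states` lists all $m$ states, and `frequency_estimator` is a name introduced here. Rows of states with no observed outgoing transition are filled uniformly, as described above.

```python
from collections import Counter


def frequency_estimator(traces, states):
    """Return a_hat[i][j] = n_ij / n_i, with uniform rows where n_i = 0."""
    n_ij = Counter()  # number of observed transitions i -> j
    n_i = Counter()   # number of observed transitions leaving i
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            n_ij[src, dst] += 1
            n_i[src] += 1
    m = len(states)
    return {
        i: {j: (n_ij[i, j] / n_i[i] if n_i[i] else 1.0 / m) for j in states}
        for i in states
    }


# Usage on three hand-written traces (hypothetical state names).
example_traces = [["s0", "s1", "sF"], ["s0", "s1", "s0"], ["s0", "s0"]]
a_hat = frequency_estimator(example_traces, ["s0", "s1", "s2", "sF"])
```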

Let $\hat{A}\_W$ be the matrix learned using the frequency estimator from the set $W$ of traces, and let $A$ be the real probabilistic matrix of the original system S. We show that, in the case of time-to-failure properties, $\gamma(\hat{A}\_W, \varphi)$ is equal to the crude Monte Carlo estimator $\hat{\gamma}\_W(A, \varphi)$ induced by $W$.

#### **5.2 PAC Bounds for a Time-to-failure Property**

We start by stating the main result of this section, bounding the error between $\gamma(A, \varphi)$ and $\gamma(\hat{A}\_W, \varphi)$:

**Theorem 3.** *Given a set* $W$ *of* $n$ *traces such that* $n = H(n)$*, we have:*

$$\mathbb{P}\left(|\gamma(A,\varphi) - \gamma(\hat{A}\_W,\varphi)| > \varepsilon\right) \le \delta \tag{6}$$

*where* $\hat{A}\_W$ *is the frequency estimator of* $A$*.*

To prove Theorem 3, we first invoke Theorem 1 to establish:

$$\mathbb{P}\left(|\gamma(A,\varphi) - \hat{\gamma}\_W(A,\varphi)| > \varepsilon\right) \le \delta \tag{7}$$

It remains to show that $\hat{\gamma}\_W(A, \varphi) = \gamma(\hat{A}\_W, \varphi)$:

**Proposition 1.** *Given a set* $W$ *of traces,* $\gamma(\hat{A}\_W, \varphi) = \hat{\gamma}\_W(A, \varphi)$*.*

It might be appealing to think that this result can be proved by induction on the size of the traces, mimicking the proof of computation of reachability probabilities by linear programming [2]. This is actually not the case. The remainder of this section is devoted to proving Proposition 1.

We first define $q\_W(u)$ as the number of occurrences of sequence $u$ in the traces of $W$. Note that $u$ can be a state, an individual transition or even a path. We also use the following definitions in the proof.

**Definition 2 (Equivalence).** *Two sets of traces* $W$ *and* $W'$ *are equivalent if for all* $s, t \in S$*,* $\frac{q\_W(s \cdot t)}{q\_W(s)} = \frac{q\_{W'}(s \cdot t)}{q\_{W'}(s)}$*.*

We define a set of traces $W'$ equivalent with $W$, implying that $\hat{A}\_W = \hat{A}\_{W'}$. This set $W'$ of traces satisfies the following:

**Lemma 1.** *For any set of traces* $W$*, there exists a set of traces* $W'$ *such that:*


The proof of Lemma 1 is provided in the extended version. In Lemma 1, (i) ensures that $\hat{A}\_{W'} = \hat{A}\_W$ and (ii) ensures the equality between the proportion of runs of $W'$ passing by $s$ and satisfying $\varphi$, denoted $\hat{\gamma}\_{W'}^{s}$, and the probability of reaching $s\_F$ before $s\_0$ starting from $s$ with respect to $\hat{A}\_{W'}$. Formally,

**Lemma 2.** *For all* $s \in S$*,* $\mathbb{P}\_s^{\hat{A}\_{W'}}(\text{reach } s\_f \text{ before } s\_0) = \hat{\gamma}\_{W'}^{s}$*.*

*Proof.* Let $S\_0$ be the set of states $s$ with no path in $\hat{A}\_W$ from $s$ to $s\_f$ without passing through $s\_0$. For all $s \in S\_0$, let $p\_s = 0$. Also, let $p\_{s\_f} = 1$. Let $S\_1 = S \setminus (S\_0 \cup \{s\_f\})$. Consider the system of Eq. (8) with variables $(p\_s)\_{s \in S\_1} \in [0, 1]^{|S\_1|}$:

$$\forall s \in S\_1, \quad p\_s = \sum\_{t=1}^m \hat{A}\_{W'}(s, t) p\_t \tag{8}$$

The system of Eq. (8) admits a unique solution according to [2] (Theorem 10.19, page 766). Then, $(\mathbb{P}\_s^{\hat{A}\_{W'}}(\text{reach } s\_f \text{ before } s\_0))\_{s \in S\_1}$ is trivially a solution of (8). But, since $W'$ satisfies the conditions of Lemma 1, we also have that $(\hat{\gamma}\_{W'}^{s})\_{s \in S\_1}$ is a solution of (8), and thus we have the desired equality.

Notice that Lemma 2 does not hold in general with the set W. We have:

$$\begin{aligned} \hat{\gamma}\_W(A,\varphi) &= \hat{\gamma}\_W^{s\_0} \quad \text{(by definition)}\\ &= \hat{\gamma}\_{W'}^{s\_0} \quad \text{(by Lemma 1)}\\ &= \mathbb{P}\_{s\_0}^{\hat{A}\_{W'}} \text{(reach } s\_f \text{ before } s\_0\text{)} \quad \text{(by Lemma 2)}\\ &= \mathbb{P}\_{s\_0}^{\hat{A}\_W} \text{(reach } s\_f \text{ before } s\_0\text{)} \quad \text{(by Lemma 1)}\\ &= \gamma(\hat{A}\_W, \varphi) \quad \text{(by definition)}. \end{aligned}$$

That concludes the proof of Proposition 1. It shows that learning can be as efficient as statistical model-checking on comparable properties.
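Proposition 1 can be checked numerically with exact rational arithmetic. The sketch below is only an illustration under stated assumptions: the four-state chain, the state names and all function names are made up here, and the reachability probability in the learned chain is obtained by solving the linear system of Eq. (8) with a small Gauss-Jordan elimination over `Fraction` values.

```python
import random
from collections import Counter
from fractions import Fraction


def sample_trace(chain, s0, s_fail, rng):
    """Walk from s0 until s0 is revisited or s_fail is reached."""
    trace, state = [s0], s0
    while True:
        successors, probs = zip(*chain[state].items())
        state = rng.choices(successors, weights=probs)[0]
        trace.append(state)
        if state in (s0, s_fail):
            return trace


def transition_frequencies(traces):
    """Exact frequency estimator: (i, j) -> n_ij / n_i as a Fraction."""
    n_ij, n_i = Counter(), Counter()
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            n_ij[src, dst] += 1
            n_i[src] += 1
    return {(i, j): Fraction(c, n_i[i]) for (i, j), c in n_ij.items()}


def reach_fail_before_return(a_hat, s0, s_fail):
    """Solve p_s = sum_t a_hat(s, t) p_t with p_fail = 1, p_s0 = 0 on return."""
    inner = sorted({i for i, _ in a_hat} - {s0, s_fail})
    idx = {s: k for k, s in enumerate(inner)}
    n = len(inner)
    # Augmented matrix of (I - Q) p = b, where b_s = a_hat(s, s_fail).
    rows = [[Fraction(int(r == c)) for c in range(n)] + [Fraction(0)]
            for r in range(n)]
    for (i, j), p in a_hat.items():
        if i in idx and j in idx:
            rows[idx[i]][idx[j]] -= p
        elif i in idx and j == s_fail:
            rows[idx[i]][n] += p
    for c in range(n):  # Gauss-Jordan elimination over the rationals
        pivot = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[pivot] = rows[pivot], rows[c]
        rows[c] = [x / rows[c][c] for x in rows[c]]
        for r in range(n):
            if r != c and rows[r][c] != 0:
                factor = rows[r][c]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[c])]
    p = {s: rows[idx[s]][n] for s in inner}
    p[s_fail], p[s0] = Fraction(1), Fraction(0)
    return sum(prob * p[j] for (i, j), prob in a_hat.items() if i == s0)


# Hypothetical system: s0 initial, sF the failure state.
chain = {
    "s0": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.3, "b": 0.3, "s0": 0.3, "sF": 0.1},
    "b": {"a": 0.5, "s0": 0.4, "sF": 0.1},
}
rng = random.Random(2020)
traces = [sample_trace(chain, "s0", "sF", rng) for _ in range(200)]
empirical = Fraction(sum(t[-1] == "sF" for t in traces), len(traces))
learned = reach_fail_before_return(transition_frequencies(traces), "s0", "sF")
```

Under these assumptions, `learned` and `empirical` coincide exactly, not merely up to rounding, which is what Proposition 1 states.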

#### **6 Learning for the Full CTL Logic**

In this section, we learn a DTMC $\hat{A}\_W$ such that $\hat{A}\_W$ and $A$ have similar behaviors over all CTL formulas. This provides a much stronger result than for a time-to-failure property: e.g., properties can involve liveness and fairness, and more importantly they are not known before the learning. Notice that PCTL [2] cannot be used, since an infinitesimal error on one positive probability can change the probability of a PCTL formula from 0 to 1. (State)-CTL is defined as follows:

**Definition 3.** *Let* $Prop$ *be the set of state names. (State)-CTL is defined by the following grammar:* $\varphi ::= \bot \mid \top \mid p \mid \neg\varphi \mid \varphi \wedge \varphi \mid \varphi \vee \varphi \mid \mathbf{AX}\varphi \mid \mathbf{EX}\varphi \mid \mathbf{AF}\varphi \mid \mathbf{EF}\varphi \mid \mathbf{AG}\varphi \mid \mathbf{EG}\varphi \mid \mathbf{E}(\varphi \mathbf{U} \varphi) \mid \mathbf{A}(\varphi \mathbf{U} \varphi)$*, with* $p \in Prop$*.* **E***(xists) and* **A***(ll) are quantifiers on paths; ne***X***t,* **G***lobally,* **F***inally and* **U***ntil are path-specific quantifiers. Notice that some operators are redundant. A minimal set of operators is* $\{\top, \vee, \neg, \mathbf{EG}, \mathbf{EU}, \mathbf{EX}\}$*.*

As we want to compute the probability of *paths* satisfying a CTL formula, we consider the set $\Psi$ of *path-CTL* properties, that is, formulas $\varphi$ of the form $\varphi = \mathbf{X}\varphi\_1$, $\varphi = \varphi\_1 \mathbf{U} \varphi\_2$, $\varphi = \mathbf{F}\varphi\_1$ or $\varphi = \mathbf{G}\varphi\_1$, with $\varphi\_1, \varphi\_2$ (state)-CTL formulas. For instance, the property considered in the previous section is $(\neg s\_0) \mathbf{U} s\_F$.

In this section, for the sake of simplicity, the finite set $W$ of traces is obtained by observing paths until a state is seen twice on the path. Then, the reset action is used and another trace is obtained from another path. That is, a trace $\omega$ from $W$ is of the form $\omega = \rho \cdot s \cdot \rho' \cdot s$, with $\rho \cdot s \cdot \rho'$ a loop-free path.
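A minimal sketch of this sampling scheme follows. It assumes, purely for illustration, that the system can be simulated from a dictionary `chain` mapping each state to its successor distribution; the chain, the state names and `sample_until_repeat` are hypothetical.

```python
import random


def sample_until_repeat(chain, s0, rng):
    """Follow the chain from s0 and stop as soon as a state is seen twice."""
    trace = [s0]
    seen = {s0}
    while True:
        current = trace[-1]
        successors = list(chain[current])
        weights = [chain[current][t] for t in successors]
        nxt = rng.choices(successors, weights=weights)[0]
        trace.append(nxt)
        if nxt in seen:
            return trace  # of the form rho . s . rho' . s
        seen.add(nxt)


# Hypothetical three-state system; a reset is modeled by restarting at s0.
demo_chain = {
    "s0": {"s1": 0.5, "s2": 0.5},
    "s1": {"s0": 0.2, "s2": 0.8},
    "s2": {"s2": 0.9, "s0": 0.1},
}
demo_rng = random.Random(7)
demo_traces = [sample_until_repeat(demo_chain, "s0", demo_rng) for _ in range(50)]
```

Each trace has at most $m + 1$ states, since a loop-free prefix cannot visit more than $m$ distinct states.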

As explained in Example 1, some additional knowledge of the system is necessary. In this section, we assume that the support of transition probabilities is known, i.e., for any state $i$, we know the set of states $j$ such that $a\_{ij} \neq 0$. This assumption is needed both for Theorem 5 and to apply Laplace smoothing.

#### **6.1 Learning DTMCs with Laplace Smoothing**

Let $\alpha > 0$. For any state $s$, let $k\_s$ be the number of successors of $s$, which we know by hypothesis, and let $T = \sum\_{s \in S} k\_s$ be the number of non-zero transitions. Let $W$ be a set of traces, $n\_{ij}^W$ the number of transitions from state $i$ to state $j$, and $n\_i^W = \sum\_j n\_{ij}^W$. The *estimator for* $W$ *with Laplace smoothing* $\alpha$ is the DTMC $\hat{A}\_W^{\alpha} = (\hat{a}\_{ij})\_{1 \le i,j \le m}$ given for all $i, j$ by:

$$
\hat{a}\_{ij} = \frac{n\_{ij}^W + \alpha}{n\_i^W + k\_i \alpha} \text{ if } a\_{ij} \neq 0 \quad \text{and} \quad \hat{a}\_{ij} = 0 \text{ otherwise}
$$

In comparison with the frequency estimator, Laplace smoothing adds, for each state $s$, a term $\alpha$ to the numerator and $k\_s$ times $\alpha$ to the denominator. This preserves the fact that $\hat{A}\_W^{\alpha}$ is a Markov chain, and it ensures that $\hat{a}\_{ij} = 0$ iff $a\_{ij} = 0$. In particular, compared with the frequency estimator, it avoids creating zeros in the probability tables.
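The smoothed estimator can be sketched as follows, assuming for illustration that the known support is given as a dictionary `support` mapping each state $i$ to its list of successors (so $k\_i$ is the length of that list). The function name and the data are hypothetical; transitions outside the support are simply absent from the result, i.e., have probability 0.

```python
from collections import Counter


def laplace_estimator(traces, support, alpha):
    """a_hat[i][j] = (n_ij + alpha) / (n_i + k_i * alpha) on the support."""
    n_ij = Counter()
    n_i = Counter()
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            n_ij[src, dst] += 1
            n_i[src] += 1
    return {
        i: {j: (n_ij[i, j] + alpha) / (n_i[i] + len(succ) * alpha) for j in succ}
        for i, succ in support.items()
    }


# Usage: s1 -> s3 is in the support but never observed in the single trace.
known_support = {"s1": ["s1", "s2", "s3"], "s2": ["s2"], "s3": ["s3"]}
smoothed = laplace_estimator([["s1", "s1", "s2"]], known_support, alpha=1.0)
```

In this usage example the unobserved transition from `s1` to `s3` still receives a positive probability, which is the point of the smoothing.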

#### **6.2 Conditioning and Probability Bounds**

Using Laplace smoothing slightly changes the probability of each transition by an additive offset η. We now explain how this small error η impacts the error on the probability of a CTL property.

Let $A$ be a DTMC, and $A\_\eta$ be a DTMC such that $A\_\eta(i, j) = 0$ iff $A(i, j) = 0$ for all states $i, j$, and such that $\sum\_j |A\_\eta(i, j) - A(i, j)| \le \eta$ for all states $i$. For all states $s \in S$, let $R(s)$ be the set of states $i$ such that there exists a path from $i$ to $s$. Let $R\_\*(s) = R(s) \setminus \{s\}$. Since both DTMCs have the same support, $R$ (and also $R\_\*$) is the same for $A$ and $A\_\eta$. Given $m$ the number of states, the conditioning of $A$ for $s \in S$ and $\ell \le m$ is:

$$\operatorname{Cond}\_s^\ell(A) = \min\_{i \in R\_\*\left(s\right)} \mathbb{P}\_i^A \left(\mathbf{F}\_{\leq \ell} \neg R\_\*\left(s\right)\right)$$

i.e., the minimal probability from a state $i \in R\_\*(s)$ to move away from $R\_\*(s)$ in at most $\ell$ steps. Let $\ell\_s$ be the minimal value of $\ell$ such that $\text{Cond}\_s^{\ell}(A) > 0$. This minimal $\ell\_s$ exists as $\text{Cond}\_s^{m}(A) > 0$ since, for all $s \in S$ and $i \in R\_\*(s)$, there is at least one path reaching $s$ from $i$ (this path leaves $R\_\*(s)$), and taking a cycle-free path, we obtain a path of length at most $m$. Thus, the probability $\mathbb{P}\_i^A(\mathbf{F}\_{\le m} \neg R\_\*(s))$ is at least the positive probability of the cylinder defined by this finite path. Formally,

**Theorem 4.** *Denoting* ϕ *the property of reaching state* s *in DTMC* A*, we have:*

$$|\gamma(A,\varphi) - \gamma(A\_{\eta},\varphi)| < \frac{\ell\_s \cdot \eta}{Cond\_s^{\ell\_s}(A)}$$

*Proof.* Let $v\_s$ be the stochastic vector with $v\_s(s) = 1$. We denote $v\_0 = v\_{s\_0}$. Let $s \in S$. We assume that $s\_0 \in R\_\*(s)$ (else $\gamma(A, \varphi) = \gamma(A\_\eta, \varphi)$ and the result is trivial). Without loss of generality, we can also assume that $A(s, s) = A\_\eta(s, s) = 1$ (as we are interested in reaching $s$ at any step). With this assumption:

$$|\gamma(A,\varphi) - \gamma(A\_{\eta},\varphi)| = \lim\_{t \to \infty} |v\_0 \cdot (A^t - A\_{\eta}^t) \cdot v\_s|.$$

We bound this error, through bounding by induction on t:

$$E(t) = \max\_{i \in R\_\*(s)} |v\_i \cdot (A^t - A^t\_\eta) \cdot v\_s|.$$

We then have trivially:

$$|\gamma(A,\varphi) - \gamma(A\_{\eta},\varphi)| \le \lim\_{t \to \infty} E(t).$$

Note that for $i = s$, $\lim\_{t \to \infty} v\_i \cdot A^t \cdot v\_s = 1 = \lim\_{t \to \infty} v\_i \cdot A\_\eta^t \cdot v\_s$, and thus their difference is null.

Let $t \in \mathbb{N}$. We let $j \in R\_\*(s)$ be such that $E(t) = |v\_j \cdot (A^t - A\_\eta^t) \cdot v\_s|$.

By the triangle inequality, introducing the term $v\_j \cdot A^{\ell\_s} A\_\eta^{t-\ell\_s} \cdot v\_s - v\_j \cdot A^{\ell\_s} A\_\eta^{t-\ell\_s} \cdot v\_s = 0$, we have:

$$E(t) \le |v\_j \cdot (A^t\_\eta - A^{\ell\_s} A^{t-\ell\_s}\_\eta) \cdot v\_s| + |(v\_j \cdot A^{\ell\_s}) \cdot (A^{t-\ell\_s}\_\eta - A^{t-\ell\_s}) \cdot v\_s|.$$

We separate the vector $(v\_j \cdot A^{\ell\_s}) = w\_1 + w\_2 + w\_3$ into three sub-stochastic vectors $w\_1, w\_2, w\_3$. Vector $w\_1$ is over $\{s\}$, and thus we have $w\_1 \cdot A\_\eta^{t-\ell\_s} = w\_1 = w\_1 \cdot A^{t-\ell\_s}$, and the term cancels out. Vector $w\_2$ is over states of $R\_\*(s)$, with $\sum\_{i \in R\_\*} w\_2[i] \le (1 - \text{Cond}\_s^{\ell\_s}(A))$, and we obtain an inductive term $\le (1 - \text{Cond}\_s^{\ell\_s}(A)) E(t - \ell\_s)$. Last, vector $w\_3$ is over states not in $R(s)$, and we have $w\_3 \cdot A\_\eta^{t-\ell\_s} \cdot v\_s = 0 = w\_3 \cdot A^{t-\ell\_s} \cdot v\_s$, and the term cancels out.

We also obtain that $|v\_j \cdot (A\_\eta^t - A^{\ell\_s} A\_\eta^{t-\ell\_s}) \cdot v\_s| \le \ell\_s \cdot \eta$. Thus, we have the inductive formula $E(t) \le (1 - \text{Cond}\_s^{\ell\_s}(A)) E(t - \ell\_s) + \ell\_s \cdot \eta$. It yields for all $t \in \mathbb{N}$:

$$E(t) \le \left(\ell\_s \cdot \eta\right) \sum\_{i=0}^{\infty} (1 - \text{Cond}\_s^{\ell\_s}(A))^i$$

$$E(t) \le \frac{\ell\_s \cdot \eta}{\text{Cond}\_s^{\ell\_s}(A)} \qquad\qquad\qquad\square$$
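The quantities in Theorem 4 can be computed mechanically from a transition matrix. The following sketch is an illustration under stated assumptions: the DTMC is given as a dictionary of rows with exact `Fraction` entries, the four-state chain is made up, and the function names are introduced here. It computes $R\_\*(s)$ by backward reachability, $\text{Cond}\_s^{\ell}$ as one minus the probability of staying inside $R\_\*(s)$ for $\ell$ steps, and then the bound $\ell\_s \cdot \eta / \text{Cond}\_s^{\ell\_s}$.

```python
from fractions import Fraction


def reach_set(A, s):
    """R(s): states with a path to s, computed on the support of A."""
    R = {s}
    changed = True
    while changed:
        changed = False
        for i, row in A.items():
            if i not in R and any(p > 0 and j in R for j, p in row.items()):
                R.add(i)
                changed = True
    return R


def conditioning(A, s, ell):
    """Cond_s^ell(A): minimal probability of leaving R_*(s) within ell steps."""
    r_star = reach_set(A, s) - {s}
    if not r_star:
        return Fraction(1)  # nothing to leave; convention used in this sketch
    stay = {i: Fraction(1) for i in r_star}  # stay inside R_* for 0 steps
    for _ in range(ell):
        stay = {
            i: sum((p * stay[j] for j, p in A[i].items() if j in r_star),
                   Fraction(0))
            for i in r_star
        }
    return min(1 - stay[i] for i in r_star)


def theorem4_bound(A, s, eta):
    """ell_s * eta / Cond_s^{ell_s}(A), with ell_s the first ell with Cond > 0."""
    for ell in range(1, len(A) + 1):
        cond = conditioning(A, s, ell)
        if cond > 0:
            return ell * eta / cond
    raise ValueError("conditioning is zero for every ell <= m")


# Hypothetical chain: from a one must pass through b before leaving {a, b}.
half, quarter = Fraction(1, 2), Fraction(1, 4)
A_demo = {
    "a": {"b": Fraction(1)},
    "b": {"a": half, "t": quarter, "u": quarter},
    "t": {"t": Fraction(1)},
    "u": {"u": Fraction(1)},
}
demo_bound = theorem4_bound(A_demo, "t", Fraction(1, 100))
```

In this made-up chain, $\ell\_t = 2$ because state `a` cannot leave $R\_\*(t) = \{a, b\}$ in a single step.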

We can extend this result from reachability to formulas of the form $S\_0 \mathbf{U} S\_F$, where $S\_0, S\_F$ are subsets of states. This formula means that we reach the set of states $S\_F$ through only states in $S\_0$ on the way.

We define $R(S\_0, S\_F)$ to be the set of states which can reach $S\_F$ using only states of $S\_0$, and $R\_\*(S\_0, S\_F) = R(S\_0, S\_F) \setminus S\_F$. For $\ell \in \mathbb{N}$, we let:

$$\text{Cond}\_{S\_0, S\_F}^{\ell}(A) = \min\_{i \in R\_\*\left(S\_0, S\_F\right)} \mathbb{P}\_i^A(\mathbf{F}\_{\leq \ell} \neg R\_\*(S\_0, S\_F) \lor \neg S\_0).$$

Now, one can remark that $\text{Cond}\_{S\_0,S\_F}(A) \ge \text{Cond}\_{S,S\_F}(A) > 0$. Let $\text{Cond}\_{S\_F}^{\ell}(A) = \text{Cond}\_{S,S\_F}^{\ell}(A)$. We have $\text{Cond}\_{S\_0,S\_F}^{\ell}(A) \ge \text{Cond}\_{S\_F}^{\ell}(A)$. As before, we let $\ell\_{S\_F} \le m$ be the minimal $\ell$ such that $\text{Cond}\_{S\_F}^{\ell}(A) > 0$, and obtain:

**Theorem 5.** *Denoting* $\varphi$ *the property* $S\_0 \mathbf{U} S\_F$*, we have, given DTMC* $A$*:*

$$|\gamma(A,\varphi) - \gamma(A\_{\eta},\varphi)| < \frac{\ell\_{S\_F} \cdot \eta}{Cond\_{S\_F}^{\ell\_{S\_F}}(A)}$$

We can actually improve this conditioning: we defined it as the probability to reach $S\_F$ or $S \setminus R(S, S\_F)$. At the price of a more technical proof, we can obtain a better bound by replacing $S\_F$ by the set of states $R\_1(S\_F)$ that have probability 1 to reach $S\_F$. We let $\overline{R\_\ast}(S\_F) = R(S, S\_F) \setminus R\_1(S\_F)$ be the set of states that can reach $S\_F$ with probability $< 1$, and define the *refined conditioning* as follows:

$$\overline{\operatorname{Cond}}\_{S\_F}^{\ell}(A) = \min\_{i \in \overline{R\_\ast}(S\_F)} \mathbb{P}\_i^A(\mathbf{F}\_{\leq \ell} \neg \overline{R\_\ast}(S\_F)),$$

#### **6.3 Optimality of the Conditioning**

We show now that the bound we provide in Theorem 4 is close to optimal.

Consider again DTMCs A, Aˆ in Fig. 3 from example 1, and formula **F** s<sup>2</sup> stating that s<sup>2</sup> is eventually reached. The probabilities to satisfy this formula in A, Aˆ are respectively PA(**F** s2) = <sup>1</sup> <sup>2</sup> and <sup>P</sup>A<sup>ˆ</sup> (**F** s2) = <sup>1</sup> <sup>2</sup> <sup>−</sup> <sup>η</sup> <sup>4</sup><sup>τ</sup> . Assume that <sup>A</sup> is the real system and that Aˆ is the DTMC we learned from A.

As we do not know precisely the transition probabilities in A, we can only compute the conditioning on Aˆ and not on A (it suffices to swap A and A<sup>η</sup> in Theorem 4 and 5 to have the same formula using Cond(Aη) = Cond(Aˆ)). We have <sup>R</sup>(s2) = {s1, s2} and <sup>R</sup>∗(s2) = <sup>R</sup>∗(s2) = {s1}. The probability to stay in <sup>R</sup>∗(s2) after <sup>s</sup><sup>2</sup> = 1 step is (1 <sup>−</sup> <sup>2</sup><sup>τ</sup> ), and thus Cond<sup>1</sup> {s2}(Aˆ) = Cond<sup>1</sup> {s2}(Aˆ) = <sup>1</sup>−(1−2<sup>τ</sup> )=2<sup>τ</sup> . Taking <sup>A</sup><sup>η</sup> <sup>=</sup> <sup>A</sup>ˆ, Theorem <sup>5</sup> tells us that <sup>|</sup>P<sup>A</sup>(**<sup>F</sup>** <sup>s</sup>2)−PA<sup>ˆ</sup> (**<sup>F</sup>** <sup>s</sup>2)| ≤ η <sup>2</sup><sup>τ</sup> . Notice that on that example, using <sup>s</sup><sup>2</sup> = m = 3, we obtain Cond<sup>3</sup> {s2}(Aˆ) = <sup>1</sup> <sup>−</sup> (1 <sup>−</sup> <sup>2</sup><sup>τ</sup> )<sup>3</sup> <sup>≈</sup> <sup>6</sup><sup>τ</sup> , and we find a similar bound <sup>≈</sup> <sup>3</sup><sup>η</sup> <sup>6</sup><sup>τ</sup> <sup>=</sup> <sup>η</sup> 2τ .

Compare our bound with the exact difference $|\mathbb{P}^A(\mathbf{F}\, s_2) - \mathbb{P}^{\hat{A}}(\mathbf{F}\, s_2)| = \frac{1}{2} - \left(\frac{1}{2} - \frac{\eta}{4\tau}\right) = \frac{\eta}{4\tau}$. Our upper bound only has an overhead factor of 2, even though the conditioning is particularly bad (small) in this example.
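The arithmetic of this example can be checked numerically. The sketch below assumes, as the worked numbers above suggest, that the bound takes the form $\ell \cdot \eta / \mathrm{Cond}^{\ell}$; the function names and the sample values of $\tau$ and $\eta$ are illustrative only and do not come from the paper.

```python
def conditioning(tau: float, ell: int) -> float:
    """Cond^ell for the example: probability to leave {s1} within ell steps."""
    return 1.0 - (1.0 - 2.0 * tau) ** ell


def error_bound(eta: float, tau: float, ell: int) -> float:
    """Assumed form of the bound on this example: ell * eta / Cond^ell."""
    return ell * eta / conditioning(tau, ell)


def exact_gap(eta: float, tau: float) -> float:
    """Exact difference |P^A(F s2) - P^Ahat(F s2)| = eta / (4 * tau)."""
    return eta / (4.0 * tau)


if __name__ == "__main__":
    tau, eta = 0.01, 0.001  # illustrative values, not taken from the paper
    for ell in (1, 3):
        ratio = error_bound(eta, tau, ell) / exact_gap(eta, tau)
        print(f"ell={ell}: bound/exact = {ratio:.3f}")
```

For small $\tau$ the ratio stays close to 2 for both choices of $\ell$, which is the overhead factor discussed above.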

#### **6.4 PAC Bounds for** $\sum_j |\hat{A}_W(i,j) - A(i,j)| \le \eta$

We use Theorem 1 in order to obtain PAC bounds. We use it to estimate individual transition probabilities, rather than the probability of a property.

Let $W$ be a set of traces drawn with respect to $A$ such that every $\omega \in W$ is of the form $\omega = \rho \cdot s \cdot \rho' \cdot s$. Recall that for all states $i, j$ of $S$, $n_i^W$ is the number of transitions originating from $i$ in $W$ and $n_{ij}^W$ is the number of transitions from $i$ to $j$ in $W$. Let $\delta' = \frac{\delta}{m_{\text{stoch}}}$, where $m_{\text{stoch}}$ is the number of *stochastic* states, i.e., states with at least two outgoing transitions.

We want to sample traces until the empirical transition probabilities $\frac{n_{ij}^W}{n_i^W}$ are relatively close to the exact transition probabilities $a_{ij}$, for all $i, j \in S$. For that, we need to determine a stopping criterion over the number of state occurrences $(n_i)_{1 \le i \le m}$ such that:

$$\mathbb{P}\left(\exists i \in S, \sum\_{j} \left| a\_{ij} - \frac{n\_{ij}^W}{n\_i^W} \right| > \varepsilon \right) \le \delta$$

First, note that for any observed state $i \in S$, if $a_{ij} = 0$ (or $a_{ij} = 1$), then with probability 1, $\frac{n_{ij}^W}{n_i^W} = 0$ (respectively $\frac{n_{ij}^W}{n_i^W} = 1$). Thus, for all $\varepsilon > 0$, $\left|a_{ij} - \frac{n_{ij}^W}{n_i^W}\right| < \varepsilon$ with probability 1. Second, for two distinct states $i$ and $i'$, the transition probabilities $\frac{n_{ij}^W}{n_i^W}$ and $\frac{n_{i'j'}^W}{n_{i'}^W}$ are independent for all $j, j'$.

Let $i \in S$ be a stochastic state. If we observe $n_i^W$ transitions from $i$ such that $n_i^W \ge \frac{2}{\varepsilon^2} \log\left(\frac{2}{\delta'}\right) \left[\frac{1}{4} - \left(\max_j \left|\frac{1}{2} - \frac{n_{ij}^W}{n_i^W}\right| - \frac{2}{3}\varepsilon\right)^2\right]$, then, according to Theorem 1, $\mathbb{P}\left(\sum_{j=1}^m \left|a_{ij} - \frac{n_{ij}^W}{n_i^W}\right| > \varepsilon\right) \le \delta'$. In particular, $\mathbb{P}\left(\max_{j \in S} \left|a_{ij} - \frac{n_{ij}^W}{n_i^W}\right| > \varepsilon\right) \le \delta'$. Moreover, we have:

$$\begin{aligned} \mathbb{P}\left(\bigvee_{i=1}^{m}\max_{j\in S}\left|a_{ij}-\frac{n_{ij}^{W}}{n_{i}^{W}}\right|>\varepsilon\right) &\leq \sum_{i=1}^{m}\mathbb{P}\left(\max_{j\in S}\left|a_{ij}-\frac{n_{ij}^{W}}{n_{i}^{W}}\right|>\varepsilon\right) \\ &\leq m_{\text{stoch}}\,\delta' \\ &\leq \delta \end{aligned}$$

In other words, the probability that "there exists a state $i \in S$ such that the deviation between the exact and empirical outgoing transitions from $i$ exceeds $\varepsilon$" is bounded by $\delta$ as soon as, for each state $i \in S$, $n_i^W$ satisfies the stopping rule of the algorithm of Chen using $\varepsilon$ and the corresponding $\delta'$. This gives the hypothesis $\sum_j |A_\eta(i, j) - A(i, j)| \le \varepsilon$ for all states $i$ of Sect. 6.2.
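The per-state stopping rule can be sketched in a few lines. This is only an illustration that evaluates the bracketed expression stated above literally; the function names are hypothetical, and the actual algorithm of Chen (Sect. 2.2) may handle corner cases differently.

```python
import math


def chen_threshold(p_hats, eps, delta_prime):
    """Right-hand side of the stopping rule for one state.

    p_hats holds the empirical probabilities n_ij / n_i of the
    outgoing transitions of that state.
    """
    gap = max(abs(0.5 - p) for p in p_hats) - 2.0 * eps / 3.0
    return 2.0 / eps**2 * math.log(2.0 / delta_prime) * (0.25 - gap**2)


def state_is_done(counts, eps, delta_prime):
    """True once n_i (the sum of the counts n_ij) reaches the threshold."""
    n_i = sum(counts)
    if n_i == 0:
        return False  # an unobserved state can never satisfy the rule
    p_hats = [c / n_i for c in counts]
    return n_i >= chen_threshold(p_hats, eps, delta_prime)


if __name__ == "__main__":
    # Two outgoing transitions observed 100 times each, eps = 0.1, delta' = 0.05.
    print(state_is_done([100, 100], eps=0.1, delta_prime=0.05))
```

Since the bracket never exceeds $\frac{1}{4}$, the threshold is never larger than $\frac{1}{2\varepsilon^2}\log\frac{2}{\delta'}$.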

#### **6.5 A Matrix** $\hat{A}_W$ **Accurate for All CTL Properties**

We now use Laplace smoothing in order to ensure the other hypothesis, namely $A_\eta(i, j) = 0$ iff $A(i, j) = 0$ for all states $i, j$. For all $i \in S$, we define the Laplace offset depending on the state $i$ as $\alpha_i = \frac{(n_i^W)^2\, \varepsilon}{10 \cdot k_i^2 \cdot \max_j n_{ij}^W}$, where $k_i$ is the number of transitions from state $i$. This ensures that the error from Laplace smoothing is at most one tenth of the statistical error. Let $\alpha = (\alpha_i)_{1 \le i \le m}$. From the sample set $W$, we output the matrix $\hat{A}_W^\alpha = (\hat{a}_{ij})_{1 \le i, j \le m}$ with Laplace smoothing $\alpha_i$ for state $i$, i.e.:

$$
\hat{a}\_{ij} = \frac{n\_{ij}^W + \alpha\_i}{n\_i^W + k\_i \alpha\_i} \text{ if } a\_{ij} \neq 0 \quad \text{and} \quad \hat{a}\_{ij} = 0 \text{ otherwise}
$$

It is easy to check that for all $i, j \in S$: $\left|\hat{a}_{ij} - \frac{n_{ij}^W}{n_i^W}\right| \le \frac{\varepsilon}{10 \cdot k_i}$.

That is, for all states $i$, $\sum_j \left|\hat{a}_{ij} - \frac{n_{ij}^W}{n_i^W}\right| \le \frac{\varepsilon}{10}$. Using the triangle inequality:

$$\mathbb{P}\left(\exists i \in S, \sum\_{j} |a\_{ij} - \hat{a}\_{ij}| > \frac{11}{10}\varepsilon\right) \leq \delta$$
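The smoothing step is easy to illustrate. The sketch below assumes that the set of possible transitions of each state (the `support` argument, a name introduced here for illustration) is known, and that no transition outside it is ever observed; `laplace_smooth` is a hypothetical helper, not code from the paper.

```python
def laplace_smooth(counts, support, eps):
    """Return the smoothed matrix (a_hat_ij).

    counts[i][j] is n_ij^W; support[i] is the set of j with a_ij != 0
    (assumed known). Counts outside the support are assumed to be zero.
    """
    m = len(counts)
    a_hat = [[0.0] * m for _ in range(m)]
    for i in range(m):
        n_i = sum(counts[i])
        k_i = len(support[i])
        if n_i == 0:
            raise ValueError(f"state {i} was never observed")
        # Laplace offset alpha_i = n_i^2 * eps / (10 * k_i^2 * max_j n_ij)
        alpha_i = n_i**2 * eps / (10 * k_i**2 * max(counts[i]))
        for j in support[i]:
            a_hat[i][j] = (counts[i][j] + alpha_i) / (n_i + k_i * alpha_i)
    return a_hat


if __name__ == "__main__":
    counts = [[0, 30, 70], [50, 50, 0], [0, 0, 10]]
    support = [{0, 1, 2}, {0, 1}, {2}]
    for row in laplace_smooth(counts, support, eps=0.1):
        print([round(p, 5) for p in row])
```

In this toy input, the transition from state 0 to itself is possible but was never observed; smoothing gives it a small positive probability while each entry moves by at most $\frac{\varepsilon}{10 \cdot k_i}$.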

For all $i \in S$, let $H^\ast(n_i^W, \varepsilon, \delta') = \max_{j \in S} H(n_i^W, n_{ij}^W, \varepsilon, \delta')$ be the maximal Chen bound over all the transitions from state $i$. Let $B(\hat{A}_W^\alpha) = \max_{S_F} \frac{\ell_{S_F}}{\mathrm{Cond}_{S_F}^{\ell_{S_F}}(\hat{A}_W^\alpha)}$. Since in Theorem 5 the original model and the learned one have symmetric roles, by applying this theorem to $\hat{A}_W^\alpha$, we obtain:

**Theorem 6.** *Given a set $W$ of traces, for $0 < \varepsilon < 1$ and $0 < \delta < 1$, if for all $i \in S$, $n_i^W \ge \left(\frac{11}{10} B(\hat{A}_W^\alpha)\right)^2 H^\ast(n_i^W, \varepsilon, \delta')$, we have for any CTL property $\varphi$:*

$$\mathbb{P}(|\gamma(A,\varphi) - \gamma(\hat{A}_W^{\alpha}, \varphi)| > \varepsilon) \le \delta \tag{9}$$

*Proof.* First, $\hat{a}_{ij} = 0$ iff $a_{ij} = 0$, by definition of $\hat{A}_W^\alpha$. Second, $\mathbb{P}\left(\exists i, \sum_j |a_{ij} - \hat{a}_{ij}| > \frac{11}{10}\varepsilon\right) \le \delta$. We can thus apply Theorem 5 to $\hat{A}_W^\alpha, A$ and obtain (9) for $\varphi$ any formula of the form $S_1 \mathbf{U} S_2$. It remains to show that for any formula $\varphi \in \Psi$, we can define $S_1, S_2 \subseteq S$ such that $\varphi$ can be expressed as $S_1 \mathbf{U} S_2$.

Consider the different cases. If $\varphi$ is of the form $\varphi = \varphi_1 \mathbf{U} \varphi_2$ (this subsumes the case $\varphi = \mathbf{F}\varphi_1 = \top \mathbf{U} \varphi_1$) with $\varphi_1, \varphi_2$ CTL formulas, we define $S_1, S_2$ as the sets of states satisfying $\varphi_1$ and $\varphi_2$, and we have the equivalence (see [2] for more details). If $\varphi = \mathbf{X}\varphi_2$, define $S_1 = \emptyset$ and $S_2$ as the set of states satisfying $\varphi_2$.

The last case is $\varphi = \mathbf{G}\varphi_1$, with $\varphi_1$ a CTL formula. Again, we define $S_1$ as the set of states satisfying $\varphi_1$, and $S_2$ as the set of states satisfying the CTL formula $\mathbf{AG}\varphi_1$. The probability of the set of paths satisfying $\varphi = \mathbf{G}\varphi_1$ is exactly the same as the probability of the set of paths satisfying $S_1 \mathbf{U} S_2$.
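To make the target of this reduction concrete, the following sketch computes the probability of $S_1 \mathbf{U} S_2$ from every state of a small DTMC by fixed-point iteration. This is a standard textbook technique rather than anything specific to the paper; `until_probability` and the three-state chain are hypothetical names and data chosen for illustration.

```python
def until_probability(P, S1, S2, sweeps=1000):
    """Approximate P_s(S1 U S2) for every state s of the DTMC with matrix P.

    States in S2 have probability 1, states outside S1 and S2 have
    probability 0, and the remaining states average over their successors.
    Iterating from the zero vector converges to the least fixed point.
    """
    m = len(P)
    x = [1.0 if s in S2 else 0.0 for s in range(m)]
    for _ in range(sweeps):
        x = [
            1.0 if s in S2
            else (sum(P[s][t] * x[t] for t in range(m)) if s in S1 else 0.0)
            for s in range(m)
        ]
    return x


if __name__ == "__main__":
    # State 1 moves to the absorbing states 0 or 2 with probability 1/2 each.
    P = [[1.0, 0.0, 0.0],
         [0.5, 0.0, 0.5],
         [0.0, 0.0, 1.0]]
    print(until_probability(P, S1={1}, S2={2}))
```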

#### **6.6 Algorithm**

We give more details about the learning process of a Markov Chain, accurate for every CTL formula. For completeness, we also provide in the extended version a similar algorithm for a time-to-failure property.

A path $\omega$ is observed from $s_0$ until a state is observed twice. Then $\omega$ is added to $W$ and the reset operation is performed. We use Laplace smoothing to compute the corresponding matrix $\hat{A}_W^\alpha$. The error bound is computed on $W$, and a new path $\omega$ is generated if the error bound is not as small as desired.

#### **Algorithm 1:** Learning a matrix accurate for CTL

**Data:** $S$, $s_0$, $\delta$, $\varepsilon$

- $W := \emptyset$
- $m := |S|$
- for all $s \in S$, $n_s^W := 0$
- Compute $\hat{A} := \hat{A}_W^\alpha$
- Compute $B := B(\hat{A})$
- **while** $\exists s \in S,\ n_s^W < \left(\frac{11}{10} B(\hat{A})\right)^2 H^\ast\left(n_s^W, \varepsilon, \frac{\delta}{m}\right)$ **do**
  - Generate a new trace $\omega := s_0\, \rho\, s_1\, \rho'\, s_1$, and reset $S$
  - for all $s \in S$, $n_s^W := n_s^W + n_s^{\{\omega\}}$
  - add $\omega$ to $W$
  - Compute $\hat{A} := \hat{A}_W^\alpha$
  - Compute $B := B(\hat{A})$

**Output:** $\hat{A}_W^\alpha$

This algorithm is guaranteed to terminate since, as traces are generated, with probability 1, $n_s^W$ tends towards $\infty$, $\hat{A}_W^\alpha$ tends towards $A$, and $B(\hat{A}_W^\alpha)$ tends towards $B(A)$.
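The sampling loop can be sketched as follows. This is a simplified illustration and not the authors' implementation: the factor $B$ is passed in as a fixed number instead of being recomputed from $\hat{A}_W^\alpha$ at each iteration, the Laplace smoothing step is omitted, $H$ is assumed to be the bracketed expression of Sect. 6.4, and the matrix `P` only plays the role of the black-box system being sampled. All function names are hypothetical.

```python
import math
import random


def sample_trace(P, s0, rng):
    """Simulate the chain from s0 and stop as soon as a state is seen twice."""
    trace, seen = [s0], {s0}
    while True:
        nxt = rng.choices(range(len(P)), weights=P[trace[-1]])[0]
        trace.append(nxt)
        if nxt in seen:
            return trace
        seen.add(nxt)


def h_star(row, eps, delta_prime):
    """Largest assumed Chen-style bound over the transitions of one state."""
    n = sum(row)
    scale = 2.0 / eps**2 * math.log(2.0 / delta_prime)
    return max(
        scale * (0.25 - (abs(0.5 - c / n) - 2.0 * eps / 3.0) ** 2) for c in row
    )


def learn_counts(P, s0, eps, delta, B, rng, max_traces=100_000):
    """Sample traces until every state meets the stopping rule.

    Returns the transition counts and the number of traces used.
    """
    m = len(P)
    counts = [[0] * m for _ in range(m)]
    for used in range(1, max_traces + 1):
        trace = sample_trace(P, s0, rng)
        for a, b in zip(trace, trace[1:]):
            counts[a][b] += 1
        if all(
            sum(r) > 0 and sum(r) >= (1.1 * B) ** 2 * h_star(r, eps, delta / m)
            for r in counts
        ):
            return counts, used
    raise RuntimeError("stopping rule not met within max_traces")


if __name__ == "__main__":
    P = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
    counts, used = learn_counts(P, 0, eps=0.1, delta=0.05, B=1.0,
                                rng=random.Random(0))
    print(used, counts)
```

The `max_traces` guard is an addition for safety: a state that is never reached from $s_0$ would otherwise keep the loop running forever.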

#### **7 Evaluation and Discussion**

In this section, we first evaluate Algorithm 1 on 5 systems which are crafted to evaluate the algorithm under different conditions (e.g., rare states). The objective of the evaluation is to provide some idea of how many samples are sufficient for learning accurate DTMC estimations, and to compare learning for all properties of CTL with learning for one time-to-failure property.

Then, we evaluate our algorithm on very large PRISM systems (millions or billions of states). Because of the number of states, we cannot learn a DTMC accurate for all properties of CTL there: it would require visiting every single state a number of times. However, we can learn a DTMC for one specific (unbounded) property. We compare with a hypothesis testing algorithm from [31] which can handle the same unbounded property through a reachability analysis using the topology of the system.

**Table 1.** Average number of observed events $N$ (and relative standard deviation in parentheses) given $\varepsilon = 0.1$ and $\delta = 0.05$ for a time-to-failure property and for the full CTL logic using the refined conditioning $\overline{\mathrm{Cond}}$.


#### **7.1 Evaluation on Crafted Models**

We first describe the 5 systems: Systems 1 and 2 are three-state models described in Fig. 1 and Fig. 2. System 3 (resp. 5) is a 30-state (resp. 200-state) clique in which every individual transition probability is 1/30 (resp. 1/200). System 4 is a 64-state system modeling failure and repair of 3 types of components (3 components each, 9 components in total); see the extended version for a full description of the system, including a PRISM [20] model for readers interested in investigating this system in detail.

We tested time-to-failure properties by choosing as failure states $s_3$ for Systems 1, 2, 3, 5, and the state where all 9 components fail for System 4. We also tested Algorithm 1 (for full CTL logic) using the refined conditioning $\overline{\mathrm{Cond}}$. We ran our algorithms 100 times for each model, except for full CTL on System 4, which we tested only once since it is very time-consuming. We report our results in Table 1 for $\varepsilon = 0.1$ and $\delta = 0.05$. In particular, we report for each model its number of states and transitions. For each property (or set of properties), we provide the average number of observations (i.e., the number of samples times their average length) and the relative standard deviation (in parentheses, that is, the standard deviation divided by the average number of observed events).

The results show that we can learn a DTMC with more than 40000 stochastic transitions, such that the DTMC is accurate for all CTL formulas. Notice that for some particular systems such as System 4, many events may need to be observed before Algorithm 1 terminates. The reason is the presence of rare states, such as the state where all 9 components fail, which are observed with an extremely small probability. In order to evaluate the probabilities of CTL properties of the form "if all 9 components fail, then CTL property $\varphi$ is satisfied", this state needs to be explored many times, explaining the high number of events observed before the algorithm terminates. On the other hand, for properties that do not involve the 9 components failing as a precondition, such as time-to-failure, one does not need to observe this state even once to conclude that it has an extremely small probability of occurring. This suggests that efficient algorithms could be developed for subsets of CTL formulas, e.g., by defining a subset of important events to consider. We believe that Theorems 4 and 5 could be extended to handle such cases. Over different runs, the results stay similar (notice the rather small relative standard deviation).

Comparing results for time-to-failure (or equivalently SMC) and for the full CTL logic is interesting. Excluding System 4, which involves rare states, the number of events that need to be observed for the full CTL logic is 4.3 to 7 times larger. Surprisingly, the highest difference is obtained on the smallest system, System 1. This is because every run of System 1 generated for time-to-failure is short ($s_1 s_2 s_1$ and $s_1 s_2 s_3$). However, in Systems 2, 3 and 5, samples for time-to-failure can be much longer, and the performance for time-to-failure (or equivalently SMC) is not much better than for learning a DTMC accurate for all CTL properties.

For the systems we tested, the unoptimized $\mathrm{Cond}$ was particularly large (more than 20) because for many states $s$, there was probability 0 to leave $R(s)$, and hence $\ell(s)$ was quite large. These are the cases where $\overline{\mathrm{Cond}}$ is much more efficient, as then we can choose $\ell_s = 1$ since the probability to reach $s$ from states in $R(s)$ is 1 ($R_1(s) = R(s)$ and $\overline{R_\ast}(s) = \emptyset$). We used $\overline{\mathrm{Cond}}$ in our algorithm.

Finally, we evaluate experimental confidence by comparing the time-to-failure probabilities in the learned DTMC and the original system. We repeat our algorithms 1000 times on Systems 1 and 2 (with $\varepsilon = 0.1$ and $\delta = 0.05$). These probabilities differ by less than $\varepsilon$ in 999 and 995 runs out of 1000, respectively. Specification (2) is thus largely fulfilled (it requires at least 950 successes out of 1000), which empirically supports our approach. Hence, while our PAC bound over-approximates the confidence in the learned system (which is unavoidable), it is not that far from experimental values.

#### **7.2 Evaluation on Large Models**

We also evaluated our algorithm on large PRISM models, ranging from hundreds of thousands to billions of states. With these numbers of states, we cannot use the more ambitious learning over all the properties of CTL, which would need to visit every state a number of times. However, we can use our algorithm for learning a DTMC which is accurate for a particular (unbounded) property: it will visit only a fraction of the states, which is enough to give a model accurate for that property, with a well-learned kernel of states and some other states representative of the remainder of the runs. We consider three test cases from PRISM satisfying the property that the sample stops with a conclusion (yes or no) with probability 1, namely *herman*, *leader* and *egl*.

**Table 2.** Results for $\varepsilon = 0.01$ and $\delta = 0.001$ of our algorithm compared with sampling with reachability analysis [31], as reported in [14], page 20. Numbers of samples needed by our method are given by the Massart bound (resp. by the Okamoto-Chernoff bound in parentheses). TO and MO mean time out (> 15 minutes on an Opteron 6134) and memory out (> 5 GB), respectively.


Our prototype tool used in the previous subsection is implemented in Scilab: it cannot simulate very large systems of PRISM. Instead, we use PRISM to generate the samples needed for the learning. Hence, we report the usual Okamoto-Chernoff bound on the number of samples, which is what is implemented in PRISM. We also compare with the Massart bound used by the Chen algorithm (see Sect. 2.2), which is implemented in our tool and is more efficient as it takes into account the probability of the property.
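As a point of reference, the Okamoto-Chernoff sample count depends only on $\varepsilon$ and $\delta$, via the standard formula $n \ge \frac{1}{2\varepsilon^2}\ln\frac{2}{\delta}$. The sketch below evaluates it for the parameters used in this section and contrasts it with a probability-aware count. The second function is an assumption: it reuses the bracketed expression of the stopping rule in Sect. 6.4 with a known probability `p`, which is only a rough stand-in for the sequential algorithm of Chen. Both function names are hypothetical.

```python
import math


def okamoto_samples(eps: float, delta: float) -> int:
    """Okamoto-Chernoff bound: n >= ln(2/delta) / (2 * eps^2)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps**2))


def massart_style_samples(p: float, eps: float, delta: float) -> int:
    """Probability-aware count (assumed form, see the lead-in above)."""
    gap = abs(0.5 - p) - 2.0 * eps / 3.0
    return math.ceil(2.0 / eps**2 * math.log(2.0 / delta) * (0.25 - gap**2))


if __name__ == "__main__":
    eps, delta = 0.01, 0.001
    print("Okamoto-Chernoff:", okamoto_samples(eps, delta))
    for p in (0.5, 0.9, 0.99):
        print(f"p={p}:", massart_style_samples(p, eps, delta))
```

The two counts nearly coincide at $p = \frac{1}{2}$ and the probability-aware one shrinks as $p$ approaches 0 or 1.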

For each model, we report its parameters, its *size*, i.e., its number of states, the number of *samples* needed using the Massart bound (the conservative Okamoto-Chernoff bound is in parentheses), and the average *path length*. For comparison, we consider a hypothesis testing algorithm from [31] which can also handle unbounded properties. It uses the knowledge of the topology to do reachability analysis and stops the sampling if the property cannot be reached anymore. Hypothesis testing is used to decide with high confidence whether a probability exceeds a threshold or not. This requires fewer samples than SMC algorithms which estimate probabilities, but it is also less precise. We chose to compare with this algorithm because, as in our work, it does not require knowledge of the probabilities, such as a lower bound on the transition probabilities needed by e.g. [14]. We do not report runtimes as they cannot be compared (different platforms, different nature of result, etc.).

There are several conclusions we can draw from the experimental results (shown in Table 2). First, the number of samples from our algorithm (Chen algorithm implementing the Massart bound) is larger than in the algorithm from [31]. This is because they do hypothesis testing, which requires fewer samples than even estimating the probability of a property, while we learn a DTMC accurate for this property. For *herman* and *leader*, the difference is small (2.5x), because it is a case where the Massart bound is very efficient (80 times better than Okamoto-Chernoff implemented in PRISM). The *egl* system is the worst case for the Massart bound (the probability of the property is $\frac{1}{2}$), and it coincides with Okamoto-Chernoff. The difference with [31] is 40x in that case. Also, as shown in *egl*, paths in our algorithm can be a bit longer than in the algorithm from [31], where they can be stopped early by the reachability analysis. However, the differences are never larger than 3x. On the other hand, we learn a model representative of the original system for a given property, while [31] only provides a yes/no answer to hypothesis testing (performing SMC evaluating the probability of a property with the Massart bound would give exactly the same number of samples as we report for our learning algorithm). Last, the reachability analysis from [31] times out or runs out of memory on some complex systems, which is not the case with our algorithm.

#### **8 Conclusion**

In this paper, we provided theoretical grounds for obtaining global PAC bounds when learning a DTMC: we bound the error made between the behaviors of the model and of the system, formalized using temporal logics. While it is not possible to obtain a learning framework for LTL properties, we provide it for the whole CTL logic. For subsets of CTL, e.g. for a fixed time-to-failure property, we obtain better bounds, as efficient as statistical model checking. Overall, this work should help in the recent trend of establishing trusted machine learning [16].

Our techniques are useful for designers of systems for which probabilities are governed by uncertain forces (e.g. error rates): in this case, it is not easy to have a lower bound on the minimal transition probability, but we can assume that the set of transitions is known. Technically, our techniques provide a rationale for setting the constant in Laplace smoothing, which is otherwise left to an expert.

Some cases remain problematic, such as systems where states are visited very rarely. Nevertheless, we foresee potential solutions involving rare event simulation [21]. This goes beyond the scope of this work and it is left to future work.

**Acknowledgment.** Jun Sun's research is supported by the National Research Foundation Singapore under its AI Singapore Programme (Award Number: AISG-RP-2019-012).

## **References**



## **Unbounded-Time Safety Verification of Stochastic Differential Dynamics**

Shenghua Feng<sup>1,2</sup>, Mingshuai Chen<sup>3</sup>, Bai Xue<sup>1,2</sup>, Sriram Sankaranarayanan<sup>4</sup>, and Naijun Zhan<sup>1,2</sup>

> <sup>1</sup> SKLCS, Institute of Software, CAS, Beijing, China {fengsh,xuebai,znj}@ios.ac.cn <sup>2</sup> University of Chinese Academy of Sciences, Beijing, China <sup>3</sup> Lehrstuhl für Informatik 2, RWTH Aachen University, Aachen, Germany chenms@cs.rwth-aachen.de <sup>4</sup> University of Colorado, Boulder, USA sriram.sankaranarayanan@colorado.edu

**Abstract.** In this paper, we propose a method for bounding the probability that a stochastic differential equation (SDE) system violates a safety specification over the infinite time horizon. SDEs are mathematical models of stochastic processes that capture how states evolve continuously in time. They are widely used in numerous applications such as engineered systems (e.g., modeling how pedestrians move in an intersection), computational finance (e.g., modeling stock option prices), and ecological processes (e.g., population change over time). Previously the safety verification problem has been tackled over finite and infinite time horizons using a diverse set of approaches. The approach in this paper attempts to connect the two views by first identifying a finite time bound, beyond which the probability of a safety violation can be bounded by a negligibly small number. This is achieved by discovering an exponential barrier certificate that proves exponentially converging bounds on the probability of safety violations over time. Once the finite time interval is found, a finite-time verification approach is used to bound the probability of violation over this interval. We demonstrate our approach over a collection of interesting examples from the literature, wherein our approach can be used to find tight bounds on the violation probability of safety properties over the infinite time horizon.

**Keywords:** Stochastic differential equations (SDEs) · Unbounded safety verification · Failure probability bound · Barrier certificates

This work was partially funded by NSFC under grant No. 61625206, 61732001 and 61872341, by the ERC Advanced Project FRAPPANT under grant No. 787914, by the US NSF under grant No. CCF 1815983 and by the CAS Pioneer Hundred Talents Program under grant No. Y8YC235015.

#### **1 Introduction**

In this paper, we investigate the problem of verifying probabilistic safety properties for continuous stochastic dynamics modeled by stochastic differential equations (SDEs). The study of SDEs dates back to the 1900s when, e.g., Einstein used SDEs to model the phenomenon of Brownian motion [10]. Since then, SDEs have witnessed numerous applications including models of disturbances in engineered systems ranging from wind forces [37] to pedestrian motion [14]; models of financial instruments such as options [5]; and models of biological/ecological processes for instance predator-prey models [25]. In the meantime, SDEs are hard to reason about: they are defined using ideas from stochastic calculus that reimagine basic concepts such as integration in order to conform to the basic laws of probability and stochastic processes [24].

There are many important verification problems for SDEs. Prominent topics include the safety verification problem which seeks to know the probability that a given SDE with specified initial conditions will enter an unsafe region (or leave a safe region) over a given time horizon. Generally, safety verification can be performed over a finite-time horizon setting, wherein the probability is sought over a finite time interval [0, T]. On the other hand, the infinite-time horizon problem seeks a bound on the probability of satisfying a safety property over the unbounded time horizon [0,∞). A handful of methods have been proposed for verifying SDE systems, such as the barrier certificate-based methods over both the infinite time horizon [27] and finite time horizons [35], the moment optimization-based method over finite time horizons [33] and the Hamilton-Jacobi-based method over the infinite time horizon [16]. The novelty of our work lies in the reduction of infinite-time horizon verification problems to finite time problems.

In this paper, we propose a novel reduction-based method to verify unbounded-time safety properties of stochastic systems modeled as nonlinear polynomial SDEs. We employ a similar idea as in [11] (for verifying delay differential equations) that reduces the safety verification problem over the infinite time horizon to the one over a finite time interval. This is achieved by computing an *exponential stochastic barrier certificate* which witnesses an exponentially decreasing upper bound on the probability that a target system violates a given safety specification. Consequently, for any $\varepsilon > 0$, we can identify a time instant $T$ beyond which the violation (a.k.a. failure) probability is smaller than the negligibly small cutoff $\varepsilon$. The reduced bounded-time safety verification problem over $[0, T]$ can hence be tackled by any of the available methods. We furthermore present an alternative method to address the reduced finite-time horizon verification problem based on the discovery of a *time-dependent stochastic barrier certificate*. We show that both the exponential and the time-dependent stochastic barrier certificate can be synthesized by respectively solving a pertinent *semidefinite programming* (SDP) [38] optimization problem. Experimental results on some interesting examples taken from the literature demonstrate the effectiveness of the reduction and show that our method often produces tighter bounds on the failure probability. Our approach has some broad similarities to related approaches in symbolic execution of probabilistic programs that conclude facts about infinitely many behaviors by analyzing finitely many paths in the program that account for a sufficient probability among all the behaviors [31].
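The choice of the time instant $T$ can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the certificate yields a bound of the form $c\, e^{-\lambda T}$ on the probability of a violation after time $T$; the constants $c$ and $\lambda$ and the function name are hypothetical, and the precise form of the bound is the subject of Sect. 3.

```python
import math


def horizon_for_cutoff(c: float, lam: float, eps: float) -> float:
    """Smallest T >= 0 with c * exp(-lam * T) <= eps (assumed bound shape)."""
    if c <= 0 or lam <= 0 or eps <= 0:
        raise ValueError("c, lam and eps must be positive")
    return max(0.0, math.log(c / eps) / lam)


if __name__ == "__main__":
    T = horizon_for_cutoff(c=1.0, lam=0.5, eps=1e-6)
    print(f"T = {T:.3f}, tail bound = {1.0 * math.exp(-0.5 * T):.2e}")
```

Because the bound decays exponentially, $T$ grows only logarithmically as the cutoff $\varepsilon$ shrinks.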

**Contributions.** The main contributions of this work can be summarized as follows: (1) We reduce the unbounded-time safety verification of stochastic systems to a bounded one, based on an exponentially decreasing bound on the failure probability which guarantees the dominance of the overall failure probability by the truncated finite time horizon. (2) We show how the obtained bound on the overall failure probability is tighter than that produced by existing methods for some interesting SDEs.

**Related Work.** The use of mathematical models of processes, ranging from finite state machines to various types of differential equations, has allowed us to reason about rich behaviors of Cyber-Physical Systems produced by the interaction between digital computers and physical plants [29]. In this regard, many modeling formalisms have been studied including finite state machines, ordinary differential equations (ODEs), timed automata, hybrid automata, etc. [8], on top of which a large variety of verification problems have been extensively investigated, e.g., safety verification through reachability analysis and temporal logic verification [3].

In the existing literature on formal verification, ODEs are often used to describe the behavior of deterministic continuous-time systems. However, these models have been shown to be over-simplistic in many applications that involve time delays, nondeterministic inputs and stochastic noise. SDEs hence arose as an important class of models that have been employed in practical domains covering, among others [24], financial models such as the famous Black-Scholes model used extensively in the theory of options pricing [5], wind disturbances [37], human pedestrian motion [14] and ecological models [25].

In what follows, we place our work in the context of formal verification techniques tailored for stochastic differential dynamics modeled as SDEs, and discuss contributions thereof that are highly related to our approach. Unbounded-time stochastic safety verification of SDE systems was first studied by Prajna et al. in [27,28], where a typical supermartingale was employed as a stochastic barrier certificate followed by computational conditions derived from Doob's martingale inequality [15]. Thereafter, the stochastic barrier certificate-based method was extended to cater for bounded-time safety verification by Steinhardt and Tedrake [35] by leveraging a relaxed formulation called c-martingale for locally stable systems. The barrier certificate-based method by Prajna et al. (ibid.) for unbounded-time safety verification often leads to a conservative bound on the failure probability. On the other hand, Steinhardt and Tedrake (ibid.) established impressive probability bounds but only for finite time horizons. In order to reduce the conservativeness, we propose a method of reducing the unbounded safety verification to a bounded one. Although our method in this paper is also based on the construction of stochastic barrier certificates, the barrier certificate here only serves to identify a finite time interval such that the violation probability of interest beyond this time interval is negligibly small. A time-dependent barrier certificate is further proposed to solve the resulting bounded-time safety verification. The unbounded-time safety verification problem has also been studied by Koutsoukos and Riley [16], who linked the reachability probability to the viscosity solution of certain Hamilton-Jacobi partial differential equations, under restrictions on bounded state space and non-degenerate diffusion. Grid-based numerical approaches, e.g., the finite difference method in [16] and the level set method in [22], are traditionally used to solve these equations, with the consequence that the Hamilton-Jacobi reachability method only scales well to systems of special structures. More recently, a novel constraint solving-based method has been proposed in [20] for algebraically over- and under-approximating the reachability probability, which is nevertheless limited to bounded-time safety verification. In addition to the abovementioned methods, we refer the readers to [7] for a Dirichlet form-based method for stochastic hybrid systems featuring "nice" Markov properties, and to [6,18,39] and [1,17] respectively for related contributions in statistical and discrete/numerical methods for stochastic verification and control.

Finally, we mention a relation between the ideas in this paper and previously proposed ideas for (non-stochastic) ODEs due to Sogokon et al. [34]. The key similarity lies in the use of a non-negative matrix through which the derivatives of a vector of functions are related to their current values. Whereas Sogokon et al. explored this idea for ODEs, we do so for SDEs. Another significant difference is that, in our work, we use the super-martingale functions to identify a time horizon $[0, T]$ and bound the probability of safety violation beyond $T$.

The remainder of this paper is structured as follows. Section 2 introduces stochastic differential dynamics modeled by SDEs and the unbounded-time safety verification problem of interest. Section 3 elucidates the reduction of unbounded safety verification to bounded safety verification, based on the witness of stochastic barrier certificates. Section 4 presents the SDP formulation for discovering such barrier certificates over the reduced bounded time interval. After demonstrating our method on several examples in Sect. 5, we conclude the paper in Sect. 6.

#### **2 Problem Formulation**

**Notations.** Let $\mathbb{R}$ be the set of real numbers. For a vector $x \in \mathbb{R}^n$, $x\_i$ refers to its $i$-th component and $|x|$ denotes the $\ell^2$-norm. In particular, $\mathbf{0}$ and $\mathbf{1}$ denote respectively the vectors of zeros and ones of appropriate dimension, and the comparison between vectors, e.g., $x \le \mathbf{0}$, is component-wise. For $\delta > 0$, we define $\mathfrak{B}(x, \delta) \mathrel{\widehat{=}} \{y \in \mathbb{R}^n \mid |y - x| \le \delta\}$ as the $\delta$-closed ball centered at $x$. We abuse the notation $|\cdot|$ for an $m \times n$ matrix $M$ as $|M| \mathrel{\widehat{=}} \sqrt{\sum\_{i=1}^{m} \sum\_{j=1}^{n} |M\_{ij}|^2}$. The exponential of a square matrix $M \in \mathbb{R}^{n \times n}$, denoted by $\mathrm{e}^{M}$, is the $n \times n$ matrix given by the power series $\mathrm{e}^{M} \mathrel{\widehat{=}} \sum\_{k=0}^{\infty} \frac{1}{k!} M^k$. For a set $\mathcal{X} \subseteq \mathbb{R}^n$, $\partial\mathcal{X}$, $\overline{\mathcal{X}}$ and $\mathcal{X}^{\mathrm{o}}$ denote respectively the boundary, the closure and the interior of $\mathcal{X}$. Let $C^k$ be the space of functions on $\mathbb{R}$ with continuous derivatives up to order $k$; a function $f(t, x) \colon \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}$ is in $C^{1,2}(\mathbb{R} \times \mathbb{R}^n)$ if $f \in C^1$ w.r.t. $t \in \mathbb{R}$ and $f \in C^2$ w.r.t. $x \in \mathbb{R}^n$.

Let (Ω, <sup>F</sup>, P) be a probability space, where Ω is a sample space, F ⊆ <sup>2</sup><sup>Ω</sup> is a σ-algebra on Ω, and P : F → [0, 1] is a probability measure on the measurable space (Ω, <sup>F</sup>). A *random variable* X defined on the probability space (Ω, <sup>F</sup>, P) is an <sup>F</sup>-measurable function X : Ω <sup>→</sup> <sup>R</sup>n; its *expectation* (w.r.t. <sup>P</sup>) is denoted by <sup>E</sup>[X]. Every random variable <sup>X</sup> induces a probability measure <sup>μ</sup><sup>X</sup> : B → [0, 1] on <sup>R</sup>n, defined as μX(B) <sup>=</sup>- P(X−<sup>1</sup>(B)) for Borel sets B in the Borel σ-algebra <sup>B</sup> on <sup>R</sup>n. <sup>μ</sup><sup>X</sup> is called the *distribution of* <sup>X</sup>; its *support set* is supp(μX) <sup>=</sup>- <sup>μ</sup>X(B)><sup>0</sup> <sup>B</sup>, which will also be referred to as the support of X.

A (continuous-time) *stochastic process* is a parametrized collection of random variables $\{X\_t\}\_{t \in T}$, where the parameter space $T$ is interpreted as the half-line $[0,\infty)$ unless explicitly noted otherwise. We sometimes drop the brackets in $\{X\_t\}$ when it is clear from the context. A collection $\{\mathcal{F}\_t \mid t \ge 0\}$ of $\sigma$-algebras of sets in $\mathcal{F}$ is a *filtration* if $\mathcal{F}\_t \subseteq \mathcal{F}\_{t+s}$ for $t, s \in [0,\infty)$. Intuitively, $\mathcal{F}\_t$ carries the information known to an observer at time $t$. A random variable $\tau \colon \Omega \to [0,\infty)$ is called a *stopping time* w.r.t. some filtration $\{\mathcal{F}\_t \mid t \ge 0\}$ of $\mathcal{F}$ if $\{\tau \le t\} \in \mathcal{F}\_t$ for all $t \ge 0$. A stochastic process $\{X\_t\}$ adapted to a filtration $\{\mathcal{F}\_t \mid t \ge 0\}$ is called a *supermartingale* if $E[X\_t] < \infty$ for any $t \ge 0$ and $E[X\_t \mid \mathcal{F}\_s] \le X\_s$ for all $0 \le s \le t$. That is, the conditional expected value of any future observation, given all the past observations, is no larger than the most recent observation.

**Stochastic Differential Dynamics.** We consider a class of dynamical systems featuring stochastic differential dynamics governed by time-homogeneous SDEs of the form<sup>1</sup>

$$\mathrm{d}X\_t = b(X\_t)\,\mathrm{d}t + \sigma(X\_t)\,\mathrm{d}W\_t, \quad t \ge 0 \tag{1}$$

where {X<sup>t</sup>} is an n-dimensional continuous-time stochastic process, {W<sup>t</sup>} denotes an m-dimensional Wiener process (standard Brownian motion), b : <sup>R</sup><sup>n</sup> <sup>→</sup> <sup>R</sup><sup>n</sup> is a vector-valued polynomial flow field (called the *drift coefficient*) modeling deterministic evolution of the system, and σ : <sup>R</sup><sup>n</sup> <sup>→</sup> <sup>R</sup><sup>n</sup>×<sup>m</sup> is a matrix-valued polynomial flow field (called the *diffusion coefficient*) that encodes the coupling of the system to Gaussian white noise dW<sup>t</sup>.

Suppose there exists a Lipschitz constant $D$ s.t. $|b(x) - b(y)| + |\sigma(x) - \sigma(y)| \le D|x - y|$ holds for all $x, y \in \mathbb{R}^n$. Then, given an initial state (a random variable) $X\_0$, an SDE of the form (1) has a unique *solution*, which is a stochastic process $X\_t(\omega) = X(t, \omega) \colon [0,\infty) \times \Omega \to \mathbb{R}^n$ satisfying the stochastic integral equation (à la Itô's interpretation)

$$X\_t = X\_0 + \int\_0^t b(X\_s) \, \mathrm{d}s + \int\_0^t \sigma(X\_s) \, \mathrm{d}W\_s. \tag{2}$$

The solution $\{X\_t\}$ in Eq. (2) is also referred to as an *(Itô) diffusion process*, and will be denoted by $X\_t^{0,X\_0}$ (or simply $X\_t^{X\_0}$), if necessary, to indicate the initial condition $X\_0$ at $t = 0$.
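To build intuition for the solution process in Eq. (2), sample paths can be approximated numerically. The following is a minimal sketch using the Euler-Maruyama discretization, a standard scheme that is not part of the method in this paper. The function name `euler_maruyama`, the step count, and the scalar example SDE are illustrative assumptions.

```python
import math
import random


def euler_maruyama(b, sigma, x0, t_end, n_steps, rng):
    """Approximate one sample path of dX = b(X) dt + sigma(X) dW on [0, t_end].

    Scalar case only; b and sigma are plain Python callables.
    """
    dt = t_end / n_steps
    path = [x0]
    x = x0
    for _ in range(n_steps):
        # Wiener increment over a step of length dt is N(0, dt).
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + b(x) * dt + sigma(x) * dw
        path.append(x)
    return path


# Hypothetical example (not from the paper): dX = -X dt + 0.5 X dW, X_0 = 1.
rng = random.Random(0)
path = euler_maruyama(lambda x: -x, lambda x: 0.5 * x, 1.0, 1.0, 1000, rng)
```

Setting `sigma` to the zero function turns the scheme into the explicit Euler method for the underlying ODE, which gives a quick sanity check of the drift term.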

<sup>1</sup> The general time-inhomogeneous case with time-dependent b and σ can be reduced to this form (cf. [24, Chap. 10]).

A great deal of information about a diffusion process can be encoded in a partial differential operator termed the *infinitesimal generator*, which generalizes the Lie derivative that captures the evolution of a function along the diffusion process:

**Definition 1 (Infinitesimal generator** [24]**).** *Let* $\{X\_t\}$ *be a (time-homogeneous) diffusion process in* $\mathbb{R}^n$*. The* infinitesimal generator $\mathcal{A}$ *of* $X\_t$ *is defined by*

$$\mathcal{A}f(s,x) = \lim\_{t \downarrow 0} \frac{E^{s,x}\left[f(s+t,X\_t)\right] - f(s,x)}{t}, \quad x \in \mathbb{R}^n.$$

*The set of functions* <sup>f</sup> : <sup>R</sup> <sup>×</sup> <sup>R</sup><sup>n</sup> <sup>→</sup> <sup>R</sup> *s.t. the limit exists at* (s, x) *is denoted by* <sup>D</sup>A(s, x)*, while* <sup>D</sup><sup>A</sup> *denotes the set of functions for which the limit exists for all* (s, x) <sup>∈</sup> <sup>R</sup> <sup>×</sup> <sup>R</sup><sup>n</sup>*.*

In subsequent sections, the readers may find applications of the operator A to a vector-valued function in a component-wise manner. The relation between <sup>A</sup> and the coefficients b, σ in SDE (1) is captured by the following result:

**Lemma 1** [24]**.** *Let* $\{X\_t\}$ *be a diffusion process defined by Eq.* (1)*. If* $f \in C^{1,2}(\mathbb{R} \times \mathbb{R}^n)$ *with compact support, then* $f \in \mathcal{D}\_{\mathcal{A}}$ *and*

$$\mathcal{A}f(t, x) = \frac{\partial f}{\partial t} + \sum\_{i=1}^{n} b\_i(x) \frac{\partial f}{\partial x\_i} + \frac{1}{2} \sum\_{i,j} (\sigma \sigma^{\sf T})\_{ij} \frac{\partial^2 f}{\partial x\_i \partial x\_j}.$$
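The formula in Lemma 1 lends itself to a mechanical calculation when the function is a time-independent polynomial. The sketch below covers the scalar case only and applies the formula formally (polynomials are not compactly supported). Polynomials are represented as coefficient lists `[c0, c1, ...]`; the helper names and the example SDE $\mathrm{d}X\_t = -X\_t\,\mathrm{d}t + cX\_t\,\mathrm{d}W\_t$ with $V(x) = x^2$ are illustrative assumptions, not taken from the paper.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient lists [c0, c1, ...]."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0.0) + (q[i] if i < len(q) else 0.0)
            for i in range(n)]


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, c in enumerate(q):
            out[i + j] += a * c
    return out


def poly_deriv(p):
    """Derivative of a polynomial; the derivative of a constant is [0.0]."""
    return [k * p[k] for k in range(1, len(p))] or [0.0]


def generator_1d(V, b, sigma):
    """A V = b V' + (1/2) sigma^2 V'' for time-independent V (scalar state)."""
    dV = poly_deriv(V)
    d2V = poly_deriv(dV)
    drift_term = poly_mul(b, dV)
    diffusion_term = [0.5 * c for c in poly_mul(poly_mul(sigma, sigma), d2V)]
    return poly_add(drift_term, diffusion_term)


# Hypothetical example: b(x) = -x, sigma(x) = c x, V(x) = x^2.
c = 0.5
AV = generator_1d([0.0, 0.0, 1.0], [0.0, -1.0], [0.0, c])
```

For this example the result is the coefficient list of $-(2 - c^2)x^2$, so $\mathcal{A}V = -(2 - c^2)V$ whenever $c^2 < 2$.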

As a stochastic generalization of the Newton-Leibniz axiom, Dynkin's formula gives the expected value of any adequately smooth function of an Itô diffusion at a stopping time:

**Theorem 1 (Dynkin's formula** [9]**).** *Let* $\{X\_t\}$ *be a diffusion process in* $\mathbb{R}^n$*. Suppose* $\tau$ *is a stopping time with* $E[\tau] < \infty$*, and* $f \in C^{1,2}(\mathbb{R} \times \mathbb{R}^n)$ *with compact support. Then*

$$E^{h,x}\left[f(\tau,X\_{\tau})\right] = f(h,x) + E^{h,x}\left[\int\_0^{\tau} \mathcal{A}f(s,X\_s) \,ds\right].$$

In order to specify the behavior of an Itô diffusion across the domain boundary, we introduce the concept of *stopped process*, which is a stochastic process that is forced to have the same value after a prescribed (possibly random) time.

**Definition 2 (Stopped process** [12]**).** *Given a stopping time* $\tau$ *and a stochastic process* $\{X\_t\}$*, the* stopped process $\{X\_t^\tau\}$ *is defined by*

$$X^\tau(t,\omega) \mathrel{\widehat{=}} X\_{t\wedge\tau}(\omega) = \begin{cases} X(t,\omega) & \text{if } t \le \tau(\omega), \\ X(\tau(\omega),\omega) & \text{otherwise.} \end{cases}$$

*Remark 1.* By definition, a stopped process preserves, among others, continuity and the Markov property, and hence the aforementioned results on a stochastic process apply also to a stopped process.

Now consider a stochastic system modeled by an SDE of the form (1) that evolves "within" a not necessarily bounded set $\mathcal{X} \subseteq \mathbb{R}^n$. Since the solution $\{X\_t\}$ of Eq. (1) may escape from $\mathcal{X}$ at any time instant $t > 0$, due to the unbounded nature of Gaussian noise, we define a stopped process $\tilde{X}\_t \mathrel{\widehat{=}} X\_{t \wedge \tau\_{\mathcal{X}}}$ with $\tau\_{\mathcal{X}} \mathrel{\widehat{=}} \inf\{t \mid X\_t \notin \mathcal{X}\}$. $\tilde{X}\_t$ hence represents the process that will stop at the boundary of $\mathcal{X}$. Denote the infinitesimal generator of the stopped process by $\tilde{\mathcal{A}}$. One plausible property here is that, for all compactly-supported $f \in C^{1,2}(\mathbb{R} \times \mathbb{R}^n)$,

$$
\tilde{\mathcal{A}}f(t,x) = \begin{cases}
\mathcal{A}f(t,x) & \text{for } x \in \mathcal{X}^{\mathrm{o}}, \\
\frac{\partial f}{\partial t}(t,x) & \text{for } x \in \partial \mathcal{X}.
\end{cases}
\tag{3}
$$
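On a sampled trajectory, the stopped process can be mimicked by freezing the path once it leaves the domain. The sketch below is a discrete-time illustration only: the helper name `stop_path`, the sample values, and the domain $[-1, 1]$ are hypothetical, and a sampled path freezes at the first sample found outside the domain rather than exactly on the boundary.

```python
def stop_path(path, in_domain):
    """Freeze a sampled path at the first sample that lies outside the domain."""
    stopped = []
    frozen = None  # value at the (discrete) exit time, once it has occurred
    for x in path:
        if frozen is None and not in_domain(x):
            frozen = x
        stopped.append(x if frozen is None else frozen)
    return stopped


# Hypothetical samples and domain X = [-1, 1].
stopped = stop_path([0.2, 0.7, 1.3, 0.4, 0.1], lambda x: abs(x) <= 1.0)
```

After the exit sample the stopped path is constant, which mirrors the fact that only the time derivative survives in the second case of Eq. (3).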

**The** ∞**-Safety Problem.** Given an SDE of the form (1), a (not necessarily bounded<sup>2</sup>) domain set $\mathcal{X} \subseteq \mathbb{R}^n$, an initial set $\mathcal{X}\_0 \subset \mathcal{X}$, and an unsafe set $\mathcal{X}\_u \subset \mathcal{X}$, we aim to bound the failure probability

$$P\left(\exists t \in [0,\infty) \colon \tilde{X}\_t \in \mathcal{X}\_u\right),$$

for any initial state $X\_0$ whose support lies within $\mathcal{X}\_0$. Accordingly, the $T$*-safety problem*, with $T < \infty$, refers to the problem where one aims to bound the failure probability within the finite time horizon $[0, T]$.

*Remark 2.* Roughly speaking, if we denote by $\varphi$ the proposition "$\tilde{X}\_t$ evolves within $\mathcal{X}$" and by $\psi$ the proposition "$\tilde{X}\_t$ evolves into $\mathcal{X}\_u$", then the above ∞-safety problem asks for a bound on the probability that the LTL formula $\varphi\,\mathcal{U}\,\psi$ holds.

#### **3 Reducing ∞-Safety to *T*-Safety**

We dedicate this section to the reduction of the ∞-safety problem to its bounded counterpart. Observe that for any 0 <sup>≤</sup> T < <sup>∞</sup>,

$$P(\exists t \ge 0 \colon \tilde{X}\_t \in \mathcal{X}\_u) \le P(\exists t \in [0, T] \colon \tilde{X}\_t \in \mathcal{X}\_u) + P(\exists t \ge T \colon \tilde{X}\_t \in \mathcal{X}\_u).$$

The key idea behind our approach is to first compute an exponentially decreasing bound on the *tail failure probability* over $[T^{\ast}, \infty)$ (the computation of $T^{\ast} \ge 0$ will be shown later). Then, for any constant $\epsilon > 0$, we can identify (out of the exponentially decreasing bound) a time instant $\tilde{T} \ge T^{\ast}$ such that $P(\exists t \ge \tilde{T} \colon \tilde{X}\_t \in \mathcal{X}\_u) \le \epsilon$. The overall bound on the failure probability over $[0,\infty)$ can consequently be obtained by solving the truncated $\tilde{T}$-safety problem.

<sup>2</sup> In practice, if we can specify <sup>X</sup> based on prior knowledge when modeling a physical system, then the larger X we choose, the greater (bound on) failure probability we will obtain.

#### **3.1 Exponentially Decreasing Bound on the Tail Failure Probability**

We first state a result that gives conditions under which a linear map preserves vector inequalities:

**Lemma 2** [4, Chap. 4]**.** *For a matrix* $M \in \mathbb{R}^{n \times n}$*,*
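As a numerical illustration of the role played by essentially non-negative matrices, the sketch below assumes the usual meaning of the term (all off-diagonal entries are non-negative) and checks on hypothetical 2×2 examples that the matrix exponential is then entrywise non-negative, so that multiplying by it preserves component-wise inequalities. The helper names and the example matrices are illustrative assumptions, and the truncated power series is adequate only for small matrices of moderate norm.

```python
def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(M, terms=60):
    """Truncated power series sum_k M^k / k! (small, moderate-norm M only)."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def is_essentially_nonnegative(M):
    """Assumed definition: every off-diagonal entry is non-negative."""
    n = len(M)
    return all(M[i][j] >= 0 for i in range(n) for j in range(n) if i != j)


def mat_vec(A, x):
    """Matrix-vector product."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


# Hypothetical examples.
M_good = [[-1.0, 2.0], [0.5, -3.0]]  # off-diagonal entries are >= 0
M_bad = [[0.0, -1.0], [0.0, 0.0]]    # one negative off-diagonal entry

E_good = mat_exp(M_good)
E_bad = mat_exp(M_bad)               # equals I + M_bad because M_bad^2 = 0

x, y = [0.0, 1.0], [2.0, 1.5]        # x <= y component-wise
Ex, Ey = mat_vec(E_good, x), mat_vec(E_good, y)
```

In the second example the exponential has a negative entry, so it can reverse an inequality; this is why the supermartingale argument of this section asks for essential non-negativity.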


The existence of an exponentially decreasing bound on the tail failure probability relies on a witness of a supermartingale of the exponential type:

**Theorem 2.** *Suppose there exists an essentially non-negative matrix* $\Lambda \in \mathbb{R}^{m \times m}$*, together with an* $m$*-dimensional polynomial function (termed* exponential stochastic barrier certificate*)* $V(x) = (V\_1(x), V\_2(x), \ldots, V\_m(x))^{\mathsf{T}}$*, with* $V\_i \colon \mathbb{R}^n \to \mathbb{R}$ *for* $1 \le i \le m$*, satisfying*<sup>3,4</sup>

$$V(x) \ge \mathbf{0} \quad \text{for } x \in \mathcal{X}, \tag{4}$$

$$\mathcal{A}V(x) \le -\Lambda V(x) \quad \text{for } x \in \mathcal{X},\tag{5}$$

$$\Lambda V(x) \le \mathbf{0} \quad \text{for } x \in \partial \mathcal{X}. \tag{6}$$

*Define a function*

$$F(t, x) \mathrel{\widehat{=}} \mathrm{e}^{\Lambda t} V(x).$$

*Then every component of* $F(t, \tilde{X}\_t)$ *is a supermartingale.*

*Proof.* For cases with a bounded domain $\mathcal{X}$, one can trivially extend the domain of $F(t, x)$ s.t. $F$ is compactly-supported, and thus Dynkin's formula in Theorem 1 applies immediately. For cases where $\mathcal{X}$ is unbounded, we introduce a stopping time

$$\tau\_{\delta} \mathrel{\widehat{=}} \inf \left\{ t \mid F\left(t, \tilde{X}\_t\right) \notin \mathfrak{B}(\mathbf{0}, \delta) \right\},$$

and denote by $X\_t^{(\delta)} = \left(t \wedge \tau\_{\delta}, \tilde{X}\_{t \wedge \tau\_{\delta}}\right)$ the corresponding stopped process involving the timeline, and by $\mathcal{A}^{(\delta)}$ the corresponding infinitesimal generator. Then $X\_t^{(\delta)}$ evolves within the $\delta$-closed ball $\mathfrak{B}(\mathbf{0}, \delta)$, and the problem hence boils down to the case with a bounded domain. Moreover, by Eq. (3), we have

$$\begin{split} \mathcal{A}^{(\delta)}F\left(X\_{t}^{(\delta)}\right) &= \mathcal{A}^{(\delta)}F\left(t \wedge \tau\_{\delta}, \tilde{X}\_{t \wedge \tau\_{\delta}}\right) \\ &= \begin{cases} 0 & \text{if } \tau\_{\delta}(\omega) \le t, \\ \frac{\partial F}{\partial t}(t, X\_{t}) + \mathrm{e}^{\Lambda t} \mathcal{A}V(X\_{t}) \le 0 & \text{if } \tau\_{\delta}(\omega) > t \wedge \tau\_{\mathcal{X}}(\omega) > t, \\ \frac{\partial F}{\partial t}(t, X\_{t}) \le 0 & \text{if } \tau\_{\delta}(\omega) > t \wedge \tau\_{\mathcal{X}}(\omega) \le t, \end{cases} \end{split}$$

<sup>3</sup> Condition (5) is slightly stronger than the corresponding one used in [27,28], yet will lead to an exponentially decreasing bound on the tail failure probability in return.

<sup>4</sup> Condition (6) is to ensure that when $\tilde{X}\_t$ stops at the boundary of $\mathcal{X}$, we still have $\tilde{\mathcal{A}}V(x) \le -\Lambda V(x)$ for $x \in \partial\mathcal{X}$. If $\mathcal{X} = \mathbb{R}^n$, however, this condition can be omitted.

where $\tau\_{\mathcal{X}}$ represents the time instant of escaping from the state space $\mathcal{X}$. Note that the second and the third case hold due to the non-negativity of $\mathrm{e}^{\Lambda t}$ (as $\Lambda$ is essentially non-negative), which implies that $\mathrm{e}^{\Lambda t}$ preserves the vector inequalities (5) and (6). Hence by Dynkin's formula (in a component-wise manner), for fixed $t, h \in [0,\infty)$, we have

$$\begin{split} E\left[F\left((t+h)\wedge\tau\_{\delta},\tilde{X}\_{(t+h)\wedge\tau\_{\delta}}\right)\mid\mathcal{F}\_{h}\right] &= E^{X^{(\delta)}\_{h}}\left[F\left(X^{(\delta)}\_{t+h}\right)\right] \\ &= F\left(X^{(\delta)}\_{h}\right) + E^{X^{(\delta)}\_{h}}\left[\int\_{0}^{t} \mathcal{A}^{(\delta)}F\left(X^{(\delta)}\_{s}\right)ds\right] \\ &\leq F\left(X^{(\delta)}\_{h}\right) \\ &= F\left(h\wedge\tau\_{\delta},\tilde{X}\_{h\wedge\tau\_{\delta}}\right). \end{split}$$

Since $F(t, x) \ge \mathbf{0}$ (by condition (4) and the non-negativity of $\mathrm{e}^{\Lambda t}$), by Fatou's lemma, we have

$$\begin{split} E\left[F\left(t+h,\tilde{X}\_{t+h}\right)\mid\mathcal{F}\_{h}\right] &= E\left[\liminf\_{\delta\to\infty} F\left((t+h)\wedge\tau\_{\delta},\tilde{X}\_{(t+h)\wedge\tau\_{\delta}}\right)\mid\mathcal{F}\_{h}\right] \\ &\leq \liminf\_{\delta\to\infty} E\left[F\left((t+h)\wedge\tau\_{\delta},\tilde{X}\_{(t+h)\wedge\tau\_{\delta}}\right)\mid\mathcal{F}\_{h}\right] \\ &\leq \liminf\_{\delta\to\infty} F\left(h\wedge\tau\_{\delta},\tilde{X}\_{h\wedge\tau\_{\delta}}\right) \\ &\leq F\left(h,\tilde{X}\_{h}\right). \end{split}$$

 

It follows consequently that every component of <sup>F</sup>(t, <sup>X</sup>˜t) is a supermartingale.

We will show in Sect. 4 that the synthesis of the exponential stochastic barrier certificate V (x) (and thereby the function F(t, x)) boils down to solving a pertinent SDP optimization problem.

In order to further establish the relation between the exponential supermartingale F(t, X˜t) (and thereby <sup>V</sup> (x)) and the bound on tail failure probability, we recall Doob's maximal inequality for supermartingales, which gives a bound on the probability that a non-negative supermartingale exceeds some given value over a given time interval:

**Lemma 3 (Doob's supermartingale inequality** [15]**).** *Let* $\{X\_t\}\_{t>0}$ *be a right continuous non-negative supermartingale adapted to a filtration* $\{\mathcal{F}\_t \mid t > 0\}$*. Then for any* $\lambda > 0$*,*

$$
\lambda P\left(\sup\_{t\geq 0} X\_t \geq \lambda\right) \leq E[X\_0].
$$

The following theorem states an intermediate fact that will later reveal the exponentially decreasing bound on the tail failure probability.

**Theorem 3.** *Suppose the conditions in Theorem 2 are satisfied. Then for any* $T \ge 0$ *and any positive vector* $\gamma \in \mathbb{R}^m$*,*

$$P\left(\sup\_{t\geq T} V\left(\tilde{X}\_t\right) \geq \sup\_{t\geq T} \left(\mathrm{e}^{-\Lambda t}\gamma\right)\right) \leq E\left[V\_i(X\_0)\right]/\gamma\_i \tag{7}$$

*holds for all* i ∈ {1,...,m}*.* 

*Proof.* Observe the following chain of (in-)equalities: 

$$\begin{aligned} P\left(\sup\_{t\geq T} V\left(\tilde{X}\_{t}\right) \geq \sup\_{t\geq T} \left(\mathrm{e}^{-\Lambda t}\gamma\right)\right) &\leq P\left(\exists t \geq T \colon V\left(\tilde{X}\_{t}\right) \geq \mathrm{e}^{-\Lambda t}\gamma\right) \\ &\leq P\left(\exists t \geq T \colon \mathrm{e}^{\Lambda t}V\left(\tilde{X}\_{t}\right) \geq \gamma\right) && \text{[non-negative } \mathrm{e}^{\Lambda t}\text{]} \\ &= P\left(\sup\_{t\geq T} F\left(t, \tilde{X}\_{t}\right) \geq \gamma\right) \\ &\leq P\left(\sup\_{t\geq T} F\_{i}\left(t, \tilde{X}\_{t}\right) \geq \gamma\_{i}\right) \\ &\leq E\left[F\_{i}\left(T, \tilde{X}\_{T}\right)\right]/\gamma\_{i} && \text{[Lemma 3]} \\ &\leq E\left[V\_{i}\left(X\_{0}\right)\right]/\gamma\_{i} && \text{[Theorem 2]} \end{aligned}$$

 

 

which holds for any i ∈ {1, <sup>2</sup>, ··· , m}. This completes the proof.

Now, we are ready to give the exponentially decreasing bound on the tail failure probability derived from Theorem 3. We start by considering the simple case where the barrier certificate V (x) is a scalar function, i.e., with m = 1.

**Proposition 1.** *Suppose there exists a positive constant* $\Lambda \in \mathbb{R}$ *and a scalar function* $V \colon \mathbb{R}^n \to \mathbb{R}$ *satisfying Theorem 2. Then,*

$$P\left(\sup\_{t\geq T} V\left(\tilde{X}\_t\right) \geq \gamma\right) \leq \frac{E\left[V(X\_0)\right]}{\mathbf{e}^{\Lambda T}\gamma} \tag{8}$$

*holds for any* $\gamma > 0$ *and* $T \ge 0$*. Moreover, if there exists* $l > 0$ *such that*

$$V(x) \ge l \quad \text{for all } x \in \mathcal{X}\_u,$$

*then*

$$P\left(\exists t \ge T \colon \tilde{X}\_t \in \mathcal{X}\_u\right) \le \frac{E[V(X\_0)]}{\mathrm{e}^{\Lambda T} l} \tag{9}$$

*holds for any* T <sup>≥</sup> <sup>0</sup>*.* 

*Proof.* Equation (8) holds since

$$\begin{aligned} P\left(\sup\_{t\geq T} V\left(\tilde{X}\_t\right) \geq \gamma\right) &= P\left(\sup\_{t\geq T} V\left(\tilde{X}\_t\right) \geq \mathrm{e}^{-\Lambda T}\left(\mathrm{e}^{\Lambda T}\gamma\right)\right) \\ &\leq P\left(\sup\_{t\geq T} V\left(\tilde{X}\_t\right) \geq \sup\_{t\geq T} \left(\mathrm{e}^{-\Lambda t}\left(\mathrm{e}^{\Lambda T}\gamma\right)\right)\right) && \text{[monotonicity on } t\text{]} \\ &\leq \frac{E[V(X\_0)]}{\mathrm{e}^{\Lambda T}\gamma}. && \text{[Theorem 3]} \end{aligned}$$

 

 

For Eq. (9), since $V(x) \ge l$ for all $x \in \mathcal{X}\_u$, it follows immediately that

 

$$P\left(\exists t \ge T \colon \tilde{X}\_t \in \mathcal{X}\_u\right) \le P\left(\sup\_{t\ge T} V\left(\tilde{X}\_t\right) \ge l\right) \le \frac{E[V(X\_0)]}{\mathbf{e}^{\Lambda T} l}.$$

This completes the proof.
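Because the bound in Eq. (9) decays exponentially in $T$, the horizon needed to push the tail failure probability below a target can be read off in closed form. The sketch below assumes an upper bound `alpha` on $E[V(X\_0)]$ is available; the function names and all numerical values are hypothetical and only illustrate the arithmetic.

```python
import math


def tail_bound(T, lam, alpha, l):
    """Right-hand side of Eq. (9), with E[V(X_0)] replaced by its bound alpha."""
    return alpha / (math.exp(lam * T) * l)


def truncation_horizon(eps, lam, alpha, l):
    """Smallest T >= 0 with alpha / (e^{lam T} l) <= eps (scalar case, lam > 0)."""
    return max(0.0, math.log(alpha / (eps * l)) / lam)


# Hypothetical values: Lambda = 1.75, E[V(X_0)] <= 0.01, V >= 1 on the unsafe set.
lam, alpha, l, eps = 1.75, 0.01, 1.0, 1e-6
T_tilde = truncation_horizon(eps, lam, alpha, l)
```

Solving $\alpha / (\mathrm{e}^{\Lambda T} l) \le \epsilon$ for $T$ gives $T \ge \ln(\alpha / (\epsilon l)) / \Lambda$, which is what `truncation_horizon` returns; halving $\epsilon$ only adds $\ln 2 / \Lambda$ to the horizon.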

Now we lift the results to the slightly more involved case with m > 1.

**Proposition 2.** *Suppose there exists an essentially non-negative matrix* $\Lambda \in \mathbb{R}^{m \times m}$ *and an* $m$*-dimensional polynomial function* $V \colon \mathbb{R}^n \to \mathbb{R}^m$ *satisfying Theorem 2. If all of the eigenvalues of* $\Lambda$ *have positive real parts, i.e.,*

$$\min\_{1 \le i \le m} \{ \Re(\lambda\_i) \mid \lambda\_i \text{ is an eigenvalue of } \Lambda \} > 0,$$

*then for any positive vector* $\gamma \in \mathbb{R}^m$*, there exists* $T^{\ast} = T^{\ast}(\gamma, M, \Lambda) \in \mathbb{R}$ *such that for any* $T \ge T^{\ast}$*,*

$$P\left(\sup\_{t\geq T} V\left(\tilde{X}\_t\right) \geq \gamma\right) \leq \frac{E[V\_i(X\_0)]}{\left(\mathbf{e}^{MT}\gamma\right)\_i} \tag{10}$$

*holds for all* $i \in \{1, \ldots, m\}$*. Here,* $M$ *is an essentially non-negative matrix s.t. all of the eigenvalues of* $\Lambda - M$ *have positive real parts*<sup>5</sup>*. Moreover, if there exists a positive vector* $l \in \mathbb{R}^m$ *such that*

$$V(x) \ge l \quad \text{for all } x \in \mathcal{X}\_u,$$

*then for any* T <sup>≥</sup> T <sup>∗</sup>*,*

$$P\left(\exists t \ge T \colon \tilde{X}\_t \in \mathcal{X}\_u\right) \le \frac{E[V\_i(X\_0)]}{\left(\mathbf{e}^{MT}l\right)\_i} \tag{11}$$

*holds for all* i ∈ {1,...,m}*.* 

*Proof.* By substituting $\gamma$ in Eq. (7) with $\mathrm{e}^{MT}\gamma$, we have that for all $T \ge 0$,

$$\begin{split} \frac{E[V\_i(X\_0)]}{\left(\mathrm{e}^{MT}\gamma\right)\_i} &\geq P\left(\sup\_{t\geq T} V\left(\tilde{X}\_t\right) \geq \sup\_{t\geq T} \left(\mathrm{e}^{-\Lambda t} \mathrm{e}^{MT}\gamma\right)\right) \\ &= P\left(\sup\_{t\geq T} V\left(\tilde{X}\_t\right) \geq \sup\_{t\geq T} \left(\mathrm{e}^{-\Lambda(t-T)} \mathrm{e}^{-(\Lambda-M)T}\gamma\right)\right) \end{split} \tag{12}$$

holds for any γ <sup>∈</sup> <sup>R</sup><sup>m</sup> with γ > **<sup>0</sup>**. Observe that 

$$\begin{aligned} \left| \sup\_{t \ge T} \left( \mathrm{e}^{-\Lambda(t-T)} \mathrm{e}^{-(\Lambda-M)T} \gamma \right) \right|\_{\infty} &= \left| \sup\_{t \ge 0} \left( \mathrm{e}^{-\Lambda t} \mathrm{e}^{-(\Lambda-M)T} \gamma \right) \right|\_{\infty} \\ &\le \left| \sup\_{t \ge 0} \left( \mathrm{e}^{-\Lambda t} \right) \right|\_{\infty} \left| \mathrm{e}^{-(\Lambda-M)T} \gamma \right|\_{\infty}, \end{aligned}$$

<sup>5</sup> Such a matrix $M$ always exists, for instance, $M \mathrel{\widehat{=}} \Lambda/2$.

where |·|<sup>∞</sup> denotes the infinity norm. Moreover, since all of the eigenvalues of Λ <sup>−</sup> M have positive real parts, then by the Lyapunov stability established in the theory of ODEs, we have

$$\lim\_{T \to \infty} \mathrm{e}^{-(\Lambda-M)T} \gamma = \mathbf{0}.$$

There hence exists $T^{\ast}$ s.t. for all $T \ge T^{\ast}$,

$$\sup\_{t \ge T} \left( \mathrm{e}^{-\Lambda(t-T)} \mathrm{e}^{-(\Lambda-M)T} \gamma \right) \le \gamma. \tag{13}$$

By combining Eq. (13) and Eq. (12), we obtain Eq. (10). For Eq. (11), it follows immediately that

$$P\left(\exists t \ge T \colon \tilde{X}\_t \in \mathcal{X}\_u\right) \le P\left(\sup\_{t \ge T} V\left(\tilde{X}\_t\right) \ge l\right) \le \frac{E[V\_i(X\_0)]}{(\mathbf{e}^{MT}l)\_i}.$$

This completes the proof.

*Remark 3.* Proposition 2 establishes the existence of $T^{\ast}$ that suffices to "split off" the tail failure probability. From a computational perspective, finding such a $T^{\ast}$ is algorithmically tractable, as the matrix exponential involved in Eq. (13) is symbolically computable (cf., e.g., [23]).

The following theorem states the main result of this section, that is, for any given constant $\epsilon > 0$, there exists $\tilde{T} \ge 0$ such that the truncated $\tilde{T}$-tail failure probability is bounded by $\epsilon$:

**Theorem 4.** *Suppose the conditions in Propositions 1 and 2 are satisfied. If there exists* $\alpha > 0$ *s.t.* $\forall x \in \mathcal{X}\_0 \colon V\_i(x) \le \alpha$ *holds for some* $i \in \{1, \ldots, m\}$*, then for any* $\epsilon > 0$*, there exists* $\tilde{T} \ge 0$ *such that*

$$P\left(\exists t \ge \tilde{T} \colon \tilde{X}\_t \in \mathcal{X}\_u\right) \le \epsilon.$$

*Proof.* Observe that for Eq. (11) in Proposition 2, the assumption $\forall x \in \mathcal{X}\_0 \colon V\_i(x) \le \alpha$ guarantees an upper bound on the numerator $E[V\_i(X\_0)]$, while the essential non-negativity of $M$ (with all its eigenvalues having positive real parts) ensures that the denominator $(\mathrm{e}^{MT} l)\_i \to +\infty$ as $T \to \infty$. An analogous argument applies to Eq. (9) in Proposition 1. The claim in this theorem then follows immediately.
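In the vector case the same idea can be explored numerically by evaluating the right-hand side of Eq. (11) on a grid of horizons. The sketch below is an illustration under stated assumptions: the matrix `M`, the bounds `alpha` and `l`, and the step size are hypothetical, the helper names are not from the paper, and the search does not compute $T^{\ast}$. Since Eq. (11) is only guaranteed for $T \ge T^{\ast}$, the returned value is a candidate that still has to be compared against $T^{\ast}$.

```python
import math


def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(M, terms=30):
    """exp(M) via scaling and squaring around a truncated power series."""
    n = len(M)
    norm = max(sum(abs(v) for v in row) for row in M)
    squarings = max(0, math.ceil(math.log2(norm))) if norm > 0 else 0
    scaled = [[v / 2 ** squarings for v in row] for row in M]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def tail_bound_vec(T, M, alpha, l):
    """min_i alpha_i / (e^{MT} l)_i, with alpha_i an upper bound on E[V_i(X_0)]."""
    E = mat_exp([[m * T for m in row] for row in M])
    denom = [sum(e * v for e, v in zip(row, l)) for row in E]
    return min(a / d for a, d in zip(alpha, denom))


def find_horizon(eps, M, alpha, l, step=0.5, t_max=50.0):
    """First grid point T with tail_bound_vec(T) <= eps, or None if not found."""
    T = 0.0
    while T <= t_max:
        if tail_bound_vec(T, M, alpha, l) <= eps:
            return T
        T += step
    return None


# Hypothetical data: Lambda = [[2, 0.5], [0, 1]] and M = Lambda / 2.
M = [[1.0, 0.25], [0.0, 0.5]]
alpha, l, eps = [0.01, 0.01], [1.0, 1.0], 1e-6
T_candidate = find_horizon(eps, M, alpha, l)
```

Taking the minimum over the components reflects that Eq. (11) holds for every $i$, so the tightest component can be used.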

#### **3.2 Bounding the Failure Probability over [0, *T*]**

The reduced $T$-safety problem can be solved by existing methods tailored for bounded verification of SDEs, e.g., [32,35]. In what follows, we propose an alternative method leveraging time-dependent polynomial stochastic barrier certificates. Our method requires constraints (on the barrier certificates) of simpler form compared to [35]; meanwhile, it admits a strictly more expressive form of barrier certificates than the approach to unbounded verification in [27,28], thus leading to a failure bound that is theoretically no looser (and usually tighter). A detailed argument will be given at the end of this section.

The following theorem states a sufficient condition, i.e., a collection of constraints on the time-dependent polynomial stochastic barrier certificates H(t, x), under which the failure probability of a stochastic system over a finite time horizon can be explicitly bounded from above.

**Theorem 5.** *Suppose there exists a constant* $\eta > 0$ *and a polynomial function (termed* time-dependent stochastic barrier certificate*)* $H(t, x) \colon \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}$*, satisfying*<sup>6</sup>

$$H(t, x) \ge 0 \quad \text{for } (t, x) \in [0, T] \times \mathcal{X},\tag{14}$$

$$\mathcal{A}H(t,x) \le 0 \quad \text{for } (t,x) \in [0,T] \times \left(\mathcal{X} \setminus \mathcal{X}\_u\right),\tag{15}$$

$$\frac{\partial H}{\partial t} \le 0 \quad \text{for } (t, x) \in [0, T] \times \partial \mathcal{X},\tag{16}$$

$$H(t, x) \ge \eta \quad \text{for } (t, x) \in [0, T] \times \mathcal{X}\_u. \tag{17}$$

*Then,*

$$P\left(\exists t \in [0, T] \colon \tilde{X}\_t \in \mathcal{X}\_u\right) \le \frac{E[H(0, X\_0)]}{\eta}.\tag{18}$$

 

 

*Proof.* Assume in the following that the system evolves within a bounded domain $\mathcal{X}$<sup>7</sup>. Define a stopping time

$$\tau\_u \mathrel{\widehat{=}} \inf \left\{ t \mid \tilde{X}\_t \notin \mathcal{X} \setminus \mathcal{X}\_u \right\},$$

and denote by $X\_t^{(u)} = \left(t \wedge \tau\_u \wedge T, \tilde{X}\_{t \wedge \tau\_u \wedge T}\right)$ the corresponding stopped process, and by $\mathcal{A}^{(u)}$ the corresponding infinitesimal generator. By Eq. (3), we have

$$\begin{split} \mathcal{A}^{(u)}H\left(X\_{t}^{(u)}\right) &= \mathcal{A}^{(u)}H\left(t \wedge \tau\_{u} \wedge T, \tilde{X}\_{t \wedge \tau\_{u} \wedge T}\right) \\ &= \begin{cases} 0 & \text{if } t \ge T \vee t \ge \tau\_{u}(\omega), \\ \mathcal{A}H(t, X\_{t}) \le 0 & \text{if } t < \min\{T, \tau\_{u}(\omega), \tau\_{\mathcal{X}}(\omega)\}, \\ \frac{\partial H}{\partial t}(t, X\_{t}) \le 0 & \text{if } t < \min\{T, \tau\_{u}(\omega)\} \wedge t \ge \tau\_{\mathcal{X}}(\omega). \end{cases} \end{split}$$

By Dynkin's formula, for fixed t, h <sup>∈</sup> [0, T], we have 

$$\begin{aligned} E\left[H\left(X\_{t+h}^{(u)}\right)\mid \mathcal{F}\_h\right] &= E^{X\_h^{(u)}}\left[H\left(X\_{t+h}^{(u)}\right)\right] \\ &= H\left(X\_h^{(u)}\right) + E^{X\_h^{(u)}}\left[\int\_0^t \mathcal{A}^{(u)} H\left(X\_s^{(u)}\right) \, \mathrm{d}s\right] \\ &\leq H\left(X\_h^{(u)}\right). \end{aligned}$$

<sup>6</sup> Condition (16) is to ensure that when $\tilde{X}\_t$ stops at the boundary of $\mathcal{X}$, we still have $\tilde{\mathcal{A}}H(t, x) \le 0$ for $x \in \partial\mathcal{X}$. If $\mathcal{X} = \mathbb{R}^n$, however, this condition can be dropped.

<sup>7</sup> For cases with an unbounded $\mathcal{X}$, the same proof technique of introducing a $\delta$-closed ball as in the proof of Theorem 2 applies.

Thus $H\left(X\_t^{(u)}\right)$ is a non-negative supermartingale. Then by Doob's maximal inequality in Lemma 3, we have

$$\begin{split}P\left(\exists t\in[0,T] \colon \tilde{X}\_t \in \mathcal{X}\_u\right) &= P\left(\exists t \ge 0 \colon \tilde{X}\_{t\wedge\tau\_u \wedge T} \in \mathcal{X}\_u\right) \\ &\le P\left(\exists t \ge 0 \colon H\left(X\_t^{(u)}\right) \ge \eta\right) \\ &\le \frac{E[H(0,X\_0)]}{\eta} .\end{split}$$

This completes the proof.

The following fact is then immediately obvious:

**Corollary 1.** *Suppose the conditions in Theorem 5 hold, and there exists* $\beta > 0$ *such that* $H(0, x) \le \beta$ *for* $x \in \mathcal{X}\_0$*. Then,*

$$P\left(\exists t \in [0, T] \colon \tilde{X}\_t \in \mathcal{X}\_u\right) \le \frac{\beta}{\eta}.$$

*Proof.* This is a direct consequence of Theorem 5.

**Remarks on Potentially Tighter Bound.** A barrier-certificate-based method that can deal with the ∞-safety problem already exists in the literature [27,28]. It is worth highlighting, however, that our bound on the overall failure probability derived from Propositions 1 and 2 and Theorem 5 (with an appropriately chosen $\tilde{T}$) is at least as tight as (and usually tighter than, as can be seen later in the experiments) that in [27,28]. The reasons are twofold: (1) the reduction to a finite-time horizon $\tilde{T}$-safety problem substantially "trims off" verification efforts pertaining to $t > \tilde{T}$; (2) our method for the reduced $\tilde{T}$-safety problem admits time-dependent barrier certificates, which are strictly more expressive than the time-independent ones exploited in [27,28], in the sense that any feasible solution of the latter is also a feasible solution satisfying Theorem 5.

*Remark 4.* Roughly speaking, by setting the diffusion coefficients σ in SDEs to zero, our method applies trivially to ODE dynamics with either a known or an unknown probability distribution over the initial set of states. For the former, we can even obtain a tighter bound on the failure probability, since in this case we do not need to compute a bound on the barrier certificate over all possible initial distributions.

#### **4 Synthesizing Stochastic Barrier Certificates Using SDP**

In this section, we encode the synthesis of the aforementioned exponential and time-dependent stochastic barrier certificates into semidefinite programming [38] optimizations, and thus a solution thereof yields an upper bound on the failure probability over the infinite-time horizon. Specifically, an SDP problem is formulated, for each of the two barrier certificates, to encode the constraints for "being an exponential/time-dependent stochastic barrier certificate", while in the meantime optimizing the tightness of the failure probability bound.

It is worth noting that SDP is a generalization of standard linear programming in which the element-wise non-negativity constraints are replaced by a generalized inequality w.r.t. the cone of positive semidefinite matrices. The generalization preserves *convexity*, so SDP admits polynomial-time algorithms, such as the well-known *interior-point methods*, that can efficiently solve the synthesis problem, albeit numerically. We remark that the numerical computation employed in off-the-shelf SDP solvers and the use of interior-point algorithms may potentially lead to erroneous results and thereby unsoundness in the verification/synthesis results. There have been numerous attempts to validate the results from the solver through a-posteriori numerical verification of the solution. For more details, we refer the readers to [30] and the references therein.

**Exponential Stochastic Barrier Certificate** $V(x)$**.** To encode the synthesis problem into an SDP optimization, we first fix the dimension $m$ together with $\Lambda$ satisfying Proposition 1 or 2 (depending on $m$), and then assume a polynomial template $V^a(x)$ of a certain degree $k$ with unknown parameters $a$, as the barrier certificate to be discovered. It then suffices to solve the following SDP problem<sup>8</sup>:

$$\underset{a, \alpha}{\mathbf{minimize}} \alpha \tag{19}$$

$$\text{subject to} \quad V^a(x) \ge \mathbf{0} \quad \text{for } x \in \mathcal{X} \tag{20}$$

$$\mathcal{A}V^a(x) \le -\Lambda V^a(x) \quad \text{for } x \in \mathcal{X} \tag{21}$$

$$\Lambda V^a(x) \le \mathbf{0} \quad \text{for } x \in \partial \mathcal{X} \tag{22}$$

$$V^a(x) \ge \mathbf{1} \quad \text{for } x \in \mathcal{X}\_u \tag{23}$$

$$V^a(x) \le \alpha \mathbf{1} \quad \text{for } x \in \mathcal{X}\_0 \tag{24}$$

Here, the constraints (20)–(22) encode the definition of an exponential stochastic barrier certificate (cf. Theorem 2), while constraint (23) (resp., (24)) corresponds to the lower (resp., upper) bound of $V(x)$ as in Propositions 1 and 2 (resp., Theorem 4)<sup>9</sup>. Hence, minimizing the upper bound $\alpha$ of (each component of) $V^a(x)$ gives a tight exponentially decreasing bound on the tail failure probability, as claimed in Propositions 1 and 2.

*Remark 5.* If $\Lambda$ is chosen as a non-negative matrix, the combination of conditions (20) and (22) will force $V^a(x) = \mathbf{0}$ for $x \in \partial\mathcal{X}$, whereof the strict equality may be violated due to numerical computations in SDP. In practice, however, this issue can be well addressed by looking for a barrier certificate of the form $g(x)V(x)$, where $g(x)$ satisfies $\partial\mathcal{X} \subseteq \{x \mid g(x) = 0\}$, namely, an overapproximation of the boundary of $\mathcal{X}$.

<sup>8</sup> SDP problems in this paper refer to those that can be readily translated into the standard form of SDP, through, e.g., Stengle's Positivstellensatz [36] and sum-of-squares decomposition [26].

<sup>9</sup> The lower bound $l$ of $V(x)$ in Propositions 1 and 2 is normalized to a vector with all its components no less than 1, based on the observation that, for any $c > 0$, if $V^a(x)$ is a feasible solution then so is $cV^a(x)$.

*Remark 6.* The choice of $m$ is arbitrary, while the choices of $\Lambda$ and $k$ can be heuristic: if $\Lambda\_1$ admits no feasible solution, neither will any $\Lambda\_2 \ge \Lambda\_1$ (point-wise, with all other parameters fixed); similarly, if $k\_1$ admits no feasible solution, neither will any $k\_2 \le k\_1$ (with all other parameters fixed). Therefore, one may decrease $\Lambda$ (say, by a half) or increase $k$ (say, by one) whenever a valid barrier certificate is not found.
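The heuristic in Remark 6 can be sketched as a simple search loop. The sketch below assumes the scalar case ($m = 1$) and a hypothetical oracle `try_synthesize(lam, k)` that wraps the SDP solver; the oracle, the loop order (degrees first, then halving), and the cut-offs `lam_min` and `k_max` are illustrative assumptions, not part of the described implementation.

```python
from typing import Callable, Optional, Tuple


def search_certificate(
    try_synthesize: Callable[[float, int], Optional[object]],
    lam0: float,
    k0: int,
    lam_min: float = 1e-3,
    k_max: int = 8,
) -> Optional[Tuple[float, int, object]]:
    """Search (lam, k) pairs following the monotonicity facts of Remark 6.

    try_synthesize(lam, k) is a hypothetical oracle: it returns a certificate
    object if the SDP for rate lam and template degree k is feasible, else None.
    """
    lam = lam0
    while lam >= lam_min:
        # For a fixed lam, a larger degree k can only help feasibility.
        for k in range(k0, k_max + 1):
            cert = try_synthesize(lam, k)
            if cert is not None:
                return lam, k, cert
        # No degree up to k_max worked: halve lam, which relaxes the constraint.
        lam /= 2.0
    return None


# Toy oracle standing in for the SDP solver (purely illustrative).
def toy_oracle(lam: float, k: int) -> Optional[str]:
    return "certificate" if lam <= 0.3 and k >= 4 else None


result = search_certificate(toy_oracle, lam0=1.0, k0=2)
```

With the toy oracle, the search fails for rates 1.0 and 0.5 at every degree and succeeds at rate 0.25 with degree 4.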

**Time-Dependent Stochastic Barrier Certificate** H(t, x)**.** Given the results established in Sect. 3, the corresponding synthesis problem can be analogously encoded as the following SDP problem:

$$\underset{b, \beta}{\mathbf{minimize}} \quad \beta \tag{25}$$

$$\text{subject to} \quad H^b(t, x) \ge 0 \quad \text{for } (t, x) \in [0, T] \times \mathcal{X} \tag{26}$$

$$\mathcal{A}H^b(t,x) \le 0 \quad \text{for } (t,x) \in [0,T] \times (\mathcal{X} \backslash \mathcal{X}\_u) \tag{27}$$

$$\frac{\partial H^b}{\partial t} \le 0 \quad \text{for } (t, x) \in [0, T] \times \partial \mathcal{X} \tag{28}$$

$$H^b(t, x) \ge 1 \quad \text{for } (t, x) \in [0, T] \times \mathcal{X}\_u \tag{29}$$

$$H^b(0, x) \le \beta \quad \text{for } x \in \mathcal{X}\_0 \tag{30}$$

Similarly, the constraints (26)–(29) encode the definition of a time-dependent stochastic barrier certificate (cf. Theorem 5), while constraint (30) corresponds to the upper bound of $H(t, x)$ as in Corollary 1 (with $\eta$ being normalized to 1, as in constraint (29)). Consequently, minimizing the upper bound $\beta$ of $H^b(t, x)$ produces a tight bound on the failure probability over the reduced finite-time horizon, as stated in Corollary 1.

*Remark 7.* State-of-the-art interior-point methods solve an SDP problem up to an error $\varepsilon$ in time that is polynomial in the program description size (number of variables) and $\log(1/\varepsilon)$. The former is exponential in the degree of $V^a$ and $H^b$, as it corresponds to the number of monomials in the template polynomials.
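To get a feeling for the template sizes mentioned in Remark 7, the following sketch counts the monomials of a dense polynomial template. It relies on the standard fact that there are $\binom{n+k}{k}$ monomials of total degree at most $k$ in $n$ variables; the assumption that the templates are dense (all monomials present) is ours.

```python
from math import comb


def num_monomials(n_vars: int, degree: int) -> int:
    """Number of monomials of total degree <= degree in n_vars variables."""
    return comb(n_vars + degree, degree)


# A template V^a(x) over two state variables, degrees 4 and 8.
size_deg4 = num_monomials(2, 4)
size_deg8 = num_monomials(2, 8)
# A template H^b(t, x) has one extra variable (time t).
size_time_dep = num_monomials(3, 8)
```

For a vector-valued $V^a$ with $m$ components, the number of unknown coefficients would be $m$ times this count under the same assumption.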

#### **5 Implementation and Experimental Results**

To further demonstrate the practical performance of our approach, we have carried out a prototypical implementation in Matlab R2019b, with the toolbox Yalmip [21] and Mosek [2] equipped for formulating and solving the underlying SDP problems. Given an ∞-safety problem as input, our implementation works toward an upper bound on the failure probability over the infinite time horizon, leveraging the reduction to a T-safety problem based on a computed exponentially decreasing bound on the tail failure probability. A collection of benchmark examples from the literature has been evaluated on a 1.8 GHz Intel Core-i7 processor with 8 GB RAM running 64-bit Windows 10. Each of the examples has been successfully tackled within 30 s. In what follows, we demonstrate the applicability of our techniques to SDEs featuring different dimensionalities and nonlinear dynamics, and show particularly that our approach usually produces tighter bounds compared to existing methods.

*Example 1 (Population growth* [25]*).* Consider the stochastic system

$$\operatorname{d}X\_t = b\left(X\_t\right)\operatorname{d}t + \sigma\left(X\_t\right)\operatorname{d}W\_t,$$

which is a stochastic model of population dynamics subject to random fluctuations that, possibly, can be attributed to extraneous or chance factors such as the weather, location, and the general environment. Suppose that the state space is restricted within $\mathcal{X} = \{x \mid x \ge 0\}$ with $b(X\_t) = -X\_t$ and $\sigma(X\_t) = \frac{\sqrt{2}}{2} X\_t$. We instantiate the $\infty$-safety problem as $\mathcal{X}\_0 = \{x \mid x = 1\}$ and $\mathcal{X}\_u = \{x \mid x \ge 2\}$, namely, we expect that the population does not diverge beyond 2.

Letting $\Lambda = 1$ (with $m = 1$) and setting the polynomial template degree of the exponential stochastic barrier certificate $V^a(x)$ to 4, the SDP solver gives

$$\begin{aligned} V^a(x) &= 0.000001474596322 - 0.000044643990040x \\ &+ \ 0.125023372121222x^2 + 0.000000001430428x^3, \end{aligned}$$

which satisfies

$$V^a(x) \ge 1 \quad \text{for } x \in \mathcal{X}\_u \quad \text{and} \quad V^a(x) \le 0.12498 \quad \text{for } x \in \mathcal{X}\_0.$$

Thus by Proposition 1, we obtain the exponentially decreasing bound

$$P\left(\exists t \ge T \colon \tilde{X}\_t \in \mathcal{X}\_u\right) \le \frac{0.12498}{\mathbf{e}^T} \quad \text{for all } T > 0.$$

The user may then choose any $T > 0$ and solve the reduced $T$-safety problem. As depicted on the left of Fig. 1, different choices lead to different bounds on the failure probability. Nevertheless, one can select an appropriate $T$ that yields a considerably tighter overall bound on the failure probability than that produced by the method in [27,28].
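One way to pick $T$ in Example 1 is to fix a tolerance for the tail term and solve $0.12498\,\mathrm{e}^{-T} \le \varepsilon$ for $T$. The sketch below does exactly this; the tolerance-driven selection rule and the function names are illustrative assumptions, since the text leaves the choice of $T$ to the user.

```python
import math


def tail_bound(T: float, c: float = 0.12498, rate: float = 1.0) -> float:
    """Exponentially decreasing tail bound c * exp(-rate * T) from Example 1."""
    return c * math.exp(-rate * T)


def horizon_for_tolerance(eps: float, c: float = 0.12498, rate: float = 1.0) -> float:
    """Smallest non-negative T with c * exp(-rate * T) <= eps."""
    return max(0.0, math.log(c / eps) / rate)


# Horizon after which the tail contributes at most 1e-3 to the overall bound.
T_small_tail = horizon_for_tolerance(1e-3)
```

The overall failure probability bound is then the bound for the reduced $T$-safety problem plus `tail_bound(T)`.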

*Example 2 (Harmonic oscillator* [13]*).* Consider a two-dimensional harmonic oscillator with noisy damping: 

$$\mathrm{d}X\_t = \begin{pmatrix} 0 & \omega \\ -\omega & -k \end{pmatrix} X\_t \,\mathrm{d}t + \begin{pmatrix} 0 & 0 \\ 0 & -\sigma \end{pmatrix} X\_t \,\mathrm{d}W\_t,$$

with constants $\omega = 1$, $k = 7$ and $\sigma = 2$. We instantiate the $\infty$-safety problem as $\mathcal{X} = \mathbb{R}^n$, $\mathcal{X}\_0 = \{(x\_1, x\_2) \mid -1.2 \le x\_1 \le 0.8,\ -0.6 \le x\_2 \le 0.4\}$ and $\mathcal{X}\_u = \{(x\_1, x\_2) \mid \lvert x\_1 \rvert \ge 2\}$.

**Fig. 1.** Different choices of T lead to different bounds on the failure probability (with the time-dependent stochastic barrier certificates of degree 4). Note that '◦'='×' + '' and '*•*' depicts the overall bound on the failure probability produced by the method in [27,28]. 

Letting $\Lambda = \begin{pmatrix} 0.45 & 0.1 \\ 0.1 & 0.45 \end{pmatrix}$ and setting the polynomial template degree of the exponential stochastic barrier certificate $V^a(x)$ to 4, the SDP solver produces a two-dimensional $V^a(x)$ (abbreviated for clear presentation) satisfying

$$V^a(x) \le \begin{pmatrix} 0.19946 \\ 0.19946 \end{pmatrix} \text{ for } x \in \mathcal{X}\_0 \quad \text{and} \quad V^a(x) \ge l = \begin{pmatrix} 1.000237\\ 1.000236 \end{pmatrix} \text{ for } x \in \mathcal{X}\_u.$$

According to the proof of Proposition 2, we set $M = \begin{pmatrix} 0.3 & 0.1 \\ 0.1 & 0.3 \end{pmatrix}$ and aim to find $T^{\ast} \ge 0$ such that for all $T \ge T^{\ast}$,

$$\sup\_{t\geq 0} \left( \mathbf{e}^{-\Lambda t} \mathbf{e}^{-(\Lambda-M)T} \begin{pmatrix} 1.000237\\ 1.000236 \end{pmatrix} \right) \leq \begin{pmatrix} 1.000237\\ 1.000236 \end{pmatrix}.\tag{31}$$

Symbolic computation on the matrix exponential gives

$$\begin{split} \sup\_{t\geq 0} \left( \mathbf{e}^{-\Lambda t} \mathbf{e}^{-(\Lambda - M)T} \begin{pmatrix} 1.000237\\ 1.000236 \end{pmatrix} \right) &= \sup\_{t\geq 0} \begin{pmatrix} \mathbf{e}^{-0.15T} (1.0002365 \mathbf{e}^{-0.55t} + 0.0000005 \mathbf{e}^{-0.35t})\\ \mathbf{e}^{-0.15T} (1.0002365 \mathbf{e}^{-0.55t} - 0.0000005 \mathbf{e}^{-0.35t}) \end{pmatrix} \\ &\leq \begin{pmatrix} 1.0002365 \mathbf{e}^{-0.15T} \\ 1.0002365 \mathbf{e}^{-0.15T} \end{pmatrix}. \end{split}$$

Therefore, $T^{\ast} = 1$ satisfies condition (31). Further by Corollary 2, for any $T \ge T^{\ast} = 1$, we have

$$P\left(\exists t \ge T \colon \tilde{X}\_t \in \mathcal{X}\_u\right) \le \frac{E[V\_1(X\_0)]}{(\text{e}^{MT}l)\_1} \le \frac{0.19946}{0.0000005 \text{e}^{0.2T} + 1.00024 \text{e}^{0.4T}}.$$

Analogously, a comparison with existing methods concerning the tightness of the synthesized failure probability bound (under different choices of T) is shown in the right of Fig. 1.
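The closed forms in Example 2 can be cross-checked numerically. The sketch below exploits the fact that a symmetric matrix of the form $\begin{pmatrix} a & b \\ b & a \end{pmatrix}$ has eigenpairs $(a+b, (1,1))$ and $(a-b, (1,-1))$, so its exponential has an explicit expression; the helper names are ours, and only the Python standard library is used.

```python
import math


def expm_sym(a: float, b: float, t: float):
    """exp(t * [[a, b], [b, a]]) via the eigenpairs (a+b, (1,1)) and (a-b, (1,-1))."""
    p = math.exp((a + b) * t)
    q = math.exp((a - b) * t)
    return [[(p + q) / 2, (p - q) / 2],
            [(p - q) / 2, (p + q) / 2]]


def matvec(A, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]


l = [1.000237, 1.000236]  # lower bound of V^a on the unsafe set


def denominator(T: float) -> float:
    """First component of exp(M*T) * l with M = [[0.3, 0.1], [0.1, 0.3]]."""
    return matvec(expm_sym(0.3, 0.1, T), l)[0]


def tail_bound(T: float, upper_on_x0: float = 0.19946) -> float:
    """Tail bound 0.19946 / (exp(M*T) * l)_1 for T >= T* = 1."""
    return upper_on_x0 / denominator(T)


closed_form_at_1 = 0.0000005 * math.exp(0.2) + 1.0002365 * math.exp(0.4)
```

The numerically computed `denominator(1.0)` agrees with the closed form $0.0000005\,\mathrm{e}^{0.2T} + 1.00024\,\mathrm{e}^{0.4T}$ up to the rounding of the displayed coefficients.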

*Example 3 (Nonlinear drift* [27]*).* We consider in this example a stochastic system involving nonlinear dynamics in its drift coefficient:

$$\begin{aligned} \operatorname{d}x\_1(t) &= x\_2(t) \operatorname{d}t \\ \operatorname{d}x\_2(t) &= \left(-x\_1(t) - x\_2(t) - 0.5x\_1^3(t)\right) \operatorname{d}t + 0.1 \operatorname{d}W\_t. \end{aligned}$$

As in [27], let $\mathcal{X} = \{(x\_1, x\_2) \mid \lvert x\_1 \rvert \le 3,\ \lvert x\_2 \rvert \le 3,\ x\_1^2 + x\_2^2 \ge 0.5^2\}$, $\mathcal{X}\_0 = \{(x\_1, x\_2) \mid (x\_1 + 2)^2 + x\_2^2 \le 0.1^2\}$ and $\mathcal{X}\_u = \{(x\_1, x\_2) \in \mathcal{X} \mid x\_2 \ge 2.25\}$. With $\Lambda = 1.5$ ($m = 1$), we obtain an exponential stochastic barrier certificate $V^a(x)$ of degree 8 satisfying

$$V^a(x) \le 4.00014 \quad \text{for } x \in \mathcal{X}\_0 \quad \text{and} \quad V^a(x) \ge 1.05248 \quad \text{for } x \in \mathcal{X}\_u.$$

Thus by Corollary 1, we have for any $T \ge 0$,

$$P\left(\exists t \ge T \colon \tilde{X}\_t \in \mathcal{X}\_u\right) \le \frac{3.80070}{\mathbf{e}^{1.5T}}.$$

 

 

Setting, for instance, T = 6, we have

$$P\left(\exists t \ge 0 \colon \tilde{X}\_t \in \mathcal{X}\_u\right) \le P\left(\exists t \in [0, 6] \colon \tilde{X}\_t \in \mathcal{X}\_u\right) + \frac{3.80070}{\text{e}^9}.$$

For the reduced $T$-safety problem with $T = 6$, a time-dependent stochastic barrier certificate of degree 8 is synthesized, thereby yielding $P\left(\exists t \in [0, 6] \colon \tilde{X}\_t \in \mathcal{X}\_u\right) \le 0.196124$; together we get

$$P\left(\exists t \ge 0 \colon \tilde{X}\_t \in \mathcal{X}\_u\right) \le 0.196593,$$

which is tighter than 0.265388 produced (on the same machine) by the method in [27] under the same template degree.
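The arithmetic behind the overall bound in Example 3 can be replayed in a few lines. The sketch below only combines the numbers reported above (the finite-horizon bound, the tail coefficient, the rate $\Lambda = 1.5$ and the horizon $T = 6$); the function name is ours.

```python
import math


def overall_bound(finite_horizon: float = 0.196124,
                  tail_coeff: float = 3.80070,
                  rate: float = 1.5,
                  T: float = 6.0) -> float:
    """Finite-horizon bound plus the exponentially decreasing tail bound."""
    return finite_horizon + tail_coeff * math.exp(-rate * T)


# The tail coefficient is (approximately) the ratio of the two bounds on V^a.
ratio = 4.00014 / 1.05248

total = overall_bound()
```

The tail term $3.80070/\mathrm{e}^{9}$ is about $4.7 \times 10^{-4}$, so the overall bound is dominated by the finite-horizon part.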

#### **6 Conclusion**

We proposed a constructive method, based on the synthesis of stochastic barrier certificates, for computing an exponentially decreasing upper bound, if existent, on the tail probability that an SDE system violates a given safety specification. We showed that such an upper bound facilitates a reduction of the verification problem over an unbounded temporal horizon to that over a bounded one. Preliminary experimental results on a set of interesting examples from the literature demonstrated the effectiveness of the reduction and that our method often produces tighter bounds on the failure probability.

For future work, we plan to investigate a possible convergence result in the sense that the derived failure probability bound may converge to the exact one as the degree of the barrier certificates increases. Extending our technique to tackle SDEs with control inputs will also be of interest. Moreover, checking whether a given parametric (polynomial) formula keeps probabilistic invariance plays a central role in the verification of SDEs. Several kinds of sufficient conditions on probabilistic barrier certificates have been proposed, including the ones given in this paper. It is therefore worth investigating a necessary and sufficient condition for checking the probabilistic invariance of a given template, as has been done for ODEs in [19]. Apart from that, we are interested in carrying our results over to the verification of probabilistic programs without conditioning, which can be viewed as discrete-time stochastic dynamics.

## **References**



## **Widest Paths and Global Propagation in Bounded Value Iteration for Stochastic Games**

Kittiphon Phalakarn<sup>1</sup>, Toru Takisaka<sup>2(B)</sup>, Thomas Haas<sup>3</sup>, and Ichiro Hasuo<sup>2,4</sup>

<sup>1</sup> University of Waterloo, Waterloo, Canada, kphalakarn@uwaterloo.ca

<sup>2</sup> National Institute of Informatics, Tokyo, Japan, {takisaka,hasuo}@nii.ac.jp

<sup>3</sup> Technical University of Braunschweig, Braunschweig, Germany, thohaas@tu-bs.de

<sup>4</sup> The Graduate University for Advanced Studies (SOKENDAI), Tokyo, Japan

**Abstract.** Solving *stochastic games* with the reachability objective is a fundamental problem, especially in quantitative verification and synthesis. For this purpose, *bounded value iteration (BVI)* attracts attention as an efficient iterative method. However, BVI's performance is often impeded by costly *end component (EC) computation* that is needed to ensure convergence. Our contribution is a novel BVI algorithm that conducts, in addition to local propagation by the Bellman update that is typical of BVI, *global* propagation of upper bounds that is not hindered by ECs. To conduct global propagation in a computationally tractable manner, we construct a weighted graph and solve the *widest path problem* in it. Our experiments show the algorithm's performance advantage over the previous BVI algorithms that rely on EC computation.

## **1 Introduction**

#### **1.1 Stochastic Game (SG)**

A *stochastic game* [13] is a two-player game played on a graph. In an SG, an action $a$ of a player causes a transition from the current state $s$ to a successor $s'$, with the latter chosen from a prescribed probability distribution $\delta(s, a, s')$. Under the reachability objective, the two players (called *Maximizer* and *Minimizer*) aim to maximize and minimize, respectively, the reachability probability to a designated target state.

Stochastic games are a fundamental construct in theoretical computer science, especially in the analysis of probabilistic systems. Their complexity is interesting in its own right: the problem of threshold reachability (whether Maximizer has a strategy that ensures the reachability probability to be at least a given $p$) is known to be in UP ∩ coUP [19], but no polynomial-time algorithm is known. The practical significance of SGs comes from the number of problems that can be encoded to SGs and then solved. Examples include the following: solving deterministic parity games [8], solving stochastic games with the parity or mean-payoff objective [1], and a variety of probabilistic verification and reactive synthesis problems in different application domains such as cyber-physical systems. See e.g. [25].

© The Author(s) 2020. S. K. Lahiri and C. Wang (Eds.): CAV 2020, LNCS 12225, pp. 349–371, 2020. https://doi.org/10.1007/978-3-030-53291-8\_19

K. Phalakarn: The work was done during K.P.'s internship at National Institute of Informatics, Japan, while he was a student at Chulalongkorn University, Thailand.

SGs are often called 2.5-player games, where probabilistic branching is counted as 0.5 players. They generalize deterministic automata (0-player), Markov chains (MCs, 0.5-player), nondeterministic automata (1-player), Markov decision processes (MDPs, 1.5-player) and (deterministic) games (2-player). Many theoretical considerations on these special cases carry over smoothly to SGs. However, SGs have their peculiarities, too. One example is the treatment of end components in bounded value iteration, as we describe later.

#### **1.2 Value Iteration (VI)**

In an SG, we are interested in the *optimal* reachability probability, that is, the reachability probability when both Maximizer and Minimizer take their optimal strategies. The function that returns these optimal reachability probabilities is called the *value function* V (G) of the SG G; our interest is in computing this value function, desirably constructing optimal strategies for the two players at the same time. For this purpose, two principal families of solution methods are *strategy iteration (SI)* [19] and *value iteration (VI)* [10,13]—the latter is commonly preferred for performance reasons.

The mathematical principle that underpins VI is the characterization of the value function $V(G)$ as the *least fixed point (lfp)* of a function update operator $\mathbb{X}$ called the *Bellman operator*. The Bellman operator $\mathbb{X}$ back-propagates function values by one step, using the average. For the simple case of Markov chains, where a state $s$ moves to a successor $s\_i$ with probability $p\_i$, it is defined by $(\mathbb{X}f)(s) = \sum\_i p\_i \cdot f(s\_i)$, turning a function $f \colon S \to [0, 1]$ (i.e., an assignment of "scores" to states) into $\mathbb{X}f \colon S \to [0, 1]$.

Since $V(G)$ is the lfp $\mu\mathbb{X}$, Kleene's fixed point theorem tells us that the sequence

$$
\perp \le \mathbb{X} \perp \le \mathbb{X}^2 \perp \le \cdots,\tag{1}
$$

where $\bot$ is the least element of the function space $S \to [0, 1]$, converges to $V(G) = \mu\mathbb{X}$. VI consists of the iterative approximation of $V(G)$ via the sequence (1).

An issue from the practical point of view, however, is that $\mathbb{X}^i \bot$ never becomes equal to $V(G)$ in general. Even worse, one cannot know how close the current approximant $\mathbb{X}^i \bot$ is to the desired function $V(G)$ [18]. In summary, VI as an iterative approximation method does not give any precision guarantee.
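The behaviour of the sequence (1) can be observed on a toy Markov chain. The sketch below is a minimal illustration under our own assumptions: the chain, the state names, and the convention that the Bellman operator fixes the target to 1 and the sink to 0 are all hypothetical choices made for the example.

```python
def bellman(delta, f, target="one", sink="zero"):
    """One Bellman update on a Markov chain given as {state: {successor: prob}}."""
    g = {}
    for s, succ in delta.items():
        if s == target:
            g[s] = 1.0          # assumed convention: the target has value 1
        elif s == sink:
            g[s] = 0.0          # assumed convention: the sink has value 0
        else:
            g[s] = sum(p * f[t] for t, p in succ.items())
    return g


def value_iteration(delta, steps):
    """Iterate the Bellman operator from the least element (all zeros)."""
    f = {s: 0.0 for s in delta}
    for _ in range(steps):
        f = bellman(delta, f)
    return f


# Toy chain: from "s", reach the target w.p. 0.5, stay w.p. 0.3, fail w.p. 0.2.
delta = {
    "s": {"one": 0.5, "s": 0.3, "zero": 0.2},
    "one": {"one": 1.0},
    "zero": {"zero": 1.0},
}

approx = value_iteration(delta, 200)["s"]
exact = 0.5 / (1.0 - 0.3)  # solves v = 0.5 + 0.3 * v
```

The iterates increase towards the exact value $5/7$, but nothing in the iteration itself reveals how far the current approximant is from it, which is the lack of a precision guarantee noted above.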

#### **1.3 Bounded Value Iteration (BVI) and End Components**

Bounded value iteration (BVI) has been actively studied as an extension of VI that comes with a precision guarantee [2,3,5,16,18,20,23]. Its core ideas are the following two.

Firstly, BVI computes not only iterative lower bounds $L\_i = \mathbb{X}^i \bot$ for $V(G)$, but also iterative *upper bounds* $U\_i$:

$$
L\_0 \le L\_1 \le \cdots \le V(G) \le \cdots \le U\_1 \le U\_0. \tag{2}
$$

This gives us a precision guarantee: $V(G)$ must lie between the approximants $L\_i$ and $U\_i$.

Secondly, for computing upper bounds $U\_i$, BVI uses the Bellman operator again: $U\_i = \mathbb{X}^i \top$, where $\top$ is the greatest element of the function space $S \to [0, 1]$. This leads to the following approximation sequence that is dual to (1):

$$
\top \ge \mathbb{X} \top \ge \mathbb{X}^2 \top \ge \cdots \tag{3}
$$

The sequence (3) converges to the *greatest fixed point (gfp)* $\nu\mathbb{X}$ of $\mathbb{X}$, which must be above the lfp $V(G) = \mu\mathbb{X}$. Therefore the elements in (3) are all above $V(G)$.

The problem, however, is that the gfp $\nu\mathbb{X}$ is not necessarily the same as $\mu\mathbb{X}$. Therefore the upper bounds $U\_0 \ge U\_1 \ge \cdots$ given by (3) may not converge to $V(G)$. In other words, for a given threshold $\varepsilon > 0$, the bounds in (2) may fail to achieve $U\_i - L\_i \le \varepsilon$, no matter how large $i$ is.

In the literature, the source of this convergence issue has been identified to be *end components (ECs)* in MCs/MDPs/SGs. ECs are much like loops without exits; an example is in Fig. 1, where we use a Markov chain (MC) for simplicity. Any function $f$ that assigns the same value to the states $s\_I$ and $s$ can be a fixed point of the Bellman operator $\mathbb{X}$ (that back-propagates $f$ by averages); therefore, the gfp $\nu\mathbb{X}$ assigns 1 to both $s\_I$ and $s$. In contrast, $(\mu\mathbb{X})(s\_I) = (\mu\mathbb{X})(s) = 0$, which says one never reaches the target $\mathbf{1}$ from $s\_I$ or $s$ (which is obvious).

**Fig. 1.** A Markov chain (MC) for which the naive BVI fails to converge
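The non-convergence can be reproduced with a few lines of code. Since the figure itself is not reproduced here, the sketch below assumes a two-state loop ($s_I$ moves to $s$ and $s$ moves back to $s_I$, both with probability 1) as a stand-in for the chain of Fig. 1; the state names and the convention that the target is fixed to 1 and the sink to 0 are likewise our assumptions.

```python
def bellman(delta, f, target="one", sink="zero"):
    """One Bellman update on a Markov chain given as {state: {successor: prob}}."""
    g = {}
    for s, succ in delta.items():
        if s == target:
            g[s] = 1.0
        elif s == sink:
            g[s] = 0.0
        else:
            g[s] = sum(p * f[t] for t, p in succ.items())
    return g


def naive_bvi(delta, steps):
    """Iterate lower bounds from all-zeros and upper bounds from all-ones."""
    lower = {s: 0.0 for s in delta}
    upper = {s: 1.0 for s in delta}
    for _ in range(steps):
        lower = bellman(delta, lower)
        upper = bellman(delta, upper)
    return lower, upper


# Assumed stand-in for Fig. 1: a loop between sI and s with no exit.
delta = {
    "sI": {"s": 1.0},
    "s": {"sI": 1.0},
    "one": {"one": 1.0},
    "zero": {"zero": 1.0},
}

lower, upper = naive_bvi(delta, 1000)
gap = upper["sI"] - lower["sI"]
```

However many iterations are run, the lower bound at $s_I$ stays at 0 and the upper bound stays at 1, so the gap never shrinks below any threshold $\varepsilon < 1$.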

Most previous works on BVI have focused on the problem of how to deal with ECs. Their solutions are to somehow get rid of ECs. For example, ECs in MDPs are discovered and *collapsed* in [5,18]; ECs in SGs cannot simply be collapsed, and an elaborate method is proposed in the recent work [20] that *deflates* them. This is the context of the current work, and we aim to enhance BVI for SGs.

#### **1.4 Contribution: Global Propagation in BVI with Widest Paths**

The algorithms in [20] seem to be the only BVI algorithms known for SGs. In their performance, however, EC computation often becomes a bottleneck. Our contribution in this paper is a new BVI algorithm for SGs that is *free from the need for EC computation*.

The key idea of our algorithm is *global propagation* for upper bounds, as sketched below. In each iteration for upper bounds $U\_0 \ge U\_1 \ge \cdots$, we conduct *global* propagation, in addition to the *local* propagation in the usual BVI. The latter means the application of $\mathbb{X}$ to $\mathbb{X}^i \top$, leading to $\mathbb{X}^{i+1} \top$; this local propagation, as we previously discussed, gets trapped in end components. In contrast, our global propagation looks at paths from each state $s$ to the target $\mathbf{1}$, ignoring end components. For example, in Fig. 1, our global propagation sees that there is no path from $s\_I$ to the target $\mathbf{1}$, and consequently assigns 0 as an upper bound for the value function $V(G)(s\_I)$.

Such global propagation is easier said than done—in fact, the very advantage of VI is that the *global* quantities (namely reachability probabilities) get computed by iterations of *local* propagation. Conducting global propagation in a computationally tractable manner requires a careful choice of its venue. The solution in this paper is to compute *widest paths* in a suitable (directed) weighted graph.

More specifically, in each iteration where we compute an upper bound Ui, we conduct the following operations.


Due to this information loss, our analysis in $W\_i$ is necessarily approximate. Nevertheless, the benefit of $W\_i$'s simplicity is significant, as in the following step.

– (Global propagation) In the WG $W\_i$, we solve the *widest path problem*. This classic graph-theoretic problem can be solved efficiently, e.g., by the Dijkstra algorithm. The widest path width gives a new upper bound $U\_i$.
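The widest path problem mentioned in the last step asks, for each node, for a path whose smallest edge weight (its width) is as large as possible. The following sketch is a generic Dijkstra-style solver on an arbitrary toy graph; it is not the paper's construction of the weighted graph $W_i$, and the graph, the source width of 1.0, and the function name are illustrative assumptions.

```python
import heapq


def widest_path_widths(graph, source, source_width=1.0):
    """Maximum bottleneck width from source to every reachable node.

    graph maps a node to a list of (successor, edge_width) pairs.
    """
    width = {source: source_width}
    heap = [(-source_width, source)]  # max-heap via negated widths
    done = set()
    while heap:
        neg_w, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in graph.get(u, []):
            cand = min(-neg_w, w)  # width of the path extended by edge (u, v)
            if cand > width.get(v, float("-inf")):
                width[v] = cand
                heapq.heappush(heap, (-cand, v))
    return width


# Toy graph: the direct edge a->b is narrow, the detour via c is wider.
graph = {
    "a": [("b", 0.4), ("c", 0.9)],
    "c": [("b", 0.7)],
    "b": [("t", 0.8)],
}

widths = widest_path_widths(graph, "a")
```

As in the shortest-path version of Dijkstra's algorithm, a node is finalized when it is popped; the argument works because extending a path can never increase its width.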

We prove the correctness of our algorithm: soundness ($V(G) \le U\_i$), and convergence ($U\_i \to V(G)$ as $i \to \infty$). That the upper bounds decrease ($U\_0 \ge U\_1 \ge \cdots$) will be obvious by construction. These correctness proofs are technically nontrivial, combining combinatorial, graph-theoretic, and analytic arguments.

We have also implemented our algorithm. Our experiments compare its performance to the algorithms from [20] (the original one and its learning-based variation). The results show a consistent performance advantage: depending on the SG, our algorithm ranges from comparable to dozens of times faster. The advantage is especially pronounced in SGs with many ECs.

#### **1.5 Related Works**

VI and BVI have been pursued principally for MDPs. The only work we know that deals with SGs is [20]—with the exception of [26] that works in a restricted setting where every end component belongs exclusively to either player. The work closest to ours is therefore [20], in that we solve the same problem.

For MDPs, the idea of BVI is first introduced in [23]; they worked in a limited setting where ECs do not cause the convergence issue. Its extension to general MDPs with the reachability objective is presented in [5,18], where ECs are computed and then collapsed. BVI is studied under different names in these works: *bounded real time dynamic programming* [5,23] and *interval iteration* [18]. The work [20] is an extension of this line of work from MDPs to SGs.

The work [20] has seen a few extensions to more advanced settings: black-box settings [3], concurrent reachability [16], and generalized reachability games [2].

Most BVI algorithms involve EC computation (although ours does not). The EC algorithm in [14,15] is used in [18,20]; more recent algorithms include [7,9].

#### **1.6 Organization**

In Sect. 2 we present some preliminaries. In Sect. 3 we review VI and BVI with an emphasis on the role of Kleene's fixed point theorem. This paves the way to Sect. 4 where we present our algorithm. We do so in three steps, and prove the correctness—soundness and convergence—in the end. Experiment results are shown in Sect. 5.

#### **2 Preliminaries**

We fix some basic notations. Let $X$ be a set. We let $X^{\ast}$ denote the set of finite sequences over $X$, that is, $X^{\ast} = \bigcup\_{i \in \mathbb{N}} X^i$. We let $X^{+} = X^{\ast} \setminus \{\varepsilon\}$, where $\varepsilon$ denotes the empty sequence (of length 0). The set of infinite sequences over $X$ is denoted by $X^{\omega}$. The set of functions from $X$ to $Y$ is denoted by $X \to Y$.

#### **2.1 Stochastic Games**

In a stochastic game, two players (*Maximizer* and *Minimizer* ) play against each other. The goals of the two players are to maximize and minimize the *value function*, respectively. Many different definitions are possible for value functions. In this paper (as well as all the works on (bounded) value iteration), we focus on the *reachability objective*, in which case a value function is defined by the reachability probability to a designated target state **1**.

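**Definition 2.1 (stochastic game (SG)).** A stochastic game (SG) is a tuple G = (S, S<sub>□</sub>, S<sub>◇</sub>, s<sub>I</sub>, **1**, **0**, A, Av, δ) where

• S is a finite set of states, partitioned into the set S<sub>□</sub> of Maximizer's states and the set S<sub>◇</sub> of Minimizer's states;

• s<sub>I</sub> ∈ S is an initial state, **1** ∈ S<sub>□</sub> is a target state, and **0** ∈ S<sub>◇</sub> is a sink state;

• A is a finite set of actions;

• Av : S → 2<sup>A</sup> assigns to each state s the set Av(s) of actions available at s;

• δ : S × A × S → [0, 1] is a transition function, where δ(s, a, s′) is the probability of moving to s′ when the action a is taken at s, with Σ<sub>s′∈S</sub> δ(s, a, s′) = 1 for each s ∈ S and a ∈ Av(s).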

We assume that each of **1** and **0** allows only one action that leads to a self-loop with probability 1. Moreover, for theoretical convenience, we assume that all SGs are non-blocking. That is, Av(s) ≠ ∅ for each s ∈ S.

We introduce some notations: post(s, a) = {s′ | δ(s, a, s′) > 0}, and for S′ ⊆ S, we let S′<sub>□</sub> = S′ ∩ S<sub>□</sub> and S′<sub>◇</sub> = S′ ∩ S<sub>◇</sub>.

**Definition 2.2 (Markov decision process (MDP), Markov chain (MC)).** An SG such that S<sub>□</sub> = S \ {**0**} (i.e. Minimizer is absent) is called a *Markov decision process (MDP)*. We often omit the second and third components for MDPs, writing M = (S, s<sub>I</sub>, **1**, **0**, A, Av, δ).

An SG such that |Av(s)| = 1 for each s ∈ S (both Maximizer and Minimizer are absent) is called a *Markov chain (MC)*. It is also denoted simply by a tuple G = (S, s<sub>I</sub>, **1**, **0**, δ) where its transition function is of the type δ : S × S → [0, 1].

Every notion for SGs that appears below applies to MDPs and MCs, too.

**Example 2.3.** Figure 2 presents an example of an SG. At the state s<sub>1</sub> of Minimizer, two actions α and β are in Av(s<sub>1</sub>). If Minimizer chooses α, the next state is s<sub>2</sub> with probability δ(s<sub>1</sub>, α, s<sub>2</sub>) = 1. If Minimizer instead chooses β, the next state is **1** with probability δ(s<sub>1</sub>, β, **1**) = 0.8 or **0** with probability δ(s<sub>1</sub>, β, **0**) = 0.2.

Maximizer's goal is to reach **1** as often as possible by choosing suitable actions. Minimizer's goal is to avoid reaching **1**—this can be achieved, for example but not exclusively, by reaching **0**.

Both players choose their actions according to their *strategies*. It is well-known [13] that *positional* (also called *memoryless*) and *deterministic* (also called *pure*) strategies are complete for finite SGs with the reachability objective.

**Fig. 2.** A stochastic game (SG), an example

**Definition 2.4 (strategy, path).** Let G be the SG in Definition 2.1. A *strategy* for Maximizer in G is a function τ : S<sub>□</sub> → A such that τ(s) ∈ Av(s) for each s ∈ S<sub>□</sub>. A *strategy* for Minimizer is defined similarly. The set of Maximizer's strategies in G is denoted by str<sup>G</sup><sub>□</sub>; that of Minimizer's is denoted by str<sup>G</sup><sub>◇</sub>.

Strategies τ ∈ str<sup>G</sup><sub>□</sub> and σ ∈ str<sup>G</sup><sub>◇</sub> in G turn the game G into a Markov chain, which is denoted by G<sup>τ,σ</sup>. Similarly, a strategy τ for Maximizer (who is the only player) in an MDP M induces an MC, denoted by M<sup>τ</sup>.

An *infinite path* in G is a sequence s<sub>0</sub>a<sub>0</sub>s<sub>1</sub>a<sub>1</sub>s<sub>2</sub>a<sub>2</sub> ... ∈ (S × A)<sup>ω</sup> such that for all i ∈ ℕ, a<sub>i</sub> ∈ Av(s<sub>i</sub>) and s<sub>i+1</sub> ∈ post(s<sub>i</sub>, a<sub>i</sub>). A prefix s<sub>0</sub>a<sub>0</sub>s<sub>1</sub> ... s<sub>k</sub> of an infinite path ending with a state is called a *finite path*. If G is an MC, then we omit actions in a path and write s<sub>0</sub>s<sub>1</sub>s<sub>2</sub> ... or s<sub>0</sub>s<sub>1</sub> ... s<sub>k</sub>.

Given a game G and strategies τ, σ for the two players, the induced MC G<sup>τ,σ</sup> assigns to each state s ∈ S a probability distribution ℙ<sup>τ,σ</sup><sub>s</sub>. The distribution is with respect to the standard measurable structure of S<sup>ω</sup>; see, e.g., [4, Chap. 10]. For each measurable subset X ⊆ S<sup>ω</sup>, ℙ<sup>τ,σ</sup><sub>s</sub>(X) is the probability with which G<sup>τ,σ</sup>, starting from the state s, produces an infinite path π that belongs to X.

It is well-known that all the LTL properties are measurable in S<sup>ω</sup>. In the current setting with the reachability objective, we are interested in the probability of eventually reaching **1**, denoted by ℙ<sup>τ,σ</sup><sub>s</sub>(♦**1**).

**Definition 2.5 (value function V(G)).** Let G be the SG in Definition 2.1. The *value function* V(G) of G is defined by

$$V(\mathcal{G})(s) = \max\_{\tau \in \operatorname{str}\_{\square}^{\mathcal{G}}} \min\_{\sigma \in \operatorname{str}\_{\diamondsuit}^{\mathcal{G}}} \mathbb{P}\_{s}^{\tau, \sigma}(\diamondsuit \mathbf{1}) = \min\_{\sigma \in \operatorname{str}\_{\diamondsuit}^{\mathcal{G}}} \max\_{\tau \in \operatorname{str}\_{\square}^{\mathcal{G}}} \mathbb{P}\_{s}^{\tau, \sigma}(\diamondsuit \mathbf{1}),$$

where the last equality is shown in [13].

We say a strategy τ of Maximizer's is *optimal* if V(G)(s) = min<sub>σ</sub> ℙ<sup>τ,σ</sup><sub>s</sub>(♦**1**) for each s ∈ S; similarly, we say a strategy σ of Minimizer's is *optimal* if V(G)(s) = max<sub>τ</sub> ℙ<sup>τ,σ</sup><sub>s</sub>(♦**1**) for each s ∈ S.

We write V for V (G) when the dependence on G is clear from the context.

The set of states with a non-zero value is denoted by S<sub>♦**1**</sub>. That is, S<sub>♦**1**</sub> = {s ∈ S | V(G)(s) > 0}.

**Example 2.6.** Consider the SG G from Fig. 2. At s<sub>2</sub>, Maximizer's action should be α. Hence, V(G)(s<sub>2</sub>) = 0.9. At s<sub>1</sub>, if Minimizer chooses α, then the probability of reaching **1** will be 0.9 by V(G)(s<sub>2</sub>). Thus, Minimizer should choose β at s<sub>1</sub>, which yields V(G)(s<sub>1</sub>) = 0.8. Finally, at s<sub>I</sub>, γ is the best choice, since Maximizer can choose this action repeatedly until the play gets to s<sub>2</sub>. We have V(G)(s<sub>I</sub>) = 0.9.

#### **2.2 The Widest Path Problem**

**Definition 2.7 (weighted graph (WG)).** A (directed) *weighted graph* is a triple W = (V,E,w) of a finite set V of *vertices*, a set E ⊆ V × V of *edges*, and a *weight function* w: E → [0, 1] where [0, 1] is the unit interval.

A (finite) *path* in a WG is defined in the usual graph-theoretic way.

In the widest path problem, an edge weight w(v, v′) is thought of as its capacity, and the capacity of a path is determined by its bottleneck. The problem asks for a path with the greatest capacity. In this paper, we use the following *all-source single-destination* version of the problem.

**Definition 2.8 (the widest path problem (WPP)).** A (finite) *path* in W = (V, E, w) is a sequence v<sub>0</sub>v<sub>1</sub> ... v<sub>n</sub> of vertices such that (v<sub>i</sub>, v<sub>i+1</sub>) ∈ E for each i ∈ [0, n − 1]. The *width* of a path v<sub>0</sub>v<sub>1</sub> ... v<sub>n</sub> is given by min<sub>i∈[0,n−1]</sub> w(v<sub>i</sub>, v<sub>i+1</sub>). The *widest path problem* is the following problem.

**Given:** a WG W = (V, E, w) and a target vertex v<sub>t</sub> ∈ V.

**Answer:** for each v ∈ V, the widest width of the paths from v to v<sub>t</sub>, that is,

$$\max\_{n \in \mathbb{N},\ v = v\_0, v\_1, \ldots, v\_n = v\_{\mathrm{t}}} \ \min\_{i \in [0, n-1]} w(v\_i, v\_{i+1}).$$

We let WPW(W, v<sub>t</sub>) denote a function that solves this problem, and let WPath(W, v<sub>t</sub>) denote a function that assigns to each v ∈ V a widest path to v<sub>t</sub>. Furthermore, we assume the following property of WPath: if WPath(W, v<sub>t</sub>)(v<sub>0</sub>) = v<sub>0</sub>v<sub>1</sub> ... v<sub>k</sub>v<sub>t</sub>, then WPath(W, v<sub>t</sub>)(v<sub>i</sub>) = v<sub>i</sub>v<sub>i+1</sub> ... v<sub>k</sub>v<sub>t</sub> for each i ∈ [0, k].

Efficient algorithms are known for WPW(W, vt). An example is the Dijkstra search algorithm with Fibonacci heaps [17]; it is originally for the single-source all-destination version but its adaptation is easy. The algorithm runs in time O(|E| + |V | log |V |). It returns a widest path in addition to its width, too, computing the function WPath(W, vt) with the property required in the above.
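As a rough illustration of the all-source single-destination version, the following Python sketch runs a Dijkstra-style search backwards from the target. It uses the standard-library binary heap (`heapq`) instead of Fibonacci heaps, so its running time is O(|E| log |V|) rather than the bound quoted above. The function name `widest_paths`, the dictionary encoding of the graph, the example graph, and the convention that the target itself has width 1 are assumptions made for this example only.

```python
import heapq

def widest_paths(vertices, weight, target):
    """Return (width, succ) for the all-source single-destination problem.

    weight maps an edge (u, v) to its capacity in [0, 1].
    width[v] is the largest bottleneck over paths from v to target
    (0.0 if no path with positive width exists); succ[v] is the next
    vertex on one such path, so suffixes of chosen paths are chosen paths.
    """
    incoming = {v: [] for v in vertices}      # the search runs backwards
    for (u, v), w in weight.items():
        incoming[v].append((u, w))

    width = {v: 0.0 for v in vertices}
    width[target] = 1.0                       # assumed convention for the empty path
    succ = {}
    heap = [(-1.0, target)]                   # max-heap via negated widths
    done = set()
    while heap:
        neg_width, v = heapq.heappop(heap)
        if v in done:
            continue                          # stale heap entry
        done.add(v)
        for u, w in incoming[v]:
            candidate = min(-neg_width, w)    # bottleneck of u -> v -> ... -> target
            if u not in done and candidate > width[u]:
                width[u] = candidate
                succ[u] = v
                heapq.heappush(heap, (-candidate, u))
    return width, succ

# Hypothetical example graph (not taken from the paper).
example_weight = {("a", "b"): 0.9, ("b", "t"): 0.5, ("a", "t"): 0.4, ("c", "a"): 0.7}
width, succ = widest_paths(["a", "b", "c", "d", "t"], example_weight, "t")
```

In this example the direct edge from `a` to `t` has capacity 0.4, while the detour through `b` has bottleneck 0.5, so the search records `b` as the successor of `a`. The isolated vertex `d` keeps width 0.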

## **3 (Bounded) Value Iteration**

#### **3.1 Bellman Operator and Value Iteration**

The following construct—used for "local propagation" in computing the value function—is central to formal analysis of probabilistic systems and games.

**Definition 3.1 (Bellman Operator).** Let G = (S, S<sub>□</sub>, S<sub>◇</sub>, s<sub>I</sub>, **1**, **0**, A, Av, δ) be a stochastic game. For each state s ∈ S, an available action a ∈ Av(s), and f : S → [0, 1], we define a function X<sub>a</sub>f : S → [0, 1] by the following.

$$(\mathbb{X}\_a f)(s) = \begin{cases} 1 & \text{if } s = \mathbf{1}, \\ 0 & \text{if } s = \mathbf{0}, \\ \sum\_{s' \in S} \delta(s, a, s') \cdot f(s') & \text{if } s \neq \mathbf{0}, \mathbf{1}. \end{cases}$$

These functions are used in the following definition of the *Bellman operator* X : (S → [0, 1]) → (S → [0, 1]) over G:

$$(\mathbb{X}f)(s) = \begin{cases} \max\_{a \in \text{Av}(s)} (\mathbb{X}\_a f)(s) & \text{if } s \in S\_{\square} \text{ is a Maximizer state,} \\ \min\_{a \in \text{Av}(s)} (\mathbb{X}\_a f)(s) & \text{if } s \in S\_{\diamondsuit} \text{ is a Minimizer state.} \end{cases}$$

The function space S → [0, 1] inherits the usual order ≤ between real numbers in the unit interval [0, 1], that is, f ≤ g if f(s) ≤ g(s) for each s ∈ S. The Bellman operator X over S → [0, 1] is clearly monotone; it is easily seen to preserve max and min, using the fact that the state space S of an SG is finite. Therefore we obtain the following, as consequences of Kleene's fixed point theorem.

**Lemma 3.2.** *Assume the setting of Definition 3.1.*

*1. The Bellman operator X has the greatest fixed point (gfp) νX : S → [0, 1]. It is obtained as the limit of the descending ω-chain*

$$\top \ge \mathbb{X}\top \ge \mathbb{X}^2\top \ge \cdots, \tag{3}$$

*where ⊤ is the greatest element of S → [0, 1] (i.e., ⊤(s) = 1 for each s ∈ S). In other words, we have (νX)(s) = inf<sub>i∈ℕ</sub> (X<sup>i</sup>⊤)(s) for each s ∈ S.*


*2. Symmetrically, X has the least fixed point (lfp) μX : S → [0, 1], obtained as the limit of the ascending chain*

$$
\bot \le \mathbb{X} \bot \le \mathbb{X}^2 \bot \le \cdots,\tag{4}
$$

*where ⊥(s) = 0 for each s ∈ S. That is, we have (μX)(s) = sup<sub>i∈ℕ</sub> (X<sup>i</sup>⊥)(s) for each s ∈ S.*

The following characterization is fundamental. See, e.g., [10].

**Theorem 3.3.** *Let G be a stochastic game. The value function V = V(G) (Definition 2.5) coincides with the least fixed point μX.*

The fact that V(G) is the least fixed point of X implies the following: a strategy τ of Maximizer is optimal if and only if (X<sub>τ(s)</sub>V(G))(s) = V(G)(s) holds for each s ∈ S<sub>□</sub>; similarly for Minimizer. We say a ∈ Av(s) is *optimal* at s if (X<sub>a</sub>V(G))(s) = V(G)(s) holds; otherwise a is *suboptimal*.

Lemma 3.2.2 & Theorem 3.3 suggest iterative *under-*approximation of V(G) by ⊥ ≤ X⊥ ≤ X<sup>2</sup>⊥ ≤ ···. This is the principle of *value iteration* (VI); see Algorithm 1.
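The under-approximating chain can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary encoding of a game, the names `bellman` and `value_iteration`, and the one-state example (which is not the game of Fig. 2) are all hypothetical, and the loop runs for a fixed number of steps instead of using a stopping criterion.

```python
def bellman(game, f):
    """Apply the Bellman operator of Definition 3.1 once to f: S -> [0, 1]."""
    result = {}
    for s, actions in game["delta"].items():
        if s == game["target"]:
            result[s] = 1.0
        elif s == game["sink"]:
            result[s] = 0.0
        else:
            # (X_a f)(s) for every available action a at s
            values = [sum(p * f[t] for t, p in dist.items()) for dist in actions.values()]
            result[s] = max(values) if game["owner"][s] == "max" else min(values)
    return result

def value_iteration(game, steps):
    """Return L_steps, starting from the least element (constantly 0)."""
    lower = {s: 0.0 for s in game["delta"]}
    for _ in range(steps):
        lower = bellman(game, lower)
    return lower

# Hypothetical example: the only action at sI loops back with probability 0.5,
# reaches the target with probability 0.4 and the sink with probability 0.1.
toy = {
    "target": "1", "sink": "0",
    "owner": {"sI": "max", "1": "max", "0": "min"},
    "delta": {
        "sI": {"g": {"sI": 0.5, "1": 0.4, "0": 0.1}},
        "1": {"loop": {"1": 1.0}},
        "0": {"loop": {"0": 1.0}},
    },
}
L10 = value_iteration(toy, 10)
```

In this toy example the value at `sI` solves v = 0.5·v + 0.4, that is, v = 0.8. The iterates satisfy L<sub>k</sub>(sI) = 0.8·(1 − 0.5<sup>k−1</sup>) for k ≥ 1, so they approach 0.8 from below without reaching it.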

**Example 3.4.** The values L<sub>i</sub> computed by Algorithm 1, for the SG in Fig. 2, are shown in the following table. The values at **0** and **1** are omitted.


L<sub>i</sub>(s<sub>I</sub>) converges to, but is never equal to, V(G)(s<sub>I</sub>). The convergence rate can be arbitrarily slow: for any ε ∈ (0, 1) and k ∈ ℕ there is an SG G and a state s such that V(G)(s) − L<sub>k</sub>(s) > ε. One sees this by modifying Fig. 2 with δ(s<sub>I</sub>, γ, s<sub>2</sub>) = ε′ and δ(s<sub>I</sub>, γ, s<sub>I</sub>) = 1 − ε′, where ε′ > 0 is an arbitrarily small positive constant.

**Algorithm 2:** Bounded value iteration (BVI) for a stochastic game G = (S, S<sub>□</sub>, S<sub>◇</sub>, s<sub>I</sub>, **1**, **0**, A, Av, δ) and a stopping threshold ε > 0 (a naive prototype that suffers from end components)


There is no known stopping criterion for VI (Algorithm 1) with a precision guarantee, besides the one in [10] that is too pessimistic to be practical. The one shown in Line 3 ("little progress") is a commonly used heuristic, but it is known to lead to arbitrarily wrong results [18].

#### **3.2 Bounded Value Iteration**

When we turn back to Lemma 3.2, Lemma 3.2.1 suggests another iterative approximation, namely *over-*approximation of the value function V by ⊤ ≥ X⊤ ≥ X<sup>2</sup>⊤ ≥ ···. The chain converges to the gfp νX, which is necessarily above the lfp μX. This is the principle that underlies *bounded value iteration* (BVI); see Algorithm 2 for its naive prototype. BVI has been actively studied in the literature [2,3,5,16,18,20,23], sometimes under different names (such as *bounded real time dynamic programming* [5,23] or *interval iteration* [18]).

BVI comes with a precision guarantee: since V(G) lies between L<sub>i</sub> and U<sub>i</sub> (whose gap is at most ε), the approximation L<sub>i</sub> is at most ε apart from V(G).

The catch, however, is that μX and νX may not coincide, and therefore the overapproximation might not converge to the desired μX. This means Algorithm 2 might not terminate. This is the main technical challenge addressed in the previous works on BVI, including [5,20].

In those works, the source of the failure of convergence is identified to be *end components*. See the (very simple) Markov chain in Fig. 1, where the reachability probability from s<sub>I</sub> to **1** is clearly 0. However, due to the loop between s<sub>I</sub> and s, the values U<sub>i</sub>(s<sub>I</sub>) and U<sub>i</sub>(s), which get updated to the average of U<sub>i−1</sub> at successors, are easily seen to remain 1. Roughly speaking, end components generalize such loops defined in MDPs and SGs (the definitions are graph-theoretic, in terms of strongly connected components). End components cause non-convergence of naive BVI, essentially for the reason we just described.
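The non-convergence can be reproduced with a small Python sketch. The chain below is a hypothetical two-state loop in the spirit of Fig. 1 (its exact transition probabilities are an assumption, as are the names `step` and `loop_chain`): the states `sI` and `s` only lead to each other, so the target is never reached.

```python
def step(chain, f):
    """One Bellman update on a Markov chain given as state -> {successor: probability}."""
    out = {}
    for s, dist in chain["delta"].items():
        if s == chain["target"]:
            out[s] = 1.0
        elif s == chain["sink"]:
            out[s] = 0.0
        else:
            out[s] = sum(p * f[t] for t, p in dist.items())
    return out

# Hypothetical chain: sI and s form a loop that never leaves to "1" or "0".
loop_chain = {
    "target": "1", "sink": "0",
    "delta": {"sI": {"s": 1.0}, "s": {"sI": 1.0}, "1": {"1": 1.0}, "0": {"0": 1.0}},
}
lower = {s: 0.0 for s in loop_chain["delta"]}   # bottom element
upper = {s: 1.0 for s in loop_chain["delta"]}   # top element
for _ in range(1000):
    lower, upper = step(loop_chain, lower), step(loop_chain, upper)
gap = upper["sI"] - lower["sI"]
```

After any number of updates the lower bound at `sI` is still 0 and the upper bound is still 1, so a stopping test of the form "gap at most ε" is never satisfied for ε < 1.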

The solutions previously proposed to this challenge have been to "get rid of end components." For MDPs (1.5 players), the *collapsing* technique detects end components and collapses each of them into a single state [5,18]. After doing so, the Bellman operator X has a unique fixed point (therefore μX = νX), assuring convergence of BVI (Algorithm 2). In the case of SGs (2.5 players), end components cannot simply be collapsed into single states—they must be handled carefully, taking the "best exits" into account. This is the key idea of the *deflating* technique proposed for SGs in [20].

### **4 Our Algorithm: Bounded Value Iteration with Upper Bounds Given by Widest Paths**

In our algorithm, as in other BVI algorithms, we iteratively construct upper and lower bounds U<sub>i</sub>, L<sub>i</sub> of the value function V(G) at the same time. See (2). In updating U<sub>i</sub>, however, we go beyond the *local* propagation by the Bellman update and conduct *global* propagation, too. This frees us from the curse of end components. The outline of our algorithm is as follows.

• **(Player reduction)** Firstly, we turn the SG G into an MDP M<sub>i</sub> by fixing Minimizer's strategy to a specific one σ<sub>i</sub>.

Any choice of σ<sub>i</sub> would do for the sake of *soundness* (that is, V(G) ≤ U<sub>i</sub>). However, for *convergence* (that is, U<sub>i</sub> → V(G) as i → ∞), it is important to have σ<sub>0</sub>, σ<sub>1</sub>, ... eventually converge to Minimizer's optimal strategy σ. Therefore we let L<sub>i</sub>, the current lower estimate of V(G), induce σ<sub>i</sub>. Recall that L<sub>i</sub> converges to V(G) (Lemma 3.2.2, Theorem 3.3).

• **(Preprocessing by local propagation)** Secondly, we turn the MDP M<sub>i</sub> into a weighted graph (WG) W<sub>i</sub>.

The construction here is *local* propagation of the previous upper bound U<sub>i−1</sub>, from each state s to its predecessors in M<sub>i</sub>. This is much like an application of the Bellman operator X.

• **(Global propagation by widest paths)** Finally, we solve the widest path problem in the WG W<sub>i</sub>, from each state s to the target state **1**. The maximum path width from s to **1** is used as the value of the upper bound U<sub>i</sub>(s).

This way, we conduct *global* propagation of upper bounds, for which end components pose no threats. Our global propagation is still computationally feasible, thanks to the preprocessing in the previous step that turns a problem on an MDP into one on a WG (modulo some sound approximation).

The use of *global* propagation for upper bounds is a distinguishing feature of our algorithm. This is unlike other BVI algorithms (such as [5,20]) where upper-bound propagation is only local and stepwise. The latter gets trapped when it encounters an EC, so that some trick such as collapsing [5] or deflating [20] is needed, while our global propagation looks directly at the target state **1**.

The above outline is presented as pseudocode in Algorithm 3. We describe the three steps in the rest of the section. In particular, we exhibit the definitions of M<sub>PlRd</sub> and W<sub>LcPg</sub> (WPW has been defined and discussed in Definition 2.8), providing some of their properties towards the correctness proof of the algorithm (Sect. 4.3).


**Algorithm 3:** Our BVI algorithm via widest paths. Here G = (S, S<sub>□</sub>, S<sub>◇</sub>, s<sub>I</sub>, **1**, **0**, A, Av, δ) is an SG; ε > 0 is a stopping threshold.
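The three steps of the outline can be put together in a Python sketch. It is a reading of the outline and of Definitions 4.3 and 4.7, not a transcription of the pseudocode: the dictionary encoding, all function names, the `max_iter` safeguard, the exact floating-point comparison used to detect minimal actions, and the example game (which contains an end component, the `stay` self-loop at `sI`) are assumptions made for illustration. The widest path step uses a binary heap.

```python
import heapq

def x_a(game, s, a, f):
    """(X_a f)(s) as in Definition 3.1."""
    if s == game["target"]:
        return 1.0
    if s == game["sink"]:
        return 0.0
    return sum(p * f[t] for t, p in game["delta"][s][a].items())

def bellman(game, f):
    pick = {"max": max, "min": min}
    return {s: pick[game["owner"][s]](x_a(game, s, a, f) for a in acts)
            for s, acts in game["delta"].items()}

def player_reduction(game, f):
    """Av_f: Minimizer keeps only actions that are minimal with respect to f."""
    av = {}
    for s, acts in game["delta"].items():
        if game["owner"][s] == "max":
            av[s] = set(acts)
        else:
            best = min(x_a(game, s, a, f) for a in acts)
            av[s] = {a for a in acts if x_a(game, s, a, f) == best}
    return av

def local_propagation(game, av, f):
    """Edge weights: w(s, s') is the largest (X_a f)(s) over allowed a with s' in post(s, a)."""
    w = {}
    for s in game["delta"]:
        for a in av[s]:
            value = x_a(game, s, a, f)
            for t, p in game["delta"][s][a].items():
                if p > 0:
                    w[(s, t)] = max(w.get((s, t), 0.0), value)
    return w

def widest_widths(states, w, target):
    """Widest path width from every state to target (backward Dijkstra-style search)."""
    incoming = {s: [] for s in states}
    for (u, v), cap in w.items():
        incoming[v].append((u, cap))
    width = {s: 0.0 for s in states}
    width[target] = 1.0
    heap, done = [(-1.0, target)], set()
    while heap:
        neg, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        for u, cap in incoming[v]:
            cand = min(-neg, cap)
            if u not in done and cand > width[u]:
                width[u] = cand
                heapq.heappush(heap, (-cand, u))
    return width

def bvi_widest_path(game, eps, max_iter=100_000):
    states = list(game["delta"])
    lower = {s: 0.0 for s in states}
    upper = {s: 1.0 for s in states}
    for _ in range(max_iter):
        if upper[game["init"]] - lower[game["init"]] <= eps:
            break
        lower = bellman(game, lower)                        # lower bound L_i
        av = player_reduction(game, lower)                  # MDP M_i
        w = local_propagation(game, av, upper)              # WG W_i built from U_{i-1}
        widths = widest_widths(states, w, game["target"])   # global propagation
        upper = {s: min(upper[s], widths[s]) for s in states}
    return lower[game["init"]], upper[game["init"]]

# Hypothetical game with an end component (the "stay" self-loop at sI).
game = {
    "init": "sI", "target": "1", "sink": "0",
    "owner": {"sI": "max", "s1": "min", "1": "max", "0": "min"},
    "delta": {
        "sI": {"stay": {"sI": 1.0}, "go": {"s1": 1.0}},
        "s1": {"a": {"1": 0.8, "0": 0.2}, "b": {"sI": 0.5, "1": 0.5}},
        "1": {"loop": {"1": 1.0}},
        "0": {"loop": {"0": 1.0}},
    },
}
low, high = bvi_widest_path(game, 1e-6)
```

In this example the value at `sI` is 0.8: Minimizer prefers `a` at `s1`, since `b` would give 0.5·0.8 + 0.5 = 0.9. A purely local update of the upper bound would keep the value 1 at `sI` because of the `stay` loop, whereas the widest path from `sI` to `1` has width 0.8 once the lower bound has singled out `a`.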

#### **4.1 Player Reduction: From SGs to MDPs**

The following general definition is not directly used in Algorithm 3. It is used in our theoretical development below, towards the algorithm's correctness.

**Definition 4.1 (the MDP M(G, Av′)).** Let G be the game in Algorithm 3, and Av′ : S → 2<sup>A</sup> be such that ∅ ≠ Av′(s) ⊆ Av(s) for each s ∈ S.

Then the MDP given by the tuple (S, S \ {**0**}, {**0**}, s<sub>I</sub>, **1**, **0**, A, Av′, δ) shall be denoted by M(G, Av′), and we say it is induced from G by restricting Av to Av′.

The above construction consists of 1) restricting actions (from Av to Av′), and 2) turning Minimizer's states into Maximizer's.

The following class of action restrictions will be heavily used.

**Definition 4.2 (Minimizer restriction).** Let G be as in Algorithm 3. A *Minimizer restriction* of Av is a function Av′ : S → 2<sup>A</sup> such that 1) ∅ ≠ Av′(s) ⊆ Av(s) for each s ∈ S, and 2) Av′(s) = Av(s) for each state s ∈ S<sub>□</sub> of Maximizer's.

In Algorithm 3, we will be using the MDP induced by the following specific Minimizer restriction induced by a function f.

**Definition 4.3 (the MDP M<sub>PlRd</sub>(G, f)).** Let G be the game in Algorithm 3, and f : S → [0, 1] be a function. The MDP M<sub>PlRd</sub>(G, f) is defined to be M(G, Av<sub>f</sub>) (Definition 4.1), where the function Av<sub>f</sub> : S → 2<sup>A</sup> is defined as follows.

$$\begin{aligned} \text{Av}\_f(s) &= \text{Av}(s) & \text{for } s \in S\_{\square},\\ \text{Av}\_f(s) &= \{ a \in \text{Av}(s) \mid \forall b \in \text{Av}(s) . (\mathbb{X}\_a f)(s) \le (\mathbb{X}\_b f)(s) \} & \text{for } s \in S\_{\diamondsuit}. \end{aligned} \tag{5}$$

The function Av<sub>f</sub> is a Minimizer restriction in G (Definition 4.2).

The intuition of (5) is that a = arg min<sub>b∈Av(s)</sub> (X<sub>b</sub>f)(s). In the use of this construction in Algorithm 3, the function f will be our "best guess" L<sub>i</sub> of the value function V(G). In this situation, arg min<sub>b∈Av(s)</sub> (X<sub>b</sub>f)(s) is the best action for Minimizer based on the guess f = L<sub>i</sub>.
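The restriction in (5) can be sketched as follows. The encoding of the game, the names `action_value` and `minimizer_restriction`, and the example game are hypothetical; ties are detected by exact floating-point equality, which is an assumption of this sketch (a practical implementation might prefer a tolerance).

```python
def action_value(game, s, a, f):
    """(X_a f)(s) for a state s other than the target and the sink."""
    return sum(p * f[t] for t, p in game["delta"][s][a].items())

def minimizer_restriction(game, f):
    """Av_f of (5): all actions at Maximizer states, only arg-min actions at Minimizer states."""
    av = {}
    for s, actions in game["delta"].items():
        if game["owner"][s] == "max" or s in (game["target"], game["sink"]):
            av[s] = set(actions)
        else:
            best = min(action_value(game, s, a, f) for a in actions)
            av[s] = {a for a in actions if action_value(game, s, a, f) == best}
    return av

# Hypothetical game: s1 belongs to Minimizer and has two actions.
game = {
    "target": "1", "sink": "0",
    "owner": {"sI": "max", "s1": "min", "1": "max", "0": "min"},
    "delta": {
        "sI": {"stay": {"sI": 1.0}, "go": {"s1": 1.0}},
        "s1": {"a": {"1": 0.8, "0": 0.2}, "b": {"sI": 0.5, "1": 0.5}},
        "1": {"loop": {"1": 1.0}},
        "0": {"loop": {"0": 1.0}},
    },
}
# An early, pessimistic guess makes "b" look better for Minimizer ...
av_early = minimizer_restriction(game, {"sI": 0.0, "s1": 0.5, "1": 1.0, "0": 0.0})
# ... while a more accurate guess singles out "a".
av_late = minimizer_restriction(game, {"sI": 0.8, "s1": 0.8, "1": 1.0, "0": 0.0})
```

The example shows why the guess f matters: with the early guess, `b` evaluates to 0.5 against 0.8 for `a`, while with the later guess `b` evaluates to 0.9 and `a` is kept instead. Maximizer's actions at `sI` are never restricted.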

**Definition 4.4 (the MDP M<sub>i</sub>, and Av<sub>i</sub>).** In Algorithm 3, the MDP M<sub>i</sub> is given by M<sub>PlRd</sub>(G, L<sub>i</sub>) = M(G, Av<sub>L<sub>i</sub></sub>). We write Av<sub>i</sub> for available actions in M<sub>i</sub>, that is, M<sub>i</sub> = (S, **1**, **0**, A, Av<sub>i</sub>, δ).

In the case of Algorithm 3, the MDPs M<sub>0</sub>, M<sub>1</sub>, ... do not only "converge" to G, but also "reach G in finitely many steps," in the following sense. The proof is deferred to [24]. It relies crucially on the fact that the set Av(s) of available actions is finite: there is a uniform ε > 0 such that every suboptimal action is suboptimal by a gap of at least ε.

**Lemma 4.5.** *In Algorithm 3, there exists i<sub>M</sub> ∈ ℕ such that, for each i ≥ i<sub>M</sub>, we have V(G) = V(M<sub>i</sub>).*

#### **4.2 Local Propagation: From MDPs to WGs**

Here is a technical observation that motivates the function W<sub>LcPg</sub>.

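**Lemma 4.6.** *Let G be the game in Algorithm 3, and Av′ : S → 2<sup>A</sup> be a Minimizer restriction (Definition 4.2).*

*1. For each state s ∈ S, we have V(G)(s) ≤ max<sub>a∈Av′(s)</sub> (X<sub>a</sub>(V(G)))(s).*

*2. For each state s<sub>0</sub> ∈ S and each k ∈ ℕ, we have*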

$$V(\mathcal{G})(s\_0) \le \max\_{s\_0 \xrightarrow{a\_0} s\_1 \xrightarrow{a\_1} \dots \xrightarrow{a\_k} \text{ in } \text{Av}'} \left( \mathbb{X}\_{a\_k} \left( V(\mathcal{G}) \right) \right)(s\_k), \tag{6}$$

*where the maximum is taken over a<sub>0</sub>, s<sub>1</sub>, a<sub>1</sub>, ..., s<sub>k</sub>, a<sub>k</sub> such that a<sub>0</sub> ∈ Av′(s<sub>0</sub>), s<sub>1</sub> ∈ post(s<sub>0</sub>, a<sub>0</sub>), a<sub>1</sub> ∈ Av′(s<sub>1</sub>), ..., s<sub>k</sub> ∈ post(s<sub>k−1</sub>, a<sub>k−1</sub>), a<sub>k</sub> ∈ Av′(s<sub>k</sub>).*

*Proof.* For the item 1, recall that V(G) is the least fixed point of the Bellman operator (Theorem 3.3). For each Minimizer state s ∈ S<sub>◇</sub>, we have

$$V(\mathcal{G})(s) = \min\_{a \in \text{Av}(s)} \left( \mathbb{X}\_a \left( V(\mathcal{G}) \right) \right)(s) \le \min\_{a \in \text{Av}'(s)} \left( \mathbb{X}\_a \left( V(\mathcal{G}) \right) \right)(s) \le \max\_{a \in \text{Av}'(s)} \left( \mathbb{X}\_a \left( V(\mathcal{G}) \right) \right)(s).$$

For each Maximizer state s ∈ S<sub>□</sub>, we have

$$V(\mathcal{G})(s) = \max\_{a \in \text{Av}(s)} \left( \mathbb{X}\_a \left( V(\mathcal{G}) \right) \right)(s) = \max\_{a \in \text{Av}'(s)} \left( \mathbb{X}\_a \left( V(\mathcal{G}) \right) \right)(s).$$

The latter equality is because Av′ does not restrict Maximizer's actions. This proves the item 1.

The item 2 is proved by induction as follows, using the item 1 in its course.

$$\begin{split} & V(\mathcal{G})(s\_{0}) \\ & \leq \max\_{a\_{0} \in \text{Av}'(s\_{0})} \bigl( \mathbb{X}\_{a\_{0}}(V(\mathcal{G})) \bigr)(s\_{0}) \qquad \text{by the item 1} \\ &= \max\_{a\_{0} \in \text{Av}'(s\_{0})} \sum\_{s\_{1} \in \text{post}(s\_{0}, a\_{0})} \delta(s\_{0}, a\_{0}, s\_{1}) \cdot V(\mathcal{G})(s\_{1}) \\ & \leq \max\_{a\_{0} \in \text{Av}'(s\_{0})} \sum\_{s\_{1} \in \text{post}(s\_{0}, a\_{0})} \delta(s\_{0}, a\_{0}, s\_{1}) \cdot \Bigl( \max\_{s\_{1} \xrightarrow{a\_{1}} \cdots \xrightarrow{a\_{k}} \text{ in } \text{Av}'} \bigl( \mathbb{X}\_{a\_{k}}(V(\mathcal{G})) \bigr)(s\_{k}) \Bigr) \\ & \qquad \text{by the induction hypothesis (for } k-1) \\ & \leq \max\_{a\_{0} \in \text{Av}'(s\_{0})} \max\_{s\_{1} \in \text{post}(s\_{0}, a\_{0})} \Bigl( \max\_{s\_{1} \xrightarrow{a\_{1}} \cdots \xrightarrow{a\_{k}} \text{ in } \text{Av}'} \bigl( \mathbb{X}\_{a\_{k}}(V(\mathcal{G})) \bigr)(s\_{k}) \Bigr) \\ & = \max\_{s\_{0} \xrightarrow{a\_{0}} s\_{1} \xrightarrow{a\_{1}} \cdots \xrightarrow{a\_{k}} \text{ in } \text{Av}'} \bigl( \mathbb{X}\_{a\_{k}}(V(\mathcal{G})) \bigr)(s\_{k}). \tag{8} \end{split}$$

The inequality in (8) holds since an average over s<sub>1</sub> on the left-hand side is replaced by the corresponding maximum on the right-hand side. Note that the inner maximum over the continuations of the path from s<sub>1</sub>, which occurs on both sides, is determined once s<sub>1</sub> is determined. This concludes the proof.

Lemma 4.6.2, although not itself used in the following technical development, suggests the idea of global propagation for upper bounds. Note that a bound is given in (6) for each k; it is possible that a bound for some k > 1 is tighter than that for k = 1, motivating us to take a "look-ahead" further than one step.

However, the bound in (6) is not particularly tuned for tractability: computation of the maximum involves words whose number is exponential in k, and moreover, we want to do so for many k's.

In the end, our main technical contribution is that a similar "look-ahead" can be done by solving the widest path problem in the following weighted graph. The soundness of this method is not as easy to establish as that of Lemma 4.6.2; see Sect. 4.3.

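**Definition 4.7 (the WG W<sub>LcPg</sub>(M, f)).** Let M = (S, **1**, **0**, A, Av′, δ) be an MDP, and f : S → [0, 1]. The WG W<sub>LcPg</sub>(M, f) is the following triple (S, E, w).

• The set of vertices is the set S of states of M.

• The set of edges is E = {(s, s′) | s′ ∈ post(s, a) for some a ∈ Av′(s)}.

• The weight function w : E → [0, 1] is given by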

$$w(s, s') = \max \{ \mathbb{X}\_a f(s) \mid a \in \text{Av}'(s), s' \in \text{post}(s, a) \}. \tag{9}$$
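A minimal sketch of this construction, assuming a dictionary encoding of the MDP and of the restricted action sets. The names `x_a` and `local_propagation` and the example data are hypothetical, and an edge is created for every pair (s, s′) such that s′ ∈ post(s, a) for some allowed action a.

```python
def x_a(mdp, s, a, f):
    """(X_a f)(s) as in Definition 3.1."""
    if s == mdp["target"]:
        return 1.0
    if s == mdp["sink"]:
        return 0.0
    return sum(p * f[t] for t, p in mdp["delta"][s][a].items())

def local_propagation(mdp, av, f):
    """Return the weight function w of (9) as a dictionary keyed by edges (s, s')."""
    w = {}
    for s in mdp["delta"]:
        for a in av[s]:
            value = x_a(mdp, s, a, f)
            for t, p in mdp["delta"][s][a].items():
                if p > 0:                       # t is in post(s, a)
                    w[(s, t)] = max(w.get((s, t), 0.0), value)
    return w

# Hypothetical MDP, restricted actions, and previous upper bound.
mdp = {
    "target": "1", "sink": "0",
    "delta": {
        "sI": {"stay": {"sI": 1.0}, "go": {"s1": 1.0}},
        "s1": {"a": {"1": 0.8, "0": 0.2}, "b": {"sI": 0.5, "1": 0.5}},
        "1": {"loop": {"1": 1.0}},
        "0": {"loop": {"0": 1.0}},
    },
}
av = {"sI": {"stay", "go"}, "s1": {"a"}, "1": {"loop"}, "0": {"loop"}}
upper = {"sI": 1.0, "s1": 1.0, "1": 1.0, "0": 0.0}
w = local_propagation(mdp, av, upper)
```

Note that both edges leaving `s1` receive the same weight 0.8, although they carry the probabilities 0.8 and 0.2: the weight records the propagated value of the action, not the probability of the individual transition.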

In (9), the function f (that is, the previous upper bound U<sub>i−1</sub> in Algorithm 3) is propagated one step by the application of X<sub>a</sub>. This way of encoding these propagated values as weights in a WG seems pretty rough. For example, in case both s′ and s″ are in post(s, a) for each a ∈ Av′(s), we have w(s, s′) = w(s, s″), no matter what the transition probabilities from s to s′, s″ are. The return for this paid price (namely the information lost in the rough encoding) is that the resulting data structure (WG) allows fast *global* analysis via the widest path problem. Our experiment results in Sect. 5 demonstrate that this rough yet global approximation can make upper bounds quickly converge.

**Algorithm 4:** A construction of PATH : S<sub>♦**1**</sub> → S<sup>+</sup> for Lemma 4.8

1. S<sub>v</sub> ← {**1**}, PATH(**1**) ← **1**
2. **while** S<sub>♦**1**</sub> \ S<sub>v</sub> ≠ ∅ **do**
3. &nbsp;&nbsp;&nbsp;&nbsp;Choose a pair of states (s<sub>c</sub>, s<sub>p</sub>) that satisfies the following: s<sub>c</sub> ∈ S \ S<sub>v</sub>, s<sub>p</sub> ∈ S<sub>v</sub>, V(G)(s<sub>c</sub>) = max<sub>s∈S\S<sub>v</sub></sub> V(G)(s), and for an optimal action a at s<sub>c</sub> in M, s<sub>p</sub> ∈ post(s<sub>c</sub>, a)
4. &nbsp;&nbsp;&nbsp;&nbsp;PATH(s<sub>c</sub>) ← s<sub>c</sub> · PATH(s<sub>p</sub>), S<sub>v</sub> ← S<sub>v</sub> ∪ {s<sub>c</sub>}
5. **return** PATH

#### **4.3 Soundness and Convergence**

In Algorithm 3, an SG G is turned into an MDP M<sub>i</sub> and then to a WG W<sub>i</sub>. Our claim is that computing a widest path in W<sub>i</sub> gives the next upper bound U<sub>i</sub> in the iteration. Here we prove the following correctness properties: soundness (V(G) ≤ U<sub>i</sub>) and convergence (U<sub>i</sub> → V(G) as i → ∞).

We start with a technical lemma. The choice of the MDP M(G, Av′) and the value function V(G) (for G, not for M(G, Av′)) in the statement is subtle; it turns out to be just what we need.

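**Lemma 4.8.** *Let G be as in Algorithm 3, and Av′ : S → 2<sup>A</sup> be a Minimizer restriction (Definition 4.2). Let* s<sub>0</sub> ∈ S<sub>♦**1**</sub> *be a state with a non-zero value (Definition 2.5). Consider the MDP M(G, Av′) (Definition 4.1), for which we write simply M. Then there is a finite path π = s<sub>0</sub>a<sub>0</sub>s<sub>1</sub>a<sub>1</sub> ... a<sub>n−1</sub>s<sub>n</sub> in M that satisfies the following.*

*1. The path reaches the target, that is,* s<sub>n</sub> = **1**.

*2. Each action a<sub>i</sub> is optimal at s<sub>i</sub> in M with respect to V(G).*

*3. The value function does not decrease along the path, that is, V(G)(s<sub>i</sub>) ≤ V(G)(s<sub>i+1</sub>) for each i ∈ [0, n − 1].*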

*Proof.* We construct a function PATH : S<sub>♦**1**</sub> → S<sup>+</sup> by Algorithm 4. It is clear that PATH assigns a desired path to each s<sub>0</sub> ∈ S<sub>♦**1**</sub>. In particular, V(G) does not decrease along PATH(s<sub>0</sub>) since the state that is prepended never has a greater value of V(G) than the states added before it.

It remains to be shown that, in Line 3, a required pair (s<sub>c</sub>, s<sub>p</sub>) is always found. Let S<sub>v</sub> ⊊ S<sub>♦**1**</sub> be a subset with **1** ∈ S<sub>v</sub>; here S<sub>v</sub> is a proper subset of S<sub>♦**1**</sub> since otherwise we should be already out of the while loop (Line 2).

Let S<sub>max</sub> = {s ∈ S \ S<sub>v</sub> | V(G)(s) = max<sub>s′∈S\S<sub>v</sub></sub> V(G)(s′)}. Since S<sub>v</sub> ⊊ S<sub>♦**1**</sub>, we have ∅ ≠ S<sub>max</sub> ⊆ S<sub>♦**1**</sub> and thus V(G)(s) > 0 for each s ∈ S<sub>max</sub>. We also have **1** ∉ S<sub>max</sub> since **1** ∈ S<sub>v</sub>.

We argue by contradiction: assume that for any s ∈ S \ S<sub>v</sub>, s′ ∈ S<sub>v</sub>, we have s′ ∉ post(s, a<sub>s</sub>), where a<sub>s</sub> is any optimal action at s in M with respect to V(G). Now let s ∈ S<sub>max</sub> be an arbitrary element. It follows that V(G)(s) > 0.

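$$\begin{aligned} V(\mathcal{G})(s) & \le \bigl(\mathbb{X}\_{a\_s}(V(\mathcal{G}))\bigr)(s) \\ & \qquad \text{using Lemma 4.6; here } a\_s \text{ is an optimal action at } s \text{ in } \mathcal{M} \text{ with respect to } V(\mathcal{G}) \\ & = \sum\_{s' \in S \setminus S\_{\mathrm{v}}} \delta(s, a\_s, s') \cdot V(\mathcal{G})(s') \\ & \qquad \text{by the assumption that } s' \notin \text{post}(s, a\_s) \text{ for each } s' \in S\_{\mathrm{v}} \\ & \le \sum\_{s' \in S \setminus S\_{\mathrm{v}}} \delta(s, a\_s, s') \cdot V(\mathcal{G})(s) \\ & \qquad \text{since } s \in S\_{\max} \text{ and hence } V(\mathcal{G})(s') \le V(\mathcal{G})(s) \\ & = V(\mathcal{G})(s) \qquad \text{since } \sum\_{s' \in S \setminus S\_{\mathrm{v}}} \delta(s, a\_s, s') = 1. \end{aligned} \tag{10}$$
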
Therefore both inequalities in the above must be equalities. In particular, for the second inequality (in (10)) to be an equality, we must have the weight for each suboptimal s′ to be 0. That is, δ(s, a<sub>s</sub>, s′) = 0 for each s′ ∈ (S \ S<sub>v</sub>) \ S<sub>max</sub>.

The above holds for arbitrary s ∈ S<sub>max</sub>. Therefore, for any strategy that is optimal in M with respect to V(G), once a play is in S<sub>max</sub>, it never comes out of S<sub>max</sub>, hence the play never reaches **1**. Moreover, an optimal strategy in M with respect to V(G) is at least as good as an optimal strategy for Maximizer in G (with respect to V(G)), that is, the latter reaches **1** no more often than the former. This follows from Lemma 4.6. Altogether, we conclude that a Maximizer optimal strategy in G does not lead any s ∈ S<sub>max</sub> to **1**, i.e., V(G)(s) = 0 for each s ∈ S<sub>max</sub>. This contradicts V(G)(s) > 0, which we have shown above.

In the following lemma, we use the value function V(G) in the position of f in Definition 4.7. This cannot be done in actual execution of Algorithm 3: unlike U<sub>i−1</sub> in Algorithm 3, the value function V(G) is not known to us. Nevertheless, the lemma is an important theoretical vehicle towards soundness of Algorithm 3.

**Lemma 4.9.** *Let G be the game in Algorithm 3, and Av′ : S → 2<sup>A</sup> be a Minimizer restriction (Definition 4.2). Let M = M(G, Av′), and W = W<sub>LcPg</sub>(M, V(G)). Then, for each state s ∈ S, we have* WPW(W)(s, **1**) ≥ V(G)(s)*.*

*Proof.* In what follows, we let the WG W = W<sub>LcPg</sub>(M, V(G)) be denoted by W = (S, E, w). Let π = s<sub>0</sub>a<sub>0</sub>s<sub>1</sub>a<sub>1</sub> ... a<sub>n−1</sub>s<sub>n</sub> be a path of the MDP M such that s<sub>n</sub> = **1**, each action is optimal in M with respect to V(G), and V(G)(s<sub>i</sub>) ≤ V(G)(s<sub>i+1</sub>) for each i ∈ [0, n − 1]. Existence of such a path π is shown by Lemma 4.8. Let π′ = s<sub>0</sub>s<sub>1</sub> ... s<sub>n−1</sub>**1** be the path in the WG W induced by π, obtained by simply omitting actions.

The path π′ satisfies the following, for each i ∈ [0, n − 1].

$$\begin{aligned} w(s\_i, s\_{i+1}) &= \max \bigl\{ \bigl(\mathbb{X}\_a (V(\mathcal{G}))\bigr)(s\_i) \bigm| a \in \text{Av}'(s\_i),\ s\_{i+1} \in \text{post}(s\_i, a) \bigr\} \qquad \text{by Definition 4.7} \\ &= \bigl(\mathbb{X}\_{a\_i} (V(\mathcal{G}))\bigr)(s\_i) \qquad \text{since } a\_i \text{ is optimal wrt. } V(\mathcal{G}) \text{; note that } a\_i \in \text{Av}'(s\_i),\ s\_{i+1} \in \text{post}(s\_i, a\_i) \text{ hold since } \pi \text{ is a path in } \mathcal{M} \\ &= \max\_{a \in \text{Av}'(s\_i)} \bigl(\mathbb{X}\_a (V(\mathcal{G}))\bigr)(s\_i) \qquad \text{since } a\_i \text{ is optimal wrt. } V(\mathcal{G}) \\ &\ge V(\mathcal{G})(s\_i) \qquad \text{by Lemma 4.6.} \end{aligned}$$

This observation, combined with $V(\mathcal{G})(s_0) \le V(\mathcal{G})(s_1) \le \cdots \le V(\mathcal{G})(s_n)$ (by the definition of $\pi$), implies that the width of the path $\pi$ is at least $V(\mathcal{G})(s_0)$. The widest path width is no smaller than that.
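The notion of path width used in this argument can be made concrete with a small sketch. It assumes the usual reading of the widest path problem: the width of a path is the minimum weight of its edges, and the widest path width is the maximum width over all paths. The graph, the node names and the function `widest_path` below are hypothetical illustrations, not the implementation used in the paper.

```python
import heapq

def widest_path(edges, source, target):
    """Return (width, path) of a widest path from source to target.

    edges maps (u, v) to a weight; the width of a path is the minimum
    weight of its edges. This is a Dijkstra-style search that always
    expands the node with the largest width found so far.
    """
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))

    best = {source: float("inf")}   # the empty path has unbounded width
    parent = {}
    heap = [(-float("inf"), source)]
    done = set()
    while heap:
        neg_width, u = heapq.heappop(heap)
        if u in done:
            continue                # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in adj.get(u, []):
            cand = min(-neg_width, w)
            if cand > best.get(v, 0.0):
                best[v] = cand
                parent[v] = u
                heapq.heappush(heap, (-cand, v))

    if target not in best:
        return 0.0, []              # no path: width 0 by convention
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[target], path[::-1]

# Hypothetical weighted graph: the route via "a" has width min(0.9, 0.4),
# the route via "b" has width min(0.5, 0.6).
EDGES = {("s", "a"): 0.9, ("a", "t"): 0.4, ("s", "b"): 0.5, ("b", "t"): 0.6}
width, path = widest_path(EDGES, "s", "t")
```

In this example the route through `b` wins because its narrowest edge is wider, even though the route through `a` starts with a heavier edge.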

**Theorem 4.10 (soundness).** *In Algorithm 3,* $V(\mathcal{G}) \le U_i$ *holds for each* $i \in \mathbb{N}$*.*

*Proof.* We let the function

$$\min\bigl\{U,\ \mathrm{WPW}\bigl(\mathcal{W}_{\mathrm{LcPg}}(\mathcal{M}(\mathcal{G}, \mathrm{Av}'), U)\bigr)(\_, \mathbf{1})\bigr\} \;\colon\; S \longrightarrow [0, 1]$$

be denoted by $T(\mathrm{Av}', U) \colon S \to [0, 1]$, clarifying its dependence on $\mathrm{Av}'$ and $U \colon S \to [0, 1]$. Clearly, for each $i \in \mathbb{N}$, we have $U_i = T(\mathrm{Av}_{L_i}, U_{i-1})$.

The rest of the proof is by induction. It is trivial if $i = 0$ ($U_0 = \top$).

$$\begin{aligned}
U_{i+1} &= T(\mathrm{Av}_{L_i}, U_i) \\
&\ge T(\mathrm{Av}_{L_i}, V(\mathcal{G})) && \text{by ind. hyp., and } T(\mathrm{Av}_{L_i}, \_) \text{ is monotone} \\
&= \min\bigl\{ V(\mathcal{G}),\ \mathrm{WPW}\bigl(\mathcal{W}_{\mathrm{LcPg}}(\mathcal{M}(\mathcal{G}, \mathrm{Av}_{L_i}), V(\mathcal{G}))\bigr)(\_, \mathbf{1}) \bigr\} \\
&= V(\mathcal{G}) && \text{by Lemma 4.9.}
\end{aligned}$$
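To make the operator $T$ concrete, here is a minimal sketch of one update step on a toy model. It is written under several assumptions that are not taken from the paper's implementation: the model is a plain MDP with Maximizer only (so no Minimizer restriction is needed), edge weights follow the formula from Definition 4.7 as it is used in the proof of Lemma 4.9, the target state is named `one`, and its widest path width to itself is capped at 1. All names are hypothetical.

```python
import heapq

def bellman(delta, f, s, a):
    """(X_a f)(s): expected value of f after taking action a in state s."""
    return sum(p * f[t] for t, p in delta[s][a].items())

def build_weights(delta, f):
    """w(s, t) = max of (X_a f)(s) over actions a with t in post(s, a)."""
    w = {}
    for s, actions in delta.items():
        for a, dist in actions.items():
            value = bellman(delta, f, s, a)
            for t, p in dist.items():
                if p > 0:
                    w[(s, t)] = max(w.get((s, t), 0.0), value)
    return w

def widest_width_to(states, w, target):
    """Widest path width from every state to target (backward search)."""
    preds = {s: [] for s in states}
    for (s, t), x in w.items():
        preds[t].append((s, x))
    width = {s: 0.0 for s in states}
    width[target] = 1.0             # assumption: cap at 1 for the target
    heap = [(-1.0, target)]
    done = set()
    while heap:
        neg, t = heapq.heappop(heap)
        if t in done:
            continue
        done.add(t)
        for s, x in preds[t]:
            cand = min(x, -neg)
            if cand > width[s]:
                width[s] = cand
                heapq.heappush(heap, (-cand, s))
    return width

def update(delta, upper, target):
    """One step U -> min{U, WPW(W(M, U))(_, target)}."""
    width = widest_width_to(list(delta), build_weights(delta, upper), target)
    return {s: min(upper[s], width[s]) for s in delta}

# Toy MDP with an end component {s0, s1}: s0 can loop through s1 forever,
# or gamble with action "c" and reach the target with probability 0.3.
DELTA = {
    "s0": {"a": {"s1": 1.0}, "c": {"one": 0.3, "zero": 0.7}},
    "s1": {"b": {"s0": 1.0}},
    "one": {},
    "zero": {},
}
U0 = {"s0": 1.0, "s1": 1.0, "one": 1.0, "zero": 0.0}
U1 = update(DELTA, U0, "one")
```

On this toy model a plain Bellman update from `U0` would leave the bound of `s0` and `s1` at 1, since each state can point to the other. The widest path step instead forces every route to the target through the edge of weight 0.3, so the bound drops in a single step.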

It is clear that $U_i$ decreases with respect to $i$ ($U_0 \ge U_1 \ge \cdots$), by the presence of min in Line 8. It remains to show the following.

**Theorem 4.11 (convergence).** *In Algorithm 3, let the while loop iterate forever. Then* $U_i \to V(\mathcal{G})$ *as* $i \to \infty$*.*

*Proof.* We give a proof using the infinitary pigeonhole principle. The proof is nonconstructive—it is not suited for analyzing the speed of convergence, for example—but the proof becomes simpler.

In what follows, we let $\mathbb{X}_\sigma \colon (S \to [0, 1]) \to (S \to [0, 1])$ denote the Bellman operator on an MDP $\mathcal{M}$ induced by a strategy $\sigma$, i.e., $(\mathbb{X}_\sigma f)(s) := (\mathbb{X}_{\sigma(s)} f)(s)$. The MC obtained from an MDP $\mathcal{M}$ by fixing a strategy $\sigma$ is denoted by $\mathcal{M}^\sigma$.

Towards the statement of the theorem, for each $i \in \mathbb{N}$, we choose a (positional) strategy $\sigma_i$ in the MDP $\mathcal{M}_i$ as follows.

– For each $s \in S_{\lozenge\mathbf{1}}$, take the widest path $\mathrm{WPath}(\mathcal{W}_i, \mathbf{1})(s) = s s_1 \ldots \mathbf{1}$ in $\mathcal{W}_i$ from $s$ to $\mathbf{1}$ (Definition 2.8). Such a path from $s$ to $\mathbf{1}$ exists; otherwise we have $U_i(s) = 0$, hence $V(\mathcal{G})(s) = 0$ by Theorem 4.10.

  Let $\sigma_i(s)$ be an action that justifies the first edge in the chosen widest path, that is, $a \in \mathrm{Av}_i(s)$ such that $s_1 \in \mathrm{post}(s, a)$.

– For each $s \in S \setminus S_{\lozenge\mathbf{1}}$, $\sigma_i(s)$ is freely chosen from $\mathrm{Av}_i(s)$.

It is then easy to see that

$$\text{WPW}(\mathcal{W}\_i)(s) \le (\mathbb{X}\_{\sigma\_i} U\_{i-1})(s) \qquad \text{for each } i \in \mathbb{N} \text{ and } s \in S\_{\diamondsuit \mathbf{1}}.\tag{11}$$

Indeed, by the definition of $\sigma_i$, the right-hand side is the weight of the first edge in the chosen widest path. This must be no smaller than the widest path width, that is, the width of the chosen path.

Now, since there are only finitely many strategies for the SG $\mathcal{G}$, the same is true for the MDPs $\mathcal{M}_0, \mathcal{M}_1, \ldots$ that are obtained from $\mathcal{G}$ by restricting Minimizer's actions. Therefore, by the infinitary pigeonhole principle, there are infinitely many $i_0 < i_1 < \cdots$ such that $\sigma_{i_0} = \sigma_{i_1} = \cdots =: \sigma^\dagger$. Moreover, we can choose them so that they are all beyond $i_M$ in Lemma 4.5, in which case we have

$$V(\mathcal{M}\_{\mathbf{i}\_m}^{\sigma^\dagger}) \le V(\mathcal{G}) \quad \text{for each } m \in \mathbb{N}.\tag{12}$$

Indeed, Minimizer's actions are already optimized in $\mathcal{M}_i$ (Lemma 4.5), and thus the only freedom left for $\sigma^\dagger$ is to choose suboptimal actions of Maximizer.

In what follows, we cut down the domain of discourse from $S \to [0, 1]$ to $S_{\lozenge\mathbf{1}} \to [0, 1]$, i.e., 1) every function of the type $f \colon S \to [0, 1]$ is now seen as the restriction over $S_{\lozenge\mathbf{1}}$, and 2) the Bellman operator only adds up the value of the input function over $S_{\lozenge\mathbf{1}}$, namely it is now defined by $\hat{\mathbb{X}}_a f(s) = \sum_{s' \in S_{\lozenge\mathbf{1}}} \delta(s, a, s') \cdot f(s')$. The operator $\hat{\mathbb{X}}_\sigma$ is also defined in a similar way to $\mathbb{X}_\sigma$.

Now proving convergence in $S_{\lozenge\mathbf{1}} \to [0, 1]$ suffices for the theorem. Indeed, for each $i \ge i_M$, we have $V(\mathcal{M}_i)(s) = V(\mathcal{G})(s) = 0$ for each $s \in S \setminus S_{\lozenge\mathbf{1}}$. This implies that there is no path from $s$ to $\mathbf{1}$ in $\mathcal{M}_i$, thus neither in the WG $\mathcal{W}_i$. Therefore $U_i \le \mathrm{WPW}(\mathcal{W}_i) = 0$.

A benefit of this domain restriction is that the Bellman operator $\hat{\mathbb{X}}_\sigma$ has a unique fixed point in $S_{\lozenge\mathbf{1}} \to [0, 1]$ if the set of non-sink states in $\mathcal{M}^\sigma$ is exactly $S_{\lozenge\mathbf{1}}$, i.e., $V(\mathcal{M}^\sigma)(s) > 0$ holds if and only if $s \in S_{\lozenge\mathbf{1}}$. Furthermore, this unique fixed point is the value function $V(\mathcal{M}^\sigma)$ restricted to $S_{\lozenge\mathbf{1}} \subseteq S$ [4, Theorem 10.19]. Therefore $V(\mathcal{M}^\sigma)$ is computed by the gfp Kleene iteration, too:

$$\top \ge \hat{\mathbb{X}}\_{\sigma} \top \ge (\hat{\mathbb{X}}\_{\sigma})^2 \top \ge \cdots \quad \longrightarrow V(\mathcal{M}^{\sigma}) \quad \text{in the space } S\_{\lozenge \mathbf{1}} \to [0, 1]. \tag{13}$$
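The iteration (13) can be illustrated on a single Markov chain. The sketch below is written under stated assumptions: the chain, the state names and the helper functions are hypothetical, the target state `one` is pinned to value 1, and the set `live` plays the role of $S_{\lozenge\mathbf{1}}$ (the states that reach the target with positive probability).

```python
def restricted_bellman(chain, live, target, f):
    """Apply the restricted operator: sum only over successors in live."""
    g = {}
    for s in live:
        if s == target:
            g[s] = 1.0              # assumption: the target keeps value 1
        else:
            g[s] = sum(p * f[t] for t, p in chain[s].items() if t in live)
    return g

def iterate_from_top(chain, live, target, steps):
    """Return the sequence top, X(top), X^2(top), ... as a list of dicts."""
    f = {s: 1.0 for s in live}      # the top element of live -> [0, 1]
    history = [f]
    for _ in range(steps):
        f = restricted_bellman(chain, live, target, f)
        history.append(f)
    return history

# Hypothetical chain: from s0, stay with probability 0.5, reach the target
# with probability 0.25, and fall into the losing sink with probability 0.25.
CHAIN = {
    "s0": {"s0": 0.5, "one": 0.25, "zero": 0.25},
    "one": {},
    "zero": {},
}
LIVE = {"s0", "one"}
HISTORY = iterate_from_top(CHAIN, LIVE, "one", 60)
```

For this chain the reachability probability from `s0` solves $x = 0.5x + 0.25$, so $x = 0.5$. The iterates for `s0` start at 1 and decrease towards that value. Without removing `zero` from the domain, the iteration from the top would keep the value of `zero` at 1 forever and converge to a different, larger fixed point.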

We show the following by induction on m.

$$U\_{i\_m} \le (\hat{\mathbb{X}}\_{\sigma^\dagger})^m \top \quad \text{for each } m \in \mathbb{N}. \tag{14}$$

It is obvious for $m = 0$. For the step case, we have the following. Notice that the inequality (11) holds in the restricted domain for $i \ge i_M$.

$$\begin{split} &U\_{i\_{m+1}} \leq \text{WPW}(\mathcal{W}\_{i\_{m+1}}) \quad \text{by Line 8 of Algorithm 3} \\ &\leq \hat{\mathbb{X}}\_{\sigma^{\dagger}} U\_{i\_{m+1}-1} \quad \text{by (11)} \\ &\leq \hat{\mathbb{X}}\_{\sigma^{\dagger}} U\_{i\_{m}} \quad \text{by monotonicity of } \hat{\mathbb{X}}\_{\sigma^{\dagger}}, \text{ decrease of } U\_{i} \text{ and } i\_{m} < i\_{m+1} \\ &\leq (\hat{\mathbb{X}}\_{\sigma^{\dagger}})^{m+1} \top \quad \text{by the induction hypothesis.} \end{split}$$

We have proved (14), which proves $\inf_i U_i \le \inf_m (\hat{\mathbb{X}}_{\sigma^\dagger})^m \top$.

Lastly, we prove that $V(\mathcal{M}^{\sigma^\dagger}_{i_m})(s) > 0$ holds if and only if $s \in S_{\lozenge\mathbf{1}}$ for each $m \in \mathbb{N}$, and thus $\sigma^\dagger$ follows the characterization in (13). This proves

$$\inf\_{i} U\_{i} \le V(\mathcal{M}\_{i\_{m}}^{\sigma^{\dagger}}) \quad \text{for each } m \in \mathbb{N}. \tag{15}$$

Implication to the right is clear as Minimizer restriction is done optimally in $\mathcal{M}_{i_m}$. Conversely, if $s \in S_{\lozenge\mathbf{1}}$, then there is a path from $s$ to $\mathbf{1}$ in $\mathcal{W}_{i_m}$. Let $\mathrm{WPath}(\mathcal{W}_{i_m}, \mathbf{1})(s) = s_0 s_1 \ldots s_k$, where $s_0 = s$, $k \in \mathbb{N}$ and $s_k = \mathbf{1}$. Then by the property of WPath and $\sigma^\dagger$, we have $\delta(s_j, \sigma^\dagger(s_j), s_{j+1}) > 0$ for each $j < k$. Thus, the probability that the finite path $\mathrm{WPath}(\mathcal{W}_{i_m}, \mathbf{1})(s)$ is obtained by running $\mathcal{M}^{\sigma^\dagger}_{i_m}$ starting from $s$, which is evidently at most $V(\mathcal{M}^{\sigma^\dagger}_{i_m})(s)$, is nonzero. Hence we have implication to the left.

Combining (12), (15) and Theorem 4.10, we obtain the claim.

#### **5 Experiment Results**

*Experiment Settings.* We compare the following four algorithms.


The latter three—coming from [20]—are the only existing BVI algorithms for SGs with a convergence guarantee, to the best of our knowledge. The implementation of DFL and DFL BRTDP is provided by the authors of [20].

The four algorithms are implemented on top of PRISM-games [21] version 2.0. We used the stopping threshold $\varepsilon = 10^{-6}$. The experiments were conducted on a Dell Inspiron 3421 laptop with 4.00 GB RAM and an Intel(R) Core(TM) i5-3337U 1.80 GHz processor.

In the implementations of DFL and DFL BRTDP, the deflating operation is applied only once every five iterations [20, Sect. B.3]. Following this, our WP also solves the widest path problem (Line 8) only once every five iterations, while other operations are applied in each iteration.

For input SGs, we took four models from the literature: *mdsm* [11], *cloud* [6], *teamform* [12] and *investor* [22]. In addition, we used our model *manyECs*—an artificial model with many ECs—to assess the effect of ECs on performance. The model manyECs is presented in the appendix in [24]. Each of these five models comes with a model parameter N.

There is another model called *cdmsn* in [20]. We do not discuss cdmsn since all the algorithms (ours and those from [20]) terminated within 0.001 seconds.

*Results.* The number $i$ of iterations and the running time for each algorithm and each input SG are shown in Table 1. For DFL BRTDP, the ratio of states visited by the algorithm is shown as a percentage; the smaller it is, the more efficient the algorithm is in reducing the state space. Each number for DFL BRTDP (a probabilistic algorithm) is the average over 5 runs.

**Table 1.** Experimental results, comparing WP (our algorithm) with those in [20]. N is a model parameter (the bigger the more complex). #states, #trans, #EC show the numbers of states, transitions and ECs in the SG, respectively. itr is the number i of iterations at termination; time is the execution time in seconds. For each SG, the fastest algorithm is shaded in green. The settings that did not terminate are shaded in gray; TO is time out (6 h), OOM is out of memory, and SO is stack overflow.


*Discussion.* We observe a consistent performance advantage of our algorithm (WP). Even in the mdsm model, where the DFL algorithms do not suffer from EC computation (#EC is just 1), WP's performance is comparable to DFL. The cloud model is where the learning-based approach in [20] works well (see the visit% values, which are very small). Our WP performs comparably against DFL BRTDP, too.

The performance advantage of our WP algorithm is evident, not only in the artificial model of manyECs (where WP is faster by orders of magnitude), but also in the realistic model investor that comes from a financial application scenario [22]. The results for these two models suggest that WP is indeed advantageous when EC computation poses a bottleneck for other algorithms.

Overall, we observe that our WP algorithm can be the first choice when it comes to solving SGs: for some models, it runs much faster than other algorithms; for other models, even if the performance of the other algorithms differs considerably, WP's performance is comparable with that of the best algorithm.

#### **6 Conclusions and Future Work**

In this paper, we presented a new BVI algorithm for solving stochastic games. It features global propagation of upper bounds by widest paths, via a novel encoding of the problem to a suitable weighted graph. This way we avoid computation of end components that often penalizes the performance of the other BVI-based algorithms. Our experimental comparison with known BVI algorithms for SGs demonstrates the efficiency of our algorithm. For correctness of the algorithm, we presented proofs for soundness and convergence.

Extending the current algorithm to more advanced settings is future work, much as the results in [20] are extended and used in [2,3,16]. In doing so, we hope to make essential use of structures that are unique to those advanced problem settings. Another important direction is to push forward the idea of global propagation in verification and synthesis, seeking further instances of the idea. Finally, pursuing the global propagation idea in the context of reinforcement learning (where problems are often formalized using MDPs and the Bellman operator is heavily utilized) may open up another fruitful collaboration between formal methods and statistical machine learning.

**Acknowledgment.** The authors are supported by ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603), JST; I.H. is supported by Grant-in-Aid No. 15KT0012, JSPS. Thanks are due to Maximilian Weininger and Edon Kelmendi for sharing their implementation, and to Pranav Ashok and David Sprunger for useful discussions and comments.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Checking Qualitative Liveness Properties of Replicated Systems with Stochastic Scheduling**

Michael Blondin<sup>1</sup>, Javier Esparza<sup>2</sup>, Martin Helfrich<sup>2</sup>, Antonín Kučera<sup>3</sup>, and Philipp J. Meyer<sup>2(✉)</sup>

<sup>1</sup> Université de Sherbrooke, Sherbrooke, Canada michael.blondin@usherbrooke.ca

<sup>2</sup> Technical University of Munich, Munich, Germany {esparza,helfrich,meyerphi}@in.tum.de

<sup>3</sup> Masaryk University, Brno, Czechia tony@fi.muni.cz

**Abstract.** We present a sound and complete method for the verification of qualitative liveness properties of replicated systems under stochastic scheduling. These are systems consisting of a finite-state program, executed by an unknown number of indistinguishable agents, where the next agent to make a move is determined by the result of a random experiment. We show that if a property of such a system holds, then there is always a witness in the shape of a *Presburger stage graph*: a finite graph whose nodes are Presburger-definable sets of configurations. Due to the high complexity of the verification problem (non-elementary), we introduce an incomplete procedure for the construction of Presburger stage graphs, and implement it on top of an SMT solver. The procedure makes extensive use of the theory of well-quasi-orders, and of the structural theory of Petri nets and vector addition systems. We apply our results to a set of benchmarks, in particular to a large collection of population protocols, a model of distributed computation extensively studied by the distributed computing community.

**Keywords:** Parameterized verification · Liveness · Stochastic systems

## **1 Introduction**

Replicated systems consist of a fully symmetric finite-state program executed by an unknown number of indistinguishable agents, communicating by rendez-vous or via shared variables [14,16,41,46]. Examples include distributed protocols and multithreaded programs, or abstractions thereof. The communication graph of replicated systems is a clique. They are a special class of *parameterized systems*, i.e., infinite families of systems that admit a finite description in some suitable modeling language. In the case of replicated systems, the (only) parameter is the number of agents executing the program.

Michael Blondin is supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC) and by the Fonds de recherche du Québec – Nature et technologies (FRQNT). Javier Esparza, Martin Helfrich and Philipp J. Meyer have received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme under grant agreement No 787367 (PaVeS). Antonín Kučera is supported by the Czech Science Foundation, grant No. 18-11193S.

Verifying a replicated system amounts to proving that an infinite family of systems satisfies a given property. This is already a formidable challenge, made even harder by the fact that we want to verify liveness (more difficult than safety) against stochastic schedulers. Loosely speaking, stochastic schedulers select the set of agents that should execute the next action as the result of a random experiment. Stochastic scheduling often appears in distributed protocols, and in particular also in population protocols—a model much studied in distributed computing with applications in computational biology<sup>1</sup>—that supplies many of our case studies [9,58]. Under stochastic scheduling, the semantics of a replicated system is an infinite family of finite-state Markov chains. In this work, we study *qualitative* liveness properties, stating that the infinite runs starting at configurations of the system satisfying a precondition almost surely reach and stay in configurations satisfying a postcondition. In this case, whether the property holds or not depends only on the topology of the Markov chains, and not on the concrete probabilities.

We introduce a formal model of replicated systems, based on multiset rewriting, where processes can communicate by shared variables or multiway synchronization. We present a sound and complete verification method called *Presburger stage graphs*. A Presburger stage graph is a directed acyclic graph with Presburger formulas as nodes. A formula represents a possibly infinite inductive set of configurations, i.e., a set of configurations closed under reachability. A node $S$ (which we identify with the set of configurations it represents) has the following property: a run starting at any configuration of $S$ almost surely reaches some configuration of some successor $S'$ of $S$, and, since $S'$ is inductive, gets trapped in $S'$. A stage graph labels the node $S$ with a witness of this property in the form of a *Presburger certificate*, a sort of ranking function expressible in Presburger arithmetic. The completeness of the technique, i.e., the fact that for every property of the replicated system that holds there exists a stage graph proving it, follows from deep results of the theory of vector addition systems (VASs) [52–54].

Unfortunately, the theory of VASs also shows that, while the verification problems we consider are decidable, they have non-elementary computational complexity [33]. As a consequence, verification techniques that systematically explore the space of possible stage graphs for a given property are bound to be very inefficient. For this reason, we design an incomplete but efficient algorithm for the computation of stage graphs. Inspired by theoretical results, the algorithm combines a solver for linear constraints with some elements of the theory of well-structured systems [2,39]. We report on the performance of this algorithm for a large number of case studies. In particular, the algorithm automatically verifies many standard population protocols described in the literature [5,8,20,22,23,28,31], as well as liveness properties of distributed algorithms for leader election and mutual exclusion [3,40,42,44,50,59,61,64].

<sup>1</sup> Under the name of *chemical reaction networks*.

*Related Work.* The parameterized verification of replicated systems was first studied in [41], where they were modeled as counter systems. This allows one to apply many efficient techniques [11,24,37,47]. Most of these works are inherently designed for safety properties, and some can also handle fair termination [38], but none of them handles stochastic scheduling. To the best of our knowledge, the only works studying parameterized verification of liveness properties under our notion of stochastic scheduling are those on verification of population protocols. For *fixed* populations, protocols can be verified with standard probabilistic model checking [13,65], and early works follow this approach [28,31,60,63]. Subsequently, an algorithm and a tool for the *parameterized* verification of population protocols were described in [21,22], and a first version of stage graphs was introduced in [23] for analyzing the expected termination time of population protocols. In this paper we overhaul the framework of [23] for liveness verification, drawing inspiration from the safety verification technology of [21,22]. Compared to [21,22], our approach is not limited to a specific subclass of protocols, and captures models beyond population protocols. Furthermore, our new techniques for computing Presburger certificates subsume the procedure of [22]. In comparison to [23], we provide the first completeness and complexity results for stage graphs. Further, our stage graphs can prove correctness of population protocols and even more general liveness properties, while those of [23] can only prove termination. We also introduce novel techniques for computing stage graphs, which compared to [23] can greatly reduce their size and allow us to prove more examples correct.

There is also a large body of work on parameterized verification via cutoff techniques: one shows that a specification holds for any number of agents iff it holds for any number of agents below some threshold called the cutoff (see [6,26,30,34,46], and [16] for a comprehensive survey). Cut-off techniques can be applied to systems with an array or ring communication structure, but they require the existence and effectiveness of a cutoff, which is not the case in our setting. Further parameterized verification techniques are regular model checking [1,25] and automata learning [7]. The classes of communication structures they can handle are orthogonal to ours: arrays and rings for regular model checking and automata learning, and cliques in our work. Regular model checking and learning have recently been employed to verify safety properties [29], liveness properties under arbitrary schedulers [55] and termination under finitary fairness [51]. The classes of schedulers considered in [51,55] are incomparable to ours: arbitrary schedulers in [55], and finitary-fair schedulers in [51]. Further, these works are based on symbolic state-space exploration, while our techniques are based on automatic construction of invariants and ranking functions [16].

#### **2 Preliminaries**

Let $\mathbb{N}$ denote $\{0, 1, \ldots\}$ and let $E$ be a finite set. An *unordered vector* over $E$ is a mapping $V \colon E \to \mathbb{Z}$. In particular, a *multiset* over $E$ is an unordered vector $M \colon E \to \mathbb{N}$ where $M(e)$ denotes the number of occurrences of $e$ in $M$. The sets of all unordered vectors and multisets over $E$ are respectively denoted $\mathbb{Z}^E$ and $\mathbb{N}^E$. Vector addition, subtraction and comparison are defined componentwise. The *size* of a multiset $M$ is denoted $|M| = \sum_{e \in E} M(e)$. We let $E^{\langle k \rangle}$ denote the set of all multisets over $E$ of size $k$. We sometimes describe multisets using a set-like notation, e.g. $M = ⟅f, g, g⟆$ or equivalently $M = ⟅f, 2 \cdot g⟆$ is such that $M(f) = 1$, $M(g) = 2$ and $M(e) = 0$ for all $e \notin \{f, g\}$.
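As a small illustration of these definitions, multisets over a finite set can be sketched with Python's `collections.Counter`. The helper names below (`size`, `leq`, `add`, `multisets_of_size`) are hypothetical and only mirror the notation of this section.

```python
from collections import Counter
from itertools import combinations_with_replacement

def size(m):
    """|M|: the total number of occurrences in the multiset M."""
    return sum(m.values())

def leq(a, b, elements):
    """Componentwise comparison a <= b over the finite set of elements."""
    return all(a[e] <= b[e] for e in elements)

def add(a, b, elements):
    """Componentwise addition of two multisets."""
    return Counter({e: a[e] + b[e] for e in elements})

def multisets_of_size(elements, k):
    """All multisets over elements of size exactly k."""
    return [Counter(c) for c in combinations_with_replacement(sorted(elements), k)]

E = {"f", "g", "h"}
M = Counter({"f": 1, "g": 2})       # the multiset with one f and two g
PAIRS = multisets_of_size(E, 2)     # all multisets over E of size 2
```

A `Counter` returns 0 for elements that do not occur, which matches the convention $M(e) = 0$ for every element outside the listed ones.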

*Presburger Arithmetic.* Let $X$ be a set of variables. The set of formulas of *Presburger arithmetic* over $X$ is the result of closing atomic formulas, as defined in the next sentence, under Boolean operations and first-order existential quantification. Atomic formulas are of the form $\sum_{i=1}^{k} a_i x_i \sim b$, where $a_i$ and $b$ are integers, $x_i$ are variables and $\sim$ is either $<$ or $\equiv_m$, the latter denoting the congruence modulo $m$ for any $m \ge 2$. Formulas over $X$ are interpreted on $\mathbb{N}^X$. Given a formula $\phi$ of Presburger arithmetic, we let $[\![\phi]\!]$ denote the set of all multisets satisfying $\phi$. A set $E \subseteq \mathbb{N}^X$ is a *Presburger set* if $E = [\![\phi]\!]$ for some formula $\phi$.
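To see how atomic formulas of this shape are evaluated on a multiset, consider the following minimal sketch. It covers only quantifier-free combinations and is not how the paper's tool handles formulas (that is delegated to an SMT solver). The function `atom`, the relation tags and the variable names are hypothetical.

```python
def atom(coeffs, rel, b, modulus=None):
    """Build a predicate for sum(a_i * x_i) ~ b, where ~ is '<' or 'mod'.

    coeffs maps variable names to integer coefficients. With rel == 'mod',
    the predicate checks congruence to b modulo the given modulus.
    """
    def holds(valuation):
        total = sum(a * valuation.get(x, 0) for x, a in coeffs.items())
        if rel == "<":
            return total < b
        if rel == "mod":
            return (total - b) % modulus == 0
        raise ValueError("unknown relation: " + rel)
    return holds

# Over the naturals, "x = 0 and y = 0" can be written as x + y < 1,
# and "x > y" can be written as y - x < 0.
both_zero = atom({"x": 1, "y": 1}, "<", 1)
x_greater = atom({"y": 1, "x": -1}, "<", 0)
even_sum = atom({"x": 1, "y": 1}, "mod", 0, modulus=2)

def conj(*preds):
    """Boolean conjunction of predicates."""
    return lambda valuation: all(p(valuation) for p in preds)

x_greater_and_even = conj(x_greater, even_sum)
```

Since the variables range over the naturals, equalities and non-strict inequalities need no extra atomic relation: they can be rewritten with $<$ and Boolean operations, as the comments in the sketch indicate.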

#### **2.1 Replicated Systems**

A *replicated system* over $Q$ of arity $n$ is a tuple $\mathcal{P} = (Q, T)$, where $T \subseteq \bigcup_{k=0}^{n} Q^{\langle k \rangle} \times Q^{\langle k \rangle}$ is a *transition relation* containing the set of *silent* transitions $\bigcup_{k=0}^{n} \{(\boldsymbol{x}, \boldsymbol{x}) \mid \boldsymbol{x} \in Q^{\langle k \rangle}\}$<sup>2</sup>. A *configuration* is a multiset $C$ of states, which we interpret as a global state with $C(q)$ agents in each state $q \in Q$.

For every $t = (\boldsymbol{x}, \boldsymbol{y}) \in T$ with $\boldsymbol{x} = ⟅X_1, X_2, \ldots, X_k⟆$ and $\boldsymbol{y} = ⟅Y_1, Y_2, \ldots, Y_k⟆$, we write $X_1 X_2 \cdots X_k \mapsto Y_1 Y_2 \cdots Y_k$ and let ${}^\bullet t \stackrel{\text{def}}{=} \boldsymbol{x}$, $t^\bullet \stackrel{\text{def}}{=} \boldsymbol{y}$ and $\Delta(t) \stackrel{\text{def}}{=} t^\bullet - {}^\bullet t$. A transition $t$ is *enabled* at a configuration $C$ if $C \ge {}^\bullet t$ and, if so, can *occur*, leading to the configuration $C' = C + \Delta(t)$. If $t$ is not enabled at $C$, then we say that it is *disabled*. We use the following reachability notation:

$$\begin{aligned}
C \xrightarrow{t} C' &\iff t \text{ is enabled at } C \text{ and its occurrence leads to } C', \\
C \rightarrow C' &\iff C \xrightarrow{t} C' \text{ for some } t \in T, \\
C \xrightarrow{w} C' &\iff C = C_0 \xrightarrow{w_1} C_1 \cdots \xrightarrow{w_n} C_n = C' \text{ for some } C_0, C_1, \ldots, C_n \in \mathbb{N}^Q, \\
C \xrightarrow{*} C' &\iff C \xrightarrow{w} C' \text{ for some } w \in T^*.
\end{aligned}$$

Observe that, by definition of transitions, $C \rightarrow C'$ implies $|C| = |C'|$, and likewise for $C \xrightarrow{*} C'$. Intuitively, transitions cannot create or destroy agents.

A *run* is an infinite sequence $C_0 t_1 C_1 t_2 C_2 \cdots$ such that $C_i \xrightarrow{t_{i+1}} C_{i+1}$ for every $i \ge 0$. Given $L \subseteq T^*$ and a set of configurations $\mathcal{C}$, we let
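The enabledness condition and the effect of a transition can be sketched as follows. The encoding is an assumption made for illustration: a transition is a pair of tuples of states (its pre- and post-multiset), a configuration is a `Counter`, and the state names are hypothetical.

```python
from collections import Counter

def enabled(config, t):
    """t is enabled at config iff config >= pre(t) componentwise."""
    need = Counter(t[0])
    return all(config[q] >= n for q, n in need.items())

def delta(t, states):
    """The effect of t: post(t) - pre(t), as a dict that may hold negatives."""
    pre, post = Counter(t[0]), Counter(t[1])
    return {q: post[q] - pre[q] for q in states}

def fire(config, t, states):
    """Return config + delta(t); raise if t is disabled at config."""
    if not enabled(config, t):
        raise ValueError("transition is disabled")
    d = delta(t, states)
    return Counter({q: config[q] + d[q] for q in states})

STATES = ["A", "B", "C"]
T1 = (("A", "B"), ("C", "C"))       # hypothetical transition A B -> C C
C0 = Counter({"A": 2, "B": 1})
C1 = fire(C0, T1, STATES)
```

Because both sides of a transition have the same size $k$, the entries of `delta` sum to zero, which is why the number of agents is preserved by `fire`.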

$$\begin{aligned}
\operatorname{post}_{L}(\mathcal{C}) &\stackrel{\text{def}}{=} \{ C' : C \in \mathcal{C}, w \in L, C \xrightarrow{w} C' \}, & \operatorname{post}^*(\mathcal{C}) &\stackrel{\text{def}}{=} \operatorname{post}_{T^*}(\mathcal{C}), \\
\operatorname{pre}_{L}(\mathcal{C}) &\stackrel{\text{def}}{=} \{ C : C' \in \mathcal{C}, w \in L, C \xrightarrow{w} C' \}, & \operatorname{pre}^*(\mathcal{C}) &\stackrel{\text{def}}{=} \operatorname{pre}_{T^*}(\mathcal{C}).
\end{aligned}$$

<sup>2</sup> In the paper, we will omit the silent transitions when giving replicated systems.

*Stochastic Scheduling.* We assume that, given a configuration $C$, a probabilistic scheduler picks one of the transitions enabled at $C$. We only make the following two assumptions about the random experiment determining the transition: first, the probability of a transition depends only on $C$, and, second, every transition enabled at $C$ has a nonzero probability of occurring. Since $C \xrightarrow{*} C'$ implies $|C| = |C'|$, the number of configurations reachable from any configuration $C$ is finite. Thus, for every configuration $C$, the semantics of $\mathcal{P}$ from $C$ is a finite-state Markov chain rooted at $C$.

*Example 1.* Consider the replicated system $\mathcal{P} = (Q, T)$ of arity 2 with states $Q = \{\mathcal{A}_Y, \mathcal{A}_N, \mathcal{P}_Y, \mathcal{P}_N\}$ and transitions $T = \{t_1, t_2, t_3, t_4\}$, where
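The finiteness argument can be checked on a toy system by computing the set of reachable configurations with a breadth-first search. The sketch below assumes a hypothetical two-state system; configurations are encoded as sorted tuples of states so that they can be stored in sets, and silent transitions are omitted since they do not add new configurations.

```python
from collections import Counter, deque
from math import comb

def successors(config, transitions):
    """All configurations reachable from config in one non-silent step."""
    have = Counter(config)
    result = set()
    for pre, post in transitions:
        need = Counter(pre)
        if all(have[q] >= n for q, n in need.items()):
            nxt = have - need + Counter(post)
            result.add(tuple(sorted(nxt.elements())))
    return result

def post_star(start, transitions):
    """post*({start}): every configuration reachable from start."""
    start = tuple(sorted(start))
    seen = {start}
    queue = deque([start])
    while queue:
        c = queue.popleft()
        for d in successors(c, transitions):
            if d not in seen:
                seen.add(d)
                queue.append(d)
    return seen

# Hypothetical system over states X and Y.
TRANSITIONS = [
    (("X", "X"), ("X", "Y")),       # X X -> X Y
    (("Y", "Y"), ("X", "Y")),       # Y Y -> X Y
]
REACHABLE = post_star(("X", "X", "X"), TRANSITIONS)
BOUND = comb(2 + 3 - 1, 3)          # number of multisets of size 3 over 2 states
```

Every reachable configuration has the same size as the initial one, so the search is confined to the finitely many multisets of that size and always terminates.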

$$\begin{aligned} t\_1 &\colon \mathcal{A}\_Y \mathcal{A}\_N \mapsto \mathcal{P}\_Y \mathcal{P}\_N, &\qquad t\_2 &\colon \mathcal{A}\_Y \mathcal{P}\_N \mapsto \mathcal{A}\_Y \mathcal{P}\_Y, \\ t\_3 &\colon \mathcal{A}\_N \mathcal{P}\_Y \mapsto \mathcal{A}\_N \mathcal{P}\_N, &\qquad t\_4 &\colon \mathcal{P}\_Y \mathcal{P}\_N \mapsto \mathcal{P}\_N \mathcal{P}\_N. \end{aligned}$$

Intuitively, at every moment in time, agents are either *Active* or *Passive*, and have output *Yes* or *No*, which corresponds to the four states of $Q$. This system is designed to satisfy the following property: for every configuration $C$ in which all agents are initially active, i.e., $C$ satisfies $C(\mathcal{P}_Y) = C(\mathcal{P}_N) = 0$, if $C(\mathcal{A}_Y) > C(\mathcal{A}_N)$, then eventually all agents stay forever in the "yes" states $\{\mathcal{A}_Y, \mathcal{P}_Y\}$, and otherwise all agents eventually stay forever in the "no" states $\{\mathcal{A}_N, \mathcal{P}_N\}$.

#### **2.2 Qualitative Model Checking**

Let us fix a replicated system $\mathcal{P} = (Q, T)$. Formulas of *linear temporal logic (LTL)* on $\mathcal{P}$ are defined by the following grammar:

$$\varphi ::= \phi \mid \neg \varphi \mid \varphi \lor \varphi \mid \varphi \land \varphi \mid \mathbf{X} \varphi \mid \varphi \mathbf{U} \varphi$$

where $\phi$ is a Presburger formula over $Q$. We look at $\phi$ as an atomic proposition over the set $\mathbb{N}^Q$ of configurations. Formulas of LTL are interpreted over runs of $\mathcal{P}$ in the standard way. We abbreviate $\Diamond\varphi \equiv \mathit{true}\ \mathbf{U}\ \varphi$ and $\Box\varphi \equiv \neg\Diamond\neg\varphi$.

Let us now introduce the probabilistic interpretation of LTL. A configuration $C$ of $\mathcal{P}$ satisfies an LTL formula $\varphi$ *with probability* $p$ if $\Pr[C, \varphi] = p$, where $\Pr[C, \varphi]$ denotes the probability of the set of runs of $\mathcal{P}$ starting at $C$ that satisfy $\varphi$ in the finite-state Markov chain rooted at $C$. The measurability of this set of runs for every $C$ and $\varphi$ follows from well-known results [65]. The *qualitative model checking problem* consists of, given an LTL formula $\varphi$ and a set of configurations $\mathcal{I}$, deciding whether $\Pr[C, \varphi] = 1$ for every $C \in \mathcal{I}$. We will often work with the complement problem, i.e., deciding whether $\Pr[C, \neg\varphi] > 0$ for some $C \in \mathcal{I}$.

In contrast to the action-based qualitative model checking problem of [35], our version of the problem is undecidable due to adding atomic propositions over configurations (see the full version of the paper [19] for a proof):

**Theorem 1.** *The qualitative model checking problem is not semi-decidable.*

It is known that qualitative model checking problems of finite-state probabilistic systems reduce to model checking of non-probabilistic systems under an adequate notion of fairness.

**Definition 1.** *A run of a replicated system* $\mathcal{P}$ *is* fair *if for every possible step* $C \xrightarrow{t} C'$ *of* $\mathcal{P}$ *the following holds: if the run contains infinitely many occurrences of* $C$*, then it also contains infinitely many occurrences of* $C\,t\,C'$*.*

So, intuitively, if a run can execute a step infinitely often, it eventually will. It is readily seen that a fair run of a finite-state transition system eventually gets "trapped" in one of its bottom strongly connected components, and visits each of its states infinitely often. Hence, fair runs of a finite-state Markov chain have probability one. The following proposition was proved in [35] for a model slightly less general than replicated systems; the proof can be generalized without effort:

**Proposition 1 (**[35, Prop. 7]**).** *Let* <sup>P</sup> *be a replicated system, let* C *be a configuration of* <sup>P</sup>*, and let* ϕ *be an LTL formula. It is the case that* Pr[C, ϕ]=1 *iff every fair run of* <sup>P</sup> *starting at* C *satisfies* ϕ*.*

We implicitly use this proposition from now on. In particular, we define:

**Definition 2.** *A configuration* C satisfies ϕ *with probability 1, or just* satisfies ϕ*, if every fair run starting at* C *satisfies* ϕ*, denoted by* C ⊨ ϕ*. We let* ⟦ϕ⟧ *denote the set of configurations satisfying* ϕ*. A set* 𝒞 *of configurations* satisfies ϕ *if* 𝒞 ⊆ ⟦ϕ⟧*, i.e., if* C ⊨ ϕ *for every* C ∈ 𝒞*.*

*Liveness Specifications for Replicated Systems.* We focus on a specific class of temporal properties for which the qualitative model checking problem is decidable and which is large enough to formalize many important specifications. Using well-known automata-theoretic technology, this class can also be used to verify all properties describable in action-based LTL, see e.g. [35].

A *stable termination property* is given by a pair Π = (ϕ<sub>pre</sub>, Φ<sub>post</sub>), where Φ<sub>post</sub> = {ϕ<sup>1</sup><sub>post</sub>, ..., ϕ<sup>k</sup><sub>post</sub>} and ϕ<sub>pre</sub>, ϕ<sup>1</sup><sub>post</sub>, ..., ϕ<sup>k</sup><sub>post</sub> are Presburger formulas over Q describing sets of configurations. Whenever k = 1, we sometimes simply write Π = (ϕ<sub>pre</sub>, ϕ<sub>post</sub>). The pair Π induces the LTL property

$$
\varphi_{\Pi} \stackrel{\text{def}}{=} \bigvee_{i=1}^{k} \Diamond\Box\, \varphi_{\text{post}}^{i}.
$$

Abusing language, we say that a replicated system 𝒫 *satisfies* Π if ⟦ϕ<sub>pre</sub>⟧ ⊆ ⟦ϕ<sub>Π</sub>⟧, that is, if every configuration C satisfying ϕ<sub>pre</sub> satisfies ϕ<sub>Π</sub> with probability 1. The *stable termination problem* is the qualitative model checking problem for ℐ = ⟦ϕ<sub>pre</sub>⟧ and ϕ = ϕ<sub>Π</sub> given by a stable termination property Π = (ϕ<sub>pre</sub>, Φ<sub>post</sub>).

*Example 2.* Let us reconsider the system from Example 1. We can formally specify that all agents will eventually agree on the majority output *Yes* or *No*. Let Π<sup>Y</sup> = (ϕ<sup>Y</sup><sub>pre</sub>, ϕ<sup>Y</sup><sub>post</sub>) and Π<sup>N</sup> = (ϕ<sup>N</sup><sub>pre</sub>, ϕ<sup>N</sup><sub>post</sub>) be defined by:

$$\begin{aligned} \varphi\_{\text{pre}}^{\text{Y}} &= (\text{A}\_{\text{Y}} > \text{A}\_{\text{N}} \land \text{P}\_{\text{Y}} + \text{P}\_{\text{N}} = 0), & \varphi\_{\text{post}}^{\text{Y}} &= (\text{A}\_{\text{N}} + \text{P}\_{\text{N}} = 0), \\ \varphi\_{\text{pre}}^{\text{N}} &= (\text{A}\_{\text{Y}} \le \text{A}\_{\text{N}} \land \text{P}\_{\text{Y}} + \text{P}\_{\text{N}} = 0), & \varphi\_{\text{post}}^{\text{N}} &= (\text{A}\_{\text{Y}} + \text{P}\_{\text{Y}} = 0). \end{aligned}$$

The system satisfies the property specified in Example 1 iff it satisfies Π<sup>Y</sup> and Π<sup>N</sup>. As an alternative (weaker) property, we could specify that the system always stabilizes to either output by Π = (ϕ<sup>Y</sup><sub>pre</sub> ∨ ϕ<sup>N</sup><sub>pre</sub>, {ϕ<sup>Y</sup><sub>post</sub>, ϕ<sup>N</sup><sub>post</sub>}).

#### **3 Stage Graphs**

In the rest of the paper, we fix a replicated system 𝒫 = (Q, T) and a stable termination property Π = (ϕ<sub>pre</sub>, Φ<sub>post</sub>), where Φ<sub>post</sub> = {ϕ<sup>1</sup><sub>post</sub>, ..., ϕ<sup>k</sup><sub>post</sub>}, and address the problem of checking whether 𝒫 satisfies Π. We start with some basic definitions on sets of configurations.

#### **Definition 3 (inductive sets, leads to, certificates)**


Note that certificates only require the existence of some execution decreasing f, not that all executions decrease it. Despite this, we have:

**Proposition 2.** *For all inductive sets* 𝒞, 𝒞′ *of configurations, it is the case that:* 𝒞 *leads to* 𝒞′ *iff there exists a certificate for* 𝒞 ⇝ 𝒞′*.*

The proof, which can be found in the full version [19], depends on two properties of replicated systems with stochastic scheduling. First, every configuration has only finitely many descendants. Second, for every fair run and for every finite execution C <sup>w</sup>→ C′, if C appears infinitely often in the run, then the run contains infinitely many occurrences of C <sup>w</sup>→ C′. We can now introduce stage graphs:

**Definition 4 (stage graph).** *<sup>A</sup>* stage graph *of* <sup>P</sup> *for the property* Π *is a directed acyclic graph whose nodes, called* stages*, are sets of configurations satisfying the following conditions:*


The existence of a stage graph implies that 𝒫 satisfies Π. Indeed, by conditions 2–3 and repeated application of Proposition 2, every fair run starting at a configuration of ⟦ϕ<sub>pre</sub>⟧ eventually reaches a terminal stage, say 𝒞, and, by condition 1, stays in 𝒞 forever. Since, by condition 4, all configurations of 𝒞 satisfy some ϕ<sup>i</sup><sub>post</sub>, every configuration of the run after its first visit to 𝒞 satisfies ϕ<sup>i</sup><sub>post</sub>.

*Example 3.* Figure 1 depicts stage graphs for the system of Example 1 and the properties defined in Example 2. The reader can easily show that every stage 𝒞 is inductive by checking that for every C ∈ 𝒞 and every transition t ∈ {t<sub>1</sub>, ..., t<sub>4</sub>} enabled at C, the step C <sup>t</sup>→ C′ satisfies C′ ∈ 𝒞. For example, if a configuration satisfies A<sub>Y</sub> > A<sub>N</sub>, so does any successor configuration.

**Fig. 1.** Stage graphs for the system of Example 1.

The following proposition shows that stage graphs are a sound and complete technique for proving stable termination properties.

**Proposition 3.** *System* 𝒫 *satisfies* Π *iff it has a stage graph for* Π*.*

Proposition 3 does not tell us anything about the decidability of the stable termination problem. To prove that the problem is decidable, we introduce Presburger stage graphs. Intuitively these are stage graphs whose stages and certificates can be expressed by formulas of Presburger arithmetic.

#### **Definition 5 (Presburger stage graphs)**


Using a powerful result from [36], we show that: (1) 𝒫 satisfies Π iff it has a Presburger stage graph for Π (Theorem 2); (2) there exists a denumerable set of candidates for a Presburger stage graph for Π; and (3) there is an algorithm that decides whether a given candidate is a Presburger stage graph for Π (Theorem 3). Together, (1–3) show that the stable termination problem is semi-decidable. To obtain decidability, we observe that the complement of the stable termination problem is also semi-decidable. Indeed, it suffices to enumerate all initial configurations C ⊨ ϕ<sub>pre</sub>, build for each such C the (finite) graph G<sub>C</sub> of configurations reachable from C, and check if some bottom strongly connected component B of G<sub>C</sub> satisfies B ⊭ ϕ<sup>i</sup><sub>post</sub> for all i. This is the case iff some fair run starting at C visits and stays in B, which in turn is the case iff 𝒫 violates Π.
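The check for a single initial configuration can be sketched in Python. This is only an illustration under simplifying assumptions that are not part of the paper: configurations are tuples of agent counts, a transition is a hypothetical pair `(pre, post)` of such tuples, a configuration without enabled transitions forms a one-element bottom component, and postconditions are plain Python predicates instead of Presburger formulas. All function names are illustrative.

```python
from collections import deque

def successors(config, transitions):
    """Configurations reachable from `config` in exactly one step."""
    result = []
    for pre, post in transitions:
        if all(c >= p for c, p in zip(config, pre)):  # transition enabled
            result.append(tuple(c - p + q for c, p, q in zip(config, pre, post)))
    return result

def reachable(start, transitions):
    """All configurations reachable from `start` (finite for replicated systems)."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in successors(current, transitions):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def bottom_sccs(start, transitions):
    """Bottom strongly connected components of the graph reachable from `start`."""
    nodes = reachable(start, transitions)
    reach = {v: reachable(v, transitions) for v in nodes}
    # v lies in a bottom SCC iff everything reachable from v can reach v again.
    return {frozenset(reach[v]) for v in nodes
            if all(v in reach[w] for w in reach[v])}

def violates(start, transitions, posts):
    """True iff some bottom SCC falsifies every postcondition at some configuration."""
    return any(all(any(not post(c) for c in bscc) for post in posts)
               for bscc in bottom_sccs(start, transitions))

# System P2 of Sect. 7.3 over states (A, B): t4: A B -> A A, t5: A -> B.
P2 = [((1, 1), (2, 0)), ((1, 0), (0, 1))]
```

For instance, `violates((1, 1), P2, [lambda c: c[1] == 0])` reports a violation, because the only bottom component reachable from the configuration with one agent in A and one in B consists of the configuration with both agents in B. A quadratic reachability computation is used here for brevity; a linear-time SCC algorithm would be the natural choice for larger graphs.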

**Theorem 2.** *System* <sup>P</sup> *satisfies* Π *iff it has a Presburger stage graph for* Π*.*

We observe that testing whether a given graph is a Presburger stage graph reduces to Presburger arithmetic satisfiability, which is decidable [62] and whose complexity lies between 2-NEXP and 2-EXPSPACE [15]:

**Theorem 3.** *The problem of deciding whether an acyclic graph of Presburger sets and Presburger certificates is a Presburger stage graph, for a given stable termination property, is reducible in polynomial time to the satisfiability problem for Presburger arithmetic.*

#### **4 Algorithmic Construction of Stage Graphs**

At the current state of our knowledge, the decision procedure derived from Theorem 3 has little practical relevance. From a theoretical point of view, the TOWER-hardness result of [33] implies that the stage graph may have non-elementary size in the system size. In practice, systems have relatively small stage graphs, but, even so, the enumeration of all candidates immediately leads to a prohibitive combinatorial explosion.

For this reason, we present a procedure to automatically *construct* (not guess) a Presburger stage graph G for a given replicated system <sup>P</sup> and a stable termination property Π = (ϕpre, Φ*post*). The procedure may *fail*, but, as shown in the experimental section, it succeeds for many systems from the literature.

The procedure is designed to be implemented on top of a solver for the existential fragment of Presburger arithmetic. While every formula of Presburger arithmetic has an equivalent formula within the existential fragment [32,62], quantifier elimination may lead to a doubly-exponential blow-up in the size of the formula. Thus, it is important to emphasize that our procedure *never requires eliminating quantifiers*: if the pre- and postconditions of Π are supplied as quantifier-free formulas, then all constraints of the procedure remain in the existential fragment.

We give a high-level view of the procedure (see Algorithm 1), which uses several functions, described in detail in the rest of the paper. The procedure maintains a workset *WS* of Presburger stages, represented by existential Presburger formulas. Initially, the only stage is an inductive Presburger overapproximation *PotReach*(ϕpre) of the configurations reachable from ϕpre (*PotReach* is an abbreviation for "potentially reachable"). Notice that we must necessarily use an overapproximation, since *post*∗(ϕpre) is not always expressible in Presburger arithmetic<sup>3</sup>. We use a refinement of the overapproximation introduced in [22,37], equivalent to the overapproximation of [24].

In its main loop (lines 2–9), Algorithm 1 picks a Presburger stage 𝒮 from the workset and processes it. First, it calls *Terminal*(𝒮, Φ<sub>post</sub>) to check if 𝒮 is terminal, i.e., whether 𝒮 ⊨ ϕ<sup>i</sup><sub>post</sub> for some ϕ<sup>i</sup><sub>post</sub> ∈ Φ<sub>post</sub>. This reduces to checking the unsatisfiability of the existential Presburger formula φ ∧ ¬ϕ<sup>i</sup><sub>post</sub>, where φ is the formula characterizing 𝒮. If 𝒮 is not terminal, then the procedure attempts to construct successor stages in lines 5–9, with the help of three further functions: *AsDead*, *IndOverapprox*, and *Split*. In the rest of this section, we present the intuition behind lines 5–9, and the specification of the three functions. Sections 5, 6 and 7 present the implementations we use for these functions.

<sup>3</sup> This follows easily from the fact that *post*∗(ψ) is not always expressible in Presburger arithmetic for vector addition systems, even if ψ denotes a single configuration [43].

#### **Algorithm 1:** procedure for the construction of stage graphs.

**Input**: replicated system 𝒫 = (Q, T), stable term. property Π = (ϕ<sub>pre</sub>, Φ<sub>post</sub>)

**Result**: a stage graph of 𝒫 for Π

```
1 WS ← {PotReach(ϕpre)}
2 while WS ≠ ∅ do
3   remove 𝒮 from WS
4   if ¬Terminal(𝒮, Φpost) then
5     U ← AsDead(𝒮)
6     if U ≠ ∅ then
7       WS ← WS ∪ {IndOverapprox(𝒮, U)}
8     else
9       WS ← WS ∪ Split(𝒮)
```
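The worklist loop of Algorithm 1 can be sketched as a higher-order Python function. This is a schematic rendering, not the implementation evaluated in the paper: the five callbacks stand for *PotReach*(ϕ<sub>pre</sub>), *Terminal*, *AsDead*, *IndOverapprox* and *Split*, their Python names and signatures are hypothetical, stages are opaque values, and both the recording of edges and the exception raised when no successor can be built are our assumptions about how a failure of the procedure might be reported.

```python
def build_stage_graph(initial_stage, is_terminal, as_dead, ind_overapprox, split):
    """Run the worklist loop and return the list of edges (stage, successor)."""
    workset = [initial_stage]           # WS <- {PotReach(phi_pre)}
    edges = []
    while workset:                      # while WS is not empty
        stage = workset.pop()           # remove S from WS
        if is_terminal(stage):          # terminal stages get no successors
            continue
        eventually_dead = as_dead(stage)
        if eventually_dead:             # U is nonempty: one successor stage
            children = [ind_overapprox(stage, eventually_dead)]
        else:                           # U is empty: try to split the stage
            children = list(split(stage))
            if not children:
                # Assumption: signal failure when neither option applies.
                raise RuntimeError("no successor stage could be constructed")
        for child in children:
            edges.append((stage, child))
            workset.append(child)
    return edges
```

Termination of this loop is not guaranteed by the sketch itself; it relies on each successor having strictly more dead transitions than its parent, as required of the three functions in the text.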

Lines 5–9 are inspired by the behavior of most replicated systems designed by humans, and are based on the notion of *dead* transitions, which can never occur again (to be formally defined below). Replicated systems are usually designed to run in *phases*. Initially, all transitions are alive, and the end of a phase is marked by the "death" of one or more transitions, i.e., by reaching a configuration at which these transitions are dead. The system keeps "killing transitions" until no transition that is still alive can lead to a configuration violating the postcondition. The procedure mimics this pattern. It constructs stage graphs in which, if 𝒮′ is a successor of 𝒮, then the set of transitions dead at 𝒮′ is a *proper superset* of the set of transitions dead at 𝒮. For this, *AsDead*(𝒮) computes a set of transitions that are alive at some configuration of 𝒮, but which will become dead in every fair run starting at 𝒮 (line 5). Formally, *AsDead*(𝒮) returns a set U ⊆ T \ *Dead*(𝒮) such that 𝒮 ⊨ ♦dead(U), defined as follows.

**Definition 6.** *A transition of a replicated system* <sup>P</sup> *is* dead *at a configuration* C *if it is disabled at every configuration reachable from* C *(including* C *itself ). A transition is* dead *at a stage* S *if it is dead at every configuration of* S*. Given a stage* <sup>S</sup> *and a set* U *of transitions, we use the following notations:*


Observe that we can compute *Dead*(𝒮) by checking unsatisfiability of a sequence of existential Presburger formulas: as 𝒮 is inductive, we have *Dead*(𝒮) = {t | 𝒮 ⊨ dis(t)}, and 𝒮 ⊨ dis(t) holds iff the existential Presburger formula ∃C : φ(C) ∧ C ≥ <sup>•</sup>t is unsatisfiable, where φ is the formula characterizing 𝒮.

The following proposition, whose proof appears in the full version [19], shows that determining whether a given transition will eventually become dead, while decidable, is PSPACE-hard. Therefore, Sect. 7 describes two implementations of this function, and a way to combine them, which exhibit a good trade-off between precision and computation time.

**Proposition 4.** *Given a replicated system* 𝒫*, a stage* 𝒮 *represented by an existential Presburger formula* φ *and a set of transitions* U*, determining whether* 𝒮 ⊨ ♦*dead*(U) *holds is decidable and PSPACE-hard.*

If the set U returned by *AsDead*(𝒮) is nonempty, then we know that every fair run starting at a configuration of 𝒮 will eventually reach a configuration of 𝒮 ∩ ⟦dead(U)⟧. So, this set, or any inductive overapproximation of it, can be a legal successor of 𝒮 in the stage graph. Function *IndOverapprox*(𝒮, U) returns such an inductive overapproximation (line 7). To be precise, we show in Sect. 5 that ⟦dead(U)⟧ is a Presburger set that can be computed exactly, albeit in doubly-exponential time in the worst case. The section also shows how to compute overapproximations more efficiently. If the set U returned by *AsDead*(𝒮) is empty, then we cannot yet construct any successor of 𝒮. Indeed, recall that we want to construct stage graphs in which, if 𝒮′ is a successor of 𝒮, then *Dead*(𝒮′) is a *proper superset* of *Dead*(𝒮). In this case, we proceed differently and try to split 𝒮:

**Definition 7.** *A* split *of some stage* 𝒮 *is a set* {𝒮<sub>1</sub>, ..., 𝒮<sub>k</sub>} *of (not necessarily disjoint) stages such that the following holds:*

*– Dead*(𝒮<sub>i</sub>) ⊃ *Dead*(𝒮) *for every* 1 ≤ i ≤ k*, and*

*–* 𝒮 = ⋃<sup>k</sup><sub>i=1</sub> 𝒮<sub>i</sub>*.*

If there exists a split {𝒮<sub>1</sub>, ..., 𝒮<sub>k</sub>} of 𝒮, then we can let 𝒮<sub>1</sub>, ..., 𝒮<sub>k</sub> be the successors of 𝒮 in the stage graph. Observe that a stage may indeed have a split. We have *Dead*(𝒞<sub>1</sub> ∪ 𝒞<sub>2</sub>) = *Dead*(𝒞<sub>1</sub>) ∩ *Dead*(𝒞<sub>2</sub>), and hence *Dead*(𝒞<sub>1</sub> ∪ 𝒞<sub>2</sub>) may be a proper subset of both *Dead*(𝒞<sub>1</sub>) and *Dead*(𝒞<sub>2</sub>):

*Example 4.* Consider the system with states {q<sub>1</sub>, q<sub>2</sub>} and transitions t<sub>i</sub> : q<sub>i</sub> → q<sub>i</sub> for i ∈ {1, 2}. Let 𝒮 = {C | C(q<sub>1</sub>) = 0 ∨ C(q<sub>2</sub>) = 0}, i.e., 𝒮 is the (inductive) stage of configurations disabling either t<sub>1</sub> or t<sub>2</sub>. The set {𝒮<sub>1</sub>, 𝒮<sub>2</sub>}, where 𝒮<sub>i</sub> = {C ∈ 𝒮 | C(q<sub>i</sub>) = 0}, is a split of 𝒮 satisfying *Dead*(𝒮<sub>i</sub>) = {t<sub>i</sub>} ⊃ ∅ = *Dead*(𝒮).

The canonical split of 𝒮, if it exists, is the set {𝒮 ∩ ⟦dead(t)⟧ | t ∉ *Dead*(𝒮)}. As mentioned above, Sect. 5 shows that ⟦dead(U)⟧ can be computed exactly for every U, but the computation can be expensive. Hence, the canonical split can be computed exactly at potentially high cost. Our implementation uses an underapproximation of ⟦dead(t)⟧, described in Sect. 6.

## **5 Computing and Approximating dead(***U***)**

We show that, given a set U of transitions,


**Downward and Upward Closed Sets.** We enrich ℕ with the limit element ω in the usual way. In particular, n < ω holds for every n ∈ ℕ. An ω*-configuration* is a mapping C<sup>ω</sup> : Q → ℕ ∪ {ω}. The *upward closure* and *downward closure* of a set 𝒞<sup>ω</sup> of ω-configurations are the sets of configurations ↑𝒞<sup>ω</sup> and ↓𝒞<sup>ω</sup>, respectively defined as:

$$\begin{aligned} \uparrow \mathcal{C}^{\omega} & \stackrel{\text{def}}{=} \{ C \in \mathbb{N}^{Q} \mid C \ge C^{\omega} \text{ for some } C^{\omega} \in \mathcal{C}^{\omega} \}, \\ \downarrow \mathcal{C}^{\omega} & \stackrel{\text{def}}{=} \{ C \in \mathbb{N}^{Q} \mid C \le C^{\omega} \text{ for some } C^{\omega} \in \mathcal{C}^{\omega} \}. \end{aligned}$$
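Membership in these closures is a componentwise comparison, which the following Python sketch makes explicit. The encoding is our choice for illustration: (ω-)configurations are tuples indexed by a fixed ordering of Q, and ω is represented by `math.inf`.

```python
import math

OMEGA = math.inf  # stands for the limit element ω; n < OMEGA for every integer n

def leq(c, d):
    """Componentwise order on (ω-)configurations given as equal-length tuples."""
    return all(x <= y for x, y in zip(c, d))

def in_upward_closure(config, omega_configs):
    """True iff config >= C^ω for some C^ω in the given finite set."""
    return any(leq(w, config) for w in omega_configs)

def in_downward_closure(config, omega_configs):
    """True iff config <= C^ω for some C^ω in the given finite set."""
    return any(leq(config, w) for w in omega_configs)
```

With states (q<sub>1</sub>, q<sub>2</sub>), the set `[(OMEGA, 0), (0, OMEGA)]` has as downward closure exactly the configurations with no agent in q<sub>1</sub> or no agent in q<sub>2</sub>. Note that an ω-configuration with an ω entry contributes nothing to an upward closure, since no natural number dominates ω.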

A set C of configurations is *upward closed* if C = ↑ C, and *downward closed* if C = ↓ C. These facts are well-known from the theory of well-quasi orderings:

**Lemma 1.** *For every set* <sup>C</sup> *of configurations, the following holds:*


**Computing** dead**(***U***) Exactly.** It follows immediately from Definition 6 that both ⟦dis(U)⟧ and ⟦dead(U)⟧ are downward closed. Indeed, if all transitions of U are disabled at C, and C′ ≤ C, then they are also disabled at C′, and clearly the same holds for transitions dead at C. Furthermore:

**Proposition 5.** *For every set* U *of transitions, the (downward) decomposition of both sup*(*dis*(U)) *and sup*(*dead*(U)) *is effectively computable.*

*Proof.* For every t ∈ U and q ∈ <sup>•</sup>t, let C<sup>ω</sup><sub>t,q</sub> be the ω-configuration such that C<sup>ω</sup><sub>t,q</sub>(q) = <sup>•</sup>t(q) − 1 and C<sup>ω</sup><sub>t,q</sub>(p) = ω for every p ∈ Q \ {q}. In other words, C<sup>ω</sup><sub>t,q</sub> is the ω-configuration made only of ω's except for state q, which falls short of <sup>•</sup>t(q) by one. This ω-configuration captures all configurations at which t is disabled due to an insufficient number of agents in state q. We have:

$$\sup\left(\llbracket \mathrm{dis}(U) \rrbracket\right) = \left\{C_{t,q}^{\omega} : t \in U,\ q \in {}^{\bullet}t\right\}.$$

The latter can be made minimal by removing superfluous ω-configurations.
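For a single transition t, the construction of the ω-configurations C<sup>ω</sup><sub>t,q</sub> and the removal of superfluous elements can be sketched as follows. The sketch deliberately restricts itself to a singleton U = {t}, represents <sup>•</sup>t as a tuple of multiplicities and ω as `math.inf`; these encodings and the function names are illustrative assumptions.

```python
import math

OMEGA = math.inf  # stands for the limit element ω

def sup_dis_single(pre):
    """The ω-configurations C^ω_{t,q} for one transition t with preset vector `pre`."""
    result = []
    for q, needed in enumerate(pre):
        if needed > 0:                  # q belongs to the preset of t
            w = [OMEGA] * len(pre)      # unconstrained in every other state
            w[q] = needed - 1           # one agent short in state q
            result.append(tuple(w))
    return result

def minimize(omega_configs):
    """Drop duplicates and ω-configurations lying below another one."""
    unique = list(dict.fromkeys(omega_configs))
    return [w for w in unique
            if not any(v != w and all(a <= b for a, b in zip(w, v))
                       for v in unique)]
```

A configuration disables t exactly when it lies below one of the returned ω-configurations, so membership in ⟦dis(t)⟧ becomes a disjunction of componentwise bounds.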

For the case of sup(dead(U)), we invoke [45, Prop. 2] which gives a proof for the more general setting of (possibly unbounded) Petri nets. Their procedure is based on the well-known backwards reachability algorithm (see, e.g., [2,39]).

Since sup(⟦dead(U)⟧) is finite, its computation allows us to describe ⟦dead(U)⟧ by the following linear constraint<sup>4</sup>:

$$\bigvee_{C^{\omega} \in \sup(\llbracket \text{dead}(U) \rrbracket)} \bigwedge_{q \in Q} \left[ C(q) \le C^{\omega}(q) \right].$$

However, the cardinality of sup(dead(U)) can be exponential [45, Remark for Prop. 2] in the system size. For this reason, we are interested in constructing both under- and over-approximations.

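**Overapproximations of** dead**(***U***).** For every i ∈ ℕ, define ⟦dead(U)⟧<sup>i</sup> as:

$$\llbracket \text{dead}(U) \rrbracket^{0} \stackrel{\text{def}}{=} \llbracket \text{dis}(U) \rrbracket \quad\text{and}\quad \llbracket \text{dead}(U) \rrbracket^{i+1} \stackrel{\text{def}}{=} \mathit{pre}_{T}\left(\llbracket \text{dead}(U) \rrbracket^{i}\right) \cap \llbracket \text{dis}(U) \rrbracket.$$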

Loosely speaking, dead(U)<sup>i</sup> is the set of configurations C such that every configuration reachable in at most i steps from C disables U. We immediately have:

$$\llbracket \text{dead}(U) \rrbracket = \bigcap_{i=0}^{\infty} \llbracket \text{dead}(U) \rrbracket^{i}.$$
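Read operationally, membership of a single configuration in the i-th approximation is a bounded exploration: every configuration reachable in at most i steps must disable U. The Python sketch below follows this informal reading on explicit configurations; it is not the symbolic Presburger computation used by the procedure, and the encoding (tuples of counts, hypothetical `(pre, post)` pairs, U given by indices into the transition list) is ours.

```python
def enabled(config, pre):
    """A transition is enabled iff the configuration covers its preset."""
    return all(c >= p for c, p in zip(config, pre))

def step(config, transition):
    """Successor configuration obtained by firing an enabled transition."""
    pre, post = transition
    return tuple(c - p + q for c, p, q in zip(config, pre, post))

def in_dead_i(config, transitions, u_indices, i):
    """True iff every configuration reachable in at most i steps disables U."""
    frontier = {config}                 # configurations reachable in `depth` steps
    for depth in range(i + 1):
        if any(enabled(c, transitions[j][0])
               for c in frontier for j in u_indices):
            return False
        if depth < i:
            frontier = {step(c, t) for c in frontier
                        for t in transitions if enabled(c, t[0])}
    return True

# System P1 of Sect. 7.3 over states (A, B, C):
# t1: A B -> C C, t2: A -> B, t3: B -> A.
P1 = [((1, 1, 0), (0, 0, 2)), ((1, 0, 0), (0, 1, 0)), ((0, 1, 0), (1, 0, 0))]
```

For P1 and U = {t<sub>1</sub>}, the configuration with two agents in A belongs to the approximation for i = 0 but not for i = 1, since one occurrence of t<sub>2</sub> enables t<sub>1</sub>. This illustrates why the approximations shrink as i grows.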

Using Proposition <sup>5</sup> and the following proposition, we obtain that dead(U)<sup>i</sup> is an effectively computable overapproximation of dead(U).

**Proposition 6.** *For every Presburger set* <sup>C</sup> *and every set of transitions* U*, the sets pre*<sup>U</sup> (C) *and post*<sup>U</sup> (C) *are effectively Presburger.*

Recall that function *IndOverapprox*(𝒮, U) of Algorithm 1 must return an *inductive* overapproximation of 𝒮 ∩ ⟦dead(U)⟧. Since ⟦dead(U)⟧<sup>i</sup> might not be inductive in general, our implementation uses either the inductive overapproximations *IndOverapprox*<sup>i</sup>(𝒮, U) ≝ *PotReach*(𝒮 ∩ ⟦dead(U)⟧<sup>i</sup>), or the exact value *IndOverapprox*<sup>∞</sup>(𝒮, U) ≝ 𝒮 ∩ ⟦dead(U)⟧. The table of results in the experimental section describes for each benchmark which overapproximation was used.

**Underapproximations of** dead**(***U***): Death Certificates.** A *death certificate* for U in <sup>P</sup> is a finite set <sup>C</sup><sup>ω</sup> of <sup>ω</sup>-configurations such that:


If U is dead at a set 𝒞 of configurations, then there is always a certificate that proves it, namely sup(⟦dead(U)⟧). In particular, if 𝒞<sup>ω</sup> is a death certificate for U then ↓𝒞<sup>ω</sup> ⊆ ⟦dead(U)⟧, that is, ↓𝒞<sup>ω</sup> is an underapproximation of ⟦dead(U)⟧.

Using Proposition 6, it is straightforward to express in Presburger arithmetic that a finite set <sup>C</sup><sup>ω</sup> of <sup>ω</sup>-configurations is a death certificate for <sup>U</sup>:

**Proposition 7.** *For every* k <sup>≥</sup> <sup>1</sup> *there is an existential Presburger formula DeathCert*k(U, <sup>C</sup><sup>ω</sup>) *that holds iff* <sup>C</sup><sup>ω</sup> *is a death certificate of size* k *for* U*.*

<sup>4</sup> Observe that if <sup>C</sup><sup>ω</sup>(q) = <sup>ω</sup>, then the term "C(q) <sup>≤</sup> <sup>ω</sup>" is equivalent to "**true**".

#### **6 Splitting a Stage**

Given a stage 𝒮, we try to find a set 𝒞<sup>ω</sup><sub>1</sub>, ..., 𝒞<sup>ω</sup><sub>ℓ</sub> of death certificates for transitions t<sub>1</sub>, ..., t<sub>ℓ</sub> ∈ T \ *Dead*(𝒮) such that 𝒮 ⊆ ↓𝒞<sup>ω</sup><sub>1</sub> ∪ ··· ∪ ↓𝒞<sup>ω</sup><sub>ℓ</sub>. This allows us to split 𝒮 into 𝒮<sub>1</sub>, ..., 𝒮<sub>ℓ</sub>, where 𝒮<sub>i</sub> ≝ 𝒮 ∩ ↓𝒞<sup>ω</sup><sub>i</sub>.

For any fixed size k ≥ 1 and any fixed ℓ, we can find death certificates 𝒞<sup>ω</sup><sub>1</sub>, ..., 𝒞<sup>ω</sup><sub>ℓ</sub> of size at most k by solving a Presburger formula. However, the formula does not belong to the existential fragment, because the inclusion check 𝒮 ⊆ ↓𝒞<sup>ω</sup><sub>1</sub> ∪ ··· ∪ ↓𝒞<sup>ω</sup><sub>ℓ</sub> requires universal quantification. For this reason, we proceed iteratively. For every i ≥ 0, after having found 𝒞<sup>ω</sup><sub>1</sub>, ..., 𝒞<sup>ω</sup><sub>i</sub> we search for a pair (C<sub>i+1</sub>, 𝒞<sup>ω</sup><sub>i+1</sub>) such that


An efficient implementation requires guiding the search for (C<sub>i+1</sub>, 𝒞<sup>ω</sup><sub>i+1</sub>), because otherwise the search procedure might not even terminate, or might split 𝒮 into too many parts, blowing up the size of the stage graph. Our search procedure employs the following heuristic, which works well in practice. We only consider the case k = 1, and search for a pair (C<sub>i+1</sub>, C<sup>ω</sup><sub>i+1</sub>) satisfying (i) and (ii) above, and additionally:


Condition (iii) guarantees termination. Intuitively, condition (iv) leads to certificates valid for sets U <sup>⊆</sup> T \ *Dead*(S) as large as possible. So it allows us to avoid splits that, loosely speaking, do not make as much progress as they could. Condition (v) allows us to avoid splits with many elements because each element of the split has a small intersection with S.

An example illustrating these conditions is given in the full version [19].

#### **7 Computing Eventually Dead Transitions**

Recall that the function *AsDead*(𝒮) takes an inductive Presburger set 𝒮 as input, and returns a (possibly empty) set U ⊆ T \ *Dead*(𝒮) of transitions such that 𝒮 ⊨ ♦dead(U). This guarantees 𝒮 ⇝ ⟦dead(U)⟧ and, since 𝒮 is inductive, also 𝒮 ⇝ 𝒮 ∩ ⟦dead(U)⟧.

By Proposition 4, deciding if there exists a non-empty set U of transitions such that 𝒮 ⊨ ♦dead(U) holds is PSPACE-hard, which makes a polynomial reduction to satisfiability of existential Presburger formulas unlikely. So we design incomplete implementations of *AsDead*(𝒮) with lower complexity. When these implementations are combined, the lack of completeness essentially vanishes in practice.

The implementations are inspired by Proposition 2, which shows that 𝒮 ⇝ ⟦dead(U)⟧ holds iff there exists a certificate f such that:

$$\forall C \in \mathcal{S} \setminus \llbracket \text{dead}(U) \rrbracket \colon\ \exists\, C \xrightarrow{\ast} C' \colon\ f(C) > f(C'). \tag{Cert}$$


#### **7.1 First Implementation: Linear Ranking Functions**

Our first procedure searches for a linear *ranking function*.

**Definition 8.** *A function* r : 𝒮 → ℕ *is a ranking function for* 𝒮 *and* U *if for every* C ∈ 𝒮 *and every step* C <sup>t</sup>→ C′ *the following holds:*

*1. if* t ∈ U*, then* r(C) > r(C′)*; and*

*2. if* t ∉ U*, then* r(C) ≥ r(C′)*.*

**Proposition 8.** *If* r : 𝒮 → ℕ *is a ranking function for* 𝒮 *and* U*, then there exists* k ∈ ℕ *such that* (r, k) *is a bounded certificate for* 𝒮 ⇝ ⟦*dead*(U)⟧*.*

*Proof.* Let M be the minimal finite basis of the upward closed set ℕ<sup>Q</sup> \ ⟦dead(U)⟧. For every configuration D ∈ M, let σ<sub>D</sub> be a shortest sequence that enables some transition t<sub>D</sub> ∈ U from D, i.e., such that D <sup>σ<sub>D</sub></sup>→ D′ <sup>t<sub>D</sub></sup>→ D″ for some D′, D″. Let k ≝ max{|σ<sub>D</sub>t<sub>D</sub>| : D ∈ M}.

Let C ∈ 𝒮 \ ⟦dead(U)⟧. Since C ∉ ⟦dead(U)⟧, we have C ≥ D for some D ∈ M. By monotonicity, we have C <sup>σ<sub>D</sub></sup>→ C′ <sup>t<sub>D</sub></sup>→ C″ for some configurations C′ and C″. By Definition 8, we have r(C) ≥ r(C′) > r(C″), and so condition (Cert) holds. As |σ<sub>D</sub>t<sub>D</sub>| ≤ k, we have that (r, k) is a bounded certificate.

It follows immediately from Definition 8 that if r<sub>1</sub> and r<sub>2</sub> are ranking functions for sets U<sub>1</sub> and U<sub>2</sub> respectively, then r defined as r(C) ≝ r<sub>1</sub>(C) + r<sub>2</sub>(C) is a ranking function for U<sub>1</sub> ∪ U<sub>2</sub>. Therefore, there exists a unique maximal set of transitions U such that 𝒮 ⇝ ⟦dead(U)⟧ can be proved by means of a ranking function. Further, U can be computed by collecting all transitions t ∈ T \ *Dead*(𝒮) such that there exists a ranking function r<sub>t</sub> for {t}. The existence of a *linear* ranking function r<sub>t</sub> can be decided in polynomial time via linear programming, as follows. Recall that for every step C <sup>u</sup>→ C′, we have C′ = C + Δ(u). So, by linearity, we have r<sub>t</sub>(C) ≥ r<sub>t</sub>(C′) ⟺ r<sub>t</sub>(C′ − C) ≤ 0 ⟺ r<sub>t</sub>(Δ(u)) ≤ 0. Thus, the constraints of Definition 8 can be specified as:

$$a \cdot \Delta(t) < 0 \quad \land \bigwedge\_{u \in \overline{\operatorname{Dead}(\mathcal{S})}} a \cdot \Delta(u) \le 0,$$

where ***a*** : Q → ℚ<sub>≥0</sub> gives the coefficients of r<sub>t</sub>, that is, r<sub>t</sub>(C) = ***a*** · C, and ***a*** · ***x*** ≝ Σ<sub>q∈Q</sub> ***a***(q) · ***x***(q) for ***x*** ∈ ℕ<sup>Q</sup>. Observe that a solution may yield a function whose codomain differs from ℕ. However, this is not an issue, since we can scale it by the least common denominator of the coefficients ***a***(q).
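The two constraints can be checked mechanically for a candidate coefficient vector. The following Python sketch searches small non-negative integer coefficients by brute force; this is a stand-in chosen to keep the example dependency-free and is not the linear programming approach described above. The `(pre, post)` encoding of transitions, the coefficient bound, and the function names are assumptions made for illustration. The example system is P1 of Sect. 7.3 with 𝒮 = ℕ<sup>Q</sup>, where no transition is dead.

```python
from itertools import product

def delta(transition):
    """Effect Δ(t) of a transition given as a (pre, post) pair of tuples."""
    pre, post = transition
    return tuple(q - p for p, q in zip(pre, post))

def dot(a, x):
    """Scalar product a · x."""
    return sum(ai * xi for ai, xi in zip(a, x))

def find_linear_ranking(t, alive, bound=3):
    """Search coefficients a in {0..bound}^Q with a·Δ(t) < 0 and
    a·Δ(u) <= 0 for every transition u that is not dead; None if none exists
    within the bound."""
    for a in product(range(bound + 1), repeat=len(t[0])):
        if dot(a, delta(t)) < 0 and all(dot(a, delta(u)) <= 0 for u in alive):
            return a
    return None

# System P1 over states (A, B, C): t1: A B -> C C, t2: A -> B, t3: B -> A.
t1 = ((1, 1, 0), (0, 0, 2))
t2 = ((1, 0, 0), (0, 1, 0))
t3 = ((0, 1, 0), (1, 0, 0))
```

For t<sub>1</sub> the search finds coefficients corresponding to r(C) = C(A) + C(B), whereas no coefficients exist for t<sub>2</sub>, because t<sub>2</sub> and t<sub>3</sub> have opposite effects. A failed bounded search does not by itself prove that no linear ranking function exists; an LP solver would give a complete answer.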

#### **7.2 Second Implementation: Layers**

*Transition layers* were introduced in [22] as a technique to find transitions that will eventually become dead. Intuitively, a set U of transitions is a layer if (1) no run can contain only transitions of U, and (2) U becomes dead once disabled; the first condition guarantees that U eventually becomes disabled, and the second that it eventually becomes dead. We formalize layers in terms of *layer functions*.

**Definition 9.** *A function* ℓ : 𝒮 → ℕ *is a* layer function *for* 𝒮 *and* U *if:*

**C1.** ℓ(C) > ℓ(C′) *for every* C ∈ 𝒮 *and every step* C <sup>t</sup>→ C′ *with* t ∈ U*; and*

**C2.** ⟦*dis*(U)⟧ = ⟦*dead*(U)⟧*.*

**Proposition 9.** *If* ℓ : 𝒮 → ℕ *is a layer function for* 𝒮 *and* U*, then* (ℓ, 1) *is a bounded certificate for* 𝒮 ⇝ ⟦*dead*(U)⟧*.*

*Proof.* Let C ∈ 𝒮 \ ⟦dead(U)⟧. By condition **C2**, we have C ∉ ⟦dis(U)⟧. So there exists a step C <sup>u</sup>→ C′ where u ∈ U. By condition **C1**, we have ℓ(C) > ℓ(C′), so condition (Cert) holds and (ℓ, 1) is a bounded certificate.

Let 𝒮 be a stage. For every set of transitions U ⊆ T \ *Dead*(𝒮) we can construct a Presburger formula *lin-layer*(U, ***a***) that holds iff there exists a *linear* layer function for U, i.e., a layer function of the form ℓ(C) = ***a*** · C for a vector of coefficients ***a*** : Q → ℚ<sub>≥0</sub>. Condition **C1**, for a linear function ℓ(C), is expressed by the existential Presburger formula

$$\textit{lin-layer-fun}(U, \mathbf{a}) \stackrel{\text{def}}{=} \bigwedge_{u \in U} \mathbf{a} \cdot \Delta(u) < 0.$$

Condition **C2** is expressible in Presburger arithmetic because of Proposition 5. However, instead of computing ⟦dead(U)⟧ explicitly, there is a more efficient way to express this constraint. Intuitively, ⟦dis(U)⟧ = ⟦dead(U)⟧ is the case if enabling a transition u ∈ U requires having previously enabled some transition u′ ∈ U. This observation leads to:

**Proposition 10.** *A set* U *of transitions satisfies* ⟦*dis*(U)⟧ = ⟦*dead*(U)⟧ *iff it satisfies the existential Presburger formula*

$$\textit{dis-eq-dead}(U) \stackrel{\text{def}}{=} \bigwedge_{t \in T} \bigwedge_{u \in U} \bigvee_{u' \in U} {}^{\bullet}t + ({}^{\bullet}u \ominus t^{\bullet}) \ge {}^{\bullet}u'$$

*where* ***x*** ⊖ ***y*** ∈ ℕ<sup>Q</sup> *is defined by* (***x*** ⊖ ***y***)(q) ≝ max(***x***(q) − ***y***(q), 0) *for* ***x***, ***y*** ∈ ℕ<sup>Q</sup>*.*

This allows us to give the constraint *lin-layer*(U, ***a***), which is of polynomial size:

$$\textit{lin-layer}(U, \mathbf{a}) \stackrel{\text{def}}{=} \textit{lin-layer-fun}(U, \mathbf{a}) \land \textit{dis-eq-dead}(U).$$
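Since the constraint of Proposition 10 contains no free variables, it can be evaluated directly on the presets and postsets of the transitions. The Python sketch below does so, assuming the constraint reads <sup>•</sup>t + (<sup>•</sup>u ⊖ t<sup>•</sup>) ≥ <sup>•</sup>u′: the least configuration from which t followed by u can occur must already enable some transition of U. Transitions are hypothetical `(pre, post)` pairs of tuples and U is given by indices into the transition list; both choices are ours.

```python
def monus(x, y):
    """Truncated difference: (x ⊖ y)(q) = max(x(q) - y(q), 0)."""
    return tuple(max(a - b, 0) for a, b in zip(x, y))

def covers(config, pre):
    """True iff config >= pre componentwise."""
    return all(c >= p for c, p in zip(config, pre))

def dis_eq_dead(transitions, u_indices):
    """Evaluate the constraint for the set U given by `u_indices`."""
    for pre_t, post_t in transitions:           # conjunction over t in T
        for i in u_indices:                     # conjunction over u in U
            pre_u = transitions[i][0]
            least = tuple(a + b for a, b in zip(pre_t, monus(pre_u, post_t)))
            # disjunction over u' in U
            if not any(covers(least, transitions[j][0]) for j in u_indices):
                return False
    return True

# Systems of Sect. 7.3: P1 over (A, B, C) and P2 over (A, B).
P1 = [((1, 1, 0), (0, 0, 2)), ((1, 0, 0), (0, 1, 0)), ((0, 1, 0), (1, 0, 0))]
P2 = [((1, 1), (2, 0)), ((1, 0), (0, 1))]
```

On these systems the check fails for U = {t<sub>1</sub>} in P1, where t<sub>2</sub> can re-enable t<sub>1</sub>, and succeeds for U = {t<sub>5</sub>} in P2, in line with the comparison in Sect. 7.3.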

#### **7.3 Comparing Ranking and Layer Functions**

The ranking and layer functions of Sects. 7.1 and 7.2 are incomparable in power, that is, there are sets of transitions for which a ranking function but no layer function exists, and vice versa. This is shown by the following two systems:

$$\begin{aligned} \mathcal{P}\_1 &= (\{\mathbf{A}, \mathbf{B}, \mathbf{C}\}, \{t\_1 \colon \mathbf{A} \, \mathbf{B} \mapsto \mathbf{C} \, \mathbf{C}, \, t\_2 \colon \mathbf{A} \longmapsto \mathbf{B}, \, t\_3 \colon \mathbf{B} \mapsto \mathbf{A}\}),\\ \mathcal{P}\_2 &= (\{\mathbf{A}, \mathbf{B}\}, \quad \{t\_4 \colon \mathbf{A} \, \mathbf{B} \mapsto \mathbf{A} \, \mathbf{A}, \, t\_5 \colon \mathbf{A} \mapsto \mathbf{B}\}). \end{aligned}$$

Consider the system 𝒫<sub>1</sub>, and let S = ℕ<sup>Q</sup>, i.e., S contains all configurations. Transitions t<sub>2</sub> and t<sub>3</sub> never become dead at ⟅A⟆ and can thus never be included in any U. Transition t<sub>1</sub> eventually becomes dead, as shown by the linear ranking function r(C) = C(A) + C(B) for U = {t<sub>1</sub>}. But for this U, the condition **C2** for layer functions is not satisfied: ⟅A, A⟆ ∈ dis(U), but firing t<sub>2</sub> from ⟅A, A⟆ yields ⟅A, B⟆ ∉ dis(U), so dis(U) ≠ dead(U). Therefore no layer function exists for this U.

Consider now the system 𝒫<sub>2</sub>, again with S = ℕ<sup>Q</sup>, and let U = {t<sub>5</sub>}. Once t<sub>5</sub> is disabled, there is no agent in A, so both t<sub>4</sub> and t<sub>5</sub> are dead. So dis(U) = dead(U). The linear layer function ℓ(C) = C(A) satisfies *lin-layer-fun*(U, **a**), showing that U eventually becomes dead. As firing t<sub>4</sub> t<sub>5</sub> from C = ⟅A, B⟆ leads back to C, there is no ranking function r for this U, since it would need to satisfy r(C) < r(C).
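The two examples can be replayed mechanically. The following Python sketch is illustrative only and is not the implementation evaluated in Sect. 8. It stores each transition as a pair of multisets (pre, post) and, for a fixed U, evaluates the condition **a** · Δ(u) < 0 and the formula of Proposition 10, read here as •t + (•u ⊖ t•) ≥ •u′. The helper names `monus`, `covers`, `dis_eq_dead`, and `lin_layer_fun` are hypothetical.

```python
from collections import Counter

def monus(x, y):
    """(x ⊖ y)(q) = max(x(q) - y(q), 0), on multisets stored as Counters."""
    return Counter({q: x[q] - y[q] for q in x if x[q] > y[q]})

def covers(x, y):
    """Componentwise x >= y."""
    return all(x[q] >= n for q, n in y.items())

def dis_eq_dead(net, U):
    """For all t in T and u in U, some u' in U has pre(t) + (pre(u) ⊖ post(t)) >= pre(u')."""
    return all(
        any(covers(pre_t + monus(net[u][0], post_t), net[v][0]) for v in U)
        for pre_t, post_t in net.values()
        for u in U
    )

def lin_layer_fun(net, U, a):
    """a · Δ(u) < 0 for every u in U, where Δ(u) = post(u) - pre(u)."""
    def dot_delta(u):
        pre, post = net[u]
        return sum(a.get(q, 0) * (post[q] - pre[q]) for q in set(pre) | set(post))
    return all(dot_delta(u) < 0 for u in U)

# Transitions as (pre, post) multisets; Counter("AB") is the multiset with one A and one B.
P1 = {"t1": (Counter("AB"), Counter("CC")),
      "t2": (Counter("A"), Counter("B")),
      "t3": (Counter("B"), Counter("A"))}
P2 = {"t4": (Counter("AB"), Counter("AA")),
      "t5": (Counter("A"), Counter("B"))}

print(dis_eq_dead(P1, {"t1"}))               # C2 fails for U = {t1} in P1
print(dis_eq_dead(P2, {"t5"}))               # C2 holds for U = {t5} in P2
print(lin_layer_fun(P2, {"t5"}, {"A": 1}))   # the layer function C(A) decreases along t5
```

In the actual constraint, U and **a** are unknowns of an existential Presburger formula handed to an SMT solver; the sketch only evaluates the formula for given candidates.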

For our implementation of *AsDead*(S), we therefore combine both approaches. We first compute (in polynomial time) the unique maximal set U for which there is a linear ranking function. If this U is non-empty, we return it, and otherwise compute a set U of maximal size for which there is a linear layer function.

#### **8 Experimental Results**

We implemented the procedure of Sect. 4 on top of the SMT solver *Z3* [57], and use the Owl [48] and HOA [12] libraries for translating LTL formulas. The resulting tool automatically constructs stage graphs that verify stable termination properties for replicated systems. We evaluated it on two sets of benchmarks, described below. The first set contains population protocols, and the second leader election and mutual exclusion algorithms. All tests were performed on a machine with an Intel Xeon CPU E5-2630 v4 @ 2.20 GHz and 8 GB of RAM. The results are depicted in Fig. 2 and can be reproduced by the certified artifact [18]. For parametric families of replicated systems, we always report the largest instance that we were able to verify with a timeout of one hour. For *IndOverapprox*, from the approaches in Sect. 5, we use *IndOverapprox*<sup>0</sup> in the examples marked with \* and *IndOverapprox*<sup>∞</sup> otherwise. Almost all constructed stage graphs are a chain with at most 3 stages. The only exceptions are the stage graphs for the approximate majority protocols that contained a binary split and 5 stages. The size of the Presburger formulas increases with increasing size of the replicated system. In the worst case, this growth can be exponential. However, the growth is linear in all examples marked with \*.


**Fig. 2.** Columns |Q|, |T|, and **Time** give the number of states and non-silent transitions, and the time for verification. Population protocols are verified for an infinite set of configurations. For parametric families, the smallest instance that could not be verified within one hour is shown in brackets, e.g. (TO: c = 90). Leader election and mutex algorithms are verified for one configuration. The number of processes leading to a timeout is given in brackets, e.g. (TO: 10).

*Population Protocols.* Population protocols [8,9] are replicated systems that compute Presburger predicates following the computation-as-consensus paradigm [10]. Depending on whether the initial configuration of agents satisfies the predicate or not, the agents of a correct protocol eventually agree on the output "yes" or "no", almost surely. Example 1 can be interpreted as a population protocol for the majority predicate A<sub>Y</sub> > A<sub>N</sub>, and the two stable termination properties that verify its correctness are described in Example 2. To show that a population protocol correctly computes a given predicate, we thus construct two Presburger stage graphs for the two corresponding stable termination properties. In all these examples, correctness is proved for an infinite set of initial configurations.

Our set of benchmarks contains a broadcast protocol [31], three majority protocols (Example 1, [23, Ex. 3], [5]), and multiple instances of parameterized families of protocols, where each protocol computes a different instance of a parameterized family of predicates<sup>5</sup>. These include various *flock-of-birds* protocol families ([28], [20, Sect. 3], [31, *threshold-n*]) for the family of predicates x ≥ c for some constant c ≥ 0; two families for threshold predicates of the form **a** · **x** ≥ c [8,20]; and one family for remainder protocols of the form **a** · **x** ≡<sub>m</sub> c [22]. Further, we check approximate majority protocols ([27,56], [51, *coin game*]). As these protocols only compute the predicate with large probability but not almost surely, we only verify that they always converge to a stable consensus.

*Comparison with* [22]. The approach of [22] can only be applied to so-called *strongly-silent* protocols. However, this class does not contain many fast and succinct protocols recently developed for different tasks [4,17,20].

We are able to verify all six protocols reported in [22]. Further, we are also able to verify the fast Majority [5] protocol as well as the succinct protocols Flock-of-birds [20, Sect. 3] and Threshold [20]. All three protocols are not strongly-silent. Although our approach is more general and complete, the time to verify many strongly-silent protocols does not differ significantly between the two approaches. Exceptions are the Flock-of-birds [28] protocols where we are faster ([22] reaches the timeout at c = 55) as well as the Remainder and the Flock-of-birds-threshold-n protocols where we are substantially slower ([22] reaches the timeout at m = 80 and c = 350, respectively). Loosely speaking, the approach of [22] can be faster because they compute inductive overapproximations using an iterative procedure instead of *PotReach*. In some instances already a very weak overapproximation, much less precise than *PotReach*, suffices to verify the result. Our procedure can be adapted to accommodate this (it essentially amounts to first running the procedure of [22] and, if it is inconclusive, then running ours).

*Other Distributed Algorithms.* We have also used our approach to verify arbitrary LTL liveness properties of non-parameterized systems with arbitrary communication structure. For this we apply standard automata-theoretic techniques and construct a product of the system and a *limit-deterministic Büchi automaton* for the negation of the property. Checking that no fair runs of the product are accepted by the automaton reduces to checking a stable termination property.

<sup>5</sup> Notice that for each protocol we check correctness for all inputs; we cannot yet automatically verify that infinitely many protocols are correct, each of them for all possible inputs.

Since we only check correctness of one single finite-state system, we can also apply a probabilistic model checker based on state-space exploration. However, our technique delivers a stage graph, which plays two roles. First, it gives an explanation of why the property holds in terms of invariants and ranking functions, and second, it is a certificate of correctness that can be efficiently checked by independent means.

We verify liveness properties for several leader election and mutex algorithms from the literature [3,40,42,44,50,59,61,64] under the assumption of a probabilistic scheduler. For the leader election algorithms, we check that a leader is eventually chosen; for the mutex algorithms, we check that the first process enters its critical section infinitely often.

*Comparison with PRISM* [49]. We compared execution times for verification by our technique and by PRISM on the same models. While PRISM only needs a few seconds to verify instances of the mutex algorithms [3,40,50,59,61,64] where we reach the time limit, it reaches the memory limit for the two leader election algorithms [42,44] already for 70 and 71 processes, which we can still verify.

#### **9 Conclusion and Further Work**

We have presented stage graphs, a sound and complete technique for the verification of stable termination properties of replicated systems, an important class of parameterized systems. Using deep results of the theory of Petri nets, we have shown that Presburger stage graphs, a class of stage graphs whose correctness can be reduced to the satisfiability problem of Presburger arithmetic, are also sound and complete. This provides a decision procedure for the verification of termination properties, which is of theoretical nature since it involves a blind enumeration of candidates for Presburger stage graphs. For this reason, we have presented a technique for the algorithmic construction of Presburger stage graphs, designed to exploit the strengths of SMT-solvers for existential Presburger formulas, i.e., integer linear constraints. Loosely speaking, the technique searches for *linear* functions certifying the progress between stages, even though only the much larger class of Presburger functions guarantees completeness.

We have conducted extensive experiments on a large set of benchmarks. In particular, our approach is able to prove correctness of nearly all the standard protocols described in the literature, including several protocols that could not be proved by the technique of [22], which only worked for so-called strongly-silent protocols. We have also successfully applied the technique to some self-stabilization algorithms, leader election and mutual exclusion algorithms.

Our technique is based on the mechanized search for invariants and ranking functions. It avoids the use of state-space exploration as much as possible. For this reason, it also makes sense as a technique for the verification of liveness properties of non-parameterized systems with a finite but very large state space.

## **References**



## **Stochastic Games with Lexicographic Reachability-Safety Objectives**

Krishnendu Chatterjee<sup>1</sup>, Joost-Pieter Katoen<sup>3</sup>, Maximilian Weininger<sup>2</sup>, and Tobias Winkler<sup>3</sup>

> <sup>1</sup> IST Austria, Klosterneuburg, Austria <sup>2</sup> Technical University of Munich, Munich, Germany <sup>3</sup> RWTH Aachen University, Aachen, Germany tobias.winkler@cs.rwth-aachen.de

**Abstract.** We study turn-based stochastic zero-sum games with lexicographic preferences over reachability and safety objectives. Stochastic games are standard models in control, verification, and synthesis of stochastic reactive systems that exhibit both randomness as well as angelic and demonic non-determinism. A lexicographic order allows one to consider multiple objectives with a strict preference order over the satisfaction of the objectives. To the best of our knowledge, stochastic games with lexicographic objectives have not been studied before. We establish determinacy of such games and present strategy and computational complexity results. For strategy complexity, we show that lexicographically optimal strategies exist that are deterministic, and that memory is only required to remember the already satisfied and violated objectives. For a constant number of objectives, we show that the relevant decision problem is in NP ∩ coNP, matching the currently known bound for single objectives; in general, the decision problem is PSPACE-hard and can be solved in NEXPTIME ∩ coNEXPTIME. We present an algorithm that computes the lexicographically optimal strategies via a reduction to the computation of optimal strategies in a sequence of single-objective games. We have implemented our algorithm and report experimental results on various case studies.

#### **1 Introduction**

*Simple stochastic games (SGs)* [26] are zero-sum turn-based stochastic games played over a finite state space by two adversarial players, the Maximizer and Minimizer, along with randomness in the transition function. These games allow the interaction of angelic and demonic non-determinism as well as stochastic uncertainty. They generalize classical models such as Markov decision processes (MDPs) [39] which have only one player and stochastic uncertainty. An objective specifies a desired set of trajectories of the game, and the goal of the Maximizer is to maximize the probability of satisfying the objective against all choices of the Minimizer. The basic decision problem is to determine whether the Maximizer can ensure satisfaction of the objective with a given probability threshold. This problem is among the rare and intriguing combinatorial problems that are in NP ∩ coNP, and whether it belongs to P is a major and long-standing open problem. Besides the theoretical interest, SGs are a standard model in control and verification of stochastic reactive systems [4,18,31,39], and they provide robust versions of MDPs when precise transition probabilities are not known [22,45].

© The Author(s) 2020. S. K. Lahiri and C. Wang (Eds.): CAV 2020, LNCS 12225, pp. 398–420, 2020. https://doi.org/10.1007/978-3-030-53291-8\_21

This research was funded in part by the TUM IGSSE Grant 10.06 (PARSEC), the German Research Foundation (DFG) project KR 4890/2-1 "Statistical Unbounded Verification", the ERC CoG 863818 (ForM-SMArt), the Vienna Science and Technology Fund (WWTF) Project ICT15-003, and the RTG 2236 UnRAVeL.

The multi-objective optimization problem is relevant in the analysis of systems with multiple, potentially conflicting goals, where a trade-off between the objectives must be considered. While multi-objective optimization has been extensively studied for MDPs with various classes of objectives [1,28,39], the problem is notoriously hard for SGs. Even for multiple reachability objectives, such games are not determined [23] and their decidability is still open.

This work considers SGs with multiple reachability and safety objectives with lexicographic preference order over the objectives. That is, we consider SGs with several objectives where each objective is either reachability or safety, and there is a total preference order over the objectives. The motivation to study such lexicographic objectives is twofold. First, they provide an important special case of general multiple objectives. Second, lexicographic objectives are useful in many scenarios. For example, (i) an autonomous vehicle might have a primary objective to avoid clashes and a secondary objective to optimize performance; and (ii) a robot saving lives during a fire in a building might have a primary objective to save as many lives as possible, and a secondary objective to minimize energy consumption. Thus studying reactive systems with lexicographic objectives is a very relevant problem which has been considered in many different contexts [7,33]. In particular, non-stochastic games with lexicographic objectives [6,25] and MDPs with lexicographic objectives [47] have been considered, but to the best of our knowledge SGs with lexicographic objectives have not been studied.

In this work we present several contributions for SGs with lexicographic reachability and safety objectives. The main contributions are as follows.


*Technical Contribution.* The key idea is that, given the lexicographic order of the objectives, we can consider them sequentially. After every objective, we remove all actions that are not optimal, thereby forcing all following computation to consider only locally optimal actions. The main complication is that local optimality of actions does not imply global optimality when interleaving reachability and safety, as the latter objective can use locally optimal actions to stay in the safe region without reaching the more important target. We introduce quantified reachability objectives as a means to solve this problem.

*Related Work.* We present related works on: (a) MDPs with multiple objectives; (b) SGs with multiple objectives; (c) lexicographic objectives in related models; and (d) existing tool support.


(d) PRISM-Games [37] provides tool support for several multi-player multiobjective settings. MultiGain [10] is limited to generalized mean-payoff MDPs. Storm [27] can, among numerous single-objective problems, solve Markov automata with multiple timed reachability or expected cost objectives [40], multi-cost bounded reachability MDPs [35], and it can provide simple strategies for multiple expected reward objectives in MDPs [28].

*Structure of this Paper.* After recalling preliminaries and defining the problem in Sect. 2, we first consider games where all target sets are absorbing in Sect. 3. Then, in Sect. 4 we extend our insights to general games, yielding the full algorithm and the theoretical results. Finally, Sect. 5 describes the implementation and experimental evaluation. Section 6 concludes.

### **2 Preliminaries**

**Notation.** A probability distribution on a finite set A is a function f : A → [0, 1] such that Σ<sub>x∈A</sub> f(x) = 1. We denote the set of all probability distributions on A by D(A). Vector-like objects **x** are denoted in a bold font and we use the notation **x**<sub>i</sub> for the i-th component of **x**. We use **x**<sub>&lt;n</sub> as a shorthand for (**x**<sub>1</sub>, ..., **x**<sub>n−1</sub>).

#### **2.1 Basic Definitions**

**Probabilistic Models.** In this paper, we consider *(simple) stochastic games* [26], which are defined as follows. Let L = {a, b, . . .} be a finite set of action labels.

**Definition 1 (SG).** *A* stochastic game *(SG) is a tuple* G = (S<sub>□</sub>, S<sub>♦</sub>, Act, P) *with* S := S<sub>□</sub> ⊎ S<sub>♦</sub> ≠ ∅ *a finite set of states,* Act : S → 2<sup>L</sup> \ {∅} *defines finitely many actions available at every state, and* P : S × L → D(S) *is the transition probability function.* P(s, a) *is undefined if* a ∉ Act(s)*.*

We abbreviate P(s, a)(s′) to P(s, a, s′). We refer to the two players of the game as Max and Min and the sets S<sub>□</sub> and S<sub>♦</sub> are the Max- and Min-states, respectively. As the game is *turn based*, these sets partition the state space S such that in each state it is either Max's or Min's turn. The intuitive semantics of an SG is as follows: In every turn, the corresponding player picks one of the finitely many available actions a ∈ Act(s) in the current state s. The game then transitions to the next state according to the probability distribution P(s, a). The winning conditions are not part of the game itself and need to be further specified.

**Sinks, Markov Decision Processes and Markov Chains.** A state s ∈ S is called *absorbing* (or sink) if P(s, a, s) = 1 for all a ∈ Act(s) and Sinks(G) denotes the set of all absorbing states of SG G. A *Markov Decision Process* (MDP) is an SG where either S<sub>♦</sub> = ∅ or S<sub>□</sub> = ∅, i.e. a one-player game. A *Markov Chain* (MC) is an SG where |Act(s)| = 1 for all s ∈ S. For technical reasons, we allow countably infinite state spaces S for both MDPs and MCs.
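The definitions above translate directly into a small data structure. The following Python sketch is one possible encoding, not the representation used by the tool of Sect. 5; the class name `SG`, its field names, and the three-state game `g` are assumptions made for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class SG:
    max_states: set   # Max-states
    min_states: set   # Min-states
    act: dict         # state -> non-empty set of action labels
    P: dict           # (state, action) -> {successor: probability}

    @property
    def states(self):
        return self.max_states | self.min_states

    def validate(self):
        # The two state sets partition a non-empty state space.
        assert self.states and not (self.max_states & self.min_states)
        for s in self.states:
            assert self.act[s], "every state needs at least one action"
            for a in self.act[s]:
                assert sum(self.P[s, a].values()) == 1

    def sinks(self):
        # A state is absorbing if every available action loops with probability 1.
        return {s for s in self.states
                if all(self.P[s, a].get(s, 0) == 1 for a in self.act[s])}

    def is_mdp(self):
        return not self.max_states or not self.min_states

    def is_mc(self):
        return all(len(self.act[s]) == 1 for s in self.states)

half = Fraction(1, 2)
g = SG(
    max_states={"x", "z"},
    min_states={"y"},
    act={"x": {"a", "b"}, "y": {"a"}, "z": {"a"}},
    P={("x", "a"): {"y": half, "z": half},
       ("x", "b"): {"z": 1},
       ("y", "a"): {"x": 1},
       ("z", "a"): {"z": 1}},
)
g.validate()
print(g.sinks(), g.is_mdp(), g.is_mc())
```

Storing P only for pairs (s, a) with a ∈ Act(s) mirrors the convention that P(s, a) is undefined otherwise.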

**Strategies.** We define the formal semantics of games by means of *paths* and *strategies*. An *infinite path* π is an infinite sequence π = s<sub>0</sub>a<sub>0</sub>s<sub>1</sub>a<sub>1</sub> ··· ∈ (S × L)<sup>ω</sup>, such that for every i ∈ ℕ, a<sub>i</sub> ∈ Act(s<sub>i</sub>) and s<sub>i+1</sub> ∈ {s′ | P(s<sub>i</sub>, a<sub>i</sub>, s′) > 0}. *Finite paths* are defined analogously as elements of (S × L)<sup>∗</sup> × S. Note that when considering MCs, every state just has a single action, so an infinite path can be identified with an element of S<sup>ω</sup>.

A strategy of player Max is a function σ : (S × L)<sup>∗</sup> × S<sub>□</sub> → D(L) where σ(πs)(a) > 0 only if a ∈ Act(s). It is *memoryless* if σ(πs) = σ(π′s) for all π, π′ ∈ (S × L)<sup>∗</sup>. More generally, σ has memory of class-size at most m if the set (S × L)<sup>∗</sup> can be partitioned into m classes M<sub>1</sub>, ..., M<sub>m</sub> ⊆ (S × L)<sup>∗</sup> such that σ(πs) = σ(π′s) for all 1 ≤ i ≤ m, π, π′ ∈ M<sub>i</sub> and s ∈ S<sub>□</sub>. A memory of class-size m can be represented with ⌈log<sub>2</sub> m⌉ bits.

A strategy is *deterministic* if σ(πs) is Dirac for all πs. Strategies that are both memoryless and deterministic are called *MD* and can be identified with functions σ : S<sub>□</sub> → L. Notice that there are at most |L|<sup>|S<sub>□</sub>|</sup> different MD strategies, that is, exponentially many in |S<sub>□</sub>|; in general, there can be uncountably many strategies.

Strategies τ of player Min are defined analogously, with S<sub>□</sub> replaced by S<sub>♦</sub>. The set of all strategies of player Max is denoted with Σ<sub>Max</sub>, the set of all MD strategies with Σ<sup>MD</sup><sub>Max</sub>, and similarly Σ<sub>Min</sub> and Σ<sup>MD</sup><sub>Min</sub> for player Min.

Fixing a strategy σ of one player in a game G yields the *induced MDP* G<sup>σ</sup>. Fixing a strategy τ of the second player too, yields the *induced MC* G<sup>σ,τ</sup>. Notice that the induced models are finite if and only if the respective strategies use finite memory.

Given an (induced) MC G<sup>σ,τ</sup>, we let ℙ<sup>σ,τ</sup><sub>s</sub> be its associated probability measure on the Borel-measurable sets of infinite paths obtained from the standard cylinder construction where s is the initial state [39].

**Reachability and Safety.** In our setting, a *property* is a Borel-measurable set Ω ⊆ S<sup>ω</sup> of infinite paths in an SG. The *reachability property* Reach(T) where T ⊆ S is the set Reach(T) = {s<sub>0</sub>s<sub>1</sub> ... ∈ S<sup>ω</sup> | ∃i ≥ 0: s<sub>i</sub> ∈ T}. The set Safe(T) = S<sup>ω</sup> \ Reach(T) is called a *safety property*. Further, for sets T<sub>1</sub>, T<sub>2</sub> ⊆ S we define the *until property* T<sub>1</sub> U T<sub>2</sub> = {s<sub>0</sub>s<sub>1</sub> ... ∈ S<sup>ω</sup> | ∃i ≥ 0: s<sub>i</sub> ∈ T<sub>2</sub> ∧ ∀j < i: s<sub>j</sub> ∈ T<sub>1</sub>}. These properties are measurable (e.g. [4]). A reachability or safety property where the set T satisfies T ⊆ Sinks(G) is called *absorbing*. For the safety probabilities in an (induced) MC, it holds that ℙ<sub>s</sub>(Safe(T)) = 1 − ℙ<sub>s</sub>(Reach(T)). We highlight that an objective Safe(T) is specified by the set of paths to avoid, i.e. paths satisfying the objective remain forever in S \ T.
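As a concrete reading of these three properties, the sketch below decides them for ultimately periodic paths of the form prefix · loop<sup>ω</sup>, given as two Python lists of states. This restriction to lasso-shaped paths and the function names are assumptions made for illustration; only finitely many distinct positions need to be inspected because the loop repeats.

```python
def reach(prefix, loop, T):
    """The path prefix · loop^ω visits T iff some state of prefix or loop lies in T."""
    return any(s in T for s in prefix + loop)

def safe(prefix, loop, T):
    """Safe(T) is the complement of Reach(T): the path never enters T."""
    return not reach(prefix, loop, T)

def until(prefix, loop, T1, T2):
    """T1 U T2: some position is in T2 and all earlier positions are in T1."""
    for s in prefix + loop:      # one unrolling of the loop suffices
        if s in T2:
            return True
        if s not in T1:
            return False
    return False                 # T2 is never visited

# A path that starts in p and then alternates between q and r forever.
prefix, loop = ["p"], ["q", "r"]
print(reach(prefix, loop, {"s", "t"}))           # the set {s, t} is never visited
print(safe(prefix, loop, {"t", "u"}))            # the set {t, u} is avoided forever
print(until(prefix, loop, {"p", "q"}, {"r"}))    # p and q precede the first visit to r
```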

#### **2.2 Stochastic Lexicographic Reachability-Safety Games**

SGs with lexicographic preferences are a straightforward adaptation of the ideas of e.g. [46] to the game setting. The *lexicographic* order on ℝ<sup>n</sup> is defined as **x** ≤<sub>lex</sub> **y** iff **x**<sub>i</sub> ≤ **y**<sub>i</sub> where i ≤ n is the greatest position such that for all j < i it holds that **x**<sub>j</sub> = **y**<sub>j</sub>. The position i thus acts like a *tiebreaker*. Notice that for arbitrary sets X ⊆ [0, 1]<sup>n</sup>, suprema and infima exist in the lexicographic order.
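The definition of the lexicographic order can be spelled out in a few lines. The following Python sketch follows the tiebreaker formulation literally; the function name `leq_lex` is hypothetical. On vectors of equal length it agrees with Python's built-in tuple comparison, which is used as a cross-check.

```python
from fractions import Fraction as F
from itertools import product

def leq_lex(x, y):
    """x <=_lex y: compare at the greatest position i such that all earlier entries agree."""
    assert len(x) == len(y) and len(x) > 0
    i = 0
    while i < len(x) - 1 and x[i] == y[i]:
        i += 1                      # entries before i agree, so move the tiebreaker right
    return x[i] <= y[i]

vectors = [(F(0), F(1)), (F(1, 2), F(0)), (F(1, 2), F(1, 4)), (F(1), F(1))]

# Cross-check against the built-in lexicographic comparison of tuples.
print(all(leq_lex(x, y) == (x <= y) for x, y in product(vectors, repeat=2)))

# The first component dominates: (0, 1) is below (1/2, 0).
print(leq_lex((F(0), F(1)), (F(1, 2), F(0))))
```

Because tuples already compare lexicographically, `max` and `min` over tuples of probabilities compute lexicographic maxima and minima of finite sets directly.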

**Fig. 1.** (a) An example of a stochastic game. Max-states are rendered as squares □ and Min-states as rhombs ♦. Probabilistic choices are indicated with small circles. In this example, all probabilities equal 1/2. The absorbing lex-objective **Ω** = {Reach(S<sub>1</sub>), Safe(S<sub>2</sub>)} is indicated by the thick green line around S<sub>1</sub> = {s, t} and the dotted red line around S<sub>2</sub> = {t, u}. Self-loops in sinks are omitted. (b) Restriction of the game to lex-optimal actions only.

**Definition 2 (Lex-Objective and Lex-Value).** *A* lexicographic reachability-safety objective *(*lex-objective*, for short) is a vector* **Ω** = (Ω<sub>1</sub>, ..., Ω<sub>n</sub>) *such that* Ω<sub>i</sub> ∈ {Reach(S<sub>i</sub>), Safe(S<sub>i</sub>)} *with* S<sub>i</sub> ⊆ S *for all* 1 ≤ i ≤ n*. We call* **Ω** absorbing *if all the* Ω<sub>i</sub> *are absorbing, i.e., if* S<sub>i</sub> ⊆ Sinks(G) *for all* 1 ≤ i ≤ n*. The* lex(icographic) value *of* **Ω** *at state* s ∈ S *is defined as:*

$${}^{\boldsymbol{\Omega}}\mathbf{v}^{\text{lex}}(s) = \sup\_{\sigma \in \Sigma\_{\text{Max}}} \inf\_{\tau \in \Sigma\_{\text{Min}}} \mathbb{P}\_s^{\sigma,\tau}(\boldsymbol{\Omega}) \tag{1}$$

*where* ℙ<sup>σ,τ</sup><sub>s</sub>(**Ω**) *denotes the* vector (ℙ<sup>σ,τ</sup><sub>s</sub>(Ω<sub>1</sub>), ..., ℙ<sup>σ,τ</sup><sub>s</sub>(Ω<sub>n</sub>)) *and the suprema and infima are taken with respect to the order* ≤<sub>lex</sub> *on* [0, 1]<sup>n</sup>*.*

Thus the lex-value at state s is the lexicographically supremal vector of probabilities that Max can ensure against all possible behaviors of Min. We will prove in Sect. 4.3 that the supremum and infimum in (1) can be exchanged; this property is called *determinacy*. We omit the superscript **Ω** in <sup>**Ω**</sup>**v**<sup>lex</sup> if it is clear from the context. We also omit the sets Σ<sub>Max</sub> and Σ<sub>Min</sub> in the suprema and infima in (1), e.g. we will just write sup<sub>σ</sub>.

*Example 1 (SGs and lex-values).* Consider the SG sketched in Fig. 1a with the lex-objective **Ω** = {Reach(S<sub>1</sub>), Safe(S<sub>2</sub>)}. Player Max must thus maximize the probability to reach S<sub>1</sub> and, moreover, among all possible strategies that do so, it must choose one that maximizes the probability to avoid S<sub>2</sub> forever.

**Lex-Value of Actions and Lex-Optimal Actions.** We extend the notion of value to actions. Let s ∈ S be a state. The *lex-value of an action* a ∈ Act(s) is defined as **v**<sup>lex</sup>(s, a) = Σ<sub>s′</sub> P(s, a, s′) **v**<sup>lex</sup>(s′). If s ∈ S<sub>□</sub>, then action a is called *lex-optimal* if **v**<sup>lex</sup>(s, a) = max<sub>b∈Act(s)</sub> **v**<sup>lex</sup>(s, b). Lex-optimal actions are defined analogously for states s ∈ S<sub>♦</sub> by considering the minimum instead of the maximum. Notice that there is always at least one optimal action because Act(s) is finite by definition.

*Example 2 (Lex-value of actions).* We now intuitively explain the lex-values of all states in Fig. 1a. The lex-value of sink states s, t, u and w is determined by their membership in the sets S<sub>1</sub> and S<sub>2</sub>. E.g., **v**<sup>lex</sup>(s) = (1, 1), as it is part of the set S<sub>1</sub> that should be reached and not part of the set S<sub>2</sub> that should be avoided. Similarly we get the lex-values of t, u and w as (1, 0), (0, 0) and (0, 1) respectively. State v has a single action that yields (0, 0) or (0, 1) each with probability 1/2, thus **v**<sup>lex</sup>(v) = (0, 1/2).

State p has one action going to s, which would yield (1, 1). However, as p is a Min-state, its best strategy is to avoid giving such a high value. Thus, it uses the action going downwards and **v**<sup>lex</sup>(p) = **v**<sup>lex</sup>(q). State q only has a single action going to r, so **v**<sup>lex</sup>(q) = **v**<sup>lex</sup>(r).

State r has three choices: (i) Going back to q, which results in an infinite loop between q and r, and thus never reaches S<sub>1</sub>. So a strategy that commits to this action will not achieve the optimal value. (ii) Going to t or u each with probability 1/2. In this case, the safety objective is definitely violated, but the reachability objective is achieved with probability 1/2. (iii) Going to t or v each with probability 1/2. Similarly to (ii), the probability to reach S<sub>1</sub> is 1/2, but additionally, there is a 1/2 · 1/2 chance to avoid S<sub>2</sub>. Thus, since r is a Max-state, its lex-optimal choice is the action leading to t or v and we get **v**<sup>lex</sup>(r) = (1/2, 1/4).
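The arithmetic of this example can be reproduced with exact fractions. The sketch below is illustrative: the action names `to_t_or_u` and `to_t_or_v` and the helper `action_value` are hypothetical, and the game structure is the one described in the text for Fig. 1a. Choice (i) of state r is left out because its value is **v**<sup>lex</sup>(q), which equals **v**<sup>lex</sup>(r) itself.

```python
from fractions import Fraction as F

def action_value(dist, v):
    """Lex-value of an action: the componentwise expectation of the successors' lex-values."""
    n = len(next(iter(v.values())))
    return tuple(sum(p * v[t][i] for t, p in dist.items()) for i in range(n))

half = F(1, 2)

# Lex-values of the sinks, read off from membership in S1 = {s, t} and S2 = {t, u}.
v = {"s": (1, 1), "t": (1, 0), "u": (0, 0), "w": (0, 1)}

# State v has a single action leading to u or w.
v["v"] = action_value({"u": half, "w": half}, v)

# Choices (ii) and (iii) of the Max-state r.
r_actions = {
    "to_t_or_u": action_value({"t": half, "u": half}, v),
    "to_t_or_v": action_value({"t": half, "v": half}, v),
}
best = max(r_actions, key=r_actions.get)   # tuples compare lexicographically
v["r"] = v["q"] = r_actions[best]

# The Min-state p chooses between s and q and takes the lexicographic minimum.
v["p"] = min(v["s"], v["q"])

print(best, v["r"], v["p"])
```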

Notice that with the kind of objectives considered, we can easily swap the roles of Max and Min by exchanging safety objectives with reachability and vice versa. It is thus no loss of generality to consider subsequently introduced notions such as optimal strategies only from the perspective of Max.

**Definition 3 (Lex-Optimal Strategies).** *A strategy* σ ∈ Σ<sub>Max</sub> *is* lex-optimal *for* **Ω** *if for all* s ∈ S*,* **v**<sup>lex</sup>(s) = inf<sub>τ′</sub> ℙ<sup>σ,τ′</sup><sub>s</sub>(**Ω**)*. A strategy* τ *of* Min *is a* lex-optimal counter-strategy *against* σ *if* ℙ<sup>σ,τ</sup><sub>s</sub>(**Ω**) = inf<sub>τ′</sub> ℙ<sup>σ,τ′</sup><sub>s</sub>(**Ω**)*.*

We stress that counter-strategies of Min depend on the strategy chosen by Max.

**Locally Lex-Optimal Strategies.** An MD strategy σ of Max (Min, resp.) is called *locally lex-optimal* if for all s ∈ S<sub>□</sub> (s ∈ S<sub>♦</sub>, resp.) and a ∈ Act(s), we have σ(s)(a) > 0 implies that action a is lex-optimal. Thus, locally lex-optimal strategies only assign positive probability to lex-optimal actions.

**Convention.** For the rest of the paper, unless stated otherwise, we use G = (S<sub>□</sub>, S<sub>♦</sub>, Act, P) to denote an SG and **Ω** = (Ω<sub>1</sub>, ..., Ω<sub>n</sub>) is a suitable (not necessarily absorbing) lex-objective, that is Ω<sub>i</sub> ∈ {Reach(S<sub>i</sub>), Safe(S<sub>i</sub>)} with S<sub>i</sub> ⊆ S for all 1 ≤ i ≤ n.

#### **3 Lexicographic SGs with Absorbing Targets**

In this section, we show how to compute the lexicographic value for SGs where *all target sets are absorbing*. We first show various theoretical results in Sect. 3.1 upon which the algorithm for computing the values and optimal strategies presented in Sect. 3.2 is then built. The main technical difficulty arises from interleaving reachability and safety objectives. In Sect. 4, we will reduce solving general (not necessarily absorbing) SGs to the case with absorbing targets.

#### **3.1 Characterizing Optimal Strategies**

This first subsection derives a characterization of lex-optimal strategies in terms of local optimality and an additional reachability condition (Lemma 2 further below). It is one of the key ingredients for the correctness of the algorithm presented later and also gives rise to a (non-constructive) proof of existence of MD lex-optimal strategies in the absorbing case.

We begin with the following lemma that summarizes some straightforward facts we will frequently use. Recall that a strategy is *locally lex-optimal* if it only selects actions with optimal lex-value.

**Lemma 1.** *The following statements hold for any absorbing lex-objective Ω:*


*Proof (Sketch).* Both claims follow from the definitions of lex-value and lex-optimal strategy. For (b) in particular, we show that a strategy using actions which are not lex-optimal can be transformed into a strategy that achieves a greater (lower, resp.) value. Thus removing the non-lex-optimal actions does not affect the lex-value. See [19, Appendix A.1] for more technical details.

*Example 3 (Modified game* G*).* Consider again the SG from Fig. 1a. Recall the lex-values from Example 2. Now we remove the actions that are not locally lex-optimal. This means we drop the action that leads from p to s and the action that leads from r to t or u (Fig. 1b). Since these actions were not used by the lex-optimal strategies, the value in the modified SG is the same as that of the original game.

*Example 4 (Locally lex-optimal does not imply globally lex-optimal).* Note that we do not drop the action that leads from r to q, because **v**<sup>lex</sup>(r) = **v**<sup>lex</sup>(q), so this action is locally lex-optimal. In fact, a lex-optimal strategy can use it arbitrarily many times without reducing the lex-value, as long as eventually it picks the action leading to t or v. However, if we only played the action leading to q, the lex-value would be reduced to (0, 1) as we would not reach S<sub>1</sub>, but would also avoid S<sub>2</sub>.

We stress the following consequence of this: Playing a locally lex-optimal strategy is not necessarily globally lex-optimal. It is not sufficient to just restrict the game to locally lex-optimal actions of the previous objectives and then solve the current one. Note that in fact the optimal strategy for the second objective Safe (S<sub>2</sub>) would be to remain in {p, q}; however, we must not pick this safety strategy before we have "tried everything" for all previous reachability objectives, in this case reaching S<sub>1</sub>.

This idea of "trying everything" for an objective Reach (S<sub>i</sub>) is equivalent to the following: either reach the target set S<sub>i</sub>, or reach a set of states from which S<sub>i</sub> cannot be reached anymore. Formally, let Zero<sub>i</sub> = {s ∈ S | **v**<sup>lex</sup><sub>i</sub>(s) = 0} be the set of states that cannot reach the target set S<sub>i</sub> anymore. Note that it depends on the lex-value, not the single-objective value. This is important, as the single-objective value could be greater than 0, but a more important objective has to be sacrificed to achieve it.

We define the set of states where we have "tried everything" for all reachability objectives as follows:

**Definition 4 (Final Set).** *For absorbing Ω, let* R<sub>&lt;i</sub> = {j &lt; i | Ω<sub>j</sub> = Reach (S<sub>j</sub>)}*. We define the* final set

$$F\_{<i} = \bigcup\_{k \in R\_{<i}} S\_k \;\cup\; \bigcap\_{k \in R\_{<i}} \mathsf{Zero}\_k$$

*with the convention that* F<sub>&lt;i</sub> = S *if* R<sub>&lt;i</sub> = ∅*. We also let* F = F<sub>&lt;n+1</sub>*.*

The final set contains all target states as well as the states that have lex-value 0 for all reachability objectives; we need the intersection of the sets Zerok, because as long as a state still has a positive probability to reach any target set, its optimal behaviour is to try that.

*Example 5 (Final set).* For the game in Fig. 1, we have Zero<sub>1</sub> = {u, v, w} and thus F = Zero<sub>1</sub> ∪ S<sub>1</sub> = {s, t, u, v, w}. An MD lex-optimal strategy of Max must almost-surely reach this set against any strategy of Min; only then has it "tried everything".
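Definition 4 can be read as a small set computation once the lex-values are known. The following Python sketch is only an illustration: the function name `final_set`, the 0-based indexing of objectives, and the dictionary encoding are assumptions, and the lex-values of p, q, r are placeholder positive numbers that are not taken from the paper (only their being non-zero matters here).

```python
def final_set(states, objectives, lex_values, i):
    """Final set F_{<i} of Definition 4 (objectives are 0-indexed here).

    objectives: list of (kind, target_set) with kind in {"reach", "safe"}
    lex_values: lex_values[j][s] is the lex-value of objective j in state s
    """
    reach_idx = [j for j in range(i) if objectives[j][0] == "reach"]
    if not reach_idx:
        # Convention: F_{<i} = S if there is no earlier reachability objective.
        return set(states)
    targets = set().union(*(objectives[j][1] for j in reach_idx))
    # Zero_j: states whose lex-value for objective j is exactly 0.
    # With an approximate solver a tolerance would be needed instead of == 0.
    zero_sets = [{s for s in states if lex_values[j][s] == 0} for j in reach_idx]
    return targets | set.intersection(*zero_sets)


# Data of Example 5: S1 = {s, t} and Zero_1 = {u, v, w}.
states = set("pqrstuvw")
objectives = [("reach", {"s", "t"}), ("safe", set())]  # S2 is irrelevant here (placeholder)
lex_values = [
    {"s": 1.0, "t": 1.0, "u": 0.0, "v": 0.0, "w": 0.0,
     "p": 0.5, "q": 0.5, "r": 0.5},  # p, q, r: placeholder positive values
    {},                              # safety values are not needed for the final set
]
F = final_set(states, objectives, lex_values, len(objectives))
print(sorted(F))  # ['s', 't', 'u', 'v', 'w']
```

Safety objectives are skipped on purpose: only reachability objectives contribute target sets and Zero sets to the final set.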

The following lemma characterizes MD lex-optimal strategies in terms of local lex-optimality and the final set.

**Lemma 2.** *Let Ω be an absorbing lex-objective and* σ ∈ Σ<sup>MD</sup><sub>Max</sub>*. Then* σ *is lex-optimal for Ω if and only if* σ *is locally lex-optimal and for all* s ∈ S *we have*

$$\forall \tau \in \Sigma\_{\text{Min}}^{\text{MD}} \colon \mathbb{P}\_s^{\sigma, \tau} (\mathsf{Reach} \ (F)) = 1. \tag{\*}$$

*Proof (Sketch).* The "*if*"-direction is shown by induction on the number n of targets. We make a case distinction according to the type of Ω<sub>n</sub>: If it is safety, then we prove that local lex-optimality is already sufficient for global lex-optimality. Else if Ω<sub>n</sub> is reachability, then intuitively, the additional condition (∗) ensures that the strategy σ indeed "tries everything" and either reaches the target S<sub>n</sub> or eventually a state in Zero<sub>n</sub> where the opponent Min can make sure that Max cannot escape. The technical details of these assertions rely on a fixpoint characterization of the reachability probabilities combined with the classic Knaster-Tarski Fixpoint Theorem [44] and are given in [19, Appendix A.2].

For the "*only if*"-direction recall that lex-optimal strategies are necessarily locally lex-optimal by Lemma 1 (a). Further let i be such that Ω<sub>i</sub> = Reach (S<sub>i</sub>) and assume for contradiction that σ remains forever within S \ (S<sub>i</sub> ∪ Zero<sub>i</sub>) with positive probability against some strategy of Min. But then σ visits states with positive lex-value for Ω<sub>i</sub> infinitely often without ever reaching S<sub>i</sub>. Thus σ is not lex-optimal, contradiction.
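Condition (∗) of Lemma 2 is purely qualitative, so for a fixed MD strategy σ it can be checked on the graph of the game. The sketch below uses a standard graph argument that is not taken from the paper: Min can avoid F with positive probability exactly if, moving only through states outside F, the play can reach a state from which Min can surely stay outside F forever. The encoding (`owner`, `trans`, `sigma`) and the tiny game, loosely modelled on Example 4, are assumptions made for illustration.

```python
def satisfies_star(owner, trans, sigma, final):
    """True iff, with Max fixed to sigma, F is reached almost surely
    from every state against every strategy of Min."""
    def actions(s):
        # Max is fixed to sigma; Min may still pick any of its actions.
        return [sigma[s]] if owner[s] == "max" else list(trans[s])

    def succ(s, a):
        return {t for t, p in trans[s][a].items() if p > 0}

    # Greatest fixpoint: states outside F from which Min can surely stay outside F.
    avoid = {s for s in trans if s not in final}
    changed = True
    while changed:
        changed = False
        for s in list(avoid):
            if not any(succ(s, a) <= avoid for a in actions(s)):
                avoid.discard(s)
                changed = True

    # Backward closure: states outside F that reach `avoid` with positive probability.
    bad = set(avoid)
    changed = True
    while changed:
        changed = False
        for s in trans:
            if s in bad or s in final:
                continue
            if any(succ(s, a) & bad for a in actions(s)):
                bad.add(s)
                changed = True
    return not bad


# Hypothetical game: r may loop via q forever or move on to the sinks t and u.
owner = {"r": "max", "q": "max", "t": "max", "u": "max"}
trans = {
    "r": {"loop": {"q": 1.0}, "go": {"t": 0.5, "u": 0.5}},
    "q": {"back": {"r": 1.0}},
    "t": {"stay": {"t": 1.0}},
    "u": {"stay": {"u": 1.0}},
}
final = {"t", "u"}
looping = {"r": "loop", "q": "back", "t": "stay", "u": "stay"}
trying = {"r": "go", "q": "back", "t": "stay", "u": "stay"}
print(satisfies_star(owner, trans, looping, final))  # False
print(satisfies_star(owner, trans, trying, final))   # True
```

The looping strategy never leaves {r, q} and therefore violates (∗), mirroring the strategy of Example 4 that only plays the action leading to q.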

Finally, this characterization allows us to prove that MD lex-optimal strategies exist for absorbing objectives.

**Theorem 1.** *For an absorbing lex-objective Ω, there exist MD lex-optimal strategies for both players.*

*Proof (Sketch).* We consider the subgame G′ obtained by removing lex-suboptimal actions for both players and then show that the (single-objective) value of Reach (F) in G′ equals 1. An optimal MD strategy for Reach (F) exists [26]; further, it is locally lex-optimal, because we are in G′, and it reaches F almost surely. Thus, it is lex-optimal for *Ω* by the "*if*"-direction of Lemma 2. See [19, Appendix A.3] for more details on the proof.

#### **3.2 Algorithm for SGs with Absorbing Targets**

Theorem 1 is not constructive because it relies on the values **v**<sup>lex</sup> without showing how to compute them. Computing the values and constructing an optimal strategy for Max in the case of an absorbing lex-objective is the topic of this subsection.

**Definition 5 (QRO).** *A* quantified reachability objective *(QRO) is determined by a function* q : S′ → [0, 1] *where* S′ ⊆ S*. For all strategies* σ *and* τ*, we define:*

$$\mathbb{P}\_s^{\sigma,\tau}(\mathsf{Reach}\ (q)) = \sum\_{t \in S'} \mathbb{P}\_s^{\sigma,\tau}\big((S \setminus S')\ \mathsf{U}\ t\big) \cdot q(t).$$

Intuitively, a QRO generalizes its standard Boolean counterpart by additionally assigning a weight to the states in the target set S′. Thus the probability of a QRO is obtained by computing the sum of the q(t), t ∈ S′, weighted by the probability to avoid S′ until reaching t. Note that this probability does not depend on what happens after reaching S′; so it is unaffected by making all states in S′ absorbing.

In Sect. 4, we need the dual notion of a quantified safety property, defined as ℙ<sub>s</sub><sup>σ,τ</sup>(Safe (q)) = 1 − ℙ<sub>s</sub><sup>σ,τ</sup>(Reach (q)); intuitively, this amounts to minimizing the reachability probability.

*Remark 1.* A usual reachability property Reach (S′) is a special case of a quantified one with q(s) = 1 for all s ∈ S′. Vice versa, quantified properties can be easily reduced to usual ones defined only by the set S′: Convert all states t ∈ S′ into sinks, then for each such t prepend a new state t′ with a single action a and P(t′, a, t) = q(t) and P(t′, a, ⊥) = 1 − q(t) where ⊥ is a sink state. Finally, redirect all transitions leading into t to t′. Despite this equivalence, it turns out to be convenient and natural to use QROs.
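The reduction of Remark 1 is mechanical, and a small sketch may make the three steps (sinks, prepended states, redirection) concrete. The dictionary encoding `trans[s][a] = {successor: probability}`, the names `reduce_qro`, `BOTTOM`, and the action labels are all assumptions made for this illustration; the example chain at the end is hypothetical.

```python
BOTTOM = "bottom"  # fresh sink state standing for ⊥ (the name is an assumption)


def _redirect(dist, pre):
    """Send every transition that led into a target t to its new state t' instead."""
    out = {}
    for t, p in dist.items():
        key = pre.get(t, t)
        out[key] = out.get(key, 0.0) + p
    return out


def reduce_qro(trans, q):
    """Turn the QRO given by q : S' -> [0, 1] into the plain objective Reach(S')."""
    pre = {t: ("pre", t) for t in q}  # the prepended states t'
    new = {}
    for s, acts in trans.items():
        if s not in q:  # the targets themselves are rebuilt as sinks below
            new[s] = {a: _redirect(dist, pre) for a, dist in acts.items()}
    for t, w in q.items():
        new[t] = {"stay": {t: 1.0}}                        # t becomes a sink
        new[pre[t]] = {"enter": {t: w, BOTTOM: 1.0 - w}}   # t' reaches t with probability q(t)
    new[BOTTOM] = {"stay": {BOTTOM: 1.0}}
    return new, set(q)


def reach_prob(chain, targets, start, iters=100):
    """Reachability probability in a Markov chain (one action per state), by iteration."""
    v = {s: (1.0 if s in targets else 0.0) for s in chain}
    for _ in range(iters):
        v = {s: v[s] if s in targets else
                sum(p * v[t] for t, p in next(iter(chain[s].values())).items())
             for s in chain}
    return v[start]


# Hypothetical chain: s0 moves to t1 or t2 with probability 1/2 each.
chain = {
    "s0": {"a": {"t1": 0.5, "t2": 0.5}},
    "t1": {"a": {"t1": 1.0}},
    "t2": {"a": {"s0": 1.0}},  # not a sink originally; the reduction makes it one
}
q = {"t1": 1.0, "t2": 0.5}
reduced, targets = reduce_qro(chain, q)
print(reach_prob(reduced, targets, "s0"))  # 0.75 = 0.5 * q(t1) + 0.5 * q(t2)
```

If a play starts in a state t ∈ S′, it should start in the prepended state t′ of the reduced game so that the weight q(t) is still applied.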

*Example 6 (QRO).* Example 4 illustrated that solving a safety objective after a reachability objective can lead to problems, as the optimal strategy for Safe (S<sub>2</sub>) did not use the action that actually reached S<sub>1</sub>. In Example 5 we indicated that the final set F = {s, t, u, v, w} has to be reached almost surely, and among those states the ones with the highest safety values should be preferred. This can be encoded in a QRO as follows: Compute the values for the Safe (S<sub>2</sub>) objective for the states in F. Then construct the function q<sub>2</sub> : F → [0, 1] that maps all states in F to their safety value, i.e., q<sub>2</sub> : {s ↦ 1, t ↦ 0, u ↦ 0, v ↦ 1/2, w ↦ 1}.

Thus using QROs, we can effectively reduce (interleaved) safety objectives to quantified *reachability* objectives:

**Lemma 3 (Reduction Safe → Reach).** *Let Ω be an absorbing lex-objective with* Ω<sub>n</sub> = Safe (S<sub>n</sub>)*,* q<sub>n</sub> : F → [0, 1] *with* q<sub>n</sub>(t) = **v**<sup>lex</sup><sub>n</sub>(t) *for all* t ∈ F *where* F *is the final set (Definition 4), and* Ω′ = (Ω<sub>1</sub>, …, Ω<sub>n−1</sub>, Reach (q<sub>n</sub>))*. Then:* <sup>Ω</sup>**v**<sup>lex</sup> = <sup>Ω′</sup>**v**<sup>lex</sup>*.*

*Proof (Sketch).* By definition, <sup>Ω</sup>**v**<sup>lex</sup>(s) = <sup>Ω′</sup>**v**<sup>lex</sup>(s) for all s ∈ F, so we only need to consider the states in S \ F. Since any lex-optimal strategy for *Ω* or *Ω*′ must also be lex-optimal for *Ω*<sub>&lt;n</sub>, we know by Lemma 2 that such a strategy reaches F<sub>&lt;n</sub> almost-surely. Note that we have F<sub>&lt;n</sub> = F, as the n-th objective, either the QRO or the safety objective, does not add any new states to F. The reachability objective Reach (q<sub>n</sub>) weighs the states in F with their lexicographic safety values **v**<sup>lex</sup><sub>n</sub>. Thus we additionally ensure that in order to reach F, we use those actions that give us the best safety probability afterwards. In this way we obtain the correct lex-values **v**<sup>lex</sup><sub>n</sub> even for states in S \ F. See [19, Appendix A.4] for the full technical proof.

*Example 7 (Reduction Safe* → *Reach).* Recall Example 6. By the preceding Lemma 3, computing sup<sub>σ</sub> inf<sub>τ</sub> ℙ<sub>s</sub><sup>σ,τ</sup>(Reach (S<sub>1</sub>), Reach (q<sub>2</sub>)) yields the correct lex-value **v**<sup>lex</sup>(s) for all s ∈ S. Consider for instance state r in the running example: The action leading to q is clearly suboptimal for Reach (q<sub>2</sub>) as it does not reach F. Both other actions surely reach F. However, since q<sub>2</sub>(t) = q<sub>2</sub>(u) = 0 while q<sub>2</sub>(v) = 1/2, the action leading to u and v is preferred over that leading to t and u, as it ensures the higher safety probability after reaching F.

We now explain the basic structure of Algorithm 1. More technical details are explained in the proof sketch of Theorem 2 and the full proof is in [19, Appendix A.5]. The idea of Algorithm 1 is, as sketched in Sect. 3.1, to consider the objectives sequentially in the order of importance, i.e., starting with Ω1.

```
Algorithm 1. Solve absorbing lex-objective

Input: SG G, absorbing lex-objective Ω = (Ω1,...,Ωn)
Output: Vector of lex-values vlex, MD lex-optimal strategy σ for Max
1: procedure SolveAbsorbing(G, Ω)
2:   initialize vlex and σ arbitrarily
3:   G′ ← G                                   ▷ Consider whole game in the beginning.
4:   for 1 ≤ i ≤ n do
5:     (v, σ′) ← SolveSingleObj(G′, Ωi)
6:     if Ωi = Safe (Si) then
7:       F<i ← final set with respect to G′ and Ω<i     ▷ see Def. 4
8:       qi(s) ← v(s) for all s ∈ F<i                   ▷ see Def. 5
9:       (v, σQ) ← SolveSingleObj(G′, Reach (qi))
10:    end if
11:    G′ ← restriction of G′ to optimal actions w.r.t. v
12:    vlex_i ← v
13:    for s ∈ S do
14:      if (Ωi = Reach (Si) and v(s) > 0) or (Ωi = Safe (Si) and s ∈ F<i) then
15:        σ(s) ← σ′(s)                                 ▷ Strategy improvement
16:      else if Ωi = Safe (Si) and s ∉ F<i then
17:        σ(s) ← σQ(s)
18:      end if
19:    end for
20:  end for
     return (vlex, σ)
21: end procedure
```
The i-th objective is solved (Lines 5–10) and the game is restricted to only the locally optimal actions (Line 11). This way, in the i-th iteration of the main loop, only actions that are locally lex-optimal for objectives 1 through (i−1) are considered. Finally, we construct the optimal strategy and update the result variables (Lines 12–19).
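To illustrate the interplay of solving (Line 5) and restricting (Line 11), the following Python sketch runs the main loop for the special case where all objectives are reachability objectives, so Lines 6 to 10 and the strategy bookkeeping are omitted. It is not the authors' implementation: the encoding, the function names, the tolerance `tol`, and the plain value iteration used as single-objective solver (which only approximates values in general) are assumptions.

```python
def value_iteration(owner, trans, q, eps=1e-8):
    """Approximate value of Reach(q); q maps target states to weights in [0, 1]."""
    v = {s: q.get(s, 0.0) for s in trans}
    while True:
        delta = 0.0
        for s in trans:
            if s in q:
                continue  # target states keep their weight
            opt = max if owner[s] == "max" else min
            new = opt(sum(p * v[t] for t, p in dist.items())
                      for dist in trans[s].values())
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta <= eps:  # simple stopping criterion, not a guaranteed error bound
            return v


def restrict(owner, trans, v, tol=1e-6):
    """Keep only the actions that are optimal w.r.t. v (cf. Line 11)."""
    out = {}
    for s, acts in trans.items():
        exp = {a: sum(p * v[t] for t, p in dist.items()) for a, dist in acts.items()}
        best = max(exp.values()) if owner[s] == "max" else min(exp.values())
        out[s] = {a: acts[a] for a in acts if abs(exp[a] - best) <= tol}
    return out


def solve_absorbing_reach_only(owner, trans, targets):
    """Lex-values for (Reach(S_1), ..., Reach(S_n)) with absorbing targets."""
    game, lex = trans, []
    for S_i in targets:
        v = value_iteration(owner, game, {s: 1.0 for s in S_i})
        game = restrict(owner, game, v)  # later objectives only see optimal actions
        lex.append(v)
    return lex


# Hypothetical game: from p, Max can reach x (in S1), y (in S2), or toss a coin.
owner = {"p": "max", "x": "max", "y": "max"}
trans = {
    "p": {"a": {"x": 1.0}, "b": {"y": 1.0}, "c": {"x": 0.5, "y": 0.5}},
    "x": {"stay": {"x": 1.0}},
    "y": {"stay": {"y": 1.0}},
}
lex = solve_absorbing_reach_only(owner, trans, [{"x"}, {"y"}])
print(lex[0]["p"], lex[1]["p"])                            # 1.0 0.0
print(value_iteration(owner, trans, {"y": 1.0})["p"])      # 1.0
```

In this toy game the single-objective value of Reach ({y}) at p is 1, but its lex-value is 0, because reaching y would sacrifice the more important first objective.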

**Theorem 2.** *Given an SG* G *and an absorbing lex-objective Ω* = (Ω<sub>1</sub>, …, Ω<sub>n</sub>)*, Algorithm 1 correctly computes the vector of lex-values* **v**<sup>lex</sup> *and an MD lex-optimal strategy* σ *for player* Max*. It needs* n *calls to a single objective solver.*

*Proof (Sketch).*


– **Resulting strategy:** When storing the resulting strategy, we again need to avoid errors induced by the fact that locally lex-optimal actions need not be globally lex-optimal. This is why for a reachability objective, we only update the strategy in states that have a positive value for the current objective; if the value is 0, the current strategy does not have any preference, and we need to keep the old strategy. For safety objectives, we need to update the strategy in two ways: for all states in the final set F<sub>&lt;i</sub>, we set it to the safety strategy σ′ (from Line 5) as within F<sub>&lt;i</sub> we do not have to consider the previous reachability objectives and therefore must follow an optimal safety strategy. For all states in S \ F<sub>&lt;i</sub>, we set it to the reachability strategy from the QRO σ<sub>Q</sub> (from Line 9). This is correct, as σ<sub>Q</sub> ensures almost-sure reachability of F<sub>&lt;i</sub> which is necessary to satisfy all preceding reachability objectives; moreover σ<sub>Q</sub> prefers those states in F<sub>&lt;i</sub> that have a higher safety value (cf. Lemma 3).
– **Termination:** The main loop of the algorithm invokes SolveSingleObj for each of the n objectives.

#### **4 General Lexicographic SGs**

We now consider *Ω* where S<sub>i</sub> ⊆ Sinks(G) does *not* necessarily hold. Section 4.1 describes how we can reduce these general lex-objectives to the absorbing case. The resulting algorithm is given in Sect. 4.2 and the theoretical implications in Sect. 4.3.

#### **4.1 Reducing General Lexicographic SGs to SGs with Absorbing Targets**

In general lexicographic SGs, strategies need memory, because they need to remember which of the S<sub>i</sub> have already been visited and behave accordingly. We formalize the solution of such games by means of *stages*. Intuitively, one can think of a stage as a copy of the game with fewer objectives, or as the sub-game that is played after visiting some previously unseen set S<sub>i</sub>.

**Definition 6 (Stage).** *Given an arbitrary lex-objective Ω* = (*Ω*<sub>1</sub>, …, *Ω*<sub>n</sub>) *and a set* I ⊆ {i ≤ n}*, a* stage *Ω*(I) *is the objective vector where the objectives Ω*<sub>i</sub> *are removed for all* i ∈ I*.*

*For state* s ∈ S*, let Ω*(s) = *Ω*({i | s ∈ Si})*. If a stage contains only one objective, we call it* simple*.*

*Example 8 (Stages).* Consider the SG in Fig. 2a. As there are two objectives, there are four possible stages: The one where we consider both objectives (the region denoted with *Ω* in Fig. 2b), the *simple* ones where we consider only one of the objectives (regions *Ω*({1}) and *Ω*({2})), and the one where both objectives have been visited. The last stage is trivial since there are no more objectives, hence we do not depict it and do not have to consider it. The actions of q and r are omitted in the *Ω*-stage, as upon visiting these states, a new stage begins.

**Fig. 2.** (a) SG with non-absorbing lex-objective *Ω* = (Reach (S1) , Reach (S2)). (b) The three stages identified by the sub-objectives *Ω*, *Ω*({1})=(Reach (S2)) and *Ω*({2}) = (Reach (S1)). The two stages on the right are both *simple*.

Consider the simple stages: in stage *Ω*({1}), q has value 0, as it is a Min-state and will use the self-loop to avoid reaching r ∈ S2. In stage *Ω*({2}), both p and r have value 1, as they can just go to the target state q ∈ S1. Combining this knowledge, we can get an optimal strategy for every state. In particular, note that an optimal strategy for state p needs memory: First go to r and thereby reach stage *Ω*({2}). Afterwards, go from r to p and now, on the second visit in a different stage, use the other action in p to reach q. In this example, we observe another interesting fact about lexicographic games: it can be optimal to first satisfy less important objectives.

In the example, we combined our knowledge of the sub-stages to find the lex-values for the whole lex-objective. In general, the values for the stages are numbers in [0, 1]. Thus we reuse the idea of *quantified* reachability and safety objectives, see Definition 5.

For all 1 ≤ i ≤ n, let q<sub>i</sub> : ⋃<sub>j≤n</sub> S<sub>j</sub> → [0, 1] be defined by:

$$q\_i(s) = \begin{cases} 1 & \text{if } s \in S\_i, \text{ and else:}\\ {}^{\Omega(s)}\mathbf{v}\_i^{\text{lex}}(s) & \text{if } \Omega\_i \text{ is reachability,}\\ 1 - {}^{\Omega(s)}\mathbf{v}\_i^{\text{lex}}(s) & \text{if } \Omega\_i \text{ is safety.} \end{cases}$$

To keep the correct type of every objective, we let q*Ω* = (type<sub>1</sub>(q<sub>1</sub>), …, type<sub>n</sub>(q<sub>n</sub>)) where for all 1 ≤ i ≤ n, type<sub>i</sub> = Reach if Ω<sub>i</sub> = Reach (S<sub>i</sub>) and else type<sub>i</sub> = Safe if Ω<sub>i</sub> = Safe (S<sub>i</sub>). So we have now reduced a general lexicographic objective *Ω* to a vector of quantitative objectives q*Ω*. Lemma 4 shows that this reduction preserves the values.

**Lemma 4.** *For arbitrary lex-objectives Ω it holds that* <sup>Ω</sup>**v**<sup>lex</sup> = <sup>qΩ</sup>**v**<sup>lex</sup>*.*
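The construction of the weight functions q<sub>i</sub> from the stage values can be sketched as follows. The function name `quantified_weights`, the 0-based objective indices, and the callback `stage_value` are assumptions for illustration. The demo data follow the discussion of Example 8, assuming (as the text suggests) S<sub>1</sub> = {q} and S<sub>2</sub> = {r}: in the stage reached at q the second objective has value 0 at q, and in the stage reached at r the first objective has value 1 at r.

```python
def quantified_weights(targets, kinds, stage_value):
    """Build q_1, ..., q_n on the union of all target sets.

    targets[i]        : the set S_i
    kinds[i]          : "reach" or "safe"
    stage_value(s, i) : lex-value of objective i at s in stage Ω(s),
                        only queried for states s outside S_i
    """
    union = set().union(*targets)
    weights = []
    for i, (S_i, kind) in enumerate(zip(targets, kinds)):
        q_i = {}
        for s in union:
            if s in S_i:
                q_i[s] = 1.0
            elif kind == "reach":
                q_i[s] = stage_value(s, i)
            else:  # safety: weight is the probability of violating it later
                q_i[s] = 1.0 - stage_value(s, i)
        weights.append(q_i)
    return weights


# Stage values read off Example 8 (objective indices are 0-based here).
values = {("r", 0): 1.0,   # Reach(S1) at r in the stage where S2 was visited
          ("q", 1): 0.0}   # Reach(S2) at q in the stage where S1 was visited
q1, q2 = quantified_weights([{"q"}, {"r"}], ["reach", "reach"],
                            lambda s, i: values[(s, i)])
print(q1 == {"q": 1.0, "r": 1.0}, q2 == {"q": 0.0, "r": 1.0})  # True True
```

Reaching r is thus as good as reaching q for the first objective and additionally satisfies the second one, which is why an optimal strategy from p visits r first.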

*Proof (Sketch).* We write 𝔖 = ⋃<sub>j≤n</sub> S<sub>j</sub> for the sake of readability in this sketch. By induction on the length n of the lex-objective *Ω*, it is easy to show that the equation holds in states s ∈ 𝔖, i.e., <sup>Ω</sup>**v**<sup>lex</sup>(s) = <sup>qΩ</sup>**v**<sup>lex</sup>(s). For a state s which is not contained in any of the S<sub>j</sub>, and for any strategies σ, τ we have the following equation

$$\mathbb{P}\_s^{\sigma,\tau}(\mathsf{Reach}\ (S\_i)) = \sum\_{\pi t \in Paths\_{fin}(\mathfrak{S})} \mathbb{P}\_s^{\sigma,\tau}(\pi t) \cdot \mathbb{P}\_{\pi t}^{\sigma,\tau}(\mathsf{Reach}\ (S\_i)),$$

where Paths<sub>fin</sub>(𝔖) = {πt ∈ ((S \ 𝔖) × L)<sup>∗</sup> × S | t ∈ 𝔖} denotes the set of all finite paths to a state in 𝔖 in the Markov chain G<sup>σ,τ</sup> and ℙ<sub>s</sub><sup>σ,τ</sup>(πt) is the probability of such a path when G<sup>σ,τ</sup> starts in s. From this we deduce that in order to maximize the left-hand side of the equation in the lexicographic order, we should play such that we prefer reaching states in 𝔖 where q<sub>i</sub> has a higher value; that is, we should maximize the QRO Reach (q<sub>i</sub>). The argument for safety is similar and detailed in [19, Appendix A.6].

The functions q<sub>i</sub> involved in q*Ω* all have the same domain ⋃<sub>j≤n</sub> S<sub>j</sub>. Hence we can, as mentioned below Definition 5, consider q*Ω* on the game where all states in ⋃<sub>j≤n</sub> S<sub>j</sub> are sinks without changing the lex-value. This is precisely the definition of an absorbing game, and hence we can compute <sup>qΩ</sup>**v**<sup>lex</sup> using Algorithm 1 from Sect. 3.2.

#### **4.2 Algorithm for General SG**

Algorithm 2 computes the lex-value <sup>Ω</sup>**v**<sup>lex</sup> for a given lexicographic objective *Ω* and an arbitrary SG G. We highlight the following technical details:


**Algorithm 2.** Solve general lex-objective

**Input:** SG G, lex-objective *Ω* = (Ω<sub>1</sub>, …, Ω<sub>n</sub>)
**Output:** Lex-values <sup>Ω</sup>**v**<sup>lex</sup>, lex-optimal σ ∈ Σ<sub>Max</sub> with memory of class-size ≤ 2<sup>n</sup> − 1

1: **procedure** SolveLex(G, *Ω*)
2:   **if** *Ω* is *simple* **then**
3:     **return** SolveSingleObj(G, Ω<sub>1</sub>)
4:   **end if**
5:   **for** s ∈ ⋃<sub>j≤n</sub> S<sub>j</sub> **do**
6:     <sup>Ω(s)</sup>**v**<sup>lex</sup>, <sup>Ω(s)</sup>σ ← SolveLex(G, *Ω*(s))   ▷ With dynamic programming
7:   **end for**
8:   **for** 1 ≤ i ≤ n **do**
9:     Let q<sub>i</sub> : ⋃<sub>j≤n</sub> S<sub>j</sub> → [0, 1], where q<sub>i</sub>(s) ← 1 if s ∈ S<sub>i</sub>, and else: <sup>Ω(s)</sup>**v**<sup>lex</sup><sub>i</sub>(s) if type(Ω<sub>i</sub>) = Reach, and 1 − <sup>Ω(s)</sup>**v**<sup>lex</sup><sub>i</sub>(s) if type(Ω<sub>i</sub>) = Safe
10:  **end for**
11:  q*Ω* ← (type<sub>1</sub>(q<sub>1</sub>), …, type<sub>n</sub>(q<sub>n</sub>))
12:  (<sup>qΩ</sup>**v**<sup>lex</sup>, <sup>qΩ</sup>σ) ← SolveAbsorbing(G, q*Ω*)
13:  σ ← adhere to <sup>qΩ</sup>σ until some s ∈ ⋃<sub>j≤n</sub> S<sub>j</sub> is reached. Then adhere to <sup>Ω(s)</sup>σ.
14:  **return** (<sup>qΩ</sup>**v**<sup>lex</sup>, σ)
15: **end procedure**

– **Resulting strategy:** The resulting strategy is composed in Line 13: It adheres to the strategy for the quantitative query <sup>qΩ</sup>σ until some s ∈ ⋃<sub>j≤n</sub> S<sub>j</sub> is reached. Then, to achieve the values promised by q<sub>i</sub>(s) for all i with s ∉ S<sub>i</sub>, it adheres to <sup>Ω(s)</sup>σ, the optimal strategy for stage *Ω*(s) obtained by the recursive call.
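The composition in Line 13 yields a strategy whose only memory is the stage it is currently in. A minimal sketch, with hypothetical class names `Memoryless` and `Staged` and hypothetical action labels, replays the behaviour described for state p in Example 8 (only the stage entered at r is modelled):

```python
class Memoryless:
    """MD strategy: one fixed action per state."""
    def __init__(self, choice):
        self.choice = choice

    def act(self, state):
        return self.choice[state]


class Staged:
    """Follow `before` until a target state is seen, then follow that state's stage strategy."""
    def __init__(self, before, after):
        self.before = before   # strategy for the quantified objective
        self.after = after     # maps a target state s to the strategy for stage Ω(s)
        self.current = None    # memory: the stage strategy in use, if any

    def act(self, state):
        if self.current is None and state in self.after:
            self.current = self.after[state]  # a new stage begins
        if self.current is not None:
            return self.current.act(state)    # may itself be a Staged strategy
        return self.before.act(state)


# Example 8, with assumed action names: first go to r, then back to p, then to q.
top = Memoryless({"p": "to_r"})
stage_after_r = Memoryless({"r": "to_p", "p": "to_q"})
sigma = Staged(top, {"r": stage_after_r})
print([sigma.act(s) for s in ["p", "r", "p"]])  # ['to_r', 'to_p', 'to_q']
```

The two visits to p yield different actions, which is exactly the memory requirement observed in Example 8. Because stage strategies can again be `Staged` objects, nesting mirrors the recursion of Algorithm 2.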

**Corollary 1.** *Given an SG* G *and an arbitrary lex-objective Ω* = (Ω<sub>1</sub>, …, Ω<sub>n</sub>)*, Algorithm 2 correctly computes the vector of lex-values* **v**<sup>lex</sup> *and a deterministic lex-optimal strategy* σ *of player* Max *which uses memory of class-size* ≤ 2<sup>n</sup> − 1*. The algorithm needs at most* 2<sup>n</sup> − 1 *calls to* SolveAbsorbing *or* SolveSingleObj*.*

*Proof.* Correctness of the algorithm and termination follow from the discussion of the algorithm, Lemma 4 and Theorem 2.
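The bound of 2<sup>n</sup> − 1 calls corresponds to the number of non-trivial stages, each of which is solved once when results are memoised. The following sketch only enumerates the stages that the recursion of Algorithm 2 can visit, identified by the set of removed objective indices; it ignores which states are actually reachable in the game graph, and the function name `explored_stages` is an assumption.

```python
def explored_stages(targets):
    """Sets of removed objective indices for all non-trivial stages the recursion visits."""
    n = len(targets)
    seen = set()

    def visit(removed):
        if removed in seen or len(removed) == n:
            return  # already solved (memoised), or trivial stage without objectives
        seen.add(removed)
        remaining = [i for i in range(n) if i not in removed]
        for s in set().union(*(targets[i] for i in remaining)):
            # Visiting s removes every remaining objective whose target set contains s.
            visit(removed | frozenset(i for i in remaining if s in targets[i]))

    visit(frozenset())
    return seen


# Example 8 with S1 = {q}, S2 = {r}: all 2^2 - 1 = 3 stages occur.
print(len(explored_stages([{"q"}, {"r"}])))   # 3
# Identical target sets: one visit removes both objectives, so only one stage is needed.
print(len(explored_stages([{"a"}, {"a"}])))   # 1
```

The second call illustrates why fewer than 2<sup>n</sup> − 1 stages may have to be solved in practice.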

#### **4.3 Theoretical Implications: Determinacy and Complexity**

Theorem 3 below states that lexicographic games are *determined* for arbitrary lex-objectives *Ω*. Intuitively, this means that the lex-value is independent from the player who fixes their strategy first. Recall that this property does not hold for non-lexicographic multi-reachability/safety objectives [23].

**Theorem 3 (Determinacy).** *For general SG* G *and lex-objective Ω, it holds for all* s ∈ S *that:*

$$\mathbf{v}^{\text{lex}}(s) = \sup\_{\sigma} \inf\_{\tau} \mathbb{P}\_s^{\sigma,\tau}(\Omega) = \inf\_{\tau} \sup\_{\sigma} \mathbb{P}\_s^{\sigma,\tau}(\Omega).$$

*Proof.* This statement follows because single-objective games are determined [26] and Algorithm 2 obtains all values by either solving single-objective instances directly (Line 3) or calling Algorithm 1, which also reduces everything to the single-objective case (Line 5 of Algorithm 1). Thus the sup-inf values **v**<sup>lex</sup> returned by the algorithm are in fact equal to the inf-sup values.

By analyzing Algorithm 2, we also get the following complexity results:

**Theorem 4 (Complexity).** *For any SG* G *and lex-objective Ω* = (Ω<sub>1</sub>, …, Ω<sub>n</sub>)*:*


We leave the question whether PSPACE is also an upper bound open. The main obstacle towards proving PSPACE-membership is that it is unclear if the lex-value – being dependent on the value of *exponentially* many stages in the worst-case – may actually have exponential bit-complexity.

## **5 Experimental Evaluation**

In this section, we report the results of a series of experiments made with a prototypical implementation of our algorithm.

**Case Studies.** We have considered the following case studies for our experiments:


**Implementation and Experimental Results.** We have implemented our algorithm within PRISM-games [37]. Since PRISM-games does not provide an *exact* algorithm to solve SGs, we used the available value iteration to implement our single-objective blackbox. Note that since this value iteration is not exact for single-objective SGs, we cannot compute the exact lex-values. Nevertheless, we can still measure the overhead introduced by our algorithm compared to a single-objective solver.

In our implementation, value iteration stops if the values do not change by more than 10<sup>−8</sup> per iteration, which is PRISM's default configuration. The experiments were conducted on a 2.4 GHz Quad-Core Intel Core i5 processor, with 4 GB of RAM available to the Java VM. The results are reported in Table 1. We only recorded the run time of the actual algorithms; the time needed to parse and build the model is excluded. All numbers are rounded to full seconds. All instances (even those with state spaces of order 10<sup>6</sup>) could be solved within a few minutes.

**Table 1.** Experimental Results. The two leftmost columns of the table show the type of the lex-objective, the name of the case studies, possibly with scaling parameters, and the number of states in the model. The next three columns give the verification times (excluding time to parse and build the model), rounded to full seconds. The final three columns provide the average number of actions for the original SG as well as all considered subgames G′ in the main stage, and lastly the fraction of stages considered, i.e. the stages solved by the algorithm compared to the theoretically maximal possible number of stages (2<sup>n</sup> − 1).


The case studies are grouped by the type of lex-objective, where R indicates reachability, S safety. For each combination of case study and scaling parameters, we report the state size in column |S|, three different model checking runtimes, the average number of actions in the original and all considered restricted games, and the fraction of stages considered, i.e. the stages solved by the algorithm compared to the theoretically maximal possible number of stages (2<sup>n</sup> − 1).

We compare the time of our algorithm on the lexicographic objective (Lex.) to the time for checking the first single objective (First) and the sum of checking all single objectives (All). We see that the runtimes of our algorithm and checking all single objectives are always in the same order of magnitude. This shows that our algorithm works well in practice and that the overhead is often small. Even on SGs of non-trivial size (HW[10 × 10] and AV[20 × 20]), our algorithm returns the result within a few minutes.

Regarding the average number of actions, we see that the decrease in the number of actions in the sub-games G′ obtained by restricting the input game to optimal actions varies: For example, very few actions are removed in the Dice instances, in AV we have a moderate decrease and in HW a significant decrease, almost eliminating all non-determinism after the first objective. It is our intuition that the fewer actions are removed, the higher the overhead compared to the individual single-objective solutions. Consider the AV and HW examples: While for AV[20 × 20], computing the lexicographic solution takes 1.7 times as long as all the single-objective solutions, it took only about 25% longer for HW[10 × 10]; this could be because in HW, after the first objective only little nondeterminism remains, while in AV also for the second and third objectives lots of choices have to be considered. Note that the first objective sometimes (HW), but not always (AV) needs the majority of the runtime.

We also see that the algorithm does not have to explore all possible stages. For example, for Dice we always just need a single stage, because the SG is absorbing. For charlton and HW all stages are relevant for the lex-objective, while for AV 4 of 7 need to be considered.

#### **6 Conclusion and Future Work**

In this work we considered simple stochastic games with lexicographic reachability and safety objectives. Simple stochastic games are a standard model in reactive synthesis of stochastic systems, and lexicographic objectives let one consider multiple objectives with an order of preference. We focused on the most basic objectives: safety and reachability. While simple stochastic games with lexicographic objectives have not been studied before, we have presented (a) determinacy, (b) strategy complexity, (c) computational complexity, and (d) algorithms for these games. Moreover, we showed how these games can model many different case studies and presented experimental results for them.

There are several directions for future work. First, for the general case closing the complexity gap (NEXPTIME ∩ coNEXPTIME upper bound and PSPACE lower bound) is an open question. Second, the study of lexicographic simple stochastic games with more general objectives, e.g., quantitative or parity objectives, poses interesting questions. In particular, in the case of parity objectives, there are some indications that the problem is significantly harder: Consider the case of a reachability-safety lex-objective. If the lex-value is (1, 1) then both objectives can be guaranteed almost surely. Since almost-sure safety is sure safety, our results imply that sure safety and almost-sure reachability can be achieved with constant memory. In contrast, for parity objectives the combination of sure and almost-sure requires infinite memory (e.g., see [21, Appendix A.1]).

#### **References**

1. Altman, E.: Constrained Markov Decision Processes. CRC Press, Boca Raton (1999)


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Qualitative Controller Synthesis for Consumption Markov Decision Processes**

František Blahoudek<sup>1</sup>, Tomáš Brázdil<sup>2</sup>, Petr Novotný<sup>2</sup>, Melkior Ornik<sup>3</sup>, Pranay Thangeda<sup>3(B)</sup>, and Ufuk Topcu<sup>1</sup>

> <sup>1</sup> The University of Texas at Austin, Austin, USA frantisek.blahoudek@gmail.com, utopcu@utexas.edu <sup>2</sup> Masaryk University, Brno, Czech Republic {xbrazdil,petr.novotny}@fi.muni.cz <sup>3</sup> University of Illinois at Urbana-Champaign, Urbana, USA {mornik,pranayt2}@illinois.edu

### **1 Introduction**

In the context of formal methods, controller synthesis typically boils down to computing a strategy in an *agent-environment* model, a nondeterministic state-transition model where some of the nondeterministic choices are resolved by the controller and some by an uncontrollable environment. Such models are typically either two-player graph games with an adversarial environment or Markov decision processes (MDPs); the latter case being apt for modelling statistically predictable environments. In this paper, we consider controller synthesis for *resource-constrained MDPs*, where the computed controller must ensure, in addition to satisfying some linear-time property, that the system's operation is not compromised by a lack of necessary resources.

This work was partially supported by NASA under Early Stage Innovations grant No. 80NSSC19K0209, and by DARPA under grant No. HR001120C0065. Petr Novotný is supported by the Czech Science Foundation grant No. GJ19-15134Y.

*Resource-Constrained Probabilistic Systems. Resource-constrained* systems need a supply of some resource (e.g. power) for steady operation: the interruption of the supply can lead to undesirable consequences and has to be avoided. For instance, an autonomous system, e.g. an autonomous electric vehicle (*AEV* ), is not able to draw power directly from an endless source. Instead, it has to rely on an internal storage of the resource, e.g. a battery, which has to be replenished in regular intervals to prevent resource exhaustion. Practical examples of AEVs include driverless cars, drones, or planetary rovers [8]. In these domains, resource failures may cause a costly mission failure and even safety risks. Moreover, the operation of autonomous systems is subject to probabilistic uncertainty [54]. Hence, in this paper, we study the resource-constrained strategy synthesis problem for MDPs.

*Models of Resource-Constrained Systems & Limitations of Current Approaches.* There is a substantial body of work in the area of verification of resource-constrained systems [3,5,7,9,11,23,38,39,53,58]. The typical approach is to model them as finite-state systems augmented with an integer-valued counter representing the current *resource level,* i.e. the amount of the resource present in the internal storage. The resource constraint requires that the resource level never drops below zero.<sup>1</sup> In the well-known *energy* model [11,23], each transition is labelled by an integer, and performing an ℓ-labelled transition results in ℓ being added to the counter. Thus, negative numbers stand for resource consumption while positive ones represent re-charging by the respective amount. Many variants of both MDP and game-based energy models were studied, as detailed in the related work. In particular, [26] considers controller synthesis for energy MDPs with qualitative Büchi and parity objectives. The main limitation of energy-based agent-environment models is that in general, they are not known to admit polynomial-time controller synthesis algorithms. Indeed, already the simplest problem, deciding whether a non-negative energy can be maintained in a two-player energy game, is at least as hard as solving mean-payoff graph games [11]; the complexity of the latter being a well-known open problem [45]. This hardness translates also to MDPs [26], making polynomial-time controller synthesis for energy MDPs impossible without a theoretical breakthrough.

*Consumption models,* introduced in [14], offer an alternative to energy models. In a consumption model, a non-negative integer, *cap*, represents the maximal amount of the resource the system can hold, e.g. the battery capacity. Each transition is labelled by a non-negative number representing the amount of the resource *consumed* when taking the transition (i.e., taking an ℓ-labelled transition decreases the resource level by ℓ). The resource replenishment is different from the energy approach. The consumption approach relies on the fact that reloads are often *atomic events*, e.g. an AEV plugging into a charging station and waiting to finish the charging cycle. Hence, some states in the consumption model are designated as *reload states,* and whenever the system visits a

<sup>1</sup> In some literature, the level is required to stay positive as opposed to non-negative, but this is only a matter of definition: both approaches are equivalent.

reload state, the resource level is replenished to the full capacity *cap*. Modelling reloads as atomic events is natural and even advantageous: consumption models typically admit more efficient analysis than energy models [14,47]. However, consumption models have not yet been considered in the probabilistic setting.

*Our Contribution.* We study strategy synthesis in consumption MDPs with Büchi objectives. Our main theoretical result is stated in the following theorem.

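**Theorem 1.** *Given a consumption MDP* $\mathcal{M}$ *with a capacity cap, an initial resource level* 0 ≤ d ≤ *cap, and a set* T *of accepting states, we can decide, in polynomial time, whether there exists a strategy* σ *such that when playing according to* σ*, the following* consumption-Büchi objectives *are satisfied:*

– *Starting with resource level* d*, the resource level never*<sup>2</sup> *drops below 0.*
– *With probability 1, the system visits some state in* T *infinitely often.*
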
*Moreover, if such a strategy exists then we can compute, in polynomial time, its polynomial-size representation.*

For the sake of clarity, we restrict to proving Theorem 1 for a natural sub-class of MDPs called *decreasing consumption MDPs,* where there are no cycles of zero consumption. The restriction is natural (since in typical resource-constrained systems, each action, even idling, consumes some energy, so zero cycles are unlikely) and greatly simplifies presentation. In addition to the theoretical analysis, we implemented the algorithm behind Theorem 1 and evaluated it on several benchmarks, including a realistic model of an AEV navigating the streets of Manhattan. The experiments show that our algorithm solves large CMDPs efficiently and scales well.

*Significance.* Some comments on Theorem 1 are in order. First, all the numbers in the MDP, and in particular the capacity *cap*, are encoded in binary. Hence, "polynomial time" means time polynomial in the encoding size of the MDP itself and in log(*cap*). In particular, a naive "unfolding" of the MDP, i.e. encoding the resource levels between 0 and *cap* into the states, does not yield a polynomial-time algorithm, but an exponential-time one, since the unfolded MDP has size proportional to *cap*. We employ a value-iteration-like algorithm to compute minimal energy levels with which one can achieve the consumption-Büchi objectives.

A similar concern applies to the "polynomial-size representation" of the strategy σ. To satisfy a consumption-Büchi objective, σ generally needs to keep track of the current resource level. Hence, under the standard notion of a finite-memory (FM) strategy (which views FM strategies as transducers), σ would require memory proportional to *cap*, i.e. a memory exponentially large w.r.t. the size of the input. However, we show that for each state s we can partition the integer interval [0,..., *cap*] into polynomially many sub-intervals $I^s\_1, \ldots, I^s\_k$ such that, for each 1 ≤ j ≤ k, the strategy σ picks the same action whenever the current state is

<sup>2</sup> In our model, this is equivalent to requiring that with probability 1, the resource level never drops below 0.

s and the current resource level is in $I^s\_j$. As such, the endpoints of the intervals are the only extra knowledge required to represent σ, a representation which we call a *counter selector*. We instrument our main algorithm so as to compute, in polynomial time, a polynomial-size counter selector representing the witness strategy σ.

Finally, we consider linear-time properties encoded by Büchi objectives over the states of the MDP. In essence, we assume that the translation of the specification to the Büchi automaton and its product with the original MDP model of the system were already performed. Probabilistic analysis typically requires the use of deterministic Büchi automata, which cannot express all linear-time properties. However, in this paper we consider qualitative analysis, which can be performed using restricted versions of non-deterministic Büchi automata that are still powerful enough to express all ω-regular languages. Examples of such automata are limit-deterministic Büchi automata [51] or good-for-MDPs automata [41]. Alternatively, consumption MDPs with parity objectives could be reduced to consumption-Büchi MDPs using the standard parity-to-Büchi MDP construction [25,30,32,33]. We abstract from these aspects and focus on the technical core of our problem, solving consumption-Büchi MDPs.

Consequently, to the best of our knowledge, we present the first polynomial-time algorithm for controller synthesis in resource-constrained MDPs with ω-regular objectives.

*Related Work.* There is an enormous body of work on energy models. Stemming from the models introduced in [11,23], the subsequent work covered energy games with various combinations of objectives [10,12,13,18,20,21,27,48], energy games with multiple resource types [15,24,28,31,37,43,44,57] or the variants of the above in the MDP [17,49], infinite-state [1], or partially observable [34] settings. As argued previously, the controller synthesis within these models is at least as hard as solving mean-payoff games. The paper [29] presents polynomial-time algorithms for non-stochastic energy games with special weight structures. Recently, an abstract algebraic perspective on energy models was presented in [22,35,36].

Consumption systems were introduced in [14] in the form of consumption games with multiple resource types. Minimizing mean-payoff in automata with consumption constraints was studied in [16].

Our main result requires, as a technical sub-component, solving the *resource-safety* (or just *safety*) problem in consumption MDPs, i.e. computing a strategy which prevents resource exhaustion. The solution to this problem consists (in principle) of a Turing reduction to the problem of minimum cost reachability in two-player games with non-negative costs. The latter problem was studied in [46], with an extension to arbitrary costs considered in [19] (see also [40]). We present our own, conceptually simple, value-iteration-like algorithm for the problem, which is also used in our implementation.

Elements of resource-constrained optimization and minimum-cost reachability are also present in the line of work concerning *energy-utility quantiles* in MDPs [4–7,42]. In this setting, there is no reloading in the consumption- or energy-model sense, and the task is typically to minimize the total amount of the resource consumed while maximizing the probability that some other objective is satisfied.

*Paper Organization & Outline of Techniques.* After the preliminaries (Sect. 2), we present counter selectors in Sect. 3. The next three sections contain the three main steps of our analysis. In Sect. 4, we solve the safety problem in consumption MDPs. The technical core of our approach is presented in Sect. 5, where we solve the problem of *safe positive reachability*: finding a resource-safe strategy which ensures that the set T of accepting states is visited with positive probability. Solving consumption-B¨uchi MDPs then, in principle, consists of repeatedly applying a strategy for safe positive reachability of T, ensuring that the strategy is "re-started" whenever the attempt to reach T fails. Details are given in Sect. 6. Finally, Sect. 7 presents our experiments. Due to space constraints, most technical proofs were moved to the full version.

#### **2 Preliminaries**

We denote by $\mathbb{N}$ the set of all non-negative integers and by $\overline{\mathbb{N}}$ the set $\mathbb{N} \cup \{\infty\}$. Given a set $I$ and a vector $\mathbf{v} \in \overline{\mathbb{N}}^I$ of integers indexed by $I$, we use $\mathbf{v}(i)$ to denote the $i$-component of $\mathbf{v}$. We assume familiarity with basic notions of probability theory. In particular, a *probability distribution* on an at most countable set $X$ is a function $f : X \to [0, 1]$ s.t. $\sum\_{x \in X} f(x) = 1$. We use $\mathcal{D}(X)$ to denote the set of all probability distributions on $X$.

**Definition 1 (CMDP).** *A* consumption Markov decision process *(CMDP) is a tuple* $\mathcal{M} = (S, A, \Delta, C, R, cap)$ *where* $S$ *is a finite set of* states*,* $A$ *is a finite set of* actions*,* $\Delta : S \times A \to \mathcal{D}(S)$ *is a total* transition function*,* $C : S \times A \to \mathbb{N}$ *is a total* consumption function*,* $R \subseteq S$ *is a set of* reload states *where the resource can be reloaded, and* $cap$ *is a* resource capacity*.*
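As an illustration, the tuple from Definition 1 can be written down directly as a data structure. The following is a minimal Python sketch and not the implementation evaluated in this paper; the class name `CMDP`, its field names, and the three-state instance `toy` are assumptions made for this example only (in particular, `toy` is not the CMDP of Fig. 1).

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class CMDP:
    """A consumption MDP (S, A, Delta, C, R, cap) stored in plain containers."""
    states: frozenset
    actions: frozenset
    delta: dict        # (state, action) -> {successor: probability}
    cost: dict         # (state, action) -> non-negative integer consumption
    reloads: frozenset
    cap: int

    def __post_init__(self):
        # Delta and C are total: every (state, action) pair must be present.
        for s in self.states:
            for a in self.actions:
                dist = self.delta[(s, a)]
                assert set(dist) <= self.states
                assert sum(dist.values()) == 1      # a probability distribution
                assert self.cost[(s, a)] >= 0
        assert self.reloads <= self.states and self.cap >= 0

    def succ(self, s, a):
        """Succ(s, a): states reached from s under a with positive probability."""
        return {t for t, p in self.delta[(s, a)].items() if p > 0}


# Hypothetical three-state instance used only to exercise the class.
HALF = Fraction(1, 2)
toy = CMDP(
    states=frozenset({"u", "v", "r"}),
    actions=frozenset({"a", "b"}),
    delta={
        ("u", "a"): {"v": HALF, "r": HALF}, ("u", "b"): {"r": 1},
        ("v", "a"): {"u": 1}, ("v", "b"): {"u": 1},
        ("r", "a"): {"u": 1}, ("r", "b"): {"u": 1},
    },
    cost={
        ("u", "a"): 1, ("u", "b"): 4,
        ("v", "a"): 2, ("v", "b"): 2,
        ("r", "a"): 1, ("r", "b"): 1,
    },
    reloads=frozenset({"r"}),
    cap=10,
)
```

Exact fractions are used for probabilities so that the check that each distribution sums to 1 does not depend on floating-point rounding.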

Figure 1 shows a visual representation of a CMDP. We denote by $\mathcal{M}(R')$ for $R' \subseteq S$ the CMDP obtained from $\mathcal{M}$ by changing the set of reloads to $R'$. For

Distributions in $\Delta$ are indicated by gray numbers (we leave out 1 when an action has only one successor), and the cost of an action follows its name in the edge labels. Actions labeled by $a\_{1,2}$ represent that $\Delta$ and $C$ are defined identically for both actions $a\_1$ and $a\_2$. The blue background indicates a target set $T = \{s\_2\}$, while the double circles represent the reload states.

**Fig. 1.** CMDP M = ({s1, s2, s3, s4, s5}, {a1, a2}, Δ, C, {s2, s5}, 20). Details are given on the right.

s ∈ S and a ∈ A, we denote by $Succ(s, a)$ the set $\{t \mid \Delta(s, a)(t) > 0\}$. A *path* is a (finite or infinite) state-action sequence $\alpha = s\_1 a\_1 s\_2 a\_2 s\_3 \cdots \in (S \times A)^\omega \cup (S \cdot A)^\* \cdot S$ such that $s\_{i+1} \in Succ(s\_i, a\_i)$ for all $i$. We define $\alpha\_i = s\_i$ and $Act^i(\alpha) = a\_i$. We use $\alpha\_{..i}$ for the finite prefix $s\_1 a\_1 \ldots s\_i$ of $\alpha$, $\alpha\_{i..}$ for the suffix $s\_i a\_i \ldots$, and $\alpha\_{i..j}$ for the infix $s\_i a\_i \ldots s\_j$. A finite path is a *cycle* if it starts and ends in the same state and is *simple* if none of its infixes forms a cycle. The *length* of a path $\alpha$ is the number $len(\alpha)$ of actions on $\alpha$, and $len(\alpha) = \infty$ if $\alpha$ is infinite.

A CMDP is *decreasing* if for every cycle $s\_1 a\_1 s\_2 \ldots a\_{k-1} s\_k$ there exists $1 \le i < k$ such that $C(s\_i, a\_i) > 0$. Throughout this paper we consider only decreasing CMDPs. The only places where this assumption is used are the proofs of Theorem 4 and Theorem 8.

An infinite path is called a *run*. We typically name runs by variants of the symbol $\rho$. The set of all runs in $\mathcal{M}$ is denoted $\mathsf{Runs}\_{\mathcal{M}}$. A finite path is called a *history*. The set of all possible histories of $\mathcal{M}$ is $hist\_{\mathcal{M}}$ or simply $hist$. We use $last(\alpha)$ for the last state of $\alpha$. Let $\alpha$ be a history with $last(\alpha) = s\_1$ and $\beta = s\_1 a\_1 s\_2 a\_2 \ldots$; we define a *joint path* as $\alpha \odot \beta = \alpha a\_1 s\_2 a\_2 \ldots$.

A *strategy* for $\mathcal{M}$ is a function $\sigma : hist\_{\mathcal{M}} \to A$ assigning to each history an action to play. A strategy is *memoryless* if $\sigma(\alpha) = \sigma(\beta)$ whenever $last(\alpha) = last(\beta)$. We do not consider randomized strategies in this paper, as they are not necessary for qualitative ω-regular objectives on finite MDPs [30,32,33].

A computation of $\mathcal{M}$ under the control of a given strategy $\sigma$ from some initial state $s \in S$ creates a path. The path starts with $s\_1 = s$. Assume that the current path is $\alpha$ and let $s\_i = last(\alpha)$ (we say that $\mathcal{M}$ is currently in $s\_i$). Then the next action on the path is $a\_i = \sigma(\alpha)$ and the next state $s\_{i+1}$ is chosen randomly according to $\Delta(s\_i, a\_i)$. Repeating this process *ad infinitum* yields an infinite sample run $\rho$. We say that $\rho$ is $\sigma$*-compatible* if it can be produced using this process, and $s$*-initiated* if it starts in $s$. We denote the set of all $\sigma$-compatible, $s$-initiated runs by $\mathsf{Comp}\_{\mathcal{M}}(\sigma, s)$.

We denote by $\mathbb{P}^{\sigma}\_{\mathcal{M},s}(\mathsf{A})$ the probability that a sample run from $\mathsf{Comp}\_{\mathcal{M}}(\sigma, s)$ belongs to a given measurable set of runs $\mathsf{A}$. For details on the formal construction of measurable sets of runs as well as the probability measure $\mathbb{P}^{\sigma}\_{\mathcal{M},s}$ see [2]. Throughout the paper, we drop the $\mathcal{M}$ subscripts in symbols whenever $\mathcal{M}$ is known from the context.

#### **2.1 Resource: Consumption, Levels, and Objectives**

We denote by $cap(\mathcal{M})$ the battery capacity in the MDP $\mathcal{M}$. A resource is consumed along paths and can be reloaded in the reload states up to the full capacity. For a path $\alpha = s\_1 a\_1 s\_2 \ldots$ we define the consumption of $\alpha$ as $cons(\alpha) = \sum\_{i=1}^{len(\alpha)} C(s\_i, a\_i)$ (since the consumption is non-negative, the sum is always well defined, though possibly diverging). Note that $cons$ does not consider reload states at all. To accurately track the remaining amount of the resource, we use the concept of a *resource level*.

**Definition 2 (Resource level).** *Let* $\mathcal{M}$ *be a CMDP with a set of reload states* $R$*, let* $\alpha$ *be a history, and let* $0 \le d \le cap(\mathcal{M})$ *be an integer called* initial load*. Then the* energy level after $\alpha$ initialized by $d$*, denoted by* $RL\_d^{\mathcal{M}}(\alpha)$ *or simply as* $RL\_d(\alpha)$*, is defined inductively as follows: for a zero-length history* $s$ *we have* $RL\_d^{\mathcal{M}}(s) = d$*. For a non-zero-length history* $\alpha = \beta a t$ *we denote* $c = C(last(\beta), a)$*, and put*

$$RL\_d^{\mathcal{M}}(\alpha) = \begin{cases} RL\_d^{\mathcal{M}}(\beta) - c & \text{if } last(\beta) \notin R \text{ and } c \le RL\_d^{\mathcal{M}}(\beta) \ne \bot \\ cap(\mathcal{M}) - c & \text{if } last(\beta) \in R \text{ and } c \le cap(\mathcal{M}) \text{ and } RL\_d^{\mathcal{M}}(\beta) \ne \bot \\ \bot & \text{otherwise} \end{cases}$$

Consider $\mathcal{M}$ from Fig. 1 and the history $\alpha(i) = (s\_1 a\_2 s\_5 a\_2)^i s\_1$ with $i$ as a parameter. We have $cons(\alpha(i)) = 3i$ and at the same time, following the inductive definition of $RL\_d(\alpha(i))$, we have $RL\_2(\alpha(i)) = 19$ for all $i \ge 1$, as the resource is reloaded every time in $s\_5$. This generalizes into the following. Let $\alpha$ be a history and let $f, l \ge 0$ be the minimal and maximal indices $i$ such that $\alpha\_i \in R$, respectively. For $RL\_d(\alpha) \ne \bot$, it holds that $RL\_d(\alpha\_{..i}) = d - cons(\alpha\_{..i})$ for all $i \le f$ and $RL\_d(\alpha) = cap(\mathcal{M}) - cons(\alpha\_{l..})$. Further, for each history $\alpha$ and $d$ such that $e = RL\_d(\alpha) \ne \bot$, and each history $\beta$ suitable for joining with $\alpha$, it holds that $RL\_d(\alpha \odot \beta) = RL\_e(\beta)$.
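The inductive definition of the resource level translates into a short loop over the history. The sketch below is illustrative Python, not taken from the paper's implementation: the names `cons`, `resource_level`, `alpha` and `BOTTOM` are hypothetical, and the two costs $C(s\_1, a\_2) = 2$ and $C(s\_5, a\_2) = 1$ are assumptions chosen so that the numbers quoted above ($cons(\alpha(i)) = 3i$ and $RL\_2(\alpha(i)) = 19$) are reproduced.

```python
BOTTOM = None  # stands for ⊥, i.e. the resource has been exhausted


def cons(history, cost):
    """Total consumption of a history [s1, a1, s2, ..., sn]; reloads are ignored."""
    return sum(cost[(history[i], history[i + 1])]
               for i in range(0, len(history) - 1, 2))


def resource_level(history, d, cost, reloads, cap):
    """RL_d(history) per the inductive definition; BOTTOM encodes ⊥."""
    level = d
    for i in range(0, len(history) - 1, 2):
        if level is BOTTOM:          # exhaustion is permanent
            return BOTTOM
        s, a = history[i], history[i + 1]
        c = cost[(s, a)]
        # Leaving a reload state starts from the full capacity.
        available = cap if s in reloads else level
        level = available - c if c <= available else BOTTOM
    return level


# Assumed fragment of the running example (values inferred from the text).
cost = {("s1", "a2"): 2, ("s5", "a2"): 1}
reloads = {"s2", "s5"}
cap = 20


def alpha(i):
    """The history (s1 a2 s5 a2)^i s1."""
    return ["s1", "a2", "s5", "a2"] * i + ["s1"]
```

With these assumed costs, `resource_level(alpha(i), 2, cost, reloads, cap)` evaluates to 19 for every `i >= 1`, while an initial load of 1 already yields `BOTTOM` on the first step.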

A run $\rho$ is $d$*-safe* if and only if the energy level initialized by $d$ is a non-negative number for each finite prefix of $\rho$, i.e. if for all $i > 0$ we have $RL\_d(\rho\_{..i}) \ne \bot$. We say that a run is safe if it is $cap(\mathcal{M})$-safe. The next lemma follows immediately from the definition of an energy level.

**Lemma 1.** *Let* $\rho = s\_1 a\_1 s\_2 \ldots$ *be a* $d$*-safe run for some* $d$ *and let* $\alpha$ *be a history such that* $last(\alpha) = s\_1$*. Then the run* $\alpha \odot \rho$ *is* $e$*-safe if* $RL\_e(\alpha) \ge d$*.*

*Example 1.* Recall the CMDP and the parameterized history $\alpha(i)$ from above. We know that $RL\_2(\alpha(i)) = 19$ for all $i$. Therefore, a strategy that always picks $a\_2$ in $s\_1$ is $d$-safe in $s\_1$ for all $d \ge 2$. On the other hand, a strategy that always picks $a\_1$ in $s\_1$ is *not* $d$-safe in $s\_1$ for any $0 \le d \le 20 = cap(\mathcal{M})$ because for all runs $\rho$ that visit $s\_3$ at least three times before $s\_2$ we have $RL\_d(\rho) = \bot$.

*Objectives.* An *objective* is a set of runs. The objective $\mathsf{SafeRuns}(d)$ contains exactly the $d$-safe runs. Given a *target set* $T \subseteq S$ and $i \in \mathbb{N}$, we define $\mathsf{Reach}^i\_T = \{\rho \in \mathsf{Runs} \mid \rho\_j \in T \text{ for some } 1 \le j \le i + 1\}$ to be the set of all runs that reach some state from $T$ within the first $i$ steps. We put $\mathsf{Reach}\_T = \bigcup\_{i \in \mathbb{N}} \mathsf{Reach}^i\_T$. Finally, we put $\mathsf{B\ddot{u}chi}\_T = \{\rho \in \mathsf{Runs} \mid \rho\_i \in T \text{ for infinitely many } i \in \mathbb{N}\}$.

*Problems.* We solve three main qualitative problems for CMDPs, namely *safety*, *positive reachability*, and *Büchi*.

Let us fix a state $s$ and a target set of states $T$. We say that a strategy $\sigma$ is $d$*-safe in* $s$ if $\mathsf{Comp}(\sigma, s) \subseteq \mathsf{SafeRuns}(d)$. We say that $\sigma$ is $T$*-positive* $d$*-safe in* $s$ if it is $d$-safe in $s$ and $\mathbb{P}^{\sigma}\_{s}(\mathsf{Reach}\_T) > 0$, which means that there exists a run in $\mathsf{Comp}(\sigma, s)$ that visits $T$. Finally, we say that $\sigma$ is $T$*-Büchi* $d$*-safe in a state* $s$ if it is $d$-safe in $s$ and $\mathbb{P}^{\sigma}\_{s}(\mathsf{B\ddot{u}chi}\_T) = 1$.

The vectors $Safe$, $SafePR\_T$ (PR for "positive reachability"), and $SafeB\ddot{u}chi\_T$ of type $\overline{\mathbb{N}}^S$ contain, for each $s \in S$, the minimal $d$ such that there exists a strategy that is $d$-safe in $s$, $T$-positive $d$-safe in $s$, and $T$-Büchi $d$-safe in $s$, respectively, and $\infty$ if no such strategy exists.

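The problems we consider for a given CMDP are:

– *Safety*: compute the vector $Safe$ and a strategy that is $Safe(s)$-safe in every state $s$ for which $Safe(s)$ is finite.
– *Positive reachability*: compute the vector $SafePR\_T$ and a strategy that is $T$-positive $SafePR\_T(s)$-safe in every state $s$ for which $SafePR\_T(s)$ is finite.
– *Büchi*: compute the vector $SafeB\ddot{u}chi\_T$ and a strategy that is $T$-Büchi $SafeB\ddot{u}chi\_T(s)$-safe in every state $s$ for which $SafeB\ddot{u}chi\_T(s)$ is finite.
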
*Example 2.* Now consider again the $d$-safe strategy from Example 1 that always picks $a\_2$; such a strategy is 2-safe in $s\_1$, but is not useful if we attempt to eventually reach $T$. Hence memoryless strategies are not sufficient in our setting. Consider, instead, a strategy $\sigma$ that picks $a\_1$ in $s\_1$ whenever the current resource level is at least 10 and picks $a\_2$ otherwise. Such a strategy is 2-safe in $s\_1$ and guarantees reaching $s\_2$ with a positive probability: we need at least 10 units of energy to return to $s\_5$ in the case we are unlucky and picking $a\_1$ leads us to $s\_3$. If we are lucky, $a\_1$ leads us to $s\_2$ by consuming just 5 units of the resource, witnessing that $\sigma$ is $T$-positive. As a matter of fact, during *every* revisit of $s\_5$ there is a $\frac{1}{2}$ chance of hitting $s\_2$ during the next try, so $\sigma$ actually ensures that $s\_2$ is visited with probability 1.

Solving a CMDP is substantially different from solving a consumption 2-player game [14]. Indeed, imagine that in $\mathcal{M}$ from Fig. 1, the outcome of the action $a\_1$ from state $s\_1$ is resolved by an adversarial player. In such a game, the strategy $\sigma$ does not produce any run that reaches $s\_2$. In fact, there would be no strategy that guarantees reaching $T$ in a 2-player game like this at all.

The strategy σ from our example uses finite memory to track the resource level exactly. We describe an efficient representation of such strategies in the next section.

#### **3 Counter Strategies**

In this section, we define a succinct representation of finite-memory strategies via so-called counter selectors. Under the standard definition, a strategy $\sigma$ is a *finite memory* strategy if $\sigma$ can be encoded by a *memory structure*, a type of finite transducer. Formally, a memory structure is a tuple $\mu = (M, nxt, up, m\_0)$ where $M$ is a finite set of *memory elements*, $nxt : M \times S \to A$ is a *next action* function, $up : M \times S \times A \times S \to M$ is a *memory update* function, and $m\_0 : S \to M$ is the *memory initialization function*. The function $up$ can be lifted to a function $up^\* : M \times hist \to M$ as follows.

$$up^\*(m,\alpha) = \begin{cases} m & \text{if } \alpha = s \text{ has length } 0\\ up\left(up^\*(m,\beta), last(\beta), a, t\right) & \text{if } \alpha = \beta at \text{ for some } a \in A \text{ and } t \in S \end{cases}$$

The structure $\mu$ encodes a strategy $\sigma\_\mu$ such that for each history $\alpha = s\_1 a\_1 s\_2 \ldots s\_n$ we have $\sigma\_\mu(\alpha) = nxt\big(up^\*(m\_0(s\_1), \alpha), s\_n\big)$.

In our setting, strategies need to track energy levels of histories. Let us fix a CMDP $\mathcal{M} = (S, A, \Delta, C, R, cap)$. A non-exhausted energy level is always a number between 0 and $cap(\mathcal{M})$, which can be represented with a binary-encoded bounded counter. We call strategies with such counters *finite counter (FC) strategies*. An FC strategy selects actions to play according to *selection rules*.

**Definition 3 (Selection rule).** *A* selection rule $\varphi$ *for* $\mathcal{M}$ *is a partial function from the set* $\{0, \ldots, cap(\mathcal{M})\}$ *to* $A$*. Undefined value for some* $n$ *is indicated by* $\varphi(n) = \bot$*.*

We use $dom(\varphi) = \{n \in \{0, \ldots, cap(\mathcal{M})\} \mid \varphi(n) \ne \bot\}$ to denote the domain of $\varphi$ and we use $Rules\_{\mathcal{M}}$ or simply $Rules$ for the set of all selection rules for $\mathcal{M}$. Intuitively, a selection according to rule $\varphi$ selects the action that corresponds to the largest value from $dom(\varphi)$ that is not larger than the current energy level. To be more precise, if $dom(\varphi)$ consists of numbers $n\_1 < n\_2 < \cdots < n\_k$, then the action to be selected in a given moment is $\varphi(n\_i)$, where $n\_i$ is the largest element of $dom(\varphi)$ which is less than or equal to the current amount of the resource. In other words, $\varphi(n\_i)$ is to be selected if the current resource level is in $[n\_i, n\_{i+1})$ (putting $n\_{k+1} = \infty$).
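Selecting an action according to a rule is a predecessor search among the thresholds in the rule's domain. The following Python sketch illustrates this with a sorted list and binary search; the function name `select` and the dictionary encoding of a rule (threshold to action) are assumptions of this example. The rule `phi` mirrors the thresholds 0 and 10 of the strategy from Example 2.

```python
from bisect import bisect_right


def select(rule, level):
    """Action chosen by a selection rule at a given resource level.

    `rule` maps each element of dom(rule) to an action. Returns None when
    no element of the domain is less than or equal to `level`.
    """
    thresholds = sorted(rule)
    k = bisect_right(thresholds, level)   # number of thresholds <= level
    return rule[thresholds[k - 1]] if k > 0 else None


# Pick a2 on levels [0, 10) and a1 on levels [10, infinity).
phi = {0: "a2", 10: "a1"}
```

Since a rule is fixed once computed, an implementation would normally sort the thresholds once rather than on every call; the sort is kept inside `select` here only to keep the sketch short.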

**Definition 4 (Counter selector).** *A* counter selector *for* $\mathcal{M}$ *is a function* $\Sigma : S \to Rules$*.*

A counter selector itself is not enough to describe a strategy. A strategy needs to keep track of the energy level throughout the path. With a vector $\mathbf{r} \in \{0, \ldots, cap(\mathcal{M})\}^S$ of initial resource levels, each counter selector $\Sigma$ defines a strategy $\Sigma^{\mathbf{r}}$ that is encoded by the following memory structure $(M, nxt, up, m\_0)$ with $a \in A$ being a globally fixed action (for uniqueness). We stipulate that $\bot < n$ for all $n \in \mathbb{N}$.

– M = {⊥} ∪ {0,..., *cap*(M)}.
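
– Let $m \in M$ be a memory element, let $s \in S$ be a state, and let $n$ be the largest element of $dom(\Sigma(s))$ such that $n \le m$. Then $nxt(m, s) = \Sigma(s)(n)$ if such an $n$ exists, and $nxt(m, s) = a$ otherwise.

– The function $up$ is defined for each $m \in M$, $a \in A$, and $s, t \in S$ as follows:
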
$$up(m, s, a, t) = \begin{cases} m - C(s, a) & \text{if } s \notin R \text{ and } C(s, a) \le m \ne \bot \\ \operatorname{cap}(\mathcal{M}) - C(s, a) & \text{if } s \in R \text{ and } C(s, a) \le \operatorname{cap}(\mathcal{M}) \text{ and } m \ne \bot \\ \bot & \text{otherwise.} \end{cases}$$

– The function $m\_0$ is $m\_0(s) = \mathbf{r}(s)$.

A strategy $\sigma$ is a finite counter (FC) strategy if there is a counter selector $\Sigma$ and a vector $\mathbf{r}$ such that $\sigma = \Sigma^{\mathbf{r}}$. The counter selector can be imagined as a finite-state device that implements $\sigma$ using $O(\log(cap(\mathcal{M})))$ bits of additional memory (counter) used to represent numbers $0, 1, \ldots, cap(\mathcal{M})$. The device uses the counter to keep track of the current resource level, the element $\bot$ representing energy exhaustion. Note that a counter selector can be exponentially more succinct than the corresponding memory structure.

*Example 3.* Consider again the CMDP $\mathcal{M}$ in Fig. 1 and a counter selector $\Sigma$ defined as follows: Let $\varphi$ be a selection rule with $dom(\varphi) = \{0, 10\}$ such that $\varphi(0) = a\_2$ and $\varphi(10) = a\_1$. Then let $\varphi'$ be a selection rule such that $dom(\varphi') = \{0\}$ and $\varphi'(0) = a\_1$. Finally, let $\Sigma$ be a counter selector such that $\Sigma(s\_1) = \varphi$ and $\Sigma(s\_i) = \varphi'$ for all $i \ne 1$. Then, for a vector of initial resource levels $\mathbf{r}$, the strategy $\sigma$ informally described in Example 2 can be formally represented by putting $\sigma = \Sigma^{\mathbf{r}}$. Note that for any $\mathbf{r}$ with $\mathbf{r}(s\_1) \ge 2$, $\mathbf{r}(s\_2) \ge 0$, $\mathbf{r}(s\_3) \ge 5$, $\mathbf{r}(s\_4) \ge 4$, and $\mathbf{r}(s\_5) \ge 0$ and for any state $s$ of $\mathcal{M}$ the strategy $\Sigma^{\mathbf{r}}$ is $\mathbf{r}(s)$-safe in $s$.
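The way a counter selector and a vector of initial levels induce a strategy can be sketched as a small class that stores only the selector and recomputes the counter along a history. This is illustrative Python under stated assumptions: the class `CounterStrategy` and its method names are hypothetical, the selector and the vector `r` follow Example 3, and the two costs in `cost` are assumed values covering only the steps used here (Fig. 1 is not reproduced).

```python
BOTTOM = None  # memory element ⊥ (resource exhausted)


class CounterStrategy:
    """Strategy induced by a counter selector and a vector of initial levels."""

    def __init__(self, selector, init_levels, cost, reloads, cap, default_action):
        self.selector = selector              # state -> {threshold: action}
        self.init_levels = init_levels        # state -> initial resource level
        self.cost, self.reloads, self.cap = cost, reloads, cap
        self.default_action = default_action  # the globally fixed action

    def nxt(self, m, s):
        if m is BOTTOM:                       # ⊥ lies below every threshold
            return self.default_action
        eligible = [n for n in self.selector[s] if n <= m]
        return self.selector[s][max(eligible)] if eligible else self.default_action

    def up(self, m, s, a, t):
        if m is BOTTOM:
            return BOTTOM
        available = self.cap if s in self.reloads else m
        c = self.cost[(s, a)]
        return available - c if c <= available else BOTTOM

    def act(self, history):
        """Action played after the history [s1, a1, s2, ..., sn]."""
        m = self.init_levels[history[0]]
        for i in range(0, len(history) - 1, 2):
            m = self.up(m, history[i], history[i + 1], history[i + 2])
        return self.nxt(m, history[-1])


phi = {0: "a2", 10: "a1"}      # rule for s1
phi_prime = {0: "a1"}          # rule for every other state
selector = {"s1": phi, "s2": phi_prime, "s3": phi_prime,
            "s4": phi_prime, "s5": phi_prime}
r = {"s1": 2, "s2": 0, "s3": 5, "s4": 4, "s5": 0}
cost = {("s1", "a2"): 2, ("s5", "a1"): 1}   # assumed costs
sigma = CounterStrategy(selector, r, cost, reloads={"s2", "s5"},
                        cap=20, default_action="a1")
```

Starting in `s1` with level 2, `sigma.act(["s1"])` picks `a2`; after the history `s1 a2 s5 a1 s1` the counter is 19 under the assumed costs, so the strategy switches to `a1`. The memory footprint is a single integer no larger than the capacity, which is the source of the succinctness noted above.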

#### **4 Safety**

In this section, we present an algorithm that computes, for each state, the minimal value d (if it exists) such that there exists a d-safe strategy from that state. We also provide the corresponding strategy. In the remainder of the section we fix an MDP M.

A $d$-safe run has the following two properties: (i) it consumes at most $d$ units of the resource (energy) before it reaches the first reload state, and (ii) it never consumes more than $cap(\mathcal{M})$ units of the resource between two visits of reload states. To ensure (ii), we need to identify a maximal subset $R' \subseteq R$ of reload states for which there is a strategy $\sigma$ that, starting in some $r \in R'$, can always reach $R'$ again (within at least one step) using at most $cap(\mathcal{M})$ resource units. The $d$-safe strategy we seek can be then assembled from $\sigma$ and from a strategy that suitably navigates towards $R'$, which is needed for (i).

In the core of both properties (i) and (ii) lies the problem of *minimum cost reachability.* Hence, in the next subsection, we start with presenting necessary results on this problem.

#### **4.1 Minimum Cost Reachability**

The problem of minimum cost reachability with non-negative costs was studied before [46]. Here we present a simple approach to the problem, which is the one used in our implementation; most of the technical details are available in the full version.

**Definition 5.** *Let* $T \subseteq S$ *be a set of* target *states, let* $\alpha = s\_1 a\_1 s\_2 \ldots$ *be a finite or infinite path, and let* $1 \le f$ *be the smallest index such that* $s\_f \in T$*. We define* consumption of $\alpha$ to $T$ *as* $ReachCons\_{\mathcal{M},T}(\alpha) = cons(\alpha\_{..f})$ *if* $f$ *exists and we set* $ReachCons\_{\mathcal{M},T}(\alpha) = \infty$ *otherwise. For a strategy* $\sigma$ *and a state* $s \in S$ *we define* $ReachCons\_{\mathcal{M},T}(\sigma, s) = \sup\_{\rho \in \mathsf{Comp}(\sigma,s)} ReachCons\_{\mathcal{M},T}(\rho)$*. A* minimum cost reachability of $T$ from $s$ *is a vector defined as*

$$MinReach\_{\mathcal{M},T}(s) = \inf \left\{ ReachCons\_{\mathcal{M},T}(\sigma, s) \mid \sigma \text{ is a strategy for } \mathcal{M} \right\}.$$

Intuitively, $d = MinReach\_T(s)$ is the minimal initial load with which some strategy can ensure reaching $T$ with consumption at most $d$, when starting in $s$. We say that a strategy $\sigma$ is optimal for $MinReach\_T$ if we have that $MinReach\_T(s) = ReachCons\_T(\sigma, s)$ for all states $s \in S$.

We also define functions $ReachCons^{+}\_{\mathcal{M},T}$ and the vector $MinReach^{+}\_{\mathcal{M},T}$ in a similar fashion with one exception: we require the index $f$ from the definition of $ReachCons\_{\mathcal{M},T}(\alpha)$ to be strictly larger than 1, which enforces taking at least one step to reach $T$.

For the rest of this section, fix a target set T and consider the following functional F :

$$\mathcal{F}(\mathbf{v})(s) = \begin{cases} \min\_{a \in A} \left( C(s, a) + \max\_{t \in Succ(s, a)} \mathbf{v}(t) \right) & s \notin T\\ 0 & s \in T \end{cases}$$

$\mathcal{F}$ is a simple generalization of the standard Bellman functional used for computing shortest paths in graphs. The proof of the following theorem is rather standard and has been moved to the full version of the paper.

**Theorem 2.** *Denote by* $n$ *the length of the longest simple path in* $\mathcal{M}$*. Let* $\mathbf{x}\_T$ *be a vector such that* $\mathbf{x}\_T(s) = 0$ *if* $s \in T$ *and* $\mathbf{x}\_T(s) = \infty$ *otherwise. Then iterating* $\mathcal{F}$ *on* $\mathbf{x}\_T$ *yields a fixpoint in at most* $n$ *steps and this fixpoint equals* $MinReach\_T$*.*
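The fixpoint iteration of Theorem 2 can be sketched in a few lines of Python. This is an illustration only: the function names `apply_F` and `min_reach`, the dictionary-based encoding of $Succ$ and $C$, and the four-state example are assumptions of this sketch. In the example, action `a` in `u` is cheap but may lead to the expensive state `v`, so the worst-case (max over successors) semantics makes action `b` the better choice.

```python
import math


def apply_F(x, states, actions, succ, cost, targets):
    """One application of the functional F to the vector x (dict state -> value)."""
    return {
        s: 0 if s in targets else min(
            cost[(s, a)] + max(x[t] for t in succ[(s, a)]) for a in actions
        )
        for s in states
    }


def min_reach(states, actions, succ, cost, targets):
    """Iterate F from x_T until a fixpoint is reached."""
    x = {s: 0 if s in targets else math.inf for s in states}
    while True:
        new_x = apply_F(x, states, actions, succ, cost, targets)
        if new_x == x:
            return x
        x = new_x


# Hypothetical example with target set {t}.
states, actions, targets = ["u", "v", "w", "t"], ["a", "b"], {"t"}
succ = {
    ("u", "a"): {"v", "t"}, ("u", "b"): {"w"},
    ("v", "a"): {"t"}, ("v", "b"): {"t"},
    ("w", "a"): {"t"}, ("w", "b"): {"t"},
    ("t", "a"): {"t"}, ("t", "b"): {"t"},
}
cost = {
    ("u", "a"): 1, ("u", "b"): 2,
    ("v", "a"): 5, ("v", "b"): 5,
    ("w", "a"): 1, ("w", "b"): 1,
    ("t", "a"): 1, ("t", "b"): 1,
}
reach = min_reach(states, actions, succ, cost, targets)
# reach == {"u": 3, "v": 5, "w": 1, "t": 0}
```

The loop relies on the existence of a fixpoint reachable by iteration, which Theorem 2 provides for the decreasing CMDPs considered in this paper.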

To compute $MinReach^{+}\_{\mathcal{M},T}$, we construct a new CMDP $\widetilde{\mathcal{M}}$ from $\mathcal{M}$ by adding a copy $\tilde{s}$ of each state $s \in S$ such that dynamics in $\tilde{s}$ is the same as in $s$; i.e. for each $a \in A$, $\Delta(\tilde{s}, a) = \Delta(s, a)$ and $C(\tilde{s}, a) = C(s, a)$. We denote the new state set as $\widetilde{S}$. We don't change the set of reload states, so $\tilde{s}$ is *never* in $T$, even if $s$ is. Given the new CMDP $\widetilde{\mathcal{M}}$ and the new state set $\widetilde{S}$, the following lemma is straightforward.

**Lemma 2.** *Let* $\mathcal{M}$ *be a CMDP and let* $\widetilde{\mathcal{M}}$ *be the CMDP constructed as above. Then for each state* $s$ *of* $\mathcal{M}$ *it holds that* $MinReach^{+}\_{\mathcal{M},T}(s) = MinReach\_{\widetilde{\mathcal{M}},T}(\tilde{s})$*.*
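The state-copying construction behind Lemma 2 can be illustrated as follows. The Python sketch below is an assumption-laden example rather than the paper's code: `min_reach` is a compact fixpoint iteration of $\mathcal{F}$, `min_reach_plus` builds the copies as tuples `("copy", s)`, and the two-state loop is hypothetical.

```python
import math


def min_reach(states, actions, succ, cost, targets):
    """Fixpoint of F, computed by iteration from x_T."""
    x = {s: 0 if s in targets else math.inf for s in states}
    while True:
        new_x = {
            s: 0 if s in targets else min(
                cost[(s, a)] + max(x[t] for t in succ[(s, a)]) for a in actions
            )
            for s in states
        }
        if new_x == x:
            return x
        x = new_x


def min_reach_plus(states, actions, succ, cost, targets):
    """MinReach^+ read off at the copies of the original states."""
    copy = {s: ("copy", s) for s in states}
    succ2, cost2 = dict(succ), dict(cost)
    for s in states:
        for a in actions:
            succ2[(copy[s], a)] = succ[(s, a)]   # copies lead into original states
            cost2[(copy[s], a)] = cost[(s, a)]
    extended = list(states) + list(copy.values())
    # The target set is unchanged, so a copy is never a target.
    x = min_reach(extended, actions, succ2, cost2, targets)
    return {s: x[copy[s]] for s in states}


# Hypothetical loop r --2--> u --3--> r with target set {r}.
states, actions, targets = ["r", "u"], ["a"], {"r"}
succ = {("r", "a"): {"u"}, ("u", "a"): {"r"}}
cost = {("r", "a"): 2, ("u", "a"): 3}
plus = min_reach_plus(states, actions, succ, cost, targets)
# plus == {"r": 5, "u": 3}, whereas min_reach gives 0 for r
```

The difference is visible at the target state `r`: reaching the target costs nothing there, but returning to it in at least one step costs the whole loop.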

#### **4.2 Safely Reaching Reload States**

In the following, we use $MinInitCons\_{\mathcal{M}}$ (read *minimal initial consumption*) for the vector $MinReach^{+}\_{\mathcal{M},R}$, the minimal resource level that ensures we can surely reach a reload state in at least one step. By Lemma 2 and Theorem 2 we can construct $\widetilde{\mathcal{M}}$ and iterate the operator $\mathcal{F}$ for $|S|$ steps to compute $MinInitCons\_{\mathcal{M}}$. Note that $S$ is the state space of $\mathcal{M}$: $|S|$ steps suffice since introducing the new states into $\widetilde{\mathcal{M}}$ did not increase the length of the maximal simple path. However, we can avoid the construction of $\widetilde{\mathcal{M}}$ and still compute $MinInitCons\_{\mathcal{M}}$ using a *truncated* version of the functional $\mathcal{F}$, which is the approach used in our implementation. We first introduce the following truncation operator:

$$\|\mathbf{x}\|\_{\mathcal{M}}(s) = \begin{cases} \mathbf{x}(s) & \text{if } s \notin R, \\ 0 & \text{if } s \in R. \end{cases}$$

## **Algorithm 1:** Algorithm for computing $MinInitCons\_{\mathcal{M}}$.

    Input: CMDP M = (S, A, Δ, C, R, cap)
    Output: The vector MinInitCons_M
    initialize x ∈ (ℕ ∪ {∞})^S to be ∞ in every component;
    repeat
        x_old ← x;
        foreach s ∈ S do
            c ← min_{a ∈ A} ( C(s, a) + max_{s' ∈ Succ(s, a)} ‖x_old‖_M(s') );
            if c < x(s) then
                x(s) ← c;
    until x_old = x;
    return x

Then, we define a truncated functional G as follows:

$$\mathcal{G}(\mathbf{v})(s) = \min\_{a \in A} \left( C(s, a) + \max\_{s' \in Succ(s, a)} \|\mathbf{v}\|\_{\mathcal{M}}(s') \right).$$

The following lemma connects the iteration of $\mathcal{G}$ on $\mathcal{M}$ with the iteration of $\mathcal{F}$ on $\widetilde{\mathcal{M}}$.

**Lemma 3.** *Let $\boldsymbol{\infty} \in \overline{\mathbb{N}}^S$ be a vector with all components equal to $\infty$. Consider iterating $\mathcal{G}$ on $\boldsymbol{\infty}$ in $\mathcal{M}$ and $\mathcal{F}$ on $\mathbf{x}_R$ in $\widetilde{\mathcal{M}}$. Then for each $i \ge 0$ and each $s \in R$ we have $\mathcal{G}^i(\boldsymbol{\infty})(s) = \mathcal{F}^i(\mathbf{x}_R)(\tilde{s})$, and for every $s \in S \setminus R$ we have $\mathcal{G}^i(\boldsymbol{\infty})(s) = \mathcal{F}^i(\mathbf{x}_R)(s)$.*

Algorithm 1 uses $\mathcal{G}$ to compute the vector $\mathit{MinInitCons}_{\mathcal{M}}$.

**Theorem 3.** *Algorithm 1 correctly computes the vector $\mathit{MinInitCons}_{\mathcal{M}}$. Moreover, the repeat-loop terminates after at most $|S|$ iterations.*
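The iteration performed by Algorithm 1 can be sketched in a few lines of Python. This is a minimal sketch and not the FiMDP implementation: the dictionary encoding `{state: {action: (cost, successors)}}`, the name `min_init_cons`, and the three-state example CMDP are all assumptions made for illustration.

```python
import math

def min_init_cons(cmdp, reloads):
    """Iterate the truncated functional G from the all-infinity vector.

    cmdp maps each state to {action: (cost, list_of_successors)};
    reloads is the set of reload states (hypothetical encoding).
    """
    x = {s: math.inf for s in cmdp}
    while True:
        x_old = dict(x)
        for s, acts in cmdp.items():
            # Truncation: a successor that is a reload state contributes 0.
            c = min(
                cost + max(0 if t in reloads else x_old[t] for t in succs)
                for cost, succs in acts.values()
            )
            if c < x[s]:
                x[s] = c
        if x == x_old:
            return x

# Hypothetical example: "r" is the only reload state.
EXAMPLE = {
    "r": {"a": (1, ["u"])},
    "u": {"a": (2, ["r", "v"]), "b": (5, ["r"])},
    "v": {"a": (1, ["r"])},
}
MIC = min_init_cons(EXAMPLE, {"r"})
```

On this example the sketch yields 4 for `r`, 3 for `u`, and 1 for `v`: from `u`, action `a` costs 2 and may lead to `v`, from which one more unit reaches the reload state.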

#### **4.3 Solving the Safety Problem**

We want to identify a set $R' \subseteq R$ such that, from each $r \in R'$, we can reach $R'$ in at least one step with consumption at most $\mathit{cap} = \mathit{cap}(\mathcal{M})$. This entails identifying the maximal $R' \subseteq R$ such that $\mathit{MinInitCons}_{\mathcal{M}(R')}(r) \le \mathit{cap}$ for each $r \in R'$. This can be done by initially setting $R' = R$ and iteratively removing from $R'$ the states $r$ with $\mathit{MinInitCons}_{\mathcal{M}(R')}(r) > \mathit{cap}$, as in Algorithm 2.

**Theorem 4.** *Algorithm 2 computes the vector $\mathit{Safe}_{\mathcal{M}}$ in polynomial time.*

*Proof.* The algorithm clearly terminates. Computing $\mathit{MinInitCons}_{\mathcal{M}(\mathit{Rel})}$ on line 5 takes a polynomial number of steps per call due to Theorem 3 and since $\mathcal{M}(\mathit{Rel})$ has asymptotically the same size as $\mathcal{M}$. Since the repeat loop performs at most $|R|$ iterations, the complexity follows.

**Algorithm 2:** Computing the vector $\mathit{Safe}_{\mathcal{M}}$.

```
Input: CMDP M
Output: The vector Safe_M
 1  cap ← cap(M);
 2  Rel ← R; ToRemove ← ∅;
 3  repeat
 4      Rel ← Rel ∖ ToRemove;
 5      mic ← MinInitCons_{M(Rel)};
 6      ToRemove ← {r ∈ Rel | mic(r) > cap};
 7  until ToRemove = ∅;
 8  foreach s ∈ S do
 9      if mic(s) > cap then out(s) = ∞;
10      else out(s) = mic(s);
11  return out
```

As for correctness, we first prove that $\mathbf{out} \le \mathit{Safe}_{\mathcal{M}}$. It suffices to prove for each $s \in S$ that upon termination, $\mathbf{mic}(s) \le \mathit{Safe}_{\mathcal{M}}(s)$ whenever the latter value is finite. Since $\mathit{MinInitCons}_{\mathcal{M}'}(s) \le \mathit{Safe}_{\mathcal{M}'}(s)$ for each CMDP $\mathcal{M}'$ and each of its states $s$ such that $\mathit{Safe}_{\mathcal{M}'}(s) < \infty$, it suffices to show that $\mathit{Safe}_{\mathcal{M}(\mathit{Rel})} \le \mathit{Safe}_{\mathcal{M}}$ is an invariant of the algorithm (as a matter of fact, we prove that $\mathit{Safe}_{\mathcal{M}(\mathit{Rel})} = \mathit{Safe}_{\mathcal{M}}$). To this end, it suffices to show that at every point of execution $\mathit{Safe}_{\mathcal{M}}(t) = \infty$ for each $t \in R \setminus \mathit{Rel}$: indeed, if this holds, no strategy that is safe for some state $s \ne t$ can play an action $a$ from $s$ such that $t \in \mathit{Succ}(s, a)$, so declaring such states non-reloading does not influence the $\mathit{Safe}_{\mathcal{M}}$-values.

So denote by $\mathit{Rel}_i$ the contents of $\mathit{Rel}$ after the $i$-th iteration. We prove, by induction on $i$, that $\mathit{Safe}_{\mathcal{M}}(s) = \infty$ for all $s \in R \setminus \mathit{Rel}_i$. For $i = 0$ we have $R = \mathit{Rel}_0$, so the statement holds. For $i > 0$, let $s \in R \setminus \mathit{Rel}_i$, and let $\sigma$ be any strategy. If some run from $\mathsf{Comp}(\sigma, s)$ visits a state from $R \setminus \mathit{Rel}_{i-1}$, then $\sigma$ is not $\mathit{cap}$-safe, by the induction hypothesis. Now assume that all such runs only visit reload states from $\mathit{Rel}_{i-1}$. Then, since $\mathit{MinInitCons}_{\mathcal{M}(\mathit{Rel}_{i-1})}(s) > \mathit{cap}$, there must be a run $\varrho \in \mathsf{Comp}(\sigma, s)$ with $\mathit{ReachCons}^{+}_{\mathit{Rel}_{i-1}}(\varrho) > \mathit{cap}$. Assume that $\varrho$ is $\mathit{cap}$-safe in $s$. Since we consider only decreasing CMDPs, $\varrho$ must infinitely often visit a reload state (as it cannot get stuck in a zero cycle). Hence, there exists an index $f > 1$ such that $\varrho_f \in \mathit{Rel}_{i-1}$, and for this $f$ we have $\mathit{RL}_{\mathit{cap}}(\varrho_{..f}) = \bot$, a contradiction. So again, $\sigma$ is not safe in $s$. Since there is no safe strategy from $s$, we have $\mathit{Safe}_{\mathcal{M}}(s) = \infty$.

Finally, we need to prove that upon termination, $\mathbf{out} \ge \mathit{Safe}_{\mathcal{M}}$. Informally, per the definition of $\mathbf{out}$, from every state $s$ we can ensure reaching a state of $\mathit{Rel}$ by consuming at most $\mathbf{out}(s)$ units of the resource. Once in $\mathit{Rel}$, we can ensure that we can again return to $\mathit{Rel}$ without consuming more than $\mathit{cap}$ units of the resource. Hence, when starting with $\mathbf{out}(s)$ units, we can surely prevent resource exhaustion.
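The pruning loop of Algorithm 2 can be sketched as follows. This is an illustrative sketch under the same assumed encoding `{state: {action: (cost, successors)}}`; the names `min_init_cons` and `safe_levels` and the five-state example are hypothetical, and the output follows the pseudocode literally (`out(s) = mic(s)` whenever `mic(s)` does not exceed the capacity).

```python
import math

def min_init_cons(cmdp, reloads):
    """Fixed point of the truncated functional G (see Algorithm 1)."""
    x = {s: math.inf for s in cmdp}
    while True:
        x_old = dict(x)
        for s, acts in cmdp.items():
            c = min(
                cost + max(0 if t in reloads else x_old[t] for t in succs)
                for cost, succs in acts.values()
            )
            if c < x[s]:
                x[s] = c
        if x == x_old:
            return x

def safe_levels(cmdp, reloads, cap):
    """Drop reload states that cannot reach Rel within cap, then cap the result."""
    rel = set(reloads)
    while True:
        mic = min_init_cons(cmdp, rel)  # M(Rel): only states in rel reload
        to_remove = {r for r in rel if mic[r] > cap}
        if not to_remove:
            break
        rel -= to_remove
    return {s: (mic[s] if mic[s] <= cap else math.inf) for s in cmdp}

# Hypothetical example: reload state "q" needs 9 units to reach "r", but cap is 5.
EXAMPLE = {
    "r": {"a": (1, ["u"])},
    "u": {"a": (2, ["r", "v"]), "b": (5, ["r"])},
    "v": {"a": (1, ["r"])},
    "q": {"a": (9, ["r"])},
    "w": {"a": (1, ["q"]), "b": (3, ["r"])},
}
SAFE = safe_levels(EXAMPLE, {"r", "q"}, cap=5)
```

In the first round `w` appears to need only 1 unit (via `q`); once `q` is declared non-reloading, `w` has to use the costlier action `b` and its value rises to 3.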

**Definition 6.** *We call an action $a$ safe in a state $s$ if one of the following conditions holds:*

- *$s \notin R$ and $C(s, a) + \max_{t \in \mathit{Succ}(s,a)} \mathit{Safe}_{\mathcal{M}}(t) \le \mathit{Safe}_{\mathcal{M}}(s)$; or*
- *$s \in R$ and $C(s, a) + \max_{t \in \mathit{Succ}(s,a)} \mathit{Safe}_{\mathcal{M}}(t) \le \mathit{cap}(\mathcal{M})$.*

*Note that by the definition of $\mathit{Safe}_{\mathcal{M}}$, for each state $s$ with $\mathit{Safe}_{\mathcal{M}}(s) < \infty$ there is always at least one action safe in $s$. For states $s$ s.t. $\mathit{Safe}_{\mathcal{M}}(s) = \infty$, we stipulate all actions to be safe in $s$.*

**Theorem 5.** *Any strategy which always selects an action that is safe in the current state is $\mathit{Safe}_{\mathcal{M}}(s)$-safe in every state $s$. In particular, in each consumption MDP $\mathcal{M}$ there is a memoryless strategy $\sigma$ that is $\mathit{Safe}_{\mathcal{M}}(s)$-safe in every state $s$. Moreover, $\sigma$ can be computed in polynomial time.*

*Proof.* The first part of the theorem follows directly from Definition 6, Definition 2 (resource levels), and the definition of $d$-safe runs. The second part is a corollary of Theorem 4 and the fact that a strategy can fix one safe action (in the sense of Definition 6) in each state and is thus memoryless. The complexity follows from Theorem 4.

*Example 4.* Consider again the CMDP $\mathcal{M}$ from Fig. 1. Algorithm 1 returns, for input $\mathcal{M}$, the vector $\mathbf{mic} = (2, 1, 5, 4, 3)$. Algorithm 2 reuses $\mathbf{mic}$ on line 5 and returns it unchanged. Hence, the vector $\mathbf{mic}$ equals $\mathit{Safe}_{\mathcal{M}}$. The strategies described in Example 1 witness that $\mathit{Safe}(s_1) \le 2$. Here we see that there is no strategy that would be 1-safe in $s_1$.

### **5 Positive Reachability**

In this section, we focus on strategies that are safe and such that at least one run they produce visits a given set $T \subseteq S$ of *targets*. The main contribution of this section is Algorithm 3, used to compute such strategies as well as the vector $\mathit{SafePR}_{\mathcal{M},T}$ of minimal initial resource levels for which such a strategy exists. As before, for the rest of this section we fix a CMDP $\mathcal{M}$.

We define a function $\mathit{SPR\text{-}Val}_{\mathcal{M}} \colon S \times A \times \overline{\mathbb{N}}^S \to \overline{\mathbb{N}}$ ($\mathit{SPR}$ for safe positive reachability) s.t. for all $s \in S$, $a \in A$, and $\mathbf{x} \in \overline{\mathbb{N}}^S$ we have

$$\mathit{SPR\text{-}Val}_{\mathcal{M}}(s, a, \mathbf{x}) = C(s, a) + \min_{t \in \mathit{Succ}(s, a)} \left\{ \max \left\{ \mathbf{x}(t), \mathit{Safe}_{\mathcal{M}}(t') \mid t' \in \mathit{Succ}(s, a), t' \neq t \right\} \right\}$$

The max operator considers, for a given $t$, the value $\mathbf{x}(t)$ and the values needed to survive from all possible outcomes of $a$ other than $t$. Let $v = \mathit{SPR\text{-}Val}_{\mathcal{M}}(s, a, \mathbf{x})$ and let $t$ be the outcome selected by min. Intuitively, $v$ is the minimal amount of resource needed to reach $t$ with at least $\mathbf{x}(t)$ resource units, or survive if the outcome of $a$ is different from $t$.
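As a small worked instance with hypothetical numbers (not taken from Fig. 1), suppose $C(s, a) = 2$, $\mathit{Succ}(s, a) = \{t_1, t_2\}$, $\mathbf{x}(t_1) = 3$, $\mathbf{x}(t_2) = \infty$, $\mathit{Safe}_{\mathcal{M}}(t_1) = 1$ and $\mathit{Safe}_{\mathcal{M}}(t_2) = 4$. Evaluating the inner max for each candidate outcome and then taking the min gives:

$$\begin{aligned} t = t_1 &: \max\{\mathbf{x}(t_1), \mathit{Safe}_{\mathcal{M}}(t_2)\} = \max\{3, 4\} = 4, \\ t = t_2 &: \max\{\mathbf{x}(t_2), \mathit{Safe}_{\mathcal{M}}(t_1)\} = \max\{\infty, 1\} = \infty, \\ \mathit{SPR\text{-}Val}_{\mathcal{M}}(s, a, \mathbf{x}) &= 2 + \min\{4, \infty\} = 6. \end{aligned}$$

Here the min selects $t_1$, and the value is dominated by the level needed to survive the other outcome $t_2$ rather than by $\mathbf{x}(t_1)$ itself.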

We now define a functional whose fixed point characterizes $\mathit{SafePR}_{\mathcal{M},T}$. We first define a two-sided version of the truncation operator from the previous section: the operator $\llbracket \cdot \rrbracket_{\mathcal{M}}$ such that

$$\llbracket \mathbf{x} \rrbracket_{\mathcal{M}}(s) = \begin{cases} \infty & \text{if } \mathbf{x}(s) > \mathit{cap}(\mathcal{M}) \\ \mathbf{x}(s) & \text{if } \mathbf{x}(s) \le \mathit{cap}(\mathcal{M}) \text{ and } s \notin R \\ 0 & \text{if } \mathbf{x}(s) \le \mathit{cap}(\mathcal{M}) \text{ and } s \in R \end{cases}$$

Using the functions $\mathit{SPR\text{-}Val}$ and $\llbracket \cdot \rrbracket_{\mathcal{M}}$, we now define an auxiliary operator $\mathcal{A}$ and the main operator $\mathcal{B}$ as follows.

$$\mathcal{A}_{\mathcal{M}}(\mathbf{r})(s) = \begin{cases} \mathit{Safe}_{\mathcal{M}}(s) & \text{if } s \in T \\ \min_{a \in A} \left( \mathit{SPR\text{-}Val}_{\mathcal{M}}(s, a, \mathbf{r}) \right) & \text{otherwise}; \end{cases}$$

$$\mathcal{B}_{\mathcal{M}}(\mathbf{r}) = \llbracket \mathcal{A}_{\mathcal{M}}(\mathbf{r}) \rrbracket_{\mathcal{M}}$$

Let $\mathit{SafePR}^i_T$ be the vector such that for a state $s \in S$ the number $d = \mathit{SafePR}^i_T(s)$ is the minimal number such that there exists a strategy that is $d$-safe in $s$ and produces at least one run that visits $T$ within the first $i$ steps. Further, we denote by $\mathbf{y}_T$ a vector such that

$$\mathbf{y}_T(s) = \begin{cases} \mathit{Safe}_{\mathcal{M}}(s) & \text{if } s \in T \\ \infty & \text{if } s \notin T \end{cases}$$

The following lemma can be proved by a rather straightforward but technical induction.

**Lemma 4.** *Consider the iteration of $\mathcal{B}_{\mathcal{M}}$ on the initial vector $\mathbf{y}_T$. Then for each $i \ge 0$ it holds that $\mathcal{B}^i_{\mathcal{M}}(\mathbf{y}_T) = \mathit{SafePR}^i_{\mathcal{M},T}$.*

The following lemma says that iterating $\mathcal{B}_{\mathcal{M}}$ reaches a fixed point in a polynomial number of iterations. Intuitively, this is because when trying to reach $T$, it doesn't make sense to perform a cycle between two visits of a reload state (as this can only increase the resource consumption) and at the same time it doesn't make sense to visit the same reload state twice (since the resource is reloaded to the full capacity upon each visit). The proof is straightforward and is omitted in the interest of brevity. Detailed proofs for Lemma 4 and Lemma 5 are available in the full version of the paper.

**Lemma 5.** *Let $K = |R| + (|R| + 1) \cdot (|S| - |R| + 1)$. Taking the same initial vector $\mathbf{y}_T$ as in Lemma 4, we have $\mathcal{B}^K_{\mathcal{M}}(\mathbf{y}_T) = \mathit{SafePR}_{\mathcal{M},T}$.*
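To get a feel for the size of this bound, consider a hypothetical CMDP with $|S| = 5$ states of which $|R| = 2$ are reload states (these numbers are chosen only for illustration):

$$K = |R| + (|R| + 1) \cdot (|S| - |R| + 1) = 2 + (2 + 1) \cdot (5 - 2 + 1) = 2 + 3 \cdot 4 = 14.$$

Since $|R| \le |S|$, the bound is at most quadratic in the number of states.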

The computation of $\mathit{SafePR}_{\mathcal{M},T}$ and of the associated witness strategy is presented in Algorithm 3.

*Example 5.* Consider again the CMDP $\mathcal{M}$ from Fig. 1. After one iteration of the loop on line 5, we have $\mathbf{r} = (10, 0, \infty, \infty, \infty)$, as $\mathbf{r}$ is only finite for $s_2$ before this iteration. In the next iteration, we have $\mathbf{r} = (10, 0, \infty, 12, 0)$. Thus, the next iteration changes the value for $s_1$ to 2 and in the end, we end up with $\mathbf{r} = (2, 0, 4, 5, 0)$. The iteration with $\mathbf{r}(s_1) = 10$ influences the selector $\Sigma$. Note that the computed $\mathbf{r}$ and $\Sigma$ match those mentioned in Example 3.

**Theorem 6.** *Algorithm 3 always terminates after a polynomial number of steps, and upon termination, $\mathbf{r} = \mathit{SafePR}_{\mathcal{M},T}$.*

**Algorithm 3:** Positive reachability of $T$ in $\mathcal{M}$

**Input:** CMDP $\mathcal{M}$ with states $S$, set of target states $T \subseteq S$

**Output:** The vector $\mathit{SafePR}_{\mathcal{M},T}$, corresponding rule selector $\Sigma$

- **1** $\mathbf{r} \leftarrow \{\infty\}^S$;
- **2** **foreach** $s \in S$ s.t. $\mathit{Safe}_{\mathcal{M}}(s) < \infty$ **do**
  - **3** $\Sigma(s)(\mathit{Safe}_{\mathcal{M}}(s)) \leftarrow$ arbitrary action safe in $s$
- **4** **foreach** $t \in T$ **do** $\mathbf{r}(t) \leftarrow \mathit{Safe}_{\mathcal{M}}(t)$;
- **5** **repeat**
  - **6** $\mathbf{r}_{old} \leftarrow \mathbf{r}$;
  - **7** **foreach** $s \in S \setminus T$ **do**
    - **8** $\mathbf{a}(s) \leftarrow \arg\min_{a \in A} \mathit{SPR\text{-}Val}(s, a, \mathbf{r}_{old})$;
    - **9** $\mathbf{r}(s) \leftarrow \min_{a \in A} \mathit{SPR\text{-}Val}(s, a, \mathbf{r}_{old})$;
  - **10** $\mathbf{r} \leftarrow \llbracket \mathbf{r} \rrbracket_{\mathcal{M}}$;
  - **11** **foreach** $s \in S \setminus T$ **do**
    - **12** **if** $\mathbf{r}(s) < \mathbf{r}_{old}(s)$ **then**
      - **13** $\Sigma(s)(\mathbf{r}(s)) \leftarrow \mathbf{a}(s)$;
- **14** **until** $\mathbf{r}_{old} = \mathbf{r}$;
- **15** **return** $\mathbf{r}$, $\Sigma$

*Proof.* Lines 1–4 initialize $\mathbf{r}$ to $\mathbf{y}_T$. The repeat loop on lines 5–14 then iterates the operator $\mathcal{B}$. By Lemma 5, the iteration reaches a fixed point in at most $K$ steps, and this fixed point equals $\mathit{SafePR}_{\mathcal{M},T}$. The complexity bound follows easily, since $K$ is of polynomial magnitude.
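The vector part of Algorithm 3, that is, the iteration of $\mathcal{B}$ from $\mathbf{y}_T$, can be sketched in Python as follows. This is a simplified sketch that omits the selector $\Sigma$ entirely; the encoding `{state: {action: (cost, successors)}}`, all function names, and the example CMDP together with its `SAFE` input vector are hypothetical illustration data rather than anything taken from the paper or from FiMDP.

```python
import math

def spr_val(cost, succs, x, safe):
    """SPR-Val for one action: aim for successor t, survive in all others."""
    return cost + min(
        max([x[t]] + [safe[u] for u in succs if u != t]) for t in succs
    )

def truncate(r, reloads, cap):
    """Two-sided truncation: above cap becomes inf, reload states become 0."""
    return {
        s: math.inf if v > cap else (0 if s in reloads else v)
        for s, v in r.items()
    }

def safe_pos_reach(cmdp, reloads, cap, targets, safe):
    """Iterate the operator B from y_T until a fixed point (vector only)."""
    r = {s: safe[s] if s in targets else math.inf for s in cmdp}
    while True:
        r_old = dict(r)
        for s, acts in cmdp.items():
            if s not in targets:
                r[s] = min(
                    spr_val(cost, succs, r_old, safe)
                    for cost, succs in acts.values()
                )
        r = truncate(r, reloads, cap)
        if r == r_old:
            return r

# Hypothetical example: reload state "r", target "g"; action "a" in "u"
# reaches either "g" or "v".
EXAMPLE = {
    "r": {"a": (2, ["u"])},
    "u": {"a": (3, ["g", "v"])},
    "v": {"a": (1, ["r"])},
    "g": {"a": (1, ["r"])},
}
SAFE = {"r": 0, "u": 4, "v": 1, "g": 1}  # assumed input, supplied by the caller
REACH = safe_pos_reach(EXAMPLE, {"r"}, 10, {"g"}, SAFE)
```

In this run the value of `u` becomes finite first (4, since one outcome of its action hits the target), then `r` is truncated to 0 once its value 6 fits within the capacity, and finally `v` obtains value 1 because it can reach the reload state.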

The most intricate part of our analysis is extracting a strategy that is $T$-positive $\mathit{SafePR}_{\mathcal{M},T}(s)$-safe in every state $s$.

**Theorem 7.** *Let $\mathbf{v} = \mathit{SafePR}_{\mathcal{M},T}$. Upon termination of Algorithm 3, the computed selector $\Sigma$ has the property that the finite counter strategy $\Sigma^{\mathbf{v}}$ is, for each state $s \in S$, $T$-positive $\mathbf{v}(s)$-safe in $s$. That is, a polynomial-size finite counter strategy for the positive reachability problem can be computed in polynomial time.*

The rest of this section is devoted to the proof of Theorem 7. The complexity follows from Theorem 6. Indeed, since the algorithm has a polynomial complexity, the size of $\Sigma$ is also polynomial. The correctness proof is based on the following invariant of the main repeat loop: the finite counter strategy $\pi = \Sigma^{\mathbf{r}}$ has these properties:


The theorem then follows from this invariant (parts (a) and the first half of (b)) and from Theorem 6. We start with the following support invariant, which is easy to prove.

**Lemma 6.** *The inequality $\mathbf{r} \ge \mathit{Safe}_{\mathcal{M}}$ is an invariant of the main repeat-loop.*

*Proving Part (a) of the Main Invariant.* We use the following auxiliary lemma.

**Lemma 7.** *Assume that $\Sigma$ is a counter selector such that for all $s \in S$ such that $\mathit{Safe}(s) < \infty$:*


*Then for each vector $\mathbf{y} \ge \mathit{Safe}$ the strategy $\pi = \Sigma^{\mathbf{y}}$ is $\mathit{Safe}(s)$-safe in every state $s$.*

*Proof.* Let $s$ be a state such that $\mathbf{y}(s) < \infty$. It suffices to prove that for every $\pi$-compatible finite path $\alpha$ started in $s$ it holds that $\bot \ne \mathit{RL}_{\mathbf{y}(s)}(\alpha)$. We actually prove a stronger statement: $\bot \ne \mathit{RL}_{\mathbf{y}(s)}(\alpha) \ge \mathit{Safe}(\mathit{last}(\alpha))$. We proceed by induction on the length of $\alpha$. If $\mathit{len}(\alpha) = 0$ we have $\mathit{RL}_{\mathbf{y}(s)}(\alpha) = \mathbf{y}(s) \ge \mathit{Safe}_{\mathcal{M}}(s) \ge 0$. Now let $\alpha = \beta\, t_1 a t_2$ for some shorter path $\beta$ with $\mathit{last}(\beta) = t_1$ and $a \in A$, $t_1, t_2 \in S$. By the induction hypothesis, $l = \mathit{RL}_{\mathbf{y}(s)}(\beta) \ge \mathit{Safe}_{\mathcal{M}}(t_1)$, from which it follows that $\mathit{Safe}_{\mathcal{M}}(t_1) < \infty$. Due to (1.), it follows that there exists at least one $x \in \mathit{dom}(\Sigma(t_1))$ such that $x \le l$. We select the maximal $x$ satisfying the inequality, so that $a = \Sigma(t_1)(x)$. We have that $\mathit{RL}_{\mathbf{y}(s)}(\alpha) = \mathit{RL}_{l}(t_1 a t_2)$ by definition, and from (2.) it follows that $\bot \ne \mathit{RL}_{x}(t_1 a t_2) \ge \mathit{Safe}(t_2) \ge 0$. Altogether, as $l \ge x$ we have that $\mathit{RL}_{\mathbf{y}(s)}(\alpha) \ge \mathit{RL}_{x}(t_1 a t_2) \ge \mathit{Safe}(t_2) \ge 0$.

Now we prove part (a) of the main invariant. We show that throughout the execution of Algorithm 3, $\Sigma$ satisfies the assumptions of Lemma 7. Property (1.) is ensured by the initialization on line 3. Property (2.) holds upon first entry to the main loop by the definition of a safe action (Definition 6). Now assume that $\Sigma(s)(\mathbf{r}(s))$ is redefined on line 13, and let $a$ be the action $\mathbf{a}(s)$.

We first handle the case when $s \notin R$. Since $a$ was selected on line 8, from the definition of $\mathit{SPR\text{-}Val}$ we have that there is $t \in \mathit{Succ}(s, a)$ such that after the loop iteration,

$$\mathbf{r}(s) = C(s, a) + \max\{\mathbf{r}_{old}(t), \mathit{Safe}(t') \mid t \neq t' \in \mathit{Succ}(s, a)\} \ge C(s, a) + \max_{t' \in \mathit{Succ}(s, a)} \mathit{Safe}_{\mathcal{M}}(t'), \tag{1}$$

the latter inequality following from Lemma 6. Satisfaction of property (2.) in $s$ then follows immediately from Eq. (1).

If $s \in R$, then (1) holds before the truncation on line 10, at which point $\mathbf{r}(s) < \mathit{cap}(\mathcal{M})$. Hence, $\mathit{cap}(\mathcal{M}) - C(s, a) \ge \max_{t \in \mathit{Succ}(s,a)} \mathit{Safe}_{\mathcal{M}}(t)$ as required by (2.). From Lemmas 6 and 7 it follows that $\Sigma^{\mathbf{r}}$ is $\mathit{Safe}_{\mathcal{M}}(s)$-safe in every state $s$. This finishes the proof of part (a) of the invariant.

*Proving Part (b) of the Main Invariant.* Clearly, (b) holds after initialization. Now assume that an iteration of the main repeat loop was performed. Denote by $\pi_{old}$ the strategy $\Sigma^{\mathbf{r}_{old}}$ and by $\pi$ the strategy $\Sigma^{\mathbf{r}}$. Let $s$ be any state such that $\mathbf{r}(s) \le \mathit{cap}(\mathcal{M})$. If $\mathbf{r}(s) = \mathbf{r}_{old}(s)$, then we claim that (b) follows directly from the induction hypothesis: indeed, we have that there is an $s$-initiated $\pi_{old}$-compatible path $\alpha$ ending in a target state s.t. the $\mathbf{r}_{old}(s)$-initiated resource level along $\alpha$ never drops below $\mathbf{r}_{old}$, i.e. for each prefix $\beta$ of $\alpha$ it holds that $\mathit{RL}_{\mathbf{r}_{old}(s)}(\beta) \ge \mathbf{r}_{old}(\mathit{last}(\beta))$. But then $\beta$ is also $\pi$-compatible, since for each state $q$, $\Sigma(q)$ was only redefined for values smaller than $\mathbf{r}_{old}(q)$.

The case when $\mathbf{r}(s) < \mathbf{r}_{old}(s)$ is treated similarly. As in the proof of part (a), denote by $a$ the action $\mathbf{a}(s)$ assigned on line 13. There must be a state $t \in \mathit{Succ}(s, a)$ s.t. (1) holds before the truncation on line 10. In particular, for this $t$ it holds that $\mathit{RL}_{\mathbf{r}(s)}(sat) \ge \mathbf{r}_{old}(t)$. By the induction hypothesis, there is a $t$-initiated $\pi_{old}$-compatible path $\beta$ ending in $T$ satisfying the conditions in (b). We put $\alpha = sat\beta$. Clearly $\alpha$ is $s$-initiated and reaches $T$. Moreover, it is $\pi$-compatible. To see this, note that $\Sigma^{\mathbf{r}}(s)(\mathbf{r}(s)) = a$; moreover, the resource level after the first transition is $e(t) = \mathit{RL}_{\mathbf{r}(s)}(sat) \ge \mathbf{r}_{old}(t)$, and due to the assumed properties of $\beta$, the $\mathbf{r}_{old}(t)$-initiated resource level (with initial load $e(t)$) never decreases below $\mathbf{r}_{old}$ along $\beta$. Since $\Sigma$ was only redefined for values smaller than those given by the vector $\mathbf{r}_{old}$, $\pi$ mimics $\pi_{old}$ along $\beta$. Since $\mathbf{r} \le \mathbf{r}_{old}$, we have that along $\alpha$, the $\mathbf{r}(s)$-initiated resource level never decreases below $\mathbf{r}$. This finishes the proof of part (b) of the invariant and thus also the proof of Theorem 7.

#### **6 Büchi**

This section proves Theorem 1, which is the main theoretical result of the paper. The proof is broken down into the following steps.


**Algorithm 4:** Almost-sure Büchi reachability of $T$ in $\mathcal{M}$.

**Input:** CMDP $\mathcal{M} = (S, A, \Delta, C, R, \mathit{cap})$, target states $T \subseteq S$

**Output:** The largest set $\mathit{Rel} \subseteq R$ such that $\mathit{SafePR}_{\mathcal{M}(\mathit{Rel}),T}(r) \le \mathit{cap}$ for all $r \in \mathit{Rel}$.

- $\mathit{Rel} \leftarrow R$; $\mathit{ToRemove} \leftarrow \emptyset$;
- **repeat**
  - $\mathit{Rel} \leftarrow \mathit{Rel} \setminus \mathit{ToRemove}$;
  - $(\mathbf{reach}, \Sigma) \leftarrow \mathit{SafePR}_{\mathcal{M}(\mathit{Rel}),T}$;
  - $\mathit{ToRemove} \leftarrow \{r \in \mathit{Rel} \mid \mathbf{reach}(r) > \mathit{cap}\}$;
- **until** $\mathit{ToRemove} = \emptyset$;
- **return** $\mathbf{reach}$, $\Sigma$

Algorithm 4 solves (1.) in a similar fashion as Algorithm 2 handled safety. In each iteration, we declare as non-reloading all states from which positive reachability of T and safety within M(*Rel*) cannot be guaranteed. This is repeated until we reach a fixed point. The number of iterations is clearly bounded by |*R*|.
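The outer loop of Algorithm 4 is independent of how the inner positive-reachability values are obtained, so it can be sketched with the inner solver passed in as a callback. In the sketch below, `largest_reload_set` and the callback `safe_pr` (which is assumed to return the vector computed by Algorithm 3 on $\mathcal{M}(\mathit{Rel})$) are hypothetical names, the stub values are made up, and the selector $\Sigma$ is left out.

```python
import math

def largest_reload_set(reloads, cap, safe_pr):
    """Shrink Rel until every remaining reload state has value <= cap.

    safe_pr(rel) is an assumed callback returning a dict of values
    for the CMDP in which only the states in rel act as reload states.
    """
    rel = set(reloads)
    while True:
        reach = safe_pr(rel)
        to_remove = {r for r in rel if reach[r] > cap}
        if not to_remove:
            return rel, reach
        rel -= to_remove

def stub_safe_pr(rel):
    # Made-up values: "r3" never succeeds, and "r2" succeeds only
    # while "r3" is still treated as a reload state.
    return {
        "r1": 0,
        "r2": 0 if "r3" in rel else math.inf,
        "r3": math.inf,
    }

REL, REACH = largest_reload_set({"r1", "r2", "r3"}, 10, stub_safe_pr)
```

The stub illustrates why the loop must repeat: removing `r3` in the first round invalidates `r2` in the second, so only `r1` survives.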

**Theorem 8.** *Let $\mathcal{M} = (S, A, \Delta, C, R, \mathit{cap})$ be a CMDP and $T \subseteq S$ be a target set. Moreover, let $R'$ be the contents of $\mathit{Rel}$ upon termination of Algorithm 4 for the input $\mathcal{M}$ and $T$. Finally, let $\mathbf{r}$ and $\Sigma$ be the vector and the selector returned by Algorithm 3 for the input $\mathcal{M}(R')$ and $T$. Then for every state $s$, the finite counter strategy $\sigma = \Sigma^{\mathbf{r}}$ is $T$-Büchi $\mathbf{r}(s)$-safe in $s$ in both $\mathcal{M}(R')$ and $\mathcal{M}$. Moreover, the vector $\mathbf{r}$ is equal to $\mathit{SafeB\ddot{u}chi}_{\mathcal{M},T}$.*

*Proof.* We first show that $\sigma$ is $T$-Büchi $\mathbf{r}(s)$-safe in $\mathcal{M}(R')$ for all $s \in S$ with $\mathbf{r}(s) \le \mathit{cap}$. Clearly it is $\mathbf{r}(s)$-safe, so it remains to prove that $T$ is visited infinitely often with probability 1. We know that upon every visit of a state $r \in R'$, $\sigma$ guarantees a future visit to $T$ with positive probability. As a matter of fact, since $\sigma$ is a finite memory strategy, there is $\delta > 0$ such that upon every visit of some $r \in R'$, the probability of a future visit to $T$ is at least $\delta$. As $\mathcal{M}(R')$ is decreasing, every $s$-initiated $\sigma$-compatible run must visit the set $R'$ infinitely many times. Hence, with probability 1 we reach $T$ at least once. The argument can then be repeated from the first point of visit to $T$ to show that with probability 1 we visit $T$ at least twice, three times, etc. *ad infinitum.* By the monotonicity of probability, $\mathbb{P}^{\sigma}_{\mathcal{M},s}(\text{Büchi}_T) = 1$.

It remains to show that $\mathbf{r} \le \mathit{SafeB\ddot{u}chi}_{\mathcal{M},T}$. Assume that there is a state $s \in S$ and a strategy $\sigma'$ such that $\sigma'$ is $d$-safe in $s$ for some $d < \mathbf{r}(s) = \mathit{SafePR}_{\mathcal{M}(R'),T}(s)$. We show that this strategy is not $T$-Büchi $d$-safe in $\mathcal{M}$. If all $\sigma'$-compatible runs reach $T$, then there must be at least one history $\alpha$ produced by $\sigma'$ that visits $r \in R \setminus R'$ before reaching $T$ (otherwise $d \ge \mathbf{r}(s)$). Then either (a) $\mathit{SafePR}_{\mathcal{M},T}(r) = \infty$, in which case any $\sigma'$-compatible extension of $\alpha$ avoids $T$; or (b) since $\mathit{SafePR}_{\mathcal{M}(R'),T}(r) > \mathit{cap}$, there must be an extension of $\alpha$ that visits, between the visit of $r$ and $T$, another $r' \in R \setminus R'$ such that $r' \ne r$. We can then repeat the argument, eventually reaching the case (a) or running out of the resource, a contradiction with $\sigma'$ being $d$-safe.

We can finally proceed to prove Theorem 1.

*Proof (of Theorem 1).* The theorem follows immediately from Theorem 8 since we can (a) compute $\mathit{SafeB\ddot{u}chi}_{\mathcal{M},T}$ and the corresponding strategy $\sigma_T$ in polynomial time (see Theorem 7 and Algorithm 4); (b) easily check whether $d \ge \mathit{SafeB\ddot{u}chi}_{\mathcal{M},T}(s)$, and if so, then $\sigma_T$ is the desired strategy $\sigma$; and (c) represent $\sigma_T$ in polynomial space, as it is a finite counter strategy represented by a polynomial-size counter selector.

#### **7 Implementation and Case Studies**

We implemented the presented algorithms in Python and released the implementation as an open-source tool called *FiMDP (Fuel in MDP)*, available at https://github.com/xblahoud/FiMDP. The docker artifact is available at https://hub.docker.com/r/xblahoud/fimdp and can be run without installation via the Binder project [50]. We investigate the practical behavior of our algorithms using two case studies: (1) an autonomous electric vehicle (AEV) routing problem in the streets of Manhattan modeled using realistic traffic and electric car energy consumption data, and (2) a multi-agent grid world model inspired by the Mars Helicopter Scout [8] to be deployed from the planned Mars 2020 rover. The first scenario demonstrates the utility of our algorithm for solving real-world problems [59], while the second scenario studies the algorithm's scalability limits.

The consumption-Büchi objective can also be solved by a naive approach that encodes the energy constraints in the state space of the MDP, and solves it using techniques for standard MDPs [33]. States of such an MDP are tuples $(s, e)$ where $s$ is a state of the input CMDP and $e$ is the current level of energy. Naturally, all actions that would lead to states with $e < 0$ lead to a special sink state. The standard techniques rely on decomposition of the MDP into maximal end-components (MEC). We implemented the explicit encoding of CMDP into MDP, and the MEC-decomposition algorithm.
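The explicit encoding can be sketched as follows. This is an illustrative sketch, not the encoding implemented in FiMDP: it assumes the encoding `{state: {action: (cost, successors)}}`, keeps only the successor sets (transition probabilities would carry over unchanged), enumerates every energy level from 0 to the capacity rather than only the reachable ones, and assumes that the energy is reset to the capacity whenever a reload state is entered.

```python
SINK = "sink"

def explicit_mdp(cmdp, reloads, cap):
    """Build {(s, e): {action: [explicit successors]}} plus a sink state."""
    explicit = {SINK: {"stay": [SINK]}}
    for s, acts in cmdp.items():
        for e in range(cap + 1):
            explicit[(s, e)] = {}
            for a, (cost, succs) in acts.items():
                left = e - cost
                if left < 0:
                    # Not enough energy for this action: go to the sink.
                    explicit[(s, e)][a] = [SINK]
                else:
                    # Assumption: entering a reload state refills to cap.
                    explicit[(s, e)][a] = [
                        (t, cap if t in reloads else left) for t in succs
                    ]
    return explicit

# Hypothetical three-state CMDP with reload state "r" and capacity 5.
EXAMPLE = {
    "r": {"a": (1, ["u"])},
    "u": {"a": (2, ["r", "v"]), "b": (5, ["r"])},
    "v": {"a": (1, ["r"])},
}
EXPLICIT = explicit_mdp(EXAMPLE, {"r"}, 5)
```

Under these assumptions the explicit MDP has $|S| \cdot (\mathit{cap} + 1) + 1$ states, which is why its size grows with the capacity while the CMDP itself stays fixed.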

All computations presented in the following were performed on a PC with Intel Core i7-8700 3.20 GHz 12 core processor and a RAM of 16 GB running Ubuntu 18.04 LTS. All running times are means from at least 5 runs and the standard deviation was always below 5% among these runs.

#### **7.1 Electric Vehicle Routing**

We consider the area in the middle of Manhattan, from 42nd to 116th Street, see Fig. 2. Street intersections and directions of feasible movement form the state and action spaces of the MDP. Intersections in the proximity of real-world fast charging stations [56] represent the set of reload states.

After the AEV picks a direction, it reaches the next intersection in that direction deterministically with a stochastic energy consumption. We base our model of consumption on distributions of vehicle travel times from the area [55] and conversion of velocity and travel times to energy consumption [52]. We discretize the consumption distribution into three possible values (c1, c2, c3) reached with corresponding probabilities (p1, p2, p3). We then model the transition from one intersection (I1) to another (I2) using additional dummy states as explained in Fig. 2.

The corresponding CMDP has 7378 states and 8473 actions. For a fixed set of 100 randomly selected target states, Fig. 3 shows the influence of the requested capacity on running times for **(a)** strategy for the Büchi objective using CMDP (our approach), and **(b)** MEC-decomposition for the corresponding explicit MDP. With a constant number of states, our algorithm runs reasonably fast for all capacities and the running time stabilizes for $\mathit{cap} > 95$; this is not the case for the explicit approach, where the number of states keeps growing (52747 for $\mathit{cap} = 95$), as does the running time. The decomposition to MECs is slightly faster than solving Büchi using CMDP for the small capacities (Fig. 3 (c)), but MEC decomposition is only a part of the solution and running the full algorithm for Büchi would most likely diminish this advantage.

**Fig. 2.** (Top:) Street network in the considered area. Charging stations are red, one-way roads green, and two-way roads blue. (Bottom:) Transition from intersection $I_1$ to $I_2$ with stochastic consumption. The small circles are dummy states. (Color figure online)

**Fig. 3.** Mean computation times for a fixed target set of size 100 and varying capacity: **(a) CMDP** – computing the Büchi objective via CMDP, **(b) explicit** – computing the MEC decomposition of the explicit MDP, **(c) combined** – **(a)** and **(b)** combined for small capacity values.

#### **7.2 Multi-agent Grid World**

We use a multi-agent grid world to generate CMDPs with a huge number of states to study the scalability limits of the proposed algorithms. We model the rover and the helicopter of the Mars 2020 mission with the following realistic considerations: the rover enjoys infinite energy while the helicopter is restricted by batteries recharged at the rover. These two vehicles jointly operate on a mission where the helicopter reaches areas inaccessible to the rover. The outcomes of the helicopter's actions are deterministic while those of the rover, influenced by terrain dynamics, are stochastic. For a grid world of size $n$, this system can be naturally modeled as a CMDP with $n^4$ states. Figure 4 shows the running times of the Büchi objective for growing grid sizes and capacities in CMDP. We observe that the increase in the computational time of CMDP follows the growth in the number of states roughly linearly, and our implementation deals with an MDP with $1.6 \times 10^5$ states in no more than seven minutes. The figure also shows the running time for the MEC decomposition of the corresponding explicit MDP when the capacity is 10 and, for certain smaller, computationally feasible grid sizes, when the capacity is 20.

**Fig. 4.** Mean computation times for varying grid sizes and capacities: **(a) CMDP** – computing the Büchi objective via CMDP, the gray line shows the corresponding growth in the number of states on a separate scale, **(b) explicit** – computing the MEC decomposition of the explicit MDP, **(c) combined** – combined computation time for a capacity of 10.

## **8 Conclusion and Future Work**

We presented a first study of consumption Markov decision processes (CMDPs) with qualitative ω-regular objectives. We developed and implemented a polynomial-time algorithm for CMDPs with an objective of probability-1 satisfaction of a given Büchi condition. Possible directions for future work are extensions to quantitative analysis (e.g. minimizing the expected resource consumption), stochastic games, or the partially observable setting.

**Acknowledgements.** We acknowledge the kind help of Vojtěch Forejt, David Klaška, and Martin Kučera in the discussions leading to this paper.

## **References**



## **STMC: Statistical Model Checker with Stratified and Antithetic Sampling**

Nima Roohi<sup>1(✉)</sup>, Yu Wang<sup>2</sup>, Matthew West<sup>3</sup>, Geir E. Dullerud<sup>3</sup>, and Mahesh Viswanathan<sup>3</sup>

> <sup>1</sup> University of California, San Diego, USA, nroohi@ucsd.edu
>
> <sup>2</sup> Duke University, Durham, USA, yw354@duke.edu
>
> <sup>3</sup> University of Illinois at Urbana-Champaign, Urbana, USA, {mwest,dullerud,vmahesh}@illinois.edu

**Abstract.** STMC is a statistical model checker that uses antithetic and stratified sampling techniques to reduce the number of samples and, hence, the amount of time required before making a decision. The tool is capable of statistically verifying any *black-box* probabilistic system that PRISM can simulate, against probabilistic bounds on any property that PRISM can evaluate over individual executions of the system. We have evaluated our tool on many examples and compared it with both symbolic and statistical algorithms. When the number of strata is large, our algorithms reduced the number of samples more than 3 times on average. Furthermore, being a statistical model checker makes STMC able to verify models that are well beyond the reach of current symbolic model checkers. On large systems (up to 10<sup>14</sup> states) STMC was able to check 100% of benchmark systems, compared to existing symbolic methods in PRISM, which only succeeded on 13% of systems. The tool, installation instructions, benchmarks, and scripts for running the benchmarks are all available online as open source.

#### **1 Introduction**

Statistical model checking (SMC) plays an important role in verifying probabilistic temporal logics on cyber-physical systems [1,14,15]. In SMC, we treat the target bounded temporal specifications as statistical hypotheses, and infer their correctness with high confidence from samples of the system. Compared to analytic approaches, statistical model checkers rely only on samples from the system, and hence scale better to large real-world problems with complicated stochastic behavior [3,6,18].

To our knowledge, all existing SMC tools use independent samples. Admittedly, independent sampling is easy to implement, and it is the only option when the model is completely unknown. However, as shown recently in [24,25], if the model is partially known, then we can exploit this knowledge to generate semantically negatively correlated samples to increase the sample efficiency in SMC. In [24,25], we present the *stratified and antithetic sampling* techniques for discrete-time Markov chains (DTMC). In this work, we extend the technique to continuous-time Markov chains (CTMC), and implement the corresponding SMC algorithms in the tool STMC. The tool is evaluated on several case studies under hundreds of different scenarios, some of which are well beyond the capabilities of current symbolic model checkers. The results show that the sample efficiency can be significantly improved by using semantically negatively correlated sampling, instead of independent sampling.

This work also provides experimental comparisons between our SMC method and common symbolic model checking methods. Since we use large values for parameters in our case studies, it is no surprise that symbolic engines fail on many of them. However, without our results, the meaning of the word "large" is unclear. Our results give a good understanding of what is currently beyond the capabilities of symbolic engines in a popular tool like PRISM. Next, restricting our attention to the cases in which symbolic engines successfully terminate, our results give us a helpful comparison between symbolic and statistical verification times. It is well-known that symbolic algorithms do not scale well, while statistical ones do. However, that knowledge alone does not give us any insight into how much more or less time a symbolic method requires compared to a statistical one. Finally, when a symbolic method terminates, one might argue that its result is far more valuable than the result of a statistical approach since statistical methods can produce incorrect results. Unfortunately, that is not entirely true. Since the complexity of solving a problem is too high in practice, many symbolic algorithms, including those in PRISM, employ an iterative method to approximate probabilities. This approximation can be far from the actual probability, leading to incorrect model checking results (*e.g.,* [5]).

*Related Work.* Among the existing statistical model checkers, PRISM [4,12], MRMC [10], VESTA [19], YMER [27], and COSMOS [2] only support independent sampling on DTMC, CTMC, or other more general probabilistic models. PLASMA [9] also supports importance sampling. In importance sampling, although samples may have different weights, they are still generated independently. To our knowledge, our tool STMC is the only existing statistical model checker that employs semantically negatively correlated sampling on DTMC and CTMC.

#### **2 Stratified and Antithetic Sampling**

Stratified and antithetic sampling are two approaches for generating negatively correlated random samples. When using stratified sampling to draw n samples from a distribution, we divide the support into n sets of equal measure, and then draw one sample from each set. When using antithetic sampling, a random seed x is first drawn uniformly from [0, 1], and then two correlated samples are generated using x and 1 − x, respectively. Figures 1 and 2 compare independent and stratified sampling for 625 samples drawn from the joint distribution of two random variables. In Fig. 1, each variable is uniformly distributed in [0, 1], and in Fig. 2, each variable is exponentially distributed with rate 3 (we only show samples that fall within the unit square). In both figures, the stratified samples are visibly more evenly spread.
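The two schemes can be illustrated with a minimal Python sketch. It is not STMC's implementation, and the function names are assumptions made for illustration. The sketch draws one uniform sample per stratum, forms an antithetic pair from a single seed, and maps uniform samples to exponentially distributed ones through the inverse CDF.

```python
import math
import random


def stratified_uniform(n, rng):
    """Draw n samples from [0, 1), exactly one from each of n equal-width strata."""
    return [(k + rng.random()) / n for k in range(n)]


def antithetic_pair(rng):
    """Draw a seed x uniformly from [0, 1) and return the pair (x, 1 - x)."""
    x = rng.random()
    return x, 1.0 - x


def to_exponential(u, rate):
    """Inverse-CDF transform of a uniform sample u in [0, 1) to Exp(rate)."""
    return -math.log(1.0 - u) / rate


rng = random.Random(0)
uniform_samples = stratified_uniform(8, rng)
exp_samples = [to_exponential(u, rate=3.0) for u in uniform_samples]
x, y = antithetic_pair(rng)
```

Because the inverse CDF is monotone, equal-measure strata of the uniform distribution map to equal-measure strata of the exponential distribution, so the transformed samples remain stratified.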

We have shown in [24] that by choosing a proper representation of a Markov chain, the stratified sampling technique can be applied to generate semantically negatively correlated sample paths. This technique reduces the sampling cost for statistically verifying temporal formulas. In the rest of this section, we list two algorithms: stratified sampling of a CTMC, and the stratified sequential probability ratio test for a CTMC. The antithetic variants are simpler, and we do not present them here for lack of space. Compared to our algorithms in [24], there are two main differences. First, we present these algorithms for CTMCs instead of DTMCs, as they are slightly more involved. Second, for the stratified sampling of a CTMC, our algorithm supports stratification over multiple steps directly.

Algorithm 1 shows the pseudo-code for stratified sampling of a CTMC; to obtain a stratified sampling algorithm for DTMC, we only need to remove π2, index2, offset2, rate, r2, and r3. It takes two inputs: ψ, a temporal formula that we want to evaluate on every sampled path, and strata_sizes, the number of strata at every step. The latter is a non-empty list of positive integers. Let K be the length of this list, and N be the product of its elements. If the ith item of the list is n, then the number of strata at steps i, i + K, i + 2K, i + 3K, ... must be n.<sup>1</sup> The algorithm simultaneously simulates N paths and terminates once the value of ψ on all these paths is known. Inside the main loop, simulation is performed incrementally, K steps at a time. Random permutations π1, π2 and variables index1, index2 are used to make simulations of every K steps and random numbers r1 and r2 (defined later in the code) independent of each other. The number of strata at every step is an input to this algorithm. Using that number, variables offset1 and offset2 determine which strata we should use at step s. Finally, r2 is a uniformly distributed stratified sample in [0, 1). However, we need an exponentially distributed stratified sample, which is precisely what −ln(1 − r2)/rate gives us.

<sup>1</sup> The current version of PRISM only handles one initial state for simulation. Therefore, there will be no stratification for initializing paths.

Algorithm 2 shows pseudo-code for statistical verification of CTMC and DTMC using stratified samples. The algorithm is quite simple. It keeps sampling using Algorithm 1 and computes the average and variance of the values it receives until a termination condition is satisfied. Checking the termination conditions after every step suggests using an online algorithm for computing the mean and variance of samples. We use Welford's online algorithm [26] in our implementation.
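For reference, Welford's online update can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in STMC; the class name is hypothetical, and the use of the unbiased (n − 1) variance estimator is an assumption, since the text does not say which estimator the tool uses.

```python
class RunningStats:
    """Welford's online algorithm for the mean and variance of a stream."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Unbiased sample variance; 0.0 until two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


stats = RunningStats()
for value in [0.25, 0.5, 0.75, 0.5]:
    stats.update(value)
```

Each update takes constant time and memory, so the termination condition can be re-checked after every call to the sampler without storing earlier return values.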

```
// Algorithm 1: Stratified Sampling for CTMC
// Take stratified samples and return the fraction of samples that satisfy ψ.
// Param ψ is an LTL formula.
// Param strata_sizes is a non-empty list of positive integers.
function stratified_sampling(ψ, strata_sizes)
  val K = strata_sizes.length            // length of the list
  val N = strata_sizes.product           // product of elements in the list
  val paths = initialize N paths         // index starts at 0
  val evals = initialize N evaluators    // incrementally evaluate ψ on paths
  // Evaluation in the condition of the while loop is performed by PRISM
  while(∃ j ∈ {0,...,N−1}, evals[j](paths[j]) = 'unknown')
    val π1 = random permutation of 0,1,...,N−1
    val π2 = random permutation of 0,1,...,N−1
    for(i ← 0,...,N−1)
      var index1, index2 = π1[i], π2[i]
      for(s ← 0,...,K−1)
        val size = strata_sizes[s]       // number of strata at step s
        val offset1, offset2 = index1 % size, index2 % size
        index1, index2 /= size           // integer division
        val rate = rate of last state in paths[i]    // by PRISM
        val r1 = rnd(0,1) / size + offset1 / size    // rnd(0,1) ∈ [0,1)
        val r2 = rnd(0,1) / size + offset2 / size
        val r3 = −ln(1−r2) / rate        // stratified exponentially distributed
        simulate one step in paths[i] using r1 and r3    // by PRISM
  return (number of paths that satisfy ψ) / N
```
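The index arithmetic in Algorithm 1 (taking `index % size` and then dividing `index` by `size`) is a mixed-radix decomposition. A small Python sketch, with a hypothetical helper name and a toy strata list chosen for illustration, shows that as the index ranges over 0, ..., N − 1, every combination of strata over the K steps is selected exactly once:

```python
import math


def strata_offsets(index, strata_sizes):
    """Decompose index into one stratum offset per step (mixed-radix digits)."""
    offsets = []
    for size in strata_sizes:
        offsets.append(index % size)
        index //= size
    return offsets


strata_sizes = [2, 3]                      # K = 2 steps
n_paths = math.prod(strata_sizes)          # N = 6 simultaneous paths
all_offsets = [strata_offsets(i, strata_sizes) for i in range(n_paths)]
covers_all = len({tuple(o) for o in all_offsets}) == n_paths
```

Since a random permutation only reorders the indices 0, ..., N − 1, each path still receives a distinct combination of strata; the permutation decides which path gets which combination.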
Finally, one can extend the following results from [24] to include CTMC.

**Theorem 1.** *Let* ψ *be a bounded LTL formula.*


**Theorem 2.** *The sampling cost of Algorithm 2 is asymptotically no more than the sampling cost of SPRT [20] using i.i.d. samples.*

## **3 Tool Architecture**

We have implemented our algorithms in Scala and published the tool under the GNU General Public License v3.0. The tool can be downloaded from https://github.com/nima-roohi/STMC/, where installation instructions, benchmarks, and scripts for running the benchmarks are located. We use PRISM to load models from files, simulate them, and evaluate simulated paths against non-probabilistic bounded temporal properties. Therefore, STMC is capable of statistically verifying any model, as long as it can be simulated by PRISM and bounded temporal properties can be evaluated on single executions of that model. Figure 3 shows STMC at a very high level. Boxes marked with 'P' are where we directly use PRISM.

```
// Algorithm 2: Stratified Sequential Probability Ratio Test
// Verify P≤t ψ using stratified sampling.
// Param t is the input threshold.
// Param ψ is an LTL formula (non-probabilistic).
// Param strata_sizes is a non-empty list of positive integers.
// Param min_iter is the minimum number of iterations the algorithm should take.
// Param α is Type-I error probability (must satisfy 0 < α < 1/2).
// Param β is Type-II error probability (must satisfy 0 < β < 1/2).
// Param δ is half of the size of indifference region.
function stratified_SPRT(P≤t ψ, strata_sizes, min_iter, α, β, δ)
  var iter = 1
  var μ = 0    // average of stratified_sampling return values
  var σ = 0    // standard deviation of stratified_sampling return values
  while(true)
    iter++
    val x = stratified_sampling(ψ, strata_sizes)
    update μ and σ using x    // e.g. Welford's online algorithm [26]
    if iter > min_iter then
      if μ − t < −(σ² / (2 δ iter)) ln((1−α) / β) then return true     // accept P≤t ψ
      if μ − t >  (σ² / (2 δ iter)) ln((1−β) / α) then return false    // reject P≤t ψ
```
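The two termination tests of Algorithm 2 can be written as a small Python function. This is a sketch of the decision rule only, under the assumption that the running mean and variance are maintained elsewhere; the function name and the three-valued return convention are illustrative choices, not part of STMC.

```python
import math


def ssprt_decision(mean, variance, t, iteration, alpha, beta, delta):
    """Return True (accept), False (reject), or None (keep sampling)."""
    scale = variance / (2.0 * delta * iteration)
    accept_bound = -scale * math.log((1.0 - alpha) / beta)
    reject_bound = scale * math.log((1.0 - beta) / alpha)
    gap = mean - t
    if gap < accept_bound:
        return True     # accept: the probability is at most t
    if gap > reject_bound:
        return False    # reject: the probability exceeds t
    return None         # inconclusive, draw another stratified sample


decision = ssprt_decision(mean=0.30, variance=0.01, t=0.5,
                          iteration=100, alpha=0.01, beta=0.01, delta=0.01)
```

Note that when the sample variance is zero both bounds collapse to zero, so any nonzero gap between the mean and the threshold ends the test at once. This is why a minimum number of iterations is enforced before the bounds are checked.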
Executions of STMC are configured through different options/switches. The most basic options are help, which prints out a list of switches for both STMC and PRISM, and stmc, which enables the tool (without stmc, everything is passed to PRISM, as if STMC were not there in the first place). Statistical verification is enabled using option sim; it is always required when stmc is used. The sampling method is specified using option smp method or sm. Possible values for the sampling method are independent, antithetic, and stratified. Using option hyp test method or hm, users also have to specify the hypothesis testing method that they would like to use. Supported values for this option are currently SPRT, TSPRT, GLRT, and SSPRT.

SPRT selects the sequential probability ratio test [20]. This algorithm has already been implemented in PRISM, and in our experience it has a very similar performance to our implementation (SPRT in Sect. 4 refers to the implementation from PRISM).

We use our implementation for the next option, TSPRT. The sequential probability ratio test assumes that the actual probability is not within the δ-neighborhood of the input threshold. If this assumption is not satisfied, then the algorithm does not guarantee any error probability. TSPRT, which stands for Ternary SPRT, solves this problem by introducing a third possible answer: TOO CLOSE. The algorithm was introduced in [28]. *Without* assuming that the actual probability is not within the δ-neighborhood of the input threshold, TSPRT guarantees that Type-I and Type-II error probabilities are bounded by the input parameters α and β, respectively. Furthermore, it guarantees that if the actual probability and the input threshold are not δ-close, then the probability of returning TOO CLOSE is less than another input parameter γ; we call this the Type-III error probability.

The sequential probability ratio test was originally developed for simple hypotheses, and the test is not necessarily optimal when composite hypotheses are used [13]. To overcome this problem, the generalized likelihood ratio test (GLRT) was designed in [7]. The algorithm does not require an indifference region as an input parameter and provides guarantees on Type-I and Type-II error probabilities *asymptotically*. The main issue with this test is that, since the probabilistic error guarantees are asymptotic, an appropriate minimum number of samples must be given as an extra input parameter for the test to perform reasonably well in practice (*i.e.*, respect the input error parameters). If this parameter is too large, then the number of samples will be unnecessarily high, and if it is too small, then the actual error probability of the algorithm could be close to 0.5, even though the input error parameters are set to, for example, 10<sup>−7</sup>.

The last possible value for hyp test method is SSPRT, which stands for Stratified SPRT. This option is used whenever stratified or antithetic sampling is desired.

**Fig. 3.** Architecture of STMC. Boxes marked with letter 'P' use PRISM directly. N is the number of strata, K is the length of strata-size list (see option strata size below).

When stratification is used, the number of strata should be specified using option strata size or ss. It is a comma-separated list of positive integers. For example, 4, 4, 4, 4, 4, 4 specifies 4 strata for six consecutive steps (4096 total), and 4096 specifies 4096 strata for every single step. Note that in both of these examples, stratified sampling simultaneously takes 4096 sample paths, which requires more memory. However, we saw in our experiments that for non-nested temporal formulas, at most two states of each path are stored in memory. Therefore, even larger strata sizes should be possible. Simulating the paths simultaneously was the most challenging part of the implementation, because the simulator engine in PRISM is written assuming that paths are sampled one by one. However, if we followed the same approach in STMC, we would have to store every random number that was previously generated, which would increase the amount of memory used for simulation from O(1) to O(N × L), where N is the number of strata and L is the maximum length of simulated paths. By simulating the paths simultaneously, we only use O(N) bytes of memory. Next, the Type-I, Type-II, and Type-III error probabilities and half of the size of the indifference region are specified using alpha, beta,<sup>2</sup> gamma, and delta, respectively (not every algorithm uses all of these parameters). Finally, most algorithms that use variance in their termination condition require help when the sample variance remains zero after the first few iterations. STMC uses min iter for this purpose, and PRISM uses simvar.
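The relation between a strata-size list and the number of simultaneously simulated paths can be checked with a short Python sketch. The parser below is a hypothetical helper written for illustration; it is not STMC's option parser.

```python
import math


def parse_strata_size(option_value):
    """Parse a comma-separated list of positive integers into strata sizes."""
    sizes = [int(item) for item in option_value.split(",")]
    if any(size <= 0 for size in sizes):
        raise ValueError("strata sizes must be positive integers")
    return sizes


six_steps = parse_strata_size("4, 4, 4, 4, 4, 4")   # 4 strata over 6 steps
one_step = parse_strata_size("4096")                # 4096 strata per step
paths_six_steps = math.prod(six_steps)              # simultaneous sample paths
paths_one_step = math.prod(one_step)
```

Both settings simulate the same number of paths at once; they differ only in how the strata are spread across consecutive steps.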

#### **4 Experimental Results**

We evaluated our algorithms on 10 different sets of examples. Each set contains four variations of the same problem with varying parameters and, hence, various sizes, and each of those variations includes four symbolic tests as well as 16 statistical ones. Furthermore, we repeat each of the statistical tests 20 times, to compute 95% confidence intervals for time and number of samples taken by the statistical algorithms. This gives us a total of 800 tests and 12 960 runs to obtain results for those tests. Regarding the stratified sampling, for each variation, we consider 13 settings in 4 groups. Each group uses a different number of strata: 2, 16, 256, and 4096. When the number of strata is more than 2, we also consider different possibilities for how to divide strata among different steps. For example, when 256 strata are used, 256<sup>1</sup> means every step has 256 strata, but different steps are independent of each other. On the other hand, 2<sup>8</sup> means every step has only two strata, but stratification is performed over every 8 consecutive steps.
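These totals can be checked directly, assuming (consistently with the counts above) that each symbolic test is run once and each statistical test is run 20 times:

$$
10 \times 4 \times (4 + 16) = 800 \ \text{tests}, \qquad 10 \times 4 \times (4 + 16 \times 20) = 12\,960 \ \text{runs}.
$$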

<sup>2</sup> To the best of our knowledge, PRISM always assumes α = β.

**(b)** N: 150, K: 11, States: 1 849 234 352, Transitions: 2 944 935 077

**Fig. 4.** NAND multiplexing (DTMC - macOS) [17]

For the sake of space, we only present 15% of our results in this paper. Full experimental results are available at https://nima-roohi.github.io/STMC/#/benchmarks. Also, all the benchmark source files, along with scripts for running them, can be obtained from the tool's repository page https://github.com/nima-roohi/STMC/. The parameters we chose resulted in large systems, and significant time was spent running the tests and collecting the results. To perform our experiments faster, we ran all of our tests using four processes (using option '-mt 4'). We also divided our 10 sets of examples into two groups and ran each group on one of two machines. One of them runs Ubuntu 18.04 with a 3.2 GHz i7-8700 CPU and 16 GB of memory, and the other runs macOS Mojave with a 3.5 GHz i7 CPU and 32 GB of memory. STMC's webpage contains a short description of each example and a link to another page for the full explanation. We end this section with a few notes regarding our results.

1. As with any statistical test that is run in a black-box setting, we need to assume that the simulation of every path will eventually terminate. In fact, PRISM uses the parameter simpathlen, with 10 000 as its default value, to restrict the maximum number of simulation steps in each path. Currently, simpathlen can be as large as 2<sup>63</sup> − 1, which is more than enough in most practical applications.

**(b)** MAX COUNT: 1 000 000, States: about 845 017 880, Transitions: about 3 567 075 050

**Fig. 5.** Embedded control system (CTMC - Ubuntu) [11,16]


**Fig. 6.** Tandem queueing network (CTMC - macOS) [8]

4. In general, the more strata we use, the greater reduction in the number of samples we observe. Also, the performance of antithetic sampling is similar to the case of using only two strata. Our best results are obtained when 4096<sup>1</sup> is used for the number of strata. For example, in Fig. 5a, comparing SPRT and 4096<sup>1</sup> strata shows almost ten times reduction in the average number of samples. The tool's webpage contains an example in which stratification reduces variance to 0. This results in the termination of the algorithm immediately after a minimum number of samples have been taken, giving us 3 orders of magnitude reduction in the number of samples.

#### **5 Conclusion**

We presented our new tool called STMC for statistical model checking of discrete-time and continuous-time Markov chains. It uses antithetic and stratified sampling to improve the performance of a test. We evaluated our tool on hundreds of examples. Our experimental results show that our techniques can significantly reduce the number of samples and, hence, the amount of time required for a test. For example, when 4096<sup>1</sup> strata were used, our algorithms reduced the number of samples more than 3 times on average. We have implemented our tool on top of PRISM, and published it online under the GNU General Public License v3.0. We would like to extend STMC to support other stratification-based algorithms. In particular, we are interested in stratified sampling for model checking Markov decision processes, and in temporal properties that are defined on the sequence of distributions generated by different types of Markov chains (see [21–23] for examples).

## **References**



## **AMYTISS: Parallelized Automated Controller Synthesis for Large-Scale Stochastic Systems**

Abolfazl Lavaei<sup>1(B)</sup>, Mahmoud Khaled<sup>2</sup>, Sadegh Soudjani<sup>3</sup>, and Majid Zamani<sup>1,4</sup>

<sup>1</sup> Department of Computer Science, LMU Munich, Munich, Germany lavaei@lmu.de

<sup>2</sup> Department of Electrical Engineering, TU Munich, Munich, Germany <sup>3</sup> School of Computing, Newcastle University, Newcastle upon Tyne, UK

<sup>4</sup> Department of Computer Science, University of Colorado Boulder, Boulder, USA

**Abstract.** In this paper, we propose a software tool, called AMYTISS, implemented in C++/OpenCL, for designing correct-by-construction controllers for large-scale discrete-time stochastic systems. This tool is employed to (i) build finite Markov decision processes (MDPs) as finite abstractions of given original systems, and (ii) synthesize controllers for the constructed finite MDPs satisfying bounded-time high-level properties including safety, reachability and reach-avoid specifications. In AMYTISS, scalable parallel algorithms are designed such that they support the parallel execution within CPUs, GPUs and hardware accelerators (HWAs). Unlike all existing tools for stochastic systems, AMYTISS can utilize high-performance computing (HPC) platforms and cloud-computing services to mitigate the effects of the state-explosion problem, which is always present in analyzing large-scale stochastic systems. We benchmark AMYTISS against the most recent tools in the literature using several physical case studies including robot examples, room temperature and road traffic networks. We also apply our algorithms to a 3-dimensional autonomous vehicle and 7-dimensional nonlinear model of a BMW 320i car by synthesizing an autonomous parking controller.

**Keywords:** Parallel algorithms · Finite MDPs · Automated controller synthesis · Discrete-time stochastic systems · High performance computing platform

#### **1 Introduction**

#### **1.1 Motivations**

Large-scale stochastic systems are an important modeling framework to describe many real-life safety-critical systems such as power grids, traffic networks, self-driving cars, and many other applications. For this type of complex system, automating the controller synthesis procedure to achieve high-level specifications, *e.g.,* those expressed as linear temporal logic (LTL) formulae [24], is inherently very challenging, mainly due to the computational complexity arising from uncountable sets of states and actions. To mitigate this difficulty, finite abstractions, *i.e.,* systems with finite state sets, are usually employed as replacements of original continuous-space systems in the controller synthesis procedure. More precisely, one can first abstract a given continuous-space system by a simpler one, *e.g.,* a finite Markov decision process (MDP), and then perform analysis and synthesis over the abstract model (using algorithmic techniques from computer science [3]). Finally, the results are carried back to the original system, while providing a guaranteed error bound [5,13–21,23].

A. Lavaei and M. Khaled contributed equally.

This work was supported in part by the H2020 ERC Starting Grant AutoCPS (grant agreement No. 804639).

Unfortunately, construction of finite MDPs for large-scale complex systems suffers severely from the so-called *curse of dimensionality*: the computational complexity grows exponentially as the number of state variables increases. To alleviate this issue, one promising solution is to employ high-performance computing (HPC) platforms together with cloud-computing services to mitigate the state-explosion problem. In particular, HPC platforms have a large number of processing elements (PEs) and this significantly affects the time complexity when serial algorithms are parallelized [7].

#### **1.2 Contributions**

The main contributions and merits of this work are:


We apply the proposed implementations to real-world applications including robot examples, room temperature and road traffic networks, and autonomous vehicles. This extends the applicability of formal methods to some safety-critical real-world applications with high dimensions. The results show remarkable reductions in memory usage and computation time, outperforming all existing tools in the literature.

We provide AMYTISS as an *open-source* tool. After compilation, AMYTISS is loaded via pFaces [10] and launched for parallel execution within available parallel computing resources. The source of AMYTISS and detailed instructions on its building and running can be found in: https://github.com/mkhaled87/pFaces-AMYTISS

Due to lack of space, we provide details of traditional serial and proposed parallel algorithms, case studies, etc. in an arXiv version of the paper [12].

#### **1.3 Related Literature**

There exist several software tools on verification and synthesis of stochastic systems with different classes of models. SReachTools [30] performs stochastic reachability analysis for linear, potentially time-varying, discrete-time stochastic systems. ProbReach [25] is a tool for verifying the probabilistic reachability for stochastic hybrid systems. SReach [31] solves probabilistic bounded reachability problems for two classes of models: (i) nonlinear hybrid automata with parametric uncertainty, and (ii) probabilistic hybrid automata with additional randomness for both transition probabilities and variable resets. Modest Toolset [6] performs modeling and analysis for hybrid, real-time, distributed and stochastic systems. Two competitions on tools for formal verification and policy synthesis of stochastic models are organized with reports in [1,2].

FAUST<sup>2</sup> [29] generates formal abstractions for continuous-space discrete-time stochastic processes, and performs verification and synthesis for safety and reachability specifications. However, FAUST<sup>2</sup> is originally implemented in MATLAB and suffers from the curse of dimensionality due to its lack of scalability for large-scale models. StocHy [4] provides the quantitative analysis of discrete-time stochastic hybrid systems such that it constructs finite abstractions, and performs verification and synthesis for safety and reachability specifications.

AMYTISS differs from FAUST<sup>2</sup> and StocHy in two main directions. First, AMYTISS implements novel parallel algorithms and data structures targeting HPC platforms to reduce the undesirable effects of the state-explosion problem. Accordingly, it is able to perform parallel execution on different heterogeneous computing platforms including CPUs, GPUs and HWAs. In contrast, FAUST<sup>2</sup> and StocHy can only run serially on one CPU and are consequently limited to small systems. Second, AMYTISS can handle the abstraction construction and controller synthesis for two and a half player games (*e.g.,* stochastic systems with bounded disturbances), whereas FAUST<sup>2</sup> and StocHy only handle one and a half player games (*e.g.,* disturbance-free systems).

Unlike all existing tools, AMYTISS offers highly scalable, distributed execution of parallel algorithms utilizing all available processing elements (PEs) in any heterogeneous computing platform. To the best of our knowledge, AMYTISS is the only tool of its kind for continuous-space stochastic systems that is able to utilize all types of compute units (CUs), simultaneously.

We compare AMYTISS with FAUST<sup>2</sup> and StocHy in Table 1 in detail in terms of different technical aspects. Although there have been some efforts in FAUST<sup>2</sup> and StocHy for parallel implementations, these are not compatible with HPC platforms. Specifically, FAUST<sup>2</sup> employs some parallelization techniques using parallel for-loops and sparse matrices inside MATLAB, and StocHy uses Armadillo, a multi-threaded library for scientific computing. However, these tools are not designed for parallel computation on HPC platforms. Consequently, they can only utilize CPUs and cannot run on GPUs or HWAs. In comparison, AMYTISS is developed in OpenCL, a language specially designed for data-parallel tasks, and supports heterogeneous computing platforms combining CPUs, GPUs and HWAs.

Note that FAUST<sup>2</sup> and StocHy do not natively support reach-avoid specifications in the sense that users can explicitly provide some avoid sets. Implementing this type of properties requires some modifications inside those tools. In addition, we do not make a comparison here with SReachTools since it is mainly for stochastic reachability analysis of linear, potentially time-varying, discrete-time stochastic systems, while AMYTISS is not limited to reachability analysis and can handle nonlinear systems as well.

Note that we also provide a script in the tool repository<sup>1</sup> that converts the MDPs constructed by AMYTISS into PRISM-input-files [11]. In particular, AMYTISS can natively construct finite MDPs from continuous-space stochastic control systems. PRISM can then be employed to perform the controller synthesis for those classes of complex specifications that AMYTISS does not support.

#### **2 Discrete-Time Stochastic Control Systems**

We formally introduce discrete-time stochastic control systems (dt-SCS) below.

**Definition 1.** *A discrete-time stochastic control system (dt-SCS) is a tuple*

$$
\Sigma = \left( X, U, W, \varsigma, f \right), \tag{1}
$$

<sup>1</sup> https://github.com/mkhaled87/pFaces-AMYTISS/blob/master/interface/export PrismMDP.m.

*where,*


$$\varsigma := \{ \varsigma(k) : \Omega \to \mathcal{V}_{\varsigma}, \ k \in \mathbb{N} \};$$

*–* f : X×U×W → X *is a measurable function characterizing the state evolution of the system.*

The state evolution of Σ, for a given initial state x(0) ∈ X, an input sequence ν(·) : ℕ → U, and a disturbance sequence w(·) : ℕ → W, is characterized by the difference equations

$$\Sigma: x(k+1) = f(x(k), \nu(k), w(k)) + \Upsilon(k), \qquad k \in \mathbb{N}, \tag{2}$$

where Υ(k) := ς(k) with 𝒱<sub>ς</sub> = ℝ<sup>n</sup> for the case of additive noise, and Υ(k) := ς(k)x(k) with 𝒱<sub>ς</sub> equal to the set of diagonal matrices of dimension n for the case of multiplicative noise [22]. We keep the notation Σ to indicate both cases and use Σ<sup>a</sup> and Σ<sup>m</sup>, respectively, when discussing these cases individually.
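To make the two noise models concrete, the following Python sketch simulates a scalar system (n = 1, so a diagonal matrix is just a number) under either additive or multiplicative noise. It is an illustration only and unrelated to AMYTISS's own code; the dynamics `f`, the noise parameters, and all names are assumptions.

```python
import random


def simulate(f, x0, inputs, disturbances, noise, multiplicative=False):
    """Simulate x(k+1) = f(x(k), nu(k), w(k)) + Upsilon(k) for a scalar state."""
    trajectory = [x0]
    for k, (nu, w) in enumerate(zip(inputs, disturbances)):
        x = trajectory[-1]
        varsigma = noise(k)                     # noise sample at step k
        upsilon = varsigma * x if multiplicative else varsigma
        trajectory.append(f(x, nu, w) + upsilon)
    return trajectory


rng = random.Random(1)
f = lambda x, nu, w: 0.8 * x + nu + w           # hypothetical dynamics
gaussian = lambda k: rng.gauss(0.0, 0.1)        # i.i.d. normal noise
path = simulate(f, x0=1.0, inputs=[0.1] * 5, disturbances=[0.0] * 5,
                noise=gaussian)
```

Passing `multiplicative=True` switches the noise term from ς(k) to ς(k)x(k), which corresponds to the second case of (2).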

We should mention that our parallel algorithms are independent of the noise distribution. For an easier presentation of the contribution, we present our algorithms and case studies based on normal distributions but our tool natively supports other practical distributions including uniform, exponential, and beta. In addition, we provide a subroutine in our software tool so that the user can still employ the parallel algorithms by providing the density function of the desired class of distributions.

*Remark 1.* Our synthesis is based on a max-min optimization problem for two and a half player games by considering the disturbance and input of the system as players [9]. Particularly, we consider the disturbance affecting the system as an adversary and maximize the probability of satisfaction under the worstcase strategy of a rational adversary. Hence, we minimize the probability of satisfaction with respect to disturbances, and maximize it over control inputs.
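A serial Python sketch of one max-min backup on a finite abstraction may help fix ideas. It is not the parallel algorithm implemented in AMYTISS; the transition table is made up for illustration, and the convention that probability mass leaving the safe set is simply omitted from the table is an assumption of this sketch.

```python
def max_min_step(value, transition, states, inputs, disturbances):
    """One backup: V'(x) = max over nu of min over w of sum_x' T(x'|x,nu,w) V(x')."""
    new_value = {}
    for x in states:
        best = 0.0
        for nu in inputs:
            worst = min(
                sum(p * value[x_next]
                    for x_next, p in transition[(x, nu, w)].items())
                for w in disturbances
            )
            best = max(best, worst)
        new_value[x] = best
    return new_value


states, inputs, disturbances = ["a", "b"], [0, 1], [0, 1]
# Hypothetical kernel restricted to the safe set {a, b}; rows may sum to < 1.
transition = {
    ("a", 0, 0): {"a": 0.9}, ("a", 0, 1): {"a": 0.5, "b": 0.2},
    ("a", 1, 0): {"b": 0.8}, ("a", 1, 1): {"b": 0.8},
    ("b", 0, 0): {"a": 0.6, "b": 0.4}, ("b", 0, 1): {"b": 0.5},
    ("b", 1, 0): {"a": 0.3}, ("b", 1, 1): {"a": 0.9},
}
v0 = {"a": 1.0, "b": 1.0}                  # every safe state starts at 1
v1 = max_min_step(v0, transition, states, inputs, disturbances)
v2 = max_min_step(v1, transition, states, inputs, disturbances)
```

Here `v2[x]` is a lower bound, under the worst-case disturbance, on the probability of staying in the safe set for two steps when starting from `x` and choosing the best input at each step.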

One may be interested in analyzing dt-SCSs without disturbances (cf. case studies). In this case, the tuple (1) reduces to Σ = (X, U, ς, f), where f : X × U → X, and Eq. (2) can be rewritten as

$$\Sigma : x(k+1) = f(x(k), \nu(k)) + \Upsilon(k), \qquad k \in \mathbb{N}.\tag{3}$$

Note that input models in this tool paper are given inside configuration text files. Systems are described by stochastic difference equations as in (2)–(3), and the user should provide the right-hand side of the equations<sup>2</sup>. In the next section, we formally define MDPs and discuss how to build finite MDPs from given dt-SCSs.

<sup>2</sup> An example of such a configuration file is provided at: https://github.com/mkhaled87/pFaces-AMYTISS/blob/master/examples/ex-toy-safety/toy2d.cfg.

#### **3 Finite Markov Decision Processes (MDPs)**

A dt-SCS Σ in (1) is *equivalently* represented by the following MDP [8, Proposition 7.6]:

$$
\Sigma = (X, U, W, T\_\mathbf{x}),
$$

where the map T<sub>x</sub> : B(X) × X × U × W → [0, 1] is a conditional stochastic kernel that assigns to any x ∈ X, ν ∈ U, and w ∈ W a probability measure T<sub>x</sub>(·|x, ν, w). The alternative representation as an MDP is utilized in [28] to approximate a dt-SCS Σ with a *finite* MDP Σ̂ using an abstraction algorithm. This algorithm first constructs a finite partition of the state set X = ∪<sub>i</sub>X<sub>i</sub>, the input set U = ∪<sub>i</sub>U<sub>i</sub>, and the disturbance set W = ∪<sub>i</sub>W<sub>i</sub>. Then representative points x̄<sub>i</sub> ∈ X<sub>i</sub>, ν̄<sub>i</sub> ∈ U<sub>i</sub>, and w̄<sub>i</sub> ∈ W<sub>i</sub> are selected as abstract states, inputs, and disturbances. The transition probability matrix for the finite MDP Σ̂ is then computed as

$$
\hat{T}\_{\mathbf{x}}(x'|x,\nu,w) = T\_{\mathbf{x}}(\Xi(x')|x,\nu,w), \quad \forall x, x' \in \hat{X}, \forall \nu \in \hat{U}, \forall w \in \hat{W}, \tag{4}
$$

where the map Ξ : X → 2<sup>X</sup> assigns to any x ∈ X the corresponding partition element it belongs to, *i.e.,* Ξ(x) = X<sub>i</sub> if x ∈ X<sub>i</sub>. Since X̂, Û, and Ŵ are finite sets, T̂<sub>x</sub> is a static map. It can be represented with a matrix and we refer to it, from now on, as the transition probability matrix.
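The abstraction in (4) can be sketched for a one-dimensional system with additive Gaussian noise. This is a minimal illustration, not the implementation in AMYTISS or in [28]: the uniform grid, the use of cell centres as representative points, the function name `build_transition_matrix`, and the example dynamics are all assumptions. Each entry is the Gaussian probability mass of one partition cell, obtained as a difference of CDF values.

```python
import math


def gaussian_cdf(z, mean, std):
    return 0.5 * (1.0 + math.erf((z - mean) / (std * math.sqrt(2.0))))


def build_transition_matrix(f, x_lo, x_hi, n_x, inputs, disturbances, std):
    """Return (centers, T), where T[(i, j, k)][l] is the probability of
    reaching cell l from representative state i under input j and
    disturbance k, for x(k+1) = f(x, nu, w) + Gaussian noise."""
    delta = (x_hi - x_lo) / n_x  # state discretization parameter
    centers = [x_lo + (i + 0.5) * delta for i in range(n_x)]
    T = {}
    for i, x in enumerate(centers):
        for j, nu in enumerate(inputs):
            for k, w in enumerate(disturbances):
                mu = f(x, nu, w)  # mean of the successor distribution
                T[(i, j, k)] = [
                    gaussian_cdf(c + delta / 2, mu, std)
                    - gaussian_cdf(c - delta / 2, mu, std)
                    for c in centers
                ]
    return centers, T


# Hypothetical example: contracting linear dynamics on X = [-1, 1]
centers, T = build_transition_matrix(
    f=lambda x, nu, w: 0.5 * x + nu + w,
    x_lo=-1.0, x_hi=1.0, n_x=10,
    inputs=[-0.2, 0.0, 0.2], disturbances=[-0.05, 0.05], std=0.1,
)
```

Rows need not sum to one: the missing mass is the probability of leaving X, which a safety specification would count as a violation.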

For a given logic specification ϕ and accuracy level ε, the discretization parameter δ can be selected a priori such that

$$|\mathbb{P}(\Sigma \models \varphi) - \mathbb{P}(\hat{\Sigma} \models \varphi)| \le \epsilon,\tag{5}$$

where ε depends on the horizon of formula ϕ, the Lipschitz constant of the stochastic kernel, and the *state* discretization parameter δ (cf. [28, Theorem 9]). We refer the interested reader to the arXiv version [12] for more details.

In the next sections, we propose novel parallel algorithms for the construction of finite MDPs and the synthesis of their controllers.

#### **4 Parallel Construction of Finite MDPs**

In this section, we propose an approach to efficiently compute the transition probability matrix T̂<sub>x</sub> of the finite MDP Σ̂, which is essential for any controller synthesis procedure, as we discuss later in Sect. 5.

## **4.1 Data-Parallel Threads for Computing T̂<sub>x</sub>**

The serial algorithm for computing T̂<sub>x</sub> is presented in Algorithm 1 in the arXiv version [12]. The computations of the mean μ = f(x̄<sub>i</sub>, ν̄<sub>j</sub>, w̄<sub>k</sub>, 0), of PDF(x | μ, Σ), where PDF stands for probability density function and Σ is the noise covariance matrix, and of the entries of T̂<sub>x</sub> do not share data from one inner-loop iteration to another. Hence, this is an embarrassingly data-parallel section of the algorithm. pFaces [10] can be utilized to launch the necessary number of parallel threads on the employed hardware configuration (HWC) to improve the computation time of the algorithm. Each thread will eventually compute and store, independently, its corresponding values within T̂<sub>x</sub>.

## **4.2 Less Memory for Post States in T̂<sub>x</sub>**

T̂<sub>x</sub> is a matrix with the dimension of (n<sub>x</sub> × n<sub>ν</sub> × n<sub>w</sub>, n<sub>x</sub>). The number of columns is n<sub>x</sub> as we need to compute and store the probability for each reachable partition element Ξ(x′<sub>l</sub>), corresponding to the representative post state x′<sub>l</sub>. Here, we consider Gaussian PDFs for the sake of a simpler presentation. For simplicity, we now focus on the computation for a single tuple (x̄<sub>i</sub>, ν̄<sub>j</sub>, w̄<sub>k</sub>). In many cases, when the PDF decays fast, only partition elements near μ have high probabilities of being reached, starting from x̄<sub>i</sub> and applying an input ν̄<sub>j</sub>.

We set a cutting probability threshold γ ∈ [0, 1] to control how many partition elements around μ should be stored. For a given mean value μ, a covariance matrix Σ, and a cutting probability threshold γ, x ∈ X is called a PDF cutting point if γ = PDF(x | μ, Σ). Since Gaussian PDFs are symmetric, by repeating this cutting process dimension-wise, we end up with a set of points forming a hyper-rectangle in X, which we call the cutting region and denote by X̂<sub>γ</sub><sup>Σ</sup>. This is visualized in Fig. 1 in the arXiv version [12] for a 2-dimensional system. Any partition element Ξ(x′<sub>l</sub>) with x′<sub>l</sub> outside the cutting region is considered to have zero probability of being reached. Such an approximation allows controlling the sparsity of the columns of T̂<sub>x</sub>. The closer the value of γ is to zero, the more accurate T̂<sub>x</sub> is in representing the transitions of Σ̂. On the other hand, the closer the value of γ is to one, the fewer post-state values need to be stored as columns in T̂<sub>x</sub>. The number of probabilities to be stored for each (x̄<sub>i</sub>, ν̄<sub>j</sub>, w̄<sub>k</sub>) is then |X̂<sub>γ</sub><sup>Σ</sup>|.
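For a one-dimensional Gaussian with standard deviation σ, the cutting points can be written in closed form: solving γ = PDF(x | μ, σ) gives |x − μ| = σ·sqrt(−2 ln(γ σ sqrt(2π))). The following Python sketch computes this half-width and an upper bound on the number of grid cells to store. The function names and the cell-count bound are assumptions made for illustration, and the sketch covers only the scalar case; how AMYTISS handles the general n-dimensional case is described in [12].

```python
import math


def cutting_half_width(std, gamma):
    """Distance r from the mean at which a 1-D Gaussian PDF equals gamma."""
    peak = 1.0 / (std * math.sqrt(2.0 * math.pi))  # PDF value at the mean
    if not 0.0 < gamma <= peak:
        raise ValueError("gamma must lie in (0, peak density]")
    return std * math.sqrt(-2.0 * math.log(gamma / peak))


def cells_in_cutting_region(std, gamma, delta):
    """Upper bound on the number of cells of width delta whose centres can
    fall inside [mu - r, mu + r]; it does not depend on mu."""
    r = cutting_half_width(std, gamma)
    return 2 * math.ceil(r / delta) + 1


# A smaller gamma keeps more post states; a larger gamma keeps fewer.
wide = cells_in_cutting_region(std=0.1, gamma=0.01, delta=0.05)
narrow = cells_in_cutting_region(std=0.1, gamma=1.0, delta=0.05)
```

Because the bound depends only on σ, γ, and δ, it can be evaluated once before the matrix is allocated, which is what makes a fixed-width row layout possible.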

Note that since Σ is fixed prior to running the algorithm, the number of columns needed for a fixed γ can be identified before launching the computation. We can then accurately allocate a uniform fixed number of memory locations for any tuple (x̄<sub>i</sub>, ν̄<sub>j</sub>, w̄<sub>k</sub>) in T̂<sub>x</sub>. Hence, there is no need for a dynamic sparse matrix data structure and T̂<sub>x</sub> is now a matrix with a dimension of (n<sub>x</sub> × n<sub>ν</sub> × n<sub>w</sub>, |X̂<sub>γ</sub><sup>Σ</sup>|).

## **4.3 A Parallel Algorithm for Constructing Finite MDP Σ̂**

We present a novel parallel algorithm (Algorithm 2 in the arXiv version [12]) to efficiently construct and store T̂<sub>x</sub> as a successor. We employ the enhancements discussed in Subsects. 4.1 and 4.2 within the proposed algorithm. We do not parallelize the for-loop in Algorithm 2, Step 2, to avoid excessive parallelism (*i.e.,* we parallelize loops only over X and U, but not over W). Note that, practically, for large-scale systems, |X̂ × Û| can reach up to billions. We are interested in the number of parallel threads that can be scheduled reasonably by available HW computing units.

#### **5 Parallel Synthesis of Controllers**

In this section, we employ dynamic programming to synthesize controllers for constructed finite MDPs Σ̂ satisfying safety, reachability, and reach-avoid properties [26,27]. The classical serial algorithm and its proposed parallelized version are respectively presented as Algorithms 3 and 4 in the arXiv version [12]. We should highlight that the parallelism here mainly comes from the parallelization of matrix multiplication, whereas the loop over time steps cannot be parallelized due to data dependency. More details can be found in the arXiv version.
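To make the data dependency concrete, the following Python sketch runs a generic backward recursion for a finite-horizon safety specification, using the max-min formulation of Remark 1. It is not Algorithm 3 or 4 of [12]; the function name, the dictionary layout of the transition matrix, and the tiny two-state example are hypothetical. The loop over time steps must run sequentially because each step reads the value vector of the previous one, while the loop over states is independent and could be distributed over threads.

```python
def synthesize_safety(T, n_x, n_u, n_w, horizon):
    """Backward recursion V(x) = max_nu min_w sum_x' T(x'|x,nu,w) V(x').

    T[(i, j, k)] is a list of n_x transition probabilities. Probability
    mass missing from a row is treated as leaving the safe set.
    Returns the value vector and a time-indexed policy.
    """
    V = [1.0] * n_x            # every abstract state is safe at the final step
    policy = []
    for _ in range(horizon):   # sequential: each step needs the previous V
        V_new, choice = [], []
        for i in range(n_x):   # independent across i: parallelizable
            best_val, best_j = -1.0, None
            for j in range(n_u):
                worst = min(
                    sum(p * v for p, v in zip(T[(i, j, k)], V))
                    for k in range(n_w)
                )
                if worst > best_val:
                    best_val, best_j = worst, j
            V_new.append(best_val)
            choice.append(best_j)
        V = V_new
        policy.insert(0, choice)  # policy[0] is the decision at the first step
    return V, policy


# Hypothetical MDP with 2 states, 2 inputs, and 2 disturbances
T = {
    (0, 0, 0): [0.9, 0.0], (0, 0, 1): [0.8, 0.0],
    (0, 1, 0): [0.5, 0.4], (0, 1, 1): [0.5, 0.4],
    (1, 0, 0): [0.0, 1.0], (1, 0, 1): [0.0, 1.0],
    (1, 1, 0): [0.2, 0.2], (1, 1, 1): [0.2, 0.2],
}
V, policy = synthesize_safety(T, n_x=2, n_u=2, n_w=2, horizon=2)
```

In this example the inner sums are exactly the rows of a matrix-vector product, which is where the parallelism mentioned above comes from.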

## **5.1 On-the-Fly Construction of T̂<sub>x</sub>**

In AMYTISS, we also use another technique that further reduces the memory required for computing T̂<sub>x</sub>. We refer to this approach as *on-the-fly abstractions* (OFA). In the OFA version of Algorithm 4 [12], we skip computing and storing the MDP T̂<sub>x</sub> and the matrix T̂<sub>0x</sub> (*i.e.,* Steps 1 and 5). We instead compute the required entries of T̂<sub>x</sub> and T̂<sub>0x</sub> on the fly as they are needed (*i.e.,* Steps 13 and 15). This significantly reduces the memory required for T̂<sub>x</sub> and T̂<sub>0x</sub>, but at the cost of repeated computation of their entries in each time step from 1 to T<sub>d</sub>. This gives the user additional control over the trade-off between computation time and memory.
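The trade-off can be illustrated with a small Python sketch in which no transition matrix is stored: each row is recomputed whenever the recursion needs it, and a counter records how often that happens. This is only a schematic of the idea under simplifying assumptions (one-dimensional state, additive Gaussian noise, safety specification); the names `make_row` and `safety_values_on_the_fly` and all parameters are hypothetical and do not correspond to steps of Algorithm 4.

```python
import math


def make_row(f, centers, delta, inputs, disturbances, std):
    """Return a function that computes one transition-matrix row on demand."""
    counter = {"rows_computed": 0}

    def cdf(z, mean):
        return 0.5 * (1.0 + math.erf((z - mean) / (std * math.sqrt(2.0))))

    def row(i, j, k):
        counter["rows_computed"] += 1
        mu = f(centers[i], inputs[j], disturbances[k])
        return [cdf(c + delta / 2, mu) - cdf(c - delta / 2, mu) for c in centers]

    return row, counter


def safety_values_on_the_fly(row, n_x, n_u, n_w, horizon):
    V = [1.0] * n_x
    for _ in range(horizon):
        # Rows are recomputed in every time step instead of being read
        # from memory; only the value vector V is stored.
        V = [
            max(
                min(sum(p * v for p, v in zip(row(i, j, k), V)) for k in range(n_w))
                for j in range(n_u)
            )
            for i in range(n_x)
        ]
    return V


n_x, delta = 20, 0.1
centers = [-1.0 + (i + 0.5) * delta for i in range(n_x)]
inputs, disturbances, horizon = [-0.2, 0.0, 0.2], [-0.05, 0.05], 4
row, counter = make_row(lambda x, nu, w: 0.9 * x + nu + w, centers, delta,
                        inputs, disturbances, std=0.1)
V = safety_values_on_the_fly(row, n_x, len(inputs), len(disturbances), horizon)
```

Here memory use is proportional to the number of states, while the number of row computations grows linearly with the horizon.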

#### **5.2 Supporting Multiplicative Noises and Practical Distributions**

AMYTISS natively supports multiplicative noises and practical distributions such as uniform, exponential, and beta distributions. The technique introduced in Subsect. 4.2 for reducing the memory usage is also tuned for other distributions based on the support of their PDFs. Since AMYTISS is designed for extensibility, it also allows for customized distributions. Users need to specify their desired PDFs and hyper-rectangles enclosing their supports so that AMYTISS can include them in the parallel computation of T̂<sub>x</sub>. Further details on specifying customized distributions are provided in the README file.

AMYTISS also supports multiplicative noises as introduced in (2). Currently, the memory reduction technique of Subsect. 4.2 is disabled for systems with multiplicative noises. This means users should expect larger memory requirements for systems with multiplicative noises. However, users can still benefit from the proposed OFA version to compensate for the increase in memory requirement. We plan to include this feature for multiplicative noises in a future update of AMYTISS. Note that, for a clearer exposition, the previous sections introduced the concepts using additive noise and Gaussian PDFs.

## **6 Benchmarking and Case Studies**

AMYTISS is self-contained and requires only a modern C++ compiler. It supports all major operating systems: Windows, Linux and Mac OS. Once compiled, utilizing AMYTISS is a matter of providing text configuration files and launching the tool. AMYTISS implements scalable parallel algorithms that run on top of pFaces [10]. Hence, users can utilize computing power in HPC platforms and cloud computing to scale the computation and control the computational complexities of their problems. Table 2 lists the HW configuration we use to benchmark AMYTISS. The devices range from local devices in desktop computers to advanced compute devices in Amazon AWS cloud computing services.


**Table 2.** HW configurations for benchmarking AMYTISS.

Table 3 shows the benchmarking results running AMYTISS with these HWCs for several case studies and makes comparisons between AMYTISS, FAUST<sup>2</sup>, and StocHy. We employ a machine with Windows operating system (Intel i7@3.6 GHz CPU and 16 GB of RAM) for FAUST<sup>2</sup>, and StocHy. It should be mentioned that FAUST<sup>2</sup> predefines a minimum number of representative points based on the desired abstraction error, and accordingly the computation time and memory usage reported in Table 3 are based on the minimum number of representative points. In addition, to have a fair comparison, we run all the case studies with additive noises since neither FAUST<sup>2</sup> nor StocHy supports multiplicative noises.

To show the applicability of our results to large-scale systems, we apply our techniques to several physical case studies. We synthesize controllers for 3- and 5-dimensional *room temperature networks* to keep temperatures in a comfort zone. Furthermore, we synthesize controllers for *road traffic networks* with 3 and 5 dimensions to keep the density of the traffic below some desired level. In addition, we apply our algorithms to a 2-dimensional nonlinear robot and synthesize controllers satisfying safety and reach-avoid specifications. Finally, we consider 3- and 7-dimensional *nonlinear* models of an autonomous vehicle and synthesize reach-avoid controllers to automatically park the vehicles. For details of case studies, see the arXiv version [12].

Table 3 presents a comparison between AMYTISS, FAUST<sup>2</sup> and StocHy w.r.t. the computation time and required memory. For each HWC, we show the time in seconds to solve the problem. Clearly, employing HWCs with more PEs reduces the time to solve the problem. This is a strong indication of the scalability of the proposed algorithms. Since AMYTISS is the only tool for stochastic systems that can utilize the reported HWCs, we do not compare it with other similar tools.

In the first 13 rows of Table 3, we also include the benchmark provided in StocHy [4, Case study 3]. Table 4 in the arXiv version [12] shows an additional comparison between StocHy and AMYTISS on a machine with the same configuration as the one employed in [4] (a laptop having an Intel Core i7-8550U CPU at 1.80 GHz with 8 GB of RAM). StocHy suffers significantly from the state-explosion problem as seen from its exponentially growing computation time. AMYTISS, on the other hand, outperforms StocHy and can handle bigger systems using the same hardware.




**Table 3.** (*continued*)

As seen in Table 3, AMYTISS outperforms FAUST<sup>2</sup> and StocHy in all the case studies (with speedups of up to 692000 times). Moreover, AMYTISS is the only tool that can utilize the available HW resources. The OFA feature in AMYTISS dramatically reduces the required memory while still solving the problems in a reasonable time. FAUST<sup>2</sup> and StocHy fail to solve many of the problems since they lack native support for nonlinear systems, they require large amounts of memory, or they do not finish computing within 24 hours.

Note that considering only the dimensions of systems can sometimes be misleading. In fact, the number of transitions in MDPs (|X̂ × Û|) can give a better judgment on the size of systems since it directly affects the memory/time needed for solving the problem. For instance, in Table 3, the number of transitions for the 14-dimensional case study is 16384, while for the 5-dimensional room temperature example it is 279936 (*i.e.,* about 17 times bigger). This means AMYTISS can clearly handle much larger systems than existing tools.

**Acknowledgment.** The authors would like to thank Thomas Gabler for his help in implementing traditional serial algorithms for the purpose of analysis and then comparing with the parallel ones.

### **References**



## **PRISM-games 3.0: Stochastic Game Verification with Concurrency, Equilibria and Time**

Marta Kwiatkowska<sup>1</sup>, Gethin Norman<sup>2</sup>, David Parker<sup>3(B)</sup>, and Gabriel Santos<sup>1</sup>

<sup>1</sup> Department of Computing Science, University of Oxford, Oxford, UK

<sup>2</sup> School of Computing Science, University of Glasgow, Glasgow, UK

<sup>3</sup> School of Computer Science, University of Birmingham, Birmingham, UK

d.a.parker@cs.bham.ac.uk

**Abstract.** We present a major new release of the PRISM-games model checker, featuring multiple significant advances in its support for verification and strategy synthesis of stochastic games. Firstly, *concurrent* stochastic games bring more realistic modelling of agents interacting in a concurrent fashion. Secondly, *equilibria*-based properties provide a means to analyse games in which competing or collaborating players are driven by distinct objectives. Thirdly, a *real-time* extension of (turn-based) stochastic games facilitates verification and strategy synthesis for systems where timing is a crucial aspect. This paper describes the advances made in the tool's modelling language, property specification language and model checking engines in order to implement this new functionality. We also summarise the performance and scalability of the tool, and describe a selection of case studies, ranging from security protocols to robot coordination, which highlight the benefits of the new features.

#### **1 Introduction**

Quantitative verification and strategy synthesis are powerful techniques for the modelling and analysis of computerised systems which require reasoning about *quantitative* aspects such as probability, time or resource usage. They can be used either to produce formal *guarantees* about a system's behaviour, for example relating to its safety, reliability or efficiency, or to synthesise controllers which ensure that such guarantees will be met at runtime. Examples of applications where these techniques have been used include power controllers, unmanned aerial vehicles, autonomous driving and communication protocols.

As computing systems increasingly involve concurrently acting autonomous agents, *game-theoretic* approaches are becoming widespread in computer science as a faithful modelling abstraction. These techniques can be used to reason about the *competitive* or *collaborative* behaviour of multiple rational agents or entities with distinct goals or objectives. Applications include designing a defence strategy against attackers in a cybersecurity context or building controllers for autonomous robots operating in an unknown or potentially malicious environment. More broadly, game theory techniques such as mechanism design can be used to design protocols that are robust in the context of selfish participants, for example by incorporating *incentive/reward* schemes. They have been successfully deployed in diverse contexts such as network routing [29], auction design [10], public good provisioning [15] and ranking or recommender systems [30].

However, designing game-theoretic systems correctly is a challenge, in view of the complexity of behaviours arising from the interactions between autonomy, concurrency and quantitative rewards. This motivates the development of formal verification techniques to check their correctness and synthesise correct-by-construction strategies for them. Furthermore, many of these applications require reasoning about *stochasticity*: protocols may employ randomisation, e.g., for reliable dissemination across a network, or to minimise the impact of information leakage to an observer; autonomous robots operate in uncertain environments and may use unreliable hardware components or noisy sensors; and data-driven systems such as ranking or navigation systems rely on learnt probabilistic models for their execution.

These challenges have inspired the development of PRISM-games [22], a model checking tool for *stochastic games*. To date, it supports verification and strategy synthesis for *turn-based* stochastic multi-player games (TSGs) using a variety of objectives, expressed in the temporal logic rPATL (probabilistic alternating-time temporal logic with rewards) [8]. This allows specification of *zero-sum* objectives relating to one coalition of players trying to maximise a probabilistic or reward-based objective, while the remaining players form a second coalition trying to minimise the objective. It has also been extended to include (zero-sum) *multi-objective* properties and additional reward measures such as *long-run average* and *ratio reward* [22]. These methods have been successfully applied to several case studies such as autonomous vehicles, user-centric networks, temperature control and an aircraft electric power system [21,23,32].

In this paper, we present PRISM-games 3.0, which significantly extends its predecessor's functionality in several ways [18–20]. First, it supports the modelling and analysis of *concurrent stochastic multi-player games* (CSGs). Previous versions of the tool supported TSGs, in which it is assumed that each state of the game is controlled by a specific player. CSGs allow players to make decisions simultaneously, without knowledge of each other's choices, providing a more realistic model of concurrent execution and decision making. For this, we extend the PRISM-games modelling language, allowing the user to specify concurrency and synchronisation among agents, as well as to associate rewards to either joint or single actions.

In the first instance, PRISM-games now supports verification and strategy synthesis for CSGs using zero-sum specifications in rPATL [19], which we extend to accommodate *instantaneous rewards*. The second major addition to the tool is the possibility of reasoning about *equilibria-based* properties, which allow players to have distinct, not necessarily conflicting objectives. We extend rPATL to express properties relating to (subgame perfect) *social-welfare optimal Nash equilibria (SWNE)* [20]. This provides synthesis of strategies for all players (or coalitions) from which there is no incentive for any of them to unilaterally deviate in any state of the game, and where the combined probabilities or rewards are maximised (or minimised).

Thirdly, PRISM-games now adds support for *probabilistic timed multi-player games* (TPTGs) [18] (currently just the turn-based variant of the model). These extend stochastic multi-player games with real-valued clocks, in the style of (probabilistic) timed automata. This allows real-time aspects of a system to be more accurately modelled. Using the *digital clocks* approach [18], timed models are automatically translated to discrete-time models in order to be verified.

In this paper, we describe the key enhancements made to the tool, notably to its modelling and property specification languages. We also summarise the results, algorithms and implementation of the verification and strategy synthesis techniques developed [18–20] to support the new functionality. We then describe a selection of case studies which showcase the advantages of the new features, and summarise the performance and scalability of the tool.

PRISM-games is open source and runs on all major operating systems. It is available from the tool's website [34]. Supporting material for the paper, including a virtual machine that allows easy running of the tool and reproduction of the results presented in Sect. 4, can be found at [33].

**Related Tools.** Other model checking tools have been developed to provide support for games. For non-stochastic games, model checking tools such as PRALINE [5], EAGLE [31] and EVE [16] support Nash equilibria [27], as does MCMAS-SLK [6] via strategy logic. Uppaal Stratego [11] is a tool that uses machine learning, model checking and simulation for the synthesis of strategies for stochastic priced timed games. GAVS+ [9] is a general-purpose tool for algorithmic game solving, supporting TSGs and (non-stochastic) concurrent games, but not CSGs. GIST [7] allows the analysis of ω-regular properties on probabilistic games, but again focuses on turn-based, not concurrent, games. General purpose tools such as Gambit [26] can compute a variety of equilibria but not for stochastic games.

#### **2 Modelling and Property Specification Languages**

#### **2.1 Modelling Concurrent and Timed Games**

The new features in PRISM-games 3.0 have required some significant enhancements to the language used to specify models. For the addition of real-time aspects (i.e., TPTGs), the changes are a straightforward combination of the existing language features for specifying TSGs in PRISM-games (player specifications and mapping of model states to them) and for probabilistic timed automata in PRISM (clock variables, module invariants, guards and clock resets). We therefore focus in this paper on the specification of CSGs, where the language changes are more fundamental.

PRISM-games has an existing language for specifying TSGs, which is an extension of the native PRISM modelling language [22]. Components of the system to be modelled are encapsulated as *modules*, whose states are defined by a set of finite-range *variables* and whose behaviour is specified using action-labelled *guarded commands*. In a state, one or more modules can execute a command to make a transition: if the guard (a predicate over state variables) is satisfied, the state can be modified (probabilistically) by applying the *updates* of the command. Multiple modules can execute simultaneously if their commands are labelled with the same action.

```
csg
// Player specification
player p1 mac1 endplayer
player p2 mac2 endplayer
// Max energy per user
const int emax;
// User 1
module mac1
  s1 : [0..1] init 0; // Has user 1 sent?
  e1 : [0..emax] init emax; // Energy level of user 1
  [w1] true -> (s1'=0); // Wait
  [t1] e1>0 -> (s1'=c'?0:1) & (e1'=e1-1); // Transmit
endmodule
// Define second user using module renaming
module mac2 = mac1 [ s1=s2, e1=e2, w1=w2, t1=t2 ] endmodule
```

```
// Probability qi for transmission success when i users send
const double q1;
const double q2;
// Channel (computes joint transmission probabilities)
module channel
  c : bool init false; // Did a collision occur during transmission?
  [t1,w2] true -> q1:(c'=false) + (1-q1):(c'=true); // User 1 transmits
  [w1,t2] true -> q1:(c'=false) + (1-q1):(c'=true); // User 2 transmits
  [t1,t2] true -> q2:(c'=false) + (1-q2):(c'=true); // Both transmit
endmodule
```

```
// Reward structures
rewards "mess1" // Number of messages sent by user 1
  s1=1 : 1;
endrewards
rewards "mess2" // Number of messages sent by user 2
  s2=1 : 1;
endrewards
rewards "send2" // Number of times users 1 and 2 transmit simultaneously
  [t1,t2] true : 1;
endrewards
```
**Fig. 1.** An example PRISM-games 3.0 CSG model of medium access control.

CSGs cannot naturally be modelled with this approach for several reasons: (i) players need to be able to concurrently choose between multiple commands with different action labels; (ii) the update performed by one player may be different depending on the action chosen by another player; (iii) when multiple players execute, variables may need to be updated according to an arbitrary probability distribution, rather than being limited to the product of separate distributions specified locally by individual modules.

Figure 1 shows an example of the PRISM-games 3.0 modelling language, which we use to illustrate some of its new features. It models a probabilistic version of the *medium access control* problem, previously described in [5]. Two users share a communication channel. At each time step, user maci (i = 1, 2) can choose between transmitting a message (ti) or waiting (wi). Variable si tracks whether a user successfully sent its message in the last time step and ei represents its energy level: transmissions can only occur when energy is positive. A third component is the channel channel, modelled by Boolean variable c denoting whether a collision occurred on the last transmission attempt.

The first difference (with respect to modelling of TSGs) is the player specification: players are associated with modules (rather than states). In the example, module maci constitutes player i. Modules with no nondeterministic choice (like channel) do not need to be tied to a player.

In each state of the CSG, each player chooses between the enabled commands of the corresponding modules; if no command is enabled, the player idles. The players move simultaneously, so transitions are labelled with *lists* of action labels [a<sub>1</sub>, ..., a<sub>n</sub>]. The guarded command notation is extended accordingly: note how the channel's behaviour depends on which actions the two users take (the same principle applies when specifying reward structures; see send2). Furthermore, variable updates within a command can now be dependent on the updated values of other variables, provided there are no cyclic dependencies. See for example (s1'=c'?0:1), which updates s1 depending on whether there was a channel collision (reflected in c', the updated value of c). We use this mechanism to model interference on the channel: module channel specifies a joint probability distribution which is used to update variables s1 and s2 simultaneously.
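The joint update described above can be mirrored in a short Python sketch that enumerates the successor distribution of the model in Fig. 1 for one pair of simultaneous actions. This is an informal reading of the model, not code produced by or used in PRISM-games: the function names and the state tuple layout are hypothetical, and the treatment of the case where both users wait (the channel variable c is left unchanged because no channel command matches) is an assumption.

```python
def enabled(e):
    """Actions available to a user with energy e: wait always, transmit if e > 0."""
    return ["w", "t"] if e > 0 else ["w"]


def update_user(action, e, c_new):
    """New (s, e) of one user, given its action and the updated channel value."""
    if action == "w":
        return 0, e                         # (s'=0)
    return (0 if c_new else 1), e - 1       # (s'=c'?0:1) & (e'=e-1)


def successors(state, a1, a2, q1, q2):
    """Distribution over next states as a list of (probability, state)."""
    s1, e1, s2, e2, c = state
    assert a1 in enabled(e1) and a2 in enabled(e2)
    senders = (a1 == "t") + (a2 == "t")
    if senders == 0:
        channel = [(1.0, c)]                # assumption: c keeps its value
    else:
        q = q1 if senders == 1 else q2
        channel = [(q, False), (1.0 - q, True)]
    result = []
    for prob, c_new in channel:
        s1_new, e1_new = update_user(a1, e1, c_new)
        s2_new, e2_new = update_user(a2, e2, c_new)
        result.append((prob, (s1_new, e1_new, s2_new, e2_new, c_new)))
    return result


both = successors((0, 2, 0, 2, False), "t", "t", q1=0.9, q2=0.5)
```

When both users transmit, s1 and s2 are either both 1 or both 0 in every successor, which is a joint distribution that cannot be written as a product of independent per-module updates.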

#### **2.2 Property Specification**

PRISM-games 3.0 also extends the language used to specify properties for verification and strategy synthesis. The previous version already supported *zero-sum* queries for TSGs using the logic rPATL, which combines the game logic ATL with reward-based extensions of the probabilistic logic PCTL. Again, for the new real-time models, it is relatively easy to combine the existing rPATL notation with real-valued time bounds. So, we focus here on the case of CSGs, and in particular *equilibria-based* properties.

We compute values or synthesise strategies which are *social-welfare optimal Nash equilibria (SWNE)*, i.e., which maximise (or minimise) the sum of the values associated to the objectives for each player, but from which there is no incentive for any of them to unilaterally deviate in any state of the game. We express such properties by adding to rPATL the + operator, which is then used to denote the sum of the values associated to both *bounded* and *unbounded* objectives.

When using the rewards operator in equilibria-based properties, we can reason about *cumulative* (C<sup>≤k</sup>), *instantaneous* (I<sup>=k</sup>) and *expected reachability* (F) objectives. For properties with the probability operator, we support bounded and unbounded reachability using the temporal operators *next* (X), *eventually* (F) and *until* (U). In order to express zero-sum properties for CSGs, we have implemented all the previous temporal operators for probabilistic queries and a subset of the rPATL operators reported in [8] for reward-based queries, adding to that the instantaneous reward operator.

Finally, following the style of rPATL, we separate players into *coalitions* with the syntax ⟨⟨coalition⟩⟩, in order to specify the player or association of players for which we seek to maximise or minimise the values for a given zero-sum property. For equilibria-based properties, given that we maximise/minimise the sum, we use the same operator to separate players in different coalitions using a colon, while players in the same coalition are separated by a comma.

The following are examples of both zero-sum and equilibria-based properties for the medium access CSG model described in Fig. 1.


#### **3 Verification and Strategy Synthesis Algorithms**

#### **3.1 Zero-Sum Properties for CSGs**

When verifying zero-sum properties of CSGs, PRISM-games makes use of the model checking algorithms described in [19], which were based on the methods formulated in [2,3]. We rely on *value iteration* and classical convergence criteria to approximate/compute the values for all states of the game under study, and on solving a *linear program* to compute a *minimax* strategy at each state. This corresponds to solving a *matrix game*, which represents a *one-shot zero-sum* game for the actions of each player in a state. For unbounded properties, the solutions of the matrix games are used to synthesise an optimal (memoryless and randomised) strategy for each player. Prior to this numerical solution phase, we find and remove the states for which the optimal expected reward values are infinite by using the qualitative algorithms developed in [1].

Our current implementation uses the LPsolve [24] library to solve the matrix games at each state. CSGs are built and stored in an explicit-state fashion using an extension of PRISM's Java-implemented *explicit* (sparse-matrix based) engine.
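To illustrate what solving a matrix game means, the following minimal sketch computes the value and a minimax strategy of a 2×2 zero-sum game in closed form. It is not how PRISM-games proceeds (the tool solves a linear program for games of arbitrary size); the function name `solve_2x2` and the example payoffs are assumptions made for illustration.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d):
    """Value and row-player strategy of the zero-sum game [[a, b], [c, d]].

    The row player maximises, the column player minimises.
    Returns (value, p), where p is the probability of playing row 0.
    """
    a, b, c, d = (Fraction(x) for x in (a, b, c, d))
    maximin = max(min(a, b), min(c, d))   # best payoff rows can guarantee
    minimax = min(max(a, c), max(b, d))   # best cap columns can guarantee
    if maximin == minimax:                # saddle point: pure strategies suffice
        p = Fraction(1) if min(a, b) >= min(c, d) else Fraction(0)
        return maximin, p
    denom = a - b - c + d                 # non-zero when there is no saddle point
    p = (d - c) / denom                   # makes the column player indifferent
    return (a * d - b * c) / denom, p

# Matching pennies: value 0, both rows played with probability 1/2.
print(solve_2x2(1, -1, -1, 1))
```

In value iteration for a CSG, the entries of such a matrix would be the current value estimates of the successor states for each pair of player actions.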

#### **3.2 Equilibria-Based Properties for CSGs**

For equilibria-based properties of CSGs, PRISM-games implements the methods described in [20]. We rely on value iteration and *backwards induction* to approximate/compute values and synthesise strategies that are SWNE. For unbounded properties, we can only compute values that are ε-Nash equilibria, since Nash equilibria are not guaranteed to exist. At each state, we solve a *bimatrix game*, which is a representation of a *one-shot nonzero-sum* game and is a linear complementarity problem. We solve these games via *labelled polytopes*, finding all equilibria values through an SMT-based implementation, for which we use third-party SMT solvers Z3 [12] and Yices [13]. We make use of a precomputation step of finding and removing *dominated strategies* in order to minimise the number of calls to the solver.
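The precomputation step can be pictured with the sketch below, which iteratively deletes pure strategies that are strictly dominated by another pure strategy. Whether the tool uses strict or weak dominance is not stated here, so the dominance notion, the function name `remove_dominated` and the example payoffs are assumptions.

```python
def remove_dominated(A, B):
    """Iteratively delete strictly dominated pure strategies.

    A[i][j] and B[i][j] are the payoffs of the row and the column player
    when row i meets column j. Returns the surviving row and column indices.
    """
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        for i in rows:
            # Row i is dominated if some other row is strictly better
            # against every remaining column.
            if any(all(A[k][j] > A[i][j] for j in cols) for k in rows if k != i):
                rows.remove(i)
                changed = True
                break
        for j in cols:
            # Symmetric check for the column player.
            if any(all(B[i][k] > B[i][j] for i in rows) for k in cols if k != j):
                cols.remove(j)
                changed = True
                break
    return rows, cols

# Prisoner's dilemma (0 = cooperate, 1 = defect): only (defect, defect) survives.
A = [[3, 0], [5, 1]]
B = [[3, 5], [0, 1]]
print(remove_dominated(A, B))
```

Smaller bimatrix games mean fewer and cheaper solver calls, which is the motivation given above.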

Unlike zero-sum properties, the synthesised strategies for bounded and unbounded equilibria-based properties require (finite) memory. This is needed due to the fact that a player's choices may change once their objectives have been satisfied. We synthesise strategies by combining the strategy vectors computed for each bimatrix game and the strategy generated by computing optimal values for the MDP resulting from playing the game after either goal has been met. As we use value iteration to approximate values for infinite-horizon properties, we can only synthesise ε-Nash strategy profiles.

#### **3.3 Turn-Based Probabilistic Timed Games**

Verification and strategy synthesis of TPTGs relies on the algorithms from [18], which use the *digital clocks* approach that has been developed for a variety of real-time models. A translation, at the level of the PRISM-games modelling languages, automatically converts the problem of analysing a TPTG into one of solving a (discrete-time) TSG, for which PRISM-games's existing engines can be used. Time-bounded properties are handled by automatically integrating a timing clock into the model prior to translation. As in the rest of PRISM-games, TSGs are also built and solved using the Java-based *explicit* engine.

## **4 Case Studies and Experimental Results**

The features added in PRISM-games 3.0 have been used for over 10 new case studies across a wide range of application domains, including computer security (intrusion detection, radio jamming, non-repudiation), communication protocols (medium access control, Aloha), incentive schemes for cooperative networking, multi-robot navigation problems and processor task scheduling. Details can be found in [18–20] and on the case studies section of the PRISM-games website [35]. Supporting material is at [33]. In this section, we showcase four selected case studies that demonstrate the benefits of the tool's new functionality. We also include a discussion of the scalability and performance of the tool.

**Future Markets Investor.** This example models two investors playing against the stock market. Investors choose when to invest or to cash in, and the stock market can decide to bar investments at certain points; fluctuations in share values are modelled stochastically. PRISM-games can, for example, synthesise optimal strategies for the two investors to maximise their expected joint profit over time, acting against the stock market which aims to minimise it.

(a) **Future markets investor:** avoiding unrealistic strategy choices using CSGs

(b) **Robot coordination:** using equilibria for mutually beneficial navigation plans

(c) **Non-repudiation:** Attack & defence strategies in a timed, randomised protocol

(d) **Public good game:** Tuning incentive parameter *f* by synthesising equilibria

**Fig. 2.** Results illustrating the benefits of the new verification and strategy synthesis techniques implemented in PRISM-games 3.0; see Sect. 4 for details. (Color figure online)

Figure 2(a) shows the results obtained for this property using both a *turn-based* stochastic game (TSG) and a *concurrent* stochastic game (CSG). The former leads to unrealistic modelling as the market can see the choices made by the investors and gain an unfair advantage: the values in the blue plot in Fig. 2(a) are artificially low. In the CSG model, using PRISM-games 3.0, decisions are taken simultaneously, yielding the correct strategies and values (red plot).

**Robot Coordination.** Our next example models two robots navigating in opposite directions across a 10-by-10 grid as a CSG. Obstacles which hinder the robots as they move from location to location are modelled stochastically; and if the robots collide, both of them fail in their attempt to reach their goal. We use PRISM-games to find navigation strategies for the two robots, where each robot does not know the choice being made by the other at each step.

The objective for each robot is to navigate successfully, so we maximise the average probability (across the two robots) of success. Figure 2(b) shows the best value that can be achieved within a fixed period of k moves across the grid. One robot aiming single-handedly to achieve this goal performs reasonably well (blue plot), but we can achieve better collective performance by using PRISM-games to synthesise a (social welfare Nash) *equilibrium* strategy (red plot).

**Non-repudiation.** Next we consider a non-repudiation protocol [25], which permits an originator O to transfer information to a recipient R while guaranteeing non-repudiation, i.e., that neither O nor R can deny that they participated in the transfer. Here, both *probability* (the protocol is randomised) and *time* (the protocol relies on acknowledgement time-outs) are essential ingredients for checking correctness. Furthermore, we model the two participants of the protocol as opposing players, resulting in a TPTG model.

To verify the protocol, we check the worst-case probability that a malicious recipient R can obtain the information being transferred within time T. This can be done with a PTA model (as in [28]) but, with a timed game model, we can also analyse counter-strategies of the honest participant. The results (see Fig. 2(c)) show that, while it is not possible to prevent the information being received, it *is* possible to delay it (the red plot shows lower probabilities for higher times). Note that the bound T is an actual time bound, unlike the examples above, where step-bounded properties measure the number of steps or rounds.

**Public Good Game.** Lastly, we show a new case study modelling a *public good game*, a well studied model of social choice in economics where participants repeatedly decide how much of an endowment to keep for themselves or to share it with the other players. The total shared by the players is boosted by a factor f in order to incentivise sharing and then divided equally between the players.
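The payoff rule described above can be written down directly. The sketch covers a single round only; the function name, the fixed per-round endowment and the example numbers are assumptions and are not taken from the CSG model itself.

```python
def public_good_payoffs(endowment, contributions, f):
    """Payoffs of one round of a public good game.

    Each player keeps what they did not share and receives an equal part
    of the shared total after it has been boosted by the factor f.
    """
    n = len(contributions)
    share = f * sum(contributions) / n
    return [endowment - c + share for c in contributions]

# Two players with endowment 10 and f = 1.5: a free rider gains at the
# expense of a full contributor.
print(public_good_payoffs(10, [10, 0], 1.5))
print(public_good_payoffs(10, [10, 10], 1.5))
```

With two players and f below 2, each unit shared returns less than one unit to the player who shared it, which is why individual and collective interests diverge.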

Figure 2(d) shows results from a 2-player game, modelled as a CSG. Player choices are necessarily *concurrent*, to avoid cheating. We also need to use *equilibria* since the players have distinct individual goals (maximising personal expected profit). Figure 2(d) shows the values for each player in a synthesised optimal (social welfare Nash) equilibrium for varying f. Changes in f affect both the resulting profit *and* potential inequalities between players in equilibria, indicating the subtleties involved when tuning parameters in an incentive mechanism and the usefulness of analysing this with PRISM-games.

**Scalability and Performance.** Finally, we show some experimental results for a representative selection of larger examples, to give an indication of the scalability and performance of PRISM-games 3.0. Table 1 shows a range of models (the first 4 are CSGs; the last is a TPTG), the statistics for each one (number of players, states, transitions) and the time taken to build and verify the model for some example properties on a 2.10 GHz Intel Xeon with 8 GB of JVM memory.

**Table 1.** Model statistics for some of the case studies.

Verification of CSGs is more computationally expensive than for TSGs supported in earlier versions of the tool, but PRISM-games 3.0 is able to build and analyse CSGs with more than 3 million states on relatively modest hardware. The majority of the time is spent solving (bi)matrix games, which is done repeatedly for all states of the model. Hence, the number of choices per state, which dictates the size of these games, has a greater impact on performance than for TSGs. Unsurprisingly, equilibria properties are slower than zero-sum ones. For both types of property, the number of players in the game does not have a major impact since they are grouped into coalitions yielding a 2-player game to solve. For TPTGs, the digital clocks translation is fast since it is done syntactically, and then a TSG is solved whose size depends on several factors, primarily the number of locations and the magnitude of any time bound in the property.

#### **5 Conclusions**

We have presented PRISM-games 3.0, which adds three major new features: (i) concurrent stochastic games; (ii) synthesis of equilibria; and (iii) timed probabilistic games. The usefulness of these has been illustrated on several newly created or extended applications.

CSGs are considerably more expensive to solve than their turn-based counterparts and a key challenge is efficiently solving the matrix game at each state, which is itself a non-trivial optimisation problem. For equilibria, the main difficulty is finding an optimal equilibrium, which currently relies on iteratively restricting the solution search space. Both problems are sensitive to the limitations and issues of floating-point arithmetic, particularly equilibria computation, and might benefit from arbitrary precision representations. Recent research has also pointed out the shortcomings of only using a lower bound approximation as a stopping criterion for value iteration, as it can lead to inaccuracies [4,14,17]. The impact of similar issues on model checking for games is still to be studied.

A range of further challenges exist for future work. These include providing support for *multi-coalitional* properties and implementing other techniques for equilibria computation. For timed games, we plan to investigate concurrent variants, and also zone-based solution techniques. More broadly speaking, partial information variants of games would be a useful addition.

**Acknowledgements.** This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 834115) and the EPSRC Programme Grant on Mobile Autonomy (EP/M019918/1).

#### **References**



## **Optimistic Value Iteration**

Arnd Hartmanns<sup>1</sup> and Benjamin Lucien Kaminski<sup>2</sup>

<sup>1</sup> University of Twente, Enschede, The Netherlands, arnd.hartmanns@utwente.nl

<sup>2</sup> University College London, London, UK, b.kaminski@ucl.ac.uk

**Abstract.** Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two "sound" variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration's ability to usually deliver good lower bounds: we obtain a lower bound via standard value iteration, use the result to "guess" an upper bound, and prove the latter's correctness. We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards. It is easy to implement and performs well, as we show via an extensive experimental evaluation using our implementation within the mcsta model checker of the Modest Toolset.

#### **1 Introduction**

Markov decision processes (MDP, [30]) are a widely-used formalism to represent discrete-state and -time systems in which *probabilistic* effects meet controllable *nondeterministic* decisions. The former may arise from an environment or agent whose behaviour is only known statistically (e.g. message loss in wireless communication or statistical user profiles), or it may be intentional as part of a randomised algorithm (such as exponential backoff in Ethernet). The latter may be under the control of the system, in which case we are in a planning setting and typically look for a *scheduler* (or strategy, policy) that minimises the probability of unsafe behaviour or maximises a reward; or it may be considered adversarial, which is the standard assumption in verification: we want to establish that the maximum probability of unsafe behaviour is below, or that the minimum reward is above, a specified threshold. Extensions of MDP cover continuous time [11,26], and the analysis of complex formalisms such as stochastic hybrid automata [13] can be reduced to the analysis of MDP abstractions.

The authors are listed alphabetically. This work was partly performed while author B. L. Kaminski was at RWTH Aachen University, Aachen, Germany. This work was supported by ERC Advanced Grant 787914 (FRAPPANT), DFG Research Training Group 2236 (UnRAVeL), and NWO VENI grant no. 639.021.754.

S. K. Lahiri and C. Wang (Eds.): CAV 2020, LNCS 12225, pp. 488–511, 2020. https://doi.org/10.1007/978-3-030-53291-8\_26

The standard algorithm to compute optimal (maximum or minimum) probabilities or reward values on MDP is *value iteration* (VI). It implicitly computes the corresponding optimal scheduler, too. It keeps track of a value for every state of the MDP, locally improves the values iteratively until a "convergence" criterion is met, and then reports the final value for the initial state as the overall result. The initial values are chosen to be an underapproximation of the true values (e.g. 0 for all states in case of probabilities or non-negative rewards). The final values are then an improved underapproximation of the true values. For unbounded (infinite-horizon) properties, there is unfortunately no (known and practical) convergence criterion that could guarantee a predefined error on the final result. Still, probabilistic model checkers such as Prism [24] report the final result obtained via simple relative or absolute global error criteria as the definitive probability. This is because, on *most* case studies considered so far, value iteration in fact converges fast enough that the (relative or absolute) difference between the reported and the true value approximately meets the error specified for the convergence criterion. Only relatively recently has this problem of soundness come to the attention of the probabilistic verification and planning communities [7,14,28]. First highlighted on hand-crafted counterexamples, it has by now been found to affect benchmarks and real-life case studies, too [3].

The first proposal to compute sound reachability probabilities was to use *interval iteration* (II [15], first presented in [14]). The idea is to perform two iterations concurrently, one starting from 0 as before, and one starting from 1. The latter improves an overapproximation of the true values, and the process can be stopped once the (relative or absolute) difference between the two values for the initial state is below the specified ε, or at any earlier time with a correspondingly larger but known error. Baier et al. extended interval iteration to expected accumulated reward values [3]; here, the complication is to find initial values that are guaranteed to be an overapproximation. The proposed graph-based (i.e. not numerical) algorithm in practice tends to compute conservative initial values from which many iterations are needed until convergence. More recently, *sound value iteration* (SVI) [31] improved upon interval iteration by computing upper bounds on-the-fly and performing larger value improvements per iteration, for both probabilities and expected rewards. However, we found SVI tricky to implement correctly; some edge cases not considered by the algorithm as presented in [31] initially caused our implementation to deliver incorrect results or diverge on very few benchmarks. Both II and SVI fundamentally depend on the MDP being *contracting*; this must be ensured by appropriate structural transformations, e.g. by collapsing end components, a priori. These transformations additionally complicate implementations, and increase memory requirements.
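The idea of interval iteration can be sketched as follows for maximum reachability probabilities. This is a minimal illustration, not the implementation evaluated in this paper: the MDP encoding, the names `bellman_max` and `interval_iteration`, and the example model are assumptions, and the set `zero` of states that cannot reach the goal is taken as given, which is what makes this small example contracting.

```python
def bellman_max(mdp, goal, zero, v):
    """One application of the max-probability Bellman operator.

    mdp maps a state to a list of transitions; a transition maps branches
    (reward, successor) to probabilities. Rewards are ignored here.
    """
    new = {}
    for s, transitions in mdp.items():
        if s in goal:
            new[s] = 1.0
        elif s in zero:
            new[s] = 0.0
        else:
            new[s] = max(sum(p * v[t] for (_, t), p in mu.items())
                         for mu in transitions)
    return new

def interval_iteration(mdp, goal, zero, init, eps):
    """Iterate from below (0) and from above (1) until the bounds for the
    initial state differ by at most eps (absolute error)."""
    lower = {s: 1.0 if s in goal else 0.0 for s in mdp}
    upper = {s: 0.0 if s in zero else 1.0 for s in mdp}
    while upper[init] - lower[init] > eps:
        lower = bellman_max(mdp, goal, zero, lower)
        upper = bellman_max(mdp, goal, zero, upper)
    return lower[init], upper[init]

# Hypothetical model: s0 loops with probability 1/2 and otherwise moves to
# the goal or to a sink with probability 1/4 each; the true value is 1/2.
mdp = {
    "s0": [{(0, "s0"): 0.5, (0, "goal"): 0.25, (0, "sink"): 0.25}],
    "goal": [{(0, "goal"): 1.0}],
    "sink": [{(0, "sink"): 1.0}],
}
lo, hi = interval_iteration(mdp, {"goal"}, {"sink"}, "s0", 1e-6)
print(lo, hi)
```

Stopping early simply returns a wider interval, which matches the remark above about a larger but known error.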

*Our Contribution.* We present (in Sect. 4) a new algorithm to compute sound reachability probabilities and expected rewards that is both simple and practically efficient. We first (1) perform standard value iteration until "convergence", resulting in a lower bound on the value for every state. To this we (2) apply specific heuristics to "guess", for every state, a candidate upper bound value. Further value iterations (3) then confirm (if all values decrease) or disprove (if all values increase, or lower and upper bounds cross) the soundness of the upper bounds. In the latter case, we perform more lower bound iterations with reduced α before retrying from step 2. We combine classic results from domain theory with specific properties of value iteration to show that our algorithm terminates. In problematic cases, many retries may be needed before termination, and performance may be worse than interval or sound value iteration. However, on many existing case studies, value iteration already worked well, and our approach attaches a soundness proof to its result with moderate overhead. We thus refer to it as *optimistic value iteration* (OVI). In contrast to II and SVI, it also works well for non-contracting MDP, albeit without a general termination guarantee. Our experimental evaluation in Sect. 5 uses all applicable models from the Quantitative Verification Benchmark Set [21] to confirm that OVI indeed performs as expected. It uses our publicly available implementations of II, SVI, and now OVI in the mcsta model checker of the Modest Toolset [20].
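The three steps can be pictured with the heavily simplified sketch below. It is not the algorithm of Sect. 4: the guessing heuristic (scaling the lower bound by 1 + ε), the single-step check, the halving of α, the MDP encoding and all names are assumptions made for illustration. The check relies on the standard fact that a vector u with Φ(u) ⪯ u lies above the least fixed point; floating-point rounding is ignored.

```python
def bellman_max(mdp, goal, v):
    """Max-probability Bellman operator; goal states keep the value 1."""
    return {
        s: 1.0 if s in goal
        else max(sum(p * v[t] for (_, t), p in mu.items()) for mu in transitions)
        for s, transitions in mdp.items()
    }

def optimistic_vi(mdp, goal, init, eps=1e-6, alpha=1e-6):
    v = {s: 1.0 if s in goal else 0.0 for s in mdp}
    while True:
        # (1) Standard value iteration from below until "convergence".
        while True:
            new = bellman_max(mdp, goal, v)
            error = max(new[s] - v[s] for s in mdp)
            v = new
            if error <= alpha:
                break
        # (2) Guess a candidate upper bound from the lower bound.
        u = {s: min(1.0, v[s] * (1 + eps)) for s in mdp}
        # (3) If one more iteration increases no value, u is inductive and
        #     therefore an upper bound on the least fixed point.
        phi_u = bellman_max(mdp, goal, u)
        if all(phi_u[s] <= u[s] for s in mdp):
            return v[init], u[init]
        alpha /= 2  # guess disproved: refine the lower bound and retry

# Hypothetical model: action 1 loops with probability 1/2 (value 1/2 in the
# limit), action 2 reaches the goal directly with probability 0.4.
mdp = {
    "s0": [
        {(0, "s0"): 0.5, (0, "goal"): 0.25, (0, "sink"): 0.25},
        {(0, "goal"): 0.4, (0, "sink"): 0.6},
    ],
    "goal": [{(0, "goal"): 1.0}],
    "sink": [{(0, "sink"): 1.0}],
}
lo, hi = optimistic_vi(mdp, {"goal"}, "s0")
print(lo, hi)
```

On this example the first guess is slightly too low and is rejected; after one retry with a smaller α the guess is confirmed.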

*Related Work.* In parallel to [15], the core idea behind II was also presented in [7] (later improved in [2]), embedded in a learning-based framework that manages to alleviate the state space explosion problem in models with a particular structure. In this approach, end components are statistically detected and collapsed on-the-fly. II has recently been extended to stochastic games in [23], offering *deflating* as a new alternative to collapsing end components in MDP. Deflating does not require a structural transformation, but rather extra computation steps in each iteration applied to the states of all (a priori identified) end components.

The only known convergence criterion for pure VI was presented in [9, Sect. 3.5]: if we run VI until the absolute error between two iterations is less than a certain value α, then the computed values at that point are within α of the true values, and can in fact be rounded to the exact true values (as implemented in the *rational search* approach [5]). However, α cannot be freely chosen; it is a fixed number that depends on the size of the MDP and the largest denominator of the (rational) transition probabilities. The number of iterations needed is exponential in the size and the denominators. While not very useful in practice, this establishes an exponential upper bound on the number of iterations needed in unbounded-horizon VI. Additionally, Balaji et al. [4] recently showed the computations in finite-horizon value iteration to be EXPTIME-complete.

As an alternative to the iterative numeric road, guaranteed correct results (modulo implementation errors) can be obtained by using precise rational arithmetic. It does not combine too well with iterative methods like II or SVI due to the increasingly small differences between the values and the actual solution. The probabilistic model checker Storm [10] thus combines topological decomposition, policy iteration, and exact solvers for linear equation systems based on Gaussian elimination when asked to use rational arithmetic [22, Section 7.4.8]. The disadvantage is the significant runtime cost for performing the unlimited-precision calculations, limiting such methods to relatively smaller MDP.

The only experimental evaluations using large sets of benchmarks that we are aware of compared VI with II to study the overhead needed to obtain sound results via II [3], and II with SVI to show the performance improvements of SVI [31]. The learning-based method with deflation of [2] does not compete against II and SVI; its aim is rather in dealing with state space explosion (i.e. memory usage). Its performance was evaluated on 16 selected small (<400 k states) benchmark instances in [2], showing absolute errors on the order of 10<sup>−4</sup> on many benchmarks with a 30-min timeout. SVI thus appears the most competitive technique in runtime and precision so far. Consequently, in our evaluation in Sect. 5, we compare OVI with SVI, and II for reference, using the default relative error of 10<sup>−6</sup>, including large and excluding clearly acyclic benchmarks (since they are trivial even for VI), with a 10-min timeout which is rarely hit.

**Table 1.** VI and OVI example on M<sup>e</sup>

#### **2 Preliminaries**

$\mathbb{R}^+_0$ is the set of all non-negative real numbers. We write $\{\, x_1 \mapsto y_1, \ldots \,\}$ to denote the function that maps all $x_i$ to $y_i$, and if necessary in the respective context, implicitly maps to 0 all $x$ for which no explicit mapping is specified. Given a set $S$, its powerset is $2^S$. A (discrete) *probability distribution* over $S$ is a function $\mu \in S \to [0, 1]$ with countable *support* $\mathit{spt}(\mu) \stackrel{\text{def}}{=} \{\, s \in S \mid \mu(s) > 0 \,\}$ and $\sum_{s \in \mathit{spt}(\mu)} \mu(s) = 1$. $\mathit{Dist}(S)$ is the set of all probability distributions over $S$.

*Markov Decision Processes* (MDP) combine nondeterministic choices as in labelled transition systems with discrete probabilistic decisions as in discrete-time Markov chains (DTMC). We define them formally and describe their semantics.

**Definition 1.** *A* Markov decision process *(MDP) is a triple* $M = \langle S, s_I, T \rangle$ *where* $S$ *is a finite set of* states *with* initial state $s_I \in S$ *and* $T\colon S \to 2^{\mathit{Dist}(\mathbb{R}^+_0 \times S)}$ *is the* transition function*.* $T(s)$ *must be finite and non-empty for all* $s \in S$*.*

For $s \in S$, an element $\mu \in T(s)$ is a *transition*, and a pair $\langle r, s' \rangle \in \mathit{spt}(\mu)$ is a *branch* to successor state $s'$ with *reward* $r$ and probability $\mu(\langle r, s' \rangle)$. Let $M^{(s'_I)}$ be $M$ but with initial state $s'_I$, and $M^0$ be $M$ with all rewards set to zero.

*Example 1.* Figure 1 shows our example MDP $M^e$. We draw transitions as lines to an intermediate node from which branches labelled with probability and reward (if not zero) lead to successor states. We omit the intermediate node and probability 1 for transitions with a single branch, and label some transitions to refer to them in the text. $M^e$ has 5 states, 7 transitions, and 10 branches.
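One possible encoding of Definition 1 as a data structure is sketched below. The model is a hypothetical three-state MDP, not the $M^e$ of Fig. 1, and the helper names `is_mdp` and `count` are assumptions for illustration.

```python
from fractions import Fraction as F

# T maps each state to a list of transitions; a transition maps its
# branches (reward, successor) to their probabilities.
T = {
    "s0": [
        {(0, "s0"): F(1, 2), (1, "goal"): F(1, 4), (0, "sink"): F(1, 4)},
        {(2, "goal"): F(2, 5), (0, "sink"): F(3, 5)},
    ],
    "goal": [{(0, "goal"): F(1)}],
    "sink": [{(0, "sink"): F(1)}],
}

def is_mdp(T, initial):
    """Check the side conditions of Definition 1 on the dictionary T."""
    if initial not in T:
        return False
    for transitions in T.values():
        if not transitions:                  # T(s) must be non-empty
            return False
        for mu in transitions:
            # Only the support is stored, rewards are non-negative, and
            # every successor must be a state of the model.
            if any(p <= 0 or r < 0 or t not in T for (r, t), p in mu.items()):
                return False
            if sum(mu.values()) != 1:        # mu must be a distribution
                return False
    return True

def count(T):
    """Number of states, transitions and branches."""
    transitions = [mu for ts in T.values() for mu in ts]
    return len(T), len(transitions), sum(len(mu) for mu in transitions)

print(is_mdp(T, "s0"), count(T))
```

Exact fractions are used so that the check that each distribution sums to 1 is not affected by rounding.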

In practice, higher-level modelling languages like Modest [17] are used to specify MDP. The semantics of an MDP is captured by its *paths*. A path represents a concrete resolution of all nondeterministic and probabilistic choices. Formally:

**Definition 2.** *A* finite path *is a sequence* $\pi_{\mathit{fin}} = s_0\, \mu_0\, r_0\, s_1\, \mu_1\, r_1 \ldots \mu_{n-1}\, r_{n-1}\, s_n$ *where* $s_i \in S$ *for all* $i \in \{\, 0, \ldots, n \,\}$ *and* $\exists\, \mu_i \in T(s_i)\colon \langle r_i, s_{i+1} \rangle \in \mathit{spt}(\mu_i)$ *for all* $i \in \{\, 0, \ldots, n-1 \,\}$*. Let* $|\pi_{\mathit{fin}}| \stackrel{\text{def}}{=} n$*,* $\mathrm{last}(\pi_{\mathit{fin}}) \stackrel{\text{def}}{=} s_n$*, and* $\mathrm{rew}(\pi_{\mathit{fin}}) \stackrel{\text{def}}{=} \sum_{i=0}^{n-1} r_i$*.* $\Pi_{\mathit{fin}}$ *is the set of all finite paths starting in* $s_I$*. A* path *is an analogous infinite sequence* $\pi$*, and* $\Pi$ *is the set of all paths starting in* $s_I$*. We write* $s \in \pi$ *if* $\exists\, i\colon s = s_i$*, and* $\pi_{\to G}$ *for the shortest prefix of* $\pi$ *that contains a state in* $G \subseteq S$*, or* $\bot$ *if* $\pi$ *contains no such state. Let* $\mathrm{rew}(\bot) \stackrel{\text{def}}{=} \infty$*.*
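The path notions of Definition 2 can be tried out on finite paths with the following sketch. Paths are encoded as flat lists `[s0, mu0, r0, s1, ..., sn]`, transitions are represented by labels rather than distributions, and `None` stands in for ⊥; these encoding choices and the function names are assumptions. Definition 2 applies the prefix operation to infinite paths, whereas the sketch only inspects a finite one.

```python
import math

def rew(path):
    """Accumulated reward of a finite path [s0, mu0, r0, s1, ..., sn]."""
    return sum(path[2::3])          # rewards sit at positions 2, 5, 8, ...

def prefix_to(path, G):
    """Shortest prefix of the path that contains a state in G, or None."""
    for i in range(0, len(path), 3):    # states sit at positions 0, 3, 6, ...
        if path[i] in G:
            return path[: i + 1]
    return None

def rew_to(path, G):
    """Reward accumulated until G is reached; infinite if G is never reached."""
    prefix = prefix_to(path, G)
    return math.inf if prefix is None else rew(prefix)

path = ["s0", "a", 0, "s0", "a", 1, "goal"]
print(rew(path), prefix_to(path, {"s0"}), rew_to(path, {"sink"}))
```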

A scheduler (or *adversary*, *policy* or *strategy*) only resolves the nondeterministic choices of M. For this paper, memoryless deterministic schedulers suffice [6].

**Definition 3.** *A function* $\mathfrak{s}\colon S \to \mathit{Dist}(\mathbb{R}^+_0 \times S)$ *is a* scheduler *if, for all* $s \in S$*, we have* $\mathfrak{s}(s) \in T(s)$*. The set of all schedulers of* $M$ *is* $\mathfrak{S}(M)$*.*

Given an MDP $M$ as above, let $M|_{\mathfrak{s}} = \langle S, s_I, T|_{\mathfrak{s}} \rangle$ with $T|_{\mathfrak{s}}(s) = \{\, \mathfrak{s}(s) \,\}$ be the DTMC induced by $\mathfrak{s}$. Via the standard cylinder set construction [12, Sect. 2.2] on $M|_{\mathfrak{s}}$, a scheduler induces a probability measure $\mathbb{P}^M_{\mathfrak{s}}$ on measurable sets of paths starting in $s_I$. For goal state $g \in S$, the maximum and minimum **probability of reaching** $g$ is defined as $\mathrm{P}^M_{\max}(\diamond\, g) = \sup_{\mathfrak{s} \in \mathfrak{S}} \mathbb{P}^M_{\mathfrak{s}}(\{\, \pi \in \Pi \mid g \in \pi \,\})$ and $\mathrm{P}^M_{\min}(\diamond\, g) = \inf_{\mathfrak{s} \in \mathfrak{S}} \mathbb{P}^M_{\mathfrak{s}}(\{\, \pi \in \Pi \mid g \in \pi \,\})$, respectively. The definition extends to sets $G$ of goal states. Let $R^M_G\colon \Pi \to \mathbb{R}^+_0$ be the random variable defined by $R^M_G(\pi) = \mathrm{rew}(\pi_{\to G})$ and let $\mathbb{E}^M_{\mathfrak{s}}(G)$ be the expected value of $R^M_G$ under $\mathbb{P}^M_{\mathfrak{s}}$. Then the maximum and minimum **expected reward to reach** $G$ is defined as $\mathrm{E}^M_{\max}(G) = \sup_{\mathfrak{s}} \mathbb{E}^M_{\mathfrak{s}}(G)$ and $\mathrm{E}^M_{\min}(G) = \inf_{\mathfrak{s}} \mathbb{E}^M_{\mathfrak{s}}(G)$, respectively. We omit the superscripts for $M$ when they are clear from the context. From now on, whenever we have an MDP with a set of goal states $G$, we assume that they have been made absorbing, i.e. for all $g \in G$ we only have a self-loop: $T(g) = \{\, \{\, \langle 0, g \rangle \mapsto 1 \,\} \,\}$.

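**Definition 4.** *An* end component *of* $M$ *as above is a (sub-)MDP* $\langle S', T', s'_I \rangle$ *where* $S' \subseteq S$*,* $T'(s) \subseteq T(s)$ *for all* $s \in S'$*, if* $\mu \in T'(s)$ *for some* $s \in S'$ *and* $\langle r, s' \rangle \in \mathit{spt}(\mu)$ *then* $r = 0$*, and the directed graph with vertex set* $S'$ *and edge set* $\{\, \langle s, s' \rangle \mid \exists\, \mu \in T'(s)\colon \langle 0, s' \rangle \in \mathit{spt}(\mu) \,\}$ *is strongly connected.*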

#### **3 Value Iteration**

The standard algorithm to compute reachability probabilities and expected rewards is *value iteration* (VI) [30]. In this section, we recall its theoretical foundations and its limitations regarding convergence.

```
1 function GSVI(M = ⟨S, s_I, T⟩, S?, v, α, diff)
2   repeat
3     error := 0
4     foreach s ∈ S? do
5       v_new := Φ(v)(s)   // iterate lower bound
6       if v_new > 0 then error := max(error, diff(v(s), v_new))
7       v(s) := v_new
8   until error ≤ α
```
**Algorithm 1.** Gauss-Seidel value iteration
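A direct Python transcription of Algorithm 1 might look as follows. The MDP encoding (transitions as dictionaries from branches to probabilities), the `opt` parameter, the two error metrics and the example model are assumptions for illustration; this is not the mcsta implementation.

```python
def gsvi(T, S_q, v, alpha, diff, opt=max):
    """Gauss-Seidel value iteration following Algorithm 1.

    T maps a state to a list of transitions; a transition maps branches
    (reward, successor) to probabilities. S_q plays the role of S?.
    The vector v is updated in place and returned.
    """
    while True:
        error = 0.0
        for s in S_q:
            # Bellman update that already sees values improved in this sweep.
            v_new = opt(sum(p * (r + v[t]) for (r, t), p in mu.items())
                        for mu in T[s])
            if v_new > 0:
                error = max(error, diff(v[s], v_new))
            v[s] = v_new
        if error <= alpha:
            return v

def absolute(old, new):
    return abs(new - old)

def relative(old, new):
    return abs(new - old) / new     # only called with new > 0

# Hypothetical model with all rewards zero; the maximum probability of
# reaching "goal" from s0 is 1/2.
T = {
    "s0": [
        {(0, "s0"): 0.5, (0, "goal"): 0.25, (0, "sink"): 0.25},
        {(0, "goal"): 0.4, (0, "sink"): 0.6},
    ],
    "sink": [{(0, "sink"): 1.0}],
    "goal": [{(0, "goal"): 1.0}],
}
v = gsvi(T, ["s0", "sink"], {"s0": 0.0, "sink": 0.0, "goal": 1.0}, 1e-6, absolute)
print(v["s0"])
```

The result stays below the true value 1/2, which illustrates that the stopping criterion only measures the size of the last step, not the distance to the fixed point.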

#### **3.1 Theoretical Foundations**

Let $\mathbb{V} = \{\, v \mid v\colon S \to \mathbb{R}^+_0 \cup \{\infty\} \,\}$ be a space of vectors of values. It can easily be shown that $\langle \mathbb{V}, \preceq \rangle$ with

$$v \preceq w \qquad \text{if and only if} \qquad \forall s \in S \colon v(s) \le w(s)$$

forms a complete lattice, i.e. every subset $V \subseteq \mathbb{V}$ has a supremum (and an infimum) in $\mathbb{V}$ with respect to $\preceq$. We write $v \prec w$ for $v \preceq w \wedge v \neq w$ and $v \nsim w$ for $\neg(v \preceq w \vee w \preceq v)$.

Minimum and maximum reachability probabilities and expected rewards can be expressed as the *least fixed point* of the *Bellman operator* $\Phi\colon \mathbb{V} \to \mathbb{V}$ given by

$$\Phi(v) \stackrel{\text{def}}{=} \lambda\, s. \begin{cases} \mathit{opt}_{\mu \in T(s)} \sum_{\langle r, s' \rangle \in \mathit{spt}(\mu)} \mu(\langle r, s' \rangle) \cdot (r + v(s')) & \text{if } s \in S_? \\ d & \text{if } s \notin S_? \end{cases}$$

where $\mathit{opt} \in \{\, \max, \min \,\}$ and the choice of both $S_? \subseteq S$ and $d$ depends on whether we wish to compute reachability probabilities or expected rewards. In any case, the Bellman operator $\Phi$ can be shown to be Scott-continuous [1], i.e. in our case: for any subset $V \subseteq \mathbb{V}$, we have $\Phi(\sup V) = \sup \Phi(V)$.

The Kleene fixed point theorem for Scott-continuous self-maps on complete lattices [1,27] guarantees that $\text{lfp}\, \Phi$, the least fixed point of $\Phi$, indeed exists. Note that $\Phi$ can still have more than one fixed point. In addition to mere existence of $\text{lfp}\, \Phi$, the Kleene fixed point theorem states that $\text{lfp}\, \Phi$ can be expressed by

$$\text{lfp } \Phi = \lim_{n \to \infty} \Phi^n(\bar{0}) \tag{1}$$

where $\bar{0} \in \mathbb{V}$ is the zero vector and $\Phi^n(v)$ denotes $n$-fold application of $\Phi$ to $v$. Equation 1 is the basis of VI: the algorithm iteratively constructs a sequence of vectors

$$v_0 = \bar{0} \qquad \text{and} \qquad v_{i+1} = \Phi(v_i),$$

which converges to the sought-after least fixed point. This convergence is *monotonic*: for every $n \in \mathbb{N}$, we have $\Phi^n(\bar{0}) \preceq \Phi^{n+1}(\bar{0})$ and hence $\Phi^n(\bar{0}) \preceq \text{lfp}\, \Phi$. In particular, $\Phi^n(\bar{0})(s_I)$ is an *under*approximation of the sought-after quantity for every $n$. Note that iterating $\Phi$ on *any* underapproximation $v \preceq \text{lfp}\, \Phi$ (instead of $\bar{0}$) will still converge to $\text{lfp}\, \Phi$ and $\Phi^n(v) \preceq \text{lfp}\, \Phi$ will hold for any $n$.
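As a worked instance of the Kleene iteration, consider a hypothetical system (not taken from this paper) with a single state in $S_?$ that loops with probability $\tfrac12$ and reaches the goal with probability $\tfrac14$, so that $\Phi(v) = \tfrac12 v + \tfrac14$ on that state:

$$\Phi^n(\bar{0}) = \frac14 \sum_{i=0}^{n-1} 2^{-i} = \frac12 \left(1 - 2^{-n}\right), \qquad \text{lfp}\, \Phi = \lim_{n \to \infty} \frac12 \left(1 - 2^{-n}\right) = \frac12 .$$

Every iterate lies strictly below $\tfrac12$ and the sequence is increasing, as the monotonicity argument predicts. The step size is $\Phi^{n+1}(\bar{0}) - \Phi^n(\bar{0}) = 2^{-(n+2)}$, which in this particular example equals the remaining distance $\tfrac12 - \Phi^{n+1}(\bar{0})$; in general the step size gives no such bound on the remaining distance.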

*Gauss-Seidel Value Iteration.* Algorithm 1 shows the pseudocode of a VI implementation that uses the so-called *Gauss-Seidel optimisation*: Whereas standard VI needs to store two vectors $v_i$ and $v_{i+1}$, Gauss-Seidel VI stores only a single vector $v$ and performs updates in place. This does not affect the correctness of VI, but may speed up convergence depending on the order in which the loop in line 4 considers the states in $S_?$. The error metric *diff* is used to check for convergence.

*VI for Probabilities.* For determining reachability probabilities, we operate on $M^0$ and set $S_? = S \setminus G$ and $d = 1$. Then the corresponding Bellman operator satisfies

$$(\text{lfp}\, \Phi)(s) = \mathrm{P}_{\mathit{opt}}^{M^{(s)}}(\diamond\, G),$$

and VI will iteratively approximate this quantity *from below*. The corresponding call to Algorithm 1 is $\mathrm{GSVI}(M^0, S \setminus G, \{\, s \mapsto 0 \mid s \in S \setminus G \,\} \cup \{\, s \mapsto 1 \mid s \in G \,\}, \alpha, \mathit{diff})$.

*VI for Expected Rewards.* For determining the expected reward E<sub>*opt*</sub><sup>M<sup>(s)</sup></sup>(G), we operate on M and first have to determine the set S<sup>∞</sup> of states from which the minimum (if *opt* = max) or maximum (if *opt* = min) probability to reach G is less than 1.<sup>1</sup> If s<sub>I</sub> ∈ S<sup>∞</sup>, then the result is ∞ due to the definition of rew(⊥). Otherwise, we choose S? = S \ S<sup>∞</sup> and d = ∞. Then, for *opt* = max, the least fixed point of the corresponding Bellman operator satisfies

$$(\mathsf{lfp}\,\Phi)(s) = \mathrm{E}\_{opt}^{M^{(s)}}(G).$$

Again, VI underapproximates this quantity. The same holds for *opt* = min if M does not have end components containing states other than those in G and S<sup>∞</sup>. The corresponding call to Algorithm 1 is GSVI(M, S \ S<sup>∞</sup>, { s ↦ 0 | s ∈ S \ S<sup>∞</sup> } ∪ { s ↦ ∞ | s ∈ S<sup>∞</sup> }, α, *diff*).

#### **3.2 Uniqueness of Fixed Points**

The fixed point of Φ may not be unique for two reasons: states that cannot reach G under the optimal scheduler may take any value (causing fixed points greater than lfp Φ for Pmin and Pmax), and states in end components may take values higher than lfp Φ. The latter affects Pmax (higher fixed points) and Emin (lower fixed points).

*Example 2.* In M<sub>e</sub> of Fig. 1, s<sub>1</sub> and s<sub>2</sub> and the two transitions in-between form an end component. For P<sub>max</sub><sup>M<sub>e</sub></sup>(◇{ s<sub>+</sub> }), v = { s ↦ 1 } is a non-least fixed point for the corresponding Bellman operator; with appropriate values for s<sub>1</sub> and s<sub>2</sub>, we can obtain fixed points with any v(s<sub>0</sub>) > 0.5 of our choice. Similarly, we have E<sub>min</sub><sup>M<sub>e</sub></sup>({ s<sub>+</sub>, s<sub>−</sub> }) = 0.6 (by scheduling b in s<sub>0</sub>), but due to the end component (with only zero-reward transitions by definition), the fixed point is s.t. v(s<sub>0</sub>) = 0.

<sup>1</sup> This can be done via Algs. 2 (for S<sup>1</sup><sub>min</sub>) and 4 (for S<sup>1</sup><sub>max</sub>) of [12], respectively. These algorithms do not consider the probabilities, but only whether there is a transition and branch (with positive probability) from one state to another or not. We thus call them graph-based algorithms, as opposed to numeric algorithms like VI itself.

VI works for Pmin, Pmax, and Emax with multiple fixed points: we seek lfp Φ anyway and start from a (trivial) underapproximation. For Emin, (zero-reward) end components need to be collapsed: we determine the maximal end components using algorithms similar to [15, Alg. 1], then replace each of them by a single state, keeping all transitions leading out of the end component. We refer to this as the *ECC* transformation. However, such end components rarely occur in case studies for Emin since they indicate Zeno behaviour w.r.t. the reward. As rewards are often associated with time progress, such behaviour would be unrealistic.

To make the fixed points unique, for Emax and Emin we fix the values of all states in G to 0. For Pmin, we precompute the set S<sup>0</sup><sub>min</sub> of states that reach G with minimum probability 0 using Alg. 1 of [12], then fix their values to 0. For Pmax, we analogously use S<sup>0</sup><sub>max</sub> via Alg. 3 of [12]. For Pmax and Emin, we additionally need to remove end components via ECC. In contrast to the precomputations, ECC changes the structure of the MDP and is thus more memory-intensive.
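To give an impression of what a graph-based precomputation looks like, the sketch below finds the states whose maximum reachability probability is 0, i.e. the states from which G is unreachable in the underlying graph. It is a plain backward search written for this illustration and does not reproduce the algorithms of [12]; the function name `prob0_max` and the example model are hypothetical. The minimum-probability variant needs a more involved fixed-point computation that is not shown here.

```python
from collections import deque

def prob0_max(mdp, goal):
    """States that cannot reach `goal` under any scheduler.

    `mdp` maps a state to a list of actions; an action is a list of
    (probability, successor) pairs. Only the graph structure is used.
    """
    predecessors = {}
    states = set(mdp) | set(goal)
    for s, actions in mdp.items():
        for action in actions:
            for p, t in action:
                states.add(t)
                if p > 0:
                    predecessors.setdefault(t, set()).add(s)
    can_reach = set(goal)
    queue = deque(goal)
    while queue:  # backward breadth-first search from the goal states
        t = queue.popleft()
        for s in predecessors.get(t, ()):
            if s not in can_reach:
                can_reach.add(s)
                queue.append(s)
    return states - can_reach

EXAMPLE = {
    "s0": [[(0.5, "s1"), (0.5, "goal")]],
    "s1": [[(1.0, "s1")]],                  # trap state
    "s2": [[(1.0, "s1")], [(1.0, "s2")]],   # can only loop or enter the trap
}
zero_states = prob0_max(EXAMPLE, {"goal"})
print(sorted(zero_states))
```

Fixing the value of every state in the returned set to 0 removes the spurious fixed points caused by states that cannot reach G.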

#### **3.3 Convergence**

VI and GSVI will not *reach* a fixed point in general, except for special cases such as acyclic MDP. It is thus standard to use a convergence criterion based on the difference between two consecutive iterations (lines 6 and 8) to make GSVI terminate: we either check the *absolute error*, i.e.

$$\mathit{diff} = \mathit{diff}\_{abs} \stackrel{\text{def}}{=} \lambda \langle v\_{old}, v\_{new} \rangle .\ v\_{new} - v\_{old},$$

or the *relative error*, i.e.

$$\mathit{diff} = \mathit{diff}\_{rel} \stackrel{\text{def}}{=} \lambda \langle v\_{old}, v\_{new} \rangle .\ (v\_{new} - v\_{old}) / v\_{new}.$$

By default, probabilistic model checkers like Prism and Storm use *diff<sub>rel</sub>* and α = 10<sup>−6</sup>. Upon termination of GSVI, v is then closer to the least fixed point, but remains an underapproximation. In particular, α has, in general, no relation to the final difference between v(s<sub>I</sub>) and P<sub>*opt*</sub>(◇G) or E<sub>*opt*</sub>(G), respectively.
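A tiny example makes the gap between α and the actual error concrete. The sketch below assumes a hypothetical Markov chain with a single non-goal state that reaches the goal with probability 0.01 per step and otherwise stays put, so the true reachability probability is 1. The helper `vi_self_loop` is illustrative only.

```python
def diff_abs(old, new):
    return new - old

def diff_rel(old, new):
    return (new - old) / new

def vi_self_loop(p_goal, alpha, diff):
    """Value iteration for one state that moves to the goal with probability
    p_goal and loops otherwise; stops once diff drops below alpha."""
    v = 0.0
    while True:
        v_new = p_goal * 1.0 + (1 - p_goal) * v
        error = diff(v, v_new) if v_new > 0 else 0.0
        v = v_new
        if error < alpha:
            return v

v_abs = vi_self_loop(0.01, 0.05, diff_abs)
v_rel = vi_self_loop(0.01, 0.05, diff_rel)
print(v_abs, v_rel)  # both are far below the true value 1.0
```

With the absolute criterion, iteration stops after the very first step because the change of 0.01 is already below α = 0.05. The relative criterion runs longer but still stops far from 1.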

*Example 3.* Consider MDP M<sub>e</sub> of Fig. 1 again with G = { s<sub>+</sub> }. The first four rows in the body of Table 1 show the values for v after the i-th iteration of the outer loop of a call to GSVI(M<sup>0</sup><sub>e</sub>, { s<sub>0</sub>, s<sub>1</sub>, s<sub>2</sub> }, max, { s<sub>+</sub> ↦ 1 } ∪ { s ↦ 0 | s ≠ s<sub>+</sub> }, 0.05, *diff<sub>abs</sub>*). After the fourth iteration, GSVI terminates since the error is less than α = 0.05; at this point, we have Pmax(◇ s<sub>+</sub>) − v(s<sub>0</sub>) = 0.08 > α.

**Table 2.** Preprocessing requirements of value iteration variants

To obtain a value within a prescribed error ε of the true value, we can compute an upper bound in addition to the lower bound provided by VI. Interval iteration (II) [3,15] does so by performing, in parallel, a second value iteration on a second vector u that starts from a known overapproximation. For probabilities, the vector 1̄ = { s ↦ 1 } is a trivial overapproximation; for rewards, more involved graph-based algorithms need to be used to precompute (a very conservative) one [3]. II terminates when *diff*(v(s<sub>I</sub>), u(s<sub>I</sub>)) ≤ 2ε and returns v<sub>II</sub> = ½(u(s<sub>I</sub>) + v(s<sub>I</sub>)). With v<sub>true</sub> = P<sub>*opt*</sub>(◇G), II thus guarantees that v<sub>II</sub> ∈ [v<sub>true</sub> − ε · v<sub>true</sub>, v<sub>true</sub> + ε · v<sub>true</sub>] and analogously for expected rewards. However, to ensure termination, II requires a unique fixed point: u converges from above to the greatest fixed point gfp Φ, thus for every MDP where *diff*((lfp Φ)(s<sub>I</sub>), (gfp Φ)(s<sub>I</sub>)) > 2ε, II diverges. For Pmax, we have (gfp Φ)(s<sub>ec</sub>) = 1 for all s<sub>ec</sub> in end components, thus II tends to diverge when there is an end component. Sound value iteration (SVI) [31] is similar, but uses a different approach to derive upper bounds that makes it perform better overall, and that eliminates the need to precompute an initial overapproximation for expected rewards. However, SVI still requires unique fixed points.
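The behaviour of interval iteration with and without a unique fixed point can be sketched as follows. This is a simplified illustration for reachability probabilities with the absolute-error criterion, not the algorithm of [3,15]; the function `interval_iteration`, the sweep limit `max_sweeps`, and both models are assumptions made for the example.

```python
def bellman(mdp, vec, s, opt):
    return opt(sum(p * vec[t] for p, t in action) for action in mdp[s])

def interval_iteration(mdp, s_init, fixed, eps, opt=max, max_sweeps=1000):
    """Iterate a lower vector from 0 and an upper vector from 1 side by side.

    Returns the interval centre once the absolute gap at s_init is <= 2*eps,
    or None if that does not happen within max_sweeps sweeps.
    """
    v = {**{s: 0.0 for s in mdp}, **fixed}
    u = {**{s: 1.0 for s in mdp}, **fixed}
    for _ in range(max_sweeps):
        for s in mdp:
            v[s] = bellman(mdp, v, s, opt)
            u[s] = bellman(mdp, u, s, opt)
        if u[s_init] - v[s_init] <= 2 * eps:
            return 0.5 * (u[s_init] + v[s_init])
    return None

FIXED = {"goal": 1.0, "sink": 0.0}

# Unique fixed point: the true probability solves x = 0.5 x + 0.3, i.e. x = 0.6.
DTMC = {"s0": [[(0.5, "s0"), (0.3, "goal"), (0.2, "sink")]]}

# End component: the self-loop action keeps the upper value at 1 forever,
# although the maximum probability is only 0.5.
EC_MDP = {"s0": [[(1.0, "s0")], [(0.5, "goal"), (0.5, "sink")]]}

result_unique = interval_iteration(DTMC, "s0", FIXED, 1e-6)
result_ec = interval_iteration(EC_MDP, "s0", FIXED, 1e-6)
print(result_unique, result_ec)
```

On the second model the upper vector never leaves the greatest fixed point, which is the divergence described above. Collapsing the end component would restore termination.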

We summarise the preprocessing requirements of VI, II, and SVI in Table 2. With unique fixed points, we can transform Pmin into Pmax by making S<sup>0</sup><sub>min</sub> states absorbing and setting G to S<sup>0</sup><sub>min</sub>, and Pmax into Emax by a similar transformation adding reward 1 to entering G. Most of the literature on VI variants works in such a setting and describes the Pmax or Emax case only. Since OVI also works with multiple fixed points, we have to consider all four cases individually.

#### **4 Optimistic Value Iteration**

We now present a new, practical solution to the convergence problem for unbounded reachability and expected rewards. It exploits the empirical observation that on many case studies VI delivers results which are roughly α-close to the true value—it only lacks the ability to prove it. Our approach, *optimistic value iteration* (OVI), extends standard VI with the ability to deliver such a proof.

The key idea is to exploit a property of the Bellman operator Φ and its Gauss-Seidel variant as in Algorithm 1 to determine whether a candidate vector is a lower bound, an upper bound, or neither. The foundation is basic domain theory: by Scott-continuity of Φ it follows that Φ is monotonic, meaning v ⪯ w implies Φ(v) ⪯ Φ(w). A principle called *Park induction* [29] for monotonic self-maps on complete lattices yields the following induction rules: For any u ∈ V,

     1  function OVI(M = ⟨S, s_I, T⟩, S?, v, ε, α, diff)
     2    GSVI(M, S?, v, α, diff)                          // perform standard value iteration
     3    u := { s ↦ diff⁺(s) | s ∈ S? }, viters := 0      // guess candidate upper bound
     4    while viters < 1/α do                            // start verification phase
     5      up_∀ := true, down_∀ := true, viters := viters + 1, error := 0
     6      foreach s ∈ S? do
     7        v_new := Φ(v)(s), u_new := Φ(u)(s)           // iterate both bounds
     8        if v_new > 0 then error := max { error, diff(v(s), v_new) }
     9        if u_new < u(s) then                         // upper value decreased:
    10          u(s) := u_new, up_∀ := false               // update u with new lower u_new
    11        else if u_new > u(s) then                    // upper value increased:
    12          down_∀ := false                            // discard new higher u_new
    13        v(s) := v_new                                // update v with new value v_new
    14        if v(s) > u(s) then goto line 17             // lower bound crossed u
    15      if down_∀ then return ½ (u(s_I) + v(s_I))      // u is inductive upper bound
    16      else if up_∀ then goto line 17                 // u is inductive lower bound
    17    return OVI(M, S?, v, ε, error, diff)             // retry with reduced α

**Algorithm 2.** Optimistic value iteration

$$
\Phi(u) \preceq u \qquad \text{implies} \qquad \mathsf{lfp} \Phi \preceq u. \tag{2}
$$

$$\text{and} \qquad u \preceq \Phi(u) \qquad \text{implies} \qquad u \preceq \mathbf{gfp} \,\Phi. \tag{3}$$
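Rules (2) and (3) can be checked mechanically by applying Φ once and comparing the result with the candidate componentwise. The sketch below does this for reachability probabilities on a hypothetical one-state Markov chain whose least fixed point is 0.6; the function names `phi` and `classify` are illustrative.

```python
def phi(mdp, vec, opt=max):
    """Apply the Bellman operator once to all states of `mdp`."""
    new = dict(vec)
    for s, actions in mdp.items():
        new[s] = opt(sum(p * vec[t] for p, t in action) for action in actions)
    return new

def classify(mdp, u, opt=max):
    """Compare Phi(u) with u componentwise."""
    image = phi(mdp, u, opt)
    if all(image[s] <= u[s] for s in mdp):
        return "inductive upper bound"   # rule (2): lfp Phi is below u
    if all(u[s] <= image[s] for s in mdp):
        return "inductive lower bound"   # rule (3): u is below gfp Phi
    return "unknown"

# Hypothetical chain: the value of s0 solves x = 0.5 x + 0.3, i.e. x = 0.6.
DTMC = {"s0": [[(0.5, "s0"), (0.3, "goal"), (0.2, "sink")]]}

def candidate(x):
    return {"s0": x, "goal": 1.0, "sink": 0.0}

print(classify(DTMC, candidate(0.7)))  # one step maps 0.7 to about 0.65
print(classify(DTMC, candidate(0.5)))  # one step maps 0.5 to about 0.55
```

Note that a single application of Φ suffices for the certificate; no convergence of the candidate is required.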

Thus, if we can construct a candidate vector u s.t. Φ(u) ⪯ u, then u is in fact an upper bound on the sought-after lfp Φ. We call such a u an *inductive upper bound*. Optimistic value iteration uses this insight and can be summarised as follows:


The resulting procedure in more detail is shown as Algorithm 2. Starting from the same initial vectors v as for VI, we first perform standard Gauss-Seidel value iteration (in line 2). We refer to this as the *iteration phase* of OVI. After that, vector v is an improved underapproximation of the actual probabilities or reward values. We then "guess" a vector u of *upper values* from the *lower values* in v (line 3). The guessing heuristic depends on *diff*: if *diff* = *diff<sub>abs</sub>*, then we use

$$\mathit{diff}^+(s) = \begin{cases} 0 & \text{if } v(s) = 0\\ v(s) + \epsilon & \text{otherwise} \end{cases}$$

if *diff* = *diff<sub>rel</sub>*, then

$$\mathit{diff}^+(s) = v(s) \cdot (1 + \epsilon).$$

We cap the result at 1 for Pmin and Pmax. These heuristics have three important properties: (**H1**) v(s) = 0 implies *diff*<sup>+</sup>(s) = 0, (**H2**) *diff*(v(s), *diff*<sup>+</sup>(s)) ≤ 2ε, and (**H3**) *diff*(v(s), *diff*<sup>+</sup>(s)) > 0 unless v(s) = 0 or v(s) = 1 for Pmin and Pmax.
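The two guessing rules and their properties translate directly into code. The following sketch uses the hypothetical helper names `guess_abs` and `guess_rel`, with an optional `cap` argument standing in for the cap at 1 that applies to probabilities.

```python
def diff_abs(old, new):
    return new - old

def diff_rel(old, new):
    return (new - old) / new

def guess_abs(v, eps, cap=None):
    """Upper-value guess for the absolute-error criterion."""
    u = 0.0 if v == 0 else v + eps
    return u if cap is None else min(u, cap)

def guess_rel(v, eps, cap=None):
    """Upper-value guess for the relative-error criterion."""
    u = v * (1 + eps)
    return u if cap is None else min(u, cap)

eps = 0.01
for v in (0.3, 0.5, 2.5):
    # H2 and H3 for nonzero, uncapped values
    assert 0 < diff_abs(v, guess_abs(v, eps)) <= 2 * eps
    assert 0 < diff_rel(v, guess_rel(v, eps)) <= 2 * eps
print(guess_abs(0.0, eps), guess_rel(0.0, eps), guess_abs(0.995, eps, cap=1.0))
```

For the relative rule, the difference works out to ε/(1 + ε), which is why H2 holds with room to spare.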

Then the *verification phase* starts in line 4: we perform value iteration on the lower values v and upper values u at the same time, keeping track of the direction in which the upper values move. For u, line 7 and the conditions around line 10 mean that we actually use operator Φmin(u) = λ s. min(Φ(u)(s), u(s)). This may shorten the verification phases, and is crucial for our termination argument. A state s is *blocked* if Φ(u)(s) > Φmin(u)(s) and *unblocked* if Φ(u)(s) < u(s) here.

If, in some iteration, no state was blocked (line 15), then we had Φ(u) ⪯ u before the start of the iteration. We thus know by Eq. 2 that the current u is an inductive upper bound for the values of all states, and the true value must be in the interval [v(s<sub>I</sub>), u(s<sub>I</sub>)]. By property H2, our use of Φmin for u, and the monotonicity of Φ as used on v, we also know that *diff*(v(s<sub>I</sub>), u(s<sub>I</sub>)) ≤ 2ε, so we immediately terminate and return the interval's centre v<sub>I</sub> = ½(u(s<sub>I</sub>) + v(s<sub>I</sub>)). The true value v<sub>true</sub> = (lfp Φ)(s<sub>I</sub>) must then be in [v<sub>I</sub> − ε · v<sub>true</sub>, v<sub>I</sub> + ε · v<sub>true</sub>].

If, in some iteration, no state was unblocked (line 16), then again by Park induction we know that u ⪯ gfp Φ. If we are in a situation of unique fixed points, this also means u ⪯ lfp Φ, thus the current u is no upper bound: we cancel verification and go back to the iteration phase to further improve v before trying again. We do the same if v crosses u: then u(s) < v(s) ≤ (lfp Φ)(s) for some s, so this u was just another bad guess, too.

Otherwise, we do not yet know the relationship between u and lfp Φ, so we remain in the verification phase until we encounter one of the cases above, or until we exceed the verification budget of 1/α iterations (as checked by the loop condition in line 4). This budget is a technical measure to ensure termination.
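The interplay of iteration phase, guessing, and verification phase can be sketched in Python as follows. This is an illustrative reading of the procedure for reachability probabilities with the absolute-error criterion, not the mcsta implementation: the helper names, the loop-based (rather than recursive) retry, and in particular the rule used to shrink α on a retry are assumptions made for this sketch.

```python
import math

def phi(mdp, vec, s, opt):
    """Bellman update for state s (reachability probabilities)."""
    return opt(sum(p * vec[t] for p, t in action) for action in mdp[s])

def gsvi(mdp, v, alpha, opt):
    """Iteration phase: in-place sweeps until the largest change is < alpha."""
    while True:
        error = 0.0
        for s in mdp:
            v_new = phi(mdp, v, s, opt)
            if v_new > 0:
                error = max(error, v_new - v[s])
            v[s] = v_new
        if error < alpha:
            return

def verify(mdp, v, u, alpha, opt):
    """Verification phase; returns (u is an inductive upper bound, last error)."""
    error = 0.0
    for _ in range(math.ceil(1 / alpha)):      # verification budget
        up_all, down_all, error = True, True, 0.0
        for s in mdp:
            v_new, u_new = phi(mdp, v, s, opt), phi(mdp, u, s, opt)
            if v_new > 0:
                error = max(error, v_new - v[s])
            if u_new < u[s]:
                u[s], up_all = u_new, False    # keep the lower upper value
            elif u_new > u[s]:
                down_all = False               # s is blocked; discard u_new
            v[s] = v_new
            if v[s] > u[s]:
                return False, error            # lower bound crossed the guess
        if down_all:
            return True, error                 # no state blocked
        if up_all:
            return False, error                # guess is an inductive lower bound
    return False, error                        # budget exhausted

def ovi(mdp, s_init, v, eps, alpha, opt=max):
    while True:
        gsvi(mdp, v, alpha, opt)
        u = dict(v)
        for s in mdp:                          # absolute-error guess, capped at 1
            u[s] = 0.0 if v[s] == 0 else min(1.0, v[s] + eps)
        ok, error = verify(mdp, v, u, alpha, opt)
        if ok:
            return 0.5 * (u[s_init] + v[s_init])
        # Assumption of this sketch: shrink alpha to at most half its old value.
        alpha = min(alpha, error) / 2 if error > 0 else alpha / 2

DTMC = {"s0": [[(0.5, "s0"), (0.3, "goal"), (0.2, "sink")]]}
EC_MDP = {"s0": [[(1.0, "s0")], [(0.5, "goal"), (0.5, "sink")]]}
start = {"s0": 0.0, "goal": 1.0, "sink": 0.0}
res_dtmc = ovi(DTMC, "s0", dict(start), 1e-3, 1e-3)
res_ec = ovi(EC_MDP, "s0", dict(start), 1e-3, 1e-3)
print(res_dtmc, res_ec)
```

The second model contains an uncollapsed end component with true maximum probability 0.5. The guessed upper value is not increased by the self-loop action, so the first verification sweep already finds no blocked state.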

*Optimisation.* In case the fixed point of Φ is *unique*, by Park induction (via Eq. 3) we know that u ⪯ Φ(u) implies that u is a lower bound on lfp Φ. In such situations of single fixed points, we can, as an optimisation, additionally replace v by u before the *goto* in line 16.

*Heuristics.* OVI relies on heuristics to gain an advantage over alternative methods such as II or SVI; it cannot be better on *all* MDP. Concretely, we can choose


Algorithm 2 shows the choices made by our implementation. We employ the standard stopping criteria used by probabilistic model checkers for VI, and the "weakest" guessing heuristics that satisfies properties H1, H2, and H3 (i.e. guessing any higher values would violate one of these properties). The only arbitrary choice is how to reduce α, which we at least halve on every retry. We experimentally found this to be a good compromise on benchmarks that we consider in Sect. 5, where


*Example 4.* We now use the version of Φ to compute Pmax and call

OVI(M<sup>0</sup><sub>e</sub>, { s<sub>0</sub>, s<sub>1</sub>, s<sub>2</sub> }, { s<sub>+</sub> ↦ 1 } ∪ { s ↦ 0 | s ≠ s<sub>+</sub> }, 0.05, 0.05, *diff<sub>abs</sub>*).

Table 1 shows the values in v and u during this run, assuming that we use non-Gauss-Seidel iterations. The first iteration phase lasts from i = 0 to 4. At this point, u is initialised with the values shown in italics. The first verification phase needs only one iteration to realise that u is actually a lower bound (to a fixed point which is not the least fixed point, due to the uncollapsed end component). Blocked states are marked with a bar; unblocked states have a lower u-value than in the previous iteration. We resume GSVI from i = 6. The error in GSVI is again below α, which had been reduced to 0.008, during iteration i = 9. We thus start another verification phase, which immediately (in one iteration) finds the newly guessed vector u to be an upper bound, with *diff*(v(s<sub>0</sub>), u(s<sub>0</sub>)) < 2ε.

#### **4.1 Termination of OVI**

We showed above that OVI returns an ε-correct result when it terminates. We now show that it terminates in all cases except for Pmax with multiple fixed points. Note that this is a stronger result than what II and SVI can achieve.

Let us first consider the situations where lfp Φ is the unique fixed point of Φ. First, GSVI terminates by Eq. 1. Let us now write v<sub>i</sub> and u<sub>i</sub> for the vectors v and u as they are at the beginning of verification phase iteration i. We know that v<sub>0</sub> ⪯ u<sub>0</sub>. We distinguish three cases relating the initial guess u<sub>0</sub> to lfp Φ.

1. u<sub>0</sub> ∼ lfp Φ or u<sub>0</sub> ≺ lfp Φ, i.e. there is a state s with u<sub>0</sub>(s) < (lfp Φ)(s). Since we use Φmin on the upper values, it follows u<sub>i</sub>(s) ≤ u<sub>0</sub>(s) < (lfp Φ)(s) for all i. By Eq. 1, there must thus be a j such that v<sub>j</sub>(s) > u<sub>j</sub>(s), triggering a retry with reduced α in line 14. Such a retry could also be triggered earlier in line 16. Due to the reduction of α and Eq. 1, every call to GSVI will further increase some values in v or reach v = lfp Φ (in special cases), and for some subsequent guess u we must have u<sub>0</sub>(s) < u(s). Consequently, after some repetitions of this case 1, we must eventually guess a u with lfp Φ ⪯ u.

**Fig. 2.** DTMC M<sub>d</sub>

**Table 3.** Nontermination of OVI on M′<sub>e</sub> without ECC

2. lfp Φ ≺ u<sub>0</sub>. Observe that operators Φ and Φmin are *local* [9], i.e. a state's value can only change if a direct successor's value changes. In particular, a state's value can only decrease (increase) if a direct successor's value decreases (increases). If u<sub>i</sub>(s) < u<sub>i−1</sub>(s), then s cannot be blocked again in any later iteration j > i: for it to become blocked, a successor's upper value would have to increase, but Φmin ensures non-increasing upper values for all states. Analogously to Eq. 1, we know that [3, Lemma 3.3 (c)]

$$\text{lfp } \Phi \preceq u \quad \text{implies} \quad \lim\_{n \to \infty} \Phi\_{\text{min}}^n(u) = \text{lfp } \Phi$$

(for the unique fixpoint case, since [3] assumes contracting MDP as usual). Thus, for all states s, there must be an i such that u<sub>i</sub>(s) < u<sub>i−1</sub>(s); in consequence, there is also an iteration j where no state is blocked any more. Then the condition in line 15 will be true and OVI terminates.

3. lfp Φ ⪯ u<sub>0</sub> but not lfp Φ ≺ u<sub>0</sub>, i.e. there is a state s with u<sub>0</sub>(s) = (lfp Φ)(s). If there is an i where no state, including s, is blocked, then OVI terminates as above. For Pmin and Pmax, if u<sub>0</sub>(s) = 1, s cannot be blocked, so we can w.l.o.g. exclude such s. For other s not to be blocked in iteration i, we must have u<sub>i</sub>(s′) = (lfp Φ)(s′) for all states s′ reachable from s under the optimal scheduler, i.e. all of those states must *reach* the fixed point. This cannot be guaranteed on general MDP. Since this case is a very particular situation unlikely to be encountered in practice with our heuristics, OVI adopts a pragmatic solution: it bounds the number of iterations in every verification phase (cf. line 4). Due to property H3 of our heuristics, u<sub>0</sub>(s) = (lfp Φ)(s) requires v<sub>0</sub>(s) < (lfp Φ)(s), thus some subsequent guess u will have u(s) > u<sub>0</sub>(s), and eventually we must get a u with lfp Φ ≺ u, which is case 2. Since we strictly increase the iteration bound on every retry, we will eventually encounter case 2 with a sufficiently high bound for termination.

Three of the four situations with multiple fixed points reduce to the corresponding unique fixed point situation due to property H1 of our guessing heuristics:

1. For Pmin, recall from Sect. 3.2 that the fixed point is unique if we fix the values of all S<sup>0</sup><sub>min</sub> states to 0. In OVI without preprocessing, such states are in S?, thus they initially have value 0. Φ will not increase their values, neither will guessing due to H1, and neither will Φmin. Thus OVI here operates on a sublattice of V, where the fixed point of Φ is unique.


The only case where OVI may not terminate is for Pmax without ECC. Here, end components may cause states to be permanently blocked. However, we did not encounter this on any benchmark used in Sect. 5, so in contrast to e.g. II, OVI is still *practically* useful in this case despite the lack of a termination guarantee.

*Example 5.* We turn M<sub>e</sub> of Fig. 1 into M′<sub>e</sub> by replacing the c-labelled transition from s<sub>2</sub> by transition { ⟨0, s<sub>2</sub>⟩ ↦ ½, ⟨0, s<sub>+</sub>⟩ ↦ ¼, ⟨1, s<sub>−</sub>⟩ ↦ ¼ }, i.e. we can now go from s<sub>2</sub> back to s<sub>2</sub> with probability ½ and to each of s<sub>+</sub>, s<sub>−</sub> with probability ¼. The probability-1 transition from s<sub>2</sub> to s<sub>1</sub> remains. Then Table 3 shows a run of OVI for Pmax with *diff<sub>abs</sub>* and α = 0.1. s<sub>0</sub> is forever blocked from iteration 6 on.

#### **4.2 Variants of OVI**

While the core idea of OVI rests on classic results from domain theory, Algorithm 2 includes several particular choices that work together to achieve good performance and ensure termination. We sketch two variants to motivate these choices.

First, let us use Φ instead of Φmin for the upper values, i.e. move the assignment u(s) := u<sub>new</sub> down into line 13. Then we cannot prove termination because the arguments of case 2 for lfp Φ ≺ u<sub>0</sub> no longer hold. Consider DTMC M<sub>d</sub> of Fig. 2 and Pmax(◇ s<sub>+</sub>) = Pmin(◇ s<sub>+</sub>). Let

$$u = \{\, s\_0 \mapsto 0.2,\ s\_1 \mapsto 1,\ s\_+ \mapsto 1,\ s\_- \mapsto 0 \,\} \succ \{\, s\_0 \mapsto \tfrac{1}{9},\ s\_1 \mapsto \tfrac{1}{9},\ \dots \,\} = \mathsf{lfp}\,\Phi.$$

Iterating Φ, we then get the following sequence of pairs ⟨u(s<sub>0</sub>), u(s<sub>1</sub>)⟩:

$$
\langle 0.2, 1 \rangle, \langle 1, 0.12 \rangle, \langle 0.12, 0.2 \rangle, \langle 0.2, 0.112 \rangle, \langle 0.112, 0.12 \rangle, \langle 0.12, 0.1112 \rangle, \dots
$$

Observe how the value of s<sub>0</sub> increases iff s<sub>1</sub> decreases and vice-versa. Thus we never encounter an inductive upper or lower bound. In Algorithm 2, we use Gauss-Seidel VI, which would not show the same effect on this model; however, if we insert another state between s<sub>0</sub> and s<sub>1</sub> that is updated last, Algorithm 2 would behave in the same alternating way. This particular u is contrived, but we could have guessed one with a similar relationship of the values leading to similar behaviour.
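The alternating behaviour can be reproduced numerically. Since Fig. 2 is not reproduced here, the sketch below assumes update rules inferred from the printed sequence and from lfp Φ = 1/9: the new value of s<sub>0</sub> is the old value of s<sub>1</sub>, and the new value of s<sub>1</sub> is 0.1 times the old value of s<sub>0</sub> plus 0.1. Updates are simultaneous (non-Gauss-Seidel), and all function names are illustrative.

```python
def phi(u):
    """Simultaneous update of the pair (u(s0), u(s1)); rules are inferred."""
    u0, u1 = u
    return (u1, 0.1 * u0 + 0.1)

def phi_min(u):
    """Like phi, but never lets a value increase."""
    return tuple(min(new, old) for new, old in zip(phi(u), u))

def is_inductive_upper(u):
    return all(new <= old for new, old in zip(phi(u), u))

def is_inductive_lower(u):
    return all(new >= old for new, old in zip(phi(u), u))

# Plain Phi: the two components keep moving in opposite directions.
u = (0.2, 1.0)
trace_phi = []
for _ in range(10):
    trace_phi.append((u, is_inductive_upper(u) or is_inductive_lower(u)))
    u = phi(u)

# Phi_min: count the steps until the vector is an inductive upper bound.
w = (0.2, 1.0)
steps_phi_min = 0
while not is_inductive_upper(w):
    w = phi_min(w)
    steps_phi_min += 1

print([pair for pair, _ in trace_phi[:6]])
print(steps_phi_min, w)
```

Under these assumptions, none of the first ten vectors produced by Φ is an inductive bound, whereas Φmin yields an inductive upper bound after a single step.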

An alternative that allows us to use Φ instead of Φmin is to change the conditions that lead to retrying and termination: we separately store the initial guess of a verification phase as u<sub>0</sub>, and then compare each newly calculated u with u<sub>0</sub>. If u ⪯ u<sub>0</sub>, then we know that there is an i such that u = Φ<sup>i</sup>(u<sub>0</sub>) ⪯ u<sub>0</sub>. Φ<sup>i</sup> retains all properties of Φ needed for Park induction, so this would also be a proof of lfp Φ ⪯ u. The other conditions and the termination proofs can be adapted analogously. However, this variant needs ≈50 % more memory (to store an additional vector of values), and we found it to be significantly slower than Algorithm 2 and the first variant on almost all benchmark instances of Sect. 5.

#### **5 Experimental Evaluation**

We have implemented interval iteration (II) (using the "variant 2" approach of [3] to compute initial overapproximations for expected rewards), sound value iteration (SVI), and now optimistic value iteration (OVI) precisely as described in the previous section, in the mcsta model checker of the Modest Toolset [20], which is publicly available at modestchecker.net. It is cross-platform, implemented in C#, and built around the Modest [17] high-level modelling language. Via support for the Jani format [8], mcsta can exchange models with other tools like Epmc [18] and Storm [10]. Its performance is competitive with Storm and Prism [16]. We tried to spend equal effort performance-tuning our VI, II, SVI, and OVI implementations to avoid unfairly comparing highly-optimised OVI code with naïve implementations of the competing algorithms.

In the following, we report on our experimental evaluation of OVI using mcsta on all applicable models of the Quantitative Verification Benchmark Set (QVBS) [21]. All models in the QVBS are available in Jani and can thus be used by mcsta. Most are parameterised, and come with multiple properties of different types. Aside from MDP models, the QVBS also includes DTMCs (which are a special case of MDP), continuous-time Markov chains (CTMC, for which the analysis of unbounded properties reduces to checking the embedded DTMC), Markov automata (MA [11], on which the embedded MDP suffices for unbounded properties), and probabilistic timed automata (PTA [26], some of which can be converted into MDP via the digital clocks semantics [25]). We use all of these model types. The QVBS thus gives rise to a large number of benchmark *instances*: combinations of a model, a parameter valuation, and a property to check. For every model, we chose one instance per probabilistic reachability and expected-reward property such that state space exploration did not run out of memory and VI took at least 10 s where possible. We only excluded


**Fig. 3.** OVI runtime and iteration count compared to VI (probabilistic reachability)

As a result, we considered 38 instances with probabilistic reachability and 41 instances with expected-reward properties, many comprising several million states.

We ran all experiments on an Intel Core i7-4790 workstation (3.6–4.0 GHz) with 8 GB of memory and 64-bit Ubuntu Linux 18.04. By default, we request a relative half-width of ε = 10<sup>−6</sup> for the result probability or reward value, and configure OVI to use the relative-error criterion with α = 10<sup>−6</sup> in the iteration phase. We use a 600 s timeout ("TO"). Due to the number of instances, we show most results as scatter plots like in Fig. 3. Each such plot compares two methods in terms of runtime or number of iterations. Every point ⟨x, y⟩ corresponds to an instance and indicates that the method noted on the x-axis took x seconds or iterations to solve this instance while the method noted on the y-axis took y seconds or iterations. Thus points above the solid diagonal line correspond to instances where the x-axis method was faster (or needed fewer iterations); points above (below) the upper (lower) dotted diagonal line are where the x-axis method took less than half (more than twice) as long or as many iterations.

#### **5.1 Comparison with VI**

All methods except VI delivered correct results up to ε. VI offers low runtime at the cost of occasional incorrect results, and in general the absence of any guarantee about the result. We thus compare with VI separately to judge the overhead caused by performing additional verification, and possibly iteration, phases. This is similar to the comparison done for II in [3]. Figures 3 and 4 show the results. The unfilled shapes indicate instances where VI produced an incorrect result. In terms of runtime, we see that OVI does not often take more than twice as long as VI, and frequently requires less than 50% extra time. On several instances where OVI incurs most overhead, VI produces an incorrect result, indicating that they are "hard" instances for value iteration. The unfilled CTMCs where OVI takes much longer to compute probabilities are all instances of the *embedded* model; the DTMC on the x-axis is *haddad-monmege*, an adversarial model built to highlight the convergence problem of VI in [14]. The problematic cases for expected rewards include most MA instances, the two expected-reward instances of the *embedded* CTMC, and again *haddad-monmege*. In terms of iterations, the overhead of OVI is even less than in runtime.

**Fig. 4.** OVI runtime and iteration count compared to VI (expected rewards)

#### **5.2 Comparison with II and SVI**

We compare the runtime of OVI with the runtime of II and that of SVI separately for reachability probabilities (shown in Fig. 5) and expected rewards (shown in Fig. 6). As shown in Table 2, OVI has almost the same requirements on precomputations as VI, while II and SVI require extra precomputations and ECC for reachability probabilities. The precomputations and ECC need extra runtime (which turned out to be negligible in some cases but significant enough to cause a timeout in others) prior to the numeric iterations. However, doing the precomputations can reduce the size of the set S?, and ECC can reduce the size of the MDP itself. Both can thus reduce the runtime needed for the numeric iterations. For the overall runtime, we found that none of these effects dominates the other over all models. Thus sometimes it may be better to perform only the required precomputations and transformations, while on other models performing all applicable ones may lead to lower total runtime. For reachability probabilities, we thus compare OVI, II, and SVI in two scenarios: once in the default ("std") setting of mcsta that uses only required preprocessing steps (without ECC for OVI; we report the total runtime for preprocessing and iterations), and once with all of them enabled ("pre", where we report only the runtime for numeric iterations, plus the computation of initial upper bounds in case of II).

**Fig. 5.** OVI runtime compared to II and SVI (probabilities)

For probabilistic reachability, we see in Fig. 5 that there is no clear winner among the three methods in the "std" setting (top plots). In some cases, the extra precomputations take long enough to give an advantage to OVI, while in others they speed up II and SVI significantly, compensating for their overhead. The "pre" setting (bottom), in which all three algorithms operate on exactly the same input w.r.t. MDP M and set S?, however, shows a clearer picture: now OVI is faster, sometimes significantly so, than II and SVI on most instances.

**Fig. 6.** OVI runtime compared to II and SVI (expected rewards)

Expected-reward properties were more challenging for all three methods (as well as for VI, which produced more errors here than for probabilities). The plots in Fig. 6 paint a very clear picture of OVI being significantly faster for expected rewards than II (which suffers from the need to precompute initial upper bounds that then turn out to be rather conservative), and faster (though by a lesser margin and with few exceptions) than SVI.

In Fig. 7, we give a summary view combining the data from Figs. 3 to 6. For each algorithm, we plot the instances sorted by runtime, i.e. a point ⟨x, y⟩ on the line for algorithm z means that some instance took y seconds to solve via z, and there are x instances that z solves in less time. Note in particular that the times are *not* cumulative. The right-hand plot zooms into the left-hand one. We clearly see the speedup offered by OVI over SVI and especially II. Where the scatter plots merely show that OVI often does not obtain more than a 2× speedup compared to SVI, these plots provide an explanation: the VI line is a rough bound on the performance that any *extension* of VI can deliver. Comparing the SVI and VI lines, over much of the plot's range, OVI thus cannot take less than half the runtime of SVI without outperforming VI itself.

**Fig. 7.** Summary comparison to VI, II, and SVI, instances ordered by runtime

**Fig. 8.** Influence of ε/α on runtime (expected rewards, relative error)

**Fig. 9.** Runtime comparison with absolute error (expected rewards)

#### **5.3 On the Effect of ε and α**

We also compared the four algorithms for different values of ε and, where applicable, α. We show a selection of the results in Fig. 8. The axis labels are of the form "algorithm, ε/α". On the left, we see that the runtime of OVI changes if we set α to values different from ε; however, there is no clear trend: some instances are checked faster, some slower. We obtained similar plots for other combinations of α values, with only a slight tendency towards longer runtimes as α > ε. mcsta thus uses α = ε as a default that can be changed by the user.

In the middle, we study the impact of reducing the desired precision by setting ε to 10<sup>−3</sup>. This allows OVI to speed up by factors mostly between 1 and 2; the same comparison for SVI and II resulted in similar plots, however VI was able to more consistently achieve higher speedups. When we compare the right plot with the right-hand plot of Fig. 6, we consequently see that the overall result of our comparison between OVI and SVI does not change significantly with the lower precision, although OVI does gain slightly more than SVI.

#### **5.4 Comparing Relative and Absolute Error**

In Fig. 9, we show comparison plots for the runtime when using *diff<sub>abs</sub>* instead of *diff<sub>rel</sub>*. Requiring absolute-error-correct results may make instances with low result values much easier and instances with high results much harder. We chose ε = 10<sup>−2</sup> as a compromise, and the leftmost plot confirms that we indeed chose an ε that keeps the expected-reward benchmarks on average roughly as hard as with 10<sup>−6</sup> relative error. In the middle and right plots, we again see OVI compared with II and SVI. Compared to Fig. 6, both II and SVI gain a little, but there are no significant differences overall. Our experiments thus confirm that the relative performance of OVI is stable under varying precision requirements.

#### **5.5 Verification Phases**

On the right, we show histograms of the number of verification phases started (top, from 1 phase on the left to 20 on the right) and the percentage of iterations that are done in verification phases (bottom) over all benchmark instances (probabilities and rewards). We see that, in the vast majority of cases, we need few verification attempts, with many succeeding in the first attempt, and most iterations are performed in the iteration phases.

#### **6 Conclusion**

We have presented *optimistic value iteration* (OVI), a new approach to making non-exact probabilistic model checking via iterative numeric algorithms sound in the sense of delivering results within a prescribed interval around the true value (modulo floating-point and implementation errors). Compared to interval (II) and sound value iteration (SVI), OVI has slightly stronger termination guarantees in presence of multiple fixed points, and works in practice for max. probabilities without collapsing end components despite the lack of a guarantee. Like II, it can be combined with alternative methods for dealing with end components such as the new *deflating* technique of [23]. OVI is a *simple* algorithm that is *easy* to add to any tool that already implements value iteration, and it is *fast*, further closing the performance gap between VI and sound methods.

**Acknowledgments.** The authors thank Tim Quatmann (RWTH Aachen) for fruitful discussions when the idea of OVI initially came up in late 2018, and for his help in implementing and optimising the SVI implementation in mcsta.

**Data Availability.** A dataset to replicate our experimental evaluation is archived and available at DOI 10.4121/uuid:3df859e6-edc6-4e2d-92f3-93e478bbe8dc [19].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## PrIC3: Property Directed Reachability for MDPs

Kevin Batz<sup>1(✉)</sup>, Sebastian Junges<sup>2</sup>, Benjamin Lucien Kaminski<sup>3</sup>, Joost-Pieter Katoen<sup>1</sup>, Christoph Matheja<sup>4</sup>, and Philipp Schröer<sup>1</sup>

	- <sup>2</sup> University of California, Berkeley, USA
	- <sup>3</sup> University College London, London, UK
	- <sup>4</sup> ETH Zürich, Zürich, Switzerland

Abstract. IC3 has been a leap forward in symbolic model checking. This paper proposes PrIC3 (pronounced pricy-three), a conservative extension of IC3 to symbolic model checking of MDPs. Our main focus is to develop the theory underlying PrIC3. Alongside, we present a first implementation of PrIC3 including the key ingredients from IC3 such as generalization, repushing, and propagation.

#### 1 Introduction

IC3. Also known as property-directed reachability (PDR) [23], IC3 [13] is a symbolic approach for verifying finite transition systems (TSs) against safety properties like "*bad states are unreachable*". It combines bounded model checking (BMC) [12] and inductive invariant generation. Put shortly, IC3 either proves that a set B of bad states is *un*reachable by finding a set of non-B states closed under reachability (called an *inductive invariant*) or refutes unreachability of B by a *counterexample* path reaching B. Rather than unrolling the transition relation (as in BMC), IC3 attempts to incrementally strengthen the invariant "no state in B is reachable" into an inductive one. In addition, it applies aggressive abstraction to the explored state space, so-called generalization [36]. These aspects together with the enormous advances in modern SAT solvers have led to IC3's success. IC3 has been extended [27,38] and adapted to software verification [19,44]. This paper develops a *quantitative* IC3 framework for probabilistic models.

*MDPs.* Markov decision processes (MDPs) extend TSs with discrete probabilistic choices. They are central in planning and AI as well as in modeling randomized distributed algorithms. A key question in verifying MDPs is *quantitative* reachability: "*is the (maximal) probability to reach* B *at most* λ*?*". Quantitative reachability [5,6] reduces to solving linear programs (LPs). Various tools support MDP model checking, e.g., Prism [43], Storm [22], modest [34], and EPMC [31]. The LPs are mostly solved using (variants of) value iteration [8,28,35,51]. Symbolic BDD-based MDP model checking originated two decades ago [4] and is rather successful.

This work has been supported by the ERC Advanced Grant 787914 (FRAPPANT), NSF grants 1545126 (VeHICaL) and 1646208, the DARPA Assured Autonomy program, Berkeley Deep Drive, and by Toyota under the iCyPhy center.

© The Author(s) 2020

S. K. Lahiri and C. Wang (Eds.): CAV 2020, LNCS 12225, pp. 512–538, 2020. https://doi.org/10.1007/978-3-030-53291-8\_27

*Towards* IC3 *for MDPs.* Despite the success of BDD-based symbolic methods in tools like Prism, IC3 has not penetrated probabilistic model checking yet. The success of IC3 and the importance of quantitative reachability in probabilistic model checking raise the question *whether and how* IC3 *can be adapted (not just utilized) to reason about quantitative reachability in MDPs*. This paper addresses the challenges of answering this question. It extends IC3 in several dimensions to overcome these hurdles, making PrIC3, to our knowledge, *the first* IC3 *framework for quantitative reachability in MDPs*<sup>1</sup>. Notably, PrIC3 is conservative: For a threshold λ = 0, PrIC3 solves the same qualitative problem *and behaves (almost) the same as standard* IC3. Our main contribution is developing the theory underlying PrIC3, which is accompanied by a proof-of-concept implementation.

*Challenge 1 (Leaving the Boolean domain).* IC3 iteratively computes *frames*, which are over-approximations of sets of states that can reach B in a bounded number of steps. For MDPs, Boolean reachability becomes a *quantitative reachability probability*. This requires a shift: frames become real-valued functions rather than sets of states. Thus, there are infinitely many possible frames, even for finite-state MDPs, just as for infinite-state software [19,44] and hybrid systems [54]. Additionally, whereas in TSs a state reachable within k steps remains reachable on increasing k, the reachability probability in MDPs may increase. This complicates ensuring termination of an IC3 algorithm for MDPs.

*Challenge 2 (Counterexamples* ≠ *single paths).* For TSs, a single cycle-free path<sup>2</sup> to B suffices to refute that "B *is not reachable*". This is not true in the probabilistic setting [32]. Instead, proving that the probability of reaching B exceeds the threshold λ requires *a set of possibly cyclic paths* (e.g., represented as a sub-MDP [15]) whose probability mass exceeds λ. Handling sets of paths as counterexamples in the context of IC3 is new.

*Challenge 3 (Strengthening).* This key IC3 technique intuitively turns a proof obligation of type (i) "state s is unreachable from the initial state s<sub>I</sub>" into type (ii) "s's *predecessors* are unreachable from s<sub>I</sub>". A first issue is that in the quantitative setting, the standard characterization of reachability probabilities in MDPs (the Bellman equations) inherently *reverses* the direction of reasoning (cf. "reverse" IC3 [53]). Hence, strengthening turns (i) "s cannot reach B" into (ii) "s's *successors* cannot reach B".

A much more challenging issue, however, is that in the quantitative setting obligations of type (i) read "s is reachable *with at most probability* δ". The strengthened type (ii) obligation must then read: "*the weighted sum over the reachability probabilities of the successors of* s is at most δ". In general, there are infinitely many possible choices of subobligations for the successors of s in order to satisfy the original obligation, because (grossly simplified) there are infinitely many possibilities for a and b to satisfy weighted sums such as 1/3 · a + 2/3 · b ≤ δ. While we only need one choice of subobligations, picking a *good* one is approximately as hard as solving the entire problem altogether. We hence require a heuristic, which is guided by a *user-provided oracle*.

<sup>1</sup> Recently, (standard) IC3 for TSs was *utilized* in model checking Markov chains [49] to on-the-fly compute the states that cannot reach B.

<sup>2</sup> In [38], tree-like counterexamples are used for non-linear predicate transformers in IC3.

*Challenge 4 (Generalization).* "One of the key components of IC3 is [inductive] generalization" [13]. Generalization [36] abstracts single states. It makes IC3 scale, but is *not* essential for correctness. To facilitate generalization, systems should be encoded symbolically, i.e., integer-valued program variables describe states. Frames thus map variables to probabilities. A first aspect is how to effectively present them to an SMT-solver. Conceptually, we use uninterpreted functions and universal quantifiers (encoding program behavior) together with linear real arithmetic to encode the weighted sums occurring when reasoning about probabilities. A second aspect is more fundamental: Abstractly, IC3's generalization guesses an unreachable set of states. We, however, need to guess this set *and* a probability for each state. To be effective, these guesses should moreover eventually yield an inductive frame, which is often highly nonlinear. We propose three SMT-guided interpolation variants for guessing these maps.

*Structure of this Paper.* We develop PrIC3 gradually: We explain the underlying rationale in Sect. 3. We also describe the core of PrIC3, called PrIC3<sub>H</sub>, which closely resembles the main loop of standard IC3, but uses adapted frames and termination criteria (Challenge 1). In line with Challenge 3, PrIC3<sub>H</sub> is parameterized by a heuristic H which is applied whenever we need to select one out of infinitely many probabilities. No requirements on the quality of H are imposed. PrIC3<sub>H</sub> is *sound* and always terminates: If it returns true, then the maximal reachability probability is bounded by λ. Without additional assumptions about H, PrIC3<sub>H</sub> is *incomplete*: on returning false, it is unknown whether the returned sub-MDP is indeed a counterexample (Challenge 2). Section 4 details strengthening (Challenge 3). Section 5 presents a sound *and* complete algorithm PrIC3 on top of PrIC3<sub>H</sub>. Section 6 presents a prototype, discusses our chosen heuristics, and addresses Challenge 4. Section 7 shows some encouraging experiments, but also illustrates the need for further progress.

Related Work. Just like IC3 has been a symbiosis of different approaches, PrIC3 has been inspired by several existing techniques from the verification of probabilistic systems.

*BMC.* Adaptions of BMC to Markov chains (MCs) with a dedicated treatment of cycles have been pursued in [57]. The encoding in [24] annotates sub-formulae with probabilities. The integrated SAT solving process implicitly unrolls all paths leading to an exponential blow-up. In [52], this is circumvented by grouping paths, discretizing them, and using an encoding with quantifiers and bit-vectors, but without numerical values. Recently, [56] extends this idea to a PAC algorithm by purely propositional encodings and (approximate) model counting [17]. These approaches focus on MCs and are not mature yet.

*Invariant Synthesis.* Quantitative loop invariants are key in analyzing *probabilistic programs* whose operational semantics are (possibly infinite) MDPs [26]. A quantitative invariant I maps states to probabilities. I is shown to be an invariant by comparing I to the result of applying the MDP's Bellman operator to I. Existing approaches for invariant synthesis are, e.g., based on weakest preexpectations [33,39,40,42,46], template-based constraint solving [25], notions of martingales [3,9,16,55], and solving recurrence relations [10]. All but the last technique require user guidance.

*Abstraction.* To combat state-space explosion, abstraction is often employed. CEGAR for MDPs [37] deals with explicit sets of paths as counterexamples. Game-based abstraction [30,41] and partial exploration [14] exploit that not all paths have to be explored to prove bounds on reachability probabilities.

*Statistical Methods and (deep) Reinforcement Learning.* Finally, an avenue that avoids storing a (complete) model are simulation-based approaches (statistical model checking [2]) and variants of reinforcement learning, possibly with neural networks. For MDPs, these approaches yield weak statistical guarantees [20], but may provide good oracles.

#### 2 Problem Statement

Our aim is to prove that the *maximal probability* of *reaching* a *set* B *of bad states* from the initial state s<sub>I</sub> of a *Markov decision process* M is at most some *threshold* λ. Below, we give a formal description of our problem. We refer to [7,50] for a thorough introduction.

Definition 1 (MDPs). *A* Markov decision process *(MDP) is a tuple* M = (S, s<sub>I</sub>, Act, P)*, where* S *is a finite set of* states*,* s<sub>I</sub> ∈ S *is the* initial state*,* Act *is a finite set of* actions*, and* P : S × Act × S → [0, 1] *is a* transition probability function*. For state* s*, let* Act(s) = {a ∈ Act | ∃ s′ ∈ S : P(s, a, s′) > 0} *be the* enabled actions *at* s*. For all states* s ∈ S*, we require* |Act(s)| ≥ 1 *and, for every enabled action* a ∈ Act(s)*,* Σ<sub>s′∈S</sub> P(s, a, s′) = 1*.*

For this paper, we fix an MDP M = (S, s<sub>I</sub>, Act, P), a set of *bad states* B ⊆ S, and a threshold λ ∈ [0, 1]. The *maximal*<sup>3</sup> *(unbounded) reachability probability* to eventually reach a state in B from a state s is denoted by Pr<sup>max</sup>(s ⊨ ◇B). We characterize Pr<sup>max</sup>(s ⊨ ◇B) using the so-called *Bellman operator*. Let M<sup>N</sup> denote the set of functions from N to M. Anticipating IC3 terminology, we call a function F ∈ [0, 1]<sup>S</sup> a *frame*. We denote by F[s] the evaluation of frame F for state s.

<sup>3</sup> Maximal with respect to all possible resolutions of nondeterminism in the MDP.

Definition 2 (Bellman Operator). *For a set of actions* A ⊆ Act*, we define the* Bellman operator for A *as a frame transformer* Φ<sub>A</sub> : [0, 1]<sup>S</sup> → [0, 1]<sup>S</sup> *with*

$$\Phi\_A\left(F\right)[s] \ = \begin{cases} 1, & if\ s \in B\\ \max\limits\_{a \in A} \sum\_{s' \in S} P(s, a, s') \cdot F[s']\ , & if\ s \notin B\ . \end{cases}$$

*We write* Φ<sub>a</sub> *for* Φ<sub>{a}</sub>*,* Φ *for* Φ<sub>Act</sub>*, and call* Φ *simply* the Bellman operator*.*

For every state s, the maximal reachability probability Pr<sup>max</sup>(s ⊨ ◇B) is then given by the least fixed point of the Bellman operator Φ. That is,

$$\forall s: \quad \text{Pr}^{\text{max}}\left(s \models \Diamond B\right) \; = \; \left(\text{lfp } \Phi\right)[s] \; ,$$

where the underlying partial order on frames is a complete lattice with ordering

$$\begin{array}{ccccc} F\_1 & \leq & F\_2 & \text{iff} & & \forall s \in S \colon & & F\_1[s] & \leq & F\_2[s] & . \end{array}$$
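To make Definition 2 and the ordering on frames concrete, the following Python sketch implements Φ (that is, Φ<sub>Act</sub>) with exact rational arithmetic. The four-state MDP, the dictionary encoding `P[s][a][t]`, and the names `bellman`, `leq`, and `LFP` are assumptions made purely for illustration; this is neither the MDP of Fig. 1 nor the paper's implementation.

```python
from fractions import Fraction as Fr

# Hypothetical toy MDP (not Fig. 1): P[s][a][t] is the probability P(s, a, t).
P = {
    0: {"a": {1: Fr(1, 2), 2: Fr(1, 2)}},
    1: {"a": {3: Fr(1, 2), 0: Fr(1, 2)}, "b": {2: Fr(1)}},
    2: {"a": {2: Fr(1)}},
    3: {"a": {3: Fr(1)}},
}
BAD = {3}                          # set B of bad states
ZERO = {s: Fr(0) for s in P}       # the least frame 0

def bellman(F):
    """Phi(F)[s] = 1 if s is bad, else the max over enabled actions a of
    sum_t P(s, a, t) * F[t]."""
    return {
        s: Fr(1) if s in BAD else
           max(sum(p * F[t] for t, p in succ.items()) for succ in acts.values())
        for s, acts in P.items()
    }

def leq(F1, F2):
    """Pointwise order on frames: F1 <= F2 iff F1[s] <= F2[s] for all s."""
    return all(F1[s] <= F2[s] for s in P)

phi1 = bellman(ZERO)               # Phi(0): 1 on bad states, 0 elsewhere
phi2 = bellman(phi1)               # Phi^2(0)
# Solution of the Bellman equations of the toy MDP, computed by hand.
LFP = {0: Fr(1, 3), 1: Fr(2, 3), 2: Fr(0), 3: Fr(1)}
```

Here `LFP` is a fixed point of `bellman`, and the iterates starting from `ZERO` stay below it with respect to `leq`. Restricting the maximum to enabled actions agrees with Definition 2 for A = Act, since non-enabled actions contribute a sum of 0.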

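In terms of the Bellman operator, our formal problem statement reads as follows:

Given an MDP M, a set of bad states B, and a threshold λ ∈ [0, 1], prove or refute that

$$\Pr^{\text{max}}\left(s\_I \models \Diamond B\right) \; = \; \left(\text{lfp } \Phi\right)[s\_I] \; \le \; \lambda \; .$$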


Whenever Pr<sup>max</sup>(s<sub>I</sub> ⊨ ◇B) ≤ λ indeed holds, we say that the MDP M is *safe* (with respect to the set of bad states B and threshold λ); otherwise, we call it *unsafe*.

Fig. 1. The MDP M serving as a running example.

Recovery Statement 1. *For* λ = 0*, our problem statement is equivalent to the* qualitative reachability *problem solved by (reverse) standard* IC3*, i.e., prove or refute that all bad states in* B *are* unreachable *from the initial state* s<sub>I</sub>*.*

*Example 1.* The MDP M in Fig. 1 consists of 6 states with initial state s<sub>0</sub> and bad states B = {s<sub>5</sub>}. In s<sub>2</sub>, actions a and b are enabled; in all other states, one unlabeled action is enabled. We have Pr<sup>max</sup>(s<sub>0</sub> ⊨ ◇B) = 2/3. Hence, M is safe for all thresholds λ ≥ 2/3 and unsafe for λ < 2/3. In particular, M is unsafe for λ = 0 as s<sub>5</sub> is *reachable* from s<sub>0</sub>.

#### 3 The Core PrIC3 Algorithm

The purpose of PrIC3 is to prove or refute that the maximal probability to reach a bad state in B from the initial state s<sub>I</sub> of the MDP M is at most λ. In this section, we explain the rationale underlying PrIC3. Moreover, we describe the core of PrIC3, called PrIC3<sub>H</sub>, which bears close resemblance to the main loop of standard IC3 for TSs.

Because of the inherent direction of the Bellman operator, we build PrIC3 on *reverse* IC3 [53], cf. Challenge 3. Reversing constitutes a shift from reasoning along the direction *initial-to-bad* to *bad-to-initial*. While this shift is mostly *inessential* to the fundamentals underlying IC3, the reverse direction is unswayable in the probabilistic setting. Whenever we draw a connection to standard IC3, we thus generally mean *reverse* IC3.

#### 3.1 Inductive Frames

IC3 for TSs operates on (*quali*tative) frames representing sets of states of the TS at hand. A frame F can hence be thought of as a mapping<sup>4</sup> from states to {0, 1}. In PrIC3 for MDPs, we need to move from a Boolean to a quantitative regime. Hence, a (*quanti*tative) frame is a mapping from states to probabilities in [0, 1].

For a given TS, consider the frame transformer T that adds to a given input frame F all bad states in B and all predecessors of the states contained in F. The rationale of standard (reverse) IC3 is to find a frame F ∈ {0, 1}<sup>S</sup> such that (i) the initial state s<sub>I</sub> does not belong to F and (ii) applying T takes us down in the partial order on frames, i.e.,

$$\text{(i)}\quad F[s\_I] \; = \; 0 \qquad \text{and} \qquad \text{(ii)}\quad T(F) \; \le \; F \; .$$

Intuitively, (i) postulates the *hypothesis* that s<sub>I</sub> cannot reach B and (ii) expresses that F is closed under adding bad states and taking predecessors, thus affirming the hypothesis.

Analogously, the rationale of PrIC3 is to find a frame F ∈ [0, 1]<sup>S</sup> such that (i) F postulates that the probability of s<sub>I</sub> to reach B is at most the threshold λ and (ii) applying the Bellman operator Φ to F takes us down in the partial order on frames, i.e.,

$$\text{(i)}\quad F[s\_I] \; \le \; \lambda \qquad \text{and} \qquad \text{(ii)}\quad \Phi(F) \; \le \; F \; .$$

Frames satisfying the above conditions are called *inductive invariants* in IC3. We adopt this terminology. By *Park's Lemma* [48], which in our setting reads

$$\Phi(F) \; \le \; F \qquad \text{implies} \qquad \text{lfp } \Phi \; \le \; F \; ,$$

an inductive invariant F would indeed *witness* that Pr<sup>max</sup>(s<sub>I</sub> ⊨ ◇B) ≤ λ, because

$$\Pr^{\text{max}}\left(s\_I \models \Diamond B\right) \; = \; \left(\text{lfp } \Phi\right)[s\_I] \; \le \; F[s\_I] \; \le \; \lambda \; .$$

<sup>4</sup> In IC3, frames are typically characterized by logical formulae. To understand IC3's fundamental principle, however, we prefer to think of frames as functions in {0, 1}<sup>S</sup> partially ordered by ≤.
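As a small illustration of how an inductive invariant witnesses safety, the sketch below checks conditions (i) and (ii) for candidate frames on a hand-made MDP. The MDP, the candidate frames `G` and `H`, and the function names are hypothetical and serve only as an example; the hand-computed value Pr<sup>max</sup> = 1/3 for the initial state of this toy MDP is an assumption stated for orientation.

```python
from fractions import Fraction as Fr

# Hypothetical toy MDP (not Fig. 1): P[s][a][t] = P(s, a, t); state 0 is initial.
P = {
    0: {"a": {1: Fr(1, 2), 2: Fr(1, 2)}},
    1: {"a": {3: Fr(1, 2), 0: Fr(1, 2)}, "b": {2: Fr(1)}},
    2: {"a": {2: Fr(1)}},
    3: {"a": {3: Fr(1)}},
}
BAD, S_INIT = {3}, 0

def bellman(F):
    """Bellman operator Phi of the toy MDP."""
    return {
        s: Fr(1) if s in BAD else
           max(sum(p * F[t] for t, p in succ.items()) for succ in acts.values())
        for s, acts in P.items()
    }

def is_inductive_invariant(F, lam):
    """Check (i) F[s_I] <= lam and (ii) Phi(F) <= F pointwise."""
    phi_f = bellman(F)
    return F[S_INIT] <= lam and all(phi_f[s] <= F[s] for s in P)

# G over-approximates the true probabilities (1/3 for state 0) and is inductive.
G = {0: Fr(1, 2), 1: Fr(3, 4), 2: Fr(0), 3: Fr(1)}
# H satisfies (i) for lam = 1/2 but violates (ii) at state 1: Phi(H)[1] = 2/3 > 1/2.
H = {0: Fr(1, 3), 1: Fr(1, 2), 2: Fr(0), 3: Fr(1)}
ONE = {s: Fr(1) for s in P}        # the greatest frame 1
```

With λ = 1/2, `G` passes both checks, so by Park's Lemma the toy MDP is safe for that threshold. The frame `ONE` always satisfies (ii) but fails (i) unless λ = 1, which shows why (i) is needed.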

If no inductive invariant exists, then standard IC3 will find a counterexample: a *path* from the initial state s<sub>I</sub> to a bad state in B, which serves as a witness to refute. Analogously, PrIC3 will find a counterexample, but of a different kind: Since single paths are insufficient as counterexamples in the probabilistic realm (Challenge 2), PrIC3 will instead find a *subsystem* of states of the MDP witnessing Pr<sup>max</sup>(s<sub>I</sub> ⊨ ◇B) > λ.

#### 3.2 The PrIC3 Invariants

Analogously to standard IC3, PrIC3 aims to find the inductive invariant by maintaining a *sequence of frames* F<sub>0</sub> ≤ F<sub>1</sub> ≤ F<sub>2</sub> ≤ ... such that F<sub>i</sub>[s] overapproximates the maximal probability of reaching B from s within *at most* i *steps*. This i*-step-bounded reachability probability* Pr<sup>max</sup>(s ⊨ ◇<sup>≤i</sup> B) can be characterized using the Bellman operator: Φ(**0**) is the 0-step probability; it is 1 for every s ∈ B and 0 otherwise. For any i ≥ 0, we have

$$\Pr^{\text{max}}\left(s \models \Diamond^{\le i} B\right) \; = \; \left(\Phi^i\left(\Phi\left(\mathbf{0}\right)\right)\right)[s] \; = \; \left(\Phi^{i+1}\left(\mathbf{0}\right)\right)[s] \; ,$$

where **0**, the frame that maps every state to 0, is the least frame of the underlying complete lattice. For a finite MDP, the *unbounded* reachability probability is then given by the limit

$$\Pr^{\text{max}}\left(s \models \Diamond B\right) \; = \; \left(\text{lfp } \Phi\right)[s] \; \stackrel{(\*)}{=} \; \left(\lim\_{n\to\infty} \Phi^n\left(\mathbf{0}\right)\right)[s] \; = \; \lim\_{n\to\infty} \Pr^{\text{max}}\left(s \models \Diamond^{\le n} B\right) \; ,$$

where (∗) is a consequence of the well-known Kleene fixed point theorem [45].
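The limit characterization can be observed numerically. The sketch below computes the iterates Φ(**0**), Φ<sup>2</sup>(**0**), ... on a hypothetical toy MDP (not Fig. 1) and exposes them for inspection; the MDP, the name `kleene_iterates`, and the hand-computed limit 1/3 for the initial state are assumptions for illustration only.

```python
from fractions import Fraction as Fr

# Hypothetical toy MDP (not Fig. 1): P[s][a][t] = P(s, a, t); state 0 is initial.
P = {
    0: {"a": {1: Fr(1, 2), 2: Fr(1, 2)}},
    1: {"a": {3: Fr(1, 2), 0: Fr(1, 2)}, "b": {2: Fr(1)}},
    2: {"a": {2: Fr(1)}},
    3: {"a": {3: Fr(1)}},
}
BAD = {3}

def bellman(F):
    """Bellman operator Phi of the toy MDP."""
    return {
        s: Fr(1) if s in BAD else
           max(sum(p * F[t] for t, p in succ.items()) for succ in acts.values())
        for s, acts in P.items()
    }

def kleene_iterates(n):
    """Return [Phi(0), Phi^2(0), ..., Phi^n(0)]; entry i holds the
    i-step-bounded maximal reachability probabilities."""
    out, F = [], {s: Fr(0) for s in P}
    for _ in range(n):
        F = bellman(F)
        out.append(F)
    return out

iterates = kleene_iterates(40)
gap = Fr(1, 3) - iterates[-1][0]   # distance of Phi^40(0)[0] to the limit 1/3
```

On this example the iterates form an ascending chain that approaches, but never reaches, the value 1/3 at the initial state. This is exactly the situation in which infinitely many distinct frames arise even though the MDP is finite.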

The sequence F<sub>0</sub> ≤ F<sub>1</sub> ≤ F<sub>2</sub> ≤ ... maintained by PrIC3 should frame-wise overapproximate the increasing sequence Φ(**0**) ≤ Φ<sup>2</sup>(**0**) ≤ Φ<sup>3</sup>(**0**) ≤ .... Pictorially, each frame in the top row lies above the iterate directly below it, i.e., Φ<sup>i+1</sup>(**0**) ≤ F<sub>i</sub>:

$$\begin{array}{ccccccccc} F\_0 & \leq & F\_1 & \leq & F\_2 & \leq & \dots & \leq & F\_k \\ \Phi(\mathbf{0}) & \leq & \Phi^2(\mathbf{0}) & \leq & \Phi^3(\mathbf{0}) & \leq & \dots & \leq & \Phi^{k+1}(\mathbf{0}) \end{array}$$

However, the sequence Φ(**0**), Φ<sup>2</sup>(**0**), Φ<sup>3</sup>(**0**), ... will never explicitly be known to PrIC3. Instead, PrIC3 will ensure the above frame-wise overapproximation property implicitly by enforcing the so-called PrIC3 *invariants* on the frame sequence F<sub>0</sub>, F<sub>1</sub>, F<sub>2</sub>, .... Apart from allowing for a threshold 0 ≤ λ ≤ 1 on the maximal reachability probability, these invariants coincide with the standard IC3 invariants (where λ = 0 is fixed). Formally:

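Definition 3 (PrIC3 Invariants). *Frames* F<sub>0</sub>, ..., F<sub>k</sub>*, for* k ≥ 0*, satisfy the* PrIC3 invariants*, a fact we will denote by* PrIC3Inv(F<sub>0</sub>, ..., F<sub>k</sub>)*, if all of the following hold:*

1. *Initiality:* F<sub>0</sub> = Φ(**0**)
2. *Chain property:* ∀ i < k : F<sub>i</sub> ≤ F<sub>i+1</sub>
3. *Frame-safety:* ∀ i ≤ k : F<sub>i</sub>[s<sub>I</sub>] ≤ λ
4. *Relative inductivity:* ∀ i < k : Φ(F<sub>i</sub>) ≤ F<sub>i+1</sub>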


The PrIC3 invariants enforce the above picture: The *chain property* ensures F<sub>0</sub> ≤ F<sub>1</sub> ≤ ... ≤ F<sub>k</sub>. We have Φ(**0**) = F<sub>0</sub> ≤ F<sub>0</sub> by *initiality*. Assuming Φ<sup>i+1</sup>(**0**) ≤ F<sub>i</sub> as induction hypothesis, monotonicity of Φ and *relative inductivity* imply Φ<sup>i+2</sup>(**0**) ≤ Φ(F<sub>i</sub>) ≤ F<sub>i+1</sub>.
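The four invariants are easy to check mechanically for an explicit frame sequence. The following sketch does so on a hypothetical toy MDP, reading the invariants as: initiality F<sub>0</sub> = Φ(**0**), the chain property F<sub>i</sub> ≤ F<sub>i+1</sub>, frame-safety F<sub>i</sub>[s<sub>I</sub>] ≤ λ, and relative inductivity Φ(F<sub>i</sub>) ≤ F<sub>i+1</sub>. This reading is reconstructed from the surrounding text; the MDP, the frame `G`, and all function names are assumptions for illustration.

```python
from fractions import Fraction as Fr

# Hypothetical toy MDP (not Fig. 1): P[s][a][t] = P(s, a, t); state 0 is initial.
P = {
    0: {"a": {1: Fr(1, 2), 2: Fr(1, 2)}},
    1: {"a": {3: Fr(1, 2), 0: Fr(1, 2)}, "b": {2: Fr(1)}},
    2: {"a": {2: Fr(1)}},
    3: {"a": {3: Fr(1)}},
}
BAD, S_INIT = {3}, 0
ZERO = {s: Fr(0) for s in P}
ONE = {s: Fr(1) for s in P}

def bellman(F):
    """Bellman operator Phi of the toy MDP."""
    return {
        s: Fr(1) if s in BAD else
           max(sum(p * F[t] for t, p in succ.items()) for succ in acts.values())
        for s, acts in P.items()
    }

def leq(F1, F2):
    """Pointwise order on frames."""
    return all(F1[s] <= F2[s] for s in P)

def pric3_inv(frames, lam):
    """Check the four invariants for the frame sequence F_0, ..., F_k."""
    k = len(frames) - 1
    initiality = frames[0] == bellman(ZERO)
    chain = all(leq(frames[i], frames[i + 1]) for i in range(k))
    frame_safety = all(F[S_INIT] <= lam for F in frames)
    rel_inductivity = all(leq(bellman(frames[i]), frames[i + 1]) for i in range(k))
    return initiality and chain and frame_safety and rel_inductivity

F0 = bellman(ZERO)
G = {0: Fr(1, 2), 1: Fr(3, 4), 2: Fr(0), 3: Fr(1)}   # hand-picked frame
```

For example, the sequence `[F0, G, G]` satisfies the invariants for λ = 1/2 and contains two equal consecutive frames, whereas `[F0, ONE]` violates frame-safety for any λ < 1.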

By overapproximating Φ(**0**), Φ<sup>2</sup>(**0**), ..., Φ<sup>k+1</sup>(**0**), the frames F<sub>0</sub>, ..., F<sub>k</sub> in effect bound the maximal step-bounded reachability probability of every state:

Lemma 1. *Let frames* F<sub>0</sub>, ..., F<sub>k</sub> *satisfy the* PrIC3 *invariants. Then*

$$\forall\, s \;\; \forall\, i \le k \colon \quad \Pr^{\max}\left(s \models \Diamond^{\le i} B\right) \; \le \; F\_i[s] \; .$$

In particular, Lemma 1 together with *frame-safety* ensures that the maximal step-bounded reachability probability of the *initial state* s<sub>I</sub> to reach B is at most the threshold λ.

As for proving that the *unbounded* reachability probability is also at most λ, it suffices to find two consecutive frames, say F<sub>i</sub> and F<sub>i+1</sub>, that coincide:

Lemma 2. *Let frames* F<sub>0</sub>, ..., F<sub>k</sub> *satisfy the* PrIC3 *invariants. Then*

$$\exists \, i < k \colon \quad F\_i \; = \; F\_{i+1} \qquad implies \qquad Pr^{\max}(s\_I \models \Diamond B) \; \le \; \lambda \; .$$

*Proof.* F<sub>i</sub> = F<sub>i+1</sub> and *relative inductivity* yield Φ(F<sub>i</sub>) ≤ F<sub>i+1</sub> = F<sub>i</sub>, rendering F<sub>i</sub> *inductive*. By Park's lemma (cf. Sect. 3.1), we obtain lfp Φ ≤ F<sub>i</sub> and, by *frame-safety*, conclude

$$\Pr^{\max}(s\_I \models \Diamond B) \; = \; (\text{lfp } \Phi)[s\_I] \; \le \; F\_i[s\_I] \; \le \; \lambda \; .$$

#### 3.3 Operationalizing the PrIC3 Invariants for Proving Safety

Lemma 2 gives us a clear angle of attack for *proving* an MDP safe: Repeatedly add and refine frames approximating step-bounded reachability probabilities for more and more steps while enforcing the PrIC3 invariants (cf. Definition 3) until two consecutive frames coincide.

Analogously to standard IC3, this approach is taken by the core loop PrIC3<sub>H</sub> depicted in Algorithm 1; differences to the main loop of IC3 (cf. [23, Fig. 5]) are highlighted in red. A particular difference is that PrIC3<sub>H</sub> is parameterized by a heuristic H for finding suitable probabilities (see Challenge 3). Since the precise choice of H is irrelevant for the soundness of PrIC3<sub>H</sub>, we defer a detailed discussion of suitable heuristics to Sect. 4.

```
Data: MDP M, set of bad states B, threshold λ
Result: true or false and a subset of the states of M
 1  F_0 ← Φ(0); F_1 ← 1; k ← 1; oldSubsystem ← ∅;
 2  while true do
 3      success, F_0, ..., F_k, subsystem ← Strengthen_H(F_0, ..., F_k);
 4      if ¬success then return false, subsystem;
 5      F_{k+1} ← 1;
 6      F_0, ..., F_{k+1} ← Propagate(F_0, ..., F_{k+1});
 7      if ∃ 1 ≤ i ≤ k : F_i = F_{i+1} then return true, _;
 8      if oldSubsystem = subsystem then return false, subsystem;
 9      k ← k + 1; oldSubsystem ← subsystem;
10  end
```
Algorithm 1: PrIC3<sub>H</sub>(M, B, λ)
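The control flow of Algorithm 1 can be transcribed almost line by line. The sketch below is such a transcription in Python under several assumptions: the toy MDP is hypothetical, the MDP and threshold are captured by the subroutines rather than passed explicitly, and `oracle_strengthen` is only a placeholder that overwrites the last frame with a guessed frame `G`. It is not the Strengthen<sub>H</sub> procedure of Sect. 4, and the identity function stands in for Propagate.

```python
from fractions import Fraction as Fr

# Hypothetical toy MDP (not Fig. 1): P[s][a][t] = P(s, a, t); state 0 is initial.
P = {
    0: {"a": {1: Fr(1, 2), 2: Fr(1, 2)}},
    1: {"a": {3: Fr(1, 2), 0: Fr(1, 2)}, "b": {2: Fr(1)}},
    2: {"a": {2: Fr(1)}},
    3: {"a": {3: Fr(1)}},
}
BAD, S_INIT = {3}, 0

def bellman(F):
    """Bellman operator Phi of the toy MDP."""
    return {s: Fr(1) if s in BAD else
               max(sum(p * F[t] for t, p in succ.items()) for succ in acts.values())
            for s, acts in P.items()}

def pric3_core(strengthen, propagate):
    """Control flow of Algorithm 1; both subroutines are supplied by the caller."""
    one = {s: Fr(1) for s in P}
    frames = [bellman({s: Fr(0) for s in P}), dict(one)]           # l. 1
    k, old_subsystem = 1, frozenset()
    while True:                                                    # l. 2
        success, frames, subsystem = strengthen(frames)            # l. 3
        if not success:                                            # l. 4
            return False, subsystem
        frames = frames + [dict(one)]                              # l. 5
        frames = propagate(frames)                                 # l. 6
        if any(frames[i] == frames[i + 1] for i in range(1, k + 1)):  # l. 7
            return True, None
        if old_subsystem == subsystem:                             # l. 8
            return False, subsystem
        k, old_subsystem = k + 1, subsystem                        # l. 9

def oracle_strengthen(G, lam):
    """Placeholder for Strengthen_H: try to replace the last frame by a guess G."""
    def strengthen(frames):
        touched = frozenset(s for s in P if G[s] < frames[-1][s])
        phi_g = bellman(G)
        ok = (G[S_INIT] <= lam                                # frame-safety
              and all(frames[-2][s] <= G[s] for s in P)       # chain property
              and all(phi_g[s] <= G[s] for s in P))           # G is inductive
        return (True, frames[:-1] + [dict(G)], touched) if ok else (False, frames, touched)
    return strengthen

G = {0: Fr(1, 2), 1: Fr(3, 4), 2: Fr(0), 3: Fr(1)}            # guessed frame
identity = lambda frames: frames                              # trivial Propagate
safe_half, _ = pric3_core(oracle_strengthen(G, Fr(1, 2)), identity)
safe_quarter, _ = pric3_core(oracle_strengthen(G, Fr(1, 4)), identity)
```

The placeholder restores the invariants whenever it reports success: if F<sub>k−1</sub> ≤ G and Φ(G) ≤ G, then monotonicity of Φ gives Φ(F<sub>k−1</sub>) ≤ G. With λ = 1/2 the loop returns true in its second iteration because two consecutive frames equal `G`; with λ = 1/4 the guess violates frame-safety and the loop returns false.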

As input, PrIC3<sub>H</sub> takes an MDP M = (S, s<sub>I</sub>, Act, P), a set B ⊆ S of bad states, and a threshold λ ∈ [0, 1]. Since the input is never changed, we assume it to be *globally available*, also to subroutines. As output, PrIC3<sub>H</sub> returns true if two consecutive frames become equal. We hence say that PrIC3<sub>H</sub> is *sound* if it only returns true if M is safe.

We will formalize soundness using Hoare triples. For precondition φ, postcondition ψ, and program P, the triple {φ} P {ψ} is *valid* (for partial correctness) if, whenever program P starts in a state satisfying precondition φ and terminates in some state s′, then s′ satisfies postcondition ψ. Soundness of PrIC3<sub>H</sub> then means validity of the triple

$$\{\text{true}\} \;\; \text{safe}, \underline{\hspace{0.1cm}} \leftarrow \text{PrIC3}\_{\mathcal{H}}(\mathfrak{M}, B, \lambda) \;\; \{\text{safe} \implies \Pr^{\text{max}} \left(s\_I \models \Diamond B\right) \leq \lambda\} \;.$$

Let us briefly go through the individual steps of PrIC3<sub>H</sub> in Algorithm 1 and convince ourselves that it is indeed sound. After that, we discuss why PrIC3<sub>H</sub> terminates and what happens if it is unable to prove safety by finding two equal consecutive frames.

How **PrIC3<sub>H</sub>** works. Recall that PrIC3<sub>H</sub> maintains a sequence of frames F<sub>0</sub>, ..., F<sub>k</sub> which is initialized in l. 1 with k = 1, F<sub>0</sub> = Φ(**0**), and F<sub>1</sub> = **1**, where the frame **1** maps every state to 1. Every time upon entering the while-loop in l. 2, the initial segment F<sub>0</sub>, ..., F<sub>k−1</sub> satisfies all PrIC3 invariants (cf. Definition 3), whereas the full sequence F<sub>0</sub>, ..., F<sub>k</sub> potentially violates frame-safety as it is possible that F<sub>k</sub>[s<sub>I</sub>] > λ.

In l. 3, procedure Strengthen<sub>H</sub> (detailed in Sect. 4) is called to restore *all* PrIC3 invariants on the *entire* frame sequence: It either returns true if successful or returns false and a counterexample (in our case a subsystem of the MDP) if it was unable to do so. To ensure soundness of PrIC3<sub>H</sub>, it suffices that Strengthen<sub>H</sub> restores the PrIC3 invariants whenever it returns true. Formally, Strengthen<sub>H</sub> must meet the following specification:

Definition 4. *Procedure* Strengthen<sub>H</sub> *is* sound *if the following Hoare triple is valid:*

$$\begin{aligned} & \{\, \mathsf{PrIC3Inv}\left(F\_0, \ldots, F\_{k-1}\right) \;\land\; F\_{k-1} \le F\_k \;\land\; \Phi\left(F\_{k-1}\right) \le F\_k \,\} \\ & \textit{success},\, F\_0, \ldots, F\_k,\, \underline{\hspace{0.5cm}} \;\leftarrow\; \mathsf{Strengthen}\_{\mathcal{H}}\left(F\_0, \ldots, F\_k\right) \\ & \{\, \textit{success} \implies \mathsf{PrIC3Inv}\left(F\_0, \ldots, F\_k\right) \,\} \end{aligned}$$

If Strengthen<sub>H</sub> returns true, then a new frame F<sub>k+1</sub> = **1** is created in l. 5. After that, the (now initial) segment F<sub>0</sub>, ..., F<sub>k</sub> again satisfies all PrIC3 invariants, whereas the full sequence F<sub>0</sub>, ..., F<sub>k+1</sub> potentially violates frame-safety at F<sub>k+1</sub>. *Propagation* (l. 6) aims to speed up termination by updating F<sub>i+1</sub>[s] by F<sub>i</sub>[s] iff this does not violate relative inductivity. Consequently, the previously mentioned properties remain unchanged.
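One possible reading of the propagation step is sketched below. Lowering F<sub>i+1</sub>[s] to F<sub>i</sub>[s] can only endanger the requirement Φ(F<sub>i</sub>) ≤ F<sub>i+1</sub> at state s, so the sketch performs the update exactly when Φ(F<sub>i</sub>)[s] ≤ F<sub>i</sub>[s]. This condition, the toy MDP, and the function name are assumptions for illustration; the paper's actual Propagate procedure may differ.

```python
from fractions import Fraction as Fr

# Hypothetical toy MDP (not Fig. 1): P[s][a][t] = P(s, a, t); state 0 is initial.
P = {
    0: {"a": {1: Fr(1, 2), 2: Fr(1, 2)}},
    1: {"a": {3: Fr(1, 2), 0: Fr(1, 2)}, "b": {2: Fr(1)}},
    2: {"a": {2: Fr(1)}},
    3: {"a": {3: Fr(1)}},
}
BAD = {3}

def bellman(F):
    """Bellman operator Phi of the toy MDP."""
    return {s: Fr(1) if s in BAD else
               max(sum(p * F[t] for t, p in succ.items()) for succ in acts.values())
            for s, acts in P.items()}

def propagate(frames):
    """Push values forward: set F_{i+1}[s] to F_i[s] whenever this keeps
    Phi(F_i)[s] <= F_{i+1}[s]. Works on a copy of the input sequence."""
    frames = [dict(F) for F in frames]
    for i in range(len(frames) - 1):
        phi_fi = bellman(frames[i])
        for s in P:
            if frames[i][s] < frames[i + 1][s] and phi_fi[s] <= frames[i][s]:
                frames[i + 1][s] = frames[i][s]
    return frames

F0 = bellman({s: Fr(0) for s in P})
G = {0: Fr(1, 2), 1: Fr(3, 4), 2: Fr(0), 3: Fr(1)}
ONE = {s: Fr(1) for s in P}
before = [F0, G, ONE]
after = propagate(before)
```

On this input, the value of state 0 in the second frame drops from 1/2 to 0, and the last frame is tightened at states 1 and 2. The chain property and relative inductivity continue to hold for the resulting sequence.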

If Strengthen<sup>H</sup> returns false, the PrIC3 invariants—premises to Lemma <sup>2</sup> for witnessing safety—cannot be restored and PrIC3<sup>H</sup> terminates returning false (l. 4). Returning false (also possible in l. 8) has by specification no affect on soundness of PrIC3H.

In l. 7, we check whether there exist two identical consecutive frames. If so, Lemma 2 yields that the MDP is safe; consequently, PrIC3<sup>H</sup> returns true. Otherwise, we increment k and are in the same setting as upon entering the loop, now with an increased frame sequence; PrIC3<sup>H</sup> then performs another iteration. In summary, we obtain:

Theorem 1 (Soundness of PrIC3*H*). *If* Strengthen<sup>H</sup> *is sound and* Propagate *does not affect the* PrIC3 *invariants, then* PrIC3<sup>H</sup> *is sound, i.e., the following triple is valid:*

$$\{\, \text{true} \,\} \quad \textit{safe}, \underline{\hspace{0.5cm}} \leftarrow \mathsf{PrIC3}\_{\mathcal{H}}\left(\mathcal{M}, B, \lambda\right) \quad \{\, \textit{safe} \implies \mathrm{Pr}^{\max}\left(s\_I \models \lozenge B\right) \le \lambda \,\}$$

**PrIC3***<sup>H</sup>* Terminates for Unsafe MDPs. If the MDP is unsafe, then there exists a step-bound n such that Pr<sup>max</sup>(s<sub>I</sub> ⊨ ♦<sup>≤n</sup>B) > λ. Furthermore, any sound implementation of Strengthen<sup>H</sup> (cf. Definition 4) either immediately terminates PrIC3<sup>H</sup> by returning false or restores the PrIC3 invariants for F<sub>0</sub>, ..., F<sub>k</sub>. If the former case never arises, then Strengthen<sup>H</sup> will eventually restore the PrIC3 invariants for a frame sequence of length k = n. By Lemma 1, we have F<sub>n</sub>[s<sub>I</sub>] ≥ Pr<sup>max</sup>(s<sub>I</sub> ⊨ ♦<sup>≤n</sup>B) > λ, contradicting frame-safety.

**PrIC3***<sup>H</sup>* Terminates for Safe MDPs. Standard IC3 terminates on safe finite TSs as there are only finitely many different frames, making every ascending chain of frames eventually stabilize. For us, frames map states to probabilities (Challenge 1), yielding *infinitely many possible frames* even for finite MDPs. Hence, Strengthen<sup>H</sup> need not ever yield a stabilizing chain of frames. If it continuously fails to stabilize while repeatedly reasoning about the same set of states, we give up. PrIC3<sup>H</sup> checks this by comparing the subsystem Strengthen<sup>H</sup> operates on with the one it operated on in the previous loop iteration (l. 8).

Theorem 2. *If* Strengthen<sup>H</sup> *and* Propagate *terminate, then* PrIC3<sup>H</sup> *terminates.*

Recovery Statement 2. *For qual. reachability (*<sup>λ</sup> = 0*),* PrIC3<sup>H</sup> *never terminates in l. 8.*

**PrIC3***<sup>H</sup>* is Incomplete. Standard IC3 either proves safety or returns false and a counterexample—a single path from the initial to a bad state. As single paths are insufficient as counterexamples in MDPs (Challenge 2), PrIC3<sup>H</sup> instead returns a *subsystem* of the MDP M provided by StrengthenH. However, as argued above, we cannot trust Strengthen<sup>H</sup> to provide a stabilizing chain of frames. Reporting false thus only means that the given MDP *may* be unsafe; the returned subsystem has to be analyzed further.

The full PrIC3 algorithm presented in Sect. <sup>5</sup> addresses this issue. Exploiting the subsystem returned by PrIC3H, PrIC3 returns true if the MDP is safe; otherwise, it returns false and provides a true counterexample witnessing that the MDP is unsafe.

*Example 2.* We conclude this section with two example executions of PrIC3<sup>H</sup> on a simplified version of the MDP in Fig. 1. Assume that action b has been removed. Then, for every state, exactly one action is enabled, i.e., we consider a Markov chain. Figure 2 depicts the frame sequences computed by PrIC3<sup>H</sup> (for a reasonable H) on that Markov chain for two thresholds: 5/9 = Pr<sup>max</sup>(s<sub>0</sub> ⊨ ♦B) and 9/10. In particular, notice that *proving the coarser bound of* 9/10 *requires fewer frames than proving the exact bound of* 5/9.


Fig. 2. Two runs of PrIC3<sup>H</sup> on the Markov chain induced by selecting action a in Fig. 1. For every iteration, frames are recorded after invocation of StrengthenH.

## 4 Strengthening in **PrIC3***<sup>H</sup>*

When the main loop of PrIC3<sup>H</sup> has created a new frame F<sub>k</sub> = **1** in its previous iteration, this frame may violate frame-safety (Definition 3.3) because F<sub>k</sub>[s<sub>I</sub>] = 1 ≰ λ. The task of Strengthen<sup>H</sup> is to restore the PrIC3 invariants on *all* frames F<sub>0</sub>, ..., F<sub>k</sub>. To this end, our first *obligation* is to lower the value in frame i = k for state s = s<sub>I</sub> to δ = λ ∈ [0, 1]. We denote such an obligation by (i, s, δ). Observe that implicitly δ = 0 in the qualitative case, i.e., when proving unreachability. An obligation (i, s, δ) is *resolved* by updating the values assigned to state s in *all frames* F<sub>1</sub>, ..., F<sub>i</sub> to at most δ. That is, for all 1 ≤ j ≤ i, we set F<sub>j</sub>[s] to the minimum of δ and the original value F<sub>j</sub>[s]. Such an update affects neither initiality nor the chain property (Definitions 3.1, 3.2). It may, however, violate relative inductivity (Definition 3.4), i.e., Φ(F<sub>i−1</sub>) ≤ F<sub>i</sub>. Before resolving obligation (i, s, δ), we may thus have to further decrease some entries in F<sub>i−1</sub> as well. Hence, *resolving obligations may spawn additional obligations* which have to be resolved first to maintain relative inductivity. In this section, we present a generic instance of Strengthen<sup>H</sup> meeting its specification (Definition 4) and discuss its correctness.

```
 1  Q ← {(k, sI, λ)};
 2  while Q not empty do
 3      (i, s, δ) ← Q.popMin();    /* pop obligation with minimal frame index */
 4      if i = 0 ∨ (s ∈ B ∧ δ < 1) then
            /* possible counterexample given by subsystem
               consisting of states popped from Q at some point */
 5          return false, _, Q.touched();
        /* check whether Fi[s] ← δ violates relative inductivity */
 6      if ∃a ∈ Act(s): Φa(Fi−1)[s] > δ then for such an a
 7          δ1, ..., δn ← H(s, a, δ);
 8          {s1, ..., sn} ← Succs(s, a);
 9          Q.push((i − 1, s1, δ1), ..., (i − 1, sn, δn), (i, s, δ));
10      else    /* resolve (i, s, δ) without violating relative inductivity */
11          F1[s] ← min(F1[s], δ); ...; Fi[s] ← min(Fi[s], δ);
12  end
13  /* Q empty; all obligations have been resolved */
    return true, F0, ..., Fk, Q.touched();

                Algorithm 2: StrengthenH(F0, ..., Fk)
```
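The obligation loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the five-state Markov chain `P`, the helper names (`phi_a`, `initial_frames`, `constant_heuristic`, `strengthen`) and the choice of heuristic are all hypothetical, frames are plain dictionaries that are updated in place, and `touched` collects every state that was ever pushed to the queue.

```python
import heapq
from fractions import Fraction as Fr
from itertools import count

# Hypothetical toy Markov chain (one action "a" per state); NOT the MDP of Fig. 1.
P = {
    "s0":   {"a": {"s1": Fr(1, 2), "s2": Fr(1, 2)}},
    "s1":   {"a": {"bad": Fr(1, 2), "sink": Fr(1, 2)}},
    "s2":   {"a": {"sink": Fr(1)}},
    "bad":  {"a": {"bad": Fr(1)}},
    "sink": {"a": {"sink": Fr(1)}},
}
BAD = {"bad"}

def phi_a(frame, s, a):
    """Phi_a(F)[s]: 1 for bad states, else the expected frame value after action a."""
    if s in BAD:
        return Fr(1)
    return sum((p * frame[t] for t, p in P[s][a].items()), Fr(0))

def initial_frames(k):
    """F_0 = Phi(0) (1 on bad states, 0 elsewhere) followed by k frames equal to 1."""
    f0 = {s: Fr(1) if s in BAD else Fr(0) for s in P}
    return [f0] + [{s: Fr(1) for s in P} for _ in range(k)]

def constant_heuristic(s, a, delta):
    """Assumed heuristic: give every a-successor the bound delta (this is adequate)."""
    return [delta for _ in P[s][a]]

def strengthen(frames, s_init, lam, heuristic):
    """Sketch of Algorithm 2; mutates `frames` and returns (success, touched)."""
    k = len(frames) - 1
    tick = count()                      # tie-breaker so heap entries stay comparable
    queue = [(k, next(tick), s_init, lam)]
    touched = {s_init}
    while queue:
        i, _, s, delta = heapq.heappop(queue)       # minimal frame index first
        if i == 0 or (s in BAD and delta < 1):
            return False, touched                   # potential counterexample
        violating = next(
            (a for a in P[s] if phi_a(frames[i - 1], s, a) > delta), None)
        if violating is not None:
            succs = list(P[s][violating])
            for t, d in zip(succs, heuristic(s, violating, delta)):
                heapq.heappush(queue, (i - 1, next(tick), t, d))
                touched.add(t)
            heapq.heappush(queue, (i, next(tick), s, delta))   # retry later
        else:
            for j in range(1, i + 1):               # resolve the obligation
                frames[j][s] = min(frames[j][s], delta)
    return True, touched

frames = initial_frames(2)
frames[1]["s0"] = Fr(1, 2)    # result of an assumed earlier iteration with lam = 1/2
ok, touched = strengthen(frames, "s0", Fr(1, 2), constant_heuristic)
print(ok, frames[2]["s0"])
```

In this toy chain the bad state is reached from `s0` with probability 1/4, so the call with threshold 1/2 succeeds, whereas a threshold below 1/4 drives an obligation down to frame 0 and the sketch reports a potential counterexample.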

**Strengthen***<sup>H</sup>* by Example. Strengthen<sup>H</sup> is given by the pseudo code in Algorithm 2; differences to standard IC3 (cf. [23, Fig. 6]) are highlighted in red. Intuitively, Strengthen<sup>H</sup> attempts to recursively resolve all obligations until either both frame-safety and relative inductivity are restored for *all* frames or it detects a *potential counterexample* justifying why it is unable to do so. We first consider an execution where the latter does not arise:

*Example 3.* We zoom in on Example 2: Prior to the second iteration, we have created the following three frames assigning values to the states s0, s5:

$$F\_0 = \langle 0, 0, 0, 0, 1 \rangle, \qquad F\_1 = \langle 5/9, 1, 1, 1, 1 \rangle, \qquad \text{and} \qquad F\_2 = \mathbf{1}.$$

To keep track of unresolved obligations (i, s, δ), Strengthen<sup>H</sup> employs a priority queue Q which pops obligations with minimal frame index i first. Our first step is to ensure frame-safety of <sup>F</sup>2, i.e., alter <sup>F</sup><sup>2</sup> so that <sup>F</sup>2[s0] <sup>≤</sup> <sup>5</sup>/9; we thus initialize the queue Q with the initial obligation (2, s0, <sup>5</sup>/9) (l. 1). To do so, we check whether updating F2[s0] to <sup>5</sup>/<sup>9</sup> would invalidate relative inductivity (l. 6). This is indeed the case:

$$\Phi(F\_1)[s\_0] = \tfrac{1}{2} \cdot F\_1[s\_1] + \tfrac{1}{2} \cdot F\_1[s\_2] = 1 \not\le \tfrac{5}{9}.$$

To restore relative inductivity, Strengthen<sup>H</sup> spawns one new obligation for each relevant successor of s0. These have to be resolved before retrying to resolve the old obligation.<sup>5</sup>

*In contrast to standard* IC3*, spawning obligations involves finding suitable probabilities* δ (l. 7). In our example this means we have to spawn two obligations (1, s<sub>1</sub>, δ<sub>1</sub>) and (1, s<sub>2</sub>, δ<sub>2</sub>) such that 1/2 · δ<sub>1</sub> + 1/2 · δ<sub>2</sub> ≤ 5/9. There are *infinitely many choices* for δ<sub>1</sub> and δ<sub>2</sub> satisfying this inequality. Assume some heuristic H chooses δ<sub>1</sub> = 11/18 and δ<sub>2</sub> = 1/2; we push obligations (1, s<sub>1</sub>, 11/18), (1, s<sub>2</sub>, 1/2), and (2, s<sub>0</sub>, 5/9) (ll. 8, 9). In the next iteration, we first pop obligation (1, s<sub>1</sub>, 11/18) (l. 3) and find that it can be resolved without violating relative inductivity (l. 6). Hence, we set F<sub>1</sub>[s<sub>1</sub>] to 11/18 (l. 11); no new obligation is spawned. Obligation (1, s<sub>2</sub>, 1/2) is resolved analogously; the updated frame is F<sub>1</sub> = (5/9, 11/18, 1/2, 1). Thereafter, our initial obligation (2, s<sub>0</sub>, 5/9) can be resolved; relative inductivity is restored for F<sub>0</sub>, F<sub>1</sub>, F<sub>2</sub>. Hence, Strengthen<sup>H</sup> returns true together with the updated frames.
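As a quick check of the numbers in Example 3 (pure arithmetic, no further assumptions), the probabilities chosen by the heuristic meet the required bound with equality:

$$\tfrac{1}{2} \cdot \tfrac{11}{18} + \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{11}{36} + \tfrac{9}{36} = \tfrac{20}{36} = \tfrac{5}{9}.$$

Any other pair with a weighted sum of at most 5/9 would satisfy the inequality as well; for instance, choosing both values as 5/9 also yields exactly 5/9.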

**Strengthen***<sup>H</sup>* is Sound. Let us briefly discuss why Algorithm 2 meets the specification of a sound implementation of Strengthen<sup>H</sup> (Definition 4): First, we observe that Algorithm 2 alters the frames, and thus potentially invalidates the PrIC3 invariants, only in l. 11 by resolving an obligation (i, s, δ) with Φ(F<sub>i−1</sub>)[s] ≤ δ (due to the check in l. 6).

Let F⟨s ↦ δ⟩ denote the frame F in which F[s] is set to δ, i.e.,

$$F\langle s \mapsto \delta \rangle [s'] = \begin{cases} \delta, & \text{if } s'=s, \\ F[s'], & \text{otherwise}. \end{cases}$$

Indeed, resolving obligation (i, s, δ) in l. 11 lowers the values assigned to state s to at most <sup>δ</sup> *without* invalidating the PrIC3 invariants:

Lemma 3. *Let* (i, s, δ) *be an obligation and* F<sub>0</sub>, ..., F<sub>i</sub>*, for* i > 0*, be frames with* Φ(F<sub>i−1</sub>)[s] ≤ δ*. Then* PrIC3Inv(F<sub>0</sub>, ..., F<sub>i</sub>) *implies*

$$\mathsf{PrIC3Inv}\left(F\_0\langle s \mapsto \delta \rangle, \ldots, F\_i\langle s \mapsto \delta \rangle\right).$$

Crucially, the precondition of Definition <sup>4</sup> guarantees that all PrIC3 invariants except frame safety hold initially. Since these invariants are never invalidated due to Lemma 3, Algorithm 2 is a sound implementation of Strengthen<sup>H</sup> if it restores frame safety whenever it returns true, i.e., once it leaves the loop with an empty obligation queue Q (ll. 12–13). Now, an obligation (i, s, δ) is only popped from Q in l. 3. As (i, s, δ) is added to Q upon reaching l. 9, the size of Q can only ever be reduced (without returning false) by resolving (i, s, δ) in l. 11. Hence, Algorithm 2 does not return true unless it restored frame safety by resolving, amongst all other obligations, the initial obligation (k, s*<sup>I</sup>* , λ). Consequently:

<sup>5</sup> We assume that the set Succs(s, a) = {s′ ∈ S | P(s, a, s′) > 0} of *relevant* a*-successors* of state s is returned in some arbitrary, but fixed order.

Lemma 4. *Procedure* Strengthen<sup>H</sup> *is sound, i.e., it satisfies the specification in Definition 4.*

Theorem 3. *Procedure* PrIC3<sup>H</sup> *is sound, i.e., satisfies the specification in Theorem 1.*

We remark that, analogously to standard IC3, resolving an obligation in l. 11 may be accompanied by *generalization*. That is, we attempt to update the values of multiple states at once. Generalization is, however, highly non-trivial in a probabilistic setting. We discuss three possible approaches to generalization in Sect. 6.2.

**Strengthen***<sup>H</sup>* Terminates. We now show that Strengthen<sup>H</sup> as in Algorithm 2 terminates. The only scenario in which Strengthen<sup>H</sup> may not terminate is if it keeps spawning obligations in l. 9. Let us thus look closer at how obligations are spawned: Whenever we detect that resolving an obligation (i, s, δ) would violate relative inductivity for some action a (l. 6), we first need to update the values of the successor states <sup>s</sup>1,...,s*<sup>n</sup>* <sup>∈</sup> Succs(s, a) in frame <sup>i</sup>−1, i.e., we push the obligations (i−1, s1, δ1),...,(i−1, s*n*, δ*n*) which have to be resolved first (ll. 7–9). It is noteworthy that, for a TS, a single action leads to a single successor state <sup>s</sup>1. Algorithm <sup>2</sup> employs a heuristic <sup>H</sup> to determine the probabilities required for pushing obligations (l. 7). Assume for an obligation (i, s, δ) that the check in l. 6 yields <sup>∃</sup><sup>a</sup> <sup>∈</sup> Act (s):Φ*<sup>a</sup>* (F*<sup>i</sup>*−1) [s] > δ. Then <sup>H</sup> takes <sup>s</sup>, <sup>a</sup>, <sup>δ</sup> and reports some probability δ*<sup>j</sup>* for every a-successor s*<sup>j</sup>* of s. However, an arbitrary heuristic of type <sup>H</sup>: <sup>S</sup> <sup>×</sup> Act <sup>×</sup> [0, 1] <sup>→</sup> [0, 1]<sup>∗</sup> may lead to non-terminating behavior: If <sup>δ</sup>1,...,δ*<sup>n</sup>* <sup>=</sup> <sup>F</sup>*<sup>i</sup>*−1[s1],...F*<sup>i</sup>*−1[s*n*], then the heuristic has no effect. It is thus natural to require that an *adequate* heuristic H yields probabilities such that the check <sup>Φ</sup>*<sup>a</sup>* (F*<sup>i</sup>*−1) [s] > δ in l. 6 cannot succeed twice for the *same obligation* (i, s, δ) and *same action* a. Formally, this is guaranteed by the following:

Definition 5. *Heuristic* H *is* adequate *if the following triple is valid (for any frame* F*):*

$$\begin{aligned} &\{\, \mathsf{Succs}(s,a) = s\_1, \dots, s\_n \,\} \\ &\delta\_1, \dots, \delta\_n \leftarrow \mathcal{H}(s,a,\delta) \\ &\{\, \Phi\_a \left( F \langle s\_1 \mapsto \delta\_1 \rangle \dots \langle s\_n \mapsto \delta\_n \rangle \right) [s] \le \delta \,\} \end{aligned}$$

Details regarding our implementation of heuristic H are found in Sect. 6.1.
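To make the adequacy requirement concrete, the following Python sketch evaluates the postcondition of Definition 5 for a single state and a single frame. It is an illustration only: the successor distribution `SUCCS`, the function names, and both heuristics are hypothetical, and the heuristics receive the frame as an extra argument (unlike the type of H in the text) so that the "echo" heuristic, which merely returns the current frame values, can be expressed. Passing this check for one frame does not establish adequacy, which quantifies over all frames.

```python
from fractions import Fraction as Fr

# Hypothetical data: a non-bad state s with one action and two successors.
SUCCS = {"t1": Fr(1, 2), "t2": Fr(1, 2)}      # P(s, a, t)

def phi_a(frame):
    """Phi_a(F)[s] for the non-bad state s."""
    return sum((p * frame[t] for t, p in SUCCS.items()), Fr(0))

def satisfies_postcondition(heuristic, frame, delta):
    """Check Phi_a(F<s1 -> d1>...<sn -> dn>)[s] <= delta for this one frame."""
    updated = dict(frame)
    updated.update(zip(SUCCS, heuristic(frame, delta)))
    return phi_a(updated) <= delta

def echo_heuristic(frame, delta):
    """Returns the current frame values, so it has no effect on the frame."""
    return [frame[t] for t in SUCCS]

def constant_heuristic(frame, delta):
    """Assigns delta to every successor; the weighted sum is then at most delta."""
    return [delta for _ in SUCCS]

frame = {"t1": Fr(1), "t2": Fr(1)}
print(satisfies_postcondition(echo_heuristic, frame, Fr(5, 9)))
print(satisfies_postcondition(constant_heuristic, frame, Fr(5, 9)))
```

The echo heuristic leaves the violated check in l. 6 violated, which is exactly the non-terminating behavior the adequacy requirement is meant to exclude.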

For an adequate heuristic, attempting to resolve an obligation (i, s, δ) (ll. 3–11) either succeeds after spawning it at most |Act(s)| times or Strengthen<sup>H</sup> returns false. By a similar argument, attempting to resolve an obligation (i > 0, s, \_) leads to at most Σ<sub>a∈Act(s)</sub> |{s′ ∈ S | P(s, a, s′) > 0}| other obligations of the form (i−1, s′, \_). Consequently, the total number of obligations spawned by Algorithm 2 is bounded. Since Algorithm 2 terminates if all obligations have been resolved (l. 12) and each of its loop iterations either returns false, spawns obligations, or resolves an obligation, we conclude:

Lemma 5. StrengthenH(F0, ..., F*k*) *terminates for every adequate heuristic* <sup>H</sup>*.*

Recovery Statement 3. *Let* H *be adequate. Then for qualitative reachability (*<sup>λ</sup> = 0*), all obligations spawned by* Strengthen<sup>H</sup> *as in Algorithm <sup>2</sup> are of the form* (i, s, 0)*.*

**Strengthen***<sup>H</sup>* returns **false**. There are two cases in which Strengthen<sup>H</sup> fails to restore the PrIC3 invariants and returns false. The first case (the left disjunct of l. 4) is that we encounter an obligation for frame F0. Resolving such an obligation would inevitably violate *initiality*; analogously to standard IC3, we thus return false.

The second case (the right disjunct of l. 4) is that we encounter an obligation (i, s, δ) for a bad state s ∈ B with a probability δ < 1 (though, obviously, all s ∈ B have probability 1). Resolving such an obligation would inevitably prevent us from restoring *relative inductivity*: If we updated F<sub>i</sub>[s] to δ, we would have Φ(F<sub>i−1</sub>)[s] = 1 > δ = F<sub>i</sub>[s]. Notice that, in contrast to standard IC3, this second case *can* occur in PrIC3:

*Example 4.* Assume we have to resolve an obligation (i, s<sub>3</sub>, 1/2) for the MDP in Fig. 1. This involves spawning obligations (i−1, s<sub>4</sub>, δ<sub>1</sub>) and (i−1, s<sub>5</sub>, δ<sub>2</sub>), where s<sub>5</sub> is a bad state, such that 1/3 · δ<sub>1</sub> + 2/3 · δ<sub>2</sub> ≤ 1/2. Even for δ<sub>1</sub> = 0, this is only possible if δ<sub>2</sub> ≤ 3/4 < 1.
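The bound on δ<sub>2</sub> in Example 4 follows directly from the inequality, using only that δ<sub>1</sub> ≥ 0:

$$\tfrac{1}{3} \cdot \delta\_1 + \tfrac{2}{3} \cdot \delta\_2 \le \tfrac{1}{2} \;\Longrightarrow\; \tfrac{2}{3} \cdot \delta\_2 \le \tfrac{1}{2} - \tfrac{1}{3} \cdot \delta\_1 \le \tfrac{1}{2} \;\Longrightarrow\; \delta\_2 \le \tfrac{3}{4} < 1.$$

Hence every admissible choice spawns an obligation for the bad state s<sub>5</sub> with a probability below 1, which triggers the right disjunct of l. 4.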

**Strengthen***<sup>H</sup>* Cannot Prove Unsafety. If standard IC3 returns false, it proves unsafety by constructing a counterexample, i.e., *a single path from the initial state to a bad state*. If PrIC3 returns false, there are two possible reasons: *Either* the MDP is indeed unsafe, *or* the heuristic H at some point selected probabilities in a way such that Strengthen<sup>H</sup> is unable to restore the PrIC3 invariants (even though the MDP might in fact be safe). Strengthen<sup>H</sup> thus only returns a *potential* counterexample which either proves unsafety or indicates that our heuristic was inappropriate.

Counterexamples in our case consist of subsystems rather than a single path (see Challenge <sup>2</sup> and Sect. 5). Strengthen<sup>H</sup> hence returns the set Q.touched() of all states that eventually appeared in the obligation queue. This set is a conservative approximation, and optimizations as in [1] may be beneficial. Furthermore, in the qualitative case, our potential counterexample subsumes the counterexamples constructed by standard IC3:

Recovery Statement 4. *Let* H<sup>0</sup> *be the adequate heuristic mapping every state to* 0*. For qual. reachability (*λ = 0*), if success* = false *is returned by* Strengthen<sup>H</sup><sup>0</sup> (F0, ..., F*k*)*, then* Q.*touched*() *contains a path from the initial to a bad state.*<sup>6</sup>

<sup>6</sup> Q.touched() might be restricted to only contain this path by some simple adaptions.

    Data: global MDP M, set of bad states B, threshold λ
    Result: true iff Prmax(sI ⊨ ♦B) ≤ λ
    1  Ω ← Initialize(); touched ← {sI};
    2  do
    3      H ← CreateHeuristic(Ω);
    4      safe, subsystem ← PrIC3H(); if safe then return true;
    5      if CheckRefutation(subsystem) then return false;
    6      touched ← Enlarge(touched, subsystem);
    7      Ω ← Refine(Ω, touched);
    8  while touched ≠ S;
    9  return Ω(sI) ≤ λ

Algorithm 3: PrIC3: The outermost loop dealing with possibly imprecise heuristics

#### 5 Dealing with Potential Counterexamples

Recall that our core algorithm PrIC3<sup>H</sup> is incomplete for a fixed heuristic H: It cannot give a conclusive answer whenever it finds a potential counterexample for two possible reasons: Either the heuristic H turned out to be inappropriate or the MDP is indeed unsafe. The idea to overcome the former is to call PrIC3<sup>H</sup> finitely often in an outer loop that generates new heuristics until we find an appropriate one: If PrIC3<sup>H</sup> still does not report safety of the MDP, then it is indeed unsafe. We do not blindly generate new heuristics, but use the potential counterexamples returned by PrIC3<sup>H</sup> to refine the previous one.

Let us consider the procedure PrIC3 in Algorithm 3, which wraps our core algorithm PrIC3<sup>H</sup>, in more detail: First, we create an *oracle* Ω: S → [0, 1] which (roughly) *estimates* the probability of reaching B for every state. A *perfect oracle* would yield *precise* maximal reachability probabilities, i.e., Ω(s) = Pr<sup>max</sup>(s ⊨ ♦B) for every state s. We construct oracles by user-supplied methods (highlighted in blue). Examples of implementations of all user-supplied methods in Algorithm 3 are discussed in Sect. 7.

Assuming the oracle is good, but not perfect, we construct an adequate heuristic H selecting probabilities based on the oracle<sup>7</sup> for all successors of a given state. There are various options. The simplest is to pass the oracle values through unchanged. A version that is more robust against noise in the oracle is discussed in Sect. 6. We then invoke PrIC3<sup>H</sup>. If PrIC3<sup>H</sup> reports safety, the MDP is indeed safe by the soundness of PrIC3<sup>H</sup>.

Check Refutation. If PrIC3<sup>H</sup> does not report safety, it reports a subsystem that hints at a *potential* counterexample. Formally, this subsystem is a subMDP of states that were 'visited' during the invocation of Strengthen<sup>H</sup>.

Definition 6 (subMDP). *Let* M = (S, s<sub>I</sub>, Act, P) *be an MDP and let* S′ ⊆ S *with* s<sub>I</sub> ∈ S′*. We call* M<sub>S′</sub> = (S′, s<sub>I</sub>, Act, P′) *the* subMDP induced by M and S′*, where for all* s, s′ ∈ S′ *and all* a ∈ Act*, we have* P′(s, a, s′) = P(s, a, s′)*.*

<sup>7</sup> We thus assume that heuristic <sup>H</sup> invokes the oracle whenever it needs to guess some probability.

A subMDP M<sub>S′</sub> may be substochastic, where the missing probability mass never reaches a bad state. Definition 1 is thus relaxed: For all states s ∈ S′, we require that Σ<sub>s′∈S′</sub> P′(s, a, s′) ≤ 1. If the subsystem is unsafe, we can conclude that the original MDP M is also unsafe.

Lemma 6. *If* M′ *is a subMDP of* M *and* M′ *is unsafe, then* M *is also unsafe.*

The role of CheckRefutation is to establish whether the subsystem is indeed a true counterexample or a spurious one. Formally, CheckRefutation should ensure:

$$\{\, \text{true} \,\} \quad \textit{res} \leftarrow \mathsf{CheckRefutation}\left(\textit{subsystem}\right) \quad \{\, \textit{res} = \text{true} \iff \mathcal{M}\_{\textit{subsystem}} \text{ unsafe} \,\}$$

Again, PrIC3 is backward compatible in the sense that a single fixed heuristic is always sufficient when reasoning about reachability (λ = 0).

Recovery Statement 5. *For qualitative reachability (*λ = 0*) and the heuristic* H<sup>0</sup> *from Recovery Statement 4,* PrIC3 *invokes its core* PrIC3<sup>H</sup> *exactly* once*.*

This statement is true, as PrIC3<sup>H</sup> returns either *safe* or a subsystem containing a path from the initial state to a bad state. In the latter case, CheckRefutation detects that the subsystem is indeed a counterexample which cannot be spurious in the qualitative setting.

We remark that the procedure CheckRefutation invoked in l. 5 is a classical fallback; it runs an (alternative) model checking algorithm, e.g., solving the set of Bellman equations, for the subsystem. In the worst case, i.e., for S′ = S, we thus solve exactly our problem statement. Empirically (Table 1), we observe that for reasonable oracles the procedure CheckRefutation is invoked on significantly smaller subMDPs. However, in the worst case the subMDP must include *all* paths of the original MDP and thus coincides with it.
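One simple way to approximate such a fallback is value iteration on the subsystem, sketched below in Python. This is a swapped-in technique chosen for brevity, not the authors' CheckRefutation: value iteration from below only yields lower bounds on the maximal reachability probability, so a returned `True` certifies that the subsystem is unsafe, whereas `False` after finitely many rounds proves nothing. An exact procedure (e.g., solving the Bellman equations or the LP) would be needed to meet the full specification. The chain `P`, the state names, and the function names are hypothetical, and every state is assumed to have at least one action.

```python
from fractions import Fraction as Fr

# Hypothetical toy Markov chain; the subsystem below omits s2 and sink.
P = {
    "s0":   {"a": {"s1": Fr(1, 2), "s2": Fr(1, 2)}},
    "s1":   {"a": {"bad": Fr(1, 2), "sink": Fr(1, 2)}},
    "s2":   {"a": {"sink": Fr(1)}},
    "bad":  {"a": {"bad": Fr(1)}},
    "sink": {"a": {"sink": Fr(1)}},
}
BAD = {"bad"}

def lower_bound_reach(states, s_init, steps):
    """Lower bound on Pr^max(s_init |= <>B) in the subMDP induced by `states`.

    Transitions leaving `states` are dropped, which makes the subMDP
    substochastic; the dropped mass contributes nothing to the bound.
    """
    x = {s: Fr(1) if s in BAD else Fr(0) for s in states}
    for _ in range(steps):
        x = {
            s: Fr(1) if s in BAD else max(
                sum((p * x[t] for t, p in dist.items() if t in states), Fr(0))
                for dist in P[s].values())
            for s in states
        }
    return x[s_init]

def refutes(states, s_init, lam, steps=100):
    """True only if the subsystem alone already exceeds the threshold lam."""
    return lower_bound_reach(states, s_init, steps) > lam

subsystem = {"s0", "s1", "bad"}
print(lower_bound_reach(subsystem, "s0", 2), refutes(subsystem, "s0", Fr(1, 5)))
```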

Refine Oracle. Whenever we have neither proven the MDP safe nor unsafe, we refine the oracle to prevent generating the same subsystem in the next invocation of PrIC3<sup>H</sup>. To ensure termination, oracles should only be refined finitely often. That is, we need some progress measure. The set *touched* overapproximates all counterexamples encountered in some invocation of PrIC3<sup>H</sup> and we propose to use its size as the progress measure. While there are several possibilities to update *touched* through the user-defined procedure Enlarge (l. 6), every implementation should hence satisfy {true} *touched*′ ← Enlarge(*touched*, \_) {|*touched*′| > |*touched*|}. Consequently, after finitely many iterations, the oracle is refined with respect to all states. In this case, we may as well rely on solving the characteristic LP problem:

Lemma 7. *The algorithm* PrIC3 *in Algorithm 3 is sound and complete if* Refine(Ω, S) *returns a perfect oracle* Ω *(where* S *is the set of all states).*

Weaker assumptions on Refine are possible, but are beyond the scope of this paper. Moreover, the above lemma does not rely on the abstract concept that heuristic <sup>H</sup> provides suitable probabilities after finitely many refinements.<sup>8</sup>

<sup>8</sup> One could of course now also create a heuristic that is trivial for a perfect oracle and invoke PrIC3<sup>H</sup> with the heuristic for the perfect oracle, but there really is no benefit in doing so.
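The control flow of the outer loop in Algorithm 3 can be sketched in Python as follows. This is an illustration under stated assumptions: the function `pric3` and its parameter names are hypothetical, all user-supplied procedures are passed in as callables, the oracle is a dictionary from states to probabilities, and the stubs in the demo are made up purely to exercise the loop (the core never proves safety and the refutation check never succeeds, so the loop runs until every state has been touched).

```python
from fractions import Fraction as Fr

def pric3(states, s_init, lam, initialize, create_heuristic,
          core, check_refutation, enlarge, refine):
    """Sketch of the outer loop; all callables are user-supplied assumptions."""
    oracle = initialize()
    touched = {s_init}
    while True:                                  # do ... while touched != S
        heuristic = create_heuristic(oracle)
        safe, subsystem = core(heuristic)        # one run of the core algorithm
        if safe:
            return True
        if check_refutation(subsystem):          # true counterexample found
            return False
        touched = enlarge(touched, subsystem)    # must strictly grow
        oracle = refine(oracle, touched)
        if touched == set(states):
            break
    # Only reached once the oracle was refined for all states; the answer is
    # correct if that oracle is perfect (cf. Lemma 7).
    return oracle[s_init] <= lam

# Demo with made-up stubs.
EXACT = {"s0": Fr(1, 4), "s1": Fr(1, 2), "s2": Fr(0)}
STATES = set(EXACT)

def demo(lam):
    return pric3(
        STATES, "s0", lam,
        initialize=lambda: {s: Fr(1, 2) for s in STATES},
        create_heuristic=lambda oracle: oracle,
        core=lambda heuristic: (False, {"s0"}),
        check_refutation=lambda subsystem: False,
        enlarge=lambda touched, sub: touched | {min(STATES - touched)},
        refine=lambda oracle, touched: {**oracle, **{s: EXACT[s] for s in touched}},
    )

print(demo(Fr(1, 2)), demo(Fr(1, 5)))
```

Because `enlarge` strictly increases the size of `touched`, the loop body executes at most |S| − 1 times in this sketch.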

## 6 Practical PrIC3

So far, we gave a conceptual view on PrIC3, but now take a more practical stance. We detail important features of effective implementations of PrIC3 (based on our empirical evaluation). We first describe an implementation without generalization, and then provide a prototypical extension that allows for three variants of generalization.

## 6.1 A Concrete PrIC3 Instance Without Generalization

*Input.* We describe MDPs using the Prism guarded command language<sup>9</sup>, exemplified in Fig. 3. States are described by valuations to m (integer-valued) program variables vars, and outgoing actions are described by commands of the form

[] guard -> prob1 : update1 + ... + probk : updatek

If a state satisfies guard, then the corresponding action with k branches exists; probabilities are given by probi, the successor states are described by updatei, see Fig. 3b.

Fig. 3. Illustrative Prism-style probabilistic guarded command language example

*Encoding.* We encode frames as logical formulae. Updating frames then corresponds to adding conjuncts, and checking for relative inductivity is a satisfiability call. Our encoding is as follows: States are assignments to the program variables, i.e., States = ℤ<sup>m</sup>. We use various uninterpreted functions, to which we give semantics using appropriate constraints. Frames<sup>10</sup> are represented by uninterpreted functions Frame : States → ℝ satisfying Frame(s) = d implies F[s] ≥ d. Likewise, the Bellman operator is an uninterpreted function Phi : States → ℝ such that Phi(s) = d implies Φ(F)[s] ≥ d. Finally, we use Bad : States → 𝔹 with Bad(s) iff s ∈ B.

Among the appropriate constraints, we ensure that variables are within their range, bound the values for the frames, and enforce Phi(s)=1 for <sup>s</sup> <sup>∈</sup> <sup>B</sup>. We encode the guarded commands as exemplified by this encoding of the first command in Fig. 3:

$$\begin{aligned} &\forall s \in \mathsf{States} \colon \neg \mathsf{Bad}\left(s\right) \land s[c] < 20\\ &\implies \mathsf{Phi}\left(s\right) = 0.1 \cdot \mathsf{Frame}\left(\left(s[c], 1\right)\right) + 0.9 \cdot \mathsf{Frame}\left(\left(s[c] + 1, s[f]\right)\right). \end{aligned}$$

<sup>9</sup> Preprocessing ensures a single thread (module) and no deadlocks.

<sup>10</sup> In each operation, we only consider a single frame.

In our implementation, we optimize the encoding. We avoid the uninterpreted functions by applying an adapted Ackermann reduction. We avoid universal quantifiers by first observing that we always ask whether a single state is not inductive, and then unfolding the guarded commands in the constraints that describe a frame. That encoding grows linearly in the size of the maximal out-degree of the MDP, and is in the quantifier-free fragment of linear arithmetic (QFLRIA).

*Heuristic.* We select probabilities δ*<sup>i</sup>* by solving the following optimization problem, with variables <sup>x</sup>*i*, *range*(x*i*) <sup>∈</sup> [0, 1], for states <sup>s</sup>*<sup>i</sup>* <sup>∈</sup> Succs(s, a) and oracle Ω<sup>11</sup>.

$$\begin{aligned} \text{minimize} \quad & \sum\_{\substack{i=1 \\ s\_i \notin B}}^{n} \left| \frac{x\_i}{\sum\_{j=1}^{n} x\_j} - \frac{\Omega(s\_i)}{\sum\_{j=1}^{n} \Omega(s\_j)} \right| \\ \text{s.t.} \quad & \delta = \sum\_{i=1}^{n} P(s, a, s\_i) \cdot \begin{cases} 1, & \text{if } s\_i \in B, \\ x\_i, & \text{else}. \end{cases} \end{aligned}$$

The constraint ensures that, if the values x*<sup>i</sup>* correspond to the actual reachability probabilities from s*i*, then the reachability from state s is exactly δ. A constraint stating that <sup>δ</sup> <sup>≥</sup> ... would also be sound, but we choose equality as it preserves room between the actual probability and the threshold we want to show. Finally, the objective function aims to preserve the ratio between the suggested probabilities.
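For intuition, the optimization problem has a simple closed-form solution in a special case, sketched below in Python. This is not the authors' implementation, which solves the optimization problem itself: the sketch merely rescales the oracle values of the non-bad successors by a common factor so that the equality constraint holds. If no successor is bad and the factor is positive, the ratios are preserved exactly and the objective is 0; with bad successors the sketch still satisfies the constraint but makes no optimality claim. The function name `rescale_oracle` and its dictionary-based interface are hypothetical, and the sketch gives up (returns `None`) whenever the rescaled values would leave [0, 1].

```python
from fractions import Fraction as Fr

def rescale_oracle(probs, oracle, bad, delta):
    """probs: successor -> P(s, a, successor); oracle: successor -> Omega value.

    Returns successor -> delta_i with sum_i P_i * delta_i == delta, where bad
    successors are fixed to 1, or None if plain rescaling does not work.
    """
    free = [t for t in probs if t not in bad]
    omega = {t: Fr(oracle[t]) for t in free}
    if free and max(omega.values()) == 0:
        omega = {t: Fr(1, 2) for t in free}       # convention of footnote 11
    budget = delta - sum((probs[t] for t in probs if t in bad), Fr(0))
    weight = sum((probs[t] * omega[t] for t in free), Fr(0))
    if budget < 0 or weight == 0:
        return None                               # bad successors alone exceed delta
    factor = budget / weight
    scaled = {t: factor * omega[t] for t in free}
    if any(v > 1 for v in scaled.values()):
        return None                               # would leave the range [0, 1]
    return {t: Fr(1) if t in bad else scaled[t] for t in probs}

half = Fr(1, 2)
print(rescale_oracle({"s1": half, "s2": half},
                     {"s1": Fr(11, 18), "s2": half}, set(), Fr(5, 9)))
```

With the oracle values 11/18 and 1/2, the rescaling factor is 1 and the sketch reproduces the probabilities used in Example 3; with a uniform oracle it returns 5/9 for both successors.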

*Repushing and Breaking Cycles. Repushing* [23] is an essential ingredient of both standard IC3 and PrIC3. Intuitively, we avoid opening new frames and spawning obligations that can be deduced from current information. Since repushing generates further obligations in the current frame, its implementation requires that the detection of Zeno-behavior has to be moved from PrIC3<sup>H</sup> into the Strengthen<sup>H</sup> procedure. Therefore, we track the histories of the obligations in the queue. Furthermore, once we detect a cycle we first try to adapt the heuristic H locally to overcome this cyclic behavior instead of immediately giving up. This local adaption reduces the number of PrIC3<sup>H</sup> invocations.

*Extended Queue.* In contrast to standard IC3, the obligation queue might contain entries that vary only in their δ entry. In particular, if the MDP is not a tree, it may occur that the queue contains both (i, s, δ) and (i, s, δ′) with δ > δ′. Then, (i, s, δ′) can be safely pruned from the queue. Similarly, after handling (i, s, δ), if some fresh obligation (i, s, δ′ > δ) is pushed to the queue, it can be substituted with (i, s, δ). To efficiently operationalize these observations, we keep an additional mapping which remains intact over multiple invocations of Strengthen<sup>H</sup>. We furthermore employed some optimizations for Q.touched() aiming to track potential counterexamples better. After refining the heuristic, one may want to reuse frames or the obligation queue, but empirically this leads to performance degradation as the values in the frames are inconsistent with behavior suggested by the heuristic.
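The substitution of fresh obligations by already handled ones can be sketched with a small lookup table in Python. The class `ObligationCache` and its method names are hypothetical, and the sketch covers only the second observation (replacing a fresh obligation (i, s, δ′) by a previously handled (i, s, δ) with δ < δ′); how the actual implementation organizes its mapping is not specified in the text.

```python
from fractions import Fraction as Fr

class ObligationCache:
    """Remembers, per (frame index, state), the smallest delta handled so far."""

    def __init__(self):
        self._handled = {}                 # (i, s) -> smallest handled delta

    def record(self, i, s, delta):
        """Call after obligation (i, s, delta) has been handled."""
        key = (i, s)
        self._handled[key] = min(delta, self._handled.get(key, delta))

    def normalize(self, i, s, delta):
        """Obligation to push in place of a fresh (i, s, delta)."""
        known = self._handled.get((i, s))
        if known is not None and known < delta:
            return (i, s, known)           # the stricter, already handled bound
        return (i, s, delta)

cache = ObligationCache()
cache.record(1, "s1", Fr(1, 2))
print(cache.normalize(1, "s1", Fr(3, 4)))
```

Keeping the cache object alive across calls mirrors the remark that the mapping remains intact over multiple invocations.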

<sup>11</sup> If max Ω(s<sup>j</sup> )=0, we assume ∀j.Ω(s<sup>j</sup> )=0.5. If δ = 0, we omit rescaling to allow x<sup>j</sup> = 0.

## 6.2 Concrete PrIC3 with Generalization

So far, frames are updated by changing single entries whenever we resolve obligations (i, s, δ), i.e., we add conjunctions of the form F<sub>i</sub>[s] ≤ δ. Equivalently, we may add a constraint ∀s′ ∈ S : F<sub>i</sub>[s′] ≤ p<sub>{s}</sub>(s′) with p<sub>{s}</sub>(s) = δ and p<sub>{s}</sub>(s′) = 1 for all s′ ≠ s.

Generalization in IC3 aims to update a set <sup>G</sup> (including <sup>s</sup>) of states in a frame rather than a single one without invalidating relative inductivity. In our setting, we thus consider a function <sup>p</sup>*<sup>G</sup>* : <sup>G</sup> <sup>→</sup> [0, 1] with <sup>p</sup>*G*(s) <sup>≤</sup> <sup>δ</sup> that assigns (possibly different) probabilities to all states in G. Updating a frame then amounts to adding the constraint

$$\forall s \in \mathsf{States} \colon s \in G \Longrightarrow \mathsf{Frame}\left(s\right) \leq p\_G(s).$$

Standard IC3 generalizes by iteratively "dropping" a variable, say <sup>v</sup>. The set <sup>G</sup> then consists of all states that do not differ from the fixed state s except for the value of v. <sup>12</sup> We take the same approach by iteratively dropping program variables. Hence, p*<sup>G</sup>* effectively becomes a mapping from the value s[v] to a probability. We experimented with four types of functions p*<sup>G</sup>* that we describe for Markov chains. The ideas are briefly outlined below; details are beyond the scope of this paper.

*Constant p<sub>G</sub>.* Setting all s ∈ G to δ is straightforward but empirically not helpful.

*Linear Interpolation.* We use a linear function p<sub>G</sub> that interpolates two points. The first point (s[v], δ) is obtained from the obligation (i, s, δ). For a second point, consider the following: Let Com be the unique<sup>13</sup> command active at state s. Among all states in G that are enabled in the guard of Com, we take the state s′ in which s′[v] is maximal<sup>14</sup>. The second point for interpolation is then (s′[v], Φ(F<sub>i−1</sub>)[s′]). If the relative inductivity fails for p<sub>G</sub>, we do not generalize with p<sub>G</sub>, but may attempt to find other functions.
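The two-point construction can be sketched as follows. The function name `linear_p_G` and the clamping of the result to [0, 1] are assumptions made for this illustration; the text does not say how values outside the unit interval are treated.

```python
from fractions import Fraction

def linear_p_G(point_a, point_b):
    """Return the linear function through two (value of v, probability) points."""
    (x0, y0), (x1, y1) = point_a, point_b
    if x0 == x1:
        raise ValueError("interpolation points must differ in the value of v")
    slope = Fraction(y1 - y0) / (x1 - x0)

    def p_G(value):
        # Assumption: clamp to [0, 1] so that the result is a probability.
        return min(Fraction(1), max(Fraction(0), y0 + slope * (value - x0)))

    return p_G

# First point from a toy obligation with s[v] = 2 and delta = 1/4,
# second point from a toy state with s'[v] = 10 and one-step value 3/4.
p_G = linear_p_G((2, Fraction(1, 4)), (10, Fraction(3, 4)))
midpoint_bound = p_G(6)
```

Whether the resulting p<sub>G</sub> is relative inductive still has to be checked separately; the sketch only builds the candidate function.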

*Polynomial Interpolation.* Rather than linearly interpolating between two points, we may interpolate using more than two points. In order to properly fit these points, we can use a higher-degree polynomial. We select these points using counterexamples to generalization (CTGs): We start as above with linear interpolation. However, if p<sub>G</sub> is not relative inductive, the SMT solver yields a model with state s′ ∈ G and probability δ′, with s′ violating relative inductivity, i.e., Φ(F<sub>i−1</sub>)[s′] > δ′. We call (s′, Φ(F<sub>i−1</sub>)[s′]) a CTG, and (s′[v], Φ(F<sub>i−1</sub>)[s′]) is then a further interpolation point, and we repeat.
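The CTG-driven loop can be sketched with exact Lagrange interpolation. Everything below is illustrative: `find_ctg` is a hypothetical stand-in for the SMT query, `max_points` is an assumed cut-off, and the toy oracle replaces the real relative-inductivity check.

```python
from fractions import Fraction

def lagrange(points):
    """Return the unique polynomial of degree < len(points) through `points`."""
    def p(x):
        total = Fraction(0)
        for j, (xj, yj) in enumerate(points):
            term = Fraction(yj)
            for k, (xk, _) in enumerate(points):
                if k != j:
                    term *= Fraction(x - xk, xj - xk)
            total += term
        return total
    return p

def generalize_with_ctgs(points, find_ctg, max_points=5):
    """Add interpolation points until `find_ctg` reports no counterexample.

    find_ctg(p) returns a new point (s'[v], one-step value at s') if p is
    violated somewhere, or None if p passes the (hypothetical) check.
    """
    points = list(points)
    while len(points) <= max_points:
        p = lagrange(points)
        ctg = find_ctg(p)
        if ctg is None:
            return p
        points.append(ctg)
    return None  # give up; the caller may fall back to another scheme

def toy_find_ctg(p):
    # Toy oracle: the "true" one-step values follow a concave quadratic.
    for x in range(11):
        true_value = 1 - Fraction((10 - x) ** 2, 100)
        if true_value > p(x):
            return (x, true_value)
    return None

p_G = generalize_with_ctgs([(0, Fraction(0)), (10, Fraction(1))], toy_find_ctg)
```

In the toy run, the initial linear candidate is refuted at x = 1, and the quadratic through the three points then matches the toy values exactly, so the loop stops after one CTG.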

Technically, when generalizing using nonlinear constraints, we use real-valued arithmetic with a branch-and-bound-style approach to ensure integer values.

<sup>12</sup> Formally, G = {s′ | for all v′ ∈ vars \ {v} : s′(v′) = s(v′)}.

<sup>13</sup> Recall that we have a Markov chain consisting of a single module.

<sup>14</sup> This implicitly assumes that v is increased. Adaptions are possible.

*Hybrid Interpolation.* In polynomial interpolation, we generate high-degree polynomials and add them to the encoding of the frame. In subsequent invocations, reasoning efficiency is drastically harmed by these high-degree polynomials. Instead, we soundly approximate p<sub>G</sub> by a piecewise linear function, and use these constraints in the frame.

#### 7 Experiments

We assess how PrIC3 may contribute to the state of the art in probabilistic model checking. We present an early empirical evaluation showing that PrIC3 is feasible. We see ample room for further improvements of the prototype.

*Implementation.* We implemented a prototype<sup>15</sup> of PrIC3 based on Sect. 6.1 in Python. The input is represented using efficient data structures provided by the model checker Storm. We use an incremental instance of Z3 [47] for each frame, as suggested in [23]. A solver for each frame is important to reduce the time spent on pushing the large frame-encodings. The optimization problem in the heuristic is also solved using Z3. All previously discussed generalizations (none, linear, polynomial, hybrid) are supported.

*Oracle and Refinement.* We support the (pre)computation of four different types of oracles for the initialization step in Algorithm 3: (1) A perfect oracle solving *exactly* the Bellman equations. Such an oracle is unrealistic, but interesting from a conceptual point. (2) Relative frequencies by recording all visited states during simulation. This idea is a naïve simplification of Q-learning. (3) Model checking with decision diagrams (DDs) and few value iterations. Often, a DD representation of a model can be computed fast, and the challenge is in executing sufficient value iterations. We investigate whether doing few value iterations yields a valuable oracle (and covers states close to bad states). (4) Solving a (pessimistic) LP from BFS partial exploration. States that are not expanded are assumed bad. Roughly, this yields oracles covering states close to the initial states.

To implement Refine (cf. Algorithm 3, l. 7), we create an LP for the subMDP induced by the touched states. For states whose successors are not in the touched states, we add a transition to B labeled with the oracle value as probability. The solution of the resulting LP updates the entries corresponding to the touched states.

For Enlarge (cf. Algorithm 3, l. 6), we take the union of the subsystem and the touched states. If this does not change the set of touched states, we also add its successors.
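One possible reading of Enlarge is sketched below on explicit state sets. The function name, the `successors` callback, and the interpretation of "does not change" as "the union adds no new state to the subsystem" are assumptions, not details taken from the prototype.

```python
def enlarge(subsystem, touched, successors):
    """Grow the subsystem by the touched states, or by successors if that adds nothing.

    successors: function mapping a state to an iterable of its successor states.
    """
    enlarged = set(subsystem) | set(touched)
    if enlarged == set(subsystem):
        # No new touched states: extend by one step so that the set still grows.
        for s in list(enlarged):
            enlarged.update(successors(s))
    return enlarged

# Toy chain 0 -> 1 -> 2 -> 3.
chain = {0: [1], 1: [2], 2: [3], 3: []}
first = enlarge({0}, {0, 1}, lambda s: chain[s])
second = enlarge({0, 1}, {1}, lambda s: chain[s])
```

The first call grows the subsystem by the touched state 1; the second call finds nothing new among the touched states and therefore adds the successor 2.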

*Setup.* We evaluate the run time and memory consumption of our prototype of PrIC3. We choose a combination of models from the literature (BRP [21], ZeroConf [18]) and some structurally straightforward variants of grids (chain, double chain; see [11, Appendix A]). Since our prototype lacks the sophisticated preprocessing applied by many state-of-the-art model checkers, it is more sensitive to the precise encoding of a model, e.g., the number of commands. To account for this, we generated new encodings for all models. All experiments were conducted on a single core of an Intel® Xeon® Platinum 8160 processor. We use a 15 min time limit and report TO if it is exceeded. Memory is limited to 8 GB; we report MO if it is exceeded. Apart from the oracle, all parameters of our prototype remain fixed over all experiments. To give an impression of the run times, we compare our prototype with both the explicit (Storm<sub>sparse</sub>) and DD-based (Storm<sub>dd</sub>) engine of the model checker Storm 1.4, which compared favourably in QComp [29].

<sup>15</sup> The prototype is available open-source from https://github.com/moves-rwth/PrIC3.

*Results.* In Table 1, we present the run times for various invocations of our prototype and Oracle 4<sup>16</sup>. In particular, we give the model name and the number of (non-trivial) states in the particular instance, and the (estimated) actual probability to reach B. For each model, we consider multiple thresholds λ. The next 8 columns report on the four variants of PrIC3 with varying generalization schemes. Besides the scheme with the run times, we report for each scheme the number of states of the largest (last) subsystem that CheckRefutation in Algorithm 3, l. 5 was invoked upon (column |sub|). The last two columns report on the run times for Storm that we provide for comparison. In each row, we mark with purple MDPs that are unsafe, i.e., PrIC3 refutes these MDPs for the given threshold λ. We highlight the best configurations of PrIC3.

*Discussion.* Our experiments give a mixed picture on the performance of our implementation of PrIC3. On the one hand, Storm significantly outperforms PrIC3 on most models. On the other hand, PrIC3 is capable of reasoning about huge, yet simple, models with up to 10<sup>12</sup> states that Storm is unable to analyze within the time and memory limits. There is more empirical evidence that PrIC3 may complement the state-of-the-art:

First, *the size of thresholds matters*. Our benchmarks show that—at least without generalization—more "wiggle room" between the precise maximal reachability probability and the threshold generally leads to a better performance. PrIC3 may thus prove bounds for large models where a precise quantitative reachability analysis is out of scope.

Second, PrIC3 *enjoys the benefits of bounded model checking*. In some cases, e.g., ZeroConf for λ = 0.45, PrIC3 refutes very fast as it does not need to build the whole model.

Third, if PrIC3 proves the safety of the system, it does so without relying on checking large subsystems in the CheckRefutation step.

Fourth, *generalization is crucial*. Without generalization, PrIC3 is unable to prove safety for any of the considered models with more than 10<sup>3</sup> states. With generalization, however, it can prove safety for very large systems and thresholds close to the exact reachability probability. For example, it proved safety of the Chain benchmark with 10<sup>12</sup> states for a threshold of 0.4, which differs from the exact reachability probability by 0.006.

<sup>16</sup> We explore min{|S|, 5000} states using BFS and Storm.

Table 1. Empirical results. Run times are in seconds; time out = 15 min.

Fifth, *there is no best generalization*. There is no clear winner out of the considered generalization approaches. Linear generalization always performs worse than the other ones. In fact, it performs worse than no generalization at all. The hybrid approach, however, occasionally has the edge over the polynomial approach. This indicates that more research is required to find suitable generalizations.

In [11, Appendix A], we also compare the additional three types of oracles (1–3). We observed that only few oracle refinements are needed to prove *safety*; for small models at most one refinement was sufficient. However, this does not hold if the given MDP is unsafe. For example, DoubleChain with λ = 0.15 and Oracle 2 requires 25 refinements.

#### 8 Conclusion

We have presented PrIC3—the first truly probabilistic, yet conservative, extension of IC3 to quantitative reachability in MDPs. Our theoretical development is accompanied by a prototypical implementation and experiments. We believe there is ample space for improvements including an in-depth investigation of suitable oracles and generalizations.

#### References



## **Synthesis**

## **Good-Enough Synthesis**

Shaull Almagor<sup>1(✉)</sup> and Orna Kupferman<sup>2</sup>

<sup>1</sup> Department of Computer Science, Technion, Haifa, Israel, shaull@cs.technion.ac.il

<sup>2</sup> School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel, orna@cs.huji.ac.il

**Abstract.** We introduce and study *good-enough synthesis* (ge-synthesis) – a variant of synthesis in which the system is required to satisfy a given specification ψ only when it interacts with an environment for which a satisfying interaction exists. Formally, an input sequence x is *hopeful* if there exists some output sequence y such that the induced computation x ⊗ y satisfies ψ, and a system ge-realizes ψ if it generates a computation that satisfies ψ on all hopeful input sequences. ge-synthesis is particularly relevant when the notion of correctness is *multi-valued* (rather than Boolean), and thus we seek systems of the highest possible quality, and when synthesizing *autonomous systems*, which interact with unexpected environments and are often only expected to do their best.

We study ge-synthesis in Boolean and multi-valued settings. In both, we suggest and solve various definitions of ge-synthesis, corresponding to different ways a designer may want to take hopefulness into account. We show that in all variants, ge-synthesis is not computationally harder than traditional synthesis, and can be implemented on top of existing tools. Our algorithms are based on careful combinations of nondeterministic and universal automata. We augment systems that ge-realize their specifications by monitors that provide satisfaction information. In the multi-valued setting, we provide both a worst-case analysis and an expectation-based one, the latter corresponding to an interaction with a stochastic environment.

#### **1 Introduction**

*Synthesis* is the automated construction of a system from its specification: given a specification ψ, typically by a linear temporal logic (LTL) formula over sets I and O of input and output signals, the goal is to construct a finite-state system that satisfies ψ [9,20]. At each moment in time, the system reads an assignment, generated by the environment, to the signals in I, and responds with an assignment to the signals in O. Thus, with every input sequence, the system associates an output sequence. The system *realizes* ψ if ψ is satisfied in all the interactions of the system, with all environments [5].

S. Almagor: Supported by the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 837327.

O. Kupferman: Supported in part by the Israel Science Foundation, grant No. 2357/19.

In practice, the requirement to satisfy the specification in all environments is often too strong. Accordingly, it is common to add assumptions on the behavior of the environment. An assumption may be direct, say given by an LTL formula that restricts the set of possible input sequences [8], less direct, say a bound on the size of the environment [13] or other resources it uses, or conceptual, say rationality from the side of the environment, which may have its own objectives [11,14]. We introduce and study a new type of relaxation of the requirement to satisfy the specification in all environments. The idea behind the relaxation is that if an environment is such that no system can interact with it in a way that satisfies the specification, then we cannot expect our system to succeed. In other words, the system has to satisfy the specification only when it interacts with environments in which this mission is possible. This is particularly relevant when synthesizing *autonomous systems*, which interact with unexpected environments and often replace human behavior, which is only expected to be *good enough* [28], and when the notion of correctness is multi-valued (rather than Boolean), and thus we seek *high-quality* systems.

Before we explain the relaxation formally, let us consider a simple example, and we start with the Boolean setting. Let I = {*req*} and O = {*grant*}. Thus, the system receives requests and generates grants. Consider the specification ψ = GF(*req* ∧ *grant*) ∧ GF(¬*req* ∧ ¬*grant*). Clearly, ψ is not realizable, as an input sequence need not satisfy GF*req* or GF¬*req*. However, a system that always generates a grant upon (and only upon) a request, ge*-realizes* ψ, in the sense that for every input sequence, if there is some interaction with it with which ψ is satisfied, then our system generates such an interaction.

Formally, we model a system by a strategy f : (2<sup>I</sup>)<sup>+</sup> → 2<sup>O</sup>, which given an input sequence x = i<sub>0</sub> · i<sub>1</sub> · i<sub>2</sub> ··· ∈ (2<sup>I</sup>)<sup>ω</sup>, generates an output sequence f(x) = f(i<sub>0</sub>) · f(i<sub>0</sub> · i<sub>1</sub>) · f(i<sub>0</sub> · i<sub>1</sub> · i<sub>2</sub>) ··· ∈ (2<sup>O</sup>)<sup>ω</sup>, inducing the computation x ⊗ f(x) = (i<sub>0</sub> ∪ f(i<sub>0</sub>)) · (i<sub>1</sub> ∪ f(i<sub>0</sub> · i<sub>1</sub>)) · (i<sub>2</sub> ∪ f(i<sub>0</sub> · i<sub>1</sub> · i<sub>2</sub>)) ··· ∈ (2<sup>I∪O</sup>)<sup>ω</sup>, obtained by "merging" x and f(x). In traditional realizability, a system realizes ψ if ψ is satisfied in all environments. Formally, for all input sequences x ∈ (2<sup>I</sup>)<sup>ω</sup>, the computation x ⊗ f(x) satisfies ψ. For our new notion, we first define when an input sequence x ∈ (2<sup>I</sup>)<sup>ω</sup> is *hopeful*, namely there is an output sequence y ∈ (2<sup>O</sup>)<sup>ω</sup> such that the computation x ⊗ y satisfies ψ. Then, a system ge*-realizes* ψ if ψ is satisfied in all interactions with hopeful input sequences. Formally, for all x ∈ (2<sup>I</sup>)<sup>ω</sup>, if x is hopeful, then the computation x ⊗ f(x) satisfies ψ.
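To make the notation concrete, the following sketch represents letters as Python frozensets and a strategy as a function on non-empty input prefixes. It uses the request/grant system from the example above; the function names are illustrative, and only a finite prefix of the infinite computation is produced.

```python
def grant_on_request(prefix):
    """Strategy f: output {grant} iff the latest input letter contains req."""
    return frozenset({"grant"}) if "req" in prefix[-1] else frozenset()

def computation_prefix(strategy, inputs):
    """Return the first len(inputs) letters of the computation x (merged with) f(x)."""
    return [
        letter | strategy(inputs[: j + 1])  # i_j united with f(i_0 ... i_j)
        for j, letter in enumerate(inputs)
    ]

REQ, IDLE = frozenset({"req"}), frozenset()
word = computation_prefix(grant_on_request, [REQ, IDLE, REQ])
```

Each letter of `word` is the union of the input letter at that position and the output that the strategy chooses after reading the prefix up to and including that position.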

Since LTL is Boolean, synthesized systems are correct, but there is no reference to their quality. This is a crucial drawback, as designers would be willing to give up manual design only if automated-synthesis algorithms return systems of comparable quality. Addressing this challenge, researchers have developed quantitative specification formalisms. For example, in [4], the input to the synthesis problem includes also Mealy machines that grade different realizing systems. In [1], the specification formalism is the multi-valued logic LTL[F], which augments LTL with quality operators. The satisfaction value of an LTL[F] formula is a real value in [0, 1], where the higher the value, the higher the quality in which the computation satisfies the specification. The quality operators in F can prioritize and weight different scenarios. The synthesis algorithm for LTL[F] seeks systems with a highest possible satisfaction value. One can consider either a worst-case approach, where the satisfaction value of a system is the satisfaction value of its computation with the lowest satisfaction value [1], or a stochastic approach, where it is the expected satisfaction value, given a distribution of the inputs [2].

Consider, for example, an acceleration controller of an autonomous car. Normally, the car should maintain a relatively constant speed. However, in order to optimize travel time, if a long stretch of road is visible and is identified as low-risk, the car should accelerate. Conversely, if an obstacle or some risk factor is identified, the car should decelerate. Clearly, the car cannot accelerate and decelerate at the same time. We capture this desired behavior with the following LTL[F] formula over the inputs {*safe*, *obs*} and outputs {*acc*, *dec*}:

$$\psi = \mathsf{G}(\mathit{safe} \to (\mathit{acc} \oplus_{\frac{2}{3}} \mathsf{X}\mathit{acc})) \land \mathsf{G}(\mathit{obs} \to (\mathit{dec} \oplus_{\frac{3}{4}} \mathsf{X}\mathit{dec})) \land \mathsf{G}(\neg(\mathit{acc} \land \mathit{dec})).$$

Thus, in order to get satisfaction value 1, each detection of a safe stretch should be followed by an acceleration during two transactions, with a preference to the first (by the semantics of the weighted average ⊕<sub>λ</sub> operator, the satisfaction value of *safe* → (*acc* ⊕<sub>2/3</sub> X*acc*) is 1 when *safe* is followed by two *acc*s, 2/3 when it is followed by one *acc*, and 1/3 if it is followed by one *acc* with a delay), and each detection of an obstacle should be followed by a deceleration during two transactions, with a (higher) preference to the first. Clearly, ψ is not realizable with satisfaction value 1, as for some input sequences, namely those with simultaneous or successive occurrences of *safe* and *obs*, it is impossible to respond with the desired patterns of acceleration or deceleration. Existing frameworks for synthesis cannot handle this challenge. Indeed, we do not want to add an assumption about *safe* and *obs* occurring far apart. Rather, we want our autonomous car to behave in an optimal way also in problematic environments, and we want, when we evaluate the quality of a car, to take into account the challenge posed by the environment. This is exactly what high-quality ge-synthesis does: for each input sequence, it requires the synthesized car to obtain the maximal satisfaction value that is possible for that input sequence.
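The values quoted above can be reproduced with a small calculation, assuming the usual reading of the weighted average as λ times the value of the first operand plus (1 − λ) times the value of the second. The helper name `weighted_average` is illustrative.

```python
from fractions import Fraction

def weighted_average(lam, first, second):
    """Satisfaction value of a weighted average with weight lam on the first operand."""
    return lam * first + (1 - lam) * second

lam = Fraction(2, 3)
both = weighted_average(lam, 1, 1)       # acc now and acc in the next step
only_now = weighted_average(lam, 1, 0)   # acc now only
only_next = weighted_average(lam, 0, 1)  # acc with a delay
neither = weighted_average(lam, 0, 0)    # no acceleration at all
```

With λ = 2/3, the four cases evaluate to 1, 2/3, 1/3, and 0, matching the preference for an immediate acceleration.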

We show that in the Boolean setting, ge-synthesis can be reduced to synthesis of LTL with quantification of atomic propositions [26]. Essentially, ge-synthesis of ψ amounts to synthesis of (∃O.ψ) → ψ. We show that by carefully switching between nondeterministic and universal automata, we can solve the ge-synthesis problem in doubly-exponential time, thus it is not harder than traditional synthesis. Also, our algorithm is *Safraless*, thus no determinization and parity games are needed [15,17].

A drawback of ge-synthesis is that we do not actually know whether the specification is satisfied. We describe two ways to address this drawback. The first goes beyond providing satisfaction information and enables the designer to partition the specification into a *strong* component, which is guaranteed to be satisfied in all environments, and a *weak* component, which is guaranteed to be satisfied only in hopeful ones. The second way augments ge-realizing systems by "satisfaction indicators". For example, we show that when a system is lucky to interact with an environment that generates a prefix of an input sequence such that, when combined with a suitable prefix of an output sequence, the specification becomes realizable, then ge-synthesis guarantees that the system indeed responds with a suitable prefix of an output sequence. Moreover, it is easy to add to the system a monitor that detects such prefixes, thus indicating that the specification is going to be satisfied in all environments. Additional monitors we suggest detect prefixes after which the satisfaction becomes valid or unsatisfiable.

We continue to the quantitative setting. We parameterize hope by a satisfaction value v ∈ [0, 1] and say that an input sequence x ∈ (2<sup>I</sup>)<sup>ω</sup> is *v-hopeful* for an LTL[F] formula ψ if an interaction with it can generate a computation that satisfies ψ with value at least v. Formally, there is an output sequence y ∈ (2<sup>O</sup>)<sup>ω</sup> such that [[x ⊗ y, ψ]] ≥ v, where for a computation w ∈ (2<sup>I∪O</sup>)<sup>ω</sup>, we use [[w, ψ]] to denote the satisfaction value of ψ in w. As we elaborate below, while the basic idea of ge-synthesis, namely "input sequences with a potential to high quality should realize this potential" is as in the Boolean setting, there are several ways to implement this idea.

We start with a worst-case approach. There, a strategy f : (2<sup>I</sup>)<sup>+</sup> → 2<sup>O</sup> ge-realizes an LTL[F] formula ψ if for all input sequences x ∈ (2<sup>I</sup>)<sup>ω</sup>, if x is v-hopeful, then [[x ⊗ f(x), ψ]] ≥ v. The requirement can be applied to a threshold value or to all values v ∈ [0, 1]. For example, our autonomous car controller has to achieve satisfaction value 1 in roads with no simultaneous or successive occurrences of *safe* and *obs*, and value 3/4 in roads that violate the latter only with some *obs* followed by *safe*. We then argue that the situation is similar to that of *high-quality assume guarantee synthesis* [3], where richer relations between a quantitative assumption and a quantitative guarantee are of interest. In our case, the assumption is the hopefulness level of the input sequence, namely [[x, ∃O.ψ]], and the guarantee is the satisfaction value of the specification in the generated computation, namely [[x ⊗ f(x), ψ]]. When synthesizing, for example, a robot controller (e.g., vacuum cleaner) in a building, the doors to rooms are controlled by the environment, whereas the movement of the robot is controlled by the system. A measure of the performance of the robot has to take into account both the number of "hopeful rooms", namely those with an open door – a projection of this number on [0, 1] serves as the assumption, and the number of rooms cleaned – which induces the guarantee. We assume that the desired relation between the assumption and the guarantee is given by a function comb : [0, 1]×[0, 1] → [0, 1], which can capture implication, difference, or ratio.
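The following sketch shows what such comb functions could look like. The three concrete definitions are hypothetical instances chosen for illustration; they are not claimed to be the definitions used in the formal development. The last helper encodes the observation that requiring "v-hopeful implies value at least v" for every v ∈ [0, 1] amounts to comparing the guarantee with the assumption.

```python
from fractions import Fraction

def comb_implication(assumption, guarantee):
    # Hypothetical quantitative implication: high if the assumption is low
    # or the guarantee is high.
    return max(1 - assumption, guarantee)

def comb_difference(assumption, guarantee):
    # Hypothetical difference: penalize only the shortfall of the guarantee.
    return 1 - max(Fraction(0), assumption - guarantee)

def comb_ratio(assumption, guarantee):
    # Hypothetical ratio: fraction of the available hope that was realized.
    if assumption == 0:
        return Fraction(1)
    return min(Fraction(1), guarantee / assumption)

def realizes_all_levels(assumption, guarantee):
    # "v-hopeful implies value >= v" for every v collapses to this comparison.
    return guarantee >= assumption

hope, achieved = Fraction(3, 4), Fraction(1, 2)
scores = (
    comb_implication(hope, achieved),
    comb_difference(hope, achieved),
    comb_ratio(hope, achieved),
)
```

For an input with hopefulness level 3/4 on which the system achieves only 1/2, the three illustrative functions give 1/2, 3/4, and 2/3, and the all-levels requirement is violated.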

We continue with an analysis of the expected performance of the system. We do so by assuming a stochastic environment, with a known distribution on the input sequences. We introduce and study two measures for high-quality ge-synthesis in a stochastic environment. In the first, termed *expected* ge*-synthesis*, all input sequences are sampled, yet the satisfaction value in each input sequence takes its hopefulness level into account, for example by a comb function as in the assume-guarantee setting. In the second, termed *conditional expected* ge*-synthesis*, only hopeful input sequences are sampled. For both approaches, our synthesis algorithm is based on the high-quality LTL[F] synthesis algorithm of [2], which is based on an analysis of deterministic automata associated with the different satisfaction values of the LTL[F] specification. Here too, the complexity stays doubly exponential. In addition, we extend the synthesized systems with guarantees for satisfaction and monitors indicating satisfaction in various satisfaction levels.
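The difference between the two measures can be illustrated on a finite toy distribution. This is only a sketch under strong assumptions: real input sequences are infinite and the distribution is not finite, the comb function is a hypothetical choice, and conditioning on "hopefulness at least v" is one possible reading of "only hopeful input sequences are sampled".

```python
from fractions import Fraction

def expected_value(samples, comb):
    """Sum of P(x) * comb(hope(x), value(x)) over all sampled input sequences."""
    return sum(p * comb(hope, value) for p, hope, value in samples)

def conditional_expected_value(samples, v):
    """Expected value(x) restricted to inputs whose hopefulness is at least v."""
    mass = sum(p for p, hope, _ in samples if hope >= v)
    if mass == 0:
        raise ValueError("no input sequence is v-hopeful")
    return sum(p * value for p, hope, value in samples if hope >= v) / mass

# Each entry: (probability, hopefulness level, value achieved by the system).
samples = [
    (Fraction(1, 2), Fraction(1), Fraction(1)),
    (Fraction(1, 4), Fraction(3, 4), Fraction(1, 2)),
    (Fraction(1, 4), Fraction(0), Fraction(0)),
]
expected = expected_value(samples, lambda hope, value: max(1 - hope, value))
conditional = conditional_expected_value(samples, Fraction(1, 2))
```

In the toy data, the hopeless input does not lower the first measure, because the illustrative comb function rewards it fully, and it is excluded altogether from the second.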

#### **2 Preliminaries**

Consider two finite sets I and O of input and output signals, respectively. For two words x = i<sub>0</sub> · i<sub>1</sub> · i<sub>2</sub> ··· ∈ (2<sup>I</sup>)<sup>ω</sup> and y = o<sub>0</sub> · o<sub>1</sub> · o<sub>2</sub> ··· ∈ (2<sup>O</sup>)<sup>ω</sup>, we define x ⊗ y as the word in (2<sup>I∪O</sup>)<sup>ω</sup> obtained by merging x and y. Thus, x ⊗ y = (i<sub>0</sub> ∪ o<sub>0</sub>) · (i<sub>1</sub> ∪ o<sub>1</sub>) · (i<sub>2</sub> ∪ o<sub>2</sub>) ···. The definition is similar for finite x and y of the same length. For a word w ∈ (2<sup>I∪O</sup>)<sup>ω</sup>, we use w|<sub>I</sub> to denote the projection of w on I. In particular, (x ⊗ y)|<sub>I</sub> = x.

A *strategy* is a function f : (2<sup>I</sup>)<sup>+</sup> → 2<sup>O</sup>. Intuitively, f models the interaction of a system that generates in each moment in time a letter in 2<sup>O</sup> with an environment that generates letters in 2<sup>I</sup>. For an input sequence x = i<sub>0</sub> · i<sub>1</sub> · i<sub>2</sub> ··· ∈ (2<sup>I</sup>)<sup>ω</sup>, we use f(x) to denote the output sequence f(i<sub>0</sub>) · f(i<sub>0</sub> · i<sub>1</sub>) · f(i<sub>0</sub> · i<sub>1</sub> · i<sub>2</sub>) ··· ∈ (2<sup>O</sup>)<sup>ω</sup>. Then, x ⊗ f(x) ∈ (2<sup>I∪O</sup>)<sup>ω</sup> is the *computation* of f on x. Note that the environment initiates the interaction, by inputting i<sub>0</sub>. Of special interest are *finite-state strategies*, induced by finite-state transducers. Formally, an I/O*-transducer* is T = ⟨I, O, S, s<sub>0</sub>, M, τ⟩, where S is a finite set of states, s<sub>0</sub> ∈ S is an initial state, M : S × 2<sup>I</sup> → S is a transition function, and τ : S → 2<sup>O</sup> is a labelling function. For x = i<sub>0</sub> · i<sub>1</sub> · i<sub>2</sub> ··· ∈ (2<sup>I</sup>)<sup>*</sup>, let M<sup>*</sup>(x) be the state in S that T reaches after reading x. That is, M<sup>*</sup>(ε) = s<sub>0</sub> and for every j ≥ 0, we have that M<sup>*</sup>(i<sub>0</sub> · i<sub>1</sub> · i<sub>2</sub> ··· i<sub>j</sub>) = M(M<sup>*</sup>(i<sub>0</sub> · i<sub>1</sub> · i<sub>2</sub> ··· i<sub>j−1</sub>), i<sub>j</sub>). Then, T induces the strategy f<sub>T</sub> : (2<sup>I</sup>)<sup>+</sup> → 2<sup>O</sup>, where for every x ∈ (2<sup>I</sup>)<sup>+</sup>, we have that f<sub>T</sub>(x) = τ(M<sup>*</sup>(x)). We use T(x) and x ⊗ T(x) to denote the output sequence and the computation of T on x, respectively, and talk about T realizing a specification, referring to the strategy f<sub>T</sub>.

We specify on-going behaviors of reactive systems using the *linear temporal logic* LTL [19]. Formulas of LTL are constructed from a set AP of atomic propositions using the usual Boolean operators and temporal operators like G ("always"), F ("eventually"), X ("next time"), and U ("until"). Each LTL formula ψ defines a language L(ψ) = {w : w |= ψ} ⊆ (2<sup>AP</sup>)<sup>ω</sup>. We also use *automata on infinite words* for specifying and reasoning about on-going behaviors. We use automata with different branching modes (nondeterministic, where some run has to be accepting; universal, where all runs have to be accepting; and deterministic, where there is a single run) and different acceptance conditions (Büchi, co-Büchi, and parity). We use the three letter acronyms NBW, UCW, DPW, and DFW, to refer to nondeterministic Büchi, universal co-Büchi, deterministic parity, and deterministic finite word automata, respectively. Given an LTL formula ψ over AP, one can construct an NBW A<sub>ψ</sub> with at most 2<sup>O(|ψ|)</sup> states such that L(A<sub>ψ</sub>) = L(ψ) [27]. Constructing an NBW for ¬ψ and then dualizing it, results in a UCW for L(ψ), also with at most 2<sup>O(|ψ|)</sup> states. Determinization [23] then leads to a DPW for L(ψ) with at most 2<sup>2<sup>O(|ψ|)</sup></sup> states and index 2<sup>O(|ψ|)</sup>. For full definitions of LTL, automata, and their relation, see [12].

Consider an LTL formula ψ over I ∪ O. We say that ψ is *realizable* if there is a finite-state strategy f : (2<sup>I</sup>)<sup>+</sup> → 2<sup>O</sup> such that for all x ∈ (2<sup>I</sup>)<sup>ω</sup>, we have that x ⊗ f(x) |= ψ. That is, the computation of f on every input sequence satisfies ψ. We say that a word x ∈ (2<sup>I</sup>)<sup>ω</sup> is *hopeful* for ψ if there is y ∈ (2<sup>O</sup>)<sup>ω</sup> such that x ⊗ y |= ψ. Then, we say that ψ is *good-enough realizable* (ge-realizable, for short) if there is a finite-state strategy f : (2<sup>I</sup>)<sup>+</sup> → 2<sup>O</sup> such that for every x ∈ (2<sup>I</sup>)<sup>ω</sup> that is hopeful for ψ, we have that x ⊗ f(x) |= ψ. That is, if there is some output sequence whose combination with x satisfies ψ, then the computation of f on x satisfies ψ. The LTL ge-synthesis problem is then to decide whether a given LTL formula is ge-realizable, and if so, to return a transducer that ge-realizes it. Clearly, every realizable specification is ge-realizable – by the same transducer. We say that ψ is *universally satisfiable* if all input sequences are hopeful for ψ. It is easy to see that for universally satisfiable specifications, realizability and ge-realizability coincide. On the other hand, as demonstrated in Sect. 1, there are specifications that are not realizable and are ge-realizable.

*Example 1.* Let I = {p} and O = {q}. Consider the specification ψ = GF((Xp) ∧ q) ∧ GF((X¬p) ∧ ¬q). Clearly, ψ is not realizable, as an input sequence x ∈ (2<sup>I</sup>)<sup>ω</sup> is hopeful for ψ iff x |= GFp ∧ GF¬p. Since the system has to assign a value to q before it knows the value of Xp, it seems that ψ is also not ge-realizable. As we show below, however, the specification ψ is ge-realizable. Intuitively, it follows from the fact that hopeful input sequences consist of alternating p-blocks and (¬p)-blocks. Then, by outputting ¬q in p-blocks and outputting q in (¬p)-blocks, the system guarantees that each last position in a (¬p)-block satisfies q ∧ Xp and each last position in a p-block satisfies (¬q) ∧ X¬p. Formally, ψ is ge-realized by the transducer T = ⟨{p}, {q}, {s<sub>0</sub>, s<sub>1</sub>}, s<sub>0</sub>, M, τ⟩, where M(s<sub>0</sub>, ∅) = M(s<sub>1</sub>, ∅) = s<sub>0</sub>, M(s<sub>0</sub>, {p}) = M(s<sub>1</sub>, {p}) = s<sub>1</sub>, τ(s<sub>0</sub>) = {q}, and τ(s<sub>1</sub>) = ∅.
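The behavior of this transducer can be checked on a finite prefix of a hopeful input. The sketch below simulates T on two repetitions of a block of two (¬p)-letters followed by three p-letters and collects the positions that witness each conjunct of ψ. The helper names are illustrative, and a finite prefix can only show witnesses, not prove that they recur forever.

```python
P, EMPTY = frozenset({"p"}), frozenset()

def M(state, letter):
    # Both states move to s1 on {p} and to s0 on the empty letter.
    return "s1" if "p" in letter else "s0"

TAU = {"s0": frozenset({"q"}), "s1": frozenset()}

def run(inputs):
    """Return the prefix of the computation of T for a finite input prefix."""
    state, word = "s0", []
    for letter in inputs:
        state = M(state, letter)          # state reached after reading i_0 ... i_j
        word.append(letter | TAU[state])  # merge input letter and output letter
    return word

def positions(word, q_now, p_next):
    """Positions j with (q in w_j) == q_now and (p in w_{j+1}) == p_next."""
    return [
        j for j in range(len(word) - 1)
        if ("q" in word[j]) == q_now and ("p" in word[j + 1]) == p_next
    ]

word = run([EMPTY, EMPTY, P, P, P] * 2)
q_then_p = positions(word, True, True)            # witnesses for (Xp) and q
not_q_then_not_p = positions(word, False, False)  # witnesses for (X not p) and not q
```

The witnesses appear exactly at the last position of each (¬p)-block and of each p-block that has a successor in the prefix, as the intuition in the example predicts.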

#### **3 LTL Good-Enough Synthesis**

Recall that a strategy $f : (2^I)^+ \to 2^O$ ge-realizes an LTL formula $\psi$ if its computations on all hopeful input sequences satisfy $\psi$. Thus, for every input sequence $x \in (2^I)^\omega$, either $x \otimes y \not\models \psi$ for all $y \in (2^O)^\omega$, or $x \otimes f(x) \models \psi$. The above suggests that algorithms for solving LTL ge-synthesis involve existential and universal quantification over the behavior of output signals. The logic EQLTL extends LTL by allowing existential quantification over atomic propositions [26]. We refer here to the case where the atomic propositions are the signals in $I \cup O$, and the signals in $O$ are existentially quantified. Then, an EQLTL formula is of the form $\exists O.\psi$, and a computation $w \in (2^{I \cup O})^\omega$ satisfies $\exists O.\psi$ iff there is $y \in (2^O)^\omega$ such that $w|_I \otimes y \models \psi$. Dually, AQLTL extends LTL by allowing universal quantification over atomic propositions. We consider here formulas of the form $\forall O.\psi$, which are equivalent to $\neg\exists O.\neg\psi$. Indeed, a computation $w \in (2^{I \cup O})^\omega$ satisfies $\forall O.\psi$ iff for all $y \in (2^O)^\omega$, we have that $w|_I \otimes y \models \psi$. Note that in both the existential and universal cases, the $O$-component of $w$ is ignored. Accordingly, we sometimes interpret EQLTL and AQLTL formulas with respect to input sequences $x \in (2^I)^\omega$. Also note that both EQLTL and AQLTL increase the expressive power of LTL. For example, the EQLTL formula $\exists q.\, q \wedge X\neg q \wedge G(q \leftrightarrow XXq) \wedge G(q \to p)$ states that $p$ holds in all even positions of the computation, which cannot be specified in LTL [29].

**Theorem 1.** *The LTL* ge*-synthesis problem is 2EXPTIME-complete.*

*Proof.* We start with the upper bound. Given an LTL formula $\psi$ over $I \cup O$, we describe an algorithm that returns a transducer $T$ that ge-realizes $\psi$, or declares that no such transducer exists.

It is not hard to see that $T$ ge-realizes $\psi$ iff $T$ realizes $\varphi = \psi \vee \forall O.\neg\psi$. Indeed, an input sequence $x \in (2^I)^\omega$ is hopeful for $\psi$ iff $x \models \exists O.\psi$, and so the specification $\varphi$ requires all hopeful input sequences to satisfy $\psi$. A naive construction of an NBW for $\varphi$ involves a universal projection of the signals in $O$ in an automaton for $\neg\psi$, and results in an NBW that is doubly exponential. In order to circumvent the extra exponent, we construct an NBW $A_{\neg\varphi}$ for $\neg\varphi$, and then dualize it to get a UCW for $\varphi$, as follows.

Let $A_{\neg\psi}$ be an NBW for $L(\neg\psi)$ and $A_{\exists O.\psi}$ be an NBW for $L(\exists O.\psi)$. Thus, $A_{\exists O.\psi}$ is obtained from an NBW $A_\psi$ for $L(\psi)$ by existentially projecting its transitions on $2^I$. In more detail, if $A_\psi = \langle 2^{I \cup O}, Q, Q_0, \delta, \alpha \rangle$, then $A_{\exists O.\psi} = \langle 2^{I \cup O}, Q, Q_0, \delta', \alpha \rangle$, where for all $q \in Q$ and $\sigma \in 2^{I \cup O}$, we have $\delta'(q, \sigma) = \bigcup_{o \in 2^O} \delta(q, (\sigma \cap I) \cup o)$.
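The existential projection of the transition function can be sketched in a few lines. This is an illustration under assumptions that are not part of the paper: letters are `frozenset`s of signal names, the transition function is a dictionary from `(state, letter)` pairs to sets of successor states (missing entries mean no successors), and the names `powerset` and `project_outputs` are hypothetical.

```python
from itertools import combinations

def powerset(signals):
    """All subsets of a set of signal names, as frozensets."""
    names = sorted(signals)
    return [frozenset(c) for r in range(len(names) + 1)
            for c in combinations(names, r)]

def project_outputs(delta, states, inputs, outputs):
    """Return delta' with delta'(q, sigma) = union over o of delta(q, (sigma & I) | o)."""
    input_set = frozenset(inputs)
    io_letters = powerset(set(inputs) | set(outputs))
    out_letters = powerset(outputs)
    projected = {}
    for q in states:
        for sigma in io_letters:
            i = sigma & input_set            # the O-component of sigma is ignored
            successors = set()
            for o in out_letters:            # guess an arbitrary assignment to O
                successors |= delta.get((q, i | o), set())
            projected[(q, sigma)] = successors
    return projected
```

The state space, initial states, and acceptance condition are left unchanged, so the projected automaton has the same size as the original one.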

Let $A_{\neg\varphi}$ be an NBW for the intersection of $A_{\neg\psi}$ and $A_{\exists O.\psi}$. We can define $A_{\neg\varphi}$ as the product of $A_{\neg\psi}$ and $A_{\exists O.\psi}$, possibly using the generalized Büchi acceptance condition (see Remark 1); thus its size is exponential in $\psi$. The language of $A_{\neg\varphi}$ is then $\{w \in (2^{I \cup O})^\omega : w \not\models \psi \text{ and } w \models \exists O.\psi\}$. We then solve usual synthesis for the complementing UCW. Its language is $\{w \in (2^{I \cup O})^\omega : w \models \psi \text{ or } w \models \forall O.\neg\psi\}$, as required. By [17], the synthesis problem for UCW can be solved in EXPTIME, and we are done.
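The dualization step rests on a short calculation, spelled out here as a worked derivation. It uses only De Morgan's law and the equivalence between $\forall O.\theta$ and $\neg\exists O.\neg\theta$ stated earlier in this section:

$$\begin{aligned}
\neg\varphi &= \neg(\psi \vee \forall O.\neg\psi) \\
&\equiv \neg\psi \wedge \neg\forall O.\neg\psi \\
&\equiv \neg\psi \wedge \exists O.\psi .
\end{aligned}$$

Hence the negation of the specification is a conjunction of an LTL formula and an EQLTL formula, each of which has an NBW of exponential size, and no universal projection is needed.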

The lower bound follows from the 2EXPTIME-hardness of LTL realizability [22]. The hardness proof there constructs, given a 2EXPTIME Turing machine M, an LTL formula ψ that is realizable iff M accepts the empty tape. Since all input sequences are hopeful for ψ, realizability and ge-realizability coincide, and we are done.

Note that working with a UCW not only handles the universal quantification for free but also has the advantage of a Safraless synthesis algorithm – no determinization and parity games are needed [15,17]. Also note that the algorithm we suggest in the proof of Theorem 1 can be generalized to handle specifications that are arbitrary positive Boolean combinations of EQLTL formulas.

*Remark 1* **[Products and optimizations]**. Throughout the paper, we construct products of automata whose state space is $2^{cl(\psi)}$, and states correspond to maximal consistent subsets of $cl(\psi)$, possibly in the scope of an existential quantifier of $O$. Accordingly, the product can be minimized to include only consistent pairs. Also, since traditional-synthesis algorithms, in particular the Safraless algorithms we use, can handle automata with *generalized* Büchi and co-Büchi acceptance conditions, we need only one copy of the product.

*Remark 2* **[Determinacy of the** ge**-synthesis game]**. Determinacy of games implies that in traditional synthesis, a specification $\psi$ is not I/O-realizable iff $\neg\psi$ is O/I-realizable. This is useful, for example, when we want to synthesize a transducer of a bounded size and proceed simultaneously, aiming to synthesize either a system transducer that realizes $\psi$ or an environment transducer that realizes $\neg\psi$ [17]. For ge-synthesis, simple dualization does not hold, but we do have determinacy in the sense that $(\exists O.\psi) \to \psi$ is not I/O-realizable iff $(\exists O.\psi) \wedge \neg\psi$ is O/I-realizable. Accordingly, $\psi$ is not ge-realizable iff the environment has a strategy that generates, for each output sequence $y \in (2^O)^\omega$, a hopeful input sequence $x \in (2^I)^\omega$ such that $x \otimes y \models \neg\psi$. In the full version, we formalize and study this duality further.

### **4 Guarantees in Good-Enough Synthesis**

A drawback of ge-synthesis is that we do not actually know whether the specification is satisfied. In this section we describe two ways to address this drawback. The first way goes beyond providing satisfaction information and enables the designer to partition the specification into a *strong* component, which should be satisfied in all environments, and a *weak* component, which should be satisfied only in hopeful ones. The second way augments ge-realizing transducers with flags, raised to indicate the status of the satisfaction.

#### **4.1** ge**-Synthesis with a Guarantee**

Recall that ge-realizability is suitable especially in settings where we design a system that has to do its best in all environments. ge-synthesis with a guarantee is suitable in settings where we want to make sure that some components of the specification are satisfied in all environments. Accordingly, a specification is an LTL formula $\psi = \psi_{\mathit{strong}} \wedge \psi_{\mathit{weak}}$. When we ge*-synthesize* $\psi_{\mathit{weak}}$ *with guarantee* $\psi_{\mathit{strong}}$, we seek a transducer $T$ that realizes $\psi_{\mathit{strong}}$ and ge-realizes $\psi_{\mathit{weak}}$. Thus, for all input sequences $x \in (2^I)^\omega$, we have that $x \otimes T(x) \models \psi_{\mathit{strong}}$, and if $x$ is hopeful for $\psi_{\mathit{weak}}$, then $x \otimes T(x) \models \psi_{\mathit{weak}}$.

**Theorem 2.** *The LTL* ge*-synthesis with guarantee problem is 2EXPTIME-complete.*

*Proof.* Consider an LTL formula $\psi = \psi_{\mathit{strong}} \wedge \psi_{\mathit{weak}}$ over $I \cup O$. It is not hard to see that a transducer $T$ ge-realizes $\psi_{\mathit{weak}}$ with guarantee $\psi_{\mathit{strong}}$ iff $T$ realizes $\varphi = \psi_{\mathit{strong}} \wedge ((\exists O.\psi_{\mathit{weak}}) \to \psi_{\mathit{weak}})$. We can then construct a UCW $A_\varphi$ for $L(\varphi)$ by dualizing an NBW for its negation $\neg\psi_{\mathit{strong}} \vee ((\exists O.\psi_{\mathit{weak}}) \wedge \neg\psi_{\mathit{weak}})$, which can be constructed using techniques similar to those in the proof of Theorem 1. We then proceed with standard synthesis for $A_\varphi$. Note that the approach is Safraless. Taking an empty (that is, True) guarantee, a lower bound follows from the 2EXPTIME-hardness of LTL ge-synthesis.

#### **4.2 Flags by a** ge**-Realizing Transducer**

For a language $L \subseteq (2^{I \cup O})^\omega$ and a finite word $w \in (2^{I \cup O})^*$, let $L^w = \{w' \in (2^{I \cup O})^\omega : w \cdot w' \in L\}$. That is, $L^w$ is the language of suffixes of words in $L$ that have $w$ as a prefix. We say that a word $w \in (2^{I \cup O})^*$ is *green for* $L$ if $L^w$ is realizable. Then, a word $x \in (2^I)^*$ is *green for* $L$ if there is $y \in (2^O)^*$ such that $x \otimes y$ is green for $L$. When a system is lucky to interact with an environment that generates a green input sequence, we want the system to react in a way that generates a green prefix, and then realizes the specification. Formally, we say that a strategy $f : (2^I)^+ \to 2^O$ *green realizes* $L$ if for every $x \in (2^I)^+$, if $x$ is green for $L$, then $x \otimes f(x)$ is green for $L$.<sup>1,2</sup> We say that a word $w \in (2^{I \cup O})^*$ is *light green for* $L$ if $L^w$ is universally satisfiable, thus all input sequences are hopeful for $L^w$. A word $x \in (2^I)^*$ is *light green for* $L$ if there is $y \in (2^O)^*$ such that $x \otimes y$ is light green for $L$. It is not hard to see that for ge-realizable languages, green and light green coincide. Indeed, if $L$ is universally satisfiable and ge-realizable, then $L$ is realizable.

**Theorem 3.** ge*-realizability is strictly stronger than green realizability.*

*Proof.* We first prove that every strategy $f : (2^I)^+ \to 2^O$ that ge-realizes a specification $\psi$ also green realizes $\psi$. Consider $x \in (2^I)^+$ that is green for $\psi$. By definition, there is $y \in (2^O)^+$ such that $L^{x \otimes y}$ is realizable. Then, for every $x' \in (2^I)^\omega$, there is $y' \in (2^O)^\omega$ such that $x' \otimes y' \in L^{x \otimes y}$. Hence, for every $x' \in (2^I)^\omega$, we have that $x \cdot x'$ is hopeful. Therefore, as $f$ ge-realizes $\psi$, we have that $(x \cdot x') \otimes f(x \cdot x') \models \psi$. Thus, $x \otimes f(x)$ is green, and so $f$ green realizes $\psi$.

We continue and describe a specification that is green realizable and not ge-realizable. Let $I = \{p\}$ and $O = \{q\}$. Consider the specification $\psi = G((Xp) \leftrightarrow q)$. Clearly, $\psi$ is not realizable, as the system has to commit to a value for $q$ before a value for $Xp$ is known. Likewise, no word $w \in (2^{I \cup O})^*$ is green for $\psi$, and so no finite input sequence $x \in (2^I)^*$ is green for $\psi$. Hence, every strategy (vacuously) green realizes $\psi$. On the other hand, for every input sequence $x \in (2^I)^\omega$ there is an output sequence $y \in (2^O)^\omega$ such that $x \otimes y \models \psi$. Thus, all input sequences are hopeful for $\psi$, and so synthesis and ge-synthesis coincide for $\psi$, which is therefore not ge-realizable.

Theorem 3 brings two pieces of good news. The first is that a ge-realizing transducer has the desired property of being also green realizing. The second has to do with our goal of providing the user with information about the satisfaction status, in particular raising a green flag whenever a green prefix is detected. By Theorem 3, such a flag indicates that the computation generated by our ge-realizing transducer satisfies the specification. A naive way to detect green prefixes for a specification $\psi$ is to solve the synthesis problem for $\psi$ by solving a game on top of a DPW $D_\psi$ for $\psi$. The winning positions in the game are states in $D_\psi$. By defining them as accepting states, we can obtain from $D_\psi$ a DFW for green prefixes. Then, we run this DFW in parallel with the ge-realizing transducer, and raise the green flag whenever a green prefix is detected. This, however, requires a generation of $D_\psi$ and a solution of parity games. Below we describe a much simpler way, which makes use of the fact that our transducer ge-realizes the specification.

<sup>1</sup> Note that while the definition of green realization does not refer to $\varepsilon$ directly, we have that $\varepsilon$ is green iff $L$ is realizable, in which case all $x \in (2^I)^*$ are green.

<sup>2</sup> While synthesis corresponds to finding a winning strategy for the system, green synthesis can be viewed as a subgame-perfect best-response strategy, where the system does its best in every subgame, even if it loses the overall game.

Recall that if $L$ is universally satisfiable and ge-realizable, then $L$ is realizable. Accordingly, given a transducer $T$ that ge-realizes $\psi$, we can augment it with green flags by running in parallel a DFW that detects light-green prefixes. As we argue below, constructing such a DFW only requires an application of the subset construction on top of an NBW for the existential projection of $\psi$ on $2^I$.

**Lemma 1.** *Given an LTL formula* $\psi$ *over* $I \cup O$*, we can construct a DFW* $\mathcal{S}$ *of size* $2^{2^{O(|\psi|)}}$ *such that* $L(\mathcal{S}) = \{x \in (2^I)^* : x \text{ is light green for } L(\psi)\}$*.*

*Proof.* Let $A_\psi = \langle 2^{I \cup O}, Q, \delta, Q_0, \alpha \rangle$ be an NBW for $L(\psi)$, and let $B_\psi = \langle 2^I, Q, \delta', Q_0, \alpha \rangle$ be its existential projection on $2^I$. Thus, for every $q \in Q$ and $i \in 2^I$, we have $\delta'(q, i) = \bigcup_{o \in 2^O} \delta(q, i \cup o)$. We define the DFW $\mathcal{S} = \langle 2^I, 2^Q, M, \{Q_0\}, F \rangle$, where $M$ follows the subset construction of $B_\psi$: for every $S \in 2^Q$ and $i \in 2^I$, we have $M(S, i) = \bigcup_{s \in S} \delta'(s, i)$. Then, $F = \{S \in 2^Q : L(B_\psi^S) = (2^I)^\omega\}$, where $B_\psi^S$ is $B_\psi$ with $S$ as its set of initial states. Observe that $\mathcal{S}$ rejects $x \in (2^I)^*$ iff there is $x' \in (2^I)^\omega$ such that for all $y \in (2^O)^*$ and $y' \in (2^O)^\omega$, no state in $\delta(Q_0, x \otimes y)$ accepts $x' \otimes y'$. Thus, $\mathcal{S}$ rejects $x$ iff $x$ is not light green, and accepts it otherwise. Note that the definition of $F$ involves universality checking, possibly via complementation, yet no determinization is required, and the size of $\mathcal{S}$ is $2^{2^{O(|\psi|)}}$.
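The subset construction in the proof can be sketched as follows. This is a minimal illustration, not the authors' implementation: the projected transition function is assumed to be a dictionary from `(state, input_letter)` to sets of states, only reachable subsets are explored, and the universality test for a subset is delegated to a caller-supplied predicate `is_universal`, a hypothetical stand-in for the universality check on the NBW started from that subset, which is not implemented here.

```python
def light_green_monitor(delta_proj, initial, input_letters, is_universal):
    """Build the reachable part of the subset automaton over input letters.

    A subset S is accepting when is_universal(S) holds, that is, when the
    projected NBW started from S accepts every infinite input sequence.
    """
    start = frozenset(initial)
    transitions, accepting = {}, set()
    worklist, seen = [start], {start}
    while worklist:
        subset = worklist.pop()
        if is_universal(subset):
            accepting.add(subset)
        for letter in input_letters:
            # M(S, i) is the union of delta'(s, i) over all s in S.
            successor = frozenset(t for s in subset
                                  for t in delta_proj.get((s, letter), ()))
            transitions[(subset, letter)] = successor
            if successor not in seen:
                seen.add(successor)
                worklist.append(successor)
    return start, transitions, accepting

def accepts(monitor, word):
    """Run the monitor on a finite input word; True means the flag is raised."""
    state, transitions, accepting = monitor
    for letter in word:
        state = transitions[(state, letter)]
    return state in accepting
```

Exploring only reachable subsets does not change the accepted language; it merely avoids materializing all of the subsets up front.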

Note that once we reach an accepting state in $\mathcal{S}$, we can make it an accepting loop. Indeed, once a green prefix is detected, then all prefixes that extend it are green. Accordingly, once the green flag is raised, it stays up. Also note that if an input sequence is not hopeful for $\psi$, then none of its prefixes is light green for $\psi$. The converse, however, is not true: an input sequence may be hopeful and still have no light green prefixes. For example, taking $I = \{p\}$, the input sequence $\{p\}^\omega$ is hopeful for $Gp$, yet none of its prefixes is light green, as it can be extended to an input sequence with $\neg p$.

Green flags provide information about satisfaction. Two additional flags of interest are related to safety and co-safety properties:

– A word $w \in (2^{I \cup O})^*$ is *red for* $L$ if $L^w = \emptyset$. A word $x \in (2^I)^*$ is *red for* $L$ if for all $y \in (2^O)^*$, we have that $x \otimes y$ is red for $L$. Thus, when the environment generates $x$, then no matter how the system responds, $L$ is not satisfied.

– A word $w \in (2^{I \cup O})^*$ is *blue for* $L$ if $L^w = (2^{I \cup O})^\omega$. A word $x \in (2^I)^*$ is *blue for* $L$ if there is $y \in (2^O)^*$ such that $x \otimes y$ is blue for $L$. Thus, when the environment generates $x$, the system can respond in a way that guarantees satisfaction no matter how the interaction continues.

A monitor that detects red and blue prefixes for L can be added to a transducer that ge-realizes L. As has been the case with the monitor for green prefixes, its construction is based on applying the subset construction on an NBW for L [16]. Also, once a red or blue flag is raised, it stays up. In a way analogous to green realizability, we seek a transducer that ge-realizes the specification and generates a red prefix only if all interactions generate a red prefix, and generates a blue prefix whenever this is possible. In the full version, we show that while ge-realization implies *red realization*, it may conflict with *blue realization*.

#### **5 High-Quality Good-Enough Synthesis**

ge-synthesis is of special interest when the satisfaction value of the specification is multi-valued, and we want to synthesize high-quality systems. We start by defining the multi-valued logic LTL[F], which is our multi-valued specification formalism. We then study LTL[F] ge-synthesis, first in a worst-case approach, where the satisfaction value of a transducer is the satisfaction value of its computation with the lowest satisfaction value, and then in a stochastic approach, where it is the expected satisfaction value, given a distribution of the inputs.

#### **5.1 The Logic LTL[***F***]**

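Let $AP$ be a set of Boolean atomic propositions and let $F \subseteq \{f : [0, 1]^k \to [0, 1] : k \in \mathbb{N}\}$ be a set of *quality operators*. An LTL[F] formula is one of the following:

– True, False, or $p$, for $p \in AP$.

– $f(\psi_1, \ldots, \psi_k)$, $X\psi_1$, or $\psi_1 U \psi_2$, for LTL[F] formulas $\psi_1, \ldots, \psi_k$ and a function $f \in F$.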


The semantics of LTL[F] formulas is defined with respect to infinite computations over $AP$. For a computation $w = w_0, w_1, \ldots \in (2^{AP})^\omega$ and position $j \geq 0$, we use $w^j$ to denote the suffix $w_j, w_{j+1}, \ldots$. The semantics maps a computation $w$ and an LTL[F] formula $\psi$ to the *satisfaction value* of $\psi$ in $w$, denoted $[[w, \psi]]$. The satisfaction value is in $[0, 1]$ and is defined inductively as follows.

– $[[w, \mathrm{True}]] = 1$ and $[[w, \mathrm{False}]] = 0$.

– For $p \in AP$, we have that $[[w, p]] = 1$ if $p \in w_0$, and $[[w, p]] = 0$ if $p \notin w_0$.

– $[[w, f(\psi_1, \ldots, \psi_k)]] = f([[w, \psi_1]], \ldots, [[w, \psi_k]])$.

– $[[w, X\psi_1]] = [[w^1, \psi_1]]$.

– $[[w, \psi_1 U \psi_2]] = \max_{i \geq 0} \{\min\{[[w^i, \psi_2]], \min_{0 \leq j < i} [[w^j, \psi_1]]\}\}$.

The logic LTL can be viewed as LTL[F] for F that models the usual Boolean operators. In particular, the only possible satisfaction values are 0 and 1. We abbreviate common functions as described below. Let x, y, λ ∈ [0, 1]. Then,

– $\neg x = 1 - x$

– $x \vee y = \max\{x, y\}$

– $x \wedge y = \min\{x, y\}$

– $x \to y = \max\{1 - x, y\}$

– $\triangledown_\lambda x = \lambda \cdot x$

– $x \oplus_\lambda y = \lambda \cdot x + (1 - \lambda) \cdot y$
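The inductive semantics can be turned into a small evaluator for ultimately periodic computations. The sketch below is an illustration under assumptions not made in the paper: a computation is a lasso given as a list of letters plus the index `loop_start` to which the last position loops back, formulas are nested tuples, and the names `evaluate`, `weighted`, and `disj` are hypothetical. For the until operator, the maximum over all $i \geq 0$ is computed as a least fixed point, which is reached after at most as many rounds as there are positions.

```python
def evaluate(formula, word, loop_start):
    """Return the satisfaction value of formula at every position of the lasso.

    word is a list of sets of atomic propositions; the successor of the last
    position is loop_start, so the computation is
    word[:loop_start] followed by word[loop_start:] repeated forever.
    """
    n = len(word)
    succ = lambda j: j + 1 if j + 1 < n else loop_start
    kind = formula[0]
    if kind == "const":                  # ("const", value) covers True and False
        return [formula[1]] * n
    if kind == "ap":                     # ("ap", name)
        return [1.0 if formula[1] in letter else 0.0 for letter in word]
    if kind == "f":                      # ("f", function, [subformulas])
        args = [evaluate(sub, word, loop_start) for sub in formula[2]]
        return [formula[1](*(a[j] for a in args)) for j in range(n)]
    if kind == "X":                      # ("X", subformula)
        sub = evaluate(formula[1], word, loop_start)
        return [sub[succ(j)] for j in range(n)]
    if kind == "U":                      # ("U", left, right)
        left = evaluate(formula[1], word, loop_start)
        right = evaluate(formula[2], word, loop_start)
        value = [0.0] * n
        for _ in range(n):               # round t accounts for witnesses i < t
            value = [max(right[j], min(left[j], value[succ(j)]))
                     for j in range(n)]
        return value
    raise ValueError(f"unknown operator: {kind}")

def weighted(lam, sub):
    """The unary operator that multiplies the value of sub by lam."""
    return ("f", lambda value: lam * value, [sub])

def disj(left, right):
    """Disjunction, interpreted as max."""
    return ("f", max, [left, right])
```

For instance, `evaluate(disj(weighted(0.25, ("ap", "p")), weighted(0.5, ("ap", "q"))), [{"p"}], 0)` evaluates a weighted disjunction on the computation in which only `p` holds at every position.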

The realizability problem for LTL[F] is an optimization problem: For an LTL[F] specification $\psi$ and a transducer $T$, we define the satisfaction value of $\psi$ in $T$, denoted $[[T, \psi]]$, by $\min\{[[x \otimes T(x), \psi]] : x \in (2^I)^\omega\}$, namely the satisfaction value of $\psi$ in the worst case. Then, the synthesis problem is to find, given $\psi$, a transducer that maximizes its satisfaction value. Moving to a decision problem, given $\psi$ and a threshold value $v \in [0, 1]$, we say that $\psi$ is $v$*-realizable* if there exists a transducer $T$ such that $[[T, \psi]] \geq v$, and the synthesis problem is to find, given $\psi$ and $v$, a transducer $T$ that $v$-realizes $\psi$.

For an LTL[F] formula $\psi$, let $V(\psi)$ be the set of possible satisfaction values of $\psi$ in arbitrary computations. Thus, $V(\psi) = \{[[w, \psi]] : w \in (2^{AP})^\omega\}$.

**Theorem 4** [1]. *Consider an* LTL[F] *formula* $\psi$*.*


As with LTL, we define the existential and universal extensions EQLTL[F] and AQLTL[F] of LTL[F]. Here too, we consider the case $AP = I \cup O$, with the signals in $O$ being quantified. Then, $[[w, \exists O.\psi]] = \max_{y \in (2^O)^\omega} \{[[w|_I \otimes y, \psi]]\}$ and $[[w, \forall O.\psi]] = \min_{y \in (2^O)^\omega} \{[[w|_I \otimes y, \psi]]\}$.

*Remark 3* **[On the semantics of** EQLTL[F]**]**. It is tempting to interpret an expression like $[[w, \exists O.\psi]] \leq v$ as "there exists an output sequence $y$ such that $[[w|_I \otimes y, \psi]] \leq v$". By the semantics of $\exists O.\psi$, however, $[[w, \exists O.\psi]] \leq v$ actually means that $\max_{y \in (2^O)^\omega} [[w|_I \otimes y, \psi]] \leq v$. Thus, the correct interpretation is "for all output sequences $y$, we have that $[[w|_I \otimes y, \psi]] \leq v$".

#### **5.2 LTL[***F***]** ge**-Synthesis**

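For a value $v \in [0, 1]$, we say that an input sequence $x \in (2^I)^\omega$ is $v$*-hopeful for* $\psi$ if there is $y \in (2^O)^\omega$ such that $[[x \otimes y, \psi]] \geq v$. We study two variants of LTL[F] ge-synthesis:

– In ge-synthesis *with a threshold*, we are given $\psi$ and a threshold $v \in [0, 1]$, and we seek a transducer $T$ such that for every $x \in (2^I)^\omega$, if $x$ is $v$-hopeful for $\psi$, then $[[x \otimes T(x), \psi]] \geq v$.

– In ge-synthesis *without a threshold*, we are given $\psi$, and we seek a transducer $T$ that satisfies the above condition for all values $v \in [0, 1]$.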


In the Boolean case, the two variants coincide, taking $v = 1$. Indeed, then, for every $x \in (2^I)^\omega$, if $x$ is hopeful, then $x \otimes f(x)$ has to satisfy $\psi$. We note that ge-realization with a threshold is not monotone, in the sense that decreasing the threshold need not lead to ge-realization. Indeed, the lower the threshold $v$, the more input sequences are $v$-hopeful (see Example 2). Accordingly, we do not search for a maximal threshold, and rather may ask about a desired threshold or about ge-synthesis without a threshold.

When solving the ge-synthesis problem, a naive combination of the automata construction of Theorem 4 with the projection technique of Theorem 1 corresponds to an erroneous semantics of EQLTL[F], as noted in Remark 3. Before describing our construction, it is helpful to state the correct (perhaps less intuitive) interpretation of existential and universal quantification in the quantitative setting:

**Lemma 2.** *For every* LTL[F] *formula* $\psi$ *and an input sequence* $x \in (2^I)^\omega$*, we have that* $[[x, \exists O.\psi]] = 1 - [[x, \forall O.\neg\psi]]$*. Accordingly, for every value* $v \in [0, 1]$*, we have that* $[[x, \exists O.\psi]] < v$ *iff* $[[x, \forall O.\neg\psi]] > 1 - v$*.*

*Proof.* By definition, $[[x, \exists O.\psi]] = \max_{y \in (2^O)^\omega} [[x \otimes y, \psi]] = 1 - \min_{y \in (2^O)^\omega} (1 - [[x \otimes y, \psi]]) = 1 - \min_{y \in (2^O)^\omega} [[x \otimes y, \neg\psi]] = 1 - [[x, \forall O.\neg\psi]]$. Then, $[[x, \exists O.\psi]] < v$ iff $1 - [[x, \exists O.\psi]] > 1 - v$ iff $[[x, \forall O.\neg\psi]] > 1 - v$.

Consider an LTL[F] formula $\psi$, a value $v \in [0, 1]$, and an input sequence $x \in (2^I)^\omega$. Recall that $x$ is $v$-hopeful for $\psi$ if there is $y \in (2^O)^\omega$ such that $[[x \otimes y, \psi]] \geq v$. Equivalently, $[[x, \exists O.\psi]] \geq v$. Indeed, $[[x, \exists O.\psi]] = \max_{y \in (2^O)^\omega} [[x \otimes y, \psi]]$, which is greater than or equal to $v$ iff there is $y \in (2^O)^\omega$ such that $[[x \otimes y, \psi]] \geq v$. Hence, $x$ is not $v$-hopeful for $\psi$ iff $[[x, \exists O.\psi]] < v$. Equivalently, by Lemma 2, $[[x, \forall O.\neg\psi]] > 1 - v$. Accordingly, for a strategy $f : (2^I)^+ \to 2^O$, an input sequence $x \in (2^I)^\omega$, and a value $v \in [0, 1]$, we say that $f$ is $v$*-good for* $x$ *with respect to* $\psi$ if $[[x \otimes f(x), \psi]] \geq v$ or $[[x, \forall O.\neg\psi]] > 1 - v$.

*Example 2.* Let $I = \{p\}$ and $O = \{q\}$. Consider the LTL[F] formula $\psi = \triangledown_{\frac{1}{4}} p \vee \triangledown_{\frac{1}{2}} q$. Checking for which values $v$ a strategy $f$ is $v$-good for $x$ with respect to $\psi$, we examine whether $[[x \otimes f(x), \triangledown_{\frac{1}{4}} p \vee \triangledown_{\frac{1}{2}} q]] \geq v$ or $[[x, \forall q.\neg(\triangledown_{\frac{1}{4}} p \vee \triangledown_{\frac{1}{2}} q)]] > 1 - v$. Since $\psi$ refers only to the first position in the computation, it is enough to examine $x_0$ and $f(x_0)$. For example, if $x_0 = \emptyset$ and $f(x_0) = \emptyset$, then $[[x \otimes f(x), \triangledown_{\frac{1}{4}} p \vee \triangledown_{\frac{1}{2}} q]] = 0$, $[[x, \exists q.\triangledown_{\frac{1}{4}} p \vee \triangledown_{\frac{1}{2}} q]] = \max\{0, \frac{1}{2}\} = \frac{1}{2}$, and $[[x, \forall q.\neg(\triangledown_{\frac{1}{4}} p \vee \triangledown_{\frac{1}{2}} q)]] = \min\{1, 1 - \frac{1}{2}\} = \frac{1}{2}$. Hence, $f$ is $v$-good for $x$ with respect to $\psi$ if $v = 0$ or $v > \frac{1}{2}$, thus $v \in \{0\} \cup (\frac{1}{2}, 1]$. Similarly, we have the following.

– If $x_0 = \emptyset$ and $f(x_0) = \{q\}$ then $f$ is $v$-good for $x$ when $v \in [0, 1]$.

– If $x_0 = \{p\}$ and $f(x_0) = \emptyset$ then $f$ is $v$-good for $x$ when $v \in [0, \frac{1}{4}] \cup (\frac{1}{2}, 1]$.

– If $x_0 = \{p\}$ and $f(x_0) = \{q\}$ then $f$ is $v$-good for $x$ when $v \in [0, 1]$.
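The four cases of Example 2 can be reproduced with exact rational arithmetic. The sketch below is illustrative only: it exploits the fact that the formula depends only on the first position, encodes the truth values of `p` and `q` as 0 or 1, and uses hypothetical helper names (`psi_value`, `is_v_good`, `good_thresholds`). Thresholds are sampled from a finite grid rather than the whole interval.

```python
from fractions import Fraction

def psi_value(p, q):
    """Value of the Example 2 formula at the first position (p, q in {0, 1})."""
    return max(Fraction(1, 4) * p, Fraction(1, 2) * q)

def is_v_good(p, q, v):
    """f is v-good iff the achieved value is at least v or the input is not v-hopeful."""
    achieved = psi_value(p, q)                        # value of the computation
    best = max(psi_value(p, 0), psi_value(p, 1))      # value of the existential formula
    forall_neg = 1 - best                             # universal value, by Lemma 2
    return achieved >= v or forall_neg > 1 - v

def good_thresholds(p, q, steps=8):
    """Grid thresholds k/steps for which the response q to input p is v-good."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    return [v for v in grid if is_v_good(p, q, v)]
```

With the default grid, `good_thresholds(1, 0)` contains the grid points up to 1/4 and those strictly above 1/2, in line with the non-monotonicity noted before the example.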

**Theorem 5.** *The* LTL[F] ge*-synthesis with threshold problem is 2EXPTIME-complete.*

*Proof.* We show we can adjust the upper bound described in the proof of Theorem 1 to the multi-valued setting. Given an LTL[F] formula ψ over I ∪ O and a threshold v ∈ [0, 1], we describe an algorithm that returns a transducer T that ge-realizes ψ with threshold v, or declares that no such transducer exists.

By definition, we have that $T$ ge-realizes $\psi$ with threshold $v$ if for every input sequence $x$, we have that $f_T$ is $v$-good for $x$ with respect to $\psi$. Thus, $[[x \otimes f_T(x), \psi]] \geq v$ or $[[x, \forall O.\neg\psi]] > 1 - v$. We construct a UCW whose language is $\{w \in (2^{I \cup O})^\omega : [[w, \psi]] \geq v \text{ or } [[w, \forall O.\neg\psi]] > 1 - v\}$.

Let $A^{<v}_\psi$ be an NBW for $\{w : [[w, \psi]] < v\}$ and $A^{\geq v}_{\exists O.\psi}$ be an NBW for $\{w : [[w, \exists O.\psi]] \geq v\}$. Thus, $A^{\geq v}_{\exists O.\psi}$ is obtained from an NBW $A^{\geq v}_\psi$ for $\{w : [[w, \psi]] \geq v\}$ by existentially projecting its transitions on $2^I$. By Theorem 4, both $A^{<v}_\psi$ and $A^{\geq v}_{\exists O.\psi}$ are of size exponential in $\psi$.

Let $B^v_\psi$ be an NBW for the intersection of $A^{<v}_\psi$ and $A^{\geq v}_{\exists O.\psi}$. The language of $B^v_\psi$ is then $\{w \in (2^{I \cup O})^\omega : [[w, \psi]] < v \text{ and } [[w, \exists O.\psi]] \geq v\}$. We then solve usual synthesis for the complementing UCW, whose language is $\{w \in (2^{I \cup O})^\omega : [[w, \psi]] \geq v \text{ or } [[w, \forall O.\neg\psi]] > 1 - v\}$, as required. By [17], the synthesis problem for UCW can be solved in EXPTIME.
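The passage from the NBW to the language of the complementing UCW can be written out as a two-step calculation; the second step is exactly Lemma 2 applied to $w|_I$:

$$\begin{aligned}
(2^{I \cup O})^\omega \setminus L(B^v_\psi) &= \{w : [[w, \psi]] \geq v \text{ or } [[w, \exists O.\psi]] < v\} \\
&= \{w : [[w, \psi]] \geq v \text{ or } [[w, \forall O.\neg\psi]] > 1 - v\} .
\end{aligned}$$

The strict inequality on the universal side is what distinguishes the correct construction from the naive one discussed in Remark 3.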

The lower bound follows from the 2EXPTIME-hardness of LTL ge-realizability.

**Theorem 6.** *The* LTL[F] ge*-synthesis problem is 2EXPTIME-complete.*

*Proof.* We start with the upper bound. Given an LTL[F] specification $\psi$ over $I \cup O$, we describe an algorithm that returns a transducer $T$ that ge-realizes $\psi$ or declares that no such transducer exists.

As discussed above, a transducer $T$ ge-realizes $\psi$ iff for every input sequence $x \in (2^I)^\omega$ and value $v \in [0, 1]$, we have that $f_T$ is $v$-good for $x$ with respect to $\psi$. Accordingly, we construct a UCW whose language is $\bigcap_{v \in V(\psi)} \{w \in (2^{I \cup O})^\omega : [[w, \psi]] \geq v \text{ or } [[w, \forall O.\neg\psi]] > 1 - v\}$.

For $v \in V(\psi)$, let $B^v_\psi$ be an NBW for $\{w : [[w, \psi]] < v \text{ and } [[w, \exists O.\psi]] \geq v\}$, as constructed in the proof of Theorem 5, and let $B$ be the union of $B^v_\psi$ for all $v \in V(\psi)$. By Theorem 4, the size of $V(\psi)$ is exponential in $\psi$, and thus so is the size of $B$. We then solve usual synthesis for the complementing UCW, whose language is as required. By [17], the synthesis problem for UCW can be solved in EXPTIME. The lower bound follows from the 2EXPTIME-hardness of LTL ge-realizability.

*Remark 4* **[Tuning hope down]**. The quantitative setting allows the designer to tune down "satisfaction by hopelessness": rather than synthesizing $\psi \vee \forall O.\neg\psi$, we can have a factor $\lambda$ and synthesize $\psi \vee \triangledown_\lambda \forall O.\neg\psi$. In Sect. 5.3 below we study additional ways to refer to hopefulness levels.

#### **5.3 LTL[***F***] Assume-Guarantee** ge**-Synthesis**

In Sect. 5.2, we seek a transducer $T$ such that for a given or for all values $v \in [0, 1]$ and input sequences $x \in (2^I)^\omega$, if $[[x, \exists O.\psi]] \geq v$ then $[[x \otimes T(x), \psi]] \geq v$. In this section we measure the quality of a transducer $T$ by analyzing richer relations between $[[x, \exists O.\psi]]$ and $[[x \otimes T(x), \psi]]$. The setting has the flavor of quantitative assume-guarantee synthesis [3]. There, the specification consists of a multi-valued assumption $A$, which in our case is $\exists O.\psi$, and a multi-valued guarantee $G$, which in our case is $\psi$.

There are different ways to analyze the relation between $[[x, \exists O.\psi]]$ and $[[x \otimes T(x), \psi]]$. To this end, we assume that we are given a function $\mathrm{comb} : [0, 1] \times [0, 1] \to [0, 1]$ that, given the satisfaction values of $\exists O.\psi$ and of $\psi$, outputs a combined satisfaction value. We assume that $\mathrm{comb}$ is decreasing in the first component and increasing in the second component. This corresponds to the intuition that a lower satisfaction value of $\exists O.\psi$ and a higher satisfaction value of $\psi$ both yield a higher overall score. Also, since $[[x, \exists O.\psi]] \geq [[x \otimes T(x), \psi]]$ for all $x \in (2^I)^\omega$, we assume that the first component is greater than or equal to the second. Finally, we require $\mathrm{comb}$ to be efficiently computable. Some natural $\mathrm{comb}$ functions include:


The choice of an appropriate comb function depends on the setting. Implication is in order when harsh environments may outweigh the actual performance of the system. For example, if our specification measures the uptime of a server in a cluster, then environments that cause very frequent power failures render the server unusable, as the overhead of reconnecting it outweighs its usefulness. In such a case, being shut down is better than continuously trying to reconnect, and so we give a higher satisfaction value for the server being down, which depends only on the environment. Then, as demonstrated with the cleaning robot in Sect. 1, the difference and ratio functions are fairly natural when measuring "realization of potential". We now describe a more detailed example when these measures are in order.

*Example 3.* Consider a controller for an elevator in an $n$-floor building. The environment sends requests to the controller by means of a truth assignment to $I = \{1, \ldots, n\}$, indicating the subset of floors in which the elevator is requested. Then, the controller assigns values to $O = \{\mathit{up}, \mathit{down}\}$, directing the elevator to go up, go down, or stay. The satisfaction value of the specification $\psi$ reflects the waiting time of the request with the slowest response: it is 0 when this time is more than $2n$, and is 1 when the slowest request is granted immediately. Sure enough, there is no controller that attains satisfaction value 1 on all input sequences, and so $\psi$ is not realizable with satisfaction value 1. Also, adding assumptions about the behavior of the environment is not of much interest. Using AG ge-realizability, we can synthesize a controller that behaves in an optimal way. For example, using the difference function, we measure the performance of the controller on an input sequence $x \in (2^I)^\omega$ with respect to the best possible performance on $x$. Note that such a best performance needs a look-ahead on requests yet to come, which is indeed the satisfaction value of $\exists O.\psi$ in $x$. Thus, the assumption $[[x, \exists O.\psi]]$ actually gives us the performance of a good-enough *off-line* controller. Accordingly, using the ratio function, we can synthesize a system with the best *competitive ratio* for an on-line interaction [7].
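The comb functions named in this discussion can be sketched in a few lines. The implication form max{1 − A, G} agrees with the example given in Sect. 5.4; the difference form 1 − (A − G) and the ratio form G/A (taken to be 1 when A = 0) are assumptions made here for illustration, as are all function names. The helper only checks the monotonicity requirement on a finite grid of values.

```python
from itertools import product


def comb_implication(a: float, g: float) -> float:
    """max{1 - A, G}: a hopeless input (low A) is not held against the system."""
    return max(1.0 - a, g)


def comb_difference(a: float, g: float) -> float:
    """Assumed form 1 - (A - G): penalize the gap to the best achievable value."""
    return 1.0 - (a - g)


def comb_ratio(a: float, g: float) -> float:
    """Assumed form G / A, with the convention that the ratio is 1 when A = 0."""
    return 1.0 if a == 0 else g / a


def respects_monotonicity(comb, values) -> bool:
    """Check on a finite grid that comb is decreasing in A and increasing in G.

    Only pairs with A >= G are considered, matching the assumption in the text.
    """
    pairs = [(a, g) for a, g in product(values, repeat=2) if a >= g]
    return all(
        comb(a1, g1) >= comb(a2, g2)
        for (a1, g1), (a2, g2) in product(pairs, repeat=2)
        if a1 <= a2 and g1 >= g2
    )


GRID = [0.0, 0.25, 0.5, 0.75, 1.0]
```

Under these assumed forms, an elevator run with hope A = 0.75 and achieved value G = 0.25 scores 0.5 under the difference function and 1/3 under the ratio function, while implication scores it 0.25.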

Given an LTL[F] formula $\psi$ and a function comb, we define the *AG* ge*-realization value* of $\psi$ in a transducer $T$ by $\min\{\mathrm{comb}([[x, \exists O.\psi]], [[x \otimes T(x), \psi]]) : x \in (2^I)^\omega\}$. Then, our goal in *AG* ge*-realizability* is to find, given an LTL[F] formula $\psi$ and a function comb, the maximal value $v \in [0, 1]$ such that there exists a transducer $T$ whose AG ge-realization value of $\psi$ is $v$. The *AG* ge*-synthesis* problem is then to find such a transducer.
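As a minimal sketch of this definition, consider a finite toy abstraction in which the system must commit to an output before the rest of the input is revealed. The value table, the output names, and the difference-style comb are all hypothetical; the real problem ranges over infinite words and is solved with automata, not by enumeration.

```python
# Hypothetical values [[x ⊗ y, ψ]]: the system commits to an output y
# before learning which input behaviour x the environment will exhibit.
VALUE = {
    "calm": {"fast": 1.0, "slow": 0.75},
    "busy": {"fast": 0.25, "slow": 0.75},
}
OUTPUTS = ["fast", "slow"]


def comb_difference(a, g):
    """Assumed comb: 1 - (A - G)."""
    return 1.0 - (a - g)


def hope(x):
    """[[x, ∃O.ψ]]: the best value any output could achieve on x (needs look-ahead)."""
    return max(VALUE[x].values())


def ag_value(y, comb):
    """Minimum over all inputs of comb([[x, ∃O.ψ]], [[x ⊗ y, ψ]])."""
    return min(comb(hope(x), VALUE[x][y]) for x in VALUE)


def best_output(comb):
    """The output with the maximal AG ge-realization value, with that value."""
    return max(((y, ag_value(y, comb)) for y in OUTPUTS), key=lambda p: p[1])
```

In this toy, "fast" is optimal on calm inputs but loses half of the attainable value on busy ones, so its value is 0.5, whereas "slow" never falls more than 0.25 below the off-line optimum and attains 0.75.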

We start by solving the decision version of AG ge-realizability.

**Theorem 7.** *The problem of deciding, given an* LTL[F] *formula* ψ*, a function* comb*, and a threshold* v ∈ [0, 1]*, whether there exists a transducer* T *whose AG* ge*-realization value of* ψ *is* v*, is 2EXPTIME-complete.*

*Proof.* Recall that $V(\psi)$ is the set of possible satisfaction values of $\psi$ (and hence of $\exists O.\psi$), and that by Theorem 4, we have that $|V(\psi)| \leq 2^{|\psi|}$. Let $G_v = \{\langle v_1, v_2 \rangle \in V(\psi) \times V(\psi) : \mathrm{comb}(v_1, v_2) \geq v\}$. Intuitively, $G_v$ is the set of satisfaction-value pairs $\langle [[w, \exists O.\psi]], [[w, \psi]] \rangle$ that are allowed to be generated by a transducer whose AG ge-realization value of $\psi$ is at least $v$. By definition, AG ge-realization of $\psi$ with value $v$ coincides with realization of the language $L_v = \{w \in (2^{I \cup O})^\omega : \mathrm{comb}([[w, \exists O.\psi]], [[w, \psi]]) \geq v\}$. By the monotonicity assumption on comb, for every $\langle v_1, v_2 \rangle \in G_v$, we have that $\langle v'_1, v'_2 \rangle \in G_v$ for every $v'_1 \leq v_1$ and $v'_2 \geq v_2$. Hence, we can write

$$L_v = \bigcup_{\langle v_1, v_2 \rangle \in G_v} \{w \in (2^{I \cup O})^\omega : [[w, \exists O.\psi]] \leq v_1 \text{ and } [[w, \psi]] \geq v_2\},$$

and proceed to construct an NBW for $L_v$ by taking the union of NBWs $A_{v_1,v_2}$ for all $\langle v_1, v_2 \rangle \in G_v$, each of which is the product of NBWs $A^{\leq v_1}_{\exists O.\psi}$ and $A^{\geq v_2}_{\psi}$, as in the proof of Theorem 5.

Aiming to proceed Safralessly, we can also construct a UCW for $L_v$, as follows. First, note that by the monotonicity of comb, for every $\langle v_1, v_2 \rangle \in V(\psi) \times V(\psi)$ we have that $\langle v_1, v_2 \rangle \in G_v$ iff for every $\langle u_1, u_2 \rangle \in (V(\psi) \times V(\psi)) \setminus G_v$, we have that $v_1 < u_1$ or $v_2 > u_2$. Hence,

$$L_v = \bigcap_{\langle u_1, u_2 \rangle \in (V(\psi) \times V(\psi)) \setminus G_v} \{w \in (2^{I \cup O})^\omega : [[w, \exists O.\psi]] < u_1 \text{ or } [[w, \psi]] > u_2\},$$

and so by dualization we have

$$(2^{I \cup O})^\omega \setminus L_v = \bigcup_{\langle u_1, u_2 \rangle \in (V(\psi) \times V(\psi)) \setminus G_v} \{w \in (2^{I \cup O})^\omega : [[w, \exists O.\psi]] \geq u_1 \text{ and } [[w, \psi]] \leq u_2\}.$$

Hence, we can obtain a UCW for $L_v$ by dualizing an NBW that is the union of NBWs $A_{u_1,u_2}$, for all $\langle u_1, u_2 \rangle \in (V(\psi) \times V(\psi)) \setminus G_v$, each of which is the product of NBWs $A^{\geq u_1}_{\exists O.\psi}$ and $A^{\leq u_2}_{\psi}$.

Observe that in all cases, the size of the NBW is $2^{O(|\psi|)}$. Indeed, there are at most $2^{2|\psi|}$ pairs in the union, and, by Theorem 4, the size of the NBW for each pair is $2^{O(|\psi|)}$.

The lower bound follows from the 2EXPTIME-hardness of LTL ge-realizability.
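The two decompositions used in the proof can be checked on a finite grid of satisfaction values. The sketch below computes the set of allowed pairs and its complement for a threshold, and tests membership through the union form (used for the NBW) and the intersection form (dualized for the UCW). The grid, the function names, and the choice of implication as comb are assumptions for illustration.

```python
from itertools import product


def comb_implication(a, g):
    """max{1 - A, G}; decreasing in A and increasing in G on the whole square."""
    return max(1.0 - a, g)


def split_pairs(values, comb, v):
    """Return (G_v, complement of G_v) over values x values."""
    pairs = set(product(values, repeat=2))
    good = {p for p in pairs if comb(*p) >= v}
    return good, pairs - good


def in_union_form(a, g, good):
    """Some allowed pair (v1, v2) has a <= v1 and g >= v2."""
    return any(a <= v1 and g >= v2 for v1, v2 in good)


def in_intersection_form(a, g, bad):
    """Every forbidden pair (u1, u2) has a < u1 or g > u2."""
    return all(a < u1 or g > u2 for u1, u2 in bad)


GRID = [0.0, 0.25, 0.5, 0.75, 1.0]
GOOD, BAD = split_pairs(GRID, comb_implication, 0.5)
```

For a monotone comb, both forms agree with direct evaluation of comb on every pair, which is what allows the proof to trade one automaton per threshold for a union of products of threshold automata.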

By Theorem 4, the number of possible satisfaction values for $\psi$ is at most $2^{|\psi|}$. Thus, the number of possible values for $\mathrm{comb}(A, G)$, where $A$ and $G$ are satisfaction values of $\psi$, is at most $2^{2|\psi|}$. Using binary search over the image of comb, we can use Theorem 7 to obtain the following.

**Corollary 1.** *The AG* ge*-synthesis problem can be solved in doubly-exponential time.*
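The binary search behind this corollary can be sketched as follows. The `realizable` argument is a hypothetical oracle standing in for the decision procedure of Theorem 7, read as "some transducer has AG ge-realization value at least v"; it is assumed to be downward closed in v. The value grid and the difference-style comb are likewise illustrative.

```python
def max_realizable_value(values, comb, realizable):
    """Largest value in the image of comb accepted by a downward-closed oracle.

    Returns None when even the smallest candidate is rejected.
    """
    candidates = sorted({comb(a, g) for a in values for g in values if a >= g})
    lo, hi, best = 0, len(candidates) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if realizable(candidates[mid]):
            best = candidates[mid]   # feasible: try a larger threshold
            lo = mid + 1
        else:
            hi = mid - 1             # infeasible: try a smaller threshold
    return best


def comb_difference(a, g):
    """Assumed comb: 1 - (A - G)."""
    return 1.0 - (a - g)


GRID = [0.0, 0.25, 0.5, 0.75, 1.0]
```

Since the image of comb has at most $2^{2|\psi|}$ elements, the search makes a number of oracle calls that is linear in $|\psi|$, so the doubly-exponential cost of each call dominates.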

*Remark 5* **[**ge**-synthesis as a special case of AG** ge**-synthesis].** The two approaches taken in Sect. 5.2 can be captured by an appropriate comb function. Indeed, for ge-synthesis with a threshold v, we can use the function comb with comb(A, G) = 1 if A ≥ v → G ≥ v, and comb(A, G) = 0 otherwise. For ge-synthesis (without a threshold), we can use the function comb with comb(A, G) = 1 if A = G, and comb(A, G) = 0 otherwise (recall that A ≥ G by definition). However, the solution described in Sect. 5.2 is simpler than the one described here for the general case.

## **5.4 LTL[***F***]** ge**-Synthesis in Stochastic Environments**

The setting of LTL[F] ge-synthesis studied in Sects. 5.2 and 5.3 takes the different satisfaction values into account, but is binary, in the sense that a specification is either (possibly AG) ge-realizable, or is not. In particular, in case the specification is not ge-realizable, synthesis algorithms only return "no". In this section we add a quantitative measure also to the underlying realizability question. We do so by assuming a stochastic environment, with a known distribution on the input sequences, and analyzing the expected performance of the system.

For completeness, we remind the reader of some basics of probability theory. For a comprehensive reference see, e.g., [25]. Let $\Sigma$ be a finite alphabet, and let $\nu$ be some *probability distribution* over $\Sigma^\omega$. For example, in the uniform distribution over $(2^I)^\omega$, the probability space is induced by sampling each letter with probability $2^{-|I|}$, corresponding to settings in which each signal in $I$ always holds with probability $\frac{1}{2}$. We assume $\nu$ is given by a finite Markov Decision Process (MDP). That is, $\nu$ is induced by the distribution of each letter $i \in 2^I$ at each time step, determined by a finite stochastic control process that takes into account also the outputs generated by the system (see [2] for the precise model). A *random variable* is then a function $X : \Sigma^\omega \to \mathbb{R}$. When $X$ has a finite image $V$, which is the case in our setting, its *expected value* is $\mathbb{E}[X] = \sum_{v \in V} v \cdot \Pr(X^{-1}(v))$. Intuitively, $\mathbb{E}[X]$ is the "average" value that $X$ attains. Next, consider an *event* $E \subseteq \Sigma^\omega$. The *conditional expectation of* $X$ *with respect to* $E$ is $\mathbb{E}[X \mid E] = \frac{\mathbb{E}[\mathbb{1}_E X]}{\Pr(E)}$, where $\mathbb{1}_E X$ is the random variable that assigns $X(w)$ to $w \in E$ and $0$ to $w \notin E$. Intuitively, $\mathbb{E}[X \mid E]$ is the average value that $X$ attains when restricting to words in $E$, and normalizing according to the probability of $E$ itself.
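The two definitions can be exercised on a finite stand-in for the probability space. The sketch below replaces infinite words by words of length two over a single signal, sampled uniformly; this finite truncation and all names are assumptions made only to keep the example executable, and exact rational arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import product


def expected_value(dist, X):
    """E[X] = sum over v in the image of X of v * Pr(X^{-1}(v))."""
    mass_by_value = {}
    for word, prob in dist.items():
        mass_by_value[X(word)] = mass_by_value.get(X(word), 0) + prob
    return sum(v * p for v, p in mass_by_value.items())


def conditional_expectation(dist, X, event):
    """E[X | E] = E[1_E * X] / Pr(E), defined only when Pr(E) > 0."""
    pr_event = sum(p for w, p in dist.items() if event(w))
    if pr_event == 0:
        raise ValueError("conditioning on an event of probability zero")
    restricted = lambda w: X(w) if event(w) else 0
    return expected_value(dist, restricted) / pr_event


# Uniform distribution over words of length 2; each letter says whether
# the single input signal holds (probability 1/2 per position).
WORDS = list(product([True, False], repeat=2))
UNIFORM = {w: Fraction(1, len(WORDS)) for w in WORDS}


def fraction_true(word):
    """Toy random variable: the fraction of positions in which the signal holds."""
    return Fraction(sum(word), len(word))
```

Here the expected value of `fraction_true` is 1/2, and conditioning on the event that the signal holds in the first position raises it to 3/4.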

We continue and review the *high-quality synthesis problem* [2], where the ge variant is not considered. There, the environment is assumed to be stochastic and we care for the expected satisfaction value of an LTL[F] specification in the computations of a transducer $T$, assuming some given distribution on the input sequences. Formally, let $X_{T,\psi} : (2^I)^\omega \to \mathbb{R}$ be a random variable that assigns to each sequence $x \in (2^I)^\omega$ of input signals the value $[[T(x), \psi]]$. Then, when the sequences in $(2^I)^\omega$ are sampled according to a given distribution $\nu$ of $(2^I)^\omega$, we define $[[T, \psi]]_\nu = \mathbb{E}[X_{T,\psi}]$. Since $\nu$ is fixed, we omit it from the notation and use $[[T, \psi]]$ in the following.

*Remark 6* **[Relating LTL** ge**-synthesis with stochastic** LTL[F] **synthesis].** Given an LTL formula $\psi$, we can view it as an LTL[F] formula with possible satisfaction values $\{0, 1\}$, apply to it high-quality synthesis à la [2], and find a transducer $T$ that maximizes $\mathbb{E}[X_{T,\psi}]$. An interesting observation is that if $T$ ge-realizes $\psi$, then it also maximizes $\mathbb{E}[X_{T,\psi}]$. Indeed, all input sequences that can contribute to the expected satisfaction value do so.

We introduce and study two measures for high-quality synthesis in a stochastic environment. In the first, termed *expected* ge*-synthesis*, all input sequences are sampled, yet the satisfaction value in each input sequence takes its hopefulness level into account. In the second, termed *conditional expected* ge*-synthesis*, only hopeful input sequences are sampled.

We start with expected ge-synthesis. There, instead of associating each sequence $x \in (2^I)^\omega$ with $[[x \otimes T(x), \psi]]$, we associate it with $X^{\mathrm{comb}}_{T,\psi} = \mathrm{comb}([[x, \exists O.\psi]], [[x \otimes T(x), \psi]])$, where comb is as described in Sect. 5.3, thus capturing the assume-guarantee semantics of quantitative ge-synthesis. Then, we define $[[T, \psi]]^{\mathrm{comb}} = \mathbb{E}[X^{\mathrm{comb}}_{T,\psi}]$. For example, taking comb as implication, we have $X^{\mathrm{comb}}_{T,\psi} = \max\{[[x \otimes T(x), \psi]], [[x, \forall O.\neg\psi]]\}$, capturing the semantics of $(\exists O.\psi) \to \psi$.

Then, in conditional expected ge-synthesis, we consider $\exists O.\psi$ as an environment assumption, and factor it in using conditional expectation, parameterized by a threshold $v \in [0, 1]$. Formally, let $\exists O.\psi \geq v$ denote the event $\{x \in (2^I)^\omega : [[x, \exists O.\psi]] \geq v\}$. Then, we define $[[T, \psi]]^{\mathrm{cond}(v)} = \mathbb{E}[X_{T,\psi} \mid \exists O.\psi \geq v]$, assuming the event $\exists O.\psi \geq v$ has a strictly positive probability.
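To contrast the two measures with the plain expected value, the following sketch evaluates all three on a hypothetical finite summary of the input space: each class of inputs carries a probability, a hopefulness level, and the value achieved by a fixed transducer. The numbers and names are invented for illustration, and comb is taken to be implication.

```python
from fractions import Fraction as F

# Hypothetical summary: probability of the input class, its hopefulness
# level [[x, ∃O.ψ]], and the value [[x ⊗ T(x), ψ]] achieved by T on it.
INPUTS = [
    {"p": F(1, 2), "hope": F(1), "achieved": F(3, 4)},
    {"p": F(1, 4), "hope": F(1, 2), "achieved": F(1, 2)},
    {"p": F(1, 4), "hope": F(0), "achieved": F(0)},
]


def comb_implication(a, g):
    """max{1 - A, G}, the implication example given in the text."""
    return max(1 - a, g)


def expected_plain():
    """[[T, ψ]]: every input counts with its raw satisfaction value."""
    return sum(r["p"] * r["achieved"] for r in INPUTS)


def expected_ge(comb):
    """[[T, ψ]]^comb: every input counts, adjusted by its hopefulness level."""
    return sum(r["p"] * comb(r["hope"], r["achieved"]) for r in INPUTS)


def conditional_expected_ge(v):
    """[[T, ψ]]^cond(v): only inputs with hopefulness level at least v count."""
    mass = sum(r["p"] for r in INPUTS if r["hope"] >= v)
    if mass == 0:
        raise ValueError("the event ∃O.ψ >= v has probability zero")
    return sum(r["p"] * r["achieved"] for r in INPUTS if r["hope"] >= v) / mass
```

On this data the plain expectation is 1/2, the expected ge value under implication is 3/4 (the hopeless class contributes 1 instead of 0), and conditioning on hopefulness at least 1/2 gives 2/3.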

In [2], it is shown that the high-quality synthesis problem can be solved in doubly-exponential time, also in the presence of environment assumptions. In the solution, the first step is the translation of the involved formulas to DPWs. In order to extract from [2] the results relevant to us, we describe them by means of *discrete quantitative specifications*, defined as follows. A discrete quantitative specification $\Psi$ over $I \cup O$ is given by means of a sequence $A_1, \ldots, A_n$ of DPWs, with $(2^{I \cup O})^\omega = L(A_1) \supseteq L(A_2) \supseteq \cdots \supseteq L(A_n)$, and a sequence $0 \leq v_1 < \cdots < v_n \leq 1$ of values. For every $w \in (2^{I \cup O})^\omega$, the satisfaction value of $w$ in $\Psi$, denoted $[[w, \Psi]]$, is $\max\{v_i : w \in L(A_i)\}$. We refer to $n$ as the depth of $\Psi$.
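The definition of a discrete quantitative specification can be illustrated with a small sketch in which Python predicates over finite words stand in for the DPWs. This substitution, the "grant" signal, and all names are assumptions; real DPWs accept infinite words.

```python
def satisfaction_value(word, layers):
    """[[w, Ψ]] = max{v_i : w in L(A_i)} for layers given as (accepts, value) pairs."""
    return max(value for accepts, value in layers if accepts(word))


def well_formed(layers, sample_words):
    """Check increasing values in [0, 1], a universal first language, and
    nestedness of the languages on the given sample of words."""
    values = [v for _, v in layers]
    increasing = all(a < b for a, b in zip(values, values[1:]))
    in_range = 0 <= values[0] and values[-1] <= 1
    universal = all(layers[0][0](w) for w in sample_words)
    nested = all(
        outer(w) or not inner(w)
        for (outer, _), (inner, _) in zip(layers, layers[1:])
        for w in sample_words
    )
    return increasing and in_range and universal and nested


# Hypothetical depth-3 specification: value 1 if a grant is issued at once,
# value 1/2 if it is issued at some point, and value 0 otherwise.
LAYERS = [
    (lambda w: True, 0.0),
    (lambda w: any("grant" in letter for letter in w), 0.5),
    (lambda w: len(w) > 0 and "grant" in w[0], 1.0),
]

SAMPLES = [
    ({"grant"}, set(), set()),
    (set(), set(), {"grant"}),
    (set(), set(), set()),
]
```

Because the languages are nested, the value of a word is determined by the deepest layer that still accepts it, which is why the depth $n$ enters the complexity of Theorem 8 below.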

**Theorem 8 (**[2]**).** *Consider a discrete quantitative specification* $\Psi$ *over* $I \cup O$*. Let* $n$ *be its depth and* $m$ *be the size of the largest DPW in* $\Psi$*. For a transducer* $T$*, let* $X_T$ *be a random variable that assigns a word* $x \in (2^I)^\omega$ *with* $[[x \otimes T(x), \Psi]]$*.*

*1. We can synthesize a transducer* $T$ *that maximizes* $\mathbb{E}[X_T]$ *in time* $m^n$*.*

*2. Given a DPW* $B$ *over* $2^I$ *such that* $\Pr(L(B)) > 0$*, we can synthesize a transducer* $T$ *that maximizes* $\mathbb{E}[X_T \mid B]$ *in time* $m^n \cdot k$*, where* $k$ *is the size of* $B$*.*

We can now state the main results of this section.

**Theorem 9.** *Consider an* LTL[F] *formula* ψ*.*


*Proof.* Let $v_1 < v_2 < \cdots < v_n$ be the possible satisfaction values of $\psi$ (and hence also of $\exists O.\psi$ and of $\forall O.\psi$). By Theorem 4, we have that $n \leq 2^{|\psi|}$. For each $v_i$, we can construct a DPW $D^{\geq v_i}_{\mathrm{comb}(\exists O.\psi,\psi)}$ as in Theorem 7. It is not hard to see that the discrete quantitative specification given by the DPWs $D^{\geq v_i}_{\mathrm{comb}(\exists O.\psi,\psi)}$ and the values $v_i$, for $1 \leq i \leq n$, is equal to the specification $\mathrm{comb}(\exists O.\psi, \psi)$. Thus, by Theorem 8 (1), we can find a transducer that maximizes $\mathbb{E}[X_T]$ in time $(2^{2^{O(|\psi|)}})^{2^{|\psi|}} = 2^{2^{O(|\psi|)}}$.

Next, given $v \in [0, 1]$, we can check whether $\Pr(\exists O.\psi \geq v) > 0$, for example by converting a DPW $D^{\geq v}_{\exists O.\psi}$ to an MDP, and reasoning about its ergodic components. Then, by Theorem 8 (2), we can find a transducer that maximizes $\mathbb{E}[X_T \mid \exists O.\psi \geq v]$, in time $(2^{2^{O(|\psi|)}})^{2^{|\psi|}} \cdot 2^{2^{O(|\psi|)}} = 2^{2^{O(|\psi|)}}$.
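The arithmetic behind both bounds can be spelled out as follows, assuming (as the bounds in the proof indicate) that each DPW has size $m = 2^{2^{O(|\psi|)}}$, that the depth satisfies $n \leq 2^{|\psi|}$, and that the DPW for the conditioning event has size $k = 2^{2^{O(|\psi|)}}$:

$$
m^n = \left(2^{2^{O(|\psi|)}}\right)^{2^{|\psi|}} = 2^{2^{O(|\psi|)} \cdot 2^{|\psi|}} = 2^{2^{O(|\psi|) + |\psi|}} = 2^{2^{O(|\psi|)}},
$$

$$
m^n \cdot k = 2^{2^{O(|\psi|)}} \cdot 2^{2^{O(|\psi|)}} = 2^{2^{O(|\psi|)} + 2^{O(|\psi|)}} = 2^{2^{O(|\psi|)}}.
$$

In both cases the exponentially many layers only add a linear term to the inner exponent, so the overall bound stays doubly exponential.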

**Corollary 2.** *The (possibly conditional) expected* ge*-synthesis problem for* LTL[F] *can be solved in doubly-exponential time.*

#### **5.5 Guarantees in High-Quality** ge**-Synthesis**

As in the Boolean setting, also in the high-quality one we would like to add to a ge-realizing transducer guarantees and indications about the satisfaction level. As we detail below, the quantitative setting offers many possible ways to do so.

**High-Quality** ge**-Synthesis with Guarantees.** We consider specifications of the form $\psi = \psi_{\mathit{strong}} \wedge \psi_{\mathit{weak}}$, where essentially, we seek a transducer that realizes $\psi_{\mathit{strong}}$ and (possibly AG) ge-realizes $\psi_{\mathit{weak}}$. Maximizing the realization value of $\psi_{\mathit{strong}}$ may conflict with maximizing the ge-realization value of $\psi_{\mathit{weak}}$, and there are different ways to trade off the two goals. Technically, in the decision-problem variant, we are given two thresholds $v_1, v_2 \in [0, 1]$, and we seek a transducer $T$ that realizes $\psi_{\mathit{strong}}$ with value at least $v_1$, and ge-realizes $\psi_{\mathit{weak}}$ with value at least $v_2$. Then, one may start, for example, by maximizing the value $v_1$, and then find the maximal value $v_2$ that may be achieved simultaneously. Alternatively, one may prefer to maximize $v_2$, or some other combination of $v_1$ and $v_2$. Also, it is possible to decompose $\psi$ further, to several strong and weak components, each with its desired threshold.
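One of the trade-offs mentioned above, maximizing the first threshold and then the second, can be sketched as a lexicographic search. The `feasible` argument is a hypothetical oracle for the decision-problem variant, assumed to be downward closed in both thresholds; the candidate values and the toy feasibility condition are invented for illustration.

```python
def largest_true(candidates, pred):
    """Largest candidate satisfying a downward-closed predicate, or None."""
    ordered = sorted(candidates)
    lo, hi, best = 0, len(ordered) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if pred(ordered[mid]):
            best = ordered[mid]
            lo = mid + 1
        else:
            hi = mid - 1
    return best


def strong_first(strong_values, weak_values, feasible):
    """Maximize v1 first, then the best v2 achievable together with that v1."""
    weakest = min(weak_values)
    v1 = largest_true(strong_values, lambda a: feasible(a, weakest))
    if v1 is None:
        return None
    v2 = largest_true(weak_values, lambda b: feasible(v1, b))
    return v1, v2


GRID = [0.0, 0.25, 0.5, 0.75, 1.0]


def toy_feasible(v1, v2):
    """Hypothetical oracle: the two goals compete for a shared budget."""
    return v1 + v2 <= 1.25
```

Swapping the roles of the two arguments gives the alternative that prioritizes the ge-realization value of the weak component.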

The solutions in the different settings all involve a construction of a UCW $A^{\geq v_1}_{\psi_{\mathit{strong}}}$, and its product with the automata constructed in the solutions for the different ge-synthesis variants. We note that when the solution for $\psi_{\mathit{weak}}$ is Safraless, we can use a UCW for $\psi_{\mathit{strong}}$ to maintain a Safraless construction. We thus have the following.

**Theorem 10.** *The problem of* LTL[F] *high-quality* ge*-synthesis with a guarantee can be solved in doubly-exponential time.*

**Flags by a High-Quality** ge**-Realizing Transducer.** In the quantitative setting, we parameterize the flags raised by the ge-realizing transducer by values in [0, 1], indicating the announced satisfaction level. Thus, rather than talking about prefixes being green, red, or blue, we talk about them being v-green, v-red, and v-blue, for v ∈ [0, 1], which essentially means that a satisfaction value of at least v is guaranteed (in green and blue flags) or is impossible (in red ones). We can think of those as "degrees" of green, red, and blue. Below, we formalize this intuition and argue that even an augmentation of a transducer that ge-realizes ψ by flags for all values in V(ψ) leaves the problem in doubly-exponential time.

A *quantitative language* over $2^{I \cup O}$ is a function $L : (2^{I \cup O})^\omega \to [0, 1]$. For a quantitative language $L$ and a word $w \in (2^{I \cup O})^*$, we define $L^w$ as the quantitative language where for all $w' \in (2^{I \cup O})^\omega$, we have $L^w(w') = L(w \cdot w')$. For a value $v \in [0, 1]$, a word $w \in (2^{I \cup O})^*$ is $v$*-green for* $L$ if $L^w$ is $v$-realizable. That is, there is a transducer $T$ such that $[[T, L^w]] \geq v$. A word $x \in (2^I)^*$ is $v$*-green for* $L$ if there is $y \in (2^O)^*$ such that $x \otimes y$ is $v$-green for $L$. Thus, when the environment generates $x$, the system can respond in a way that would guarantee $v$-realizability. Finally, we say that $L$ is *green realizable* if there is a strategy $f : (2^I)^+ \to 2^O$ such that for every threshold $v$ and for every input $x \in (2^I)^+$ that is $v$-green for $L$, we have that $x \otimes f(x)$ is $v$-green for $L$. It is not hard to see that Theorem 3 carries over to the quantitative setting, thus quantitative optimal realizability is strictly stronger than quantitative green realizability. In particular, if a transducer $T$ optimally realizes an LTL[F] formula $\psi$, then $T$ also green realizes $\psi$. In the full version, we describe quantitative definitions also for red and blue prefixes, and describe monitors for the detection of the various types of prefixes.

#### **6 Discussion**

We introduced and solved several variants of ge-synthesis. Our complexity results are tight and show that ge-synthesis is not more complex than traditional synthesis. In practice, however, traditional synthesis algorithms do not scale well, and much research is devoted to the development of methods and heuristics for coping with the implementation challenges of synthesis. A natural future research direction is to extend these heuristics and methods to ge-synthesis. We mention here two specific examples.

Efficient synthesis algorithms have been developed for fragments of LTL [21]. Most notable is the *GR(1) fragment* [18], which supports assume-guarantee reasoning, and for which synthesis has an efficient symbolic solution. Adding existential quantification to GR(1) specifications, which is how we handled LTL ge-synthesis, is not supported by its known algorithms, and is an interesting challenge. The success of SAT-based model checking has led to the development of SAT-based synthesis algorithms [6], where the synthesis problem is reduced to satisfiability of a QBF formula. The fact that the setting already includes quantifiers suggests it can be extended to ge-synthesis. A related effort is *bounded synthesis* algorithms [13,24], where the synthesized systems are assumed to be of a bounded size and can be represented symbolically [10].

#### **References**



## **Synthesizing JIT Compilers for In-Kernel DSLs**

Jacob Van Geffen<sup>1(✉)</sup>, Luke Nelson<sup>1</sup>, Isil Dillig<sup>2</sup>, Xi Wang<sup>1</sup>, and Emina Torlak<sup>1</sup>

> <sup>1</sup> University of Washington, Seattle, USA jsvg@cs.washington.edu <sup>2</sup> University of Texas at Austin, Austin, USA

**Abstract.** Modern operating systems allow user-space applications to submit code for kernel execution through the use of in-kernel domain-specific languages (DSLs). Applications use these DSLs to customize system policies and add new functionality. For performance, the kernel executes them via just-in-time (JIT) compilation. The correctness of these JITs is crucial for the security of the kernel: bugs in in-kernel JITs have led to numerous critical issues and patches.

This paper presents JitSynth, the first tool for synthesizing verified JITs for in-kernel DSLs. JitSynth takes as input interpreters for the source DSL and the target instruction set architecture. Given these interpreters, and a mapping from source to target states, JitSynth synthesizes a verified JIT compiler from the source to the target. Our key idea is to formulate this synthesis problem as one of synthesizing a per-instruction compiler for *abstract register machines*. Our core technical contribution is a new *compiler metasketch* that enables JitSynth to efficiently explore the resulting synthesis search space. To evaluate JitSynth, we use it to synthesize a JIT from eBPF to RISC-V and compare to a recently developed Linux JIT. The synthesized JIT avoids all known bugs in the Linux JIT, with an average slowdown of 1.82× in the performance of the generated code. We also use JitSynth to synthesize JITs for two additional source-target pairs. The results show that JitSynth offers a promising new way to develop verified JITs for in-kernel DSLs.

**Keywords:** Synthesis · Just-in-time compilation · Symbolic execution

#### **1 Introduction**

Modern operating systems (OSes) can be customized with user-specified programs that implement functionality like system call whitelisting, performance profiling, and power management [11,12,24]. For portability and safety, these programs are written in restricted domain-specific languages (DSLs), and the kernel executes them via interpretation and, for better performance, just-in-time (JIT) compilation. The correctness of in-kernel interpreters and JITs is crucial for the reliability and security of the kernel, and bugs in their implementations have led to numerous critical issues and patches [15,30]. More broadly, embedded DSLs are also used to customize—and compromise [6,18]—other low-level software, such as font rendering and anti-virus engines [8]. Providing formal guarantees of correctness for in-kernel DSLs is thus a pressing practical and research problem with applications to a wide range of systems software.

Prior work has tackled this problem through interactive theorem proving. For example, the Jitk framework [40] uses the Coq interactive theorem prover [38] to implement and verify the correctness of a JIT compiler for the classic Berkeley Packet Filter (BPF) language [24] in the Linux kernel. But such an approach presents two key challenges. First, Jitk imposes a significant burden on DSL developers, requiring them to implement both the interpreter and the JIT compiler in Coq, and then manually prove the correctness of the JIT compiler with respect to the interpreter. Second, the resulting JIT implementation is extracted from Coq into OCaml and cannot be run in the kernel; rather, it must be run in user space, sacrificing performance and enlarging the trusted computing base (TCB) with the OCaml runtime.

This paper addresses these challenges with JitSynth, the first tool for synthesizing verified JIT compilers for in-kernel DSLs. JitSynth takes as input interpreters for the source DSL and the target instruction set architecture (ISA), and it synthesizes a JIT compiler that is guaranteed to transform each source program into a semantically equivalent target program. Using JitSynth, DSL developers write no proofs or compilers. Instead, they write the semantics of the source and target languages in the form of interpreters and a mapping from source to target states, which JitSynth trusts to be correct. The synthesized JIT compiler is implemented in C; thus, it can run directly in the kernel.

At first glance, synthesizing a JIT compiler seems intractable. Even the simplest compiler contains thousands of instructions, whereas existing synthesis techniques scale to tens of instructions. To tackle this problem in our setting, we observe that in-kernel DSLs are similar to ISAs: both take the form of bytecode instructions for an *abstract register machine*, a simple virtual machine with a program counter, a few registers, and a limited memory store [40]. We also observe that in practice, the target machine has at least as many resources (registers and memory) as the source machine, and that JIT compilers for such abstract register machines perform register allocation statically at compile time. Our main insight is that we can exploit these properties to make synthesis tractable through *decomposition* and *prioritization*, while preserving soundness and completeness.

JitSynth works by decomposing the JIT synthesis problem into the problem of synthesizing individual *mini compilers* for every instruction in the source language. Each mini compiler is synthesized by generating a *compiler metasketch* [7], a set of ordered sketches that collectively represent *all* instruction sequences in the target ISA. These sketches are then solved by an off-the-shelf synthesis tool based on reduction to SMT [39]. The synthesis tool ensures that the target instruction sequence is semantically equivalent to the source instruction, according to the input interpreters. The order in which the sketches are explored is key to making this search practical, and JitSynth contributes two techniques for biasing the search towards tightly constrained, and therefore tractable, sketches that are likely to contain a correct program.
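The idea of an ordered search over sketches can be illustrated with a deliberately tiny stand-in. The sketch below is not JitSynth's algorithm: it uses a made-up 4-bit target ISA, a made-up source instruction, sketches ordered only by sequence length, and exhaustive testing over all machine states in place of an SMT-based synthesizer. All names are hypothetical.

```python
from itertools import product

MASK = 0xF  # toy 4-bit registers keep exhaustive checking cheap

# Hypothetical target ISA over two registers: (opcode, rd, argument).
TARGET_OPS = (
    [("addi", rd, imm) for rd in (0, 1) for imm in (1, 2)]
    + [("slli", rd, 1) for rd in (0, 1)]
    + [("add", rd, rs) for rd in (0, 1) for rs in (0, 1)]
)


def run_target(program, regs):
    """Interpreter for the toy target language."""
    regs = list(regs)
    for op, rd, arg in program:
        if op == "addi":
            regs[rd] = (regs[rd] + arg) & MASK
        elif op == "slli":
            regs[rd] = (regs[rd] << arg) & MASK
        else:  # "add": arg names the source register
            regs[rd] = (regs[rd] + regs[arg]) & MASK
    return tuple(regs)


def run_source(regs):
    """Hypothetical source instruction: r0 <- 2 * r0 + 1, r1 unchanged."""
    return ((2 * regs[0] + 1) & MASK, regs[1])


def equivalent(program):
    """Check the candidate against the source semantics on every state."""
    states = product(range(MASK + 1), repeat=2)
    return all(run_target(program, s) == run_source(s) for s in states)


def synthesize(max_len):
    """Explore sketches in order of size; each sketch is all sequences of one length."""
    for length in range(1, max_len + 1):
        for program in product(TARGET_OPS, repeat=length):
            if equivalent(program):
                return program
    return None
```

With these definitions no single target instruction implements the source instruction, and the search succeeds at length two with a shift followed by an add-immediate. Ordering the sketches so that small, tightly constrained ones come first is what keeps the search short here; JitSynth's own ordering relies on richer criteria than length, as the following paragraphs describe.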

First, we observe that source instructions can often be implemented with target instructions that access the same parts of the state (e.g., only registers). Based on this observation, we develop *read-write sketches*, which restrict the synthesis search space to a subset of the target instructions, based on a sound and precise summary of their semantics. Second, we observe that hand-written JITs rely on pseudoinstructions to generate common target sequences, such as loading immediate (constant) values into registers. We use this observation to develop *pre-load sketches*, which employ synthesized pseudoinstructions to eliminate the need to repeatedly search for common target instruction subsequences.

We have implemented JitSynth in Rosette [39] and used it to synthesize JIT compilers for three widely used in-kernel DSLs. As our main case study, we used JitSynth to synthesize a RISC-V [32] compiler for extended BPF (eBPF) [12], an extension of classic BPF [24], used by the Linux kernel. Concurrently with our work, Linux developers manually built a JIT compiler for the same source and target pair, and a team of researchers found nine correctness bugs in that compiler shortly after its release [28]. In contrast, our JIT compiler is verified by construction; it supports 87 out of 102 eBPF instructions and passes all the Linux kernel tests within this subset, including the regression tests for these nine bugs. Our synthesized compiler generates code that is 5.24× faster than interpreted code and 1.82× slower than the code generated by the Linux JIT. We also used JitSynth to synthesize a JIT from libseccomp [10], a policy language for system call whitelisting, to eBPF, and a JIT from classic BPF to eBPF. The synthesized JITs avoid previously found bugs in the existing generators for these source-target pairs, while incurring, on average, a 2.28–2.61× slowdown in the performance of the generated code.

To summarize, this paper makes the following contributions:


The rest of this paper is organized as follows. Section 2 illustrates JitSynth on a small example. Section 3 formalizes the JIT synthesis problem for in-kernel DSLs. Section 4 presents the JitSynth algorithm for generating and solving compiler metasketches. Section 5 provides implementation details. Section 6 evaluates JitSynth. Section 7 discusses related work. Section 8 concludes.

#### **2 Overview**

This section provides an overview of JitSynth by illustrating how it synthesizes a toy JIT compiler (Fig. 1). The source language of the JIT is a tiny subset of eBPF [12] consisting of one instruction, and the target language is a subset of 64-bit RISC-V [32] consisting of seven instructions. Despite the simplicity of our languages, the Linux kernel JIT used to produce incorrect code for this eBPF instruction [27]; such miscompilation bugs not only lead to correctness issues, but also enable adversaries to compromise the OS kernel by crafting malicious eBPF programs [40]. This section shows how JitSynth can be used to synthesize a JIT that is verified with respect to the semantics of the source and target languages.

**Fig. 1.** Subsets of eBPF and RISC-V used as source and target languages, respectively, in our running example: R[r] denotes the value of register r; M[a] denotes the value at memory address a; ⊕ denotes concatenation of bitvectors; superscripts (e.g., 0<sup>32</sup>) denote repetition of bits; sext32(x) and sext64(x) sign-extend x to 32 and 64 bits, respectively; and extract(i, j, x) produces a subrange of bits of x from index i down to j.

*In-Kernel Languages.* JitSynth expects the source and target languages to be a set of instructions for manipulating the state of an *abstract register machine* (Sect. 3). This state consists of a program counter (*pc*), a finite sequence of general-purpose registers (*reg*), and a finite sequence of memory locations (*mem*), all of which store bitvectors (i.e., finite precision integers). The length of these bitvectors is defined by the language; for example, both eBPF and RISC-V store 64-bit values in their registers. An instruction consists of an *opcode* and a finite set of *fields*, which are bitvectors representing either register identifiers or immediate (constant) values. For instance, the addi32 instruction in eBPF has two fields: *dst* is a 4-bit value representing the index of the output register, and *imm32* is a 32-bit immediate. (eBPF instructions may have two additional fields, *src* and *off*, which are not shown here because addi32 does not use them.) An abstract register machine for a language gives meaning to its instructions: the machine consumes an instruction and a state, and produces a state that is the result of executing that instruction. Figure 1 shows a high-level description of the abstract register machines for our languages.

*JitSynth Interface.* To synthesize a compiler from one language to another, JitSynth takes as input their syntax, semantics, and a mapping from source to target states. All three inputs are given as a program in a *solver-aided host language* [39]. JitSynth uses Rosette as its host, but the host can be any language with a symbolic evaluation engine that can reduce the semantics of host programs to SMT constraints (e.g., [37]). Figure 2 shows the interpreters for the source and target languages (i.e., emulators for their abstract register machines), as well as the state-mapping functions regST, pcST, and memST that JitSynth uses to determine whether a source state $\sigma_S$ is equivalent to a target state $\sigma_T$. In particular, JitSynth deems these states equivalent, denoted by $\sigma_S \cong \sigma_T$, whenever $\mathit{reg}(\sigma_T)[\mathrm{regST}(r)] = \mathit{reg}(\sigma_S)[r]$, $\mathit{pc}(\sigma_T) = \mathrm{pcST}(\mathit{pc}(\sigma_S))$, and $\mathit{mem}(\sigma_T)[\mathrm{memST}(a)] = \mathit{mem}(\sigma_S)[a]$ for all registers $r$ and memory addresses $a$.

**Fig. 2.** Snippets of inputs to JitSynth: the interpreters for the source (eBPF) and target (RISC-V) languages and state-mapping functions.

*Decomposition into Per-instruction Compilers.* Given these inputs, JitSynth generates a *per-instruction compiler* from the source to the target language. To ensure that the resulting compiler is correct (Theorem 1), and that one will be found if it exists (Theorem 2), JitSynth puts two restrictions on its inputs. First, the inputs must be self-finitizing [39], meaning that both the interpreters and the mapping functions must have a finite symbolic execution tree when applied to symbolic inputs. Second, the target machine must have at least as many registers and memory locations as the source machine; these storage cells must be at least as wide as those of the source machine; and the state-mapping functions (pcST, regST, and memST) must be injective. Our toy inputs satisfy these restrictions, as do the real in-kernel languages evaluated in Sect. 6.

*Synthesis Workflow.* JitSynth generates a per-instruction compiler for a given source and target pair in two stages. The first stage uses an optimized *compiler metasketch* to synthesize a mini compiler from every instruction in the source language to a sequence of instructions in the target language (Sect. 4). The second stage then simply stitches these mini compilers into a full C compiler using a trusted outer loop and a switch statement. The first stage is a core technical contribution of this paper, and we illustrate it next on our toy example.

*Metasketches.* To understand how JitSynth works, consider the basic problem of determining if every addi32 instruction can be emulated by a sequence of $k$ instructions in toy RISC-V. In particular, we are interested in finding a program $C_{\mathtt{addi32}}$ in our host language (which JitSynth translates to C) that takes as input a source instruction $s = \mathtt{addi32}\ \mathit{dst}, \mathit{imm32}$ and outputs a semantically equivalent RISC-V program $t = [t_1, \ldots, t_k]$. That is, for all *dst*, *imm32*, and for all equivalent states $\sigma_S \cong \sigma_T$, we have $\mathit{run}(s, \sigma_S, \texttt{ebpf-interpret}) \cong \mathit{run}(t, \sigma_T, \texttt{rv-interpret})$, where $\mathit{run}(e, \sigma, f)$ executes the instruction interpreter $f$ on the sequence of instructions $e$, starting from the state $\sigma$ (Definition 3).

We can solve this problem by asking the host synthesizer to search for $C_{\mathtt{addi32}}$ in a space of candidate mini compilers of length $k$. We describe this space with a syntactic template, or a *sketch*, as shown below:


Here, (??insn dst imm) stands for a missing expression—a hole—that the synthesizer needs to fill with an instruction from the toy RISC-V language. To fill an instruction hole, the synthesizer must find an expression that computes the value of the target instruction's fields. JitSynth limits this expression language to bitvector expressions (of any depth) over the fields of the source instruction and arbitrary bitvector constants.

Given this sketch, and our correctness specification for $C_{\mathtt{addi32}}$, the synthesizer will search the space defined by the sketch for a program that satisfies the specification. Below is an example of the resulting toy compiler from eBPF to RISC-V, synthesized and translated to C by JitSynth (without the outer loop):

```
void compile(struct bpf_insn *insn, struct rv_insn *tgt_prog) {
  switch (insn->op) {
  case BPF_ADDI32:
    tgt_prog[0] = /* lui x6, extract(19, 0, (imm + 0x800) >> 12) */
      rv_lui(6, extract(19, 0, (insn->imm + 0x800) >> 12));
    tgt_prog[1] = /* addiw x6, x6, extract(11, 0, imm) */
      rv_addiw(6, 6, extract(11, 0, insn->imm));
    tgt_prog[2] = /* add rd, rd, x6 */
      rv_add(regmap(insn->dst), regmap(insn->dst), 6);
    tgt_prog[3] = /* slli rd, rd, 32 */
      rv_slli(regmap(insn->dst), regmap(insn->dst), 32);
    tgt_prog[4] = /* srli rd, rd, 32 */
      rv_srli(regmap(insn->dst), regmap(insn->dst), 32);
    break;
  }
}
```
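As a sanity check on the listing above, the following Python sketch emulates the five emitted RISC-V instructions and compares the result with the effect of addi32 on a handful of corner-case inputs. The semantics from Fig. 1 are not reproduced in this section, so the sketch assumes the standard RV64 behavior of lui, addiw, add, slli, and srli, and assumes that addi32 writes the zero-extended 32-bit sum to *dst*. The helper names (`sext`, `rv_run`, `compile_addi32`, `check`) and the identity register mapping are illustrative and are not part of JitSynth.

```python
MASK64 = (1 << 64) - 1
SCRATCH = 6  # x6, the scratch register used in the C listing


def sext(value, bits):
    """Sign-extend the low `bits` bits of value; return a 64-bit word."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value & MASK64


def extract(hi, lo, x):
    """Bits hi..lo of x (inclusive), as an unsigned integer."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def compile_addi32(dst, imm):
    """Mirror of the C mini compiler (assumption: regmap is the identity)."""
    return [
        ("lui", SCRATCH, extract(19, 0, (imm + 0x800) >> 12), None),
        ("addiw", SCRATCH, SCRATCH, extract(11, 0, imm)),
        ("add", dst, dst, SCRATCH),
        ("slli", dst, dst, 32),
        ("srli", dst, dst, 32),
    ]


def rv_run(regs, prog):
    """Execute a straight-line RV64 program (assumed standard semantics)."""
    regs = dict(regs)
    for op, rd, a, b in prog:
        if op == "lui":
            regs[rd] = sext(a << 12, 32)
        elif op == "addiw":
            regs[rd] = sext(regs[a] + sext(b, 12), 32)
        elif op == "add":
            regs[rd] = (regs[a] + regs[b]) & MASK64
        elif op == "slli":
            regs[rd] = (regs[a] << b) & MASK64
        elif op == "srli":
            regs[rd] = regs[a] >> b
    return regs


def ebpf_addi32(value, imm):
    """Assumed addi32 semantics: 32-bit add, zero-extended to 64 bits."""
    return (value + imm) & 0xFFFFFFFF


def check(dst=10):
    imms = [0, 1, -1, 0x7FF, 0x800, 0xFFF, 0x7FFFFFFF, -0x80000000, 0x12345678]
    vals = [0, 1, 0xFFFFFFFF, MASK64, 0x123456789ABCDEF0]
    return all(
        rv_run({dst: v, SCRATCH: 0}, compile_addi32(dst, imm))[dst]
        == ebpf_addi32(v, imm)
        for imm in imms
        for v in vals
    )
```

The `+ 0x800` rounding in the lui operand compensates for addiw sign-extending its 12-bit immediate: when bit 11 of the immediate is set, addiw subtracts 0x1000, so the upper 20 bits must be incremented by one. Testing on samples is of course much weaker than the solver-backed verification that JitSynth performs.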
Once we know how to synthesize a compiler of length k, we can easily extend this solution into a naive method for synthesizing a compiler of any length. We simply enumerate sketches of increasing lengths, k = 1, 2, 3,..., invoke the synthesizer on each generated sketch, and stop as soon as a solution is found (if ever). The resulting ordered set of sketches forms a metasketch [7]—i.e., a search space and a strategy for exploring it—that contains all candidate mini compilers (in a subset of the host language) from the source to the target language. This naive metasketch can be used to find a mini compiler for our toy example in 493 min. However, it fails to scale to real in-kernel DSLs (Sect. 6), motivating the need for JitSynth's optimized compiler metasketches.
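The enumerate-and-solve loop of the naive metasketch can be illustrated on a deliberately tiny problem. The Python sketch below is only an analogy: it replaces the solver-aided synthesizer with brute-force enumeration, and it uses a hypothetical 8-bit, single-register target language with two made-up instructions (`inc` and `dbl`) and a made-up source instruction that computes 4x + 2. None of these names come from JitSynth.

```python
from itertools import count, product

# Hypothetical target language: each instruction updates one 8-bit register.
TARGET_OPS = {
    "inc": lambda x: (x + 1) & 0xFF,
    "dbl": lambda x: (x * 2) & 0xFF,
}


def run_target(prog, x):
    """Execute a straight-line target program on the register value x."""
    for op in prog:
        x = TARGET_OPS[op](x)
    return x


def source_semantics(x):
    """Hypothetical source instruction: x <- 4x + 2 (mod 256)."""
    return (4 * x + 2) & 0xFF


def sketches():
    """Ordered set of sketches: all target programs of length 1, 2, 3, ..."""
    for k in count(1):
        yield k, product(TARGET_OPS, repeat=k)


def solve_naive_metasketch(max_len=8):
    """Return the first program, in enumeration order, that is equivalent
    to the source instruction on every 8-bit input; None if max_len is hit."""
    for k, candidates in sketches():
        if k > max_len:
            return None
        for prog in candidates:
            if all(run_target(prog, x) == source_semantics(x) for x in range(256)):
                return list(prog)
```

Here `solve_naive_metasketch()` rejects every candidate of length 1 and 2 before finding a solution of length 3. The `max_len` cutoff is a convenience of this sketch; the metasketch described above has no such bound and runs forever when no solution exists.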

*Compiler Metasketches.* JitSynth optimizes the naive metasketch by extending it with two kinds of more tightly constrained sketches, which are explored first. A constrained sketch of size k usually contains a correct solution of a given size if one exists, but if not, JitSynth will eventually explore the naive sketch of the same length, to maintain completeness. We give the intuition behind the two optimizations here, and present them in detail in Sect. 4.

First, we observe that practical source and target languages include similar kinds of instructions. For example, both eBPF and RISC-V include instructions for adding immediate values to registers. This similarity often makes it possible to emulate a source instruction with a sequence of target instructions that access the same part of the state (the program counter, registers, or memory) as the source instruction. For example, addi32 reads and writes only registers, not memory, and it can be emulated with RISC-V instructions that also access only registers. To exploit this observation, we introduce *read-write sets*, which summarize, soundly and precisely, how an instruction accesses state. JitSynth uses these sets to define *read-write sketches* for a given source instruction, including only target instructions that access the state in the same way as the source instruction. For instance, a read-write sketch for addi32 excludes both lb and sb instructions because they read and write memory as well as registers.

Second, we observe that hand-written JITs use pseudoinstructions to simplify their implementation of mini compilers. These are simply subroutines or macros for generating target sequences that implement common functionality. For example, the Linux JIT from eBPF to RISC-V includes a pseudoinstruction for loading 32-bit immediates into registers. JitSynth mimics the way handwritten JITs use pseudoinstructions with the help of *pre-load sketches*. These sketches first use a synthesized pseudoinstruction to create a sequence of concrete target instructions that load source immediates into scratch registers; then, they include a compute sequence comprised of read-write instruction holes. Applying these optimizations to our toy example, JitSynth finds a mini compiler for addi32 in 5 s—a roughly 6000× speedup over the naive metasketch.

#### **3 Problem Statement**

This section formalizes the compiler synthesis problem for in-kernel DSLs. We focus on JIT compilers, which, for our purposes, means one-pass compilers [11]. To start, we define *abstract register machines* as a way to specify the syntax and semantics of in-kernel languages. Next, we formulate our compiler synthesis problem as one of synthesizing a set of sound *mini compilers* from a single source instruction to a sequence of target instructions. Finally, we show that these mini compilers compose into a sound JIT compiler, which translates every source program into a semantically equivalent target program.

*Abstract Register Machines.* An abstract register machine (ARM) provides a simple interface for specifying the syntax and semantics of an in-kernel language. The syntax is given as a set of abstract instructions, and the semantics is given as a transition function over instructions and machine states.

An *abstract instruction* (Definition 1) defines the name (*op*) and type signature ($\mathcal{F}$) of an operation in the underlying language. For example, the abstract instruction (*addi32*, r → *Reg*, imm32 → *BV*(32)) specifies the name and signature of the addi32 operation from the eBPF language (Fig. 1). Each abstract instruction represents the (finite) set of all *concrete instructions* that instantiate the abstract instruction's parameters with values of the right type. For example, addi32 0, 5 is a concrete instantiation of the abstract instruction for addi32. In the rest of this paper, we will write "instruction" to mean a concrete instruction.

**Definition 1 (Abstract and Concrete Instructions).** *An* abstract instruction $\iota$ *is a pair* $(\mathit{op}, \mathcal{F})$ *where* $\mathit{op}$ *is an opcode and* $\mathcal{F}$ *is a mapping from* fields *to their* types*. Field types include* $\mathit{Reg}$*, denoting register names, and* $\mathit{BV}(k)$*, denoting* $k$*-bit bitvector values. The abstract instruction* $\iota$ *represents all* concrete instructions $p = (\mathit{op}, F)$ *with the opcode* $\mathit{op}$ *that bind each field* $f \in \mathit{dom}(\mathcal{F})$ *to a value* $F(f)$ *of type* $\mathcal{F}(f)$*. We write* $P(\iota)$ *to denote the set of all concrete instructions for* $\iota$*, and we extend this notation to sets of abstract instructions in the usual way, i.e.,* $P(\mathcal{I}) = \bigcup_{\iota \in \mathcal{I}} P(\iota)$ *for the set* $\mathcal{I}$*.*

Instructions operate on machine *states* (Definition 2), and their semantics are given by the machine's *transition function* (Definition 3). A machine state consists of a program counter, a map from register names to register values, and a map from memory addresses to memory values. Each state component is either a bitvector or a map over bitvectors, making the set of all states of an ARM finite. The transition function of an ARM defines an interpreter for the ARM's language by specifying how to compute the output state for a given instruction and input state. We can apply this interpreter, together with the ARM's *fuel function*, to define an *execution* of the machine on a program and an initial state. The fuel function takes as input a sequence of instructions and returns a natural number that bounds the number of steps (i.e., state transitions) the machine can make to execute the given sequence. The inclusion of fuel models the requirement of in-kernel languages for all program executions to terminate [40]. It also enables us to use symbolic execution to soundly reduce the semantics of these languages to SMT constraints, in order to formulate the synthesis queries in Sect. 4.5.

**Definition 2 (State).** *A* state $\sigma$ *is a tuple* $(\mathit{pc}, \mathit{reg}, \mathit{mem})$ *where* $\mathit{pc}$ *is a value,* $\mathit{reg}$ *is a function from register names to values, and* $\mathit{mem}$ *is a function from memory addresses to values. Register names, memory addresses, and all values are finite-precision integers, or bitvectors. We write* $|\sigma|$ *to denote the* size *of the state* $\sigma$*. The size* $|\sigma|$ *is defined to be the tuple* $(r, m, k_{pc}, k_{reg}, k_{mem})$*, where* $r$ *is the number of registers in* $\sigma$*,* $m$ *is the number of memory addresses, and* $k_{pc}$*,* $k_{reg}$*, and* $k_{mem}$ *are the widths of the bitvector values stored in* $\mathit{pc}$*,* $\mathit{reg}$*, and* $\mathit{mem}$*, respectively. Two states have the same size if* $|\sigma_i| = |\sigma_j|$*; one state is smaller than another,* $|\sigma_i| \le |\sigma_j|$*, if each element of* $|\sigma_i|$ *is less than or equal to the corresponding element of* $|\sigma_j|$*.*

**Definition 3 (Abstract Register Machines and Executions).** *An* abstract register machine $\mathcal{A}$ *is a tuple* $(\mathcal{I}, \Sigma, \mathcal{T}, \Phi)$ *where* $\mathcal{I}$ *is a set of abstract instructions,* $\Sigma$ *is a set of states of the same size,* $\mathcal{T} : P(\mathcal{I}) \to \Sigma \to \Sigma$ *is a* transition function *from instructions and states to states, and* $\Phi : \mathit{List}(P(\mathcal{I})) \to \mathbb{N}$ *is a* fuel function *from sequences of instructions to natural numbers. Given a state* $\sigma_0 \in \Sigma$ *and a sequence of instructions* $\mathbf{p}$ *drawn from* $P(\mathcal{I})$*, we define the* execution *of* $\mathcal{A}$ *on* $\mathbf{p}$ *and* $\sigma_0$ *to be the result of applying* $\mathcal{T}$ *to* $\mathbf{p}$ *at most* $\Phi(\mathbf{p})$ *times. That is,* $\mathcal{A}(\mathbf{p}, \sigma_0) = \mathit{run}(\mathbf{p}, \sigma_0, \mathcal{T}, \Phi(\mathbf{p}))$*, where*

$$run(\mathbf{p}, \sigma, \mathcal{T}, k) = \begin{cases} \sigma, & \text{if } k = 0 \text{ or } pc(\sigma) \notin [0, |\mathbf{p}|), \\ run(\mathbf{p}, \mathcal{T}(\mathbf{p}[pc(\sigma)], \sigma), \mathcal{T}, k - 1), & \text{otherwise.} \end{cases}$$
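The recursive definition of *run* translates directly into a loop. The Python sketch below is one possible reading of it, exercised on a hypothetical two-instruction machine (`addi` and a relative `jmp`) that is not taken from the paper; the dictionary-based state layout and the `step` function are likewise assumptions made for illustration.

```python
def run(prog, state, step, fuel):
    """Iterative form of run(p, sigma, T, k): stop when the fuel is spent
    or the program counter leaves the range [0, len(prog))."""
    while fuel > 0 and 0 <= state["pc"] < len(prog):
        state = step(prog[state["pc"]], state)
        fuel -= 1
    return state


def step(insn, state):
    """Transition function of a hypothetical machine with 8-bit registers."""
    op, a, b = insn
    regs = dict(state["reg"])
    pc = state["pc"]
    if op == "addi":      # reg[a] <- reg[a] + b, then fall through
        regs[a] = (regs[a] + b) & 0xFF
        pc += 1
    elif op == "jmp":     # pc <- pc + a (relative jump; b is unused)
        pc += a
    return {"pc": pc, "reg": regs}


STRAIGHT_LINE = [("addi", 0, 1), ("addi", 0, 2)]
LOOP_FOREVER = [("addi", 0, 1), ("jmp", -1, None)]
START = {"pc": 0, "reg": {0: 0}}
```

Running `STRAIGHT_LINE` with ample fuel stops once the program counter reaches 2, whereas `LOOP_FOREVER` would never terminate without fuel; with a fuel of 5 it stops after three `addi` steps and two `jmp` steps.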

*Synthesizing JIT Compilers for ARMs.* Given a source and target ARM, our goal is to synthesize a one-pass JIT compiler that translates source programs to semantically equivalent target programs. To make synthesis tractable, we fix the structure of the JIT to consist of an outer loop and a switch statement that dispatches compilation tasks to a set of *mini compilers* (Definition 4). Our synthesis problem is therefore to find a sound mini compiler for each abstract instruction in the source machine (Definition 5).

**Definition 4 (Mini Compiler).** *Let* $\mathcal{A}_S = (\mathcal{I}_S, \Sigma_S, \mathcal{T}_S, \Phi_S)$ *and* $\mathcal{A}_T = (\mathcal{I}_T, \Sigma_T, \mathcal{T}_T, \Phi_T)$ *be two abstract register machines,* $\cong$ *an equivalence relation on their states* $\Sigma_S$ *and* $\Sigma_T$*, and* $C : P(\iota) \to \mathit{List}(P(\mathcal{I}_T))$ *a function for some* $\iota \in \mathcal{I}_S$*. We say that* $C$ *is a* sound mini compiler *for* $\iota$ *with respect to* $\cong$ *iff*

$$\forall \sigma\_S \in \Sigma\_S, \ \sigma\_T \in \Sigma\_T, \ p \in P(\iota). \ \sigma\_S \cong \sigma\_T \Rightarrow \mathcal{A}\_S(p, \sigma\_S) \cong \mathcal{A}\_T(C(p), \sigma\_T)$$

**Definition 5 (Mini Compiler Synthesis).** *Given two abstract register machines* $\mathcal{A}_S = (\mathcal{I}_S, \Sigma_S, \mathcal{T}_S, \Phi_S)$ *and* $\mathcal{A}_T = (\mathcal{I}_T, \Sigma_T, \mathcal{T}_T, \Phi_T)$*, as well as an equivalence relation* $\cong$ *on their states, the* mini compiler synthesis problem *is to generate a sound mini compiler* $C_\iota$ *for each* $\iota \in \mathcal{I}_S$ *with respect to* $\cong$*.*

The general version of our synthesis problem, defined above, uses an arbitrary equivalence relation $\cong$ between the states of the source and target machines to determine if a source and target program are semantically equivalent. JitSynth can, in principle, solve this problem with the naive metasketch described in Sect. 2. In practice, however, the naive metasketch scales poorly, even on small languages such as toy eBPF and RISC-V. So, in this paper, we focus on source and target ARMs that satisfy an additional assumption on their state equivalence relation: it can be expressed in terms of injective mappings from source to target states (Definition 6). This restriction enables JitSynth to employ optimizations (such as pre-load sketches described in Sect. 4.4) that are crucial to scaling synthesis to real in-kernel languages.

**Definition 6 (Injective State Equivalence Relation).** *Let* $\mathcal{A}_S$ *and* $\mathcal{A}_T$ *be abstract register machines with states* $\Sigma_S$ *and* $\Sigma_T$ *such that* $|\sigma_S| \le |\sigma_T|$ *for all* $\sigma_S \in \Sigma_S$ *and* $\sigma_T \in \Sigma_T$*. Let* $\mathcal{M}$ *be a* state mapping $(\mathcal{M}_{pc}, \mathcal{M}_{reg}, \mathcal{M}_{mem})$ *from* $\Sigma_S$ *to* $\Sigma_T$*, where* $\mathcal{M}_{pc}$ *multiplies the program counter of the states in* $\Sigma_S$ *by a constant factor,* $\mathcal{M}_{reg}$ *is an injective map from register names in* $\Sigma_S$ *to those in* $\Sigma_T$*, and* $\mathcal{M}_{mem}$ *is an injective map from memory addresses in* $\Sigma_S$ *to those in* $\Sigma_T$*. We say that two states* $\sigma_S \in \Sigma_S$ *and* $\sigma_T \in \Sigma_T$ *are equivalent according to* $\mathcal{M}$*, written* $\sigma_S \cong_{\mathcal{M}} \sigma_T$*, iff* $\mathcal{M}_{pc}(\mathit{pc}(\sigma_S)) = \mathit{pc}(\sigma_T)$*,* $\mathit{reg}(\sigma_S)[r] = \mathit{reg}(\sigma_T)[\mathcal{M}_{reg}(r)]$ *for all register names* $r \in \mathit{dom}(\mathit{reg}(\sigma_S))$*, and* $\mathit{mem}(\sigma_S)[a] = \mathit{mem}(\sigma_T)[\mathcal{M}_{mem}(a)]$ *for all memory addresses* $a \in \mathit{dom}(\mathit{mem}(\sigma_S))$*. The binary relation* $\cong_{\mathcal{M}}$ *is called an* injective state equivalence relation *on* $\mathcal{A}_S$ *and* $\mathcal{A}_T$*.*
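Definition 6 can be read as a simple executable check. The Python sketch below assumes a dictionary-based representation of states and mappings, which is a choice made here for illustration and is not JitSynth's actual representation; the register numbers, addresses, and the factor 5 in the example are made up.

```python
def is_injective(mapping):
    """True if no two keys of the mapping share a value."""
    return len(set(mapping.values())) == len(mapping)


def equivalent(src, tgt, pc_factor, reg_map, mem_map):
    """Check src ~ tgt under the state mapping (pc_factor, reg_map, mem_map).

    Only the target registers and memory cells in the image of the maps are
    compared; unmapped target cells (e.g., scratch registers) are ignored.
    """
    assert is_injective(reg_map) and is_injective(mem_map)
    return (
        tgt["pc"] == pc_factor * src["pc"]
        and all(tgt["reg"][reg_map[r]] == v for r, v in src["reg"].items())
        and all(tgt["mem"][mem_map[a]] == v for a, v in src["mem"].items())
    )


# Hypothetical example: two source registers, one source memory cell.
SRC = {"pc": 2, "reg": {0: 7, 1: 9}, "mem": {0: 1}}
TGT = {"pc": 10, "reg": {10: 7, 11: 9, 6: 123}, "mem": {64: 1, 65: 0}}
REG_MAP = {0: 10, 1: 11}
MEM_MAP = {0: 64}
```

With these inputs, `equivalent(SRC, TGT, 5, REG_MAP, MEM_MAP)` holds even though target register 6 contains an unrelated value, which is what lets mini compilers use unmapped registers as scratch storage.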

*Soundness of JIT Compilers for ARMs.* Finally, we note that a JIT compiler composed from the synthesized mini compilers correctly translates every source program to an equivalent target program. We formulate and prove this theorem using the Lean theorem prover [25].

**Theorem 1 (Soundness of JIT compilers).** *Let* $\mathcal{A}_S = (\mathcal{I}_S, \Sigma_S, \mathcal{T}_S, \Phi_S)$ *and* $\mathcal{A}_T = (\mathcal{I}_T, \Sigma_T, \mathcal{T}_T, \Phi_T)$ *be abstract register machines,* $\cong_{\mathcal{M}}$ *an injective state equivalence relation on their states such that* $\mathcal{M}_{pc}(\mathit{pc}(\sigma_S)) = N_{pc} \cdot \mathit{pc}(\sigma_S)$*, and* $\{C_1, \ldots, C_{|\mathcal{I}_S|}\}$ *a solution to the mini compiler synthesis problem for* $\mathcal{A}_S$*,* $\mathcal{A}_T$*, and* $\cong_{\mathcal{M}}$ *where* $\forall s \in P(\iota).\ |C_i(s)| = N_{pc}$*. Let* $C : P(\mathcal{I}_S) \to \mathit{List}(P(\mathcal{I}_T))$ *be a function that maps concrete instructions* $s \in P(\iota)$ *to the compiler output* $C_\iota(s)$ *for* $\iota \in \mathcal{I}_S$*. If* $\mathbf{s} = s_1, \ldots, s_n$ *is a sequence of concrete instructions drawn from* $\mathcal{I}_S$*, and* $\mathbf{t} = C(s_1) \cdot \ldots \cdot C(s_n)$ *where* $\cdot$ *stands for sequence concatenation, then* $\forall \sigma_S \in \Sigma_S, \sigma_T \in \Sigma_T.\ \sigma_S \cong_{\mathcal{M}} \sigma_T \Rightarrow \mathcal{A}_S(\mathbf{s}, \sigma_S) \cong_{\mathcal{M}} \mathcal{A}_T(\mathbf{t}, \sigma_T)$*.*
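To make the statement concrete, the Python sketch below composes two hand-written mini compilers with an outer loop and an opcode dispatch, and then compares one source execution with the corresponding target execution. It illustrates what the theorem claims on a single input and is not a proof. Both toy machines (`inc2`/`quad` as source, `inc`/`dbl` as target), the identity register mapping, and the choice of fuel are assumptions made for this example.

```python
N_PC = 2  # every mini compiler below emits exactly N_PC target instructions


def step_source(insn, state):
    """Hypothetical source machine: inc2 adds 2, quad multiplies by 4."""
    op, r = insn
    regs = dict(state["reg"])
    if op == "inc2":
        regs[r] = (regs[r] + 2) & 0xFF
    else:  # "quad"
        regs[r] = (regs[r] * 4) & 0xFF
    return {"pc": state["pc"] + 1, "reg": regs}


def step_target(insn, state):
    """Hypothetical target machine: inc adds 1, dbl multiplies by 2."""
    op, r = insn
    regs = dict(state["reg"])
    if op == "inc":
        regs[r] = (regs[r] + 1) & 0xFF
    else:  # "dbl"
        regs[r] = (regs[r] * 2) & 0xFF
    return {"pc": state["pc"] + 1, "reg": regs}


# One mini compiler per source opcode, each of length N_PC.
MINI_COMPILERS = {
    "inc2": lambda r: [("inc", r), ("inc", r)],
    "quad": lambda r: [("dbl", r), ("dbl", r)],
}


def compile_program(source):
    """Outer loop plus dispatch: concatenate the mini compiler outputs."""
    target = []
    for op, r in source:
        target.extend(MINI_COMPILERS[op](r))
    return target


def run(prog, state, step):
    """Fuel-bounded execution; one step per instruction suffices here
    because both toy languages are straight-line."""
    fuel = len(prog)
    while fuel > 0 and 0 <= state["pc"] < len(prog):
        state = step(prog[state["pc"]], state)
        fuel -= 1
    return state


SOURCE = [("inc2", 0), ("quad", 0), ("inc2", 1)]
INIT = {"pc": 0, "reg": {0: 1, 1: 5}}
```

Because every mini compiler has the same length, source instruction *i* starts at target index `N_PC * i`, which is why the program counter mapping in the theorem is a multiplication by a constant.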

#### **4 Solving the Mini Compiler Synthesis Problem**

This section presents our approach to solving the mini compiler synthesis problem defined in Sect. 3. We employ syntax-guided synthesis [37] to search for an implementation of a mini compiler in a space of candidate programs. Our core contribution is an effective way to structure this space using a *compiler metasketch*. This section presents our algorithm for generating compiler metasketches, describes its key subroutines and optimizations, and shows how to solve the resulting sketches with an off-the-shelf synthesis engine.

#### **4.1 Generating Compiler Metasketches**

JitSynth synthesizes mini compilers by generating and solving *metasketches* [7]. A metasketch describes a space of candidate programs using an ordered set of syntactic templates or *sketches* [37]. These sketches take the form of programs with missing expressions or *holes*, where each hole describes a finite set of candidate completions. JitSynth sketches are expressed in a *host language* H that serves both as the implementation language for mini compilers and the specification language for ARMs. JitSynth expects the host to provide a synthesizer for completing sketches and a symbolic evaluator for reducing ARM semantics to SMT constraints. JitSynth uses these tools to generate optimized metasketches for mini compilers, which we call *compiler metasketches*.

Figure 3 shows our algorithm for generating compiler metasketches. The algorithm, CMS, takes as input an abstract source instruction $\iota$ for a source machine $\mathcal{A}_S$, a target machine $\mathcal{A}_T$, and a state mapping $\mathcal{M}$ from $\mathcal{A}_S$ to $\mathcal{A}_T$. Given these inputs, it lazily enumerates an infinite set of *compiler sketches* that collectively represent the space of all straight-line bitvector programs from $P(\iota)$ to $\mathit{List}(P(\mathcal{I}_T))$. In particular, each compiler sketch consists of $k$ target *instruction holes*, constructed from field holes that denote bitvector expressions (over the fields of $\iota$) of depth $d$ or less. For each length $k$ and depth $d$, the CMS loop generates three kinds of compiler sketches: the *pre-load*, the *read-write*, and the *naive* sketch. The naive sketch (Sect. 4.2) is the most general, consisting of all candidate mini compilers of length $k$ and depth $d$. But it also scales poorly, so CMS first yields the pre-load (Sect. 4.4) and read-write (Sect. 4.3) sketches. As we will see later, these sketches describe a subset of the programs in the naive sketch, and they are designed to prioritize exploring small parts of the search space that are likely to contain a correct mini compiler for $\iota$, if one exists.


**Fig. 3.** Compiler metasketch for the abstract source instruction $\iota$, source machine $\mathcal{A}_S$, target machine $\mathcal{A}_T$, and state mapping $\mathcal{M}$ from $\mathcal{A}_S$ to $\mathcal{A}_T$.
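The lazy enumeration performed by CMS can be sketched as a Python generator. The text fixes the order of the three sketch kinds for a given length and depth (pre-load, then read-write, then naive), but the order in which (k, d) pairs are visited is not stated in this section; the sketch below assumes, purely for illustration, that pairs are visited by increasing k + d. Sketches are represented by descriptive tuples rather than real templates, and `cms` and `solve` are hypothetical names.

```python
from itertools import count, islice


def cms():
    """Lazily yield sketch descriptors (kind, k, d) in exploration order.

    Assumption: (k, d) pairs are enumerated by increasing n = k + d.
    """
    for n in count(2):
        for k in range(1, n):
            d = n - k
            for kind in ("PLD", "RW", "Naive"):
                yield (kind, k, d)


def solve(try_sketch, limit=None):
    """Apply `try_sketch` to each sketch in turn and return the first
    non-None result. `limit` bounds the number of sketches tried; the
    unbounded loop described in the paper corresponds to limit=None."""
    for sketch in islice(cms(), limit):
        result = try_sketch(sketch)
        if result is not None:
            return result
    return None
```

Because the naive sketch of each size is always yielded after the two constrained sketches of that size, the enumeration still covers the whole search space, which is how the constrained sketches can be tried first without giving up completeness.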

#### **4.2 Generating Naive Sketches**

The most general sketch we consider, $\mathrm{Naive}(k, d, \iota, \mathcal{A}_S, \mathcal{A}_T, \mathcal{M})$, is shown in Fig. 4. This sketch consists of $k$ instruction holes that can be filled with any instruction from $\mathcal{I}_T$. An instruction hole chooses between expressions of the form $(\mathit{op}_T, H)$, where $\mathit{op}_T$ is a target opcode, and $H$ specifies the field holes for that opcode. Each field hole is a bitvector expression (of depth $d$) over the fields of the input source instruction and arbitrary bitvector constants. This lets target instructions use the immediates and registers (modulo $\mathcal{M}$) of the source instruction, as well as arbitrary constant values and register names. Letting field holes include constant register names allows the synthesized mini compilers to use target registers unmapped by $\mathcal{M}$ as temporary, or scratch, storage. In essence, the naive sketch describes all straight-line compiler programs that can make free use of standard C arithmetic and bitwise operators, as well as scratch registers.

The space of such programs is intractably large, however, even for small inputs. For instance, it includes at least 2<sup>350</sup> programs of length k = 5 and depth d ≤ 3 for the toy example from Sect. 2. JitSynth therefore employs two effective heuristics to direct the exploration of this space toward the most promising candidates first, as defined by the read-write and pre-load sketches.


**Fig. 4.** Naive sketch of length $k$ and maximum depth $d$ for $\iota$, $\mathcal{A}_S$, $\mathcal{A}_T$, and $\mathcal{M}$. Here, *Expr* creates an expression in the host language, using $\mathcal{M}$ to map from source to target register names and memory addresses; *Choose*(E) is a hole that chooses an expression from the set E; and *Field*(τ, d, E) is a hole for a bitvector expression of type τ and maximum depth d, constructed from arbitrary bitvector constants and expressions E.

#### **4.3 Generating Read-Write Sketches**

The read-write sketch, $\mathrm{RW}(k, d, \iota, \mathcal{A}_S, \mathcal{A}_T, \mathcal{M})$, is based on the observation that many practical source and target languages provide similar functionality, so a source instruction $\iota$ can often be emulated with target instructions that access the same parts of the state as $\iota$. For example, the addi32 instruction from eBPF reads and writes only registers (not, e.g., memory), and it can be emulated with RISC-V instructions that also touch only registers (Sect. 2). Moreover, note that the semantics of addi32 ignores the values of its *src* and *off* fields, and that the target RISC-V instructions do the same. Based on these observations, our optimized sketch for addi32 would therefore consist of instruction holes that allow only register-register instructions, with field holes that exclude *src* and *off*. We first formalize this intuition with the notion of *read and write sets*, and then describe how JitSynth applies such sets to create RW sketches.

*Read and Write Sets.* Read and write sets provide a compact way to summarize the semantics of an abstract instruction $\iota$. This summary consists of a set of *state labels*, where a state label is one of $L_{reg}$, $L_{mem}$, and $L_{pc}$ (Definition 7). Each label in a summary set represents a state component (registers, memory, or the program counter) that a concrete instance of $\iota$ may read or write during some execution. We compute three kinds of label sets for every $\iota$: the read set $\mathit{Read}(\iota)$, the write set $\mathit{Write}(\iota)$, and the field write set $\mathit{Write}(\iota, f)$ for each field $f$ of $\iota$. Figure 5 shows these sets for the toy eBPF and RISC-V instructions.


**Fig. 5.** Read and write sets for the addi32, lui, and sb instructions from Fig. 1.

The read set $\mathit{Read}(\iota)$ specifies which components of the input state may affect the execution of $\iota$ (Definition 8). For example, if $\mathit{Read}(\iota)$ includes $L_{reg}$, then some concrete instance of $\iota$ produces different output states when executed on two input states that differ only in register values. The write set $\mathit{Write}(\iota)$ specifies which components of the output state may be affected by executing $\iota$ (Definition 9). In particular, if $\mathit{Write}(\iota)$ includes $L_{reg}$ (or $L_{mem}$), then executing some concrete instance of $\iota$ on an input state produces an output state with different register (or memory) values. The inclusion of $L_{pc}$ is based on a separate condition, designed to distinguish jump instructions from fall-through instructions. Both kinds of instructions change the program counter, but fall-through instructions always change it in the same way. So, $L_{pc} \in \mathit{Write}(\iota)$ if two instances of $\iota$ can write different values to the program counter. Finally, the field write set, $\mathit{Write}(\iota, f)$, specifies the parts of the output state that are affected by the value of the field $f$; $L_n \in \mathit{Write}(\iota, f)$ means that two instances of $\iota$ that differ only in $f$ can produce different outputs when applied to the same input state.

JitSynth computes all read and write sets from their definitions, by using the host symbolic evaluator to reduce the reasoning about instruction semantics to SMT queries. This reduction is possible because we assume that all ARM interpreters are self-finitizing, as discussed in Sect. 2.

**Definition 7 (State Labels).** *A* state label *is an identifier* $L_n$ *where* $n$ *is a state component, i.e.,* $n \in \{\mathit{reg}, \mathit{mem}, \mathit{pc}\}$*. We write* $N$ *for the set of all state components, and* $\mathcal{L}$ *for the set of all state labels. We also use state labels to access the corresponding state components:* $L_n(\sigma) = n(\sigma)$ *for all* $n \in N$*.*

**Definition 8 (Read Set).** *Let* $\iota \in \mathcal{I}$ *be an abstract instruction in* $(\mathcal{I}, \Sigma, \mathcal{T}, \Phi)$*. The* read set *of* $\iota$*,* $\mathit{Read}(\iota)$*, is the set of all state labels* $L_n \in \mathcal{L}$ *such that* $\exists p \in P(\iota).\ \exists L_w \in \mathit{Write}(\iota).\ \exists \sigma_a, \sigma_b \in \Sigma.\ (L_n(\sigma_a) \neq L_n(\sigma_b)) \wedge \bigl(\bigwedge_{m \in N \setminus \{n\}} L_m(\sigma_a) = L_m(\sigma_b)\bigr) \wedge L_w(\mathcal{T}(p, \sigma_a)) \neq L_w(\mathcal{T}(p, \sigma_b))$*.*

**Definition 9 (Write Set).** *Let* $\iota \in \mathcal{I}$ *be an abstract instruction in* $(\mathcal{I}, \Sigma, \mathcal{T}, \Phi)$*. The* write set *of* $\iota$*,* $\mathit{Write}(\iota)$*, includes the state label* $L_n \in \{L_{reg}, L_{mem}\}$ *iff* $\exists p \in P(\iota).\ \exists \sigma \in \Sigma.\ L_n(\sigma) \neq L_n(\mathcal{T}(p, \sigma))$*, and it includes the state label* $L_{pc}$ *iff* $\exists p_a, p_b \in P(\iota).\ \exists \sigma \in \Sigma.\ L_{pc}(\mathcal{T}(p_a, \sigma)) \neq L_{pc}(\mathcal{T}(p_b, \sigma))$*.*

**Definition 10 (Field Write Set).** *Let* $f$ *be a field of an abstract instruction* $\iota = (\mathit{op}, \mathcal{F})$ *in* $(\mathcal{I}, \Sigma, \mathcal{T}, \Phi)$*. The* write set *of* $\iota$ *and* $f$*,* $\mathit{Write}(\iota, f)$*, includes the state label* $L_n \in \mathcal{L}$ *iff* $\exists p_a, p_b \in P(\iota).\ \exists \sigma \in \Sigma.\ (p_a.f \neq p_b.f) \wedge \bigl(\bigwedge_{g \in \mathit{dom}(\mathcal{F}) \setminus \{f\}} p_a.g = p_b.g\bigr) \wedge L_n(\mathcal{T}(p_a, \sigma)) \neq L_n(\mathcal{T}(p_b, \sigma))$*, where* $p.f$ *denotes* $F(f)$ *for* $p = (\mathit{op}, F)$*.*
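Definitions 8–10 can be evaluated by exhaustive enumeration on a machine that is small enough. The Python sketch below does so for a hypothetical machine with 2-bit values, two registers, and two memory cells; brute force here stands in for the SMT queries that JitSynth issues through the host symbolic evaluator. The three abstract instructions (`addi`, `sb`, `jmp`) are simplified stand-ins and do not follow the semantics of Fig. 1.

```python
from itertools import product

VALS = range(4)                      # 2-bit values
LABELS = ("pc", "reg", "mem")        # state components

# Hypothetical abstract instructions: opcode -> {field: domain}.
ABSTRACT = {
    "addi": {"dst": range(2), "imm": VALS},    # reg[dst] += imm
    "sb": {"rs": range(2), "addr": range(2)},  # mem[addr] = reg[rs]
    "jmp": {"off": VALS},                      # pc += off
}


def concrete(op):
    """All concrete instances of an abstract instruction, as field dicts."""
    fields = ABSTRACT[op]
    return [dict(zip(fields, vals)) for vals in product(*fields.values())]


STATES = [
    {"pc": pc, "reg": reg, "mem": mem}
    for pc, reg, mem in product(VALS, product(VALS, repeat=2), product(VALS, repeat=2))
]


def step(op, f, s):
    """Transition function of the toy machine (arithmetic is modulo 4)."""
    pc, reg, mem = s["pc"], list(s["reg"]), list(s["mem"])
    if op == "addi":
        reg[f["dst"]] = (reg[f["dst"]] + f["imm"]) % 4
        pc = (pc + 1) % 4
    elif op == "sb":
        mem[f["addr"]] = reg[f["rs"]]
        pc = (pc + 1) % 4
    elif op == "jmp":
        pc = (pc + f["off"]) % 4
    return {"pc": pc, "reg": tuple(reg), "mem": tuple(mem)}


def write_set(op):
    """Definition 9: reg/mem if some instance changes them; pc if two
    instances can produce different program counters from one state."""
    out = set()
    for s in STATES:
        results = [step(op, p, s) for p in concrete(op)]
        out |= {n for n in ("reg", "mem") if any(r[n] != s[n] for r in results)}
        if len({r["pc"] for r in results}) > 1:
            out.add("pc")
    return out


def differing_outputs(outs, labels):
    """Labels among `labels` on which the given output states disagree."""
    return {n for n in labels if len({o[n] for o in outs}) > 1}


def read_set(op):
    """Definition 8: n is read if two states differing only in component n
    lead to outputs that differ on some written component."""
    written, out = write_set(op), set()
    for n in LABELS:
        groups = {}
        for s in STATES:  # group states that agree on everything except n
            groups.setdefault(tuple(s[m] for m in LABELS if m != n), []).append(s)
        for p in concrete(op):
            for group in groups.values():
                if differing_outputs([step(op, p, s) for s in group], written):
                    out.add(n)
    return out


def field_write_set(op, f):
    """Definition 10: components that can differ when only field f differs."""
    groups, out = {}, set()
    for p in concrete(op):  # group instances that agree on all fields but f
        groups.setdefault(tuple(v for g, v in p.items() if g != f), []).append(p)
    for s in STATES:
        for group in groups.values():
            out |= differing_outputs([step(op, p, s) for p in group], LABELS)
    return out
```

On this toy machine, `addi` reads and writes only registers, while `jmp` reads and writes only the program counter. Following Definition 8 literally, `sb` reads memory as well as registers, because input states that differ in the cell it does not overwrite still yield different output memories.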

*Using Read and Write Sets.* Given the read and write sets for a source instruction $\iota$ and target instructions $\mathcal{I}_T$, JitSynth generates the RW sketch of length $k$ and depth $d$ by modifying the Naive algorithm (Fig. 4) as follows. First, it restricts each target instruction hole (line 7) to choose an instruction $\iota_T \in \mathcal{I}_T$ with the same read and write sets as $\iota$, i.e., $\mathit{Read}(\iota) = \mathit{Read}(\iota_T)$ and $\mathit{Write}(\iota) = \mathit{Write}(\iota_T)$. Second, it restricts the target field holes (line 9) to use the source fields with the matching field write set, i.e., the hole for a target field $f_T$ uses the source field $f$ when $\mathit{Write}(\iota_T, f_T) = \mathit{Write}(\iota, f)$. For example, given the sets from Fig. 5, the RW instruction holes for addi32 exclude sb but include lui, and the field holes for lui use only the *dst* and *imm* source fields. More generally, the RW sketch for addi32 consists of register-register instructions over *dst* and *imm*, as intended. This sketch includes 2<sup>290</sup> programs of length $k = 5$ and depth $d \le 3$, resulting in a 2<sup>60</sup>-fold reduction in the size of the search space compared to the Naive sketch of the same length and depth.

#### **4.4 Generating Pre-load Sketches**

The pre-load sketch, $\mathrm{PLD}(k, d, \iota, \mathcal{A}_S, \mathcal{A}_T, \mathcal{M})$, is based on the observation that hand-written JITs use macros or subroutines to generate frequently used target instruction sequences. For example, compiling a source instruction with immediate fields often involves loading the immediates into scratch registers, and hand-written JITs include a subroutine that generates the target instructions for performing these loads. The pre-load sketch shown in Fig. 6 mimics this structure.

In particular, PLD generates a sequence of $m$ concrete instructions that load the (used) immediate fields of $\iota$, followed by a sequence of $k - m$ instruction holes. The instruction holes can refer to both the source registers (if any) and the scratch registers (via the arbitrary bitvector constants included in the *Field* holes). The function $\mathit{Load}(\mathit{Expr}(p.f), \mathcal{A}\_T, \mathcal{M})$ returns a sequence of target instructions that load the immediate $p.f$ into an unused scratch register. This function itself is synthesized by JitSynth using a variant of the RW sketch.

As an example, the pre-load sketch for addi32 consists of two *Load* instructions (lui and addiw in the generated C code) and $k - 2$ instruction holes. The holes choose among register-register instructions in toy RISC-V, and they can refer to the *dst* register of addi32, as well as any scratch register. The resulting sketch includes $2^{100}$ programs of length $k = 5$ and depth $d \le 3$, providing a $2^{190}$-fold reduction in the size of the search space compared to the RW sketch.
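As an illustration of what such a load sequence looks like, the sketch below builds a `lui`/`addiw` pair for a 32-bit immediate and checks it with a tiny interpreter. It follows the standard RISC-V semantics of these two instructions; the helper names (`sign_extend`, `load_imm32`, `run`) and the tuple encoding of instructions are hypothetical, and this is a hand-written sketch, not the *Load* function that JitSynth synthesizes.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def load_imm32(rd, imm):
    """Return a (lui, addiw) pair that loads the 32-bit immediate `imm` into `rd`."""
    lo = sign_extend(imm, 12)              # low 12 bits, as addiw will sign-extend them
    hi = ((imm - lo) >> 12) & 0xFFFFF      # upper 20 bits, compensating for a negative lo
    return [("lui", rd, hi), ("addiw", rd, rd, lo)]

def run(program, regs):
    """Minimal interpreter for the two instructions above (registers hold signed ints)."""
    for insn in program:
        if insn[0] == "lui":
            _, rd, hi = insn
            regs[rd] = sign_extend(hi << 12, 32)        # place hi in bits 31:12
        elif insn[0] == "addiw":
            _, rd, rs, lo = insn
            regs[rd] = sign_extend(regs[rs] + lo, 32)   # 32-bit add, sign-extended
    return regs

# Each immediate should come back unchanged (as a sign-extended 32-bit value).
for imm in (0, 1, 0x7FF, 0x800, 0x12345FFF, 0x7FFFF800, 0xFFFFFFFF):
    regs = run(load_imm32("t0", imm), {"t0": 0})
    assert regs["t0"] == sign_extend(imm, 32)
```

The adjustment of `hi` by the sign of `lo` is the detail that hand-written JITs typically hide inside a load-immediate subroutine, which is why fixing this prefix ahead of time removes so much of the search space.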

**Fig. 6.** Pre-load sketch of length $k$ and maximum depth $d$ for $\iota$, $\mathcal{A}\_S$, $\mathcal{A}\_T$, and $\mathcal{M}$. The $\mathit{Load}(E, \mathcal{A}\_T, \mathcal{M})$ function returns a sequence of target instructions that load the immediate value described by the expression $E$ into an unused scratch register; see Fig. 4 for descriptions of other helper functions.

#### **4.5 Solving Compiler Metasketches**

JitSynth solves the metasketch CMS$(\iota, \mathcal{A}\_S, \mathcal{A}\_T, \mathcal{M})$ by applying the host synthesizer to each of the generated sketches in turn until a mini compiler is found. If no mini compiler exists in the search space, this synthesis process runs forever. To check if a sketch $\mathcal{S}$ contains a mini compiler, JitSynth would ideally ask the host synthesizer to solve the following query, derived from Definitions 4–6:

$$\exists C \in \mathcal{S}. \,\forall \sigma\_S \in \Sigma\_S, \,\,\sigma\_T \in \Sigma\_T, \,\, p \in P(\iota). \sigma\_S \cong\_{\mathcal{M}} \sigma\_T \Rightarrow \mathcal{A}\_S(p, \sigma\_S) \cong\_{\mathcal{M}} \mathcal{A}\_T(C(p), \sigma\_T)$$

But recall that the state equivalence check $\cong\_{\mathcal{M}}$ involves universally quantified formulas over memory addresses and register names. In principle, these innermost quantifiers are not problematic because they range over finite domains (bitvectors), so the formula remains decidable. In practice, however, they lead to intractable SMT queries. We therefore solve a stronger soundness query (Definition 11) that pulls these quantifiers out to obtain the standard ∃∀ formula with a quantifier-free body. The resulting formula can be solved with CEGIS [37], without requiring the underlying SMT solver to reason about quantifiers.
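The CEGIS loop for an ∃∀ query can be sketched as follows. This is a minimal illustration that replaces both the synthesizer and the verifier with brute-force enumeration over small finite domains; the candidate space, the specification `spec`, and all function names are hypothetical and unrelated to JitSynth's sketches or to Rosette's API.

```python
def cegis(candidates, inputs, spec, run):
    """Find a candidate c with run(c, x) == spec(x) for all x in inputs, or None."""
    examples = []  # counterexamples collected so far
    while True:
        # Synthesis step: pick any candidate consistent with the examples.
        cand = next((c for c in candidates
                     if all(run(c, x) == spec(x) for x in examples)), None)
        if cand is None:
            return None  # no candidate in this search space works
        # Verification step: look for an input on which the candidate is wrong.
        cex = next((x for x in inputs if run(cand, x) != spec(x)), None)
        if cex is None:
            return cand  # correct on every input
        examples.append(cex)

# Hypothetical search space: one 8-bit operation with a small constant operand.
OPS = {
    "add": lambda x, c: x + c,
    "mul": lambda x, c: x * c,
    "shl": lambda x, c: x << c,
}

def run(cand, x):
    op, c = cand
    return OPS[op](x, c) & 0xFF

def spec(x):
    return (3 * x) & 0xFF

candidates = [(op, c) for op in ("add", "mul", "shl") for c in range(8)]
result = cegis(candidates, range(256), spec, run)
print(result)  # ('mul', 3)
```

Each iteration adds an input that rules out the current candidate, so the loop terminates on a finite candidate list. A metasketch solver in the style described above would call such a procedure on one sketch after another.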

**Definition 11 (Strongly Sound Mini Compiler).** *Let* $\mathcal{A}\_S = (\mathcal{I}\_S, \Sigma\_S, \mathcal{T}\_S, \Phi\_S)$ *and* $\mathcal{A}\_T = (\mathcal{I}\_T, \Sigma\_T, \mathcal{T}\_T, \Phi\_T)$ *be two abstract register machines,* $\cong\_{\mathcal{M}}$ *an injective state equivalence relation on their states* $\Sigma\_S$ *and* $\Sigma\_T$*, and* $C : P(\iota) \to \mathit{List}(P(\mathcal{I}\_T))$ *a function for some* $\iota \in \mathcal{I}\_S$*. We say that* $C$ *is a* strongly sound mini compiler *for* $\iota$ *with respect to* $\cong\_{\mathcal{M}}$ *iff*

$$\begin{aligned} &\forall \sigma\_S \in \Sigma\_S, \ \sigma\_T \in \Sigma\_T, \ p \in P(\iota), \ a \in dom(mem(\sigma\_S)), \ r \in dom(reg(\sigma\_S)). \\ &\sigma\_S \cong\_{\mathcal{M}, a, r} \sigma\_T \Rightarrow \mathcal{A}\_S(p, \sigma\_S) \cong\_{\mathcal{M}, a, r} \mathcal{A}\_T(C(p), \sigma\_T) \end{aligned}$$

*where* $\cong\_{\mathcal{M},a,r}$ *stands for the* $\cong\_{\mathcal{M}}$ *formula with* $a$ *and* $r$ *as free variables.*

The JitSynth synthesis procedure is sound and complete with respect to this stronger query (Theorem 2). The proof follows from the soundness and completeness of the host synthesizer, and the construction of the compiler metasketch. We discharge this proof using the Lean theorem prover [25].

**Theorem 2 (Strong soundness and completeness of** JitSynth**).** *Let* $\mathcal{C} = \mathrm{CMS}(\iota, \mathcal{A}\_S, \mathcal{A}\_T, \mathcal{M})$ *be the compiler metasketch for the abstract instruction* $\iota$*, machines* $\mathcal{A}\_S$ *and* $\mathcal{A}\_T$*, and the state mapping* $\mathcal{M}$*. If* JitSynth *terminates and returns a program* $C$ *when applied to* $\mathcal{C}$*, then* $C$ *is a strongly sound mini compiler for* $\iota$ *and* $\mathcal{A}\_T$ *(soundness). If there is a strongly sound mini compiler in the most general search space* $\{\mathrm{Naive}(k, d, \iota, \mathcal{A}\_S, \mathcal{A}\_T, \mathcal{M}) \mid k, d \in \mathbb{N}\}$*, then* JitSynth *will terminate on* $\mathcal{C}$ *and produce a program (completeness).*

#### **5 Implementation**

We implemented JitSynth as described in Sect. 2 using Rosette [39] as our host language. Since the search spaces for different compiler lengths are disjoint, the JitSynth implementation searches these spaces in parallel [7]. We use Φ(*p*) = length(*p*) as the fuel function for all languages studied in this paper. This provides sufficient fuel for evaluating programs in these languages that are accepted by the OS kernel. For example, the Linux kernel requires eBPF programs to be loop-free, and it enforces this restriction with a conservative static check; programs that fail the check are not passed to the JIT [13].

#### **6 Evaluation**

This section evaluates JitSynth by answering the following research questions:

**RQ1**: Can JitSynth synthesize correct and performant compilers for real-world source and target languages?

**RQ2**: How effective are the sketch optimizations described in Sect. 4?

#### **6.1 Synthesizing Compilers for Real-World Source-Target Pairs**

To demonstrate the effectiveness of JitSynth, we applied JitSynth to synthesize compilers for three different source-target pairs: eBPF to 64-bit RISC-V, classic BPF to eBPF, and libseccomp to eBPF. This subsection describes our results for each of the synthesized compilers.

**Fig. 7.** Execution time of eBPF benchmarks on the HiFive Unleashed RISC-V development board, using the existing Linux eBPF to RISC-V compiler, the JitSynth compiler, and the Linux eBPF interpreter. Measured in processor cycles.

*eBPF to RISC-V.* As a case study, we applied JitSynth to synthesize a compiler from eBPF to 64-bit RISC-V. It supports 87 of the 102 eBPF instruction opcodes; unsupported eBPF instructions include function calls, endianness operations, and atomic instructions. To validate that the synthesized compiler is correct, we ran the existing eBPF test cases from the Linux kernel; our compiler passes all test cases it supports. In addition, our compiler avoids bugs previously found in the existing Linux eBPF-to-RISC-V compiler [27]. To evaluate performance, we compared against the existing Linux compiler. We used the same set of benchmarks used by Jitk [40], which includes system call filters from widely used applications. Because these benchmarks were originally for classic BPF, we first compile them to eBPF using the existing Linux classic-BPF-to-eBPF compiler as a preprocessing step. To run the benchmarks, we execute the generated code on the HiFive Unleashed RISC-V development board [35], measuring the number of cycles. As input to the filter, we use a system call number that is allowed by the filter to represent the common case execution.

Figure 7 shows the results of the performance evaluation. eBPF programs compiled by JitSynth JIT compilers show an average slowdown of 1.82× compared to programs compiled by the existing Linux compiler. This overhead results from additional complexity in the compiled eBPF jump instructions. Linux compilers avoid this complexity by leveraging bounds on the size of eBPF jump offsets. JitSynth-compiled programs get an average speedup of 5.24× compared to interpreting the eBPF programs. This evidence shows that JitSynth can synthesize a compiler that outperforms the current Linux eBPF interpreter, and nears the performance of the Linux compiler, while avoiding bugs.

*Classic BPF to eBPF.* Classic BPF is the original, simpler version of BPF used for packet filtering, which was later extended to eBPF in Linux. Since many applications still use classic BPF, Linux must first compile classic BPF to eBPF as an intermediary step before compiling to machine instructions. As a second case study, we used JitSynth to synthesize a compiler from classic BPF to eBPF. Our synthesized compiler supports all classic BPF opcodes. To evaluate performance, we compare against the existing Linux classic-BPF-to-eBPF compiler. Similar to the RISC-V benchmarks, we run each eBPF program with input that is allowed by the filter. Because eBPF does not run directly on hardware, we measure the number of instructions executed instead of processor cycles.

**Fig. 8.** Performance of code generated by JitSynth compilers compared to existing compilers for the classic BPF to eBPF benchmarks (left) and the libseccomp to eBPF benchmarks (right). Measured in number of instructions executed.

Figure 8 shows the performance results. Classic BPF programs generated by JitSynth compilers execute an average of 2.28× more instructions than those compiled by Linux.

*Libseccomp to eBPF.* libseccomp is a library used to simplify construction of BPF system call filters. The existing libseccomp implementation compiles to classic BPF; we instead choose to compile to eBPF because classic BPF has only two registers, which does not satisfy the assumptions of JitSynth. Since libseccomp is a library and does not have distinct instructions, libseccomp itself does not meet the definition of an abstract register machine; we instead introduce an intermediate libseccomp language which does satisfy this definition. Our full libseccomp to eBPF compiler is composed of both a trusted program to translate from libseccomp to our intermediate language and a synthesized compiler from our intermediate language to eBPF.

To evaluate performance, we select a set of benchmark filters from real-world applications that use libseccomp, and measure the number of eBPF instructions executed for an input the filter allows. Because no compiler translates libseccomp to eBPF directly, we compare against the composition of the existing libseccomp-to-classic-BPF and classic-BPF-to-eBPF compilers.

Figure 8 shows the performance results. libseccomp programs generated by JitSynth execute 2.61× more instructions on average compared to the existing libseccomp-to-eBPF compiler stack. However, the synthesized compiler avoids bugs previously found in the libseccomp-to-classic-BPF compiler [16].

#### **6.2 Effectiveness of Sketch Optimizations**

In order to evaluate the effectiveness of the search optimizations described in Sect. 4, we measured the time JitSynth takes to synthesize each of the three compilers with different optimizations enabled. Specifically, we run JitSynth in three different configurations: (1) using Naive sketches, (2) using RW sketches, and (3) using PLD sketches. For each configuration, we ran JitSynth with a timeout of 48 hours (or until out of memory). Figure 9 shows the time to synthesize each compiler under each configuration. Note that these figures do not include time spent computing read and write sets, which takes less than 11 min for all cases. Our results were collected using an 8-core AMD Ryzen 7 1700 CPU with 16 GB memory, running Racket v7.4 and the Boolector [29] solver v3.0.1-pre.

**Fig. 9.** Synthesis time for each source-target pair, broken down by set of optimizations used in the sketch. An X indicates that synthesis either timed out or ran out of memory.

When synthesizing the eBPF-to-RISC-V compiler, JitSynth runs out of memory with Naive sketches, reaches the timeout with RW sketches, and completes synthesis with PLD sketches. For the classic-BPF-to-eBPF compiler, JitSynth times out with both Naive sketches and RW sketches, and finishes synthesis only with PLD sketches. For the libseccomp-to-eBPF compiler, all configurations finish, but JitSynth finishes synthesis about 34× faster with PLD sketches than with Naive sketches. These results demonstrate that the techniques JitSynth uses are essential to the scalability of JIT synthesis.

#### **7 Related Work**

*JIT Compilers for In-kernel Languages.* JIT compilers have been widely used to improve the extensibility and performance of systems software, such as OS kernels [8,11,12,26]. One notable system is Jitk [40]. It builds on the CompCert compiler [20] to compile classic BPF programs to machine instructions. Both Jitk and CompCert are formally verified for correctness using the Coq interactive theorem prover. Jitk is further extended to support eBPF [36]. Like Jitk, JitSynth provides formal correctness guarantees of JIT compilers. Unlike Jitk, JitSynth does not require developers to write either the implementation or proof of a JIT compiler. Instead, it takes as input interpreters of both source and target languages and state-mapping functions, using automated verification and synthesis to produce a JIT compiler.

An in-kernel extension system such as eBPF also contains a *verifier*, which checks for safety and termination of input programs [13,40]. JitSynth assumes a well-formed input program that passes the verifier and focuses on the correctness of JIT compilation.

*Synthesis-Aided Compilers.* There is a rich literature that explores generating and synthesizing peephole optimizers and superoptimizers based on a given ISA or language specification [4,9,14,17,23,33,34]. Bansal and Aiken described a PowerPC-to-x86 binary translator using peephole superoptimization [5]. Chlorophyll [31] applied synthesis to a number of compilation tasks for the GreenArrays GA144 architecture, including code partitioning, layout, and generation. JitSynth bears the similarity of translation between a source-target pair of languages and shares the challenge of scaling up synthesis. Unlike existing work, JitSynth synthesizes a *compiler* written in a host language, and uses compiler metasketches for efficient synthesis.

*Compiler Testing.* Compilers are complex pieces of software and are known to be difficult to get right [22]. Recent advances in compiler testing, such as Csmith [41] and EMI [42], have found hundreds of bugs in the GCC and LLVM compilers. Alive [19,21] and Serval [28] use automated verification techniques to uncover bugs in LLVM's peephole optimizer and the Linux kernel's eBPF JIT compilers, respectively. JitSynth complements these tools by providing a correctness-by-construction approach for writing JIT compilers.

#### **8 Conclusion**

This paper presents a new technique for synthesizing JIT compilers for in-kernel DSLs. The technique creates per-instruction compilers, that is, compilers that independently translate single source instructions to sequences of target instructions. In order to synthesize each per-instruction compiler, we frame the problem as search using compiler metasketches, which are optimized using both read and write set information as well as pre-synthesized load operations. We implement these techniques in JitSynth and evaluate JitSynth over three source and target pairs from the Linux kernel. Our evaluation shows that (1) JitSynth can synthesize correct and performant compilers for real in-kernel languages, and (2) the optimizations discussed in this paper make the synthesis of these compilers tractable for JitSynth. As future in-kernel DSLs are created, JitSynth can reduce both the programming and proof burden on developers writing compilers for those DSLs. The JitSynth source code is publicly available at https://github.com/uw-unsat/jitsynth.

#### **References**



## **Program Synthesis Using Deduction-Guided Reinforcement Learning**

Yanju Chen<sup>1</sup>(✉), Chenglong Wang<sup>2</sup>, Osbert Bastani<sup>3</sup>, Isil Dillig<sup>4</sup>, and Yu Feng<sup>1</sup>(✉)

<sup>1</sup> University of California, Santa Barbara, Santa Barbara, CA 93106, USA {yanju,yufeng}@cs.ucsb.edu

<sup>2</sup> University of Washington, Seattle, WA 98115, USA clwang@cs.washington.edu

<sup>3</sup> University of Pennsylvania, Philadelphia, PA 19104, USA obastani@seas.upenn.edu

<sup>4</sup> The University of Texas at Austin, Austin, TX 78712, USA isil@cs.utexas.edu

**Abstract.** In this paper, we present a new program synthesis algorithm based on reinforcement learning. Given an initial policy (i.e. statistical model) trained off-line, our method uses this policy to guide its search and gradually improves it by leveraging feedback obtained from a deductive reasoning engine. Specifically, we formulate program synthesis as a reinforcement learning problem and propose a new variant of the *policy gradient* algorithm that can incorporate feedback from a deduction engine into the underlying statistical model. The benefit of this approach is two-fold: First, it combines the power of deductive and statistical reasoning in a unified framework. Second, it leverages deduction not only to *prune* the search space but also to *guide* search. We have implemented the proposed approach in a tool called Concord and experimentally evaluate it on synthesis tasks studied in prior work. Our comparison against several baselines and two existing synthesis tools shows the advantages of our proposed approach. In particular, Concord solves 15% more benchmarks compared to Neo, a state-of-the-art synthesis tool, while improving synthesis time by 8.71× on benchmarks that can be solved by both tools.

## **1 Introduction**

Due to its potential to significantly improve both programmer productivity and software correctness, *automated program synthesis* has gained enormous popularity over the last decade. Given a high-level specification of user intent, most modern synthesizers perform some form of backtracking search in order to find a program that satisfies the specification. However, due to the enormous size of the search space, synthesizers additionally use at least one of two other techniques, namely deduction and statistical reasoning, to make this approach practical. For example, many recent synthesis techniques use lightweight program analysis or logical reasoning to significantly prune the search space [18,19,39,53]. On the other hand, several recent approaches utilize a statistical model (trained off-line) to bias the search towards programs that are more likely to satisfy the specification [2,4,7,19]. While both deductive and statistical reasoning have been shown to dramatically improve synthesis efficiency, a key limitation of existing approaches is that they do not tightly combine these two modes of reasoning. In particular, although logical reasoning often provides very useful feedback at synthesis time, existing synthesis algorithms do not leverage such feedback to improve their statistical model.

This work was sponsored by the National Science Foundation under agreement numbers 1908494, 1811865, and 1910769.

© The Author(s) 2020

S. K. Lahiri and C. Wang (Eds.): CAV 2020, LNCS 12225, pp. 587–610, 2020. https://doi.org/10.1007/978-3-030-53291-8\_30

**Fig. 1.** Overview of our synthesis algorithm

In this paper, we propose a new synthesis algorithm that meaningfully combines deductive and statistical reasoning. Similar to prior techniques, our approach starts with a statistical model (henceforth called a *policy*) that is trained off-line on a representative set of training problems and uses this policy to guide search. However, unlike prior techniques, our method *updates* this policy on-line at synthesis time and gradually improves the policy by incorporating feedback from a deduction engine.

To achieve this tight coupling between deductive and statistical reasoning, we formulate syntax-guided synthesis as a reinforcement learning (RL) problem. Specifically, given a context-free grammar for the underlying DSL, we think of partial (i.e., incomplete) programs in this DSL as states in a Markov Decision Process (MDP) and actions as grammar productions. Thus, a *policy* of this MDP specifies how a partial program should be extended to obtain a more specific program. Then, the goal of our reinforcement learning problem is to improve this policy over time as some partial programs are proven infeasible by an underlying deduction engine.

While the framework of reinforcement learning is a good fit for our problem, standard RL algorithms (e.g., policy gradient) typically update the policy based on feedback received from states that have *already* been explored. However, in the context of program synthesis, deductive reasoning can also provide feedback about states that have *not* been explored. For example, given a partial program that is infeasible, one can analyze the root cause of failure to infer *other* infeasible programs [18,54]. To deal with this difficulty, we propose an *off-policy* reinforcement learning algorithm that can improve the policy based on such additional feedback from the deduction engine.

As shown schematically in Fig. 1, our synthesis algorithm consists of three conceptual elements, indicated as "Take action", "Deduce", and "Update policy". Given the current policy $\pi$ and partial program $P$, "Take action" uses $\pi$ to expand $P$ into a more complete program $P'$. Then, "Deduce" employs existing deductive reasoning techniques (e.g., [18,32]) to check whether $P'$ is feasible with respect to the specification. If this is not the case, "Update policy" uses the feedback provided by the deduction engine to improve $\pi$. Specifically, the policy is updated using an off-policy variant of the *policy gradient* algorithm, where the gradient computation is adapted to our unique setting.

We have implemented the proposed method in a new synthesis tool called Concord and empirically evaluate it on synthesis tasks used in prior work [2,18]. We also compare our method with several relevant baselines as well as two existing synthesis tools. Notably, our evaluation shows that Concord can solve 15% more benchmarks compared to Neo (a state-of-the-art synthesis tool), while being 8.71× faster on benchmarks that can be solved by both tools. Furthermore, our ablation study demonstrates the empirical benefits of our proposed reinforcement learning algorithm.

To summarize, this paper makes the following key contributions:


The rest of this paper is structured as follows. First, we provide some background on reinforcement learning and MDPs (Sect. 2) and introduce our problem formulation in Sect. 3. After formulating the synthesis problem as an MDP in Sect. 4, we present our synthesis algorithm in Sect. 5. Sections 6 and 7 describe our implementation and evaluation, respectively. Finally, we discuss related work and future research directions in Sects. 8 and 9.

#### **2 Background on Reinforcement Learning**

At a high level, the goal of reinforcement learning (RL) is to train an agent, such as a robot, to make a sequence of decisions (e.g., move up/down/left/right) in order to accomplish a task. All relevant information about the environment and the task is specified as a *Markov decision process (MDP)*. Given an MDP, the goal is to compute a policy that specifies how the agent should act in each state to maximize their chances of accomplishing the task.

In the remainder of this section, we provide background on MDPs and describe the policy gradient algorithm that our method will build upon.

*Markov Decision Process.* We formalize a *Markov decision process (MDP)* as a tuple $\mathcal{M} = (\mathcal{S}, \mathcal{S}\_I, \mathcal{S}\_T, \mathcal{A}, \mathcal{F}, \mathcal{R})$, where:

- $\mathcal{S}$ is a set of states,
- $\mathcal{S}\_I$ is the initial state distribution,
- $\mathcal{S}\_T \subseteq \mathcal{S}$ is a set of final states,
- $\mathcal{A}$ is a set of actions,
- $\mathcal{F} : \mathcal{S} \times \mathcal{A} \to \mathcal{S}$ is the transition function,
- $\mathcal{R} : \mathcal{S} \to \mathbb{R}$ is the reward function.


In general, transitions in an MDP can be stochastic; however, for our setting, we only consider deterministic transitions and rewards.

*Policy.* A policy for an MDP specifies how the agent should act in each state. Specifically, we consider a (stochastic) *policy* $\pi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$, where $\pi(S, A)$ is the probability of taking action $A$ in state $S$. Alternatively, we can also think of $\pi$ as a mapping from states to distributions over actions. Thus, we write $A \sim \pi(S)$ to denote that action $A$ is sampled from the distribution for state $S$.

*Rollout.* Given an MDP $\mathcal{M}$ and policy $\pi$, a *rollout* is a sequence of state-action-reward tuples obtained by sampling an initial state and then using $\pi$ to make decisions until a final state is reached. More formally, for a rollout of the form:

$$\zeta = ((S\_1, A\_1, R\_1), \dots, (S\_{m-1}, A\_{m-1}, R\_{m-1}), (S\_m, \varnothing, R\_m)),$$

we have $S\_m \in \mathcal{S}\_T$, $S\_1 \sim \mathcal{S}\_I$ (i.e., $S\_1$ is sampled from an initial state), and, for each $i \in \{1, \dots, m-1\}$, $A\_i \sim \pi(S\_i)$, $R\_i = \mathcal{R}(S\_i)$, and $S\_{i+1} = \mathcal{F}(S\_i, A\_i)$.

In general, a policy $\pi$ induces a distribution $\mathcal{D}\_{\pi}$ over the rollouts of an MDP $\mathcal{M}$. Since we assume that MDP transitions are deterministic, we have:

$$\mathcal{D}\_{\pi}(\zeta) = \prod\_{i=1}^{m-1} \pi(S\_i, A\_i).$$
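The rollout definition and the product formula above can be sketched on a tiny deterministic MDP. The MDP below (states are strings over two actions, with reward 1 only in the final state `"ab"`), the fixed hand-written policy, and all function names are assumptions chosen for illustration.

```python
import random

ACTIONS = ("a", "b")

def transition(state, action):
    """Deterministic transition function F."""
    return state + action

def reward(state):
    """Deterministic reward function R (hypothetical: only "ab" is rewarded)."""
    return 1.0 if state == "ab" else 0.0

def is_final(state):
    return len(state) == 2

def policy(state):
    """pi(state, .) as a dictionary from actions to probabilities."""
    return {"a": 0.75, "b": 0.25} if state == "" else {"a": 0.5, "b": 0.5}

def sample_rollout(rng):
    """Sample ((S_1, A_1, R_1), ..., (S_m, None, R_m)) starting from the empty string."""
    state, rollout = "", []
    while not is_final(state):
        dist = policy(state)
        action = rng.choices(ACTIONS, weights=[dist[a] for a in ACTIONS])[0]
        rollout.append((state, action, reward(state)))
        state = transition(state, action)
    rollout.append((state, None, reward(state)))  # the final state takes no action
    return rollout

def rollout_probability(rollout):
    """D_pi(zeta): the product of pi(S_i, A_i) over all non-final steps."""
    prob = 1.0
    for state, action, _ in rollout[:-1]:
        prob *= policy(state)[action]
    return prob

zeta = [("", "a", 0.0), ("a", "b", 0.0), ("ab", None, 1.0)]
print(rollout_probability(zeta))  # 0.375
print(sample_rollout(random.Random(0)))
```

Because transitions are deterministic, a rollout is fully determined by its action sequence, which is why no transition probabilities appear in the product.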

*RL Problem.* Given an MDP $\mathcal{M}$, the goal of reinforcement learning is to compute an *optimal* policy $\pi^\*$ for $\mathcal{M}$. More formally, $\pi^\*$ should maximize *cumulative expected reward*:

$$
\pi^\* = \arg\max\_{\pi} J(\pi)
$$

where the *cumulative expected reward* J(π) is computed as follows:

$$J(\pi) = \mathbb{E}\_{\zeta \sim \mathcal{D}\_{\pi}} \left[ \sum\_{i=1}^{m} R\_i \right]$$

*Policy Gradient Algorithm.* The *policy gradient algorithm* is a well-known RL algorithm for finding optimal policies. It assumes a parametric policy family $\pi\_{\theta}$ with parameters $\theta \in \mathbb{R}^d$. For example, $\pi\_{\theta}$ may be a deep neural network (DNN), where $\theta$ denotes the parameters of the DNN. At a high level, the policy gradient algorithm uses the following theorem to optimize $J(\pi\_{\theta})$ [48]:

**Theorem 1.** *We have*

$$\nabla\_{\theta} J(\pi\_{\theta}) = \mathbb{E}\_{\zeta \sim \mathcal{D}\_{\pi\_{\theta}}} [\ell(\zeta)] \qquad \text{where} \qquad \ell(\zeta) = \sum\_{i=1}^{m-1} \left( \sum\_{j=i+1}^{m} R\_j \right) \nabla\_{\theta} \log \pi\_{\theta}(S\_i, A\_i). \tag{1}$$

In this theorem, the term $\nabla\_{\theta} \log \pi\_{\theta}(S\_i, A\_i)$ intuitively gives a direction in the parameter space that, when the policy parameters are moved along it, increases the probability of taking action $A\_i$ at state $S\_i$. Also, the sum $\sum\_{j=i+1}^{m} R\_j$ is the total future reward after taking action $A\_i$. Thus, $\ell(\zeta)$ is just the sum of different directions in the parameter space weighted by their corresponding future reward, and the gradient $\nabla\_{\theta} J(\pi\_{\theta})$ moves policy parameters in a direction that increases the probability of taking actions that lead to higher rewards.
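One way to see where Eq. (1) comes from is the standard log-derivative argument, sketched here under the assumptions already made in this section (deterministic transitions and rewards) together with the additional assumptions that there are finitely many rollouts and that $\pi\_{\theta}$ assigns positive probability to every action. This is the textbook argument rather than a derivation taken from the text. Writing $R(\zeta) = \sum\_{i=1}^{m} R\_i$ and using $\nabla\_{\theta} \mathcal{D} = \mathcal{D}\, \nabla\_{\theta} \log \mathcal{D}$:

$$\begin{aligned} \nabla\_{\theta} J(\pi\_{\theta}) &= \sum\_{\zeta} R(\zeta)\, \nabla\_{\theta} \mathcal{D}\_{\pi\_{\theta}}(\zeta) = \sum\_{\zeta} R(\zeta)\, \mathcal{D}\_{\pi\_{\theta}}(\zeta)\, \nabla\_{\theta} \log \mathcal{D}\_{\pi\_{\theta}}(\zeta) \\ &= \mathbb{E}\_{\zeta \sim \mathcal{D}\_{\pi\_{\theta}}} \left[ R(\zeta) \sum\_{i=1}^{m-1} \nabla\_{\theta} \log \pi\_{\theta}(S\_i, A\_i) \right]. \end{aligned}$$

The rewards $R\_1, \dots, R\_i$ are already fixed once $S\_i$ is reached, and for any fixed state the expected score is zero:

$$\mathbb{E}\_{A\_i \sim \pi\_{\theta}(S\_i)} \left[ \nabla\_{\theta} \log \pi\_{\theta}(S\_i, A\_i) \right] = \nabla\_{\theta} \sum\_{A} \pi\_{\theta}(S\_i, A) = \nabla\_{\theta} 1 = 0.$$

Hence the products $R\_j \nabla\_{\theta} \log \pi\_{\theta}(S\_i, A\_i)$ with $j \le i$ vanish in expectation, leaving only the future reward $\sum\_{j=i+1}^{m} R\_j$ as the weight of step $i$, which is the form of $\ell(\zeta)$ in Eq. (1).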

Based on this theorem, we can estimate the gradient $\nabla\_{\theta} J(\pi\_{\theta})$ using rollouts sampled from $\mathcal{D}\_{\pi\_{\theta}}$:

$$\nabla\_{\theta} J(\pi\_{\theta}) \approx \frac{1}{n} \sum\_{k=1}^{n} \ell(\zeta^{(k)}),\tag{2}$$

where $\zeta^{(k)} \sim \mathcal{D}\_{\pi\_{\theta}}$ for each $k \in \{1, \dots, n\}$. The policy gradient algorithm uses stochastic gradient ascent in conjunction with Eq. (2) to maximize $J(\pi\_{\theta})$ [48].
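The estimator in Eq. (2) can be sketched with a tabular softmax policy in place of a neural network. The toy MDP (strings over two actions, reward 1 only in the final state `"ab"`), the hyperparameters, and every function name below are assumptions for illustration; this is plain policy gradient, not the off-policy variant proposed in this paper.

```python
import math
import random

ACTIONS = ("a", "b")

def is_final(state):
    return len(state) == 2

def reward(state):
    return 1.0 if state == "ab" else 0.0

def pi(theta, state):
    """Softmax policy pi_theta(state, .) with one logit per (state, action)."""
    logits = [theta.get((state, a), 0.0) for a in ACTIONS]
    z = sum(math.exp(l) for l in logits)
    return {a: math.exp(l) / z for a, l in zip(ACTIONS, logits)}

def sample_rollout(theta, rng):
    state, rollout = "", []
    while not is_final(state):
        dist = pi(theta, state)
        action = rng.choices(ACTIONS, weights=[dist[a] for a in ACTIONS])[0]
        rollout.append((state, action, reward(state)))
        state += action  # deterministic transition
    rollout.append((state, None, reward(state)))
    return rollout

def ell(theta, rollout):
    """ell(zeta) from Eq. (1), as a sparse gradient over the entries of theta."""
    grad = {}
    rewards = [r for _, _, r in rollout]
    for i, (state, action, _) in enumerate(rollout[:-1]):
        future = sum(rewards[i + 1:])  # total reward after taking this action
        dist = pi(theta, state)
        for a in ACTIONS:
            # Gradient of log-softmax: indicator(a == action) - pi(state, a).
            g = (1.0 if a == action else 0.0) - dist[a]
            grad[(state, a)] = grad.get((state, a), 0.0) + future * g
    return grad

def train(steps=200, n=16, lr=0.5, seed=0):
    rng, theta = random.Random(seed), {}
    for _ in range(steps):
        total = {}
        for _ in range(n):  # n rollouts per gradient estimate, as in Eq. (2)
            for key, g in ell(theta, sample_rollout(theta, rng)).items():
                total[key] = total.get(key, 0.0) + g
        for key, g in total.items():
            theta[key] = theta.get(key, 0.0) + lr * g / n  # gradient ascent step
    return theta

theta = train()
print(pi(theta, ""), pi(theta, "a"))
```

In this toy problem only rollouts that end in `"ab"` carry a nonzero weight, so each update can only raise the probability of choosing `a` first and `b` second.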

#### **3 Problem Formulation**

In this paper, we focus on the setting of syntax-guided synthesis [1]. Specifically, given a domain-specific language (DSL) L and a specification φ, our goal is to find a program in L that satisfies φ. In the remainder of this section, we formally define our synthesis problem and clarify our assumptions.

*DSL.* We assume a domain-specific language L specified as a context-free grammar L = (V, Σ, R, S), where V,Σ denote non-terminals and terminals respectively, R is a set of productions, and S is the start symbol.

**Definition 1 (Partial program).** *A* partial program $P$ *is a sequence* $P \in (\Sigma \cup V)^\*$ *such that* $S \stackrel{\*}{\Rightarrow} P$ *(i.e.,* $P$ *can be derived from* $S$ *via a sequence of productions). We refer to any non-terminal in* $P$ *as a* hole*, and we say that* $P$ *is* complete *if it does not contain any holes.*

$$\begin{array}{l} S \to N \mid L \\ N \to 0 \mid \dots \mid 10 \mid x\_i \\ L \to x\_i \mid \mathsf{take}(L, N) \mid \mathsf{drop}(L, N) \mid \mathsf{sort}(L) \\ \mid \mathsf{reverse}(L) \mid \mathsf{add}(L, L) \mid \mathsf{sub}(L, L) \mid \mathsf{sumUpTo}(L) \end{array}$$

**Fig. 2.** A simple programming language used for illustration. Here, take (resp. drop) keeps (resp. removes) the first N elements in the input list. Also, add (resp. sub) computes a new list by adding (resp. subtracting) elements from the two lists pair-wise. Finally, sumUpTo generates a new list where the i'th element in the output list is the sum of all previous elements (including the i'th element) in the input list.

Given a partial program $P$ containing a hole $H$, we can fill this hole by replacing $H$ with the right-hand side of any grammar production $r$ of the form $H \to e$. We use the notation $P \stackrel{r}{\Rightarrow} P'$ to indicate that $P'$ is the partial program obtained by replacing the first occurrence of $H$ with the right-hand side of $r$, and we write $\mathsf{Fill}(P, r) = P'$ whenever $P \stackrel{r}{\Rightarrow} P'$.

*Example 1.* Consider the small programming language shown in Fig. 2 for manipulating lists of integers. The following partial program $P$ over this DSL contains three holes, namely $L\_1$, $L\_2$, and $N\_1$:

$$\mathtt{add}(L\_1, \mathtt{take}(L\_2, N\_1))$$

Now, consider the production $r \equiv L \to \mathsf{reverse}(L)$. In this case, $\mathsf{Fill}(P, r)$ yields the following partial program $P'$:

$$\mathsf{add}(\mathsf{reverse}(L\_1), \mathsf{take}(L\_2, N\_1))$$
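The fill operation from Example 1 can be sketched over a token-list representation of partial programs. The representation, the tokenizer, and the function names are assumptions made for this sketch; the subscripts in Example 1 only distinguish occurrences of the same non-terminal, so they are dropped here.

```python
import re

# Non-terminals of the Fig. 2 grammar; every other token is a terminal.
NONTERMINALS = {"S", "N", "L"}

def tokenize(text):
    """Split a program string into identifiers, numbers, and punctuation."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[(),]", text)

def holes(program):
    """All non-terminal occurrences (holes) in a partial program, in order."""
    return [tok for tok in program if tok in NONTERMINALS]

def is_complete(program):
    return not holes(program)

def fill(program, production):
    """Replace the first occurrence of the production's left-hand side."""
    lhs, rhs = production
    for i, tok in enumerate(program):
        if tok == lhs:
            return program[:i] + list(rhs) + program[i + 1:]
    raise ValueError(f"no hole {lhs!r} in program")

P = tokenize("add(L, take(L, N))")
r = ("L", tokenize("reverse(L)"))
P_prime = fill(P, r)
print("".join(P_prime))  # add(reverse(L),take(L,N))
```

Viewed as an MDP, `P` is a state, `r` is an action, and `fill` plays the role of the deterministic transition function.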

*Program Synthesis Problem.* Given a specification $\phi$ and language $L = (V, \Sigma, R, S)$, the goal of program synthesis is to find a *complete* program $P$ such that $S \stackrel{\*}{\Rightarrow} P$ and $P$ satisfies $\phi$. We use the notation $P \models \phi$ to indicate that $P$ is a complete program that satisfies specification $\phi$.

*Deduction Engine.* In the remainder of this paper, we assume access to a *deduction engine* that can determine whether a partial program P is *feasible* with respect to specification φ. To make this more precise, we introduce the following notion of feasibility.

**Definition 2 (Feasible partial program).** *Given a specification* $\phi$ *and language* $L = (V, \Sigma, R, S)$*, a partial program* $P$ *is said to be* feasible *with respect to* $\phi$ *if there exists any complete program* $P'$ *such that* $P \stackrel{\*}{\Rightarrow} P'$ *and* $P' \models \phi$*.*

In other words, a feasible partial program can be refined into a complete program that satisfies the specification. We assume that our deduction oracle over-approximates feasibility. That is, if P is feasible with respect to specification φ, then Deduce(P, φ) should report that P is feasible but not necessarily vice versa. Note that almost all deduction techniques used in the program synthesis literature satisfy this assumption [18,19,21,27,53].

*Example 2.* Consider again the DSL from Fig. 2 and the specification φ defined by the following input-output example:

$$[65, 2, 73, 62, 78] \mapsto [143, 129, 213, 204, 345]$$

The partial program add(reverse(x), take(x, N)) is infeasible because, no matter what production we use to fill non-terminal N, the resulting program cannot satisfy the provided specification for the following reason:


Several techniques from prior work (e.g., [18,19,39,53]) can prove the infeasibility of such partial programs by using an SMT solver (provided specifications are given for the DSL constructs).
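The claim in Example 2 can be checked by brute force on a concrete interpreter, as in the minimal sketch below. The Python definitions are an assumed reading of the informal semantics in Fig. 2; in particular, truncating `add` and `sub` to the shorter list is an assumption, and a real deduction engine would establish infeasibility symbolically (e.g., via SMT) rather than by enumerating N.

```python
from itertools import accumulate


def take(xs, n):
    return xs[:n]          # keep the first n elements


def drop(xs, n):
    return xs[n:]          # remove the first n elements


def reverse(xs):
    return xs[::-1]


def add(xs, ys):
    # Pair-wise sum; truncation on unequal lengths is an assumption.
    return [a + b for a, b in zip(xs, ys)]


def sub(xs, ys):
    return [a - b for a, b in zip(xs, ys)]


def sum_up_to(xs):
    return list(accumulate(xs))  # running prefix sums


x = [65, 2, 73, 62, 78]
target = [143, 129, 213, 204, 345]

# No choice of N makes add(reverse(x), take(x, N)) match the example.
candidates = {n: add(reverse(x), take(x, n)) for n in range(len(x) + 1)}
print(any(out == target for out in candidates.values()))  # False

# The program add(reverse(x), sumUpTo(x)) does satisfy the example.
print(add(reverse(x), sum_up_to(x)) == target)  # True
```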

Beyond checking feasibility, some deduction techniques used for synthesis can also provide additional information [18,32,54]. In particular, given a partial program P that is infeasible with respect to specification φ, several deduction engines can generate a set of other partial programs P<sub>1</sub>, ..., P<sub>n</sub> that are infeasible for the same reason as P. To unify both types of feedback, we assume that the output of the deduction oracle O is a set S of partial programs such that S is empty if and only if O decides that the partial program is feasible.

This discussion is summarized by the following definition:

**Definition 3 (Deduction engine).** *Given a partial program* P *and specification* φ*,* Deduce*(*P, φ*) yields a set of partial programs* S *such that (1) if* S ≠ ∅*, then* P *is infeasible, and (2) for every* P′ ∈ S*, it must be the case that* P′ *is infeasible with respect to* φ*.*

*Example 3.* Consider again the same infeasible partial program P given in Example 2. Since drop(l, n) drops the first n elements from list l (where n < *length*(l)), it also produces a list whose length is smaller than that of the input. Thus, the following partial program P′ is also infeasible for the same reason as P:

$$P' \equiv \mathtt{add}(\mathtt{reverse}(x), \mathtt{drop}(x, N))$$

Thus, Deduce(P, φ) may return the set {P, P′}.

#### **4 MDP Formulation of Deduction-Guided Synthesis**

Given a specification φ and language L = (V, Σ, R, S), we can formulate the program synthesis problem as an MDP M<sub>φ</sub> = (𝒮, 𝒮<sub>I</sub>, 𝒮<sub>T</sub>, 𝒜, ℱ, ℛ), where:


$$\mathcal{S}\_I(P) = \begin{cases} 1 \text{ if } P = S \\ 0 \text{ if } P \neq S \end{cases}$$

– 𝒮<sub>T</sub> includes complete programs as well as infeasible partial programs, i.e.,

$$P \in \mathcal{S}\_T \iff \text{IsComplete}(P) \lor \text{Deduce}(P, \phi) \neq \emptyset \lor P = \bot$$


$$\mathcal{F}(P, \; r = (H \to e)) = \begin{cases} \bot & \text{if } H \text{ is not a hole in } P \\ \text{Fill}(P, r) & \text{otherwise} \end{cases}$$

– The reward function penalizes infeasible programs and rewards correct solutions, i.e.,

$$\mathcal{R}(P) = \begin{cases} 1 & \text{if } P \models \phi \\ -1 & \text{if } P = \bot \lor \text{Deduce}(P, \phi) \neq \emptyset \lor (\text{IsComplete}(P) \land P \not\models \phi) \\ 0 & \text{otherwise} \end{cases}$$

Observe that our reward function encodes the goal of synthesizing a complete program P that satisfies φ, while avoiding the exploration of as many infeasible programs as possible. Thus, if we have a good policy π for this MDP, then a rollout of π is likely to correspond to a solution of the given synthesis problem.
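The terminal-state test and the reward function can be written down directly, as in the following sketch. The callables `is_complete`, `deduce`, and `satisfies`, the string encoding of programs, and the use of `None` for ⊥ are all hypothetical stand-ins; the lookup tables merely replay Examples 2 and 3.

```python
BOTTOM = None  # stand-in for the special state ⊥


def is_terminal(p, is_complete, deduce):
    """P is terminal iff IsComplete(P), Deduce(P, phi) is non-empty, or P = ⊥."""
    return p is BOTTOM or is_complete(p) or bool(deduce(p))


def reward(p, is_complete, deduce, satisfies):
    """Reward R(P): +1 for a solution, -1 for a dead end, 0 otherwise."""
    if p is not BOTTOM and is_complete(p) and satisfies(p):
        return 1
    # Any complete program reaching this point fails the specification.
    if p is BOTTOM or deduce(p) or is_complete(p):
        return -1
    return 0


# Hypothetical stubs replaying Examples 2 and 3.
SOLUTION = "add(reverse(x), sumUpTo(x))"
COMPLETE = {SOLUTION, "add(x, x)"}
CONFLICTS = {
    "add(reverse(x), take(x, N))": {
        "add(reverse(x), take(x, N))",
        "add(reverse(x), drop(x, N))",
    }
}


def is_complete(p):
    return p in COMPLETE


def deduce(p):
    return CONFLICTS.get(p, set())


def satisfies(p):
    return p == SOLUTION


for prog in [SOLUTION, "add(x, x)", "add(reverse(x), take(x, N))",
             "add(reverse(x), L)", BOTTOM]:
    print(prog, reward(prog, is_complete, deduce, satisfies))
```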

*Example 4.* Consider the same specification (i.e., input-output example) φ from Example 2 and the DSL from Example 1. The partial program

P ≡ add(reverse(x), take(x, N))

is a terminal state of M<sub>φ</sub> since Deduce(P, φ) yields a non-empty set, and we have ℛ(P) = −1. Thus, the following sequence corresponds to a rollout of M<sub>φ</sub>:

(S, S → L, 0), (L, L → add(L, L), 0), (add(L<sub>1</sub>, L<sub>2</sub>), L → reverse(L), 0), (add(reverse(L<sub>1</sub>), L<sub>2</sub>), L → x, 0), (add(reverse(x), L), L → take(L, N), 0), (add(reverse(x), take(L, N)), L → x, 0), (add(reverse(x), take(x, N)), ∅, −1).

*Simplified Policy Gradient Estimate for* M<sub>φ</sub>*.* Since our synthesis algorithm will be based on policy gradient, we will now derive a simplified policy gradient for our MDP M<sub>φ</sub>. First, by construction of M<sub>φ</sub>, a rollout ζ has the form

$$(P\_1, r\_1, 0), \dots, (P\_m, \emptyset, q)$$

where q = 1 if P<sub>m</sub> ⊨ φ and q = −1 otherwise. Thus, the term ℓ(P) from Eq. 1 can be simplified as follows:

$$\ell(P\_m) = \sum\_{i=1}^{m-1} q \cdot \nabla\_{\theta} \log \pi\_{\theta}(P\_i, r\_i), \tag{3}$$

where P<sub>m</sub> ∼ D<sub>πθ</sub> is a final state (i.e., complete program or infeasible partial program) sampled using π<sub>θ</sub>. Then, Eq. 1 is equivalently

$$\nabla\_{\theta} J(\pi\_{\theta}) \approx \frac{1}{n} \sum\_{k=1}^{n} \ell(P^{(k)}),\tag{4}$$

where P<sup>(k)</sup> ∼ D<sub>πθ</sub> for each k ∈ {1, ..., n}.
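As a rough illustration of Eqs. (3) and (4), the sketch below computes ℓ for a tabular softmax policy, where the gradient of log π(P, r) with respect to the logits of state P is the one-hot vector for r minus the action probabilities. The tabular parameterization, the dictionary layout of `theta`, and the two hand-written rollouts are assumptions for illustration; the paper's policy is a neural network.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a dict {action: logit}."""
    m = max(logits.values())
    exps = {a: math.exp(v - m) for a, v in logits.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}


def ell(theta, steps, q):
    """Eq. (3): q times the sum of grad log pi(P_i, r_i) over a rollout."""
    grad = defaultdict(float)
    for state, action in steps:
        probs = softmax(theta[state])
        for a, p in probs.items():
            indicator = 1.0 if a == action else 0.0
            grad[(state, a)] += q * (indicator - p)
    return dict(grad)


def gradient_estimate(theta, rollouts):
    """Eq. (4): average of ell over n sampled rollouts [(steps, q), ...]."""
    total = defaultdict(float)
    for steps, q in rollouts:
        for key, g in ell(theta, steps, q).items():
            total[key] += g / len(rollouts)
    return dict(total)


s = "add(reverse(x), L)"
theta = {s: {"take": 0.0, "drop": 0.0, "sumUpTo": 0.0}}
rollouts = [
    ([(s, "take")], -1),     # ends in an infeasible program, q = -1
    ([(s, "sumUpTo")], +1),  # ends in a solution, q = +1
]
g = gradient_estimate(theta, rollouts)
print(g)
```

With uniform initial logits, the estimate pushes the logit of take down and that of sumUpTo up, while leaving drop unchanged, since drop was never sampled on-policy.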

#### **5 RL-Based Synthesis Algorithm**

In this section, we describe our synthesis algorithm based on reinforcement learning. Our method is an *off-policy* variant of the standard (on-policy) policy gradient algorithm and incorporates additional feedback – in the form of other infeasible programs – provided by the deduction engine when improving its policy parameters. We first give a high-level overview of the synthesis algorithm and then explain how to update the policy.

#### **5.1 Overview of Synthesis Algorithm**

Our RL-based synthesis algorithm is presented in Fig. 3. In addition to specification φ and domain-specific language L, this algorithm also takes as input an initial policy π<sub>0</sub> that has been trained off-line on a representative set of training problems.<sup>1</sup> In each iteration of the main synthesis loop, we first obtain a rollout of the current policy by calling the GetRollout procedure at line 7. Here, each rollout either corresponds to a complete program P or an infeasible partial program. If P is complete *and* satisfies the specification, we return it as a solution in line 8. Otherwise, we use feedback C provided by the deduction engine to improve the current policy (line 9). In the following subsections, we explain the GetRollout and UpdatePolicy procedures in more detail.

#### **5.2 Sampling Rollouts**

The GetRollout procedure iteratively expands a partial program, starting from the start symbol S of the grammar (line 11). In each iteration (lines 12–19), we first check whether the current partial program P is feasible by calling Deduce. If P is infeasible (i.e., C is non-empty), then we have reached a terminal state of the MDP; thus, we return P as the final state of the rollout. Otherwise, we continue expanding P according to the current policy π<sub>θ</sub>. Specifically, we first sample an action (i.e., grammar production) r that is applicable to the current state (i.e., the left-hand side of r is a hole in P), and then we expand P by calling the Fill procedure (defined in Sect. 3) at line 16. If the resulting program is complete, we have reached a terminal state and return P; otherwise, we continue expanding P according to the current policy.

<sup>1</sup> We explain how to train this initial policy in Sect. 6.

1: **procedure** Synthesize(L, φ, π<sub>0</sub>)
2: **input:** Domain-specific language L = (V, Σ, R, S)
3: **input:** Specification φ; initial policy π<sub>0</sub>
4: **output:** Complete program P such that P ⊨ φ
5: π<sub>θ</sub> ← π<sub>0</sub>
6: **while true do**
7: (P, C) ← GetRollout(L, φ, π<sub>θ</sub>)
8: **if** C = ∅ **then return** P
9: **else** π<sub>θ</sub> ← UpdatePolicy(π<sub>θ</sub>, C)
10: **procedure** GetRollout(L, φ, π<sub>θ</sub>)
11: P ← S
12: **while true do**
13: C ← Deduce(P, φ)
14: **if** C ≠ ∅ **then return** (P, C)
15: **choose** r ∼ π<sub>θ</sub>(P) ∧ Lhs(r) ∈ Holes(P)
16: P ← Fill(P, r)
17: **if** IsComplete(P) **then**
18: **if** P ⊨ φ **then return** (P, ∅)
19: **else return** (P, {P})
20: **procedure** UpdatePolicy(π<sub>θ</sub>, C)
21: **for** k ∈ {1, ..., n′} **do**
22: P<sup>(k)</sup> ∼ Uniform(C)
23: θ′ ← θ + η Σ<sub>k=1</sub><sup>n′</sup> ℓ(P<sup>(k)</sup>) · D<sub>πθ</sub>(P<sup>(k)</sup>) / (1/|C|)
24: **return** π<sub>θ′</sub>

**Fig. 3.** Deduction-guided synthesis algorithm based on reinforcement learning
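The control flow of Synthesize and GetRollout from Fig. 3 can be sketched as follows. Everything concrete here is a toy assumption: programs are tuples of tokens over the grammar B → 0 | 1, the specification is equality with a fixed bit string, `deduce` flags any filled position that disagrees with the target, the policy is uniform, and the policy update is a no-op placeholder. The `max_iters` bound is also an addition; Fig. 3 loops until a solution is found.

```python
import random

HOLE = "B"
PRODUCTIONS = [("B", ("0",)), ("B", ("1",))]  # toy grammar: B -> 0 | 1
TARGET = ("1", "0", "1")                       # toy specification


def fill(p, r):
    """Replace the first occurrence of the hole r[0] with r[1]."""
    lhs, rhs = r
    i = p.index(lhs)
    return p[:i] + rhs + p[i + 1:]


def is_complete(p):
    return HOLE not in p


def satisfies(p):
    return p == TARGET


def deduce(p):
    """Toy deduction: a filled position that disagrees with TARGET is a conflict."""
    bad = any(t != HOLE and t != TARGET[i] for i, t in enumerate(p))
    return {p} if bad else set()


def uniform_policy(p):
    """Return applicable productions (Lhs(r) is a hole in p) and their weights."""
    rules = [r for r in PRODUCTIONS if r[0] in p]
    return rules, [1.0] * len(rules)


def get_rollout(start, policy, rng):
    p = start
    while True:
        conflicts = deduce(p)                 # line 13
        if conflicts:                         # line 14: infeasible, stop
            return p, conflicts
        rules, weights = policy(p)            # line 15
        r = rng.choices(rules, weights=weights, k=1)[0]
        p = fill(p, r)                        # line 16
        if is_complete(p):                    # lines 17-19
            return (p, set()) if satisfies(p) else (p, {p})


def synthesize(start, policy, update_policy, max_iters=10_000, seed=0):
    rng = random.Random(seed)
    for _ in range(max_iters):
        p, conflicts = get_rollout(start, policy, rng)   # line 7
        if not conflicts:                                # line 8
            return p
        policy = update_policy(policy, conflicts)        # line 9
    return None


solution = synthesize(("B", "B", "B"), uniform_policy, lambda pi, c: pi)
print(solution)
```

Because infeasible prefixes are cut off as soon as `deduce` reports a conflict, a failed rollout ends at the first bad decision instead of being expanded to a complete program.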

#### **5.3 Improving the Policy**

As mentioned earlier, our algorithm improves the policy by using the feedback C provided by the deduction engine. Specifically, consider an infeasible program P explored by the synthesis algorithm at line 7. Since Deduce(P, φ) yields a set of infeasible programs, for every program P′ ∈ C, we know that the reward should be −1. As a consequence, we should be able to incorporate the rollout used to construct P′ into the policy gradient estimate based on Eq. (3). However, the challenge in doing so is that Eq. (4) relies on *on-policy* samples – i.e., the programs P<sup>(k)</sup> in Eq. (4) must be sampled using the current policy π<sub>θ</sub>. Since P′ ∈ C is not sampled using π<sub>θ</sub>, we cannot directly use it in Eq. (4).

Instead, we use *off-policy* RL to incorporate P′ into the estimate of ∇<sub>θ</sub>J(π<sub>θ</sub>) [28]. Essentially, the idea is to use *importance weighting* to incorporate data sampled from a different distribution than D<sub>πθ</sub>. In particular, suppose we are given a distribution D̃ over final states. Then, we can derive the following gradient:

$$\begin{split} \nabla\_{\theta} J(\pi\_{\theta}) &= \mathbb{E}\_{P \sim \mathcal{D}\_{\pi\_{\theta}}} [\ell(P)] \\ &= \mathbb{E}\_{P \sim \tilde{\mathcal{D}}} \left[ \ell(P) \cdot \frac{\mathcal{D}\_{\pi\_{\theta}}(P)}{\tilde{\mathcal{D}}(P)} \right] \end{split} \tag{5}$$

Intuitively, the *importance weight* D<sub>πθ</sub>(P) / D̃(P) accounts for the fact that P is sampled from the "wrong" distribution.
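The identity in Eq. (5) can be verified exactly on a small finite example. In the sketch below, the three final states, their probabilities under the policy, and the scalar stand-in for ℓ (which is a gradient vector in the paper) are made-up values chosen only to exercise the formula; the proposal distribution is uniform and covers the full support.

```python
from fractions import Fraction as F

# Hypothetical distribution over final states induced by the policy.
d_pi = {"P1": F(1, 2), "P2": F(3, 10), "P3": F(1, 5)}

# Proposal distribution: uniform over the same three final states.
d_tilde = {p: F(1, len(d_pi)) for p in d_pi}

# Scalar stand-in for ell(P).
ell = {"P1": F(-1), "P2": F(-1), "P3": F(1)}

# Left-hand side of Eq. (5): expectation under the policy distribution.
on_policy = sum(d_pi[p] * ell[p] for p in d_pi)

# Right-hand side: expectation under the proposal with importance weights.
off_policy = sum(d_tilde[p] * ell[p] * d_pi[p] / d_tilde[p] for p in d_tilde)

print(on_policy, off_policy)  # -3/5 -3/5
```

The two expectations coincide because the weight cancels the proposal probability term by term, which is only valid when every state with non-zero policy probability also has non-zero proposal probability.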

Now, we can use the distribution D̃ = Uniform(Deduce(P′, φ)) for a randomly sampled final state P′ ∼ D<sub>πθ</sub>. Thus, we have<sup>2</sup>:

**Theorem 2.** *The policy gradient is*

$$\nabla\_{\theta}J(\pi\_{\theta}) = \mathbb{E}\_{P' \sim \mathcal{D}\_{\pi\_{\theta}}, P \sim \text{Uniform}(\text{Deduce}(P', \phi))} \left[ \ell(P) \cdot \frac{\mathcal{D}\_{\pi\_{\theta}}(P)}{1/|\text{Deduce}(P', \phi)|} \right]. \tag{6}$$

*Proof.* Note that

$$\begin{split} \nabla\_{\theta}J(\pi\_{\theta}) &= \mathbb{E}\_{P' \sim \mathcal{D}\_{\pi\_{\theta}}} [\nabla\_{\theta}J(\pi\_{\theta})] \\ &= \mathbb{E}\_{P' \sim \mathcal{D}\_{\pi\_{\theta}}, P \sim \text{Uniform}(\text{Deduce}(P', \phi))} \left[ \ell(P) \cdot \frac{\mathcal{D}\_{\pi\_{\theta}}(P)}{1/|\text{Deduce}(P', \phi)|} \right], \end{split}$$

as claimed.

The corresponding estimate of ∇<sub>θ</sub>J(π<sub>θ</sub>) is given by the following equation:

$$\nabla\_{\theta}J(\pi\_{\theta}) \approx \frac{1}{n} \sum\_{k=1}^{n} \frac{1}{n'} \sum\_{k'=1}^{n'} \ell(P^{(k,k')}) \cdot \frac{\mathcal{D}\_{\pi\_{\theta}}(P^{(k,k')})}{1/|\text{Deduce}(P^{(k)}, \phi)|},$$

where P<sup>(k)</sup> ∼ D̃ and P<sup>(k,k′)</sup> ∼ Uniform(Deduce(P<sup>(k)</sup>, φ)) for each k ∈ {1, ..., n} and k′ ∈ {1, ..., n′}. Our actual implementation uses n = 1, in which case this equation can be simplified to the following:

$$\nabla\_{\theta}J(\pi\_{\theta}) \approx \frac{1}{n'} \sum\_{k'=1}^{n'} \ell(P^{(k')}) \cdot \frac{\mathcal{D}\_{\pi\_{\theta}}(P^{(k')})}{1/|\text{Deduce}(P,\phi)|},\tag{7}$$

where P ∼ D̃ and P<sup>(k′)</sup> ∼ Uniform(Deduce(P, φ)) for each k′ ∈ {1, ..., n′}.

<sup>2</sup> Technically, importance weighting requires that the support of D̃ contains the support of D<sub>πθ</sub>. We can address this issue by combining D̃ and D<sub>πθ</sub> – in particular, take D̃(P) = (1 − ε) · Uniform(Deduce(P′, φ))(P) + ε · D<sub>πθ</sub>(P), for any ε > 0.

Now, going back to our synthesis algorithm from Fig. 3, the UpdatePolicy procedure uses Eq. 7 to update the policy parameters θ. Specifically, given a set C of infeasible partial programs, we first sample n′ programs P<sup>(1)</sup>, ..., P<sup>(n′)</sup> from C uniformly at random (line 22). Then, we use the probability of each P<sup>(k)</sup> being sampled from the current distribution D<sub>πθ</sub> to update the policy parameters to a new value θ′ according to Eq. 7.

*Example 5.* Suppose that the current policy assigns the following probabilities to these state-action pairs:

$$\begin{aligned} &\pi\_{\theta}((\mathsf{add}(\mathtt{reverse}(x), L)), L \to \mathtt{take}(L, N)) = 0.3 \\ &\pi\_{\theta}((\mathsf{add}(\mathtt{reverse}(x), L)), L \to \mathtt{drop}(L, N)) = 0.3 \\ &\pi\_{\theta}((\mathsf{add}(\mathtt{reverse}(x), L)), L \to \mathtt{sumUpTo}(L)) = 0.1 \end{aligned}$$

Furthermore, suppose that we sample the following rollout using this policy:

$$P \equiv \mathtt{add}(\mathtt{reverse}(x), \mathtt{take}(x, N))$$

This corresponds to an infeasible partial program, and, as in Example 3, Deduce(P, φ) yields {P, P′} where P′ ≡ add(reverse(x), drop(x, N)). Using the gradients derived by Eq. 7, we update the policy parameters θ to θ′. The updated policy now assigns the following probabilities to the same state-action pairs:

$$\begin{aligned} &\pi\_{\theta'}((\mathtt{add}(\mathtt{reverse}(x),L)),L \to \mathtt{take}(L,N)) = 0.15\\ &\pi\_{\theta'}((\mathtt{add}(\mathtt{reverse}(x),L)),L \to \mathtt{drop}(L,N)) = 0.15\\ &\pi\_{\theta'}((\mathtt{add}(\mathtt{reverse}(x),L)),L \to \mathtt{sumUpTo}(L)) = 0.2 \end{aligned}$$

Observe that the updated policy makes it less likely that we will expand the partial program add(reverse(x), L) using the drop production in addition to the take production. Thus, if we reach the same state add(reverse(x), L) during rollout sampling in the next iteration, the policy will make it more likely to explore the sumUpTo production, which does occur in the desired program

$$\mathtt{add}(\mathtt{reverse}(x), \mathtt{sumUpTo}(x))$$

that meets the specification from Example 2.
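The qualitative effect described in Example 5 can be reproduced with a toy version of the update in Eq. (7). This sketch makes several simplifying assumptions that are not from the paper: the policy is a softmax over four logits at the single state add(reverse(x), L), the catch-all action "other" absorbs the remaining probability mass, D<sub>πθ</sub>(P) is approximated by the probability of the one distinguishing action, the conflict set is enumerated instead of sampled, and the step size η = 1 is arbitrary. The resulting numbers therefore differ from those in the example.

```python
import math

ACTIONS = ["take", "drop", "sumUpTo", "other"]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def update(logits, conflict_actions, eta=1.0):
    """One off-policy step in the spirit of Eq. (7), with reward q = -1."""
    probs = softmax(logits)
    size_c = len(conflict_actions)   # |C|
    n_prime = len(conflict_actions)  # n': each conflict is used exactly once
    new_logits = list(logits)
    for a in conflict_actions:
        k = ACTIONS.index(a)
        weight = probs[k] / (1.0 / size_c)  # D_pi(P^(k')) / (1/|C|)
        for j in range(len(ACTIONS)):
            grad_log = (1.0 if j == k else 0.0) - probs[j]
            new_logits[j] += eta * (1.0 / n_prime) * (-1.0) * grad_log * weight
    return new_logits


# Initial probabilities 0.3 / 0.3 / 0.1 as in Example 5, remainder on "other".
logits = [math.log(p) for p in [0.3, 0.3, 0.1, 0.3]]
before = softmax(logits)
after = softmax(update(logits, ["take", "drop"]))
for name, b, a in zip(ACTIONS, before, after):
    print(f"{name}: {b:.3f} -> {a:.3f}")
```

Even though only the take rollout was sampled, the drop production is penalized as well, because the deduction feedback places both programs in the conflict set.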

#### **6 Implementation**

We have implemented the proposed algorithm in a new tool called Concord written in Python. In what follows, we elaborate on various aspects of our implementation.

#### **6.1 Deduction Engine**

Concord uses the same deduction engine described by Feng et al. [18]. Specifically, given a partial program P, Concord first generates a specification ϕ of P by leveraging the abstract semantics of each DSL construct. Then, Concord issues a satisfiability query to the Z3 SMT solver [15] to check whether ϕ is consistent with the provided specification. If it is not, this means that P is infeasible, and Concord proceeds to infer other partial programs that are also infeasible for the same reason as P. To do so, Concord first obtains an unsatisfiable core ψ for the queried formula, and, for each clause c<sub>i</sub> of ψ originating from DSL construct f<sub>i</sub>, it identifies a set S<sub>i</sub> of other DSL constructs whose semantics imply c<sub>i</sub>. Finally, it generates a set of other infeasible programs by replacing all f<sub>i</sub>'s in the current program with another construct drawn from its corresponding set S<sub>i</sub>.

#### **6.2 Policy Network**

**Architecture.** As shown by Fig. 4, Concord represents its underlying policy using a deep neural network (DNN) π<sub>θ</sub>(r | P), which takes as input the current state (i.e., a partial program P) and outputs a probability distribution over actions (i.e., productions r in the DSL). We represent each program P as a flat sequence of statements and use a recurrent neural network (RNN) architecture, as this is a natural choice for sequence inputs. In particular, our policy network is a gated recurrent unit (GRU) network [13], which is a state-of-the-art RNN architecture. Our policy network has one hidden layer with 256 neurons; this layer is sequentially applied to each statement in the partial program together with the latent vector from processing the previous statement. Once the entire partial program P has been encoded into a vector, π<sub>θ</sub> has a final layer that outputs a distribution over DSL productions r based on this vector.

**Fig. 4.** The architecture of the policy network showing how to roll out the partial program in Example 4.

**Pretraining the Initial Policy.** Recall from Sect. 5 that our synthesis algorithm takes as input an *initial policy network* that is updated during the synthesis process. One way to initialize the policy network would be to use a standard random initialization of the network weights. However, a more effective alternative is to *pretrain* the policy on a benchmark suite of program synthesis problems [44]. Specifically, consider a representative training set X<sub>train</sub> of synthesis problems of the form (φ, P), where φ is the specification and P is the desired program. To obtain an initial policy, we augment our policy network to take as input an encoding of the specification φ for the current synthesis problem – i.e., it has the form π<sub>θ</sub>(r | P, φ).<sup>3</sup> Then, we use supervised learning to train π<sub>θ</sub> to predict P given φ – i.e.,

$$\theta^0 = \underset{\theta}{\text{arg}\,\text{max}} \sum\_{(\phi, P) \in X\_{\text{train}}} \sum\_{i=1}^{|P|-1} \pi\_{\theta}(r\_i \mid P\_i, \phi).$$

We optimize θ using stochastic-gradient descent (SGD) on this objective.

Given a new synthesis problem φ, we use π<sub>θ<sup>0</sup></sub> as the initial policy. Our RL algorithm then continues to update the parameters starting from θ<sup>0</sup>.

#### **6.3 Input Featurization**

As is standard, we need a way to featurize the inputs to our policy network – i.e., the statements in each partial program P, and the specification φ. Our current implementation assumes that statements are drawn from a finite set and featurizes them by training a different embedding vector for each kind of statement. While our general methodology can be applied to different types of specifications, our implementation featurizes the specification under the assumption that it consists of input-output examples and uses the same methodology described by Balog et al. [2].

#### **6.4 Optimizations**

Our implementation performs a few optimizations over the algorithm presented in Sect. 5. First, since it is possible to sample the same rollout multiple times, our implementation uses a hash map to check whether a rollout has already been explored. Second, in different invocations of the GetRollout procedure from Fig. 3, we may end up querying the feasibility of the same state (i.e., partial program) *many* times. Since checking feasibility requires a potentially expensive call to the SMT solver, our implementation also memoizes the results of feasibility checks for each state. Finally, similar to Chen et al. [11], we use a 3-model ensemble to alleviate some of the randomness in the synthesis process and return a solution as soon as one of the models in the ensemble finds a correct solution.

### **7 Evaluation**

In this section, we describe the results from our experimental evaluation, which is designed to answer the following key research questions:

<sup>3</sup> Including the specification as an input to π<sup>θ</sup> is unnecessary if we do not use pretraining, since φ does not change for a single synthesis problem.

**Fig. 5.** Comparison between Concord, Neo, and DeepCoder


*Benchmarks.* We evaluate the proposed technique on a total of 100 synthesis tasks used in prior work [2,18]. Specifically, these synthesis tasks require performing non-trivial transformations and computations over lists using a functional programming language. Since these benchmarks have been used to evaluate both Neo [18] and DeepCoder [2], they provide a fair ground for comparing our approach against two of the most closely related techniques. In particular, note that DeepCoder uses a pre-trained deep neural network to guide its search, whereas Neo uses both statistical and logical reasoning (i.e., a statistical model to guide search and deduction to prune the search space). However, unlike our proposed approach, neither Neo nor DeepCoder updates its statistical model during synthesis time.

*Training.* Recall that our algorithm utilizes a pre-trained initial policy. To generate the initial policy, we use the same methodology described in DeepCoder [2] and adopted in Neo [18]. Specifically, we randomly generate both programs and inputs, and we obtain the corresponding output by executing the program. Then, we train the DNN model discussed in Sect. 6 on the Google Cloud Platform with a 2.20 GHz Intel Xeon CPU and an NVIDIA Tesla K80 GPU using 16 GB of memory.

#### **7.1 Comparison Against Existing Tools**

To answer our first research question, we compare Concord against both Neo and DeepCoder on the 100 synthesis benchmarks discussed earlier. The result of this comparison is shown in Fig. 5, which plots the number of benchmarks solved within a given time limit for each of the three tools. As we can see from this figure, Concord outperforms DeepCoder and Neo both in terms of synthesis time as well as the number of benchmarks solved within the 5-min time limit. In particular, Concord can solve 82% of these benchmarks with an average running time of 36 s, whereas Neo (resp. DeepCoder) solves 71% (resp. 32%) with an average running time of 99 s (resp. 205 s). Thus, we believe these results answer our first research question in a positive way.

#### **7.2 Ablation Study**

To answer our remaining research questions, we perform an ablation study in which we compare Concord against three variants:


The results from this evaluation are summarized in Table 1. Here, the first column, labeled "# solved", shows the number of solved benchmarks, and the second column shows the percentage improvement over Neo in terms of benchmarks solved. The third column shows the average synthesis time for benchmarks that can be solved by *all* variants and Neo. Finally, the last column shows the speed-up in terms of synthesis time compared to Neo.

<sup>4</sup> We reimplement the RL algorithm proposed in [44] since we cannot directly compare against their tool. Specifically, the policy network in their implementation is tailored to their problem domain.

**Table 1.** Results of the ablation study comparing different variants.

As we can see from this table, all variants are significantly worse than Concord in terms of the number of benchmarks that can be solved within a 5-min time limit<sup>5</sup>. Furthermore, as we can see from the column labeled "Delta to Neo", all of our proposed ideas are important for improving over the state-of-the-art, as Neo outperforms all three variants but not the full Concord system, which solves 15% more benchmarks compared to Neo.
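The 15% figure appears to be a relative gain rather than a difference in percentage points. Assuming it is derived from the solve rates reported in Sect. 7.1 (82% for Concord and 71% for Neo on the 100 benchmarks), the arithmetic works out as follows:

$$\frac{82 - 71}{71} \approx 0.155,$$

i.e., roughly 15% more benchmarks solved, which corresponds to 11 additional benchmarks out of 100.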

Next, looking at the third column of Table 1, we see that all three variants of Concord are significantly slower compared to Concord in terms of synthesis time. While both Concord and all of its variants outperform Neo in terms of synthesis time (for benchmarks solved by all tools), Concord by far achieves the greatest speed-up over Neo.

In summary, the results from Table 1 highlight that all of our proposed ideas (i.e., (1) improving policy at synthesis time; (2) using feedback from deduction; and (3) off-policy RL) make a significant difference in practice. Thus, we conclude that the ablation study positively answers our last three research questions.

#### **8 Related Work**

In this section, we survey prior work that is closely related to the techniques proposed in this paper.

*Program Synthesis.* Over the past decade, there has been significant interest in automatically synthesizing programs from high-level expressions of user intent [2,6,21,23,25,39,40,46]. Some of these techniques are geared towards computer end-users and therefore utilize informal specifications such as input-output examples [23,40,50], natural language [24,42,55,56], or a combination of both [10,12]. On the other hand, program synthesis techniques geared towards programmers often utilize additional information, such as a program sketch [17,36,46,49] or types [33,39] in addition to test cases [20,30] or logical specifications [6,49]. While the synthesis methodology proposed in this paper can, in principle, be applied to a broad set of specifications, the particular featurization strategy we use in our implementation is tailored towards input-output examples.

<sup>5</sup> To understand the improvement brought by the pre-trained policy, we also conduct a baseline experiment using a randomly initialized policy in Concord. In this setting, Concord can solve as many as 27% of the benchmarks within the 5-min time limit.

*Deduction-Based Pruning.* In this paper, we build on a line of prior work on using deduction to prune the search space of programs in a DSL [18,19,21,39,53]. Some of these techniques utilize type-information and type-directed reasoning to detect infeasible partial programs [20–22,37,39]. On the other hand, other approaches use some form of lightweight program analysis to prune the search space [18,19,53]. Concretely, Blaze uses abstract interpretation to build a compact version space representation capturing the space of all feasible programs [53]; Morpheus [19] and Neo [18] utilize logical specifications of DSL constructs to derive specifications of partial programs and query an SMT solver to check for feasibility; Scythe [50] and Viser [51] use deductive reasoning to compute approximate results of partial programs to check their feasibility. Our approach learns from deduction feedback to improve search efficiency. As mentioned in Sect. 6, the deductive reasoning engine used in our implementation is similar to the latter category; however, it can, in principle, be used in conjunction with other deductive reasoning techniques for pruning the search space.

*Learning from Failed Synthesis Attempts.* The technique proposed in this paper can utilize feedback from the deduction engine in the form of other infeasible partial programs. This idea is known as *conflict-driven learning* and has been recently adopted from the SAT solving literature [5,57] to program synthesis [18]. Specifically, Neo uses the unsat core of the program's specification to derive other infeasible partial programs that share the same root cause of failure, and, as described in Sect. 6, we use the same idea in our implementation of the deduction engine. While we use logical specifications to infer other infeasible programs, there also exist other techniques (e.g., based on testing [54]) to perform this kind of inference.

*Machine Learning for Synthesis.* This paper is related to a long line of work on using machine learning for program synthesis. Among these techniques, some of them train a machine learning model (typically a deep neural network) to directly predict a full program from the given specification [12,16,34,35]. Many of these approaches are based on sequence-to-sequence models [47], sequence to tree models [56], or graph neural networks [41] commonly used in machine translation.

A different approach, sometimes referred to as *learning to search*, is to train a statistical model that is used to *guide* the search rather than directly predict the target program. For example, DeepCoder [2] uses a deep neural network (DNN) to predict the most promising grammar productions to use for the given input-output examples. Similarly, R3NN [38] and NGDS [26] use DNNs to predict the most promising grammar productions conditioned on both the specification and the current partial program. In addition, there has been work on using concrete program executions on the given input-output examples to guide the DNN [11,52]. Our technique for pretraining the initial policy network is based on the same ideas as these supervised learning approaches; however, their initial policies do not change during the synthesis algorithm, whereas we continue to update the policy using RL.

While most of the work at the intersection of synthesis and machine learning uses *supervised learning* techniques, recent work has also proposed using reinforcement learning to speed up syntax-guided synthesis [8,29,31,44]. These approaches are all on-policy and do not incorporate feedback from a deduction engine. In contrast, in our problem domain, rewards are very sparse in the program space, which makes exploration highly challenging in an on-policy learning setting. Our approach addresses this problem using off-policy RL to incorporate feedback from the deduction engine. Our ablation study results demonstrate that our off-policy RL is able to scale to more complex benchmarks.

*Reinforcement Learning for Formal Methods.* There has been recent interest in applying reinforcement learning (RL) to solve challenging PL problems where large amounts of labeled training data are too expensive to obtain. For instance, Si et al. use graph-based RL to automatically infer loop invariants [43], Singh et al. use Q-learning (a different RL algorithm) to speed up program analysis based on abstract interpretation [45], Dai et al. [14] use meta-reinforcement learning for test data generation, and Chen et al. [9] use RL to speed up relational program verification. However, these approaches only use RL offline to pretrain a DNN policy used to guide search. In contrast, we perform reinforcement learning online during synthesis. Bastani et al. have used an RL algorithm called Monte Carlo tree search (MCTS) to guide a specification inference algorithm [3]; however, their setting does not involve any kind of deduction.

#### **9 Conclusion and Future Work**

We presented a new program synthesis algorithm based on reinforcement learning. Given an initial policy trained off-line, our method uses this policy to guide its search at synthesis time but also gradually improves this policy using feedback obtained from a deductive reasoning engine. Specifically, we formulated program synthesis as a reinforcement learning problem and proposed a new variant of the *policy gradient* algorithm that is better suited to solve this problem. In addition, we implemented the proposed approach in a new tool called Concord and evaluated it on 100 synthesis tasks taken from prior work. Our evaluation shows that Concord outperforms a state-of-the-art tool by solving 15% more benchmarks with an average speedup of 8.71×. In addition, our ablation study highlights the advantages of our proposed reinforcement learning algorithm.

There are several avenues for future work. First, while our approach is applicable to different DSLs and specifications, our current implementation focuses on input-output examples. Thus, we are interested in extending our implementation to richer types of specifications and evaluating our method in application domains that require such specifications. Another interesting avenue for future work is to integrate our method with other types of deductive reasoning engines. In particular, while our deduction method is based on SMT, it would be interesting to try other methods (e.g., based on types or abstract interpretation) in conjunction with our proposed RL approach.

## **References**



## **Manthan: A Data-Driven Approach for Boolean Function Synthesis**

Priyanka Golia<sup>1,2(✉)</sup>, Subhajit Roy<sup>1</sup>, and Kuldeep S. Meel<sup>2</sup>

<sup>1</sup> Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, India {pgolia,subhajit}@cse.iitk.ac.in

<sup>2</sup> School of Computing, National University of Singapore, Singapore, Singapore meel@comp.nus.edu.sg

**Abstract.** Boolean functional synthesis is a fundamental problem in computer science with wide-ranging applications and has witnessed a surge of interest resulting in progressively improved techniques over the past decade. Despite intense algorithmic development, a large number of problems remain beyond the reach of the state-of-the-art techniques.

Motivated by the progress in machine learning, we propose Manthan, a novel data-driven approach to Boolean functional synthesis. Manthan views functional synthesis as a classification problem, relying on advances in constrained sampling for data generation, and advances in automated reasoning for a novel proof-guided refinement and provable verification. On an extensive and rigorous evaluation over 609 benchmarks, we demonstrate that Manthan significantly improves upon the current state of the art, solving 356 benchmarks in comparison to 280, which is the most solved by a state-of-the-art technique; thereby, we demonstrate an increase of 76 benchmarks over the current state of the art. Furthermore, Manthan solves 60 benchmarks that none of the current state-of-the-art techniques could solve. The significant performance improvements, along with our detailed analysis, highlight several interesting avenues of future work at the intersection of machine learning, constrained sampling, and automated reasoning.

#### **1 Introduction**

Given an existentially quantified Boolean formula $\exists Y\, F(X, Y)$ over the sets of variables $X$ and $Y$, the problem of Boolean functional synthesis is to compute a vector of Boolean functions, denoted by $\Psi(X) = \langle \psi_1(X), \psi_2(X), \ldots, \psi_{|Y|}(X) \rangle$ and referred to as the Skolem function vector, such that $\exists Y\, F(X, Y) \equiv F(X, \Psi(X))$. In the context of applications, the sets $X$ and $Y$ are viewed as inputs and outputs, and the formula $F(X, Y)$ is viewed as a functional specification capturing the relationship between $X$ and $Y$, while the Skolem function vector $\Psi(X)$ allows one to determine the value of $Y$ for the given $X$ by evaluating $\Psi$. The study of Boolean functional synthesis traces back to Boole [12], and over the decades, the problem has found applications in a wide variety of domains such as certified QBF solving [8,9,36,41], automated program repair [27], program synthesis [44], and cryptography [35].

The open source tool is available at https://github.com/meelgroup/manthan.

© The Author(s) 2020. S. K. Lahiri and C. Wang (Eds.): CAV 2020, LNCS 12225, pp. 611–633, 2020. https://doi.org/10.1007/978-3-030-53291-8\_31

Theoretical investigations have demonstrated that there exist instances where Boolean functional synthesis takes super-polynomial time. On the other hand, practical applicability has necessitated the development of algorithms with progressively impressive scaling. The algorithmic progress for Boolean functional synthesis has been driven by a diverse set of techniques: (i) the usage of incremental determinization employing several heuristics from state-of-the-art Conflict Driven Clause Learning (CDCL) solvers [41], (ii) the usage of decomposition techniques employing the progress in knowledge compilation [6,19,28,45], and (iii) Counter-Example Guided Abstraction Refinement (CEGAR)-based techniques relying on the usage of SAT solvers as black boxes [4–6,28]. While the state-of-the-art techniques are capable of handling problems of complexity beyond the capability of tools a decade ago, the design of scalable algorithms capable of handling industrial problems remains the holy grail.

In this work, we take a step towards the above goal by proposing a novel approach, called Manthan, at the intersection of machine learning, constrained sampling, and automated reasoning. Motivated by the unprecedented advances in machine learning, we view the problem of functional synthesis through the lens of multi-class classification aided by the generation of the data via constrained sampling and employ automated reasoning to certify and refine the learned functions. To this end, the architecture of Manthan comprises the following three novel techniques:


**Proof-Guided Refinement.** Since machine learning techniques often produce good but inexact approximations, we augment our method with automated reasoning techniques to verify the correctness of decision tree-based candidate Skolem functions. To this end, we perform a counterexample-driven refinement approach for candidate Skolem functions.

To fully utilize the impressive test accuracy attained by machine learning models, we design a *proof-guided refinement* approach that seeks to identify and apply *minor* repairs to the candidate functions, in an iterative manner, until we converge to a provably correct Skolem function vector. In a departure from prior approaches utilizing the Shannon expansion and self-substitution, we first use a MaxSAT solver to determine potential repair candidates, and then employ unsatisfiability cores, obtained from the infeasibility proofs that capture the reason the current candidate functions fail to meet the specification, to construct a *good repair*.

Finally, we perform an extensive evaluation over a diverse set of benchmarks with state-of-the-art tools, viz. C2Syn [4], BFSS [5], and CADET [39]. Of 609 benchmarks, Manthan is able to solve 356 benchmarks while C2Syn, BFSS, and CADET solve 206, 247, and 280 benchmarks respectively. Significantly, Manthan can solve 60 benchmarks beyond the reach of all the other existing tools, extending the reach of functional synthesis tools. We then perform an extensive empirical evaluation to understand the impact of different design choices on the performance of Manthan. Our study reveals several surprising observations arising from the interplay of machine learning and automated reasoning.

Manthan owes its runtime performance to recent advances in machine learning, constrained sampling, and automated reasoning. Encouraged by Manthan's scalability, we will seek to extend the above approach to related problem domains such as automated program synthesis, program repair, and reactive synthesis.

The rest of the paper is organized as follows: We first introduce notations and preliminaries in Sect. 2. We then discuss the related work in Sect. 3. In Sect. 4 we present an overview of Manthan and give an algorithmic description in Sect. 5. We then describe the experimental methodology and discuss results in Sect. 6. Finally, we conclude in Sect. 7.

#### **2 Notations and Preliminaries**

We use lower case letters (with subscripts) to denote propositional variables and upper case letters to denote a subset of variables. The formula $\exists Y\, F(X, Y)$ is existentially quantified in $Y$, where $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_m\}$. For notational clarity, we use $F$ to refer to $F(X, Y)$ when clear from the context. We denote by $Vars(F)$ the set of variables appearing in $F(X, Y)$. A literal is a Boolean variable or its negation. We often abbreviate universally (resp. existentially) quantified variables as universal (resp. existential) variables.

A *satisfying assignment* of a formula $F(X, Y)$ is a mapping $\sigma : Vars(F) \rightarrow \{0, 1\}$ on which the formula evaluates to True. For $V \subseteq Vars(F)$, $\sigma[V]$ represents the truth values of variables in $V$ in a satisfying assignment $\sigma$ of $F$. We denote the set of all witnesses of $F$ as $R_F$. For a formula in conjunctive normal form, the *unsatisfiable core* (UnsatCore) is a subset of clauses of the formula for which no satisfying assignment exists.

We use $F(X, Y)|_{y_i = b}$ to denote *substitutions*: a formula obtained after substituting every occurrence of $y_i$ in $F(X, Y)$ by $b$, where $b$ can be a constant (0 or 1) or a formula. The operator *ite(condition,exp1,exp2)* is used to represent the if-else case: if the *condition* is true, then it returns *exp1*, else it returns *exp2*.

A variable $y_i$ is considered *positive unate* if and only if $F(X, Y)|_{y_i = 0} \land \neg F(X, Y)|_{y_i = 1}$ is UNSAT, and *negative unate* if and only if $F(X, Y)|_{y_i = 1} \land \neg F(X, Y)|_{y_i = 0}$ is UNSAT [5].
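On small formulas, the two unateness conditions can be checked by exhaustive enumeration. The following is a minimal sketch, assuming the specification is available as a Python predicate over a dictionary of variable values; the helper names and the example formula are hypothetical and serve only as illustration, since a practical tool would issue SAT queries instead of enumerating assignments.

```python
from itertools import product

def assignments(variables):
    """Yield every 0/1 assignment to `variables` as a dict."""
    for values in product([0, 1], repeat=len(variables)):
        yield dict(zip(variables, values))

def is_positive_unate(F, variables, y):
    """True iff F|y=0 AND NOT F|y=1 has no satisfying assignment."""
    rest = [v for v in variables if v != y]
    return not any(F({**a, y: 0}) and not F({**a, y: 1}) for a in assignments(rest))

def is_negative_unate(F, variables, y):
    """True iff F|y=1 AND NOT F|y=0 has no satisfying assignment."""
    rest = [v for v in variables if v != y]
    return not any(F({**a, y: 1}) and not F({**a, y: 0}) for a in assignments(rest))

# Hypothetical specification: F = (x1 OR y1) AND (NOT x1 OR NOT y2)
def F(a):
    return bool((a["x1"] or a["y1"]) and (not a["x1"] or not a["y2"]))

VARS = ["x1", "y1", "y2"]
print(is_positive_unate(F, VARS, "y1"))  # y1 can always be raised to 1
print(is_negative_unate(F, VARS, "y2"))  # y2 can always be lowered to 0
print(is_negative_unate(F, VARS, "y1"))  # y1 cannot always be lowered to 0
```

In this example, raising `y1` to 1 or lowering `y2` to 0 never falsifies the formula, so the constants 1 and 0 are valid Skolem functions for them.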

Given a function vector $\langle \psi_1, \ldots, \psi_m \rangle$ for the vector of variables $\langle y_1, \ldots, y_m \rangle$ such that $\psi_i$ is the function corresponding to $y_i$, we say that there exists a partial order $\prec_d$ over the variables $\{y_1, \ldots, y_m\}$ such that $y_i \prec_d y_j$ if $\psi_i$ depends on $y_j$.

In decision tree learning, the fraction of incorrectly assigned labels is referred to as the *impurity*. We use the Gini index [38] as a measure of *impurity* for a class label. The *impurity decrease* at a node is the difference between its impurity and the mean of the impurities of its children. The *minimum impurity decrease* is a hyper-parameter used to control the maximum allowable impurity at the leaf nodes, thereby providing a lever for how closely the classifier fits the training data.
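To make the impurity notions concrete, the sketch below computes the Gini impurity of a set of labels and the impurity decrease of a split. It assumes that the mean over the children is weighted by the number of samples in each child, which is a common convention but is not stated in the text; the function names are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum over classes of p_c^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, children):
    """Impurity of the parent minus the mean impurity of its children.
    Assumption: the mean is weighted by the size of each child."""
    n = len(parent)
    mean_child_impurity = sum(len(c) / n * gini(c) for c in children)
    return gini(parent) - mean_child_impurity

parent = [0, 0, 1, 1]
print(gini(parent))                                 # a perfectly mixed binary node
print(impurity_decrease(parent, [[0, 0], [1, 1]]))  # split separating the classes
print(impurity_decrease(parent, [[0, 1], [0, 1]]))  # split separating nothing
```

A split is only worth making when its impurity decrease exceeds the chosen minimum, so a larger threshold yields smaller trees that fit the samples less closely.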

Given a propositional formula $F(X, Y)$ and a weight function $W(\cdot)$ assigning non-negative weights to every literal, we refer to the *weight* of a satisfying assignment $\sigma$, denoted as $W(\sigma)$, as the product of weights of all the literals appearing in $\sigma$, i.e., $W(\sigma) = \prod_{l \in \sigma} W(l)$. A *sampler* $\mathcal{A}(\cdot, \cdot)$ is a probabilistic generator that guarantees $\forall \sigma \in R_F$, $\Pr[\mathcal{A}(F, Bias) = \sigma] \propto W(\sigma)$.

We use a function Bias that takes a mapping from a sequence of variables to the desired weights of their positive literals, and assigns corresponding weights to each of the positive literals. We use a simpler notation, Bias(a,b) to denote that positive literals corresponding to all universal variables are assigned a weight a and positive literals corresponding to all existential variables are assigned a weight b. For example, Bias(0.5, 0.9) assigns a weight of 0.5 to the positive literals of the universally quantified variables and 0.9 to the positive literals of the existentially quantified variables.

*Problem Statement:* Given a Boolean specification $F(X, Y)$ between a set of inputs $X = \{x_1, \ldots, x_n\}$ and a vector of outputs $Y = \langle y_1, \ldots, y_m \rangle$, the problem of *Skolem function synthesis* is to synthesize a function vector $\Psi = \langle \psi_1(X), \ldots, \psi_m(X) \rangle$ such that $y_i \leftrightarrow \psi_i(X)$ and $\exists Y\, F(X, Y) \equiv F(X, \Psi)$. We refer to $\Psi$ as the *Skolem function vector* and $\psi_i$ as the *Skolem function* for $y_i$.

A variable $y_i$ is called a self-substituted variable if the Skolem function $\psi_i$ corresponding to $y_i$ is set to $F(X, Y)|_{y_i = 1}$ [19].

Given a formula $\exists Y\, F(X, Y)$ and a Skolem function vector $\Psi$, we refer to $E(X, Y, Y')$ as an *error formula* [28], where $Y' = \{y'_1, \ldots, y'_{|Y|}\}$ and $Y' \neq Y$.

$$E(X, Y, Y') = F(X, Y) \land \neg F(X, Y') \land (Y' \leftrightarrow \Psi) \tag{1}$$

We use the following theorems from prior work:

**Theorem 1 (**[28]**).** $\Psi$ *is a Skolem function if and only if* $E(X, Y, Y')$ *is UNSAT.*

**Theorem 2 (**[5]**).** *If* $y_i$ *is positive (resp. negative) unate in* $F(X, Y)$*, then* $\psi_i = 1$ *(resp.* $\psi_i = 0$*) is the Skolem function for* $y_i$*.*
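Theorem 1 turns verification into a satisfiability check on the error formula. The sketch below illustrates this on a toy specification by exhaustively searching for a model of $E(X, Y, Y')$; the function name, the dictionary-based encoding, and the example specification are assumptions made for illustration, and a real implementation would hand $E$ to a SAT solver.

```python
from itertools import product

def find_counterexample(F, xs, ys, psi):
    """Exhaustively search for a model of
    E(X, Y, Y') = F(X, Y) AND NOT F(X, Y') AND (Y' <-> Psi(X)).
    Returns (x, y, y_prime) if E is satisfiable, else None."""
    for x_vals in product([0, 1], repeat=len(xs)):
        x = dict(zip(xs, x_vals))
        y_prime = {y: int(psi[y](x)) for y in ys}   # enforce Y' <-> Psi(X)
        if F({**x, **y_prime}):
            continue                                 # NOT F(X, Y') is violated
        for y_vals in product([0, 1], repeat=len(ys)):
            y = dict(zip(ys, y_vals))
            if F({**x, **y}):
                return x, y, y_prime                 # the spec is realizable here
    return None

# Hypothetical specification: y1 <-> (x1 OR x2)
def F(a):
    return a["y1"] == int(a["x1"] or a["x2"])

XS, YS = ["x1", "x2"], ["y1"]
good = {"y1": lambda x: x["x1"] or x["x2"]}   # a correct Skolem function
bad = {"y1": lambda x: x["x1"]}               # an incorrect candidate

print(find_counterexample(F, XS, YS, good))   # None: E is UNSAT
print(find_counterexample(F, XS, YS, bad))    # a counterexample for the bad candidate
```

For the incorrect candidate, the returned triple gives an input on which the candidate output $Y'$ violates the specification although some other output $Y$ satisfies it.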

#### **3 Related Work**

The origins of the problem of Boolean functional synthesis trace back to Boole's seminal work [12], which was subsequently rigorously pursued, albeit focused on decidability, by Löwenheim and Skolem [33]. Complexity-theoretic studies have shown that there exist instances where Boolean functional synthesis takes super-polynomial time, and it was also shown that there exist instances for which a polynomial-size Skolem function vector does not suffice unless the Polynomial Hierarchy (PH) collapses [5].

Motivated by the success of the CEGAR (Counter-Example Guided Abstraction Refinement) approach in model checking, CEGAR-based approaches have been pursued in the context of synthesis as well, where the key idea is to use a Conflict-Driven Clause Learning (CDCL) SAT solver to verify and refine the candidate Skolem functions [4–6,28].

Another line of work has focused on representing the specification, i.e., $F(X, Y)$, in forms that are amenable to efficient synthesis for a class of functions. The early approaches focused on the ROBDD representation, building on the functional composition approach proposed by Balabanov and Jiang [8]. Building on Tabajara and Vardi's ROBDD-based approach [45], Chakraborty et al. extended the approach to factored specifications [14]. It is worth mentioning that factored specifications had earlier been pursued in the context of CEGAR-based approaches. Motivated by the success of knowledge compilation in the field of probabilistic reasoning, Akshay et al. achieved a significant breakthrough over a series of papers [5,6,28] to propose a new negation normal form, SynNNF [4]. The generalization and a functional specification presented in SynNNF is amenable to efficient functional synthesis [4]. Another line of work focused on the usage of *incremental determinization* to incrementally construct the Skolem functions [25,30,36,39,41].

Several approaches have been proposed for the particular case when the specification $\exists Y\, F(X, Y)$ is valid, i.e., $\forall X \exists Y\, F(X, Y)$ is True. Inspired by the sequential relational decomposition, Chakraborty et al. [14] recently proposed an approach focused on viewing each CNF clause of the specification as consisting of *input and output* clauses and employing a *cooperation*-based strategy. The progress in modern CDCL solvers has led to an exploration of the usage of heuristics for problems in complexity classes beyond NP. This has led to work on the extraction of Skolem functions from the proofs constructed for the formulas expressed as $\forall X \exists Y\, F(X, Y)$ [8,9].

The performance of Manthan crucially depends on its ability to employ constrained sampling, which has witnessed a surge of interest with approaches ranging from hashing-based techniques [15] and knowledge compilation [24,42] to the augmentation of SAT solvers with heuristics [43].

The recent success of machine learning has led to several attempts to use machine learning in related synthesis domains such as program synthesis [7], invariant generation, decision trees for functions in Linear Integer Arithmetic theory using pre-specified examples [18], and strategy synthesis for QBF [26]. The use of data-driven approaches for invariant synthesis has been investigated in the ICE learning framework [17,20,21]: armed with data about the program behavior from test executions, it proposes invariants by learning from data, checks for inductiveness and, on failure, extends the data with the generated counterexamples. The usage of proof artifacts such as unsat cores has been explored in verification since the early 2000s [23] and in program repair in Wolverine [46], while MaxSAT has been used in program debugging [10,29].

#### **4 Manthan: An Overview**

In this section, we provide an overview of our proposed framework, Manthan, before delving into the core algorithmic details in the following section. Manthan takes in a function specification, represented as $F(X, Y)$, and returns a Skolem function vector $\Psi(X)$ such that $\exists Y\, F(X, Y) \equiv F(X, \Psi(X))$. As shown in Fig. 1, Manthan consists of the following three phases:


**Fig. 1.** Overview of Manthan

We now provide a high-level description of different phases to highlight the technical challenges, which provides context for several algorithmic design choices presented in the next section.

#### **4.1 Phase 1: Preprocess**

Preprocess focuses on pre-processing of the formula to search for unates among the variables in $Y$; if $y_i$ is positive (resp. negative) unate, then $\psi_i = 1$ (resp. $\psi_i = 0$) suffices as a Skolem function. We employ the algorithmic routine proposed by Akshay et al. [5] to drive this preprocessing.

#### **4.2 Phase 2: LearnSkF**

LearnSkF views the problem of functional synthesis through the lens of machine learning, where the learned machine learning model for classification of a variable $y_i$ can be viewed as a candidate Skolem function for $y_i$. We gather training data about the function's behavior by exploiting the progress in constrained sampling to sample solutions of $F(X, Y)$. Recall that $F(X, Y)$ defines a relation (and not necessarily a function) between $X$ and $Y$, whereas machine learning techniques typically assume the existence of a function between features and labels, necessitating a sophisticated sampling strategy as discussed below. Moving on to features and labels, since we want to learn $Y$ in terms of $X$, we view $X$ as a set of features and the assignments to $Y$ as a set of class labels.

The off-the-shelf classification techniques typically require that the size of training data is several times larger than the size of possible class labels, which would be prohibitively large for the typical problems involving more than a thousand variables. To mitigate the requirement of large training data, we make note of two well-known observations in functional synthesis literature: (1) the Skolem function $\psi_i$ for a variable $y_i$ typically does not depend on all the variables in $X$; (2) a Skolem function vector $\Psi$ where $\psi_i$ depends on variable $y_j$ is a valid vector if the Skolem function $\psi_j$ is not dependent on $y_i$ (i.e., acyclic dependency), i.e., there exists a partial order $\prec_d$ over $\{y_1, \ldots, y_m\}$.

The above observations lead us to design an algorithmic procedure where we learn candidate Skolem functions as decision trees in an iterative manner, i.e., one $y_i$ at a time, thereby allowing us to constrain ourselves to binary classification. The learned classifier can then be represented as the disjunction of all the paths from the root to the leaves in the learnt decision tree. We update the set of possible features for a given $y_i$ depending on the candidate functions generated so far, i.e., the valuations of the $X$ variables and of the $Y$ variables that are not dependent on $y_i$. Finally, we compute the candidate Skolem function for $y_i$ as the disjunction of labels along edges for all the paths from the root to leaf nodes with label 1. Once we have the candidate Skolem function vector $\Psi$, we obtain a valid linear extension, *TotalOrder*, of the partial order $\prec_d$ in accordance with $\Psi$.

Before moving on to the next phase, we return to the formulation of sampling. The past few years have witnessed the design of uniform [15,42] and weighted samplers [24], and one wonders what kind of sampler we should choose to generate samples for training data. A straightforward choice would be to perform uniform sampling over $X$ and $Y$, but the relational nature of the specification $F$ between $X$ and $Y$ offers interesting challenges and opportunities. Recall that while $F$ specifies a relation between $X$ and $Y$, we are interested in a Skolem function, and we would like to tailor our sampling subroutines to allow discovery of Skolem functions with *small* description given the relationship between description and sample complexity. To this end, consider $X = \{x_1, x_2\}$ and $Y = \{y_1\}$, and let $F := (x_1 \lor x_2 \lor y_1)$. Note that $F$ has 7 solutions over $X \cup Y$, out of which $y_1 = 0$ appears in 3 solutions while $y_1 = 1$ appears in 4. Also, note that there are several possible Skolem functions such as $y_1 = \neg(x_1 \land x_2)$. Now, if we uniformly sample solutions of $F$ over $x_1, x_2, y_1$, i.e., Bias(0.5, 0.5), we would see an (almost) equal number of samples with $y_1 = 0$ and $y_1 = 1$. A closer look at $F$ reveals that it is possible to construct a Skolem function by knowing that the only case where $y_1$ cannot be assigned 0 is when $x_1 = x_2 = 0$. To encode this intuition, we propose a novel idea of collecting samples with weighted sampling, i.e., Bias(0.5, q), where $q$ is chosen in a multi-step process of first drawing a small set of samples with both $q = 0.9$ and $q = 0.1$, and then drawing the rest of the samples by fixing the value of $q$ following analysis of the initial set of samples.
To the best of our knowledge, this is the first application of weighted sampling in the context of synthesis, and our experimental results point to several interesting avenues of future work.
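The effect of the bias on the running example $F := (x_1 \lor x_2 \lor y_1)$ can be reproduced by enumerating its solutions. The sketch below computes the probability of drawing a sample with $y_1 = 1$ under Bias(0.5, q) for an idealized sampler; it assumes that a negative literal carries weight one minus the weight of the corresponding positive literal, which the text does not state explicitly, and the function names are illustrative.

```python
from itertools import product

VARS = ["x1", "x2", "y1"]

def F(a):
    return bool(a["x1"] or a["x2"] or a["y1"])

# All satisfying assignments of F over x1, x2, y1.
SOLUTIONS = [s for s in (dict(zip(VARS, v)) for v in product([0, 1], repeat=3)) if F(s)]

def weight(s, p, q):
    """W(s) under Bias(p, q). Assumption: a negative literal weighs
    1 - (weight of the positive literal)."""
    w = 1.0
    for x in ("x1", "x2"):
        w *= p if s[x] else 1.0 - p
    return w * (q if s["y1"] else 1.0 - q)

def prob_y1_is_one(q, p=0.5):
    """Pr[y1 = 1] when each solution is drawn with probability proportional to W."""
    total = sum(weight(s, p, q) for s in SOLUTIONS)
    return sum(weight(s, p, q) for s in SOLUTIONS if s["y1"]) / total

print(len(SOLUTIONS))                     # number of solutions of F
for q in (0.5, 0.1, 0.9):
    print(q, round(prob_y1_is_one(q), 3))
```

Under this assumption, with $q = 0.1$ every valuation of $X$ other than $x_1 = x_2 = 0$ yields $y_1 = 0$ with probability 0.9, so the sampled data is close to a function of $X$ with a short description.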

#### **4.3 Phase 3: Refine**

The candidate Skolem functions generated in LearnSkF may not always be the actual Skolem functions. Hence, we require a *verification* check to see if the candidate Skolem functions are indeed correct; if not, the generated counterexample can be used to *repair* them. The verification query constructs an *error formula* $E(X, Y, Y')$ (Formula 1): if it is unsatisfiable, the candidate Skolem function vector is indeed a Skolem function vector and the procedure can terminate; else, when $E(X, Y, Y')$ is SAT, the solution of $E(X, Y, Y')$ is used to identify and refine the erring functions among the candidate Skolem function vector.

In contrast to prior techniques that apply Shannon expansion or self-substitution, the refinement strategy in Manthan is guided by the view that the candidate function vector from the LearnSkF phase is *almost correct*, and hence, attempts to identify and apply a series of *minor* repairs to the erring functions to arrive at the correct Skolem function vector. To this end, Manthan uses two key techniques: *fault localization* and *repair synthesis*. Let us assume that $\sigma$ is a satisfying assignment of $E(X, Y, Y')$, referred to as a counterexample for the current candidate Skolem function vector $\Psi$.

*Fault Localization.* In order to identify the initial candidates to repair for the counterexample $\sigma$, Manthan attempts to identify a small number of Skolem functions (correspondingly $Y$ variables) whose outputs must undergo a change for the formula to behave correctly on $\sigma$; in other words, it makes a best-effort attempt to ensure that most of the Skolem functions (correspondingly $Y$ variables) can retain their current output on $\sigma$ while satisfying the formula. Manthan encodes this problem as a partial MaxSAT query with $F(X, Y) \land (X \leftrightarrow \sigma[X])$ as a hard constraint and $(Y \leftrightarrow \sigma[Y'])$ as soft constraints. All $Y$ variables whose valuation constraint $(Y \leftrightarrow \sigma[Y'])$ does not hold in the MaxSAT solution are identified as erring Skolem functions that may need to be repaired.
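The fault localization query can be illustrated with a brute-force stand-in for the MaxSAT solver. In the sketch below, the inputs are fixed to their counterexample values (hard constraint), and the search keeps as many candidate outputs unchanged as possible (soft constraints); the function name, the encoding, and the example specification are hypothetical, and ties between optimal solutions are broken arbitrarily by enumeration order.

```python
from itertools import product

def fault_localize(F, x_vals, ys, current_outputs):
    """Brute-force stand-in for the partial MaxSAT query.
    Hard: F(X, Y) holds with X fixed to x_vals.
    Soft: each y keeps the value in current_outputs (the candidates' outputs).
    Returns the y variables whose soft constraint is violated in an optimal
    solution, or None if F has no solution for this X."""
    best, best_score = None, -1
    for vals in product([0, 1], repeat=len(ys)):
        y = dict(zip(ys, vals))
        if not F({**x_vals, **y}):
            continue                       # hard constraint violated
        score = sum(y[v] == current_outputs[v] for v in ys)
        if score > best_score:
            best, best_score = y, score
    if best is None:
        return None
    return [v for v in ys if best[v] != current_outputs[v]]

# Hypothetical specification: F = (y1 OR x1) AND (y2 OR y1)
def F(a):
    return bool((a["y1"] or a["x1"]) and (a["y2"] or a["y1"]))

# Counterexample: x1 = 0, where the candidates currently output y1 = 0, y2 = 1.
erring = fault_localize(F, {"x1": 0}, ["y1", "y2"], {"y1": 0, "y2": 1})
print(erring)
```

Here only `y1` has to change its output for the specification to hold, so only its candidate function is flagged for repair.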

*Repair Synthesis.* Let $y_k$ be the variable corresponding to the erring function, $\psi_k$, identified in the previous step. To synthesize a repair for the function, Manthan applies a proof-guided strategy: it constructs a formula $G_k(X, Y)$ such that if $G_k(X, Y)$ is unsatisfiable, then $\psi_k$ must undergo a change. The UnsatCore of $G_k(X, Y)$ provides a *reason* that explains the discrepancy between the specification and the current Skolem function.

$$G_k(X, Y) = (y_k \leftrightarrow \sigma[y_k']) \land F(X, Y) \land (X \leftrightarrow \sigma[X]) \land (\hat{Y} \leftrightarrow \sigma[\hat{Y}]) \tag{2}$$

where $\hat{Y} \subset Y$ and $\hat{Y} = \{TotalOrder[index(y_k) + 1], \ldots, TotalOrder[|Y|]\}$.

Manthan uses the UnsatCore to construct a *repair formula*, say $\beta$, as a conjunction over literals in the unsatisfiable core; if $\psi_k$ is *true* with the current valuation of $X$ and $\hat{Y}$, Manthan updates the function $\psi_k$ by conjoining it with the negation of the repair formula ($\psi_k \leftarrow \psi_k \land \neg\beta$); otherwise, Manthan updates the function $\psi_k$ by disjoining it with the repair formula ($\psi_k \leftarrow \psi_k \lor \beta$).
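The update rule can be sketched with candidate functions represented as Python callables. The example below assumes the repair formula $\beta$ has already been extracted from an unsatisfiable core as a set of literals; the helper names and the toy specification ($y_1 \leftrightarrow x_1 \lor x_2$ with the poor candidate $\psi_1 = x_1$) are hypothetical.

```python
def conjunction(literals):
    """beta: a conjunction of literals given as {variable: required value}."""
    return lambda a: all(a[v] == val for v, val in literals.items())

def repair(psi_k, beta, counterexample):
    """Return psi_k AND NOT beta if psi_k is true on the counterexample,
    otherwise return psi_k OR beta."""
    if psi_k(counterexample):
        return lambda a: bool(psi_k(a) and not beta(a))
    return lambda a: bool(psi_k(a) or beta(a))

# Candidate for y1 under the hypothetical spec y1 <-> (x1 OR x2).
psi_1 = lambda a: bool(a["x1"])

# Assumed core literal for the counterexample x1 = 0, x2 = 1: the unit x2 = 1.
beta = conjunction({"x2": 1})
repaired = repair(psi_1, beta, {"x1": 0, "x2": 1})

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, repaired({"x1": x1, "x2": x2}))
```

Because the repair uses only the literals in the core rather than the full counterexample, a single update generalizes to every input that shares those literals.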

**Self-substitution for Poorly Learnt Functions.** Some Skolem functions are difficult to learn through data. In our implementation, the corresponding variables escape the LearnSkF phase with poor candidate functions, thereby requiring a long sequence of incremental repairs for convergence. To handle such scenarios, we make the following observation: though synthesizing Skolem functions via self-substitution [19] can lead to an exponential blowup in the worst case, it is inexpensive if the number of variables synthesized via this technique is small. We use this observation to quickly synthesize a Skolem function for an erring variable if we detect its candidate function is poor (detected by comparing the number of times it enters refinement against an empirically determined threshold). Of course, this heuristic does not scale well if the number of such variables is large; in our experiments, we found less than 20% of the instances solved required self-substitution, and for over 75% of these instances, only one variable needed self-substitution. We elaborate more on the empirical evidence on the success of this heuristic in Sect. 6. A theoretical understanding of the learnability of Boolean functions from data seems to be an interesting direction for future work.
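Self-substitution itself is simple to state in code: the Skolem function for $y_k$ is the specification with $y_k$ replaced by the constant 1. The following sketch uses a hypothetical specification and a brute-force validity check; both helper names are illustrative.

```python
from itertools import product

def self_substitute(F, y_k):
    """Return psi_k = F(X, Y)|_{y_k = 1} as a function of the remaining variables."""
    return lambda a: bool(F({**a, y_k: 1}))

def is_skolem(F, xs, y, psi):
    """Check that psi satisfies F on every input for which some value of y does."""
    for vals in product([0, 1], repeat=len(xs)):
        x = dict(zip(xs, vals))
        realizable = F({**x, y: 0}) or F({**x, y: 1})
        if realizable and not F({**x, y: int(psi(x))}):
            return False
    return True

# Hypothetical specification: F = (y1 OR x1) AND (NOT y1 OR x2)
def F(a):
    return bool((a["y1"] or a["x1"]) and (not a["y1"] or a["x2"]))

psi_1 = self_substitute(F, "y1")          # evaluates like x2 on this F
print(is_skolem(F, ["x1", "x2"], "y1", psi_1))
```

The resulting function embeds a copy of the whole specification, which is why the technique is only affordable when few variables need it.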

#### **5 Manthan: Algorithmic Description**

In this section, we present a detailed algorithmic description of Manthan, whose pseudocode is presented in Algorithm 1. Manthan takes in a formula F(X, Y ) as input and returns a Skolem vector Ψ. The algorithm starts off by preprocessing (line 1) the formula F(X, Y ) to get the unates (U) and their corresponding Skolem functions (Ψ). Next, it invokes the sampler (line 2) to collect a set of samples(Σ) as training data for the learning phase.

#### **Algorithm 1:** Manthan($F(X, Y)$)

$$
\begin{array}{rl}
1 & \Psi, U \leftarrow \textsf{Preprocess}(F(X, Y)) \\
2 & \Sigma \leftarrow \textsf{GetSamples}(F(X, Y)) \\
3 & D \leftarrow \emptyset \\
4 & \textbf{foreach } y_j \in Y \setminus U \textbf{ do} \\
5 & \quad \psi_j, D \leftarrow \textsf{CandidateSkF}(\Sigma, F(X, Y), y_j, D) \\
6 & \textit{TotalOrder} \leftarrow \textsf{FindOrder}(D) \\
7 & \textbf{repeat} \\
8 & \quad E(X, Y, Y') \leftarrow F(X, Y) \land \neg F(X, Y') \land (Y' \leftrightarrow \Psi) \\
9 & \quad ret, \sigma \leftarrow \textsf{CheckSat}(E(X, Y, Y')) \\
10 & \quad \textbf{if } ret = \text{SAT} \textbf{ then} \\
11 & \quad\quad \Psi \leftarrow \textsf{RefineSkF}(F(X, Y), \Psi, \sigma, \textit{TotalOrder}) \\
12 & \textbf{until } ret = \text{UNSAT} \\
13 & \Psi \leftarrow \textsf{Substitute}(F(X, Y), \Psi, \textit{TotalOrder}) \\
14 & \textbf{return } \Psi
\end{array}
$$

For each of the existential variables that are not unates, Manthan attempts to learn candidate Skolem functions (lines 4–5). To generate a variable order, CandidateSkF uses a collection of sets $d_1, \ldots, d_{|Y|} \in D$, such that $y_i \in d_j$ indicates that $y_j$ depends on $y_i$. Next, the FindOrder routine (line 6) constructs *TotalOrder* of the $Y$ variables in accordance with the dependencies in $D$. The verification and refinement phase (line 8) commences by constructing the error formula and launching the verification check (line 9). If the error formula is satisfiable, the counterexample model ($\sigma$) is used to refine the formula. Once the verification check is successful, the refinement phase ends and the subroutine Substitute is invoked to recursively substitute all $y_i \in Y$ appearing in Skolem functions with their corresponding Skolem functions, so that all Skolem functions are described entirely in terms of $X$ variables. The strict variable ordering enforced above ensures that Substitute always succeeds and does not get stuck in a cycle. Finally, the Skolem function vector $\Psi$ is returned.

It is worth noting that Manthan can successfully solve an instance without having to necessarily execute all the phases. In particular, if $U = Y$, then Manthan terminates after Preprocess (i.e., line 1). Similarly, if CheckSat returns UNSAT during the first iteration of the loop (lines 8–11), then Manthan does not invoke RefineSkF.

We now discuss each subroutine in detail. The pseudocode for Preprocess, GetSamples and Substitute is deferred to the technical report [22].

Preprocess: We perform the pre-processing step as described in [5], which performs SAT queries on the formulas constructed as specified in Theorem 2.

**Algorithm 2:** CandidateSkF($\Sigma$, $F(X, Y)$, $y_j$, $D$)

```
 1 featset ← X
 2 foreach yk ∈ Y \ yj do
 3     if yj ∉ dk then
 4         featset ← featset ∪ yk    /* if yk is not dependent on yj */
 5 feat, lbl ← Σ↓featset, Σ↓yj
 6 t ← CreateDecisionTree(feat, lbl)
 7 foreach n ∈ LeafNodes(t) do
 8     if Label(n) = 1 then
 9         π ← Path(t, root, n)
10         ψj ← ψj ∨ π
11 foreach yk ∈ ψj do
12     dj ← dj ∪ yk ∪ dk
13 return ψj, D
```

GetSamples: GetSamples takes $F(X, Y)$ as input and returns a subset of satisfying assignments of $F(X, Y)$. GetSamples first generates a small set of samples (500) with Bias(0.5, 0.9) and calculates $m_i$ for all $y_i$, where $m_i$ is the ratio of the number of samples with $y_i$ being 1 to the total number of samples. Similarly, GetSamples generates 500 samples with Bias(0.5, 0.1) and calculates $n_i$ for all $y_i$, where $n_i$ is the ratio of the number of samples with $y_i$ being 0 to the total number of samples. Finally, GetSamples generates the required number of samples with Bias(0.5, q); for a $y_i$, $q$ is $m_i$ if both $m_i$ and $n_i$ are in the range 0.35 to 0.65, else $q$ is 0.9.
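The bias selection rule of GetSamples described above can be written as a small function. The sketch below assumes that the range 0.35 to 0.65 is inclusive at both ends, which the text leaves open; the function names and the toy sample sets are hypothetical.

```python
def ratio(samples, var, value):
    """Fraction of samples in which `var` takes `value`."""
    return sum(s[var] == value for s in samples) / len(samples)

def choose_q(m_i, n_i, low=0.35, high=0.65, default=0.9):
    """q is m_i when both m_i and n_i lie in [low, high], else `default`.
    Assumption: the range endpoints are inclusive."""
    if low <= m_i <= high and low <= n_i <= high:
        return m_i
    return default

# Toy stand-ins for the two pilot sample sets (the text uses 500 samples each).
pilot_high = [{"y1": 1}, {"y1": 0}, {"y1": 1}, {"y1": 0}]   # drawn with q = 0.9
pilot_low = [{"y1": 0}, {"y1": 1}, {"y1": 0}, {"y1": 1}]    # drawn with q = 0.1

m_1 = ratio(pilot_high, "y1", 1)
n_1 = ratio(pilot_low, "y1", 0)
print(choose_q(m_1, n_1))     # y1 looks insensitive to the bias, so q follows m_1
print(choose_q(0.95, 0.92))   # y1 follows the bias, so q falls back to 0.9
```

The pilot runs thus separate variables whose value is largely determined by the specification from those that simply follow the sampling bias.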

CandidateSkF: CandidateSkF, presented in Algorithm 2, assumes access to the following three subroutines:


As we seek to learn Boolean functions, we employ binary classifiers with class labels 0 and 1. CandidateSkF shows our algorithm for extracting a Boolean function from the decision trees: lines 2–4 find a feature set (*featset*) to predict $y_j$. The feature set includes all $X$ variables and the subset of $Y$ variables that are not dependent on $y_j$. CandidateSkF creates a decision tree $t$ using samples $\Sigma$ over the feature set. Lines 7–10 generate the candidate Skolem function $\psi_j$ by iterating over all the leaf nodes of $t$. In particular, if a leaf node is labeled with 1, the candidate function is updated by disjoining it with the formula returned by the subroutine Path. CandidateSkF also updates $d_j$ in $D$, where $d_j$ is the set of all $Y$ variables on which $y_j$ depends. If $y_j$ depends on $y_k$, then by transitivity $y_j$ also depends on $d_k$; in line 12, CandidateSkF updates $d_j$ accordingly.
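The extraction of a Boolean function from a learnt tree can be sketched independently of any learning library. The code below assumes a hypothetical tuple encoding of a binary decision tree, where a leaf is `("leaf", label)` and an internal node is `(feature, subtree_if_0, subtree_if_1)`; it collects the root-to-leaf paths that end in label 1 and evaluates their disjunction.

```python
def paths_to_one(tree, prefix=()):
    """Yield each root-to-leaf path ending in label 1 as a tuple of
    (feature, value) literals."""
    if tree[0] == "leaf":
        if tree[1] == 1:
            yield prefix
        return
    feature, if_zero, if_one = tree
    yield from paths_to_one(if_zero, prefix + ((feature, 0),))
    yield from paths_to_one(if_one, prefix + ((feature, 1),))

def candidate_function(tree):
    """Disjunction over all label-1 paths; each path is a conjunction of literals."""
    paths = list(paths_to_one(tree))
    return lambda a: any(all(a[f] == v for f, v in path) for path in paths)

# Hypothetical tree for y1: predicts 1 only when x1 = 0 and x2 = 0.
TREE = ("x1",
        ("x2", ("leaf", 1), ("leaf", 0)),
        ("leaf", 0))

psi_1 = candidate_function(TREE)
print(list(paths_to_one(TREE)))
print(psi_1({"x1": 0, "x2": 0}), psi_1({"x1": 1, "x2": 0}))
```

The features that occur on these paths are exactly the variables the candidate depends on, which is the information recorded in $d_j$.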

FindOrder: FindOrder takes D as input and outputs a valid linear extension of the partial order ≺<sup>d</sup> defined over {y<sup>1</sup>, ..., y<sup>m</sup>} with respect to the candidate Skolem function vector Ψ.
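A linear extension of a dependency relation can be computed with a depth-first topological sort. The sketch below is one possible way to do this, not necessarily the procedure used in Manthan; the name `find_order`, the dictionary encoding of D, and the convention that a variable is placed before the variables it depends on are assumptions.

```python
def find_order(deps):
    """Return a total order in which every variable precedes its dependencies.

    deps[y] is the set of Y variables that y depends on (assumed acyclic).
    """
    order, visited = [], set()

    def visit(y):
        if y in visited:
            return
        visited.add(y)
        for z in sorted(deps.get(y, ())):
            visit(z)
        order.append(y)  # appended only after everything y depends on

    for y in sorted(deps):
        visit(y)
    order.reverse()  # now each variable comes before its dependencies
    return order


# Toy dependencies: y1 depends on y2, and y2 depends on y3.
total_order = find_order({"y1": {"y2"}, "y2": {"y3"}, "y3": set()})
```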


RefineSkF: RefineSkF is invoked with a counterexample σ. RefineSkF first performs *fault localization* to find the initial set of erring candidate functions; to this end, it calls the MaxSATList subroutine (line 2) with F(X, Y) ∧ (X ↔ σ[X]) as hard constraints and (Y ↔ σ[Y]) as soft constraints. MaxSATList employs a MaxSAT solver to find a solution that satisfies all the hard constraints and maximizes the number of satisfied soft constraints, and then returns a list (*Ind*) of Y variables such that, for each variable appearing in *Ind*, the corresponding soft constraint was not satisfied by the optimal solution returned by the MaxSAT solver.
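The fault-localization step can be illustrated on a tiny formula by replacing the MaxSAT solver with exhaustive enumeration. This is a brute-force stand-in that is only feasible for a handful of variables; the function name `maxsat_list` and the encoding of F as a Python predicate are assumptions for illustration.

```python
from itertools import product


def maxsat_list(F, x_sigma, y_sigma, y_vars):
    """Brute-force stand-in for MaxSATList.

    Hard constraints: F(X, Y) with X fixed to x_sigma.
    Soft constraints: y == y_sigma[y] for every y in y_vars.
    Returns the variables whose soft constraint is violated by one optimal
    solution, or None if the hard constraints are unsatisfiable.
    """
    best, best_score = None, -1
    for bits in product((0, 1), repeat=len(y_vars)):
        y_assign = dict(zip(y_vars, bits))
        if not F(x_sigma, y_assign):
            continue  # violates a hard constraint
        score = sum(y_assign[y] == y_sigma[y] for y in y_vars)
        if score > best_score:
            best, best_score = y_assign, score
    if best is None:
        return None
    return [y for y in y_vars if best[y] != y_sigma[y]]


# Toy specification: y1 must equal x1, and y2 must equal (x1 OR y1).
def toy_F(x, y):
    return y["y1"] == x["x1"] and y["y2"] == (x["x1"] | y["y1"])


# The counterexample assigns y2 = 0, which cannot be extended to satisfy toy_F.
ind = maxsat_list(toy_F, {"x1": 1}, {"y1": 1, "y2": 0}, ["y1", "y2"])
```

Here only `y2` has to change, so it is the sole entry of the returned list and the only candidate flagged for repair.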

Since the candidate Skolem functions corresponding to the variables in *Ind* need to be refined, RefineSkF now attempts to synthesize a repair for each of these candidate Skolem functions. The repair synthesis loop (lines 3–19) starts off by collecting the set of Y variables, Ŷ, on which y<sup>k</sup> of *Ind* can depend as per the ordering constraints (line 4). Next, it invokes the subroutine CheckSubstitute, which returns True if the candidate function corresponding to y<sup>k</sup> has been refined more than a chosen threshold number of times (fixed to 10 in our implementation) and the corresponding decision tree constructed during the execution of CandidateSkF has exactly one node. If CheckSubstitute returns True, RefineSkF calls DoSelfSubstitution to perform self-substitution. DoSelfSubstitution takes a formula F(X, Y), an existentially quantified variable y<sup>k</sup>, and a list of variables that depend on y<sup>k</sup>, and performs self-substitution of y<sup>k</sup> with the constant 1 in the formula F(X, Y) [28].

If CheckSubstitute returns False, RefineSkF attempts a proof-guided repair for y<sup>k</sup>. RefineSkF calls CheckSat in line 9 on G<sup>k</sup>, which corresponds to formula 2: if G<sup>k</sup> is SAT, then CheckSat returns a satisfying assignment (ρ) of G<sup>k</sup> in σ; otherwise, CheckSat returns unsatisfiable in the result, ret.


Substitute: To return the Skolem functions in terms of only X, Manthan invokes the Substitute subroutine. For each variable y<sup>j</sup> of Y, Substitute considers the Y variables that occur later in *TotalOrder* as Ŷ. Then, for each y<sup>i</sup> of Ŷ, it substitutes the corresponding Skolem function ψ<sup>i</sup> into the Skolem function ψ<sup>j</sup> of y<sup>j</sup>.
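The effect of Substitute can be mimicked with function composition. The sketch below assumes each Skolem function is a Python callable over an environment dictionary; it composes callables rather than rewriting formulas syntactically, so it illustrates the result of the substitution and not the circuit-level operation. The name `substitute` and the toy functions are hypothetical.

```python
def substitute(total_order, psi):
    """Turn Skolem functions over X and later Y variables into functions of X.

    total_order: list of Y variables; psi[y] may read X and any y' later in it.
    psi[y]: callable taking a dict of variable values and returning 0 or 1.
    """
    resolved = {}
    for y in reversed(total_order):
        later = dict(resolved)  # X-only functions of the variables after y

        def over_x(x, _psi=psi[y], _later=later):
            env = dict(x)
            for z, g in _later.items():
                env[z] = g(x)  # plug in the already-resolved function
            return _psi(env)

        resolved[y] = over_x
    return resolved


# Toy candidates: y1 = x1 XOR y2, and y2 = x1 AND x2.
toy_psi = {
    "y1": lambda e: e["x1"] ^ e["y2"],
    "y2": lambda e: e["x1"] & e["x2"],
}
skolem = substitute(["y1", "y2"], toy_psi)
```

After the call, `skolem["y1"]` can be evaluated on an assignment to `x1` and `x2` alone.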

An example to illustrate our algorithm is deferred to the technical report [22].

#### **6 Experimental Results**

We evaluate the performance of Manthan on the union of all the benchmarks employed in the most recent works [4,5], which includes 609 benchmarks from different sources: the Prenex-2QBF track of QBFEval-17 [2], QBFEval-18 [3], disjunctive [6], arithmetic [45], and factorization [6]. We ran all the tools as per the specification laid out by their authors. We used Open-WBO [34] for our MaxSAT queries and PicoSAT [11] to compute UnsatCore. We used PicoSAT for its ease of use, and we expect further performance improvements from upgrading to one of the state-of-the-art SAT solvers. We used Scikit-Learn [37] to create decision trees in the LearnSkF phase of Manthan. We also used ABC [31] to represent and manipulate Boolean functions. To allow for the input formats supported by the different tools, we use the utility scripts available with the BFSS distribution [5] to convert each of the instances to both QDIMACS and Verilog formats. For Manthan, unless otherwise specified, we set the number of samples according to a heuristic based on |Y|, as described in Sect. 6.3, and the minimum impurity decrease to 0.005. All our experiments were conducted on a high-performance computer cluster with each node consisting of an E5-2690 v3 CPU with 24 cores and 96 GB of RAM, with a memory limit set to 4 GB per core. All tools were run in single-threaded mode on a single core with a timeout of 7200 s.

The objective of our experimental evaluation was two-fold: to understand the impact of various design choices on the runtime performance of Manthan and to perform an extensive comparison of runtime performance vis-a-vis state of the art synthesis tools. In particular, we sought to answer the following questions:


We observe that Manthan significantly improves upon the state of the art, solving 356 benchmarks while the best state-of-the-art tool solves only 280; in particular, Manthan solves 60 benchmarks that could not be solved by any of the state-of-the-art tools. To put the runtime performance statistics in a broader context, the number of benchmarks solved by techniques developed over the past five years ranges from 206 to 280, i.e., a difference of 74, which is comparable to the increase of 76 (i.e., from 280 to 356) due to Manthan.

Our experimental evaluation leads to interesting conclusions and several directions for future work. We observe that the performance of Manthan is sensitive to different sampling schemes and the underlying samplers; in fact, we found that biased sampling yields better results than uniform sampling. This raises interesting questions on the possibility of designing specialized samplers for this task. Similarly, we observe interesting trade-offs between the number of samples and the minimum impurity decrease in LearnSkF. The diversity of our extensive benchmark suite produces a nuanced picture with respect to time distribution across different phases, highlighting the critical nature of each of the phases to the performance of Manthan. Manthan shows significant performance improvement from using a MaxSAT solver to identify candidates to refine. Manthan also shows significant performance improvement with self-substitution in terms of the required number of refinements.

#### **6.1 Comparison with Other Tools**

We now present a performance comparison of Manthan with the current state-of-the-art synthesis tools, BFSS [5], C2Syn [4], and BaFSyn [14], and the current state-of-the-art 2-QBF solvers CADET [39], CAQE [40], and DepQBF [32]. Certifying 2-QBF solvers produce QBF certificates that can be used to extract Skolem functions [8]. The developers of BaFSyn and DepQBF confirmed that these tools produce Skolem functions only for valid instances, i.e., when ∀X∃Y F(X, Y) is valid. Note that the current version of CAQE does not support certification, and we have used CAQE version 2 for the experiments after consultation with the developers of CAQE.

**Table 1.** No. of benchmarks solved by different tools


We present the number of instances solved in Table 1. Out of 609 benchmarks, the largest number of instances solved by any of the remaining techniques is 280, while Manthan is able to solve 356 instances, a significant improvement over the state of the art. We will focus on the top 4 synthesis tools from Table 1 for further analysis.

For a deeper analysis of runtime behavior, we present the cactus plot in Fig. 2: the number of instances is shown on the x-axis and the time taken on the y-axis; a point (x, y) implies that a solver took less than or equal to y seconds to find Skolem functions for x instances out of a total of 609 instances. An interesting behavior predicted by the cactus plot and verified upon closer analysis is that, for instances that can be solved by most of the tools, the initial overhead due to a multi-phase approach may lead to a relatively larger runtime for Manthan. However, with the rise in the empirically observed hardness of instances, one can observe the strengths of the multi-phase approach. Overall, Manthan solves 76 more instances than the best of the remaining techniques.
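The points of a cactus plot as defined above can be computed from per-instance runtimes by sorting the solved instances by time. The following is a minimal sketch; the function name `cactus_points` and the use of `None` for unsolved instances are assumptions, and the runtimes are made-up toy values rather than measurements from the evaluation.

```python
def cactus_points(runtimes, timeout=7200):
    """Return (x, y) points: x instances were each solved within y seconds.

    runtimes: per-instance times in seconds, or None when unsolved.
    """
    solved = sorted(t for t in runtimes if t is not None and t <= timeout)
    return [(i + 1, t) for i, t in enumerate(solved)]


# Toy runtimes for five instances; one is unsolved and one exceeds the timeout.
points = cactus_points([5.0, None, 1.0, 8000.0, 3.0])
```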

**Fig. 2.** Manthan versus competing tools for Skolem function synthesis

We show a pairwise comparison of Manthan vis-a-vis other techniques in Table 2. The second row of the table lists the number of instances that were solved by the technique in the corresponding column but not by Manthan, while the third row lists the number of instances that were solved by Manthan but not by the corresponding technique. First, we observe that Manthan solves 163, 194, and 187 instances that are not solved by C2Syn, BFSS, and CADET, respectively. Though BFSS and CADET each solve more than 80 instances that Manthan does not solve, they are not complementary; there are only 121 instances that can be solved by either BFSS or CADET but that Manthan fails to solve. A closer analysis of Manthan's performance on these instances revealed that the decision trees generated by CandidateSkF were shallow, which is usually a sign of significant under-fitting. On the other hand, there are 130 instances that Manthan solves but neither CADET nor BFSS can solve. These instances have high dependencies between variables that Manthan can infer from the samples en route to predicting good candidate Skolem functions. Akshay et al. [4] suggest that C2Syn is an orthogonal approach to BFSS. Manthan solves 81 instances that neither C2Syn nor BFSS is able to solve, and these tools together solve 86 instances that Manthan fails to solve. Overall, Manthan solves **60** instances beyond the reach of any of the above state-of-the-art tools.
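Counts of this kind are set differences over the sets of solved instances. The sketch below shows the computation on made-up data; the function name `pairwise`, the dictionary layout, and the instance names are hypothetical and do not reflect the actual benchmark results.

```python
def pairwise(solved, tool, others):
    """Compare `tool` against the union of the instances solved by `others`.

    solved: dict mapping a tool name to the set of instances it solved.
    """
    union_others = set().union(*(solved[o] for o in others))
    return {
        "only_others": len(union_others - solved[tool]),
        "only_tool": len(solved[tool] - union_others),
    }


# Toy data: instance names are placeholders.
toy_solved = {
    "Manthan": {"a", "b", "c"},
    "BFSS": {"b", "d"},
    "CADET": {"c", "d", "e"},
}
summary = pairwise(toy_solved, "Manthan", ["BFSS", "CADET"])
```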

#### **6.2 Impact of the Sampling Scheme**

To analyze the impact of adaptive sampling and the quality of the distributions generated by the underlying samplers, we augmented Manthan with samples drawn from different samplers for adaptive and non-adaptive sampling. In particular, we employed QuickSampler [16], KUS [42], UniGen2 [15], and BiasGen<sup>1</sup>. The samplers KUS and UniGen2 could only produce samples for a mere 14 and 49 benchmarks, respectively, within a timeout of 3600 s. Hence, we have omitted KUS and UniGen2 from further analysis. We also experimented with a naive enumeration of solutions using an off-the-shelf SAT solver, CryptoMiniSat [43]. It is worth noting that QuickSampler performs worse than BiasGen for uniformity testing using Barbarik [13]. In our implementation, we had to turn off the validation phase of QuickSampler to allow generation of the required number of samples within a reasonable time. To statistically validate our intuition described in Sect. 4, we performed adaptive sampling using BiasGen. We use AdaBiasGen to refer to the adaptive sampling implementation.

<sup>1</sup> BiasGen is developed by Mate Soos and Kuldeep S. Meel, and is pending publication.

Table 3 presents the performance of Manthan with the different samplers listed in Column 1. Columns 2, 3, and 4 list the number of instances that were solved during the execution of the respective phases: Preprocess, LearnSkF, and Refine. Finally, column 5 lists the total number of instances solved. Two important findings emerge from Table 3. Firstly, as the quality of the samplers improves, so does the performance of Manthan. In particular, we observe that an improvement in the quality of samples leads to Manthan solving more instances in LearnSkF. Secondly, we see a significant increase in the number of instances that can be solved due to LearnSkF with samples from AdaBiasGen. It is worth remarking that one should view the adaptive scheme proposed in Sect. 4 as a proof of concept, and we hope that our results will encourage the development of more complex schemes.


**Table 3.** Manthan with different samplers

**Fig. 3.** Heatmap of # instances solved. (Color figure online)

#### **6.3 Impact of LearnSkF**

To analyze the impact of different design choices in LearnSkF, we analyzed the performance of Manthan for different numbers of samples (1000, 5000, and 10000) generated by GetSamples and for different choices of minimum impurity decrease (0.001, 0.005, 0.0005). Figure 3 shows a heatmap of the number of instances solved for each combination of the hyperparameters; the closer the color of a cell is to the red end of the spectrum, the better the performance of Manthan.

At first glance, Fig. 3 presents a puzzling picture: it seems that increasing the number of samples does not improve the performance of Manthan. On closer analysis, we found that an increase in the number of samples leads to an increase in the runtime of CandidateSkF without significantly increasing the number of instances solved during LearnSkF. The runtime of CandidateSkF depends on the number of samples and |Y|. On the other hand, we see an interesting trend with respect to the minimum impurity decrease, where the performance first improves and then degrades. A plausible explanation for such behavior is that with an increase in *minimum impurity decrease*, the generated decision trees tend to underfit, while significantly low values of *minimum impurity decrease* lead to overfitting. We intend to study this in detail in the future.

Based on the above observations, we set the value of minimum impurity decrease to 0.005 and set the number of samples to (1) 10000 for |Y| < 1200, (2) 5000 for 1200 < |Y| ≤ 4000, and (3) 1000 for |Y| > 4000.
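The sample-count heuristic above amounts to a three-way case split on |Y|. In the sketch below, the function name `num_samples` is hypothetical, and the boundary case |Y| = 1200, which the stated ranges leave unspecified, is assigned to the middle bucket purely as an assumption.

```python
def num_samples(num_y):
    """Number of samples to draw as a function of |Y| (sketch of Sect. 6.3)."""
    if num_y < 1200:
        return 10000
    if num_y <= 4000:
        # Assumption: |Y| == 1200 is not covered by the stated ranges;
        # it is grouped with the middle bucket here.
        return 5000
    return 1000
```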

#### **6.4 Division of Time Taken Across Different Phases**

To analyze the time taken by different phases of Manthan across different categories of the benchmarks, we normalize the time taken for each of the four core subroutines, Preprocess, GetSamples, CandidateSkF, and RefineSkF, for every benchmark that was solved by Manthan such that the sum of the times taken for each benchmark is 1. We then compute the mean of the normalized times across the different categories of instances. Figure 4 shows the distribution of mean normalized times for different categories: Arithmetic, Disjunction, Factorization, QBFEval, and all the instances.
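The normalization just described can be sketched in a few lines. The function name `mean_normalized_times`, the dictionary layout, and the decision to skip benchmarks whose total time is zero are assumptions; the timings in the example are toy values, not measurements.

```python
PHASES = ("Preprocess", "GetSamples", "CandidateSkF", "RefineSkF")


def mean_normalized_times(benchmarks):
    """Average, over benchmarks, the fraction of time spent in each phase.

    benchmarks: list of dicts mapping each phase name to seconds spent in it.
    """
    totals = {p: 0.0 for p in PHASES}
    counted = 0
    for times in benchmarks:
        total = sum(times[p] for p in PHASES)
        if total == 0:
            continue  # assumption: skip degenerate entries
        for p in PHASES:
            totals[p] += times[p] / total  # per-benchmark fractions sum to 1
        counted += 1
    if counted == 0:
        return totals
    return {p: totals[p] / counted for p in PHASES}


# Two toy benchmarks with made-up phase timings in seconds.
toy = [
    {"Preprocess": 1, "GetSamples": 1, "CandidateSkF": 1, "RefineSkF": 1},
    {"Preprocess": 0, "GetSamples": 2, "CandidateSkF": 0, "RefineSkF": 2},
]
fractions = mean_normalized_times(toy)
```

Because each benchmark contributes fractions that sum to 1, the resulting means also sum to 1, regardless of how long individual benchmarks ran.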

**Fig. 4.** Fraction of time spent in different phases in Manthan over different classes of benchmarks. (Color figure online)

The diversity of our benchmark suite shows a nuanced picture and shows that the time taken by different phases strongly depends on the family of instances. For example, the disjunctive instances are particularly hard to sample and an improvement in the sampling techniques would lead to significant performance gains. On the other hand, a significant fraction of runtime is spent in the CandidateSkF subroutine indicating the potential gains due to improvement in decision tree generation routines. In all, Fig. 4 identifies the categories of instances that would benefit from algorithmic and engineering improvements in Manthan's different subroutines.

#### **6.5 Impact of Using MaxSAT**

In RefineSkF, Manthan invokes the MaxSATList subroutine, which calls a MaxSAT solver to identify the potentially erring Skolem functions. To observe the impact of using a MaxSAT solver to identify the candidates to refine, we ran an experiment with Manthan without the MaxSATList subroutine call. In this setting, all y<sup>i</sup> with σ[y<sup>i</sup>] ≠ σ[y′<sup>i</sup>] were considered as candidates to refine. Manthan without the MaxSATList subroutine call solved 204 instances, which represents a significant drop from the 356 instances solved by Manthan with the MaxSATList subroutine.

#### **6.6 Impact of Self-substitution**

To understand the impact of self-substitution, we profile the behavior of candidate Skolem functions with respect to the number of refinements for two of our benchmarks: *pdtpmsmiim-all-bit* and *pdtpmsmiim*. In Fig. 5, we use histograms with the number of candidate Skolem functions on the y-axis and the required number of refinements on the x-axis. A bar of height a at position b (i.e., y = a at x = b) in Fig. 5 indicates that a candidate Skolem functions converged in b refinements. The histograms show that only a few Skolem functions require a large number of refinements: the tiny bar towards the right end in Fig. 5a indicates that, for the benchmark *pdtpmsmiim-all-bit*, only 1 candidate Skolem function required more than 60 refinements, whereas all other candidate Skolem functions needed fewer than 15 refinements. Similarly, for the benchmark *pdtpmsmiim*, Fig. 5b shows that only 1 candidate Skolem function was refined more than 15 times, whereas all other Skolem functions required fewer than 5 refinements. We found similar behavior in many of our other benchmarks.

Based on the above trend and an examination of the decision trees corresponding to these instances, we hypothesize that some Skolem functions are hard to learn through data. For such functions, the candidate Skolem function generated from the data-driven phase in Manthan tends to be poor, and hence Manthan requires a long series of refinements for convergence. Since our refinement algorithm is designed for small, efficient corrections, we handle such hard-to-learn Skolem functions by synthesizing them via self-substitution. Manthan detects such functions via a threshold on the number of refinements, empirically determined to be 10, and sets them up for self-substitution.

In our experiments, we found that 75 instances out of the 356 solved instances required self-substitution, and for 51 of these 75 instances, only one variable undergoes self-substitution. Table 4 shows the impact of self-substitution for five of our benchmarks: Manthan shows significant performance improvement with self-substitution in terms of the required number of refinements, which in turn affects the overall time. Note that Manthan can refine multiple candidates in a single RefineSkF call. For the first four benchmarks, all the other Skolem functions except the poor candidates were synthesized earlier than the 10th refinement iteration, and at the 10th refinement iteration the poor candidate functions hit our threshold for self-substitution. Taking the case of the last benchmark, all the other Skolem functions for it were synthesized earlier than 40 refinement cycles, and the last 16 iterations were only needed for 2 of the poor candidate functions to hit our threshold for self-substitution. Note that self-substitution can lead to an exponential blowup in the size of the formula, but it works quite well in our design as most Skolem functions are learnt quite well in the LearnSkF phase.

**(a)** Benchmark *pdtpmsmiim-all-bit:* plot for no. of Skolem functions vs required no. of refinements

**Fig. 5.** The plots to show the required number of refinements for the candidate Skolem functions.


**Table 4.** Manthan: Impact of self-substitution

#### **7 Conclusion**

Boolean functional synthesis is a fundamental problem in Computer Science with a wide variety of applications. In this work, we propose a novel data-driven approach to synthesis that employs constrained sampling techniques for generation of data, machine learning for candidate Skolem functions, and automated reasoning to verify and refine to generate Skolem functions. Our approach achieves significant performance improvements. As pointed out in Sects. 5 and 6, our work opens up several interesting directions for future work at the intersection of machine learning, constrained sampling, and automated reasoning.

**Acknowledgment.** We are grateful to the anonymous reviewers and Dror Fried for constructive comments that significantly improved the final version of the paper. We are grateful to Mate Soos for tweaking BiasGen to support Manthan. We are indebted to S. Akshay, Supratik Chakraborty, and Shetal Shah for their patient responses to our tens of queries regarding prior work.

This work was supported in part by National Research Foundation Singapore under its NRF Fellowship Programme [NRF-NRFFAI1-2019-0004] and AI Singapore Programme [AISG-RP-2018-005], and NUS ODPRT Grant [R-252-000-685-13]. The computational work for this article was performed on resources of the National Supercomputing Centre, Singapore: https://www.nscc.sg [1].

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Decidable Synthesis of Programs with Uninterpreted Functions**

Paul Krogmeier, Umang Mathur, Adithya Murali, P. Madhusudan, and Mahesh Viswanathan

> University of Illinois at Urbana-Champaign, Champaign, USA {paulmk2,umathur3,adithya5,madhu,vmahesh}@illinois.edu

**Abstract.** We identify a decidable synthesis problem for a class of programs of unbounded size with conditionals and iteration that work over infinite data domains. The programs in our class use uninterpreted functions and relations, and abide by a restriction called coherence that was recently identified to yield decidable verification. We formulate a powerful grammar-restricted (syntax-guided) synthesis problem for coherent uninterpreted programs, and we show the problem to be decidable, identify its precise complexity, and also study several variants of the problem.

#### **1 Introduction**

Program synthesis is a thriving area of research that addresses the problem of automatically constructing a program that meets a user-given specification [1,21,22]. Synthesis specifications can be expressed in various ways: as input-output examples [19,20], temporal logic specifications for reactive programs [44], logical specifications [1,4], etc. Many targets for program synthesis exist, ranging from transition systems [31,44], logical expressions [1], imperative programs [51], distributed transition systems/programs [38,43,45], filling holes in programs [51], or repairs of programs [49].

A classical stream of program synthesis research is one that emerged from a problem proposed by Church [13] in 1960 for Boolean circuits. Seminal results by Büchi and Landweber [9] and Rabin [48] led to a mature understanding of the problem, including connections to infinite games played on finite graphs and automata over infinite trees (see [18,32]). Tractable synthesis for temporal logics like LTL, CTL, and their fragments was investigated and several applications for synthesizing hardware circuits emerged [6,7].

In recent years, the field has taken a different turn, tackling synthesis of programs that work over infinite domains such as strings [19,20], integers [1,51], and heaps [47]. Typical solutions derived in this line of research involve (a) bounding the class of programs to a finite set (perhaps iteratively increasing the class) and (b) searching the space of programs using techniques like symmetry-reduced enumeration, SAT solvers, or even random walks [1,4], typically guided by counterexamples (CEGIS) [28,34,51]. Note that iteratively searching larger classes of programs allows synthesis engines to find a program if one exists, but it does not allow one to conclude that there is no program that satisfies the specification. Consequently, in this stream of research, decidability results are uncommon (see Sect. 7 for some exceptions in certain heavily restricted cases).

Paul Krogmeier and Mahesh Viswanathan are partially supported by NSF CCF 1901069. Umang Mathur is partially supported by a Google PhD Fellowship.

© The Author(s) 2020

S. K. Lahiri and C. Wang (Eds.): CAV 2020, LNCS 12225, pp. 634–657, 2020. https://doi.org/10.1007/978-3-030-53291-8\_32

*In this paper we present, to the best of our knowledge, the first decidability results for program synthesis over a natural class of programs with iteration/recursion, having arbitrary sizes, and which work on infinite data domains. In particular, we show decidable synthesis of a subclass of programs that use uninterpreted functions and relations.*

Our primary contribution is a decidability result for realizability and synthesis of a restricted class of imperative *uninterpreted* programs. Uninterpreted programs work over infinite data models that give arbitrary meanings to their functions and relations. Such programs satisfy their assertions if they hold along all executions for *every* model that interprets the functions and relations. The theory of uninterpreted functions and relations is well studied: classically, in 1929, by Gödel, where completeness results were shown [5], and, more recently, its decidable quantifier-free fragment has been exploited in SMT solvers in combination with other theories [8]. In recent work [39], a subclass of uninterpreted programs, called *coherent* programs, was identified and shown to have a decidable verification problem. Note that in this verification problem there are no user-given loop invariants; the verification algorithm finds inductive invariants and proves them automatically in order to prove program correctness.

In this paper, we consider the synthesis problem for coherent uninterpreted programs. The user gives a *grammar* G that generates well-formed programs in our programming language. The grammar can force programs to have **assert** statements at various points which collectively act as the specification. The program synthesis problem is then to construct a coherent program, if one exists, conforming to the grammar G that satisfies all assertions in all executions when running on *any* data model that gives meaning to function and relation symbols.

Our primary result is that the realizability problem (checking the existence of a program conforming to the grammar and satisfying its assertions) is decidable for coherent uninterpreted programs. We prove that the problem is 2EXPTIME-complete. Further, whenever a correct coherent program that conforms to the grammar exists, we can synthesize one. We also show that the realizability/synthesis problem is undecidable if the coherence restriction is dropped. In fact we show a stronger result that the problem is undecidable even for synthesis of *straight-line* programs (without conditionals and iteration)!

Coherence of programs is a technical restriction that was introduced in [39]. It consists of two properties, both of which were individually proven to be essential for ensuring that program verification is decidable. Intuitively, the restriction demands that functions are computed on any tuple of terms only once and that assumptions of equality come early in the executions. In more recent work [41], the authors extend this decidability result to handle map updates, and applied it to memory safety verification for a class of heap-manipulating programs on forest data-structures, demonstrating that the restriction of coherence is met in practice by certain natural and useful classes of programs.

Note that automatic synthesis of correct programs over infinite domains demands that we, at the very least, can automatically verify the synthesized program to be correct. The class of coherent uninterpreted programs identified in the work of [39] is the only natural class of programs we are aware of that has recursion and conditionals, works over infinite domains, and admits decidable verification. Consequently, this class is a natural target for proving a decidable synthesis result.

The problem of synthesizing a program from a grammar with assertions is a powerful formulation of program synthesis. In particular, the grammar can be used to restrict the space of programs in various ways. For example, we can restrict the space syntactically by disallowing while loops. Or, for a fixed n, by using a set of Boolean variables linear in n and requiring a loop body to strictly increment a counter encoded using these variables, we can demand that loops terminate in a linear/polynomial/exponential number of iterations. We can also implement loops that do not always terminate, but terminate only when the data model satisfies a particular property, e.g., programs that terminate only on finite list segments, by using a skeleton of the form: **while** (x ≠ y) { ... ; x := next(x) }. Grammar-restricted program synthesis can express the synthesis of programs with holes, used in systems like Sketch [50], where the problem is to fill holes using programs/expressions conforming to a particular grammar so that the assertions in the program hold. Synthesizing programs or expressions using restricted grammars is also the cornerstone of the intensively studied SyGuS (syntax-guided synthesis) format [1,52]<sup>1</sup>.

The proof of our decidability result relies on tree automata, a callback to classical theoretical approaches to synthesis. The key idea is to represent programs as trees and build automata that accept trees corresponding to correct programs. The central construction is to build a two-way alternating tree automaton that accepts *all* program trees of coherent programs that satisfy their assertions. Given a grammar G of programs (which has to satisfy certain natural conditions), we show that there is a regular set of program trees for the language of allowed programs L(G). Intersecting the automata for these two regular tree languages and checking for emptiness establishes the upper bound. Our constructions crucially use the automaton for verifying coherent uninterpreted programs in [39] and adapt ideas from [35] for building two-way automata over program trees. Our final decision procedure is doubly-exponential in the number of program variables and *linear* in the size of the grammar. We also prove a matching lower bound by reduction from the acceptance problem for alternating exponential-space Turing machines. The reduction is non-trivial in that programs (which correspond to runs in the Turing machine) must simulate sequences of configurations, each of which is of exponential size, by using only polynomially-many variables.

<sup>1</sup> Note, however, that both Sketch and SyGuS problems are defined using functions and relations that are interpreted using standard theories like arithmetic, etc., and hence of course do not have decidable synthesis.

**Recursive Programs, Transition Systems, and Boolean Programs:** We study three related synthesis problems. First, we show that our results extend to synthesis of call-by-value *recursive* uninterpreted programs (with a fixed number of functions and fixed number of local/global variables). This problem is also 2EXPTIME-complete but is more complex, as even single executions simulated on the program tree must be split into separate copies, with one copy executing the summary of a function call and the other proceeding under the assumption that the call has returned in a summarized state.

We next examine a synthesis problem for *transition systems*. Transition systems are similar to programs in that they execute similar kinds of atomic statements. We allow the user to restrict the set of allowable executions (using regular sets). Despite the fact that this problem seems very similar to program synthesis, we show that it is an *easier* problem, and coherent transition system realizability and synthesis can be solved in time exponential in the number of program variables and polynomial in the size of the automata that restrict executions. We prove a corresponding lower bound to establish EXPTIME-completeness of this problem.

Finally, we note that our results also show, as a corollary, that the grammar-restricted realizability/synthesis problem for Boolean programs (resp. execution-restricted synthesis problem for Boolean transition systems) is decidable and is 2EXPTIME-complete (resp. EXPTIME-complete). These results for Boolean programs are themselves new. The lower bound results for these problems hence show that coherent program/transition-system synthesis is not particularly harder than Boolean program synthesis for uninterpreted programs. Grammar-restricted Boolean program synthesis is an important problem which is addressed by many practical synthesis systems like Sketch [50].

Due to space restrictions, we present only proof gists for main results in the paper. All the complete proofs can be found in our technical report [30].

#### **2 Examples**

We will begin by looking at several examples to gain some intuition for uninterpreted programs.

*Example 1.* Consider the program in Fig. 1 (left). This program has a *hole* ' ?? | Cannot . . . ' that we intend to fill with a sub-program so that the entire program (together with the contents of the hole) satisfies the assertion at the end. The sub-program corresponding to the hole is allowed to use the variable cipher as well as some additional variables $y_1, \ldots, y_n$ (for some fixed $n$), but is not allowed to refer to key or secret in any way. Here we also restrict the hole to exclude while loops. This example models the encryption of a secret message secret with a key key. The assumption in the second line of the program models the fact that the secret message can be decrypted from cipher and key. Here, the functions enc and dec are *uninterpreted functions*, and thus the program we are looking for is an *uninterpreted program*. For such a program, the assertion "**assert**(z = secret)" holds at the end if it holds for *all models*, i.e., for all interpretations of enc and dec and for all initial values of the program variables secret, key, cipher, and $y_1, \ldots, y_n$. With this setup, we are essentially asking whether a program that does not have access to key can recover secret. It is not hard to see that there is no program which satisfies the above requirement. The above modeling of keys, encryption, nonces, etc. is common in algebraic approaches to modeling cryptographic protocols [15,16].

```
Left: Decrypting a ciphertext
cipher := enc(secret, key);
assume(secret = dec(cipher, key));
?? | Cannot refer to secret or key ;
assert(z = secret)

Right: Synthesis with incomplete information
assume(T ≠ F);
if (x = T) then b := T else b := F;
?? | Cannot refer to x or b ;
assert(y = b)
```
**Fig. 1.** Decrypting a ciphertext (left); synthesis with incomplete information (right)

*Example 2.* The program in Fig. 1 (right) is another simple example of an unrealizable specification. The program variables here are x, b, and y. The hole in this partial program is restricted so that it cannot refer to x or b. It is easy to phrase the question for synthesis of the complete program in terms of a grammar. The restriction on the hole ensures that the synthesized code fragment can neither directly check if x = T, nor indirectly check via b. Consequently, it is easy to see that there is no program for the hole that can ensure y is equal to b. We remark that the code at the hole, apart from not being allowed to examine some variables, is also implicitly prohibited from looking at the control path taken to reach the hole. If we could synthesize two different programs depending on the control path taken to reach the hole, then we could set y := T when the **then**-branch is taken and set y := F when the **else**-branch is taken. Program synthesis requires a control-flow independent decision to be made about how to fill the hole. In this sense, we can think of the hole as having only *incomplete information* about the executions for which it must be correct. This can be used to encode specifications using complex ghost code, as we show in the next examples. In Sect. 6, we explore a slightly different synthesis problem, called *transition system synthesis*, where holes can be differently instantiated based on the history of an execution.

*Example 3.* In this example, we model the synthesis of a program that checks whether a linked list pointed to by some node x has a key k. We model a *next* pointer with a unary function next and we model locations using elements in the underlying data domain.

Our formalism allows only for **assert** statements to specify desired program properties. In order to state the correctness specification for our desired list-search program, we interleave *ghost code* into the program skeleton; we distinguish ghost code fragments by enclosing them in dashed boxes. The skeleton in Fig. 2 has a loop that advances the pointer variable x along the list until NIL is reached. We model NIL with an immutable program variable. The first hole ' ??<sub>1</sub> ' before the **while**-loop and the second hole ' ??<sub>2</sub> ' within the **while**-loop need to be filled so that the assertion at the end is satisfied. We use three ghost variables in the skeleton: gans, gwitness, and gfound. The ghost variable gans evaluates to whether we expect to find k in the list or not, and hence at the end the skeleton asserts that the Boolean variable b computed by the holes is precisely gans. The holes are restricted to not look at the ghost variables.

Now, notice that the skeleton needs to *check* that the answer gans is indeed correct. If gans is not T, then we add the assumption that key(x) ≠ k in each iteration of the loop, hence ensuring the key is not present. For ensuring correctness in the case gans = T, we need two more ghost variables gwitness and gfound. The variable gwitness witnesses the precise location in the list that holds the key k, and variable gfound indicates whether the location at gwitness belongs to the list pointed to by x. Observe that this specification can be realized by filling ' ??<sub>1</sub> ' with "b := F" and ' ??<sub>2</sub> ' with "**if** key(x) = k **then** b := T", for instance. Furthermore, this program is *coherent* [39] and hence our decision procedure will answer in the affirmative and synthesize code for the holes.


In fact, our procedure will synthesize a representation for *all* possible ways to fill the holes (thus including the solution above) and it is therefore possible to enumerate and pick specific solutions. It is straightforward to formulate a grammar which matches this setup. As noted, we must stipulate that the holes do not use the ghost variables.
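To make the hole fillings from Example 3 concrete, the following Python sketch runs the completed list-search loop on one finite model. It is only an illustration under stated assumptions: the dictionary encoding of `next` and `key`, the function name `list_search`, the node names, and the placement of the second hole before the pointer advance are all choices made here, and the ghost code of the skeleton is omitted.

```python
NIL = "nil"  # stands in for the immutable program variable NIL


def list_search(x, k, next_, key):
    """Run the loop with hole 1 := 'b := F' and hole 2 := 'if key(x) = k then b := T'."""
    b = False                  # hole ??1
    while x != NIL:
        if key[x] == k:        # hole ??2
            b = True
        x = next_[x]           # the skeleton advances the pointer
    return b


# One concrete model: a two-node list n1 -> n2 -> NIL (hypothetical data).
next_ = {"n1": "n2", "n2": NIL}
key = {"n1": 7, "n2": 9}

print(list_search("n1", 9, next_, key))  # key present
print(list_search("n1", 5, next_, key))  # key absent
```

A single model such as this one only tests the filling; the synthesis problem asks for correctness over all models.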

*Example 4.* Consider the same program skeleton as in Example 3, but let us add an assertion at the end: "**assert**(b = T ⇒ z = gwitness)", where z is another program variable. We are now demanding that the synthesized code also find a location z, whose key is k, that is equal to the ghost location gwitness, which is guessed nondeterministically at the beginning of the program. This specification is *unrealizable*: for a list with multiple locations having the key k, no matter what the program picks we can always take gwitness to be the *other* location with key k in the list, thus violating the assertion. Our decision procedure will report in the negative for this specification.

*Example 5 (Input/Output Examples).* We can encode input/output examples by adding a sequence of assignments and assumptions that define certain models at the beginning of the program grammar. For instance, the sequence of statements in Fig. 3 defines a linked list of two elements with different keys.

We can similarly use special variables to define the output that we expect in the case of each model. And as we saw in the ghost code of Fig. 2, we can use fresh variables to introduce nondeterministic choices, which the grammar can use to pick an example model nondeterministically. Thus when the synthesized program is executed on the chosen model it computes the expected answer. This has the effect of requiring a solution that generalizes across models. See [30] for a more detailed example.

```
assume(x1 ≠ NIL);
x2 := next(x1);
assume(x2 ≠ NIL);
assume(next(x2) = NIL);
k1 := key(x1);
k2 := key(x2);
assume(k1 ≠ k2)
```
**Fig. 3.** An example model

#### **3 Preliminaries**

In this section we define the syntax and semantics of uninterpreted programs and the *(grammar-restricted) uninterpreted program synthesis* problem.

**Syntax.** We fix a first-order signature $\Sigma = (\mathcal{F}, \mathcal{R})$, where $\mathcal{F}$ and $\mathcal{R}$ are sets of function and relation symbols, respectively. Let $V$ be a finite set of program variables. The set of programs over $V$ is inductively defined using the following grammar, with $f \in \mathcal{F}$, $R \in \mathcal{R}$ (with $f$ and $R$ of the appropriate arities), and $x, y, z_1, \ldots, z_r \in V$.

$$\begin{aligned} \langle stmt \rangle_V ::=\ & \textbf{skip} \mid x := y \mid x := f(z_1, \ldots, z_r) \mid \textbf{assume}\left(\langle cond \rangle_V\right) \mid \textbf{assert}\left(\langle cond \rangle_V\right) \mid \langle stmt \rangle_V \,;\, \langle stmt \rangle_V \\ & \mid \textbf{if}\left(\langle cond \rangle_V\right)\ \textbf{then}\ \langle stmt \rangle_V\ \textbf{else}\ \langle stmt \rangle_V \mid \textbf{while}\left(\langle cond \rangle_V\right)\ \langle stmt \rangle_V \end{aligned}$$

$$\begin{aligned} \langle cond \rangle\_V &::= x = y \mid R(z\_1, \ldots, z\_r) \mid \langle cond \rangle\_V \lor \langle cond \rangle\_V \mid \neg \langle cond \rangle\_V \end{aligned}$$

Without loss of generality, we can assume that our programs do not use relations (they can be modeled with functions) and that every condition is either an equality or disequality between variables (arbitrary Boolean combinations can be modeled with nested **if**-**then**-**else**). When the set of variables $V$ is clear from context, we will omit the subscript $V$ from $\langle stmt \rangle_V$ and $\langle cond \rangle_V$.
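As a rough illustration of this syntax, the sketch below encodes the simplified grammar (no relations, conditions restricted to equalities and disequalities between variables) as a Python abstract syntax tree. All class and field names, and the `show` pretty-printer, are hypothetical choices made for this sketch rather than anything fixed by the paper.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Cond:
    left: str
    right: str
    equal: bool = True  # True encodes x = y, False encodes x != y


@dataclass(frozen=True)
class Skip:
    pass


@dataclass(frozen=True)
class Copy:            # x := y
    target: str
    source: str


@dataclass(frozen=True)
class Apply:           # x := f(z1, ..., zr)
    target: str
    func: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Assume:
    cond: Cond


@dataclass(frozen=True)
class Assert:
    cond: Cond


@dataclass(frozen=True)
class Seq:             # stmt ; stmt
    first: "Stmt"
    second: "Stmt"


@dataclass(frozen=True)
class Ite:
    cond: Cond
    then: "Stmt"
    orelse: "Stmt"


@dataclass(frozen=True)
class While:
    cond: Cond
    body: "Stmt"


Stmt = Union[Skip, Copy, Apply, Assume, Assert, Seq, Ite, While]


def show_cond(c: Cond) -> str:
    return f"{c.left} {'=' if c.equal else '!='} {c.right}"


def show(s: Stmt) -> str:
    """Render a statement in the concrete syntax of the grammar."""
    if isinstance(s, Skip):
        return "skip"
    if isinstance(s, Copy):
        return f"{s.target} := {s.source}"
    if isinstance(s, Apply):
        return f"{s.target} := {s.func}({', '.join(s.args)})"
    if isinstance(s, Assume):
        return f"assume({show_cond(s.cond)})"
    if isinstance(s, Assert):
        return f"assert({show_cond(s.cond)})"
    if isinstance(s, Seq):
        return f"{show(s.first)}; {show(s.second)}"
    if isinstance(s, Ite):
        return f"if ({show_cond(s.cond)}) then {show(s.then)} else {show(s.orelse)}"
    if isinstance(s, While):
        return f"while ({show_cond(s.cond)}) {show(s.body)}"
    raise TypeError(f"not a statement: {s!r}")


prog = Seq(Apply("x", "f", ("y",)),
           Ite(Cond("x", "y"), Copy("z", "x"), Skip()))
print(show(prog))  # x := f(y); if (x = y) then z := x else skip
```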

**Program Executions.** An execution over V is a finite word over the alphabet

$$\begin{aligned} \Pi_V = \{\, & \text{``}x := y\text{''},\ \text{``}x := f(\overline{z})\text{''},\ \text{``}\textbf{assume}(x = y)\text{''},\ \text{``}\textbf{assume}(x \neq y)\text{''}, \\ & \text{``}\textbf{assert}(\bot)\text{''} \mid x, y \in V,\ \overline{z} \in V^r,\ f \in \mathcal{F} \,\}. \end{aligned}$$

The set of *complete executions* for a program p over V , denoted Exec(p), is a regular language. See [30] for a straightforward definition. The set PExec(p) of *partial executions* is the set of prefixes of complete executions in Exec(p). We refer to partial executions as simply *executions*, and clarify as needed when the distinction is important.

**Semantics.** The semantics of executions is given in terms of data models. A data model $\mathcal{M} = (U, \mathcal{I})$ is a first-order structure over $\Sigma$ comprised of a universe $U$ and an interpretation function $\mathcal{I}$ for the program symbols. The semantics of an execution $\pi$ over a data model $\mathcal{M}$ is given by a configuration $\sigma(\pi, \mathcal{M}) : V \to U$ which maps each variable to its value in the universe $U$ at the end of $\pi$. This notion is straightforward and we skip the formal definition (see [39] for details). For a fixed program $p$, any particular data model corresponds to at most one complete execution $\pi \in \mathsf{Exec}(p)$.

An execution $\pi$ is *feasible* in a data model $\mathcal{M}$ if for every prefix $\rho = \rho' \cdot \textbf{assume}(x \sim y)$ of $\pi$ (where $\sim\, \in \{=, \neq\}$), we have $\sigma(\rho', \mathcal{M})(x) \sim \sigma(\rho', \mathcal{M})(y)$. Execution $\pi$ is said to be *correct* in a data model $\mathcal{M}$ if for every prefix of $\pi$ of the form $\rho = \rho' \cdot \textbf{assert}(\bot)$, we have that $\rho'$ is not feasible, or *infeasible*, in $\mathcal{M}$. Finally, a program $p$ is said to be *correct* if for all data models $\mathcal{M}$ and executions $\pi \in \mathsf{PExec}(p)$, $\pi$ is correct in $\mathcal{M}$.
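The definitions of configuration, feasibility, and correctness can be sketched as a small interpreter over one finite data model. This is a minimal illustration, assuming a tuple encoding of the execution alphabet, a dictionary `funcs` for the interpretation of function symbols, and a dictionary `init` for the initial variable values; all of these names and encodings are introduced here only for the example.

```python
def check(execution, funcs, init):
    """Return (sigma, feasible, correct) for one execution in one data model.

    execution: list of steps, each one of
        ("copy", x, y)            for  x := y
        ("apply", x, f, args)     for  x := f(args)
        ("assume", x, rel, y)     with rel in {"=", "!="}
        ("assert_false",)         for  assert(bottom)
    funcs: maps a function symbol to a dict from argument tuples to values
    init:  maps each variable to its initial value
    """
    sigma = dict(init)
    feasible = True   # every assume seen so far holds in this model
    correct = True    # no assert(bottom) was reached along a feasible prefix
    for step in execution:
        op = step[0]
        if op == "copy":
            _, x, y = step
            sigma[x] = sigma[y]
        elif op == "apply":
            _, x, f, args = step
            sigma[x] = funcs[f][tuple(sigma[z] for z in args)]
        elif op == "assume":
            _, x, rel, y = step
            holds = (sigma[x] == sigma[y]) if rel == "=" else (sigma[x] != sigma[y])
            feasible = feasible and holds
        elif op == "assert_false":
            if feasible:
                correct = False
    return sigma, feasible, correct


# x := f(y); assume(x = y); assert(bottom)
pi = [("apply", "x", "f", ("y",)), ("assume", "x", "=", "y"), ("assert_false",)]
init = {"x": 0, "y": 0}

swap = {"f": {(0,): 1, (1,): 0}}       # f swaps 0 and 1
ident = {"f": {(0,): 0, (1,): 1}}      # f is the identity

print(check(pi, swap, init)[1:])   # (False, True): the assume fails, so the assert is unreachable
print(check(pi, ident, init)[1:])  # (True, False): the assert is reached feasibly
```

Program correctness quantifies over all data models, so a check of this kind can refute correctness on a given model but cannot establish it.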

#### **3.1 The Program Synthesis Problem**

We are now ready to define the program synthesis problem. Our approach will be to allow users to specify a grammar and ask for the synthesis of a program from the grammar. We allow the user to express specifications using *assertions* in the program to be synthesized.

**Grammar Schema and Input Grammar.** In our problem formulation, we allow users to define a grammar which conforms to a schema, given below. The input grammars allow the usual context-free power required to describe proper nesting/bracketing of program expressions, but disallow other uses of the context-free power, such as *counting statements*.

For example, we disallow the grammar in Fig. 4. This grammar has two non-terminals *S* (the start symbol) and *T*. It generates programs with a conditional that has the *same* number of assignments in the **if** and **else** branches.

*S* → **if** (x = y) **then** u := v *T* u := v

*T* → **else**

*T* → ; u := v *T* u := v ;

**Fig. 4.** Grammar with counting

We assume a countably infinite set $PN$ of nonterminals and a countably infinite set $PV$ of program variables. The grammar schema $\mathcal{S}$ over $PN$ and $PV$ is an infinite collection of productions:

$$\mathcal{S} = \left\{ \begin{array}{l} \text{``}P \to x := y\text{''},\ \text{``}P \to x := f(\overline{z})\text{''},\\ \text{``}P \to \textbf{assume}(x \sim y)\text{''},\ \text{``}P \to \textbf{assert}(\bot)\text{''},\\ \text{``}P \to \textbf{skip}\text{''},\ \text{``}P \to \textbf{while}\,(x \sim y)\ P_1\text{''},\\ \text{``}P \to \textbf{if}\,(x \sim y)\ \textbf{then}\ P_1\ \textbf{else}\ P_2\text{''},\ \text{``}P \to P_1 ; P_2\text{''} \end{array} \;\middle|\; \begin{array}{l} P, P_1, P_2 \in PN \\ x, y \in PV,\ \overline{z} \in PV^r \\ \sim\, \in \{=, \neq\} \end{array} \right\}$$

An *input grammar* $G$ is any finite subset of the schema $\mathcal{S}$, and it defines a set of programs, denoted $L(G)$. We can now define the main problem addressed in this work.

**Definition 1 (Uninterpreted Program Realizability and Synthesis).** *Given an input grammar* $G$*, the realizability problem is to determine whether there is an uninterpreted program* $p \in L(G)$ *such that* $p$ *is correct. The synthesis problem is to determine the above, and further, if realizable, synthesize a correct program* $p \in L(G)$*.*

*Example 6.* Consider the program with a hole from Example 1 (Fig. 1, left). We can model that synthesis problem in our framework with the following grammar.

$$\begin{array}{ll} S \to P_1 ; P_2 ; P_{??} ; P_3 & P_{??} \to \langle stmt \rangle_{V_{??}} \\ P_1 \to \text{``cipher := enc(secret, key)''} & P_3 \to \text{``}\textbf{assert}(\text{z} = \text{secret})\text{''} \\ P_2 \to \text{``}\textbf{assume}(\text{secret} = \text{dec(cipher, key)})\text{''} & \end{array}$$

Here, $V_{??} = \{\text{cipher}, y_1, \ldots, y_n\}$ and the grammar $\langle stmt \rangle_{V_{??}}$ is that of Sect. 3, restricted to loop-free programs. Any program generated from this grammar indeed matches the template from Fig. 1 (left) and any such program is correct if it satisfies the last assertion for all models, i.e., all interpretations of the function symbols enc and dec and for all initial values of the variables in $V = V_{??} \cup \{\text{key}, \text{secret}\}$.

## **4 Undecidability of Uninterpreted Program Synthesis**

Since verification of uninterpreted programs with loops is undecidable [39,42], the following is immediate.

#### **Theorem 1.** *The uninterpreted program synthesis problem is undecidable.*

We next consider synthesizing loop-free uninterpreted programs (for which verification reduces to satisfiability of quantifier-free EUF) from grammars conforming to the following schema:

$$\mathcal{S}\_{\text{loop-free}} = \mathcal{S} \backslash \{ \text{``}P \to \text{ while} \,(x \sim y) \,\, P\_1 \text{''} \mid P, P\_1 \in PN, \, x, y \in PV, \sim \in \{=, \ne\} \}$$

**Theorem 2.** *The uninterpreted program synthesis problem is undecidable for the schema* $\mathcal{S}_{\text{loop-free}}$*.*

This is a corollary of the following stronger result: synthesis of *straight-line uninterpreted programs* (conforming to schema $\mathcal{S}_{\mathsf{SLP}}$ below) is undecidable.

$$\mathcal{S}_{\mathsf{SLP}} = \mathcal{S}_{\text{loop-free}} \backslash \{ \text{``}P \to \textbf{if}\,(x \sim y)\ \textbf{then}\ P_1\ \textbf{else}\ P_2\text{''} \mid P, P_1, P_2 \in PN,\ x, y \in PV,\ \sim\, \in \{=, \neq\} \}$$

**Theorem 3.** *The uninterpreted program synthesis problem is undecidable for the schema* $\mathcal{S}_{\mathsf{SLP}}$*.*

In summary, program synthesis of even straight-line uninterpreted programs, which have neither conditionals nor iteration, is already undecidable. The notion of *coherence* for uninterpreted programs was shown to yield decidable verification in [39]. As we'll see in Sect. 5, restricting to coherent programs yields decidable synthesis, even for programs with conditionals *and* iteration.

#### **5 Synthesis of Coherent Uninterpreted Programs**

In this section, we present the main result of the paper: grammar-restricted program synthesis for uninterpreted *coherent* programs [39] is decidable. Intuitively, coherence allows us to maintain congruence closure in a streaming fashion when reading a coherent execution. First we recall the definition of coherent executions and programs in Sect. 5.1 and also the algorithm for verification of such programs. Then we introduce the synthesis procedure, which works by constructing a two-way alternating tree automaton. We briefly discuss this class of tree automata in Sect. 5.2 and recall some standard results. In Sects. 5.3, 5.4 and 5.5 we describe the details of the synthesis procedure, argue its correctness, and discuss its complexity. In Sect. 5.6, we present a tight lower bound result.

#### **5.1 Coherent Executions and Programs**

The notion of coherence for an execution $\pi$ is defined with respect to the *terms* it computes. Intuitively, at the beginning of an execution, each variable $x \in V$ stores some constant term $\widehat{x} \in \mathcal{C}$. As the execution proceeds, new terms are computed and stored in variables. Let $\mathsf{Terms}_\Sigma$ be the set of all ground terms defined using the constants and functions in $\Sigma$. We assume that the set of constants $\mathcal{C}$ includes a designated set of *initial* constants $\widehat{V} = \{\widehat{x} \mid x \in V\} \subseteq \mathcal{C}$. Formally, the term corresponding to a variable $x \in V$ at the end of an execution $\pi \in \Pi_V^*$, denoted $\mathsf{T}(\pi, x) \in \mathsf{Terms}_\Sigma$, is inductively defined as follows.

$$\begin{array}{ll} \mathsf{T}(\varepsilon, x) = \widehat{x} & x \in V \\ \mathsf{T}(\pi \cdot \text{``}x := y\text{''}, x) = \mathsf{T}(\pi, y) & x, y \in V \\ \mathsf{T}(\pi \cdot \text{``}x := f(z_1, \dots, z_r)\text{''}, x) = f(\mathsf{T}(\pi, z_1), \dots, \mathsf{T}(\pi, z_r)) & x, z_1, \dots, z_r \in V \\ \mathsf{T}(\pi \cdot a, x) = \mathsf{T}(\pi, x) & \text{otherwise} \end{array}$$

We will use $\mathsf{T}(\pi)$ to denote the set $\{\mathsf{T}(\pi', x) \mid x \in V,\ \pi' \text{ is a prefix of } \pi\}$.

A related notion is the set of *term equality assumptions* that an execution accumulates, which we formalize as $\alpha : \Pi_V^* \to \mathcal{P}(\mathsf{Terms}_\Sigma \times \mathsf{Terms}_\Sigma)$, and define inductively as $\alpha(\varepsilon) = \emptyset$, $\alpha(\pi \cdot \text{``}\textbf{assume}(x = y)\text{''}) = \alpha(\pi) \cup \{(\mathsf{T}(\pi, x), \mathsf{T}(\pi, y))\}$, and $\alpha(\pi \cdot a) = \alpha(\pi)$ otherwise.
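The two inductive definitions can be read as a single left-to-right pass over an execution. The following Python sketch tracks the term held by each variable, the set of all terms computed so far, and the accumulated equality assumptions. The encoding is an assumption made for illustration: the initial constant of a variable `x` is the string `"x^"`, a compound term is a tuple `(f, t1, ..., tr)`, and the step tags `"copy"`, `"apply"`, and `"assume_eq"` are hypothetical names.

```python
def terms_and_assumptions(execution, variables):
    """Return (T, seen, alpha) after reading the whole execution.

    T:     maps each variable x to the term T(pi, x)
    seen:  the set T(pi) of terms held by some variable after some prefix
    alpha: the set alpha(pi) of assumed term equalities
    """
    T = {x: x + "^" for x in variables}   # T(epsilon, x) is the initial constant
    seen = set(T.values())
    alpha = set()
    for step in execution:
        op = step[0]
        if op == "copy":                  # x := y
            _, x, y = step
            T[x] = T[y]
        elif op == "apply":               # x := f(z1, ..., zr)
            _, x, f, args = step
            T[x] = (f,) + tuple(T[z] for z in args)
        elif op == "assume_eq":           # assume(x = y)
            _, x, y = step
            alpha.add((T[x], T[y]))
        # every other letter leaves both T and alpha unchanged
        seen.update(T.values())
    return T, seen, alpha


# x := f(y); z := x; assume(z = w)
pi = [("apply", "x", "f", ("y",)), ("copy", "z", "x"), ("assume_eq", "z", "w")]
T, seen, alpha = terms_and_assumptions(pi, ["w", "x", "y", "z"])
print(T["z"])   # ('f', 'y^')
print(alpha)    # {(('f', 'y^'), 'w^')}
```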

For a set of term equalities $A \subseteq \mathsf{Terms}_\Sigma \times \mathsf{Terms}_\Sigma$, and two ground terms $t_1, t_2 \in \mathsf{Terms}_\Sigma$, we say $t_1$ and $t_2$ are *equivalent modulo* $A$, denoted $t_1 \cong_A t_2$, if $A \models t_1 = t_2$. For a set of terms $S \subseteq \mathsf{Terms}_\Sigma$, and a term $t \in \mathsf{Terms}_\Sigma$, we write $t \in_A S$ if there is a term $t' \in S$ such that $t \cong_A t'$. For terms $t, s \in \mathsf{Terms}_\Sigma$, we say $s$ is a *superterm modulo* $A$ of $t$, denoted $t \preceq_A s$, if there are terms $t', s' \in \mathsf{Terms}_\Sigma$ such that $t \cong_A t'$, $s \cong_A s'$ and $s'$ is a superterm of $t'$.
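Equivalence modulo a finite set of ground equalities can be decided by congruence closure over the subterms involved. The sketch below is a naive fixpoint version written for clarity; it is not the streaming procedure that coherence enables, and the function names and the tuple encoding of terms (constants as strings, compound terms as `(f, t1, ..., tr)`) are assumptions of this example.

```python
def subterms(t, acc):
    """Add t and all of its subterms to the set acc."""
    acc.add(t)
    if isinstance(t, tuple):
        for arg in t[1:]:
            subterms(arg, acc)


def equivalent(A, t1, t2):
    """Decide whether the ground equalities in A entail t1 = t2."""
    universe = set()
    for s, t in A:
        subterms(s, universe)
        subterms(t, universe)
    subterms(t1, universe)
    subterms(t2, universe)
    parent = {t: t for t in universe}   # union-find forest over the subterms

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    def union(s, t):
        rs, rt = find(s), find(t)
        if rs == rt:
            return False
        parent[rs] = rt
        return True

    for s, t in A:
        union(s, t)
    compound = [t for t in universe if isinstance(t, tuple)]
    changed = True
    while changed:                      # propagate congruence until nothing changes
        changed = False
        for s in compound:
            for t in compound:
                same_head = s[0] == t[0] and len(s) == len(t)
                if same_head and all(find(a) == find(b) for a, b in zip(s[1:], t[1:])):
                    if union(s, t):
                        changed = True
    return find(t1) == find(t2)


def f(t):
    return ("f", t)


print(equivalent({("w^", "z^")}, f(f("w^")), f(f("z^"))))  # True
print(equivalent(set(), f("w^"), f("z^")))                 # False
```

The quadratic inner loop keeps the sketch short; practical congruence closure implementations index terms by their argument classes instead.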

With the above notation in mind, we now review the notion of coherence.

**Definition 2 (Coherent Executions and Programs** [39]**).** *An execution* $\pi \in \Pi_V^*$ *is said to be* coherent *if it satisfies the following two conditions.*

**Memoizing.** *Let* $\rho = \rho' \cdot \text{``}x := f(\overline{y})\text{''}$ *be a prefix of* $\pi$*. If* $t_x = \mathsf{T}(\rho, x) \in_{\alpha(\rho)} \mathsf{T}(\rho')$*, then there is a variable* $z \in V$ *such that* $t_x \cong_{\alpha(\rho)} t_z$*, where* $t_z = \mathsf{T}(\rho', z)$*.*

**Early Assumes.** *Let* $\rho = \rho' \cdot \text{``}\textbf{assume}(x = y)\text{''}$ *be a prefix of* $\pi$*,* $t_x = \mathsf{T}(\rho', x)$ *and* $t_y = \mathsf{T}(\rho', y)$*. If there is a term* $s \in \mathsf{T}(\rho')$ *such that either* $t_x \preceq_{\alpha(\rho)} s$ *or* $t_y \preceq_{\alpha(\rho)} s$*, then there is a variable* $z \in V$ *such that* $s \cong_{\alpha(\rho)} t_z$*, where* $t_z = \mathsf{T}(\rho', z)$*.*

*A program* $p$ *is coherent if every complete execution* $\pi \in \mathsf{Exec}(p)$ *is coherent.*

The following theorems due to [39] establish the decidability of verifying coherent programs and also of checking if a program is coherent.

**Theorem 4 (**[39]**).** *The verification problem for coherent programs, i.e. checking if a given uninterpreted coherent program is correct, is decidable.*

**Theorem 5 (**[39]**).** *The problem of checking coherence, i.e. checking if a given uninterpreted program is coherent, is decidable.*

The techniques used in [39] are automata theoretic. They allow us to construct an automaton $\mathcal{A}^{\mathbf{exec}}$<sup>2</sup>, of size $O(2^{\text{poly}(|V|)})$, which accepts all coherent executions that are also correct.

To give some intuition for the notion of coherence, we illustrate simple example programs that are not coherent. Consider program $p_0$ below, which is not coherent because it fails to be memoizing.

$$p\_0 \quad \stackrel{\Delta}{=} \quad \mathbf{x} := \mathbf{f}\,\mathbf{(y)};\,\mathbf{x} := \mathbf{f}\,\mathbf{(x)};\,\mathbf{z} := \mathbf{f}\,\mathbf{(y)};$$

The first and third statements compute $f(y)$, storing it in variables x and z, respectively, but the term is *dropped* after the second statement and hence is not contained in any program variable when the third statement executes. Next consider program $p_1$, which is not coherent because it fails to have early assumes.

$$p_1 \quad \overset{\Delta}{=} \quad \mathbf{x} := \mathbf{f}(\mathbf{w});\ \mathbf{x} := \mathbf{f}(\mathbf{x});\ \mathbf{y} := \mathbf{f}(\mathbf{z});\ \mathbf{y} := \mathbf{f}(\mathbf{y});\ \textbf{assume}(\mathbf{w} = \mathbf{z})$$

Indeed, the assume statement is not early because superterms of w and z, namely $f(w)$ and $f(z)$, were computed and subsequently dropped before the assume.
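For executions that contain no assume statements, the set of equality assumptions is empty, so equivalence modulo assumptions is plain syntactic identity and the memoizing condition can be checked directly. The sketch below does this for the first example program (x := f(y); x := f(x); z := f(y)). It covers only this assume-free special case; the function name, the step encoding `(target, function, args)`, and the `"y^"` notation for initial constants are assumptions of the example.

```python
def memoizing_violations(execution, variables):
    """List (step index, term) pairs where a recomputed term is held by no variable.

    Only handles executions made of steps (x, f, args) meaning x := f(args),
    so that term equivalence reduces to syntactic equality.
    """
    T = {x: x + "^" for x in variables}   # term currently held by each variable
    seen = set(T.values())                # every term computed so far
    violations = []
    for i, (x, f, args) in enumerate(execution):
        t = (f,) + tuple(T[z] for z in args)
        # t was computed before, yet no variable holds it just before this step
        if t in seen and t not in T.values():
            violations.append((i, t))
        T[x] = t
        seen.add(t)
    return violations


p0 = [("x", "f", ("y",)), ("x", "f", ("x",)), ("z", "f", ("y",))]
print(memoizing_violations(p0, ["x", "y", "z"]))       # [(2, ('f', 'y^'))]

# Writing the second result to a fresh variable w keeps f(y) alive in x.
fixed = [("x", "f", ("y",)), ("w", "f", ("x",)), ("z", "f", ("y",))]
print(memoizing_violations(fixed, ["w", "x", "y", "z"]))  # []
```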

Intuitively, the coherence conditions are necessary to allow equality information to be tracked with finite memory. We can make this stark by tweaking the example for $p_1$ above as follows.

$$\begin{aligned} p'_1 \quad \stackrel{\Delta}{=} \quad & \mathbf{x} := \mathbf{f}(\mathbf{w});\ \underbrace{\mathbf{x} := \mathbf{f}(\mathbf{x});\ \cdots;\ \mathbf{x} := \mathbf{f}(\mathbf{x})}_{n \text{ times}}; \\ & \mathbf{y} := \mathbf{f}(\mathbf{z});\ \underbrace{\mathbf{y} := \mathbf{f}(\mathbf{y});\ \cdots;\ \mathbf{y} := \mathbf{f}(\mathbf{y})}_{n \text{ times}};\ \textbf{assume}(\mathbf{w} = \mathbf{z}) \end{aligned}$$

<sup>2</sup> We use superscripts ' ' and ' ' for word and tree automata, respectively.

Observe that, for large $n$ (e.g. $n > 100$), many terms are computed and dropped by this program, like $f^{42}(x)$ and $f^{99}(y)$ for instance. The difficulty with this program, from a verification perspective, is that the assume statement entails equalities between many terms which have not been kept track of. Imagine trying to verify the following program

$$p\_2 \quad \triangleq \quad p'\_1; \texttt{assert} \left(\mathbf{x} = \mathbf{y}\right)$$

Let $\pi_{p'_1} \in \mathsf{Exec}(p'_1)$ be the unique complete execution of $p'_1$. If we examine the details, we see that $t_x = \mathsf{T}(\pi_{p'_1}, x) = f^{101}(\widehat{w})$ and $t_y = \mathsf{T}(\pi_{p'_1}, y) = f^{101}(\widehat{z})$. The assertion indeed holds because $t_x \cong_{\{(\widehat{w}, \widehat{z})\}} t_y$. However, to keep track of this fact requires remembering an arbitrary number of terms that grows with the size of the program. Finally, we note that the coherence restriction is met by many single-pass algorithms, e.g. searching and manipulation of lists and trees.
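The equivalence of the two final terms follows by repeated use of congruence. As a worked sketch, write $A = \{(\widehat{w}, \widehat{z})\}$ (the name $A$ is introduced here only for brevity):

$$\begin{aligned} \widehat{w} &\cong_A \widehat{z} && \text{(the assumed equality)} \\ f^{k}(\widehat{w}) \cong_A f^{k}(\widehat{z}) \;&\Longrightarrow\; f^{k+1}(\widehat{w}) \cong_A f^{k+1}(\widehat{z}) && \text{(congruence for } f\text{)} \\ f^{101}(\widehat{w}) &\cong_A f^{101}(\widehat{z}) && \text{(induction over } k = 0, \ldots, 100\text{)} \end{aligned}$$

Every intermediate equality in this chain relates terms that were overwritten before the assume was read, which is why a finite-memory tracker cannot reconstruct it.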

#### **5.2 Overview of the Synthesis Procedure**

Our synthesis procedure uses tree automata. We consider tree representations of programs, or *program trees*. The synthesis problem is thus to check if there is a program tree whose corresponding program is coherent, correct, and belongs to the input grammar G.

The synthesis procedure works as follows. We first construct a top-down tree automaton $\mathcal{A}_G$ that accepts the set of trees corresponding to the programs generated by $G$. We next construct another tree automaton $\mathcal{A}^{\mathbf{cc}}$, which accepts all trees corresponding to programs that are coherent and correct. $\mathcal{A}^{\mathbf{cc}}$ is a two-way alternating tree automaton that simulates all executions of an input program tree and checks that each is both correct and coherent. In order to simulate longer and longer executions arising from constructs like **while**-loops, the automaton traverses the input tree and performs multiple passes over subtrees, visiting the internal nodes of the tree many times. We then translate the two-way alternating tree automaton to an equivalent (one-way) nondeterministic top-down tree automaton by adapting results from [33,53] to our setting. Finally, we check emptiness of the intersection between this top-down automaton and the grammar automaton $\mathcal{A}_G$. The definitions for trees and the relevant automata are standard, and we refer the reader to [14] and to our technical report [30].

#### **5.3 Tree Automaton for Program Trees**

Every program can be represented as a tree whose leaves are labeled with basic statements like "x := y" and whose internal nodes are labeled with constructs like **while** and **seq** (an alias for the sequencing construct '**;**'), which have subprograms as children. Essentially, we represent the set of programs generated by an input grammar $G$ as a regular set of program trees, accepted by a nondeterministic top-down tree automaton $\mathcal{A}_G$. The construction of $\mathcal{A}_G$ mimics the standard construction for tree automata that accept *parse trees* of context-free grammars. The formalization of this intuition is straightforward, and we refer the reader to [30] for details. We note the following fact regarding the construction of the acceptor of program trees from a particular grammar $G$.

**Lemma 1.** $\mathcal{A}_G$ *has size* $O(|G|)$ *and can be constructed in time* $O(|G|)$*.*

#### **5.4 Tree Automaton for Simulating Executions**

We now discuss the construction of the two-way alternating tree automaton $\mathcal{A}^{\mathbf{cc}}$ that underlies our synthesis procedure. A two-way alternating tree automaton consists of a finite set of states and a transition function that maps tuples $(q, m, a)$ of state, incoming direction, and node labels to positive Boolean formulas over pairs $(q', m')$ of next state and next direction. In the case of our binary program trees, incoming directions come from $\{D, U_L, U_R\}$, corresponding to coming down from a parent, and up from left and right children. Next directions come from $\{U, L, R\}$, corresponding to going up to a parent, and down to left and right children.

The automaton $\mathcal{A}^{\mathbf{cc}}$ is designed to accept the set of all program trees that correspond to correct and coherent programs. This is achieved by ensuring that a program tree is accepted precisely when all executions of the program it represents are accepted by the word automaton $\mathcal{A}^{\mathbf{exec}}$ (Sect. 5.1). The basic idea behind $\mathcal{A}^{\mathbf{cc}}$ is as follows. Given a program tree $T$ as input, $\mathcal{A}^{\mathbf{cc}}$ traverses $T$ and explores all the executions of the associated program. For each execution $\sigma$, $\mathcal{A}^{\mathbf{cc}}$ keeps track of the state that the word automaton $\mathcal{A}^{\mathbf{exec}}$ would reach after reading $\sigma$. Intuitively, an accepting run of $\mathcal{A}^{\mathbf{cc}}$ is one which never visits the unique rejecting state of $\mathcal{A}^{\mathbf{exec}}$ during simulation.

We now give the formal description of $\mathcal{A}^{\mathbf{cc}} = (Q^{\mathbf{cc}}, I^{\mathbf{cc}}, \delta_0^{\mathbf{cc}}, \delta_1^{\mathbf{cc}}, \delta_2^{\mathbf{cc}})$, which works over the alphabet $\Gamma_V$ described in Sect. 5.3.

*States.* Both the full set of states and the initial set of states for $\mathcal{A}^{\mathbf{cc}}$ coincide with those of the word automaton $\mathcal{A}^{\mathbf{exec}}$. That is, $Q^{\mathbf{cc}} = Q^{\mathbf{exec}}$ and $I^{\mathbf{cc}} = \{q_0^{\mathbf{exec}}\}$, where $q_0^{\mathbf{exec}}$ is the unique starting state of $\mathcal{A}^{\mathbf{exec}}$.

*Transitions.* For intuition, consider the case when the automaton's control is in state $q$ reading an internal tree node $n$ with one child and which is labeled by $a$ = "**while**(x = y)". In the next step, the automaton simultaneously performs two transitions corresponding to two possibilities: entering the loop after assuming the guard "x = y" to be true and exiting the loop with the guard being false. In the first of these simultaneous transitions, the automaton moves to the left child $n \cdot L$, and its state changes to $q'_1$, where $q'_1 = \delta^{\mathbf{exec}}(q, \text{``}\textbf{assume}(x = y)\text{''})$. In the second simultaneous transition, the automaton moves to the parent node $n \cdot U$ (searching for the next statement to execute, which follows the end of the loop) and changes its state to $q'_2$, where $q'_2 = \delta^{\mathbf{exec}}(q, \text{``}\textbf{assume}(x \neq y)\text{''})$. We encode these two possibilities as a *conjunctive* transition of the two-way alternating automaton. That is, $\delta_1^{\mathbf{cc}}(q, m, a) = \big((q'_1, L) \land (q'_2, U)\big)$.

For every $i, m, a$, we have $\delta_i(q_{\text{reject}}, m, a) = \bot$, where $q_{\text{reject}}$ is the unique, absorbing rejecting state of $\mathcal{A}^{\mathbf{exec}}$. Below we describe the transitions from all other states $q \neq q_{\text{reject}}$. All transitions $\delta_i(q, m, a)$ not described below are $\bot$.

*Transitions from the Root.* At the root node, labeled by "**root**", the automaton transitions as follows:

$$\delta\_1^{\mathbf{cc}}(q, m, \mathbf{root}) = \begin{cases} (q, L) \text{ if } m = D \\ \mathbf{true} \text{ otherwise} \end{cases}$$

A two-way tree automaton starts in the configuration where m is set to D. This means that in the very first step the automaton moves to the child node (direction L). If the automaton visits the root node in a subsequent step (marking the completion of an execution), then all transitions are enabled.

*Transitions from Leaf Nodes.* For a leaf node with label a ∈ Γ<sub>0</sub> and state q, the transition of the automaton is δ<sup>cc</sup><sub>0</sub>(q, D, a) = (δ<sup>exec</sup>(q, a), U). That is, when the automaton visits a leaf node from the parent, it simulates reading a in A<sub>exec</sub> and moves to the resulting state in the parent node.

*Transitions from* "**while**" *Nodes.* As described earlier, when reading a node labeled by "**while**(x ∼ y)", where ∼ ∈ {=, ≠}, the automaton simulates both the possibility of entering the loop body and the possibility of exiting the loop. This corresponds to a conjunctive transition:

$$\begin{aligned} \delta\_1^{\mathbf{cc}}(q, m, \text{``while}(x \sim y)\text{''}) &= (q', L) \land (q'', U) \\ \text{where } q' &= \delta^{\mathbf{exec}}(q, \text{``assume}(x \sim y)\text{''}) \\ \text{and } q'' &= \delta^{\mathbf{exec}}(q, \text{``assume}(x \not\sim y)\text{''}) \end{aligned}$$

Above, ≁ refers to "=" when ∼ is "≠", and vice versa. The first conjunct corresponds to the execution where the program enters the loop body (assuming the guard is true), and thus control moves to the left child of the current node, which corresponds to the loop body. The second conjunct corresponds to the execution where the loop guard is false and the automaton moves to the parent of the current tree node. Notice that, in both the conjuncts above, the direction in which the tree automaton moves does not depend on the last move m. That is, no matter how the program arrives at a **while** statement, the automaton simulates both the possibilities of entering or exiting the loop body.

*Transitions from* "**ite**" *Nodes.* At a node labeled "**ite**(x ∼ y)", when coming down the tree from the parent, the automaton simulates both branches of the conditional:

$$\begin{array}{c} \delta\_2^{\mathbf{cc}}(q, D, \text{``ite}(x \sim y) \text{''}) = (q', L) \land (q'', R) \\ \text{where } q' = \delta^{\mathbf{exec}}(q, \text{``assume}(x \sim y) \text{''}) \\ \text{and } q'' = \delta^{\mathbf{exec}}(q, \text{``assume}(x \not\sim y) \text{''}) \end{array}$$

The first conjunct in the transition corresponds to simulating the word automaton on the condition x ∼ y and moving to the left child, i.e. the body of the **then** branch. Similarly, the second conjunct corresponds to simulating the word automaton on the negation of the condition and moving to the right child, i.e. the body of the **else** branch.

Now consider the case when the automaton moves *up* to an **ite** node from a child node. In this case, the automaton moves up to the parent node (having completed simulation of the **then** or **else** branch) and the state q remains unchanged:

$$\delta\_2^{\mathbf{cc}}(q, m, \text{``ite}(x \sim y)\text{''}) = (q, U) \qquad m \in \{U\_L, U\_R\}$$

*Transitions from* "**seq**" *Nodes.* In this case, the automaton moves either to the left child, the right child, or to the parent, depending on the last move. It does not change the state component. Formally,

$$\delta\_2^{\mathbf{cc}}(q, m, \text{``seq''}) = \begin{cases} (q, L) \text{ if } m = D\\ (q, R) \text{ if } m = U\_L\\ (q, U) \text{ if } m = U\_R \end{cases}$$

The above transitions match the straightforward semantics of sequencing two statements s<sub>1</sub>; s<sub>2</sub>. If the automaton visits from the parent node, it next moves to the left child to simulate s<sub>1</sub>. When it finishes simulating s<sub>1</sub>, it comes up from the left child and enters the right child to begin simulating s<sub>2</sub>. Finally, when simulation of s<sub>2</sub> is complete, the automaton moves to the parent node, exiting the subtree.
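The transition rules above can be collected into a single function. The following Python sketch is illustrative only: the tuple encoding of node labels, the string names for moves, and the representation of a conjunctive transition as a list of (state, direction) pairs are assumptions made for this example, and `delta_exec` is a parameter standing in for the transition function of the word automaton. An empty list encodes **true** (the empty conjunction) and `None` encodes ⊥.

```python
Q_REJECT = "q_reject"  # stand-in name for the absorbing rejecting state

NEGATE = {"=": "!=", "!=": "="}


def delta_cc(q, m, label, delta_exec):
    """Return the conjuncts [(state, direction), ...] of one transition.

    q      -- current state of the simulated word automaton
    m      -- last move: "D" (came down from the parent), "U_L" or "U_R"
              (came up from the left or right child)
    label  -- node label (hypothetical encoding), e.g. ("root",), ("seq",),
              ("while", "x", "=", "y"), ("ite", "x", "!=", "y"),
              ("leaf", "x := y")
    Returns [] for true and None for bottom.
    """
    if q == Q_REJECT:
        return None
    kind = label[0]
    if kind == "root":
        # First visit: descend. Any later visit marks a finished execution.
        return [(q, "L")] if m == "D" else []
    if kind == "leaf":
        return [(delta_exec(q, label[1]), "U")] if m == "D" else None
    if kind == "while" or (kind == "ite" and m == "D"):
        _, x, rel, y = label
        q_true = delta_exec(q, f"assume({x} {rel} {y})")
        q_false = delta_exec(q, f"assume({x} {NEGATE[rel]} {y})")
        # while: exit to the parent; ite: take the else branch.
        other = "U" if kind == "while" else "R"
        return [(q_true, "L"), (q_false, other)]
    if kind == "ite":
        return [(q, "U")] if m in ("U_L", "U_R") else None
    if kind == "seq":
        return {"D": [(q, "L")], "U_L": [(q, "R")], "U_R": [(q, "U")]}.get(m)
    return None


# Toy word automaton for experimentation: a state is the tuple of letters read.
def toy_delta_exec(q, letter):
    return q + (letter,)
```

With the toy automaton, `delta_cc((), "U_L", ("while", "x", "=", "y"), toy_delta_exec)` yields two conjuncts regardless of the last move, one descending into the loop body and one returning to the parent.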

The following lemma asserts the correctness of the automaton construction and states its complexity.

**Lemma 2.** *A<sub>cc</sub> accepts the set of all program trees corresponding to correct, coherent programs. It has size |A<sub>cc</sub>| = O(2<sup>poly(|V|)</sup>), and can be constructed in O(2<sup>poly(|V|)</sup>) time.*

#### **5.5 Synthesis Procedure**

The rest of the synthesis procedure goes as follows. We first construct a nondeterministic top-down tree automaton A<sub>cc-td</sub> such that L(A<sub>cc-td</sub>) = L(A<sub>cc</sub>). An adaptation of results from [33,53] ensures that A<sub>cc-td</sub> has size |A<sub>cc-td</sub>| = O(2<sup>2<sup>poly(|V|)</sup></sup>) and can be constructed in time O(2<sup>2<sup>poly(|V|)</sup></sup>). Next we construct a top-down nondeterministic tree automaton A such that L(A) = L(A<sub>cc-td</sub>) ∩ L(A<sub>G</sub>) = L(A<sub>cc</sub>) ∩ L(A<sub>G</sub>), which has size |A| = O(2<sup>2<sup>poly(|V|)</sup></sup> · |G|) and can be constructed in time O(|A<sub>cc-td</sub>| · |A<sub>G</sub>|) = O(2<sup>2<sup>poly(|V|)</sup></sup> · |G|). Finally, checking emptiness of A can be done in time O(|A|) = O(2<sup>2<sup>poly(|V|)</sup></sup> · |G|). If A is non-empty, a program tree can be constructed. This gives us the central upper bound result of the paper.

**Theorem 6.** *The grammar-restricted synthesis problem for uninterpreted coherent programs is decidable in* 2EXPTIME*, and in particular, in time doubly exponential in the number of variables and linear in the size of the input grammar. Furthermore, a tree automaton representing the set of* all *correct coherent programs that conform to the grammar can be constructed in the same time.*
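The last step of the procedure, checking emptiness and extracting a program tree, can be illustrated with a small Python sketch. It is not the construction from the paper: the transition encoding as (state, label, children) triples is an assumption of this example, and the loop is a plain fixpoint iteration rather than the linear-time worklist algorithm that the stated bound relies on.

```python
def find_witness(transitions, initial):
    """Return a tree accepted from `initial`, or None if the language is empty.

    transitions -- list of (state, label, children) for a nondeterministic
                   top-down tree automaton; `children` is a tuple of states,
                   empty for leaf transitions
    A tree is represented as (label, [subtrees]).
    """
    witness = {}  # state -> some tree accepted from that state
    changed = True
    while changed:
        changed = False
        for state, label, children in transitions:
            if state not in witness and all(c in witness for c in children):
                witness[state] = (label, [witness[c] for c in children])
                changed = True
    return witness.get(initial)


# Hypothetical automaton: q3 can never finish, so "ite" is unusable from q2.
example = [
    ("q0", "seq", ("q1", "q2")),
    ("q1", "x := y", ()),
    ("q2", "ite", ("q3", "q1")),
    ("q3", "while", ("q3",)),
    ("q2", "skip", ()),
]
tree = find_witness(example, "q0")
```

A state is marked productive once some transition from it leads only to productive states, so the witness for `q0` here is the tree for `seq` with leaves `x := y` and `skip`.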

#### **5.6 Matching Lower Bound**

Our synthesis procedure is optimal. We prove a 2EXPTIME lower bound for the synthesis problem by reduction from the 2EXPTIME-hard acceptance problem of *alternating* Turing machines (ATMs) with exponential space bound [12]. Full details of the reduction can be found in [30].

**Theorem 7.** *The grammar-restricted synthesis problem for coherent uninterpreted programs is* 2EXPTIME*-hard.*

#### **6 Further Results**

In this section, we give results for variants of uninterpreted program synthesis in terms of transition systems, Boolean programs, and recursive programs.

#### **6.1 Synthesizing Transition Systems**

Here, rather than synthesizing programs from grammars, we consider instead the synthesis of transition systems whose executions must belong to a regular set. Our main result is that the synthesis problem in this case is EXPTIME-complete, in contrast to grammar-restricted program synthesis which is 2EXPTIME-complete.

**Transition System Definition and Semantics.** Let us fix a set of program variables V as before. We consider the following finite alphabet

$$\Sigma\_V = \{ \text{``}x := y\text{''}, \text{``}x := f(\overline{z})\text{''}, \text{``}\mathbf{assert}(\bot)\text{''}, \text{``}\mathbf{check}(x = y)\text{''} \mid x, y \in V, \overline{z} \in V^r \}$$

Let us define Γ<sub>V</sub> ⊆ Σ<sub>V</sub> to be the set of all elements of the form "**check**(x = y)", where x, y ∈ V. We refer to the elements of Γ<sub>V</sub> as *check* letters.

A (deterministic) transition system TS over V is a tuple (Q, q<sub>0</sub>, H, λ, δ), where Q is a finite set of states, q<sub>0</sub> ∈ Q is the initial state, H ⊆ Q is the set of halting states, λ : Q → Σ<sub>V</sub> is a labeling function such that for any q ∈ Q, if λ(q) = "**assert**(⊥)" then q ∈ H, and δ : (Q \ H) → Q ∪ (Q × Q) is a transition function such that for any q ∈ Q \ H, δ(q) ∈ Q × Q iff λ(q) ∈ Γ<sub>V</sub>.

We define the semantics of a transition system using the set of executions that it generates. A *(partial) execution* π of a transition system TS = (Q, q<sub>0</sub>, H, λ, δ) over variables V is a finite word over the induced execution alphabet Π<sub>V</sub> (from Sect. 3) with the following property. If π = a<sub>0</sub>a<sub>1</sub>...a<sub>n</sub> with n ≥ 0, then there exists a sequence of states q<sub>j<sub>0</sub></sub>, q<sub>j<sub>1</sub></sub>, ..., q<sub>j<sub>n</sub></sub> with q<sub>j<sub>0</sub></sub> = q<sub>0</sub> such that for all 0 ≤ i ≤ n:

$$\begin{aligned} &\text{- If } \lambda(q\_{j\_i}) \notin \Gamma\_V \text{ then } a\_i = \lambda(q\_{j\_i}), \text{ and if } i < n \text{ then } q\_{j\_{i+1}} = \delta(q\_{j\_i}).\\ &\text{- Otherwise } \lambda(q\_{j\_i}) = \text{``}\mathbf{check}(x = y)\text{''} \text{ and } \begin{cases} \text{either } a\_i = \text{``}\mathbf{assume}(x = y)\text{''} \text{ and } i < n \Rightarrow q\_{j\_{i+1}} = \delta(q\_{j\_i})\!\upharpoonright\_1 \\ \text{or } a\_i = \text{``}\mathbf{assume}(x \neq y)\text{''} \text{ and } i < n \Rightarrow q\_{j\_{i+1}} = \delta(q\_{j\_i})\!\upharpoonright\_2 \end{cases} \end{aligned}$$

In the above, we denote pair projection with ↾, i.e., (t<sub>1</sub>, t<sub>2</sub>)↾<sub>i</sub> = t<sub>i</sub>, where i ∈ {1, 2}. A *complete execution* is an execution whose corresponding final state (q<sub>j<sub>n</sub></sub> above) is in H. For any transition system TS, we denote the set of its executions by Exec(TS) and the set of its complete executions by CompExec(TS). The notions of *correctness* and *coherence* for transition systems are identical to their counterparts for programs.
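To make the execution semantics concrete, the following Python sketch enumerates the executions of a small transition system up to a length bound. The dictionary encoding of λ and δ, the representation of check letters as `("check", x, y)` tuples, and the three-state example are assumptions made for illustration.

```python
def executions(label, delta, q0, max_len):
    """List (word, final_state) for every execution with at most max_len letters.

    label -- maps a state to a letter (a string) or to ("check", x, y)
    delta -- maps a non-halting state to its successor, or to a pair of
             successors when the state carries a check letter
    """
    found = []

    def extend(state, prefix):
        lab = label[state]
        if isinstance(lab, tuple):
            # A check state emits one of the two assume letters and branches.
            _, x, y = lab
            options = [(f"assume({x} = {y})", 0), (f"assume({x} != {y})", 1)]
        else:
            options = [(lab, None)]
        for letter, branch in options:
            word = prefix + (letter,)
            found.append((word, state))
            if len(word) < max_len and state in delta:
                succ = delta[state]
                extend(succ if branch is None else succ[branch], word)

    extend(q0, ())
    return found


def complete_executions(label, delta, halting, q0, max_len):
    """Executions whose final state is a halting state."""
    return [w for w, q in executions(label, delta, q0, max_len) if q in halting]


# Hypothetical system: loop applying f to x until x equals y, then copy x to y.
LABEL = {0: ("check", "x", "y"), 1: "x := f(x)", 2: "y := x"}
DELTA = {0: (2, 1), 1: 0}
HALTING = {2}
```

Since every prefix of an execution is again an execution, `executions` records a word at each step, while `complete_executions` keeps only those that end in a state of H.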

**The Transition System Synthesis Problem.** We consider transition system specifications that place restrictions on executions (both partial and complete) using two regular languages S and R. Executions must belong to the first language S (which is prefix-closed) and all complete executions must belong to the second language R. A specification is given as two deterministic automata A<sub>S</sub> and A<sub>R</sub> over executions, where L(A<sub>S</sub>) = S and L(A<sub>R</sub>) = R. For a transition system TS and specification automata A<sub>S</sub> and A<sub>R</sub>, whenever Exec(TS) ⊆ L(A<sub>S</sub>) and CompExec(TS) ⊆ L(A<sub>R</sub>) we say that TS satisfies its (syntactic) specification. Note that this need not entail correctness of TS. Splitting the specification into partial executions S and complete executions R allows us, among other things, to constrain the executions of non-halting transition systems.

**Definition 3 (Transition System Realizability and Synthesis).** *Given a finite set of program variables V and deterministic specification automata A<sub>S</sub> (prefix-closed) and A<sub>R</sub> over the execution alphabet Π<sub>V</sub>, decide if there is a* correct*, coherent transition system TS over V that satisfies the specification. Furthermore, produce one if it exists.*

Since programs are readily translated to transition systems (of similar size), the transition system synthesis problem seems, at first glance, to be a problem that ought to have similar complexity. However, as we show, it is crucially different in that it allows the synthesized transition system to have *complete information* of past commands executed at any point. We will observe in this section that the transition system synthesis problem is EXPTIME-complete.

To see the difference between program and transition system synthesis, consider program skeleton P from Example 2 in Sect. 2. The problem is to fill the hole in P with either y := T or y := F. Observe that when P executes, there are *two* different executions that lead to the hole. In grammar-restricted program synthesis, the hole must be filled by a sub-program that is executed *no matter how the hole is reached*, and hence no such program exists. However, when we model this problem in the setting of transition systems, the synthesizer is able to produce transitions that depend on how the hole is reached. In other words, it does not fill the hole in P with *uniform* code. In this sense, in grammar-restricted program synthesis, programs have *incomplete information* of the past. We crucially exploited this difference in the proof of 2EXPTIME-hardness for grammar-restricted program synthesis (see [30]). No such incomplete information can be enforced by regular execution specifications in transition system synthesis, and indeed the problem turns out to be easier: transition system realizability and synthesis are EXPTIME-complete.

**Theorem 8.** *Transition system realizability is decidable in time exponential in the number of program variables and polynomial in the size of the automata A<sub>S</sub> and A<sub>R</sub>. Furthermore, the problem is* EXPTIME*-complete. When realizable, within the same time bounds we can construct a correct, coherent transition system whose partial and complete executions are in L(A<sub>S</sub>) and L(A<sub>R</sub>), respectively.*

#### **6.2 Synthesizing Boolean Programs**

Here we observe corollaries of our results when applied to the more restricted problem of synthesizing Boolean programs.

In Boolean program synthesis we interpret variables in programs over the Boolean domain {T,F}, and we disallow computations of uninterpreted functions and the checking of uninterpreted relations. Standard Boolean functions like ∧ and ¬ are instead allowed, but note that these can be modeled using conditional statements. We allow for *nondeterminism* with a special assignment "b := \*", which assigns b nondeterministically to T or F. As usual, a program is correct when it satisfies all its assertions.

Synthesis of Boolean programs can be reduced to uninterpreted program synthesis using two special constants T and F. Each nondeterministic assignment is modeled by computing a next function on successive nodes of a linked list, accessing a nondeterministic value by computing key on the current node, and assuming the result is either T or F. Since uninterpreted programs must satisfy assertions in all models, this indeed captures nondeterministic assignment. Further, every term ever computed in such a program is equivalent to T or F (by virtue of the interleaved **assume** statements), making the resulting program coherent. The 2EXPTIME upper bound for Boolean program synthesis now follows from Theorem 6. We further show that, perhaps surprisingly, the 2EXPTIME lower bound from Sect. 5 can be adapted to prove 2EXPTIME-hardness of Boolean program synthesis.

**Theorem 9.** *The grammar-restricted synthesis problem for Boolean programs is* 2EXPTIME*-complete, and can be solved in time doubly-exponential in the number of variables and linear in the size of the input grammar.*
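The encoding of a nondeterministic assignment described above can be sketched as a small translation function. This is only one plausible reading of the reduction: `next` and `key` are the uninterpreted functions named in the text, whereas the cursor variable `cur`, the `skip` statement, and the concrete statement syntax are assumptions of this example.

```python
def encode_nondet_assignment(b, cursor="cur"):
    """Statements modeling `b := *` in an uninterpreted program (a sketch).

    The cursor walks a linked list whose keys are unconstrained, so reading
    the key of a fresh node yields an arbitrary value in every model.
    """
    return [
        f"{cursor} := next({cursor})",  # advance to the next list node
        f"{b} := key({cursor})",        # read an arbitrary value
        # Keep only executions in which the value read is T or F.
        f"if ({b} = T) then skip else assume({b} = F)",
    ]


encoded = encode_nondet_assignment("b")
```

Every term assigned to `b` is then equal to one of the two constants along any feasible execution, which is the property the coherence argument relies on.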

Thus synthesis for coherent uninterpreted programs is no more complex than Boolean program synthesis, establishing decidability and complexity of a problem which has found wide use in practice—for instance, the synthesis tool Sketch solves precisely this problem, as it models integers using a small number of bits and allows grammars to restrict programs with holes.

#### **6.3 Synthesizing Recursive Programs**

We extend the positive result of Sect. 5 to synthesize coherent recursive programs. The setup for the problem is very similar. Given a grammar that identifies a class of recursive programs, the goal is to determine if there is a program in the grammar that is coherent and correct.

The syntax of recursive programs is similar to the non-recursive case, and we refer the reader to [30] for details. In essence, programs are extended with a new function call construct. Proofs are similar in structure to the non-recursive case, with the added challenge of needing to account for recursive function calls and the fact that A<sub>exec</sub> becomes a (visibly) pushdown automaton rather than a standard finite automaton. This gives a 2EXPTIME algorithm for synthesizing recursive programs; a matching lower bound follows from the non-recursive case.

**Theorem 10.** *The grammar-restricted synthesis problem for uninterpreted coherent* recursive *programs is* 2EXPTIME*-complete. The algorithm is doubly exponential in the number of program variables and linear in the size of the input grammar. Furthermore, a tree automaton representing the set of all correct, coherent recursive programs that conform to the grammar can be constructed in the same time.*

#### **7 Related Work**

The automata and game-theoretic approaches to synthesis date back to a problem proposed by Church [13], after which a rich theory emerged [9,18,32,48]. The problems considered in this line of work typically deal with a system reacting to an environment interactively using a finite set of signals over an infinite number of rounds. Tree automata over infinite *trees*, representing strategies, with various infinitary acceptance conditions (Büchi, Rabin, Muller, parity) emerged as a uniform technique to solve such synthesis problems against temporal logic specifications with optimal complexity bounds [31,38,44,45]. In this paper, we use an alternative approach from [35] that works on *finite* program trees, using two-way traversals to simulate iteration. The work in [35], however, uses such representations to solve synthesis problems for programs over a fixed finite set of Boolean variables and against LTL specifications. In this work we use it to synthesize coherent programs that have finitely many variables working over infinite domains endowed with functions and relations.

While decidability results for program synthesis beyond finite data domains are uncommon, we do know of some results of this kind. First, there are decidability results known for synthesis of transducers with registers [29]. Transducers interactively read a stream of inputs and emit a stream of outputs. Finite-state transducers can be endowed with a set of registers for storing inputs and doing only equality/disequality comparisons on future inputs. Synthesis of such transducers for temporal logic specifications is known to be decidable. Note that, although the data domain is infinite, there are no functions or relations on data (other than equality), making it a much more restricted class (and grammar-based approaches for syntactically restricting transducers have not been studied). Indeed, with uninterpreted functions and relations, the synthesis problem is undecidable (Theorem 1), with decidability only for coherent programs. In [11], the authors study the problem of synthesizing uninterpreted terms from a grammar that satisfy a first-order specification. They give various decidability and undecidability results. In contrast, our results are for programs with conditionals and iteration (but restricted to coherent programs) and for specifications using assertions in code.

Another setting with a decidable synthesis result over unbounded domains is work on strategy synthesis for linear arithmetic *satisfiability* games [17]. There it is shown that for a satisfiability game, in which two players (SAT and UNSAT) play to prove a formula is satisfiable (where the formula is interpreted over the theory of linear rational arithmetic), if the SAT player has a winning strategy then a strategy can be synthesized. Though the data domain (rationals) is infinite, the game consists of a finite set of interactions and hence has no need for recursion. The authors also consider reachability games where the number of rounds can be unbounded, but present only sound and incomplete results, as checking who wins in such reachability games is undecidable.

Tree automata techniques for accepting finite parse trees of programs were explored in [37] for synthesizing reactive programs with variables over finite domains. In more recent work, automata on finite trees have been explored for synthesizing data completion scripts from input-output examples [55], for accepting programs that are verifiable using abstract interpretations [54], and for relational program synthesis [56].

The work in [36] explores a decidable logic with ∃<sup>∗</sup>∀<sup>∗</sup> prefixes that can be used to encode synthesis problems with background theories like arithmetic. However, encoding program synthesis in this logic only expresses programs of finite size. Another recent paper [27] explores sound (but incomplete) techniques for showing unrealizability of syntax-guided synthesis problems.

#### **8 Conclusions**

We presented foundational results on synthesizing coherent programs with uninterpreted functions and relations. To the best of our knowledge, this is the first natural decidable program synthesis problem for programs of arbitrary size which have iteration/recursion, and which work over infinite domains.

The field of program synthesis lacks theoretical results, and especially decidability results. We believe our results to be the first of their kind to fill this lacuna, and we find this paper exciting because it bridges the worlds of program synthesis and the rich classical synthesis frameworks of systems over finite domains using tree automata [9,18,32,48]. We believe this link could revitalize both domains with new techniques and applications.

Turning to practical applications of our results, several questions require exploration in future work. First, one might question the utility of programs that verify only with respect to uninterpreted data domains. Recent work [10] has shown that verifying programs using uninterpreted abstractions can be extremely effective in practice for proving programs correct. Also, recent work by Mathur et al. [40] explores ways to add *axioms* (such as commutativity of functions, axioms regarding partial orders, etc.) and yet preserve decidability of verification. The methods used therein are compatible with our technique, and we believe our results can be extended smoothly to their decidable settings. A more elaborate way to bring in complex theories (like arithmetic) would be to marry our technique with the *iterative* automata-based software verification technique pioneered by work behind the Ultimate tool [23–26]; this won't yield decidable synthesis, but still could result in *complete* synthesis procedures.

The second concern for practicality is the coherence restriction. There is recent work by Mathur et al. [41] that shows single-pass heap-manipulating programs respect a (suitably adapted) notion of coherence. Adapting our technique to this setting seems feasible, and this would give an interesting application of our work. Finally, it is important to build an implementation of our procedure in a tool that exploits pragmatic techniques for constructing tree automata, and the techniques pursued in [54–56] hold promise.

#### **References**



## **Must Fault Localization for Program Repair**

Bat-Chen Rothenberg and Orna Grumberg

Technion - Israel Institute of Technology, Haifa, Israel {batg,orna}@cs.technion.ac.il

**Abstract.** This work is concerned with fault localization for automated program repair.

We define a novel concept of a *must* location set. Intuitively, such a set includes at least one program location from every repair for a bug. Thus, it is impossible to fix the bug without changing at least one location from this set. A fault localization technique is considered a *must* algorithm if it returns a must location set for every buggy program and every bug in the program. We show that some traditional fault localization techniques are not must.

We observe that the notion of must fault localization depends on the chosen repair scheme, which identifies the changes that can be applied to program statements as part of a repair. We develop a new algorithm for fault localization and prove that it is *must* with respect to commonly used schemes in automated program repair.

We incorporate the new fault localization technique into an existing mutation-based program repair algorithm. We exploit it in order to prune the search space when a buggy mutated program has been generated. Our experiments show that must fault localization is able to significantly speed-up the repair process, without losing any of the potential repairs.

#### **1 Introduction**

Fault localization and automated program repair have long been combined. Traditionally, given a buggy program, fault localization suggests locations in the program that might be the cause of the bug. Repair then attempts to change those suspicious locations in order to eliminate the bug.

Fault localization that is too restrictive may miss potential repairs, while fault localization that is too permissive causes extra work. Studies have shown that, for test-based repair, imprecise fault localization occurs very often in practice [27]. This identifies the need for fault localization that can narrow down the space of candidates while still promising not to lose potential causes for a bug.

In this work, we define the concept of a *must* location set. Intuitively, such a set includes at least one location from every repair for the bug. Thus, it *must* be used for repair. In other words, **it is impossible to fix the bug using only locations outside this set**. A fault localization technique is considered a *must* algorithm if it returns a must location set for every buggy program and every bug in the program.

This research was partially supported by the Technion Hiroshi Fujiwara cyber security research center and the Israel cyber bureau and partially by the Israel Science Foundation.

© The Author(s) 2020. S. K. Lahiri and C. Wang (Eds.): CAV 2020, LNCS 12225, pp. 658–680, 2020. https://doi.org/10.1007/978-3-030-53291-8\_33

To demonstrate the importance of the *must* notion, consider the program in Fig. 1 for computing the absolute value of a variable x. The program is buggy since the assertion in location 4 is violated when initially x = -1. Intuitively, a good repair would replace the condition (x < -1) in location 2 with condition x <= -1. Our must fault localization, defined formally in the paper, will include location 2 in the must location set. In contrast, the fault localization techniques defined for instance in [14,21] do not include 2 in their location sets: They are not must and may miss optional repairs.

Our first observation regarding must notions is that their definition should take into account the *repair scheme* under consideration. A repair scheme identifies the changes that can be applied to program statements as part of a repair. A scheme can allow, for instance, certain syntactic changes in a condition (e.g. replacing < with >) or in the right-hand-side expression of an assignment (e.g. replacing + by -). A particular location set can be a *must* set using one scheme, but non-*must* using another. We further discuss this observation when presenting our formal definition of a must fault localization.

The setting of our work is as follows. Our approach is formula-based rather than test-based. We handle simple C-programs, with specification given as assertions in the code. Similarly to bounded model checking tools (e.g. [8]), the program and the negated specification are translated to a set of constraints, whose conjunction forms the *program formula*. This formula is satisfiable if and only if the program violates an assertion, in which case a satisfying assignment (also called a *model*) is returned.

We focus on a simple repair scheme of syntactic changes, as described above. We assume that the user prefers repairs that are as close to the original program as possible and will want to get several repair suggestions. Thus, we return *all minimal repairs* (minimal in the number of changes applied to the program code).

Once the notion of must fault localization is defined, we develop a new algorithm for fault localization and prove that it is *must* with respect to syntactic mutation schemes. The input to the algorithm is a program formula ϕ and a model μ for ϕ, representing a buggy execution of the program. Our approach is based on a dynamic-slicing-like algorithm that computes dependencies.

For a variable v in ϕ, its slice F is computed based on dynamic dependencies among variables in ϕ, whose values influence the value of v in μ. Informally, F is a must location set that contains all assignments to the variables that v depends upon. Some assignment from F thus must be changed in order to eliminate the bug associated with μ.

We incorporated the new fault localization technique into an existing mutation-based program repair algorithm [38]. In [38], the repair scheme is based on a predefined set of mutations. Given a buggy program P, the goal of the algorithm is to return all minimal repairs for P. The algorithm goes through iterations of generate-validate, where the generate part produces a mutated program of P and the validate part checks whether it is bounded-correct. The bottleneck of the algorithm is the size of the search space, consisting of all possible mutated programs of P. In [38], the search space has been pruned when the generated mutated program has been successfully validated. No pruning has been applied otherwise.
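The generate-validate loop can be illustrated on the absolute-value program of Fig. 1. The Python sketch below is a toy stand-in, not the algorithm of [38]: the two-location mutation scheme is hypothetical, and validation is done by testing a bounded range of inputs instead of calling a solver on the program formula.

```python
import operator
from itertools import product

# Hypothetical mutation scheme. Index 0 is the original statement.
COND_OPS = [operator.lt, operator.le, operator.gt, operator.ge]  # line 2: x ? -1
THEN_EXPRS = [lambda x: -x, lambda x: x]                         # line 3: abs := ?


def run(candidate, x):
    """Execute the mutated program and return the final value of abs."""
    cond, then = candidate
    result = x
    if COND_OPS[cond](x, -1):
        result = THEN_EXPRS[then](x)
    return result


def is_bounded_correct(candidate, inputs=range(-8, 9)):
    """Validate step: check the assertion abs >= 0 on a bounded input range."""
    return all(run(candidate, x) >= 0 for x in inputs)


def changes(candidate):
    """The set of (location, mutation) pairs that differ from the original."""
    return {(loc, choice) for loc, choice in enumerate(candidate) if choice != 0}


def minimal_repairs():
    space = product(range(len(COND_OPS)), range(len(THEN_EXPRS)))
    repairs = []
    # Generate step: visit candidates in order of increasing number of changes.
    for candidate in sorted(space, key=lambda c: len(changes(c))):
        if any(changes(r) <= changes(candidate) for r in repairs):
            continue  # extends a smaller repair, so it is not minimal
        if is_bounded_correct(candidate):
            repairs.append(candidate)
    return repairs
```

In this toy search space the only minimal repair is the candidate `(1, 0)`, which replaces `<` by `<=` on line 2 and leaves line 3 unchanged. Note that the loop prunes only after a successful validation, which mirrors the limitation described above.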

In this work, we exploit our novel *must* fault localization in order to prune the search space when a buggy mutated program P′ has been generated (i.e. validation failed). In this case, we compute the *must* location set F of P′. We can now prune from the search space any mutated program whose F locations are identical to those of P′. This is because, by the property of *must* location set, it is guaranteed that the bug cannot be repaired without changing a location in F. Thus, a large set of buggy mutated programs is pruned, without the need for additional validation and without losing any minimally repaired program. It should be noted that the smaller F is, the larger the pruned set is. Our experimental results confirm the effectiveness of this pruning by showing significant speedups.
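The pruning rule can be written down in a few lines. In the Python sketch below, a mutated program is represented, as an assumption of this example, by a tuple holding the index of the chosen mutation at each location, and the must location set is given as a set of tuple positions.

```python
from itertools import product


def prune_by_must_set(search_space, buggy_candidate, must_locations):
    """Keep only candidates that differ from the buggy one on a must location."""
    return [
        cand for cand in search_space
        if any(cand[loc] != buggy_candidate[loc] for loc in must_locations)
    ]


# Three locations with three mutations each; location 1 is the must set.
space = list(product(range(3), repeat=3))
remaining = prune_by_must_set(space, (0, 1, 2), {1})
```

Here a single failed validation removes the 9 candidates that agree with the buggy program at location 1 and leaves 18 of the 27 candidates. A smaller must set fixes fewer positions and therefore removes more candidates.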

To summarize, the contributions of this work are:


### **2 Motivating Example**

Figure 1 presents a simple program for computing the absolute value of a variable x. The result is computed in the variable abs, and the specification states, using an assertion on line 4, that in the end abs should always be non-negative. Unfortunately, the program has a bug. The true branch of the if is intended to flip the sign of x whenever x is negative, but it accidentally misses the case where x is −1. As a result, if x is −1, the wrong branch of the if is taken, and the assertion is reached with abs = −1, which causes a violation.

    procedure absValue(x)
    1: abs := x
    2: if x < -1 then
    3:     abs := -x
    4: assert (abs >= 0)

**Fig. 1.** Procedure absValue
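The bug is easy to reproduce. The following Python sketch mirrors the procedure of Fig. 1 and one possible repair; the function names and the input range are chosen for illustration, and the variable is called `result` to avoid shadowing Python's built-in `abs`.

```python
def abs_value_buggy(x):
    result = x        # line 1
    if x < -1:        # line 2: misses the case x == -1
        result = -x   # line 3
    return result     # line 4 asserts result >= 0


def abs_value_repaired(x):
    result = x
    if x <= -1:       # repaired condition on line 2
        result = -x
    return result


def violating_inputs(program, inputs):
    """Inputs for which the assertion result >= 0 fails."""
    return [x for x in inputs if program(x) < 0]
```

On the inputs −5 through 5, `violating_inputs(abs_value_buggy, range(-5, 6))` contains only −1, while the repaired version has no violating input in that range.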

Clearly, it is desirable that line number 2 be returned when running fault localization on this bug, as a human written repair is likely to change the condition on this line from x < −1 to x <= -1 or x<0. But, as we will show next, some of the existing formula-based fault localization techniques do not include this line in their result.

The error trace representing the bug for input I = {x ← −1} is π = ⟨1, 2, 4⟩ (this is the sequence of program locations visited when executing the program on I). The MAX-SAT-based fault localization technique of [21] and the error-invariant-based technique of [14] use a formula called the *extended trace formula* in order to find faulty statements along the error trace. The extended trace formula for the bug in question is

$$\underbrace{(x=-1)}_{\text{Input}} \land \underbrace{(abs=x) \land (x \ge -1)}_{\text{Computation}} \land \underbrace{(abs \ge 0)}_{\text{Assertion}}$$

This formula encodes three things: a) that the input remains I, b) that the computation is as the trace dictates, and c) that the assertion holds at the end. Since the trace ends in an assertion violation, the formula is unsatisfiable. Both [21] and [14] intuitively look for explanations of its unsatisfiability, and therefore decide that the statement (x ≥ −1) on line 2 is irrelevant: the formula remains unsatisfiable even if the constraint (x ≥ −1) is removed.
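The claim about the branch constraint can be checked by brute force. The sketch below is illustrative only: the constraint names, the finite `DOMAIN`, and the helper `satisfiable` are assumptions made here, and enumeration over a finite domain stands in for a real solver. In this example the input and assignment constraints pin both variables, so the small domain does not affect the outcome.

```python
from itertools import product

DOMAIN = range(-4, 5)  # small finite domain, assumed for brute-force search

# Constraints of the extended trace formula as predicates over (x, abs).
CONSTRAINTS = {
    "input":     lambda x, a: x == -1,
    "assign":    lambda x, a: a == x,     # line 1
    "branch":    lambda x, a: x >= -1,    # line 2, else direction taken
    "assertion": lambda x, a: a >= 0,     # line 4
}


def satisfiable(names):
    """True iff some (x, abs) in DOMAIN x DOMAIN satisfies all named constraints."""
    return any(
        all(CONSTRAINTS[n](x, a) for n in names)
        for x, a in product(DOMAIN, DOMAIN)
    )


ALL = list(CONSTRAINTS)
WITHOUT_BRANCH = [n for n in ALL if n != "branch"]
WITHOUT_ASSIGN = [n for n in ALL if n != "assign"]
```

Dropping `"branch"` leaves the formula unsatisfiable, whereas dropping `"assign"` makes it satisfiable. This is why techniques that explain unsatisfiability report line 1 and discard line 2.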

Even the method of [6], which suggests a flow-sensitive encoding of the extended trace formula, with the goal of including all statements affecting control-flow decisions that are relevant to the bug, classifies the statement on line 2 as irrelevant. This is because the error trace does not include any location from the body of the branch that was taken (in our case it is the else branch, which is empty), in which case the flow-sensitive formula remains identical to the traditional formula.

The dynamic slicing method of [2,23] also fails to include line 2 in its result. This method computes the set of statements influencing the evaluation of the assertion along the trace, using data and control dependency relations. A statement st<sub>1</sub> is data dependent on st<sub>2</sub> iff st<sub>1</sub> uses a variable x, and st<sub>2</sub> is the last to assign a value to x along the trace. In our example, the assertion on line 4 is data dependent only on the statement in line 1, which in itself is not data dependent on any other statement. A statement st<sub>1</sub> is control dependent on a conditional statement st<sub>2</sub> iff st<sub>1</sub> is inside the body of either branch of st<sub>2</sub>. None of the statements along our error trace is control dependent on another statement. The slice, which is the set of lines returned, is computed using the transitive closure of these relations. Thus, for our example, only line 1 is part of the slice.
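The dependency computation just described can be sketched as follows. This is a simplified illustration, not the algorithm of [2,23]: the trace encoding (a tuple per visited statement) and the name `dynamic_slice` are assumptions, and each line is assumed to occur at most once in the trace (no loops).

```python
def dynamic_slice(trace, criterion):
    """Lines that `criterion` transitively depends on along `trace`.

    Each trace entry is (line, defined_var or None, used_vars, control_parent or None).
    """
    last_def = {}   # variable -> line that last assigned it
    deps = {}       # line -> lines it directly depends on
    for line, defined, used, parent in trace:
        direct = {last_def[v] for v in used if v in last_def}  # data dependencies
        if parent is not None:
            direct.add(parent)                                 # control dependency
        deps[line] = direct
        if defined is not None:
            last_def[defined] = line

    result, work = set(), [criterion]
    while work:                                                # transitive closure
        for d in deps.get(work.pop(), ()):
            if d not in result:
                result.add(d)
                work.append(d)
    return result


# Error trace <1, 2, 4> of absValue; x is an input and is never assigned.
ABS_TRACE = [
    (1, "abs", {"x"}, None),
    (2, None, {"x"}, None),
    (4, None, {"abs"}, None),
]
```

For `ABS_TRACE` and the assertion on line 4 as the criterion, the result contains only line 1, matching the discussion above.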

In this example, we have seen how many different fault localization techniques fail to include a statement that is relevant, i.e., where a modification could be made for the bug to be fixed. In contrast, the set of locations returned by our technique for this example is {1, 2}. The fact that our technique includes line 2 is not a coincidence: We show that, intuitively, whenever a repair can be made by making changes to a single line, this line *must* be included in the result.

```
proc. foo(x, w)
1: t := 0
2: y := x - 3
3: z := x + 3
4: if (w>3) then
5:     t := z + w
6:     assert (t<x)
7:     y := y + 10
8: assert (y>z)

proc. simFoo(x, w)
t := 0
y := x - 3
z := x + 3
g := w > 3
if (g) then
    t := z + w
    assert (t<x)
    y := y + 10
assert (y>z)

proc. SSAFoo(x, w)
t0 := 0
y0 := x0 - 3
z0 := x0 + 3
g0 := w0 > 3
t1 := z0 + w0
assert (g0 → t1 < x0)
y1 := y0 + 10
t2 := g0 ? t1 : t0
y2 := g0 ? y1 : y0
assert (y2 > z0)

ϕfoo = {
    t0 = 0,
    y0 = x0 − 3,
    z0 = x0 + 3,
    g0 = w0 > 3,
    t1 = z0 + w0,
    y1 = y0 + 10,
    t2 = ite(g0, t1, t0),
    y2 = ite(g0, y1, y0),
    ¬(y2 > z0) ∨ ¬(g0 → t1 < x0)
}
```
**Fig. 2.** Example of the translation process of a simple program

In general, whenever a repair can be made by making changes to a set of lines, at least one of them must be included in the result.

#### **3 Preliminaries**

#### **3.1 Programs and Error Traces**

For our purposes, a *program* is a sequential program composed of standard statements: assignments, conditionals, loops and function calls, all with their standard semantics. Each statement is located at a certain *location* (or *line*) l<sub>i</sub>, and all statements are defined over the set of program variables X.

In addition to the standard statements, a program may also contain *assume* statements of the form assume(bexpr), and *assert* statements of the form assert(bexpr). In both cases bexpr is a boolean expression over X. If an assume or an assert statement is located in l<sub>i</sub>, execution of the program stops whenever location l<sub>i</sub> is reached in a state where bexpr is evaluated to false. In the case of an assertion, this early termination has the special name *assertion violation*, and it is an indication that an error has occurred.

A program P has a *bug on input* I if an assertion violation occurs during the execution of P on I. Otherwise, the program is *correct for* I. <sup>1</sup> Whenever P has a bug on I, this bug is associated with an *error trace*, which is the sequence of statements visited during the execution of P on I.

#### **3.2 From Programs to Program Formulas**

In this section we explain how a program is translated into a set of constraints, whose conjunction constitutes the program formula. In addition to constraints representing assignments and conditionals, such a formula includes constraints representing assumptions and a constraint representing the negated conjunction of all assertions. Thus, a satisfying assignment (a *model*) of the program formula represents an execution of the program that satisfies all assumptions but violates at least one assertion. Such an execution is a *counterexample*.

<sup>1</sup> Alternatively, one could assume to know the desired output of the program for I and define a bug on I as a case where the program outputs the wrong value for I.

The translation, following [8], goes through four stages. We refer to the example in Fig. 2 to demonstrate certain steps.


Note that assertions are also expressed by means of indexed variables. The specific indices in the assertion indicate the location in the execution in which the assertion is checked. In addition, if an assumption or an assertion is located within an if statement with branch condition g, then it is implied by g if it is within the then part of the if, and is implied by ¬g if it is within the else part. In the example, assert (t < x) is encoded by (g<sub>0</sub> → t<sub>1</sub> < x<sub>0</sub>).

4. Conversion to SMT constraints: Once the program is in SSA form, conversion to SMT is straightforward: An assignment x:=e is converted to the constraint x = e; A Φ-assignment x:= b?x1:x2 is converted to the constraint (x = ite(b, x1, x2)), which is an abbreviation of ((b∧x = x1)∨(¬b∧x = x2)); An assume statement assume(bexpr) is converted to the constraint bexpr, and an assert statement assert(bexpr) is converted to the constraint ¬bexpr (since a model of the SMT formula should correspond to an assertion violation).

If the program includes several assertions, then they are converted to one constraint, representing the negation of their conjunction. In the example, the two assertions are converted to the following constraint:

$$\neg(y_2 > z_0) \lor \neg(g_0 \to t_1 < x_0)$$

We say that a constraint *encodes* the statement it came from and we partition constraints into three sets, Sassign, Sphi and Sdemand, based on what they encode. Sassign contains constraints encoding assignments, including those originating from assigning a fresh boolean variable with a branching condition; Sphi contains constraints encoding Φ-assignments; and Sdemand contains constraints encoding demands from assert and assume statements. In particular, Sdemand contains the constraint encoding the negated conjunction of all assertions.

The triple (Sassign, Sphi, Sdemand) is called a *program constraint set*. The program constraint set we get from a program P when using wb as an unwinding bound is denoted CS<sub>P</sub><sup>wb</sup>. The *program formula* ϕ<sub>P</sub><sup>wb</sup> is the conjunction of all constraints in all three sets of CS<sub>P</sub><sup>wb</sup>:

$$\varphi_P^{wb} = (\bigwedge_{s \in S_{assign}} s) \wedge (\bigwedge_{s \in S_{phi}} s) \wedge (\bigwedge_{s \in S_{demand}} s).$$

**Theorem 1 (**[9]**).** *A program* P *is* wb*-violation free iff the formula* ϕ<sub>P</sub><sup>wb</sup> *is unsatisfiable.*
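To illustrate the link between models of the program formula and counterexamples, the sketch below evaluates the constraints of ϕfoo from Fig. 2 for given inputs. It is an assumption-laden illustration: the function names are hypothetical, and direct evaluation over a small input range replaces the SMT solver. Because every constraint in Sassign and Sphi defines one variable, fixing x0 and w0 determines all other values, so it suffices to check the demand constraint.

```python
def ssa_foo_model(x0, w0):
    """Values forced by the S_assign and S_phi constraints of phi_foo."""
    m = {"x0": x0, "w0": w0}
    m["t0"] = 0
    m["y0"] = x0 - 3
    m["z0"] = x0 + 3
    m["g0"] = w0 > 3
    m["t1"] = m["z0"] + w0
    m["y1"] = m["y0"] + 10
    m["t2"] = m["t1"] if m["g0"] else m["t0"]   # t2 = ite(g0, t1, t0)
    m["y2"] = m["y1"] if m["g0"] else m["y0"]   # y2 = ite(g0, y1, y0)
    return m


def demand(m):
    """The S_demand constraint: not(y2 > z0) or not(g0 -> t1 < x0)."""
    g0_implies = (not m["g0"]) or (m["t1"] < m["x0"])
    return (not (m["y2"] > m["z0"])) or (not g0_implies)


def find_counterexample(domain):
    """First model of phi_foo with inputs drawn from `domain`, or None."""
    for x0 in domain:
        for w0 in domain:
            m = ssa_foo_model(x0, w0)
            if demand(m):
                return m
    return None
```

A model returned by `find_counterexample` corresponds to an execution of foo that violates an assertion; for instance, the inputs x0 = 0 and w0 = 0 yield y2 = −3 and z0 = 3, so the demand constraint holds.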

For simplicity of notation, in the rest of the paper we omit the superscript wb.

Since the program formula is the result of translating an SSA program, the formula is defined over indexed variables. Further, each constraint in Sassign corresponds to the single variable, which is assigned in the statement encoded by the constraint.

#### **4 Must Fault Localization**

In this section, we precisely define when a location should be considered relevant for a bug. This definition is motivated by a repair perspective, taking into account which changes can be made to statements in order to repair a bug.

In order to define the changes allowed, we use repair schemes. A *repair scheme* S is a function from statements to sets of statements. An S*-patch* for a program P is a set of pairs of location and statement {(l<sub>1</sub>, st<sup>r</sup><sub>1</sub>), ···, (l<sub>k</sub>, st<sup>r</sup><sub>k</sub>)}, for which the following holds: for all 1 ≤ i ≤ k, if st<sub>i</sub> is the statement in location l<sub>i</sub> in P, then st<sup>r</sup><sub>i</sub> ∈ S(st<sub>i</sub>). The patch is said to be *defined over* the set of locations {l<sub>1</sub>, ···, l<sub>k</sub>}. Applying an S*-patch* τ to a program P means replacing, for every location l<sub>i</sub> in τ, the statement st<sub>i</sub> with st<sup>r</sup><sub>i</sub>. This results in an S*-patched* program of P. The set of all S*-patched* programs created from a program P is the S*-search space* of P.

Let P be a program with a bug on input I, and S be a repair scheme. An S*-repair* for I is an S-patched program that is correct for I. An S*-repairable set* is a set of locations F such that there exists an S-repair defined over F. An S-repairable set is *minimal* if removing any location from it makes it no longer an S-repairable set. A location is S*-relevant* if it is a part of a minimal S-repairable set.<sup>2</sup>

In this paper, we focus on two repair schemes that are frequently used for automated program repair: the arbitrary scheme (Sarb) and the mutation scheme (Smut). Both schemes only manipulate program expressions, but the mutation scheme is more restrictive than the arbitrary scheme: Sarb(st) is the set of all options to replace the expression of st<sup>3</sup> with an arbitrary expression, while Smut(st) only contains statements where the expression in st is mutated according to a set of simple syntactic rules. The rules we consider are replacing a + operator with a - operator, and vice versa, replacing a < operator with a > operator, and vice versa, and increasing or decreasing a numerical constant by 1.<sup>4</sup>

<sup>2</sup> We sometimes omit S from notations where S is clear from context.

*Example 1.* In this example we demonstrate how different repair schemes define different sets of relevant locations. Consider again the foo program from Fig. 2. This program has a bug on input I = {x ← 0, w ← 0}. The error trace associated with the bug is ⟨1, 2, 3, 4, 8⟩ (the assertion on line 8 is violated).

The location set {3, 4} is a minimal Smut-repairable set: It is an Smut-repairable set because applying the Smut-patch {(3, z:=x-3), (4, w<3)} results in an Smut-patched program that is correct for I. This set is also minimal, because none of the Smut-patches defined over {3} or {4} alone is an Smut-repair for I: Each one of the Smut-patches {(3, z:=x-3)}, {(3, z:=x+4)}, {(3, z:=x+2)}, {(4, w<3)}, {(4, w>4)}, {(4, w>2)} results in an assertion violation for I.

On the other hand, {3, 4} is *not* a minimal Sarb-repairable set: For example, the Sarb-patch {(3, z:=-6)} is an Sarb-repair for I. Note that the Sarb-patch only needs to repair the bug, and not the program. That is, it is sufficient that there is no assertion violation on the specific input I, even though an assertion could be violated in the Sarb-patched program on another input.

The set of all minimal Sarb-repairable sets is {{2}, {3}, {4, 5}}. Therefore, the set of Sarb-relevant statements is {2, 3, 4, 5}. The set of all minimal Smut-repairable sets is {{2, 3}, {3, 4}}. Therefore, the set of Smut-relevant statements is {2, 3, 4}.
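The Smut part of this example is small enough to check by exhaustive enumeration. The sketch below is an illustration under stated assumptions: it applies at most one of the three simple mutation rules per location, mutates only assignments and the branch condition (not assertions), and uses hypothetical names (`ALTERNATIVES`, `correct_for`, `minimal_repairable_sets`). It is not the AllRepair implementation.

```python
from itertools import product

# Per location of foo: index 0 is the original expression, the rest are its
# single-step mutations (+ <-> -, < <-> >, numerical constant +/- 1).
ALTERNATIVES = {
    1: [lambda s: 0, lambda s: 1, lambda s: -1],
    2: [lambda s: s["x"] - 3, lambda s: s["x"] + 3,
        lambda s: s["x"] - 4, lambda s: s["x"] - 2],
    3: [lambda s: s["x"] + 3, lambda s: s["x"] - 3,
        lambda s: s["x"] + 4, lambda s: s["x"] + 2],
    4: [lambda s: s["w"] > 3, lambda s: s["w"] < 3,
        lambda s: s["w"] > 4, lambda s: s["w"] > 2],
    5: [lambda s: s["z"] + s["w"], lambda s: s["z"] - s["w"]],
    7: [lambda s: s["y"] + 10, lambda s: s["y"] - 10,
        lambda s: s["y"] + 11, lambda s: s["y"] + 9],
}


def correct_for(choice, x, w):
    """Run the variant of foo given by `choice` (location -> alternative index)."""
    e = {loc: ALTERNATIVES[loc][i] for loc, i in choice.items()}
    s = {"x": x, "w": w}
    s["t"] = e[1](s)
    s["y"] = e[2](s)
    s["z"] = e[3](s)
    if e[4](s):
        s["t"] = e[5](s)
        if not s["t"] < s["x"]:        # assertion on line 6
            return False
        s["y"] = e[7](s)
    return s["y"] > s["z"]             # assertion on line 8


def minimal_repairable_sets(x, w):
    """Minimal sets of mutated locations that make foo correct for (x, w)."""
    locs = sorted(ALTERNATIVES)
    repairable = set()
    for idxs in product(*(range(len(ALTERNATIVES[l])) for l in locs)):
        choice = dict(zip(locs, idxs))
        if correct_for(choice, x, w):
            repairable.add(frozenset(l for l, i in choice.items() if i != 0))
    # Minimal: removing any single location leaves a non-repairable set.
    return {f for f in repairable if all(f - {l} not in repairable for l in f)}
```

For the input x = 0, w = 0, the enumeration covers 1536 variants and yields the two minimal sets {2, 3} and {3, 4} stated above.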

Fault localization should focus the programmer's attention on locations that are relevant for the bug. But returning the exact set of S*-relevant* locations, as defined above, can be computationally hard. In practice, what many fault localization algorithms return is a set of locations that *may* be relevant: The returned locations have a higher chance of being S-relevant than those that are not returned, but there is no guarantee that all returned locations are S-relevant, nor that all S-relevant locations are returned. We call such an algorithm *may fault localization*. In contrast, we define *must fault localization*, as follows:

**Definition 1 (**S**-must location set).** *An* S*-must location set is a set of locations that contains at least one location from each minimal* S*-repairable set.*<sup>5</sup>

<sup>3</sup> If st is an assignment, its expression is its right-hand-side. If st is a conditional statement, its expression is its condition.

<sup>4</sup> This simple definition of the mutation scheme is used only for simplicity of presentation. Our implementation supports a much richer set of mutation rules, as explained in Sect. 7.

<sup>5</sup> This is, in fact, a hitting set of the set of all minimal S-repairable sets.

**Definition 2 (**S**-must fault localization).** *An* S*-must fault localization algorithm is an algorithm that, for every program* P *and every buggy input* I*, returns an* S*-must location set.*

Note that, an S-must location set is not required to contain all S-relevant locations, but only one location from each minimal S-repairable set. Still, this is a powerful notion since it guarantees that no repair is possible without including at least one element from the set.

Also note that the set of all locations visited by P during its execution on I is always an S-must location set. This is because any S-patch where none of these locations is included is definitely **not** an S-repair, since the same assertion will be violated along the same path. However, this set of locations may not be minimal. In the sequel, we aim at finding small S-must location sets.

*Example 2.* Continuing the previous example, the set {2, 3, 4} is an Sarb-must location set, and also an Smut-must location set. In contrast, the set {2, 3} is only an Smut-must location set, but not an Sarb-must location set, since it does not contain any location from the Sarb-minimal repairable set {4, 5}. The set {2} is neither an Sarb-must location set nor an Smut-must location set.
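Since a must location set is a hitting set of the minimal repairable sets, the claims of this example can be checked mechanically. The following sketch uses a hypothetical helper `is_must_location_set` and takes the minimal repairable sets of Example 1 as given.

```python
def is_must_location_set(locations, minimal_repairable_sets):
    """True iff `locations` intersects every minimal repairable set."""
    locations = set(locations)
    return all(locations & set(m) for m in minimal_repairable_sets)


# Minimal repairable sets of foo for I = {x <- 0, w <- 0}, from Example 1.
MINIMAL_ARB = [{2}, {3}, {4, 5}]
MINIMAL_MUT = [{2, 3}, {3, 4}]
```

With these inputs, {2, 3, 4} hits every set in both families, {2, 3} misses {4, 5}, and {2} misses {3} as well as {3, 4}.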

*Example 3.* Consider again the absValue procedure of Fig. 1. The set {2} is an Smut-minimal repairable set and an Sarb-minimal repairable set for the bug in question. Therefore, we can say that all algorithms that were shown in Sect. 2 not to include the location 2 in their result [2,6,14,21,23], are neither Sarb-must nor Smut-must fault localization algorithms.

#### **5 Fault Localization Using Program Formula Slicing**

In this section we formally define the notion of slicing. Based on this, we present an algorithm for computing must fault localization for Sarb and Smut.

#### **5.1 Program Formula Slicing**

A central building block in our fault localization technique is *slicing*. However, we do not define slicing directly in terms of the program, but rather in terms of the program formula representing it. The input to the slicing algorithm is a program formula ϕ, a model μ of it, and a variable v. Recall that ϕ is a conjunction of constraints from Sassign, Sphi and Sdemand (see Sect. 3.2). The goal of the slicing algorithm is to compute the *slice* of the variable v with respect to ϕ and μ. Intuitively, this slice includes the set of all constraints that influence the value v gets in μ.

Similar to traditional slicing, it is easy to define the slice as the reflexive-transitive closure of a dependency relation. But, unlike traditional slicing, which defines dependencies between statements, our dependency relation is between variables of the formula. These variables are indexed. Each originates from a variable of the underlying SSA program, where it was assigned at most once.

**Fig. 3.** Illustration of the static and dynamic dependency relations of the foo procedure

We refer to variables never assigned as *input variables*, and denote the set containing them by InputVars. A variable v that was assigned once is called a *computed variable*, and the (unique) constraint encoding the assignment to it is denoted Assign(v). The set of all computed variables is denoted ComputedVars. We also denote by vars(e) the set of variables that appear in a formula or expression e.

**Definition 3 (Static Dependency).** *The static dependency relation of a program formula* ϕ *is* SDϕ ⊆ vars(ϕ) × vars(ϕ) *s.t.*

$$\begin{aligned} SD_{\varphi} = {} & \{ (v_1, v_2) \mid \exists e \ \text{s.t.}\ (v_1 = e) \in S_{assign},\ v_2 \in vars(e) \} \\ & \cup \{ (v, b), (v, v_1), (v, v_2) \mid (v = ite(b, v_1, v_2)) \in S_{phi} \}. \end{aligned}$$

The left-hand-side of Fig. 3 presents the graph for the static dependency relation of the foo procedure of Fig. 2. The nodes in the graph are (indexed) variables and there is an arrow from v<sub>1</sub> to v<sub>2</sub> iff (v<sub>1</sub>, v<sub>2</sub>) ∈ SDϕ.

**Definition 4 (Dynamic Dependency).** *The dynamic dependency relation of a program formula* ϕ *and a model* μ *of* ϕ *is* DDϕ,μ ⊆ vars(ϕ) × vars(ϕ) *s.t.*

$$\begin{aligned} DD_{\varphi,\mu} = {} & \{(v, v_1) \mid \exists b, v_2 \ \text{s.t.}\ (v = ite(b, v_1, v_2)) \in S_{phi},\ \mu[b] = true\} \\ & \cup \{(v, v_2) \mid \exists b, v_1 \ \text{s.t.}\ (v = ite(b, v_1, v_2)) \in S_{phi},\ \mu[b] = false\} \\ & \cup \{(v, b) \mid \exists v_1, v_2 \ \text{s.t.}\ (v = ite(b, v_1, v_2)) \in S_{phi}\} \\ & \cup \{(v, v_1) \mid \exists e \ \text{s.t.}\ (v = e) \in S_{assign},\ v_1 \in vars(e)\} \end{aligned}$$

Note that dynamic dependency includes only dependencies that coincide with the specific model μ, which determines whether the then or the else direction of the if is executed. Static dependency, on the other hand, takes both options into account. Thus, DDϕ,μ ⊆ SDϕ for every model μ.
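Definitions 3 and 4 can be made concrete on ϕfoo. In the sketch below, the dictionaries `S_ASSIGN` and `S_PHI` are an assumed encoding of the constraints of Fig. 2 (each computed variable mapped to the variables of its defining expression), and the function names are hypothetical.

```python
# S_assign of phi_foo: computed variable -> vars(e) of its defining expression.
S_ASSIGN = {
    "t0": set(), "y0": {"x0"}, "z0": {"x0"},
    "g0": {"w0"}, "t1": {"z0", "w0"}, "y1": {"y0"},
}
# S_phi of phi_foo: variable -> (b, v1, v2) for v = ite(b, v1, v2).
S_PHI = {"t2": ("g0", "t1", "t0"), "y2": ("g0", "y1", "y0")}


def static_deps():
    """SD: every syntactic dependency, both ite branches included."""
    sd = {(v, u) for v, used in S_ASSIGN.items() for u in used}
    for v, (b, v1, v2) in S_PHI.items():
        sd |= {(v, b), (v, v1), (v, v2)}
    return sd


def dynamic_deps(mu):
    """DD: like SD, but only the ite branch selected by the model mu."""
    dd = {(v, u) for v, used in S_ASSIGN.items() for u in used}
    for v, (b, v1, v2) in S_PHI.items():
        dd |= {(v, b), (v, v1 if mu[b] else v2)}
    return dd
```

For a model with g0 false, `dynamic_deps` keeps the pair (y2, y0) and drops (y2, y1), and the result is a subset of `static_deps()`.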

The bold arrows on the right-hand-side of Fig. 3 represent the relation DDϕ,μ of the foo procedure, for any μ where μ[g0] = false.

**Definition 5 (Influencing Variables).** *Given a program formula* ϕ*, a model* μ *of it, and a computed variable* v*, the set of influencing variables of* v *with respect to* ϕ *and* μ *is:*

$$InfluenceVars_{\varphi,\mu}(v) = \{v' \mid (v, v') \in (DD_{\varphi,\mu})^*\}$$

The circled nodes on the right-hand-side of Fig. 3 represent the variables that belong to InfluenceVarsϕ,μ(y2).

**Definition 6 (Program Formula Slice).** *Given a program formula* ϕ*, a model* μ *of it, and a computed variable* v*, the program formula slice of* v *with respect to* ϕ *and* μ *is:*

$$Slice_{\varphi,\mu}(v) = \{Assign(v') \mid v' \in (InfluenceVars_{\varphi,\mu}(v) \cap ComputedVars) \}$$

Thus, intuitively, Sliceϕ,μ(v) includes all constraints (in SSA form) encoding assignments that influence the value of v in μ. More precisely, when considering the conjunction of only the constraints of Sliceϕ,μ(v), as long as the value of all input variables remains the same as in μ, the value of v will remain the same as well. This is formalized in the following theorem, whose proof can be found in the full version [39].

**Theorem 2.** *For every* ϕ,μ *and* v*, the following holds:*

$$\left[\bigwedge_{c \in Slice_{\varphi, \mu}(v)} c \ \wedge \bigwedge_{v_i \in InputVars} (v_i = \mu[v_i])\right] \implies (v = \mu[v])$$

Continuing with our example of the foo procedure, for any μ where μ[g0] = false,

$$Slice_{\varphi,\mu}(y_2) = \{\, y_2 = ite(g_0, y_1, y_0),\ y_0 = x_0 - 3,\ g_0 = w_0 > 3 \,\}.$$

#### **5.2 Computing the Program Formula Slice**

The computation of the program formula slice is composed of two steps. In the first step, we build a graph based on the static dependency relation, SDϕ. In the second step, we compute the slice Sliceϕ,μ(v) by computing the set of nodes reachable from v in this graph, using a customized reachability algorithm, which makes use of the model μ.

The graph built during the first step is called the *Static Dependency Graph (SDG)* of ϕ. Nodes of this graph are variables of ϕ and edges are the static dependencies of SDϕ. Edges are annotated using the function ψ, mapping every static dependency (v, v′) to a boolean formula such that (v, v′) ∈ DDϕ,μ iff μ |= ψ[(v, v′)]. Specifically, for every constraint of the form (v = ite(b, v<sub>1</sub>, v<sub>2</sub>)) in Sphi, the edge (v, v<sub>1</sub>) is annotated with b and the edge (v, v<sub>2</sub>) is annotated with ¬b. All other edges of the graph are annotated with true. See the left-hand-side of Fig. 3. For simplicity, all true annotations are omitted.

The algorithm for the second step is presented in Algorithm 1. This algorithm gets a program formula ϕ, its SDG, a model μ of ϕ, and a variable v, and computes Sliceϕ,μ(v). First, the set InfluenceVarsϕ,μ(v) is computed as the set of nodes reachable from v in the SDG, except that the reachability algorithm traverses an edge (v, v′) only if μ |= ψ[(v, v′)]. Thus, an edge (v, v′) is traversed iff (v, v′) ∈ DDϕ,μ, which means that the set of reachable nodes computed this way is in fact InfluenceVarsϕ,μ(v). Finally, the slice Sliceϕ,μ(v) is the set of constraints encoding assignments to variables in InfluenceVarsϕ,μ(v).
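The two steps can be sketched as follows. This is a minimal illustration and not the paper's Algorithm 1 verbatim: the graph encoding, the representation of ψ as Python callables over the model, and the names `build_sdg` and `formula_slice` are assumptions. Each constraint Assign(v′) in the slice is represented by its computed variable v′.

```python
def build_sdg(s_assign, s_phi):
    """Step 1: map each computed variable to its edges (target, annotation)."""
    always = lambda mu: True
    sdg = {v: [(u, always) for u in sorted(used)] for v, used in s_assign.items()}
    for v, (b, v1, v2) in s_phi.items():
        sdg[v] = [
            (b, always),                        # edge to the branch condition
            (v1, lambda mu, b=b: mu[b]),        # annotated with b
            (v2, lambda mu, b=b: not mu[b]),    # annotated with not b
        ]
    return sdg


def formula_slice(sdg, mu, v):
    """Step 2: reachability from v, traversing an edge only if mu satisfies it."""
    reached, work = {v}, [v]
    while work:
        for target, psi in sdg.get(work.pop(), []):
            if psi(mu) and target not in reached:
                reached.add(target)
                work.append(target)
    # Keep computed variables only; input variables have no Assign constraint.
    return {u for u in reached if u in sdg}


# Assumed encoding of phi_foo from Fig. 2.
FOO_ASSIGN = {
    "t0": set(), "y0": {"x0"}, "z0": {"x0"},
    "g0": {"w0"}, "t1": {"z0", "w0"}, "y1": {"y0"},
}
FOO_PHI = {"t2": ("g0", "t1", "t0"), "y2": ("g0", "y1", "y0")}
FOO_SDG = build_sdg(FOO_ASSIGN, FOO_PHI)
```

Because `FOO_SDG` depends only on the formula, it can be built once and reused with different models, which is the property exploited in Sect. 5.4.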


#### **5.3 The Fault Localization Algorithm**

Our fault localization algorithm is presented in Algorithm 2. The input to this algorithm is a program formula ϕ of a program P, and a model μ of ϕ. The model μ represents a buggy execution of P on an input I, and the algorithm returns a set of locations, F, that is an Smut-must location set.

As before, we assume to know the origin of constraints in ϕ, and use the sets Sassign, Sphi and Sdemand. Furthermore, here we also assume that for every constraint c ∈ Sassign, we know exactly which program statement it came from. We call this statement the *origin* of c, and denote it by Origin(c).

As a first step, the algorithm computes a set of variables V by calling the procedure ImportantVars. This procedure receives an SMT formula ϕ and a model μ of ϕ, and reduces μ to a partial model of ϕ. A *partial model* of ϕ w.r.t. μ is a partial mapping from variables of the formula to values, which is consistent with μ and is sufficient to satisfy the formula. For example, for the formula ϕ = (a = 0 ∨ b = 0) and the model μ = {a → 0, b → 1}, the valuation {a → 0} is a partial model of ϕ. Procedure ImportantVars will return the set of variables that appear in the partial model ({a} in our example). Details of this procedure are presented in the full version [39].
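The notion of a partial model can be illustrated with a simple greedy reduction. The sketch below is not the ImportantVars procedure of the full version [39], whose details are not given here: the greedy strategy, the finite `domain` used to test sufficiency, and the name `important_vars` are all assumptions made for illustration.

```python
from itertools import product


def important_vars(formula, mu, domain):
    """Greedily shrink the model `mu` to a partial model; return its variables.

    `formula` is a predicate over a full assignment (a dict). A partial
    assignment is deemed sufficient if every completion over `domain`
    satisfies the formula (an assumed, brute-force reading of sufficiency).
    """
    def suffices(partial):
        free = [v for v in mu if v not in partial]
        for values in product(domain, repeat=len(free)):
            if not formula({**partial, **dict(zip(free, values))}):
                return False
        return True

    partial = dict(mu)
    for v in list(mu):
        trial = {k: val for k, val in partial.items() if k != v}
        if suffices(trial):          # v is not needed to satisfy the formula
            partial = trial
    return set(partial)


def example_formula(m):
    """The formula (a = 0 or b = 0) from the text."""
    return m["a"] == 0 or m["b"] == 0
```

On the formula and model of the example above, the reduction keeps a and drops b. The result of a greedy reduction can depend on the order in which variables are tried.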

The formula passed to ImportantVars in our case is the conjunction of all demands in Sdemand. Recall that the set Sdemand contains constraints encoding all conditions that need to be met for an assertion violation to happen: Conditions from assumptions appear as is, while conditions from assertions are negated and combined in a disjunction (see Fig. 2, where the last constraint on the right-hand-side represents the disjunction of the negated assertions). Therefore, the set of variables V, returned by ImportantVars, is such that as long as their values in μ remain the same, this conjunction will still be satisfied, which means that an assertion violation will still occur.

To determine what must change for their values *not* to remain the same, we use slicing: The algorithm proceeds by computing the program formula slice for each of the variables in V using Algorithm 1. All slices are united into the combined set S. By Theorem 2, if all constraints in S remain the same (and the input is unchanged), then *all* the variables in V maintain their value. Thus, at least one constraint from S must be changed by any repair.

Note that, by first applying ImportantVars, we reduce the number of variables whose value should be preserved in order to maintain the bug. The smaller this number, the smaller F is. We will explain the usefulness of a small F in Sect. 6.

Finally, we need to translate the constraints in S back to statements of P. Because of how the slicing algorithm works, constraints in S may belong to either Sassign or Sphi. If they belong to Sphi, we ignore them, because they encode the control-flow structure of the program, rather than a particular statement. Otherwise, we add the origin of the constraint, which is a statement of the program, to the set of returned locations, F. Note that, several different constraints may have the same origin, for example due to loop unwinding. In such a case, it is sufficient for one constraint encoding the statement st to be included in S, for st to be included in F. A proof for the following theorem can be found in the full version [39].
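The overall flow (slice each important variable, unite the slices, map Sassign constraints back to their origins) can be sketched compactly. This is an illustration rather than the paper's Algorithm 2: the function `fosfl`, the dictionary encodings, and the SSA form assumed for absValue are hypothetical, and the set of important variables is passed in rather than computed.

```python
def fosfl(assign, phi, origin, important, mu):
    """Return locations of statements whose constraints lie in some slice.

    assign:    computed variable -> vars(e) of its S_assign constraint
    phi:       variable -> (b, v1, v2) for its S_phi constraint
    origin:    S_assign variable -> program line of the encoded statement
    important: variables V assumed to be returned by ImportantVars
    mu:        model values of the branch conditions
    """
    def successors(v):
        if v in phi:
            b, v1, v2 = phi[v]
            return {b, v1 if mu[b] else v2}     # dynamic dependencies only
        return assign.get(v, set())

    locations = set()
    for start in important:
        reached, work = {start}, [start]
        while work:
            for u in successors(work.pop()):
                if u not in reached:
                    reached.add(u)
                    work.append(u)
        # S_phi constraints and input variables have no origin and are skipped.
        locations |= {origin[u] for u in reached if u in origin}
    return locations


# foo (Fig. 2) with model x0 = 0, w0 = 0; V = {y2, z0} suffices for not(y2 > z0).
FOO = dict(
    assign={"t0": set(), "y0": {"x0"}, "z0": {"x0"},
            "g0": {"w0"}, "t1": {"z0", "w0"}, "y1": {"y0"}},
    phi={"t2": ("g0", "t1", "t0"), "y2": ("g0", "y1", "y0")},
    origin={"t0": 1, "y0": 2, "z0": 3, "g0": 4, "t1": 5, "y1": 7},
)

# absValue, with an SSA form assumed here: abs0 := x0; g0 := x0 < -1;
# abs1 := -x0; abs2 := g0 ? abs1 : abs0; assert (abs2 >= 0).
ABS = dict(
    assign={"abs0": {"x0"}, "g0": {"x0"}, "abs1": {"x0"}},
    phi={"abs2": ("g0", "abs1", "abs0")},
    origin={"abs0": 1, "g0": 2, "abs1": 3},
)
```

Under these assumptions, the sketch returns {2, 3, 4} for foo, which Example 2 identifies as a must location set for both schemes, and {1, 2} for absValue, the set reported in Sect. 2.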

**Theorem 3.** *Algorithm FOSFL is an* Sarb*-must and also an* Smut*-must fault localization algorithm.*

#### **5.4 Incremental Fault Localization**

It is often necessary to apply fault localization to several bugs in the same program, or even to several programs with different bugs. Therefore, it is desired that the fault localization algorithm be *incremental*, which means that the computation effort of each fault localization attempt should be proportional to the changes made from the previous attempt. In other words, we should avoid recomputation whenever possible, taking advantage of the fact that the program remains the same, or at least remains similar.

Algorithm FOSFL can be easily made incremental for the case of different bugs of the same program. In this case, several successive calls are made to the algorithm using the same program formula ϕ, but with different models of it. Since the static dependency relation SDϕ depends solely on the program formula, and not on the model, we can avoid re-computing the SDG for each call. Instead, we can compute the SDG once, upfront, and whenever FOSFL is called, simply skip the first line. We call the incremental version of FOSFL Incremental-Formula-Slicing-Fault-Localization (I-FOSFL).

Note that I-FOSFL is useful not only for fault localization of different bugs of the same program, but also whenever the SDG remains the same during successive fault localization calls. This is the case when considering different mutated programs P′ of the same program P, since every change to P replaces an expression e with an expression e′ over the same variables. Thus, the SDG remains the same, since the static dependency relation, in fact, only depends on vars(e), and not on e itself<sup>6</sup>.

#### **6 Program Repair with Iterative Fault Localization**

In [38], a mutation-based algorithm for program repair, named AllRepair, was presented. This algorithm uses the mutation scheme in order to repair programs with respect to assertions in the code. Unlike fault localization, where the motivation is repairing a bug for a specific input, program repair aims at repairing the program for *all* inputs. To avoid confusion, we refer to a repair for all inputs as a *full repair*. In [38], the notion of a *full repair* is bounded: loops are unwound wb times, and a program is considered *fully repaired* if no assertion is violated along executions with at most wb unwindings. A program that is not fully repaired is said to be *buggy*. For the rest of this section, we refer to an Smut-patch as a patch, and to an Smut-patched program as a mutated program.

As its name implies, the goal of AllRepair is to obtain all *minimal* fully repaired mutated programs, where minimality refers to the patch used in the program. It goes through an iterative generate-validate process. The generate phase chooses a mutated program from the search space, and the validate phase checks whether this program is fully repaired, by solving its program formula. The mutated program is fully repaired iff the formula is unsatisfiable.

The generate-validate process is realized using an interplay between a SAT solver and an SMT solver. The SAT solver is used for the generate stage. For every mutation M and line l, there is a boolean variable B<sub>M</sub>(l), which is true if and only if mutation M is applied to line l. A boolean formula is constructed and sent to the SAT solver, where each satisfying assignment corresponds to a program in the search space. The SMT solver is used for the validate stage. The program formula of the mutated program is solved to check if it is buggy or not. To achieve minimality, when a mutated program created using a patch τ is fully repaired, every mutated program created using a patch τ′, with τ ⊆ τ′, is blocked.

<sup>6</sup> This is true for Smut but not for Sarb, since the latter allows replacing an expression e with an expression e′ over different variables.

**Fig. 4.** Algorithm fl-AllRepair: Mutation-based program repair with iterative fault localization. The notation P<sup>M</sup> ≡<sub>F</sub> P′ means that P<sup>M</sup> and P′ agree on the content of all locations in F. The ordering between P<sup>M</sup> and P′ used in the figure means that the patch used for creating P′ is a superset of the patch used for creating P<sup>M</sup>.

*Example 4.* Let P<sup>M</sup> be a fully repaired mutated program obtained by applying the patch τ, consisting of mutating line l<sub>1</sub> using mutation M<sub>1</sub> and mutating line l<sub>2</sub> using mutation M<sub>2</sub>. Then blocking any superset of τ will be done by adding, to the boolean formula representing the search space, the blocking clause ¬(B<sub>M<sub>1</sub></sub>(l<sub>1</sub>) ∧ B<sub>M<sub>2</sub></sub>(l<sub>2</sub>)), which means "either do not apply M<sub>1</sub> to l<sub>1</sub> or do not apply M<sub>2</sub> to l<sub>2</sub>". This clause blocks any mutated program created using a patch τ′ with τ ⊆ τ′.
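The effect of this clause can be sketched without a SAT solver by treating a clause as a predicate over patches. The encoding below is an assumption made for illustration: a patch is a dictionary from line to mutation name, and `block_supersets` is a hypothetical helper, not part of AllRepair.

```python
def block_supersets(repair_patch):
    """Clause that is False exactly on patches extending `repair_patch`."""
    repair = frozenset(repair_patch.items())
    return lambda patch: not repair <= frozenset(patch.items())


# The patch tau of Example 4: M1 applied to line 1, M2 applied to line 2.
TAU = {1: "M1", 2: "M2"}
CLAUSE_4 = block_supersets(TAU)
```

A patch that also mutates a third line is rejected by `CLAUSE_4`, while a patch that applies only M1, or applies a different mutation to line 2, still satisfies it.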

Blocking such programs prunes the search space, but only in a limited way. No pruning occurs when the mutated program is buggy.

In this paper, we extend the algorithm of [38] with a fault localization component. The goal of the new component is to prune the search space by identifying sets of mutated programs that are buggy, without inspecting each of the individual programs in the set.

Figure 4 shows the program repair algorithm with the addition of fault localization. In the new algorithm, called fl-AllRepair, whenever a mutated program is found to be buggy during the validation step, its program formula is passed to the fault localization component along with the model obtained when solving the formula. The fault localization component returns a set of locations F, following the I-FOSFL algorithm. Since this set is guaranteed to be an S<sub>mut</sub>-must location set, at least one of the locations in it should be changed for the bug to be fixed. Consequently, all mutated programs in which all locations from F remain unchanged are blocked from being explored in the future. As before, blocking is done by adding a blocking clause that disallows such programs.

*Example 5.* Let P<sup>M</sup> be a buggy mutated program for which F consists of {l<sub>1</sub>, l<sub>2</sub>, l<sub>3</sub>}, where l<sub>1</sub> was mutated with M<sub>1</sub>, l<sub>2</sub> was not mutated, and l<sub>3</sub> was mutated with M<sub>3</sub>. The blocking clause ¬B<sub>M<sub>1</sub></sub>(l<sub>1</sub>) ∨ ¬B<sub>Original</sub>(l<sub>2</sub>) ∨ ¬B<sub>M<sub>3</sub></sub>(l<sub>3</sub>) will be added to the boolean formula representing the search space of mutated programs. It restricts the search space to those mutated programs that either do not apply mutation M<sub>1</sub> to l<sub>1</sub>, or do mutate l<sub>2</sub>, or do not apply M<sub>3</sub> to l<sub>3</sub>. This will prune from the search space all mutated programs which are identical to P<sup>M</sup> on the locations in F. Note that a smaller F will result in a larger set of pruned programs.
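The pruning effect of Example 5 can be sketched in Python. The sketch assumes a toy search space of four lines with one alternative mutation per line. The line l4 and the mutations M2 and M4 are hypothetical, and the mutation size limit is ignored for simplicity.

```python
from itertools import product

def fl_blocking_clause(program, fault_set):
    """One negated literal per location in F: (choice made by the buggy program, line)."""
    return [(program[line], line) for line in sorted(fault_set)]

def is_blocked(candidate, clause):
    """A candidate violates the clause iff it agrees with the buggy program on all of F."""
    return all(candidate[line] == choice for choice, line in clause)

# Buggy mutated program of Example 5, extended with a hypothetical line l4.
buggy = {"l1": "M1", "l2": "Original", "l3": "M3", "l4": "Original"}

# Assumed toy search space: each line is either unchanged or mutated in one way.
choices = {
    "l1": ["Original", "M1"],
    "l2": ["Original", "M2"],
    "l3": ["Original", "M3"],
    "l4": ["Original", "M4"],
}
lines = sorted(choices)
space = [dict(zip(lines, combo))
         for combo in product(*(choices[line] for line in lines))]

pruned = [p for p in space
          if is_blocked(p, fl_blocking_clause(buggy, {"l1", "l2", "l3"}))]
pruned_small = [p for p in space
                if is_blocked(p, fl_blocking_clause(buggy, {"l1"}))]

print(len(space), len(pruned), len(pruned_small))
```

With F = {l1, l2, l3}, only the two programs that differ from the buggy one on l4 alone (or not at all) are pruned. Shrinking F to {l1} prunes eight of the sixteen programs, which matches the remark that a smaller F prunes more.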

**Proposition 1.** *Algorithm* fl-AllRepair *is sound and complete.*

#### **7 Experimental Results**

We have implemented our fault localization technique and its integration with mutation-based program repair in the tool AllRepair, available at https://github.com/batchenRothenberg/AllRepair. In this section, we present experiments evaluating the contribution of the new fault localization component to the program repair algorithm. We refer to the algorithm of [38], without fault localization, as AllRepair, and to the algorithm presented in this paper as FL-AllRepair. Both algorithms search for minimal wb-violation free programs, and both are sound and complete. Thus, for every buggy program and every bound wb, both algorithms will eventually produce the same list of repairs.

The difference between the algorithms lies in the repair loop. In case a mutated program is found to be buggy, the AllRepair algorithm will only block the one program, while the FL-AllRepair algorithm might block a set of programs. Therefore, the number of repair iterations required to cover the search space can only decrease using the FL-AllRepair algorithm. On the other hand, the cost of each iteration with fault localization is strictly higher than without it. Our goal in this evaluation is to check if the use of fault localization pays off. That is, to check if repairs are produced faster using FL-AllRepair than using AllRepair.
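The effect of blocking on the number of iterations can be illustrated with a toy version of the repair loop. This is a sketch under strong assumptions, not the AllRepair implementation: the search space is enumerated explicitly instead of being encoded for a SAT solver, `is_correct` is a hypothetical oracle standing in for SMT-based validation, and `localize` is a hypothetical stand-in for the fault localization component that is assumed to return a must location set.

```python
from itertools import product


def repair_loop(choices, is_correct, localize=None):
    """Toy generate-validate loop; returns (repairs, number of validated programs).

    choices:    line -> list of options; the first option is the original text.
    is_correct: hypothetical oracle standing in for the validation step.
    localize:   hypothetical fault localization (buggy program -> set of lines);
                None models the loop without fault localization.
    """
    lines = sorted(choices)
    original = {line: choices[line][0] for line in lines}

    def patch(program):
        return {line for line in lines if program[line] != original[line]}

    candidates = [dict(zip(lines, combo))
                  for combo in product(*(choices[line] for line in lines))]
    # Visit smaller patches first so that reported repairs are minimal.
    candidates.sort(key=lambda program: len(patch(program)))

    repairs, blocked, validated = [], [], 0
    for program in candidates:
        # A constraint blocks every program that agrees with it on all its lines.
        if any(all(program[line] == value for line, value in constraint.items())
               for constraint in blocked):
            continue
        validated += 1
        if is_correct(program):
            repairs.append(program)
            # Block every program whose patch is a superset of this patch.
            blocked.append({line: program[line] for line in patch(program)})
        elif localize is not None:
            # Block every program that agrees with this buggy one on all of F.
            blocked.append({line: program[line] for line in localize(program)})
    return repairs, validated


# Hypothetical three-line program whose bug is the operator on l1.
choices = {
    "l1": ["x + 1", "x - 1"],
    "l2": ["y / 2", "y % 2"],
    "l3": ["z + 3", "z - 3"],
}

def is_correct(program):
    return program["l1"] == "x - 1"

def must_set(program):
    return {"l1"}

repairs_plain, validated_plain = repair_loop(choices, is_correct)
repairs_fl, validated_fl = repair_loop(choices, is_correct, must_set)
print(validated_plain, validated_fl)
```

Both runs report the same single repair, but the run with the localization stand-in validates fewer programs. The sketch does not model the extra cost that each fault localization call adds to an iteration.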

*Benchmarks.* For our evaluation, we have used programs from two benchmarks: TCAS and Codeflaws. The TCAS benchmark is part of the Siemens suite [12], and is frequently used for program repair evaluation [5,34,38]. The TCAS program implements a traffic collision avoidance system for aircraft, and consists of approximately 180 lines of code. We have used all 41 faulty versions of the benchmark in our experiments.

The Codeflaws benchmark [41] is also a well-known and widely used benchmark for program repair. Programs in this benchmark are taken from buggy user submissions to the programming contest site Codeforces<sup>7</sup>. In each program, a user tries to solve a programming problem published as part of a contest on the site. The programming problems are varied, and the users have diverse levels of expertise. The benchmark also provides correct versions for all buggy versions, which are used to classify bug types by computing the syntactic difference. For our experiments we randomly chose 13 buggy versions classified with bug types that can be fixed using mutations. The size of the chosen programs ranges from 17 to 44 lines of code.

<sup>7</sup> http://codeforces.com/.

*Mutations.* The mutations used in AllRepair (and accordingly in FL-AllRepair) are a subset of the mutations used in [37]. We define two *mutation levels*, where level 1 contains only a subset of the mutations available in level 2. Thus, level 1 involves easier computation but may fail more often in finding repairs.

Table 1 shows the list of mutations used in each mutation level. For example, for the category of arithmetic operator replacement, in mutation level 1, the table specifies two sets: {+, −} and {/, %}. This means that a + can be replaced by a −, and vice versa, and that the operators / and % can be replaced with each other. Constant manipulation mutations apply to a numeric constant and include increasing its value by 1 (C → C + 1), decreasing it by 1 (C → C − 1), setting it to 0 (C → 0) and changing its sign (C → −C).

**Table 1.** Partition of mutations to levels
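The operator and constant mutations described in the text can be sketched as follows. The function names are hypothetical, only the level 1 arithmetic sets quoted in the text are encoded, and dropping mutants that equal the original constant or repeat an earlier mutant is an assumption of this sketch, not something the paper states.

```python
# Level 1 arithmetic operator replacement sets, as quoted in the text.
LEVEL1_ARITHMETIC_SETS = [{"+", "-"}, {"/", "%"}]

def operator_replacements(op, sets=LEVEL1_ARITHMETIC_SETS):
    """Operators that may replace op: the other members of the set containing it."""
    for group in sets:
        if op in group:
            return sorted(group - {op})
    return []

def constant_mutations(c):
    """Values produced by C -> C + 1, C -> C - 1, C -> 0 and C -> -C.

    Assumption of this sketch: no-op mutants and duplicates are dropped.
    """
    result = []
    for value in (c + 1, c - 1, 0, -c):
        if value != c and value not in result:
            result.append(value)
    return result

print(operator_replacements("+"))   # ['-']
print(operator_replacements("*"))   # []
print(constant_mutations(5))        # [6, 4, 0, -5]
print(constant_mutations(0))        # [1, -1]
```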

*Setting.* All of our experiments were run on a Linux 64-bit Ubuntu 16.04 virtual machine with 1 CPU, 4 GB of RAM and 40 GB of storage, provided using the VMware vRA service<sup>8</sup>. For each of the buggy versions in our benchmarks we have experimented with both mutation levels 1 and 2. For the Codeflaws benchmarks we additionally experimented with different unwinding bounds: 2 (entering the loop once), 5, 8 and 10. This experiment is irrelevant to the TCAS benchmarks since the TCAS program does not contain loops or recursive calls. Overall we had 186 combinations of buggy programs, mutation levels and unwinding bounds. We refer to each such combination as an *input*. For each input, we ran both the AllRepair and the FL-AllRepair algorithms with a timeout of 10 minutes and a mutation size limit of 2 (i.e., at most two mutations could be applied at once).
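The count of 186 inputs is consistent with the benchmark sizes given earlier, assuming that each TCAS version contributes one input per mutation level and each Codeflaws version contributes one input per pair of mutation level and unwinding bound. The variable names below are illustrative only.

```python
tcas_versions = 41
codeflaws_versions = 13
mutation_levels = [1, 2]
unwinding_bounds = [2, 5, 8, 10]

# TCAS has no loops, so only the mutation level varies.
tcas_inputs = tcas_versions * len(mutation_levels)
# Codeflaws varies both the mutation level and the unwinding bound.
codeflaws_inputs = codeflaws_versions * len(mutation_levels) * len(unwinding_bounds)

total_inputs = tcas_inputs + codeflaws_inputs
print(tcas_inputs, codeflaws_inputs, total_inputs)  # 82 104 186
```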

#### **7.1 Results**

In total, 131 different repairs were found during our experiments, for 60 different inputs (for several inputs there was more than one possible repair). In this count, we treat repairs fixing the same program in the same way as different, if they were produced using different mutation levels or unwinding bounds. This is because our evaluation is concerned with the time to find these repairs, and both the mutation level and the unwinding bound greatly influence this time.

Because the time to produce a repair sometimes varied by several orders of magnitude depending on the input, we have chosen to split repairs into three categories: fast, intermediate, and slow, and examine the time difference separately for each category. Splitting repairs into categories was done according to the time it took to find them using the AllRepair algorithm. If that time was under 5 seconds, the repair was considered fast. If it was over 4 minutes, it was considered slow, and otherwise it was considered intermediate.

<sup>8</sup> https://www.vmware.com/il/products/vrealize-automation.html.

**Fig. 5.** Time to find each repair using AllRepair (AR) and FL-AllRepair (FLAR). Each x value represents a single repair, and the corresponding y values represent the time, in seconds, it took to find that repair using both algorithms. Note that the graphs differ in the y axis scale.
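The split into the three time categories can be written as a small function. The function name is hypothetical. The thresholds are the ones stated in the text, applied to the time taken by AllRepair, with boundary values falling into the intermediate category.

```python
def categorize(allrepair_seconds):
    """Classify a repair by the time AllRepair needed to find it."""
    if allrepair_seconds < 5:
        return "fast"
    if allrepair_seconds > 4 * 60:
        return "slow"
    return "intermediate"

print(categorize(0.8), categorize(30), categorize(400))
```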

Figure 5 shows a comparison of the time, in seconds, it took to find repairs in both algorithms. There are three graphs, according to our three categories. In all graphs, each x value represents a single repair, where the corresponding blue dot in the y axis represents the time it took to find that repair using AllRepair, and the red square represents the time using FL-AllRepair. So, whenever the blue dot is above the red square, FL-AllRepair was faster in finding that repair, and the y difference represents the time saved.

For the fast category (Fig. 5a), there is no clear advantage to FL-AllRepair. The majority of the repairs in this category are produced in less than a second using both algorithms. For the remaining repairs, there appear to be as many cases where FL-AllRepair is faster as cases where it is slower. But, in all cases where there is a time difference, in either direction, it is only a few seconds.

For the intermediate category (Fig. 5b), the advantage of FL-AllRepair starts to become clear. There are now only 4 repairs (out of 20) for which FL-AllRepair is slower. Also, when FL-AllRepair is slower, it is slower by 4 seconds on average, whereas when it is faster, it is faster by 10 seconds on average. Finally, for the slow category (Fig. 5c), there is an obvious advantage to FL-AllRepair. First, it is able to find 6 repairs *exclusively*, while AllRepair reaches a time-out. Also, for the remaining 27 repairs, FL-AllRepair is faster in all cases but one. The time difference is now also very significant: FL-AllRepair is faster by 1512 seconds (around 25 minutes) on average.

To sum up, the results show that in many cases our algorithm FL-AllRepair is able to save time in finding repairs. The savings are especially significant in cases where it takes a long time to produce the repair using the original AllRepair algorithm, and these are the cases where time savings are most needed.

#### **7.2 Comparison with Other Repair Methods**

The TCAS benchmark was also recently used in [34], where AllRepair's performance was compared to that of four other automated repair tools: Angelix [29], GenProg [26], FoREnSiC [5] and Maple [34]. AllRepair was found to be faster by an order of magnitude than all of the compared tools, taking only 16.9 seconds to find a repair on average, where the other tools take 1540.7, 325.4, 360.1, and 155.3 seconds, respectively. Since in our experiments on TCAS fl-AllRepair was faster than AllRepair on average (and even when it was slower it was only by a few seconds), we conclude that fl-AllRepair also compares favorably to these other tools.

In terms of repairability, the repair scheme used by AllRepair (and fl-AllRepair) is limited compared to the other tools: AllRepair only uses mutations on expressions, while Angelix, FoREnSiC and Maple allow replacing an expression with a template (e.g., a linear combination of variables), which is then filled out to create a repair. GenProg allows modifying a statement as well as deleting it or adding a statement after it. Therefore, the other tools are inherently capable of producing repairs in more cases than AllRepair.

In the case of TCAS, the study showed that AllRepair is able to find repairs for 18 versions (a result that we confirm in our experiments as well), while Angelix, GenProg, FoREnSiC and Maple found 32, 11, 23 and 26, respectively. But the study also showed that, for repair methods that are based on tests, in many cases the repair found only adhered to the test suite, but was not correct when inspected manually. When counting only correct repairs, AllRepair finds repairs for 18 versions (all of AllRepair's repairs are correct), while Angelix, GenProg, FoREnSiC and Maple find 9, 0, 15 and 26, respectively. Since fl-AllRepair is able to find all repairs found by AllRepair, the same results also apply to fl-AllRepair.

#### **8 Related Work**

Dynamic slicing has been widely used for fault localization in the past [16,36,43,45–47]. But, as we have seen, traditional notions of dynamic slicing [2,23] are not must (with respect to either of the presented schemes), and thus, the above techniques may fail to include relevant locations in their results.

Other approaches for fault localization include spectrum-based (SBFL) [1,13,20,31,44], mutation-based (MBFL) [15,18,30,35] and formula-based (FBFL) [7,14,17,21,40]. Both SBFL and MBFL techniques compute the suspiciousness of a statement using coverage information from failing and passing test executions. MBFL uses, in addition, information on how test results change after applying different mutations to the program. Both SBFL and MBFL techniques can be seen as may fault localization techniques in nature: they return locations that *are likely* to be relevant to the failing execution, based on all executions. We see may fault localization techniques as orthogonal to ours (and to must fault localization techniques in general), since in the trade-off between returning a small set of locations, and returning one that is guaranteed to contain all relevant statements, may techniques prefer the first, while must techniques prefer the second. In the context of repair, there are interesting applications for both.

FBFL techniques represent an error trace using an SMT formula and analyze it to find suspicious locations. These techniques include using error invariants [6,14,17,40], maximum satisfiability [21,24,25], and weakest preconditions [7]. What we were able to show in this paper is that the methods of [6,14,21] are not must. In contrast, we believe (though we do not prove it) that the methods of [7,24,25] are must. But what [7,24,25] have in common is that they use the semantics of the error trace or the program. Though semantic information can help to further minimize the number of suspicious locations, retrieving it involves using expensive solving-based procedures. Our approach, on the other hand, uses only syntactic information, which makes the fault localization computation relatively cheap: no SMT solving is needed. Thus, these approaches can be seen as complementary to ours.

In the literature there is also a wide range of techniques for automated program repair using formal methods [4,10,19,22,29,32,33,42]. Both [11] and [37] also use fault localization followed by applying mutations for repair. But, unlike this work, fault localization is applied only to the original program. Also, neither the Tarantula fault localization used in [11] nor the dynamic slicing used in [37] carries the guarantee of being a must fault localization. The tool MUT-APR [3] fixes binary operator faults in C programs, but only targets faults that require a one-line modification. The tools FoREnSiC [5] and Maple [34] repair C programs with respect to a formal specification, but they do so by replacing expressions with templates, which are then patched and analysed. SemGraft [28] conducts repair with respect to a reference implementation, but relies on tests for SBFL fault localization of the original program.

#### **9 Conclusion**

In this work we define a novel notion of *must* fault localization, that carefully identifies program locations that are relevant for a bug, so that the set is sufficiently small but is guaranteed not to miss desired repairs. We also show that the notion of *must* fault localization should be defined with respect to the repair scheme in use. We show that our notion of must fault localization is particularly useful in pruning the search space of a specific mutation-based repair algorithm.

To the best of our knowledge, we are the first to investigate the widely-used notion of fault localization and to suggest criteria for evaluating its different implementations.

#### **References**



## Author Index

Albert, Elvira I-177 Almagor, Shaull II-541 Arcak, Murat I-556 Backes, John I-165 Bak, Stanley I-3, I-18, I-66 Barrett, Clark I-137, I-403 Bastani, Osbert II-587 Batz, Kevin II-512 Baumeister, Jan II-28 Bazille, Hugo II-304 Bendík, Jaroslav I-439 Beneš, Nikola I-569 Berdine, Josh II-225 Berrueco, Ulises I-165 Beyer, Dirk II-165 Blackshear, Sam I-137 Blahoudek, František II-15, II-421 Blondin, Michael II-372 Bray, Tyler I-165 Brázdil, Tomáš II-421 Brim, Daniel I-165 Brim, Luboš I-569 Brotherston, James II-203 Buiras, Pablo I-225 Büning, Julian I-376

Češka, Milan I-653 Chang, Kai-Chieh I-543 Chatterjee, Krishnendu II-398 Chau, Calvin I-653 Cheang, Kevin I-137 Chen, Mingshuai II-327 Chen, Xin I-582 Chen, Yanju II-587 Chen, YuTing II-101 Chiu, Johnathan I-122 Çirisci, Berk I-350 Cook, Byron I-165 Costa, Diana II-203 D'Antoni, Loris II-3

Dai, Hanjun II-151

Dai, Liyun I-415 Daly, Ross I-403 Dang, Hoang-Hai II-225 Devonport, Alex I-556 Dill, David L. I-137 Dillig, Isil II-564, II-587 Donovick, Caleb I-403 Dreyer, Derek II-225 Dross, Claire II-178 Dullerud, Geir E. II-448 Duret-Lutz, Alexandre II-15 Dwyer, Matthew B. I-97 Elbaum, Sebastian I-97 Elboher, Yizhak Yisrael I-43 Enea, Constantin I-350 Esparza, Javier II-372 Fan, Chuchu I-629 Farzan, Azadeh I-350 Feng, Shenghua II-327 Feng, Yu II-587 Finkbeiner, Bernd II-28, II-40, II-64 Fremont, Daniel J. I-122 Gacek, Andrew I-165 Gan, Ting I-415 Genest, Blaise II-304 Gieseking, Manuel II-64 Gocht, Stephan I-463 Golia, Priyanka II-611 Gopinathan, Kiran II-279 Gordillo, Pablo I-177 Gottschlich, Justin I-43 Grieskamp, Wolfgang I-137 Grumberg, Orna II-658 Guanciale, Roberto I-225 Gurfinkel, Arie II-101 Haas, Thomas II-349 Hahn, Christopher II-40 Hanrahan, Pat I-403

Hartmanns, Arnd II-488

Hasuo, Ichiro II-349 Hecking-Harbusch, Jesko II-64 Helfrich, Martin II-3, II-372 Henzinger, Thomas A. I-275 Herbst, Steven I-403 Hobbs, Kerianne I-66 Hobor, Aquinas II-203 Hofmann, Jana II-40 Horowitz, Mark I-403 Houshmand, Farzin I-324 Huang, Chao I-543 Hunt Jr., Warren A. I-485

Jaber, Nouraldin I-299 Jacobs, Swen I-225, I-299 Jagannathan, Suresh I-251 Jegourel, Cyrille II-304 Jhala, Ranjit I-165 Johnson, Taylor T. I-3, I-18, I-66 Junges, Sebastian II-512

Kadlecaj, Jakub I-569 Kaminski, Benjamin Lucien II-488, II-512 Kanig, Johannes II-178 Katoen, Joost-Pieter II-398, II-512 Katz, Guy I-43 Khaled, Mahmoud I-556, II-461 Klimis, Vasileios II-126 Kölbl, Martin I-529 Kragl, Bernhard I-275 Křetínský, Jan II-3, I-653 Krogmeier, Paul II-634 Kučera, Antonín II-372 Kulkarni, Milind I-299 Kupferman, Orna II-541 Kwiatkowska, Marta II-475

Laprell, David I-376 Lavaei, Abolfazl II-461 Lesani, Mohsen I-324 Leue, Stefan I-529 Li, Xiao I-324 Li, Xuandong I-582 Lin, Chung-Wei I-543 Lin, Wang I-582 Lindner, Andreas I-225 Luckow, Kasper I-165

Madhusudan, P. II-634 Mann, Makai I-403 Manzanas Lopez, Diego I-3 Margineantu, Dragos D. I-122 Matheja, Christoph II-512 Mathur, Umang II-634 McLaughlin, Sean I-165 McMillan, Kenneth L. II-190 Meel, Kuldeep S. I-439, I-463, II-611 Menon, Madhav I-165 Meyer, Philipp J. II-372 Miller, Kristina I-629 Mitra, Sayan I-629 Mukherjee, Prasita I-251 Murali, Adithya II-634 Musau, Patrick I-3 Mutluergil, Suha Orhun I-350

Nagar, Kartik I-251 Naik, Aaditya II-151 Naik, Mayur II-151 Nelson, Luke II-564 Nemati, Hamed I-225 Nguyen, Luan Viet I-3 Norman, Gethin II-475 Novotný, Petr II-421

O'Hearn, Peter II-225 Olderog, Ernst-Rüdiger II-64 Ornik, Melkior II-421 Osipychev, Denis I-122

Padon, Oded II-190 Parisis, George II-126 Park, Daejun I-151 Park, Junkil I-137 Parker, David II-475 Pastva, Samuel I-569 Peebles, Daniel I-165 Peng, Chao I-582 Phalakarn, Kittiphon II-349 Pugalia, Ujjwal I-165

Qadeer, Shaz I-137, I-275

Raad, Azalea II-225 Ramneantu, Emanuel II-3 Reus, Bernhard II-126

Rodríguez, César I-376 Roohi, Nima II-448 Rosu, Grigore I-151 Rothenberg, Bat-Chen II-658 Roy, Subhajit II-611 Rubio, Albert I-177 Rungta, Neha I-165 Šafránek, David I-569 Sahai, Shubham I-201 Samanta, Roopsha I-299 Sankaranarayanan, Sriram I-604, II-327 Santos, Gabriel II-475 Schemmel, Daniel I-376 Schett, Maria A. I-177 Schirmer, Sebastian II-28 Schlesinger, Cole I-165 Schodde, Adam I-165 Schröer, Philipp II-512 Schwenger, Maximilian II-28 Sergey, Ilya II-279 Seshia, Sanjit A. I-122, II-255 Setaluri, Rajsekhar I-403 Shoham, Sharon II-101 Shriver, David I-97 Si, Xujie II-151 Siegel, Stephen F. II-77 Sinha, Rohit I-201 Slivovsky, Friedrich I-508 Slobodova, Anna I-485 Song, Le II-151 Soos, Mate I-463 Soudjani, Sadegh II-461 Spiessl, Martin II-165 Stanley, Daniel I-403 Strejček, Jan II-15 Subramanyan, Pramod I-201 Sun, Jun II-304

Takisaka, Toru II-349 Tanuku, Anvesh I-165 Temel, Mertcan I-485 Tentrup, Leander II-40 Thangeda, Pranay II-421 Topcu, Ufuk II-421 Torens, Christoph II-28 Torlak, Emina II-564 Tran, Hoang-Dung I-3, I-18, I-66 Truong, Lenny I-403

Van Geffen, Jacob II-564 Varming, Carsten I-165 Vazquez-Chanlatte, Marcell II-255 Vediramana Krishnan, Hari Govind II-101 Villard, Jules II-225 Viswanathan, Deepa I-165 Viswanathan, Mahesh II-448, II-634

Wagner, Christopher I-299 Wang, Chenglong II-587 Wang, Xi II-564 Wang, Yu II-448 Wehrle, Klaus I-376 Weininger, Maximilian II-3, II-398 West, Matthew II-448 Wickerson, John II-203 Wies, Thomas I-529 Winkler, Tobias II-398

Xia, Bican I-415 Xiang, Weiming I-3, I-18 Xu, Dong I-97 Xue, Bai I-415, II-327

Yan, Yihao II-77 Yang, Xiaodong I-3 Yang, Zhengfeng I-582

Zamani, Majid I-556, II-461 Zhan, Naijun I-415, II-327 Zhang, Keyi I-403 Zhang, Yi I-151 Zhang, Yifang I-582 Zhong, Jingyi Emma I-137 Zhu, Qi I-543 Zohar, Yoni I-137