**Armin Biere, David Parker (Eds.)**

# **Tools and Algorithms for the Construction and Analysis of Systems**

**26th International Conference, TACAS 2020, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020, Dublin, Ireland, April 25–30, 2020, Proceedings, Part I**

## Lecture Notes in Computer Science 12078

Founding Editors

Gerhard Goos, Germany; Juris Hartmanis, USA

## Editorial Board Members

Elisa Bertino, USA; Wen Gao, China; Bernhard Steffen, Germany; Gerhard Woeginger, Germany; Moti Yung, USA

## Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy; Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany; Benjamin C. Pierce, University of Pennsylvania, USA; Bernhard Steffen, University of Dortmund, Germany; Deng Xiaotie, Peking University, Beijing, China; Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at http://www.springer.com/series/7407


Editors: Armin Biere, Johannes Kepler University, Linz, Austria

David Parker, University of Birmingham, Birmingham, UK

ISSN 0302-9743; ISSN 1611-3349 (electronic). Lecture Notes in Computer Science. ISBN 978-3-030-45189-9; ISBN 978-3-030-45190-5 (eBook). https://doi.org/10.1007/978-3-030-45190-5

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2020. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## ETAPS Foreword

Welcome to the 23rd ETAPS! This was the first time that ETAPS took place in Ireland, in its beautiful capital Dublin.

ETAPS 2020 was the 23rd instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming language developments, analysis tools, and formal approaches to software engineering. Organizing these conferences in a coherent, highly synchronized conference program enables researchers to participate in an exciting event, having the possibility to meet many colleagues working in different directions in the field, and to easily attend talks of different conferences. On the weekend before the main conference, numerous satellite workshops took place that attracted many researchers from all over the globe. Also, for the second time, an ETAPS Mentoring Workshop was organized. This workshop is intended to help students early in the program with advice on research, career, and life in the fields of computing that are covered by the ETAPS conference.

ETAPS 2020 received 424 submissions in total, 129 of which were accepted, yielding an overall acceptance rate of 30.4%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2020 featured the unifying invited speakers Scott Smolka (Stony Brook University) and Jane Hillston (University of Edinburgh) and the conference-specific invited speakers (ESOP) Işıl Dillig (University of Texas at Austin) and (FASE) Willem Visser (Stellenbosch University). Invited tutorials were provided by Erika Ábrahám (RWTH Aachen University) on the analysis of hybrid systems and Madhusudan Parthasarathy (University of Illinois at Urbana-Champaign) on combining Machine Learning and Formal Methods. On behalf of the ETAPS 2020 attendants, I thank all the speakers for their inspiring and interesting talks!

ETAPS 2020 took place in Dublin, Ireland, and was organized by the University of Limerick and Lero. ETAPS 2020 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Tiziana Margaria (general chair, UL and Lero), Vasileios Koutavas (Lero@UCD), Anila Mjeda (Lero@UL), Anthony Ventresque (Lero@UCD), and Petros Stratis (Easy Conferences).

The ETAPS Steering Committee (SC) consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns (Saarbrücken), Marieke Huisman (chair, Twente), Joost-Pieter Katoen (Aachen and Twente), Jan Kofron (Prague), Gerald Lüttgen (Bamberg), Tarmo Uustalu (Reykjavik and Tallinn), Caterina Urban (Inria, Paris), and Lenore Zuck (Chicago).

Other members of the SC are: Armin Biere (Linz), Jordi Cabot (Barcelona), Jean Goubault-Larrecq (Cachan), Jan-Friso Groote (Eindhoven), Esther Guerra (Madrid), Jurriaan Hage (Utrecht), Reiko Heckel (Leicester), Panagiotis Katsaros (Thessaloniki), Stefan Kiefer (Oxford), Barbara König (Duisburg), Fabrice Kordon (Paris), Jan Kretinsky (Munich), Kim G. Larsen (Aalborg), Tiziana Margaria (Limerick), Peter Müller (Zurich), Catuscia Palamidessi (Palaiseau), Dave Parker (Birmingham), Andrew M. Pitts (Cambridge), Peter Ryan (Luxembourg), Don Sannella (Edinburgh), Bernhard Steffen (Dortmund), Mariëlle Stoelinga (Twente), Gabriele Taentzer (Marburg), Christine Tasson (Paris), Peter Thiemann (Freiburg), Jan Vitek (Prague), Heike Wehrheim (Paderborn), Anton Wijs (Eindhoven), and Nobuko Yoshida (London).

I would like to take this opportunity to thank all speakers, attendants, organizers of the satellite workshops, and Springer for their support. I hope you all enjoyed ETAPS 2020. Finally, a big thanks to Tiziana and her local organization team for all their enormous efforts enabling a fantastic ETAPS in Dublin!

February 2020. Marieke Huisman, ETAPS SC Chair, ETAPS e.V. President

## Preface

TACAS 2020 was the 26th edition of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems conference series. TACAS 2020 was part of the 23rd European Joint Conferences on Theory and Practice of Software (ETAPS 2020). The conference was held at the Royal Marine Hotel in Dublin, Ireland, during April 25–30, 2020.

TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, flexibility, and efficiency of tools and algorithms for building systems. TACAS solicited four types of submissions: research papers, case study papers, regular tool papers, and tool demonstration papers.


This year 155 papers were submitted to TACAS, consisting of 111 research papers, 8 case study papers, 19 regular tool papers, and 17 tool demo papers. Individual authors were limited to a maximum of three submissions. Each paper was reviewed by at least three Program Committee (PC) members, who also provided feedback on whether certain papers should go through a rebuttal process.

The chairs asked for 59 rebuttals, usually following such rebuttal recommendations by PC members. In parallel to PC reviewing, the Artifact Evaluation Committee (AEC) reviewed the artifacts. A formal summary review of this evaluation was made available to the PC members and taken into account in the discussion phase. The case study chair and the tools chair made sure that identical reviewing and selection criteria were applied within their respective classes of papers. After this thorough reviewing, rebuttal and discussion phase, a total of 48 papers were accepted, including 31 research papers, 4 case study papers, 5 regular tool papers and 8 tool demo papers.

As in 2019, TACAS 2020 included an artifact evaluation (AE) for all types of papers. There were two rounds of the AE: for regular tool papers and tool demonstration papers AE was compulsory and artifacts had to be submitted to the first round. For research and case study papers, it was voluntary, and artifacts could be submitted to either the first or the second round. The results of the first round were communicated to the TACAS PC before their discussion phase so that the quality of the artifact could be considered prior to the TACAS decision making. Each artifact was evaluated independently by at least three reviewers. All accepted papers with accepted artifacts received a badge which is added to the title page of the respective paper if desired by the authors.

The AEC used a two-phase reviewing process: reviewers first performed an initial check to see whether the artifact was technically usable and whether the accompanying instructions were consistent, followed by a full evaluation of the artifact. The main criterion for artifact acceptance was consistency with the paper, with completeness and documentation being handled in a more lenient manner as long as the artifact was useful overall.

In the first round, out of 44 artifact submissions, 29 were accepted and 15 were rejected. This corresponds to an acceptance rate of 66%. Out of the 36 artifacts for regular tool papers and tool demonstration papers, 25 artifacts were accepted and 11 artifacts were rejected, resulting in an acceptance rate of 69%. In all but five cases, tool papers whose artifacts did not pass the evaluation were rejected. These five artifacts were invited for submission to the second evaluation round, and three of them were resubmitted and successfully evaluated. Overall, out of the 20 artifacts submitted to the second evaluation round, 17 were accepted and 3 were rejected, resulting in an acceptance rate of 85%.

TACAS 2020 also hosted the 9th International Competition on Software Verification (SV-COMP 2020), chaired and organized by Dirk Beyer. Participation was again high: 28 verification systems with developers from 11 countries were submitted for the systematic comparative evaluation, including 3 submissions from industry. Six teams contributed validators for verification witnesses. The TACAS proceedings include the competition report and short papers describing 11 of the participating verification systems. These papers were reviewed by a separate SV-COMP program committee; each of the papers was assessed by at least three reviewers. Two sessions in the TACAS program were reserved for the presentation of the results: the summary by the SV-COMP chair and the participating tools by the developer teams in the first session, and the open community meeting in the second session.

We are grateful to everyone who helped to make TACAS 2020 a success. In particular, we would like to thank all PC members, external reviewers, and the members of the AEC for their detailed and informed reviews and for their discussions during the virtual PC and AEC meetings. The collection and selection of papers was organized through the EasyChair Conference System and the proceedings volumes were published with the help of Springer; we thank them all for their assistance. We also thank the SC for their advice, the Organizing Committee of ETAPS 2020 and its general chair (Tiziana Margaria) and the chair of the ETAPS Executive Board (Marieke Huisman).

March 2020. Armin Biere and David Parker (PC Chairs), Marijn Heule (Case Study Chair), Falk Howar (Tools Chair), Dirk Beyer (Competition Chair), Arnd Hartmanns and Martina Seidl (AEC Chairs)

## Organization

## Program Committee

Dirk Beyer, LMU Munich, Germany; Roderick Bloem, TU Graz, Austria; Alessandro Cimatti, FBK-irst, Italy; Jan Kretinsky, TU Munich, Germany; Wenchao Li, Boston University, USA; Ken McMillan, Microsoft, USA; Bernhard Steffen, TU Dortmund, Germany; Christoph Wintersteiger, Microsoft, UK

Christel Baier, TU Dresden, Germany; Ezio Bartocci, Vienna University of Technology, Austria; Armin Biere (Chair), Johannes Kepler University Linz, Austria; Jasmin Blanchette, Vrije Universiteit Amsterdam, The Netherlands; Hana Chockler, King's College London, UK; Rance Cleaveland, University of Maryland, USA; Goran Frehse, Université Grenoble Alpes, France; Martin Fränzle, Carl von Ossietzky Univ. Oldenburg, Germany; Orna Grumberg, Technion - Israel Institute of Technology; Kim Guldstrand Larsen, Aalborg University, Denmark; Holger Hermanns, Universität des Saarlandes, Germany; Marijn Heule, Carnegie Mellon University, USA; Falk Howar, TU Clausthal, IPSSE, Germany; Benjamin Kiesl, CISPA Helmholtz Center for Inf. Security, Germany; Laura Kovacs, Vienna University of Technology, Austria; Aina Niemetz, Stanford University, USA; Gethin Norman, University of Glasgow, UK; David Parker (Chair), University of Birmingham, UK; Corina Pasareanu, CMU/NASA Ames Research Center, USA; Nir Piterman, University of Gothenburg, Sweden; Kristin Yvonne Rozier, Iowa State University, USA; Philipp Ruemmer, Uppsala University, Sweden; Natasha Sharygina, Università della Svizzera italiana, Switzerland; Jan Strejček, Masaryk University, Czech Republic; Michael Tautschnig, Queen Mary University of London, UK; Jaco van de Pol, Aarhus University, Denmark; Tom van Dijk, University of Twente, The Netherlands

## Artifact Evaluation Committee


## SV-COMP – Program Committee and Jury


Hernán Ponce de León (Dartagnan), Bundeswehr University Munich, Germany; Henrich Lauko (DIVINE), Masaryk University, Czechia; Felipe R. Monteiro (ESBMC), Fed. Univ. of Amazonas, Brazil; Benjamin Quiring (GACAL), Northeastern University, USA; Vaibhav Sharma (Java-Ranger), University of Minnesota, USA; Philipp Ruemmer (JayHorn), Uppsala University, Sweden; Peter Schrammel (JBMC), University of Sussex, UK; Falk Howar (JDart), TU Dortmund, Germany; Omar Inverso (Lazy-CSeq), Gran Sasso Science Institute, Italy; Herbert Rocha (Map2Check), Universidade Federal do Amazonas, Brazil; Philipp Berger (NITWIT), RWTH Aachen, Germany; Cedric Richter (PeSCo), Paderborn University, Germany; Saurabh Joshi (Pinaka), IIT Hyderabad, India; Veronika Šoková (PredatorHP), BUT Brno, Czech Republic; Willem Visser (SPF), Amazon Web Services, USA; Marek Chalupa (Symbiotic), Masaryk University, Czech Republic; Matthias Heizmann (UAutomizer), University of Freiburg, Germany; Alexander Nutz (UKojak), University of Freiburg, Germany; Daniel Dietsch (UTaipan), University of Freiburg, Germany; Priyanka Darke (VeriAbs), Tata Consultancy Services, India; Raveendra K. Medicherla (VeriFuzz), Tata Consultancy Services, India; Liangze Yin (Yogar-CBMC), Nat. Univ. of Defense Technology, China

## Steering Committee

Bernhard Steffen (Chair), TU Dortmund, Germany; Dirk Beyer, LMU Munich, Germany; Rance Cleaveland, University of Maryland, USA; Holger Hermanns, Universität des Saarlandes, Germany; Kim G. Larsen, Aalborg University, Denmark

## Additional Reviewers

Alexandre Dit Sandretto, Julien; Asadi, Sepideh; Ashok, Pranav; Avigad, Jeremy; Baanen, Tim; Bacci, Giorgio; Bacci, Giovanni; Backeman, Peter; Bae, Kyungmin; Barbosa, Haniel; Bentkamp, Alexander; Berani Abdelwahab, Erzana; Biewer, Sebastian; Blahoudek, Fanda; Blicha, Martin; Bozga, Marius; Bozzano, Marco; Bønneland, Frederik M.; Cerna, David; Ceska, Milan; Chalupa, Marek; Chapoutot, Alexandre; Dierl, Simon; Dureja, Rohit; Ebrahimi, Masoud; Eisentraut, Julia; Endrullis, Jörg; Ernst, Gidon; Esen, Zafer; Fan, Jiameng; Fazekas, Katalin; Fedyukovich, Grigory; Fleury, Mathias; Fokkink, Wan; Forets, Marcelo; Freiberger, Felix; Frenkel, Hadar; Friedberger, Karlheinz; Frohme, Markus; Fu, Feisi; Fürnkranz, Johannes; Giacobbe, Mirco; Gjøl Jensen, Peter; Gossen, Frederik; Goudsmid, Ohad; Griggio, Alberto; Grover, Kush; Gutiérrez, Elena; Haaswijk, Winston; Hadžić, Vedad; Hahn, Ernst Moritz; Hansen, Mikkel; Hartmanns, Arnd; Hecking-Harbusch, Jesko; Hofmann, Jana; Holzner, Stephan; Hugunin, Jasper; Humenberger, Andreas; Hupel, Lars; Hyvärinen, Antti; Irfan, Ahmed; Jasper, Marc; Jaulin, Luc; Jensen, Mathias Claus; Jensen, Peter Gjøl; Jonas, Martin; Jonsson, Bengt; Jonáš, Martin; Kacianka, Severin; Kaminski, Benjamin Lucien; Kanav, Sudeep; Kempa, Brian; Khalimov, Ayrat; Kiourti, Panagiota; Klauck, Michaela; Klüppelholz, Sascha; Koenighofer, Bettina; Kopetzki, Dawid; Krcal, Pavel; Kröger, Paul; Kupferman, Orna; Köhl, Maximilian; Lahkim Bennani, Ismail; Legay, Axel; Lemberger, Thomas; Liang, Chencheng; Lorber, Florian; Ma, Meiyi; Major, Juraj; Mann, Makai; Marcovich, Ron; Marescotti, Matteo; Martins, Ruben; Meggendorfer, Tobias; Mikučionis, Marius; Mitsch, Stefan; Mover, Sergio; Mues, Malte; Murtovi, Alnis; Möhlmann, Eike; Mömke, Tobias; Müller, David; Narváez, David; Naujokat, Stefan; Oliveira da Costa, Ana; Otoni, Rodrigo; Pagel, Jens; Parlato, Gennaro; Paskevich, Andrei; Peppelman, Marijn; Perelli, Giuseppe; Pivoluska, Matej; Popescu, Andrei; Puch, Stefan; Putot, Sylvie; Rebola-Pardo, Adrián; Reynolds, Andrew; Rothenberg, Bat-Chen; Roveri, Marco; Rowe, Reuben; Rüthing, Oliver; Schilling, Christian; Shoukry, Yasser; Spießl, Martin; Srba, Jiri; Stankovic, Miroslav; Stierand, Ingo; Štill, Vladimír; Stjerna, Albin; Stock, Gregory; Stojic, Ivan; Theel, Oliver; Tian, Chun; Tonetta, Stefano; Trtík, Marek; van der Ploeg, Atze; Vom Dorff, Sebastian; Wardega, Kacper; Weininger, Maximilian; Wendler, Philipp; Wimmer, Simon; Winkels, Jan; Yolcu, Emre; Zeljić, Aleksandar; Zhou, Weichao

## Contents – Part I

#### Program Verification



Multi-agent Safety Verification Using Symmetry Transformations. Hussein Sibai, Navid Mokhlesi, Chuchu Fan, and Sayan Mitra



### Verifying Concurrent Systems


#### Model Checking and Reachability

Partial Order Reduction for Deep Bug Finding in Synchronous Hardware. Makai Mann and Clark Barrett


## Contents – Part II

#### Bisimulation


#### Logic and Proof




## Program Verification

## Software Verification with PDR: An Implementation of the State of the Art

Dirk Beyer<sup>1</sup> and Matthias Dangl<sup>1</sup>

LMU Munich, Germany

Abstract. Property-directed reachability (PDR) is a SAT/SMT-based reachability algorithm that incrementally constructs inductive invariants. After it was successfully applied to hardware model checking, several adaptations to software model checking have been proposed. We contribute a replicable and thorough comparative evaluation of the state of the art: We (1) implemented a standalone PDR algorithm and, as improvement, a PDR-based auxiliary-invariant generator for k-induction, and (2) performed an experimental study on the largest publicly available benchmark set of C verification tasks, in which we explore the effectiveness and efficiency of software verification with PDR. The main contribution of our work is to establish a reproducible baseline for ongoing research in the area by providing a well-engineered reference implementation and an experimental evaluation of the existing techniques.

Keywords: Software verification · Program analysis · Invariant generation · Property-directed reachability (PDR) · IC3 · k-Induction · VVT · CPAchecker

## 1 Introduction

Automatic software verification [24] is a broad research area with many success stories and large impact on technology that is applied in industry [2, 14, 27]. It complements other general approaches to ensure functional correctness, like software testing [31] and interactive software verification [3]. One large sub-area of automatic software verification includes algorithms and approaches that are based on SMT technology. Classic approaches like bounded model checking [10], predicate abstraction [1, 19], and k-induction [5, 26, 32] are well understood and evaluated; a recent survey [6] provides a uniform overview and sheds light on the differences between the algorithms. Property-directed reachability (PDR) [12] is a relatively recent (2011) approach that has not yet been included in comparative evaluations that go beyond applying different implementations of the same or different techniques to a set of benchmark tasks and that additionally pair such experiments with a discussion of how the concepts can be expressed in a common formalism. The approach was originally applied to transition systems from hardware designs, but has also been adapted to software verification [11, 12, 13, 15, 16, 25, 28, 29].

An extended version of this article is available as technical report [8].

A replication package is available on Zenodo [9].

While in theory, given the aforementioned body of work on the topic, the advantages and disadvantages of using PDR seem clear, we are interested in understanding the effect of applying PDR to a large set of verification tasks that were collected from academia and also from industrial software, such as the Linux kernel. To achieve this goal, we implemented one PDR adaptation for software verification, and another approach that integrates a PDR-like invariant-generation module into a k-induction approach.

PDR Adaptation for Software Verification. PDR is a model-checking algorithm that tries to construct an inductive safety invariant by incrementally learning clauses that are inductive relative to previously learned clauses. The clause-learning strategy is guided by counterexamples to induction, i.e., each time a proof of inductiveness fails, the algorithm attempts to learn a new clause to avoid the same counterexample to induction in the future. Originally, this algorithm was designed as a SAT-based technique for Boolean finite-state systems. Every adaptation of PDR to software verification therefore needs to consider how to effectively and efficiently handle the infinite state space and how to transfer the algorithm from SAT to SMT. Furthermore, the adaptation to software has to deal with the program counter.

PDR-like Invariant Generation. Whenever an induction-proof attempt fails with a counterexample, the counterexample describes a state s that can transition into a bad state (that violates the safety property), which means that in order to make the proof succeed, s must be removed from consideration by an auxiliary invariant. From this bad-state predecessor s, the clause-learning strategy of PDR proceeds to generate such an auxiliary invariant by applying the following two steps: (1) s is first generalized to a set of states C that all transition into a bad state; (2) an invariant is constructed that is (a) inductive relative to previously found invariants<sup>1</sup> and (b) at least strong enough to eliminate all states in C. If it fails to construct such an invariant and prove its inductiveness, then the steps are recursively re-applied to the counterexample obtained from the failed induction attempt.
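To make the two steps and the recursion more concrete, here is a minimal explicit-state sketch in C. It is not the implementation evaluated in this paper; it assumes a hypothetical encoding in which a system has at most 32 states, a set of states is a bitmask, `succ[s]` is the successor set of state `s`, and generalization (step 1) is the identity, so that the candidate invariant for a set $C$ is simply "not $C$". Because this simplified version has no frames, it bounds the recursion depth instead of relying on PDR's level-based termination argument. The names `find_cti` and `eliminate` are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: at most 32 states, bit s of a StateSet is set
   iff state s is in the set, and succ[s] holds the successors of s. */
typedef uint32_t StateSet;

/* Returns a state s with F(s) and Inv(s) that has a successor outside F,
   i.e. the predecessor state of a counterexample to induction (CTI) for
   "F is inductive relative to Inv", or -1 if the query succeeds. */
int find_cti(StateSet F, StateSet Inv, const StateSet succ[], int n) {
    for (int s = 0; s < n; s++)
        if ((((F & Inv) >> s) & 1u) && (succ[s] & ~F) != 0)
            return s;
    return -1;
}

/* Tries to learn auxiliary invariants that eliminate all states in C and
   conjoins them to *inv. Generalization is the identity here, so the
   candidate invariant is "not C". Fails if C contains an initial state or
   the recursion budget is exhausted. */
bool eliminate(StateSet C, StateSet init, const StateSet succ[], int n,
               StateSet *inv, int depth) {
    assert(n <= 32);
    if (C & init) return false;     /* C contains a reachable state */
    if (depth == 0) return false;   /* give up: recursion budget exhausted */
    for (;;) {
        int cti = find_cti(~C, *inv, succ, n);
        if (cti < 0) {              /* "not C" is inductive relative to *inv */
            *inv &= ~C;
            return true;
        }
        /* The induction attempt failed: first eliminate the CTI itself.
           On success, cti is removed from *inv, so it cannot recur. */
        if (!eliminate(1u << cti, init, succ, n, inv, depth - 1))
            return false;
    }
}
```

For a four-state system with initial state 0, transitions 0→1, 1→0, 2→3, 3→3, and bad state 3, eliminating {3} first fails with the CTI 2; state 2 has no predecessors, so it is eliminated directly, after which "not 3" becomes inductive relative to the learned invariant and the auxiliary invariant {0, 1} remains.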

We experimentally investigate two implementations of adaptations of PDR to software verification (CPAchecker-CTIGAR and Vvt-CTIGAR), as well as several combinations that use the PDR-like invariant-generation module that we designed and implemented for this study.

<sup>1</sup> An assertion $F$ is said to be inductive relative to an invariant $\mathit{Inv}$ if $\mathit{Inv}$ can be used as an auxiliary invariant for the proof of inductiveness $\forall s_j, s_{j+1} : F(s_j) \wedge T(s_j, s_{j+1}) \Rightarrow F(s_{j+1})$ by conjoining $\mathit{Inv}$ to the induction hypothesis $F(s_j)$, such that the modified induction query $\forall s_j, s_{j+1} : F(s_j) \wedge \mathit{Inv}(s_j) \wedge T(s_j, s_{j+1}) \Rightarrow F(s_{j+1})$ allows a proof by induction to succeed [12].

```
 1 extern void __VERIFIER_error() __attribute__((__noreturn__));
 2 extern unsigned int __VERIFIER_nondet_uint(void);
 3 void __VERIFIER_assert(int cond) {
 4   if (!(cond)) {
 5     ERROR: __VERIFIER_error();
 6   }
 7   return;
 8 }
 9 int main(void) {
10   unsigned int w = __VERIFIER_nondet_uint();
11   unsigned int x = w;
12   unsigned int y = w + 1;
13   unsigned int z = x + 1;
14   while (__VERIFIER_nondet_uint()) {
15     y++;
16     z++;
17   }
18   __VERIFIER_assert(y == z);
19   return 0;
20 }
```
Fig. 1: Example C program eq2.c

Example. Figure 1 shows an example C program (eq2.c) that contains four unsigned integer variables w, x, y, and z. In line 10, the variable w is initialized to an unknown value via the input function `__VERIFIER_nondet_uint()`; then, its value is copied to x in line 11. In line 12, variable y is initialized with the value of w+1, and in line 13, variable z is initialized with the value of x+1, so that at this point, w and x are equal to each other, and y and z are also equal to each other. Then, from line 14 to line 17, a loop with a nondeterministic exit condition (and therefore an unknown number of iterations) increments both y and z in each iteration. Finally, line 18 asserts that after the loop, y and z are (still) equal to each other. Since y and z are equal before the loop and are always incremented together within the loop, the invariant y = z is inductive. However, since there is no direct connection between y and z but only an indirect one via their shared dependency on w, naïve data-flow-based techniques may fail to find this invariant. In fact, we tried several configurations of the verification framework CPAchecker and found that many of them fail to prove this program:


make the step to y = z due to the inequalities between w and y, and x and z, respectively.


We will now briefly sketch how KIPDR detects the invariant y = z for the example verification task. At first, KIPDR attempts to prove by induction that when line 18 is reached, the assertion condition holds, which fails as discussed previously. However, this failed induction attempt yields a counterexample to induction where the values of y and z differ from each other, e.g., y = 0 ∧ z = 1, which is then generalized to y ≠ z, i.e., a set of states that includes the concrete predecessor of a bad state from the counterexample, as well as many other states that would violate the assertion if they were reachable themselves. Then, KIPDR attempts to find an inductive invariant that eliminates all of these states, and the attempt succeeds with the invariant y = z. Afterwards, KIPDR re-attempts its original induction proof to show that the assertion is never violated, which now succeeds due to the auxiliary invariant y = z.
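The following C sketch replays this reasoning by exhaustive enumeration instead of SMT queries. It is only an illustration under stated assumptions: `unsigned int` is shrunk to 8 bits so that all states can be enumerated, only the loop head (line 14) and the assertion (line 18) are kept as program locations, and w and x are dropped because they are not read after line 13. The names `prop`, `inv`, `successors`, `initiation`, and `consecution` are hypothetical. With the property alone, consecution fails, and the first CTI the enumeration finds is the loop-head state with y = 0 and z = 1; with y = z at the loop head as auxiliary invariant, both the invariant itself and the strengthened query for the property are inductive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of eq2.c (an assumption for illustration): 8-bit
   unsigned values, two locations, and w, x omitted after initialization. */
enum { LOOP_HEAD, ASSERTION };
typedef struct { int pc; uint8_t y, z; } State;

/* Safety property P: at the assertion, y == z. */
bool prop(State s) { return s.pc != ASSERTION || s.y == s.z; }

/* Auxiliary invariant Inv: at the loop head, y == z. */
bool inv(State s) { return s.pc != LOOP_HEAD || s.y == s.z; }

/* Transition relation: from the loop head, either run the loop body once
   or leave the loop. Returns the number of successors written to out. */
int successors(State s, State out[2]) {
    if (s.pc != LOOP_HEAD) return 0;   /* the assertion location is final */
    out[0] = (State){LOOP_HEAD, (uint8_t)(s.y + 1), (uint8_t)(s.z + 1)};
    out[1] = (State){ASSERTION, s.y, s.z};
    return 2;
}

/* Initiation: all initial states (loop head, y = z = w + 1) satisfy F. */
bool initiation(bool (*F)(State)) {
    for (int w = 0; w < 256; w++)
        if (!F((State){LOOP_HEAD, (uint8_t)(w + 1), (uint8_t)(w + 1)}))
            return false;
    return true;
}

/* Consecution: F(s) && H(s) && T(s, s') => F(s') for all s, s', where H is
   an optional auxiliary invariant (NULL for plain 1-induction). On failure,
   the predecessor state of the CTI is stored in *cti. */
bool consecution(bool (*F)(State), bool (*H)(State), State *cti) {
    assert(F != NULL && cti != NULL);
    for (int pc = LOOP_HEAD; pc <= ASSERTION; pc++)
        for (int y = 0; y < 256; y++)
            for (int z = 0; z < 256; z++) {
                State s = {pc, (uint8_t)y, (uint8_t)z}, next[2];
                if (!F(s) || (H != NULL && !H(s))) continue;
                int count = successors(s, next);
                for (int i = 0; i < count; i++)
                    if (!F(next[i])) { *cti = s; return false; }
            }
    return true;
}
```

Shrinking the bit width keeps the enumeration small; the wrap-around of the 8-bit increments does not affect the argument, because y and z always wrap together.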

Contributions. We present the following contributions:


Related Work. While PDR (also known as IC3 for its first implementation [12]) was introduced as a SAT-based algorithm for model checking finite-state Boolean transition systems [13], several approaches have since then been presented to extend it to SMT and to apply it to the verification of software models: PDR has been suggested as an interpolation engine for Impact, but experiments have shown that it is too expensive in the general case, and is most effective if only applied as a fall-back engine for cases where a cheaper interpolation engine fails to produce useful interpolants [15]. It has also been proposed to improve this approach by tracking control-flow locations explicitly instead of symbolically [28], thereby avoiding the problem that many iterations of the algorithm are spent only to learn the control flow, and this idea has later been extended by several improvements to the generalization step of PDR [29]. Another approach is to model the program using a Boolean abstraction, which has the advantage that it requires only a few changes to the original algorithm, but the disadvantage that a refinement procedure is necessary to handle the spurious paths introduced by the abstraction: One such approach uses infeasible error paths (i.e., counterexample-guided abstraction refinement (CEGAR) [17]) to refine the abstraction [16], while another (CTIGAR) uses counterexamples to induction [11]. Both of these refinement techniques use interpolation to obtain abstraction predicates, and the latter is used in two of the configurations we compare in our evaluation (CPAchecker-CTIGAR and Vvt-CTIGAR [20]). A different extension of PDR to verify infinite-state systems that does not require abstraction refinement is property-directed k-induction [25], which increases the power of the induction checks used in PDR by applying k-induction instead of 1-induction, and which uses model-based generalization in addition to interpolation to reason about potentially infinite sets of states. Unfortunately, support for effective model-based generalization is rare in SMT solvers<sup>2</sup>, making this approach impractical. In contrast, our KIPDR algorithm presented in Sect. 3 only requires support for interpolation, which is available in several SMT solvers.

Despite this multitude of adaptations of PDR to infinite-state systems, most implementations in practice require their input to be encoded as transition systems. The only available software verifiers that are applicable to actual C programs and that implement PDR-based techniques are CPAchecker [7], SeaHorn [23], and Vvt [20].

## 2 Background

In this section, we briefly introduce the algorithms PDR and k-induction, which provide the core concepts on which we base our ideas. In the following description of PDR and k-induction, we use the following notation: given the state variables $s$ and $s'$ within a state-transition system $T$ that represents the program, predicate $I(s)$ denotes that $s$ is an initial state, $T(s, s')$ that a transition from $s$ to $s'$ exists, and $P(s)$ that the safety property $P$ holds for state $s$.

#### 2.1 PDR

PDR maintains a list of frames $F_0, \dots, F_k$, where a frame $F_i$ is a predicate that represents an overapproximation of all states reachable within at most $i$ steps, for $0 \le i \le k$, and a queue of proof obligations, which guide invariant discovery towards invariants relevant for proving the correctness of a safety property $P$. For a given state $s$, the notation $F_i(s)$ means that the predicate $F_i$ holds for state $s$. The index $i$ of a frame $F_i$ is called its level, and the frame $F_k$ is called the frontier, because it represents the largest overapproximation of reachable states computed by the algorithm [12]. The algorithm maintains the following invariants:

<sup>2</sup> The implementation of the approach of property-directed k-induction combines two SMT solvers, because neither of them supports all features required by the technique.


Using these data structures and algorithm invariants, the algorithm attempts to find either a counterexample to P or a 1-inductive invariant F<sub>i</sub> such that F<sub>i</sub>(s) ⇔ F<sub>i+1</sub>(s) for some level i ∈ {0, ..., k − 1}. Until either of these outcomes is reached, PDR alternates between the following two phases:


An example of this algorithm is presented in a technical report [8, pp. 7–8]. A more detailed presentation of PDR can be found in the literature [12].
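As a hedged illustration, the Python sketch below assumes the frame invariants commonly stated for IC3/PDR, namely F<sub>0</sub> = I, F<sub>i</sub> ⇒ F<sub>i+1</sub>, F<sub>i</sub>(s) ∧ T(s, s′) ⇒ F<sub>i+1</sub>(s′), and F<sub>i</sub> ⇒ P, and it represents every frame explicitly as a finite set of states instead of a formula. The toy system (a counter over 0..7 that starts at 0 and adds 2 modulo 8), the property x ≠ 5, and all function names are hypothetical. The sketch checks the invariants, detects convergence (F<sub>i</sub> = F<sub>i+1</sub>), and implements the relative-induction test behind the consecution check of Fig. 2(a).

```python
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

State = int
Frame = FrozenSet[State]

# Hypothetical toy system: a counter over 0..7 that starts at 0 and adds 2 (mod 8).
STATES = range(8)
INIT: Frame = frozenset({0})
TRANS: Set[Tuple[State, State]] = {(x, (x + 2) % 8) for x in STATES}


def prop(x: State) -> bool:
    """Safety property P: the counter never equals 5."""
    return x != 5


def post(states: Frame, trans: Set[Tuple[State, State]]) -> Set[State]:
    """Image of a set of states under the transition relation."""
    return {t for (s, t) in trans if s in states}


def check_frame_invariants(frames: List[Frame], init: Frame,
                           trans: Set[Tuple[State, State]],
                           p: Callable[[State], bool]) -> bool:
    """Check the assumed PDR frame invariants on explicit-state frames."""
    if frames[0] != init:                                   # F_0 = I
        return False
    for i, frame in enumerate(frames):
        if not all(p(s) for s in frame):                    # F_i => P
            return False
        if i + 1 < len(frames):
            nxt = frames[i + 1]
            if not frame <= nxt:                            # F_i => F_{i+1}
                return False
            if not post(frame, trans) <= nxt:               # F_i /\ T => F_{i+1}'
                return False
    return True


def converged_level(frames: List[Frame]) -> Optional[int]:
    """Smallest level i with F_i = F_{i+1}; F_i is then an inductive invariant."""
    for i in range(len(frames) - 1):
        if frames[i] == frames[i + 1]:
            return i
    return None


def is_relatively_inductive(c: Frame, frame: Frame,
                            trans: Set[Tuple[State, State]]) -> bool:
    """Consecution: F_i /\\ c /\\ T => c', i.e., c may be conjoined to F_{i+1}."""
    return post(frame & c, trans) <= c


frames = [INIT, frozenset({0, 2, 4, 6}), frozenset({0, 2, 4, 6})]
```

Here the frame at level 1 (the even values) is closed under T and excludes 5, so convergence at level 1 yields an inductive invariant that proves P.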

<sup>3</sup> By "push forward", we mean to add a predicate c from frame F<sub>i</sub> to frame F<sub>i+1</sub> [12].

(a) The consecution check ensures that only those clauses c<sub>i</sub> from F<sub>i</sub> that are inductive relative to F<sub>i</sub> w.r.t. the transition relation T are conjoined to frame F<sub>i+1</sub>

(b) If phase 1 results in a proof obligation ⟨t, k − 1⟩ (top), then phase 2 resolves it either by strengthening F<sub>k</sub> with c (left) or by creating a new (backward) proof obligation ⟨u, k − 2⟩ (right); if the chain of proof obligations propagates back to the initial states, a feasible error path has been found

Fig. 2: Visualization of (a) the consecution check and (b) the handling of proof obligations.

#### 2.2 k-Induction

Like PDR, k-induction attempts to prove a safety property P by applying induction. However, while PDR strengthens its induction hypothesis by using clauses extracted from specific counterexamples to induction after failed induction attempts, k-induction strengthens its induction hypothesis by increasing the length of the unrolling of the transition relation.

Starting with an initial value for the bound k (usually 1), the k-induction algorithm increases the value of k iteratively after each unsuccessful attempt at finding a specification violation (base case), proving correctness via complete loop unrolling (forward condition), or inductively proving correctness of the program (inductive-step case).

Base Case. The base case of k-induction consists of running BMC with the current bound k.<sup>4</sup> This means that, starting from all initial program states, all states of the program reachable within at most k − 1 unwindings of the transition relation are explored. If a ¬P-state is found, the algorithm terminates.

<sup>4</sup> We define the loop bound as the number of visits of the loop head, that is, with loop bound k = 1, the loop head is visited once, but there was not yet any unwinding of the loop body. This nicely matches the intuition for k-induction: 1-inductiveness means that if the invariant holds for one state (without loop unrolling), then it holds again after one loop unrolling in the successor state; k-inductiveness means that if the invariant holds for k states (k − 1 loop unrollings), then it holds again after one more loop unrolling in the successor state.

Forward Condition. If no ¬P-state is found by the BMC in the base case, the algorithm continues by performing the forward-condition check, which attempts to prove that BMC fully explored the state space of the program by checking that no state with distance k′ > k − 1 to the initial state is reachable. If this check is successful, the algorithm terminates.
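Both checks can be sketched by iterating the image of the initial states. The Python fragment below is a hypothetical explicit-state illustration, not an SMT-based encoding: layer j holds the end states of all paths with exactly j transitions, the base case inspects layers 0 to k − 1, and the forward condition succeeds if layer k is empty, i.e., no path of k transitions exists. The toy program, which counts from 0 to 3 and then terminates, is an assumption.

```python
from typing import Callable, Dict, List, Set

State = int

# Hypothetical toy program: x counts from 0 to 3 and then terminates (3 has no successor).
SUCC: Dict[State, List[State]] = {0: [1], 1: [2], 2: [3], 3: []}
INIT: Set[State] = {0}


def layers(init: Set[State], succ: Dict[State, List[State]], k: int) -> List[Set[State]]:
    """layers[j] = end states of all paths with exactly j transitions, for j = 0..k."""
    result = [set(init)]
    for _ in range(k):
        result.append({t for s in result[-1] for t in succ[s]})
    return result


def base_case_violated(init: Set[State], succ: Dict[State, List[State]],
                       p: Callable[[State], bool], k: int) -> bool:
    """BMC: is some ¬P-state reachable within at most k - 1 unwindings?"""
    return any(not p(s) for layer in layers(init, succ, k - 1) for s in layer)


def forward_condition_holds(init: Set[State], succ: Dict[State, List[State]], k: int) -> bool:
    """True iff no state at distance k is reachable, so BMC explored everything."""
    return not layers(init, succ, k)[k]
```

With k = 4, the base case finds no violation of x ≤ 3 and the forward condition succeeds, so safety is proved without induction. For a non-terminating loop, layer k never becomes empty, which is why the inductive-step case is needed.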

Inductive-Step Case. The forward-condition check, however, can only prove safety for programs with finite (and, in practice, short) loops. To prove safety beyond the bound k, the algorithm applies induction: The inductive-step case attempts to prove that after every sequence of k unrollings of the transition relation that did not reach a ¬P-state, there can also be no subsequent transition into a ¬P-state by unwinding the transition relation once more. In the realm of model checking of software, however, the safety property P is often not directly k-inductive for any value of k, thus causing the inductive-step-case check to fail. It is therefore state-of-the-art practice to add auxiliary invariants to this check to further strengthen the induction hypothesis and make it more likely to succeed. Thus, the inductive-step case proves a program safe if the following condition is unsatisfiable:

$$Inv(s_n) \land \bigwedge_{i=n}^{n+k-1} \left( P(s_i) \land T(s_i, s_{i+1}) \right) \land \neg P(s_{n+k})$$

where Inv is an auxiliary invariant, and s<sub>n</sub>, ..., s<sub>n+k</sub> is any sequence of states. If this check fails, the induction attempt is inconclusive: with the current value of k and the given auxiliary invariant, the program is proved neither safe nor unsafe. In this case, the algorithm increases the value of k and starts over.
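For a finite toy state space, the satisfiability of this formula can be decided by brute force. The hypothetical Python sketch below searches for a model of the step-case formula, i.e., a counterexample to induction (CTI), returning its first state s<sub>n</sub>. The example system (a counter over 0..7 that adds 2 modulo 8, starting at 0) and the property x ≠ 5 are assumptions: the unreachable odd states yield CTIs for k < 4, whereas the auxiliary invariant "x is even" makes the check succeed already for k = 1.

```python
from typing import Callable, Dict, List, Optional

State = int
Pred = Callable[[State], bool]

# Hypothetical toy system: a counter over 0..7 that adds 2 (mod 8).
STATES = range(8)
SUCC: Dict[State, List[State]] = {x: [(x + 2) % 8] for x in STATES}


def prop(x: State) -> bool:
    return x != 5                          # safety property P


def find_cti(states, succ: Dict[State, List[State]], p: Pred, inv: Pred,
             k: int) -> Optional[State]:
    """Return s_n of some model of
    Inv(s_n) /\\ AND_{i=n}^{n+k-1} (P(s_i) /\\ T(s_i, s_{i+1})) /\\ ¬P(s_{n+k}),
    or None if the formula is unsatisfiable (the step case succeeds)."""

    def extend(path: List[State]) -> bool:
        if len(path) == k + 1:             # path = s_n, ..., s_{n+k}
            return not p(path[-1])
        return p(path[-1]) and any(extend(path + [t]) for t in succ[path[-1]])

    for s in states:
        if inv(s) and extend([s]):
            return s
    return None


def no_invariant(_: State) -> bool:
    return True


def even(x: State) -> bool:
    return x % 2 == 0
```

A result of None means the step case is unsatisfiable: here P is 4-inductive even without an auxiliary invariant, and with Inv = "x is even", 1-induction already suffices.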

A detailed presentation of k-induction can be found in the literature [5, 6].

## 3 Combining k-Induction with PDR

Algorithm 1 shows an extension of k-induction with continuously-refined invariants [5] that applies PDR's aspect of learning from counterexamples to induction and that can be applied both as a main proof engine and as an invariant generator. This allows us to apply this extension of k-induction as an invariant generator to a main k-induction procedure, similar to the KI←−-KI approach [5].

Inputs. The algorithm takes the following inputs: the value k<sub>init</sub> is used to initialize the unrolling bound k, whereas the function inc is used to increase k in line 33 after each major iteration of the algorithm, up to an upper limit of k defined by the value k<sub>max</sub>, which is enforced in line 1. The set of initial program states is described by the predicate I, the possible state transitions are described

```
Algorithm 1 Iterative-Deepening k-Induction with Property Direction
Input: the initial value k_init ≥ 1 for the bound k,
    an upper limit k_max for the bound k,
    a function inc : N → N with ∀n ∈ N : inc(n) > n,
    the initial states defined by the predicate I,
    the transfer relation defined by the predicate T,
    a safety property P,
    a function get_currently_known_invariant to obtain auxiliary invariants,
    a Boolean pd that enables or disables property direction,
    a function lift : N × (S → B) × (S → B) × S → (S → B), and
    a function strengthen : N × (S → B) × (S → B) → (S → B),
    where S is the set of program states.
Output: true if P holds, false if P is violated, unknown otherwise
Variables: the current bound k := k_init,
    the invariant InternalInv := true computed by this algorithm internally, and
    the set O := {} of current proof obligations.
 1: while k ≤ k_max do
 2:   O_prev := O
 3:   O := {}
 4:   base_case := I(s_0) ∧ ⋁_{n=0}^{k−1} ( ⋀_{i=0}^{n−1} T(s_i, s_{i+1}) ∧ ¬P(s_n) )
 5:   if sat(base_case) then
 6:     return false
 7:   forward_condition := I(s_0) ∧ ⋀_{i=0}^{k−1} T(s_i, s_{i+1})
 8:   if ¬sat(forward_condition) then
 9:     return true
10:   if pd then
11:     for each o ∈ O_prev do
12:       base_case_o := I(s_0) ∧ ⋁_{n=0}^{k−1} ( ⋀_{i=0}^{n−1} T(s_i, s_{i+1}) ∧ ¬o(s_n) )
13:       if sat(base_case_o) then
14:         return false
15:       else
16:         step_case_n^o := ⋀_{i=n}^{n+k−1} ( o(s_i) ∧ T(s_i, s_{i+1}) ) ∧ ¬o(s_{n+k})
17:         ExternalInv := get_currently_known_invariant()
18:         Inv := InternalInv ∧ ExternalInv
19:         if sat(Inv(s_n) ∧ step_case_n^o) then
20:           s_o := satisfying predecessor state
21:           O := O ∪ {¬lift(k, Inv, o, s_o)}
22:         else
23:           InternalInv := InternalInv ∧ strengthen(k, Inv, o)
24:   step_case_n := ⋀_{i=n}^{n+k−1} ( P(s_i) ∧ T(s_i, s_{i+1}) ) ∧ ¬P(s_{n+k})
25:   ExternalInv := get_currently_known_invariant()
26:   Inv := InternalInv ∧ ExternalInv
27:   if sat(Inv(s_n) ∧ step_case_n) then
28:     if pd then
29:       s := satisfying predecessor state
30:       O := O ∪ {¬lift(k, Inv, P, s)}
31:   else
32:     return true
33:   k := inc(k)
34: return unknown
```
by the transition relation T, and the set of safe states is described by the safety property P. The accessor get\_currently\_known\_invariant is used to obtain the strongest invariant currently available from a concurrently running (external) auxiliary-invariant generator. A Boolean flag pd (short for "property-directed") controls whether failed induction checks are used to guide the algorithm towards a strengthening of the safety property P that suffices to prove correctness; if pd is set to false, the algorithm behaves exactly like standard k-induction. Given a failed attempt to prove some candidate invariant Q<sup>5</sup> by induction, the function lift is used to obtain, from a concrete counterexample-to-induction (CTI) state, a set of CTI states described by a state predicate C. An implementation of the function lift needs to satisfy the following condition for a CTI s ∈ S, where S is the set of program states, k ∈ N, Inv ∈ (S → B), Q ∈ (S → B), and C = lift(k, Inv, Q, s):

$$C(s) \land \forall s_n \in S: \left( C(s_n) \Rightarrow \left( Inv(s_n) \land \bigwedge_{i=n}^{n+k-1} \left( Q(s_i) \land T(s_i, s_{i+1}) \right) \Rightarrow \neg Q(s_{n+k}) \right) \right)$$

This means that the CTI s must be an element of the set of states described by the resulting predicate C and that all states in this set must be CTIs, i.e., they need to be k-predecessors of ¬Q-states; in other words, each state in the set of states described by the predicate C must reach some ¬Q-state via k unrollings of the transition relation T. We can implement lift using Craig interpolation [18, 30]

$$\text{between } A: s = s_n \text{ and } B: Inv(s_n) \land \bigwedge_{i=n}^{n+k-1} \left( Q(s_i) \land T(s_i, s_{i+1}) \right) \Rightarrow \neg Q(s_{n+k}),$$

because s is a CTI, and therefore we know that A ⇒ B holds.<sup>6</sup> Hence, the resulting interpolant satisfies the criteria for C to be a valid lifting of s according to the requirements on the function lift outlined above. The function strengthen is used to obtain, for a k-inductive invariant, a stronger k-inductive invariant, i.e., its result needs to imply the input invariant and, just like the input invariant, it must not be violated within k loop iterations and must be k-inductive.
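Interpolation is one way to implement lift. As an easily runnable stand-in, the hypothetical explicit-state sketch below computes the largest set C that meets the contract of lift: all states that satisfy Inv and reach a ¬Q-state via k transitions whose first k states satisfy Q. It also checks that C contains the given CTI s. An interpolation-based implementation may return a smaller set; the toy transition relation (each x in 0..7 may step to x + 1 or x + 2, modulo 8) and all names are assumptions.

```python
from typing import Callable, Dict, FrozenSet, List

State = int
Pred = Callable[[State], bool]

# Hypothetical nondeterministic toy system over 0..7.
STATES = range(8)
SUCC: Dict[State, List[State]] = {x: [(x + 1) % 8, (x + 2) % 8] for x in STATES}


def not_five(x: State) -> bool:
    """Candidate invariant Q (here: the safety property x != 5)."""
    return x != 5


def is_cti(s: State, succ: Dict[State, List[State]], q: Pred, inv: Pred, k: int) -> bool:
    """Does s satisfy Inv and start a k-step path through Q-states into a ¬Q-state?"""
    def extend(path: List[State]) -> bool:
        if len(path) == k + 1:
            return not q(path[-1])
        return q(path[-1]) and any(extend(path + [t]) for t in succ[path[-1]])
    return inv(s) and extend([s])


def lift(k: int, inv: Pred, q: Pred, s: State,
         states=STATES, succ=SUCC) -> FrozenSet[State]:
    """Explicit-state stand-in for interpolation: the set of all CTIs, which must contain s."""
    c = frozenset(x for x in states if is_cti(x, succ, q, inv, k))
    if s not in c:
        raise ValueError("s is not a counterexample to induction")
    return c
```

The new proof obligation in line 21 or line 30 of Alg. 1 is then the complement ¬C; for example, lifting the CTI 3 with k = 1 and Inv = true gives C = {3, 4}, so the obligation states that x is neither 3 nor 4.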

Algorithm. Lines 4 to 6 show the base-case check (BMC) and lines 7 to 9 show the forward-condition check, both as described in Sect. 2. If pd is set to true, lines 10 to 23 attempt to prove each proof obligation using k-induction: lines 12 to 14 check the base case for a proof obligation o. If any violation of the proof obligation o is found, this means that a predecessor state of a ¬P-state, and thus, transitively, a ¬P-state, is reachable, so we return false. Otherwise, lines 16 to 23 check the inductive-step case to prove o.<sup>7</sup> We strengthen the induction hypothesis of the step-case check by conjoining auxiliary invariants from an external invariant generator (via a call to get\_currently\_known\_invariant) and the auxiliary invariant computed internally from proof obligations that we successfully proved previously. If the step-case check for o is unsuccessful, we extract the resulting CTI state, lift it to a set of CTI states, and construct a new proof obligation so that we can later attempt to prove that these CTI states are unreachable. If, on the other hand, the step-case check for o is successful, we no longer track o in the set O of unproven proof obligations (this case corresponds to line 22). We could now use the proof obligation directly as an invariant, but instead, in line 23, we first try to strengthen it into a stronger invariant that removes even more unreachable states from future consideration before conjoining it to our internally computed auxiliary invariant. In our implementation, we implement strengthen by attempting to drop components from a (disjunctive) invariant and checking whether the remaining clause is still inductive. In lines 24 to 32, we check the inductive-step case for the safety property P. This check is mostly analogous to the inductive-step-case check for the proof obligations described above, except that if the check is successful, we immediately return true.

<sup>5</sup> Depending on the step the algorithm is in, Q may be either the safety property P or a proof obligation o.

<sup>6</sup> The formula C is called a Craig interpolant for two formulas A and B with A ⇒ B if A ⇒ C, C ⇒ B, and all variables in C occur in both A and B.

<sup>7</sup> Note that we do not need to check the forward condition for proof obligations, because the forward condition is unrelated to the safety property and the proof obligations, and therefore only needs to be checked once in each major iteration (i.e., once after each increment of k).
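The strengthening step can be sketched as a greedy pass over the disjuncts of a clause. In the hypothetical Python sketch below, a clause is a list of named state predicates; a disjunct is dropped whenever the remaining clause still holds in all states reachable within k − 1 unwindings and is still k-inductive relative to Inv, both decided by brute force. The toy system, in which the even and the odd values of 0..7 form two separate cycles, is an assumption. Dropping disjuncts only shrinks the described set of states, so the result implies the input clause, as required of strengthen.

```python
from typing import Callable, Dict, List, Set, Tuple

State = int
Pred = Callable[[State], bool]
Clause = List[Tuple[str, Pred]]

# Hypothetical toy system: x -> x + 2 (mod 8) forms an even cycle and an odd cycle.
STATES = range(8)
SUCC: Dict[State, List[State]] = {x: [(x + 2) % 8] for x in STATES}
INIT: Set[State] = {0}


def holds(clause: Clause, s: State) -> bool:
    return any(pred(s) for _, pred in clause)       # the empty clause is false


def safe_and_k_inductive(clause: Clause, k: int, inv: Pred) -> bool:
    layer = set(INIT)
    for _ in range(k):                              # base case: layers 0..k-1
        if not all(holds(clause, s) for s in layer):
            return False
        layer = {t for s in layer for t in SUCC[s]}

    def extend(path: List[State]) -> bool:         # step case: search for a CTI
        if len(path) == k + 1:
            return not holds(clause, path[-1])
        return holds(clause, path[-1]) and any(extend(path + [t]) for t in SUCC[path[-1]])

    return not any(inv(s) and extend([s]) for s in STATES)


def strengthen(k: int, inv: Pred, clause: Clause) -> Clause:
    """Greedily drop disjuncts while the clause stays safe and k-inductive."""
    result = list(clause)
    for name, _ in clause:
        candidate = [lit for lit in result if lit[0] != name]
        if safe_and_k_inductive(candidate, k, inv):
            result = candidate
    return result


CLAUSE: Clause = [("x == 5", lambda x: x == 5),
                  ("x is odd", lambda x: x % 2 == 1),
                  ("x is even", lambda x: x % 2 == 0)]
```

In this example only the disjunct "x is even" survives: it holds initially, is 1-inductive, and excludes every odd value, not just 5. The outcome of a single greedy pass can depend on the order of the disjuncts.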

Note that Alg. 1 eagerly increases k, even if the set O of proof obligations is not empty. This heuristic prevents the PDR part from iterating through long chains of proof obligations; instead, it delegates the unrolling to the k-induction part.

An in-depth discussion of a practical example of Alg. 1 is presented in a technical report [8, pp. 12–14].
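To make the control flow of Alg. 1 concrete, the hypothetical Python sketch below runs it on a finite toy system in which every satisfiability check is replaced by explicit enumeration. Several simplifying assumptions apply: predicates are Python functions over integer states, lift returns all CTIs (the largest set its contract allows, standing in for interpolation), strengthen is the identity, and the external invariant generator is assumed to return true. Comments refer to the line numbers of Alg. 1. In the toy system, the initial state 0 loops forever, whereas the unreachable state 1 loops and may step to the bad state 2, so plain k-induction (pd = False) finds a CTI for every k, while property direction learns the obligation x ≠ 1 and proves P at k = 2.

```python
from typing import Callable, Dict, List, Optional, Set

State = int
Pred = Callable[[State], bool]

# Hypothetical toy system: 0 loops; unreachable 1 loops or steps to the bad state 2.
STATES = range(3)
SUCC: Dict[State, List[State]] = {0: [0], 1: [1, 2], 2: [2]}
INIT: Set[State] = {0}


def prop(x: State) -> bool:
    return x != 2                                          # safety property P


def cti_states(q: Pred, inv: Pred, k: int) -> Set[State]:
    """All s_n with Inv(s_n) /\\ AND_{i=n}^{n+k-1}(q(s_i) /\\ T) /\\ ¬q(s_{n+k}) satisfiable."""
    def extend(path: List[State]) -> bool:
        if len(path) == k + 1:
            return not q(path[-1])
        return q(path[-1]) and any(extend(path + [t]) for t in SUCC[path[-1]])
    return {s for s in STATES if inv(s) and extend([s])}


def layers(init: Set[State], k: int) -> List[Set[State]]:
    """layers[j] = end states of paths with exactly j transitions (j = 0..k)."""
    result = [set(init)]
    for _ in range(k):
        result.append({t for s in result[-1] for t in SUCC[s]})
    return result


def negated_lift(q: Pred, inv: Pred, k: int) -> Pred:
    """New proof obligation ¬C, where C is the set of all CTIs for q."""
    c = frozenset(cti_states(q, inv, k))
    return lambda s: s not in c


def ki_pdr(init: Set[State], pd: bool, k_init: int = 1, k_max: int = 5,
           inc: Callable[[int], int] = lambda k: k + 1,
           external_inv: Pred = lambda s: True) -> Optional[bool]:
    internal_invs: List[Pred] = []
    obligations: List[Pred] = []
    inv: Pred = lambda s: all(p(s) for p in internal_invs) and external_inv(s)
    k = k_init
    while k <= k_max:                                      # line 1
        previous, obligations = obligations, []            # lines 2-3
        reach = layers(init, k)
        if any(not prop(s) for layer in reach[:k] for s in layer):
            return False                                   # lines 4-6: base case
        if not reach[k]:
            return True                                    # lines 7-9: forward condition
        if pd:
            for o in previous:                             # lines 10-11
                if any(not o(s) for layer in reach[:k] for s in layer):
                    return False                           # lines 12-14
                if cti_states(o, inv, k):                  # lines 16-19
                    obligations.append(negated_lift(o, inv, k))   # lines 20-21
                else:
                    internal_invs.append(o)                # line 23, strengthen = identity
        if cti_states(prop, inv, k):                       # lines 24-27
            if pd:
                obligations.append(negated_lift(prop, inv, k))    # lines 28-30
        else:
            return True                                    # line 32
        k = inc(k)                                         # line 33
    return None                                            # line 34: unknown
```

Note the eager increment of k in line 33: even while proof obligations are pending, the next iteration already works with a longer unrolling.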

## 4 Evaluation

In this section, we present an extensive experimental study on the effectiveness and efficiency of adaptations of PDR to software verification.

#### 4.1 Compared Approaches

We use the following abbreviations to distinguish between the different techniques that we evaluated:


We do not evaluate the used invariant generators as standalone approaches, as they are designed specifically to be used as auxiliary components and do not perform well enough in isolation. For example, data-flow-based invariant-generation approaches are often too imprecise to verify tasks on their own, whereas more precise techniques like KIPDR might run into too many timeouts to be competitive. Instead, we use the framework of k-induction with continuously refined invariant generation, which has been shown to be able to combine quick and precise techniques [5].

#### 4.2 Experimental Setup

Details about the experimental setup can be found in the technical report [8], which describes in Sect. 4.2 which tool versions and SMT theory we used, in Sect. 4.3 which benchmark sets we used and why, in Sect. 4.4 which existing verifiers we compared to and which versions we took, in Sect. 4.5 which computing resources and execution environment were used, in Sect. 4.6 the scoring schema, and in Sect. 4.12 which threats to the validity of the evaluation we identified and how we mitigated them.

### 4.3 Results

In the following, we pick a few highlights from the results of our experimental evaluation, in order to illustrate the potential of the approaches. A complete and more detailed report of the results is available in the extended version of this article [8].

Table 1: Results for all 5 591 verification tasks, 1 457 of which contain bugs, while the other 4 134 are considered to be safe, for the two CTIGAR implementations CPAchecker-CTIGAR and Vvt-CTIGAR, for a theoretical "virtual best" combination of both CTIGAR implementations where an oracle selects the best implementation for each task, for k-induction without auxiliary invariants (KI), and for the best configurations of each tool: CPAchecker's KI←−- DF;KIPDR, SeaHorn, and Vvt as a portfolio verifier.


Suitability of CPAchecker for PDR. The first set of experiments showed that our implementation is at least as good as, and in fact better than, the only other available implementation of CTIGAR for software model checking (Vvt-CTIGAR). Columns two and three of Table 1 compare the results obtained by running the two implementations of CTIGAR on the whole benchmark set, and the last column of the table shows the results achieved with the standard configuration of Vvt, which runs not only CTIGAR, but a portfolio analysis of CTIGAR and bounded model checking. The quantile plot in Fig. 3 shows the CPU times that the two tool configurations spent on their correct results.

KIPDR versus Data-Flow Techniques. Data-flow-based techniques are usually more efficient than KIPDR, most likely because of the simple form of the invariants needed to prove the programs correct. To experiment with programs that require more interesting invariants, we created a few programs by hand and tried to verify them. Table 2 shows the results we obtained for these tasks. Our experiments support the hypothesis that KIPDR can be very strong and efficient on tasks that other approaches cannot solve. It is important to note that this is an existential statement and cannot be generalized, as the results show that KIPDR is often outperformed by simpler, data-flow-based invariant-generation techniques.

Fig. 3: Comparing two implementations of CTIGAR; quantile plot for accumulated number of solved tasks (proofs and alarms) showing the CPU time (linear scale below 1 s, logarithmic above) for the successful results of CPAchecker-CTIGAR and Vvt-CTIGAR

Table 2: Results of four k-induction-based configurations in CPAchecker with different approaches for generating auxiliary invariants for seven manually crafted verification tasks that do not contain bugs and are not solved by k-induction without auxiliary invariants; an entry "T" means that the CPU-time limit was exceeded, an entry "M" means that the memory limit was exceeded, and all other entries represent the CPU time a configuration spent to correctly solve the task


Comparison with Non-PDR Approaches. The seven example programs<sup>8</sup> were added to the benchmark collection that was also used for SV-COMP 2019, and thus, results are available for all verifiers that participated in the competition<sup>9</sup>. Table 3 summarizes the results of the best six verifiers in comparison with the KI←−- KIPDR approach that we created for the study in this paper. Those verifiers are, in alphabetical order, Skink, Ultimate Automizer, Ultimate Kojak, Ultimate Taipan, VeriAbs, and VIAP. Fig. 4a directly compares the CPU times that VeriAbs, the best verifier in the subcategory ReachSafety-Loops, and KI←−- KIPDR spent on tasks in that subcategory, which is known to contain many tasks that require effort to be spent on generating loop invariants. We observe that for the majority of tasks that were solved by both verifiers, KI←−- KIPDR is faster than VeriAbs, often by more than an order of magnitude. This shows that, depending on the benchmark set, the invariant generator KIPDR can be significantly faster than other approaches. As before, a more in-depth discussion can be found in the technical report [8].

<sup>8</sup> https://github.com/sosy-lab/sv-benchmarks/tree/svcomp19/c/loop-invariants/

<sup>9</sup> See the last seven rows in this table: https://sv-comp.sosy-lab.org/2019/results/results-verified/ReachSafety-Loops.table.html

Table 3: Results of SV-COMP 2019 for the six verifiers that performed best on our seven manually crafted verification tasks, compared to the results of KI←−- KIPDR approach previously shown in Table 2; an entry "T" means that the CPU-time limit was exceeded, an entry "M" means that the memory limit was exceeded, an entry "O" means that the verifier gave up deliberately for other reasons, and all other entries represent the CPU time a verifier configuration spent to correctly solve the task; note that SV-COMP 2019 used Ubuntu 18.04 based on Linux 4.15, whereas our evaluation of KI←−- KIPDR used Ubuntu 16.04 based on Linux 4.4; otherwise, the evaluation environment was the same

Comparison against PDR-Based Verification Tools. The last three columns of Table 1 give an overview of the best configurations of three software verifiers that use adaptations of PDR: for CPAchecker, we selected KI←−- DF;KIPDR. For SeaHorn, we used the configuration submitted by the developers to the 2016 Competition on Software Verification (SV-COMP 2016) [22]. For Vvt, we used the portfolio configuration. We observe that SeaHorn achieves the highest number of correct proofs, but also has a significant number of incorrect proofs. CPAchecker is the slowest of the three tools and finds fewer proofs than SeaHorn, but CPAchecker has no wrong proofs and narrowly leads in the number of bugs found. The score-based quantile plot of these results in Fig. 4b visualizes the effect of incorrect results on the computed score. While the graph for SeaHorn is longer, i.e., SeaHorn solved the most tasks, it is offset to the left by a total penalty of −3 344 points, such that in the end, KI←−- DF;KIPDR accumulates the highest score because it has a smaller penalty of only −32 points.
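How such a score-based quantile plot is built can be sketched as follows. The Python fragment assumes an SV-COMP-style scoring schema (+2 for a correct proof, +1 for a correct alarm, −16 for a wrong alarm, −32 for a wrong proof); the schema actually used in this evaluation is documented in the technical report [8], and the three results in the usage example are hypothetical. The graph starts at the total penalty and adds the points of the correct results in order of increasing CPU time, so its right end is the tool's total score.

```python
from typing import Dict, List, Tuple

# Assumed SV-COMP-style scoring schema (see the lead-in).
POINTS: Dict[str, int] = {
    "correct_true": 2,     # correct proof
    "correct_false": 1,    # correct alarm
    "wrong_false": -16,    # wrong alarm
    "wrong_true": -32,     # wrong proof
}


def score_quantile_points(
        results: List[Tuple[str, float]]) -> Tuple[int, List[Tuple[int, float]]]:
    """Return (total penalty, [(accumulated score, CPU time)]) for a score-based quantile plot."""
    penalty = sum(POINTS[kind] for kind, _ in results if kind.startswith("wrong"))
    correct = sorted((cpu, POINTS[kind]) for kind, cpu in results
                     if kind.startswith("correct"))
    points: List[Tuple[int, float]] = []
    x = penalty                          # the whole graph is shifted left by the penalty
    for cpu, pts in correct:
        x += pts
        points.append((x, cpu))
    return penalty, points


penalty, pts = score_quantile_points(
    [("correct_true", 1.0), ("correct_false", 0.5), ("wrong_true", 2.0)])
```

For these hypothetical results, one wrong proof shifts the graph to −32, and the two correct results move it to a final score of −29.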

(a) Scatter plot comparing the CPU times spent on tasks by VeriAbs and KI←−-KIPDR

(b) Quantile plot for accumulated score of solved tasks (offset to the left by total penalty from wrong results) showing the CPU time (linear scale below 1 s, logarithmic above) for the successful results of KI←−- DF;KIPDR, SeaHorn, and Vvt-Portfolio

Fig. 4: Plots that support the claim that the conclusions of the evaluation are relevant

These results confirm our hypothesis that our previous conclusions are relevant, because they are supported by an implementation that is competitive when compared to the best available PDR-based tool implementations.

## 5 Conclusion

Property-directed reachability (a.k.a. IC3) is a verification approach that is popular and successful in some fields of formal verification (e.g., hardware designs, Horn clauses). Unfortunately, there is a large gap between this success story and the applicability in practical software verification. We are closing this gap by (a) providing a well-engineered implementation of one published adaptation of PDR to software verification, (b) designing and implementing an invariant generator based on the ideas of PDR, and (c) providing an evaluation of all applicable tools and approaches on the largest available benchmark set of C verification tasks. This provides a good foundation and baseline for ongoing research in this area.

The results of our comparative evaluation extend the knowledge about PDR for software verification in the following ways: (1) Our implementation outperforms the existing implementation of PDR (Vvt) and is more precise than the other software verifier that uses PDR (SeaHorn). Thus, our implementation can serve as a reference implementation for further research on PDR for software verification. (2) On most of the programs in the widely used sv-benchmarks collection of verification tasks, other techniques are more effective (solve more problems) and more efficient (solve the problems faster). (3) PDR can be an effective and efficient technique for computing invariants that are difficult to obtain: there are programs for which our PDR-based approach is more efficient than the best invariant generator from SV-COMP in the subcategory ReachSafety-Loops.

#### 5.1 Data Availability Statement

A replication package for this article including all evaluated implementations and BenchExec is available at Zenodo [9]. Current versions of CPAchecker are available at https://github.com/sosy-lab/cpachecker. The benchmark set of SV-COMP 2018 used in Sect. 4 is available online at https://github.com/sosy-lab/sv-benchmarks/releases/tag/svcomp18 and the dataset from SV-COMP 2019 [4] that we analyzed is available at https://sv-comp.sosy-lab.org/2019/results/results-verified/All-Raw.zip.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Verifying Array Manipulating Programs with Full-Program Induction**

Supratik Chakraborty<sup>1</sup> , Ashutosh Gupta1, and Divyesh Unadkat1,<sup>2</sup>

<sup>1</sup> Indian Institute of Technology Bombay, Mumbai, India {supratik,akg}@cse.iitb.ac.in <sup>2</sup> TCS Research, Pune, India divyesh.unadkat@tcs.com

**Abstract.** We present a full-program induction technique for proving (a sub-class of) quantified as well as quantifier-free properties of programs manipulating arrays of parametric size N. Instead of inducting over individual loops, our technique inducts over the entire program (possibly containing multiple loops) directly via the program parameter N. Significantly, this does not require generation or use of loop-specific invariants. We have developed a prototype tool Vajra to assess the efficacy of our technique. We demonstrate the performance of Vajra vis-a-vis several state-of-the-art tools on a set of array manipulating benchmarks.

## **1 Introduction**

Programs with loops manipulating arrays are common in a variety of applications. Unfortunately, assertion checking in such programs is undecidable. Existing tools therefore use a combination of techniques that work well for certain classes of programs and assertions, and yield conservative results otherwise. In this paper, we present a new technique to add to this arsenal of techniques. Specifically, we focus on programs with loops manipulating arrays, where the size of each array is a symbolic integer parameter N (> 0). We allow (a subclass of) quantified and quantifier-free pre- and post-conditions that may depend on the symbolic parameter N. Thus, the problem we wish to solve can be viewed as checking the validity of a parameterized Hoare triple {ϕ(N)} P<sub>N</sub> {ψ(N)} for all values of N (> 0), where the program P<sub>N</sub> computes with arrays of size N, and N is a free variable in ϕ(·) and ψ(·). Fig. 1(a) shows an example of one such Hoare triple, written using assume and assert. This triple effectively verifies that

$$\sum_{j=0}^{i-1} \left( 1 + \sum_{k=0}^{j-1} 6 \cdot (k+1) \right) = i^3$$

for all i ∈ {0, ..., N − 1}, and for all N > 0. Although each loop in Fig. 1(a) is simple, their sequential composition makes it difficult even for state-of-the-art tools like VIAP [26], VeriAbs [8], FreqHorn [10], Tiler [4], Vaphor [24], or Booster [1] to prove the post-condition correct. In fact, none of the above tools succeed in automatically proving the post-condition in Fig. 1(a). In contrast, the technique presented in this paper, called *full-program induction*, proves the post-condition in Fig. 1(a) correct within a few seconds.
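The arithmetic behind this identity can be made explicit in a short worked derivation (added for illustration; it is not part of the technique). Writing A[k], B[j], and C[i] for the values computed by the three loops of Fig. 1(a), we have A[k] = 6(k + 1), each B[j] is a difference of consecutive cubes, and the sum defining C[i] therefore telescopes:

$$B[j] = 1 + \sum_{k=0}^{j-1} 6(k+1) = 1 + 3j(j+1) = 3j^2 + 3j + 1 = (j+1)^3 - j^3$$

$$C[i] = \sum_{j=0}^{i-1} B[j] = \sum_{j=0}^{i-1} \left( (j+1)^3 - j^3 \right) = i^3$$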

Like several earlier approaches [29], full-program induction relies on mathematical induction to reason about programs with loops. However, the way in

```
// (a)
// assume(true)
1. for (int t1=0; t1<N; t1=t1+1) {
2.   if (t1==0) { A[t1] = 6; }
3.   else { A[t1] = A[t1-1]+6; }
4. }
5. for (int t2=0; t2<N; t2=t2+1) {
6.   if (t2==0) { B[t2] = 1; }
7.   else { B[t2] = B[t2-1]+A[t2-1]; }
8. }
9. for (int t3=0; t3<N; t3=t3+1) {
10.  if (t3==0) { C[t3] = 0; }
11.  else { C[t3] = C[t3-1]+B[t3-1]; }
12. }
// assert(forall i in 0..N-1, C[i] = i^3)

// (b)
// assume(true)
1. A[0] = 6;
2. B[0] = 1;
3. C[0] = 0;
// assert((C[0] = 0^3) and (B[0] = 1^3 - 0^3) and
//        (A[0] = 2^3 - 2*1^3 + 0^3))

// (c)
// assume((N > 1) and (C_Nm1[N-2] = (N-2)^3) and
//        (B_Nm1[N-2] = (N-1)^3 - (N-2)^3) and
//        (A_Nm1[N-2] = N^3 - 2*(N-1)^3 + (N-2)^3))
1. A[N-1] = A_Nm1[N-2] + 6;
2. B[N-1] = B_Nm1[N-2] + A_Nm1[N-2];
3. C[N-1] = C_Nm1[N-2] + B_Nm1[N-2];
// assert((C[N-1] = (N-1)^3) and
//        (B[N-1] = N^3 - (N-1)^3) and
//        (A[N-1] = (N+1)^3 - 2*N^3 + (N-1)^3))
```
**Fig. 1.** Original and simplified Hoare triples

which the inductive claim is formulated and proved differs significantly. Specifically, (i) we *do not require explicit or implicit loop-specific invariants* to be provided by the user or generated by a solver (viz. by constrained Horn clause solvers [21,15,10] or recurrence solvers [26,17]), (ii) we *induct on the full program* (possibly containing multiple loops) with parameter N and not on iterations of individual loops in the program, and (iii) we perform *non-trivial correct-by-construction code transformations*, whenever feasible, to simplify the inductive step of reasoning. The combination of these factors often reduces reasoning about a program with multiple loops to reasoning about one with fewer (sometimes even no) and "simpler" loops, thereby simplifying proof goals. In this paper, we demonstrate this, focusing on programs with sequentially composed, but non-nested loops.

As an illustration of simplifications that can result from applying full-program induction, consider the problem in Fig. 1(a) again. Full-program induction reduces checking the validity of the Hoare triple in Fig. 1(a) to checking the validity of two "simpler" Hoare triples, represented in Figs. 1(b) and 1(c). Note that the programs in Figs. 1(b) and 1(c) are loop-free. In addition, their pre- and post-conditions are quantifier-free. The validity of these Hoare triples (Figs. 1(b) and 1(c)) can therefore be easily proved, e.g., by bounded model checking [6] with a back-end SMT solver like Z3 [25]. Note that the value computed in each iteration of each loop in Fig. 1(a) is data-dependent on previous iterations of the respective loops. Hence, none of these loops can be trivially translated to a set of parallel assignments.
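Because both triples are loop-free and quantifier-free, discharging them is a matter of arithmetic. The hypothetical Python sketch below checks them with exact integer arithmetic instead of an SMT solver such as Z3; it is an illustration, not how Vajra discharges its verification conditions. For Fig. 1(c), the values assumed at index N − 2 and the values asserted at index N − 1 are polynomials of degree at most 3 in N, so if every conjunct of the assertion holds at four distinct values of N, the two sides of each conjunct are identical polynomials and the assertion holds for every N > 1.

```python
def base_case_holds() -> bool:
    """Fig. 1(b): the loop-free program for N = 1 and its post-condition."""
    A0, B0, C0 = 6, 1, 0                                        # lines 1-3
    return (C0 == 0 ** 3 and B0 == 1 ** 3 - 0 ** 3
            and A0 == 2 ** 3 - 2 * 1 ** 3 + 0 ** 3)


def inductive_step_holds_at(n: int) -> bool:
    """Fig. 1(c) for one concrete value N = n > 1."""
    # Values fixed by the pre-condition at index N-2.
    a_prev = n ** 3 - 2 * (n - 1) ** 3 + (n - 2) ** 3
    b_prev = (n - 1) ** 3 - (n - 2) ** 3
    c_prev = (n - 2) ** 3
    a = a_prev + 6                                              # line 1
    b = b_prev + a_prev                                         # line 2
    c = c_prev + b_prev                                         # line 3
    return (c == (n - 1) ** 3 and b == n ** 3 - (n - 1) ** 3
            and a == (n + 1) ** 3 - 2 * n ** 3 + (n - 1) ** 3)


def inductive_step_holds() -> bool:
    """Both sides of each conjunct are polynomials in N of degree <= 3,
    so agreement at four distinct points proves the identity for all N."""
    return all(inductive_step_holds_at(n) for n in (2, 3, 4, 5))
```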

Invariant-based techniques, viz. [13,16,23,7,14,30,2,19], are popularly used to reason about array manipulating programs. If we were to prove the assertion in Fig. 1(a) using such techniques, it would be necessary to use appropriate loop-specific invariants for each of the three loops in Fig. 1(a). The weakest loop invariants needed to prove the post-condition in this example are: ∀i ∈ [0 ... t1 − 1] (A[i] = 6i + 6) for the first loop (lines 1-4), ∀j ∈ [0 ... t2 − 1] (B[j] = 3j<sup>2</sup> + 3j + 1) ∧ (A[j] = 6j + 6) for the second loop (lines 5-8), and ∀k ∈ [0 ... t3 − 1] (C[k] = k<sup>3</sup>) ∧ (B[k] = 3k<sup>2</sup> + 3k + 1) for the third loop (lines 9-12). Unfortunately, automatically deriving such quantified non-linear loop invariants is far from trivial. Template-based invariant generators, viz. [12,9], are among the best performers when generating such complex invariants. However, their abilities are fundamentally limited by the set of templates from which they choose. We therefore choose not to depend on invariants for individual loops in our work at all. Instead of inducting over the iterations of each individual loop, we propose to reason about the entire program (containing one or more loops) directly, while inducting on the parameter N. Needless to say, each approach has its own strengths and limitations, and the right choice always depends on the problem at hand. Our experiments show that full-program induction is able to solve several difficult problem instances with an off-the-shelf SMT solver (Z3) at the back-end, whereas other techniques either fail to solve these instances or rely on sophisticated recurrence solvers.
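These three invariants can at least be tested by running the program. The hypothetical Python sketch below executes Fig. 1(a) for a concrete N and checks each invariant at every visit of the corresponding loop head, including the final visit at which the loop condition fails. This tests, but does not prove, the invariants; proving them for all N is exactly the hard step discussed above.

```python
from typing import List


def check(condition: bool, what: str) -> None:
    if not condition:
        raise AssertionError(f"invariant violated: {what}")


def run_fig1a(n: int) -> List[int]:
    """Execute Fig. 1(a) for N = n, checking the loop invariants at each loop head."""
    A, B, C = [0] * n, [0] * n, [0] * n
    t1 = 0
    while True:                                    # loop 1 (lines 1-4)
        check(all(A[i] == 6 * i + 6 for i in range(t1)), "loop 1")
        if not t1 < n:
            break
        A[t1] = 6 if t1 == 0 else A[t1 - 1] + 6
        t1 += 1
    t2 = 0
    while True:                                    # loop 2 (lines 5-8)
        check(all(B[j] == 3 * j * j + 3 * j + 1 and A[j] == 6 * j + 6
                  for j in range(t2)), "loop 2")
        if not t2 < n:
            break
        B[t2] = 1 if t2 == 0 else B[t2 - 1] + A[t2 - 1]
        t2 += 1
    t3 = 0
    while True:                                    # loop 3 (lines 9-12)
        check(all(C[k] == k ** 3 and B[k] == 3 * k * k + 3 * k + 1
                  for k in range(t3)), "loop 3")
        if not t3 < n:
            break
        C[t3] = 0 if t3 == 0 else C[t3 - 1] + B[t3 - 1]
        t3 += 1
    return C
```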

The primary contributions of our work can be summarized as follows.


*Related Work.* Earlier work on inductive techniques can be broadly categorized into those that require loop-specific invariants to be provided or automatically generated, and those that work without them. Requiring a "good" inductive invariant for every loop in a program effectively shifts the onus of assertion checking to that of invariant generation. Among techniques that do not require explicit inductive invariants or mid-conditions for each loop, some require loop invariants to be generated implicitly by a constraint solver. These include techniques based on constrained Horn clause solving [21,15,10,24], acceleration and lazy interpolation for arrays [1], and those that use inductively defined predicates and recurrence solving [26,17], among others. Thanks to the impressive capabilities of modern constraint solvers and the effectiveness of carefully tuned heuristics for stringing together multiple solvers, this approach has shown a lot of promise in recent years. However, at a fundamental level, these formulations still rely on solving for implicitly specified loop invariants, garbed as constraint solving problems. Yet other techniques, such as that in [28], truly do not depend on loop invariants being generated. In fact, the technique of [28] comes closest to our work in principle. However, [28] imposes severe restrictions on the input programs, and the example in Fig. 1 does not meet these restrictions. Therefore, the technique of [28] is applicable only to a small part of the program-assertion space over which our technique works. Techniques such as tiling [4] reason about one loop at a time and apply only when loops have simple data dependencies across iterations (called *non-interference* of tiles in [4]). Tiling effectively uses a slice of the post-condition of a loop as an inductive invariant, and also requires strong enough mid-conditions to be generated in the case of sequentially composed loops. We circumvent all of these requirements in the current work. For other techniques for analyzing array manipulating programs, see [7,19,18].

## **2 Overview of Full-program Induction**

Recall that our objective is to check the validity of the parameterized Hoare triple $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$ for all $N > 0$. At a high level, our approach works like any other inductive technique. Thus, we have a base case, where we verify that the parameterized Hoare triple holds for some small values of $N$, say $0 < N \le M$. We then hypothesize that $\{\varphi(N-1)\}\; P_{N-1}\; \{\psi(N-1)\}$ holds for some $N > M$, and try to show that this implies $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$. While this sounds simple in principle, there are several technical difficulties en route. Our contribution lies in overcoming these difficulties algorithmically for a large class of programs and assertions, thereby making *full-program induction* a viable and competitive technique for proving properties of array manipulating programs.

We rely on an important, yet reasonable, assumption that can be stated as follows: *For every value of* $N$ ($> 0$)*, every loop in* $P_N$ *can be statically unrolled a fixed number (say* $f(N)$*) of times to yield a loop-free program that is semantically equivalent to* $P_N$*.* Note that this does not imply that reasoning about loops can be translated into loop-free reasoning: in general, $f(N)$ is a non-constant function, and hence the number of unrollings of loops in $P_N$ may strongly depend on $N$. In our experience, loops in a vast majority of array manipulating programs (including Fig. 1(a)) satisfy the above assumption. Consequently, the base case of our induction reduces to checking a Hoare triple for a loop-free program. Such a check is easily achieved by compiling the pre-condition, program and post-condition into an SMT formula, whose (un)satisfiability can be checked with an off-the-shelf back-end SMT solver.

The inductive step is the most complex one, and is the focus of the rest of the paper. Recall that the inductive hypothesis asserts that $\{\varphi(N-1)\}\; P_{N-1}\; \{\psi(N-1)\}$ is valid. To make use of this hypothesis in the inductive step, we must relate the validity of $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$ to that of $\{\varphi(N-1)\}\; P_{N-1}\; \{\psi(N-1)\}$. We propose doing this, whenever possible, via two key notions: the "difference" program and the "difference" pre-condition. Given a parameterized program $P_N$, the "difference" program $\partial P_N$ is, intuitively, one such that $P_{N-1}; \partial P_N$ is semantically equivalent to $P_N$, where ";" denotes sequential composition. It turns out that for our purposes, the semantic equivalence alluded to above is not really necessary; it suffices to have $\partial P_N$ such that $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$ is valid iff $\{\varphi(N)\}\; P_{N-1}; \partial P_N\; \{\psi(N)\}$ is valid. We will henceforth use this interpretation of a "difference" program. The "difference" pre-condition $\partial\varphi(N)$ is a formula such that (i) $\varphi(N) \rightarrow (\varphi(N-1) \wedge \partial\varphi(N))$, and (ii) the execution of $P_{N-1}$ does not affect the truth of $\partial\varphi(N)$. Computing $\partial P_N$ and $\partial\varphi(N)$ is not easy in general, and we discuss this in detail in the rest of the paper.
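To make the notion of a difference program concrete, the sketch below reuses a hypothetical reconstruction of Fig. 1(a) (not the authors' code). Assuming, as in that reconstruction, that no loop body mentions $N$, we take $\partial P_N$ to be just the peeled last iteration of each loop, and check for small $N$ that running $P_{N-1}$ followed by this $\partial P_N$ leaves the same arrays as $P_N$. Arrays are modelled as Python lists of length $N$, and the helper names are illustrative.

```python
def run_p(n):
    """Hypothetical reconstruction of P_N from Fig. 1(a); arrays have n cells."""
    a, b, c = [0] * n, [0] * n, [0] * n
    for t1 in range(n):
        a[t1] = 6 if t1 == 0 else a[t1 - 1] + 6
    for t2 in range(n):
        b[t2] = 1 if t2 == 0 else b[t2 - 1] + a[t2 - 1]
    for t3 in range(n):
        c[t3] = 0 if t3 == 0 else c[t3 - 1] + b[t3 - 1]
    return a, b, c


def run_diff(n, state):
    """Hypothetical dP_N for n > 1: no loop body mentions N, so only the
    peeled last iteration of each loop remains (the else-branches apply)."""
    a, b, c = (list(arr) + [0] for arr in state)   # grow each array by one cell
    a[n - 1] = a[n - 2] + 6                       # peeled iteration t1 = N-1
    b[n - 1] = b[n - 2] + a[n - 2]                # peeled iteration t2 = N-1
    c[n - 1] = c[n - 2] + b[n - 2]                # peeled iteration t3 = N-1
    return a, b, c


# P_{N-1}; dP_N should leave the same final state as P_N.
print(all(run_diff(n, run_p(n - 1)) == run_p(n) for n in range(2, 30)))  # True
```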

Assuming we have $\partial P_N$ and $\partial\varphi(N)$ with the properties stated above, the proof obligation $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$ can now be reduced to proving $\{\varphi(N-1)\}\; P_{N-1}\; \{\psi(N-1)\}$ and $\{\psi(N-1) \wedge \partial\varphi(N)\}\; \partial P_N\; \{\psi(N)\}$. The first triple follows from the inductive hypothesis. Proving the second triple may, in general, require strengthening the pre-condition, say by a formula $Pre(N-1)$. Recalling that we are in the inductive step of mathematical induction, we formulate the new proof sub-goal in such a case as $\{(\psi(N-1) \wedge Pre(N-1)) \wedge \partial\varphi(N)\}\; \partial P_N\; \{\psi(N) \wedge Pre(N)\}$. While this is somewhat reminiscent of loop invariants, observe that $Pre(N)$ is *not* a loop-specific invariant. Instead, it is analogous to an invariant for the entire program, possibly containing multiple loops. Specifically, the above process strengthens both the pre- and post-condition of $\{\psi(N-1) \wedge \partial\varphi(N)\}\; \partial P_N\; \{\psi(N)\}$ simultaneously, using $Pre(N-1)$ and $Pre(N)$ respectively. The strengthened post-condition of the resulting Hoare triple may, in turn, require a new pre-condition $Pre'(N-1)$ to be satisfied. This process of strengthening the pre- and post-conditions of the Hoare triple involving $\partial P_N$ can be iterated until a fix-point is reached, i.e. until no further pre-conditions are needed for the parameterized Hoare triple to hold. While the fix-point was reached quickly for all benchmarks we experimented with, we also discuss how to handle cases where the above process may not converge easily. Since we effectively strengthen the pre-condition of the Hoare triple in the inductive step, for the overall induction to go through we must also check that the strengthened assertions hold at the end of each base case check. The technique described above is called *full-program induction*, and the following theorem guarantees its soundness.
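The need for strengthening can be observed on the same hypothetical reconstruction of Fig. 1(a). In the sketch below, $\psi(N)$ states $C[i] = i^3$ for all $i < N$, and the candidate $Pre(N)$, which we derived by hand for this illustration only, pins down $A[N-1]$ and $B[N-1]$. Random end states of $P_{N-1}$ that satisfy only $\psi(N-1)$ falsify the inductive step, whereas states that also satisfy $Pre(N-1)$ lead to $\psi(N) \wedge Pre(N)$. Random sampling is a test, not the validity check an SMT solver would perform; all names are ours.

```python
import random


def diff_step(n, a, b, c):
    """Hypothetical dP_N for the reconstruction (n > 1): peeled last iterations."""
    a, b, c = a + [0], b + [0], c + [0]      # P_N's arrays have one more cell
    a[n - 1] = a[n - 2] + 6
    b[n - 1] = b[n - 2] + a[n - 2]
    c[n - 1] = c[n - 2] + b[n - 2]
    return a, b, c


def psi(n, a, b, c):
    """psi(N): for all i in [0, N), C[i] = i^3."""
    return all(c[i] == i ** 3 for i in range(n))


def pre(n, a, b, c):
    """Candidate strengthening Pre(N): facts about index N-1 of A and B."""
    k = n - 1
    return a[k] == 6 * k + 6 and b[k] == 3 * k * k + 3 * k + 1


def random_state(n, strengthen, rng):
    """A random end state of P_{N-1} satisfying psi(N-1), optionally Pre(N-1)."""
    m = n - 1
    a = [rng.randint(-99, 99) for _ in range(m)]
    b = [rng.randint(-99, 99) for _ in range(m)]
    c = [i ** 3 for i in range(m)]           # psi(N-1) holds by construction
    if strengthen:
        a[m - 1] = 6 * (m - 1) + 6
        b[m - 1] = 3 * (m - 1) ** 2 + 3 * (m - 1) + 1
    return a, b, c


def step_ok(n, strengthen, rng, trials=200):
    """Sample the inductive-step triple; False means a failing state was found."""
    for _ in range(trials):
        a, b, c = diff_step(n, *random_state(n, strengthen, rng))
        if not psi(n, a, b, c) or (strengthen and not pre(n, a, b, c)):
            return False
    return True


rng = random.Random(0)
print(step_ok(10, strengthen=False, rng=rng))  # False: psi(N-1) alone is too weak
print(step_ok(10, strengthen=True, rng=rng))   # True on every sampled state
```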

**Theorem 1.** *Given* $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$*, suppose the following are true:*


*Then* $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$ *holds for all* $N \ge 1$*.*

*Proof.* For $0 < N \le M$, condition 3(a) ensures that $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$ holds. For $N > M$, note that by virtue of conditions 1 and 2(b), $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$ holds if $\{\varphi(N-1) \wedge \partial\varphi(N)\}\; P_{N-1}; \partial P_N\; \{\psi(N) \wedge Pre(N)\}$ holds. With $\psi(N-1) \wedge Pre(N-1)$ as a mid-condition, and by virtue of condition 2(a), the latter Hoare triple holds for $N > M$ if $\{\varphi(M)\}\; P_M\; \{\psi(M) \wedge Pre(M)\}$ holds and $\{\psi(N-1) \wedge Pre(N-1) \wedge \partial\varphi(N)\}\; \partial P_N\; \{\psi(N) \wedge Pre(N)\}$ holds for all $N > M$. Both these triples hold by virtue of conditions 3(b) and 3(c). ∎

## **3 Algorithms for Full-program Induction**

We now discuss the *full-program induction* algorithm, focusing on the generation of three crucial components: the *difference program* $\partial P_N$, the *difference pre-condition* $\partial\varphi(N)$, and the formula $Pre(N)$ for strengthening pre- and post-conditions.

#### **3.1 Preliminaries**

We consider array manipulating programs generated by the grammar shown below (adapted from [4]).

$$\begin{array}{rcl}
\mathsf{PB} & ::= & \mathsf{St} \\
\mathsf{St} & ::= & v := \mathsf{E} \mid A[\mathsf{E}] := \mathsf{E} \mid \mathsf{if}(\mathsf{BoolE})\ \mathsf{then}\ \mathsf{St}\ \mathsf{else}\ \mathsf{St} \mid \mathsf{St}\,;\,\mathsf{St} \mid \\
 & & \mathsf{for}\ (\ell := 0;\ \ell < \mathsf{E};\ \ell := \ell + 1)\ \{\, \mathsf{St1} \,\} \\
\mathsf{St1} & ::= & v := \mathsf{E} \mid A[\mathsf{E}] := \mathsf{E} \mid \mathsf{if}(\mathsf{BoolE})\ \mathsf{then}\ \mathsf{St1}\ \mathsf{else}\ \mathsf{St1} \mid \mathsf{St1}\,;\,\mathsf{St1} \\
\mathsf{E} & ::= & \mathsf{E}\ \mathsf{op}\ \mathsf{E} \mid A[\mathsf{E}] \mid v \mid \ell \mid c \mid N \\
\mathsf{op} & ::= & + \mid - \mid * \mid / \\
\mathsf{BoolE} & ::= & \mathsf{E}\ \mathsf{relop}\ \mathsf{E} \mid \mathsf{BoolE}\ \mathsf{AND}\ \mathsf{BoolE} \mid \mathsf{NOT}\ \mathsf{BoolE} \mid \mathsf{BoolE}\ \mathsf{OR}\ \mathsf{BoolE}
\end{array}$$

This grammar restricts programs to have non-nested loops. While this limits the set of programs to which our technique currently applies, a large class of useful programs, with possibly long sequences of loops, falls within the scope of our work. In fact, our technique also applies to a subclass of programs with nested loops; however, characterizing this class through a grammar is unwieldy, and we avoid doing so for clarity. A program $P_N$ is a tuple $(\mathcal{V}, \mathcal{L}, \mathcal{A}, \mathsf{PB}, N)$, where $\mathcal{V}$ is a set of scalar variables, $\mathcal{L} \subseteq \mathcal{V}$ is a set of scalar loop counter variables, $\mathcal{A}$ is a set of array variables, $\mathsf{PB}$ is the program body, and $N$ is a special symbol denoting a positive integer parameter. In the grammar shown above, we assume $A \in \mathcal{A}$, $v \in \mathcal{V} \setminus \mathcal{L}$, $\ell \in \mathcal{L}$ and $c \in \mathbb{Z}$. Furthermore, "relop" is assumed to be one of the relational operators, and "op" is an arithmetic operator from the set $\{+, -, *, /\}$. We also assume that each loop $L$ has a unique loop counter variable $\ell$, which is initialized at the beginning of $L$ and is incremented by 1 at the end of each iteration. Assignments in the body of $L$ are assumed not to update $\ell$. Finally, for each loop with termination condition $\ell < \mathsf{E}$, we assume that $\mathsf{E}$ is an expression in terms of $N$. We denote by $k_L(N)$ the number of times loop $L$ iterates in the program with parameter $N$. We verify Hoare triples of the form $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$, where $\varphi(N)$ and $\psi(N)$ are either universally quantified formulas of the form $\forall I\, (\Phi(I, N) \Rightarrow \Psi(\mathcal{A}, \mathcal{V}, I, N))$ or quantifier-free formulas of the form $\Xi(\mathcal{A}, \mathcal{V}, N)$. In the above, $I$ is a sequence of array index variables, $\Phi$ is a quantifier-free formula in the theory of arithmetic over integers, and $\Psi$ and $\Xi$ are quantifier-free formulas in the combined theory of arrays and arithmetic over integers.
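As an illustration of the syntactic restrictions above, the following Python sketch encodes a small abstract syntax for the grammar (expressions are kept as opaque strings) and checks two of the stated assumptions: loops are not nested, and no loop body assigns its own counter. The class and function names are hypothetical and not part of the authors' tool.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Assign:
    """v := E or A[E] := E; expressions are opaque strings in this sketch."""
    lhs: str
    rhs: str


@dataclass
class If:
    cond: str
    then: List["Stmt"]
    orelse: List["Stmt"]


@dataclass
class For:
    """for (counter := 0; counter < bound; counter := counter + 1) { body }"""
    counter: str
    bound: str
    body: List["Stmt"]


Stmt = Union[Assign, If, For]


def writes(stmts):
    """Names written by a statement list (array writes reported by array name)."""
    for s in stmts:
        if isinstance(s, Assign):
            yield s.lhs.split("[")[0]
        elif isinstance(s, If):
            yield from writes(s.then)
            yield from writes(s.orelse)
        else:
            yield from writes(s.body)


def in_fragment(stmts, inside_loop=False):
    """True iff loops are not nested and no loop body assigns its own counter."""
    for s in stmts:
        if isinstance(s, For):
            if inside_loop or s.counter in set(writes(s.body)):
                return False
            if not in_fragment(s.body, inside_loop=True):
                return False
        elif isinstance(s, If):
            if not (in_fragment(s.then, inside_loop)
                    and in_fragment(s.orelse, inside_loop)):
                return False
    return True


ok = [For("i", "N", [Assign("A[i]", "i")]), For("j", "N", [Assign("B[j]", "A[j]")])]
nested = [For("i", "N", [For("j", "N", [Assign("A[j]", "i")])])]
bad_counter = [For("i", "N", [Assign("i", "i + 1")])]
print(in_fragment(ok), in_fragment(nested), in_fragment(bad_counter))  # True False False
```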

Static single assignment (SSA) [27] is a well-known technique for renaming scalar variables such that a variable is written at most once in a program. For our purposes, we also wish to rename arrays so that each loop updates its own version of an array and multiple writes to an array element within the same loop happen on different versions of the array. Array SSA [20] renaming has been studied earlier in the context of compilers to achieve this goal. We propose using SSA renaming for both scalars and arrays as a pre-processing step of our analysis. Therefore, we assume henceforth that the input program is SSA renamed (for both scalars and arrays). We also assume that the post-condition is expressed in terms of these SSA renamed scalar and array variables.
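A minimal sketch of the renaming idea for straight-line code follows, assuming no branches or loops (so no φ-functions are needed). Each write creates a fresh version of its target, every use refers to the latest version, and an array-element write also reads the previous version of the array, since the elements it does not touch carry over. This is a simplification of Array SSA [20], not the pre-processing pass used in the paper; the function name is ours.

```python
from collections import defaultdict


def ssa_rename(stmts):
    """Minimal SSA renaming for straight-line code.

    stmts: list of (target, uses, is_array_write). Version 0 denotes the
    value on entry; each write bumps the target's version number.
    """
    version = defaultdict(int)
    renamed = []
    for target, uses, is_array_write in stmts:
        new_uses = [f"{u}{version[u]}" for u in uses]
        if is_array_write:
            # A[e] := ... keeps all other elements of the previous version of A.
            new_uses.append(f"{target}{version[target]}")
        version[target] += 1
        renamed.append((f"{target}{version[target]}", new_uses))
    return renamed


# x := N; A[x] := x; x := x + 1; A[x] := x
prog = [("x", ["N"], False), ("A", ["x"], True),
        ("x", ["x"], False), ("A", ["x"], True)]
for target, uses in ssa_rename(prog):
    print(target, uses)
# x1 ['N0']
# A1 ['x1', 'A0']
# x2 ['x1']
# A2 ['x2', 'A1']
```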

We represent a program using a *control flow graph* G = (Locs, Edges, μ), where Locs denotes the set of control locations (nodes) of the program, Edges ⊆ Locs×Locs×{**tt**, **ff**,U} represents the flow of control and μ : Locs → AssignSt ∪ BoolE annotates every node in Locs with either an assignment statement (of the form v := E or A[E] := E) from the set of assignment statements AssignSt, or a Boolean condition. Two distinguished control locations, called Start and End in Locs, represent the entry and exit points of the program. An edge (n1, n2, label) represents flow of control from n<sup>1</sup> to n<sup>2</sup> without any other intervening node. It is labeled **tt** or **ff** if μ(n1) is a Boolean condition, and is labeled U otherwise. If μ(n1) is a Boolean condition, there are two outgoing edges from n1, labeled **tt** and **ff** respectively, and control flows from n<sup>1</sup> to n<sup>2</sup> along (n1, n2, label) only if μ(n1) evaluates to label. If μ(n1) is an assignment statement, there is a single outgoing edge from n1, and it is labeled U. Henceforth, we use CFG to refer to the control flow graph.

A CFG may have cycles due to the presence of loops in the program. A *back-edge* of a loop is an edge from the node corresponding to the last statement in the loop body to the node representing the loop head. An *exit-edge* is an edge from the loop head to a node outside the loop body. An *incoming-edge* is an edge to the loop head from a node outside the loop body. We assume that every loop has exactly one *back-edge*, one *incoming-edge* and one *exit-edge*. For technical reasons, and without loss of generality, we also assume that the *exit-edge* of a loop always goes to a "nop" node (say, having a statement x = x;).
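The CFG conventions above can be captured in a few lines of Python. The sketch below (class and method names are ours) stores $G = (Locs, Edges, \mu)$ as a set of labelled edge triples, checks the out-edge rules for condition and assignment nodes, and finds back-edges with a depth-first search. It is shown on a single loop whose exit-edge leads to a "nop" node.

```python
TT, FF, U = "tt", "ff", "U"


class CFG:
    """G = (Locs, Edges, mu) with edges (n1, n2, label), label in {tt, ff, U}."""

    def __init__(self, locs, edges, mu, start="Start"):
        self.locs = set(locs)
        self.edges = set(edges)
        self.mu = dict(mu)          # node -> ("cond" | "assign", statement text)
        self.start = start

    def succ(self, n):
        return [(m, lbl) for (a, m, lbl) in self.edges if a == n]

    def well_formed(self):
        """Conditions have one tt and one ff out-edge; assignments one U out-edge."""
        for n, (kind, _text) in self.mu.items():
            labels = sorted(lbl for _m, lbl in self.succ(n))
            if labels != ([FF, TT] if kind == "cond" else [U]):
                return False
        return True

    def back_edges(self):
        """Edges whose target is still on the DFS stack, i.e. edges closing a loop."""
        found, seen, on_stack = set(), set(), set()

        def dfs(n):
            seen.add(n)
            on_stack.add(n)
            for m, lbl in self.succ(n):
                if m in on_stack:
                    found.add((n, m, lbl))
                elif m not in seen:
                    dfs(m)
            on_stack.discard(n)

        dfs(self.start)
        return found


# for (i := 0; i < N; i := i + 1) { A[i] := i }, exit-edge to a nop node
g = CFG(
    locs={"Start", "init", "head", "body", "incr", "nop", "End"},
    edges={("Start", "init", U), ("init", "head", U), ("head", "body", TT),
           ("body", "incr", U), ("incr", "head", U), ("head", "nop", FF),
           ("nop", "End", U)},
    mu={"init": ("assign", "i := 0"), "head": ("cond", "i < N"),
        "body": ("assign", "A[i] := i"), "incr": ("assign", "i := i + 1"),
        "nop": ("assign", "x := x")},
)
print(g.well_formed(), g.back_edges())  # True {('incr', 'head', 'U')}
```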

Given a program, the program dependence graph (or PDG) $G = (V, DE, CE)$ represents data and control dependencies among program statements. Here, $V$ denotes vertices representing assignment statements and Boolean expressions, $DE \subseteq V \times V$ denotes data dependence edges and $CE \subseteq V \times V$ denotes control dependence edges. Standard dataflow analysis identifies dependencies between program variables and thereby among statements. Dependence between statements updating array elements requires a more careful analysis. Let $S_1$ and $S_2$ be two statements in loops $L_1$ and $L_2$ such that there is a control-flow path from $S_1$ to $S_2$ in the CFG. Suppose $S_1$ is of the form $A[f(i_1, N)] = F(\ldots)$;, where $f$ is an array index expression, $i_1$ is the loop counter variable of $L_1$, and $F$ is an arbitrary expression. Suppose $S_2$ is of the form $X = G(A[g(i_2, N)])$;, where $X$ is a variable or array element, $G$ is an arbitrary expression, and $g$ is an array index expression.

**Definition 1.** *We say that* $S_2$ *in* $L_2$ *depends on* $S_1$ *in* $L_1$ *if there exist* $i_1, i_2$ *such that* $0 \le i_1 < k_{L_1}(N)$*,* $0 \le i_2 < k_{L_2}(N)$ *and* $f(i_1, N) = g(i_2, N)$*.*

The routine ComputeRefinedPDG, shown in Algorithm 1, constructs and refines the program dependence graph $G = (V, DE, CE)$ for the input program $P_N$. It uses the function ConstructPDG (line 1), based on the technique of [11], to create an initial graph. For a node $n$ in $G$, let *def*$(n)$ and *uses*$(n)$ refer to the sets of variables/array elements defined and used, respectively, in the statement/Boolean expression corresponding to $n$. Similarly, let *subscript*$(v, n)$ refer to the index expression of the array element $v$ referred to at node $n$. Predicate is-array$(v)$ evaluates to true if $v$ is an array element and to false if $v$ is a scalar variable. Note that lines 2-14 of ComputeRefinedPDG remove data dependence edges between nodes of $G$ that do not satisfy Definition 1.

### **Algorithm 1** ComputeRefinedPDG($P_N$ : Program)

1: G(V, DE, CE) := ConstructPDG($P_N$);
2: **if** ∃v, n, n'. (n, n') ∈ DE ∧ is-array(v) ∧ v ∈ def(n) ∧ v ∈ uses(n') **then**
3:   **if** n is part of a loop L **then**
4:     ℓ := loop counter of L;
5:     Let $\varphi(n)$ be the constraint $(0 \le \ell < k_L)$;
6:   **else**
7:     Let $\varphi(n)$ be true;
8:   **if** n' is part of a loop L' **then**
9:     ℓ' := loop counter of L';
10:    Let $\varphi'(n')$ be the constraint $(0 \le \ell' < k_{L'})$;
11:  **else**
12:    Let $\varphi'(n')$ be true;
13:  **if** $\varphi(n) \wedge \varphi'(n') \wedge (subscript(v, n) = subscript(v, n'))$ is unsatisfiable **then**
14:    $DE := DE \setminus \{(n, n')\}$; ▷ Remove dependence edges with non-overlapping subscripts
15: **return** G(V, DE, CE);

## **Algorithm 2** PeelAllLoops((Locs, Edges, μ) : CFG of $P_N$)

1: $P^p_N$ := (Locs_p, Edges_p, μ_p), where Locs_p = Locs, Edges_p = Edges, μ_p = μ; ▷ Copy of $P_N$
2: peelNodes := ∅;
3: **for** each loop L ∈ Loops($P^p_N$) **do**
4:   Let $k_L(N)$ be the expression for the iteration count of L in $P^p_N$;
5:   peelCount := Simplify($k_L(N) - k_L(N-1)$);
6:   **if** peelCount is non-constant **then throw** "Failed to peel non-constant number of iterations";
7:   $P^p_N$, Locs' := PeelSingleLoop($P^p_N$, L, $k_L(N-1)$, peelCount); ▷ Transforms loop L so that its last peelCount iterations are peeled (unrolled). PeelSingleLoop returns the updated CFG and the newly created CFG nodes for the peeled iterations.
8:   peelNodes := peelNodes ∪ Locs';
9: **return** $P^p_N$, peelNodes;
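The edge-pruning step of ComputeRefinedPDG (lines 2-14 of Algorithm 1) can be sketched as follows. Where the algorithm asks whether $\varphi(n) \wedge \varphi'(n') \wedge subscript(v, n) = subscript(v, n')$ is unsatisfiable, symbolically in $N$, this sketch merely enumerates $N = 1, \ldots, max\_n$ together with the loop-counter ranges. It therefore illustrates Definition 1 rather than implementing a sound check, and the edge encoding and function names are hypothetical.

```python
def subscripts_overlap(w_sub, w_iters, r_sub, r_iters, n):
    """Is phi(n) ∧ phi'(n') ∧ subscript(v, n) = subscript(v, n') satisfiable for this N?

    w_iters / r_iters return k_L(N) for the enclosing loop, or are None for a
    node outside any loop (phi = true; the counter argument is then ignored).
    """
    w_range = range(w_iters(n)) if w_iters else [0]
    r_range = range(r_iters(n)) if r_iters else [0]
    written = {w_sub(i, n) for i in w_range}
    return any(r_sub(i, n) in written for i in r_range)


def refine_edges(edges, max_n=30):
    """Keep an array data-dependence edge only if its subscripts overlap for some N.

    Bounded stand-in for the unsatisfiability query on line 13: it can wrongly
    drop an edge whose subscripts overlap only for N > max_n, so it is not sound.
    """
    return [e for e in edges
            if any(subscripts_overlap(e["w_sub"], e["w_iters"],
                                      e["r_sub"], e["r_iters"], n)
                   for n in range(1, max_n + 1))]


def per_n(n):
    return n                                  # k_L(N) = N for the loops below


edges = [
    # L1 writes A[i] (i < N); L2 reads A[i + N] (i < N): never overlap, dropped
    {"name": "disjoint", "w_sub": lambda i, n: i, "w_iters": per_n,
     "r_sub": lambda i, n: i + n, "r_iters": per_n},
    # L1 writes A[i]; a statement after the loops reads A[N - 1]: kept
    {"name": "overlap", "w_sub": lambda i, n: i, "w_iters": per_n,
     "r_sub": lambda i, n: n - 1, "r_iters": None},
]
print([e["name"] for e in refine_edges(edges)])  # ['overlap']
```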

#### **3.2 Core Modules in the Technique**

*Peeling the Loops.* To relate $P_N$ to $P_{N-1}$, we first ensure that the corresponding loops in both programs iterate the same number of times, by *peeling* extra iterations from the loops in $P_N$. This is done by the routine PeelAllLoops shown in Algorithm 2. The algorithm first makes a copy, viz. $P^p_N$, of the input CFG $P_N$. Let Loops($P^p_N$) denote the set of loops of $P^p_N$, and let $k_L(N)$ and $k_L(N-1)$ denote the number of times loop $L$ iterates in $P^p_N$ and $P^p_{N-1}$, respectively. The difference $k_L(N) - k_L(N-1)$, computed in line 5, gives the number of extra iterations of loop $L$ in $P^p_N$. If this difference is not a constant, we currently report a failure of our technique (line 6). Otherwise, the routine PeelSingleLoop transforms loop $L$ of $P^p_N$ as follows: it replaces the termination condition $(\ell < k_L(N))$ of $L$ by $(\ell < k_L(N-1))$, peels (or unrolls) the last $(k_L(N) - k_L(N-1))$ iterations of $L$, and adds control flow edges such that the peeled iterations are executed immediately after the loop body has been iterated $k_L(N-1)$ times. The transformed CFG is returned as the updated $P^p_N$ in line 7. In addition, PeelSingleLoop returns the set Locs' of all CFG nodes newly added while peeling loop $L$. The overall updated CFG and the set of all peeled nodes, obtained after peeling all loops in $P^p_N$, are returned in line 9.
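A sketch of the peeling step follows, assuming a loop body is given as a Python function of the loop counter and $N$. Here peel_count mirrors lines 5-6 of Algorithm 2, but instead of symbolic simplification it tests $k_L(N) - k_L(N-1)$ on sampled values of $N$ (an assumption of this sketch). The function run_peeled executes the first $k_L(N-1)$ iterations as a loop and the remaining constant number of iterations afterwards; the body and trip count in the usage lines are hypothetical.

```python
def peel_count(k, samples=range(2, 64)):
    """k_L(N) - k_L(N-1), which must be a constant (lines 5-6 of Algorithm 2).

    The algorithm simplifies this expression symbolically; sampling N is only
    a heuristic stand-in used in this sketch.
    """
    diffs = {k(n) - k(n - 1) for n in samples}
    if len(diffs) != 1:
        raise ValueError("Failed to peel non-constant number of iterations")
    return diffs.pop()


def run_peeled(body, k, n, state):
    """Execute loop L of P_N in peeled form: k_L(N-1) iterations as a loop,
    then the last peel_count(k) iterations, whose number is a fixed constant."""
    for i in range(k(n - 1)):
        body(i, n, state)
    first = k(n - 1)
    for j in range(peel_count(k)):      # constant count, so it can be fully unrolled
        body(first + j, n, state)
    return state


def body(i, n, a):                      # hypothetical loop body: A[i] := i * N
    a[i] = i * n


def k(n):                               # hypothetical trip count k_L(N) = 2N + 1
    return 2 * n + 1


original = [0] * k(5)
for i in range(k(5)):                   # the unpeeled loop of P_N with N = 5
    body(i, 5, original)
print(peel_count(k), run_peeled(body, k, 5, [0] * k(5)) == original)  # 2 True
```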

**Lemma 1.** $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$ *holds iff* $\{\varphi(N)\}\; P^p_N\; \{\psi(N)\}$ *holds.*


**Algorithm 3** ComputeAffected($P_N$ : Program, peelNodes : Peeled Statements)

*Affected Variable Analysis.* Before we discuss the generation of $\partial P_N$, we present an analysis that identifies variables/array elements that may take different values in $P_N$ and $P_{N-1}$. For example, the first $k_L(N-1)$ iterations of $L$ in $P_N$ may not be semantically equivalent to the (entire) $k_L(N-1)$ iterations of $L$ in $P_{N-1}$. This is because the semantics of statements in $L$ may depend on the value of $N$, either directly or indirectly. We call the variables/array elements updated in such statements *affected* variables. For every loop with statements having potentially different semantics in $P_N$ and $P_{N-1}$, the difference program $\partial P_N$ must have a version of the loop with statements that restore the effect of the first $k_L(N-1)$ iterations of $L$ in $P_N$ after the (entire) $k_L(N-1)$ iterations of $L$ in $P_{N-1}$ have been executed. Furthermore, for statements in $P_N$ that are not enclosed within loops but have potentially different semantics from the corresponding statements in $P_{N-1}$, $\partial P_N$ must also rectify the values of the variables/array elements updated in such statements.

Subroutine ComputeAffected, shown in Algorithm 3, computes the set of *affected* variables of $P_N$. We first construct the program dependence graph by calling the function ComputeRefinedPDG (line 1), defined in Algorithm 1. Let AffectedVars represent the set of *affected* variables/array elements. We initialize it (line 2) with the variable $N$, since its value differs between $P_N$ and $P_{N-1}$. For a node $n$ in the PDG $G$, we use *reaching-def*$(v, n)$ to refer to the set of nodes where the variable/array element $v$ is defined and from which the definition reaches its use at node $n$. In line 4, we collect the nodes in the graph that are not peeled from loops in $P_N$. The loop in lines 5-18 iterates over the collected nodes to identify affected variables. If a variable in the index expression of an array access is affected, then that array element is considered affected (lines 7-8). A definition at a node $n$ is affected (marked in line 11) if any variable $v$ used in the statement (checked in line 9) is defined in a *peeled* node (line 10). Similarly, if the reaching definition of $v$ is affected (line 12), the definition at $n$ is affected (line 13). A variable defined in terms of an affected variable is also deemed to be affected (lines 14-15). Finally, a variable definition that is control dependent on an affected variable is also considered affected (lines 16-18). The computation of affected variables is iterated until the set AffectedVars saturates.
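The body of Algorithm 3 is not reproduced here, so the following is only a simplified sketch of the saturation described above. It tracks whole arrays instead of individual elements and takes reaching definitions (assumed to be already refined as in Algorithm 1) as input. A non-peeled node becomes affected if it reads $N$, if a definition reaching it comes from a peeled node or an affected node, or if it is control dependent on an affected condition. The dictionary encoding and both example graphs are hypothetical.

```python
def compute_affected(nodes):
    """Simplified fixpoint in the spirit of ComputeAffected (Algorithm 3).

    nodes maps a node name to a dict with keys
      defs   - names defined at the node,
      uses   - names read directly at the node (only "N" matters here),
      reach  - nodes whose definitions reach this node's uses (refined PDG),
      ctrl   - condition nodes this node is control dependent on,
      peeled - True for nodes created by PeelAllLoops.
    """
    affected_nodes = set()
    changed = True
    while changed:                                   # iterate until saturation
        changed = False
        for name, d in nodes.items():
            if d.get("peeled") or name in affected_nodes:
                continue                             # peeled nodes are skipped
            hit = ("N" in d.get("uses", ())
                   or any(nodes[r].get("peeled") or r in affected_nodes
                          for r in d.get("reach", ()))
                   or any(c in affected_nodes for c in d.get("ctrl", ())))
            if hit:
                affected_nodes.add(name)
                changed = True
    return {"N"}.union(*(nodes[n]["defs"] for n in affected_nodes))


# Loop bodies that never read N, and no peeled write reaching them.
fig1_nodes = {
    "a": {"defs": {"A"}, "reach": {"a"}},
    "b": {"defs": {"B"}, "reach": {"b", "a"}},
    "c": {"defs": {"C"}, "reach": {"c", "b"}},
    "a_peel": {"defs": {"A"}, "reach": {"a"}, "peeled": True},
    "b_peel": {"defs": {"B"}, "reach": {"b", "a"}, "peeled": True},
}
print(compute_affected(fig1_nodes))  # {'N'}

# S[i] := S[i-1] + N reads N; T copies S; U is guarded by a test on S.
example2 = {
    "s": {"defs": {"S"}, "uses": {"N"}, "reach": {"s"}},
    "t": {"defs": {"T"}, "reach": {"s"}},
    "g": {"defs": set(), "reach": {"s"}},           # condition S[i] > 0
    "u": {"defs": {"U"}, "ctrl": {"g"}},
}
print(sorted(compute_affected(example2)))  # ['N', 'S', 'T', 'U']
```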

**Lemma 2.** *Variables/array elements not present in* AffectedVars *have the same value after* $k_L(N-1)$ *iterations of their enclosing loop (if any) in* $P_{N-1}$ *as in* $P_N$*.*

*Generating the Difference Program* $\partial P_N$*.* The routine ProgramDiff in Algorithm 4 shows how the difference program is computed. We peel each loop in the program and collect the list of peeled nodes (line 1) using Algorithm 2. We then compute the set of *affected* variables (line 2) using Algorithm 3. The difference program $\partial P_N$ inherits the skeletal structure of the program $P_N$ after peeling each loop (line 4). The algorithm then traverses the CFG of each loop in $P_N$ and removes from $\partial P_N$ the loops (lines 16-17) that do not update any *affected* variables. For every CFG node in the other loops, it determines the corresponding node type (assignment or branch) and acts accordingly (lines 7-14). To explain the intuition behind the steps of this algorithm, we use the convention that all variables and arrays of $P_{N-1}$ have the suffix Nm1 (for N-minus-1), while those of $P_N$ have the suffix N. This allows us to express variables/array elements of $P_N$ in terms of the corresponding variables/array elements of $P_{N-1}$ in a systematic way in $\partial P_N$, given that the intended composition is $P_{N-1}; \partial P_N$.

For assignment statements using simple arithmetic operators (+, -, \*, /), the sub-routine AssignmentDiff in Algorithm 4 computes a "difference" statement as follows. We assume that Nodes($L$) returns the set of CFG nodes in loop $L$. For every assignment statement of the form v = E; in $L$, a corresponding statement is generated in $\partial P_N$ that expresses v_N in terms of v_Nm1 and the difference (or ratio) between the versions of variables/arrays that appear as sub-expressions of E in $P_{N-1}$ and $P_N$. For example, the statement A_N[i] = B_N[i] + v_N; in $P_N$ gives rise to the "difference" statement A_N[i] = A_Nm1[i] + (B_N[i] - B_Nm1[i]) + (v_N - v_Nm1); in $\partial P_N$. Similarly, the statement A_N[i] = B_N[i] \* v_N; in $P_N$ gives rise to the "difference" statement A_N[i] = A_Nm1[i] \* (B_N[i] / B_Nm1[i]) \* (v_N / v_Nm1); under the assumption B_Nm1[i] \* v_Nm1 ≠ 0.
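The rules of AssignmentDiff (listed in Algorithm 4) can be sketched as a small string generator. The sketch assumes each operand is a plain variable or an array element such as B[i], appends the _N and _Nm1 suffixes, and performs none of the Simplify optimizations, so its output is parenthesized more heavily than the examples above. The function names are illustrative.

```python
INVERSE = {"+": "-", "-": "+", "*": "/", "/": "*"}


def ver(name, suffix):
    """Attach a version suffix: 'B[i]' -> 'B_N[i]', 'v' -> 'v_Nm1'."""
    base, bracket, rest = name.partition("[")
    return f"{base}_{suffix}{bracket}{rest}"


def assignment_diff(w, r1, op, r2):
    """Difference statement for w_N := r1_N op r2_N (no Simplify step)."""
    if op not in INVERSE:
        raise ValueError("Specified operator not handled")
    inv = INVERSE[op]
    # + and *: per-operand differences/ratios use invop, combined with op.
    # - and /: per-operand terms use op itself and are folded back with invop.
    term_op, outer = (inv, op) if op in ("+", "*") else (op, inv)
    d1 = f"({ver(r1, 'N')} {term_op} {ver(r1, 'Nm1')})"
    d2 = f"({ver(r2, 'N')} {term_op} {ver(r2, 'Nm1')})"
    return f"{ver(w, 'N')} = {ver(w, 'Nm1')} {outer} ({d1} {op} {d2});"


print(assignment_diff("A[i]", "B[i]", "+", "v"))
# A_N[i] = A_Nm1[i] + ((B_N[i] - B_Nm1[i]) + (v_N - v_Nm1));
```

The generated right-hand side can be evaluated directly: for w_N := a_N - b_N, it yields w_Nm1 + ((a_N - a_Nm1) - (b_N - b_Nm1)), which equals a_N - b_N whenever w_Nm1 = a_Nm1 - b_Nm1.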

There are additional kinds of statements that need special processing when generating $\partial P_N$. These relate to the accumulation of differences (or ratios). For example, if $P_N$ has a loop for(i = 0; i < N; i++) sum_N = sum_N + A_N[i]; then the difference A_N[i] - A_Nm1[i] is aggregated over all indices from 0 through $N-2$. In this case, the corresponding "difference" loop in $\partial P_N$ has the following form: sum_N = sum_Nm1; for(i = 0; i < N-1; i++) sum_N = sum_N + (A_N[i] - A_Nm1[i]);. A similar aggregation for multiplicative ratios can also be defined. Sub-routine AggregateAssignmentDiff in Algorithm 4 generates these "difference" statements.
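The aggregation rule can be checked by execution. In the sketch below, sum_difference runs the "difference" loop given above and then the peeled last iteration (i = N - 1), which $\partial P_N$ also contains after loop peeling. Starting from the final value of sum in $P_{N-1}$, it reproduces the sum computed by $P_N$. The function names and the list model of arrays are assumptions of this sketch.

```python
def sum_program(a):
    """P_N: for(i = 0; i < N; i++) sum_N = sum_N + A_N[i];  with N = len(a)."""
    total = 0
    for x in a:
        total = total + x
    return total


def sum_difference(sum_nm1, a_n, a_nm1):
    """The 'difference' loop from the text, followed by the peeled iteration i = N-1."""
    n = len(a_n)
    s = sum_nm1                              # sum_N = sum_Nm1;
    for i in range(n - 1):                   # for(i = 0; i < N-1; i++)
        s = s + (a_n[i] - a_nm1[i])          #   sum_N = sum_N + (A_N[i] - A_Nm1[i]);
    return s + a_n[n - 1]                    # peeled last iteration of P_N


a_nm1 = [1, 2, 3]          # A after P_{N-1}, with N = 4
a_n = [2, 4, 6, 8]         # A after P_N (its values may depend on N)
print(sum_difference(sum_program(a_nm1), a_n, a_nm1), sum_program(a_n))  # 20 20
```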

Note that expressions like (B_N[i] - B_Nm1[i]) or (v_N / v_Nm1) can often be simplified using the already generated part of $\partial P_N$. For example, if the already generated part has a statement of the form B_N[i] = B_Nm1[i] + expr1; or v_N = expr2 \* v_Nm1;, and if expr1 and expr2 are constants or functions of N and loop counters, then we can use expr1 for B_N[i] - B_Nm1[i] and expr2 for v_N / v_Nm1, respectively. We use these optimizations aggressively in the function Simplify used in AssignmentDiff and AggregateAssignmentDiff.

### **Algorithm 4** ProgramDiff($P_N$ : program)

1: $P_N$, peelNodes := PeelAllLoops($P_N$);
2: AffectedVars := ComputeAffected($P_N$, peelNodes);
3: Let the CFG of $P_N$ be (Locs, E, μ);
4: $\partial P_N$ := (Locs', E', μ'), where Locs' := Locs, E' := E, and μ' := ∅;
5: **for** each loop L ∈ Loops($P_N$) **do**
6:   **if** ∃v such that v is updated in L and v ∈ AffectedVars **then**
7:     **for** each node n ∈ Nodes(L) **do**
8:       st_N := μ(n);
9:       **if** st_N is of the form w_N := r1_N op r2_N **then**
10:        μ'(n) := AssignmentDiff(w_N := r1_N op r2_N);
11:      **else if** st_N is of the form w_N := w_N op r1_N, wherein w_N is a scalar **then**
12:        μ'(n) := AggregateAssignmentDiff(L, w_N := w_N op r1_N);
13:      **else** ▷ st_N is a conditional statement
14:        μ'(n) := BranchDiff(st_N, AffectedVars);
15:  **else** ▷ Remove loop L from the CFG of $\partial P_N$
16:    (n1, n, U) := IncomingEdge(L); (n, n2, **ff**) := ExitEdge(L);
17:    E' := E' \ {(n1, n, U), (n, n2, **ff**)} ∪ {(n1, n2, U)}; Locs' := Locs' \ Nodes(L);
18: **return** $\partial P_N$;

AssignmentDiff(w_N := r1_N op r2_N)
1: Let invop be the arithmetic inverse operator of op; ▷ + and − are inverse operators of each other, and so are × and ÷
2: **if** op ∈ {+, ×} **then**
3:   **return** w_N := w_Nm1 op (Simplify(r1_N invop r1_Nm1) op Simplify(r2_N invop r2_Nm1));
4: **else if** op ∈ {−, ÷} **then**
5:   **return** w_N := w_Nm1 invop (Simplify(r1_N op r1_Nm1) op Simplify(r2_N op r2_Nm1));
6: **else**
7:   **throw** "Specified operator not handled";

AggregateAssignmentDiff(L: loop, w_N := w_N op r1_N)
1: n_fresh := FreshNode(); μ'(n_fresh) := (w_N := w_Nm1); Locs' := Locs' ∪ {n_fresh};
2: (n', n'', U) := IncomingEdge(L);
3: E' := E' \ {(n', n'', U)} ∪ {(n', n_fresh, U), (n_fresh, n'', U)};
4: **if** op ∈ {+, ∗} **then**
5:   **return** w_N := w_N op Simplify(r1_N invop r1_Nm1);
6: **else if** op ∈ {−, ÷} **then**
7:   **return** w_N := w_N op Simplify(r1_N op r1_Nm1);
8: **else**
9:   **throw** "Specified operator not handled";

BranchDiff($st_N$ : branch condition, AffectedVars : set of affected variables)
1: Let n be the CFG node corresponding to $st_N$;
2: **if** (∃v such that v is read in $st_N$ and v ∈ AffectedVars) ∨ ($st_N \neq st_{N-1}$ is satisfiable) **then**
3:   **throw** "Branch conditions in $P_N$ and $P_{N-1}$ may not evaluate to same value";
4: **else**
5:   **return** $st_{N-1}$;

For every CFG node representing a conditional branch in $P_N$, subroutine BranchDiff is used to determine whether the result of the condition check can differ in $P_N$ and $P_{N-1}$. If not, the conditional statement can be retained as is in the "difference" program. Otherwise, our current technique cannot compute $\partial P_N$, and we report a failure (see the body of BranchDiff). For example, the conditional statement if (t3 == 0) in line 10 of Fig. 1(a) behaves identically in $P_{N-1}$ and $P_N$, and can therefore be used as is in the loop in the difference program.

**Lemma 3.** $\partial P_N$ *generated by* ProgramDiff *is such that, for all* $N > 1$*,* $\{\varphi(N)\}\; P_{N-1}; \partial P_N\; \{\psi(N)\}$ *holds iff* $\{\varphi(N)\}\; P_N\; \{\psi(N)\}$ *holds.*

### **Algorithm 5** SimplifyDiff($\partial P_N$ : difference program)

1: Let $\partial P_N$ be (Locs, E, μ);
2: $\partial P'_N$ := (Locs', E', μ'), where Locs' := Locs, E' := E, and μ' := μ;
3: **for** each loop L ∈ Loops($\partial P_N$) **do**
4:   (n1, n, U) := IncomingEdge(L); (n, n2, **ff**) := ExitEdge(L);
5:   **if** the loop body of L is of the form w_N := w_N op expr, wherein w_N is a scalar variable **then**
6:     n_acc := FreshNode();
7:     **if** op ∈ {+, −} **then**
8:       μ'(n_acc) := (w_N := w_N op Simplify($k_L(N-1) * \mathit{expr}$));
9:     **else if** op ∈ {∗, ÷} **then**
10:      μ'(n_acc) := (w_N := w_N op Simplify($\mathit{expr}^{k_L(N-1)}$));
11:    **else throw** "Specified operator not handled";
12:    E' := E' − {(n1, n, U), (n, n2, **ff**)} ∪ {(n1, n_acc, U), (n_acc, n2, U)};
13:    Locs' := (Locs' − Nodes(L)) ∪ {n_acc};
14:  **if** the loop body of L is of the form w_N := w_Nm1 or w_N := w_N **then**
15:    E' := E' − {(n1, n, U), (n, n2, **ff**)} ∪ {(n1, n2, U)}; Locs' := Locs' − Nodes(L);
16: **return** $\partial P'_N$;

*Simplifying the Difference Program.* While we have described a simple strategy to generate $\partial P_N$ above, the naively generated "difference" code may contain redundant statements. For example, we may have a loop like for (i=0; i < N-1; i++) A_N[i] = A_Nm1[i];. Our implementation aggressively optimizes away such redundant code, renaming variables/arrays as needed (see routine SimplifyDiff in Algorithm 5). The program $\partial P_N$ may also contain loops computing values of variables that can be accelerated. For example, we may have a loop for(i=0; i < N-1; i++) sum = sum + 1;. SimplifyDiff removes this loop and introduces the statement sum = sum + (N-1);. In many cases, this leaves $\partial P_N$ with fewer and simpler loops.
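The acceleration performed by SimplifyDiff replaces $k_L(N-1)$ iterations of w := w op expr with a single closed-form update. The sketch below assumes that expr is loop-invariant (it must not mention the loop counter or w) and compares the closed form against actually running the loop, using exact Fraction arithmetic so that division compares exactly. The function names are ours.

```python
from fractions import Fraction


def accelerate(op, expr, k):
    """Closed form of k iterations of  w := w op expr  (cf. lines 7-10 of SimplifyDiff).

    Only valid when expr is the same in every iteration (it may mention N, but
    not the loop counter or w); k plays the role of k_L(N-1).
    """
    if op == "+":
        return lambda w: w + k * expr
    if op == "-":
        return lambda w: w - k * expr
    if op == "*":
        return lambda w: w * expr ** k
    if op == "/":
        return lambda w: w / expr ** k
    raise ValueError("Specified operator not handled")


def iterate(op, expr, k, w):
    """Reference semantics: run the loop body k times."""
    step = {"+": lambda x: x + expr, "-": lambda x: x - expr,
            "*": lambda x: x * expr, "/": lambda x: x / expr}[op]
    for _ in range(k):
        w = step(w)
    return w


for op in "+-*/":
    w0, e = Fraction(7), Fraction(3)
    assert accelerate(op, e, 5)(w0) == iterate(op, e, 5, w0)

# The loop from the text, for(i=0; i < N-1; i++) sum = sum + 1;, with N = 10:
print(accelerate("+", 1, 10 - 1)(0))  # 9, i.e. sum + (N-1) starting from sum = 0
```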

**Lemma 4.** *Program* $\widetilde{\partial P}_N$ *generated by* SimplifyDiff *is such that, for all* $N > 1$*,* $\{\varphi(N)\}\; P_{N-1}; \widetilde{\partial P}_N\; \{\psi(N)\}$ *holds iff* $\{\varphi(N)\}\; P_{N-1}; \partial P_N\; \{\psi(N)\}$ *holds.*

*Generating the Difference Pre-condition* $\partial\varphi(N)$*.* We now present a simple syntactic algorithm, called SyntacticDiff, for generating the difference pre-condition $\partial\varphi(N)$. Although this suffices for all our experiments, for the sake of completeness, we later present a more sophisticated algorithm that generates $\partial\varphi(N)$ simultaneously with $Pre(N)$.

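Formally, given $\varphi(N)$, algorithm SyntacticDiff generates a formula $\partial\varphi(N)$ such that $\varphi(N) \rightarrow (\varphi(N-1) \wedge \partial\varphi(N))$. Observe that if such a $\partial\varphi(N)$ exists, then $\varphi(N) \rightarrow \varphi(N-1)$ holds as well; conversely, if $\varphi(N) \rightarrow \varphi(N-1)$ holds, then $\partial\varphi(N) = \mathit{True}$ trivially satisfies the requirement. Therefore, we can use the validity of $\varphi(N) \rightarrow \varphi(N-1)$ as a test to decide the existence of $\partial\varphi(N)$.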

If $\varphi(N)$ is of the syntactic form $\forall i \in \{0, \ldots, N\}\; \widehat{\varphi}(i)$, then $\partial\varphi(N)$ is easily seen to be $\widehat{\varphi}(N)$. If $\varphi(N)$ is of the syntactic form $\varphi_1(N) \wedge \cdots \wedge \varphi_k(N)$, then $\partial\varphi(N)$ can be computed as $\partial\varphi_1(N) \wedge \cdots \wedge \partial\varphi_k(N)$. Finally, if $\varphi(N)$ does not belong to any of these syntactic forms, or if condition 2(a) of Theorem 1 is violated by the heuristically computed $\partial\varphi(N)$, then we over-approximate $\partial\varphi(N)$ by $\mathit{True}$. For a large fraction of our benchmarks, the pre-condition $\varphi(N)$ was $\mathit{True}$, and hence $\partial\varphi(N)$ was also $\mathit{True}$.
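The syntactic rules above can be sketched in C over a hypothetical formula AST: a bounded universal $\forall i \in \{0, \ldots, N\}\; \widehat{\varphi}(i)$ yields $\widehat{\varphi}(N)$, a conjunction is handled conjunct-wise, and any other form falls back to `true`. The sketch omits the validity check of $\varphi(N) \rightarrow \varphi(N-1)$ and condition 2(a) of Theorem 1, which would need an SMT solver; the type and function names are our own, and `&&` stands for $\wedge$.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formula AST covering the forms SyntacticDiff recognizes. */
typedef enum { FORALL_UPTO_N, AND, OTHER } Kind;

typedef struct Formula {
    Kind kind;
    const char *body;               /* FORALL_UPTO_N: name of phi-hat; OTHER: raw text */
    const struct Formula *l, *r;    /* AND: the two conjuncts */
} Formula;

/* Writes the difference pre-condition of f into out (capacity n) and returns
   the length reported by snprintf. */
static int syntactic_diff(const Formula *f, char *out, size_t n) {
    assert(f != NULL && out != NULL);
    switch (f->kind) {
    case FORALL_UPTO_N:             /* forall i in {0..N} phi_hat(i)  ->  phi_hat(N) */
        return snprintf(out, n, "%s(N)", f->body);
    case AND: {                     /* difference distributes over conjunction */
        char a[128], b[128];
        syntactic_diff(f->l, a, sizeof a);
        syntactic_diff(f->r, b, sizeof b);
        return snprintf(out, n, "(%s && %s)", a, b);
    }
    default:                        /* unrecognised form: over-approximate by true */
        return snprintf(out, n, "true");
    }
}
```

For a conjunction of `forall i in {0..N} phi1(i)` and an unrecognised conjunct such as `x > 0`, the sketch produces `(phi1(N) && true)`.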

**Algorithm 6** FPIVerify($P_N$ : program, $\varphi(N)$ : pre-condn, $\psi(N)$ : post-condn)

```
1:  if base case check {φ(1)} P_1 {ψ(1)} fails then
2:    return "Counterexample found!";
3:  ∂φ(N) := SyntacticDiff(φ(N));
4:  ∂P_N := ProgramDiff(P_N);
5:  ∂P_N := SimplifyDiff(∂P_N);                       ▷ Simplify and accelerate loops
6:  i := 0;
7:  Pre_i(N) := ψ(N);
8:  P̂re_i(N) := True;                                 ▷ Cumulative conjoined pre-condition
9:  do
10:   if {P̂re_i(N − 1) ∧ ψ(N − 1) ∧ ∂φ(N)} ∂P_N {P̂re_i(N) ∧ ψ(N)} then
11:     return True;                                  ▷ Assertion verified
12:   i := i + 1;
13:   Pre_i(N − 1) := LoopFreeWP(Pre_{i−1}(N), ∂P_N); ▷ Dijkstra's WP sans WP-for-loops
14:   if no new Pre_i(N − 1) obtained then            ▷ Can happen if ∂P_N has a loop
15:     return FPIVerify(∂P_N, P̂re_{i−1}(N − 1) ∧ ψ(N − 1) ∧ ∂φ(N), P̂re_{i−1}(N) ∧ ψ(N));
16:   else
17:     P̂re_i(N) := P̂re_{i−1}(N) ∧ Pre_i(N);
18: while base case check {φ(1)} P_1 {P̂re_i(1)} passes;
19: return False;                                     ▷ Failed to prove by full-program induction
```

*Generating the Formula* $Pre(N-1)$*.* We use Dijkstra's weakest pre-condition computation to obtain $Pre(N-1)$ after the "difference" pre-condition $\partial\varphi(N)$ and the "difference" program $\partial P_N$ have been generated. The weakest pre-condition can always be computed using quantifier elimination engines in state-of-the-art SMT solvers like Z3 if $\partial P_N$ is loop-free. In such cases, we use a set of heuristics to simplify the calculation of the weakest pre-condition before invoking the quantifier elimination engine. If $\partial P_N$ contains a loop, it may still be possible to obtain the weakest pre-condition if the loop does not affect the post-condition. Otherwise, we compute as much of the weakest pre-condition as can be computed from the loop-free parts of $\partial P_N$, and then try to solve the problem recursively by invoking full-program induction on $\partial P_N$ with appropriate pre- and post-conditions.
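For loop-free code consisting only of scalar assignments, the weakest pre-condition can be computed by backward substitution, $wp(x := e, Q) = Q[e/x]$. The sketch below implements just that textbook rule over a tiny hypothetical expression AST (all names are our own); it is not Vajra's implementation, which, as described above, also handles arrays and relies on quantifier elimination in SMT solvers such as Z3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression AST over integer variables: constants, variables, +, * and ==. */
typedef enum { CONST, VAR, ADD, MUL, EQ } Op;
typedef struct Expr {
    Op op;
    long val;                   /* CONST: the constant */
    int var;                    /* VAR: index into the environment */
    const struct Expr *a, *b;   /* ADD, MUL, EQ: operands */
} Expr;

/* Fixed-size arena so the sketch needs no heap management. */
static Expr arena[256];
static int arena_used = 0;

static const Expr *mk(Op op, long val, int var, const Expr *a, const Expr *b) {
    assert(arena_used < (int)(sizeof arena / sizeof arena[0]));
    arena[arena_used] = (Expr){ op, val, var, a, b };
    return &arena[arena_used++];
}
static const Expr *mk_num(long v) { return mk(CONST, v, -1, NULL, NULL); }
static const Expr *mk_var(int x)  { return mk(VAR, 0, x, NULL, NULL); }
static const Expr *mk_bin(Op op, const Expr *a, const Expr *b) { return mk(op, 0, -1, a, b); }

/* e[r/x]: replace every occurrence of variable x in e by r. */
static const Expr *subst(const Expr *e, int x, const Expr *r) {
    switch (e->op) {
    case CONST: return e;
    case VAR:   return e->var == x ? r : e;
    default:    return mk_bin(e->op, subst(e->a, x, r), subst(e->b, x, r));
    }
}

/* One scalar assignment "x := rhs". */
typedef struct { int x; const Expr *rhs; } Assign;

/* wp(s_1; ...; s_n, Q), applying wp(x := e, Q) = Q[e/x] from the last statement back. */
static const Expr *wp(const Assign *prog, int n, const Expr *post) {
    const Expr *q = post;
    for (int k = n - 1; k >= 0; k--)
        q = subst(q, prog[k].x, prog[k].rhs);
    return q;
}

/* Evaluates e in an environment; == yields 1 or 0. */
static long eval(const Expr *e, const long *env) {
    switch (e->op) {
    case CONST: return e->val;
    case VAR:   return env[e->var];
    case ADD:   return eval(e->a, env) + eval(e->b, env);
    case MUL:   return eval(e->a, env) * eval(e->b, env);
    case EQ:    return eval(e->a, env) == eval(e->b, env);
    }
    return 0;
}
```

For the program `y := x + 1; x := y * 2` and the post-condition `x == 8`, `wp` yields `(x + 1) * 2 == 8`, which holds exactly when x = 3.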

*Verification by Full-program Induction.* The basic full-program induction algorithm is presented as routine FPIVerify in Algorithm 6. The main steps of this algorithm are: checking conditions 3(a), 3(b) and 3(c) of Theorem 1 (lines 1, 18 and 10), calculating the weakest pre-condition of the relevant part of the post-condition (line 13), and strengthening the pre-condition and post-condition with the weakest pre-condition thus calculated (line 17). Since the weakest pre-condition computed in every iteration of the loop ($Pre_i(N-1)$ in line 13) is conjoined to strengthen the inductive pre-condition ($\widehat{Pre}_i(N)$ in line 17), it suffices to compute the weakest pre-condition of $Pre_{i-1}(N)$ (instead of $\widehat{Pre}_i(N) \wedge \psi(N)$) in line 13. The possibly multiple iterations of strengthening pre- and post-conditions are carried out by the loop in lines 9-18. If the loop terminates via the return statement in line 11, the inductive claim has been successfully proved. If the loop terminates because the base case check in line 18 fails, we report that verification by full-program induction failed. If $\partial P_N$ has loops and no further weakest pre-conditions can be generated, we recursively invoke FPIVerify on $\partial P_N$ in line 15. This situation arises if, for example, we modify the example in Fig. 1(a) by having the statement `C[t3] = N;` (instead of `C[t3] = 0;`) in line 10. In this case, $\partial P_N$ has a single loop corresponding to the third loop in Fig. 1(a). The difference program of $\partial P_N$ is, however, loop-free, and hence the recursive invocation of full-program induction on $\partial P_N$ easily succeeds.

```
Algorithm 7 FPIDecomposeVerify(i : integer)
1:  do
2:    P̃re_i(N − 1), ∂φ̃_i(N) := NextDecomposition(Pre_i(N − 1));
3:    Check if (a) ∂φ̃_i(N) ∧ P̃re_i(N − 1) → Pre_i(N − 1),
4:             (b) φ(N) → (φ(N − 1) ∧ ∂φ̃_i(N) ∧ ∂φ(N)),
5:             (c) P_{N−1} does not update any variable or array element in ∂φ̃_i(N)
6:    if any check in lines 3-5 fails then
7:      if HasNextDecomposition(Pre_i(N − 1)) then
8:        continue;
9:      else
10:       return False;
11:   if {P̂re_{i−1}(N − 1) ∧ ψ(N − 1) ∧ Pre_i(N − 1) ∧ ∂φ(N)} ∂P_N {P̂re_{i−1}(N) ∧ ψ(N) ∧ P̃re_i(N)} then
12:     return True;                                      ▷ Assertion verified
13:   else
14:     P̂re_i(N) := P̂re_{i−1}(N) ∧ P̃re_i(N);
15:     i := i + 1;
16:     Pre_i(N − 1) := LoopFreeWP(P̃re_{i−1}(N), ∂P_N);   ▷ Dijkstra's WP sans WP-for-loops
17:     if {φ(1)} P_1 {P̂re_{i−1}(1) ∧ Pre_i(1)} does not hold then
18:       i := i − 1;
19:     else
20:       prev_∂φ(N) := ∂φ(N);
21:       ∂φ(N) := ∂φ̃_{i−1}(N) ∧ ∂φ(N);
22:       if FPIDecomposeVerify(i) returns False then
23:         i := i − 1; ∂φ(N) := prev_∂φ(N);
24:       else
25:         return True;
26: while HasNextDecomposition(Pre_i(N − 1));
27: return False;
```
*Generalized FPI Algorithm.* While algorithm FPIVerify suffices for all of our experiments, we may not always be so lucky. Specifically, even if $\partial P_N$ is loop-free, the analysis may exit the loop in lines 9-18 of FPIVerify by violating the base case check in line 18. To handle (at least partly) such cases, we propose the following strategy. Whenever a (weakest) pre-condition $Pre_i(N-1)$ is generated, instead of using it directly to strengthen the current pre- and post-conditions, we "decompose" it into two formulas $\widetilde{Pre}_i(N-1)$ and $\widetilde{\partial\varphi}_i(N)$ with a two-fold intent: (a) potentially weaken $Pre_i(N-1)$ to $\widetilde{Pre}_i(N-1)$, and (b) potentially strengthen the difference formula $\partial\varphi(N)$ to $\widetilde{\partial\varphi}_i(N) \wedge \partial\varphi(N)$. The checks for these intended usages of $\widetilde{Pre}_i(N-1)$ and $\widetilde{\partial\varphi}_i(N)$ are implemented in lines 3, 4, 5, 11 and 17 of routine FPIDecomposeVerify, shown as Algorithm 7. This routine is meant to be invoked as FPIDecomposeVerify(i) after each iteration of the loop in lines 9-18 of routine FPIVerify (so that $Pre_i(N)$, $\widehat{Pre}_i(N)$, etc. are initialized properly). In general, several "decompositions" of $Pre_i(N)$ may be possible, and some of them may work better than others. FPIDecomposeVerify permits multiple decompositions to be tried through the NextDecomposition and HasNextDecomposition functions. Lines 22-25 of FPIDecomposeVerify implement a simple backtracking strategy, allowing a search of the space of decompositions of $Pre_i(N-1)$. Observe that when we use FPIDecomposeVerify, we simultaneously compute a difference formula ($\widetilde{\partial\varphi}_i(N) \wedge \partial\varphi(N)$) and an inductive pre-condition ($\widehat{Pre}_{i-1}(N) \wedge \widetilde{Pre}_i(N)$).
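One simple way to realize NextDecomposition and HasNextDecomposition, assuming $Pre_i(N-1)$ is a conjunction of k conjuncts, is to enumerate every split of the conjuncts between $\widetilde{Pre}_i(N-1)$ and $\widetilde{\partial\varphi}_i(N)$ as a bitmask. This is our own simplified model, not necessarily the strategy used in the paper: it ignores the shift from $N-1$ to $N$, makes check (a) hold by construction (the two parts conjoin back to $Pre_i(N-1)$), implements check (c) over variable bitmasks, and leaves check (b) to a solver. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit j of a mask sends conjunct c_j to the strengthened difference formula;
   the remaining conjuncts form the weakened pre-condition. */
typedef struct {
    int k;            /* number of conjuncts, 0 <= k < 32 */
    unsigned next;    /* next mask to hand out */
} DecompIter;

static bool has_next_decomposition(const DecompIter *it) {
    assert(it->k >= 0 && it->k < 32);
    return it->next < (1u << it->k);
}

static unsigned next_decomposition(DecompIter *it) {
    assert(has_next_decomposition(it));
    return it->next++;
}

/* Check (c): P_{N-1} must not write any variable mentioned by a moved conjunct.
   vars_of[j] is the set of variables (as a bitmask) that c_j mentions. */
static bool passes_check_c(unsigned moved, const unsigned *vars_of, int k,
                           unsigned written_by_prefix) {
    for (int j = 0; j < k; j++)
        if (((moved >> j) & 1u) && (vars_of[j] & written_by_prefix))
            return false;
    return true;
}
```

With three conjuncts mentioning variables {v0}, {v1} and {v0, v1}, and a prefix $P_{N-1}$ that writes v0, only two splits pass check (c): moving nothing, or moving just the second conjunct.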

**Lemma 5.** *Algorithms* FPIVerify *and* FPIDecomposeVerify *ensure conditions 2 and 3 of Theorem 1 upon successful termination.*

While we have presented our technique focusing on a single symbolic parameter N, a straightforward extension works for multiple independent parameters, multiple independent array sizes, different induction directions, and non-uniform loop termination conditions. For more details, please refer to the long version of our paper at [3].

*Limitations.* There are several scenarios in which full-program induction may not produce a conclusive result. Currently, we only analyze programs with non-nested loops and +, −, ×, ÷ expressions in assignments. We also do not handle branch conditions that depend on the parameter N (this does not include loop conditions, which are handled by unrolling the loop). The technique also remains inconclusive when the difference program $\partial P_N$ does not have fewer loops than the original program. Reducing the verification complexity of the program, in terms of the number of loops and assignment statements dependent on N, is crucial to the success of full-program induction. Finally, our technique may fail to verify a correct program if the heuristics used for computing the weakest pre-condition either fail or return a pre-condition that violates the base case check in line 18 of FPIVerify. Despite these limitations, our experiments show that full-program induction performs remarkably well on a large suite of benchmarks.

## **4 Implementation and Experiments**

We have implemented our technique in a prototype tool called Vajra, available at [5]. It takes a C program in SV-COMP format as input. The tool, written in C++, is built on top of the LLVM/Clang [22] 6.0.0 compiler infrastructure and uses Z3 [25] v4.8.7 as the SMT solver to prove Hoare triples for loop-free programs.

We have evaluated Vajra on a test suite of 42 safe benchmarks inspired by different algebraic functions that compute polynomials, as well as by standard array operations such as copy, min, max and compare. Our programs take a symbolic parameter N, which specifies the size of each array as well as the number of times each loop executes. Assertions, possibly quantified, are (in)equalities over array elements, scalars and (non-)linear polynomial terms over N.
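To illustrate the kind of program described above, here is a hypothetical benchmark written as a plain C function; it is not one of the 42 benchmarks. Every loop is non-nested and runs N times, the arrays have size N, and the final loop checks a quantified assertion. An SV-COMP version would instead read N from `__VERIFIER_nondet_int()` and report a failed assertion through the error function rather than returning 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical benchmark: polynomial array values, an array copy, an
   accumulation over N, and the assertion
   sum == N*N  &&  forall i in [0, N): b[i] == i*i.
   Returns 1 if the assertion holds. */
static int benchmark(int N) {
    assert(N > 0);
    long *a = malloc(sizeof *a * (size_t)N);
    long *b = malloc(sizeof *b * (size_t)N);
    assert(a != NULL && b != NULL);
    long sum = 0;
    for (int i = 0; i < N; i++) a[i] = (long)i * i;   /* polynomial values */
    for (int i = 0; i < N; i++) b[i] = a[i];          /* array copy */
    for (int i = 0; i < N; i++) sum = sum + N;        /* sum == N*N */
    int ok = (sum == (long)N * N);
    for (int i = 0; i < N; i++)                        /* assertion loop */
        ok = ok && (b[i] == (long)i * i);
    free(a);
    free(b);
    return ok;
}
```

A full-program induction tool must establish the assertion for every N, whereas this function can only test individual values.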

All experiments were performed on an Ubuntu 18.04 machine with 16 GB RAM running at 2.5 GHz. We have compared Vajra against VIAP (v1.0) [26], VeriAbs (v1.3.10) [8], Booster (v0.2) [1], Vaphor (v1.2) [24] and FreqHorn (v3) [10]. C programs were manually converted to mini-Java, as required by Vaphor, and to constrained Horn clauses (CHCs), as required by FreqHorn. Our results are shown in Table 1. Vajra verified 36 benchmarks, compared to 23 verified by VIAP, 12 by VeriAbs, 8 by Booster, and 5 each by Vaphor and FreqHorn. Vajra was unable to compute the difference program for 5 benchmarks and was inconclusive on 1 benchmark.

**Table 1.** The first column is the benchmark name. The second column indicates the number of loops in the benchmark (excluding the assertion loop). Successive columns indicate the results generated by the tools and the time taken, where T1 is Vajra, T2 is VIAP, T3 is VeriAbs, T4 is Booster, T5 is Vaphor, and T6 is FreqHorn. ✓ indicates assertion safety, ✗ indicates assertion violation, **?** indicates an unknown result, and **-** indicates an abrupt stop. All times are in seconds. TO denotes a time-out of 100 s.

Vajra verified 17 benchmarks on which VIAP diverged, primarily because VIAP's heuristics could not obtain closed-form expressions. VIAP verified 4 benchmarks that the current version of Vajra could not verify due to syntactic limitations. However, Vajra is two orders of magnitude faster than VIAP on programs verified by both. Vajra proved 28 benchmarks on which VeriAbs diverged. VeriAbs ran out of time on programs where its loop shrinking and merging abstractions were not strong enough to prove the assertions. VeriAbs reported 1 program as unsafe due to the imprecision of its abstractions, and it proved 4 benchmarks that Vajra could not. Vajra verified 30 benchmarks that Booster could not. Booster reported 4 benchmarks as unsafe due to imprecise abstractions; its fixed-point computation engine reported an unknown result on 12 benchmarks, and it ended abruptly on 3 benchmarks. Booster also proved 2 benchmarks that the current version of Vajra could not handle due to syntactic limitations. Vajra verified 32 benchmarks on which Vaphor was inconclusive. The distinguished cell abstraction in Vaphor cannot prove the safety of programs in which the value at each array index needs to be tracked. Vaphor reported 9 programs as unsafe due to imprecise abstraction, returned unknown on 2 programs, and ended abruptly on 1 program. Vaphor proved one benchmark that Vajra could not. Vajra verified 32 programs on which FreqHorn diverged, especially when constants and terms that appear in the inductive invariant are not syntactically present in the program. FreqHorn ran out of time on 22 programs, reported an unknown result on 12, and ended abruptly on 3 benchmarks. FreqHorn verified a benchmark with a single loop that Vajra could not. On an extended set of 231 benchmarks, Vajra verified 110 of 121 safe programs, falsified 108 of 110 unsafe programs, and was inconclusive on the remaining 13 programs.

## **5 Conclusion**

We presented a novel property-driven verification method that performs induction over the entire program via the parameter N. Significantly, this obviates the need for loop-specific invariants. Experiments show that full-program induction performs remarkably well compared with state-of-the-art tools for analyzing array-manipulating programs. Further improvements in the algorithms for computing difference programs and for strengthening pre- and post-conditions are envisaged as future work.

## **Data Availability Statement**

The datasets generated and analyzed during the current study are available in the figshare repository: https://doi.org/10.6084/m9.figshare.11875428.v1

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Interpretation-Based Violation Witness Validation for C: NITWIT**

Jan Švejda, Philipp Berger, and Joost-Pieter Katoen

RWTH Aachen University, Germany {berger, katoen}@cs.rwth-aachen.de jan.svejda@rwth-aachen.de

**Abstract.** As software verification is gaining traction in academia and industry, the number and complexity of verification tools are growing constantly. This has sparked research and interest in exchangeable verification witnesses as well as in tools for automated witness validation. Initial witness validators used model checkers that were amended to benefit from guidance information provided by the witness. This approach comes with substantial overhead. Second-generation, execution-based validators gained speed at the cost of reduced strength for incomplete and non-exact witnesses. They did so by extracting test harnesses and compiling them with the original program. We present the nitwit tool, a new interpretation-based violation witness validator for C programs that is tuned to be fast and memory-efficient. It validates a record number of witnesses of SV-COMP'20 in the ReachSafety category. Our novel tool trades initial compilation overhead and optimized execution for rapid startup performance. nitwit borrows C semantics from the compiler used to build it. This offloads a hard-to-get-right task and makes it possible to use several compilers in parallel to inspect possible semantic differences.

## **1 Introduction**

*The importance of witnesses.* Model checking is a very successful automated verification technique with many applications. Its usage is rapidly increasing, and one may fairly argue that model checking has penetrated various industries. This is also true for software model checkers that, as opposed to first-generation model checkers, directly verify program code. Model checking is in particular a very effective *bug hunting technique*: if a property is violated, a counterexample is provided that witnesses the property's violation. Such counterexamples are therefore often called witnesses. As phrased by Clarke *et al.* [16]: "It is impossible to overestimate the importance of the counterexample feature. The counterexamples are invaluable in debugging complex systems. Some people use model checking just for this feature."

*Witness validation.* Early model checkers provided witnesses for safety properties such as "certain bad states should always be avoided" as finite paths that end in a bad state. A simple witness-steered simulation could reveal the flaw. Modern model checkers heavily use abstraction, so witnesses are no longer concrete but are rather phrased in terms of some abstract model. This is particularly true for software model checkers. Witnesses are in fact finite paths through an abstracted program, representing sets of paths in the concrete program that is to be verified. These sets may contain spurious concrete paths, which raises the question of whether witnesses are correct. Witness validation is the process of checking whether a witness produced by a software model checker indeed shows that the concrete program violates the property. Software model checkers that generate witnesses, such as CBMC and CPAchecker, are called *producers*, while tools that perform witness validation are called *validators*. With a single exception [12], existing validators are incorporated into or built directly on top of the existing software model checkers CPAchecker [13] or Ultimate Automizer [19,18,17].

*A format for witnesses.* To facilitate the validation of witnesses by different tools, a witness format has been developed that is nowadays used by many software model checkers. For safety properties as above, this format prescribes how to represent a witness for reaching a bad state. Thanks to this format, witnesses are exchangeable, and witness validation can be done using different techniques and tools. The format allows (i) a cross-platform exchange of information that enables "drop-in" replacement of tools, e.g., for visualization and review of results [10], (ii) validation of witnesses, which strengthens trust in verification results, especially if the verifier and validator use different techniques, and (iii) a significant number of false alarms to be caught by failed validation.

*Witness validation in software verification competitions.* For several years now, witnesses have been an important part of software verification competitions such as the annual TACAS Competition on Software Verification (SV-COMP) [2,3,4,5,6]. SV-COMP is a competition in automatic software verification in which academic, and also some industrial, software verifiers participate. In the 2019 edition [6], 31 verifiers participated in verifying 10 522 verification tasks for C programs (and 368 for Java programs). SV-COMP has different categories, such as reachability, memory and concurrency safety, absence of overflows, and termination. SV-COMP has included violation witnesses in its scoring schema since 2015 [3] and has kept them in the following editions [4,5,6]. This means that a verifier does not receive a point for a violated property unless the produced violation witness could be validated by at least one validator. This applies to all categories. To reflect that violation witnesses are expected to contain sufficient information for validation, the validators are granted only limited resources (e.g., only 10% of the time available for verification, and 7 GB of memory). Correctness witnesses were incorporated into the score evaluation in 2017 [5]; since that edition, validated correctness witnesses yield a bonus point for the producer.

*Contributions of this paper.* This paper presents the interpretation-based witness validator nitwit, which validates violation witnesses for reachability safety properties, as described above, of C programs. In contrast to most other validators, (a) it does not rely on an existing software model checker, and (b) it takes an interpretation-based approach. nitwit uses a custom extension of the *PicoC* interpreter that feeds a witness automaton with steering information during a step-by-step interpretation of the C program (see Figure 1). nitwit was evaluated on 11 533 violation witnesses in the ReachSafety category during SV-COMP 2020, and we compared its outcomes to those of the five other participating witness validators. nitwit validated more witnesses in this category (8 526 in total) than any of its competitors, and did so substantially faster. In addition, nitwit validated 399 witnesses that none of the five competitors could validate.

Fig. 1: High-level architecture of the nitwit Validator.

## **2 Background**

The need for portability of counterexamples and proofs between tools gave rise to a type of non-deterministic finite automaton (NFA) called a *witness automaton*, or simply a witness [11]. There are two types of witnesses: violation witnesses and correctness witnesses. In this paper, we focus on *violation witnesses*.

The concepts defined in this section follow the definitions of [22,11]. We represent programs by control-flow graphs (CFGs).

**Definition 1 (Control-flow graph).** *A* control-flow graph $C = (L, l_0, G, V)$ *consists of a finite set of locations* $L$*, an initial location* $l_0 \in L$*, and a set of edges* $G \subseteq L \times \mathit{Op} \times L$*, where* $\mathit{Op} = \{\mathit{skip}, \mathit{assume}(\varphi), \mathit{assign}(x, E)\}$ *with* $x \in V$*,* $\varphi$ *a predicate over the program variables* $V$*, and* $E$ *an expression over* $V$*.*

For example, in a CFG over $V = \{x, y\}$, an assignment may have the form $x := x + y$. The interpretation of a CFG is given by a (possibly countably infinite) transition system whose states are of the form $(l, v)$, where $l \in L$ and $v$ is a variable assignment over $V$. For the sake of brevity, we refrain from a formal definition.
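Definition 1 and the transition-system reading of a CFG can be sketched in C as follows. The encoding is our own assumption (the two variables x and y stored in an array, predicates and expressions as function pointers); `take` performs one transition $(l, v) \rightarrow (l', v')$ along an edge if that edge is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NVARS = 2 };                          /* V = {x, y}: x is v[0], y is v[1] */
typedef struct { long v[NVARS]; } Valuation;

typedef enum { SKIP, ASSUME, ASSIGN } OpKind;

typedef struct {
    int src, dst;                            /* locations in L */
    OpKind kind;
    bool (*phi)(const Valuation *);          /* ASSUME: predicate over V */
    int x;                                   /* ASSIGN: index of the assigned variable */
    long (*expr)(const Valuation *);         /* ASSIGN: expression over V */
} Edge;

/* One transition (l, v) -> (l', v') along edge g; returns false if g is not enabled. */
static bool take(const Edge *g, int *l, Valuation *v) {
    if (g->src != *l)
        return false;
    switch (g->kind) {
    case SKIP:
        break;
    case ASSUME:
        if (!g->phi(v))
            return false;
        break;
    case ASSIGN:
        assert(g->x >= 0 && g->x < NVARS);
        v->v[g->x] = g->expr(v);
        break;
    }
    *l = g->dst;
    return true;
}

/* Example operations: the expression x + y and the predicate x > 0. */
static long x_plus_y(const Valuation *s) { return s->v[0] + s->v[1]; }
static bool x_positive(const Valuation *s) { return s->v[0] > 0; }
```

An edge `{0, 1, ASSIGN, NULL, 0, x_plus_y}` encodes $x := x + y$ from location 0 to location 1; an `ASSUME` edge whose predicate is false simply does not fire.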

For a predicate $\varphi$ over $V$, let $v \models \varphi$ denote that $\varphi$ holds in valuation $v$. A witness automaton (WA) is a non-deterministic finite automaton that the validator runs in parallel with the CFG such that a program run violating the specification is accepted.

**Definition 2 (Witness automaton).** *A* witness automaton *(WA)* $A = (Q, \Sigma, \delta, q_0, q_E)$ *for a CFG* $C = (L, l_0, G, V)$ *is an NFA with states* $Q$*, initial state* $q_0 \in Q$*, transition function* $\delta : Q \times \Sigma \rightarrow 2^Q$ *as usual, accepting state* $q_E$*, and alphabet* $\Sigma \subseteq 2^G \times \Phi$*, where* $\Phi$ *is the set of predicates over* $V$*.*

The transitions of $A$ carry source-code guards and state-space guards [11], which identify program edges and constrain variable assignments, respectively. They correspond to pairs $(D_i, \varphi_i)$, where $D_i \subseteq G$ and $\varphi_i$ is a predicate over the program variables.

**Definition 3 (Simulation).** *Let* $A = (Q, \Sigma, \delta, q_0, q_E)$ *be a WA for a CFG* $C = (L, l_0, G, V)$ *and* $\rho = l_0 \xrightarrow{g_1} \cdots \xrightarrow{g_n} l_n$ *a path in* $C$*. The run* $q_0 \xrightarrow{\sigma_1} \cdots \xrightarrow{\sigma_n} q_n$ *in* $A$ simulates $\rho$ *iff, for all* $0 \le i < n$*,* $\sigma_{i+1} = (D_{i+1}, \varphi_{i+1})$ *with* $(l_i, g_{i+1}, l_{i+1}) \in D_{i+1}$ *and* $v_{i+1} \models \varphi_{i+1}$ *for some state* $(l_{i+1}, v_{i+1})$*. The run is* accepted *if* $q_n = q_E$*, and* $\mathcal{L}(A)$ *is the set of words* $\sigma_1 \ldots \sigma_n$ *for which* $A$ *has an accepting run.*

The path $l_0 \xrightarrow{g_1} \cdots \xrightarrow{g_n} l_n$ represents a set of concrete program executions $(l_0, v_0) \rightarrow \cdots \rightarrow (l_n, v_n)$, in which variable $x$ has value $v_i(x)$ in state $(l_i, v_i)$. The state conditions $\varphi_{i+1}$ restrict this set to those executions for which $v_{i+1} \models \varphi_{i+1}$ for all $i < n$. Thus, a predicate $\varphi_{i+1}$ constrains the concrete values in $C$.
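Definition 3 can be checked on a concrete path with the following sketch. It assumes at most 32 CFG edges and 32 WA states so that sets can be bitmasks, and it restricts state-space guards to a single variable x; all names are hypothetical. Because the WA is an NFA, the sketch tracks the set of reachable WA states in lockstep with the path and accepts if $q_E$ is among them at the end. As in Definition 3, every path edge must be matched by some transition.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int from, to;            /* WA states q and q' with q' in delta(q, sigma) */
    unsigned D;              /* source-code guard: set of CFG edge ids as a bitmask */
    bool (*phi)(long x);     /* state-space guard, here over a single variable x */
} WaTransition;

typedef struct { int edge; long x_after; } Step;   /* g_{i+1} and v_{i+1}(x) */

/* Returns true iff some run of the WA simulates the path and ends in q_E. */
static bool accepts(const WaTransition *t, int nt, int q0, int qE,
                    const Step *path, int n) {
    assert(q0 >= 0 && q0 < 32 && qE >= 0 && qE < 32);
    unsigned cur = 1u << q0;
    for (int i = 0; i < n; i++) {
        unsigned next = 0;
        for (int k = 0; k < nt; k++) {
            bool in_state = (cur >> t[k].from) & 1u;
            bool edge_ok = (t[k].D >> path[i].edge) & 1u;
            if (in_state && edge_ok && t[k].phi(path[i].x_after))
                next |= 1u << t[k].to;
        }
        cur = next;
        if (cur == 0)
            return false;    /* no transition matches this step */
    }
    return ((cur >> qE) & 1u) != 0;
}

static bool any_value(long x) { (void)x; return true; }
static bool is_seven(long x) { return x == 7; }
```

Here a self-loop on $q_0$ that accepts any edge, plus a transition to $q_E$ guarded by CFG edge 3 and $x = 7$, accepts exactly those paths whose last step takes edge 3 with $x = 7$.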

When a verifier checks a property, its output should preferably be not only *yes* or *no*, but also a program execution that leads to the property violation. Constructing a precise program execution path is not always easy, as various verification techniques apply abstractions. The witness format takes this into account: a witness represents a part of the state space that contains a property violation. The "narrower" the space it represents, the easier it is to re-verify that the property is truly violated. For example, a trivial witness automaton consisting of a single (accepting) state with a self-loop does not restrict the program's execution at all, so validating it essentially requires verification from scratch. On the other hand, a precise witness permits only program executions leading to an error state, thereby making validation as direct as possible.

**Definition 4 (Exact Witness).** *Let* $A = (Q, \Sigma, \delta, q_0, q_E)$ *be a WA for a CFG* $C = (L, l_0, G, V)$ *and* $L_E \subseteq L$ *a set of error locations. The WA* $A$ *is* exact *iff for all* $(D_1, \varphi_1) \ldots (D_n, \varphi_n) \in \mathcal{L}(A)$ *and every path* $l_0 \xrightarrow{g_1} \cdots \xrightarrow{g_n} l_n$ *of* $C$ *the following holds: if* $(l_i, g_{i+1}, l_{i+1}) \in D_{i+1}$ *and* $v_{i+1} \models \varphi_{i+1}$ *in state* $(l_{i+1}, v_{i+1})$ *for all* $0 \le i < n$*, then* $l_n \in L_E$*.*

## **3 Validators for Violation Witnesses**

Apart from a new format for exchanging verification results, [11] also presents a feasibility study that implements both a witness producer and a validator in two well-established tools, CPAchecker and Ultimate Automizer. Subsequently, [12] reports on two more validators that extract test harnesses from violation witnesses to perform validation. A test harness is compiled with the program to supply input values at runtime and to provide definitions for necessary external functions. This approach differs from tools using formal verification or model-checking techniques by offloading semantics to a compiler and investigating only a single path through the program. Validators that explore a single path through compilation and execution are called *execution-based* validators. In addition, a new validator, MetaVal<sup>1</sup>, was introduced in SV-COMP 2020; we do not describe it here because it has not yet been published, but we do include it in the benchmark evaluation. All five validators participated in SV-COMP.

**CPAchecker** This tool employs a so-called Configurable Program Analysis (CPA), which allows selecting the desired level of precision to control the trade-off between performance and spurious counterexamples [13]. When witness validation is enabled, it matches a witness automaton against the program's CFG. Afterwards, as part of the CPA, it strengthens the exploration with state-space guards from the witness at matched locations. According to [11], for example, its value analysis and predicate analysis can use this strengthening [15,14].

**Ultimate Automizer** This tool uses an automata-based approach to verification [19,18,17]. Prior to the analysis, it transforms programs into a variant of CFGs over an alphabet of program statements. Such a CFG, say $C_{\mathit{error}}$, recognizes control-flow traces (sequences of statements) that lead to a property violation. A control-flow trace is *feasible* if it is a run of $C_{\mathit{error}}$ and ends in an accepting error state. For validation, the tool creates a new CFG $C_w$ from the Cartesian product of $C_{\mathit{error}}$ and the witness automaton. Subsequently, the tool runs the same analysis over $C_w$ as for a usual verification run and validates the witness if an error trace is found. State-space guards over control edges (such as $\varphi_{i+1}$ in Definition 3) and source-code guards that characterize branching are ignored.

**CPA-witness2test** This tool builds on the verifier CPAchecker. It constructs a CFG and matches it against the witness, but does not perform a CPA analysis. It collects the input and initialization values from matched assumptions and assembles an ordered vector of values for every used nondeterministic function, which it then transforms into a switch statement supplied as the function's implementation. For uninitialized variables, which in C are also nondeterministic, no values are injected.

<sup>1</sup> https://gitlab.com/sosy-lab/software/metaval

In automatic software verification, programs are usually decorated with an external function `__VERIFIER_error` to identify a point that should never be reached, i.e., an error location. CPA-witness2test implements this function as a call to `exit(107)`, which immediately terminates the execution with return code 107. This signals the successful validation of a witness, because the error location was reached.
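A test harness of the kind described for CPA-witness2test could look roughly like the sketch below. The input values are made up, and returning 0 once the witness values are exhausted is our own assumption; only the switch-based nondeterministic function and the `exit(107)` convention come from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Nondeterministic input replaced by values taken from witness assumptions,
   returned in call order (the values here are illustrative only). */
int __VERIFIER_nondet_int(void) {
    static int call = 0;
    switch (call++) {
    case 0:  return 5;
    case 1:  return -3;
    case 2:  return 42;
    default: return 0;   /* assumption: no more witness values available */
    }
}

/* Reaching the error location ends the run with exit code 107, which the
   validator interprets as a successfully validated witness. */
void __VERIFIER_error(void) {
    fprintf(stderr, "__VERIFIER_error reached\n");
    exit(107);
}
```

The harness is compiled together with the original program; if execution reaches `__VERIFIER_error`, the process exits with code 107.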

**FShell-witness2test** This tool does not rely on an existing software model checker. It begins by reading the specification and parsing the program with pycparser<sup>2</sup>, a Python library for parsing C that constructs an abstract syntax tree (AST). This AST is traversed to find uninitialized variables and uses of nondeterministic functions. This yields *watch points*, indicating where variables need to be resolved in order to find the right concrete path. Once the watch points are established, the tool reads the provided witness and obtains a sequence of control states from the program start to the error state. The states of this sequence are then matched to the watch points found. For each such match, the tool tries to determine the watch point's value from a corresponding assumption in the witness. Finally, these values are added to a test vector, which is transformed into a test harness prepared for compilation. If the function `__VERIFIER_error` is called during execution, the witness is accepted.

## **4 Interpretation-based Witness Validation**

This section presents a new interpretation-based validator for violation witnesses of C programs with an embedded<sup>3</sup> reachability safety property. The validator is named Nitwit Validator (nitwit for short), derived from iNterpretation-based vIolaTion WITness Validator. The programs *must* designate the error location by a call to `__VERIFIER_error`. This allows nitwit to recognize that a program violates the invariant "begin in main and never call `__VERIFIER_error`". nitwit is restricted to such programs.

*A bird's eye view on* nitwit*.* Our implementation combines an existing C interpreter with a witness automaton (WA). The WA provides witness assumptions that are used to resolve variables according to the current position $(l_i, v_i)$ in the program execution. The WA is fed with information from the interpreter, which executes the C program step by step. During validation, both source-code and state-space guards are taken into account. When a state-space guard (an assumption) does not hold for the current variable values, the WA does not proceed. To illustrate, suppose an integer variable x is initialized to one and incremented on every line (with lines numbered from one). A witness control edge carrying the assumption x = 7 then matches only on line seven. Until then, it blocks the WA unless some other edge is satisfied. If, however, the assumption concerns a nondeterministic variable, we extract a value from it and resolve the nondeterminism in the interpreter. For example, if x is not initialized at all, the assumption x = 7 assigns it the value 7 already on line one.
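The example with x = 7 can be replayed with a small Python sketch. It makes several assumptions, all hypothetical. Interpreter callbacks are modeled as a method `on_callback` invoked once per line. Assumptions are simplified to a single equality `var == value`. The class and function names are illustrative and not nitwit's API. nitwit itself evaluates assumptions as C conditions inside PicoC.

```python
from dataclasses import dataclass


@dataclass
class WitnessEdge:
    target: str
    var: str    # assumption "var == value" (simplified to a single equality)
    value: int


@dataclass
class WitnessAutomaton:
    edges: dict          # state -> list of WitnessEdge
    state: str = "q0"

    def on_callback(self, env, nondet):
        """Hypothetical interpreter hook, invoked once per executed line."""
        for edge in self.edges.get(self.state, []):
            if edge.var in nondet:
                # Nondeterministic variable: take the value from the assumption.
                env[edge.var] = edge.value
                nondet.discard(edge.var)
                self.state = edge.target
                return
            if env.get(edge.var) == edge.value:
                self.state = edge.target
                return
        # No edge matched: the WA blocks and stays in its current state.


def first_matching_line(initialized, max_line=20):
    """Toy program: x starts at 1 (if initialized) and is incremented on every line."""
    wa = WitnessAutomaton({"q0": [WitnessEdge("q_err", "x", 7)]})
    env, nondet = {}, set()
    if initialized:
        env["x"] = 1
    else:
        nondet.add("x")
    for line in range(1, max_line + 1):
        if line > 1 and "x" not in nondet:
            env["x"] += 1
        wa.on_callback(env, nondet)
        if wa.state == "q_err":
            return line
    return None
```

With an initialized x, the edge fires on line 7. With an uninitialized x, the assumption resolves the nondeterminism and the edge fires on line 1.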

<sup>2</sup> https://github.com/eliben/pycparser

<sup>3</sup> The program is enhanced with error location(s) `__VERIFIER_error`, assume statements `__VERIFIER_assume` with conditions, and calls to `__VERIFIER_nondet_*` functions, which return nondeterministic values.

As the program executes, the WA progresses through its states until either the execution ends or the error function is called. We consider the latter as evidence of the property violation and accept the witness.

*Implementing* nitwit*.* An *interpreter* is a program that takes as input a program, parses it and executes commands as part of its own runtime instead of producing machine code like a compiler. Interpreters translate programs directly into the behavior they represent; they keep track of all variable values and execute statements based on results of expressions and control flow [21,1,20].

nitwit's input is a C program. The choice of C interpreters is limited. Moreover, compiled C often vastly outperforms interpreted C in terms of speed. This is due to extensive compiler optimizations and to the unavoidable overhead of parsing and program-state management in an interpreter. Nonetheless, in a witness validation setting, where a program needs to be executed only once, the speed advantage of machine code can fade away. Compilation-based validators spend effort on optimization and translation, and this effort counts towards validation time. Furthermore, we wanted to control the simulated program at runtime, to alter variables and to track the position in the source code, which is difficult after compilation.

Our requirements on an interpreter, in order of relevance, were: (i) an open-source license permitting free use and distribution of the source code, (ii) a moderate learning curve because of the limited time for implementation, (iii) flexibility, so that we can easily modify it, (iv) good coverage of C, and (v) testing on realistic C programs. We have chosen *PicoC*<sup>4</sup>, a portable interpreter written in C with a very small code base, originally built as a scripting-language interpreter for unmanned aerial vehicles (UAVs). In its original form, PicoC supports the basics of ANSI C, but lacks some important features, such as function pointers and const variables. To execute C99-compliant code, which is common in the SV-COMP benchmarks, we extended it with new functionality. This includes goto statements, function pointers, the double, long long and const types, better parsing of numerical constants, variable shadowing, struct initialization and bit fields.

By using an interpreter, nitwit has full control over the simulation of a program. For our purposes, we have supplemented PicoC with function callbacks at locations corresponding to places from which a verifier might extract control-flow edges. During execution, the interpreter returns control to our witness automaton whenever it reaches a callback. The callbacks carry all the necessary information. This includes the current position, variable values, the presence of nondeterminism, and the selected branch in if-statements, loops and ternary operators.

The validator's managing component stores the witness automaton and starts the program's simulation in PicoC. It also stores the current control state in the witness and tries to progress to the error state whenever it receives a callback and the source code and state-space guards match. If a state-space guard involves a nondeterministic variable, nitwit attempts to extract a value from the given assumption. Upon success, the value is stored in the variable management system.

<sup>4</sup> https://gitlab.com/zsaleeba/picoc

To evaluate assumptions, we execute them as conditions in the program context (recall that a WA transition may carry several of them). If any one of them fails, the control edge is considered non-matching. If an assumption resolves a nondeterministic variable (e.g., the assumption x = 2 resolves the nondeterministic variable x), we automatically accept it and store the given value. A variable becomes nondeterministic if it has no initialization or if it is assigned a nondeterministic value (for example, from a `__VERIFIER_nondet_*` function). Conversely, it becomes deterministic when a deterministic value is assigned to it, e.g., the result of an expression involving only deterministic variables and constants. In particular, when the assumption evaluator resolves an assumption involving a nondeterministic variable, the variable is assigned the new value and registered as deterministic.
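The bookkeeping described above can be sketched in Python. The sketch makes several assumptions, all hypothetical. The class `VariableStore` and its methods are illustrative names. Assumptions are simplified to `(name, value)` equalities. Resolved values are committed only if every assumption on the transition holds, which is a design choice of this sketch and not a documented nitwit behavior.

```python
class VariableStore:
    """Hypothetical value store with a nondeterminism flag per variable."""

    def __init__(self):
        self.values = {}
        self.nondet = set()

    def declare(self, name, init=None):
        if init is None:
            self.nondet.add(name)          # uninitialized C variables are nondeterministic
        else:
            self.values[name] = init

    def assign_nondet(self, name):
        self.nondet.add(name)              # e.g. the result of a __VERIFIER_nondet_* call
        self.values.pop(name, None)

    def assign(self, name, operands, fn):
        """Assign fn(*operand values); the result is nondeterministic if any operand is."""
        if any(op in self.nondet for op in operands):
            self.assign_nondet(name)
        else:
            self.values[name] = fn(*(self.values[op] for op in operands))
            self.nondet.discard(name)

    def match_transition(self, assumptions):
        """All assumptions must hold; nondeterministic variables are resolved."""
        resolved = {}
        for name, value in assumptions:
            if name in self.nondet:
                resolved[name] = value     # accepted: resolves the nondeterminism
            elif self.values.get(name) != value:
                return False               # one failing assumption: edge does not match
        for name, value in resolved.items():
            self.values[name] = value      # the variable becomes deterministic
            self.nondet.discard(name)
        return True
```

In this sketch, a variable computed from a nondeterministic operand stays nondeterministic until it is reassigned from deterministic operands.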

## **5 Evaluation**

## **5.1 Benchmarks**

We primarily tested nitwit on witnesses produced during SV-COMP 2019 [6]. However, as data from the current edition were already available to us, we present the results attained during SV-COMP 2020. The set of all witnesses produced is available at [9]. It consists of the witnesses and of index files. The index files record the witness producer, the date of creation, the corresponding program file and its hash value (which can be used to find the program in the SV-COMP program repository), the programming language, the specification, the type of witness, and so on. The witnesses and programs cover a large spectrum of language features in a variety of applications and settings. We used the dataset of the previous edition [7] to evaluate nitwit extensively and prepare it for competing in 2020.

During the competition, nitwit was executed only on witnesses in the category *ReachSafety* with a known specification violation, as our validator targets only reachability safety violations. This amounts to a set of 11 533 violation witnesses produced by 17 different verifiers.

The witnesses were not manually reviewed to check whether the language of each WA indeed contains a violating path. This would be a laborious task, and performing it automatically is a better fit; that is precisely what validators are designed for. Consequently, we cannot claim that our validator or the others are incorrect when they do not find a violation, because the witness may steer them inappropriately. As the dataset does not exclusively contain exact witnesses, some witnesses might not resolve enough nondeterminism for nitwit to find a violation along the single execution it explores.

Witnesses are highly heterogeneous, depending on their producer. Some are very detailed, as in the case of Pinaka and Map2Check, with approximately 23 and 13 thousand nodes on average, respectively. Others keep the WA more succinct or even minimal. For example, tools like Brick or DIVINE usually provide the least verbose witnesses. The average number of edges typically lies near the average number of nodes, because witness producers output automata that lead directly to the error location. Few specify information about function *enter* and *return*. Except for VeriFuzz, Map2Check and Symbiotic, tools usually put assumptions on edges selectively, and some (DIVINE and PredatorHP) do not use them at all. Assumptions are an important part of witness automata: they restrict the exploration of the state space and potentially save the most work during validation. Nevertheless, checking a large number of them may prove difficult. On the whole, an average witness has around 2 000 nodes and 2 200 transitions between them. It also has 1 300 state-space guards in the form of assumptions, 360 controls for branching conditions, and 15 function call and return guards. The largest witness was produced by Pinaka and contains 2.1 million nodes and transitions, with assumptions on almost half of them.

**5.2 Evaluation Setting** The runtime was limited to 90 s, while memory was limited to 7 GB [6]. Based on recorded data and extracted results, we distinguish six different outcomes of a validator:


**5.3 Experimental Results** Figure 2 presents the results of validating the 11 533 witnesses with the six violation witness validators. Note that validator names in tables and plots are sometimes abbreviated for readability. The colors distinguish the possible outcomes described above. The validators are sorted in ascending order by the number of False results (blue). nitwit and CPAchecker find the most violations (8 526 and 7 642, respectively), closely followed by FShell-witness2test (7 005). CPA-witness2test validates 6 104 witnesses, Ultimate Automizer 4 393 and MetaVal 1 681.

All validators except nitwit output True (green) in some cases, which means that the validator rejected the witness. Ultimate Automizer rejects the majority of witnesses. CPA-witness2test shows the highest ratio of Unknown results, whereas FShell-witness2test rejects the largest number of witnesses as malformed (Bad witness). MetaVal exceeds the allotted time in most cases. The results are detailed in Table 2 on page 13.

Fig. 2: Validator outcomes on 11 533 witnesses from SV-COMP 2020.

*Producing output false.* With the result False, validators indicate they have found a property violation, i.e., a *reachable error location*. These results are of particular interest, as the dataset used for our evaluation contains witnesses only for programs deemed incorrect.

For 10 933 witnesses, at least one validator validated the verification result. Figure 3 presents a Venn diagram that partitions these witnesses among the validators based on shared successful validations. The shape as a whole stands for all validated witnesses, and each validator is represented by a distinctly colored enclosure. Circles group intersecting results, and the bigger numbers inside give their cardinality. The smaller numbers underneath indicate which validators belong to the group (ordered from top to bottom, so CPAchecker is number one and so on). The diagram reveals that only 226 witnesses are validated by all validators. The largest shared subset has 2 010 witnesses and corresponds to results shared by all validators except Ultimate Automizer and MetaVal. In total, 1 411 instances are validated only once, 1 878 twice, 2 290 thrice, 3 682 four times and 1 446 five times. nitwit validates 399 witnesses that no other tool validates. Interestingly, no validator subsumes another in terms of False results: each has a non-negligible number of uniquely validated witnesses.
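These counts can be cross-checked arithmetically, assuming that the 226 witnesses validated by all validators form the six-fold group. Counting each witness once gives the 10 933 validated witnesses. Counting each witness once per validator that confirms it reproduces the sum of the per-validator False results reported for Figure 2.

```python
# Witnesses validated by exactly k of the six validators (from the text and Figure 3).
validated_by_exactly = {1: 1411, 2: 1878, 3: 2290, 4: 3682, 5: 1446, 6: 226}

# Successful (False) validations per validator, as reported for Figure 2.
false_results = {
    "nitwit": 8526, "CPAchecker": 7642, "FShell-witness2test": 7005,
    "CPA-witness2test": 6104, "Ultimate Automizer": 4393, "MetaVal": 1681,
}

witnesses = sum(validated_by_exactly.values())
validations = sum(k * count for k, count in validated_by_exactly.items())
print(witnesses, validations, sum(false_results.values()))  # 10933 35351 35351
```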

Concerning resource usage, Figure 4(a) depicts the number of successfully validated witnesses plotted against the required CPU time (in log scale). Data points are sorted by the required CPU time, and the black line at the top marks the timeout. nitwit finds violations systematically faster than any other tool. Its mean runtime amounts to 0.63 seconds, the median is noticeably smaller at 0.02 seconds, and the standard deviation is 4.74 seconds. The runtime distribution of nitwit is skewed towards zero, with most results achieved in under half a second. We also see that running nitwit for more than 10 seconds scarcely produces any new results. This is not the case for CPAchecker, Ultimate Automizer, MetaVal and CPA-witness2test, which frequently need more than that, although they usually finish with considerable headroom before the limit of 90 seconds. On average, nitwit is about 4.2 times faster than the runner-up FShell-witness2test. It is 17.8 times faster than CPA-witness2test, 22.0 times faster than CPAchecker, 35.3 times faster than Ultimate Automizer and 39.1 times faster than MetaVal.


Table 1: Runtime statistics for validators (in seconds).

Figure 4(b) shows the memory usage in successful validations on a log scale, with data again sorted in ascending order. nitwit needed the least RAM (5 MB on average, 1 GB at most), closely followed by FShell-witness2test. The validators only rarely approached the limit of 7 GB (black line at the top). The largest value during a successful validation, slightly above 4 GB, was exhibited by CPA-witness2test. The tools do not suffer from a lack of available memory, which is also demonstrated by the low rate of Out of memory results in Figure 2.

*All validations.* Figures 4(c) and 4(d) show the resource consumption of all validations. Until about the 10 500th witness, nitwit remains consistently faster than all other validators, usually finishing in under one second. Beyond that point, it struggles to find answers, as some witnesses do not resolve enough nondeterminism or contain very long or even infinite paths.

Compilation-based FShell- and CPA-witness2test avoid the overhead of an interpreter and are thus mostly able to finish before the 90-second mark. Even if the harness they extract is incomplete (i.e., it still contains nondeterminism), the compiled execution ends sooner than an interpreted one would. In terms of absolute numbers, nitwit takes on average 0.64 seconds per witness on the whole dataset, with a median of 0.02 seconds and a standard deviation of 4.99 seconds. On average, the runtime difference in favor of nitwit is 3.3 seconds compared to FShell-witness2test and 13.0 seconds compared to CPA-witness2test. The medians are more telling, though. They are 0.02 seconds, 1.4 seconds, 8.6 seconds, 12.0 seconds, 16.0 seconds and 96.0 seconds for nitwit, FShell-witness2test, CPA-witness2test, CPAchecker, Ultimate Automizer and MetaVal, respectively.

Fig. 3: A Venn diagram showing the coverage of False validation results by the various validators.

(a) CPU time (seconds) in validations for result False.

(b) Memory consumption (in MB) during validations for result False.

(c) CPU time (seconds) in validations for all results.

(d) Memory consumption (in MB) in validations for all results.

Fig. 4: Resources for results False (first row) and all results together (second row).

Figure 5 shows how nitwit compares to the other five validators in terms of time and successful validation results. In each plot, a validator is compared against nitwit. Witnesses validated by both are colored blue, those validated only by nitwit yellow, those validated only by the other tool green, and all others red. The diagonal line is supplemented by two other lines representing a ±30% difference in CPU time. A result other than False is plotted on one of six lines at the end of its axis. These lines correspond to Timeout (abbreviated *to*), Unknown (*uk*), True (*tu*), Error (*er*) and Out of memory (*om*). Every point represents a witness (identical for both validators).

Figure 5 shows that in instances of agreed False results, nitwit is always faster than the other validators. FShell-witness2test comes within one second of nitwit on 1 114 of these validations; for all other validators, this number is 0.


Table 2: Results on successful validations of violation witnesses generated by the various verifiers. Column *Virtual best* aggregates witnesses that are validated at least once.

## **5.4 Discussion**

*Nondeterminism in programs.* nitwit is not designed for proving a program correct with respect to some specification, because the validator explores only a single path. Nevertheless, to prove a program incorrect, it may suffice to look at a single path. The program may contain nondeterministic choices (e.g., a condition that depends on a nondeterministic variable), but once these are resolved using a witness, the execution becomes deterministic. This is the main idea behind execution- and interpretation-based validators: after resolving nondeterminism, there exists only a single path through the program. If this path leads to an error location, then the validator may confidently claim that the provided program and witness constitute a specification violation.

Fig. 5: Comparing nitwit (x-axis) with the other five validators (y-axis) in terms of speed and outcome. Each point represents a witness.

nitwit is guaranteed to validate a violation witness iff the witness allows only abstract paths that end in an error location. The exceptions are implementation bugs, unsupported syntax, and the limits of the available stack and heap size. Thus, given a well-specified exact witness, nitwit should always find a violation, because the witness restricts the program state space to paths that reach an error location. If a witness allows inexact abstract paths, then nitwit (and in fact also an execution-based validator) may select the wrong path and see no error state. The results in Section 5 demonstrate that even without the guarantee of exact witnesses, interpretation-based validators can find a substantial number of violations.

*Finding violations.* The results clearly show that nitwit is a competitive validator of witnesses for C programs and invariant properties. Our validator is implemented independently of any verification platform and can efficiently re-establish violations from witnesses. We outperform other tools especially on the less time-intensive instances, as nitwit works well in validating witnesses that restrict the state space sufficiently. For these witnesses, it is the fastest among state-of-the-art validators and has the smallest memory footprint.

We attribute the good results in speed and memory to the choice of *an interpretation-based approach*. As nitwit explores only one path, it is naturally faster than full-fledged model-checking validators that explore many paths. Interestingly, an interpretation-based execution analysis is often much faster than a compilation-based one. This difference might be attributed to the fact that execution-based tools build the whole AST and CFG, whereas PicoC saves a lot of time by not having to construct them. Moreover, a compiler translates the program into machine code, a non-trivial task that PicoC circumvents.

*Weaknesses.* One of nitwit's limitations is inherent to exploring only a single execution. Consider a non-terminating program P with a property violation whose reachability depends on a nondeterministic variable being zero, together with a trivial witness without assumptions. If nitwit cannot resolve a nondeterministic variable, it *assumes that the variable has value one*. In this setting, the simulated program diverges and so does nitwit, because it cannot recognize an infinite execution. A similar situation may occur even if the witness is non-trivial. If its transitions are not matched to the right operations (which can be a fault of either the witness producer or the validator), then P diverges due to unresolved nondeterminism.

Secondly, as we employ an interpreter, there is a noticeable overhead compared to compiled programs in terms of CPU instructions per operation. Therefore, even if an execution is finite or reaches a violation in finitely many steps, it might simply be too computationally intensive for nitwit to provide an answer in time. Combined with unresolved nondeterminism, this explains the relatively high number of Timeout results of an early version of nitwit benchmarked on SV-COMP 2019.

To combat the timeouts, we implemented a simple check in the witness automaton. After a certain number of unsuccessful attempts to transition to a different state, we deliberately stop the validation and output Unknown. We experimented with the threshold and concluded that 1 million attempts is appropriate. Enabling this threshold reduced the number of killed validations from 784 to 123 and lost only 25 witnesses that would otherwise have been validated, which is an acceptable trade-off. An analysis showed that of the 784 timeouts, 573 were validations of possibly non-terminating programs and 18 of terminating programs. The remaining 193 concerned programs without specified termination<sup>5</sup>. The threshold check can be disabled.
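The threshold check can be sketched in Python together with the diverging program P from the weaknesses paragraph. The sketch makes several assumptions, all hypothetical. Interpreter events are modeled as a generator. `try_transition` stands for the WA's matching step. The counter is reset on every successful transition, because the text does not say whether nitwit counts consecutive or cumulative failures. All names are illustrative.

```python
UNKNOWN, VIOLATION = "Unknown", "False"


def toy_program(nondet_value):
    """Hypothetical program P: spins until a nondeterministic value is 0, then errors.

    nondet_value stands for the value chosen for the unresolved nondeterministic
    variable; passing 1 mimics nitwit's default described above.
    """
    while True:
        if nondet_value == 0:
            yield "__VERIFIER_error"
            return
        yield "loop-iteration"


def validate(events, try_transition, threshold=1_000_000):
    """Drive the WA over interpreter events; give up after `threshold` failed attempts."""
    failed = 0
    for event in events:
        if event == "__VERIFIER_error":
            return VIOLATION      # error reached: accept the witness
        if try_transition(event):
            failed = 0            # assumption: only consecutive failures are counted
        else:
            failed += 1
            if failed >= threshold:
                return UNKNOWN    # stop instead of diverging
    return UNKNOWN                # execution ended without reaching the error
```

With a trivial witness (`lambda event: False`) and the default value one, `validate(toy_program(1), ...)` returns Unknown instead of looping forever. If the value is resolved to zero, the error is reached and the result is False.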

*Processing witnesses.* Software verifiers do not always produce witnesses in exactly the correct format. For example, in GraphML it is necessary to define attributes for the graph, nodes and edges. If a witness happens to contain no such definitions, we supply a basic configuration that allows it to be parsed successfully. By default, we also do not extensively check the correctness of all graph attributes, such as the program hash.

Furthermore, we consider a reached error location as a proof of violation even if the witness automaton itself does not end in an error state. A compilation flag changes this behavior to rejection. Nevertheless, if a witness resolves enough nondeterminism for one execution to find an error, we think it is sufficiently "good" to be a viable witness. For some programs, resolving variables at the start suffices to reach a violation. However, we output a special exit code to make clear that the witness did not in fact accept this path.

## **6 Conclusion**

We presented nitwit, a new interpretation-based violation witness validator. It validated 8 526 witnesses from a dataset of 11 533 witnesses [9] produced in the ReachSafety category of the 2020 edition of SV-COMP. nitwit validated 399 witnesses that were not validated by any other participating tool. In addition, nitwit has a small memory footprint and is mostly significantly faster than its competitors.

*Data Availability Statement and Acknowledgments.* nitwit is available for free at https://github.com/moves-rwth/nitwit-validator and is licensed under the New BSD license. The replication artifact can be found at the Zenodo repository https://doi.org/10.5281/zenodo.3518139 [23] and the datasets analyzed during the current study at https://doi.org/10.5281/zenodo.3630205 [8]. We thank Dirk Beyer for very useful feedback on an earlier version of the paper and assistance with configuring nitwit for SV-COMP 2020.

<sup>5</sup> We know whether these programs are (non-)terminating, as they were reviewed in SV-COMP before including them in the competition on termination analysis.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **A Calculus for Modular Loop Acceleration**<sup>⋆</sup>

Florian Frohn

Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany

**Abstract.** Loop acceleration can be used to prove safety, reachability, runtime bounds, and (non-)termination of programs operating on integers. To this end, a variety of acceleration techniques has been proposed. However, all of them are monolithic: Either they accelerate a loop successfully or they fail completely. In contrast, we present a calculus that allows for combining acceleration techniques in a modular way and we show how to integrate many existing acceleration techniques into our calculus. Moreover, we propose two novel acceleration techniques that can be incorporated into our calculus seamlessly. An empirical evaluation demonstrates the applicability of our approach.

## **1 Introduction**

In recent years, loop acceleration techniques have successfully been used to build static analyses for programs operating on integers [2, 8, 11, 16–18, 28]. Essentially, such techniques extract a quantifier-free first-order formula $\psi$ from a single-path loop $\mathcal{T}$, i.e., a loop without branching in its body, such that $\psi$ under-approximates (resp. is equivalent to) $\mathcal{T}$. More specifically, each model of the resulting formula $\psi$ corresponds to an execution of $\mathcal{T}$ (and vice versa). By integrating such techniques into a suitable program-analysis framework [3, 11, 16–18, 23], whole programs can be transformed into first-order formulas, which can then be analyzed by off-the-shelf solvers. Applications include proving safety [23] or reachability [23, 28], deducing bounds on the runtime complexity [16, 17], and proving (non-)termination [8, 11].

However, existing acceleration techniques only apply if certain prerequisites are in place. So the power of static analyses built upon loop acceleration depends on the applicability of the underlying acceleration technique.

In this paper, we introduce a calculus which allows for combining several acceleration techniques modularly in order to accelerate a single loop. Consequently, it can handle classes of loops where all standalone techniques fail. Moreover, we present two novel acceleration techniques and integrate them into our calculus.

In the following, we introduce preliminaries in Sec. 2. Then, we discuss existing acceleration techniques in Sec. 3. In Sec. 4, we present our calculus to combine acceleration techniques. Sec. 5 shows how existing acceleration techniques can be integrated into our framework. Next, we present two novel acceleration techniques and incorporate them into our calculus in Sec. 6. After discussing related work in Sec. 7, we demonstrate the applicability of our approach via an empirical evaluation in Sec. 8 and conclude in Sec. 9. All proofs can be found in [13].

<sup>⋆</sup> This work has been funded by DFG grant 389792660 as part of TRR 248 (see https://perspicuous-computing.science).

© The Author(s) 2020

A. Biere and D. Parker (Eds.): TACAS 2020, LNCS 12078, pp. 58–76, 2020. https://doi.org/10.1007/978-3-030-45190-5_4

## **2 Preliminaries**

We use bold letters $\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{z}, \ldots$ for vectors. Let $\mathcal{C}(\boldsymbol{z})$ be the set of *closed-form expressions* over the variables $\boldsymbol{z}$. It contains, e.g., all arithmetic expressions built from $\boldsymbol{z}$, integer constants, addition, subtraction, multiplication, division, and exponentiation.<sup>1</sup> We consider loops of the form

$$\text{while } \varphi \text{ do } \boldsymbol{x} \leftarrow \boldsymbol{a} \tag{$\mathcal{T}_{loop}$}$$

where $\boldsymbol{x}$ is a vector of $d$ pairwise different variables that range over the integers, and the loop condition $\varphi \in \mathit{Prop}(\mathcal{C}(\boldsymbol{x}))$ is a finite propositional formula over the atoms $\{p > 0 \mid p \in \mathcal{C}(\boldsymbol{x})\}$. Furthermore, $\boldsymbol{a} \in \mathcal{C}(\boldsymbol{x})^d$ is such that the function<sup>2</sup> $\boldsymbol{x} \mapsto \boldsymbol{a}$ maps integers to integers. $\mathit{Loop}$ denotes the set of all such loops.

We identify $\mathcal{T}_{loop}$ and the pair $\langle \varphi, \boldsymbol{a} \rangle$. Moreover, we identify $\boldsymbol{a}$ and the function $\boldsymbol{x} \mapsto \boldsymbol{a}$. We sometimes write $\boldsymbol{a}(\boldsymbol{x})$ to make the variables $\boldsymbol{x}$ explicit, and we use the same convention for other (vectors of) expressions. Similarly, we identify the formula $\varphi$ resp. $\varphi(\boldsymbol{x})$ and the predicate $\boldsymbol{x} \mapsto \varphi$.

Throughout this paper, let n be a designated variable and let:

$$\boldsymbol{a} := \begin{pmatrix} a_1 \\ \vdots \\ a_d \end{pmatrix} \qquad \boldsymbol{x} := \begin{pmatrix} x_1 \\ \vdots \\ x_d \end{pmatrix} \qquad \boldsymbol{x}' := \begin{pmatrix} x'_1 \\ \vdots \\ x'_d \end{pmatrix} \qquad \boldsymbol{y} := \begin{pmatrix} \boldsymbol{x} \\ n \\ \boldsymbol{x}' \end{pmatrix}$$

Intuitively, the variable $n$ represents the number of loop iterations, and $\boldsymbol{x}'$ corresponds to the values of the program variables $\boldsymbol{x}$ after $n$ iterations.

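$\mathcal{T}_{loop}$ induces a relation $\longrightarrow_{\mathcal{T}_{loop}}$ on $\mathbb{Z}^d$:

$$\varphi(\boldsymbol{x}) \wedge \boldsymbol{x}' = \boldsymbol{a}(\boldsymbol{x}) \iff \boldsymbol{x} \longrightarrow_{\mathcal{T}_{loop}} \boldsymbol{x}'$$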

Our goal is to find a formula $\psi \in \mathit{Prop}(\mathcal{C}(\boldsymbol{y}))$ such that

$$
\psi \iff \boldsymbol{x} \longrightarrow^{n}_{\mathcal{T}_{loop}} \boldsymbol{x}' \qquad \text{for all } n > 0. \tag{equiv}
$$

To see why we use $\mathcal{C}(\boldsymbol{y})$ instead of, e.g., polynomials, consider the loop

$$\text{while } x_1 > 0 \text{ do } \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \leftarrow \begin{pmatrix} x_1 - 1 \\ 2 \cdot x_2 \end{pmatrix}. \tag{$\mathcal{T}_{exp}$}$$

Here, an acceleration technique synthesizes, e.g., the formula

$$
\begin{pmatrix} x_1' \\ x_2' \end{pmatrix} = \begin{pmatrix} x_1 - n \\ 2^n \cdot x_2 \end{pmatrix} \wedge x_1 - n + 1 > 0 \tag{$\psi_{exp}$}
$$

where $\begin{pmatrix} x_1 - n \\ 2^n \cdot x_2 \end{pmatrix}$ is equivalent to the value of $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ after $n$ iterations, and the inequation $x_1 - n + 1 > 0$ ensures that $\mathcal{T}_{exp}$ can be executed at least $n$ times. Clearly, the growth of $x_2$ cannot be captured by a polynomial, i.e., even the behavior of quite simple loops is beyond the expressiveness of polynomial arithmetic.

<sup>1</sup> Note that there is no widely accepted definition of "closed forms", and the results of the current paper are independent of the precise definition of $\mathcal{C}(\boldsymbol{z})$.

<sup>2</sup> I.e., the (anonymous) function that maps $\boldsymbol{x}$ to $\boldsymbol{a}$.

In practice, one can restrict our approach to weaker classes of expressions to ease automation, but the presented results are independent of such considerations.
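A bounded brute-force check in Python illustrates that $\psi_{exp}$ captures $\mathcal{T}_{exp}$ as required by (equiv). The function names are hypothetical. The check covers only a finite grid of start values and iteration counts, so it is a sanity check and not a proof. Because $\psi_{exp}$ fixes $\boldsymbol{x}'$ to $(x_1 - n, 2^n \cdot x_2)$, this is the only candidate successor that needs to be tested.

```python
def run_loop(x1, x2, n):
    """Execute T_exp exactly n times; return the final state or None if the guard fails."""
    for _ in range(n):
        if not x1 > 0:
            return None
        x1, x2 = x1 - 1, 2 * x2
    return (x1, x2)


def psi_exp(x1, x2, n, x1p, x2p):
    """The accelerated formula psi_exp over (x, n, x')."""
    return x1p == x1 - n and x2p == 2**n * x2 and x1 - n + 1 > 0


def check_equiv(bound=5, max_n=8):
    """Bounded check of (equiv): psi_exp holds iff T_exp reaches x' after n steps."""
    for x1 in range(-bound, bound + 1):
        for x2 in range(-bound, bound + 1):
            for n in range(1, max_n + 1):
                candidate = (x1 - n, 2**n * x2)  # the only x' psi_exp can accept
                if psi_exp(x1, x2, n, *candidate) != (run_loop(x1, x2, n) == candidate):
                    return False
    return True
```

For example, starting from $(3, 1)$, three iterations reach $(0, 8)$ and $\psi_{exp}$ holds. Starting from $(2, 1)$, the guard $x_1 - n + 1 > 0$ rules out $n = 3$.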

Some acceleration techniques cannot guarantee (equiv), but the resulting formula is an under-approximation of $\mathcal{T}_{loop}$, i.e., we have

$$
\psi \implies \boldsymbol{x} \longrightarrow^{n}_{\mathcal{T}_{loop}} \boldsymbol{x}' \qquad \text{for all } n > 0. \tag{approx}
$$

If (equiv) resp. (approx) holds, then $\psi$ is *equivalent* to resp. *approximates* $\mathcal{T}_{loop}$.

**Definition 1 (Acceleration Technique).** *An* acceleration technique *is a partial function*

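$$\mathit{accel}: \mathit{Loop} \to \mathit{Prop}(\mathcal{C}(\boldsymbol{y})).$$

*It is* sound *if $\mathit{accel}(\mathcal{T})$ approximates $\mathcal{T}$ for all $\mathcal{T} \in \mathrm{dom}(\mathit{accel})$. It is* exact *if $\mathit{accel}(\mathcal{T})$ is equivalent to $\mathcal{T}$ for all $\mathcal{T} \in \mathrm{dom}(\mathit{accel})$.*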

## **3 Existing Acceleration Techniques**

We now recall several existing acceleration techniques. In Sec. 4, we will see how these techniques can be combined in a modular way. All of them first compute a *closed form* $\boldsymbol{c} \in \mathcal{C}(\boldsymbol{x}, n)^d$ for the values of the program variables after $n$ iterations.

**Definition 2 (Closed Form).** *We call* $\boldsymbol{c} \in \mathcal{C}(\boldsymbol{x}, n)^d$ *a* closed form *of* $\mathcal{T}_{loop}$ *if* $\forall \boldsymbol{x} \in \mathbb{Z}^d, n \in \mathbb{N}.\ \boldsymbol{c} = \boldsymbol{a}^n(\boldsymbol{x})$.

Here, $a^n$ is the $n$-fold application of $a$, i.e., $a^0(x) = x$ and $a^{n+1}(x) = a(a^n(x))$. To find closed forms, one tries to solve the system of recurrence equations $x^{(n)} = a(x^{(n-1)})$ with the initial condition $x^{(0)} = x$. In the sequel, we assume that we can represent $a^n(x)$ in closed form. Note that one can always do so if $a(x) = Ax + b$ with $A \in \mathbb{Z}^{d \times d}$ and $b \in \mathbb{Z}^d$, i.e., if $a$ is affine. To this end, one considers the matrix $B := \begin{pmatrix} A & b \\ \mathbf{0}^T & 1 \end{pmatrix}$ and computes its Jordan normal form $B = T^{-1} J T$, where $J$ is a block diagonal matrix (which has complex entries if $B$ has complex eigenvalues). Then the closed form for $J^n$ can be given directly (see, e.g., [31]), and $\begin{pmatrix} a^n(x) \\ 1 \end{pmatrix} = T^{-1} J^n T \begin{pmatrix} x \\ 1 \end{pmatrix}$. Moreover, one can compute a closed form if $a = \begin{pmatrix} c\_1 \cdot x\_1 + p\_1 \\ \vdots \\ c\_d \cdot x\_d + p\_d \end{pmatrix}$, where $c\_i \in \mathbb{N}$ and each $p\_i$ is a polynomial over $x\_1, \ldots, x\_{i-1}$ [15].
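The role of the augmented matrix $B$ can be illustrated numerically. The following Python sketch is our own illustration under simplifying assumptions, not part of the approach described here: instead of computing a Jordan normal form symbolically, it computes $B^n$ by repeated squaring for a *fixed* $n$ and reads $a^n(x)$ off the first $d$ entries of $B^n \begin{pmatrix} x \\ 1 \end{pmatrix}$. This only evaluates $a^n$ pointwise, which is enough to spot-check a closed form such as the one of $\mathcal{T}\_{exp}$ (update $(x\_1 - 1, 2 \cdot x\_2)$). All function names are hypothetical.

```python
def mat_mul(P, Q):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def mat_pow(B, n):
    """B**n for a square matrix B and n >= 0, via repeated squaring."""
    size = len(B)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    while n > 0:
        if n & 1:
            result = mat_mul(result, B)
        B = mat_mul(B, B)
        n >>= 1
    return result


def augmented(A, b):
    """B = (A b; 0^T 1): turns the affine map x -> A x + b into a linear one."""
    d = len(A)
    return [list(A[i]) + [b[i]] for i in range(d)] + [[0] * d + [1]]


def affine_power(A, b, x, n):
    """a^n(x) for a(x) = A x + b, read off the first d entries of B^n (x; 1)."""
    column = [[xi] for xi in x] + [[1]]
    image = mat_mul(mat_pow(augmented(A, b), n), column)
    return [row[0] for row in image[:-1]]


# Spot check against the closed form (x1 - n, 2^n * x2) of T_exp.
A_exp, b_exp = [[1, 0], [0, 2]], [-1, 0]
for n in range(8):
    for x in [(5, 3), (0, -2), (-4, 7)]:
        assert affine_power(A_exp, b_exp, x, n) == [x[0] - n, 2 ** n * x[1]]
```

Exact integer arithmetic is used throughout, so no rounding issues arise; a symbolic closed form valid for *all* $n$ still requires the Jordan-normal-form argument above.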

## **3.1 Acceleration via Decrease** *or* **Increase**

The first acceleration technique discussed in this section exploits the following observation: If $\varphi(a(x))$ implies $\varphi(x)$ and $\varphi(a^{n-1}(x))$ holds, then $\mathcal{T}\_{loop}$ is applicable at least $n$ times. In other words, it requires that the indicator function (or characteristic function) $I\_\varphi: \mathbb{Z}^d \to \{0, 1\}$ of $\varphi$, with $I\_\varphi(x) = 1 \iff \varphi(x)$, is monotonically decreasing w.r.t. $a$, i.e., $I\_\varphi(x) \geq I\_\varphi(a(x))$.

#### **Theorem 1 (Acceleration via Monotonic Decrease [28]).** *If*

$$
\varphi(a(x)) \implies \varphi(x),
$$

*then the following acceleration technique is exact:*

$$\mathcal{T}\_{loop} \mapsto x' = a^n(x) \wedge \varphi(a^{n-1}(x))$$

So, for example, Thm. 1 accelerates $\mathcal{T}\_{exp}$ to $\psi\_{exp}$. However, the requirement $\varphi(a(x)) \implies \varphi(x)$ is often violated in practice. To see this, consider the loop

$$\text{while } x\_1 > 0 \land x\_2 > 0 \text{ do } \begin{pmatrix} x\_1 \\ x\_2 \end{pmatrix} \leftarrow \begin{pmatrix} x\_1 - 1 \\ x\_2 + 1 \end{pmatrix}. \tag{\mathcal{T}\_{non\text{-}dec}}$$

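It cannot be accelerated with Thm. 1, as

$$x\_1 - 1 > 0 \land x\_2 + 1 > 0 \;\not\Longrightarrow\; x\_1 > 0 \land x\_2 > 0.$$

For instance, $x\_1 = 2$ and $x\_2 = 0$ satisfy the premise but not the conclusion.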
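The prerequisite of Thm. 1 is an implication over the integers, which tools would typically discharge with an SMT solver. As a rough, hypothetical stand-in, the sketch below searches a small box of integer points for a violation of $\varphi(a(x)) \implies \varphi(x)$; finding none is evidence, not a proof. For $\mathcal{T}\_{exp}$ we assume, as in $\psi\_{exp}$, the guard $x\_1 > 0$ and the update $(x\_1 - 1, 2 \cdot x\_2)$. The search finds no violation for $\mathcal{T}\_{exp}$ and reports the point $(2, 0)$ for $\mathcal{T}\_{non\text{-}dec}$.

```python
from itertools import product


def counterexample(premise, conclusion, radius=5, dim=2):
    """First point of [-radius, radius]^dim (lexicographic order) where premise
    holds but conclusion does not, or None. A bounded search, not a proof."""
    for x in product(range(-radius, radius + 1), repeat=dim):
        if premise(x) and not conclusion(x):
            return x
    return None


def thm1_violation(phi, a, radius=5, dim=2):
    """Search for x with phi(a(x)) but not phi(x), violating Thm. 1's premise."""
    return counterexample(lambda x: phi(a(x)), phi, radius, dim)


def phi_exp(x):      # guard of T_exp (assumed: x1 > 0)
    return x[0] > 0


def a_exp(x):        # update of T_exp (assumed: (x1 - 1, 2 * x2))
    return (x[0] - 1, 2 * x[1])


def phi_non_dec(x):  # guard of T_non-dec
    return x[0] > 0 and x[1] > 0


def a_non_dec(x):    # update of T_non-dec
    return (x[0] - 1, x[1] + 1)


assert thm1_violation(phi_exp, a_exp) is None
assert thm1_violation(phi_non_dec, a_non_dec) == (2, 0)
```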

A dual acceleration technique is obtained by "reversing" the implication in the prerequisites of Thm. 1. Then $I\_\varphi$ is monotonically increasing w.r.t. $a$. So $\varphi$ is an invariant and thus $\{x \in \mathbb{Z}^d \mid \varphi(x)\}$ is a *recurrent set* [22] of $\mathcal{T}\_{loop}$.

#### **Theorem 2 (Acceleration via Monotonic Increase).** *If*

$$
\varphi(x) \implies \varphi(a(x)),
$$

*then the following acceleration technique is exact:*

$$\mathcal{T}\_{loop} \mapsto x' = a^n(x) \wedge \varphi(x)$$

As a minimal example, Thm. 2 accelerates

$$\text{while } x > 0 \text{ do } x \gets x + 1$$

to $x' = x + n \land x > 0$.

## **3.2 Acceleration via Decrease** *and* **Increase**

Both acceleration techniques presented so far have been generalized in [11].

**Theorem 3 (Acceleration via Monotonicity [11]).** *If*

$$\begin{aligned} \varphi(x) & \iff \varphi\_1(x) \land \varphi\_2(x) \land \varphi\_3(x), \\ \varphi\_1(x) & \implies \varphi\_1(a(x)), \\ \varphi\_1(x) \land \varphi\_2(a(x)) & \implies \varphi\_2(x), \quad \text{and} \\ \varphi\_1(x) \land \varphi\_2(x) \land \varphi\_3(x) & \implies \varphi\_3(a(x)), \end{aligned}$$

*then the following acceleration technique is exact:*

$$\mathcal{T}\_{loop} \mapsto x' = a^n(x) \land \varphi\_1(x) \land \varphi\_2(a^{n-1}(x)) \land \varphi\_3(x)$$

Here, $\varphi\_1$ and $\varphi\_3$ are again invariants of the loop. Thus, as in Thm. 2, it suffices to require that they hold before entering the loop. On the other hand, $\varphi\_2$ needs to satisfy a similar condition as in Thm. 1, and thus it suffices to require that $\varphi\_2$ holds before the last iteration. We also say that $\varphi\_2$ is a *converse invariant* (w.r.t. $\varphi\_1$). It is easy to see that Thm. 3 is equivalent to Thm. 1 if $\varphi\_1 \equiv \varphi\_3 \equiv \top$ (where $\top$ denotes logical truth), and it is equivalent to Thm. 2 if $\varphi\_2 \equiv \varphi\_3 \equiv \top$.

With this approach, $\mathcal{T}\_{non\text{-}dec}$ can be accelerated to

$$
\begin{pmatrix} x\_1' \\ x\_2' \end{pmatrix} = \begin{pmatrix} x\_1 - n \\ x\_2 + n \end{pmatrix} \land x\_2 > 0 \land x\_1 - n + 1 > 0 \tag{\psi\_{non-dec}}
$$

by choosing $\varphi\_1 := x\_2 > 0$, $\varphi\_2 := x\_1 > 0$, and $\varphi\_3 := \top$.
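The premises of Thm. 3 for this choice can be spot-checked, and the exactness claim can be compared against simulated runs. The sketch below is hypothetical (helper names, box radius, and iteration bound are our own choices): it checks each implication of Thm. 3 on a small box of integer points as a stand-in for an SMT query, and then checks that $\psi\_{non\text{-}dec}$ holds exactly when $\mathcal{T}\_{non\text{-}dec}$ can execute $n$ iterations.

```python
from itertools import product


def points(radius=4):
    return product(range(-radius, radius + 1), repeat=2)


def implies_on_box(premise, conclusion, radius=4):
    """Bounded check of premise(x) => conclusion(x), a stand-in for an SMT query."""
    return all(conclusion(x) for x in points(radius) if premise(x))


def a(x):       # update of T_non-dec
    return (x[0] - 1, x[1] + 1)


def phi(x):     # guard of T_non-dec
    return x[0] > 0 and x[1] > 0


def phi1(x):    # invariant
    return x[1] > 0


def phi2(x):    # converse invariant (w.r.t. phi1)
    return x[0] > 0


def phi3(x):    # top
    return True


thm3_premises = [
    implies_on_box(phi, lambda x: phi1(x) and phi2(x) and phi3(x)),
    implies_on_box(lambda x: phi1(x) and phi2(x) and phi3(x), phi),
    implies_on_box(phi1, lambda x: phi1(a(x))),
    implies_on_box(lambda x: phi1(x) and phi2(a(x)), phi2),
    implies_on_box(lambda x: phi1(x) and phi2(x) and phi3(x), lambda x: phi3(a(x))),
]


def psi_non_dec(x, x_prime, n):
    return x_prime == (x[0] - n, x[1] + n) and x[1] > 0 and x[0] - n + 1 > 0


def runs(x, n):
    """True iff T_non-dec can execute n iterations starting in x."""
    for _ in range(n):
        if not phi(x):
            return False
        x = a(x)
    return True


assert all(thm3_premises)
for x in points():
    for n in range(1, 8):
        assert psi_non_dec(x, (x[0] - n, x[1] + n), n) == runs(x, n)
```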

Thm. 3 naturally raises the question: Why do we need *two* invariants? To see this, consider a restriction of Thm. 3 where $\varphi\_3 := \top$. It would fail for a loop like

$$\text{while } x\_1 > 0 \land x\_2 > 0 \text{ do } \begin{pmatrix} x\_1 \\ x\_2 \end{pmatrix} \leftarrow \begin{pmatrix} x\_1 + x\_2 \\ x\_2 - 1 \end{pmatrix} \tag{\mathcal{T}\_{2\text{-}invs}}$$

which can easily be handled by Thm. 3 (by choosing $\varphi\_1 := \top$, $\varphi\_2 := x\_2 > 0$, and $\varphi\_3 := x\_1 > 0$). The problem is that the converse invariant $x\_2 > 0$ is needed to prove invariance of $x\_1 > 0$. Similarly, a restriction of Thm. 3 where $\varphi\_1 := \top$ would fail for the following variant of $\mathcal{T}\_{2\text{-}invs}$:

$$\text{while } x\_1 > 0 \land x\_2 > 0 \text{ do } \begin{pmatrix} x\_1 \\ x\_2 \end{pmatrix} \leftarrow \begin{pmatrix} x\_1 - x\_2 \\ x\_2 + 1 \end{pmatrix}$$

Here, the problem is that the invariant $x\_2 > 0$ is needed to prove converse invariance of $x\_1 > 0$.

#### **3.3 Acceleration via Metering Functions**

Another approach for loop acceleration uses *metering functions*, a variation of classical *ranking functions* from termination and complexity analysis [17]. While ranking functions give rise to *upper* bounds on the runtime of loops, metering functions provide *lower* runtime bounds, i.e., the definition of a metering function $mf: \mathbb{Z}^d \to \mathbb{Q}$ ensures that for each $x \in \mathbb{Z}^d$, the loop under consideration can be applied at least $mf(x)$ times.

**Theorem 4 (Acceleration via Metering Functions [17]).** *Let mf be a metering function for* Tloop*. Then the following acceleration technique is sound:*

$$\mathcal{T}\_{loop} \mapsto x' = a^n(x) \land \varphi(x) \land n < mf(x) + 1$$

So, using the metering function $(x\_1, x\_2) \mapsto x\_1$, Thm. 4 accelerates $\mathcal{T}\_{exp}$ to

$$\begin{pmatrix} x\_1' \\ x\_2' \end{pmatrix} = \begin{pmatrix} x\_1 - n \\ 2^n \cdot x\_2 \end{pmatrix} \land x\_1 > 0 \land n < x\_1 + 1 \quad \equiv \quad \psi\_{exp}.$$

However, synthesizing non-trivial (i.e., non-constant) metering functions is challenging. Moreover, unless the number of iterations of $\mathcal{T}\_{loop}$ equals $mf(x)$ for all $x \in \mathbb{Z}^d$, *acceleration via metering functions* is not exact.

*Linear* metering functions can be synthesized via Farkas' Lemma and SMT solving [17]. However, many loops do not have non-trivial linear metering functions. To see this, reconsider $\mathcal{T}\_{non\text{-}dec}$. Here, $(x\_1, x\_2) \mapsto x\_1$ is not a metering function, as $\mathcal{T}\_{non\text{-}dec}$ cannot be iterated at least $x\_1$ times if $x\_2 \leq 0$. Thus, [16] proposes a refinement of [17] based on metering functions of the form $x \mapsto I\_\xi(x) \cdot f(x)$, where $\xi \in Prop(\mathcal{C}(x))$ and $f$ is linear. With this improvement, the metering function $(x\_1, x\_2) \mapsto I\_{x\_2 > 0}(x\_2) \cdot x\_1$ can be used to accelerate $\mathcal{T}\_{non\text{-}dec}$ to

$$
\begin{pmatrix} x\_1' \\ x\_2' \end{pmatrix} = \begin{pmatrix} x\_1 - n \\ x\_2 + n \end{pmatrix} \land x\_1 > 0 \land x\_2 > 0 \land n < x\_1 + 1.
$$

## **4 A Calculus for Modular Loop Acceleration**

All acceleration techniques presented so far are monolithic: Either they accelerate a loop successfully or they fail completely. In other words, we cannot *combine* several techniques to accelerate a single loop. To overcome this limitation, we now present a calculus that repeatedly applies acceleration techniques to simplify an *acceleration problem* resulting from a loop $\mathcal{T}\_{loop}$ until it is *solved* and hence gives rise to a suitable $\psi \in Prop(\mathcal{C}(y))$ which approximates resp. is equivalent to $\mathcal{T}\_{loop}$.

#### **Definition 3 (Acceleration Problem).** *A tuple*

$$\left[\psi \mid \check{\varphi} \mid \hat{\varphi} \mid a\right]$$

*where* $\psi \in Prop(\mathcal{C}(y))$*,* $\check{\varphi}, \hat{\varphi} \in Prop(\mathcal{C}(x))$*, and* $a: \mathbb{Z}^d \to \mathbb{Z}^d$ *is an* acceleration problem*. It is* consistent *if* $\psi$ *approximates* $\langle \check{\varphi}, a \rangle$*,* exact *if* $\psi$ *is equivalent to* $\langle \check{\varphi}, a \rangle$*, and* solved *if it is consistent and* $\hat{\varphi} \equiv \top$*. The* canonical acceleration problem *of a loop* $\mathcal{T}\_{loop}$ *is*

$$\left[x'=a^n(x)\mid \top \mid \varphi(x) \mid a(x)\right].$$

*Example 1.* The canonical acceleration problem of $\mathcal{T}\_{non\text{-}dec}$ is

$$
\left[ \begin{pmatrix} x'\_1 \\ x'\_2 \end{pmatrix} = \begin{pmatrix} x\_1 - n \\ x\_2 + n \end{pmatrix} \;\middle|\; \top \;\middle|\; x\_1 > 0 \land x\_2 > 0 \;\middle|\; \begin{pmatrix} x\_1 - 1 \\ x\_2 + 1 \end{pmatrix} \right].
$$

The first component $\psi$ of an acceleration problem $\left[\psi \mid \check{\varphi} \mid \hat{\varphi} \mid a\right]$ is the partial result that has been computed so far. The second component $\check{\varphi}$ corresponds to the part of the loop condition that has already been processed successfully. As our calculus preserves consistency, $\psi$ always approximates $\langle \check{\varphi}, a \rangle$. The third component is the part of the loop condition that remains to be processed, i.e., the loop $\langle \hat{\varphi}, a \rangle$ still needs to be accelerated. The goal of our calculus is to transform a canonical into a solved acceleration problem.

More specifically, when we have simplified a canonical acceleration problem $\left[x' = a^n(x) \mid \top \mid \varphi(x) \mid a(x)\right]$ to $\left[\psi\_1(y) \mid \check{\varphi}(x) \mid \hat{\varphi}(x) \mid a(x)\right]$, then $\varphi \equiv \check{\varphi} \land \hat{\varphi}$ and

$$
\psi\_1 \implies x \longrightarrow^n\_{\langle \check{\varphi}, a \rangle} \; x'.
$$

Thus, it suffices to find some $\psi\_2 \in Prop(\mathcal{C}(y))$ such that

$$x \longrightarrow^n\_{\langle \check{\varphi}, a \rangle} x' \land \psi\_2 \implies x \longrightarrow^n\_{\langle \hat{\varphi}, a \rangle} x'. \tag{1}$$

The reason is that we have $\longrightarrow\_{\langle \check{\varphi}, a \rangle} \cap \longrightarrow\_{\langle \hat{\varphi}, a \rangle} = \longrightarrow\_{\langle \check{\varphi} \land \hat{\varphi}, a \rangle} = \longrightarrow\_{\langle \varphi, a \rangle}$ (and, as all three relations use the same update $a$, the same holds for their $n$-fold iterations) and thus

$$
\psi\_1 \land \psi\_2 \implies x \longrightarrow^n\_{\langle \varphi, a \rangle} \ x',
$$

i.e., $\psi\_1 \land \psi\_2$ approximates $\mathcal{T}\_{loop}$.

Note that the acceleration techniques presented so far would map $\langle \hat{\varphi}, a \rangle$ to some $\psi\_2 \in Prop(\mathcal{C}(y))$ such that

$$
\psi\_2 \implies x \longrightarrow^n\_{\langle \hat{\varphi}, a \rangle} x',\tag{2}
$$

which is more restrictive than (1). In Sec. 5, we will adapt all acceleration techniques from Sec. 3 to search for some $\psi\_2 \in Prop(\mathcal{C}(y))$ that satisfies (1) instead of (2), i.e., we will turn them into *conditional acceleration techniques*.

#### **Definition 4 (Conditional Acceleration).** *We call a partial function*

$$accel: Loop \times Prop(\mathcal{C}(x)) \rightharpoonup Prop(\mathcal{C}(y))$$

*a* conditional acceleration technique*. It is* sound *if*

$$x \longrightarrow^n\_{\langle \check{\varphi}, a \rangle} x' \land accel(\langle \chi, a \rangle, \check{\varphi}) \implies x \longrightarrow^n\_{\langle \chi, a \rangle} x'$$

*for all* $(\langle \chi, a \rangle, \check{\varphi}) \in dom(accel)$*,* $x, x' \in \mathbb{Z}^d$*, and* $n > 0$*. It is* exact *if additionally*

$$x \longrightarrow^n\_{\langle \chi \land \check{\varphi}, a \rangle} x' \implies accel(\langle \chi, a \rangle, \check{\varphi})$$

*for all* $(\langle \chi, a \rangle, \check{\varphi}) \in dom(accel)$*,* $x, x' \in \mathbb{Z}^d$*, and* $n > 0$*.*

We are now ready to present our *acceleration calculus*, which combines loop acceleration techniques in a modular way. In the following, w.l.o.g. we assume that propositional formulas are in CNF, and we identify the formula $\bigwedge\_{i=1}^k C\_i$ with the set of clauses $\{C\_i \mid 1 \leq i \leq k\}$.

**Definition 5 (Acceleration Calculus).** *The relation* $\leadsto$ *on acceleration problems is defined by the following rule:*

$$\frac{\emptyset \neq \chi \subseteq \hat{\varphi} \qquad accel(\langle \chi, a \rangle, \check{\varphi}) = \psi\_2}{\left[\psi\_1 \mid \check{\varphi} \mid \hat{\varphi} \mid a\right] \leadsto\_{(e)} \left[\psi\_1 \cup \psi\_2 \mid \check{\varphi} \cup \chi \mid \hat{\varphi} \setminus \chi \mid a\right]} \qquad \text{where } accel \text{ is a sound conditional acceleration technique}$$

*A* $\leadsto$*-step is* exact *(written* $\leadsto\_e$*) if accel is exact.*

So our calculus allows us to pick a subset $\chi$ (of clauses) from the yet unprocessed condition $\hat{\varphi}$ and "move" it to $\check{\varphi}$, which has already been processed successfully. To this end, $\langle \chi, a \rangle$ needs to be accelerated by a conditional acceleration technique, i.e., when accelerating $\langle \chi, a \rangle$ we may assume $x \longrightarrow^n\_{\langle \check{\varphi}, a \rangle} x'$.

Note that every acceleration technique trivially gives rise to a conditional acceleration technique (by disregarding the second argument $\check{\varphi}$ of *accel* in Def. 4). Thus, our calculus allows for combining arbitrary existing acceleration techniques without adapting them. However, many acceleration techniques can easily be turned into more sophisticated conditional acceleration techniques (cf. Sec. 5), which increases the power of our approach.

*Example 2.* We continue Ex. 1 and fix $\chi := x\_1 > 0$. Thus, we need to accelerate the loop $\left\langle x\_1 > 0, \begin{pmatrix} x\_1 - 1 \\ x\_2 + 1 \end{pmatrix} \right\rangle$ to enable a $\leadsto$-step. We obtain

$$\begin{aligned} & \left[\psi\_{non\text{-}dec}^{init} := \begin{pmatrix} x\_1' \\ x\_2' \end{pmatrix} = \begin{pmatrix} x\_1 - n \\ x\_2 + n \end{pmatrix} \;\middle|\; \top \;\middle|\; x\_1 > 0 \land x\_2 > 0 \;\middle|\; \begin{pmatrix} x\_1 - 1 \\ x\_2 + 1 \end{pmatrix}\right] \\ \overset{\text{Thm. 1}}{\leadsto} \; & \left[\psi\_{non\text{-}dec}^{init} \land x\_1 - n + 1 > 0 \;\middle|\; x\_1 > 0 \;\middle|\; x\_2 > 0 \;\middle|\; \begin{pmatrix} x\_1 - 1 \\ x\_2 + 1 \end{pmatrix}\right] \\ \overset{\text{Thm. 2}}{\leadsto} \; & \left[\psi\_{non\text{-}dec}^{init} \land x\_1 - n + 1 > 0 \land x\_2 > 0 \;\middle|\; x\_1 > 0 \land x\_2 > 0 \;\middle|\; \top \;\middle|\; \begin{pmatrix} x\_1 - 1 \\ x\_2 + 1 \end{pmatrix}\right] \\ = \; & \left[\psi\_{non\text{-}dec} \;\middle|\; x\_1 > 0 \land x\_2 > 0 \;\middle|\; \top \;\middle|\; \begin{pmatrix} x\_1 - 1 \\ x\_2 + 1 \end{pmatrix}\right] \end{aligned}$$

where Thm. 2 was applied to the loop $\left\langle x\_2 > 0, \begin{pmatrix} x\_1 - 1 \\ x\_2 + 1 \end{pmatrix} \right\rangle$ in the second step. Thus, we successfully constructed the formula $\psi\_{non\text{-}dec}$, which is equivalent to $\mathcal{T}\_{non\text{-}dec}$.
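The rule of Def. 5 is bookkeeping on the four components of an acceleration problem, which can be sketched in a few lines of Python. This is a minimal, hypothetical model: formulas are frozensets of clause strings (so $\cup$ is conjunction and the empty set is $\top$), the update is kept symbolic, and checking that the supplied $\psi\_2$ really comes from a sound conditional acceleration technique is left to the caller. It replays Ex. 2 with the results of Thms. 1 and 2 written down by hand; note how the duplicated conjunct $x' = a^n(x)$ disappears under set union.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccelerationProblem:
    """[psi | phi_check | phi_hat | a]; formulas are frozensets of clause strings."""
    psi: frozenset        # partial result computed so far
    phi_check: frozenset  # clauses that have already been processed
    phi_hat: frozenset    # clauses that remain to be processed
    a: str                # the update, kept symbolic

    def solved(self):
        # The caller is responsible for only passing results of sound techniques
        # (which keeps the problem consistent); here we only check that phi_hat
        # is the empty conjunction, i.e. equivalent to true.
        return not self.phi_hat


def step(problem, chi, psi2):
    """Apply the rule of Def. 5, given psi2 = accel(<chi, a>, phi_check)."""
    chi = frozenset(chi)
    if not chi or not chi <= problem.phi_hat:
        raise ValueError("chi must be a nonempty subset of phi_hat")
    return AccelerationProblem(
        psi=problem.psi | frozenset(psi2),
        phi_check=problem.phi_check | chi,
        phi_hat=problem.phi_hat - chi,
        a=problem.a,
    )


# Replaying Ex. 2; the accel results are written down by hand.
closed = "x' = (x1 - n, x2 + n)"
canonical = AccelerationProblem(frozenset({closed}), frozenset(),
                                frozenset({"x1 > 0", "x2 > 0"}), "(x1 - 1, x2 + 1)")
p1 = step(canonical, {"x1 > 0"}, {closed, "x1 - n + 1 > 0"})  # via Thm. 1
p2 = step(p1, {"x2 > 0"}, {closed, "x2 > 0"})                 # via Thm. 2
assert not p1.solved() and p2.solved()
```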

The crucial property of our calculus is the following.

**Lemma 1.** $\leadsto$ *preserves consistency and* $\leadsto\_e$ *preserves exactness.*

Then the correctness of our calculus follows immediately. The reason is that $\left[x' = a^n(x) \mid \top \mid \varphi(x) \mid a(x)\right] \leadsto^{\ast}\_{(e)} \left[\psi(y) \mid \check{\varphi}(x) \mid \top \mid a(x)\right]$ implies $\varphi \equiv \check{\varphi}$.

**Theorem 5 (Correctness of** $\leadsto$**).** *If*

$$\left[x' = a^n(x) \mid \top \mid \varphi(x) \mid a(x)\right] \leadsto^{\ast} \left[\psi(y) \mid \check{\varphi}(x) \mid \top \mid a(x)\right],$$

*then* $\psi$ *approximates* $\mathcal{T}\_{loop}$*. If*

$$\left[x' = a^n(x) \mid \top \mid \varphi(x) \mid a(x)\right] \leadsto\_e^{\ast} \left[\psi(y) \mid \check{\varphi}(x) \mid \top \mid a(x)\right],$$

*then* $\psi$ *is equivalent to* $\mathcal{T}\_{loop}$*.*

Termination of our calculus is trivial, as the size of the third component $\hat{\varphi}$ of the acceleration problem is strictly decreasing (the rule requires $\chi \neq \emptyset$).

**Theorem 6 (Termination of** $\leadsto$**).** $\leadsto$ *terminates.*

## **5 Conditional Acceleration Techniques**

We now show how to turn the acceleration techniques from Sec. 3 into conditional acceleration techniques, starting with *acceleration via monotonic decrease*.

#### **Theorem 7 (Conditional Acceleration via Monotonic Decrease).** *If*

$$\check{\varphi}(x) \land \chi(a(x)) \implies \chi(x),$$

*then the following conditional acceleration technique is exact:*

$$(\langle \chi, a \rangle, \check{\varphi}) \mapsto x' = a^n(x) \land \chi(a^{n-1}(x))$$

So we just add $\check{\varphi}$ to the premise of the implication that needs to be checked to apply *acceleration via monotonic decrease*. Thm. 2 can be adapted analogously.

#### **Theorem 8 (Conditional Acceleration via Monotonic Increase).** *If*

$$\check{\varphi}(x) \land \chi(x) \implies \chi(a(x)),$$

*then the following conditional acceleration technique is exact:*

$$(\langle \chi, a \rangle, \check{\varphi}) \mapsto x' = a^n(x) \land \chi(x)$$

*Example 3.* For the canonical acceleration problem of $\mathcal{T}\_{2\text{-}invs}$, we obtain:

$$\begin{aligned} & \left[x' = a\_{2\text{-}invs}^n(x) \;\middle|\; \top \;\middle|\; x\_1 > 0 \land x\_2 > 0 \;\middle|\; a\_{2\text{-}invs} := \begin{pmatrix} x\_1 + x\_2 \\ x\_2 - 1 \end{pmatrix}\right] \\ \overset{\text{Thm. 7}}{\leadsto}\_e \; & \left[x' = a\_{2\text{-}invs}^n(x) \land x\_2 - n + 1 > 0 \;\middle|\; x\_2 > 0 \;\middle|\; x\_1 > 0 \;\middle|\; a\_{2\text{-}invs}\right] \\ \overset{\text{Thm. 8}}{\leadsto}\_e \; & \left[x' = a\_{2\text{-}invs}^n(x) \land x\_2 - n + 1 > 0 \land x\_1 > 0 \;\middle|\; x\_2 > 0 \land x\_1 > 0 \;\middle|\; \top \;\middle|\; a\_{2\text{-}invs}\right] \end{aligned}$$

While we could also use Thm. 1 for the first step, Thm. 2 is inapplicable in the second step. The reason is that we need the converse invariant $x\_2 > 0$ to prove invariance of $x\_1 > 0$.

It is not a coincidence that $\mathcal{T}\_{2\text{-}invs}$, which could also be accelerated with *acceleration via monotonicity* (cf. Thm. 3) directly, can be handled by applying our novel calculus with Theorems 7 and 8.

*Remark 1.* If applying *acceleration via monotonicity* to $\mathcal{T}\_{loop}$ yields $\psi$, then

$$\left[x' = a^n(x) \mid \top \mid \varphi(x) \mid a(x)\right] \leadsto\_e^{\leq 3} \left[\psi(y) \mid \varphi(x) \mid \top \mid a(x)\right]$$

where either Thm. 7 or Thm. 8 is applied in each $\leadsto\_e$-step.

Thus, there is no need for a conditional variant of *acceleration via monotonicity*. Note that combining Theorems 7 and 8 with our calculus is also useful for loops where *acceleration via monotonicity* is inapplicable.
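The search that Examples 2, 3, and 4 carry out by hand can be mechanized. The sketch below is a hypothetical greedy driver, not the implementation accompanying this paper: it only knows Thms. 7 and 8, it checks their premises on a small box of integer points rather than with an SMT solver (so success is evidence, not proof), and it represents $\psi$ as a Python predicate that computes $a^n(x)$ by iteration instead of via a closed form. It rediscovers the derivations for $\mathcal{T}\_{non\text{-}dec}$, $\mathcal{T}\_{2\text{-}invs}$, and $\mathcal{T}\_{2\text{-}c\text{-}invs}$, and it gets stuck on a loop like $\mathcal{T}\_{ev\text{-}dec}$ from Sec. 6.

```python
from itertools import product


def holds_on_box(pred, dim, radius=4):
    """Check pred on all points of [-radius, radius]^dim (a bounded SMT stand-in)."""
    return all(pred(x) for x in product(range(-radius, radius + 1), repeat=dim))


def iterate(a, x, n):
    """Compute a^n(x) by plain iteration."""
    for _ in range(n):
        x = a(x)
    return x


def accelerate(clauses, a, dim, radius=4):
    """Greedily move clauses from phi_hat to phi_check using Thm. 7 or Thm. 8.

    clauses maps a clause name to a predicate on states (tuples).
    Returns (steps, psi), or None if no remaining clause can be processed."""
    todo = dict(clauses)   # phi_hat
    done = []              # names of the clauses in phi_check
    conjuncts = []         # conjuncts of psi besides x' = a^n(x), as functions of (x, n)
    steps = []

    def phi_check(x):
        return all(clauses[c](x) for c in done)

    while todo:
        for name, chi in todo.items():
            # Thm. 7: phi_check(x) and chi(a(x)) implies chi(x)
            if holds_on_box(lambda x: not (phi_check(x) and chi(a(x))) or chi(x), dim, radius):
                steps.append((name, "Thm. 7"))
                conjuncts.append(lambda x, n, chi=chi: chi(iterate(a, x, n - 1)))
                break
            # Thm. 8: phi_check(x) and chi(x) implies chi(a(x))
            if holds_on_box(lambda x: not (phi_check(x) and chi(x)) or chi(a(x)), dim, radius):
                steps.append((name, "Thm. 8"))
                conjuncts.append(lambda x, n, chi=chi: chi(x))
                break
        else:
            return None    # stuck: neither theorem applies to any remaining clause
        del todo[name]
        done.append(name)

    def psi(x, x_prime, n):
        return x_prime == iterate(a, x, n) and all(c(x, n) for c in conjuncts)

    return steps, psi


def runs(clauses, a, x, n):
    """True iff `while (all clauses) do x <- a(x)` can execute n iterations from x."""
    for _ in range(n):
        if not all(c(x) for c in clauses.values()):
            return False
        x = a(x)
    return True


# T_non-dec, as in Ex. 2 (with the conditional variants of Thms. 1 and 2).
non_dec = {"x1 > 0": lambda x: x[0] > 0, "x2 > 0": lambda x: x[1] > 0}
a_non_dec = lambda x: (x[0] - 1, x[1] + 1)
steps, psi = accelerate(non_dec, a_non_dec, dim=2)
assert steps == [("x1 > 0", "Thm. 7"), ("x2 > 0", "Thm. 8")]
```

The greedy strategy simply processes the first clause for which one of the two premises holds; because Thms. 7 and 8 are exact, any order that succeeds yields a formula equivalent to the loop.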

*Example 4.* Consider the following loop, which can be accelerated by splitting its guard into one invariant and two converse invariants.

$$\text{while } x\_1 > 0 \land x\_2 > 0 \land x\_3 > 0 \text{ do } \begin{pmatrix} x\_1 \\ x\_2 \\ x\_3 \end{pmatrix} \leftarrow \begin{pmatrix} x\_1 - 1 \\ x\_2 + x\_1 \\ x\_3 - x\_2 \end{pmatrix} \tag{\mathcal{T}\_{2\text{-}c\text{-}invs}}$$

Let

$$\begin{aligned} \varphi\_{2\text{-}c\text{-}invs} &:= x\_1 > 0 \land x\_2 > 0 \land x\_3 > 0, \\ a\_{2\text{-}c\text{-}invs} &:= \begin{pmatrix} x\_1 - 1 \\ x\_2 + x\_1 \\ x\_3 - x\_2 \end{pmatrix}, \\ \psi\_{2\text{-}c\text{-}invs}^{init} &:= x' = a\_{2\text{-}c\text{-}invs}^n(x), \end{aligned}$$

and let $x^{(m)}\_i$ be the $i$-th component of $a^m\_{2\text{-}c\text{-}invs}(x)$. Starting with the canonical acceleration problem of $\mathcal{T}\_{2\text{-}c\text{-}invs}$, we obtain:

$$\begin{aligned} & \left[\psi\_{2\text{-}c\text{-}invs}^{init} \;\middle|\; \top \;\middle|\; \varphi\_{2\text{-}c\text{-}invs} \;\middle|\; a\_{2\text{-}c\text{-}invs}\right] \\ \overset{\text{Thm. 7}}{\leadsto}\_e \; & \left[\psi\_{2\text{-}c\text{-}invs}^{init} \land x\_1^{(n-1)} > 0 \;\middle|\; x\_1 > 0 \;\middle|\; x\_2 > 0 \land x\_3 > 0 \;\middle|\; a\_{2\text{-}c\text{-}invs}\right] \\ \overset{\text{Thm. 8}}{\leadsto}\_e \; & \left[\psi\_{2\text{-}c\text{-}invs}^{init} \land x\_1^{(n-1)} > 0 \land x\_2 > 0 \;\middle|\; x\_1 > 0 \land x\_2 > 0 \;\middle|\; x\_3 > 0 \;\middle|\; a\_{2\text{-}c\text{-}invs}\right] \\ \overset{\text{Thm. 7}}{\leadsto}\_e \; & \left[\psi\_{2\text{-}c\text{-}invs}^{init} \land x\_1^{(n-1)} > 0 \land x\_2 > 0 \land x\_3^{(n-1)} > 0 \;\middle|\; \varphi\_{2\text{-}c\text{-}invs} \;\middle|\; \top \;\middle|\; a\_{2\text{-}c\text{-}invs}\right] \end{aligned}$$

Finally, we present a variant of Thm. 4 for conditional acceleration. The idea is similar to the approach for deducing metering functions of the form $x \mapsto I\_{\check{\varphi}}(x) \cdot f(x)$ from [16] (see Sec. 3.3 for details). But in contrast to [16], in our setting the "conditional" part $\check{\varphi}$ does not need to be an invariant of the loop.

**Theorem 9 (Conditional Acceleration via Metering Functions).** *Let* $mf: \mathbb{Z}^d \to \mathbb{Q}$*. If*

$$\check{\varphi}(x) \land \chi(x) \implies mf(x) - mf(a(x)) \leq 1 \qquad \text{and} \qquad \check{\varphi}(x) \land \neg\chi(x) \implies mf(x) \leq 0,$$

*then the following conditional acceleration technique is sound:*

$$(\langle \chi, a \rangle, \check{\varphi}) \mapsto x' = a^n(x) \land \chi(x) \land n < mf(x) + 1$$
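For $\mathcal{T}\_{non\text{-}dec}$, Thm. 9 can use the plain linear function $(x\_1, x\_2) \mapsto x\_1$ for the clause $x\_1 > 0$ once $x\_2 > 0$ is already part of $\check{\varphi}$, whereas Sec. 3.3 needed the indicator $I\_{x\_2 > 0}$. The hypothetical sketch below checks both premises of Thm. 9 on a small box (a stand-in for the synthesis via Farkas' Lemma and SMT solving from [17]), confirms that the same function fails the unconditional premises for the whole guard, and spot-checks the soundness condition of Def. 4 by simulation.

```python
from itertools import product


def box(radius=4, dim=2):
    return product(range(-radius, radius + 1), repeat=dim)


def metering_premises_hold(mf, chi, phi_check, a, radius=4, dim=2):
    """Bounded check of the two premises of Thm. 9 on [-radius, radius]^dim."""
    for x in box(radius, dim):
        if phi_check(x) and chi(x) and mf(x) - mf(a(x)) > 1:
            return False
        if phi_check(x) and not chi(x) and mf(x) > 0:
            return False
    return True


def runs(guard, a, x, n):
    """True iff `while guard do x <- a(x)` can execute n iterations from x."""
    for _ in range(n):
        if not guard(x):
            return False
        x = a(x)
    return True


def a(x):                      # update of T_non-dec
    return (x[0] - 1, x[1] + 1)


def mf(x):                     # candidate metering function (x1, x2) -> x1
    return x[0]


def chi(x):                    # clause to be processed
    return x[0] > 0


def phi_check(x):              # already processed (e.g. via Thm. 8)
    return x[1] > 0


def guard(x):                  # full guard of T_non-dec
    return x[0] > 0 and x[1] > 0


assert metering_premises_hold(mf, chi, phi_check, a)
assert not metering_premises_hold(mf, guard, lambda x: True, a)

# Soundness (Def. 4): runs of <phi_check, a> satisfying the result are runs of <chi, a>.
for x in box():
    for n in range(1, 8):
        if runs(phi_check, a, x, n) and chi(x) and n < mf(x) + 1:
            assert runs(chi, a, x, n)
```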

## **6 Acceleration via Eventual Monotonicity**

The combination of the calculus from Sec. 4 and the conditional acceleration techniques from Sec. 5 still fails to handle certain interesting classes of loops. Thus, to improve the applicability of our approach we now present two new acceleration techniques based on *eventual* monotonicity.

#### **6.1 Acceleration via Eventual Decrease**

All (combinations of) techniques presented so far fail for the following example.

$$\text{while } x\_1 > 0 \text{ do } \begin{pmatrix} x\_1 \\ x\_2 \end{pmatrix} \leftarrow \begin{pmatrix} x\_1 + x\_2 \\ x\_2 - 1 \end{pmatrix} \tag{\mathcal{T}\_{ev\text{-}dec}}.$$

The reason is that $x\_1$ does not behave monotonically, i.e., $x\_1 > 0$ is neither an invariant nor a converse invariant. Essentially, $\mathcal{T}\_{ev\text{-}dec}$ proceeds in two phases: In the first (optional) phase, $x\_2$ is positive and hence the value of $x\_1$ is monotonically increasing. In the second phase, $x\_2$ is non-positive and consequently the value of $x\_1$ decreases (weakly) monotonically. The crucial observation is that once the value of $x\_1$ decreases, it can never increase again. Thus, despite the non-monotonic behavior of $x\_1$, it suffices to require that $x\_1 > 0$ holds before the first and before the $n$-th loop iteration to ensure that the loop can be iterated at least $n$ times.

**Theorem 10 (Acceleration via Eventual Decrease).** *If* $\varphi(x) \equiv \bigwedge\_{i=1}^k C\_i$ *where each* $C\_i$ *contains an inequation* $\mathit{expr}\_i(x) > 0$ *such that*

$$
\mathit{expr}\_i(x) \geq \mathit{expr}\_i(a(x)) \implies \mathit{expr}\_i(a(x)) \geq \mathit{expr}\_i(a^2(x)),
$$

*then the following acceleration technique is sound:*

$$\mathcal{T}\_{loop} \mapsto x' = a^n(x) \land \bigwedge\_{i=1}^k \left( \mathit{expr}\_i(x) > 0 \land \mathit{expr}\_i(a^{n-1}(x)) > 0 \right)$$

*If* $C\_i \equiv \mathit{expr}\_i > 0$ *for all* $i \in [1, k]$*, then it is exact.*

With Thm. 10, we can accelerate $\mathcal{T}\_{ev\text{-}dec}$ to

$$\begin{pmatrix} x\_1' \\ x\_2' \end{pmatrix} = \begin{pmatrix} \frac{n-n^2}{2} + x\_2 \cdot n + x\_1 \\ x\_2 - n \end{pmatrix} \wedge x\_1 > 0 \wedge \frac{n-1 - (n-1)^2}{2} + x\_2 \cdot (n-1) + x\_1 > 0$$

as we have

$$(x\_1 \ge x\_1 + x\_2) \equiv (0 \ge x\_2) \implies (0 \ge x\_2 - 1) \equiv (x\_1 + x\_2 \ge x\_1 + x\_2 + x\_2 - 1).$$

Turning Thm. 10 into a conditional acceleration technique is straightforward.

**Theorem 11 (Conditional Acceleration via Eventual Decrease).** *If we have* $\chi(x) \equiv \bigwedge\_{i=1}^k C\_i$ *where each* $C\_i$ *contains an inequation* $\mathit{expr}\_i(x) > 0$ *such that*

$$\check{\varphi}(x) \land \mathit{expr}\_i(x) \geq \mathit{expr}\_i(a(x)) \implies \mathit{expr}\_i(a(x)) \geq \mathit{expr}\_i(a^2(x)), \tag{3}$$

*then the following conditional acceleration technique is sound:*

$$(\langle \chi, a \rangle, \check{\varphi}) \mapsto x' = a^n(x) \land \bigwedge\_{i=1}^k \left( \mathit{expr}\_i(x) > 0 \land \mathit{expr}\_i(a^{n-1}(x)) > 0 \right)$$

*If* $C\_i \equiv \mathit{expr}\_i > 0$ *for all* $i \in [1, k]$*, then it is exact.*

*Example 5.* Consider the following variant of $\mathcal{T}\_{ev\text{-}dec}$.

$$\text{while } x\_1 > 0 \land x\_3 > 0 \text{ do } \begin{pmatrix} x\_1 \\ x\_2 \end{pmatrix} \leftarrow \begin{pmatrix} x\_1 + x\_2 \\ x\_2 - x\_3 \end{pmatrix}$$

Starting with its canonical acceleration problem, we get

$$\begin{aligned} & \left[x' = a^n(x) \;\middle|\; \top \;\middle|\; x\_1 > 0 \land x\_3 > 0 \;\middle|\; a := \begin{pmatrix} x\_1 + x\_2 \\ x\_2 - x\_3 \end{pmatrix}\right] \\ \overset{\text{Thm. 8}}{\leadsto} \; & \left[x' = a^n(x) \land x\_3 > 0 \;\middle|\; x\_3 > 0 \;\middle|\; x\_1 > 0 \;\middle|\; a\right] \\ \overset{\text{Thm. 11}}{\leadsto} \; & \left[x' = a^n(x) \land x\_3 > 0 \land x\_1 > 0 \land x\_1^{(n-1)} > 0 \;\middle|\; x\_3 > 0 \land x\_1 > 0 \;\middle|\; \top \;\middle|\; a\right] \end{aligned}$$

where the second step can be performed via Thm. 11 with $\mathit{expr}(x) := x\_1$, as

$$(\check{\varphi}(x) \land \mathit{expr}(x) \geq \mathit{expr}(a(x))) \equiv (x\_3 > 0 \land x\_1 \geq x\_1 + x\_2) \equiv (x\_3 > 0 \land 0 \geq x\_2)$$

implies

$$(0 \geq x\_2 - x\_3) \equiv (x\_1 + x\_2 \geq x\_1 + x\_2 + x\_2 - x\_3) \equiv (\mathit{expr}(a(x)) \geq \mathit{expr}(a^2(x))).$$
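The role of $\check{\varphi}$ in Ex. 5 can be made visible with a bounded search. Taking the update $a$ from the acceleration problem above (with $x\_3$ unchanged) and $\mathit{expr}(x) := x\_1$, the hypothetical sketch below finds no violation of premise (3) when $\check{\varphi} \equiv x\_3 > 0$, but finds violations, all with $x\_3 \leq 0$, when $\check{\varphi} \equiv \top$, i.e., for the unconditional premise of Thm. 10. It also compares the final formula with simulated runs; both steps are exact here, so the two should agree.

```python
from itertools import product


def a(x):
    """Update of Ex. 5: (x1, x2, x3) -> (x1 + x2, x2 - x3, x3)."""
    return (x[0] + x[1], x[1] - x[2], x[2])


def expr(x):
    return x[0]


def violations(phi_check, radius=4):
    """Points of the box violating premise (3) for expr = x1 under phi_check."""
    return [x for x in product(range(-radius, radius + 1), repeat=3)
            if phi_check(x) and expr(x) >= expr(a(x))
            and not expr(a(x)) >= expr(a(a(x)))]


def iterate(x, n):
    for _ in range(n):
        x = a(x)
    return x


def runs(x, n):
    """True iff the loop of Ex. 5 can execute n iterations starting in x."""
    for _ in range(n):
        if not (x[0] > 0 and x[2] > 0):
            return False
        x = a(x)
    return True


def psi(x, x_prime, n):
    """Final result of Ex. 5; x1^(n-1) is computed by iteration here."""
    return (x_prime == iterate(x, n) and x[2] > 0 and x[0] > 0
            and iterate(x, n - 1)[0] > 0)


assert violations(lambda x: x[2] > 0) == []   # phi_check = x3 > 0
unconditional = violations(lambda x: True)     # phi_check = true, as in Thm. 10
assert unconditional and all(x[2] <= 0 for x in unconditional)
for x in product(range(-4, 5), repeat=3):
    for n in range(1, 7):
        assert psi(x, iterate(x, n), n) == runs(x, n)
```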

#### **6.2 Acceleration via Eventual Increase**

Still, all (combinations of) techniques presented so far fail for

$$\text{while } x\_1 > 0 \text{ do } \begin{pmatrix} x\_1 \\ x\_2 \end{pmatrix} \leftarrow \begin{pmatrix} x\_1 + x\_2 \\ x\_2 + 1 \end{pmatrix}. \tag{\mathcal{T}\_{ev\text{-}inc}}$$

As in the case of $\mathcal{T}\_{ev\text{-}dec}$, the value of $x\_1$ does not behave monotonically, i.e., $x\_1 > 0$ is neither an invariant nor a converse invariant. However, this time $x\_1$ is eventually *increasing*, i.e., once $x\_1$ starts to grow, it never decreases again. Thus, in this case it suffices to require that $x\_1$ is positive and (weakly) increasing in the first iteration.

**Theorem 12 (Acceleration via Eventual Increase).** *If* $\varphi(x) \equiv \bigwedge\_{i=1}^k C\_i$ *where each* $C\_i$ *contains an inequation* $\mathit{expr}\_i(x) > 0$ *such that*

$$
\mathit{expr}\_i(x) \leq \mathit{expr}\_i(a(x)) \implies \mathit{expr}\_i(a(x)) \leq \mathit{expr}\_i(a^2(x)),
$$

*then the following acceleration technique is sound:*

$$\mathcal{T}\_{loop} \mapsto x' = a^n(x) \land \bigwedge\_{i=1}^k 0 < \mathit{expr}\_i(x) \leq \mathit{expr}\_i(a(x))$$

With Thm. 12, we can accelerate $\mathcal{T}\_{ev\text{-}inc}$ to

$$
\begin{pmatrix} x\_1' \\ x\_2' \end{pmatrix} = \begin{pmatrix} \frac{n^2 - n}{2} + x\_2 \cdot n + x\_1 \\ x\_2 + n \end{pmatrix} \land 0 < x\_1 \le x\_1 + x\_2 \tag{\psi\_{ev-inc}}
$$

as we have

$$(x\_1 \le x\_1 + x\_2) \equiv (0 \le x\_2) \implies (0 \le x\_2 + 1) \equiv (x\_1 + x\_2 \le x\_1 + x\_2 + x\_2 + 1).$$

However, Thm. 12 is *not* exact, as the resulting formula only covers program runs where each $\mathit{expr}\_i$ behaves monotonically. So $\psi\_{ev\text{-}inc}$ only covers those runs of $\mathcal{T}\_{ev\text{-}inc}$ where the initial value of $x\_2$ is non-negative. Again, turning Thm. 12 into a conditional acceleration technique is straightforward.
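The following Python sketch illustrates both properties on a bounded grid of integers: every pair of states described by $\psi\_{ev\text{-}inc}$ is reachable by running $\mathcal{T}\_{ev\text{-}inc}$ for n iterations, while some runs with a negative initial $x\_2$ are missed. It is a brute-force check under our own choice of bounds, not a proof, and the function names are hypothetical.

```python
def run(x1, x2, n):
    """Execute n iterations of T_ev-inc; return None if the guard x1 > 0 fails first."""
    for _ in range(n):
        if not x1 > 0:
            return None
        x1, x2 = x1 + x2, x2 + 1
    return (x1, x2)


def closed_form(x1, x2, n):
    """a^n(x) as in psi_ev-inc; n*n - n is even, so // is exact."""
    return ((n * n - n) // 2 + x2 * n + x1, x2 + n)


def psi_ev_inc(x1, x2, n, x1p, x2p):
    return (x1p, x2p) == closed_form(x1, x2, n) and 0 < x1 <= x1 + x2


def premise_holds(r=20):
    """Premise of Thm. 12 for expr = x1, checked on [-r, r]^2 only."""
    return all(x1 + x2 <= x1 + x2 + x2 + 1
               for x1 in range(-r, r + 1) for x2 in range(-r, r + 1)
               if x1 <= x1 + x2)


def sound_on_grid(r=6, max_n=8):
    """Every (x, x') described by psi_ev-inc must be reachable in n iterations."""
    for x1 in range(-r, r + 1):
        for x2 in range(-r, r + 1):
            for n in range(1, max_n):
                xp = closed_form(x1, x2, n)
                if psi_ev_inc(x1, x2, n, *xp) and run(x1, x2, n) != xp:
                    return False
    return True
```

For instance, `run(5, -1, 1)` returns `(4, 0)`, yet `psi_ev_inc(5, -1, 1, 4, 0)` is false because the initial value of $x\_2$ is negative.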

**Theorem 13 (Conditional Acceleration via Eventual Increase).** *If we have* $\chi(\boldsymbol{x}) \equiv \bigwedge\_{i=1}^k C\_i$ *where each* $C\_i$ *contains an inequation* $\mathit{expr}\_i(\boldsymbol{x}) > 0$ *such that*

$$\check{\varphi}(\boldsymbol{x}) \land \mathit{expr}\_i(\boldsymbol{x}) \le \mathit{expr}\_i(\boldsymbol{a}(\boldsymbol{x})) \implies \mathit{expr}\_i(\boldsymbol{a}(\boldsymbol{x})) \le \mathit{expr}\_i(\boldsymbol{a}^2(\boldsymbol{x})), \tag{4}$$

*then the following conditional acceleration technique is sound:*

$$(\langle \chi, \boldsymbol{a} \rangle, \check{\varphi}) \mapsto \boldsymbol{x}' = \boldsymbol{a}^n(\boldsymbol{x}) \land \bigwedge\_{i=1}^k 0 < \mathit{expr}\_i(\boldsymbol{x}) \le \mathit{expr}\_i(\boldsymbol{a}(\boldsymbol{x}))$$

*Example 6.* Consider the following variant of $\mathcal{T}\_{ev\text{-}inc}$.

$$\text{while } x\_1 > 0 \land x\_3 > 0 \text{ do } \begin{pmatrix} x\_1 \\ x\_2 \end{pmatrix} \leftarrow \begin{pmatrix} x\_1 + x\_2 \\ x\_2 + x\_3 \end{pmatrix}$$

Starting with its canonical acceleration problem, we get

$$\begin{aligned} & \left[\boldsymbol{x}' = \boldsymbol{a}^n(\boldsymbol{x}) \;\middle|\; \top \;\middle|\; x\_1 > 0 \land x\_3 > 0 \;\middle|\; \boldsymbol{a} := \begin{pmatrix} x\_1 + x\_2 \\ x\_2 + x\_3 \end{pmatrix} \right] \\ \stackrel{\text{Thm. 8}}{\leadsto}\_e\ & \left[\boldsymbol{x}' = \boldsymbol{a}^n(\boldsymbol{x}) \land x\_3 > 0 \;\middle|\; x\_3 > 0 \;\middle|\; x\_1 > 0 \;\middle|\; \boldsymbol{a} \right] \\ \stackrel{\text{Thm. 13}}{\leadsto}\ & \left[\boldsymbol{x}' = \boldsymbol{a}^n(\boldsymbol{x}) \land x\_3 > 0 \land 0 < x\_1 \le x\_1 + x\_2 \;\middle|\; x\_3 > 0 \land x\_1 > 0 \;\middle|\; \top \;\middle|\; \boldsymbol{a} \right] \end{aligned}$$

where the second step can be performed via Thm. 13 as

$$(\check{\varphi}(\boldsymbol{x}) \land \mathit{expr}(\boldsymbol{x}) \le \mathit{expr}(\boldsymbol{a}(\boldsymbol{x}))) \equiv (x\_3 > 0 \land x\_1 \le x\_1 + x\_2) \equiv (x\_3 > 0 \land 0 \le x\_2)$$

implies

$$(0 \le x\_2 + x\_3) \equiv (x\_1 + x\_2 \le x\_1 + x\_2 + x\_2 + x\_3) \equiv (\mathit{expr}(\boldsymbol{a}(\boldsymbol{x})) \le \mathit{expr}(\boldsymbol{a}^2(\boldsymbol{x}))).$$

We also considered versions of Theorems 11 and 13 where the inequations in (3) resp. (4) are strict, but this did not lead to an improvement in our experiments. Moreover, we experimented with a variant of Thm. 13 that splits the loop under consideration into two consecutive loops, accelerates them independently, and composes the results. While such an approach can accelerate loops like $\mathcal{T}\_{ev\text{-}inc}$ exactly, the impact on our experimental results was minimal. Thus, we postpone an in-depth investigation of this idea to future work.

## **7 Related Work**

Acceleration-like techniques are also used in *over-approximating* settings (see, e.g., [10, 20, 21, 25, 26, 29, 32, 33]), whereas we consider *exact* and *under-approximating* loop acceleration techniques. As many related approaches have already been discussed in Sec. 3, we only mention two more techniques here.

First, [4, 7] present an exact acceleration technique for *finite monoid affine transformations* (FMATs), i.e., loops with linear arithmetic whose body is of the form $\boldsymbol{x} \leftarrow A\boldsymbol{x} + \boldsymbol{b}$ where $\{A^i \mid i \in \mathbb{N}\}$ is finite. For such loops, Presburger arithmetic is sufficient to construct an equivalent formula ψ, i.e., it can be expressed in a decidable logic. In general, this is clearly not the case for the techniques presented in the current paper (which may even synthesize non-polynomial closed forms, see $\mathcal{T}\_{exp}$). As a consequence and in contrast to our technique, this approach cannot handle loops where the values of variables grow super-linearly (i.e., it cannot handle examples like $\mathcal{T}\_{2\text{-}invs}$). Implementations are available in the tools FAST [2] and Flata [24]. Further theoretical results on linear transformations whose n-fold closure is definable in (extensions of) Presburger arithmetic can be found in [5].
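Whether a matrix A generates a finite monoid can be explored by computing its powers until one repeats. The Python sketch below does this up to a user-chosen bound; it is our own heuristic illustration, not the procedure of [4, 7] or of Flata, and a negative answer only means that no repetition was found within the bound.

```python
def mat_mul(A, B):
    """Product of two square integer matrices given as tuples of row tuples."""
    n = len(A)
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n))
                 for i in range(n))


def generates_finite_monoid(A, max_steps=1000):
    """Heuristic test whether {A^i | i in N} is finite.

    A repetition A^i == A^j with i < j makes the sequence of powers
    eventually periodic, hence finite. False only means that no
    repetition was found among the first max_steps powers.
    """
    n = len(A)
    power = tuple(tuple(int(i == j) for j in range(n)) for i in range(n))  # A^0
    seen = {power}
    for _ in range(max_steps):
        power = mat_mul(power, A)
        if power in seen:
            return True
        seen.add(power)
    return False
```

For example, the body of $\mathcal{T}\_{ev\text{-}inc}$ has the form $\boldsymbol{x} \leftarrow A\boldsymbol{x} + \boldsymbol{b}$ with $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and $\boldsymbol{b} = (0, 1)^T$; its powers $A^i = \begin{pmatrix} 1 & i \\ 0 & 1 \end{pmatrix}$ are pairwise distinct, so the search never succeeds, whereas a permutation matrix such as $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ repeats after two steps.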

Second, [6] shows that *octagonal relations* can be accelerated exactly, and [27] proves that such relations can even be accelerated in polynomial time. This generalizes earlier results for *difference bound constraints* [9]. As in the case of FMATs, the resulting formula can be expressed in Presburger arithmetic. Octagonal relations are defined by a finite conjunction ξ of inequations of the form $\pm x \pm y \le c$ with $x, y \in \boldsymbol{x} \cup \boldsymbol{x}'$ and $c \in \mathbb{Z}$. Then ξ induces the relation $\boldsymbol{x} \longrightarrow\_{\xi} \boldsymbol{x}' \iff \xi(\boldsymbol{x}, \boldsymbol{x}')$. So in contrast to the loops considered in the current paper, where $\boldsymbol{x}'$ is uniquely determined by $\boldsymbol{x}$, octagonal relations can represent non-deterministic programs. Therefore, and due to the restricted form of octagonal relations, the work from [6, 27] is orthogonal to ours.
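To illustrate the non-determinism, the following Python sketch encodes a small octagonal relation over a single variable x, namely $\xi \equiv x' - x \le 1 \land x - x' \le 0$ (our own example), and enumerates the successors of a given pre-state within a bounded window. The tuple encoding of constraints is an assumption made only for this sketch.

```python
# A constraint (s, u, t, v, c) stands for s*u + t*v <= c, where s, t are +1 or -1
# and u, v name either the pre-state variable "x" or the post-state variable "x'".
XI = [
    (+1, "x'", -1, "x", 1),  # x' - x <= 1
    (-1, "x'", +1, "x", 0),  # x - x' <= 0
]


def holds(xi, valuation):
    """Evaluate the conjunction xi under a valuation of x and x'."""
    return all(s * valuation[u] + t * valuation[v] <= c for s, u, t, v, c in xi)


def successors(xi, x, window=10):
    """All post-states x' in [x - window, x + window] that xi relates to x."""
    return [xp for xp in range(x - window, x + window + 1)
            if holds(xi, {"x": x, "x'": xp})]
```

Here `successors(XI, 3)` returns `[3, 4]`: the relation admits two post-states for the same pre-state, which a deterministic update $\boldsymbol{x} \leftarrow \boldsymbol{a}(\boldsymbol{x})$ cannot express.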

## **8 Implementation and Experiments**

We prototypically implemented our approach in our open-source Loop Acceleration Tool LoAT [11, 16, 17]:

https://github.com/aprove-developers/LoAT/tree/tacas20

It uses Z3 [30] to check implications and PURRS [1] to compute closed forms.

For technical reasons, the closed forms computed by LoAT are valid only if n > 0, whereas Def. 2 requires them to be valid for all $n \in \mathbb{N}$. The reason is that PURRS has only limited support for initial conditions. Thus, LoAT's results are only correct for all n > 1 (instead of all n > 0). In the future, we plan to use a different recurrence solver to circumvent this problem. Moreover, LoAT can currently compute closed forms only if the loop body is *triangular*, meaning that each $a\_i$ is an expression over $x\_1, \ldots, x\_i$. The reason is that PURRS cannot solve *systems* of recurrence equations, but only a single recurrence equation at a time. However, LoAT failed to compute closed forms for just 26 out of 1511 loops in our experiments, so this appears to be a minor restriction in practice. Furthermore, *conditional acceleration via metering functions* has not yet been integrated into the implementation of our calculus. While LoAT can synthesize formulas with non-polynomial arithmetic, it cannot yet parse them, i.e., its input is restricted to polynomials. Finally, LoAT does not yet support disjunctive loop conditions.
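Triangularity can be checked syntactically. The Python sketch below takes, for each variable, the set of variables its update reads and searches for an order in which every update only uses the variable itself and earlier ones, via a topological sort from the standard `graphlib` module (Python 3.9+). That variables may be reordered before the check is our assumption; the input format and the function name are illustrative.

```python
from graphlib import CycleError, TopologicalSorter


def triangular_order(deps):
    """Return an order in which each update only reads itself and earlier variables.

    deps maps each variable to the set of variables its update reads.
    Returns None if no such order exists.
    """
    # Self-dependencies are allowed; only dependencies on other variables
    # have to be placed earlier in the order.
    graph = {v: {u for u in used if u != v} for v, used in deps.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None
```

For the updates $x\_1 + 1$, $x\_2 - x\_1$, $x\_3 + x\_2$ the result is `['x1', 'x2', 'x3']`; for $\mathcal{T}\_{ev\text{-}inc}$ it is `['x2', 'x1']`, while a swap $(x\_1, x\_2) \leftarrow (x\_2, x\_1)$ yields `None`.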

Apart from these differences, our implementation closely follows the current paper. It repeatedly applies the conditional acceleration techniques from Sections 5 and 6 with the following priorities: Thm. 8 > Thm. 7 > Thm. 11 > Thm. 13.

To evaluate our approach, we extracted 1511 loops with conjunctive guards from the category *Termination of Integer Transition Systems* of the *Termination Problems Database* [35], the benchmark collection which is used at the annual *Termination and Complexity Competition* [19], as follows:


We compared our implementation with LoAT's implementation of *acceleration via monotonicity* (Thm. 3, [11]) and its implementation of *acceleration via metering functions* (Thm. 4, [17]), which also incorporates the improvements proposed in [16]. We did not include the techniques from Theorems 1 and 2 in our evaluation, as they are subsumed by *acceleration via monotonicity*. Furthermore, we compared with Flata [24], which implements the techniques to accelerate FMATs and octagonal relations discussed in Sec. 7. Note that our benchmark collection contains 16 loops with non-linear arithmetic where Flata is bound to fail, since it only supports linear arithmetic. We did not compare with FAST [2], which uses a similar approach to the more recent tool Flata.

All tests have been run on StarExec [34]. The results can be seen in Table 1. They show that our novel calculus was superior to the competing techniques in our experiments. In all but 7 cases where our calculus successfully accelerated the given loop, the resulting formula was polynomial. Thus, integrating our approach into existing acceleration-based verification techniques should not present major obstacles w.r.t. automation.




Furthermore, we evaluated the impact of our new acceleration techniques from Sec. 6 independently. To this end, we disabled *acceleration via eventual increase*, *acceleration via eventual decrease*, and both of them, in turn. The results can be seen in Table 2. They show that our calculus does not improve over *acceleration via monotonicity* if both *acceleration via eventual increase* and *acceleration via eventual decrease* are disabled (i.e., our benchmark collection does not contain examples like $\mathcal{T}\_{2\text{-}c\text{-}invs}$). However, enabling either *acceleration via eventual decrease* or *acceleration via eventual increase* resulted in a significant improvement. Interestingly, there are many examples that can be accelerated with either of these two techniques: When both of them were enabled, LoAT (exactly or approximately) accelerated 1482 loops. When only one of them was enabled, it accelerated 1444 resp. 1338 loops. But when neither of them was enabled, it only accelerated 845 loops. We believe that this is due to examples like

$$\text{while } x\_1 > 0 \land \dots \text{ do } \begin{pmatrix} x\_1 \\ x\_2 \\ \vdots \end{pmatrix} \leftarrow \begin{pmatrix} x\_2 \\ x\_2 \\ \vdots \end{pmatrix}$$

where Thm. 11 *and* Thm. 13 are applicable (since $x\_1 \le x\_2$ implies $x\_2 \le x\_2$ and $x\_1 \ge x\_2$ implies $x\_2 \ge x\_2$).

Flata exactly accelerated 49 loops where LoAT failed or approximated and LoAT exactly accelerated 262 loops where Flata failed. So there were only 18 loops where both Flata and the full version of our calculus failed to compute an exact result. Among them were the only 3 examples where our implementation found a closed form, but failed anyway. One of them was<sup>4</sup>

<sup>3</sup> While acceleration via metering functions may be exact in some cases (see the discussion after Thm. 4), our implementation cannot check whether this is the case.

<sup>4</sup> The other two are structurally similar, but more complex.

$$\text{while } x\_3 > 0 \text{ do } \begin{pmatrix} x\_1 \\ x\_2 \\ x\_3 \end{pmatrix} \leftarrow \begin{pmatrix} x\_1 + 1 \\ x\_2 - x\_1 \\ x\_3 + x\_2 \end{pmatrix}.$$

Here, the updated value of $x\_1$ depends on $x\_1$, the update of $x\_2$ depends on $x\_1$ and $x\_2$, and the update of $x\_3$ depends on $x\_2$ and $x\_3$. Hence, the closed form of $x\_1$ is linear, the closed form of $x\_2$ is quadratic, and the closed form of $x\_3$ is cubic:

$$x\_3^{(n)} = -\frac{1}{6} \cdot n^3 + \frac{1 - x\_1}{2} \cdot n^2 + \left(\frac{x\_1}{2} + x\_2 - \frac{1}{3}\right) \cdot n + x\_3$$

So when fixing $x\_1$, $x\_2$, and $x\_3$, $x\_3^{(n)}$ has up to 2 extrema, i.e., its monotonicity may change twice. However, our techniques based on eventual monotonicity require that the respective expressions behave monotonically once they start to decrease or increase, so these techniques only allow one change of monotonicity.

This raises the question whether our approach can accelerate *every* loop with conjunctive guard and linear arithmetic whose closed form is a vector of (at most) quadratic polynomials with rational coefficients. We leave this to future work.
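The following Python sketch checks the cubic closed form against a direct simulation using exact rational arithmetic and counts how often the monotonicity of $x\_3$ changes along a concrete run. The initial values $x\_1 = -5$, $x\_2 = -1$, $x\_3 = 10$ are our own choice. Since $x\_3^{(n+1)} - x\_3^{(n)} = x\_2^{(n)}$ is a concave quadratic in n, for these values $x\_3$ first decreases, then increases, and then decreases again.

```python
from fractions import Fraction


def step(x1, x2, x3):
    # simultaneous update: all right-hand sides use the old values
    return x1 + 1, x2 - x1, x3 + x2


def x3_trace(x1, x2, x3, steps):
    """Values of x3 after 0, 1, ..., steps iterations (the guard is not checked)."""
    values = [x3]
    for _ in range(steps):
        x1, x2, x3 = step(x1, x2, x3)
        values.append(x3)
    return values


def x3_closed(n, x1, x2, x3):
    """The cubic closed form of x3 after n iterations, with exact rationals."""
    n = Fraction(n)
    return (-Fraction(1, 6) * n**3 + Fraction(1 - x1, 2) * n**2
            + (Fraction(x1, 2) + x2 - Fraction(1, 3)) * n + x3)


def monotonicity_changes(values):
    """Number of sign changes among the non-zero successive differences."""
    signs = [b > a for a, b in zip(values, values[1:]) if b != a]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)
```

For these values, the trace of $x\_3$ over 15 iterations stays positive (so the guard $x\_3 > 0$ holds throughout), matches the closed form at every step, and changes its monotonicity twice.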

We refer to [14] for our benchmark collection, links to the StarExec jobs of our evaluation, and a pre-compiled binary (Linux, 64 bit).

## **9 Conclusion and Future Work**

After discussing existing acceleration techniques (Sec. 3), we presented a calculus to combine acceleration techniques modularly (Sec. 4). Then we showed how to combine existing (Sec. 5) and two novel (Sec. 6) acceleration techniques with our calculus. This improves over prior approaches, where acceleration techniques were used independently, and may thus improve acceleration-based verification techniques [6,7,11,16–18,28] in the future. An empirical evaluation (Sec. 8) shows that our approach is more powerful than state-of-the-art acceleration techniques. Moreover, if it is able to accelerate a loop, then the result is exact (instead of just an under-approximation) in most cases. Thus, our calculus can be used for under-approximating techniques (e.g., to find bugs or counterexamples) as well as in over-approximating settings (e.g., to prove safety or termination).

In the future, we plan to implement the missing features mentioned in Sec. 8 and integrate our novel calculus into our own acceleration-based program analyses to prove lower bounds on the runtime complexity [16,17] and non-termination [11] of integer programs. Furthermore, our experiments indicate that integrating specialized techniques for FMATs (cf. Sec. 7) would improve the power of our approach, as Flata exactly accelerated 49 loops where LoAT failed to do so (cf. Sec. 8). Moreover, we plan to design a *loop acceleration library*, such that our technique can easily be incorporated by other verification tools.

**Data Availability Statement and Acknowledgments** The tools and datasets used for the current study are available in the Zenodo repository [12].

I thank Carsten Fuhs, Marcel Hark, Sophie Tourret, and the anonymous reviewers for helpful feedback and comments. Moreover, I thank Radu Iosif and Filip Konečný for their help with Flata.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## SAT and SMT

## **Mind the Gap: Bit-vector Interpolation recast over Linear Integer Arithmetic**

Takamasa Okudono<sup>1,2</sup> and Andy King<sup>3</sup>

<sup>1</sup> National Institute of Informatics, Tokyo, Japan <sup>2</sup> The Graduate University for Advanced Studies (SOKENDAI), Tokyo, Japan <sup>3</sup> University of Kent, Canterbury, UK

**Abstract.** Much of an interpolation engine for bit-vector (BV) arithmetic can be constructed by observing that BV arithmetic can be modeled with linear integer arithmetic (LIA). Two BV formulae can thus be translated into two LIA formulae and then an interpolation engine for LIA used to derive an interpolant, albeit one expressed in LIA. The construction is completed by back-translating the LIA interpolant into a BV formula whose models coincide with those of the LIA interpolant. This paper develops a back-translation algorithm showing, for the first time, how back-translation can be universally applied, whatever the LIA interpolant. This avoids the need for deriving a BV interpolant by bit-blasting the BV formulae, as a backup process when back-translation fails. The new back-translation process relies on a novel geometric technique, called gapping, the correctness and practicality of which are demonstrated.

## **1 Introduction**

Given two formulae A and B which are inconsistent, an interpolant for the ordered pair ⟨A, B⟩ is a formula I over the variables common to both A and B which is a relaxation of A that is still inconsistent with B. For example, when working over the theory of linear inequalities, if $A = (x = y + 1) \land (y = 0)$ and $B = (x = z + 2) \land (1 \le z)$ then interpolants for ⟨A, B⟩ are $I\_1 = (x = 1)$, $I\_2 = (x \le 1)$ and $I\_3 = (x < 3)$, ordered by increasing generality. The intuition behind $I\_1$, $I\_2$ and $I\_3$ is that they are abstractions of A which concisely explain the inconsistency between A and B. Interpolation has attracted growing attention over the last decade [26], because of the crucial role it plays in model checking in lazy [18] predicate abstraction [15] and lazy abstraction with interpolants [25], as exemplified in BLAST [5] and IMPACT [25] respectively. In lazy predicate abstraction [25], interpolation is used to synthesise predicates which describe program state. Predicates are added, on demand, to explain why a path through a program cannot reach an error state. In lazy abstraction with interpolants [25], program state is described with unrestricted formulae, rather than merely using predicates, and interpolation is applied to relax sequences of formulae that describe the states along paths which do not error. Interpolation simplifies these formulae, thereby increasing the likelihood of covering and again accelerating path exploration. In effect, interpolation is the key abstraction mechanism.
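The two defining conditions of an interpolant can be checked by brute force on this small example. The Python sketch below restricts all variables to the range −20 to 20 (our own bound; an interpolation engine reasons symbolically instead), checks that A implies each candidate over the shared variable x and that each candidate is inconsistent with B, and confirms the ordering by generality.

```python
DOMAIN = range(-20, 21)  # bounded domain for the brute-force check


def A(x, y):
    return x == y + 1 and y == 0


def B(x, z):
    return x == z + 2 and 1 <= z


CANDIDATES = {
    "I1": lambda x: x == 1,
    "I2": lambda x: x <= 1,
    "I3": lambda x: x < 3,
}


def is_interpolant(I):
    """I must be implied by A and be inconsistent with B (on DOMAIN)."""
    relaxes_a = all(I(x) for x in DOMAIN for y in DOMAIN if A(x, y))
    refutes_b = not any(I(x) and B(x, z) for x in DOMAIN for z in DOMAIN)
    return relaxes_a and refutes_b


def models(I):
    return {x for x in DOMAIN if I(x)}
```

By contrast, $x \le 3$ is not an interpolant, since $x = 3$, $z = 1$ satisfies both $x \le 3$ and B.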

**Context** As solvers for richer theories have evolved, so have interpolation engines for these theories, with a notable flurry of activity around a decade ago [10, 11, 19, 20, 23, 24, 30]. However, progress on the important theory of bit-vectors (BV) has been surprisingly slow, with the two key works [2, 16] taking opposing approaches. One takes advantage of existing interpolation engines [16] and the other develops a bespoke interpolation engine around lazy reduction [2], which supports bit-vector operations by expanding them, on demand, to Presburger arithmetic. This paper develops the former approach, aiming to use an LIA solver as is.

The central problem in bit-vector interpolation is to construct an interpolant which is compact (one might even say beautiful [1]). Although a pair of inconsistent BV formulae can always be bit-blasted (unfolded) into a pair of inconsistent propositional formulae, it is not always obvious how the resulting propositional interpolant can be folded back into a compact BV formula to derive a BV interpolant. Interpolation engines over linear integer arithmetic (LIA) have thus been repurposed for BV interpolation [16]. First, operations on bit-vectors are reformulated as LIA formulae. An interpolant over LIA is then reinterpreted as a candidate interpolant for a pair of BV formulae. Because of wrap-around, LIA does not necessarily align with BV arithmetic, hence the LIA interpolant is adopted as a BV interpolant only if it passes an unsatisfiability check over bit-vectors. This checks that the interpolant relaxes the first BV formula of the pair and yet is still inconsistent with the second. If the candidate fails the check, then the two BV formulae are bit-blasted to recover a propositional interpolant, albeit one which loses the high-level structure of bit-vectors, and therefore is not compact. This approach is promising: it exploits robust off-the-shelf LIA interpolation [17], yet is compromised by the quality of the interpolants which follow from bit-blasting.

**Contribution** This paper plugs this gap, addressing the issue of interpolant quality by developing a new, principled encoding of LIA formulae into BV formulae which does not enlarge the bit-width of the BV formulae. This ensures that the interpolant is still drawn from the language used to define BV formulae. We show that a naïve encoding of an LIA inequality as a BV inequality can give a formula which has a completely different meaning from the LIA inequality: the BV inequality can have solutions not admitted by the LIA inequality and vice versa. Moreover, we illustrate how a straightforward encoding of a single LIA inequality can require many BV inequalities, which compromises the quality of a BV interpolant. We therefore propose a technique, which we call gapping, which adds range constraints to an LIA inequality so as to decompose it into two or three LIA systems whose solutions are amenable to compact BV representation. The term gapping reflects a geometric interpretation of this transformation which introduces a gap<sup>4</sup> between the solutions of the two LIA systems. We demonstrate the value of this approach with a BV interpolation engine which side-steps bit-blasting (and the complexity of providing bit-level

<sup>4</sup> The title of the paper alludes to this geometric technique, to the conceptual gap in previous work, and to a collaboration which entailed traveling through London.

circuits for arithmetic) and show that the approach usually gives a modest slowdown relative to LIA. We also prove the validity of the BV encoding, and the correctness of the reductions the encoding relies on, though the proofs themselves are omitted here for brevity. To summarise, the contributions of this paper are as follows:


**Use case** Since BV formulae are converted to LIA, one might wonder why one cannot work with LIA throughout and avoid BV interpolation altogether. First, such an approach would not fit with a layered approach to interpolation [17] where one uses a lightweight theory (e.g., uninterpreted functors) and then, if necessary, a more complicated one (e.g., LIA) to construct a BV interpolant. BV formulae provide a uniform way of expressing interpolants, no matter how they are derived. Second, computing LIA interpolants is complex and it is not surprising that these engines contain subtle<sup>5</sup> bugs. Translating an LIA interpolant back into a BV formula enables interpolants to be validated using a BV solver [3, 7, 27], using the reference (BV) semantics of a program. Moreover, validation need make no assumption on the correctness of a translation between theories. Validation can be performed on-the-fly, as the unwinding tree [25] is constructed, or by translating the complete, stable unwinding tree into its BV counterpart. The BV version can then be validated as a form of post-processing, akin to post-fixpoint validation in abstract interpretation [4, 14].

**Road map** This paper is structured as follows: Section 2 gives the intuition behind boxing and gapping, whereas Section 3 argues for the correctness of the approach. Section 4 presents the experimental work, Section 5 discusses related work, and Section 6 concludes.

## **2 Boxing and Gapping in Pictures**

Given a linear inequality $\ell$, we seek to find a bit-vector formula $f$ such that $[f]\_{\mathsf{BV}} = [\ell]\_{\mathsf{LIA}}$, where $[f]\_{\mathsf{BV}}$ and $[\ell]\_{\mathsf{LIA}}$ are respectively the sets of solutions (models) of $f$ in the bit-vector (BV) semantics and of $\ell$ in the linear integer arithmetic (LIA) semantics. Ideally $f$ should be compact, where we measure size by the number of binary logical connectives in $f$. This section gives the intuition behind two

<sup>5</sup> We refrain from mentioning specific solvers because we do not want to embarrass any particular research team to whom we are grateful.

**Fig. 1.** Gapping and boxing for $x + y \le 3$ and $x + y \le 7$

techniques, boxing and gapping, and demonstrate how they are used together to construct such an f; the sequel provides a more formal development.

To illustrate boxing and gapping, first consider the set of solutions to the inequality $x + y \le 3$, when interpreted with both the LIA semantics and the BV semantics. Figure 1(a) gives the LIA solutions in blue and the BV solutions in red over the non-negative integer grid $\{(x, y) \mid 0 \le x < 8 \land 0 \le y < 8\}$, using a modulo of 8 for bit-vectors. The solution sets differ on, for instance, (5, 6) since $(5 + 6) \bmod 8 = 3 \le 3$ but $5 + 6 = 11 \not\le 3$. Conversely, it does not generally follow that $[f]\_{\mathsf{LIA}} \subseteq [f]\_{\mathsf{BV}}$, as Figure 1(d) illustrates for $f = (x + y - 4 \le 3)$. Then $(1, 2) \in [x + y - 4 \le 3]\_{\mathsf{LIA}}$ since $1 + 2 - 4 = -1 \le 3$, but $(1, 2) \notin [x + y - 4 \le 3]\_{\mathsf{BV}}$ since $(1 + 2 - 4) \bmod 8 = 7 \not\le 3$.

**Enumeration** A naive approach to finding a formula $f$ such that $[f]\_{\mathsf{BV}} = [\ell]\_{\mathsf{LIA}}$ is to enumerate all solutions of $[\ell]\_{\mathsf{LIA}}$ and then summarise them in a single BV formula. Figure 1(a) illustrates the 4 + 3 + 2 + 1 = 10 LIA solutions for $\ell = (x + y \le 3)$, which are summarised in the following BV formula:
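The two semantics can be enumerated directly on the 8 × 8 grid of Figure 1. In the Python sketch below (function names are illustrative), the BV reading reduces each side of the inequality modulo 8, following Definition 1 in Section 3; Python's `%` operator returns a non-negative result for a positive modulus, so `-1 % 8` evaluates to 7.

```python
M = 8  # modulus m = 2^w for bit-width w = 3, as in Figure 1
GRID = [(x, y) for x in range(M) for y in range(M)]


def lia_solutions(lhs, rhs):
    """Grid points satisfying lhs(x, y) <= rhs over the integers."""
    return {(x, y) for x, y in GRID if lhs(x, y) <= rhs}


def bv_solutions(lhs, rhs):
    """Grid points satisfying the inequality after reducing both sides modulo M."""
    return {(x, y) for x, y in GRID if lhs(x, y) % M <= rhs % M}


def sum_xy(x, y):
    return x + y


def sum_xy_minus_4(x, y):
    return x + y - 4
```

With these definitions, `lia_solutions(sum_xy, 3)` has the 10 elements counted in the text and `bv_solutions(sum_xy, 3)` has 32; the point (5, 6) lies only in the BV set of $x + y \le 3$, and (1, 2) lies only in the LIA set of $x + y - 4 \le 3$.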

$$f\_1 = \left(x = 0 \land y = 0\right) \lor \dots \lor \left(x = 0 \land y = 3\right) \lor \dots \lor \left(x = 3 \land y = 0\right)$$

This formula has 9 binary disjuncts and 10 binary conjuncts, hence 19 logical connectives in total. A more compact formulation is to cover the blue triangular region of Figure 1(a) with columns as realised with the following BV formula:

$$f\_2 = (x = 0 \land y \le 3) \lor \dots \lor (x = 3 \land y \le 0)$$

Only non-negative solutions on the grid are considered, so there is no need to additionally assert $0 \le y$. This formula has 3 binary disjuncts and 4 binary conjuncts, giving 7 connectives in total.

**Boxing** Observe from Figure 1(a) that the extra solutions of $[x + y \le 3]\_{\mathsf{BV}}$ over $[x + y \le 3]\_{\mathsf{LIA}}$ stem from overflow. Overflow can be avoided by constraining BV solutions with $x \le 3$ and $y \le 3$, which amounts to placing a box (in general a hyper-rectangle) around the LIA solutions, as illustrated in Figure 1(b). This tactic, henceforth called boxing, leads to the following formula:

$$f\_3 = (x+y \le 3 \land x \le 3 \land y \le 3)$$

which requires 2 binary conjuncts.
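The following Python sketch confirms, by exhaustive enumeration over the 8 × 8 grid, that the column formula $f\_2$ and the boxed formula $f\_3$ both have exactly the LIA solutions of $x + y \le 3$ as their BV solutions, whereas the unboxed BV inequality does not. The helper names are our own.

```python
M = 8
GRID = [(x, y) for x in range(M) for y in range(M)]


def le(lhs, rhs):
    """BV reading of lhs <= rhs: both sides are reduced modulo M first."""
    return lhs % M <= rhs % M


def f2(x, y):
    # one column per value of x: (x = k and y <= 3 - k) for k = 0, ..., 3
    return any(x == k and le(y, 3 - k) for k in range(4))


def f3(x, y):
    # boxing: x <= 3 and y <= 3 keep x + y from wrapping around
    return le(x + y, 3) and le(x, 3) and le(y, 3)


def models(f):
    return {(x, y) for x, y in GRID if f(x, y)}


LIA_TARGET = {(x, y) for x, y in GRID if x + y <= 3}
```

Both `models(f2)` and `models(f3)` equal `LIA_TARGET`, while `models(lambda x, y: le(x + y, 3))` additionally contains the wrapped-around points such as (5, 6).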

**Gapping** Figure 1(c) illustrates that in general boxing cannot be applied in isolation, because a box around the LIA solutions would not eliminate any extraneous BV solutions. Boxing is successful for Figure 1(b) because of the absence of solutions (a gap) between the LIA solutions inside the box and the BV solutions outside the box. No such gap exists for the box of Figure 1(c). Yet boxing can still be applied by decomposing the inequality $x + y \le 7$ into two inequalities, both of which are amenable to boxing. The construction is based on $[x + y \le 7]\_{\mathsf{LIA}} = [x + y \le 3]\_{\mathsf{LIA}} \cup [4 \le x + y \land x + y \le 7]\_{\mathsf{LIA}} = [x + y \le 3]\_{\mathsf{LIA}} \cup [0 \le x + y - 4 \land x + y - 4 \le 3]\_{\mathsf{LIA}}$. Recall that boxing alone allows the LIA solutions of $x + y \le 3$ to be expressed as a BV formula with 2 binary connectives. Thus consider the compound formula $\ell' = (0 \le x + y - 4 \land x + y - 4 \le 3)$, whose LIA solutions are illustrated in Figure 1(d). Observe that the LIA solutions of $\ell'$ can be covered with two rectangles without including the 6 extraneous BV solutions in the top right. Then $[\ell']\_{\mathsf{LIA}} = [x + y - 4 \le 3 \land (x \le 3 \lor y \le 3)]\_{\mathsf{BV}}$, which leads to the complete formula

$$f\_3 = \left(x+y \le 3 \land x \le 3 \land y \le 3\right) \lor \left(x+y-4 \le 3 \land \left(x \le 3 \lor y \le 3\right)\right)$$

such that $[f\_3]\_{\mathsf{BV}} = [x + y \le 7]\_{\mathsf{LIA}}$. This tactic of artificially introducing a gap, henceforth called gapping, is equally applicable to larger grids. For instance, working over a modulo of 32, $[x + y \le 31]\_{\mathsf{LIA}} = [f\_4]\_{\mathsf{BV}}$ where

$$f\_4 = \left(x+y \le 15 \land x \le 15 \land y \le 15\right) \lor \left(x+y-16 \le 15 \land \left(x \le 15 \lor y \le 15\right)\right)$$
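The gapped formulae $f\_3$ (for m = 8) and $f\_4$ (for m = 32) follow the same pattern with h = m/2. The Python sketch below implements that pattern for an arbitrary even modulus, which is our own generalization of the two instances, and compares its BV solutions with the LIA solutions of $x + y \le m - 1$ by enumeration.

```python
def le(lhs, rhs, m):
    """BV reading of lhs <= rhs: both sides are reduced modulo m first."""
    return lhs % m <= rhs % m


def gapped(x, y, m):
    """Pattern of f3 (m = 8) and f4 (m = 32), with h = m/2."""
    h = m // 2
    # boxed part: the LIA solutions of x + y <= h - 1
    boxed = le(x + y, h - 1, m) and le(x, h - 1, m) and le(y, h - 1, m)
    # shifted part: h <= x + y <= m - 1, with the wrapped-around corner cut off
    shifted = le(x + y - h, h - 1, m) and (le(x, h - 1, m) or le(y, h - 1, m))
    return boxed or shifted


def gapping_is_exact(m):
    """Compare BV models of the gapped formula with LIA models of x + y <= m - 1."""
    return all(gapped(x, y, m) == (x + y <= m - 1)
               for x in range(m) for y in range(m))
```

The comparison succeeds for m = 4, 8, 16 and 32. Without gapping, by contrast, every grid point satisfies $(x + y) \bmod m \le m - 1$.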

## **3 Formal correctness of boxing and gapping**

In what follows we consider LIA and BV formulae over an ordered set of variables $\{x\_1, \ldots, x\_d\}$ for some $d > 1$. We consider bit-vectors of fixed width $w > 1$ and interpret LIA and BV formulae over the product space $\mathsf{M}^d$ where $\mathsf{M} = \{0, 1, 2, \ldots, m - 1\}$ and $m = 2^w$ as follows:

**Definition 1.** Let $\boldsymbol{c}, \boldsymbol{c}' \in \mathbb{Z}^d$ and $b, b' \in \mathbb{Z}$. If $\ell \equiv (\sum\_{i=1}^d c\_i x\_i) + b \le (\sum\_{i=1}^d c'\_i x\_i) + b'$ then

$$\begin{aligned} \left[\ell\right]\_{\mathsf{LIA}} &= \left\{ x \in \mathsf{M}^d \Big| \sum\_{i=1}^d c\_i x\_i + b \le \sum\_{i=1}^d c'\_i x\_i + b' \right\} \\\\ \left[\ell\right]\_{\mathsf{BV}} &= \left\{ x \in \mathsf{M}^d \Big| \left(\sum\_{i=1}^d c\_i x\_i + b\right) \bmod m \le \left(\sum\_{i=1}^d c'\_i x\_i + b'\right) \bmod m \right\} \end{aligned} $$

Furthermore, the LIA semantics can be lifted from inequalities to LIA formulae by $[f\_1 \lor f\_2]\_{\mathsf{LIA}} = [f\_1]\_{\mathsf{LIA}} \cup [f\_2]\_{\mathsf{LIA}}$, $[f\_1 \land f\_2]\_{\mathsf{LIA}} = [f\_1]\_{\mathsf{LIA}} \cap [f\_2]\_{\mathsf{LIA}}$ and $[\neg f]\_{\mathsf{LIA}} = \mathsf{M}^d \setminus [f]\_{\mathsf{LIA}}$. Likewise for BV formulae.

In the sequel, $\mathbb{N}$ denotes the set of (strictly) positive integers, $\mathbb{R}$ the set of real numbers, and $\mathbb{R}\_{\ge 0}$ the set of non-negative real numbers. We extend the floor and ceiling functions to sequences in $\mathbb{R}^d$ in a component-wise manner: $\lfloor \boldsymbol{x} \rfloor\_i = \lfloor x\_i \rfloor$ and $\lceil \boldsymbol{x} \rceil\_i = \lceil x\_i \rceil$. If $\boldsymbol{x} \in \mathbb{R}^d$ then $|\boldsymbol{x}| = d$. The partial order $\le$ on $\mathbb{R}^d$ is defined by $\boldsymbol{x} \le \boldsymbol{y}$ if and only if $x\_i \le y\_i$ for all $i = 1, \ldots, d$.
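Definition 1 and its lifting translate directly into set comprehensions. In the Python sketch below (names are illustrative), an inequality is given by the coefficient vectors $\boldsymbol{c}, \boldsymbol{c}'$ and the constants $b, b'$; conjunction and disjunction then correspond to Python's set operators `&` and `|`, and negation to a complement. The sketch also shows that, under the BV reading, moving a term across $\le$ can change the solution set.

```python
from itertools import product


def _side(coeffs, const, x):
    """Value of sum(c_i * x_i) + b for one side of the inequality."""
    return sum(c * xi for c, xi in zip(coeffs, x)) + const


def lia_models(c, b, cp, bp, m):
    """[l]_LIA for l = sum(c_i x_i) + b <= sum(c'_i x_i) + b' over M^d."""
    return {x for x in product(range(m), repeat=len(c))
            if _side(c, b, x) <= _side(cp, bp, x)}


def bv_models(c, b, cp, bp, m):
    """[l]_BV: each side is reduced modulo m before comparing."""
    return {x for x in product(range(m), repeat=len(c))
            if _side(c, b, x) % m <= _side(cp, bp, x) % m}


def complement(models, d, m):
    """Negation: M^d minus the given models."""
    return set(product(range(m), repeat=d)) - models
```

Over m = 8, $x\_1 \le x\_2$ and $x\_1 - x\_2 \le 0$ both have 36 LIA solutions, and $x\_1 \le x\_2$ also has 36 BV solutions, but $x\_1 - x\_2 \le 0$ has only the 8 BV solutions with $x\_1 = x\_2$, because $(x\_1 - x\_2) \bmod 8$ is positive whenever $x\_1 \ne x\_2$.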

#### **3.1 Boxing**

Boxing is founded on the following lemma and its corollary, in which a union of hyper-rectangles is pinched, above and below, between the solution sets of two inequalities with positive coefficients (unit coefficients, in the case of the lemma):

**Lemma 1.** Let $d > 1$ and $L \in \mathbb{N}$. Then:

$$\begin{aligned} & \left\{ \boldsymbol{x} \in \mathbb{R}\_{\geq 0}^{d} \;\middle|\; \sum\_{i=1}^{d} x\_{i} \leq L \cdot (m/2) - 1 \right\} \\ & \subseteq \bigcup\_{\boldsymbol{p} \in I\_{d}((d-1)(L+1))} \bigcap\_{i=1}^{d} \left\{ \boldsymbol{x} \in \mathbb{R}\_{\geq 0}^{d} \;\middle|\; x\_{i} < \frac{p\_{i} \cdot (m/2)}{d-1} \right\} \\ & \subseteq \left\{ \boldsymbol{x} \in \mathbb{R}\_{\geq 0}^{d} \;\middle|\; \sum\_{i=1}^{d} x\_{i} < (L+1) \cdot (m/2) \right\} \end{aligned}$$

where $I\_d(n) = \{ \langle i\_1,\dots,i\_d \rangle \in \mathbb{N}^d \mid i\_1 + \dots + i\_d = n \}$.
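The index set $I\_d(n)$ is the set of compositions of $n$ into $d$ positive parts. A minimal Python enumeration, assuming nothing beyond the definition above (the name `compositions` is illustrative), reproduces the sets used in the examples below and agrees with the stars-and-bars count $\binom{n-1}{d-1}$:

```python
from math import comb


def compositions(n, d):
    """I_d(n): all d-tuples of strictly positive integers that sum to n."""
    if d == 1:
        return [(n,)] if n >= 1 else []
    # the first part is at most n - (d - 1), leaving at least 1 for each remaining part
    return [(first,) + rest
            for first in range(1, n - d + 2)
            for rest in compositions(n - first, d - 1)]


print(compositions(2, 2))  # [(1, 1)]
print(compositions(5, 2))  # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(compositions(1, 2))  # []  (no two positive integers sum to 1)

# stars and bars: |I_d(n)| = C(n - 1, d - 1)
assert all(len(compositions(n, d)) == comb(n - 1, d - 1)
           for d in range(1, 6) for n in range(1, 12))
```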

**Corollary 1.** Let $d > 1$, $L \in \mathbb{N}$ and $\boldsymbol{c} \in \mathbb{N}^d$. Then:

$$\begin{aligned} & \left\{ \boldsymbol{x} \in \mathbb{Z}\_{\geq 0}^{d} \;\middle|\; \sum\_{i=1}^{d} c\_{i} x\_{i} \leq L \cdot (m/2) - 1 \right\} \\ & \subseteq \bigcup\_{\boldsymbol{p} \in I\_{d}((d-1)(L+1))} \bigcap\_{j=1}^{d} \left\{ \boldsymbol{x} \in \mathbb{Z}\_{\geq 0}^{d} \;\middle|\; x\_{j} \leq \left\lceil \frac{p\_{j} \cdot (m/2)}{c\_{j}(d-1)} \right\rceil - 1 \right\} \\ & \subseteq \left\{ \boldsymbol{x} \in \mathbb{Z}\_{\geq 0}^{d} \;\middle|\; \sum\_{i=1}^{d} c\_{i} x\_{i} \leq (L+1) \cdot (m/2) - 1 \right\} \end{aligned}$$
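Corollary 1 can be spot-checked by brute force: every point of the first set and every point of the union of boxes lies in a finite grid, since no box bound exceeds $L \cdot (m/2) - 1$. The sketch below, with hypothetical helper names, checks both inclusions for small $d$, $m$, $\boldsymbol{c}$ and $L$; it tests finitely many instances and is not a proof.

```python
from itertools import product


def compositions(n, d):
    """I_d(n): d-tuples of strictly positive integers summing to n."""
    if d == 1:
        return [(n,)] if n >= 1 else []
    return [(f,) + r for f in range(1, n - d + 2) for r in compositions(n - f, d - 1)]


def ceil_div(a, b):
    """Exact ceiling of a / b for integers a and b > 0 (no floating point)."""
    return -(-a // b)


def corollary1_holds(c, L, m):
    """Check both inclusions of Corollary 1 for one choice of c, L and m."""
    d, h = len(c), m // 2
    bounds = [[ceil_div(p[j] * h, c[j] * (d - 1)) - 1 for j in range(d)]
              for p in compositions((d - 1) * (L + 1), d)]
    # p_j <= (d-1)L, so every box bound is at most L*h - 1, and so is every
    # coordinate of a point in the first set: this grid contains both sets
    for x in product(range(L * h), repeat=d):
        s = sum(ci * xi for ci, xi in zip(c, x))
        in_union = any(all(xj <= uj for xj, uj in zip(x, u)) for u in bounds)
        if s <= L * h - 1 and not in_union:
            return False  # first inclusion violated
        if in_union and s > (L + 1) * h - 1:
            return False  # second inclusion violated
    return True


ok2 = all(corollary1_holds(c, L, 8) for c in product(range(1, 6), repeat=2) for L in range(1, 5))
ok3 = all(corollary1_holds(c, L, 4) for c in product(range(1, 4), repeat=3) for L in range(1, 4))
print(ok2, ok3)
```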

The corollary leads to two types of box constraint: boxing, for LIA, and reduced boxing, for BV. Boxing formulae are purely conceptual and are used to reason about correctness; reduced boxing formulae are deployed within BV interpolants.

**Definition 2.** Let $\boldsymbol{c} \in \mathbb{N}^d$, $b \in \mathbb{N}$ and let $L \in \mathbb{N}$ be the unique natural number such that $(L - 1) \cdot (m/2) \le b \le L \cdot (m/2) - 1$. The boxing and reduced boxing of $\sum\_{i=1}^d c\_i x\_i \le b$ are the formulae defined as follows:

$$\text{box}\_{\mathsf{LIA}}(\boldsymbol{c}; b) \equiv \bigvee\_{\boldsymbol{p} \in I\_d((d-1)(L+1))} \bigwedge\_{j=1}^d \left( x\_j \le \left\lceil \frac{p\_j \cdot (m/2)}{c\_j(d-1)} \right\rceil - 1 \right) \tag{1}$$

$$\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; b) \equiv \bigvee\_{\boldsymbol{p} \in I\_d((d-1)(L+1))} \bigwedge\_{j=1}^d \left( x\_j \le \min\left( \left\lceil \frac{p\_j \cdot (m/2)}{c\_j(d-1)} \right\rceil - 1, m - 1 \right) \right) \tag{2}$$

Given $m$ and $b \in \mathbb{N}$, it is always possible to find a unique $L \in \mathbb{N}$ which satisfies Definition 2 by putting $L = \lfloor 2b/m \rfloor + 1$. Then $L - 1 = \lfloor 2b/m \rfloor \le 2b/m < \lfloor 2b/m \rfloor + 1 = L$, hence $(L - 1)(m/2) \le b < L(m/2)$, whence $(L - 1)(m/2) \le b \le L(m/2) - 1$ because $b$ and $L(m/2)$ are integral.
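The choice $L = \lfloor 2b/m \rfloor + 1$ and the reduced boxing of equation (2) translate directly into integer arithmetic. In this sketch (function names hypothetical) each disjunct of $\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; b)$ is represented by its tuple of upper bounds, and the exact ceiling is computed with floor division to avoid floating point; the printed boxes match Examples 1, 3 and 6.

```python
def compositions(n, d):
    """I_d(n): d-tuples of strictly positive integers summing to n."""
    if d == 1:
        return [(n,)] if n >= 1 else []
    return [(f,) + r for f in range(1, n - d + 2) for r in compositions(n - f, d - 1)]


def ceil_div(a, b):
    """Exact ceiling of a / b for integers a and b > 0."""
    return -(-a // b)


def level(b, m):
    """L = floor(2b/m) + 1, the unique L with (L-1)(m/2) <= b <= L(m/2) - 1.
    Python's // rounds down, so this is also right for negative b."""
    return (2 * b) // m + 1


def box_bv(c, b, m):
    """Reduced boxing (2): one tuple of upper bounds per p in I_d((d-1)(L+1))."""
    d, h, L = len(c), m // 2, level(b, m)
    return [tuple(min(ceil_div(p[j] * h, c[j] * (d - 1)) - 1, m - 1) for j in range(d))
            for p in compositions((d - 1) * (L + 1), d)]


# the defining inequalities of L hold for every b and m tried
assert all((level(b, m) - 1) * (m // 2) <= b <= level(b, m) * (m // 2) - 1
           for m in (4, 8, 16) for b in range(-20, 100))

print(box_bv((1, 1), 3, 8))   # Example 1: [(3, 3)]
print(box_bv((1, 2), 5, 8))   # Example 3: [(3, 3), (7, 1)]
print(box_bv((7, 3), 11, 8))  # Example 6: [(0, 3), (1, 2), (1, 1)]
```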

One might expect the cardinality of $I\_d((d - 1)(L + 1))$ to become large as $d$ or $L$ grow. Yet $d$ is the number of variables occurring in the LIA interpolant, which is typically small. Furthermore, when $L$ is large, many components of $\boldsymbol{p}$ are also large, so that many terms become equivalent because of the min operation in equation (2) of Definition 2. Thus the number of terms required to define $\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; b)$ does not grow excessively large in practice.
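The collapse of terms described above can be observed by counting. The sketch below (hypothetical helpers) reports the number of terms $\boldsymbol{p}$, the number of distinct bound tuples after the $\min$ clamp and, as an extra simplification that Definition 2 does not itself perform, the number of boxes left after discarding any box contained in another.

```python
def compositions(n, d):
    """I_d(n): d-tuples of strictly positive integers summing to n."""
    if d == 1:
        return [(n,)] if n >= 1 else []
    return [(f,) + r for f in range(1, n - d + 2) for r in compositions(n - f, d - 1)]


def ceil_div(a, b):
    """Exact ceiling of a / b for integers a and b > 0."""
    return -(-a // b)


def box_bv(c, b, m):
    """Reduced boxing (2) as a list of upper-bound tuples, with L = floor(2b/m) + 1."""
    d, h, L = len(c), m // 2, (2 * b) // m + 1
    return [tuple(min(ceil_div(p[j] * h, c[j] * (d - 1)) - 1, m - 1) for j in range(d))
            for p in compositions((d - 1) * (L + 1), d)]


def box_counts(c, b, m):
    """(#terms p, #distinct bound tuples, #tuples not contained in another box)."""
    raw = box_bv(c, b, m)
    distinct = set(raw)
    maximal = {u for u in distinct
               if not any(v != u and all(uj <= vj for uj, vj in zip(u, v)) for v in distinct)}
    return len(raw), len(distinct), len(maximal)


print(box_counts((1, 1), 30, 8))  # (8, 3, 1): the min(., m - 1) clamp collapses most terms
print(box_counts((7, 3), 11, 8))  # (3, 3, 2): (x <= 1, y <= 1) lies inside (x <= 1, y <= 2)
```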

The following proposition asserts that the boxing and reduced boxing formulae share the same solution set when interpreted with, respectively, the LIA and BV semantics.

**Proposition 1.** $[\text{box}\_{\mathsf{LIA}}(\boldsymbol{c}; b)]\_{\mathsf{LIA}} = [\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; b)]\_{\mathsf{BV}}$

Example 1. To demonstrate this equivalence, consider again $x + y \le 3$ for $m = 8$. Then put $L = \lfloor 6/8 \rfloor + 1 = 1$ and $I\_2((d - 1)(L + 1)) = I\_2(2) = \{\langle 1, 1 \rangle\}$. Observe $\text{box}\_{\mathsf{LIA}}(\langle 1, 1 \rangle; 3) = \text{box}\_{\mathsf{BV}}(\langle 1, 1 \rangle; 3)$ since

$$\text{box}\_{\mathsf{LIA}}(\langle 1, 1 \rangle; 3) = (x \le \lceil 4/1 \rceil - 1 = 3) \land (y \le \lceil 4/1 \rceil - 1 = 3)$$

$$\text{box}\_{\mathsf{BV}}(\langle 1, 1 \rangle; 3) = (x \le \min(3, 7) = 3) \land (y \le \min(3, 7) = 3)$$

Example 2. Although $[\text{box}\_{\mathsf{LIA}}(\boldsymbol{c}; b)]\_{\mathsf{LIA}} = [\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; b)]\_{\mathsf{BV}}$, it does not necessarily follow that $[\text{box}\_{\mathsf{LIA}}(\boldsymbol{c}; b)]\_{\mathsf{LIA}} = [\text{box}\_{\mathsf{LIA}}(\boldsymbol{c}; b)]\_{\mathsf{BV}}$. To illustrate, consider $x + y \le 7$ for $d = 2$ and $m = 4$. Thus $\boldsymbol{c} = \langle 1, 1 \rangle$ and $b = 7$. Then $L = \lfloor 14/4 \rfloor + 1 = 4$ and $I\_2((d - 1)(L + 1)) = I\_2(5) = \{\langle 1, 4 \rangle, \langle 2, 3 \rangle, \langle 3, 2 \rangle, \langle 4, 1 \rangle\}$, hence

$$\begin{array}{c} \text{box}\_{\mathsf{LIA}}(\boldsymbol{c}; b) = \left(x \le 1 \land y \le 7\right) \lor \left(x \le 3 \land y \le 5\right) \lor \\ \left(x \le 5 \land y \le 3\right) \lor \left(x \le 7 \land y \le 1\right) \end{array}$$

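Therefore $[\text{box}\_{\mathsf{LIA}}(\boldsymbol{c}; b)]\_{\mathsf{LIA}} = \mathsf{M}^2$ but $\langle 2, 2 \rangle \notin [\text{box}\_{\mathsf{LIA}}(\boldsymbol{c}; b)]\_{\mathsf{BV}}$, since under the BV semantics the bounds 5 and 7 wrap around to 1 and 3.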
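Example 2 can be replayed mechanically. The sketch (hypothetical helper names) builds $\text{box}\_{\mathsf{LIA}}(\boldsymbol{c}; b)$ without the $\min$ clamp of equation (2) and reads each atom $x\_j \le u\_j$ either over the integers or, as in Definition 1, modulo $m$:

```python
from itertools import product


def compositions(n, d):
    """I_d(n): d-tuples of strictly positive integers summing to n."""
    if d == 1:
        return [(n,)] if n >= 1 else []
    return [(f,) + r for f in range(1, n - d + 2) for r in compositions(n - f, d - 1)]


def ceil_div(a, b):
    """Exact ceiling of a / b for integers a and b > 0."""
    return -(-a // b)


def box_lia(c, b, m):
    """Boxing (1): upper bounds WITHOUT the min(., m - 1) clamp of equation (2)."""
    d, h, L = len(c), m // 2, (2 * b) // m + 1
    return [tuple(ceil_div(p[j] * h, c[j] * (d - 1)) - 1 for j in range(d))
            for p in compositions((d - 1) * (L + 1), d)]


def in_boxes(x, boxes, m=None):
    """LIA reading if m is None; the BV reading compares x_j mod m with u_j mod m."""
    if m is None:
        return any(all(xj <= uj for xj, uj in zip(x, u)) for u in boxes)
    return any(all(xj % m <= uj % m for xj, uj in zip(x, u)) for u in boxes)


m, c, b = 4, (1, 1), 7  # x + y <= 7 over 2-bit vectors
boxes = box_lia(c, b, m)
print(boxes)                                    # [(1, 7), (3, 5), (5, 3), (7, 1)]
square = list(product(range(m), repeat=2))
print(all(in_boxes(x, boxes) for x in square))  # True: the LIA reading is all of M^2
print(in_boxes((2, 2), boxes, m))               # False: bounds 5 and 7 wrap to 1 and 3
```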

The following lemma shows that the solution sets for boxing grow monotonically as the constant of the inequality is relaxed.

**Lemma 2.** If $b \le b'$ then $[\text{box}\_{\mathsf{LIA}}(\boldsymbol{c}; b)]\_{\mathsf{LIA}} \subseteq [\text{box}\_{\mathsf{LIA}}(\boldsymbol{c}; b')]\_{\mathsf{LIA}}$.

The following result explains how to augment an inequality with a box so as to align its BV semantics with its LIA semantics.

**Theorem 1 (boxing without gapping).** Let $\boldsymbol{c} \in \mathbb{N}^d$ and $b \in \mathbb{N}$. If $b < m/2$ then

$$\left[ \sum\_{i=1}^d c\_i x\_i \le b \right]\_{\mathsf{LIA}} = \left[ \left( \sum\_{i=1}^d c\_i x\_i \le b \right) \land \text{box}\_{\mathsf{BV}}(\boldsymbol{c}; b) \right]\_{\mathsf{BV}}$$
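Theorem 1 can be checked exhaustively for small bit-widths. The sketch below (helper names hypothetical) compares the two solution sets over all of $\mathsf{M}^d$; because every reduced box bound lies in $[0, m-1]$, an atom $x\_j \le u\_j$ means the same under both semantics. The last line shows the equivalence failing once $b \ge m/2$, which is the situation of Example 3 below and the motivation for gapping.

```python
from itertools import product


def compositions(n, d):
    """I_d(n): d-tuples of strictly positive integers summing to n."""
    if d == 1:
        return [(n,)] if n >= 1 else []
    return [(f,) + r for f in range(1, n - d + 2) for r in compositions(n - f, d - 1)]


def ceil_div(a, b):
    """Exact ceiling of a / b for integers a and b > 0."""
    return -(-a // b)


def box_bv(c, b, m):
    """Reduced boxing (2) as a list of upper-bound tuples, with L = floor(2b/m) + 1."""
    d, h, L = len(c), m // 2, (2 * b) // m + 1
    return [tuple(min(ceil_div(p[j] * h, c[j] * (d - 1)) - 1, m - 1) for j in range(d))
            for p in compositions((d - 1) * (L + 1), d)]


def in_box(x, boxes):
    """x satisfies some conjunction of atoms x_j <= u_j (bounds lie in [0, m-1])."""
    return any(all(xj <= uj for xj, uj in zip(x, u)) for u in boxes)


def theorem1_holds(c, b, m):
    """Compare [c.x <= b]_LIA with [(c.x <= b) and box_BV(c; b)]_BV over all of M^d."""
    boxes = box_bv(c, b, m)
    for x in product(range(m), repeat=len(c)):
        s = sum(ci * xi for ci, xi in zip(c, x))
        if (s <= b) != (s % m <= b % m and in_box(x, boxes)):
            return False
    return True


t8 = all(theorem1_holds(c, b, 8) for c in product(range(1, 8), repeat=2) for b in range(4))
t4 = all(theorem1_holds(c, b, 4) for c in product(range(1, 4), repeat=3) for b in range(2))
t_gap = theorem1_holds((1, 2), 5, 8)  # b = 5 >= m/2: <2, 3> slips through
print(t8, t4, t_gap)                  # True True False
```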

**Fig. 2.** Gapping and boxing for $x + 2y \le 5$: (a) $x + 2y \le 5$ with boxes; (b) $x + 2y \le 3$ with box; (c) $0 \le x + 2y - 4 \le 1$ with boxes.

Observe that the result requires $b < m/2$. In this circumstance $L = \lfloor 2b/m \rfloor + 1 = 1$ and the number of logical connectives in $\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; b)$ is determined by the cardinality of the set $I\_d((d - 1)(L + 1)) = I\_d(2(d - 1))$, which is given below:


where $\Pi(\boldsymbol{v})$ denotes the set of permutations of the vector $\boldsymbol{v}$. For $d = 4$, $\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; b)$ thus requires $10(d - 1) = 30$ binary conjunctions and $10 - 1 = 9$ disjunctions.
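The connective count quoted above can be cross-checked by enumeration: by stars and bars, $|I\_d(2(d-1))| = \binom{2d-3}{d-1}$, which is 10 for $d = 4$, and each disjunct is a conjunction of $d$ atoms. A small sketch with hypothetical names:

```python
from math import comb


def compositions(n, d):
    """I_d(n): d-tuples of strictly positive integers summing to n."""
    if d == 1:
        return [(n,)] if n >= 1 else []
    return [(f,) + r for f in range(1, n - d + 2) for r in compositions(n - f, d - 1)]


def connective_counts(d):
    """For b < m/2 (so L = 1): (#disjuncts, #binary conjunctions, #binary disjunctions)."""
    k = len(compositions(2 * (d - 1), d))
    return k, k * (d - 1), k - 1


print(connective_counts(4))  # (10, 30, 9)
assert all(len(compositions(2 * (d - 1), d)) == comb(2 * d - 3, d - 1) for d in range(2, 9))
```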

#### **3.2 Boxing and Gapping**

Example 3. Consider $[x + 2y \le 5]\_{\mathsf{BV}}$ and $[x + 2y \le 5]\_{\mathsf{LIA}}$ for $m = 8$ as shown in Figure 2(a). Observe

$$\text{box}\_{\mathsf{BV}}(\langle 1,2 \rangle; 5) = (x \le 3 \land y \le 3) \lor (x \le 7 \land y \le 1)$$

which is illustrated by the two grey rectangles. Hence $\langle 2, 3 \rangle \notin [x + 2y \le 5]\_{\mathsf{LIA}}$ but $\langle 2, 3 \rangle \in [x + 2y \le 5 \land \text{box}\_{\mathsf{BV}}(\langle 1, 2 \rangle; 5)]\_{\mathsf{BV}}$, because $2 + 2 \cdot 3 = 8$ wraps around to $0$; therefore boxing alone is not sufficient to encode the LIA inequality $x + 2y \le 5$.

Example 4. Yet the LIA inequality $x + 2y \le 5$ can be decomposed as follows:

$$\begin{array}{c} \left\lbrack x + 2y \le 5 \right\rbrack\_{\mathsf{LIA}} = \left\lbrack x + 2y \le 3 \right\rbrack\_{\mathsf{LIA}} \cup \left\lbrack 4 \le x + 2y \le 5 \right\rbrack\_{\mathsf{LIA}}\\ = \left\lbrack x + 2y \le 3 \right\rbrack\_{\mathsf{LIA}} \cup \left\lbrack 0 \le x + 2y - 4 \le 1 \right\rbrack\_{\mathsf{LIA}} \end{array}$$

Figures 2(b, c) illustrate boxing for $x + 2y \le 3$ and $0 \le x + 2y - 4 \le 1$, where:

$$\begin{array}{c} \left\lbrack x + 2y \le 3 \right\rbrack\_{\mathsf{LIA}} = \left\lbrack x + 2y \le 3 \land \mathsf{box\_{\mathsf{BV}}}(\langle 1, 2 \rangle; 3) \right\rbrack\_{\mathsf{BV}}\\ = \left\lbrack x + 2y \le 3 \land (x \le 3 \land y \le 1) \right\rbrack\_{\mathsf{BV}} \end{array}$$

Observe from Figure 2(c) that

$$\left[ 0 \le x + 2y - 4 \le 1 \right]\_{\mathsf{LIA}} = \left[ x + 2y - 4 \le 1 \land \text{box}\_{\mathsf{BV}}(\langle 1, 2 \rangle; 5) \right]\_{\mathsf{BV}}$$

therefore cumulatively $[x + 2y \le 5]\_{\mathsf{LIA}} = [\varphi\_1 \lor \varphi\_2]\_{\mathsf{BV}}$ where

$$\begin{array}{ll} \varphi\_1 = [x+2y \le 3 \quad \land \left(x \le 3 \land y \le 1\right)] \\ \varphi\_2 = [x+2y-4 \le 1 \land \left(\left(x \le 3 \land y \le 3\right) \lor \left(x \le 7 \land y \le 1\right)\right)] \end{array}$$

The following theorem gives the general rule: the given inequality is separated into gaps, each of which is then boxed.

**Theorem 2 (boxing with gapping).** Let $\boldsymbol{c} \in \mathbb{N}^d$ and $b \in \mathbb{N}$. Then $[\sum\_{i=1}^d c\_i x\_i \le b]\_{\mathsf{LIA}} = [\phi\_0 \lor \phi\_1 \lor \phi\_2]\_{\mathsf{BV}}$ where $S = \lfloor b/(m/2) \rfloor$ and

$$\begin{array}{l} \phi\_0 \equiv \left(\sum\_{i=1}^d c\_i x\_i - (S-2)(m/2) \le m/2 - 1\right) \land \text{box}\_{\mathsf{BV}}(\boldsymbol{c}; (S-1)(m/2) - 1) \\ \phi\_1 \equiv \left(\sum\_{i=1}^d c\_i x\_i - (S-1)(m/2) \le m/2 - 1\right) \land \text{box}\_{\mathsf{BV}}(\boldsymbol{c}; S(m/2) - 1) \\ \phi\_2 \equiv \left(\sum\_{i=1}^d c\_i x\_i - S(m/2) \le b \bmod (m/2)\right) \land \text{box}\_{\mathsf{BV}}(\boldsymbol{c}; b) \end{array}$$

**Corollary 2 (boxing and gapping with simplification).** If $\lfloor b/(m/2) \rfloor = 1$ or $b \bmod m = m/2 - 1$ then $[\sum\_{i=1}^d c\_i x\_i \le b]\_{\mathsf{LIA}} = [\phi\_1 \lor \phi\_2]\_{\mathsf{BV}}$.
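Theorem 2 and Corollary 2 lend themselves to the same exhaustive check. In the sketch below (hypothetical names) each $\phi\_k$ is represented by the constant it subtracts, its right-hand side and the bound tuples of its reduced box; the first check replays Example 6 below, the others sweep small coefficients and constants for $m = 8$ and $m = 4$. Passing these checks is evidence, not a proof.

```python
from itertools import product


def compositions(n, d):
    """I_d(n): d-tuples of strictly positive integers summing to n."""
    if d == 1:
        return [(n,)] if n >= 1 else []
    return [(f,) + r for f in range(1, n - d + 2) for r in compositions(n - f, d - 1)]


def ceil_div(a, b):
    """Exact ceiling of a / b for integers a and b > 0."""
    return -(-a // b)


def box_bv(c, b, m):
    """Reduced boxing (2); empty (i.e. false) when (d-1)(L+1) < d, as for b = -1."""
    d, h, L = len(c), m // 2, (2 * b) // m + 1
    return [tuple(min(ceil_div(p[j] * h, c[j] * (d - 1)) - 1, m - 1) for j in range(d))
            for p in compositions((d - 1) * (L + 1), d)]


def gapping_parts(c, b, m):
    """phi_0, phi_1, phi_2 as (offset, rhs, boxes): (c.x - offset <= rhs) and box."""
    h = m // 2
    S = b // h
    return [((S - 2) * h, h - 1, box_bv(c, (S - 1) * h - 1, m)),  # phi_0
            ((S - 1) * h, h - 1, box_bv(c, S * h - 1, m)),        # phi_1
            (S * h, b % h, box_bv(c, b, m))]                      # phi_2


def bv_sat(part, c, x, m):
    """BV reading of one part; the box bounds lie in [0, m-1], so no wrap there."""
    offset, rhs, boxes = part
    s = sum(ci * xi for ci, xi in zip(c, x))
    return (s - offset) % m <= rhs % m and any(
        all(xj <= uj for xj, uj in zip(x, u)) for u in boxes)


def gapping_holds(c, b, m, use=(0, 1, 2)):
    """Compare [c.x <= b]_LIA with the BV disjunction of the selected parts over M^d."""
    parts = gapping_parts(c, b, m)
    for x in product(range(m), repeat=len(c)):
        lia = sum(ci * xi for ci, xi in zip(c, x)) <= b
        if lia != any(bv_sat(parts[k], c, x, m) for k in use):
            return False
    return True


example6 = gapping_holds((7, 3), 17, 8)
sweep8 = all(gapping_holds(c, b, 8) for c in product(range(1, 6), repeat=2) for b in range(30))
sweep4 = all(gapping_holds(c, b, 4) for c in product(range(1, 4), repeat=3) for b in range(12))
# Corollary 2: phi_0 may be dropped when floor(b/(m/2)) = 1 or b mod m = m/2 - 1
cor2 = all(gapping_holds(c, b, 8, use=(1, 2)) for c in product(range(1, 6), repeat=2)
           for b in range(30) if b // 4 == 1 or b % 8 == 3)
print(example6, sweep8, sweep4, cor2)
```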

Example 5. Let $m = 8$ and consider again $x + 2y \le 5$ so that $\boldsymbol{c} = \langle 1, 2 \rangle$. Then $S = \lfloor 5/4 \rfloor = 1$ and, applying Corollary 2, $[x + 2y \le 5]\_{\mathsf{LIA}} = [\phi\_1 \lor \phi\_2]\_{\mathsf{BV}}$ where

$$\begin{array}{llll} \phi\_1 \equiv (x+2y-0 \cdot 4 \le 4-1) & \land \text{box}\_{\mathsf{BV}}(\mathsf{c}; 1 \cdot 4-1) = \varphi\_1 \\ \phi\_2 \equiv (x+2y-1 \cdot 4 \le 5 \text{ mod } 4) \land \text{box}\_{\mathsf{BV}}(\mathsf{c}; 5) & = \varphi\_2 \end{array}$$

aligning with the intuition given in Example 4.

Example 6. Figure 3 illustrates Theorem 2 for $7x + 3y \le 17$ and $m = 8$. Then $S = \lfloor 17/(8/2) \rfloor = 4$ and $[7x + 3y \le 17]\_{\mathsf{LIA}} = [\phi\_0 \lor \phi\_1 \lor \phi\_2]\_{\mathsf{BV}}$ where

$$\begin{array}{l} \phi\_0 = 7x + 3y - 8 \leq 3 \land \text{box}\_{\mathsf{BV}}(\boldsymbol{c}; 11) \\ \phi\_1 = 7x + 3y - 12 \leq 3 \land \text{box}\_{\mathsf{BV}}(\boldsymbol{c}; 15) \\ \phi\_2 = 7x + 3y - 16 \leq 1 \land \text{box}\_{\mathsf{BV}}(\boldsymbol{c}; 17) \end{array}$$

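The $\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; 11)$, $\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; 15)$ and $\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; 17)$ formulae are again depicted in grey. For example,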

$$\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; 11) = (x \le 0 \land y \le 3) \lor (x \le 1 \land y \le 2) \lor (x \le 1 \land y \le 1)$$

because $d = 2$, $L = 3$ and $I\_2((d-1)(L+1)) = \{\langle 1, 3 \rangle, \langle 2, 2 \rangle, \langle 3, 1 \rangle\}$. From Figure 3 observe $[7x + 3y \le 17]\_{\mathsf{LIA}} = [\phi\_0]\_{\mathsf{BV}} \cup [\phi\_1]\_{\mathsf{BV}} \cup [\phi\_2]\_{\mathsf{BV}}$.

Example 7. Consider again Example 5 where $S = 1$. Then $\phi\_0 = \mathit{false}$ because $\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; (S - 1)(m/2) - 1) = \text{box}\_{\mathsf{BV}}(\boldsymbol{c}; -1) = \mathit{false}$. This is because $L = 0$ and $I\_d((d - 1)(L + 1)) = I\_2(1) = \emptyset$. Theorem 2 then gives $[x + 2y \le 5]\_{\mathsf{LIA}} = [\phi\_1 \lor \phi\_2]\_{\mathsf{BV}}$, which squares with Corollary 2.

**Fig. 3.** Gapping and boxing for $7x + 3y \le 17$ where $\boldsymbol{c} = \langle 7, 3 \rangle$, $m = 8$ and $S = 4$

#### **3.3 Boxing, Gapping and Flipping**

To handle inequalities which have indeterminates with negative coefficients, boxing and gapping are augmented with a third technique, which we have informally named flipping. Flipping transforms an inequality into a syntactic form which is amenable to boxing and gapping by reflecting the solutions of the inequality. To detail the transformation, we assume, without loss of generality, that an inequality takes the syntactic form $\boldsymbol{c}^+ \cdot \boldsymbol{x}^+ + \boldsymbol{c}^- \cdot \boldsymbol{x}^- \le b$ where $\boldsymbol{c}^+ > \boldsymbol{0}$ and $\boldsymbol{c}^- < \boldsymbol{0}$. Hence $\boldsymbol{x} = \boldsymbol{x}^+ \circ \boldsymbol{x}^-$ where $\circ$ denotes vector concatenation. The act of flipping reflects the solutions of the inequality simultaneously around the axes $x^-\_1 = 0, \dots, x^-\_e = 0$ where $\boldsymbol{x}^- = \langle x^-\_1, \dots, x^-\_e \rangle$ and $e$ is the dimension of $\boldsymbol{x}^-$. The development starts with the flipping transformation itself:

**Definition 3.** Given $e \in \{1, \dots, d\}$, the (semantic) flipping function $F\_e : \mathsf{M}^d \to \mathsf{M}^d$ is defined by:

$$F\_e(\langle x\_1^+, \dots, x\_{d-e}^+, x\_1^-, \dots, x\_e^- \rangle) = \langle x\_1^+, \dots, x\_{d-e}^+, m - 1 - x\_1^-, \dots, m - 1 - x\_e^- \rangle.$$

Given an inequality with negative coefficients, we derive a new inequality whose solutions coincide with the flipped solutions of the given inequality. This transformation is then lifted to formulae as follows:

**Definition 4.** Given a partition of $\boldsymbol{x}$ into the sub-vectors $\boldsymbol{x}^+ = \langle x^+\_1, \dots, x^+\_{d-e} \rangle$ and $\boldsymbol{x}^- = \langle x^-\_1, \dots, x^-\_e \rangle$, the (syntactic) flipping function $F\_{\boldsymbol{x}^-}$ is defined by:

$$\begin{array}{c} F\_{\boldsymbol{x}^{-}}(\boldsymbol{c}^{+} \cdot \boldsymbol{x}^{+} + \boldsymbol{c}^{-} \cdot \boldsymbol{x}^{-} \le b) = \boldsymbol{c}^{+} \cdot \boldsymbol{x}^{+} - \boldsymbol{c}^{-} \cdot \boldsymbol{x}^{-} + (m-1)(\boldsymbol{c}^{-} \cdot \boldsymbol{1}) \le b \\ F\_{\boldsymbol{x}^{-}}(f\_{1} \vee f\_{2}) = F\_{\boldsymbol{x}^{-}}(f\_{1}) \vee F\_{\boldsymbol{x}^{-}}(f\_{2}) \\ F\_{\boldsymbol{x}^{-}}(f\_{1} \wedge f\_{2}) = F\_{\boldsymbol{x}^{-}}(f\_{1}) \wedge F\_{\boldsymbol{x}^{-}}(f\_{2}) \\ F\_{\boldsymbol{x}^{-}}(\neg f) = \neg F\_{\boldsymbol{x}^{-}}(f) \end{array}$$

**Fig. 4.** Flipping $\varphi = 7x - 3y \le -4$ where $m = 8$, $\boldsymbol{x} = \langle x, y \rangle$, $\boldsymbol{x}^+ = \langle x \rangle$ and $\boldsymbol{x}^- = \langle y \rangle$: (c) $F\_y(\phi\_1) = 7x - 3y + 9 \le 3 \land F\_y(\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; 15))$; (d) $F\_y(\phi\_2) = 7x - 3y + 5 \le 1 \land F\_y(\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; 17))$.

The overall strategy involves applying boxing and gapping to an inequality derived by the flipping function $F\_{\boldsymbol{x}^-}$. The validity of this strategy is based on the following proposition:

**Proposition 2.** If $|\boldsymbol{x}^-| = e$ then

$$\begin{aligned} \left[F\_{\boldsymbol{x}^-}(f)\right]\_{\mathsf{LIA}} &= F\_e(\left[f\right]\_{\mathsf{LIA}}) \\ \left[F\_{\boldsymbol{x}^-}(f)\right]\_{\mathsf{BV}} &= F\_e(\left[f\right]\_{\mathsf{BV}}) \end{aligned}$$
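Definitions 3 and 4 and Proposition 2 can be illustrated on the inequality $7x - 3y \le -4$ that Example 8 below works through. In this sketch (all names hypothetical) an atom $(\boldsymbol{a}, k, r)$ stands for $\boldsymbol{a} \cdot \boldsymbol{x} + k \le r$, and the syntactic flip negates the coefficients of the last $e$ variables and adds $(m-1)$ times their original sum to the constant, as in Definition 4. Since $F\_e$ is an involution, Proposition 2 amounts to $\boldsymbol{x} \in [F\_{\boldsymbol{x}^-}(f)]$ if and only if $F\_e(\boldsymbol{x}) \in [f]$, which the loop checks under both semantics.

```python
from itertools import product


def flip_syntax(atom, e, m):
    """Definition 4 on one atom a.x + k <= r: negate the last e coefficients
    and add (m - 1) * (their original sum) to the constant k."""
    a, k, r = atom
    d = len(a)
    flipped = tuple(-ai if i >= d - e else ai for i, ai in enumerate(a))
    return flipped, k + (m - 1) * sum(a[d - e:]), r


def flip_point(x, e, m):
    """Definition 3 (F_e): map each of the last e coordinates x_i to m - 1 - x_i."""
    d = len(x)
    return tuple(m - 1 - xi if i >= d - e else xi for i, xi in enumerate(x))


def holds(atom, x, modulus=None):
    """LIA reading if modulus is None; BV reading (both sides mod m) otherwise."""
    a, k, r = atom
    lhs = sum(ai * xi for ai, xi in zip(a, x)) + k
    return lhs <= r if modulus is None else lhs % modulus <= r % modulus


m = 8
phi = ((7, -3), 0, -4)          # 7x - 3y <= -4, flipping y (e = 1)
print(flip_syntax(phi, 1, m))   # ((7, 3), -21, -4), i.e. 7x + 3y - 21 <= -4
for sem in (None, m):           # Proposition 2, LIA and BV semantics
    for x in product(range(m), repeat=2):
        assert holds(flip_syntax(phi, 1, m), x, sem) == holds(phi, flip_point(x, 1, m), sem)
```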

A complete strategy for handling inequalities with negative coefficients is justified by the following corollary. The strategy entails flipping an LIA inequality, deriving a BV formula by boxing and gapping, and then flipping the BV formula.

**Corollary 3.** Suppose $\boldsymbol{c}^+ > \boldsymbol{0}$, $\boldsymbol{c}^- < \boldsymbol{0}$ and

$$\left[\boldsymbol{c}^{+} \cdot \boldsymbol{x}^{+} - \boldsymbol{c}^{-} \cdot \boldsymbol{x}^{-} \leq b + (1 - m)(\boldsymbol{c}^{-} \cdot \boldsymbol{1})\right]\_{\mathsf{LIA}} = \left[\phi\_{0} \vee \phi\_{1} \vee \phi\_{2}\right]\_{\mathsf{BV}}$$

Then

$$\left[\boldsymbol{c}^{+} \cdot \boldsymbol{x}^{+} + \boldsymbol{c}^{-} \cdot \boldsymbol{x}^{-} \leq b\right]\_{\mathsf{LIA}} = \left[F\_{\boldsymbol{x}^{-}}(\phi\_{0}) \vee F\_{\boldsymbol{x}^{-}}(\phi\_{1}) \vee F\_{\boldsymbol{x}^{-}}(\phi\_{2})\right]\_{\mathsf{BV}}$$

Example 8. Consider $\varphi = 7x - 3y \le -4$, which is illustrated in Fig. 4(a). Then $\boldsymbol{x}^+ = \langle x \rangle$, $\boldsymbol{x}^- = \langle y \rangle$ and $F\_{\boldsymbol{x}^-}(\varphi) = F\_y(\varphi) = 7x + 3y - 21 \le -4$. Fig. 3(a) shows $[7x + 3y - 21 \le -4]\_{\mathsf{LIA}} = [7x + 3y \le 17]\_{\mathsf{LIA}}$ and so, building on Example 6, $[7x + 3y \le 17]\_{\mathsf{LIA}} = [\phi\_0 \lor \phi\_1 \lor \phi\_2]\_{\mathsf{BV}}$. By Corollary 3 it follows that $[\varphi]\_{\mathsf{LIA}} = [F\_y(\phi\_0) \lor F\_y(\phi\_1) \lor F\_y(\phi\_2)]\_{\mathsf{BV}}$, where $F\_y(\phi\_0)$, $F\_y(\phi\_1)$ and $F\_y(\phi\_2)$ are given in Fig. 4(b), (c) and (d) respectively. To illustrate the handling of boxing, recall $\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; 11)$ from Example 6 and

$$\begin{aligned} \text{box}\_{\mathsf{BV}}(\boldsymbol{c}; 11) &= (x \le 0 \land y \le 3) \lor (x \le 1 \land y \le 2) \lor (x \le 1 \land y \le 1) \\ F\_{\langle y \rangle}(\text{box}\_{\mathsf{BV}}(\boldsymbol{c}; 11)) &= (x \le 0 \land (-y + 7 \le 3)) \lor (x \le 1 \land (-y + 7 \le 2)) \lor (x \le 1 \land (-y + 7 \le 1)) \end{aligned}$$

Finally observe

$$\begin{aligned} \left[x \le 0 \land (-y + 7 \le 3)\right]\_{\mathsf{BV}} &= \{ (0, y) \in \mathsf{M}^2 \mid 4 \le y \le 7 \} \\ \left[x \le 1 \land (-y + 7 \le 2)\right]\_{\mathsf{BV}} &= \{ (x, y) \in \mathsf{M}^2 \mid 0 \le x \le 1 \land 5 \le y \le 7 \} \end{aligned}$$

and that the disjunct $(x \le 1 \land (-y + 7 \le 1))$ is actually redundant, being subsumed by $(x \le 1 \land (-y + 7 \le 2))$.
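The whole pipeline of Corollary 3 (flip, box and gap, flip back) fits in a short brute-force sketch. For brevity it assumes one positive and one negative coefficient, expands Theorem 2 into a disjunction of conjunctions of atoms, applies the syntactic flip of Definition 4 to every atom (so a box atom $y \le u$ becomes $-y + (m-1) \le u$, as above), and compares the result with the LIA solutions. All names are hypothetical.

```python
from itertools import product


def compositions(n, d):
    """I_d(n): d-tuples of strictly positive integers summing to n."""
    if d == 1:
        return [(n,)] if n >= 1 else []
    return [(f,) + r for f in range(1, n - d + 2) for r in compositions(n - f, d - 1)]


def ceil_div(a, b):
    """Exact ceiling of a / b for integers a and b > 0."""
    return -(-a // b)


def box_bv(c, b, m):
    """Reduced boxing (2) as a list of upper-bound tuples, with L = floor(2b/m) + 1."""
    d, h, L = len(c), m // 2, (2 * b) // m + 1
    return [tuple(min(ceil_div(p[j] * h, c[j] * (d - 1)) - 1, m - 1) for j in range(d))
            for p in compositions((d - 1) * (L + 1), d)]


def gapping_dnf(c, b, m):
    """Theorem 2 as a list of conjunctions of atoms (a, k, r), read as a.x + k <= r."""
    d, h = len(c), m // 2
    S = b // h
    unit = [tuple(int(i == j) for i in range(d)) for j in range(d)]  # x_j as a linear form
    dnf = []
    for offset, rhs, box_b in [((S - 2) * h, h - 1, (S - 1) * h - 1),  # phi_0
                               ((S - 1) * h, h - 1, S * h - 1),        # phi_1
                               (S * h, b % h, b)]:                     # phi_2
        for u in box_bv(c, box_b, m):
            dnf.append([(tuple(c), -offset, rhs)] + [(unit[j], 0, u[j]) for j in range(d)])
    return dnf


def flip_syntax(atom, e, m):
    """Definition 4 on one atom: negate the last e coefficients, add (m-1)*(their sum)."""
    a, k, r = atom
    d = len(a)
    return (tuple(-ai if i >= d - e else ai for i, ai in enumerate(a)),
            k + (m - 1) * sum(a[d - e:]), r)


def bv_atom(atom, x, m):
    """BV reading of a.x + k <= r: both sides reduced modulo m."""
    a, k, r = atom
    return (sum(ai * xi for ai, xi in zip(a, x)) + k) % m <= r % m


def corollary3_holds(c_pos, c_neg, b, m):
    """Encode c_pos.x+ + c_neg.x- <= b (c_neg < 0): flip, box and gap, flip back."""
    e = len(c_neg)
    c_flip = tuple(c_pos) + tuple(-ci for ci in c_neg)   # all coefficients positive
    b_flip = b + (1 - m) * sum(c_neg)                     # the constant of Corollary 3
    dnf = [[flip_syntax(atom, e, m) for atom in conj] for conj in gapping_dnf(c_flip, b_flip, m)]
    c_all = tuple(c_pos) + tuple(c_neg)
    for x in product(range(m), repeat=len(c_all)):
        lia = sum(ci * xi for ci, xi in zip(c_all, x)) <= b
        if lia != any(all(bv_atom(atom, x, m) for atom in conj) for conj in dnf):
            return False
    return True


print(flip_syntax(((0, 1), 0, 3), 1, 8))  # ((0, -1), 7, 3), i.e. -y + 7 <= 3
example8 = corollary3_holds((7,), (-3,), -4, 8)
sweep = all(corollary3_holds((cp,), (cn,), b, 8)
            for cp in range(1, 5) for cn in range(-4, 0)
            for b in range(7 * cn, 7 * cn + 20))  # b chosen so that b_flip >= 0
print(example8, sweep)
```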

### **3.4 Boxing, Gapping, Flipping and Demoding**

Griggio [16] gives a procedure for encoding machine arithmetic in LIA, illustrating that the resulting LIA interpolants can include inequalities such as −x2+x3−256−x2/256 ≤ 255 [16, Example 5]. Relaxing inequalities to include ceiling (or floor) functions can reduce the size of interpolants whilst simplifying their derivation [17]. These more general forms of interpolant include inequalities of the form $\boldsymbol{c} \cdot \boldsymbol{x} + n' \lfloor \boldsymbol{c}' \cdot \boldsymbol{x}/n \rfloor \le b$ [9] or $\boldsymbol{c} \cdot \boldsymbol{x} + n' \lceil \boldsymbol{c}' \cdot \boldsymbol{x}/n \rceil \le b$ [17], though for our purposes it is sufficient to consider $\boldsymbol{c} \cdot \boldsymbol{x} + n' 2^n \lfloor \boldsymbol{c}' \cdot \boldsymbol{x}/2^n \rfloor \le b$ or $\boldsymbol{c} \cdot \boldsymbol{x} + n' 2^n \lceil \boldsymbol{c}' \cdot \boldsymbol{x}/2^n \rceil \le b$, where the divisors are powers of 2 because of the way wrap-around is modelled in machine arithmetic. To extend boxing to these generalised interpolants we extend the LIA and BV semantics to two new types of atomic constraint (though the definitions are almost vacuous):

**Definition 5.** If $\ell \equiv \boldsymbol{c} \cdot \boldsymbol{x} + n' \lfloor \boldsymbol{c}' \cdot \boldsymbol{x} / 2^n \rfloor \le b$ then

$$\begin{aligned} \left[\ell\right]\_{\mathsf{LIA}} &= \left\{ \boldsymbol{x} \in \mathsf{M}^d \;\middle|\; \boldsymbol{c} \cdot \boldsymbol{x} + n' \left\lfloor \boldsymbol{c}' \cdot \boldsymbol{x} / 2^n \right\rfloor \le b \right\} \\ \left[\ell\right]\_{\mathsf{BV}} &= \left\{ \boldsymbol{x} \in \mathsf{M}^d \;\middle|\; \left(\boldsymbol{c} \cdot \boldsymbol{x} + n' \left\lfloor \boldsymbol{c}' \cdot \boldsymbol{x} / 2^n \right\rfloor\right) \bmod m \le b \bmod m \right\} \end{aligned}$$

The following proposition shows that generalised LIA interpolants are not an obstacle to boxing. These inequalities are handled through a transformation scheme which exploits the property that if $n \le w$ then $(\boldsymbol{c} \cdot \boldsymbol{x} \bmod 2^n) \bmod m = \boldsymbol{c} \cdot \boldsymbol{x} \bmod 2^n$. We informally call this transformation tactic demoding because, like gapping and flipping, it is designed to increase the general applicability of boxing.

**Proposition 3.** Suppose $0 \le n \le w$ and $[(\boldsymbol{c} + n'\boldsymbol{c}') \cdot \boldsymbol{x} - n' y \le b]\_{\mathsf{LIA}} = [\phi]\_{\mathsf{BV}}$. If $y$ does not occur in $\boldsymbol{x}$ then

$$\left[ \boldsymbol{c} \cdot \boldsymbol{x} + n' 2^n \lfloor \boldsymbol{c}' \cdot \boldsymbol{x} / 2^n \rfloor \le b \right]\_{\mathsf{LIA}} = \left[ \phi[y \mapsto \boldsymbol{c}' \cdot \boldsymbol{x} \bmod 2^n] \right]\_{\mathsf{BV}}$$
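Proposition 3 rests on the identity $2^n \lfloor v / 2^n \rfloor = v - (v \bmod 2^n)$, so that substituting $y \mapsto \boldsymbol{c}' \cdot \boldsymbol{x} \bmod 2^n$ into $(\boldsymbol{c} + n'\boldsymbol{c}') \cdot \boldsymbol{x} - n'y$ yields the floor form exactly, together with the fact that $w$-bit arithmetic computes $\boldsymbol{c}' \cdot \boldsymbol{x} \bmod 2^n$ correctly when $n \le w$. The sketch below (hypothetical names) checks these two arithmetic facts exhaustively for small widths; it does not check the proposition itself, which is stated for an arbitrary BV formula $\phi$.

```python
from itertools import product


def demoding_identity_holds(c, c2, n_prime, n, w):
    """Over all x in M^d with m = 2**w and 0 <= n <= w, check that
    c.x + n' 2^n floor(c'.x / 2^n) == (c + n'c').x - n'y with y = c'.x mod 2^n,
    and that w-bit arithmetic yields the same y."""
    m = 2 ** w
    for x in product(range(m), repeat=len(c)):
        v = sum(di * xi for di, xi in zip(c2, x))  # c'.x
        y = v % 2 ** n                             # the term substituted for y
        floor_form = sum(ci * xi for ci, xi in zip(c, x)) + n_prime * 2 ** n * (v // 2 ** n)
        demoded = sum((ci + n_prime * di) * xi for ci, di, xi in zip(c, c2, x)) - n_prime * y
        if floor_form != demoded:
            return False
        # 2^n divides m, so reducing modulo m first does not change c'.x mod 2^n
        if (v % m) % 2 ** n != y or not 0 <= y < m:
            return False
    return True


ok = all(demoding_identity_holds(c, c2, k, n, w)
         for w in (2, 3) for n in range(w + 1) for k in (-2, -1, 1, 3)
         for c in product((-1, 0, 2), repeat=2) for c2 in product((-3, 1, 2), repeat=2))
print(ok)
```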

Inequalities such as $\boldsymbol{c} \cdot \boldsymbol{x} + n' 2^n \lceil \boldsymbol{c}' \cdot \boldsymbol{x} / 2^n \rceil \le b$ can be handled similarly. For completeness, we note that expansion can be applied for non-powers of 2:

**Proposition 4.** Suppose $n > 0$. Then

$$\left[ \boldsymbol{c} \cdot \boldsymbol{x} + n' \lfloor \boldsymbol{c}' \cdot \boldsymbol{x} / n \rfloor \leq b \right]\_{\mathsf{LIA}} = \left[ \bigvee\_{i=\ell}^{u} \left(\boldsymbol{c} \cdot \boldsymbol{x} \leq b - n'i \wedge ni \leq \boldsymbol{c}' \cdot \boldsymbol{x} \leq n(i+1) - 1\right) \right]\_{\mathsf{LIA}}$$

where $\ell = \min\{ \lfloor \boldsymbol{c}' \cdot \boldsymbol{x} / n \rfloor \mid \boldsymbol{x} \in \mathsf{M}^d \}$ and $u = \max\{ \lfloor \boldsymbol{c}' \cdot \boldsymbol{x} / n \rfloor \mid \boldsymbol{x} \in \mathsf{M}^d \}$.
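Proposition 4 can be checked the same way for divisors that are not powers of 2. The sketch (hypothetical names) computes $\ell$ and $u$ by enumeration and uses Python's floor division, which rounds towards $-\infty$ and therefore matches $\lfloor \cdot \rfloor$ for negative arguments too; the $i$-th case fixes $\lfloor \boldsymbol{c}' \cdot \boldsymbol{x} / n \rfloor = i$, that is $ni \le \boldsymbol{c}' \cdot \boldsymbol{x} \le n(i+1) - 1$.

```python
from itertools import product


def dot(a, x):
    """Integer dot product a.x."""
    return sum(ai * xi for ai, xi in zip(a, x))


def expansion_holds(c, c2, n_prime, n, b, m):
    """Compare c.x + n' floor(c'.x / n) <= b with the case split of Proposition 4 over M^d."""
    points = list(product(range(m), repeat=len(c)))
    lo = min(dot(c2, x) // n for x in points)  # ell
    hi = max(dot(c2, x) // n for x in points)  # u
    for x in points:
        direct = dot(c, x) + n_prime * (dot(c2, x) // n) <= b
        split = any(dot(c, x) <= b - n_prime * i and n * i <= dot(c2, x) <= n * (i + 1) - 1
                    for i in range(lo, hi + 1))
        if direct != split:
            return False
    return True


ok = all(expansion_holds(c, c2, k, n, b, 8)
         for n in (3, 5, 6) for k in (-2, 1, 4) for b in (-5, 0, 7, 20)
         for c in product((-1, 2), repeat=2) for c2 in product((-2, 3), repeat=2))
print(ok)
```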

## **4 Experiments**

To evaluate the performance of boxing we implemented a model checker based on the lazy abstraction (IMPACT) [25] algorithm. The model checker is implemented in Python 3.7.2 and uses MathSAT5 [9] for satisfiability checking and interpolation over LIA. The model checker parses a subset of the C language that is nonetheless rich enough to handle 312 benchmarks drawn from [2, 12]. The model checker was instantiated in one of three ways, to use: (1) LIA interpolation [17]; (2) BV interpolation by covering the solutions of an LIA interpolant with columns (recall $f\_2$ of section 2); and (3) BV interpolation by covering the solutions of an LIA interpolant using boxing, gapping and flipping. Experiments were performed on Amazon Web Services EC2 c3.xlarge instances, each rated at 14 EC2 Compute Units [31] and equipped with 4 cores and 7.5 GB of RAM. The timeout for each run of IMPACT was set to 600 seconds.

All arithmetic is idealised in configuration (1), taking no account of integer overflow and underflow. This is not, in general, safe. In configurations (2) and (3) the model checker interprets machine arithmetic and bit operations using the LIA encoding of BV operations outlined in [16, Fig 1]. This is safe but complicates the LIA formulae, often substantially. One would expect this to enlarge the interpolants, even before boxing and gapping are deployed. We would also expect (1) to be substantially faster than (2) and (3). Due to differences in the semantics of arithmetic, we might also see differences in the number of programs proved to be safe or found to be unsafe. The experiments quantify these predictions. To discuss the experiments, (2) will be referred to as the naive encoding, even though it improves on complete enumeration (recall $f\_1$ of section 2).

#### **4.1 Overall Result**

Table 1 summarises the outcomes of running IMPACT on all 312 programs, using the three different instances of interpolation, categorised as to whether the run proved safety (safe rows) or found a counterexample (unsafe rows). The Solved column of the left-hand table gives the total number of programs that were either shown to be safe or unsafe within 600 seconds. Time is the mean execution time of a run (for all those programs which did not time out). Size is the mean total number of atomic constraints in all interpolants encountered over a run (for those programs which did not time out). We observe that more programs can be analysed to completion with LIA than with BV, as one would expect, but that BV (boxing) improves on BV (naive), the speedup being significant when proving safety.

**Table 1.** Comparison of the theories: performance and correctness

|              | LIA safe | LIA unsafe |
|--------------|----------|------------|
| BV safe      | 90       | 1          |
| BV unsafe    | 17       | 34         |

The right-hand table compares LIA with BV (boxing) on the runs for which both terminated. For 17 of these 142 runs, LIA (incorrectly) verified the program to be safe whereas BV found a counterexample. Unexpectedly, for `trex03_true-unreach-call.i.annot.c` from [12], LIA found a counterexample but BV verified safety. This program contains three integer variables, x1, x2 and x3, which can become negative in the idealised arithmetic employed in LIA, triggering an assertion. But x1, x2 and x3 are actually unsigned.

### **4.2 Runtime for Naive encoding and Boxing**

The scatter plot of Figure 5 compares the runtime of the naive encoding against that of boxing and its allied techniques of gapping and flipping. The scatter plot excludes timeouts and depicts 151 pairs of runs. Almost all points are under the dotted line, indicating that boxing significantly improves performance. The line graph plots the ratio of the execution times, from which we observe that boxing does not accelerate the verification for almost half of the runs, but does speed it up between 2- and 256-fold for the other half.

#### **4.3 Interpolant Size for Naive encoding and Boxing**

The line graph in Figure 6 compares the relative size of interpolants for boxing versus the naive encoding. Size is the sum of the sizes of all the interpolants generated during a run, where the size of an interpolant is itself defined as the number of atomic constraints that occur within it. We observe that for most problems the size ratio is around one, but a second peak occurs at 1/32, giving an overall size reduction. The scatter plot explores how interpolant size correlates with runtime, showing how the relative size of interpolants varies with relative runtimes. We observe that reducing the size of interpolants improves runtime, and that the two peaks of the line graph manifest themselves as two clusters of points in the scatter plot.

**Fig. 5.** Runtime of boxing versus naive: scatter plot and ratio plot

**Fig. 6.** Size of interpolants in boxing versus naive and its impact on performance

## **5 Related work**

The problem of reasoning about machine arithmetic and wrapping arises not only in model checking, but abstract interpretation too, where solvers are augmented with support for relaxing abstractions by join rather than interpolation.

Despite the long-standing work [3, 7, 27] on deciding BV theories, there has been scant work on BV interpolation. Although not focussing on BV interpolation, an early work on deriving word-level interpolants [23] uses bit-vectors to interpolate equality logic. This logic supports equations of the form x = y and x = c where x and y are variables and c is drawn from a finite set of symbols C. Bit-vectors with width $\log\_2(|C|)$ are used to bit-blast equations [29] so that formulae are encoded entirely propositionally. Then a propositional resolution proof of the inconsistency of two formulae is lifted to the word level.

Seminal work by Griggio [16] advocated encoding BV formulae in theories of increasing complexity. The pair of BV formulae are encoded in a theory whose interpolation engine is used to find an interpolant in that theory. The interpolant is then reinterpreted as a BV formula and tested to see if it is still an interpolant for the pair of BV formulae. The approach resorts to bit-blasting if no simpler theory can find an interpolant, at the cost of losing word-level information. By way of contrast, Backeman et al. [2] propose a calculus over a core language, which supports interpolation and is rich enough to describe BV formulae, even making use of Groebner bases to express polynomial equality relationships. Since interpolation is performed within their core language, they do not aim to derive a BV interpolant, and therefore their work is orthogonal to ours. Yet if Backeman's procedure returns an interpolant in their core language and it could be interpreted as an LIA formula, which would seem likely for many cases, then our work could convert the LIA formula back to BV.

Further afield, polynomial algorithms for interpolation have been developed for systems of linear congruence equations [19, section 4], conjunctions of linear Diophantine equations and disequations [19, section 6], and systems of mixed integer linear equations [19, section 7]. This comprehensive study stops short of using LIA to interpolate BV formulae, mentioning the problem as future work.

Abstract domains have been proposed for tracking linear modulo relationships where the modulus is a power of 2 [13, 22, 28]. These domains, which are essentially specialist solvers, express more than linear equalities [21], while enabling the domain operations to be realised using machine arithmetic. Surprisingly, systems of linear inequalities can be reinterpreted to model machine arithmetic by just changing the concretisation function [32] and the handling of guards [32].

## **6 Concluding Discussion**

To repurpose efficient LIA interpolation engines to BV, we have shown how to systematically construct a BV formula so its solutions are exactly those of an LIA interpolant. Since an LIA interpolant summarises the reason for a conflict between two LIA formulae, we seek to retain its compact structure by introducing no more than simple boxes around the LIA solutions which block extraneous BV solutions. When this encoding tactic, called boxing, is not applicable, gapping is used to decompose an LIA inequality into two or more inequalities which are amenable to boxing. We show that the resulting BV interpolants are smaller than BV interpolants constructed by merely partitioning the LIA solutions into columns, and demonstrate how boxing and gapping improve the runtime of an interpolation-based model checker. We instantiate a model checker with LIA and BV to compare their performance, and conclude that with this encoding BV interpolation is feasible. Because of wrap-around, BV is substantially more complicated than LIA for interpolation, yet BV is no more than twice as slow as LIA for over half the benchmarks. Furthermore, the resulting BV interpolants can be validated, independent of LIA, just using a BV solver.

Acknowledgments We thank the anonymous reviewers for their comments, which helped us improve the paper. We thank Alberto Griggio for his help with MathSAT5 and for looking into the intellectual property restrictions on sharing his extension to Kratos [8], a model checker based on lazy predicate abstraction. We also thank Chris Coppins for his help with IMPACT and Peter Backeman for tirelessly answering our e-mails. This work was funded, in part, by EPSRC EP/N020243/1 and by JST ERATO HASUO Metamathematics for Systems Design Project JPMJER1603.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Automated and Sound Synthesis of Lyapunov Functions with SMT Solvers**

Daniele Ahmed<sup>1,2</sup>, Andrea Peruffo<sup>1</sup>, and Alessandro Abate<sup>1</sup>

<sup>1</sup> Department of Computer Science, University of Oxford, OX1 3QD Oxford, UK (name.surname@cs.ox.ac.uk)

<sup>2</sup> Amazon Inc, London, UK

**Abstract.** In this paper we employ SMT solvers to soundly synthesise Lyapunov functions that assert the stability of a given dynamical model. The search for a Lyapunov function is framed as the satisfiability of a second-order logical formula. This formula asks whether there exists a function satisfying a desired specification (stability) for all possible initial conditions of the model. We synthesise Lyapunov functions for linear, non-linear (polynomial), and parametric models. For non-linear models, the algorithm also determines a region of validity for the Lyapunov function. We exploit an inductive framework to synthesise Lyapunov functions, starting from parametric templates. The inductive framework comprises two elements: a *learner* proposes a Lyapunov function, and a *verifier* checks its validity. If the candidate is not valid, the verifier returns a counterexample (a point in the state space) for further use by the learner. The verifier uses the SMT solver Z3, thus ensuring the overall soundness of the procedure. For the learner we examine two alternatives: a numerical approach based on the optimisation tool Gurobi, and a sound approach based again on Z3. The overall technique is evaluated over a broad set of benchmarks. The evaluation shows that this methodology scales to 10-dimensional models within reasonable computational time, and that it offers a novel soundness proof for the generated Lyapunov functions and their domains of validity.

**Keywords:** Lyapunov functions, automated synthesis, inductive synthesis, counter-example guided synthesis

## **1 Introduction**

Dynamical systems represent a major modelling framework in both theoretical and applied sciences: they describe how objects move by means of the laws governing their dynamics in time. Often they take the form of a system of ordinary differential equations (ODEs) with non-trivial solutions.

This work studies the stability of general ODEs without knowledge of their analytical solution. Stability analysis via Lyapunov functions is a well-known approach to establishing this property. As such, the problem of constructing suitable Lyapunov functions for stability analysis has drawn much attention in the literature [1,2]. A brief introduction to the concepts of Lyapunov stability is presented in Section 3. By and large, existing approaches leverage Linear Algebra or Convex Optimisation techniques, and are neither fully automated nor numerically sound.

**Contributions** We apply an inductive synthesis framework, known as Counter-Example Guided Inductive Synthesis (CEGIS) [3,4] and recently employed in a number of control applications [5,6,7,8], to construct Lyapunov functions for linear, polynomial and parametric ODEs, and (for non-linear ODEs) to constructively characterise their domain of validity. CEGIS, originally developed for program synthesis based on the satisfiability of second-order logical formulae, is employed in this work with template Lyapunov functions and in conjunction with a Satisfiability Modulo Theory (SMT) solver [9]. Our results offer a formal guarantee of correctness in combination with a simple algorithmic implementation.

The synthesis of a Lyapunov function V can be written as a second-order logic formula F := ∃V ∀x : ψ, where x represents the state variables and ψ represents requirements that V needs to satisfy in order to be a Lyapunov function.

The CEGIS architecture is structured as a loop between two components, a "learner" and a "verifier". The learner provides a candidate function V and the verifier checks the validity of ψ over the set of x. If the function is not valid, the verifier provides a counterexample, namely a point x̄ in the state space where the candidate function does not satisfy ψ. The learner incorporates the generated counterexample x̄, computes a new candidate function, and passes it back to the verifier.

We exploit SMT solvers to (repeatedly) check the validity of ψ, given V, over a domain in the space of x. SMT is a powerful tool for asserting the existence of such a function. An SMT problem is a decision problem (a problem that can be formulated as a yes/no question) for logical formulae within one or more theories, e.g. the theory of arithmetic over the real numbers. The generation of simple counterexamples x̄ is a key new feature of our technique.

Furthermore, in this work we provide two alternative CEGIS implementations: 1) a numerical learner and an SMT-based verifier, and 2) an SMT-based learner and verifier. The numerical generation of Lyapunov functions is based on the optimisation tool Gurobi [10], whereas the SMT-based one leverages Z3 [11].

**Related Work** The construction of Lyapunov functions is recognisably an important yet hard problem, particularly for non-linear ODE models, and it has been the objective of classical studies [12,13,14]. A known constructive result has been introduced in [15], which additionally provides an estimate of the domain of attraction. It has led to further work based on recursive procedures. Broadly, these approaches are numerical and based on the solution of optimisation problems. For instance, linear programming is exploited in [16] to iteratively search for stable matrices inside a predefined convex set, resulting in an approximate Lyapunov function for the given model. Alternative approximate methods [1] include ε-bounded numerical methods, techniques leveraging series expansions of a function, the construction of functions from trajectory samples, and the framework of linear matrix inequalities. The approach in [17] uses sum-of-squares (SOS) polynomials to synthesise Lyapunov functions; however, its scalability remains an issue. The work in [18] uses SOS decomposition to synthesise Lyapunov functions for (non-polynomial) non-linear systems: the algorithmic implementation is known as SOSTOOLS [19,20]. [21] focuses on an analytical result involving a summation over a finite time interval, under a stability assumption. Recent developments are in [22] and subsequent work, whereas surveys on this topic are in [1,2].

In conclusion, existing constructive approaches fall into three groups. Some rely on complex candidate functions (whether rational or polynomial). Others rely on semi-analytical results. The rest involve state-space partitions (whose scalability with the state-space dimension is problematic), accompanied by correspondingly complex or large optimisation problems. These approximate methods lack either numerical robustness, being bound by machine precision, or algorithmic soundness. In particular, they cannot provide formal certificates of reliability, which is a clear limitation in safety-critical applications.

In [23] Lyapunov functions are soundly found within a parametric framework, by constructing a system of linear inequality constraints over unknown coefficients. A twofold linear programming relaxation is made: it includes interval evaluation of the polynomial form and "Handelman representations" for positive polynomials. Simulations are used in [24] to generate constraints for a template Lyapunov function, which are then resolved via LP, resulting in candidate solutions. Whilst the authors refer to traces as counterexamples, they do not employ the CEGIS framework, as we do in this work. When no counterexamples are found, [24] further uses dReal [25] and Mathematica [26] to verify the obtained candidate Lyapunov functions. The sound technique, which is not complete, is tested on low-dimensional models with non-linear dynamics.

The cognate work in [7,8,27] is the first to employ a CEGIS-based approach to synthesise Lyapunov functions. [7,8] focus on such synthesis for switching control models, a more general setup than ours. [7] employs an SMT solver for the learner. Towards scalability, its verifier solves an optimisation problem over LMI constraints over a given domain (unlike our approach). As such, counterexamples are matrices, not points over the state space. Furthermore, the use of LMI solvers does not in principle lead to sound outcomes. Along the above line, [8] expands this approach towards robust synthesis. [27] instead employs MPC (Model Predictive Control) techniques within the learner to suggest template functions. These are later verified via semi-definite programming relaxations (again, possibly generating counterexamples by solving optimisation problems over a given domain). Our contribution is inspired by this line of work. It provides an SMT-based CEGIS implementation that is simple (with interpretable counterexamples that are points over the state space) yet effective (scalable to at least 10-dimensional models). The implementation automates the construction of Lyapunov functions and associated validity domains, is sound, and is also applicable to parameterised models.

The remainder of the paper is organised as follows. In Section 2 we present the SMT solver Z3 and the inductive synthesis (IS) framework. The implementation of CEGIS, for both linear and non-linear models, is explained in Section 3. Experiments and case studies are in Section 4. Finally, conclusions are drawn in Section 5.

## **2 Formal Verification – Concepts and Techniques**

In this work we use Z3, an SMT solver, and the CEGIS architecture, to build and to verify Lyapunov functions.

## **2.1 Satisfiability Modulo Theory**

A Satisfiability Modulo Theory problem is a decision problem formulated within a theory, e.g. first-order logic with equality [28]. The aim is to check whether a first-order logical formula within such a theory, referred to as an SMT instance, is satisfiable. For example, a formula can be the inequality 3x<sub>0</sub> + x<sub>1</sub> > 0 evaluated within the theory of linear inequalities. An SMT solver is a piece of software that checks the satisfiability of an SMT instance, i.e. whether there exists an assignment to its variables under which the formula evaluates to True. SMT solvers can be useful for function synthesis, namely to mechanically construct a function, given requirements on its output.

## **2.2 The Z3 SMT Solver**

Z3 [11,29] is a powerful SMT solver that integrates several components: SAT solvers, theory solvers for equalities and uninterpreted functions, satellite solvers for arithmetic, real, array, and other theories, and an abstract machine to handle quantifiers. Given an input formula, Z3 represents it as an abstract syntax tree and processes it with its SAT solver core. It returns SAT if the formula is satisfiable, and UNSAT otherwise.

*Example 1 (Operation of Z3).* Consider the formula a = b ∧ f(a) = f(b) in the theory of equality. To verify its satisfiability, Z3 constructs a syntax tree, with nodes for each variable (a, b) and for each term or formula (a = b, f(a), f(b), f(a) = f(b)). Once the tree is built, Z3 merges a with b and f(a) with f(b) to represent the equalities. To check that the assertion is consistent, it then applies the congruence rule ⋀<sub>i=0</sub><sup>n−1</sup> x<sub>i</sub> = y<sub>i</sub> ⇒ f(x<sub>0</sub>, …, x<sub>n−1</sub>) = f(y<sub>0</sub>, …, y<sub>n−1</sub>) to conclude that a = b ⇒ f(a) = f(b). Finally, nodes a = b and f(a) = f(b) are merged and Z3 returns SAT. □
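Example 1 relies on the idea of congruence closure: equal terms are merged into one class, and applications of the same function to pairwise-equal arguments are merged too. The sketch below illustrates that idea on a union-find structure. It is a deliberately naive, stdlib-only illustration under our own assumptions: a term is a string or a tuple `("f", arg1, ...)`, and a quadratic fixpoint loop applies the rule. It is not Z3's implementation, which relies on far more efficient data structures.

```python
from typing import Dict, Tuple, Union

# Assumed term encoding: a constant "a", or an application ("f", arg1, ..., argn).
Term = Union[str, Tuple]


class CongruenceClosure:
    """Naive union-find with congruence propagation (illustrative only)."""

    def __init__(self) -> None:
        self.parent: Dict[Term, Term] = {}

    def add(self, t: Term) -> None:
        if t in self.parent:
            return
        self.parent[t] = t
        if isinstance(t, tuple):
            for arg in t[1:]:
                self.add(arg)

    def find(self, t: Term) -> Term:
        while self.parent[t] != t:
            t = self.parent[t]
        return t

    def merge(self, s: Term, t: Term) -> None:
        """Assert s = t, then propagate consequences of the congruence rule."""
        self.add(s)
        self.add(t)
        rs, rt = self.find(s), self.find(t)
        if rs != rt:
            self.parent[rs] = rt
        self._close()

    def _close(self) -> None:
        # Congruence rule: f(x1..xn) and f(y1..yn) with xi ~ yi for all i are merged.
        changed = True
        while changed:
            changed = False
            apps = [t for t in self.parent if isinstance(t, tuple)]
            for i, s in enumerate(apps):
                for t in apps[i + 1:]:
                    if (s[0] == t[0] and len(s) == len(t)
                            and self.find(s) != self.find(t)
                            and all(self.find(x) == self.find(y)
                                    for x, y in zip(s[1:], t[1:]))):
                        self.parent[self.find(s)] = self.find(t)
                        changed = True

    def equal(self, s: Term, t: Term) -> bool:
        self.add(s)
        self.add(t)
        self._close()
        return self.find(s) == self.find(t)


cc = CongruenceClosure()
cc.add(("f", "a"))
cc.add(("f", "b"))
cc.merge("a", "b")                       # assert a = b
print(cc.equal(("f", "a"), ("f", "b")))  # True: f(a) = f(b) follows by congruence
```

Only equalities are asserted here. A conflict would arise only if some disequality s ≠ t were also asserted while `equal(s, t)` holds.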

Of particular interest for the synthesis of Lyapunov functions is the ability of Z3 to solve polynomial constraints. Z3 stores and exactly manipulates algebraic real numbers, i.e. roots of univariate polynomials with rational coefficients. An algebraic real α is stored as a polynomial p(x) with p(α) = 0, together with two rationals l, u such that, for x ∈ (l, u), p(x) = 0 if and only if x = α. In this work, Z3 has been used through its Python API, Z3Py. An example of a simple assertion verification follows.

*Example 2 (Assertion in Z3).* Consider the (valid) formula x ≥ 0 ⇒ 3x + 1 > 0. The corresponding Z3Py code is:

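```
from z3 import Real, Solver, Implies

x = Real('x')
s = Solver()
# Assert the formula and ask Z3 for an assignment to x that satisfies it
s.add(Implies(x >= 0, 3 * x + 1 > 0))
print(s.check())  # prints "sat"; validity would instead require Not(...) to be unsat
```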
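which evaluates (as expected) to SAT. Strictly speaking, this shows only that the formula is satisfiable. Its validity follows from checking that its negation is UNSAT, which is how the verifier in Section 3 proceeds. □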
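The representation of algebraic reals described just before Example 2 can be made concrete with a small sketch. It stores √2 as the pair (x² − 2, (1, 2)) and tightens the isolating interval by bisection, using exact rationals. This is our own illustration and does not reproduce Z3's internal algorithms. It assumes that p changes sign across a single simple root inside (l, u).

```python
from fractions import Fraction
from typing import Sequence, Tuple


def eval_poly(coeffs: Sequence[Fraction], x: Fraction) -> Fraction:
    """Evaluate a polynomial (coefficients from highest degree) by Horner's rule."""
    acc = Fraction(0)
    for c in coeffs:
        acc = acc * x + c
    return acc


def refine(coeffs: Sequence[Fraction], lo: Fraction, hi: Fraction,
           steps: int) -> Tuple[Fraction, Fraction]:
    """Halve an isolating interval (lo, hi) of a root alpha `steps` times.

    Assumes p(lo) and p(hi) are non-zero with opposite signs and that alpha is
    the only root of p in (lo, hi).
    """
    for _ in range(steps):
        mid = (lo + hi) / 2
        p_mid = eval_poly(coeffs, mid)
        if p_mid == 0:                 # alpha happens to be rational
            return mid, mid
        if (eval_poly(coeffs, lo) < 0) == (p_mid < 0):
            lo = mid                   # same sign as at lo: root lies in (mid, hi)
        else:
            hi = mid                   # sign change, so the root lies in (lo, mid)
    return lo, hi


# sqrt(2) represented as (p(x) = x^2 - 2, l = 1, u = 2)
p = [Fraction(1), Fraction(0), Fraction(-2)]
lo, hi = refine(p, Fraction(1), Fraction(2), steps=10)
print(lo, hi)  # an interval of width 1/1024 that still contains sqrt(2)
```

Because all arithmetic is rational, comparisons between α and any rational q are exact: refining until q falls outside (l, u) decides the comparison.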

#### **2.3 Inductive Synthesis - CEGIS**

An approach to solve second-order logic problems, such as those characterising the synthesis of Lyapunov functions, is *inductive synthesis* (IS). IS infers general rules (or functions) from specific examples (observations), entailing the process of generalisation. Within the IS procedure, a synthesiser attempts the construction from a (usually small) subset of the original specifications. It then generalises to the complete specification by identifying patterns in the input data.

An exemplar of IS is the CEGIS framework. Fig. 1 depicts the relation between its two main components. It sets off with a given specification ψ over a set I for the synthesis. The synthesis engine (a component that will also be denoted as *learner*) provides a candidate solution for ι, a subset of I, the space of possible inputs. This candidate solution is passed to a second component, called *verifier*, that acts as an oracle. Either it approves the solution over the entire I, so that the process terminates, or it finds an instance x̄ (a counterexample in I) where the candidate solution does not comply with the specification. The learner adds x̄ to ι and computes a new (more general) candidate solution for the problem. This cycle is repeated. Note that this algorithm might not terminate, depending on the structure of I, or might take many cycles to find a proper solution. In those instances, tailored candidate solutions and insightful counterexamples are necessary. In this work, IS is implemented using SMT solvers. The verifier finds counterexamples x̄ by seeking a witness of the negated formula ¬ψ, namely by trying to prove that a violation of the formula exists. The learner might employ SMT solvers to solve the system of constraints generated by the counterexamples, i.e. to find a valid instance of such constraints. In general, however, the learner does not need to be sound, as it is the verifier that guarantees the soundness of the proposed solution. Section 3.1 illustrates the two CEGIS components, the learner L and the verifier Z, in relation to Lyapunov function synthesis.

*Example 3 (CEGIS Operation).* Assume the task is the synthesis of a function g(x) that satisfies the following formula F(g(x)):

$$\exists \text{ } g(x) \,\,\forall x \in \mathbb{R}: \psi, \text{ where } \psi(g(x)) = g(x) + 1 > 0.$$

The learner L offers an initial (often naïve, random or default) candidate, e.g. g(x) = x, and passes it to the verifier Z. The verifier checks the validity of ψ(x) = x + 1 > 0, ∀x ∈ ℝ, by searching for an instance x̄ that invalidates the formula. Z finds that x̄ = −1 invalidates the formula, and thus sends x̄ to L, which incorporates this counterexample to synthesise a new g(x). The learner now adds a constraint on the next candidate, as

$$C := g(-1) + 1 > 0,$$

such that the new candidate solution satisfies the formula at x̄ = −1. The learner now proposes g(x) = x<sup>2</sup>, which satisfies C, and passes it to Z. The verifier searches for a counterexample to ψ for g(x) = x<sup>2</sup>, but cannot find any: the negated formula is UNSAT. The loop therefore terminates, which proves that the synthesised function g(x) = x<sup>2</sup> is valid ∀x ∈ ℝ. □
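Example 3 can be replayed end to end with a toy CEGIS loop. The sketch below rests on our own assumptions rather than the paper's implementation:

- Candidates are restricted to a hypothetical quadratic template g(x) = ax² + bx + c.
- The learner picks the first candidate in a fixed list that satisfies ψ at every stored counterexample.
- The verifier decides ∀x ∈ ℝ: g(x) + 1 > 0 exactly, by case analysis on a (with the parabola's vertex as the witness point), instead of calling an SMT solver.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple

# Hypothetical template g(x) = a*x^2 + b*x + c, stored as (a, b, c).
Coeffs = Tuple[Fraction, Fraction, Fraction]


def psi(g: Coeffs, x: Fraction) -> bool:
    """Specification of Example 3 at a single point: g(x) + 1 > 0."""
    a, b, c = g
    return a * x * x + b * x + c + 1 > 0


def verifier(g: Coeffs) -> Optional[Fraction]:
    """Exactly decide 'forall x in R: psi(g, x)'; return a counterexample or None."""
    a, b, c = g
    if a > 0:
        vertex = -b / (2 * a)          # global minimiser of the parabola
        return None if psi(g, vertex) else vertex
    if a == 0:
        if b == 0:                     # constant function c + 1
            return None if c + 1 > 0 else Fraction(0)
        return -(c + 1) / b            # root of b*x + c + 1, where psi fails
    x = Fraction(1)                    # a < 0: g + 1 eventually becomes negative
    while psi(g, x):
        x *= 2
    return x


def learner(candidates: Sequence[Coeffs], cexs: List[Fraction]) -> Optional[Coeffs]:
    """Return the first candidate satisfying psi at every stored counterexample."""
    for g in candidates:
        if all(psi(g, x) for x in cexs):
            return g
    return None


def cegis(candidates: Sequence[Coeffs], max_iters: int = 10):
    cexs: List[Fraction] = []
    for _ in range(max_iters):
        g = learner(candidates, cexs)
        if g is None:                  # no candidate fits the counterexamples
            return None, cexs
        cex = verifier(g)
        if cex is None:                # no counterexample: g is valid
            return g, cexs
        cexs.append(cex)
    return None, cexs


F = Fraction
# Candidate order mirrors Example 3: first g(x) = x, then g(x) = x^2.
solution, counterexamples = cegis([(F(0), F(1), F(0)), (F(1), F(0), F(0))])
print(solution, counterexamples)  # g(x) = x^2, found after the single counterexample x = -1
```

The verifier alone is responsible for soundness here. The learner only ever inspects the finite list of counterexamples, as in the general CEGIS scheme.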

**Fig. 1.** CEGIS-based inductive synthesis. The iterative procedure loops between a learner L and a verifier Z. L provides a candidate solution S to the verifier Z, which asserts its validity or outputs a counterexample x̄. The learner provides a new solution that also accounts for x̄. The procedure stops once no counterexamples are found.

## **3 Automated and Sound Synthesis of Lyapunov Functions via CEGIS and SMT**

Consider a dynamical system ẋ = f(x), where f : ℝ<sup>n</sup> → ℝ<sup>n</sup>, and assume that the point x<sub>e</sub> ∈ ℝ<sup>n</sup> is an equilibrium, namely f(x<sub>e</sub>) = 0. Without loss of generality, we assume that x<sub>e</sub> = 0 (the origin). The goal is to assess the stability of this equilibrium point via the synthesis of a Lyapunov function V : ℝ<sup>n</sup> → ℝ. The stability of an equilibrium guarantees that trajectories starting near the equilibrium remain close to it at all times (how close can often be quantified, as done later in this work). If V(x) fulfils the following two conditions, ∀x ∈ D,

$$V(x) > 0, \quad \dot{V}(x) = \nabla V(x) \cdot f(x) \le 0,\tag{1}$$

where D is a domain of interest containing x<sub>e</sub>, then the Lyapunov function ensures boundedness of the trajectories. In other words, for every initial point in a neighbourhood of x<sub>e</sub>, the trajectories of the model do not escape from D. With reference to the notation introduced above, the condition in (1) represents the requirement ψ, and D denotes the set of inputs I. We use the following polynomial expression for the Lyapunov function

$$V(x) = \sum\_{l=1}^{c} (x^l)^T \ P\_l \ x^l,\tag{2}$$

where x<sup>l</sup> denotes the element-wise exponentiation of the vector x, i.e. each element x(j) raised to the power l, ∀j = 1, …, n. P<sub>l</sub> ∈ ℝ<sup>n×n</sup> is a weighting matrix associated with x<sup>l</sup>, and 2c is the degree of the polynomial function. In order to obtain a proper Lyapunov function V(x), the synthesiser is asked to verify the specification expressed by the formula

$$F(V(x)): \forall x \in \mathcal{D}, V(x) > 0 \land \dot{V}(x) \le 0. \tag{3}$$
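To make the template (2) and the two quantities in (3) concrete, the sketch below evaluates V(x) and V̇(x) = ∇V(x) · f(x) at a single point with exact rationals. Such pointwise evaluations are what a learner needs at each counterexample. The sketch relies on the identity ∂/∂x(j) [(x<sup>l</sup>)<sup>T</sup> P<sub>l</sub> x<sup>l</sup>] = l · x(j)<sup>l−1</sup> · ((P<sub>l</sub> + P<sub>l</sub><sup>T</sup>) x<sup>l</sup>)(j), which follows from the product rule. The function names and the list-of-lists matrix encoding are our own assumptions.

```python
from fractions import Fraction
from typing import Callable, List, Sequence

Matrix = List[List[Fraction]]
Vector = List[Fraction]


def V(Ps: Sequence[Matrix], x: Vector) -> Fraction:
    """V(x) = sum_{l=1}^{c} (x^l)^T P_l x^l, with Ps[l-1] = P_l."""
    n = len(x)
    total = Fraction(0)
    for l, P in enumerate(Ps, start=1):
        xl = [xi ** l for xi in x]            # element-wise power x^l
        total += sum(xl[i] * P[i][j] * xl[j] for i in range(n) for j in range(n))
    return total


def grad_V(Ps: Sequence[Matrix], x: Vector) -> Vector:
    """dV/dx(j) = sum_l l * x(j)^(l-1) * ((P_l + P_l^T) x^l)(j)."""
    n = len(x)
    g = [Fraction(0)] * n
    for l, P in enumerate(Ps, start=1):
        xl = [xi ** l for xi in x]
        for j in range(n):
            sym = sum((P[j][k] + P[k][j]) * xl[k] for k in range(n))
            g[j] += l * x[j] ** (l - 1) * sym
    return g


def V_dot(Ps: Sequence[Matrix], f: Callable[[Vector], Vector], x: Vector) -> Fraction:
    """Lie derivative V_dot(x) = grad V(x) . f(x)."""
    return sum((gj * fj for gj, fj in zip(grad_V(Ps, x), f(x))), Fraction(0))


# Example: c = 2 with P_1 = P_2 = I, and the linear vector field f(x) = -x.
I2 = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
f = lambda x: [-xi for xi in x]
x = [Fraction(1), Fraction(2)]
print(V([I2, I2], x), V_dot([I2, I2], f, x))  # 22 -78
```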

This specification requires the Lyapunov function to be positive definite, and not to increase along the trajectories of the model. For linear systems, unless otherwise stated, we consider D = ℝ<sup>n</sup> \ {0} and c = 1, as it is known that quadratic functions are sufficient to prove the stability of linear models over the whole state space. Formula (3) keeps the elements of P uninterpreted, and thus they are parameters to be found. Notice that the second-order formula

$$\exists P \in \mathbb{R}^{n \times n} : \forall x \in \mathcal{D}, V(x) > 0 \land \dot{V}(x) \le 0,$$

would merely return a Boolean value, i.e. True or False. To obtain the synthesised function V(x) itself, we remove the existential quantifier, so that the entries of P become the unknowns to be found.

### **3.1 The CEGIS Architecture for Lyapunov Function Synthesis**

We introduce the CEGIS architecture to find Lyapunov functions. To better illustrate the methodology, we start by considering linear models (the non-linear case is further discussed in Section 3.2). As mentioned earlier, two components characterise the CEGIS approach: a learner and a verifier. The CEGIS architecture takes the system matrix A and outputs a matrix P as the key component of the function V(x), verifying the conditions in Eq. (1). We denote by P̄<sub>i</sub>, i = 0, 1, 2, …, the *candidate* matrices yet to be verified, i.e. the outputs of the learner. As anticipated earlier, referring to Eq. (2), we set c = 1 and D = ℝ<sup>n</sup> \ {0}.

**Verifier** The role of a verifier is twofold: generate a counterexample to the validity of the candidate Lyapunov function, or certify its validity over a domain of interest. We implement the verifier in Z3.

The methodology to assert the correctness of a Lyapunov function is as follows. Assume the learner computes a candidate Lyapunov function V(x) and passes it to the verifier (in the case of a linear model, the learner offers a matrix P̄<sub>i</sub>). The goal of the verifier is to assert the validity of formula F from (3) according to the specification ψ in (1). The check is performed by negating F. If there exists a vector x̄ that satisfies ¬F, it is a counterexample for F. If no such vector exists, formula F is valid and the candidate Lyapunov function is an actual Lyapunov function. The domain D is encoded as an additional formula. Assume, as an example, that the domain is a hyper-sphere of radius one: D can be written formally as d : ‖x‖<sup>2</sup> ≤ 1. The formula passed to the solver is thus ¬F ∧ d.

A counterexample x̄ must satisfy the formula V(x̄) ≤ 0 ∨ V̇(x̄) > 0. Reasoning on either condition, it is easy to show that if there exists a counterexample x̄ invalidating a matrix P̄, then there exist infinitely many counterexamples for this P̄. For a linear model and V(x) = x<sup>T</sup>P̄x, both V(t x̄) = t<sup>2</sup>V(x̄) and V̇(t x̄) = t<sup>2</sup>V̇(x̄) hold for any t ≠ 0, so every non-zero multiple of x̄ is also a counterexample. Thus, particularly for high-dimensional models, the generation of meaningful counterexamples is crucial to find a Lyapunov function quickly.
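In the paper the verifier is Z3. Purely to illustrate what finding x̄ with V(x̄) ≤ 0 ∨ V̇(x̄) > 0 means for a linear model with c = 1, the sketch below replaces the SMT query by an exact case analysis that only works for n = 2. It uses two facts:

- V(x) = x<sup>T</sup>Px and V̇(x) = x<sup>T</sup>(A<sup>T</sup>P + PA)x are both quadratic forms.
- Sign tests on the entries and determinant of a 2 × 2 symmetric matrix yield an explicit counterexample whenever one exists.

The helper names are hypothetical, and P is assumed symmetric.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

Matrix = List[List[Fraction]]
Point = Tuple[Fraction, Fraction]


def quad(M: Matrix, x: Point) -> Fraction:
    """Quadratic form x^T M x for a 2 x 2 matrix M."""
    return sum(M[i][j] * x[i] * x[j] for i in range(2) for j in range(2))


def lie_matrix(A: Matrix, P: Matrix) -> Matrix:
    """Q = A^T P + P A, so that V_dot(x) = x^T Q x for V(x) = x^T P x and x' = A x."""
    return [[sum(A[k][i] * P[k][j] + P[i][k] * A[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]


def nonpositive_point(M: Matrix) -> Optional[Point]:
    """x != 0 with x^T M x <= 0, or None if the symmetric M is positive definite."""
    m11, m12, m22 = M[0][0], M[0][1], M[1][1]
    if m11 <= 0:
        return (Fraction(1), Fraction(0))
    if m11 * m22 - m12 * m12 <= 0:
        return (-m12, m11)             # x^T M x = m11 * det(M) <= 0
    return None


def positive_point(Q: Matrix) -> Optional[Point]:
    """x != 0 with x^T Q x > 0, or None if the symmetric Q is negative semidefinite."""
    n11, n12, n22 = -Q[0][0], -Q[0][1], -Q[1][1]  # entries of N = -Q
    if n11 < 0:
        return (Fraction(1), Fraction(0))
    if n22 < 0:
        return (Fraction(0), Fraction(1))
    if n11 * n22 - n12 * n12 < 0:
        if n11 > 0:
            return (-n12, n11)         # x^T N x = n11 * det(N) < 0
        return (-(n22 + 1) / (2 * n12), Fraction(1))  # n11 = 0, n12 != 0: x^T N x = -1
    return None


def verify(A: Matrix, P: Matrix) -> Optional[Point]:
    """Counterexample to V > 0 and V_dot <= 0 on R^2 minus the origin, else None."""
    A = [[Fraction(v) for v in row] for row in A]
    P = [[Fraction(v) for v in row] for row in P]
    return nonpositive_point(P) or positive_point(lie_matrix(A, P))


A = [[0, 1], [-2, -1]]                 # the system of Example 4 (Section 3.3) with mu = 0
print(verify(A, [[1, 0], [0, 1]]))     # x_bar = (-3/2, 1), where V_dot(x_bar) = 1 > 0
print(verify(A, [[2, 0], [0, 1]]))     # None: P = diag(2, 1) passes both checks
```

Any non-zero multiple of the returned point is also a counterexample, in line with the scaling argument above.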

Let us denote by x̄<sub>i</sub>, i = 0, 1, …, the sequence of counterexamples provided by the verifier, and by P̄<sub>i</sub> the sequence of candidate Lyapunov function matrices provided by the learner. In this setting, the learner proposes the first default candidate matrix P̄<sub>0</sub>, and the verifier (possibly) provides a counterexample x̄<sub>0</sub>. The learner then includes x̄<sub>0</sub> in its set of constraints (see the description of the learner below) and offers a new candidate P̄<sub>1</sub>.

In this work, we let Z3 generate counterexamples without any further goals. However, counterexamples can be generated by adding constraints, e.g. linear independence or orthogonality. Intuitively, more constraints might lead to "better" candidates from the learner, albeit at an increased computational cost.

As intuition suggests, if we were to work with models having a diagonal matrix A, then the synthesis of diagonal candidates P̄<sub>i</sub> and of a diagonal solution P would reduce the number of variables needed, thus speeding up the computation. As such, if A is not diagonal but diagonalisable, the algorithm pre-computes the system diagonalisation and feeds it to the CEGIS architecture. CEGIS returns a matrix P for the diagonal system, which is then converted to a solution for the original model.
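The conversion back to the original coordinates can be spelled out as follows. This derivation assumes (our assumption) that A = TΛT<sup>−1</sup> with a real invertible T, i.e. real eigenvalues; complex eigenvalues would require conjugate transposes. If P<sub>Λ</sub> is a solution for the diagonal system ż = Λz, substituting z = T<sup>−1</sup>x gives

$$\begin{aligned} V(x) &= z^T P\_\Lambda z = x^T P x, \qquad P = T^{-T} P\_\Lambda T^{-1},\\ \dot{V}(x) &= z^T \left(\Lambda P\_\Lambda + P\_\Lambda \Lambda\right) z = x^T \left(A^T P + P A\right) x. \end{aligned}$$

Since T is invertible, x ≠ 0 if and only if z ≠ 0. Hence P is positive definite whenever P<sub>Λ</sub> is, and V̇ keeps its sign pointwise, so P satisfies both conditions in (1). For instance, if every diagonal entry of Λ is negative, P<sub>Λ</sub> = I already works, because Λ + Λ<sup>T</sup> = 2Λ is negative definite.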

**Learner** A learner is the CEGIS component designated to suggest a candidate solution for the problem under consideration. Within our framework, a learner solves linear inequalities derived from F(V(x̄)) as per Eq. (3), while memorising the set of counterexamples {x̄<sub>i</sub> | ¬F(x̄<sub>i</sub>)} generated by the verifier. Whilst the verifier works over continuous domains, note that the learner only considers a *finite* number of points to synthesise the candidate Lyapunov function. At each iteration i, the learner is tasked to solve 2i linear inequalities: i inequalities for V ≥ 0 and i for V̇ ≤ 0. Since each counterexample contributes two inequalities, a set of useful counterexamples is vital to achieve efficiency.

We implement two learners, for comparison: 1) a numerical learner and 2) a Z3-based learner. However, our CEGIS architecture can in principle accommodate any learner. The first learner uses Gurobi [10], a fast commercial optimisation solver that supports continuous variables and handles, among others, linear and quadratic programming problems. Notice that the synthesis is a linear program: the variables p<sub>i,j</sub>, the entries of matrix P, appear linearly within the inequalities in F(V(x̄<sub>i</sub>)). Gurobi is thus expected to outperform an SMT solver in this specific task. However, these variables do not represent real numbers, but floating-point numbers that are approximated at machine precision. The second learner instead employs Z3, which is numerically sound and not affected by machine precision. Z3 solves an SMT instance to synthesise V(x): it asserts the satisfiability of F(V(x̄<sub>i</sub>)) from Eq. (3) for all collected counterexamples x̄<sub>i</sub>.
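Because V(x̄) and V̇(x̄) are linear in the entries of P, each counterexample translates into two rows of a linear program. The sketch below builds those rows for the quadratic case c = 1 with ẋ = Ax. It uses the identity x̄<sup>T</sup>(A<sup>T</sup>P + PA)x̄ = Σ<sub>i,j</sub> p<sub>i,j</sub>[(Ax̄)<sub>i</sub>x̄<sub>j</sub> + x̄<sub>i</sub>(Ax̄)<sub>j</sub>]. It treats all n² entries p<sub>i,j</sub> as unknowns and only checks a given candidate against the rows; the actual LP solve would be left to Gurobi or Z3. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Row = List[Fraction]


def constraint_rows(A: Sequence[Sequence[Fraction]],
                    x_bar: Sequence[Fraction]) -> Tuple[Row, Row]:
    """Rows r_V, r_Vdot with V(x_bar) = r_V . p and V_dot(x_bar) = r_Vdot . p,
    where p = (p_11, p_12, ..., p_nn) flattens P row by row."""
    n = len(x_bar)
    ax = [sum(A[i][k] * x_bar[k] for k in range(n)) for i in range(n)]
    r_v = [x_bar[i] * x_bar[j] for i in range(n) for j in range(n)]
    r_vdot = [ax[i] * x_bar[j] + x_bar[i] * ax[j] for i in range(n) for j in range(n)]
    return r_v, r_vdot


def dot(r: Row, p: Sequence[Fraction]) -> Fraction:
    return sum((ri * pi for ri, pi in zip(r, p)), Fraction(0))


def consistent(p: Sequence[Fraction], A, counterexamples) -> bool:
    """Check a candidate p against the 2i inequalities V >= 0 and V_dot <= 0."""
    for x_bar in counterexamples:
        r_v, r_vdot = constraint_rows(A, x_bar)
        if dot(r_v, p) < 0 or dot(r_vdot, p) > 0:
            return False
    return True


F = Fraction
A = [[F(-1), F(0)], [F(0), F(-1)]]
r_v, r_vdot = constraint_rows(A, [F(1), F(2)])
print(r_v, r_vdot)
print(consistent([F(1), F(0), F(0), F(1)], A, [[F(1), F(2)], [F(-3), F(1)]]))  # True (P = I)
```

A numerical learner would pass these rows to an LP solver in floating point. An SMT-based learner would assert the same inequalities over exact rationals.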

As mentioned earlier, the number of inequalities to be solved depends on the number of counterexamples, which can grow quite large. Whilst the verifier ought to generate useful counterexamples, the learner is optimised to output a matrix P̄<sub>i</sub> that is easy to handle. The comparison between a numerical learner (running on Gurobi) and a sound one (based on Z3) makes the trade-off between speed and soundness evident (cf. Section 4): Z3 is sound, yet slower than the numerical learner.

Z3 offers an incremental feature to the learner. During each CEGIS loop, the verifier discards the previous constraints, as it re-initialises the verification problem with a new candidate V(x). The learner, on the other hand, keeps the previous synthesis instance and adds a new constraint related to the latest counterexample. This incremental approach reduces the computational effort, as the learner does not initialise a new problem at every CEGIS loop.

#### **3.2 Lyapunov Function Synthesis for Non-linear Models**

The problem of synthesising Lyapunov functions and their region of validity for a general non-linear system ẋ = f(x(t)) is approached via linearisation or via direct computation.

The linearisation approach consists of three steps for the learner. We first linearise f(x(t)), obtaining

$$
\dot{\hat{x}}(t) = A\_L \hat{x}(t),
$$

where A<sub>L</sub> is the Jacobian of f(x(t)) evaluated at x<sub>e</sub>. We then compute the matrix P, and thus the quadratic Lyapunov function V(x) = x<sup>T</sup>Px, for the linearised system. Finally, we find R, defined as the set in which this Lyapunov function is valid for the original system. Next, we detail the synthesis of the region R. Consider, without loss of generality, an autonomous non-linear system with (at least one) equilibrium point x<sub>e</sub> = 0. Assume the CEGIS procedure is successful, i.e. it finds a Lyapunov function V<sub>L</sub>(x) = x<sup>T</sup>Px that guarantees the asymptotic stability of the linearised system around x<sub>e</sub>. We now compute the region where V<sub>L</sub>(x) guarantees stability of the original system ẋ = f(x). By the existence of V<sub>L</sub>(x) and the definition of linearisation, there exists a neighbourhood ℬ<sub>0</sub> of the origin in which the derivative V̇(x) of the Lyapunov function is non-positive. Formally, this set is defined as

$$\mathcal{B}\_0 = \{ x \in \mathbb{R}^n \backslash \{0\} \mid \dot{V}(x) \le 0 \},$$

where V˙ (x) is computed on the original system, namely

$$
\dot{V}(x) = \nabla V\_L(x) \cdot f(x).
$$

Let us define the boundary of ℬ<sub>0</sub> as ∂ℬ<sub>0</sub> = {x ∈ ℝ<sup>n</sup> \ {0} | V̇(x) = 0}. This set may consist of isolated points or of whole regions of the state space. In either case, we compute the distance r from the equilibrium to the closest point of ∂ℬ<sub>0</sub>, as

$$r^2 = \min\_{x \in \partial \mathcal{B}\_0} \sum\_l x(l)^2.$$

We finally compute region R as a hyper-sphere of radius r,

$$\mathcal{R} = \{x \in \mathbb{R}^n \backslash \{0\} \mid \|x\|^2 < r^2\},\tag{4}$$

defining the region where the Lyapunov function is valid. Finally, region R is tested with the verifier: formula F(V(x)) from Eq. (3) is passed to Z3 with D = R. Our implementation uses a numerical optimisation technique to compute a value for r that is passed to Z3, as Z3 does not natively handle non-linear optimisation problems. With this selection, the region R represents a sound under-approximation of the maximal stability region. The linearisation method is used in view of its rapid and effective synthesis capability. However, it produces a Lyapunov function that does not ensure global stability when one of the eigenvalues of A<sub>L</sub> is equal to zero. This is a well-known limitation of linearisation, which motivates a more formal approach, called the *direct computation method*.
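The numerical computation of r can be sketched on a hypothetical two-dimensional example: ẋ<sub>1</sub> = −x<sub>1</sub> + x<sub>1</sub><sup>3</sup>, ẋ<sub>2</sub> = −x<sub>2</sub>, with V<sub>L</sub>(x) = x<sub>1</sub><sup>2</sup> + x<sub>2</sub><sup>2</sup> (here A<sub>L</sub> = −I and P = I). Instead of a general-purpose optimiser, which the text does not name, the sketch below scans a finite set of directions. Along each ray it locates the first point where V̇ stops being negative, by stepping and then bisection. Sampling directions can miss the closest point of ∂ℬ<sub>0</sub> and thus overestimate r. This is why the resulting region must still be checked by the verifier, as described above.

```python
import math
from typing import Callable, Optional

VdotFn = Callable[[float, float], float]


def vdot_example(x1: float, x2: float) -> float:
    """V_dot for V = x1^2 + x2^2 along x1' = -x1 + x1^3, x2' = -x2 (hypothetical model)."""
    return 2 * x1 * (-x1 + x1 ** 3) + 2 * x2 * (-x2)


def first_crossing(vdot: VdotFn, d1: float, d2: float, t_max: float = 10.0,
                   step: float = 0.01, tol: float = 1e-10) -> Optional[float]:
    """Smallest grid-detected t in (0, t_max] with vdot(t*d) >= 0 along direction d."""
    prev, t = 0.0, step
    while t <= t_max:
        if vdot(t * d1, t * d2) >= 0:
            lo, hi = prev, t           # vdot < 0 at the previous grid point (or lo = 0)
            while hi - lo > tol:       # bisect until the crossing is bracketed tightly
                mid = (lo + hi) / 2
                if vdot(mid * d1, mid * d2) >= 0:
                    hi = mid
                else:
                    lo = mid
            return hi
        prev, t = t, t + step
    return None                        # no crossing found up to t_max


def estimate_radius(vdot: VdotFn, n_dirs: int = 360) -> Optional[float]:
    """Minimum crossing distance over n_dirs equally spaced unit directions."""
    radii = []
    for k in range(n_dirs):
        theta = 2 * math.pi * k / n_dirs
        t = first_crossing(vdot, math.cos(theta), math.sin(theta))
        if t is not None:
            radii.append(t)
    return min(radii) if radii else None


r = estimate_radius(vdot_example)
print(r)  # close to 1.0, reached along the x1-axis
```

For this model, V̇ = 2t²(t²cos⁴θ − 1) along the ray at angle θ, so the exact value is r = 1, attained at (±1, 0). The verifier would then be asked to certify F(V(x)) on the resulting region R.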

The direct computation method, as the name suggests, analytically computes V(x) and V̇(x) from a template V(x) as in Eq. (2). The learner is tasked with resolving conditions ψ obtained by a slight relaxation of the two inequalities in (1), namely

$$V(x) \ge 0, \quad \dot{V}(x) = \nabla V(x) \cdot f(x) \le 0.$$

Note that the first inequality is not strict: this relaxation allows for a faster computation of a candidate. The verifier, on the other hand, produces counterexamples for V(x) > 0, thus retaining the soundness of the overall procedure. The CEGIS framework allows the separation between synthesis and verification. So whilst the learner might propose candidates that are completely independent of the domain D, the verifier is responsible for asserting or finding the domain of validity D. Our implementation first lets the verifier check the validity of V(x) over the whole state space D = ℝ<sup>n</sup>. If this check is not successful (namely, the computational time exceeds a predefined timeout), the verifier checks validity over a smaller region, e.g. D = [−1, 1]<sup>n</sup>, and so on. If this check also fails, the algorithm returns an empty V(x). Recall that our algorithm is in general not complete. Consider, for instance, the trivial problem of synthesising a Lyapunov function for an unstable system, which is impossible: in this case, the CEGIS procedure necessarily returns an empty V(x).

## **3.3 Lyapunov Function Synthesis for Parametric Models**

Parametric models represent a challenge for both sound and numerical solvers. Let us remark that neither Gurobi nor Z3 can synthesise functions in the presence of uncertainty. Z3 can, however, provide counterexamples treating one or more variables as fixed parameters, using the quantifier ForAll.

Let us consider a variable x, a parameter μ and a formula ψ(x, μ): Z3 can find a counterexample for all values of μ by validating ForAll(μ, ψ). If μ belongs to a range [l, u], Z3 can find a counterexample by checking ψ ∧ μ ≥ l ∧ μ ≤ u. This provides a counterexample (x̄, μ̄) for x and μ, respectively.

The synthesis procedure is split into two steps, because neither Z3 nor Gurobi can propose parametric solutions. The first step synthesises a candidate Lyapunov function solely using the constraint V(x) > 0, in which no parameter appears. The second step evaluates the constraint V̇ ≤ 0 to propose a parametric Lyapunov function, exploiting the results from the first step. The following example details the procedure.

*Example 4.* Consider a two-dimensional linear parametric system [23] and a candidate Lyapunov function

$$\begin{cases} \dot{x} = y \\ \dot{y} = -(2+\mu)x - y \end{cases}, \quad V(x,y) = p\_1 x^2 + p\_2 y^2.$$

Assume the first guess of the learner is invalid, i.e. the verifier finds a counterexample to the validity of $V(x, y)$. The counterexample $(\bar{x}, \bar{y})$ is then sent to the learner. In the first step, the synthesis accounts only for $V(\bar{x}, \bar{y}) > 0$, i.e. the learner is tasked to solve

$$V(\bar{x}, \bar{y}) = p\_1 \bar{x}^2 + p\_2 \bar{y}^2 > 0,$$

where $p_1, p_2$ are the unknowns of the inequality. The learner proposes values $\bar{p}_1$ and $\bar{p}_2$ satisfying the inequality. The second step discards one of the synthesised $\bar{p}_i$, e.g. $\bar{p}_1$, in order to re-synthesise it taking into account the parameters appearing in $\dot{V}$. In practical terms, $\dot{V}$ is evaluated at $\bar{x}$, $\bar{y}$ and $\bar{p}_2$, as

$$\dot{V} = 2p_1 \bar{x}\bar{y} - 2\bar{p}_2 \bar{y}^2 - 2\bar{p}_2(\mu + 2)\bar{x}\bar{y} \le 0 \Longrightarrow p_1 \le \bar{p}_2 \left(\frac{\bar{y}}{\bar{x}} + 2 + \mu\right).$$

We choose the value of $p_1$ that attains equality (the implication above divides by $2\bar{x}\bar{y}$, assumed positive; if $\bar{x}\bar{y} < 0$ the inequality is reversed). The candidate Lyapunov function thus results in $V(x, y) = \bar{p}_2 \left(\frac{\bar{y}}{\bar{x}} + 2 + \mu\right) x^2 + \bar{p}_2 y^2$. This procedure holds as long as $\bar{x} \neq 0$: if this is not the case, we can either synthesise a new value for $p_2$ or simply keep the numerical values obtained after the first step. In the latter case, once the candidate Lyapunov function is passed to the verifier, a new counterexample is generated and the procedure is repeated until a parametric Lyapunov function is found and verified. Another possible approach is based on mixed-term removal: $p_1$ is synthesised so that the terms carrying $\bar{x}\bar{y}$ cancel out. Further, choosing $p_1$ to attain equality is arbitrary: we can add a negative constant to its value to satisfy the strict inequality instead. Finally, more than one parameter $\bar{p}_i$ can be removed in the second step, which spreads the parametric coefficients over more than one $p_i$. However, this is likely to increase the computational cost, since the inequality then depends on more than one unknown.
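The second step of Example 4 can be checked with exact rational arithmetic. The sketch below is an illustration only: the function names are hypothetical, and it represents the parametric coefficient as an affine function $p_1(\mu) = a + b\mu$. It computes $p_1$ either by attaining equality at the counterexample or by mixed-term removal, and evaluates $\dot{V}$ along the system of Example 4. With $\bar{p}_2 = 1$, mixed-term removal yields $V = (\mu + 2)x^2 + y^2$, the function reported in Example 10.

```python
from fractions import Fraction
from typing import Tuple


def p1_equality(x_bar, y_bar, p2_bar) -> Tuple[Fraction, Fraction]:
    """Coefficients (a, b) of p1(mu) = a + b*mu attaining equality in
    V_dot(x_bar, y_bar) <= 0, i.e. p1 = p2_bar * (y_bar / x_bar + 2 + mu)."""
    if x_bar == 0:
        raise ValueError("the construction requires x_bar != 0")
    x_bar, y_bar, p2_bar = Fraction(x_bar), Fraction(y_bar), Fraction(p2_bar)
    return p2_bar * (y_bar / x_bar + 2), p2_bar


def p1_mixed_term_removal(p2_bar) -> Tuple[Fraction, Fraction]:
    """Coefficients (a, b) of p1(mu) = p2_bar * (2 + mu), cancelling the x*y terms."""
    p2_bar = Fraction(p2_bar)
    return 2 * p2_bar, p2_bar


def v_dot(p1, p2, mu, x, y):
    """Lie derivative of V = p1 x^2 + p2 y^2 along x' = y, y' = -(2 + mu) x - y."""
    return 2 * p1 * x * y + 2 * p2 * y * (-(2 + mu) * x - y)


a, b = p1_equality(x_bar=1, y_bar=1, p2_bar=1)  # p1(mu) = 3 + mu
assert all(v_dot(a + b * mu, 1, mu, 1, 1) == 0 for mu in (0, 1, 5))

a, b = p1_mixed_term_removal(p2_bar=1)          # p1(mu) = 2 + mu
print(v_dot(a + b * 3, 1, 3, x=2, y=5))         # -50, i.e. -2 * y**2
```

The equality choice only makes $\dot{V}$ vanish at the counterexample, so the verifier may still return new counterexamples; mixed-term removal instead gives $\dot{V} = -2\bar{p}_2 y^2$ at every point.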

## **4 Case Studies and Experiments**

In this section we describe experiments that test the validity of our approach. Our technique is coded in Python 2.7 [30], using external libraries such as the numerical solver Gurobi and the SMT solver Z3 (cf. Section 2). Specifically, we compare two CEGIS architectures:


later denoted as *Gurobi-CEGIS* and *Z3-CEGIS*, respectively, against the optimisation toolbox SOSTOOLS. Whilst Z3 is an efficient verifier, it carries the overhead of exact representations. We therefore compare its use within the learner to that of a numerical solver such as Gurobi; recall that the learner does not need to be sound. A relevant feature of the synthesis procedure is its *linearity* in the entries of matrix $P$: we therefore expect an efficient LP solver to outperform an SMT solver, and we study the expected tradeoff between speed and precision. As specified earlier, the initial candidate $\bar{P}_0$ for the learner is arbitrary: we challenge the procedure by setting $\bar{P}_0 = -I$, which does not satisfy the first positivity condition for Lyapunov functions, thus showing that even with an ill-suited initial guess the procedure can rapidly synthesise a valid Lyapunov function. SOSTOOLS is a sum-of-squares optimisation toolbox for MATLAB, equipped with the solver SeDuMi [31]. It can be used to solve a wide range of problems, from mixed continuous-discrete optimisation to finding Lyapunov functions for polynomial dynamical systems.
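To illustrate the linearity that motivates an LP learner, here is a sketch assuming a quadratic template $V(x) = x^\top P x$ with symmetric $P$ and a linear model $\dot{x} = Ax$; the helper names are hypothetical. Each counterexample $\bar{x}$ contributes one row, linear in the unknown entries $p_{ij}$ ($i \le j$), for $\bar{x}^\top P \bar{x}$ and one for $\dot{V}(\bar{x}) = \bar{x}^\top (A^\top P + PA)\bar{x} = 2\bar{x}^\top P A\bar{x}$.

```python
from itertools import combinations_with_replacement
from typing import List, Sequence, Tuple


def unknowns(n: int) -> List[Tuple[int, int]]:
    """Index pairs (i, j), i <= j, of the free entries of a symmetric n x n P."""
    return list(combinations_with_replacement(range(n), 2))


def positivity_row(x: Sequence[float]) -> List[float]:
    """Coefficients of x^T P x as a linear form in the entries p_ij (i <= j)."""
    return [x[i] * x[j] * (1 if i == j else 2) for i, j in unknowns(len(x))]


def derivative_row(A: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Coefficients of V_dot(x) = 2 x^T P (A x) as a linear form in the p_ij."""
    n = len(x)
    z = [sum(A[i][k] * x[k] for k in range(n)) for i in range(n)]  # z = A x
    return [2 * x[i] * z[i] if i == j else 2 * (x[i] * z[j] + x[j] * z[i])
            for i, j in unknowns(n)]


def evaluate(row: Sequence[float], p: Sequence[float]) -> float:
    """Value of a linear row at given entries (p11, p12, ..., pnn)."""
    return sum(r * q for r, q in zip(row, p))


A = [[0, 1], [-2, -1]]  # Example 4 with mu = 0
x_bar = [1, 2]          # an arbitrary counterexample
p_init = [-1, 0, -1]    # P_0 = -I, entries (p11, p12, p22)
print(positivity_row(x_bar))                    # [1, 4, 4]
print(evaluate(positivity_row(x_bar), p_init))  # -5: P_0 = -I violates V > 0
print(derivative_row(A, x_bar))                 # [4, 0, -16]
```

As a consistency check, for $P = \mathrm{diag}(2, 1)$, i.e. entries $(2, 0, 1)$, the derivative row evaluates to $4 \cdot 2 - 16 = -8$, which matches $\dot{V} = -2y^2$ at $\bar{x} = (1, 2)$.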

We consider linear, non-linear and parametric ODEs with the origin as (one of) the equilibrium point(s), and aim to obtain a Lyapunov function guaranteeing the stability of that equilibrium. The procedure entails the following steps:


Let us emphasise that Z3 cannot fully handle non-polynomial terms, which represents the only limitation of our approach. Unlike much of the literature, counterexamples are not limited to a finite set but are searched for over the whole of $\mathbb{R}^n$.

Linear models are certainly an easier task than polynomial systems. The study of linear models focuses mainly on the scalability of the method, measured by the average, minimum and maximum computational time and by the number of iterations performed. We generate $N = 100$ random linear models of dimension $n \in [3, 10]$. For each linear system, the entries of matrix $A$ lie within $[-1000, 1000] \subset \mathbb{R}$. For each test we set $c = 1$ (cf. Eq. (2)), namely we impose a quadratic structure on the Lyapunov function, and record the number of iterations of the procedure, i.e. the number of counterexamples needed to compute a valid Lyapunov function, and the total elapsed time. Recall that the initial candidate of the synthesiser is $\bar{P}_0 = -I$, which challenges the reliability of our method with a bad initial condition. A 180-second timeout is set for every run.
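For linear models with $c = 1$, a quadratic candidate $V(x) = x^\top P x$ with symmetric $P$ satisfies $V > 0$ and $\dot{V} \le 0$ on all of $\mathbb{R}^n$ exactly when $P \succ 0$ and $A^\top P + PA \preceq 0$. The approach above delegates this check to Z3; the sketch below instead swaps in a purely algebraic test (Sylvester's criterion for $P \succ 0$, and non-negativity of all principal minors of $-(A^\top P + PA)$) over exact fractions. It is an illustrative alternative, not the authors' verifier, and only practical for small $n$, since it enumerates $2^n - 1$ principal minors.

```python
from fractions import Fraction
from itertools import combinations


def det(M) -> Fraction:
    """Determinant by Gaussian elimination over exact fractions."""
    M = [[Fraction(v) for v in row] for row in M]
    n, result = len(M), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:  # a row swap flips the sign of the determinant
            M[col], M[pivot] = M[pivot], M[col]
            result = -result
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return result


def is_pd(M) -> bool:
    """Sylvester's criterion: all leading principal minors are positive."""
    return all(det([row[:k] for row in M[:k]]) > 0 for k in range(1, len(M) + 1))


def is_psd(M) -> bool:
    """A symmetric matrix is PSD iff all its principal minors are non-negative."""
    n = len(M)
    return all(det([[M[i][j] for j in idx] for i in idx]) >= 0
               for k in range(1, n + 1) for idx in combinations(range(n), k))


def is_quadratic_lyapunov(A, P) -> bool:
    """Check P > 0 and A^T P + P A <= 0, i.e. V = x^T P x certifies x' = A x."""
    n = len(A)
    lyap = [[sum(A[k][i] * P[k][j] + P[i][k] * A[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]
    return is_pd(P) and is_psd([[-v for v in row] for row in lyap])


A = [[0, 1], [-2, -1]]                               # Example 4 with mu = 0
print(is_quadratic_lyapunov(A, [[2, 0], [0, 1]]))    # True: V_dot = -2 y^2
print(is_quadratic_lyapunov(A, [[-1, 0], [0, -1]]))  # False: P_0 = -I
print(is_quadratic_lyapunov(A, [[1, 0], [0, 1]]))    # False: V_dot indefinite
```

Unlike the counterexample-driven loop, this test returns only a yes/no answer; a CEGIS verifier additionally supplies the point $\bar{x}$ that the learner needs for its next candidate.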

Results comparing the numerical learner using Gurobi and the sound learner using Z3 are reported in Table 1. The average values, as well as the minimum and maximum values among the $N$ random systems, are computed over the synthesis tests that did not time out. The number of timed-out runs is also listed in the table.

With regard to non-linear and parametric models, we assess our approach over a suite of examples taken from related work on Lyapunov function synthesis [18], [19], [20], [23], reported in the following. The value $c$ from Eq. (2) is set heuristically as $\lceil d/2 \rceil$, where $d$ is the order of the system (this choice follows the common interpretation of Lyapunov maps as storage functions). For ease of implementation, only Z3-CEGIS performs the synthesis with $c > 1$ and for parametric models. Results in terms of computational time and iterations are reported in Table 2. Experiments are run on a 4-core Dell laptop with Fedora 30 and 8 GB RAM.

*Example 5.* Consider the model [18]

$$\begin{aligned} \dot{x}_1 &= -x_1^3 + 4x_2^3 - 6x_3 x_4, & \dot{x}_4 &= x_1 x_3 + x_3 x_6 - x_4^3, \\ \dot{x}_2 &= -x_1 - x_2 + x_5^3, & \dot{x}_5 &= -2x_2^3 - x_5 + x_6, \\ \dot{x}_3 &= x_1 x_4 - x_3 + x_4 x_6, & \dot{x}_6 &= -3x_3 x_4 - x_5^3 - x_6. \end{aligned}$$

Z3-CEGIS finds the Lyapunov function $V(x) = 2x_1^2 + 4x_2^4 + x_3^2 + 11x_4^2 + 2x_5^4 + 4x_6^2$, ensuring stability over the whole state space. SOSTOOLS fails to find a 2nd- or 4th-order Lyapunov function for this model.

*Example 6.* Consider the model [23]

$$\begin{cases} \dot{x} = -x^3 + y\\ \dot{y} = -x - y. \end{cases}$$

Gurobi-CEGIS finds the Lyapunov function $V(x) = 5 \cdot 10^{-5} x^2 + 5 \cdot 10^{-5} y^2$, whereas Z3-CEGIS finds $V(x) = 0.5x^2 + 0.5y^2$, both ensuring global stability. The linearised Gurobi-CEGIS finds $V(x) = 3.2 \cdot 10^{-3} x^2 + 3.2 \cdot 10^{-3} y^2$, whereas SOSTOOLS finds $V(x) = 0.7844(x^2 + y^2)$, also ensuring stability over the whole state space.
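A short check, added here for illustration, shows why these candidates work: for $V = 0.5x^2 + 0.5y^2$ the cross terms cancel,

$$\dot{V} = x\dot{x} + y\dot{y} = x(-x^3 + y) + y(-x - y) = -x^4 - y^2,$$

which is negative everywhere except at the origin. The other three functions are positive multiples of $x^2 + y^2$, so the same computation applies to them.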

*Example 7.* Consider the system [20]

$$\begin{cases} \dot{x}\_1 = -x\_1^3 - x\_1x\_3^2, \\ \dot{x}\_2 = -x\_2 - x\_1^2x\_2, \\ \dot{x}\_3 = -x\_3 - \frac{3x\_3}{x\_3^2 + 1} + 3x\_1^2x\_3. \end{cases}$$

Note that the term $x_3^2 + 1$ is always strictly positive, so we can equivalently consider $\dot{V}(x) \cdot (x_3^2 + 1) \le 0$, which is polynomial. Gurobi-CEGIS finds the Lyapunov function $V(x) = 32 \cdot 10^{-4} x_1^2 + 32 \cdot 10^{-4} x_2^2 + 8 \cdot 10^{-4} x_3^2$, whereas Z3-CEGIS finds $V(x) = 3x_1^2 + x_2^2 + x_3^2$, and SOSTOOLS finds $V(x) = 6.659x_1^2 + 4.628x_2^2 + 2.073x_3^2$, all ensuring global stability.

*Example 8.* Consider the system [23]

$$\begin{cases} \dot{x} = -x - 1.5x^2y^3, \\ \dot{y} = -y^3 + 0.5x^3y^2. \end{cases}$$

Z3-CEGIS finds $V(x) = \frac{1}{3}x^2 + y^2$, valid on the whole of $\mathbb{R}^2$, whereas SOSTOOLS finds $V(x) = 0.4707x^2 + 1.412y^2$, with a stability region of radius $r = 68$. Gurobi-CEGIS returns an error, as it finds $V(x) = 1.00066454641347x^2 + 2.99933545358653y^2$, which is *not* a valid Lyapunov function. The correct solution, $V(x) = x^2 + 3y^2$, cannot be attained because the optimisation algorithm does not converge. On the other hand, the linearised Gurobi-CEGIS delivers $V(x) = 32 \cdot 10^{-4} x^2 + 2 \cdot 10^{-4} y^2$ with a radius $r = 1.25$.
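A short derivation, added here for illustration, suggests why this example is numerically delicate. For a general quadratic template $V = p_1 x^2 + p_2 y^2$,

$$\dot{V} = 2p_1 x(-x - 1.5x^2y^3) + 2p_2 y(-y^3 + 0.5x^3y^2) = -2p_1 x^2 - 2p_2 y^4 + (p_2 - 3p_1)\,x^3y^3.$$

Along the lines $y = \pm x$ the sextic term $\pm(p_2 - 3p_1)x^6$ dominates for large $|x|$, so a quadratic $V$ can be valid on the whole of $\mathbb{R}^2$ only if $p_2 = 3p_1$ exactly. Both $\frac{1}{3}x^2 + y^2$ and $x^2 + 3y^2$ satisfy this ratio, whereas the floating-point coefficients returned by Gurobi-CEGIS give $p_2 - 3p_1 \approx -2.7 \cdot 10^{-3} \neq 0$.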

*Example 9.* Consider the system [23]:

$$\begin{aligned} \dot{x}\_1 &= -x\_1 + x\_2^3 - 3x\_3 x\_4, & \dot{x}\_3 &= x\_1 x\_4 - x\_3, \\ \dot{x}\_2 &= -x\_1 - x\_2^3, & \dot{x}\_4 &= x\_1 x\_3 - x\_4^3. \end{aligned}$$

Z3-CEGIS finds the Lyapunov function $V(x) = 2x_1^2 + x_2^4 + \frac{3201}{1024}x_3^2 + \frac{2943}{1024}x_4^2$, ensuring global stability. SOSTOOLS, on the other hand, finds a complex 4th-order polynomial, omitted here for brevity, with a stability region that is hard to characterise analytically.
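The rational coefficients found by Z3-CEGIS make the cross terms cancel identically. The sketch below uses a small hand-written polynomial representation (a dictionary from exponent tuples to `Fraction` coefficients, an assumption of this illustration rather than part of the authors' tool) to compute $\dot{V} = \nabla V \cdot f$ for Example 9; the $x_1x_3x_4$ terms vanish because $-12 + 2 \cdot \frac{3201 + 2943}{1024} = 0$.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Poly = Dict[Tuple[int, ...], Fraction]  # exponent tuple -> coefficient


def term(coeff, *exps: int) -> Poly:
    """Monomial coeff * x1^e1 * x2^e2 * x3^e3 * x4^e4 (missing exponents are 0)."""
    return {tuple(exps) + (0,) * (4 - len(exps)): Fraction(coeff)}


def add(*polys: Poly) -> Poly:
    out: Poly = {}
    for p in polys:
        for m, c in p.items():
            out[m] = out.get(m, Fraction(0)) + c
    return {m: c for m, c in out.items() if c != 0}  # drop cancelled terms


def mul(p: Poly, q: Poly) -> Poly:
    out: Poly = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(a + b for a, b in zip(m1, m2))
            out[m] = out.get(m, Fraction(0)) + c1 * c2
    return add(out)


def diff(p: Poly, i: int) -> Poly:
    """Partial derivative with respect to variable i (0-based)."""
    out: Poly = {}
    for m, c in p.items():
        if m[i] > 0:
            dm = list(m)
            dm[i] -= 1
            out[tuple(dm)] = c * m[i]
    return out


def lie_derivative(V: Poly, f: List[Poly]) -> Poly:
    """V_dot = sum_i (dV/dx_i) * f_i."""
    return add(*(mul(diff(V, i), fi) for i, fi in enumerate(f)))


# Example 9: vector field f and the Z3-CEGIS Lyapunov function V.
f = [
    add(term(-1, 1), term(1, 0, 3), term(-3, 0, 0, 1, 1)),  # -x1 + x2^3 - 3 x3 x4
    add(term(-1, 1), term(-1, 0, 3)),                        # -x1 - x2^3
    add(term(1, 1, 0, 0, 1), term(-1, 0, 0, 1)),             # x1 x4 - x3
    add(term(1, 1, 0, 1), term(-1, 0, 0, 0, 3)),             # x1 x3 - x4^3
]
V = add(term(2, 2), term(1, 0, 4),
        term(Fraction(3201, 1024), 0, 0, 2), term(Fraction(2943, 1024), 0, 0, 0, 2))
print(lie_derivative(V, f))
```

The printed result contains only the terms $-4x_1^2 - 4x_2^6 - \frac{3201}{512}x_3^2 - \frac{2943}{512}x_4^4$, all with negative coefficients and even exponents, so $\dot{V} \le 0$ everywhere, with equality only at the origin.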

*Example 10.* Consider the parametric linear system [23]

$$\begin{cases} \dot{x} = y, \\ \dot{y} = -(2+\mu)x - y, \end{cases}$$

where $\mu \in (-2, 5]$. Z3-CEGIS discovers the Lyapunov function $V(x) = (\mu + 2)x^2 + y^2$, ensuring stability on the whole state space. On the other hand, SOSTOOLS fails to find a solution when $V(x, \mu)$ is set to be independent of, linear in, or quadratic in $\mu$.
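Differentiating the Z3-CEGIS candidate along the system confirms the claim (a derivation added here for illustration):

$$\dot{V} = 2(\mu + 2)x\dot{x} + 2y\dot{y} = 2(\mu + 2)xy - 2(2 + \mu)xy - 2y^2 = -2y^2 \le 0,$$

while $V > 0$ away from the origin precisely because $\mu > -2$ makes the coefficient $\mu + 2$ positive. This is the mixed-term removal strategy of Section 3.3 with $\bar{p}_2 = 1$.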

*Example 11.* Consider the parametric system [23]

$$\begin{cases} \dot{x} = -(1+\mu\_1)x + (4+\mu\_2)y, \\ \dot{y} = -(1+\mu\_3)x - \mu\_4 y^3, \end{cases}$$

where $\mu_i \in [0, 100]$ for $i = 1, \dots, 4$. Z3-CEGIS discovers the Lyapunov function $V(x) = \frac{\mu_3 + 1}{\mu_2 + 4}x^2 + y^2$, which asserts stability on the whole state space, whereas SOSTOOLS cannot find a solution with $V(x)$ independent of, linear in, or quadratic in $\mu_i$, $i = 1, \dots, 4$.
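The same computation for Example 11, added here for illustration, with $k = \frac{\mu_3 + 1}{\mu_2 + 4}$ so that $V = kx^2 + y^2$:

$$\dot{V} = 2kx\big(-(1 + \mu_1)x + (4 + \mu_2)y\big) + 2y\big(-(1 + \mu_3)x - \mu_4 y^3\big) = -2k(1 + \mu_1)x^2 - 2\mu_4 y^4 \le 0,$$

since $2k(4 + \mu_2) = 2(1 + \mu_3)$ cancels the $xy$ terms, while $k > 0$ and $\mu_1, \mu_4 \ge 0$ on the given parameter box.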

As expected, Gurobi is faster than Z3 in terms of both iterations and computational time. The gap widens for high-dimensional systems, as the SMT learner does not implement any optimisation technique: the Z3-CEGIS synthesis is performed via an SMT call whose complexity grows with the number of constraints, i.e. with the number of counterexamples. Gurobi, on the other hand, uses optimisation techniques and converges faster to a candidate that is closer to the actual solution. Our approach outperforms SOSTOOLS in terms of computational time, and it is able to handle parametric and complex models.

Notice that the coefficients of the Lyapunov functions synthesised by Gurobi are small in magnitude, as the linear programming problem can include the minimisation of the coefficients in its setup. On the other hand, those obtained from Z3 (rational fractions) are arguably more interpretable. A very interesting result comes from Example 8: Gurobi-CEGIS converges towards the correct Lyapunov function, yet it cannot reach the exact numerical values because of limited algorithmic precision. Gurobi's numerical guidelines [10] suggest that, as a rule of thumb, the ratio of the largest to the smallest coefficient of the LP problem should be less than $10^9$. In our setting, the coefficients are derived from the counterexamples found by Z3, which might require higher precision. In this case, the issue is (probably) caused by a counterexample $\bar{x} = [-755145, 1/8]$, where the first element is actually represented as a (very long) ratio between two integers. The ratio between the two entries of $\bar{x}$ is of the order of $10^7$. Roughly speaking, the counterexamples generated by Z3 depend on the complexity of the tested model: a high-order system might generate numerically ill-conditioned counterexamples, as this example shows. It is also significant how the numerical algorithm tries to converge to a correct solution. The first candidate Lyapunov function is $V(x) = 1.07079661938449x^2 + 2.92920338061551y^2$, and it takes 99 counterexamples to reach the final value (cf. Example 8), at which point the procedure stops because the problem becomes infeasible. Even wrapping the numerical values in SymPy's Rational or Python's Decimal or Fraction, or applying SymPy's simplify, does not help in this context, the limitation being Gurobi's numerical precision.
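The conditioning issue can be made concrete with exact arithmetic. Assuming, for illustration, that the learner for Example 8 uses the template $V = p_1x^2 + p_2y^2$ and adds, for each counterexample $(\bar{x}, \bar{y})$, the positivity row $[\bar{x}^2, \bar{y}^2]$ (as in Example 4), the sketch below computes the ratio between the largest and smallest absolute coefficients of that row for the counterexample quoted above. The function name is hypothetical; note that squaring the entries turns their ratio of about $6 \cdot 10^6$ into a coefficient ratio well beyond the $10^9$ rule of thumb.

```python
from fractions import Fraction
from typing import Sequence


def coefficient_range(row: Sequence[Fraction]) -> Fraction:
    """Ratio between the largest and smallest non-zero |coefficient| of an LP row."""
    mags = [abs(c) for c in row if c != 0]
    return max(mags) / min(mags)


x_bar, y_bar = Fraction(-755145), Fraction(1, 8)  # counterexample quoted above
row = [x_bar ** 2, y_bar ** 2]                     # V(x_bar) >= 0 for V = p1 x^2 + p2 y^2
ratio = coefficient_range(row)
print(ratio)              # 36495614145600, about 3.6e13
print(ratio > 10 ** 9)    # True: far outside the suggested range
```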


**Table 1.** Comparison between Gurobi-CEGIS and Z3-CEGIS over $n$-dimensional linear models. The first values are the average performance over the $N = 100$ randomly generated models, with the minimum and maximum values in brackets. Oot is the number of runs (out of $N$) that did not finish within 180 s.


**Table 2.** Comparison between Gurobi-CEGIS, Z3-CEGIS and SOSTOOLS for nonlinear models (see Examples description in main text). The result for Gurobi-CEGIS in Example 8 is obtained via linearisation.

## **5 Conclusions and Future Work**

In this work, we have studied the problem of automated and sound synthesis of Lyapunov functions. We have exploited a CEGIS framework, equipped with a sound verifier (the Z3 SMT solver) and with either a numerical LP solver (Gurobi) or a sound (Z3) learner.

We have provided a simple – yet effective – methodology to synthesise Lyapunov functions for linear, polynomial and parametric systems and shown evidence of scalability and reliability of our method using benchmarks from the literature. We have in particular synthesised quadratic Lyapunov functions for linear models and verified their validity on the whole state space. We have tackled non-linear models following two approaches: either 1) the computation of Lyapunov functions over the linearised system and the synthesis of its validity region; or 2) the direct computation of a higher-order Lyapunov function.

Future work includes the implementation of synthesis techniques for high-order and parametric models in Gurobi-CEGIS, together with the study of optimisation techniques for the synthesis in Z3-CEGIS: the tuning of the SMT solver leaves much room for improvement, for example to provide more insightful counterexamples or to additionally optimise an objective function. Further, we aim at embedding neural networks (as function approximators) in CEGIS to replace the learner, whilst keeping verification in the hands of an SMT solver; this approach has also recently been pursued in [32].

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## A Study of Symmetry Breaking Predicates and Model Counting

Wenxi Wang1, Muhammad Usman1, Alyas Almaawi1, Kaiyuan Wang2, Kuldeep S. Meel3, and Sarfraz Khurshid<sup>1</sup>

> <sup>1</sup> University of Texas at Austin, Austin, TX, USA <sup>2</sup> Google Inc., Sunnyvale, CA, USA <sup>3</sup> National University of Singapore, Singapore

Abstract. Propositional model counting is a classic problem that has recently witnessed many technical advances and novel applications. While the basic model counting problem requires computing the number of all solutions to the given formula, in some important application scenarios, the desired count is not of *all* solutions, but instead, of all *unique solutions up to isomorphism*. In such a scenario, the user herself must try to either use the full count that the model counter returns to compute the count up to isomorphism, or ensure that the input formula to the model counter adequately captures the *symmetry breaking predicates* so it can directly report the count she desires.

We study the use of *CNF-level* and *domain-level* symmetry breaking predicates in the context of the state-of-the-art in model counting, specifically the leading approximate model counter ApproxMC and the recently introduced exact model counter ProjMC. As benchmarks, we use a range of problems, including structurally complex specifications of software systems and constraint satisfaction problems. The results show that while it is sometimes feasible to compute the model counts up to isomorphism using the full counts that are computed by the model counters, doing so suffers from poor scalability. The addition of symmetry breaking predicates substantially assists model counters. Domain-specific predicates are particularly useful, and in many cases can provide *full* symmetry breaking to enable highly efficient model counting up to isomorphism. We hope our study motivates new research on designing model counters that directly account for symmetries to facilitate further applications of model counting.

## 1 Introduction

Propositional model counting is the classic problem of counting the number of all solutions for the given formula in propositional logic. While the core problem is an integral part of complexity theory literature, advances in propositional satisfiability (SAT) solvers and other decision procedures in the last decade have led to much progress in tackling this problem in innovative ways [7, 9, 10, 15, 17, 31, 39, 40, 47, 49, 50, 56, 64]. These advances have fueled the application of model counters in various software verification and reliability domains, e.g., to perform probabilistic analyses [13, 26, 28], check and repair string manipulation code [9, 41], and estimate information leakage using quantified information flow [19, 44].

While the basic model counting problem requires computing the number of *all* solutions, in some important application scenarios, the desired count is not of all solutions, but instead, of all *unique solutions up to isomorphism*, i.e., *non-isomorphic* (also called *non-symmetric*) solutions. For example, consider software *reliability analysis* [26], where a goal is to find the number of inputs that can lead to an assertion violation, or *bounded exhaustive testing* [14, 42, 62, 68], where the goal is to estimate the total number of inputs that exist for a certain bound on the input size, in order to decide what bound to use to stay within the testing budget. The desired counts in these cases are of non-isomorphic inputs, i.e., inputs that are *non-equivalent* with respect to the behaviors a program can exhibit, because two equivalent (possibly non-identical) inputs produce the same output [66]. As another example, consider computing the number of solutions to a *constraint satisfaction problem* (CSP) [45], e.g., the number of unique ways 8 queens can be arranged on a fixed chess board such that no queen is under attack [6]. Once again, one is typically interested in the number of non-symmetric solutions, because the indistinguishability of queens implies that a user does not consider two solutions obtained by swapping the positions of queens to be different.

In such scenarios, the user has two basic options. One option is to compute the full count using the model counter, and then use mathematical reasoning about symmetries to project the full count to the desired count. Doing so is straightforward in some cases: e.g., if each solution consists of n indistinguishable objects of the same type, and each permutation of those n objects leads to a distinct (albeit isomorphic) solution, dividing the full count by n! gives the count of non-isomorphic solutions. However, doing so is not always easy, for example when different solutions have different numbers of objects that can be permuted to form non-identical solutions. The other option is to ensure that the formula input to the model counter includes *symmetry breaking predicates* [20, 21], i.e., additional constraints that only allow canonical solutions from each isomorphism class, so that the model counter can report the desired count directly.
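Both options can be illustrated on a tiny instance by brute force. The sketch below (plain Python, not how ApproxMC or ProjMC count) enumerates placements of n *labeled* queens, divides the full count by n!, and alternatively counts with a simple symmetry breaking predicate requiring the queens' cells to appear in strictly increasing order, which keeps exactly one canonical placement per isomorphism class. The ordering predicate is our illustrative choice, not the predicate generated by Alloy or BreakID.

```python
from itertools import product
from math import factorial


def attacks(a, b):
    """True if queens on cells a and b (row, col) share a row, column or diagonal."""
    (r1, c1), (r2, c2) = a, b
    return r1 == r2 or c1 == c2 or abs(r1 - r2) == abs(c1 - c2)


def count_solutions(n: int, symmetry_breaking: bool = False) -> int:
    """Count assignments of n labeled queens to cells of an n x n board."""
    cells = list(product(range(n), repeat=2))
    count = 0
    for placement in product(cells, repeat=n):  # placement[i] = cell of queen i
        if symmetry_breaking and any(placement[i] >= placement[i + 1]
                                     for i in range(n - 1)):
            continue  # predicate: cells must be in strictly increasing order
        if all(not attacks(placement[i], placement[j])
               for i in range(n) for j in range(i + 1, n)):
            count += 1
    return count


full = count_solutions(4)
print(full, full // factorial(4), count_solutions(4, symmetry_breaking=True))  # 48 2 2
```

The enumeration grows as $(n^2)^n$, so it only serves to check the arithmetic on toy sizes; the point is that the predicate reaches the non-isomorphic count directly, without computing the larger full count first.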

Symmetry breaking predicates can be added using three basic approaches [29]. Perhaps the most common approach is to add them at the CNF-level by using an off-the-shelf tool [8,23], which takes as input a CNF formula and creates symmetry breaking predicates for it. Another common approach is to create them at the problem domain level using a domain-specific tool [58], and then translate the formula and predicates together to CNF. A third approach is to add them *manually* at the problem domain level [38, 59], and then translate to CNF.

One goal of our work is to study the best way to add symmetry breaking predicates (if at all) to obtain *precise* counts of non-isomorphic solutions. We conduct the study in the context of the state of the art in model counting, specifically the leading approximate model counter *ApproxMC* [16, 17, 52] and the recently introduced exact model counter *ProjMC* [40]. ApproxMC and ProjMC embody very different algorithms for model counting and provide us with a diverse set of tools for the study. ApproxMC employs novel approximation methods to efficiently estimate highly accurate model counts with formal guarantees, and is now in its third generation (called ApproxMC3 [52]). ProjMC uses a recursive algorithm that employs a disjunctive decomposition method together with a search for disjoint components, and has just had its first public release.

As benchmark formulas, we use a range of problems, including structurally complex specifications of software systems [34] and constraint satisfaction problems [45]. To create the benchmark formulas, we employ the Alloy toolset [34] and its Kodkod backend [58]. Alloy allows writing formulas in relational first-order logic with transitive closure, and has been used in academia and industry for the design and specification of systems [11, 18, 35, 37, 65, 67, 70] as well as for various forms of code analysis [27, 32, 36, 42, 48, 69]. The Alloy analyzer translates Alloy formulas with respect to a *scope*, i.e., a bound on the universe of discourse, into propositional logic to create CNF problems that are solved using off-the-shelf SAT solvers [25]. Alloy supports fully automatic (partial) symmetry breaking at the level of Alloy specifications [51,57] by adapting Crawford's symmetry breaking predicates [20], which are *statically* added to the formula *before* the solvers solve it. Alloy therefore provides an ideal vehicle for evaluating the different approaches to symmetry breaking on which this study focuses.

Similar to other techniques that use CNF-based backends, the Alloy analyzer translates problems from a higher-level language (Alloy) to a lower-level one (CNF). This translation often introduces new boolean variables in the resulting formula, which are not essential for expressing the problem but are required for a compact (feasible) CNF encoding [60]. As a result, the translated formula is *equisatisfiable* to the original formula but may not be *equivalent* to it, and hence the model count of the CNF formula may be very different from that of the original formula. Several modern model counters [16, 40, 50] readily handle this case by supporting *projected model counting* [10], i.e., computing the model count with respect to a *subset* of all the variables. For Alloy, this subset consists of the *primary variables*, i.e., all boolean variables that directly correspond to the variables in the Alloy specification.
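The effect of auxiliary variables on the count can be reproduced with a brute-force sketch; the clause encoding and helper names are illustrative assumptions, not Alloy's translation. Encoding $F = (a \land b) \lor c$ with an auxiliary variable $t$ that only implies $a \land b$ gives the CNF $(\lnot t \lor a) \land (\lnot t \lor b) \land (t \lor c)$, which is equisatisfiable with $F$ but has more models; counting projected onto the primary variables $\{a, b, c\}$ recovers the model count of $F$.

```python
from itertools import product
from typing import Iterable, List, Optional, Set, Tuple

Clause = List[int]  # DIMACS-style literals: v or -v for variable v >= 1


def satisfies(assignment: Tuple[bool, ...], cnf: Iterable[Clause]) -> bool:
    """True if every clause has a literal made true by the assignment."""
    return all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in cnf)


def count_models(cnf: List[Clause], num_vars: int,
                 projection: Optional[Iterable[int]] = None) -> int:
    """Number of models, or of distinct restrictions of models to `projection`."""
    keep = sorted(projection) if projection is not None else range(1, num_vars + 1)
    seen: Set[Tuple[bool, ...]] = set()
    for assignment in product((False, True), repeat=num_vars):
        if satisfies(assignment, cnf):
            seen.add(tuple(assignment[v - 1] for v in keep))
    return len(seen)


a, b, c, t = 1, 2, 3, 4  # t is the auxiliary (non-primary) variable
cnf = [[-t, a], [-t, b], [t, c]]
print(count_models(cnf, 4))                        # 6 models over {a, b, c, t}
print(count_models(cnf, 4, projection=[a, b, c]))  # 5, the models of (a and b) or c
```

The extra model arises when $a$, $b$ and $c$ are all true: $t$ is then unconstrained, so one primary assignment is counted twice unless the count is projected.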

For each benchmark formula f, we create three model counting problems using automatic tools: 1) f with no symmetry breaking, which we create by setting Alloy's default symmetry breaking to *off* ; 2) f with symmetry breaking predicates added at the problem domain level, which we create by having Alloy's default symmetry breaking turned *on*; and 3) f with symmetry breaking predicates added at the CNF level, which we create by first using Alloy to create a CNF formula with no domain-level symmetry breaking, and then using the BreakID [23] tool to add CNF-level symmetry breaking predicates using its default settings. In addition, for select benchmarks we create formulas with manually added domain-specific symmetry breaking predicates, which we write in Alloy following previous work [38].

The results show that while it is sometimes feasible to compute the model counts up to isomorphism using the full counts computed by the model counters, doing so suffers from poor scalability. The addition of symmetry breaking predicates substantially assists model counters, much as symmetry breaking is well known to assist *SAT solving* in the context of theory finding [46, 61]. Domain-specific predicates are particularly effective, and in many cases can provide *full* symmetry breaking to enable highly efficient model counting up to isomorphism. We were surprised by the extent of the impact. Since the addition of symmetry breaking predicates introduces new dependencies among the variables, we expected these dependencies to make the formula more complex and perhaps less amenable to efficient model counting. However, the sheer reduction in the number of solutions caused by symmetry breaking more than compensates for the additional logical complexity of the formula. In cases where it was possible to create *full* symmetry breaking predicates, the model count for the formula *with* the predicates was computed up to a few orders of magnitude faster than for the formula with *no* symmetry breaking predicates.

A key lesson of our study (in the context of the model counting problems considered) is: *if non-isomorphic solution counts are desired, use full symmetry breaking predicates at the domain-level whenever feasible – even if it is straightforward to compute the number of non-isomorphic solutions from the number of all solutions, or even if the symmetry breaking constraints have to be written manually*. This paper makes the following contributions:


We believe there is an important *bi-directional* relation between symmetry breaking and model counting, whereby: 1) in one direction, model counters can directly support computing the counts of non-isomorphic solutions for the applications that require them; and 2) in the other direction, symmetry breaking helps model counters become more efficient. We hope our study motivates future work that further investigates this relation.

## 2 Examples

This section provides two illustrative examples that require computing the number of unique solutions up to isomorphism. We specify the examples in the Alloy

```
module nqueens -- name of the specification
sig Queen {} -- set of queen atoms
one sig Board { state: Queen -> Int -> Int } -- one board
fact StateOkay {
  all q: Queen | one q.(Board.state) -- each queen occupies exactly one cell
  all x: Queen.(Board.state).Int | ValidIndex[x] -- all x-coordinates are valid
  all y: Int.(Queen.(Board.state)) | ValidIndex[y] -- all y-coordinates are valid
  all disj q, r: Queen | q.(Board.state) != r.(Board.state) } -- queens do not share cells
pred ValidIndex[x: Int] { x.gte[0] and x.lte[(#Queen).minus[1]] } -- x >= 0 && x <= |Queen|-1
fun X[q: Queen]: Int { (q.(Board.state)).Int } -- x-coordinate of q
fun Y[q: Queen]: Int { Int.(q.(Board.state)) } -- y-coordinate of q
fun Abs[x: Int]: Int { x.lt[0] implies negate[x] else x } -- absolute value of x
pred SameRow[q, r: Queen] { X[q] = X[r] } -- q and r are in the same row
pred SameColumn[q, r: Queen] { Y[q] = Y[r] } -- q and r are in the same column
pred SameDiagonal[q, r: Queen] { -- q and r share a diagonal
  Abs[X[q].minus[X[r]]] = Abs[Y[q].minus[Y[r]]] }
pred NQueensProblem { -- no queen attacks another queen
   all disj q, r: Queen | !SameRow[q, r] and !SameColumn[q, r] and !SameDiagonal[q, r] }
```
Fig. 1: Alloy specification of n-Queens.

language, which allows us to explore different approaches for applying symmetry breaking. We provide intuitive descriptions of Alloy constructs as we introduce them; further details can be found elsewhere [34].

The first example illustrates a CSP problem [45] where Alloy's default symmetry breaking provides *full* symmetry breaking; we use ApproxMC to solve this problem (Section 2.1). The second example illustrates a software testing problem [42] where manually written symmetry breaking predicates provide full symmetry breaking; we use ProjMC to solve this problem (Section 2.2). Section 5 presents a detailed experimental evaluation where we use the two tools against many additional benchmarks.

#### 2.1 *n*-Queens

Consider specifying the well-known *n-Queens* problem of placing n interchangeable queens<sup>4</sup> on a fixed <sup>n</sup>×<sup>n</sup> chess-board, and computing the number of solutions to the problem using a modern propositional model counter [16, 40, 50].

Figure 1 shows a fragment of an Alloy specification of the n-Queens problem, which has been studied before using Alloy [2, 4, 55]. The keyword sig introduces a set of (interchangeable) atoms. The keyword one makes the set a singleton. The field state introduces a quaternary relation of type "Board x Queen x Int x Int" where Int is a built-in type that represents integers. The *fact* StateOkay describes the basic constraints for the state of the board to be valid; the fact contains

<sup>4</sup> Here, we only consider symmetries based on permuting the queens (and not other forms, e.g., rotations of the board.)

```
Queen={Queen$0, Queen$1, Queen$2, Queen$3,
     Queen$4, Queen$5, Queen$6, Queen$7}
Board={Board$0}
Board<:state={Board$0->Queen$0->7->5, Board$0->Queen$1->6->0,
           Board$0->Queen$2->5->4, Board$0->Queen$3->4->1,
           Board$0->Queen$4->3->7, Board$0->Queen$5->2->2,
           Board$0->Queen$6->1->6, Board$0->Queen$7->0->3}
```

Fig. 2: A solution to 8-Queens created by the Alloy analyzer, with an illustration of the board.

4 sub-formulas that are implicitly conjoined; each of them uses universal quantification (all); the keyword disj constrains the quantified variables to represent distinct values. The dot operator ('.') is *relational join* [34]. A *predicate* (pred) is a parameterized formula that can be invoked elsewhere; likewise, a fun is a parameterized expression. The predicate NQueensProblem represents the overall specification of the n-Queens constraints. Any model of the Alloy specification must satisfy the constraints in all the facts and any predicates that are invoked (directly or transitively).
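Since the specification relies heavily on the dot operator, here is a minimal sketch of relational join over sets of tuples, under the simplifying assumption that Alloy relations are modelled as Python sets of tuples (this is not Alloy's implementation). It evaluates the expressions used by `X` and `Y` on part of the valuation of Fig. 2.

```python
from typing import Set, Tuple

Relation = Set[Tuple]


def join(r: Relation, s: Relation) -> Relation:
    """Alloy-style r.s: match the last column of r with the first column of s."""
    return {a[:-1] + b[1:] for a in r for b in s if a[-1] == b[0]}


# Part of the valuation from Fig. 2 (atoms as strings, integers as ints).
state = {("Board$0", "Queen$0", 7, 5), ("Board$0", "Queen$1", 6, 0)}
Board = {("Board$0",)}
Int = {(i,) for i in range(-16, 16)}     # 5-bit integers, as in "for 5 int"
q = {("Queen$0",)}

cell = join(q, join(Board, state))       # q.(Board.state) = {(7, 5)}
print(join(cell, Int), join(Int, cell))  # X[q] = {(7,)}, Y[q] = {(5,)}
```

Joining on the right with `Int` drops the last column (yielding the x-coordinate), while joining on the left drops the first (yielding the y-coordinate), which is exactly how `X` and `Y` are defined in Fig. 1.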

The Alloy user writes a *command* and executes it to solve desired constraints. For example, "run NQueensProblem for 5 int, exactly 8 Queen" asks the analyzer to find a solution to the 8-Queens problem. This command creates a constraint solving problem such that the integer bit-width is 5, and there are exactly 8 queens. Figure 2 shows a *valuation* for each set and relation created by the Alloy analyzer to solve this problem, and graphically illustrates the solution.

Next, we illustrate the use of the approximate model counter ApproxMC [16]. For the n-Queens specification, for each 7 ≤ n ≤ 12, we create three constraint solving problems: 1) no symmetry breaking (*no-sb*); 2) BreakID's default CNF-level symmetry breaking [23] (*cnf-sb*); and 3) Alloy's default domain-level symmetry breaking [58] (*dom-sb*). Table 1 shows the number of solutions found and the time taken in each case. The model count with no symmetry breaking is the highest and takes the longest to compute; this approach times out for 8×8 and larger boards. BreakID's default CNF-level symmetry breaking significantly reduces

Table 1: ApproxMC results for n-Queens for 7 ≤ n ≤ 12. Model count ("#") and time taken in seconds ("t[s]") for different problem sizes are shown. Time-out (t.o.) is 5000 sec.


the counts and the time. Alloy's default domain-level symmetry breaking is the most effective and, for this problem, removes all symmetries. Some of the approximate model counts reported by ApproxMC coincidentally match the *exact* counts. We validated the counts using the On-line Encyclopedia of Integer Sequences (OEIS) [6]: the sequence #A000170 gives the number of solutions of the n-Queens problem. The counts computed using Alloy's default symmetry breaking with ApproxMC up to board size 8 × 8 form a subsequence of A000170. For the other board sizes, the table lists the error, which is $\max\left(\frac{\mathit{approx}}{\mathit{exact}}, \frac{\mathit{exact}}{\mathit{approx}}\right) - 1$, based on multiplicative guarantees.
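Both checks above can be reproduced for small boards in a few lines. The backtracking counter below is a standard way to compute the first terms of A000170 (it is not how the counts in Table 1 were obtained), and `multiplicative_error` implements the error formula; both function names are illustrative.

```python
def count_nqueens(n):
    """Number of ways to place n mutually non-attacking, indistinguishable queens
    on an n x n board (OEIS A000170), by row-by-row backtracking."""
    def place(row, cols, diag1, diag2):
        if row == n:
            return 1
        total = 0
        for col in range(n):
            # diag1 holds the "/" diagonals (row + col), diag2 the "\" ones (row - col)
            if col in cols or row + col in diag1 or row - col in diag2:
                continue
            total += place(row + 1, cols | {col}, diag1 | {row + col}, diag2 | {row - col})
        return total
    return place(0, frozenset(), frozenset(), frozenset())

def multiplicative_error(approx, exact):
    """max(approx/exact, exact/approx) - 1, the error measure used with PAC bounds."""
    return max(approx / exact, exact / approx) - 1

print([count_nqueens(n) for n in range(1, 9)])  # [1, 0, 0, 2, 10, 4, 40, 92]
print(multiplicative_error(44, 40))             # approximately 0.1
```

The error is symmetric: overestimating by a factor of 1.1 and underestimating by the same factor give the same value.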

Note that the non-isomorphic solution count can easily be estimated from the full count for this problem: since the queens are interchangeable atoms, each placement is counted once for every permutation of the queens. For example, for the 7×7 board we can estimate it as $208896 / 7! \approx 41.45$, which is quite close to the actual count of 40. While the calculation is simple, the time to compute the full count is much higher (3727.1 seconds instead of 1.14 seconds). Moreover, for larger board sizes, computing the full count times out, so this approach may simply be infeasible for those sizes. This example illustrates a case where symmetry breaking predicates reduce both the model count and the time to compute it by relatively large factors.
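The estimate for the 7×7 board is plain arithmetic: the full count, in which the seven queens are distinguishable atoms, is divided by the 7! permutations of the queens and compared with the exact count of 40 from A000170. A short check:

```python
from math import factorial

full_count = 208896   # ApproxMC count with no symmetry breaking, 7 x 7 board
n = 7
exact = 40            # A000170 term for n = 7

estimate = full_count / factorial(n)             # divide out the 7! queen permutations
error = max(estimate / exact, exact / estimate) - 1

print(round(estimate, 2))  # 41.45
print(round(error, 3))     # 0.036
```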

3-queens. Table 2 shows the results for a variation of the n-Queens problem where the number of queens is fixed to 3 and the board size varies. To specify this variation, we replace the expression "(#Queen).minus[1]" in predicate *ValidIndex* with the value of "k − 1" for the board size k × k, and set the scope for Queen to "exactly 3" in the run command. We validate the ApproxMC counts using the OEIS sequence #A047659 [6]. Once again, BreakID's CNF-level predicates significantly reduce the model count and the time to compute it, and Alloy's domain-level predicates reduce them further. Since the number of queens is fixed to 3, the ratio of the total number of solutions (*no-sb*) to the number of non-isomorphic solutions is 3! = 6. For example, for the 11 × 11 board, the ratio of the ApproxMC counts is exactly 6; however, the time to compute the full count is, as before, much higher (1307.04 seconds instead of 45.1 seconds). This example shows a case where symmetry breaking predicates reduce the model count by a relatively small factor but the time to compute the counts by a much larger factor.
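The 3! ratio can be checked by brute force on small boards. In this illustrative sketch (the function names are hypothetical), counting unordered sets of mutually non-attacking cells gives the count up to permuting the queens, the quantity tracked by A047659, whereas counting ordered tuples corresponds to a specification with distinguishable Queen atoms and no symmetry breaking.

```python
from itertools import combinations, permutations

def attacks(p, q):
    """True iff queens on cells p and q attack each other."""
    (r1, c1), (r2, c2) = p, q
    return r1 == r2 or c1 == c2 or abs(r1 - r2) == abs(c1 - c2)

def count_placements(k, queens=3, labeled=False):
    """Count placements of `queens` mutually non-attacking queens on a k x k board.
    labeled=False counts sets of cells (queens interchangeable);
    labeled=True counts assignments of cells to distinct, named queens."""
    cells = [(r, c) for r in range(k) for c in range(k)]
    choose = permutations if labeled else combinations
    return sum(
        1
        for placement in choose(cells, queens)
        if not any(attacks(p, q) for p, q in combinations(placement, 2))
    )

for k in range(3, 7):
    print(k, count_placements(k), count_placements(k, labeled=True))
```

Since the three queens always occupy distinct cells, every unordered placement corresponds to exactly 3! = 6 ordered ones.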

#### 2.2 Data structure invariants

Next, consider the context of bounded exhaustive testing where the program under test is run against every non-equivalent input within a bound on the

Table 2: ApproxMC results for 3-Queens, where 3 queens are placed on an n × n board for 8 ≤ n ≤ 12.


...

input size, and the inputs are characterized by a logical formula [42]. Assume the goal is to identify a bound that will lead to a feasible number of inputs that can be executed within the testing budget. We use model counting to estimate the number of solutions for different bounds.

Assume the inputs to the program under test are *binary trees*. Figure 3a shows a partial Alloy specification for binary trees. The singleton sig BT represents the tree, which has a root node and an integer size; the keyword lone defines a partial function, so, e.g., the tree root is either exactly one node or none. Each node has an integer key and a left and a right child. The predicate RepOk specifies the constraints for a valid binary tree, which must be acyclic. The predicate Acyclic specifies acyclicity; the operator "^" is transitive closure, "\*" is reflexive transitive closure, "+" is set union, "&" is set intersection, and "~" is transpose.
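Figure 3a is not reproduced here, so the exact Acyclic predicate is not shown. The sketch below checks a common formulation of binary-tree acyclicity that uses the same operators; it is an assumption, not the paper's predicate. No node reachable from the root may reach itself via ^(left + right), every node has at most one parent (a condition on the transpose ~(left + right)), and no node has the same child on both sides. Nodes are plain integers, and `left`/`right` are dictionaries standing in for the Alloy fields.

```python
def reachable(start, left, right):
    """start.^(left + right): nodes reachable from `start` in one or more steps."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in (left.get(node), right.get(node)):
            if child is not None and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def is_acyclic(root, left, right):
    """Assumed formulation of Acyclic over the nodes BT.root.*(left + right)."""
    if root is None:
        return True  # empty tree
    nodes = {root} | reachable(root, left, right)
    for n in nodes:
        if n in reachable(n, left, right):
            return False  # n in n.^(left + right): a cycle through n
        parents = {p for p in nodes if left.get(p) == n or right.get(p) == n}
        if len(parents) > 1:
            return False  # lone n.~(left + right) violated: n is shared
        if left.get(n) is not None and left.get(n) == right.get(n):
            return False  # no n.left & n.right violated
    return True

print(is_acyclic(1, {1: 2}, {1: 3}))  # True
print(is_acyclic(1, {1: 2}, {2: 1}))  # False
```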

Consider the constraint solving problem for size k, in which the binary tree has exactly k nodes and the keys are 1,...,k. Figure 3b illustrates the 5 non-isomorphic trees for size 3.
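The five trees for size 3 are the n = 3 case of the Catalan numbers (OEIS A000108), which count binary tree shapes with n nodes. The small enumerator below is given only as an illustration. It builds every shape recursively, representing a node as a nested tuple (left, right) and an empty subtree as None, and reproduces the count.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def shapes(n):
    """All binary tree shapes with n nodes; a node is (left_subtree, right_subtree)."""
    if n == 0:
        return (None,)
    result = []
    for k in range(n):  # k nodes go to the left subtree, n - 1 - k to the right
        for l in shapes(k):
            for r in shapes(n - 1 - k):
                result.append((l, r))
    return tuple(result)

print([len(shapes(n)) for n in range(9)])  # [1, 1, 2, 5, 14, 42, 132, 429, 1430]
```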

To show that the impact of symmetry is not limited to approximate counting, we perform this case study with the exact model counter ProjMC [40]. Table 3 shows the model counts for different sizes. As before, CNF-level symmetry breaking reduces the model count, which is further reduced by Alloy's



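```
// Assumes the module opens util/ordering[Node] (Node being the node
// signature of Fig. 3a), which provides first, next, and max.
fact SymmetryBreaking { // pre-order
  BT.root in first[]    // the root is the first node of the order
  all n: BT.root.*(left + right) {
    // a left child is the immediate successor of its parent
    some n.left implies n.left in next[n]
    // without a left child, a right child (if any) is the successor
    no n.left implies n.right in next[n]
    // with both children, the right child follows the largest node
    // of the left subtree (its last node in pre-order)
    some n.right and some n.left implies
      n.right in next[max[n.left.*(left + right)]] }}
```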
Fig. 4: *Full* symmetry breaking predicates in Alloy [38].

default symmetry breaking. However, unlike before, CNF-level symmetry breaking sometimes makes the model counter (ProjMC in this case) slower. Moreover, Alloy's default symmetry breaking does not break all symmetries. For this example, the remaining symmetries can be broken using *manually written* predicates. Binary trees belong to a restricted class of data structures for which *full* symmetry breaking can be achieved by writing predicates in Alloy so that only the *canonical* solution from each isomorphism class is allowed [38]. Figure 4 shows a fact that embodies this approach. Intuitively, the fact requires that a *pre-order* traversal starting at the root visits the nodes in the same order as a *pre-defined* linear ordering of the nodes; the ordering *module* in Alloy allows defining a linear order. The manually written predicates provide the most efficient counting. In this example, the count up to isomorphism can, once again, be computed from the full count, but at a much higher computational cost. For example, for 8 nodes, the full count is 57657600, which divided by 8! = 40320 is 1430, i.e., the count up to isomorphism; however, ProjMC takes 3673 seconds to compute the full count, whereas with the manual symmetry breaking predicates added it takes 0.34 seconds. The number of binary trees with n nodes is the OEIS sequence #A000108, which allows us to validate that the manually written predicates indeed break all symmetries.
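The pre-order idea behind the SymmetryBreaking fact can also be tested outside Alloy on small trees. In this sketch, which is an illustration rather than the paper's encoding, node atoms are the integers 0, ..., n − 1. Their natural order plays the role of the ordering module (first = 0, next[n] = n + 1), and keys are ignored. The sketch enumerates every labeled tree (each shape combined with each assignment of atoms) and counts those allowed by a direct transcription of the three constraints of Fig. 4. Each shape should admit exactly one allowed labeling, which is consistent with dividing the full count by n!.

```python
from itertools import permutations
from math import factorial

def shapes(n):
    """All binary tree shapes with n nodes as nested (left, right) tuples; None is empty."""
    if n == 0:
        return [None]
    return [(l, r) for k in range(n) for l in shapes(k) for r in shapes(n - 1 - k)]

def attach(shape, ids, left, right):
    """Give each node of `shape` the next atom from iterator `ids` (root first),
    record the child links in the dicts left/right, and return the root atom."""
    if shape is None:
        return None
    node = next(ids)
    l = attach(shape[0], ids, left, right)
    r = attach(shape[1], ids, left, right)
    if l is not None:
        left[node] = l
    if r is not None:
        right[node] = r
    return node

def subtree(node, left, right):
    """node.*(left + right): the node together with all of its descendants."""
    result, stack = set(), [node]
    while stack:
        m = stack.pop()
        result.add(m)
        stack.extend(c for c in (left.get(m), right.get(m)) if c is not None)
    return result

def satisfies_fig4(root, left, right):
    """The SymmetryBreaking fact of Fig. 4 with first = 0 and next[n] = n + 1."""
    if root != 0:
        return False
    for n in subtree(root, left, right):
        l, r = left.get(n), right.get(n)
        if l is not None and l != n + 1:
            return False
        if l is None and r is not None and r != n + 1:
            return False
        if l is not None and r is not None and r != max(subtree(l, left, right)) + 1:
            return False
    return True

def count_labelings(n):
    """(all labeled trees, labeled trees allowed by the fact) with n nodes."""
    total = allowed = 0
    for shape in shapes(n):
        for perm in permutations(range(n)):
            left, right = {}, {}
            root = attach(shape, iter(perm), left, right)
            total += 1
            allowed += satisfies_fig4(root, left, right)
    return total, allowed

for n in range(1, 6):
    print(n, *count_labelings(n))
print(57657600 // factorial(8))  # 1430
```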

## 3 Background: Model counting

This section gives the relevant background on model counting, with a focus on projected and approximate model counting.

Let ϕ be a Boolean formula in conjunctive normal form (CNF) over the variable set X. An assignment σ of truth values to the variables in ϕ is called a solution of ϕ if it makes ϕ evaluate to true. We denote the set of all solutions (witnesses) of ϕ by $R_\phi$. Given a set of variables S ⊆ X and an assignment σ, we use σ↓S to denote the projection of σ on S. Similarly, $R_{\phi\downarrow S}$ denotes the projection of $R_\phi$ on S.

The *projected model counting problem* is to compute $|R_{\phi\downarrow S}|$ for a given CNF formula ϕ and sampling set S ⊆ X. When S = X, the problem is referred to as model counting. A *probably approximately correct* (or PAC) counter is a probabilistic algorithm ApproxCount(·, ·, ·, ·) that takes as inputs a formula ϕ, a sampling set S, a tolerance ε > 0, and a confidence 1 − δ ∈ (0, 1], and returns a count c such that $\Pr\left[\frac{|R_{\phi\downarrow S}|}{1+\varepsilon} \le c \le (1+\varepsilon)\,|R_{\phi\downarrow S}|\right] \ge 1-\delta$. For clarity, we omit mention of S unless needed for a given context.
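For very small formulas, these definitions can be evaluated by brute force, which is handy for sanity-checking a counter's output. The sketch below assumes DIMACS-style clauses (lists of non-zero integers, with negative literals for negated variables). `projected_count` and `within_pac_bounds` are illustrative names and are not part of any tool.

```python
from itertools import product

def solutions(clauses, num_vars):
    """R_phi: all satisfying assignments, as tuples of booleans indexed by var - 1."""
    for sigma in product([False, True], repeat=num_vars):
        if all(any(sigma[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses):
            yield sigma

def projected_count(clauses, num_vars, sampling_set):
    """|R_phi projected on S|: number of distinct restrictions of solutions to S."""
    s = sorted(sampling_set)
    return len({tuple(sigma[v - 1] for v in s) for sigma in solutions(clauses, num_vars)})

def within_pac_bounds(c, exact, eps):
    """The event inside Pr[.] of the PAC definition: exact/(1+eps) <= c <= (1+eps)*exact."""
    return exact / (1 + eps) <= c <= (1 + eps) * exact

# phi = (x1 or x2) and (not x1 or x3), over the variables x1, x2, x3
phi = [[1, 2], [-1, 3]]
print(projected_count(phi, 3, {1, 2, 3}))  # 4
print(projected_count(phi, 3, {2, 3}))     # 3
```

Projection can only merge solutions, never create them, so the projected count is at most the full count.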

Projected model counting is a fundamental problem in computer science with applications ranging from the reliability of networks to information leakage. Valiant initiated complexity-theoretic studies of model counting and showed that model counting is #P-hard [63]. The earliest practical approaches to model counting, such as Relsat [12], were based on extending DPLL approaches. The advent of CDCL solvers led to the paradigm of combining conflict-driven search with component caching, leading to the development of solvers such as Cachet [49] and sharpSAT [56]. Furthermore, Darwiche and Marquis [22] pioneered a knowledge-compilation-based approach, relying on the static partitioning of the solution space, which led to the development of c2d. Recent years have witnessed the combination of CDCL and static approaches in solvers such as D4 and DSharp. Recently, Lagniez and Marquis proposed a recursive algorithm, called ProjMC [40], that exploits the disjunctive decomposition technique pioneered in earlier works to perform projected model counting. Concurrently, another approach to projected model counting, called Ganak [50], has been developed that provides *probabilistic exact* bounds via the use of universal hash functions. In this work, we focus on ProjMC due to its ability to provide exact counts and its demonstrated scalability in comparison to other approaches.

The theoretical studies of approximation led to the introduction of PAC-style, also referred to as (ε, δ), guarantees, wherein the underlying algorithm returns an estimate within a (1 + ε) factor of the exact count with confidence at least 1 − δ. Stockmeyer [54] demonstrated that PAC guarantees can be achieved by a probabilistic polynomial-time Turing machine with access to an NP oracle. The practical exploration of Stockmeyer's approach was pursued by Gomes et al. with the development of MBound [31] and SampleCount [30]. Chakraborty, Meel, and Vardi proposed a scalable approximate counter, called ApproxMC, with formal (ε, δ) guarantees, which seeks to combine the advances in SAT solving with the design of efficient universal hash functions.

ApproxMC is now in its third generation, called ApproxMC3. The central idea behind ApproxMC is to employ universal hash functions, represented by randomly chosen XOR constraints, to partition the solution space into small cells of roughly equal size, where every cell is defined by the original constraints augmented with the randomly chosen XOR constraints. ApproxMC invokes CryptoMiniSat [53], a solver designed specifically for the combination of CNF and XOR constraints, to enumerate the solutions in a randomly chosen *small* cell. The second generation, ApproxMC2, significantly reduces the number of SAT calls, from linear in |S| to logarithmic in |S|, by exploiting dependence among different SAT calls. Soos and Meel proposed ApproxMC3 by augmenting ApproxMC2 with a new architecture for handling CNF+XOR formulas [52].
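To make the hashing idea concrete, the toy sketch below replaces the SAT solver with explicit enumeration of an already known solution set. It adds random XOR constraints one at a time until the surviving cell holds at most `threshold` solutions, multiplies the cell size by 2^m for m added XORs, and takes the median over several trials. This is only an illustration of the principle under these simplifications. It is not ApproxMC's algorithm, and `threshold`, `trials`, and `seed` are arbitrary choices, not the values ApproxMC derives from ε and δ.

```python
import random
from itertools import product
from statistics import median

def random_xor(num_vars, rng):
    """A random XOR constraint: a random subset of variables and a random parity bit."""
    return [v for v in range(num_vars) if rng.random() < 0.5], rng.random() < 0.5

def satisfies_xor(sigma, xor):
    variables, parity = xor
    return (sum(sigma[v] for v in variables) % 2 == 1) == parity

def hashed_estimate(solutions, num_vars, threshold, rng):
    """One trial: add XORs until the cell has at most `threshold` solutions."""
    cell, m = list(solutions), 0
    while len(cell) > threshold:
        xor = random_xor(num_vars, rng)
        cell = [sigma for sigma in cell if satisfies_xor(sigma, xor)]
        m += 1
    return len(cell) * 2 ** m

def toy_count(solutions, num_vars, threshold=40, trials=9, seed=0):
    """Median of several hashing-based estimates (toy version, no SAT solver)."""
    rng = random.Random(seed)
    return median(hashed_estimate(solutions, num_vars, threshold, rng) for _ in range(trials))

space = list(product([False, True], repeat=10))  # all 1024 assignments of 10 variables
print(toy_count(space, 10))
```

Each XOR over a non-empty set of variables keeps roughly half of the solutions, so about log2(N / threshold) XORs are needed for N solutions.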

## 4 Study methodology

This section describes the overall design of our study, including the model counting tools, the generation of constraint solving problems, and the measurements for evaluation.

## 4.1 Tools

For approximate model counting, we use ApproxMCv3 (https://github.com/meelgroup/ApproxMC), which is the latest public release of ApproxMC [52]. For each model counting problem, we list the primary variables in the input CNF file as a comment, as required by ApproxMC. For exact model counting, we use the latest public release of ProjMC [40] (http://www.cril.univ-artois.fr/kc/projmc.html). For each model counting problem, we list the primary variables in a separate file, as required by ProjMC.
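As an illustration of the first convention, the sketch below writes a CNF in DIMACS format with the primary (sampling) variables listed in comment lines. It assumes the `c ind v1 v2 ... 0` comment convention described in the ApproxMC repository. The helper `to_dimacs` and its `per_line` parameter are hypothetical, and the separate-file format expected by ProjMC is not shown because it is not specified here.

```python
def to_dimacs(clauses, num_vars, primary_vars, per_line=10):
    """Serialize a CNF (lists of non-zero ints) as DIMACS, listing the primary
    variables in 'c ind ... 0' comment lines (assumed ApproxMC convention)."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    vs = sorted(primary_vars)
    for i in range(0, len(vs), per_line):
        chunk = vs[i:i + per_line]
        lines.append("c ind " + " ".join(map(str, chunk)) + " 0")
    lines += [" ".join(map(str, clause)) + " 0" for clause in clauses]
    return "\n".join(lines) + "\n"

print(to_dimacs([[1, 2], [-1, 3]], 3, {1, 2}), end="")
# p cnf 3 2
# c ind 1 2 0
# 1 2 0
# -1 3 0
```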

## 4.2 Benchmarks

Base formulas. We use four sources of base formulas.

(1) *Alloy specs*. We consider all Alloy specifications in the standard distribution [1]; each command in an Alloy spec defines a constraint solving problem and provides a scope; we use the given scope. We remove unsatisfiable problems since their model count is 0 (regardless of symmetry breaking), and our focus in this study is on satisfiable problems. We also remove all "easy" cases that complete within 1 second for both tools and all symmetry settings. This creates a set of 47 base problems derived from Alloy specifications.

(2) *Kodkod problems*. We consider all Kodkod programs in the standard distribution [5]. Once again, we remove the unsatisfiable problems and "easy" cases. In addition, we remove problems that do not admit symmetry breaking, i.e., where Kodkod does not add any symmetry breaking by default (e.g., when there is a given partial solution, which prevents Kodkod's *greedy base partitioning* [57] from having an effect). Some of the Kodkod programs are parameterized over integer bounds and input files. We manually create those inputs in the appropriate format. This gives us a total of 13 base problems derived from Kodkod programs.

(3) *n-Queens*. We use two common variations of the *n*-Queens problem: 1) k queens are placed on a k × k board (1 ≤ k ≤ 12); 2) 3 queens are placed on a k × k board (1 ≤ k ≤ 12). This gives us a total of 24 base problems derived from the *n*-Queens problem<sup>5</sup>.

(4) *Complex data structures*. We use 6 complex data structures: (1) singly-linked lists; (2) sorted lists; (3) doubly-linked lists; (4) binary trees; (5) binary search trees; and (6) red-black trees. For each structure, we bound the number of nodes to be between 6 and 9 (inclusive). This gives us a total of 24 base problems based on structural invariants.

Model counting benchmarks. For each base formula f, we create 3 model counting problems using automatic tools: 1) f with no symmetry breaking, which we create by setting Alloy's default symmetry breaking to *off* ; 2) f with symmetry breaking predicates added at the CNF level, which we create by first using Alloy to create a CNF formula with no domain-level symmetry breaking, and then using the BreakID [23] tool to add CNF-level symmetry breaking predicates using the same arguments as in the SATRACE'15 competition [3]; and 3) f with symmetry breaking predicates added at the problem domain level, which we create by having Alloy's default symmetry breaking turned *on*. Moreover, for data

<sup>5</sup> Unfortunately, we were not able to get results for the majority of the n-Queens benchmarks with ProjMC due to an unknown issue with the tool, so we do not use the *n*-Queens benchmarks for experiments with ProjMC; we have asked the ProjMC team to look into the issue.

structures, we create formulas with manually added domain-specific symmetry breaking predicates, which we write in Alloy following previous work [38]. This gives us a total of 3 × (47 + 13 + 24 + 24) + 24 = 348 model counting problems.

Table 4 shows some characteristics of the benchmarks, specifically the minimum and maximum numbers of primary variables, and all variables and clauses under the different symmetry breaking settings.

## 4.3 Metrics

We use two key metrics, the model counts and the time to compute them, and measure them under different symmetry breaking settings. For model counts, we report the tool output and the ratio of the count under one setting to the count under another setting. For time, we report the actual wall-clock times and the ratio of the time taken under one setting to the time taken under another setting. In line with prior work [17], we report the error of approximate model counting, which is $\max\left(\frac{\mathit{approx}}{\mathit{exact}}, \frac{\mathit{exact}}{\mathit{approx}}\right) - 1$, based on multiplicative guarantees.

## 5 Experimental evaluation

This section reports the results of the experimental evaluation. Section 5.1 describes the results for ApproxMC. Section 5.2 describes the results for ProjMC.

## 5.1 Symmetry breaking and approximate model counting

Time. Figures 5a, 5c, and 5e illustrate the time performance of ApproxMC on the benchmarks based on Alloy, Kodkod, and data structure invariants, respectively. With no symmetry breaking, ApproxMC times out on 21 (of 47) Alloy benchmarks, 6 (of 13) Kodkod benchmarks, and 10 (of 24) data structure benchmarks. In all but 16 cases, formulas with Alloy's default symmetry breaking take less time than with CNF-level symmetry breaking. In all but 10 cases, formulas with CNF-level symmetry breaking take less time than with no symmetry breaking. Moreover, for data structure benchmarks, in all but one case, formulas with manual symmetry breaking take less time than with Alloy's default symmetry breaking. Among all the problems that time out with no symmetry breaking, the smallest time taken by the corresponding problem with Alloy's default symmetry breaking was 0.14 seconds, and the smallest time taken by


Table 4: Benchmark characteristics.

(e) Time: ApproxMC – Data structures; (f) Time: ProjMC – Data structures

Fig. 5: Time results. The x-axis has the benchmark model counting problems; the y-axis has time in seconds (log scale). Benchmarks on the x-axis are sorted in ascending order of the number of primary variables; moreover, the data structure benchmarks are grouped by the type of the structure. Blue diamond is no symmetry breaking (*no-sb*); red triangle is CNF-level symmetry breaking (*cnf-sb*); green square is Alloy's default symmetry breaking (*dom-sb*); and orange cross is manual symmetry breaking (*man-sb*).

the corresponding problem with manual symmetry breaking was 0.008 seconds. For the Alloy benchmarks, ApproxMC does not time out under any symmetry breaking setting for benchmarks that have up to 90 primary variables. The time results for the n-Queens benchmarks were presented in Section 2.1.

Model counts. Figure 6a graphically illustrates how the model counts vary under different symmetry breaking settings. For the Alloy and Kodkod benchmarks, in all but 10 cases the model count for the formula with Alloy's default symmetry breaking is less than the corresponding count with CNF-level symmetry breaking. For the data structures, the model count for the formula with Alloy's symmetry breaking is less than the corresponding count with CNF-level symmetry breaking in all cases; moreover, in all but 5 cases, manual symmetry breaking gives the lowest count (the 5 exceptions are due to approximation in computing the model counts). Among all problems where ApproxMC reports a count with no symmetry breaking, the largest ratio of the count with no symmetry breaking to the count with Alloy's default symmetry breaking was 61167, and the largest ratio of the count with no symmetry breaking to the count with manual symmetry breaking was 45056. The model count results for the n-Queens benchmarks were presented in Section 2.1.

Fig. 6: Model count results. The x-axis has the benchmark model counting problems; the y-axis (log scale) has the count ratio n/c, where n is the model count for the formula with no symmetry breaking and c is the corresponding count with CNF-level symmetry breaking (green square), Alloy's default symmetry breaking (blue diamond), and manual symmetry breaking (red triangle, only for data structures). Only cases where the calculation of n did not time out are shown.

(b) Model count: ProjMC

Error. For the Alloy, Kodkod, and data structure benchmarks, we compute the error in ApproxMC with respect to the counts reported by ProjMC for the cases where ProjMC reported a count. The error ranges were: [0, 0.168] for the Alloy benchmarks, [0, 0.168] for the Kodkod benchmarks, and [0, 0.165] for the data structure benchmarks. Section 2.1 presented the error results for the n-Queens benchmarks with respect to the number in OEIS [6].

## 5.2 Symmetry breaking and exact model counting

Time. Figures 5b, 5d, and 5f illustrate the time performance of ProjMC on the benchmarks based on Alloy, Kodkod, and data structure invariants, respectively. With no symmetry breaking, ProjMC times out on 21 (of 47) Alloy benchmarks (the same number as for ApproxMC, although the two sets of benchmarks are not the same), 9 (of 13) Kodkod benchmarks (more than for ApproxMC), and 9 (of 24) data structure benchmarks (fewer than for ApproxMC). In all but 8 cases, formulas with Alloy's default symmetry breaking take less time than with CNF-level symmetry breaking. In all but 24 cases, formulas with CNF-level symmetry breaking take less time than with no symmetry breaking. Moreover, for data structure benchmarks, in all but 2 cases, formulas with manual symmetry breaking take less time than with Alloy's default symmetry breaking. Among all the problems that time out with no symmetry breaking, the smallest time taken by the corresponding problem with Alloy's default symmetry breaking was 3.12 seconds, and the smallest time taken by the corresponding problem with manual symmetry breaking was 0.01 seconds.

Model counts. Figure 6b graphically illustrates how the model counts vary under different symmetry breaking settings. For the Alloy and Kodkod benchmarks, in all but 9 cases the model count for the formula with Alloy's default symmetry breaking is less than the corresponding count with CNF-level symmetry breaking. For the data structures, the model count for the formula with Alloy's symmetry breaking is less than the corresponding count with CNF-level symmetry breaking in all cases; moreover, in all cases, manual symmetry breaking gives the lowest count. Among all problems where ProjMC reports a count with no symmetry breaking, the largest ratio of the count with no symmetry breaking to the count with Alloy's default symmetry breaking was 40320, and the largest ratio of the count with no symmetry breaking to the count with manual symmetry breaking was 362880.

Overall, the impact of symmetry breaking is significant for both ApproxMC and ProjMC. In the majority of cases, Alloy's default symmetry breaking is more effective than CNF-level symmetry breaking using BreakID. For data structure benchmarks, manual symmetry breaking is the most effective and reports exactly the counts of the non-isomorphic solutions, as desired; moreover, in cases where Alloy's default symmetry breaking provides *full* symmetry breaking, manual symmetry breaking provides much faster solving.

## 5.3 Discussion

The empirical evaluation in the preceding subsections clearly demonstrates the significant impact of symmetry breaking on ApproxMC and ProjMC. While a detailed study to explain the observed behavior is beyond the scope of this work, we offer some explanations. As pointed out by Soos and Meel [52], over 99% of the runtime of ApproxMC is consumed by the underlying SAT solver handling CNF-XOR formulas. For satisfiable instances, adding symmetry breaking predicates typically incurs only a small runtime overhead for satisfiability queries. As discussed above, the use of symmetry breaking predicates significantly reduces the number of solutions and thereby significantly reduces the number of XORs that ApproxMC has to add, since the number of XORs to be added grows logarithmically with the number of solutions of a formula. The performance of SAT solvers has been observed to be sensitive to the number of XORs [24]; therefore, we believe that the reduction in the required number of XORs is the primary reason behind the performance improvements in the context of ApproxMC.
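The logarithmic relationship can be made concrete with a back-of-the-envelope calculation. The sketch assumes that each XOR constraint halves the number of solutions and that the search stops once a cell holds at most `threshold` solutions, which simplifies ApproxMC's actual search. Under these assumptions, the number of XORs needed is the smallest m with N / 2^m ≤ threshold, so shrinking the solution count by a factor r saves about log2 r XORs. The threshold value and the two counts below are hypothetical.

```python
def xors_needed(num_solutions, threshold):
    """Smallest m >= 0 with num_solutions / 2**m <= threshold (exact integer arithmetic)."""
    m = 0
    while num_solutions > threshold * 2 ** m:
        m += 1
    return m

threshold = 72  # hypothetical cell-size threshold
# hypothetical counts before and after a 40320-fold (8!) reduction by symmetry breaking
for count in (40320 * 10**6, 10**6):
    print(count, xors_needed(count, threshold))
```

With these numbers the reduction saves 16 XORs, in line with log2(40320) ≈ 15.3.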

The performance improvement of ProjMC is, however, more surprising, since a reduction in the number of solutions does not necessarily lead to a reduction in the size of the corresponding d-DNNF (decision-Deterministic Decomposable Negation Normal Form), which represents the trace of the execution of ProjMC [33]. Furthermore, given the lack of a noticeable difference in runtime performance improvement via off-the-shelf symmetry breaking tools, it would be an interesting direction for future work to understand the difference in the traces between the formulas generated via Alloy's default symmetry breaking and CNF-level symmetry breaking.

## 6 Conclusions

This paper presented, to the best of our knowledge, the first study of symmetry breaking and model counting. A goal of the study was to determine the best way to add symmetry breaking predicates (if at all) to obtain precise counts of non-isomorphic solutions. We studied two model counters from two different classes and four scenarios of applying symmetry breaking. A key lesson of our study is that domain-specific symmetry breaking predicates are most effective at enabling precise computation of model counts up to isomorphism. We believe the results of our study can provide insights into more effective use of cutting-edge model counters in important domains where the number of unique solutions up to isomorphism is desired, and can also enable the development of novel model counting methods that exploit symmetries.

## Acknowledgments

This work was supported in part by the U.S. National Science Foundation Grant CCF-1718903, and the National Research Foundation Singapore under its AI Singapore Programme [AISG-RP-2018-005].

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **MUST: Minimal Unsatisfiable Subsets Enumeration Tool**<sup>⋆</sup>

Jaroslav Bendík and Ivana Černá

Faculty of Informatics, Masaryk University, Brno, Czech Republic {xbendik,cerna}@fi.muni.cz

**Abstract.** In many areas of computer science, we are given an unsatisfiable set of constraints with the goal of providing insight into the unsatisfiability. One common approach is to identify minimal unsatisfiable subsets (MUSes) of the constraint set. The more MUSes are identified, the better the insight obtained. However, since there can be up to exponentially many MUSes, their complete enumeration might be intractable. Therefore, we focus on algorithms that enumerate MUSes *online*, i.e. one by one, and thus can find at least some MUSes even in the intractable cases.

Since MUSes find applications in different constraint domains and new applications still arise, several *domain agnostic* algorithms have been proposed. Such algorithms can be applied in any constraint domain and thus theoretically serve as ready-to-use solutions for all the emerging applications. However, there are almost no domain agnostic tools, i.e. tools that both implement domain agnostic algorithms and can be easily extended to support any constraint domain. In this work, we close this gap by introducing a domain agnostic tool called MUST. Our tool outperforms other existing domain agnostic tools and, moreover, it is even competitive with fully domain specific solutions.

**Keywords:** Minimal unsatisfiable subsets · Unsatisfiability analysis · Infeasibility analysis · MUS · Diagnosis.

## **1 Introduction**

In various areas of computer science, we are given a set C of constraints with the goal of determining whether the set is satisfiable, i.e. whether all the constraints can hold simultaneously. In the case where the set is shown to be unsatisfiable, we are often interested in analysing the unsatisfiability. Identification of minimal unsatisfiable subsets (MUSes) of C is one such kind of analysis. A set M ⊆ C is a MUS of C iff M is unsatisfiable and all proper subsets of M are satisfiable. The more MUSes are identified, the better the insight into the unsatisfiability of C. However, complete MUS enumeration is often intractable since

<sup>⋆</sup> This research was supported by ERDF "CyberSecurity, CyberCrime and Critical Information Infrastructures Center of Excellence" (No. CZ.02.1.01/0.0/0.0/16_019/0000822).

there can be up to exponentially many MUSes w.r.t. the number of constraints in C. Therefore, several *online* MUS enumeration algorithms (e.g. [3,29,22,1,25,10]) were proposed, i.e. algorithms that identify MUSes gradually, one by one, and thus identify at least some MUSes even in the intractable cases.

Various applications of MUSes arise, for example, in requirements analysis [4,6], during formal equivalence checking [15], in proof based abstraction refinement [23], Boolean function bi-decomposition [12], circuit error diagnosis [21], type debugging in Haskell [30], and proof explanation in symbolic model checking [20]. The domains of the constraint sets range from Boolean formulas [23,14], through temporal logic formulas [4,6], to transition state predicates constraining transition systems [20]. Since the list of constraint domains where MUSes find an application is quite long and new applications still arise, several *domain agnostic* MUS enumeration algorithms have been proposed (e.g. [3,22,9,7,10]). Such algorithms can be used in an arbitrary constraint domain, and thus theoretically serve as ready-to-use solutions for any constraint domain where MUSes might eventually find an application.

Unfortunately, there is no available *domain agnostic tool implementation* of these algorithms that would actually serve as a ready-to-use solution for an arbitrary constraint domain. Although the papers that present existing domain agnostic algorithms provide results of an experimental evaluation, it is often the case that the implementation is either not publicly available [4,3] or has hard-coded support for a particular constraint domain [10,20]. The closest to a domain agnostic tool is the tool by Liffiton et al. [22], in which the authors implement their domain agnostic MUS enumeration algorithm MARCO. Their tool currently supports the SAT and SMT domains and can be relatively easily extended to support other constraint domains. However, our recent evaluation [8] of contemporary domain agnostic algorithms in various constraint domains has shown that the efficiency of the algorithms (including MARCO) varies a lot across constraint domains. There is no silver bullet algorithm that would be efficient in all the domains. Thus, to deal with a particular constraint domain, one has to choose wisely among the individual algorithms.

In this work, we present the first stable release of our domain agnostic tool for MUS enumeration, called MUST. The tool implements several domain agnostic MUS enumeration algorithms and currently supports three constraint domains: SAT, SMT, and LTL. Moreover, due to its modular architecture, the tool can be easily extended to support another constraint domain: this only requires implementing an API for communication with a satisfiability solver for that domain. We also provide guidance on which algorithms are suitable for which kinds of input constraint systems.
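The modular design described above can be sketched as follows. This is not MUST's actual API, which is not shown here. It only illustrates the idea, under the assumption that a constraint domain contributes a single satisfiability check over subsets of C and that domain agnostic routines are written on top of that check. The class and method names are hypothetical. The `shrink` routine is the textbook deletion-based approach, not necessarily one of the algorithms implemented in MUST.

```python
from abc import ABC, abstractmethod
from itertools import product

class ConstraintDomain(ABC):
    """Hypothetical plug-in interface: the only domain-specific operation."""

    @abstractmethod
    def is_sat(self, subset):
        """Return True iff the given subset of constraint indices is satisfiable."""

class BruteForceBoolean(ConstraintDomain):
    """Toy Boolean domain: each constraint is a predicate over a dict of variables."""

    def __init__(self, variables, constraints):
        self.variables, self.constraints = variables, constraints

    def is_sat(self, subset):
        for values in product([False, True], repeat=len(self.variables)):
            env = dict(zip(self.variables, values))
            if all(self.constraints[i](env) for i in subset):
                return True
        return False

def shrink(domain, unsat_subset):
    """Deletion-based shrinking, using only domain.is_sat: drop every constraint whose
    removal keeps the set unsatisfiable. By monotonicity, the result is a MUS."""
    current = set(unsat_subset)
    for c in sorted(unsat_subset):
        if not domain.is_sat(current - {c}):
            current.discard(c)  # c is not needed for unsatisfiability
    return current

# Constraints a, not a, b, (not a or not b), indexed 0..3
cs = [lambda e: e["a"], lambda e: not e["a"], lambda e: e["b"],
      lambda e: not e["a"] or not e["b"]]
domain = BruteForceBoolean(["a", "b"], cs)
print(sorted(shrink(domain, {0, 1, 2, 3})))  # [0, 2, 3]
```

Supporting another domain in this sketch would mean writing another `ConstraintDomain` subclass. `shrink` itself stays unchanged.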

To demonstrate the efficiency of our tool, we experimentally compare it to the tool by Liffiton et al. [22] in the SAT and SMT domains, and we show that our tool clearly wins in both domains. Moreover, we also provide a comparison with two contemporary tools that are tailored to the SAT domain: MCSMUS [1] and FLINT [25]. The results show that MUST is competitive with the two domain specific solutions. Moreover, on many benchmarks, MUST actually significantly dominates the other tools.

Finally, to advocate the practical applicability of our tool in industrial settings, we provide a use case from the area of requirements analysis. In particular, we have employed our tool in the European Union's Horizon 2020 project called AMASS. The project focused on the development and verification of cyber-physical systems in the largest industrial markets, including automotive, railway, aerospace, space, and energy. One of the verification tasks is to verify that the requirements on the system are consistent, i.e., to ensure that a system satisfying the requirements can be built at all. If the requirements are found to be inconsistent, identifying minimal inconsistent (unsatisfiable) subsets of the requirements helps to fix the conflicts among them. Our tool has proved to be very efficient in dealing with this task.

## **2 Preliminaries**

#### **2.1 Basic Definitions**

We are given a set C = {c1, c2,...,cn} of constraints such that each subset of C is either *satisfiable* or *unsatisfiable*. The notion of satisfiability varies in particular constraint domains. We only assume that if a set N, N ⊆ C, is satisfiable then all subsets of N are also satisfiable. Dually, if a set K, K ⊆ C, is unsatisfiable then all supersets of K are also unsatisfiable. We will use C to denote the input set of constraints throughout the whole paper.

**Definition 1 (MUS).** *A subset* N *of* C *is a minimal unsatisfiable subset (MUS) of* C *if and only if* N *is unsatisfiable and for all* c ∈ N *the set* N\{c} *is satisfiable.*

Note that the minimality concept used here is set minimality, not minimum cardinality. Therefore, there can be MUSes of different cardinalities. Also, there can be up to exponentially many MUSes w.r.t. the number of constraints in C (see Sperner's theorem [28]).
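Because no MUS can be a proper subset of another MUS, the MUSes of C form an antichain with respect to ⊆, so Sperner's theorem yields the following worst-case bound for n = |C| (a standard consequence of the theorem, spelled out here only for illustration):

$$
\bigl|\{\, M \subseteq C \mid M \text{ is a MUS of } C \,\}\bigr| \;\le\; \binom{n}{\lfloor n/2 \rfloor} \;\sim\; \frac{2^{n}}{\sqrt{\pi n / 2}}
$$

For the four constraints of Example 1, the bound evaluates to 6, while that system has only two MUSes.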

**Definition 2 (critical constraint).** *Let* U *be an unsatisfiable subset of* C *and* c ∈ U*. The constraint* c *is* critical *for* U *if and only if* U \ {c} *is satisfiable.*

Note that U is a MUS of C if and only if all constraints in U are critical for U. Furthermore, if c is critical for U then c has to be contained in every MUS of U.

*Example 1.* We illustrate the concepts on a small example. Assume that we are given a set C of four Boolean satisfiability constraints: c1 = a, c2 = ¬a, c3 = b, and c4 = ¬a ∨ ¬b. Clearly, the whole set is unsatisfiable, as the first two constraints are negations of each other. There are two MUSes: {c1, c2} and {c1, c3, c4}. As for the critical constraints, we can, for example, see that c1 is the only critical constraint for C, and that c1 and c2 are critical for {c1, c2, c3}.
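To make Example 1 concrete, the following Python sketch checks its claims by brute force: it encodes the four constraints as clauses, tests every subset for satisfiability by trying all assignments of a and b, and applies Definitions 1 and 2 literally. The encoding and all names (`CONSTRAINTS`, `is_sat`, `critical`, `all_muses`) are illustrative assumptions, not part of MUST, and the exhaustive search is only feasible for toy inputs.

```python
from itertools import combinations, product

# Example 1 in CNF over the Boolean variables a and b.
# A literal is (variable, polarity); each constraint c_i is a single clause.
CONSTRAINTS = {
    1: (("a", True),),                # c1 = a
    2: (("a", False),),               # c2 = ¬a
    3: (("b", True),),                # c3 = b
    4: (("a", False), ("b", False)),  # c4 = ¬a ∨ ¬b
}
VARIABLES = ("a", "b")


def is_sat(subset):
    """Brute-force satisfiability check: try every assignment of a and b."""
    for values in product((False, True), repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        if all(any(assignment[var] == pol for var, pol in CONSTRAINTS[i])
               for i in subset):
            return True
    return False


def critical(u):
    """Definition 2: constraints c of u such that u - {c} is satisfiable."""
    return {c for c in u if is_sat(u - {c})}


def is_mus(subset):
    """Definition 1: unsatisfiable, and every constraint is critical."""
    return not is_sat(subset) and critical(subset) == set(subset)


def all_muses():
    """Enumerate MUSes by testing every subset (exponential; toy sizes only)."""
    ids = sorted(CONSTRAINTS)
    return [frozenset(s)
            for k in range(1, len(ids) + 1)
            for s in combinations(ids, k)
            if is_mus(frozenset(s))]
```

Calling `all_muses()` yields `frozenset({1, 2})` and `frozenset({1, 3, 4})`, i.e. {c1, c2} and {c1, c3, c4}, and `critical(frozenset({1, 2, 3}))` returns `{1, 2}`, matching the example.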

**Algorithm 1:** Domain Agnostic Shrinking

```
input : an unsatisfiable set S of constraints
input : a set crits of constraints that are critical for S
output: a MUS of S
for c ∈ S \ crits do
    if not CheckSat(S \ {c}) then
        S ← S \ {c}
return S
```
#### **2.2 Shrink**

Let us now define an operation, called Shrink, that is used in our tool to identify individual MUSes.

**–** Shrink(S, *crits*) takes an unsatisfiable subset S of C together with a set *crits* of constraints that are critical for S, and returns a MUS Smus of S.

We say that S is *shrunk* into a MUS Smus. Our algorithms use the shrinking as a black-box subroutine; thus, it can be implemented using any available single MUS extraction algorithm. In particular, we can implement the operation using a domain specific solution and thus indirectly exploit domain specific properties of particular constraint domains.

To shed more light on how shrinking can be done, we describe in Algorithm 1 a domain agnostic single MUS extraction approach that forms the basis of many contemporary domain specific solutions. To find a MUS of S, the algorithm iteratively attempts to remove the individual constraints in S \ *crits* from S, checks each new set for satisfiability, and keeps only the removals that leave S unsatisfiable. The satisfiability checks are the most expensive part of the shrinking. In total, the algorithm performs |S| − |*crits*| satisfiability checks. Domain specific algorithms (e.g. [5,24,1,19]) that are based on Algorithm 1 are often able to further reduce the number of satisfiability checks by exploiting domain specific properties of particular constraint domains.
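Algorithm 1 can be sketched in Python as follows, assuming a black-box oracle `check_sat` that returns `True` iff the given subset is satisfiable (the role of CheckSat). The function names and the toy oracle, which declares a set unsatisfiable iff it contains one of the two MUSes of Example 1, are hypothetical and serve only to exercise the loop; this is not MUST's implementation.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

SatOracle = Callable[[FrozenSet[int]], bool]


def shrink(s: Iterable[int], crits: Iterable[int],
           check_sat: SatOracle) -> Tuple[FrozenSet[int], int]:
    """Domain agnostic shrinking in the style of Algorithm 1.

    Expects an unsatisfiable s and constraints crits that are critical for s.
    Returns a MUS of s together with the number of satisfiability checks made.
    """
    current = set(s)
    checks = 0
    # Each non-critical constraint is tried exactly once; sorting only fixes the order.
    for c in sorted(current - set(crits)):
        checks += 1
        if not check_sat(frozenset(current - {c})):
            current.discard(c)  # still unsatisfiable without c, so c is dropped for good
    return frozenset(current), checks


# Hypothetical monotone oracle: a set is unsatisfiable iff it contains one of the
# two MUSes of Example 1 (constraint c_i is written as the integer i).
KNOWN_MUSES = (frozenset({1, 2}), frozenset({1, 3, 4}))


def toy_check_sat(subset: FrozenSet[int]) -> bool:
    return not any(m <= subset for m in KNOWN_MUSES)
```

With this oracle, `shrink({1, 2, 3, 4}, {1}, toy_check_sat)` returns `(frozenset({1, 3, 4}), 3)`: three checks, i.e. |S| − |crits|. Since a constraint is dropped only when the smaller set stays unsatisfiable, and satisfiability is inherited by subsets, every constraint that survives remains critical for the final set.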

#### **2.3 Unexplored Subsets**

During their computation, our MUS enumeration algorithms gradually *explore* the satisfiability of individual subsets of C. The *explored* subsets are those whose satisfiability is already known to the algorithm, whereas the *unexplored* subsets are those whose satisfiability has not been determined yet. We use *Unexplored* to denote the set of all unexplored subsets of C. Recall that all subsets of a satisfiable set are also satisfiable. Thus, if a set S is determined to be satisfiable, then not just S but also all of its subsets become explored. Dually, if a set U is determined to be unsatisfiable, then all supersets of U become explored. We further classify unexplored subsets as follows:

Fig. 1: Illustration of Example 2. We encode individual subsets of C as bitvectors; for example, the subset {c1, c3, c4} is written as 1011.

**Definition 3 (Minimal Unexplored Subset).** *A set* S *is a minimal unexplored subset if* S *is unexplored and* S \ {c} *is explored for every* c ∈ S*.*

**Definition 4 (Maximal Unexplored Subset).** *A set* S *is a maximal unexplored subset if* S *is unexplored and* S ∪ {c} *is explored for every* c ∈ C \ S*.*

Details on how we actually store, maintain, and use unexplored subsets are described later in Section 4.2. Here, we conclude by defining the concept of *minable critical* constraints:

**Definition 5 (minable critical).** *Let* N *be an unsatisfiable subset of* C *such that* N ∈ *Unexplored, and let* c ∈ N*. The constraint* c *is a* minable *critical constraint for* N *if* N \ {c} ∉ *Unexplored.*

*Example 2.* Let us illustrate the concepts on an example. Assume that we are given the same set of four constraints as in Example 1: c1 = a, c2 = ¬a, c3 = b, and c4 = ¬a ∨ ¬b. Fig. 1 shows a possible state of exploration of the power set. Satisfiable subsets are drawn with a solid border and unsatisfiable ones with a dashed border. There are 2 explored unsatisfiable subsets (red color), 7 explored satisfiable subsets (green color), and 7 unexplored subsets (black color). There are two minimal unexplored subsets, {c2} and {c1, c3, c4}, and three maximal unexplored subsets: {c1, c2, c3}, {c1, c3, c4}, and {c2, c3, c4}. As for the minable critical constraints, we can, for example, see that c2 is minable critical for the set {c1, c2, c3}, and that all constraints are minable critical for the set {c1, c3, c4}.
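The following sketch evaluates Definitions 3 to 5 on Example 2. Since Fig. 1 itself is not reproduced here, the exploration state below is reconstructed from the counts and sets listed in the example: the explored unsatisfiable sets are {c1, c2, c4} and {c1, c2, c3, c4}, and the explored satisfiable sets are the subsets of {c1, c3}, {c1, c4}, and {c3, c4}. Constraint ci is written as the integer i, and all names are hypothetical.

```python
from itertools import combinations

C = frozenset({1, 2, 3, 4})


def powerset(universe):
    items = sorted(universe)
    return {frozenset(s) for k in range(len(items) + 1) for s in combinations(items, k)}


# Assumed state: explored unsatisfiable sets are closed upwards,
# explored satisfiable sets are closed downwards.
KNOWN_UNSAT = [frozenset({1, 2, 4})]
KNOWN_SAT = [frozenset({1, 3}), frozenset({1, 4}), frozenset({3, 4})]

EXPLORED = {t for t in powerset(C)
            if any(u <= t for u in KNOWN_UNSAT) or any(t <= s for s in KNOWN_SAT)}
UNEXPLORED = powerset(C) - EXPLORED


def minimal_unexplored(unexplored):
    """Definition 3: removing any single constraint yields an explored set."""
    return {s for s in unexplored if all(s - {c} not in unexplored for c in s)}


def maximal_unexplored(unexplored, universe):
    """Definition 4: adding any single constraint yields an explored set."""
    return {s for s in unexplored
            if all(s | {c} not in unexplored for c in universe - s)}


def minable_critical(n, unexplored):
    """Definition 5: c is minable critical for N if N minus c is explored."""
    return {c for c in n if n - {c} not in unexplored}
```

The sketch reports 7 unexplored subsets, the minimal unexplored subsets {c2} and {c1, c3, c4}, the three maximal unexplored subsets listed in the example, and minable critical sets {c2} for {c1, c2, c3} and {c1, c3, c4} for {c1, c3, c4}.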

## **3 Implemented Algorithms**

Our tool currently implements three domain agnostic algorithms for online MUS enumeration: MARCO [22], TOME [7], and ReMUS [9]. MARCO was originally developed by Liffiton et al. [22]; the other two algorithms were developed by us. All three algorithms are based on a common scheme that we call the *seed-shrink scheme*. In this section, we first describe the base scheme and then briefly comment on the individual algorithms.

#### **Algorithm 2:** Seed-Shrink Scheme

```
input : an unsatisfiable set C of constraints
output: all MUSes of C
Unexplored ← P(C)
while there is a seed do
    S ← find a seed
    crits ← collect minable critical constraints for S
    Smus ← Shrink(S, crits)
    Unexplored ← Unexplored \ {T | T ⊂ Smus or Smus ⊆ T ⊆ C}
    output Smus
```
#### **3.1 Seed-Shrink Scheme**

The *seed-shrink scheme* is shown in Algorithm 2. The computation starts by initializing the set *Unexplored* to P(C), i.e., all subsets of C are initially unexplored. Subsequently, the scheme iteratively identifies all MUSes of C. Each iteration starts by finding a so-called *seed*, i.e., an unexplored subset that is unsatisfiable. Subsequently, the set *crits* of all constraints that are minable critical for the seed is collected, and the shrinking procedure is used to find a MUS of the seed. The iteration concludes by marking all subsets and supersets of the MUS as explored (the proper subsets are necessarily satisfiable, and the supersets are unsatisfiable). The computation terminates once there is no seed left.
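A toy Python rendering of Algorithm 2 may help to see how the pieces interact. It stores *Unexplored* explicitly as a set of frozensets, which is exponential in |C| and is exactly what the symbolic representation of Section 4.2 avoids. The seed-finding strategy (repeatedly testing a largest, hence maximal, unexplored subset, in the spirit of MARCO) and the oracle built from the MUSes of Example 1 are assumptions made for this illustration only.

```python
from itertools import combinations


def powerset(universe):
    items = sorted(universe)
    return {frozenset(s) for k in range(len(items) + 1) for s in combinations(items, k)}


def shrink(seed, crits, check_sat):
    """Algorithm 1: try to drop each non-critical constraint once."""
    current = set(seed)
    for c in sorted(current - crits):
        if not check_sat(frozenset(current - {c})):
            current.discard(c)
    return frozenset(current)


def find_seed(unexplored, check_sat):
    """Assumed MARCO-like strategy: test maximal unexplored subsets until one is unsatisfiable."""
    while unexplored:
        # A largest unexplored set has no unexplored proper superset, hence it is maximal.
        candidate = max(unexplored, key=lambda t: (len(t), sorted(t)))
        if not check_sat(candidate):
            return candidate
        # The candidate is satisfiable: it and all of its subsets become explored.
        unexplored.difference_update({t for t in unexplored if t <= candidate})
    return None


def seed_shrink(universe, check_sat):
    """Toy seed-shrink scheme (Algorithm 2) with Unexplored stored explicitly."""
    unexplored = powerset(universe)  # all 2^|C| subsets: toy sizes only
    muses = []
    while True:
        seed = find_seed(unexplored, check_sat)
        if seed is None:
            return muses
        crits = {c for c in seed if seed - {c} not in unexplored}  # minable critical
        mus = shrink(seed, crits, check_sat)
        # Proper subsets of the MUS are satisfiable and its supersets unsatisfiable.
        unexplored.difference_update({t for t in unexplored if t < mus or mus <= t})
        muses.append(mus)  # an online tool would output the MUS at this point


# Hypothetical oracle: unsatisfiable iff the set contains a MUS of Example 1.
KNOWN_MUSES = (frozenset({1, 2}), frozenset({1, 3, 4}))


def toy_check_sat(subset):
    return not any(m <= subset for m in KNOWN_MUSES)
```

On Example 1, the first seed is the whole set C, which is shrunk to {c1, c3, c4}. After the satisfiable set {c2, c3, c4} is explored, the next seed {c1, c2, c4} has the minable critical constraints c1 and c2, so it is shrunk to {c1, c2} with a single check, after which no seed remains.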

The scheme does not specify how to find a seed; this part differs among the individual algorithms implementing the scheme. In general, to find a seed, the algorithms check several unexplored subsets for satisfiability and reduce the set *Unexplored*. The algorithms differ in *which* and *how many* subsets they check, and in *how large* the resulting seed is. In general, the smaller the seed, the easier it is to shrink. On the other hand, unsatisfiable subsets are naturally more concentrated among the larger subsets; thus, looking for a seed among small unexplored subsets might come at the price of checking many unexplored subsets for satisfiability. Individual seed-shrink algorithms make different trade-offs between the size of the identified seeds and the number of satisfiability checks performed to identify them. In some constraint domains, it is worth finding a small seed even if this requires many satisfiability checks; in other constraint domains, the situation is exactly the opposite. The optimal choice of a seed-shrink algorithm thus differs among constraint domains.

**MARCO** [22] searches for a seed S among the maximal unexplored subsets and often performs only a few satisfiability checks to identify a seed. Since maximal unexplored subsets are usually very large, the seeds identified by MARCO are generally hard to shrink. Yet, in some constraint domains, such as SAT and SMT, the size of the seed has only a negligible effect on the complexity of the shrinking. In particular, in the SAT and SMT domains, contemporary satisfiability solvers can extract an *unsat core* of the seed S, i.e., an unsatisfiable, yet not necessarily minimal, subset of S. The extraction comes with almost no overhead compared to an ordinary satisfiability check, and the unsat core is usually very close, in terms of cardinality, to a MUS of S. Thus, instead of the whole S, the unsat core is passed to the shrinking procedure.

**TOME** [7] identifies seeds iteratively as follows. Each iteration of the algorithm starts by picking a minimal unexplored subset N1 and a maximal unexplored subset Np such that N1 ⊆ Np. Subsequently, TOME builds a chain N1 ⊂ N2 ⊂ ··· ⊂ Np of unexplored subsets. Such a chain necessarily either contains only unsatisfiable subsets, contains only satisfiable subsets, or contains an element Ni such that Nj is satisfiable for all 1 ≤ j < i and Nk is unsatisfiable for all i ≤ k ≤ p. In the first case, it is guaranteed that N1 is a MUS (N1 is unsatisfiable and, being a minimal unexplored subset, all of its immediate subsets are explored and hence satisfiable). In the second case, the chain does not give us any seed. Finally, in the third case, TOME finds Ni using binary search (which takes only O(log₂ p) satisfiability checks). Subsequently, Ni is used as a seed and shrunk into a MUS.

There are no guarantees on the distribution of satisfiable and unsatisfiable subsets along the chain, since the subsets are unexplored. In the best case, where N1 is unsatisfiable, TOME identifies a MUS using just a single satisfiability check. In the worst case, the whole chain is satisfiable and TOME has to build another chain. In our experience, TOME on average performs more satisfiability checks to find a seed than MARCO does, but its seeds are much smaller than those of MARCO. Thus, TOME is efficient especially in constraint domains where the size of the seed strongly affects the complexity of the shrinking.
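The binary search in TOME's third case can be sketched as follows, assuming only that satisfiability is monotone along the chain (all satisfiable elements precede all unsatisfiable ones). The chain is a Python list of nested frozensets, `check_sat` is a hypothetical oracle, and the function also reports how many checks it made, which stays within O(log₂ p) for a chain of length p.

```python
def first_unsat_on_chain(chain, check_sat):
    """Binary search for the first unsatisfiable element of a chain N1 ⊂ N2 ⊂ ... ⊂ Np.

    Assumes monotonicity: every element after an unsatisfiable one is unsatisfiable.
    Returns (index of the first unsatisfiable element or None, number of checks).
    """
    lo, hi = 0, len(chain)  # invariant: chain[:lo] satisfiable, chain[hi:] unsatisfiable
    checks = 0
    while lo < hi:
        mid = (lo + hi) // 2
        checks += 1
        if check_sat(chain[mid]):
            lo = mid + 1
        else:
            hi = mid
    return (lo if lo < len(chain) else None), checks


# Hypothetical oracle: unsatisfiable iff the set contains a MUS of Example 1.
KNOWN_MUSES = (frozenset({1, 2}), frozenset({1, 3, 4}))


def toy_check_sat(subset):
    return not any(m <= subset for m in KNOWN_MUSES)
```

For the chain {c1} ⊂ {c1, c3} ⊂ {c1, c3, c4} ⊂ C and the toy oracle, the call returns `(2, 2)`: the third element, {c1, c3, c4}, is the first unsatisfiable one, found with two checks. A result of `None` corresponds to the second case in the text, where the chain yields no seed.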

**ReMUS** [9] is based on the following observation: if C, Cᵏ, and M are unsatisfiable sets such that Cᵏ ⊆ C and M is a MUS of Cᵏ, then M is necessarily also a MUS of C. Note that the smaller Cᵏ is, the smaller the seeds in Cᵏ are. ReMUS aims to identify a Cᵏ that is very small, yet contains many MUSes, and searches for seeds in Cᵏ. In particular, the very first seed S is found among the maximal unexplored subsets of C⁰ = C and then shrunk into a MUS Smus. To find the next seed, ReMUS chooses C¹ such that Smus ⊆ C¹ ⊆ S, and searches for a seed S¹ among the maximal unexplored subsets of C¹. If a seed S¹ is identified, it is again shrunk into a MUS S¹mus and again used to reduce the search space, i.e., the next seed S² is searched for in a set C² such that S¹mus ⊆ C² ⊆ S¹. The search space reduction is repeated recursively as long as possible. Once the current search space is completely explored, ReMUS backtracks from the recursion and searches for a seed on the previous recursion level. Moreover, ReMUS employs several heuristics to pre-emptively backtrack from a search space that contains many unexplored subsets but only few MUSes.

The larger the input set C of constraints, the more extensive the recursive reduction can be, and thus the smaller the seeds that can be found. We recommend using ReMUS, rather than MARCO or TOME, if the input constraint set contains at least hundreds of constraints and hundreds of MUSes, regardless of the constraint domain.

For a more elaborate description of the three algorithms, please refer to the original papers [22,7,9] or to our recent work [8], where we have experimentally compared the algorithms in various constraint domains.

## **4 Architecture of the Tool**

Our tool is implemented in C++ and is available under the MIT license at:

https://github.com/jar-ben/mustool

The tool consists of six logical components: *SatSolver*, *Explorer*, *Master*, *Algorithms*, *Heuristics*, and *Initializer*. Section 4.1 briefly describes the individual components. Subsequently, Sections 4.2 and 4.3 describe *Explorer* and *SatSolver* in more detail. Finally, in Section 4.4, we give instructions on how to install and use our tool.

## **4.1 Logical Components**

**SatSolver** *SatSolver* (declared in *SatSolver.h*) is the only domain specific part of our tool. It provides the functionality for checking sets of constraints for satisfiability and implements the shrinking procedure. *SatSolver* also handles parsing the input set of constraints (provided by the user) and exporting the identified MUSes in domain specific formats. A more detailed description of *SatSolver* is provided in Section 4.3.

**Explorer** *Explorer* (declared in *Explorer.h*) maintains the set *Unexplored* of all unexplored subsets and handles related operations including: marking sets as explored, obtaining unexplored subsets, and mining critical constraints based on the set *Unexplored*. For more information, see Section 4.2.

**Master** *Master* (declared in Master.h) is the coordinator of the whole computation. In particular, it holds an instance of Explorer and an instance of SatSolver and provides wrappers for calling their methods. Moreover, it runs a MUS enumeration algorithm that is specified by the user via a command line argument (see below).

**Algorithms** The algorithms MARCO [22], TOME [7], and ReMUS [9] are declared in Master (Master.h) and implemented in marco.cpp, tome.cpp, and remus.cpp, respectively. All calls to SatSolver and Explorer are made via the wrappers defined in Master. Consequently, any improvement to Explorer and especially to SatSolver (e.g., a more efficient shrinking procedure or satisfiability solver) is immediately reflected in all the algorithms.

**Heuristics** There are several heuristics that are bound to the wrappers defined in Master and can thus be exploited by all three algorithms. For example, in the wrapper for invoking the shrinking procedure, we provide two heuristics for computing critical constraints for the set that is being shrunk. One of them uses Explorer to compute critical constraints based on the set *Unexplored*. The other uses SatSolver to obtain additional critical constraints that cannot be mined from *Unexplored*.

**Initializer** *Initializer* (implemented in main.cpp) parses the command line arguments provided by the user, and creates, sets up, and runs the Master.

#### **4.2 Explorer**

Since there can be up to exponentially many unexplored subsets w.r.t. the number of constraints in C, it is intractable to represent them explicitly. Instead, we adopt a symbolic representation that was first proposed by Liffiton et al. [22] and subsequently used in many other works (e.g. [1,20,10]).

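Given a set C = {c1, c2,...,cn} of constraints, we introduce a set X = {x1, x2,...,xn} of Boolean variables and maintain two Boolean formulas, *map*+ and *map*−, over X such that each model of *map*+ ∧ *map*− corresponds to an unexplored subset and vice versa (a model corresponds to the subset {ci | xi is *True*}). The formulas are maintained as follows:

**–** Initially, *map*+ = *map*− = *True*, i.e., all subsets of C are unexplored.

**–** When a set S is determined to be satisfiable, the clause ⋁{xi | ci ∈ C \ S} is added to *map*+; hence, every model of *map*+ contains a constraint outside S, i.e., S and all of its subsets become explored.

**–** When a set U is determined to be unsatisfiable, the clause ⋁{¬xi | ci ∈ U} is added to *map*−; hence, every model of *map*− omits a constraint of U, i.e., U and all of its supersets become explored.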


We use the SAT solver miniSAT [18] to hold and query the formulas *map*+ and *map*−. To get an arbitrary element of *Unexplored*, we can ask miniSAT for a model of *map*+ ∧ *map*−. However, our algorithms need to obtain two specific kinds of unexplored subsets.

First, given a set N, N ⊆ C, we need to be able to find a maximal unexplored subset of N. We exploit the fact that miniSAT allows the user to fix the values of some variables and to set the default *polarity* of variables, i.e., the default value assigned to variables at decision points during solving. To get a maximal unexplored subset of N, we fix the values of the variables {xi | ci ∉ N} to *False*, set the default polarity to *True*, and ask miniSAT for a model of *map*+ ∧ *map*−.

Second, given an *unexplored* N, N ⊆ C, we need to find a minimal unexplored subset B of N (this is used by TOME while constructing a chain of unexplored subsets). To do this, we fix the values of the variables {xi | ci ∉ N} to *False*, set the default polarity to *False*, and ask miniSAT for a model of *map*+. Note that we do not include *map*− in the query. Intuitively, *map*− only requires the absence of constraints; since N satisfies *map*−, every subset of N also satisfies *map*−.
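The following Python sketch illustrates the two queries without a SAT solver. It assumes the encoding commonly used for MARCO-style map formulas: when a set S is found satisfiable, the positive clause over the variables of C \ S is added to *map*+; when a set U is found unsatisfiable, the negative clause over the variables of U is added to *map*−. Fixing xi to *False* for ci ∉ N is emulated by enumerating only subsets of N, and picking a largest or smallest model by brute force stands in for miniSAT's polarity settings. The class and method names are hypothetical.

```python
from itertools import combinations


class MapFormulas:
    """Explicit sketch of map+ and map- over x1..xn (assumed encoding, no SAT solver).

    Constraint c_i is identified with index i; a subset of C is read as the model
    with x_i = True exactly for the indices in the subset.
    """

    def __init__(self, n):
        self.universe = frozenset(range(1, n + 1))
        self.map_plus = []   # clauses read as OR of x_i over the clause
        self.map_minus = []  # clauses read as OR of (not x_i) over the clause

    def mark_sat(self, s):
        # s and its subsets become explored: some constraint outside s must be present.
        self.map_plus.append(self.universe - frozenset(s))

    def mark_unsat(self, u):
        # u and its supersets become explored: some constraint of u must be absent.
        self.map_minus.append(frozenset(u))

    def satisfies_plus(self, subset):
        return all(clause & subset for clause in self.map_plus)

    def is_unexplored(self, subset):
        return self.satisfies_plus(subset) and all(
            clause - subset for clause in self.map_minus)

    def subsets_of(self, n_set):
        # Enumerating only subsets of n_set plays the role of fixing x_i = False outside n_set.
        items = sorted(n_set)
        return [frozenset(t) for k in range(len(items) + 1) for t in combinations(items, k)]

    def maximal_unexplored_subset(self, n_set):
        """A largest model of map+ AND map- inside n_set (hence maximal), or None."""
        models = [t for t in self.subsets_of(n_set) if self.is_unexplored(t)]
        return max(models, key=lambda t: (len(t), sorted(t)), default=None)

    def minimal_unexplored_subset(self, n_set):
        """For an unexplored n_set: a smallest model of map+ alone inside n_set.

        map- is skipped because its clauses only demand absences, so every
        subset of an unexplored n_set already satisfies it.
        """
        models = [t for t in self.subsets_of(n_set) if self.satisfies_plus(t)]
        return min(models, key=lambda t: (len(t), sorted(t)), default=None)
```

Loaded with the state of Example 2 (satisfiable sets {c1, c3}, {c1, c4}, {c3, c4} and unsatisfiable set {c1, c2, c4}), the sketch has exactly seven unexplored models, returns {c2, c3, c4} as a maximal unexplored subset of C, and returns {c2} as a minimal unexplored subset of {c1, c2, c3}.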

As for the implementation, we integrate miniSAT via its C API and maintain two instances of the solver. One instance holds the formula *map*+ ∧ *map*−, whereas the other holds just *map*+. Both instances are used incrementally, i.e., the formulas are built incrementally during the whole MUS enumeration and simplified (internally by miniSAT) when possible. Let us note that Liffiton et al. also use miniSAT incrementally in their tool<sup>1</sup>. However, they maintain just the whole conjunction *map*+ ∧ *map*−, since maintaining *map*− or *map*+ separately would not bring any speed-up for their MUS enumeration algorithm.

<sup>1</sup> https://sun.iwu.edu/%7eliffito/marco/

Finally, Explorer provides one more functionality. Given an unexplored subset N, Explorer can collect the minable critical constraints of N. Recall that a constraint c ∈ N is minable critical for N iff N \ {c} is explored. All the minable critical constraints can be determined from the formula *map*+ ∧ *map*−. In particular, if we simplify the formula by fixing the variables {xi | ci ∉ N} to *False*, then the values of some variables from {xi | ci ∈ N} will be *implied* to be *True*. These implied variables correspond to the minable critical constraints. This observation has already been exploited by Liffiton et al. [22], who use miniSAT to obtain the implied variables in their tool. However, miniSAT's procedure for computing them is not dedicated solely to this purpose; it is optimized w.r.t. the overall satisfiability solving process. Therefore, using miniSAT for this task incurs unnecessary overhead. In our tool, we compute the implied variables directly from the formula *map*+ instead of using a SAT solver.
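Assuming *map*+ holds, for each set S found satisfiable, the positive clause over the variables of C \ S, the minable critical constraints can be read off the clauses directly. The sketch below is one simple way to do this, not MUST's code. Since N is unexplored, N \ {c} is explored exactly when it lies inside some known satisfiable S, i.e., when the clause of S restricted to N consists of the single variable xc; this is precisely a unit clause after fixing xi to *False* for ci ∉ N.

```python
def minable_critical_from_map_plus(map_plus, n_set):
    """Collect the minable critical constraints of an unexplored n_set from map+ alone.

    map_plus: iterable of clauses; a clause is a frozenset of indices i, read as
    (x_i1 OR x_i2 OR ...). Fixing x_i = False for every c_i outside n_set leaves
    clause & n_set; if exactly one variable x_c remains, x_c is forced to True,
    which means that n_set minus c lies inside a known satisfiable set.
    """
    implied = set()
    for clause in map_plus:
        remaining = clause & n_set
        if len(remaining) == 1:  # unit clause after fixing: c is minable critical
            implied |= remaining
    return implied


# map+ clauses for Example 2: one clause over C \ S for each satisfiable S among
# {c1, c3}, {c1, c4}, {c3, c4} (constraint c_i is written as i).
MAP_PLUS = [frozenset({2, 4}), frozenset({2, 3}), frozenset({1, 2})]
```

With these clauses, the function returns {2} for N = {c1, c2, c3} and {1, 3, 4} for N = {c1, c3, c4}, as stated in Example 2. Only one pass is needed, because setting a variable to *True* never shortens a purely positive clause.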

## **4.3 SatSolver**

*SatSolver* (declared in *SatSolver.h*) is an abstract class stating all the domain specific functionality that needs to be implemented (in a derived class) to support a particular constraint domain in our tool. There are three methods that have to be implemented by every derived class:


When Master invokes, e.g., the procedure solve(N), it passes the bit-vector representation of N to SatSolver, and SatSolver converts it into the particular constraints.

Besides the above three methods that have to be implemented by every derived class, SatSolver defines and implements a method that can be overridden by a derived class:

**–** shrink(N, *crits*) performs the shrinking, i.e. it takes an unsatisfiable set N together with a set *crits* of constraints that are critical for N and returns a MUS of N. The default domain agnostic implementation of this method is carried out by Algorithm 1 (Section 2.2).
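To illustrate this extension point, here is a Python analogue of such a class hierarchy. Everything in it is hypothetical: only `solve` and `shrink` mirror methods named in the text, the other methods required by the real *SatSolver.h* are omitted, and the toy domain (integer bounds of the form x ≥ k or x ≤ k) is invented for the example. Subsets are passed as bit-vectors, and the default `shrink` follows Algorithm 1.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class SatSolverSketch(ABC):
    """Hypothetical Python analogue of a SatSolver-style base class (not MUST's API).

    A subset of the n input constraints is a bit-vector: bits[i] is True iff
    the (i+1)-th constraint belongs to the subset.
    """

    @abstractmethod
    def solve(self, bits: List[bool]) -> bool:
        """Domain specific check: True iff the selected constraints are satisfiable."""

    def shrink(self, bits: List[bool], crits: List[bool]) -> List[bool]:
        """Default domain agnostic shrinking (Algorithm 1); subclasses may override it."""
        mus = list(bits)
        for i, (present, critical) in enumerate(zip(bits, crits)):
            if present and not critical:
                mus[i] = False       # tentatively drop constraint i
                if self.solve(mus):  # now satisfiable, so constraint i is needed
                    mus[i] = True
        return mus


class ToyBoundsSolver(SatSolverSketch):
    """Invented toy domain: each constraint is ('>=', k) or ('<=', k) on one integer x."""

    def __init__(self, constraints: List[Tuple[str, int]]):
        self.constraints = constraints

    def solve(self, bits: List[bool]) -> bool:
        chosen = [c for c, b in zip(self.constraints, bits) if b]
        lower = max((k for op, k in chosen if op == ">="), default=None)
        upper = min((k for op, k in chosen if op == "<="), default=None)
        return lower is None or upper is None or lower <= upper
```

For the bounds x ≥ 5, x ≤ 3, x ≥ 1, x ≤ 10, the whole set is unsatisfiable and the default `shrink` returns `[True, True, False, False]`, i.e. the MUS {x ≥ 5, x ≤ 3}. A domain specific subclass would override `shrink` only when it can do better than one check per non-critical constraint.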

Currently, our tool supports 3 constraint domains via the following 4 derived classes of SatSolver:


If anyone wants to add support for another constraint domain to our tool, it is enough to implement a derived class of SatSolver. For example, the implementation of SpotHandle takes only 45 lines of code, including several empty lines caused by formatting and lines containing only closing brackets ("}"). Therefore, we consider our tool to be a truly domain agnostic, ready-to-use solution for any constraint domain.

<sup>2</sup> https://es-static.fbk.eu/tools/nuxmv/index.php?n=Main.License

### **4.4 Installation and Execution of the Tool**

For detailed installation and usage instructions, please follow the README.md file at: https://github.com/jar-ben/mustool.

Briefly, our tool can be built either in a lightweight configuration that supports only the SAT domain, or with additional support for the SMT and/or LTL domains. Whereas in the SAT domain we use miniSAT, which can be built very quickly, the z3 and SPOT solvers that we use in the SMT and LTL domains can take several hours to install. Once all the solvers you want to use are installed, our tool can simply be built by invoking the command "make".

To run our tool in its default settings, execute:

```
./must input_file
```

where input_file specifies the input file of constraints; it must have a .cnf, .smt2, or .ltl extension. Based on the extension, Master selects and uses an appropriate derived class of SatSolver. To specify the MUS enumeration algorithm to be used, invoke the tool as follows:

```
./must -a alg input_file
```
where alg can be marco, tome, or remus (the default). To see all the available settings, run:

```
./must -h
```

## **5 Experimental Evaluation**

### **5.1 Evaluated Tools**

The only other existing MUS enumeration tool that can be seen as domain agnostic is the implementation<sup>3</sup> of the domain agnostic algorithm MARCO (proposed by Liffiton et al. [22] and implemented by Liffiton and Zhao). In the following, we refer to this tool as MARCO. Currently, MARCO supports the SAT and SMT domains and can also be extended relatively easily to support other constraint domains. Here, we provide the results of an experimental comparison of our tool MUST with MARCO in both the SAT and SMT domains. Moreover, to demonstrate that our domain agnostic tool can be competitive even with fully domain specific solutions, we include a comparison with two state-of-the-art MUS enumeration tools from the SAT domain: MCSMUS<sup>4</sup> [2] and FLINT<sup>5</sup> [25].

Due to space limitations, we show here only the results achieved by the best (default) configuration of our tool. In particular, in both domains, we use the algorithm ReMUS. As for the shrinking, in the SMT domain we use our custom shrinking solution, and in the SAT domain we employ a single MUS extraction algorithm by Bacchus and Katsirelos [1]. Complete results of the evaluation are available at: https://www.fi.muni.cz/%7exbendik/research/must.

<sup>3</sup> https://sun.iwu.edu/%7emliffito/marco/

<sup>4</sup> https://bitbucket.org/gkatsi/mcsmus/src

<sup>5</sup> The tool was kindly provided to us by its author, Nina Narodytska.

All experiments were run using a time limit of 3600 seconds and computed on an Intel(R) Core(TM) i5-4690 CPU, 3.50GHz, 16 GB memory machine running Arch Linux 4.19.69-1-lts. The comparison criterion used in our evaluation is the number of identified MUSes within the given time limit.

#### **5.2 Benchmarks**

In the SAT domain, we used a collection of 291 Boolean CNF benchmarks taken from the MUS track of the SAT 2011 Competition<sup>6</sup>. This collection has been used in many recent MUS related papers (e.g. [22,7,9,25,2]), including the ones that present MARCO, FLINT, and MCSMUS. The benchmarks range in size from 70 to 16 million constraints and use from 26 to 4.4 million variables. For 28 benchmarks, all the evaluated tools identified all the MUSes within the given time limit. Since the comparison criterion of our evaluation is the number of identified MUSes, these 28 benchmarks are irrelevant for the evaluation (all the tools found the same number of MUSes for them). Therefore, only the remaining 263 benchmarks are the subject of our evaluation.

In the SMT domain, we used a collection of 433 benchmarks taken from the QF_UF, QF_IDL, QF_RDL, QF_LIA, and QF_LRA divisions of the SMT-LIB library<sup>7</sup>. This collection has also been used in several previous works, e.g. in the work by Cimatti et al. [13] or in our recent papers [9,8]. The benchmarks range in size from 70 to 16 million constraints and use from 26 to 4.4 million variables. For 249 benchmarks, both evaluated tools identified all the MUSes. Therefore, we focus here on the remaining 184 benchmarks.

#### **5.3 Results**

In Figs. 2a, 2b, and 2c, we provide scatter plots that compare MUST pairwise with each of the other tools in the SAT domain, and in Fig. 2d a scatter plot comparing MUST with MARCO in the SMT domain. Each point in a scatter plot corresponds to a single benchmark and shows the number of MUSes identified by the two tools: the x-coordinate is given by the tool that labels the x-axis, and the y-coordinate by the tool that labels the y-axis. Moreover, each scatter plot contains three additional numbers, placed above the plot, to the right of the plot, and in its right corner, which give the number of points above, below, and on the diagonal, respectively.

In the SMT domain, MUST conclusively dominates MARCO: it found more, fewer, and the same number of MUSes as MARCO on 100, 32, and 52 benchmarks, respectively. In the SAT domain, MUST outperforms both MARCO and FLINT on the majority of benchmarks. Finally, MCSMUS outperforms MUST on 52 percent of the benchmarks and is worse than MUST on 43 percent of the benchmarks. Still, this is a very good result, since MUST is a domain agnostic tool whereas MCSMUS is tailored to the SAT domain.

Fig. 2: Scatter plots comparing the number of produced MUSes.

<sup>6</sup> http://www.cril.univ-artois.fr/SAT11/

<sup>7</sup> http://www.smt-lib.org/

Besides the pairwise comparison, we also provide an overall ranking of the tools on individual benchmarks in the SAT domain. For example, assume that for a benchmark B both MUST and MCSMUS found 100 MUSes, FLINT found 80 MUSes, and MARCO found 50 MUSes. In such a case, MUST and MCSMUS share the 1st (best) rank for B, FLINT is 3rd, and MARCO is 4th. In Fig. 3, we show the average ranking (over all benchmarks) of all tools for each subsequent 60 seconds of the computation. We can see that MARCO ranked worst during the whole computation. FLINT ranked quite well during the first 600 seconds, but then its performance degraded. Finally, MUST and MCSMUS maintained the best and the second best ranking, respectively. This might be surprising, since MCSMUS is slightly better than MUST in Fig. 2c. The reason is that MUST mostly ranks either 1st or 2nd on a benchmark and rarely ranks 4th, whereas MCSMUS more often ranks 3rd or 4th.

Fig. 3: Average ranking in time.
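The ranking described above corresponds to standard competition ranking: tied tools share the best rank, and the following rank is skipped. A minimal Python sketch, with hypothetical function names; the second benchmark's MUS counts are made up purely for illustration.

```python
def competition_ranks(counts):
    """Rank tools on one benchmark by the number of MUSes found (more is better).

    A tool's rank is 1 plus the number of tools that found strictly more MUSes,
    so tied tools share a rank and the next rank is skipped ("1, 1, 3, 4").
    """
    return {tool: 1 + sum(other > found for other in counts.values())
            for tool, found in counts.items()}


def average_ranks(benchmarks):
    """Average each tool's rank over a list of per-benchmark MUS counts."""
    totals = {}
    for counts in benchmarks:
        for tool, rank in competition_ranks(counts).items():
            totals[tool] = totals.get(tool, 0) + rank
    return {tool: total / len(benchmarks) for tool, total in totals.items()}


# Benchmark B from the text, plus a second, made-up benchmark.
B = {"MUST": 100, "MCSMUS": 100, "FLINT": 80, "MARCO": 50}
B2 = {"MUST": 10, "MCSMUS": 5, "FLINT": 7, "MARCO": 7}
```

For benchmark B, `competition_ranks(B)` gives MUST and MCSMUS rank 1, FLINT rank 3, and MARCO rank 4. `average_ranks` then averages these ranks over benchmarks, as in Fig. 3, which additionally does so for each 60-second time slice.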

Finally, let us recall that our tool also contains an implementation of the algorithm MARCO; thus, one might be interested in comparing the performance of MARCO in our tool with MARCO in the tool MARCO. In the SAT domain, we found our implementation to be more efficient than, equally efficient as, and less efficient than MARCO on 68, 6, and 26 percent of the benchmarks, respectively. In the SMT domain, our implementation is better, equal, and worse on 37, 29, and 34 percent of the benchmarks, respectively<sup>8</sup>. Therefore, should anyone want to use the algorithm MARCO, we recommend using our implementation.

## **6 Case Study**

During the last 4 years, we participated in the European Union's Horizon 2020 project AMASS [26]. The project brought together researchers from academia and engineers from large industrial companies such as Honeywell, Alstom, or Infineon. The project focused on improving the process of development and certification of Cyber-Physical Systems in markets such as automotive, railway, aerospace, space, and energy. Among other things, this included the development of techniques for assessing the quality of system specifications/requirements, and this is where our tool found an application.

Establishing the requirements is an important stage of all development. In general, the requirements can be expressed either informally, e.g. using natural language, or formally, by employing a kind of mathematical logic such as Linear Temporal Logic (LTL). The formalization removes ambiguity and allows employing various model-based techniques, such as model checking. Moreover, it gives us the opportunity to verify the requirements early, even before any system model is built. In particular, we can verify that the requirements are consistent (satisfiable), i.e., that a system satisfying all the requirements can be built at all. If the requirements are inconsistent, they need to be refined.

<sup>8</sup> See the appendix https://www.fi.muni.cz/%7exbendik/research/must

Fig. 4: Application of MUS enumeration in requirements analysis.

Within the AMASS project, we proposed a scheme [6] that exploits MUSes to help the user establish a consistent set of requirements. A basic workflow of the scheme is depicted in Fig. 4. The process starts by introducing a set of requirements in a natural-language-like format, yet using a restricted grammar that avoids ambiguities. In the next step, the requirements are formalized in LTL and gathered in a set C. Subsequently, C is checked for consistency. If C is consistent, the software development process can continue with the next stage. Otherwise, a MUS enumeration tool is used to identify a set K of MUSes of C, and the user uses K to refine C. The MUS identification and refinement steps are repeated until the set of requirements becomes consistent.

We implemented the scheme in AMASS as a part of the so-called V&V manager [27]: a tool for validation and verification of the system model and system requirements. Our industrial partners employed the scheme on a set of industrial benchmarks and evaluated two contemporary MUS enumeration tools from the LTL domain: our MUST, and Looney by Bauch et al. [4]. They found MUST to be faster by several orders of magnitude. Unfortunately, the industrial benchmarks are confidential and cannot be published in this paper. Yet, the authors of Looney acknowledge in their paper that Looney can handle only small input constraint sets containing just low tens of constraints. On the other hand, MUST has been shown [8] to work efficiently with hundreds of constraints.

## **7 Conclusion**

We presented a tool, called MUST, for online enumeration of Minimal Unsatisfiable Subsets (MUSes). MUST implements three contemporary *domain agnostic* MUS enumeration algorithms, i.e., algorithms that can be applied in any constraint domain. Currently, the tool supports enumeration in the SAT, SMT, and LTL domains, and can be easily extended to support other domains. Therefore, we classify the tool itself as *domain agnostic*; it serves as an (almost) ready-to-use solution for any domain where MUSes already find, or will eventually find, an application. We experimentally compared MUST with the domain agnostic tool by Liffiton et al. [22] in the SAT and SMT domains and showed that MUST conclusively dominates in both domains. Moreover, we showed that MUST is competitive even with contemporary tools tailored to the SAT domain.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Timed and Dynamical Systems

## **Safe Decomposition of Startup Requirements: Verification and Synthesis**

Alessandro Cimatti<sup>1</sup>, Luca Geatti<sup>1,2</sup>, Alberto Griggio<sup>1</sup>, Greg Kimberly<sup>3</sup>, and Stefano Tonetta<sup>1</sup>

<sup>1</sup> Fondazione Bruno Kessler, Trento, Italy: cimatti@fbk.eu, lgeatti@fbk.eu, griggio@fbk.eu, tonettas@fbk.eu

<sup>2</sup> University of Udine, Udine, Italy: luca.geatti@uniud.it

<sup>3</sup> The Boeing Company, Seattle, USA: greg.kimberly@boeing.com

**Abstract.** The initialization of complex cyber-physical systems often requires the interaction of various components that must start up with strict timing requirements on the provision of signals (power, refrigeration, light, etc.). To allow components to be developed independently and safely, it is necessary to ensure a safe decomposition, i.e., the specification of local timing requirements that prevent later integration errors due to the dependencies.

We propose a high-level formalism to model local timing requirements and dependencies. We consider the problem of checking the consistency (existence of an execution satisfying the requirements) and compatibility (absence of an execution that reaches an integration error) of the local requirements, and the problem of synthesizing a region of timing constraints that represents all possible correct refinements of the original specification. We show how the problems can be naturally translated into a model checking and synthesis problem for timed automata with shared variables. Exploiting the linear structure of the requirements, we propose an encoding of the problem into SMT. We evaluate the SMT-based approach using MathSAT and show that it scales better than the automata-based approach using Uppaal and nuXmv.

## **1 Introduction**

Complex industrial cyber-physical systems often have an initialization procedure that must reach a startup mode within a specified design target time interval. For the system as a whole to complete the startup within the required interval, each subcomponent may have to go through a number of intermediate phases, each within its own target interval, and each phase may itself depend on other subcomponents reaching their startup or intermediate phases. For example, for a power generation system to start up at full power, it may first need to transition through a low-power output phase, and a number of subsidiary systems (such as cooling or fuel supply) may first have to undergo their own phase transitions. In turn, these subsidiary systems may require transitions to occur in systems subsidiary to them, and so on.

Traditionally, the integration of these distributed transition targets is validated via simulation and testing, which, while sufficient to reach a desired design performance, is labor- and time-intensive. A more efficient process for arriving at and validating a set of design targets that satisfy the overall system requirements is clearly beneficial in these contexts. Firstly, we would like to verify that these requirements prevent failed transitions, in which the timing of the subsidiary systems leads to outcomes where the main system (e.g., the power generation system) cannot perform a transition within its time window. For example, suppose the power system has a time window within which it must transition from low-power mode to high-power mode; to achieve this transition, however, it requires two subsidiary systems, a cooling system and a fuel supply system, to transition themselves from a low-output mode to a high-output mode, each within its own target time window. If these time windows are not compatible, the power generator may fail to provide high power in time. Secondly, if our starting set of requirements is inadequate to provide this guarantee, we would like to be able to synthesize a set of requirements that is adequate to this task.

In this paper, we formalize the problem starting from a simple, industrially relevant setting, where the components have a linear sequence of phases, must progress to the next phase within a certain interval of time, and must respect some dependencies on the phases of other components. Dependencies are expressed as Boolean combinations of variables representing the component phases and are divided into two types: (i) *signal dependencies*, where a component may enter a phase only if other components are in some specific phases; (ii) *state dependencies*, where a component can stay in a phase only if, during its entire stay, other components are in some specific phases. We are interested in the following problems: 1) checking if the requirements are compatible, *i.e.*, if every reachable state can be extended with an execution satisfying the requirements; thus, if the components satisfy the local requirements, they cannot lead the system to an illegal state (where a component does not receive its input in time); 2) checking if the requirements are consistent, *i.e.*, if there exists an execution of the components satisfying all requirements (inconsistency is actually a pathological case of incompatibility); 3) synthesizing the set of refinements (the same requirements with stricter intervals) that are consistent and compatible. We show how the first two verification problems can be naturally translated into a model checking problem for timed automata (TAs) with shared variables. Exploiting the linear structure of the requirements, we propose an encoding of the problem into SMT. If all intervals are bounded, the encoding is quantifier-free. Finally, we extend both approaches to also solve the synthesis problem, using synthesis for parametrized model checking of TAs and quantifier elimination in SMT, respectively.

We implemented the SMT-based approach in a tool called TRICker and carried out an experimental evaluation comparing it with other tools for the verification of timed automata. We used Uppaal [6] and nuXmv [7] to model check TAs and MathSAT [12] to solve the SMT problems. The evaluation is based on a test set of randomly generated local requirements. The results show that the SMT-based approach performs better than the automata-based one on all three problems.

*Related Work* The integration and compatibility of input/output timed automata have been extensively studied in the literature. Typically, works in the literature focus on deadlock checking (see, e.g., [4,5]). The work of [2] also addresses parameter synthesis to avoid deadlocks in timed automata. To check for livelocks, liveness properties can be addressed with the approaches proposed in [10,7]. A general definition of illegal states for timed interface automata is given in [13]. As shown in the extended version of the paper, the compatibility problem addressed in this paper can be seen as a subcase of the homonymous problem for input/output timed interface automata. Since we are considering a closed system, the problem reduces to the existence of a deadlock or livelock in a phase of some component (depending on whether the related time interval is bounded or not). Moreover, compared to the above model checking approaches, we are considering a specific fragment of timed automata with a linear structure that can be exploited for specialized solutions.

Related problems have been addressed in the context of task scheduling. In the formalism introduced in [16,17], called DRT (short for *digraph real-time task* model), in which tasks and deadlines are expressed as directed graphs, the problem of determining whether a schedule exists (*feasibility problem*) bears some similarities to the consistency checking problem we study here. The DRT model allows very general graph topologies, with multiple outgoing branches and loop-backs, but it does not consider dependencies across different tasks. The main difference from our work is that the problem is addressed from a *global* point of view (*i.e.*, the existence of a global scheduler that can coordinate the execution of the tasks), whereas we are interested in local solutions, in which each requirement can be considered in isolation. Another difference is the approach used to tackle the problem: while [16] uses dynamic programming to deal with the possible explosion of the search space, we use SMT [14] as the main framework for all three of the above-mentioned problems.

*Outline.* In Sec. 2, we introduce a suitable formalism to model local requirements and we formalize the three problems. In Sec. 3, we propose the reductions of *compatibility checking* and *consistency checking* into TAs and SMT. The corresponding solutions for the *synthesis* problem are then described in Sec. 4. The experimental results are described in Sec. 5. In Sec. 6, we draw some conclusions and highlight possible future directions of this work.

## **2 Problem Statement**

*Domain formalization* We propose a high-level formalism to model the local requirements.

Fig. 1: (a) Example of a system with two local requirements and one state dependency. (b) Example of a system with two local requirements and two signal dependencies.

**Definition 1 (Local Requirements)** *A specification* $S$ *is given by a set of local (or component) requirements, where each local requirement* $C \in S$ *is given by an (ordered) sequence* $P^C_1, \ldots, P^C_n$ *of phases. In turn, each phase* $P_i$ *of* $C$ *is associated, when* $i > 1$*, with a closed real interval* $\beta_{P_i}$ *with non-negative lower limit* $l_{P_i}$ *and (finite or infinite) upper limit* $u_{P_i}$ *and with a formula* $\phi_{P_i}$ *(called* signal dependency*), and, when* $i > 0$*, with a formula* $\psi_{P_i}$ *(called* state dependency*). Both* $\phi_{P_i}$ *and* $\psi_{P_i}$ *are Boolean formulae over the atoms in* $\{\langle D, Q\rangle\}_{D \in S \setminus \{C\},\, Q \in D}$ *(i.e., the phases of other components).*

If a dependency $\psi_P$ is just a conjunction of atoms, then we say that $\psi_P$ is *convex*. We write $|C|$ for the number of phases of $C$.

Figs. 1a and 1b show two examples of sets of local requirements. In Fig. 1a, we have two local requirements $A$ and $B$ (i.e., $S = \{A, B\}$); each local requirement has two phases *Off* and *On* (i.e., $P^A_1 = \mathit{Off}$ and $P^A_2 = \mathit{On}$, and similarly for $B$); the bounds are depicted in square brackets (thus, for example, $\beta^A_{\mathit{On}} = [3, 6]$); all dependencies are trivially true apart from the state dependency $\psi^B_{\mathit{On}} = \langle A, \mathit{On}\rangle$ of the local requirement $B$, which is plotted as an arrow from phase *On* of $B$ to phase *On* of $A$. In Fig. 1b, we have another example with three components and some signal dependencies; for example, the signal dependency $\phi^C_{\mathit{Normal}} = \langle E, \mathit{Normal}\rangle$ is plotted as an arrow from the *transition* to phase *Normal* of $C$ to phase *Normal* of $E$.
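As a concrete reading of Definition 1, the following Python sketch encodes the local requirements of Fig. 1a. The representation is an assumption made for illustration only: dependencies are nested tuples (`True`, `("at", D, Q)` for the atom ⟨D, Q⟩, and `"not"`/`"and"`/`"or"` nodes), and each phase simply stores its interval and its two dependencies (the interval of the first phase is unused). The bounds [2, 4] of *B.On* are taken from the description of the corresponding TASV in Fig. 2.

```python
from dataclasses import dataclass
from math import inf
from typing import Dict, List, Union

# Dependency formulas as nested tuples (a hypothetical encoding):
#   True | ("at", D, Q) | ("not", f) | ("and", f, g, ...) | ("or", f, g, ...)
Formula = Union[bool, tuple]


@dataclass(frozen=True)
class Phase:
    name: str
    lower: float = 0.0       # l_P (non-negative); unused for the first phase
    upper: float = inf       # u_P (finite or infinite); unused for the first phase
    signal: Formula = True   # phi_P: checked when the component enters the phase
    state: Formula = True    # psi_P: must hold while the component stays in the phase


Spec = Dict[str, List[Phase]]  # component name -> ordered sequence of phases


def holds(f: Formula, config: Dict[str, str]) -> bool:
    """Evaluate a dependency; config maps every component to its current phase name."""
    if isinstance(f, bool):
        return f
    op, *args = f
    if op == "at":
        return config[args[0]] == args[1]
    if op == "not":
        return not holds(args[0], config)
    if op == "and":
        return all(holds(g, config) for g in args)
    if op == "or":
        return any(holds(g, config) for g in args)
    raise ValueError(f"unknown operator {op!r}")


def is_convex(f: Formula) -> bool:
    """Convex = a conjunction of atoms (True is read as the empty conjunction)."""
    if f is True:
        return True
    if not isinstance(f, tuple):
        return False
    if f[0] == "at":
        return True
    return f[0] == "and" and all(isinstance(g, tuple) and g[0] == "at" for g in f[1:])


# Fig. 1a: B may enter (and stay in) On only while A is in On.
fig1a: Spec = {
    "A": [Phase("Off"), Phase("On", 3, 6)],
    "B": [Phase("Off"), Phase("On", 2, 4, state=("at", "A", "On"))],
}
```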

**Definition 2 (Stronger local requirements)** *We say that a local requirement* $C' = P^{C'}_1, \ldots, P^{C'}_n$ *is* stronger *than* $C = P^C_1, \ldots, P^C_n$ *(written* $C' \sqsubseteq C$*) iff phase* $P^{C'}_i$ *is identical to* $P^C_i$ *except that* $l_{P^C_i} \le l_{P^{C'}_i}$ *and* $u_{P^{C'}_i} \le u_{P^C_i}$*, for all* $1 \le i \le n$*. Given two specifications* $S = \{C_1, \ldots, C_n\}$ *and* $S' = \{C'_1, \ldots, C'_n\}$*, we say that* $S'$ *is stronger than* $S$ *(written* $S' \sqsubseteq S$*) iff for all* $i$*,* $1 \le i \le n$*,* $|C_i| = |C'_i|$ *and* $C'_i \sqsubseteq C_i$*.*
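Definition 2 can be checked mechanically: a stronger requirement keeps the names and dependencies of its phases and only shrinks each interval. The sketch below assumes a minimal phase record with hypothetical field names and, unlike the definition, matches the components of two specifications by name rather than by index.

```python
from dataclasses import dataclass, replace
from math import inf
from typing import Dict, List


@dataclass(frozen=True)
class Phase:
    name: str
    lower: float = 0.0
    upper: float = inf
    signal: object = True   # signal dependency (treated as opaque here)
    state: object = True    # state dependency (treated as opaque here)


def phase_stronger(new: Phase, old: Phase) -> bool:
    """new equals old except that [new.lower, new.upper] lies inside [old.lower, old.upper]."""
    same_rest = (new.name, new.signal, new.state) == (old.name, old.signal, old.state)
    return same_rest and old.lower <= new.lower and new.upper <= old.upper


def requirement_stronger(new: List[Phase], old: List[Phase]) -> bool:
    """Same number of phases and every phase of new is stronger than its counterpart."""
    return len(new) == len(old) and all(phase_stronger(p, q) for p, q in zip(new, old))


def spec_stronger(new: Dict[str, List[Phase]], old: Dict[str, List[Phase]]) -> bool:
    """S' is stronger than S iff every C'_i is stronger than the C_i with the same name."""
    return new.keys() == old.keys() and all(requirement_stronger(new[k], old[k]) for k in old)


a_old = [Phase("Off"), Phase("On", 3, 6)]
a_new = [Phase("Off"), replace(a_old[1], lower=4, upper=5)]   # [4, 5] inside [3, 6]
print(requirement_stronger(a_new, a_old), requirement_stronger(a_old, a_new))  # True False
```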

In defining the semantics of a composition of local requirements $C_1, \ldots, C_n$, every local requirement $C_i$ is associated with a local clock, which is reset each time it enters a new phase. Given a local requirements specification $\{C_1, \ldots, C_n\}$, we define its semantics formally through the predicate $\mathit{Reach}((C_1, j_1, t_1), \ldots, (C_n, j_n, t_n))$, which is true iff the phases $P^{C_1}_{j_1}, \ldots, P^{C_n}_{j_n}$ are reachable at local times $t_1, \ldots, t_n$.

**Definition 3 (Reachability for local requirements)** *Given the specification* $\{C_1, \ldots, C_n\}$ *and the time points* $t_1, \ldots, t_n \in \mathbb{R}$*, we inductively define the predicate* $\mathit{Reach}((C_1, j_1, t_1), \ldots, (C_n, j_n, t_n))$ *as follows:*

1. *for all* $i \in \{1, \ldots, n\}$ *such that* $j_i < |C_i|$*,* $t_i + \delta \in [l^{C_i}_{j_i+1}, u^{C_i}_{j_i+1}]$ *if* $i \in M$*, and* $t_i + \delta \le u^{C_i}_{j_i+1}$ *otherwise;*
2. *for all* $i \in M$*, it holds that (signal dependencies):* $((C_1, j_1), \ldots, (C_n, j_n)) \models \phi^{C_i}_{j_i+1}$*;*
3. *for all* $i \in M$*, it holds that (state dependencies, entry):* $((C_1, j_1), \ldots, (C_n, j_n)) \models \psi^{C_i}_{j_i+1}$*;*
4. *for all* $i \in \{1, \ldots, n\}$*, it holds that (state dependencies, invariant):* $((C_1, j'_1), \ldots, (C_n, j'_n)) \models \psi^{C_i}_{j'_i}$*;*

*then it holds that* $\mathit{Reach}((C_1, j'_1, t'_1), \ldots, (C_n, j'_n, t'_n))$*, where* $j'_i = j_i + 1$ *and* $t'_i = 0$ *if* $i \in M$ *and* $j_i < |C_i|$*, and* $j'_i = j_i$ *and* $t'_i = t_i + \delta$ *otherwise.*
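One inductive step of Definition 3 can be sketched as a function that, given the current phases and local clocks, a delay δ and a set M of moving components, either returns the successor configuration or `None`. Assumptions made for brevity: phases are 1-indexed as in the text, dependencies are convex (a frozenset of ⟨D, Q⟩ atoms read as a conjunction, or `True`), and a member of M that is already in its last phase simply stays there, as in the update rule. The example replays Fig. 1a from the state (A, 1, 4), (B, 1, 4).

```python
from math import inf
from typing import Dict, FrozenSet, Optional, Set, Tuple, Union

Dep = Union[bool, FrozenSet[Tuple[str, str]]]  # True, or a conjunction of (component, phase) atoms
State = Dict[str, Tuple[int, float]]            # component -> (1-based phase index j, local clock t)


def phase(name, lower=0.0, upper=inf, signal=True, state=True):
    return {"name": name, "lower": lower, "upper": upper, "signal": signal, "state": state}


def sat(dep: Dep, config: Dict[str, str]) -> bool:
    return dep is True or all(config[d] == q for d, q in dep)


def step(spec, st: State, delta: float, moving: Set[str]) -> Optional[State]:
    assert delta >= 0
    config = {c: spec[c][j - 1]["name"] for c, (j, _) in st.items()}
    for c, (j, t) in st.items():
        if j == len(spec[c]):
            continue                                      # last phase: no bound to respect
        nxt = spec[c][j]                                  # phase j + 1 (0-based index j)
        if c in moving:
            if not nxt["lower"] <= t + delta <= nxt["upper"]:
                return None                               # condition 1, moving components
            if not (sat(nxt["signal"], config) and sat(nxt["state"], config)):
                return None                               # conditions 2 and 3
        elif t + delta > nxt["upper"]:
            return None                                   # condition 1, waiting components
    succ = {c: ((j + 1, 0.0) if c in moving and j < len(spec[c]) else (j, t + delta))
            for c, (j, t) in st.items()}
    new_config = {c: spec[c][j - 1]["name"] for c, (j, _) in succ.items()}
    if not all(sat(spec[c][j - 1]["state"], new_config) for c, (j, _) in succ.items()):
        return None                                       # condition 4 (state invariants)
    return succ


fig1a = {
    "A": [phase("Off"), phase("On", 3, 6)],
    "B": [phase("Off"), phase("On", 2, 4, state=frozenset({("A", "On")}))],
}
s = {"A": (1, 4.0), "B": (1, 4.0)}
print(step(fig1a, s, 0.0, {"B"}))   # None: B cannot enter On while A is in Off
s2 = step(fig1a, s, 0.0, {"A"})     # A enters On at local time 4
print(s2, step(fig1a, s2, 0.0, {"B"}))
```

In the weakly-monotonic setting, A and B can thus both reach *On* at the same real time 4, in two consecutive zero-delay steps.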

We define the predicate $\mathit{Comp}_S$ to be true iff there are no reachable states in $S$ in which no component can proceed to its next phase.

**Definition 4 (Compatibility for local requirements)** *Given the set of local requirements* $S = \{C_1, \ldots, C_n\}$*, the predicate* $\mathit{Comp}_S$ *is true iff:*

$$\forall j_1 \in \{1, \ldots, |C_1| - 1\} \ldots \forall j_n \in \{1, \ldots, |C_n| - 1\}\; \forall t_1, \ldots, t_n \in \mathbb{R} \Big( \mathit{Reach}((C_1, j_1, t_1), \ldots, (C_n, j_n, t_n)) \rightarrow \exists M \subseteq \{1, \ldots, n\} \big( M \neq \emptyset \wedge \mathit{Reach}((C_1, j'_1, t'_1), \ldots, (C_n, j'_n, t'_n)) \big) \Big)$$

*where* $j'_i = j_i + 1$ *and* $t'_i = 0$ *for all* $i \in M$*, and* $j'_i = j_i$ *and* $t'_i = t_i$ *otherwise. If* $\mathit{Comp}_S$ *holds, we say that* $C_1, \ldots, C_n$ *are compatible, or equivalently that* $S$ *is compatible.*

For example, in Fig. 1a, predicate $\mathit{Reach}((A, 1, 4), (B, 1, 4))$ holds, but predicate $\mathit{Reach}((A, 1, 4), (B, 2, 0))$ does not: since $\psi^B_{\mathit{On}} = \langle A, \mathit{On}\rangle$, conditions 3 and 4 of Definition 3 allow $B$ to enter and stay in phase *On* only while $A$ is in phase *On*, whereas in this state $A$ is still in phase *Off*.

*Strict Semantics* The above definition adopts a weakly-monotonic model of time, where discrete transitions are instantaneous and, therefore, the system may be in two different states at the same instant. The definition and the reductions to model checking and SMT can be easily adapted to have a strict semantics.

*Verification and Synthesis Problems* The core problem we address is to check whether a given specification $S = \{C_1, \ldots, C_n\}$ is *compatible*, *i.e.*, whether $\mathit{Comp}_S$ holds. The *consistency checking* problem amounts to checking whether there *exists* a time point at which the final phases of all the local requirements are reached, that is, whether the following formula holds:

$$\exists t\_1 \dots \exists t\_n \; Reach((C\_1, |C\_1|, t\_1), \dots, (C\_n, |C\_n|, t\_n))$$

If this is the case, then we say that $S$ is *consistent*. Finally, we can formalize the *synthesis problem* as the problem of computing (a symbolic representation of) the set $\{S' \mid \mathit{Comp}_{S'} \wedge S' \sqsubseteq S\}$.

#### **2.1 NP-hardness**

In this section, we show that the simplest of the problems defined above is already NP-hard. In fact, we show a reduction from SAT to the *consistency checking* problem.

Let $\varphi(\bar{x})$ be a Boolean formula over the variables $\bar{x} = x_1, \ldots, x_n$; without loss of generality, we assume $\varphi(\bar{x})$ to be in negation normal form, *i.e.*, with negations occurring only in front of variables. For all $1 \le i \le n$, we define the local requirement corresponding to variable $x_i$ as $C_i = P^i_1, P^i_2$, such that $\beta_{P^i_2} = [0, +\infty)$ and $\phi_{P^i_1} = \psi_{P^i_1} = \phi_{P^i_2} = \psi_{P^i_2} = \top$; the idea is to encode the values $\bot$ and $\top$ of each $x_i$ with the two phases $P^i_1$ and $P^i_2$, respectively. Moreover, we define the local requirement $G$, which will serve as a gadget for the reduction, as $G = P^G_1, P^G_2$, where the second phase $P^G_2$ has bound $[0, +\infty)$, signal dependency $\phi_{P^G_2} = \varphi[x_i \mapsto \langle C_i, P^i_2\rangle, \neg x_i \mapsto \langle C_i, P^i_1\rangle]$, and state dependency $\psi_{P^G_2} = \top$. The specification corresponding to the Boolean formula $\varphi(\bar{x})$ is defined as $S_\varphi = \{G, C_1, \ldots, C_n\}$. It holds that $\varphi(\bar{x})$ is satisfiable if and only if $S_\varphi$ is consistent. Indeed, if $S_\varphi$ is consistent, then there exists a time point at which the signal dependency of the second phase of $G$ is satisfied, and thus $\varphi(\bar{x})$ is satisfiable.

Vice versa, suppose that $\varphi(\bar{x})$ is satisfiable, and let $M$ be an arbitrary model of it, expressed as the set of true atoms, in which we also substitute every $x_i$ with the pair $\langle C_i, P^i_2\rangle$. Since the local requirements $C_1, \ldots, C_n$ have no dependencies and, together with $G$, have only infinite bounds, there exists a time $t$ such that the predicate $\mathit{Reach}((G, P^G_1, t), (C_1, P^1_{b_1}, t_1), \ldots, (C_n, P^n_{b_n}, t_n))$ is true, where, for all $1 \le i \le n$, $b_i = 2$ and $t_i = 0$ if $x_i \in M$, and $b_i = 1$ and $t_i = t$ otherwise. By definition of $\mathit{Reach}$ (see Definition 3), this implies that $\mathit{Reach}((G, P^G_2, t), (C_1, P^1_2, t), \ldots, (C_n, P^n_2, t))$ holds, *i.e.*, $S_\varphi$ is consistent.
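The reduction is easy to mechanize. The sketch below, under a hypothetical tuple encoding of NNF formulas, builds the signal dependency of $P^G_2$ by the substitution $x_i \mapsto \langle C_i, P^i_2\rangle$, $\neg x_i \mapsto \langle C_i, P^i_1\rangle$. Following the argument above (all bounds are $[0, +\infty)$ and the $C_i$ have no dependencies, so every phase configuration of $C_1, \ldots, C_n$ is reachable), it decides the consistency of $S_\varphi$ by searching for a configuration that enables $G$, which is, unsurprisingly, a brute-force satisfiability check.

```python
from itertools import product

# NNF formulas as nested tuples (hypothetical encoding):
#   ("var", i) | ("not", ("var", i)) | ("and", f, g, ...) | ("or", f, g, ...)


def to_dependency(f):
    """Substitute x_i -> <C_i, phase 2> and (not x_i) -> <C_i, phase 1>."""
    op = f[0]
    if op == "var":
        return ("at", f"C{f[1]}", 2)
    if op == "not":
        if f[1][0] != "var":
            raise ValueError("formula must be in negation normal form")
        return ("at", f"C{f[1][1]}", 1)
    return (op,) + tuple(to_dependency(g) for g in f[1:])


def holds(dep, phase_of):
    """Evaluate a negation-free dependency; phase_of maps each C_i to phase 1 or 2."""
    op = dep[0]
    if op == "at":
        return phase_of[dep[1]] == dep[2]
    if op == "and":
        return all(holds(d, phase_of) for d in dep[1:])
    if op == "or":
        return any(holds(d, phase_of) for d in dep[1:])
    raise ValueError(f"unexpected operator {op!r}")


def s_phi_consistent(f, n):
    """S_phi is consistent iff some phase choice for C_1..C_n enables G's second phase."""
    dep = to_dependency(f)
    names = [f"C{i}" for i in range(1, n + 1)]
    return any(holds(dep, dict(zip(names, phases)))
               for phases in product([1, 2], repeat=n))


print(s_phi_consistent(("or", ("var", 1), ("not", ("var", 2))), 2))   # True
print(s_phi_consistent(("and", ("var", 1), ("not", ("var", 1))), 1))  # False
```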

In Sec. 3.2, we will give an encoding of the consistency checking problem based on SMT(DL) (*i.e.*, *Satisfiability Modulo the Theory of Difference Logic*). In particular, we will show that the problem can be reduced to the satisfiability of an SMT(DL) formula whose size is polynomial in the size of the specification. Since the latter problem belongs to NP [15], the consistency checking problem belongs to NP as well; hence, *consistency checking* is NP-complete.

## **3 Verification**

#### **3.1 Reduction to Model Checking**

To reduce the two verification problems to model checking of networks of timed automata, we use timed automata with shared variables. To this end, besides the clock constraints $\Xi(C)$, we define $L = \{l_A, l_B, \ldots\}$ as a set of *location variables* (one for each automaton $\mathcal{A}$ in the network), and $\Theta(L)$ as the set of all Boolean combinations of atoms of the form $l_A = v_A$, where $\mathcal{A}$ is a timed automaton, $l_A \in L$, and $v_A$ is a state of $\mathcal{A}$.

**Definition 5 (Timed Automata with Shared Variables)** *A* timed automaton with shared variables *(TASV, for short)* $\mathcal{A} = \langle V_{\mathcal{A}}, v^0_{\mathcal{A}}, l_{\mathcal{A}}, C_{\mathcal{A}}, \mathit{inv}^{cl}_{\mathcal{A}}, \mathit{inv}^{loc}_{\mathcal{A}}, T_{\mathcal{A}}\rangle$ *consists of:*


Given a set of clocks $C$, we denote with $\nu : C \to \mathbb{R}$ a *clock valuation*, that is, a function assigning a real value to each clock; with $\mathcal{V}_C$, we denote the set of all possible clock valuations over $C$. For $t \in \mathbb{R}$, $\nu + t$ is the clock valuation that maps every clock $c \in C$ to the value $\nu(c) + t$. For $R \subseteq C$, we define $\nu[R \mapsto 0]$ to be the valuation that maps $x$ to 0 if $x \in R$, and to $\nu(x)$ otherwise. When defining the product of two TASVs, we will deal with tuples $(l_{\mathcal{A}_1}, \ldots, l_{\mathcal{A}_n})$ of location variables; in this context, we usually denote with $\lambda$ any function from the set of $n$-tuples of location variables to the set $V_{\mathcal{A}_1} \times \cdots \times V_{\mathcal{A}_n}$. Moreover, we write $\lambda \models \Phi$ (where $\Phi \in \Theta(L)$) iff $\Phi[l_{\mathcal{A}_i} \mapsto v_{\mathcal{A}_i} \text{ for all } 1 \le i \le n]$ is true, where $\lambda((\ldots, l_{\mathcal{A}_i}, \ldots)) = (\ldots, v_{\mathcal{A}_i}, \ldots)$. We give the semantics of a TASV in terms of traces, and we define the product of TASVs as described below.
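The valuation operations above translate directly into code. The following is a minimal sketch with clocks as dictionary keys and $\lambda$ flattened to a map from automaton names to current locations; the helper for $\lambda \models \Phi$ handles only conjunctions of atoms $l_A = v_A$, a simplifying assumption (general Boolean combinations in $\Theta(L)$ would need a small formula evaluator).

```python
from typing import Dict, Iterable, Tuple

Valuation = Dict[str, float]   # nu: clock -> value
Locations = Dict[str, str]     # lambda, flattened: automaton -> current location


def delay(nu: Valuation, t: float) -> Valuation:
    """nu + t: every clock advances by t."""
    return {c: v + t for c, v in nu.items()}


def reset(nu: Valuation, clocks: Iterable[str]) -> Valuation:
    """nu[R -> 0]: clocks in R become 0, all other clocks keep their value."""
    r = set(clocks)
    return {c: (0.0 if c in r else v) for c, v in nu.items()}


def models(lam: Locations, conj: Iterable[Tuple[str, str]]) -> bool:
    """lambda |= Phi for Phi a conjunction of atoms l_A = v_A, given as (A, v_A) pairs."""
    return all(lam[a] == v for a, v in conj)


nu = {"cA": 1.5, "cB": 0.0}
print(reset(delay(nu, 2.0), ["cA"]))                  # {'cA': 0.0, 'cB': 2.0}
print(models({"A": "on", "B": "off"}, [("A", "on")]))  # True
```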

**Definition 6 (Trace of a TASV)** *A* trace $\tau$ *of a TASV* $\mathcal{A} = \langle V_{\mathcal{A}}, v^0_{\mathcal{A}}, l_{\mathcal{A}}, C_{\mathcal{A}}, \mathit{inv}^{cl}_{\mathcal{A}}, \mathit{inv}^{loc}_{\mathcal{A}}, T_{\mathcal{A}}\rangle$ *is a (finite or infinite) sequence of* states *of the form:*

$$\langle v\_0, \nu\_0, \lambda\_0 \rangle \xrightarrow{\alpha\_1} \langle v\_1, \nu\_1, \lambda\_1 \rangle \xrightarrow{\alpha\_2} \langle v\_2, \nu\_2, \lambda\_2 \rangle \xrightarrow{\alpha\_3} \dots$$

*such that* $v_i \in V_{\mathcal{A}}$*,* $\alpha_i \in \mathbb{R} \cup \{\tau\}$*,* $\nu_i \in \mathcal{V}_{C_{\mathcal{A}}}$ *and* $\lambda_i \in \mathcal{V}_L$ *for all* $i \ge 0$*, and:*

- *(timed transition) if* $\alpha_{i+1} \in \mathbb{R}$*, then* $v_{i+1} = v_i$*,* $\nu_{i+1} = \nu_i + \alpha_{i+1}$*,* $\nu_i + \delta \models \mathit{inv}^{cl}_{\mathcal{A}}(v_i)$ *for all* $0 \le \delta \le \alpha_{i+1}$*, and* $\lambda_{i+1}(l_{\mathcal{A}}) = v_i$*;*
- *(discrete transition) if* $\alpha_{i+1} = \tau$*, then there is a tuple* $(v_i, R_i, \Xi_i, \Phi_i, v_{i+1}) \in T_{\mathcal{A}}$ *such that:* $\nu_i \models \mathit{inv}^{cl}_{\mathcal{A}}(v_i) \wedge \Xi_i$*;* $\lambda_i \models \Phi_i$*;* $\nu_{i+1} = \nu_i[R_i \mapsto 0]$*;* $\nu_{i+1} \models \mathit{inv}^{cl}_{\mathcal{A}}(v_{i+1})$*;* $\lambda_{i+1}(l_{\mathcal{A}}) = v_{i+1}$*, and* $\lambda_{i+1} \models \mathit{inv}^{loc}_{\mathcal{A}}(v_{i+1})$*.*
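The two transition rules of Definition 6 can be sketched as step functions for a single TASV. The encodings are hypothetical: clock invariants and guards are lists of `(clock, lo, hi)` interval constraints, location guards and invariants are conjunctions of `(automaton, location)` atoms, and a transition is `(src, resets, guard, loc_guard, dst)`. Because interval constraints are convex and clocks grow linearly with δ, checking the invariant at δ = 0 and δ = α suffices for all 0 ≤ δ ≤ α. The example uses automaton B of Fig. 2, assuming A is already in location on.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[str, float, float]                      # lo <= clock <= hi
State = Tuple[str, Dict[str, float], Dict[str, str]]     # (location v, clocks nu, locations lambda)


def clocks_ok(nu: Dict[str, float], cons: List[Interval]) -> bool:
    return all(lo <= nu[c] <= hi for c, lo, hi in cons)


def timed_step(st: State, alpha: float, inv_cl, me: str) -> Optional[State]:
    v, nu, lam = st
    later = {c: x + alpha for c, x in nu.items()}
    # Convex invariant: both endpoints satisfied => every intermediate delay satisfied.
    if alpha < 0 or not (clocks_ok(nu, inv_cl[v]) and clocks_ok(later, inv_cl[v])):
        return None
    return (v, later, {**lam, me: v})


def discrete_step(st: State, tr, inv_cl, inv_loc, me: str) -> Optional[State]:
    v, nu, lam = st
    src, resets, guard, loc_guard, dst = tr
    if src != v or not clocks_ok(nu, inv_cl[v] + guard):
        return None                                  # invariant of v and clock guard Xi
    if not all(lam[a] == q for a, q in loc_guard):
        return None                                  # location guard Phi
    nu2 = {c: (0.0 if c in resets else x) for c, x in nu.items()}
    lam2 = {**lam, me: dst}
    if not clocks_ok(nu2, inv_cl[dst]) or not all(lam2[a] == q for a, q in inv_loc[dst]):
        return None                                  # invariants of the target location
    return (dst, nu2, lam2)


inv_cl = {"off": [("cB", 0.0, 4.0)], "on": []}
inv_loc = {"off": [], "on": [("A", "on")]}
to_on = ("off", {"cB"}, [("cB", 2.0, 4.0)], [("A", "on")], "on")
s0 = ("off", {"cB": 0.0}, {"A": "on", "B": "off"})
s1 = timed_step(s0, 3.0, inv_cl, "B")
print(timed_step(s0, 5.0, inv_cl, "B"))               # None: c_B <= 4 would be violated
print(discrete_step(s1, to_on, inv_cl, inv_loc, "B"))
```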

**Definition 7 (Product of TASVs)** *Given two TASVs* $\mathcal{A}$ *and* $\mathcal{B}$*, their product is the TASV* $\mathcal{A} \otimes \mathcal{B}$ *defined as follows:*


$$\begin{aligned} T\_{\mathcal{A}\otimes \mathcal{B}} &= \{ ((v, u), R, \Xi, \Phi, (v', u)) \mid (v, R, \Xi, \Phi, v') \in T\_{\mathcal{A}} \} \cup \\ &\{ ((v, u), R, \Xi, \Phi, (v, u')) \mid (u, R, \Xi, \Phi, u') \in T\_{\mathcal{B}} \} \end{aligned}$$
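The transition relation of the product is an interleaving: each transition of one component moves only its own coordinate of the location pair and keeps resets and guards unchanged. A minimal sketch, with clock and location guards kept as opaque strings (an assumption for brevity), applied to the two automata of Fig. 2:

```python
from typing import FrozenSet, Iterable, Set, Tuple

Transition = Tuple[object, FrozenSet[str], str, str, object]  # (src, resets R, Xi, Phi, dst)


def product_transitions(locs_a: Iterable[str], trans_a: Set[Transition],
                        locs_b: Iterable[str], trans_b: Set[Transition]) -> Set[Transition]:
    """T of the product: A moves while B stays in u, or B moves while A stays in v."""
    locs_a, locs_b = list(locs_a), list(locs_b)
    out: Set[Transition] = set()
    for v, r, xi, phi, v2 in trans_a:
        for u in locs_b:
            out.add(((v, u), r, xi, phi, (v2, u)))
    for u, r, xi, phi, u2 in trans_b:
        for v in locs_a:
            out.add(((v, u), r, xi, phi, (v, u2)))
    return out


t_a = {("off", frozenset({"cA"}), "3 <= cA <= 6", "true", "on")}
t_b = {("off", frozenset({"cB"}), "2 <= cB <= 4", "lA = on", "on")}
prod = product_transitions(["off", "on"], t_a, ["off", "on"], t_b)
print(len(prod))  # 4: A can move with B in off or on, and vice versa
```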

It is worth noting that each TASV corresponds to a timed automaton defined in the standard way [1], and vice versa. We now define the TASV corresponding to a local requirement.

**Definition 8 (TASV for a Local Requirement)** *Let* $C = P^C_1, \ldots, P^C_n$ *be a local requirement. We define the corresponding TASV* $\mathcal{A} = \langle V_{\mathcal{A}}, v^0_{\mathcal{A}}, l_{\mathcal{A}}, C_{\mathcal{A}}, \mathit{inv}^{cl}_{\mathcal{A}}, \mathit{inv}^{loc}_{\mathcal{A}}, T_{\mathcal{A}}\rangle$ *as follows:*


*where* $\Phi_P := \phi_P[(d, j) \mapsto (l_d = v_j)]$*, for each phase* $P$ *(and similarly for* $\Psi_P$*);*

Fig. 2: Example of TASV corresponding to a local requirement.

*Example.* Consider Fig. 1a: the corresponding TASVs are depicted in Fig. 2. Each phase of each local requirement corresponds to a location of the corresponding TASV; in the example, phase *Off* is mapped into location off. The first locations of automata $A$ and $B$ carry the invariants $c_A \le 6$ and $c_B \le 4$, respectively. Automaton $A$ proceeds to location on (corresponding to phase $A.\mathit{On}$) through a transition labelled with the clock constraint $3 \le c_A \le 6$ and the clock reset $c_A := 0$. Since the second phase of local requirement $A$ has no dependencies, the transition to on has no constraints on the location variables. The situation is different for automaton $B$, whose transition to on is labelled with $2 \le c_B \le 4$ and $c_B := 0$, and also with $\psi_{on} := (l_A = on)$, that is, the state dependency of phase $B.\mathit{On}$; moreover, since it is a state dependency, $\psi_{on}$ is also an invariant of the second location of automaton $B$.
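Since the itemized components of Definition 8 are not reproduced here, the following sketch only follows the example above and should be read as an assumption rather than as the exact definition: each phase becomes a location; the upper bound of the next phase becomes a clock invariant of the current location; the transition to the next phase carries that phase's interval as a clock guard, resets the clock, and carries the target's signal and state dependencies as a location guard; and the state dependency of a phase is also an invariant of its location. Dependencies are restricted to conjunctions of `(component, phase)` atoms, and all names are hypothetical.

```python
from math import inf
from typing import Dict, List, Tuple

Atom = Tuple[str, str]                                        # <D, Q>
PhaseSpec = Tuple[str, float, float, List[Atom], List[Atom]]  # (name, lower, upper, signal, state)


def tasv_for(name: str, phases: List[PhaseSpec]) -> Dict[str, object]:
    clock = f"c{name}"
    inv_cl: Dict[str, list] = {}
    inv_loc: Dict[str, list] = {}
    transitions = []
    for i, (loc, _, _, _, state) in enumerate(phases):
        inv_loc[loc] = [(f"l{d}", q) for d, q in state]        # Psi_P as a location invariant
        nxt = phases[i + 1] if i + 1 < len(phases) else None
        if nxt is None or nxt[2] == inf:
            inv_cl[loc] = []                                     # no deadline to leave this location
        else:
            inv_cl[loc] = [(clock, "<=", nxt[2])]                # leave by u of the next phase
        if nxt is not None:
            target, lo, hi, signal, nstate = nxt
            guard = [(clock, ">=", lo)] + ([] if hi == inf else [(clock, "<=", hi)])
            loc_guard = [(f"l{d}", q) for d, q in signal + nstate]   # Phi and Psi of the target
            transitions.append((loc, {clock}, guard, loc_guard, target))
    return {"locations": [p[0] for p in phases], "initial": phases[0][0], "clock": clock,
            "inv_cl": inv_cl, "inv_loc": inv_loc, "transitions": transitions}


b = tasv_for("B", [("off", 0, inf, [], []), ("on", 2, 4, [], [("A", "on")])])
print(b["inv_cl"]["off"], b["transitions"][0][2:4])
```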

Given a network $S := \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$ of TASVs, the problem of *consistency checking* can be expressed as the reachability of location $(\mathcal{A}_1.\mathit{last}, \ldots, \mathcal{A}_n.\mathit{last}) \in V_S$. A *deadlock* of a TASV $\mathcal{A}$ is a state $(v, t) \in V_{\mathcal{A}} \times \mathbb{R}$ from which $\mathcal{A}$ can take neither a timed nor a discrete transition. We call *livelock* a state $(v, t)$ from which $\mathcal{A}$ can take only timed transitions. The *compatibility checking* problem can be expressed as the problem of checking whether there exists a trace of $S$ such that either (i) the trace is finite and its final state is a deadlock of $S$ (we can check this property by adding to $S$ a *sink* location to which all locations can transition, and by checking its reachability), or (ii) the trace is infinite and there exist a location $v \in V_S$ and a point $k \ge 0$ such that $l_S = v \neq (\mathcal{A}_1.\mathit{last}, \ldots, \mathcal{A}_n.\mathit{last})$ for all the states after $k$ in the trace, where the $i$-th component of $v$, together with the time of the current state, is a *livelock* for automaton $\mathcal{A}_i$, for some $1 \le i \le n$. The second point is fundamental for local requirements featuring *infinite bounds*: in these automata, it is not sufficient to check for deadlocks, since a timed transition could always be enabled; instead, an illegal state can be described by a trace of the system that reaches a *livelock* whose location has no invariants attached and then stays in this location forever. Having reached a livelock, the automaton can proceed only with timed moves: in particular, it cannot proceed to the next location because its dependencies are violated.
We can check the second point as follows. We first add a sink location $\mathit{sink}^{\mathcal{A}_i}_v$ for each location $v \in \mathcal{A}_i$ (together with a transition from $v$ to it), for each $1 \le i \le n$, and we attach to it the invariant $\neg \mathit{inv}^{loc}_{\mathcal{A}_i}(v)$. Then, in the product $S$ of these modified automata, we look for a trace that, from some time point onwards, stays in a location $(l_1, \ldots, l_n)$ such that at least one $l_i$ is a sink location. This property can be formalized in *Linear Temporal Logic* as $\mathbf{F}\,\mathbf{G}\big(\bigvee_{1 \le i \le n,\, v \in \mathcal{A}_i} \mathit{sink}^{\mathcal{A}_i}_v\big)$.

#### **3.2 Encoding into SMT(DL)**

We describe the encoding into SMT(DL) (Satisfiability Modulo the Theory of Difference Logic) for the problems of *consistency checking* and *compatibility checking*. For all $1 \le c \le n$ and $1 \le i \le |c|$, we introduce the following variables: (i) $r^c_i \in \mathbb{B}$ represents the fact that phase $i$ of local requirement $c$ is *reachable*; (ii) $s^c_i = (t^c_i, p^c_i)$ represents the superdense time instant at which local requirement $c$ enters phase $i$, where $t^c_i \in \mathbb{R}$ and $p^c_i \in \mathbb{N}$. We compare two superdense-valued variables $(t, p)$ and $(t', p')$ with the lexicographic order, defined as $(t, p) \preceq (t', p')$ iff $t \le t' \wedge (t = t' \rightarrow p \le p')$; we write $(t, p) \prec (t', p')$ iff $(t, p) \preceq (t', p')$ and $(t, p) \neq (t', p')$. We now give the set of (conjunctively related) constraints that form our SMT(DL) encoding.
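The lexicographic order on superdense time stamps is small enough to state as code. The sketch below writes $\preceq$ exactly as defined above together with its strict part $\prec$; Python's built-in tuple comparison is already lexicographic, which the sketch uses as a cross-check.

```python
from itertools import product
from typing import Tuple

Superdense = Tuple[float, int]   # (t, p): real time t and step index p


def sd_le(a: Superdense, b: Superdense) -> bool:
    """(t, p) <= (t', p') iff t <= t' and (t == t' implies p <= p')."""
    (t, p), (t2, p2) = a, b
    return t <= t2 and (t != t2 or p <= p2)


def sd_lt(a: Superdense, b: Superdense) -> bool:
    """Strict version: a <= b and a != b."""
    return sd_le(a, b) and a != b


stamps = [(0.0, 0), (0.0, 1), (1.5, 0), (1.5, 2)]
assert all(sd_le(a, b) == (a <= b) and sd_lt(a, b) == (a < b) for a, b in product(stamps, repeat=2))
print(sd_lt((1.5, 0), (1.5, 2)), sd_le((1.5, 2), (0.0, 1)))  # True False
```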

*Initialization.* Each local requirement starts in its first phase at the same time, *i.e.*, at the real time point 0. Hence, for all $1 \le c \le n$, we add the constraint $t^c_0 = 0$.

*Reachability.* For all local requirements $c$ and all phases $i$, if phase $i - 1$ is not reachable, then neither is phase $i$, *i.e.*, $\neg r^c_{i-1} \rightarrow \neg r^c_i$. Moreover, we require monotonicity over time, *i.e.*, $r^c_i \rightarrow (s^c_{i-1} \prec s^c_i)$.

*Bounds.* For all local requirements $c$ and all phases $i$, $c$ can move to $i$ only if it respects the bounds $[l^c_i, u^c_i]$ of phase $i$, namely $r^c_i \rightarrow (l^c_i \le t^c_i - t^c_{i-1} \le u^c_i)$. If $u^c_i = \infty$, we add only the left-most inequality.
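A minimal sketch of how the Initialization, Reachability and Bounds constraints could be emitted as SMT-LIB text for one local requirement. To stay in plain real arithmetic it uses the strict semantics described later in this section ($\prec$ read as $<$, each $s^c_i$ a single real $t^c_i$). The variable names such as `r_1_2`, the convention that phase 0 is the initial phase and is asserted reachable, and the function name `chain_constraints` are our assumptions, not TRICker's code.

```python
import math
from typing import List, Tuple


def chain_constraints(c: str, bounds: List[Tuple[float, float]]) -> List[str]:
    """SMT-LIB assertions for one local requirement c (strict semantics).

    bounds[i - 1] = (l_i, u_i) for phases i = 1..k; phase 0 is the initial
    phase. u_i may be math.inf. Bounds are assumed to print as plain decimals.
    """
    k = len(bounds)
    r = [f"r_{c}_{i}" for i in range(k + 1)]   # reachability flags r^c_i
    t = [f"t_{c}_{i}" for i in range(k + 1)]   # entry times t^c_i
    out = [f"(declare-const {v} Bool)" for v in r]
    out += [f"(declare-const {v} Real)" for v in t]
    out.append(f"(assert {r[0]})")            # assumption: phase 0 is reachable
    out.append(f"(assert (= {t[0]} 0.0))")    # Initialization: t^c_0 = 0
    for i in range(1, k + 1):
        lo, hi = bounds[i - 1]
        # Reachability: if phase i-1 is unreachable, so is phase i.
        out.append(f"(assert (=> (not {r[i - 1]}) (not {r[i]})))")
        # Monotonicity: phase i is entered strictly after phase i-1.
        out.append(f"(assert (=> {r[i]} (< {t[i - 1]} {t[i]})))")
        # Bounds: l_i <= t_i - t_{i-1} <= u_i (upper part only if finite).
        delta = f"(- {t[i]} {t[i - 1]})"
        conj = [f"(<= {float(lo)} {delta})"]
        if not math.isinf(hi):
            conj.append(f"(<= {delta} {float(hi)})")
        body = conj[0] if len(conj) == 1 else f"(and {' '.join(conj)})"
        out.append(f"(assert (=> {r[i]} {body}))")
    return out
```

Printing `"\n".join(chain_constraints("1", [(1, 3), (0, math.inf)]))` followed by `(check-sat)` yields a script a difference-logic solver can read; a single-literal bound is emitted without a wrapping `and`, since SMT-LIB's `and` expects at least two arguments.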

*Signal and State dependencies.* Consider a local requirement $c$ and one of its phases $i$. Since we have only a *finite* number of phases, we can preprocess both signal and state dependencies to remove all negations from them, as explained in the extended version of the paper<sup>4</sup>; this means that every atom in $\phi^c_i$ and $\psi^c_i$ occurs positively.

We want $c$ to reach $i$ only if all its signal and state dependencies are satisfied. For signal dependencies, we require the time point at which $c$ enters $i$ to be strictly greater than<sup>5</sup> the time point at which the target phase is entered, and smaller than or equal to the time point at which the target phase is exited.

$$r\_i^c \rightarrow \phi\_i^c[(d, j) \mapsto (r\_j^d \land s\_j^d \prec s\_i^c \preceq s\_{j+1}^d)],$$

Moreover, we have to guarantee that the state dependencies hold as well. In particular, if phase $i$ is reachable, then the time point at which $c$ enters $i$ must be strictly greater than the time point at which the other local requirement reaches the target phase.

$$r\_i^c \rightarrow \psi\_i^c[(d, j) \mapsto (r\_j^d \wedge s\_j^d \prec s\_i^c)],$$
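Both substitutions $\phi^c_i[(d, j) \mapsto \ldots]$ and $\psi^c_i[(d, j) \mapsto \ldots]$ amount to replacing every atom of a negation-free formula. The following is a minimal sketch under the strict semantics ($\prec$ read as $<$, $\preceq$ as $\le$); the nested-tuple representation of formulas and the SMT-LIB names are our own hypothetical choices, and the sketch assumes that phase $j + 1$ of $d$ exists whenever $(d, j)$ occurs in a signal dependency.

```python
def substitute(formula, atom_to_smt):
    """Replace every atom ("atom", d, j) of a negation-free formula.

    formula is ("atom", d, j), ("and", f1, ..., fn) or ("or", f1, ..., fn).
    """
    op = formula[0]
    if op == "atom":
        _, d, j = formula
        return atom_to_smt(d, j)
    if op not in ("and", "or"):
        raise ValueError(f"unexpected connective {op!r}")
    args = [substitute(g, atom_to_smt) for g in formula[1:]]
    if not args:                      # empty conjunction / disjunction
        return "true" if op == "and" else "false"
    return args[0] if len(args) == 1 else f"({op} {' '.join(args)})"


def signal_dep(c, i, phi):
    """r_c_i -> phi[(d, j) -> r_d_j and t_d_j < t_c_i <= t_d_(j+1)]."""
    def atom(d, j):
        return (f"(and r_{d}_{j} (< t_{d}_{j} t_{c}_{i}) "
                f"(<= t_{c}_{i} t_{d}_{j + 1}))")
    return f"(assert (=> r_{c}_{i} {substitute(phi, atom)}))"


def state_dep(c, i, psi):
    """r_c_i -> psi[(d, j) -> r_d_j and t_d_j < t_c_i]."""
    def atom(d, j):
        return f"(and r_{d}_{j} (< t_{d}_{j} t_{c}_{i}))"
    return f"(assert (=> r_{c}_{i} {substitute(psi, atom)}))"
```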

Since state dependencies are invariant properties, *i*.*e*., they have to hold at every time instant during which a local requirement is in a particular phase, if a state dependency is violated at some time point of phase $i - 1$, then phase $i$ is not reachable. Contrapositively, if phase $i$ is reachable, then the state dependencies of phase $i - 1$ must hold throughout phase $i - 1$, namely:

$$r\_i^c \to \forall \tilde{s} (s\_{i-1}^c \preceq \tilde{s} \preceq s\_i^c \to \psi\_{i-1}^c [(d, j) \mapsto (r\_j^d \wedge s\_j^d \prec \tilde{s} \preceq s\_{j+1}^d)]) \tag{1}$$

*Illegal States.* If phase $i$ of local requirement $c$ is not reachable, *i*.*e*., $i$ is an *illegal state*, then there exists a time point $s^c_{ill}$ such that, for all subsequent (remaining) time points $\bar{s}$ between $s^c_{ill}$ and the upper bound of the transition, at least one dependency is not satisfied.

$$(r\_{i-1}^c \land \neg r\_i^c) \rightarrow \exists s\_{ill}^c \forall \bar{s} (s\_{ill}^c \preceq \bar{s} \preceq s\_{i-1}^c + u\_{i-1}^c \rightarrow \text{VIOLATION}(\bar{s})) \tag{2}$$

<sup>4</sup> http://users.dimi.uniud.it/~luca.geatti/tricker.html

<sup>5</sup> This allows us to model the observability of the events: c first observes d entering its phase j and then moves.

where

$$\text{VIOLATION}(\bar{s}) := \neg \phi\_i^c[(d, j) \mapsto (r\_j^d \land s\_j^d \prec \bar{s} \preceq s\_{j+1}^d)] \lor \tag{3}$$

$$\neg \psi\_i^c[(d, j) \mapsto (r\_j^d \land s\_j^d \prec \bar{s})] \lor \tag{4}$$

$$\exists \tilde{s} (s\_{i-1}^c \preceq \tilde{s} \preceq \bar{s} \land \neg \psi\_{i-1}^c[(d, j) \mapsto (r\_j^d \land s\_j^d \prec \tilde{s} \preceq s\_{j+1}^d)]) \tag{5}$$

We interpret $\bar{s} \preceq s^c_{i-1} + u^c_i$ as $\forall \bar{p}\,(\bar{s} \preceq s^c_{i-1} + (u^c_i, \bar{p}))$ and the $+$ symbol as the pairwise sum. If the upper bound of the transition is infinite, we simply do not add the inequality $\bar{s} \preceq s^c_{i-1} + u^c_i$. We refer to the conjunction of all these constraints as $W$.

For *consistency checking*, we define $END := \bigwedge_{1 \le c \le n} r^c_{|c|}$ and we call $W^{cons}$ the conjunction of $W$ with $END$. We check consistency by checking the satisfiability of $W^{cons}$.

For *compatibility checking*, we define $ILL := \bigvee_{1 \le c \le n} \bigvee_{1 \le i \le |c|} \neg r^c_i$ and we call $W^{ill}$ the conjunction of $W$ with $ILL$. We check the existence of an illegal state in the system by checking the satisfiability of $W^{ill}$, *i*.*e*., $W^{ill}$ is satisfiable iff the local requirements are *not* compatible.
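The two top-level queries differ only in the conjunct added to $W$. The sketch below generates $END$ and $ILL$ as SMT-LIB terms, reading $ILL$ as a disjunction over all phases because the query must witness at least one illegal state; the input format (a map from each requirement to its number of phases $|c|$) and the variable names are our assumptions.

```python
from typing import Dict


def end_formula(phases: Dict[int, int]) -> str:
    """END: every local requirement c reaches its last phase |c|."""
    lits = [f"r_{c}_{k}" for c, k in phases.items()]
    return lits[0] if len(lits) == 1 else f"(and {' '.join(lits)})"


def ill_formula(phases: Dict[int, int]) -> str:
    """ILL: some phase i, 1 <= i <= |c|, of some requirement c is unreachable."""
    lits = [f"(not r_{c}_{i})"
            for c, k in phases.items() for i in range(1, k + 1)]
    return lits[0] if len(lits) == 1 else f"(or {' '.join(lits)})"
```

A solver would then receive the assertions of $W$, one `(assert ...)` of either formula, and a `(check-sat)`: `sat` with $END$ means consistent, while `sat` with $ILL$ means not compatible.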

*Strict Semantics.* In the strict semantics setting, we forbid two events to occur at the same real-time point. For strict semantics, the encoding is equal to $W$ except that we interpret $\prec$ and $\preceq$ as $<$ and $\le$, respectively, and all the $s^c_i$ variables as single real-valued variables $t^c_i \in \mathbb{R}$. We call this encoding $S$ and we define $S^{cons}$ and $S^{ill}$ as above.

*Finite bounds and convex dependencies.* Despite being very close to the problem formalization, the $W$ encoding features a high number of quantifiers, some of them in alternation; therefore, in the general case, it is very burdensome for an SMT solver to first perform quantifier elimination on $W$ and then solve the resulting formula. Nevertheless, if we restrict the type of local requirements we consider, we can remove all the quantifiers from $W$ *upfront*, without resorting to quantifier elimination techniques. Specifically, suppose we consider only local requirements with *finite bounds* and *convex* state dependencies (see Sec. 2). We call $W^{ill}_{fin}$ the encoding equal to $W$ except that Eq. (1) is replaced by:

$$r\_i^c \to \psi\_{i-1}^c[(d, j) \mapsto (r\_j^d \land s\_i^c \preceq s\_{j+1}^d)] \tag{6}$$

and we add the following constraint:

$$(r\_{i-1}^c \land \neg r\_i^c) \to (t\_i^c = t\_{i-1}^c + u\_{i-1}^c) \tag{7}$$

and we replace Eq. (2) with:

$$(r\_{i-1}^c \land \neg r\_i^c) \to \text{WEAKVIOL}(t\_i^c) \tag{8}$$

where:

$$\begin{aligned} \text{WEAKVIOL}(t\_i^c) &:= \neg \phi\_i^c[(d, j) \mapsto (r\_j^d \land t\_j^d \le t\_i^c < t\_{j+1}^d)] \lor \\ &\neg \psi\_i^c[(d, j) \mapsto (r\_j^d \land t\_j^d \le t\_i^c)] \lor \\ &\neg \psi\_{i-1}^c[(d, j) \mapsto (r\_j^d \land t\_i^c \le t\_{j+1}^d)] \end{aligned} \tag{9}$$

We can prove that $W^{ill}$ and $W^{ill}_{fin}$ are equisatisfiable for every set of local requirements with only finite bounds and convex dependencies. Notably, $W^{ill}_{fin}$ contains no quantifiers, which makes this encoding dramatically more efficient than $W$; in Sec. 5, we consider only local requirements of this type. The proofs are reported in the extended version of the paper; since they are somewhat involved, we proceed there incrementally, showing first how to remove the quantifiers upfront in the case of finite bounds with strict semantics, then in the case of weak semantics, and finally in the case of convex dependencies.

## **4 Synthesis**

In this section, we tackle the *synthesis* problem, *i*.*e*., computing the set of *all stronger* local requirements (as defined in Def. 2) of the initial local requirements such that their composition is *compatible*. We solve this problem by reducing it to a *parameter synthesis* problem (see [9] for a more detailed description). Given a local requirement $C$, its corresponding *parametric local requirement* $\langle C, \pi \rangle$ is defined as $C$ (see Sec. 2), except that the bounds $l_P$ and $u_P$ of each phase $P$ are now the parameters $\bar{l}_P$ and $\bar{u}_P$, respectively, and $\pi := \{\bar{l}_P \mid P \text{ is a phase of } C\} \cup \{\bar{u}_P \mid P \text{ is a phase of } C\}$. Given a set of local requirements $S = \{C_1, \ldots, C_n\}$, we write $\langle S, \Pi \rangle$ for its parametric version $\{\langle C_1, \pi_1 \rangle, \ldots, \langle C_n, \pi_n \rangle\}$, where the set of parameters is $\Pi := \bigcup_{i=1}^{n} \pi_i$. A parameter valuation $\gamma : \Pi \rightarrow \mathbb{Q}$ assigns a rational value to each parameter; moreover, for each $1 \le i \le n$, it induces a (concrete) local requirement $\langle C_i, \gamma(\pi_i) \rangle$, obtained from $\langle C_i, \pi_i \rangle$ by replacing every parameter $p \in \pi_i$ with the concrete value $\gamma(p)$. In the same way, we define the *concrete* version $\langle S, \gamma(\Pi) \rangle$ of $\langle S, \Pi \rangle$. We say that $\gamma$ is *feasible for* $S$ if $\langle C_i, \gamma(\pi_i) \rangle$ is a *stronger* local requirement of $C_i$ for all $1 \le i \le n$, and $\langle S, \gamma(\Pi) \rangle$ is *compatible*. The *feasible region* is the set $\mathcal{R} := \{\gamma \mid \gamma \text{ is feasible for } S\}$.
As for verification, we can either use parameter synthesis algorithms for timed automata [3] or reduce the problem to SMT(LRA); we focus on the latter. In particular, we synthesize a symbolic representation of the region $\mathcal{R}$, namely an SMT formula $\varphi_{\mathcal{R}}$ such that, for each valuation $\gamma$, $\gamma \models \varphi_{\mathcal{R}}$ iff $\gamma \in \mathcal{R}$.

Let $\overline{W^{ill}}$ be the encoding equal to $W^{ill}$ except that each number $l^c_i$ (resp. $u^c_i$) is replaced with the variable $\bar{l}^c_i$ (resp. $\bar{u}^c_i$) and each phase is required to have finite bounds. We define the sets of variables $\mathbb{R} := \{r^c_i \mid c \in S, i \text{ is a phase of } c\}$ and $\mathbb{S} := \{s^c_i \mid c \in S, i \text{ is a phase of } c\}$: these are the variables we are going to remove by means of quantifier elimination. Finally, we define:

$$\begin{aligned} \text{DOMAIN} &:= \bigwedge\_{\substack{1 \leq c \leq n \\ 1 \leq i \leq |c|}} \left( \bar{l}\_{i}^{c} \geq 0 \wedge \bar{l}\_{i}^{c} \leq \bar{u}\_{i}^{c} \right) \\ \text{REFINE} &:= \bigwedge\_{\substack{1 \leq c \leq n \\ 1 \leq i \leq |c| \\ u\_{i}^{c} \neq \infty}} \left( a\_{i}^{c} \leq \bar{l}\_{i}^{c} \wedge \bar{u}\_{i}^{c} \leq b\_{i}^{c} \right) \end{aligned}$$
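Of the three conjuncts of SYNTH in Eq. (10), only DOMAIN and REFINE can be evaluated directly on a candidate valuation $\gamma$; the negated existential part needs quantifier elimination or a solver call and is not attempted here. A minimal sketch, using exact rationals since $\gamma : \Pi \rightarrow \mathbb{Q}$; the dictionary layout is our assumption.

```python
from fractions import Fraction
from typing import Dict, Optional, Tuple

Key = Tuple[int, int]  # (c, i): phase i of local requirement c


def domain_and_refine(valuation: Dict[Key, Tuple[Fraction, Fraction]],
                      outer: Dict[Key, Optional[Tuple[Fraction, Fraction]]]) -> bool:
    """Evaluate DOMAIN and REFINE on a parameter valuation gamma.

    valuation[(c, i)] = (value of lbar_i^c, value of ubar_i^c).
    outer[(c, i)] = (a_i^c, b_i^c), or None when u_i^c is infinite, in which
    case REFINE contributes no conjunct for that phase.
    """
    for key, (lbar, ubar) in valuation.items():
        if not (lbar >= 0 and lbar <= ubar):          # DOMAIN
            return False
        ab = outer[key]
        if ab is not None:
            a, b = ab
            if not (a <= lbar and ubar <= b):         # REFINE
                return False
    return True
```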

The symbolic representation of the *feasible region* R is given by:

$$\text{SYNTH} := \text{DOMAIN} \land \text{REFINE} \land \neg \exists \mathbb{S}, \mathbb{R}\left(\overline{\text{W}^{\text{ill}}}\right) \tag{10}$$

By removing the existential quantification over $\mathbb{S}$ and $\mathbb{R}$ (which can be done by means of quantifier elimination techniques), we obtain a quantifier-free formula over the variables in $\Pi$. By construction, each model $\gamma$ of SYNTH is a feasible valuation, and vice versa. Therefore, SYNTH is a symbolic representation of the feasible region $\mathcal{R}$.

## **5 Experimental Evaluation**

We implemented the encoding described in Sec. 3.2 in a tool called TRICker (Timing Requirements Integration Checker) <sup>6</sup>, which uses MathSAT [12] as the backend SMT engine. We compared TRICker with Uppaal [6] and Timed-nuXmv [8], both using the automata-based encoding described in Sec. 3.1.

The test set is partitioned into three categories: (i) bounded convex contains only systems with finite bounds and convex state dependencies; (ii) bounded contains systems with only finite bounds, but with arbitrary dependencies (not necessarily convex); (iii) general contains systems with infinite bounds and arbitrary dependencies (this is the most general fragment). Each category in turn consists of ca. 500 randomly generated systems, divided into 10 sub-categories, namely 2c3p, 2c15p, 5c3p, 5c20p, 10c4p, 10c30p, 50c5p, 50c30p, 100c3p and 100c10p, where NcMp is the sub-category containing only systems with N components and (approximately) M phases per component. Inside each sub-category, each benchmark is randomly generated, meaning that the exact number of phases of each component and the density of its signal and state dependencies were chosen uniformly at random. For each benchmark, we compare the time spent by the three tools on the *consistency checking* and *compatibility checking* problems. We ran the experiments on a cluster of Linux machines with a 2.27 GHz Xeon CPU, with a timeout of 360 seconds per instance.

We consider first the bounded convex category. Fig. 3 compares TRICker with Timed-nuXmv and Uppaal on the two verification problems. In both cases, Timed-nuXmv runs the infinite-state variant of IC3 described in [11] after discretizing the timed automata. For Uppaal, we verify a property of the form EFϕ, where ϕ is a Boolean formula. For both problems, the SMT-based approach implemented in TRICker outperforms the model checkers. While there are a number of instances on which the model checkers perform better than TRICker (especially Uppaal), the latter overall solves a significantly larger number of problems within the timeout, showing a clear improvement in scalability. This can also be seen in the survival plots comparing the three tools with the Virtual Best Solver (*vbs* for short). Similar considerations apply to the bounded and general categories, shown in Fig. 4 and Fig. 5, respectively. (Note that for the general case we could not evaluate Uppaal, as it does not support the verification of fairness properties.) We did not observe any correlation between the number of signal or state dependencies in the benchmarks and the time spent by the solver. Finally, Fig. 6 shows the correlation between the memory (in MB) and the time (in seconds) used by TRICker on consistency and compatibility checking, respectively.

<sup>6</sup> http://users.dimi.uniud.it/~luca.geatti/tricker.html

Fig. 3: Comparison on the bounded convex category (*consistency checking* on the first row and *compatibility checking* on the second).
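The Virtual Best Solver and the survival plots mentioned in the evaluation can be computed from per-instance runtimes as sketched below. This is one common construction, not necessarily the exact one behind the figures; timeouts are encoded as `math.inf`, and the 360-second limit matches the experimental setup.

```python
import math
from typing import Dict, List

TIMEOUT = 360.0  # seconds, as in the experimental setup


def virtual_best(times: Dict[str, List[float]]) -> List[float]:
    """Per-instance minimum runtime over all tools (math.inf = timeout)."""
    per_tool = list(times.values())
    return [min(col) for col in zip(*per_tool)]


def survival_curve(runtimes: List[float], timeout: float = TIMEOUT) -> List[float]:
    """Sorted runtimes of the solved instances: the k-th value is the time
    within which k instances are solved."""
    return sorted(t for t in runtimes if t <= timeout)
```

Plotting `survival_curve` for each tool and for `virtual_best` against the instance count gives one curve per solver; the vbs curve bounds all the others from below.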

We also evaluated the parameter synthesis algorithm described in Sec. 4. Since Uppaal currently does not support parameter synthesis for timed automata, we could not include it in the comparison. We therefore compared TRICker with Timed-nuXmv, for which we used the ParamIC3 parameter synthesis algorithm described in [9]. The algorithm is based on the inverse method, *i*.*e*., it finds a bad configuration of the parameters and tries to generalize it, maximizing the set of bad parameters removed from the current approximation of the region. We took all the consistent benchmarks of the previous test sets, which amount to approximately 100 instances (note that for each instance of the class NcMp, the number of parameters is $\approx 2 \cdot N \cdot M$<sup>7</sup>). The results of the comparison are shown in Fig. 7; as in the previous cases, TRICker shows better performance and scalability than ParamIC3, though there are several instances for which synthesis via quantifier elimination is still very expensive.

<sup>7</sup> Recall that both the lower and the upper bounds are parameters.

Fig. 4: Comparison on the bounded category (*consistency checking* on the first row and *compatibility checking* on the second).

## **6 Conclusions**

In this paper, we defined verification and synthesis problems of industrial relevance focused on the decomposition of startup requirements into local timing constraints and dependencies on components. Namely, we addressed the problem of checking if the local requirements are free of integration errors (i.e., consistent and compatible), and the problem of synthesizing the region of refinements of the original specification that are error free. The problem can be naturally translated into model checking and synthesis problems for timed automata with shared variables. Exploiting the structure of the requirements, we provide an encoding into SMT where consistency and incompatibility correspond to satisfiability queries, while synthesis is resolved by means of quantifier elimination.

In the future, we will consider various directions, such as extending the applicability of the approach to more general structures with loops, enriching the synthesis problem with cost functions to repair the specification driven by specific industrial goals, and considering more complex representations of signals exchanged between components.

Fig. 5: Comparison on the general category (*consistency checking* on the first row and *compatibility checking* on the second).

Fig. 6: Comparison between time and memory consumption of TRICker (*consistency checking* on the left and *compatibility checking* on the right).

Fig. 7: Comparison on parameter synthesis.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### Multi-Agent Safety Verification using Symmetry Transformations-

Hussein Sibai, Navid Mokhlesi, Chuchu Fan, and Sayan Mitra {sibai2,navidm2,cfan10,mitras}@illinois.edu

University of Illinois, Urbana IL 61801, USA

Abstract. We show that symmetry transformations and caching can enable scalable, and possibly unbounded, verification of multi-agent systems. Symmetry transformations map any solution of the system to another solution. We show that this property can be used to transform cached reachsets to compute new reachsets, for hybrid and multi-agent models. We develop a notion of a *virtual system* which defines symmetry transformations for a broad class of agent models that visit waypoint sequences. Using this notion of a virtual system, we present a prototype tool CacheReach that builds a cache of reachsets, in a way that is agnostic of the representation of the reachsets and the reachability analysis method used. Our experimental evaluation of CacheReach shows up to 64% savings in safety verification computation time on multi-agent systems with 3-dimensional linear and 4-dimensional nonlinear fixed-wing aircraft models following sequences of waypoints. These savings and our theoretical results illustrate the potential benefits of using symmetry-based caching in the safety verification of multi-agent systems.

## 1 Introduction

As the cornerstone for safety verification of dynamical and hybrid systems, reachability analysis has attracted attention and has delivered automatic analysis of automotive, aerospace, and medical applications [2,24,17,11]. Notable advances from the last few years include the development of the generalized star data-structure [14] and the HyLaa tool [3] which can analyze massive linear models [4]; Taylor model based reachability analysis algorithms for nonlinear systems and their implementations in Flow\* [7]; and a simulation-based algorithm that guarantees locally optimal precision [15].

Exact symbolic reachability analysis of nonlinear models is generally hard. One prominent approach is based on generalizing individual behaviors or simulations to cover a whole set of behaviors. The idea was pioneered in [10] and implemented in Breach [9] with sound generalization guarantees for linear models based on *sensitivity analysis*. Subsequently, the idea has been significantly extended to cover nonlinear, hybrid, and black-box models and it has been implemented in tools like C2E2 and DryVR [12,19,17,16].

In all of the above, a single behavior ξ of the system from an initial state is generalized to a *compact set* of *neighboring* behaviors that contains all the behaviors starting from a small neighborhood around the initial state of ξ. Thus, the computed neighboring set of behaviors always contains ξ, and its size is determined by the algorithms for sensitivity analysis. In contrast, the type of generalization we pursue here uses *symmetry transforms* on the state space. Given a group Γ of operators on the state space and a single behavior ξ, we can generalize ξ to γ(ξ) for each γ ∈ Γ. Symmetry transformations can be applied to sets of behaviors symbolically. Not only can this type of generalization work in conjunction with sensitivity analysis, it also captures structural properties of the system that make behaviors similar in a way that sensitivity analysis does not.

<sup>-</sup> The authors are supported by a research grant from The Boeing Company and a research grant from NSF (CPS 1739966). We would like to thank John L. Olson and Arthur S. Younger from The Boeing Company for valuable technical discussions.

In our recent work [29], we showed how symmetry transforms can be used to produce new reachsets from other previously computed reachsets for *non-parameterized* dynamical systems. In this paper, we introduce the use of symmetry transforms of *parameterized* dynamical systems for safety verification. We present an algorithm symComputeReachtube (Algorithm 1) which caches and reuses reachsets, avoiding repeating expensive computations. We show how an infinite number of reachsets can be obtained by transforming a single one using symmetry transforms (Corollary 2). Building on it, we provide unbounded time safety guarantees using finite cached safety checking results (Theorem 6).

The key contributions of this paper are as follows.

First, we show how symmetry transformations for parameterized dynamical systems can be used to compute reachable states (Theorem 2). Going well beyond the previous theory [29], this enables *cached reachtubes* to be reused for verification across different modes and across multiple agents.

We develop a notion of *virtual system* (Section 4) which automatically defines symmetry transformations for a broad class of hybrid and dynamical systems modeling agents that visit a sequence of waypoints (see Theorem 3 and Examples 3 and 4). That is, reachability analysis of a multi-agent system, possibly with different dynamics and different parameters, can be performed in a common transformed coordinate system, which increases the opportunities for reuse. We show how this principle makes it possible to verify systems over unbounded time and with an infinite number of agents (Theorem 6), provided that no new unproven scenarios appear for the virtual system.

We present a prototype tool, CacheReach, that implements symComputeReachtube. It builds a cache of reachtubes for the virtual system from different sets of initial states. When performing reachability analysis of a multi-agent hybrid or dynamical system, for each agent and each mode, the algorithm proceeds as follows: (1) transform the initial set *X* into an initial set γ(*X*) of the virtual system; (2) if the transformed set γ(*X*) has already been stored in the cache, retrieve the cached reachtube and apply γ<sup>−1</sup> to obtain the actual reachset; (3) otherwise, compute the reachset from γ(*X*) and cache it. Our algorithm symComputeReachtube and its implementation in CacheReach are agnostic of the representation of the reachsets and of the reachability analysis subroutine; therefore, any of the ever-improving reachability libraries can be plugged in for step (3).
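Steps (1)-(3) can be sketched as a small cache wrapper. The reachability back-end `compute_reach` and the helper `symmetry_for`, which returns $\gamma$, $\gamma^{-1}$ and $\rho$ for a mode, are hypothetical placeholders; the 1-D interval sets and the translation symmetry are toy choices, and reuse is only sound when $\gamma$ really is a symmetry of the dynamics in the sense of Section 3. This is not CacheReach's code.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Box = Tuple[float, float]                       # 1-D interval (toy set type)
Tube = List[Tuple[Box, Tuple[float, float]]]    # [(X_i, (tau_{i-1}, tau_i)), ...]


class ReachCache:
    """Transform, look up, and otherwise compute-and-store (steps (1)-(3))."""

    def __init__(self, compute_reach: Callable[[Box, float], Tube],
                 symmetry_for: Callable):
        self.compute_reach = compute_reach      # any reachability back-end
        self.symmetry_for = symmetry_for        # p -> (gamma, gamma_inv, rho)
        self.cache: Dict[Hashable, Tube] = {}
        self.hits = 0

    def reach(self, X: Box, p: float) -> Tube:
        gamma, gamma_inv, rho = self.symmetry_for(p)
        key = (gamma(X), rho(p))                # (1) move to the virtual system
        if key in self.cache:                   # (2) already cached: reuse it
            self.hits += 1
        else:                                   # (3) compute once and cache
            self.cache[key] = self.compute_reach(*key)
        # Map back: apply gamma^{-1} to every set; time stamps are unchanged.
        return [(gamma_inv(Xi), iv) for Xi, iv in self.cache[key]]


def translation(p: float):
    """Toy symmetry: shift coordinates so that the waypoint p is at 0."""
    def gamma(X: Box) -> Box:
        return (X[0] - p, X[1] - p)

    def gamma_inv(X: Box) -> Box:
        return (X[0] + p, X[1] + p)

    def rho(q: float) -> float:
        return q - p
    return gamma, gamma_inv, rho
```

With `translation`, two agents whose initial intervals sit at the same offset from their own waypoints share a single back-end call.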

Our experimental evaluation of CacheReach shows safety verification computation time savings of up to 64% on scenarios with multiple agents with 3-dimensional linear and 4-dimensional nonlinear fixed-wing aircraft models following sequences of waypoints. These savings illustrate the potential benefits of using symmetry transformations and caching in the safety verification of multi-agent systems.

## 2 Model and problem statement

*Notations.* We denote by $\mathbb{N}$, $\mathbb{R}$, and $\mathbb{R}_{\ge 0}$ the sets of natural numbers, real numbers, and non-negative reals, respectively. Given a finite set $S$, its cardinality is denoted by $|S|$. Given $N \in \mathbb{N}$, we denote by $[N]$ the set $\{1, \ldots, N\}$. Given a vector $v \in \mathbb{R}^n$ and a set $L \subseteq [n]$, we denote the projection of $v$ onto the indices in $L$ by $v[L]$. We represent an $n$-dimensional hyper-rectangle by a 2d-array specifying its bottom-left and upper-right corners. We denote the projection of a hyper-rectangle $H$ onto the set of dimensions $L$ by $H[L]$. Given a function $\gamma : \mathbb{R}^k \rightarrow \mathbb{R}^k$ and a set $S \subseteq \mathbb{R}^k$, we abuse notation and define $\gamma(S) = \{\gamma(x) \mid x \in S\}$. Moreover, given $S \subseteq 2^{\mathbb{R}^k} \times \mathbb{R}_{\ge 0}$, we define $\gamma(S) = \{(\gamma(X), t) \mid (X, t) \in S\}$.
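The projection and image notations translate directly into code. The sketch follows $[n] = \{1, \ldots, n\}$ and therefore uses 1-based indices (Example 1 later indexes states from 0); storing a hyper-rectangle as a pair (bottom-left corner, upper-right corner) is our choice.

```python
from typing import Callable, Iterable, List, Sequence, Set, Tuple

Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (bottom-left, upper-right)


def project_vector(v: Sequence[float], L: Set[int]) -> Tuple[float, ...]:
    """v[L]: the coordinates of v whose 1-based indices lie in L."""
    return tuple(v[i - 1] for i in sorted(L))


def project_rect(H: Rect, L: Set[int]) -> Rect:
    """H[L]: project both corners of the hyper-rectangle onto L."""
    lower, upper = H
    return project_vector(lower, L), project_vector(upper, L)


def image_of_pairs(gamma: Callable, S: Iterable[tuple]) -> List[tuple]:
    """gamma(S) = {(gamma(X), t) | (X, t) in S}: transform sets, keep times."""
    return [(gamma(X), t) for X, t in S]
```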

#### 2.1 Agent mode dynamics

In this section, we define the syntax and semantics of the model that determines the dynamics of an agent. We present the syntax first.

Definition 1 (syntax). *The agent dynamics are defined by a tuple* $A = \langle S, P, f \rangle$*, where* $S \subseteq \mathbb{R}^n$ *is its state space,* $P \subseteq \mathbb{R}^m$ *is its parameter or mode space, and* $f : S \times P \rightarrow S$ *is the dynamic function, which is Lipschitz in its first argument.*

The semantics of an agent dynamics is defined by trajectories, which describe the evolution of states over time.

Definition 2 (semantics). *For a given agent* $A = \langle S, P, f \rangle$*, we call a function* $\xi : S \times P \times \mathbb{R}_{\ge 0} \rightarrow S$ *a* trajectory *if* $\xi$ *is differentiable in its third argument and, given an initial state* $\mathbf{x}_0 \in S$ *and a mode* $p \in P$*,* $\xi(\mathbf{x}_0, p, 0) = \mathbf{x}_0$ *and, for all* $t > 0$*,*

$$\frac{d\tilde{\xi}}{dt}(\mathbf{x}\_0, p, t) = f(\tilde{\xi}(\mathbf{x}\_0, p, t), p). \tag{1}$$

*We say that* $\xi(\mathbf{x}_0, p, t)$ *is the state of A at time t when it starts from* $\mathbf{x}_0$ *in mode p.*

Given an initial state $\mathbf{x}_0 \in S$ and mode $p \in P$, the trajectory $\xi(\mathbf{x}_0, p, \cdot)$ is the unique solution of the ordinary differential equation (ODE) (1), since *f* is Lipschitz continuous.

Given a compact initial set *K* ⊆ *S* and a parameter *p* ∈ *P*, the set of *reachable states* of *A* over a time interval [*ftime*, *etime*] is defined as

$$\text{Reach}(K, p, [\mathit{ftime}, \mathit{etime}]) = \{ \mathbf{x} \in S \mid \exists \mathbf{x}\_0 \in K, t \in [\mathit{ftime}, \mathit{etime}], \mathbf{x} = \tilde{\xi}(\mathbf{x}\_0, p, t) \}. \tag{2}$$

We let Reach(*K*, *p*, *t*) denote the set of reachable states at time *t*. The unbounded reachset from *K* and *p* is Reach(*K*, *p*, [*ftime*, ∞)).

The *bounded time safety verification* problem requires one to check whether any state reachable by *A* from a given initial set *K* in mode *p* is unsafe within a given time bound. That is, given a time bound *T* > 0, *p* ∈ *P*, and an unsafe set *U* ⊆ *S*, we want to check whether Reach(*K*, *p*, [0, *T*]) ∩ *U* = ∅.

#### 2.2 Reachtubes

Computing reachsets exactly is theoretically hard [22]. Many reachability analysis tools [8,1,3] can compute bounded-time over-approximations of reachsets. Generally, given an initial set *K* for a set of ODEs, these tools return a sequence of sets that contain the exact reachset over small time intervals. Motivated by this, we define reachtubes as sequences of time-annotated over-approximations of exact reachsets:

Definition 3. *For a given agent* $A = \langle S, P, f \rangle$*, an initial set* $K \subseteq S$*, a mode* $p \in P$*, and a time interval* $[\mathit{ftime}, \mathit{etime}]$*, a* $(K, p, [\mathit{ftime}, \mathit{etime}])$*-reachtube* $\text{ReachTb}(K, p, [\mathit{ftime}, \mathit{etime}])$ *is a sequence* $\{(X_i, [\tau_{i-1}, \tau_i])\}_{i=1}^{j}$ *such that* $\text{Reach}(K, p, [\tau_{i-1}, \tau_i]) \subseteq X_i$*, and* $\tau_0 = \mathit{ftime} < \tau_1 < \cdots < \tau_j = \mathit{etime}$*. Without loss of generality, we assume equal separation between the time points, i.e.,* $\exists \tau_s > 0, \forall i \in [j], \tau_i - \tau_{i-1} = \tau_s$*.*

For a given (*K*, *p*,[*ftime*,*etime*])-reachtube *rtube*, we denote its parameters by *rtube*.*K*, *rtube*.*p*, *rtube*.*ftime*, and *rtube*.*etime*, respectively, and its cardinality by *rtube*.*len*.
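A reachtube can be stored as a record carrying the attributes listed above plus its sequence of time-annotated sets. The sketch below assumes the sets $X_i$ are axis-aligned boxes and adds a bounded-time safety check: since each $X_i$ over-approximates the reachset on its interval, finding no $X_i$ that meets the unsafe box $U$ proves $\text{Reach}(K, p, [\mathit{ftime}, \mathit{etime}]) \cap U = \emptyset$, whereas an intersection is inconclusive. The class and method names are ours.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)


def rects_intersect(a: Rect, b: Rect) -> bool:
    """Closed boxes meet iff their intervals overlap in every dimension."""
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]))


@dataclass
class Reachtube:
    K: Rect
    p: Sequence[float]
    ftime: float
    etime: float
    sets: List[Tuple[Rect, Tuple[float, float]]]  # [(X_i, (tau_{i-1}, tau_i))]

    @property
    def len(self) -> int:
        # Inside the method body, `len` still refers to the builtin.
        return len(self.sets)

    def proves_safe(self, U: Rect) -> bool:
        """True only if no X_i meets U, so Reach(K, p, [ftime, etime]) avoids U.
        False is inconclusive, because each X_i over-approximates."""
        return not any(rects_intersect(X, U) for X, _ in self.sets)
```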

We define union, truncation, concatenation, and time-shift operators on reachtubes. Fix $rtube_1 = \{(X_{i,1}, [\tau_{i-1,1}, \tau_{i,1}])\}_{i=1}^{j_1}$ and $rtube_2 = \{(X_{i,2}, [\tau_{i-1,2}, \tau_{i,2}])\}_{i=1}^{j_2}$ to be two reachtubes, where $j_1 = rtube_1.len$ and $j_2 = rtube_2.len$. If $\tau_{i,1} = \tau_{i,2}$ for all $i \in [\min(j_1, j_2)]$, we say they are *time-aligned*. Without loss of generality, assume that $j_1 \le j_2$. The operators are defined as follows:


A *simulation* of system (1) is a reachtube with $X_0$ being a singleton state $x_0 \in K$. That is, a simulation is a representation of $\xi(x_0, p, \cdot)$. Several numerical solvers, such as VNODE-LP<sup>1</sup> and the CAPD DynSys library<sup>2</sup>, can compute such simulations.

*Example 1 (Fixed-wing aircraft following a single waypoint).* Consider an agent with state space $S = \mathbb{R}^4$, parameter space $P = \mathbb{R}^4$, and $f : S \times P \rightarrow S$ defined as follows: for any $\mathbf{x} \in S$ and $p \in P$,

$$f(\mathbf{x}, p) = \left[\frac{T\_c - c\_{d1}\mathbf{x}[0]^2}{m}, \frac{g}{\mathbf{x}[0]}\sin\phi, \mathbf{x}[0]\cos\mathbf{x}[1], \mathbf{x}[0]\sin\mathbf{x}[1]\right],$$

where $T_c = k_1 m (v_c - \mathbf{x}[0])$, $\phi = k_2 \frac{v_c}{g} (\psi_c - \mathbf{x}[1])$, $\psi_c = \operatorname{arctan2}\left(\frac{\mathbf{x}[2] - p[2]}{\mathbf{x}[3] - p[3]}\right)$, and $k_1$, $k_2$, $m$, $g$, $c_{d1}$, and $v_c$ are positive constants. The agent models a fixed-wing aircraft that starts from a waypoint and follows another one in the 2D plane: $\mathbf{x}[0]$ is its speed, $\mathbf{x}[1]$ is its heading angle, $(\mathbf{x}[2], \mathbf{x}[3])$ is its position in the plane, $(p[0], p[1])$ is the position of the source waypoint, and $(p[2], p[3])$ is the position of the destination waypoint. Note that the source waypoint does not affect the dynamics, but it will be useful later in the paper.
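For experimentation, the right-hand side of Example 1 can be coded directly. The constant values are hypothetical (the text only requires them to be positive), and because the extracted formula for $\psi_c$ is ambiguous, reading it as `math.atan2(x[2] - p[2], x[3] - p[3])` is our assumption.

```python
import math

# Hypothetical constant values: the text only requires them to be positive.
K1, K2, M, G, CD1, VC = 1.0, 1.0, 1.0, 9.81, 0.1, 10.0


def fixed_wing(x, p):
    """f(x, p) of Example 1.

    x = [speed, heading, position[0], position[1]]
    p = [source waypoint (2 entries), destination waypoint (2 entries)]
    """
    # Assumed reading of the garbled arctan2 expression for psi_c.
    psi_c = math.atan2(x[2] - p[2], x[3] - p[3])
    t_c = K1 * M * (VC - x[0])
    phi = K2 * (VC / G) * (psi_c - x[1])
    return [
        (t_c - CD1 * x[0] ** 2) / M,
        G / x[0] * math.sin(phi),
        x[0] * math.cos(x[1]),
        x[0] * math.sin(x[1]),
    ]
```

`p[0]` and `p[1]` are read nowhere, matching the remark that the source waypoint does not affect the dynamics.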

<sup>1</sup> http://www.cas.mcmaster.ca/~nedialk/vnodelp/

<sup>2</sup> http://capd.sourceforge.net/capdDynSys/docs/html/odes\_rigorous.html

## 3 Symmetry and Equivariant Dynamical Systems

Symmetry plays a fundamental role in the analysis of dynamical systems. It has been used for studying stability of feedback systems [25], designing observers [5] and controllers [30], and analyzing neural networks [20]. In this section, we present definitions of symmetries and their implications for systems that possess them.

#### 3.1 Symmetry of systems with inputs

In the following, symmetry transformations are defined in terms of their ability to produce new solutions of (1) from already computed ones. First, let Γ be a group of smooth maps acting on *S*.

Definition 4 (Definition 2 in [27]). *We say that* γ ∈ Γ *is a symmetry of (1) if for any solution* ξ (x0, *p*,·)*,* γ(ξ (x0, *p*,·)) *is also a solution.*

Using a symmetry γ, we can obtain a new trajectory without simulating the system, simply by transforming an entire previously computed trajectory with γ.

In the following definition we characterize the conditions under which a transformation is a symmetry of (1).

Definition 5. *The dynamic function* $f : S \times P \to S$ *is said to be* Γ*-equivariant if for any* $\gamma \in \Gamma$, *there exists* $\rho : P \to P$ *such that for all* $\mathbf{x} \in S$ *and* $p \in P$, $\frac{\partial \gamma}{\partial \mathbf{x}} f(\mathbf{x}, p) = f(\gamma(\mathbf{x}), \rho(p))$*.*

The following theorem shows that it is enough to check the condition in Definition 5 to prove that a transformation is a symmetry of (1).

Theorem 1 (part of Theorem 10 in [27]). *If f is* Γ *-equivariant, then all maps in* Γ *are symmetries of (1). Moreover, for any solution* ξ (x0, *p*,·) *and* γ ∈ Γ *,* γ(ξ (x0, *p*,·)) = ξ (γ(x0),ρ(*p*),·)*, where* ρ *is the transformation associated with* γ *in Definition 5.*

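*Proof.* Let $\mathbf{y} = \gamma(\mathbf{x})$. Then $\dot{\mathbf{y}} = \frac{\partial \gamma}{\partial \mathbf{x}} \dot{\mathbf{x}} = \frac{\partial \gamma}{\partial \mathbf{x}} f(\mathbf{x}, p) = f(\gamma(\mathbf{x}), \rho(p)) = f(\mathbf{y}, \rho(p))$. The first equality follows from the chain rule, the second from (1), and the third from Definition 5. Since $\mathbf{y}(0) = \gamma(\mathbf{x}_0)$, uniqueness of solutions gives $\gamma(\xi(\mathbf{x}_0, p, \cdot)) = \xi(\gamma(\mathbf{x}_0), \rho(p), \cdot)$.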

*Remark 1.* If γ in Theorem 1 is linear, the condition in Definition 5 for a map γ to be a symmetry becomes γ(*f*(x, *p*)) = *f*(γ(x),ρ(*p*)).
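Remark 1 suggests a quick numerical sanity check for a candidate linear symmetry: sample points and compare γ(f(x, p)) with f(γ(x), ρ(p)). The sketch below is only illustrative. The toy dynamics `f_toy` (isotropic attraction to a point p in the plane), the counterexample `f_aniso`, and the choice ρ = γ are hypothetical and not from the paper. Sampling like this can refute equivariance, but it cannot prove it.

```python
import math
import random
from typing import Callable, List, Sequence

Vec = List[float]


def rotate(v: Sequence[float], theta: float) -> Vec:
    """Linear map: counter-clockwise rotation of a 2-D vector by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]


def f_toy(x: Sequence[float], p: Sequence[float]) -> Vec:
    """Hypothetical dynamics x' = -(x - p): isotropic attraction to p."""
    return [-(x[0] - p[0]), -(x[1] - p[1])]


def f_aniso(x: Sequence[float], p: Sequence[float]) -> Vec:
    """Hypothetical counterexample: attraction twice as strong along y."""
    return [-(x[0] - p[0]), -2.0 * (x[1] - p[1])]


def satisfies_remark1(f: Callable, gamma: Callable, rho: Callable,
                      samples: int = 100, tol: float = 1e-9) -> bool:
    """Check gamma(f(x, p)) == f(gamma(x), rho(p)) at random sample points."""
    rng = random.Random(0)
    for _ in range(samples):
        x = [rng.uniform(-10.0, 10.0) for _ in range(2)]
        p = [rng.uniform(-10.0, 10.0) for _ in range(2)]
        lhs = gamma(f(x, p))
        rhs = f(gamma(x), rho(p))
        if any(abs(a - b) > tol for a, b in zip(lhs, rhs)):
            return False
    return True


def gamma(v: Sequence[float]) -> Vec:
    """The candidate symmetry: a fixed rotation by 0.7 rad."""
    return rotate(v, 0.7)


print(satisfies_remark1(f_toy, gamma, gamma))    # True
print(satisfies_remark1(f_aniso, gamma, gamma))  # False
```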

*Example 2 (Fixed-wing aircraft coordinate transformation symmetry).* Consider the fixed-wing aircraft model of Example 1. Fix $goal \in \mathbb{R}^2$ and $\theta \in \mathbb{R}$. Let $\gamma : \mathbb{R}^4 \to \mathbb{R}^4$ and $\rho : \mathbb{R}^4 \to \mathbb{R}^4$ be defined as:

$$\begin{aligned} \gamma(\mathbf{x}) &= \big[\mathbf{x}[0],\ \mathbf{x}[1] + \theta,\ (\mathbf{x}[2] - goal[0])\cos\theta + (\mathbf{x}[3] - goal[1])\sin\theta, \\ &\qquad -(\mathbf{x}[2] - goal[0])\sin\theta + (\mathbf{x}[3] - goal[1])\cos\theta\big] \quad \text{and} \\ \rho(p) &= \big[0,\ 0,\ (p[2] - goal[0])\cos\theta + (p[3] - goal[1])\sin\theta, \\ &\qquad -(p[2] - goal[0])\sin\theta + (p[3] - goal[1])\cos\theta\big]. \end{aligned} \tag{3}$$

Then, for all $\mathbf{x} \in S$ and $p \in P$, $\frac{\partial \gamma}{\partial \mathbf{x}} f(\mathbf{x}, p) = f(\gamma(\mathbf{x}), \rho(p))$, where *f* is as in Section 2.1 (γ is affine, so only its linear part acts on the vector $f(\mathbf{x}, p)$). The transformation γ moves the origin of *S* from [0,0,0,0] to [0,0,*goal*[0],*goal*[1]] and then rotates the third and fourth axes counter-clockwise by θ. Similarly, ρ sets the first two coordinates of the parameter to zero, since they do not affect the dynamics, translates the origin of the parameter space *P* to [0,0,*goal*[0],*goal*[1]], and rotates the third and fourth axes counter-clockwise by θ. For the aircraft, this means translating and rotating the plane in which the aircraft and the waypoints reside.
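The transformations of Equation (3), together with the inverse of γ that is needed later for mapping virtual results back to real modes, translate directly into code. A minimal sketch; the function names and the explicit inverse are ours.

```python
import math
from typing import List, Sequence


def gamma(x: Sequence[float], goal: Sequence[float], theta: float) -> List[float]:
    """Equation (3): shift the heading by theta, move the origin of the
    plane to goal, and rotate the position axes counter-clockwise by theta."""
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = x[2] - goal[0], x[3] - goal[1]
    return [x[0], x[1] + theta, dx * c + dy * s, -dx * s + dy * c]


def rho(p: Sequence[float], goal: Sequence[float], theta: float) -> List[float]:
    """Equation (3): zero the source waypoint, transform the destination."""
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = p[2] - goal[0], p[3] - goal[1]
    return [0.0, 0.0, dx * c + dy * s, -dx * s + dy * c]


def gamma_inv(y: Sequence[float], goal: Sequence[float], theta: float) -> List[float]:
    """Inverse of gamma: undo the rotation, then undo the translation."""
    c, s = math.cos(theta), math.sin(theta)
    u, v = y[2], y[3]
    return [y[0], y[1] - theta, u * c - v * s + goal[0], u * s + v * c + goal[1]]
```

Because γ is affine, a velocity such as $f(\mathbf{x}, p)$ is transformed only by its linear part (identity on the first two components, a rotation on the last two), which is why the equivariance condition is stated with $\partial\gamma/\partial\mathbf{x}$ rather than with γ itself.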

#### 3.2 Symmetry and reachtubes

Computing reachtubes is computationally expensive, as it requires solving non-trivial optimization problems and integrating nonlinear functions [13,15,16,8,6]. In comparison, transforming reachtubes is much cheaper, especially if the transformation is linear.

In our previous work [29], we showed how to get reachtubes of autonomous systems from previously computed ones using symmetry transformations. In this paper, we show how to do that for systems with parameters. This allows different modes of a hybrid system and different agents with similar dynamics to share reachtube computations. That was not possible when the theory was limited to non-parameterized systems.

Theorem 2. *Let (1) be* Γ*-equivariant. Then for any* $\gamma \in \Gamma$ *and its corresponding* ρ*, any K, p,* [*ftime*, *etime*]*, and* $\{(X_i, [\tau_{i-1}, \tau_i])\}_{i=1}^{j}$ *as a* (*K*, *p*, [*ftime*, *etime*])*-reachtube,*

$$\forall i \in [j],\quad \mathsf{Reach}(\gamma(K), \rho(p), [\tau_{i-1}, \tau_i]) = \gamma(\mathsf{Reach}(K, p, [\tau_{i-1}, \tau_i])) \subseteq \gamma(X_i). \tag{4}$$

*Proof.* (Sketch) The first part, $\mathsf{Reach}(\gamma(K), \rho(p), [\tau_{i-1}, \tau_i]) = \gamma(\mathsf{Reach}(K, p, [\tau_{i-1}, \tau_i]))$, follows directly from Theorem 1. The second part, $\gamma(\mathsf{Reach}(K, p, [\tau_{i-1}, \tau_i])) \subseteq \gamma(X_i)$, follows from the reachtube $\mathsf{ReachTb}(K, p, [t_b, t_e])$ being an over-approximation of the exact reachset during each small time interval $[\tau_{i-1}, \tau_i]$.

Theorem 2 says that we can transform a computed reachtube $\mathsf{ReachTb}(K, p, [t_1, t_2]) = \{(X_i, [\tau_{i-1}, \tau_i])\}_{i=1}^{j}$ to get another reachtube $\{(\gamma(X_i), [\tau_{i-1}, \tau_i])\}_{i=1}^{j}$, which over-approximates the reachsets starting from $\gamma(K)$ with parameter $\rho(p)$.
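Theorem 2 turns reachtube reuse into a purely geometric operation. A minimal sketch, assuming each set $X_i$ is stored as the vertex list of a convex polytope and γ is affine, so that the image of each set is the convex hull of the images of its vertices. The tube representation and the helper `shift` are hypothetical.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, ...]
# A reachtube {(X_i, [tau_{i-1}, tau_i])}: each X_i is given by its vertices.
Tube = List[Tuple[List[Point], Tuple[float, float]]]


def transform_tube(tube: Tube, gamma: Callable[[Point], Point]) -> Tube:
    """Map every set of the tube through gamma; time intervals are unchanged.
    Exact for an affine gamma and convex sets given by their vertices."""
    return [([gamma(v) for v in verts], interval) for verts, interval in tube]


def shift(dx: float, dy: float) -> Callable[[Point], Point]:
    """A translation of the plane, used here as a simple affine gamma."""
    return lambda v: (v[0] + dx, v[1] + dy)


tube: Tube = [
    ([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)], (0.0, 0.5)),
    ([(0.5, 0.0), (1.5, 0.0), (1.5, 1.0), (0.5, 1.0)], (0.5, 1.0)),
]
moved = transform_tube(tube, shift(10.0, -2.0))
print(moved[1])  # ([(10.5, -2.0), (11.5, -2.0), (11.5, -1.0), (10.5, -1.0)], (0.5, 1.0))
```

No integration is involved: the cost is linear in the number of stored vertices, which is the source of the savings discussed above.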

The results of this section subsume those about transforming reachtubes of autonomous systems (dynamical systems without parameters) presented in [29].

## 4 Virtual system

The challenge in safety verification of multi-agent systems is that the dimensionality of the problem grows rapidly with the number of agents. However, agents often share the same dynamics. For instance, several fixed-wing aircraft of the type described in Example 1 share the same dynamics but may have different initial conditions and follow different waypoints. This commonality has been exploited in developing specialized proof techniques [23]. For reachability analysis, using the symmetry transformations of the previous section, reachtubes of one agent in one mode can be used to obtain the reachtubes of other modes and even of other agents.

Fix a particular value $p_v \in P$ and call it the *virtual* parameter. Assume that for all $p \in P$, there exists a pair of transformations $(\gamma_p, \rho_p)$ such that $\rho_p(p) = p_v$, $\gamma_p$ is invertible, and $\frac{\partial \gamma_p}{\partial \mathbf{x}} f(\mathbf{x}, p) = f(\gamma_p(\mathbf{x}), \rho_p(p)) = f(\gamma_p(\mathbf{x}), p_v)$. Consider the resulting ODE:

$$\frac{d\tilde{\xi}}{dt}(\mathbf{y}, p_v, t) = f(\tilde{\xi}(\mathbf{y}, p_v, t), p_v). \tag{5}$$

Following [27], we call (5) a *virtual system*. Correspondingly, we refer to (1) as the *real system* for the rest of the paper. The virtual system unifies the behavior of all modes of the real system in one representative mode, the virtual one $p_v$.

*Example 3 (Fixed-wing aircraft virtual system).* Consider the fixed-wing aircraft agent described in Example 1 and the corresponding transformations described in Example 2. Fix $p \in P$. We set *goal* in the transformations of Example 2 to $[p[2], p[3]]$ and θ to $\arctan2(p[0] - p[2], p[3] - p[1])$, and let $\gamma_p$ and $\rho_p$ be the resulting transformations. Then, for all $p \in P$, $\rho_p(p) = [0,0,0,0]$. Hence, $p_v = [0,0,0,0]$, and the virtual system is that of Example 1 with the parameter $p = p_v$. For the aircraft, $\gamma_p$ translates the origin of the plane to the destination waypoint and rotates its axes so that the *y*-axis is aligned with the segment between the source and destination waypoints. Hence, in the constructed virtual system, the destination waypoint is the origin of the plane. The source waypoint is also set to the origin, since it does not affect the dynamics.
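The construction of $\gamma_p$ and $\rho_p$ in Example 3 can be checked numerically. The sketch below (helper names are ours) builds *goal* and θ from a mode *p*, confirms that $\rho_p(p)$ is the zero vector, and shows where the source waypoint lands; only the position part of $\gamma_p$ is implemented.

```python
import math
from typing import List, Sequence, Tuple


def virtual_frame(p: Sequence[float]) -> Tuple[Tuple[float, float], float]:
    """Example 3: goal is the destination waypoint and
    theta = arctan2(p[0] - p[2], p[3] - p[1])."""
    goal = (p[2], p[3])
    theta = math.atan2(p[0] - p[2], p[3] - p[1])
    return goal, theta


def to_virtual_plane(q: Sequence[float], goal: Sequence[float],
                     theta: float) -> Tuple[float, float]:
    """Position part of gamma_p: translate goal to the origin, then rotate."""
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = q[0] - goal[0], q[1] - goal[1]
    return (dx * c + dy * s, -dx * s + dy * c)


def rho_p(p: Sequence[float]) -> List[float]:
    """rho_p(p) from Example 3; it should always be the zero vector p_v."""
    goal, theta = virtual_frame(p)
    return [0.0, 0.0, *to_virtual_plane((p[2], p[3]), goal, theta)]
```

For the mode $p_r = [2.5, 0.5, 13.3, 5]$ of Example 4, the source waypoint lands at approximately (0, -11.7): on the negative *y*-axis, at the inter-waypoint distance from the origin, which is consistent with the segment being aligned with the *y*-axis.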

The solutions of the virtual system can be transformed to get solutions of all other modes in *P* using $\{\gamma_p^{-1}\}_{p \in P}$. This is shown in the following theorem.

Theorem 3. *Given any initial state* $\mathbf{y}_0 \in S$ *and any mode* $p \in P$, $\gamma_p^{-1}(\xi(\mathbf{y}_0, p_v, \cdot))$ *is a solution of the real system (1) with mode p starting from* $\gamma_p^{-1}(\mathbf{y}_0)$*. Similarly, given any* $\mathbf{x}_0 \in S$, $\gamma_p(\xi(\mathbf{x}_0, p, \cdot))$ *is the solution of the virtual system (5) starting from* $\gamma_p(\mathbf{x}_0)$*.*

*Proof.* Let us start with the first part of the theorem. Fix $p \in P$ and let $\mathbf{x}_0 = \gamma_p^{-1}(\mathbf{y}_0)$. By Theorem 1, $\gamma_p(\xi(\mathbf{x}_0, p, \cdot)) = \xi(\gamma_p(\mathbf{x}_0), \rho_p(p), \cdot)$, which is a solution of (1) with mode $\rho_p(p)$. Furthermore, $\rho_p(p) = p_v$ by definition, and $\gamma_p(\mathbf{x}_0) = \gamma_p(\gamma_p^{-1}(\mathbf{y}_0)) = \mathbf{y}_0$. Hence, $\gamma_p(\xi(\mathbf{x}_0, p, \cdot)) = \xi(\mathbf{y}_0, p_v, \cdot)$. Applying $\gamma_p^{-1}$ to both sides implies the first part of the theorem. The second part is a direct application of Theorem 1.

The following corollary extends the result of Theorem 3 to reachtubes. It follows from Theorem 2.

Corollary 1. *Given* $K_v \subseteq S$ *and a mode* $p \in P$, $\gamma_p^{-1}(\mathsf{ReachTb}(K_v, p_v, [t_b, t_e]))$ *is a reachtube of the real system (1) with mode p starting from* $\gamma_p^{-1}(K_v)$*. Similarly, given any initial set* $K \subset S$, $\gamma_p(\mathsf{ReachTb}(K, p, [t_b, t_e]))$ *is a reachtube of the virtual system (5) starting from* $\gamma_p(K)$*.*

Consequently, we get a solution or a reachtube for each mode *p* ∈ *P* of the real system by simply transforming a single solution or a single reachtube of the virtual system using the transformations $\{\gamma_p\}_{p \in P}$ and their inverses. This is the essential idea behind the savings in computation time of the new symmetry-based reachtube computation algorithm and the symmetry-based safety verification algorithms presented next. It is also the essential idea behind proving safety in the case of unbounded time and an infinite number of modes.

*Example 4 (Fixed-wing aircraft infinite number of reachtubes resulting from transforming a single one).* Consider the real system in Example 1 and the virtual one in Example 3. Fix the initial set, represented as a hyper-rectangle, $K_r = [[1, \frac{\pi}{4}, 3, 1], [2, \frac{\pi}{3}, 4, 2]]$, the real mode $p_r = [2.5, 0.5, 13.3, 5]$, and the time bound $T = 20$ seconds. Then, as in Example 3, we fix $\theta = \arctan2(2.5 - 13.3, 5 - 0.5) = -1.176$ rad and $goal = [13.3, 5]$, and call the resulting transformations $\gamma_{p_r}$ and $\rho_{p_r}$. Let $K_v = \gamma_{p_r}(K_r)$ and $p_v = \rho_{p_r}(p_r) = [0,0,0,0]$. Assume that we have the reachtube $rtube_r = \mathsf{ReachTb}(K_r, p_r, T)$. Then, using Corollary 1, we can get $rtube_v = \mathsf{ReachTb}(K_v, p_v, T)$ by transforming $rtube_r$ using $\gamma_{p_r}$. The benefit of the corollary appears in the following: for any $p \in P = \mathbb{R}^4$, we can get the corresponding reachtube $\mathsf{ReachTb}(\gamma_p^{-1}(K_v), p, T)$ by transforming $rtube_v$ using $\gamma_p^{-1}$.

The projection of $K_v$ onto its last two coordinates, $K_v[2:3]$, represents the possible initial positions of the aircraft in the plane relative to the destination waypoint. It is a square rotated by θ. The distance from the center of $K_v[2:3]$ to the origin equals the distance from the center of $K_r[2:3]$ to the destination waypoint. Moreover, the angle between the *y*-axis and the line connecting the origin with the center of $K_v[2:3]$ equals the angle between the segment connecting the source and destination waypoints and the line connecting the destination waypoint with the center of $K_r[2:3]$. Finally, $K_v[0] = K_r[0]$ and $K_v[1] = K_r[1] + \theta$.
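The numbers in Example 4 can be reproduced with a few lines of Python. Since $\gamma_{p_r}$ is affine, $K_v$ is the convex hull of the images of the 16 corners of the box $K_r$. A sketch (variable names are ours) that recomputes θ and these corners:

```python
import itertools
import math

goal = (13.3, 5.0)                          # destination waypoint of p_r
theta = math.atan2(2.5 - 13.3, 5.0 - 0.5)   # arctan2(p[0] - p[2], p[3] - p[1])
print(round(theta, 3))                      # -1.176

# K_r as per-coordinate intervals: speed, heading, x, y.
K_r = [(1.0, 2.0), (math.pi / 4, math.pi / 3), (3.0, 4.0), (1.0, 2.0)]


def gamma_pr(x):
    """gamma_{p_r} from Equation (3) with the goal and theta above."""
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = x[2] - goal[0], x[3] - goal[1]
    return (x[0], x[1] + theta, dx * c + dy * s, -dx * s + dy * c)


# gamma_pr is affine, so K_v = gamma_pr(K_r) is the convex hull of the
# images of the 2**4 = 16 corners of the box K_r.
K_v_vertices = [gamma_pr(corner) for corner in itertools.product(*K_r)]
```

Averaging the transformed corners gives $\gamma_{p_r}$ applied to the center of $K_r$, so the distance of that point from the origin equals the distance from the center of $K_r[2:3]$, (3.5, 1.5), to the waypoint (13.3, 5), as stated above.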

In summary, the absolute positions of the aircraft and waypoints do not matter. What matters is their relative positions. The virtual system stores what matters and whenever a reachtube is needed for a new absolute position, we can transform it from the virtual one.

## 5 Symmetry-based verification algorithm

In this section, we introduce a novel safety verification algorithm, symSafetyVerif, which uses existing reachability subroutines but, unlike existing algorithms, exploits symmetry. In our earlier work [29], we introduced reachtube transformations using symmetry for single-mode dynamical systems. Here, we extend the method across modes, introduce the virtual system, and develop the corresponding verification algorithm.

In Section 5.1, we define *tubecache*, a data structure for storing reachtubes; in Section 5.2, we present the symmetry-based reachtube computation algorithm symComputeReachtube, which reuses reachtubes stored in *tubecache*; finally, in Section 5.3, we define the *safetycache* data structure, which stores previously computed safety verification results for use by the symSafetyVerif algorithm.

#### 5.1 *tubecache*: shared memory for reachtubes

We show how to use the virtual system (5) to create a shared memory through which the different modes of the real system (1) can reuse each other's computed reachtubes. We call this shared memory *tubecache*.

Definition 6. *A tubecache is a data structure that stores a set of reachtubes of the virtual system (5). It has two methods: getTube, for retrieving stored tubes and storeTube, for storing a newly computed one.*

The function getTube returns a set of reachtubes $\{\mathsf{ReachTb}(K_i, p_v, [0, T_i])\}_{i \in [h]}$, for some $h \in \mathbb{N}$, that are already stored in *tubecache*, such that $K \cap \bigcup_i K_i$ is the largest subset of *K* that can be covered by the initial sets of the reachtubes in *tubecache*. Formally,

$$\text{tubecache}.\mathsf{getTube}(K) = \mathop{\mathrm{argmax}}\limits_{\{\mathsf{ReachTb}(K_l, p_v, [0, T_l]) \in \text{tubecache}\}_l} \mathrm{Vol}\Big(K \cap \bigcup_l K_l\Big), \tag{6}$$

where Vol(·) is the Lebesgue measure. Note that for any $K \subset \mathbb{R}^n$, the set of all reachtubes in *tubecache* is a maximizer of (6). However, using all of them would be very inefficient and too conservative to be useful for checking safety. Therefore, getTube should return a maximizer of (6) consisting of the minimum number of reachtubes. Note that the reachtubes in *tubecache* may have different time bounds; they are truncated or extended when used.
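The exact argmax in (6) requires volumes of unions of sets, which is expensive in general. The sketch below is a deliberately simplified, hypothetical `TubeCache` over axis-aligned boxes: if a single stored initial set covers *K*, it returns the smallest such tube (one tube then maximizes (6)); otherwise it returns every tube whose initial set overlaps *K* with positive volume, which still maximizes (6) but need not use the minimum number of tubes.

```python
from typing import Dict, List, Tuple

Box = Tuple[Tuple[float, float], ...]   # one (low, high) pair per dimension


def overlap_volume(a: Box, b: Box) -> float:
    """Volume of the intersection of two boxes (0.0 if they do not overlap)."""
    vol = 1.0
    for (la, ha), (lb, hb) in zip(a, b):
        side = min(ha, hb) - max(la, lb)
        if side <= 0:
            return 0.0
        vol *= side
    return vol


def contains(outer: Box, inner: Box) -> bool:
    """True if box inner lies inside box outer."""
    return all(lo <= li and hi_ <= ho
               for (lo, ho), (li, hi_) in zip(outer, inner))


class TubeCache:
    """Simplified tubecache: virtual-system tubes keyed by their initial box."""

    def __init__(self) -> None:
        self.tubes: Dict[Box, object] = {}

    def store_tube(self, init: Box, tube: object) -> None:
        self.tubes[init] = tube

    def get_tube(self, K: Box) -> List[Tuple[Box, object]]:
        covering = [(k, t) for k, t in self.tubes.items() if contains(k, K)]
        if covering:
            # A single covering tube maximizes (6); prefer the tightest one.
            return [min(covering, key=lambda kt: overlap_volume(kt[0], kt[0]))]
        # Otherwise return all tubes that overlap K with positive volume.
        return [(k, t) for k, t in self.tubes.items() if overlap_volume(k, K) > 0]
```

Preferring the smallest covering initial set reflects the concern above: larger initial sets yield more conservative tubes.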

#### 5.2 symComputeReachtube: symmetry-based reachtube computation

Given an initial set *K* ⊂ *S*, a mode *p* ∈ *P*, and a time bound *T*, there are dozens of tools that can return a ReachTb(*K*, *p*, [0, *T*]). See [13,8,9] for examples of such tools and [26] for a comprehensive survey. We denote this procedure by computeReachtube(*K*, *p*, [0, *T*]).

Whenever a reachtube is needed, instead of calling computeReachtube, we will use symmetry to retrieve corresponding reachtubes that are already stored in *tubecache* and only compute what is not stored. We introduce Algorithm 1 which implements this idea and name it symComputeReachtube.

It takes as input the initial set $K_v$ of the virtual system, the time bound *T*, and *tubecache*, and it returns a reachtube of the virtual system starting from $K_v$ and running for *T* time units. Hence, to get a reachtube of the real system starting from an initial set *K* with mode *p* and time bound *T*, we transform *K* using $\gamma_p$ to get $K_v$, call symComputeReachtube, and transform the result using $\gamma_p^{-1}$.

First, in line 2, it initializes $restube_v$ as an empty tube of the virtual system (5) to store the result. In line 3, it retrieves the reachtubes in *tubecache* that correspond to $K_v$ using the getTube method. With the relevant tubes in *storedtubes*, it adjusts their lengths based on the time bound *T*. If a retrieved tube has a time bound less than *T* (line 5), symComputeReachtube extends it for the remaining time using computeReachtube (lines 6-7) and stores the resulting tube in *tubecache* in place of the shorter one (line 8). If the retrieved tube is longer than *T* (line 9), it is truncated (line 10); the longer tube is nevertheless kept in *tubecache* so that the computation already done is not lost. The tube with the adjusted length is then added to the result tube $restube_v$ (line 11).

The union of the initial sets of the retrieved tubes in *storedtubes* may not contain all of the initial set $K_v$. The uncovered part is called $K'_v$ in line 12. The reachtube starting from $K'_v$ is computed from scratch using computeReachtube in line 13, stored in *tubecache* in line 14, and added to $restube_v$ in line 15. The resulting tube of the virtual system (5) is returned in line 16. The calling algorithm transforms this tube using $\gamma_p^{-1}$ to get the corresponding tube of the real system (1).

Theorem 4. *The output of Algorithm 1 is an over-approximation of the reachtube* ReachTb(*Kv*, *pv*,[0,*T*])*.*

**Algorithm 1** symComputeReachtube

```
1: input: K_v, T, tubecache
2: restube_v ← ∅
3: storedtubes ← tubecache.getTube(K_v)
4: for i ∈ [|storedtubes|] do
5:   if storedtubes[i].T < T then
6:     (K_i, [τ_i, T_i]) ← storedtubes[i].end
7:     tube_i ← storedtubes[i] ⊕ computeReachtube(K_i, p_v, [0, T − τ_i])   ▷ ⊕: append, shifted by τ_i
8:     tubecache.storeTube(tube_i)
9:   else if storedtubes[i].T > T then
10:    tube_i ← storedtubes[i].truncate(T)
11:  restube_v ← restube_v ∪ tube_i   ▷ tube_i = storedtubes[i] when storedtubes[i].T = T
12: K'_v ← K_v \ ∪_i storedtubes[i].K
13: tube' ← computeReachtube(K'_v, p_v, [0, T])
14: tubecache.storeTube(tube')
15: restube_v ← restube_v ∪ tube'
16: return: restube_v
```
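*Proof.* The function computeReachtube always returns over-approximations of the reachset from a given initial set and for a given time bound. The set $restube_v$ contains only reachtubes that were computed by computeReachtube at some point, possibly truncated or extended. There are three types of reachtubes in $restube_v$:

- stored tubes whose time bound is at least *T*, truncated to *T* if needed (lines 9-11); truncating an over-approximation keeps it an over-approximation on [0, *T*];
- stored tubes whose time bound is less than *T*, extended by a tube computed from their final set (lines 5-8); since the final set over-approximates the reachable states at the start of the extension, the concatenation over-approximates the reachset up to *T*;
- the tube computed from scratch from $K'_v$ (lines 12-15).

The union of the initial sets of the tubes in *storedtubes* and $K'_v$ contains $K_v$, so the union of the reachtubes returned by the algorithm is a $(K_v, p_v, [0, T])$-reachtube.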

The importance of symComputeReachtube lies in the following: if a mode *p* required computing a reachtube and the result was saved in *tubecache*, another mode with a similar scenario with respect to the virtual system reuses that tube instead of computing one from scratch. Moreover, reachtubes of the same mode may be reused as well if the scenario is repeated.
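Algorithm 1 can be prototyped in Python once initial sets are discretized. The sketch below assumes (our assumption, not the paper's) that initial sets of the virtual system are unions of grid cells with one cached tube per cell, so the set difference in line 12 becomes a set difference of cell ids. The stub `compute` stands in for computeReachtube and takes absolute start and end times, which is equivalent to computing over $[0, T - \tau_i]$ and shifting by $\tau_i$ because the virtual system (5) is time-invariant.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Segment = Tuple[object, Tuple[float, float]]   # (set, (t_start, t_end))
Tube = List[Segment]
Compute = Callable[[object, float, float], Tube]  # hypothetical signature


class TubeCache:
    """Assumption: initial sets are unions of grid cells, one tube per cell."""

    def __init__(self) -> None:
        self._tubes: Dict[Hashable, Tube] = {}

    def get_tube(self, cells: Iterable[Hashable]) -> Dict[Hashable, Tube]:
        return {c: self._tubes[c] for c in cells if c in self._tubes}

    def store_tube(self, cell: Hashable, tube: Tube) -> None:
        self._tubes[cell] = tube


def horizon(tube: Tube) -> float:
    return tube[-1][1][1]


def sym_compute_reachtube(cells, T: float, cache: TubeCache,
                          compute: Compute) -> Dict[Hashable, Tube]:
    result: Dict[Hashable, Tube] = {}
    stored = cache.get_tube(cells)
    for cell, tube in stored.items():
        if horizon(tube) < T:                    # lines 5-8: extend, re-store
            last_set, (tau, _) = tube[-1]
            tube = tube + compute(last_set, tau, T)
            cache.store_tube(cell, tube)
        elif horizon(tube) > T:                  # lines 9-10: truncate only
            tube = [seg for seg in tube if seg[1][0] < T]
        result[cell] = tube                      # line 11
    for cell in set(cells) - set(stored):        # lines 12-15: uncovered cells
        tube = compute(cell, 0.0, T)
        cache.store_tube(cell, tube)
        result[cell] = tube
    return result


def make_stub(step: float = 1.0):
    """Stub reachability routine that records its calls."""
    calls = []

    def compute(init_set, t0, t1):
        calls.append((init_set, t0, t1))
        tube, t = [], t0
        while t < t1:
            tube.append(((init_set, t), (t, min(t + step, t1))))
            t += step
        return tube

    return compute, calls
```

Since keys live in the virtual system, a cache hit may come from a tube computed on behalf of another mode or another agent; each hit replaces a reachability computation with a dictionary lookup.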

#### 5.3 Bounded time safety

In this section, we show how to use *tubecache* and symComputeReachtube of the previous section for bounded and unbounded time safety verification of the real system (1). We consider a scenario in which safety verification is needed for multiple modes of the real system (1), starting from different initial sets and for different time horizons. We will use the virtual system (5) and the transformations $\{\gamma_p\}_{p \in P}$ to share safety computations across modes, initial sets, time horizons, and unsafe sets.

We first introduce *safetycache*, a shared memory to store the results of intersecting reachtubes of the virtual system (5) with different unsafe sets. It will prevent repeating safety checking computations of different modes under similar scenarios and can be used in finding unbounded time safety properties of the real system (1).

Definition 7. *A safetycache is a data structure that stores the results of intersecting reachtubes of the virtual system (5) with unsafe sets. It has two functions:* getIntersect*, for retrieving stored results and* storeIntersect*, for storing a newly computed one.*

Given an initial set $K_v$, a time bound *T*, and an unsafe set $U_v$, the reachtube $rtube = \mathsf{ReachTb}(K_v, p_v, [0, T])$ is unsafe if there is another reachtube $rtube' = \mathsf{ReachTb}(K'_v, p_v, [0, T'])$ that is unsafe and is an under-approximation of *rtube*. Similarly, if $rtube'$ is an over-approximation of *rtube* and is safe, then *rtube* is safe. The getIntersect function of *safetycache* returns the truth value of the predicate $\mathsf{ReachTb}(K_v, p_v, [0, T]) \cap U_v = \emptyset$ if a subsuming computation is stored, and ⊥ otherwise.

Formally, *safetycache*.getIntersect($K_v$, *T*, $U_v$) =

$$\begin{cases} 0, & \text{if } \exists\, K'_v, T', U'_v \mid K_v \supseteq K'_v,\ T \geq T',\ U_v \supseteq U'_v,\ \text{safety}(K'_v, T', U'_v) = 0, \\ 1, & \text{if } \exists\, K'_v, T', U'_v \mid K_v \subseteq K'_v,\ T \leq T',\ U_v \subseteq U'_v,\ \text{safety}(K'_v, T', U'_v) = 1, \text{ and} \\ \bot, & \text{otherwise}, \end{cases}$$

where 0 means *unsafe* and 1 means *safe*.
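The case analysis above is a simple subsumption lookup. A minimal sketch of getIntersect and storeIntersect, assuming for illustration only that initial and unsafe sets are axis-aligned boxes, that `True` encodes 1 (safe), `False` encodes 0 (unsafe), and that `None` plays the role of ⊥. The class and method names are ours.

```python
from typing import List, Optional, Tuple

Box = Tuple[Tuple[float, float], ...]


def subset(a: Box, b: Box) -> bool:
    """True if box a is contained in box b."""
    return all(lb <= la and ha <= hb for (la, ha), (lb, hb) in zip(a, b))


class SafetyCache:
    def __init__(self) -> None:
        self.entries: List[Tuple[Box, float, Box, bool]] = []

    def store_intersect(self, K: Box, T: float, U: Box, safe: bool) -> None:
        self.entries.append((K, T, U, safe))

    def get_intersect(self, K: Box, T: float, U: Box) -> Optional[bool]:
        for K2, T2, U2, safe in self.entries:
            # A smaller query (K2, T2, U2) was unsafe, so this larger one is too.
            if not safe and subset(K2, K) and T >= T2 and subset(U2, U):
                return False
            # A larger query (K2, T2, U2) was safe, so this smaller one is too.
            if safe and subset(K, K2) and T <= T2 and subset(U, U2):
                return True
        return None  # no subsuming result stored: plays the role of ⊥
```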

Checking whether a reachtube of the real system (1) intersects an unsafe set *U* is equivalent to checking whether the corresponding reachtube of the virtual system intersects the corresponding unsafe set. This is formalized in the following lemma.

Lemma 1. *Consider an unsafe set* $U \subseteq \mathbb{R}^n \times \mathbb{R}_+$ *and* $rtube = \mathsf{ReachTb}(K, p, [t_1, t_2])$*. Then, for any invertible* $\gamma : \mathbb{R}^n \to \mathbb{R}^n$ *(applied to the state component, leaving time unchanged),* $rtube \cap U = \emptyset$ *if and only if* $\gamma(rtube) \cap \gamma(U) = \emptyset$*.*

This holds because an injective map preserves intersections: $\gamma(rtube \cap U) = \gamma(rtube) \cap \gamma(U)$.

Now that we have established the equivalence of safety checking between the real and virtual systems, we present Algorithm 2 denoted by symSafetyVerif. It uses *safetycache*, *tubecache*, and symComputeReachtube in order to share safety verification computations across modes. The method symSafetyVerif would be called several times to check safety of different scenarios and *safetycache* and *tubecache* would be maintained across calls.

The function symSafetyVerif takes as input an initial set *<sup>K</sup>*, a mode *<sup>p</sup>*, a time bound *T*, an unsafe set *U*, the transformation γ*p*, and *safetycache* and *tubecache* that resulted from previous runs of the algorithm.

It starts by transforming the initial and unsafe sets *K* and *U* to a virtual system initial and unsafe sets *Kv* and *Uv* using γ*<sup>p</sup>* in line 2. It then checks if a subsuming result of the safety check for the tuple (*Kv*,*T*,*Uv*) exists in *safetycache* using its method getIntersect in line 3. If it does exist, it returns it directly in line 8. Otherwise, the approximate reachtube is computed using symComputeReachtube in line 5. The returned tube is intersected with *Uv* in line 6 and the result of the intersection is stored in *safetycache* in line 7 and returned in line 8.

**Algorithm 2** symSafetyVerif

```
1: input: K, p, T, U, γ_p, safetycache, tubecache
2: K_v ← γ_p(K), U_v ← γ_p(U)
3: result ← safetycache.getIntersect(K_v, T, U_v)
4: if result = ⊥ then
5:   rtube ← symComputeReachtube(K_v, T, tubecache)
6:   result ← (rtube ∩ U_v = ∅)
7:   safetycache.storeIntersect(K_v, T, U_v, result)
8: return: result
```

Theorem 5. *If* symSafetyVerif *returns safe, then* $\mathsf{ReachTb}(K, p, [0, T]) \cap U = \emptyset$*.*

*Proof.* If the result is not stored in *safetycache*, Theorem 4 gives that *rtube* in line 5 is an over-approximation of $\mathsf{ReachTb}(K_v, p_v, [0, T])$. Moreover, by Corollary 1, $\mathsf{ReachTb}(K, p, [0, T]) \subseteq \gamma_p^{-1}(rtube)$. By Lemma 1 applied to $\gamma_p^{-1}$, the truth value of the predicate $rtube \cap U_v = \emptyset$ equals that of $\gamma_p^{-1}(rtube) \cap U = \emptyset$. Hence, *result* is *safe* only if $\gamma_p^{-1}(rtube) \cap U = \emptyset$, which implies $\mathsf{ReachTb}(K, p, [0, T]) \cap U = \emptyset$. Finally, the values stored in *safetycache* are results of previous runs and hence have the same property.

However, if symSafetyVerif returns *unsafe*, it may be that *rtube* in line 5 intersected the unsafe set only because of over-approximation error. There are two sources of such error. First, the method computeReachtube used by symComputeReachtube can itself introduce over-approximation error; in fact, it usually does [13,8], although it may also be exact [3]. Second, the *tubecache*.getTube method may return a list of tubes whose initial sets together strictly over-approximate the needed initial set. The first problem can be addressed by asking computeReachtube to compute tighter reachtubes. Existing methods provide this option at the expense of higher computational cost [13,8]. However, we can use symmetry in these tightening computations as well, as we did in [29]. We can also replace saved tubes in *tubecache* with newly computed tighter ones. The second problem can be addressed by asking *tubecache*.getTube to return only the tubes whose initial sets are fully contained in the requested initial set. This would decrease the savings from transforming cached results, but it would reduce false positives, that is, reporting *unsafe* when the system is *safe*.
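To see how Algorithm 2 shares work across modes, the following toy Python version strips everything else away. It is a sketch under strong, hypothetical simplifications: positions are 2-D boxes, $\gamma_p$ is a pure translation by the mode's waypoint (not the aircraft symmetry of Example 3), the reachtube is a single box grown by *T*, and the safety cache matches keys exactly instead of applying the subsumption rule.

```python
from typing import Dict, Tuple

Box = Tuple[Tuple[float, float], Tuple[float, float]]


def translate(box: Box, dx: float, dy: float) -> Box:
    (x0, x1), (y0, y1) = box
    return ((x0 + dx, x1 + dx), (y0 + dy, y1 + dy))


def boxes_overlap(a: Box, b: Box) -> bool:
    return all(max(a[i][0], b[i][0]) <= min(a[i][1], b[i][1]) for i in range(2))


class Verifier:
    """Toy Algorithm 2 with gamma_p(q) = q - p (hypothetical) and a stub
    reachtube: the initial box grown by T in every direction."""

    def __init__(self) -> None:
        self.safety_cache: Dict[tuple, bool] = {}  # exact-match keys only
        self.tube_cache: Dict[tuple, Box] = {}
        self.reach_calls = 0

    def reachtube(self, K_v: Box, T: float) -> Box:
        key = (K_v, T)
        if key not in self.tube_cache:             # stand-in for symComputeReachtube
            self.reach_calls += 1
            (x0, x1), (y0, y1) = K_v
            self.tube_cache[key] = ((x0 - T, x1 + T), (y0 - T, y1 + T))
        return self.tube_cache[key]

    def sym_safety_verif(self, K: Box, p: Tuple[float, float],
                         T: float, U: Box) -> bool:
        K_v = translate(K, -p[0], -p[1])           # line 2
        U_v = translate(U, -p[0], -p[1])
        key = (K_v, T, U_v)
        result = self.safety_cache.get(key)        # line 3 (exact match)
        if result is None:                         # line 4
            rtube = self.reachtube(K_v, T)         # line 5
            result = not boxes_overlap(rtube, U_v) # line 6: safe iff disjoint
            self.safety_cache[key] = result        # line 7
        return result                              # line 8
```

Two agents whose scenarios differ only in absolute position map to the same virtual query, so the second check is answered from the safety cache without any reachability computation.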

#### 5.4 Unbounded time safety

In this section, we show how an infinite number of safety check results, i.e., results of intersecting reachtubes with unsafe sets, can be deduced from finitely many. The following corollary applies Lemma 1 to the transformations $\{\gamma_p\}_{p \in P}$ that map the different modes of the real system (1) to the unique virtual one (5).

Corollary 2 (Infinite safety verification results from a single one). *Fix* $U \subseteq \mathbb{R}^n$ *and* $rtube = \mathsf{ReachTb}(K_v, p_v, [0, T])$*. If* $rtube \cap U = \emptyset$*, then* $\forall p \in P$, $\gamma_p^{-1}(rtube) \cap \gamma_p^{-1}(U) = \emptyset$*.*

The corollary means that from a single safety check, i.e., an intersection operation between a reachtube $\mathsf{ReachTb}(K, p_v, [0, T])$ and an unsafe set *U*, we can deduce the safety of any mode $p \in P$ starting from $\gamma_p^{-1}(K)$ and running for *T* time units with respect to the corresponding unsafe set $\gamma_p^{-1}(U)$. This would, for example, imply unbounded time safety of a hybrid automaton under the assumption that the unsafe sets of the modes are at the same relative position with respect to the reachtube. Moreover, *safetycache* stores many results of such operations, and from each of them we can infer the safety of infinitely many scenarios. This is formalized in the following theorem, which follows directly from Corollary 2.

Theorem 6 (Infinite safety verification results from finite ones). *For any mode* $p \in P$, *initial set* $K \subseteq S$, *time bound* $T \geq 0$, *and unsafe set* $U \subset S \times \mathbb{R}_{\geq 0}$ *such that* $K \subseteq \gamma_p^{-1}(K')$, $U \subseteq \gamma_p^{-1}(U')$, *and* $safetycache(K', T, U') = 1$, *system (1) with mode p starting from K is safe with respect to U up to time T.*

As more results are added to *safetycache*, we can deduce the safety of more scenarios in all modes. If, at some point, we know that no new scenarios will appear, we can deduce safety for unbounded time and for an unbounded number of agents that share the same dynamics and whose scenarios are already covered.
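Theorem 6 reduces a new safety query to containment checks in the virtual frame: since $\gamma_p$ is a bijection, $K \subseteq \gamma_p^{-1}(K')$ holds exactly when $\gamma_p(K) \subseteq K'$. A sketch with a hypothetical translation-only $\gamma_p$ and box-shaped sets; `safe_entries` stands for the safe results stored in *safetycache*, and the time check uses $T \le T'$ as in the definition of getIntersect.

```python
from typing import List, Sequence, Tuple

Box = Tuple[Tuple[float, float], ...]
SafeEntry = Tuple[Box, float, Box]   # (K', T', U') with safetycache(...) = 1


def contained(a: Box, b: Box) -> bool:
    """True if box a lies inside box b."""
    return all(lb <= la and ha <= hb for (la, ha), (lb, hb) in zip(a, b))


def translate(box: Box, offset: Sequence[float]) -> Box:
    """Hypothetical gamma_p: subtract the mode's reference point p."""
    return tuple((lo - o, hi - o) for (lo, hi), o in zip(box, offset))


def covered_by_cache(K: Box, T: float, U: Box, p: Sequence[float],
                     safe_entries: List[SafeEntry]) -> bool:
    """Theorem 6: safe if gamma_p(K) and gamma_p(U) are covered by a stored
    safe result with a time bound of at least T."""
    K_v, U_v = translate(K, p), translate(U, p)
    return any(contained(K_v, K2) and T <= T2 and contained(U_v, U2)
               for K2, T2, U2 in safe_entries)
```

A single stored entry then certifies arbitrarily many agents placed anywhere in the plane, as long as their relative geometry matches.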

*Example 5 (Fixed-wing aircraft infinite number of safety verification results from computing a single one).* Consider the initial set $K_r$, mode $p_r$, time bound *T*, their corresponding virtual counterparts $K_v$ and $p_v$, and the symmetry transformation $\gamma_{p_r}$ considered in Example 4. Let the unsafe set be $U = [[0, -\infty, 11.9, 5.1], [\infty, \infty, 12.9, 6.1]] \times \mathbb{R}_{\geq 0}$ and $U_v = \gamma_{p_r}(U)$. Assume that $rtube_v \cap U_v = \emptyset$ and that the result is stored in *safetycache*. Then, for all $p \in P$, $\gamma_p^{-1}(rtube_v) \cap \gamma_p^{-1}(U_v) = \emptyset$.

For the aircraft, *U* could represent a mountain: crashing into the mountain at any speed, heading angle, and time is unsafe. $U_v$ represents the position of the mountain relative to the segment between the waypoints. Theorem 6 says that for any initial set of states *K* of the aircraft and time bound *T*, if the relative positions of the aircraft, the unsafe set, and the segment of waypoints are the same as, or subsumed by, those of $K_v$, $U_v$, and the origin, we can infer safety irrespective of their absolute positions.
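In Example 5 only the position part of *U* is constrained. Speed ranges over $[0, \infty)$, and heading over $(-\infty, \infty)$, which the shift by θ leaves unchanged. Computing $U_v$ therefore amounts to transforming the four corners of the mountain's footprint. A short sketch using the $\gamma_{p_r}$ of Example 4 (names are ours):

```python
import math

goal = (13.3, 5.0)                          # destination waypoint of p_r
theta = math.atan2(2.5 - 13.3, 5.0 - 0.5)   # theta of Example 4


def to_virtual(q):
    """Position part of gamma_{p_r}: translate goal to origin, then rotate."""
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = q[0] - goal[0], q[1] - goal[1]
    return (dx * c + dy * s, -dx * s + dy * c)


# Corners of the mountain's footprint [11.9, 12.9] x [5.1, 6.1].
corners = [(11.9, 5.1), (12.9, 5.1), (12.9, 6.1), (11.9, 6.1)]
# Rotation preserves lengths, so U_v's footprint is again a unit square,
# now expressed relative to the waypoint segment.
U_v_corners = [to_virtual(q) for q in corners]
```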

## 6 Experimental evaluation

We implemented a software safety verification tool for multi-agent hybrid systems based on symComputeReachtube using Python 3. We named it CacheReach. By hybrid, we mean systems that transition between different modes under different conditions. We tested it on a linear dynamical system and the aircraft model of Example 1, following sequences of waypoints, using DryVR [18] and Flow\* [8] as reachability subroutines. Our code is available in a figshare repository [28] and has been tested on an Ubuntu virtual machine available in another figshare repository [21].

#### 6.1 CacheReach: multi-agent safety verification tool

Our tool CacheReach takes as input a JSON file specifying a list of *N* agents of dimension *n*. The file also specifies the Python file that contains the dynamics function *f* of Definition 1 and two symmetry-related functions: *symGamma* and *symGammaInv*. Given a $p \in P$ and a polytope<sup>3</sup> *poly* of dimension *n* representing a set of states of the agent, *symGamma* returns $\gamma_p(poly)$, where $\gamma_p$ is the symmetry map to the virtual system. Similarly, *symGammaInv* returns $\gamma_p^{-1}(poly)$. The file also specifies, for the *i*-th agent, the list of modes that it transitions between sequentially and the corresponding transition conditions, called guards; this list is denoted by $H_i$. The guard of the *j*-th mode of the *i*-th agent, $H_i[j].guard$, is a hyper-rectangle in the state space; when the agent reaches it, the agent transitions to the (*j*+1)-th mode. Each entry $H_i[j]$ also has a time bound $H_i[j].T$ on how long the agent can stay in the mode. Moreover, the file specifies the initial set of states of each agent as a hyper-rectangle. Finally, it specifies the static unsafe set *U* and the subset of dimensions $O \subseteq [n]$ that is relevant for dynamic safety checking between agents: if the reachtubes of two agents projected onto *O* intersect, this models a possible collision between the agents. For example, *O* would be {2,3} for the aircraft model in Example 1, as (x[2], x[3]) represents its position.
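The per-agent information listed above can be pictured as the following Python data classes. This is a hypothetical layout for illustration only; it is not CacheReach's actual JSON schema, and all class and field names are ours.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Box = Tuple[Tuple[float, float], ...]   # one (low, high) pair per dimension


@dataclass
class Mode:
    """Hypothetical record for one entry H_i[j]."""
    param: Sequence[float]   # mode parameter p, e.g. [src_x, src_y, dst_x, dst_y]
    guard: Box               # reaching this box switches the agent to mode j+1
    T: float                 # maximum time the agent may stay in this mode


@dataclass
class Agent:
    """Hypothetical record for one agent: initial box and mode sequence H_i."""
    init: Box
    modes: List[Mode] = field(default_factory=list)


@dataclass
class Scenario:
    agents: List[Agent]
    unsafe: Box                  # static unsafe set U (a box, for simplicity)
    O: Tuple[int, ...] = (2, 3)  # dimensions used for inter-agent collisions


def in_guard(x: Sequence[float], mode: Mode) -> bool:
    """True if the state x lies inside the guard hyper-rectangle of mode."""
    return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, mode.guard))
```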

CacheReach would return *unsafe* if the reachtubes of the agents starting from their initial sets of states and following the sequence of modes intersect a static unsafe set, or when projected to *O*, intersect each other. It would return *safe*, otherwise. Currently, CacheReach assumes that all agents share the same dynamics but do not interact. Hence, it has a single *tubecache* that is shared by all.

CacheReach computes the reachtubes of individual agents iteratively. It computes the reachtube *mtube*<sub>*i*</sub> of the *j*-th mode of the *i*-th agent using symComputeReachtube. Then, it intersects it with the guard using the function *guardIntersect* to get the initial set *initset*<sub>*i*</sub> for the next mode. In addition to *initset*<sub>*i*</sub>, *guardIntersect* computes the minimum and maximum times, *mintime*<sub>*i*</sub> and *maxtime*<sub>*i*</sub>, at which *mtube*<sub>*i*</sub> intersects the guard. The value *mintime*<sub>*i*</sub> is the earliest time at which a trajectory of the next mode may start, and *maxtime*<sub>*i*</sub> is the latest such time. These values are used to check safety against time-annotated unsafe sets, such as collisions between agents.
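A possible implementation of *guardIntersect* over box-shaped tube segments is sketched below. It is hypothetical, and CacheReach's actual code may differ: the next initial set is over-approximated by the bounding box of all guard-intersecting pieces, and *mintime* and *maxtime* are taken from the time intervals of those segments.

```python
from typing import List, Optional, Tuple

Box = Tuple[Tuple[float, float], ...]
Tube = List[Tuple[Box, Tuple[float, float]]]   # (set, (t_start, t_end)) pairs


def box_intersect(a: Box, b: Box) -> Optional[Box]:
    """Intersection of two boxes, or None if they are disjoint."""
    out = tuple((max(la, lb), min(ha, hb)) for (la, ha), (lb, hb) in zip(a, b))
    return out if all(lo <= hi for lo, hi in out) else None


def guard_intersect(mtube: Tube, guard: Box):
    """Return (initset, mintime, maxtime) for the next mode, or None if the
    tube never reaches the guard."""
    pieces, times = [], []
    for box, (t0, t1) in mtube:
        hit = box_intersect(box, guard)
        if hit is not None:
            pieces.append(hit)
            times.extend((t0, t1))
    if not pieces:
        return None
    # Bounding box of all pieces inside the guard: an over-approximation.
    initset = tuple((min(pc[d][0] for pc in pieces), max(pc[d][1] for pc in pieces))
                    for d in range(len(guard)))
    return initset, min(times), max(times)
```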

The computed tube *mtube*<sub>*i*</sub> is appended to *atube*<sub>*i*</sub>, which stores the full reachtube of the *i*-th agent. The benefit of this method is that all modes of all agents can be mapped to a single virtual system, so they can reuse each other's reachtubes through *tubecache*, which is updated at every call to symComputeReachtube. Static safety checking is done in the usual way.

Collision checking between agents is done by the function checkDynamicSafety. It takes the full reachtubes *atube*<sub>1</sub> and *atube*<sub>2</sub> of two agents, along with two arrays *lookback*<sub>1</sub> and *lookback*<sub>2</sub>. For agent *i*, the array *lookback<sub>i</sub>* consists of pairs (*ind<sub>j</sub>*, *timerange<sub>j</sub>*), where *ind<sub>j</sub>* is the index in *atube<sub>i</sub>* at which the tube of the *j*th mode begins and *timerange<sub>j</sub>* is the uncertainty in the starting time of the trajectories from that mode's initial set. checkDynamicSafety uses this information to time-align parts of *atube*<sub>1</sub> and *atube*<sub>2</sub>, so that intersection is checked only between two sets that may have been reached at the same time by the two agents.
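The effect of time alignment can be seen in a small sketch. It simplifies away the lookback arrays by assuming that each tube segment is already annotated with an absolute time interval (the information that *ind<sub>j</sub>* and *timerange<sub>j</sub>* let CacheReach recover); boxes stand in for polytopes, `dims` plays the role of *O* with 0-based indices, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]        # one (lo, hi) interval per state dimension
Segment = Tuple[float, float, Box]     # (t_lo, t_hi, box), absolute times


def overlap_on(a: Box, b: Box, dims: Sequence[int]) -> bool:
    """True if the projections of boxes a and b onto dims intersect."""
    return all(a[k][0] <= b[k][1] and b[k][0] <= a[k][1] for k in dims)


def check_dynamic_safety(atube1: List[Segment], atube2: List[Segment],
                         dims: Sequence[int]) -> bool:
    """Return True (safe) if no two time-overlapping segments intersect on dims."""
    for t1_lo, t1_hi, box1 in atube1:
        for t2_lo, t2_hi, box2 in atube2:
            same_time = t1_lo <= t2_hi and t2_lo <= t1_hi  # may coexist in time
            if same_time and overlap_on(box1, box2, dims):
                return False  # possible collision
    return True


# Agent 2 passes through the region [2,3]x[2,3] during [0, 0.5];
# agent 1 only arrives there during [1, 2].
a1 = [(0.0, 1.0, [(0.0, 1.0), (0.0, 1.0)]), (1.0, 2.0, [(2.0, 3.0), (2.0, 3.0)])]
a2 = [(0.0, 0.5, [(2.0, 3.0), (2.0, 3.0)]), (0.5, 2.0, [(4.0, 5.0), (4.0, 5.0)])]
print(check_dynamic_safety(a1, a2, dims=[0, 1]))
```

Ignoring the time annotations, the second segment of `a1` and the first segment of `a2` would intersect and the pair would be reported as a collision; with alignment, the check returns `True`.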

<sup>3</sup> https://github.com/tulip-control/polytope

#### 6.2 Experimental results

We ran experiments using our tool CacheReach on two models: a 3-dimensional linear dynamical system and the nonlinear aircraft model described in Example 1. The linear model is of the form ẋ = *A*(x − *p*[3:5]), where *A* = [[−3,1,0],[0,−2,1],[0,0,−1]], x ∈ ℝ<sup>3</sup>, and *p* ∈ ℝ<sup>6</sup>. We considered scenarios with one, two, and three agents for each model, following different sequences of waypoints. The sequences of waypoints for the linear model are translations and rotations of a digital-*S*-shaped path. For the aircraft model, the paths are random crossing paths heading north-east. In every scenario, all agents have the same model. In the aircraft scenarios, an agent switches to the next waypoint once its x, y position is within 0.5 units of the current waypoint in each dimension. The initial set of the aircraft has size 1 in the position components, 0.1 in the speed, and 0.01 in the heading angle. We used Flow\* [8] and DryVR [18] to compute reachtubes from scratch for the linear example. We used only DryVR for the aircraft model, since our C++ Flow\* wrapper does not handle models with arctan2 in the dynamics. We ran all scenarios in CacheReach with and without *tubecache*. The symmetry used for the aircraft was the one shown in Example 3. For the linear model, the symmetry transformation γ*<sub>p</sub>* that maps the state to the virtual system is a change of coordinates that places the new origin at the next waypoint *p*[3:5] and rotates the *xy*-plane by the angle between the previous and next waypoints *p*[0:2] and *p*[3:5], projected onto that plane. We compared the computation time with and without symmetry and show the results in Table 1. The reachtubes for three nonlinear and three linear agents are shown in Figure 1.
The different colors represent reachtubes of different agents, the black points represent the waypoints, the black segments connect consecutive waypoints, and the red rectangles represent the unsafe sets. The figures on the top represent the real reachtubes while those on the bottom represent the ones corresponding to the virtual system saved in *tubecache*.
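The coordinate change used for the linear model can be sketched point-wise. This is our reading of the description, not code from CacheReach: we interpret the text's *p*[3:5] as the inclusive range of components 3 to 5 (Python `p[3:6]`), measure the rotation angle as the heading of the segment from the previous to the next waypoint in the *xy*-plane, and the function names are hypothetical. Since this γ*<sub>p</sub>* is affine, applying it to the vertices of a polytope yields the image polytope, which is what a function like *symGamma* needs.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _heading(p: Sequence[float]) -> float:
    """Angle in the xy-plane of the segment from waypoint p[0:3] to waypoint p[3:6]."""
    return math.atan2(p[4] - p[1], p[3] - p[0])


def sym_gamma(p: Sequence[float], x: Sequence[float]) -> Vec3:
    """Hypothetical point-wise gamma_p: move the next waypoint to the origin,
    then rotate the xy-plane by -heading so the path segment points along +x."""
    th = _heading(p)
    c, s = math.cos(th), math.sin(th)
    dx, dy, dz = x[0] - p[3], x[1] - p[4], x[2] - p[5]
    return (c * dx + s * dy, -s * dx + c * dy, dz)


def sym_gamma_inv(p: Sequence[float], y: Sequence[float]) -> Vec3:
    """Inverse map: rotate back by +heading, then undo the translation."""
    th = _heading(p)
    c, s = math.cos(th), math.sin(th)
    return (c * y[0] - s * y[1] + p[3], s * y[0] + c * y[1] + p[4], y[2] + p[5])


p = (0.0, 0.0, 0.0, 3.0, 4.0, 1.0)   # previous waypoint, next waypoint
print(sym_gamma(p, (0.0, 0.0, 0.0)))  # previous waypoint lands on the negative x-axis
```

Because every segment is normalized to end at the origin and point along +*x*, segments of different agents that differ only by such a translation and rotation map to the same virtual mode, which is what allows their reachtubes to be shared.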


Table 1: Results.

In Table 1, we call CacheReach run with DryVR and *tubecache* Sym-DryVR (for symmetric DryVR), and Sym-Flow\* if Flow\* is used instead. Without *tubecache*, we call them NoSym-DryVR and NoSym-Flow\*, respectively. Recall that in symComputeReachtube, some tubes may be cached but have shorter time horizons than the needed tube; in that case, we compute the rest from scratch. We report the fractions of tubes computed from scratch and of tubes transformed from cached ones. Moreover, we report the execution time until all tubes are computed. In the experiments, we always compute the full tubes, even if a scenario was detected to be unsafe earlier, to have a fair comparison of running times. The execution time does not include dynamic safety checking, since the four versions of the experiments perform the same computations for that purpose. We use CacheReach in all scenarios, with the other reachability tools as subroutines, to reduce the degrees of freedom and isolate the benefit of transforming reachtubes over computing them. The Sym versions reduce the running time by up to 64%, in the linear case with three agents. The ratio of transformed to computed tubes increases as the number of agents increases, which means that different agents share reachtubes with each other in the virtual system. The total number of reachtubes is the same whether or not *tubecache* is used. This means that the quality of the tubes, i.e., how tight they are, is the same whether they are transformed from *tubecache* or computed from scratch, since the initial sets of modes are computed from intersections of reachtubes with guards: the fatter a reachtube is, the larger the initial set of the next mode gets, and the more reachtubes need to be computed.

Fig. 1: Reachtubes for three fixed-wing aircraft (left) and three linear models (right). Real reachtubes (top) vs. the virtual ones saved in *tubecache* (bottom).
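The reuse and partial recomputation described above can be sketched as a small cache. This is a hypothetical data structure, not CacheReach's implementation: entries are keyed by the virtual initial set (a box), a query is served from any cached tube whose initial set contains it, and a tube whose horizon is too short is extended from its last reach set by calling a reachability back end passed in as a callback (standing in for Flow\* or DryVR). Mapping sets to and from the virtual system with *symGamma*/*symGammaInv* is left to the caller.

```python
from typing import Callable, Dict, List, Tuple

Box = List[Tuple[float, float]]
Tube = List[Tuple[float, float, Box]]  # (t_lo, t_hi, box) segments


def contains(outer: Box, inner: Box) -> bool:
    return all(olo <= ilo and ihi <= ohi for (olo, ohi), (ilo, ihi) in zip(outer, inner))


class TubeCache:
    """Hypothetical cache of virtual-system reachtubes keyed by initial sets."""

    def __init__(self, compute: Callable[[Box, float, float], Tube]):
        self.compute = compute            # back end: (init, t0, t1) -> tube
        self.entries: List[Tuple[Box, Tube]] = []
        self.stats: Dict[str, int] = {"transformed": 0, "computed": 0}

    def get(self, init: Box, horizon: float) -> Tube:
        for cached_init, tube in self.entries:
            if contains(cached_init, init):  # cached tube over-approximates the query
                self.stats["transformed"] += 1
                covered = tube[-1][1]
                if covered >= horizon:
                    return [seg for seg in tube if seg[0] < horizon]
                # Cached tube is too short: extend it from its last reach set.
                tube.extend(self.compute(tube[-1][2], covered, horizon))
                self.stats["computed"] += 1
                return list(tube)
        tube = self.compute(init, 0.0, horizon)
        self.entries.append((init, tube))
        self.stats["computed"] += 1
        return list(tube)


calls = []


def toy_compute(init: Box, t0: float, t1: float) -> Tube:
    """Toy back end for x' = 0: the state never moves."""
    calls.append((t0, t1))
    segs, t = [], t0
    while t < t1:
        segs.append((t, min(t + 1.0, t1), init))
        t += 1.0
    return segs


cache = TubeCache(toy_compute)
tube1 = cache.get([(0.0, 1.0)], 2.0)  # miss: computed from scratch
tube2 = cache.get([(0.2, 0.8)], 2.0)  # hit: contained in the cached initial set
tube3 = cache.get([(0.2, 0.8)], 3.0)  # hit, but horizon too short: extended
print(cache.stats)
```

Serving a query from a tube with a larger initial set is sound but may be coarser than a fresh computation; a real implementation could trade off this precision against the cost of recomputing.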

## 7 Discussion and conclusions

In this paper, we investigated how symmetry transformations and caching can help achieve scalable, and possibly unbounded, verification of multi-agent systems. We developed a notion of *virtual system*, which defines symmetry transformations for a broad class of hybrid and dynamical agent models visiting waypoint sequences. Using the virtual system, we presented a prototype tool called CacheReach that builds a cache of reachtubes for the transformed virtual system, in a way that is agnostic to the representation of the reachsets and to the reachability analysis subroutine used. Our experimental evaluation shows significant improvements in computation time on simple examples, with savings that increase as the number of agents increases.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **Relational Differential Dynamic Logic**⋆

Juraj Kolčák<sup>1</sup>, Jérémy Dubut<sup>2,3</sup>, Ichiro Hasuo<sup>2,4</sup>, Shin-ya Katsumata<sup>2</sup>, David Sprunger<sup>2</sup>, and Akihisa Yamada<sup>2</sup>

<sup>1</sup> LSV, CNRS & ENS Paris-Saclay, Université Paris-Saclay, Cachan, France, kolcak@lsv.fr

<sup>2</sup> National Institute of Informatics, Tokyo, Japan, {dubut,hasuo,s-katsumata,sprunger,akihisayamada}@nii.ac.jp

<sup>3</sup> Japanese-French Laboratory for Informatics, CNRS IRL 3527, Tokyo, Japan

<sup>4</sup> The Graduate University for Advanced Studies (SOKENDAI), Tokyo, Japan

**Abstract.** In the field of quality assurance of hybrid systems, Platzer's *differential dynamic logic* (dL) is widely recognized as a deductive verification method with solid mathematical foundations and sophisticated tool support. Motivated by case studies provided by our industry partner, we study a *relational extension* of dL, aiming to formally prove statements such as "an earlier engagement of the emergency brake yields a smaller collision speed." A main technical challenge is to combine two dynamics, so that the powerful inference rules of dL (such as the differential invariant rules) can be applied to such relational reasoning, yet in such a way that we relate two different time points. Our contributions are a semantical theory of *time stretching*, and the resulting *synchronization* rule that expresses time stretching by the syntactic operation of Lie derivative. We implemented this rule as an extension of KeYmaera X, by which we successfully verified relational properties of a few models taken from the automotive domain.

**Keywords:** hybrid system · cyber-physical system · formal verification · theorem proving · dynamic logic.

## **1 Introduction**

**Hybrid Systems** *Cyber-physical systems* (CPSs) have been studied as a subject in their own right for over a decade, but the rise of *automated driving* in the last few years has created a panoply of challenges in the quality assurance of these systems. In the foreseeable future, millions of cars will be driving on streets with unprecedented degrees of automation; ensuring the safety and reliability of these automated driving systems is a pressing social and economic challenge.

⋆ Thanks are due to Stefan Mitsch, André Platzer, and Yong Kiam Tan for useful tips on the KeYmaera X source code; and to Kenji Kamijo, Yoshiyuki Shinya, and Takamasa Suetomi from Mazda Motor Corporation for helpful discussions. The authors are supported by ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603), JST. I.H. is supported by Grant-in-Aid No. 15KT0012, JSPS. J.D. is supported by Grant-in-aid No. 19K20215, JSPS. The work was done during J.K.'s internship at the National Institute of Informatics, Tokyo, Japan.

The *hybridity* of cyber-physical systems, the combination of continuous physical dynamics and discrete digital control, poses unique scientific challenges. To address these challenges, two communities have naturally joined forces: *control theory* whose traditional application domain is continuous dynamics and *formal methods* that have mainly focused on the analysis of software systems. This has been a fruitful cross-pollination: techniques from formal methods such as bisimilarity [9] and temporal logic specification [8] have been imported to control theory, and conversely, control theory notions such as Lyapunov functions have been used in formal methods [26].

**Deductive Verification of Hybrid Systems** In the formal methods community, two major classes of techniques are *model checking* (usually automata-based and automatic) and *deductive verification* (logic-based, and either automated or interactive). Model checking techniques rely on exhaustive search of state spaces and therefore cannot be applied *per se* to hybrid systems with infinite state spaces. This has led to the active study of *discrete abstraction* of hybrid dynamics (see e.g. [9]) and of *bounded model checking* (see [5]).

In contrast, nothing immediately rules out the use of the deductive approach for hybrid systems. Finitely many variables in logical formulas can represent infinitely many states, and proofs in suitably designed logics are valid even when the semantic domain is uncountable. That said, designing such a logic, proving the soundness of its rules, and showing that the logic is actually *useful* in hybrid system verification is a difficult task.

Platzer's *differential dynamic logic* dL [21] is a remarkable success in this direction. Its syntax is systematic and intuitive, extending the classic formalism of *dynamic logic* [10] with differential equations as programs. Its proof rules encapsulate several essential proof principles about differential equations, including a *differential invariant* (DI) rule for universal properties and *side deduction* for existential properties. The logic dL has served as a general platform that accommodates a variety of techniques, including those that come from real algebraic geometry [22]. Furthermore, dL comes with sophisticated tool support: the latest tool, KeYmaera X [15], comes with a graphical interface for interactive proving and a number of automation heuristics.

**Relational Reasoning on Hybrid Systems** In this work, we introduce proof-based techniques for *relational reasoning* to the deductive verification of hybrid systems. Here, by relational reasoning we mean analyzing how changes in a system affect its overall behavior. One application we have in mind is to deduce the safety of a system by checking its most aggressive settings. To make such a reduction sound, we need to verify that less aggressive versions result in less dangerous outcomes than the aggressive ones. As a simple example, consider the following case distilled from our collaboration with an industrial partner.

**Example 1 (leading example: collision speed).** Consider two cars $\overline{C}$ and $\underline{C}$, whose positions and velocities are real numbers denoted by $\overline{x}, \underline{x}$ and $\overline{v}, \underline{v}$, respectively. Their dynamics are governed by the following differential equations:

$$
\dot{\overline{x}} = \overline{v}, \quad \dot{\overline{v}} = 1; \qquad \dot{\underline{x}} = \underline{v}, \quad \dot{\underline{v}} = 2. \tag{1}
$$

Both cars start at the same position at rest ($\overline{x} = \underline{x} = 0 \land \overline{v} = \underline{v} = 0$), and both drive towards a wall at position 1. We consider this question: *which car is traveling faster when it hits the wall?*

The two hatched areas designate the traveled distances ($\overline{x} = \underline{x} = 1$). We can compute the collision speeds ($\overline{v} = \sqrt{2}$ and $\underline{v} = 2$) via the closed-form solutions of the differential equations (1), concluding $\overline{v} \le \underline{v}$ when $\overline{x} = \underline{x} = 1$.

**Fig. 1.** An ad-hoc proof for Example 1

The second car, $\underline{C}$, has strictly greater acceleration all the time, so we can imagine that $\underline{C}$ hits the wall harder. This hypothesis turns out to be correct, but we are more interested in how this claim could be proven.

A simple proof would be to solve the differential equations exactly and notice that $\underline{C}$ has greater velocity at the end of its run. However, closed-form solutions are known to be scarce for ODEs; we want a proof method that is more general.

Another possible argument is based on the relationship between the accelerations. Since the second car's acceleration is greater at every point in time, we might be tempted to conclude that the second car's velocity must always be greater than the first car's, based on the monotonicity of integration: $\overline{a}(t) \le \underline{a}(t) \Rightarrow \overline{v}(T) = \int_0^T \overline{a}(t)\,dt \le \int_0^T \underline{a}(t)\,dt = \underline{v}(T)$. However, this reasoning has a flaw. $\underline{C}$ reaches the wall at an earlier point in time than $\overline{C}$, and therefore $\overline{C}$ has more time to accelerate. In the end, we have to compare $\int_0^{\overline{T}} \overline{a}(t)\,dt$ and $\int_0^{\underline{T}} \underline{a}(t)\,dt$, where $\overline{a}(t) \le \underline{a}(t)$ for all $t \in [0, \overline{T}]$ but $\overline{T} > \underline{T}$, as depicted in Fig. 1.
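For the concrete data of Example 1 the two integrals can be worked out from the closed-form solutions of (1) with the initial conditions above; this is only a sanity check of the numbers in Fig. 1:

$$
\begin{aligned}
\overline{x}(t) = \tfrac{1}{2}t^2 = 1 &\;\Rightarrow\; \overline{T} = \sqrt{2}, & \overline{v}(\overline{T}) &= \int_0^{\sqrt{2}} 1\,dt = \sqrt{2},\\
\underline{x}(t) = t^2 = 1 &\;\Rightarrow\; \underline{T} = 1, & \underline{v}(\underline{T}) &= \int_0^{1} 2\,dt = 2.
\end{aligned}
$$

The integrand of $\overline{C}$ is pointwise smaller, but it is integrated over the longer interval $[0, \sqrt{2}]$. Here the comparison still comes out as $\sqrt{2} \le 2$, but monotonicity of integration alone does not justify it.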

Our solution, roughly stated, is to compare the two cars at the same points *in space* by reparametrizing time for one of the two cars. This parametrization is specially chosen to ensure the two cars pass through the same points in space at the same points in time.
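On Example 1 this idea can be made explicit. The following is our own illustration under the initial conditions of Example 1, not the general synchronization rule developed later: stretch the time of $\overline{C}$ by a map $\sigma$ chosen so that both cars are at the same position at matching times.

$$
\begin{aligned}
&\overline{x}(\sigma(t)) = \underline{x}(t) \iff \tfrac{1}{2}\sigma(t)^2 = t^2 \iff \sigma(t) = \sqrt{2}\,t \qquad (t \ge 0),\\
&\tfrac{d}{dt}\,\overline{v}(\sigma(t)) = \sigma'(t) \cdot 1 = \sqrt{2} \;\le\; 2 = \tfrac{d}{dt}\,\underline{v}(t)
\;\Rightarrow\; \overline{v}(\sigma(t)) = \sqrt{2}\,t \;\le\; 2t = \underline{v}(t).
\end{aligned}
$$

After stretching, both cars reach the wall at the same parameter value $t = 1$, so monotonicity of integration over the common interval $[0, 1]$ is now legitimate and gives $\overline{v} = \sqrt{2} \le 2 = \underline{v}$ at the wall. In general, keeping the positions equal forces $\sigma'(t) = \underline{v}(t) / \overline{v}(\sigma(t))$, which here equals $2t / (\sqrt{2}\,t) = \sqrt{2}$ for $t > 0$.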

Our current work is about a logical infrastructure needed to support this kind of relational reasoning comparing two different dynamics, based on dL. Our semantical theory, as well as the resulting syntactic extension of dL by what we call the synchronization rule, generalizes the kind of reasoning in Example 1 using the notion of *time stretching*.

**Technical Contributions** We make the following technical contributions.


**Relational Reasoning in Practice** We contend that relational reasoning has practical significance, based on our collaboration with an industry partner. Relational properties, especially with an aspect of *monotonicity*, abound in real-world examples. In particular, we have often encountered situations where we have a parametrized model M(p) and need to show a property of the form:

$$p_1 < p_2 \text{ implies } M(p_2) \text{ is less safe than } M(p_1). \tag{2}$$

These properties occur especially in the context of *product lines*, where the same model can come in many slight variants. Example 1 is such a situation.

Relational statements (such as monotonicity) are easy to state and interpret. Intuitions about the *direction of* the change in a behavior of a system resulting from the change of a parameter are more often valid than intuitions about the *amount of* such a change. These kinds of simple statements are often used by engineers to establish the basic credibility of a model. Qualitative, relational properties also tend to be easier to prove than exact, quantitative properties.

Finally, monotonicity can serve as a powerful technique in *test-case reduction*. If a safety property is too complex to be deductively verified, one usually turns to testing. It is often still possible to establish a simple monotonicity property of the form (2). This can powerfully boost testing efforts: one can focus exclusively on establishing safety for the extreme case $M(p_{\max})$.

**Related Work** Since this work is about a relational extension of dL, the works on dL mentioned above are naturally relevant. We discuss other related works here.

Simulink (Mathworks, Inc.) is an industry standard for modeling hybrid systems, but unfortunately Simulink models do not come with rigorously defined semantics. Therefore, while integration with Simulink is highly desirable for any quality assurance method for hybrid systems, formal verification methods require some work to set up the semantics of Simulink models. The recent work [12] tackles this problem by identifying a fragment of Simulink and devising a translator from Simulink models to dL programs. Their translation is ingenious, and their tool is capable of proving rather complicated properties when used in combination with KeYmaera X [15].

Relational extensions of the *Floyd–Hoare logic*—which can be thought of as a discrete-time version of dL—have been energetically pursued especially in the context of *differential privacy* [4,2,3].

In deductive verification of hybrid systems, an approach alternative to dL uses *nonstandard analysis* [23] and regards continuous dynamics as if they were discrete due to the existence of infinitesimal elements [24,25]. The logic used in that framework is exactly the same as the classic Floyd–Hoare logic, and the soundness of the logic in the hybrid setting is shown by a model-theoretic result called the *transfer principle*. Its tool support has been pursued as well [11].

This is not the first time that relational reasoning, in a general sense, has been pursued in dL. Specifically, Loos and Platzer introduce the *refinement* primitive β ≤ α, which asserts a refinement relation between two hybrid dynamics, meaning that the set of successor states of β is included in that of α [14]. This kind of relation is inspired by the software engineering paradigm of incremental modeling (supported by languages and tools such as Event-B [1,6]); the result is a rigorous deductive framework for refining an abstract model (with more nondeterminism) into a more concrete one (with less nondeterminism). In contrast, we compare one concrete model (not necessarily with nondeterminism) with another. Thus, our notion of relational reasoning builds more on relational extensions of the Floyd–Hoare logic [4,2,3] than on Event-B. Combining these two orthogonal kinds of relational extensions of dL is important future work.

**Organization** In Section 2, we recall some basics of differential dynamic logic dL: its syntax, semantics and some proof rules. Our main goal, relational reasoning, is formulated in Section 3, where we identify difficulties in doing so in the original dL. In Section 4 we introduce the semantical notion of time stretching, and turn its theory into the new synchronization rule. After introducing our implementation in Section 5, we describe our three case studies in Section 6.

The appendix containing omitted proofs and details, the source code and the artifact are found at http://group-mmm.org/rddl tacas 2020/.

## **2 Preliminaries: Syntax and Semantics of the Logic dL**

We recall some of the basics of *differential dynamic logic* (dL). The interested reader is referred to [19,20] for full details.

**Definition 2 (language).** We fix a set $\mathcal{V}$ of *variables*, denoted by $x, y, \dots$. The set of *terms* is defined by the following grammar:

$$e, f, g, \dots \quad \coloneqq \; x \mid n \mid -e \mid e + f \mid e \cdot f \mid e/f$$

where $x \in \mathcal{V}$ and $n \in \mathbb{N}$. First-order *formulas* are defined by

$$P, Q, \dots \quad \coloneqq \ e \le f \mid \neg P \mid P \land Q \mid \forall x. P$$

A *state* is a function mapping each variable to a real number, $\omega\colon \mathcal{V} \to \mathbb{R}$. We denote the set of all states by $\mathbb{R}^{\mathcal{V}}$. Given a state, each term has a valuation in the reals, and each formula has a valuation in the Booleans, defined by the usual induction. We denote these by $[\![e]\!]_\omega \in \mathbb{R}$ and $[\![P]\!]_\omega \in \{\mathrm{true}, \mathrm{false}\}$, respectively. The *models* of a first-order formula $P$ are the states satisfying $P$: $[\![P]\!] := \{\omega \in \mathbb{R}^{\mathcal{V}} \mid [\![P]\!]_\omega = \mathrm{true}\}$.

We use classical shorthands, including $e = f := e \le f \land f \le e$, $P \lor Q := \neg(\neg P \land \neg Q)$, $\exists x.\, P := \neg(\forall x.\, \neg P)$, and $\top := 0 \le 0$. We denote a vector $(e_1, \dots, e_n)$ of terms (or variables) by $\mathbf{e}$ when the length $n$ is irrelevant or clear from the context.
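To make the valuation $[\![e]\!]_\omega$ concrete, here is a minimal Python sketch with terms and quantifier-free formulas encoded as nested tuples and states as dictionaries from variable names to floats. The tuple encoding and function names are our own choices for illustration. Universal quantification is deliberately omitted: evaluating $\forall x.\, P$ over the reals cannot be done by enumeration and requires quantifier elimination instead.

```python
from typing import Dict, Tuple

Term = Tuple      # ('var', x) | ('const', n) | ('neg', e) | ('add'|'mul'|'div', e, f)
Formula = Tuple   # ('le', e, f) | ('not', P) | ('and', P, Q)
State = Dict[str, float]


def eval_term(e: Term, omega: State) -> float:
    """Valuation [[e]]_omega of a term e in state omega."""
    tag = e[0]
    if tag == "var":
        return omega[e[1]]
    if tag == "const":
        return float(e[1])
    if tag == "neg":
        return -eval_term(e[1], omega)
    a, b = eval_term(e[1], omega), eval_term(e[2], omega)
    if tag == "add":
        return a + b
    if tag == "mul":
        return a * b
    if tag == "div":
        return a / b  # Python raises ZeroDivisionError on b == 0
    raise ValueError(f"unknown term constructor {tag!r}")


def eval_formula(p: Formula, omega: State) -> bool:
    """Valuation [[P]]_omega of a quantifier-free formula P."""
    tag = p[0]
    if tag == "le":
        return eval_term(p[1], omega) <= eval_term(p[2], omega)
    if tag == "not":
        return not eval_formula(p[1], omega)
    if tag == "and":
        return eval_formula(p[1], omega) and eval_formula(p[2], omega)
    raise ValueError(f"unknown formula constructor {tag!r}")


def eq(e: Term, f: Term) -> Formula:
    """Shorthand e = f := e <= f and f <= e."""
    return ("and", ("le", e, f), ("le", f, e))


omega = {"x": 1.0, "v": 2.0}
print(eval_term(("add", ("var", "x"), ("mul", ("var", "v"), ("var", "v"))), omega))
```

The derived shorthands such as `eq` are just functions that build formulas from the primitive constructors, mirroring how the text defines them.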

We now introduce the syntax of hybrid programs.

**Definition 3 (hybrid programs).** The set $\mathrm{HP}(\mathcal{V})$ of *hybrid programs* over variables $\mathcal{V}$ is given by the following grammar:

$$\alpha, \alpha_1, \alpha_2, \dots \;\coloneqq\; ?P \mid x := e \mid \dot{x}_1 = e_1, \dots, \dot{x}_n = e_n \,\&\, Q \mid \alpha_1; \alpha_2 \mid \alpha_1 \cup \alpha_2 \mid \alpha_1^*$$

We may also abbreviate $\dot{x}_1 = e_1, \dots, \dot{x}_n = e_n$ by $\dot{\mathbf{x}} = \mathbf{e}$. Hybrid programs of the form $\dot{\mathbf{x}} = \mathbf{e} \,\&\, Q$ are especially important in this work. We call such a program *differential dynamics*, where $\dot{\mathbf{x}} = \mathbf{e}$ is its *differential equation* and the first-order formula $Q$ is its *evolution domain constraint*. Intuitively, the values of the variables $\mathbf{x}$ evolve continuously in time according to $\dot{\mathbf{x}} = \mathbf{e}$, as long as $Q$ is satisfied at the current value of $\mathbf{x}$. If we view differential dynamics as a continuous analog of loops, then $Q$ plays the role of the guard and $\dot{\mathbf{x}} = \mathbf{e}$ plays the role of the body.<sup>5</sup> We write $\dot{\mathbf{x}} = \mathbf{e}$ instead of $\dot{\mathbf{x}} = \mathbf{e} \,\&\, \top$.

**Definition 4 (solutions).** A mapping $\psi\colon [0, T) \to \mathbb{R}^{\mathcal{V}}$ with $T \in [0, \infty]$ is called a *solution* of a differential equation $\dot{x}_1 = e_1, \dots, \dot{x}_n = e_n$ if $\psi$ is differentiable in $[0, T)$ and, whenever $t \in [0, T)$, $\dot{\psi}(t)(x_i) = [\![e_i]\!]_{\psi(t)}$ for $i \in \{1, \dots, n\}$ and $\dot{\psi}(t)(y) = 0$ for any $y \in \mathcal{V} \setminus \{x_1, \dots, x_n\}$.

According to the Picard–Lindelöf theorem [13], for each differential equation $\dot{\mathbf{x}} = \mathbf{e}$ and each state $\omega$, there is a unique maximal solution $\psi_\omega\colon [0, T_\omega) \to \mathbb{R}^{\mathcal{V}}$ of the differential equation satisfying $\psi_\omega(0) = \omega$.

**Definition 5 (semantics of hybrid programs).** The *semantics* of a hybrid program $\alpha$ is a relation $\xrightarrow{[\![\alpha]\!]} \subseteq \mathbb{R}^{\mathcal{V}} \times \mathbb{R}^{\mathcal{V}}$ on states, defined by:

<sup>5</sup> This analogy is not perfect: a typical while loop can only exit when its guard is false, whereas a hybrid program can exit the differential dynamics while Q is satisfied.


**Definition 6 (**dL **formulas).** *Modal formulas* extend first-order formulas and are defined by the following grammar:

$$\varphi, \varphi_1, \varphi_2, \dots \;\coloneqq\; e \le f \mid \neg \varphi \mid \varphi_1 \land \varphi_2 \mid \forall x.\, \varphi \mid [\alpha]\, \varphi.$$

As usual, we write $\langle\alpha\rangle\varphi$ to abbreviate $\neg[\alpha]\neg\varphi$. We will also call modal formulas "dL formulas", since these are the widest class of formulas in dL.

The Boolean valuation $[\![\varphi]\!]_\omega$ of a modal formula $\varphi$ in a state $\omega$ is defined in the same way as for first-order formulas, with the addition that $[\![[\alpha]\varphi]\!]_\omega = \mathrm{true}$ if and only if $[\![\varphi]\!]_{\omega'} = \mathrm{true}$ for all $\omega'$ such that $\omega \xrightarrow{[\![\alpha]\!]} \omega'$.

We take the sequent-calculus-style proof system for dL, following [22]. It has judgments of the form $\Gamma \vdash \varphi$, where $\Gamma$ is a set of modal formulas and $\varphi$ is a single modal formula. One of the most fundamental axioms is

$$[\dot{\mathbf{x}} = \mathbf{e} \,\&\, Q]\, \varphi \iff \forall t \ge 0.\, \big(\forall s \in [0, t].\, [\mathbf{x} := \mathbf{f}(s)]\, Q\big) \Rightarrow [\mathbf{x} := \mathbf{f}(t)]\, \varphi \quad \text{(solve)}$$

where $\mathbf{f}(t)$ is a term with a fresh variable $t$ such that $[\![\mathbf{f}]\!]$ is a solution of $\dot{\mathbf{x}} = \mathbf{e}$ and $[\![\mathbf{f}(0)]\!] = \mathrm{id}$.

Some other rules of dL, such as the differential invariant rule (DI) that is central in many proofs, are introduced later in Definition 13.

## **3 Relational Differential Dynamic Logic**

Intuitively, we want a way to describe two dynamics that are executed in parallel, and compare their outputs. In terms of (nondeterministic) transition systems, parallel composition is available via tensor products.

**Definition 7 (tensor product).** Given two transition systems $(S, R)$ and $(S', R')$, their *tensor product* $(S \times S', R \otimes R')$ is defined to be the transition system whose transition relation is given by

$$R \otimes R' := \{ ((s, s'), (t, t')) \mid (s, t) \in R,\ (s', t') \in R' \}.$$
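For finite transition systems, Definition 7 can be computed directly. The following sketch (our own helper, with relations as Python sets of pairs) pairs every transition of the first system with every transition of the second, so a pair of states can move only when both components move simultaneously.

```python
from itertools import product
from typing import Hashable, Set, Tuple

Rel = Set[Tuple[Hashable, Hashable]]


def tensor(r1: Rel, r2: Rel) -> Set[Tuple[Tuple[Hashable, Hashable], Tuple[Hashable, Hashable]]]:
    """R (x) R' = { ((s, s'), (t, t')) | (s, t) in R, (s', t') in R' }."""
    return {((s, s2), (t, t2)) for (s, t), (s2, t2) in product(r1, r2)}


R = {(0, 1), (1, 2)}
R2 = {("a", "b")}
print(sorted(tensor(R, R2)))
```

If either relation is empty, so is the tensor product: a component that cannot move blocks the joint system.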

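No extension of the dL syntax is needed to model tensor products: disjointness of the variables of the two systems suffices. From now on, we split the variables into two disjoint sets: $\mathcal{V} = \overline{\mathcal{V}} \uplus \underline{\mathcal{V}}$. We denote variables in $\overline{\mathcal{V}}$ by $\overline{x}, \overline{y}, \dots$ and those in $\underline{\mathcal{V}}$ by $\underline{x}, \underline{y}, \dots$. Terms in $\mathcal{T}(\overline{\mathcal{V}})$, first-order formulas in $\mathit{Fml}(\overline{\mathcal{V}})$, and programs in $\mathrm{HP}(\overline{\mathcal{V}})$ are denoted by $\overline{e}, \overline{f}, \dots$, $\overline{P}, \overline{Q}, \dots$, and $\overline{\alpha}, \overline{\beta}, \dots$, respectively, and similarly for the corresponding constructs over $\underline{\mathcal{V}}$.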

An easy proof of the following fact can be found in the appendix.

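**Proposition 8.** $\xrightarrow{[\![\overline{\alpha}]\!]} \otimes \xrightarrow{[\![\underline{\alpha}]\!]} \;=\; \xrightarrow{[\![\overline{\alpha};\, \underline{\alpha}]\!]}$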

Scenarios with two parallel differential dynamics are the main focus of this work. We formalize an assertion relating two dynamics using the following format. It is a syntactic counterpart of Proposition 8.

**Definition 9 (relational differential dynamics).** We call hybrid programs of the following form *relational differential dynamics (RDD)*

$$
\dot{\overline{\mathbf{x}}} = \overline{\mathbf{e}} \& \,\, \overline{Q} \; ; \quad \dot{\underline{\mathbf{x}}} = \underline{\mathbf{e}} \& \,\, \underline{Q} \tag{3}
$$

Now that we have ways to express separate systems evolving in parallel, we turn to the construction of proofs which reason about their relationships.

**Example 10.** Using RDD, the problem in Example 1 is expressed as $\Gamma_C \vdash [\overline{\delta_C};\, \underline{\delta_C}]\, \varphi_C$, where $\overline{\delta_C} := (\dot{\overline{x}} = \overline{v}, \dot{\overline{v}} = 1)$, $\underline{\delta_C} := (\dot{\underline{x}} = \underline{v}, \dot{\underline{v}} = 2)$, $\Gamma_C := \{\overline{x} = \underline{x} = 0,\ \overline{v} = \underline{v} = 0\}$ is the precondition, and $\varphi_C := (\overline{x} = \underline{x} = 1 \Rightarrow \overline{v} \le \underline{v})$ is the postcondition.

Let us prove the RDD sequent $\Gamma_C \vdash [\overline{\delta_C};\, \underline{\delta_C}]\, \varphi_C$ in KeYmaera X. The only rule applicable to this sequent in KeYmaera X turns it into $\Gamma_C \vdash [\overline{\delta_C}][\underline{\delta_C}]\, \varphi_C$. We then explicitly "solve" the second dynamics, yielding the following goal:

$$\Gamma_C \vdash [\overline{\delta_C}]\, \forall \underline{t} \ge 0.\, \big(\overline{x} = \underline{x} + \underline{v} \cdot \underline{t} + \underline{t}^2 = 1 \Rightarrow \overline{v} \le \underline{v} + 2\underline{t}\big) \tag{4}$$

where $\underline{x}$ and $\underline{v}$ in $\varphi_C$ are replaced by their explicit solutions with respect to the freshly introduced time variable $\underline{t}$. Again, differential invariant rules do not apply to (4), so one must solve the first dynamics too, yielding

$$\Gamma_C \vdash \forall \overline{t} \ge 0.\, \forall \underline{t} \ge 0.\, \big(\overline{x} + \overline{v} \cdot \overline{t} + \overline{t}^2/2 = \underline{x} + \underline{v} \cdot \underline{t} + \underline{t}^2 = 1 \Rightarrow \overline{v} + \overline{t} \le \underline{v} + 2\underline{t}\big)$$

Since this goal is first order, quantifier elimination, a central proof technique in KeYmaera X [18], proves it.

The above example worked out since it admits explicit solutions expressible in dL. This is not always the case as the following example demonstrates.

**Example 11.** We consider two objects moving through fluids subject to different kinds of drag. One object moves through a viscous fluid and is therefore subject to linear drag; its dynamics are $\underline{\delta_F} := (\dot{\underline{x}} = \underline{v}, \dot{\underline{v}} = -\underline{v})$.

The other object moves through a less viscous fluid and is subject to turbulent drag; its dynamics are $\overline{\delta_F} := (\dot{\overline{x}} = \overline{v}, \dot{\overline{v}} = -\overline{v}^2)$. Our goal is to show that the latter has the higher speed when both objects reach a certain point in space ($\overline{x} = \underline{x} = l$).

The following functions $\underline{v}^*$, $\underline{x}^*$, $\overline{v}^*$ and $\overline{x}^*$ are solutions of the dynamics.

$$\begin{aligned} \underline{v}^*(\underline{t}) &= \underline{v}_0 \cdot e^{-\underline{t}} & \underline{x}^*(\underline{t}) &= \underline{x}_0 + \underline{v}_0 \cdot (1 - e^{-\underline{t}}) \\ \overline{v}^*(\overline{t}) &= \frac{\overline{v}_0}{1 + \overline{v}_0 \cdot \overline{t}} & \overline{x}^*(\overline{t}) &= \overline{x}_0 + \log(1 + \overline{v}_0 \cdot \overline{t}) \end{aligned}$$

where $\underline{v}_0$ etc. denote the initial values. Unfortunately, we cannot express exponentiations and logarithms in KeYmaera X, and thus the "solve" rule that we used in Example 10 cannot be applied here.
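Although KeYmaera X cannot express these solutions, the claim can be sanity-checked numerically for one concrete instance. The sketch below integrates both dynamics with the classical fourth-order Runge–Kutta scheme until the position reaches $l$. The values $x_0 = 0$, $v_0 = 1$ and $l = 0.5$ are our own hypothetical choices, and a numerical check of one instance is of course not a proof; from the closed forms above, the expected speeds at $x = l$ are $0.5$ (linear drag) and $e^{-0.5} \approx 0.607$ (turbulent drag).

```python
import math
from typing import Callable, List, Tuple

State = List[float]  # [position, velocity]


def rk4_until(rhs: Callable[[State], State], state: State, x_target: float,
              dt: float = 1e-4, t_max: float = 100.0) -> Tuple[float, State]:
    """Integrate ds/dt = rhs(s) with RK4 until position >= x_target (or t_max)."""
    t = 0.0
    while state[0] < x_target and t < t_max:
        k1 = rhs(state)
        k2 = rhs([s + dt / 2 * k for s, k in zip(state, k1)])
        k3 = rhs([s + dt / 2 * k for s, k in zip(state, k2)])
        k4 = rhs([s + dt * k for s, k in zip(state, k3)])
        state = [s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        t += dt
    return t, state


def linear_drag(s: State) -> State:     # x' = v, v' = -v
    return [s[1], -s[1]]


def turbulent_drag(s: State) -> State:  # x' = v, v' = -v^2
    return [s[1], -s[1] ** 2]


l = 0.5
t_lin, (x_lin, v_lin) = rk4_until(linear_drag, [0.0, 1.0], l)
t_turb, (x_turb, v_turb) = rk4_until(turbulent_drag, [0.0, 1.0], l)
print(f"linear drag:    t = {t_lin:.4f}, v = {v_lin:.4f}")
print(f"turbulent drag: t = {t_turb:.4f}, v = {v_turb:.4f}")
```

The `t_max` bound matters: under linear drag the position never exceeds $x_0 + v_0$, so for $l \ge v_0$ the target would otherwise never be reached.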

One obvious solution would be to add support for exponentiations and logarithms in KeYmaera X, but this would break the decidability of the underlying first-order logic, which is a major feature of dL [18]. In fact, the same issue occurs even in standard use cases of KeYmaera X, and it motivated the introduction of proof rules, based on the *Lie derivative*, that do not demand explicit solutions to differential dynamics [20,22].

**Definition 12 (formal Lie derivative in** dL **from [20,22]).** The formal Lie derivative of a term $f$ along dynamics $\delta \equiv (\dot{\mathbf{x}} = \mathbf{e} \,\&\, Q)$ of dimension $n$ is the dL term $\mathcal{L}_\delta f \in \mathcal{T}(\mathcal{V})$ given by<sup>6</sup>

$$\mathcal{L}_{\delta} f := \frac{\partial}{\partial x_1} f \cdot e_1 + \dots + \frac{\partial}{\partial x_n} f \cdot e_n$$
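Definition 12, together with the inductive partial derivative of footnote 6, can be sketched symbolically. The code below uses a tuple encoding of terms (our own choice, not KeYmaera X's internal representation), computes $\frac{\partial}{\partial x} e$ by the sum, product and quotient rules, and assembles $\mathcal{L}_\delta f$ as $\sum_i \frac{\partial}{\partial x_i} f \cdot e_i$. No simplification of the resulting term is attempted, and the evolution domain $Q$ plays no role in the derivative.

```python
from functools import reduce
from typing import Dict, Tuple

Term = Tuple  # ('var', x) | ('const', n) | ('neg', e) | ('add'|'mul'|'div', e, f)


def partial(e: Term, x: str) -> Term:
    """Symbolic partial derivative of e with respect to x (inductive on e)."""
    tag = e[0]
    if tag == "var":
        return ("const", 1 if e[1] == x else 0)
    if tag == "const":
        return ("const", 0)
    if tag == "neg":
        return ("neg", partial(e[1], x))
    f, g = e[1], e[2]
    df, dg = partial(f, x), partial(g, x)
    if tag == "add":
        return ("add", df, dg)
    if tag == "mul":  # product rule
        return ("add", ("mul", df, g), ("mul", f, dg))
    if tag == "div":  # quotient rule: (f'g - fg') / (g g)
        return ("div", ("add", ("mul", df, g), ("neg", ("mul", f, dg))), ("mul", g, g))
    raise ValueError(f"unknown term constructor {tag!r}")


def lie(f: Term, ode: Dict[str, Term]) -> Term:
    """Lie derivative of f along x_1' = e_1, ..., x_n' = e_n (ode maps x_i to e_i)."""
    summands = [("mul", partial(f, x), e) for x, e in ode.items()]
    return reduce(lambda a, b: ("add", a, b), summands, ("const", 0))


def eval_term(e: Term, omega: Dict[str, float]) -> float:
    """Numeric valuation of a term, used here only to inspect results."""
    tag = e[0]
    if tag == "var":
        return omega[e[1]]
    if tag == "const":
        return float(e[1])
    if tag == "neg":
        return -eval_term(e[1], omega)
    a, b = eval_term(e[1], omega), eval_term(e[2], omega)
    if tag == "add":
        return a + b
    if tag == "mul":
        return a * b
    return a / b


x, v = ("var", "x"), ("var", "v")
# L_{x'=v, v'=1} (x * x) = 2 x v, evaluated at x = 3, v = 4
print(eval_term(lie(("mul", x, x), {"x": v, "v": ("const", 1)}), {"x": 3.0, "v": 4.0}))
```

Because the derivative is computed syntactically, the result is again a dL term, which is what allows rules such as (DI) to state their premises without solving the dynamics.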

**Definition 13 (proof rules from [20,22]).** The following rules are sound:

$$\frac{\Gamma, Q \vdash f \sim 0 \quad \Gamma \vdash \left[\delta\right] \mathcal{L}\_{\delta} \, f \simeq 0}{\Gamma \vdash \left[\delta\right] f \sim 0} \text{ DI} \qquad \frac{\Gamma \vdash p \sim 0 \quad Q \vdash \mathcal{L}\_{\delta} \, p \simeq g \cdot p}{\Gamma \vdash \left[\delta\right] p \sim 0} \text{ Dbx}$$

where $\delta \equiv (\dot{\mathbf{x}} = \mathbf{e} \,\&\, Q)$, $(\sim, \simeq) \in \{(=,=), (>,\geq), (\geq,\geq)\}$, and $g$ is any term without division.

The differential invariant rule (DI) is the central rule for proving safety properties [20,22]: it reduces a global property of the dynamics to local reasoning by means of Lie derivatives. The Darboux inequality rule (Dbx) is derived from real algebraic geometry; see e.g. [22].

**Example 14.** Consider the one-variable differential dynamics $\dot{x} = 2$. Suppose we want to show that $x \geq 0$ holds after following these dynamics for any amount of time, starting from $x = 1$. One way to do this is to show that (1) this predicate holds initially and (2) the time derivative of $x$ is always nonnegative. These are precisely the two premises of the (DI) rule: to show the sequent $x = 1 \vdash [\dot{x} = 2]\, x \geq 0$, (DI) requires us to prove (1) $x = 1 \vdash x \geq 0$ and (2) $x = 1 \vdash [\dot{x} = 2]\, \mathcal{L}\_{\dot{x}=2}\, x \geq 0$, where $\mathcal{L}\_{\dot{x}=2}\, x = 2$. Note that the initial condition $x = 1$ is given in the antecedent of this sequent.

## **4 Synchronizing Dynamics**

The intuitive explanation of the RDD construction of Definition 9 is a "serialization" of two dynamics. This construction, however, does not match the (DI) and (Dbx) rules, as they accept only one dynamics followed by a comparison. In order to make use of these rules in our relational reasoning, we introduce another proof method, which "synchronizes" two dynamics.

<sup>6</sup> It is easy to see that the *derivative* of a term $e \in \mathcal{T}(V)$ with respect to $x \in V$ can be given as a dL term $\frac{\partial}{\partial x} e \in \mathcal{T}(V)$ such that $[\![\frac{\partial}{\partial x} e]\!] = \frac{\partial}{\partial x} [\![e]\!]$. The definition of $\frac{\partial}{\partial x} e$ is inductive on the structure of the term $e$.

After some theoretical preparation, we define the new rule and prove its soundness. We illustrate the usefulness of this rule in Section 6 through case studies inspired by our collaboration with industry.

### **4.1 Time Stretching**

A key theoretical tool towards the soundness of our synchronization rule is called *time stretching*. Its idea is very similar to the technique of *time-reparametrization* for ODEs [7].

**Definition 15 (time stretch function).** Let $T \in \mathbb{R}\_{\geq 0}$. A function $K : [0, T] \to \mathbb{R}\_{\geq 0}$ is a *time stretch function* if $K(0) = 0$, $K$ is continuously differentiable, and $\dot{K}(t) > 0$ for each $t \in [0, T]$.

**Remark 16.** The condition $\dot{K}(t) > 0$ ensures that $K$ is strictly increasing and hence a bijection from $[0, T]$ to $[0, K(T)]$. Its inverse is $K^{-1} : [0, K(T)] \to [0, T]$, and it is straightforward to check that $K^{-1}$ is again a time stretch function.

The next results tell us how to turn one ODE into another, given a time stretch function $K$, so that the time-stretch $\psi \circ K$ of a solution $\psi$ of the first becomes a solution of the second.

**Lemma 17.** *Suppose $f : \mathbb{R}^V \to \mathbb{R}^V$ is a vector field and $K : [0, T] \to [0, K(T)]$ is a time stretch function. If $\psi : [0, K(T)) \to \mathbb{R}^V$ satisfies $\dot{\psi}(s) = f(\psi(s))$ for all $s \in [0, K(T))$, then the function $\rho = \psi \circ K : [0, T) \to \mathbb{R}^V$ satisfies $\dot{\rho}(t) = \dot{K}(t) \cdot f(\rho(t))$ for all $t \in [0, T)$.*

*Proof.* We have $\dot{\rho}(t) = \dot{K}(t) \cdot \dot{\psi}(K(t)) = \dot{K}(t) \cdot f(\psi(K(t))) = \dot{K}(t) \cdot f(\rho(t))$, where the first equality is by the definition of $\rho$ and the chain rule, the second is by the assumption on $\dot{\psi}$, and the last is by the definition of $\rho$.

Since the inverse of a time stretch function is another time stretch function, we obtain the following corollary of Lemma 17.

**Corollary 18.** *Let $K : [0, T] \to [0, K(T)]$ be a time stretch function. Let $\rho : [0, T) \to \mathbb{R}^V$ satisfy $\dot{\rho}(t) = \dot{K}(t) \cdot f(\rho(t))$ whenever $0 \leq t < T$. Then the function $\psi : [0, K(T)) \to \mathbb{R}^V$, defined by $\psi(s) := \rho(K^{-1}(s))$, satisfies $\dot{\psi}(s) = f(\psi(s))$ whenever $0 \leq s < K(T)$.*
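Lemma 17 and Corollary 18 can be checked numerically on a toy instance. In the sketch below, the vector field $f(x) = -x$, its solution $\psi(s) = e^{-s}$, and the time stretch function $K(t) = t + t^2/2$ (with $K(0) = 0$ and $\dot{K}(t) = 1 + t > 0$) are our own illustrative choices.

```python
import math


def f(x):
    """Illustrative 1-D vector field: x' = -x."""
    return -x


def psi(s):
    """Solution of x' = f(x) with x(0) = 1."""
    return math.exp(-s)


def K(t):
    """Time stretch function: K(0) = 0 and K'(t) = 1 + t > 0."""
    return t + t * t / 2


def K_dot(t):
    return 1 + t


def K_inv(s):
    """Inverse of K, solving t + t^2/2 = s for t >= 0."""
    return -1 + math.sqrt(1 + 2 * s)


def rho(t):
    return psi(K(t))


h = 1e-5
# Lemma 17: rho = psi o K satisfies rho'(t) = K'(t) * f(rho(t)).
lemma_gap = max(
    abs((rho(t + h) - rho(t - h)) / (2 * h) - K_dot(t) * f(rho(t)))
    for t in (0.05 * k for k in range(1, 41))
)
# Corollary 18: rho o K^{-1} recovers the original solution psi.
corollary_gap = max(abs(rho(K_inv(s)) - psi(s)) for s in (0.1 * k for k in range(31)))
print(f"{lemma_gap:.1e} {corollary_gap:.1e}")
```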

### **4.2 Towards a Syntactic Representation**

So far our time stretch function $K$ has been a semantic object. Here we introduce a syntactic way of reasoning via time stretch functions. Since a desired time stretch function is not necessarily expressible in dL, our syntactic reasoning uses an indirect method that exploits a pair of terms called a *synchronizer*. This eventually leads to a syntactic reasoning rule (Sync) (Theorem 24).

Given a term $g \in \mathcal{T}(X)$ and a mapping $\psi : [0, T) \to \mathbb{R}^X$, we define $g\_{\psi} : [0, T) \to \mathbb{R}$ by

$$g\_{\psi}(t) := [\![g]\!]\_{\psi(t)}.\tag{5}$$

Intuitively, $g\_{\psi}(t)$ is the value of $g$ at time $t$ when we follow the dynamics whose solution is $\psi$.

**Definition 19 (synchronizers).** Let $(\overline{\delta}, \underline{\delta})$ be a pair of dynamics, $(\overline{\omega}, \underline{\omega}) \in \mathbb{R}^{\overline{V}} \times \mathbb{R}^{\underline{V}}$ be a pair of states, and $\overline{\psi} : [0, \overline{T}) \to \mathbb{R}^{\overline{V}}$ and $\underline{\psi} : [0, \underline{T}) \to \mathbb{R}^{\underline{V}}$ be the unique solutions of $\overline{\delta}$ and $\underline{\delta}$ from $\overline{\omega}$ and $\underline{\omega}$, respectively. We say that a pair of dL terms $(\overline{g}, \underline{g}) \in \mathcal{T}(\overline{V}) \times \mathcal{T}(\underline{V})$ *synchronizes* $(\overline{\delta}, \underline{\delta})$ from $(\overline{\omega}, \underline{\omega})$ if the following hold.

**–** $\overline{g}\_{\overline{\psi}}(0) = \underline{g}\_{\underline{\psi}}(0)$

**–** The derivatives of $\overline{g}\_{\overline{\psi}}$ and $\underline{g}\_{\underline{\psi}}$ are both strictly positive.

The following lemma ensures that, for any synchronizer, a corresponding time stretch function exists.

**Lemma 20.** *In the setting of Definition 19, let $\overline{t} \in [0, \overline{T})$ and $\underline{t} \in [0, \underline{T})$ be such that $\overline{g}\_{\overline{\psi}}(\overline{t}) = \underline{g}\_{\underline{\psi}}(\underline{t})$. Then the function $K$, defined by $K(s) := \underline{g}\_{\underline{\psi}}^{-1}(\overline{g}\_{\overline{\psi}}(s))$, is a time stretch function from $[0, \overline{t}]$ to $[0, \underline{t}]$. Moreover, $\dot{K}(s) = \frac{\dot{\overline{g}}\_{\overline{\psi}}(s)}{\dot{\underline{g}}\_{\underline{\psi}}(K(s))}$.*

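*Proof.* Since $\underline{g}\_{\underline{\psi}}$ is strictly monotonic on $[0, \underline{t}]$, it has an inverse $\underline{g}\_{\underline{\psi}}^{-1}$ defined from $\underline{g}\_{\underline{\psi}}([0, \underline{t}])$ to $[0, \underline{t}]$. By assumption we have $\overline{g}\_{\overline{\psi}}(0) = \underline{g}\_{\underline{\psi}}(0)$, and thus $K(0) = \underline{g}\_{\underline{\psi}}^{-1}(\overline{g}\_{\overline{\psi}}(0)) = \underline{g}\_{\underline{\psi}}^{-1}(\underline{g}\_{\underline{\psi}}(0)) = 0$. Also, since $\overline{g}\_{\overline{\psi}}(\overline{t}) = \underline{g}\_{\underline{\psi}}(\underline{t})$ and both functions are continuous and strictly increasing, we have $\overline{g}\_{\overline{\psi}}([0, \overline{t}]) = \underline{g}\_{\underline{\psi}}([0, \underline{t}])$, so $\underline{g}\_{\underline{\psi}}^{-1}$ is defined from $\overline{g}\_{\overline{\psi}}([0, \overline{t}])$ to $[0, \underline{t}]$. Thus $K = \underline{g}\_{\underline{\psi}}^{-1} \circ \overline{g}\_{\overline{\psi}}$ is defined from $[0, \overline{t}]$ to $[0, \underline{t}]$. By the chain rule and the inverse function rule, its derivative is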

$$\begin{aligned} \dot{K}(s) &= \dot{\overline{g}}\_{\overline{\psi}}(s) \cdot \big(\underline{g}\_{\underline{\psi}}^{-1}\big)'\big(\overline{g}\_{\overline{\psi}}(s)\big) && \text{derivative of } K = \underline{g}\_{\underline{\psi}}^{-1} \circ \overline{g}\_{\overline{\psi}} \\ &= \frac{\dot{\overline{g}}\_{\overline{\psi}}(s)}{\dot{\underline{g}}\_{\underline{\psi}}\big(\underline{g}\_{\underline{\psi}}^{-1}(\overline{g}\_{\overline{\psi}}(s))\big)} && \text{derivative of } \underline{g}\_{\underline{\psi}}^{-1} \\ &= \frac{\dot{\overline{g}}\_{\overline{\psi}}(s)}{\dot{\underline{g}}\_{\underline{\psi}}(K(s))} \end{aligned}$$

whose value is positive by the assumptions on the derivatives of $\overline{g}\_{\overline{\psi}}$ and $\underline{g}\_{\underline{\psi}}$.
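The construction of $K$ in Lemma 20 only needs the two functions $\overline{g}\_{\overline{\psi}}$ and $\underline{g}\_{\underline{\psi}}$, so it can be mimicked numerically by inverting $\underline{g}\_{\underline{\psi}}$ with bisection. In this sketch, $\overline{g}\_{\overline{\psi}}(s) = \log(1 + s)$ and $\underline{g}\_{\underline{\psi}}(s) = s + s^3$ are assumed, illustrative functions (both 0 at $s = 0$, with positive derivatives). The code checks $K(0) = 0$, monotonicity, and the lemma's derivative formula by finite differences.

```python
import math


def g_over(s):
    """Assumed overline g along its solution."""
    return math.log(1.0 + s)


def g_over_dot(s):
    return 1.0 / (1.0 + s)


def g_under(s):
    """Assumed underline g along its solution."""
    return s + s ** 3


def g_under_dot(s):
    return 1.0 + 3.0 * s ** 2


def invert_increasing(g, y, lo=0.0, hi=1.0, tol=1e-13):
    """Solve g(s) = y for a strictly increasing g with g(lo) <= y, by bisection."""
    while g(hi) < y:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def K(s):
    """K = g_under^{-1} o g_over, as in Lemma 20."""
    return invert_increasing(g_under, g_over(s))


def K_dot_formula(s):
    return g_over_dot(s) / g_under_dot(K(s))


h = 1e-5
grid = [0.1 * k for k in range(1, 31)]
derivative_gap = max(abs((K(s + h) - K(s - h)) / (2 * h) - K_dot_formula(s)) for s in grid)
print(f"K(0) = {K(0.0):.1e}, max derivative gap = {derivative_gap:.1e}")
```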

We remark that the time stretch functions obtained in Lemma 20 are not necessarily expressible as dL terms, as the following example shows.

**Example 21.** Consider the two dynamics $\overline{\delta\_F} := (\dot{\overline{x}} = \overline{v}, \dot{\overline{v}} = -\overline{v}^2)$ and $\underline{\delta} := (\dot{\underline{x}} = 1)$. Their solutions $\overline{\psi}, \underline{\psi} : \mathbb{R}\_{\geq 0} \to \mathbb{R}^2$ from the initial values $x = 0$, $v = 1$ are

$$
\overline{\psi}(s) = \left(\log(1+s), (1+s)^{-1}\right) \qquad \qquad \underline{\psi}(s) = (s,1).
$$

Now let $\overline{g} = \overline{x}$ and $\underline{g} = \underline{x}$. Then $\overline{g}\_{\overline{\psi}}(s) = \log(1 + s)$, $\underline{g}\_{\underline{\psi}} = \underline{g}\_{\underline{\psi}}^{-1} = \mathrm{id}$, and thus $K(s) = \underline{g}\_{\underline{\psi}}^{-1}(\overline{g}\_{\overline{\psi}}(s)) = \log(1 + s)$. This is not a rational function and not expressible in dL.

Using the syntactic Lie derivative (Definition 12), we state a sound inference rule that does not need K to be represented explicitly. We note that there is strong support for Lie derivatives in the tool KeYmaera X, as a key syntactic operation behind the differential invariant (DI) rule (Definition 13).

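**Definition 22.** Let $\overline{\delta} := (\dot{\overline{\mathbf{x}}} = \overline{\mathbf{e}} \,\&\, \overline{Q})$ and $\underline{\delta} := (\dot{\underline{\mathbf{x}}} = \underline{\mathbf{e}} \,\&\, \underline{Q})$ be two dynamics and let $(\overline{g}, \underline{g}) \in \mathcal{T}(\overline{V}) \times \mathcal{T}(\underline{V})$ (intended to be a synchronizer). We define the *synchronized dynamics* of $(\overline{\delta}, \underline{\delta})$ with respect to $(\overline{g}, \underline{g})$ as follows: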

$$\overline{\delta} \otimes\_{(\overline{g}, \underline{g})} \underline{\delta} := \left(\dot{\overline{\mathbf{x}}} = \overline{\mathbf{e}},\ \dot{\underline{\mathbf{x}}} = \frac{\mathcal{L}\_{\overline{\delta}} \overline{g}}{\mathcal{L}\_{\underline{\delta}} \underline{g}} \cdot \underline{\mathbf{e}}\right) \,\&\, \left(\overline{Q} \wedge \underline{Q} \wedge \mathcal{L}\_{\overline{\delta}} \overline{g} > 0 \wedge \mathcal{L}\_{\underline{\delta}} \underline{g} > 0\right).$$

**Lemma 23.** *Let $(\overline{g}, \underline{g})$ be a synchronizer of $(\overline{\delta}, \underline{\delta})$ from $(\overline{\omega}\_0, \underline{\omega}\_0)$. The following are equivalent, where the semantic transition relations are from Definition 5.*

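*1.* $(\overline{\omega}\_0, \underline{\omega}\_0) \xrightarrow{[\![\overline{\delta};\, \underline{\delta}]\!]} (\overline{\omega}, \underline{\omega})$ *and* $(\overline{\omega}, \underline{\omega}) \in [\![\overline{g} = \underline{g}]\!]$

*2.* $(\overline{\omega}\_0, \underline{\omega}\_0) \xrightarrow{[\![\overline{\delta} \otimes\_{(\overline{g}, \underline{g})} \underline{\delta}]\!]} (\overline{\omega}, \underline{\omega})$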

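*Proof.* We first prove (1 ⇒ 2). In the proof of Lemma 20, we can observe that $\dot{\overline{g}}\_{\overline{\psi}}(s) = [\![\mathcal{L}\_{\overline{\delta}} \overline{g}]\!]\_{\overline{\psi}(s)}$, and analogously $\dot{\underline{g}}\_{\underline{\psi}}(s) = [\![\mathcal{L}\_{\underline{\delta}} \underline{g}]\!]\_{\underline{\psi}(s)}$. Hence we obtain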

$$\dot{K}(s) = \frac{[\![\mathcal{L}\_{\overline{\delta}} \overline{g}]\!]\_{\overline{\psi}(s)}}{[\![\mathcal{L}\_{\underline{\delta}} \underline{g}]\!]\_{\underline{\psi}(K(s))}} = \left[\!\!\left[\frac{\mathcal{L}\_{\overline{\delta}} \overline{g}}{\mathcal{L}\_{\underline{\delta}} \underline{g}}\right]\!\!\right]\_{\rho(s)}\tag{6}$$

where $\rho : [0, \overline{t}) \to \mathbb{R}^{\overline{V}} \times \mathbb{R}^{\underline{V}}$ is defined by $\rho(s) := \big(\overline{\psi}(s), \underline{\psi}(K(s))\big)$.

We note that $K : [0, \overline{t}] \to [0, K(\overline{t})]$ is a time stretch function, and that $\underline{\psi}$ is a solution of $\dot{\underline{\mathbf{x}}} = \underline{\mathbf{e}}$, that is, $\dot{\underline{\psi}}(u) = [\![\underline{\mathbf{e}}]\!]\_{\underline{\psi}(u)}$ whenever $0 \leq u < \underline{t} = K(\overline{t})$. Combined with Lemma 17, we obtain

$$\tfrac{d}{ds}\big(\underline{\psi} \circ K\big)(s) = \dot{K}(s) \cdot [\![\underline{\mathbf{e}}]\!]\_{\underline{\psi}(K(s))} = \dot{K}(s) \cdot [\![\underline{\mathbf{e}}]\!]\_{\rho(s)} \qquad \text{whenever } 0 \leq s < \overline{t}.$$

Hence, using the fact that $\overline{\psi}$ is a solution of $\dot{\overline{\mathbf{x}}} = \overline{\mathbf{e}}$, we obtain

$$\dot{\rho}(s) = \Big(\dot{\overline{\psi}}(s),\ \tfrac{d}{ds}\big(\underline{\psi} \circ K\big)(s)\Big) = \Big([\![\overline{\mathbf{e}}]\!]\_{\rho(s)},\ \dot{K}(s) \cdot [\![\underline{\mathbf{e}}]\!]\_{\rho(s)}\Big) = \left[\!\!\left[\Big(\overline{\mathbf{e}},\ \frac{\mathcal{L}\_{\overline{\delta}} \overline{g}}{\mathcal{L}\_{\underline{\delta}} \underline{g}} \cdot \underline{\mathbf{e}}\Big)\right]\!\!\right]\_{\rho(s)}$$

whenever $0 \leq s < \overline{t}$. Here the last equality is from (6). This shows that $\rho$ is a solution of the dynamics $\overline{\delta} \otimes\_{(\overline{g}, \underline{g})} \underline{\delta}$. It remains to prove that for all $\tau \in [0, \overline{t}]$, $[\![\overline{Q} \wedge \underline{Q} \wedge \mathcal{L}\_{\overline{\delta}} \overline{g} > 0 \wedge \mathcal{L}\_{\underline{\delta}} \underline{g} > 0]\!]\_{\rho(\tau)}$ is true. This is an easy consequence of item 1 and the fact that $(\overline{g}, \underline{g})$ is a synchronizer of $(\overline{\delta}, \underline{\delta})$ from $(\overline{\omega}\_0, \underline{\omega}\_0)$.

For the direction (2 ⇒ 1), let $(\overline{\xi}, \underline{\xi}) : [0, T) \to \mathbb{R}^{\overline{V}} \times \mathbb{R}^{\underline{V}}$ be the unique solution of $\overline{\delta} \otimes\_{(\overline{g}, \underline{g})} \underline{\delta}$ from $(\overline{\omega}\_0, \underline{\omega}\_0)$. Then there is $t \in [0, T)$ such that $(\overline{\xi}(t), \underline{\xi}(t)) = (\overline{\omega}, \underline{\omega})$. Let us prove that $(\overline{\omega}, \underline{\omega}) \in [\![\overline{g} = \underline{g}]\!]$. The function $h : s \in [0, T) \mapsto [\![\overline{g}]\!]\_{\overline{\xi}(s)} - [\![\underline{g}]\!]\_{\underline{\xi}(s)}$ is equal to 0 at $s = 0$, and its derivative is given by:

$$\dot{h}(s) = \left[\mathcal{L}\_{\overline{\delta}} \overline{g}\right]\_{\overline{\xi}(s)} - \left[\mathcal{L}\_{\underline{\delta}} \underline{g}\right]\_{\underline{\xi}(s)} \cdot \frac{\left[\mathcal{L}\_{\overline{\delta}} \overline{g}\right]\_{\overline{\xi}(s)}}{\left[\mathcal{L}\_{\underline{\delta}} \underline{g}\right]\_{\underline{\xi}(s)}} = 0$$

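Consequently, $h$ is the constant function equal to 0, which implies that $(\overline{\omega}, \underline{\omega}) \in [\![\overline{g} = \underline{g}]\!]$. By definition, $\overline{\xi}$ is a solution of $\overline{\delta}$, so $\overline{\omega}\_0 \xrightarrow{[\![\overline{\delta}]\!]} \overline{\omega}$. Furthermore, let $K$ be the time stretch function with $K(0) = 0$ and $\dot{K}(s) = [\![\mathcal{L}\_{\overline{\delta}} \overline{g} / \mathcal{L}\_{\underline{\delta}} \underline{g}]\!]\_{(\overline{\xi}(s), \underline{\xi}(s))}$, which is positive by the evolution domain. By Corollary 18, $\underline{\xi} \circ K^{-1}$ is a solution of $\underline{\delta}$. Thus $\underline{\omega}\_0 \xrightarrow{[\![\underline{\delta}]\!]} \underline{\omega}$ and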

$$(\overline{\omega}\_0, \underline{\omega}\_0) \xrightarrow{[\![\overline{\delta};\, \underline{\delta}]\!]} (\overline{\omega}, \underline{\omega}).\tag{7}$$

The above lemma is a key observation of the current work. It allows us to turn the relational dynamics $\overline{\delta}; \underline{\delta}$, expressed as a sequential composition in dL, into a combined dynamics $\overline{\delta} \otimes\_{(\overline{g}, \underline{g})} \underline{\delta}$. Moreover, we can do so in a way that the two dynamics are synchronized in a reparametrized manner, as specified by $(\overline{g}, \underline{g})$. Such a combination of two dynamics is crucial for exploiting the logical infrastructure of dL and KeYmaera X; we emphasize again that the (DI) rule does not support invariant reasoning about the relationship between $\overline{\delta}$ and $\underline{\delta}$ when the relational dynamics is expressed in its original form $\overline{\delta}; \underline{\delta}$.
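The synchronized dynamics can also be simulated directly. The sketch below (Python with a classical fourth-order Runge-Kutta integrator; the helper names are hypothetical) builds the vector field of Definition 22 and appends an extra clock variable $K$ with $\dot{K} = \mathcal{L}\_{\overline{\delta}} \overline{g} / \mathcal{L}\_{\underline{\delta}} \underline{g}$, which records the time stretch. It uses the pair of Example 21 with $\overline{g} = \overline{x}$ and $\underline{g} = \underline{x}$, so the clock should follow $K(t) = \log(1 + t)$ and $\overline{g} = \underline{g}$ should be preserved, as in Lemma 23. The evolution domain constraint is not enforced in the code; it holds here because $\overline{v} > 0$.

```python
import math


def synchronize(rhs_over, rhs_under, lie_over, lie_under, n_over):
    """Vector field of the synchronized dynamics (Definition 22), extended by
    a clock K with K' = L g_over / L g_under that records the time stretch.
    The state is [over variables..., under variables..., K]."""
    def rhs(state):
        over, under = state[:n_over], state[n_over:-1]
        ratio = lie_over(over) / lie_under(under)
        return list(rhs_over(over)) + [ratio * e for e in rhs_under(under)] + [ratio]
    return rhs


def rk4(rhs, state, dt, steps):
    """Classical fourth-order Runge-Kutta with a fixed step size."""
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
        k4 = rhs([s + dt * k for s, k in zip(state, k3)])
        state = [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state


# Example 21: over = (x, v) with x' = v, v' = -v^2; under = (x,) with x' = 1.
# Synchronizer g_over = x, g_under = x, so L g_over = v and L g_under = 1.
rhs = synchronize(
    rhs_over=lambda o: (o[1], -o[1] ** 2),
    rhs_under=lambda u: (1.0,),
    lie_over=lambda o: o[1],
    lie_under=lambda u: 1.0,
    n_over=2,
)
x_over, v_over, x_under, clock = rk4(rhs, [0.0, 1.0, 0.0, 0.0], dt=1e-3, steps=2000)
print(f"x_over={x_over:.6f} x_under={x_under:.6f} K={clock:.6f} log(3)={math.log(3):.6f}")
```

At $t = 2$ the clock agrees with $\log 3$, and $\underline{x} = \underline{\psi}(K(t)) = K(t)$ coincides with $\overline{x}$.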

The following is an incarnation of Lemma 23 as a proof rule. We assume that the postcondition has the conditional form $E \Rightarrow \varphi$, where $E$ is called an *exit condition*. By requiring that $E$ implies $\overline{g} = \underline{g}$, we enforce the second condition $(\overline{\omega}, \underline{\omega}) \in [\![\overline{g} = \underline{g}]\!]$ in item 1 of Lemma 23. The premises $\Gamma \vdash \overline{g} = \underline{g}$, $\Gamma \vdash [\overline{\delta}]\, \mathcal{L}\_{\overline{\delta}} \overline{g} > 0$ and $\Gamma \vdash [\underline{\delta}]\, \mathcal{L}\_{\underline{\delta}} \underline{g} > 0$ ensure that $(\overline{g}, \underline{g})$ is a synchronizer. Together with $E \vdash \overline{g} = \underline{g}$, these premises allow the rule to transform its conclusion (about $\overline{\delta}; \underline{\delta}$) into one about the combined dynamics $\overline{\delta} \otimes\_{(\overline{g}, \underline{g})} \underline{\delta}$, which is amenable to, for example, the (DI) rule.

**Theorem 24 (synchronization rule).** *The following inference rule is sound:*

$$\frac{\begin{array}{c} \Gamma \vdash [\overline{\delta}]\, \mathcal{L}\_{\overline{\delta}} \overline{g} > 0 \qquad \Gamma \vdash \overline{g} = \underline{g} \\ \Gamma \vdash [\underline{\delta}]\, \mathcal{L}\_{\underline{\delta}} \underline{g} > 0 \qquad E \vdash \overline{g} = \underline{g} \qquad \Gamma \vdash [\overline{\delta} \otimes\_{(\overline{g}, \underline{g})} \underline{\delta}]\, (E \Rightarrow \varphi) \end{array}}{\Gamma \vdash [\overline{\delta};\, \underline{\delta}]\, (E \Rightarrow \varphi)}\ \text{(Sync)}$$

Recall the definition of $\overline{\delta} \otimes\_{(\overline{g}, \underline{g})} \underline{\delta}$ (Definition 22), where time stretching of the second dynamics $\underline{\delta}$ is expressed syntactically by Lie derivatives. We call the four premises $\Gamma \vdash \overline{g} = \underline{g}$, $E \vdash \overline{g} = \underline{g}$, $\Gamma \vdash [\overline{\delta}]\, \mathcal{L}\_{\overline{\delta}} \overline{g} > 0$, and $\Gamma \vdash [\underline{\delta}]\, \mathcal{L}\_{\underline{\delta}} \underline{g} > 0$ the *synchronizability conditions*. These obligations are usually easy to discharge. The last premise, which we call the *synchronized formula*, is typically the core remaining obligation.

**Remark 25 (choice of** $(\overline{g}, \underline{g})$**).** In applying the (Sync) rule, one still has to find a suitable synchronizer $(\overline{g}, \underline{g})$. This turns out to be straightforward in many examples. In all the case studies in Section 6 and in Example 1, the exit condition $E$ is of the form $\overline{x} = \underline{x} = C$, where $C$ is a constant. This suggests the choice $\overline{g} = \overline{x}$, $\underline{g} = \underline{x}$. Indeed, all our proofs use this choice of $(\overline{g}, \underline{g})$.

## **5 Implementation**

KeYmaera X [17] is an interactive theorem prover based on the sequent calculus formulation of dL. It is implemented in Scala and succeeds its predecessor KeYmaera [16]. It has a web-based GUI and supports automated theorem proving using computer algebra systems such as Mathematica [27].

For the formalization of the case studies in Section 6, we extended KeYmaera X version 4.7 (available at [17]) with the (Sync) rule. This extension of KeYmaera X, together with our proofs of the case studies, is currently available at http://group-mmm.org/rddl_tacas_2020/.

The KeYmaera X implementation is structured in a flexible manner, from which we benefited. To add a rule to KeYmaera X, one implements a Scala program that takes the conclusion of the rule and generates the premises of the rule as subgoals. Since arbitrary Scala code is allowed here, we could implement complex algorithms such as the inductive translation of formulas.

In implementing the (Sync) rule, we relied on KeYmaera X functions called *helpers*, for example for computing Lie derivatives and for simplifying formulas into equivalent ones. The bulk of our effort concerned the $\otimes\_{(\overline{g}, \underline{g})}$ operator. Here our implementation is somewhat more general than the presentation in this paper: besides dynamics of the form $\dot{\mathbf{x}} = \mathbf{e} \,\&\, Q$, it also accepts sequences of dynamics, possibly interleaved with guards and nondeterministic choices. This feature is used in the case study described in Section 6.3.

## **6 Case Studies**

We describe three case studies where we proved relational properties of hybrid dynamics. We did so formally in our extension of KeYmaera X described in Section 5. In all the examples, we apply the (Sync) rule as a main proof step, in conjunction with the existing rules in dL. Below, we describe our example systems and outline the important steps in the formal proofs.

#### **6.1 Collision Speed with Constant Acceleration**

In this section we apply the (Sync) rule to the running Example 1. For this example we consider two dynamics $\overline{\delta\_C} := (\dot{\overline{x}} = \overline{v}, \dot{\overline{v}} = \overline{a})$ and $\underline{\delta\_C} := (\dot{\underline{x}} = \underline{v}, \dot{\underline{v}} = \underline{a})$. Both dynamics represent a car with constant acceleration. Our claim is that if the acceleration (and the initial velocity) is at least as large in the first system, then the first car is necessarily at least as fast as the second car after traveling the same distance $l$; formally,

$$\Gamma \vdash [\overline{\delta\_C};\, \underline{\delta\_C}]\, (\overline{x} = l \land \underline{x} = l \Rightarrow \underline{v} \le \overline{v}) \tag{7}$$

where

$$\Gamma := \{ 0 = \overline{x} = \underline{x}, \ 0 < \overline{v} = \overline{v}\_0, \ \underline{v} = \underline{v}\_0, \ \overline{v}\_0 \ge \underline{v}\_0, \ \ 0 \le \underline{a} \le \overline{a} \}$$

We apply the (Sync) rule with $\overline{g} := \overline{x}$ and $\underline{g} := \underline{x}$. The first two synchronizability conditions are $\Gamma \vdash \overline{x} = \underline{x}$ and $\overline{x} = l, \underline{x} = l \vdash \overline{x} = \underline{x}$, which are trivial. The last two synchronizability conditions are

$$\Gamma \vdash \left[\overline{\delta\_C}\right] \mathcal{L}\_{\overline{\delta\_C}} \overline{g} = \overline{v} > 0 \qquad\qquad \Gamma \vdash \left[\underline{\delta\_C}\right] \mathcal{L}\_{\underline{\delta\_C}} \underline{g} = \underline{v} > 0$$

which are proven using differential invariants (DI). The synchronized formula is

$$\Gamma \vdash \left[\overline{\delta\_C},\ \dot{\underline{x}} = \underline{v} \cdot (\overline{v}/\underline{v}),\ \dot{\underline{v}} = \underline{a} \cdot (\overline{v}/\underline{v}) \ \&\ \overline{v} > 0 \land \underline{v} > 0 \right] (\overline{x} = l \land \underline{x} = l \Rightarrow \underline{v} \le \overline{v})$$

One might try to show the inequality $\overline{v} - \underline{v} \geq 0$ by the differential invariant (DI) rule, but the Lie derivative of the term $\overline{v} - \underline{v}$ is $\overline{a} - \underline{a} \cdot (\overline{v}/\underline{v})$, which is not obviously nonnegative. Instead, a trickier equation $\overline{a} \cdot (\underline{v}^2 - \underline{v}\_0^2) - \underline{a} \cdot (\overline{v}^2 - \overline{v}\_0^2) = 0$ turns out to be an invariant. The Lie derivative of its left-hand side is $\overline{a} \cdot (2\underline{v}) \cdot \underline{a} \cdot (\overline{v}/\underline{v}) - \underline{a} \cdot (2\overline{v}) \cdot \overline{a}$, which is 0 since we also know $\underline{v} > 0$.
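The invariant can be checked numerically on the synchronized dynamics above. The parameter values below are arbitrary instances satisfying $\Gamma$ (our assumption), and the run stops at the time when $\overline{x} = l$, computed from constant acceleration. The sketch tracks the largest deviation of $\overline{a} \cdot (\underline{v}^2 - \underline{v}\_0^2) - \underline{a} \cdot (\overline{v}^2 - \overline{v}\_0^2)$ from 0 and compares the final velocities.

```python
import math


def rk4_step(rhs, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = rhs([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def run(a_over, a_under, v0_over, v0_under, length, steps=2500):
    """Integrate the synchronized dynamics of Section 6.1 until x_over = length.
    Returns the final state [x_o, v_o, x_u, v_u] and the largest |invariant|."""
    def rhs(s):
        x_o, v_o, x_u, v_u = s
        ratio = v_o / v_u  # L g_over / L g_under with g = x on both sides
        return [v_o, a_over, v_u * ratio, a_under * ratio]

    def invariant(s):
        return a_over * (s[3] ** 2 - v0_under ** 2) - a_under * (s[1] ** 2 - v0_over ** 2)

    # Time at which x_over = length under constant acceleration a_over > 0.
    t_end = (-v0_over + math.sqrt(v0_over ** 2 + 2 * a_over * length)) / a_over
    state, worst = [0.0, v0_over, 0.0, v0_under], 0.0
    for _ in range(steps):
        state = rk4_step(rhs, state, t_end / steps)
        worst = max(worst, abs(invariant(state)))
    return state, worst


state, worst = run(a_over=2.0, a_under=1.0, v0_over=1.5, v0_under=1.0, length=10.0)
x_o, v_o, x_u, v_u = state
print(f"x_o={x_o:.6f} x_u={x_u:.6f} v_u={v_u:.6f} v_o={v_o:.6f} max|inv|={worst:.1e}")
```

With these values the overlined car ends at $\overline{v} = 6.5$ and the underlined one at $\underline{v} = \sqrt{21} \approx 4.58$, consistent with $\underline{v} \leq \overline{v}$.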

We do not have an intuitive explanation for this invariant, but it was found by a template-based search, like many other invariants in dL. By positing the existence of a polynomial invariant of a certain degree, we obtain conditions on its coefficients by requiring that its Lie derivative and its initial value be zero. Solving these conditions for a second-degree invariant in the velocities of the system yielded the invariant above.
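A template search of this kind can be sketched as linear algebra. The Python code below is an illustrative reconstruction, not the authors' tool. It fixes numeric accelerations ($\overline{a} = 3$, $\underline{a} = 2$, an assumption; a real search would keep them symbolic) and uses all monomials of degree at most 2 in $(\overline{v}, \underline{v})$ as the template. It multiplies the Lie derivative along the synchronized dynamics by $\underline{v} > 0$ to clear the denominator, then computes the coefficient vectors for which that derivative vanishes identically.

```python
from fractions import Fraction

# Template: monomials vo^i * vu^j of total degree <= 2, where vo and vu
# are the velocities of the overlined and underlined car.
TEMPLATE = [(2, 0), (0, 2), (1, 1), (1, 0), (0, 1), (0, 0)]


def scaled_lie(mono, a_over, a_under):
    """vu * L(vo^i * vu^j) along vo' = a_over, vu' = a_under * vo / vu.
    Multiplying by vu > 0 clears the denominator without changing where the
    Lie derivative vanishes. Returns {exponent pair: coefficient}."""
    i, j = mono
    out = {}
    if i > 0:  # (i * vo^(i-1) * vu^j) * a_over * vu
        key = (i - 1, j + 1)
        out[key] = out.get(key, 0) + i * a_over
    if j > 0:  # (j * vo^i * vu^(j-1)) * (a_under * vo / vu) * vu
        key = (i + 1, j - 1)
        out[key] = out.get(key, 0) + j * a_under
    return out


def nullspace(rows, ncols):
    """Basis of {c : rows * c = 0}, via reduced row echelon form over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((k for k in range(r, len(m)) if m[k][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for k in range(len(m)):
            if k != r and m[k][c] != 0:
                factor = m[k][c]
                m[k] = [x - factor * y for x, y in zip(m[k], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        vec = [Fraction(0)] * ncols
        vec[free] = Fraction(1)
        for row, pc in enumerate(pivots):
            vec[pc] = -m[row][free]
        basis.append(vec)
    return basis


def invariant_candidates(a_over, a_under):
    """Template coefficient vectors whose (scaled) Lie derivative is identically 0."""
    columns = [scaled_lie(mono, a_over, a_under) for mono in TEMPLATE]
    keys = sorted({k for col in columns for k in col})
    rows = [[col.get(k, 0) for col in columns] for k in keys]
    return nullspace(rows, len(TEMPLATE))


basis = invariant_candidates(a_over=3, a_under=2)
for vec in basis:
    print({mono: str(c) for mono, c in zip(TEMPLATE, vec) if c != 0})
```

Besides the constant (whose Lie derivative is trivially 0), the sketch finds a single quadratic solution $\underline{v}^2 - \frac{2}{3}\overline{v}^2$, i.e. a multiple of $\underline{a} \cdot \overline{v}^2 - \overline{a} \cdot \underline{v}^2$. Fixing the constant so that the initial value is zero gives, up to sign, the invariant stated above.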

After finding our invariant, we additionally have to show that it entails our desired result, $\underline{v} \leq \overline{v}$. This follows from a standard monotonicity property of modal logics: from $\varphi \vdash \psi$ and $\Gamma \vdash [\alpha]\varphi$ we can conclude $\Gamma \vdash [\alpha]\psi$. Here $\varphi$ states that the expression above is an invariant and that the velocities always stay at or above their initial values, and $\psi$ is our goal $\underline{v} \leq \overline{v}$.
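The entailment $\varphi \vdash \psi$ can be spelled out as a short calculation. This is our own elaboration, assuming $\overline{a} > 0$; if $\overline{a} = 0$, then $\underline{a} = 0$, both velocities stay constant, and the claim reduces to $\underline{v}\_0 \leq \overline{v}\_0$.

$$\begin{aligned} \underline{v}^2 - \underline{v}\_0^2 &= \frac{\underline{a}}{\overline{a}}\,(\overline{v}^2 - \overline{v}\_0^2) && \text{by the invariant} \\ &\le \overline{v}^2 - \overline{v}\_0^2 && \text{since } 0 \le \underline{a} \le \overline{a} \text{ and } \overline{v} \ge \overline{v}\_0 > 0 \\ &\le \overline{v}^2 - \underline{v}\_0^2 && \text{since } \overline{v}\_0 \ge \underline{v}\_0 > 0 \end{aligned}$$

Hence $\underline{v}^2 \leq \overline{v}^2$, and since both velocities are positive, $\underline{v} \leq \overline{v}$.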

#### **6.2 Collision Speed with Different Kinds of Friction**

Here we continue Example 11, where we consider two dynamics $\overline{\delta\_F} \equiv (\dot{\overline{x}} = \overline{v}, \dot{\overline{v}} = -\overline{v}^2)$ and $\underline{\delta\_F} \equiv (\dot{\underline{x}} = \underline{v}, \dot{\underline{v}} = -\underline{v})$. Our goal is $\Gamma\_F \vdash [\overline{\delta\_F}; \underline{\delta\_F}]\, (\overline{x} = \underline{x} = l \Rightarrow \underline{v} \leq \overline{v})$, with $\Gamma\_F := \{\overline{x} = \underline{x} = 0,\ 0 < \underline{v} \leq \overline{v} \leq 1\}$.

First, we establish that the objects in this example always have positive velocity. We show this with the (Dbx) rule (Definition 13), where $\mathcal{L}\_{\overline{\delta\_F}} \overline{v} = -\overline{v}^2 = (-\overline{v}) \cdot \overline{v}$ and $\mathcal{L}\_{\underline{\delta\_F}} \underline{v} = -\underline{v} = (-1) \cdot \underline{v}$, i.e., we take $g := -\overline{v}$ and $g := -1$, respectively. This allows us to infer that $\overline{v} > 0$ and $\underline{v} > 0$ hold at all times.

We apply the (Sync) rule with the synchronizer $(\overline{x}, \underline{x})$, yielding the synchronized dynamics

$$\dot{\overline{x}} = \overline{v},\ \dot{\overline{v}} = -\overline{v}^2,\ \dot{\underline{x}} = \underline{v} \cdot (\overline{v}/\underline{v}),\ \dot{\underline{v}} = -\underline{v} \cdot (\overline{v}/\underline{v}) \ \&\ \overline{v} > 0 \land \underline{v} > 0$$

Note that the new evolution domain condition $\underline{v} > 0$ allows us to rewrite $\underline{v} \cdot (\overline{v}/\underline{v})$ to $\overline{v}$. The synchronizability conditions follow immediately from $\overline{v} > 0$ and $\underline{v} > 0$. For the synchronized formula, we apply the (DI) rule: the Lie derivative of $\overline{v} - \underline{v}$ is $-\overline{v}^2 + \overline{v}$, so the desired inequality $\overline{v} \geq \underline{v}$ is reduced to $\overline{v}^2 \leq \overline{v}$, that is, $\overline{v} \leq 1$. To establish this, note that $\overline{v} > 0$ implies that the derivative of $\overline{v}$, namely $-\overline{v}^2$, is always negative; since $\overline{v} \leq 1$ initially, $\overline{v} \leq 1$ holds throughout.
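A numerical sketch of the synchronized dynamics of this section follows; it is our own illustration, and the initial values $\overline{v}\_0 = 0.9$, $\underline{v}\_0 = 0.8$ and $l = 0.5$ are assumptions satisfying $\Gamma\_F$. Because the synchronizer is $(\overline{x}, \underline{x})$, both velocities can also be written as functions of position, $\overline{v} = \overline{v}\_0 e^{-l}$ and $\underline{v} = \underline{v}\_0 - l$ (from $d\overline{v}/d\overline{x} = -\overline{v}$ and $d\underline{v}/d\underline{x} = -1$, valid while $l < \underline{v}\_0$). The code compares the simulation with these closed forms and checks $\underline{v} \leq \overline{v}$ on a grid of initial values allowed by $\Gamma\_F$.

```python
import math


def rk4_step(rhs, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = rhs([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def synchronized(s):
    """Synchronized dynamics of Section 6.2 with synchronizer (x_over, x_under)."""
    x_o, v_o, x_u, v_u = s
    ratio = v_o / v_u  # requires v_u > 0, the evolution domain condition
    return [v_o, -v_o ** 2, v_u * ratio, -v_u * ratio]


def run_until(length, v0_over, v0_under, steps=4000):
    """Integrate until x_over = length, using x_over(t) = log(1 + v0_over * t)."""
    t_end = (math.exp(length) - 1.0) / v0_over
    state = [0.0, v0_over, 0.0, v0_under]
    for _ in range(steps):
        state = rk4_step(synchronized, state, t_end / steps)
    return state


x_o, v_o, x_u, v_u = run_until(0.5, v0_over=0.9, v0_under=0.8)
print(f"x_u={x_u:.6f} v_u={v_u:.6f} v_o={v_o:.6f}")

# Closed forms in terms of position: v_over = v0_over * exp(-l) and
# v_under = v0_under - l. Check v_under <= v_over on a grid of initial values
# allowed by Gamma_F (0 < v0_under <= v0_over <= 1) and positions l < v0_under.
min_gap = min(
    v0o * math.exp(-l) - (v0u - l)
    for v0o in (0.1 * i for i in range(1, 11))
    for v0u in (0.1 * j for j in range(1, 11)) if v0u <= v0o
    for l in (0.05 * k for k in range(20)) if l < v0u
)
print(f"smallest v_over - v_under on the grid: {min_gap:.3f}")
```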

#### **6.3 Model Refinement**

In this example, we consider two abstract models of cars. The first car is able to provide a high constant acceleration $a$ at low velocities, but at a certain velocity $v\_{cut}$ the engine switches to a different mode and then provides a lesser, but still constant, acceleration $a\_{cut}$. The second car is an abstracted version of the first, which ignores this mode change and provides the same constant acceleration $a$ at all velocities. Our aim in this example is to establish a safety envelope around the first car's behavior using the more simply stated dynamics of the second car. Hence we show that the second car's velocity is at least that of the first car at any position $\overline{x} = \underline{x} = l$. More formally, the behavior of the first car is expressed as a hybrid program $\alpha := (\overline{\delta\_1};\ ?\overline{v} = v\_{cut};\ \overline{\delta\_2})$ with two modes: $\overline{\delta\_1} := (\dot{\overline{x}} = \overline{v}, \dot{\overline{v}} = a \,\&\, \overline{v} \leq v\_{cut})$ and $\overline{\delta\_2} := (\dot{\overline{x}} = \overline{v}, \dot{\overline{v}} = a\_{cut})$. The second car follows the simple dynamics $\underline{\delta} := (\dot{\underline{x}} = \underline{v}, \dot{\underline{v}} = a)$. Our goal is to prove the sequent $\Gamma \vdash [\alpha; \underline{\delta}]\, (\overline{x} = \underline{x} = l \Rightarrow \overline{v} \leq \underline{v})$, where the initial conditions are given by

$$\Gamma := \left( \overline{x} = \underline{x} = 0, 0 < \overline{v} = \underline{v} = v\_0, 0 < v\_{cut}, 0 < a\_{cut} \le a \right)$$

Technically, the (Sync) rule merges one differential dynamics with another, but the program the first car executes is a more complicated composition of dynamics and tests. However, it is possible to synchronize *piecewise*: first synchronizing $\underline{\delta}$ with $\overline{\delta\_1}$ until the first car changes modes, then synchronizing $\underline{\delta}$ with $\overline{\delta\_2}$ for the remainder of their runs. This slightly generalized synchronization procedure means that we can instead show

$$\Gamma \vdash \left[ \overline{\delta\_1} \otimes\_{(\overline{x}, \underline{x})} \underline{\delta};\ ?\overline{v} = v\_{cut};\ \overline{\delta\_2} \otimes\_{(\overline{x}, \underline{x})} \underline{\delta} \right] \left( \underline{x} = \overline{x} = l \Rightarrow \overline{v} \le \underline{v} \right)$$

There are now also two sets of synchronizability conditions to satisfy, but both are again straightforward. Since $\overline{\delta\_1}$ and $\underline{\delta}$ are nearly identical (except for the evolution domain constraint), their synchronization $\overline{\delta\_1} \otimes\_{(\overline{x}, \underline{x})} \underline{\delta}$ essentially identifies the two dynamics. The synchronization of $\overline{\delta\_2}$ and $\underline{\delta}$ has the same shape as the synchronization in Section 6.1 and proceeds in the same way.
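The piecewise synchronization can be simulated in the same way. In the sketch below the cars are called `first` (mode switch at $v\_{cut}$) and `second` (constant acceleration $a$), and both are synchronized on position. The parameter values are illustrative assumptions satisfying $\Gamma$, with additionally $v\_0 \leq v\_{cut}$ so that the test $?\overline{v} = v\_{cut}$ can be passed. Piece 1 runs until the first car reaches $v\_{cut}$; piece 2 runs until it reaches $x = l$.

```python
import math


def rk4_step(rhs, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = rhs([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def synchronized(first_accel, second_accel):
    """Synchronized dynamics with synchronizer (x_first, x_second): the second
    car's vector field is scaled by v_first / v_second (both positive here)."""
    def rhs(s):
        x1, v1, x2, v2 = s
        ratio = v1 / v2
        return [v1, first_accel, v2 * ratio, second_accel * ratio]
    return rhs


def integrate(rhs, state, t_end, steps=2000):
    for _ in range(steps):
        state = rk4_step(rhs, state, t_end / steps)
    return state


# Illustrative parameters satisfying Gamma, plus v0 <= v_cut.
v0, v_cut, a, a_cut, length = 1.0, 3.0, 2.0, 0.5, 10.0

# Piece 1: first car in mode delta_1 until its velocity reaches v_cut.
state = integrate(synchronized(a, a), [0.0, v0, 0.0, v0], (v_cut - v0) / a)
# The test ?v = v_cut now holds; piece 2: first car in mode delta_2 until x = length.
x_cut = state[0]
t2 = (-v_cut + math.sqrt(v_cut ** 2 + 2 * a_cut * (length - x_cut))) / a_cut
x1, v1, x2, v2 = integrate(synchronized(a_cut, a), state, t2)
print(f"x1={x1:.6f} x2={x2:.6f} v1={v1:.6f} v2={v2:.6f}")
```

With these values the first car arrives at $x = 10$ with velocity $\sqrt{17} \approx 4.12$ and the second with $\sqrt{41} \approx 6.40$, so the second car is indeed faster at that position.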

## **7 Conclusions and Future Work**

In this paper, we present a relational extension of differential dynamic logic based on time stretching of dynamics. This reparametrization enables us to enforce that comparisons between two systems occur when certain conditions are satisfied, for example when two cars are passing through the same position. While such reparametrizations can be thought of as stretching or compressing time for one of the dynamics, we also show that they can be conducted by a transformation of the dynamics themselves, based on Lie derivatives. We call this process *synchronizing* the dynamics (Definition 19), and it leads us to a new dL proof rule, the (Sync) rule (Theorem 24). We implemented the new rule in the KeYmaera X tool and used our extension to prove several nontrivial relational properties of dynamical systems.

In the future, we think it would be interesting to combine our relational logic with orthogonal relational extensions of dL [14] which focus on *refinement relations* with varying levels of nondeterminism. We also hinted in our last case study that it is possible to synchronize wider classes of hybrid programs than just two differential dynamics. We also think that the level of automated proof search available in KeYmaera X may enable the automatic detection of monotonic properties in *product lines*. This may be useful in industry both to provide sanity checks on formalized models of products, as well as enabling strong guarantees to be more easily obtained for those models.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Verifying Concurrent Systems

#### Assume, Guarantee or Repair*-*

Hadar Frenkel<sup>1</sup>, Orna Grumberg<sup>1</sup>, Corina Pasareanu<sup>2</sup> and Sarai Sheinvald<sup>3</sup>

<sup>1</sup> Department of Computer Science, The Technion, Haifa, Israel

<sup>2</sup> Carnegie Mellon University and NASA Ames Research Center, CA, USA

<sup>3</sup> Department of Software Engineering, Braude College of Engineering, Karmiel, Israel

Abstract. We present Assume-Guarantee-Repair (AGR) – a novel framework which not only verifies that a program satisfies a set of properties, but also *repairs* the program in case the verification fails. We consider *communicating programs* – these are simple C-like programs, extended with synchronous communication actions over communication channels. Our method, which consists of a learning-based approach to assume-guarantee reasoning, performs verification and repair simultaneously: in every iteration, AGR either makes another step towards proving that the (current) system satisfies the specification, or alters the system in a way that brings it closer to satisfying the specification. We handle infinite-state systems by using a finite abstract representation, and reduce the semantic problems at hand (satisfying complex specifications that also contain first-order constraints) to syntactic ones, namely membership and equivalence queries for regular languages. We implemented our algorithm and evaluated it on various examples. Our experiments present compact proofs of correctness and quick repairs.

#### 1 Introduction

Verification of large-scale systems is a main challenge in the field of formal verification. Often, the verification process of such a system does not scale well. *Compositional verification* aims to verify small components of a system separately, and from the correctness of the individual components, to conclude the correctness of the entire system. This, however, is not always possible, since the correctness of a component often depends on the behavior of its environment.

The Assume-Guarantee (AG) style of compositional verification [22,26] suggests a solution to this problem. The simplest AG rule checks whether a system composed of components M<sub>1</sub> and M<sub>2</sub> satisfies a property P by checking that M<sub>1</sub> under assumption A satisfies P and that any system containing M<sub>2</sub> as a component satisfies A. Several frameworks have been proposed to support this style of reasoning. Finding a suitable assumption A is a common challenge in such frameworks.
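In the triple notation common in learning-based assume-guarantee work (our rendering; the notation used later in this paper may differ), the simplest rule described above reads as follows. A triple $\langle A \rangle\, M\, \langle P \rangle$ states that whenever $M$ is part of a system that satisfies $A$, the system also guarantees $P$.

$$\frac{\langle A \rangle\, M\_1\, \langle P \rangle \qquad \langle \mathit{true} \rangle\, M\_2\, \langle A \rangle}{\langle \mathit{true} \rangle\, M\_1 \parallel M\_2\, \langle P \rangle}$$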

In this work, we present *Assume-Guarantee-Repair* (AGR) – a fully automated framework which applies the Assume-Guarantee rule and, while seeking a suitable assumption $A$, incrementally repairs the given program in case the verification fails. Our framework is inspired by [24], which presented a learning-based method for finding an assumption $A$ using the $L^*$ [5] algorithm for learning regular languages.

Our AGR framework handles *communicating programs*. These are infinite-state C-like programs, extended with the ability to synchronously read and write messages over communication channels. We model such programs as finite-state automata over an *action alphabet*, which reflects the program statements. The accepting states in these automata model points of interest in the program that the specification can relate to. The automata representation is similar in nature to that of control-flow graphs. Its advantage, however, is in the ability to exploit an automata-learning algorithm such as L∗.

This research was partially supported by the Technion Hiroshi Fujiwara Cyber Security Research Center, the Israel National Cyber Directorate and the Israel Science Foundation (ISF).

#### 212 H. Frenkel et al.

The composition of the two program components $M_1$ and $M_2$, denoted $M_1 \| M_2$, synchronizes on read/write actions on the same channel. Between two synchronized actions, the individual actions of both components interleave.

Fig. 1: Modeling a communicating program as an automaton $M_2$

Figure 1 presents the code of a communicating program (left) and its corresponding automaton $M_2$ (right). The automaton alphabet consists of constraints (e.g. `xpw ≤ 999`), assignment actions (e.g. `ypw := 2 · ypw` in $M_1$ of Figure 2), and communication actions (e.g. `enc!xpw` sends the value of variable `xpw` over channel `enc`, and `getEnc?xpw2` reads a value into `xpw2` on channel `getEnc`).

The specification $P$ is modeled as an automaton that does not contain assignment actions. It may contain communication actions in order to specify behavioral requirements, as well as constraints over the variables of both system components that express requirements on their values at various points in the runs.

Consider, for example, the program $M_1$ and the specification $P$ shown in Figure 2, and the program $M_2$ of Figure 1. $M_2$ reads a password on channel `read` into the variable `xpw` and, once it is long enough (has at least four digits), sends the value of `xpw` to $M_1$ through channel `enc`. $M_1$ reads this value into variable `ypw`, applies a simple function that changes its value, and sends the changed variable back to $M_2$. The property $P$ reasons about the parallel run of the two programs. The pair `(getEnc?xpw2, getEnc!ypw)` denotes a synchronization of $M_1$ and $M_2$ on channel `getEnc`. $P$ makes sure that the parallel run of $M_1$ and $M_2$ always reads a value and then encrypts it, which is a temporal requirement. In addition, it makes sure that the value after encryption is different from the original value, and that there is no overflow; both are semantic requirements on the program variables. That is, $P$ expresses temporal requirements that contain first-order constraints. If one of the requirements does not hold, $P$ reaches the error state $r_4$. Note that, for simplicity of presentation, $P$ here is not complete (see Definition 5 for a formal definition of a complete program).

Fig. 2: The programs M1, M2, and the specification P

The $L^*$ algorithm aims at learning a regular language $U$. It involves two entities: a *teacher*, which is an oracle that answers *membership queries* ("is the word $w$ in $U$?") and *equivalence queries* ("is $A$ an automaton whose language is $U$?"), and a *learner*, which iteratively constructs a finite deterministic automaton $A$ for $U$ by submitting a sequence of membership and equivalence queries to the teacher.
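To make the two kinds of queries concrete, the following Python sketch implements a teacher for the simple case in which the target language $U$ is already given as a DFA. It only illustrates the query interface; it is not the teacher used by AGR, whose answers depend on $M_1$, $M_2$ and $P$. The DFA encoding and the names `membership` and `equivalence` are our own assumptions.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# A DFA is (initial state, accepting states, partial transition map).
# Missing transitions lead to an implicit rejecting sink state.
DFA = Tuple[str, FrozenSet[str], Dict[Tuple[str, str], str]]
Word = Tuple[str, ...]
SINK = "<sink>"


def step(dfa: DFA, state: str, letter: str) -> str:
    return dfa[2].get((state, letter), SINK)


def membership(target: DFA, word: Word) -> bool:
    """Membership query: is `word` in the target language U?"""
    state = target[0]
    for letter in word:
        state = step(target, state, letter)
    return state in target[1]


def equivalence(target: DFA, hypothesis: DFA,
                alphabet: Iterable[str]) -> Optional[Word]:
    """Equivalence query: None if the hypothesis accepts exactly U, otherwise
    a shortest word on which the two automata disagree (BFS over the
    product automaton)."""
    letters = list(alphabet)
    start = (target[0], hypothesis[0])
    seen = {start}
    queue = deque([(start, ())])
    while queue:
        (s, h), word = queue.popleft()
        if (s in target[1]) != (h in hypothesis[1]):
            return word
        for letter in letters:
            nxt = (step(target, s, letter), step(hypothesis, h, letter))
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + (letter,)))
    return None


# Example target U: words over {a, b} with an even number of a's.
EVEN_A: DFA = ("e", frozenset({"e"}),
               {("e", "a"): "o", ("o", "a"): "e",
                ("e", "b"): "e", ("o", "b"): "o"})
# A (wrong) hypothesis that accepts every word.
ALL_WORDS: DFA = ("h", frozenset({"h"}), {("h", "a"): "h", ("h", "b"): "h"})

print(membership(EVEN_A, ("a", "b", "a")))         # True
print(equivalence(EVEN_A, ALL_WORDS, ["a", "b"]))  # ('a',)
```

Because the equivalence check explores the product breadth-first, the counterexample handed back to the learner is a shortest distinguishing word.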

When the $L^*$ algorithm is used to learn an assumption $A$ for the AG rule, membership queries are answered according to the satisfaction of the specification $P$: if $M_1 \| t$ satisfies $P$, then the trace $t$ at hand should be in $A$; otherwise, $t$ should not be in $A$. Once the learner constructs a stable assumption $A$, it submits an equivalence query. The teacher then checks whether $A$ is a suitable assumption, that is, whether $M_1 \| A$ satisfies $P$ and whether the language of $M_2$ is contained in the language of $A$. Depending on the results, the process either continues or halts with an answer to the verification problem. The learning procedure aims at learning the weakest assumption $A_w$, which contains all the traces that, in parallel with $M_1$, satisfy $P$. The key observation that guarantees termination in [24] is that the components in this procedure ($M_1$, $M_2$, $P$ and $A_w$) are all regular.

Our setting is more complicated, since the traces in the components – both the programs and the specification – contain constraints, which are to be checked semantically and not syntactically. These constraints may cause some traces to become infeasible. For example, if a trace contains an assignment x := 3 followed by a constraint x ≥ 4 (modeling an "if" statement), then this trace does not contribute any concrete runs, and therefore does not affect the system behavior. Thus, we must add feasibility checks to the process.

Constraints in the specification also pose a difficulty, as satisfaction of a specification is determined by the semantics of the constraints and not only by the language syntax; there is therefore more to check here than standard language containment. Moreover, in our setting the weakest assumption $A_w$ may no longer be regular (see Example 3). Our method nevertheless overcomes this problem.

As described above, not only do we construct a learning-based method for the AG rule for communicating programs, but we also repair the programs in case the verification fails. An AG rule can either conclude that $M_1 \| M_2$ satisfies $P$, or return a real, non-spurious counterexample: a computation $t$ of $M_1 \| M_2$ that violates $P$. In our case, instead of returning $t$, we repair $M_2$ in a way that eliminates this counterexample. Our repair is both syntactic and semantic, where for semantic repair we use *abduction* [25] to infer a new constraint which makes the counterexample $t$ infeasible.

Consider again $M_1$ and $P$ of Figure 2 and $M_2$ of Figure 1. The composition $M_1 \| M_2$ does not satisfy $P$. For example, if the initial value of `xpw` is $2^{63}$, then after encryption the value of `ypw` is $2^{64}$, violating $P$. Our algorithm finds a bad trace during the AG stage which captures this bad behavior, and the abduction in the repair stage finds a constraint that eliminates it, namely `xpw < 2^63`, and inserts this constraint into $M_2$.

Following this step, we now have an updated $M_2$, and we apply the AG rule again, using information gathered in the previous steps. In addition to removing the error trace, we update the alphabet of $M_2$ with the new constraint.

Continuing our example, in a following iteration AGR verifies that the repaired $M_2$ together with $M_1$ satisfies $P$, and terminates.

Thus, AGR operates in a verify-repair loop, where each iteration runs a learning-based process to determine whether the (current) system satisfies $P$ and, if not, eliminates bad behaviors from $M_2$ while enriching the set of constraints derived from these bad behaviors, which often leads to quicker convergence. If the current system does satisfy $P$, we return the repaired $M_2$ together with an assumption $A$ that abstracts $M_2$ and serves as a smaller proof of the correctness of the system.

We have implemented a tool for AGR and evaluated it on examples of various sizes and with various types of errors. Our experiments show that for most examples, AGR converges and finds a repair after 2-5 iterations of verify-repair. Moreover, our tool generates assumptions that are significantly smaller than the (possibly repaired) $M_2$, thus constructing a compact and efficient proof of correctness.


Contributions To summarize, the main contributions of this paper are:


Related Work Assume-guarantee style compositional verification [22,26] has been extensively studied. The assumptions necessary for compositional verification were first produced manually, limiting the practicality of the method.

More recent works [9,16,14,6] proposed techniques for automatic assumption generation using learning and abstraction refinement techniques, making assume-guarantee verification more appealing. In [24,6] alphabet refinement has been suggested as an optimization, to reduce the alphabet of the generated assumptions, and consequently their sizes. This optimization can easily be incorporated into our framework as well.

Other learning-based approaches for automating assumption generation have been described in [7,17,8]. All these works address non-circular rules and are limited to finite state systems. Automatic assumption generation for circular rules is presented in [12,13], using compositional rules similar to the ones studied in [21,23].

Our approach is based on a non-circular rule, but it targets complex, infinite-state concurrent systems and addresses not only verification but also repair. The compositional framework presented in [19] addresses $L^*$-based compositional verification and synthesis, but it only targets finite-state systems.

Also related is the work in [18], which addresses automatic synthesis of circular compositional proofs based on logical abduction; however, the focus of that work is sequential programs, while here we target concurrent programs. A sequential setting is also considered in [3], where abduction is used for automatically generating a program environment. Our computation of abduction is similar to that of [3]. However, we require our constraints to be over a predefined set of variables, while [3] looks for a minimal set.

The approach presented in [27] aims to compute the *interface* of an infinite-state component. Similarly to our work, the approach works with both over- and under-approximations, but it only analyzes one component at a time. Furthermore, the component is restricted to be deterministic (which is necessary for the permissiveness check). In contrast, we use both components of a system to compute the necessary assumptions, and as a result they can be much smaller than in [27]. Moreover, we do not restrict the components to be deterministic and, more importantly, we also address system repair in case the property is violated.

#### 2 Communicating Programs

In this section we present the notion of *communicating programs*. These are C-like programs, extended with the ability to synchronously read and write messages over communication channels. We model such programs as automata over an *action alphabet* that reflects the program statements. The alphabet includes *constraints*, which are quantifier-free first-order formulas representing the conditions in *if* and *while* statements. It also includes *assignment statements* and read and write *communication actions*. The automata representation is similar in nature to that of a control-flow graph. Its advantage, however, is in the ability to exploit an automata-learning algorithm such as $L^*$ for its verification.

We first formally define the alphabet over which communicating programs are defined. Let $G$ be a finite set of communication channels. Let $X$ be a finite set of variables (whose ordered vector is $\bar{x}$) and $D$ be a (possibly infinite) data domain. For simplicity, we assume that all variables are defined over $D$. The elements of $D$ are also used as constants in arithmetic expressions and constraints.

Definition 1. *An* action alphabet *is* α = G∪E∪C *where:*


Definition 2. *A* communicating program *(or, a program) is* $M = \langle Q, X, \alpha, \delta, q_0, F \rangle$*, where:*

	- *–* $a \in G \cup E$ *for all* $(q, a, q') \in \delta$

*That is, for each state it holds that either all outgoing edges are labeled with constraints, or that all outgoing edges are labeled with assignments or communication actions.*

*5.* F ⊆ Q *is the set of accepting states.*

The words that are read along a communicating program are a *symbolic representation* of the program behaviors. We refer to such a word as a *trace*. Each such trace induces *concrete runs* of the program, which are formed by concrete assignments to the program variables in a way that conforms with the actions along the word.

We now formally define these notions.

Definition 3. *A* path *in a program* $M$ *is a finite sequence of states and actions* $p = (q_0, a_1, q_1, \ldots, a_n, q_n)$*, starting with the initial state* $q_0$*, such that for all* $0 \le i < n$ *we have* $(q_i, a_{i+1}, q_{i+1}) \in \delta$*. The* induced trace *of* $p$ *is the sequence* $t = (a_1, \ldots, a_n)$ *of the actions in* $p$*. If* $q_n$ *is accepting, then* $t$ *is an* accepted trace *of* $M$*.*
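As a small illustration of Definition 3, the sketch below enumerates the accepted traces induced by paths of bounded length. The encoding of a program as a map from each state to its outgoing (action, successor) pairs, with actions written as plain strings, is an assumption made for the example. The toy program mirrors the trace `x := 3` followed by `x >= 4` used in Section 1.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: state -> list of (action, successor) pairs.
Program = Dict[str, List[Tuple[str, str]]]


def accepted_traces(delta: Program, q0: str, accepting: Set[str],
                    max_len: int) -> List[Tuple[str, ...]]:
    """Induced traces of paths from q0 with at most max_len actions that end in
    an accepting state (duplicates from different paths are merged)."""
    result: List[Tuple[str, ...]] = []
    frontier = [(q0, ())]
    for _ in range(max_len + 1):
        next_frontier = []
        for state, trace in frontier:
            if state in accepting:
                result.append(trace)
            for action, succ in delta.get(state, []):
                next_frontier.append((succ, trace + (action,)))
        frontier = next_frontier
    return list(dict.fromkeys(result))


# q1 has only constraint edges, as required by Definition 2.
EXAMPLE: Program = {
    "q0": [("x := 3", "q1")],
    "q1": [("x >= 4", "q2"), ("x < 4", "q3")],
}
print(accepted_traces(EXAMPLE, "q0", {"q2", "q3"}, 3))
# [('x := 3', 'x >= 4'), ('x := 3', 'x < 4')]
```

Both traces are accepted, although only the second one is feasible; feasibility is a semantic notion that syntactic path enumeration alone cannot decide.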

From now on we assume that every trace we discuss is induced by some path. We turn to define the concrete runs of the program.

Definition 4. *Let* $t = (a_1, \ldots, a_n)$ *be a trace and let* $(\beta_0, \ldots, \beta_n)$ *be a sequence of valuations (i.e., assignments to the program variables)*<sup>4</sup>*. Then a sequence* $r = (\beta_0, a_1, \beta_1, a_2, \ldots, a_n, \beta_n)$ *is a* run *of* $t$ *if the following holds.*

<sup>4</sup> Such valuations are usually referred to as states. We do not use this terminology here in order not to confuse them with the states of the automaton.


*We say that* t *is* feasible *if there exists a run of* t*.*
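The conditions of Definition 4 are not reproduced in this section, so the following sketch assumes the usual reading of the actions: a constraint must hold and leaves the valuation unchanged, an assignment updates one variable, a read `g?x` gives `x` an arbitrary value, and a write `g!x` changes nothing. Under these assumptions it searches for a run of a trace by enumerating initial values and read values from a small finite domain. This brute-force search is only a stand-in for the solver-based feasibility check one would use in practice: finding a run proves feasibility, but failing to find one proves nothing outside the chosen domain. The action encoding and the name `find_run` are hypothetical.

```python
from itertools import product
from typing import Any, Dict, List, Optional, Sequence, Tuple

Valuation = Dict[str, int]
# Hypothetical encoding of the actions of a trace:
#   ("assume", pred)       constraint: pred(valuation) must hold
#   ("assign", var, expr)  var := expr(valuation)
#   ("read", var)          g?var: var receives an arbitrary value
#   ("write", var)         g!var: the valuation is unchanged
Action = Tuple[Any, ...]


def find_run(trace: Sequence[Action], variables: Sequence[str],
             domain: Sequence[int]) -> Optional[List[Valuation]]:
    """Return valuations (beta_0, ..., beta_n) forming a run of `trace`, with
    initial and read values drawn from `domain`, or None if none is found."""

    def extend(i: int, run: List[Valuation]) -> Optional[List[Valuation]]:
        if i == len(trace):
            return run
        beta, action = run[-1], trace[i]
        kind = action[0]
        if kind == "assume":
            successors = [dict(beta)] if action[1](beta) else []
        elif kind == "assign":
            successors = [{**beta, action[1]: action[2](beta)}]
        elif kind == "read":
            successors = [{**beta, action[1]: v} for v in domain]
        elif kind == "write":
            successors = [dict(beta)]
        else:
            raise ValueError(f"unknown action kind {kind!r}")
        for nxt in successors:
            found = extend(i + 1, run + [nxt])
            if found is not None:
                return found
        return None

    for values in product(domain, repeat=len(variables)):
        found = extend(0, [dict(zip(variables, values))])
        if found is not None:
            return found
    return None


# The trace "x := 3 ; x >= 4" from Section 1 has no run; "x := 3 ; x >= 2" has.
INFEASIBLE = [("assign", "x", lambda b: 3), ("assume", lambda b: b["x"] >= 4)]
FEASIBLE = [("assign", "x", lambda b: 3), ("assume", lambda b: b["x"] >= 2)]
print(find_run(INFEASIBLE, ["x"], range(-5, 6)))  # None
print(find_run(FEASIBLE, ["x"], range(-5, 6)))    # [{'x': -5}, {'x': 3}, {'x': 3}]
```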

The *symbolic language* of $M$, denoted $\mathcal{T}(M)$, is the set of all *accepted* traces induced by paths of $M$. The *concrete language* of $M$ is the set of all runs of accepted traces in $\mathcal{T}(M)$. We will mostly be interested in feasible traces, which represent (concrete) runs of the program. Intuitively, the symbolic language of a program $M$ corresponds to its syntactic behavior, while the concrete language corresponds to its semantics.


#### 2.1 Parallel Composition

We now describe and define the parallel run of two communicating programs, and the way in which they communicate.

Let $M_1$ and $M_2$ be two programs, where $M_i = \langle Q_i, X_i, \alpha_i, \delta_i, q_0^i, F_i \rangle$ for $i \in \{1, 2\}$. Let $G_1, G_2$ be the sets of communication channels occurring in actions of $M_1, M_2$, respectively. We assume $X_1 \cap X_2 = \emptyset$.

The *interface alphabet* $\alpha_I$ of $M_1$ and $M_2$ consists of all communication actions on channels that are common to both components. That is, $\alpha_I = \{\, g?x,\ g!x \mid g \in G_1 \cap G_2,\ x \in X_1 \cup X_2 \,\}$.

In *parallel composition*, the two components synchronize on their communication interface only when one component writes data through a channel, and the other reads it through the same channel. The two components cannot synchronize if both are trying to read or both are trying to write. We distinguish between communication of the two components with each other (on their common channels), and their communication with their environment. In the former case, the components must "wait" for each other in order to progress together. In the latter case, the communication actions of the two components interleave asynchronously.

Formally, the *parallel composition* of $M_1$ and $M_2$, denoted $M_1 \| M_2$, is the program $M = \langle Q, X, \alpha, \delta, q_0, F \rangle$ defined as follows.

	- (a) For $(g \ast x_1, g \ast x_2) \in \alpha$:
		- i. $\delta((q_1, q_2), (g \ast x_1, g \ast x_2)) = (q_1', q_2')$.
		- ii. $\delta((q_1', q_2'), x_1 == x_2) = (\delta_1(q_1, g \ast x_1), \delta_2(q_2, g \ast x_2))$.

That is, when a communication is performed synchronously in both components, the data is transferred through the channel from the writing component to the reading component. As a result, the values of $x_1$ and $x_2$ become equal. This is enforced in $M$ by adding a transition labeled by the constraint $x_1 == x_2$ that immediately follows the synchronous communication.


That is, on actions that are not in the interface alphabet, the two components interleave.

5. $F = F_1 \times F_2$
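Not all items of the definition above are reproduced in this section, so the following Python sketch rests on assumptions: product states are pairs of component states, each synchronization goes through a fresh intermediate state followed by the equality constraint (item (a)), all other actions interleave, and $F = F_1 \times F_2$. Only the reachable part of $M_1 \| M_2$ is built. Communication actions are encoded as `(channel, op, variable)` triples and all other actions as strings; the class and function names are hypothetical.

```python
from collections import deque


class Program:
    """Hypothetical encoding: edges are (source, action, target) triples.
    Communication actions are (channel, "?" or "!", variable) triples;
    constraints and assignments are plain strings."""

    def __init__(self, q0, accepting, edges):
        self.q0 = q0
        self.accepting = set(accepting)
        self.edges = list(edges)

    def channels(self):
        return {a[0] for _, a, _ in self.edges if isinstance(a, tuple)}

    def out(self, q):
        return [(a, t) for s, a, t in self.edges if s == q]


def compose(m1: Program, m2: Program) -> Program:
    """Reachable part of M1 || M2."""
    common = m1.channels() & m2.channels()

    def on_interface(a):
        return isinstance(a, tuple) and a[0] in common

    start = (m1.q0, m2.q0)
    edges, seen, queue = [], {start}, deque([start])

    def add(src, action, dst):
        edges.append((src, action, dst))
        if dst not in seen:
            seen.add(dst)
            queue.append(dst)

    while queue:
        state = queue.popleft()
        if len(state) == 4:  # intermediate ("mid", ...) state: its edge already exists
            continue
        q1, q2 = state
        for a1, s1 in m1.out(q1):
            if not on_interface(a1):
                add(state, a1, (s1, q2))          # M1 moves alone
                continue
            for a2, s2 in m2.out(q2):
                # synchronize: same channel, one side reads, the other writes
                if on_interface(a2) and a2[0] == a1[0] and a2[1] != a1[1]:
                    mid = ("mid", state, (a1, a2), (s1, s2))
                    add(state, (a1, a2), mid)
                    add(mid, f"{a1[2]} == {a2[2]}", (s1, s2))
        for a2, s2 in m2.out(q2):
            if not on_interface(a2):
                add(state, a2, (q1, s2))          # M2 moves alone
    accepting = {(f1, f2) for f1 in m1.accepting for f2 in m2.accepting}
    return Program(start, accepting, edges)


# Fragments in the spirit of Figures 1 and 2: M2 reads x from the environment
# and sends it on enc; M1 receives it into y and doubles it.
M1 = Program("p0", {"p2"}, [("p0", ("enc", "?", "y"), "p1"),
                            ("p1", "y := 2 * y", "p2")])
M2 = Program("r0", {"r2"}, [("r0", ("read", "?", "x"), "r1"),
                            ("r1", ("enc", "!", "x"), "r2")])
M = compose(M1, M2)
print([a for _, a, _ in M.edges])
```

The environment read on `read` interleaves, while the communication on the shared channel `enc` is paired and immediately followed by the constraint `y == x`.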

Figure 3 demonstrates the parallel composition of the components $M_1$ and $M_2$ of Figures 1 and 2. The program $M = M_1 \| M_2$ reads a password from the environment through channel `read`. The two components synchronize on channels `enc` and `getEnc`.

Fig. 3: Parallel composition $M = M_1 \| M_2$ of components $M_1$ and $M_2$ from Figures 1 and 2

#### 3 Regular Properties and Their Satisfaction

In this section we define the syntax and semantics of the properties that we consider. These are properties that can be represented as finite automata, hence the name *regular*. However, the alphabet of such automata includes communication actions and first-order constraints over program variables. Thus, such automata are suitable for specifying the desired and undesired behaviors of communicating programs over time.

In order to define our properties, we first need the notion of a *deterministic and complete* program. The definition is somewhat different from the standard definition for finite automata, since it takes the semantic meaning of constraints into account.

Intuitively, in a deterministic and complete program, every concrete run has exactly one trace that induces it.

Definition 5. *A program over alphabet* α *is* deterministic and complete *if for every state* q *and for every action* a ∈ α *the following hold:*

*1. There is exactly one state* $q'$ *such that* $(q, a, q') \in \delta$*.*<sup>5</sup>

<sup>5</sup> In our examples, we sometimes omit the actions that lead to a rejecting sink for the sake of clarity.

*2. If* $(q, c_1, q')$ *and* $(q, c_2, q'')$ *are in* $\delta$ *for constraints* $c_1, c_2 \in C$ *and* $q' \neq q''$*, then* $c_1 \wedge c_2 \equiv \mathit{false}$*.*

*3. Let* $C_q$ *be the set of all constraints on transitions leaving* $q$*. Then* $\left(\bigvee_{c \in C_q} c\right) \equiv \mathit{true}$*.*
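Conditions 2 and 3 above are semantic, and in general they are discharged with a decision procedure for the constraint theory. As a rough sanity check, the sketch below samples valuations from a finite domain and reports those under which the constraints leaving a state lead to no target (violating condition 3) or to more than one target (violating condition 2). Sampling can only expose violations; it cannot establish the conditions. Constraints are modeled as Python predicates, and the function name is hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Set, Tuple

Valuation = Dict[str, int]
Constraint = Callable[[Valuation], bool]


def sampled_violations(out_edges: Sequence[Tuple[Constraint, str]],
                       variables: Sequence[str],
                       domain: Sequence[int]) -> List[Tuple[Valuation, Set[str]]]:
    """For a state whose outgoing edges are (constraint, target) pairs, return
    each sampled valuation whose satisfied constraints do not lead to exactly
    one target, together with the set of targets it does lead to."""
    violations = []
    for values in product(domain, repeat=len(variables)):
        beta = dict(zip(variables, values))
        targets = {target for constraint, target in out_edges if constraint(beta)}
        if len(targets) != 1:
            violations.append((beta, targets))
    return violations


GOOD = [(lambda b: b["x"] <= 999, "s1"), (lambda b: b["x"] > 999, "s2")]
OVERLAP = [(lambda b: b["x"] <= 999, "s1"), (lambda b: b["x"] >= 999, "s2")]
GAP = [(lambda b: b["x"] < 0, "s1"), (lambda b: b["x"] > 0, "s2")]
print(sampled_violations(GOOD, ["x"], range(990, 1010)))  # []
print(sampled_violations(OVERLAP, ["x"], range(990, 1010)))  # x = 999 reaches both targets
print(sampled_violations(GAP, ["x"], range(-2, 3)))       # [({'x': 0}, set())]
```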

A *property* is a deterministic and complete program with no assignment actions.

A trace is accepted by a property P if it reaches a state in F, the set of accepting states of P. Otherwise, it reaches a state in Q \ F, and is rejected by P.

Next, we define the satisfaction relation between a program and a property. Intuitively, a program $M$ satisfies a property $P$ (denoted $M \models P$) if all runs induced by accepted traces of $M$ reach an accepting state in $P$.

A property P specifies the behavior of a program M by referring to communication actions of M and imposing constraints over the variables of M. Thus, the set of variables of P is identical to that of M. Let G be the set of communication actions of M. Then, αP includes a subset of G as well as constraints over the variables of M. The *interface* of M and P, which consists of the communication actions that occur in P, is defined as αI = G ∩ αP.

In order to capture the satisfaction relation between M and P, we define a *conjunctive composition* between M and P, denoted M × P. In conjunctive composition, the two components synchronize on their common communication actions when both read or both write through the same communication channel. They interleave on constraints and on actions of αM that are not in αP.

Definition 6. *Let* $M = \langle Q_M, X_M, \alpha_M, \delta_M, q_0^M, F_M \rangle$ *be a program and* $P = \langle Q_P, X_P, \alpha_P, \delta_P, q_0^P, F_P \rangle$ *be a property, where* $X_M = X_P$*. The* conjunctive composition *of* $M$ *and* $P$ *is* $M \times P = \langle Q, X, \alpha, \delta, q_0, F \rangle$*, where:*

	- *(a) For* $a = (g \ast x, g \ast y) \in \alpha_I$*, or* $a = g \ast x \in \alpha_I$*:* $\delta((q_1, q_2), a) = (\delta_M(q_1, a), \delta_P(q_2, a))$*.*
	- *(b) For* $a \in \alpha_M \setminus \alpha_I$*:* $\delta((q_1, q_2), a) = (\delta_M(q_1, a), q_2)$*.*
	- *(c) For* $a \in \alpha_P \setminus \alpha_I$*:* $\delta((q_1, q_2), a) = (q_1, \delta_P(q_2, a))$*.*

*That is, on actions that are not common communication actions to* M *and* P*, the two components interleave.*

*5.* $F = F_M \times B_P$*, where* $B_P = Q_P \setminus F_P$*.*

Note that accepted traces in M × P are those that are accepted in M and rejected in P. Such traces are called *error traces* and their corresponding runs are called *error runs*. Intuitively, an error run is a run along M which violates the properties modeled by P. Such a run either fails to synchronize on the communication actions, or reaches a point in the computation in which its assignments, coming from M, violate some constraint described by P. These runs are manifested in the traces that are accepted in M but are composed with matching traces that are rejected in P. We can now formally define when a program satisfies a property.

Definition 7. *For a program* $M$ *and a property* $P$*, we define* $M \models P$ *iff* $M \times P$ *contains no feasible accepted traces.*
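The following sketch builds the reachable part of $M \times P$ following items (a)-(c) and item 5 of Definition 6, for programs given as lists of `(source, action, target)` edges with actions written as strings. It assumes that communication actions have the form `channel?var` or `channel!var`, and it does not handle the paired actions $(g \ast x, g \ast y)$ of footnote 6; the function names are hypothetical. The toy property checks, after the output on channel `out`, that `y > x`; its only rejecting state is `r4`.

```python
import re
from collections import deque
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]


def is_comm(action: str) -> bool:
    """Communication actions are written channel?var or channel!var."""
    return re.fullmatch(r"\w+[?!]\w+", action) is not None


def conjunctive_composition(m_edges: List[Edge], m_q0: str, m_acc: Set[str],
                            p_edges: List[Edge], p_q0: str, p_states: Set[str],
                            p_acc: Set[str]):
    """Reachable part of M x P; accepting states are F_M x (Q_P minus F_P)."""
    alpha_p = {a for _, a, _ in p_edges}
    interface = {a for _, a, _ in m_edges if is_comm(a) and a in alpha_p}

    def out(edges, q):
        return [(a, t) for s, a, t in edges if s == q]

    start = (m_q0, p_q0)
    edges, seen, queue = [], {start}, deque([start])
    while queue:
        qm, qp = queue.popleft()
        moves = []
        for a, tm in out(m_edges, qm):
            if a in interface:   # (a) synchronize on a common communication action
                moves += [(a, (tm, tp)) for b, tp in out(p_edges, qp) if b == a]
            else:                # (b) M moves alone
                moves.append((a, (tm, qp)))
        for a, tp in out(p_edges, qp):
            if a not in interface:   # (c) P moves alone
                moves.append((a, (qm, tp)))
        for a, dst in moves:
            edges.append(((qm, qp), a, dst))
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    accepting = {(f, r) for f in m_acc for r in p_states - p_acc}
    return start, edges, accepting


M_EDGES = [("m0", "read?x", "m1"), ("m1", "y := x + 1", "m2"), ("m2", "out!y", "m3")]
P_EDGES = [("r0", "read?x", "r1"), ("r1", "out!y", "r2"),
           ("r2", "y > x", "r3"), ("r2", "y <= x", "r4")]
start, edges, accepting = conjunctive_composition(
    M_EDGES, "m0", {"m3"}, P_EDGES, "r0", {"r0", "r1", "r2", "r3", "r4"},
    {"r0", "r1", "r2", "r3"})
print(accepting)  # {('m3', 'r4')}
```

The only trace reaching the error state `('m3', 'r4')` is `read?x`, `y := x + 1`, `out!y`, `y <= x`, which is infeasible because `y` equals `x + 1`; by Definition 7, this toy program therefore satisfies the toy property.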

<sup>6</sup> Note that communication actions of the form $(g \ast x, g \ast y)$ can only appear if $M$ is a parallel composition of two programs.

Thus, a feasible error trace in $M \times P$ is evidence that $M \not\models P$, since it indicates the existence of a run that violates $P$.

*Example 2.* Consider the program $M$ of Figure 3 and the property $P$ of Figure 2. As discussed in Section 1, $M \not\models P$. The trace $t$ = (`read?xpw`, `999 < xpw`, `(enc!xpw, enc?ypw)`, `xpw == ypw`, `ypw := 2 · ypw`, `(getEnc?xpw2, getEnc!ypw)`, `xpw2 == ypw`, `xpw != xpw2`, `ypw ≥ 2^64`) is a feasible error trace in $M \times P$, proving that an overflow is possible.
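The arithmetic behind Example 2 can be replayed directly. The sketch below executes the trace $t$ on a single concrete input, using Python's unbounded integers to model the data domain (an assumption). Each synchronized read followed by its equality constraint is collapsed into a copy of the written value, since together they force the two variables to be equal. The function name is hypothetical.

```python
from typing import Dict, Optional


def replay_example_2(xpw: int) -> Optional[Dict[str, int]]:
    """Run trace t of Example 2 with read?xpw returning `xpw`; return the
    final valuation, or None if one of the constraints of t fails."""
    if not 999 < xpw:                 # 999 < xpw
        return None
    ypw = xpw                         # (enc!xpw, enc?ypw) ; xpw == ypw
    ypw = 2 * ypw                     # ypw := 2 * ypw
    xpw2 = ypw                        # (getEnc?xpw2, getEnc!ypw) ; xpw2 == ypw
    if not xpw != xpw2:               # xpw != xpw2
        return None
    if not ypw >= 2 ** 64:            # ypw >= 2^64: the overflow
        return None
    return {"xpw": xpw, "ypw": ypw, "xpw2": xpw2}


print(replay_example_2(2 ** 63) is not None)  # True: this input yields an error run
print(replay_example_2(2 ** 63 - 1) is None)  # True: below 2^63, 2 * xpw < 2^64
```

Since `ypw` ends up as `2 * xpw`, any input satisfying the repair constraint `xpw < 2^63` from Section 1 fails the final overflow check, so the trace becomes infeasible.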

#### 4 The Assume-Guarantee-Repair (AGR) Framework

In this section we discuss our Assume-Guarantee-Repair (AGR) framework for communicating programs. The framework consists of a learning-based Assume-Guarantee algorithm, called $AG_{L^*}$, and a REPAIR procedure, which are tightly integrated.

Let $M_1$ and $M_2$ be two programs, and let $P$ be a property. The classical Assume-Guarantee (AG) proof rule [26] ensures that if we find an assumption $A$ (in our case, a communicating program) such that both $M_1 \| A \models P$ and $M_2 \models A$ hold, then $M_1 \| M_2 \models P$ holds as well. For labeled transition systems (LTSs) [9], the AG rule is guaranteed to either prove correctness or return a real (non-spurious) counterexample. The work in [9] relies on the $L^*$ algorithm [5] for learning an assumption $A$ for the AG rule. In particular, $L^*$ aims at learning $A_w$, the weakest assumption for which $M_1 \| A_w \models P$ holds. A crucial point of this method is the fact that $A_w$ is *regular* [15], and thus can be learned by $L^*$.
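Written as an inference rule, with the premises above the line and the conclusion below it, the AG rule described above reads:

```latex
\[
  \frac{M_1 \,\|\, A \models P \qquad M_2 \models A}
       {M_1 \,\|\, M_2 \models P}
\]
```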

Lemma 1. *For infinite-state communicating programs, the weakest assumption* $A_w$ *is not always regular.*

*Example 3.* Consider the programs $M_1$, $M_2$ and the property $P$ of Figure 4. The weakest assumption with which $M_1$ satisfies $P$ should contain exactly all traces (over the alphabet of $M_2$) that contain equally many actions of the form `x := x + 1` and `y := y + 1`. This set of traces is not regular, and therefore cannot be learned by $L^*$.

Fig. 4: A system for which the weakest assumption is not regular

To cope with this difficulty, we change the target of learning. Instead of learning the (possibly) nonregular language of $A_w$, we learn $\mathcal{T}(M_2)$, the set of accepted traces of $M_2$. This language is guaranteed to be regular, as it is represented by the automaton $M_2$.

Note that if $M_1 \| M_2 \models P$, repair is never needed, and $M_2$ itself is a valid assumption. In the worst case, the procedure halts once it has learned $M_2$. In particular, if there are no error traces, termination of our algorithm is guaranteed. If $M_1 \| M_2 \not\models P$, then no suitable assumption exists, and attempting to learn $M_2$ will reveal this. Therefore, using $\mathcal{T}(M_2)$ as the learning goal matches the AG rule. By the nature of $AG_{L^*}$, the assumptions it learns before it reaches $M_2$ may contain the traces of $M_2$ and more, while still being represented by a smaller automaton. Therefore, similarly to [9], $AG_{L^*}$ often terminates with an assumption $A$ that is much smaller than $M_2$. Indeed, our tool often produces very small assumptions (see Section 5).

As mentioned before, not only do we determine whether $M_1 \| M_2 \models P$, but we also repair the program in case it violates the specification. When $M_1 \| M_2 \not\models P$, the $AG_{L^*}$ algorithm returns an error trace $t$ as a witness for the violation. In this case, we initiate the REPAIR procedure, which eliminates $t$ from $M_2$. REPAIR applies abduction in order to learn a new constraint which, when added to $t$, creates an infeasible trace.<sup>7</sup> The new constraint enriches the alphabet in a way that may make similar traces infeasible as well. We elaborate on our use of abduction in Section 4.2. The removal of $t$ and the addition of the new constraint result in a new goal $M_2'$ for $AG_{L^*}$ to learn. We then return to $AG_{L^*}$ to search for a new assumption $A$ that allows us to verify $M_1 \| M_2' \models P$.

An important feature of our AGR algorithm is its *incrementality*. When learning an assumption $A$ for $M_2'$, we can reuse the membership queries previously asked for $M_2$, since their answers have not changed. In the full version [1] we prove that the difference between the languages of $M_2$ and $M_2'$ lies in words (traces) whose membership has not yet been queried on $M_2$. This allows the learning of $M_2'$ to start from the point where the previous learning left off, resulting in a more efficient algorithm.

As opposed to the case where $M_1 \| M_2 \models P$, we cannot guarantee the termination of the repair process when $M_1 \| M_2 \not\models P$. This is because we are only guaranteed to remove one (bad) trace and add one (infeasible) trace in every AGR REPAIR iteration (although in practice, every iteration may remove a larger set of traces). Thus, we may never converge to a repaired system. Nevertheless, in case of a property violation, our algorithm always finds an error trace, so progress towards a "less erroneous" program is guaranteed.

It should be noted that the $AG_{L^*}$ part of our AGR algorithm deviates from the AG rule of [9] in two important ways. First, since the goal of our learning is $M_2$ rather than $A_w$, our membership queries are different in type and order. Second, in order to identify real error traces and send them to REPAIR as early as possible, we add queries to the membership phase that reveal such traces. We then send them to REPAIR without ever passing through equivalence queries, which improves the overall efficiency. Indeed, our experiments include several cases in which all repairs were invoked from the membership phase. In these cases, AGR ran an equivalence query only once it had already successfully repaired $M_2$, and then terminated.

#### 4.1 The Assume-Guarantee-Repair (AGR) Algorithm

We now describe our AGR algorithm in more detail (see Algorithm 1). Figure 5 describes the flow of the algorithm. AGR comprises two main parts, namely $AG_{L^*}$ and REPAIR.

The inputs to AGR are the components $M_1$ and $M_2$, and the property $P$. While $M_1$ and $P$ stay unchanged during AGR, $M_2$ keeps being updated as long as the algorithm recognizes that it needs repair (we can guarantee termination in certain cases, as we discuss in Section 4.4).

The algorithm works in iterations, where in every iteration the next updated $M_2^i$ is calculated, starting with iteration $i = 0$, where $M_2^0 = M_2$. An iteration starts with the membership phase in line 2, and ends either when $AG_{L^*}$ successfully terminates (line 16) or when procedure REPAIR is called (lines 7 and 24). When a new system $M_2^i$ is constructed, $AG_{L^*}$ does not start from scratch: the information that has been used in previous iterations is still valid for $M_2^i$. The new iteration is given the new trace(s) that have been added to or removed from the previous system (lines 9, 11, 20, 27).

$AG_{L^*}$ consists of two phases: membership and equivalence.

The membership phase (lines 2-11) consists of a loop in which the learner constructs the next assumption $A_j^i$ according to the answers it gets from the teacher on a sequence of membership queries on various traces.

<sup>7</sup> There are also cases in which we do not use abduction, as discussed in Section 4.3.

Fig. 5: The flow of AGR

These queries are answered in accordance with the traces we allow in $A_j^i$: traces in $M_2^i$ that, in parallel with $M_1$, satisfy $P$. If a trace $t \in \mathcal{T}(M_2^i)$ in parallel with $M_1$ does not satisfy $P$, then $t$ is a bad behavior of $M_2^i$. Therefore, if such a $t$ is found during the membership phase, REPAIR is invoked.

Once the learner reaches a stable assumption $A_j^i$, it passes it to the equivalence phase (lines 12-27). $A_j^i$ is a suitable assumption if both $M_1 \| A_j^i \models P$ and $\mathcal{T}(M_2^i) \subseteq \mathcal{T}(A_j^i)$ hold. In this case, AGR terminates and returns $M_2^i$ as a successful repair of $M_2$. If $M_1 \| A_j^i \not\models P$, then a counterexample $t$ is returned, composed of bad traces in $M_1$, $A_j^i$, and $P$. If the bad trace $t_2$, the restriction of $t$ to the alphabet of $A_j^i$, is also in $M_2^i$, then $t_2$ is a bad behavior of $M_2^i$, and here too the REPAIR phase is invoked. Otherwise, AGR returns to the membership phase with $t_2$ as a trace that should not be in $A_j^i$, and continues to learn $A_{j+1}^i$.
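The branching of the equivalence phase described above can be summarized as a small decision function. This is a sketch under our own assumptions: the three checks are passed in as hypothetical callbacks, and the case in which $M_1 \| A_j^i \models P$ holds but $\mathcal{T}(M_2^i) \not\subseteq \mathcal{T}(A_j^i)$, which this section does not spell out, is handled as in standard $L^*$ by returning the missing trace as one that should be in the assumption.

```python
from typing import Callable, Optional, Tuple

Trace = Tuple[str, ...]


def equivalence_phase(
    check_property: Callable[[], Optional[Trace]],
    in_m2: Callable[[Trace], bool],
    check_containment: Callable[[], Optional[Trace]],
) -> Tuple[str, Optional[Trace]]:
    """Decide AGR's next step for a stable assumption A_j^i.

    check_property()    -> None if M1 || A_j^i |= P, otherwise t2, the
                           restriction of a counterexample to A_j^i's alphabet
    in_m2(t2)           -> whether t2 is a trace of M2^i
    check_containment() -> None if T(M2^i) is contained in T(A_j^i),
                           otherwise a trace of M2^i missing from A_j^i
    """
    t2 = check_property()
    if t2 is not None:
        if in_m2(t2):
            return ("repair", t2)       # a real bad behavior of M2^i
        return ("exclude", t2)          # t2 should not be in A_j^i; learn A_{j+1}^i
    missing = check_containment()
    if missing is None:
        return ("done", None)           # A_j^i is suitable: return M2^i
    return ("include", missing)         # assumption: standard L* positive counterexample


print(equivalence_phase(lambda: None, lambda t: True, lambda: None))  # ('done', None)
```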

As described above, REPAIR is called when a bad trace $t$ is found in $(M_1 \| M_2^i) \times P$ and should be removed. If $t$ contains no constraints, then its sequence of actions is illegal and its subtrace $t_2$ from $M_2^i$ should be removed from $M_2^i$. In this case, REPAIR returns to $AG_{L^*}$ with a new learning goal $M_2^{i+1}$ such that $\mathcal{T}(M_2^{i+1}) \subseteq \mathcal{T}(M_2^i) \setminus \{t_2\}$, along with the answer "no" to the membership query on $t_2$. In Section 4.3 we discuss different methods for removing $t_2$ from $M_2^i$.

The more interesting case is when $t$ contains constraints. In this case, we not only remove the matching $t_2$ from $M_2^i$, but we also add a new constraint $c$ to the alphabet of $M_2^{i+1}$, which causes $t_2$ to be infeasible. This way we eliminate $t_2$, and may also eliminate a family of bad traces that violate the property in the same manner. We deduce $c$ using abduction (see Section 4.2). As before, REPAIR returns to $AG_{L^*}$ with a new goal to be learned, but now also with an extended alphabet. The membership phase is then provided with two new answers to membership queries: $t_2$, which should *not* be included in the new assumption, and $t_2' = (t_2 \cdot c)$, which should be included.

*Incremental learning.* One of the advantages of AGR is that it is *incremental*, in the sense that membership answers from previous iterations remain unchanged for the repaired system. Indeed, since this is the first time that AG$_{L^*}$ queries $t_2$, we can return to AG$_{L^*}$ with the answer $t_2 \notin \mathcal{T}(M^{i+1}_2)$ without contradicting any previous queries. In addition, the trace $t'_2$ obtained by abduction is a new word (over a new alphabet), which was also not queried earlier. Therefore, we can incrementally add $t_2$ and $t'_2$ as answers from the teacher, and continue to use the answers to previous queries on all other traces.
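The incremental reuse of teacher answers can be illustrated with a small answer table. This is a hypothetical sketch, assuming that the teacher stores membership answers in a dictionary keyed by traces and that a constraint $c$ is represented as one extra action label; adding an answer that contradicts an earlier one raises an error, which is exactly the situation the argument above rules out.

```python
from typing import Dict, Optional, Tuple

Trace = Tuple[str, ...]


class MembershipTable:
    """Teacher answers kept across iterations of the verify-repair loop."""

    def __init__(self) -> None:
        self.answers: Dict[Trace, bool] = {}

    def add(self, trace: Trace, member: bool) -> None:
        previous = self.answers.get(trace)
        if previous is not None and previous != member:
            # Incrementality would be violated: an earlier answer changes.
            raise ValueError(f"conflicting answer for {trace}")
        self.answers[trace] = member


def record_repair(table: MembershipTable, t2: Trace,
                  constraint: Optional[str] = None) -> None:
    """Add the answers produced by one call to REPAIR."""
    table.add(t2, False)                     # t2 is not in the new goal
    if constraint is not None:               # repair by abduction
        table.add(t2 + (constraint,), True)  # t2 . c is in the new goal
```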



#### 4.2 Repair by Abduction

We now describe the repair we apply to $M^i_2$ in case the error trace $t$ contains constraints (see Algorithm 1, line 32). Error traces with no constraints are removed from $M^i_2$ syntactically (line 31), while in abduction we *semantically* eliminate $t$ by making it infeasible. The new constraints are then added to the alphabet of $M^{i+1}_2$ in a way that may eliminate additional error traces. Note that even though we add new alphabet letters to $M_2$, we do not add new *feasible traces*, since the constraints added by abduction can only restrict the behavior of $M_2$, making more traces infeasible. Therefore, we do not add counterexamples to $M_2$.

The process of inferring new constraints from known facts about the program is called *abduction* [11]. We now describe how we apply it. Given a trace $t$, let $\varphi_t$ be the first-order formula (a conjunction of constraints) that constitutes the SSA representation of $t$ [4]. In order to make $t$ infeasible, we look for a formula $\psi$ such that $\psi \wedge \varphi_t \rightarrow \mathit{false}$.<sup>8</sup>

Note that $t \in \mathcal{T}((M_1 \| M^i_2) \times P)$, and so it includes variables both from $X_1$, the set of variables of $M_1$, and from $X_2$, the set of variables of $M^i_2$. Since we wish to repair $M^i_2$, the learned $\psi$ must be over the variables in $X_2$ only.

The formula $\psi \wedge \varphi_t \rightarrow \mathit{false}$ is equivalent to $\psi \rightarrow (\varphi_t \rightarrow \mathit{false})$. Thus, $\psi = \forall x \in X_1 . (\varphi_t \rightarrow \mathit{false}) \equiv \forall x \in X_1 . (\neg\varphi_t)$ is such a desired constraint: $\psi$ makes $t$ infeasible and is defined only over $X_2$. We then use quantifier elimination [28] to produce a quantifier-free formula over $X_2$. Computing $\psi$ is similar to the abduction suggested in [11], but the focus here is on finding a formula over $X_2$ rather than over any minimal set of variables. We use Z3 [10] to apply quantifier elimination and to generate the new constraint. After generating $\psi(X_2)$, we add it to the alphabet of $M^{i+1}_2$ (line 35 of Algorithm 1). In addition, we produce a new trace $t'_2 = t_2 \cdot \psi(X_2)$, which is returned as the output of the abduction.
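AGR computes $\psi$ symbolically with Z3's quantifier elimination. To illustrate only the meaning of $\psi = \forall x \in X_1 . \neg\varphi_t$, the following Python sketch evaluates the universal quantifier by enumeration over a small finite domain, which is an assumption of the sketch (the real variables range over unbounded integers). It returns the valuations of the $X_2$ variables that satisfy $\psi$; `abduce_finite` and the toy formula are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

Valuation = Dict[str, int]


def abduce_finite(phi_t: Callable[[Valuation], bool],
                  x1_vars: List[str], x2_vars: List[str],
                  domain: Iterable[int]) -> List[Tuple[int, ...]]:
    """X2 valuations satisfying psi = forall X1 . not phi_t over a finite domain."""
    dom = list(domain)
    psi_models = []
    for x2_vals in product(dom, repeat=len(x2_vars)):
        env2 = dict(zip(x2_vars, x2_vals))
        # psi holds iff no choice of the M1 variables makes phi_t true
        if all(not phi_t({**env2, **dict(zip(x1_vars, x1_vals))})
               for x1_vals in product(dom, repeat=len(x1_vars))):
            psi_models.append(x2_vals)
    return psi_models


# Hypothetical toy trace formula: y = x and 2*y >= 8, with x in X2 and y in X1.
def toy_phi(v: Valuation) -> bool:
    return v["y"] == v["x"] and 2 * v["y"] >= 8
```

On the domain `range(8)`, `abduce_finite(toy_phi, ["y"], ["x"], range(8))` yields the values 0 to 3, i.e. the constraint $x < 4$: exactly the values of the $M_2$ variable for which no choice of $y$ completes the bad trace.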

*Example 4.* Recall the error trace $t = \mathit{read}?x_{pw},\ 999 < x_{pw},\ (\mathit{enc}!x_{pw}, \mathit{enc}?y_{pw}),\ x_{pw} == y_{pw},\ y'_{pw} := 2 \cdot y_{pw},\ (\mathit{getEnc}?x_{pw_2}, \mathit{getEnc}!y'_{pw}),\ x_{pw_2} == y'_{pw},\ x_{pw} \mathrel{!=} x_{pw_2},\ y'_{pw} \geq 2^{64}$ of Example 2. From $t$ we create the formula $\varphi_t = (999 < x_{pw}) \wedge (y_{pw} = x_{pw}) \wedge (y'_{pw} = 2 \cdot y_{pw}) \wedge (x_{pw_2} = y'_{pw}) \wedge (x_{pw} \neq x_{pw_2}) \wedge (y'_{pw} \geq 2^{64})$. We then apply quantifier elimination and simplification to the formula $\forall y_{pw} \forall y'_{pw} (\neg\varphi_t)$ and obtain the new constraint $x_{pw} < 2^{63}$.
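The constraint of Example 4 can be sanity-checked with Python's arbitrary-precision integers. The sketch assumes that $\varphi_t$ is exactly the conjunction above and only tests sample points; it does not perform quantifier elimination. Because $y_{pw}$ and $y'_{pw}$ are determined by equalities, for fixed $x_{pw}$ and $x_{pw_2}$ there is a single candidate witness to check. The function names are hypothetical.

```python
def phi_t(x_pw: int, y_pw: int, y2_pw: int, x_pw2: int) -> bool:
    """SSA formula of the error trace in Example 4 (y2_pw stands for y'_pw)."""
    return (999 < x_pw and y_pw == x_pw and y2_pw == 2 * y_pw
            and x_pw2 == y2_pw and x_pw != x_pw2 and y2_pw >= 2**64)


def has_witness(x_pw: int, x_pw2: int) -> bool:
    """Can the M1 variables y_pw, y'_pw be chosen so that phi_t holds?

    The equalities y_pw = x_pw and y'_pw = 2 * y_pw leave exactly one
    candidate, so checking it decides the existential question.
    """
    y_pw = x_pw
    y2_pw = 2 * y_pw
    return phi_t(x_pw, y_pw, y2_pw, x_pw2)
```

For every tested $x_{pw} < 2^{63}$ no witness exists, while $x_{pw} = 2^{63}$ with $x_{pw_2} = 2^{64}$ satisfies $\varphi_t$: this is the point where the bound $2^{64}$ becomes reachable.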

Lemma 2. *Let* $t = (t_1 \| t_2) \times t_P$. *If* $t_2$ *is infeasible, then* $t$ *is infeasible as well.*

This is due to the fact that $t_P$ can only restrict the behaviors of $t_1$ and $t_2$; thus, if $t_2$ is infeasible, $t$ cannot be made feasible. See the full version of the paper [1] for a formal proof. Therefore, by making $t_2$ infeasible, we eliminate the error trace $t$.

We now want to build a repaired component $M^{i+1}_2$ of $M^i_2$ that includes $t_2 \cdot \psi(X_2)$ but not $t_2$. To do so, we split the state $q$ that $t_2$ reaches in $M^i_2$ into two states $q, q'$, and add a transition labeled $\psi(X_2)$ from $q$ to $q'$, where only $q'$ is now accepting.<sup>9</sup> Thus, we have eliminated a violating trace from $M_1 \| M^i_2$. AGR now returns to AG$_{L^*}$ in order to learn an assumption for the repaired component $M^{i+1}_2$, which now includes $t'_2$ but not $t_2$.
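The state-splitting step can be sketched on an explicit deterministic automaton. This is a minimal Python sketch, assuming $M^i_2$ is given as a partial DFA whose letters are strings, that the constraint $\psi(X_2)$ is represented as one more letter, and that `fresh` is a state name not yet used; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional, Sequence, Set, Tuple


@dataclass
class DFA:
    init: Hashable
    delta: Dict[Tuple[Hashable, str], Hashable]
    accepting: Set[Hashable] = field(default_factory=set)

    def run(self, word: Sequence[str]) -> Optional[Hashable]:
        q = self.init
        for a in word:
            q = self.delta.get((q, a))
            if q is None:  # missing transition: the word is rejected
                return None
        return q

    def accepts(self, word: Sequence[str]) -> bool:
        q = self.run(word)
        return q is not None and q in self.accepting


def split_with_constraint(m2: DFA, t2: Sequence[str], psi: str,
                          fresh: Hashable) -> DFA:
    """Reject t2 and accept t2 . psi by splitting the state q reached by t2.

    `fresh` is assumed not to occur as a state of m2.
    """
    q = m2.run(t2)
    assert q is not None and q in m2.accepting, "t2 must be a trace of M2"
    delta = dict(m2.delta)
    delta[(q, psi)] = fresh                          # q --psi--> q'
    accepting = (set(m2.accepting) - {q}) | {fresh}  # only q' accepts
    return DFA(m2.init, delta, accepting)
```

As in the construction described above, only the new state is accepting, so any other word that also ends in $q$ is accepted afterwards only when it is extended by $\psi(X_2)$.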

#### 4.3 Removal of Error Traces

Recall that the goal of REPAIR is to remove a bad trace $t$ from $M_2$ once it is found by AG$_{L^*}$. If $t$ contains constraints, we remove it using abduction. Otherwise, we can remove $t$ by constructing a system whose language is $\mathcal{T}(M_2) \setminus \{t\}$. We call this the *exact* method for repair. However, removing a single trace at a time may lead to slow convergence and to an exponential blow-up in the size of the repaired systems. Moreover, as we have discussed, in some cases there are infinitely many such error traces, in which case AGR may never terminate.
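The *exact* method can be realized, for instance, by a product of $M_2$ with a small automaton that tracks progress along $t$. This is one possible construction, given as a hypothetical Python sketch over a partial DFA; it is not necessarily the construction used in the AGR tool. Each removed trace multiplies the number of states by at most $|t| + 2$.

```python
from typing import Dict, Hashable, Sequence, Set, Tuple

Delta = Dict[Tuple[Hashable, str], Hashable]


def remove_trace(init: Hashable, delta: Delta, accepting: Set[Hashable],
                 t: Sequence[str]):
    """Exact method: a DFA for T(M2) minus {t}, via a product with a tracker.

    The tracker component k counts how much of t has been read so far;
    OFF means the input has already left t.
    """
    OFF = -1

    def track(k: int, a: str) -> int:
        return k + 1 if 0 <= k < len(t) and t[k] == a else OFF

    start = (init, 0)
    new_delta: Delta = {}
    new_accepting: Set[Hashable] = set()
    seen, todo = {start}, [start]
    while todo:
        q, k = todo.pop()
        if q in accepting and k != len(t):  # reject exactly the word t
            new_accepting.add((q, k))
        for (src, a), dst in delta.items():
            if src == q:
                nxt = (dst, track(k, a))
                new_delta[((q, k), a)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return start, new_delta, new_accepting


def accepts(init, delta, accepting, word) -> bool:
    q = init
    for a in word:
        if (q, a) not in delta:
            return False
        q = delta[(q, a)]
    return q in accepting
```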

For faster convergence, we have implemented two additional heuristics, namely *approximate* and *aggressive*. These heuristics may remove more than a single trace at a time, while keeping the size of the systems small. While "good" traces may be removed as well, the correctness of the repair is maintained, since no bad traces are added. Moreover, an error trace is likely to lie in an erroneous part of the system, and in these cases our heuristics manage to remove a set of error traces in a single step.

We briefly survey the three methods.

<sup>8</sup> Usually, in abduction, we look for $\psi$ such that $\psi \wedge \varphi_t$ is not a contradiction. In our case, however, since $\varphi_t$ is a violation of the specification, we want to infer a formula that makes $\varphi_t$ unsatisfiable.

<sup>9</sup> Note that $q$ is an accepting state in $M^i_2$ since $t_2 \in \mathcal{T}(M^i_2)$.


#### 4.4 Correctness and Termination

For this discussion, we assume a sound and complete teacher who can answer the membership and equivalence queries in AG$_{L^*}$, which require verifying communicating programs and properties with first-order constraints.

As we have discussed earlier, AGR is not guaranteed to terminate, and there are cases where the REPAIR stage may be called infinitely many times. However, if no repair is needed, or if a repaired system is obtained after finitely many calls to REPAIR, then AGR is guaranteed to terminate with a correct answer.

To see why, consider a repaired system $M^i_2$ for which $M_1 \| M^i_2 \models P$. Since the goal of AG$_{L^*}$ is to syntactically learn $M^i_2$, which is regular, this stage will terminate at the latest when AG$_{L^*}$ learns exactly $M^i_2$ (it may terminate sooner if a smaller appropriate assumption is found). Notice that, in particular, if $M_1 \| M_2 \models P$, then AGR terminates with a correct answer in the first iteration of the verify-repair loop.

REPAIR is only invoked when a (real) error trace $t$ is found in $M^i_2$, in which case REPAIR produces a new system $M^{i+1}_2$ that does not include $t$. If $M_1 \| M^i_2 \not\models P$, then AG$_{L^*}$ is guaranteed to find an error trace in either the membership or the equivalence phase. Therefore, also in the case that $M^i_2$ violates $P$, the iteration is guaranteed to terminate. To conclude, we have the following.

Theorem 1. – *An iteration* $i$ *of AGR ends with an error trace* $t$ *iff* $M_1 \| M^i_2 \not\models P$*, where* $M^i_2$ *is the repaired system at iteration* $i$*.*

– *If, after finitely many iterations, a repaired program* $M'_2$ *is such that* $M_1 \| M'_2 \models P$*, then AGR terminates with a correct answer.*

We have shown that every iteration of AGR is guaranteed to terminate with a correct answer. The detailed correctness proofs are in the full version of this paper [1].

In particular, since every iteration of AGR finds and removes an error trace $t$, and no new erroneous traces are introduced in the updated system, if $M_2$ has finitely many error traces, AGR is guaranteed to terminate with a correctly repaired system.

#### 5 Experimental Results and Conclusions

We implemented our AGR framework in Java, integrating the L<sup>∗</sup> implementation from the LTSA tool [20]. We used Z3 [10] as the teacher for the satisfaction queries in AG$_{L^*}$, and for abduction in REPAIR.

Table 1 displays some results of running AGR on various examples, which vary in their sizes, in the types of errors (semantic and syntactic), and in the number of errors. Additional results are in the full version of this paper [1], and the full examples are available at [2]. The *iterations* column indicates the number of iterations of the verify-repair loop until a repaired $M_2$ is achieved. Examples with no errors were verified in the first iteration, and are indicated by *verification*. We tested the three repair methods described in Section 4.3 for counterexamples without constraints, and used abduction when needed. Figure 6 compares the three methods in terms of run-time and the size of the repair and assumptions (note that the graphs are given in logarithmic scale).


Table 1: AGR algorithm results on various examples

Most of our examples model multi-client-server communication protocols of varying sizes. Our tool managed to repair all these examples when needed.

As can be seen in Table 1, our tool successfully generates assumptions that are significantly smaller than the repaired and the original M2.

For the examples that needed repair, in most cases our tool needed 2-5 iterations of verify-repair in order to successfully construct a repaired component. Interestingly, in example #15 the *aggressive* method converged more slowly than the *approximate* method. This is due to the structure of $M_2$, in which different error traces lead to different states. Marking these states as non-accepting removed each trace separately. However, some of these traces share a common transition, and preventing this transition from reaching an accepting state, as done in the *approximate* method, removed several error traces in a single repair. This example also includes repairs by abduction (as do examples #16, #18 and #19).

Example #22 models a simple structure in which, due to a loop in $M_2$, the same alphabet sequence can generate infinitely many error traces. The *exact* repair method timed out, since it attempted to remove one error trace at a time. On the other hand, the *aggressive* method removed all accepting states, creating an empty program, which is a trivial (yet valid) repair. The *approximate* method, however, created a valid, non-trivial repair.

Fig. 6: Comparing repair methods: time and repair size (logarithmic scale).

*Conclusion.* AGR offers a new take on the learning-based approach to assume-guarantee verification, and manages to cope with complex properties and to repair infinite-state programs. Our experimental results show that, using existing semantic tools, AGR produces very succinct proofs, and quickly and efficiently repairs flawed communicating programs.

#### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Structural Invariants for the Verification of Systems with Parameterized Architectures

Marius Bozga<sup>1</sup>, Javier Esparza<sup>2</sup>, Radu Iosif<sup>1</sup>, Joseph Sifakis<sup>1</sup> and Christoph Welzel<sup>2</sup>

<sup>1</sup> Univ. Grenoble Alpes, CNRS, Grenoble INP<sup>*</sup>, Verimag <sup>2</sup> Technische Universität München

We consider parameterized concurrent systems consisting of a finite but unknown number of components, obtained by replicating a given set of finite state automata. Components communicate by executing atomic interactions whose participants update their states simultaneously. We introduce an interaction logic to specify both the type of interactions (e.g. rendez-vous, broadcast) and the topology of the system (e.g. pipeline, ring). The logic can be easily embedded in monadic second order logic of κ ≥ 1 successors (WSκS), and is therefore decidable.

Proving safety properties of such a parameterized system, like deadlock freedom or mutual exclusion, requires inferring an inductive invariant that contains all reachable states of all system instances, and no unsafe state. We present a method to automatically synthesize inductive invariants directly from the formula describing the interactions, without costly fixed point iterations. We show experimentally that this invariant is strong enough to verify safety properties of a large number of systems, including textbook examples (dining philosophers, synchronization schemes), classical mutual exclusion algorithms, cache-coherence protocols and self-stabilization algorithms, for an arbitrary number of components.

## 1 Introduction

The problem of parameterized verification asks whether a system composed of *n* replicated processes is safe, for all *n* ≥ 2. By safety we mean that every execution of the system stays clear of a set of global error configurations, such as deadlocks or mutual exclusion violations. Even if we assume each process to be finite-state and every interaction to be a synchronization of actions without exchange of data ranging over large or infinite domains, the problem remains challenging, because we ask for a general proof of safety that works for any number of processes.

Parameterized verification is undecidable, even if processes only manipulate data from a bounded domain [6]. Various restrictions of communication and architecture<sup>3</sup> define decidable subproblems [18,31,27,5]. Seminal works consider *rendez-vous* communication, with participants placed in a ring [18,27] or a clique [31] of arbitrary size. Recently, MSO-definable graphs (with bounded tree- and clique-width) and point-to-point rendez-vous communication have been considered [5]. Most approaches to define decidable problems focus on manually proving a *cut-off* bound *c* ≥ 2 such that

<sup>*</sup> Institute of Engineering Univ. Grenoble Alpes

<sup>3</sup> We use the term architecture for the shape of the graph along which the interactions take place.

correctness for at most *c* processes implies correctness for any number of processes [18,27,26,7,34]. Other methods identify systems with well-structured transition relations [31,1,29]. An exhaustive chart of decidability results for verification of parameterized systems is drawn in [12]. When decidability is not of concern, over-approximation and semi-algorithmic techniques such as *regular model checking* [36,2], SMT-based *bounded model checking* [4,21], *abstraction* [10,14] and *automata learning* [19] can be used to deal with more general classes of systems.

The efficiency of a verification method crucially relies on its ability to synthesize an *inductive safety invariant*, i.e., an infinite set of configurations that contains the initial configurations, is closed under the transition relation, and excludes the error configurations. In general, automatically synthesizing invariants requires computationally expensive fixpoint iterations [22]. In the particular case of parameterized systems, invariants can be either *global*, relating the local states of all processes [23], or *modular*, relating the local states of a few processes whose identity is irrelevant [38,20].

*Our Contributions.* The novelty of the approach described in this paper is three-fold:


Comparison to related work. Trap invariants have been very successfully used in the verification of non-parameterized systems [11,28,13]. The technique was lifted to parameterized systems in [17], but the work there is only applicable to clique architectures, in which processes are indistinguishable, and the system can be described by one single Petri Net with an infinite family of initial markings. Here, for the first time, we show that the trap technique can be extended to pipelines, token rings and trees, where the system is defined by an infinite family of Petri Nets, each with a different structure. These systems cannot be analyzed using the techniques of [31,1,29], because they do not yield well-structured transition systems. Contrary to [18,27,26,7,34], our approach does not require a manual cut-off proof. Contrary to regular model checking and automata learning [2,19], it does not require any symbolic state-space exploration. Finally, our approach produces an explanation of why the property holds in terms of the trap invariant and 1-invariants used. Summarizing, our approach provides a comparatively cheap technique for parameterized verification that succeeds in numerous cases. It is ideal as a preprocessing step that can very quickly lead to success with a very clear explanation of why the property holds, and that otherwise provides at least a strong invariant that can be used for further analysis.

<sup>4</sup> Called in this way by analogy with the notion of traps for Petri Nets [39].

Fig. 1: Parameterized Dining Philosophers

*Running Example.* Consider the dining philosophers system in Fig. 1, consisting of *n* ≥ 2 components of each of the types Fork and Philosopher, placed in a ring of size 2*n*. The *k*-th philosopher has a left fork, of index *k*, and a right fork, of index (*k*+1) mod *n*. Each component is an instance of a finite state automaton with states *f*(ree) and *b*(usy) for Fork, respectively *w*(aiting) and *e*(ating) for Philosopher. A fork goes from state *f* to *b* via a *t*(ake) transition and from *b* to *f* via an ℓ(eave) transition. A philosopher goes from *w* to *e* via a *g*(et) transition and from *e* to *w* via a *p*(ut) transition. The *g* action of the *k*-th philosopher is executed jointly with the *t* actions of the *k*-th and [(*k*+1) mod *n*]-th forks; in other words, the philosopher takes both its left and right forks simultaneously. Similarly, the *p* action of the *k*-th philosopher is executed simultaneously with the ℓ actions of the *k*-th and [(*k*+1) mod *n*]-th forks, i.e. each philosopher leaves both its left and right forks at the same time. We describe these interactions by the *interaction formula*:

$$\Gamma_{\text{philo}} = (\mathbf{g}(i) \wedge \mathbf{t}(i) \wedge \mathbf{t}(\mathsf{succ}(i))) \quad \lor \quad (\mathbf{p}(i) \wedge \boldsymbol{\ell}(i) \wedge \boldsymbol{\ell}(\mathsf{succ}(i))) \tag{1}$$
where the free variable $i$ refers to an arbitrary component index.

Intuitively, the transitions of the system with *n* dining philosophers and *n* forks are given by the *minimal models* of the disjuncts of Γ*philo* with universe {0,1,...,*n*−1}, and succ interpreted as "successor modulo *n*". In particular, for each 0 ≤ *k* ≤ *n*−1 the first disjunct has a minimal model that interprets the predicates *g* and *t* as the sets {*k*} and {*k*,(*k* + 1) mod *n*}. This model describes the interaction in which the *k*-th philosopher takes a *g*-transition (from waiting to eating) while, simultaneously, the *k*-th and [(*k*+1) mod *n*]-th forks take *t*-transitions (from free to busy). This is graphically represented by one of the dashed lines in Fig. 1. Observe that the ring topology of the system is implicit in the modulo-*n* interpretation of the successor function.

Since philosophers can only grab their two forks simultaneously, the system is deadlock-free for any number *n* ≥ 2 of philosophers. An automatic proof requires computing an invariant and proving that it has an empty intersection with the set of deadlock configurations, defined by the WSκS formula

$$\begin{aligned} \text{deadlock}(X_w, X_e, X_f, X_b) = \forall i \,.\, & \left[ \neg X_w(i) \lor \neg X_f(i) \lor \neg X_f(\mathsf{succ}(i)) \right] \land \\ & \left[ \neg X_e(i) \lor \neg X_b(i) \lor \neg X_b(\mathsf{succ}(i)) \right] \end{aligned} \tag{2}$$

where $X_w$, $X_e$, $X_f$, $X_b$ are set variables, the intended meaning of $X_w(i)$ resp. $X_e(i)$ is that the *i*-th philosopher is waiting, resp. eating, and the intended meaning of $X_f(i)$ resp. $X_b(i)$ is that the *i*-th fork is free, resp. busy. Our method automatically computes from Γ*philo* a formula *trap*-*invariant*<sup>S</sup> which formalizes an inductive invariant of the system. Moreover, we express the consistency requirement that every component is in one of its states at all times in a formula *marking*<sup>S</sup>, and derive deadlock freedom for any number of philosophers from the unsatisfiability of the formula

*deadlock*∧*trap*-*invariant*<sup>S</sup> ∧*marking*<sup>S</sup> .
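For a fixed, small number of philosophers, deadlock freedom can also be checked by brute-force exploration of the Petri Net semantics, which makes the meaning of formula (2) concrete. This Python sketch is *not* the trap-invariant method, which covers all $n$ at once; it assumes the interactions described above (the leave port ℓ is written `l`), represents a 1-safe marking by its set of marked places, and uses hypothetical function names.

```python
from typing import FrozenSet, List, Set, Tuple

Place = Tuple[str, int]                                  # (local state, index)
Transition = Tuple[FrozenSet[Place], FrozenSet[Place]]   # (pre-set, post-set)


def philo_net(n: int) -> List[Transition]:
    """One transition per minimal model of Gamma_philo, universe {0..n-1}."""
    net = []
    for k in range(n):
        r = (k + 1) % n  # succ interpreted modulo n
        # g(k) with t(k), t(r): philosopher w -> e, both forks f -> b
        net.append((frozenset({("w", k), ("f", k), ("f", r)}),
                    frozenset({("e", k), ("b", k), ("b", r)})))
        # p(k) with l(k), l(r): philosopher e -> w, both forks b -> f
        net.append((frozenset({("e", k), ("b", k), ("b", r)}),
                    frozenset({("w", k), ("f", k), ("f", r)})))
    return net


def reachable(n: int) -> Set[FrozenSet[Place]]:
    """Explicit-state exploration from the initial marking."""
    net = philo_net(n)
    m0 = frozenset({("w", k) for k in range(n)} | {("f", k) for k in range(n)})
    seen, todo = {m0}, [m0]
    while todo:
        m = todo.pop()
        for pre, post in net:
            if pre <= m:                # enabled: every input place is marked
                m2 = (m - pre) | post   # fire the transition
                if m2 not in seen:
                    seen.add(m2)
                    todo.append(m2)
    return seen


def is_deadlock(m: FrozenSet[Place], n: int) -> bool:
    """Formula (2): no philosopher can get, and none can put."""
    return not any(pre <= m for pre, _ in philo_net(n))
```

For $n = 3$ the sketch finds four reachable markings (the initial one and one per eating philosopher), none of which is a deadlock.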

## 2 Parameterized Component-based Systems

A *component type* is a tuple $\mathcal{C} = \langle \mathsf{P}, \mathsf{S}, s_0, \Delta \rangle$, where $\mathsf{P} = \{p, q, r, \ldots\}$ is a finite set of *ports*, $\mathsf{S}$ is a finite set of *states*, $s_0 \in \mathsf{S}$ is an initial state and $\Delta \subseteq \mathsf{S} \times \mathsf{P} \times \mathsf{S}$ is a set of *transitions*, denoted $s \xrightarrow{p} s'$, for $s, s' \in \mathsf{S}$ and $p \in \mathsf{P}$. We assume there are no two different transitions with the same port.

A *component-based system* $\mathcal{S} = \langle \mathcal{C}^1, \ldots, \mathcal{C}^N, \Gamma \rangle$ consists of a fixed number $N \geq 1$ of component types $\mathcal{C}^k = \langle \mathsf{P}^k, \mathsf{S}^k, s^k_0, \Delta^k \rangle$ and an *interaction formula* $\Gamma$. In the dining philosophers there are two component types, Philosopher and Fork, each with two states and two transitions, as shown in Fig. 1. We assume that $\mathsf{P}^i \cap \mathsf{P}^j = \emptyset$ and $\mathsf{S}^i \cap \mathsf{S}^j = \emptyset$, for all $1 \leq i < j \leq N$. We denote the component type of a port $p$ or a state $s$ by $\mathit{type}(p)$ and $\mathit{type}(s)$, respectively. For instance, in Fig. 1 we have $\mathit{type}(p) = \mathit{type}(g) = \mathit{type}(w) = \mathit{type}(e) =$ Philosopher and $\mathit{type}(t) = \mathit{type}(\ell) = \mathit{type}(f) = \mathit{type}(b) =$ Fork.

The interaction formula Γ determines the family of systems we can construct out of these components. It does so by specifying, for each possible number of replicated instances (for example, 3 philosophers and 3 forks), which are the possible interactions between them. An interaction consists of a set of transitions that are executed simultaneously. For example, in an interaction philosopher 3 executes a *g*(et) transition simultaneously with *t*(ake) transitions of the forks 2 and 3. Before formalizing this, we introduce the syntax and semantics of Interaction Logic.

Interaction Logic. For a constant $\kappa \geq 1$, fixed throughout the paper, the *Interaction Logic* $\mathsf{IL}_\kappa$ is built on top of a countably infinite set $\mathsf{Var}$ of variables, the set $\mathsf{Pred} = \bigcup_{k=1}^{N} \mathsf{P}^k$ of monadic predicate symbols ranged over by $\mathsf{pr}$ (i.e. the logic has a predicate symbol for each port), the binary predicate $\leq$, and the *successor* functions $\mathsf{succ}_0, \ldots, \mathsf{succ}_{\kappa-1}$, of arity one. The formulae of $\mathsf{IL}_\kappa$ are generated by the syntax

$$\begin{array}{rcll} t & := & i \in \mathsf{Var} \mid \mathsf{succ}_0(t) \mid \ldots \mid \mathsf{succ}_{\kappa-1}(t) & \text{terms} \\ \phi & := & t_1 \leq t_2 \mid \mathsf{pr}(t) \mid \phi_1 \land \phi_2 \mid \neg \phi_1 \mid \exists i \,.\, \phi_1 & \text{formulae} \end{array}$$

Abbreviations like $t_1 = t_2$, $t_1 < t_2$, $\phi_1 \lor \phi_2$, $\phi_1 \leftrightarrow \phi_2$, and $\forall i \,.\, \phi$ are defined as usual. $\mathsf{IL}_\kappa$ is interpreted over finite ranked trees of arity $\kappa$, which we identify with a prefix-closed language of words, also called *nodes*, over the alphabet $\{0, \ldots, \kappa-1\}$. The root of the tree is the empty word $\varepsilon$, and the children of $w$ are $w0, w1, \ldots, w(\kappa-1)$. Formally, an *interpretation* or *structure* is a pair $\mathcal{I} = (\mathcal{U}, \iota)$, where the universe $\mathcal{U}$ is a tree and $\iota$ assigns a node to each variable and a set of nodes to each predicate in $\mathsf{Pred}$. The predicate $\leq$ and the functions $\mathsf{succ}_0, \ldots, \mathsf{succ}_{\kappa-1}$ have the usual fixed interpretations: if $t_1$ and $t_2$ are interpreted as $w$ and $w'$, then $t_1 \leq t_2$ holds iff $w$ is a prefix of $w'$, and $\mathsf{succ}_i(t_1)$ is interpreted as the node $wi$ if $wi \in \mathcal{U}$, and as the root otherwise. So, loosely speaking, successor functions wrap around to the root.
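The fixed interpretation of $\mathsf{succ}_i$ and $\leq$ on a tree universe can be written down directly. In this hypothetical Python sketch, a node is a string over the digits $0, \ldots, \kappa - 1$ (so it assumes $\kappa \leq 10$), the root is the empty string, and the universe is a prefix-closed set of such strings.

```python
from typing import FrozenSet


def succ(i: int, w: str, universe: FrozenSet[str]) -> str:
    """succ_i(w): the child w.i if it is a node of the tree, otherwise the root."""
    child = w + str(i)  # assumes kappa <= 10, so each letter is one digit
    return child if child in universe else ""


def leq(w1: str, w2: str) -> bool:
    """t1 <= t2 holds iff the node of t1 is a prefix of the node of t2."""
    return w2.startswith(w1)
```

With $\kappa = 1$ and the universe $\{\varepsilon, 0, 00\}$, i.e. $\{0,1,2\}$, applying $\mathsf{succ}_0$ to the last node returns the root, which matches successor modulo $n$.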

When $\kappa = 1$, formulae are interpreted on languages $\{\varepsilon, 0, 00, \ldots, 0^{n-1}\}$ for some number $n$. To simplify notation, in this case we assume that they are interpreted over the set $\{0, 1, \ldots, n-1\}$, and $\mathsf{succ}_0$ is the usual successor function on numbers, modulo $n$.

Intuitively, a universe U determines an instance of the component-based system, with one instance of each component for each *w* ∈ U. So, for example, for κ = 1 and U = {0,1,2,...,*n* − 1} in our running example we have philosophers 0,1,...,*n* − 1 and forks 0,1,...,*n* − 1. Generally, with κ = 1 we can describe pipeline and token-ring architectures, whereas higher values describe tree-shaped architectures.

Interaction formulae. A formula of ILκ is an *interaction formula* if it is the conjunction of the following formula:

$$\forall i \forall j \,.\, \bigwedge_{\substack{p, q \in \mathsf{Pred} \\ \mathit{type}(p) = \mathit{type}(q)}} p(i) \land q(j) \to i \neq j \tag{3}$$

with a finite disjunction of formulae of the form:

$$\mathfrak{C}(i\_1, \ldots, i\_\ell) \stackrel{\text{def}}{=} \varphi \; \wedge \; \bigwedge\_{j=1}^\ell p\_j(i\_j) \; \wedge \; \bigwedge\_{j=1}^m \forall k \; . \; \psi\_j \to q\_j(k) \tag{4}$$

where ϕ,ψ1,...,ψ*<sup>m</sup>* are conjunctions of atomic formulae of the form *t*<sup>1</sup> ≤ *t*<sup>2</sup> and their negations. Intuitively, formula (3) is a generic axiom that prevents two ports of the same instance of a component type from interacting. The formulae of form (4) are called the *clauses* of the interaction formula.

*Example 1.* Consider a component-based system $\mathcal{S} = \langle \mathcal{C}^1, \mathcal{C}^2, \Gamma \rangle$, where $\mathcal{C}^1$ and $\mathcal{C}^2$ have ports $p_1$ and $p_2$, respectively, and $\Gamma$ has one single clause

$$\mathfrak{C}(i,j,k) = (i < j \land k = \mathsf{succ}(j)) \quad \land \quad (p_1(i) \land p_2(j)) \quad \land \quad \forall i \,.\, i > k \to p_1(i)$$

Γ states that an interaction consists of: the *i*-th process of type C<sup>1</sup> executes transition *p*1; the *j*-th process of type C<sup>2</sup> executes *p*2; and, for every *i* > (*j*+1) mod *n*, the *i*-th process of type C<sup>1</sup> executes transition *p*<sup>1</sup> as well; all this happens simultaneously in one atomic step.

Loosely speaking, (4) states that in an interaction components can simultaneously engage in a multiparty rendez-vous, together with a broadcast to the ports *q*1,...,*qm* of the components whose indices satisfy the constraints ψ1,...,ψ*m*, respectively. An example of peer-to-peer rendez-vous with no broadcast is the dining philosophers system in Fig. 1, whereas examples of broadcast are found among the benchmarks in §5. In the next section we show that, despite this generality, it is possible to construct a trap invariant for any interaction formula in a purely syntactic way.

Observe that the interaction formula does not explicitly specify that every other process remains idle. Formally, as we will see in the next section, the system has an interaction for each *minimal model* of (4), which frees us from having to specify idleness. Given structures $\mathcal{I}_1 = (\mathcal{U}, \iota_1)$ and $\mathcal{I}_2 = (\mathcal{U}, \iota_2)$ sharing the same universe $\mathcal{U}$, we say $\mathcal{I}_1 \sqsubseteq \mathcal{I}_2$ if and only if $\iota_1(\mathsf{pr}) \subseteq \iota_2(\mathsf{pr})$ for every $\mathsf{pr} \in \mathsf{Pred}$. Given a formula $\phi$, a structure $\mathcal{I}$ is a *minimal model* of $\phi$ if $\mathcal{I} \models \phi$ and, for all structures $\mathcal{I}'$ such that $\mathcal{I}' \sqsubseteq \mathcal{I}$ and $\mathcal{I}' \neq \mathcal{I}$, we have $\mathcal{I}' \not\models \phi$.
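The definition of minimal models can be checked by brute force on a small universe. The following Python sketch, with hypothetical names, enumerates all interpretations of the given predicates, keeps those satisfying $\phi$, and discards every model that has a strictly smaller model below it in $\sqsubseteq$. It assumes the free variable $i$ is fixed in advance by the caller, for example $\iota(i) = 0$.

```python
from itertools import chain, combinations, product
from typing import Callable, Dict, FrozenSet, Iterable, List

Interp = Dict[str, FrozenSet[int]]  # predicate symbol -> set of nodes


def subsets(universe: Iterable[int]) -> List[FrozenSet[int]]:
    items = list(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def below(i1: Interp, i2: Interp) -> bool:
    """I1 is below I2: every predicate is interpreted by a subset."""
    return all(i1[p] <= i2[p] for p in i1)


def minimal_models(phi: Callable[[Interp], bool], preds: List[str],
                   universe: Iterable[int]) -> List[Interp]:
    """Brute force: all models of phi with no strictly smaller model."""
    subs = subsets(universe)
    models = [dict(zip(preds, choice))
              for choice in product(subs, repeat=len(preds))
              if phi(dict(zip(preds, choice)))]
    return [m for m in models
            if not any(below(o, m) and o != m for o in models)]
```

For the first disjunct of $\Gamma_{\text{philo}}$ with $n = 3$ and $\iota(i) = 0$, the only minimal model interprets $g$ as $\{0\}$ and $t$ as $\{0, 1\}$, as in the running example.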

#### 2.1 Execution Semantics of Component-based Systems

The semantics of a component-based system $\mathcal{S} = \langle \mathcal{C}^1, \ldots, \mathcal{C}^N, \Gamma \rangle$ is an infinite family of Petri Nets, one for each universe of $\Gamma$. The reachable markings and actions of the Petri Net characterize the reachable global states and transitions of the system, respectively. To fix notations, we recall several basic definitions.

Preliminaries: Petri Nets. A *Petri Net* (PN) is a tuple $\mathsf{N} = \langle S, T, E \rangle$, where $S$ is a set of *places*, $T$ is a set of *transitions*, $S \cap T = \emptyset$, and $E \subseteq (S \times T) \cup (T \times S)$ is a set of *arcs*. The elements of $S \cup T$ are called *nodes*. Given nodes $x, y \in S \cup T$, we write $E(x,y) \stackrel{\text{def}}{=} 1$ if $(x,y) \in E$ and $E(x,y) \stackrel{\text{def}}{=} 0$ otherwise. For a node $x$, let ${}^\bullet x \stackrel{\text{def}}{=} \{y \in S \cup T \mid E(y,x) = 1\}$ and $x^\bullet \stackrel{\text{def}}{=} \{y \in S \cup T \mid E(x,y) = 1\}$, and lift these definitions to sets of nodes.

A *marking* of $\mathsf{N}$ is a function $\mathrm{m} : S \to \mathbb{N}$. A transition $t$ is *enabled* in $\mathrm{m}$ if and only if $\mathrm{m}(s) > 0$ for each place $s \in {}^\bullet t$. For all markings $\mathrm{m}, \mathrm{m}'$ and transitions $t$, we write $\mathrm{m} \xrightarrow{t} \mathrm{m}'$ whenever $t$ is enabled in $\mathrm{m}$ and $\mathrm{m}'(s) = \mathrm{m}(s) - E(s,t) + E(t,s)$, for all $s \in S$. Given two markings $\mathrm{m}$ and $\mathrm{m}'$, a finite sequence of transitions $\sigma = t_1, \ldots, t_n$ is a *firing sequence*, written $\mathrm{m} \xrightarrow{\sigma} \mathrm{m}'$, if and only if either (i) $n = 0$ and $\mathrm{m} = \mathrm{m}'$, or (ii) $n \geq 1$ and there exist markings $\mathrm{m}_1, \ldots, \mathrm{m}_{n-1}$ such that $\mathrm{m} \xrightarrow{t_1} \mathrm{m}_1 \ldots \mathrm{m}_{n-1} \xrightarrow{t_n} \mathrm{m}'$.
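The firing rule translates almost literally into code. This is a hypothetical Python sketch in which $E$ is a set of (source, target) pairs, so that $E(x, y) \in \{0, 1\}$, and a marking is a dictionary from places to token counts (places that are missing hold 0 tokens).

```python
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

Marking = Dict[Hashable, int]
Arcs = Set[Tuple[Hashable, Hashable]]  # E as a set of (source, target) pairs


def pre(x: Hashable, arcs: Arcs) -> Set[Hashable]:
    """Pre-set of x: nodes y with E(y, x) = 1."""
    return {y for (y, z) in arcs if z == x}


def post(x: Hashable, arcs: Arcs) -> Set[Hashable]:
    """Post-set of x: nodes z with E(x, z) = 1."""
    return {z for (y, z) in arcs if y == x}


def fire(m: Marking, t: Hashable, arcs: Arcs) -> Optional[Marking]:
    """m --t--> m', with m'(s) = m(s) - E(s,t) + E(t,s); None if t is disabled."""
    if any(m.get(s, 0) <= 0 for s in pre(t, arcs)):
        return None
    m2 = dict(m)
    for s in pre(t, arcs):
        m2[s] = m2.get(s, 0) - 1
    for s in post(t, arcs):
        m2[s] = m2.get(s, 0) + 1
    return m2


def fire_sequence(m: Marking, sigma: Iterable[Hashable],
                  arcs: Arcs) -> Optional[Marking]:
    """m --sigma--> m' for a finite firing sequence; the empty one returns m."""
    for t in sigma:
        m = fire(m, t, arcs)
        if m is None:
            return None
    return m
```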

A *marked Petri Net* is a pair $\mathcal{N} = (\mathsf{N}, \mathrm{m}_0)$, where $\mathrm{m}_0$ is the *initial marking* of $\mathsf{N}$. A marking $\mathrm{m}$ is *reachable* in $\mathcal{N}$ if there exists a firing sequence $\sigma$ such that $\mathrm{m}_0 \xrightarrow{\sigma} \mathrm{m}$. We denote by $\mathcal{R}(\mathcal{N})$ the set of reachable markings of $\mathcal{N}$. A marked PN $\mathcal{N}$ is *1-safe* if $\mathrm{m}(s) \leq 1$ for each $s \in S$ and $\mathrm{m} \in \mathcal{R}(\mathcal{N})$. All PNs considered in the following will be 1-safe, and we shall silently blur the distinction between a marking $\mathrm{m} : S \to \{0,1\}$ and the boolean valuation $\beta_{\mathrm{m}} : S \to \{\bot, \top\}$ defined as $\beta_{\mathrm{m}}(s) = \top \iff \mathrm{m}(s) = 1$. A set of markings $\mathcal{M}$ is an *inductive invariant* of $\mathcal{N} = (\mathsf{N}, \mathrm{m}_0)$ if and only if $\mathrm{m}_0 \in \mathcal{M}$ and, for each $\mathrm{m} \xrightarrow{t} \mathrm{m}'$ such that $\mathrm{m} \in \mathcal{M}$, we have $\mathrm{m}' \in \mathcal{M}$.

Petri Net Semantics of Component-Based Systems. We define the semantics of a component-based system as an infinite family of 1-safe Petri Nets. For $k = 1, \ldots, N$, let $\mathcal{C}^k = \langle \mathsf{P}^k, \mathsf{S}^k, s^k_0, \Delta^k \rangle$ be a component type, and let $\mathcal{S} = \langle \mathcal{C}^1, \ldots, \mathcal{C}^N, \Gamma \rangle$ be a system. Fix a universe $\mathcal{U}$ of $\Gamma$. We define a marked Petri Net $\mathcal{N}^{\mathcal{U}}_{\mathcal{S}} \stackrel{\text{def}}{=} (S, T, E, \mathrm{m}_0)$ as follows:


It follows immediately from this definition that $\mathsf{N}^{\mathcal{U}}_{\mathcal{S}}$ is a 1-safe Petri Net. Indeed, for every $u \in \mathcal{U}$, every component type $\mathcal{C}^k$, and every reachable marking $\mathrm{m}$, we have $\sum_{s \in \mathcal{S}^k} \mathrm{m}((s,u)) = 1$. This reflects that the instance of $\mathcal{C}^k$ at $u$ is always in exactly one of the states of $\mathcal{S}^k$; if $s$ is that state, then $(s,u)$ is the place carrying the token.

*Example 2.* Consider our running example, with $\mathcal{U} = \{0, 1, \ldots, n-1\}$, i.e., $n$ philosophers and $n$ forks. Since the interaction formula (1) has no constants, its models are pairs $(\mathcal{U}, \iota)$, where $\iota$ gives the interpretation of the free variable $i$ and of the predicates $g$, $t$, etc. The first disjunct of (1) is $[g(i) \land t(i) \land t(\mathsf{succ}(i))]$. It has a minimal model for each $k \in \mathcal{U}$, namely the model with $\iota(i) = k$, $\iota(g) = \{k\}$ and $\iota(t) = \{k, (k+1) \bmod n\}$. In the interaction produced by this model, the $k$-th philosopher executes transition $g$(et), the forks with numbers $k$ and $(k+1) \bmod n$ execute transition $t$(ake), and all other philosophers and forks remain idle. The second disjunct yields the interactions in which a philosopher puts down its forks. Fig. 2 shows the Petri Net $\mathsf{N}^{\mathcal{U}}_{\mathcal{S}}$ for the universe $\mathcal{U} = \{0,1,2\}$. For clarity, the places $(f,0)$ and $(b,0)$ have been duplicated; in reality the two copies are merged. The places of each philosopher are $\{(w,i),(e,i)\}$ for $i = 0,1,2$. For example, transition $i_3$ corresponds to the minimal model $\{(g,1),(t,1),(t,2)\}$, in which philosopher 1 takes forks 1 and 2.

Fig. 2: Petri Net of the dining philosophers for the universe $\mathcal{U} = \{0,1,2\}$. In reality, the two pink places are a single place, and so are the two green places.
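The net of Fig. 2 can be rebuilt programmatically for any $n$. The sketch below is a hypothetical reconstruction from the description in Example 2, since Fig. 1 and formula (1) are not part of this excerpt: we assume that $g$ moves philosopher $k$ from $(w,k)$ to $(e,k)$, that $t$ moves fork $k$ from $(f,k)$ to $(b,k)$, and that the second disjunct of (1) undoes both moves. Markings of this 1-safe net are represented as sets of marked places, and the transition names `get_k`, `put_k` are our own.

```python
def philosophers_net(n):
    """Dining philosophers for U = {0, ..., n-1} (hypothetical reconstruction of Fig. 2).

    Philosopher k has places (w,k), (e,k); fork k has places (f,k), (b,k).
    Each transition is a (preset, postset) pair of places."""
    transitions = {}
    for k in range(n):
        right = (k + 1) % n
        idle = frozenset({("w", k), ("f", k), ("f", right)})
        busy = frozenset({("e", k), ("b", k), ("b", right)})
        transitions[f"get_{k}"] = (idle, busy)   # minimal model {(g,k), (t,k), (t,k+1 mod n)}
        transitions[f"put_{k}"] = (busy, idle)   # philosopher k puts its forks down
    m0 = frozenset(("w", k) for k in range(n)) | frozenset(("f", k) for k in range(n))
    return transitions, m0


def reachable(transitions, m0):
    """Reachable markings of a 1-safe net; a marking is the set of marked places."""
    seen, stack = {m0}, [m0]
    while stack:
        m = stack.pop()
        for pre, post in transitions.values():
            if pre <= m:                     # t is enabled
                m2 = (m - pre) | post        # fire t (relies on 1-safety)
                if m2 not in seen:
                    seen.add(m2)
                    stack.append(m2)
    return seen


transitions, m0 = philosophers_net(3)
R = reachable(transitions, m0)
eating = sorted(sorted(k for (q, k) in m if q == "e") for m in R)
# eating == [[], [0], [1], [2]]: at most one of three philosophers eats at a time
```

Under these assumptions, every reachable marking puts exactly one token on each philosopher and each fork, matching the 1-safety argument above.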

## 3 Trap Invariants

Given a Petri Net $\mathsf{N} = (S, T, E)$, a set of places $W \subseteq S$ is called a *trap* if and only if $W^\bullet \subseteq {}^\bullet W$. A trap $W$ of $\mathsf{N}$ is an *initially marked trap* (IMT) of the marked PN $\mathcal{N} = (\mathsf{N}, \mathrm{m}_0)$ if and only if $\mathrm{m}_0(s) = 1$ for some $s \in W$.

*Example 3.* {(*f*,1),(*b*,1)} and {(*f*,0),(*b*,1),(*f*,2),(*e*,2)} are two traps of the Petri Net in Figure 2.
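The trap condition $W^\bullet \subseteq {}^\bullet W$ says that every transition with an input place in $W$ also has an output place in $W$. The following sketch checks this for the two sets of Example 3, on the same hypothetical reconstruction of the Fig. 2 net (the names `get_k`, `put_k` and the modelling of the put-down interaction are assumptions).

```python
def philosophers_transitions(n):
    """(preset, postset) of each transition of the Fig. 2 net (hypothetical reconstruction)."""
    ts = {}
    for k in range(n):
        r = (k + 1) % n
        idle = frozenset({("w", k), ("f", k), ("f", r)})
        busy = frozenset({("e", k), ("b", k), ("b", r)})
        ts[f"get_{k}"], ts[f"put_{k}"] = (idle, busy), (busy, idle)
    return ts


def is_trap(transitions, W):
    """W is a trap iff every transition with an input place in W
    also has an output place in W."""
    W = frozenset(W)
    return all(not (pre & W) or bool(post & W) for pre, post in transitions.values())


def is_imt(transitions, m0, W):
    """W is an initially marked trap (IMT) of (N, m0)."""
    return is_trap(transitions, W) and bool(frozenset(W) & m0)


ts = philosophers_transitions(3)
m0 = frozenset(("w", k) for k in range(3)) | frozenset(("f", k) for k in range(3))
example3 = [{("f", 1), ("b", 1)}, {("f", 0), ("b", 1), ("f", 2), ("e", 2)}]
```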

An IMT defines an invariant of the Petri Net, because some place in the trap will always be marked, no matter which sequence of transitions is fired. The *trap invariant* of $\mathcal{N}$ is the set of markings that mark each IMT of $\mathcal{N}$. Clearly, since marked traps remain marked, the set of reachable markings is contained in the trap invariant. Hence, to prove that a certain set of markings is unreachable, it suffices to prove that the set has an empty intersection with the trap invariant. For the sake of completeness, we briefly discuss the computation of the trap invariant for a given marked Petri Net of fixed size, before explaining how this can be done for the infinite family of marked Petri Nets defining the executions of parameterized systems.
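For a single small instance, the trap invariant can be computed by brute force: enumerate all initially marked traps and keep the markings that mark each of them. The sketch below does this for the hypothetical three-philosopher net used above, restricting candidate markings to those with exactly one token per philosopher and per fork. It is exponential in the number of places and only illustrates the definition; it also exhibits an unreachable marking that nevertheless lies in the trap invariant, so here the trap invariant is a strict overapproximation.

```python
from itertools import combinations, product


def philosophers_net(n):
    """Transitions (preset, postset) and initial marking of the n-philosopher net.

    Hypothetical reconstruction of Fig. 2: get_k moves philosopher k from (w,k)
    to (e,k) and forks k, k+1 mod n from f to b; put_k does the reverse."""
    ts = []
    for k in range(n):
        r = (k + 1) % n
        idle = frozenset({("w", k), ("f", k), ("f", r)})
        busy = frozenset({("e", k), ("b", k), ("b", r)})
        ts += [(idle, busy), (busy, idle)]           # get_k, put_k
    m0 = frozenset((q, k) for k in range(n) for q in ("w", "f"))
    return ts, m0


def is_trap(ts, W):
    """Each transition consuming from W also produces into W."""
    return all(not (pre & W) or bool(post & W) for pre, post in ts)


def reachable(ts, m0):
    """Reachable markings of a 1-safe net, each represented by its set of marked places."""
    seen, stack = {m0}, [m0]
    while stack:
        m = stack.pop()
        for pre, post in ts:
            if pre <= m:
                m2 = (m - pre) | post
                if m2 not in seen:
                    seen.add(m2)
                    stack.append(m2)
    return seen


def trap_invariant(n):
    """Legal markings (one token per philosopher and per fork) that mark every IMT."""
    ts, m0 = philosophers_net(n)
    places = [(q, k) for k in range(n) for q in ("w", "e", "f", "b")]
    imts = [W for r in range(1, len(places) + 1)
            for W in map(frozenset, combinations(places, r))
            if W & m0 and is_trap(ts, W)]
    legal = (frozenset(zip(ph, range(n))) | frozenset(zip(fk, range(n)))
             for ph in product("we", repeat=n) for fk in product("fb", repeat=n))
    return {m for m in legal if all(m & W for W in imts)}


ts, m0 = philosophers_net(3)
R = reachable(ts, m0)
inv = trap_invariant(3)
# Philosophers 0 and 1 both eat: unreachable, yet this marking marks every IMT
spurious = frozenset({("e", 0), ("e", 1), ("w", 2), ("b", 0), ("b", 1), ("b", 2)})
```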

The *trap constraint* of a Petri Net N = (*S*,*T*,*E*) is the formula:

$$\Theta(\mathsf{N}) \stackrel{\mathsf{def}}{=} \bigwedge_{t \in T} \Big( \bigvee_{x \in {}^\bullet t} x \Big) \to \Big( \bigvee_{y \in t^\bullet} y \Big),$$

where each place $x, y \in S$ is viewed as a propositional variable. It is not hard to show<sup>5</sup> that any boolean valuation $\beta : S \to \{\bot, \top\}$ that satisfies the trap constraint $\Theta(\mathsf{N})$ defines a trap $W_\beta$ of $\mathsf{N}$, in the obvious sense $W_\beta = \{s \in S \mid \beta(s) = \top\}$. Further, if $\mathrm{m}_0 : S \to \{0,1\}$ is the initial marking of a 1-safe Petri Net $\mathsf{N}$ and $\mu_0 \stackrel{\mathsf{def}}{=} \bigvee_{\mathrm{m}_0(s) = 1} s$ is a propositional formula, then every satisfying valuation of $\mu_0 \land \Theta(\mathsf{N})$ defines an IMT of $(\mathsf{N}, \mathrm{m}_0)$. Usually, computing invariants requires building a sequence of underapproximants whose limit is the least fixed point of an abstraction of the transition relation of the system [22]. This is not the case for the trap invariant, which can be computed directly from the trap constraint and the initial marking [11,17].
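Since $(\bigvee_{x \in {}^\bullet t} x) \to (\bigvee_{y \in t^\bullet} y)$ is equivalent to the conjunction of the clauses $\neg x \lor \bigvee_{y \in t^\bullet} y$ for $x \in {}^\bullet t$, the trap constraint is already in conjunctive normal form, and $\mu_0 \land \Theta(\mathsf{N})$ adds a single clause. The sketch below builds these clauses for a small hypothetical net and enumerates the satisfying valuations by brute force. In practice the clauses could be handed to a SAT solver; the cited works [11,17] may proceed differently, and all names here are our own.

```python
from itertools import product


def trap_cnf(transitions):
    """CNF of the trap constraint: one clause (not x or OR_{y in post(t)} y)
    for each transition t and each x in pre(t). A literal is (place, polarity)."""
    return [[(x, False)] + [(y, True) for y in sorted(post)]
            for pre, post in transitions.values() for x in sorted(pre)]


def imt_cnf(transitions, m0):
    """mu_0 and Theta(N), where mu_0 is the clause OR_{m0(s) = 1} s."""
    return trap_cnf(transitions) + [[(s, True) for s in sorted(m0)]]


def satisfies(beta, clauses):
    return all(any(beta[p] == pol for p, pol in clause) for clause in clauses)


def models(places, clauses):
    """The sets W_beta = {s | beta(s) is true} of all satisfying valuations (brute force)."""
    for bits in product((False, True), repeat=len(places)):
        beta = dict(zip(places, bits))
        if satisfies(beta, clauses):
            yield {s for s in places if beta[s]}


# Usage on a small net: a: p -> q,  b: q -> {p, r},  c: r -> (nothing)
places = ["p", "q", "r"]
ts = {"a": ({"p"}, {"q"}), "b": ({"q"}, {"p", "r"}), "c": ({"r"}, set())}
m0 = {"p"}
traps = list(models(places, trap_cnf(ts)))      # traps == [set(), {"p", "q"}]
imts = list(models(places, imt_cnf(ts, m0)))    # imts == [{"p", "q"}]
```

Note how transition `c`, which has no output place, yields the unit clause $\neg r$: no trap can contain $r$.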

In the rest of the section we construct a parameterized trap constraint that characterizes the traps, not of one single net, as Θ(N), but of the infinite family of Petri Nets obtained from a component-based system. The parameterized trap constraint is a formula of WSκS. In Section 3.1 we first explain how to embed our interaction logic into WSκS, and in Section 3.2 we construct the parameterized trap constraint.

### 3.1 From ILκ to WSκS

<sup>5</sup> See e.g. [8] for a proof.

We briefly recall the syntax and semantics of WSκS, the weak monadic second-order logic of κ successors (see e.g. [37]). Let SVar be a countably infinite set of second-order variables (also called set variables), denoted as $X, Y, \ldots$ in the following. The syntax of WSκS is:

$$\begin{aligned} t &:= \overline{\epsilon} \mid x \mid \mathsf{succ}_0(t) \mid \ldots \mid \mathsf{succ}_{\kappa-1}(t) && \text{terms} \\ \phi &:= t_1 = t_2 \mid \mathsf{pr}(t) \mid X(t) \mid \phi_1 \land \phi_2 \mid \neg \phi_1 \mid \exists x \,.\, \phi_1 \mid \exists X \,.\, \phi_1 && \text{formulae} \end{aligned}$$

So WSκS extends ILκ with the constant symbol $\overline{\epsilon}$, atoms $X(t)$ and monadic second-order quantifiers $\exists X \,.\, \phi$. We can consider w.l.o.g. equality atoms $t_1 = t_2$ instead of the inequalities $t_1 \leq t_2$ of ILκ, because the latter can be defined in WSκS as usual:

$$x \leq y \stackrel{\mathsf{def}}{=} \forall X \,.\, \mathit{closed}(X) \land X(x) \to X(y) \qquad \mathit{closed}(X) \stackrel{\mathsf{def}}{=} \forall x \,.\, X(x) \to \bigwedge_{i=0}^{\kappa-1} X(\mathsf{succ}_i(x))$$
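On a finite tree, this definition of $\leq$ can be checked by brute force: the smallest closed set containing $x$ is the subtree rooted at $x$, so $x \leq y$ should hold exactly when $x$ is an ancestor of (or equal to) $y$. In the sketch below, nodes of a complete κ-ary tree of a chosen depth are represented, as an assumption of ours, by tuples of child indices, and successors of leaves are the leaves themselves (the WSκS convention).

```python
from itertools import combinations


def tree_nodes(kappa, depth):
    """Nodes of the complete kappa-ary tree of the given depth; the root is ()."""
    nodes, level = [()], [()]
    for _ in range(depth):
        level = [v + (i,) for v in level for i in range(kappa)]
        nodes += level
    return nodes


def succ(v, i, depth):
    """succ_i(v); by the WSkS convention the successor of a leaf is the leaf itself."""
    return v if len(v) == depth else v + (i,)


def closed(X, kappa, depth):
    """closed(X): every successor of every node in X is again in X."""
    return all(succ(v, i, depth) in X for v in X for i in range(kappa))


def leq(x, y, kappa, depth):
    """x <= y iff every closed set X containing x also contains y (all finite X enumerated)."""
    nodes = tree_nodes(kappa, depth)
    return all(y in X
               for r in range(len(nodes) + 1)
               for X in map(set, combinations(nodes, r))
               if closed(X, kappa, depth) and x in X)


nodes = tree_nodes(2, 2)   # 7 nodes of the complete binary tree of depth 2
```

For κ = 1 the tree degenerates to a word, and $\leq$ becomes the usual order on positions.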

Like ILκ, the formulae of WSκS are interpreted on ordered trees of arity κ. The models of WSκS are structures $(\mathcal{U}, \iota)$, where $\iota$ assigns the root of the tree to $\overline{\epsilon}$, a node $\iota(x)$ to each variable $x \in \mathsf{Var}$ and a set $\iota(X) \subseteq \mathcal{U}$ to each set variable $X \in \mathsf{SVar}$. The satisfaction relation $(\mathcal{U}, \iota) \models_{\mathrm{WS}\kappa\mathrm{S}} \phi$ is defined as for ILκ, with one difference: in ILκ, the successor of a leaf of a tree is the root of the tree, while in WSκS the successor of a leaf is, by convention, the leaf itself [37, Example 2.10.3]. This is the only reason why ILκ is not just a fragment of WSκS.

We define an embedding of ILκ formulae, without occurrences of predicates and set variables, into WSκS. W.l.o.g. we consider ILκ formulae that have been previously flattened, i.e., the successor function occurs only within atomic propositions of the form $\mathsf{succ}_i(x) = y$. This is done by replacing each atomic proposition of the form $\mathsf{succ}_{i_1}(\ldots \mathsf{succ}_{i_n}(x) \ldots) = y$ by the formula $\exists x_1 \ldots \exists x_n \,.\, x_n = \mathsf{succ}_{i_n}(x) \land y = x_1 \land \bigwedge_{j=1}^{n-1} x_j = \mathsf{succ}_{i_j}(x_{j+1})$. The translation of an ILκ formula $\phi$ into WSκS is the formula $\mathit{Tr}(\phi)$, defined recursively on the structure of $\phi$: $\mathit{Tr}$ preserves the first-order connectives and maps the flattened successor atoms as follows:

$$\mathit{Tr}(\mathsf{succ}_i(x) = y) \stackrel{\mathsf{def}}{=} \big(\neg \max(x) \land \mathsf{succ}_i(x) = y\big) \lor \big(\max(x) \land y = \overline{\epsilon}\big).$$
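For κ = 1 (rings and pipelines), the two successor conventions and the translation of a flattened successor atom can be compared directly. The sketch assumes the positions $\mathcal{U} = \{0, \ldots, n-1\}$, with root $0$ and with the last position $n-1$ playing the role of $\max$; it checks the atomic case of Lemma 1 for small $n$. The function names are hypothetical.

```python
def il_succ(x, n):
    """IL_1 on U = {0, ..., n-1}: the successor of the last position is the root 0."""
    return (x + 1) % n


def ws_succ(x, n):
    """WS1S on the same positions: the successor of the last position is itself."""
    return min(x + 1, n - 1)


def is_max(x, n):
    return x == n - 1


def tr_succ_eq(x, y, n, root=0):
    """Tr(succ(x) = y) = (not max(x) and succ(x) = y) or (max(x) and y = root), read in WS1S."""
    return (not is_max(x, n) and ws_succ(x, n) == y) or (is_max(x, n) and y == root)
```

For every $n$, $x$ and $y$, `tr_succ_eq(x, y, n)` agrees with `il_succ(x, n) == y`, although `ws_succ` and `il_succ` differ at the last position.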

We show that a formula φ of ILκ and its WSκS counterpart *Tr*(φ) are equivalent:

Lemma 1. *Given an* ILκ *formula* $\phi$, *for any structure* $\mathcal{I} = (\mathcal{U}, \iota)$, *we have* $\mathcal{I} \models_{\mathrm{IL}} \phi \iff \mathcal{I} \models_{\mathrm{WS}\kappa\mathrm{S}} \mathit{Tr}(\phi)$.

### 3.2 Defining Parameterized Trap Invariants in WSκS

Fix a component-based system $\mathcal{S} = \langle \mathcal{C}^1, \ldots, \mathcal{C}^N, \Gamma \rangle$ and recall that every universe $\mathcal{U}$ induces a Petri Net $\mathsf{N}^{\mathcal{U}}_{\mathcal{S}}$ whose set of places is $\bigcup_{k=1}^{N} \mathcal{S}^k \times \mathcal{U}$. For every state $s \in \bigcup_{i=1}^{N} \mathcal{S}^i$, let $X_s$ be a monadic second-order variable, and let $\overline{X}$ be the tuple of these variables in an arbitrary but fixed order. We define a formula $\mathit{trap\text{-}pred}_{\mathcal{S}}(\overline{X})$, with $\overline{X}$ as set of free variables, that characterizes the traps of the infinitely many Petri Nets $\mathsf{N}^{\mathcal{U}}_{\mathcal{S}}$ corresponding to $\mathcal{S}$. Formally, $\mathit{trap\text{-}pred}_{\mathcal{S}}(\overline{X})$ has the following property:

For every universe $\mathcal{U}$ and for every set $P \subseteq \bigcup_{k=1}^{N} \mathcal{S}^k \times \mathcal{U}$ of places of $\mathsf{N}^{\mathcal{U}}_{\mathcal{S}}$: $P$ is a trap of $\mathsf{N}^{\mathcal{U}}_{\mathcal{S}}$ iff the assignment $X_q \mapsto \{u \in \mathcal{U} \mid (q,u) \in P\}$ satisfies $\mathit{trap\text{-}pred}_{\mathcal{S}}(\overline{X})$.

Observe that every assignment to $\overline{X}$ encodes a set of places, and vice versa. So, abusing language, we can speak of the set of places $\overline{X}$.

We define auxiliary predicates that capture the intersection of the set of places $\overline{X}$ with the preset (${}^\bullet t$) and the postset ($t^\bullet$) of a transition $t$ in $\mathsf{N}^{\mathcal{U}}_{\mathcal{S}}$. For every clause $\mathbb{C}$ of $\Gamma$, of the form (4), we define the WSκS formulae:

$$\mathit{intersects\text{-}pre}^{\mathbb{C}}_{\mathcal{S}}(\overline{X}, x_1, \ldots, x_\ell) = \bigvee_{j=1}^{\ell} X_{{}^\bullet P_j}(x_j) \lor \bigvee_{j=\ell+1}^{\ell+m} \exists x_j \,.\, \mathit{Tr}(\psi_j) \land X_{{}^\bullet P_j}(x_j),$$
 
$$\mathit{intersects\text{-}post}^{\mathbb{C}}_{\mathcal{S}}(\overline{X}, x_1, \ldots, x_\ell) = \bigvee_{j=1}^{\ell} X_{P_j^\bullet}(x_j) \lor \bigvee_{j=\ell+1}^{\ell+m} \exists x_j \,.\, \mathit{Tr}(\psi_j) \land X_{P_j^\bullet}(x_j).$$

Now we can define $\mathit{trap\text{-}pred}_{\mathcal{S}}(\overline{X})$ as the conjunction of the following formulae, one for each clause $\mathbb{C}$ (in the form described in (4)) of $\Gamma$:

$$\forall x_1 \ldots \forall x_\ell \,.\, \Big[ \mathit{Tr}(\varphi) \land \mathit{intersects\text{-}pre}^{\mathbb{C}}_{\mathcal{S}}(\overline{X}, x_1, \ldots, x_\ell) \Big] \to \mathit{intersects\text{-}post}^{\mathbb{C}}_{\mathcal{S}}(\overline{X}, x_1, \ldots, x_\ell). \tag{5}$$

So, intuitively, $\mathit{trap\text{-}pred}_{\mathcal{S}}(\overline{X})$ states that for every transition of the Petri Net, if the set $\overline{X}$ of places intersects the preset of the transition, then it also intersects its postset. This is exactly the condition for the set of places to be a trap. Formally, we obtain:
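To illustrate Lemma 2 on the running example, the sketch below evaluates formula (5) directly on assignments $X_s \subseteq \mathcal{U}$ and compares it with the trap definition on the corresponding net, for every set of places of a small instance. The port structure is a hypothetical reconstruction (Fig. 1 is not in this excerpt); in particular, the fork port `r` used when a philosopher puts its forks down is an assumed name. $X_{{}^\bullet P}(x)$ is read as "some source state of port $P$ is in $X$ at position $x$", and similarly for $X_{P^\bullet}$. Both sides are computed from the same port table, so this is a sanity check rather than a proof.

```python
from itertools import product

STATES = ("w", "e", "f", "b")
# Hypothetical port structure: port -> (source states, target states)
PORTS = {"g": ({"w"}, {"e"}), "p": ({"e"}, {"w"}),   # philosopher: get, put
         "t": ({"f"}, {"b"}), "r": ({"b"}, {"f"})}   # fork: take, release ("r" is assumed)
# One entry per clause of Gamma: the ports it involves, with offsets relative to i
CLAUSES = [[("g", 0), ("t", 0), ("t", 1)],           # g(i) and t(i) and t(succ(i))
           [("p", 0), ("r", 0), ("r", 1)]]           # put-down interaction (assumed form)


def trap_pred(X, n):
    """Formula (5) for each clause: for all i, intersects-pre implies intersects-post.

    X maps each state s to the set X_s, a subset of U = {0, ..., n-1}."""
    def intersects(clause, i, side):
        # side 0: some X_{pre(P)}(i+d) holds; side 1: some X_{post(P)}(i+d) holds
        return any((i + d) % n in X[s] for port, d in clause for s in PORTS[port][side])
    return all(not intersects(c, i, 0) or intersects(c, i, 1)
               for c in CLAUSES for i in range(n))


def is_trap(P, n):
    """Direct definition on the net: every transition consuming from P produces into P."""
    for c in CLAUSES:
        for i in range(n):
            pre = {(s, (i + d) % n) for port, d in c for s in PORTS[port][0]}
            post = {(s, (i + d) % n) for port, d in c for s in PORTS[port][1]}
            if pre & P and not post & P:
                return False
    return True


def lemma2_holds(n):
    """Compare trap_pred and is_trap on every P, via X_q = {u | (q, u) in P}."""
    places = [(s, u) for s in STATES for u in range(n)]
    for bits in product((False, True), repeat=len(places)):
        P = {pl for pl, b in zip(places, bits) if b}
        X = {s: {u for (q, u) in P if q == s} for s in STATES}
        if trap_pred(X, n) != is_trap(P, n):
            return False
    return True
```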

Lemma 2. *Given a component-based system* $\mathcal{S} = \langle \mathcal{C}^1, \ldots, \mathcal{C}^N, \Gamma \rangle$ *and a structure* $\mathcal{I} = (\mathcal{U}, \iota)$, *where* $\iota$ *is an interpretation of the set variables* $\overline{X}$, *the set* $P = \{\langle s, u \rangle \in \bigcup_{k=1}^{N} \mathcal{S}^k \times \mathcal{U} \mid u \in \iota(X_s)\}$ *is a trap of* $\mathsf{N}^{\mathcal{U}}_{\mathcal{S}}$ *if and only if* $(\mathcal{U}, \iota) \models_{\mathrm{WS}\kappa\mathrm{S}} \mathit{trap\text{-}pred}_{\mathcal{S}}(\overline{X})$.

Parameterized Trap Invariants in WSκS. Loosely speaking, the intended meaning of $\mathit{trap\text{-}pred}_{\mathcal{S}}(\overline{X})$ is "the set of places $\overline{X}$ is a trap". Our goal is to construct a formula stating: "the marking $m$ marks all initially marked traps".

Recall that the Petri Nets obtained from component-based systems are always 1-safe, and so a marking is also a set of places. Recall, however, that not every set of places is a reachable marking: all reachable markings place exactly one token in the set of places modeling the states of a component (loosely speaking, the set of places of the $k$-th philosopher is $\{(w,k), (e,k)\}$, and there is always one token in the one or the other). So we define a formula $\mathit{marking}_{\mathcal{S}}(\overline{X})$ with intended meaning "the set of places $\overline{X}$ is a legal marking", and another one, $\mathit{trap\text{-}invariant}_{\mathcal{S}}(\overline{X})$, with intended meaning "the set of places $\overline{X}$ marks every initially marked trap".

In addition to the tuple of set variables $\overline{X}$ defined above, we now consider the "copy" tuple $\overline{X'} \stackrel{\mathsf{def}}{=} \langle X'_s \rangle_{s \in \mathcal{S}^i, 1 \leq i \leq N}$. Intuitively, $\overline{X}$ and $\overline{X'}$ represent one set of places each. First, we define a (1-safe) marking as a set of places that marks exactly one state of each copy of each component:

$$\mathit{marking}_{\mathcal{S}}(\overline{X}) = \forall x \,.\, \bigwedge_{1 \leq i \leq N} \bigvee_{s \in \mathcal{S}^i} \Big( X_s(x) \land \bigwedge_{s' \in \mathcal{S}^i \setminus \{s\}} \neg X_{s'}(x) \Big).$$

Second, we give a formula describing the intersection of two sets of places:

$$\mathit{intersection}_{\mathcal{S}}(\overline{X}, \overline{X'}) = \exists x \,.\, \bigvee_{s \in \bigcup_{1 \leq i \leq N} \mathcal{S}^i} \big( X_s(x) \land X'_s(x) \big).$$

Finally, to capture IMTs we need to determine whether a trap is initially marked. This is easily described by the formula:

$$\mathit{initially\text{-}marked}_{\mathcal{S}}(\overline{X}) = \exists x \,.\, \bigvee_{1 \leq i \leq N} X_{s_0^i}(x).$$

So we can define the *trap-invariant* by the WSκS formula:

$$\mathit{trap\text{-}invariant}_{\mathcal{S}}(\overline{X}) = \forall \overline{X'} \,.\, \Big[ \mathit{trap\text{-}pred}_{\mathcal{S}}(\overline{X'}) \land \mathit{initially\text{-}marked}_{\mathcal{S}}(\overline{X'}) \to \mathit{intersection}_{\mathcal{S}}(\overline{X}, \overline{X'}) \Big]. \tag{6}$$
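For a fixed universe, the formulae $\mathit{marking}_{\mathcal{S}}$, $\mathit{intersection}_{\mathcal{S}}$, $\mathit{initially\text{-}marked}_{\mathcal{S}}$ and (6) can be evaluated on explicit assignments. The sketch below does so for three philosophers, with a hypothetical trap predicate for the running example (the same assumed modelling as before) and assumed initial states $w$ and $f$. The universal quantifier over $\overline{X'}$ in (6) is evaluated only over the IMTs, which is equivalent because the implication holds vacuously for every other $\overline{X'}$.

```python
from itertools import product

COMPONENTS = {"Philosopher": ("w", "e"), "Fork": ("f", "b")}  # S^i of each component type
INITIAL = {"Philosopher": "w", "Fork": "f"}                   # s_0^i (assumed)
STATES = [s for ss in COMPONENTS.values() for s in ss]


def marking(X, n):
    """marking_S: every position holds exactly one state of each component type."""
    return all(sum(x in X[s] for s in ss) == 1
               for x in range(n) for ss in COMPONENTS.values())


def intersection(X, Y, n):
    """intersection_S: some position x and state s with X_s(x) and Y_s(x)."""
    return any(x in X[s] and x in Y[s] for x in range(n) for s in STATES)


def initially_marked(Y, n):
    """initially-marked_S: some position holds an initial state in Y."""
    return any(x in Y[INITIAL[c]] for x in range(n) for c in COMPONENTS)


def trap_pred(Y, n):
    """(5) for the philosophers (hypothetical model): get_i and put_i swap the
    idle places {w_i, f_i, f_i+1} with the busy places {e_i, b_i, b_i+1}."""
    for i in range(n):
        r = (i + 1) % n
        idle = i in Y["w"] or i in Y["f"] or r in Y["f"]
        busy = i in Y["e"] or i in Y["b"] or r in Y["b"]
        if idle != busy:        # get_i needs idle -> busy, put_i needs busy -> idle
            return False
    return True


def assignments(n):
    """Every tuple of sets (Y_s) over U = {0, ..., n-1}."""
    for bits in product((False, True), repeat=len(STATES) * n):
        yield {s: {u for u in range(n) if bits[k * n + u]} for k, s in enumerate(STATES)}


def trap_invariant(X, n, imts):
    """(6), with the quantifier over X' restricted to the precomputed IMTs."""
    return all(intersection(X, Y, n) for Y in imts)


n = 3
imts = [Y for Y in assignments(n) if trap_pred(Y, n) and initially_marked(Y, n)]
X0 = {"w": {0, 1, 2}, "e": set(), "f": {0, 1, 2}, "b": set()}    # initial marking
Xr = {"w": {1, 2}, "e": {0}, "f": {2}, "b": {0, 1}}              # after philosopher 0 eats
Xbad = {"w": {0, 1, 2}, "e": set(), "f": set(), "b": {0, 1, 2}}  # legal, but misses an IMT
```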

By Lemma 2, any set of places $\overline{X}$ that satisfies $\mathit{trap\text{-}invariant}_{\mathcal{S}}(\overline{X})$ intersects all IMTs. Now, let $\varphi(\overline{X})$ be any formula that defines a set of *good* global states of the component-based systems (or, equivalently, a good set of markings of their corresponding Petri Nets), with the intuition that, at any moment during execution, the current global state of the component-based system should be good. We can now state the following theorem, which captures the soundness of the verification method based on trap invariants:

Theorem 1. *Given a component-based system* $\mathcal{S}$ *and a* WSκS *formula* $\varphi(\overline{X})$, *if the formula*

$$\exists \overline{X} \,.\, \mathit{marking}_{\mathcal{S}}(\overline{X}) \land \mathit{trap\text{-}invariant}_{\mathcal{S}}(\overline{X}) \land \neg \varphi(\overline{X}) \tag{7}$$

*is unsatisfiable, then for every universe* $\mathcal{U}$, *the property defined by the formula* $\varphi(\overline{X})$ *holds in every reachable marking of* $\mathsf{N}^{\mathcal{U}}_{\mathcal{S}}$.

In the light of the above theorem, verifying the correctness of a component-based system with any number of active components boils down to deciding the satisfiability of a WSκS formula. The latter problem is known to be decidable, albeit with non-elementary worst-case complexity. A closer look at the verification conditions of the form (7) generated by our method shows that the number of quantifier alternations is bounded, which implies that the time needed to decide the (un)satisfiability of (7) is elementary. Moreover, our experiments show that these checks are very fast (less than 1 second on an average machine) for a non-trivial set of examples.

## 4 Refining Trap Invariants

Since the safety verification problem is undecidable for parameterized systems [6], the verification method based on trap invariants cannot be complete. As an example, consider the alternating dining philosophers system, of which an instance (for $n = 3$) is shown in Fig. 3. The system consists of two philosopher component types, namely Philosopher*rl*, which takes its right fork before its left fork, and Philosopher*lr*, which takes the left fork before the right one. Each philosopher has two interaction ports for taking the forks, namely $g_\ell$ (get left) and $g_r$ (get right), and one port $p$ (put) for releasing the forks. The ports of the Philosopher*rl* component type are overlined, in order to distinguish them. The Fork component type is the same as in Fig. 1. The interaction formula $\Gamma_{\mathit{alt\,philo}}$ for this system, shown in Fig. 3, implicitly states that only the philosopher component with index 0 is of type Philosopher*rl*, whereas all other philosophers are of type Philosopher*lr*. Note that the interactions on the overlined ports $\overline{g_\ell}$, $\overline{g_r}$ and $\overline{p}$ are only allowed if $\mathit{zero}(x) \stackrel{\mathsf{def}}{=} \forall y \,.\, x \leq y$ holds, in other words if $x$ is interpreted as the root of the universe (in our case, 0, since $\mathcal{U} = \{0, \ldots, n-1\}$).

It is well known that any instance of the parameterized alternating dining philosophers system consisting of at least one Philosopher*rl* and one Philosopher*lr* is deadlock-free. However, trap invariants are not enough to prove deadlock freedom, as shown by the global state $\{\langle b,0 \rangle, \langle h,0 \rangle, \langle b,1 \rangle, \langle w,1 \rangle, \langle f,2 \rangle, \langle e,2 \rangle\}$, marked with thick red lines in Fig. 3. Note that no interaction is enabled in this state. Moreover, this state intersects every nonempty trap of the marked PN that defines the executions of this particular instance, as proved below. Consequently, the trap invariant contains a deadlock configuration, and the system cannot be proved deadlock-free by this method.

Fig. 3: Alternating Dining Philosophers

Structural Invariants for Parameterized Architectures 239

Proposition 1. *Consider an instance of the alternating dining philosophers system in Fig. 3, consisting of components* Fork(0), Philosopher*rl*(0), Fork(1), Philosopher*lr*(1), Fork(2) *and* Philosopher*lr*(2) *placed in a ring, in this order. Then each nonempty trap of this system contains one of the places* $\langle b,0 \rangle$, $\langle h,0 \rangle$, $\langle b,1 \rangle$, $\langle w,1 \rangle$, $\langle f,2 \rangle$ *or* $\langle e,2 \rangle$.

However, the configuration is unreachable by a real execution of the PN, started in the initial configuration that marks $\langle f,i \rangle$ and $\langle w,i \rangle$ for all $i = 0,1,2$. An intuitive reason is that, in any reachable configuration, each fork is in state $f$(ree) only if none of its neighboring philosophers is in state $e$(ating). In order to prove deadlock freedom, one must learn this and other similar constraints. Next, we present a heuristic method for strengthening the trap invariant that infers such universal constraints.

### 4.1 One Invariants

As shown by the example above, trap invariants sometimes fail to prove interesting properties. Hence, it is desirable to refine the overapproximation of the reachable markings in order to exclude more spurious counterexamples. To do so, we consider a special class of *linear invariants*, called *1-invariants* in the following. Although linear invariants are not structural, since they are defined in terms of the set of reachable markings of a marked Petri Net, the set of 1-invariants can be under-approximated by structural conditions.

Definition 1. *Given a marked PN* $\mathcal{N} = ((S, T, E), \mathrm{m}_0)$, *with* $S = \{s_1, \ldots, s_n\}$, *a vector* $\mathbf{a} = (a_1, \ldots, a_n) \in \{0,1\}^n$ *is a* 1-invariant *of* $\mathcal{N}$ *if and only if, for each reachable marking* $\mathrm{m} \in \mathcal{R}(\mathcal{N})$, *we have* $\sum_{i=1}^{n} a_i \cdot \mathrm{m}(s_i) = 1$.

The following lemma gives structural conditions that are sufficient for a set of places to be a 1-invariant. These conditions are not necessary: there are 1-invariants that they do not capture. Taking the intersection of the 1-invariants that satisfy these conditions therefore defines a weaker invariant, which is still sound for our verification purposes.

Lemma 3. *Given a marked PN* $\mathcal{N} = ((S, T, E), \mathrm{m}_0)$, *a set of places* $F \subseteq S$ *is a 1-invariant if the following hold:*

1. $\sum_{s \in F} \mathrm{m}_0(s) = 1$, and
2. *for every* $t \in T$, *either* $\|F \cap {}^\bullet t\| = \|F \cap t^\bullet\| = k$ *with* $k \in \{0,1\}$, *or* $\|F \cap {}^\bullet t\| > 1$.
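The conditions of Lemma 3 are purely structural and easy to check. The sketch below, again on the hypothetical reconstruction of the three-philosopher net, checks them for $F = \{(f,1), (e,0), (e,1)\}$ (fork 1 is free, or one of the two philosophers sharing it is eating) and then confirms Definition 1 against the reachable markings. For 1-safe nets, $\sum_{s \in F} \mathrm{m}_0(s)$ is simply the number of initially marked places in $F$; the function names are our own.

```python
def philosophers_net(n):
    """Transitions (preset, postset) and m0 of the Fig. 2 net (hypothetical reconstruction)."""
    ts = []
    for k in range(n):
        r = (k + 1) % n
        idle = frozenset({("w", k), ("f", k), ("f", r)})
        busy = frozenset({("e", k), ("b", k), ("b", r)})
        ts += [(idle, busy), (busy, idle)]
    return ts, frozenset((q, k) for k in range(n) for q in ("w", "f"))


def lemma3(F, ts, m0):
    """Sufficient conditions of Lemma 3 for F to be a 1-invariant of a 1-safe net."""
    if len(F & m0) != 1:                        # 1. exactly one place of F is initially marked
        return False
    for pre, post in ts:                        # 2. for every transition t
        a, b = len(F & pre), len(F & post)
        if not ((a == b and a <= 1) or a > 1):  # |F∩pre| = |F∩post| in {0,1}, or |F∩pre| > 1
            return False
    return True


def reachable(ts, m0):
    """Reachable markings of a 1-safe net, as sets of marked places."""
    seen, stack = {m0}, [m0]
    while stack:
        m = stack.pop()
        for pre, post in ts:
            if pre <= m:
                m2 = (m - pre) | post
                if m2 not in seen:
                    seen.add(m2)
                    stack.append(m2)
    return seen


ts, m0 = philosophers_net(3)
F = frozenset({("f", 1), ("e", 0), ("e", 1)})
```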

We devote the rest of this section to describing WSκS formulae that capture the structural conditions laid down by Lemma 3, which define 1-invariants. As demonstrated in Section 3, the pre- and postsets of transitions, as well as general sets of places in a PN describing the execution semantics, can be defined in WSκS. Hence, we present the definitions of the following formulae only in the full version of this article [16] and just give the intuitions here.

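As before, we fix two tuples of set variables $\overline{X}$ and $\overline{X'}$, with one variable $X_s$ (resp. $X'_s$) for each state $s \in \bigcup_{i=1}^{N} \mathcal{S}^i$, and define the following formulae:

- $\mathit{unique\text{-}init}_{\mathcal{S}}(\overline{X})$: the initial marking marks exactly one place of $\overline{X}$ (condition 1 of Lemma 3);
- $\mathit{unique\text{-}intersection}_{\mathcal{S}}(\overline{X}, \overline{X'})$: the sets of places $\overline{X}$ and $\overline{X'}$ have exactly one place in common.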


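Given a transition $t$ of the marked Petri Net $\mathsf{N}^{\mathcal{U}}_{\mathcal{S}}$ defining the execution semantics of a component-based system $\mathcal{S}$, for a universe $\mathcal{U}$, we consider the following formulae:

- $\mathit{unique\text{-}pre}^{\mathbb{C}}_{\mathcal{S}}$: $\overline{X}$ contains exactly one place of the preset ${}^\bullet t$;
- $\mathit{unique\text{-}post}^{\mathbb{C}}_{\mathcal{S}}$: $\overline{X}$ contains exactly one place of the postset $t^\bullet$.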


Now we define a predicate $\mathit{1\text{-}pred}_{\mathcal{S}}$ as the conjunction of $\mathit{unique\text{-}init}_{\mathcal{S}}$ and the formulae:

$$\begin{array}{l} \forall x_1 \ldots \forall x_\ell \,.\, \Big( \mathit{Tr}(\varphi) \to \big[ (\neg \mathit{intersects\text{-}pre}^{\mathbb{C}}_{\mathcal{S}} \land \neg \mathit{intersects\text{-}post}^{\mathbb{C}}_{\mathcal{S}}) \\ \qquad \lor\ (\mathit{unique\text{-}pre}^{\mathbb{C}}_{\mathcal{S}} \land \mathit{unique\text{-}post}^{\mathbb{C}}_{\mathcal{S}}) \\ \qquad \lor\ (\mathit{intersects\text{-}pre}^{\mathbb{C}}_{\mathcal{S}} \land \neg \mathit{unique\text{-}pre}^{\mathbb{C}}_{\mathcal{S}}) \big] \Big) \end{array} \tag{8}$$

one for each clause C in Γ. We show the soundness of this definition, by the following:

Lemma 4. *Let* $\mathcal{S} = \langle \mathcal{C}^1, \ldots, \mathcal{C}^N, \Gamma \rangle$ *be a component-based system and let* $\overline{X}$ *be a tuple of set variables, one for each state in a component of* $\mathcal{S}$. *Then, for any structure* $(\mathcal{U}, \iota)$ *such that* $\iota$ *interprets the variables in* $\overline{X}$, *the set* $P = \{\langle s, u \rangle \in \bigcup_{i=1}^{N} \mathcal{S}^i \times \mathcal{U} \mid u \in \iota(X_s)\}$ *is a 1-invariant of* $\mathsf{N}^{\mathcal{U}}_{\mathcal{S}}$ *if* $(\mathcal{U}, \iota) \models_{\mathrm{WS}\kappa\mathrm{S}} \mathit{1\text{-}pred}_{\mathcal{S}}(\overline{X})$.

We may now define the 1*-invariant* analogously to the trap-invariant before:

$$\mathit{1\text{-}invariant}_{\mathcal{S}}(\overline{X}) = \forall \overline{X'} \,.\, \mathit{1\text{-}pred}_{\mathcal{S}}(\overline{X'}) \to \mathit{unique\text{-}intersection}_{\mathcal{S}}(\overline{X}, \overline{X'}). \tag{9}$$

Reasoning as before we obtain a refinement of Theorem 1 since every reachable marking has to satisfy both invariants.

Theorem 2. *Given a component-based system* $\mathcal{S}$ *and a* WSκS *formula* $\varphi(\overline{X})$, *if the formula:*

$$\exists \overline{X} \,.\, \mathit{marking}_{\mathcal{S}}(\overline{X}) \land \mathit{1\text{-}invariant}_{\mathcal{S}}(\overline{X}) \land \mathit{trap\text{-}invariant}_{\mathcal{S}}(\overline{X}) \land \neg \varphi(\overline{X}) \tag{10}$$

*is unsatisfiable, then for every universe* $\mathcal{U}$, *the property defined by the formula* $\varphi(\overline{X})$ *holds in every reachable marking of* $\mathsf{N}^{\mathcal{U}}_{\mathcal{S}}$.
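On a single instance, the effect of the refinement can be observed by brute force. The sketch below, for the hypothetical three-philosopher net used earlier, takes as property $\varphi$ "no two neighbouring philosophers eat at the same time" (our choice for illustration, not a property from the experiments). It enumerates all IMTs and all sets of places satisfying Lemma 3, and shows that the trap invariant alone admits a violating marking, whereas adding the 1-invariants excludes all of them. This is not the parameterized WSκS check of (7) and (10), which is performed with Mona.

```python
from itertools import combinations, product

n = 3


def philosophers_net(n):
    """Transitions (preset, postset) and m0 of the Fig. 2 net (hypothetical reconstruction)."""
    ts = []
    for k in range(n):
        r = (k + 1) % n
        idle = frozenset({("w", k), ("f", k), ("f", r)})
        busy = frozenset({("e", k), ("b", k), ("b", r)})
        ts += [(idle, busy), (busy, idle)]
    return ts, frozenset((q, k) for k in range(n) for q in ("w", "f"))


def is_trap(ts, W):
    return all(not (pre & W) or bool(post & W) for pre, post in ts)


def lemma3(F, ts, m0):
    """Structural sufficient conditions of Lemma 3 (1-safe net, markings as sets)."""
    return len(F & m0) == 1 and all(
        len(F & pre) == len(F & post) <= 1 or len(F & pre) > 1 for pre, post in ts)


def reachable(ts, m0):
    seen, stack = {m0}, [m0]
    while stack:
        m = stack.pop()
        for pre, post in ts:
            if pre <= m:
                m2 = (m - pre) | post
                if m2 not in seen:
                    seen.add(m2)
                    stack.append(m2)
    return seen


ts, m0 = philosophers_net(n)
places = [(q, k) for k in range(n) for q in ("w", "e", "f", "b")]
candidates = [frozenset(c) for r in range(1, len(places) + 1)
              for c in combinations(places, r)]
imts = [W for W in candidates if W & m0 and is_trap(ts, W)]
one_invariants = [F for F in candidates if lemma3(F, ts, m0)]
legal = [frozenset(zip(ph, range(n))) | frozenset(zip(fk, range(n)))
         for ph in product("we", repeat=n) for fk in product("fb", repeat=n)]
trap_inv = [m for m in legal if all(m & W for W in imts)]
both = [m for m in trap_inv if all(len(m & F) == 1 for F in one_invariants)]


def bad(m):
    """Negation of the property: two neighbouring philosophers eat at the same time."""
    return any(("e", k) in m and ("e", (k + 1) % n) in m for k in range(n))
```

For this instance, `trap_inv` contains a marking satisfying `bad`, while `both` contains none and still includes every reachable marking.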

## 5 Experiments

We have implemented a prototype (called ostrich [15]) of this verification procedure to evaluate the viability of our approach. The current version of the prototype can only handle token-ring and pipeline topologies, but not trees; for these topologies, the verification reduces to checking the satisfiability of a formula of WS1S. We have also considered one example with a tree topology (see below), for which the formula was constructed manually. Satisfiability of WS1S and WSκS formulae was checked using version 1.4-17 of Mona [33]. We consider various examples, grouped into the following categories:

	- there is one philosopher who takes first her left and then her right fork,
	- as above, but the forks remember who took them, and
	- there are two global forks everyone grabs in the same order.

Lehmann-Rabin. This is a randomized solution to the dining philosophers problem.

Dining Cryptographers. A group of cryptographers wants to determine whether one of them or a stranger paid for the meal, without revealing how each of them acted individually.

The results are shown in Table 1. The first column reports the size of the example in terms of the number of states (#st.) and clauses (#cls.). The second column indicates which properties could (✓) and could not (×) be verified; a property could not be verified when the conjunction of the trap invariant and the 1-invariant was not strong enough to prove it. The third column reports the time (in seconds) it takes to prove all considered properties. These results are measured on the provided virtual machine for artifacts [32], where the host system is an average laptop. To understand the next four columns, recall that Mona constructs, for a given formula $\varphi(\overline{X})$, a finite automaton recognizing all the sets $\overline{X}$ for which $\varphi$ holds. Since the automaton can have a very large alphabet, its transition relation is encoded as a binary decision diagram (BDD). The columns report the number of states and the number of nodes of the BDD for different formulas. More precisely, the columns trap, trap-inv, flow, and flow-inv give the sizes of the automata for the formulas $\mathit{trap\text{-}pred}_{\mathcal{S}} \land \mathit{initially\text{-}marked}_{\mathcal{S}}$, $\mathit{trap\text{-}invariant}_{\mathcal{S}}$, $\mathit{1\text{-}pred}_{\mathcal{S}}$ and $\mathit{1\text{-}invariant}_{\mathcal{S}}$, respectively. We write "n.a." (for "not available") to indicate that Mona timed out before the automaton was computed.

The first observation is that the satisfiability checks can often be done in a very short time. This is surprising, because the formulas to be checked, namely (7) and (10), exhibit one quantifier alternation (recall that $\mathit{trap\text{-}invariant}_{\mathcal{S}}$ and $\mathit{1\text{-}invariant}_{\mathcal{S}}$ contain universal quantifiers). More specifically, since $\mathit{trap\text{-}invariant}_{\mathcal{S}}$ is obtained by universally quantifying over $\mathit{trap\text{-}pred}_{\mathcal{S}} \land \mathit{initially\text{-}marked}_{\mathcal{S}}$, one would expect the automaton for the former to be much larger than the one for the latter, at least in some cases. But this does not happen: in fact, the automaton for $\mathit{trap\text{-}invariant}_{\mathcal{S}}$ is almost always smaller. Similarly, there is no blowup from $\mathit{1\text{-}pred}_{\mathcal{S}}$ to $\mathit{1\text{-}invariant}_{\mathcal{S}}$. A possible explanation is that the exponential blowup caused by universal quantification in WSκS manifests only in theoretical corner cases, which do not occur in our examples.

## 6 Conclusions

We have shown that the trap technique used in [11,28,13] for the verification of single systems can be extended to parameterized systems with sophisticated communication structures, like pipelines, token rings and trees. Our extension constructs a parameterized trap invariant, a formula of WSκS satisfied by the reachable global states of all instances of the system. The core of the approach is a purely syntactic, automatic derivation of the trap invariant from the interaction formula describing the possible transitions of the system. When the set of safe global states can also be expressed in WSκS, which is usually the case, we check using the Mona tool whether the trap invariant implies the safety formula. The technique proves correctness of systems that do not produce well-structured transition systems in the sense of [1,29], and of systems with broadcast communication, for which, to the best of our knowledge, cut-off results have not been obtained yet.

Our experiments demonstrate that trap invariants can be very effective in finding proofs of correctness (inductive invariants) of common benchmark examples. In practice, the technique is very cheap, since it avoids costly fixpoint computations. This suggests incorporating it into other verifiers as a preprocessing step.

Data Availability Statement and Acknowledgements. The work of the second and fifth author has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme under grant agreement No 787367 (PaVeS).

The tool ostrich and associated files are available in the Zenodo repository: https://zenodo.org/record/3676940



## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Automated Verification of Parallel Nested DFS**

Wytse Oortwijn<sup>1,⋆</sup>, Marieke Huisman<sup>2</sup>, Sebastiaan J. C. Joosten<sup>3,⋆</sup>, and Jaco van de Pol<sup>2,4</sup>

<sup>1</sup> Department of Computer Science, ETH Zurich, Zurich, Switzerland wytse.oortwijn@inf.ethz.ch

<sup>2</sup> Formal Methods and Tools, University of Twente, Enschede, The Netherlands m.huisman@utwente.nl

<sup>3</sup> Dartmouth College, Hanover NH, USA sebastiaan.joosten@dartmouth.edu

<sup>4</sup> Department of Computer Science, Aarhus University, Aarhus, Denmark jaco@cs.au.dk

**Abstract.** Model checking algorithms are typically complex graph algorithms, whose correctness is crucial for the usability of a model checker. However, establishing the correctness of such algorithms can be challenging and is often done manually. Mechanising the verification process is crucially important, because model checking algorithms are often parallelised for efficiency reasons, which makes them even more error-prone.

This paper shows how the VerCors concurrency verifier is used to mechanically verify the parallel nested depth-first search (NDFS) graph algorithm of Laarman et al. [25]. We also demonstrate how having a mechanised proof supports the easy verification of various optimisations of parallel NDFS. As far as we are aware, this is the first automated deductive verification of a multi-core model checking algorithm.

## **1 Introduction**

Model checking is an automated procedure for verifying behavioural properties of reactive systems. To avoid a false sense of safety, it is essential that model checkers are themselves correct. However, model checkers use ever more ingenious algorithms [12] and even parallel implementations [2] to be able to combat the large state spaces of critical industrial systems, which makes it increasingly difficult to guarantee their correctness.

This paper focusses on the mechanical verification of a *multi-core* model checking algorithm for detecting accepting cycles in automata, called nested depth-first search (NDFS). This algorithm solves the model checking problem for Linear-time Temporal Logic (LTL), a widely used logic for specifying reactive systems. Multi-core NDFS was developed by Laarman et al. in 2011 [25] and is currently deployed in the high-performance model checker LTSmin [23].

The mechanical verification of parallel NDFS is carried out in VerCors [6], a verifier based on concurrent separation logic that targets real-world concurrent and parallel programs. The presented verification is inspired by a previous mechanical verification of *sequential* NDFS [37] that was carried out in Dafny [30].

<sup>⋆</sup> This research has been performed while working at the University of Twente.

This paper demonstrates the feasibility of mechanical program verification of parallel graph algorithms, like multi-core NDFS. To the best of our knowledge we present the first mechanical verification of a parallel graph algorithm. Our formalisation provides reusable components that can be used to verify variations of parallel NDFS, as well as other algorithms for parallel model checking.

Before listing our contributions (§1.3) we first provide more background on model checking algorithms (§1.1) and related work on their verification (§1.2).

#### **1.1 Background on Model Checking**

Pnueli introduced the Linear-time Temporal Logic (LTL) [36] to specify properties of reactive systems. The model checking problem [12] decides whether a transition system satisfies a given LTL property. The automata-based approach [45] reduces the model checking problem to the graph-theoretic problem of checking the reachability of accepting cycles. Reachability of accepting cycles in directed graphs can be checked in linear time, with the nested depth-first search (NDFS) algorithm [13,19,41], which forms the basis of the Spin model checker [17].

Several distributed and parallel model checking algorithms have been proposed, to allocate more memory and processors to the problem [2]. NDFS is based on depth-first search, which is considered hard (impossible) to parallelise efficiently [39]. For distributed approaches, the best strategy is to turn to BFS algorithms [3], which are straightforward to parallelise but at the cost of increasing the amount of work beyond linear time. For the shared-memory setting, swarm verification was proposed [18], where each worker runs its own instance of NDFS. Various DFS-based multi-core algorithms for full LTL model checking have been devised for this strategy [14,15,25]. This paper considers the version by Laarman et al. [25], which is a parallel version of improved sequential NDFS [41].

The correctness of parallel NDFS is quite subtle. In particular, parallel DFS does not fully respect a global depth-first ordering, since each worker maintains its own search stack, yet the correctness of NDFS depends on the search order. Also, to realise speedups, the implementation avoids locking shared data structures by using atomics. This raises the question whether the implementation of a parallel model checker, meant to verify the correctness of safety-critical systems, is itself correct. For this reason the original paper [25] contains a detailed *pen-and-paper* correctness proof, which is based on a number of invariants.

#### **1.2 Related Work**

To raise the level of confidence in model checkers, one approach is to certify each of their individual runs. Obviously, the counterexample returned by a model checker is itself a certificate that can easily be verified independently. However, double-checking the absence of errors is harder. Namjoshi [33] proposed to instrument a μ-calculus model checker to generate a deductive proof that can be checked independently, also when the property holds. Recently, an IC3-style symbolic LTL model checker has been extended with deductive proofs as well [16]. However, these approaches do not prove correctness of the model checking algorithm, but only validate its outcome for each specific use.

Alternatively, one can formalise the model checking algorithm and its correctness proof in an interactive theorem prover. An early example of this approach was the verification of a model checker for the modal μ-calculus in Coq [43]. A framework for verifying sequential depth-first search algorithms was developed in Isabelle [27,28], and applied to the verification of NDFS with partial order reduction [9] as well as a model checker for timed automata [47]. The recent formalisations of Tarjan's SCC algorithm [10] fit in the same line of research. These approaches require modelling and verifying the algorithm in an interactive theorem prover, which allows one to use the full power of the theorem prover.

If one wishes to verify the code of the algorithm directly, yet another approach is to model the algorithm and its specification in a (semi-)automated program verifier, where the code is enriched with sufficient annotations to prove its correctness. This approach was followed for several standard sequential graph algorithms in Why3 [46] and for sequential NDFS in Dafny [37]. However, there is hardly any work on automated verification of parallel graph algorithms. Raad et al. [38] verified four concurrent graph algorithms in the context of CoLoSL, but the proofs have not been automated. Sergey et al. [42] verified a concurrent spanning tree algorithm, but interactively, through an embedding in Coq.

To support the verification of shared-memory parallel software, program verifiers typically use concurrent separation logic. VeriFast [20] aims at sequential and multi-threaded C and Java programs. VerCors [6] verifies concurrent programs in Java and OpenCL, by applying a correctness-preserving translation into a sequential imperative language, delegating the generation of the verification conditions to Viper [32] and their verification ultimately to Z3 [31].

#### **1.3 Contributions and Outline**

This paper discusses the mechanical verification of the *parallel* NDFS algorithm of Laarman et al. [25] using VerCors. To the best of our knowledge, this is the first mechanical verification of a parallel graph (and model checking) algorithm.

Section 2 recalls both sequential and parallel NDFS (§2.1–2.2), and gives preliminaries on concurrency verification with VerCors (§2.3). It also explains that parallel NDFS uses various colour markings on the input graph to administer the status of the nested searches of workers. Some of these colours are local to a single worker, while other colours are globally shared among all workers.

Section 3.1 presents our new (informal) correctness proof of parallel NDFS, which is based on a number of global invariants on the possible colour configurations. The main challenge lies in proving completeness, which is particularly difficult since workers can delegate the detection of accepting cycles to other workers. To be able to mechanise our completeness proof, we contribute a new invariant (Lemma 4) that guarantees the preservation of so-called *special paths*. This allows us to avoid the complicated inductive argument used in [25].

Section 3.2 discusses how parallel NDFS is specified in VerCors. In particular, this requires the specification of permissions, to verify data race-free access to shared data structures. Moreover, we encode the colour maps and the transition relation of the input automaton as matrices, which greatly contributes to the feasibility of proof checking. We also explain how atomic updates are specified, which was left implicit in the high-level pseudocode. Similarly, we implement asymmetric termination detection: if one worker finds a counterexample, all workers can terminate immediately; only if all workers have completely finished their exploration may one conclude that the model is correct.

Section 3.3 explains the techniques to formalise the full functional correctness proof in VerCors. In particular, this requires the distribution of permissions and invariants over threads and locks, and the introduction of auxiliary ghost state to track the precise progress of the various nested search phases of all workers.

Section 4 demonstrates how our verification is reused to verify optimisations to the algorithm. In particular, we check the optimisation "early cycle detection" that, for weak LTL properties, detects all cycles in the outer search instead of the nested inner search. We also propose and verify a repair to the "all-red" extension, by inserting an extra check that was missing in [25]. This extension improves the speedup of parallel NDFS by sharing more global information.

Finally, Section 5 concludes with a perspective on reusing our techniques for verifying other parallel graph algorithms.

## **2 Preliminaries**

Section 2.1 recalls the standard sequential NDFS algorithm for finding reachable accepting cycles in automata. We verified a parallel version of NDFS, which is introduced in Section 2.2. The verification has been performed with VerCors; Section 2.3 gives prerequisites on concurrency verification and separation logic.

Before discussing the NDFS algorithms, let us first recall the basic definitions of automata and accepting cycles. An *automaton* $G$ is a quadruple $(S, s_I, \mathrm{succ}, \mathcal{A})$ consisting of a finite set $S$ of *states*, an *initial state* $s_I \in S$, a *next-state relation* $\mathrm{succ} : S \to 2^S$ and a set $\mathcal{A} \subseteq S$ of *accepting states*. A *path* in $G$ is a sequence $P = s_0, \ldots, s_{n+1}$ of $S$-states so that $s_{i+1} \in \mathrm{succ}(s_i)$ for every $0 \le i \le n$. The notation $|P| \triangleq n + 2$ denotes the *length of* $P$, $P[i] \triangleq s_i$ the $i$*th state on* $P$, and $P[i..]$ the *subpath* $s_i, \ldots, s_{n+1}$. Any state $s$ is defined to be *reachable* (in $G$) if there exists an $(s_I, s)$-path. Any path $P$ is a *cycle* whenever $P[0] = P[|P| - 1]$ and $1 < |P|$. Finally, any cycle $P$ is *accepting* if $P[i] \in \mathcal{A}$ for some $0 \le i < |P|$.
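As an illustration only, these definitions can be transcribed into a small Python sketch. It assumes that states are integers and that succ is given as a dict from states to successor sets; the names `is_path`, `is_cycle`, `is_accepting_cycle` and `reachable` are our own. Following the `Path` encoding of Fig. 3 (which only requires $0 < |P|$), a single state counts as a path, which is why the cycle check needs the extra condition $1 < |P|$.

```python
from collections import deque
from typing import Dict, Sequence, Set

# Hypothetical representation: succ maps each state to its set of successors.
Succ = Dict[int, Set[int]]


def is_path(succ: Succ, p: Sequence[int]) -> bool:
    """s_{i+1} in succ(s_i) for every consecutive pair; a single state is a path."""
    return len(p) >= 1 and all(p[i + 1] in succ.get(p[i], set())
                               for i in range(len(p) - 1))


def is_cycle(succ: Succ, p: Sequence[int]) -> bool:
    """A path P is a cycle whenever P[0] = P[|P| - 1] and 1 < |P|."""
    return len(p) > 1 and is_path(succ, p) and p[0] == p[-1]


def is_accepting_cycle(succ: Succ, accepting: Set[int], p: Sequence[int]) -> bool:
    """A cycle is accepting if P[i] is accepting for some 0 <= i < |P|."""
    return is_cycle(succ, p) and any(s in accepting for s in p)


def reachable(succ: Succ, s_init: int) -> Set[int]:
    """All states s for which an (s_I, s)-path exists (breadth-first search)."""
    seen = {s_init}
    queue = deque([s_init])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, set()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen
```

For the automaton with edges 0 → 1, 1 → 2, 2 → 1 and $\mathcal{A} = \{1\}$, the sequence 1, 2, 1 is an accepting cycle, whereas state 0 is reachable but lies on no cycle.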

#### **2.1 Nested Depth-First Search**

Figure 1 presents a standard, sequential implementation of NDFS, consisting of two nested DFS searches: dfsblue and dfsred. The *blue search* processes successors recursively in DFS order, marking them blue when done on line 8. The colour cyan indicates a partially explored state, i.e., not all of its successors have been visited yet by the blue search. Just before backtracking from an accepting state, dfsblue calls the *red search* on line 7, to report any accepting cycle. The red search colours a state red after processing its successors recursively on line 16. The pink colour denotes states that are only partially explored by dfsred.<sup>5</sup>

Fig. 1: A standard sequential implementation of nested DFS.

It is straightforward to see that NDFS is *sound*, meaning that it only reports true accepting cycles. To see that NDFS is also *complete*, i.e., finds an accepting cycle if one exists, observe that dfsred will indeed be started from every accepting state. This in itself is not enough: the red search ignores states marked red in a previous call. It is essential that dfsred explores accepting states in the right order. The crucial insight is that dfsred only visits cyan and blue states and that accepting states coloured blue cannot be part of any accepting cycle.
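Since the code of Fig. 1 is not reproduced in this text, the following Python sketch reconstructs sequential NDFS from the description above; it is our own transcription rather than the verified code, and its lines do not correspond to the line numbers of Fig. 1. It uses white, cyan and blue for the blue search, and a pink set and a red set for the red search. Reporting a cycle when the red search reaches a cyan state follows the improved NDFS of [41] on which the parallel version builds; we assume this is the check used in Fig. 1.

```python
from typing import Dict, Set

WHITE, CYAN, BLUE = "white", "cyan", "blue"


class CycleFound(Exception):
    """Models 'report cycle': unwinds the whole search at once."""


def ndfs(succ: Dict[int, Set[int]], accepting: Set[int], s_init: int) -> bool:
    """Sequential NDFS: True iff an accepting cycle is reachable from s_init."""
    color: Dict[int, str] = {}  # absent entries are white
    pink: Set[int] = set()      # partially explored by the red search
    red: Set[int] = set()       # fully explored by some red search

    def dfs_red(s: int) -> None:
        pink.add(s)
        for t in sorted(succ.get(s, ())):
            if color.get(t, WHITE) == CYAN:
                # t is on the blue stack: this closes a cycle through the accepting root
                raise CycleFound
            if t not in pink and t not in red:
                dfs_red(t)
        red.add(s)
        pink.discard(s)

    def dfs_blue(s: int) -> None:
        color[s] = CYAN  # partially explored
        for t in sorted(succ.get(s, ())):
            if color.get(t, WHITE) == WHITE:
                dfs_blue(t)
        if s in accepting:
            dfs_red(s)  # red search just before backtracking
        color[s] = BLUE

    try:
        dfs_blue(s_init)
    except CycleFound:
        return True
    return False
```

Successors are visited in sorted order only to make runs deterministic; any order is correct for the sequential algorithm.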

The correctness of NDFS has been verified with Dafny [37]. We ported this correctness proof to VerCors as the basis for the verification of parallel NDFS.

#### **2.2 Parallel Nested Depth-First Search**

A naive strategy for parallelising NDFS is *swarming* [18]: running several instances of NDFS in parallel, each working on a private set of colours. Swarmed NDFS tends to find accepting cycles faster, since its workers are expected to explore different parts of the input graph. The correctness of swarmed NDFS with respect to sequential NDFS is almost immediate, except for termination handling: workers only share information about the exit condition. We also verified swarmed NDFS in VerCors, as a stepping stone for verifying parallel NDFS.

Laarman et al. improve on the swarming algorithm by sharing information from the red search in the backtrack phase. Figure 2 presents the improved algorithm, in which every line of code is assumed to execute atomically. The entry point is pndfs($s_I$, n), which spawns n parallel instances of dfsblue($s_I$, *tid*) in the fashion of swarming. However, the red colourings are now shared, so that workers can guarantee to each other that certain states are, or will be, sufficiently explored. The red states can therefore be skipped in both the red search (line 19) *and* the blue search (line 4). pndfs thus improves performance, since workers prune each other's search space. At the same time this significantly complicates the correctness argument, since workers may now prevent each other from finding accepting cycles. Moreover, if multiple workers initiated dfsred from the same accepting state s, they must now finish their red search simultaneously for the algorithm to be correct. The **await** synchroniser on line 23 ensures this by blocking thread execution until s.*count* (the number of workers in dfsred(s, ·)) reaches 0.

<sup>5</sup> In the sequential algorithm, pink and red do not need to be distinguished, but having the distinction here makes the parallel version easier to explain.

Fig. 2: An implementation of parallel NDFS, where the red colours are shared.

The original correctness argument of Laarman et al. relies on a complicated inductive invariant stating that not all accepting cycles can be missed due to pruning. However, this invariant is unsuitable for use in a (semi-)automated verifier. Section 3 discusses the verification of pndfs and provides a new invariant on the red colours that allows its correctness to be proven mechanically. It also discusses how our verification handles concurrency and thread synchronisation.
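To make the sharing of red colours and the **await** synchroniser of Fig. 2 concrete, the following Python sketch runs the algorithm as described in this section, with one OS thread per worker. It is an illustration under our own assumptions, not the verified VerCors code: a single global lock models the convention that every line executes atomically, successors are shuffled per worker to mimic swarming, and a worker that reports a cycle sets a shared `found` flag that all workers poll (rather than returning to a main thread, as in the VerCors encoding). The await loop also polls this flag, so that waiting workers can exit once a cycle has been reported.

```python
import random
import threading
import time
from typing import Dict, Set

WHITE, CYAN, BLUE = "white", "cyan", "blue"


class ParallelNDFS:
    """Thread-per-worker sketch of parallel NDFS with shared red colours."""

    def __init__(self, succ: Dict[int, Set[int]], accepting: Set[int],
                 s_init: int, n: int, nthreads: int) -> None:
        self.succ, self.acc, self.s_init = succ, accepting, s_init
        self.nthreads = nthreads
        self.lock = threading.Lock()                         # models atomic lines
        self.red = [False] * n                               # shared
        self.count = [0] * n                                 # shared: workers in dfsred(s, .)
        self.color = [[WHITE] * n for _ in range(nthreads)]  # private per worker
        self.pink = [[False] * n for _ in range(nthreads)]   # private per worker
        self.found = False                                   # shared "exit all" flag

    def _is_red(self, s: int) -> bool:
        with self.lock:
            return self.red[s]

    def _successors(self, s: int, rng: random.Random) -> list:
        succs = sorted(self.succ.get(s, ()))
        rng.shuffle(succs)  # each worker explores in its own order
        return succs

    def dfs_blue(self, s: int, tid: int, rng: random.Random) -> None:
        self.color[tid][s] = CYAN
        for t in self._successors(s, rng):
            if self.found:
                return
            if self.color[tid][t] == WHITE and not self._is_red(t):  # skip red states
                self.dfs_blue(t, tid, rng)
        if self.found:
            return
        if s in self.acc:
            with self.lock:
                self.count[s] += 1
            self.dfs_red(s, tid, rng)
            if self.found:
                return
        self.color[tid][s] = BLUE

    def dfs_red(self, s: int, tid: int, rng: random.Random) -> None:
        self.pink[tid][s] = True
        for t in self._successors(s, rng):
            if self.found:
                return
            if self.color[tid][t] == CYAN:
                with self.lock:
                    self.found = True  # report cycle and exit all
                return
            if not self.pink[tid][t] and not self._is_red(t):  # skip red states
                self.dfs_red(t, tid, rng)
        if self.found:
            return
        if s in self.acc:
            with self.lock:
                self.count[s] -= 1
            while True:  # await s.count = 0 (busy loop that also polls `found`)
                with self.lock:
                    if self.count[s] == 0 or self.found:
                        break
                time.sleep(0)
            if self.found:
                return
        with self.lock:
            self.red[s] = True
        self.pink[tid][s] = False

    def run(self) -> bool:
        workers = [threading.Thread(target=self.dfs_blue,
                                    args=(self.s_init, tid, random.Random(tid)))
                   for tid in range(self.nthreads)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return self.found
```

Because the outcome must not depend on the interleaving, repeating a run on the same automaton is a cheap (if incomplete) sanity check of such a sketch.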

#### **2.3 Concurrency Verification with VerCors**

Before discussing the actual verification, let us first briefly introduce VerCors, an automated program verifier for parallel programs. VerCors uses concurrent separation logic with permissions as its logical foundation. Its annotation language contains *fractional permission predicates* of the form Perm(s, π), in the style of Boyland [7], that capture the notion of ownership enforced by separation logic, where $s$ is a shared memory location (e.g., a class field) and $\pi \in (0, 1] \cap \mathbb{Q}$ is a *fractional value*. The fractional permissions denote access rights: $\pi = 1$ grants *write access* to $s$, whereas any $\pi < 1$ grants only *read access* to $s$. Perm(s) is sometimes written as shorthand for $\exists \pi : \mathsf{Perm}(s, \pi)$, to indicate *some* ownership of $s$. Soundness of the underlying logic ensures that the total sum of permissions for any shared memory location never exceeds 1, which implies data race freedom.

In addition to ownership predicates, the annotation language supports the ∗∗ connective, which is the *separating conjunction* of separation logic. The assertion $P \mathbin{\ast\ast} Q$ expresses that the ownerships captured by $P$ and $Q$ are *disjoint*; for example, $P$ and $Q$ cannot both express write access to the same shared location. Ownership predicates can be *split* into disjoint parts and be *combined* as follows:

$$\mathsf{Perm}(s,\pi_1+\pi_2) \iff \mathsf{Perm}(s,\pi_1) \mathbin{\ast\ast} \mathsf{Perm}(s,\pi_2)$$
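The bookkeeping behind fractional permissions can be illustrated with exact rational arithmetic. The Python sketch below is a toy model of our own, not part of VerCors: an assertion's footprint is a map from location names to fractions, `star` models ∗∗ by adding fractions and rejecting any total above 1, and `split` reads the equivalence above from left to right.

```python
from fractions import Fraction
from typing import Dict, Optional, Tuple

# Toy model: a footprint maps a location name to the fraction pi of Perm(loc, pi).
Footprint = Dict[str, Fraction]


def star(p: Footprint, q: Footprint) -> Optional[Footprint]:
    """P ** Q: combine footprints; None if some location would exceed 1."""
    result = dict(p)
    for loc, frac in q.items():
        result[loc] = result.get(loc, Fraction(0)) + frac
        if result[loc] > 1:
            return None  # not disjoint: the conjunction is unsatisfiable
    return result


def split(pi: Fraction, pi1: Fraction) -> Tuple[Fraction, Fraction]:
    """Perm(s, pi1 + pi2) => Perm(s, pi1) ** Perm(s, pi2), both parts positive."""
    if not 0 < pi1 < pi:
        raise ValueError("pi1 must lie strictly between 0 and pi")
    return pi1, pi - pi1


def can_read(p: Footprint, loc: str) -> bool:
    return p.get(loc, Fraction(0)) > 0


def can_write(p: Footprint, loc: str) -> bool:
    return p.get(loc, Fraction(0)) == 1
```

Splitting a full permission on `red[3]` into two halves leaves each holder with read access only; recombining the halves restores write access, while adding any further fraction is rejected.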

A standard pattern in concurrency verification is to split and distribute the ownership of all shared memory over *threads* and *locks*. Regarding locks: when multiple threads need to write to a common footprint of shared memory, the ownership of this footprint is typically protected by a *resource invariant*. Threads can then only use the resources protected by this invariant while executing *atomic* instructions (i.e., when no other thread can interfere). For more details we refer to the standard papers on concurrent separation logic [34,8,44].
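A runtime analogue can help build intuition for resource invariants, although VerCors checks them statically rather than at runtime. The hypothetical `Resource` class below guards shared state with a lock: an `atomic()` block may assume the invariant on entry and must re-establish it on exit, otherwise an error is raised. The counter example (`demo_counter`, `demo_violation`) is ours and unrelated to NDFS.

```python
import threading
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator

State = Dict[str, Any]


class ResourceInvariantViolation(AssertionError):
    """Raised when an atomic block leaves the resource invariant broken."""


class Resource:
    def __init__(self, state: State, invariant: Callable[[State], bool]) -> None:
        if not invariant(state):
            raise ResourceInvariantViolation("invariant must hold initially")
        self._lock = threading.Lock()
        self._state = state
        self._invariant = invariant

    @contextmanager
    def atomic(self) -> Iterator[State]:
        with self._lock:                          # no other thread can interfere
            assert self._invariant(self._state)   # may be assumed on entry
            yield self._state
            if not self._invariant(self._state):  # must hold again on exit
                raise ResourceInvariantViolation("atomic block broke the invariant")


def demo_counter(nthreads: int, rounds: int) -> int:
    """Each thread bumps its own slot and the shared total in one atomic block."""
    res = Resource({"parts": [0] * nthreads, "total": 0},
                   lambda st: st["total"] == sum(st["parts"]))

    def work(tid: int) -> None:
        for _ in range(rounds):
            with res.atomic() as st:
                st["parts"][tid] += 1
                st["total"] += 1

    threads = [threading.Thread(target=work, args=(i,)) for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with res.atomic() as st:
        return st["total"]


def demo_violation() -> bool:
    """Updating only the total breaks the invariant, which is detected on exit."""
    res = Resource({"parts": [0], "total": 0},
                   lambda st: st["total"] == sum(st["parts"]))
    try:
        with res.atomic() as st:
            st["total"] += 1
    except ResourceInvariantViolation:
        return True
    return False
```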

## **3 Automated Verification of Parallel NDFS**

This section elaborates on the verification of pndfs with VerCors [35]. Section 3.1 presents and discusses our new correctness argument for pndfs, which includes the new invariant on the red colours and a proof of its correctness. Sections 3.2 and 3.3 discuss the mechanisation of this proof in VerCors.

#### **3.1 Correctness of** pndfs

The soundness proof of pndfs is not very different from the soundness argument of sequential NDFS: every time **report cycle** is executed, a witness cycle can be found. The main challenge lies in proving completeness, i.e., proving that if there exists any accepting cycle, pndfs will report it. This is difficult since workers can *obstruct* each other's red searches and thereby prevent the detection of accepting cycles. This section proposes a new key invariant and completeness proof that is suitable for deductive verification.

We start by introducing a number of low-level invariants on the local configurations of colours that can arise during a run of pndfs. Let $\mathit{Cyan}_{tid} \triangleq \{ s \in S \mid s.\mathit{color}[tid] = \text{cyan} \}$ be the set of cyan-coloured states private to worker *tid*, and likewise for $\mathit{White}_{tid}$, $\mathit{Blue}_{tid}$ and $\mathit{Pink}_{tid}$. Moreover, let $\mathit{Red}$ be the set of globally red states, and $\mathrm{succ}(X) \triangleq \bigcup_{s \in X} \mathrm{succ}(s)$ the *successor set* of a given set $X \subseteq S$.

**Lemma 1.** pndfs *maintains the following global invariants during execution:*


*Proof.* The proof basically checks their preservation by each line of the program.

Invariants *1.1*–*1.5* are reused from [25], whereas *1.6* is new and needed for the new completeness proof. Proving completeness amounts to proving that not all reachable accepting cycles can be missed due to search space pruning. To help prove this, we identify a new class of paths, which we call *tid-special paths*.
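The encodings of Invariants 1.1–1.5 in the resource invariant of Fig. 4 (lines 11–17) can be evaluated directly on a snapshot of the colour arrays, indexed as in Fig. 3. The Python sketch below does so; it is our own illustration, useful for testing an implementation on small inputs but no substitute for the proof. Invariant 1.6 refers to ghost state (Fig. 4, line 22) and is omitted.

```python
from typing import Dict, Sequence, Set

WHITE, CYAN, BLUE = "white", "cyan", "blue"


def colouring_invariants(succ: Sequence[Set[int]], acc: Sequence[bool],
                         color: Sequence[Sequence[str]],
                         pink: Sequence[Sequence[bool]],
                         red: Sequence[bool]) -> Dict[str, bool]:
    """Evaluate Inv. 1.1-1.5 (as encoded in Fig. 4) on one colour snapshot."""
    tids, states = range(len(color)), range(len(red))
    return {
        # 1.1: successors of blue or pink states are blue, cyan or red.
        "1.1": all(color[tid][t] in (BLUE, CYAN) or red[t]
                   for tid in tids for s in states
                   if color[tid][s] == BLUE or pink[tid][s]
                   for t in succ[s]),
        # 1.2: successors of red states are red, or pink and cyan for some worker.
        "1.2": all(red[t] or any(pink[tid][t] and color[tid][t] == CYAN for tid in tids)
                   for s in states if red[s] for t in succ[s]),
        # 1.3: accepting blue states are red.
        "1.3": all(red[s] for tid in tids for s in states
                   if acc[s] and color[tid][s] == BLUE),
        # 1.4: accepting pink states are cyan.
        "1.4": all(color[tid][s] == CYAN for tid in tids for s in states
                   if acc[s] and pink[tid][s]),
        # 1.5: pink states are cyan or blue.
        "1.5": all(color[tid][s] in (CYAN, BLUE) for tid in tids for s in states
                   if pink[tid][s]),
    }
```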

**Definition 1 (Special path).** *Any path* $P = s_0, \ldots, s_{n+1}$ *is defined to be* tid-special *if* $s_0 \in \mathit{Pink}_{tid}$, $s_{n+1} \in \mathit{Cyan}_{tid}$, *and none of the states on* $P$ *are red, i.e.,* $s_k \notin \mathit{Red}$ *for every* $k$ *such that* $0 \le k \le n+1$.

Any path P is *special* if P is *tid*-special for some worker *tid*. Intuitively, the existence of a *tid*-special path during execution of pndfs means that (*i*) worker *tid* is doing a red search, since it has pink states, and (*ii*) this worker will eventually find an accepting cycle, unless other workers obstruct this path. The above definition thus allows us to define obstruction formally: a worker *tid* is *obstructed* (i.e., it will miss an accepting cycle) if any state on a *tid*-special path is coloured red.
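The existence of a *tid*-special path, as encoded by `ExSpecialPath` in Fig. 3 (which additionally requires $1 < |P|$), can be decided on a colour snapshot by a search that never enters red states. The following Python sketch is our own illustration; states and colour arrays are indexed as in Fig. 3.

```python
from collections import deque
from typing import Sequence, Set

CYAN = "cyan"


def ex_special_path(succ: Sequence[Set[int]], color: Sequence[Sequence[str]],
                    pink: Sequence[Sequence[bool]], red: Sequence[bool], tid: int) -> bool:
    """Is there a path with >= 2 states from a pink state of tid to a cyan
    state of tid that visits only non-red states?"""
    seen: Set[int] = set()
    queue: deque = deque()
    for s in range(len(red)):
        if pink[tid][s] and not red[s]:   # candidate first state s_0
            for t in succ[s]:             # take at least one step
                if not red[t] and t not in seen:
                    seen.add(t)
                    queue.append(t)
    while queue:
        u = queue.popleft()
        if color[tid][u] == CYAN:         # a suitable last state s_{n+1}
            return True
        for v in succ[u]:
            if not red[v] and v not in seen:
                seen.add(v)
                queue.append(v)
    return False
```

In the snapshot used below, worker 0 has states 0 and 1 cyan on its blue stack and is red-searching from the accepting state 1, with 1 and 2 pink; the path 2, 0 is special. Making 0 or 2 red removes every special path of worker 0, which is exactly an obstruction in the sense above.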

Our main strategy for proving completeness is to show that every time a worker gets obstructed, a new special path can be found. A direct consequence is that not all accepting cycles can be missed: upon termination of pndfs there are no more cyan or pink states, so no special path can exist. To help prove this, we use the following property (taken from [25], but rephrased to handle our special paths), which allows special paths to be found by using the colouring invariants.

**Lemma 2.** *If invariants 1.1–1.6 are satisfied, then every path* $P = s_0, \ldots, s_{n+1}$ *with* $s_0 \in \mathit{Red}$ *and* $s_{n+1} \in \mathcal{A} \setminus \mathit{Red}$ *contains a special subpath.*

*Proof.* The original handwritten proof from [25] shows that this lemma follows from invariants *1.1* –*1.6* , by induction on <sup>P</sup>.

The original completeness proof of [25] performs induction on the number of obstructed accepting cycles, to show the absence of such cycles upon termination as a result of Lemma 2. However, such an argument is out of reach for Hoare-style reasoning, since it is not an *inductive* invariant. We propose a new invariant that *is* inductive, which builds on the insight that, under certain colouring conditions, new special paths can always be found when workers get obstructed, as is shown by Lemma 3. In particular, pndfs guarantees that if there exists a special path before executing line 24, then there also exists a special path after its execution.

**Lemma 3.** *For any non-red state* $r \in S \setminus \mathit{Red}$ *that is on a tid-special path, if:*

*i.* $r \in \mathcal{A} \implies \mathrm{succ}(r) \subseteq \mathit{Red}$, *and*

*ii.* $r \in \mathcal{A} \cap \mathit{Pink}_{tid} \implies \mathit{Pink}_{tid} = \{r\}$,

*then there still exists a special path after adding* $r$ *to* $\mathit{Red}$.

*Proof.* Let $P = s_0, \ldots, s_{n+1}$ be a *tid*-special path and assume that $r$ is on $P$, so that $r = s_\ell$ for some $\ell$ such that $0 \le \ell \le n+1$. Since $\mathit{Pink}_{tid} \neq \emptyset$, worker *tid* is performing a dfsred that was started from some accepting state $a \in \mathcal{A} \cap \mathit{Pink}_{tid}$. Then $a \neq r$, as otherwise $s_0 = a$ due to *ii.*, which by *i.* would contradict that $P$ is special. Moreover, since $s_{n+1} \in \mathit{Cyan}_{tid}$, there exists an $(s_{n+1}, a)$-path $Q$ (this is a standard property of dfsblue: the path $Q$ must be on the recursive call stack). Then Lemma 2 applies to the path $s_\ell, \ldots, s_{n+1}, Q[1..]$ and gives a new special path when considering $\mathit{Red} \cup \{r\}$ as the new set of red states.

Lemma 3 implies that every time an accepting cycle is missed due to pruning, there is always another accepting cycle that will eventually be reported. This is enough to establish completeness of pndfs, via the following key invariant.

**Lemma 4.** *The* pndfs *algorithm maintains the global invariant that either:*

*4.1. All reachable accepting cycles contain an accepting state that is not red; or*

*4.2. There exists a special path.*

*Proof.* The interesting case is showing that this invariant is preserved when a non-red state $s \in \mathit{Pink}_{tid} \setminus \mathit{Red}$ is made red (on line 24 of Fig. 2) by some worker *tid* that is doing a red search from some accepting state $a \in \mathcal{A} \cap \mathit{Pink}_{tid}$.
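On small examples, the key invariant can also be evaluated on a colour snapshot. The Python sketch below is our own illustration: it checks disjunct 4.1 in the form encoded on line 31 of Fig. 4 (every reachable accepting state that lies on a cycle is non-red), and disjunct 4.2 by searching, for each worker, for a special path through non-red states.

```python
from collections import deque
from typing import Callable, Iterable, Sequence, Set

CYAN = "cyan"


def _search(succ: Sequence[Set[int]], starts: Iterable[int],
            allowed: Callable[[int], bool]) -> Set[int]:
    """States reachable in >= 1 step from `starts`, moving only through allowed states."""
    seen: Set[int] = set()
    queue: deque = deque()
    for s in starts:
        for t in succ[s]:
            if allowed(t) and t not in seen:
                seen.add(t)
                queue.append(t)
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if allowed(v) and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def key_invariant(succ, acc, color, pink, red, s_init: int) -> bool:
    n = len(red)

    def anywhere(_s: int) -> bool:
        return True

    def has_special(tid: int) -> bool:  # ExSpecialPath(tid)
        starts = [s for s in range(n) if pink[tid][s] and not red[s]]
        ends = _search(succ, starts, lambda s: not red[s])
        return any(color[tid][s] == CYAN for s in ends)

    reach = {s_init} | _search(succ, [s_init], anywhere)             # ExPath(sI, s, 1)
    on_cycle = {s for s in range(n) if s in _search(succ, [s], anywhere)}  # ExPath(s, s, 2)
    # Disjunct 4.1 (Fig. 4, line 31): reachable accepting states on a cycle are not red.
    no_red_cycle = all(not red[s] for s in reach & on_cycle if acc[s])
    # Disjunct 4.2: some worker has a special path.
    return no_red_cycle or any(has_special(tid) for tid in range(len(color)))
```

The second assertion in the accompanying check describes a snapshot that violates the invariant; Lemma 4 states that pndfs never produces such a snapshot.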


The next theorem shows how Lemma 4 allows us to derive completeness of parallel NDFS. In particular, it shows that no reachable accepting cycle can exist once all threads have terminated, in which case all the theorem's premises are fulfilled.

**Theorem 1.** *If for every worker tid it holds that* $\mathit{Pink}_{tid} = \emptyset$, $\mathit{Cyan}_{tid} = \emptyset$ *and* $s_I \in \mathit{Blue}_{tid}$, *then there does not exist a reachable accepting cycle.*

*Proof.* Towards a contradiction, suppose that there exists an accepting cycle $P$ that is reachable via an $(s_I, P[0])$-path $Q$. Due to the theorem's premises no special paths can exist, and therefore by Lemma 4 there is an accepting state on $P$ that is not red. Without loss of generality, assume that (†) $P[0] \in \mathcal{A} \setminus \mathit{Red}$. Since $Q[0] \in \mathit{Blue}_0$ (there is at least one worker), induction on $Q$ together with Lemma 1 gives $P[0] \in \mathit{Red}$, which contradicts (†).

All the above invariants and proof steps have been encoded in VerCors, which was highly non-trivial. While mechanising the proofs, many implicit proof steps had to be made explicit. Section 3.3 further details the proof mechanisation.

#### **3.2 Encoding of** pndfs **in VerCors**

Graph structures are notoriously difficult to handle in separation logics, as they usually rely on pointer aliasing, which complicates ownership handling and prevents easy use of the frame rule [38]. However, since automata have a fixed and finite set of states, we can overcome this limitation by representing the input automata as an |S| × |S| *adjacency matrix*. This does not impose serious restrictions: other automata encodings can be transformed at the specification level to an adjacency matrix, e.g., via model fields in the style of JML [11,29]. The suitability of adjacency matrices for deductive verification is confirmed by [24].
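The transformation from other automata encodings to an adjacency matrix is straightforward in executable form. A minimal Python sketch, assuming the automaton is given as a successor map over the states 0..N−1 (the function names are ours):

```python
from typing import Dict, Iterable, List, Set


def to_adjacency_matrix(n: int, succ: Dict[int, Iterable[int]]) -> List[List[bool]]:
    """Build the N x N Boolean matrix G with G[s][t] iff t is in succ(s)."""
    g = [[False] * n for _ in range(n)]
    for s, targets in succ.items():
        for t in targets:
            if not (0 <= s < n and 0 <= t < n):
                raise ValueError(f"edge ({s}, {t}) leaves the state range 0..{n - 1}")
            g[s][t] = True
    return g


def succ_of(g: List[List[bool]], s: int) -> Set[int]:
    """Recover succ(s) from the matrix, as in the encoding of Fig. 3."""
    return {t for t, edge in enumerate(g[s]) if edge}
```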

```
 1 enum Color {white, cyan, blue};
 2 int N; // the number of automata states (equal to |S|)
   ⋮
11 resource resource_invariant ≜ ··· ; // full definition is deferred to Fig. 4.
12
   bool Path(int s, int t, seq<int> P) ≜ // the encoding of (s, t)-paths in G
       0 ≤ s, t < N ∧ 0 < |P| ∧ P[0] = s ∧ P[|P| − 1] = t ∧
       (∀i : 0 ≤ i < |P| ⇒ 0 ≤ P[i] < N) ∧ (∀i : 0 ≤ i < |P| − 1 ⇒ G[P[i]][P[i+1]]);
   bool Path(seq<int> P) ≜ 0 < |P| ∧ Path(P[0], P[|P| − 1], P);
   bool ExPath(int s, int t, int n) ≜ ∃P : n ≤ |P| ∧ Path(s, t, P);
   bool SpecialPath(seq<int> P, int tid) ≜ // the encoding of tid-special paths
       pink[tid][P[0]] ∧ color[tid][P[|P| − 1]] = cyan ∧ ∀i : 0 ≤ i < |P| ⇒ ¬red[P[i]];
   bool ExSpecialPath(int tid) ≜ ∃P : 1 < |P| ∧ Path(P) ∧ SpecialPath(P, tid);
21 /* An excerpt of the top-level contract (further discussed in Section 3.3). */
   ensures \result ⇒ (∃a : 0 ≤ a < N ∧ acc[a] ∧ ExPath(sI, a, 1) ∧ ExPath(a, a, 2));  // soundness
   ensures (∃a : 0 ≤ a < N ∧ acc[a] ∧ ExPath(sI, a, 1) ∧ ExPath(a, a, 2)) ⇒ \result;  // completeness
   bool pndfs(int sI);
```

Fig. 3: The automata representation and an excerpt of pndfs's top-level contract.

Figure 3 shows the encoding of the input automaton G in VerCors. The thread-local colour sets are represented as matrices of dimension *nthreads* × |S|, so that each thread *tid* uses *color*[*tid*][·] and *pink*[*tid*][·] to record its (local) exploration status. The sets of red and accepting states are shared between threads and are thus encoded as |S|-sized Boolean arrays. The succ function can now be defined such that t ∈ succ(s) if and only if G[s][t] is true, for every 0 ≤ s, t < N.

This encoding of automata, together with an encoding of the definition of paths (on lines 13–17), is sufficient to express the main correctness property that is proven by VerCors. More specifically, line 23 expresses *soundness*: a positive return value implies the existence of a reachable accepting cycle. Line 24 expresses *completeness*: if there exists a reachable accepting cycle, then pndfs returns positively.
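The path predicates and the right-hand side of the two **ensures** clauses of Fig. 3 can be prototyped as an executable reference oracle, for instance to cross-check an implementation on small automata. In the hypothetical Python sketch below, `ex_path(g, s, t, n)` decides ExPath(s, t, n) over the matrix encoding; since paths may revisit states, it searches pairs of a state and the number of states seen so far, capped at n to keep the search finite.

```python
from collections import deque
from typing import List, Sequence


def ex_path(g: List[List[bool]], s: int, t: int, n: int) -> bool:
    """ExPath(s, t, n): is there a path from s to t with at least n states?"""
    size = len(g)
    if not (0 <= s < size and 0 <= t < size):
        return False
    cap = max(n, 1)
    start = (s, 1)  # the one-state path s
    seen = {start}
    queue = deque([start])
    while queue:
        u, k = queue.popleft()  # k = min(#states so far, cap)
        if u == t and k >= n:
            return True
        for v in range(size):
            if g[u][v]:
                nxt = (v, min(k + 1, cap))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


def has_reachable_accepting_cycle(g: List[List[bool]], acc: Sequence[bool], s_init: int) -> bool:
    """exists a : acc[a] and ExPath(sI, a, 1) and ExPath(a, a, 2)."""
    return any(acc[a] and ex_path(g, s_init, a, 1) and ex_path(g, a, a, 2)
               for a in range(len(g)))
```

ExPath(a, a, 2) requires at least two states, i.e. at least one transition, which rules out the trivial one-state path from a to itself.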

**Atomic operations.** The handwritten correctness argument of [25] for Figure 2 assumes that all program lines are executed atomically. This is reflected in the VerCors encoding: all updates to shared memory are made within atomic operations, all of which, at the specification level, give access to the same shared resources. For example, the assignment s.*pink*[*tid*] := *true* on line 15 (Fig. 2) is implemented as the atomic operation "**atomic** { *pink*[*tid*][s] := *true* }". At the specification level, the atomic sub-program receives all the missing access rights required for the assignment, which are otherwise protected by the resource invariant declared on line 11 (Fig. 3). The exact definition of the resource invariant is deferred to §3.3, and the type **resource** is the type of separation logic assertions. Moreover, the **await** instruction on line 23 (Fig. 2) is implemented as a busy while-loop that only stops when s.*count* = 0, which is checked atomically in every iteration.
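A busy-waiting **await** of this kind can be sketched in Python as follows. `AtomicCounter` is a hypothetical stand-in for s.*count*, and the additional `abort` check is our own assumption (mirroring the abort flag of the VerCors encoding) so that a waiting worker can still exit once a cycle has been reported elsewhere.

```python
import threading
import time


class AtomicCounter:
    """Hypothetical model of s.count: every access happens under a lock."""

    def __init__(self, value: int = 0) -> None:
        self._lock = threading.Lock()
        self._value = value

    def add(self, delta: int) -> None:
        with self._lock:
            self._value += delta

    def get(self) -> int:
        with self._lock:
            return self._value


def await_zero(count: AtomicCounter, abort: threading.Event) -> bool:
    """Spin until count reaches 0 (True) or abort is raised first (False)."""
    while True:
        if count.get() == 0:  # atomic read in every iteration
            return True
        if abort.is_set():
            return False
        time.sleep(0)         # yield to other threads


def demo_await(workers: int) -> bool:
    """`workers` threads leave their red search after a short delay; we await them."""
    count = AtomicCounter(workers)

    def leave() -> None:
        time.sleep(0.01)
        count.add(-1)

    threads = [threading.Thread(target=leave) for _ in range(workers)]
    for t in threads:
        t.start()
    done = await_zero(count, threading.Event())
    for t in threads:
        t.join()
    return done and count.get() == 0
```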

**Termination handling.** The pseudocode in Figure 2 uses an "**exit all**" command to terminate all threads when an accepting cycle has been found, but leaves the underlying mechanism implicit. Our formalisation in VerCors makes the termination system explicit: it consists primarily of a global *abort* flag that is declared on line 9 in Figure 3. All workers regularly poll this flag to determine whether to continue. The *abort* flag is set to true by the main thread (the thread that started pndfs and spawned all worker threads on line 11 of Fig. 2) as soon as one of the workers returns with an accepting cycle.
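The division of labour described above, where workers poll a flag and only the main thread raises it, can be sketched in Python with a result queue and a `threading.Event` standing in for *abort*. The worker functions are hypothetical placeholders for dfsblue, not the verified code.

```python
import queue
import threading
from typing import Callable, List

Worker = Callable[[threading.Event], bool]  # returns True iff it found a cycle


def run_workers(workers: List[Worker]) -> bool:
    """Main thread: spawn all workers and raise `abort` once one reports a cycle."""
    abort = threading.Event()
    results: "queue.Queue[bool]" = queue.Queue()
    threads = [threading.Thread(target=lambda w=w: results.put(w(abort)))
               for w in workers]
    for t in threads:
        t.start()
    found = False
    for _ in threads:      # collect one result per worker
        if results.get():  # blocks until some worker returns
            found = True
            abort.set()    # only the main thread announces termination
    for t in threads:
        t.join()
    return found


def demo_finder(abort: threading.Event) -> bool:
    return True            # pretends to find an accepting cycle at once


def demo_explorer(abort: threading.Event) -> bool:
    abort.wait()           # blocks until abort is raised (stand-in for polling)
    return False


def demo_idle(abort: threading.Event) -> bool:
    return False           # finishes its exploration without finding a cycle
```

Note that `demo_explorer` only terminates once some other worker reports a cycle; a real worker would instead poll the flag between search steps and also stop when its exploration is complete.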

#### **3.3 Verification of** pndfs **in VerCors**

One major challenge of concurrency verification is finding a proper distribution of shared-memory ownership, that allows proving memory safety as well as any functional properties of interest. This section starts by discussing how we distribute the ownership of the input automaton over threads and the resource invariant, in such a way that Invariants *1.1* –*1.6* and *4.1* –*4.2* can be encoded.

To prove the preservation of these invariants after every computation step, auxiliary bookkeeping is needed at the specification level. For example, to mechanise the proofs of Lemmas 3 and 4 we need to make explicit that all workers *tid* with $\mathit{Pink}_{tid} \neq \emptyset$ are doing a red search that was started from some root state $a \in \mathcal{A} \cap \mathit{Pink}_{tid}$. This bookkeeping is maintained in the resource invariant via *auxiliary ghost state*, which is explained later. Finally, we give the fully annotated version of pndfs and explain how completeness is proven from Lemma 4, by applying the VerCors encoding of Theorem 1.

**Ownership distribution.** We start by explaining how the ownership of the automaton encoding (lines 2–8 in Fig. 3) is distributed among workers and the resource invariant. First observe that all colouring invariants express *global properties* that span (*i*) the shared *red* colourings, as well as (*ii*) the local configurations *color*[*tid*] and *pink*[*tid*] of every worker *tid*. For (*i*), observe that the only way to distribute the access rights to *red* such that every thread can regain write access is to let the resource invariant protect full ownership of *red*. The resource invariant therefore fully captures the properties about red states expressed in Lemmas 1 and 4. However, to be able to express these properties, it also requires partial ownership of all thread-local colourings.
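The arithmetic behind this distribution can be checked mechanically. The Python sketch below (our own illustration) adds up the fractions held by the resource invariant in Fig. 4 (all of *red* and half of each *color*/*pink* entry, lines 3–4) and the half of its own entries that each worker receives in the dfsblue contract (Fig. 5, line 4). Every location then sums to exactly 1, while neither the invariant nor a worker alone holds enough to write a worker's colours.

```python
from fractions import Fraction
from typing import Dict, Tuple

Loc = Tuple[object, ...]  # e.g. ("red", s) or ("color", tid, s)
HALF, FULL = Fraction(1, 2), Fraction(1)


def invariant_share(nthreads: int, n: int) -> Dict[Loc, Fraction]:
    """Fractions held by the resource invariant (Fig. 4, lines 3-4)."""
    share: Dict[Loc, Fraction] = {("red", s): FULL for s in range(n)}
    for tid in range(nthreads):
        for s in range(n):
            share[("color", tid, s)] = HALF
            share[("pink", tid, s)] = HALF
    return share


def thread_share(tid: int, n: int) -> Dict[Loc, Fraction]:
    """Fractions held by worker tid itself (Fig. 5, line 4)."""
    return {(arr, tid, s): HALF for arr in ("color", "pink") for s in range(n)}


def combine(*shares: Dict[Loc, Fraction]) -> Dict[Loc, Fraction]:
    """Add up the fractions per location, as when combining assertions with **."""
    total: Dict[Loc, Fraction] = {}
    for share in shares:
        for loc, frac in share.items():
            total[loc] = total.get(loc, Fraction(0)) + frac
    return total
```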

Figure 4 presents the full resource invariant, that includes: access rights to both global and thread-local colourings on lines 2–4; the encoding of Lemma 1 on lines 10–17 and 22; and the encoding of Lemma 4 on lines 30–32. In addition, the resource invariant holds partial ownership of the *abort* flag on line 8, to ensure that global termination is only announced when an accepting cycle is found.

```
 1 resource resource_invariant ≜
 2     Perm(N) ∗∗ Perm(nthreads) ∗∗ Perm(G) ∗∗ Perm(acc) ∗∗
 3     (∀tid, s : Perm(color[tid][s], 1/2) ∗∗ Perm(pink[tid][s], 1/2)) ∗∗
 4     (∀s : Perm(red[s], 1)) ∗∗
 5     termination() ∗∗ colourings() ∗∗ dfsred_status() ∗∗ keyinvariant();
 6
 7 resource termination() ≜ // Resources for termination handling.
 8     Perm(abort, 1/2) ∗∗ (abort ⇒ ∃s : acc[s] ∧ ExPath(sI, s, 1) ∧ ExPath(s, s, 2));
 9
10 resource colourings() ≜ // The low-level colouring invariant encodings.
11     (∀tid, s : (color[tid][s] = blue ∨ pink[tid][s]) ⇒ ∀s' ∈ succ(s) :
12         color[tid][s'] = blue ∨ color[tid][s'] = cyan ∨ red[s']) ∗∗              // Inv. 1.1
13     (∀s : red[s] ⇒ ∀s' ∈ succ(s) :
14         red[s'] ∨ ∃tid : pink[tid][s'] ∧ color[tid][s'] = cyan) ∗∗              // Inv. 1.2
15     (∀tid, s : (acc[s] ∧ color[tid][s] = blue) ⇒ red[s]) ∗∗                      // Inv. 1.3
16     (∀tid, s : (acc[s] ∧ pink[tid][s]) ⇒ color[tid][s] = cyan) ∗∗                // Inv. 1.4
17     (∀tid, s : pink[tid][s] ⇒ (color[tid][s] = cyan ∨ color[tid][s] = blue));    // Inv. 1.5
18
19 /* Auxiliary ghost state for proving Lemma 3 and preserving Inv. 4. */
20 resource dfsred_status() ≜ ∀tid : (
21     Perm(exploringred[tid], 1/2) ∗∗ Perm(redroot[tid], 1/2) ∗∗ Perm(waiting[tid], 1/2) ∗∗
22     (∀s : pink[tid][s] ⇒ (exploringred[tid] ∧ (acc[s] ⇒ s = redroot[tid]))) ∗∗  // Inv. 1.6
23     (exploringred[tid] ⇒ acc[redroot[tid]] ∧
24         (∀s : pink[tid][s] ⇒ ExPath(redroot[tid], s, 1)) ∧
25         (∀s : color[tid][s] = cyan ⇒ ExPath(s, redroot[tid], 1)) ∧
26         (¬waiting[tid] ⇒ ¬red[redroot[tid]]) ∧
27         (waiting[tid] ⇒ ∀s : pink[tid][s] ⇔ s = redroot[tid])));
28
29 /* The encoding of Lemma 4, from which completeness of pndfs follows. */
30 resource keyinvariant() ≜
31     (∀s : acc[s] ∧ ExPath(sI, s, 1) ∧ ExPath(s, s, 2) ⇒ ¬red[s]) ∨
32     (∃tid : ExSpecialPath(tid));
```

Fig. 4: The full definition of the resource invariant. Several bound checks have been omitted for presentational clarity.

Observe that the resource invariant holds a lot of quantified information. As a result, we found that proving that the resource invariant is re-established at the end of each **atomic** block is expensive performance-wise. To make verification more efficient, we extracted all atomic operations (e.g., colour updates) into separate methods and proved their contracts in a function-modular way. This improves performance, as it cuts the problem of verifying dfsred and dfsblue into smaller sub-problems that are individually more manageable for the SMT solver.

Finally, Figure 5 presents an excerpt of the contract of dfsblue, which shows the ownership pattern of all threads. Notably, every thread *tid* receives the remaining ownership of *color*[*tid*] and *pink*[*tid*] on line 4. Thus threads can always read their thread-local colour fields, and may write to them when doing so atomically: inside an **atomic** block a thread additionally holds the half owned by the resource invariant, which gives it the full permission required for writing. This distribution of ownership matches the encoding of atomic operations discussed earlier. Line 7 expresses soundness of dfsblue, which the resource invariant captures upon global termination (line 8 of Fig. 4). This allows soundness of pndfs to be deduced from the resource invariant once all threads have terminated as a result of detecting an accepting cycle.

```
 1 context Perm(N) ∗∗ Perm(nthreads) ∗∗ Perm(G) ∗∗ Perm(acc);
 2 context 0 ≤ s < N;
 3 context 0 ≤ tid < nthreads;
 4 context ∀t : 0 ≤ t < N ⇒ Perm(color[tid][t], 1/2) ∗∗ Perm(pink[tid][t], 1/2);
 5 requires color[tid][s] = white;
 6 requires ∀t : (0 ≤ t < N ∧ color[tid][t] = cyan) ⇒ ExPath(t, s, 1);
 7 ensures \result ⇒ ∃a : 0 ≤ a < N ∧ acc[a] ∧ ExPath(sI, a, 1) ∧ ExPath(a, a, 2);
 8 ensures ¬\result ⇒ ∀t : color[tid][t] = cyan ⇔ \old(color[tid][t]) = cyan;
 9 ensures ¬\result ⇒ pink[tid] = \old(pink[tid]) ∧ color[tid][s] = blue;
10 bool dfsblue(s, tid)
11 ···
```
Fig. 5: The ownership specification in the contract of dfsblue for thread *tid*. Annotations of the form **context** P abbreviate **requires** P; **ensures** P.
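The read/write split described above follows the usual fractional-permission discipline: any positive fraction allows reading, and only the full permission 1 allows writing. The toy Python model below is a minimal sketch of that accounting for a single location; the class `PermissionHeap`, the owner names and the `atomic` helper are hypothetical and only mimic how a thread's half combines with the resource invariant's half inside an **atomic** block. It is not how VerCors implements permissions.

```python
from contextlib import contextmanager
from fractions import Fraction

HALF = Fraction(1, 2)


class PermissionHeap:
    """Toy fractional-permission accounting (hypothetical, for illustration only)."""

    def __init__(self):
        self.perms = {}  # (owner, location) -> Fraction in [0, 1]

    def held(self, owner, loc):
        return self.perms.get((owner, loc), Fraction(0))

    def grant(self, owner, loc, amount):
        self.perms[(owner, loc)] = self.held(owner, loc) + amount

    def take(self, owner, loc, amount):
        if self.held(owner, loc) < amount:
            raise PermissionError(f"{owner} holds less than {amount} of {loc}")
        self.perms[(owner, loc)] = self.held(owner, loc) - amount

    def can_read(self, owner, loc):
        return self.held(owner, loc) > 0  # any positive fraction permits reading

    def can_write(self, owner, loc):
        return self.held(owner, loc) == 1  # writing requires full ownership


@contextmanager
def atomic(heap, thread, loc):
    """Temporarily move the invariant's half of `loc` to `thread`, as in an atomic block."""
    heap.take("invariant", loc, HALF)
    heap.grant(thread, loc, HALF)
    try:
        yield
    finally:
        # Re-establish the invariant by handing its half back.
        heap.take(thread, loc, HALF)
        heap.grant("invariant", loc, HALF)
```

For example, with `loc = ("color", 0, 3)` split evenly between `"invariant"` and thread `"t0"`, `can_read("t0", loc)` holds but `can_write("t0", loc)` does not; inside `with atomic(heap, "t0", loc):` the thread temporarily holds the full permission and may write.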

**Auxiliary ghost state.** As mentioned earlier, to prove that pndfs also *preserves* the (encodings of) Invariants *1.1*–*1.6* and *4.1*–*4.2* after every computation step, additional ghost state needs to be maintained. In particular, we need to make explicit that every worker *tid* with $\mathit{Pink}_{tid} \neq \emptyset$ is doing a dfsred search that was started from some root state $a \in A \cap \mathit{Pink}_{tid}$. In addition, the proof of Lemma 3 needs an $(s, a)$-path to exist for every $s \in \mathit{Cyan}_{tid}$. To prove the preservation of Lemma 4 we also need that $a \notin \mathit{Red}$ if worker *tid* is not yet executing the **await** instruction, and that $\mathit{Pink}_{tid} = \{a\}$ otherwise.

This extra information is encoded in the resource invariant on lines 20–27 (Figure 4), via three *ghost arrays*, named *exploringred*, *redroot* and *waiting*. Firstly, *exploringred* records which workers are doing a red search. For verification purposes we added ghost code to the program that sets *exploringred*[*tid*] to true whenever dfsred(a, *tid*) is invoked by worker *tid* from a blue search, and back to false whenever dfsred(a, *tid*) returns. Secondly, *redroot* stores the root state on which dfsred was invoked. Finally, *waiting* records which workers are executing an **await** instruction. Together, these three ghost arrays are closely related to the s.*count* fields in the program of Figure 2, via the following invariant:

$$\forall s : s.\mathit{count} = \left|\{\mathit{tid} \mid \mathit{exploringred}[\mathit{tid}] \wedge \mathit{redroot}[\mathit{tid}] = s \wedge \neg\mathit{waiting}[\mathit{tid}]\}\right|$$
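The invariant amounts to a counting check over a snapshot of the ghost state. The Python sketch below is our own illustration and not part of the VerCors encoding; representing the ghost arrays as lists indexed by thread id and the s.*count* fields as a dictionary from states to integers are assumptions.

```python
from collections import Counter


def expected_counts(exploringred, redroot, waiting):
    """For every state s: |{tid | exploringred[tid] and redroot[tid] == s and not waiting[tid]}|."""
    return Counter(
        redroot[tid]
        for tid in range(len(exploringred))
        if exploringred[tid] and not waiting[tid]
    )


def count_invariant_holds(count, exploringred, redroot, waiting):
    """Compare s.count with the ghost arrays for every state mentioned on either side."""
    expected = expected_counts(exploringred, redroot, waiting)
    states = set(count) | set(expected)
    return all(count.get(s, 0) == expected[s] for s in states)
```

For example, with `exploringred = [True, True, False, True]`, `redroot = [5, 5, 0, 7]` and `waiting = [False, True, False, False]`, only workers 0 and 3 contribute: worker 1 is already waiting, and the `redroot` entry of worker 2 is irrelevant because it is not exploring red.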

Establishing that pndfs adheres to the invariants in Lemmas 1 and 4 was highly non-trivial and required various complex auxiliary lemmas to be encoded and proven. These are all encoded in VerCors as *ghost methods*: side-effect-free helper methods whose contracts encode the lemmas [21,22]. Induction proofs, for example, are encoded using either loop invariants or recursion. Applying a lemma then amounts to a method call at the specification level. The proofs in Section 3.1 are all encoded and applied in this way.

```
 1 context Perm(N) ∗∗ Perm(nthreads) ∗∗ Perm(G) ∗∗ Perm(acc) ∗∗ Perm(abort, 1/2);
 2 context ∀tid, s : Perm(color[tid][s], 1/2) ∗∗ Perm(pink[tid][s], 1/2);
 3 context ∀tid : Perm(exploringred[tid], 1/2) ∗∗ Perm(redroot[tid], 1/2);
 4 context ∀tid : Perm(waiting[tid], 1/2);
 5 context 0 ≤ sI < N;
 6 requires ∀tid, s : ¬exploringred[tid] ∧ color[tid][s] = white ∧ ¬pink[tid][s];
 7 ensures \result ⇒ (∃a : acc[a] ∧ ExPath(sI, a, 1) ∧ ExPath(a, a, 2));
 8 ensures (∃a : acc[a] ∧ ExPath(sI, a, 1) ∧ ExPath(a, a, 2)) ⇒ \result;
 9 bool pndfs(sI)
10   par tid = 0 to nthreads
11     context Perm(N) ∗∗ Perm(nthreads) ∗∗ Perm(G) ∗∗ Perm(acc);
12     context ∀s : Perm(color[tid][s], 1/2) ∗∗ Perm(pink[tid][s], 1/2);
13     context Perm(term[tid], 1/2) ∗∗ Perm(exploringred[tid], 1/2);
14     context Perm(redroot[tid], 1/2) ∗∗ Perm(waiting[tid], 1/2);
15     requires ¬exploringred[tid] ∧ ∀s : color[tid][s] = white ∧ ¬pink[tid][s];
16     ensures ¬abort ⇒ ∀s : color[tid][s] ≠ cyan ∧ ¬pink[tid][s];
17     ensures ¬abort ⇒ color[tid][sI] = blue;
18   do
19     bool found := dfsblue(sI, tid);
20     if found then
21       atomic { abort := true; } // initiate global termination.
22   atomic { if ¬abort then theorem_one() }; // apply Thm. 1's encoding.
23   return abort;
```
Fig. 6: The annotated version of pndfs, extending the excerpt given in Figure 3.

**Correctness of** pndfs**.** Figure 6 gives the annotated version of pndfs,<sup>6</sup> which extends the excerpt given earlier in lines 23–25 of Figure 3. The main thread requires partial ownership of all thread-local colour fields on line 2 and distributes these over the appropriate threads on line 12. The contract associated with the parallel block (lines 11–17) is called an *iteration contract* and assigns pre- and postconditions to every parallel instance. For more details on iteration contracts we refer to [5]. Most importantly, the iteration contract of each thread holds enough resources to satisfy all preconditions of the call to dfsblue on line 19.

Soundness of pndfs (line 7) is proven as follows. Suppose that all threads have terminated and *abort* has been set to *true*. In that case, the resource invariant states that an accepting cycle has been found. This information can be retrieved by briefly obtaining the resource invariant in *ghost code* on line 22, from which soundness follows directly. Note that this information is not lost upon releasing the resource invariant, as it is a Boolean property and thus duplicable.

To prove completeness, suppose that *abort* is still false when all workers have terminated. This implies that $\mathit{Pink}_{tid} = \emptyset$ and $\mathit{Cyan}_{tid} = \emptyset$ for every worker *tid* (line 16), as well as $s_I \in \mathit{Blue}_{tid}$ (line 17), since all threads started their blue search from $s_I$. Combining this information with the information in the resource invariant allows one to prove all the premises of Theorem 1. Therefore its ghost method encoding can be applied on line 22, from which completeness is derived.

<sup>6</sup> Observe that every thread reads *abort* in its contract on lines 16–17, even though it does not have the required access rights to do so. This is resolved by adding some auxiliary ghost state, which is omitted for presentational clarity.

Fig. 7: Two extensions (highlighted grey) to dfsblue that improve work sharing.

The encoding of parallel NDFS in VerCors [35] comprises roughly 2500 lines of code (of which ∼85% is proof overhead), including the mechanisation of all proof steps described in §3.1. Verification takes about 140 s on a MacBook with a 2.9 GHz Intel Core i5 CPU and 8 GB of memory.

## **4 Optimisations**

One major benefit of mechanically verified code is that optimisations can be applied with full confidence. Without verification, changes to critical code are often avoided to ensure that no errors are introduced. A verified algorithm makes optimisations easy to apply, as they often do not change the outer contract and at most require minor adaptations to the invariants. We illustrate this with two optimisations, for which [25] experimentally demonstrated improved speedup.

"Early cycle detection" checks already during the blue search whether an accepting cycle is closed, cf. lines 4–6 in Figure 7a. It is known that for weak LTL properties, all accepting cycles will be found in the blue search when applying early cycle detection. To show that this optimisation preserves all invariants, we simply inserted these 3 lines into the VerCors specification. The proof introduces a case distinction on whether s or t is accepting and constructs a witness path. This adds another 10 lines: two for the case distinction and four in each branch to show that a witness accepting cycle exists. Collectively, these 13 extra lines constitute very little effort to prove this particular optimisation correct.

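To make the early-cycle-detection check concrete, here is a minimal Python sketch of a *sequential* NDFS (not pndfs), with the graph given as a dictionary from states to successor lists. Figure 7a is not reproduced here, so the function and parameter names are our own. The extra check fires when the blue search meets a cyan successor t and either s or t is accepting, which is the same case distinction used in the proof above: since t is on the DFS stack, there is a path from t to s, and the edge (s, t) closes a cycle through an accepting state.

```python
WHITE, CYAN, BLUE = "white", "cyan", "blue"


def ndfs(graph, accepting, init, early_cycle_detection=True):
    """Sequential nested DFS: True iff an accepting cycle is reachable from init.

    graph maps each state to a list of successors; accepting is a set of states.
    Recursive, so only suitable for small graphs.
    """
    color = {}   # local colours; an absent entry means WHITE
    red = set()  # red states

    def dfs_red(s):
        red.add(s)
        for t in graph.get(s, []):
            if color.get(t, WHITE) == CYAN:
                return True  # red search reached the DFS stack: accepting cycle
            if t not in red and dfs_red(t):
                return True
        return False

    def dfs_blue(s):
        color[s] = CYAN
        for t in graph.get(s, []):
            # Early cycle detection: t is on the stack, so t reaches s; the edge
            # (s, t) closes a cycle, which is accepting if s or t is accepting.
            if (early_cycle_detection and color.get(t, WHITE) == CYAN
                    and (s in accepting or t in accepting)):
                return True
            if color.get(t, WHITE) == WHITE and dfs_blue(t):
                return True
        if s in accepting and dfs_red(s):
            return True
        color[s] = BLUE
        return False

    return dfs_blue(init)
```

On the graph `{0: [1], 1: [2], 2: [1]}` with accepting state 2, the early check already reports the cycle 1 → 2 → 1 when the blue search at state 2 sees the cyan state 1; with `early_cycle_detection=False`, the same cycle is instead found by the red search started from 2.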
The second optimisation, called "all-red", checks whether all successors of s became red during the blue search (lines 2 and 7 in Figure 7b). If so, we can mark s.*red* early (lines 8–10). This optimisation is important, since it allows the global red colour to spread even in portions of the graph that are not under an accepting state, thereby allowing more pruning. However, this optimisation only preserves the invariants if we wait until s.*count* = 0 (line 9). This test was erroneously omitted in [25].<sup>7</sup> Fortunately, the version in Figure 7b is correct, as has now been checked in VerCors in a straightforward manner.
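The effect of the guard can be illustrated with ordinary thread synchronisation. The Python sketch below is hypothetical and is not the code of Figure 7b: `SharedState` stands in for the shared fields of a state, and `threading.Condition.wait_for` plays the role of `await s.count = 0`. Following the invariant relating s.*count* to the ghost arrays, `count` is meant to track the workers that are red-searching from s and are not yet waiting.

```python
import threading


class SharedState:
    """Hypothetical shared per-state record used by all workers."""

    def __init__(self, red=False):
        self.red = red    # global red colour
        self.count = 0    # cf. s.count
        self.cond = threading.Condition()

    def inc_count(self):
        with self.cond:
            self.count += 1

    def dec_count(self):
        with self.cond:
            self.count -= 1
            self.cond.notify_all()  # wake workers blocked in the await below


def all_red(s, successors):
    """If every successor of s is red, mark s red, but only once s.count == 0."""
    if not all(t.red for t in successors):
        return False
    with s.cond:
        s.cond.wait_for(lambda: s.count == 0)  # the 'await s.count = 0' guard
        s.red = True
    return True
```

Without the `wait_for` call, s could be coloured red while another worker's red search rooted at s is still in progress, which is exactly the situation the guard rules out.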

## **5 Conclusion**

This paper presents the first automated deductive verification of a parallel graph algorithm: we verified soundness and completeness of parallel nested depth-first search using VerCors. We also show that this mechanisation is helpful in quickly discovering whether optimisations of the algorithm preserve its correctness.

Many of the presented verification techniques, e.g., the use of separate contracts for single statements, the way we handle termination, and the construction of explicit witnesses through ghost variables, will be useful for the verification of other similar algorithms. Moreover, our encoding of parallel nested DFS closely resembles the implementation of such an algorithm in mainstream programming languages like C++ and Java. It would be interesting to investigate how our VerCors encoding can be automatically deployed on multi-core architectures, for example to enable comparing its performance and scalability with LTSmin.

There are many possibilities to extend the line of research on the verification of parallel model checking algorithms initiated in this paper. First, one may consider extending the scope of this verification towards the actual, efficient C implementation in LTSmin. This would involve verifying the underlying concurrent hash table that stores visited states (a simplified version of which has been verified before with VerCors [1]), the encoding of the colours as "bits" in the hash table buckets, and the use of CAS to manipulate these bits.

One might consider alternative parallel NDFS versions, notably [15], which shares the blue colour, invoking a repair procedure when the depth-first order is violated. Both algorithms have been reconciled in [14], sharing both blue and red. This work could be extended to a wealth of other optimisations like partial-order reduction, or other parallel model checking algorithms, for example [26,4,40].

Our work can be considered as a first step towards a library for the verification of graph-based (multi-core) model checking algorithms. It will be an interesting line of future work to continue this: developing a full-fledged verification library for common subtasks, like graph manipulations and termination detection.

**Acknowledgments and data availability statement.** This work is partially supported by the NWO VICI 639.023.710 Mercedes project and by the NWO TOP 612.001.403 VerDi project. The datasets for this case study are available at: https://doi.org/10.4121/uuid:36c00955-5574-44d9-9b26-340f7a1ea03b.

<sup>7</sup> Wan Fokkink and his students Stefan Vijzelaar and Pieter Hijma already found in 2012 that the "all-red" extension required an extra check '**await** s.*count* = 0' in [25], and wondered whether '**await** s.*count* <sup>≤</sup> 1' would be sufficient. Independently, Akos Hajdu reported this omission in 2015.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Discourje: Runtime Verification of Communication Protocols in Clojure**

Ruben Hamers<sup>1</sup> and Sung-Shik Jongmans<sup>1,2</sup>

<sup>1</sup> Open University, Heerlen, the Netherlands <sup>2</sup> CWI, Amsterdam, the Netherlands

**Abstract.** This paper presents Discourje: a runtime verification framework for communication protocols in Clojure. Discourje guarantees safety of protocol implementations relative to specifications, based on an expressive new version of multiparty session types. The framework has a formal foundation and is itself implemented in Clojure to offer a seamless specification–implementation experience. Benchmarks show Discourje's overhead can be less than 5% for real/existing concurrent programs.

## **1 Introduction**

**Background.** To take advantage of today's and tomorrow's multi-core processors, shared-memory concurrent programming—a notoriously complex enterprise—is becoming increasingly important. To alleviate some of the complexities, in addition to low-level *synchronization primitives*, several modern programming languages have started to offer core support for higher-level *communication primitives* as well, in the guise of message passing through *channels* (e.g., Go [25], Rust [42], Clojure [17]). The idea is that, beyond their usage in distributed computing, channels can also serve as a *programming abstraction* for shared memory, supposedly less prone to concurrency bugs than locks, semaphores, and the like. However, in a recent study of 171 concurrency bugs in popular open source Go programs [48], Tu et al. found that "message passing does not necessarily make multi-threaded programs less error-prone than shared memory."

From a programmer's perspective, a key problem is this: if we already know which *roles* (threads), *infrastructure* (channels between threads), and *protocols* (communications through channels) our program should consist of, then how can we ensure our implementation is indeed *safe* relative to our specification? Safety means "bad" channel actions never occur: if a send, receive, or close happens in the implementation, then it is allowed by the protocol in the specification. For instance, typical protocols rule out common message-passing concurrency bugs [48], such as sends without receives, receives without sends, and type mismatches (actual type sent ≠ expected type received). Essentially, then, we face a classical verification problem, with classical ingredients: an implementation language I, a specification language S, and an inclusion relation between them.

Over the past years, a significant body of research in this area has been based on *multiparty session types* (MPST) [27]. The idea is to specify protocols as behavioral types [1,30] against which threads are subsequently type-checked; the theory guarantees that static well-typedness of threads at compile-time implies dynamic safety of their channel actions at run-time. Originally [27], I was a dialect of the pi-calculus, S was a calculus of behavioral types, and the inclusion relation was defined through formal typing rules, but more recently, practical implementations were developed as well [14,28,29,37,38,44], where I is an existing *general-purpose language* (GPL; Erlang, F#, Go, Java, Scala), S is a new *domain-specific language* (DSL; Scribble), and the inclusion relation is realized by encoding behavioral types in S as non-behavioral types in I (e.g., through custom communication API generation [29]). These works highlight two key strengths of the MPST methodology, namely it supports:


**Problem.** One of the key open problems of MPST concerns *expressiveness*. For instance, suppose we need to write a program in which messages are repeatedly communicated from threads $I_1$ and $I_2$ to thread $I_3$, non-deterministically ordered (i.e., standard producers–consumer); this protocol is not supported by MPST.

We identify two reasons why expressiveness is limited.

**First**, MPST were originally developed for distributed computing (service choreographies [10,11]); accordingly, *decoupled* verification of roles (per-service type-checking) has always been a key requirement [14]. This is reflected in the MPST workflow (Fig. 1): first, the programmer writes a *global* protocol specification; then, an MPST tool *projects* it onto every role to infer *local* protocol specifications; then, the implemented threads are type-checked. However, role-based decomposition of global behavior into equivalent local behaviors often cannot be done statically (e.g., [12]), so expressiveness is limited by "projectability".

**Second**, MPST prescribes static type-checking, which limits expressiveness, because: (a) type-checking is sound, but not complete, so the static MPST approach rejects implementations that are conservatively ill-typed but actually safe; (b) protocols whose execution relies on value-dependent control flow are supported only in limited circumstances. To alleviate (b), value-dependent type constructors can be added to S [20,47], but this raises practical issues (i.e., dependent types are only scarcely supported by mainstream GPLs).

**Contributions.** To simplify shared-memory concurrent programming in languages with channels, we aim to consolidate strengths #1 and #2 (page 2), but alleviate MPST's expressiveness issues. Specifically, this paper is founded on two tenets that depart from existing work in significant ways (Fig. 2).

**First**, we exploit the fact that in our context, channels serve "merely" as programming abstractions for shared memory; there is no distribution whatsoever. Thus, whereas MPST-based verification for distributed computing requires projection, this is not the case in our setting, opening the door to fully automated *projection-free* MPST and eliminating a significant source of restrictions.

**Second**, instead of adopting MPST-based verification through static typechecking at compile-time, we explore MPST-based verification through *dynamic monitoring at run-time*. This enables soundness *and completeness*, while it also supports *value-dependent protocols* in a generally implementable way (i.e., we are not aware of a mainstream GPL that does not support our monitoring approach).

In this paper, we present our practical embodiment of these ideas: *Discourje* (pronounced "discourse"), a runtime verification framework for communication protocols in *Clojure* [17,26]. Discourje consists of two components: a DSL to specify protocols and construct monitors, and an API to implement protocols (supplementing Clojure) and add instrumentation. While we could have developed this framework for any language with channel-based programming abstractions, including Go and Rust, Clojure is particularly interesting, because: (1) Clojure has a powerful macro system that enabled us to develop the Discourje DSL as an extension to Clojure, thereby offering programmers a seamless specification–implementation experience; (2) in contrast to Go and Rust, Clojure is not a systems language but an applications language, so runtime verification overheads might be more tolerable. We summarize our contributions as follows:


Our artifact is available at https://github.com/discourje.

## **2 Overview**

**Clojure (in a nutshell).** Clojure [17,26] is a general-purpose, impure functional language that compiles to Java bytecode. As a dialect of Lisp, Clojure follows the code-as-data philosophy, provides a powerful macro system, and adopts parenthesized prefix notation. Clojure offers asynchronous channel-based programming abstractions through core library clojure.core.async [16]. In the annual Clojure survey [15], Clojure programmers indicate "ease of development" is more important than "runtime performance"; this makes Clojure an interesting target for runtime verification, whose overheads are then more acceptable.

To introduce the core features of Clojure relevant to this paper, Fig. 3 shows a channel-based concurrent Tic-Tac-Toe program in Clojure,<sup>3</sup> while Fig. 4 summarizes the meaning of every primitive (";;" indicates comments). Lines 1–9 define constants (blank, cross, nought, initial-grid) and functions (get-blank, add, not-final?) to represent Tic-Tac-Toe concepts. Lines 11–12 define two channels (a->b and b->a) that implement the infrastructure through which players Alice and Bob communicate. Channels in Clojure are bounded: sends/receives block until a channel is not full/empty. Lines 14–24 and 25–35 define threads that implement Alice and Bob. Both players execute a loop, starting with a blank grid. In each iteration, Alice first gets the index of some blank space on the grid, then plays a cross in that space, then sends a message to Bob to communicate the index, then awaits a message from Bob, and then updates the grid accordingly; Bob acts symmetrically. After every grid update, Alice or Bob checks whether the grid has reached a final configuration; if so, the loop is exited and the channels are closed.

Every Clojure data structure, including the vector that implements the grid, is *persistent*, and therefore, effectively *immutable*. This means that every operation on an existing data structure leaves it intact and instead returns a new data structure. Thus, Alice and Bob initially share the same grid, but because it cannot be modified in place, modifications need to be explicitly communicated. Persistence of Clojure data structures is also why we can guarantee freedom from data races in pure Clojure (= Clojure without Java objects): if users communicate only Clojure data through channels, race freedom is guaranteed (if Java objects are communicated, the user is responsible for avoiding races).

**Basic Discourje: Tic-Tac-Toe.** A basic Discourje specification of the Tic-Tac-Toe protocol for Alice and Bob is shown in Fig. 5. We typeset Discourje "keywords" (which are actually just Clojure functions and macros) bold violet.

Lines 1–2 define two roles (role) to represent Alice and Bob. Lines 4–6 define an auxiliary specification, inserted twice into the main specification (ins); it states that the channels between Alice and Bob are closed (-##), in parallel (par). Lines 7–13 define the main specification; it states that recursively (fix), first a message of type Long (the index of a grid) is communicated from Alice to Bob (-->), and then from Bob to Alice, unless the channels are closed (the game is done). Square brackets are used to build lists of sub-specifications (sequencing).

The Tic-Tac-Toe protocol depends on value-dependent control flow, as Alice and Bob close the channels only once the grid has reached a final configuration. Whether to close thus depends on a property of the exchanged data rather than of the protocol itself, and no existing MPST tool supports this.

<sup>3</sup> Tic-Tac-Toe is a two-player game played on a 3x3 grid. Players take turns to fill the initially blank spaces of the grid with crosses ("X") and noughts ("O"). The first player to fill three adjacent spaces, in any direction, with the same symbol wins.

```
 1 (def blank " ") (def cross "x") (def nought "o")
 2
 3 (def initial-grid [blank blank blank   ;; an initial 3x3 grid of blank spaces,
 4                    blank blank blank   ;; implemented as a vector of length 9
 5                    blank blank blank]) ;; (persistent data structure)
 6
 7 (def get-blank (fn [g] ...))          ;; returns a blank space in g
 8 (def add (fn [g i x-or-o] ...))       ;; returns g, but with i set to x-or-o
 9 (def not-final? (fn [g] ...))         ;; returns true iff g is not final
10
11 (def a->b (chan 1)) (def b<-a a->b)   ;; b<-a is an alias of a->b
12 (def b->a (chan 1)) (def a<-b b->a)   ;; a<-b is an alias of b->a
13
14 (thread ;; alice
15   (loop [g initial-grid]
16     (let [i (get-blank g)
17           g (add g i cross)]
18       (>!! a->b i)
19       (if (not-final? g)
20         (let [i (<!! a<-b)
21               g (add g i nought)]
22           (if (not-final? g)
23             (recur g))))))
24   (close! a->b))
25 (thread ;; bob
26   (loop [g initial-grid]
27     (let [i (<!! b<-a)
28           g (add g i cross)]
29       (if (not-final? g)
30         (let [i (get-blank g)
31               g (add g i nought)]
32           (>!! b->a i)
33           (if (not-final? g)
34             (recur g))))))
35   (close! b->a))
```
**Fig. 3.** Clojure implementation of Tic-Tac-Toe (dashed arrows: matching send/receive)

Library clojure.core (basic):

- (def x e): first evaluates e to v; then binds x to v in the global environment.
- (fn [x<sub>1</sub> ... x<sub>n</sub>] e<sub>1</sub> ... e<sub>m</sub>): evaluates to a function with parameters x<sub>1</sub>, ..., x<sub>n</sub> and creates a recursion point; then, when applied to arguments v<sub>1</sub>, ..., v<sub>n</sub>, sequentially evaluates e<sub>1</sub>, ..., e<sub>m</sub> with x<sub>1</sub>, ..., x<sub>n</sub> bound to v<sub>1</sub>, ..., v<sub>n</sub>.
- (let [x<sub>1</sub> e<sub>1</sub> ... x<sub>n</sub> e<sub>n</sub>] e): first evaluates e<sub>1</sub> to v<sub>1</sub>; then evaluates e<sub>2</sub> to v<sub>2</sub> with x<sub>1</sub> bound to v<sub>1</sub>; ...; then evaluates e<sub>n</sub> to v<sub>n</sub> with x<sub>1</sub>, ..., x<sub>n−1</sub> bound to v<sub>1</sub>, ..., v<sub>n−1</sub>; then evaluates e with x<sub>1</sub>, ..., x<sub>n</sub> bound to v<sub>1</sub>, ..., v<sub>n</sub>.
- (loop [x<sub>1</sub> e<sub>1</sub> ... x<sub>n</sub> e<sub>n</sub>] e): same as let, but also creates a recursion point.
- (recur e<sub>1</sub> ... e<sub>n</sub>): first evaluates e<sub>1</sub>, ..., e<sub>n</sub> to v<sub>1</sub>, ..., v<sub>n</sub>; then evaluates the nearest recursion point with x<sub>1</sub>, ..., x<sub>n</sub> bound to v<sub>1</sub>, ..., v<sub>n</sub>.
- (if e<sub>1</sub> e<sub>2</sub> e<sub>3</sub>): first evaluates e<sub>1</sub>; if true, evaluates e<sub>2</sub>; else, evaluates e<sub>3</sub>.

Library clojure.core.async (concurrency):

- (>!! c e): first evaluates e to v; then sends v through channel c.
- (<!! c): receives a value through channel c.


**Fig. 4.** Clojure primitives

```
 1 (def alice (role "alice")) ;; roles
 2 (def bob (role "bob"))
 3
 4 (def ttt-close (dsl ;; auxiliary spec
 5   (par (-## alice bob)
 6        (-## bob alice))))
 7 (def ttt (dsl ;; main spec
 8   (fix :X
 9     [(--> alice bob Long)
10      (alt (ins ttt-close)
11           [(--> bob alice Long)
12            (alt (ins ttt-close)
13                 (fix :X))])])))
```
**Fig. 5.** Discourje specification of Tic-Tac-Toe

```
10 (def m (moni (spec ttt)))
11 (def a->b (chan 1 alice bob m)) (def b<-a a->b)
12 (def b->a (chan 1 bob alice m)) (def a<-b b->a)
```
**Fig. 6.** Changes to Fig. 3 to monitor Alice and Bob against the specification in Fig. 5

To monitor the implementations of Alice and Bob against this specification, first, we need to load library discourje.core.async instead of clojure.core.async (implicitly loaded in Fig. 3). All other code modifications are shown in Fig. 6: on line 10, the specification is evaluated to an internal form (spec) and wrapped in a new monitor (moni), while on lines 11–12, we associate the intended sender, receiver, and monitor with the channels. *No other changes are needed:* notably, the code for Alice (Fig. 3, lines 14–24) and Bob (lines 25–35) is unaffected; Discourje is non-invasive to start using. Running the monitor alongside the implementation guarantees safety: if a non-compliant channel action were to be attempted, the monitor prevents it from happening and throws an exception.
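Discourje builds its monitors from the specification DSL, and its internals are not reproduced here. As a rough illustration of the idea only, the Python sketch below assumes the Tic-Tac-Toe specification of Fig. 5 has been hand-translated into a finite automaton over send, receive and close actions (message types are omitted); the names `TTT`, `Monitor` and `ProtocolViolation` are hypothetical and not part of Discourje. Each action is checked before it takes effect, and a non-compliant one raises an exception without changing the monitor's state.

```python
import threading

SEND, RECV, CLOSE = "send", "recv", "close"
A, B = "alice", "bob"

# Hypothetical hand-translation of the ttt specification (Fig. 5) into an automaton.
# Each (--> x y Long) becomes a send followed by a receive; message types are omitted.
TTT = {
    ("start", (SEND, A, B)): "a_sent",
    ("a_sent", (RECV, A, B)): "a_done",
    ("a_done", (SEND, B, A)): "b_sent",
    ("a_done", (CLOSE, A, B)): "closed_ab",  # (ins ttt-close): both closes, either order
    ("a_done", (CLOSE, B, A)): "closed_ba",
    ("b_sent", (RECV, B, A)): "b_done",
    ("b_done", (SEND, A, B)): "a_sent",      # (fix :X): next round
    ("b_done", (CLOSE, A, B)): "closed_ab",
    ("b_done", (CLOSE, B, A)): "closed_ba",
    ("closed_ab", (CLOSE, B, A)): "end",
    ("closed_ba", (CLOSE, A, B)): "end",
}


class ProtocolViolation(Exception):
    """Raised when a channel action is not allowed by the specification."""


class Monitor:
    def __init__(self, transitions, initial="start"):
        self.transitions = transitions
        self.state = initial
        self._lock = threading.Lock()  # channel actions may come from several threads

    def check(self, action):
        """Allow `action` and advance, or reject it before it takes effect."""
        with self._lock:
            nxt = self.transitions.get((self.state, action))
            if nxt is None:
                raise ProtocolViolation(f"{action} not allowed in state {self.state!r}")
            self.state = nxt
```

For instance, after `check((SEND, A, B))`, calling `check((CLOSE, A, B))` raises `ProtocolViolation`, because the specification only allows closing once the receive has happened; this is the kind of premature close that the implementation in Fig. 3 can attempt.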

The implementation in Fig. 3 can *indeed* violate the specification in Fig. 5: the specification states channels are allowed to be closed only *after* (the receive of) the previous communication is done, but in the implementation, Alice or Bob could attempt to close already *before*. In our artifact, we have a solution where we mix channels with *barrier synchronization* from the standard java.util.concurrent library (readily usable in Clojure), to let Alice and Bob first await each other and then close. Thus, channel-based programming abstractions monitored through Discourje can be mixed seamlessly with other concurrency libraries, which happens regularly in message passing programs [46,48].

**Advanced Discourje: common patterns.** Discourje specifications of common patterns of communication are shown in Fig. 7; they make use of Discourje's *role indexing* and finite *repetition* (rep) features.

Imagine we have a sequence of worker threads, organized in a pipeline (i.e., the i-th worker receives from its predecessor, i−1, and sends to its successor, i+1). Lines 1–2 define the specification of a communication from a worker to its successor. Intuitively, succ is a function that maps three parameters to a specification. For instance, (ins succ bob 5 Turn) inserts (--> (bob 5) (bob 6) Turn), where (bob 5) and (bob 6) are indexed roles. We note that every role created with role allows indexing (with arbitrary types), and that specifications can be parametrized by roles (:w), indices (:i), and/or types (:t). We also note that *any* Clojure function can be used in specifications (e.g., inc, to manipulate indices).

```
 1 (def succ (dsl :w :i :t
 2   (--> (:w :i) (:w (inc :i)) :t)))
 3
 4 (def pipe (dsl :w :k :t
 5   (rep seq [:i (range (dec :k))]
 6     (ins succ :w :i :t))))
 7
 8 (def ring (dsl :w :k :t
 9   [(ins pipe :w :k :t)
10    (--> (:w (dec :k)) (:w 0) :t)]))
11 (def one-one-one (dsl :m :w :k :t :u
12   (rep alt [:i (range :k)]
13     [(--> :m (:w :i) :t)
14      (--> (:w :i) :m :u)])))
15
16 (def one-all-one (dsl :m :w :k :t :u
17   (rep par [:i (range :k)]
18     [(--> :m (:w :i) :t)
19      (--> (:w :i) :m :u)])))
```
**Fig. 7.** Discourje specification of common patterns

Lines 4–6 define the specification of a pipeline communication pattern; it states that specification (ins succ :w :i :t) is repeated for each value of :i from 0 up to, but excluding, k−1 (i.e., for each element of (range (dec :k))), and that the iterations are composed sequentially (seq). Lines 8–10 extend the pipeline to a ring, where the last worker also communicates with the first.
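As an illustration of the index arithmetic only, the following Python sketch (not Discourje code; the names `succ`, `pipe`, and `ring` merely mirror Fig. 7, and roles are modeled as hypothetical `(name, index)` tuples) expands the pipeline and ring patterns into the ordered list of communications they prescribe. It ignores message types and the split of each communication into a send and a receive.

```python
from typing import List, Tuple

Role = Tuple[str, int]   # an indexed role, e.g. ("w", 3) for (w 3)
Com = Tuple[Role, Role]  # one communication: (sender, receiver); message types omitted


def succ(w: str, i: int) -> Com:
    # Mirrors lines 1-2 of Fig. 7: worker i sends to worker i+1.
    return ((w, i), (w, i + 1))


def pipe(w: str, k: int) -> List[Com]:
    # Mirrors lines 4-6: (range (dec k)) yields i = 0, ..., k-2, composed in sequence.
    return [succ(w, i) for i in range(k - 1)]


def ring(w: str, k: int) -> List[Com]:
    # Mirrors lines 8-10: the pipeline, followed by worker k-1 sending back to worker 0.
    return pipe(w, k) + [((w, k - 1), (w, 0))]


if __name__ == "__main__":
    for sender, receiver in ring("w", 4):
        print(sender, "-->", receiver)
```

For k = 4, the ring consists of three pipeline communications followed by the one from worker 3 back to worker 0.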

Lines 11–14 define the specification of a communication from a "master" to *one* of k workers, and back. Similarly, lines 16–19 define the specification of a communication from a master to *all* of k workers, and back. In these specifications, loop iterations are composed by choice (alt) and in parallel (par), respectively.
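The difference between alt and par becomes tangible when we enumerate the orders in which communications may occur. The Python sketch below is a simplifying abstraction, not Discourje's semantics: it treats each `-->` as a single atomic event and names roles by hypothetical strings `m`, `w0`, `w1`, and so on. For one-one-one it yields one trace per worker; for one-all-one it yields every order-preserving interleaving of the k master/worker exchanges.

```python
from typing import List, Tuple

Event = Tuple[str, str]   # (sender, receiver), e.g. ("m", "w0")
Trace = Tuple[Event, ...]


def branch(i: int) -> List[Event]:
    # Body of iteration i: master to worker i, then worker i back to master.
    return [("m", f"w{i}"), (f"w{i}", "m")]


def interleavings(seqs: List[List[Event]]) -> List[Trace]:
    # All merges of `seqs` that preserve the order of events within each sequence.
    seqs = [s for s in seqs if s]
    if not seqs:
        return [()]
    result: List[Trace] = []
    for j, s in enumerate(seqs):
        rest = seqs[:j] + [s[1:]] + seqs[j + 1:]
        for tail in interleavings(rest):
            result.append((s[0],) + tail)
    return result


def one_one_one(k: int) -> List[Trace]:
    # rep alt: exactly one of the k iterations is chosen.
    return [tuple(branch(i)) for i in range(k)]


def one_all_one(k: int) -> List[Trace]:
    # rep par: all k iterations happen, freely interleaved.
    return interleavings([branch(i) for i in range(k)])


if __name__ == "__main__":
    print(len(one_one_one(3)), len(one_all_one(3)))  # prints: 3 90
```

With k = 3 workers, one-one-one admits 3 traces, whereas one-all-one admits 6!/(2!·2!·2!) = 90, which illustrates how much more freedom par leaves to the implementation.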

## **3 Design**

**Implementation calculus.** To formalize our verification problem, we first define a calculus to model Clojure implementations. Let ℓ range over heap locations, x over variables, v over values, and I over implementations. The calculus is generated by the following grammar:

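$$
\begin{array}{rcl}
v & ::= & \mathsf{nil} \mid \ell \mid \mathsf{fn}\ x\ I \mid \mathsf{true} \mid \mathsf{false} \mid 0 \mid 1 \mid 2 \mid \cdots \\
I & ::= & v \mid I_1\ I_2 \mid x \mid \mathsf{def}\ x\ I \mid \mathsf{let}\ x\ I_1\ I_2 \mid \mathsf{loop}\ x\ I_1\ I_2 \mid \mathsf{recur}\ I \\
  & \mid & \mathsf{if}\ I_1\ I_2\ I_3 \mid I_1 \cdot I_2 \mid \mathsf{send}\ I_1\ I_2 \mid \mathsf{recv}\ I \mid \mathsf{close}\ I \mid \mathsf{chan}\ I \mid I_1 \parallel I_2
\end{array}
$$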
Calculus notation corresponds closely with Clojure notation (Fig. 4), with the exception of application ($I_1\ I_2$), sequencing ($I_1 \cdot I_2$), and threading ($I_1 \parallel I_2$).

The operational semantics of the calculus is defined in terms of labeled reductions of triples (I, E, H): I is an implementation, E is a global environment (from variables to values), and H is a heap (from heap locations to channel states). Channel states are represented as pairs (*w*, n), where *w* is a list of values (messages in transit, from left to right), and n is the buffer size. Labels, ranged over by α, are of the form ℓ!v (send), ℓ?v (receive), ℓ# (close), and τ (anything else; we verify only channel actions). The reduction rules are shown in Fig. 8.

Rule [I-Ctxt] executes the first step of implementation I in context $\mathcal{C}$: it first substitutes I for the hole □ in $\mathcal{C}$ (notation: $\mathcal{C}[I]$), and then executes the first step.

$$
\begin{array}{c}
\dfrac{(I, E, H) \xrightarrow{\alpha} (I', E', H')}{(\mathcal{C}[I], E, H) \xrightarrow{\alpha} (\mathcal{C}[I'], E', H')}\,[\text{I-Ctxt}]
\qquad
\dfrac{I[v/x] \xrightarrow{\alpha} I'}{((\mathsf{fn}\ x\ I)\ v, E, H) \xrightarrow{\alpha} (I', E, H)}\,[\text{I-App}]
\\[3ex]
\dfrac{E(x) = v}{(x, E, H) \xrightarrow{\tau} (v, E, H)}\,[\text{I-Var}]
\qquad
\dfrac{}{(\mathsf{def}\ x\ v, E, H) \xrightarrow{\tau} (\mathsf{nil}, E[x \mapsto v], H)}\,[\text{I-Def}]
\qquad
\dfrac{I[v/x] \xrightarrow{\alpha} I'}{(\mathsf{let}\ x\ v\ I, E, H) \xrightarrow{\alpha} (I', E, H)}\,[\text{I-Let}]
\\[3ex]
\dfrac{I[v/x][(\mathsf{fn}\ x_r\ (\mathsf{loop}\ x\ x_r\ I))/\mathsf{recur}] \xrightarrow{\alpha} I'}{(\mathsf{loop}\ x\ v\ I, E, H) \xrightarrow{\alpha} (I', E, H)}\,[\text{I-Loop}]
\\[3ex]
\dfrac{v \in \{\mathsf{true}, \mathsf{false}\} \quad (I_v, E, H) \xrightarrow{\alpha} (I'_v, E, H)}{(\mathsf{if}\ v\ I_{\mathsf{true}}\ I_{\mathsf{false}}, E, H) \xrightarrow{\alpha} (I'_v, E, H)}\,[\text{I-If}]
\qquad
\dfrac{(I, E, H) \xrightarrow{\alpha} (I', E, H)}{(v \cdot I, E, H) \xrightarrow{\alpha} (I', E, H)}\,[\text{I-Seq}]
\\[3ex]
\dfrac{H(\ell) = (w, n) \text{ and } |w| < n}{(\mathsf{send}\ \ell\ v, E, H) \xrightarrow{\ell!v} (\mathsf{nil}, E, H[\ell \mapsto (v \cdot w, n)])}\,[\text{I-Send}]
\qquad
\dfrac{H(\ell) = (w \cdot v, n)}{(\mathsf{recv}\ \ell, E, H) \xrightarrow{\ell?v} (v, E, H[\ell \mapsto (w, n)])}\,[\text{I-Recv}]
\\[3ex]
\dfrac{H(\ell) = (w, n) \text{ and } n > 0}{(\mathsf{close}\ \ell, E, H) \xrightarrow{\ell\#} (\mathsf{nil}, E, H[\ell \mapsto (w, 0)])}\,[\text{I-Close}]
\qquad
\dfrac{H(\ell) = \bot \text{ and } v > 0}{(\mathsf{chan}\ v, E, H) \xrightarrow{\tau} (\ell, E, H[\ell \mapsto (\epsilon, v)])}\,[\text{I-Chan}]
\end{array}
$$

**Fig. 8.** Operational semantics of the implementation calculus

Contexts are generated by the following grammar:

$$
\begin{array}{rcl}
\mathcal{C} & ::= & \square \mid \mathcal{C}\ I \mid (\mathsf{fn}\ x\ I)\ \mathcal{C} \mid \mathsf{def}\ x\ \mathcal{C} \mid \mathsf{let}\ x\ \mathcal{C}\ I \mid \mathsf{loop}\ x\ \mathcal{C}\ I \mid \mathsf{if}\ \mathcal{C}\ I_1\ I_2 \mid \mathcal{C} \cdot I \\
 & \mid & \mathsf{send}\ \mathcal{C}\ I \mid \mathsf{send}\ \ell\ \mathcal{C} \mid \mathsf{recv}\ \mathcal{C} \mid \mathsf{close}\ \mathcal{C} \mid \mathsf{chan}\ \mathcal{C} \mid \mathcal{C} \parallel I \mid I \parallel \mathcal{C}
\end{array}
$$

Rule [I-App] executes the first step of a function: it first substitutes value v for variable x in body I (notation: I[v/x]), and then executes the first step. Rule [I-Var] executes a read in the global environment. Rule [I-Def] executes a write to the global environment (notation: E[x → v]). Rule [I-Let] executes the first step of a let binder, similar to rule [I-App]. Rule [I-Loop] executes the first step of a loop: it first substitutes value v for variable x (the loop parameter) in body I, then substitutes the loop itself (wrapped in a function to rebind x in the loop's next iteration) for recur, and then executes the first step. Rule [I-If] executes the first step of a branch of a conditional, if the condition is boolean. Rule [I-Seq] executes the first step of the suffix of a sequence, after the prefix has been executed using rule [I-Ctxt]. Rule [I-Send] executes the send through a channel, if that channel exists and is not full. Rule [I-Recv] executes the receive through a channel, if that channel exists and is not empty. Rule [I-Close] executes the close of a channel, if that channel exists and is not yet closed. Rule [I-Chan] executes the creation of a new channel.
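The channel rules [I-Chan], [I-Send], [I-Recv], and [I-Close] can be read as partial functions on the heap. The Python sketch below is a hypothetical encoding, not part of Discourje: heap locations are integers, a channel state is a pair `(w, n)` with `w` a tuple, and each function returns the reduction label (as a string) or `None` when the rule's premise does not hold, i.e., when the action is blocked. Following the rules, a send prepends to `w` and a receive takes the rightmost value, so messages are delivered in FIFO order.

```python
from typing import Any, Dict, Optional, Tuple

ChannelState = Tuple[Tuple[Any, ...], int]   # (w, n): messages in transit, buffer size
Heap = Dict[int, ChannelState]               # heap locations modeled as ints (assumption)


def chan(heap: Heap, size: int) -> Optional[int]:
    # [I-Chan]: allocate a fresh location l (H(l) undefined), provided size > 0.
    if size <= 0:
        return None
    loc = max(heap, default=-1) + 1
    heap[loc] = ((), size)
    return loc


def send(heap: Heap, loc: int, v: Any) -> Optional[str]:
    # [I-Send]: enabled only if the channel exists and |w| < n; v is prepended.
    if loc not in heap:
        return None
    w, n = heap[loc]
    if len(w) >= n:
        return None
    heap[loc] = ((v,) + w, n)
    return f"{loc}!{v}"


def recv(heap: Heap, loc: int) -> Optional[Tuple[str, Any]]:
    # [I-Recv]: enabled only if the channel exists and is nonempty; takes the rightmost value.
    if loc not in heap or not heap[loc][0]:
        return None
    w, n = heap[loc]
    v = w[-1]
    heap[loc] = (w[:-1], n)
    return f"{loc}?{v}", v


def close(heap: Heap, loc: int) -> Optional[str]:
    # [I-Close]: enabled only if the channel exists and n > 0; sets n to 0.
    if loc not in heap or heap[loc][1] == 0:
        return None
    w, _ = heap[loc]
    heap[loc] = (w, 0)
    return f"{loc}#"
```

After a close, n = 0, so no further send is enabled, while messages already in transit can still be received.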

**Specification calculus.** Next, we define a calculus to model Discourje specifications. Let p, q range over roles, f over boolean functions (from the implementation calculus), n, m over number expressions (from the implementation calculus), and ⊗ over {+, ·, ∥}. The calculus is generated by the following grammar:

$$
S ::= \boxed{1} \mid p[n]q[m]{:}f \mid \boxed{p[n]q[m]?v} \mid p[n]q[m] \mid S_1 \otimes S_2 \mid \mathsf{fix}\,X\,S \mid X \mid \textstyle\bigotimes_{n \le x \le n'} S
$$

274 R. Hamers and S.-S. Jongmans

$$
\begin{array}{cccc}
\dfrac{}{\boxed{1}\downarrow}\,[\text{S}{\downarrow}\text{-One}] &
\dfrac{S_i\downarrow \text{ for some } i \in \{1,2\}}{S_1 + S_2\downarrow}\,[\text{S}{\downarrow}\text{-Alt}] &
\dfrac{S_1\downarrow \text{ and } S_2\downarrow}{S_1 \cdot S_2\downarrow}\,[\text{S}{\downarrow}\text{-Seq}] &
\dfrac{S_1\downarrow \text{ and } S_2\downarrow}{S_1 \parallel S_2\downarrow}\,[\text{S}{\downarrow}\text{-Par}]
\end{array}
$$

**Fig. 9.** Operational semantics of the specification calculus (termination)

$$
\begin{array}{c}
\dfrac{(f\ v, \emptyset, \emptyset) \xrightarrow{\tau}{}^{*} (\mathsf{true}, \emptyset, \emptyset)}{p[n]q[m]{:}f \xrightarrow{p[n]q[m]!v} \boxed{p[n]q[m]?v} \qquad \boxed{p[n]q[m]?v} \xrightarrow{p[n]q[m]?v} \boxed{1}}\,[\text{S-Com}]
\qquad
\dfrac{S = p[n]q[m]}{S \xrightarrow{p[n]q[m]\#} \boxed{1}}\,[\text{S-Cls}]
\\[3ex]
\dfrac{S_i \xrightarrow{\beta} S' \text{ for some } i \in \{1,2\}}{S_1 + S_2 \xrightarrow{\beta} S'}\,[\text{S-Alt}]
\qquad
\dfrac{S_1 \xrightarrow{\beta} S_1'}{S_1 \cdot S_2 \xrightarrow{\beta} S_1' \cdot S_2}\,[\text{S-Seq1}]
\qquad
\dfrac{S_1\downarrow \text{ and } S_2 \xrightarrow{\beta} S_2'}{S_1 \cdot S_2 \xrightarrow{\beta} S_2'}\,[\text{S-Seq2}]
\\[3ex]
\dfrac{S[\mathsf{fix}\,X\,S/X] \xrightarrow{\beta} S'}{\mathsf{fix}\,X\,S \xrightarrow{\beta} S'}\,[\text{S-Rec}]
\qquad
\dfrac{S_1 \xrightarrow{\beta} S_1'}{S_1 \parallel S_2 \xrightarrow{\beta} S_1' \parallel S_2}\,[\text{S-Par1}]
\qquad
\dfrac{S_2 \xrightarrow{\beta} S_2'}{S_1 \parallel S_2 \xrightarrow{\beta} S_1 \parallel S_2'}\,[\text{S-Par2}]
\\[3ex]
\dfrac{S[n/x] \otimes (\cdots \otimes (S[n'-1/x] \otimes S[n'/x])) \xrightarrow{\beta} S'}{\bigotimes_{n \le x \le n'} S \xrightarrow{\beta} S'}\,[\text{S-Rep}]
\end{array}
$$

**Fig. 10.** Operational semantics of the specification calculus (reduction)

Calculus notation corresponds with Discourje notation (Sect. 2): $p[n]q[m]{:}f$ specifies communication of a value that satisfies $f$ from $p[n]$ to $q[m]$; $p[n]q[m]$ specifies closing of the channel from $p[n]$ to $q[m]$; $S_1 \otimes S_2$ specifies the alternative, sequential, and parallel composition of $S_1$ and $S_2$ (for $\otimes = +, \cdot, \parallel$, respectively); $\mathsf{fix}\,X\,S$ and $X$ specify recursion; and $\bigotimes_{n \le x \le n'} S$ specifies repetition of $S$ for every value $x$ between $n$ and $n'$, where iterations are composed using $\otimes$. "Boxed" specifications ($\boxed{1}$ and $\boxed{p[n]q[m]?v}$; the box is not part of the syntax) are *auxiliary* in the sense that they are used in defining the operational semantics (below), but they are not written directly in specifications by programmers: $\boxed{1}$ specifies a skip; $\boxed{p[n]q[m]?v}$ specifies a receive of $v$ by $q[m]$, previously sent by $p[n]$.

The operational semantics of the calculus is defined in terms of the termination predicate ↓ and the labeled reduction relation →. Labels, ranged over by β, are of the form $p[n]q[m]!v$ (send), $p[n]q[m]?v$ (receive), and $p[n]q[m]\#$ (close). The termination and reduction rules are shown in Figs. 9–10. (This operational semantics coincides with Basic Process Algebra [22], plus free merge, recursion, and repetition.) Rule [S-Com] induces two reductions (first a send, then a receive), via the auxiliary specification $\boxed{p[n]q[m]?v}$. We note that the specification calculus has no τ-reductions (which are not monitored; we verify only channel actions). We also note that it can express some, but not all, context-free languages: it can count (using ∥), but it cannot encode a stack.
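To illustrate Figs. 9–10, the following Python sketch encodes the fragment of the specification calculus without recursion and repetition (those would be handled by unfolding, as in [S-Rec], and by expanding into nested binary compositions, as in [S-Rep]). The term classes and the functions `terminated`, `steps`, and `run` are hypothetical names; labels are tuples `("!", p, q, v)`, `("?", p, q, v)`, and `("#", p, q)`, and roles are plain strings rather than indexed roles. Because + and ∥ make the calculus nondeterministic, `steps` returns all successors and `run` tracks the list of reachable states.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Label = Tuple  # ("!", p, q, v), ("?", p, q, v), or ("#", p, q)


@dataclass(frozen=True)
class Skip:        # the auxiliary term 1
    pass


@dataclass(frozen=True)
class Com:         # p q : f, a communication of a value satisfying f
    p: str
    q: str
    f: Callable[[Any], bool]


@dataclass(frozen=True)
class Rcv:         # auxiliary: pending receive of v by q, previously sent by p
    p: str
    q: str
    v: Any


@dataclass(frozen=True)
class Cls:         # closing of the channel from p to q
    p: str
    q: str


@dataclass(frozen=True)
class Bin:         # S1 op S2 with op in {"+", ".", "||"}
    op: str
    s1: Any
    s2: Any


def terminated(s: Any) -> bool:
    # Fig. 9: 1 terminates; + needs one side to terminate; . and || need both.
    if isinstance(s, Skip):
        return True
    if isinstance(s, Bin):
        if s.op == "+":
            return terminated(s.s1) or terminated(s.s2)
        return terminated(s.s1) and terminated(s.s2)
    return False


def steps(s: Any, label: Label) -> List[Any]:
    # Fig. 10 (without fix and repetition): all S' such that s --label--> S'.
    kind = label[0]
    if isinstance(s, Com):
        if kind == "!" and label[1:3] == (s.p, s.q) and s.f(label[3]):
            return [Rcv(s.p, s.q, label[3])]          # [S-Com], send part
        return []
    if isinstance(s, Rcv):
        return [Skip()] if label == ("?", s.p, s.q, s.v) else []   # [S-Com], receive part
    if isinstance(s, Cls):
        return [Skip()] if label == ("#", s.p, s.q) else []        # [S-Cls]
    if isinstance(s, Bin):
        left = steps(s.s1, label)
        if s.op == "+":                                            # [S-Alt]
            return left + steps(s.s2, label)
        if s.op == ".":                                            # [S-Seq1], [S-Seq2]
            out = [Bin(".", s1p, s.s2) for s1p in left]
            if terminated(s.s1):
                out += steps(s.s2, label)
            return out
        if s.op == "||":                                           # [S-Par1], [S-Par2]
            return ([Bin("||", s1p, s.s2) for s1p in left]
                    + [Bin("||", s.s1, s2p) for s2p in steps(s.s2, label)])
    return []


def run(s: Any, trace: List[Label]) -> List[Any]:
    # All specification states reachable by the trace; an empty list signals a violation.
    states = [s]
    for label in trace:
        states = [t for st in states for t in steps(st, label)]
    return states
```

A trace is allowed if `run` returns a nonempty list; it is a complete run if, in addition, some resulting state terminates.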

**Fig. 11.** DSL workflow

**Fig. 12.** API workflow

**Inclusion relation.** Finally, we define a relation to decide if the behavior of an implementation I is included in the behavior of a specification S.

First, let † range over functions from heap locations to sender–receiver pairs; informally, † establishes a correspondence between channel references in the implementation (characterized by their heap locations) and channel references in the specification (characterized by the roles that use them as sender/receiver).

Next, let $\to_I \subseteq \to$. We call $\to_I$ an *execution* of $I$ if it satisfies these conditions:


Finally, a $(\dagger, \to_I)$-*simulation* **R** is a binary relation such that if $(\hat{I}, E, H) \xrightarrow{\alpha}_I (\hat{I}', E', H')$ and $(\hat{I}, E, H)\ \mathbf{R}\ S$, then for some $S'$:

**–** if $\alpha \in \{\ell!v, \ell?v, \ell\#\}$ for some $\ell, v$, then $S \xrightarrow{\alpha[\dagger(\ell)/\ell]} S'$ and $(\hat{I}', E', H')\ \mathbf{R}\ S'$;

**–** if $\alpha = \tau$, then $(\hat{I}', E', H')\ \mathbf{R}\ S$.

In words, if $(\hat{I}, E, H)\ \mathbf{R}\ S$, then whenever $\hat{I}$ can reduce to $\hat{I}'$, $S$ can reduce accordingly to $S'$ (and $\hat{I}'$ and $S'$ are again related by **R**), up to τ-reductions (**R** is weak [24]).

Implementation $I$ is *safe* relative to specification $S$, denoted as I S, if for every execution $\to_I$ of $I$, there is a $(\dagger, \to_I)$-simulation **R** such that $(I, \emptyset, \emptyset)\ \mathbf{R}\ S$.
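For a single finite execution, the existence of a simulation amounts to the specification being able to match each channel action after renaming heap locations via †, while τ-steps are skipped. The Python sketch below checks exactly that, under assumptions: the specification is given as an explicit finite transition system (a dictionary from states to labeled successors, a hypothetical format), and `dagger` maps integer heap locations to (sender, receiver) pairs. Because the specification may be nondeterministic, the sketch tracks the set of all matching specification states; a nonempty set at the end yields a path along which **R** can be defined.

```python
from typing import Dict, Hashable, List, Tuple

ImplLabel = Tuple   # ("!", loc, v), ("?", loc, v), ("#", loc), or ("tau",)
SpecLabel = Tuple   # ("!", p, q, v), ("?", p, q, v), or ("#", p, q)
LTS = Dict[Hashable, List[Tuple[SpecLabel, Hashable]]]


def relabel(alpha: ImplLabel, dagger: Dict[int, Tuple[str, str]]) -> SpecLabel:
    # alpha[dagger(l)/l]: replace heap location l by its (sender, receiver) pair.
    kind, loc, *rest = alpha
    p, q = dagger[loc]
    return (kind, p, q, *rest)


def safe_prefix(execution: List[ImplLabel], dagger: Dict[int, Tuple[str, str]],
                lts: LTS, init: Hashable) -> bool:
    # True iff some path of the specification weakly matches this finite execution.
    states = {init}
    for alpha in execution:
        if alpha[0] == "tau":
            continue                  # tau-steps leave the specification state unchanged
        beta = relabel(alpha, dagger)
        states = {t for s in states for (lbl, t) in lts.get(s, []) if lbl == beta}
        if not states:
            return False              # no matching specification step: unsafe
    return True
```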

## **4 Implementation**

**The DSL.** The DSL consists of: Clojure macros to write specifications (cf. syntax of the specification calculus; Sect. 3); Clojure data structures to represent specifications as state machines (cf. operational semantics of the specification calculus); Clojure functions to instantiate these data structures and construct monitors. The workflow is shown in Fig. 11: first, the programmer writes a specification S using the macros; then, at run-time, function spec is applied to S to expand and evaluate the macros to a data structure S; then, function moni is applied to S to construct a monitor.

Essentially, the monitor provides two operations, depicted as "lollipops" in Fig. 11: *checking* if a given channel action α is allowed by S (formally: $S \xrightarrow{\alpha} S'$ for some $S'$), and subsequently *updating* S to its successor. In this way, effectively, the monitor incrementally builds a formal simulation to ensure safety (Sect. 3, page 10). We note that checking/updating is protected by lock-free synchronization (compare-and-set): an α-reduction of the monitor state happens only if α was checked to be allowed *and* the state has not been updated since that check.

**The API.** The API consists of Clojure functions that act as proxies for Clojure's own functions to send, receive, close, and construct channels. The workflow is shown in Fig. 12: first, the programmer writes an implementation I using Clojure's own functions; then, by loading library discourje.core.async instead of clojure.core.async, the programmer adds instrumentation to the implementation that allows channel actions to be monitored. As the signatures of Discourje's send, receive, and close functions are identical to Clojure's, adding instrumentation in this way is non-invasive and nearly effortless; the only changes needed pertain to channel creation (Sect. 2, Fig. 6), since we require the programmer also to specify which roles will use the channel and to associate a monitor (this is the practical embodiment of function † in Sect. 3, page 10).<sup>4</sup>

Discourje's send function works as follows. When invoked, first, it waits until the underlying channel c is not full (recall that channels in Clojure are bounded and blocking). Then, at time t1, it calls the monitor associated with c to check if the send is allowed. If yes, at time t2, it calls the monitor to update accordingly and the "actual send" happens through c; if no, only an exception is thrown. If, between t1 and t2, multiple threads call the monitor to update, only one will succeed; the others need to retry from the start. Discourje's receive and close functions work similarly. In this way, Discourje detects safety violations in a way that is both sound (if an exception is thrown, the violating action really was not allowed) and complete (if no exception is thrown, all actions were really allowed).
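The check-then-update protocol above can be sketched in Python as follows. This is not Discourje's implementation (which is written in Clojure): Python offers no user-level compare-and-set, so the hypothetical `AtomicRef` emulates one with a lock; the monitor is reduced to a deterministic transition system over hypothetical labels; and `queue.Queue(maxsize=...)` stands in for a bounded channel. The loop mirrors the described steps: wait until the channel is not full, check the action at t1, attempt the update with compare-and-set at t2, and retry from the start if another thread updated the monitor in between.

```python
import queue
import threading
import time
from typing import Any, Dict, Hashable, Tuple


class AtomicRef:
    """Stand-in for a compare-and-set cell; a lock makes compare_and_set atomic in Python."""

    def __init__(self, value: Any) -> None:
        self._value = value
        self._lock = threading.Lock()

    def get(self) -> Any:
        return self._value

    def compare_and_set(self, expected: Any, new: Any) -> bool:
        # Succeeds only if the value is still the object read earlier (identity check).
        with self._lock:
            if self._value is not expected:
                return False
            self._value = new
            return True


class ProtocolViolation(Exception):
    pass


class Monitor:
    """Hypothetical monitor over a deterministic LTS: state -> {label: next_state}."""

    def __init__(self, lts: Dict[Hashable, Dict[Tuple, Hashable]], init: Hashable) -> None:
        self.lts = lts
        self.state = AtomicRef(init)


def monitored_send(ch: queue.Queue, roles: Tuple[str, str], monitor: Monitor, v: Any) -> None:
    label = ("!", roles[0], roles[1], v)
    while True:
        while ch.full():                           # wait until the bounded channel is not full
            time.sleep(0.001)
        s = monitor.state.get()                    # time t1: check against the current state
        nxt = monitor.lts.get(s, {}).get(label)
        if nxt is None:
            raise ProtocolViolation(label)         # not allowed: throw, and nothing is sent
        if monitor.state.compare_and_set(s, nxt):  # time t2: update only if still unchanged
            ch.put(v)                              # the "actual send"
            return
        # Another thread updated the monitor between t1 and t2: retry from the start.
```

Because a failed compare-and-set only causes a retry, two threads that race for actions the specification allows in either order both succeed, whichever wins first, whereas an action that the updated state no longer allows raises an exception on the retry.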

**Extensions.** We implemented a number of extensions to the basic framework:


## **5 Evaluation**

**General setup.** We developed Discourje for two primary usage types:

<sup>4</sup> We currently support the following main channel operations of clojure.core.async: sending, receiving, and closing. Discourje works out-of-the-box for all Clojure programs, except those that use unsupported clojure.core.async features; mixing Discourje with other concurrency libraries is fine (Sect. 2).

An interesting next step is to also support clojure.core.async's transducers (operations on data-in-transit): to our knowledge, no existing work on MPST supports transducers, so supporting those requires significant new theoretical work.


A key factor that determines Discourje's fitness for purpose is *overhead*. We therefore conducted two kinds of benchmarks: microbenchmarks to study the *scalability* of Discourje and whole-program benchmarks to study the *slowdown* it inflicts relative to unmonitored code.

We used two different hardware configurations to run our benchmarks: vm2 is an instance of the TACAS'20 Artifact Evaluation Virtual Machine for VirtualBox, configured with 2 virtual cores and 8 GB of virtual memory; lisa is a high-end machine with 16 physical cores (Intel Xeon 6130 processor; hyperthreading disabled) and 96 GB of physical memory (far more than needed for our benchmarks). We hosted vm2 on a machine with 4 physical cores (Intel Core i7-8569U; hyperthreading enabled) and 16 GB of physical memory.

**Microbenchmarks.** In the microbenchmarks, we studied Discourje's scalability under extreme circumstances where threads perform *only* sends/receives and no real computations; this is the worst-case scenario for the lock-free algorithm to synchronize monitor access, as it gives rise to maximal thread contention.

We considered three specifications to investigate the core features/operators offered by the Discourje DSL in isolation, using our built-in common patterns (Fig. 7): ring for sequential composition, one-one-one (OOO) for alternative composition, and one-all-one (OAO) for parallel composition. Each pattern was recursively repeated (i.e., wrapped in (fix :X [... (fix :X)])). For Ring and OAO, a *round* consists of 1000 repetitions; for OOO, a round consists of 1000·n repetitions, where n is the number of worker threads.

For each implementation I ∈ {Ring, OOO, OAO} with n ∈ {2, 4, 6, 8, 10, 12, 14, 16} worker threads,<sup>5</sup> we recorded the mean round latency $\mu^I_n$ in eight hours of execution on lisa, the standard deviation $\sigma^I_n$, and the coefficient of variation $c^I_n = \sigma^I_n / \mu^I_n$. We found $c^I_n \le 6\%$ for all $I$ and $n$, except $c^{\mathrm{OOO}}_6 = 14\%$ and $c^{\mathrm{OOO}}_8 = 8\%$.

As a measure of scalability, we computed normalized means $|\mu^I_n| = \mu^I_n / (0.5 \cdot n \cdot \mu^I_2)$: this metric is a dimensionless number that indicates the extent to which implementations scale linearly in the number of worker threads, relative to n = 2. For instance, if $|\mu^I_{16}| = 1$, then I with 16 worker threads is exactly 8× as slow as I with 2 worker threads; this is reasonable, because the worker threads perform 8× more sends and receives in each round (due to the adversarial microbenchmark conditions, the sends and receives are effectively linearized by the monitor, which can check and update at most one channel action at a time).
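Both metrics are simple to compute from recorded round latencies. The Python sketch below assumes the population standard deviation (the text does not state which estimator was used); the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(samples: Sequence[float]) -> float:
    # c = sigma / mu, using the population standard deviation (an assumption).
    return pstdev(samples) / mean(samples)


def normalized_mean(mu_n: float, mu_2: float, n: int) -> float:
    # |mu_n| = mu_n / (0.5 * n * mu_2); 1.0 means exactly linear scaling relative to n = 2.
    return mu_n / (0.5 * n * mu_2)
```

For instance, hypothetical latencies μ₂ = 10 ms and μ₁₆ = 80 ms give a normalized mean of exactly 1, i.e., perfectly linear scaling; a value below 1 indicates better than linear scaling.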

The normalized means are shown in Fig. 13; our raw data (including standard deviations) are included in our artifact. We summarize the findings:

<sup>5</sup> For Ring, the total number of threads is n; for OOO and OAO, the total number of threads is n+1 (the master thread).

**Fig. 13.** Microbenchmarks on lisa: Ring (blue), OOO (red), and OAO (yellow); number of threads (x-axis) vs. scalability relative to n = 2 (y-axis)

**Fig. 14.** Whole-program benchmarks on vm2: Chess (left) and NPB (right); play time (x-axis, left) and program (x-axis, right) vs. monitoring slowdown (y-axis)

**Fig. 15.** Whole-program benchmarks on lisa: CG, FT, IS, and MG (from left to right); number of threads (x-axis) vs. monitoring slowdown (y-axis)


To conclude, Ring (which exercises sequential composition) enjoys excellent scalability, while OOO (which exercises alternative composition) enjoys decent scalability, even under the adversarial microbenchmark conditions. Scalability of OAO (parallel composition) can be improved; we discuss one avenue in Sect. 7.

**Whole-program benchmarks.** In our whole-program benchmarks, we studied Discourje's possible slowdown in five real(istic)/existing concurrent programs:


For Chess, we used Clojure code similar to threads Alice and Bob in Tic-Tac-Toe (Fig. 3), combined with invocations of the open source chess engine Stockfish 10 (https://stockfishchess.org) to compute moves. For CG, FT, IS, and MG, we adapted existing Java implementations from the *NAS Parallel Benchmarks* (NPB) [23] suite, which consists of computational fluid dynamics kernels, by taking advantage of our Java interoperability wrapper (Sect. 4) to replace the monitor-based synchronization used in the original versions.

We also wrote specifications for these implementations in the Discourje DSL. For Chess, the specification is the same as the Tic-Tac-Toe specification (Sect. 2); for CG, FT, IS, and MG, the specifications consist of recursively repeated choices among various instances of the one-all-one pattern (each of which involves different subsets of worker threads and message types); the key difference between the specifications, then, is the *frequency* with which repetitions occur.

We recorded execution times of each of the implementations without and with monitoring enabled, using existing/standardized workloads. For Chess, the workload is controlled by the total amount of time each player has to compute its moves during the entire game; we used the four smallest such workloads supported by the open source chess server Lichess (https://lichess.org), namely {15, 30, 45, 60} seconds, and we limited games to a maximum of 40 turns per player (*UltraBullet chess*).<sup>6</sup> For CG, FT, IS, and MG, the workload is controlled by the input size; we used the standardized inputs that are predefined by NPB.

We ran Chess on vm2; we ran CG-n, FT-n, IS-n, and MG-n on vm2 for n = 2 and on lisa for n ∈ {2, 4, 6, 8, 10, 12, 14, 16}. We repeated each of the runs 50 times to smooth out variability; the resulting coefficients of variation are below 5% for CG, FT, IS, and MG, and between 19% and 22% for Chess (because moves are not computed deterministically, which affects the number of turns per game). As a measure of slowdown, we computed normalized means of execution times with monitoring, $\mu_w$, against those without monitoring, $\mu_{wo}$ (i.e., $\mu_w / \mu_{wo}$): this metric is a dimensionless number that indicates the factor by which monitoring slows down the implementation.

<sup>6</sup> We allow concurrent "ponder" computations during opponents' turns.

The normalized means are shown in Figs. 14-15; the raw data (including standard deviations) are included in our artifact. We summarize the findings:


To conclude, we believe it is encouraging to see that *even* (extended versions of) the specification that scaled poorest in our microbenchmarks can perform well enough in real concurrent programs for both usage types A and B.

## **6 Related Work**

Expressiveness issues of multiparty session types (MPST) have received some attention, but efforts have primarily been geared towards adding more advanced features (e.g., time [5,36], security [7,8,9,13], and parametrisation [14,20,39]); in contrast, restrictions on the usage of core features like choice and interleaving have remained, even though they limit MPST's applicability in practice (e.g., our Tic-Tac-Toe specification cannot be expressed; Fig. 5). Recently, work has been done to improve MPST's expressiveness in this regard using static techniques [31], but our specification language in this paper is still more expressive.

Closest to our work, then, are hybrid MPST approaches that combine static type-checking with a form of distributed runtime monitoring and/or assertion checking [3,4,19,36,37]. In contrast to this paper, however, these dynamic techniques still rely on projection, which limits expressiveness (Sect. 1); none of the specifications in this paper are supported.

Projection-free MPST has also been explored by López et al. [34,43]. Their idea is to specify MPI communication protocols in an MPI-tailored DSL, inspired by MPST, and to verify the implementation against the specification using deductive verification tools (VCC [18] and Why3 [21]). However, this approach does not support push-button verification: considerable manual effort is required. In contrast, our approach is fully automated.

We are aware of only two other works that use formal techniques to reason about Clojure programs: Bonnaire-Sergeant et al. [6] formalized the optional type system for Clojure and proved soundness, while Pinzaru et al. [41] developed a translation from Clojure to Boogie [2] to verify Clojure programs annotated with pre/post-conditions. Ours is the first paper that targets concurrency in Clojure.

Verification of shared-memory concurrency with channels has received attention in the context of Go [40,32,33,45]. However, emphasis in these works is on checking deadlock-freedom, liveness, and generic safety properties, while we focus on program-specific protocol compliance. Castro et al. [14] also consider protocol compliance, but their specification language (of global types) is less expressive than ours and does not support this paper's examples.

## **7 Conclusion**

We presented Discourje: a runtime verification framework for channel-based communication protocols in Clojure. Discourje is based on a projection-free interpretation of multiparty session types, trading static type-checking for dynamic runtime monitoring to alleviate expressiveness issues. A key design principle of Discourje has been ergonomics: we aim to make Discourje's use as comfortable as possible. Specifically, programmers can decide to start using Discourje at any stage of development (and doing so requires little effort); Discourje is itself implemented in Clojure (so no need to use a different IDE, learn completely new syntax, or install special compilers); and Discourje can be used seamlessly alongside other concurrency libraries. The framework has a formal foundation, and benchmarks indicate that monitoring overhead can be less than 5% for real/existing concurrent programs. This makes Discourje suitable both as a testing/debugging tool in development, and as a fail-safe mechanism in production.

We list two interesting avenues for future work. First, we want to refine our lock-free synchronization algorithm to enhance the way parallel composition is handled. Second, a much more profound extension pertains to *feedback* and *recovery*. Specifically, we want to explore the idea that whenever a monitor detects a violation, instead of throwing an exception, it should simply *delay* the violating action as a corrective measure, in an attempt to steer the implementation toward safe behavior. When done naively, such delays can easily yield deadlocks, so our plan is to combine this with runtime model-checking/reachability analysis to check if *eventually*, the violating action is allowed (if yes, delay; if no, throw).

*Acknowledgments.* Funded by the Netherlands Organisation of Scientific Research (NWO): 016.Veni.192.103. This work was carried out on the Dutch national e-infrastructure with the support of SURF Cooperative.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Probabilistic Systems

#### **Scenario-Based Verification of Uncertain MDPs***-*

Murat Cubuktepe<sup>1</sup> , Nils Jansen<sup>2</sup> , Sebastian Junges<sup>3</sup> , Joost-Pieter Katoen<sup>3</sup> , Ufuk Topcu<sup>1</sup>

<sup>1</sup> The University of Texas at Austin, Austin, USA

<sup>2</sup> Radboud University Nijmegen, Nijmegen, The Netherlands

<sup>3</sup> RWTH Aachen University, Aachen, Germany

**Abstract.** We consider Markov decision processes (MDPs) in which the transition probabilities and rewards belong to an uncertainty set parametrized by a collection of random variables. The probability distributions for these random parameters are unknown. The problem is to compute the probability to satisfy a temporal logic specification within any MDP that corresponds to a sample from these unknown distributions. In general, this problem is undecidable, and we resort to techniques from so-called scenario optimization. Based on a finite number of samples of the uncertain parameters, each of which induces an MDP, the proposed method estimates the probability of satisfying the specification by solving a finite-dimensional convex optimization problem. The number of samples required to obtain a high confidence on this estimate is independent of the number of states and the number of random parameters. Experiments on a large set of benchmarks show that a few thousand samples suffice to obtain high-quality confidence bounds with a high probability.

**Keywords:** MDP, Uncertainty, Verification, Scenario optimisation

## **1 Introduction**

MDPs. Markov decision processes (MDPs) model sequential decision-making problems in stochastic dynamic environments [51]. They are widely used in areas like planning [52], reinforcement learning [53], formal verification [48], and robotics [24]. Mature model checking tools like PRISM [21] and Storm [35] employ efficient algorithms to verify the correctness of MDPs against temporal logic specifications [2], provided that all transition probabilities and cost functions are known exactly. In many applications, however, this assumption may be unrealistic, as certain system parameters are typically not known exactly and are controlled by external sources.

<sup>-</sup> Supported by the grants ARL # ACC-APG-RTP W911NF, NASA # 80NSSC19K0209, NSF # 1646522, and NSF # 1652113.

A. Biere and D. Parker (Eds.): TACAS 2020, LNCS 12078, pp. 287–305, 2020. https://doi.org/10.1007/978-3-030-45190-5\_16

Uncertain MDPs. A common approach to deal with unknown system parameters is to let transition probabilities and cost functions of an MDP belong to uncertainty sets, resulting in so-called uncertain MDPs [25,14,28], which generalize interval MDPs [20,27,7]. However, solution approaches, e.g., in [25,14,28], usually rely on the potentially limiting assumption that the uncertainty sets at different states of the MDP are independent of each other.

Consider a simple motion planning scenario where an unmanned aerial vehicle (UAV) is tasked to transport a certain payload to a target location. The problem is to compute a policy for the UAV to successfully deliver the payload while taking into account the weather conditions. External factors like wind strength or direction may affect the movement of the UAV. The assumption that such weather conditions are independent between the different possible states of the UAV is unrealistic and does not adequately model the scenario at hand.

For settings in which the uncertainties at different states depend on each other, an option is to account for all possible (albeit infinitely many) values in the uncertainty sets. The policy synthesis problem can be formulated as a so-called semi-infinite convex optimization problem, which includes finitely many variables but infinitely many constraints [28]. This problem, however, is NP-hard [28,18]. Furthermore, it fails to exploit additional information that may be available as random variables over the uncertainty sets [36], and it may be very conservative. For instance, weather data in the form of probability distributions may provide additional information on potential changes during the mission.

In this paper, we study a setting in which the uncertain parameters are random variables and the dependencies between them are accounted for explicitly. Furthermore, each random parameter follows an unknown probability distribution from which we can sample parameter values.

Problem statement. Compute the probability with which there exists a policy such that a reachability or an expected-cost specification is satisfied for any randomly drawn parameter value.

We call this probability the satisfaction probability. The intuition is that the question of whether all (or some) parameter values satisfy a specification, as is often asked in parameter synthesis [46], is replaced by the question of how likely a (sampled) model is to satisfy the specification. For example, a satisfaction probability of 80% means that, if we randomly sample the parameters, then with a probability of 80% there exists a policy for the resulting MDP that satisfies the specification. Computing the satisfaction probability is in general undecidable, even for known probability distributions over the parameter values [37].

Scenario-based verification. Therefore, we resort to sampling-based algorithms that yield a confidence (probability) on bounds of the satisfaction probability. Referring back to the UAV example, we want to attach a confidence probability to an estimate of the probability that there exists a policy for the UAV to successfully finish the mission. As a first step, we take the aforementioned semi-infinite optimization problem that accounts for all possible parameter values as a basis. Each concrete parameter value is referred to as a scenario in the convex optimization literature [15]. For specific problems where a distribution over individual scenarios is present, a technique called scenario-based optimization provides guarantees on the satisfaction probability via efficient sampling techniques [15,16]. The basic idea is to consider a finite set of samples from the distribution over the scenarios and restrict the semi-infinite problem to these samples. The resulting convex optimization problem with finitely many constraints can be solved efficiently [50].

For our setting, we first sample a finite number of parameter instantiations, each of which induces a concrete MDP. We can solve the synthesis problem for each of these MDPs efficiently using, e.g., a probabilistic model checker. Based on the results, we compute a satisfaction probability and an estimate of its potential error. For example, a 90% estimate of a satisfaction probability of 80% means that the error is at most 10%. We show that the error in the estimate diminishes to zero exponentially fast with an increasing number of samples. Moreover, we show that the number of required samples depends neither on the size of the state space nor on the number of random parameters. We validate the theoretical results using several MDPs with different sizes of state and parameter spaces and demonstrate experimentally that the required number of samples is indeed insensitive to the dimension of the state and parameter spaces. In addition, we show the effectiveness of our method on a new dedicated case study based on the aforementioned UAV example, which incorporates 2 500 random parameters.

Related work. The so-called parameter synthesis problem is concerned with computing parameter values such that there exists a policy in the induced nonparametric MDP that satisfies the specifications. Most of the work in parameter synthesis focuses on finding one parameter value that satisfies the specification. The approaches involve computing a rational function of the reachability probabilities [11,17,41], utilizing convex optimization [34,40], and sampling-based methods [26,29]. The problem of whether there exists a value in the parameter space that satisfies a reachability specification is ETR-complete<sup>4</sup> [47], and finding a satisfying parameter value is exponential in the number of parameters.

The work in [45] considers the analysis of Markov models in the presence of uncertain rewards, utilizing statistical methods to reason about the probability of a parametric MDP satisfying an expected cost specification. This approach is restricted to reward parameters and does not explicitly compute confidence bounds. The work in [43] computes bounds on the long-run probability of satisfying a specification with probabilistic uncertainty for Markov chains. Other related techniques include multi-objective model checking to maximize the average performance with probabilistic uncertainty sets [36], sampling-based methods which minimize the regret with uncertainty sets [33], and Bayesian reasoning to compute parameter values that satisfy a metric temporal logic specification on a continuous-time Markov chain [38]. The work in [37] considers a variant of the problem in this paper where the probability distribution of the uncertainty sets is assumed to be known. It formulates the policy synthesis problem as an (undecidable [30]) partially observable Markov decision process (POMDP) synthesis problem and uses off-the-shelf point-based POMDP methods [10,6]. The works in [27,25] consider the verification of MDPs with convex uncertainties. However, the uncertainty sets for different states in an MDP are restricted to be independent, which does not hold in our problem setting where we have parameter dependencies.

<sup>4</sup> The ETR satisfiability problem is to decide if there exists a satisfying assignment to the real variables in a Boolean combination of a set of polynomial inequalities. It is known that NP ⊆ ETR ⊆ PSPACE.

Uncertainties in MDPs have received considerable attention in the artificial intelligence and planning literature. Interval MDPs [27,7] use probability intervals in the transition probabilities. Dynamic programming, robust value iteration, and robust policy iteration have been developed for MDPs with uncertain transition probabilities whose parameters are statistically independent, also referred to as rectangular, to find a policy ensuring the highest expected total reward at a given confidence level [14,25]. The work in [28] partially relaxes this independence assumption and, given an observation history of the MDP, uses conic programming to determine a policy that satisfies a given performance requirement with a pre-defined confidence. State-of-the-art exact methods can handle models of up to a few hundred states [42]. Multi-model MDPs [44] treat distributions over probability and cost parameters and aim at finding a single policy maximizing a weighted value function. For deterministic policies this problem is NP-hard, and it is PSPACE-hard for history-dependent policies.

## **2 Preliminaries**

A probability distribution over a finite set $X$ is a function $\mu\colon X \to [0, 1] \subseteq \mathbb{R}$ with $\sum\_{x \in X} \mu(x) = 1$. The set of all distributions on $X$ is denoted by $\mathit{Distr}(X)$. Let $V = \{x\_1, \ldots, x\_n\}$ be a finite set of parameters over $\mathbb{R}^n$. The set of polynomials over $V$ is denoted by $\mathbb{Q}[V]$. We denote the cardinality of a set $\mathcal{U}$ by $|\mathcal{U}|$.

#### **2.1 Parametric Models**

**Definition 1 (pMDP).** A parametric Markov decision process (pMDP) $\mathcal{M}$ is a tuple $\mathcal{M} = (S, \mathit{Act}, s\_I, V, \mathcal{P})$ with a finite set $S$ of states, a finite set $\mathit{Act}$ of actions, an initial state $s\_I \in S$, a finite set $V$ of real-valued variables (parameters), and a transition function $\mathcal{P}\colon S \times \mathit{Act} \times S \to \mathbb{Q}[V]$.

For $s \in S$, $\mathit{Act}(s) = \{\alpha \in \mathit{Act} \mid \exists s' \in S,\, \mathcal{P}(s, \alpha, s') \neq 0\}$ is the set of enabled actions at $s$. Without loss of generality, we require $\mathit{Act}(s) \neq \emptyset$ for $s \in S$. If $|\mathit{Act}(s)| = 1$ for all $s \in S$, $\mathcal{M}$ is a parametric discrete-time Markov chain (pMC). We denote the transition function for pMCs by $\mathcal{P}(s, s')$.

A pMDP $\mathcal{M}$ is a Markov decision process (MDP) if the transition function yields well-defined probability distributions, i.e., $\mathcal{P}\colon S \times \mathit{Act} \times S \to [0, 1]$ and $\sum\_{s' \in S} \mathcal{P}(s, \alpha, s') = 1$ for all $s \in S$ and $\alpha \in \mathit{Act}(s)$. We denote the parameter space of $\mathcal{M}$ by $\mathcal{V}\_{\mathcal{M}}$. Applying an instantiation $u \in \mathcal{V}\_{\mathcal{M}}$ to a pMDP $\mathcal{M}$ yields the instantiated MDP $\mathcal{M}[u]$ by replacing each $f \in \mathbb{Q}[V]$ in $\mathcal{M}$ by $f[u]$. An instantiation $u$ is well-defined for $\mathcal{M}$ if the resulting model $\mathcal{M}[u]$ is an MDP. We assume that all parameter instantiations in $\mathcal{V}\_{\mathcal{M}}$ yield well-defined MDPs. We call $u$ graph-preserving if for all $s, s' \in S$ and $\alpha \in \mathit{Act}$ it holds that $\mathcal{P}(s, \alpha, s') \neq 0 \Rightarrow \mathcal{P}(s, \alpha, s')[u] \in (0, 1]$. If $\mathcal{P}(s, \alpha, s') \in \{p, 1 - p \mid p \in V\} \cup \mathbb{Q}$, then the parameter space $\mathcal{V}\_{\mathcal{M}}$ is given by the rectangle $[0, 1]^{|V|}$. We also consider a state-action cost function $c\colon S \times \mathit{Act} \to \mathbb{Q}[V]$. We denote the set of cost parameters by $\mathcal{W}$.
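To make instantiation concrete, the following minimal Python sketch represents a pMC (the single-action case of Definition 1) as nested dictionaries whose entries are functions of a parameter valuation $u$. The names `instantiate`, `is_well_defined`, and `is_graph_preserving`, the dictionary encoding, and the small example model are hypothetical illustrations, not part of any tool; transitions that are the zero polynomial are simply omitted.

```python
from typing import Callable, Dict

Valuation = Dict[str, float]
# Hypothetical encoding of a pMC: state -> successor -> transition polynomial f(u).
# Transitions that are identically zero are omitted.
ParametricMC = Dict[str, Dict[str, Callable[[Valuation], float]]]
MC = Dict[str, Dict[str, float]]


def instantiate(pmc: ParametricMC, u: Valuation) -> MC:
    """Return D[u]: replace every transition function f by its value f[u]."""
    return {s: {t: f(u) for t, f in succ.items()} for s, succ in pmc.items()}


def is_well_defined(mc: MC, tol: float = 1e-9) -> bool:
    """Every row must be a distribution: entries in [0, 1] that sum to 1."""
    return all(
        all(0.0 <= p <= 1.0 for p in row.values()) and abs(sum(row.values()) - 1.0) <= tol
        for row in mc.values()
    )


def is_graph_preserving(pmc: ParametricMC, u: Valuation) -> bool:
    """Every non-zero transition must evaluate to a value in (0, 1] under u."""
    return all(0.0 < f(u) <= 1.0 for succ in pmc.values() for f in succ.values())


# Hypothetical example: from s0, move to s1 with probability p and to s2 with 1 - p.
example: ParametricMC = {
    "s0": {"s1": lambda u: u["p"], "s2": lambda u: 1.0 - u["p"]},
    "s1": {"s1": lambda u: 1.0},
    "s2": {"s2": lambda u: 1.0},
}
```

For instance, `is_graph_preserving(example, {"p": 0.3})` holds, whereas `p = 0` still yields a well-defined MC but removes the edge from `s0` to `s1`, so that instantiation is not graph-preserving.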

To define measures on MDPs, nondeterministic choices are resolved by a so-called policy $\sigma\colon S \to \mathit{Act}$ with $\sigma(s) \in \mathit{Act}(s)$. The set of all policies over $\mathcal{M}$ is $\mathit{Str}^{\mathcal{M}}$. For the specifications that we consider in this paper, memoryless deterministic policies are sufficient [48]. Applying a policy to an MDP yields an induced Markov chain in which all nondeterminism is resolved.
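Resolving nondeterminism with a memoryless deterministic policy amounts to selecting one distribution per state. The sketch below assumes a hypothetical dictionary encoding of a (non-parametric) MDP, `state -> action -> successor -> probability`; `induced_mc` is an illustrative helper, not a tool API.

```python
from typing import Dict

MDP = Dict[str, Dict[str, Dict[str, float]]]  # state -> action -> successor -> probability
MC = Dict[str, Dict[str, float]]  # state -> successor -> probability
Policy = Dict[str, str]  # memoryless deterministic policy sigma: S -> Act


def induced_mc(mdp: MDP, sigma: Policy) -> MC:
    """Return the MC obtained by always taking action sigma(s) in state s."""
    mc: MC = {}
    for s, actions in mdp.items():
        a = sigma[s]
        if a not in actions:  # sigma(s) must be an enabled action, i.e., in Act(s)
            raise ValueError(f"action {a!r} is not enabled in state {s!r}")
        mc[s] = dict(actions[a])
    return mc


# Hypothetical MDP with a choice between actions "a" and "b" in s0.
mdp: MDP = {
    "s0": {"a": {"s1": 0.5, "s2": 0.5}, "b": {"s2": 1.0}},
    "s1": {"stay": {"s1": 1.0}},
    "s2": {"stay": {"s2": 1.0}},
}
```

Choosing `"b"` in `s0` yields an MC in which `s0` moves to `s2` with probability 1; a policy that picks a non-enabled action is rejected.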

For an MC $\mathcal{D}$, the reachability specification $\varphi\_r = \mathbf{P}\_{\le \lambda}(\Diamond T)$ asserts that a set $T \subseteq S$ of target states is reached with probability at most $\lambda \in [0, 1]$<sup>5</sup>. If $\varphi\_r$ holds for $\mathcal{D}$, we write $\mathcal{D} \models \varphi\_r$. Model checking of the more general PCTL [4] or $\omega$-regular specifications is often reducible to checking reachability specifications [48]. For an MDP $\mathcal{M}$, $\varphi\_r$ holds if, for all $\sigma \in \mathit{Str}^{\mathcal{M}}$, the MC induced by $\sigma$ reaches $T$ with probability at most $\lambda$. For an expected cost specification $\varphi\_c = \mathrm{EC}\_{\le \kappa}(\Diamond G)$, it holds that $\mathcal{D} \models \varphi\_c$ if and only if the expected cost of reaching a set $G \subseteq S$ is at most $\kappa \in \mathbb{R}$. The expected cost of reaching $G$ is well-defined if and only if $\mathbf{P}(\Diamond G) = 1$ for all policies in an MDP.
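For an MC, checking $\mathcal{D} \models \varphi\_r$ reduces to computing the probability of reaching $T$ from the initial state and comparing it with $\lambda$. The following sketch uses plain value iteration, chosen here only for brevity; its stopping rule on successive iterates does not by itself certify a precision bound, so it is not a sound model-checking procedure. The names and the dictionary encoding are hypothetical.

```python
from typing import Dict, Set

MC = Dict[str, Dict[str, float]]  # state -> successor -> probability


def reach_prob(mc: MC, target: Set[str], eps: float = 1e-12,
               max_iter: int = 100_000) -> Dict[str, float]:
    """Approximate P(<>T) from every state by value iteration.

    Starting from the indicator of T, the n-th iterate is the probability of reaching T
    within n steps, so the iterates increase monotonically toward P(<>T).
    """
    x = {s: (1.0 if s in target else 0.0) for s in mc}
    for _ in range(max_iter):
        new = {
            s: 1.0 if s in target else sum(p * x[t] for t, p in mc[s].items())
            for s in mc
        }
        if max(abs(new[s] - x[s]) for s in mc) < eps:
            return new
        x = new
    return x


def satisfies_reach(mc: MC, s_init: str, target: Set[str], lam: float) -> bool:
    """D |= P_{<=lam}(<>T) iff the probability of reaching T from s_init is at most lam."""
    return reach_prob(mc, target)[s_init] <= lam


# Hypothetical MC: from s0 the target s3 is reached with probability 1/3.
mc: MC = {
    "s0": {"s1": 0.5, "s2": 0.5},
    "s1": {"s0": 0.5, "s3": 0.5},
    "s2": {"s2": 1.0},
    "s3": {"s3": 1.0},
}
```

Here $p\_{s\_0} = 0.5\, p\_{s\_1}$ and $p\_{s\_1} = 0.5\, p\_{s\_0} + 0.5$, so $p\_{s\_0} = 1/3$; the example MC therefore satisfies $\mathbf{P}\_{\le 0.4}(\Diamond T)$ but not $\mathbf{P}\_{\le 0.3}(\Diamond T)$.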

#### **2.2 Uncertain MDPs**

We now introduce the setting that we study in this paper. Specifically, we use parameters to define the uncertainty in the transition probabilities and cost functions of an MDP. Each random parameter follows an unknown probability distribution from which we can sample the parameter values.

**Definition 2 (uMDP).** An uncertain Markov decision process (uMDP) $\mathcal{M}\_{\mathbb{P}}$ is a tuple $\mathcal{M}\_{\mathbb{P}} = (\mathcal{M}, \mathbb{P})$ where $\mathcal{M}$ is a pMDP and $\mathbb{P}$ is a probability distribution over the parameter space $\mathcal{V}\_{\mathcal{M}}$. If $\mathcal{M}$ is a pMC, then we call $\mathcal{M}\_{\mathbb{P}}$ a uMC.

Intuitively, a uMDP is a pMDP with an associated distribution over possible (graph-preserving) parameter instantiations. That is, a realization of $\mathbb{P}$ yields a concrete MDP $\mathcal{M}[u]$ with the respective instantiation $u \in \mathcal{V}\_{\mathcal{M}}$ (and $\mathbb{P}(u) > 0$).

Remark 1. In a uMDP, we distinguish controllable and uncontrollable parameters. The uncontrollable parameters follow the probability distribution P. In contrast, we can actively instantiate the controllable parameters. In the following, we specifically allow cost parameters to be controllable.

**Definition 3 (Satisfaction Probability).** Let $\mathcal{M}\_{\mathbb{P}} = (\mathcal{M}, \mathbb{P})$ be a uMDP and $\varphi$ a specification. The (weighted) satisfaction probability of $\varphi$ is

$$F(\mathcal{M}\_{\mathbb{P}}, \varphi) = \int\_{\mathcal{V}\_{\mathcal{M}}} I\_{\varphi}(u) \, d\mathbb{P}(u)$$

where $u \in \mathcal{V}\_{\mathcal{M}}$ and $I\_{\varphi}\colon \mathcal{V}\_{\mathcal{M}} \to \{0, 1\}$ is the indicator function for $\varphi$, i.e., $I\_{\varphi}(u) = 1$ iff $\mathcal{M}[u] \models \varphi$.

<sup>5</sup> The theory also applies to lower-bounded properties.

**Fig. 1.** Left: A uMC with parameter $v$. Right: The probability of satisfying the reachability specification $\varphi\_r = \mathbf{P}\_{\le \lambda}(\Diamond T)$ versus the value of the parameter $v$. Intervals that satisfy $\varphi\_r$ are green; intervals that violate $\varphi\_r$ are red.

Note that $I\_{\varphi}$ is measurable, as $\mathcal{V}\_{\mathcal{M}}$ is the finite union of semi-algebraic sets [49]. Moreover, we have that $F(\mathcal{M}\_{\mathbb{P}}, \varphi) \in [0, 1]$ and $F(\mathcal{M}\_{\mathbb{P}}, \varphi) + F(\mathcal{M}\_{\mathbb{P}}, \neg\varphi) = 1$.

Example 1. Consider the uMC in the left part of Fig. 1 with the uncontrollable parameter set $V = \{v\}$, initial state $s\_0$, target set $T = \{s\_3\}$, and a uniform distribution for the parameter $v$ over the interval $[0, 1]$. The right part of Fig. 1 plots the probability of satisfying the specification $\varphi\_r = \mathbf{P}\_{\le \lambda}(\Diamond T)$ as a function of $v$, together with the satisfying region and its complement in green and red, respectively. The satisfying region is the union of the intervals $[0.13, 0.525]$ and $[0.89, 1.0]$, so the satisfaction probability $F(\mathcal{M}\_{\mathbb{P}}, \varphi\_r)$ is $0.395 + 0.11 = 0.505$.
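Since $F$ is the expectation of the indicator $I\_{\varphi}$ under $\mathbb{P}$, a plain Monte Carlo average gives a point estimate. The sketch below checks Example 1 using only the satisfying intervals stated above. In general, evaluating $I\_{\varphi}(u)$ requires model checking $\mathcal{M}[u]$ for each sample, and a sample mean alone does not provide the kind of confidence statement developed in Section 4. All names are hypothetical.

```python
import random
from typing import List, Tuple

# Satisfying region of Example 1 as stated in the text (green intervals of Fig. 1).
SAT_INTERVALS: List[Tuple[float, float]] = [(0.13, 0.525), (0.89, 1.0)]
EXACT = sum(hi - lo for lo, hi in SAT_INTERVALS)  # 0.395 + 0.11 = 0.505


def indicator(v: float) -> int:
    """I_phi(v): 1 iff v lies in a satisfying interval, i.e., M[v] |= phi_r."""
    return int(any(lo <= v <= hi for lo, hi in SAT_INTERVALS))


def estimate_satisfaction_probability(n: int, seed: int = 0) -> float:
    """Monte Carlo estimate of F = E[I_phi(v)] for v ~ Uniform[0, 1]."""
    rng = random.Random(seed)
    return sum(indicator(rng.random()) for _ in range(n)) / n
```

With $n = 200\,000$ samples the standard error of this estimate is roughly $0.001$, so it lands close to the exact value $0.505$.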

## **3 Problem Statement**

In this section, we state the problem that we study in this paper. We seek to compute the satisfaction probability of the parameter space for a reachability or an expected cost specification $\varphi$ on a uMDP. Intuitively, we seek the probability that a randomly sampled instantiation from the parameter space induces an MDP that satisfies $\varphi$. Formally: given a uMDP $\mathcal{M}\_{\mathbb{P}} = (\mathcal{M}, \mathbb{P})$ and a specification $\varphi$, compute the satisfaction probability $F(\mathcal{M}\_{\mathbb{P}}, \varphi)$. However, as mentioned, this problem is in general undecidable [37]. Therefore, we consider an approximation of the satisfaction probability:

Problem 1. Given a uMDP $\mathcal{M}\_{\mathbb{P}} = (\mathcal{M}, \mathbb{P})$, a reachability specification $\varphi\_r = \mathbf{P}\_{\le \lambda}(\Diamond T)$, and a tolerance probability $\nu$, compute a confidence probability $\alpha\_{\nu}$ such that $F(\mathcal{M}\_{\mathbb{P}}, \varphi\_r) \ge 1 - \nu$ holds with a probability of at least $1 - \alpha\_{\nu}$.

We illustrate the problem statement with the following example.

Example 2. For the UAV motion planning example, consider the question "What is the probability that, on a given day, there exists a policy for the UAV to successfully finish the mission?" A possible result is, e.g., 0.78 (confidence probability: 0.99) and 0.81 (confidence probability: 0.95). Then, with a confidence probability of 0.99, the actual satisfaction probability is indeed greater than 0.78, and with a (slightly lower) confidence probability of 0.95 it is greater than 0.81. Such a result shows that it is quite likely that the UAV will finish the mission successfully with a probability of at least 81%.

Similar to Problem 1, we also consider expected cost specifications.

Problem 2. Given a uMDP $\mathcal{M}\_{\mathbb{P}} = (\mathcal{M}, \mathbb{P})$, an expected cost specification $\varphi\_c = \mathrm{EC}\_{\le \kappa}(\Diamond G)$, a tolerance probability $\nu$, and a confidence probability $\alpha\_{\nu}$, determine whether there exists an instantiation of the cost parameters such that $F(\mathcal{M}\_{\mathbb{P}}, \varphi\_c) \ge 1 - \nu$ holds with a probability of at least $1 - \alpha\_{\nu}$.

Remark 2. The main difference between Problem 1 and Problem 2 is that we consider controllable cost parameters. We seek to compute an instantiation of these parameters such that the satisfaction probability is greater than $1 - \nu$ with high confidence.

## **4 Scenario-Based Verification**

In this section, we present our approach to solving Problems 1 and 2, that is, to approximating the satisfaction probability with respect to a specification. We first consider the robust policy synthesis problem that accounts for all possible values in the uncertainty set, potentially leading to a very pessimistic result. This problem can be formulated as a semi-infinite convex optimization problem, which is NP-hard [28]. Here, we exploit the structure of this problem, which has finitely many variables but infinitely many constraints. Our approach is based on scenario optimization [15,16]: we sample a finite number of parameter values and restrict the semi-infinite problem to these samples. The resulting finite-dimensional convex optimization problem can be solved efficiently [50]. Based on its solution, we compute a confidence bound for the estimate of the satisfaction probability. The estimate also generalizes to samples from the probability distribution that are not in the sample set.

Remark 3. For ease of presentation, we focus on uncertain Markov chains (uMCs). Our results and methods generalize to uncertain MDPs (uMDPs).

We first develop the main results for the simple setting where all sampled instantiated MCs from the parameter space $\mathcal{V}\_{\mathcal{D}}$ satisfy the reachability specification $\varphi\_r$. This assumption does not imply that all instantiated MCs satisfy $\varphi\_r$: the sample set may contain no MC that violates $\varphi\_r$ even though such an MC exists in the parameter space. In Section 4.2, we drop this assumption and allow sampled points in $\mathcal{V}\_{\mathcal{D}}$ to violate $\varphi\_r$. This completes our treatment of Problem 1. In Section 4.3, we show how our results generalize to expected cost specifications $\varphi\_c$, to solve Problem 2.

#### **4.1 Restriction to Satisfying Samples**

In this section, we assume that all sampled instantiated MCs satisfy $\varphi\_r$. We then generalize our method to any values of $\nu$. We want to check whether a uMC $\mathcal{D}$ satisfies a reachability specification $\varphi\_r = \mathbf{P}\_{\le \lambda}(\Diamond T)$ for all instantiations in the sample set $\mathcal{U}$. For each instantiation, we can formulate a linear program (LP) that is feasible if and only if $\varphi\_r$ is satisfied [51]. For a subset $\mathcal{U} \subseteq \mathcal{V}\_{\mathcal{D}}$ of the parameter space $\mathcal{V}\_{\mathcal{D}}$ of the uMC $\mathcal{D}$, we can then write the conjunction of these LPs. We assume that $\mathcal{U}$ is finite and sampled from the probability distribution $\mathbb{P}$ over the parameter space $\mathcal{V}\_{\mathcal{D}}$.

For each instantiation $u \in \mathcal{U}$, we introduce a set of linear constraints that are parametrized by $u$<sup>6</sup>. We use the following variables. For $s \in S$ and $u \in \mathcal{U}$, the variable $p\_s^u \in [0, 1]$ represents the probability of reaching the target set $T \subseteq S$ from state $s$. The variable $\tau$ represents an upper bound on the probability of reaching $T$ for all instantiations in $\mathcal{U}$. Note that $\tau$ is a variable in our formulation, whereas $\lambda$ is the threshold of the reachability specification and thus constant. The set $\neg\exists\Diamond T$ represents the set of states that cannot reach the target set $T$. The probability of reaching $T$ from these states is zero, and the set $\neg\exists\Diamond T$ does not change for different graph-preserving instantiations [17]. The set $\neg\exists\Diamond T$ can be found in polynomial time in the size of a uMC by using standard graph-based search algorithms [48]. We solve the following LP $L\_r(\mathcal{U})$, which is parametrized by each instantiation $u$ in $\mathcal{U}$:

$$\text{minimize} \quad \tau \tag{1}$$

$$\text{subject to} \quad \forall u \in \mathcal{U}:$$

$$p\_{s\_I}^u \le \tau, \tag{2}$$

$$p\_{s\_I}^u \le \lambda, \tag{3}$$

$$p\_s^u = 1, \quad \forall s \in T, \tag{4}$$

$$p\_s^u = 0, \quad \forall s \in \neg \exists \Diamond T, \tag{5}$$

$$p\_s^u = \sum\_{s' \in S} \mathcal{P}(s, s')[u] \cdot p\_{s'}^u, \quad \forall s \in S \setminus (T \cup \neg \exists \Diamond T). \tag{6}$$

The objective (1) minimizes an upper bound on the reachability probabilities of all MCs induced by $\mathcal{U}$. The constraint (2) states that $\tau$ is an upper bound on the reachability probability for all instantiations; minimizing $\tau$ therefore yields the maximal probability of reaching $T$ over all instantiated MCs. The constraint (3) ensures that the probability of reaching $T$ from the initial state $s\_I$ is at most the threshold $\lambda$. The constraint (4) sets the probability of reaching a state in $T$ from $T$ to 1. The constraint (5) sets the reachability probabilities from the states in $\neg\exists\Diamond T$ to zero. The constraint (6) computes, in the standard way, the probability of reaching $T$ for each remaining state $s \in S \setminus (T \cup \neg\exists\Diamond T)$.
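For uMCs, there is no nondeterminism, and for each fixed $u$ the constraints (4)–(6) have a unique solution once $\neg\exists\Diamond T$ has been removed (a standard property of reachability equations in MCs). The optimum of $L\_r(\mathcal{U})$ should then simply be $\tau^\ast = \max\_{u \in \mathcal{U}} p\_{s\_I}^u$, with the LP feasible iff $\tau^\ast \le \lambda$. The sketch below illustrates this observation, which is ours rather than a statement of the text: it computes $\neg\exists\Diamond T$ by backward breadth-first search, solves (4)–(6) by Gauss-Jordan elimination, and takes the maximum. For uMDPs a general LP solver would be needed; all names and example MCs are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

MC = Dict[str, Dict[str, float]]  # state -> successor -> probability


def cannot_reach(mc: MC, target: Set[str]) -> Set[str]:
    """The set 'not exists eventually T': states with no path to T (backward BFS)."""
    preds: Dict[str, Set[str]] = {s: set() for s in mc}
    for s, succ in mc.items():
        for t, p in succ.items():
            if p > 0:
                preds[t].add(s)
    seen, queue = set(target), deque(target)
    while queue:
        for s in preds[queue.popleft()]:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return set(mc) - seen


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def reach_exact(mc: MC, target: Set[str]) -> Dict[str, float]:
    """Solve constraints (4)-(6) for one instantiated MC D[u]."""
    zero = cannot_reach(mc, target)
    rest = [s for s in mc if s not in target and s not in zero]
    idx = {s: i for i, s in enumerate(rest)}
    a = [[1.0 if i == j else 0.0 for j in range(len(rest))] for i in range(len(rest))]
    b = [0.0] * len(rest)
    for s in rest:  # p_s - sum_{s' not in T} P(s, s') p_{s'} = sum_{t in T} P(s, t)
        for t, p in mc[s].items():
            if t in target:
                b[idx[s]] += p
            elif t in idx:
                a[idx[s]][idx[t]] -= p
    x = solve(a, b) if rest else []
    probs = {s: 0.0 for s in zero}  # constraint (5)
    probs.update({s: 1.0 for s in target})  # constraint (4)
    probs.update({s: x[idx[s]] for s in rest})  # constraint (6)
    return probs


def scenario_lp_value(mcs: List[MC], s_init: str, target: Set[str],
                      lam: float) -> Tuple[float, bool]:
    """Optimal tau of L_r(U) for MCs, and whether constraint (3) can be met."""
    tau = max(reach_exact(mc, target)[s_init] for mc in mcs)
    return tau, tau <= lam


# Two hypothetical sampled MCs D[u1], D[u2] sharing the same graph (graph-preserving).
mc_a: MC = {"s0": {"s1": 0.5, "s2": 0.5}, "s1": {"s0": 0.5, "s3": 0.5},
            "s2": {"s2": 1.0}, "s3": {"s3": 1.0}}
mc_b: MC = {"s0": {"s1": 0.8, "s2": 0.2}, "s1": {"s0": 0.5, "s3": 0.5},
            "s2": {"s2": 1.0}, "s3": {"s3": 1.0}}
```

For the two example MCs, the reachability probabilities from `s0` are $1/3$ and $2/3$, so $\tau^\ast = 2/3$: the scenario LP is feasible for $\lambda = 0.7$ but not for $\lambda = 0.5$.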

The semi-infinite LP $L\_r(\mathcal{V}\_{\mathcal{D}})$ has infinitely many constraints of the form (2)–(6), as the parameter space $\mathcal{V}\_{\mathcal{D}}$ is infinite. Our approach is based on scenario optimization [13,15,16], where we instantiate the parameters $u \in \mathcal{V}\_{\mathcal{D}}$ by sampling from the probability distribution $\mathbb{P}$. Then, for a given violation probability $\nu \in (0, 1)$, we compute a solution that violates the constraints in the LP $L\_r(\mathcal{V}\_{\mathcal{D}})$ with a probability that is not larger than $\nu$. We first give some properties of the LP $L\_r(\mathcal{U})$.

<sup>6</sup> We assume that each sample has a unique index.

**Theorem 1.** Let $\mathcal{D}$ be a uMC and $\mathcal{U} \subseteq \mathcal{V}\_{\mathcal{D}}$ a sample set with $K = |\mathcal{U}| \ge 2$. Assume that $\mathcal{D}[u] \models \varphi\_r$ for all $u \in \mathcal{U}$. For a given tolerance probability $\nu \in [0, 1)$, let the associated confidence probability be

$$\alpha\_{\nu} = \sum\_{i=0}^{1} \binom{K}{i} (1-\nu)^{K-i} \nu^{i}. \tag{7}$$

Then, with a probability of at least $1 - \alpha\_{\nu}$, we have

$$F(\mathcal{D}\_{\mathbb{P}}, \varphi\_r) \ge 1 - \nu. \tag{8}$$
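Because the sum in (7) has only two terms, it admits a closed form that makes the roles of $K$ and $\nu$ explicit. The following short derivation only expands (7); the numbers are an illustrative evaluation, not experimental results.

$$\alpha\_{\nu} = \binom{K}{0}(1-\nu)^{K} + \binom{K}{1}(1-\nu)^{K-1}\nu = (1-\nu)^{K-1}\bigl(1 + (K-1)\nu\bigr).$$

For example, $K = 2$ and $\nu = 0.1$ give $\alpha\_{\nu} = 0.9 \cdot 1.1 = 0.99$, i.e., a nearly vacuous confidence of $0.01$, whereas $K = 100$ and $\nu = 0.05$ give $\alpha\_{\nu} = 0.95^{99} \cdot 5.95 \approx 0.037$.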

Proof. The key idea of the proof is to relate the finite LP $L\_r(\mathcal{U})$ induced by a sampled set $\mathcal{U}$ to the semi-infinite LP $L\_r(\mathcal{V}\_{\mathcal{D}})$. Then, we use the results given in [16, Theorem 1] to obtain the lower bound $1 - \alpha\_{\nu}$. Let the convex set $C\_{\mathcal{U}}^{\mathcal{D}\_{\mathbb{P}}}(\lambda, \tau)$ be generated by the set $\mathcal{U}$ according to the probability distribution $\mathbb{P}$ over $\mathcal{V}\_{\mathcal{D}}$ as

$$C\_{\mathcal{U}}^{\mathcal{D}\_{\mathbb{P}}}(\lambda,\tau) = \{ (\lambda,\tau) \mid \text{(2)–(6) are satisfied for all } u \in \mathcal{U} \}. \tag{\star}$$

The convex set $C\_{\mathcal{U}}^{\mathcal{D}\_{\mathbb{P}}}(\lambda, \tau)$ constitutes the set of feasible solutions to the LP $L\_r(\mathcal{U})$ and is exactly in the form of Equation 5 in [16]. Using $C\_{\mathcal{U}}^{\mathcal{D}\_{\mathbb{P}}}(\lambda, \tau)$, we reformulate $L\_r(\mathcal{U})$ as the convex program

$$\begin{aligned} \text{minimize } & \tau \\ \text{subject to } & (\lambda, \tau) \in \mathcal{C}\_{\mathcal{U}}^{\mathcal{D}\_{\mathbb{P}}}(\lambda, \tau), \end{aligned} \tag{9}$$

where the last constraint denotes that, for a given $(\lambda, \tau)$, the feasible set $C\_{\mathcal{U}}^{\mathcal{D}\_{\mathbb{P}}}(\lambda, \tau)$ is not empty, i.e., there exists a feasible solution pair $(\lambda, \tau)$ to the scenario problem $L\_r(\mathcal{U})$. This convex program asserts that all MCs induced by $\mathcal{U}$ have a reachability probability of at most $\tau$ and thus satisfy the specification $\varphi\_r$. Moreover, the convex program constitutes a scenario approximation to the so-called chance-constrained problem [1]. Such an optimization problem requires that the probability of satisfying a (chance) constraint is at least a certain threshold:

$$\begin{array}{ll}\text{minimize } \tau\\\text{subject to } (\lambda, \tau) \in \mathbb{R} \times \mathbb{R},\\\mathbb{P}\left( (\lambda, \tau) \in \mathcal{C}\_{\mathcal{V}\_{\mathcal{D}}}^{\mathcal{D}\_{\mathbb{P}}}(\lambda, \tau) \right) \geq 1 - \nu. \end{array} \tag{10}$$

The chance constraint in (10) ensures that the probability that an instantiation drawn from the distribution $\mathbb{P}$ satisfies the specification $\varphi\_r$ is at least $1 - \nu$. Theorem 1 in [16] shows that the solution to the problem in (9) is feasible for the problem in (10) with a confidence probability of at least $1 - \alpha\_{\nu}$, that is, the violation probability of the solution is at most $\nu$. In our case, the violation probability is exactly the probability that an instantiated MC does not satisfy the specification $\varphi\_r$. Thus, the claim follows.

Remark 4 (Independence of model size). The confidence probability in Theorem 1 is independent of the number of states, transitions, or random parameters of the uMC. From a practical perspective, the number of samples needed for a certain confidence does not depend on the model size.

Finally, Theorem 1 asserts that with a probability of at least $1 - \alpha\_{\nu}$, the next sampled point from $\mathcal{V}\_{\mathcal{D}}$ will satisfy the specification with a probability of at least $1 - \nu$. Note that $\alpha\_{\nu}$ is the tail probability of a binomial distribution. It converges exponentially fast to 0 in $|\mathcal{U}|$ [16].
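The following sketch evaluates (7) with Python's standard `math.comb` and searches for the smallest sample size $K$ that reaches a target value of $\alpha\_{\nu}$. The function names are hypothetical. The only inputs are $K$, $\nu$, and the target, consistent with Remark 4: no property of the uMC enters the computation.

```python
from math import comb


def alpha_thm1(k: int, nu: float) -> float:
    """Eq. (7): sum_{i=0}^{1} C(K, i) (1 - nu)^(K - i) nu^i for K samples."""
    return sum(comb(k, i) * (1 - nu) ** (k - i) * nu ** i for i in range(2))


def samples_needed(nu: float, alpha_target: float, k_max: int = 10**6) -> int:
    """Smallest K >= 2 with alpha_nu <= alpha_target, found by linear search."""
    for k in range(2, k_max + 1):
        if alpha_thm1(k, nu) <= alpha_target:
            return k
    raise ValueError("no K <= k_max reaches the target confidence")
```

A linear search suffices because $\alpha\_{\nu}$, the probability that a binomial random variable with parameters $K$ and $\nu$ is at most 1, decreases monotonically in $K$.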

#### **4.2 Satisfaction Probability by Treating Violating Samples**

Theorem 1 assumes that all sampled points, that is, the induced MCs, satisfy the specification $\varphi\_r$. This is a restrictive assumption in general. To lift this assumption, we consider the discarding approach from [19]. Specifically, after sampling a set of instantiations $\mathcal{U}$ from $\mathcal{V}\_{\mathcal{D}}$ according to the probability distribution $\mathbb{P}$, we remove from the LP the constraints for the MCs that violate the specification $\varphi\_r$. We construct the set $\mathcal{R} = \mathcal{U} \setminus \mathcal{Q}$, where $\mathcal{Q}$ denotes the set of samples that induce MCs violating the specification $\varphi\_r$. Thus, $\mathcal{R}$ is the set of sampled MCs that satisfy the specification $\varphi\_r$. We then solve the LP $L\_r(\mathcal{R})$

$$\begin{array}{ll}\text{minimize} & \tau\\ \text{subject to} & \text{(2)–(6)} \quad \forall u \in \mathcal{R},\end{array} \tag{11}$$

where for $u \in \mathcal{R}$ and $s \in S$, $p\_s^u$ gives the probability of reaching $T$ from state $s$ in the instantiated MC $\mathcal{D}[u]$. The other constraints in the LP $L\_r(\mathcal{R})$ are identical to those of the LP $L\_r(\mathcal{U})$. We now give the main result of this section.
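The discarding step itself is simple once each sampled MC has been model checked. The sketch below assumes, hypothetically, that the reachability probability $p\_{s\_I}^u$ of every sample is already available (for instance from a probabilistic model checker), partitions the samples into $\mathcal{R}$ and $\mathcal{Q}$, and returns the optimal $\tau$ of $L\_r(\mathcal{R})$ together with $L = |\mathcal{Q}|$, which enters the confidence bound of the next theorem. It relies on the fact that, for an MC, constraints (4)–(6) determine $p^u$ uniquely, so the optimum of $L\_r(\mathcal{R})$ is the largest remaining reachability probability. The function name is illustrative.

```python
from typing import Dict, Hashable, Tuple


def discard_and_solve(reach: Dict[Hashable, float], lam: float) -> Tuple[float, int]:
    """Split samples into R (satisfying) and Q (violating) and solve L_r(R) for MCs.

    reach maps each sample index u to p^u_{s_I}, the probability of reaching T in D[u].
    Returns (tau, L): the optimal tau of L_r(R) and the number L = |Q| of discarded samples.
    """
    r = {u: p for u, p in reach.items() if p <= lam}  # R = U minus Q
    num_discarded = len(reach) - len(r)  # L = |Q|
    if not r:
        raise ValueError("every sample violates phi_r, so L_r(R) leaves tau unbounded")
    return max(r.values()), num_discarded
```

For example, with reachability probabilities `0.2, 0.5, 0.9, 0.35` and $\lambda = 0.6$, one sample is discarded and the optimal $\tau$ is `0.5`.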

**Theorem 2.** Let $\mathcal{D}$ be a uMC, $\mathcal{U} \subseteq \mathcal{V}\_{\mathcal{D}}$ a sample set with $K = |\mathcal{U}| \ge 2$, and $\mathcal{Q} \subseteq \mathcal{U}$ the set of samples that violate $\varphi\_r$, with $L = |\mathcal{Q}|$. For a given tolerance probability $\nu \in [0, 1)$, the associated confidence probability is

$$\alpha\_{\nu} = \binom{L+1}{L} \sum\_{i=0}^{L+1} \binom{K}{i} (1-\nu)^{K-i} \nu^i. \tag{12}$$

Then, with a probability of at least $1 - \alpha\_{\nu}$, we have

$$F(\mathcal{D}\_{\mathbb{P}}, \varphi\_r) \ge 1 - \nu. \tag{13}$$
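Equation (12) can be evaluated in the same way as (7). In the hypothetical helper below, setting $L = 0$ recovers (7), since $\binom{1}{0} = 1$ and the sum then runs up to $i = 1$. Each discarded sample both enlarges the prefactor $\binom{L+1}{L} = L + 1$ and adds a term to the binomial tail, so for a fixed $K$ discarding is paid for with a weaker confidence.

```python
from math import comb


def alpha_thm2(k: int, l: int, nu: float) -> float:
    """Eq. (12): C(L+1, L) * sum_{i=0}^{L+1} C(K, i) (1 - nu)^(K - i) nu^i.

    k is the number of samples K, l the number L of discarded (violating) samples.
    Values of 1 or more mean that the bound is vacuous for this K.
    """
    tail = sum(comb(k, i) * (1 - nu) ** (k - i) * nu ** i for i in range(l + 2))
    return comb(l + 1, l) * tail
```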

Proof. Similar to the proof of Theorem 1, the main idea is to relate the LP $L\_r(\mathcal{R})$ to the chance-constrained convex problem in (10). Then, we invoke the results from [19, Theorem 1] to get the desired result. Let the convex set $C\_{\mathcal{R}}^{\mathcal{D}\_{\mathbb{P}}}(\lambda, \tau)$, which is generated by the samples in $\mathcal{R}$, be defined by

$$C\_{\mathcal{R}}^{\mathcal{D}\_{\mathbb{P}}}(\lambda,\tau) = \{ (\lambda,\tau) \mid \text{(2)–(6) are satisfied for all } u \in \mathcal{R} \}.$$

The set $C\_{\mathcal{R}}^{\mathcal{D}\_{\mathbb{P}}}(\lambda, \tau)$ is in the form of Definition 2.1 in [16]. We reformulate the LP $L\_r(\mathcal{R})$ as the convex program

$$\begin{aligned} \text{minimize } & \tau\\ \text{subject to } & (\lambda, \tau) \in C\_{\mathcal{R}}^{\mathcal{D}\_{\mathbb{P}}}(\lambda, \tau). \end{aligned} \tag{14}$$

where the last constraint denotes that the MCs instantiated with the parameter values in $\mathcal{R}$ should induce a reachability probability of at most $\tau$ and thus satisfy the specification $\varphi\_r$. The problem in (14) is a scenario approximation to the problem in (10). Theorem 2.1 in [19] asserts that with a probability of at least $1 - \alpha\_{\nu}$, the violation probability of the solution is at most $\nu$, which is the probability of violating the specification for the next sample. As in Theorem 1, the violation probability $\nu$ is the probability that an instantiated MC does not satisfy the specification $\varphi\_r$. Thus, the claim follows.

#### **4.3 Expected Cost Specifications**

So far, we have focused on parameters that are uncontrollable and assumed to be random. Now, we consider the case where the cost function $c$ is parametric and the cost parameters are controllable. Therefore, the parameters in the cost function are now variables that we can optimize over to satisfy an expected cost specification $\varphi\_c = \mathrm{EC}\_{\le \kappa}(\Diamond G)$ for the instantiated MCs. Similar to the previous sections, we assume that we sample a set of instantiations $\mathcal{U}\_c$ from the probability distribution $\mathbb{P}$ over the parameter space $\mathcal{V}\_{\mathcal{D}}$. In this case, we modify the LP $L\_r(\mathcal{U})$ to obtain the following LP, which we denote by $L\_c(\mathcal{U}\_c)$:

$$\begin{array}{ll}\text{minimize} & \tau\\ \text{subject to} & \forall u \in \mathcal{U}\_{c},\\ & c\_{s\_I}^{u} \leq \tau, \\ & c\_{s\_I}^{u} \leq \kappa, \\ & c\_{s}^{u} = 0 \quad \forall s \in G, \\ & c\_{s}^{u} = c(s) + \sum\_{s' \in S} \mathcal{P}(s, s')[u] \, c\_{s'}^{u} \quad \forall s \in S \setminus G, \end{array} \tag{15}$$

where for $s \in S$, $c(s) \in \mathbb{R}\_{\ge 0}^{|\mathcal{W}|}$ is the cost function at state $s$, $|\mathcal{W}|$ is the number of cost parameters, and for $u \in \mathcal{U}\_c$, $c\_s^u$ gives the expected cost of reaching the target set $G$ from state $s$ in the instantiated MC $\mathcal{D}[u]$. Note that the cost parameters $\mathcal{W}$ appear in the LP $L\_c(\mathcal{U}\_c)$ as variables of the parametric cost function. In the scenario problem (15), we optimize over $c(s)$ and $c\_s^u$ to minimize the maximal induced cost of the instantiated MCs. If $c$ is an affine function, then the optimization problem $L\_c(\mathcal{U}\_c)$ is convex. In this case, the probabilistic properties of the scenario problem are given by the following theorem.
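For a fixed sample $u$, the last constraint in (15) is linear in the unknowns $c(s)$ and $c\_s^u$, which is why an affine cost function keeps $L\_c(\mathcal{U}\_c)$ a linear program. The sketch below, with hypothetical names and a hypothetical example MC, computes $c\_s^u$ for given state costs by value iteration (assuming, as required in Section 2.1, that $G$ is reached with probability 1) and lets one check numerically that the expected cost scales and adds linearly in the cost vector.

```python
from typing import Dict, Set

MC = Dict[str, Dict[str, float]]  # state -> successor -> probability


def expected_cost(mc: MC, cost: Dict[str, float], goal: Set[str],
                  eps: float = 1e-12, max_iter: int = 100_000) -> Dict[str, float]:
    """Approximate c_s: 0 on G, and c(s) + sum_{s'} P(s, s') c_{s'} elsewhere.

    Value iteration from 0; assumes G is reached with probability 1 from every state.
    """
    x = {s: 0.0 for s in mc}
    for _ in range(max_iter):
        new = {
            s: 0.0 if s in goal else cost[s] + sum(p * x[t] for t, p in mc[s].items())
            for s in mc
        }
        if max(abs(new[s] - x[s]) for s in mc) < eps:
            return new
        x = new
    return x


# Hypothetical instantiated MC D[u]: from s0, reach g with 0.5 or go to s1, which returns to s0.
mc_u: MC = {"s0": {"g": 0.5, "s1": 0.5}, "s1": {"s0": 1.0}, "g": {"g": 1.0}}
```

With state costs $c(s\_0) = 1$ and $c(s\_1) = 2$, the equations $c\_{s\_0} = 1 + 0.5\, c\_{s\_1}$ and $c\_{s\_1} = 2 + c\_{s\_0}$ give $c\_{s\_0} = 4$ and $c\_{s\_1} = 6$; doubling both costs doubles these values.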

**Theorem 3.** Let $\mathcal{D}$ be a uMC and $\mathcal{U}\_c \subseteq \mathcal{V}\_{\mathcal{D}}$ a sample set with $W = |\mathcal{W}|$ and $K = |\mathcal{U}\_c| \ge W + 1$. Assume that $\mathcal{D}[u] \models \varphi\_c$ for all $u \in \mathcal{U}\_c$. For a given tolerance probability $\nu \in [0, 1)$, let the associated confidence probability be

$$\alpha\_{\nu} = \sum\_{i=0}^{W+1} \binom{K}{i} (1-\nu)^{K-i} \nu^i. \tag{16}$$

Then, with a probability of at least $1 - \alpha_\nu$, we have $F(\mathcal{M}_{\mathbb{P}}, \varphi_c) \ge 1 - \nu$.

Proof. Following the proof of Theorem 1, we define the convex set

$$\begin{array}{ll} C^{\mathcal{D}_{\mathbb{P}}}_{\mathcal{U}_c}(\kappa,\tau,c) = \bigl\{ (\kappa,\tau,c) \mid & \forall u \in \mathcal{U}_c:\\ & c^u_{s_I} \le \tau, \\ & c^u_{s_I} \le \kappa, \\ & c^u_s = 0 \quad \forall s \in G, \\ & c^u_s = c(s) + \sum_{s' \in S} \mathcal{P}(s,s')[u]\, c^u_{s'} \quad \forall s \in S \setminus G \bigr\}. \end{array}$$

The main difference compared to the proof of Theorem 1 is that we have cost parameters in c as the decision variables and we consider an expected cost specification instead of a reachability specification. Similarly to the proof of Theorem 1, we reformulate the LP Lc(Uc) as the following convex problem

$$\begin{array}{ll}\text{minimize } \tau\\\text{subject to } (\kappa, \tau, c) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R}^{|\mathcal{W}|}, \\\ (\kappa, \tau, c) \in C\_{\mathcal{U}\_c}^{\mathcal{D}\_{\mathbb{P}}}(\kappa, \tau, c). \end{array} \tag{17}$$

This convex problem is a scenario approximation to the chance constrained problem given by

$$\begin{array}{ll}\text{minimize } \tau\\\text{subject to } (\kappa, \tau, c) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R}^{|\mathcal{W}|},\\\mathbb{P}\left( (\kappa, \tau, c) \in C\_{\mathcal{V}\_{\mathcal{D}}}^{\mathcal{D}\_{\mathbb{P}}}(\kappa, \tau, c) \right) \ge 1 - \nu. \end{array} \tag{18}$$

Therefore, as in the proof of Theorem 1, we obtain the desired claim.
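The confidence probability (16) is a binomial tail: it equals the probability that a Binomial$(K, \nu)$ random variable takes a value of at most $W + 1$. The following Python sketch evaluates (16) directly; `confidence_alpha` is an illustrative name, and the input checks mirror the assumptions of Theorem 3. It uses `math.comb`, available since Python 3.8.

```python
from math import comb


def confidence_alpha(nu: float, K: int, W: int) -> float:
    """Eq. (16): sum_{i=0}^{W+1} C(K, i) (1 - nu)^(K - i) nu^i.

    K is the number of samples |U_c| and W the number of cost parameters |W|.
    """
    if not 0.0 <= nu < 1.0:
        raise ValueError("nu must lie in [0, 1)")
    if K < W + 1:
        raise ValueError("Theorem 3 requires K >= W + 1")
    return sum(comb(K, i) * (1.0 - nu) ** (K - i) * nu ** i for i in range(W + 2))
```

For a fixed $\nu$ and $W$, the value decreases as the number of samples $K$ grows, which is what makes larger sample sets yield stronger guarantees.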

We now consider the case where we compute an instantiation of the cost variables, but only some of the instantiated MCs satisfy the expected cost specification. We construct the set $\mathcal{R}_c = \mathcal{U}_c \setminus \mathcal{Q}_c$, where $\mathcal{Q}_c$ denotes the set of samples that induce MCs violating the specification $\varphi_c$. For this case, we obtain:

**Theorem 4.** Let $\mathcal{D}$ be a uMC and let $\mathcal{U}_c, \mathcal{Q}_c \subseteq \mathcal{V}_{\mathcal{D}}$ be sample sets with $W = |\mathcal{W}|$, $K = |\mathcal{U}_c| \ge 2$, and $l = |\mathcal{Q}_c|$. For a given tolerance probability $\nu \in [0, 1)$, let the associated confidence probability be

$$\alpha\_{\nu} = \binom{l+W+1}{l} \sum\_{i=0}^{l+W+1} \binom{K}{i} (1-\nu)^{K-i} \nu^i. \tag{19}$$

Then, with a probability of at least $1 - \alpha_\nu$, we have $F(\mathcal{M}_{\mathbb{P}}, \varphi_c) \ge 1 - \nu$.

Proof. The proof is similar to the proofs of Theorems 2 and 3 and is omitted.
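With $l$ discarded samples, (19) multiplies a binomial tail up to $l + W + 1$ by the factor $\binom{l+W+1}{l}$; for $l = 0$ it reduces to (16). The Python sketch below evaluates (19); `confidence_alpha_discard` is an illustrative name. Values of 1 or more make the guarantee vacuous, which happens when too many samples are discarded relative to $K$.

```python
from math import comb


def confidence_alpha_discard(nu: float, K: int, W: int, l: int) -> float:
    """Eq. (19): C(l+W+1, l) * sum_{i=0}^{l+W+1} C(K, i) (1-nu)^(K-i) nu^i."""
    if not 0.0 <= nu < 1.0:
        raise ValueError("nu must lie in [0, 1)")
    if K < 2 or not 0 <= l <= K:
        raise ValueError("Theorem 4 requires K >= 2 and 0 <= l <= K")
    m = l + W + 1
    # Terms with i > K vanish because C(K, i) = 0, so the sum stops at min(m, K).
    tail = sum(comb(K, i) * (1.0 - nu) ** (K - i) * nu ** i
               for i in range(min(m, K) + 1))
    return comb(m, l) * tail
```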

#### **4.4 Building Scenario-Based Algorithms**

The question remains how to leverage the theoretical results to compute an estimate of the satisfaction probability in order to solve Problems 1 and 2. For instance, let $\nu$ be a violation probability and $\mathcal{U}$ the sample set. Then, we can use Theorem 2 or 4 to compute the confidence probability $\alpha_\nu$ by using the discarding approach from [19]. Conversely, for a sample set $\mathcal{U}$ and a threshold on the confidence probability $\alpha_\nu$, we perform a bisection on $\nu$. Specifically, we repeatedly apply Theorem 2 or 4 for different values of $\nu \in (0, 1)$ to check whether the corresponding confidence probability $\alpha_\nu$ is below the threshold. We thereby approximate lower and upper bounds on $\nu$.
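The bisection on $\nu$ can be sketched in Python as follows. The sketch evaluates the confidence probability in the form of (19), which reduces to (16) for $l = 0$, with `W` standing for the parameter count that enters the respective theorem. It relies on $\alpha_\nu$ being non-increasing in $\nu$, since the sum is a binomial cumulative distribution function in $\nu$. The names `alpha` and `bisect_nu`, the tolerance, and the requirement $l + W + 1 < K$ (so that $\alpha_\nu \to 0$ as $\nu \to 1$) are assumptions of this sketch. The returned pair brackets the smallest $\nu$ whose confidence probability does not exceed the threshold.

```python
from math import comb


def alpha(nu: float, K: int, W: int, l: int = 0) -> float:
    """Confidence probability in the form of Eq. (19); l = 0 gives Eq. (16)."""
    m = l + W + 1
    tail = sum(comb(K, i) * (1.0 - nu) ** (K - i) * nu ** i
               for i in range(min(m, K) + 1))
    return comb(m, l) * tail


def bisect_nu(K: int, W: int, threshold: float, l: int = 0, tol: float = 1e-9):
    """Return (lo, hi) with alpha(lo) > threshold >= alpha(hi), hi - lo <= tol."""
    if not l + W + 1 < K:
        raise ValueError("need l + W + 1 < K")
    lo, hi = 0.0, 1.0  # invariant: lo too small, hi large enough
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if alpha(mid, K, W, l) <= threshold:
            hi = mid  # confidence target met at mid: tighten the upper bound
        else:
            lo = mid  # confidence target missed: raise the lower bound
    return lo, hi
```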

The correctness of the approach is based on scenario-based optimization. However, it also applies to a solution obtained by any procedure [39]. For instance, for any obtained value of the controlled parameters, we can construct a scenario program by sampling the random parameters. We can then apply Theorem 2 or 4 to compute the confidence probability $\alpha_\nu$ or the violation probability $\nu$.

Generalization to uMDPs. Recall that we want to compute the satisfaction probability for a uMDP, that is, the probability that, for a sampled MDP, we are able to synthesize a policy that satisfies the specification $\varphi_r$. To generalize our results to uMDPs, we can modify the constraint (6) in the LP $L_r(\mathcal{U})$ as

$$p_s^u \le \sum_{s' \in S} \mathcal{P}(s, \alpha, s')[u] \cdot p_{s'}^u, \quad \forall s \in S \setminus (T \cup \neg\exists\lozenge T),\ \forall \alpha \in \mathit{ActS}(s), \tag{20}$$

asserting that, for each non-target state $s \in S$ and action $\alpha \in \mathit{ActS}(s)$, the probability induced by the minimizing policy is an upper bound on the probability variables $p^u_s$. The reachability specification $\varphi_r$ is satisfied if and only if the reachability probability at the initial state induced by the minimizing policy is less than $\lambda$. We can check whether $\varphi_r$ is satisfied by combining the constraints (20) with the constraints (2)–(5). Then, our theoretical results apply to uMDPs.
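For a single sampled MDP, the reachability probability at the initial state under a minimizing policy can also be approximated by standard value iteration started from zero, which converges from below to the minimal reachability probabilities. The Python sketch below is such an approximation, not the LP formulation above; `min_reach_probs` is an illustrative name, every non-target state is assumed to have at least one action, and the `eps`-based stopping rule is a heuristic rather than a sound error bound. Comparing the returned value at the initial state against $\lambda$ then decides $\varphi_r$ for that sample.

```python
def min_reach_probs(P, targets, max_iter=100_000, eps=1e-12):
    """Approximate minimal reachability probabilities of one sampled MDP.

    P[s][a] maps successors to probabilities; every state must be a key of P.
    Starting from 0 on non-target states, the iterates increase towards the
    minimal probabilities (min over actions, as in constraint (20)).
    """
    p = {s: (1.0 if s in targets else 0.0) for s in P}
    for _ in range(max_iter):
        new = {
            s: 1.0 if s in targets
            else min(sum(pr * p[t] for t, pr in dist.items())
                     for dist in P[s].values())
            for s in P
        }
        delta = max(abs(new[s] - p[s]) for s in P)
        p = new
        if delta < eps:  # heuristic convergence check
            break
    return p
```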

## **5 Numerical Examples**

We implemented the approach from Section 4 using the model checker Storm [35] to construct and analyze samples of MDPs. To solve the scenario optimization problems with cost parameters, we used the SCS solver [31]. All computations ran on a computer with eight 2.2 GHz cores and 32 GB of RAM.

We report on a set of well-known benchmarks used in parameter synthesis [46] that are, for instance, available on the website of the tool PARAM [17] or part of the PRISM benchmark suite [23]. Moreover, we created a dedicated case study based on the aforementioned UAV example.

#### **5.1 Parameter Synthesis Benchmarks**

Setup. In our first set of benchmarks, we adopt parametric MDPs and MCs from [32]. Essentially, the technique from that paper allows one to approximate the percentage of instantiations that satisfy (or do not satisfy) a specification. We assume a uniform distribution over the parameter space and set $\nu$ equal to the percentage of instantiations that do not satisfy the specification (and vice versa for $1 - \nu$). We solve Problem 1 and show that, with confidence $\alpha_\nu$, the satisfaction probability is at least as high as the approximate satisfaction percentages from [32]. We adapt the Consensus protocol [3] and the Bounded Retransmission Protocol (brp) [5] to uMDPs; the Crowds Protocol (crowds) [12] and the NAND Multiplexing benchmark (nand) [8] become uMCs. In Table 1, we list the type of specification checked (ϕ) and the number of parameters, states, and transitions. We also list the satisfaction probability (as obtained in [32]) for satisfying (sat) and falsifying (unsat) the specification ϕ.

**Table 1.** Information for the benchmark instances taken from [32].

**Table 2.** Confidence probabilities $\alpha_\nu$ for different numbers of samples.

Results. Table 2 shows the confidence probability α<sup>ν</sup> for each benchmark to satisfy and falsify the specification after 100, 1 000 and 10 000 samples from the parameter space. In particular, for each number of samples, we report the average α<sup>ν</sup> after running 10 full iterations of the same benchmark. Furthermore, we list the time to solve 1 000 samples for each instance (Time (s)).

The results in Table 2 show that for some benchmarks we already obtain a high confidence probability after 1 000 samples. For other benchmarks, the confidence probability is still considerably low, for instance for nand when falsifying the specification. After 10 000 samples, we obtain a very high confidence in the satisfaction probability for all benchmarks. These results demonstrate that we can efficiently compute a high confidence in the satisfaction probability. In particular, for the same number of samples, the obtained confidence probabilities are consistent across varying numbers of states and parameters of the underlying models. Therefore, the results show no dependence on the size of the models (see Remark 4).

**Fig. 2.** An example of a 3D UAV benchmark with obstacles and a target area.

#### **5.2 UAV Motion Planning**

In our second benchmark, we consider the previously mentioned UAV motion planning example to model a realistic problem with a high number of random parameters. We model the problem as a uMDP, where the parameters represent how the weather conditions affect the movement of the UAV and how the weather may change. In particular, different wind conditions induce specific satisfaction probabilities. We assume that the planning area is a certain valley for which historic weather data provide distributions over parameter values. The mission of the UAV is to transport a payload to a specific location and return safely to its initial position. The problem is to compute the satisfaction probability, that is, the probability that, for a randomly sampled MDP of this scenario, we are able to synthesize a UAV policy that satisfies the specification.

We model the problem as follows: States encode the position of the UAV, the current weather situation, and the general wind direction in the valley. Parameters describe how the weather affects the position of the UAV for different zones in the valley, and how the weather/wind may change during the day. Fig. 2 shows an example environment with zones to avoid (red) and a target zone (green). We define four different weather conditions that each induce certain probability distributions over the eight different wind directions. The parameters of the model determine the probabilities of transitioning between different weather and wind conditions at each time step. The specification is to reach the target zone safely with a probability of at least 0.9. The number of states in our example is 266 880, and the number of parameters is 2 500.

For the distributions over parameter values, that is, over weather conditions, we consider the following cases. First, we assume a uniform distribution over the different weather conditions in each zone. Second, a weather condition inducing a wind direction that pushes the UAV in the positive y-direction is five times more likely than the others. Similarly, in the third case, a weather condition that pushes the UAV in the negative x-direction is five times more likely. We depict some example trajectories of the UAV for the three cases in Fig. 2. The trajectory given by the blue dashed line represents the expected trajectory for the first case, taking a direct route to reach the target area. Similarly, the trajectories given by the black dotted and solid green lines represent the expected trajectories for the second and third cases. For the second case, we observe that the UAV avoids getting close to the obstacles in the x-direction, as the wind may push it into them. For the third case, the UAV avoids the obstacle at the bottom and then reaches the target area.

We sample 1 000 parameter instantiations for each case and approximate the maximal satisfaction probability with a confidence probability of at least $1 - \alpha_\nu$, with $\alpha_\nu = 10^{-6}$. The highest satisfaction probability, 0.86, is obtained for the first case, and the other two cases have satisfaction probabilities of 0.78 and 0.75, showing that it may be harder to navigate around the obstacles under non-uniform probability distributions. The average time to compute the satisfaction probabilities is 1 341 seconds.

Finally, we introduce costs to a 2-dimensional example, where hitting an obstacle causes (1) a cost of 100 and (2) the UAV to return to the initial position. Specifically, we introduce cost parameters for transitions that steer the UAV towards x or y-directions. We minimize the maximal possible expected cost (under all parameter values) to reach the target location. The specification asserts that the resulting expected cost should be less than 20.

We uniformly sample 1 000 parameter values for the weather conditions and observe that, on average, the UAV policies favor transitions in the y-direction over the x-direction in order to minimize the cost while ensuring that the probability of hitting an obstacle is minimized. The average expected cost of the induced MDPs is 7.41, and the satisfaction probability is 0.71. The solving time for this example is 2 274 seconds.

## **6 Conclusion**

We presented a new sampling-based approach to uncertain Markov models. Theoretically, we showed how to effectively and efficiently approximate the probability that any randomly drawn sample satisfies a temporal logic specification. Furthermore, we showed the computational tractability of our approaches by means of well-known benchmarks and a new, dedicated case study.

In the future, we plan to apply our approaches to more involved models, such as parametric extensions of continuous-time Markov chains [9] or Markov automata [22]. Another line of future work will be a closer integration with a parameter synthesis framework.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### Good-for-MDPs Automata for Probabilistic Analysis and Reinforcement Learning⋆

Ernst Moritz Hahn<sup>1,2</sup>, Mateo Perez<sup>3</sup>, Sven Schewe<sup>4</sup>, Fabio Somenzi<sup>3</sup>, Ashutosh Trivedi<sup>3</sup>, and Dominik Wojtczak<sup>4</sup>

<sup>1</sup> School of EEECS, Queen's University Belfast, UK; <sup>2</sup> State Key Laboratory of Computer Science, Institute of Software, CAS, PRC; <sup>3</sup> University of Colorado Boulder, USA; <sup>4</sup> University of Liverpool, UK

Abstract. We characterize the class of nondeterministic ω-automata that can be used for the analysis of finite Markov decision processes (MDPs). We call these automata 'good-for-MDPs' (GFM). We show that GFM automata are closed under classic simulation as well as under more powerful simulation relations that leverage properties of optimal control strategies for MDPs. This closure enables us to exploit state-space reduction techniques, such as those based on direct and delayed simulation, that guarantee simulation equivalence. We demonstrate the promise of GFM automata by defining a new class of automata with favorable properties (Büchi automata with low branching degree obtained through a simple construction) and show that going beyond limit-deterministic automata may significantly benefit reinforcement learning.

## 1 Introduction

System specifications are often captured in the form of finite automata over infinite words (ω-automata), which are then used for model checking, synthesis, and learning. Of the commonly used types of ω-automata, Büchi automata have the simplest acceptance condition, but require nondeterminism to recognize all ω-regular languages. Nondeterministic machines can use unbounded look-ahead to resolve nondeterministic choices. However, important applications, like reactive synthesis or model checking and reinforcement learning (RL) for Markov decision processes (MDPs) [23], have a game setting, which restricts the resolution of nondeterminism to be based on the past.

Being forced to resolve nondeterminism on the fly, an automaton may end up rejecting words it should accept, so that using it can lead to incorrect results. Due to this difficulty, initial solutions to these problems have been based on deterministic automata usually with Rabin or parity acceptance conditions. For two-player games, Henzinger and Piterman proposed the notion of *good-for-games (GFG)* automata [15]. These are nondeterministic automata that simulate [21,14,9] a deterministic automaton that recognizes the same language. The existence of a simulation strategy means that nondeterministic choices can be resolved without look-ahead.

⋆ This work has been supported by the National Natural Science Foundation of China (Grant Nr. 61532019), EPSRC grants EP/M027287/1 and EP/P020909/1, and a CU Boulder RIO grant.

The situation is better in the case of probabilistic model checking, because the game for which a strategy is sought is played on an MDP against "blind nature," rather than against a strategic opponent who may take advantage of the automaton's inability to resolve nondeterminism on the fly. As early as 1985, Vardi noted that probabilistic model checking can be performed with Büchi automata endowed with a limited form of nondeterminism [34]. *Limit deterministic Büchi automata (LDBA)* [4,11,29] perform no nondeterministic choice after seeing an accepting transition. Still, they recognize all ω-regular languages and are, under mild restrictions [29], *suitable* for probabilistic model checking.

Related Work. The production of deterministic and limit deterministic automata for model checking has been intensively studied [24,22,1,26,33,32,27,29,8,30,20], and several tools are available to produce different types of automata, including MoChiBA/Owl [29,30,20], LTL3BA [1], GOAL [33,32], SPOT [8], Rabinizer [19], and Büchifier [16].

So far, only deterministic and a (slightly restricted [29]) class of limit deterministic automata have been considered for probabilistic model checking [34,4,11,29]. Thus, while there have been advances in the efficient production of such automata [11,29,30,20], the consideration of suitable LDBAs by Courcoubetis and Yannakakis in 1988 [3] marks the last fundamental change in the automata foundation of MDP model checking.

Contribution. The simple but effective observation that simulation preserves suitability for MDPs (for both traditional simulation and the AEC simulation we introduce) extends the class of automata that can be used in the analysis of MDPs. This provides us with three advantages. The first advantage is that we can now use a wealth of simulation-based state-space reduction techniques [7,31,10,9] on an automaton A (e.g. an SLDBA) that we would otherwise use for MDP model checking. The second advantage is that we can use A to check whether a different language-equivalent automaton, such as an NBA B (e.g. an NBA from which A is derived), simulates A. For this second advantage, we can draw on the more powerful AEC simulation, defined in Section 4, which uses properties of winning strategies on finite MDPs. While this is not a complete method for identifying GFM automata, our experimental results indicate that the GFM property is quite frequent for NBAs constructed from random formulas and can often be established efficiently, while providing a significant state-space reduction and thus a significant advantage for model checking.

A third advantage is that we can use the additional flexibility to tailor automata for applications other than model checking, for which specialized automata classes have not yet been developed. We demonstrate this for model-free reinforcement learning (RL). We argue that RL benefits from three properties that are less important in model checking: the first, easy-to-measure property is a small number of successors; the second and third are *cautiousness*, the scope for making wrong decisions, and *forgiveness*, the resilience against making wrong decisions.

A small number of successors is a simple and natural goal for RL, as the lack of an explicit model means that the product space of a model and an automaton cannot be evaluated backwards. In a forward analysis, it matters that nondeterministic choices have to be modeled by enriching the decisions in the MDPs with the choices made by the automaton. For LDBAs constructed from NBAs, this means guessing a suitable subset of the reachable states when progressing to the deterministic part of the automaton, which yields a number of choices that is exponential in the size of the NBA. We show that we can instead use *slim automata* (Section 3.2) as a first example of NBAs that are good-for-MDPs but not limit deterministic. They have the appealing property that their branching degree is at most two, while keeping the Büchi acceptance mechanism that works well with RL [12]. (Slim automata can also be used for model checking, but they do not provide similar advantages over suitable LDBAs there, because the backwards analysis used in model checking makes selecting the correct successor trivial.)

Cautiousness and forgiveness are further properties that, while harder to quantify, are very desirable for RL: LDBAs, for example, suffer from having to make a *correct* choice when moving into the deterministic part of the automaton, and they have to make this correct choice from a very large set of nondeterministic transitions. While this is unproblematic for standard model checking algorithms that are based on backwards analysis, applications like RL that rely on forward analysis can be badly affected when more (wrong) choices are offered and when wrong choices cannot be rectified. The terms cautiousness and forgiveness refer to this: an automaton is more *cautious* if it has less scope for making wrong decisions, and more *forgiving* if it allows for correcting previously made decisions (cf. Figure 5 for an example). Our experiments (cf. Section 5) indicate that cautiousness and forgiveness are beneficial for RL.

Organization of the Paper. After the preliminaries, we introduce the "good-for-MDP" property (Section 3) and show that it is preserved by simulation, which enables all minimization techniques that offer the simulation property (Section 3.1). In Section 3.2 we use this observation to construct slim automata—NBAs with a branching degree of 2 that are neither limit deterministic nor good-for-games—as an example of a class of automata that becomes available for MDP model checking and RL. We then introduce a more powerful simulation relation, AEC simulation, that suffices to establish that an automaton is good-for-MDPs (Section 4). In Section 5, we evaluate the impact of the contributions of the paper on model checking and reinforcement learning algorithms.

## 2 Preliminaries

A *nondeterministic Büchi automaton* is a tuple $\mathcal{A} = \langle \Sigma, Q, q_0, \Delta, \Gamma \rangle$, where $\Sigma$ is a finite *alphabet*, $Q$ is a finite set of *states*, $q_0 \in Q$ is the *initial state*, $\Delta \subseteq Q \times \Sigma \times Q$ are transitions, and $\Gamma \subseteq Q \times \Sigma \times Q$ is the transition-based *acceptance condition*.

A *run* $r$ of $\mathcal{A}$ on $w \in \Sigma^\omega$ is an $\omega$-word $r_0, w_0, r_1, w_1, \ldots$ in $(Q \times \Sigma)^\omega$ such that $r_0 = q_0$ and, for $i > 0$, $(r_{i-1}, w_{i-1}, r_i) \in \Delta$. We write $\inf(r)$ for the set of transitions that appear infinitely often in the run $r$. A run $r$ of $\mathcal{A}$ is *accepting* if $\inf(r) \cap \Gamma \neq \emptyset$.
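For an ultimately periodic run, i.e., a finite prefix followed by a cycle of transitions repeated forever, $\inf(r)$ is exactly the set of transitions on the cycle, so acceptance reduces to a finite check. The Python sketch below assumes this lasso representation (runs given as lists of transitions `(q, sigma, q2)`); `lasso_accepting` is an illustrative name and not part of the paper.

```python
def lasso_accepting(q0, prefix, cycle, delta, gamma):
    """Decide acceptance of the ultimately periodic run prefix . cycle^omega.

    Raises ValueError if the transitions do not form a run of the automaton.
    """
    if not cycle:
        raise ValueError("cycle must be non-empty")
    if cycle[-1][2] != cycle[0][0]:
        raise ValueError("cycle must end where it starts")
    state = q0
    for q, sigma, q2 in prefix + cycle:
        if q != state or (q, sigma, q2) not in delta:
            raise ValueError(f"not a run: {(q, sigma, q2)}")
        state = q2
    inf_r = set(cycle)  # transitions seen infinitely often
    return not inf_r.isdisjoint(gamma)  # accepting iff inf(r) meets Gamma
```

For example, with transitions for letters `a` and `b` between states 0 and 1 and the single accepting transition `(1, "a", 0)`, the run on $a(ba)^\omega$ is accepting, whereas the run on $a^\omega$ that stays in state 0 is not.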

The *language* $L_{\mathcal{A}}$ of $\mathcal{A}$ (or the language *recognized* by $\mathcal{A}$) is the subset of words in $\Sigma^\omega$ that have accepting runs in $\mathcal{A}$. A language is $\omega$-*regular* if it is accepted by a Büchi automaton. An automaton $\mathcal{A} = \langle \Sigma, Q, q_0, \Delta, \Gamma \rangle$ is *deterministic* if $(q, \sigma, q'), (q, \sigma, q'') \in \Delta$ implies $q' = q''$. $\mathcal{A}$ is *complete* if, for all $\sigma \in \Sigma$ and all $q \in Q$, there is a transition $(q, \sigma, q') \in \Delta$. A word in $\Sigma^\omega$ has exactly one run in a deterministic, complete automaton.
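Both properties are direct conditions on the transition relation. A minimal Python sketch, assuming $\Delta$ is given as a set of triples `(q, sigma, q2)`; the helper names `is_deterministic` and `is_complete` are illustrative:

```python
def is_deterministic(delta):
    """(q, sigma, q1), (q, sigma, q2) in delta implies q1 == q2."""
    succ = {}
    for q, sigma, q2 in delta:
        # Remember the first successor per (state, letter); any other one breaks determinism.
        if succ.setdefault((q, sigma), q2) != q2:
            return False
    return True


def is_complete(states, alphabet, delta):
    """Every state has at least one transition for every letter."""
    enabled = {(q, sigma) for q, sigma, _ in delta}
    return all((q, sigma) in enabled for q in states for sigma in alphabet)
```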

A *Markov decision process (MDP)* $\mathcal{M}$ is a tuple $(S, A, T, \Sigma, L)$, where $S$ is a finite set of states, $A$ is a finite set of *actions*, $T : S \times A \rightharpoonup \mathcal{D}(S)$ is the *probabilistic transition (partial) function*, with $\mathcal{D}(S)$ the set of probability distributions over $S$, $\Sigma$ is an alphabet, and $L : S \times A \times S \to \Sigma$ is the *labeling function* of the set of transitions. For a state $s \in S$, $A(s)$ denotes the set of actions available in $s$. For states $s, s' \in S$ and $a \in A(s)$, $T(s,a)(s')$ equals $\Pr(s' \mid s, a)$.

A *run* of $\mathcal{M}$ is an $\omega$-word $s_0, a_1, s_1, \ldots \in S \times (A \times S)^\omega$ such that $\Pr(s_{i+1} \mid s_i, a_{i+1}) > 0$ for all $i \ge 0$. A finite run is a finite such sequence. For a run $r = s_0, a_1, s_1, \ldots$, we define the corresponding labeled run as $L(r) = L(s_0, a_1, s_1), L(s_1, a_2, s_2), \ldots \in \Sigma^\omega$. We write $\Omega(\mathcal{M})$ ($\mathit{Paths}(\mathcal{M})$) for the set of runs (finite runs) of $\mathcal{M}$ and $\Omega_s(\mathcal{M})$ ($\mathit{Paths}_s(\mathcal{M})$) for the set of runs (finite runs) of $\mathcal{M}$ starting from state $s$. When the MDP is clear from the context, we drop the argument $\mathcal{M}$.

A strategy in $\mathcal{M}$ is a function $\mu : \mathit{Paths} \to \mathcal{D}(A)$ such that $\mathrm{supp}(\mu(r)) \subseteq A(\mathrm{last}(r))$, where $\mathrm{supp}(d)$ is the support of $d$ and $\mathrm{last}(r)$ is the last state of $r$. Let $\Omega^{\mathcal{M}}_\mu(s)$ denote the subset of runs $\Omega^{\mathcal{M}}(s)$ that correspond to strategy $\mu$ and initial state $s$. Let $\Pi_{\mathcal{M}}$ be the set of all strategies. We say that a strategy $\mu$ is *pure* if $\mu(r)$ is a point distribution for all runs $r \in \mathit{Paths}$, and we say that $\mu$ is *positional* if $\mathrm{last}(r) = \mathrm{last}(r')$ implies $\mu(r) = \mu(r')$ for all runs $r, r' \in \mathit{Paths}$. The behavior of an MDP $\mathcal{M}$ under a strategy $\mu$ with starting state $s$ is defined on a probability space $(\Omega^\mu_s, \mathcal{F}^\mu_s, \Pr^\mu_s)$ over the set of infinite runs of $\mu$ from $s$.

## 3 Good-for-MDP (GFM) Automata

Given an MDP $\mathcal{M}$ and an automaton $\mathcal{A} = \langle \Sigma, Q, q_0, \Delta, \Gamma \rangle$, we want to compute an optimal strategy satisfying the objective that the run of $\mathcal{M}$ is in the language of $\mathcal{A}$. We define the semantic satisfaction probability for $\mathcal{A}$ and a strategy $\mu$ from state $s$ as:

$$\mathsf{PSem}\_{\mathcal{A}}^{\mathcal{M}}(s,\mu) = \Pr\_{s}^{\mu} \{ r \in \Omega\_{s}^{\mu} : L(r) \in L\_{\mathcal{A}} \} \text{ and } \mathsf{PSem}\_{\mathcal{A}}^{\mathcal{M}}(s) = \sup\_{\mu \in \Pi\_{\mathcal{M}}} \left( \mathsf{PSem}\_{\mathcal{A}}^{\mathcal{M}}(s,\mu) \right).$$

When using automata for the analysis of MDPs, we need a syntactic variant of the acceptance condition. Given an MDP $\mathcal{M} = (S, A, T, \Sigma, L)$ with initial state $s_0 \in S$ and an automaton $\mathcal{A} = \langle \Sigma, Q, q_0, \Delta, \Gamma \rangle$, the *product* $\mathcal{M} \times \mathcal{A} = (S \times Q, (s_0, q_0), A \times Q, T^\times, \Gamma^\times)$ is an MDP [17] augmented with an initial state $(s_0, q_0)$ and accepting transitions $\Gamma^\times$. The (partial) function $T^\times : (S \times Q) \times (A \times Q) \rightharpoonup \mathcal{D}(S \times Q)$ is defined by

$$T^\times((s,q),(a,q'))((s',q')) = \begin{cases} T(s,a)(s') & \text{if } (q,L(s,a,s'),q') \in \Delta,\\ \text{undefined} & \text{otherwise.} \end{cases}$$

Finally, $\Gamma^\times \subseteq (S \times Q) \times (A \times Q) \times (S \times Q)$ is defined by $((s,q),(a,q'),(s',q')) \in \Gamma^\times$ if, and only if, $(q, L(s,a,s'), q') \in \Gamma$ and $T(s,a)(s') > 0$. A strategy $\mu$ on the MDP defines a strategy $\mu^\times$ on the product, and vice versa. We define the syntactic satisfaction probabilities as

$$\begin{split} \mathsf{PSyn}^{\mathcal{M}}_{\mathcal{A}}((s,q),\mu^{\times}) &= \mathsf{Pr}^{\mu}_{s} \{ r \in \Omega^{\mu^{\times}}_{(s,q)}(\mathcal{M}\times\mathcal{A}) : \inf(r) \cap \Gamma^{\times} \neq \emptyset \}, \quad \text{and} \\ \mathsf{PSyn}^{\mathcal{M}}_{\mathcal{A}}(s) &= \sup_{\mu^{\times} \in \Pi_{\mathcal{M}\times\mathcal{A}}} \left( \mathsf{PSyn}^{\mathcal{M}}_{\mathcal{A}}((s,q_{0}),\mu^{\times}) \right). \end{split}$$

Note that $\mathsf{PSyn}^{\mathcal{M}}_{\mathcal{A}}(s) = \mathsf{PSem}^{\mathcal{M}}_{\mathcal{A}}(s)$ holds for a deterministic $\mathcal{A}$. In general, $\mathsf{PSyn}^{\mathcal{M}}_{\mathcal{A}}(s) \le \mathsf{PSem}^{\mathcal{M}}_{\mathcal{A}}(s)$ holds, but equality is not guaranteed because the optimal resolution of nondeterministic choices may require access to future events (see Figure 1).
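The product construction defined above can be sketched in Python for explicitly given models. In this sketch, a product action $(a, q')$ commits to the automaton successor $q'$ before the MDP successor is drawn. The definition leaves $T^\times$ undefined for successors whose label does not lead to $q'$; as an assumption of this sketch, the product action is kept in $(s, q)$ only if it is defined for every successor with positive probability, so that $T^\times$ stays a proper distribution. The function name `product` and the dictionary encoding of $T$ and $L$ are illustrative.

```python
def product(T, L, Q, delta, gamma, s0, q0):
    """Build the initial state, T^x and Gamma^x of M x A.

    T[(s, a)] maps successors s2 to probabilities; L[(s, a, s2)] is a letter;
    delta and gamma are sets of automaton transitions (q, letter, q2).
    """
    Tx, Gx = {}, set()
    for (s, a), dist in T.items():
        support = [s2 for s2, pr in dist.items() if pr > 0]
        for q in Q:
            for q2 in Q:
                # Keep (a, q2) only if every successor's label allows q -> q2.
                if all((q, L[(s, a, s2)], q2) in delta for s2 in support):
                    Tx[((s, q), (a, q2))] = {(s2, q2): dist[s2] for s2 in support}
                    Gx.update(((s, q), (a, q2), (s2, q2)) for s2 in support
                              if (q, L[(s, a, s2)], q2) in gamma)
    return (s0, q0), Tx, Gx
```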

Fig. 1. An NBA, which accepts all words over the alphabet $\{a, b\}$, that is not good for MDPs. The dotted transitions are accepting. For the Markov chain on the right, where the probability of $a$ and $b$ is $\frac{1}{2}$ each, the chance that the automaton makes infinitely many correct predictions is 0.

Definition 1 (GFM automata). *An automaton* $\mathcal{A}$ *is* good for MDPs *if, for all MDPs* $\mathcal{M}$*,* $\mathsf{PSyn}^{\mathcal{M}}_{\mathcal{A}}(s_0) = \mathsf{PSem}^{\mathcal{M}}_{\mathcal{A}}(s_0)$ *holds, where* $s_0$ *is the initial state of* $\mathcal{M}$*.*

For an automaton to match $\mathsf{PSem}^{\mathcal{M}}_{\mathcal{A}}(s_0)$, its nondeterminism must not rely heavily on the future; rather, it must be possible to resolve the nondeterminism on-the-fly. For example, the Büchi automaton presented on the left of Figure 1, which has to guess whether the next symbol is $a$ or $b$, is not good for MDPs, because the simple Markov chain on the right of Figure 1 does not allow resolution of its nondeterminism on-the-fly.

There are three families of automata that are known to be good for MDPs: (1) deterministic automata, (2) good for games automata [15,18], and (3) limit deterministic automata that satisfy a few side constraints [4,11,29].

A *limit-deterministic* Büchi automaton (LDBA) is a nondeterministic Büchi automaton (NBA) $\mathcal{A} = \langle \Sigma, Q_i \cup Q_f, q_0, \Delta, \Gamma \rangle$ such that $Q_i \cap Q_f = \emptyset$; $q_0 \in Q_i$; $\Gamma \subseteq Q_f \times \Sigma \times Q_f$; $(q, \sigma, q'), (q, \sigma, q'') \in \Delta$ and $q, q' \in Q_f$ implies $q' = q''$; and $(q, \sigma, q') \in \Delta$ and $q \in Q_f$ implies $q' \in Q_f$. An LDBA behaves deterministically once it has seen an accepting transition. Usual LDBA constructions [11,29] produce GFM automata. We refer to LDBAs with this property as *suitable* (SLDBAs), cf. Theorem 1.
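The conditions in this definition are purely syntactic, so they can be checked directly on an explicit automaton. The following Python sketch does so, assuming that transitions are given as sets of triples `(q, sigma, q2)` and that the caller supplies the partition into `Qi` and `Qf`; `is_ldba` is an illustrative helper. It checks limit determinism only, not the additional properties that make an LDBA suitable.

```python
def is_ldba(Qi, Qf, q0, delta, gamma):
    """Check the syntactic LDBA conditions for a partition Q = Qi ∪ Qf."""
    Qi, Qf = set(Qi), set(Qf)
    if Qi & Qf or q0 not in Qi:
        return False
    # Accepting transitions must lie entirely inside Qf.
    if any(q not in Qf or q2 not in Qf for q, _, q2 in gamma):
        return False
    succ = {}
    for q, sigma, q2 in delta:
        if q in Qf:
            if q2 not in Qf:  # Qf must be closed under transitions
                return False
            if succ.setdefault((q, sigma), q2) != q2:  # deterministic inside Qf
                return False
    return True
```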

In the context of RL, techniques based on SLDBAs are particularly useful, because these automata use the Büchi acceptance condition, which can be translated to reachability goals. Good-for-games and deterministic automata require more complex acceptance conditions, like parity, that do not have a natural translation into rewards [12].

Using SLDBAs [4,11,29] has the drawback that they naturally have a high branching degree in the initial part, as they allow for many different transitions to the accepting part of the LDBA. This can be avoided, but at the cost of a blow-up and a more complex construction and data structure [29]. We therefore propose an automata construction that produces NBAs with a small branching degree: it never produces more than two successors for a state and a letter. We call these automata *slim*. The resulting automata are not (normally) limit deterministic, but we show that they are good for MDPs.

Due to technical dependencies, we start by presenting a second observation, namely that automata that *simulate* language-equivalent GFM automata are GFM. As a side result, we observe that the same holds for good-for-games automata. The side result is not surprising, as good-for-games automata were defined through simulation of deterministic automata [15]. But, to the best of our knowledge, the observation from Corollary 1 has not been made yet for good-for-games automata.

## 3.1 Simulating GFM

An automaton A *simulates* an automaton B if the duplicator wins the *simulation game*. The simulation game is played between a duplicator and a spoiler, who each control a pebble, which they move along the edges of A and B, respectively. The game is started by the spoiler, who places her pebble on an initial state of B. Next, the duplicator puts his pebble on an initial state of A. The two players then take turns, always starting with the spoiler choosing an input letter and a transition for that letter in B, followed by the duplicator choosing a transition for the same letter in A. This way, both players produce an infinite run of their respective automaton. The duplicator has two ways to win a play of the game: if the run of A he constructs is accepting, or if the run the spoiler constructs on B is rejecting. The duplicator wins the game if he has a winning strategy, i.e., a recipe to move his pebble that guarantees that he wins. Such a winning strategy is "good-for-games," as it can only rely on the past. It can be used to transform winning strategies of B, so that, if they were witnessing a good-for-games property or were good for an MDP, then the resulting strategy for A has the same property.

Lemma 1 (Simulation Properties). *For* ω*-automata* A *and* B *the following holds.*


*Proof.* Facts (1) and (2) are well known observations. Fact (1) holds because an accepting run of <sup>B</sup> on a word <sup>α</sup> can be translated into an accepting run of <sup>A</sup> on <sup>α</sup> by using the winning strategy of A in the simulation game. Fact (2) follows immediately from Fact (1). Facts (3) and (4) follow by simulating the behaviour of B on each run.

This observation allows us to use a family of state-space reduction techniques, in particular those based on language-preserving translations for Büchi automata based on simulation relations [7,31,10,9]. This requires stronger notions of simulation, like direct and delayed simulation [9]. For the deterministic part of an LDBA, one can also use state-space reduction techniques for DBAs like [25].
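Direct simulation can be computed as a greatest fixpoint. The sketch below assumes transition-based Büchi acceptance and an explicit encoding of our own choosing; `direct_simulates` is a hypothetical name. Direct simulation is stronger than the simulation game of Section 3.1 (every accepting spoiler transition must be answered by an accepting duplicator transition), so a positive answer also establishes ordinary simulation, while a negative answer is inconclusive.

```python
from collections import defaultdict


def direct_simulates(A, B):
    """Return True if automaton A direct-simulates automaton B.

    Each automaton is a tuple (states, initial, delta, gamma): delta and gamma
    are sets of (q, sigma, q') triples, gamma holding the accepting transitions
    (transition-based Buechi acceptance).
    """
    QA, IA, DA, GA = A
    QB, IB, DB, GB = B
    succ_A = defaultdict(list)  # (p, sigma) -> [(p', is_accepting)]
    for (p, a, p2) in DA:
        succ_A[(p, a)].append((p2, (p, a, p2) in GA))
    # Greatest fixpoint: start with all pairs (p, q) and remove every pair
    # for which some spoiler move from q cannot be matched from p.
    rel = {(p, q) for p in QA for q in QB}
    changed = True
    while changed:
        changed = False
        for (p, q) in list(rel):
            for (q1, a, q2) in DB:
                if q1 != q:
                    continue
                acc = (q1, a, q2) in GB
                # The duplicator must stay related and answer an accepting
                # spoiler transition with an accepting transition.
                if not any((p2, q2) in rel and (acc_p or not acc)
                           for (p2, acc_p) in succ_A[(p, a)]):
                    rel.discard((p, q))
                    changed = True
                    break
    # Every initial state of B must be matched by some initial state of A.
    return all(any((p, q) in rel for p in IA) for q in IB)


dba = ({0}, {0}, {(0, "a", 0), (0, "b", 0)}, {(0, "a", 0)})  # G F a
blind = ({0}, {0}, {(0, "a", 0), (0, "b", 0)}, set())        # accepts nothing
print(direct_simulates(dba, dba), direct_simulates(blind, dba))  # True False
```

The loop is deliberately naive (quadratic passes over all pairs); practical tools use refined partition algorithms, but the fixpoint it computes is the same.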

Corollary 1. *All state-space reduction techniques that turn an NBA* A *into an NBA* B *that simulates* A *preserve GFG and GFM: if* A *is GFG or GFM, then* B *is GFG or GFM, respectively.*

## 3.2 Constructing Slim GFM Automata

Let us fix a Büchi automaton $\mathcal{B} = \langle \Sigma, Q, Q_0, \Delta, \Gamma \rangle$. We can write $\Delta$ as a function $\hat\delta : Q \times \Sigma \to 2^Q$ with $\hat\delta : (q,\sigma) \mapsto \{q' \in Q \mid (q,\sigma,q') \in \Delta\}$, which can be lifted to sets, using the deterministic transition function $\delta : 2^Q \times \Sigma \to 2^Q$ with $\delta : (S,\sigma) \mapsto \bigcup_{q \in S} \hat\delta(q,\sigma)$. We also define an operator, ndet, that translates deterministic transition functions $\delta : R \times \Sigma \to R$ to relations, using

$\mathsf{ndet} : (R \times \Sigma \to R) \to 2^{R \times \Sigma \times R}$ with $\mathsf{ndet} : \delta \mapsto \{(q,\sigma,q') \mid q' \in \delta(\{q\},\sigma)\}$. This is just an easy means to move back and forth between functions and relations, and helps one to visualize the maximal number of successors. We next define the variations of subset and breakpoint constructions that are used to define the well-known limit-deterministic GFM automata (which we use in our proofs) and the slim GFM automata we construct. Let $3^Q := \{(S,S') \mid S' \subsetneq S \subseteq Q\}$ and $3^Q_+ := \{(S,S') \mid S' \subseteq S \subseteq Q\}$. We define the subset notation for the transitions and accepting transitions as $\delta_S, \gamma_S : 2^Q \times \Sigma \to 2^Q$ with

$$\begin{aligned} \delta\_S \colon (S, \sigma) &\mapsto \left\{ q' \in Q \mid \exists q \in S. \ (q, \sigma, q') \in \Delta \right\} \text{ and }\\ \gamma\_S \colon (S, \sigma) &\mapsto \left\{ q' \in Q \mid \exists q \in S. \ (q, \sigma, q') \in \Gamma \right\}. \end{aligned}$$
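The subset functions $\delta_S$ and $\gamma_S$ translate directly into code. A minimal sketch, assuming the transitions Δ and the accepting transitions Γ ⊆ Δ of B are given as Python sets of `(source, letter, target)` triples; the function names mirror the notation but are otherwise our own, and the example NBA is hypothetical.

```python
def delta_hat(q, sigma, Delta):
    """Successors of a single state: {q' | (q, sigma, q') in Delta}."""
    return frozenset(t for (s, a, t) in Delta if s == q and a == sigma)


def delta_S(S, sigma, Delta):
    """Subset successor: the union of delta_hat(q, sigma) over all q in S."""
    return frozenset().union(*(delta_hat(q, sigma, Delta) for q in S))


def gamma_S(S, sigma, Gamma):
    """States reached from S on sigma via an accepting transition in Gamma."""
    return frozenset(t for (s, a, t) in Gamma if s in S and a == sigma)


# A small hypothetical NBA for F G a: state 0 waits, state 1 reads only a's.
Delta = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "a", 1)}
Gamma = {(1, "a", 1)}
print(sorted(delta_S({0}, "a", Delta)), sorted(gamma_S({0, 1}, "a", Gamma)))  # [0, 1] [1]
```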

We define the raw breakpoint transitions $\delta_R : 3^Q \times \Sigma \to 3^Q_+$ as $\big((S,S'),\sigma\big) \mapsto \big(\delta_S(S,\sigma),\ \delta_S(S',\sigma) \cup \gamma_S(S,\sigma)\big)$. In this construction, we follow the set of reachable states (first set) and the states that are reachable while passing at least one of the accepting transitions (second set). To turn this into a breakpoint automaton, we reset the second set to the empty set when it equals the first; the transitions where we reset the second set are exactly the accepting ones. The breakpoint automaton $\mathcal{D} = \langle \Sigma, 3^Q, (Q_0,\emptyset), \delta_B, \gamma_B \rangle$ is defined such that, when $\delta_R : \big((S,S'),\sigma\big) \mapsto (R,R')$, then there are three cases:


Finally, we define transitions $\Delta_{SB} \subseteq 2^Q \times \Sigma \times 3^Q$ that lead from a subset to a breakpoint construction, and $\gamma_{2,1} : 3^Q \times \Sigma \to 3^Q$ that promote the second set of a breakpoint construction to the first set as follows.


We can now define standard limit deterministic good for MDP automata.

Theorem 1. *[11]* $\mathcal{A} = \langle \Sigma, 2^Q \cup 3^Q, Q_0, \mathsf{ndet}(\delta_S) \cup \Delta_{SB} \cup \mathsf{ndet}(\delta_B), \mathsf{ndet}(\gamma_B) \rangle$ *recognizes the same language as* B*. It is limit deterministic and good for MDPs.*
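The raw breakpoint successor and the reset rule described above can be sketched as follows. This is a minimal illustration, assuming B's transitions Δ and accepting transitions Γ ⊆ Δ are given as sets of `(source, letter, target)` triples; the function names and the example NBA are hypothetical, and treating an empty first set as "no transition" is our own assumption, flagged in the code.

```python
def delta_S(S, sigma, Delta):
    """Subset successor of the set S on letter sigma."""
    return frozenset(t for (s, a, t) in Delta if s in S and a == sigma)


def gamma_S(S, sigma, Gamma):
    """Successors of S on sigma via accepting transitions."""
    return frozenset(t for (s, a, t) in Gamma if s in S and a == sigma)


def delta_R(state, sigma, Delta, Gamma):
    """Raw breakpoint successor of (S, S')."""
    S, S1 = state
    R = delta_S(S, sigma, Delta)
    R1 = delta_S(S1, sigma, Delta) | gamma_S(S, sigma, Gamma)
    return R, R1


def breakpoint_step(state, sigma, Delta, Gamma):
    """One transition of the breakpoint automaton D, as ((R, R'), accepting).

    If the second set has caught up with the first (R' == R), it is reset to
    the empty set and the transition is accepting.
    ASSUMPTION: an empty first set R is treated as 'no transition' (None).
    """
    R, R1 = delta_R(state, sigma, Delta, Gamma)
    if not R:
        return None
    if R1 == R:
        return (R, frozenset()), True
    return (R, R1), False


Delta = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "a", 1)}  # hypothetical NBA for F G a
Gamma = {(1, "a", 1)}
start = (frozenset({1}), frozenset())
# From ({1}, {}) on a, the second set catches up with the first: accepting reset.
print(breakpoint_step(start, "a", Delta, Gamma))
```

From `({0, 1}, {})`, by contrast, reading `a` only yields `({0, 1}, {1})`: the run that stays in state 0 never passes an accepting transition, so no breakpoint is reached.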

We now show how to construct a slim GFM Büchi automaton.

Theorem 2 (Slim GFM Büchi Automaton). *The automaton*

$$\mathcal{S} = \left\langle \Sigma, 3^{Q}, (Q\_0, \emptyset), \mathsf{ndet}(\delta\_B) \cup \mathsf{ndet}(\gamma\_{2,1}), \mathsf{ndet}(\gamma\_B) \cup \mathsf{ndet}(\gamma\_{2,1}) \right\rangle$$

*simulates* A*.* S *is slim, language equivalent to* B*, and good for MDPs.*

*Proof.* $\mathcal{S}$ is slim: its set of transitions is the union of two sets of deterministic transitions. We show that $\mathcal{S}$ simulates $\mathcal{A}$ by defining a strategy in the simulation game which ensures that, if the spoiler produces a run $S_0 \ldots S_{j-1}(S_j,S'_j)(S_{j+1},S'_{j+1})\ldots$ for $\mathcal{A}$, then the duplicator produces a run $(T_0,T'_0)\ldots(T_{j-1},T'_{j-1})(T_j,T'_j)(T_{j+1},T'_{j+1})\ldots$ for $\mathcal{S}$, such that (1) $S_i \subseteq T_i$ holds for all $i \in \omega$, and (2) if there are two accepting transitions $\big((S_{k-1},S'_{k-1}),\sigma_k,(S_k,S'_k)\big)$ and $\big((S_{l-1},S'_{l-1}),\sigma_l,(S_l,S'_l)\big)$ with $k<l$, then there is an $m$ with $k < m \le l$ such that $\big((T_{m-1},T'_{m-1}),\sigma_m,(T_m,T'_m)\big)$ is accepting.

To obtain this, we describe a winning strategy for the duplicator while arguing inductively that it maintains (1). Note that (1) holds initially ($T_0 = S_0$, induction basis).

Initial Phase: Every move of the spoiler (with some letter σ) that uses a transition from $\delta_S$, the subset part of $\mathcal{A}$, is followed by a move from $\delta_B$ with the same letter σ. When the duplicator follows this strategy, the following holds: when, after a pair of moves, the pebble of the spoiler is on state $S \subseteq Q$, then the pebble of the duplicator is on some state $(S,S')$. In particular, (1) is preserved during this phase (induction step).

Transition Phase: The one spoiler move (with some letter σ) that uses a transition from $\Delta_{SB}$, the transition to the breakpoint part of $\mathcal{A}$, is followed by a move from $\delta_B$ with the same letter σ. When the duplicator follows this strategy, and when, after the pair of moves, the pebble of the spoiler is on state $(S,\emptyset)$, then the pebble of the duplicator is on some state $(T,T')$ with $S \subseteq T$. In particular, (1) is preserved (induction step).

Final Phase: When the spoiler moves with some letter σ from some state $(S,S')$ to $(\bar S,\bar S')$ using a transition from $\delta_B$ (the breakpoint part of $\mathcal{A}$), and the duplicator is in some state $(T,T')$, then the duplicator does the following. He calculates $(\bar T,\emptyset) = \gamma_{2,1}\big((T,T'),\sigma\big)$ and checks whether $\bar S \subseteq \bar T$ holds. If $\bar S \subseteq \bar T$ holds, he plays this transition from $\gamma_{2,1}$ (with the same letter σ). Otherwise, he plays the transition from $\delta_B$ (with the same letter σ). In either case (1) is preserved (induction step), which closes the inductive argument for (1).

Note that no accepting transition of $\mathcal{A}$ is passed in the initial or transition phase, so the two accepting transitions from (2) must both fall into the final phase.

To show (2), we first observe that $S'_k = \emptyset$, and thus $S'_k \subseteq T'_k$ holds. Assuming for contradiction that all transitions of $\mathcal{S}$ for $\sigma_{k+1} \ldots \sigma_{l-1}$ are non-accepting, we obtain using (1), by a straightforward inductive argument, that $S'_i \subseteq T'_i$ for all $i$ with $k \le i < l$. (Note that transitions in $\delta_B$ are accepting when they are also in $\gamma_B$.)

Using that $S_l = \delta_S(S'_{l-1},\sigma_l) \cup \gamma_S(S_{l-1},\sigma_l) \subseteq \delta_S(T'_{l-1},\sigma_l) \cup \gamma_S(T_{l-1},\sigma_l)$ holds, the duplicator uses an accepting transition from $\gamma_{2,1}$ in this step.

Using Lemma 1, it now suffices to show that the language of $\mathcal{S}$ is included in the language of $\mathcal{B}$. To show this, we simply argue that an accepting run $\rho = (Q_0,Q'_0),(Q_1,Q'_1),(Q_2,Q'_2),(Q_3,Q'_3),\ldots$ of $\mathcal{S}$ on an input word $\alpha = \sigma_0,\sigma_1,\sigma_2,\ldots$ can be interpreted as a forest of finitely many finitely branching trees of overall infinite size, where all infinite branches are accepting runs of $\mathcal{B}$. Kőnig's Lemma then proves the existence of an accepting run of $\mathcal{B}$.

This forest is the usual one. The nodes are labeled by states of $\mathcal{B}$, and the roots (level 0) are the initial states of $\mathcal{B}$. Let $I = \big\{ i \in \mathbb{N} \mid \big((Q_{i-1},Q'_{i-1}),\sigma_{i-1},(Q_i,Q'_i)\big) \in \Gamma := \mathsf{ndet}(\gamma_B) \cup \mathsf{ndet}(\gamma_{2,1}) \big\}$ be the set of positions after accepting transitions in $\rho$. We define the predecessor function $\mathsf{pred} : \mathbb{N} \to I \cup \{0\}$ with $\mathsf{pred} : i \mapsto \max\{ j \in I \cup \{0\} \mid j < i \}$.

We call a node with label $q_l$ on level $l$ an end-point if one of the following applies: (1) $q_l \notin Q_l$, or (2) $l \in I$ and for all $j$ such that $\mathsf{pred}(l) \le j < l$, where $q_j$ is the label of the ancestor of this node on level $j$, we have $(q_j,\sigma_j,q_{j+1}) \notin \Gamma$.

Fig. 2. An NBA for G F a (in the upper right corner) together with an SLDBA and a slim NBA constructed from it. The SLDBA and the slim NBA are shown sharing their common part. State {0, 1}, produced by the subset construction, is the initial state of the SLDBA, while state ({0, 1}, ∅), the initial state of the breakpoint construction, is the initial state of the slim NBA. States ({1}, ∅) and ({0}, ∅) are states of the breakpoint construction that only belong to the SLDBA because they are not reachable from ({0, 1}, ∅). The transitions out of {0, 1}, except the self loop, belong to $\Delta_{SB}$. The dashed-line transition from ({0, 1}, {0}) belongs to $\gamma_{2,1}$.

Case (1) may only happen after a transition from $\gamma_{2,1}$ has been taken, when $q_l$ is not among the states that are traced henceforth. Case (2) identifies parts of the run tree that do not contain an accepting transition.

A node labeled with $q_l$ on level $l$ that is not an end-point has $|\delta_S(\{q_l\},\sigma_l)|$ children, labeled with the different elements of $\delta_S(\{q_l\},\sigma_l)$. It is now easy to show by induction over $i$ that the following holds.


Consequently, the forest is infinite, finitely branching, and finitely rooted, and thus contains an infinite path. By construction, this path is an accepting run of B.
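The slimness claim of Theorem 2 can be checked mechanically on small examples. The sketch below computes the successors of a state of $\mathcal{S}$ and explores its reachable part. We reconstruct $\gamma_{2,1}$ from the proof of Theorem 2: the duplicator's move $(\bar T,\emptyset) = \gamma_{2,1}((T,T'),\sigma)$ uses $\bar T = \delta_S(T',\sigma) \cup \gamma_S(T,\sigma)$, the second set of the raw breakpoint successor. Disabling moves whose target set is empty, the encoding, the function names, and the example NBA are our own assumptions.

```python
def delta_S(S, sigma, Delta):
    """Subset successor of the set S on letter sigma."""
    return frozenset(t for (s, a, t) in Delta if s in S and a == sigma)


def gamma_S(S, sigma, Gamma):
    """Successors of S on sigma via accepting transitions."""
    return frozenset(t for (s, a, t) in Gamma if s in S and a == sigma)


def slim_successors(state, sigma, Delta, Gamma):
    """Successors of (S, S') in the slim automaton, as {target: accepting}.

    delta_B move: raw breakpoint successor (R, R'), reset to (R, {}) and
    accepting when R' == R. gamma_{2,1} move (reconstructed): promote R' to
    the first set, giving (R', {}), always accepting.
    ASSUMPTIONS: empty R disables delta_B; empty R' disables gamma_{2,1}.
    """
    S, S1 = state
    R = delta_S(S, sigma, Delta)
    R1 = delta_S(S1, sigma, Delta) | gamma_S(S, sigma, Gamma)
    succ = {}
    if R:
        if R1 == R:
            succ[(R, frozenset())] = True
        else:
            succ[(R, R1)] = False
    if R1:
        succ[(R1, frozenset())] = True
    return succ


def explore(Q0, alphabet, Delta, Gamma):
    """Build the reachable part of the slim automaton from (Q0, {})."""
    init = (frozenset(Q0), frozenset())
    seen, todo, edges = {init}, [init], set()
    while todo:
        state = todo.pop()
        for a in alphabet:
            for target, acc in slim_successors(state, a, Delta, Gamma).items():
                edges.add((state, a, target, acc))
                if target not in seen:
                    seen.add(target)
                    todo.append(target)
    return seen, edges


Delta = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "a", 1)}  # hypothetical NBA for F G a
Gamma = {(1, "a", 1)}
states, edges = explore({0}, "ab", Delta, Gamma)
# Slimness: at most two successors per state and letter.
print(max(sum(1 for e in edges if e[:2] == (s, a)) for s in states for a in "ab"))  # prints 2
```

Because both kinds of moves are functions of the current state and letter, the bound of two successors holds by construction; the exploration merely makes it visible.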

The resulting automata are simple in structure and enable symbolic implementation (see Fig. 2). It cannot be expected that there are much smaller good-for-MDPs automata, as their explicit construction is the only non-polynomial part in model checking MDPs.

Theorem 3. *Constructing a GFM Büchi automaton* G *that recognizes the models of an LTL formula* ϕ *requires time doubly exponential in* ϕ*, and constructing a GFM Büchi automaton* G *that recognizes the language of an NBA* B *requires time exponential in* B*.*

*Proof.* As the resulting automata are GFM, they can be used to model check MDPs M against this property, with cost polynomial in the size of the product of M and G. If G could be produced faster (and, consequently, be smaller) than claimed, this would contradict the 2-EXPTIME- and EXPTIME-hardness [4] of these model checking problems.

## 4 Accepting End-Component Simulation

An *end-component* [5,2] of an MDP $\mathcal{M}$ is a sub-MDP $\mathcal{M}'$ of $\mathcal{M}$ such that its underlying graph is strongly connected. A *maximal* end-component is maximal under set-inclusion. Every state of an MDP belongs to at most one maximal end-component.
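Maximal end-components are typically computed by the classical iterative refinement algorithm: decompose the graph into SCCs, discard actions that may leave their SCC (and states left without actions), and repeat until stable. This algorithm is standard but is not described in this paper. The sketch below assumes an MDP given only by the supports of its distributions (which is all that end-components depend on), uses a recursive Tarjan SCC decomposition that is adequate for small examples, and uses names of our own choosing.

```python
def scc_ids(nodes, succ):
    """Tarjan's algorithm: map every node to a representative of its SCC."""
    index, low, comp, stack, on_stack = {}, {}, {}, [], set()

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ(v):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = v
                if w == v:
                    break

    for v in nodes:
        if v not in index:
            visit(v)
    return comp


def maximal_end_components(mdp):
    """mdp: state -> action -> set of successors with positive probability.

    Returns the list of MECs, each as a frozenset of states.
    """
    act = {s: {a: set(ts) for a, ts in acts.items()} for s, acts in mdp.items()}
    while True:
        # Drop states without actions and actions that may leave the
        # remaining states, until nothing changes.
        pruned = True
        while pruned:
            pruned = False
            for s in [s for s in act if not act[s]]:
                del act[s]
                pruned = True
            for s in act:
                for a in [a for a, ts in act[s].items() if not ts.issubset(act)]:
                    del act[s][a]
                    pruned = True
        comp = scc_ids(act, lambda v: {t for ts in act[v].values() for t in ts})
        # Actions that may leave their SCC cannot belong to any end-component.
        leaving = [(s, a) for s in act for a, ts in act[s].items()
                   if any(comp[t] != comp[s] for t in ts)]
        if not leaving:
            break
        for s, a in leaving:
            del act[s][a]
    mecs = {}
    for s, root in comp.items():
        mecs.setdefault(root, set()).add(s)
    return [frozenset(m) for m in mecs.values()]


mdp = {
    0: {"stay": {0}, "go": {1}},
    1: {"x": {1, 2}},
    2: {"y": {1}, "z": {3}},
    3: {"w": {3}},
    4: {"v": {4, 0}},
}
print(sorted(sorted(m) for m in maximal_end_components(mdp)))  # [[0], [1, 2], [3]]
```

State 4 belongs to no end-component: its only action may leave it for good, which matches the "at most one" in the statement above.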

Theorem 4 (End-Component Properties. Theorem 3.1 and Theorem 4.2 of [5]). *Once an end-component* C *of an MDP is entered, there is a strategy that visits every state-action combination in* C *infinitely often with probability* 1 *and stays in* C *forever.*

*For a product MDP, an* accepting end-component *(AEC) is an end-component that contains some transition in* $\Gamma^\times$*. There is a positional pure strategy for an AEC* C *that surely stays in* C *and almost surely visits a transition in* $\Gamma^\times$ *infinitely often.*

*For a product MDP, there is a set of disjoint accepting end-components such that, from every state, the maximal probability to reach the union of these accepting end-components is the same as the maximal probability to satisfy* $\Gamma^\times$*. Moreover, this probability can be realized by combining a positional pure (reachability) strategy outside of this union with the aforementioned positional pure strategies for the individual AECs.*
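Following the last part of Theorem 4, the maximal satisfaction probability reduces to a reachability problem. One standard way to exploit this (not spelled out in this paper) is to keep the maximal end-components that contain a transition of $\Gamma^\times$ and to compute the maximal probability of reaching their union, here by plain value iteration. The sketch assumes the MECs have already been computed; the encodings, names, and the example MDP are our own.

```python
def accepting_mecs(mecs, accepting):
    """Keep the end-components that contain a transition of Gamma^x.

    mecs: list of (states, actions) pairs, actions mapping each state to
    {action: set of successors inside the component};
    accepting: set of (s, a, t) product transitions in Gamma^x.
    """
    return [states for states, actions in mecs
            if any((s, a, t) in accepting
                   for s in states
                   for a, ts in actions[s].items()
                   for t in ts)]


def max_reach(P, target, eps=1e-12):
    """Value iteration for the maximal probability of reaching `target`.

    P: state -> action -> {successor: probability}. Iterates from below and
    stops once no value changes by more than eps (a heuristic stopping rule).
    """
    x = {s: 1.0 if s in target else 0.0 for s in P}
    while True:
        change = 0.0
        for s in P:
            if s in target or not P[s]:
                continue
            v = max(sum(p * x[t] for t, p in dist.items()) for dist in P[s].values())
            change = max(change, abs(v - x[s]))
            x[s] = v
        if change <= eps:
            return x


# Hypothetical product MDP: from state 0, action "a" is a fair coin between
# the sinks 1 and 2; action "b" goes to 2 surely. Only 1's self-loop is accepting.
P = {0: {"a": {1: 0.5, 2: 0.5}, "b": {2: 1.0}},
     1: {"l": {1: 1.0}},
     2: {"l": {2: 1.0}}}
mecs = [({1}, {1: {"l": {1}}}), ({2}, {2: {"l": {2}}})]
target = set().union(*accepting_mecs(mecs, {(1, "l", 1)}))
print(max_reach(P, target)[0])  # prints 0.5
```

Value iteration from below converges to the optimum but its stopping rule gives no precision guarantee in general; exact alternatives include linear programming or policy iteration.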

Lemma 1 shows that the GFM property is preserved by simulation: for language-equivalent automata A and B, if A simulates B and B is GFM, then A is also GFM. However, a GFM automaton may not simulate a language-equivalent GFM automaton (see Figure 3). Therefore, we introduce a coarser preorder, Accepting End-Component (AEC) simulation, that exploits the finiteness of the MDP M. We rely on Theorem 4 to focus on positional pure strategies for M×B. Under such strategies, M×B becomes a Markov chain [2] such that almost all its runs have the following properties:


With this in mind, we can intuitively ask the spoiler to pick a run through this Markov chain, and to disclose information about this run. Specifically, we can ask her to signal when she has reached an accepting LSCC<sup>5</sup> in the Markov chain, and to provide information about this LSCC, in particular information entailed by the full list of sequences of transitions of some fixed length described above. Runs that can be identified to either not reach an accepting LSCC, to visit transitions not in this list, or to visit only a subset of sequences from this list, form a set of probability 0. In the simulation game we define below, we make use of this observation to discard such runs.

A simulation game can only use the syntactic material of the automata: neither the MDP nor the strategy is available, and the information the spoiler may provide cannot explicitly refer to them. What the spoiler may be asked to provide is information on when she has entered an accepting LSCC, and, once she has signaled this, which sequences of length ℓ of *automata* transitions of B occur in the LSCC. The sequences of automata transitions are simply the projections on the automata transitions from the sequences of transitions of length ℓ that occur in the LSCC L. We call this information a *gold-brim accepting end-component claim* of length ℓ, ℓ-GAEC claim for short.

<sup>5</sup> There is nothing to show when a non-accepting LSCC is reached (if B rejects, then A may reject too), nor when no LSCC is reached, as this occurs with probability 0.

The term "gold-brim" in the definition indicates that this is a powerful approach, but not one that can be implemented efficiently. We will define weaker, efficiently implementable notions of accepting end-component claims (AEC claims) later.
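Because an LSCC of the Markov chain is strongly connected and closed, every finite path inside it is taken infinitely often almost surely once the LSCC is reached, so the ℓ-GAEC claim is the set of projections of all paths of ℓ consecutive edges inside it. A minimal sketch, assuming the LSCC is given as a list of edges, each annotated with the transition of B it projects to (an encoding of our own; the function name is hypothetical):

```python
def gaec_claim(edges, ell):
    """Projected length-ell sequences of an LSCC.

    edges: (u, v, t) triples, one per edge u -> v of the LSCC, where t is the
    transition of B taken by that product edge. Returns the set of tuples
    (t_1, ..., t_ell) read along paths of ell consecutive edges in the LSCC.
    """
    out = {}
    for (u, v, t) in edges:
        out.setdefault(u, []).append((v, t))
    # Each entry is (end node, projected sequence of the path so far).
    paths = {(v, (t,)) for (u, v, t) in edges}
    for _ in range(ell - 1):
        paths = {(w, seq + (t,)) for (v, seq) in paths for (w, t) in out.get(v, [])}
    return {seq for (_, seq) in paths}


# Hypothetical LSCC with two product states and three edges.
lscc = [("u0", "u1", "t1"), ("u1", "u0", "t2"), ("u0", "u0", "t3")]
print(sorted(gaec_claim(lscc, 2)))
# [('t1', 't2'), ('t2', 't1'), ('t2', 't3'), ('t3', 't1'), ('t3', 't3')]
```

The number of sequences can grow exponentially in ℓ, which is one concrete reason why such claims are expensive.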

The AEC simulation game is very similar to the simulation game of Section 3.1. Both players produce an infinite run of their respective automata. If the spoiler makes an AEC claim, e.g., an ℓ-GAEC claim, we say that her run *complies* with it if, starting with the transition when the AEC claim is made, all states, transitions, or sequences of transitions in the claim appear infinitely often, and all states, transitions, and sequences of transitions the claim excludes do not appear. For an ℓ-GAEC claim, this means that all of the sequences of transitions of length ℓ in the claim occur infinitely often, and no other sequence of length ℓ occurs henceforth.
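For an ultimately periodic (lasso-shaped) run, compliance with an ℓ-GAEC claim can be decided directly. A minimal sketch, assuming the claim is made when the run enters its cycle and that transitions are given as opaque labels; the function name is hypothetical:

```python
def complies(cycle, claim, ell):
    """Does the periodic run cycle^omega comply with an ell-GAEC claim?

    cycle: the automaton transitions of the repeated part of a lasso-shaped
    run, assuming the claim is made at the start of the cycle. The length-ell
    windows that occur (all of them infinitely often) are exactly the cyclic
    windows of `cycle`, so compliance is set equality with the claim.
    """
    n = len(cycle)
    windows = {tuple(cycle[(i + j) % n] for j in range(ell)) for i in range(n)}
    return windows == set(claim)


print(complies(["t1", "t2"], {("t1", "t2"), ("t2", "t1")}, 2),  # True
      complies(["t1", "t2"], {("t1", "t2")}, 2))                  # False
```

The second call fails because the window `("t2", "t1")` occurs infinitely often but is not part of the claim.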

Thus, like a classic simulation game, an ℓ-GAEC simulation game is started by the spoiler, who places her pebble on an initial state of B. Next, the duplicator puts his pebble on an initial state of A. The two players then take turns, always starting with the spoiler choosing an input letter and a corresponding transition from B, followed by the duplicator choosing a transition for the same letter in A.

Different from the classic simulation game, in an ℓ-GAEC simulation game the spoiler has an additional move that she can (and, in order to win, has to) perform once in the game: in addition to choosing a letter and a transition, she can claim that she has reached an accepting end-component, and provide a complete list of sequences of automata transitions of length ℓ that can henceforth occur. This list is stored and never updated. It has no further effect on the rules of the game: both players produce an infinite run of their respective automata. The duplicator has four ways to win:


For ℓ-GAEC claims, (4) simply means that the set of transitions defined by the sequences does not satisfy the Büchi, parity, or Rabin acceptance condition.

Theorem 5. *[ℓ-GAEC Simulation] If* A *and* B *are language-equivalent automata,* B *is GFM, and there exists an* ℓ *such that* A ℓ*-GAEC simulates* B*, then* A *is GFM.*

For the proof, we use an arbitrary (but fixed) MDP M, and an arbitrary (but fixed) pure optimal positional strategy μ for M×B, resulting in the Markov chain $(\mathcal{M}\times\mathcal{B})_\mu$. We assume w.l.o.g. that the accepting LSCCs in $(\mathcal{M}\times\mathcal{B})_\mu$ are identified, e.g., by a bit.

Let τ be a winning strategy of the duplicator in an ℓ-GAEC simulation game. Abusing notation, we let τ∘μ denote the finite-memory strategy<sup>6</sup> obtained from μ and τ for M×A, where τ is acting only on the automata part of M×B, and where the spoiler makes the move to the end-component when she is in some LSCC L of $(\mathcal{M}\times\mathcal{B})_\mu$ and gives the full list of sequences of transitions of length ℓ that occur in L.

<sup>6</sup> The strategy τ consists of one sub-strategy to be used before the AEC claim is made and one sub-strategy for each possible ℓ-GAEC claim. The memory of τ∘μ tracks the position in $(\mathcal{M}\times\mathcal{B})_\mu$. When an accepting LSCC is detected (via the marker bit), analysis of $(\mathcal{M}\times\mathcal{B})_\mu$ reveals the only possible ℓ-GAEC claim. This claim is used to select the right entry from τ.

*Proof.* As B is good for MDPs, we only have to show that the chance of winning in $(\mathcal{M}\times\mathcal{A})_{\tau\circ\mu}$ is at least the chance of winning in $(\mathcal{M}\times\mathcal{B})_\mu$. The chance of winning in $(\mathcal{M}\times\mathcal{B})_\mu$ is the chance of reaching an accepting LSCC in $(\mathcal{M}\times\mathcal{B})_\mu$. It is also the chance of reaching an accepting LSCC L of $(\mathcal{M}\times\mathcal{B})_\mu$ *and*, after reaching L, seeing exactly the sequences of transitions of length ℓ that occur in L, and seeing all of them infinitely often.

By construction, τ∘μ translates those runs into accepting runs of $(\mathcal{M}\times\mathcal{A})_{\tau\circ\mu}$, such that the chance of an accepting run of $(\mathcal{M}\times\mathcal{A})_{\tau\circ\mu}$ is at least the chance of an accepting run of $(\mathcal{M}\times\mathcal{B})_\mu$. As μ is optimal, the chance of winning in M×A is at least the chance of winning in M×B. As B is GFM, this is the chance of M producing a run accepted by B (and thus A) when controlled optimally, which is an upper bound on the chance of winning in M×A.

An ℓ-GAEC simulation, especially for large ℓ, results in very large state spaces, because the spoiler has to list all sequences of transitions of B of length ℓ that will appear infinitely often. No other sequence of length ℓ may then appear in the run<sup>7</sup>. This can, of course, be prohibitively expensive.

As a compromise, one can use coarser-grained information at the cost of reducing the duplicator's ability to win the game. E.g., the spoiler could be asked to only reveal a transition that is repeated infinitely often, plus (when using more powerful acceptance conditions than Büchi) some acceptance information, say the dominating priority in a parity game or a winning Rabin pair. This type of coarse-grained claim can be refined slightly by allowing the *duplicator* to change at any time the transition that is to appear infinitely often to the transition just used by the spoiler. Generally, we say that an AEC simulation game is any simulation game where


The requirement that a winning spoiler strategy translates into a winning spoiler strategy in an ℓ-GAEC game entails that AEC simulation games can prove the GFM property.

Corollary 2. *[AEC Simulation] If* A *and* B *are language equivalent automata,* B *is good for MDPs, and* A *AEC-simulates* B*, then* A *is good for MDPs.*

<sup>7</sup> The AEC claim provides information about the accepting LSCC in the product under the chosen pure positional strategy. When the AEC claim requires the exclusion of states, transitions, or sequences of transitions, then they are therefore surely excluded, whereas when it requires the inclusion of (and thus infinitely many occurrences of) states, transitions, or sequences of transitions, then they (only) occur almost surely infinitely often. Yet, runs that do not contain them all infinitely often form a zero set, and can thus be ignored.

Fig. 3. Automata A (left) and B (right) for ϕ = (G F a) ∨ (G F b). The dotted transitions are accepting. The NBA A does not simulate the DBA B: B can play a's until A moves to either the state on the left or the state on the right. B then wins by henceforth playing only b's or only a's. However, A is good for MDPs. It wins the AEC simulation game by waiting until an AEC is reached (by B), and then checking whether a or b occurs infinitely often in this AEC. Based on this knowledge, A can make its decision. This can be shown by AEC simulation if B has to provide sufficient information, such as a list of transitions (or even a list of letters) that occur infinitely often. The amount of information the spoiler has to provide determines the strength of the AEC simulation used. If, e.g., B only has to reveal one accepting transition of the end-component, then it can select an end-component where the revealed transition is (b1, c, b0), which does not provide sufficient information. If, however, the duplicator is allowed to update the transition, then the duplicator wins by updating the recorded transition to the next a or b transition.

Of course, for every AEC simulation, one first has to prove that winning strategies for the spoiler translate. We have used two simple variations of the AEC simulation games:

accepting transition: the spoiler may only make her AEC claim when taking an accepting transition; this transition—and no other information—is stored, and the spoiler commits to—and commits only to—seeing this transition infinitely often;

accepting transition with update: different from the *accepting transition* AEC simulation game, the duplicator can (but does not have to) update the stored accepting transition whenever the spoiler passes by an accepting transition.

Theorem 6. *Both the* accepting transition *and the* accepting transition with update *AEC simulations can be used to establish the good-for-MDPs property.*

To show this, we describe the strategy translations in accordance with Corollary 2.

*Proof.* In both cases, the translation of a winning strategy of the spoiler for the 1-GAEC simulation game is straightforward: the spoiler essentially follows her winning strategy from the 1-GAEC simulation game, with the extra rule that she makes her AEC claim to the duplicator on the first accepting transition on or after the point where she makes her AEC claim in the 1-GAEC game. If the duplicator is allowed to update the transition, this information is ignored by the spoiler: she plays according to her winning strategy from the 1-GAEC simulation game. Naturally, the resulting play complies with her 1-GAEC claim, and is thus also winning for the weaker AEC claim made to the duplicator.

We use AEC simulation to identify GFM automata among the automata produced (e.g., by SPOT [8]) at the beginning of the transformation. Figure 3 shows an example for which the duplicator wins the AEC simulation game, but loses the ordinary simulation game. Candidates for automata to simulate are, e.g., the slim GFM Büchi automata and the limit-deterministic Büchi automata discussed above.

## 5 Evaluation

#### 5.1 Size of General Büchi Automata for Probabilistic Model Checking

As discussed, automata that simulate slim automata or SLDBAs are good for MDPs. This fact allows us to use Büchi automata produced by general-purpose tools such as SPOT's [8] ltl2tgba instead of specialized automata types. Automata produced by such tools are often smaller because general-purpose tools are highly optimized and not restricted to producing slim or limit-deterministic automata. Thus, one produces an arbitrary Büchi automaton using any available method, then transforms this automaton into a slim or limit-deterministic automaton, and finally checks whether the original automaton simulates the generated one.

We have evaluated this idea on random LTL formulas produced by SPOT's tool randltl. We have set the tree size, which influences the size of the formulas, to 50, and have produced 1000 formulas with 4 atomic propositions each. We left the other values at their defaults. We have then used SPOT's ltl2tgba (version 2.7) to turn these formulas into non-generalized Büchi automata using default options. Finally, for each automaton, we have used our tool to check whether the automaton simulates a limit-deterministic automaton that we produce from this automaton. For comparison, we have also used Owl's [29] tool ltl2ldba (version 19.06.03) to compute limit-deterministic non-generalized Büchi automata. We have also used the option of this tool to compute Büchi automata with a nondeterministic initial part. We used 10-minute timeouts.

Of these 1000 formulas, 315 can be transformed to deterministic Büchi automata. For an additional 103 of the generated automata, standard simulation sufficed to show that they are GFM. For a further 11 of them, the simplest AEC simulation (the spoiler chooses an accepting transition to occur infinitely often) sufficed, and another 1 could be classed as GFM by allowing the duplicator to update the transition. 501 automata turned out to be non-simulatable, and for 69 we did not get a decision due to a timeout.
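As a quick consistency check, the six outcome categories above account for all 1000 formulas. The counts are copied from the text; grouping the first four categories as "shown GFM" is our own summary, not a figure reported by the authors.

```python
# Counts reported above for the 1000 random formulas.
outcome = {
    "deterministic": 315,
    "standard simulation": 103,
    "AEC simulation (accepting transition)": 11,
    "AEC simulation (with update)": 1,
    "not simulatable": 501,
    "timeout": 69,
}
total = sum(outcome.values())
# Grouping (ours): every category except the last two was shown to be GFM.
shown_gfm = total - outcome["not simulatable"] - outcome["timeout"]
print(total, shown_gfm, f"{100 * shown_gfm / total:.1f}%")  # 1000 430 43.0%
```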

For the LTL formulas for which ltl2tgba could not produce deterministic automata, but for which simulation could be shown, the number of states in the generated automata was often lower than the number of states in the automata produced by Owl's tools. On average, the number of states per automaton was ≈15.21 for SPOT's ltl2tgba, while for Owl's ltl2ldba it was ≈46.35. The extended version of this paper [13] contains more details about the evaluation.

Fig. 4. Deciles of the ratio ltl2tgba/semi-deterministic automata.

Let us consider the ratio between the size of automata produced by ltl2tgba and the size of semi-deterministic automata produced by Owl. The average of this number for all automata that are not deterministic and that can be simulated in some way is ≈1.0335. This means that on average, for these automata, the semi-deterministic automata are slightly smaller. If we take a look at the first 5 deciles depicted in Fig. 4, we see that there is a large number of formulas for which ltl2tgba and Owl produce automata of the same size. For around 24.3478% of the cases, automata by SPOT are smaller than those produced by Owl (ratio < 1).

#### 5.2 GFM Automata and Reinforcement Learning

SLDBAs have been used in [12] for model-free reinforcement learning of ω-regular objectives. While the Büchi acceptance condition allows for a faithful translation of the objective to a scalar reward, the agent has to learn how to control the automaton's nondeterministic choices; that is, the agent has to learn when the SLDBA should cross from the initial component to the accepting component to produce a successful run of a behavior that satisfies the given objective.

Any GFM automaton with a Büchi acceptance condition can be used instead of an SLDBA in the approach of [12]. While in many cases SLDBAs work well, GFM automata that are not limit-deterministic may provide a significant advantage.

Early during training, the agent relies on uniform random choices to discover policies that lead to successful episodes. This includes randomly resolving the automaton nondeterminism. If random choices are unlikely to produce successful runs of the automaton in case of behaviors that should be accepted, learning is hampered because good behaviors are not rewarded. Therefore, GFM automata that are more likely to accept under random choices will result in the agent learning more quickly. We have found the following properties of GFM automata to affect the agent's learning ability.

Low branching degree. A low branching degree presents the agent with fewer alternatives, reducing the expected number of trials before the agent finds a good combination of choices. Consider an MDP and an automaton that require a specific sequence of $k$ nondeterministic choices in order for the automaton to accept. If at each choice there are $b$ equiprobable options, the correct sequence is obtained with probability $b^{-k}$.
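To make the arithmetic concrete, here is a minimal sketch assuming $k$ independent choice points with $b$ equiprobable options each and exactly one correct option per point (the function names are hypothetical). It enumerates all $b^k$ resolutions with exact rationals, confirming the probability $b^{-k}$, and reports the expected number of independent trials until the first success, which is $b^k$ for the resulting geometric distribution.

```python
from fractions import Fraction
from itertools import product


def success_probability(b: int, k: int) -> Fraction:
    """Probability that uniformly random resolution of k choice points,
    each with b equiprobable options, hits the single correct sequence.
    Computed by brute-force enumeration to double-check b**(-k)."""
    correct = tuple([0] * k)  # w.l.o.g. option 0 is the right one at each point
    hits = sum(1 for seq in product(range(b), repeat=k) if seq == correct)
    return Fraction(hits, b ** k)


def expected_trials(b: int, k: int) -> Fraction:
    """Mean of the geometric distribution with success probability b**(-k)."""
    return 1 / success_probability(b, k)


for b in (2, 3, 4):
    # prints "2 1/8 8", then "3 1/27 27", then "4 1/64 64"
    print(b, success_probability(b, 3), expected_trials(b, 3))
```

With three choice points, lowering the branching degree from 4 to 2 reduces the expected number of trials from 64 to 8.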

Cautiousness. An automaton that enables fewer nondeterministic choices for the same finite input word gives the agent fewer chances to choose wrongly. The slim automata construction has the interesting property of "collecting hints of acceptance" before a nondeterministic choice is enabled, because $S$ has to be nonempty for a $\gamma_{2,1}$ transition to be present, and that requires going through at least one accepting transition.

Forgiveness. Mistakes made in resolving nondeterminism may be irrecoverable. This is often true of SLDBAs meant for model checking, in which jumps are made to select a subformula to be eventually satisfied. However, general GFM automata, thanks also to their less constrained structure, may be constructed to "forgive mistakes" by giving more chances of picking a successful run.

Figure 5 compares a typical SLDBA to an automaton that is not limit-deterministic and is not produced by the breakpoint construction, but is proved GFM by AEC simulation. The latter automaton has a nondeterministic choice in state $q_0$ on letter $x \wedge \neg y$ that can be made an unbounded number of times. The agent may choose $q_1$ repeatedly, even when F G x turns out to be false and G F y holds. With the SLDBA, on the other hand, there is no room for error.

A Case Study. We compared three automata for the property ((F G x) ∨ (G F y)) ∧ G safe by their effectiveness in learning to control a cart-pole model. The safety component of the objective is to keep the pole balanced and the cart on the track. In the left two thirds of the track, the labeling alternates between x and y at each step. The right third is always labeled y, but to reach it, the cart has to cross a barrier, which it fails to do with probability 1/3.

The three automata are an SLDBA (4 states), a slim automaton (8 states), and a handcrafted forgiving automaton (4 states) similar to the one in Fig. 5.

Fig. 5. Two GFM automata for (F G x) ∨ (G F y): an SLDBA (left) and a forgiving automaton (right)

Training of the continuous-statespace model employed PPO [28] as implemented in OpenAI Baselines [6]. Figure 6 shows the learning curves for the three automata averaged over ten runs. They underline the importance of choosing the right automaton in RL. Training parameters, more details on the model, and additional examples can be found in the extended version of this paper [13].

Fig. 6. Learning curves

#### 6 Conclusion

We have defined the class of automata that are *good for MDPs*—nondeterministic automata that can be used for the analysis of MDPs—and shown it to be closed under different simulation relations. This has multiple favorable implications for model checking and reinforcement learning. Closure under classic simulation opens a rich toolbox of statespace reduction techniques that come in handy to push the boundary of analysis techniques, while the more powerful (and more expensive) AEC simulation has promise to identify source automata that happen to be good for MDPs.

The wider class of GFM automata also shows promise: the slim automata we have defined to tame the branching degree while retaining the desirable Büchi condition for reinforcement learning are able to compete even against optimized SLDBAs.

As outlined in Section 5.2, a low branching degree, cautiousness, and forgiveness make automata particularly well-suited for learning. From a practical point of view, much of the strength of this new approach lies in harnessing simulation for learning, and forgiveness is closely related to simulation.

The natural follow-up research is to tap the full potential of simulation-based statespace reduction instead of the limited version that we have implemented. Besides using this to get the statespace small—useful for model checking—we will use simulation to construct forgiving automata, which is promising for reinforcement learning.

Datasets generated and analyzed during the current study are available at: https://doi.org/10.6084/m9.figshare.11882739 [35,36].

## References


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Farkas certificates and minimal witnesses for probabilistic reachability constraints**

Florian Funke, Simon Jantsch, and Christel Baier

Technische Universität Dresden, Germany<sup>⋆</sup>
{florian.funke, simon.jantsch, christel.baier}@tu-dresden.de

**Abstract.** This paper introduces Farkas certificates for lower and upper bounds on minimal and maximal reachability probabilities in Markov decision processes (MDP), which we derive using an MDP-variant of Farkas' Lemma. The set of all such certificates is shown to form a polytope whose points correspond to witnessing subsystems of the model and the property. Using this correspondence we can translate the problem of finding minimal witnesses to the problem of finding vertices with a maximal number of zeros. While computing such vertices is computationally hard in general, we derive new heuristics from our formulations that exhibit competitive performance compared to state-of-the-art techniques. As an argument that asymptotically better algorithms cannot be hoped for, we show that the decision version of finding minimal witnesses is NP-complete even for acyclic Markov chains.

## **1 Introduction**

The goal of program verification is to consolidate the user's trust that a given system works as intended and, if this is not the case, to provide her with useful diagnostic information. Verification tools may, however, contain bugs, so a last grain of uncertainty regarding their results always remains. A widely acknowledged way out of this dilemma is offered by *certifying algorithms* [17, 64]. These algorithms provide every result with an accompanying *certificate*, i.e., a token that can be used to verify the result independently and with few resources. In this way, certificates enable the user (or a third party) to quickly give a mathematically rigorous proof of the correctness of the result, *irrespective* of whether the algorithm itself works correctly.

*Counterexamples*, i.e., certificates for the violation of a property, can often be obtained as a byproduct of verification procedures. What constitutes a counterexample is highly context-dependent. Finite executions suffice as counterexamples for safety properties, and single, possibly infinite, executions are viable counterexamples for LTL [29]. Tree-like counterexamples have been considered for fragments of CTL [28]. For a probabilistic system $\mathcal{M}$ and a linear-time property $\varphi$, the most prominent notion of counterexample to $\Pr_{\mathcal{M}}(\varphi) < \lambda$ is a set of paths satisfying $\varphi$ whose probability mass is at least $\lambda$ (see [1] for a survey).

<sup>⋆</sup> This work was funded by DFG grant 389792660 as part of TRR 248, the Cluster of Excellence EXC 2050/1 (CeTI, project ID 390696704, as part of Germany's Excellence Strategy), DFG-projects BA-1679/11-1 and BA-1679/12-1, and the Research Training Group QuantLA (GRK 1763).

Another notion of counterexample for probabilistic systems $\mathcal{M}$ and properties of the form $\Pr_{\mathcal{M}}(\varphi) < \lambda$ is given by *critical subsystems* [1]. We adopt the reverse perspective and call a subsystem $\mathcal{M}'$ of $\mathcal{M}$ a *witnessing subsystem* for the property $\Pr_{\mathcal{M}}(\varphi) \geq \lambda$ if $\Pr_{\mathcal{M}'}(\varphi) \geq \lambda$. Small witnessing subsystems offer insight into which parts of the system are responsible for the satisfaction of the property. Nonetheless, witnessing subsystems can hardly be regarded as viable certificates, since verifying $\Pr_{\mathcal{M}'}(\varphi) \geq \lambda$ is as hard as checking $\Pr_{\mathcal{M}}(\varphi) \geq \lambda$ itself.

In this paper we build a solid bridge between certificates and witnessing subsystems. The systems we consider are modeled as Markov decision processes (MDP), which contain an absorbing goal state representing a desirable outcome. This approach is motivated by the fact that numerous model checking tasks can be reduced to reachability problems [3, 31, 32, 46, 73, 74].

Using Farkas' Lemma, we introduce certificates for bounds on the minimal and maximal probability to reach the goal state. We show that the set of these certificates forms a polytope, and we provide a direct translation of a certificate to a witnessing subsystem for lower-bounded threshold properties. Thereby, we bridge the gap between an abstract gadget, serving solely as a proof that the result is correct, and a concrete object, containing crucial diagnostic information about *why* the result holds. Moreover, our translation reduces the computation of minimal witnessing subsystems to a purely geometric problem, for which we provide and evaluate new exact and heuristic algorithms.

All omitted proofs can be found in the full version of this paper [42].

#### **Contributions.**



**Table 1:** Overview of Farkas certificates for reachability properties in MDPs (where $\lesssim \in \{\leq, <\}$ and $\gtrsim \in \{\geq, >\}$).

**Related work.** The foundations of certifying algorithms have been surveyed in [64]. In the context of model checking, the most prominent approach for the certification of a positive result has been to construct a proof of the property in the system [15, 66, 67]. Rank-based certificates for the emptiness of a certain automaton [57] can be used to certify positive model checking results. Model checking MDPs in the presence of multiple objectives has been studied in [37, 39].

Heuristic approaches for computing small witnessing subsystems in DTMCs have been proposed in [5, 7, 49, 51, 52] and implemented in the tool Comics [50]. Witnessing subsystems in MDPs have been considered in [6, 9] and [19], which focuses on succinctly representing witnessing schedulers. The mixed integer linear programming (MILP) formulation of [77, 78] allows for an exact computation of minimal witnessing subsystems for the property $\mathbf{Pr}^{\max}_{s_0}(\Diamond \text{goal}) \gtrsim \lambda$. NP-completeness of computing minimal witnessing subsystems in MDPs was shown in [24], but the exact complexity has, to the best of our knowledge, not been determined for DTMCs (the problem was conjectured to be NP-complete in [77]).

Minimal probabilistic counterexamples given as sets of paths can be computed by reframing the problem as a k-shortest-path problem [44, 45]. Regular expressions have been considered to succinctly represent the set of paths in [33], and extensions were proposed in [18, 76]. The tool Dipro [4] computes probabilistic counterexamples, and a translation of these to fault trees was given in [56]. Another, learning-based, approach [20] also enumerates paths and produces a witnessing subsystem as a byproduct. But none of these approaches considers state-based minimality. Probabilistic counterexamples can be used to automatically guide iterative and refinement-based model checking techniques [23–25, 27, 48, 53].

Farkas' Lemma is a well-known source of certificates for the (in)feasibility of tasks in combinatorial optimization, operations research, and economics, as presented in the detailed historical account given in [70, pp. 209–226] as well as [62, Chapter 2] and [30, 65, 75]. The lecture notes [71] contain a rich variety of applications of linear programming in general and Farkas' Lemma in particular.

## **2 Preliminaries**

**Polyhedra and Farkas' Lemma.** Throughout the article we write the dot product of two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ as $\mathbf{xy}$ or $\mathbf{x} \cdot \mathbf{y}$. A *halfspace* in $\mathbb{R}^n$ is a set $H = \{\mathbf{v} \in \mathbb{R}^n \mid \mathbf{a} \cdot \mathbf{v} \leq b\}$ for some non-trivial $\mathbf{a} \in \mathbb{R}^n$ and $b \in \mathbb{R}$. A *polyhedron* is the intersection of finitely many halfspaces, and a *polytope* is a bounded polyhedron. A *face* of a polyhedron $P$ is a subset $F \subseteq P$ of the form $F = \{\mathbf{x} \in P \mid \mathbf{a} \cdot \mathbf{x} = \max\{\mathbf{a} \cdot \mathbf{y} \mid \mathbf{y} \in P\}\}$ for some $\mathbf{a} \in \mathbb{R}^n$. A *vertex* of $P$ is a face consisting of only one point.

Farkas' Lemma [38] is one of the foundations of polyhedral theory and linear programming. It provides a natural source of certificates for the infeasibility of a given system of inequalities or, in other words, for the emptiness of the polyhedron described by the system. We will use it in the following version.

**Lemma 2.1 (Farkas' Lemma, cf. [70, Corollary 7.1f on p. 90]).** *Let* $\mathbf{A} \in \mathbb{R}^{m \times n}$ *and* $\mathbf{b} \in \mathbb{R}^m$*. Then there exists* $\mathbf{z} \in \mathbb{R}^n_{\geq 0}$ *with* $\mathbf{Az} \leq \mathbf{b}$ *if and only if there does* not *exist* $\mathbf{y} \in \mathbb{R}^m_{\geq 0}$ *with* $\mathbf{yA} \geq 0 \wedge \mathbf{yb} < 0$*.*
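The right-hand side of the lemma is what makes infeasibility checkable: a single vector $\mathbf{y}$ proves that no $\mathbf{z} \geq 0$ with $\mathbf{Az} \leq \mathbf{b}$ exists. A minimal Python checker with exact rational arithmetic follows; the function name and the list-of-rows matrix encoding are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Sequence

Matrix = Sequence[Sequence[Fraction]]
Vector = Sequence[Fraction]


def certifies_infeasibility(A: Matrix, b: Vector, y: Vector) -> bool:
    """Check y >= 0, yA >= 0 and yb < 0 (right-hand side of Lemma 2.1).
    If this returns True, the set {z >= 0 | Az <= b} is empty."""
    m, n = len(A), len(A[0])
    if len(y) != m or any(yi < 0 for yi in y):
        return False
    yA = [sum(y[i] * A[i][j] for i in range(m)) for j in range(n)]
    yb = sum(y[i] * b[i] for i in range(m))
    return all(v >= 0 for v in yA) and yb < 0


F = Fraction
# z1 + z2 <= 1 and -z1 - z2 <= -2 has no nonnegative solution.
A = [[F(1), F(1)], [F(-1), F(-1)]]
b = [F(1), F(-2)]
print(certifies_infeasibility(A, b, [F(1), F(1)]))  # True: yA = (0, 0), yb = -1
print(certifies_infeasibility(A, b, [F(1), F(0)]))  # False: yb = 1
```

Checking the certificate needs only a matrix-vector product and two comparisons, independently of how $\mathbf{y}$ was found.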

**Markov decision processes.** A *Markov decision process* (MDP) is a tuple $\mathcal{M} = (S, \text{Act}, \iota, \mathbf{P})$, where $S$ is a finite set of *states*, $\text{Act}$ is a finite set of *actions*, $\iota$ is a probability distribution on $S$ called the *initial distribution* of $\mathcal{M}$, and $\mathbf{P}\colon S \times \text{Act} \times S \to [0, 1]$ is the *transition probability function*, where we require $\sum_{s' \in S} \mathbf{P}(s, \alpha, s') \in \{0, 1\}$ for all $s \in S$ and $\alpha \in \text{Act}$. An action $\alpha$ is *enabled* in state $s \in S$ if $\sum_{s' \in S} \mathbf{P}(s, \alpha, s') = 1$. The set of actions enabled at state $s$ is denoted by $\text{Act}(s)$, and we require $\text{Act}(s) \neq \emptyset$ for all $s \in S$. A *path* in an MDP $\mathcal{M}$ is an infinite sequence $s_0\alpha_0s_1\alpha_1\ldots$ such that $\mathbf{P}(s_i, \alpha_i, s_{i+1}) > 0$ for all $i \geq 0$. A finite path is a finite sequence $\pi = s_0\alpha_0s_1\alpha_1\ldots s_n$ satisfying the same condition for all $0 \leq i \leq n-1$. In this case, we define $\mathrm{last}(\pi) = s_n$. We denote by $\mathrm{Paths}(\mathcal{M})$ and $\mathrm{Paths}_{\mathrm{fin}}(\mathcal{M})$ the sets of infinite and finite paths in $\mathcal{M}$, respectively.
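One possible encoding of this definition, assuming a hypothetical nested dictionary `P[s][alpha][t]` in which absent entries mean probability 0, computes $\text{Act}(s)$ and checks the row-sum and non-emptiness requirements. It is a sketch for illustration, not a representation prescribed by the paper.

```python
from fractions import Fraction
from typing import Dict, Set

# P[s][alpha][t] = probability; absent entries are 0 (hypothetical encoding).
Transitions = Dict[str, Dict[str, Dict[str, Fraction]]]


def enabled_actions(P: Transitions, s: str) -> Set[str]:
    """Act(s): actions whose outgoing probabilities from s sum to 1."""
    return {a for a, dist in P.get(s, {}).items() if sum(dist.values()) == 1}


def is_valid_mdp(P: Transitions, states: Set[str]) -> bool:
    """Check 0 <= P(s,a,t), sum_t P(s,a,t) in {0, 1}, targets are states,
    and Act(s) is non-empty for every state s."""
    for s in states:
        for dist in P.get(s, {}).values():
            if any(p < 0 for p in dist.values()) or not set(dist) <= states:
                return False
            if sum(dist.values()) not in (0, 1):
                return False
        if not enabled_actions(P, s):
            return False
    return True


F = Fraction
P = {
    "s0": {"a": {"s0": F(1, 2), "goal": F(1, 2)}, "b": {"fail": F(1)}},
    "goal": {"a": {"goal": F(1)}},
    "fail": {"a": {"fail": F(1)}},
}
states = {"s0", "goal", "fail"}
print(sorted(enabled_actions(P, "s0")))  # ['a', 'b']
print(is_valid_mdp(P, states))  # True
```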

A *discrete-time Markov chain* (DTMC) is an MDP with a single action, which is enabled at every state. If $\mathcal{M}$ is a DTMC, then $\mathrm{Paths}(\mathcal{M})$ carries a probability measure, where the associated $\sigma$-algebra is generated by the cylinder sets $\mathrm{Cyl}(\tau) = \{\pi \in \mathrm{Paths}(\mathcal{M}) \mid \pi \text{ has prefix } \tau\}$ of finite paths $\tau = s_0s_1\ldots s_n$ in $\mathcal{M}$, with probability $\Pr(\mathrm{Cyl}(\tau)) = \iota(s_0) \cdot \prod_{0 \leq i < n} \mathbf{P}(s_i, s_{i+1})$ (for more details see [13, Section 10.1]). In the following, we denote for a finite set $X$ the set of probability distributions on $X$ by $\mathrm{Dist}(X)$. Given $\mu \in \mathrm{Dist}(X)$, the *support* of $\mu$ is $\mathrm{supp}(\mu) = \{x \in X \mid \mu(x) > 0\}$.
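The cylinder formula translates directly into code. A sketch for a DTMC given as a hypothetical dictionary `P[s][t]` with initial distribution `iota`:

```python
from fractions import Fraction
from typing import Dict, Sequence


def cylinder_probability(iota: Dict[str, Fraction],
                         P: Dict[str, Dict[str, Fraction]],
                         tau: Sequence[str]) -> Fraction:
    """Pr(Cyl(tau)) = iota(s0) * prod_{0 <= i < n} P(s_i, s_{i+1})."""
    prob = iota.get(tau[0], Fraction(0))
    for s, t in zip(tau, tau[1:]):
        prob *= P.get(s, {}).get(t, Fraction(0))
    return prob


F = Fraction
iota = {"s0": F(1)}
P = {"s0": {"s0": F(1, 2), "goal": F(1, 2)}, "goal": {"goal": F(1)}}
print(cylinder_probability(iota, P, ["s0", "s0", "goal"]))  # 1/4
```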

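A *deterministic scheduler* is a function $\mathfrak{S}\colon \mathrm{Paths}_{\mathrm{fin}}(\mathcal{M}) \to \text{Act}$ such that $\mathfrak{S}(\pi) \in \text{Act}(\mathrm{last}(\pi))$, and a *randomized scheduler* is a function $\mathfrak{S}\colon \mathrm{Paths}_{\mathrm{fin}}(\mathcal{M}) \to \mathrm{Dist}(\text{Act})$ such that $\mathrm{supp}(\mathfrak{S}(\pi)) \subseteq \text{Act}(\mathrm{last}(\pi))$ for all $\pi \in \mathrm{Paths}_{\mathrm{fin}}(\mathcal{M})$. Given a deterministic (or randomized) scheduler $\mathfrak{S}$, a path $\pi = s_0\alpha_0s_1\alpha_1\ldots$ in $\mathcal{M}$ is an $\mathfrak{S}$*-path* if $\alpha_i = \mathfrak{S}(s_0\alpha_0\ldots s_i)$ (or $\alpha_i \in \mathrm{supp}(\mathfrak{S}(s_0\alpha_0\ldots s_i))$) for all $i \geq 0$.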

We denote by $\Pr^{\mathfrak{S}}$ the probability measure on infinite $\mathfrak{S}$-paths (see [13, Definition 10.92 on page 843] for more details). If we replace $\iota$ with the distribution concentrated on state $s$, then we obtain a probability measure $\Pr^{\mathfrak{S}}_{\mathcal{M},s}$, or $\Pr^{\mathfrak{S}}_s$ for short, on infinite $\mathfrak{S}$-paths starting in $s$. The scheduler is *memoryless* if $\mathfrak{S}(\pi) = \mathfrak{S}(\mathrm{last}(\pi))$ for all $\pi \in \mathrm{Paths}_{\mathrm{fin}}(\mathcal{M})$. We abbreviate memoryless deterministic schedulers as *MD-schedulers* and memoryless randomized schedulers as *MR-schedulers*.

Given a state t ∈ S, we let

$$\mathbf{Pr}\_s^{\text{max}}(\Diamond t) = \sup\_{\mathfrak{S}} \text{Pr}\_s^{\mathfrak{S}}(\Diamond t) \quad \text{and} \quad \mathbf{Pr}\_s^{\text{min}}(\Diamond t) = \inf\_{\mathfrak{S}} \text{Pr}\_s^{\mathfrak{S}}(\Diamond t)$$

denote the maximal and minimal probability to reach $t$ eventually when starting in $s$, and we set $\mathbf{Pr}^{\min}(\Diamond t) = (\mathbf{Pr}^{\min}_s(\Diamond t))_{s \in S}$ and $\mathbf{Pr}^{\max}(\Diamond t) = (\mathbf{Pr}^{\max}_s(\Diamond t))_{s \in S}$. The supremum and infimum are indeed attained by MD-schedulers [13, Lemmata 10.102 and 10.113], which justifies the superscripts max and min.
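For small models, these values can be approximated by standard value iteration, which repeatedly applies the Bellman operator (maximizing or minimizing over actions), starting from the zero vector with value 1 at $t$; the iterates converge from below. This is a textbook technique, not the certificate-based approach of this paper. The encoding, function name, and fixed iteration count are assumptions, and the result is in general only an approximation.

```python
from typing import Dict

Transitions = Dict[str, Dict[str, Dict[str, float]]]


def reach_value_iteration(P: Transitions, target: str, optimum: str = "max",
                          iterations: int = 1000) -> Dict[str, float]:
    """Approximate Pr^opt_s(<> target) for all states s.

    P[s][a][t] is the transition probability; every state needs at least one
    action. Starting from x = 0 (and 1 at the target), the iterates increase
    monotonically towards the least fixed point of the Bellman operator."""
    choose = max if optimum == "max" else min
    x = {s: (1.0 if s == target else 0.0) for s in P}
    for _ in range(iterations):
        x = {
            s: 1.0 if s == target else choose(
                sum(p * x[t] for t, p in dist.items()) for dist in P[s].values()
            )
            for s in P
        }
    return x


P = {
    "s0": {"a": {"goal": 0.5, "fail": 0.5}, "b": {"s1": 1.0}},
    "s1": {"a": {"goal": 1.0}},
    "goal": {"a": {"goal": 1.0}},
    "fail": {"a": {"fail": 1.0}},
}
print(reach_value_iteration(P, "goal", "max")["s0"])  # 1.0 (action b)
print(reach_value_iteration(P, "goal", "min")["s0"])  # 0.5 (action a)
```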

**Setting 2.2.** Henceforth we assume that $\mathcal{M} = (S_{\text{all}}, \text{Act}, \iota, \mathbf{P})$ has a unique initial state $s_0 \in S$ and two distinguished absorbing states $\text{fail}, \text{goal} \in S_{\text{all}}$, i.e., $\mathbf{P}(\text{goal}, \alpha, s) = 0$ for all $\alpha \in \text{Act}$ and $s \in S_{\text{all}}$ with $s \neq \text{goal}$, and likewise for $\text{fail}$. Here $\text{goal}$ represents a desirable outcome of the modeled system and $\text{fail}$ an outcome that is to be avoided. We use the notation $S = S_{\text{all}} \setminus \{\text{fail}, \text{goal}\}$ and assume that every state $s \in S$ is reachable from $s_0$. We also assume that under every scheduler $\text{fail}$ or $\text{goal}$ is reachable from any state, i.e., $\mathbf{Pr}^{\min}_s(\Diamond(\text{goal} \vee \text{fail})) > 0$ for all $s \in S$. If $\mathcal{M}$ does not satisfy this condition from the start, we can apply a standard preprocessing step, which is essentially given by taking the MEC quotient of $\mathcal{M}$, see [2, 3] and also [26]. While the condition $\mathbf{Pr}^{\min}_s(\Diamond(\text{goal} \vee \text{fail})) > 0$ is often easier to verify, it is in fact equivalent to $\mathbf{Pr}^{\min}_s(\Diamond(\text{goal} \vee \text{fail})) = 1$ (see the full version [42]).
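The assumption $\mathbf{Pr}^{\min}_s(\Diamond(\text{goal} \vee \text{fail})) > 0$ depends only on the graph structure and can be checked by the usual least-fixed-point computation: a state is added once every enabled action has a successor already in the set. The sketch below uses a hypothetical dictionary encoding `P[s][alpha][t]`; it is a standard precomputation, not code from the paper, and it only checks the condition rather than performing the MEC-quotient preprocessing.

```python
from typing import Dict, Set

Transitions = Dict[str, Dict[str, Dict[str, float]]]


def positive_min_reach(P: Transitions, targets: Set[str]) -> Set[str]:
    """States s with Pr^min_s(<> targets) > 0.

    Least fixed point of X = targets | {s | every action of s has a
    successor in X with positive probability}."""
    X = set(targets)
    changed = True
    while changed:
        changed = False
        for s, actions in P.items():
            if s in X:
                continue
            if all(any(p > 0 and t in X for t, p in dist.items())
                   for dist in actions.values()):
                X.add(s)
                changed = True
    return X


def satisfies_setting_2_2(P: Transitions) -> bool:
    """Check Pr^min_s(<>(goal or fail)) > 0 for every state s."""
    return positive_min_reach(P, {"goal", "fail"}) == set(P)


P = {
    "s0": {"a": {"s1": 1.0}, "b": {"goal": 1.0}},
    "s1": {"a": {"s1": 1.0}, "b": {"fail": 1.0}},  # action a loops forever
    "goal": {"a": {"goal": 1.0}},
    "fail": {"a": {"fail": 1.0}},
}
print(satisfies_setting_2_2(P))  # False: a scheduler can stay in s1 forever
```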

Whenever suitable, we also denote by $\mathcal{M}$ the set of enabled state-action pairs, i.e., $\mathcal{M} = \{(s, \alpha) \in S \times \text{Act} \mid \alpha \in \text{Act}(s)\}$. Let $\mathbf{A} \in \mathbb{R}^{\mathcal{M} \times S}$ be defined by

$$\mathbf{A}((s,\alpha),t) = \begin{cases} 1 - \mathbf{P}(s,\alpha,s), & \text{if } s = t \\ -\mathbf{P}(s,\alpha,t), & \text{if } s \neq t \end{cases}$$

We denote by $\mathbf{b} = (\mathbf{b}(s, \alpha))_{(s,\alpha) \in \mathcal{M}} \in \mathbb{R}^{\mathcal{M}}$ the vector with $\mathbf{b}(s, \alpha) = \mathbf{P}(s, \alpha, \text{goal})$, and by $\delta_{s_0}$ the probability distribution that assigns 1 to $s_0$ and 0 to all other states.
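Assembling $\mathbf{A}$ and $\mathbf{b}$ is mechanical. The sketch below assumes a hypothetical nested dictionary `P[s][alpha][t]` that lists only enabled actions, orders the rows by the state-action pairs of $\mathcal{M}$ and the columns by $S$, and uses exact rationals:

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Transitions = Dict[str, Dict[str, Dict[str, Fraction]]]


def build_A_b(P: Transitions, goal: str = "goal", fail: str = "fail"):
    """Return (rows, cols, A, b) with rows = M and cols = S (Setting 2.2).

    A((s,a),t) = [s == t] - P(s,a,t) and b(s,a) = P(s,a,goal)."""
    cols: List[str] = [s for s in P if s not in (goal, fail)]
    rows: List[Tuple[str, str]] = [(s, a) for s in cols for a in P[s]]
    A = [[(1 if s == t else 0) - P[s][a].get(t, Fraction(0)) for t in cols]
         for (s, a) in rows]
    b = [P[s][a].get(goal, Fraction(0)) for (s, a) in rows]
    return rows, cols, A, b


F = Fraction
P = {
    "s0": {"a": {"s0": F(1, 2), "goal": F(1, 2)}, "b": {"s1": F(1)}},
    "s1": {"a": {"goal": F(1, 3), "fail": F(2, 3)}},
    "goal": {"a": {"goal": F(1)}},
    "fail": {"a": {"fail": F(1)}},
}
rows, cols, A, b = build_A_b(P)
# rows = [('s0','a'), ('s0','b'), ('s1','a')], cols = ['s0', 's1']
# A = [[1/2, 0], [1, -1], [0, 1]], b = [1/2, 0, 1/3]
```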

The vectors **Pr**min(♦ goal) and **Pr**max(♦ goal) can be characterized using the following linear programs. Although this characterization is well-known, we give a proof in the full version [42] due to slight differences with the standard literature.

**Proposition 2.3 (LP characterization, cf. [16, Lemma 8]).** *Let* $\mathcal{M}$ *be an MDP as in Setting 2.2 and let* $\boldsymbol{\delta} \in \mathbb{R}^S_{>0}$*. Then the vectors* $\mathbf{Pr}^{\min}(\Diamond \text{goal})$ *and* $\mathbf{Pr}^{\max}(\Diamond \text{goal})$ *are, respectively, the* unique *solutions of the LPs*

$$\max\ \boldsymbol{\delta} \cdot \mathbf{z} \ \text{ s.t. } \mathbf{Az} \leq \mathbf{b} \qquad \text{and} \qquad \min\ \boldsymbol{\delta} \cdot \mathbf{z} \ \text{ s.t. } \mathbf{Az} \geq \mathbf{b}.$$

## **3 Farkas certificates for reachability in MDPs**

In this section we establish certificates for the following statements:


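(1) $\mathbf{Pr}^{\min}_{s_0}(\Diamond \text{goal}) \gtrsim \lambda$

(2) $\mathbf{Pr}^{\min}_{s_0}(\Diamond \text{goal}) \lesssim \lambda$

(3) $\mathbf{Pr}^{\max}_{s_0}(\Diamond \text{goal}) \lesssim \lambda$

(4) $\mathbf{Pr}^{\max}_{s_0}(\Diamond \text{goal}) \gtrsim \lambda$

where $\lesssim \in \{\leq, <\}$ and $\gtrsim \in \{\geq, >\}$. The basis of our construction is the LP characterization of the probabilities above and, crucially, Farkas' Lemma.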

**Certificates for universally-quantified statements.** In order to deal with the cases (1) and (3), we need the following lemma proved in the full version [42].

**Lemma 3.1.** *For* $\mathbf{A} \in \mathbb{R}^{\mathcal{M} \times S}$ *and* $\mathbf{b} \in \mathbb{R}^{\mathcal{M}}$ *as in Setting 2.2, we have for all* $\mathbf{z} \in \mathbb{R}^S$*:*

$$\begin{aligned} \mathbf{Az} \le \mathbf{b} &\implies \mathbf{z} \le \mathbf{Pr}^{\min}(\diamond \text{goal})\\ \mathbf{Az} \ge \mathbf{b} &\implies \mathbf{z} \ge \mathbf{Pr}^{\max}(\diamond \text{goal}) \end{aligned}$$

**Corollary 3.2.** *For* $\gtrsim \in \{\geq, >\}$ *and* $\lesssim \in \{\leq, <\}$ *we have*

$$\begin{aligned} \mathbf{Pr}\_{s\_0}^{\min}(\diamondsuit \text{goal}) &\gtrsim \lambda \iff \exists \mathbf{z} \in \mathbb{R}^S. \mathbf{Az} \le \mathbf{b} \land \mathbf{z}(s\_0) \gtrsim \lambda\\ \mathbf{Pr}\_{s\_0}^{\max}(\diamondsuit \text{goal}) &\lesssim \lambda \iff \exists \mathbf{z} \in \mathbb{R}^S. \mathbf{Az} \ge \mathbf{b} \land \mathbf{z}(s\_0) \lesssim \lambda \end{aligned}$$

*Proof.* For the direction from left to right, we take $\mathbf{z}$ to be $\mathbf{Pr}^{\min}(\Diamond \text{goal})$ (respectively, $\mathbf{Pr}^{\max}(\Diamond \text{goal})$). The opposite direction follows from Lemma 3.1.

The right-hand sides of Corollary 3.2 provide *certifying* formulations for problems (1) and (3): to check whether the corresponding threshold statement holds, one merely has to check whether $\mathbf{z}$ satisfies the inequalities, rather than whether $\mathbf{Pr}^{\min/\max}_{s_0}(\Diamond \text{goal})$ was computed correctly. If the threshold condition is satisfied, then the vectors $\mathbf{Pr}^{\min}(\Diamond \text{goal})$ and $\mathbf{Pr}^{\max}(\Diamond \text{goal})$ themselves are valid certificates.
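Checking a certificate of Corollary 3.2 costs one matrix-vector product and a comparison. A minimal sketch, assuming $\mathbf{A}$ and $\mathbf{b}$ are given as lists in the row order of $\mathcal{M}$ (function name and encoding are hypothetical). The example is a small MDP in which action a at $s_0$ returns to $s_0$ and reaches goal with probability 1/2 each, action b moves to $s_1$, and $s_1$ reaches goal with probability 1/3 and fail otherwise; hence $\mathbf{Pr}^{\min}_{s_0}(\Diamond \text{goal}) = 1/3$ and $\mathbf{Pr}^{\max}_{s_0}(\Diamond \text{goal}) = 1$.

```python
from fractions import Fraction
import operator


def check_z_certificate(A, b, z, s0_index, lam, kind="min", strict=False):
    """Check the right-hand side of Corollary 3.2 for a vector z in R^S.

    kind="min": Az <= b and z(s0) >= lam (> lam if strict), which certifies
                Pr^min_{s0}(<> goal) >= lam (> lam).
    kind="max": Az >= b and z(s0) <= lam (< lam if strict), which certifies
                Pr^max_{s0}(<> goal) <= lam (< lam)."""
    Az = [sum(a * zj for a, zj in zip(row, z)) for row in A]
    if kind == "min":
        rows_ok = all(lhs <= rhs for lhs, rhs in zip(Az, b))
        compare = operator.gt if strict else operator.ge
    else:
        rows_ok = all(lhs >= rhs for lhs, rhs in zip(Az, b))
        compare = operator.lt if strict else operator.le
    return rows_ok and compare(z[s0_index], lam)


F = Fraction
# Rows (s0,a), (s0,b), (s1,a); columns s0, s1.
A = [[F(1, 2), F(0)], [F(1), F(-1)], [F(0), F(1)]]
b = [F(1, 2), F(0), F(1, 3)]
print(check_z_certificate(A, b, [F(1, 3), F(1, 3)], 0, F(1, 3)))         # True
print(check_z_certificate(A, b, [F(1, 2), F(1, 3)], 0, F(1, 2)))         # False
print(check_z_certificate(A, b, [F(1), F(1, 3)], 0, F(1), kind="max"))   # True
```

The second call fails because row $(s_0, b)$ gives $1/2 - 1/3 > 0$, so $\mathbf{Az} \leq \mathbf{b}$ is violated.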

**Certificates for existentially-quantified statements.** To find certificates for the cases (2) and (4), we calculate:

$$\begin{aligned}
\mathbf{Pr}^{\min}_{s_0}(\Diamond \text{goal}) < \lambda
&\overset{\text{Cor. 3.2}}{\iff} \neg\exists \mathbf{z} \in \mathbb{R}^S_{\geq 0}.\ \mathbf{Az} \leq \mathbf{b} \wedge \mathbf{z}(s_0) \geq \lambda \\
&\iff \neg\exists \mathbf{z} \in \mathbb{R}^S_{\geq 0}.\ \begin{pmatrix} \mathbf{A} \\ -1\;0\;\cdots\;0 \end{pmatrix} \mathbf{z} \leq \begin{pmatrix} \mathbf{b} \\ -\lambda \end{pmatrix} \\
&\overset{\text{Lem. 2.1}}{\iff} \exists \mathbf{y} \in \mathbb{R}^{\mathcal{M}}_{\geq 0}, y^* \geq 0.\ (\mathbf{y}, y^*) \begin{pmatrix} \mathbf{A} \\ -1\;0\;\cdots\;0 \end{pmatrix} \geq \mathbf{0} \wedge (\mathbf{y}, y^*) \begin{pmatrix} \mathbf{b} \\ -\lambda \end{pmatrix} < 0 \\
&\iff \exists \mathbf{y} \in \mathbb{R}^{\mathcal{M}}_{\geq 0}.\ \mathbf{yA} \geq \delta_{s_0} \wedge \mathbf{yb} < \lambda.
\end{aligned}$$

Here the row $(-1\;0\;\cdots\;0)$ is $-\delta_{s_0}$, with $s_0$ as the first coordinate. In the last step, $y^* = 0$ is impossible, since $\mathbf{y} \geq 0$ and $\mathbf{b} \geq 0$ imply $\mathbf{yb} \geq 0$; dividing by $y^* > 0$ and renaming $\mathbf{y}/y^*$ to $\mathbf{y}$ yields the stated condition.

For non-strict inequalities, we apply Farkas' Lemma in the opposite direction:

$$\begin{aligned}
\mathbf{Pr}^{\min}_{s_0}(\Diamond \text{goal}) \leq \lambda
&\overset{\text{Cor. 3.2}}{\iff} \neg\exists \mathbf{z} \in \mathbb{R}^S_{\geq 0}.\ \mathbf{Az} \leq \mathbf{b} \wedge \mathbf{z}(s_0) > \lambda \\
&\iff \neg\exists \mathbf{z} \in \mathbb{R}^S_{\geq 0}, z^* \geq 0.\ \begin{pmatrix} -\mathbf{A} & \mathbf{b} \end{pmatrix} \begin{pmatrix} \mathbf{z} \\ z^* \end{pmatrix} \geq 0 \wedge \begin{pmatrix} -\delta_{s_0} & \lambda \end{pmatrix} \begin{pmatrix} \mathbf{z} \\ z^* \end{pmatrix} < 0 \\
&\overset{\text{Lem. 2.1}}{\iff} \exists \mathbf{y} \in \mathbb{R}^{\mathcal{M}}_{\geq 0}.\ \mathbf{y} \begin{pmatrix} -\mathbf{A} & \mathbf{b} \end{pmatrix} \leq \begin{pmatrix} -\delta_{s_0} & \lambda \end{pmatrix} \\
&\iff \exists \mathbf{y} \in \mathbb{R}^{\mathcal{M}}_{\geq 0}.\ \mathbf{yA} \geq \delta_{s_0} \wedge \mathbf{yb} \leq \lambda.
\end{aligned}$$

The deductions for **Pr**max(♦ goal) are analogous, so that we get:

**Proposition 3.3.** *For* $\gtrsim \in \{\geq, >\}$ *and* $\lesssim \in \{\leq, <\}$ *we have*

$$\begin{aligned} \mathbf{Pr}\_{s\_0}^{\min}(\diamondsuit \text{goal}) \lesssim \lambda &\iff \exists \mathbf{y} \in \mathbb{R}\_{\geq 0}^{\mathcal{M}} . \ \mathbf{y} \mathbf{A} \geq \delta\_{s\_0} \land \mathbf{y} \mathbf{b} \lesssim \lambda \\\mathbf{Pr}\_{s\_0}^{\max}(\diamondsuit \text{goal}) \gtrsim \lambda &\iff \exists \mathbf{y} \in \mathbb{R}\_{\geq 0}^{\mathcal{M}} . \ \mathbf{y} \mathbf{A} \leq \delta\_{s\_0} \land \mathbf{y} \mathbf{b} \gtrsim \lambda \end{aligned}$$
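The certificates of Proposition 3.3 can be checked just as cheaply. The sketch below (function name and encoding hypothetical) uses a small example MDP in which action a at $s_0$ returns to $s_0$ and reaches goal with probability 1/2 each, action b at $s_0$ moves to $s_1$, and $s_1$ reaches goal with probability 1/3 and fail otherwise; its rows are $(s_0,a)$, $(s_0,b)$, $(s_1,a)$ and its columns $s_0, s_1$. In this example, the entries of the certificates happen to be the expected numbers of times each state-action pair is taken under a fixed memoryless scheduler.

```python
from fractions import Fraction


def check_y_certificate(A, b, y, s0_index, lam, kind="min", strict=False):
    """Check the right-hand side of Proposition 3.3 for a vector y in R^M.

    kind="min": y >= 0, yA >= delta_{s0}, yb <= lam (< lam if strict),
                which certifies Pr^min_{s0}(<> goal) <= lam (< lam).
    kind="max": y >= 0, yA <= delta_{s0}, yb >= lam (> lam if strict),
                which certifies Pr^max_{s0}(<> goal) >= lam (> lam)."""
    if any(v < 0 for v in y):
        return False
    n = len(A[0])
    yA = [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(n)]
    delta = [1 if j == s0_index else 0 for j in range(n)]
    yb = sum(yi * bi for yi, bi in zip(y, b))
    if kind == "min":
        return (all(v >= d for v, d in zip(yA, delta))
                and (yb < lam if strict else yb <= lam))
    return (all(v <= d for v, d in zip(yA, delta))
            and (yb > lam if strict else yb >= lam))


F = Fraction
A = [[F(1, 2), F(0)], [F(1), F(-1)], [F(0), F(1)]]
b = [F(1, 2), F(0), F(1, 3)]
y_min = [F(0), F(1), F(1)]  # take (s0,b) and then (s1,a) once each
y_max = [F(2), F(0), F(0)]  # expected number of times (s0,a) is taken
print(check_y_certificate(A, b, y_min, 0, F(1, 3), kind="min"))  # True
print(check_y_certificate(A, b, y_max, 0, F(1), kind="max"))     # True
```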

Together, Corollary 3.2 and Proposition 3.3 give us all certificate conditions of Table 1.

## **4 Minimal witnesses for reachability in MDPs**

In this section we consider the following problem: Given an MDP $\mathcal{M}$ that satisfies the property $\mathbf{Pr}^{\min}_{\mathcal{M},s_0}(\Diamond \text{goal}) \geq \lambda$ (or $\mathbf{Pr}^{\max}_{\mathcal{M},s_0}(\Diamond \text{goal}) \geq \lambda$), find a small subsystem $\mathcal{M}'$ of $\mathcal{M}$ that still satisfies this threshold. Such a subsystem is a witness to the satisfaction of the property in $\mathcal{M}$. We first define subsystems and consider different measures of size, which we show to be equivalent. Then we deal with the question of finding minimal witnessing subsystems.

**Subsystems, witnesses and notions of minimality.** Our definition of subsystem is essentially the same as the definition in [77, 78] that was used for witnessing subsystems of $\mathbf{Pr}^{\max}_{\mathcal{M},s_0}(\Diamond \text{goal}) \gtrsim \lambda$. From now on we restrict our attention to properties of the form $\mathbf{Pr}^{\min/\max}_{\mathcal{M},s_0}(\Diamond \text{goal}) \gtrsim \lambda$. Upper bounds can be handled by exchanging the roles of $\text{fail}$ and $\text{goal}$ and invoking the equality $\mathbf{Pr}^{\min}_{\mathcal{M},s_0}(\Diamond \text{goal}) = 1 - \mathbf{Pr}^{\max}_{\mathcal{M},s_0}(\Diamond \text{fail})$, which holds under the conditions of Setting 2.2.

Intuitively, a subsystem $\mathcal{M}'$ of $\mathcal{M}$ contains a subset of the states of $\mathcal{M}$, and a transition of $\mathcal{M}$ originating in a state of $\mathcal{M}'$ either remains unchanged in $\mathcal{M}'$ or is redirected to $\text{fail}$ (instead of explicitly redirecting to $\text{fail}$, sub-stochastic distributions are used in [77, 78] with the same effect).

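**Definition 4.1 (Subsystem and witness).** *Let* $\mathcal{M} = (S_{\text{all}}, \text{Act}, s_0, \mathbf{P})$ *be an MDP as in Setting 2.2. A* subsystem $\mathcal{M}' \subseteq \mathcal{M}$ *is an MDP* $\mathcal{M}' = (S'_{\text{all}}, \text{Act}, s_0, \mathbf{P}')$ *with* $\text{fail}, \text{goal} \in S'_{\text{all}} \subseteq S_{\text{all}}$*,* $\text{Act}_{\mathcal{M}'}(s) = \text{Act}_{\mathcal{M}}(s)$ *for all* $s \in S'_{\text{all}}$*, and for all* $s, t \in S'_{\text{all}}$ *with* $t \neq \text{fail}$ *and* $\alpha \in \text{Act}$ *we have*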

$$\mathbf{P}'(s,\alpha,t) > 0 \Longrightarrow \mathbf{P}'(s,\alpha,t) = \mathbf{P}(s,\alpha,t).$$

*We say that the states* $S_{\text{all}} \setminus S'_{\text{all}}$ *and the transitions* $(s, \alpha, t)$ *with* $\mathbf{P}(s, \alpha, t) > 0$ *and* $\mathbf{P}'(s, \alpha, t) = 0$ *have been* deleted *in* $\mathcal{M}'$*. A* witness *for* $\mathbf{Pr}^{\min/\max}_{\mathcal{M},s_0}(\Diamond \text{goal}) \gtrsim \lambda$ *is a subsystem* $\mathcal{M}' \subseteq \mathcal{M}$ *such that* $\mathbf{Pr}^{\min/\max}_{\mathcal{M}',s_0}(\Diamond \text{goal}) \gtrsim \lambda$*.*

*Remark 4.2.* The condition $\text{Act}_{\mathcal{M}'}(s) = \text{Act}_{\mathcal{M}}(s)$ ensures that the probability of a deleted transition $(s, \alpha, t)$ is added to $(s, \alpha, \text{fail})$. This is essential for witnesses for $\mathbf{Pr}^{\min}_{\mathcal{M},s_0}(\Diamond \text{goal}) \gtrsim \lambda$, as one could otherwise remove entire actions causing low probabilities and, as a result, obtain a greater $\mathbf{Pr}^{\min}$ in $\mathcal{M}'$ than in $\mathcal{M}$. For witnesses for $\mathbf{Pr}^{\max}_{\mathcal{M},s_0}(\Diamond \text{goal}) \gtrsim \lambda$ one could drop this condition, which leads to the notion of [77, 78].

**Fig. 1:** (a) An MDP (probabilities omitted) and (b) a subsystem of it, where redirected transitions are dashed.

*Example 4.3.* Figure 1a depicts an MDP and Figure 1b indicates the subsystem that is obtained by deleting the state t and additionally the transition (u, α, s0).

The following lemma ensures that we can use subsystems as witnesses for both $\mathbf{Pr}^{\max}_{\mathcal{M},s_0}(\Diamond \text{goal}) \gtrsim \lambda$ and $\mathbf{Pr}^{\min}_{\mathcal{M},s_0}(\Diamond \text{goal}) \gtrsim \lambda$.

**Lemma 4.4.** *Let* $\mathcal{M}$ *be an MDP as in Setting 2.2 and* $\mathcal{M}' \subseteq \mathcal{M}$*. Then:*

$$\mathbf{Pr}^{\min}_{\mathcal{M}',s_0}(\Diamond \text{goal}) \leq \mathbf{Pr}^{\min}_{\mathcal{M},s_0}(\Diamond \text{goal}) \quad \text{and} \quad \mathbf{Pr}^{\max}_{\mathcal{M}',s_0}(\Diamond \text{goal}) \leq \mathbf{Pr}^{\max}_{\mathcal{M},s_0}(\Diamond \text{goal})$$

We consider the following notions of minimality for subsystems:


Depending on the situation, one notion might be more suitable than the others. However, in the full version [42] we show that finding transition-minimal (respectively, size-minimal) witnesses can be reduced to finding state-minimal witnesses with a linear (respectively, quadratic) blow-up. We will therefore restrict ourselves to state-minimality for the rest of this paper.

**NP-completeness of finding minimal witnesses for DTMCs.** In this section we determine the computational complexity of the *witness problem*: Given a DTMC $\mathcal{M}$, a positive integer $k$, and a rational number $\lambda \in [0, 1]$, decide whether there exists a witness $\mathcal{M}' \subseteq \mathcal{M}$ for $\Pr_{\mathcal{M},s_0}(\Diamond \text{goal}) \geq \lambda$ with at most $k$ states. The corresponding problem for MDPs is known to be NP-complete [24, 78]<sup>1</sup>. We show that the witness problem is already NP-complete for acyclic DTMCs, where acyclicity means that the underlying graph with $V = S$ and $E = \{(s, t) \in S \times S \mid \mathbf{P}(s, t) > 0\}$ is acyclic (as before, we take $S = S_{\text{all}} \setminus \{\text{goal}, \text{fail}\}$). This answers a conjecture of [77] in the affirmative and also shows NP-completeness of finding minimal witnesses for $\mathbf{Pr}^{\min}_{\mathcal{M},s_0}(\Diamond \text{goal}) \geq \lambda$.

<sup>1</sup> Although the framework in [24] considers a richer logic, the hardness proof uses only probabilistic reachability formulas such as the ones we consider.

**Theorem 4.5.** *The witness problem is NP-complete for acyclic DTMCs.*

*Proof (Sketch).* An NP-algorithm for the witness problem is given by guessing a set of states of size $k$ and verifying in polynomial time that the corresponding subsystem satisfies $\Pr_{\mathcal{M}',s_0}(\Diamond \text{goal}) \geq \lambda$.

For hardness, we give a reduction from the *clique problem*, which is among Karp's 21 NP-complete problems [54]. The idea is the following: Given an instance of the clique problem with graph $G = (V, E)$ and integer $k$, construct an acyclic Markov chain $\mathcal{M}$ with states $S = \{s_0\} \cup V \cup E \cup \{\text{goal}, \text{fail}\}$ and transitions from each vertex $v \in V$ to all edges incident to $v$. Then the existence of a $k$-clique reduces to the existence of a "saturated" subsystem of $\mathcal{M}$ with $k$ states in $V$. To check whether the subsystem is saturated, we require its probability to exceed a certain threshold, which depends on $k$ and $|V|$. Details can be found in the full version [42].
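For very small DTMCs, the "guess and verify" idea behind the NP upper bound can be turned into an exponential brute-force search. The sketch below assumes an acyclic DTMC given as a hypothetical dictionary `P[s][t]` with `goal` and `fail` as the names of the absorbing states; deleted states behave as if redirected to fail, as in Definition 4.1. It is illustrative only and is not an algorithm from the paper.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations
from typing import Dict, FrozenSet, Optional

DTMC = Dict[str, Dict[str, Fraction]]


def reach_prob(P: DTMC, kept: FrozenSet[str], s0: str) -> Fraction:
    """Pr_{s0}(<> goal) in the subsystem keeping `kept` (acyclic P assumed).
    Transitions into deleted states count as transitions to fail."""
    @lru_cache(maxsize=None)
    def prob(s: str) -> Fraction:
        total = P[s].get("goal", Fraction(0))
        for t, p in P[s].items():
            if t in kept:
                total += p * prob(t)
        return total
    return prob(s0) if s0 in kept else Fraction(0)


def minimal_witness(P: DTMC, s0: str, lam: Fraction) -> Optional[FrozenSet[str]]:
    """Smallest state set R of S whose subsystem reaches goal with
    probability >= lam, found by exhaustive search over subsets."""
    states = [s for s in P if s not in ("goal", "fail")]
    for k in range(1, len(states) + 1):
        for R in combinations(states, k):
            if reach_prob(P, frozenset(R), s0) >= lam:
                return frozenset(R)
    return None


F = Fraction
P = {
    "s0": {"a": F(1, 2), "b": F(1, 2)},
    "a": {"goal": F(1, 2), "c": F(1, 2)},
    "b": {"goal": F(1, 4), "fail": F(3, 4)},
    "c": {"goal": F(1)},
}
print(sorted(minimal_witness(P, "s0", F(1, 2))))  # ['a', 'c', 's0']
```

The search visits up to $2^{|S|}$ subsets, which is consistent with the NP-hardness result: no polynomial-time algorithm is expected for general acyclic DTMCs.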

*Remark 4.6.* NP-completeness of transition-minimal and size-minimal versions of the witness problem for acyclic DTMCs follows along the same lines, where only the sizes and thresholds for the subsystems need to be adapted.

However, DTMCs whose underlying graph is a tree permit an efficient algorithm for computing minimal witnesses (for the proof see the full version [42]).

**Proposition 4.7.** *Minimal witnesses in tree-shaped DTMCs can be computed in polynomial time.*

*Proof (Sketch).* The algorithm first transforms the DTMC at hand into a binary (tree-shaped) DTMC, and then works bottom up by storing for each state the highest probability that can be obtained with a subsystem of size k, for all k up to the size of the subtree.
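A sketch of this bottom-up idea, assuming a hypothetical dictionary `P[s][t]` whose graph on $S$ is a tree rooted at $s_0$ (transitions to goal and fail may occur anywhere). Instead of binarizing the tree first, it merges the children of each state one at a time in knapsack style, which plays the same role; `best[k]` is the highest probability of reaching goal using exactly $k$ states of the subtree, including its root. The names are hypothetical and this is not the authors' implementation; each merge is quadratic in the subtree sizes involved, so the total running time is polynomial.

```python
from fractions import Fraction
from typing import Dict, List

DTMC = Dict[str, Dict[str, Fraction]]


def best_per_size(P: DTMC, s: str) -> List[Fraction]:
    """best[k] = max Pr_s(<> goal) over subsystems keeping exactly k states
    of the subtree rooted at s, including s itself (best[0] = 0)."""
    comb = [Fraction(0)]  # comb[j]: best contribution of children using j states
    for c, p in P[s].items():
        if c in ("goal", "fail"):
            continue
        bc = best_per_size(P, c)
        new = [Fraction(-1)] * (len(comb) + len(bc) - 1)
        for i, vi in enumerate(comb):
            for j, vj in enumerate(bc):
                new[i + j] = max(new[i + j], vi + p * vj)
        comb = new
    goal_p = P[s].get("goal", Fraction(0))
    return [Fraction(0)] + [goal_p + v for v in comb]


def min_witness_size(P: DTMC, s0: str, lam: Fraction) -> int:
    """Smallest k such that some k-state subsystem reaches goal with
    probability >= lam, or -1 if even the full system does not."""
    best = best_per_size(P, s0)
    return next((k for k, v in enumerate(best) if k > 0 and v >= lam), -1)


F = Fraction
P = {
    "s0": {"a": F(1, 2), "b": F(1, 2)},
    "a": {"goal": F(1, 2), "c": F(1, 2)},
    "b": {"goal": F(1, 4), "fail": F(3, 4)},
    "c": {"goal": F(1)},
}
print(best_per_size(P, "s0"))  # values 0, 0, 1/4, 1/2, 5/8 for k = 0..4
print(min_witness_size(P, "s0", F(1, 2)))  # 3
```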

## **5 Relating Farkas certificates and minimal witnesses**

In this section we establish a strong connection between Farkas certificates on the one hand and witnesses for probabilistic reachability constraints on the other hand. We first note that the set of Farkas certificates for non-strict lower bounds forms a polytope, i.e., a bounded polyhedron.

**Lemma 5.1 (Polytopes of Farkas certificates).** *Let* $\mathcal{M} = (S_{\text{all}}, \text{Act}, s_0, \mathbf{P})$ *be an MDP as in Setting 2.2 and consider* $\mathbf{A} \in \mathbb{R}^{\mathcal{M} \times S}$ *and* $\mathbf{b} \in \mathbb{R}^{\mathcal{M}}$ *introduced there. Then for every* $\lambda \in [0, 1]$ *the polyhedra*

$$\mathcal{P}^{\min}(\lambda) = \{ \mathbf{z} \in \mathbb{R}^S \mid \mathbf{A}\mathbf{z} \le \mathbf{b} \land \mathbf{z}(s\_0) \ge \lambda \}$$

$$\mathcal{P}^{\max}(\lambda) = \{ \mathbf{y} \in \mathbb{R}^M \mid \mathbf{y} \ge 0 \land \mathbf{y}\mathbf{A} \le \delta\_{s\_0} \land \mathbf{y}\mathbf{b} \ge \lambda \}$$

*are both polytopes, called* the polytopes of Farkas certificates*.*
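Setting 2.2 is not reproduced in this excerpt, so the following sketch assumes the usual encoding A((s,α), t) = [s = t] − P(s,α,t) and b(s,α) = P(s,α,goal). Under this encoding, **Az** ≤ **b** reads z(s) ≤ Σ<sub>t</sub> P(s,α,t)·z(t) + P(s,α,goal) for every enabled α. With that assumption, the sketch checks whether candidate vectors lie in 𝒫<sup>min</sup>(λ) and 𝒫<sup>max</sup>(λ), using exact rational arithmetic. The MDP encoding and all function names are hypothetical.

```python
from fractions import Fraction as F
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
# Hypothetical encoding: P[(s, a)] maps successors (states, "goal", "fail") to probabilities.
MDP = Dict[Pair, Dict[str, F]]


def farkas_system(P: MDP, states: List[str]):
    """ASSUMED form of Setting 2.2: A[(s,a)][t] = [s == t] - P(s,a,t), b[(s,a)] = P(s,a,goal)."""
    pairs = sorted(P)
    A = {sa: {t: F(int(sa[0] == t)) - P[sa].get(t, F(0)) for t in states} for sa in pairs}
    b = {sa: P[sa].get("goal", F(0)) for sa in pairs}
    return pairs, A, b


def in_p_min(P: MDP, states: List[str], s0: str, z: Dict[str, F], lam: F) -> bool:
    """z in P^min(lambda): A z <= b row by row, and z(s0) >= lambda."""
    pairs, A, b = farkas_system(P, states)
    rows_ok = all(sum(A[sa][t] * z[t] for t in states) <= b[sa] for sa in pairs)
    return rows_ok and z[s0] >= lam


def in_p_max(P: MDP, states: List[str], s0: str, y: Dict[Pair, F], lam: F) -> bool:
    """y in P^max(lambda): y >= 0, y A <= delta_{s0} column by column, and y . b >= lambda."""
    pairs, A, b = farkas_system(P, states)
    nonneg = all(y[sa] >= 0 for sa in pairs)
    cols_ok = all(sum(y[sa] * A[sa][t] for sa in pairs) <= (1 if t == s0 else 0)
                  for t in states)
    return nonneg and cols_ok and sum(y[sa] * b[sa] for sa in pairs) >= lam
```

Consider a two-state chain in which s0 moves to s1 with probability 1/2 and s1 reaches goal with probability 1/2. Here both z = (1/4, 1/2) and the vector of expected action counts y = (1, 1/2) certify the bound 1/4, but neither certifies 1/2.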

*Remark 5.2.* For any vector **v** ∈ ℝ<sup>n</sup> the support is defined as supp(**v**) = {i ∈ {1, ..., n} | **v**<sub>i</sub> > 0}, and analogously for the vector spaces ℝ<sup>S</sup> and ℝ<sup>M</sup>. As our connection between subsystems of M and points in 𝒫<sup>min</sup>(λ) is based on taking the support, we restrict our attention to the subpolytope 𝒫<sup>min</sup><sub>≥0</sub>(λ) = 𝒫<sup>min</sup>(λ) ∩ ℝ<sup>S</sup><sub>≥0</sub>.

**Notation 5.3.** Given an MDP M = (S<sub>all</sub>, Act, s<sub>0</sub>, **P**) as in Setting 2.2 and a subset R ⊆ M, where M also denotes the set of state-action pairs (compare with Section 2), we let M<sub>R</sub> = (S′<sub>all</sub>, Act, s<sub>0</sub>, **P**′) be the subsystem in which, roughly speaking, the state-action pairs in R *remain*. More precisely, let

$$\begin{aligned} S'\_{\text{all}} &= \{ s \in S \mid \exists \alpha \in \text{Act} .\, (s,\alpha) \in R \} \cup \{ \text{goal}, \text{fail} \} \\ \mathbf{P}'(s,\alpha,t) &= \begin{cases} \mathbf{P}(s,\alpha,t) & \text{if } (s,\alpha) \in R \text{ and } t \in S'\_{\text{all}} \setminus \{ \text{fail} \} \\ 1 - \sum\_{t' \in S'\_{\text{all}} \setminus \{ \text{fail} \}} \mathbf{P}(s,\alpha,t') & \text{if } (s,\alpha) \in R \text{ and } t = \text{fail} \\ 1 & \text{if } (s,\alpha) \notin R,\ \alpha \in \text{Act}(s) \text{ and } t = \text{fail} \\ 0 & \text{else} \end{cases} \end{aligned}$$

For R ⊆ S we set M<sub>R</sub> = M<sub>R′</sub> for R′ = ⋃<sub>s∈R</sub> {s} × Act(s).
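Notation 5.3 translates directly into code. The sketch below is our own illustration. It uses a hypothetical dictionary encoding `P[(s, a)][t]` with the target names "goal" and "fail". For pairs in R it keeps the transitions into S′<sub>all</sub> and redirects the remaining probability mass to fail. Every other enabled pair of a kept state is sent to fail with probability 1.

```python
from fractions import Fraction as F
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]
MDP = Dict[Pair, Dict[str, F]]  # P[(s, a)][t]; targets may be "goal" or "fail"


def subsystem(P: MDP, R: Set[Pair]) -> MDP:
    """Transition function P' of M_R for a set R of state-action pairs."""
    kept = {s for (s, _) in R} | {"goal", "fail"}  # S'_all
    P_new: MDP = {}
    for (s, a), dist in P.items():
        if s not in kept:
            continue  # states outside S'_all do not appear in the subsystem
        if (s, a) in R:
            row = {t: p for t, p in dist.items() if t in kept and t != "fail"}
            row["fail"] = 1 - sum(row.values(), F(0))  # all other mass goes to fail
        else:
            row = {"fail": F(1)}  # pair not in R: move to fail surely
        P_new[(s, a)] = row
    return P_new


def state_subsystem(P: MDP, R_states: Set[str]) -> MDP:
    """M_R for a set of states R: keep every enabled action of each state in R."""
    return subsystem(P, {(s, a) for (s, a) in P if s in R_states})
```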

**Theorem 5.4 (Farkas certificates yield witnesses).** *Let* M *be an MDP as in Setting 2.2 and* λ ∈ [0, 1]*. Then for a set* R ⊆ S *the following statements are equivalent:*


*Moreover, for a set* R ⊆ M *the following statements are equivalent:*


One consequence of Theorem 5.4 is that every MD-scheduler S with Pr<sup>S</sup><sub>s₀</sub>(♦ goal) ≥ λ corresponds to a point in 𝒫<sup>max</sup>(λ), i.e., to a certificate for **Pr**<sup>max</sup><sub>M,s₀</sub>(♦ goal) ≥ λ.
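One way to see this correspondence concretely is through expected action counts: y(s,α) is the expected number of times α is chosen in s under the scheduler. Assume the encoding A((s,α),t) = [s = t] − P(s,α,t) and b(s,α) = P(s,α,goal) (Setting 2.2 is not shown here). Then these counts satisfy **yA** = δ<sub>s₀</sub> with equality, and **y**·**b** equals the scheduler's reachability probability. The sketch computes the counts exactly by solving the linear visit-count equations. It relies on the standing assumption that goal or fail is reached with positive probability, which makes the system nonsingular. All names are hypothetical.

```python
from fractions import Fraction as F
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
MDP = Dict[Pair, Dict[str, F]]  # hypothetical encoding P[(s, a)][t]


def solve(M: List[List[F]], rhs: List[F]) -> List[F]:
    """Exact Gauss-Jordan elimination; assumes M is nonsingular."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def certificate_from_scheduler(P: MDP, states: List[str], s0: str, sched: Dict[str, str]):
    """y(s, a) = expected number of times a is taken in s under the MD scheduler."""
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    # Visit counts x satisfy x(t) - sum_s x(s) * P(s, sched(s), t) = [t == s0].
    M = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    for s in states:
        for t, p in P[(s, sched[s])].items():
            if t in idx:
                M[idx[t]][idx[s]] -= p
    x = solve(M, [F(int(t == s0)) for t in states])
    y = {sa: F(0) for sa in P}
    for s in states:
        y[(s, sched[s])] = x[idx[s]]
    # With the assumed b(s, a) = P(s, a, goal), y . b is Pr(<> goal) under sched.
    value = sum((y[sa] * P[sa].get("goal", F(0)) for sa in P), F(0))
    return y, value
```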

**Corollary 5.5 (Detecting minimal witnesses by vertices of** 𝒫**).** *Let* M = (S<sub>all</sub>, Act, s<sub>0</sub>, **P**) *be an MDP as in Setting 2.2 and* λ ∈ [0, 1]*. Then a vertex* **v** *of* 𝒫<sup>min</sup><sub>≥0</sub>(λ) *has a maximal number of zeros among all vertices of* 𝒫<sup>min</sup><sub>≥0</sub>(λ) *if and only if* M<sub>supp(**v**)</sub> *is a minimal witness for* **Pr**<sup>min</sup><sub>s₀</sub>(♦ goal) ≥ λ*.*

*Dually, a vertex* **v** *of* 𝒫<sup>max</sup>(λ) *has a maximal number of zeros among all vertices of* 𝒫<sup>max</sup>(λ) *if and only if all of the following hold:*


## **6 Computing witnessing subsystems**

In this section we use the results of Section 5 to derive two algorithms for the computation of minimal witnesses for reachability constraints in MDPs. As the problem is NP-hard, we also present a heuristic approach aimed at computing small witnessing subsystems.

**Vertex enumeration.** Corollary 5.5 gives rise to the following approach for computing minimal witnessing subsystems: enumerate all vertices of the corresponding polytope and choose one with a maximal number of zeros. Vertex enumeration of polytopes has been studied extensively [11, 12, 14, 21, 22, 35, 36, 40, 41, 63, 68] and has been shown to be computationally hard [55, Corollary 2].
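For intuition only, the vertex-enumeration approach can be written as a brute-force search. Every vertex of {x | Gx ≤ h} ⊆ ℝ<sup>n</sup> is a feasible point at which n linearly independent constraints are tight. The sketch below enumerates all such n-subsets exactly with fractions. This is exponential and therefore only usable for toy polytopes; it is not one of the cited vertex enumeration algorithms. The example system is 𝒫<sup>min</sup><sub>≥0</sub>(1/2) for a hypothetical three-state DTMC: s0 moves to s1 and s2 with probability 1/2 each, s1 reaches goal with probability 1/2, and s2 reaches goal surely. It again assumes the encoding A((s,α),t) = [s = t] − P(s,α,t), b(s,α) = P(s,α,goal).

```python
from fractions import Fraction as F
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[F, ...]


def solve_square(M: List[List[F]], rhs: List[F]) -> Optional[Vec]:
    """Exact Gaussian elimination; returns None if M is singular."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return tuple(aug[i][n] / aug[i][i] for i in range(n))


def vertices(G: Sequence[Sequence[F]], h: Sequence[F]) -> List[Vec]:
    """All vertices of {x | G x <= h}: feasible points where n independent rows are tight."""
    n = len(G[0])
    found = set()
    for rows in combinations(range(len(G)), n):
        x = solve_square([list(G[r]) for r in rows], [h[r] for r in rows])
        if x is not None and all(sum(g * v for g, v in zip(G[r], x)) <= h[r]
                                 for r in range(len(G))):
            found.add(x)
    return sorted(found)


def most_zeros(G: Sequence[Sequence[F]], h: Sequence[F]) -> Vec:
    """A vertex with the maximal number of zero entries (ties: first in sorted order)."""
    return max(vertices(G, h), key=lambda v: sum(1 for c in v if c == 0))


half = F(1, 2)
# Rows: A z <= b (one row per state), -z(s0) <= -lambda, and -z <= 0.
EXAMPLE_G = [[F(1), -half, -half], [F(0), F(1), F(0)], [F(0), F(0), F(1)],
             [F(-1), F(0), F(0)],
             [F(-1), F(0), F(0)], [F(0), F(-1), F(0)], [F(0), F(0), F(-1)]]
EXAMPLE_H = [F(0), half, F(1), -half, F(0), F(0), F(0)]
```

In this example the vertex with the most zeros is (1/2, 0, 1). By Corollary 5.5, its support {s0, s2} is therefore a minimal witness for the threshold 1/2.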

Our first experiments with the SageMath<sup>2</sup> toolkit, which supports vertex enumeration, did not scale well in the dimension, which in our case is the number of states in the original system. Moreover, we found no tool support for vertex enumeration that can handle sparse matrices, which is essential for larger benchmarks.

**Mixed integer linear programming.** An approach that computes minimal witnesses for the threshold problem **Pr**<sup>max</sup><sub>s₀</sub>(♦ goal) ≥ λ using mixed integer linear programs (MILPs) was presented in [77, 78]. Using the following lemma, we can derive MILP formulations from our polytope formulations.

**Lemma 6.1.** *Let* 𝒫 = {**x** | **Ax** ≤ **b**, **x** ≥ 0} ⊆ ℝ<sup>n</sup> *be a polytope and* K ≥ 0 *be such that for all* **p** ∈ 𝒫 *and* 1 ≤ i ≤ n *we have* **p**(i) ≤ K*. Consider the MILP*

$$\min \sum\_{1 \le i \le n} \sigma(i) \quad s.t. \quad \mathbf{x} \in \mathcal{P}, \quad \mathbf{x} \le K \cdot \sigma, \quad \sigma(i) \in \{0, 1\}$$

*Then a vector* (*σ*, **x**) *is an optimal solution of this MILP if and only if* **x** *is a point in* P *with a maximal number of zeros.*

For 𝒫<sup>min</sup><sub>≥0</sub>(λ) we can use Lemma 3.1 to derive that K = 1 is a viable bound. Invoking Corollary 5.5 again, this means that a solution (**z**, *σ*) of the MILP

$$\min \sum\_{s \in S} \sigma(s) \text{ s.t. } \mathbf{z} \in \mathcal{P}\_{\geq 0}^{\min}(\lambda), \quad \mathbf{z} \leq \sigma, \quad \sigma(s) \in \{0, 1\}$$

encodes a minimal witnessing subsystem in the integral variables *σ*. This MILP was used in [77, 78] for the computation of minimal witnessing subsystems of DTMCs.
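Lemma 6.1 is solver-independent, but it may help to see the MILP written out. The sketch below emits it in the CPLEX LP text format, which Gurobi (the solver used in Section 7) can read. The variable names `x0, x1, ...` and `sig0, sig1, ...` and the helper names are our own choices. Continuous variables in this format have lower bound 0 by default, which matches the constraint **x** ≥ 0. The test instantiates it with K = 1 for 𝒫<sup>min</sup><sub>≥0</sub>(1/2) of a small hypothetical three-state DTMC, writing z(s<sub>0</sub>) ≥ λ as the row −z(s<sub>0</sub>) ≤ −λ.

```python
from typing import List, Sequence


def _linear(coeffs: Sequence[float], prefix: str) -> str:
    """Render sum_i c_i * prefix_i with explicit signs, skipping zero coefficients."""
    parts = [f"{'-' if c < 0 else '+'} {abs(c)} {prefix}{i}"
             for i, c in enumerate(coeffs) if c != 0]
    text = " ".join(parts)
    return text[2:] if text.startswith("+ ") else text


def witness_milp_lp(A: Sequence[Sequence[float]], b: Sequence[float], K: float) -> str:
    """MILP of Lemma 6.1 for P = {x | A x <= b, x >= 0}, as CPLEX LP text."""
    n = len(A[0])
    lines: List[str] = ["Minimize",
                        " obj: " + " + ".join(f"sig{i}" for i in range(n)),
                        "Subject To"]
    for r, (row, rhs) in enumerate(zip(A, b)):
        lines.append(f" poly{r}: {_linear(row, 'x')} <= {rhs}")  # x in P
    for i in range(n):
        lines.append(f" link{i}: x{i} - {K} sig{i} <= 0")  # x(i) <= K * sigma(i)
    lines += ["Binaries", " " + " ".join(f"sig{i}" for i in range(n)), "End"]
    return "\n".join(lines)
```

Keeping the generator independent of any solver API means the same text can be inspected by hand or passed to any LP-format reader.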

An upper bound K as in Lemma 6.1 for 𝒫<sup>max</sup>(λ) can be found in polynomial time by taking the objective value of an optimal solution to the LP

$$\max \sum\_{(s,\alpha)\in\mathcal{M}} \mathbf{y}(s,\alpha) \text{ s.t. } \mathbf{y} \in \mathcal{P}^{\max}(\lambda)$$

<sup>2</sup> http://www.sagemath.org/

*Remark 6.2.* To compute minimal witnesses for **Pr**<sup>max</sup><sub>s₀</sub>(♦ goal) ≥ λ, the authors of [77, 78] (who did not consider witnesses for **Pr**<sup>min</sup><sub>s₀</sub>(♦ goal) ≥ λ) propose the MILP with objective min ∑<sub>(s,α)∈M</sub> *σ*(s, α), subject to the conditions

$$\forall (s, \alpha) \in \mathcal{M}. \quad \mathbf{z}(s) \le 1 - \sigma(s, \alpha) + \sum\_{s' \in S} \mathbf{P}(s, \alpha, s') \cdot \mathbf{z}(s') + \mathbf{b}(s, \alpha) \tag{6.1}$$

$$\forall s \in S. \quad \mathbf{z}(s) \le \sum\_{\alpha \in \text{Act}(s)} \sigma(s, \alpha), \quad \mathbf{z}(s\_0) \ge \lambda \tag{6.2}$$

where *σ*(s, α) are binary integer variables. This MILP was implemented in the tool ltlsubsys. The idea is to directly encode a scheduler in the system of inequalities **Az** ≤ **b** using *σ*. In [77, 78] a number of additional redundant constraints are given to guide the search. In contrast to [77, 78], we do not need to handle so-called *problematic states*, as our precondition **Pr**<sup>min</sup><sub>s</sub>(♦(goal ∨ fail)) > 0 guarantees that no such states exist.
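To make the role of the binary variables in (6.1) and (6.2) concrete, the following checker evaluates both conditions for given values of **z** and *σ*. It is a hypothetical helper, not part of ltlsubsys, and it assumes b(s,α) = P(s,α,goal) and 0 ≤ **z** ≤ 1. If *σ*(s,α) = 0, the right-hand side of (6.1) is at least 1, so the row is effectively switched off. Condition (6.2) forces z(s) = 0 for states without a selected action.

```python
from fractions import Fraction as F
from typing import Dict, Tuple

Pair = Tuple[str, str]
MDP = Dict[Pair, Dict[str, F]]  # hypothetical encoding P[(s, a)][t]


def satisfies_ltlsubsys(P: MDP, s0: str, z: Dict[str, F],
                        sigma: Dict[Pair, int], lam: F) -> bool:
    """Check (6.1) and (6.2); b(s, a) is ASSUMED to be P(s, a, goal)."""
    for (s, a), dist in P.items():
        reach = sum((p * z[t] for t, p in dist.items() if t in z), F(0))
        # With sigma = 0 and 0 <= z <= 1 the right-hand side is >= 1 >= z(s).
        if z[s] > 1 - sigma[(s, a)] + reach + dist.get("goal", F(0)):
            return False
    for s in z:
        # (6.2): a state can only carry value if one of its actions is selected.
        if z[s] > sum(v for (t, _), v in sigma.items() if t == s):
            return False
    return z[s0] >= lam


def objective(sigma: Dict[Pair, int]) -> int:
    """Number of selected state-action pairs."""
    return sum(sigma.values())
```

In the test MDP, s0 can either take a, which reaches goal with probability 1/2, or b, which leads to s1 and then surely to goal. Selecting b and c certifies λ = 1 at cost 2, whereas λ = 1/2 is already certified by selecting only a.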

*k***-step quotient sum (QS***k***) heuristics.** Approximating the maximal number of zeros in a polytope is computationally hard in general [8]. We now derive a heuristic approach for this problem, called the *quotient sum heuristic*, which is based on iteratively solving LPs over the polytope, where the objective function of each iteration depends on an optimal solution of the previous LP. More precisely, we take **o**<sub>1</sub> = (1, ..., 1) and an optimal solution QS<sub>1</sub> of the LP min **o**<sub>1</sub> · **y** s.t. **y** ∈ 𝒫<sup>max</sup>(λ). Many entries of QS<sub>1</sub> may be small but still greater than zero. In order to push as many of these small values to zero as possible, we define a new objective function by

$$\mathbf{o}\_2(i) = \begin{cases} 1/\text{QS}\_1(i), & \text{if } \text{QS}\_1(i) > 0 \\ C, & \text{if } \text{QS}\_1(i) = 0 \end{cases} \tag{6.3}$$

where C is a value that is greater than any value 1/QS<sub>1</sub>(i). We now take a solution QS<sub>2</sub> of the new LP min **o**<sub>2</sub> · **y** s.t. **y** ∈ 𝒫<sup>max</sup>(λ) and form the next objective function **o**<sub>3</sub> as in (6.3). Inductively, this generates a sequence of objective functions (**o**<sub>k</sub>)<sub>k≥1</sub> and corresponding optimal solutions (QS<sub>k</sub>)<sub>k≥1</sub> in 𝒫<sup>max</sup>(λ) (the same construction applies to 𝒫<sup>min</sup><sub>≥0</sub>(λ)). By Theorem 5.4 we can construct a witnessing subsystem with as many states as the number of non-zero entries in QS<sub>k</sub>.
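The QS<sub>k</sub> iteration only needs an LP oracle and the update rule (6.3). In the sketch below, `solve_lp` is a hypothetical callback that returns an optimal point of min **o** · **y** over the chosen Farkas polytope; in practice it would wrap an LP solver. The choice C = 1 + max<sub>i</sub> 1/QS(i) is our own assumption, since (6.3) only requires C to exceed every 1/QS(i). For testing, `vertex_oracle` minimizes over an explicitly given list of vertices. This is valid because a linear objective attains its minimum over a polytope at a vertex.

```python
from typing import Callable, List, Sequence

Point = List[float]


def next_objective(qs: Sequence[float]) -> Point:
    """Update (6.3): 1/QS(i) on the support, C elsewhere (C = 1 + max 1/QS(i) is assumed)."""
    inv = [1.0 / v for v in qs if v > 0]
    C = 1.0 + max(inv, default=0.0)
    return [1.0 / v if v > 0 else C for v in qs]


def quotient_sum(solve_lp: Callable[[Point], Point], n: int, k: int) -> Point:
    """QS_1 minimises (1,...,1).y; QS_{j+1} minimises o_{j+1}.y with o_{j+1} from QS_j."""
    objective = [1.0] * n
    qs = solve_lp(objective)
    for _ in range(k - 1):
        objective = next_objective(qs)
        qs = solve_lp(objective)
    return qs


def support_size(qs: Sequence[float], eps: float = 0.0) -> int:
    """Number of entries treated as non-zero."""
    return sum(1 for v in qs if v > eps)


def vertex_oracle(vertices: Sequence[Point]) -> Callable[[Point], Point]:
    """Toy LP oracle for a tiny polytope given by its vertex list."""
    return lambda o: list(min(vertices, key=lambda y: sum(a * b for a, b in zip(o, y))))
```

With floating-point LP solutions one would typically treat entries below a small tolerance as zero, which is what the `eps` parameter of `support_size` is for.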

## **7 Experiments**

In this section we evaluate our MILP formulations and heuristics on a number of DTMC and MDP benchmarks from the Prism benchmark suite [58, 59]. We compare our results with the tool Comics [50], which implements heuristic approaches to compute small subsystems for DTMCs. It has two modes: the *local search* extends a given subsystem by short paths that carry high probability, whereas the *global search* searches for the next most probable path from the initial state to goal and adds it to the subsystem. Both approaches iteratively extend a subsystem until it carries more probability than the given threshold, and thus have to compute the probability of the subsystem at each iteration.

All computations were performed on a computer with two 8-core Intel E5-2680 processors at 2.70 GHz running Linux, with a time bound of 30 minutes, a memory bound of 100 GB, and with each benchmark instance having access to 4 cores. For the LP and MILP instances we use the Gurobi solver, version 8.1.1 [43]. The recorded times of our computations include the construction of the LPs/MILPs and are wall clock times. Pre-processing steps, such as collapsing states that cannot reach goal, are not included in the reported times. For Comics, we use the time that the tool reports as counterexample generation time.

To validate our implementation, we used Prism to verify that the subsystems we compute indeed satisfy the probability thresholds. For a few instances (< 0.5%), Prism reported a deviation of less than 10<sup>−8</sup>, which can be explained by the fact that both Prism and the solvers we use rely on floating-point arithmetic, which is inexact by nature.

Our implementation, together with the models we use and benchmark results can be found at https://github.com/simonjantsch/farkas.

**Fig. 2:** crowds-2-8: comparing QS<sub>k</sub> for growing k.


**DTMC benchmarks.** As **Pr**<sup>max</sup> and **Pr**<sup>min</sup> coincide on DTMCs, we can use the heuristics and exact computations derived from either the 𝒫<sup>max</sup> or the 𝒫<sup>min</sup><sub>≥0</sub> polytope for DTMCs (in Comics we use the standard query Pr<sub>s₀</sub>(♦ goal) ≥ λ). We consider two DTMC benchmarks: a model of the crowds-N-K protocol [69, 72] for ensuring anonymous web browsing (with N members and K protocol runs) and a model of the bounded retransmission protocol [34, 47] for file transfers (where brp-N-K is the instance with N chunks and K retransmissions).

**(b)** brp-512-2 (15,875 states). Comics-local reports an error for λ ≥ 2.1 · 10<sup>−5</sup> and Comics-global runs out of memory for λ = 2.6 · 10<sup>−5</sup>.

**Fig. 3:** Comparison of heuristic methods on DTMC benchmarks.

Figure 2 shows the effect of increasing the number of iterations of the QS-heuristic for the model crowds-2-8. While the first additional iteration (taking QS<sub>2</sub> instead of QS<sub>1</sub>) has an impact on the number of states, further iterations do not improve the result significantly. For QS<sub>1</sub>, the sizes of subsystems increase monotonically with growing λ. Starting with QS<sub>2</sub>, the results may, interestingly, have "spikes": increasing λ can lead to smaller subsystems.

Figure 3 shows the results of the QS<sub>2</sub>-heuristic compared to the two modes of Comics for values of λ ranging between 0 and the actual reachability probability of the model. A general observation is that the runtime of the QS-heuristic is independent of λ, whereas both modes of Comics use significantly more time with increasing λ. The same holds for memory consumption, which stayed below 200 MB for our heuristics. Moreover, especially for crowds-5-8, relatively small subsystems are possible even for large λ. The exact computations via MILPs hit the timeout for almost all instances.

**Fig. 4:** MDP benchmark: consensus-2-4 (528 states)

Figure 3 shows that the QS heuristics derived from the two polytopes 𝒫<sup>max</sup> and 𝒫<sup>min</sup><sub>≥0</sub> may produce different results. However, for both models one of them gives monotonically growing subsystems and outperforms Comics. While QS<sub>2</sub> applied to 𝒫<sup>min</sup><sub>≥0</sub> performs better on crowds-5-8 (Figure 3a), it is the other way around on brp-512-2 (Figure 3b). In future work we intend to investigate which properties determine which of the two formulations performs better for a given DTMC.

**MDP benchmarks.** We consider two MDP models: the randomized consensus-N-K protocol of [10, 60] (with N processes and a bound K on the random walk) and the CSMA-N-K protocol for data channels [61] (where N is the number of stations and K is the maximal backoff count). The results of both heuristic and exact computations can be seen in Figure 4 and Figure 5. Whereas the heuristics all needed less than 5 minutes, all MILP instances ran into the timeout except for the ones in Figure 4a. Whenever a MILP instance could not be solved optimally within 30 minutes, we plot both the upper and the lower bound found, with the region in between shaded. Note that the condition **Pr**<sup>min</sup>(♦(goal ∨ fail)) > 0 holds for the instances of these models and the reachability properties that we consider.

The comparison between the MILP formulation that we derived from 𝒫<sup>max</sup>(λ) and the one presented in [77, 78] (labeled ltlsubsys, see also Section 6) shows that both compute comparable upper and lower bounds in Figure 4b, whereas ltlsubsys found worse upper bounds in Figure 5b. In all instances apart from Figure 4b, the corresponding QS<sub>2</sub> heuristic performs well and generates subsystems that are as good as, or better than, the best upper bounds computed by the MILPs in 30 minutes. As expected, the witnessing subsystems for **Pr**<sup>min</sup><sub>s₀</sub>(♦ goal) ≥ λ tend to the entire state space as λ tends to the actual value **Pr**<sup>min</sup><sub>s₀</sub>(♦ goal) (which is 1 in these two models). However, subsystems for **Pr**<sup>max</sup><sub>s₀</sub>(♦ goal) ≥ λ may be substantially smaller even for large λ.

**Fig. 5:** MDP benchmark: CSMA-3-2 (36,850 states)

## **8 Conclusion**

In this paper we brought together two a priori unrelated notions in the context of probabilistic reachability constraints: on the one hand Farkas certificates, which are vectors satisfying certain linear inequalities that we derive using MDP-specific variants of Farkas' Lemma, and on the other hand witnessing subsystems, which provide insight into which parts of the system are essential for the satisfaction of the considered property. This connection reduces the computation of minimal (respectively, small) witnessing subsystems to finding a Farkas certificate with a maximal (respectively, large) number of zeros. Furthermore, it leads to a unified notion of witnessing subsystem for **Pr**<sup>max</sup><sub>s₀</sub>(♦ goal) ≥ λ and **Pr**<sup>min</sup><sub>s₀</sub>(♦ goal) ≥ λ.

We showed that the decision version of computing minimal witnessing subsystems is NP-complete for acyclic DTMCs and introduced heuristics for the computation of small witnesses based on Farkas certificates. Experiments with the heuristics showed competitive results compared to the approach implemented in Comics and indicated that they scale well with the system size and threshold. As expected, computing minimal subsystems using the derived MILP formulations took significantly more time than the heuristics and often triggered timeouts. The upper and lower bounds computed within the given time by the new MILP formulation for **Pr**<sup>max</sup><sub>s₀</sub>(♦ goal) ≥ λ were comparable to those of known techniques.

We have considered MDPs in which the probability to reach goal or fail is positive under each scheduler. In future work, we plan to extend our techniques to weaken this assumption. Exploring how vertex enumeration techniques could be adapted to the MDP-specific form of the Farkas polytopes is another interesting line of future work. We also plan to implement a tool for working with Farkas certificates in practice, which encompasses their generation as well as their independent validation.

## **References**


tions of Computer Science. pp. 338–345. SFCS '88, IEEE Computer Society (1988), https://doi.org/10.1109/SFCS.1988.21950


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

**TACAS Evaluation Artifact 2020 Accepted**

## Simple Strategies in Multi-Objective MDPs<sup>⋆</sup>

Florent Delgrange<sup>1,2,†</sup>, Joost-Pieter Katoen<sup>1</sup>, Tim Quatmann<sup>1</sup>, and Mickael Randour<sup>2</sup>

<sup>1</sup> RWTH Aachen University, Aachen, Germany

<sup>2</sup> UMONS – Université de Mons, Mons, Belgium

Abstract We consider the verification of multiple expected reward objectives at once on Markov decision processes (MDPs). This enables a trade-off analysis among multiple objectives by obtaining a Pareto front. We focus on strategies that are easy to employ and implement. That is, strategies that are pure (no randomization) and have bounded memory. We show that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and we provide an MILP encoding to solve the corresponding problem. The bounded memory case is treated by a product construction. Experimental results using Storm and Gurobi show the feasibility of our algorithms.

## 1 Introduction

MDPs. Markov decision processes (MDPs) [4,3] are a key model in stochastic decision making. The classical setting involves a system subject to a stochastic model of its environment, and the goal is to synthesize a system controller, represented as a strategy for the MDP, ensuring a given level of expected performance. Tools such as Prism [30] and Storm [16] support MDP model checking.

Multi-objective MDPs. MDPs where the goal is to achieve a combination of objectives (rather than just one) are popular in, e.g., AI [41] and verification [2]. This is driven by applications where controllers have to fulfill multiple, potentially conflicting objectives, requiring a trade-off analysis. This includes multi-dimension MDPs [14,20,40,13], where weight vectors are aggregated at each step, and MDPs where the specification mixes different views (e.g., average and worst-case performance) of the same weight [11,8]. With multiple objectives, optimal strategies no longer exist in general; instead, Pareto-optimal strategies are considered. The Pareto front, i.e., the set of non-dominated achievable value vectors, is usually non-trivial. Elaborate techniques are needed to explore it efficiently, e.g., [23,24].

Simple strategies. Another stumbling block in multi-objective MDPs is the complexity of strategies: Pareto-optimal strategies typically need both memory and randomization. A simple conjunction of reachability objectives already requires randomization and exponential memory (in the number of reachability sets) [40].

<sup>†</sup> currently affiliated with Vrije Universiteit Brussel.

<sup>⋆</sup> Research partially supported by F.R.S.-FNRS Grant n° F.4520.18 (ManySynth). Mickael Randour is an F.R.S.-FNRS Research Associate.

Some complex objectives even need infinite memory, e.g., [11,8]. In controller synthesis, strategies requiring randomization and/or (much) memory may not be practical. Limited-memory strategies are required on devices with limited resources [7]. Randomization is elegant and powerful from a theoretical point of view, but it has practical limitations, e.g., it limits reproducibility, which complicates debugging. Randomized strategies are also often despised for medical applications [33] and product design: all products should have the same design, not a random one. This motivates the analysis of simple strategies, i.e., strategies using no randomization and a limited amount of memory (given as a parameter). While most works study the Pareto front among all strategies, we establish ways to efficiently explore the Pareto front among simple strategies only.

Problem statement. We consider pure (i.e., no randomization) and bounded-memory strategies and study two problems: (a) achievability queries – is it possible to achieve a given value vector – and (b) approximation of the Pareto front. Considering pure, bounded-memory strategies is natural as randomization can be traded for memory [12]: without randomization, optimal strategies may require arbitrarily large memory (see Ex. 4). We study mixtures of expected (accumulated) reward objectives, covering various studied settings like reachability [20,40], shortest path [39,40,28,9], and total reward objectives [23,24].

Contributions. We first consider the achievability problem for pure stationary (i.e., memoryless) strategies and show that finding optimal strategies for multi-objective MDPs is NP-complete, even for two objectives. This contrasts with the case of general strategies, where the problem is polynomial-time if the number of objectives is fixed [40]. We provide a mixed integer linear program (MILP) encoding. The crux lies in dealing with end components. The MILP is polynomial in the input MDP and the number of objectives. Inspired by [22], we give an alternative MILP encoding which is better suited for total reward objectives. To approximate the Pareto front under pure stationary strategies, we solve multiple MILP queries. This iteratively divides the solution space into achievable and non-achievable regions. Bounded-memory strategies are treated via a product construction. Our approach works for finite and infinite expected rewards.

Practical evaluation. We successfully compute Pareto fronts for 13 benchmarks using our implementation in Storm, exploiting the MILP solver Gurobi. Despite the hard nature of the problem, our experiments show that Pareto fronts for models with tens of thousands of states can be successfully approximated.

Related work. NP-completeness for discounted rewards under pure strategies was shown in [14]. [19] claims that this generalizes to PCTL objectives, but no proof is given. [42] treats multi-objective bounded MDPs whose transition probabilities are intervals. A set of Pareto-optimal policies is computed using policy iteration, and an efficient heuristic is exploited to compute a set of mutually non-dominated policies that are likely to be Pareto-optimal. Pure stationary Pareto-optimal strategies for discounted rewards are obtained in [44] using value iteration, but this approach is restricted to small MDPs where all probabilities are 0 or 1. In [34], Tchebycheff-optimal strategies for discounted rewards are obtained via an LP approach; such strategies minimize the distance to a reference point and are not always pure.

## 2 Preliminaries

For a finite set Ω, let Dist(Ω) = {μ: Ω → [0, 1] | ∑<sub>ω∈Ω</sub> μ(ω) = 1} be the set of probability distributions over Ω with support supp(μ) = {ω ∈ Ω | μ(ω) > 0}. We write ℝ<sub>≥0</sub> = {|x| | x ∈ ℝ} and ℝ<sup>∞</sup> = ℝ ∪ {∞} for the non-negative and extended real numbers, respectively. **1**<sup>ℓ</sup> = ⟨1, ..., 1⟩ denotes the vector of size ℓ ∈ ℕ with all entries 1. We just write **1** if ℓ is clear. Let *p*<sub>i</sub> denote the i-th entry and *p* · *p*′ the dot product of *p*, *p*′ ∈ (ℝ<sup>∞</sup>)<sup>ℓ</sup>. *p* ≤ *p*′, *p* + *p*′, and |*p*| are entry-wise. For a Boolean expression cond, let [cond] = 1 if cond is true and [cond] = 0 otherwise.

#### 2.1 Markov Decision Processes, Strategies, and End Components

Definition 1 (Markov decision process [36]). A Markov decision process (MDP) is a tuple M = ⟨S, Act, **P**, s<sub>I</sub>⟩ with finite set of states S, initial state s<sub>I</sub> ∈ S, finite set of actions Act, and transition function **P**: S × Act × S → [0, 1] with ∑<sub>s′∈S</sub> **P**(s, α, s′) ∈ {0, 1} for all s ∈ S and α ∈ Act.

We fix an MDP M = ⟨S, Act, **P**, s<sub>I</sub>⟩. Intuitively, **P**(s, α, s′) is the probability to take a transition from s to s′ when choosing action α. An infinite path in M is a sequence π = s<sub>0</sub>α<sub>1</sub>s<sub>1</sub>α<sub>2</sub>··· ∈ (S × Act)<sup>ω</sup> with **P**(s<sub>i</sub>, α<sub>i+1</sub>, s<sub>i+1</sub>) > 0 for all i ∈ ℕ. We write π[i] = s<sub>i</sub> for the (i+1)-th state visited by π and define the length of π as |π| = ∞. A finite path is a finite prefix π̂ = s<sub>0</sub>α<sub>1</sub>...α<sub>n</sub>s<sub>n</sub> of an infinite path π, where last(π̂) = s<sub>n</sub> ∈ S, |π̂| = n, and π̂[i] = s<sub>i</sub> for i ≤ n. The set of finite (infinite) paths in M is denoted by Paths<sup>M</sup><sub>fin</sub> (Paths<sup>M</sup><sub>inf</sub>). The enabled actions at a state s ∈ S are given by the set Act(s) = {α ∈ Act | ∃ s′ ∈ S: **P**(s, α, s′) > 0}. We assume Act(s) ≠ ∅ for all s. If |Act(s)| = 1 for all s ∈ S, M is called a Markov chain (MC). We write M<sub>s</sub> for the MDP obtained by replacing the initial state of M by s ∈ S. For s ∈ S and α ∈ Act, we define the set of successor states succ(s, α) = {s′ | **P**(s, α, s′) > 0}. For s′ ∈ S, the set of predecessor state-action pairs is given by pre(s′) = {⟨s, α⟩ | **P**(s, α, s′) > 0}.
For a set E ⊆ S × Act, we define S|<sub>E</sub> = {s ∈ S | ∃ α: ⟨s, α⟩ ∈ E}, Act|<sub>E</sub> = {α ∈ Act | ∃ s: ⟨s, α⟩ ∈ E}, and **P**|<sub>E</sub>(s, α, s′) = [⟨s, α⟩ ∈ E] · [s′ ∈ S|<sub>E</sub>] · **P**(s, α, s′). We say E is closed for M if ∀ ⟨s, α⟩ ∈ E: α ∈ Act(s) and succ(s, α) ⊆ S|<sub>E</sub>.
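The notions of Definition 1 and of closed sets map onto simple set computations. The sketch uses a hypothetical dictionary encoding `P[(s, a)] = {t: prob}` and our own function names.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]
Trans = Dict[Pair, Dict[str, float]]  # hypothetical encoding P[(s, a)][t]


def enabled(P: Trans, s: str) -> Set[str]:
    """Act(s): actions with some successor of positive probability."""
    return {a for (t, a), dist in P.items() if t == s and any(p > 0 for p in dist.values())}


def succ(P: Trans, s: str, a: str) -> Set[str]:
    return {t for t, p in P.get((s, a), {}).items() if p > 0}


def pre(P: Trans, t: str) -> Set[Pair]:
    return {sa for sa, dist in P.items() if dist.get(t, 0) > 0}


def states_of(E: Set[Pair]) -> Set[str]:
    """S|_E: states that occur in some pair of E."""
    return {s for s, _ in E}


def is_closed(P: Trans, E: Set[Pair]) -> bool:
    """E is closed if every pair is enabled and its successors stay inside S|_E."""
    S_E = states_of(E)
    return all(a in enabled(P, s) and succ(P, s, a) <= S_E for s, a in E)


def is_markov_chain(P: Trans, states: Set[str]) -> bool:
    return all(len(enabled(P, s)) == 1 for s in states)
```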

Definition 2 (Sub-MDP). The sub-MDP of M, closed E ⊆ S × Act, and s ∈ S|<sub>E</sub> is given by M|<sub>⟨E,s⟩</sub> = ⟨S|<sub>E</sub>, Act|<sub>E</sub>, **P**|<sub>E</sub>, s⟩. We also write M|<sub>E</sub> for the sub-MDP M|<sub>⟨E,s⟩</sub> with an arbitrary state s ∈ S|<sub>E</sub>.

Definition 3 (End Component). A non-empty set E ⊆ S × Act is an end component (EC) of M if E is closed for M and for each pair of states s, s′ ∈ S|<sub>E</sub> there is a finite path π̂ ∈ Paths<sup>M|E</sup><sub>fin</sub> with π̂[0] = s and last(π̂) = s′. An EC E is maximal if there is no other EC E′ with E ⊊ E′. The set of all maximal end components of M is MECS(M).
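Maximal end components can be computed with the well-known iterative refinement. First compute the SCCs of the graph induced by the remaining state-action pairs. Then drop every pair that may leave the SCC of its state, and repeat until nothing changes. The SCCs that keep at least one pair are exactly the MECs. The sketch below is this textbook procedure, using Tarjan's SCC algorithm under a hypothetical encoding. It is not necessarily how Storm implements it.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Pair = Tuple[str, str]
Trans = Dict[Pair, Dict[str, float]]  # hypothetical encoding P[(s, a)][t]


def sccs(nodes: Iterable[str], graph: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Tarjan's algorithm (recursive, adequate for small examples)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    result: List[FrozenSet[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            result.append(frozenset(comp))

    for v in nodes:
        if v not in index:
            visit(v)
    return result


def maximal_end_components(P: Trans) -> List[FrozenSet[Pair]]:
    """Iteratively remove state-action pairs that can leave the SCC of their state."""
    E = {sa for sa, dist in P.items() if any(p > 0 for p in dist.values())}
    while True:
        states = {s for s, _ in E}
        graph: Dict[str, Set[str]] = {s: set() for s in states}
        for s, a in E:
            graph[s].update(t for t, p in P[(s, a)].items() if p > 0 and t in states)
        comp_of = {s: c for c in sccs(states, graph) for s in c}
        kept = {(s, a) for s, a in E
                if all(comp_of.get(t) == comp_of[s] for t, p in P[(s, a)].items() if p > 0)}
        if kept == E:
            break
        E = kept
    groups: Dict[FrozenSet[str], Set[Pair]] = {}
    for s, a in E:
        groups.setdefault(comp_of[s], set()).add((s, a))
    return [frozenset(g) for g in groups.values()]
```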

The maximal ECs of a Markov chain are also called bottom strongly connected components (BSCCs). A strategy resolves nondeterminism in MDPs:

Definition 4 (Strategy). A (general) strategy for MDP M is a function σ: Paths<sup>M</sup><sub>fin</sub> → Dist(Act) with supp(σ(π̂)) ⊆ Act(last(π̂)) for all π̂ ∈ Paths<sup>M</sup><sub>fin</sub>.

Let σ be a strategy for M. Intuitively, σ(π̂)(α) is the probability to perform action α after observing history π̂ ∈ Paths<sup>M</sup><sub>fin</sub>. A strategy is pure if all histories are mapped to Dirac distributions, i.e., the support is a singleton. A strategy is stationary if its decisions only depend on the current state, i.e., ∀ π̂, π̂′ ∈ Paths<sup>M</sup><sub>fin</sub>: last(π̂) = last(π̂′) implies σ(π̂) = σ(π̂′). We often assume σ: S → Dist(Act) for stationary and σ: S → Act for pure stationary strategies σ. Let Σ<sup>M</sup> and Σ<sup>M</sup><sub>PS</sub> be the sets of general and pure stationary strategies, respectively. A set of paths Π ⊆ Paths<sup>M</sup><sub>inf</sub> is compliant with σ ∈ Σ<sup>M</sup> if all π = s<sub>0</sub>α<sub>1</sub>s<sub>1</sub>··· ∈ Π and all prefixes π̂ of π satisfy σ(π̂)(α<sub>|π̂|+1</sub>) > 0. The induced Markov chain of M and σ ∈ Σ<sup>M</sup><sub>PS</sub> is given by M<sup>σ</sup> = M|<sub>⟨E<sub>σ</sub>, s<sub>I</sub>⟩</sub> with E<sub>σ</sub> = {⟨s, σ(s)⟩ | s ∈ S}.

MDP M and strategy σ ∈ Σ<sup>M</sup> induce a probability measure Pr<sup>M</sup><sub>σ</sub> on subsets Π ⊆ Paths<sup>M</sup><sub>inf</sub> given by a standard cylinder set construction [4,22]. The expected value of X: Paths<sup>M</sup><sub>inf</sub> → ℝ<sup>∞</sup> is E<sup>M</sup><sub>σ</sub>(X) = ∫<sub>π</sub> X(π) dPr<sup>M</sup><sub>σ</sub>({π}). For σ ∈ Σ<sup>M</sup><sub>PS</sub>, Pr<sup>M</sup><sub>σ</sub> and E<sup>M</sup><sub>σ</sub> coincide with the corresponding measures on the MC M<sup>σ</sup>.

#### 2.2 Objectives

A reward structure **R**: S × Act × S → ℝ<sub>≥0</sub> assigns non-negative rewards to transitions. We accumulate rewards on (in)finite paths π = s<sub>0</sub>α<sub>1</sub>s<sub>1</sub>α<sub>2</sub>...: **R**(π) = ∑<sup>|π|</sup><sub>i=1</sub> **R**(s<sub>i−1</sub>, α<sub>i</sub>, s<sub>i</sub>). For a set of goal states G ⊆ S, let **R**♦G(π) = **R**(π̂), where π̂ is the smallest prefix of π with last(π̂) ∈ G (or π̂ = π if no such prefix exists). Intuitively, **R**♦G(π) is the reward accumulated on π until a state in G is reached. A (reward) objective has the form E<sup>∼</sup>(**R**♦G) for ∼ ∈ {≥, ≤}. We write M, σ, p ⊨ E<sup>∼</sup>(**R**♦G) iff E<sup>M</sup><sub>σ</sub>(**R**♦G) ∼ p, i.e., for M and σ, the expected accumulated reward until reaching G is at least (or at most) p ∈ ℝ<sup>∞</sup>. We call the objective maximizing if ∼ = ≥ and minimizing otherwise. If G = ∅ (i.e., **R**♦G(π) = **R**(π) for all paths π), we call the objective a total reward objective. Let the reward structure **R**<sub>G</sub> be given by **R**<sub>G</sub>(s, α, s′) = [s′ ∈ G]. Then, Pr<sup>M</sup><sub>σ</sub>(♦G) = E<sup>M</sup><sub>σ</sub>(**R**<sub>G</sub>♦G) for every σ ∈ Σ<sup>M</sup>, where ♦G ⊆ Paths<sup>M</sup><sub>inf</sub> denotes the set of paths that visit a state in G. We use P<sup>∼</sup>(♦G) as shorthand for E<sup>∼</sup>(**R**<sub>G</sub>♦G) and call such an objective a reachability objective.
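For a pure stationary strategy σ, E<sup>M</sup><sub>σ</sub>(**R**♦G) is a quantity of the induced Markov chain M<sup>σ</sup>. It can therefore be approximated by value iteration on v(s) = Σ<sub>t</sub> **P**(s, σ(s), t)·(**R**(s, σ(s), t) + v(t)) for s ∉ G and v(s) = 0 for s ∈ G. The sketch below is a minimal illustration under our own encoding. Starting from 0, the iterates increase towards the true value, but an infinite expected reward is not detected; the loop just stops after `iterations` rounds. Using **R**<sub>G</sub> yields reachability probabilities.

```python
from typing import Callable, Dict, Set, Tuple

Trans = Dict[Tuple[str, str], Dict[str, float]]  # hypothetical encoding P[(s, a)][t]
Reward = Callable[[str, str, str], float]


def expected_reward_until(P: Trans, sigma: Dict[str, str], R: Reward, G: Set[str],
                          iterations: int = 10_000, tol: float = 1e-12) -> Dict[str, float]:
    """Value iteration for E(R <>G) in the Markov chain induced by sigma.
    Iterates grow monotonically from 0; infinite values are NOT detected."""
    v = {s: 0.0 for s in sigma}
    for _ in range(iterations):
        new = {}
        for s, a in sigma.items():
            if s in G:
                new[s] = 0.0  # accumulation stops once G is reached
            else:
                new[s] = sum(p * (R(s, a, t) + v[t]) for t, p in P[(s, a)].items())
        done = max(abs(new[s] - v[s]) for s in v) <= tol
        v = new
        if done:
            break
    return v


def reach_reward(G: Set[str]) -> Reward:
    """R_G(s, a, t) = [t in G]."""
    return lambda s, a, t: 1.0 if t in G else 0.0
```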

Definition 5 (Multi-objective query). For MDP $\mathcal{M}$, an $\ell$-dimensional multi-objective query is a tuple $\mathcal{Q} = \langle \psi\_1, \dots, \psi\_\ell \rangle$ of objectives $\psi\_j = \mathbb{E}\_{\sim\_j}(\mathbf{R}\_j\Diamond G\_j)$.

Each objective $\psi\_j$ considers a different reward structure $\mathbf{R}\_j$. The MDP $\mathcal{M}$, strategy $\sigma$, and point $\mathbf{p} \in (\mathbb{R}^{\infty})^{\ell}$ satisfy a multi-objective query $\mathcal{Q} = \langle \psi\_1, \dots, \psi\_\ell \rangle$ (written $\mathcal{M}, \sigma, \mathbf{p} \models \mathcal{Q}$) iff $\forall j\colon \mathcal{M}, \sigma, \mathbf{p}[j] \models \psi\_j$. Then, we also say $\sigma$ achieves $\mathbf{p}$ and call $\mathbf{p}$ achievable. Let $\mathit{Ach}^{\mathcal{M}}(\mathcal{Q})$ ($\mathit{Ach}^{\mathcal{M}}\_{\mathit{PS}}(\mathcal{Q})$) denote the set of points achieved by a general (pure stationary) strategy. The closure of a set $P \subseteq (\mathbb{R}^{\infty})^{\ell}$ with respect to query $\mathcal{Q}$ is $\mathit{cl}\_{\mathcal{Q}}(P) = \{\mathbf{p}' \in (\mathbb{R}^{\infty})^{\ell} \mid \exists\, \mathbf{p} \in P\colon \forall j\colon \mathbf{p}[j] \sim\_j \mathbf{p}'[j]\}$. For $\mathbf{p}, \mathbf{p}' \in (\mathbb{R}^{\infty})^{\ell}$, we say that $\mathbf{p}$ dominates $\mathbf{p}'$ if $\mathbf{p}' \in \mathit{cl}\_{\mathcal{Q}}(\{\mathbf{p}\})$. In this case, $\mathcal{M}, \sigma, \mathbf{p} \models \mathcal{Q}$ implies $\mathcal{M}, \sigma, \mathbf{p}' \models \mathcal{Q}$ for any $\sigma \in \Sigma^{\mathcal{M}}$. We are interested in the Pareto front, which is the set of non-dominated achievable points.

Figure 1: An MDP and a plot of the pure stationary and general Pareto fronts.

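Definition 6 (Pareto front). The (general) Pareto front for $\mathcal{M}$ and $\mathcal{Q}$ is $\mathit{Pareto}^{\mathcal{M}}(\mathcal{Q}) = \{\mathbf{p} \in \mathit{Ach}^{\mathcal{M}}(\mathcal{Q}) \mid \forall\, \mathbf{p}' \in \mathit{Ach}^{\mathcal{M}}(\mathcal{Q})\colon \mathbf{p} \in \mathit{cl}\_{\mathcal{Q}}(\{\mathbf{p}'\}) \implies \mathbf{p} = \mathbf{p}'\}$.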

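The Pareto front is the smallest set $P \subseteq (\mathbb{R}^{\infty})^{\ell}$ with $\mathit{cl}\_{\mathcal{Q}}(P) = \mathit{Ach}^{\mathcal{M}}(\mathcal{Q})$. Similarly, we define the pure stationary Pareto front $\mathit{Pareto}^{\mathcal{M}}\_{\mathit{PS}}(\mathcal{Q})$, which only considers points in $\mathit{Ach}^{\mathcal{M}}\_{\mathit{PS}}(\mathcal{Q})$.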

Example 1. Let $\mathcal{M}$ be the MDP in Fig. 1a and $\mathcal{Q} = \langle \mathbb{P}\_{\geq}(\Diamond G), \mathbb{P}\_{\geq}(\Diamond G') \rangle$. A pure stationary strategy choosing $\beta$ at $s\_1$ reaches $s\_4 \in G$ and $s\_3 \in G'$ each with probability 0.7 and thus achieves $\langle 0.7, 0.7 \rangle$. Similarly, $\langle 0, 1 \rangle$ and $\langle 1, 0 \rangle$ are achievable by pure stationary strategies. The point $\langle 1, 0.8 \rangle$ is achievable by a non-stationary pure strategy that chooses $\alpha$ at $s\_1$, $\gamma$ at the first visit of $s\_2$, and $\delta$ in all other cases. Changing this strategy by picking $\gamma$ only with probability 0.5 achieves $\langle 0.5, 0.9 \rangle$. Fig. 1b illustrates $\mathit{Pareto}^{\mathcal{M}}\_{\mathit{PS}}(\mathcal{Q})$ (dots), $\mathit{Ach}^{\mathcal{M}}\_{\mathit{PS}}(\mathcal{Q})$ (green area), $\mathit{Pareto}^{\mathcal{M}}(\mathcal{Q})$ (dotted line), and $\mathit{Ach}^{\mathcal{M}}(\mathcal{Q})$ (blue and green area).
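For a finite set of points, the closure and dominance definitions translate directly into code. The following minimal Python sketch keeps exactly the non-dominated points of a finite set in quadratic time. The function names `dominates` and `pareto_filter` and the sense strings `">="`/`"<="` (standing for $\sim\_j$) are our own assumptions. The input consists of the three pure stationary points from Example 1 plus a hypothetical dominated point $\langle 0.5, 0.5 \rangle$.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float], senses: Sequence[str]) -> bool:
    """True iff q lies in cl_Q({p}), i.e. p[j] ~_j q[j] for every objective j."""
    for pj, qj, sense in zip(p, q, senses):
        if sense == ">=" and pj < qj:  # maximizing objective
            return False
        if sense == "<=" and pj > qj:  # minimizing objective
            return False
    return True


def pareto_filter(points: List[Point], senses: Sequence[str]) -> List[Point]:
    """Keep the points that are not dominated by any other point of the set."""
    return [p for p in points
            if not any(q != p and dominates(q, p, senses) for q in points)]


# Pure stationary points of Example 1 plus a hypothetical dominated point.
points = [(0.7, 0.7), (0.0, 1.0), (1.0, 0.0), (0.5, 0.5)]
print(pareto_filter(points, [">=", ">="]))  # [(0.7, 0.7), (0.0, 1.0), (1.0, 0.0)]
```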

## 3 Deciding Achievability

The achievability problem asks whether a given point is achievable.

```
General Multi-objective Achievability Problem (GMA)
Input: MDP M, ℓ-dimensional multi-objective query Q, point p ∈ (R∞)^ℓ
Output: Yes iff p ∈ Ach^M(Q) holds
```
For GMA, the point can be achieved by a general strategy that can potentially make use of memory and randomization. As discussed earlier, this class of strategies is not suitable for various applications. In this work, we focus on a variant of the achievability problem that only considers pure stationary strategies. Sect. 5 also addresses pure strategies that can store more information from the history, e.g., whether a goal state set has been reached already.

```
Pure Stationary Multi-objective Achievability Problem (PSMA)
Input: MDP M, ℓ-dimensional multi-objective query Q, point p ∈ (R∞)^ℓ
Output: Yes iff p ∈ Ach^M_PS(Q) holds
```

#### 3.1 Complexity Results

GMA is PSPACE-hard (already with only reachability objectives) [40] and solvable in exponential time [20,23]. To the best of our knowledge, a PSPACE upper bound on the complexity of GMA is unknown. This complexity is rooted in the dimension $\ell$ of the query $\mathcal{Q}$: for fixed $\ell$, the algorithms of [20,23] have polynomial runtime. In contrast, PSMA is NP-complete, even if restricted to two objectives.

Lemma 1. PSMA with only reachability objectives is NP-hard.

Proof. The result follows by a reduction from the subset sum problem. Given $n \in \mathbb{N}$, $\mathbf{a} \in \mathbb{N}^n$, and $z \in \mathbb{N}$, the subset sum problem is to decide the existence of $\mathbf{v} \in \{0, 1\}^n$ such that $\mathbf{v} \cdot \mathbf{a} = z$. This problem is NP-complete [25]. For a given instance of the subset sum problem, we construct the MDP $\mathcal{M} = \langle S, \mathit{Act}, \mathbf{P}, s\_I \rangle$ with state space $S = \{s\_I, s\_1, \dots, s\_n, g\_1, g\_2\}$, actions $\mathit{Act} = \{\alpha, Y, N\}$, and, for all $i \in \{1, \dots, n\}$, $\mathbf{P}(s\_I, \alpha, s\_i) = \frac{\mathbf{a}\_i}{\mathbf{1} \cdot \mathbf{a}}$ and $\mathbf{P}(s\_i, Y, g\_1) = \mathbf{P}(s\_i, N, g\_2) = 1$. States $g\_1$ and $g\_2$ are made absorbing, i.e., $\mathbf{P}(g\_1, \alpha, g\_1) = \mathbf{P}(g\_2, \alpha, g\_2) = 1$.

We claim that the PSMA problem for $\mathcal{M}$, $\mathcal{Q} = \langle \mathbb{P}\_{\geq}(\Diamond \{g\_1\}), \mathbb{P}\_{\geq}(\Diamond \{g\_2\}) \rangle$, and $\mathbf{p} = \langle \frac{z}{\mathbf{1} \cdot \mathbf{a}}, 1 - \frac{z}{\mathbf{1} \cdot \mathbf{a}} \rangle$ answers "yes" iff there is a vector $\mathbf{v}$ satisfying the subset sum problem for $n$, $\mathbf{a}$, and $z$. Consider the bijection $f\colon \Sigma^{\mathcal{M}}\_{\mathit{PS}} \to \{0, 1\}^n$ with $f(\sigma)\_i = [\sigma(s\_i) = Y]$ for all $\sigma \in \Sigma^{\mathcal{M}}\_{\mathit{PS}}$ and $i \in \{1, \dots, n\}$. We get $\mathrm{Pr}^{\mathcal{M}}\_{\sigma}(\Diamond \{g\_1\}) = \sum\_{i=1}^{n} \frac{\mathbf{a}\_i}{\mathbf{1} \cdot \mathbf{a}} \cdot [\sigma(s\_i) = Y] = \frac{f(\sigma) \cdot \mathbf{a}}{\mathbf{1} \cdot \mathbf{a}}$. Moreover, $\mathrm{Pr}^{\mathcal{M}}\_{\sigma}(\Diamond \{g\_2\}) = 1 - \mathrm{Pr}^{\mathcal{M}}\_{\sigma}(\Diamond \{g\_1\}) = 1 - \frac{f(\sigma) \cdot \mathbf{a}}{\mathbf{1} \cdot \mathbf{a}}$. Hence, $\sigma$ achieves $\mathbf{p}$ iff both $f(\sigma) \cdot \mathbf{a} \geq z$ and $f(\sigma) \cdot \mathbf{a} \leq z$ hold, i.e., iff $f(\sigma)$ is a solution to the instance of the subset sum problem. Our construction is inspired by similar ideas from [14,40].
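The reduction can be checked mechanically on small instances. The Python sketch below builds the MDP from the proof using exact fractions, evaluates both reachability probabilities for every pure stationary strategy, and compares the PSMA answer with a brute-force subset sum check. The dictionary representation and all function names are our own assumptions. The sketch enumerates all $2^n$ strategies, so it only serves as a sanity check on toy instances, and it assumes $\mathbf{1} \cdot \mathbf{a} > 0$.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, List, Sequence, Tuple

Dist = Dict[str, Fraction]


def build_mdp(a: List[int]) -> Dict[Tuple[str, str], Dist]:
    """Transition function of the reduction MDP; assumes sum(a) > 0."""
    total = sum(a)
    P = {("sI", "alpha"): {f"s{i}": Fraction(ai, total) for i, ai in enumerate(a, start=1)}}
    for i in range(1, len(a) + 1):
        P[(f"s{i}", "Y")] = {"g1": Fraction(1)}
        P[(f"s{i}", "N")] = {"g2": Fraction(1)}
    P[("g1", "alpha")] = {"g1": Fraction(1)}  # absorbing goal states
    P[("g2", "alpha")] = {"g2": Fraction(1)}
    return P


def reach_probs(P: Dict[Tuple[str, str], Dist], v: Sequence[int]) -> Tuple[Fraction, Fraction]:
    """Pr(<>{g1}) and Pr(<>{g2}) under the strategy that picks Y at s_i iff v[i-1] = 1."""
    p1 = p2 = Fraction(0)
    for state, prob in P[("sI", "alpha")].items():
        action = "Y" if v[int(state[1:]) - 1] else "N"
        if "g1" in P[(state, action)]:
            p1 += prob
        else:
            p2 += prob
    return p1, p2


def psma_yes(a: List[int], z: int) -> bool:
    """Is p = <z/(1.a), 1 - z/(1.a)> achieved by some pure stationary strategy?"""
    P = build_mdp(a)
    target = Fraction(z, sum(a))
    for v in product((0, 1), repeat=len(a)):
        p1, p2 = reach_probs(P, v)
        if p1 >= target and p2 >= 1 - target:
            return True
    return False


def subset_sum_yes(a: List[int], z: int) -> bool:
    return any(sum(ai * vi for ai, vi in zip(a, v)) == z for v in product((0, 1), repeat=len(a)))


a = [3, 5, 7]
print([z for z in range(sum(a) + 2) if psma_yes(a, z)])  # [0, 3, 5, 7, 8, 10, 12, 15]
```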

Lemma 2 ([14]). PSMA with only total reward objectives is NP-hard.

Theorem 1. PSMA is NP-complete.

Proof. Containment follows by guessing a pure stationary strategy and evaluating it on the individual objectives. This can be done in polynomial time [4]. Hardness follows by either Lemma 1 or 2.

Proofs of Lemmas 1 and 2 only consider 2-dimensional multi-objective queries. Hence, in contrast to GMA, the hardness of PSMA is not due to the size of the query.

Corollary 1. PSMA with only two objectives is NP-complete.

#### 3.2 A Mixed Integer Linear Programming Approach

An MDP $\mathcal{M} = \langle S, \mathit{Act}, \mathbf{P}, s\_I \rangle$ has exactly $|\Sigma^{\mathcal{M}}\_{\mathit{PS}}| = \prod\_{s \in S} |\mathit{Act}(s)|$ pure stationary strategies. A simple algorithm for PSMA enumerates all $\sigma \in \Sigma^{\mathcal{M}}\_{\mathit{PS}}$ and checks whether $\mathcal{M}, \sigma, \mathbf{p} \models \mathcal{Q}$ holds. In practice, however, such a brute-force approach is not feasible: for the MDPs that we consider in our experiments in Sect. 6, the number of pure stationary strategies often exceeds $10^{10\,000}$. Instead, our approach is to encode an instance of PSMA as an MILP problem.
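The product formula makes this growth easy to quantify. The minimal Python sketch below computes the number of pure stationary strategies exactly for small MDPs and its decimal logarithm for large ones. The function names are illustrative, and the 40 000-state MDP in the usage lines is hypothetical, not one of the benchmarks of Sect. 6.

```python
import math
from typing import Dict


def num_pure_stationary(num_actions: Dict[str, int]) -> int:
    """|Sigma_PS| = product of |Act(s)| over all states s."""
    return math.prod(num_actions.values())


def log10_num_pure_stationary(num_actions: Dict[str, int]) -> float:
    """log10 of the same count, computed without materializing the huge integer."""
    return sum(math.log10(k) for k in num_actions.values())


print(num_pure_stationary({"s0": 2, "s1": 3, "s2": 1}))  # 6
# Hypothetical MDP with 40 000 states and two actions each: roughly 10^12041 strategies.
big = {f"s{i}": 2 for i in range(40_000)}
print(round(log10_num_pure_stationary(big)))  # 12041
```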


For an MILP instance with constraint matrix $A$, bound vector $\mathbf{b}$, and objective vector $\mathbf{c}$, each of the $n$ rows of the inequation system $A\mathbf{x} \leq \mathbf{b}$ represents a constraint that is linear over the $m$ integer- and real-valued variables given by $\mathbf{x}$. We call the constraints feasible if the inequation system has a solution. The task is to decide whether the constraints are feasible and, if so, to find a solution that maximizes the linear optimization function $\mathbf{c}^{T}\mathbf{x}$. The optimization function can be omitted if we are only interested in feasibility. MILP is NP-complete [35]. However, tools such as Gurobi [27] and SCIP [26] implement practically efficient algorithms that can solve large instances.

For the rest of this section, let $\mathcal{M} = \langle S, \mathit{Act}, \mathbf{P}, s\_I \rangle$, $\mathcal{Q} = \langle \psi\_1, \dots, \psi\_\ell \rangle$ with $\psi\_j = \mathbb{E}\_{\sim\_j}(\mathbf{R}\_j\Diamond G\_j)$, and $\mathbf{p} \in (\mathbb{R}^{\infty})^{\ell}$ be an instance of PSMA. We provide a translation of the PSMA instance to an MILP instance that has a feasible solution iff $\mathbf{p} \in \mathit{Ach}^{\mathcal{M}}\_{\mathit{PS}}(\mathcal{Q})$. The MILP encoding considers integer variables to encode a pure stationary strategy $\sigma \in \Sigma^{\mathcal{M}}\_{\mathit{PS}}$. The other variables and constraints encode the expected reward for each objective on the induced MC $\mathcal{M}^{\sigma}$.

#### 3.3 Unichain MDP and Finite Rewards

Restriction 1 (Unichain MDP). MDP M has exactly one end component.

Restriction 2 (Reward Finiteness). $\mathbb{E}^{\mathcal{M}\_s}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j) < \infty$ holds for each objective $\psi\_j = \mathbb{E}\_{\sim\_j}(\mathbf{R}\_j\Diamond G\_j)$, state $s$, and pure stationary strategy $\sigma$.

For simplicity, we first explain our encoding for unichain MDP with finite reward. Sect. 3.5 lifts Restriction 1 and Sect. 3.6 lifts Restriction 2 with more details given in [17, App. B]. Sect. 3.4 presents an alternative to the encoding of this section, which is smaller but restricted to total reward objectives.

Fig. 2 shows the MILP encoding in case Restrictions 1 and 2 hold. We assume $\mathbf{p}[j] \neq \infty$ for all $j$ since (i) $\mathbb{E}^{\mathcal{M}}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j) \leq \infty$ holds trivially and (ii) $\mathbb{E}^{\mathcal{M}}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j) \geq \infty$ never holds due to Restriction 2. For $j \in \{1, \dots, \ell\}$, let $S^j\_0 = \{s \in S \mid \forall\, \sigma \in \Sigma^{\mathcal{M}}\colon \mathbb{E}^{\mathcal{M}\_s}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j) = 0\}$ and $S^j\_? = \{s \in S \setminus S^j\_0 \mid s \text{ can be reached from } s\_I \text{ without visiting a state in } S^j\_0\}$. These sets can be obtained a priori by analyzing the graph structure of $\mathcal{M}$ [4]. Moreover, we consider upper bounds $U^j\_s \in \mathbb{Q}$ for the expected reward at state $s \in S^j\_?$ such that $U^j\_s \geq \max\_{\sigma \in \Sigma^{\mathcal{M}}} \mathbb{E}^{\mathcal{M}\_s}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j)$. We compute such upper bounds using single-objective model checking techniques [4,5]. The MILP encoding applies the characterization of expected rewards for MCs as a linear equation system [4].
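The graph analysis for $S^j\_0$ and $S^j\_?$ can be sketched as two fixpoint computations. The Python code below is our own illustration, not the implementation referenced by [4]. A state lies outside $S^j\_0$ iff, while avoiding $G\_j$, some strategy can reach a transition that has positive probability and positive reward. $S^j\_?$ is then obtained by a forward search from $s\_I$ that never enters $S^j\_0$. The list-of-tuples representation, the toy MDP, and the function names are assumptions.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical representation: trans[s] = [(action, successor, probability, reward), ...]
Trans = Dict[str, List[Tuple[str, str, float, float]]]


def zero_reward_states(trans: Trans, goal: Set[str]) -> Set[str]:
    """S_0: states from which no strategy collects positive reward before reaching goal."""
    pos: Set[str] = set()  # states that can still collect positive reward
    changed = True
    while changed:  # least fixpoint, computed by backward propagation
        changed = False
        for s, edges in trans.items():
            if s in goal or s in pos:
                continue
            if any(p > 0 and (r > 0 or t in pos) for _, t, p, r in edges):
                pos.add(s)
                changed = True
    return set(trans) - pos


def maybe_states(trans: Trans, init: str, s0: Set[str]) -> Set[str]:
    """S_?: states outside S_0 that are reachable from init without visiting S_0."""
    if init in s0:
        return set()
    seen = {init}
    queue = deque([init])
    while queue:
        s = queue.popleft()
        for _, t, p, _ in trans[s]:
            if p > 0 and t not in s0 and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


trans = {
    "s0": [("a", "s1", 0.5, 0.0), ("a", "g", 0.5, 0.0), ("b", "s2", 1.0, 0.0)],
    "s1": [("c", "g", 1.0, 2.0)],
    "s2": [("d", "s2", 1.0, 0.0)],  # zero-reward trap
    "g": [("e", "g", 1.0, 0.0)],
    "u": [("f", "g", 1.0, 1.0)],  # unreachable from s0
}
S0 = zero_reward_states(trans, {"g"})
print(sorted(S0), sorted(maybe_states(trans, "s0", S0)))  # ['g', 's2'] ['s0', 's1']
```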

Lemma 3. For every $\sigma \in \Sigma^{\mathcal{M}}\_{\mathit{PS}}$, the following equation system has a unique solution $\Phi\colon \{x\_s \mid s \in S\} \to \mathbb{R}$ satisfying $\Phi(x\_s) = \mathbb{E}^{\mathcal{M}\_s}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j)$:

$$\forall s \in S\_0^j \colon x\_s = 0 \qquad \forall s \in S\_?^j \colon x\_s = \sum\_{s' \in S} \mathbf{P}(s, \sigma(s), s') \cdot \left(x\_{s'} + \mathbf{R}\_j(s, \sigma(s), s')\right)$$

For all $s \in S$ (select an action at each state):

$$\forall \alpha \in \mathit{Act}(s)\colon \quad a\_{s,\alpha} \in \{0, 1\} \tag{1}$$

$$\sum\_{\alpha \in \mathit{Act}(s)} a\_{s,\alpha} = 1 \tag{2}$$

For all $j \in \{1, \dots, \ell\}$ (compute the expected reward for each objective):

$$\forall s \in S^j\_0\colon \quad x^j\_s = 0 \tag{3}$$

If $\psi\_j$ is maximizing, $\pm = +$ and $[\min] = 0$. Otherwise, $\pm = -$ and $[\min] = 1$.

$$\forall s \in S^j\_?\colon \quad \pm x^j\_s \in [0, U^j\_s] \tag{4}$$

$$\forall s \in S^j\_?,\ \alpha \in \mathit{Act}(s)\colon \quad \pm x^j\_{s,\alpha} \in [0, U^j\_s] \tag{5}$$

$$\forall s \in S^j\_?,\ \alpha \in \mathit{Act}(s)\colon \quad x^j\_{s,\alpha} \leq \sum\_{s' \in S} \mathbf{P}(s, \alpha, s') \cdot \left(x^j\_{s'} \pm \mathbf{R}\_j(s, \alpha, s')\right) \tag{6}$$

$$\forall s \in S^j\_?,\ \alpha \in \mathit{Act}(s)\colon \quad x^j\_{s,\alpha} \leq U^j\_s \cdot (a\_{s,\alpha} - [\min]) \tag{7}$$

$$\forall s \in S^j\_?\colon \quad x^j\_s \leq \sum\_{\alpha \in \mathit{Act}(s)} x^j\_{s,\alpha} + [\min] \cdot (|\mathit{Act}(s)| - 1) \cdot U^j\_s \tag{8}$$

Assert the value at the initial state:

$$\pm x^j\_{s\_I} \sim\_j \mathbf{p}[j] \tag{9}$$

Figure 2: MILP encoding for unichain MDP and finite rewards.

Proof. Since $\mathcal{M}$ is unichain and we do not collect infinite reward, the only EC of $\mathcal{M}$ (i.e., the only BSCC of $\mathcal{M}^{\sigma}$ for any $\sigma$) either contains a goal state or only contains transitions with zero reward. It follows that $\mathrm{Pr}^{\mathcal{M}}\_{\sigma}(\Diamond S^j\_0) = 1$ for all $\sigma \in \Sigma^{\mathcal{M}}\_{\mathit{PS}}$. Lemma 3 follows by standard arguments for MCs with rewards [4, Section 10.5.1].
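For a fixed pure stationary strategy, the equation system of Lemma 3 is an ordinary linear system over the induced MC. The sketch below is our own: it uses exact `Fraction` arithmetic and textbook Gauss-Jordan elimination, and its names and representation are assumptions. It solves the system for a three-state toy chain in which $s\_0$ moves to $s\_1$ with reward 1 or to the goal $g$ with reward 0, each with probability 1/2, and $s\_1$ returns to $s\_0$ with reward 2. By hand, $x\_{s\_0} = \frac{1}{2}(x\_{s\_1} + 1)$ and $x\_{s\_1} = x\_{s\_0} + 2$, so $x\_{s\_0} = 3$ and $x\_{s\_1} = 5$. The pivot search assumes the system is non-singular, as Lemma 3 guarantees under its assumptions.

```python
from fractions import Fraction
from typing import Dict, List, Set, Tuple

# Hypothetical representation of the induced MC: chain[s] = [(successor, probability, reward), ...]
Chain = Dict[str, List[Tuple[str, Fraction, Fraction]]]


def expected_rewards(chain: Chain, zero: Set[str]) -> Dict[str, Fraction]:
    """Solve x_s = 0 for s in zero and x_s = sum_t P(s, t) * (x_t + R(s, t)) otherwise."""
    states = list(chain)
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    rows = []  # augmented matrix [A | b] of the system A x = b
    for s in states:
        row = [Fraction(0)] * (n + 1)
        row[idx[s]] += 1
        if s not in zero:
            for t, p, r in chain[s]:
                row[idx[t]] -= p
                row[n] += p * r
        rows.append(row)
    for c in range(n):  # Gauss-Jordan elimination on a non-singular system
        piv_row = next(i for i in range(c, n) if rows[i][c] != 0)
        rows[c], rows[piv_row] = rows[piv_row], rows[c]
        pivot = rows[c][c]
        rows[c] = [v / pivot for v in rows[c]]
        for i in range(n):
            if i != c and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [vi - f * vc for vi, vc in zip(rows[i], rows[c])]
    return {s: rows[idx[s]][n] for s in states}


F = Fraction
chain = {
    "s0": [("s1", F(1, 2), F(1)), ("g", F(1, 2), F(0))],
    "s1": [("s0", F(1), F(2))],
    "g": [("g", F(1), F(0))],
}
print({s: str(v) for s, v in expected_rewards(chain, {"g"}).items()})  # {'s0': '3', 's1': '5', 'g': '0'}
```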

We discuss the intuition of each constraint in Fig. 2. Let $\Phi\colon \mathit{Var} \to \mathbb{R}$ be an assignment of the occurring variables $\mathit{Var}$ to values. $\Phi$ is a solution of the constraints if all (in)equations are satisfied upon replacing every variable $v$ by $\Phi(v)$.

Lines 1 and 2 encode a strategy $\sigma \in \Sigma^{\mathcal{M}}\_{\mathit{PS}}$ by considering a binary variable $a\_{s,\alpha}$ for each state $s$ and enabled action $\alpha$ such that $\sigma(s)(\alpha) = 1$ iff $\Phi(a\_{s,\alpha}) = 1$ for a solution $\Phi$. Due to Line 2, exactly one action has to be chosen at each state.

Lines 3 to 8 encode for each objective $\psi\_j$ the expected rewards obtained for the encoded strategy $\sigma$. For every $s \in S$, the variable $x^j\_s$ represents a (lower or upper) bound on the expected reward at $s$. Line 3 sets this value for all $s \in S^j\_0$, reflecting the analogous case from Lemma 3. For $s \in S^j\_?$, we distinguish maximizing (${\sim\_j} = {\geq}$) and minimizing (${\sim\_j} = {\leq}$) objectives $\psi\_j$.

For maximizing $\psi\_j$, we have $\Phi(x^j\_s) \leq \mathbb{E}^{\mathcal{M}\_s}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j)$ for every solution $\Phi$. This is achieved by considering a variable $x^j\_{s,\alpha}$ for each enabled action $\alpha \in \mathit{Act}(s)$. In Line 6, we use the equation system characterization from Lemma 3 to assert that the value of $x^j\_{s,\alpha}$ cannot be greater than the expected reward at $s$, given that the encoded strategy $\sigma$ selects $\alpha$. If $\sigma$ does not select $\alpha$ (i.e., $\Phi(a\_{s,\alpha}) = 0$), Line 7 together with Line 5 implies $\Phi(x^j\_{s,\alpha}) = 0$. Otherwise, Line 7 has no effect. Line 8 ensures that every solution satisfies $\Phi(x^j\_s) \leq \Phi(x^j\_{s,\alpha}) \leq \mathbb{E}^{\mathcal{M}\_s}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j)$ for the action $\alpha$ with $\Phi(a\_{s,\alpha}) = 1$.

For minimizing $\psi\_j$, we have $-\Phi(x^j\_s) \geq \mathbb{E}^{\mathcal{M}\_s}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j)$ for every solution $\Phi$, i.e., we consider the negated reward values. The encoding is as for maximizing objectives. However, Lines 5 and 7 yield $\Phi(x^j\_{s,\alpha}) = -U^j\_s$ if $\alpha$ is not selected. Thus, in Line 8 we add $U^j\_s$ for each of the $|\mathit{Act}(s)| - 1$ non-selected actions.

Line 9 and our observations above yield $\mathbb{E}^{\mathcal{M}}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j) \geq \Phi(x^j\_{s\_I}) \geq \mathbf{p}[j]$ for maximizing and $\mathbb{E}^{\mathcal{M}}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j) \leq -\Phi(x^j\_{s\_I}) \leq \mathbf{p}[j]$ for minimizing objectives. Therefore, $\mathbf{p}$ is achievable if a solution $\Phi$ exists. Conversely, if $\mathbf{p}$ is achieved by some $\sigma \in \Sigma^{\mathcal{M}}\_{\mathit{PS}}$, a solution $\Phi$ exists with $\Phi(a\_{s,\alpha}) = \sigma(s)(\alpha)$, $\Phi(x^j\_s) = \Phi(x^j\_{s,\sigma(s)}) = \pm\mathbb{E}^{\mathcal{M}\_s}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j)$, $\Phi(x^j\_{s,\alpha}) = -U^j\_s$ for the non-selected actions $\alpha \neq \sigma(s)$ of minimizing objectives $\psi\_j$, and $\Phi(v) = 0$ for all other $v \in \mathit{Var}$.
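The completeness direction can be illustrated for a single objective. The Python sketch below builds an assignment from a pure stationary strategy $\sigma$ and its expected rewards, then checks constraints (1) to (9) of Fig. 2 with a small tolerance. For minimizing objectives, the unselected variables $x^j\_{s,\alpha}$ are set to $-U^j\_s$, as Lines 5 and 7 require. This is a constraint checker, not an MILP solver. The dictionary-based MDP, the tolerance `eps`, the toy model, and all function names are our own assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP representation: P[(s, act)] = [(successor, probability, reward), ...]
MDP = Dict[Tuple[str, str], List[Tuple[str, float, float]]]


def actions(P: MDP, s: str) -> List[str]:
    """Enabled actions Act(s)."""
    return [act for (t, act) in P if t == s]


def canonical_assignment(P: MDP, states, sigma, values, U, maximize):
    """Assignment (a, x, xa) induced by sigma and its expected rewards `values`."""
    sgn = 1 if maximize else -1
    a, xa = {}, {}
    for s in states:
        for act in actions(P, s):
            a[s, act] = 1 if sigma[s] == act else 0
            if sigma[s] == act:
                xa[s, act] = sgn * values[s]
            else:  # unselected: 0 when maximizing, -U_s when minimizing
                xa[s, act] = 0.0 if maximize else -U.get(s, 0.0)
    x = {s: sgn * values[s] for s in states}
    return a, x, xa


def check_fig2(P, states, S0, S_maybe, U, init, bound, maximize, a, x, xa, eps=1e-9):
    """True iff (a, x, xa) satisfies constraints (1)-(9) of Fig. 2 for one objective."""
    sgn, mn = (1, 0) if maximize else (-1, 1)
    for s in states:  # (1) and (2): exactly one action per state
        acts = actions(P, s)
        if any(a[s, act] not in (0, 1) for act in acts) or sum(a[s, act] for act in acts) != 1:
            return False
    if any(abs(x[s]) > eps for s in S0):  # (3)
        return False
    for s in S_maybe:
        acts = actions(P, s)
        if not -eps <= sgn * x[s] <= U[s] + eps:  # (4)
            return False
        for act in acts:
            if not -eps <= sgn * xa[s, act] <= U[s] + eps:  # (5)
                return False
            rhs = sum(p * (x[t] + sgn * r) for t, p, r in P[s, act])
            if xa[s, act] > rhs + eps:  # (6)
                return False
            if xa[s, act] > U[s] * (a[s, act] - mn) + eps:  # (7)
                return False
        if x[s] > sum(xa[s, act] for act in acts) + mn * (len(acts) - 1) * U[s] + eps:  # (8)
            return False
    value = sgn * x[init]  # (9)
    return value >= bound - eps if maximize else value <= bound + eps


# Toy MDP: at s0, "a" ends in g with reward 1 and "b" moves to s1; s1 ends in g with reward 3.
P = {
    ("s0", "a"): [("g", 1.0, 1.0)],
    ("s0", "b"): [("s1", 1.0, 0.0)],
    ("s1", "c"): [("g", 1.0, 3.0)],
    ("g", "e"): [("g", 1.0, 0.0)],
}
states, S0, S_maybe, U = ["s0", "s1", "g"], {"g"}, {"s0", "s1"}, {"s0": 3.0, "s1": 3.0}
sigma = {"s0": "b", "s1": "c", "g": "e"}
a, x, xa = canonical_assignment(P, states, sigma, {"s0": 3.0, "s1": 3.0, "g": 0.0}, U, True)
print(check_fig2(P, states, S0, S_maybe, U, "s0", 3.0, True, a, x, xa))  # True
print(check_fig2(P, states, S0, S_maybe, U, "s0", 3.5, True, a, x, xa))  # False
```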

Theorem 2. For unichain $\mathcal{M}$ and finite rewards, the constraints in Fig. 2 are feasible iff $\mathbf{p} \in \mathit{Ach}^{\mathcal{M}}\_{\mathit{PS}}(\mathcal{Q})$.

Proposition 1. The MILP encoding above considers $\mathcal{O}(|S| \cdot |\mathit{Act}| \cdot \ell)$ variables.

#### 3.4 Alternative Encoding for Total Rewards

We now consider PSMA instances where all objectives $\psi\_j = \mathbb{E}\_{\sim\_j}(\mathbf{R}\_j\Diamond G\_j)$ are expected total reward objectives, i.e., $G\_j = \emptyset$. For such instances, we can employ an encoding for GMA from [23], restated in Lemma 4. In fact, we can often translate reachability reward objectives to total reward objectives, e.g., if the set of goal states cannot be left or if all objectives consider the same goal states.

Lemma 4 ([23]). For $S\_0 \subseteq S$, let $\Phi\colon \mathit{Var} \to \mathbb{R}\_{\geq 0}$ be an assignment of the variables $\mathit{Var} = \{y\_{s,\alpha} \mid s \in S \setminus S\_0, \alpha \in \mathit{Act}(s)\}$ and let $\sigma\_{\Phi}$ be a stationary strategy satisfying $\sigma\_{\Phi}(s)(\alpha) = \Phi(y\_{s,\alpha}) / \sum\_{\beta \in \mathit{Act}(s)} \Phi(y\_{s,\beta})$ for all $s \in S \setminus S\_0$ and $\alpha \in \mathit{Act}(s)$ for which the denominator is non-zero. Then, $\Phi$ is a solution to the equation system

$$\begin{aligned} \forall s \in S \setminus S\_0 \colon \sum\_{\alpha \in \mathit{Act}(s)} y\_{s,\alpha} &= [s = s\_I] + \sum\_{\langle s',\alpha' \rangle \in \mathit{pre}(s)} \mathbf{P}(s',\alpha',s) \cdot y\_{s',\alpha'}\\\\ 1 &= \sum\_{y\_{s,\alpha} \in \mathit{Var}} y\_{s,\alpha} \cdot \sum\_{s' \in S\_0} \mathbf{P}(s,\alpha,s') \end{aligned}$$

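iff $\mathrm{Pr}^{\mathcal{M}}\_{\sigma\_{\Phi}}(\Diamond S\_0) = 1$ and $\forall\, y\_{s,\alpha} \in \mathit{Var}\colon \Phi(y\_{s,\alpha}) = \mathbb{E}^{\mathcal{M}}\_{\sigma\_{\Phi}}(\mathbf{R}\_{s,\alpha}\Diamond S\_0)$, with the reward structure $\mathbf{R}\_{s,\alpha}$ given by $\mathbf{R}\_{s,\alpha}(\hat{s}, \hat{\alpha}, \hat{s}') = [\hat{s} = s \text{ and } \hat{\alpha} = \alpha]$.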

In [23], the lemma is applied to decide achievability of multiple total reward objectives under strategies that are stationary, but not necessarily pure. Intuitively, $\mathbb{E}^{\mathcal{M}}\_{\sigma\_{\Phi}}(\mathbf{R}\_{s,\alpha}\Diamond S\_0)$ coincides with the expected number of times action $\alpha$ is taken at state $s$ until $S\_0$ is reached. Since this value can be infinite if $\mathrm{Pr}^{\mathcal{M}}\_{\sigma\_{\Phi}}(\Diamond S\_0) < 1$, a solution $\Phi$ can only exist if it induces a strategy that almost surely reaches $S\_0$.
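For a pure stationary strategy, $y\_{s,\sigma(s)}$ is simply the expected number of visits to $s$ before $S\_0$ is reached, so the equations of Lemma 4 can be checked on the induced MC. In the toy chain below, $s\_0$ moves to $s\_1$ with reward 1, and $s\_1$ either returns to $s\_0$ (reward 0) or enters $t \in S\_0$ (reward 4), each with probability 1/2. The chain and the function names are hypothetical. The flow equations give $y\_{s\_0} = 1 + \frac{1}{2} y\_{s\_1}$ and $y\_{s\_1} = y\_{s\_0}$, so both counts equal 2, and the exit mass is $2 \cdot \frac{1}{2} = 1$. Weighting the expected one-step rewards with these counts yields a total reward of $2 \cdot 1 + 2 \cdot 2 = 6$. This matches solving $x\_{s\_0} = 1 + x\_{s\_1}$ and $x\_{s\_1} = \frac{1}{2}x\_{s\_0} + 2$ directly.

```python
from fractions import Fraction
from typing import Dict, List, Set, Tuple

# Hypothetical representation of the induced MC: chain[s] = [(successor, probability, reward), ...]
Chain = Dict[str, List[Tuple[str, Fraction, Fraction]]]


def solve(A: List[List[Fraction]], b: List[Fraction]) -> List[Fraction]:
    """Exact Gauss-Jordan elimination for a non-singular square system A y = b."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def visit_counts(chain: Chain, init: str, S0: Set[str]) -> Dict[str, Fraction]:
    """y_s = [s = init] + sum_{s'} P(s', s) * y_{s'} for all s outside S0 (first equation of Lemma 4)."""
    live = [s for s in chain if s not in S0]
    idx = {s: i for i, s in enumerate(live)}
    n = len(live)
    A = [[Fraction(int(i == k)) for k in range(n)] for i in range(n)]
    b = [Fraction(int(s == init)) for s in live]
    for s in live:
        for t, p, _ in chain[s]:
            if t in idx:
                A[idx[t]][idx[s]] -= p  # inflow into t from s
    y = solve(A, b)
    return {s: y[idx[s]] for s in live}


def exit_mass(chain: Chain, y: Dict[str, Fraction], S0: Set[str]) -> Fraction:
    """Right-hand side of the second equation of Lemma 4; it must equal 1."""
    return sum(y[s] * p for s in y for t, p, _ in chain[s] if t in S0)


def total_reward(chain: Chain, y: Dict[str, Fraction]) -> Fraction:
    """Expected total reward as a visit-count-weighted sum of expected one-step rewards."""
    return sum(y[s] * p * r for s in y for _, p, r in chain[s])


F = Fraction
chain = {
    "s0": [("s1", F(1), F(1))],
    "s1": [("s0", F(1, 2), F(0)), ("t", F(1, 2), F(4))],
    "t": [("t", F(1), F(0))],
}
y = visit_counts(chain, "s0", {"t"})
print(y == {"s0": 2, "s1": 2}, exit_mass(chain, y, {"t"}), total_reward(chain, y))  # True 1 6
```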

The encoding for unichain MDP with finite rewards and total reward objectives is shown in Fig. 3, where $S\_0 = \bigcap\_j S^j\_0$ and $S\_? = S \setminus S\_0$. We consider the constraints in conjunction with Lines 1 and 2 from Fig. 2. Let $\Phi$ be a solution and let $\sigma$ be the strategy encoded by such a solution, i.e., $\sigma(s)(\alpha) = \Phi(a\_{s,\alpha})$.

Lines 10 to 12 reflect the equations of Lemma 4. Since $\mathcal{M}$ is unichain and we assume finite rewards, there is just one end component, and no reward can be collected in it. Hence, $S\_0$ is almost surely reached. Line 10 ensures that the strategy in Lemma 4 coincides with the encoded pure strategy $\sigma$. We write $V\_s$ for an upper bound of the value a solution can possibly assign to $y\_{s,\alpha}$, i.e., $\forall\, \sigma \in \Sigma^{\mathcal{M}}\_{\mathit{PS}}\colon V\_s \geq \mathbb{E}^{\mathcal{M}}\_{\sigma}(\mathbf{R}\_{s,\alpha}\Diamond S\_0)$. Such an upper bound can be computed based on ideas of [5]. More details are given in [17, App. A].

$$\forall s \in S\_?, \alpha \in \mathit{Act}(s) \colon \quad y\_{s,\alpha} \in [0, V\_s \cdot a\_{s,\alpha}] \tag{10}$$

$$\forall s \in S\_? \colon \quad \sum\_{\alpha \in \mathit{Act}(s)} y\_{s,\alpha} = [s = s\_I] + \sum\_{\langle s',\alpha'\rangle \in \mathit{pre}(s)} \mathbf{P}(s',\alpha',s) \cdot y\_{s',\alpha'} \tag{11}$$

$$1 = \sum\_{s \in S\_?} \sum\_{\alpha \in \mathit{Act}(s)} y\_{s,\alpha} \cdot \sum\_{s' \in S\_0} \mathbf{P}(s,\alpha, s') \tag{12}$$

$$\forall j \in \{1, \ldots, \ell\}\colon \quad x\_{s\_I}^j = \sum\_{s \in S\_?} \sum\_{\alpha \in \mathit{Act}(s)} y\_{s,\alpha} \cdot \sum\_{s' \in S} \mathbf{P}(s, \alpha, s') \cdot \mathbf{R}\_j(s, \alpha, s') \tag{13}$$

$$x\_{s\_I}^j \sim\_j \mathbf{p}[j] \tag{14}$$

Figure 3: MILP encoding for total reward objectives.

With Lemma 4, we get that $\Phi(y\_{s,\sigma(s)})$ is the expected number of times state $s$ is visited under strategy $\sigma$. Therefore, in Line 13 we sum up for each state $s \in S\_?$ the expected amount of reward collected at $s$. This yields $\Phi(x^j\_{s\_I}) = \mathbb{E}^{\mathcal{M}}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j)$. Finally, Line 14 asserts that each resulting value satisfies its threshold $\mathbf{p}[j]$ with respect to $\sim\_j$.

Theorem 3. For unichain $\mathcal{M}$, finite rewards, and total reward objectives, the constraints in Fig. 3 and Lines 1 and 2 of Fig. 2 are feasible iff $\mathbf{p} \in \mathit{Ach}^{\mathcal{M}}\_{\mathit{PS}}(\mathcal{Q})$.

Proposition 2. The MILP encoding above considers $\mathcal{O}(|S| \cdot |\mathit{Act}| + \ell)$ variables.

The encoding for total reward objectives considers fewer variables than the encoding of Sect. 3.3 (cf. Proposition 1). In practice, this often leads to faster solving times, as we will see in Sect. 6.

#### 3.5 Extension to Multichain MDP

We now lift the restriction to unichain MDP, i.e., we consider multichain MDP with finite rewards. We focus on the encoding of Sect. 3.3. Details for the approach of Sect. 3.4 are in [17, App. C]. The key challenge is that the equation system in Lemma 3 does not yield a unique solution for multichain MDP.

Example 2. For the multichain MDP in Fig. 5a with $G = \{s\_1\}$, we have $S\_0 = \{s\_1\}$ and $S\_? = \{s\_0\}$ (the superscript $j$ is omitted as there is only one objective). For $\sigma$ with $\sigma(s\_0) = \alpha$ we get $\mathbb{E}^{\mathcal{M}}\_{\sigma}(\mathbf{R}\Diamond G) = 0$, but every assignment $\Phi$ with $\Phi(x\_{s\_1}) = 0$ and an arbitrary value $\Phi(x\_{s\_0}) \in \mathbb{R}$ is a solution for the equation system in Lemma 3.

For multichain MDP, it can be the case that for some strategy $\sigma$ the set $S^j\_0$ is not reached with probability 1, i.e., there is a positive probability to stay in the set $S^j\_?$ forever. For the induced MC $\mathcal{M}^{\sigma}$, this means that there is a reachable BSCC consisting only of states in $S^j\_?$. Since every BSCC of $\mathcal{M}^{\sigma}$ corresponds to an end component of $\mathcal{M}$, we need to inspect the ECs of $\mathcal{M}$ that only consist of $S^j\_?$-states. These ECs correspond to the ECs of the sub-MDP $\mathcal{M}[\mathcal{E}^j\_?]$, where $\mathcal{E}^j\_?$ is the largest subset of $S^j\_? \times \mathit{Act}$ that is closed for $\mathcal{M}$. For each $\mathcal{E} \in \mathit{MECS}(\mathcal{M}[\mathcal{E}^j\_?])$, we need to detect whether the encoded strategy induces a BSCC $\mathcal{E}' \subseteq \mathcal{E}$.

For all $j \in \{1, \dots, \ell\}$ and $\mathcal{E} \in \mathit{MECS}(\mathcal{M}[\mathcal{E}^j\_?])$ (detect states with zero reward):

$$\forall s \in S[\mathcal{E}]\colon \quad \pm x^j\_s \leq U^j\_s \cdot (1 - e^j\_s) \tag{15}$$

$$\forall \langle s, \alpha \rangle \in \mathcal{E}\colon \quad e^j\_{s,\alpha} \in \{0, a\_{s,\alpha}\} \tag{16}$$

$$\forall \langle s, \alpha \rangle \in \mathcal{E},\ s' \in \mathit{succ}(s, \alpha)\colon \quad e^j\_{s,\alpha} \leq e^j\_{s'} \tag{17}$$

$$\forall s \in S[\mathcal{E}]\colon \quad e^j\_s = \sum\_{\alpha \in \mathit{Act}(s)} [\langle s, \alpha \rangle \in \mathcal{E}] \cdot e^j\_{s,\alpha} \tag{18}$$

$$\forall s \in S[\mathcal{E}],\ \alpha \in \mathit{Act}(s)\colon \quad z^j\_{s,\alpha} \in [0, V\_s \cdot a\_{s,\alpha}] \tag{19}$$

$$\forall s \in S[\mathcal{E}]\colon \quad z^j\_{s,\bot} \in [0, V\_s \cdot e^j\_s] \tag{20}$$

$$\forall s \in S[\mathcal{E}]\colon \quad z^j\_{s,\bot} + \sum\_{\alpha \in \mathit{Act}(s)} z^j\_{s,\alpha} = \frac{1}{|S[\mathcal{E}]|} + \sum\_{\langle s', \alpha' \rangle \in \mathit{pre}(s) \cap \mathcal{E}} \mathbf{P}(s', \alpha', s) \cdot z^j\_{s',\alpha'} \tag{21}$$

$$1 = \sum\_{s \in S[\mathcal{E}]} \Bigl( z^j\_{s,\bot} + \sum\_{\alpha \in \mathit{Act}(s)} [\langle s, \alpha \rangle \notin \mathcal{E}] \cdot z^j\_{s,\alpha} \Bigr) \tag{22}$$

Figure 4: MILP encoding for detection of end components.

To cope with multiple ECs, we consider the constraints from Fig. 2 in conjunction with the constraints from Fig. 4. Let $\Phi$ be a solution to these constraints and let $\sigma$ be the encoded strategy with $\sigma(s)(\alpha) = \Phi(a\_{s,\alpha})$. For each objective $\psi\_j$ and state $s$, a binary variable $e^j\_s$ is set to 1 if $s$ lies on a BSCC of the induced MC $\mathcal{M}^{\sigma}$. We only need to consider states $s \in S[\mathcal{E}]$ for $\mathcal{E} \in \mathit{MECS}(\mathcal{M}[\mathcal{E}^j\_?])$.

Together with Line 4, Line 15 ensures that $x^j\_s$ is set to 0 if $\Phi(e^j\_s) = 1$, i.e., if $s$ is marked as lying on a BSCC of $\mathcal{M}^{\sigma}$. Lines 16 to 18 introduce binary variables $e^j\_{s,\alpha}$ for each state-action pair in the EC such that any solution $\Phi$ satisfies $\Phi(e^j\_{s,\alpha}) = 1$ iff $\Phi(e^j\_s) = \Phi(a\_{s,\alpha}) = 1$. Line 17 yields that $\Phi(e^j\_{s,\alpha}) = 1$ implies $\Phi(e^j\_{s'}) = 1$ for all successors $s'$ of $s$ and the selected action $\alpha$. Hence, for all $s$ with $\Phi(e^j\_s) = 1$ and for all $s'$ reachable from $s$ in $\mathcal{M}^{\sigma}$, we have $\Phi(e^j\_{s'}) = 1$ and $\langle s', \sigma(s') \rangle \in \mathcal{E}$. Therefore, we can only set $e^j\_s$ to 1 if there is a BSCC $\mathcal{E}' \subseteq \mathcal{E}$ that either contains $s$ or is almost surely reached from $s$ without leaving $\mathcal{E}$. As finite rewards are assumed, $\mathcal{E}'$ cannot contain a transition with positive reward, yielding $\mathbb{E}^{\mathcal{M}\_s}\_{\sigma}(\mathbf{R}\_j\Diamond G\_j) = 0$ if $\Phi(e^j\_s) = 1$.
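For a fixed strategy, the states that Lines 16 to 18 allow to be marked with $e^j\_s = 1$ are exactly those in the largest subset of $S[\mathcal{E}]$ that is closed under the selected actions inside $\mathcal{E}$. Outside the MILP, this set can be computed as a greatest fixpoint, as in the sketch below. The sketch is our own illustration: the names, the toy EC, and the successor function are hypothetical. Within the MILP, the solver may mark any closed subset of this set.

```python
from typing import Callable, Dict, Iterable, Set, Tuple


def trap_states(ec: Set[Tuple[str, str]], sigma: Dict[str, str],
                succ: Callable[[str, str], Iterable[str]]) -> Set[str]:
    """Largest set T of EC states such that each s in T selects an EC action under sigma
    and every successor of that action is again in T (greatest fixpoint)."""
    T = {s for s, _ in ec if (s, sigma[s]) in ec}
    changed = True
    while changed:
        changed = False
        for s in list(T):
            if any(t not in T for t in succ(s, sigma[s])):
                T.discard(s)
                changed = True
    return T


# Hypothetical EC on states u and v; action x at u leaves the EC towards w.
EDGES = {("u", "a"): ["v"], ("v", "b"): ["u"], ("v", "c"): ["v"], ("u", "x"): ["w"]}
EC = {("u", "a"), ("v", "b"), ("v", "c")}


def succ(s: str, act: str) -> Iterable[str]:
    return EDGES[(s, act)]


print(sorted(trap_states(EC, {"u": "a", "v": "c"}, succ)))  # ['u', 'v']
print(sorted(trap_states(EC, {"u": "x", "v": "b"}, succ)))  # []
```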

An assignment that sets all variables $e^j\_s$ and $e^j\_{s,\alpha}$ to 0 trivially satisfies the constraints in Lines 15 to 18. In Lines 19 to 22, we therefore ensure that if a BSCC $\mathcal{E}' \subseteq \mathcal{E}$ exists in $\mathcal{M}^{\sigma}$, then $\Phi(e^j\_s) = 1$ holds for at least one $s \in S[\mathcal{E}']$. The idea is based on the observation that if a BSCC $\mathcal{E}' \subseteq \mathcal{E}$ exists, there is a state $s \in S[\mathcal{E}]$ that does not almost surely reach the set $S \setminus S[\mathcal{E}]$. We consider the MDP $\mathcal{M}^{\mathcal{E}}$, a mild extension of $\mathcal{M}[\mathcal{E}]$ given by $\mathcal{M}^{\mathcal{E}} = \langle S[\mathcal{E}] \uplus \{s^{\mathcal{E}}\_I, s^{\mathcal{E}}\_{\bot}\}, \mathit{Act} \uplus \{\alpha\_I, \bot\}, \mathbf{P}^{\mathcal{E}}, s^{\mathcal{E}}\_I \rangle$, where $\mathbf{P}^{\mathcal{E}}$ extends $\mathbf{P}[\mathcal{E}]$ such that $\mathbf{P}^{\mathcal{E}}(s^{\mathcal{E}}\_{\bot}, \bot, s^{\mathcal{E}}\_{\bot}) = 1$ and, for all $s \in S[\mathcal{E}]$:

- $\mathbf{P}^{\mathcal{E}}(s^{\mathcal{E}}\_I, \alpha\_I, s) = 1/|S[\mathcal{E}]|$, $\mathbf{P}^{\mathcal{E}}(s, \bot, s^{\mathcal{E}}\_{\bot}) = 1$, and
- $\forall \alpha \in \{\hat{\alpha} \in \mathit{Act}(s) \mid \langle s, \hat{\alpha} \rangle \notin \mathcal{E}\}\colon \mathbf{P}^{\mathcal{E}}(s, \alpha, s^{\mathcal{E}}\_{\bot}) = 1$.
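The construction of $\mathcal{M}^{\mathcal{E}}$ is mechanical, as the following Python sketch shows. It is our own illustration: the fresh names `sI_E`, `s_bot`, `alpha_I`, and `bot` are hypothetical and assumed not to clash with existing states or actions. EC actions keep their distributions, every EC state receives an explicit exit action to the sink, and actions leaving $\mathcal{E}$ are redirected to the sink.

```python
from fractions import Fraction
from typing import Dict, Set, Tuple

Dist = Dict[str, Fraction]


def extend_ec(P: Dict[Tuple[str, str], Dist], ec: Set[Tuple[str, str]]) -> Dict[Tuple[str, str], Dist]:
    """Transition function of M^E for an end component ec given as state-action pairs."""
    ec_states = {s for s, _ in ec}
    PE: Dict[Tuple[str, str], Dist] = {
        # fresh initial state: uniform distribution over S[E] via alpha_I
        ("sI_E", "alpha_I"): {s: Fraction(1, len(ec_states)) for s in sorted(ec_states)},
        # fresh absorbing sink
        ("s_bot", "bot"): {"s_bot": Fraction(1)},
    }
    for s in ec_states:
        PE[(s, "bot")] = {"s_bot": Fraction(1)}  # explicit exit action at every EC state
    for (s, act), dist in P.items():
        if s in ec_states:
            # EC actions keep their transitions; all other actions lead to the sink
            PE[(s, act)] = dict(dist) if (s, act) in ec else {"s_bot": Fraction(1)}
    return PE


P = {
    ("u", "a"): {"v": Fraction(1)},
    ("v", "b"): {"u": Fraction(1)},
    ("v", "c"): {"v": Fraction(1)},
    ("u", "x"): {"w": Fraction(1)},
    ("w", "y"): {"w": Fraction(1)},
}
ec = {("u", "a"), ("v", "b"), ("v", "c")}
PE = extend_ec(P, ec)
print(all(sum(d.values()) == 1 for d in PE.values()), ("w", "y") in PE)  # True False
```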

Lines 21 and 22 reflect the equation system from Lemma 4 for MDP $\mathcal{M}^{\mathcal{E}}$ and $S\_0 = \{s^{\mathcal{E}}\_{\bot}\}$. Additionally, Lines 19 and 20 exclude negative solutions and assert, for any solution $\Phi$, that $\Phi(z^j\_{s,\alpha}) = 0$ if $\Phi(a\_{s,\alpha}) = 0$ and $\Phi(z^j\_{s,\bot}) = 0$ if $\Phi(e^j\_s) = 0$. Hence, for states $s \in S[\mathcal{E}]$ where $\Phi(e^j\_s) = 0$, the strategy $\sigma$ encoded by the variables $a\_{s,\alpha}$ coincides with the strategy considered in Lemma 4. Assume that solution $\Phi$ yields a BSCC within the states of $\mathcal{E}$ in $\mathcal{M}^{\sigma}$ and therefore also a BSCC in $(\mathcal{M}^{\mathcal{E}})^{\sigma}$. Since $s^{\mathcal{E}}\_{\bot}$ has to be reached almost surely in $\mathcal{M}^{\mathcal{E}}$ (cf. Lemma 4), and the BSCC can only be left via the action $\bot$ at states with $\Phi(e^j\_s) = 1$, the BSCC has to contain at least one state $s$ with $\Phi(e^j\_s) = 1$.

Figure 5: MDPs referenced in Examples 2 and 4.

In summary, Lines 19 to 22 imply that every BSCC $\mathcal{E}' \subseteq \mathcal{E}$ of $\mathcal{M}^{\sigma}$ contains at least one state $s$ with $\Phi(e^j\_s) = 1$. Then, with Lines 16 to 18, we get that $\Phi(e^j\_{s'}) = 1$ has to hold for all $s' \in S[\mathcal{E}']$. In $\mathcal{M}^{\sigma}$, the set $S^j\_0 \cup \{s \mid \Phi(e^j\_s) = 1\}$ is therefore reached almost surely, and all the states in this set are assigned value 0. In this case, the solution of the equation system from Lemma 3 becomes unique again.

Theorem 4. For finite rewards, the constraints in Figs. 2 and 4 are feasible iff $\mathbf{p} \in \mathit{Ach}^{\mathcal{M}}\_{\mathit{PS}}(\mathcal{Q})$.

#### 3.6 Extension to Infinite Rewards

Our approach can be modified to allow PSMA instances where infinite expected reward can be collected, i.e., where Restriction 2 does not hold. Infinite reward can be collected if we cycle through an EC of M that contains a transition with positive reward. Such instances are of practical interest as this often corresponds to strategies that do not accomplish a certain goal (e.g., a robot that stands still and therefore requires infinite time to finish its task).

We sketch the necessary modifications. More details are in [17, App. B]. Let $S\_{\infty}$ be the set of states where every pure strategy induces infinite reward for at least one minimizing objective. To ensure that the MILP instance has a (real-valued) solution, we consider the sub-MDP of $\mathcal{M}$ obtained by removing $S\_{\infty}$.

If infinite reward can be collected in an EC, it should not be considered in Fig. 4. We therefore let $\mathcal{E}$ range over maximal ECs that only consist of (a) states in $S^j\_?$ and (b) transitions with reward 0.

The upper bounds $U^j\_s$ for the maximal expected rewards at each state cannot be set to $\infty$. However, for the encoding it suffices to compute values that are sufficiently large. We remark that, in practice, our approach from [17, App. B] can lead to very large values, yielding numerical instabilities.

For maximizing objectives, we introduce one additional objective which, in a nutshell, checks that the probability to reach a 0-reward BSCC is below 1. If this is the case, there is a positive probability to reach a BSCC in which infinite reward can be collected.

## 4 Computing the Pareto Front

Our next goal is to compute the pure stationary Pareto front $\mathit{Pareto}^{\mathcal{M}}\_{\mathit{PS}}(\mathcal{Q})$ for MDP $\mathcal{M}$ and multi-objective query $\mathcal{Q}$. This set can be very large, in particular if the objectives are strongly conflicting with many different trade-offs. In the worst case, every pure stationary strategy induces a point $\mathbf{p} \in \mathit{Pareto}^{\mathcal{M}}\_{\mathit{PS}}(\mathcal{Q})$ (e.g., for $\mathcal{Q} = \langle \mathbb{E}\_{\leq}(\mathbf{R}\Diamond G), \mathbb{E}\_{\geq}(\mathbf{R}\Diamond G) \rangle$). We therefore aim to find an approximation of $\mathit{Pareto}^{\mathcal{M}}\_{\mathit{PS}}(\mathcal{Q})$.

Definition 7. Let $\varepsilon \in (\mathbb{R}\_{>0})^{\ell}$. An $\varepsilon$-approximation of $P \subseteq (\mathbb{R}^{\infty})^{\ell}$ is a pair $\langle L, U \rangle$ with $L \subseteq P \subseteq U$ and $\forall\, \mathbf{p} \in P\colon \exists\, \mathbf{p}' \in L \cup ((\mathbb{R}^{\infty})^{\ell} \setminus U)\colon |\mathbf{p} - \mathbf{p}'| \leq \varepsilon$.

Pure Stationary Pareto Approximation Problem ($\mathrm{PSP}^{\approx}$)

Input: MDP M, ℓ-dimensional multi-objective query Q, precision $\boldsymbol{\varepsilon} \in (\mathbb{R}_{>0})^{\ell}$ such that $\mathrm{Pareto}^{M}_{\mathrm{PS}}(Q) \subseteq \mathbb{R}^{\ell}$

Output: An $\boldsymbol{\varepsilon}$-approximation of $\mathrm{cl}_Q(\mathrm{Pareto}^{M}_{\mathrm{PS}}(Q))$

For simplicity, we only consider inputs that satisfy Restriction 2, i.e., for $\psi_j = \mathbb{E}_{\sim_j}(\mathbf{R}_j \lozenge G_j)$ there is $U^j \neq \infty$ such that $\forall \sigma \in \Sigma^{M}_{\mathrm{PS}} \colon U^j \ge \mathbb{E}^{M^{\sigma}}(\mathbf{R}_j \lozenge G_j)$. Ideas of Sect. 3.6 can be used for some other inputs. An all-embracing treatment of infinite rewards, in particular for maximizing $\psi_j$, is subject to future work.

Our approach for $\mathrm{PSP}^{\approx}$ successively divides the solution space into candidate regions. For each region $\mathcal{R}$ (initially, let $\mathcal{R} = [0, U^1] \times \cdots \times [0, U^{\ell}]$), we use the MILP encoding from Sect. 3 with an optimization function to find a point $\boldsymbol{p} \in \mathcal{R} \cap \mathrm{Pareto}^{M}_{\mathrm{PS}}(Q)$ (or find out that no such point exists). The region $\mathcal{R}$ is divided into (i) an achievable region $\mathcal{R}_A \subseteq \mathrm{Ach}^{M}_{\mathrm{PS}}(Q)$, (ii) an unachievable region $\mathcal{R}_U \subseteq \mathcal{R} \setminus \mathrm{Ach}^{M}_{\mathrm{PS}}(Q)$, (iii) further candidate regions $\mathcal{R}_1, \ldots, \mathcal{R}_n$ that are analyzed subsequently, and (iv) the remaining area $\mathcal{R} \setminus (\mathcal{R}_A \cup \mathcal{R}_U \cup \mathcal{R}_1 \cup \cdots \cup \mathcal{R}_n)$, which does not require further analysis as we are only interested in an $\boldsymbol{\varepsilon}$-approximation. The procedure stops as soon as no more candidate regions are found.

Example 3. Fig. 6 sketches the approach for an MDP M and a query Q with two maximizing objectives. We maintain a set of achievable points (light green) and a set of unachievable points (red). Initially, our candidate region corresponds to $\mathcal{R}_1 = [0, U^1] \times [0, U^2]$, given by the white area in Fig. 6a. We consider the direction vector $\boldsymbol{w}_1$, which is orthogonal to the line connecting $\langle U^1, 0 \rangle$ and $\langle 0, U^2 \rangle$. To find some point $\boldsymbol{p} \in \mathrm{Pareto}^{M}_{\mathrm{PS}}(Q) \cap \mathcal{R}_1$, we solve the MILP resulting from the constraints as in Sect. 3, the constraint $\langle x^1_{s_I}, x^2_{s_I} \rangle \in \mathcal{R}_1$, and the optimization function $\boldsymbol{w}_1 \cdot \langle x^1_{s_I}, x^2_{s_I} \rangle$. Fig. 6b shows the obtained point $\boldsymbol{p}_1 \in \mathcal{R}_1$. Since $\boldsymbol{p}_1$ is achievable, we know that any point in $\mathrm{cl}_Q(\{\boldsymbol{p}_1\})$ has to be achievable as well. Moreover, the set $\{\boldsymbol{p} \in \mathcal{R}_1 \mid \boldsymbol{w}_1 \cdot \boldsymbol{p} > \boldsymbol{w}_1 \cdot \boldsymbol{p}_1\}$, indicated by the area above the diagonal line in Fig. 6b, cannot contain an achievable point. The gray areas do not have to be checked in order to obtain an $\boldsymbol{\varepsilon}$-approximation. We continue with $\mathcal{R}_2$, indicated by the white area, and the direction vector $\boldsymbol{w}_2$, orthogonal to the line connecting $\langle 0, U^2 \rangle$ and $\boldsymbol{p}_1$. As before, we solve an MILP, now yielding the point $\boldsymbol{p}_2$ in Fig. 6c. We find achievable points $\mathrm{cl}_Q(\{\boldsymbol{p}_2\})$ but no further unachievable points. The next iteration considers candidate region $\mathcal{R}_3$ and direction vector $\boldsymbol{w}_1$, yielding point $\boldsymbol{p}_3$ shown in Fig. 6d. The trapezoidal area is added to the unachievable points, whereas $\mathrm{cl}_Q(\{\boldsymbol{p}_3\})$ is achievable. Finally, we check $\mathcal{R}_4$, for which the corresponding MILP instance is infeasible, i.e., $\mathcal{R}_4$ is unachievable.

Figure 6: Example exploration of achievable points.
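The following Python sketch mimics the two-objective exploration of Example 3 under strong simplifying assumptions: the MILP call is replaced by a brute-force search over a hypothetical finite list of candidate values, a region is dropped once it is thinner than ε in either dimension, and, unlike the MILP of Sect. 3, the stand-in oracle also skips points weakly dominated by earlier results so that the toy loop does not revisit them. It illustrates the region-splitting idea only and is not the implementation evaluated in Sect. 6.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]

def weakly_dominated(p: Point, found: Sequence[Point]) -> bool:
    """True if an earlier point is at least as large as p in both objectives."""
    return any(p[0] <= f[0] and p[1] <= f[1] for f in found)

def oracle(candidates: Sequence[Point], lo: Point, hi: Point, w: Point,
           found: Sequence[Point]) -> Optional[Point]:
    """Stand-in for one MILP call: maximize w . p over candidates in the box [lo, hi]
    that are not weakly dominated by an earlier point; None means infeasible."""
    best: Optional[Point] = None
    for p in candidates:
        inside = lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]
        if not inside or weakly_dominated(p, found):
            continue
        if best is None or w[0] * p[0] + w[1] * p[1] > w[0] * best[0] + w[1] * best[1]:
            best = p
    return best

def explore(candidates: Sequence[Point], u1: float, u2: float, eps: float) -> List[Point]:
    """Two maximizing objectives. A candidate region is the box spanned by an
    upper-left corner a and a lower-right corner b."""
    found: List[Point] = []
    regions = [((0.0, u2), (u1, 0.0))]
    while regions:
        a, b = regions.pop()
        if b[0] - a[0] <= eps or a[1] - b[1] <= eps:
            continue  # simplified stopping rule: region thinner than eps
        w = (a[1] - b[1], b[0] - a[0])  # orthogonal to the line from a to b
        p = oracle(candidates, (a[0], b[1]), (b[0], a[1]), w, found)
        if p is None:
            continue  # no further achievable point in this region
        found.append(p)
        regions.append((a, p))  # part to the upper left of p
        regions.append((p, b))  # part to the lower right of p
    return found

# Hypothetical values of pure stationary strategies; (1, 1) is dominated and
# (3, 0.5) lies strictly below the segment from (2, 2) to (4, 0).
pts = [(0, 4), (1, 3), (2, 2), (1, 1), (3, 0.5), (4, 0)]
print(sorted(explore(pts, 4.0, 4.0, 0.01)))
# [(0, 4), (1, 3), (2, 2), (3, 0.5), (4, 0)]
```

The point (3, 0.5) maximizes no nonnegative weighted sum over the full candidate list, since (2, 2) or (4, 0) always scores at least as high. It is found only because the search is restricted to a sub-region that excludes earlier points, which illustrates why the region constraint matters when the achievable set is not convex.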

The ideas sketched above can be lifted to ℓ > 2 objectives. Inspired by [24, Alg. 4], we choose direction vectors that are orthogonal to the convex hull of the achievable points found so far. In fact, for total reward objectives we can apply the approach of [24] to compute the points in $\mathrm{Pareto}^{M}_{\mathrm{PS}}(Q) \cap \mathrm{Pareto}^{M}(Q)$ first and only perform MILP-solving for the remaining regions. As the distance between two found points $\boldsymbol{p}, \boldsymbol{p}'$ is at least $|\boldsymbol{p} - \boldsymbol{p}'| \ge \boldsymbol{\varepsilon}$, we can show that our approach terminates after finding at most $\prod_j U^j / \varepsilon_j$ points. Other strategies for choosing direction vectors are possible and can strongly impact performance.

## 5 Bounded Memory

For GMA, it is necessary and sufficient to consider strategies that require memory exponential in the number of objectives [20,24,40], by storing which goal state sets have already been reached. In contrast, restricting to pure (but not necessarily stationary) strategies imposes nontrivial memory requirements that depend not only on the number of objectives but also on the point that is to be achieved.

Example 4. Let M be the MDP in Fig. 5b and $Q = \langle \mathbb{P}_{\ge}(\lozenge G), \mathbb{P}_{\ge}(\lozenge G') \rangle$. The point $\boldsymbol{p}_k = \langle 0.5^k, 1 - 0.5^k \rangle$ for $k \in \mathbb{N}$ is achievable by taking α with probability $0.5^k$. $\boldsymbol{p}_k$ is also achievable with the pure strategy $\sigma_k$, where $\sigma_k(\hat{\pi}) = \alpha$ iff $|\hat{\pi}| \ge k$. $\sigma_k$ uses k memory states. Pure strategies with fewer memory states do not suffice.

We search for pure strategies with bounded memory. For an MDP M and K > 0, let $\Sigma^{M}_{\mathrm{P},K}$ denote the set of pure K-memory strategies, i.e., any $\sigma \in \Sigma^{M}_{\mathrm{P},K}$ can be represented by a Mealy machine using up to K states (cf. [17, App. D]). For a query Q, let $\mathrm{Ach}^{M}_{\mathrm{P},K}(Q)$ be the set of points achievable by some $\sigma \in \Sigma^{M}_{\mathrm{P},K}$.

Pure Bounded Multi-objective Achievability Problem (PBMA)

Input: MDP M, multi-objective query Q, memory bound K, point $\boldsymbol{p} \in (\mathbb{R}^{\infty})^{\ell}$

Output: Yes iff $\boldsymbol{p} \in \mathrm{Ach}^{M}_{\mathrm{P},K}(Q)$


Table 1: Results for stationary strategies.

The pure bounded Pareto approximation problem is defined similarly. We reduce a PBMA instance to an instance for PSMA. The idea is to incorporate a memory structure of size K into M and then construct a pure stationary strategy in this product MDP (see, e.g., [29] for a similar construction). The set of strategies can be further refined by considering, e.g., a memory structure that only allows counting or that only remembers visits of goal states. See [17, App. D] for details.
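The product construction can be sketched on an explicit MDP encoding. This is one simple variant chosen for illustration, not necessarily the construction of [17, App. D]: a product action (a, m') plays a and moves the memory to m', regardless of the sampled successor. The dictionary encoding, the example MDP, and all names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical explicit encoding: state -> action -> [(probability, successor), ...]
MDP = Dict[Hashable, Dict[Hashable, List[Tuple[float, Hashable]]]]

def memory_product(mdp: MDP, k: int) -> MDP:
    """Product of an MDP with memory states 0, ..., k-1.

    Simplifying assumption: a product action (a, m2) plays a and sets the memory
    to m2, independently of which successor is sampled.
    """
    product: MDP = {}
    for s, actions in mdp.items():
        for m in range(k):
            product[(s, m)] = {
                (a, m2): [(prob, (t, m2)) for prob, t in succs]
                for a, succs in actions.items()
                for m2 in range(k)
            }
    return product

# Hypothetical three-state MDP: alpha leads to goal g1, beta flips a fair coin.
mdp = {
    "s0": {"alpha": [(1.0, "g1")], "beta": [(0.5, "s0"), (0.5, "g2")]},
    "g1": {"stay": [(1.0, "g1")]},
    "g2": {"stay": [(1.0, "g2")]},
}
prod = memory_product(mdp, 2)
print(len(prod))                     # 6
print(prod[("s0", 0)][("beta", 1)])  # [(0.5, ('s0', 1)), (0.5, ('g2', 1))]
```

A pure stationary strategy on this product, started in (s_I, 0), corresponds to a pure K-memory strategy whose memory update ignores the successor state; letting the update depend on the successor would require a richer product.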

## 6 Evaluation

We implemented our approach for $\mathrm{PSP}^{\approx}$ in the model checker Storm [16] using Gurobi [27] as back end for MILP-solving. The implementation takes an MDP (e.g., in PRISM syntax), a multi-objective query, and a precision ε > 0 as input and computes an $\boldsymbol{\varepsilon}$-approximation of the Pareto front. Here, we set $\varepsilon_j = \varepsilon \cdot \delta_j$, where $\delta_j$ is the difference between the maximal and minimal achievable value for objective $\psi_j$. We also support reward objectives for Markov automata via [38]. The computations within Gurobi might suffer from numerical instabilities. To diminish their impact, we use the exact engine of Storm to confirm for each MILP solution that the encoded strategy achieves the encoded point. However, sub-optimal solutions returned by Gurobi may still yield inaccurate results.
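The per-objective precisions are simple arithmetic. The sketch below computes ε_j = ε · δ_j from hypothetical minimal and maximal achievable values and, reading the termination bound of Sect. 4 as ∏_j U^j/ε_j, shows how that worst-case bound grows as ε shrinks. Function names and numbers are illustrative only.

```python
import math
from typing import List, Sequence

def objective_precisions(eps: float, min_vals: Sequence[float], max_vals: Sequence[float]) -> List[float]:
    """eps_j = eps * delta_j, where delta_j = max_j - min_j is the spread of achievable values."""
    return [eps * (hi - lo) for lo, hi in zip(min_vals, max_vals)]

def point_bound(upper: Sequence[float], eps_js: Sequence[float]) -> float:
    """Worst-case number of found points, reading the bound as prod_j U^j / eps_j."""
    return math.prod(u / e for u, e in zip(upper, eps_js))

# Hypothetical two-objective instance: achievable ranges [0, 5] and [10, 30], U = (5, 30).
for eps in (0.01, 0.001):
    eps_js = objective_precisions(eps, [0.0, 10.0], [5.0, 30.0])
    print(eps, eps_js, round(point_bound([5.0, 30.0], eps_js)))
```

Going from ε = 0.01 to ε = 0.001 multiplies each factor by 10, so the two-objective bound grows by a factor of 100; this is one way to see why a higher precision can require many more points.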

We evaluate our approach on 13 multi-objective benchmarks from [24,28,38], each considering one or two parameter instantiations. Application areas range over scheduling (dpm [37], eajs [1], jobs [10], polling [43]), planning (rg [6], rover [28], serv [32], uav [21]), and protocols (mutex [38], str [38], team [15], wlan [31]).

The results for pure stationary strategies are summarized in Table 1. For each benchmark, we denote the number of objectives and whether the alternative encoding from Sect. 3.4 has been applied (∗). For each parameter instantiation (Par.), the number of states (|S|), the percentage of states that are contained in an end component (%E), and the average number of available actions per state (Act) are given. For each precision ε ∈ {0.01, 0.001}, we then depict the runtime of Storm and the number of points on the computed approximation of the Pareto front. TO denotes that the approach did not terminate within 2 hours; MO denotes insufficient memory (16 GB). All experiments used 8 cores of an Intel® Xeon® Platinum 8160 Processor.

(a) Alt. (x) vs. original (y) encoding. (b) Pareto fronts for restricted strategies.

Figure 7: Comparison of the two encodings (left) and impact of memory (right).

Storm is often able to compute pure stationary Pareto fronts, even for models with over 100 000 states (e.g., uav). However, the model structure strongly affects the performance. For example, the second instance of jobs is challenging although it only considers 117 states, a low degree of nondeterminism, and no (non-trivial) end components. Small increments in the model size can increase runtimes significantly (e.g., dpm or uav). If a higher precision is requested, many more points need to be found, which often leads to timeouts. Similarly, for more than 2 objectives the desired accuracy often cannot be achieved within the time limit. The approach can be stopped at any time to report on the current approximation; e.g., after 2 hours, Storm found 65 points for Instance 1 of mutex.

For almost all benchmarks, the objectives could be transformed to total reward objectives, making the more efficient encoding from Sect. 3.4 applicable. We plot the runtimes of the two encodings in Fig. 7a. The alternative encoding is superior for almost every benchmark. In fact, the original encoding timed out for many models, as indicated by the horizontal line at the top of the figure.

In Fig. 7b, we plot the Pareto front for the first polling instance under general strategies (Gen), pure 2-memory strategies that can change the memory state exactly once ($\mathrm{PM}_2$), pure strategies that observe which goal state sets $G_j$ have been visited already ($\mathrm{PM}_G$), and pure stationary strategies (PS). Adding simple memory structures already leads to noticeable improvements in the quality of strategies. In particular, $\mathrm{PM}_2$ strategies perform quite well and even outperform $\mathrm{PM}_G$ strategies (which would be optimal if randomization were allowed).

Data availability. The artifact [18] accompanying this paper contains source code, benchmark files, and replication scripts for our experiments.

Acknowledgments. The authors thank Sebastian Junges for his valuable contributions during early stages of this work.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Model Checking and Reachability

#### **Partial Order Reduction for Deep Bug Finding in Synchronous Hardware** *-*

Makai Mann and Clark Barrett

Stanford University, Stanford, CA 94305 USA

**Abstract.** Symbolic model checking has become an important part of the verification flow in industrial hardware design. However, its use is still limited due to scaling issues. One way to address this is to exploit the large amounts of symmetry present in many real-world designs. In this paper, we adapt partial order reduction for bounded model checking of synchronous hardware and introduce a novel technique that makes partial order reduction practical in this new domain. These approaches are largely automatic, requiring only minimal manual effort. We evaluate our technique on open-source and commercial packet mover circuits – designs containing FIFOs and arbiters.

## **1 Introduction**

Modern society relies increasingly on electronic systems, powered by hardware components that continue to grow in complexity and variety. Ensuring the functional correctness of these components is essential, as bugs and errors can have consequences ranging from undermining a company's reputation to jeopardizing human safety [1,22,25,32,33]. Most electronic designs must therefore include a significant verification effort, and this effort often consumes more time and resources than all other aspects of the design process [17,34].

Formal methods such as symbolic model checking have become a crucial part of the verification effort because of their strong guarantees and automation [24]. However, due to the *state space explosion problem* [14], model checking typically only works well for small- to medium-sized circuits with primarily control logic, limiting its potential for addressing industry verification challenges.

One approach for combating the state space explosion problem is *partial order reduction* [14]. While symbolic partial order reduction has been successfully applied to the verification of asynchronous systems [37], its use in synchronous systems has been limited. In this paper, we introduce a novel approach for adapting symbolic partial order reduction to model checking of synchronous hardware and demonstrate dramatic reductions in the time to reach deep bugs on certain classes of synchronous circuits. Moreover, the technique requires only an interface-level annotation of the circuit and, when fully automated approaches fail, can be guided by the user. The paper makes the following contributions:

<sup>-</sup> This work was supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1656518. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. This work was also supported by the Defense Advanced Research Projects Agency, grant FA8650-18-2-7854. We thank the funding agencies and our corporate collaborators for their support.


The rest of the paper is organized as follows. We first provide a motivating example, below. Then, in Section 2, we cover relevant background material and notation. We explain our partial order reduction in Section 3 and our interface simplification technique in Section 4. We provide an experimental evaluation in Section 5. Section 6 covers related work, and Section 7 concludes.

#### **1.1 Motivating Example**

Throughout this paper, we use the running example shown in Code Snippet 1. We chose this example because: i) it is easy to understand; ii) it resembles real-world packet mover circuits; and iii) it contains a difficult-to-reach bug.

The system has a synchronizing clock and takes two 1-bit inputs: inc_x and inc_y. The 6-bit registers (state elements) x and y index the valid vector and are initialized to 0. The 64-bit registers valid and data start at 0 and 1, respectively. The 64×64-bit memory is uninitialized. If inc_x and en_x are true, the system increments the value of x. When inc_y is true, the system increments y, sets the valid bit at index y, writes data to the memory at location y, and rotates the data vector to the left. Notice that the en_x signal ensures that x never surpasses y (until all bits in valid are set). This incrementing pointer logic is similar to that found in a circular pointer FIFO. To ensure the asserted property, the code attempts to maintain the invariant: data = 1 << y.

At first, it appears that the asserted property should hold based on this invariant, but it does not. There is a bug that can first occur at cycle 65: the overflow check in the data update uses integers, which are assumed to be 32 bits wide. Since y is zero-extended to 32 bits, y+1 can never be equal to 0. Thus, when y has the value 63 and is incremented, data, which is supposed to be one-hot, is set to 0.

Although the system is small, this is a surprisingly difficult bug to reach using model checking. We believe this is due in part to the non-determinism in the update logic. In every state, there are 4 possible input combinations. As a result, there is an exponential number of execution paths. Model checkers routinely verify hardware designs with an exponential number of reachable states; however, we have observed that systems such as this one, which also have an exponential number of execution paths, are difficult for a model checker to manage.

```verilog
module deep_bug(input clk, input inc_x, input inc_y);
  reg [5:0] x = 0;          // read pointer
  reg [5:0] y = 0;          // write pointer
  reg [63:0] valid = 0;     // valid[i] is set once mem[i] has been written
  reg [63:0] data = 1;      // intended to stay one-hot: data == 1 << y
  reg [63:0] mem [63:0];    // 64 x 64-bit memory, uninitialized
  wire en_x;
  assign en_x = valid[x];   // x may only advance over entries that were written

  always @(posedge clk) begin
    if (inc_x & en_x)
      x <= x + 1;
    if (inc_y) begin
      y <= y + 1; valid[y] <= 1'b1; mem[y] <= data;
      // Intentional bug: y+1 is evaluated at 32 bits, so it never equals 0,
      // and data shifts to 0 instead of wrapping back to 1 when y == 63.
      data <= (y+1 == 0) ? 1 : (data << 1);
    end
  end

  always @*
    assert ((mem[x] == (1 << x)) || ~valid[x]);
endmodule // deep_bug
```
Code Snippet 1: Buggy Toy Example

Specifically, all but two of the model checker configurations we tried timed out at 2 hours before reaching the bug. Since bounded model checking (BMC) is one of the best approaches for bug-finding, we focus on improvements to BMC that help reach this bug. We introduce automated, best-effort techniques that reduce the time to hit this bug from over 1000 seconds to 46 seconds by safely adding temporal symmetry breaking constraints to the system.
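To make the cycle-65 claim concrete, the following Python sketch models Code Snippet 1 at the level of clock cycles, with nonblocking updates and the 32-bit evaluation of y+1 reproduced explicitly. It is a hand-written simulation under our own modeling assumptions (uninitialized memory treated as zeros), not the model checker setup used in the paper.

```python
MASK64 = (1 << 64) - 1

def first_violation(inc_x_seq, inc_y_seq):
    """Cycle-level model of deep_bug (Code Snippet 1).

    Returns the first clock cycle (1-based) after which the assertion fails,
    or None if it holds for the whole input sequence.
    """
    x = y = 0
    valid, data = 0, 1
    mem = [0] * 64  # uninitialized in hardware; only read when valid[x] is set
    for cycle, (inc_x, inc_y) in enumerate(zip(inc_x_seq, inc_y_seq), start=1):
        en_x = (valid >> x) & 1
        # Nonblocking assignments: every next value is computed from the current state.
        nx, ny, nvalid, ndata = x, y, valid, data
        if inc_x and en_x:
            nx = (x + 1) & 0x3F              # 6-bit register wraps at 64
        if inc_y:
            ny = (y + 1) & 0x3F
            nvalid = valid | (1 << y)
            mem[y] = data
            # y + 1 is a 32-bit integer in Verilog, so this test is never true.
            ndata = 1 if y + 1 == 0 else (data << 1) & MASK64
        x, y, valid, data = nx, ny, nvalid, ndata
        if (valid >> x) & 1 and mem[x] != (1 << x):
            return cycle
    return None

# Driving only inc_y reaches the bug; inc_x stays low, so x remains 0.
print(first_violation([0] * 100, [1] * 100))  # 65
```

After 64 writes, data has been shifted out to 0, and the 65th write stores that 0 into mem[0], which the assertion reads while x is still 0.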

## **2 Background**

Before explaining our algorithm, we adapt the standard notion of synchronous transition systems and review fundamental model checking concepts below. For a more thorough introduction to model checking, we refer the reader to [14,15].

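**Definition 1.** *A Synchronous Transition System (STS) is a tuple* $\langle S, \mathit{Init}, A, \mathit{En}, D, T \rangle$*:*

– S: a set of states
– *Init* ⊆ S: a set of initial states
– A: a set of atomic actions
– *En* = {en<sub>a</sub> | a ∈ A}: a set of enabledness predicates, one per action
– D: a set of data inputs
– T: a transition relation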


For our purposes, an STS instruction can perform multiple atomic actions simultaneously. We define the system's *instruction set* (i.e., the set of action combinations that the system can perform in one transition) as $I := \mathcal{P}(A)$. We then define the set of *inputs* of an STS as *Input* := I × D. Thus, the transition relation T is a subset of S × *Input* × S.

We denote the cardinality of an instruction i as |i|. For $s, s' \in S$ and $\mathit{in} \in \mathit{Input}$, $T(s, \mathit{in}, s')$ holds *iff* it is possible to reach $s'$ from $s$ by applying input $\mathit{in}$. It is often convenient to reason about sequences using vector notation. Let $\mathbf{in} \in \mathit{Input}^n$ and $\mathbf{s} \in S^{n+1}$, with n > 0. We use subscripts to name individual elements of vectors, e.g., $\mathbf{s} := \langle s_0, s_1, \ldots \rangle$. We use the notation $T(\mathbf{in}, \mathbf{s})$ to denote $\bigwedge_{0 \le i < n} T(s_i, \mathit{in}_i, s_{i+1})$. The length of a vector is given by $|\cdot|$, e.g., $|\mathbf{s}| = n + 1$, and prepending is represented as $\cdot : \cdot$, e.g., $\mathbf{s} = s_0 : \mathbf{s}'$ for some $\mathbf{s}' \in S^n$. With some abuse of notation, we allow prepending both sequences and single elements. For k > 0, we say that $\mathbf{s} \in S^k$ is reachable if $\exists n \in \mathbb{N}, \mathbf{s}' \in S^{n+1}, \mathbf{in} \in \mathit{Input}^{n+k}.\ \mathit{Init}(s'_0) \land T(\mathbf{in}, \mathbf{s}' : \mathbf{s})$.

The set of enabledness predicates *En* constrains the states in which an action can occur. For an instruction $i \in I$ and $s \in S$, let $\mathit{en}_i(s) := \bigwedge_{a \in i} \mathit{en}_a(s)$. In the remainder of the paper, we only consider transition relations T that respect the enabledness conditions. That is, we assume $\forall s, i.\ (\mathit{en}_i(s) \leftrightarrow \exists s', d.\ T(s, \langle i, d \rangle, s'))$. Depending on the context, this can be checked with a model checker or added as an environmental assumption. We also assume that the existence of a transition does not depend on the data input, that is, $\forall s, i.\ (\exists d, s'.\ T(s, \langle i, d \rangle, s') \implies \forall d.\ \exists s'.\ T(s, \langle i, d \rangle, s'))$.
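For a finite, explicitly enumerated STS, both assumptions can be checked by brute force. The following sketch uses a hypothetical toy STS with a single action inc and a register that latches the data input; the helper names are illustrative, and a real design would instead be checked with a model checker as described above.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Iterable, List, Sequence, Set

def instruction_set(actions: Sequence[str]) -> List[FrozenSet[str]]:
    """I = P(A): every subset of the atomic actions, including the empty instruction."""
    return [frozenset(c) for r in range(len(actions) + 1) for c in combinations(actions, r)]

def respects_assumptions(states: Iterable[Hashable], actions: Sequence[str], data: Sequence[Hashable],
                         en: Callable[[str, Hashable], bool],
                         succ: Callable[[Hashable, FrozenSet[str], Hashable], Set[Hashable]]) -> bool:
    """Check on an explicit, finite STS that
    (i) en_i(s) holds iff some data input d admits a transition from s under <i, d>, and
    (ii) if some d admits a transition, then every d does.
    """
    for s in states:
        for i in instruction_set(actions):
            en_i = all(en(a, s) for a in i)  # conjunction of en_a(s) for a in i
            has_succ = [bool(succ(s, i, d)) for d in data]
            if en_i != any(has_succ) or (any(has_succ) and not all(has_succ)):
                return False
    return True

# Hypothetical toy STS: a counter x <= 2 and a register v that latches the data input.
STATES = [(x, v) for x in range(3) for v in (0, 1)]

def en_toy(a: str, s) -> bool:
    return s[0] < 2  # the only action, "inc", is enabled below the bound

def succ_toy(s, i, d):
    if not all(en_toy(a, s) for a in i):
        return set()
    x, v = s
    return {(x + 1, d)} if "inc" in i else {(x, v)}  # the empty instruction stutters

print(respects_assumptions(STATES, ["inc"], [0, 1], en_toy, succ_toy))  # True
```

A variant of succ_toy that refuses inc for one particular data value would violate the data-independence assumption, and the check would return False.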

*Example 1.* We can define an STS for the motivating example. Let $BV_k$ denote the set of all bitvectors of width k. Because there is only a single clock with no negative-edge behavior, we model the system without the clock, where every transition corresponds to a clock cycle. Define an STS $\langle S, \mathit{Init}, A, \mathit{En}, D, T \rangle$, where:


**Model Checking.** Given an STS **S**, let a safety property P ⊆ S be a set containing acceptable states. The *model checking problem* is to determine whether the system stays within this acceptable set for all possible execution traces. Formally, we want to check whether the following holds:

$$\forall n \ge 0, \mathbf{in} \in \mathit{Input}^n, \mathbf{s} \in S^{n+1}.\ \left( \mathit{Init}(s_0) \land T(\mathbf{in}, \mathbf{s}) \right) \implies s_n \in P \tag{1}$$

When equation (1) holds, we say that P is an invariant of **S**. A number of techniques exist for solving this problem, including Binary Decision Diagram (BDD)-based [12] approaches, interpolant-based [27] approaches, and IC3/PDR (property directed reachability) techniques [10,16]. We refer the interested reader to [15] for a more complete survey of model checking algorithms.

In this paper, we will focus on *bounded model checking* (BMC). In BMC, instead of proving (1) for all n, we prove it for all n less than some finite bound k. Though it typically cannot be used to prove properties, BMC can be quite effective at finding bugs [6] and is especially useful when full model checking is infeasible.

**Symmetry.** Early on in the development of model checking, researchers recognized the importance of symmetry reduction to combat the state explosion problem [13]. Existing approaches in the hardware domain perform data symmetry reduction and data type reduction through the use of bit-width reduction preprocessing passes or syntactic restrictions such as *scalarsets* [8,20,28]. There have also been abstraction-refinement loop algorithms proposed to handle memory symmetries [9]. All of these approaches are focused on symmetries present in the transition system description, such as the presence of large data types. We refer to these types of symmetries as *data* symmetries. Most of these techniques are intended to speed up proofs of true properties rather than accelerate bug-finding.

Model checking of asynchronous systems such as concurrent programs faces an orthogonal issue due to the many possible redundant interleavings of independent processes. Throughout this paper, we refer to this as *path* symmetry. Path symmetry is a temporal symmetry: it relates to executions of a system rather than just its size. Path symmetries occur when there are many distinct ways of reaching the same state in a system execution. Exploring all such paths can result in exponential case splitting.

This paper provides evidence that path symmetry can also severely hurt model checking performance in synchronous systems. One of the first techniques proposed to handle path symmetry was partial order reduction.

**Partial Order Reduction.** Partial order reduction was first developed in the explicit-state model checking context but was later extended to symbolic model checking [37]. The approach is named "partial order reduction" for historical reasons, but Clarke noted in [14] that "model checking using representatives" [30,31] may have been a more appropriate name. In particular, partial order reduction attempts to develop equivalence classes of behaviors so that only one representative from each class needs to be considered during model checking. Note that partial order reductions are sound only for checking state invariants. If the property of interest is temporal, the reduction could disallow input sequences that trigger the property. This can be avoided by first instantiating a monitor [15] and, if necessary, converting liveness properties to safety [5].

Partial order reduction is less natural in the synchronous setting, because synchronous transition systems do not have easily expressible independent actions. Nevertheless, these systems can still benefit from partial order reduction. Consider our motivating example: despite the huge number of system execution paths to consider, many of them are redundant. Observe that if both inputs are zero, then the state does not change. Furthermore, there is a temporal symmetry in the system execution: from any state where en_x is true, driving only inc_x followed by only inc_y results in the same state as driving them in the opposite order. Thus, this system has a large number of redundant interleavings, much like a multi-threaded program. To address this problem, we introduce a partial order reduction for synchronous hardware. Our goal is to remove redundant interleavings by adding constraints to the system. To maintain soundness, we provide a set of conditions that must hold before we can add constraints.
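The temporal symmetry described above can be checked on a cycle-level Python model of Code Snippet 1. In this sketch (uninitialized memory modeled as zeros, an assumption), the two orders agree whenever en_x holds, and the initial state shows why the enabledness condition is needed: driving inc_y first sets valid[0], which enables the later inc_x.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1
# (x, y, valid, data, mem) of deep_bug; mem is a tuple so that states are hashable.
State = Tuple[int, int, int, int, Tuple[int, ...]]

def step(s: State, inc_x: int, inc_y: int) -> State:
    """One clock cycle of Code Snippet 1 with nonblocking semantics."""
    x, y, valid, data, mem = s
    en_x = (valid >> x) & 1
    nx = (x + 1) & 0x3F if (inc_x and en_x) else x
    if not inc_y:
        return (nx, y, valid, data, mem)
    nmem = mem[:y] + (data,) + mem[y + 1:]
    ndata = 1 if y + 1 == 0 else (data << 1) & MASK64  # reproduces the 32-bit bug
    return (nx, (y + 1) & 0x3F, valid | (1 << y), ndata, nmem)

# Uninitialized memory is modeled as all zeros (an assumption of this sketch).
INIT: State = (0, 0, 0, 1, (0,) * 64)

def x_enabled(s: State) -> bool:
    return bool((s[2] >> s[0]) & 1)

def commutes(s: State) -> bool:
    """Does 'only inc_x, then only inc_y' reach the same state as the opposite order?"""
    return step(step(s, 1, 0), 0, 1) == step(step(s, 0, 1), 1, 0)

print(commutes(INIT))  # False: en_x is false, but inc_y first sets valid[0]

# Along a hypothetical input sequence, the two orders agree whenever en_x holds.
s = INIT
for t in range(130):
    s = step(s, int(t % 3 == 0), 1)
    assert not x_enabled(s) or commutes(s)
```

The agreement under en_x holds because inc_y only ever sets bits of valid, so an enabled inc_x stays enabled after it.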

## **3 Synchronous Partial Order Reduction**

In order to be able to apply partial order reduction to a synchronous transition system, we are interested in identifying pairs of instructions that can be reordered without affecting the resulting state. More generally, we also want to be able to find pairs that can only be reordered under certain conditions. To formalize these notions, we adapt the notation and representation of *guarded independence relations* from [37].<sup>1</sup>

**Definition 2.** *Given an STS* $\langle S, \mathit{Init}, A, \mathit{En}, D, T \rangle$ *with instruction set* $I$*, let* $G := \mathcal{P}(S)$ *be the set of predicates over the states. We call* $\langle i_0, i_1, g \rangle$ *a* guarded independence tuple *iff for all* $d_0, d_1 \in D$ *and reachable* $\mathbf{s} \in S^3$*, the following condition holds:*

$$\mathit{en}_{i_0}(s_0) \land g(s_0) \land T(\langle \langle i_1, d_1 \rangle, \langle i_0, d_0 \rangle \rangle, \mathbf{s}) \implies \exists s'.\ T(\langle \langle i_0, d_0 \rangle, \langle i_1, d_1 \rangle \rangle, \langle s_0, s', s_2 \rangle)$$

According to this definition, if we can prove that $\langle i_0, i_1, g \rangle$ is a guarded independence tuple, then we can reorder $\langle i_1, i_0 \rangle$ instruction sequences as long as i) $i_0$ is enabled in the first state; ii) $g$ holds in the first state; and iii) we also reorder the corresponding data inputs. We check only the enabledness of $i_0$ because $\langle i_0, i_1 \rangle$ is the representative order, and we only need to be able to reorder to the representative, not from it. The guard allows us to consider partial order reductions that only hold for a subset of the reachable states. To avoid trivially overconstraining the system with conflicting reorderings, we only consider one ordering for each pair of instructions.

<sup>1</sup> The main differences are our STS formalism and that we consider reachability.

Fig. 1: Partial Order Reduction Condition (3) Proof Goal

The condition in Definition 2 is difficult to check automatically because of the existential quantifier. We instead check two slightly stronger conditions that imply guarded independence. These conditions are also standard in the POR literature [14,37]. The first condition states that instruction $i_0$ cannot disable $i_1$ under guard $g$:

$$\forall d \in D, \mathbf{s} \in S^2.\ \left( \mathit{en}_{i_1}(s_0) \land g(s_0) \land T(s_0, \langle i_0, d \rangle, s_1) \right) \implies \mathit{en}_{i_1}(s_1) \tag{2}$$

Intuitively, this condition ensures that we do not remove reachable states by disabling instructions. The second condition is that executing the instructions in either order leads to the same final state:

$$\begin{aligned} \forall d_0, d_1 \in D, \mathbf{s}, \mathbf{s}' \in S^3.\ & \big( g(s_0) \land (s_0 = s_0') \land T(\langle \langle i_0, d_0 \rangle, \langle i_1, d_1 \rangle \rangle, \mathbf{s}) \\ & \quad \land T(\langle \langle i_1, d_1 \rangle, \langle i_0, d_0 \rangle \rangle, \mathbf{s}') \big) \implies (s_2 = s_2') \end{aligned} \tag{3}$$

When applying partial order reduction to concurrent programs, the standard approach is to check conservative syntactic properties which guarantee conditions (2) and (3). Synchronous systems do not typically have these syntactic properties, because there is no notion of distinct processes. Instead, we must check these conditions directly. In real circuits, it is unlikely that (2) will hold over arbitrary states. However, it is sufficient to prove that it holds for all reachable states. This can be done with a model checker.

To prove (3), we could encode it as an LTL property or build a monitor automaton and use a model checker. Alternatively, we have found that we can often use a straightforward commuting-diagram approach starting from a symbolic initial state, depicted in Fig. 1. We duplicate the system, unroll it twice, then start both copies in the same symbolic state and check that applying the instructions in either order results in the same final state. This simple approach has the disadvantage that a symbolic initial state ignores reachability, which could lead to spurious counterexamples. However, notice that the initial state is constrained by enabledness assumptions: to apply an instruction, it must be enabled, so both instructions must be enabled in the initial state. We have found that these enabledness assumptions often constrain the initial state enough to rule out spurious counterexamples.
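On a small explicit STS, conditions (2) and (3) can be checked by enumerating states instead of calling a model checker. The toy system below (two counters and a reset action, with a single data value so that D is omitted) is hypothetical. Note that the commuting-diagram check above starts from a symbolic state constrained only by enabledness, whereas this sketch restricts both conditions to reachable states.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy STS with deterministic steps: counters 0 <= x <= y <= N and a reset.
N = 3
ACTIONS = ("ix", "iy", "rst")
INSTRS = [frozenset(c) for r in range(len(ACTIONS) + 1) for c in combinations(ACTIONS, r)]

def en(a, s):
    x, y = s
    return {"ix": x < y, "iy": y < N, "rst": True}[a]

def enabled(i, s):
    return all(en(a, s) for a in i)

def step(s, i):
    """Successor under instruction i (a set of actions); rst overrides ix."""
    x, y = s
    return (0 if "rst" in i else x + ("ix" in i), y + ("iy" in i))

def reachable(init=(0, 0)):
    seen, todo = {init}, deque([init])
    while todo:
        s = todo.popleft()
        for i in INSTRS:
            if enabled(i, s):
                t = step(s, i)
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
    return seen

def condition_2(i0, i1, g, states):
    """i0 cannot disable i1 under guard g (checked on the given states only)."""
    return all(enabled(i1, step(s, i0))
               for s in states if g(s) and enabled(i1, s) and enabled(i0, s))

def condition_3(i0, i1, g, states):
    """Both orders, whenever both are possible, end in the same state (given states only)."""
    for s in states:
        if not g(s):
            continue
        fwd = step(step(s, i0), i1) if enabled(i0, s) and enabled(i1, step(s, i0)) else None
        bwd = step(step(s, i1), i0) if enabled(i1, s) and enabled(i0, step(s, i1)) else None
        if fwd is not None and bwd is not None and fwd != bwd:
            return False
    return True

def always(s):
    return True

R = reachable()
ix, iy, rst = frozenset({"ix"}), frozenset({"iy"}), frozenset({"rst"})
print(condition_2(ix, iy, always, R), condition_3(ix, iy, always, R))    # True True
print(condition_2(ix, rst, always, R), condition_3(ix, rst, always, R))  # True False
```

The pair (ix, rst) fails condition (3): from (0, 1), ix then rst ends in (0, 1), while rst then ix ends in (1, 1). This is a real counterexample in the sense above, so no reordering would be added for that pair.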

If both conditions pass, then we can choose a representative order and disallow the opposite ordering for that pair of instructions. If the proof of condition (3) fails, it provides a counterexample which should either convince the user that partial order reduction does not apply for that pair of instructions (a real counterexample), or serve as a guide for the user to write guards that would remove the spurious counterexample. Other invariants of the system, either obtained automatically or manually guessed by the user, could also remove spurious counterexamples. We can now state the first theorem of synchronous partial order reduction: that these conditions guarantee guarded independence over all reachable states.

**Theorem 1.** *Given an STS* $\mathbf{S} := \langle S, \mathit{Init}, A, \mathit{En}, D, T \rangle$ *with instruction set* $I := \mathcal{P}(A)$*: if conditions* (2) *and* (3) *hold for instructions* $i_0, i_1 \in I$ *and guard* $g \in \mathcal{P}(S)$*, then* $\langle i_0, i_1, g \rangle$ *is a guarded independence tuple.*

*Proof.* Assume conditions (2) and (3) and that, for some $d_0, d_1 \in D$ and reachable $\mathbf{s} \in S^3$, we have:

$$\mathit{en}_{i_0}(s_0) \land g(s_0) \land T(\langle \langle i_1, d_1 \rangle, \langle i_0, d_0 \rangle \rangle, \mathbf{s})$$

Because $\mathit{en}_{i_0}(s_0)$, our enabledness assumption gives $\exists s', d'.\ T(s_0, \langle i_0, d' \rangle, s')$. Furthermore, by the data-input independence property of transition relations, it follows that $T(s_0, \langle i_0, d_0 \rangle, s'_1)$ for some $s'_1$. Now, because one of our assumptions is a transition from $s_0$ using $i_1$, $\mathit{en}_{i_1}(s_0)$ must be true. Since $g(s_0)$ also holds, condition (2) implies $\mathit{en}_{i_1}(s'_1)$; thus $\exists s', d'.\ T(\langle \langle i_0, d_0 \rangle, \langle i_1, d' \rangle \rangle, \langle s_0, s'_1, s' \rangle)$. As before, this implies that for some $s'_2$, we also have $T(\langle \langle i_0, d_0 \rangle, \langle i_1, d_1 \rangle \rangle, \langle s_0, s'_1, s'_2 \rangle)$. It then follows from (3) that $s'_2 = s_2$, and thus $\langle i_0, i_1, g \rangle$ satisfies the condition from Definition 2. □

Let a guarded independence relation $R \subseteq I \times I \times G$ be a set of guarded independence tuples. We now describe how to apply partial order reductions, given some R. For each $\langle i_0, i_1, g \rangle \in R$ and for every $\mathbf{s} \in S^2$ and $d_1 \in D$, whenever $T(s_0, \langle i_1, d_1 \rangle, s_1) \land \mathit{en}_{i_0}(s_0) \land g(s_0)$ holds, we remove from T every transition of the form $\langle s_1, \langle i_0, d \rangle, s \rangle$ (for any d and s). Let $T^R$ be the result. To apply this reduction in practice, we add a constraint to the BMC encoding: $(g(s_0) \land \mathit{en}_{i_0}(s_0) \land i_1) \implies \neg \mathit{next}(i_0)$.
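The BMC constraint $(g(s_0) \land \mathit{en}_{i_0}(s_0) \land i_1) \implies \neg \mathit{next}(i_0)$ can be imitated in an explicit search by remembering which instructions the previous step forbids. The sketch below applies it to a hypothetical two-counter STS with R = {⟨{ix}, {iy}, true⟩} and compares the reduced and unreduced systems: for this instance the set of reachable states is unchanged, as Theorem 2 predicts, while the number of admitted instruction sequences shrinks. All names are illustrative.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy STS (single data value): counters 0 <= x <= y <= N.
N = 3
ACTIONS = ("ix", "iy")
INSTRS = [frozenset(c) for r in range(len(ACTIONS) + 1) for c in combinations(ACTIONS, r)]

def enabled(i, s):
    x, y = s
    return ("ix" not in i or x < y) and ("iy" not in i or y < N)

def step(s, i):
    x, y = s
    return (x + ("ix" in i), y + ("iy" in i))

def successors(node, R):
    """Expand (state, forbidden): `forbidden` holds the instructions that the
    constraint (g(s0) and en_i0(s0) and i1) => not next(i0) rules out in this step."""
    s, forbidden = node
    for i in INSTRS:
        if i in forbidden or not enabled(i, s):
            continue
        blocked = frozenset(i0 for i0, i1, g in R if i == i1 and g(s) and enabled(i0, s))
        yield (step(s, i), blocked)

def reachable_states(R, init=(0, 0)):
    start = (init, frozenset())
    seen, todo = {start}, deque([start])
    while todo:
        for nxt in successors(todo.popleft(), R):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return {s for s, _ in seen}

def count_paths(R, k, init=(0, 0)):
    """Number of instruction sequences of length k admitted under the constraints."""
    layer = {(init, frozenset()): 1}
    for _ in range(k):
        nxt_layer = {}
        for node, c in layer.items():
            for nxt in successors(node, R):
                nxt_layer[nxt] = nxt_layer.get(nxt, 0) + c
        layer = nxt_layer
    return sum(layer.values())

def true_guard(s):
    return True

R = [(frozenset({"ix"}), frozenset({"iy"}), true_guard)]  # prefer ix before iy
print(reachable_states([]) == reachable_states(R))  # True: no state is lost
print(count_paths([], 4), count_paths(R, 4))        # the reduced count is smaller
```

The constraint only removes choices, so the reduced sequences are a subset of the original ones; for example, the sequence iy, iy, ix, ∅ is no longer admitted because ix directly follows iy from (0, 1), where ix was enabled.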

This makes it impossible for the system to ever execute an instruction $i_0$ immediately after an instruction $i_1$ when starting from a state where $i_0$ is enabled and $g$ holds. This effectively gives preference to $i_0$ as long as it is enabled. The effect of partial order reduction on a pair of instructions in a synchronous system is depicted in Fig. 2. Red X's show removed transitions, and for simplicity, we assume a trivial guard of *true*. Notice that all states are still reachable via some path from the initial state in the bottom left corner.
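To see the reduction at work, the sketch below enumerates bounded runs of a hypothetical FIFO toy (capacity 2, one-bit data, atomic instructions `{push}` and `{pop}`) with and without the BMC constraint $(g(s_0) \land \operatorname{en}_{i_0}(s_0) \land i_1) \implies \neg \mathit{next}(i_0)$, taking $i_0 = \{push\}$, $i_1 = \{pop\}$ and the trivial guard *true*. It is an explicit-state illustration, not the symbolic encoding used in the paper. For this toy, swapping the two instructions preserves run length, so the sets of final states agree at every exact bound, while the reduced system has fewer runs.

```python
from itertools import product

CAPACITY = 2  # hypothetical toy parameters
DATA = (0, 1)
PUSH, POP = frozenset({"push"}), frozenset({"pop"})
INSTRS = (PUSH, POP)  # atomic instructions only

def en(instr, s):
    return (("push" not in instr or len(s) < CAPACITY)
            and ("pop" not in instr or len(s) > 0))

def step(s, instr, d):
    nxt = list(s)
    if "pop" in instr:
        nxt.pop(0)
    if "push" in instr:
        nxt.append(d)
    return tuple(nxt)

def final_states(depth, pair=None):
    """Final state of every run of exactly `depth` steps from the empty FIFO.

    pair=(i0, i1) adds the POR constraint with guard true: i0 may not follow i1
    when i0 was already enabled in the state where i1 was taken.
    """
    # Frontier entries: (state, previous instruction, was i0 enabled before it?)
    frontier = [((), None, False)]
    for _ in range(depth):
        nxt = []
        for s, prev, i0_was_enabled in frontier:
            for instr, d in product(INSTRS, DATA):
                if not en(instr, s):
                    continue
                if pair and instr == pair[0] and prev == pair[1] and i0_was_enabled:
                    continue  # pruned by the POR constraint
                flag = pair is not None and en(pair[0], s)
                nxt.append((step(s, instr, d), instr, flag))
        frontier = nxt
    return [s for s, _, _ in frontier]

for k in range(1, 6):
    full, reduced = final_states(k), final_states(k, (PUSH, POP))
    assert set(full) == set(reduced)   # same states at every bound
    print(k, len(full), len(reduced))  # number of runs without / with the constraint
```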

**Theorem 2.** *Given* $\mathbf{S} := \langle S, \mathit{Init}, A, \mathit{En}, D, T \rangle$*, let* $R$ *be a guarded independence relation and let* $\mathbf{S}^R$ *be the* reduced *STS obtained by replacing* $T$ *with* $T^R$ *in* $\mathbf{S}$*. Then, if a property* $P$ *is an invariant for* $\mathbf{S}^R$*, it is also an invariant for* $\mathbf{S}$*.*

Fig. 2: Effect of Partial Order Reduction for Instructions $i_0$ and $i_1$. Initial state is green.

*Proof.* It suffices to show that $\mathbf{S}^R$ can reach all the same states as $\mathbf{S}$. We prove this by contradiction. Assume there is some $\mathit{in}, \mathbf{s}$ such that $\mathit{Init}(s_0) \land T(\mathit{in}, \mathbf{s})$, and some $0 \le j \le |\mathbf{s}| - 1$ such that $s_j$ is the first state that is unreachable in $\mathbf{S}^R$. The value of $j$ cannot be 0 or 1, because $\mathbf{S}$ and $\mathbf{S}^R$ have the same initial states and $T^R$ only excludes sequences of length 2. Then, by the definition of $T^R$, $\mathit{in}_{j-2}, \mathit{in}_{j-1}$ must be a sequence excluded by $T^R$. Conditions (2) and (3) guarantee that permuting $\mathit{in}_{j-2}$ and $\mathit{in}_{j-1}$ results in an enabled sequence that ends in the same state, $s_j$, which contradicts the assumption. Thus, there cannot be a state which is reachable in $\mathbf{S}$ but not in $\mathbf{S}^R$. □

## **4 Reduced Instruction Sets**

Now that we can apply partial order reduction to synchronous systems, our main goal is to identify a maximal guarded independence relation, R. Recall that we defined instructions as sets of atomic actions. We call an instruction containing at most one action *atomic* (this includes the instruction with *no* actions). Non-atomic instructions are *complex*. Instructions thus reflect the parallelism of synchronous hardware, and lead to natural candidates for R: pairs of atomic instructions.

Notice that the number of instructions is exponential in the number of actions ($|I| = 2^{|A|}$), so it could be prohibitively expensive to check every pair of instructions for guarded independence. In contrast, the number of *atomic* instructions is equal to the number of actions (plus one). Moreover, many complex instruction pairs are unlikely to have a guarded independence relationship because they contain common actions. Our goal in this section is to disallow as many complex instructions as possible without losing any reachable states, thereby reducing the number of pairs of instructions we need to check while also making it more likely for the checks to succeed. Note that, in isolation, removing instructions might be problematic, because it could extend the bound needed to reach a property violation. However, as we will demonstrate in the experimental section, this disadvantage is more than compensated for when it is applied in combination with partial order reduction.
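The counting argument above can be made concrete with a short Python sketch using three hypothetical action names: enumerating all subsets gives the full instruction set, and filtering by cardinality gives the atomic ones. The quadratic growth of the number of ordered pairs is what makes restricting the POR checks to atomic pairs attractive.

```python
from itertools import combinations

def instructions(actions):
    """Every instruction is a subset of the actions, so there are 2**len(actions)."""
    return [frozenset(c) for r in range(len(actions) + 1)
            for c in combinations(actions, r)]

def is_atomic(instr):
    return len(instr) <= 1  # includes the empty instruction

actions = ["push", "pop", "req"]  # hypothetical action names
all_instrs = instructions(actions)
atomic = [i for i in all_instrs if is_atomic(i)]
print(len(all_instrs), len(atomic))            # 8 4
print(len(all_instrs) ** 2, len(atomic) ** 2)  # 64 16 (ordered pairs to check)
```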

Given an STS with instruction set $I$, we seek a reduced instruction set, $I_r \subseteq I$, which preserves the reachable states of the system. Let $\mathit{Input}_r$ be the set of inputs which only use instructions from $I_r$. Given an input $\mathit{in} \in \mathit{Input}$, our goal is to prove the existence of a witness $w(\mathit{in}) \in \mathit{Input}_r^n$ (for some $n > 0$) that simulates the behavior of $\mathit{in}$ using only reduced instructions. Formally, the witness function $w$ should satisfy:

$$\begin{aligned} \forall s, s' \in S,\ \mathit{in} \in \mathit{Input}.\; & T(s, \mathit{in}, s') \implies \\ & \exists n \in \mathbb{N},\ \mathbf{s} \in S^n.\; T(w(\mathit{in}), s : \mathbf{s}) \land (s_{n-1} = s') \end{aligned} \tag{4}$$

In other words, we need to show that for every input over the original instruction set, there exists a sequence of inputs using only instructions from the reduced instruction set (RIS) that results in the same final state. A witness function that also depended on the state would be more general, but for our purposes, it is sufficient for the witness function to depend only on the input.

#### **4.1 Atomic instruction sets**

The condition in (4) is quite general and does not provide any intuition on how to choose w. Here, we focus on a specific case where w is easy to construct: we choose I<sup>r</sup> to be an *atomic instruction set*, defined as an instruction set containing only atomic instructions. We then must prove that the set of reachable states is not affected by restricting the instructions to those in Ir.

It is sufficient to prove that for each complex instruction, we can remove one of its actions and perform that action in the next step, with the same result. For some complex instruction $i$ containing $a$ and some data input $d$, let $w^a(\langle i, d \rangle)$ be $\langle \langle i - \{a\}, d \rangle, \langle \{a\}, d \rangle \rangle$. We must show that for each input $\mathit{in}$ containing a complex instruction, there exists some $a$ where $w^a(\mathit{in})$ has the equivalent effect on the system as $\mathit{in}$. Formally, the requirement is:

$$\begin{aligned} \forall i \in \mathcal{I} \setminus \mathcal{I}_r,\ d \in D,\ \mathbf{s} \in S^2.\; & T(s_0, \langle i, d \rangle, s_1) \implies \\ & \exists\, a \in i,\ \mathbf{s}' \in S^3.\; T(w^a(\langle i, d \rangle), \mathbf{s}') \land s_0 = s'_0 \land s_1 = s'_2 \end{aligned} \tag{5}$$
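On a tiny explicit system, the witness $w^a$ can be tried by brute force. The sketch below uses a hypothetical two-register system (2-bit registers `x` and `y`, every action always enabled, no meaningful data) with actions `incx` (x := x + 1) and `copy` (y := x, reading the value x had before the step, as a clocked register would). It looks for actions $a$ such that $w^a$ reproduces the complex instruction from every state. This per-instruction choice of $a$ is stronger than (5), where $a$ may depend on the state, but it shows why the existential matters: delaying `incx` works, while delaying `copy` does not.

```python
from itertools import product

# Hypothetical toy: two 2-bit registers (x, y); every action is always enabled.
STATES = list(product(range(4), repeat=2))
DATA = (0,)  # the data input plays no role in this toy

def step(s, instr, d):
    """One synchronous step; copy reads the value x had *before* the step."""
    x, y = s
    return ((x + 1) % 4 if "incx" in instr else x,
            x if "copy" in instr else y)

def w(a, instr, d):
    """Witness w^a(<i, d>) = <<i - {a}, d>, <{a}, d>>: delay action a by one step."""
    return [(instr - {a}, d), (frozenset({a}), d)]

def run(s, inputs):
    for instr, d in inputs:
        s = step(s, instr, d)
    return s

def delayable_actions(instr):
    """Actions a whose witness w^a matches <i, d> from every state and data value."""
    return {a for a in instr
            if all(run(s0, w(a, instr, d)) == step(s0, instr, d)
                   for s0, d in product(STATES, DATA))}

print(delayable_actions(frozenset({"incx", "copy"})))  # {'incx'}
```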

Condition (5) is still difficult to prove because of the existential quantifier. One conservative approach is to replace the existential quantifier with a universal quantifier and attempt to prove that stronger condition. For real systems, this is unlikely to hold. Instead, we propose a counterexample blocking procedure which, if it succeeds, guarantees (5). We introduce symbolic values for $i$, $d$, and $a$ and then iteratively add constraints over them until the proof succeeds or we have enumerated all possibilities. This algorithm is a specialized ∀∃ decision procedure that exploits the structure of (5) and additional domain knowledge about the proof goal. We use a constraint solver as an oracle.

Fig. 3: Equivalence of original and reduced instruction sequences. Circles represent states.

#### **Algorithm 1 ProveRIS(S)**

1. $\mathbf{S}' := \langle S', \mathit{Init}', A', \mathit{En}', D', T' \rangle \leftarrow$ copy_sys($\mathbf{S}$)
2. $I := \mathcal{P}(A)$, $I' := \mathcal{P}(A')$ // instruction sets are power sets of actions
3. **var** $i : I$, **var** $i' : I'$, **var** $a : A$
4. **var** $\mathbf{s} : S^2$, **var** $\mathbf{s}' : S'^3$, **var** $d : D$, **var** $d' : D'$
5. add_constraint($s_0 = s'_0 \land d = d' \land i' = i - \{a\}$)
6. add_constraint($T(\langle i, d \rangle, \mathbf{s}) \land T'(\langle \langle i', d' \rangle, \langle \{a\}, d' \rangle \rangle, \mathbf{s}')$)
7. **for** $c = 2 \ldots |A|$ **do**
8. **while** check_sat($|i| = c \land s_1 \neq s'_2$) **do**
9. $\mu \leftarrow$ get_model()
10. $i_\mu \leftarrow$ assignment($\mu, i$)
11. $a_\mu \leftarrow$ assignment($\mu, a$)
12. add_constraint($i_\mu \subseteq i \implies a \neq a_\mu$)
13. **if** $\neg$check_sat($i = i_\mu$) **then**
14. **return** false // exhausted all possible decompositions for this instruction
15. **end if**
16. **end while**
17. **end for**
18. **return** true // every instruction can be decomposed

Algorithm 1 takes an STS, $\mathbf{S} := \langle S, \mathit{Init}, A, \mathit{En}, D, T \rangle$, and returns true if the instruction set can be decomposed into an atomic instruction set by delaying a single action from each instruction.<sup>2</sup> For simplicity, the algorithm assumes (and we check this assumption separately) that if a complex instruction $i$ is enabled, then for each $a \in i$, executing $i - \{a\}$ results in a state where $a$ is enabled.

<sup>2</sup> We also implemented a more general version of this algorithm which can drop more than one action at a time from the instruction i, but this simpler version is sufficient for the results we report in this paper.

Formally:

$$\forall i \in \mathcal{I} \setminus \mathcal{I}_r,\ d \in D,\ \mathbf{s} \in S^2,\ a \in i.\; \operatorname{en}_i(s_0) \land T(s_0, \langle i - \{a\}, d \rangle, s_1) \implies \operatorname{en}_a(s_1) \tag{6}$$

Note that this is only a slight generalization of the property that atomic instructions do not disable each other, a condition that we will need anyway in order to apply partial order reduction to the atomic instruction set (see condition (2)).
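Condition (6) quantifies over finitely many instructions and actions, so on a small explicit system it can be checked by enumeration. The sketch below is illustrative: it assumes (our simplification) that an instruction is enabled exactly when all of its actions are, so $i - \{a\}$ is enabled whenever $i$ is. It runs on two hypothetical toys: a capacity-3 FIFO, where (6) holds, and a lock that two clients may try to grab in the same cycle, where executing one client's grab first disables the other, delayed grab.

```python
from itertools import combinations, product

def condition_6_counterexample(actions, states, data, en_action, step):
    """Enumerative check of (6): return (i, s0, d, a) violating it, or None.

    Assumes an instruction is enabled exactly when all of its actions are.
    """
    complex_instrs = [frozenset(c) for r in range(2, len(actions) + 1)
                      for c in combinations(actions, r)]
    for i, s0, d in product(complex_instrs, states, data):
        if not all(en_action(b, s0) for b in i):
            continue
        for a in sorted(i):
            s1 = step(s0, i - {a}, d)
            if not en_action(a, s1):
                return i, s0, d, a
    return None

# Hypothetical FIFO toy: push needs room, pop needs data.
CAP = 3
fifo_states = [c for n in range(CAP + 1) for c in product((0, 1), repeat=n)]

def fifo_en(a, s):
    return len(s) < CAP if a == "push" else len(s) > 0

def fifo_step(s, instr, d):
    nxt = list(s)
    if "pop" in instr:
        nxt.pop(0)
    if "push" in instr:
        nxt.append(d)
    return tuple(nxt)

# Hypothetical lock: 0 = free, k = held by client k; simultaneous grabs favour client 1.
def lock_en(a, s):
    return s == 0

def lock_step(s, instr, d):
    return 1 if "grab1" in instr else 2 if "grab2" in instr else s

print(condition_6_counterexample(["push", "pop"], fifo_states, (0, 1), fifo_en, fifo_step))  # None
print(condition_6_counterexample(["grab1", "grab2"], [0, 1, 2], (0,), lock_en, lock_step))  # a = 'grab1' fails
```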

The algorithm first creates an identical copy of the STS in line 1. Lines 2–4 set up symbolic variables for the instructions, data, and states of each system. Line 5 adds constraints to the solver enforcing that both systems start in the same state, use the same data, and that $i'$ is $i$ with the symbolic action $a$ dropped. Line 6 adds the transition relation constraint for each STS. The initial symbolic setup is depicted in Fig. 3.

The outer loop at line 7 iterates over all possible complex instruction cardinalities. For each cardinality $c$, the inner loop starting at line 8 attempts to show that instructions of that cardinality can be decomposed by delaying one action (symbolically represented by $a$). If all instructions of cardinality $c$ have been decomposed, then the while loop condition is false and the outer loop continues. Otherwise, the algorithm gets variable assignments from the constraint solver in lines 9–11 and learns a constraint at line 12 that prevents this particular action, $a_\mu$, from being chosen for decomposition again. To ensure that we have not blocked all possible actions, there is an additional check at line 13, which returns false in the case that no action can be delayed for the current instruction.

Importantly, the algorithm assumes that if the delay of action $a_\mu$ does not create a valid witness sequence for a given complex instruction $i_\mu$, then the same is true *whenever* the instruction $i$ includes $i_\mu$. We call this a *monotonicity* assumption, and it typically holds when actions are somewhat independent. The monotonicity assumption motivated the current structure of the algorithm and can significantly reduce the number of iterations in the algorithm. We can remove this assumption by changing $i_\mu \subseteq i$ to $i_\mu = i$ in the antecedent in line 12. Note that the monotonicity assumption does not make the algorithm unsound: if it returns true, then (as we prove below) condition (5) holds. However, if the algorithm returns false, then it may be that the version without the assumption would return true. For each of our experiments, we were able to get a true result with the monotonicity assumption.
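The counterexample-blocking loop can be mimicked without a constraint solver by enumerating a small explicit system. The sketch below is a hypothetical stand-in for Algorithm 1, not the CoSA implementation: exhaustive search over instructions, actions, states and data replaces `check_sat`/`get_model`; a learned constraint is stored as a pair $(i_\mu, a_\mu)$ and applied under the same monotonicity assumption (it blocks $a_\mu$ for every instruction containing $i_\mu$); and the check at line 13 becomes "is any action of $i_\mu$ still unblocked?". It assumes every action is always enabled, so the transition constraints of line 6 always hold.

```python
from itertools import combinations, product

def prove_ris(actions, states, data, step):
    """Enumerative analogue of ProveRIS for a system whose actions are always enabled."""
    blocked = []  # learned pairs (i_mu, a_mu): never delay a_mu if i contains i_mu

    def is_blocked(instr, a):
        return any(i_mu <= instr and a == a_mu for i_mu, a_mu in blocked)

    def counterexample(c):
        """Analogue of check_sat(|i| = c and s1 != s2'): a failing (i, a), or None."""
        for instr in map(frozenset, combinations(actions, c)):
            for a in sorted(instr):
                if is_blocked(instr, a):
                    continue
                for s0, d in product(states, data):
                    s1 = step(s0, instr, d)
                    s2_prime = step(step(s0, instr - {a}, d), frozenset({a}), d)
                    if s1 != s2_prime:
                        return instr, a
        return None

    for c in range(2, len(actions) + 1):
        while (cex := counterexample(c)) is not None:
            i_mu, a_mu = cex
            blocked.append((i_mu, a_mu))
            if all(is_blocked(i_mu, a) for a in i_mu):
                return False  # no action of i_mu can be delayed any more
    return True

# Hypothetical toys over two 2-bit registers (x, y); data plays no role.
STATES = list(product(range(4), repeat=2))

def incx_copy(s, instr, d):
    x, y = s
    return ((x + 1) % 4 if "incx" in instr else x, x if "copy" in instr else y)

def swap(s, instr, d):
    x, y = s
    return (y if "a" in instr else x, x if "b" in instr else y)

print(prove_ris(["incx", "copy"], STATES, [0], incx_copy))  # True: delay incx
print(prove_ris(["a", "b"], STATES, [0], swap))             # False: a simultaneous swap
```

On the first toy the loop learns one blocking constraint (delaying `copy` fails) and then finds no further counterexample; on the swap toy both actions get blocked and the check corresponding to line 13 returns false. The walrus operator requires Python 3.8 or later.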

Because the algorithm does not consider state reachability and looks for a witness function that only depends on inputs, it can still return false when an equivalent sequence might exist for reachable states. In such cases, users can examine the constraint solver models and attempt to remove some of them by proving other invariants.<sup>3</sup>

<sup>3</sup> This was rarely necessary in our experiments. Our implementation also extended the algorithm to support predicate abstraction, which could also rule out spurious counterexamples, but this feature was never needed in our experiments.

If Algorithm 1 returns true, we replace $T$ with $T_r$, where $T_r$ is the result of removing from $T$ all transitions $\langle s, \langle i, d \rangle, s' \rangle$ where $|i| > 1$. Practically, this is achieved by adding a disjunctive constraint over the possible atomic actions. We can now state the main results for reduced instruction sets.
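For concreteness, here is one way such a disjunctive constraint could be written. It assumes (our assumption, not necessarily the exact encoding used in the implementation) that each action $a \in A$ is represented by a Boolean variable that is true exactly when $a$ is part of the current instruction. The constraint then admits exactly the $|A| + 1$ atomic instructions:

$$\Big(\bigwedge_{b \in A} \neg b\Big) \;\lor\; \bigvee_{a \in A} \Big(a \land \bigwedge_{b \in A \setminus \{a\}} \neg b\Big)$$

If the empty instruction can also be removed as a stutter step, the first disjunct can be dropped.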

**Theorem 3.** *Let* $\mathbf{S}$ *be an STS. If condition* (6) *holds and* ProveRIS($\mathbf{S}$) *returns true, then* (5) *holds.*

*Proof.* We maintain the loop invariant at line 8 that for every instruction $\hat{\imath}$ of cardinality $c$, there is some action $\hat{a}$ such that *check_sat*$(|i| = c \land i = \hat{\imath} \land a = \hat{a})$ is true. It is true initially for each $c$ by condition (6). Afterwards, the check on line 13 ensures that it is maintained. Furthermore, the check on line 8 ensures that when the while loop is exited, any satisfying assignment for *check_sat*$(|i| = c)$ is such that $s_1 = s'_2$. Together, these conditions guarantee that (5) holds. □

**Theorem 4.** *Let* $\mathbf{S} := \langle S, \mathit{Init}, A, \mathit{En}, D, T \rangle$ *be an STS such that condition* (6) *holds and* ProveRIS($\mathbf{S}$) *returns true, and let* $T_r$ *be the transition relation for the reduced instruction set. Let* $\mathbf{S}_r$ *be the* reduced *STS obtained by replacing* $T$ *with* $T_r$ *in* $\mathbf{S}$*. Then, a safety property* $P \subseteq S$ *is an invariant for* $\mathbf{S}_r$ *if and only if it is also an invariant for* $\mathbf{S}$*.*

*Proof.* It suffices to show that the reachable states of $\mathbf{S}$ and $\mathbf{S}_r$ are identical. $\mathit{Init}$ does not change, so the initial states cannot be different. Furthermore, since $T_r$ is obtained by removing transitions from $T$, $\mathbf{S}_r$ cannot add any reachable states. To show that it also does not remove any reachable states, consider an arbitrary trace $\mathit{Init}(s_0) \land T(\mathit{in}, \mathbf{s})$ with $|\mathbf{s}| = n$; we must show $\exists\, \mathit{in}', m, \mathbf{s}' \in S^m.\; \mathit{Init}(s'_0) \land T_r(\mathit{in}', \mathbf{s}') \land s_{n-1} = s'_{m-1}$. We prove by induction on $c$ that this holds whenever $\mathit{in}$ contains only instructions of cardinality at most $c$.

In the base case, $c = 1$, so all instructions are of size one or less. All of these are already atomic, and thus we can take $\mathit{in}' = \mathit{in}$ and $\mathbf{s}' = \mathbf{s}$ by the definition of $T_r$.

For the inductive step, suppose that it holds for cardinalities up to $c-1$, and assume $\mathit{Init}(s_0) \land T(\mathit{in}, \mathbf{s})$ with $|\mathbf{s}| = n$. Let $\mathit{in}_j = \langle i, d \rangle$ be an input containing an instruction of size at most $c$. If $|i| < c$, there is nothing to be done. Thus we only consider the case where $|i| = c$. We know that $T(s_j, \mathit{in}_j, s_{j+1})$ holds. By Theorem 3 and condition (5), it follows that $T(\langle \langle i - \{a\}, d \rangle, \langle \{a\}, d \rangle \rangle, \langle s_j, \hat{s}, s_{j+1} \rangle)$ holds for some $a$ and $\hat{s}$. We can thus replace $\mathit{in}_j$ in $\mathit{in}$ by $\langle i - \{a\}, d \rangle$ followed by $\langle \{a\}, d \rangle$ to obtain an input sequence $\mathit{in}^c$, and insert $\hat{s}$ between $s_j$ and $s_{j+1}$ in $\mathbf{s}$ to obtain $\mathbf{s}^c$ with final state $s_{n-1}$ such that $\mathit{Init}(s_0) \land T(\mathit{in}^c, \mathbf{s}^c)$. Repeating this process for each input containing an instruction of size $c$ yields a final $\mathit{in}^c$ such that the maximum cardinality of any instruction is $c-1$. The property then holds by the inductive hypothesis. □
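The constructive content of this induction can be sketched on a small example. The Python below is illustrative only: it uses a hypothetical three-register system (`incx`: x := x + 1, `copy`: y := x read before the step, `incz`: z := z + 1, all always enabled). It rewrites a trace from the largest instruction cardinality downwards, splitting each instruction of the current cardinality $c$ into $\langle i - \{a\}, d \rangle$ followed by $\langle \{a\}, d \rangle$ for an $a$ found by search in the current state. The final state is unchanged while the trace gets longer, which is the bound extension that RIS can cause.

```python
# Hypothetical toy: registers (x, y, z) mod 4; all actions always enabled.
def step(s, instr, d):
    x, y, z = s
    return ((x + 1) % 4 if "incx" in instr else x,
            x if "copy" in instr else y,  # copy reads x from before the step
            (z + 1) % 4 if "incz" in instr else z)

def run(s, inputs):
    for instr, d in inputs:
        s = step(s, instr, d)
    return s

def split_once(s, instr, d):
    """Return [<i - {a}, d>, <{a}, d>] for some a that reproduces the step from s."""
    target = step(s, instr, d)
    for a in sorted(instr):
        pair = [(instr - {a}, d), (frozenset({a}), d)]
        if run(s, pair) == target:
            return pair
    raise ValueError(f"no single delayed action reproduces {sorted(instr)} from {s}")

def decompose(s0, trace):
    """Split instructions of cardinality c = max, ..., 2 in turn, as in the induction."""
    trace = list(trace)
    top = max((len(i) for i, _ in trace), default=0)
    for c in range(top, 1, -1):
        out, s = [], s0
        for instr, d in trace:
            out.extend(split_once(s, instr, d) if len(instr) == c else [(instr, d)])
            s = step(s, instr, d)  # the split pair ends in this same state
        trace = out
    return trace

S0 = (0, 0, 0)
TRACE = [(frozenset({"incx", "copy", "incz"}), 0), (frozenset({"incx", "copy"}), 0)]
atomic = decompose(S0, TRACE)
print(len(TRACE), len(atomic))            # 2 5
print(run(S0, TRACE) == run(S0, atomic))  # True
```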

Note that if there is some instruction $i \in I$ which cannot be decomposed into atomic instructions, we could always keep this instruction in $I_r$ and still benefit from removing other complex instructions. In many cases, we can also remove the empty instruction, $i_e = \emptyset$. If applying $i_e$ cannot change the state of the system, regardless of the data input, then it is considered a *stutter step* [14]. It is straightforward to check whether $i_e$ can be removed by comparing the state before and after applying $i_e$.
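On an explicit model, the stutter-step check described above amounts to a single comparison per state and data value. In the sketch below, `is_stutter` and both example systems are hypothetical: a FIFO that only changes on push or pop passes the check, whereas a free-running cycle counter, which advances on every clock edge even when no action is taken, does not.

```python
from itertools import product

def is_stutter(states, data, step):
    """True if the empty instruction never changes the state, for any data input."""
    empty = frozenset()
    return all(step(s, empty, d) == s for s, d in product(states, data))

def fifo_step(s, instr, d):
    nxt = list(s)
    if "pop" in instr:
        nxt.pop(0)
    if "push" in instr:
        nxt.append(d)
    return tuple(nxt)

def counter_step(s, instr, d):
    return (s + 1) % 8  # advances regardless of the instruction

print(is_stutter([(), (0,), (1, 0)], (0, 1), fifo_step))  # True
print(is_stutter(range(8), (0,), counter_step))           # False
```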

## **5 Experimental Results**

We developed a prototype flow for proving the POR and RIS conditions and applying the necessary constraints. We use the IC3/PDR implementation in ABC [11], pdr, to prove condition (6) (which implies condition (2)). This requires manually writing a Verilog property for each atomic instruction.<sup>4</sup> We implemented the **ProveRIS** algorithm in our SMT-based model checker, CoSA [26], configured with boolector [29] on the smtcomp19 branch, using CaDiCaL [4] as the underlying SAT solver.<sup>5</sup> We check the commuting diagram for condition (3) in CoSA as well. It tries the trivial guard *true* by default, and allows the user to provide additional candidate guards if necessary. The setup for proofs in CoSA is automated based on user-provided annotations for the actions and enable conditions. We report our best results, which used an encoding that leverages the SMT theory of arrays to represent memories when proving conditions, and a pure bitvector encoding for bounded model checking.

Our flow applies the following steps: i) read in a system description in Verilog using Yosys [38] and generate AIGER [7] for ABC (or BTOR2 [29] for other tools); ii) check condition (6) for each atomic instruction; iii) run the **ProveRIS** algorithm, and if it returns true, add constraints to rule out all but atomic instructions; and iv) check POR condition (3) for each pair of atomic instructions and add constraints for each passing pair of instructions with the associated guard. Each step depends on the previous step passing successfully. In each of our experiments described below, we successfully completed every step of this flow, though in some cases guards were required in step (iv). For POR and RIS runtimes, we always include the time to check the conditions. We tried running with POR alone, but it resulted in negligible improvements in runtime and thus we omit these results. This demonstrates the importance of RIS. We ran all experiments on a 3.5GHz Intel Xeon CPU with 16GB of RAM.

#### **5.1 Motivating Example**

First, we return to our motivating example. We compare the time to reach the bug using the SAT-based ABC [11] engines pdr and bmc, and SMT-based bounded model checking using btormc [29] and CoSA. We ran the SMT-based model checkers both with and without the SMT theory of arrays for the encoding of the memory. Both btormc and CoSA without the array encoding were able to reach the bug in 1230s and 1437s, respectively, but all other approaches timed out at two hours. In particular, pdr times out at 2 hours on the property, but can prove condition (6) for every atomic instruction in less than a second. Intuitively, this makes sense because the enabledness conditions do not involve data or mem. Thus, none of the datapath falls in the cone of influence, leaving only control logic for IC3 to reason about. The remaining conditions, (3) and (5), are proven in less than three seconds. Since all the conditions pass, we apply the POR and RIS constraints, which reduces the time to hit the bug from 1437s to 46s in CoSA, including the time to check the conditions.

<sup>4</sup> This could be automated based on user-tagged actions and user-provided enable conditions.

<sup>5</sup> GitHub commit hashes for tools: Boolector/Btormc: 1989080261235f33e344cbd095e70a337c45bd16; CoSA: ff3c8cee1f0834c03167b2a8ecdd1223031312b3; PySMT: 09dc303185812149550110123ad266326beb1179; Yosys: a4b59de5d48a89ba5e1b46eb44877a91ceb6fa44; ABC: 5776ad07e7247993976bffed4802a5737c456782

Table 1: Number Solved and Average Runtime

Fig. 4: Runtime Comparison

#### **5.2 Packet Movers**

We now evaluate our approach on data integrity properties for a variety of packet-mover circuits. Data integrity is a safety property that ensures no packets are dropped or corrupted. In practice, data integrity is often checked by instantiating a monitor, called a *scoreboard*. It provides the necessary infrastructure for formal verification. In our case, it non-deterministically tags a *magic packet* and checks that this packet exits the system when it should. Crucially, the scoreboard is a reusable module which can check data integrity of arbitrary packet movers.

Notice that existing symmetry reduction techniques will not be very effective for this scoreboard setup. For example, consider a circular pointer FIFO which maintains two incrementing pointers that index a memory for reading and writing, respectively. We cannot use scalarsets to break symmetries in the memory addresses because the pointers index the memory and are involved in arithmetic, breaking the syntactic requirements for scalarsets [28]. Furthermore, sequential memory abstraction [9] could reduce the size of the memory, but does not address the path symmetry. In addition, both these symmetry reduction techniques are focused on proofs, not bug-finding.

We evaluate our approach on two commercial library components from a major hardware company. We also implemented simpler, open-source versions of these designs. Our open-source benchmarks include: i) a circular pointer FIFO which assumes power-of-two depth but is instantiated with a non-power-of-two depth (one greater than the provided parameter); ii) a shift register FIFO which does not properly add data to the last register in the pipeline; and iii) 2–5 correct circular pointer FIFOs in parallel with a non-deterministic arbiter and credit counters for managing data flow. The reset state of the credit counter has one too many credits, so data can be pushed to a full FIFO. The single FIFOs have two actions each: one for pushing data, and one for popping data. For the arbitrated circuits, there is a separate action for pushing data onto each FIFO as well as a single request action which is enabled whenever any FIFO is non-empty. There is an inherent symmetry in all of these designs. Consider any of the FIFOs. There are two main actions: pushing data (which is enabled if the FIFO is not full); and popping data (which is enabled if the FIFO is not empty). In a state where both are enabled, pushing data followed by popping results in the same state as popping and then pushing the same data. Furthermore, the actions can be performed simultaneously, but requiring that they are performed separately should not change the reachable states (depending on the implementation), so RIS is applicable.
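The push/pop symmetry described above can be checked exhaustively on a small model. The sketch below is a hypothetical model of a correct circular pointer FIFO (memory tuple, read pointer, write pointer, occupancy count; depth 4 and one-bit data are arbitrary choices), not one of the benchmark designs. Over all reachable states where both actions are enabled, it checks that the two orders commute (the POR side) and that performing both in one cycle equals performing them in sequence (the RIS side).

```python
from itertools import product

DEPTH = 4      # hypothetical FIFO depth
DATA = (0, 1)  # hypothetical one-bit data

# State: (memory contents, read pointer, write pointer, occupancy count).
def can_push(s):
    return s[3] < DEPTH

def can_pop(s):
    return s[3] > 0

def write(mem, wr, d):
    return mem[:wr] + (d,) + mem[wr + 1:]

def push(s, d):
    mem, rd, wr, cnt = s
    return (write(mem, wr, d), rd, (wr + 1) % DEPTH, cnt + 1)

def pop(s):
    mem, rd, wr, cnt = s
    return (mem, (rd + 1) % DEPTH, wr, cnt - 1)

def push_and_pop(s, d):
    """Both actions in the same cycle: write at wr, advance both pointers."""
    mem, rd, wr, cnt = s
    return (write(mem, wr, d), (rd + 1) % DEPTH, (wr + 1) % DEPTH, cnt)

def reachable():
    init = ((0,) * DEPTH, 0, 0, 0)
    seen, todo = {init}, [init]
    while todo:
        s = todo.pop()
        succ = [pop(s)] if can_pop(s) else []
        if can_push(s):
            succ += [push(s, d) for d in DATA]
        for t in succ:
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen

def check():
    for s, d in product(reachable(), DATA):
        if can_push(s) and can_pop(s):
            assert pop(push(s, d)) == push(pop(s), d)     # POR: the two orders commute
            assert push_and_pop(s, d) == push(pop(s), d)  # RIS: one cycle = two cycles
    return True

print(check())  # True
```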

Our experiments vary both the parameterizable data width and depth of the packet movers, by sweeping all powers of two between 2 and 128. All benchmarks contain injected bugs and reach the bug at a deep bound relative to the depth. We used a timeout of 4 hours. We use our prototype flow for checking the conditions and CoSA for bounded model checking.<sup>6</sup> For condition (3), we had to write one guard which is true whenever the scoreboard counter is greater than zero to handle an edge case. This same guard was used for every design, but an appropriate invariant relating the scoreboard counter to the internal state of the system being verified would also have worked. The open-source shift register FIFO required one more guard about the number of stored elements. We obtained both guards by observing counterexamples.

Table 1 compares the number of solved instances (49 total per row) within the timeout and the average runtime of commonly solved instances in seconds. Columns marked "PR" used the POR and RIS constraints. We additionally use the following abbreviations: "com" for commercial, "cp" for circular pointer, "sr" for shift register and "arb" for arbitrated. In Fig. 4 we plot the actual runtime on a log-scale for all the benchmarks with and without POR and RIS. The dotted lines show 10x and 100x improvements.
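Our reading of the "49 total per row" figure is that data width and depth each range independently over the powers of two from 2 to 128, giving a 7 × 7 grid of parameter settings. A minimal Python sketch of that sweep:

```python
powers = [2 ** k for k in range(1, 8)]  # 2, 4, ..., 128
grid = [(width, depth) for width in powers for depth in powers]
print(powers)     # [2, 4, 8, 16, 32, 64, 128]
print(len(grid))  # 49
```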

**Analysis.** There is a cluster of points in the bottom left of Fig. 4 which are solved extremely quickly by both approaches, but slightly faster without POR and RIS. These are results on benchmarks with very small parameter values, where the bug occurs at a low depth, and so the POR and RIS results are dominated by the time taken to check the conditions. However, as the parameter sizes, and runtimes, increase, it is clear that POR and RIS can result in exponential speed ups.

Recall that one concern is that RIS could extend the bound needed to reach the bug. In the shift register and arbitrated FIFO systems, it extended the bound by a few steps. However, for the bug in the open-source circular pointer FIFO, it doubled the bound needed to reach the bug. Regardless, this was more than compensated for by the symmetry-breaking of POR, as evidenced by the faster times to reach the bug. The deepest bound was 260, which occurred at FIFO depth 129.

<sup>6</sup> Note: CoSA's bounded model checking performance is comparable to commercial model checkers on these benchmarks.

It is interesting to note that encoding the transition systems to SMT using the theory of arrays was always slower for bounded model checking, but was noticeably faster for checking RIS and POR conditions. Perhaps this is because the state comparison is easier for the solver to reason about using array extensionality [23].

We have demonstrated that these techniques work well for packet movers. In part, this is because packet movers are often well-constrained by their environmental assumptions, and their behavior is largely independent of incoming data values. Furthermore, we typically expect the POR and RIS conditions to hold for a correct packet-mover implementation, so a failure in a condition could identify a bug.

## **6 Related Work**

Various techniques have been employed to accelerate bounded model checking. The authors of [19] use BDDs to accelerate BMC, and the techniques introduced in [35,36] exploit the structure of BMC queries to help the SAT solver. The authors of [18] take advantage of structural information with an SMT framework tailored for BMC. Our technique is similar in that we speed up bounded model checking by adding constraints to the transition system, but we obtain constraints using partial order reduction analysis.

Wang et al. [37] pioneered partial order reduction for symbolic software model checking, guaranteeing optimal reduction for two threads. Their follow-up paper, [21], extended this framework to find the optimal reduction for any number of threads. We adapted their symbolic POR technique for synchronous hardware model checking, and developed reduced instruction sets to improve the efficacy of POR in this new domain. Bhattacharya et al. used a SAT solver to directly check guarded independence conditions (as opposed to checking syntactic properties) for asynchronous rule-based languages [3]. We also check conditions directly, but in a synchronous setting.

The techniques developed by McMillan, *temporal case splitting* and *path splitting* [28], provide a framework for splitting on possible values at a given timestep. These approaches deal with system executions, but still rely on breaking data symmetries for performance. In contrast, our techniques focus on mitigating path symmetries.

The work of Bengtsson et al. [2] extended POR to timed automata using a local-time desynchronization of clocks, followed by resynchronization with an added global clock. Similarly, our techniques adapt POR by modifying the system. However, our approach targets a different domain, and only modifies the original system by adding constraints.

## **7 Conclusion**

We have presented a set of conservative conditions over transition systems and automated techniques for proving these conditions. If the conditions can be proved, then constraints can be added to the system that break path symmetries. We evaluated our approach on parameterized open-source and commercial packet-mover circuits and demonstrated significant improvements in bounded model checking performance.

Some potential future work includes improvements to the **ProveRIS** procedure, investigating applications of partial order reduction to sequentially-composed packet movers, developing more targeted condition proofs by associating actions with particular data inputs, and building an interactive tool which helps the user identify and manage reduced instruction sets and partial order reductions.

## **8 Data Availability Statement**

The experimental results and the necessary software for reproducing results in a standard Ubuntu 18.04 installation are available in the Figshare repository: https://doi.org/10.6084/m9.figshare.11874687.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Revisiting Underapproximate Reachability for Multipushdown Systems**<sup>⋆</sup>

S. Akshay<sup>1</sup>, Paul Gastin<sup>2</sup>, S Krishna<sup>1</sup>, and Sparsa Roychowdhury<sup>1</sup>

<sup>1</sup> IIT Bombay, Mumbai, India <sup>2</sup> ENS Paris-Saclay, Paris, France

**Abstract** Boolean programs with multiple recursive threads can be captured as pushdown automata with multiple stacks. This model is Turing complete, and hence, one is often interested in analyzing a restricted class that still captures useful behaviors. In this paper, we propose a new class of bounded underapproximations for multi-pushdown systems, which subsumes most existing classes. We develop an efficient algorithm for solving the under-approximate reachability problem, which is based on efficient fix-point computations. We implement it in our tool BHIM and illustrate its applicability by generating a set of relevant benchmarks and examining its performance. As an additional takeaway, BHIM solves the binary reachability problem in pushdown automata. To show the versatility of our approach, we then extend our algorithm to the timed setting and provide the first implementation that can handle timed multipushdown automata with closed guards.

**Keywords:** Multipushdown Systems · Underapproximate Reachability · Timed pushdown automata.

## **1 Introduction**

The reachability problem for pushdown systems with multiple stacks is known to be undecidable. However, multi-stack pushdown automata (MPDA hereafter) represent a theoretically concise and analytically useful model of multi-threaded recursive programs with shared memory. As a result, several previous works in the literature have proposed different under-approximate classes of behaviors of MPDA that can be analyzed effectively, such as Round Bounded, Scope Bounded, Context Bounded and Phase Bounded [18,19,27,14,20,28]. From a practical point of view, these underapproximations have led to efficient tools, including GetaFix [21] and SPADE [23]. It has also been argued (e.g., see [24]) that such bounded underapproximations suffice to find several bugs in practice. In many such tools, efficient fix-point techniques are used to speed up computations.

© The Author(s) 2020

<sup>⋆</sup> Partly supported by UMI ReLaX, DST/CEFIPRA/INRIA project EQuaVe & TCS.

https://doi.org/10.1007/978-3-030-45190-5_21 A. Biere and D. Parker (Eds.): TACAS 2020, LNCS 12078, pp. 387–404, 2020.

We extend known fix-point based approaches by developing a new algorithm that can handle a larger class of bounded underapproximations than the well-known bounded context and bounded scope underapproximations for multi-pushdown systems, while remaining efficiently implementable. Our algorithm works for a new class of underapproximate behaviors called hole bounded behaviors, which subsumes context/scope bounded underapproximations and is orthogonal to phase bounded underapproximations. A "hole" is a maximal sequence of push operations of a fixed stack, interspersed with well-nested sequences of any stack. Thus, in a sequence $\alpha = \beta\gamma$ where $\beta = [\mathit{push}_1(\mathit{push}_2\mathit{push}_3\mathit{pop}_3\mathit{pop}_2)\mathit{push}_1(\mathit{push}_3\mathit{pop}_3)]^{10}$ and $\gamma = \mathit{push}_2\mathit{push}_1\mathit{pop}_2\mathit{pop}_1(\mathit{pop}_1)^{20}$, $\beta$ is a hole with respect to stack 1. The suffix $\gamma$ has 2 holes (the $\mathit{push}_2$ and the $\mathit{push}_1$). Thus we say that $\alpha$ is 3-hole bounded. On the other hand, the number of context switches (and the scope bound) in $\alpha$ is $> 50$. A ($k$-)hole bounded sequence is one in which, at any point of the computation, the number of "open" holes is bounded (by $k$). We show that the class of hole bounded sequences subsumes most of the previously defined classes of underapproximations and is, in fact, contained in the very generic class of tree-width bounded sequences. This immediately shows decidability of the reachability problem for our class.
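The context-switch count for $\alpha$ can be checked mechanically. The sketch below (Python, with a hypothetical `parse` helper for words such as `push1 pop2`, assuming single-digit stack indices) counts a context switch whenever two consecutive operations act on different stacks. For $\alpha = \beta\gamma$ as defined above it gives 63, consistent with the statement that the number of context switches exceeds 50; it does not compute holes or scope bounds.

```python
def parse(word):
    """Turn 'push1 pop2 ...' into (operation, stack) pairs (hypothetical helper)."""
    return [(tok[:-1], int(tok[-1])) for tok in word.split()]

def context_switches(ops):
    """Count consecutive operations that act on different stacks."""
    return sum(1 for (_, s), (_, t) in zip(ops, ops[1:]) if s != t)

beta = parse("push1 push2 push3 pop3 pop2 push1 push3 pop3") * 10
gamma = parse("push2 push1 pop2 pop1") + parse("pop1") * 20
alpha = beta + gamma
print(context_switches(alpha))  # 63
```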

Analyzing the more generic class of tree-width bounded sequences is often much more difficult; for instance, building bottom-up tree automata for this purpose does not scale very well as it explores a large (and often useless) state space. Our technique is radically different from using tree automata. Under the hole bounded assumption, we pre-compute information regarding well-nested sequences and holes using fix-point computations and use them in our algorithm. Using efficient data structures to implement this approach, we develop a tool (BHIM) for Bounded Hole reachability in Multi-stack pushdown systems.

## **Highlights of** BHIM.

• Two significant aspects of the fix-point approach in BHIM are: (i) we efficiently solve the binary reachability problem for pushdown automata, i.e., BHIM computes all pairs of states (s, t) such that t is reachable from s with empty stacks. This allows us to go beyond reachability and handle some liveness questions; (ii) we pre-compute the set of pairs of states that are endpoints of holes. This allows us to greatly limit the search for an accepting run.

• While the fix-point approach solves (binary) reachability efficiently, it does not a priori produce a witness of reachability. We remedy this situation by proposing a backtracking algorithm that cleverly reuses the computations done in the fix-point algorithm to generate a witness efficiently.

• BHIM is parametrized with respect to the hole bound. If non-emptiness can be witnessed by a well-nested sequence, then a hole bound of 0 suffices; BHIM looks for such easy witnesses first and gradually increases the complexity only if no easy witness is found. Increasing this complexity measure only as required to certify non-emptiness gives an efficient implementation, in the sense that we search for harder witnesses only when no easier witnesses (w.r.t. this complexity measure) exist. In the examples described in the experimental section, a small bound (less than 4) suffices, and we expect this to be the case for most practical examples.

• Finally, we extend our approach to handle timed multi-stack pushdown systems. This shows the versatility of our approach and also requires us to solve several technical challenges which are specific to the timed setting. Implementing this approach in BHIM makes it, to the best of our knowledge, the first tool that can analyze timed multi-stack pushdown automata with closed guards.

We analyze the performance of BHIM in practice by considering benchmarks from the literature and generating timed variants of some of them. One of our benchmarks is a variant of the Bluetooth example [11,23], where BHIM was able to catch a known race detection error. Another interesting benchmark is a model of a parameterized multiple producer-consumer example, with parameters M, N on the quantities of two items A, B produced. Here, BHIM could detect bugs by finding witnesses having just 2 holes, whereas it is unlikely that existing tools working on scope/context bounded underapproximations can handle them, as the number of scope/context switches depends on M, N (in fact, it is twice the least common multiple of M and N). In the timed setting, one of the main challenges has been the unavailability of timed benchmarks; even in the untimed setting, many benchmarks were unavailable due to their proprietary nature. Due to lack of space, proofs, technical details and parametric plots of experiments are in [4].

**Related Work**. Among other under-approximations, scope boundedness [27] subsumes context and round bounded underapproximations, and it also paved the way for GetaFix [21], a tool to analyze recursive (and multi-threaded) Boolean programs. As mentioned earlier, hole boundedness strictly subsumes scope boundedness. On the other hand, GetaFix uses symbolic approaches via BDDs, which are orthogonal to the improvements made in this paper. Indeed, our next step would be to build a symbolic version of BHIM that extends the hole-bounded approach to work with symbolic methods. Given that BHIM can already handle synthetic examples with 12-13 holes (see [4]), we expect this to lead to even more drastic improvements and broader applicability. For sequential programs, a summary-based algorithm is used in [21]; summaries are like our well-nested sequences, except that well-nested sequences admit contexts from different stacks, unlike summaries. As a result, our class of bounded hole behaviors generalizes summaries. Many other theoretical results, such as phase bounded [18] and order bounded [8] behaviors, which give interesting underapproximations of MPDA, are subsumed by tree-width bounded behaviors, but they do not seem to have practical implementations. Adding real-time information to pushdown automata by using clocks or timed stacks has been considered, both in the discrete and dense-timed settings. Recently, there has been a flurry of theoretical results on this topic [10,1,2,5,6]. However, to the best of our knowledge, none of these algorithms has been successfully implemented for multi-stack systems (except [6], which implements a tree-automata based technique for single-stack timed systems). One reason is that these algorithms do not employ scalable fix-point based techniques, but instead depend on region automaton-based search or tree automata-based search techniques.

## **2 Underapproximations in MPDA**

A multi-stack pushdown automaton (MPDA) is a tuple $M = (S, \Delta, s_0, S_f, n, \Sigma, \Gamma)$, where $S$ is a finite non-empty set of locations, $\Delta$ is a finite set of transitions, $s_0 \in S$ is the initial location, $S_f \subseteq S$ is a set of final locations, $n \in \mathbb{N}$ is the number of stacks, $\Sigma$ is a finite input alphabet, and $\Gamma$ is a finite stack alphabet which contains $\bot$. A transition $t \in \Delta$ can be represented as a tuple $(s, op, a, s')$, where $s, s' \in S$ are, respectively, the source and destination locations of the transition $t$, $a \in \Sigma$ is the label of the transition, and $op$ is one of the following operations: (1) nop, i.e., no stack operation; (2) $\downarrow_i \alpha$, which pushes $\alpha \in \Gamma$ onto stack $i \in \{1, 2, \dots, n\}$; (3) $\uparrow_i \alpha$, which pops stack $i$ if the top of stack $i$ is $\alpha \in \Gamma$.

For a transition $t = (s, op, a, s')$ we write $src(t) = s$, $tgt(t) = s'$ and $op(t) = op$. For now we ignore the action label $a$, but it will be useful later when we go beyond reachability to model checking. A configuration of the MPDA is a tuple $(s, \lambda_1, \lambda_2, \dots, \lambda_n)$ such that $s \in S$ is the current location and $\lambda_i \in \Gamma^*$ represents the current content of the $i$-th stack. The semantics of the MPDA is defined as follows: a run is accepting if it starts from the initial state and reaches a final state with all stacks empty. The language accepted by an MPDA is the set of words generated by its accepting runs. Since the reachability problem for MPDA is undecidable (MPDA are Turing complete), we consider underapproximate reachability.
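To make the configuration semantics concrete, here is a minimal Python sketch of firing a single transition and checking whether a run is accepting. The encoding is an illustrative assumption, not part of the formal model: the `Transition` dataclass, string-valued states, stacks stored as tuples with the top at the end, and empty tuples used instead of an explicit $\bot$ symbol.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Transition:
    """Hypothetical encoding of a transition (s, op, a, s')."""
    src: str
    op: str            # "nop", "push" or "pop"
    stack: int = 0     # stack index i in 1..n (ignored for "nop")
    symbol: str = ""   # stack symbol alpha (ignored for "nop")
    label: str = ""    # action label a (unused for reachability)
    dst: str = ""


# A configuration (s, lambda_1, ..., lambda_n); each stack is a tuple, top at the end.
Config = Tuple[str, Tuple[Tuple[str, ...], ...]]


def fire(config: Config, t: Transition) -> Optional[Config]:
    """Return the successor configuration, or None if t is not enabled."""
    state, stacks = config
    if state != t.src:
        return None
    if t.op == "nop":
        return (t.dst, stacks)
    k = t.stack - 1
    if t.op == "push":
        new = stacks[k] + (t.symbol,)
    elif t.op == "pop":
        if not stacks[k] or stacks[k][-1] != t.symbol:
            return None  # a pop of alpha is enabled only if alpha is on top of stack i
        new = stacks[k][:-1]
    else:
        raise ValueError(f"unknown operation {t.op!r}")
    return (t.dst, stacks[:k] + (new,) + stacks[k + 1:])


def accepts(run, s0: str, finals, n: int) -> bool:
    """Accepting: start in s0, end in a final state with all n stacks empty."""
    config: Optional[Config] = (s0, tuple(() for _ in range(n)))
    for t in run:
        config = fire(config, t)
        if config is None:
            return False
    return config[0] in finals and all(len(st) == 0 for st in config[1])
```

Keeping configurations immutable (tuples) makes them hashable, which is convenient if one later stores them in sets during an explicit search.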

A sequence of transitions is called **complete** if each push in that sequence has a matching pop and vice versa. A **well-nested** sequence, denoted ws, is defined inductively as follows: a possibly empty sequence of nop-transitions is a ws, and so is the sequence $t \cdot ws \cdot t'$, where $op(t) = \downarrow_i \alpha$ and $op(t') = \uparrow_i \alpha$ are a matching pair of push and pop operations of stack $i$, for any $i \in \{1, \dots, n\}$. Finally, the concatenation of two well-nested sequences is a well-nested sequence, i.e., well-nested sequences are closed under concatenation. The set of all well-nested sequences defined by an MPDA is denoted WS. If we visualize this by drawing edges between pushes and their corresponding pops, well-nested sequences have no crossing edges, even when edges of different stacks are interleaved. We emphasize that a well-nested sequence can have well-nested edges from any stack. In a sequence σ, a push (pop) is called a **pending** push (pop) if its matching pop (push) is not in the same sequence σ.
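A minimal sketch of these notions, assuming a hypothetical encoding of operations as tuples `("push", i, alpha)`, `("pop", i, alpha)` and `("nop",)`. Because edges of *all* stacks must not cross, well-nestedness can be checked like a Dyck word with a single shared stack, whereas completeness and pending operations use the usual per-stack matching, where edges of different stacks may cross.

```python
from typing import List, Tuple

Op = Tuple  # ("nop",), ("push", i, alpha) or ("pop", i, alpha)


def is_well_nested(seq: List[Op]) -> bool:
    """No crossing edges across all stacks: every pop must match the most
    recent unmatched push of *any* stack (same stack index and symbol)."""
    shared = []
    for op in seq:
        if op[0] == "push":
            shared.append(op[1:])
        elif op[0] == "pop":
            if not shared or shared[-1] != op[1:]:
                return False
            shared.pop()
    return not shared


def pending(seq: List[Op]) -> List[int]:
    """Positions of pending pushes and pops, using per-stack LIFO matching
    (assumes seq is a factor of a run, so stack symbols are consistent)."""
    open_pushes, unmatched = {}, []
    for pos, op in enumerate(seq):
        if op[0] == "push":
            open_pushes.setdefault(op[1], []).append(pos)
        elif op[0] == "pop":
            st = open_pushes.get(op[1], [])
            if st:
                st.pop()
            else:
                unmatched.append(pos)  # pending pop: its push lies before seq
    for st in open_pushes.values():
        unmatched.extend(st)           # pending pushes: their pops lie after seq
    return sorted(unmatched)


def is_complete(seq: List[Op]) -> bool:
    return not pending(seq)
```

For example, $\downarrow_1\downarrow_2\uparrow_2\uparrow_1$ is well-nested, while $\downarrow_1\downarrow_2\uparrow_1\uparrow_2$ is complete but not well-nested, since its two edges cross.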

**Bounded Underapproximations**. As mentioned in the introduction, different bounded under-approximations have been considered in the literature to get around the Turing completeness of MPDA. During a computation, a context is a sequence of transitions in which at most one stack is used. In context bounded computations, the number of contexts is bounded [25]. A round is a sequence of (possibly empty) contexts for stacks 1, 2, ..., n. Round bounded computations restrict the total number of rounds allowed [19,5,6]. Scope bounded computations generalize context bounded computations: here, the number of context changes between any push and its corresponding pop is bounded [19,20,28]. A phase is a contiguous sequence of transitions in a computation in which pops are restricted to a single stack, while pushes are unrestricted [18]. A phase bounded computation is one where the number of phase changes is bounded.
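The following sketch, using the same hypothetical tuple encoding, counts contexts and phases greedily (a new context or phase starts only when an operation forces it) and computes the scope bound as the largest number of context switches between a push and its matching pop. The function names are illustrative.

```python
def context_ids(seq):
    """Greedy split into contexts (each uses at most one stack); returns the
    context index of every position. nops never force a context switch."""
    ids, current, ctx = [], None, 0
    for op in seq:
        if op[0] != "nop":
            if current is not None and op[1] != current:
                ctx += 1  # context switch
            current = op[1]
        ids.append(ctx)
    return ids


def num_contexts(seq):
    return context_ids(seq)[-1] + 1 if seq else 0


def num_phases(seq):
    """Greedy split into phases: pops from a single stack, pushes unrestricted."""
    phases, current = (1 if seq else 0), None
    for op in seq:
        if op[0] == "pop":
            if current is not None and op[1] != current:
                phases += 1
            current = op[1]
    return phases


def scope_bound(seq):
    """Maximum number of context switches between a push and its matching pop."""
    ids, open_pushes, best = context_ids(seq), {}, 0
    for pos, op in enumerate(seq):
        if op[0] == "push":
            open_pushes.setdefault(op[1], []).append(pos)
        elif op[0] == "pop" and open_pushes.get(op[1]):
            best = max(best, ids[pos] - ids[open_pushes[op[1]].pop()])
    return best
```

Applied to the sequence $\alpha = \beta\gamma$ from the introduction (with a single dummy stack symbol), this greedy decomposition gives 64 contexts and a scope bound of 63, i.e., well above 50 context switches.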

**Tree-width**. A generic way of looking at these classes is to consider classes which have a bound on the tree-width [22]. In fact, the notions of split-width/clique-width/tree-width of communicating finite state machines and timed pushdown systems have been explored in [3] and [13]. The behaviors of the underlying system are then represented as graphs. It has been shown in these references that if the family of graphs arising from the behaviors of the underlying system (say S) has bounded tree-width, then the reachability problem for S is decidable via tree automata. However, this does not immediately give rise to an efficient implementation. The tree-automata approach usually gives non-deterministic or bottom-up tree automata, which, when implemented in practice (see [6]), tend to blow up in size and explore a large and useless state space. Hence there is a need for efficient algorithms, which exist for more specific underapproximations such as context bounded ones (leading to fix-point algorithms and their implementations [21]).

#### **2.1 A new class of under-approximations**

Our goal is to bridge the gap between having practically efficient algorithms and handling more expressive classes of under-approximations for reachability of multi-stack pushdown systems. To do so, we define a bounded approximation which is expressive enough to cover previously defined practically interesting classes (such as context bounded computations), while at the same time allowing efficient decidable reachability tests, as we will see in the next section.

**Definition 1.** (Holes). Let σ be a complete sequence of transitions of length n in an MPDA, and let ws denote a well-nested sequence.


As an example, consider the sequence σ of transitions in Figure 1, of an MPDA with stacks 1 and 2 (depicted in red and blue, respectively). We use superscripts to distinguish the $j$-th push, pop and so on of each stack, and subscripts for the stack; for instance, $\downarrow^j_i$ is the $j$-th push of stack $i$. There are two holes of stack 1 (red stack), denoted by the red patches, and one hole of stack 2 (blue stack), denoted by the blue patch. The subsequence $\downarrow^1_1 \downarrow^2_1 ws^2$ of the first hole is not a maximal factor, since it can be extended by $\downarrow^3_1 ws^3$ in the run σ, extending the hole. Consider the position in σ marked with $\downarrow^1_2$. At this position, there is an open hole of the red stack (the first red patch) and an open hole of the blue stack (the blue patch). Likewise, at the position $\uparrow^5_1$, there are 2 open holes of the red stack (2 red patches) and one open hole of the blue stack (the blue patch). The hole bound of σ is 3. The green patch consisting of $\uparrow^3_1$, $\uparrow^2_1$ and $ws^5$ is a pop-hole of stack 1. Likewise, the pops $\uparrow^2_2$, $\uparrow^5_1$ and $\uparrow^1_2$ are all pop-holes (of length 1) of stacks 2, 1 and 2, respectively.

**Figure 1.** A run σ with 2 holes (2 red patches) of the red stack and 1 hole (one blue patch) of the blue stack.
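The hole bound of small examples can be computed mechanically. The sketch below makes one interpretive assumption explicit: a push is treated as a pending push of a hole exactly when the factor from it to its matching pop is not well-nested; every other push lies inside a well-nested block. Holes are then the maximal factors of the form $(\downarrow_i ws)^+$ built from such pushes, and a hole counts as open at position $j$ from its first push up to (but excluding) the last matching pop of its pushes. The encoding and all function names are hypothetical.

```python
def match(seq):
    """Per-stack push/pop matching for a complete sequence: position -> partner."""
    partner, stacks = {}, {}
    for pos, op in enumerate(seq):
        if op[0] == "push":
            stacks.setdefault(op[1], []).append(pos)
        elif op[0] == "pop":
            p = stacks[op[1]].pop()
            partner[p], partner[pos] = pos, p
    return partner


def is_well_nested(seq):
    shared = []
    for op in seq:
        if op[0] == "push":
            shared.append(op[1:])
        elif op[0] == "pop":
            if not shared or shared[-1] != op[1:]:
                return False
            shared.pop()
    return not shared


def holes(seq):
    """List of (stack, start, last_matching_pop) for maximal factors (push_i ws)+.
    ASSUMPTION: a push belongs to a hole iff push..matching pop is not well-nested."""
    partner = match(seq)
    hole_push = {p for p, op in enumerate(seq)
                 if op[0] == "push" and not is_well_nested(seq[p:partner[p] + 1])}
    result, pos = [], 0
    while pos < len(seq):
        if pos not in hole_push:
            pos += 1
            continue
        stack, start, pushes = seq[pos][1], pos, []
        while pos < len(seq):
            op = seq[pos]
            if pos in hole_push and op[1] == stack:
                pushes.append(pos)          # another pending push of this hole
                pos += 1
            elif op[0] == "nop":
                pos += 1
            elif op[0] == "push" and pos not in hole_push:
                pos = partner[pos] + 1      # skip a whole well-nested block
            else:
                break                       # a pop, or a hole push of another stack
        result.append((stack, start, max(partner[p] for p in pushes)))
    return result


def hole_bound(seq):
    """Maximum, over positions j, of the number of holes open at j."""
    hs = holes(seq)
    return max((sum(1 for _, start, close in hs if start <= j < close)
                for j in range(len(seq))), default=0)
```

Under this reading, the sequence α from the introduction gets hole bound 3, the word $a^3b^3\,a^2c^3b^2d^3\,a^2c^3bd^2\,ac^2bd^2$ used in the proof of Proposition 2 gets hole bound 2, and a word $(ab)^n c^n d^n$ gets hole bound $2n$, in line with the claims made in the text.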

**Definition 2.** (Hole Bounded Reachability Problem) Given an MPDA and $K \in \mathbb{N}$, the $K$-hole bounded reachability problem is the following: does there exist a $K$-hole bounded accepting run of the MPDA?

**Proposition 1.** The tree-width of K-hole bounded MPDA behaviors is at most (2K + 3).

With this, from [22][5][6], decidability and complexity follow. Thus,

**Corollary 1.** The $K$-hole bounded reachability problem for MPDA is decidable in $O(|M|^{2K+3})$, where $|M|$ is the size of the underlying MPDA.

Next, we turn to the expressiveness of this class with respect to the classical underapproximations of MPDA: first, the **hole** bounded class strictly subsumes **scope** bounded which already subsumes **context** bounded and **round** bounded classes. Also **hole** bounded MPDA and **phase** bounded MPDA are orthogonal.

**Proposition 2.** Consider an MPDA $M$. For any $K$, let $L_K$ denote the set of sequences accepted by $M$ whose number of rounds, number of contexts or scope is bounded by $K$. Then there exists $K' \le K$ such that $L_K$ is $K'$-hole bounded. Moreover, there exist languages which are $K$-hole bounded for some constant $K$ but are not $K'$-round, $K'$-context or $K'$-scope bounded for any $K'$. Finally, there exists a language which is accepted by a phase bounded MPDA but not by a hole bounded MPDA, and vice versa.

Proof. We first recall that if a language $L$ is $K$-round or $K$-context bounded, then it is also $K'$-scope bounded for some $K' \le K$ [20,19]. Hence, we only show that scope bounded systems are subsumed by hole bounded systems.

Let $L$ be a $K$-scope bounded language, and let $M$ be an MPDA accepting $L$. Consider a run ρ of $w \in L$ in $M$. Assume that at some point $i$ in the run ρ, the number of open holes is $\#_i(\mathit{holes}) = k'$, and, towards a contradiction, let $k' > K$. Consider the leftmost open hole in ρ, which has a pending push $\downarrow_p$ whose matching pop $\uparrow_p$ is to the right of $i$. Since $k' > K$ is the number of open holes at $i$, there are at least $k' > K$ context changes between $\downarrow_p$ and $\uparrow_p$. This contradicts the $K$-scope bounded assumption, and hence $k' \le K$.

To show the strict containment, consider the visibly pushdown language [7] given by $L_{bh} = \{a^n b^n (a^{p_1} c^{p_1+1} b^{p'_1} d^{p'_1+1} \cdots a^{p_n} c^{p_n+1} b^{p'_n} d^{p'_n+1}) \mid n, p_1, p'_1, \dots, p_n, p'_n \in \mathbb{N}\}$. A possible word $w \in L_{bh}$ is $a^3 b^3\, a^2 c^3 b^2 d^3\, a^2 c^3 b d^2\, a c^2 b d^2$, with $a, b$ representing pushes on stacks 1, 2, respectively, and $c, d$ representing the corresponding matching pops from stacks 1, 2. A run ρ accepting a word $w \in L_{bh}$ starts with a sequence of pushes on stack 1 followed by a sequence of pushes on stack 2. Note that the number $n$ of pushes is the same for both stacks. Then there is a group $G$, consisting of a well-nested sequence of stack 1 (equally many $a$'s and $c$'s) followed by a pop of stack 1 (an extra $c$), another well-nested sequence of stack 2 (equally many $b$'s and $d$'s) and a pop of stack 2 (an extra $d$), repeated $n$ times. From the definition of a hole, the total number of holes required in $G$ is 0. However, we need 1 hole for the sequence of $a$'s and another for the sequence of $b$'s at the beginning of the run, which creates at most 2 open holes during the run. Thus, the hole bound for any accepting run ρ is 2, and the language $L_{bh}$ is 2-hole bounded.

However, $L_{bh}$ is not $k$-scope bounded for any $k$. Indeed, for each $m \ge 1$, consider the word $w_m = a^m b^m (a c^2 b d^2)^m \in L_{bh}$. It is easy to see that $w_m$ is $2m$-scope bounded (the matching $c$, $d$ of each $a$, $b$ happens $2m$ context switches later) but not $k$-scope bounded for $k < 2m$. It can be seen that $L_{bh}$ is not $k$-phase bounded either. Finally, $L' = \{(ab)^n c^n d^n \mid n \in \mathbb{N}\}$, with $a, b$ and $c, d$ respectively being pushes and pops of stacks 1, 2, is not hole bounded but is 2-phase bounded. □

## **3 A Fix-point Algorithm for Hole Bounded Reachability**

In the previous section, we showed that hole-bounded underapproximations are a decidable subclass for reachability, by showing that this class has a bounded tree-width. However, as explained in the introduction, this does not immediately give a fix-point based algorithm, which has been shown to be much more efficient for other more restricted sub-classes, e.g., context-bounded. In this section, we provide such a fix-point based algorithm for the hole-bounded class and explain its advantages. Later we discuss its versatility by showing extensions and evaluating its performance on a suite of benchmarks.

We describe the algorithm in two steps. First, we give a simple fix-point based algorithm for the problem of 0-hole or well-nested reachability, i.e., reachability by a well-nested sequence without any holes. For the 0-hole case, our algorithm computes the reachability relation, thereby solving the binary reachability problem [15]. That is, we compute all pairs of states $(s, s')$ such that there is a well-nested run from $s$ with empty stacks to $s'$ with empty stacks. Subsequently, we combine this binary reachability for well-nested sequences with an efficient graph search to obtain an algorithm for $K$-hole bounded reachability.

**Binary well-nested reachability for** MPDA. Note that single-stack PDA are a special case, since all their runs from empty stack to empty stack are well-nested.

1. **Transitive Closure**: Let $R$ be the set of tuples of the form $(s_i, s_j)$ representing that state $s_j$ is reachable from state $s_i$ via a nop discrete transition. Such a sequence from $s_i$ to $s_j$ is trivially well-nested. We take the transitive closure of $R$ using the Floyd-Warshall algorithm [12]. The resulting set $R_c$ of tuples answers binary reachability for finite state automata (no stacks).

```
1  Function IsEmpty(M = (S, Δ, s0, Sf, n, Σ, Γ), K):
       Result: True or False
2      WR := WellNestedReach(M);   \\ solves binary reachability for the pushdown system
3      if some (s0, s1) ∈ WR with s1 ∈ Sf then
4          return False;
5      forall i ∈ [n] do
6          AHS_i := ∅; Set_i := ∅;
7          forall (s, ↓_i(α), a, s1) ∈ Δ and (s1, s') ∈ WR do
8              AHS_i := AHS_i ∪ {(i, s, α, s')}; Set_i := Set_i ∪ {(s, s')};
9          HS_i := {(i, s, s') | (s, s') ∈ TransitiveClosure(Set_i)};
10     μ := [s0]; μ.NumberOfHoles := 0;
11     SetOfLists_new := {μ}; SetOfLists := ∅;
12     do
13         SetOfLists := SetOfLists ∪ SetOfLists_new;
14         SetOfLists_todo := SetOfLists_new; SetOfLists_new := ∅;
15         forall μ' ∈ SetOfLists_todo do
16             if μ'.NumberOfHoles < K then
17                 forall i ∈ [n] do   \\ add hole for stack i
18                     SetOfLists_h := AddHole_i(μ', HS_i) \ SetOfLists;
19                     SetOfLists_new := SetOfLists_new ∪ SetOfLists_h;
20             if μ'.NumberOfHoles > 0 then
21                 forall i ∈ [n] do   \\ add pop for stack i
22                     SetOfLists_p := AddPop_i(μ', M, AHS_i, HS_i, WR) \ SetOfLists;
23                     SetOfLists_new := SetOfLists_new ∪ SetOfLists_p;
24                     forall μ3 ∈ SetOfLists_p do
25                         if μ3.last ∈ Sf and μ3.NumberOfHoles = 0 then
26                             return False;   \\ reached a final state with all holes closed
27     while SetOfLists_new ≠ ∅;
28     return True;
```
**Algorithm 1:** Algorithm for Emptiness Checking of hole bounded MPDA

2. **Push-Pop Closure**: For stack operations, consider a push transition on some stack (say stack $i$) of symbol $\gamma$, enabled from a state $s_1$ and reaching state $s_2$. If there is a matching pop transition from a state $s_3$ to $s_4$ which pops the same stack symbol $\gamma$ from stack $i$, and if $(s_2, s_3) \in R_c$, then we can add the tuple $(s_1, s_4)$ to $R_c$. The function WellNestedReach repeats this process and the transitive closure described above until a fix-point is reached. Let us denote the resulting set of tuples by WR. Thus,

**Lemma 1.** $(s_1, s_2) \in WR$ iff there exists a well-nested run in the MPDA from $s_1$ to $s_2$.
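A minimal sketch of WellNestedReach, alternating the two closure steps until nothing changes. Assumptions: transitions are given as plain tuples (nop pairs, pushes `(s1, i, alpha, s2)`, pops `(s3, i, alpha, s4)`), and WR is made reflexive because the empty sequence is well-nested, so that line 7 of Algorithm 1 also covers atomic holes whose ws is empty. This naive set-based version is for illustration only and does not reflect BHIM's data structures.

```python
from itertools import product


def transitive_closure(pairs, states):
    """Floyd-Warshall style closure of a binary relation over `states`."""
    reach = set(pairs)
    for k in states:
        for i in states:
            if (i, k) in reach:
                for j in states:
                    if (k, j) in reach:
                        reach.add((i, j))
    return reach


def well_nested_reach(states, nops, pushes, pops):
    """Binary well-nested reachability.

    nops:   set of (s, t) nop transitions
    pushes: set of (s1, i, alpha, s2) push transitions
    pops:   set of (s3, i, alpha, s4) pop transitions
    """
    wr = {(s, s) for s in states} | set(nops)  # reflexive: empty ws
    while True:
        wr = transitive_closure(wr, states)    # step 1: concatenation
        new = {(s1, s4)                         # step 2: push . ws . matching pop
               for (s1, i, a, s2), (s3, j, b, s4) in product(pushes, pops)
               if i == j and a == b and (s2, s3) in wr} - wr
        if not new:
            return wr                           # fix-point reached
        wr |= new
```

With pushes $q_0 \xrightarrow{\downarrow_1 a} q_1 \xrightarrow{\downarrow_2 b} q_2$, the pair $(q_0, q_4)$ is derived when the pops are $q_2 \xrightarrow{\uparrow_2 b} q_3 \xrightarrow{\uparrow_1 a} q_4$ (nested edges), but not when the two pops are swapped (crossing edges).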

**Beyond well-nested reachability**. A naive algorithm for $K$-hole bounded reachability with $K > 0$ is to start from the initial state $s_0$ and do a Breadth First Search (BFS), non-deterministically choosing between extending with a well-nested segment, creating hole segments (with a pending push) and closing hole segments (using pops). We accept when we reach a final state with no open hole segments; this gives an exponential time algorithm. Given the exponential dependence on the hole bound $K$ (Corollary 1), this exponential blowup is unavoidable in the worst case, but we can do much better in practice. In particular, the naive algorithm makes arbitrary non-deterministic choices, resulting in a blind exploration of the BFS tree.

In this section, we use the binary well-nested reachability algorithm as an efficient subroutine to limit the search in BFS to its reachable part (note that this is quite different from DFS as well since we do not just go down one path). The crux is that at any point, we create a new hole for stack i, only when (i) we know that we cannot reach the final state without creating this hole and (ii) we know that we can close all such holes which have been created. Checking (i) is easy, since we just use the WR relation for this. Checking (ii) blindly would correspond to doing a DFS; however, we precompute this information and simply look it up, resulting in a constant time operation after the precomputation.

**Precomputing hole information.** Recall that a hole of stack $i$ is a maximal sequence of the form $(\downarrow_i ws)^+$, where $ws$ is a well-nested sequence and $\downarrow_i$ represents a push of stack $i$. A hole segment of stack $i$ is a prefix of a hole of stack $i$ ending in a $ws$, while an atomic hole segment of stack $i$ is just a segment of the form $\downarrow_i ws$. A hole segment of stack $i$ which starts from state $s$ in the MPDA and ends in state $s'$ can be represented by the triple $(i, s, s')$, which we call a hole triple. We compute the set $HS_i$ of all hole triples $(i, s, s')$ such that, starting at $s$, there is a hole segment of stack $i$ which ends at state $s'$, as detailed in lines 5-9 of Algorithm 1. In doing so, we also compute the set $AHS_i$ of all atomic hole segments of stack $i$ and store them as tuples of the form $(i, s_p, \alpha, s_q)$ such that $s_p$ and $s_q$ are the MPDA states at the left and right end points, respectively, of an atomic hole segment of stack $i$, and $\alpha$ is the symbol pushed on stack $i$ ($s_p \xrightarrow{\downarrow_i(\alpha)\, ws} s_q$).
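Lines 5-9 of Algorithm 1 can be sketched as follows, assuming that `wr` is the (reflexive) relation computed by a WellNestedReach routine and that push transitions are given as tuples `(s, i, alpha, s1)`. Since the result dictionaries are keyed by the stack index, the sketch drops the leading component $i$ from the tuples of $AHS_i$ and $HS_i$; the function name is hypothetical.

```python
def hole_summaries(states, pushes, wr, n):
    """Return (ahs, hs) with ahs[i] = {(s, alpha, s2)} and hs[i] = {(s, s2)}.

    An atomic hole segment of stack i is a push s --push_i(alpha)--> s1
    followed by a well-nested run s1 ~> s2, i.e. (s1, s2) in wr.
    """
    ahs, hs = {}, {}
    for i in range(1, n + 1):
        ahs[i] = {(s, a, s2)
                  for (s, j, a, s1) in pushes if j == i
                  for (u, s2) in wr if u == s1}
        reach = {(s, s2) for (s, _, s2) in ahs[i]}   # Set_i
        for k in states:                              # transitive closure -> HS_i
            for x in states:
                if (x, k) in reach:
                    for y in states:
                        if (k, y) in reach:
                            reach.add((x, y))
        hs[i] = reach
    return ahs, hs
```

Two consecutive pushes of stack 1, $p_0 \to p_1 \to p_2$, yield the atomic segments $(p_0, a, p_1)$ and $(p_1, a, p_2)$, and the closure adds the longer hole segment $(p_0, p_2)$.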

**A guided BFS exploration.** We start with a list $\mu_0 = [s_0]$ consisting of the initial state and construct a BFS exploration tree whose nodes are lists of bounded length. A list is a sequence of states and hole triples representing a $K$-hole bounded run in a concise form. If $H_i$ represents a hole triple for stack $i$, then a list is a sequence of the form $[s, H_i, H_j, H_k, H_i, \dots, H, s']$. The simplest kind of list is a single state $s$. For example, a list with 3 holes of stacks $i, j, k$ is $\mu = [s_0, (i, s, s'), (j, r, r'), (k, t, t'), t'']$. The hole triples denote open holes in the list. The maximum number of open holes in a list is bounded, so the length of the list is also bounded. Let $last(\mu)$ denote the last element of the list $\mu$; this is always a state. For a node $v$ storing list $\mu$ in the BFS tree, if $v_1, \dots, v_k$ are its children, then the corresponding lists $\mu_1, \dots, \mu_k$ are obtained by extending the list $\mu$ by one of the following operations:


enough since we also need to match the stack content. Instead, we check whether we can split the hole $(i, s, s')$ into (1) a hole triple $(i, s, s_a) \in HS_i$ and (2) a tuple $(i, s_a, \alpha, s') \in AHS_i$. If both (1) and (2) are possible, then the pop transition $t$ corresponds to the last pending push of the hole $(i, s, s')$: $t$ indeed matches the pending push recorded in the atomic hole $(i, s_a, \alpha, s')$ in $\mu$, enabling the firing of transition $t$ from the state $s_k$ and reaching $s'_k$. In this case, we add a child node with the list $\mu'$ obtained from $\mu$ as follows: we replace (i) $s_k$ with $s'_k$, and (ii) $(i, s, s')$ with $(i, s, s_a)$, respectively signifying the firing of the transition $t$ and the "shrinking" of the hole by shifting the end point of the hole segment to the left. When we obtain the hole triple $(i, s, s)$ (the start and end points of the hole segment coincide), we may have uncovered the last pending push and thereby "closed" the hole segment completely. At this point, we may choose to remove $(i, s, s)$ from the list, obtaining $[s_0, \dots, (h, u, v), (j, t, t'), \dots, s'_k]$. For every such $\mu' = [s_0, \dots, (h, u, v), (i, s, s_a), (j, t, t'), \dots, s'_k]$ and all $(s'_k, s_m) \in WR$, we also extend $\mu'$ to $\mu'' = [s_0, \dots, (h, u, v), (i, s, s_a), (j, t, t'), \dots, s_m]$. Notice that the list in the child node obtained on a pop is either of the same size as the list in the parent or smaller.

The number of lists is bounded since the number of states and the length of the lists are bounded. The BFS exploration tree will thus terminate. Combining the above steps gives us Algorithm 1, whose correctness gives us:

**Theorem 1.** Given an MPDA and a positive integer $K$, Algorithm 1 terminates and answers "False" iff there exists a $K$-hole bounded accepting run of the MPDA.
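The guided BFS can be sketched as below, taking WR, $AHS_i$ and $HS_i$ as precomputed inputs. Because the list operations are only partly spelled out above, the sketch makes its own choices explicit; these are assumptions, not BHIM's code. A list is represented as a pair (tuple of open hole triples, last state), and the intermediate states of the list are left implicit. The initial lists are all states reachable from $s_0$ by a well-nested run. AddHole$_i$ appends a triple $(i, last, e)$ with $(last, e) \in HS_i$. AddPop$_i$ pops from the rightmost open hole of stack $i$, closes the hole when the popped push is its first one, and extends the result by WR. Like Algorithm 1, the sketch returns False when it finds an accepting $K$-hole bounded run.

```python
from collections import deque


def is_empty(s0, finals, pops, wr, ahs, hs, K):
    """Guided BFS over lists (holes, last).

    pops: set of (src, i, alpha, dst); wr: reflexive well-nested reachability;
    ahs[i]: set of (s, alpha, s2); hs[i]: set of (s, s2).
    Returns False iff an accepting K-hole bounded run is found.
    """
    start = {((), t) for (s, t) in wr if s == s0}
    seen, queue = set(start), deque(start)
    while queue:
        holes, last = queue.popleft()
        if not holes and last in finals:
            return False                        # final state with all holes closed
        children = []
        if len(holes) < K:                      # AddHole_i: open a hole segment at `last`
            for i, pairs in hs.items():
                children += [(holes + ((i, last, e),), e) for (s, e) in pairs if s == last]
        for (src, i, alpha, dst) in pops:       # AddPop_i
            if src != last:
                continue
            # the rightmost open hole of stack i holds the top of stack i
            idx = max((k for k, h in enumerate(holes) if h[0] == i), default=None)
            if idx is None:
                continue
            _, s, e = holes[idx]
            for (sa, beta, e2) in ahs[i]:
                if e2 != e or beta != alpha:
                    continue                    # t must pop the last pending push
                options = []
                if sa == s:                     # first push uncovered: close the hole
                    options.append(holes[:idx] + holes[idx + 1:])
                if (s, sa) in hs[i]:            # otherwise shrink it to (i, s, sa)
                    options.append(holes[:idx] + ((i, s, sa),) + holes[idx + 1:])
                for new_holes in options:
                    children += [(new_holes, v) for (u, v) in wr if u == dst]
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return True
```

For a run $\downarrow_1 \downarrow_1 \downarrow_2 \uparrow_1 \uparrow_1 \uparrow_2$, the sketch shrinks the stack-1 hole once, closes it, and then closes the stack-2 hole, so it reports non-emptiness for $K = 2$ but not for $K = 1$.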

**Complexity of the Algorithm**. The number of states of the system is $|S|$. The time complexity of the transitive closure is $O(|S|^3)$, using a Floyd-Warshall implementation. The time complexity of computing WellNestedReach, which uses the transitive closure, is $O(|S|^5) + O(|S|^2 \times (|\Delta| \times |S|))$. To compute AHS for $n$ stacks, the time complexity is $O(n \times |\Delta| \times |S|^2)$, and to compute HS for $n$ stacks it is $O(n \times |S|^2)$. For multi-stack systems, each list keeps track of (i) the number of hole segments ($\le K$), and (ii) information pertaining to holes (start and end points of holes, and the stack each hole corresponds to). In the worst case, a list contains $(2K + 2)$ states, as we keep the states at the start and end points of all the hole segments, and one stack per hole. So there are at most $|S|^{2K+3} \times n^{K+1}$ lists. In the worst case, when there is no $K$-hole bounded run, we may end up generating all possible lists for a given bound $K$ on the hole segments. The time complexity is thus bounded above by $O(|S|^{2K+3} \times n^{K+1} + |S|^5 + |S|^3 \times |\Delta|)$.

**Beyond Reachability**. We can solve the usual safety questions in the (bounded-hole) underapproximate setting by checking underapproximate reachability on the product of the given system with the complement of the safe set. Given the way Algorithm 1 is designed, the fix-point algorithm allows us to go beyond reachability. In particular, we can solve several (increasingly difficult) variants of the repeated reachability problem without much modification.

Consider the question: for a given state $s$ and MPDA, does there exist a run ρ starting from $s_0$ which visits $s$ infinitely often? This is decidable if we can decompose ρ into a finite prefix $\rho_1$ and an infinite suffix $\rho_2$ such that (1) both $\rho_1, \rho_2$ are well-nested, or (2) $\rho_1$ is $K$-hole bounded complete (all stacks empty) and $\rho_2$ is well-nested, or (3) $\rho_1$ is $K$-hole bounded and $\rho_2 = (\rho_3)^\omega$, where $\rho_3$ is $K$-hole bounded. It is easy to see that (1) is solved by two calls to WellNestedReach, choosing non-empty runs. (2) is solved by a call to Algorithm 1, modified so that we reach $s$, followed by a call to WellNestedReach. Lastly, to solve (3), we first modify Algorithm 1 to check reachability of $s$ with possibly non-empty stacks. Then we run the modified algorithm twice: first starting from $s_0$ and reaching $s$; second starting from $s$ and reaching $s$ again.

## **4 Generating a Witness**

We next focus on the question of generating a witness for an accepting run when our algorithm guarantees non-emptiness. This question is important from the point of view of applicability: if our goal is to see whether bad states are reachable, i.e., non-emptiness corresponds to the presence of a bug, the witness run gives the trace of how the bug came about and hence points to what can be done to fix it (e.g., designing a controller). We remark that this question is difficult in general. While there are naive algorithms which can explore for the witness (thus also solving reachability), these do not use fix-point techniques and hence are not efficient. On the other hand, since we use fix-point computations to speed up our reachability algorithm, finding a witness, i.e., an explicit run witnessing reachability, becomes non-trivial. Generating a witness for well-nested runs is simpler than for runs with holes: it requires us to "unroll" pairs $(s_0, s_f) \in WR$ recursively and generate the sequence of transitions responsible for $(s_0, s_f)$.
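One way to make the unrolling concrete is to record, while computing WR, how each pair was first derived, and then to expand pairs recursively. The sketch below is a hypothetical provenance-tracking variant of WellNestedReach (naive, and quadratic in the number of pairs per iteration), not BHIM's implementation. Transitions are tuples `(src, op, stack, symbol, dst)`, and the witness returned is one valid well-nested run, not necessarily a shortest one.

```python
def wr_with_witness(states, transitions):
    """Map each reachable pair (s, t) to how it was first derived:
    ("empty",), ("step", tr), ("concat", m) or ("nest", push, pop)."""
    why = {(s, s): ("empty",) for s in states}
    for tr in transitions:
        if tr[1] == "nop":
            why.setdefault((tr[0], tr[4]), ("step", tr))
    changed = True
    while changed:
        changed = False
        for (s, m) in list(why):                 # concatenation of two ws
            for (m2, t) in list(why):
                if m2 == m and (s, t) not in why:
                    why[(s, t)] = ("concat", m)
                    changed = True
        for push in transitions:                 # push . ws . matching pop
            for pop in transitions:
                if (push[1] == "push" and pop[1] == "pop" and push[2:4] == pop[2:4]
                        and (push[4], pop[0]) in why and (push[0], pop[4]) not in why):
                    why[(push[0], pop[4])] = ("nest", push, pop)
                    changed = True
    return why


def unroll(why, s, t):
    """Recursively expand (s, t) into the transitions of a well-nested run.
    Terminates because each rule only refers to pairs recorded earlier."""
    rule = why[(s, t)]
    if rule[0] == "empty":
        return []
    if rule[0] == "step":
        return [rule[1]]
    if rule[0] == "concat":
        return unroll(why, s, rule[1]) + unroll(why, rule[1], t)
    _, push, pop = rule
    return [push] + unroll(why, push[4], pop[0]) + [pop]
```

Storing only the first derivation of each pair keeps the provenance acyclic, which is what guarantees that `unroll` terminates.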

**Getting Witnesses from Holes**. Now we move on to the more complicated case of behaviors having holes. Recall that in the BFS exploration we start from the states reachable from $s_0$ by well-nested sequences, and explore subsequent states obtained either from (i) a hole creation, or (ii) a pop operation on a stack. Proceeding in this manner, if we reach a final configuration (say $s_f$) with all holes closed (which implies empty stacks), then we declare non-emptiness. To generate a witness, we start from the final state $s_f$ reached in the run (a leaf node in the BFS exploration tree) and backtrack on the BFS exploration tree until we reach the initial state $s_0$. This generates a witness run in reverse, from right to left.

• Assume that the current node of the BFS tree was obtained using a pop operation. There are two possibilities to consider here (see below), depending on whether this pop operation closed or shrank some hole. Recall that each hole has a left end point and a right end point and is of a specific stack $i$, depending on the pending pushes $\downarrow_i$ it has. So, if the MPDA has $k$ stacks, then a list in the exploration tree can have $k$ kinds of holes. The witness algorithm uses $k$ stacks, called witness stacks, to correctly implement the backtracking procedure for the $k$ kinds of holes. Witness stacks should not be confused with the stacks of the MPDA.

• Assume that the current pop operation closes a hole of kind $i$, as in Figure 2. In Figure 2, this hole consists of three atomic holes; each atomic hole consists of a push followed by a well-nested sequence (ws). Searching among the possible push transitions, we identify the push that matches the current pop, i.e., the one whose removal closes the hole. On backtracking, this leads to a parent node with an atomic hole whose left end point is this matching push and whose right end point is the target of the ws that follows it. We push onto witness stack $i$ a barrier (a delimiter symbol #), followed by the matching push transition and then the ws. The barrier separates the contents of the witness stack when two pop transitions of the same stack in the reverse run close or shrink two different holes.

**Figure 2.** Backtracking to spit out the hole in reverse. The transitions of each atomic hole are written in reverse order, one atomic hole after another, from the rightmost atomic hole to the leftmost.

• Assume that the current pop operation shrinks a hole of kind $i$. The list at the present node contains this hole, and its parent contains a larger hole (see Figure 2). As in the previous case, we first identify the matching push transition, and check whether it agrees with the push in the last atomic hole segment of the parent. If so, we push the rightmost atomic hole segment of the parent node onto witness stack $i$ (see Figure 2). Each time we find a pop while backtracking along the exploration tree, we take the rightmost atomic hole segment of the parent node and push it onto the stack, until we reach the node that was obtained by a hole creation. At this point we have recovered the entire hole by backtracking, and the witness stack holds the atomic hole segments that constituted this hole, ready to be emitted in reverse. Notice that when we finish processing a hole of kind $i$, witness stack $i$ contains the hole in reverse, followed by a barrier. The next hole of the same kind $i$ is treated in the same manner.

• If the current node of the BFS tree was obtained by creating a hole of kind $i$ in the fix-point algorithm, then we pop the contents of witness stack $i$ until we reach a barrier. This spits out the atomic hole segments of the hole from right to left, giving us a sequence of push transitions with the respective ws in between. The transitions constituting each ws are retrieved and added. Notice that popping witness stack $i$ until a barrier emits the transitions in exactly the reverse order needed while backtracking.
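The witness-stack bookkeeping described in the bullets above can be sketched in Python as follows. The class and method names are hypothetical (this is not BHIM's C++ code), transitions are plain strings, and each ws is assumed to be already unrolled into its list of transitions. Pushing a segment's push transition before its ws, and later popping back to the barrier, yields the hole's transitions from right to left, which is what backtracking needs.

```python
from typing import List

BARRIER = "#"  # delimiter separating two holes of the same kind

class WitnessStacks:
    """One witness stack per MPDA stack, i.e., per kind of hole."""

    def __init__(self, k: int) -> None:
        self.stacks: List[List[str]] = [[] for _ in range(k)]

    def push_segment(self, i: int, push: str, ws: List[str],
                     closes_hole: bool = False) -> None:
        """Record an atomic hole segment (a push followed by its ws) of kind i.

        closes_hole=True is for the pop that closed the hole, which is the
        first one met when backtracking, so a barrier is pushed first."""
        stack = self.stacks[i]
        if closes_hole:
            stack.append(BARRIER)
        stack.append(push)
        stack.extend(ws)

    def pop_to_barrier(self, i: int) -> List[str]:
        """At a hole-creation node: emit the hole's transitions in reverse."""
        stack = self.stacks[i]
        out: List[str] = []
        while stack and stack[-1] != BARRIER:
            out.append(stack.pop())
        if stack:
            stack.pop()  # drop the barrier itself
        return out

wit = WitnessStacks(k=1)
wit.push_segment(0, "push1", ["w1a", "w1b"], closes_hole=True)  # closing pop
wit.push_segment(0, "push2", ["w2"])                            # shrinking pops
wit.push_segment(0, "push3", ["w3"])
reverse_run = wit.pop_to_barrier(0)  # reached the hole-creation node
print(reverse_run)  # ['w3', 'push3', 'w2', 'push2', 'w1b', 'w1a', 'push1']
```

Reading `reverse_run` backwards gives the forward order `push1 w1a w1b push2 w2 push3 w3` of the hole.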

## **5 Adding Time to Multi-pushdown systems**

In this section, we briefly describe how the algorithms described in Section 3 can be extended to work in the timed setting. Due to lack of space, we focus on some of the significant challenges and advances, leaving the formal details and algorithms to the supplement [4]. A TMPDA extends an MPDA S with a set X of clock variables. Transitions check constraints, which are conjunctions/disjunctions of constraints (called closed guards in the literature) of the form $x \leq c$ or $x \geq c$ for $c \in \mathbb{N}$ and $x$ any clock from $X$. Symbols pushed on stacks "age" as time elapses; that is, they store the time elapsed since they were pushed onto the stack. A pop is successful only when the age of the symbol lies within a certain interval. The acceptance condition is as in the case of MPDA.

The first main challenge in adapting the algorithms in Section 3 to the timed setting was to take care of all possible time elapses along with the operations defined in Algorithm 1. The usage of closed guards in TMPDA means that it suffices to explore all runs with integral time elapses (for a proof see, e.g., Lemma 4.1 in [5]). Thus configurations are pairs of states and valuations, where valuations are vectors of non-negative integers, each bounded by the maximal constant in the system. Now, to check reachability, we need to extend all the precomputations (transitive closure, well-nested reachability, as well as atomic and non-atomic hole segments) with time-elapse information. To do this, we use a weighted version of the Floyd-Warshall algorithm that stores time elapses during the precomputations. This allows us to use the precomputed timed well-nested reachability information while performing the BFS tree exploration, ensuring that any explored state is indeed reachable by a timed run. The most challenging part is extending the BFS tree with respect to a pop. Here, we not only have to find a split of a hole into an atomic hole segment and a hole segment, as in Algorithm 1, but also need to keep track of the possible partitions of time, which makes the algorithm considerably more involved.
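As a rough illustration of storing time elapses during a closure computation, the sketch below computes, for every pair of control states, the set of total delays achievable along non-empty paths. It is a simplification under our own assumptions, not the TMPDA precomputation itself: clock valuations, guards and stacks are abstracted away, single-step delays are given directly on edges, and delays are capped at a hypothetical constant `cap` (all values at least `cap` are identified), which keeps the sets finite. Because paths may traverse cycles, a single Floyd-Warshall pass does not collect all sums of delays, so the pass is repeated until nothing changes.

```python
from typing import Dict, List, Set, Tuple

def timed_closure(n: int, edges: Dict[Tuple[int, int], Set[int]],
                  cap: int) -> List[List[Set[int]]]:
    """R[i][j] = set of total delays of non-empty paths from i to j,
    where every delay >= cap is represented by cap itself."""
    R: List[List[Set[int]]] = [[set() for _ in range(n)] for _ in range(n)]
    for (i, j), delays in edges.items():
        R[i][j] |= {min(d, cap) for d in delays}
    changed = True
    while changed:  # repeat Floyd-Warshall passes until a fixpoint
        changed = False
        for k in range(n):  # intermediate state
            for i in range(n):
                for j in range(n):
                    combined = {min(a + b, cap) for a in R[i][k] for b in R[k][j]}
                    if not combined <= R[i][j]:
                        R[i][j] |= combined
                        changed = True
    return R

# Toy example: 0 --(delay 1)--> 1, plus a self-loop on 1 with delay 2.
R = timed_closure(2, {(0, 1): {1}, (1, 1): {2}}, cap=6)
print(sorted(R[0][1]))  # [1, 3, 5, 6]
```

Capping is sound here because, for non-negative delays, `min(min(a, cap) + min(b, cap), cap) == min(a + b, cap)`.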

**Timed Witness:** As in the untimed case, we generate a witness certifying non-emptiness of a TMPDA. However, producing a witness from the fix-point computation, as discussed earlier, requires unrolling. The fix-point computation generates a pre-computed set $WR_T$ of tuples $((s, \nu), t, (s', \nu'))$, where $s, s'$ are states, $t$ is the time elapsed in the well-nested sequence, and $\nu, \nu' \in \mathbb{N}^{|X|}$ are integral valuations, i.e., integer values taken by the clocks. This set of tuples does not record the intermediate transitions and time elapses. To recover them from the pre-computed information, we define a lexicographic progress measure, which ensures that this search terminates. The main idea is as follows. The first progress measure checks whether a time-elapse transition of duration $t$ is possible between $(s, \nu)$ and $(s', \nu')$; if so, we print it out. If not, $\nu' \neq \nu + t$, and some set of clocks has been reset in the transition(s) from $(s, \nu)$ to $(s', \nu')$. The second progress measure looks at the sequence of transitions from $(s, \nu)$ to $(s', \nu')$ consisting of reset transitions (at most as many as there are clocks) that result in $\nu'$ from $\nu$. If neither the first nor the second progress measure applies, then $\nu = \nu'$, and we are left with the last progress measure, exploring at most $|S|$ transitions from $(s, \nu)$ to $(s', \nu')$. Using this progress measure, we can seamlessly extend the witness generation to the timed setting. The challenges involved can be found in the full version [4].

## **6 Implementation and Experiments**

We implemented a tool BHIM (**B**ounded **H**oles **I**n **M**PDA) in C++ based on Algorithm 1. It takes an MPDA and a constant K as input and returns True iff there exists a K-hole bounded run from the start state to an accepting state of the MPDA. If there is such an accepting run, BHIM generates one with a minimal number of holes: for a given hole bound K, BHIM first tries to produce a witness with 0 holes, and then iteratively increases the bound on holes up to K. In most cases, BHIM found a witness before reaching the bound K. Since the bounds are tried in increasing order, whenever BHIM's witness has K holes, it is guaranteed that there is no witness with fewer holes.
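The iterative search over hole bounds amounts to the loop below. This is a minimal Python sketch, not BHIM's implementation; `find_witness(k)` is a hypothetical procedure assumed to return an accepting run with at most k holes, or `None` if there is none.

```python
from typing import Callable, Optional, Tuple, TypeVar

W = TypeVar("W")

def min_hole_witness(find_witness: Callable[[int], Optional[W]],
                     K: int) -> Optional[Tuple[int, W]]:
    """Try hole bounds 0, 1, ..., K in increasing order.

    Because smaller bounds are tried first, the first witness found uses
    the minimal number of holes among all runs within bound K."""
    for k in range(K + 1):
        witness = find_witness(k)
        if witness is not None:
            return k, witness
    return None  # no K-hole bounded accepting run

# Toy oracle: pretend a witness exists only once two holes are allowed.
oracle = lambda k: "accepting-run" if k >= 2 else None
print(min_hole_witness(oracle, K=5))  # (2, 'accepting-run')
print(min_hole_witness(oracle, K=1))  # None
```

The price of minimality is that emptiness checks for smaller bounds are repeated; the paper reports that the witness was usually found before reaching K.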

To evaluate the performance of BHIM, we looked at some available benchmarks and modeled them as MPDA. We also added timing constraints to some examples so that they can be modeled as TMPDA. Our tests were run on a GNU/Linux system with an Intel® Core™ i7-4770K CPU @ 3.50GHz and 16GB of RAM. Details of all examples here, as well as an additional example of a Linux kernel bug, can be found in [4].

• **Bluetooth Driver [25]**. The Bluetooth device driver example [25] has an arbitrary number of threads working with a shared memory. We model this using a 2-stack pushdown system, where a system state represents the current valuation of the global variables, and the stacks are used to maintain the call-return relation between different functions, as well as to keep track of context switches between threads. A known error, as pointed out in [25], is a race condition between two threads where one thread tries to write to a global variable and the other thread tries to read from it. BHIM found this error with a well-nested witness. A timed extension of this example was also considered, where a witness was again obtained with hole bound 0.

• **Bluetooth Driver v2** [11,23]. A modified version of the Bluetooth driver is considered in [11,23], where a counter is maintained to count the number of threads actively using the driver. We model this with a two-stack MPDA. With a well-nested witness, BHIM found the interrupted-I/O error, where the stopping thread kills the driver while the other thread is busy with I/O.

• **A Multi-threaded Producer Consumer Problem**. The producer-consumer problem (see e.g., [26]) is a classic example of concurrency and synchronization. An interesting scenario arises when there are multiple producers and consumers. Assume that two ingredients called 'A' and 'B' are produced on a production line in batches (of M and N units, respectively). The parameters M and N are fixed for each day but may vary across days. There is also a consumer machine that (1) consumes one unit of 'A' and one unit of 'B', in that order, and (2) repeats this process until all ingredients are consumed. If one of the ingredients runs out in between, more batches of that ingredient are non-deterministically produced, and the process continues. To avoid wastage, the factory aims to consume all ingredients produced in a day; hence the problem of interest is to check whether all A's and B's produced in a day are consumed. We can model this factory using a two-stack pushdown system, with one stack per product (A and B), where the batch sizes M > 0 and N > 0 are parameters. The production and consumption of the 'A's and 'B's are modeled by pushes and pops on the respective stacks. For given M and N, the language accepted by the system is non-empty iff there is a run in which all the produced 'A's and 'B's are consumed. The language accepted by the two-stack pushdown system is $L_{M,N} = ((a^M + b^N)^+ (\bar{a}\bar{b})^+)^+$, where $a$ and $b$ represent pushes on stacks 1 and 2, respectively, and $\bar{a}$ and $\bar{b}$ represent pops on stacks 1 and 2; since every push must be matched by a pop, the numbers of $a$'s and $\bar{a}$'s (and of $b$'s and $\bar{b}$'s) must be equal.

**Table 1.** Experimental results. The columns Time Empty and Time Witness give the number of milliseconds needed for emptiness checking and for witness generation, respectively.

For any M, N > 0, no accepting run of the two-stack pushdown system is well-nested. Further, in an accepting run, the number of items of each kind produced (and hence the length of the run) must be a multiple of LCM(M, N). As the 'A's and 'B's are consumed one by one, in a sequence where the consumption of 'A' and 'B' alternates, the minimum number of context switches (and the scope bound) required in an accepting run depends on M and N (in fact, it is O(2 × LCM(M, N))). On the other hand, the shortest accepting run is 2-hole bounded: at any position of the word, the open holes come from the unmatched sequences of a's and b's seen so far. Thus, for all M, N > 0, BHIM was able to check non-emptiness of $L_{M,N}$ with a witness of hole bound 2.

• **Critical time constraints [9]**. This is one of the timed examples, where we consider the language $L_{crit} = \{a^y b^z c^y d^z \mid y, z \geq 1\}$ with time constraints between occurrences of symbols. The first c must appear after 1 time unit of the last a, the first d must appear within 3 time units of the last b, the last b must appear within 2 time units from the start, and the last d must appear at 4 time units. $L_{crit}$ is accepted by a TMPDA with two timed stacks. $L_{crit}$ has no well-nested word and is 4-context bounded, but only 2-hole bounded.

• **Concurrent Insertions in Binary Search Trees**. Concurrent insertion into binary search trees is an important problem in database management systems. [17,11] propose an algorithm to solve this problem for concurrent implementations. However, an incorrect implementation of locks allows one thread to overwrite the work of others. We modified the algorithm [17] to capture this bug, and modeled it as an MPDA. BHIM found the bug with a witness of hole bound 2.

• **Maze Example**. Finally, we consider a robot navigating a maze and picking up items; this is an extended (from single to multiple stacks) version of the example from [6]. In the untimed setting, a witness for non-emptiness was obtained with hole bound 0, while in the extension with time, the witness had hole bound 2.
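For the producer-consumer benchmark above, the following Python sketch (our own illustration, not part of BHIM) builds one simple accepting word of $L_{M,N}$: produce LCM(M, N) units of each ingredient in batches, then consume them in alternating pairs. It also counts how often consecutive letters act on different stacks, a simplified notion of context switch that we assume for illustration. For $L_{9,5}$, this particular word has length 180 and 91 stack switches, consistent with the observation that bounded-context tools need on the order of 2 × LCM(M, N) contexts; we do not claim that this word minimizes the number of switches.

```python
from math import gcd
from typing import List

def lcm(m: int, n: int) -> int:
    return m * n // gcd(m, n)

def simple_accepting_word(M: int, N: int) -> List[str]:
    """One accepting word of L_{M,N}: LCM(M, N) units of each ingredient,
    produced in batches, then consumed as alternating (a_bar, b_bar) pairs."""
    L = lcm(M, N)
    return ["a"] * L + ["b"] * L + ["a_bar", "b_bar"] * L

def count_stack_switches(word: List[str]) -> int:
    """Simulate the two stacks (a/a_bar on stack 1, b/b_bar on stack 2),
    check that every pop is matched and both stacks end empty, and count
    how often consecutive letters act on different stacks."""
    stack_of = {"a": 0, "a_bar": 0, "b": 1, "b_bar": 1}
    heights = [0, 0]
    switches, previous = 0, None
    for letter in word:
        s = stack_of[letter]
        if letter.endswith("_bar"):
            if heights[s] == 0:
                raise ValueError("pop on an empty stack")
            heights[s] -= 1
        else:
            heights[s] += 1
        if previous is not None and s != previous:
            switches += 1
        previous = s
    if heights != [0, 0]:
        raise ValueError("some produced items were not consumed")
    return switches

word = simple_accepting_word(9, 5)
print(len(word), count_stack_switches(word))  # 180 91
```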


**Table 2.** Experimental results for the timed examples. Column $c_{max}$ gives the maximum constant in the automaton, and Aged indicates whether the stack is timed.

**Results and Discussion**. The performance of BHIM is presented in Table 1 for untimed examples and in Table 2 for timed examples.

Apart from the results in the tables, to check the robustness of BHIM with respect to parameters such as the number of locations, transitions, stacks, holes and clocks (for TMPDA), we looked at examples with an empty language, obtained by making accepting states non-accepting in the examples considered so far. This forces BHIM to explore all possible paths in the BFS tree, generating the lists at all nodes. Results on the scalability of BHIM with respect to all these parameters are given in [4].

**BHIM Vs. State of the Art**. What sets BHIM apart from existing state-of-the-art tools is that (i) none of the existing tools handle underapproximations captured by bounded holes, and (ii) none of the existing tools work with multiple stacks in the timed setting (even with closed guards). Research on underapproximations of untimed multistack pushdown systems has produced some robust tools, such as GetaFix, which handles multi-threaded programs with bounded context switches. While we have adapted some of the examples from GetaFix, the latest available version of GetaFix has some issues in handling those examples<sup>3</sup>. Likewise, SPADE, MAGIC and the counter implementation [16] are currently not maintained, so we could not compare BHIM against these tools. Most examples handled by BHIM correspond to languages that are not context bounded, not scope bounded, or timed, which are beyond GetaFix: the 2-hole bounded witness found by BHIM for the language $L_{9,5}$ in the multi producer consumer case cannot be found by GetaFix/MAGIC/SPADE with fewer than 90 context switches. In the timed setting, the Maze example, which has a 2-hole bounded witness in which the robot visits certain locations an equal number of times, is beyond [6], which can handle only a single stack.

## **7 Future Work**

As immediate future work, we are making BHIM **v2** symbolic, inspired by GetaFix. The current avatar of BHIM showcases the efficiency of fix-point techniques extended to larger bounded underapproximations; going symbolic will make BHIM much more robust and scalable. This version will also include a parser for boolean programs, allowing us to evaluate larger repositories of available benchmarks.

Acknowledgements. We would like to thank Gennaro Parlato for the discussions on GetaFix and for providing us with benchmarks, and the anonymous reviewers for further pointers.

<sup>3</sup> We did get in touch with one of the authors, who confirmed this.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **KReach: A Tool for Reachability in Petri Nets**<sup>⋆</sup>

Alex Dixon and Ranko Lazić

Department of Computer Science, University of Warwick Coventry, United Kingdom {alexander.dixon,r.s.lazic}@warwick.ac.uk

**Abstract.** We present KReach, a tool for deciding reachability in general Petri nets. The tool is a full implementation of Kosaraju's original 1982 decision procedure for reachability in VASS. We believe this to be the first implementation of its kind. We include a comprehensive suite of libraries for development with Vector Addition Systems (with States) in the Haskell programming language. KReach serves as a practical tool, and acts as an effective teaching aid for the theory behind the algorithm. Preliminary tests suggest that there are some classes of Petri nets for which we can quickly show unreachability. In particular, using KReach for coverability problems, by reduction to reachability, is competitive even against state-of-the-art coverability checkers.

**Keywords:** Petri Nets · Kosaraju's Algorithm · Reachability · Vector Addition Systems · Coverability

## **1 Introduction**

Petri nets [26] (equivalently, Vector Addition Systems with States [12,14]) are one of the best-known formalisms in concurrency theory. They form a highly expressive model which is applicable in a broad range of domains including software and hardware verification [5,6], chemical modelling [3], and business processes [22]. Two of the most studied decision problems on Petri nets are those of *coverability* and *reachability*.

Coverability is the central decision problem for verifying safety properties on Petri nets. The coverability problem asks, given a starting configuration $\mathbf{m}_0$ and a target $\mathbf{m}$, whether we can reach, by some sequence of valid transitions (i.e. by a *run*), any configuration $\mathbf{m}' \geq \mathbf{m}$. The problem is known to be EXPSPACE-complete [23,27]. Coverability has seen considerable study in recent years, in particular with a view towards minimising the running time of coverability decision procedures [2,11].

The reachability problem, to which coverability is easily reducible, can capture both safety and liveness properties of systems [13]. Formally, the reachability problem asks whether we can, by some run, get from a starting configuration $\mathbf{m}_0$ to the target configuration $\mathbf{m}$ exactly. Historically, there has been a wide gap between the upper and lower bounds, but recently these have been improved to a non-elementary lower bound [7] and an Ackermannian upper bound [21].

<sup>⋆</sup> Supported by the Centre for Doctoral Training in Urban Science and Progress (EPSRC Grant EP/L016400/1).

The first complete algorithm for reachability is due to Mayr [24], since further developed and simplified by Kosaraju [17] and Lambert [18]. More recently, a strikingly simple but not yet practical algorithm was obtained by Leroux [20], based on enumerating Presburger-definable invariants. In this paper we will focus on Kosaraju's algorithm. The latter is the subject of an entire book by Reutenauer [29], since translated into English [28], and more recently presented in a novel, readable format with contemporary notation by Lasota [19].

In spite of these substantial and sustained theoretical developments, the area has seen little in the way of practical implementation. We seek to address this gap by making the following contributions:


## **2 Design and Implementation**

**Algorithm** We will now give a brief overview of Kosaraju's classic reachability algorithm for Petri nets, and explain the translation into code. Note that Kosaraju's algorithm operates over VASS—the translation between the two is immediate, and can be performed while parsing the problem instance.

(a) We denote ⟨constraints⟩ and [transitions]. The states of our original VASS are $q$ and $r$; the transitions are $t_0$ and $t_1$. Here we are testing reachability from $q(1, 0)$ to $r(0, 0)$.

(b) $\mathcal{G}_{\text{ex}}$ after SCC decomposition. We now have two components, the second of which is trivial. The adjoinment is marked by a dashed arrow. The shaded rectangles are separate components.

Fig. 1: A simple GVASS, $\mathcal{G}_{\text{ex}}$.

The procedure revolves around *Generalized VASS (GVASS)*, an extension of VASS. A GVASS $\mathcal{G}$ is a sequence $(C_1, \ldots, C_n)$ of VASSs annotated with metadata, most notably constraints on their entry and exit configurations. The exit state of $C_i$ is adjoined to the entry state of $C_{i+1}$ by a transition. Reachability in our original VASS $V$ is implied by reachability in an induced GVASS $\mathcal{G}(V)$, a singleton sequence where we constrain the entry and exit configurations to be equal to our initial $\mathbf{m}_0$ and target $\mathbf{m}$. Figure 1a gives an example.

In outline, Kosaraju's algorithm operates as follows. At each step, given a strongly-connected GVASS G:


This produces a finitely branching tree of GVASSs in which every branch forms a strictly descending chain with respect to a well-quasi-ordering [21]. The algorithm therefore always terminates, and $\mathbf{m}$ is reachable from $\mathbf{m}_0$ in the root GVASS $\mathcal{G}$ if and only if some GVASS in the tree satisfies $\theta$.

**The** *θ* **Condition** Kosaraju's main predicate comprises two parts: $\theta_1$ is a global property of the system, while $\theta_2$ must hold for each component.

$\theta_1$: There exist *pseudo-runs* through the GVASS that use every edge in every component unboundedly many times, and attain unboundedly large values for every unconstrained coordinate (here a *pseudo-run* is a run over $\mathbb{Z}^d$ rather than over $\mathbb{N}^d$). This condition can be formulated as an integer linear programming problem. From this we obtain a semilinear set of vectors of variables representing counts of transition occurrences and values of unconstrained coordinates. If there is no bounded variable, then $\theta_1$ holds. Otherwise, one such bounded variable is "refined" either by constraining the associated coordinate's value or by unfolding the associated transition. For example, we may deduce that the number of firings of $t_0$ never exceeds 1 in our example GVASS $\mathcal{G}_{\text{ex}}$; we then generate the refinements shown in Figure 2.
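The difference between runs and pseudo-runs, on which $\theta_1$ relies, can be seen in the small Python sketch below. It only illustrates the definition, not KReach's ILP encoding, and the two displacement vectors are toy values chosen by us for illustration.

```python
from typing import Optional, Sequence, Tuple

Vec = Tuple[int, ...]

def fire(start: Vec, steps: Sequence[Vec], over_naturals: bool) -> Optional[Vec]:
    """Apply displacement vectors one after another.

    over_naturals=True : a run over N^d; every intermediate vector must stay
                         non-negative, otherwise None is returned.
    over_naturals=False: a pseudo-run over Z^d; negative values are allowed.
    """
    current = list(start)
    for delta in steps:
        current = [c + d for c, d in zip(current, delta)]
        if over_naturals and any(c < 0 for c in current):
            return None
    return tuple(current)

t_a, t_b = (-1, 1), (0, -1)  # toy displacement vectors (our assumption)
print(fire((1, 0), [t_a, t_b], over_naturals=True))   # (0, 0): a genuine run
print(fire((1, 0), [t_b, t_a], over_naturals=True))   # None: dips below zero
print(fire((1, 0), [t_b, t_a], over_naturals=False))  # (0, 0): still a pseudo-run
```

Pseudo-runs ignore the order-dependent non-negativity constraint, which is why their existence can be expressed with linear constraints over transition counts.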

(a) Refinement when $t_0$ is activated 0 times.

(b) Refinement when $t_0$ is activated 1 time.

Fig. 2: Refinement of $\mathcal{G}_{\text{ex}}$ by removing the bounded transition $t_0$.

$\theta_2$: Each non-trivial component of the GVASS contains some path from the initial to the final state along which all unconstrained coordinates are increased. The same must also hold if the component's arcs are reversed. We can evaluate $\theta_2$ using standard algorithms that compute the coverability set. If $\theta_2$ fails, then some coordinate is bounded everywhere in the state space of the component; such a coordinate is refined by removing it entirely (making it *rigid*) and encoding its possible values into the component's states.

**Solving Coverability** We can reduce the coverability problem to the reachability problem as follows. Suppose we intend to *cover* some vector $\mathbf{m}$, that is, we wish to reach any vector $\mathbf{m}'$ such that $\mathbf{m}' \geq \mathbf{m}$. We introduce a new state Δ, and add a transition δ from the final state of the original VASS to Δ, which subtracts $\mathbf{m}$ on activation. As Δ can only be reached by subtracting $\mathbf{m}$ from our current vector, reaching a vector $\mathbf{m}' \geq \mathbf{m}$ (i.e. covering $\mathbf{m}$) is equivalent to reaching state Δ. For each vector coordinate, we introduce a looping transition on Δ that reduces the value of that coordinate by 1. This ensures that $(0, \ldots, 0)$ can be reached from any configuration in state Δ. As a result, covering $\mathbf{m}$ in the original net is equivalent to reaching $(\Delta, (0, \ldots, 0))$ in the augmented version.
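The reduction above can be written down in a few lines. The following Python sketch is our own illustration, not KReach code: transitions are `(source, displacement, target)` triples, the new state is named `"Delta"` by assumption, and the small bounded breadth-first search is only there to demonstrate the construction on toy inputs (it is not a decision procedure).

```python
from collections import deque
from typing import List, Sequence, Tuple

# A VASS transition: (source state, displacement vector, target state).
Transition = Tuple[str, Tuple[int, ...], str]

def cover_to_reach(transitions: Sequence[Transition], final_state: str,
                   m: Sequence[int], delta_state: str = "Delta"):
    """Augment a VASS so that covering m in final_state becomes reaching
    (delta_state, 0...0). Returns (new transitions, target configuration)."""
    d = len(m)
    augmented: List[Transition] = list(transitions)
    # delta subtracts m, so it is enabled only if the current vector is >= m
    augmented.append((final_state, tuple(-x for x in m), delta_state))
    # one self-loop per coordinate on Delta, decrementing that coordinate by 1
    for i in range(d):
        unit = tuple(-1 if j == i else 0 for j in range(d))
        augmented.append((delta_state, unit, delta_state))
    return augmented, (delta_state, (0,) * d)

def reachable(transitions, start, target, bound: int) -> bool:
    """Breadth-first search over configurations whose entries stay <= bound
    (a bounded search, only for small demonstrations)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state, vec = queue.popleft()
        if (state, vec) == target:
            return True
        for src, disp, dst in transitions:
            if src != state:
                continue
            nxt = tuple(a + b for a, b in zip(vec, disp))
            if all(0 <= x <= bound for x in nxt) and (dst, nxt) not in seen:
                seen.add((dst, nxt))
                queue.append((dst, nxt))
    return False

net = [("q", (1, 0), "q")]  # toy net: the first coordinate can only grow
aug, target = cover_to_reach(net, "q", (2, 0))
print(reachable(aug, ("q", (0, 0)), target, bound=5))    # True
aug2, target2 = cover_to_reach(net, "q", (0, 1))
print(reachable(aug2, ("q", (0, 0)), target2, bound=5))  # False
```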

**Implementation** KReach is implemented in the Haskell programming language. This is a strongly-typed, functional language with lazy evaluation. The language was chosen for its high level of expressiveness, type-safety, and the ease of translation between algorithm and implementation.

We represent the algorithm as a function which takes a list of GVASSs, and returns a KosrajuResult. We perform a depth-first search of the refinement tree, either finding a refinement which permits reachability (KosarajuHolds) or exhausting all possibilities (KosarajuDoesNotHold). The algorithm is guaranteed to terminate [17], and so constitutes a full decision procedure for reachability.
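The search strategy described above is an ordinary depth-first traversal that stops at the first success. The Python sketch below is our own illustration rather than KReach's Haskell code: `holds` and `refine` are hypothetical callbacks standing for the θ check and for refinement generation, and termination relies on the refinement tree being finite, which Kosaraju's argument provides but the sketch does not check.

```python
from typing import Callable, Iterable, TypeVar

G = TypeVar("G")

def search_refinements(roots: Iterable[G],
                       holds: Callable[[G], bool],
                       refine: Callable[[G], Iterable[G]]) -> bool:
    """Depth-first search of a refinement tree.

    holds(g)  -- hypothetical test of the theta condition on g
    refine(g) -- hypothetical generator of the children of g (may be empty)
    Returns True as soon as some node satisfies holds (cf. KosarajuHolds),
    and False once the whole tree is exhausted (cf. KosarajuDoesNotHold).
    """
    stack = list(roots)[::-1]  # leftmost root on top
    while stack:
        g = stack.pop()
        if holds(g):
            return True
        stack.extend(list(refine(g))[::-1])  # explore children left to right
    return False

# Toy tree with nodes 1..7: node g < 4 has children 2g and 2g + 1.
children = lambda g: [2 * g, 2 * g + 1] if g < 4 else []
print(search_refinements([1], lambda g: g == 5, children))  # True
print(search_refinements([1], lambda g: g == 9, children))  # False
```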

The ILP subproblem (θ1) is solved with the SBV (SMT Based Verification) package, an interface to a variety of SMT solvers. We formulate all the constraints as an integer linear program, and evaluate with the ldn function (Linear Diophantine equations over Naturals).

The coverability subproblem ($\theta_2$) is solved by an implementation of the standard Karp-Miller algorithm for Vector Addition Systems [16]. This algorithm computes (a finite representation of) the *coverability set*: the downward closure of the set of all vectors that are reachable in a net from some starting vector. The extensible nature of the code allows the basic implementation to be swapped out for a more optimised one (e.g. based on [10]) at a later stage.
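For readers unfamiliar with the Karp-Miller construction, here is a compact Python sketch for a plain VAS (no control states, which a VASS would add). It is our own illustration rather than KReach's implementation: `math.inf` stands in for ω, a branch stops when a label repeats one of its ancestors, and a new vector that strictly dominates an ancestor has every strictly larger coordinate accelerated to ω. A target vector is then coverable exactly when some returned label dominates it.

```python
import math
from typing import Iterable, Sequence, Set, Tuple

OMEGA = math.inf  # stands for the symbol omega ("unboundedly large")
Vec = Tuple[float, ...]

def karp_miller(init: Sequence[int], transitions: Sequence[Sequence[int]]) -> Set[Vec]:
    """Labels of a Karp-Miller tree for a VAS started in init.

    The downward closure of the returned omega-vectors is the coverability set."""
    labels: Set[Vec] = set()
    work = [(tuple(init), ())]  # (node label, labels of its ancestors)
    while work:
        v, ancestors = work.pop()
        labels.add(v)
        if v in ancestors:  # label repeats on this branch: leaf
            continue
        path = ancestors + (v,)
        for t in transitions:
            w = tuple(a + b for a, b in zip(v, t))  # OMEGA + k == OMEGA
            if any(x < 0 for x in w):
                continue  # transition not enabled
            accelerated = list(w)
            for u in path:  # compare with every ancestor, including v
                if u != w and all(a <= b for a, b in zip(u, w)):
                    for i, (a, b) in enumerate(zip(u, w)):
                        if a < b:
                            accelerated[i] = OMEGA  # this coordinate can be pumped
            work.append((tuple(accelerated), path))
    return labels

def is_coverable(labels: Iterable[Vec], target: Sequence[int]) -> bool:
    """target is coverable iff some label dominates it componentwise."""
    return any(all(a >= b for a, b in zip(lab, target)) for lab in labels)

labels = karp_miller((0, 0), [(1, 0)])
print(sorted(labels))                  # [(0, 0), (inf, 0)]
print(is_coverable(labels, (100, 0)))  # True
print(is_coverable(labels, (0, 1)))    # False
```

Acceleration is computed against the unaccelerated successor `w`, matching the textbook formulation; termination follows from Dickson's lemma and König's lemma.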

We ensure that the strongly connected property holds by decomposing the original GVASS via the SCC implementation found in the Data.Graph module.

**Optimisations** In spite of the ominous non-elementary complexity lower bound, some effort was still undertaken to improve the runtime of test cases. A number of minor improvements have been made over the standard algorithm which remove unnecessary computations.

For example, when constructing refinements for a GVASS $\mathcal{G}$ in which a variable is bounded above by some constant $c$, Kosaraju suggests generating refinements $R_i(\mathcal{G})$ for every $i \in \{0, \ldots, c\}$. Instead, we generate only the refinements $R_i(\mathcal{G})$ for values $i$ that occur in the corresponding semilinear set.

The algorithm has also been multithreaded with Haskell's lightweight concurrency toolkit [1], so that it evaluates refinements in parallel rather than sequentially. As soon as any refinement returns KosarajuHolds, the program terminates.

The program uses the vass library (released as part of this publication) to parse file formats. By default a parser for MIST's .spec format<sup>1</sup> is provided. This format is traditionally a representation of coverability problems; KReach translates these to reachability problems by replacing p ≥ n constraints by p = n in target places.

## **3 Installation and Usage**

**Installation** The KReach tool is available from a public GitHub repository. One can clone the repository in full with the following command:

git clone https://github.com/dixonary/kosaraju.git

The program is built against the Haskell stack toolchain<sup>2</sup>. In order to build the tool, a locally installed version of stack is required. The tool can be compiled and locally installed by running stack install in the cloned directory. One must also ensure that an SMT solver is installed and accessible on the user's binary path; z3<sup>3</sup> and cvc4<sup>4</sup> are supported. A compiled program binary, along with benchmarks, is provided on the "Releases" section of the GitHub page.

**Usage** The compiled kosaraju tool can be interacted with through the command line. Simple wrapper scripts are provided; the standard invocation is kreach FILENAME to check reachability, and kcover FILENAME for coverability. Intermediate output can be hidden by providing the -q (quiet) flag.

Figure 3b shows the relative performance of z3 against cvc4 for growing inputs. cvc4 tends to far outperform z3 on the constructed ILP problems.

## **4 Experimental Results**

KCover allows us to use benchmarks for the coverability problem as a source of test cases for the reachability algorithm. The suite provided with the tool also includes a number of test cases for various aspects of the implementation, as well as examples from the non-elementary lower bound construction [7].

KReach was evaluated against many problems and solvers from the literature on coverability. QCover [4] implements coverability based on relaxation to continuous coverability; ICover [11] refines this further with inductive invariants.

<sup>1</sup> https://github.com/pierreganty/mist/wiki

<sup>2</sup> https://docs.haskellstack.org/en/stable/README/

<sup>3</sup> https://github.com/Z3Prover/z3

<sup>4</sup> https://cvc4.github.io

(a) The parameterized version of our original GVASS, $\mathcal{G}_{\text{ex}}(X)$.

(b) Time against parameter $X$ for $\mathcal{G}_{\text{ex}}(X)$ with the supported solvers.

Table 1 includes some specific instances which are representative of the broader trends in the experimental results. On many safe cases, such as Kanban and Bingham, KReach is able to determine safety faster than state-of-the-art coverability solvers by finding zero valid refinements (terminating the search immediately). On some safe nets, such as Manufacturing, KReach cannot immediately rule out coverability in this way, and the refinement tree must be explored. The Bug\_Tracking examples induced intractably large ILP problems. Unsafe cases such as PNCSACover induced large refinement trees, which could not be fully explored within the time limit.


Table 1: Sample of test cases. All results were computed on consumer hardware. MLE = Memory Limit Exceeded (4GB); TLE = Time Limit Exceeded (1 hour).

## **5 Concluding Remarks**

The experimental results suggest that KReach may be a fruitful source of static invariants for ruling out coverability on some classes of Petri nets. One line of further work may be to attempt to formally classify those nets for which Kosaraju's algorithm is effective in practice.

Further work may also include optimisations based on the novel theoretical developments in the Ackermannian upper bound proof [21], and building parsers to enable experiments on instances of problems that are known to reduce to reachability in Petri nets (e.g., in logic [15,8], concurrent systems [9] or process calculi [25]).

**Data Availability Statement** The data analyzed here are available in the Figshare data repository: https://doi.org/10.6084/m9.figshare.11887956

## **References**



## **AVR: Abstractly Verifying Reachability**

Aman Goel (✉) and Karem Sakallah

University of Michigan, Ann Arbor MI 48105, USA {amangoel,karem}@umich.edu

**Abstract.** We present AVR, a push-button model checker for verifying state transition systems directly at the source-code level. AVR uses information embedded in the word-level syntax of the design representation to automatically perform scalable model checking by combining a novel syntax-guided abstraction-refinement technique with a word-level implementation of the IC3 algorithm. AVR provides independently-verifiable certificates that offer provable assurance and are easy to relate to the word-level system. Moreover, proof certificates can be further used in innovative ways to extract key design information and are useful in a growing number of applications.

## **1 Introduction**

Model checking [27,28] techniques based on incremental induction (such as IC3 [19,31]) have achieved significant success [21] due to their property-directed nature and clever use of incremental SAT solving. Bit-level implementations of IC3, however, struggle to scale because they are overwhelmed by low-level propositional learning [33]. Rapid advances in SMT solving [54,12] offer a solution: they allow IC3 to be performed directly at the word level by combining the incremental induction algorithm with an abstraction-refinement procedure [18,41,23,34].

AVR [2] is a model checker designed primarily for verifying safety properties of hardware. It uses syntax-guided abstraction [34], a generalization of implicit predicate abstraction [22], to perform IC3-style reachability on a first-order logic encoding of the transition relation, which results in word-level clause learning. Upon termination, AVR produces either a proof certificate, in the form of a state formula representing an inductive invariant, if the safety property holds, or a counterexample execution trace if it fails. In both cases, confidence in the verification output is established independently: an external proof checker confirms the correctness of the proof certificate, and a trace simulator replays the sequence of transitions leading to the failure. Beyond hardware, these features allow AVR to be used in innovative ways, including the verification of distributed protocols defined over unbounded domains [44,45]. AVR also provides a variety of complementary verification techniques, such as data abstraction and interpolation, to increase its scalability, as well as useful utilities, such as design statistics and graphical visualizations, that provide high-level insights into the input design. AVR was independently evaluated as the best word-level verifier in the single bit-vector track of the Hardware Model Checking Competition (HWMCC) 2019 [17].

## **2 Motivation**

Consider a predicate p := (a + b < 1) defined over two 32-bit variables a and b. An equivalent propositional representation of p is a bit-blasted expression over 64 Boolean variables with several hundred clauses. As a consequence, bit-level model checking algorithms scale poorly as variable bit widths increase and suffer from the so-called state-space explosion problem [26].
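To make the size of the bit-level encoding concrete, the following minimal sketch bit-blasts p for n-bit operands into a ripple-carry adder followed by a zero test, and emits Tseitin clauses for every gate. It assumes unsigned arithmetic modulo 2^n (so p holds exactly when the truncated sum is 0); the encoding choices (ripple-carry, 4 clauses per XOR, 3 per AND/OR) are illustrative assumptions and are not how any particular tool, AVR included, bit-blasts.

```python
from itertools import product


class Circuit:
    """Gate-level circuit; each gate gets a fresh variable plus Tseitin clauses."""

    def __init__(self):
        self.num_vars = 0
        self.clauses = []   # CNF as lists of signed ints (DIMACS-style literals)
        self.gates = []     # (kind, output, (in1, in2)) in creation order

    def new_var(self):
        self.num_vars += 1
        return self.num_vars

    def _gate(self, kind, x, y, clauses_for):
        z = self.new_var()
        self.clauses.extend(clauses_for(z))
        self.gates.append((kind, z, (x, y)))
        return z

    def and_gate(self, x, y):  # z <-> x & y
        return self._gate("and", x, y, lambda z: [[-z, x], [-z, y], [z, -x, -y]])

    def or_gate(self, x, y):   # z <-> x | y
        return self._gate("or", x, y, lambda z: [[z, -x], [z, -y], [-z, x, y]])

    def xor_gate(self, x, y):  # z <-> x ^ y
        return self._gate("xor", x, y, lambda z: [[-z, x, y], [-z, -x, -y],
                                                  [z, -x, y], [z, x, -y]])


def bitblast_p(n):
    """Bit-blast p := ((a + b) mod 2^n) < 1 for unsigned n-bit a, b (ripple-carry)."""
    c = Circuit()
    a = [c.new_var() for _ in range(n)]   # a[0] is the least significant bit
    b = [c.new_var() for _ in range(n)]
    sums, carry = [], None
    for i in range(n):
        t = c.xor_gate(a[i], b[i])
        sums.append(t if carry is None else c.xor_gate(t, carry))
        if i < n - 1:  # the carry out of the top bit is dropped (arithmetic mod 2^n)
            g = c.and_gate(a[i], b[i])
            carry = g if carry is None else c.or_gate(g, c.and_gate(t, carry))
    # An unsigned value is < 1 iff it is 0, i.e. iff no sum bit is set.
    any_set = sums[0]
    for s in sums[1:]:
        any_set = c.or_gate(any_set, s)
    return c, a, b, -any_set              # negative literal means NOT any_set


def evaluate(c, inputs, lit):
    """Simulate the circuit for an input assignment {var: bool}; return lit's value."""
    val = dict(inputs)

    def value(l):
        return (not val[-l]) if l < 0 else val[l]

    ops = {"and": lambda x, y: x and y,
           "or": lambda x, y: x or y,
           "xor": lambda x, y: x != y}
    for kind, out, (x, y) in c.gates:     # gates were created in topological order
        val[out] = ops[kind](value(x), value(y))
    return value(lit)


def agrees_with_word_level(n):
    """Exhaustively compare the circuit with the word-level predicate (small n only)."""
    c, a, b, p = bitblast_p(n)
    for x, y in product(range(2 ** n), repeat=2):
        inputs = {a[i]: bool(x >> i & 1) for i in range(n)}
        inputs.update({b[i]: bool(y >> i & 1) for i in range(n)})
        if evaluate(c, inputs, p) != ((x + y) % 2 ** n < 1):
            return False
    return True
```

With this particular encoding, `bitblast_p(32)` has 64 input variables and 618 clauses, whereas the word-level predicate remains a single atom; other encodings give different but similarly sized counts.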

AVR derives its motivation from the fact that the word-level representation of a problem contains useful high-level information that can be exploited for better scalability. Building on our previous work [33,34], AVR uses this insight to infer an implicit syntax-guided abstraction using terms built from objects present in the word-level syntactic description of the problem (like a, b, 1, +, <). The approach can be further combined with data abstraction using uninterpreted functions [20,11] to simplify reasoning for the underlying query solver. This, coupled with efficient SMT solving, allows for an effective word-level model checking algorithm that can scale better than bit-level engines for a variety of verification problems. Moreover, the underlying induction-based verification procedure has the unique strength of producing word-level proof certificates that are useful in a variety of applications [32,37,45,44].

## **3 System Architecture**

**Fig. 1:** Verification flow with AVR (UF: uninterpreted functions; BV: bit-vectors; LIA: linear integer arithmetic)

Fig. 1 shows the architecture and verification flow of AVR.

Frontends in AVR extract the model checking problem from inputs in different formats using openly-available tools.

- Verilog + SystemVerilog Assertions [9] (using Yosys [55])
- VMT [8] (using MathSAT 5 [24])
- BTOR2 [51] (using Btor2Tools [3])

AVR core performs IC3 with syntax-guided abstraction (IC3+SA) and implements several verification techniques and utilities (detailed in §3.1, §3.2).

SMT solver backends use the latest versions of state-of-the-art SMT solvers (Yices 2 [30], Boolector [50], MathSAT 5 [24] and Z3 [48]) to efficiently integrate incremental solver reasoning with AVR core using a C++ interface.

Multi-engine wrapper allows for process-level parallelism by running multiple instances of AVR in parallel using proof race (as elaborated later in §3.3).

## **3.1 Techniques**

At its core, AVR implements a word-level IC3 procedure where terms in the implicit syntax of the problem are used as building blocks to perform IC3-style clause learning at the word level using SMT solving. The key differences between IC3+SA [34], as implemented in AVR, and bit-level IC3 [19,31] can be summarized as follows:


Within the core IC3+SA framework, AVR implements several optimizations and important features that are helpful in improving model checking performance.

## **Core features**


## **Add-on techniques**


#### **Utilities**

AVR also provides a number of useful utilities to the user including:


#### **3.2 Certificates**

Once a model checking problem is solved, there can be two possible outcomes: either the property holds (safe), or it fails (unsafe).

If the property holds, IC3+SA produces an inductive invariant, i.e. an approximate fixpoint that establishes that the property holds in all executions of the system. Inductive invariants act as proof certificates that guarantee the correctness of the verification outcome. AVR prints such proof certificates directly in the SMT-LIB format, which allows their correctness to be checked independently using an external SMT solver such as Yices 2 or Z3. Since proof certificates are in the word-level format, they are human-readable and much easier to relate to the word-level input directly at the source-code level (unlike bit-level invariants, which are usually too hard to understand). Proof certificates have many useful applications, including the derivation of inductive validity cores [32], gaining deeper insights into design behavior, deriving assume-guarantee verification conditions [37,53], deriving helper assertions during multi-property verification [36,29], and generalizing to quantified domains (as elaborated later in §4.3).
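The standard conditions an external checker confirms for an inductive invariant Inv are initiation (Init ⇒ Inv), consecution (Inv ∧ T ⇒ Inv′) and safety (Inv ⇒ P). AVR hands these queries, over its SMT-LIB certificate, to an SMT solver; the sketch below instead brute-forces them on a tiny explicit-state system, and the 4-bit toy design and the candidate invariants are hypothetical, chosen only to show what each check rules out.

```python
from typing import Callable, Iterable, List


def check_certificate(states: Iterable[int],
                      init: Callable[[int], bool],
                      successors: Callable[[int], Iterable[int]],
                      prop: Callable[[int], bool],
                      inv: Callable[[int], bool]) -> List[str]:
    """Brute-force the three standard inductive-invariant checks.

    Returns the names of the violated conditions; an empty list means inv is
    an inductive invariant that implies prop on this (finite) system.
    """
    states = list(states)
    failures = []
    if any(init(s) and not inv(s) for s in states):
        failures.append("initiation")   # Init => Inv
    if any(inv(s) and not inv(t) for s in states for t in successors(s)):
        failures.append("consecution")  # Inv /\ T => Inv'
    if any(inv(s) and not prop(s) for s in states):
        failures.append("safety")       # Inv => P
    return failures


# Hypothetical toy design: a 4-bit register x, x := 0 initially, x := x + 2 each step.
WIDTH = 4


def init(x: int) -> bool:
    return x == 0


def successors(x: int) -> List[int]:
    return [(x + 2) % 2 ** WIDTH]


def prop(x: int) -> bool:   # safety property: x never equals 7
    return x != 7


def even(x: int) -> bool:   # candidate certificate: x is even
    return x % 2 == 0
```

Checking `even` yields no failures, while using the property itself as the certificate fails consecution: x = 5 satisfies it but its successor 7 does not, which is why a checker needs a genuinely inductive strengthening.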

When the property fails, AVR produces a counterexample trace that establishes how to reach a bad state (a state where the property is false) starting from an initial state. AVR prints the counterexample witness in the BTOR2 witness format [51], which allows the execution trace to be verified independently using a BTOR2 witness simulator [4]. This allows the designer to debug and pinpoint the source of the error by analyzing the execution leading to the buggy state.
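Replaying a counterexample amounts to starting from the initial state, applying the transition function to the recorded inputs frame by frame, and checking that the final state is bad. The following is a minimal sketch of that idea; the dictionary-based states, the `step` function and the input frames are hypothetical stand-ins and do not follow the BTOR2 witness format.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Inputs = Dict[str, int]


def replay(init: State,
           step: Callable[[State, Inputs], State],
           trace: List[Inputs],
           bad: Callable[[State], bool]) -> Tuple[bool, List[State]]:
    """Simulate the recorded inputs from init; report whether the last state is bad."""
    states = [dict(init)]
    for frame in trace:
        states.append(step(states[-1], frame))
    return bad(states[-1]), states


# Hypothetical 4-bit design: x := x + inc (mod 16); the property fails when x == 3.
def step(s: State, i: Inputs) -> State:
    return {"x": (s["x"] + i["inc"]) % 16}


def bad(s: State) -> bool:
    return s["x"] == 3
```

Returning the whole state sequence, not only the verdict, mirrors how a simulator is used for debugging: the designer inspects the intermediate states to locate where the execution diverges from the intended behavior.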

#### **3.3 Proof Race**

AVR supports a variety of configurations and add-on features (as discussed in §3.1). Without detailed knowledge of the input, it is hard to tell upfront which technique will perform best. Different configurations suit different types of problems, but trying them manually can become tedious for the user. To counter this, AVR offers a multi-engine wrapper called proof race that automatically runs multiple instances of AVR with different configurations in parallel, as separate processes. Given a set of specified resource limits, proof race launches the AVR instances and terminates execution as soon as one of them successfully races to the result. Such a portfolio-based approach is crucial in practice for fast verification, since no single technique performs best in all cases [21,16]. The portfolio is further strengthened by complementing AVR's word-level techniques with state-of-the-art model checking engines such as ABC dprove [14] and IC3ia [23].
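The "first conclusive answer wins" policy of a portfolio can be sketched with Python's `concurrent.futures`. This is an illustration under several assumptions: engines are plain functions returning `"safe"`, `"unsafe"` or `None` (gave up), they are threads that poll a shared stop flag (proof race itself uses separate processes, which can simply be killed), and the engine names and timings are hypothetical.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict, Optional, Tuple

Engine = Callable[[threading.Event], Optional[str]]


def proof_race(engines: Dict[str, Engine],
               time_limit: float) -> Optional[Tuple[str, str]]:
    """Run all engines concurrently; return (engine, verdict) of the first conclusive one."""
    stop = threading.Event()
    deadline = time.monotonic() + time_limit
    winner = None
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {pool.submit(run, stop): name for name, run in engines.items()}
        pending = set(futures)
        while pending and winner is None:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                                   # resource limit exhausted
            done, pending = wait(pending, timeout=remaining,
                                 return_when=FIRST_COMPLETED)
            for fut in done:
                verdict = fut.result() if fut.exception() is None else None
                if verdict in ("safe", "unsafe"):
                    winner = (futures[fut], verdict)
                    break
        stop.set()  # ask the remaining engines to give up before the pool joins them
    return winner


def make_engine(steps: int, verdict: Optional[str], tick: float = 0.01) -> Engine:
    """Hypothetical engine: works for `steps` ticks, then reports `verdict`."""
    def run(stop: threading.Event) -> Optional[str]:
        for _ in range(steps):
            if stop.is_set():
                return None
            time.sleep(tick)
        return verdict
    return run
```

An inconclusive engine (one that returns `None`) does not end the race; only a definite verdict or the time limit does, which matches the idea of racing configurations to the result under a resource limit.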

## **4 Case Studies**<sup>1</sup>

## **4.1 Apache Buffer Overflow**

We consider patched versions of two buffer overflow vulnerabilities [40] from standard modules of the Apache web server [1].

apache-escape-absolute patches the high-severity vulnerability CVE-2006-3747 [7], an out-of-bounds buffer overflow that allows a remote attacker to cause a denial of service and execute arbitrary code via crafted URLs. The patched version corrects the check (c < TOKEN\_SZ) to (c < TOKEN\_SZ − 1).
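Why the "− 1" matters can be seen on a hypothetical, simplified copy loop (not the actual Apache source, which is not reproduced here): if characters are copied while c < TOKEN\_SZ and a terminator is then written at index c, that final write can land one element past the end of the buffer. The sketch only tests all buffer sizes and input lengths up to a bound; unlike AVR's proof, it says nothing about larger sizes.

```python
def writes_in_bounds(token_sz: int, input_len: int, patched: bool) -> bool:
    """Hypothetical copy loop: copy input characters into token[], then terminate it.

    patched=False models the loop guard  c < TOKEN_SZ
    patched=True  models the loop guard  c < TOKEN_SZ - 1
    Returns True iff every index written lies inside token[0 .. token_sz - 1].
    """
    limit = token_sz - 1 if patched else token_sz
    written = []
    c = 0
    while c < limit and c < input_len:
        written.append(c)        # token[c] = input[c]
        c += 1
    written.append(c)            # token[c] = terminator, after the loop
    return all(0 <= i < token_sz for i in written)


def bounded_check(patched: bool, max_sz: int = 64) -> bool:
    """Try every buffer size and input length up to a bound (a test, not a proof)."""
    return all(writes_in_bounds(sz, n, patched)
               for sz in range(1, max_sz + 1)
               for n in range(2 * max_sz))
```

For example, with an 8-element buffer and an 8-character input, the unpatched guard lets c reach 8 before the terminator is written, while the patched guard stops at 7.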

apache-get-tag fixes a medium severity vulnerability CVE-2004-0940 [6] that exploits a buffer overflow when copying user-supplied tag strings into finite buffers. A local attacker may leverage this issue to execute arbitrary code on the affected computer with the privileges of the affected Apache server. The patched version corrects a check that validates the length of the tag strings.

In less than a minute, AVR successfully verifies that both of these buffer overflow exploits are unreachable in the patched versions for any buffer size. AVR also produces human-readable proof certificates, externally verified using Z3, that provide provable assurance against these security vulnerabilities.

## **4.2 Public Key Authentication Protocol**

The Needham-Schroeder public key authentication protocol [49] allows establishing mutual authentication between an initiator A and a responder B, after which some session involving the exchange of messages between them can take place. Unfortunately, this protocol is vulnerable to a man-in-the-middle attack [43]. If an intruder I can persuade A to initiate a session with him, he can relay the messages to B and convince B that he is communicating with A.

We consider an instance of the protocol from HWMCC'19 [17,52] with three initiators and three responders, where an unsafe state is one in which a responder has finished authentication with the intruder as a party. Within a minute, AVR finds an execution trace that establishes how to reach an unsafe state. The counterexample witness produced by AVR can be replayed using the BtorSIM simulator [4] to verify the execution trace and to debug the protocol.

### **4.3 Verifying Distributed Protocols**

Beyond verifying model checking problems over finite domains, AVR has shown preliminary application in the verification of distributed protocols, which are generally expressed over unbounded domains (with an unbounded number of clients, servers, epochs, messages, etc.). The I4 system [45,44] demonstrates how AVR can be used to verify a simpler finite version of the protocol, and how AVR's proof certificates can then be generalized to the unbounded domain. For example, the finite-domain invariant "clients $C_1$ and $C_2$ cannot both link to the server $S$", i.e. $\neg(\mathit{link}(C_1, S) \land \mathit{link}(C_2, S))$, can be generalized to the unbounded domain as "no two different clients can both link to a server", i.e. $\forall C_1, C_2, S.\ (C_1 \neq C_2) \implies \neg(\mathit{link}(C_1, S) \land \mathit{link}(C_2, S))$.

<sup>1</sup> All results presented in this paper can be replicated from [35,5].

## **5 Strengths**

Control-centric properties, where much of the complexity lies in the control logic (as in sequential equivalence checking, microprocessor instruction control units, or key-value stores), are much easier to verify using AVR. Syntax-guided abstraction hides the domain complexity outside of the problem syntax and automatically separates important control-flow details from the irrelevant data component. Combined with data abstraction, this allows for scalable model checking that can scale independently of the variable bit widths [33,34].

Push-button verification using AVR eliminates the need for tedious human intervention (such as manually identifying abstraction predicates or adding helper assertions) by automatically and incrementally constructing the abstraction and word-level clauses using the IC3+SA algorithm.

Provable assurance on the verification outcome is guaranteed by AVR using independently-checkable proof certificates and counterexample traces.

Useful utilities that AVR provides, such as support for multiple input formats, efficient integration with state-of-the-art SMT solvers, proof race, high-level system statistics and graphical visualizations, make AVR user-friendly and easy to use.

## **6 Limitations**

Heavy data dependency can make word-level techniques in AVR ineffective for certain problems, especially when a majority of bit-precise values in the data domain play an important role (for example, puzzle solving problems like Tower of Hanoi [39], Peg Solitaire [38], etc. formulated as reachability problems [52]). Logic synthesis and bit-level optimizations [14,47] can be very useful for such problems and help bit-level checkers perform better than word-level techniques by significantly decreasing the problem complexity at the bit level.

First-order logic fragments beyond quantifier-free bit-vectors, arrays and uninterpreted functions (such as non-linear arithmetic, floating-point numbers, quantifiers, etc.) and properties beyond safety (such as liveness and fairness) have limited support in the current tool implementation. AVR's primary focus has been on verification of safety properties defined on hardware systems.

## **7 Conclusions**

AVR provides a variety of techniques to efficiently perform automatic word-level verification using SMT solvers with provable guarantees and security. AVR has been effective in hardware verification [17,33,34] and shows significant promise for the verification of distributed protocols [44,45]. In the future, we plan to address some of its current limitations and extend its application to practical verification problems beyond the hardware domain.

Data Availability Statement and Acknowledgments. The software and datasets generated and analyzed during the current study are available in the Zenodo repository: https://doi.org/10.5281/zenodo.3677545. The authors would like to thank Ranan Fraer, Ravi Prakash, Habeeb Farah and Ziyad Hanna from Cadence Design Systems for their help in shaping some of the concepts presented in this paper.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Timed and Probabilistic Systems

## **Verified Certification of Reachability Checking for Timed Automata**

Simon Wimmer and Joshua von Mutius

Fakultät für Informatik, Technische Universität München, Munich, Germany wimmers@in.tum.de joshua.von-mutius@tum.de

**Abstract.** Prior research has shown how to construct a mechanically verified model checker for timed automata, a popular formalism for modeling real-time systems.

In this paper, we shift the focus from verified model checking to certifying unreachability. This allows us to benefit from better approximation operations for symbolic states, and reduces execution time by exploring fewer states and by exploiting parallelism. Moreover, this gives us the ability to audit results of unverified model checkers that implement a range of further optimizations, including certificate compression.

The resulting tool is evaluated on a set of standard benchmarks to demonstrate its practicality, using a new unverified model checker implementation in Standard ML to construct the certificates.

**Keywords:** Timed automata · Certification · Model Checking · Interactive Theorem Proving · Isabelle/HOL

Timed automata [1] are a widely used formalism for modeling real-time systems, which is employed in a class of successful model checkers such as Uppaal [4]. These tools can be understood as trust-multipliers: we trust their correctness to deduce trust in the safety of systems checked by these tools. As a consequence, one wants to ensure as rigorously as possible that the computation results of timed automata model checkers are correct.

Previous work [31] has addressed this problem by constructing a model checker for timed automata that is fully verified using Isabelle/HOL [25]. This tool is intended to be a reference implementation that can be used to scrutinize the correctness of other model checkers. As such, it is mainly able to check small and medium-sized benchmark examples, but the performance gap w.r.t. more practical model checkers prevents it from checking realistic benchmark models within reasonable time and space bounds.

We address this issue by shifting the focus from fully verified model checking to only certifying that the result produced by an unverified model checker is correct. We only study reachability: it is the most important property that is checked with timed automata model checkers, and some model checkers only support reachability. It is crucial to ensure that a bad state is certainly not reachable if the model checker claims so, thus we want to certify *unreachability*. Certifying that a state is indeed reachable would amount to extracting a timed trace and certifying that the trace is compatible with the model. While implementing this in a verified manner would be comparatively easy, we consider it less important because it corresponds to the bug-finding functionality of model checkers, which carries less trust.

The recipe for certifying unreachability is simple: the model checker explores a number of states until it determines that there are no more states to be found. If none of the states fulfills the final state predicate (i.e. violates the safety property), then the model checker answers "unreachable". We use the set of explored states as the unreachability certificate. In essence, we only need to check that the initial state is contained in this set, that no transition leads from a state in this set to a state outside of it, and that none of the states in the set fulfills the final state predicate.
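For a finite transition system with explicit states, the three checks of this recipe can be written down directly. This is a minimal sketch assuming hashable states and an enumerable successor function; the certifier developed in this paper works on symbolic states and additionally allows subsumption, and the small graph below is hypothetical.

```python
from typing import Callable, Hashable, Iterable, Set


def certifies_unreachability(cert: Set[Hashable],
                             init: Hashable,
                             succs: Callable[[Hashable], Iterable[Hashable]],
                             final: Callable[[Hashable], bool]) -> bool:
    """Check the three conditions of the recipe on an explicit state set."""
    contains_init = init in cert
    closed = all(t in cert for s in cert for t in succs(s))  # no edge leaves cert
    no_final = not any(final(s) for s in cert)
    return contains_init and closed and no_final


# Hypothetical graph: 0 -> 1 -> 2 -> 0 and 3 -> 4 -> 5, where state 5 is final.
EDGES = {0: [1], 1: [2], 2: [0], 3: [4], 4: [5], 5: []}


def succs(s: int) -> Iterable[int]:
    return EDGES[s]


def final(s: int) -> bool:
    return s == 5
```

Here {0, 1, 2} certifies that state 5 is unreachable from 0; dropping state 2 breaks closure, and taking all states breaks the "no final state" condition.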

The switch to certification holds many advantages. Timed automata model checking uses *over-approximations* of symbolic states to ensure termination. A large variety of these approximation operators has been studied [2,3,14]. Our previous work [29] has shown that, while formally proving the correctness of these approximation operations is feasible in principle with an interactive theorem prover, the effort is rather high. Instead, to *certify* unreachability, it suffices to know only that the approximation operator indeed yields a state that is at least as big as the precise symbolic state. Certifying this property is cheap.

Moreover, certification eases *parallelization*. Checking that a state is not final and that all its successors are covered by the state set are local properties. We show how to exploit this in a verified implementation, while only mildly increasing the verification effort and the size of the trusted code base.
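Because each of these obligations mentions only one certificate state and its direct successors, the checks can be distributed independently, with only the initial-state check remaining global. The sketch below illustrates this on explicit states using a thread pool; the graph, the worker count and the use of Python are assumptions for illustration (in CPython, CPU-bound checks would need processes for real speed-ups), and it is not the verified parallel implementation described in the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Hashable, Iterable, Set


def local_check(s: Hashable, cert: Set[Hashable],
                succs: Callable[[Hashable], Iterable[Hashable]],
                final: Callable[[Hashable], bool]) -> bool:
    """Per-state obligation: s is not final and each successor of s is in cert."""
    return not final(s) and all(t in cert for t in succs(s))


def parallel_certify(cert: Set[Hashable], init: Hashable,
                     succs: Callable[[Hashable], Iterable[Hashable]],
                     final: Callable[[Hashable], bool],
                     workers: int = 4) -> bool:
    """Only the initial-state check is global; the rest run independently per state."""
    if init not in cert:
        return False
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return all(pool.map(lambda s: local_check(s, cert, succs, final), cert))


# Hypothetical graph: 0 -> 1 -> 2 -> 0 and 3 -> 4 -> 5, where state 5 is final.
EDGES = {0: [1], 1: [2], 2: [0], 3: [4], 4: [5], 5: []}
```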

Finally, the number of states explored by a model checker can vary immensely, depending on a range of factors such as the chosen approximation operator or the search order. Thus, an efficient unverified tool can exploit different heuristics and strategies to compute a state space that is as small as possible, and thereby speed up the certification effort. In this context, we also study a number of *compression techniques* to reduce the number of states in the certificate after the model checker has concluded its search.

We use a new unverified model checker called Mlunta, which is implemented in Standard ML (SML), to generate certificates for a set of standard benchmarks, and to evaluate our verified certifier's performance on these benchmarks.<sup>1</sup>

*Related Work* This work is based on an existing Isabelle/HOL formalization of timed automata model checking [29,31]. Other proof-assistant formalizations of timed automata focus on proving elementary properties about the basic formalism [33,34], or proving properties about concrete automata [26,10,8], but none of them are concerned with model checking.

Earlier work formalizes a model checker for the modal μ-calculus [28], and constructs a verified finite state LTL model checker [9,24,6].

<sup>1</sup> Both tools are available online: https://doi.org/10.5281/zenodo.3679245.

The idea of extracting certificates from the model checking process has previously been studied in the context of the μ-calculus [23] and finite state LTL model checking [27]. However, these works are not accompanied by a verified certificate checker and do not attempt to scale the approach to practical examples. Only the recent work of Griggio et al. [11] provides a practical extraction mechanism and a certificate checker for LTL model checking, but the checker is not verified. To the best of our knowledge, we are the first to examine certification in the context of timed automata model checking.

Finally, in the context of software verification, the idea of producing certificates for the correctness of a program has been broadly studied [16,5].

*Isabelle/HOL* Isabelle/HOL [25] is an interactive theorem prover based on Higher-Order Logic (HOL). HOL can be thought of as a combination of a functional programming language and mathematical logic. Isabelle/HOL's notation mostly resembles standard mathematical notation. Some conventions that are borrowed from functional programming need to be explained, however. Functions are mostly curried, i.e. of type τ₁ ⇒ τ₂ ⇒ τ instead of τ₁ × τ₂ ⇒ τ. As a consequence, function application is usually denoted as f a b instead of f(a, b). Function abstraction with lambda terms uses the standard syntax λx. t (the function that maps x to t) and can also have paired arguments λ(x, y). t. Type variables are written 'a, 'b, etc. Compound types are written in postfix syntax: τ set is the type of sets of elements of type τ. We use the Isabelle/HOL convention that free variables are implicitly all-quantified throughout the paper. In parts of the paper, formulas or syntax have been simplified for readability, but we have stayed largely faithful to the Isabelle/HOL formalization.

*Contributions* In short, these are the main contributions of our work:


*Outline* The remainder of the paper is organized as follows. The first section briefly recalls the theory of timed automata, and sketches the state-of-the-art model checking process. The second section details our approach to certification and explains how, starting from an abstract theory, a concrete verified implementation of the certificate checker can be obtained. Section three illustrates a number of techniques to improve the certificate checker's performance, while only mildly increasing the formalization effort. Section four discusses two methods for certificate compression. The paper is concluded by an experimental evaluation and remarks on potential future work.

## **1 Timed Automata and Model Checking**

*Transition Systems* We take a very simple view of transition systems: they are simply a relation → of type 'a ⇒ 'a ⇒ bool for a type of states 'a. We write a →* b to denote that b can be reached from a via a sequence of →-transitions.

*Timed Automata* To make the paper self-contained, this paragraph briefly describes timed automata and is mostly reproduced from Wimmer and Lammich [29]. For a thorough introduction see the tutorial paper of Bengtsson and Yi [4].

Compared to standard finite automata, timed automata introduce a notion of clocks. Figure 1 depicts an example of a timed automaton. We will assume that clocks are of type *nat*. A *clock valuation* u is a function of type *nat* ⇒ *real*. Locations and transitions are guarded by *clock constraints*, which have to be fulfilled to stay in a location or to take a transition.

Fig. 1: Example of a timed automaton with two clocks.

Clock constraints are conjunctions of constraints of the form c ∼ d for a clock c, an integer d, and ∼ ∈ {<, ≤, =, ≥, >}. We write u |= cc if the clock constraint cc holds for the clock valuation u. We define a timed automaton A as a pair (T, I), where I is a mapping from locations to clock constraints (also named invariants), and T is a set of transitions written as $A \vdash l \xrightarrow{g,a,r} l'$, where l and l′ are the start and successor locations, g is the guard of the transition, a is an action label, and r is a list of clocks that will be reset to zero when the transition is taken. States of timed automata are pairs of a location and a clock valuation. The operational semantics defines two kinds of steps (given as their HOL descriptions):

- Delay: $(l, u) \rightarrow^d (l, u \oplus d)$ if $d \ge 0$ and $u \oplus d \models I\ l$;
- Action: $(l, u) \rightarrow_a (l', [r := 0]u)$ if $A \vdash l \xrightarrow{g,a,r} l'$, $u \models g$, and $[r := 0]u \models I\ l'$;

where u ⊕ d = (λc. u c + d) offsets all clocks by d in the valuation u, and [r := 0]u = (λc. if c ∈ r then 0 else u c) resets all clocks in r to 0 in valuation u. For any (timed) automaton A, we consider the transition system

$$(l, u) \rightarrow_A (l', u') = \left(\exists d \ge 0.\ \exists a\ u''.\ (l, u) \rightarrow^d (l, u'') \land (l, u'') \rightarrow_a (l', u')\right).$$

That is, each transition consists of a delay step that advances all clocks by some amount of time, followed by an action step that takes a transition and resets the clocks annotated to the transition. Given a final state predicate F and an initial state (l₀, u₀), we are interested in whether $(l_0, u_0) \rightarrow_A^* (l, u)$ for any l, u with F l. In Figure 1, the final state is l₃ (i.e. F l ⟷ l = l₃). As the guard for action a₄ is never enabled, l₃ is unreachable.
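The delay and action steps can be transcribed almost literally for concrete clock valuations. In this minimal sketch, clocks are named by strings, clock constraints are tuples of (clock, operator, constant) triples, and an edge is a tuple (l, g, a, r, l′); these representations and the one-edge automaton used in the tests are hypothetical and do not encode Figure 1, whose guards are not reproduced here.

```python
import operator
from typing import Dict, Optional, Set, Tuple

Valuation = Dict[str, float]
Constraint = Tuple[Tuple[str, str, int], ...]           # conjunction of c ~ d
Edge = Tuple[str, Constraint, str, Tuple[str, ...], str]  # (l, g, a, r, l')

OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}


def sat(u: Valuation, cc: Constraint) -> bool:
    """u |= cc"""
    return all(OPS[op](u[c], d) for c, op, d in cc)


def delay(u: Valuation, d: float) -> Valuation:
    """u ⊕ d: advance every clock by d."""
    return {c: v + d for c, v in u.items()}


def reset(r: Tuple[str, ...], u: Valuation) -> Valuation:
    """[r := 0]u: set the clocks in r to 0 and keep the others."""
    return {c: 0.0 if c in r else v for c, v in u.items()}


def step(T: Set[Edge], I: Dict[str, Constraint], l: str, u: Valuation,
         d: float, edge: Edge) -> Optional[Tuple[str, Valuation]]:
    """One ->_A transition: delay by d in l, then take edge; None if not enabled."""
    src, g, _action, r, dst = edge
    if edge not in T or src != l or d < 0:
        return None
    u1 = delay(u, d)              # delay step: requires u ⊕ d |= I l
    if not sat(u1, I[l]):
        return None
    u2 = reset(r, u1)             # action step: requires u1 |= g and [r := 0]u1 |= I l'
    if not sat(u1, g) or not sat(u2, I[dst]):
        return None
    return dst, u2


# Hypothetical automaton: clocks x, y; invariant x <= 2 in l0; edge l0 -> l1 if x >= 1, reset x.
E: Edge = ("l0", (("x", ">=", 1),), "a", ("x",), "l1")
T = {E}
I = {"l0": (("x", "<=", 2),), "l1": ()}
u0 = {"x": 0.0, "y": 0.0}
```

Delaying 1.5 time units and then taking the edge leads to l1 with x reset to 0 and y at 1.5; a delay of 0.5 violates the guard and a delay of 3 violates the invariant of l0, so both are rejected.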

*Model Checking* Due to the use of clock valuations, the state space of timed automata is inherently infinite. Thus, model checking algorithms for timed automata are based on the idea of abstracting from concrete valuations to *sets* of clock valuations of type (*nat* ⇒ *real*) set, often called *zones*. The resulting transition system of reachable states from an initial zone is called the zone graph. It is explored in an *on-the-fly* manner, computing successors on zones, which are typically represented symbolically as *Difference Bound Matrices* (DBMs). Knowledge of this data structure is not necessary to understand the rest of the paper. Thus we refer the interested reader to Bengtsson and Yi [4] and to Wimmer and Lammich [29,31] for a verification of this data structure. In the remainder we will only use the term "zones" instead of referring to their implementation as DBMs.

The delicate part of this method is that the number of reachable zones could still be infinite. Therefore, over-approximations (or *abstractions*) of zones are computed to obtain a finite search space. For our purpose, it is sufficient to assume that an abstraction operator α indeed computes an over-approximation, i.e. Z ⊆ α(Z) for any zone Z. We call the version of the zone graph where abstractions are applied the *abstract zone graph* [13]. For a number of such abstraction operators, it can be shown that the abstract zone graph is sound and complete.<sup>2</sup> The proofs are rather intricate, however, so formalizing them would be a big effort. By focusing on certification of unreachability, this problem vanishes, as we only need to ensure that any state (l, Z) that we deem reachable in the zone graph is *subsumed* by some state (l, Z′) with Z ⊆ Z′ that is part of the certificate and that was computed by the abstraction (i.e. Z′ = α(Z₁) for some Z₁).

*Certificates by Example* Figure 2 depicts the zone graph of the automaton in Figure 1. Each zone Z is given as a clock constraint cc such that Z = {u | u |= cc}. A model checker like Munta would have to explore the full zone graph before being able to decide that l₃ is unreachable. Any model checker that uses the same abstraction technique as Munta [2] would not be able to benefit from abstractions for this example, and thus the abstract zone graph is the same as the zone graph. However, such a model checker could apply subsumptions while exploring the zone graph. That is, when a symbolic state of the form (l₂, {u | u |= c₁ = 0 ∧ c₂ < k+1}) is explored, the state (l₂, {u | u |= c₁ = 0 ∧ c₂ < k}) can safely be discarded.

This means that at the end of the model checking process, only the three states in Figure 3a will be stored. The solid edges are part of the zone graph, while the dashed edge indicates that the zone at its tail has a successor in the zone graph ((l₂, {u | u |= c₁ = 0 ∧ c₂ < 1})) that is subsumed by the tip of the edge. The set of these three states can act as a *certificate* of unreachability. They essentially form an inductive invariant of the zone graph: for each state in the certificate, all its successors in the zone graph are either contained in the certificate themselves or subsumed by another state in the certificate. Thus we know that any symbolic state that is reachable from the initial state is subsumed by some state in the certificate, and as the final state is not contained in the certificate, we can conclude that it is unreachable.

<sup>2</sup> Soundness: for every abstract run, there is a concrete instantiation. Completeness: every concrete run can be abstracted.

Fig. 2: The zone graph of the automaton depicted in Figure 1.

Figure 3b shows a certificate with only two states that replaces the two states for l₂ by the state with a dashed border. Note that this state is not part of the original zone graph. The certificate fulfills the same invariant property and thus also proves unreachability. We will use this technique of adding larger states to the certificate that are not part of the zone graph for our compression techniques in Section 4.

Fig. 3: Two certificates of unreachability for the automaton from Figure 1.
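The subsumption step described above (discarding a state once a larger state at the same location has been stored) can be sketched as a worklist search that stores only states not covered by an already stored one. This is a simplification under stated assumptions: a symbolic state is modeled as a hypothetical pair (location, k) standing for the zone {u | c₂ < k}, so subsumption reduces to comparing k at equal locations, and the successor function is invented for illustration; real zones are DBMs.

```python
from typing import Callable, Iterable, List, Tuple

SymState = Tuple[str, int]   # hypothetical: (l, k) stands for (l, {u | c2 < k})


def explore(init: SymState,
            succs: Callable[[SymState], Iterable[SymState]],
            subsumed: Callable[[SymState, SymState], bool]) -> List[SymState]:
    """Worklist search that discards any state subsumed by an already stored one."""
    passed: List[SymState] = []
    waiting: List[SymState] = [init]
    while waiting:
        s = waiting.pop()
        if any(subsumed(s, t) for t in passed):
            continue                      # s is covered by a stored state t
        passed.append(s)
        waiting.extend(succs(s))
    return passed


def subsumed(s: SymState, t: SymState) -> bool:
    """Same location, and c2 < k implies c2 < k' (i.e. k <= k')."""
    return s[0] == t[0] and s[1] <= t[1]


def succs(s: SymState) -> List[SymState]:   # hypothetical successor function
    l, k = s
    if l == "l0":
        return [("l1", 1), ("l2", 2)]
    if l == "l1":
        return [("l2", k)]
    return [s]                               # self-loop on l2
```

The state ("l2", 1) is generated but never stored, because ("l2", 2) covers it. With other search orders the stored set can differ, which is one reason why the size of a certificate depends on the model checker's strategy.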

## **2 From Model Checking to Certifying Unreachability**

This section first describes our approach to certification abstractly. Then, we detail how the existing formalization of a timed automata model checker was extended, with rather low effort, to a verified certifier. In practice, networks of timed automata with additional modeling features, such as shared state variables, are used. However, due to the existing verified product construction for such a formalism [31], it is sufficient to study the case of a single timed automaton here.

#### **2.1 An Abstract Correctness Theorem**

To work towards a rigorous justification of the certification process, we first study the problem on a more abstract level. Consider a transition system → on states of type 'l × 's, where 'l corresponds to the finite state part of timed automata and 's corresponds to zones. We assume an invariant P on states, i.e.:

$$P\,(l_1, s_1) \land (l_1, s_1) \to (l_2, s_2) \implies P\,(l_2, s_2).$$

This invariant essentially represents a restriction of → to valid states. While this would usually be assumed implicitly, we explicate P here as it is technically more convenient to do so in the Isabelle/HOL formalization.

The interesting feature that sets timed automata model checking apart is subsumption. Recall that during the model checking process, it is possible to first discover some (symbolic) state $(l, Z)$ (a pair of a discrete state $l$ and a zone $Z$), and to find at some later point that another reachable state $(l, Z')$ subsumes $(l, Z)$ because $Z'$ semantically contains $Z$, i.e. $Z \subseteq Z'$. At this point the state $(l, Z)$ can be discarded, as we know that anything that is reachable from $(l, Z)$ is also reachable from $(l, Z')$. Abstractly, subsumption is modeled by some fixed preorder $\preceq$ (i.e. a reflexive and transitive relation) on $'s$ that is a simulation relation between $\to$ and itself:

$$s_1 \preceq s_2 \land (l_1, s_1) \to (l_2, t_1) \land P\,(l_1, s_1) \land P\,(l_1, s_2) \implies \exists t_2.\; t_1 \preceq t_2 \land (l_1, s_2) \to (l_2, t_2)$$

In the abstract setting, a certificate consists of a set of discrete states $L$ of type $'l\ \mathit{set}$, and a mapping $M$ of type $'l \Rightarrow {'s}\ \mathit{set}$ that gives the set of reachable symbolic states that were computed for any discrete state $l \in L$. We say that $(L, M)$ satisfies $P$ if all states in the certificate $(L, M)$ satisfy $P$:

$$l \in L \land s \in M\,l \implies P\,(l, s)$$

Moreover, the certificate needs to be *closed*. Following Herbreteau et al. [13], we call a state *covered* if it is subsumed by another state in the certificate. A certificate is closed if for each state in the certificate all its successors are covered:

$$l_1 \in L \land s_1 \in M\,l_1 \land (l_1, s_1) \to (l_2, s_2) \implies l_2 \in L \land (\exists s_3 \in M\,l_2.\; s_2 \preceq s_3) \tag{$*$}$$

The following key theorem states that all reachable states are covered if the initial state is covered:

**Theorem 1.** *Let* $(L, M)$ *be closed and invariant under* $P$*. Assume* $l_0 \in L$*,* $s_0' \in M\,l_0$*,* $s_0 \preceq s_0'$*, and* $(l_0, s_0) \to^* (l, s)$*. Then* $l \in L$ *and there exists* $s'$ *such that* $s' \in M\,l$ *and* $s \preceq s'$*.*

*Proof.* By induction on the number of steps in $(l_0, s_0) \to^* (l, s)$. The following diagram sketches how the run of covering states is constructed. The first line represents $(l_0, s_0) \to^* (l, s)$, and the states in the third line are all part of the certificate.

$$\begin{array}{ccccccc}
(l_0, s_0) & \to & (l_1, s_1) & \to & \cdots & \to & (l, s) \\
 & & \preceq & & & & \preceq \\
 & & (l_1, t_1) & & \cdots & & (l, t) \\
\preceq & \nearrow & \preceq & \nearrow & & \nearrow & \preceq \\
(l_0, s_0') & & (l_1, s_1') & & \cdots & & (l, s')
\end{array}$$

From the assumptions on $l_0$, $s_0$, and $s_0'$, we can first apply the self-simulation property of $\to$ to $(l_0, s_0) \to (l_1, s_1)$ to obtain a $t_1$ such that $s_1 \preceq t_1$ and $(l_0, s_0') \to (l_1, t_1)$. As the certificate is closed, we thus get $l_1 \in L$, and we can find an $s_1' \in M\,l_1$ such that $t_1 \preceq s_1'$ (and thus $s_1 \preceq s_1'$ by transitivity). The induction hypothesis can then be applied to $l_1$, $s_1$, and $s_1'$.

We will now say that a certificate (L, M) is *admissible* iff


**Corollary 1.** *If* $F$ *is monotone w.r.t.* $\preceq$ *and the certificate* $(L, M)$ *is admissible, then* $\nexists\, l\, s.\; (l_0, s_0) \to^* (l, s) \land F\,l$*.*

#### **2.2 An Abstract Certificate Checker**

In practice, the certification process has to consider one additional complication. A model is typically described in terms of human-readable identifiers, while most model checkers, and the verified model checker Munta [30] in particular, represent these as natural numbers internally to allow for efficient indexing. In our certifier, this is accounted for by relabeling the human-readable identifiers in a given model to natural numbers in a first (verified) pre-processing step. To save additional transformations of the certificate after it was emitted, we let the unverified model checker additionally emit a textual description of such a renaming. The certifier then just needs to check that the given renaming is injective to ensure that it can safely be applied.
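The injectivity requirement on the emitted renaming is easy to state as code. The following is a minimal Python sketch, assuming (hypothetically) that the renaming arrives as a dictionary from human-readable identifiers to natural numbers; the function name `is_injective_renaming` is ours and not part of Munta.

```python
from typing import Dict


def is_injective_renaming(renaming: Dict[str, int]) -> bool:
    """Check that distinct identifiers receive distinct natural numbers."""
    seen = set()
    for number in renaming.values():
        if number < 0:         # only natural numbers are valid indices
            return False
        if number in seen:     # two identifiers collide: not injective
            return False
        seen.add(number)
    return True


print(is_injective_renaming({"l0": 0, "l1": 1, "l2": 2}))  # True
print(is_injective_renaming({"l0": 0, "l1": 0}))           # False
```

Injectivity is what guarantees that two different identifiers in the model can never be conflated after relabeling.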

Together with the theoretical analysis laid out in the last section, we can thus derive the following strategy for certifying unreachability:

- An unverified model checker explores the reachable state space of a given model symbolically and checks that none of the discovered states $(l, s)$ fulfills $F\,l$.

```
1  definition check (L, M) ≡
2    monadic_list_all L (λl. do {
3      let S = M l;
4      let next = succs l S;
5      monadic_list_all next (λ(l', S'). do {
6        xs ← SPEC (λxs. set xs = S');
7        if xs = [] then return True else do {
8          b1 ← return (l' ∈ L);
9          ys ← SPEC (λxs. set xs = M l');
10         b2 ← monadic_list_all xs (λx.
11           monadic_list_ex ys (λy. return (x ≼ y))
12         );
13         return (b1 ∧ b2)
14       }
15     })
16   })
```
Listing 1.1: Monadic program to check whether a certificate is closed.


If the process is successful, we can conclude by Corollary 1 that no "bad" state $(l, s)$ (i.e. with $F\,l$) is reachable symbolically. In Section 2.3, we will argue that, for timed automata, this indeed implies that the model is safe.

We now lay out how a verified certificate checker that implements said strategy for an abstract transition system can be constructed in Isabelle/HOL. Listing 1.1 displays the definition of the core of the checker, which checks whether the certificate is closed in the sense defined above. The program is defined in the non-determinism monad of the Imperative Refinement Framework (IRF) [20]. Some parts, such as checking set membership or converting a (finite) set to a list, are still left abstract. A non-deterministic specification $\mathsf{SPEC}\ Q$ returns some value $v$ with $Q\,v$.

The body of the program (lines 2-16) iterates over all discrete states in the certificate $L$ and checks that all corresponding symbolic states are covered. Line 3 retrieves the symbolic states that correspond to discrete state $l$, and in line 4 their symbolic successor states are computed. The result (*next*) is a list of pairs of a discrete state and the set of its corresponding symbolic states. The loop ranging from lines 5 to 15 iterates over this list to ensure that all the successor states are covered. Given a discrete state $l'$ and a set of symbolic states $S'$, line 6 first converts $S'$ into a list $xs$ that can be iterated over. This turns into a vacuous operation when the algorithm is refined to an executable version where sets are implemented as lists. Line 8 checks that $l'$ is also part of the certificate. Then, in line 9, the set of symbolic states corresponding to $l'$ is retrieved and converted to a list $ys$. Finally, lines 10-12 ensure that all states in $xs$ are subsumed by some state in $ys$.
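To make the control flow of Listing 1.1 concrete, here is a deterministic Python analogue, written as a sketch under simplifying assumptions: the certificate is a list `L` of discrete states plus a dictionary `M` from discrete states to lists of symbolic states, and `succs` and `leq` are caller-supplied stand-ins for the successor computation and the subsumption check. All names are hypothetical, and the toy instance uses integer intervals instead of zones. Comments point to the corresponding lines of Listing 1.1.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # symbolic states (zones in the timed-automata instance)


def check_closed(
    L: Sequence[Hashable],
    M: Dict[Hashable, List[S]],
    succs: Callable[[Hashable, List[S]], List[Tuple[Hashable, List[S]]]],
    leq: Callable[[S, S], bool],
) -> bool:
    """Return True iff every successor of a certificate state is covered."""
    in_L = set(L)
    for loc in L:                                              # line 2
        for loc2, succ_states in succs(loc, M.get(loc, [])):   # lines 3-5
            if not succ_states:                   # line 7: nothing to cover
                continue
            if loc2 not in in_L:                  # line 8: l' must be in L
                return False
            ys = M.get(loc2, [])                  # line 9
            # lines 10-12: each successor is subsumed by some certificate state
            if not all(any(leq(x, y) for y in ys) for x in succ_states):
                return False
    return True


# Toy instance (hypothetical): symbolic states are integer intervals (lo, hi),
# and subsumption is interval inclusion.
def interval_leq(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    return b[0] <= a[0] and a[1] <= b[1]


def toy_succs(loc, states):
    if loc == "a":  # edge a -> b widens the interval by one
        return [("b", [(lo, hi + 1) for lo, hi in states])]
    # self-loop on b widens the interval, capped at 5
    return [("b", [(lo, min(hi + 1, 5)) for lo, hi in states])]


good = {"a": [(0, 1)], "b": [(0, 5)]}
print(check_closed(["a", "b"], good, toy_succs, interval_leq))  # True
```

Replacing the certificate state for `b` by `(0, 3)` makes the check fail, because the self-loop produces `(0, 4)`, which is not covered.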

To prove soundness of *check*, we mainly need correctness theorems for the monadic combinators *monadic_list_all* and *monadic_list_ex*. Given a list $xs$ and a monadic implementation $Q_i$ of a predicate $Q$, they check whether all elements (at least one element) of $xs$ satisfy (satisfies) $Q$. This is the correctness theorem for *monadic_list_all*, for instance:

$$\big(\forall x.\; Q_i\,x \le \mathsf{SPEC}\,(\lambda r.\; r \longleftrightarrow Q\,x)\big) \implies \mathit{monadic\_list\_all}\;xs\;Q_i \le \mathsf{SPEC}\,(\lambda r.\; r \longleftrightarrow \mathit{list\_all}\;xs\;Q)$$

where $\mathit{list\_all}\;xs\;Q$ holds if and only if $Q$ holds for all elements of $xs$. After setting up the IRF's verification condition generator with this rule and the corresponding rule for *monadic_list_ex*, it is easy to prove that *check* is sound:

$$\text{check}\left(L, M\right) \le \mathsf{SPEC}\left(\lambda r. r \implies closed\left(L, M\right)\right)$$

where the property *closed* (L, M) corresponds to condition (∗) from above.

We then use standard refinement techniques to obtain an algorithm $\mathit{check}_i$ that refines *check*, replacing sets by lists. However, the algorithm is still specified in the non-determinism monad and is therefore not executable. We use a simple technique to make it executable. Consider the following theorem for *monadic_list_all*:

$$\mathit{monadic\_list\_all}\;xs\;(\lambda x.\; \mathsf{return}\,(P\,x)) = \mathsf{return}\,(\mathit{list\_all}\;xs\;P).$$

It allows us to replace the non-deterministic combinator *monadic_list_all* by the deterministic *list_all*, pushing $\mathsf{return}$ to the outside. By exhaustively applying a set of such rewrite rules, we obtain an alternative definition of $\mathit{check}_i$ where $\mathsf{return}$ appears only on the outermost level, and the inner term is deterministic and thus executable. Using these techniques, we obtain a simple certificate checker that is executable, provided that we can implement the elementary model checking primitives such as the subsumption check or computing the list of successors of a state.
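The rewrite rule for *monadic_list_all* can be illustrated with a deliberately simplified model of the non-determinism monad in Python: a computation is represented as the frozen set of its possible results, so `ret` (standing in for `return`) yields a singleton. This model ignores failure and the refinement ordering of the IRF, and all helper names are hypothetical. With a deterministic predicate, `monadic_list_all` collapses to `ret` of the ordinary `all`, which is the essence of the rule.

```python
from typing import Callable, FrozenSet, List, TypeVar

A = TypeVar("A")


def ret(x):
    """`return`: the computation with exactly one possible result."""
    return frozenset({x})


def bind(m, f):
    """Sequencing: run f on every possible result of m and collect outcomes."""
    out = set()
    for x in m:
        out |= f(x)
    return frozenset(out)


def monadic_list_all(xs: List[A], q: Callable[[A], FrozenSet[bool]]) -> FrozenSet[bool]:
    """Check q on all elements, stopping at the first False."""
    if not xs:
        return ret(True)
    return bind(q(xs[0]), lambda r: monadic_list_all(xs[1:], q) if r else ret(False))


def list_all(xs: List[A], p: Callable[[A], bool]) -> bool:
    return all(p(x) for x in xs)


def is_even(n: int) -> bool:
    return n % 2 == 0


# Deterministic predicate: the monadic version equals `ret` of the pure one.
print(monadic_list_all([2, 4, 6], lambda x: ret(is_even(x))) == ret(list_all([2, 4, 6], is_even)))  # True
```

A genuinely non-deterministic predicate, such as one returning `frozenset({True, False})`, yields a set with more than one possible result, which is why the rule only applies to predicates of the form `return (P x)`.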

#### **2.3 Transferring the Correctness Theorem**

For timed automata, the abstract transition system studied above is the zone graph $\to_{ZG(A)}$ of a given (single) automaton $A$. One can show that it simulates $\to_A$ (completeness of $\to_{ZG(A)}$):

$$(l, u) \to_A (l', u') \land u \in Z \implies \big(\exists Z'.\; (l, Z) \to_{ZG(A)} (l', Z') \land u' \in Z'\big).$$

This simulation property is sufficient to establish that if there is no reachable state $(l, Z)$ in $\to_{ZG(A)}$ with $F\,l$, then no final state $(l, u)$ is reachable in $\to_A$:

$$\Big(\nexists l, Z.\; (l_0, Z_0) \to^*_{ZG(A)} (l, Z) \land F\,l\Big) \land u_0 \in Z_0 \implies \Big(\nexists l, u.\; (l_0, u_0) \to^*_A (l, u) \land F\,l\Big)$$

In the formalization, these proofs rely on instantiating a general theory of simulations in transition systems that is derived from the theory of Wimmer and Lammich [31]. From Corollary 1 we get that there is no reachable final state in $\to_{ZG(A)}$ if the certificate check is passed. Finally, by correctness of the renaming process and the product construction, we can conclude that there is no reachable final state in the input model if there is no reachable final state in $\to_A$.

#### **2.4 Implementing a Concrete Checker**

All the elementary model checking primitives we need for certification have already been implemented [31]. The abstract implementation presented above assumes that the model checking primitives are implemented in a purely functional manner (as they are just regular HOL functions). The existing (verified) model checker [31], however, is an imperative implementation in the Imperative HOL framework. Imperative HOL [7] is a framework for specifying and reasoning about imperative programs in Isabelle/HOL. It provides a *heap monad* in which one can use—analogously to the ML family of programming languages—imperative references and arrays to express imperative programs. Usually, once we have used an imperative implementation anywhere, the whole program would need to be stated in the heap monad. However, we can employ a technique similar to the one that is used for Haskell's *ST* monad [21] to erase the heap monad in a safe way under certain circumstances.

More precisely, if it can be deduced from the type of an imperative computation that no information about references or arrays on the heap can be leaked to the outside of the computation in its result, then the heap monad can be erased for this computation, yielding a pure computation. In the certifier, this is primarily used for computing the symbolic successor of a zone $Z$ for a certain transition. To that end, an immutable representation of the DBM $M$ corresponding to $Z$ is copied to a newly allocated imperative array, then the imperative pipeline of computations that computes the successor $M'$ is applied to $M$, and finally $M'$ is copied back to an immutable array. Taken together, this whole computation does not contain the type of an array or reference in its result type, and thus can safely be turned into a pure computation. As a consequence, we are able to reuse the existing verified model checking primitives, while being able to state the certificate checking algorithm purely functionally.
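The pattern of hiding an imperative computation behind a pure interface can be sketched in Python as follows. As a simplifying assumption, only the canonicalization step (Floyd-Warshall closure) of a successor pipeline is shown, and bounds are plain non-strict numbers with `math.inf` for an absent bound; the paper's actual pipeline also handles strict bounds and further zone operations, and the function names here are hypothetical. The mutable copy is created inside `canonicalize` and never returned, so callers only ever see immutable tuples.

```python
import math
from typing import List, Tuple

DBM = Tuple[Tuple[float, ...], ...]  # immutable matrix; entry [i][j] bounds c_i - c_j


def _close_in_place(m: List[List[float]]) -> None:
    """Imperative part: Floyd-Warshall tightening on a private mutable copy."""
    n = len(m)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if m[i][k] + m[k][j] < m[i][j]:
                    m[i][j] = m[i][k] + m[k][j]


def canonicalize(dbm: DBM) -> DBM:
    """Pure interface: copy in, mutate the copy, copy back out."""
    work = [list(row) for row in dbm]         # freshly allocated, never escapes
    _close_in_place(work)
    return tuple(tuple(row) for row in work)  # result holds no mutable reference


INF = math.inf
# Clocks: index 0 is the constant zero clock, 1 is x, 2 is y.
# Constraints: 0 <= x <= 1, y >= 0, y - x <= 2.
d = ((0, 0, 0), (1, 0, INF), (INF, 2, 0))
print(canonicalize(d))  # ((0, 0, 0), (1, 0, 1), (3, 2, 0))
```

The derived entry `[2][0] = 3` expresses the implied bound $y \le 3$, and the input `d` is left untouched.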

In the concrete checker, the mapping M is implemented using a verified functional hash table implementation based on so-called *diff arrays* [19]. This data structure provides a purely functional interface to an underlying imperative array. When a diff array is updated, it performs the update on the imperative array, and stores a difference that can be used to re-compute the old state of the array. Reading from the most recent version of a diff array is fast as the value can directly be read from the underlying imperative array. If an old version is accessed, the whole array has to be copied to recompute the old version. This gives diff arrays good performance characteristics, as long as they are mostly used linearly. This is the case in our application as the hash table is filled in an initial phase, after which the hash table is used in a read-only manner.
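The behaviour of diff arrays described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the verified implementation from [19]: the newest version reads the shared list directly, while an old version is rebuilt by copying the whole list and undoing the recorded differences. The class and method names are hypothetical.

```python
from typing import Any, List, Optional, Sequence, Tuple


class DiffArray:
    """Persistent array: fast access to the newest version, copies for old ones."""

    def __init__(self, data: Sequence[Any]) -> None:
        self._data: List[Any] = list(data)  # underlying imperative array, shared
        # None: this is the newest version. Otherwise (i, old, newer): this
        # version equals `newer` except that index i holds `old`.
        self._diff: Optional[Tuple[int, Any, "DiffArray"]] = None

    def get(self, i: int) -> Any:
        if self._diff is None:
            return self._data[i]          # fast path: read the shared array
        return self._materialize()[i]     # slow path: rebuild an old version

    def set(self, i: int, value: Any) -> "DiffArray":
        if self._diff is not None:        # updating an old version: copy first
            data = self._materialize()
            data[i] = value
            return DiffArray(data)
        newer = DiffArray.__new__(DiffArray)
        newer._data = self._data          # share the imperative array
        newer._diff = None
        self._diff = (i, self._data[i], newer)  # remember how to go back
        self._data[i] = value             # destructive update in place
        return newer

    def _materialize(self) -> List[Any]:
        """Copy the whole shared array and undo the recorded diffs."""
        chain = []
        node = self
        while node._diff is not None:     # walk forward to the newest version
            chain.append(node._diff)
            node = node._diff[2]
        data = list(node._data)           # full copy of the underlying array
        for i, old, _ in reversed(chain): # undo the newest change first
            data[i] = old
        return data


a0 = DiffArray([0, 0, 0])
a1 = a0.set(1, 5)
a2 = a1.set(1, 9)
print(a2.get(1), a1.get(1), a0.get(1))  # 9 5 0
```

Filling a hash table and then only reading it, as in the certifier, keeps all accesses on the fast path.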

#### **2.5 Parallel Execution**

The attentive reader may wonder why we care about a purely functional implementation of the certificate checker at all. Indeed, we could use existing techniques [31] to obtain an imperative implementation of the certificate checker in the heap monad. However, in this setting it would be hard to justify the soundness of executing parts of the checker in parallel. In the purely functional setting, this is much simpler. Our approach to parallel execution is minimalist: we only provide means to execute the *map* combinator on lists in parallel. This is achieved by another custom code translation that is part of the trusted code base. The parallel implementation of *map* uses a task queue that will contain the individual computations that need to be run for each element of *xs*, and uses a fixed number of threads to work through this list and assemble the final result.

We exploit this *map* implementation to work through the list of discrete states L in parallel, using the equivalence:

$$\mathit{list\_all}\;Q\;xs = \mathit{list\_all}\;\mathit{id}\;(\mathit{map}\;Q\;xs).$$

In doing so, we lose the ability to stop execution early once a list element does not satisfy $Q$. For the certificate checker, however, we assume that the certificate is usually correct, meaning that we have to go through the whole list anyway. We only parallelize the outermost loop of $\mathit{check}_i$ because this should yield reasonably sized work portions, given that the size of $L$ will typically be at least in the hundreds.
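The following Python sketch shows the shape of a parallel *map* built from a task queue and a fixed number of worker threads, and of *list_all* expressed through it via the equivalence above. It is only an illustration under assumptions: the names are hypothetical, and because of CPython's global interpreter lock it shows the structure rather than a real speed-up. As in the text, `parallel_list_all` evaluates the predicate on every element before combining the results.

```python
import queue
import threading
from typing import Callable, List, Optional, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def parallel_map(f: Callable[[A], B], xs: Sequence[A], num_threads: int = 4) -> List[B]:
    """Map f over xs using a task queue drained by a fixed pool of threads."""
    tasks: "queue.Queue[int]" = queue.Queue()
    for i in range(len(xs)):
        tasks.put(i)                       # one task per list element
    results: List[Optional[B]] = [None] * len(xs)
    errors: List[Exception] = []

    def worker() -> None:
        while True:
            try:
                i = tasks.get_nowait()
            except queue.Empty:
                return                     # queue drained: this thread is done
            try:
                results[i] = f(xs[i])      # each slot is written by one thread
            except Exception as e:         # record and re-raise in the caller
                errors.append(e)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return results  # type: ignore[return-value]


def parallel_list_all(q: Callable[[A], bool], xs: Sequence[A], num_threads: int = 4) -> bool:
    """list_all Q xs = list_all id (map Q xs): no early exit."""
    return all(parallel_map(q, xs, num_threads))


print(parallel_map(lambda x: x * x, list(range(6)), 3))  # [0, 1, 4, 9, 16, 25]
```

Because results are stored by index, the output order matches the input order regardless of which thread processed which element.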

## **3 Scaling Performance**

In this section, we discuss several techniques to improve the performance of the certificate checker without significantly increasing the verification effort.

#### **3.1 Monomorphization**

Isabelle/HOL supports polymorphism and type classes, which are valuable features for sizeable formalization efforts. Large parts of our formalization also make use of these features, e.g., most of the timed automata semantics are formalized for a general time domain, and operations on DBMs are applicable on DBMs whose entries are formed from more general algebraic structures than the ring of integers. While this yields an abstract and general formalized theory, it can get in our way when trying to obtain efficient code.

When generating SML code from HOL, Isabelle uses a so-called dictionary construction to compile out type classes, which are not supported by SML. This means that most functions carry a large number of additional parameters, which are used to look up elementary operations, such as addition of two numbers. These additional lookup operations degrade performance. One solution is to ensure that all relevant constants that are exported to SML are monomorphic (i.e. specialized to the integer type), eliminating the need for the dictionary construction in most places. Thus, we apply a semi-automated procedure to achieve this monomorphization.
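To illustrate what the dictionary construction does, the following Python sketch contrasts a function that receives its operations through an explicit dictionary parameter with a monomorphic version specialized to integers. It is an analogy only, not the SML code that Isabelle actually generates, and `MonoidDict` and the other names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class MonoidDict(Generic[T]):
    """Hypothetical 'dictionary' for a type class with a zero and an addition."""
    zero: T
    plus: Callable[[T, T], T]


def sum_poly(d: MonoidDict[T], xs: List[T]) -> T:
    """Polymorphic version: every operation is looked up in the dictionary d."""
    acc = d.zero
    for x in xs:
        acc = d.plus(acc, x)  # indirect call through the dictionary
    return acc


INT_MONOID = MonoidDict(zero=0, plus=lambda a, b: a + b)


def sum_int(xs: List[int]) -> int:
    """Monomorphic version: specialized to int, no dictionary parameter."""
    acc = 0
    for x in xs:
        acc += x              # direct integer addition
    return acc


print(sum_poly(INT_MONOID, [1, 2, 3]), sum_int([1, 2, 3]))  # 6 6
```

Both functions compute the same result; the monomorphic one simply avoids passing and consulting the dictionary on every operation.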

#### **3.2 Integer Representation**

Types such as *int* or *nat* are unbounded in Isabelle/HOL, meaning that they are implemented with the help of big integers in the target languages. To improve performance, we want to use machine integers instead, and instruct Isabelle/HOL's code generator to do so. This is still sound: SML's standard integer operations raise an exception if an overflow occurs instead of silently wrapping around. The code generator can only achieve partial correctness anyway: if program execution does not fail, then its result is consistent with the evaluated HOL term.
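The partial-correctness argument can be illustrated by simulating fixed-width integers that fail loudly instead of wrapping around. Python integers are unbounded, so this sketch enforces an assumed 64-bit signed range by hand; the width and helper names are hypothetical and not taken from any particular SML implementation. Every call either returns the exact mathematical result or raises `OverflowError`, much like the `Overflow` exception of SML's basis library.

```python
INT_BITS = 64  # assumed machine-integer width (hypothetical)
MIN_INT = -(2 ** (INT_BITS - 1))
MAX_INT = 2 ** (INT_BITS - 1) - 1


def checked(value: int) -> int:
    """Return value unchanged if it fits, else raise instead of wrapping around."""
    if not MIN_INT <= value <= MAX_INT:
        raise OverflowError(f"{value} does not fit in {INT_BITS} bits")
    return value


def checked_add(a: int, b: int) -> int:
    return checked(a + b)


def checked_mul(a: int, b: int) -> int:
    return checked(a * b)


print(checked_add(2, 3))  # 5
try:
    checked_mul(MAX_INT, 2)
except OverflowError as err:
    print("overflow:", err)  # execution fails; no wrong result is produced
```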

#### **3.3 Refined Code Equations**

The last type of optimizations we use can be considered to belong to the category of micro-optimizations. These are improved code generator translations for elementary operations and combinators. We employ such improved translations to use native implementation-language primitives to convert from mutable to immutable arrays and back. Another such optimization is to use integer values directly as counters in imperative loops, instead of a natural number representation that would box the integers in a data constructor. In the same way, we use integers directly for array indexing.

## **4 Certificate Compression**

In this section, we present two techniques to compress the unreachability certificate. By compression we mean reducing the number of zones that are present in the certificate for each discrete state, using the unverified model checker. The first technique relies on subsumption. As explained above, it is possible that the model checker adds a zone $Z$ to the set of explored states and later another zone $Z'$ with $Z \subseteq Z'$ (i.e. $Z'$ subsumes $Z$). Thus, the first technique simply filters the set w.r.t. $\subseteq$ at the end.
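The first technique amounts to keeping only maximal zones w.r.t. $\subseteq$. The Python sketch below assumes, hypothetically, that zones are given as canonical DBMs with non-strict integer bounds, for which inclusion can be decided by comparing entries pointwise; entry `[i][j]` bounds $c_i - c_j$, with index 0 standing for the constant zero clock. The function names and the helper `interval`, which builds one-clock zones for the example, are ours.

```python
from typing import List, Sequence, Tuple

DBM = Tuple[Tuple[int, ...], ...]  # canonical DBM; entry [i][j] bounds c_i - c_j


def included(z1: DBM, z2: DBM) -> bool:
    """Z1 ⊆ Z2: every bound of the canonical DBM z1 is at most that of z2."""
    return all(a <= b for r1, r2 in zip(z1, z2) for a, b in zip(r1, r2))


def filter_subsumed(zones: Sequence[DBM]) -> List[DBM]:
    """Keep one representative of each maximal zone (w.r.t. inclusion)."""
    kept: List[DBM] = []
    for z in zones:
        if any(included(z, k) for k in kept):
            continue                                    # z is already covered
        kept = [k for k in kept if not included(k, z)]  # drop zones z covers
        kept.append(z)
    return kept


def interval(lo: int, hi: int) -> DBM:
    """One-clock zone lo <= x <= hi (index 0 is the zero clock, 1 is x)."""
    return ((0, -lo), (hi, 0))


zones = [interval(0, 1), interval(0, 2), interval(3, 4), interval(0, 2)]
print(filter_subsumed(zones))  # [((0, 0), (2, 0)), ((0, -3), (4, 0))]
```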

The second technique relies on the following idea: we replace several zones by their union and check that the state space is still closed. This means that we have to check that all the successors of the larger zone are still covered by the current set of states. In that case, we can discard the old zones and replace them by their union. As the union of two zones is not necessarily convex and thus cannot, in general, be represented as a DBM, we do not compute a precise union of zones but their convex hull. This operation is rather cheap, as it amounts to taking the pointwise maximum of DBM entries. After computing the convex hull of a number of zones (in canonical form), we only need to apply the expensive operation to restore a canonical form once.

The latter technique yields a whole family of compression algorithms by iterating one of the following operations for each discrete state until a fixed-point is reached:


The next section contains an experimental evaluation of these techniques.

Note that similar techniques for reducing the search space could also be applied already during model checking. By doing so, the number of states explored and the runtime of model checking could be reduced. This, however, comes at the risk of producing spurious model checking results (i.e. a final state might be deemed reachable although there is no corresponding reachable state in the timed automaton).
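The convex-hull step of the second compression technique can be sketched in Python under simplifying assumptions: canonical DBMs with non-strict integer bounds, where entry `[i][j]` bounds $c_i - c_j$, and hypothetical helper names. The hull takes the pointwise maximum of entries and canonicalizes once at the end. The example also shows why closedness has to be re-checked after merging: the hull of the zones $[0,1]$ and $[3,4]$ is $[0,4]$, which contains valuations that belong to neither zone.

```python
from typing import List, Sequence, Tuple

DBM = Tuple[Tuple[int, ...], ...]  # canonical DBM; entry [i][j] bounds c_i - c_j


def canonicalize(dbm: DBM) -> DBM:
    """Floyd-Warshall closure: the comparatively expensive normalization step."""
    m: List[List[int]] = [list(row) for row in dbm]
    n = len(m)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                m[i][j] = min(m[i][j], m[i][k] + m[k][j])
    return tuple(tuple(row) for row in m)


def convex_hull(zones: Sequence[DBM]) -> DBM:
    """Pointwise maximum of the entries, followed by a single canonicalization."""
    n = len(zones[0])
    hull = tuple(
        tuple(max(z[i][j] for z in zones) for j in range(n)) for i in range(n)
    )
    return canonicalize(hull)


def interval(lo: int, hi: int) -> DBM:
    """One-clock zone lo <= x <= hi (index 0 is the zero clock, 1 is x)."""
    return ((0, -lo), (hi, 0))


print(convex_hull([interval(0, 1), interval(3, 4)]))  # ((0, 0), (4, 0))
```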

## **5 Experimental Evaluation**

We evaluate the checker on a set of benchmarks that is derived from Uppaal's standard benchmark suite [22]. Additionally, to cover the advanced modeling features of committed locations and broadcast channels, we use a set of benchmarks that is derived from the pacemaker models of Jiang et al. [17] and a modified version of the FDDI benchmark with broadcast channels. A prototype SML implementation of a timed automata model checker (Mlunta) is used to compute the certificates. We use reachability properties of the form **E**♦ *false* to enforce that the model checker explores the complete state space. The results are given in Table 1. The problem size is specified as the number of automata in the network. We report the total runtime (wall time) of:


As can be seen from the results, the tandem is still one order of magnitude slower than Uppaal, but certificate checking in isolation is also up to one order of magnitude faster than the previous verified model checker [31]. Note that Mlunta explores significantly more states than Uppaal and Munta for "Pacemaker". Multi-core scaling beyond two threads, however, is relatively unsatisfactory. In microbenchmarks, we have identified that the problem appears to lie in memory allocation on the heap, even if no data is shared among threads (in our case, only the certificate is shared, while successors are computed locally). There does not seem to be an obvious way to improve on this situation for SML implementations. Finally, one can see that the verified certifier is not drastically slower than the unverified implementation based on Mlunta, indicating that the verified certifier is not missing any obvious significant optimizations.


Table 1: Benchmark results on a machine with 16 GB RAM and an Intel(R) Core(TM) i7-4610M CPU at 3.00GHz with two cores and two threads per core. The column labeled "Tandem" gives the runtime for a combination of the unverified SML tool and the verified certificate checker. The next column gives the runtime of the unverified SML certifier, followed by the runtimes of the verified checker for a varying number of threads. All times are given in seconds.

Table 2 gives the results of evaluating the different compression algorithms on the same set of benchmarks. The second variant is always applied to the compression result of the first variant to avoid trivial computations of the convex hull. Variant 2c) (the most expensive one) can produce drastically smaller certificates than any other variant, and its minimum compression factor is an order of magnitude higher than for any other variant. Nevertheless, only variants 1 and 2a) appear to be useful in practice, as they are relatively cheap to compute. The other variants could prove useful if the certificates were produced by a significantly more efficient model checker, such as Uppaal or TChecker [15]. On a final note, we have constructed a valid certificate for the Fischer benchmark that is more than 95% smaller, suggesting that there is room for improvement on the compression algorithms.


Table 2: Certificate compression factors (given in %).

## **6 Conclusion and Future Work**

We have presented a verified certifier of unreachability certificates for timed automata. The certificates are meant to be produced by an unverified model checker. Experiments show that verified certificate checking in isolation is up to an order of magnitude faster than what was previously possible with a verified model checker [31]. The performance of a tandem of an unverified model checker and the verified certifier could be improved by replacing the certificate-producing part with a highly optimized tool, possibly opening room to use some of the more powerful certificate compression techniques we suggested above. As we pointed out above, there appears to be further room for improvement on the certificate compression algorithms as well.

Moreover, more sophisticated tools also employ more powerful abstraction techniques, for which our proposed certification technique is still suitable, to a large extent without requiring additional verification effort. An exception is the implicit abstraction technique studied by Herbreteau et al. [14], as it does not compute abstractions of zones explicitly but rather checks subsumptions of the form $Z \subseteq \alpha(Z')$ implicitly, meaning that one would have to prove correctness of the subsumption check to validate certificates produced by such a model checking process.

Finally, we intend to extend this work to certification of emptiness of timed Büchi automata in the future, using the idea of *subsumption graphs* [13] and relying on an unverified model checker implementation for timed Büchi automata to produce the certificates [13,18].

## **Data Availability Statement**

The datasets generated and/or analyzed during the current study are available in the Zenodo repository [32]: https://doi.org/10.5281/zenodo.3679245. The artifact has been tested on the TACAS artifact evaluation VM [12].

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### Learning One-Clock Timed Automata<sup>⋆</sup>

Jie An<sup>1</sup> (✉), Mingshuai Chen<sup>2,3,4</sup>, Bohua Zhan<sup>3,4</sup>, Naijun Zhan<sup>3,4</sup> (✉), and Miaomiao Zhang<sup>1</sup> (✉)

<sup>1</sup> School of Software Engineering, Tongji University, Shanghai, China, {1510796,miaomiao}@tongji.edu.cn

<sup>2</sup> Lehrstuhl für Informatik 2, RWTH Aachen University, Aachen, Germany, chenms@cs.rwth-aachen.de

<sup>3</sup> State Key Lab. of Computer Science, Institute of Software, CAS, Beijing, China, {bzhan,znj}@ios.ac.cn

<sup>4</sup> University of Chinese Academy of Sciences, Beijing, China

Abstract. We present an algorithm for active learning of deterministic timed automata with a single clock. The algorithm is within the framework of Angluin's L<sup>∗</sup> algorithm and inspired by existing work on the active learning of symbolic automata. Due to the need to guess, for each transition, whether it resets the clock, the algorithm is of exponential complexity in the size of the learned automata. Before presenting this algorithm, we propose a simpler version where the teacher is assumed to be *smart* in the sense of being able to provide the reset information. We show that this simpler setting yields a polynomial complexity of the learning process. Both algorithms are implemented and evaluated on a collection of randomly generated examples. We furthermore demonstrate the simpler algorithm on the functional specification of the TCP protocol.

Keywords: Automaton learning · Active learning · One-clock timed automata · Timed language · Reset-logical-timed language.

## 1 Introduction

In her seminal work [10], Angluin introduced the L<sup>∗</sup> algorithm for learning a regular language from queries and counterexamples within a query-answering framework. Angluin-style learning is therefore also termed *active learning* or *query learning*, which is distinguished from *passive learning*, i.e., generating a model from a given data set. Following this line of research, an increasing number of efficient active learning methods (cf. [38]) have been proposed to learn, e.g., Mealy machines [34,30], I/O automata [2], register automata [25,1,15], nondeterministic finite automata [12], Büchi automata [19,28], symbolic automata [29,18,11] and Markov decision processes [36], to name just a few. Full-fledged libraries, tools and applications are also available for automata-learning tasks [13,27,20,21].

<sup>⋆</sup> This work has been partially funded by NSFC under grants No. 61625206, 61972284, 61732001 and 61872341, by the ERC Advanced Project FRAPPANT under grant No. 787914, and by the CAS Pioneer Hundred Talents Program under grant No. Y9RC585036.

For real-time systems where timing constraints play a key role, however, learning a formal model is much more complicated. As a classical model for real-time systems, timed automata [4] have an infinite set of timed actions. This yields a fundamental difference to finite automata featuring finite alphabets. Moreover, it is difficult to detect resets of clock variables from observable behaviors of the system. This makes learning formal models of timed systems a challenging yet interesting problem.

Various attempts have been carried out in the literature on learning timed models, which can be classified into two tracks. The first track pursues active learning methods, e.g. [22] for learning event-recording automata (ERA) [5] and [9] for learning real-time automata (RTA) [17]. ERA are timed automata where, for every untimed action a, a clock is used to record the time of the last occurrence of a. The underlying learning algorithm [22], however, is prohibitively complex due to too many degrees of freedom and multiple clocks for recording events. RTA are a class of special timed automata with one clock that records the execution time of each action by being reset at the start of each action. The other track pursues passive learning. In [42,41], an algorithm was proposed to learn deterministic RTA. The basic idea is that the learner organizes a tree sketching traces of the data set while merging nodes of the tree following a certain heuristic function. A passive learning algorithm for timed automata with one clock was further proposed in [39,40]. A common weakness of passive learning methods is that the generated model merely accepts all positive traces while it rejects all negative ones for the given set of traces, without guaranteeing that it is a correct model of the target system. A theoretical result was established in [40] showing that it is possible to obtain the target system by continuously enriching the data set; however, the number of iterations required is unknown. In addition, the passive learning methods cited above concern only discrete-time semantics of the underlying timed models, i.e., the clock takes values from non-negative integers.

We furthermore refer the readers to [14,32] for learning specialized forms of practical timed systems in a passive manner, [37] for passively learning timed automata using genetic programming, which scales to automata of large sizes, [33] for learning probabilistic real-time automata incorporating clustering techniques from machine learning, and [36] for L<sup>∗</sup>-based learning of Markov decision processes with testing and sampling.

In this paper, we present the first active learning method for deterministic one-clock timed automata (DOTAs) under continuous-time semantics<sup>1</sup>. Such timed automata provide simple models while preserving adequate expressiveness, and have therefore been widely used in practical real-time systems [35,3,16]. We present our approach in two steps. First, we describe a simpler algorithm, under the assumption that the teacher is *smart* in the sense of being able to provide information about clock resets in membership and equivalence queries. The basic idea is as follows. We define the *reset-logical-timed language* of a DOTA and show that the timed languages of two DOTAs are equivalent if their reset-logical-timed languages are equivalent, which reduces the learning problem to that of learning a reset-logical-timed language. Then we show how to learn the reset-logical-timed language following Maler and D'Antoni's learning algorithms for symbolic automata [29,18]. We establish the correctness, termination and polynomial complexity of this learning algorithm. Next, we extend this algorithm to the case of a normal teacher. The main difference is that the learner now needs to *guess* the reset

<sup>1</sup> The proposed learning method applies trivially to discrete-time semantics too.

information on transitions discovered in the observation table. Due to these guesses, the latter algorithm features exponential complexity in the size of the learned automata. The proposed learning methods are implemented and evaluated on randomly generated examples. We also demonstrate the simpler, polynomial algorithm on a practical case study concerning the functional specification of the TCP protocol. Detailed proofs for theorems and lemmas in this paper can be found in Appendix A of the full version [7].

In what follows, Sect. 2 provides preliminary definitions on one-clock timed automata. The learning algorithm with a smart teacher is presented and analyzed in Sect. 3. We then present the situation with a normal teacher in Sect. 4. The experimental results are reported in Sect. 5. Finally, Sect. 6 concludes this paper.

## 2 Preliminaries

Let $\mathbb{R}_{\geq 0}$ and $\mathbb{N}$ be the sets of non-negative reals and natural numbers, respectively, and $\mathbb{B}$ the Boolean set. We use $\top$ to stand for true and $\bot$ for false. The projection of an $n$-tuple $\mathbf{x}$ onto its first two components is denoted by $\Pi_{\{1,2\}}\mathbf{x}$, which extends to a sequence of tuples as $\Pi_{\{1,2\}}(\mathbf{x}_1, \ldots, \mathbf{x}_k) = (\Pi_{\{1,2\}}\mathbf{x}_1, \ldots, \Pi_{\{1,2\}}\mathbf{x}_k)$.

*Timed automata* [4], a kind of finite automata extended with a finite set of real-valued clocks, are widely used to model real-time systems. In this paper, we consider a subclass of timed automata with a single clock, termed *one-clock timed automata* (OTAs). Let $c$ be the clock variable and denote by $\Phi_c$ the set of clock constraints of the form $\phi ::= \top \mid c \bowtie m \mid \phi \wedge \phi$, where $m \in \mathbb{N}$ and ${\bowtie} \in \{=, <, >, \leq, \geq\}$.

Definition 1 (One-clock timed automata). *A one-clock timed automaton is a tuple* $\mathcal{A} = (\Sigma, Q, q_0, F, c, \Delta)$*, where* $\Sigma$ *is a finite set of actions, called the* alphabet*;* $Q$ *is a finite set of locations;* $q_0 \in Q$ *is the initial location;* $F \subseteq Q$ *is a set of accepting locations;* $c$ *is the unique clock; and* $\Delta \subseteq Q \times \Sigma \times \Phi_c \times \mathbb{B} \times Q$ *is a finite set of transitions.*

A transition $\delta = (q, \sigma, \phi, b, q')$ allows a jump from the *source location* $q$ to the *target location* $q'$ by performing the action $\sigma \in \Sigma$ if the constraint $\phi \in \Phi_c$ is satisfied. Meanwhile, clock $c$ is reset to zero if $b = \top$, and remains unchanged otherwise.

A *clock valuation* is a function $\nu : c \to \mathbb{R}_{\geq 0}$ that assigns a non-negative real number to the clock. For $t \in \mathbb{R}_{\geq 0}$, let $\nu + t$ be the clock valuation with $(\nu + t)(c) = \nu(c) + t$. According to the definitions of clock valuation and clock constraint, a transition *guard* can be represented as an interval whose endpoints are in $\mathbb{N} \cup \{\infty\}$. For example, $\phi_1 : c < 5 \wedge c \geq 3$ is represented as $[3, 5)$, $\phi_2 : c = 6$ as $[6, 6]$, and $\phi_3 : \top$ as $[0, \infty)$. We use the inequality and interval representations interchangeably in this paper.
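As a minimal sketch of the interval view of guards, the Python snippet below encodes a guard as a hypothetical `Guard` tuple `(lo, lo_closed, hi, hi_closed)` and builds it by intersecting the atomic constraints $c \bowtie m$ of a conjunction, with the empty conjunction standing for $\top$. The names `Guard`, `guard_of` and `satisfies` are illustrative assumptions, not part of the paper.

```python
import math
from typing import Iterable, NamedTuple, Tuple


class Guard(NamedTuple):
    """A guard as an interval with endpoints in N ∪ {∞} (hypothetical encoding)."""
    lo: float
    lo_closed: bool
    hi: float            # math.inf for an unbounded guard
    hi_closed: bool

    def satisfies(self, v: float) -> bool:
        """True iff the clock value v lies in the interval."""
        above = v >= self.lo if self.lo_closed else v > self.lo
        below = v <= self.hi if self.hi_closed else v < self.hi
        return above and below


def guard_of(atoms: Iterable[Tuple[str, int]]) -> Guard:
    """Intersect atomic constraints (op, m), read as 'c op m', into one interval.

    An empty conjunction is the trivially true guard [0, ∞)."""
    g = Guard(0, True, math.inf, False)
    for op, m in atoms:
        if op in ("=", ">=", ">"):                  # tighten the lower bound
            closed = op != ">"
            if m > g.lo or (m == g.lo and not closed):
                g = g._replace(lo=m, lo_closed=closed)
        if op in ("=", "<=", "<"):                  # tighten the upper bound
            closed = op != "<"
            if m < g.hi or (m == g.hi and not closed):
                g = g._replace(hi=m, hi_closed=closed)
    return g


# The three guards used as examples in the text.
phi1 = guard_of([("<", 5), (">=", 3)])   # [3, 5)
phi2 = guard_of([("=", 6)])              # [6, 6]
phi3 = guard_of([])                      # [0, ∞)
```

Because endpoints are natural numbers or $\infty$, such a four-field tuple is enough to represent every guard in $\Phi_c$.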

A *state* $s$ of $\mathcal{A}$ is a pair $(q, \nu)$, where $q \in Q$ and $\nu$ is a clock valuation. A *run* $\rho$ of $\mathcal{A}$ is a finite sequence

$$\rho = (q_0, \nu_0) \xrightarrow{t_1, \sigma_1} (q_1, \nu_1) \xrightarrow{t_2, \sigma_2} \cdots \xrightarrow{t_n, \sigma_n} (q_n, \nu_n),$$

where $\nu_0(c) = 0$ and $t_i \in \mathbb{R}_{\geq 0}$ is the time delay spent at $q_{i-1}$ before the transition $\delta_i = (q_{i-1}, \sigma_i, \phi_i, b_i, q_i) \in \Delta$ is taken, subject to the conditions that (1) $\nu_{i-1} + t_i$ satisfies $\phi_i$, and (2) $\nu_i(c) = \nu_{i-1}(c) + t_i$ if $b_i = \bot$, and $\nu_i(c) = 0$ otherwise, for all $1 \leq i \leq n$. A run $\rho$ is *accepting* if $q_n \in F$.
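The definition of a run can be read operationally. The Python sketch below uses a hypothetical helper `run` over assumed tuple encodings (guards as `(lo, lo_closed, hi, hi_closed)`, transitions as `(q, sigma, guard, reset, q')`, `True` for $\top$); it follows the run of a delay-timed word and returns its reset-delay-timed trace. The two-transition automaton `DELTA` is made up for illustration only.

```python
import math
from typing import List, Optional, Set, Tuple

# Hypothetical encodings: a guard is (lo, lo_closed, hi, hi_closed) and a
# transition is (source, action, guard, reset, target).
Interval = Tuple[float, bool, float, bool]
Transition = Tuple[str, str, Interval, bool, str]


def in_interval(v: float, g: Interval) -> bool:
    lo, lo_c, hi, hi_c = g
    return (v >= lo if lo_c else v > lo) and (v <= hi if hi_c else v < hi)


def run(delta: List[Transition], q0: str, accepting: Set[str],
        word: List[Tuple[str, float]]
        ) -> Optional[Tuple[bool, List[Tuple[str, float, bool]]]]:
    """Follow the run of a delay-timed word (sigma_i, t_i).

    Returns (is_accepting, reset-delay-timed word), or None if no transition
    is enabled at some step. For a DOTA at most one transition is enabled."""
    q, nu = q0, 0.0                      # nu_0(c) = 0
    trace_r = []
    for sigma, t in word:
        v = nu + t                       # clock value after delaying t at q
        enabled = [d for d in delta
                   if d[0] == q and d[1] == sigma and in_interval(v, d[2])]
        if not enabled:
            return None
        _, _, _, b, target = enabled[0]
        trace_r.append((sigma, t, b))
        q, nu = target, (0.0 if b else v)   # reset iff b is True (⊤)
    return q in accepting, trace_r


# A small hypothetical DOTA used only for illustration.
DELTA = [("q0", "a", (0, True, 1, False), True, "q1"),
         ("q1", "b", (2, True, math.inf, False), False, "q1")]
```

For instance, the word $(a, 0.5)(b, 2.5)$ resets the clock on $a$ and then takes $b$ at clock value $2.5$, ending in $q_1$.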

The *trace* of a run $\rho$ is a timed word, denoted by $\mathit{trace}(\rho)$: $\mathit{trace}(\rho) = \epsilon$ if $\rho = (q_0, \nu_0)$, and $\mathit{trace}(\rho) = (\sigma_1, t_1)(\sigma_2, t_2)\cdots(\sigma_n, t_n)$ if $\rho = (q_0, \nu_0) \xrightarrow{t_1, \sigma_1} (q_1, \nu_1) \xrightarrow{t_2, \sigma_2} \cdots \xrightarrow{t_n, \sigma_n} (q_n, \nu_n)$. Since $t_i$ is the time delay at $q_{i-1}$, for $1 \leq i \leq n$, such a timed word is also called a *delay-timed word*. The corresponding *reset-delay-timed word* is defined as $\mathit{trace}_r(\rho) = (\sigma_1, t_1, b_1)(\sigma_2, t_2, b_2)\cdots(\sigma_n, t_n, b_n)$, where $b_i$ is the reset indicator of $\delta_i$, for $1 \leq i \leq n$. If $\rho$ is an accepting run of $\mathcal{A}$, $\mathit{trace}(\rho)$ is called an *accepting timed word*. The *recognized timed language* of $\mathcal{A}$ is the set of accepting delay-timed words, i.e., $L(\mathcal{A}) = \{\mathit{trace}(\rho) \mid \rho \text{ is an accepting run of } \mathcal{A}\}$. The *recognized reset-timed language* $L_r(\mathcal{A})$ is defined as $\{\mathit{trace}_r(\rho) \mid \rho \text{ is an accepting run of } \mathcal{A}\}$.

The delay-timed word $\omega = (\sigma_1, t_1)(\sigma_2, t_2)\cdots(\sigma_n, t_n)$ is observed externally, from the viewpoint of the global clock. Alternatively, the behavior can be observed internally, from the viewpoint of the local clock. This results in a *logical-timed word* of the form $\gamma = (\sigma_1, \mu_1)(\sigma_2, \mu_2)\cdots(\sigma_n, \mu_n)$ with $\mu_i = t_i$ if $i = 1 \vee b_{i-1} = \top$, and $\mu_i = \mu_{i-1} + t_i$ otherwise. We denote this mapping from delay-timed words to logical-timed words by $\Gamma$; note that it depends on the reset indicators $b_i$ of the underlying run.
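The mapping $\Gamma$ only needs the delays and the reset indicators of the run. A minimal Python sketch, assuming words are lists of `(sigma, t, b)` triples with `b = True` standing for $\top$ (the function name `gamma` is hypothetical):

```python
from typing import List, Tuple

# (sigma, value, reset) triples; value is a delay t_i or a clock value mu_i.
Word = List[Tuple[str, float, bool]]


def gamma(word: Word) -> Word:
    """Map a reset-delay-timed word to its reset-logical-timed word.

    mu_1 = t_1, and mu_i = t_i if the (i-1)-th transition reset the clock,
    otherwise mu_i = mu_{i-1} + t_i. Dropping the reset bits of the result
    gives the logical-timed word."""
    result: Word = []
    prev_mu, prev_reset = 0.0, True      # behaves like a reset before action 1
    for sigma, t, b in word:
        mu = t if prev_reset else prev_mu + t
        result.append((sigma, mu, b))
        prev_mu, prev_reset = mu, b
    return result
```

For example, $(a, 1, \bot)(b, 2, \top)(a, 0.5, \bot)$ becomes $(a, 1, \bot)(b, 3, \top)(a, 0.5, \bot)$: the clock keeps running across the first transition and restarts after the second.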

Similarly, we introduce the *reset-logical-timed word* $\gamma_r = (\sigma_1, \mu_1, b_1)(\sigma_2, \mu_2, b_2)\cdots(\sigma_n, \mu_n, b_n)$ as the counterpart of $\omega_r = (\sigma_1, t_1, b_1)(\sigma_2, t_2, b_2)\cdots(\sigma_n, t_n, b_n)$ in terms of the local clock. Without any substantial change, we can extend the mapping $\Gamma$ to map reset-delay-timed words to reset-logical-timed words. The *recognized logical-timed language* of $\mathcal{A}$ is given as $\mathcal{L}(\mathcal{A}) = \{\Gamma(\mathit{trace}(\rho)) \mid \rho \text{ is an accepting run of } \mathcal{A}\}$, and the *recognized reset-logical-timed language* of $\mathcal{A}$ as $\mathcal{L}_r(\mathcal{A}) = \{\Gamma(\mathit{trace}_r(\rho)) \mid \rho \text{ is an accepting run of } \mathcal{A}\}$.

An OTA is a *deterministic one-clock timed automaton* (DOTA) if there is at most one run for a given delay-timed word. In other words, for any location $q \in Q$ and action $\sigma \in \Sigma$, the guards of the transitions outgoing from $q$ labelled with $\sigma$ are disjoint subsets of $\mathbb{R}_{\geq 0}$. We say a DOTA is *complete* if, for any of its locations $q \in Q$ and any action $\sigma \in \Sigma$, the corresponding guards form a partition of $\mathbb{R}_{\geq 0}$. This means that any given delay-timed word has exactly one run. Any DOTA $\mathcal{A}$ can be transformed into a complete DOTA (referred to as a COTA) accepting the same timed language as follows: (1) Augment $Q$ with a "sink" location $q_s$, which is not an accepting location; (2) For every $q \in Q$ and $\sigma \in \Sigma$, if there is no outgoing transition from $q$ labelled with $\sigma$, introduce a (resetting) transition from $q$ to $q_s$ with label $\sigma$ and guard $[0, \infty)$; (3) Otherwise, let $S$ be the subset of $\mathbb{R}_{\geq 0}$ not covered by the guards of the transitions from $q$ with label $\sigma$. Write $S$ as a union of intervals $I_1, \ldots, I_k$ in a minimal way, then introduce a (resetting) transition from $q$ to $q_s$ with label $\sigma$ and guard $I_j$ for each $1 \leq j \leq k$.

From now on, we therefore assume that we are working with COTAs.

*Example 1.* Fig. 1 depicts the transformation of a DOTA $\mathcal{A}$ (left part) into a COTA (right part). First, a non-accepting "sink" location $q_s$ is introduced. Second, we introduce three fresh transitions (marked in blue) from $q_1$ to $q_s$ as well as transitions from $q_s$ to itself. Finally, for location $q_0$ and label $a$, the existing guards cover $(1, 3)$, with complement $[0, 1] \cup [3, \infty)$. Hence, we introduce the transitions $(q_0, a, [0, 1], \top, q_s)$ and $(q_0, a, [3, \infty), \top, q_s)$. Two fresh transitions from $q_1$ to $q_s$ are introduced similarly.
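Step (3) of the completion amounts to complementing a union of disjoint intervals. A possible Python sketch, assuming guards are encoded as hypothetical `(lo, lo_closed, hi, hi_closed)` tuples. On the $q_0$/$a$ guard $(1, 3)$ of Example 1, `uncovered` yields $[0, 1]$ and $[3, \infty)$. The functions `uncovered` and `complete` and the sink name `"q_s"` are illustrative only.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, bool, float, bool]          # (lo, lo_closed, hi, hi_closed)
Transition = Tuple[str, str, Interval, bool, str]   # (q, sigma, guard, reset, q')


def uncovered(guards: List[Interval]) -> List[Interval]:
    """Maximal intervals of [0, ∞) not covered by pairwise-disjoint guards."""
    gaps = []
    lo, lo_closed = 0, True                  # start of the current uncovered stretch
    for g_lo, g_lo_c, g_hi, g_hi_c in sorted(guards):
        hi_closed = not g_lo_c               # the gap ends where the guard begins
        if lo < g_lo or (lo == g_lo and lo_closed and hi_closed):
            gaps.append((lo, lo_closed, g_lo, hi_closed))
        lo, lo_closed = g_hi, not g_hi_c     # next gap starts where the guard ends
    if lo < math.inf:
        gaps.append((lo, lo_closed, math.inf, False))
    return gaps


def complete(delta: List[Transition], locations: List[str],
             alphabet: List[str], sink: str = "q_s") -> List[Transition]:
    """Steps (1)-(3): add resetting transitions to a non-accepting sink."""
    result = list(delta)
    for q in list(locations) + [sink]:
        for sigma in alphabet:
            guards = [d[2] for d in delta if d[0] == q and d[1] == sigma]
            for gap in uncovered(guards):    # [0, ∞) when there are no guards
                result.append((q, sigma, gap, True, sink))
    return result
```

With no guards at all, `uncovered` returns $[0, \infty)$, which covers step (2) as a special case of step (3).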

## 3 Learning from a Smart Teacher

In this section, we consider learning a COTA $\mathcal{A}$ with a smart teacher. Our learning algorithm relies on the following reduction of the equivalence of timed languages to that of reset-logical-timed languages.

Fig. 1: A DOTA $\mathcal{A}$ on the left and the corresponding COTA on the right. The initial location is indicated by 'start' and accepting locations are doubly circled.

Theorem 1. *Given two DOTAs* $\mathcal{A}$ *and* $\mathcal{B}$*, if* $\mathcal{L}_r(\mathcal{A}) = \mathcal{L}_r(\mathcal{B})$*, then* $L(\mathcal{A}) = L(\mathcal{B})$*.*

Theorem 1 ensures that $\mathcal{L}_r(\mathcal{H}) = \mathcal{L}_r(\mathcal{A})$ implies $L(\mathcal{H}) = L(\mathcal{A})$; that is, to construct a COTA that recognizes a target timed language $L = L(\mathcal{A})$, it suffices to learn a *hypothesis* $\mathcal{H}$ which recognizes the same reset-logical-timed language. For equivalence queries, instead of checking directly whether $\mathcal{L}_r(\mathcal{H}) = \mathcal{L}_r(\mathcal{A})$, the contrapositive of Theorem 1 guarantees that we can perform equivalence queries over their timed counterparts: if $L(\mathcal{H}) = L(\mathcal{A})$, then $\mathcal{H}$ already recognizes the target language; otherwise, a counterexample witnessing $L(\mathcal{H}) \neq L(\mathcal{A})$ is also evidence for $\mathcal{L}_r(\mathcal{H}) \neq \mathcal{L}_r(\mathcal{A})$.

We now describe the behavior of the teacher, who keeps an automaton $\mathcal{A}$ to be learnt and provides knowledge about it by answering membership and equivalence queries through an oracle she maintains. For a membership query, the teacher receives a logical-timed word $\gamma$ and returns whether $\gamma$ is in $\mathcal{L}(\mathcal{A})$. In addition, she is smart enough to return the reset-logical-timed word $\gamma_r$ that corresponds to $\gamma$ (the exact correspondence is described in Sect. 3.1). For an equivalence query, the teacher is given a hypothesis $\mathcal{H}$ and decides whether $L(\mathcal{H}) = L(\mathcal{A})$. If not, she is smart enough to return a reset-delay-timed word $\omega_r$ as a counterexample. The usual case, where the teacher can deal only with standard delay-timed words, is discussed in Sect. 4.

*Remark 1.* The assumption that the teacher can respond with timed words coupled with reset information is reasonable, in the sense that the learner can always infer the resets of the logical clock by referring to a global clock on the wall, as long as he can observe the running states of $\mathcal{A}$, i.e., observe the clock valuation of the system whenever an event happens. This conforms with the idea of combining automata learning with white-box techniques, as exploited in [24], given that source code is available for analysis in many application scenarios.

In what follows, we elaborate on the learning procedure, including membership queries, hypothesis construction, equivalence queries and counterexample processing.

#### 3.1 Membership query

In our setting, the oracle maintained by the smart teacher can be regarded as a COTA $\mathcal{A}$ that recognizes the target timed language $L$, and thereby its logical-timed language $\mathcal{L}(\mathcal{A})$ and its reset-logical-timed counterpart $\mathcal{L}_r(\mathcal{A})$. In order to collect enough information for constructing a hypothesis, the learner makes membership queries of the form "Is the logical-timed word $\gamma$ in $\mathcal{L}(\mathcal{A})$?". If there does not exist a run $\rho$ such that $\Gamma(\mathit{trace}(\rho)) = \gamma$, then there is some $k$ such that the run is blocked after the $k$-th action (i.e., $\gamma$ is *invalid*; since $\mathcal{A}$ is complete, this happens exactly when $\mu_{k+1}$ is smaller than the clock value reached after the $k$-th action, so that no non-negative delay leads to it). In this case the teacher gives a negative answer, together with a reset-logical-timed word $\gamma_r$ in which all $b_i$'s with $i > k$ are set to $\top$. If there exists a run $\rho$ (which is unique due to the determinacy of $\mathcal{A}$) that admits $\gamma$ (i.e., $\gamma$ is *valid*), the teacher answers "Yes" if $\rho$ is accepting, and "No" otherwise, in both cases providing the corresponding reset-logical-timed word $\gamma_r$ with $\Pi_{\{1,2\}}\gamma_r = \gamma$.
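The teacher's side of a membership query can be simulated on a known COTA. The following Python sketch is one way to do it under assumed encodings (guards as `(lo, lo_closed, hi, hi_closed)` tuples, `True` for $\top$). The function `membership` and the two-location automaton `COTA` are hypothetical. The sketch relies on the observation above that, in a complete automaton, a logical-timed word is blocked exactly when some $\mu_i$ lies below the current clock value.

```python
import math
from typing import List, Set, Tuple

Interval = Tuple[float, bool, float, bool]
Transition = Tuple[str, str, Interval, bool, str]   # (q, sigma, guard, reset, q')


def in_interval(v: float, g: Interval) -> bool:
    lo, lo_c, hi, hi_c = g
    return (v >= lo if lo_c else v > lo) and (v <= hi if hi_c else v < hi)


def membership(delta: List[Transition], q0: str, accepting: Set[str],
               word: List[Tuple[str, float]]):
    """Answer "is the logical-timed word in the logical-timed language?".

    Returns (answer, reset-logical-timed word pi(word))."""
    q, clock = q0, 0.0
    out = []
    for k, (sigma, mu) in enumerate(word):
        if mu < clock:
            # Invalid: no non-negative delay reaches mu. Answer "no" and set
            # the reset bits of all remaining actions to True (⊤).
            out.extend((s, m, True) for s, m in word[k:])
            return False, out
        # In a COTA exactly one guard for (q, sigma) contains mu.
        _, _, _, b, target = next(d for d in delta
                                  if d[0] == q and d[1] == sigma
                                  and in_interval(mu, d[2]))
        out.append((sigma, mu, b))
        q, clock = target, (0.0 if b else mu)
    return q in accepting, out


# Hypothetical complete automaton over {a}; q1 is accepting.
COTA = [("q0", "a", (0, True, 1, True), False, "q1"),
        ("q0", "a", (1, False, math.inf, False), True, "q0"),
        ("q1", "a", (0, True, math.inf, False), True, "q0")]
```

For instance, $(a, 0.5)(a, 0.2)$ is invalid on `COTA`: after the non-resetting first transition the clock reads $0.5$, so the clock value $0.2$ cannot be reached.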

For the sake of simplicity, we define a function π that maps a logical-timed word to its unique reset-logical-timed counterpart in membership queries. Information gathered from the membership queries is stored in a timed observation table defined as follows.

Definition 2 (Timed observation table). *A timed observation table for a COTA* $\mathcal{A}$ *is a 7-tuple* **T** $= (\Sigma, \boldsymbol{\Sigma}, \boldsymbol{\Sigma}_r, S, R, E, f)$ *where* $\Sigma$ *is the finite alphabet;* $\boldsymbol{\Sigma} = \Sigma \times \mathbb{R}_{\geq 0}$ *is the infinite set of logical-timed actions;* $\boldsymbol{\Sigma}_r = \Sigma \times \mathbb{R}_{\geq 0} \times \mathbb{B}$ *is the infinite set of reset-logical-timed actions;* $S, R \subset \boldsymbol{\Sigma}_r^*$ *and* $E \subset \boldsymbol{\Sigma}^*$ *are finite sets of words, where* $S$ *is called the set of prefixes,* $R$ *the boundary, and* $E$ *the set of suffixes. Specifically,*


Given a table **T**, we define $\mathit{row} : S \cup R \to (E \to \{+, -\})$ as a function mapping each $\gamma_r \in S \cup R$ to a vector indexed by $e \in E$, each of whose components is defined as $f(\gamma_r \cdot e)$, denoting a potential location.

Before constructing a hypothesis H based on the timed observation table **T**, the learner has to ensure that **T** satisfies the following conditions:


A timed observation table **T** is *prepared* if it satisfies the above five conditions. To get the table prepared, the learner can perform the following operations:

*Making* **T** *closed.* If **T** is not closed, there exists $r \in R$ such that $\mathit{row}(r) \neq \mathit{row}(s)$ for all $s \in S$. The learner thus moves such an $r$ from *R* to *S*. Moreover, each reset-logical-timed word $\pi(\Pi_{\{1,2\}}r \cdot \boldsymbol{\sigma})$ needs to be added to *R*, where $\boldsymbol{\sigma} = (\sigma, 0)$ for each $\sigma \in \Sigma$. This operation is important since it guarantees that all actions in $\Sigma$ are enabled at every location, each with a specified clock valuation, even though some of the resulting logical-timed words may be invalid. In particular, choosing the minimal value 0 as the clock valuation satisfies the precondition $\mu_0 = 0$ of the partition function described in Sect. 3.2.
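A minimal Python sketch of the closedness check, under an assumed table encoding: prefixes in *S* and *R* are tuples of `(sigma, mu, b)`, suffixes in *E* are tuples of `(sigma, mu)`, and `f` is a dictionary from (prefix, suffix) pairs to `True` ('+') or `False` ('-'). The helpers `row`, `find_unclosed` and `new_boundary_queries` are hypothetical; the last one only produces the logical-timed words whose reset counterparts $\pi(\cdot)$ would then be obtained by membership queries.

```python
from typing import Dict, List, Optional, Sequence, Tuple

RWord = Tuple[Tuple[str, float, bool], ...]   # reset-logical-timed word
LWord = Tuple[Tuple[str, float], ...]         # logical-timed word


def row(f: Dict[Tuple[RWord, LWord], bool], w: RWord,
        E: Sequence[LWord]) -> Tuple[bool, ...]:
    """row(w): the vector f(w · e) indexed by the suffixes e in E."""
    return tuple(f[(w, e)] for e in E)


def find_unclosed(S, R, E, f) -> Optional[RWord]:
    """Some r in R whose row differs from the row of every s in S, or None."""
    s_rows = {row(f, s, E) for s in S}
    return next((r for r in R if row(f, r, E) not in s_rows), None)


def new_boundary_queries(r: RWord, alphabet: Sequence[str]) -> List[LWord]:
    """After moving r to S: the logical-timed words Π{1,2}r · (σ, 0) whose
    reset counterparts must be added to R."""
    base = tuple((sigma, mu) for sigma, mu, _ in r)
    return [base + ((sigma, 0.0),) for sigma in alphabet]
```

Keying `f` by (prefix, suffix) pairs rather than by concatenated words is a design choice made here for simplicity, not something prescribed by the paper.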

*Making* **T** *consistent.* If **T** is not consistent, one inconsistency is resolved by adding $\boldsymbol{\sigma} \cdot e$ to *E*, where $\boldsymbol{\sigma}$ and $e$ are determined as follows. **T** being inconsistent implies that there exist at least two reset-logical-timed words $\gamma_r, \gamma_r' \in S \cup R$ such that $\gamma_r \cdot \boldsymbol{\sigma}_r, \gamma_r' \cdot \boldsymbol{\sigma}_r' \in S \cup R$ and $\Pi_{\{1,2\}}\boldsymbol{\sigma}_r = \Pi_{\{1,2\}}\boldsymbol{\sigma}_r'$ for some $\boldsymbol{\sigma}_r, \boldsymbol{\sigma}_r' \in \boldsymbol{\Sigma}_r$, with $\mathit{row}(\gamma_r) = \mathit{row}(\gamma_r')$ but $\mathit{row}(\gamma_r \cdot \boldsymbol{\sigma}_r) \neq \mathit{row}(\gamma_r' \cdot \boldsymbol{\sigma}_r')$. So, let $\boldsymbol{\sigma} = \Pi_{\{1,2\}}\boldsymbol{\sigma}_r = \Pi_{\{1,2\}}\boldsymbol{\sigma}_r'$ and let $e \in E$ be such that $f(\gamma_r \cdot \boldsymbol{\sigma}_r \cdot e) \neq f(\gamma_r' \cdot \boldsymbol{\sigma}_r' \cdot e)$. Thereafter, the learner fills the table by making membership queries. Note that this operation keeps *E* a set of logical-timed words.
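The search for an inconsistency can be sketched in Python as follows. The sketch assumes a hypothetical table encoding: a dictionary `f` keyed by (prefix, suffix) pairs, with prefixes as tuples of `(sigma, mu, b)` and suffixes as tuples of `(sigma, mu)`. The function `find_inconsistency` is illustrative; it returns the new suffix $\boldsymbol{\sigma} \cdot e$ to add to *E*, which is indeed a logical-timed word.

```python
from typing import Dict, Optional, Sequence, Tuple

RWord = Tuple[Tuple[str, float, bool], ...]   # reset-logical-timed word
LWord = Tuple[Tuple[str, float], ...]         # logical-timed word


def find_inconsistency(S: Sequence[RWord], R: Sequence[RWord],
                       E: Sequence[LWord],
                       f: Dict[Tuple[RWord, LWord], bool]) -> Optional[LWord]:
    """Return a new suffix σ·e that resolves one inconsistency, or None.

    Searches for γ, γ' in S ∪ R with equal rows and one-step extensions
    γ·σ_r, γ'·σ'_r in S ∪ R that agree on Π{1,2} (action and clock value)
    but disagree on some suffix e in E. Their reset bits may differ."""
    table = list(S) + list(R)
    rows = {w: tuple(f[(w, e)] for e in E) for w in table}

    def extensions(g: RWord):
        return [x for x in table if len(x) == len(g) + 1 and x[:-1] == g]

    for g1 in table:
        for g2 in table:
            if g1 == g2 or rows[g1] != rows[g2]:
                continue
            for x1 in extensions(g1):
                for x2 in extensions(g2):
                    if x1[-1][:2] != x2[-1][:2]:
                        continue
                    for e in E:
                        if f[(x1, e)] != f[(x2, e)]:
                            return (x1[-1][:2],) + e
    return None
```

The exhaustive nested search favors readability over efficiency; a practical implementation would index extensions instead of rescanning the table.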

*Making* **T** *evidence-closed.* If **T** is not evidence-closed, the learner needs to add all prefixes of $\pi(\Pi_{\{1,2\}}s \cdot e)$ to *R* for every $s \in S$ and $e \in E$, except those already in $S \cup R$. Similarly, the learner then fills the table through membership queries.

The conditions that a timed observation table **T** is reduced and prefix-closed are inherently preserved by the aforementioned operations, together with the counterexample processing described in Sect. 3.3. Furthermore, a table may need several rounds of these operations before being prepared (cf. Algorithm 1), since one operation may violate a condition established by another.

#### 3.2 Hypothesis construction

As soon as the timed observation table **T** is prepared, a hypothesis can be constructed in two steps: the learner first builds a DFA $M$ based on the information in **T**, and then transforms $M$ into a hypothesis $\mathcal{H}$, which will later be shown to be a COTA.

Given a prepared timed observation table **T** $= (\Sigma, \boldsymbol{\Sigma}, \boldsymbol{\Sigma}_r, S, R, E, f)$, a DFA $M = (Q_M, \Sigma_M, \Delta_M, q_0^M, F_M)$ can be built as follows:


The constructed DFA M is compatible with the timed observation table **T** in the sense captured by the following lemma.

Lemma 1. *For a prepared timed observation table* **T** $= (\Sigma, \boldsymbol{\Sigma}, \boldsymbol{\Sigma}_r, S, R, E, f)$*, for every* $\gamma_r \cdot e \in (S \cup R) \cdot E$*, the constructed DFA* $M = (Q_M, \Sigma_M, \Delta_M, q_0^M, F_M)$ *accepts* $\pi(\Pi_{\{1,2\}}\gamma_r \cdot e)$ *if and only if* $f(\gamma_r \cdot e) = +$*.*

The learner then transforms the DFA $M$ into a hypothesis $\mathcal{H} = (\Sigma, Q, q_0, F, c, \Delta)$, with $Q = Q_M$, $q_0 = q_0^M$, $F = F_M$, $c$ being the clock and $\Sigma$ the given alphabet as in **T**. The set of transitions $\Delta$ in $\mathcal{H}$ is constructed as follows: for any $q \in Q_M$ and $\sigma \in \Sigma$, let $\Psi_{q,\sigma} = \{\mu \mid (q, (\sigma, \mu, b), q') \in \Delta_M\}$; applying the partition function $P^c(\cdot)$ (defined below) to $\Psi_{q,\sigma}$ returns $k$ intervals, written $I_1, \ldots, I_k$, satisfying $\mu_i \in I_i$ for any $1 \leq i \leq k$, where $k = |\Psi_{q,\sigma}|$; consequently, for every $(q, (\sigma, \mu_i, b_i), q') \in \Delta_M$, a fresh transition $\delta_i = (q, \sigma, I_i, b_i, q')$ is added to $\Delta$.

Fig. 2: The prepared timed observation table **T5**, the corresponding DFA M<sup>5</sup> and hypothesis H5.

Definition 3 (Partition function). *Given a list of clock valuations* $\ell = \mu_0, \mu_1, \ldots, \mu_n$ *with* $0 = \mu_0 < \mu_1 < \cdots < \mu_n$*, and* $\lfloor \mu_i \rfloor \neq \lfloor \mu_j \rfloor$ *whenever* $\mu_i, \mu_j \in \mathbb{R}_{\geq 0} \setminus \mathbb{N}$ *and* $i \neq j$*, for all* $1 \leq i, j \leq n$*, let* $\mu_{n+1} = \infty$*; then a partition function* $P^c(\cdot)$ *mapping* $\ell$ *to a set of intervals* $\{I_0, I_1, \ldots, I_n\}$*, which is a partition of* $\mathbb{R}_{\geq 0}$*, is defined as*

$$I\_{i} = \begin{cases} [\mu\_{i}, \mu\_{i+1}), & \text{if } \mu\_{i} \in \mathbb{N} \land \mu\_{i+1} \in \mathbb{N};\\ (\lfloor \mu\_{i} \rfloor, \mu\_{i+1}), & \text{if } \mu\_{i} \in \mathbb{R}\_{\geq 0} \setminus \mathbb{N} \land \mu\_{i+1} \in \mathbb{N};\\ [\mu\_{i}, \lfloor \mu\_{i+1} \rfloor], & \text{if } \mu\_{i} \in \mathbb{N} \land \mu\_{i+1} \in \mathbb{R}\_{\geq 0} \setminus \mathbb{N};\\ (\lfloor \mu\_{i} \rfloor, \lfloor \mu\_{i+1} \rfloor], & \text{if } \mu\_{i} \in \mathbb{R}\_{\geq 0} \setminus \mathbb{N} \land \mu\_{i+1} \in \mathbb{R}\_{\geq 0} \setminus \mathbb{N}.\end{cases}$$

*Remark 2.* Definition 3 is adapted from that in [18] by imposing additional assumptions on the list of clock valuations, in order to guarantee $\mu_i \in I_i$ for any $0 \leq i \leq n$ under the underlying continuous-time semantics. Since **T** is prepared and counterexamples are normalized by the normalization function described in Sect. 3.3, the set of clock valuations $\Psi_{q,\sigma}$ can be arranged into a list $\ell_{q,\sigma} = \mu_0, \mu_1, \ldots, \mu_n$ satisfying the assumptions of Definition 3, for any $q \in Q_M$ and $\sigma \in \Sigma$.
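To make Definition 3 and the transition construction of $\mathcal{H}$ concrete, here is a Python sketch under assumed encodings. Intervals are `(lo, lo_closed, hi, hi_closed)` tuples, $\mu_{n+1} = \infty$ is handled like an integer endpoint so that the last interval is unbounded, and DFA edges are triples `(q, (sigma, mu, b), q')`. The helper names are hypothetical. With $\ell = 0, 1.1, 3$ the sketch yields $[0, 1]$, $(1, 3)$ and $[3, \infty)$.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[float, bool, float, bool]    # (lo, lo_closed, hi, hi_closed)


def _integral(x: float) -> bool:
    # mu_{n+1} = ∞ is treated like an integer endpoint, so the last
    # interval is unbounded and open on the right.
    return x == math.inf or float(x).is_integer()


def partition(mus: List[float]) -> List[Interval]:
    """P^c of Definition 3: intervals I_0..I_n with mus[i] in I_i.

    Assumes 0 = mus[0] < mus[1] < ... and distinct integer parts for
    distinct non-integer values (ensured by normalization)."""
    assert mus and mus[0] == 0 and all(a < b for a, b in zip(mus, mus[1:]))
    bounds = list(mus) + [math.inf]
    intervals = []
    for mu, nxt in zip(bounds, bounds[1:]):
        lo, lo_closed = (mu, True) if _integral(mu) else (math.floor(mu), False)
        hi, hi_closed = (nxt, False) if _integral(nxt) else (math.floor(nxt), True)
        intervals.append((lo, lo_closed, hi, hi_closed))
    return intervals


def hypothesis_transitions(dfa_edges) -> List[Tuple]:
    """Turn DFA edges (q, (sigma, mu, b), q') into guarded transitions
    (q, sigma, I, b, q') by partitioning the values in Psi_{q,sigma}."""
    groups: Dict[Tuple[str, str], list] = defaultdict(list)
    for q, (sigma, mu, b), target in dfa_edges:
        groups[(q, sigma)].append((mu, b, target))
    delta = []
    for (q, sigma), edges in groups.items():
        edges.sort()                      # by mu; mu values are distinct
        for (mu, b, target), guard in zip(edges, partition([e[0] for e in edges])):
            delta.append((q, sigma, guard, b, target))
    return delta
```

The four branches of `partition` correspond one-to-one to the four cases of Definition 3, with the left endpoint chosen from $\mu_i$ and the right endpoint from $\mu_{i+1}$.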

*Example 2.* Suppose $\mathcal{A}$ in Fig. 1 recognizes the target timed language. Then the prepared table **T**$_5$, the corresponding DFA $M_5$ and the hypothesis $\mathcal{H}_5$ are depicted in Fig. 2. Here, the subscript 5 indicates the fifth iteration of **T**. (Details concerning the constructions and the entire learning process are given in Appendix B of [7].)

Lemma 2. *Given a DFA* $M = (Q_M, \Sigma_M, \Delta_M, q_0^M, F_M)$ *generated from a prepared timed observation table* **T***, let* $\mathcal{H} = (\Sigma, Q, q_0, F, c, \Delta)$ *be the hypothesis transformed from* $M$*. For all* $\gamma_r \cdot e \in (S \cup R) \cdot E$*,* $\mathcal{H}$ *accepts the reset-logical-timed word* $\pi(\Pi_{\{1,2\}}\gamma_r \cdot e)$ *iff* $f(\gamma_r \cdot e) = +$*.*

Theorem 2. *The hypothesis* H *is a COTA.*

Given a clock valuation $\mu$, we denote the *region* containing $\mu$ by $[\![\mu]\!]$, defined as $[\![\mu]\!] = [\mu, \mu]$ if $\mu \in \mathbb{N}$, and $[\![\mu]\!] = (\lfloor \mu \rfloor, \lfloor \mu \rfloor + 1)$ otherwise. The following theorem establishes the compatibility of the constructed hypothesis $\mathcal{H}$ with the timed observation table **T**.
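The region $[\![\mu]\!]$ is straightforward to compute. A minimal Python sketch, using a hypothetical interval encoding `(lo, lo_closed, hi, hi_closed)` (the function name `region` is illustrative):

```python
import math
from typing import Tuple

Interval = Tuple[float, bool, float, bool]   # (lo, lo_closed, hi, hi_closed)


def region(mu: float) -> Interval:
    """[[mu]]: the point interval [mu, mu] for an integer mu,
    otherwise the open unit interval (floor(mu), floor(mu) + 1)."""
    if float(mu).is_integer():
        return (mu, True, mu, True)
    k = math.floor(mu)
    return (k, False, k + 1, False)
```

For example, $1.1$ and $1.3$ share the region $(1, 2)$, whereas $3$ forms its own region $[3, 3]$.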

Theorem 3. *For* $\gamma_r \cdot e \in (S \cup R) \cdot E$*, let* $\pi(\Pi_{\{1,2\}}\gamma_r \cdot e) = (\sigma_1, \mu_1, b_1)\cdots(\sigma_n, \mu_n, b_n)$*. Then for every* $\mu_i' \in [\![\mu_i]\!]$ *(for* $1 \leq i \leq n$*), the hypothesis* $\mathcal{H}$ *accepts the reset-logical-timed word* $\gamma_r' = (\sigma_1, \mu_1', b_1)\cdots(\sigma_n, \mu_n', b_n)$ *if* $f(\gamma_r \cdot e) = +$*, and does not accept it if* $f(\gamma_r \cdot e) = -$*.*

#### 3.3 Equivalence query and counterexample processing

Suppose that the teacher knows a COTA $\mathcal{A}$ which recognizes the target timed language $L$. Then answering an equivalence query amounts to determining whether $L(\mathcal{H}) = L(\mathcal{A})$, which can be divided into two timed language inclusion problems, i.e., whether $L(\mathcal{H}) \subseteq L(\mathcal{A})$ and whether $L(\mathcal{A}) \subseteq L(\mathcal{H})$. Most decision procedures for language inclusion proceed by complementation and emptiness checking of the intersection [23]: $L(\mathcal{A}) \subseteq L(\mathcal{B})$ iff $L(\mathcal{A}) \cap \overline{L(\mathcal{B})} = \emptyset$. The fact that deterministic timed automata can be complemented [6] enables solving the inclusion problem by checking the emptiness of the resulting product automata $\mathcal{H} \times \overline{\mathcal{A}}$ and $\overline{\mathcal{H}} \times \mathcal{A}$. The complementation technique, however, does not apply to nondeterministic timed automata, even those with a single clock [4], which we plan to incorporate into our learning framework in future work. We therefore opt for<sup>2</sup> the alternative method presented by Ouaknine and Worrell in [31], which shows that the language inclusion problem of timed automata with one clock (regardless of their determinacy) is decidable by reduction to a reachability problem on an infinite graph. That is, if $L(\mathcal{H}) \not\subseteq L(\mathcal{A})$, there exists a delay-timed word $\omega$ that leads to a *bad configuration*: the run $\rho$ of $\omega$ in $\mathcal{H}$ ends in an accepting location, but the corresponding run $\rho'$ of $\omega$ in $\mathcal{A}$ is not accepting. Consequently, the teacher can provide the reset-delay-timed word $\omega_r$ resulting from $\omega$ as a negative counterexample $\mathit{ctx}^- = (\omega_r, -)$. Similarly, a positive counterexample $\mathit{ctx}^+ = (\omega_r, +)$ can be generated if $L(\mathcal{A}) \not\subseteq L(\mathcal{H})$. An algorithm elaborating the equivalence query is provided in Appendix C of the full version [7].

When receiving a counterexample $\mathit{ctx} = (\omega_r, +/-)$, the learner first converts it to a reset-logical-timed word $\gamma_r = \Gamma(\omega_r) = (\sigma_1, \mu_1, b_1)(\sigma_2, \mu_2, b_2)\cdots(\sigma_n, \mu_n, b_n)$. By definition, $\gamma_r$ and $\omega_r$ share the same sequence of transitions in $\mathcal{A}$. Furthermore, by the contrapositive of Theorem 1, $\gamma_r$ is evidence for $\mathcal{L}_r(\mathcal{H}) \neq \mathcal{L}_r(\mathcal{A})$ if $\omega_r$ is evidence for $L(\mathcal{H}) \neq L(\mathcal{A})$.

Additionally, by the definition of the clock constraints $\Phi_c$, at any location, if an action $\sigma$ is enabled (i.e., its guard is satisfied) w.r.t. a clock value $\mu \in \mathbb{R}_{\geq 0} \setminus \mathbb{N}$, then $\sigma$ is also enabled w.r.t. any clock value $\lfloor \mu \rfloor + \theta$ at that location, where $\theta \in (0, 1)$. Moreover, since the target automaton is deterministic, only one transition labelled $\sigma$ is available at that location on the whole region $[\![\mu]\!]$. Therefore, to avoid unnecessarily distinguishing timed words and violating the assumptions on the list for the partition function, the learner applies a *normalization function* $g$ to $\gamma_r$.

Definition 4 (Normalization). *A normalization function* $g$ *maps a reset-logical-timed word* $\gamma_r = (\sigma_1, \mu_1, b_1)(\sigma_2, \mu_2, b_2)\cdots(\sigma_n, \mu_n, b_n)$ *to another reset-logical-timed word by replacing each non-integer logical clock value with its integer part plus a constant fractional part, i.e.,* $g(\gamma_r) = (\sigma_1, \mu_1', b_1)(\sigma_2, \mu_2', b_2)\cdots(\sigma_n, \mu_n', b_n)$*, where* $\mu_i' = \mu_i$ *if* $\mu_i \in \mathbb{N}$*, and* $\mu_i' = \lfloor \mu_i \rfloor + \theta$ *otherwise, for some fixed constant* $\theta \in (0, 1)$*.*

We instantiate $\theta = 0.1$ in what follows; the approach works equally well for any other $\theta \in (0, 1)$. This *normalization* process guarantees the assumptions needed for Definition 3.
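A minimal Python sketch of $g$, assuming reset-logical-timed words are lists of `(sigma, mu, b)` triples. `Fraction` is used so that $\lfloor \mu \rfloor + 0.1$ is computed exactly rather than in binary floating point. The names `normalize` and `THETA` are hypothetical.

```python
import math
from fractions import Fraction

THETA = Fraction(1, 10)   # θ = 0.1, as instantiated in the text


def normalize(word, theta=THETA):
    """g(γ_r): keep integer clock values; map every other value mu to
    floor(mu) + theta. word is a sequence of (sigma, mu, b) triples."""
    out = []
    for sigma, mu, b in word:
        mu = Fraction(mu)
        if mu.denominator != 1:                 # mu is not an integer
            mu = math.floor(mu) + theta
        out.append((sigma, mu, b))
    return out
```

Applied to $(a, 0, \top)(a, 1.3, \top)$, it produces $(a, 0, \top)(a, 1.1, \top)$, the normalized word used in Example 3.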

<sup>2</sup> Note that the learning complexity (Sect. 3.5) is measured in terms of the number of queries, rather than the time complexity of the specific method used to answer equivalence (or membership) queries. Moreover, the specific method of equivalence checking is not the main concern of this paper.

Fig. 3: An illustration of the necessity of normalization by the normalization function.


*Example 3.* Consider the prepared table **T**$_5$ in Fig. 3 (as in Fig. 2). When the learner asks an equivalence query with hypothesis $\mathcal{H}_5$, the teacher answers that $L(\mathcal{H}_5) \neq L(\mathcal{A})$, where $\mathcal{A}$ in Fig. 1 is the target automaton, and provides a counterexample $(\omega_r, -)$ with $\omega_r = (a, 0, \top)(a, 1.3, \top)$, which is transformed into the reset-logical-timed word $\gamma_r = (a, 0, \top)(a, 1.3, \top)$. If the learner added the prefixes of $\gamma_r$ to the table directly, he would get a prepared table **T**$_6$ and thus construct a DFA $M_6$. Unfortunately, the partition function of Definition 3 is then no longer applicable to $(a, 1.3, \top)$ and $(a, 1.1, \bot)$, since the non-integer values 1.3 and 1.1 have the same integer part. On the other hand, if he adds the prefixes of the normalized reset-logical-timed word, i.e., $\gamma_r' = (a, 0, \top)(a, 1.1, \top)$, to **T**$_5$, the learner obtains an inconsistent table, whose consistency can be restored by the operation of "making **T** consistent" as expected.

The following theorem guarantees that the normalized reset-logical-timed word $\gamma_r'$ is also evidence for $\mathcal{L}_r(\mathcal{H}) \neq \mathcal{L}_r(\mathcal{A})$. Therefore, the learner can use it as a counterexample and adds all prefixes of $\gamma_r'$ to *R*, except those already in $S \cup R$.

Theorem 4. *Given a valid reset-logical-timed word* $\gamma_r$ *of* $\mathcal{A}$*, its normalization* $\gamma_r' = g(\gamma_r)$ *shares the same sequence of transitions in* $\mathcal{A}$*.*

#### 3.4 Learning algorithm

We present in Algorithm 1 the learning procedure integrating all the previously described ingredients, including preparing the table, membership and equivalence queries, hypothesis construction and counterexample processing. The learner first initializes the timed observation table **T** $= (\Sigma, \boldsymbol{\Sigma}, \boldsymbol{\Sigma}_r, S, R, E, f)$, where $S = \{\epsilon\}$ and $E = \{\epsilon\}$; for every $\sigma \in \Sigma$, he builds a logical-timed word $\gamma = (\sigma, 0)$ and obtains its reset counterpart $\pi(\gamma) = (\sigma, 0, b)$ through a membership query to the teacher, which is then added to *R*. Thereafter, the learner fills the table with additional membership queries. Before constructing a hypothesis, the learner performs several rounds of the operations described in Sect. 3.1 until **T** is prepared. Then, a hypothesis $\mathcal{H}$ is constructed via an intermediate DFA $M$ and submitted to the teacher for an equivalence query. If the answer is positive, $\mathcal{H}$ recognizes the target language. Otherwise, the learner receives a counterexample $\mathit{ctx}$ and processes it to update **T** as described in Sect. 3.3. The whole procedure repeats until the teacher gives a positive answer to an equivalence query.
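The control flow of this loop can be summarized by the Python skeleton below. It is only a sketch: every callback (`init_table`, `prepare`, `build_hypothesis`, `equivalence`, `process_counterexample`) is a hypothetical hook standing for the corresponding step described above, and `equivalence` is assumed to return `None` when $L(\mathcal{H}) = L(\mathcal{A})$.

```python
from typing import Any, Callable, Optional


def learn(init_table: Callable[[], Any],
          prepare: Callable[[Any], None],
          build_hypothesis: Callable[[Any], Any],
          equivalence: Callable[[Any], Optional[Any]],
          process_counterexample: Callable[[Any, Any], None]) -> Any:
    """Control flow of the learning loop with a smart teacher.

    All callbacks are hypothetical hooks; equivalence(H) returns None when
    L(H) = L(A) and a counterexample otherwise."""
    table = init_table()       # S = E = {ε}; R = {π((σ, 0)) | σ ∈ Σ}; fill f
    while True:
        prepare(table)         # closed, consistent, evidence-closed, ...
        hypothesis = build_hypothesis(table)    # via the intermediate DFA M
        ctx = equivalence(hypothesis)
        if ctx is None:
            return hypothesis
        process_counterexample(table, ctx)      # Γ, normalize, prefixes to R
```

Passing the steps in as callbacks keeps the loop independent of any particular table or teacher implementation.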

To facilitate the analysis of correctness, termination and complexity of Algorithm 1, we introduce the notion of *symbolic state*, which combines a location with a clock region: a symbolic state of a COTA $A = (\Sigma, Q, q_0, F, c, \Delta)$ is a pair $(q, \mu)$, where $q \in Q$ and $\mu$ is a clock region. If $\kappa$ is the maximal constant appearing in the clock constraints of $A$, then there are $2\kappa + 2$ such regions for each location, namely $[n, n]$ with $0 \le n \le \kappa$, $(n, n+1)$ with $0 \le n < \kappa$, and $(\kappa, \infty)$, so there are $|Q| \times (2\kappa + 2)$ symbolic states in total. The correctness and termination of Algorithm 1 are then stated in the following theorem, based on the fact that there is an injection from *S* (or equivalently, the locations of H) to the symbolic states of A.
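The region count can be checked mechanically. The following is a minimal sketch (the names `regions`, `region_of` and `symbolic_states`, and the tuple encoding of regions, are illustrative, not taken from the authors' tool) that enumerates the $2\kappa + 2$ regions of a single clock and pairs them with locations.

```python
from typing import List, Tuple

# A region is (lo, hi, kind): "point" for [n, n], "open" for (n, n+1),
# and "unbounded" for (kappa, infinity).
Region = Tuple[int, float, str]


def regions(kappa: int) -> List[Region]:
    """Enumerate the 2*kappa + 2 regions of one clock with maximal constant kappa."""
    result: List[Region] = []
    for n in range(kappa + 1):
        result.append((n, n, "point"))              # [n, n],   0 <= n <= kappa
        if n < kappa:
            result.append((n, n + 1, "open"))       # (n, n+1), 0 <= n < kappa
    result.append((kappa, float("inf"), "unbounded"))  # (kappa, infinity)
    return result


def region_of(v: float, kappa: int) -> Region:
    """Return the region containing the clock value v >= 0."""
    if v > kappa:
        return (kappa, float("inf"), "unbounded")
    n = int(v)                                      # floor, since v >= 0
    return (n, n, "point") if v == n else (n, n + 1, "open")


def symbolic_states(locations: List[str], kappa: int) -> List[Tuple[str, Region]]:
    """All pairs (q, mu): |Q| * (2*kappa + 2) symbolic states."""
    return [(q, r) for q in locations for r in regions(kappa)]


print(len(regions(3)), region_of(1.5, 3))
```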

Theorem 5. *Algorithm 1 terminates and returns a COTA* H *which recognizes the target timed language* L*.*

#### 3.5 Complexity

Given a target timed language <sup>L</sup> which is recognized by a COTA <sup>A</sup>, let n <sup>=</sup> <sup>|</sup>Q<sup>|</sup> be the number of locations of <sup>A</sup>, m <sup>=</sup> <sup>|</sup>Σ<sup>|</sup> the size of the alphabet, and κ the maximal constant appearing in the clock constraints of A. In what follows, we derive the complexity of Algorithm 1 in terms of the number of queries.

By the proof of Theorem 5, H has at most $n(2\kappa + 2)$ locations (the size of *S*), which are distinguished by *E*. Thus, $|E|$ is at most $n(2\kappa + 2)$, since this suffices to distinguish these locations. Therefore, the number of transitions of H is bounded by $mn^2(2\kappa + 2)^3$. Furthermore, every counterexample adds at least one fresh transition to the hypothesis H, where each interval of a partition is considered to correspond to a transition; hence the number of counterexamples, and thus of equivalence queries, is at most $mn^2(2\kappa + 2)^3$.

Now we consider the number of membership queries, i.e., $(|S| + |R|) \times |E|$. Let $h$ be the maximal length of counterexamples returned by the teacher; according to Theorem 5 in [40], $h$ is polynomial in the size of A and bounded by $n^2$. Fresh rows are added to *R* in three cases: during counterexample processing, when making **T** closed, and when making **T** evidence-closed. The first case adds at most $hmn^2(2\kappa + 2)^3$ rows to *R*, while the latter two add at most $n(2\kappa + 2) \times m$ and $n^2(2\kappa + 2)^2$ rows, respectively, so the size of *R* is bounded by $O(hmn^2\kappa^3)$. Substituting $h \le n^2$ and $|E| \le n(2\kappa + 2)$, the number of membership queries is bounded by $O(mn^4\kappa^3) \cdot O(n\kappa) = O(mn^5\kappa^4)$. Since this dominates the $O(mn^2\kappa^3)$ equivalence queries, the total complexity is $O(mn^5\kappa^4)$.
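To see how the individual worst-case bounds combine, the following sketch simply evaluates the concrete (non-asymptotic) expressions stated above for given $n$, $m$ and $\kappa$, taking $h = n^2$ as in the text. The function name and dictionary keys are hypothetical; it is an arithmetic aid, not part of the learning algorithm.

```python
def worst_case_bounds(n: int, m: int, kappa: int) -> dict:
    """Plug n (locations), m (alphabet size) and kappa (maximal constant)
    into the worst-case bound expressions of Sect. 3.5."""
    regions = 2 * kappa + 2
    h = n ** 2                                   # maximal counterexample length
    locations = n * regions                      # |S| <= n(2k+2)
    suffixes = n * regions                       # |E| <= n(2k+2)
    transitions = m * n ** 2 * regions ** 3      # also bounds #equivalence queries
    rows_from_ctx = h * transitions              # h m n^2 (2k+2)^3
    rows_from_closed = locations * m             # n(2k+2) * m
    rows_from_evidence = n ** 2 * regions ** 2   # n^2 (2k+2)^2
    rows = rows_from_ctx + rows_from_closed + rows_from_evidence
    membership = (locations + rows) * suffixes   # (|S| + |R|) * |E|
    return {
        "locations": locations,
        "suffixes": suffixes,
        "equivalence_queries": transitions,
        "rows_R": rows,
        "membership_queries": membership,
    }


print(worst_case_bounds(2, 3, 1))
```

The counterexample-processing term dominates `rows_R`, which is why the membership bound grows as $mn^5\kappa^4$.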

It is worth noting that the above analysis concerns the worst case, in which all partitions need to be fully refined. In practice, however, the automaton can be learned without refining most partitions, so the numbers of equivalence and membership queries, as well as the number of locations in the learned automaton, are far smaller than the corresponding worst-case bounds. This is demonstrated by the examples in Sect. 5.

#### 3.6 Accelerating Trick

In the timed observation table, the function f maps invalid reset-logical-timed words, as well as certain valid ones, to "−" when the teacher maintains a COTA A as the oracle. The learner thus needs multiple rounds of queries to distinguish the "sink" location from other non-accepting locations. If instead a DOTA A serves as the oracle and f is extended to map invalid reset-logical-timed words to a distinct symbol, say "×", then the learner needs far fewer queries. We will later show in the experiments that this trick significantly accelerates the learning process.
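A minimal sketch of the three-valued answer is shown below, assuming a partial DOTA given as a transition list with interval guards. The classes `Transition` and `DOTA`, the function `classify`, and the toy automaton are hypothetical. A reset-logical-timed word is treated as invalid when no transition is enabled, when the recorded reset disagrees with the transition, or when the logical clock value decreases without an intervening reset.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Transition:
    """A DOTA transition: action, guard interval on the clock, reset, target."""
    action: str
    lo: float
    lo_closed: bool
    hi: float            # math.inf for an unbounded guard
    hi_closed: bool
    reset: bool
    target: str

    def admits(self, v: float) -> bool:
        above = v >= self.lo if self.lo_closed else v > self.lo
        below = v <= self.hi if self.hi_closed else v < self.hi
        return above and below


@dataclass
class DOTA:
    initial: str
    accepting: Set[str]
    delta: Dict[str, List[Transition]]   # outgoing transitions per location

    def step(self, q: str, action: str, v: float) -> Optional[Transition]:
        for tr in self.delta.get(q, []):
            if tr.action == action and tr.admits(v):
                return tr
        return None                      # no enabled transition


def classify(A: DOTA, word: List[Tuple[str, float, bool]], accelerate: bool = True) -> str:
    """Value of f on a reset-logical-timed word: '+' (accepted), '-' (valid but
    rejected); invalid words give '×' with the accelerating trick, '-' without."""
    invalid = "×" if accelerate else "-"
    q, prev_v, prev_reset = A.initial, 0.0, True   # the clock starts at 0
    for action, v, b in word:
        if not prev_reset and v < prev_v:          # clock cannot run backwards
            return invalid
        tr = A.step(q, action, v)
        if tr is None or tr.reset != b:            # no such transition / wrong reset
            return invalid
        q, prev_v, prev_reset = tr.target, v, b
    return "+" if q in A.accepting else "-"


# Toy DOTA: q0 --a, x in [0,1], reset--> q1 (accepting) --b, x > 1, no reset--> q0.
A = DOTA(
    initial="q0",
    accepting={"q1"},
    delta={
        "q0": [Transition("a", 0, True, 1, True, True, "q1")],
        "q1": [Transition("b", 1, False, math.inf, False, False, "q0")],
    },
)
print(classify(A, [("a", 0.5, True)]), classify(A, [("a", 0.5, False)]))
```

With `accelerate=False` the learner cannot tell a word that falls into the sink apart from a valid rejected word, which is exactly the extra distinguishing work the trick avoids.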

## 4 Learning from a Normal Teacher

In this section, we consider the problem of learning timed automata with a normal teacher. As before, we assume the timed language to be learned comes from a complete DOTA. For the normal teacher, inputs to membership queries are delay-timed words, and the teacher returns whether the word is in the language (without giving any additional information). Inputs to equivalence queries are candidate DOTAs, and the teacher either answers they are equivalent or provides a delay-timed word as a counterexample.

The algorithm here is based on the procedure in the previous section. We still maintain observation tables where the elements in *S* ∪ *R* are reset-logical-timed words and the elements in *E* are logical-timed words. In order to obtain delay-timed words for the membership queries, we need to *guess* clock reset information for transitions in the table. More precisely, in order to convert a logical-timed word to a delay-timed word, it is necessary to know the clock reset information for all but the last transition. Hence, it is necessary to guess reset information for each word in *S* ∪ *R* (since *S* ∪ *R* is prefix-closed, this is equivalent to guessing reset information for the last transition of each word). Also, for each entry in (*S* ∪ *R*) × *E*, it is necessary to guess the resets of all but the last transition of the suffix from *E*. The algorithm can be thought of as exploring a search tree, where branching is caused by guesses, and successor nodes are constructed by the usual operations of preparing a table and dealing with a counterexample.
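The conversion and the resulting branching can be sketched as follows. The helper names `to_delay_timed` and `guessed_delay_words` are hypothetical. Given resets for all but the last transition, each delay is the current logical clock value minus the clock value just after the previous action (0 after a reset). A guess that would require a negative delay cannot come from any delay-timed word, so this sketch discards it.

```python
from fractions import Fraction
from itertools import product
from typing import Iterator, List, Optional, Tuple

LogicalWord = List[Tuple[str, Fraction]]   # (action, logical clock value)
DelayWord = List[Tuple[str, Fraction]]     # (action, delay since previous action)


def to_delay_timed(word: LogicalWord, resets: Tuple[bool, ...]) -> Optional[DelayWord]:
    """Convert a logical-timed word to a delay-timed word, given the reset flags
    of all but the last transition; None if the guess is infeasible."""
    assert len(resets) == max(len(word) - 1, 0)
    delays: DelayWord = []
    clock = Fraction(0)                    # clock value just after the previous action
    for i, (action, v) in enumerate(word):
        d = v - clock                      # time that must elapse before this action
        if d < 0:
            return None
        delays.append((action, d))
        if i < len(resets):
            clock = Fraction(0) if resets[i] else v
    return delays


def guessed_delay_words(word: LogicalWord) -> Iterator[Tuple[Tuple[bool, ...], DelayWord]]:
    """Enumerate all 2^(|word|-1) reset guesses, yielding each feasible
    (guess, delay-timed word) pair: one branch of the search tree per guess."""
    for resets in product([False, True], repeat=max(len(word) - 1, 0)):
        delays = to_delay_timed(word, resets)
        if delays is not None:
            yield resets, delays


w = [("a", Fraction(1)), ("b", Fraction(3)), ("a", Fraction(2))]
for guess, delays in guessed_delay_words(w):
    print(guess, delays)
```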

The detailed process is given in Algorithm 2. The learner maintains a set of table instances, named ToExplore, which contains all table instances that need to be explored.

The initial tables in ToExplore are as follows. Each table has *S* = *E* = {ε}. For each σ ∈ Σ, there is one row in *R* corresponding to the logical-timed word ω = (σ, 0). It is necessary to guess a reset b for each ω, thereby transforming it into a reset-logical-timed word γ<sub>r</sub> = (σ, 0, b). There are $2^{|\Sigma|}$ possible combinations of guesses. These tables are filled by making membership queries (in this case, the membership queries for each table are the same). The resulting $2^{|\Sigma|}$ tables form the initial tables in ToExplore.

```
Algorithm 2: Learning one-clock timed automata with a normal teacher
  input : the timed observation table T = (Σ, 𝚺, 𝚺r, S, R, E, f)
  output: the hypothesis H recognizing the target language L

ToExplore ← ∅; S ← {ε}; R ← {π(Γ(ω)) | ω = (σ, 0), ∀σ ∈ Σ}; E ← {ε};
currentTable ← (Σ, 𝚺, 𝚺r, S, R, E, f);
tables ← guess_and_fill(currentTable);       // guess resets and fill all table instances
ToExplore.insert(tables);                     // insert the table instances into ToExplore
currentTable ← ToExplore.pop();               // pop the head instance as the current table
equivalent ← ⊥;
while equivalent = ⊥ do
    prepared ← is_prepared(currentTable);     // whether the current table is prepared
    while prepared = ⊥ do
        if currentTable is not closed then
            tables ← guess_and_make_closed(currentTable); ToExplore.insert(tables);
            currentTable ← ToExplore.pop();
        if currentTable is not consistent then
            tables ← guess_and_make_consistent(currentTable); ToExplore.insert(tables);
            currentTable ← ToExplore.pop();
        if currentTable is not evidence-closed then
            tables ← guess_and_make_evidence_closed(currentTable); ToExplore.insert(tables);
            currentTable ← ToExplore.pop();
        prepared ← is_prepared(currentTable);
    M ← build_DFA(currentTable);              // transform currentTable into a DFA M
    H ← build_hypothesis(M);                  // construct a hypothesis H from M
    equivalent, ctx ← equivalence_query(H);   // ctx is a delay-timed word
    if equivalent = ⊥ then
        tables ← guess_and_ctx_processing(currentTable, ctx);  // counterexample processing
        ToExplore.insert(tables);
        currentTable ← ToExplore.pop();
return H;
```
In each iteration of the algorithm, one table instance is taken out of ToExplore. The learner checks whether the table is closed, consistent, and evidence-closed. If the table is not closed, i.e. there exists r ∈ *R* such that row(r) ≠ row(s) for all s ∈ *S*, the learner moves r from *R* to *S*. Then for each σ ∈ Σ, the element r · (σ, 0) is added to *R*, and a guess has to be made for its reset information. Hence, $2^{|\Sigma|}$ unfilled table instances will be generated. Next, for each entry in the |Σ| new rows of *R*, it is necessary to guess the reset information for all but the last transition in e ∈ *E*. After this guess, the table instances can be filled by making membership queries with the transformed delay-timed words. Hence, there are at most $2^{\left(\sum_{e_i \in E \setminus \{\epsilon\}} (|e_i| - 1)\right) \times |\Sigma|}$ filled table instances for one unfilled table instance. All filled table instances are inserted into ToExplore.

If the table is not consistent, i.e. there exist some $\gamma_r, \gamma'_r \in S \cup R$ and $\boldsymbol{\sigma}_r \in \boldsymbol{\Sigma}_r$ such that $\gamma_r \cdot \boldsymbol{\sigma}_r, \gamma'_r \cdot \boldsymbol{\sigma}_r \in S \cup R$ and $row(\gamma_r) = row(\gamma'_r)$, but $row(\gamma_r \cdot \boldsymbol{\sigma}_r) \neq row(\gamma'_r \cdot \boldsymbol{\sigma}_r)$, let $e \in E$ be an element on which they differ. Then $\boldsymbol{\sigma}_r \cdot e$ needs to be added to *E*. For each entry in *S* ∪ *R*, the resets of all but the last transition in $\boldsymbol{\sigma}_r \cdot e$ need to be guessed; then the table can be filled. Hence, $2^{(|\boldsymbol{\sigma} \cdot e| - 1) \times (|S| + |R|)}$ filled table instances will be generated and inserted into ToExplore. The operation for making tables evidence-closed is analogous.

Once the current table is prepared, the learner builds a hypothesis H and makes an equivalence query to the teacher. If the answer is positive, then H is a COTA which recognizes the target timed language L; otherwise, the teacher gives a delay-timed word ω as a counterexample. The learner first finds the longest reset-logical-timed word in *R* which, when converted to a delay-timed word, agrees with a prefix of ω. The remainder of ω, however, needs to be converted to a reset-logical-timed word by guessing reset information. The corresponding prefixes are then added to *R*. Hence, at most $2^{|\omega|}$ unfilled table instances are generated. For each unfilled table instance, at most $2^{\left(\sum_{e_i \in E \setminus \{\epsilon\}} (|e_i| - 1)\right) \times |\omega|}$ filled tables are produced and inserted into ToExplore.

Throughout the learning process, the learner adds a finite number of table instances to ToExplore at every iteration. Hence, the search tree is finitely branching. Moreover, if all guesses are correct, the resulting table instance will be identical to the observation table in the learning process with a smart teacher (apart from the guessing processes, the basic table operations are the same as those in Sect. 3.1). This means that, with an appropriate search order, for example taking the table instance in ToExplore that requires the fewest guesses at every iteration, the algorithm terminates with the same table as in the learning process with a smart teacher, yielding a COTA that recognizes the target language L. In accordance with Theorem 1, the algorithm may terminate even if the corresponding reset-logical-timed languages are not equivalent, while still yielding correct COTAs that recognize the same delay-timed language.
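One way to realize the "fewest guesses first" search order is a priority queue. The sketch below uses Python's `heapq`; the class `ToExplore` mirrors the name in Algorithm 2, but its `insert` signature (which takes a caller-supplied `guesses_of` function reporting how many resets a table instance has guessed) is a hypothetical interface, not the authors' implementation.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List


@dataclass(order=True)
class _Entry:
    guesses: int                       # number of reset guesses made so far
    seq: int                           # insertion counter: FIFO among equal counts
    table: Any = field(compare=False)  # the table instance itself is never compared


class ToExplore:
    """Priority queue of table instances: pop the one with the fewest guesses."""

    def __init__(self) -> None:
        self._heap: List[_Entry] = []
        self._counter = itertools.count()

    def insert(self, tables: Iterable[Any], guesses_of: Callable[[Any], int]) -> None:
        for t in tables:
            heapq.heappush(self._heap, _Entry(guesses_of(t), next(self._counter), t))

    def pop(self) -> Any:
        return heapq.heappop(self._heap).table

    def __len__(self) -> int:
        return len(self._heap)


q = ToExplore()
q.insert([{"id": "A", "g": 3}, {"id": "B", "g": 1}, {"id": "C", "g": 1}], lambda t: t["g"])
print(q.pop()["id"])
```

The insertion counter breaks ties deterministically, so instances with the same number of guesses are explored in the order they were generated.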

Theorem 6. *Algorithm 2 terminates and returns a COTA* H *which recognizes the target timed language* L*.*

*Complexity analysis.* If **T** = $(\Sigma, \boldsymbol{\Sigma}, \boldsymbol{\Sigma}_r, S, R, E, f)$ is the final observation table for the correct candidate COTA, the number of guessed resets in *S* ∪ *R* is |*S*| + |*R*|, and the number of guessed resets for the entries in each row of the table is $\sum_{e_i \in E \setminus \{\epsilon\}} (|e_i| - 1)$. Hence, the total number of guessed resets is $(|S| + |R|) \times \left(1 + \sum_{e_i \in E \setminus \{\epsilon\}} (|e_i| - 1)\right)$. Assuming an appropriate search order (for example according to the number of guesses in each table), the number of table instances considered before termination is $O\left(2^{(|S| + |R|) \times \left(1 + \sum_{e_i \in E \setminus \{\epsilon\}} (|e_i| - 1)\right)}\right)$.
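The exponent grows quickly even for small tables. The helper below (hypothetical names) evaluates the number of guessed resets and the corresponding $2^{(\cdot)}$ count of reset assignments; for instance, a table with $|S| + |R| = 5$ rows and two non-empty suffixes of lengths 2 and 3 already has 20 guessed resets, i.e. $2^{20}$ assignments.

```python
from typing import Sequence


def guessed_resets(num_rows: int, suffix_lengths: Sequence[int]) -> int:
    """(|S| + |R|) * (1 + sum(|e_i| - 1)) for the suffixes e_i of E other than epsilon."""
    per_row = 1 + sum(length - 1 for length in suffix_lengths)
    return num_rows * per_row


def table_instance_bound(num_rows: int, suffix_lengths: Sequence[int]) -> int:
    """Number of reset assignments, 2 ** guessed_resets(...), which the
    O(.) bound on explored table instances is expressed in."""
    return 2 ** guessed_resets(num_rows, suffix_lengths)


print(guessed_resets(5, [2, 3]), table_instance_bound(5, [2, 3]))
```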

## 5 Implementation and Experimental Results

To investigate the efficiency and scalability of the proposed methods, we implemented a prototype<sup>3</sup> in Python for learning deterministic one-clock timed automata. The examples include a practical case concerning the functional specification of the TCP protocol [26] and a set of randomly generated DOTAs to be learnt. All evaluations were carried out on a 3.6GHz Intel Core-i7 processor with 8GB RAM running 64-bit Ubuntu 16.04.

*Functional specification of the TCP protocol.* In [26], a state diagram on page 23 specifies the state changes of a TCP connection, triggered by causing events and leading to resulting actions. As observed by Ouaknine and Worrell in [31], such a functional specification of the protocol can be represented as a one-clock timed automaton. In our setting, the corresponding DOTA A to be learnt is configured to have $|Q| = 11$ states (with the two CLOSED states collapsed), $|\Sigma| = 10$ (after abstracting the causing events and the resulting actions), $|F| = 2$, and $|\Delta| = 19$, with appropriately specified timing constraints including guards and resets. Using the algorithm with the smart teacher, a correct DOTA H is learned in 155 seconds after 2600 membership queries and 28 equivalence queries. Specifically, H has 15 locations (excluding a sink location) connected by 28 transitions. The 4 additional locations arise from splitting guards along transitions; they can, however, be trivially merged back with other locations. The figures depicting A and H can be found in Appendix D of [7].

<sup>3</sup> Available at https://github.com/Leslieaj/OTALearning. The evaluated artifact is archived in [8].

Table 1: Experimental results on random examples for the smart teacher situation.

Case ID: n m κ, i.e., the number of locations, the size of the alphabet, and the maximal constant appearing in the clock constraints, respectively, of the corresponding group of A's. |Δ|<sub>mean</sub>: the average number of transitions in the corresponding group.

#Membership & #Equivalence: the number of membership and equivalence queries conducted, respectively. N<sub>min</sub>: the minimum, N<sub>mean</sub>: the mean, N<sub>max</sub>: the maximum.

n<sub>mean</sub>: the average number of locations of the learned automata in the corresponding group. t<sub>mean</sub>: the average wall-clock time in seconds, including the time taken by both the learner and the teacher.

*Random examples for a smart teacher.* We randomly generated 80 DOTAs in eight groups, each group having a different number of locations, alphabet size, and maximal constant appearing in clock constraints. As shown in Table 1, the proposed learning method succeeds in all cases in identifying a DOTA that recognizes the same timed language. In particular, the numbers of membership and equivalence queries appear to grow polynomially with the size of the problem<sup>4</sup>, and are much smaller than the worst-case bounds estimated in Sect. 3.5. Moreover, the learned DOTAs show no marked increase in the number of locations (compare n<sub>mean</sub> with the first component of the Case IDs). The average wall-clock time, including the time taken by both the learner and the teacher, is recorded in the last column t<sub>mean</sub>. For small tables **T**, the teacher's equivalence checking often accounts for over 90% of this time, whereas for large tables around 50% is spent by the learner checking the preparedness condition.

It is worth noting that all of the results reported above were obtained with an implementation equipped with the *accelerating trick* discussed in Sect. 3.6. When this trick is *dropped*, the average number of membership queries changes by a factor ranging from 0.83 (min) to 15.02 (max), 2.16 on average over all 8 groups, and the average number of equivalence queries by a factor ranging from 0.84 (min) to 1.71 (max), 1.04 on average; the computation time (including the time spent operating on tables) increases dramatically as well.

<sup>4</sup> The exception in group 7 6 10 is due to relatively simple DOTAs that were occasionally generated.


Table 2: Experimental results on random examples for the normal teacher situation.

#Membership & #Equivalence: the number of membership and equivalence queries conducted with caching, respectively. N<sub>min</sub>: the minimum, N<sub>mean</sub>: the mean, N<sub>max</sub>: the maximum. #**T**<sub>explored</sub>: the average number of explored table instances.

#Learnt: the number of the learnt DOTAs in the group (learnt/total).

The alternative implementation and the experimental results without the accelerating trick can be found on the tool page (under the dev branch).

*Random examples for a normal teacher.* Due to its high, exponential complexity, the algorithm with a normal teacher failed (running out of memory) to identify DOTAs for almost all of the above examples, except for 6 of the 10 cases in group 4 4 20. We therefore randomly generated 40 extra DOTAs of smaller size, classified into 4 groups. With the accelerating trick, the learner need not guess the resets in elements of *E* for an entry in *S* ∪ *R* if the query result of the entry is the sink location. We also omitted checking the evidence-closed condition, since it may add redundant rows to *R*, leading to more guesses and thereby a larger search space. This omission does not affect the correctness of the learnt DOTAs. Moreover, as different table instances may generate repeated queries, we cached the results of membership queries and counterexamples, which significantly reduces the numbers of membership and equivalence queries to the teacher. Table 2 shows the performance of the algorithm in this setting. Results without caching are available on the tool page (under the normal branch).

## 6 Conclusion

We have presented a polynomial active learning method for deterministic one-clock timed automata from a smart teacher who provides information about clock resets in membership and equivalence queries. Our technique is based on converting the problem into that of learning reset-logical-timed languages. We then extended the method to learning DOTAs from a normal teacher who receives delay-timed words for membership queries, while the learner guesses the reset information in the observation table. We evaluated both algorithms on randomly generated examples and, for the former case, on the functional specification of the TCP protocol.

Moving forward, an extension of our active learning method to nondeterministic OTAs and timed automata involving multiple clocks is of particular interest.

Data Availability Statement The datasets generated and/or analyzed during the current study are available in the Figshare repository: https://doi.org/10.6084/m9.figshare.11545983.v3.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Rare event simulation for non-Markovian repairable fault trees** ✩

Carlos E. Budde<sup>1</sup>, Marco Biagi<sup>2</sup>, Raúl E. Monti<sup>1</sup>, Pedro R. D'Argenio<sup>3,4,5</sup>, and Mariëlle Stoelinga<sup>1,6</sup>

<sup>1</sup> Formal Methods and Tools, University of Twente, Enschede, the Netherlands

<sup>2</sup> Department of Information Engineering, University of Florence, Florence, Italy

<sup>3</sup> FAMAF, Universidad Nacional de Córdoba, Córdoba, Argentina

<sup>4</sup> CONICET, Córdoba, Argentina

<sup>5</sup> Department of Computer Science, Saarland University, Saarbrücken, Germany

<sup>6</sup> Department of Software Science, Radboud University, Nijmegen, the Netherlands

**Abstract.** Dynamic fault trees (dft) are widely adopted in industry to assess the dependability of safety-critical equipment. Since many systems are too large to be studied numerically, the dependability of dfts is often analysed using Monte Carlo simulation. A bottleneck here is that many simulation samples are required in the case of rare events, e.g. in highly reliable systems where components rarely fail. Rare event simulation (res) provides techniques to reduce the number of samples in the case of rare events. We present a res technique based on importance splitting to study failures in highly reliable dfts. Whereas res usually requires meta-information from an expert, our method is fully automatic: by cleverly exploiting the fault tree structure we extract the so-called importance function. We handle dfts with Markovian and non-Markovian failure and repair distributions, for which no numerical methods exist, and show the efficiency of our approach on several case studies.

## **1 Introduction**

Reliability engineering is an important field that provides methods and tools to assess and mitigate the risks related to complex systems. Fault tree analysis (fta) is a prominent technique here. Its application encompasses a large number of industrial domains that range from automotive and aerospace system engineering, to energy and telecommunication systems and protocols.

**Fault trees.** A fault tree (ft) describes how component failures occur and propagate through the system, eventually leading to system failures. Technically, an ft is a directed acyclic graph whose leaves model component failures, and whose other nodes (called gates) model failure propagation. Using fault trees one can compute dependability metrics to quantify how a system fares w.r.t. certain performance indicators. Two common metrics are system *reliability*, i.e. the probability that no system failure occurs during a given mission time, and system *availability*, i.e. the average percentage of time that a system is operational.

<sup>✩</sup> This work was partially funded by NWO, NS, and ProRail project 15474 (*SEQUOIA*), ERC grant 695614 (*POWVER*), EU project 102112 (*SUCCESS*), ANPCyT PICT-2017-3894 (*RAFTSys*), and SeCyT project 33620180100354CB (*ARES*).

In this paper we consider repairable dynamic fault trees. *Dynamic fault trees* (dfts [17, 43]) are a common and widely applied variant of fts, catering for common dependability patterns such as spare management and causal dependencies. *Repairs* [6] are not only crucial in fault-tolerant and resilient systems, they are also an important cost driver. Hence, repairable fault trees allow one to compare different repair strategies with respect to various dependability metrics.

**Fault tree analysis.** The reliability/availability of a fault tree can be computed via numerical methods, such as probabilistic model checking. This involves exhaustive explorations of state-based models such as interactive Markov chains [40]. Since the number of states (i.e. system configurations) is exponential in the number of tree elements, analysing large trees remains a challenge today [26, 1]. Moreover, numerical methods are usually restricted to exponential failure rates and combinations thereof, like Erlang and acyclic phase type distributions [40].

Alternatively, fault trees can be analysed using (standard) Monte Carlo simulation (smc [22, 40, 38], aka statistical model checking). Here, a large number of simulated system runs (*samples*) is produced. Reliability and availability are then statistically estimated from the resulting sample set. Such sampling does not involve storing the full state space so, although the result provided can only be correct with a certain probability, smc is much more memory efficient than numerical techniques. Furthermore, smc is not restricted to exponential probability distributions. However, rare events are a known bottleneck of smc: when the event of interest has a low probability (which is typically the case in highly reliable systems), millions of samples may be required to observe it. Producing these samples can take an unacceptably long simulation time.
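The size of this bottleneck follows from a standard calculation that is not specific to this paper: a crude Monte Carlo estimator of a probability $p$ from $N$ independent samples has relative standard error $\sqrt{(1-p)/(Np)}$, so reaching a target relative error requires $N \approx (1-p)/(p \cdot \mathit{RE}^2)$ samples. The sketch below (hypothetical function name) evaluates this; for $p = 10^{-6}$ and 10% relative error it gives roughly $10^8$ samples.

```python
import math


def crude_mc_samples(p: float, rel_err: float) -> int:
    """Samples N that crude Monte Carlo needs to estimate a probability p with
    relative standard error rel_err, solving sqrt((1 - p) / (N * p)) = rel_err."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.ceil((1.0 - p) / (p * rel_err ** 2))


for p in (1e-2, 1e-4, 1e-6):
    print(f"p = {p:g}: about {crude_mc_samples(p, 0.1):.2e} samples for 10% relative error")
```

The required sample count grows like $1/p$, which is what rare event simulation techniques aim to avoid.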

**Rare event simulation.** To alleviate this problem, the field of rare event simulation (res) provides techniques that reduce the number of samples [35]. The two leading techniques are importance sampling and importance splitting.

*Importance sampling* tweaks the probabilities in a model, then computes the metric of interest for the changed system, and finally adjusts the analysis results to the original model [23, 33]. Unfortunately it has specific requirements on the stochastic model: in particular, it is generally limited to Markov models.
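For contrast with the method used in this paper, the following sketch illustrates importance sampling on a toy problem: the probability of at least three heads in eight flips of a coin with heads probability $p = 1/80$. Samples are drawn with a tweaked heads probability $q$ (here $q = 3/8$, an arbitrary illustrative choice), and each sample is re-weighted by its likelihood ratio $P_p(\text{path}) / P_q(\text{path})$, which adjusts the result back to the original model. The function names are hypothetical.

```python
import random
from math import comb


def exact_prob(n: int, p: float, k: int) -> float:
    """P(at least k heads in n flips), from the binomial distribution."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def importance_sampling(n: int, p: float, k: int, q: float, samples: int, seed: int = 0) -> float:
    """Estimate P(at least k heads in n flips with heads probability p) by flipping
    a coin with heads probability q and weighting each hit by its likelihood ratio."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        heads = sum(rng.random() < q for _ in range(n))
        if heads >= k:
            # for i.i.d. flips the likelihood ratio depends only on the number of heads
            total += (p / q) ** heads * ((1 - p) / (1 - q)) ** (n - heads)
    return total / samples


exact = exact_prob(8, 1 / 80, 3)
estimate = importance_sampling(8, 1 / 80, 3, q=3 / 8, samples=20_000, seed=1)
print(f"exact {exact:.4e}  importance sampling {estimate:.4e}")
```

Here the likelihood ratio is easy to compute because the model is a sequence of independent flips; for general non-Markovian models such a change of measure is much harder to define, which is the limitation mentioned above.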

*Importance splitting*, deployed in this paper, does not have this limitation. Importance splitting relies on rare events that arise as a sequence of less rare intermediate events [28, 2]. We exploit this fact by generating more (partial) samples on paths where such intermediate events are observed. As a simple example, consider a biased coin whose probability of heads is $p = 1/80$. Suppose we flip it eight times in a row, and say we are interested in observing at least three heads. If heads comes up at the first flip (*H*) then we are on a promising path. We can then clone (*split*) the current path *H*, generating e.g. 7 copies of it, each clone evolving independently from the second flip onwards. Say one clone observes three heads, i.e. the copied *H* plus two more. Then this observation of the rare event (three heads) is counted as $1/7$ rather than as 1 observation, to account for the splitting where the clone was spawned. Now, if a clone observes a new head (*HH*), this is even more promising than *H*, so the splitting can be repeated. If we make 5 copies of the *HH* clone, then observing three heads in any of these copies counts as $\frac{1}{35} = \frac{1}{7} \cdot \frac{1}{5}$. Alternatively, observing tails as the second flip (*HT*) is less promising than heads. One could then decide not to split such a path.
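The coin example can be turned into a small simulation. The sketch below is a minimal fixed-splitting scheme in the spirit of the example, not one of the specific res algorithms used later in the paper: the importance of a state is the number of heads seen so far, a path that first reaches 1 head is split into 7 clones and one that first reaches 2 heads into 5 clones (the factors from the text), and every clone carries its parent's weight divided by the split factor. Function names and parameters are illustrative.

```python
import random
from math import comb
from typing import Dict


def split_run(flips_left: int, heads: int, weight: float, p: float, target: int,
              splits: Dict[int, int], rng: random.Random) -> float:
    """Continue one (possibly cloned) path and return its weighted hit count."""
    while flips_left > 0:
        flips_left -= 1
        if rng.random() < p:
            heads += 1
            if heads >= target:
                return weight                        # rare event observed
            k = splits.get(heads, 1)
            if k > 1:
                # more promising state: k independent clones, each with weight / k
                return sum(split_run(flips_left, heads, weight / k, p, target, splits, rng)
                           for _ in range(k))
    return 0.0


def importance_splitting(n: int, p: float, target: int, splits: Dict[int, int],
                         roots: int, seed: int = 0) -> float:
    """Average weighted hit count over `roots` independent root paths."""
    rng = random.Random(seed)
    return sum(split_run(n, 0, 1.0, p, target, splits, rng) for _ in range(roots)) / roots


p, flips = 1 / 80, 8
exact = sum(comb(flips, i) * p ** i * (1 - p) ** (flips - i) for i in range(3, flips + 1))
estimate = importance_splitting(flips, p, target=3, splits={1: 7, 2: 5}, roots=100_000, seed=1)
print(f"exact {exact:.3e}  splitting estimate {estimate:.3e}")
```

Every observation of three heads is weighted $1/35$, as in the text, and far more such observations are made than with plain sampling of the same number of root paths.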

This example highlights a key ingredient of importance splitting: the *importance function*, which indicates for each state how promising it is w.r.t. the event of interest. This function, together with other parameters such as thresholds [19], is used to choose e.g. the number of clones spawned when visiting a state. An importance function for our example could be the number of heads seen thus far. Another one could be that number multiplied by the number of coin flips yet to come. The goal is to give *higher importance* to states from which observing the *rare event is more likely*. The better the importance function reflects this property, the more efficient an importance splitting implementation becomes.

Rare event simulation has been successfully applied in several domains [34, 45, 49, 4, 5, 46]. However, a key bottleneck is that it critically relies on expert knowledge. In particular for importance splitting, finding a good importance function is well known to be a highly non-trivial task [35, 25].

**Our contribution: rare event simulation for fault trees.** This paper presents an importance splitting method to analyse rfts. In particular, we automatically derive an importance function by exploiting the description of a system as a fault tree. This is crucial, since the importance function is normally given manually in an ad hoc fashion by a domain or res expert. We use a variety of res algorithms based on our importance function to estimate system unreliability and unavailability. Our approach can converge to precise estimations in increasingly reliable systems. This method has four advantages over earlier analysis methods for rfts, which we review in the related work (Sec. 6), namely: (1) we are able to estimate both the system reliability and availability; (2) we can handle arbitrary failure and repair distributions; (3) we can handle rare events; and (4) we can do so in a fully automatic fashion.

Technically, we build local importance functions for the (automata-semantics of the) nodes of the tree. We then aggregate these local functions into an importance function for the full tree. Aggregation uses structural induction in the layered description of the tree. Using our importance function, we implement importance splitting methods to run res analyses. We implemented our theory in a full-stack tool chain. With it, we computed confidence intervals for the unreliability and unavailability of several case studies. Our case studies are rfts whose failure and repair times are governed by arbitrary continuous probability density functions (pdfs). Each case study was analysed for a fixed runtime budget and in increasingly resilient configurations. In all cases our approach could estimate the narrowest intervals for the most resilient configurations.

**Paper outline.** Background on fault trees and res is provided in Secs. 2 and 3. We detail our theory to implement res for rfts in Sec. 4. Using a tool chain, we performed an extensive experimental evaluation that we present in Sec. 5. We overview related work in Sec. 6 and conclude our contributions in Sec. 7.

## **2 Fault tree analysis**

A fault tree '-' is a directed acyclic graph that models how component failures propagate and eventually cause the full system to fail. We consider repairable fault trees (RFTs), where failures and repairs are governed by arbitrary probability distributions.

Fig. 1: Fault tree gates and the repair box

**Basic elements.** The leaves of the tree, called basic events or *basic elements* (BEs), model the failure of components. A BE *b* is equipped with a failure distribution *F<sub>b</sub>* that governs the probability for *b* to fail before time *t*, and a repair distribution *R<sub>b</sub>* governing its repair time. Some BEs are used as spare components: these spare basic elements (SBEs) replace a primary component when it fails. SBEs are also equipped with a dormancy distribution *D<sub>b</sub>*, since spares fail less often when *dormant*, i.e. not in use. Only once an SBE becomes active is its failure governed by *F<sub>b</sub>*.

**Gates.** Non-leaf nodes are called *intermediate events* and are labelled with *gates*, describing how combinations of lower failures propagate to upper levels. Fig. 1 shows their syntax. Their meaning is as follows: the AND, OR, and VOT<sub>k</sub> gates fail if respectively all, at least one, or at least *k* of their *m* children fail (with $1 \le k \le m$). The latter is called the *voting* or *k* out of *m* gate. Note that VOT<sub>1</sub> is equivalent to an OR gate, and VOT<sub>m</sub> is equivalent to an AND gate. The *priority-and gate* (PAND) is an AND gate that only fails if its children fail from left to right (or simultaneously). PANDs express failures that can only happen in a particular order, e.g. a short circuit in a pump can only occur after a leakage. SPARE gates have one *primary* child and one or more *spare* children: spares replace the primary when it fails. The FDEP gate has an input *trigger* and several *dependent events*: all dependent events become unavailable when the trigger fails. FDEPs can model, for instance, network elements that become unavailable if their connecting bus fails.
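For intuition, the static gates can be read as functions from the failure times of their children to the failure time of the gate. The sketch below assumes *no repairs* (so a PAND whose children fail in the wrong order never fails) and omits SPARE and FDEP, whose behaviour depends on activation and triggering; it is a simplification for illustration, not the iosa semantics used later in the paper.

```python
import math
from typing import Sequence

NEVER = math.inf   # failure time of an element that never fails


def and_gate(times: Sequence[float]) -> float:
    return max(times)                  # fails when its last child fails


def or_gate(times: Sequence[float]) -> float:
    return min(times)                  # fails when its first child fails


def vot_gate(k: int, times: Sequence[float]) -> float:
    if not 1 <= k <= len(times):
        raise ValueError("need 1 <= k <= m")
    return sorted(times)[k - 1]        # fails at the k-th child failure


def pand_gate(times: Sequence[float]) -> float:
    # Fails only if the children fail from left to right (ties allowed);
    # without repairs, a wrong order means the gate never fails.
    in_order = all(a <= b for a, b in zip(times, times[1:]))
    return max(times) if in_order else NEVER


print(and_gate([1.0, 3.0, 2.0]), vot_gate(2, [5.0, 1.0, 3.0]), pand_gate([2.0, 1.0]))
```

The identities VOT<sub>1</sub> = OR and VOT<sub>m</sub> = AND hold directly in this reading.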

**Repair boxes.** An RBOX determines which basic element is repaired next according to a given policy. Thus all its inputs are BEs or SBEs. Unlike gates, an RBOX has no output since it does not propagate failures.

**Top level event.** A full-system failure occurs if the *top event* (i.e. the root node) of the tree fails.

**Example.** The tree in Fig. 2 models a railway-signal system, which fails if both its high voltage and relay cabinets fail [21, 39]. Thus, the top event is an AND gate with children HVcab (a BE) and Rcab. The latter is a SPARE gate with primary P and spare S. All BEs are managed by one RBOX with repair priority HVcab > P > S.

Fig. 2: Tiny rft

**Notation.** The nodes of a tree $\mathcal{T}$ are given by $\mathit{nodes}(\mathcal{T}) = \{0, 1, \ldots, n-1\}$. We let $v, w$ range over $\mathit{nodes}(\mathcal{T})$. A function $\mathit{type}^{\mathcal{T}} : \mathit{nodes}(\mathcal{T}) \to \{\mathrm{BE}, \mathrm{SBE}, \mathrm{AND}, \mathrm{OR}, \mathrm{VOT}_k, \mathrm{PAND}, \mathrm{SPARE}, \mathrm{FDEP}, \mathrm{RBOX}\}$ yields the type of each node in the tree. A function $\mathit{chil}^{\mathcal{T}} : \mathit{nodes}(\mathcal{T}) \to \mathit{nodes}(\mathcal{T})^*$ returns the ordered list of children of a node. If clear from context, we omit the superscript from function names.
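The notation maps directly onto a small data structure. The sketch below encodes the tree of Fig. 2 with a *hypothetical* numbering of its nodes (it need not coincide with the positions used in the state vectors of the iosa semantics), treats the RBOX as a node whose ordered inputs follow the repair priority HVcab > P > S (an assumption of this sketch), and recovers the top event as the only non-RBOX node that is nobody's child.

```python
from enum import Enum


class NodeType(Enum):
    BE = "BE"
    SBE = "SBE"
    AND = "AND"
    OR = "OR"
    VOT = "VOT"      # VOT_k; the value of k would be stored separately
    PAND = "PAND"
    SPARE = "SPARE"
    FDEP = "FDEP"
    RBOX = "RBOX"


# Hypothetical numbering of the Fig. 2 tree: nodes(T) = {0, ..., n-1}, n = 6.
NAMES = ["Top", "HVcab", "Rcab", "P", "S", "RBOX"]
TYPE = {0: NodeType.AND, 1: NodeType.BE, 2: NodeType.SPARE,
        3: NodeType.BE, 4: NodeType.SBE, 5: NodeType.RBOX}
# chil returns an *ordered* list: the primary comes first for a SPARE,
# PAND children are read left to right, and here (an assumption) the
# RBOX lists its inputs by repair priority.
CHIL = {0: [1, 2], 1: [], 2: [3, 4], 3: [], 4: [], 5: [1, 3, 4]}


def nodes():
    return range(len(NAMES))


def node_type(v):
    return TYPE[v]


def chil(v):
    return CHIL[v]


def top_event():
    """The root: the only non-RBOX node that is not a child of another node."""
    candidates = [v for v in nodes() if node_type(v) is not NodeType.RBOX]
    children = {c for v in candidates for c in chil(v)}
    roots = [v for v in candidates if v not in children]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one top event, found {roots}")
    return roots[0]


print(NAMES[top_event()], [NAMES[c] for c in chil(2)])  # Top ['P', 'S']
```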

**Semantics.** Following [32], we give rfts a semantics in terms of Input/Output Stochastic Automata (iosa), so that we can handle arbitrary probability distributions. Each state in the iosa represents a system configuration, indicating which components are operational and which have failed. Transitions among states describe how the configuration changes when failures or repairs occur.

More precisely, a *state* in the iosa is a tuple $x = (x_0, \ldots, x_{n-1}) \in S \subseteq \mathbb{N}^n$, where $S$ is the *state space* and $x_v$ denotes the state of node $v$ in $\mathcal{T}$. The possible values of $x_v$ depend on the type of $v$. The *output* $z_v \in \{0, 1\}$ of node $v$ indicates whether it is operational ($z_v = 0$) or failed ($z_v = 1$) and is calculated as follows:


For example, the rft from Fig. 2 starts with all elements operational, so the initial state is $x^0 = (0, 0, 2, 0, 0)$. If P then fails, $x_{\mathrm{P}}$ and $z_{\mathrm{P}}$ are set to 1 (failed) and S becomes $x_{\mathrm{S}} = 0$ (active and operational spare), so the state changes to $x^1 = (0, 1, 0, 0, 0)$. The traces of the iosa are given by $x^0 x^1 \cdots x^n \in S^*$, where a change from $x^j$ to $x^{j+1}$ corresponds to a transition triggered in the iosa.

**Nondeterminism.** Dynamic fault trees may exhibit nondeterministic behaviour as a consequence of underspecified failure behaviour [15, 27]. This can happen e.g. when two SPAREs have a single shared SBE: if all elements are failed, and the SBE is repaired first, the failure behaviour depends on which SPARE gets the SBE. Monte Carlo simulation, however, requires fully stochastic models and cannot cope with nondeterminism. To overcome this problem we deploy the theory from [16, 32]. If a fault tree adheres to some mild syntactic conditions, then its iosa semantics is *weakly deterministic*, meaning that all resolutions of the nondeterministic choices lead to the same probability value. In particular, we require that (1) each BE is connected to at most one SPARE gate, and (2) BEs and SBEs connected to SPAREs are not connected to FDEPs. In addition to this, some semantic decisions have been fixed, e.g. the semantics of PAND is fully specified, and policies should be provided for RBOX and spare assignments.
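The two syntactic conditions can be checked mechanically. The sketch below (hypothetical function name; node types given as plain strings) reports which nodes violate (1) or (2). It lets an SBE be shared by several SPAREs, as in the example above, since that situation is resolved by the fixed spare-assignment policy, and it treats any input of an FDEP (trigger or dependent event) as "connected" to it, which is our reading of condition (2). The semantic decisions (PAND semantics, RBOX and spare policies) are not checked.

```python
from collections import Counter


def weak_determinism_violations(node_type, chil):
    """Return (condition, node) pairs violating conditions (1) and (2).

    node_type: dict node -> type name ("BE", "SBE", "SPARE", "FDEP", ...)
    chil:      dict node -> ordered list of children (inputs)
    """
    spares = [v for v, t in node_type.items() if t == "SPARE"]
    fdeps = [v for v, t in node_type.items() if t == "FDEP"]
    spare_uses = Counter(c for g in spares for c in chil[g])
    fdep_inputs = {c for f in fdeps for c in chil[f]}
    violations = []
    for v, uses in sorted(spare_uses.items()):
        # (1) a BE may be connected to at most one SPARE gate
        if node_type[v] == "BE" and uses > 1:
            violations.append((1, v))
        # (2) BEs and SBEs connected to SPAREs must not be connected to FDEPs
        if node_type[v] in ("BE", "SBE") and v in fdep_inputs:
            violations.append((2, v))
    return violations


# Two SPAREs sharing one SBE (node 5) satisfy both conditions.
types = {0: "AND", 1: "SPARE", 2: "SPARE", 3: "BE", 4: "BE", 5: "SBE"}
children = {0: [1, 2], 1: [3, 5], 2: [4, 5], 3: [], 4: [], 5: []}
print(weak_determinism_violations(types, children))  # []
```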

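**Dependability metrics.** An important use of fault trees is to compute relevant dependability metrics. Let $X_t$ denote the random variable that represents the state of the top event at time $t$ [14], with $X_t = 0$ if it is operational and $X_t = 1$ if it has failed. Two popular metrics are:

- *Reliability*: the probability that the top event does not fail during the mission time $[0, T]$, i.e. $\mathrm{REL}_T = \mathit{Prob}(X_t = 0 \text{ for all } t \in [0, T])$;
- *Availability*: the long-run (steady-state) probability that the system is operational, i.e. $\mathrm{AVA} = \lim_{t \to \infty} \mathit{Prob}(X_t = 0)$.

System *unreliability* and *unavailability* are the complements of these metrics, that is: $\mathrm{UNREL}_T = 1 - \mathrm{REL}_T$ and $\mathrm{UNAVA} = 1 - \mathrm{AVA}$.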

## **3 Stochastic simulation for Fault Trees**

**Standard Monte Carlo simulation (SMC).** Monte Carlo simulation takes random samples from stochastic models to estimate a (dependability) metric of interest. For instance, to estimate the unreliability of a tree we sample $N$ independent traces from its iosa semantics. An unbiased statistical estimator for $p = \mathrm{UNREL}_T$ is the proportion of traces that observe a top level event, that is, $\hat p_N = \frac{1}{N} \sum_{j=1}^{N} X_j$, where $X_j = 1$ if the $j$-th trace exhibits a top level failure before time $T$ and $X_j = 0$ otherwise. The statistical error of $\hat p$ is typically quantified with two numbers $\delta$ and $\varepsilon$ s.t. $\hat p \in [p - \varepsilon, p + \varepsilon]$ with probability $\delta$. The interval $\hat p \pm \varepsilon$ is called a *confidence interval* (ci) with coefficient $\delta$ and precision $2\varepsilon$.
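A minimal sketch of this estimator, assuming the usual normal approximation $\varepsilon = z\sqrt{\hat p(1-\hat p)/N}$ for the half-width, where $z$ is the $(1+\delta)/2$ quantile of the standard normal distribution (the text does not fix a particular ci construction). The trace sampler is a toy stand-in, not an iosa simulator: an AND gate over two non-repairable BEs with exponential failure times, for which $p = (1 - e^{-\lambda T})^2$ is known exactly; `LAM` and `T` are illustrative values.

```python
import math
import random
from statistics import NormalDist


def smc_estimate(sample_trace, n, delta=0.95, seed=0):
    """Standard Monte Carlo: p_hat = (1/N) * sum_j X_j and the half-width eps
    of a normal-approximation ci with confidence coefficient delta.
    sample_trace(rng) returns X_j: 1 if the trace shows a top level failure
    before time T, else 0."""
    rng = random.Random(seed)
    hits = sum(sample_trace(rng) for _ in range(n))
    p_hat = hits / n
    z = NormalDist().inv_cdf((1 + delta) / 2)  # about 1.96 for delta = 0.95
    eps = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, eps


# Toy model (illustrative only): AND of two BEs, exponential failures, no repair.
LAM, T = 1e-2, 10.0


def and_of_two_bes(rng):
    t1, t2 = rng.expovariate(LAM), rng.expovariate(LAM)
    return 1 if max(t1, t2) <= T else 0


p_exact = (1 - math.exp(-LAM * T)) ** 2
p_hat, eps = smc_estimate(and_of_two_bes, n=100_000)
print(f"exact {p_exact:.4e}, estimate {p_hat:.4e} +/- {eps:.1e}")
```

If no sampled trace hits the top event, $\hat p = 0$ and this ci collapses to $[0, 0]$, which is the "null estimate" situation met in the experiments.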

Such procedures scale linearly with the number of tree nodes and cater for a wide range of pdfs, even non-Markovian distributions. However, they hit a bottleneck when estimating *rare events*: if $p \approx 0$, very few traces observe $X_j = 1$. Therefore, the variance of estimators like $\hat p$ becomes huge relative to $p$, and cis become very broad, easily degenerating to the trivial interval $[0, 1]$. Increasing the number of traces alleviates this problem, but even standard ci settings (where $\varepsilon$ is relative to $p$) require sampling an unacceptable number of traces [35]. Rare event simulation techniques solve this specific problem.
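To see why, require the half-width of a normal-approximation ci to be a fraction $\varepsilon_r$ of $p$ itself: from $z\sqrt{p(1-p)/N} = \varepsilon_r\, p$ we get $N = z^2 (1-p)/(\varepsilon_r^2\, p)$, which grows like $1/p$. The short sketch below (hypothetical helper name; normal approximation assumed) evaluates this for a 95% ci with a relative half-width of 10%.

```python
import math
from statistics import NormalDist


def traces_for_relative_precision(p, rel_eps, delta=0.95):
    """Smallest N with z * sqrt(p * (1 - p) / N) <= rel_eps * p."""
    z = NormalDist().inv_cdf((1 + delta) / 2)
    return math.ceil(z * z * (1 - p) / (rel_eps ** 2 * p))


for p in (1e-2, 1e-4, 1e-6, 1e-8):
    print(f"p = {p:.0e}: N = {traces_for_relative_precision(p, 0.1):.2e}")
```

For $p = 10^{-6}$ this already asks for roughly $3.8 \times 10^8$ independent traces, and each further factor of 100 in rarity multiplies $N$ by about 100.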

**Rare Event Simulation (RES).** res techniques [35] increase the number of traces that observe the rare event, e.g. a top level event in an rft. Two prominent classes of res techniques are *importance sampling*, which adjusts the pdfs of failures and repairs, and *importance splitting* (isplit [30]), which samples more (partial) traces from states that are closer to the rare event. We focus on isplit due to its flexibility with respect to the probability distributions.

isplit can be deployed efficiently as long as the rare event $\gamma$ can be described as a nested sequence of less rare events $\gamma = \gamma_M \subseteq \gamma_{M-1} \subseteq \cdots \subseteq \gamma_0$. This decomposition allows isplit to study the conditional probabilities $p_k = \mathit{Prob}(\gamma_{k+1} \mid \gamma_k)$ separately, and then compute $p = \mathit{Prob}(\gamma) = \prod_{k=0}^{M-1} \mathit{Prob}(\gamma_{k+1} \mid \gamma_k)$. Moreover, isplit requires all conditional probabilities $p_k$ to be much greater than $p$, so that each $p_k$ can be estimated efficiently with smc.

The key idea behind isplit is to define the events $\gamma_k$ via a so-called *importance function* $\mathcal{I} : S \to \mathbb{N}$ that assigns an *importance* to each state $s \in S$. The higher the importance of a state, the closer it is to the rare event $\gamma_M$. Event $\gamma_k$ collects all states with importance at least $\ell_k$, for a certain sequence of *threshold levels* $0 = \ell_0 < \ell_1 < \cdots < \ell_M$. Formally: $\gamma_k = \{ s \in S \mid \mathcal{I}(s) \ge \ell_k \}$.
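Given the thresholds, deciding which events $\gamma_k$ a state belongs to only requires its importance. A minimal sketch, assuming integer importance values and a sorted list `thresholds` $= [\ell_0, \ell_1, \ldots, \ell_M]$ with $\ell_0 = 0$ (the helper names are hypothetical):

```python
from bisect import bisect_right


def in_gamma(importance, thresholds, k):
    """Membership of a state in gamma_k = {s | I(s) >= l_k}."""
    return importance >= thresholds[k]


def highest_level(importance, thresholds):
    """Largest k such that the state belongs to gamma_k."""
    return bisect_right(thresholds, importance) - 1


thresholds = [0, 4, 10]                       # l_0, l_1, l_2 as in Fig. 3a
print([highest_level(i, thresholds) for i in (0, 3, 4, 9, 10, 13)])
```

Because the $\ell_k$ increase, $\mathcal{I}(s) \ge \ell_{k+1}$ implies $\mathcal{I}(s) \ge \ell_k$, so the events are nested as the decomposition requires.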

To exploit the importance function $\mathcal{I}$ in the simulation procedure, isplit samples more (partial) traces from states with higher importance. Two well-known methods are deployed and compared in this paper: Fixed Effort and restart. *Fixed Effort* (fe [19]) samples a predefined number of traces in each region $S_k = \gamma_k \setminus \gamma_{k+1} = \{ s \in S \mid \ell_{k+1} > \mathcal{I}(s) \ge \ell_k \}$. Thus, starting at $\gamma_0$, it first estimates the proportion of traces that reach $\gamma_1$, i.e. $p_0 = \mathit{Prob}(\gamma_1 \mid \gamma_0) = \mathit{Prob}(S_0)$. Next, from the states that reached $\gamma_1$, new traces are generated to estimate $p_1 = \mathit{Prob}(S_1)$, and so on until $p_M$. Fixed Effort thus requires that (*i*) each trace has a clearly defined "end", so that the estimation of each $p_k$ finishes with probability 1, and (*ii*) all rare events reside in the uppermost region.

Fig. 3: Importance Splitting algorithms Fixed Effort & restart

**Example.** Fig. 3a shows Fixed Effort estimating the probability of visiting states labelled ✔ before others labelled ✘. States ✔ have importance > 13, and thresholds $\ell_1, \ell_2 = 4, 10$ partition the state space into regions $\{S_i\}_{i=0}^{2}$ s.t. all ✔ $\in S_2$. The effort is 5 simulations per region, for all regions: we call this algorithm fe$_5$. In region $S_0$, 2 simulations made it from the initial state to threshold $\ell_1$, i.e. they reached some state with importance 4 before visiting a state ✘. In $S_1$, starting from these two states, 3 simulations reached $\ell_2$. Finally, 2 out of 5 simulations visited states ✔ in $S_2$. Thus, the rare event probability estimated by this run of fe$_5$ is $\hat p = \prod_{i=0}^{2} \hat p_i = \frac{2}{5} \cdot \frac{3}{5} \cdot \frac{2}{5} = 9.6 \times 10^{-2}$.
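The sketch below implements Fixed Effort on a toy model chosen so that the exact answer is known; it is an illustration, not FIG's implementation. The model (an assumption of the sketch) is a birth-death chain started in state 1 that moves up with probability `q` and down otherwise; state 0 plays the role of ✘, state `top` that of ✔, and the importance of a state is the state itself. With thresholds at every integer between 2 and `top - 1`, the probability of reaching `top` before 0 is the gambler's-ruin value $(r-1)/(r^{\mathit{top}}-1)$ with $r = (1-q)/q$.

```python
import random


def fixed_effort(q, top, thresholds, effort, seed=0):
    """Fixed Effort on a toy birth-death chain started in state 1.

    Region k runs `effort` traces from the entry states found at the previous
    threshold until they reach the next threshold (success) or state 0
    (failure). The last region must reach `top`, the rare event.
    Returns the product of the per-region estimates p_hat_k."""
    rng = random.Random(seed)
    goals = list(thresholds) + [top]
    starts = [1]                      # entry states of the current region
    p_hat = 1.0
    for goal in goals:
        reached = []
        for i in range(effort):
            s = starts[i % len(starts)]       # spread the effort over entry states
            while 0 < s < goal:               # each trace has a well-defined end
                s = s + 1 if rng.random() < q else s - 1
            if s == goal:
                reached.append(s)
        if not reached:                       # no trace made it: estimate is 0
            return 0.0
        p_hat *= len(reached) / effort
        starts = reached
    return p_hat


def exact(q, top):
    r = (1 - q) / q
    return (r - 1) / (r ** top - 1)


est = fixed_effort(q=0.3, top=10, thresholds=range(2, 10), effort=1000)
print(f"Fixed Effort {est:.3e}, exact {exact(0.3, 10):.3e}")
```

With the counts of Fig. 3a (effort 5, and 2, 3, 2 successes), the same product is $\frac{2}{5}\cdot\frac{3}{5}\cdot\frac{2}{5} = 0.096$.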

*RESTART* (rst [48, 47]) is another res algorithm, which starts one trace in $\gamma_0$ and monitors the importance of the states it visits. If the trace up-crosses threshold $\ell_1$, the first state visited in $S_1$ is saved and the trace is cloned, aka *split* (see Fig. 3b). This mechanism rewards traces that get closer to the rare event. Each clone then evolves independently, and if one up-crosses threshold $\ell_2$ the splitting mechanism is repeated. Instead, if a clone visits a state with importance below $\ell_1$, it is *truncated* (✗ in Fig. 3b). This penalises traces that move away from the rare event. To avoid truncating all traces, the one that spawned the clones in region $S_k$ can go below importance $\ell_k$. To obtain an unbiased estimator for $p$, restart accounts for how much splitting was required to visit a rare state [47]. In particular, restart does not need the rare event to be defined as $\gamma_M$ [44], and it was devised for steady-state analysis [48] (e.g. to estimate UNAVA), although it can also be used for transient studies as depicted in Fig. 3b [45].
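A compact sketch of this mechanism on the same kind of toy birth-death chain (start in state 1, state 0 = ✘, state `top` = ✔, importance = state), so that the result can be compared with the exact gambler's-ruin value. It is a transient variant with one fixed global splitting factor, not FIG's implementation. Each up-crossing of a threshold by a live trace spawns `split - 1` clones born at that level; a clone is truncated when it falls below the threshold where it was born, while the main traces (born at level 0) end only at ✘ or ✔. Hits of `top`, which lies above all $M$ thresholds, are weighted by $1/\mathit{split}^{M}$.

```python
import random


def restart(q, top, thresholds, split, n_main, seed=0):
    """Transient RESTART on a toy birth-death chain started in state 1."""
    rng = random.Random(seed)
    thresholds = sorted(thresholds)

    def level(s):
        """Number of thresholds at or below state s (its region index)."""
        return sum(1 for t in thresholds if s >= t)

    hits = 0
    for _ in range(n_main):
        pending = [(1, 0)]                    # (state, birth level) of each trace
        while pending:
            s, born = pending.pop()
            lvl = level(s)
            while True:
                s = s + 1 if rng.random() < q else s - 1
                if s == 0:                    # failure state: trace ends
                    break
                if s == top:                  # rare event observed
                    hits += 1
                    break
                new_lvl = level(s)
                if new_lvl < born:            # below its birth threshold: truncate
                    break
                if new_lvl > lvl:             # up-crossing: split
                    pending.extend([(s, new_lvl)] * (split - 1))
                lvl = new_lvl
    # Events above the last threshold carry weight 1 / prod of splitting factors.
    return hits / (n_main * split ** len(thresholds))


def exact(q, top):
    r = (1 - q) / q
    return (r - 1) / (r ** top - 1)


est = restart(q=0.3, top=10, thresholds=range(2, 10), split=2, n_main=5000)
print(f"RESTART {est:.3e}, exact {exact(0.3, 10):.3e}")
```

Note that the main trace may drop below $\ell_1$ and up-cross it again, splitting anew each time; this is what keeps the weighted count unbiased.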

## **4 Importance Splitting for FTA**

The effectiveness of isplit crucially relies on the choice of the importance function $\mathcal{I}$ as well as of the threshold levels $\ell_k$ [30]. Traditionally, these are given by domain and/or res experts, which requires a lot of domain knowledge. This section presents a technique to obtain $\mathcal{I}$ and the $\ell_k$ automatically for an rft.

### **4.1 Compositional importance functions for Fault Trees**

By the core idea behind importance splitting, states that are more likely to lead to the rare event should have a higher importance. To achieve this, the key lies in defining an importance function $\mathcal{I}$ and thresholds $\ell_k$ that are sensitive to both the state space $S$ and the transition probabilities of the system. For us, $S \subseteq \mathbb{N}^n$ is the set of all possible states of a repairable fault tree (rft). Its top event fails when certain nodes fail in a certain order before certain repairs occur. To exploit this for isplit, the structure of the tree must be embedded into $\mathcal{I}$.

The strong dependence of the importance function $\mathcal{I}$ on the structure of the tree is easy to see in the following example. Take the rft from Fig. 2 and let its current state $x$ be s.t. P is failed and HVcab and S are operational. If the next event is a repair of P, then the new state $x'$ (where all basic elements are operational) is farther from a failure of the top event. Hence, a good importance function should satisfy $\mathcal{I}(x) > \mathcal{I}(x')$. Conversely, if the next event had been a failure of S leading to state $x''$, then one would want $\mathcal{I}(x) < \mathcal{I}(x'')$. The key observation is that these inequalities depend on the structure of $\mathcal{T}$ as well as on the failures/repairs of basic elements.

In view of the above, any attempt to define an importance function for an arbitrary fault tree must put its gate structure at the forefront. In Table 1 we introduce a compositional heuristic for this, which defines *local importance functions* distinguished per node type. The importance function associated with node $v$ is $\mathcal{I}_v : \mathbb{N}^n \to \mathbb{N}$. We define the *global importance function* of the tree ($\mathcal{I}_{\mathcal{T}}$ or simply $\mathcal{I}$) as the local importance function of the top event node of $\mathcal{T}$.


Table 1: Compositional importance function for rfts

Thus, $\mathcal{I}_v$ is defined in Table 1 by structural induction on the fault tree. It is defined so that it assigns to a *failed* node $v$ its *highest importance value*. Functions with this property yield the most efficient isplit implementations [30], and some res algorithms (e.g. Fixed Effort) require this property [19].

In the following we explain our definition of $\mathcal{I}_v$. If $v$ is a failed BE or SBE, then its importance is 1; otherwise it is 0. This matches the output of the node, thus $\mathcal{I}_v(x) = z_v$. Intuitively, this reflects how failures of basic elements are positively correlated with top event failures. The importance of AND, OR, and VOT$_k$ gates depends exclusively on their inputs. The importance of an AND gate is the sum of the importance of its children, scaled by a normalisation factor. This reflects that AND gates fail when all their children fail, and each failure of a child brings an AND closer to its own failure, hence increasing its importance. Instead, since OR gates fail as soon as a single child fails, their importance is the maximum importance among their children. The importance of a VOT$_k$ gate is the sum of the importances of the $k$ (out of $m$) children with the highest importance values.

Omitting normalisation may yield an undesirable importance function. To understand why, consider a binary AND gate $v$ with children $l$ and $r$, and define $\mathcal{I}^{\mathrm{naive}}_v(x) = \mathcal{I}_l(x) + \mathcal{I}_r(x)$. Suppose that $\mathcal{I}_l$ takes its highest value $\max \mathcal{I}_l = 2$ while $\max \mathcal{I}_r = 6$, and assume that states $x$ and $x'$ are s.t. $\mathcal{I}_l(x) = 1$, $\mathcal{I}_r(x) = 0$, $\mathcal{I}_l(x') = 0$, $\mathcal{I}_r(x') = 3$. This means that in both states one child of $v$ is "good-as-new" and the other is "half-failed", hence the system is equally close to failure in both cases. We would thus expect $\mathcal{I}^{\mathrm{naive}}_v(x) = \mathcal{I}^{\mathrm{naive}}_v(x')$, when actually $\mathcal{I}^{\mathrm{naive}}_v(x) = 1 \neq 3 = \mathcal{I}^{\mathrm{naive}}_v(x')$. Instead, $\mathcal{I}_v$ operates with $\mathcal{I}_l(x)/\max \mathcal{I}_l$ and $\mathcal{I}_r(x)/\max \mathcal{I}_r$, which can be interpreted as the "percentage of failure" of the children of $v$. To make these numbers integers, we scale them by $\mathrm{lcm}_v$, the *least common multiple* of the children's maximum importance values. In our case $\mathrm{lcm}_v = 6$ and hence $\mathcal{I}_v(x) = \mathcal{I}_v(x') = 3$. Similar problems arise with all gates, hence normalisation is applied in general.
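The normalised AND, OR, and VOT$_k$ cases can be sketched as follows. Since Table 1 is not reproduced here, this is our reading of the text above rather than a transcription of the table: every child is summarised by the pair $(\mathcal{I}_c(x), \max \mathcal{I}_c)$, each ratio $\mathcal{I}_c(x)/\max \mathcal{I}_c$ is scaled by $\mathrm{lcm}_v$, and, as an assumption, the OR case is normalised in the same way. SPARE and PAND, which also use the gate output $z_v$, are left out; the function names are hypothetical.

```python
from math import lcm  # Python 3.9+

# Each child is a pair (I_c(x), max I_c). This follows our reading of the
# text; Table 1 itself is not reproduced here.


def be(z):
    """BE/SBE: importance equals the output z_v; its maximum value is 1."""
    return z, 1


def _scaled(children):
    """Integer versions of I_c(x) / max I_c, all over the denominator lcm_v."""
    common = lcm(*(mx for _, mx in children))
    return [val * (common // mx) for val, mx in children], common


def and_gate(children):
    scaled, common = _scaled(children)
    return sum(scaled), common * len(children)


def or_gate(children):
    scaled, common = _scaled(children)
    return max(scaled), common


def vot_gate(k, children):
    scaled, common = _scaled(children)
    return sum(sorted(scaled, reverse=True)[:k]), common * k


# The example above: max I_l = 2, max I_r = 6.
x_children = [(1, 2), (0, 6)]          # I_l(x) = 1, I_r(x) = 0
x2_children = [(0, 2), (3, 6)]         # I_l(x') = 0, I_r(x') = 3
print(and_gate(x_children)[0], and_gate(x2_children)[0])  # 3 3
```

Each function returns the pair (current importance, maximum importance), so gates compose bottom-up; a fully failed gate reaches its maximum, e.g. `and_gate([be(1), be(1)])` gives `(2, 2)`.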

SPARE gates with $m$ children (including the primary) behave similarly to AND gates: every failed child brings the gate closer to failure, as reflected in the left operand of the max in Table 1. However, SPAREs fail when their primaries fail and no SBEs are *available*, e.g. because they are being used by another SPARE. This means that the gate could fail in spite of some children being operational. To account for this we exploit the gate output: multiplying $z_v$ by $m$, we give the gate its maximum value when it fails, even when this happens due to unavailable but operational SBEs. For a PAND gate $v$ we must look carefully at its states. If the left child $l$ has failed, then the right child $r$ contributes positively to the failure of the PAND, and hence to the importance function of node $v$. If instead the right child has failed first, then the PAND gate will not fail, and hence we let it contribute negatively to the importance function of $v$. Thus, we multiply $\mathcal{I}_r(x)/\max \mathcal{I}_r$ (the normalised importance function of the right child) by $-1$ in the latter case, i.e. for the PAND states $x_v$ that encode this situation (see Table 1). Instead, the left child always contributes positively. Finally, the max operation is two-fold: on the one hand, $z_v \cdot 2$ ensures that the importance value remains at its maximum while the gate is failed (PANDs remain failed even after the left child is repaired); on the other hand, it ensures that the smallest possible value is 0 while the gate is operational (since importance values cannot be negative).

### **4.2 Automatic importance splitting for FTA**

Our compositional importance function is based on the distribution of operational/failed basic elements in the fault tree, and their failure order. This follows the core idea of importance splitting: the more failed BEs/SBEs (in the right order), the closer a tree is to its top event failure.

However, isplit is about running more simulations from states with a higher *probability* of leading to rare states. Whether basic element $b$ is failed reflects this only partially: probabilistic information also lies in the distributions $F_b$, $R_b$, $D_b$. These distributions govern the transitions among states $x \in S$, and can be exploited for importance splitting. We do so using the two-phase approach of [11, 12], which in a first (static) phase computes an importance function, and in a second (dynamic) phase selects the thresholds from the resulting importance values.

In our current work, the first phase runs a breadth-first search in the iosa module of each tree node. This computes node-local importance functions, which are aggregated into a tree-global $\mathcal{I}$ using our compositional function in Table 1.
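A sketch of such a breadth-first search, assuming (in the spirit of [11, 12], though the exact definition used there may differ) that a state's importance grows as its distance to the node's failed states shrinks: a backward BFS from the target states gives each state its distance $d(s)$, and the importance is $D - d(s)$, with $D$ the largest finite distance; states that cannot reach the targets get 0. The function name and graph encoding are hypothetical.

```python
from collections import deque


def bfs_importance(successors, targets):
    """Importance D - dist(s), where dist(s) is the length of a shortest path
    from s to a target state; states that cannot reach a target get 0.
    successors: dict mapping every state to an iterable of successor states.
    (Distance-based importance is an assumption of this sketch.)"""
    predecessors = {s: [] for s in successors}
    for s, succs in successors.items():
        for t in succs:
            predecessors.setdefault(t, []).append(s)
    dist = {t: 0 for t in targets}
    queue = deque(targets)
    while queue:                       # backward BFS from the target states
        t = queue.popleft()
        for s in predecessors.get(t, []):
            if s not in dist:
                dist[s] = dist[t] + 1
                queue.append(s)
    d_max = max(dist.values())
    return {s: d_max - dist[s] if s in dist else 0 for s in predecessors}


# A repairable BE: 0 (operational) <-> 1 (failed); the target is "failed".
print(bfs_importance({0: [1], 1: [0]}, targets={1}))  # {0: 0, 1: 1}
```

For a single BE this reproduces $\mathcal{I}_v(x) = z_v$ from the previous section.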

The second phase involves running "pilot simulations" on the importance-labelled states of the tree. Running simulations exercises the failure/repair distributions of BEs/SBEs, imprinting this information in the thresholds $\ell_k$. Several algorithms can perform such a *selection of thresholds*. They operate sequentially, starting from the initial state (a fully operational tree), which has importance $i_0 = 0$. For instance, Expected Success [10] runs $N$ finite-life simulations. If $K < N/2$ simulations reach the next smallest importance $i_1 > i_0$, then the first threshold will be $\ell_1 = i_1$. Next, $N$ simulations start from states with importance $i_1$, to determine whether the next importance $i_2$ should be chosen as threshold $\ell_2$, and so on.

Expected Success also computes the *effort* per splitting region $S_k = \{ x \in S \mid \ell_{k+1} > \mathcal{I}(x) \ge \ell_k \}$. For Fixed Effort, "effort" is the base number of simulations to run in region $S_k$. For restart, it is the number of clones spawned when threshold $\ell_{k+1}$ is up-crossed. In general, if $K$ out of $N$ pilot simulations make it from $\ell_{k-1}$ to $\ell_k$, then the $k$-th effort is $N/K$. This is chosen so that, during res estimations, on average one simulation makes it from threshold $\ell_{k-1}$ to $\ell_k$.
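The effort rule is short to state in code. The sketch below (hypothetical helper; rounding to an integer $\ge 1$ is an added assumption, since runs and clones come in whole numbers) also shows why $N/K$ yields one successful simulation per threshold on average: $(N/K)\cdot(K/N) = 1$.

```python
def efforts_from_pilots(n_pilot, successes):
    """k-th effort = N / K_k, where K_k of the N pilot simulations went from
    threshold l_{k-1} to l_k. Rounded to an integer >= 1 (an assumption)."""
    efforts = []
    for k_success in successes:
        if k_success == 0:
            raise ValueError("no pilot simulation crossed this threshold")
        efforts.append(max(1, round(n_pilot / k_success)))
    return efforts


N = 100
K = [50, 20, 10]
E = efforts_from_pilots(N, K)
print(E)                                      # [2, 5, 10]
print([e * k / N for e, k in zip(E, K)])      # [1.0, 1.0, 1.0]
```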

Thus, using the method from [11, 12] based on our importance function $\mathcal{I}_{\mathcal{T}}$, we automatically compute the thresholds and their efforts for tree $\mathcal{T}$. This is all the meta-information required to apply importance splitting res [19, 18, 11].

Fig. 4: Tool chain

**Implementation.** Fig. 4 outlines a tool chain implemented to deploy the theory described above. The input model is an rft, described in the Galileo textual format [42, 41] extended with repairs and arbitrary pdfs. This rft file is given as input to a Java converter that produces three outputs: the iosa semantics of the tree, the property queries for its reliability or availability, and our compositional importance function in terms of variables of the iosa semantic model. This information is dumped into a single text file and fed to FIG: a statistical model checker specialised in importance splitting res. FIG interprets this importance function and deploys it into its internal model representation, which results in a global function for the whole tree. FIG can then use isplit algorithms such as restart and Fixed Effort, via the automatic methods described above. The results are confidence intervals that estimate the reliability or availability of the rft. In this way, we implemented automatic importance splitting for fta. In [9] we provide more details about our tool chain and its capabilities.

## **5 Experimental evaluation**

#### **5.1 General setup**

Using our tool chain, we computed the unreliability and unavailability of 26 highly resilient repairable non-Markovian dfts. These trees come from seven case studies from the literature, enriched with RBOX elements and non-Markovian pdfs. We estimated their $\mathrm{UNREL}_{10^3}$ or UNAVA in increasingly resilient configurations.

To estimate these values we used various simulation algorithms: Standard Monte Carlo (smc); Fixed Effort [19] for different numbers of runs performed in each $S_k$ region (fe$_n$ for $n = 8, 12, 16, 24, 32$); restart [47] with thresholds selected via a Sequential Monte Carlo algorithm [12] for different global splitting values (rst$_n$ for $n = 2, 3, 5, 8, 11$); and restart with thresholds selected via Expected Success [10], which computes splitting values independently for each threshold (rst$_{\mathrm{es}}$). fe$_n$, rst$_n$, and rst$_{\mathrm{es}}$ used the automatic isplit framework based on our importance function, as described in Sec. 4.2.

An *instance* y is a combination of an algorithm *algo*, an rft, and a dependability metric. An rft is identified by a case study (CS) and a parameter (p), where larger parameters of the rft CS$_p$ indicate smaller dependability values $p_{\mathrm{CS}_p}$. Running *algo* for a fixed simulation time, instance y estimates the value $p_{\mathrm{y}} = p_{\mathrm{CS}_p}$. The resulting ci $\hat p_{\mathrm{y}}$ has a certain width in $[0, 1]$ (we fix the confidence coefficient $\delta = 0.95$). The performance of *algo* can be measured by that width: the narrower the ci $\hat p_{\mathrm{y}}$, the more efficient the algorithm that achieved it.

The simulation time fixed for an rft may not suffice to observe rare events, e.g. for smc. In such cases the FIG tool reports a "null estimate" $\hat p_{\mathrm{y}} = [0, 0]$. Moreover, the simulation of random events depends on the rng (and its seed) used by FIG, so different runs may yield different results $\hat p_{\mathrm{y}}$. Therefore, for each y we repeated the computation of $\hat p_{\mathrm{y}}$ $n = 10$ times, to assess the performance of *algo* in y by: (*i*) how many times it yielded not-null estimates, indicated with a bold number at the base of the bar corresponding to y (e.g. **8** in Fig. 5b); (*ii*) the average width of $\hat p_{\mathrm{y}}$, using not-null estimates only, indicated by the height of the bar; and (*iii*) the standard deviation of those widths, indicated by whiskers on top of the bar. We performed $n = 10$ repetitions to ensure statistical significance: a 95% ci for a plotted bar is narrower than the whiskers and, in the hardest configuration of every CS, the whiskers of smc bars never overlap with those of the best res algorithm.
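The three indicators can be computed from the repeated cis of an instance as in the sketch below (hypothetical function; cis given as `(lower, upper)` pairs, with `(0, 0)` standing for a null estimate). The text does not state whether the sample or the population standard deviation is used; the sketch assumes the sample one.

```python
from statistics import mean, stdev


def summarise(cis):
    """Return (#not-null estimates, mean width, std of widths) for one instance.
    Widths are averaged over not-null estimates only; stdev is the sample
    standard deviation (an assumption)."""
    widths = [hi - lo for lo, hi in cis if (lo, hi) != (0, 0)]
    if not widths:
        return 0, None, None
    spread = stdev(widths) if len(widths) > 1 else 0.0
    return len(widths), mean(widths), spread


runs = [(0.0, 0.0), (1e-5, 3e-5), (2e-5, 6e-5)]
print(summarise(runs))
```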

**Case studies.** Our seven parametric case studies are: the synthetic models DSPARE$_n$ and VOT$_m$, the former with $n \in \{3, 4, 5\}$ SBEs and the latter with $m \in \{2, 3, 4\}$ shared BEs, each with one RBOX; FTPP$_s$ [17], where we study one triad with $s \in \{4, 5, 6\}$ shared SBEs, using one RBOX for the processors and another for the network elements; HECS$_o$ [43], with 2 memory interfaces, 4 RBOXes (one per subsystem), $o \in \{1, \ldots, 5\}$ shared spare processors, and $2o$ parallel buses; and RWC$_u$, $u \in \{4, \ldots, 7\}$ [22, 21, 39], which combines subsystems RC$_v$, with one RBOX and $v \in \{3, \ldots, 6\}$ SPAREs, and HVC$_w$, with another RBOX and $w \in \{2, \ldots, 4\}$ shared SBEs. In total these are 26 rfts with pdfs that include exponential, Erlang, uniform, Rayleigh, Weibull, normal, and log-normal distributions. In an extended version of this work [9] we provide all details of our case studies.

**Hardware.** Experiments ran on two types of nodes in a SLURM cluster running Linux x64 (Ubuntu, kernel 3.13.0-168): *korenvliet* nodes have Intel® Xeon® E5-2630 v3 CPUs @ 2.40 GHz and 64 GB of DDR4 RAM @ 1600 MHz; *caserta* has Intel® Xeon® E7-8890 v4 CPUs @ 2.20 GHz and 2 TB of DDR4 RAM @ 1866 MHz.

#### **5.2 Experimental results and discussion**

Using smc and restart we computed UNAVA for VOT$_{2,3,4}$, HECS$_{1,\ldots,5}$, RC$_{3,\ldots,6}$, and RWC$_{1,\ldots,4}$. fe was not used since it requires regeneration theory for steady-state analysis [19], which is not always feasible with non-Markovian models. The mean widths of the cis achieved per instance are shown in Fig. 5.

For example, for VOT$_2$ (Fig. 5a), 10 independent computations with smc ran on caserta for 5 min, and all converged to not-null cis (**10**). The mean width of these cis was $1.40 \times 10^{-4}$ and their standard deviation $7.96 \times 10^{-6}$. For VOT$_3$, all smc computations yielded not-null cis (after 30 min) with an average precision of $9.62 \times 10^{-6}$ and a standard deviation of $1.52 \times 10^{-6}$. For VOT$_4$, all smc simulations yielded null cis after 3 hours of simulation (**0**). Instead, rst$_2$ converged to 10, 10, and 5 not-null cis for VOT$_{2,3,4}$ respectively, with mean widths (and standard deviations) $1.24 \times 10^{-4}$ ($1.19 \times 10^{-5}$), $5.09 \times 10^{-6}$ ($1.48 \times 10^{-6}$), and $1.79 \times 10^{-7}$ ($3.19 \times 10^{-8}$). Thus, for the VOT case study, rst$_2$ was consistently more efficient than smc, and the efficiency gap increased as UNAVA became rarer.

This trend repeats in all experiments: as expected, the rarer the metric, the wider the cis computed within the time limit, until at some point it becomes very hard to converge to not-null cis at all (especially for smc). For the least resilient configuration of each case study, smc can be competitive or even more efficient than some isplit variants. For instance, for VOT$_1$ and HECS$_1$ in Figs. 5a and 5b, all computations converged to not-null cis for all algorithms, but smc exhibits less variable ci widths, viz. smaller whiskers. This is reasonable: truncating and splitting traces in restart adds (*i*) simulation overhead that may not pay off when estimating not-so-rare events and, on top of that, (*ii*) correlations between cloned traces that share a common history, increasing the variability among independent runs. On the other hand, and as expected, smc loses this competitiveness in all case studies as failures become rarer, here when UNAVA $\lesssim 1.0 \times 10^{-5}$. This holds nicely for the biggest case studies: HECS$_5$† (a 42-node rft whose iosa has 126 non-clock variables, i.e. $\approx 2.89 \times 10^{38}$ states, with 57 clocks of exponential, uniform, and log-normal pdfs) and RWC$_4$ (42 nodes, 181 variables, i.e. $\approx 6.93 \times 10^{73}$ states, and 62 clocks of exponential, Erlang, Rayleigh, uniform, and normal pdfs).

Fig. 5: ci precision for system unavailability

Using smc, restart, and fe, we also estimated $\mathrm{UNREL}_{1000}$ for RWC$_{2,3,4}$, DSPARE$_{3,4,5}$, FTPP$_{4,5,6}$, HVC$_{4,5,6,7}$, and HECS$_{2,3,4,5}$. For HVC (only) we ran 20 experiments per tree, 10 on each type of cluster node. Fig. 6 shows the results.

Fig. 6 (panel (e): HECS)

The overall trend shown for unreliability estimations is similar to the previous unavailability cases. Here, however, it was possible to use Fixed Effort, since every simulation has a clearly defined end at time $T = 10^3$. It is thus interesting to compare the efficiency of restart vs. fe: we note, for example, that some variants of fe performed considerably better than any other approach in the most resilient configurations of FTPP and HECS. It is nevertheless difficult to draw general conclusions from Figs. 6a to 6e, since some variants that performed best in one case study (e.g. fe$_{16}$ in HECS) did worse in others (e.g. FTPP, where the best algorithms were fe$_{8,12}$). Furthermore, fe$_8$, which is always better than smc when $\mathrm{UNREL}_{1000} < 10^{-3}$, did not perform very well in HVC, where the algorithms that achieved the narrowest and most not-null cis were rst$_{5,11}$. Such cases notwithstanding, fe is a solid competitor of restart in our benchmark.

†rst$_8$ for HECS$_5$ escapes this trend: the execution logs revealed that FIG crashed during the second computation.

Another relevant point of study is the optimal effort $e$ for rst$_e$ or fe$_e$, which shows no clear trend in our experiments. Here, $e$ is a "global effort" used by these algorithms, equal for all $S_k$ regions. $e$ also alters the way in which the threshold selection algorithm Sequential Monte Carlo (seq [12]) selects the $\ell_k$. The lack of guidelines for selecting a value of $e$ that works well across different systems was raised in [8]. This motivated the development of Expected Success (es [10]), which selects efforts individually per $S_k$ (or $\ell_k$). Thus, in rst$_{\mathrm{es}}$, a trace up-crossing threshold $\ell_k$ is split according to the individual effort $e_k$ selected by es. In the benchmark of [10], which consists mostly of queueing systems, es was shown to be superior to seq. However, the experimental outcomes on dfts in this work are different: for UNAVA, rst$_{\mathrm{es}}$ yielded mildly good results for HECS and RC; for the other case studies and for all $\mathrm{UNREL}_{1000}$ experiments, rst$_{\mathrm{es}}$ always yielded null cis. It was found that the effort selected for most thresholds $\ell_k$ was either too small, so that splitting in $e_k$ was not enough for the rst$_{\mathrm{es}}$ trace to reach $\ell_{k+1}$, or too large, so that there was a splitting/truncation overhead. This point is further addressed in the conclusions.

Beyond comparisons among the specific algorithms, be these for res or for selecting thresholds, it seems clear that our approach to fta via isplit delivers the expected results. For each parameterised case study CS$_p$, we could find a value of the parameter p where the level of resilience is such that smc is less efficient than our automatically constructed isplit framework. This is particularly significant for big dfts like HECS and RWC, whose complex structure could be exploited by our importance function.

## **6 Related work**

Most work on dft analysis assumes discrete [43, 3] or exponentially distributed [15, 29] component failures. Furthermore, component repair is seldom studied in conjunction with dynamic gates [6, 3, 40, 29, 31]. In this work we address repairable dfts, whose failure and repair times can follow arbitrary pdfs. In more detail, rfts were first formally introduced as stochastic Petri nets in [6, 13]. Our work builds on [32], which reviews [13] in the context of stochastic automata with arbitrary pdfs. In particular, we also address non-Markovian continuous distributions: in Sec. 5 we experimented with exponential, Erlang, uniform, Rayleigh, Weibull, normal, and log-normal pdfs. Moreover, and for the first time, we consider the application of [13, 32] to study rare events.

Much effort in res has been dedicated to studying highly reliable systems, deploying either importance splitting or importance sampling. Typically, importance sampling can be used when the system takes a particular shape. For instance, a common assumption is that all failure (and repair) times are exponentially distributed with parameters $\lambda^i$, for some $\lambda \in \mathbb{R}$ and $i \in \mathbb{N}_{>0}$. In these cases, a favourable change of measure can be computed analytically [20, 23, 33, 34, 49, 39].
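To make the notion of a change of measure concrete, the following Python sketch (our own illustration, not taken from the cited works) estimates the probability that a single component with failure time $T \sim \mathrm{Exp}(\lambda)$, for a tiny rate $\lambda$, fails before a horizon $t$. Samples are drawn from a biased rate `lam_is` and every hit is reweighted by the likelihood ratio of the two densities; the choice `lam_is = 1.0` is an arbitrary assumption, not an optimised change of measure.

```python
import math
import random


def is_estimate(lam, lam_is, horizon, n, seed=1):
    """Importance-sampling estimate of P(T <= horizon) for T ~ Exp(lam).

    Samples come from Exp(lam_is) (the biased measure); every sample that hits
    the rare event is weighted by the likelihood ratio f_lam(t) / f_lam_is(t).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t = rng.expovariate(lam_is)            # sample under the biased measure
        if t <= horizon:                       # indicator of the rare event
            ratio = (lam * math.exp(-lam * t)) / (lam_is * math.exp(-lam_is * t))
            total += ratio                     # unbias the contribution
    return total / n


lam, horizon = 1e-6, 1.0
exact = -math.expm1(-lam * horizon)            # 1 - exp(-lam*t), about 1e-6
estimate = is_estimate(lam, lam_is=1.0, horizon=horizon, n=100_000)
print(f"exact = {exact:.4e}, IS estimate = {estimate:.4e}")
```

With the same 100 000 samples, crude Monte Carlo would expect only about 0.1 failures, so it would most likely report 0.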

In contrast, when the fail/repair times follow less-structured distributions, importance splitting is more easily applicable. As long as a full system failure can be broken down into several smaller component failures, an importance splitting method can be devised. Of course, its efficiency relies heavily on the choice of importance function. This choice is typically made ad hoc for the model under study [44, 30, 46]. In that sense, [24, 25, 11, 12] are among the first to attempt a heuristic derivation of all parameters required to implement splitting. This is based on formal specifications of the model and property query (the dependability metric). Here we extended [11, 12, 8], using the structure of the fault tree to define composition operands. With these operands we aggregate the automatically-computed local importance functions of the tree nodes. This aggregation results in an importance function for the whole model.
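As an illustration of aggregating local importance values along the tree structure, here is a hypothetical Python sketch. The operators below (0/1 for basic events, mean of the children for AND gates, maximum for OR gates) are our own assumption for the example and are not the composition operands defined in this work; they merely have the convenient property that a node evaluates to 1 exactly when it has failed.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Dict, List


@dataclass
class Node:
    name: str
    kind: str                                  # "BE" (basic event), "AND" or "OR"
    children: List["Node"] = field(default_factory=list)


def importance(node: Node, failed: Dict[str, bool]) -> Fraction:
    """Hypothetical structural importance, aggregated bottom-up over the tree."""
    if node.kind == "BE":
        return Fraction(1) if failed.get(node.name, False) else Fraction(0)
    values = [importance(child, failed) for child in node.children]
    if node.kind == "AND":
        return sum(values) / len(values)       # fraction of inputs already failed
    if node.kind == "OR":
        return max(values)                     # the most advanced input dominates
    raise ValueError(f"unknown node kind: {node.kind}")


def be(name: str) -> Node:
    return Node(name, "BE")


# Top event fails if all of a, b, c fail, or if both d and e fail.
top = Node("top", "OR", [Node("g1", "AND", [be("a"), be("b"), be("c")]),
                         Node("g2", "AND", [be("d"), be("e")])])
print(importance(top, {"a": True, "b": True}))   # g1 is 2/3 of the way to failing
```

Using exact fractions avoids rounding issues when comparing importance values against thresholds.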

## **7 Conclusions**

We have presented a theory to deploy *automatic importance splitting* (isplit) for fault tree analysis of repairable dynamic fault trees (rfts). This Rare Event Simulation approach supports arbitrary probability distributions of component failure and repair. The core of our theory is an importance function I defined structurally on the tree. From this function we implemented isplit algorithms, and used them to estimate the *unreliability* and *unavailability* of highly-resilient rfts. Departing from classical approaches, which define importance functions ad hoc using expert knowledge, our theory computes all metadata required for res from the model and metric specifications. Furthermore, we have shown that for a fixed simulation time budget and in the most resilient rfts, diverse isplit algorithms can be automatically implemented from I, and always converge to narrower confidence intervals than standard Monte Carlo simulation.

There are several paths open for future development. First and foremost, we are looking into new ways to define the importance function, e.g. to cover more general categories of fts such as fault maintenance trees [37]. It would also be interesting to look into possible correlations between tree structures and the specific res algorithms that yield the most efficient estimations for a particular metric. Moreover, we have defined I based on the tree structure alone. It would be interesting to further include stochastic information in this phase, and not only afterwards during the thresholds-selection phase.

Regarding thresholds, the relatively poor performance of the Expected Success algorithm leaves room for improvement. In general, we believe that enhancing its statistical properties should alleviate the behaviour mentioned in Sec. 5.2. Moreover, techniques to increase trace independence during splitting (e.g. resampling) could further improve the performance of the isplit algorithms. Finally, we are investigating enhancements in iosa and our tool chain, to exploit the ratio between fail and dormancy pdfs of SBEs in warm SPARE gates.

#### **Acknowledgments**

The authors thank José and Manuel Villén-Altamirano, for fruitful discussions that helped to better understand the application scope of our approach.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### FIG**: the Finite Improbability Generator**

Carlos E. Budde <sup>1</sup>

Formal Methods and Tools, University of Twente, Enschede, the Netherlands c.e.budde@utwente.nl

**Abstract.** This paper introduces the statistical model checker FIG 1.2, which estimates transient and steady-state reachability properties in stochastic automata. This software tool specialises in Rare Event Simulation via importance splitting, and implements the algorithms restart and Fixed Effort. FIG is push-button automatic since the user need not define an importance function: this function is derived from the model specification plus the property query. The tool operates with Input/Output Stochastic Automata with Urgency, aka iosa models, described either in the native syntax or in the jani exchange format. The theory backing FIG has demonstrated good efficiency, comparable to optimal importance splitting implemented ad hoc for specific models. Written in C++, FIG can outperform other state-of-the-art tools for Rare Event Simulation.

## **1 Introduction**

In formal analysis of stochastic systems, statistical model checking (smc [33]) emerges as an alternative to numerical techniques such as (exhaustive) probabilistic model checking. Its partial, on-demand state exploration offers a memory-lightweight option to exhaustive explorations. At its core, smc integrates Monte Carlo simulation with formal models, where traces of states are generated dynamically e.g. via discrete event simulation. Such traces sample the states that a (possibly non-Markovian) stochastic model typically visits. Given a temporal logic property $\phi$ that characterises certain states, an smc analysis yields an estimate $\hat{\gamma}$ of the actual probability $\gamma$ with which the model satisfies $\phi$. The estimate $\hat{\gamma}$ typically comes together with a quantification of the statistical error: two numbers $\delta \in (0, 1)$ and $\varepsilon > 0$ such that $\hat{\gamma} \in [\gamma - \varepsilon, \gamma + \varepsilon]$ with probability $\delta$. Thus, if $n$ traces are sampled, the full smc outcome is the tuple $(n, \hat{\gamma}, \delta, \varepsilon)$.

With this statistical quantification, usually presented as a confidence interval (ci) around $\hat{\gamma}$, an idea of the quality of an estimation is conveyed. To increase the quality one must increase the *precision* (smaller $\varepsilon$) or the *confidence* (bigger $\delta$). For fixed confidence, this means a narrower ci around $\hat{\gamma}$. The number of traces $n$ grows with the inverse square of $\varepsilon$ (and hence of the ci width), so smc trades memory for runtime or precision when compared to exhaustive methods [5].
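The following Python sketch (an illustration only, with a hypothetical Bernoulli "trace" standing in for a real model) computes the smc outcome tuple $(n, \hat{\gamma}, \delta, \varepsilon)$ using a normal-approximation (clt) interval. Because the half-width scales as $1/\sqrt{n}$, quadrupling $n$ roughly halves $\varepsilon$.

```python
import math
import random
from statistics import NormalDist


def smc_estimate(sample_trace, n, delta=0.95, seed=0):
    """Crude Monte Carlo: return (n, gamma_hat, delta, eps) with a clt interval."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if sample_trace(rng))
    gamma_hat = hits / n
    z = NormalDist().inv_cdf((1 + delta) / 2)              # about 1.96 for delta = 0.95
    eps = z * math.sqrt(gamma_hat * (1 - gamma_hat) / n)   # ci half-width
    return n, gamma_hat, delta, eps


def toy_trace(rng):
    """Hypothetical trace that satisfies the property with probability 0.1."""
    return rng.random() < 0.1


for n in (1_000, 4_000, 16_000):
    print(smc_estimate(toy_trace, n))
```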

This work was partially funded by NWO, NS, and ProRail project 15474 (*SEQUOIA*) and EU project 102112 (*SUCCESS*).

A. Biere and D. Parker (Eds.): TACAS 2020, LNCS 12078, pp. 483–491, 2020. https://doi.org/10.1007/978-3-030-45190-5\_27

This trade-off of smc comes with one up and one down. The **up** is the capability to analyse systems whose stochastic transitions can have non-Markovian distributions. In spite of gallant efforts, this is still out of reach for most other model checking approaches, making smc unique. The **down** is the presence of rare events. If there is a very low probability to visit the states characterised by the property $\phi$, then most traces will not visit them. Thus the estimate $\hat{\gamma}$ is either (an incorrect) 0 or, if a few traces do visit these states, the statistical error quantification makes $\varepsilon$ skyrocket. To counter this phenomenon, $n$ must increase as $\gamma$ decreases. Unfortunately, for typical estimators such as the sample mean, it takes $n \gtrsim 384/\gamma$ traces to build a (rather lax!) ci where $\delta = 0.95$ and $\varepsilon = \gamma/10$. If e.g. $\gamma \approx 10^{-8}$, then $n \gtrsim 38\,400\,000\,000$ traces are needed, causing trace-sampling times to grow unacceptably long. Rare Event Simulation (res [24]) methods tackle this issue.
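The figure $384/\gamma$ follows from the clt half-width $z\sqrt{\gamma(1-\gamma)/n}$ with $z \approx 1.96$ for $\delta = 0.95$: requiring it to equal $\gamma/10$ gives $n = 100\,z^2(1-\gamma)/\gamma \approx 384/\gamma$. A small Python helper (our own, for illustration) makes the arithmetic explicit:

```python
import math
from statistics import NormalDist


def traces_needed(gamma, delta=0.95, rel_eps=0.1):
    """Smallest n whose clt half-width z*sqrt(gamma*(1-gamma)/n) is <= rel_eps*gamma."""
    z = NormalDist().inv_cdf((1 + delta) / 2)
    return math.ceil(z ** 2 * (1 - gamma) / (rel_eps ** 2 * gamma))


for gamma in (1e-3, 1e-5, 1e-8):
    print(f"gamma = {gamma:.0e}: n = {traces_needed(gamma):.3e}")   # roughly 384/gamma
```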

The two main res methods are importance sampling (is) and importance splitting (isplit). is compromises the aforementioned **up**, since it must tamper with the stochastic transitions of the model. Given that the study of non-Markovian systems is a chief reason to use smc, FIG, a statistical model checker specialised in res, implements isplit. To deploy an efficient implementation, however, both importance sampling and splitting require expert knowledge. The novelty of FIG lies in its automatic derivation of the importance function (and thresholds and splitting values) required by isplit. This derivation exploits the model and property under study, resulting in a push-button application of res for smc.

**Outline.** The way in which FIG approaches res is explained in Sec. 2. Its input syntax for models and properties is presented in Sec. 3. Finally, Sec. 4 presents some features of FIG 1.2, before ending the paper with the briefest experimental display.

**Related work.** Other statistical model checkers offer res methods with some degree of automation. Plasma Lab implements automatic is [18] and semiautomatic isplit [21] for Markov chains. Its isplit engine offers a wizard that guides the user to choose an importance function. The wizard exploits a layered decomposition of the property query, not of the system model. Via apis, the isplit engine of Plasma Lab could be extended beyond dtmc models. SBIP 2.0 [22] implements the same (semiautomatic, property-based) engine for dtmcs. SBIP offers a richer set of temporal logics in which to define the property query. Cosmos [1] and ftres [26] implement importance sampling on Markov chains, the latter specialising in systems described as repairable Dynamic Fault Trees (dfts). All these tools can operate directly on Markovian models, and none offers fully automated isplit. Instead, the smc tool modes [5] supports non-Markovian probability distributions and is much closer to the capabilities of FIG, offering a similar degree of automation. As a matter of fact, all core res algorithms in modes were inspired or motivated by the theory behind FIG. On the one hand, FIG is restricted to fully-stochastic iosa models, whereas modes can also cope with nondeterminism (e.g. in Markov automata) using the LSS algorithm [10, 5]. On the other hand, using the batch means method, FIG can estimate steady-state properties, which modes cannot currently do. Moreover, FIG 1.2 implements basic functionality to tailor importance functions for dfts.

Previous versions of FIG have been used for scientific experimentation and research: the theory of [6] was first implemented and exercised with FIG 1.0; and FIG 1.1 was presented in [2], and last used in an extended journal version of [5].

## **2 Rare Event Simulation**

res methods make more traces visit the rare states that satisfy a property $\phi$ (the set $S_\phi$), to reduce the variance of smc estimators. For a fixed budget of traces $n$, this yields more precise cis than classical Monte Carlo simulation (cmc).

FIG implements *importance splitting*: a main res method that can work on non-Markovian systems without special considerations. isplit splits the states of the model into layers that wrap $S_\phi$ like an onion. Reaching a state in $S_\phi$ from the surface is then broken down into many steps. The $i$-th step estimates the conditional probability to reach (the inner) layer $i+1$ from (the outer) layer $i$. This stepwise estimation of conditional probabilities can be much more efficient than trying to go in one leap from the surface of the onion to its core [20].

Formally, let $S$ be the states of a model with initial states $S_0$ and rare states $S_\phi$. isplit works on a partition $\biguplus_{i=0}^{M} S_i = S$, where $S_\phi = S_M$. To estimate the probability $\gamma = \mathit{Prob}(S_\phi \mid S_0)$, each conditional probability $\gamma_i = \mathit{Prob}(S_i \mid S_{i-1})$ is estimated separately via cmc. Then simply $\hat{\gamma} = \prod_{i=1}^{M} \hat{\gamma}_i \approx \prod_{i=1}^{M} \gamma_i = \gamma$.

This approach is correct, i.e. it yields an unbiased estimator, with $\hat{\gamma} \xrightarrow{\;n \to \infty\;} \gamma$. However, it is efficient iff $\forall\, i \in \{1, \dots, M\}.\ \gamma_i \gg \gamma$, which depends on how the $S_i$ layers were chosen. For this, an *importance function* $f\colon S \to \mathbb{R}_{\geq 0}$ and *thresholds* $\ell_i \in \mathbb{R}_{\geq 0}$ are defined: then $S_i = \{ s \in S \mid \ell_i \leq f(s) < \ell_{i+1} \}$, where $\ell_0 = 0$, and $S_\phi$ are the states with highest *importance*, i.e. $f(s) \geq \ell_M$. The efficiency of isplit is thus delegated to the choice of $\{\ell_i\}_{i=1}^{M}$ and the importance function $f$.
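To see the product estimator at work, consider a toy model of our own choosing (not one of the paper's case studies): a random walk on $\{0, \dots, M\}$ that moves up with probability $p$ and down otherwise, starting at 1, where $S_\phi = \{M\}$ and 0 is absorbing. With $f(s) = s$ and thresholds $\ell_i = i$, the Python sketch below implements a Fixed Effort variant of isplit and compares it with the exact gambler's-ruin probability. Because the walk moves by ±1, every trace enters layer $S_i$ exactly at state $i$, which keeps the restart of each stage trivial; general implementations must store and resample the entrance states.

```python
import random


def reaches_next(start, target, p_up, rng):
    """Simulate the +-1 walk from `start` until it hits `target` (True) or 0 (False)."""
    s = start
    while 0 < s < target:
        s += 1 if rng.random() < p_up else -1
    return s == target


def fixed_effort(M, p_up, effort, seed=0):
    """Fixed Effort isplit with importance f(s) = s and thresholds l_i = i.

    Stage i runs `effort` traces from state i (the only entrance state of layer S_i
    in this toy model) and estimates gamma_i = P(reach i+1 before 0 | start at i).
    """
    rng = random.Random(seed)
    gamma_hat = 1.0
    for i in range(1, M):
        hits = sum(reaches_next(i, i + 1, p_up, rng) for _ in range(effort))
        if hits == 0:
            return 0.0                         # no trace survived this stage
        gamma_hat *= hits / effort             # multiply the conditional estimates
    return gamma_hat


def exact_prob(M, p_up):
    """Gambler's-ruin probability of reaching M before 0 from state 1."""
    r = (1 - p_up) / p_up
    return (1 - r) / (1 - r ** M)


est = fixed_effort(M=10, p_up=0.3, effort=2000)
print(f"isplit estimate: {est:.3e}   exact: {exact_prob(10, 0.3):.3e}")
```

Here $\gamma \approx 2.8 \times 10^{-4}$, while each stage only has to estimate a conditional probability between 0.3 and about 0.43, which is why a few thousand traces per stage suffice.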

These choices are *the* key challenge in isplit [20]. Theoretical developments assume $f$ is given [12, 8], and applications define it ad hoc via (res and domain) expert knowledge [30, 27]. Yet there is one general rule: importance must be proportional to the probability of reaching $S_\phi$. Thus for $s, s' \in S$, if a trace that visits $s$ is more likely to observe a rare state than one that visits $s'$, one wants $f(s) > f(s')$. This means that $f$ depends both on the model $\mathcal{M}$ and on the property $\phi$ that define $S_\phi$.

FIG, an smc tool, exploits the formal definitions of $\mathcal{M}$ and $\phi$ to derive $f$ and $\{\ell_i\}_{i=1}^{M}$ so as to reflect this rule. For this, FIG runs bfs from $S_\phi$ on the (inverted) transitions of $\mathcal{M}$. This computes the number-of-transitions distance from each state to $S_\phi$. The heuristic importance function of FIG, $f$, is the inverse of this distance, stored as an array the size of $S$. To avoid state explosion, FIG works on modular formalisms, deriving local functions $f_i$ for the modules $\mathcal{M}_i$ whose parallel composition forms $\mathcal{M}$. $f$ is an aggregation of these functions, e.g. adding the $f_i$ of every $\mathcal{M}_i$ with variables in $\phi$. Details are in [2] and also in [5], where the difference with the (later) implementation in modes is that FIG uses the dnf of $\phi$.
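The bfs step can be sketched in Python for an explicit transition graph. This is a simplification under our own assumptions: FIG applies the idea per module, and the way the distance is inverted here is our choice. We use $f(s) = D - d(s)$, where $d(s)$ is the distance to $S_\phi$ and $D$ the largest finite distance, so that rare states get the highest importance; states from which $S_\phi$ is unreachable get importance 0.

```python
from collections import deque


def bfs_importance(states, transitions, rare):
    """Distance-based importance via bfs from the rare states over reversed edges.

    `transitions` maps each state to its list of successors. The result is
    f(s) = D - d(s), with d(s) the number-of-transitions distance from s to the rare
    set and D the largest finite distance; states that cannot reach it get 0.
    """
    predecessors = {s: [] for s in states}
    for s, successors in transitions.items():
        for t in successors:
            predecessors[t].append(s)          # invert every transition
    dist = {s: 0 for s in rare}
    queue = deque(rare)
    while queue:
        t = queue.popleft()
        for s in predecessors[t]:
            if s not in dist:                  # first visit = shortest distance
                dist[s] = dist[t] + 1
                queue.append(s)
    D = max(dist.values())
    return {s: D - dist[s] if s in dist else 0 for s in states}


# Toy birth-death chain 0 <-> 1 <-> 2 <-> 3 plus a state 4 that only loops on itself.
states = [0, 1, 2, 3, 4]
transitions = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [4]}
print(bfs_importance(states, transitions, rare=[3]))
```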

$f$ is solely based on the number-of-transitions distance. Stochastic behaviour of $\mathcal{M}$ omitted by $f$, such as probabilistic labels in the transitions, is captured in the thresholds $\ell_i$. For this, FIG runs short simulations that start from $S_0$. Say $K_1$ out of $N$ simulations visit states with importance $i_1 > i_0 = f(S_0)$. Then, 1 out of $e_1 = N/K_1$ simulations are expected to reach threshold $\ell_1 = i_1$. Next, repeat this procedure starting from states with importance $i_1$ to choose $\ell_2$ and $e_2$, and so on. Such threshold-selection algorithms (see Sec. 4) are fully described in [4].
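The pilot-run idea can be sketched on a toy ±1 random walk of our own choosing (up with probability $p$, absorbed at 0, rare state $M$, importance $f(s) = s$). The rule for choosing the next importance level, namely the highest level reached by at least `k_min` of the `N` pilots, is a hypothetical choice made for illustration; it is neither seq nor es as implemented in FIG. The effort for each new threshold is $\lceil N/K \rceil$, where $K$ pilots reached it.

```python
import math
import random


def pilot_max(start, M, p_up, rng):
    """Run one pilot trace of the +-1 walk from `start`; return the highest state visited.

    The trace stops when it is absorbed in 0 or reaches the rare state M.
    """
    s = best = start
    while 0 < s < M:
        s += 1 if rng.random() < p_up else -1
        best = max(best, s)
    return best


def select_thresholds(M, p_up, N=1000, k_min=100, seed=0):
    """Return a list of (threshold, effort) pairs, with importance f(s) = s."""
    rng = random.Random(seed)
    level, plan = 1, []                        # importance of the initial state
    while level < M:
        maxima = sorted((pilot_max(level, M, p_up, rng) for _ in range(N)), reverse=True)
        candidate = max(maxima[k_min - 1], level + 1)   # reached by >= k_min pilots
        K = sum(m >= candidate for m in maxima)
        effort = math.ceil(N / max(K, 1))      # about 1 in `effort` traces gets there
        plan.append((candidate, effort))
        level = candidate
    return plan


print(select_thresholds(M=10, p_up=0.3))
```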

Thus, just from $\mathcal{M}$ and $\phi$, FIG enables isplit by computing $f$ and $\{\ell_i, e_i\}_{i=1}^{M}$.

## **3 Modelling formalism and input languages**

**IOSA.** FIG models are Input/Output Stochastic Automata with urgency [11]. In iosa, continuous variables called *clocks* sample random values from arbitrary distributions (pdfs). As time evolves, all clocks count down at the same rate. The first to reach zero can trigger events and synchronise with other modules, broadcasting an *output* action that synchronises with homonymous *input* actions (iosa are input-enabled). Actions can be urgent, where urgent outputs have maximal progress. iosa can thus be nondeterministic: to allow simulation, [23] gives conditions to ensure determinism modulo weak bisimulation. iosa *variables* are clocks, integers, or Booleans. *Constants* can also be floats and have global scope (variables are module-local). FIG offers array variables and can get, e.g., a random or the smallest value of an array. Code 1 shows the guarded command language of FIG models. Decorators ?/! mark an action as input/output, e.g. fl!. Double decorators (r??) are for urgency. Non-urgent outputs can be sent only on clock expiration ([fl!]··· @ fc ->). A clock can sample random values (fc'=*μ*).

```
module M1
  fc,rc : clock;
  inf,brk : [0..2] init 0;
  [fl!] brk==0 @ fc -> (inf'=1)
                     & (brk'=1);
  [r??] brk==1 ->(brk'=2)&(rc'=γ);
  [up!] brk==2 @ rc -> (inf'=2)
                     & (brk'=0)
                     & (fc'=μ);
  [f!!] inf==1 -> (inf'=0);
  [u!!] inf==2 -> (inf'=0);
endmodule
```

Code 1: iosa module in FIG 1.2
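The clock semantics of iosa (all clocks count down at the same rate and the first one to reach zero fires) can be illustrated by a Python sketch of a single race step. This is our own simplification, not FIG's simulation engine: it ignores guards, urgency, and synchronisation, and the pdfs used for fc and rc are hypothetical.

```python
import random


def race(clocks):
    """One step of a clock race: the clock with the least remaining time expires.

    `clocks` maps clock names to remaining times. Since all clocks decrease at
    the same rate, the elapsed time equals that minimum; the others are reduced by it.
    """
    winner = min(clocks, key=clocks.get)
    elapsed = clocks[winner]
    remaining = {c: t - elapsed for c, t in clocks.items() if c != winner}
    return winner, elapsed, remaining


rng = random.Random(7)
# Hypothetical pdfs for the fail clock fc and the repair clock rc of Code 1.
clocks = {"fc": rng.expovariate(0.5), "rc": rng.uniform(0.0, 4.0)}
print(race(clocks))
```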

**JANI.** Besides its native input syntax, FIG 1.2 reads models written in the jani exchange format [7]. Model types supported are ctmc and a subset of sta that matches iosa, e.g. with a single pdf per clock and broadcast synchronisation. FIG also translates iosa to jani as sta, to share models with tools such as the Modest Toolset [16] and Storm [13]. This is used in Sec. 4 for comparisons.

**Properties.** FIG estimates the probability with which input models satisfy temporal logic formulæ. A formula is specified as a (transient or steady-state) property query in the model file. Transient properties in FIG correspond to the pctl-like query P=? in prism [19]: e.g. the first property in Code 2 asks the probability of assigning value 8 to variable q2 before it takes the value 0. Steady-state properties in FIG correspond to the unbounded csl-like query S=? in prism: e.g. S(q2>=8). For steady-state estimations FIG implements batch means [9]. The initial (discarded) transient simulation time and the batch time can be heuristically computed by the tool. These values can also be given by the user: in Code 2, the last property specifies 9 and 999, respectively.
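Batch means can be sketched in Python as follows. This is a simplified illustration under our own assumptions: a discrete-time two-state chain stands in for the model, the warm-up and batch lengths are fixed by hand rather than computed heuristically, and a normal quantile replaces the Student's $t$ quantile for brevity. The chain below spends a long-run fraction $a/(a+b) = 0.1$ of its time in state 1.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def two_state_chain(a, b, steps, seed=0):
    """0/1 trajectory of a chain with P(0 -> 1) = a and P(1 -> 0) = b per step."""
    rng = random.Random(seed)
    state, out = 0, []
    for _ in range(steps):
        if state == 0:
            state = 1 if rng.random() < a else 0
        else:
            state = 0 if rng.random() < b else 1
        out.append(state)
    return out


def batch_means(obs, warmup, batch_len, n_batches, delta=0.95):
    """Steady-state estimate and ci half-width from one long run."""
    data = obs[warmup:warmup + batch_len * n_batches]    # drop the transient prefix
    batches = [mean(data[k * batch_len:(k + 1) * batch_len]) for k in range(n_batches)]
    z = NormalDist().inv_cdf((1 + delta) / 2)
    return mean(batches), z * stdev(batches) / math.sqrt(n_batches)


run = two_state_chain(a=0.01, b=0.09, steps=200_000)
estimate, half_width = batch_means(run, warmup=1_000, batch_len=1_000, n_batches=150)
print(f"{estimate:.4f} +- {half_width:.4f}  (exact 0.1)")
```

Long batches make the batch averages nearly independent, which is what justifies treating them as i.i.d. samples for the interval.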

## **4** FIG 1.2 **showcase**

theory [15], seldom applicable to non-Markovian models and unsupported by FIG 1.2). rst and fe work with an *effort* $e$. fe<sup>e</sup> means $e$ simulations are run in a layer $S_i$. rst<sup>e</sup> means $e-1$ clones are spawned when a simulation up-crosses a threshold $\ell_i$. Omitting $e$ makes FIG 1.2 use fe<sup>8</sup> or rst<sup>3</sup>, respectively.
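The following Python sketch illustrates rst<sup>e</sup> on a toy model of our choosing: a random walk that moves up with probability $p$ and down otherwise, starts at 1, is absorbed at 0, and has rare state $M$. With importance $f(s) = s$, thresholds sit at states $2, \dots, M-1$. Each up-crossing spawns $e-1$ clones; a clone dies when it down-crosses the threshold at which it was born, while the main trace only ends upon absorption. Hits are weighted by $1/e^{M-2}$, one factor per threshold. It is an illustration of the algorithm, not FIG's implementation.

```python
import random


def restart(M, p_up, e, n_main, seed=0):
    """RESTART estimate of P(reach M before 0 | start at 1) with effort e."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_main):
        stack = [(1, 1)]                       # (state, birth level); main trace first
        while stack:
            s, birth = stack.pop()
            while True:
                nxt = s + 1 if rng.random() < p_up else s - 1
                if nxt == 0 or nxt < birth:    # absorbed, or clone fell below its birth threshold
                    break
                if nxt == M:                   # rare state reached
                    hits += 1
                    break
                if nxt > s and nxt >= 2:       # up-crossed the threshold at state nxt
                    stack.extend([(nxt, nxt)] * (e - 1))
                s = nxt
    return hits / (n_main * e ** (M - 2))


def exact_prob(M, p_up):
    """Gambler's-ruin probability of reaching M before 0 from state 1."""
    r = (1 - p_up) / p_up
    return (1 - r) / (1 - r ** M)


est = restart(M=10, p_up=0.3, e=3, n_main=20_000)
print(est, exact_prob(10, 0.3))
```

With $e = 1$ no clone is ever spawned and the sketch reduces to cmc.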

A res run yields a random value $r \in [0, 1]$ of unknown distribution, so FIG computes standard clt confidence intervals with Student's $t$-distribution quantiles. $r$ has a Bernoulli distribution only for transient properties estimated with cmc: FIG can then use Wilson score intervals [32]. Floating-point precision loss is reduced by using the logarithm of $r$ and of the number of runs.
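Two of these numerical ingredients can be sketched in Python (our illustration; the function names are hypothetical and this is not FIG's C++ code): the Wilson score interval for a Bernoulli proportion, which unlike the plain clt interval does not collapse to $[0, 0]$ when no hit is observed, and one standard way to average values that are kept as logarithms (the log-sum-exp trick), so that tiny values do not underflow.

```python
import math
from statistics import NormalDist


def wilson_interval(hits, n, delta=0.95):
    """Wilson score interval for a proportion with `hits` successes out of n trials."""
    z = NormalDist().inv_cdf((1 + delta) / 2)
    p = hits / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def log_mean(log_values):
    """log of the mean of exp(v) over log_values, via the log-sum-exp trick."""
    m = max(log_values)
    total = math.fsum(math.exp(v - m) for v in log_values)
    return m + math.log(total) - math.log(len(log_values))


print(wilson_interval(0, 1_000))              # non-degenerate even with zero hits
print(log_mean([-800.0, -801.0, -805.0]))     # math.exp(-800) alone underflows to 0.0
```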

FIG reads or computes importance functions. Option --adhoc takes as mandatory argument a function on the variables of the iosa modules. Instead, --amono automatically builds $f$ on the parallel composition of all modules, and --acomp builds a local $f_i$ per iosa module (see Sec. 2). For --acomp, FIG takes an optional argument to aggregate all local $f_i$ into one global $f$. This can be an associative binary arithmetic operator, or a custom function on the names of the iosa modules. By default, $f$ is computed as the sum of all local functions. Option --dft 0 indicates that the model is a fault tree: FIG then builds specialised local importance functions for certain modules, e.g. basic events and pand gates.

Two algorithms in FIG 1.2 can compute the thresholds and efforts $\{\ell_i, e_i\}_{i=1}^{M}$. Sequential Monte Carlo [8, 6] (seq, option -t hyb) is characterised by one effort for all regions $S_i$, set with -g $e$. Instead, Expected Success [4] (es, -t es) determines each effort $e_i$ per $S_i$ region. By default FIG 1.2 uses -e restart -g 3 -t hyb. Other customisable options are the rng, its seed, the floating-point precision, and a timeout. Mandatory arguments for FIG invocation are the model and properties file, the simulation type (--flat for cmc, or --adhoc/amono/acomp for res), and a stop criterion (either time, or confidence and precision of the ci).

**Experimental demonstration.** We display the capabilities of FIG via three experiments. First, we show that isplit as implemented in FIG 1.2 is as automatic as cmc, but more efficient in estimating rare properties. Second, we test the degree to which $f$ in FIG can approximate optimal importance functions chosen ad hoc for some models. Third, we compare FIG with its closest competitor: modes. All these experiments can be reproduced via the artifact freely available in [3].

We test different configurations of engines, efforts, and thresholds. For each configuration we run simulations until some timeout. This yields a ci with precision $2\varepsilon$ for confidence coefficient $\delta = 0.95$. The smaller the $\varepsilon$, the narrower the ci, and the better the performance of the configuration (and tool) that produced it.

First, we analyse repairable dfts with warm spares and exponential (fail), normal (repair), and lognormal (dormancy) pdfs. Using cmc, fe<sup>8,16,32</sup>, and rst<sup>3,4,6</sup>, we estimate the probability of a top level event after the first failure and before all components are repaired, in trees with 6, 7, and 8 spares (the smallest iosa has 116 variables and more than $2.5 \times 10^{37}$ states). For isplit we used seq thresholds with --dft 0 --acomp and no arguments, i.e. as automatic as cmc.

With a 20 min timeout, each configuration was repeated 13 times on a Xeon E5-2683v4 CPU running Linux x64 4.4.0. The height of the bars in the top plot of Fig. 1 is the average ci precision (lower is better), using $\text{Z-score}_m = 2$ to remove outliers [17]. Whiskers show the standard deviation, and white numbers indicate how many runs yielded not-null estimates. Clearly, res algorithms outperform cmc in the hardest cases: fewer than half of the cmc runs in DFT-8 could build (wide) cis.

Fig. 1: ci precision. Top: dfts (transient). Bottom: queues (steady-state).

Second, we estimate the steady-state overflow probability in the last node of tandem queues, on a Markovian case with 2 buffers [29], one with 3 buffers [28], and a non-Markovian 3-buffer case [30]. We study how closely FIG, using --amono, seq, and rst<sup>3,4,5,7,9</sup>, approximates each optimal ad hoc function and thresholds of [29, 28, 30]. Experiments ran as before: the bottom plot of Fig. 1 shows that FIG's default (rst<sup>3</sup> with seq, legend "AUTO 3") is always closest to the optimal.

Third, we compare FIG and modes on the original benchmark of the latter [5]. We do so for fe-seq, rst-seq, and rst-es, using each tool's default options. We ran each benchmark instance for 15 min, three times per tool, on an Intel i7-6700 CPU with Linux x64 5.3.1. The scatter plots of Fig. 2 show the median of the ci precisions. The sub-plots on the bottom-right are zoom-ins on the range $[10^{-10}, 10^{-5}]$.

An (x,y) point is an instance whose median ci width was x for FIG 1.2 and y for modes netcore-3.0.150, single-threaded. A point above the solid diagonal line means FIG built a narrower ci. A point on the upper boundary means that modes built no ci in any run. Dotted diagonal lines indicate cis twice as wide. Fig. 2 shows that both tools perform similarly, with a slight trend in favour of FIG. This could be caused by modes operating on jani sta (translated from iosa by FIG): modes must assign values to variables and then compare them to clocks.

Although modes is multi-threaded, these experiments ran on a single thread to compare both tools under equal conditions. On the other hand, FIG also estimates the probability of steady-state properties, for which there is no support in modes.

Fig. 2: ci precision of FIG (x-axis) vs. modes (y-axis): medians of 3 runs <sup>×</sup> 15 min

## **References**


**Acknowledgments.** The author thanks Arnd Hartmanns for excellent discussions that originally motivated and subsequently helped to shape this work.


## MORA - Automatic Generation of Moment-Based Invariants

Ezio Bartocci<sup>1</sup>, Laura Kovács<sup>1,2</sup>, and Miroslav Stankovič<sup>1</sup>

<sup>1</sup> TU Wien, Vienna, Austria <sup>2</sup> Chalmers, Gothenburg, Sweden

Abstract. We introduce MORA, an automated tool for generating invariants of probabilistic programs. Inputs to MORA are so-called Prob-solvable loops, that is, probabilistic programs with polynomial assignments over random variables and parametrized distributions. Combining methods from symbolic computation and statistics, MORA computes invariant properties over higher-order moments of loop variables, expressing, for example, statistical properties, such as expected values and variances, over the value distribution of loop variables.

## 1 Introduction

Probabilistic programs (PPs) are becoming more and more commonplace. Originally employed in randomized algorithms and cryptographic/privacy protocols, they are now gaining momentum due to several emerging applications in the areas of machine learning and AI [5]. By introducing randomness into the program, program variables can no longer be treated as having single values; we must think about them as distributions. Dealing with distributions is much more challenging and some simplifications are required. Existing approaches, see e.g. [1,3,7,9,10], usually take into consideration only expected values or upper and lower bounds over program variables, or rely on user guidance for providing templates and hints.

One of the main challenges in analyzing PPs and computing their higher-order moments comes with the presence of loops and the burden of computing so-called *quantitative invariants* [7]. Quantitative invariants are properties that are true before and after each loop iteration and are crucial for analyzing the behavior of PP loops.

In this paper, we introduce the MORA tool for computing quantitative invariants of a class of PPs, called *Prob-solvable loops* [2], with random assignments, parametrized distributions, and polynomial probabilistic updates. Our implementation is available at:

https://github.com/miroslav21/mora,

and has been successfully evaluated on a number of challenging examples. Unlike other existing approaches, e.g. [1, 3, 7, 9], MORA computes non-linear invariants in a fully automatic way, without relying on user-provided templates/hints. The proposed automatic approach can handle an arbitrary number of loop iterations and also infinite loops. In contrast, tools like PSI [4] support only the automatic analysis of probabilistic programs with a specified number of loop iterations.

This research was supported by the ERC Starting Grant 2014 SYMCAR 639270, the Wallenberg Academy Fellowship 2014 TheProSE, and the Austrian FWF project W1255-N23.

    x = 0
    while true:
        u = RV(uniform, 0, b)
        g = RV(gauss, 0, 1)
        x = x - u @ 1/2; x + u @ 1/2
        y = y + x + g

Loop conditions are ignored, yielding non-deterministic PPs. The value of the random variable u is sampled from a uniform distribution with support in the real interval [0, b], whereas the value of g is a random number from a normal distribution with mean (first moment) 0 and variance (second moment) 1. Updates to variable x are probabilistic: with probability 1/2, the variable x is updated by x-u. Similarly, with probability 1/2, x is updated by x+u. Further, updates to u and g do not depend on other variables; the update to x depends only on itself and u.

Fig. 1. An illustrative example of a Prob-solvable loop.
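As a sanity check of the kind of moment-based invariant MORA targets, consider the x-updates of Fig. 1 alone. Since both branches are equally likely, the cross term $\pm 2\,\mathbb{E}[x_n u]$ cancels, so $\mathbb{E}[x_{n+1}^2] = \mathbb{E}[x_n^2] + \mathbb{E}[u^2] = \mathbb{E}[x_n^2] + b^2/3$; with $x_0 = 0$ this gives $\mathbb{E}[x_n] = 0$ and $\mathbb{E}[x_n^2] = n b^2/3$. The Python sketch below is a plain Monte Carlo simulation, not how MORA works (MORA derives such expressions automatically), and compares these values with sample moments for the assumed values $b = 2$ and $n = 10$.

```python
import random


def simulate_x(n_iter, b, rng):
    """Run the x-updates of Fig. 1 for n_iter iterations, starting from x = 0."""
    x = 0.0
    for _ in range(n_iter):
        u = rng.uniform(0, b)                          # u = RV(uniform, 0, b)
        x = x - u if rng.random() < 0.5 else x + u     # x - u @ 1/2; x + u @ 1/2
    return x


rng = random.Random(42)
n_iter, b, runs = 10, 2.0, 20_000
samples = [simulate_x(n_iter, b, rng) for _ in range(runs)]
m1 = sum(samples) / runs
m2 = sum(s * s for s in samples) / runs
print(f"E[x] ~ {m1:.3f} (exact 0),  E[x^2] ~ {m2:.3f} (exact {n_iter * b**2 / 3:.3f})")
```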

Moreover, the invariants inferred by MORA are not restricted to expected values but are quantitative invariants over the higher-order moments of program variables. We refer to such invariants as *moment-based invariants* [2]. To the best of our knowledge, no other approach can so far automatically compute higher-order moments of PPs, not even for the restricted yet expressive enough class of Prob-solvable loops.

The purpose of this paper is to describe what MORA can do and how it can be used. The paper is intended as a tool demonstration and guide for potential users of MORA. We focus on the usage and implementation aspects of MORA. For details on the theoretical foundations and algorithmic aspects of MORA for computing moment-based invariants, we refer to [2]. We note, however, that when compared to the experimental setup of [2], MORA comes with a completely new design, fully implemented in Python and supporting easy installation and use even by non-experts in PPs.

## 2 MORA– Programming Model

Input programs to MORA are PP loops that are Prob-solvable [2]. In Figure 1, we give an example of a Prob-solvable loop and use this example as a running example to guide the potential users of MORA in the rest of this paper.

In a nutshell, the probabilistic assignments of Prob-solvable loops involve (i) variable values drawn from random distributions, such as uniform or normal distributions, and (ii) random variable updates. In the sequel, we write RV to refer to a random variable. Input programs to MORA thus satisfy the following two properties:

(1) Input programs to MORA are PPs generated from the grammar in Figure 2.

(2) In addition to the grammar of Figure 2, MORA requires its PP input to be Prob-solvable, imposing further restrictions as follows:


Note that Figure 1 satisfies all constraints above, and thus is Prob-solvable.

#### Grammar defining PP inputs to MORA

```
PROGRAM      → INIT_ASSIGNS "while true:" RV_ASSIGNS UPD_ASSIGNS
INIT_ASSIGNS → INIT_ASSIGN | INIT_ASSIGN INIT_ASSIGNS
RV_ASSIGNS   → RV_ASSIGN | RV_ASSIGN RV_ASSIGNS
UPD_ASSIGNS  → UPD_ASSIGN | UPD_ASSIGN UPD_ASSIGNS
INIT_ASSIGN  → VAR "=" INIT_EXPR
RV_ASSIGN    → VAR "=" RV_EXPR
UPD_ASSIGN   → VAR "=" UPD_BRANCHES
UPD_BRANCHES → UPD_BRANCH | UPD_BRANCH UPD_BRANCHES
UPD_BRANCH   → UPD_EXPR "@" UPD_PROB
UPD_PROB     → SIMP_EXPR
INIT_EXPR    → RV_EXPR | SIMP_EXPR
RV_EXPR      → "RV(uniform," SIMP_EXPR "," SIMP_EXPR ")"
             | "RV(gauss," SIMP_EXPR "," SIMP_EXPR ")"
UPD_EXPR     → UPD_EXPR OP UPD_EXPR | VAR | ATOM
SIMP_EXPR    → SIMP_EXPR OP SIMP_EXPR | ATOM
ATOM         → NUM | PARAMETER
OP           → [*+-]
VAR          → [a-zA-Z][a-zA-Z0-9]*
PARAMETER    → [a-zA-Z][a-zA-Z0-9]*
NUM          → [-]?[0-9]+[.]?[0-9]*([\/][1-9][0-9]*)?
```
Fig. 2.
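The lexical rules at the bottom of Fig. 2 are ordinary regular expressions. As a quick illustration (a sketch, not MORA's parser), the following Python code transcribes the NUM and VAR rules into Python's `re` syntax and checks which tokens they accept. Since OP contains no division, a rational literal such as 1/2 must be a single NUM token. This is also how a branch probability after "@" can be written. The helper names `is_num` and `is_var` are hypothetical.

```python
import re

# NUM and VAR from Fig. 2, transcribed into Python regular-expression syntax.
NUM = re.compile(r"[-]?[0-9]+[.]?[0-9]*([\/][1-9][0-9]*)?")
VAR = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")

def is_num(token):
    """True if the whole token is a NUM literal (integer, decimal or fraction)."""
    return NUM.fullmatch(token) is not None

def is_var(token):
    """True if the whole token is a valid VAR (or PARAMETER) name."""
    return VAR.fullmatch(token) is not None

for tok in ["1", "-2.5", "1/2", "1/0", ".5", "x1", "1x"]:
    print(tok, is_num(tok), is_var(tok))
```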

## 3 MORA – Usage

We describe the easiest way MORA can be used to generate moment-based invariants:


– Import MORA using:

```python
from mora.mora import mora
```

– Run MORA using the command:

```python
mora("running", goal=GOAL)
```
where GOAL can be:

- (i) a natural number k ≥ 1, in which case MORA computes the k-th moments of all variables of running;
- (ii) a specific moment of one loop variable of running, e.g. "x^2" for the second moment of the variable x of Figure 1;
- (iii) a list of goals as just specified.

Any finite number of goals can be given as input to MORA, but at least one goal is required. For example, mora("running", [1, "x^2", "x^3"]) makes MORA compute two things:

- the expected values of all variables of Figure 1 (first moments, specified by 1);
- the second and third moments of the variable x of Figure 1 (specified by x^2 and x^3, respectively).

MORA is fully automatic: given a Prob-solvable loop and a list of input goals, it outputs the corresponding higher-order moments, and thus moment-based invariants, of the loop. To this end, MORA computes the expected values of all monomials over loop variables on which at least one of the goals in GOAL depends. In general, computing the k-th moment requires computing the expected values of all monomials over loop variables whose total degree is at most k; see [2] for more details.
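The following Python sketch lists every monomial of total degree at most k over a given set of loop variables. It illustrates this degree bound only, not MORA's actual dependency analysis, which keeps just the monomials a goal depends on. The function name `monomials_up_to` is hypothetical.

```python
from itertools import combinations_with_replacement

def monomials_up_to(variables, k):
    """Return all monomials of total degree 1..k over `variables`.

    Each monomial is rendered in Python syntax, e.g. "x*y" or "x**2";
    every multiset of d variables corresponds to exactly one monomial of degree d.
    """
    result = []
    for degree in range(1, k + 1):
        for combo in combinations_with_replacement(variables, degree):
            factors = []
            for var in sorted(set(combo), key=variables.index):
                power = combo.count(var)
                factors.append(var if power == 1 else f"{var}**{power}")
            result.append("*".join(factors))
    return result

print(monomials_up_to(["u", "g", "x", "y"], 2))
```

For the four variables of Figure 1 and k = 2, this bound yields 14 monomials. By comparison, the recurrences (3) computed by MORA for the goals [1, 2] involve only nine E-variables.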

Fig. 3. MORA workflow diagram.

In the rest of the paper, we illustrate the main steps of MORA using Figure 1 as the input loop and [1, 2] as the list of input goals. With these goals, MORA computes the first and second moments of each variable of Figure 1. Even if 1 were omitted from this list, MORA would still need some first moments, as they are required for computing the second-order moments. In the sequel, we showcase the behaviour of MORA for the call:

$$\texttt{mora("running", [1, 2])}. \tag{1}$$

## 4 MORA – Tool Overview

We first give details on our implementation. We then present the overall workflow of MORA in Figure 3, based on which we overview the main components of our tool.

*Overall Implementation.* MORA is implemented in Python 3 and requires Python 3.7 or later. MORA relies on the diofant and scipy libraries:

- (i) the Python library diofant is used for symbolic mathematical computations and recurrence solving;
- (ii) the scipy library, in particular its statistics module scipy.stats, is used to handle probability distributions and statistical functions, and to simplify and compute expressions involving probability distributions and initial values of variables.

Altogether, our implementation comprises around 350 lines of code.
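MORA delegates distribution handling to scipy.stats. The following dependency-free Python sketch lets readers check by hand the distribution moments used in the running example. It computes raw moments E[X^k] exactly with the standard closed-form formulas, for two cases:

- a uniform distribution on [a, b];
- a normal distribution with mean mu and variance var.

It is an illustrative stand-in, not MORA's code. The function names are hypothetical.

```python
from fractions import Fraction

def uniform_moment(a, b, k):
    """E[X^k] for X ~ Uniform(a, b), a < b: (b^(k+1) - a^(k+1)) / ((k+1)(b - a))."""
    a, b = Fraction(a), Fraction(b)
    return (b ** (k + 1) - a ** (k + 1)) / ((k + 1) * (b - a))

def normal_moment(mu, var, k):
    """E[X^k] for X ~ Normal(mu, var), via E[X^j] = mu E[X^(j-1)] + (j-1) var E[X^(j-2)]."""
    mu, var = Fraction(mu), Fraction(var)
    prev, cur = Fraction(1), mu  # E[X^0], E[X^1]
    if k == 0:
        return prev
    for j in range(2, k + 1):
        prev, cur = cur, mu * cur + (j - 1) * var * prev
    return cur

# For b = 3: E[u] = b/2 and E[u^2] = b^2/3, matching u and u**2 in (3).
print(uniform_moment(0, 3, 1), uniform_moment(0, 3, 2))  # 3/2 3
# Standard normal g: E[g] = 0 and E[g^2] = 1, matching g and g**2 in (3).
print(normal_moment(0, 1, 1), normal_moment(0, 1, 2))    # 0 1
```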

*MORA – Parser.* MORA first checks whether a given input program is Prob-solvable, using the requirements of Section 2. If it is not, an error is reported and MORA stops. Otherwise, the parser module performs three steps:

- it extracts initial values from the input loop;
- it rewrites loop updates into equations over expected values of monomial expressions over loop variables;
- it processes the list of input goals to identify which higher-order moments need to be computed.

For our demo execution (1), MORA extracts the initial value x(0) = 0, where x(0) denotes the value of x before the loop. Using the input goals specified in (1), MORA is set to compute the expected values of {u, g, x, y, u^2, g^2, x^2, y^2}. These characterize the first and second moments of all loop variables of Figure 1. Further, MORA rewrites the loop updates of Figure 1 into the following equations over expected values:

$$\begin{cases} E[x^k(n+1)] = E\left[\tfrac{1}{2}\,(x(n) - u(n+1))^k + \tfrac{1}{2}\,(x(n) + u(n+1))^k\right] \\ E[y^k(n+1)] = E\left[(y(n) + x(n+1) + g(n+1))^k\right] \end{cases} \tag{2}$$

where n ≥ 0 is the loop counter of Figure 1, x(n) denotes the value of x at the n-th loop iteration, and E[expr] is the expected value of an expression expr.
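As a worked instance of (2), take k = 2 for x and expand both branches. The cross terms ±2x(n)u(n+1) cancel, leaving x(n)^2 + u(n+1)^2. Since u is drawn afresh from the same distribution in every iteration, E[u(n+1)^2] = b^2/3, the second moment of u (cf. the equation u**2 = b**2/3 in (3)). This yields the recurrence for E[x^2]:

$$\begin{aligned} E[x(n+1)^2] &= E\left[\tfrac{1}{2}\,(x(n) - u(n+1))^2 + \tfrac{1}{2}\,(x(n) + u(n+1))^2\right] \\ &= E\left[x(n)^2 + u(n+1)^2\right] \\ &= E[x(n)^2] + \tfrac{b^2}{3}. \end{aligned}$$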

*MORA – Core.* After rewriting probabilistic loop updates into equations over expected values, MORA turns these equations into non-probabilistic recurrences over so-called E-variables, with the loop counter n as the recurrence index. E-variables are simply variables created from monomials over the original variables. Thanks to the restrictions that make a PP Prob-solvable, the resulting recurrences are linear recurrences with constant coefficients, that is, C-finite recurrences, whose closed forms can always be computed [8]. MORA solves these recurrences by calling its *Solver* module.

Using the equations (2) over expected values, MORA generates the following non-probabilistic recurrences for Figure 1, written in MORA syntax:

$$\begin{aligned} \texttt{y} &= \texttt{x + y} \\ \texttt{g**2} &= \texttt{1} \\ \texttt{x} &= \texttt{x} \\ \texttt{u} &= \texttt{b/2} \\ \texttt{x**2} &= \texttt{b**2/3 + x**2} \\ \texttt{u**2} &= \texttt{b**2/3} \\ \texttt{y**2} &= \texttt{b**2/3 + x**2 + 2*x*y + y**2 + 1} \\ \texttt{g} &= \texttt{0} \\ \texttt{x*y} &= \texttt{b**2/3 + x**2 + x*y} \end{aligned} \tag{3}$$

The left-hand sides of these equations represent values of E-variables at iteration n+1, while monomials over original variables on the right-hand sides represent E-variables at iteration n. For example, the first equation of (3) stands for E[y(n+1)] = E[x(n)] + E[y(n)]. Similarly, the equation for x**2 in (3) represents E[x(n+1)^2] = b^2/3 + E[x(n)^2]. Here b is a constant parameter, and x**k denotes the k-th power of x in Python syntax.
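Because these recurrences are C-finite, simple ones can be unrolled by hand. The following derivation is shown for illustration only; MORA obtains such closed forms automatically via diofant. With x(0) = 0, the x**2 and x*y equations of (3) unroll to the closed forms for E[x^2] and E[y^1x^1] listed in (4):

$$\begin{aligned} E[x(n)^2] &= E[x(0)^2] + \sum_{i=0}^{n-1} \frac{b^2}{3} = \frac{b^2 n}{3}, \\ E[x(n)y(n)] &= E[x(0)y(0)] + \sum_{i=0}^{n-1} \left(E[x(i)^2] + \frac{b^2}{3}\right) = \sum_{i=0}^{n-1} \frac{b^2 (i+1)}{3} = \frac{b^2 n (n+1)}{6}. \end{aligned}$$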

*MORA – Solver.* In this module, MORA extracts and solves recurrences from the non-probabilistic equations over E-variables computed by its *Core* module. By exploiting the structure of Prob-solvable programs, MORA also optimizes the order in which recurrences are solved; for example, independent recurrences are solved first. Their solutions can then be used to reduce the complexity of the remaining recurrences. MORA uses the diofant library to handle and solve individual recurrences.
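Solving independent recurrences first can be sketched as a topological sort of the dependency graph between E-variables. The following Python code is a hypothetical helper, not MORA's implementation. It works as follows:

- it reads the dependencies off the right-hand sides of (3);
- it ignores self-dependencies, since a variable's own previous value is part of its recurrence;
- it returns an order in which every E-variable comes after the E-variables it depends on.

```python
def solving_order(deps):
    """Order E-variables so that each one follows every E-variable it depends on.

    `deps` maps an E-variable to the E-variables on the right-hand side of its
    recurrence. Self-dependencies are ignored; any other cycle raises ValueError.
    """
    order, state = [], {}  # state: 1 = being visited, 2 = finished

    def visit(var):
        if state.get(var) == 2:
            return
        if state.get(var) == 1:
            raise ValueError(f"cyclic dependency involving {var}")
        state[var] = 1
        for dep in sorted(deps.get(var, ())):
            if dep != var:
                visit(dep)
        state[var] = 2
        order.append(var)

    for var in deps:
        visit(var)
    return order

# Right-hand-side dependencies of the recurrences (3) for Figure 1.
DEPS = {
    "y": {"x", "y"},
    "g**2": set(),
    "x": {"x"},
    "u": set(),
    "x**2": {"x**2"},
    "u**2": set(),
    "y**2": {"x**2", "x*y", "y**2"},
    "g": set(),
    "x*y": {"x**2", "x*y"},
}

print(solving_order(DEPS))
```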

For Figure 1, using the E-variable equations of (3), the following closed form solutions are computed by MORA:

$$\begin{array}{l} E[u^2] = \frac{b^2}{3} \\ E[x^1] = 0 \\ E[y^1] = y(0) \\ E[x^2] = \frac{b^2 n}{3} \\ E[u^1] = \frac{b}{2} \\ E[y^1x^1] = \frac{b^2 n}{6}(n+1) \\ E[y^2] = \frac{n}{18}\left(2b^2n^2 + 3b^2n + b^2 + 18\right) + y(0)^2 \\ E[g^1] = 0 \\ E[g^2] = 1 \end{array} \tag{4}$$

with y(0) standing for the initial value of y (treated as a parameter, since not specified).
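The closed forms in (4) can be sanity-checked against the recurrences (3) by iterating the recurrences with exact rational arithmetic. The following Python sketch does this for sample values of b and y(0), chosen arbitrarily rather than taken from the paper. It assumes the initial values x(0) = 0 and a deterministic y(0), so that E[x^2] and E[xy] start at 0 and E[y^2] starts at y(0)^2. The helper names are hypothetical.

```python
from fractions import Fraction

def iterate_recurrences(b, y0, n):
    """Iterate the E-variable recurrences (3) n times from the initial values."""
    b, y0 = Fraction(b), Fraction(y0)
    Eu, Eu2, Eg, Eg2 = b / 2, b**2 / 3, Fraction(0), Fraction(1)  # constant moments
    Ex, Ey, Ex2, Exy, Ey2 = Fraction(0), y0, Fraction(0), Fraction(0), y0**2
    for _ in range(n):
        # All right-hand sides use the values at iteration n (simultaneous update).
        Ex, Ey, Ex2, Exy, Ey2 = (
            Ex,
            Ex + Ey,
            b**2 / 3 + Ex2,
            b**2 / 3 + Ex2 + Exy,
            b**2 / 3 + Ex2 + 2 * Exy + Ey2 + 1,
        )
    return {"u": Eu, "u**2": Eu2, "g": Eg, "g**2": Eg2,
            "x": Ex, "y": Ey, "x**2": Ex2, "x*y": Exy, "y**2": Ey2}

def closed_forms(b, y0, n):
    """The closed forms (4), evaluated at concrete b, y(0) and n."""
    b, y0 = Fraction(b), Fraction(y0)
    return {"u": b / 2, "u**2": b**2 / 3, "g": Fraction(0), "g**2": Fraction(1),
            "x": Fraction(0), "y": y0, "x**2": b**2 * n / 3,
            "x*y": b**2 * n * (n + 1) / 6,
            "y**2": Fraction(n, 18) * (2 * b**2 * n**2 + 3 * b**2 * n + b**2 + 18) + y0**2}

for n in range(8):
    assert iterate_recurrences(5, 2, n) == closed_forms(5, 2, n), n
print("recurrences (3) agree with closed forms (4) for n = 0..7")
```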

*MORA – Out Parser.* MORA's output consists of basic information about the program and the goal, the computed moment-based invariants, and the computation time. By default, the output is only shown on the screen; an optional argument specifies whether an output file should be created. Two output formats are supported:

- (i) "txt", producing a simple human-readable file;
- (ii) "tex", producing a file with invariants in LaTeX format (as given in (4) above).

## 5 Evaluation

A proof-of-concept implementation, together with initial experiments, was already presented in our work on generating moment-based invariants [2]. MORA, however, comes with a new design and re-implementation of [2], significantly improving the experimental setting and evaluations of [2]. Table 1 compares MORA against the experiments of [2] on a subset of Prob-solvable loops from [2], showing that MORA is faster than our initial proof-of-concept implementation. This is due to the following reasons:




## 6 Conclusion

We described MORA, a fully automated tool for generating invariants of probabilistic programs. MORA combines recurrence solving, symbolic summation and statistical reasoning, and derives higher-order moments of loop variables in probabilistic programs.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Author Index

Abate, Alessandro I-97 Afzal, Mohammad II-383 Ahmed, Daniele I-97 Akshay, S. I-387 Albert, Elvira II-118 Almaawi, Alyas I-115 Almeida, Bernardo II-39 An, Jie I-444 Angluin, Dana II-325 Ayaziová, Paulína II-413 Babu M, Charles II-383 Baier, Christel I-324 Barreto, Raimundo II-403 Barrett, Clark I-367 Bartocci, Ezio I-492 Becker, Benedikt II-235 Bendík, Jaroslav I-135 Benerecetti, Massimo II-289 Berger, Philipp I-40 Beyer, Dirk I-3, II-126, II-347 Biagi, Marco I-463 Bian, Jinting II-217 Bockenek, Joshua A. II-98 Boender, Jaap II-271 Bornat, Richard II-271 Bozga, Marius I-228 Budde, Carlos E. I-463, I-483 Castro, David II-278 Celik, Ahmet II-137 Černá, Ivana I-135 Chakraborty, Supratik I-22, II-383 Chalupa, Marek II-413 Chauhan, Avriti II-383 Chen, Mingshuai I-444 Chimdyalwar, Bharti II-383 Cimatti, Alessandro I-155 Cordeiro, Lucas C. II-403 Correas, Jesús II-118 Cubuktepe, Murat I-287

D'Argenio, Pedro R. I-463 Dangl, Matthias I-3 Darke, Priyanka II-383 de Boer, Frank S. II-217 de Gouw, Stijn II-217 Delgrange, Florent I-346 Dell'Erba, Daniele II-289 Deng, Yuxin II-21 Dietsch, Daniel II-418 Dixon, Alex I-405 Du, Wenjie II-21 Dubut, Jérémy I-191 Esparza, Javier I-228 Fan, Chuchu I-173 Fedyukovich, Grigory II-195 Ferreira, Francisco II-278 Fisman, Dana II-325 Frenkel, Hadar I-211 Frohn, Florian I-58 Funke, Florian I-324 Furbach, Florian II-378 Gastin, Paul I-387 Geatti, Luca I-155 Geldenhuys, Jaco II-373 Giacobbe, Mirco II-79 Gligoric, Milos II-137 Goel, Aman I-413 Gordillo, Pablo II-118 Griggio, Alberto I-155 Groote, Jan Friso II-3 Grumberg, Orna I-211 Gupta, Aarti II-195 Gupta, Ashutosh I-22, II-383

Hahn, Ernst Moritz I-306 Hamers, Ruben I-266 Hasuo, Ichiro I-191 Heizmann, Matthias II-418 Heljanko, Keijo II-378 Henzinger, Thomas A. II-79 Hiep, Hans-Dieter A. II-217 Howar, Falk II-398 Hruška, Martin II-413 Huisman, Marieke I-247 Hussein, Soha II-393

Iosif, Radu I-228

Jansen, David N. II-3 Jansen, Nils I-287 Jantsch, Simon I-324 Jašek, Tomáš II-413 Jeannerod, Nicolas II-235 Jongmans, Sung-Shik I-266 Joosten, Sebastiaan J. C. I-247 Junges, Sebastian I-287

Kammueller, Florian II-271 Kápl, Roman II-254 Katoen, Joost-Pieter I-40, I-287, I-346 Katsumata, Shin-ya I-191 Keiren, Jeroen J. A. II-3 Khurshid, Sarfraz I-115 Kimberly, Greg I-155 King, Andy I-79 Kobayashi, Naoki II-195 Kolčák, Juraj I-191 Kovács, Laura I-492 Krishna, S I-387 Kumar, Shrawan II-383

Lang, Frédéric II-57 Lazić, Ranko I-405 Lechner, Mathias II-79 Lochmann, Alexander II-178

Maathuis, Olaf II-217 Madhusudan, P. II-158 Malík, Viktor II-368 Mann, Makai I-367 Manolios, Panagiotis II-388 Marché, Claude II-235 Mateescu, Radu II-57 Mathur, Umang II-158 Mazzanti, Franco II-57 McCamant, Stephen II-393 Meel, Kuldeep S. I-115

Menezes, Rafael II-403 Meyer, Roland II-378 Middeldorp, Aart II-178 Mitra, Sayan I-173 Mogavero, Fabio II-289 Mokhlesi, Navid I-173 Monti, Raúl E. I-463 Mordido, Andreia II-39 Mues, Malte II-398 Mutius, Joshua von I-425

Nagarajan, Rajagopal II-271 Neele, Thomas II-307 Nutz, Alexander II-418

Okudono, Takamasa I-79 Oortwijn, Wytse I-247

Palmskog, Karl II-137 Parízek, Pavel II-254 Pasareanu, Corina I-211 Perez, Mateo I-306 Peringer, Petr II-408 Peruffo, Andrea I-97 Poly, Guillaume II-271 Ponce-de-León, Hernán II-378

Qin, Xudong II-21 Quatmann, Tim I-346 Quiring, Benjamin II-388

Randour, Mickael I-346 Ravindran, Binoy II-98 Régis-Gianas, Yann II-235 Rocha, Herbert II-403 Román-Díez, Guillermo II-118 Roychowdhury, Sparsa I-387 Rubio, Albert II-118

Sakallah, Karem I-413 Schätzle, Claus II-418 Schewe, Sven I-306 Schrammel, Peter II-368 Schüssele, Frank II-418 Sharma, Vaibhav II-393 Sheinvald, Sarai I-211 Shoval, Yaara II-325 Sibai, Hussein I-173 Sifakis, Joseph I-228

Sighireanu, Mihaela II-235 Šoková, Veronika II-408, II-413 Somenzi, Fabio I-306 Sprunger, David I-191 Stankovič, Miroslav I-492 Stoelinga, Mariëlle I-463 Strejček, Jan II-413 Švejda, Jan I-40

Tomovič, Lukáš II-413 Tonetta, Stefano I-155 Topcu, Ufuk I-287 Treinen, Ralf II-235 Trivedi, Ashutosh I-306

Unadkat, Divyesh I-22, II-383 Usman, Muhammad I-115

van de Pol, Jaco I-247 van Eekelen, Marko II-217 Vasconcelos, Vasco T. II-39 Venkatesh, R II-383

Verbeek, Freek II-98 Visser, Willem II-373, II-393 Viswanathan, Mahesh II-158 Vojnar, Tomáš II-368, II-408, II-413

Wang, Kaiyuan I-115 Wang, Wenxi I-115 Welzel, Christoph I-228 Wendler, Philipp II-126 Wesselink, Wieger II-307 Whalen, Michael W. II-393 Wijs, Anton II-3 Willemse, Tim A. C. II-307 Wimmer, Simon I-425 Wojtczak, Dominik I-306

Yamada, Akihisa I-191 Yoshida, Nobuko II-278

Zhan, Bohua I-444 Zhan, Naijun I-444 Zhang, Miaomiao I-444