**Constantin Enea · Akash Lal (Eds.)**

# LNCS 13964

# **Computer Aided Verification**

**35th International Conference, CAV 2023 Paris, France, July 17–22, 2023 Proceedings, Part I**

# **Lecture Notes in Computer Science 13964**

Founding Editors

Gerhard Goos · Juris Hartmanis

#### Editorial Board Members

Elisa Bertino, *Purdue University, West Lafayette, IN, USA*
Wen Gao, *Peking University, Beijing, China*
Bernhard Steffen, *TU Dortmund University, Dortmund, Germany*
Moti Yung, *Columbia University, New York, NY, USA*

The series Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI), has established itself as a medium for the publication of new developments in computer science and information technology research, teaching, and education.

LNCS enjoys close cooperation with the computer science R & D community: the series counts many renowned academics among its volume editors and paper authors, and collaborates with prestigious societies. Its mission is to serve this international community by providing an invaluable service, mainly focused on the publication of conference and workshop proceedings and postproceedings. LNCS commenced publication in 1973.


*Editors*

Constantin Enea
LIX, Ecole Polytechnique, CNRS and Institut Polytechnique de Paris, Palaiseau, France

Akash Lal
Microsoft Research, Bangalore, India

ISSN 0302-9743 · ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-031-37705-1 · ISBN 978-3-031-37706-8 (eBook)
https://doi.org/10.1007/978-3-031-37706-8

© The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access publication.

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

### **Preface**

It was our privilege to serve as the program chairs for CAV 2023, the 35th International Conference on Computer-Aided Verification. CAV 2023 was held during July 19–22, 2023 and the pre-conference workshops were held during July 17–18, 2023. CAV 2023 was an in-person event, in Paris, France.

CAV is an annual conference dedicated to the advancement of the theory and practice of computer-aided formal analysis methods for hardware and software systems. The primary focus of CAV is to extend the frontiers of verification techniques by expanding to new domains such as security, quantum computing, and machine learning. This puts CAV at the cutting edge of formal methods research, and this year's program is a reflection of this commitment.

CAV 2023 received a large number of submissions (261). We accepted 15 tool papers, 3 case-study papers, and 49 regular papers, which amounts to an acceptance rate of roughly 26%. The accepted papers cover a wide spectrum of topics, from theoretical results to applications of formal methods. These papers apply or extend formal methods to a wide range of domains such as concurrency, machine learning and neural networks, quantum systems, as well as hybrid and stochastic systems. The program featured keynote talks by Ruzica Piskac (Yale University), Sumit Gulwani (Microsoft), and Caroline Trippel (Stanford University). In addition to the contributed talks, CAV also hosted the CAV Award ceremony, and a report from the Synthesis Competition (SYNTCOMP) chairs.

In addition to the main conference, CAV 2023 hosted the following workshops: Meeting on String Constraints and Applications (MOSCA), Verification Witnesses and Their Validation (VeWit), Verification of Probabilistic Programs (VeriProP), Open Problems in Learning and Verification of Neural Networks (WOLVERINE), Deep Learning-aided Verification (DAV), Hyperproperties: Advances in Theory and Practice (HYPER), Synthesis (SYNT), Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), and Verification Mentoring Workshop (VMW). CAV 2023 also hosted a workshop dedicated to Thomas A. Henzinger for his 60th birthday.

Organizing a flagship conference like CAV requires a great deal of effort from the community. The Program Committee for CAV 2023 consisted of 76 members; a committee of this size ensures that each member has to review only a reasonable number of papers in the allotted time. In all, the committee members wrote over 730 reviews while investing significant effort to maintain and ensure the high quality of the conference program. We are grateful to the CAV 2023 Program Committee for their outstanding efforts in evaluating the submissions and making sure that each paper got a fair chance. As in recent years at CAV, we made artifact evaluation mandatory for tool paper submissions, but optional for the rest of the accepted papers. This year we received 48 artifact submissions, out of which 47 received at least one badge. The Artifact Evaluation Committee consisted of 119 members who put in significant effort to evaluate each artifact. The goal of this process was to provide constructive feedback to tool developers and help make the research published in CAV more reproducible. We are also very grateful to the Artifact Evaluation Committee for their hard work and dedication in evaluating the submitted artifacts.

CAV 2023 would not have been possible without the tremendous help we received from several individuals, and we would like to thank everyone who helped make CAV 2023 a success. We would like to thank Alessandro Cimatti, Isil Dillig, Javier Esparza, Azadeh Farzan, Joost-Pieter Katoen and Corina Pasareanu for serving as area chairs. We also thank Bernhard Kragl and Daniel Dietsch for chairing the Artifact Evaluation Committee. We also thank Mohamed Faouzi Atig for chairing the workshop organization as well as leading publicity efforts, Eric Koskinen as the fellowship chair, Sébastien Bardin and Ruzica Piskac as sponsorship chairs, and Srinidhi Nagendra as the website chair. Srinidhi, along with Enrique Román Calvo, helped prepare the proceedings. We also thank Ankush Desai, Eric Koskinen, Burcu Kulahcioglu Ozkan, Marijana Lazic, and Matteo Sammartino for chairing the mentoring workshop. Last but not least, we would like to thank the members of the CAV Steering Committee (Kenneth McMillan, Aarti Gupta, Orna Grumberg, and Daniel Kroening) for helping us with several important aspects of organizing CAV 2023.

We hope that you will find the proceedings of CAV 2023 scientifically interesting and thought-provoking!

June 2023

Constantin Enea
Akash Lal

## **Organization**

### **Conference Co-chairs**
Constantin Enea LIX, Ecole Polytechnique, CNRS and Institut Polytechnique de Paris, France
Akash Lal Microsoft Research, India


### **Artifact Co-chairs**
Bernhard Kragl
Daniel Dietsch


### **Workshop Chair**


Mohamed Faouzi Atig Uppsala University, Sweden

### **Verification Mentoring Workshop Organizing Committee**
Ankush Desai
Eric Koskinen
Burcu Kulahcioglu Ozkan
Marijana Lazic
Matteo Sammartino


### **Fellowship Chair**
Eric Koskinen


### **Website Chair**
Srinidhi Nagendra


### **Sponsorship Co-chairs**
Sébastien Bardin
Ruzica Piskac


### **Proceedings Chairs**
Srinidhi Nagendra
Enrique Román Calvo


### **Program Committee**

Aarti Gupta Princeton University, USA
Abhishek Bichhawat IIT Gandhinagar, India
Aditya V. Thakur University of California, USA
Ahmed Bouajjani University of Paris, France
Aina Niemetz Stanford University, USA
Akash Lal Microsoft Research, India
Alan J. Hu University of British Columbia, Canada
Alessandro Cimatti Fondazione Bruno Kessler, Italy
Alexander Nadel Intel, Israel
Anastasia Mavridou KBR, NASA Ames Research Center, USA
Andreas Podelski University of Freiburg, Germany
Ankush Desai Amazon Web Services, USA
Anna Slobodova Intel, USA
Anthony Widjaja Lin TU Kaiserslautern and Max-Planck Institute for Software Systems, Germany
Arie Gurfinkel University of Waterloo, Canada
Arjun Radhakrishna Microsoft, India
Aws Albarghouthi University of Wisconsin-Madison, USA
Azadeh Farzan University of Toronto, Canada
Bernd Finkbeiner CISPA Helmholtz Center for Information Security, Germany
Bettina Koenighofer Graz University of Technology, Austria
Bor-Yuh Evan Chang University of Colorado Boulder and Amazon, USA
Burcu Kulahcioglu Ozkan Delft University of Technology, The Netherlands
Caterina Urban Inria and École Normale Supérieure, France
Cezara Dragoi Amazon Web Services, USA
Christoph Matheja Technical University of Denmark, Denmark
Claudia Cauli Amazon Web Services, UK
Constantin Enea LIX, CNRS, Ecole Polytechnique, France
Corina Pasareanu CMU, USA
Cristina David University of Bristol, UK
Dirk Beyer LMU Munich, Germany
Elizabeth Polgreen University of Edinburgh, UK
Elvira Albert Complutense University, Spain
Eunsuk Kang Carnegie Mellon University, USA
Gennaro Parlato University of Molise, Italy
Hossein Hojjat Tehran University and Tehran Institute of Advanced Studies, Iran
Ichiro Hasuo National Institute of Informatics, Japan
Isil Dillig University of Texas, Austin, USA
Javier Esparza Technische Universität München, Germany
Joost-Pieter Katoen RWTH Aachen University, Germany
Juneyoung Lee AWS, USA
Jyotirmoy Deshmukh University of Southern California, USA
Kenneth L. McMillan University of Texas at Austin, USA
Kristin Yvonne Rozier Iowa State University, USA
Kshitij Bansal Google, USA
Kuldeep Meel National University of Singapore, Singapore
Kyungmin Bae POSTECH, South Korea
Marcell Vazquez-Chanlatte Alliance Innovation Lab (Nissan-Renault-Mitsubishi), USA
Marieke Huisman University of Twente, The Netherlands
Markus Rabe Google, USA
Marta Kwiatkowska University of Oxford, UK
Matthias Heizmann University of Freiburg, Germany
Michael Emmi AWS, USA
Mihaela Sighireanu University Paris Saclay, ENS Paris-Saclay and CNRS, France
Mohamed Faouzi Atig Uppsala University, Sweden
Naijun Zhan Institute of Software, Chinese Academy of Sciences, China
Nikolaj Bjorner Microsoft Research, USA
Nina Narodytska VMware Research, USA
Pavithra Prabhakar Kansas State University, USA
Pierre Ganty IMDEA Software Institute, Spain
Rupak Majumdar Max Planck Institute for Software Systems, Germany
Ruzica Piskac Yale University, USA
Sebastian Junges Radboud University, The Netherlands
Sébastien Bardin CEA, LIST, Université Paris Saclay, France
Serdar Tasiran Amazon, USA
Sharon Shoham Tel Aviv University, Israel
Shaz Qadeer Meta, USA
Shuvendu Lahiri Microsoft Research, USA
Subhajit Roy Indian Institute of Technology, Kanpur, India
Suguman Bansal Georgia Institute of Technology, USA
Swarat Chaudhuri UT Austin, USA
Sylvie Putot École Polytechnique, France
Thomas Wahl GrammaTech, USA
Tomáš Vojnar Brno University of Technology, FIT, Czech Republic
Yakir Vizel Technion - Israel Institute of Technology, Israel
Yu-Fang Chen Academia Sinica, Taiwan
Zhilin Wu State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China

### **Artifact Evaluation Committee**

Alejandro Hernández-Cerezo Complutense University of Madrid, Spain
Alvin George IISc Bangalore, India
Aman Goel Amazon Web Services, USA
Amit Samanta University of Utah, USA
Anan Kabaha Technion, Israel
Andres Noetzli Cubist, Inc., USA
Anna Becchi Fondazione Bruno Kessler, Italy
Arnab Sharma University of Oldenburg, Germany
Avraham Raviv Bar Ilan University, Israel
Ayrat Khalimov TU Clausthal, Germany
Baoluo Meng General Electric Research, USA
Benjamin Jones Amazon Web Services, USA
Bohua Zhan Institute of Software, Chinese Academy of Sciences, China
Cayden Codel Carnegie Mellon University, USA
Charles Babu M. CEA LIST, France
Chungha Sung Amazon Web Services, USA
Clara Rodriguez-Núñez Universidad Complutense de Madrid, Spain
Cyrus Liu Stevens Institute of Technology, USA
Daniel Hausmann University of Gothenburg, Sweden
Daniela Kaufmann TU Wien, Austria
Debasmita Lohar MPI SWS, Germany
Deivid Vale Radboud University Nijmegen, Netherlands
Denis Mazzucato Inria, France
Đorđe Žikelić Institute of Science and Technology Austria, Austria
Ekanshdeep Gupta New York University, USA
Enrico Magnago Amazon Web Services, USA
Ferhat Erata Yale University, USA
Filip Cordoba Graz University of Technology, Austria
Filipe Arruda UFPE, Brazil
Florian Dorfhuber Technical University of Munich, Germany
Florian Sextl TU Wien, Austria
Francesco Parolini Sorbonne University, France
Frédéric Recoules CEA LIST, France
Goktug Saatcioglu Cornell, USA
Goran Piskachev Amazon Web Services, USA
Grégoire Menguy CEA LIST, France
Guy Amir Hebrew University of Jerusalem, Israel
Habeeb P. Indian Institute of Science, Bangalore, India
Hadrien Renaud UCL, UK
Haoze Wu Stanford University, USA
Hari Krishnan University of Waterloo, Canada
Hünkar Tunç Aarhus University, Denmark
Idan Refaeli Hebrew University of Jerusalem, Israel
Ignacio D. Lopez-Miguel TU Wien, Austria
Ilina Stoilkovska Amazon Web Services, USA
Ira Fesefeldt RWTH Aachen University, Germany
Jahid Choton Kansas State University, USA
Jie An National Institute of Informatics, Japan
John Kolesar Yale University, USA
Joseph Scott University of Waterloo, Canada
Kevin Lotz Kiel University, Germany
Kirby Linvill CU Boulder, USA
Kush Grover Technical University of Munich, Germany
Levente Bajczi Budapest University of Technology and Economics, Hungary
Liangcheng Yu University of Pennsylvania, USA
Luke Geeson UCL, UK
Lutz Klinkenberg RWTH Aachen University, Germany
Marek Chalupa Institute of Science and Technology Austria, Austria
Mario Bucev EPFL, Switzerland
Mário Pereira NOVA LINCS, Nova School of Science and Technology, Portugal
Marius Mikucionis Aalborg University, Denmark
Martin Jonáš Masaryk University, Czech Republic
Mathias Fleury University of Freiburg, Germany
Matthias Hetzenberger TU Wien, Austria
Maximilian Heisinger Johannes Kepler University Linz, Austria
Mertcan Temel Intel Corporation, USA
Michele Chiari TU Wien, Austria
Miguel Isabel Universidad Complutense de Madrid, Spain
Mihai Nicola Stevens Institute of Technology, USA
Mihály Dobos-Kovács Budapest University of Technology and Economics, Hungary
Mikael Mayer Amazon Web Services, USA
Mitja Kulczynski Kiel University, Germany
Muhammad Mansur Amazon Web Services, USA
Muqsit Azeem Technical University of Munich, Germany
Neelanjana Pal Vanderbilt University, USA
Nicolas Koh Princeton University, USA
Niklas Metzger CISPA Helmholtz Center for Information Security, Germany
Omkar Tuppe IIT Bombay, India
Pablo Gordillo Complutense University of Madrid, Spain
Pankaj Kalita Indian Institute of Technology, Kanpur, India
Parisa Fathololumi Stevens Institute of Technology, USA
Pavel Hudec HKUST, Hong Kong, China
Peixin Wang University of Oxford, UK
Philippe Heim CISPA Helmholtz Center for Information Security, Germany
Pritam Gharat Microsoft Research, India
Priyanka Darke TCS Research, India
Ranadeep Biswas Informal Systems, Canada
Robert Rubbens University of Twente, Netherlands
Rubén Rubio Universidad Complutense de Madrid, Spain
Samuel Judson Yale University, USA
Samuel Pastva Institute of Science and Technology Austria, Austria
Sankalp Gambhir EPFL, Switzerland
Sarbojit Das Uppsala University, Sweden
Sascha Klüppelholz Technische Universität Dresden, Germany
Sean Kauffman Aalborg University, Denmark


### **Additional Reviewers**

Azzopardi, Shaun
Baier, Daniel
Belardinelli, Francesco
Bergstraesser, Pascal
Boker, Udi
Ceska, Milan
Chien, Po-Chun
Coglio, Alessandro
Correas, Jesús
Doveri, Kyveli
Drachsler Cohen, Dana
Durand, Serge
Fried, Dror
Genaim, Samir
Ghosh, Bishwamittra
Gordillo, Pablo
Gómez-Zamalloa, Miguel
Hernández-Cerezo, Alejandro
Holík, Lukáš
Isabel, Miguel
Ivrii, Alexander
Izza, Yacine
Jothimurugan, Kishor
Kaivola, Roope
Kaminski, Benjamin Lucien
Kettl, Matthias
Kretinsky, Jan
Lengal, Ondrej
Losa, Giuliano
Luo, Ning
Malik, Viktor
Markgraf, Oliver
Martin-Martin, Enrique
Meller, Yael
Perez, Mateo
Petri, Gustavo
Pote, Yash
Preiner, Mathias
Rakamaric, Zvonimir
Rastogi, Aseem
Razavi, Niloofar
Rogalewicz, Adam
Román Díez, Guillermo
Sangnier, Arnaud
Sarkar, Uddalok
Schoepe, Daniel
Sergey, Ilya
Stoilkovska, Ilina
Stucki, Sandro
Tsai, Wei-Lun
Turrini, Andrea
Vafeiadis, Viktor
Valiron, Benoît
Wachowitz, Henrik
Wang, Chao
Wang, Yuepeng
Wies, Thomas
Yang, Jiong
Yen, Di-De
Zhu, Shufang
Žikelić, Đorđe
Zohar, Yoni

# **Invited Talks**

### **Privacy-Preserving Automated Reasoning**

Ruzica Piskac

#### Yale University, USA

Formal methods offer a vast collection of techniques to analyze and ensure the correctness of software and hardware systems against a given specification. In fact, modern formal methods tools scale to industrial applications. Despite this significant success, privacy requirements are not considered in the design of these tools. For example, when using automated reasoning tools, the implicit requirement is that the formula to be proved is public. This raises an issue if the formula itself reveals information that is supposed to remain private to one party. To overcome this issue, we propose the concept of privacy-preserving automated reasoning.

We first consider the problem of privacy-preserving Boolean satisfiability [1]. In this problem, two mutually distrustful parties each provides a Boolean formula. The goal is to decide whether their conjunction is satisfiable without revealing either formula to the other party. We present an algorithm to solve this problem. Our algorithm is an oblivious variant of the classic DPLL algorithm and can be integrated with existing secure two-party computation techniques.
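For readers unfamiliar with DPLL, the classic (non-oblivious) procedure can be sketched in a few lines. The sketch below is purely illustrative and is not the algorithm of [1]: the privacy-preserving variant evaluates essentially these steps under secure two-party computation, with all branching made oblivious.

```python
def dpll(clauses, assignment=None):
    """Decide satisfiability of a CNF formula given as a list of integer
    clauses (DIMACS style: 3 means x3, -3 means "not x3").  Returns a
    satisfying assignment {var: bool}, or None if the formula is UNSAT."""
    if assignment is None:
        assignment = {}

    def simplify(clauses, lit):
        """Assign literal lit to true: drop satisfied clauses, shrink the rest."""
        out = []
        for clause in clauses:
            if lit in clause:
                continue                      # clause satisfied
            reduced = [l for l in clause if l != -lit]
            if not reduced:
                return None                   # empty clause: conflict
            out.append(reduced)
        return out

    # Unit propagation: a one-literal clause forces its literal.
    while True:
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        assignment[abs(units[0])] = units[0] > 0
        clauses = simplify(clauses, units[0])
        if clauses is None:
            return None
    if not clauses:
        return assignment                     # every clause satisfied

    # Branch on the first literal of the first clause, trying both polarities.
    lit = clauses[0][0]
    for choice in (lit, -lit):
        sub = simplify(clauses, choice)
        if sub is not None:
            extended = dict(assignment)
            extended[abs(choice)] = choice > 0
            result = dpll(sub, extended)
            if result is not None:
                return result
    return None
```

For example, `dpll([[1, 2], [-1]])` returns the model `{1: False, 2: True}`, while `dpll([[1], [-1]])` returns `None`.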

We next turn to the problem where one party wants to prove to another party that their program satisfies a given specification without revealing the program. We split this problem into two subproblems: (1) proving that the program can be translated into a propositional formula without revealing either the program or the formula; (2) proving that the obtained formula entails the specification. To solve the latter subproblem, we developed a zero-knowledge protocol for proving the unsatisfiability of formulas in propositional logic [2] (ZKUNSAT). Our protocol is based on a resolution proof of unsatisfiability. We encode verification of the resolution proof using polynomial equivalence checking, which enables us to use fast zero-knowledge protocols for polynomial satisfiability.
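In the clear, the object that ZKUNSAT verifies is an ordinary resolution refutation. The following sketch (our own encoding, not the paper's polynomial formulation) shows the check that the zero-knowledge protocol performs on committed data:

```python
def resolve(c1, c2, pivot):
    """Resolve clauses c1 and c2 (sets of integer literals) on variable pivot,
    which must occur positively in c1 and negatively in c2."""
    assert pivot in c1 and -pivot in c2
    return (c1 - {pivot}) | (c2 - {-pivot})

def check_refutation(clauses, steps):
    """Check a resolution refutation: each step (i, j, pivot) resolves two
    previously known clauses; the proof is valid if the last derived clause
    is the empty clause."""
    known = [frozenset(c) for c in clauses]
    for i, j, pivot in steps:
        known.append(frozenset(resolve(known[i], known[j], pivot)))
    return known[-1] == frozenset()

# {x1}, {not x1 or x2}, {not x2} is unsatisfiable; a two-step refutation:
cnf = [{1}, {-1, 2}, {-2}]
proof = [(0, 1, 1),   # resolve {x1} with {-x1, x2} on x1, deriving {x2}
         (2, 3, 2)]   # hypothetical indices; see usage below
```

A valid proof for `cnf` is `[(0, 1, 1), (3, 2, 2)]`: the first step derives `{2}` as clause 3, and the second resolves it against `{-2}` to derive the empty clause, so `check_refutation(cnf, [(0, 1, 1), (3, 2, 2)])` returns `True`. ZKUNSAT encodes exactly this clause-by-clause check as polynomial equivalence so it can be verified in zero knowledge.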

Finally, we will outline future directions towards extending ZKUNSAT to first-order logic modulo theories (SMT) and translating programs to formulas in zero-knowledge to realize fully automated privacy-preserving program verification.

#### **References**



2022 ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, Los Angeles, CA, USA, 7–11 November 2022, pp. 2203–2217. ACM (2022)

### **Enhancing Programming Experiences Using AI: Leveraging LLMs as Analogical Reasoning Engines and Beyond**

Sumit Gulwani

Microsoft, USA

AI can significantly improve programming experiences for a diverse range of users: from professional developers and data scientists (proficient programmers) who need help in software engineering and data wrangling, to spreadsheet users (low-code programmers) needing help in authoring formulas, and students (novice programmers) seeking hints when tackling programming homework. To effectively communicate their needs to AI, users can express their intent explicitly through input-output examples or natural language specifications, or implicitly by presenting bugs or recent code edits for AI to analyze and suggest improvements.

Analogical reasoning is at the heart of problem solving, as it allows us to make sense of new information and to transfer knowledge from one domain to another. In this talk, I will demonstrate that analogical reasoning is a fundamental emergent capability of Large Language Models (LLMs) and can be utilized to enhance various types of programming experiences.

However, there is significant room for innovation in building robust experiences tailored to specific task domains. I will discuss how various methods from symbolic AI (particularly programming-by-examples-or-analogies) such as search-and-rank, failure-guided refinement, and neuro-symbolic cooperation, can help fill this gap. This comes in three forms: (a) Prompt engineering that involves synthesizing specification-rich, context-aware prompts from various sources, sometimes using the LLM itself, to elicit optimal output. (b) Post-processing techniques that guide, rank, and validate the LLM's output, occasionally employing the LLM for these purposes. (c) Multi-turn workflows that involve multiple LLM invocations, allowing the model more time and iterations to optimize results. I will illustrate these concepts using various capabilities in Excel, PowerQuery, and Visual Studio.

### **Verified Software Security Down to Gates**

Caroline Trippel

Stanford University, USA

Hardware-software (HW-SW) contracts are critical for high-assurance computer systems design and an enabler for software design/analysis tools that find and repair hardware-related bugs in programs. E.g., memory consistency models define what values shared memory loads can return in a parallel program. Emerging security contracts define what program data is susceptible to leakage via hardware side-channels and what speculative control- and data-flow is possible at runtime. However, these contracts and the analyses they support are useless if we cannot guarantee microarchitectural compliance, which is a "grand challenge." Notably, some contracts are still evolving (e.g., security contracts), making hardware compliance a moving target. Even for mature contracts, comprehensively verifying that a complex microarchitecture implements some abstract contract is a time-consuming endeavor involving teams of engineers, which typically requires resorting to incomplete proofs.

Our work takes a radically different approach to the challenge above by synthesizing HW-SW contracts from advanced (i.e., industry-scale/complexity) processor implementations. In this talk, I will present our work on: synthesizing security contracts from processor specifications written in Verilog; designing compiler approaches parameterized by these contracts that can find and repair hardware-related vulnerabilities in programs; and updating hardware microarchitectures to support scalable verification and efficient security-hardened programs. I will conclude by outlining remaining challenges in attaining the vision of verified software security down to gates.

### **Contents – Part I**

#### **Automata and Logic**




### **Contents – Part II**

#### **Decision Procedures**


#### **Model Checking**




### **Contents – Part III**

#### **Probabilistic Systems**





**Automata and Logic**

# **Active Learning of Deterministic Timed Automata with Myhill-Nerode Style Characterization**

Masaki Waga

Graduate School of Informatics, Kyoto University, Kyoto, Japan mwaga@fos.kuis.kyoto-u.ac.jp

**Abstract.** We present an algorithm to learn a deterministic timed automaton (DTA) via membership and equivalence queries. Our algorithm is an extension of the L\* algorithm with a Myhill-Nerode style characterization of recognizable timed languages, which is the class of timed languages recognizable by DTAs. We first characterize the recognizable timed languages with a Nerode-style congruence. Using it, we give an algorithm with a smart teacher answering *symbolic* membership queries in addition to membership and equivalence queries. With a symbolic membership query, one can ask the membership of a certain set of timed words at one time. We prove that for any recognizable timed language, our learning algorithm returns a DTA recognizing it. We show how to answer a symbolic membership query with finitely many membership queries. We also show that our learning algorithm requires a polynomial number of queries with a smart teacher and an exponential number of queries with a normal teacher. We applied our algorithm to various benchmarks and confirmed its effectiveness with a normal teacher.

**Keywords:** timed automata · active automata learning · recognizable timed languages · L\* algorithm · observation table

### **1 Introduction**

*Active automata learning* is a class of methods to infer an automaton recognizing an unknown target language Ltgt ⊆ Σ<sup>∗</sup> through finitely many queries to a teacher. The L\* algorithm [8], the best-known active DFA learning algorithm, infers the minimum DFA recognizing Ltgt using *membership* and *equivalence* queries. In a membership query, the learner asks if a word w ∈ Σ<sup>∗</sup> is in the target language Ltgt, which is used to obtain enough information to construct a hypothesis DFA Ahyp. Using an equivalence query, the learner checks if the hypothesis Ahyp recognizes the target language Ltgt. If L(Ahyp) ≠ Ltgt, the teacher returns a counterexample *cex* ∈ Ltgt △ L(Ahyp) differentiating the target language and the current hypothesis. The learner uses *cex* to update Ahyp to classify *cex* correctly. Such a learning algorithm has been combined with formal verification, e. g., for testing [22,24,26,28] and controller synthesis [31].

**Fig. 1.** Illustration of observation tables in the L\* algorithm for DFA learning and in our algorithm for DTA learning. (a) A DFA A. (b) Intermediate observation tables for learning A: a and aa are deemed equivalent with extensions S = {ε, a} but distinguished with S = {ε, a, b}. (c) A DTA A′ with one clock variable c, whose edges are labeled with guards and resets such as a, c ≥ 1/c := 0 and a, c < 1. (d) Timed observation table for learning A′: each cell is indexed by a pair (p, s) ∈ P × S of elementary languages, and the cell indexed by (p, s) shows a constraint Λ such that w ∈ p · s satisfies w ∈ Ltgt if and only if Λ holds; the elementary languages p<sub>1</sub> and p<sub>2</sub> are deemed equivalent with the equation τ<sup>1</sup><sub>0</sub> + τ<sup>1</sup><sub>1</sub> = τ<sup>2</sup><sub>1</sub> + τ<sup>2</sup><sub>2</sub>, where τ<sup>j</sup><sub>i</sub> represents τ<sub>i</sub> in p<sub>j</sub>. (The diagrams themselves are not reproduced here.)

Most of the DFA learning algorithms rely on the characterization of regular languages by *Nerode's congruence*. For a language L, words p and p′ are equivalent if for any extension s, p · s ∈ L if and only if p′ · s ∈ L. It is well known that if L is regular, such an equivalence relation has finitely many classes, corresponding to the locations of the minimum DFA recognizing L (known as the *Myhill-Nerode theorem*; see, e. g., [18]). Moreover, for any regular language L, there are finite extensions S such that p and p′ are equivalent if and only if for any s ∈ S, p · s ∈ L if and only if p′ · s ∈ L. Therefore, one can learn the minimum DFA by learning such finite extensions S and the finite classes induced by Nerode's congruence.
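As a concrete illustration of the extension-based check, the equivalence of two prefixes under a finite extension set S can be phrased directly in code. The helpers `in_lang` and `equivalent` below are our own illustrative names, not from the paper:

```python
def in_lang(w):
    """Example regular language: words over {a, b} with an even number of a's."""
    return w.count("a") % 2 == 0

def equivalent(p1, p2, extensions, member):
    """p1 ~ p2 w.r.t. the finite extension set S approximating Nerode's
    congruence: no extension in S separates the two prefixes."""
    return all(member(p1 + s) == member(p2 + s) for s in extensions)

# With S = {"", "a"} the prefixes "" and "aa" fall into the same class,
# while "a" is separated from both.
S = ["", "a"]
assert equivalent("", "aa", S, in_lang)
assert not equivalent("", "a", S, in_lang)
```

For this language the congruence has exactly two classes (even/odd number of a's), matching the two states of its minimum DFA.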

The L\* algorithm learns the minimum DFA recognizing the target language Ltgt using a 2-dimensional array called an *observation table*. Figure 1b illustrates observation tables. The rows and columns of an observation table are indexed with finite sets of words P and S, respectively. Each cell indexed by (p, s) ∈ P × S shows if p · s ∈ Ltgt. The column indices S are the current extensions approximating Nerode's congruence. The L\* algorithm increases P and S until: 1) the equivalence relation defined by S converges to Nerode's congruence and 2) P covers all the classes induced by the congruence. The equivalence between p, p′ ∈ P under S can be checked by comparing the rows in the observation table indexed with p and p′. For example, Fig. 1b shows that a and aa are deemed equivalent with extensions S = {ε, a} but distinguished by adding b to S. The refinement of P and S is driven by certain conditions to validate the DFA construction and by addressing the counterexample obtained by an equivalence query.
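The growth of P can be made concrete with a schematic rendering of one table operation. The sketch below (our own code, not the author's implementation) extends P until the table is *closed*, i.e., the row of every one-letter extension of a prefix is already the row of some prefix in P:

```python
def row(p, suffixes, member):
    """The observation-table row of prefix p: membership of p + s for each s."""
    return tuple(member(p + s) for s in suffixes)

def close_table(prefixes, suffixes, alphabet, member):
    """Grow P until the table is closed: the row of every one-letter
    extension of a prefix in P is already represented by some prefix in P."""
    prefixes = list(prefixes)
    rows = {row(p, suffixes, member) for p in prefixes}
    queue = list(prefixes)
    while queue:
        p = queue.pop()
        for a in alphabet:
            r = row(p + a, suffixes, member)
            if r not in rows:       # unseen row: promote p + a into P
                prefixes.append(p + a)
                rows.add(r)
                queue.append(p + a)
    return prefixes

# For the language "even number of a's" with S = {"", "a"}, closing from {""}
# yields one prefix per Nerode class, i.e. per state of the minimum DFA.
member = lambda w: w.count("a") % 2 == 0
P = close_table([""], ["", "a"], "ab", member)
assert sorted(P) == ["", "a"]
```

Each distinct row then becomes a location of the hypothesis DFA, and the one-letter extensions define its transitions.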

*Timed* words are extensions of conventional words with real-valued dwell time between events. *Timed* languages, sets of timed words, are widely used to formalize real-time systems and their properties, e. g., for formal verification. Of the various formalisms representing timed languages, *timed automata (TAs)* [4] are among the most widely used. A TA is an extension of an NFA with finitely many clock variables to represent timing constraints. Figure 1c shows an example.
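To make this concrete, a timed word can be encoded as a sequence of (dwell time, event) pairs and a one-clock DTA run over it. The encoding below is our own illustrative sketch (the function `run_dta` and the transition representation are assumptions, not from the paper), with guards loosely modeled on those visible in Fig. 1c:

```python
# A timed word alternates real-valued dwell times and events, e.g. "0.4 a 0.8 a".
# Transitions map (location, event) to (guard, reset, target) triples, where
# guard is a predicate on the current clock value c.

def run_dta(initial, transitions, timed_word):
    """Run a one-clock DTA over a timed word given as (dwell, event) pairs.
    Returns the final (location, clock value), or None if the run gets stuck."""
    loc, c = initial, 0.0
    for dwell, event in timed_word:
        c += dwell                       # time elapses before the event
        for guard, reset, target in transitions.get((loc, event), []):
            if guard(c):                 # determinism: guards are disjoint
                loc = target
                if reset:
                    c = 0.0
                break
        else:
            return None                  # no enabled edge: the run is stuck
    return loc, c

# A hypothetical fragment consistent with the guards of Fig. 1c: from l0,
# event a with c >= 1 resets c and moves to l1; with c < 1 it stays in l0.
T = {
    ("l0", "a"): [(lambda c: c >= 1, True, "l1"),
                  (lambda c: c < 1, False, "l0")],
}
assert run_dta("l0", T, [(0.4, "a")]) == ("l0", 0.4)
assert run_dta("l0", T, [(1.2, "a")]) == ("l1", 0.0)
```

The two assertions show how the same untimed word `a` is routed differently depending on the dwell time, which is exactly why timed languages need more than Boolean membership of untimed words.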

Despite its practical relevance, learning algorithms for TAs are only available for limited subclasses of TAs, e. g., real-time automata [6,7], event-recording automata [15,16], event-recording automata with unobservable reset [17], and one-clock deterministic TAs [5,30]. Timing constraints representable by these classes are limited, e. g., by restricting the number of clock variables or by restricting the edges where a clock variable can be reset. Such restriction simplifies the inference of timing constraints in learning algorithms.

*Contributions.* In this paper, we propose an active learning algorithm for *deterministic* TAs (DTAs). The languages recognizable by DTAs are called *recognizable timed languages* [21]. Our strategy is as follows: first, we develop a Myhill-Nerode style characterization of recognizable timed languages; then, we extend the L\* algorithm to recognizable timed languages, exploiting the similarity of our characterization to the classical one.

Due to the continuity of dwell time in timed words, it is hard to characterize recognizable timed languages by a Nerode-style congruence between timed words. For example, for the DTA in Fig. 1c, for any τ, τ′ ∈ [0, 1) satisfying τ < τ′, the extension (1 − τ′) a distinguishes τ and τ′ because τ (1 − τ′) a leads to l<sub>0</sub> while τ′ (1 − τ′) a leads to l<sub>1</sub>. Therefore, such a congruence can make *infinitely* many classes.

Instead, we define a Nerode-style congruence between sets of timed words called *elementary languages* [21]. An elementary language is a timed language defined by a word with a conjunction of inequalities constraining the time differences between events. We also use an equality constraint, which we call a *renaming equation*, to define the congruence. Intuitively, a renaming equation bridges the time differences in an elementary language and the clock variables in a TA. We note that there can be multiple renaming equations showing the equivalence of two elementary languages.

*Example 1.* Let p_1 and p_2 be the elementary languages p_1 = {τ^1_0 a τ^1_1 | τ^1_0 ∈ (0, 1), τ^1_1 ∈ (0, 1), τ^1_0 + τ^1_1 ∈ (0, 1)} and p_2 = {τ^2_0 a τ^2_1 a τ^2_2 | τ^2_0 ∈ (1, 2), τ^2_1 ∈ (0, 1), τ^2_2 ∈ (0, 1), τ^2_1 + τ^2_2 ∈ (0, 1)}. For the DTA in Fig. 1c, p_1 and p_2 are equivalent with the renaming equation τ^1_0 + τ^1_1 = τ^2_1 + τ^2_2 because for any w_1 = τ^1_0 a τ^1_1 ∈ p_1 and w_2 = τ^2_0 a τ^2_1 a τ^2_2 ∈ p_2: 1) we reach l_0 after reading either of w_1 and w_2, and 2) the values of c after reading w_1 and w_2 are τ^1_0 + τ^1_1 and τ^2_1 + τ^2_2, respectively.

We characterize recognizable timed languages by the finiteness of the equivalence classes defined by the above congruence. We also show that for any recognizable timed language, there is a finite set S of elementary languages such that the equivalence of any prefixes can be checked with the extensions in S.

By using the above congruence, we extend the L\* algorithm to DTAs. The high-level idea is the same as in the original L\* algorithm: 1) the learner makes membership queries to obtain enough information to construct a hypothesis DTA A_hyp, and 2) the learner makes an equivalence query to check if A_hyp recognizes the target language. The largest difference is in the cells of the observation table. Since the concatenation p · s of an index pair (p, s) ∈ P × S is not a timed word but a set of timed words, its membership is not defined as a Boolean value. Instead, we introduce the notion of *symbolic* membership and use it as the value of each cell of the *timed* observation table. Intuitively, the symbolic membership is the constraint representing the subset of p · s included in L_tgt. Such a constraint can be constructed by finitely many (non-symbolic) membership queries.

*Example 2.* Figure 1d illustrates a *timed* observation table. The equivalence between p_1, p_2 ∈ P under S can be checked by comparing the cells in the rows indexed by p_1 and p_2 with renaming equations. For the cells in the rows indexed by p_1 and p_2, the constraints coincide after replacing τ_0 + τ_1 with τ_1 + τ_2 and vice versa. Thus, p_1 and p_2 are equivalent with the current extensions S.

Once the learner obtains enough information, it constructs a DTA via the monoid-based representation of recognizable timed languages [21]. We show that for any recognizable timed language, our algorithm terminates and returns a DTA recognizing it. We also show that the number of necessary queries is polynomial in the size of the equivalence class defined by the Nerode-style congruence if symbolic membership queries are allowed, and exponential in it otherwise. Moreover, if symbolic membership queries are not allowed, the number of necessary queries is at most doubly exponential in the number of clock variables and singly exponential in the number of locations of a DTA recognizing the target language. This worst-case complexity is the same as that of the one-clock DTA learning algorithm in [30].

We implemented our DTA learning algorithm in a prototype library LearnTA. Our experimental results show that it is efficient enough for some benchmarks taken from practical applications, e.g., the FDDI protocol. This suggests the practical relevance of our algorithm.



*Related Work.* Among the various characterizations of timed languages [4,10–13,21], the characterization by *recognizability* [21] is the closest to our Myhill-Nerode-style characterization. Both use finite sets of elementary languages for characterization. Their main difference is that [21] proposes a formalism to define a timed language by relating prefixes by a morphism, whereas we propose a technical gadget to define an equivalence relation over timed words with respect to suffixes using symbolic membership. This difference makes our definition suitable for an L\*-style algorithm, since the original L\* algorithm is based on Nerode's congruence, which defines an equivalence relation over words with respect to suffixes using conventional membership.

As discussed above, active TA learning [5,15–17,30] has been studied mostly for limited subclasses of TAs, where the number of clock variables or the clock variables reset at each edge are fixed. In contrast, our algorithm infers both. Another difference is in the technical strategy. Most of the existing algorithms are related to the active learning of *symbolic automata* [9,14], enhancing the languages with clock valuations. In contrast, we take a more semantic approach via the Nerode-style congruence.

Another recent direction is to use a *genetic algorithm* to infer TAs in passive [27] or active [3] learning. This differs from our learning algorithm, which is based on a formal characterization of timed languages. Moreover, these algorithms may not converge to a correct automaton due to the heuristic nature of genetic algorithms.

#### **2 Preliminaries**

For a set X, its powerset is denoted by P(X). We denote the empty sequence by ε. For sets X, Y, we denote their symmetric difference by X △ Y = {x | x ∈ X ∧ x ∉ Y} ∪ {y | y ∈ Y ∧ y ∉ X}.

#### **2.1 Timed Words and Timed Automata**

**Definition 3 (timed word).** *For a finite alphabet* Σ*, a* timed word w *is an alternating sequence* τ_0 a_1 τ_1 a_2 … a_n τ_n *of elements of* Σ *and* R_{≥0}*. The set of timed words over* Σ *is denoted by* T(Σ)*. A* timed language L ⊆ T(Σ) *is a set of timed words.*

For timed words w = τ_0 a_1 τ_1 a_2 … a_n τ_n and w' = τ'_0 a'_1 τ'_1 a'_2 … a'_{n'} τ'_{n'}, their concatenation w · w' is w · w' = τ_0 a_1 τ_1 a_2 … a_n (τ_n + τ'_0) a'_1 τ'_1 a'_2 … a'_{n'} τ'_{n'}. Concatenation naturally extends to timed languages: for a timed word w and timed languages L, L', we let w · L = {w · w_L | w_L ∈ L}, L · w = {w_L · w | w_L ∈ L}, and L · L' = {w_L · w_{L'} | w_L ∈ L, w_{L'} ∈ L'}. For timed words w and w', w is a *prefix* of w' if there is a timed word w'' satisfying w · w'' = w'. A timed language L is *prefix-closed* if for any w ∈ L, L contains all the prefixes of w.
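As a small illustration of timed-word concatenation, the following Python sketch merges the trailing dwell time of w with the leading dwell time of w'. The tuple-based representation is our own choice, not from the paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimedWord:
    """tau_0 a_1 tau_1 ... a_n tau_n with len(delays) == len(letters) + 1."""
    delays: tuple   # (tau_0, ..., tau_n), non-negative dwell times
    letters: tuple  # (a_1, ..., a_n)

def concat(w1: TimedWord, w2: TimedWord) -> TimedWord:
    # The trailing dwell time of w1 and the leading one of w2 are summed.
    delays = w1.delays[:-1] + (w1.delays[-1] + w2.delays[0],) + w2.delays[1:]
    return TimedWord(delays, w1.letters + w2.letters)

w1 = TimedWord((1.25, 0.5), ("a",))  # 1.25 · a · 0.5
w2 = TimedWord((0.5, 0.0), ("a",))   # 0.5 · a · 0
print(concat(w1, w2).delays)         # (1.25, 1.0, 0.0)
```

Note that the two words share one dwell time at the junction, so the result has n + n' letters but only n + n' + 1 delays.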

For a finite set C of clock variables, a *clock valuation* is a function ν ∈ (R_{≥0})^C. We let **0**_C be the clock valuation satisfying **0**_C(c) = 0 for any c ∈ C. For ν ∈ (R_{≥0})^C and τ ∈ R_{≥0}, we let ν + τ be the clock valuation satisfying (ν + τ)(c) = ν(c) + τ for any c ∈ C. For ν ∈ (R_{≥0})^C and ρ ⊆ C, we let ν[ρ := 0] be the clock valuation satisfying (ν[ρ := 0])(c) = 0 for c ∈ ρ and (ν[ρ := 0])(c) = ν(c) for c ∉ ρ. We let G_C be the set of constraints defined by finite conjunctions of inequalities c ⋈ d, where c ∈ C, d ∈ N, and ⋈ ∈ {>, ≥, ≤, <}. We let C_C be the set of constraints defined by finite conjunctions of inequalities c ⋈ d or c − c' ⋈ d, where c, c' ∈ C, d ∈ N, and ⋈ ∈ {>, ≥, ≤, <}. We denote the empty conjunction by ⊤. For ν ∈ (R_{≥0})^C and ϕ ∈ C_C ∪ G_C, we write ν ⊨ ϕ if ν satisfies ϕ.
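The clock-valuation operations above are straightforward to implement; here is a minimal Python sketch, where the dictionary representation of valuations and the triple encoding of guards are our own choices:

```python
def zero_valuation(clocks):
    """0_C: every clock maps to 0."""
    return {c: 0.0 for c in clocks}

def delay(nu, tau):
    """nu + tau: every clock advances by the same amount tau."""
    return {c: v + tau for c, v in nu.items()}

def reset(nu, rho):
    """nu[rho := 0]: clocks in rho are reset, the others are kept."""
    return {c: (0.0 if c in rho else v) for c, v in nu.items()}

OPS = {">": lambda x, d: x > d, ">=": lambda x, d: x >= d,
       "<": lambda x, d: x < d, "<=": lambda x, d: x <= d}

def satisfies(nu, guard):
    """A guard in G_C as a list of (clock, op, d) triples; the empty
    list is the empty conjunction, i.e., true."""
    return all(OPS[op](nu[c], d) for c, op, d in guard)

nu = delay(zero_valuation({"c"}), 1.3)
print(satisfies(nu, [("c", ">", 1), ("c", "<", 2)]))  # True
print(reset(nu, {"c"}))                                # {'c': 0.0}
```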

**Definition 4 (timed automaton).** *A* timed automaton *(TA) is a 7-tuple* (Σ, L, l_0, C, I, Δ, F)*, where:* Σ *is the finite alphabet,* L *is the finite set of locations,* l_0 ∈ L *is the initial location,* C *is the finite set of clock variables,* I : L → C_C *assigns an invariant to each location,* Δ ⊆ L × G_C × (Σ ∪ {ε}) × P(C) × L *is the set of edges, and* F ⊆ L *is the set of accepting locations.*

A TA is *deterministic* if 1) for any a ∈ Σ and any distinct (l, g, a, ρ, l'), (l, g', a, ρ', l'') ∈ Δ, g ∧ g' is unsatisfiable, and 2) for any (l, g, ε, ρ, l') ∈ Δ, the set of clock valuations satisfying g ∧ I(l) is at most a singleton. Figure 1c shows a deterministic TA (DTA).

The semantics of a TA is defined by a *timed transition system (TTS)*.

**Definition 5 (semantics of TAs).** *For a TA* A = (Σ, L, l_0, C, I, Δ, F)*, the* timed transition system (TTS) *is a 4-tuple* S = (Q, q_0, Q_F, →)*, where:* Q = L × (R_{≥0})^C *is the set of* (concrete) states*,* q_0 = (l_0, **0**_C) *is the* initial state*,* Q_F = {(l, ν) ∈ Q | l ∈ F} *is the set of* accepting states*, and* → ⊆ Q × Q *is the* transition relation *consisting of the following*^1*.*


A *run* of a TA A is an alternating sequence q_0, →_1, q_1, …, →_n, q_n of q_i ∈ Q and →_i ∈ → satisfying q_{i−1} →_i q_i for any i ∈ {1, 2, …, n}. A run q_0, →_1, q_1, …, →_n, q_n is accepting if q_n ∈ Q_F. Given such a run, the associated timed word is the concatenation of the labels of the transitions. The timed *language* L(A) of a TA A is the set of timed words associated with some accepting run of A.
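To make the semantics concrete, the following Python sketch checks membership of a timed word in a one-clock DTA by simulating its TTS. Invariants and ε-edges are omitted for brevity, and the automaton below is a made-up example, not the DTA of Fig. 1c:

```python
def member(edges, init, accepting, delays, letters):
    """Simulate the TTS of a one-clock DTA on tau_0 a_1 ... a_n tau_n.
    edges: list of (source, guard, letter, reset?, target)."""
    loc, c = init, 0.0
    for tau, a in zip(delays, letters + (None,)):
        c += tau                      # delay transition
        if a is None:                 # word exhausted: check acceptance
            return loc in accepting
        for (l, guard, b, do_reset, l2) in edges:
            if l == loc and b == a and guard(c):
                loc, c = l2, (0.0 if do_reset else c)
                break                 # determinism: at most one edge fires
        else:
            return False              # no enabled edge: the run is stuck

# A hypothetical one-clock DTA: an 'a' arriving later than 1 time unit
# after the start moves to the accepting location l1 and resets c.
EDGES = [("l0", lambda c: c <= 1, "a", False, "l0"),
         ("l0", lambda c: c > 1, "a", True, "l1")]
print(member(EDGES, "l0", {"l1"}, (1.5, 0.0), ("a",)))  # True
print(member(EDGES, "l0", {"l1"}, (0.5, 0.0), ("a",)))  # False
```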

#### **2.2 Recognizable Timed Languages**

Here, we review the *recognizability* [21] of timed languages.

**Definition 6 (timed condition).** *For a set* T = {τ_0, τ_1, …, τ_n} *of ordered variables, a* timed condition Λ *is a finite conjunction of inequalities* T_{i,j} ⋈ d*, where* T_{i,j} = Σ_{k=i}^{j} τ_k*,* ⋈ ∈ {>, ≥, ≤, <}*, and* d ∈ N*.*

A timed condition Λ is *simple*^2 if for each T_{i,j}, Λ contains d < T_{i,j} < d + 1 or d ≤ T_{i,j} ∧ T_{i,j} ≤ d for some d ∈ N. A timed condition Λ is *canonical* if we cannot strengthen or add any inequality T_{i,j} ⋈ d to Λ without changing its semantics.

^1 We use the transitions →^{ε,τ} to avoid the discussion of arbitrarily small dwell times in [21].

^2 The notion of simplicity is taken from [15].

**Definition 7 (elementary language).** *A timed language* L *is* elementary *if there are* u = a_1 a_2 … a_n ∈ Σ^* *and a timed condition* Λ *over* {τ_0, τ_1, …, τ_n} *satisfying* L = {τ_0 a_1 τ_1 a_2 … a_n τ_n | τ_0, τ_1, …, τ_n ⊨ Λ} *such that the set of valuations of* {τ_0, τ_1, …, τ_n} *defined by* Λ *is bounded. We denote such* L *by* (u, Λ)*. We let* E(Σ) *be the set of elementary languages over* Σ*.*

For p, p' ∈ E(Σ), p is a *prefix* of p' if for any w' ∈ p', there is a prefix w ∈ p of w', and for any w ∈ p, there is w' ∈ p' such that w is a prefix of w'. For any elementary language, the number of its prefixes is finite. For a set of elementary languages, *prefix-closedness* is defined based on the above definition of prefixes.

An elementary language (u, Λ) is *simple* if there is a simple and canonical timed condition Λ' satisfying (u, Λ) = (u, Λ'). We let SE(Σ) be the set of simple elementary languages over Σ. Without loss of generality, we assume that for any (u, Λ) ∈ SE(Σ), Λ is simple and canonical. We remark that no DTA can distinguish timed words within a simple elementary language, i.e., for any p ∈ SE(Σ) and any DTA A, we have either p ⊆ L(A) or p ∩ L(A) = ∅. We can decide which holds by taking some w ∈ p and checking if w ∈ L(A).

**Definition 8 (immediate exterior).** *Let* L = (u, Λ) *be an elementary language. For* a ∈ Σ*, the* discrete immediate exterior ext^a(L) *of* L *is* ext^a(L) = (u · a, Λ ∪ {τ_{|u|+1} = 0})*. The* continuous immediate exterior ext^t(L) *of* L *is* ext^t(L) = (u, Λ^t)*, where* Λ^t *is the timed condition such that each inequality* T_{i,|u|} = d *in* Λ *is replaced with* T_{i,|u|} > d *if such an inequality exists, and otherwise, the inequality* T_{i,|u|} < d *in* Λ *with the smallest index* i *is replaced with* T_{i,|u|} = d*. The* immediate exterior *of* L *is* ext(L) = ⋃_{a ∈ Σ} ext^a(L) ∪ ext^t(L)*.*

*Example 9.* For the word u = a · a and the timed condition Λ = {T_{0,0} ∈ (1, 2) ∧ T_{0,1} ∈ (1, 2) ∧ T_{0,2} ∈ (1, 2) ∧ T_{1,2} ∈ (0, 1) ∧ T_{2,2} = 0}, we have 1.3 · a · 0.5 · a · 0 ∈ (u, Λ) and 1.7 · a · 0.5 · a · 0 ∉ (u, Λ). The discrete and continuous immediate exteriors of (u, Λ) are ext^a((u, Λ)) = (u · a, Λ_a) and ext^t((u, Λ)) = (u, Λ^t), where Λ_a = {T_{0,0} ∈ (1, 2) ∧ T_{0,1} ∈ (1, 2) ∧ T_{0,2} ∈ (1, 2) ∧ T_{1,2} ∈ (0, 1) ∧ T_{2,2} = T_{3,3} = 0} and Λ^t = {T_{0,0} ∈ (1, 2) ∧ T_{0,1} ∈ (1, 2) ∧ T_{0,2} ∈ (1, 2) ∧ T_{1,2} ∈ (0, 1) ∧ T_{2,2} > 0}.
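The rewriting that produces Λ^t is mechanical, so it can be sketched in a few lines of Python over just the constraints on T_{i,|u|}. The dictionary encoding is our own: ("in", lo, hi) stands for T_{i,|u|} ∈ (lo, hi), ("eq", d) for T_{i,|u|} = d, and ("gt", d) for T_{i,|u|} > d:

```python
def ext_t(cons):
    """Continuous immediate exterior, restricted to the constraints on
    T_{i,|u|}; cons maps each index i to its constraint."""
    if any(c[0] == "eq" for c in cons.values()):
        # Some T_{i,|u|} = d exists: each such equality becomes T_{i,|u|} > d.
        return {i: (("gt", c[1]) if c[0] == "eq" else c)
                for i, c in cons.items()}
    # Otherwise the upper-bound inequality with the smallest index i
    # becomes an equality with its bound.
    i0 = min(i for i, c in cons.items() if c[0] == "in")
    out = dict(cons)
    out[i0] = ("eq", cons[i0][2])
    return out

# The constraints on T_{i,2} from the example above:
Lam = {0: ("in", 1, 2), 1: ("in", 0, 1), 2: ("eq", 0)}
print(ext_t(Lam))  # {0: ('in', 1, 2), 1: ('in', 0, 1), 2: ('gt', 0)}
```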

**Definition 10 (chronometric timed language).** *A timed language* L *is* chronometric *if there is a finite set* {(u_1, Λ_1), (u_2, Λ_2), …, (u_m, Λ_m)} *of disjoint elementary languages satisfying* L = ⋃_{i ∈ {1,2,…,m}} (u_i, Λ_i)*.*

For any elementary language L, its immediate exterior ext(L) is chronometric. We naturally extend the notion of exterior to chronometric timed languages, i.e., for a chronometric timed language L = ⋃_{i ∈ {1,2,…,m}} (u_i, Λ_i), we let ext(L) = ⋃_{i ∈ {1,2,…,m}} ext((u_i, Λ_i)), which is also chronometric. For a timed word w = τ_0 a_1 τ_1 a_2 … a_n τ_n, we denote the valuation of τ_0, τ_1, …, τ_n by κ(w).

A *chronometric relational morphism* [21] relates any timed word to a timed word in a certain set P, which is later used to define a timed language. Intuitively, the tuples in Φ specify a mapping from timed words immediately outside of P to timed words in P. By applying it inductively, any timed word is mapped into P.

**Definition 11 (chronometric relational morphism).** *Let* P *be a chronometric and prefix-closed timed language. Let* (u, Λ, u', Λ', R) *be a 5-tuple such that* (u, Λ) ⊆ ext(P)*,* (u', Λ') ⊆ P*, and* R *is a finite conjunction of equations of the form* T_{i,|u|} = T'_{j,|u'|}*, where* i ≤ |u| *and* j ≤ |u'|*. For such a tuple, we let* ≈_{(u, Λ, u', Λ', R)} ⊆ (u, Λ) × (u', Λ') *be the relation such that* (w, w') ∈ ≈_{(u, Λ, u', Λ', R)} *if and only if* κ(w), κ(w') ⊨ R*. For a finite set* Φ *of such tuples, the* chronometric relational morphism ≈_Φ ⊆ T(Σ) × P *is the relation inductively defined as follows: 1) for* w ∈ P*, we have* (w, w) ∈ ≈_Φ*; 2) for* w ∈ ext(P) *and* w' ∈ P*, we have* (w, w') ∈ ≈_Φ *if we have* (w, w') ∈ ≈_{(u, Λ, u', Λ', R)} *for one of the tuples* (u, Λ, u', Λ', R) ∈ Φ*; 3) for* w ∈ ext(P)*,* w' ∈ T(Σ)*, and* w'' ∈ P*, we have* (w · w', w'') ∈ ≈_Φ *if there is* w''' ∈ T(Σ) *satisfying* (w, w''') ∈ ≈_Φ *and* (w''' · w', w'') ∈ ≈_Φ*. We also require that all* (u, Λ) *in the tuples in* Φ *are disjoint and that the union of all such* (u, Λ) *is* ext(P) \ P*.*

A chronometric relational morphism ≈_Φ is *compatible* with F ⊆ P if for each tuple (u, Λ, u', Λ', R) defining ≈_Φ, we have either (u', Λ') ⊆ F or (u', Λ') ∩ F = ∅.

**Definition 12 (recognizable timed language).** *A timed language* L *is* recognizable *if there is a chronometric prefix-closed set* P*, a chronometric subset* F *of* P*, and a chronometric relational morphism* ≈_Φ ⊆ T(Σ) × P *compatible with* F *satisfying* L = {w | ∃w' ∈ F, (w, w') ∈ ≈_Φ}*.*

It is known that for any recognizable timed language L, we can construct a DTA A recognizing L, and vice versa [21].

#### **2.3 Distinguishing Extensions and Active DFA Learning**

Most DFA learning algorithms are based on *Nerode's congruence* [18]. For a (not necessarily regular) language L ⊆ Σ^*, Nerode's congruence ≡_L ⊆ Σ^* × Σ^* is the equivalence relation satisfying w ≡_L w' if and only if for any w'' ∈ Σ^*, we have w · w'' ∈ L ⟺ w' · w'' ∈ L.

Generally, we cannot decide if w ≡_L w' by testing because it requires infinitely many membership checks. However, if L is regular, there is a finite set of suffixes S ⊆ Σ^* called *distinguishing extensions* satisfying ≡_L = ∼^S_L, where ∼^S_L is the equivalence relation satisfying w ∼^S_L w' if and only if for any w'' ∈ S, we have w · w'' ∈ L ⟺ w' · w'' ∈ L. Thus, the minimum DFA recognizing L_tgt can be learned by^3: i) identifying distinguishing extensions S satisfying ≡_{L_tgt} = ∼^S_{L_tgt} and ii) constructing the minimum DFA A corresponding to ∼^S_{L_tgt}.
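For intuition, the following Python sketch computes the classes of ∼^S_L for a toy regular language using only a membership oracle; the language (words over {a, b} with an even number of a's) and the set S are our own illustrative choices:

```python
from itertools import product

SIGMA = "ab"

def member(w):
    """Toy target language: words over {a, b} with an even number of a's."""
    return w.count("a") % 2 == 0

def row(prefix, suffixes):
    """Signature of a prefix under ~S_L: membership of prefix+s for s in S."""
    return tuple(member(prefix + s) for s in suffixes)

S = ["", "a"]  # candidate distinguishing extensions
prefixes = ["".join(p) for n in range(4) for p in product(SIGMA, repeat=n)]
classes = {}
for p in prefixes:
    classes.setdefault(row(p, S), []).append(p)
print(len(classes))  # 2: the two classes match the states of the minimal DFA
```

Here ≡_L already has two classes, so S = {ε, a} (in fact ε alone) suffices as distinguishing extensions.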

The *L\* algorithm* [8] is an algorithm to learn the minimum DFA recognizing the target regular language L_tgt with finitely many *membership* and *equivalence* queries to the teacher. In a membership query, the learner asks if a word w ∈ Σ^* belongs to the target language L_tgt, i.e., whether w ∈ L_tgt. In an equivalence query, the learner asks if the hypothesis DFA A_hyp recognizes the target language

^3 The distinguishing extensions S can be defined locally. For example, the TTT algorithm [19] is optimized with *local* distinguishing extensions for some prefixes w ∈ Σ^*. Nevertheless, we use global distinguishing extensions for simplicity.


L_tgt, i.e., whether L(A_hyp) = L_tgt, where L(A_hyp) is the language of the hypothesis DFA A_hyp. When L(A_hyp) ≠ L_tgt, the teacher returns a counterexample *cex* ∈ L(A_hyp) △ L_tgt. The information obtained via queries is stored in a 2-dimensional array called an *observation table*. See Fig. 1b for an illustration. For finite index sets P, S ⊆ Σ^*, for each pair (p, s) ∈ (P ∪ P · Σ) × S, the observation table stores whether p · s ∈ L_tgt. S is the current candidate for the distinguishing extensions, and P represents Σ^*/∼^S_{L_tgt}. Since P and S are finite, one can fill the observation table using finitely many membership queries.

Algorithm 1 outlines an L\*-style algorithm. We start from P = S = {ε} and incrementally extend them. To construct a hypothesis DFA A_hyp, the observation table must be *closed* and *consistent*. An observation table is *closed* if, for each p ∈ P · Σ, there is p' ∈ P satisfying p ∼^S_{L_tgt} p'. An observation table is *consistent* if, for any p, p' ∈ P ∪ P · Σ and a ∈ Σ, p ∼^S_{L_tgt} p' implies p · a ∼^S_{L_tgt} p' · a.
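The closedness and consistency checks follow directly from their definitions; the Python sketch below uses a toy membership oracle (the even-number-of-a's language, our own illustrative assumption, not the paper's implementation):

```python
from itertools import product

SIGMA = "ab"
member = lambda w: w.count("a") % 2 == 0  # toy target language

def row(p, S):
    return tuple(member(p + s) for s in S)

def is_closed(P, S):
    """Each one-letter extension of P must match the row of some p in P."""
    rows = {row(p, S) for p in P}
    return all(row(p + a, S) in rows for p in P for a in SIGMA)

def is_consistent(P, S):
    """Prefixes with equal rows must keep equal rows after every letter."""
    return all(row(p1 + a, S) == row(p2 + a, S)
               for p1 in P for p2 in P if row(p1, S) == row(p2, S)
               for a in SIGMA)

print(is_closed([""], [""]))       # False: row("a") has no match in P
print(is_closed(["", "a"], [""]))  # True after promoting "a" into P
```

Promoting the unmatched extension into P is exactly the repair step the L\*-style loop performs when the table is not closed.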

Once the observation table becomes closed and consistent, the learner constructs a hypothesis DFA A_hyp and checks if L(A_hyp) = L_tgt by an equivalence query. If L(A_hyp) = L_tgt holds, A_hyp is the resulting DFA. Otherwise, the teacher returns *cex* ∈ L(A_hyp) △ L_tgt, which is used to refine the observation table. There are several variants of the refinement. In the L\* algorithm, all the prefixes of *cex* are added to P. In the Rivest-Schapire algorithm [20,25], an extension s strictly refining S is obtained by an analysis of *cex*, and such s is added to S.

### **3 A Myhill-Nerode Style Characterization of Recognizable Timed Languages with Elementary Languages**

Unlike the case of regular languages, no finite set of timed words can correctly distinguish recognizable timed languages, due to the infiniteness of dwell times in timed words. Instead, we use a finite set of *elementary languages* to define a Nerode-style congruence. To define it, we extend the notion of membership to elementary languages.

**Definition 13 (symbolic membership).** *For a timed language* L ⊆ T(Σ) *and an elementary language* (u, Λ) ∈ E(Σ)*, the* symbolic membership mem^sym_L((u, Λ)) *of* (u, Λ) *in* L *is the strongest constraint such that for any* w ∈ (u, Λ)*, we have* w ∈ L *if and only if* κ(w) ⊨ mem^sym_L((u, Λ))*.*

We discuss how to obtain symbolic membership in Sect. 4.5. We define a Nerode-style congruence using symbolic membership. A naive idea is to distinguish two elementary languages by the equivalence of their symbolic memberships. However, this does not capture the semantics of TAs. For example, for the DTA A in Fig. 1c, for any timed word w, we have 1.3 · a · 0.4 · w ∈ L(A) ⟺ 0.3 · a · 1.0 · a · 0.4 · w ∈ L(A), while the two prefixes have different symbolic memberships. This is because symbolic membership distinguishes the *position* in timed words where each clock variable is reset, which must be ignored. We use *renaming equations* to abstract away such positional information in symbolic membership. Note that T_{i,n} = Σ_{k=i}^{n} τ_k corresponds to the value of the clock variable reset at τ_i.

**Definition 14 (renaming equation).** *Let* T = {τ_0, τ_1, …, τ_n} *and* T' = {τ'_0, τ'_1, …, τ'_{n'}} *be sets of ordered variables. A* renaming equation R *over* T *and* T' *is a finite conjunction of equations of the form* T_{i,n} = T'_{i',n'}*, where* i ∈ {0, 1, …, n}*,* i' ∈ {0, 1, …, n'}*,* T_{i,n} = Σ_{k=i}^{n} τ_k*, and* T'_{i',n'} = Σ_{k=i'}^{n'} τ'_k*.*

**Definition 15 (**∼^S_L**).** *Let* L ⊆ T(Σ) *be a timed language, let* (u, Λ), (u', Λ'), (u'', Λ'') ∈ E(Σ) *be elementary languages, and let* R *be a renaming equation over* T *and* T'*, where* T *and* T' *are the variables of* Λ *and* Λ'*, respectively. We let* (u, Λ) ⊑^{(u'',Λ''),R}_L (u', Λ') *if we have the following: for any* w ∈ (u, Λ)*, there is* w' ∈ (u', Λ') *satisfying* κ(w), κ(w') ⊨ R*, and* mem^sym_L((u, Λ) · (u'', Λ'')) ∧ R ∧ Λ' *is equivalent to* mem^sym_L((u', Λ') · (u'', Λ'')) ∧ R ∧ Λ*. We let* (u, Λ) ∼^{(u'',Λ''),R}_L (u', Λ') *if we have* (u, Λ) ⊑^{(u'',Λ''),R}_L (u', Λ') *and* (u', Λ') ⊑^{(u'',Λ''),R}_L (u, Λ)*. Let* S ⊆ E(Σ)*. We let* (u, Λ) ∼^{S,R}_L (u', Λ') *if for any* (u'', Λ'') ∈ S*, we have* (u, Λ) ∼^{(u'',Λ''),R}_L (u', Λ')*. We let* (u, Λ) ∼^S_L (u', Λ') *if* (u, Λ) ∼^{S,R}_L (u', Λ') *for some renaming equation* R*.*

*Example 16.* Let A be the DTA in Fig. 1c and let (u, Λ), (u', Λ'), and (u'', Λ'') be elementary languages, where u = a, Λ = {τ_0 ∈ (1, 2) ∧ τ_0 + τ_1 ∈ (1, 2) ∧ τ_1 ∈ (0, 1)}, u' = a · a, Λ' = {τ'_0 ∈ (0, 1) ∧ τ'_0 + τ'_1 ∈ (1, 2) ∧ τ'_1 + τ'_2 ∈ (1, 2) ∧ τ'_2 ∈ (0, 1)}, u'' = a, and Λ'' = {τ''_0 ∈ (0, 1) ∧ τ''_1 = 0}. We have mem^sym_{L(A)}((u, Λ) · (u'', Λ'')) = Λ ∧ Λ'' ∧ τ_1 + τ''_0 ≤ 1 and mem^sym_{L(A)}((u', Λ') · (u'', Λ'')) = Λ' ∧ Λ'' ∧ τ'_2 + τ''_0 ≤ 1. Therefore, for the renaming equation T_{1,1} = T'_{2,2}, we have (u, Λ) ∼^{(u'',Λ''), T_{1,1} = T'_{2,2}}_L (u', Λ').

An algorithm to check if (u, Λ) ∼^S_L (u', Λ') is shown in Appendix B.2 of [29].

Intuitively, (u, Λ) ⊑^{s,R}_L (u', Λ') shows that any w ∈ (u, Λ) can be "simulated" by some w' ∈ (u', Λ') with respect to s and R. This intuition is formalized as follows.

**Theorem 17.** *For any* L ⊆ T(Σ) *and* (u, Λ), (u', Λ'), (u'', Λ'') ∈ E(Σ) *satisfying* (u, Λ) ⊑^{(u'',Λ'')}_L (u', Λ')*, for any* w ∈ (u, Λ)*, there is* w' ∈ (u', Λ') *such that for any* w'' ∈ (u'', Λ'')*,* w · w'' ∈ L ⟺ w' · w'' ∈ L *holds.*

By ⋃_{(u,Λ) ∈ E(Σ)} (u, Λ) = T(Σ), we have the following as a corollary.

**Corollary 18.** *For any timed language* L ⊆ T(Σ) *and for any elementary languages* (u, Λ), (u', Λ') ∈ E(Σ)*,* (u, Λ) ∼^{E(Σ)}_L (u', Λ') *implies the following.*


The following characterizes recognizable timed languages with ∼^{E(Σ)}_L.

**Theorem 19 (Myhill-Nerode style characterization).** *A timed language* L *is recognizable if and only if the quotient set* SE(Σ)/∼^{E(Σ)}_L *is finite.*

By Theorem 19, we always have a finite set S of distinguishing extensions.

**Theorem 20.** *For any recognizable timed language* L*, there is a finite set* S *of elementary languages satisfying* ∼^{E(Σ)}_L = ∼^S_L*.*

### **4 Active Learning of Deterministic Timed Automata**

We now present our L\*-style active learning algorithm for DTAs based on the Nerode-style congruence of Sect. 3. We let L_tgt be the target timed language in learning.

For simplicity, we first present our learning algorithm with a smart teacher answering the following three kinds of queries: the membership query mem_{L_tgt}(w), asking whether w ∈ L_tgt; the symbolic membership query, asking for mem^sym_{L_tgt}((u, Λ)); and the equivalence query eq_{L_tgt}(A_hyp), asking whether L(A_hyp) = L_tgt. If L(A_hyp) = L_tgt, eq_{L_tgt}(A_hyp) = ⊤, and otherwise, eq_{L_tgt}(A_hyp) is a timed word *cex* ∈ L(A_hyp) △ L_tgt. Later in Sect. 4.5, we show how to answer a symbolic membership query with finitely many membership queries. Our task is to construct a DTA A satisfying L(A) = L_tgt with finitely many queries.

#### **4.1 Successors of Simple Elementary Languages**

Similarly to the L\* algorithm in Sect. 2.3, we learn a DTA with an observation table. Reflecting the extension of the underlying congruence, we use sets of simple elementary languages as the indices. To define the extensions, which play the role of P ∪ (P · Σ) in the L\* algorithm, we introduce *continuous* and *discrete successors* for simple elementary languages, inspired by *regions* [4]. We note that immediate exteriors do not work for this purpose. For example, for (u, Λ) = (a, {T_{0,1} < 2 ∧ T_{1,1} < 1}) and w = 0.7 · a · 0.9, we have w ∈ (u, Λ) and ext^t((u, Λ)) = (a, {T_{0,1} = 2 ∧ T_{1,1} < 1}), but there is no t > 0 satisfying w · t ∈ ext^t((u, Λ)).

#### **Algorithm 2:** DTA construction from a timed observation table


For any (u, Λ) ∈ SE(Σ), we let Θ_{(u,Λ)} be the total order over 0 and the fractional parts frac(T_{0,n}), frac(T_{1,n}), …, frac(T_{n,n}) of T_{0,n}, T_{1,n}, …, T_{n,n}. Such an order is uniquely defined because Λ is simple and canonical (Proposition 36 of [29]).

**Definition 21 (successor).** *Let* p = (u, Λ) ∈ SE(Σ) *be a simple elementary language. The* discrete successor succ^a(p) *of* p *is* succ^a(p) = (u · a, Λ ∧ τ_{n+1} = 0)*. The* continuous successor succ^t(p) *of* p *is* succ^t(p) = (u, Λ^t)*, where* Λ^t *is defined as follows: if there is an equation* T_{i,n} = d *in* Λ*, all such equations are replaced with* T_{i,n} ∈ (d, d + 1)*; otherwise, for each greatest* T_{i,n} *in terms of* Θ_{(u,Λ)}*, we replace* T_{i,n} ∈ (d, d + 1) *with* T_{i,n} = d + 1*. We let* succ(p) = ⋃_{a ∈ Σ} succ^a(p) ∪ succ^t(p)*. For* P ⊆ SE(Σ)*, we let* succ(P) = ⋃_{p ∈ P} succ(p)*.*

*Example 22.* Let u = aa and Λ = {T_{0,0} ∈ (1, 2) ∧ T_{0,1} ∈ (1, 2) ∧ T_{0,2} ∈ (1, 2) ∧ T_{1,1} ∈ (0, 1) ∧ T_{1,2} ∈ (0, 1) ∧ T_{2,2} = 0}. The order Θ_{(u,Λ)} is such that 0 = frac(T_{2,2}) < frac(T_{1,2}) < frac(T_{0,2}). The continuous successor of (u, Λ) is succ^t((u, Λ)) = (u, Λ^t), where Λ^t = {T_{0,0} ∈ (1, 2) ∧ T_{0,1} ∈ (1, 2) ∧ T_{0,2} ∈ (1, 2) ∧ T_{1,1} ∈ (0, 1) ∧ T_{1,2} ∈ (0, 1) ∧ T_{2,2} ∈ (0, 1)}. The continuous successor of (u, Λ^t) is succ^t((u, Λ^t)) = (u, Λ^{tt}), where Λ^{tt} = {T_{0,0} ∈ (1, 2) ∧ T_{0,1} ∈ (1, 2) ∧ T_{0,2} = 2 ∧ T_{1,1} ∈ (0, 1) ∧ T_{1,2} ∈ (0, 1) ∧ T_{2,2} ∈ (0, 1)}.
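This computation can be replayed by a small Python sketch over the constraints on T_{i,n}. The encoding and the explicitly passed fractional order are our own simplifications; in the algorithm, the order Θ_{(u,Λ)} is derived from the condition itself:

```python
def succ_t(cons, frac_order):
    """Continuous successor on the constraints T_{i,n} of a simple and
    canonical timed condition. cons maps i to ("eq", d) for T_{i,n} = d or
    ("open", d) for T_{i,n} in (d, d+1); frac_order lists the indices i by
    increasing frac(T_{i,n})."""
    if any(kind == "eq" for kind, _ in cons.values()):
        # Time elapses: every T_{i,n} = d moves into the interval (d, d+1).
        return {i: ("open", d) if kind == "eq" else (kind, d)
                for i, (kind, d) in cons.items()}
    # No equality: the T_{i,n} with the greatest fractional part reaches
    # the next integer first.
    top = frac_order[-1]
    out = dict(cons)
    out[top] = ("eq", cons[top][1] + 1)
    return out

# The constraints on T_{i,2} from the example above:
Lam = {0: ("open", 1), 1: ("open", 0), 2: ("eq", 0)}
Lam_t = succ_t(Lam, [2, 1, 0])     # T_{2,2} = 0 becomes T_{2,2} in (0, 1)
Lam_tt = succ_t(Lam_t, [2, 1, 0])  # T_{0,2} in (1, 2) becomes T_{0,2} = 2
```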

#### **4.2 Timed Observation Table for Active DTA Learning**

We extend the observation table with (simple) elementary languages and symbolic membership to learn a *recognizable timed language*.

**Definition 23 (timed observation table).** *A* timed observation table *is a 3-tuple* (P, S, T) *such that:* P *is a prefix-closed finite set of simple elementary languages,* S *is a finite set of elementary languages, and* T *is a function mapping each* (p, s) ∈ (P ∪ succ(P)) × S *to the symbolic membership* mem^sym_{Ltgt}(p · s)*.*

Figure 2 illustrates timed observation tables: each cell indexed by (p, s) shows the symbolic membership mem^sym_{Ltgt}(p · s). For timed observation tables, we extend the notions of closedness and consistency with the relation ∼^S_{Ltgt} introduced in Sect. 3.



We note that consistency is defined only for discrete successors. This is because a timed observation table does not always become "consistent" for continuous successors. See Appendix C of [29] for a detailed discussion. We also require *exterior-consistency* since we construct an exterior from a successor.

**Definition 24 (closedness, consistency, exterior-consistency, cohesion).** *Let* O = (P, S, T) *be a timed observation table.* O *is* closed *if, for each* p ∈ succ(P) \ P*, there is* p′ ∈ P *satisfying* p ∼^S_{Ltgt} p′*.* O *is* consistent *if, for each* p, p′ ∈ P *and for each* a ∈ Σ*,* p ∼^S_{Ltgt} p′ *implies* succ^a(p) ∼^S_{Ltgt} succ^a(p′)*.* O *is* exterior-consistent *if, for any* p ∈ P*,* succ^t(p) ∉ P *implies* succ^t(p) ⊆ ext^t(p)*.* O *is* cohesive *if it is closed, consistent, and exterior-consistent.*
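For intuition, the closedness and consistency checks can be sketched as below, crudely approximating the relation ∼^S_{Ltgt} by equality of row vectors (the actual check also involves renaming equations); `row` and `succ_a` are hypothetical helpers supplying table rows and discrete successors.

```python
def is_closed(P, succ_rows, row):
    """Definition 24 (sketch): every row in succ(P) \\ P matches some row of P."""
    return all(any(row(p) == row(q) for q in P) for p in succ_rows)

def is_consistent(P, alphabet, row, succ_a):
    """Definition 24 (sketch): matching rows keep matching after each discrete successor."""
    return all(row(p) != row(q) or row(succ_a(p, a)) == row(succ_a(q, a))
               for p in P for q in P for a in alphabet)
```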

From a cohesive timed observation table (P, S, T), we can construct a DTA as outlined in Algorithm 2. We construct a DTA via a recognizable timed language. The main ideas are as follows: 1) we approximate ∼^{E(Σ),R}_{Ltgt} by ∼^{S,R}_{Ltgt}; 2) we decide the equivalence class of ext(p) ∈ ext(P) \ P in E(Σ) from successors. Notice that there can be multiple renaming equations R showing ∼^{S,R}_{Ltgt}. We use one of them, found by the exhaustive search in Appendix B.2 of [29].

The DTA obtained by MakeDTA is faithful to the timed observation table in *rows*, i. e., for any p ∈ P ∪ succ(P), Ltgt ∩ p = L(A_hyp) ∩ p. However, unlike in the L\* algorithm, this does not hold for each *cell*, i. e., there may be p ∈ P ∪ succ(P) and s ∈ S satisfying Ltgt ∩ (p · s) ≠ L(A_hyp) ∩ (p · s). This is because we do not (and actually cannot) enforce consistency for continuous successors. See Appendix C of [29] for a discussion. Nevertheless, this does not affect the correctness of our algorithm, partly by Theorem 26.

**Theorem 25 (row faithfulness).** *For any cohesive timed observation table* (P, S, T) *and for any* p ∈ P ∪ succ(P)*,* Ltgt ∩ p = L(*MakeDTA*(P, S, T)) ∩ p *holds.*

**Theorem 26.** *For any cohesive timed observation table* (P, S, T)*,* ∼^S_{Ltgt} = ∼^{E(Σ)}_{Ltgt} *implies* Ltgt = L(*MakeDTA*(P, S, T))*.*

### **Algorithm 4:** Outline of our L\*-style algorithm for DTA learning

```
 1  P ← {(ε, τ_0 = 0)};  S ← {(ε, τ′_0 = 0)}
 2  while ⊤ do
 3      while (P, S, T) is not cohesive do
 4          if ∃p ∈ succ(P) \ P. ∀p′ ∈ P. p ≁^S_{Ltgt} p′ then     // (P, S, T) is not closed
 5              P ← P ∪ {p}                                        // add such p to P
 6          else if ∃p, p′ ∈ P, a ∈ Σ. p ∼^S_{Ltgt} p′ ∧ succ^a(p) ≁^S_{Ltgt} succ^a(p′) then
                                                                   // (P, S, T) is inconsistent due to a
 7              let S′ ⊆ S be a minimal set such that p ≁^{S ∪ {{a·w | w ∈ s} | s ∈ S′}}_{Ltgt} p′
 8              S ← S ∪ {{a·w | w ∈ s} | s ∈ S′}
 9          else                                                   // (P, S, T) is not exterior-consistent
10              P ← P ∪ {p′ ∈ succ^t(P) \ P | ∃p ∈ P. p′ = succ^t(p) ∧ p′ ⊈ ext^t(p)}
11          fill T using symbolic membership queries
12      A_hyp ← MakeDTA(P, S, T)
13      if cex = eq_{Ltgt}(A_hyp) then
14          add AnalyzeCEX(cex) to S
15      else return A_hyp                                          // return A_hyp when eq_{Ltgt}(A_hyp) = ⊤
```
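The control flow of Algorithm 4 can also be sketched as the following skeleton, where `teacher` is a hypothetical object bundling everything the paper assumes: the cohesion checks over ∼^S_{Ltgt}, table filling via symbolic membership queries, MakeDTA, the equivalence oracle, and the counterexample analysis of Algorithm 3.

```python
def learn_dta(teacher, init_prefix, init_suffix):
    """L*-style loop of Algorithm 4 (sketch with hypothetical callbacks)."""
    P, S = [init_prefix], [init_suffix]
    while True:
        while True:                                   # lines 3-11: make (P, S, T) cohesive
            p = teacher.unclosed_row(P, S)            # p in succ(P)\P with no ~^S match in P
            if p is not None:
                P.append(p)                           # line 5
                continue
            suffixes = teacher.inconsistency(P, S)    # suffixes a.s witnessing an inconsistency
            if suffixes is not None:
                S.extend(suffixes)                    # line 8
                continue
            boundary = teacher.exterior_violation(P)  # succ^t(p) escaping ext^t(p)
            if boundary is not None:
                P.append(boundary)                    # line 10
                continue
            break
        table = teacher.fill(P, S)                    # line 11: symbolic membership queries
        hyp = teacher.make_dta(P, S, table)           # line 12
        cex = teacher.equivalence_query(hyp)          # line 13
        if cex is None:
            return hyp                                # line 15
        S.append(teacher.analyze_cex(cex, P, S))      # line 14: Algorithm 3
```

A degenerate teacher whose table is already cohesive and whose first hypothesis is accepted exercises the skeleton end to end.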

#### **4.3 Counterexample Analysis**

We analyze the counterexample *cex* obtained by an equivalence query to refine the set S of suffixes in the timed observation table. Our analysis, outlined in Algorithm 3, is inspired by the Rivest-Schapire algorithm [20,25]. The idea is to reduce the counterexample *cex* using the mapping defined by the congruence ∼^S_{Ltgt} (lines 5–7), much like Φ in recognizable timed languages, and to find a suffix s strictly refining S (line 9), i. e., satisfying p ∼^S_{Ltgt} p′ and p ≁^{S∪{s}}_{Ltgt} p′ for some p ∈ succ(P) and p′ ∈ P.

By definition of *cex*, we have *cex* = w_0 ∈ Ltgt △ L(A_hyp). By Theorem 25, w_n ∉ Ltgt △ L(A_hyp) holds, where n is the final value of i. By construction of A_hyp and w_i, for any i ∈ {1, 2, ..., n}, we have w_0 ∈ L(A_hyp) ⟺ w_i ∈ L(A_hyp). Therefore, there is i ∈ {1, 2, ..., n} satisfying w_{i−1} ∈ Ltgt △ L(A_hyp) and w_i ∉ Ltgt △ L(A_hyp). For such i, since we have w_{i−1} = w′_i · w″_i ∈ Ltgt △ L(A_hyp), w_i = w‴_i · w″_i ∉ Ltgt △ L(A_hyp), and κ(w′_i), κ(w‴_i) ⊨ R_i, such w″_i is a witness of p′_i ≁^{E(Σ),R_i}_{Ltgt} p_i. Therefore, S can be refined by the simple elementary language s ∈ SE(Σ) including w″_i.
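Locating such an index i has the structure of a binary search: given that w_0 disagrees with the hypothesis and w_n agrees, a flip index must exist and can be found with O(log n) membership queries. A generic sketch (our own phrasing of the Rivest-Schapire-style search, with `agrees(i)` standing for the constant number of membership queries on w_i):

```python
def find_flip(agrees, n):
    """Find i in {1, ..., n} with agrees(i - 1) False and agrees(i) True,
    assuming agrees(0) is False and agrees(n) is True (Algorithm 3 sketch).
    Uses O(log n) evaluations of `agrees`."""
    lo, hi = 0, n              # invariant: agrees(lo) is False, agrees(hi) is True
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if agrees(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Note that the sequence need not be monotone; the invariant only guarantees that *some* flip index is returned, which is all the refinement needs.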

#### **4.4 L\*-Style Learning Algorithm for DTAs**

Algorithm 4 outlines our active DTA learning algorithm. At line 1, we initialize the timed observation table with P = {(ε, τ_0 = 0)} and S = {(ε, τ′_0 = 0)}. In the loop in lines 2–15, we refine the timed observation table until the hypothesis DTA A_hyp recognizes the target language Ltgt, which is checked by equivalence queries. The refinement finishes when the equivalence relation ∼^S_{Ltgt} defined by the suffixes S converges to ∼^{E(Σ)}_{Ltgt} and the prefixes P cover SE(Σ)/∼^{E(Σ)}_{Ltgt}.

In the loop in lines 3–11, we make the timed observation table cohesive. If the timed observation table is not closed, we move the incompatible row in succ(P)\ P to P (line 5). If the timed observation table is inconsistent, we concatenate an


(a) The initial timed observation table O_1 = (P_1, S_1, T_1); (b) the DTA A^1_hyp constructed from O_1; (c) the DTA A^3_hyp constructed from O_3; (d) the timed observation table O_2 = (P_2, S_2, T_2) after processing *cex*; (e) the final timed observation table O_3 = (P_3, S_3, T_3)

**Fig. 2.** Timed observation tables O_1, O_2, O_3, and the DTAs A^1_hyp and A^3_hyp made from O_1 and O_3, respectively. In O_2 and O_3, we only show the constraints that are non-trivial given p and s. The DTAs are simplified without changing the language. The use of clock assignments, which does not change the expressiveness, is from [21].

event a ∈ Σ in front of some of the suffixes in S (line 8). If the timed observation table is not exterior-consistent, we move the boundary succ^t(p) ∈ succ^t(P) \ P satisfying succ^t(p) ⊈ ext^t(p) to P (line 10). Once we obtain a cohesive timed observation table, we construct a DTA A_hyp = MakeDTA(P, S, T) and make an equivalence query (line 12). If L(A_hyp) = Ltgt, we return A_hyp. Otherwise, we obtain a timed word *cex* witnessing the difference between the language of the hypothesis DTA A_hyp and the target language Ltgt, and we refine the timed observation table using Algorithm 3.

*Example 27.* Let Ltgt be the timed language recognized by the DTA in Fig. 1c. We start from P = {(ε, τ_0 = 0)} and S = {(ε, τ′_0 = 0)}. Figure 2a shows the initial timed observation table O_1. Since O_1 is cohesive, we construct a hypothesis DTA A^1_hyp. The hypothesis recognizable timed language (P_1, F_1, Φ_1) is such that P_1 = F_1 = {(ε, τ_0 = 0)} and Φ_1 = {(ε, τ_0 > 0, ε, τ_0), (a, τ′_0 = τ_0 + τ_1 = τ_1 = 0, ε, τ_0)}. Figure 2b shows the first hypothesis DTA A^1_hyp.

We have L(A^1_hyp) ≠ Ltgt, and the learner obtains a counterexample, e. g., *cex* = 1.0 · a · 0, with an equivalence query. In Algorithm 3, we have w_0 = *cex*, w_1 = 0.5 · a · 0, w_2 = 0 · a · 0, and w_3 = 0. We have w_0 ∈ L(A^1_hyp) △ Ltgt and w_1 ∉ L(A^1_hyp) △ Ltgt, and the suffix distinguishing w_0 and w_1 is 0.5 · a · 0. Thus, we add s_1 = (a, τ′_1 = 0 < τ′_0 = τ′_0 + τ′_1 < 1) to S_1 (Fig. 2d).

In Fig. 2d, we observe that T_2(p_1, s_1) is stricter than T_2(p_0, s_1), and we have p_1 ≁^{S_2}_{Ltgt} p_0. To make (P_2, S_2, T_2) closed, we add p_1 to P_2. By repeating similar operations, we obtain the timed observation table O_3 = (P_3, S_3, T_3) in Fig. 2e, which is cohesive. Figure 2c shows the DTA A^3_hyp constructed from O_3. Since L(A^3_hyp) = Ltgt holds, Algorithm 4 finishes, returning A^3_hyp.

By the use of equivalence queries, Algorithm 4 returns a DTA recognizing the target language whenever it terminates, which is stated formally as follows.

**Theorem 28 (correctness).** *For any target timed language* Ltgt*, if Algorithm 4 terminates, for the resulting DTA* Ahyp*,* L(Ahyp) = Ltgt *holds.*

Moreover, Algorithm 4 terminates for any recognizable timed language Ltgt, essentially because of the finiteness of SE(Σ)/∼^{E(Σ)}_{Ltgt}.

**Theorem 29 (termination).** *For any recognizable timed language* Ltgt*, Algorithm 4 terminates and returns a DTA* A *satisfying* L(A) = Ltgt*.*

*Proof (Theorem 29).* By the recognizability of Ltgt and Theorem 19, SE(Σ)/∼^{E(Σ)}_{Ltgt} is finite. Let N = |SE(Σ)/∼^{E(Σ)}_{Ltgt}|. Since each execution of line 5 adds p to P, where p is such that for any p′ ∈ P, p ≁^{E(Σ)}_{Ltgt} p′ holds, it is executed at most N times. Since each execution of line 8 refines S, i. e., it increases |SE(Σ)/∼^S_{Ltgt}|, line 8 is executed at most N times. For any (u, Λ) ∈ SE(Σ), if Λ contains T_{i,|u|} = d for some i ∈ {0, 1, ..., |u|} and d ∈ ℕ, we have succ^t((u, Λ)) ⊆ ext^t((u, Λ)). Therefore, line 10 is executed at most N times. Since S is strictly refined in line 14, i. e., it increases |SE(Σ)/∼^S_{Ltgt}|, line 14 is executed at most N times. By Theorem 26, once ∼^S_{Ltgt} saturates to ∼^{E(Σ)}_{Ltgt}, MakeDTA returns the correct DTA. Overall, Algorithm 4 terminates.

#### **4.5 Learning with a Normal Teacher**

We briefly show how to learn a DTA with only membership and equivalence queries. We reduce a symbolic membership query to finitely many membership queries, answerable by a normal teacher. See Appendix B.1 of [29] for details.

Let (u, Λ) be the elementary language given in a symbolic membership query. Since Λ is bounded, we can construct, by a simple enumeration, a finite and disjoint set of simple and canonical timed conditions Λ′_1, Λ′_2, ..., Λ′_n satisfying ⋃_{1≤i≤n} Λ′_i = Λ. For any simple elementary language (u′, Λ′) ∈ SE(Σ) and timed words w, w′ ∈ (u′, Λ′), we have w ∈ L ⟺ w′ ∈ L. Thus, we can construct mem^sym_L((u, Λ)) by making a membership query mem_L(w) for each such (u′, Λ′) ⊆ (u, Λ) and for some w ∈ (u′, Λ′). We need such an exhaustive search, instead of a binary search, because mem^sym_L((u, Λ)) may be non-convex.
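The decomposition is easiest to see in one dimension. The sketch below, a drastic simplification to a single bounded variable, splits [lo, hi] into integer points and open unit intervals, queries one representative per cell (membership is constant on each cell), and returns the set of true cells; a result such as {(0, 1), (1, 2)} without the point 1 is exactly the non-convexity that rules out a binary search.

```python
from fractions import Fraction

def symbolic_membership_1d(mem, lo, hi):
    """Sect. 4.5 sketch for one bounded variable: enumerate the simple cells of
    [lo, hi] and keep those whose representative satisfies the membership
    oracle `mem`."""
    cells = []
    for d in range(lo, hi + 1):
        cells.append(("point", d))
        if d < hi:
            cells.append(("open", d, d + 1))
    # One representative per cell: the point itself, or the interval midpoint.
    rep = lambda c: Fraction(c[1]) if c[0] == "point" else Fraction(2 * c[1] + 1, 2)
    return [c for c in cells if mem(rep(c))]
```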

Assume Λ is a canonical timed condition. Let M be the number of variables in Λ and I be the largest difference between the upper bound and the lower bound of some T_{i,j} in Λ. The size n of the above decomposition is bounded by (2I + 1)^{M(M+1)/2}, which blows up exponentially with respect to M.
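In code, the bound reads as follows (a direct transcription, with I and M as in the text):

```python
def decomposition_bound(I, M):
    """Upper bound (2I + 1)^(M(M+1)/2) on the number of simple cells (Sect. 4.5)."""
    return (2 * I + 1) ** (M * (M + 1) // 2)
```

For example, with I = 2 (the case relevant to our queries, as noted below) the bound is 5^{M(M+1)/2}, already 15625 for M = 3.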

In our algorithm, we only make symbolic membership queries with elementary languages of the form p · s, where p and s are simple elementary languages. Therefore, I is at most 2. However, even with this restriction, the number of necessary membership queries blows up exponentially in the number of variables in Λ.

#### **4.6 Complexity Analysis**

After each equivalence query, our DTA learning algorithm strictly refines S or terminates. Thus, the number of equivalence queries is at most N. In the proof of Theorem 29, we observe that the size of P is at most 2N. Therefore, the number (|P| + |succ(P)|) × |S| of cells in the timed observation table is at most (2N + 2N × (|Σ| + 1)) × N = 2N²(|Σ| + 2). Let J be the upper bound of i in the analysis of *cex* returned by equivalence queries (Algorithm 3). For each equivalence query, the number of membership queries in Algorithm 3 is bounded by log J, and thus it is, in total, bounded by N × log J. Therefore, if the learner can use symbolic membership queries, the total number of queries is bounded by a polynomial in N and J. In Sect. 4.5, we observe that the number of membership queries needed to implement a symbolic membership query is at most exponential in M. Since P is prefix-closed, M is at most N. Overall, if the learner cannot use symbolic membership queries, the total number of queries is at most exponential in N.

**Table 1.** Summary of the results for Random. Each row index |L| |Σ| K<sup>C</sup> shows the number of locations, the alphabet size, and the upper bound of the maximum constant in the guards, respectively. The row "count" shows the number of instances finished in 3 h. Cells with the best results are highlighted.


Let Atgt = (Σ, L, l_0, C, I, Δ, F) be a DTA recognizing Ltgt. As we observe in the proof of Lemma 33 of [29], N is bounded by the size of the state space of the region automaton [4] of Atgt: N is at most |C|! × 2^{|C|} × ∏_{c∈C}(2K_c + 2) × |L|, where K_c is the largest constant compared with c ∈ C in Atgt. Thus, without symbolic membership queries, the total number of queries is at most doubly exponential in |C| and singly exponential in |L|. We remark that when |C| = 1, the total number of queries is at most singly exponential in |L| and K_c, which coincides with the worst-case complexity of the one-clock DTA learning algorithm in [30].
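The region-automaton bound on N can likewise be transcribed directly (a sketch; `K` is the list of constants K_c, one per clock, and `L` is the number of locations |L|):

```python
from math import factorial, prod

def region_bound(K, L):
    """Region-automaton bound on N (Sect. 4.6):
    |C|! * 2^|C| * prod_c (2*K_c + 2) * |L|."""
    C = len(K)
    return factorial(C) * 2 ** C * prod(2 * k + 2 for k in K) * L
```

For a one-clock DTA with K_c = 10 and 5 locations this gives 1 · 2 · 22 · 5 = 220; adding clocks multiplies the bound by a factorial in |C| and a factor per clock, which is where the bad dependence on |C| originates.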

### **5 Experiments**

We experimentally evaluated our DTA learning algorithm using our prototype library LearnTA<sup>4</sup> implemented in C++. In LearnTA, the equivalence queries are answered by a zone-based reachability analysis using the fact that DTAs are closed under complement [4]. We pose the following research questions.

**RQ1** How well does LearnTA scale with the complexity of the target language?

**RQ2** How efficient is LearnTA on practical benchmarks?

For the benchmarks with one clock variable, we compared LearnTA with one of the latest one-clock DTA learning algorithms [1,30], which we call OneSMT. OneSMT is implemented in Python with Z3 [23] for constraint solving.

For each execution, we measured the number of queries and the total execution time, including the time to answer the queries. For the number of queries, we report the number with memoization, i. e., we count the number of distinct queried timed words (for membership queries) and counterexamples (for equivalence queries). We conducted all the experiments on a computing server with an Intel Core i9-10980XE and 125 GiB RAM running Ubuntu 20.04.5 LTS, with a timeout of 3 h.

**Fig. 3.** The number of locations and the number of queries for the |L| 2 10 classes in Random, where |L| ∈ {3, 4, 5, 6}: (a) membership queries, (b) membership queries (log scale), (c) equivalence queries.

<sup>4</sup> LearnTA is publicly available at https://github.com/masWag/LearnTA. The artifact of the experiments is available at https://doi.org/10.5281/zenodo.7875383.


**Table 2.** Summary of the target DTAs and the results for Unbalanced. |L| is the number of locations, |Σ| is the alphabet size, |C| is the number of clock variables, and K<sup>C</sup> is the maximum constant in the guards in the DTA.

#### **5.1 RQ1: Scalability with Respect to the Language Complexity**

To evaluate the scalability of LearnTA, we used randomly generated DTAs from [5] (denoted Random) and our original DTAs (denoted Unbalanced). Random consists of five classes: 3 2 10, 4 2 10, 4 4 20, 5 2 10, and 6 2 10, where each class name |L| |Σ| K_C gives the number of locations, the alphabet size, and the upper bound of the maximum constant in the guards of the DTAs, respectively. Each class consists of 10 randomly generated DTAs. Unbalanced is our original benchmark inspired by the "unbalanced parentheses" timed language from [10]. It consists of five DTAs with timing constraints of increasing complexity. Table 2 summarizes their complexity.

Table 1 and Fig. 3 summarize the results for Random, and Table 2 summarizes the results for Unbalanced. Table 1 shows that LearnTA requires more membership queries than OneSMT. This is likely because of the difference in the definitions of prefixes and successors: OneSMT's definitions are discrete (e. g., prefixes are taken only at events, with time elapse), whereas ours are both continuous and discrete (e. g., we also consider prefixes obtained by trimming the dwell time at the end). Since our definition admits significantly more prefixes, LearnTA tends to require many more membership queries. Another, more high-level reason is that LearnTA learns a DTA without knowing the number of clock variables, so many more timed words are potentially helpful for learning. Table 1 also shows that LearnTA requires significantly more membership queries for 4 4 20. This is likely because of the exponential blowup with respect to K_C discussed in Sect. 4.6. In Fig. 3, we observe that for both LearnTA and OneSMT, the number of membership queries increases nearly exponentially with the number of locations, which coincides with the discussion in Sect. 4.6.

In contrast, Table 1 shows that LearnTA requires fewer equivalence queries than OneSMT. This suggests that the cohesion of Definition 24 successfully detects contradictions in the observations before a hypothesis is generated, whereas OneSMT mines timing constraints mainly through equivalence queries and thus tends to require more of them. In Fig. 3c, we observe that for both LearnTA and OneSMT, the number of equivalence queries increases nearly linearly with the number of locations, which also coincides with the complexity analysis in Sect. 4.6. Figure 3c also shows that the number of equivalence queries grows faster for OneSMT than for LearnTA.


**Table 3.** Summary of the target DTA and the results for practical benchmarks. The columns are the same as Table 2. Cells with the best results are highlighted.

Table 2 suggests a similar tendency: the number of membership queries increases rapidly with the complexity of the timing constraints, whereas the number of equivalence queries increases rather slowly. Moreover, LearnTA is scalable enough to learn a DTA with five clock variables within 15 min.

Table 1 also suggests that LearnTA does not scale well with respect to the maximum constant in the guards, as observed in Sect. 4.6. However, we still observe that LearnTA requires fewer equivalence queries than OneSMT. Overall, compared with OneSMT, LearnTA scales better in the number of equivalence queries and worse in the number of membership queries.

#### **5.2 RQ2: Performance on Practical Benchmarks**

To evaluate the practicality of LearnTA, we used seven benchmarks: AKM, CAS, Light, PC, TCP, Train, and FDDI. Table 3 summarizes their complexity. All the benchmarks other than FDDI are taken from [30] (or its implementation [1]). FDDI is taken from TChecker [2]. We use the instance of FDDI with two processes.

Table 3 summarizes the results for the benchmarks from practical applications. We observe, again, that LearnTA requires more membership queries and fewer equivalence queries than OneSMT. However, for these benchmarks, the difference in the number of membership queries tends to be much smaller than in Random. This is because these benchmarks have timing constraints that are simpler for LearnTA to explore than those of Random. In AKM, Light, PC, TCP, and Train, the clock variable can be reset at every edge without changing the language. For such a DTA, two simple elementary languages are equivalent with respect to the Nerode-style congruence whenever they share the same edge at their last event and the same dwell time after it. If two simple elementary languages are equivalent, LearnTA explores the successors of only one of them, so the exploration is relatively efficient. We have a similar situation in CAS. Moreover, in many of these DTAs, only a few edges have guards. Overall, despite the large numbers of locations and large alphabets, the complexity of these languages is mild for LearnTA.

We also observe that, surprisingly, for all of these benchmarks, LearnTA took a shorter time for DTA learning than OneSMT. This is partly because of the difference in the implementation language (i. e., C++ vs. Python) but also because of the small number of equivalence queries and the mild number of membership queries. Moreover, although it requires significantly more queries, LearnTA successfully learned FDDI with seven clock variables. Overall, such efficiency on benchmarks from practical applications suggests the potential usefulness of LearnTA in some realistic scenarios.

#### **6 Conclusions and Future Work**

Extending the L\* algorithm, we proposed an active learning algorithm for DTAs. Our extension is based on our Nerode-style congruence for recognizable timed languages. We proved the termination and correctness of our algorithm. We also proved that our learning algorithm requires a polynomial number of queries with a smart teacher and an exponential number of queries with a normal teacher. Our experimental results also suggest the practical relevance of our algorithm.

One future direction is to extend more recent automata learning algorithms (e. g., the TTT algorithm [19], to improve efficiency) to DTA learning. Another direction is to construct a *passive* DTA learning algorithm based on our congruence and an existing passive DFA learning algorithm. It is also a future direction to apply our learning algorithm in practice, e. g., for the identification of black-box systems and for testing black-box systems with black-box checking [22,24,28]. Optimization of the algorithm, e. g., by incorporating clock information, is also a future direction.

**Acknowledgements.** This work is partially supported by JST ACT-X Grant No. JPMJAX200U, JST PRESTO Grant No. JPMJPR22CA, JST CREST Grant No. JPMJCR2012, and JSPS KAKENHI Grant No. 22K17873.

#### **References**




**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Automated Analyses of IoT Event Monitoring Systems

Andrew Apicelli¹, Sam Bayless¹, Ankush Das¹, Andrew Gacek¹, Dhiva Jaganathan¹, Saswat Padhi², Vaibhav Sharma³(✉), Michael W. Whalen¹, and Raveesh Yadav¹

¹ Amazon Web Services, Inc., Seattle, USA
{apicea,sabayles,daankus,gacek,dhivasj,mww,raveeshy}@amazon.com
² Google LLC, Mountain View, USA
spadhi@google.com
³ Amazon.com Services LLC, Seattle, USA
svaib@amazon.com

Abstract. AWS IoT Events is an AWS service that makes it easy to respond to events from IoT sensors and applications. *Detector models* in AWS IoT Events enable customers to monitor their equipment or device fleets for failures or changes in operation and to trigger actions when such events occur. If these models are incorrect, they may become out of sync with the actual state of the equipment, leaving customers unable to respond to events occurring on it.

Working backwards from common mistakes made when creating detector models, we have created a set of automated analyzers that allow customers to prove their models are free from six common mistakes. Our analyzers have been running in the AWS IoT Events production service since December 2021. Our analyzers check six correctness properties in the production service in real time. 93% of customers of AWS IoT Events have run our analyzers without needing to have any knowledge of them. Our analyzers have reported property violations in 22% of submitted detector models in the production service.

### 1 Introduction

AWS IoT Events is a managed service for managing fleets of IoT devices. Customers use AWS IoT Events in diverse use cases, such as monitoring self-driving wheelchairs, monitoring a device's network connectivity, and sensing humidity, temperature, pressure, oil level, and oil temperature. Customers use AWS IoT Events by creating a *detector model* that detects events occurring on IoT devices and notifies an external service so that a corrective action can be taken. An example is an industrial boiler that constantly reports its temperature to a detector. The detector tracks the boiler's average temperature over the past 90 min and notifies a human operator when it is running too hot.

c The Author(s) 2023

S. Padhi–Work done while at Amazon Web Services, Inc.

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 27–39, 2023. https://doi.org/10.1007/978-3-031-37706-8\_2

Each detector model is defined as a finite state machine with dynamically typed variables and timers, where timers allow detectors to keep track of state over time. A model processes inputs from IoT devices to update internal state and to notify other AWS services when events are detected. Customers can use a single detector model to instantaneously detect events in thousands of devices. Ensuring well-formedness of a detector model is crucial as ill-formed detector models can miss events in *every* monitored device.

Starting from a survey that identified sources of well-formedness problems in customer models, we identified the most common mistakes made by customers and detect them using type checking and model checking. To use a model checker for checking well-formedness of a detector model, we formalize the execution semantics of a detector model and translate this semantics into the source-language notation of the JKind model checker [1]. Model checking [2–9] verifies desirable properties of the behavior of a system by performing the equivalent of an exhaustive enumeration of all the states reachable from its initial state. Most model checking tools use *symbolic encodings* and some form of *induction* [6] to prove properties of very large finite or even infinite state spaces.

We have implemented type-checking and model-checking as an analysis feature in the production AWS IoT Events service. Our analyzers have reported well-formedness property violations in 22% of submitted detector models. 93% of customers of AWS IoT Events have checked their detector models using our analyzers. Our analyzers report property violations to customers with an average latency of 5.6 s (see Sect. 4).

Our contributions are as follows:


### 2 Overview

Consider a user of AWS IoT Events who wants to monitor the temperature of an industrial boiler. If the industrial boiler overheats, it can cause fires and endanger human lives. To detect an early warning of an overheating event, they want to automatically identify two different alarming events on the boiler's temperature. They want their first alarm to be triggered if the boiler's reported temperature is outside the normal range for more than 1 min. They want their second alarm to be triggered if the temperature is outside the normal range for another 5 min after the first alarm.

A user might try to implement these requirements by creating the (flawed) detector model shown in Fig. 1. This detector receives temperature data from the boiler and responds by sending a text message to the user. The detector model contains four states:

Fig. 1. AWS IoT Events detector model with two alarms (buggy version)


Fig. 2. An action in the detector model from Fig. 1


*Understanding the Bug:* Every state in the detector model consists of *actions*. An action changes the internal state of a detector or triggers an external service. For example, the GettingTooHot state consists of an action that starts a timer. The user can edit these actions with the interface shown in Fig. 2. This action starts a one-minute timer named Wait1Min. Note that timers are accessible from every state in the detector model. Even though the Wait1Min timer is created in the GettingTooHot state of Fig. 1, it can be checked for expiration in all four states of Fig. 1.

The detector model in Fig. 1 has a fatal flaw caused by a typo. The user has written timeout("Wait1Min") instead of timeout("Wait5Min") when transitioning out of TooHot. This is allowed, as timers are globally referenceable. However, it is a bug: each global timer has a unique name, and the Wait1Min timer has already been used and has expired. Since a timer can expire at most once, this makes StillTooHot unreachable, meaning the second alarm can never fire.
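The flaw can be reproduced with a toy reachability check. The sketch below is a drastic simplification of the real detector-model semantics: each transition is guarded by at most one timer, and every started timer expires at most once (state and timer names follow Fig. 1; the encoding is ours, not the service's):

```python
def reachable(transitions, started_timers):
    """States reachable from 'Normal' when each timer expires at most once.

    transitions: list of (src, guard_timer_or_None, dst); a guarded
    transition consumes the single expiration of its timer.
    """
    budget = {t: 1 for t in started_timers}
    seen, frontier = {"Normal"}, ["Normal"]
    while frontier:
        src = frontier.pop()
        for s, guard, dst in transitions:
            if s != src:
                continue
            if guard is not None:
                if budget.get(guard, 0) == 0:
                    continue          # the guarding timer has already expired
                budget[guard] -= 1
            if dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen

BUGGY = [("Normal", None, "GettingTooHot"),
         ("GettingTooHot", "Wait1Min", "TooHot"),
         ("TooHot", "Wait1Min", "StillTooHot")]   # typo: reuses the expired timer
FIXED = BUGGY[:2] + [("TooHot", "Wait5Min", "StillTooHot")]
```

With both timers started, `reachable(BUGGY, {"Wait1Min", "Wait5Min"})` omits StillTooHot, because Wait1Min's single expiration is consumed on the way to TooHot; the FIXED variant reaches it.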

*Related Work.* Languages such as IOTA [10], SIFT [11], and the system from Garcia et al. [12] use *trigger-condition-action rules* [13] to control the behavior of Internet of Things applications. These languages have the benefit of being largely declarative, allowing users to specify desired actions under different environmental stimuli. Similar to our approach, SIFT [11] automatically removes common user mistakes and compiles specifications into controller implementations without user interaction, and IOTA [10] is a reasoning calculus that allows custom specifications to be written about why something *should* or *should not* occur. AWS IoT Events is designed explicitly for monitoring, rather than control, and our approach is imperative, rather than declarative: detector models do not have the same inconsistencies as rule sets, as they are disambiguated using explicit priorities on transitions. On the other hand, customers may still construct machines that do not match their intentions, motivating the analyses described in this paper.

### 3 Technique

In this section, we present a formal execution semantics of an AWS IoT Events detector model and describe specifications for the correctness properties.

*Formalization of Detector Models.* Defining the alphabet and the transition relation for the state machine is perhaps the most interesting aspect of our formalization. Since detector models may contain global timers, *timed automata* [14] might seem like an apt candidate abstraction. However, AWS IoT Events users are not allowed to change the clock frequency of timers, nor specify arbitrary *clock constraints*. These observations allow us to formalize the detector models as a regular state machine, with timeout durations as additional state variables.

Formally, we represent the state machine for a detector model **M** as a tuple ⟨**S**, **S**<sub>0</sub>, **I**, **G**, **T**, E<sup>E</sup>, E<sup>X</sup>, E<sup>I</sup>⟩, where:


It is assumed that the sets **I**, **G**, and **T** are pairwise disjoint, and we define the set **V** ≜ **I** ∪ **G** to represent the input and global variables in the model.

We denote by 𝕍 the set of values for global (**G**) and input (**I**) variables; 𝕍 ranges over the values of primitive types: integers, decimals (rationals), booleans,

```
τ ::= int | dec | str | bool
e ::= e0 bop e1 | uop e0 | l | v | timeout(t) | isundefined(v) | ...
α ::= setTimer(t, e) | resetTimer(t)
    | clearTimer(t) | setGlobal(g, e)
κ ::= event(e, a∗)
μ ::= transition(e, a∗, s)
ι ::= message(i, v) | timeout(t)
```
Fig. 3. Types, expressions, actions, and events in IoT Events Detector Models

and strings. Integers and rationals are assumed to be unbounded, and rationals are arbitrarily precise. We use ℕ as the domain for time and timeout values. The sets 𝕍<sup>⊥</sup> and ℕ<sup>⊥</sup> extend 𝕍 and ℕ with the value ⊥, which represents an *uninitialized* variable.

The grammar for types (τ), expressions (e), actions (α), events (κ), transitions (μ), and input triggers (ι) is shown in Fig. 3. In the grammar, the metavariable e stands for an expression, l for a literal value in 𝕍, v for any variable in **V**, t for a timer variable in **T**, a for an action, and i for an input in **I**. The unary and binary operators include standard arithmetic, Boolean, and relational operators. The timeout expression is true at the instant timer t expires, and the isundefined expression returns true if the variable or timer in question has not been assigned. Actions (α) describe changes to the system state: setTimer starts a timer and sets its periodicity, while resetTimer and clearTimer reset and clear a timer, respectively (without changing its periodicity). The setGlobal action assigns a global variable. Events (κ) describe conditions under which a sequence of actions occurs.

We define configurations **C** for the state machine as:

$$\mathbf{C} \triangleq \mathbf{S} \times (\mathbf{I} \to \mathbb{V}^{\perp}) \times (\mathbf{T} \to (\mathbb{N}^{\perp} \times \mathbb{N}^{\perp})) \times (\mathbf{G} \to \mathbb{V}^{\perp})$$

Each configuration C = ⟨s, i, t, g⟩ tracks the following:


*Example 1.* Consider a corrected version of our example detector model from Fig. 1 which has two timers, Wait1Min and Wait5Min, and no global variables. Some examples of configurations for this model are:

– ⟨TempOK, {temp : ⊥}, {Wait1Min : (⊥, ⊥), Wait5Min : (⊥, ⊥)}, {}⟩ is the initial configuration. The model contains the input temp, the timers Wait1Min and Wait5Min, and no global variables. As no variables or timers have been assigned, all variables have the undefined value (⊥).

Fig. 4. Rules describing behavior of the system

– ⟨TooHot, {temp : 300}, {Wait1Min : (60, ⊥), Wait5Min : (300, 260)}, {}⟩ is the configuration at global time t if the temperature is still beyond the normal range and we transition to the TooHot detector model state. Note that the Wait1Min timer is no longer set, whereas the Wait5Min timer has a periodicity of 300 and is set to expire at t + 260.
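The shape of a configuration is straightforward to mirror in code. The sketch below (Python; the field names and the use of None for ⊥ are our own encoding, not the service's implementation) represents C = ⟨s, i, t, g⟩ and reconstructs the two example configurations:

```python
from dataclasses import dataclass

# A configuration C = <s, i, t, g>: current state, input values,
# timers mapped to (periodicity, expiry) pairs, and global variables.
# None plays the role of the undefined value ⊥.

@dataclass
class Config:
    s: str     # current detector model state
    i: dict    # input variable -> value or None
    t: dict    # timer -> (periodicity, expiry)
    g: dict    # global variable -> value or None

initial = Config(
    s="TempOK",
    i={"temp": None},
    t={"Wait1Min": (None, None), "Wait5Min": (None, None)},
    g={},
)

# The later configuration from the example: Wait1Min is no longer set,
# Wait5Min has periodicity 300 and expires 260 time units from now.
later = Config(
    s="TooHot",
    i={"temp": 300},
    t={"Wait1Min": (60, None), "Wait5Min": (300, 260)},
    g={},
)
```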

To define the execution semantics, we create a structural operational semantics for each of the grammar rules and for the interaction with the external environment, as shown in Fig. 4. We distinguish semantic rules by decorating the turnstiles with the grammar type that they operate over (e, α, κ, μ, E<sup>I</sup>, and ι). The variables e, a, k, m, i stand in for elements of the appropriate syntactic class defined by the turnstile. For lists of elements, we decorate the syntactic class with \* (e.g. α\*) and the variables with 'l' (e.g. al). We use the following notational conventions: given C = ⟨s, i, t, g⟩, we write C.s = s, and similarly for the other components of C. We also write C[s ← s′] for ⟨s′, i, t, g⟩, and similarly for the other components of C.

Expressions (⊢<sub>e</sub>) evaluate to values, given a configuration. We do not present expression rules (they are simple), but illustrate the other rule types in Fig. 4. For actions (⊢<sub>α</sub>), the *setTimer* rule establishes the periodicity of a timer and also starts it. The *resetTimer* and *clearTimer* rules restart an existing timer given a periodicity p or clear it, respectively, and the *setGlobal* rule updates the value of a global variable. *Events* (⊢<sub>κ</sub>) are used by entry and exit events for states. The list rules for actions (α∗) and events (κ∗) are not presented but are straightforward: they apply the relevant rule to the head of the list and pass the updated configuration to the remainder of the list, or return the configuration unchanged for nil. *Transition event lists* (μ∗) cause the system to change state, executing (only) the first transition from the list whose guard e evaluates to true. Finally, the top-level rule ⊢<sub>ι</sub> describes how the system evolves according to external stimuli.

A *run* of the machine is any valid sequence of configurations produced by repeated applications of the ⊢<sub>ι</sub> rule. Timeout inputs increment the time to the earliest active timeout, as described by the *matchesEarliest* predicate:

$$\begin{aligned} \mathsf{matchesEarliest}(t, x) \equiv{} & \exists t_i, p_i.\ (p_i, x) = t(t_i)\ \land \\ & \forall t_j, p_j, y.\ ((p_j, y) = t(t_j) \implies y = \bot \lor y \ge x) \end{aligned}$$

The subtractTimers function subtracts the elapsed time from each timer in C, and the clearTimers function calls the clearTimer action for every timer whose time remaining is equal to zero<sup>1</sup>.
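The timer bookkeeping can be sketched concretely. In the Python sketch below, the function names mirror the predicates in the text, but the data layout (a dict of (periodicity, remaining) pairs, with None for ⊥) is our own assumption: time advances to the earliest active expiry, that amount is subtracted from every running timer, and timers that reach zero are cleared.

```python
# timers: name -> (periodicity, remaining), with None for ⊥ (not running)

def earliest(timers):
    """Smallest remaining time among active timers (cf. matchesEarliest)."""
    remaining = [r for (_, r) in timers.values() if r is not None]
    return min(remaining) if remaining else None

def subtract_timers(timers, dt):
    """Subtract the elapsed time dt from every running timer."""
    return {name: (p, r - dt if r is not None else None)
            for name, (p, r) in timers.items()}

def clear_timers(timers):
    """Clear (stop) every timer whose remaining time reached zero."""
    return {name: (p, None if r == 0 else r)
            for name, (p, r) in timers.items()}

timers = {"Wait1Min": (60, 60), "Wait5Min": (300, 260)}
dt = earliest(timers)                 # 60: Wait1Min fires first
timers = clear_timers(subtract_timers(timers, dt))
# Wait1Min is now cleared; Wait5Min has 200 time units left
```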

#### 3.1 Well-formedness Properties

To find common issues with detector models, we surveyed (i) detector models across customer tickets submitted to AWS IoT Events, (ii) questions posted on internal forums like the AWS re:Post forum [15], and (iii) feedback submitted via the web-based console for AWS IoT Events. Based on this survey, we determined that the following correctness properties should hold over all detector models. For more details about this survey, please refer to Appendix A.

The Model does not Contain Type Errors: The AWS IoT Events expression language is *untyped*, and thus models may contain ill-typed expressions, e.g., performing arithmetic operations on Booleans. A large class of such bugs can be readily detected and prevented using a *type inference* algorithm. The algorithm follows the standard Hindley-Milner type unification approach [16–18] and generates (and solves) a set of type constraints, or reports an error if no valid typing is possible. We use this type inference algorithm to detect type errors in the detector model. Every type error is reported as a warning to the customer. When our type inference successfully infers types for expressions, we use them to construct a well-typed abstract state machine using the formalization reported in Sect. 3.
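For the small expression grammar of Fig. 3, constraint-based inference is compact. The sketch below is a toy Python unifier over the four base types (not the production algorithm; the string encoding of type variables as "?x" is our own assumption): it solves equality constraints by substitution and reports a type error when no typing exists.

```python
# Toy Hindley-Milner-style unification over the types of Fig. 3:
# int, dec, str, bool, plus "?x"-style type variables for unknowns.

def unify(constraints):
    """Solve type-equality constraints; raise TypeError if no typing exists."""
    subst = {}

    def resolve(t):
        while t in subst:
            t = subst[t]
        return t

    for a, b in constraints:
        a, b = resolve(a), resolve(b)
        if a == b:
            continue
        if a.startswith("?"):          # bind a type variable
            subst[a] = b
        elif b.startswith("?"):
            subst[b] = a
        else:
            raise TypeError(f"cannot unify {a} with {b}")
    return resolve

# temp < 100 : comparison forces temp to be numeric, result is bool
solve = unify([("?temp", "dec"), ("?result", "bool")])
assert solve("?temp") == "dec"

# A type error: using the same variable as both bool and int
try:
    unify([("?x", "bool"), ("?x", "int")])
except TypeError:
    pass  # reported to the customer as a warning
```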

For the remaining well-formedness properties we use model checking. We introduce one or more *indicator variables* in our global abstract state to track certain kinds of updates in the state machine, and then we assert temporal properties on these indicator variables. Because we use a model checker that

<sup>1</sup> In the interests of space, we do not cover the *batch* execution mode, where all variables used in expressions maintain their "pre-state" value until the step is completed; it is a straightforward extension.

checks only *safety properties*, in many cases we invert the property of interest and check that its negation is falsifiable, using the same mechanism often used for test-case generation [19].

Every Detector Model State is Reachable and Every Detector Model Transition and Event can be Executed: For each state s ∈ **S**, we add a new Boolean *reachability indicator* variable v<sup>s</sup><sub>reached</sub> to our abstract state that is initially false and assigned true when the state is entered (similarly for transitions and events). To encode the property in a safety property checker, we encode the following *unreachability* property expressed in LTL and check that it is falsifiable. If it is provable, the tool warns the user.

> Unreachable(s) ≜ □(¬ v<sup>s</sup><sub>reached</sub>)
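On the state level, this check amounts to plain reachability on the abstract machine. The Python sketch below is an explicit-state stand-in for the model checker's search (the transition relation here is a hypothetical simplification of Fig. 1, with the buggy model's missing transition into StillTooHot): it sets each state's reachability indicator when the state is entered and warns about any indicator that is never set.

```python
from collections import deque

# Explore the state graph, setting the reachability indicator of every
# state we enter; any state whose indicator stays false is reported.

transitions = {                       # state -> successor states
    "TempOK": ["GettingTooHot"],
    "GettingTooHot": ["TempOK", "TooHot"],
    "TooHot": ["TempOK"],             # buggy model: StillTooHot never entered
    "StillTooHot": ["TempOK"],
}

reached = {s: False for s in transitions}   # indicator variables
queue = deque(["TempOK"])                   # initial state
while queue:
    s = queue.popleft()
    if reached[s]:
        continue
    reached[s] = True
    queue.extend(transitions[s])

warnings = [s for s, r in reached.items() if not r]
# Unreachable(s) is provable exactly for the states left in warnings
```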

Every Variable is Set Before Use: In order to test that variables are properly initialized, first we identify the places where variables are assigned and used. In detector models, there are three places where variables are used: in the evaluation of conditions for events and transitions, and in the setGlobal action (which occurs because of an event or transition). We want to demonstrate that the variables used within these contexts are never equal to ⊥ during evaluation. In this case, we can reuse the reachability variables that we have created for events and transitions to encode that variables should always have defined values when they are used.

We first define some functions to extract the set of variables used in expressions and action lists. The function Vars(e) simply extracts the variables in the expression. For action lists, it is slightly more complex, because variables are both defined and used:

$$\begin{aligned} Vars(\mathtt{nil}) &= \{\} \\ Vars(\mathtt{setTimer}(t,e)::tl) &= Vars(e) \cup Vars(tl) \\ Vars(\mathtt{resetTimer}(t)::tl) &= Vars(tl) \\ Vars(\mathtt{clearTimer}(t)::tl) &= Vars(tl) \\ Vars(\mathtt{setGlobal}(g,e)::tl) &= Vars(e) \cup (Vars(tl)-\{g\}) \\ Vars(\mathtt{event}(e,al)) &= Vars(e) \cup Vars(al) \\ Vars(\mathtt{transition}(e,al,s')) &= Vars(e) \cup Vars(al) \end{aligned}$$
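These equations translate directly into code. In the Python sketch below (our own encoding: actions are tuples, and each expression's variable set is pre-computed for brevity), note how setGlobal(g, e) removes g from the tail's use set, since g is defined for the rest of the list:

```python
# Actions encoded as tuples, e.g. ("setGlobal", "g", vars_of_e),
# where vars_of_e is the set of variables the expression uses.

def vars_of_actions(actions):
    """Variables *used* by an action list, following the Vars equations."""
    if not actions:                        # Vars(nil) = {}
        return set()
    head, *tail = actions
    rest = vars_of_actions(tail)
    kind = head[0]
    if kind == "setTimer":                 # setTimer(t, e)
        _, _, e_vars = head
        return e_vars | rest
    if kind in ("resetTimer", "clearTimer"):
        return rest
    if kind == "setGlobal":                # setGlobal(g, e): g defined for tail
        _, g, e_vars = head
        return e_vars | (rest - {g})
    raise ValueError(kind)

def vars_of_event(cond_vars, actions):
    """Vars(event(e, al)) = Vars(e) ∪ Vars(al)."""
    return cond_vars | vars_of_actions(actions)

acts = [("setGlobal", "count", {"count"}),   # count := count + 1 (uses count)
        ("setTimer", "Wait1Min", {"dur"}),
        ("setGlobal", "flag", {"temp"})]
assert vars_of_actions(acts) == {"count", "dur", "temp"}
```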

Every event or transition can be executed at most once during a computation step, so we can use the execution indicator variables to determine when a variable might be used.

$$\forall a_i, v_j \in Vars(a_i)\,.\quad \text{SetBeforeUse}(a_i, v_j) \triangleq \square \left(v_{\text{exec}}^{a_i} \implies v_j \neq \bot\right)$$

Input Read Only on Message Trigger: This property is covered in the previous property, with one small change. To enforce it, we modify the translation of the semantics slightly so that at the beginning of each step, prior to processing the input message, all input variables are assigned ⊥.

Message Triggered Between Consecutive Timeouts: We conservatively approximate a liveness property (no infinite path consisting of only timeout events) with a safety property: the same timer should not timeout twice without an input message occurring in between the timeouts. This formulation may flag models that do not have infinite paths with no input events, but our customers consider it a reasonable indicator.

We begin by defining an indicator variable v<sup>i</sup><sub>timeout</sub> for each timer t<sub>i</sub> (of type integer rather than Boolean) and initialize it to zero. We modify the translation of updateTimers to increment this variable when its timer variable equals zero, and modify the translation of the *message* rule to reset all v<sup>i</sup><sub>timeout</sub> variables to zero. The property of interest is then:

> NoConsecutiveTimeouts(t<sub>i</sub>) ≜ □(v<sup>i</sup><sub>timeout</sub> < 2)
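The instrumentation can be pictured as a runtime monitor. The Python sketch below (our own encoding of the per-timer integer indicator, purely illustrative) increments a counter on each timeout, resets all counters on an input message, and flags the property violation when a counter reaches 2:

```python
# Per-timer integer indicator: incremented on each timeout of that
# timer, reset to zero whenever an input message arrives. The safety
# property NoConsecutiveTimeouts asserts every indicator stays < 2.

class TimeoutMonitor:
    def __init__(self, timers):
        self.count = {t: 0 for t in timers}

    def on_timeout(self, timer):
        self.count[timer] += 1
        return self.count[timer] < 2   # False signals a violation

    def on_message(self):
        self.count = {t: 0 for t in self.count}

m = TimeoutMonitor(["Wait1Min"])
assert m.on_timeout("Wait1Min")        # first timeout: fine
m.on_message()                         # input message resets counters
assert m.on_timeout("Wait1Min")        # still fine
assert not m.on_timeout("Wait1Min")    # two timeouts, no message: violation
```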

### 4 Experiments

In this section, we evaluate the performance of model-checking safety properties on detector models, with a focus on model checking latency. Low analysis latency is crucial because our tool warns customers of property violations while they are editing their detector model. Our type inference implementation runs with an average latency of 10 milliseconds on all the detector models in our experiments. Since type inference is much faster than model checking and can be successfully run on all detector models, we do not evaluate it in this section.

AWS IoT Events has a commercial feature [20] which uses the type checking and model checking described in Sect. 3. The feature's implementation first infers types using the type inference algorithm. Next, it translates the detector model into the Lustre language [21]. The translation of IoT Events into Lustre is straightforward and directly follows from the semantics presented in Sect. 3. The safety properties described in Sect. 3.1 are attached to the model, along with location information. Then the feature analyzes the model using the JKind [1] tool suite, an open-source industrial model-checker. If JKind invalidates a safety property, the feature decodes the location from the safety property and includes it in the warning.

To evaluate this implementation, we randomly selected 210 detector models previously analyzed by the commercial feature. We checked the five properties described in Sect. 3.1 in parallel on a c4.8xlarge EC2 instance running Amazon Linux 2 x86\_64 using JKind version 4.4.1, with a timeout of 60 s.

Of the safety properties that we were able to translate to Lustre, JKind resolved 96% within our timeout of 60 s, with 80% completing in less than 10 s.

Table 1 shows that checking the *no-unreachable-action* safety property requires the most time to complete. The detector models analyzed in the evaluation include models for monitoring self-driving wheelchairs, device connectivity, humidity, temperature, pressure, oil level, oil temperature, doors, motion, refrigerator temperature, dough fermentation, and vehicle speed-sensing. They consisted of between 1–7 states and from 0–14 state changes. The *no-unreachable-action* safety property is checked on every action, generating an


Table 1. Performance of our model-checking tool against 210 detector models

average of 17 safety properties per detector model, the most of any kind of safety property. This large number of properties to be checked on every detector model caused checking the *no-unreachable-action* safety property to have the highest average latency (5.6 s per analysis).

Table 1 shows that about 13% of the properties could not be translated to Lustre. In 2% of the detector models, translation failures arose due to type errors or incorrect use of the AWS IoT Events expression language in the detector model. The remaining translation failures occurred due to either: (1) use of operations not supported by Lustre, (2) no types being inferred for inputs or variables in the detector model, or (3) use of non-linear arithmetic, which is unsupported in JKind. Bitwise functions, strings, and array data types are supported in the AWS IoT Events expression language but not in Lustre. This language gap prevented us from translating 19 of the 210 detector models. Failing to infer a type for a variable in the detector model prevented translation of 6 of the 210 detector models. JKind's lack of support for non-linear arithmetic prevented model-checking 2 of the 210 detector models. We are actively working to support more functions, string and array data types, type annotations, and non-linear arithmetic in our model-checking of detector models.

### 5 Conclusion

Our analyzers have been running in the AWS IoT Events production service since December 2021. Since then, 93% of AWS IoT Events customers have used our implementation to check their detector models for well-formedness, without needing to have any knowledge of the underlying type checking and model checking. Our analyzers successfully complete for 85% of real-world detector models and we are actively working on improving this support as explained in Sect. 4. Overall, our implementation has reported well-formedness property violations in 22% of submitted detector models in the production service, with an average latency of 5.6 s. We find giving customers push-button access to fast verification without requiring any knowledge of the underlying techniques enables adoption of automated reasoning-based tools.

### A Common Issues with Detector Models


Table 2. Issues seen in detector models from customer questions

As mentioned in Sect. 3.1, we surveyed customer detector models for generic correctness problems. We present the root causes of the problems from this study in Table 2. Incorrect scaling (#1) occurs when the customer does not set up their detector model to be instantiated correctly for every IoT device in their fleet. Infinite loop (#3) occurs when the detector model has an infinite execution path involving only timeout events and no external input messages. IoT models should be eventually quiescent if no external inputs occur.

Variable-used-before-set (#4) occurs when a variable in the detector's state is read from before being set to an initial value. AWS IoT Events does not require variables in detector models to be initialized.

A step through a detector can be triggered by either a timer expiration or a new value being sent to the detector by the outside world. Input read on timer expiration (#5) occurs when a step, triggered by timer expiration, causes the detector to read from its input(s). This is a problem because customers often do not realize that such a read will return the last value sent to the detector by the outside world. Insufficient logging permissions (#6) occurs when a detector is not given sufficient permissions to produce logging output. Incorrect cross-service setup (#8) occurs when customers do not correctly set up data flow across services in AWS IoT. While unnecessarily complex detector models (#9) is not a correctness problem, it poses a significant difficulty to customers in maintaining their detector models, and so we include it in Table 2.

Of these 9 root causes, we identified that type checking and model checking detected 5 root causes highlighted in green in Table 2. These 5 root causes were responsible for 44% of issues in our survey. Based on Table 2, we determined that the following correctness properties should hold over all detector models:


We explain these properties further in Sect. 3.1.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Learning Assumptions for Compositional Verification of Timed Automata**

Hanyue Chen<sup>1</sup>, Yu Su<sup>1</sup>, Miaomiao Zhang1(B) , Zhiming Liu<sup>2</sup>, and Junri Mi<sup>1</sup>

> <sup>1</sup> Tongji University, Shanghai, China {2111285,lemon,miaomiao,18mjr}@tongji.edu.cn <sup>2</sup> Southwest University, Chongqing, China zhimingliu88@swu.edu.cn

**Abstract.** Compositional verification, such as the technique of assume-guarantee reasoning (AGR), verifies a property of a system from the properties of its components. It is essential for addressing the state explosion problem associated with model checking. However, obtaining an appropriate assumption for AGR is always a demanding mental challenge, especially in the case of timed systems. In this paper, we propose a learning-based compositional verification framework for deterministic timed automata. In this framework, a modified learning algorithm is used to automatically construct the assumption in the form of a deterministic one-clock timed automaton, and an effective scheme is implemented to obtain the clock reset information for the assumption learning. We prove the correctness and termination of the framework and present two kinds of improvements to speed up the verification. We discuss the results of our experiments to evaluate the scalability and effectiveness of the framework. The results show that the framework we propose can reduce the state space effectively, and that it outperforms traditional monolithic model checking in most cases.

### **1 Introduction**

Model checking [9,19,33,36] is an important technique to automatically determine whether a system satisfies a specified property. However, it suffers from the state explosion problem since it needs to store the explored system states in memory, which is impossible for most realistic systems [21]. In timed systems, although symbolic representations and partial order reductions have greatly increased the size of the systems that can be verified, many realistic timed systems are still too large to be handled. In particular, if a system has several components, the number of global system states will grow exponentially with the number of components. Assume-guarantee reasoning (AGR) [20,25,29,35] is a promising method helpful to address the state explosion problem.

Consider a system M composed of two components M<sub>1</sub> and M<sub>2</sub> that synchronize on a given set of shared actions. Supposing we are to verify that M satisfies a property φ, the AG verification rule states that if there exists an assumption

© The Author(s) 2023

This work has been funded by NSFC under grant No.61972284 and No.62032019.

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 40–61, 2023. https://doi.org/10.1007/978-3-031-37706-8\_3

A on the environment of M<sub>2</sub> such that 1) M<sub>1</sub> and A satisfy the property φ, and 2) M<sub>2</sub> is a refinement of A, then M satisfies φ.

A major challenge in verifying component-based systems using the AG rule is the need to obtain the appropriate assumption, which requires non-trivial human effort [26]. Based on the abstraction-refinement paradigm in [22], the assumption is computed as a conservative abstraction of some of the components, and it is then refined using counterexamples obtained from model checking it [15]. The algorithm presented in [24] is capable of generating the weakest possible assumption automatically, though it does not compute partial results. In later work [23], a framework is proposed for the automatic generation of assumptions in an incremental fashion using the *L*\* learning algorithm [8]. Several improvements, e.g. [14,17,18,38], have been proposed to further reduce the learning complexity. The work [6] by Alur et al. presents a symbolic implementation of the *L*\* algorithm where the required data structures are maintained compactly using ordered BDDs [16].

All the aforementioned work focuses on untimed systems. For timed systems, using assume-guarantee style proof rules, the work in [39] proves that a refined representation is a correct implementation of an abstract one. To check Zeroconf, a protocol for dynamic configuration of IPv4 link-local addresses, Berendsen et al. [12] model the protocol as a network of timed automata (TAs) [3,4], and provide a proof that combines model checking with the application of a new abstraction relation that is compositional with respect to committed locations. However, the abstract models there are all provided manually. Compared to the manual methods, the compositional verification framework presented in [31,32] utilizes a learning algorithm for automatic construction of timed assumptions for AGR. That work considers event-recording automata [5], which are a subclass of timed automata. Sankur [37] gives compositional verification for a system composed of a deterministic finite automaton (DFA) and a timed automaton, where a DFA assumption is learned [27] to approximate the timed component. The framework can only check untimed properties of the system, and it is limited to relatively small TAs.

The timed automaton is a widely appreciated model for its simplicity and adequate expressiveness, and it is widely used for practical real-time systems [28,30]. However, to the best of our knowledge, though compositional verification for timed systems helps mitigate the state space explosion problem, there is still no work that tackles the problem of automatically inferring timed assumptions based on AGR for timed automata. Therefore, we propose, in this paper, a learning-based framework for AG-based automatic verification of deterministic timed automata. The framework applies the compositional rule in an iterative fashion. Each iteration consists of three steps. In the first step, based on the work in [7], a modified *L*\* algorithm is presented to learn a timed assumption in the form of a deterministic one-clock timed automaton (DOTA) using membership queries. Then two further steps are conducted to check whether the learned assumption satisfies the two premises of the proof rule via candidate queries. We design an algorithm for model conversion with polynomial complexity, which is executed as a step preceding the above iterative steps. It converts the input models M<sub>1</sub>, M<sub>2</sub> and φ to output models that contain the clock reset information for the assumption learning. Thus, the total complexity of the learning step in the framework is polynomial. We show this conversion preserves the verification results.

We further prove the correctness and termination of the compositional verification. We would like to note that the framework we propose applies to the verification of systems with a number of components. In other words, though the assumption learned is a DOTA, M<sub>1</sub> and M<sub>2</sub> can be compositions of several DOTAs. For this, we design a heuristic to transform multi-clock reset information into one-clock reset information, which enables the framework to handle learning-based compositional verification for multi-clock systems. We also propose two improvements to speed up the verification, which are shown to have different advantages across the experimental cases. Finally, we implement the framework and conduct comparative experiments with UPPAAL [10,11] on cases from the AUTOSAR (Automotive Open System Architecture) benchmark [1]. The experiments show that the framework proposed in this paper performs better than UPPAAL provided the properties to be checked are satisfied.

The rest of the paper is organized as follows. In Sect. 2, we introduce background knowledge. We present in Sect. 3 our learning-based compositional verification framework, as well as the proofs of termination and correctness. In Sect. 4, we present the two improvements. We report the experimental results in Sect. 5. Finally, we discuss the conclusions of the paper in Sect. 6.

### **2 Preliminaries**

We use ℕ to denote the set of natural numbers, ℝ<sub>≥0</sub> the set of non-negative reals, and let 𝔹 = {⊤, ⊥}, where ⊤ and ⊥ stand for true and false, respectively.

#### **2.1 Timed Automata**

Let X be a finite set of real-valued variables, ranged over by x, y, etc., standing for clocks. A clock valuation for X is a function ν : X → ℝ<sub>≥0</sub> which associates every clock x with a value ν(x) ∈ ℝ<sub>≥0</sub>. For t ∈ ℝ<sub>≥0</sub>, let ν + t denote the clock valuation which maps every clock x ∈ X to the value ν(x) + t. For a set γ ⊆ X and a valuation ν, we use [γ → 0]ν to denote the valuation which resets all clock variables in γ to 0 and agrees with ν on the other clocks in X\γ.

We use Φ(X) to denote the set of clock constraints over X of the form ϕ ::= ⊤ | x<sub>1</sub> ▷◁ m | x<sub>1</sub> − x<sub>2</sub> ▷◁ m | ϕ ∧ ϕ, where x<sub>1</sub>, x<sub>2</sub> ∈ X, m ∈ ℕ and ▷◁ ∈ {=, <, >, ≤, ≥}. We write ϕ(ν) = ⊤ to mean that the clock valuation ν for X satisfies the clock constraint ϕ over X, i.e. ϕ evaluates to true using the values given by ν.

**Definition 1 (Timed Automata).** *A timed automaton (TA) is a 6-tuple* M = (Q, q<sub>0</sub>, Σ, F, X, Δ)*, where* Q *is a finite set called the locations,* q<sub>0</sub> ∈ Q *is the initial location,* Σ *is a finite set called the alphabet,* F ⊆ Q *is the set of accepting locations,* X *is the finite set of clocks, and* Δ ⊆ Q × Σ × Φ(X) × 2<sup>X</sup> × Q *is a finite set called the transitions.*

A transition δ ∈ Δ is a 5-tuple (q, σ, ϕ, γ, q′), where q, q′ ∈ Q are respectively the source and target locations, σ ∈ Σ is an action, ϕ is a clock constraint over X, called the guard of the transition, which specifies that the transition is enabled when it is true in the source state, and the set γ ⊆ X gives the clocks reset by this transition. Thus, δ allows a jump from q to q′ by performing the action σ if it is enabled, i.e. ϕ(ν) = ⊤. We use δ[i] to denote the i-th element of the tuple δ = (q, σ, ϕ, γ, q′) for i = 1,..., 5. A *run* ρ of M is a finite sequence of transitions $\rho = (q_0, \nu_0) \xrightarrow{\sigma_1, t_1} (q_1, \nu_1) \xrightarrow{\sigma_2, t_2} \cdots \xrightarrow{\sigma_n, t_n} (q_n, \nu_n)$, where ν<sub>0</sub>(x) = 0 for all x ∈ X, and for all 1 ≤ i ≤ n there exists a transition (q<sub>i−1</sub>, σ<sub>i</sub>, ϕ<sub>i</sub>, γ<sub>i</sub>, q<sub>i</sub>) ∈ Δ such that ϕ<sub>i</sub>(ν<sub>i−1</sub> + t<sub>i</sub>) = ⊤ and ν<sub>i</sub> = [γ<sub>i</sub> → 0](ν<sub>i−1</sub> + t<sub>i</sub>). If q<sub>n</sub> is an accepting location, we say ρ is an *accepting run* of M. Each pair (σ<sub>i</sub>, t<sub>i</sub>) ∈ Σ × ℝ<sub>≥0</sub> in the run ρ is called a *timed action*, indicating that the action σ<sub>i</sub> is applied t<sub>i</sub> time units after the occurrence of the previous action.
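A run can be replayed mechanically from this definition. The Python sketch below is a toy single-transition-per-step simulator (our own construction, purely illustrative: guards are arbitrary predicates over the clock valuation, and we assume at most one enabled transition per (location, action) pair, as in a deterministic TA): at each step it lets t time units pass, checks the guard, resets the listed clocks, and moves to the target location.

```python
# Replay a run of a timed automaton over timed actions (sigma, t).

def run(transitions, q0, clocks, word):
    """transitions: (q, sigma) -> (guard, resets, q'), where guard is
    a predicate over the clock valuation. Returns the final location,
    or None if some guard is not enabled."""
    q, nu = q0, {x: 0.0 for x in clocks}
    for sigma, t in word:
        nu = {x: v + t for x, v in nu.items()}      # let t time units pass
        guard, resets, q_next = transitions[(q, sigma)]
        if not guard(nu):
            return None                             # transition not enabled
        nu = {x: (0.0 if x in resets else v) for x, v in nu.items()}
        q = q_next
    return q

# One clock x; 'a' is allowed only while x <= 2 and resets x,
# 'b' is allowed only after at least 1 time unit since the reset.
trans = {
    ("q0", "a"): (lambda nu: nu["x"] <= 2, {"x"}, "q1"),
    ("q1", "b"): (lambda nu: nu["x"] >= 1, set(), "q0"),
}
assert run(trans, "q0", ["x"], [("a", 1.5), ("b", 1.0)]) == "q0"
assert run(trans, "q0", ["x"], [("a", 3.0)]) is None    # guard x <= 2 fails
```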

The *timed trace* of ρ is the timed word *trace*(ρ) = (σ₁, t₁)(σ₂, t₂)...(σₙ, tₙ). Since each time value tᵢ represents a *delay time*, we also call such a timed trace a *delay-timed word*, denoted by ω. Adding the reset information along ω, we get the corresponding *reset-delay-timed word*, denoted by ωᵣ = *trace*ᵣ(ρ) = (σ₁, t₁, γ₁)(σ₂, t₂, γ₂)···(σₙ, tₙ, γₙ). Notice that here γᵢ ⊆ X is the clock set recording the clocks reset by the corresponding transition when taking the timed action (σᵢ, tᵢ).

If ρ is an accepting run of M, *trace*(ρ) is called an *accepting timed word*. The *recognized timed language* of M is the set of its accepting delay-timed words, i.e. L(M) = {*trace*(ρ) | ρ is an accepting run of M}. The *recognized reset-delay-timed language* Lᵣ(M) is defined as {*trace*ᵣ(ρ) | ρ is an accepting run of M}. A TA M is *deterministic* iff for any given delay-timed word ω, there is at most one run ρ in M with *trace*(ρ) = ω.

For a run ρ, we define the corresponding *logical-timed word* ωₗ = (σ₁, **v**₁)(σ₂, **v**₂)···(σₙ, **v**ₙ), where **v**ᵢ is the vector of length |X| recording the (nonnegative real) values of all clocks in X when σᵢ occurs. Delay-timed words and logical-timed words thus describe the operation of the timed model M from different perspectives. The former describe M from the external perspective, recording the actions and the time intervals between two consecutive actions, while the latter describe it from the internal perspective, recording the actions and the specific values of the internal clocks when the actions occur. Both are needed by the active learning algorithm described in Sect. 2.2.

Given the clock reset information γᵢ along the run ρ over the delay-timed word ω = (σ₁, t₁)(σ₂, t₂)...(σₙ, tₙ), we can obtain ω's corresponding logical-timed word ωₗ = (σ₁, **v**₁)(σ₂, **v**₂)···(σₙ, **v**ₙ) by taking

$$\mathbf{v}\_{i}[j] = \begin{cases} t\_{i}, & \text{if } i = 1, \text{ or } x\_{j} \in \gamma\_{i-1} \text{ and } 2 \le i \le n; \\ \mathbf{v}\_{i-1}[j] + t\_{i}, & \text{otherwise,} \end{cases} \tag{1}$$

where 1 ≤ j ≤ |X| and **v**ᵢ[j] is the j'th element of **v**ᵢ. We use Γ to denote the mapping from delay-timed words to logical-timed words, that is, Γ(ω) = ωₗ. With the reset information along the run ρ, we have the *reset-logical-timed word* ωᵣₗ = (σ₁, **v**₁, γ₁)(σ₂, **v**₂, γ₂)...(σₙ, **v**ₙ, γₙ). We can extend Γ to a mapping from reset-delay-timed words to reset-logical-timed words.
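Eq. (1) is straightforward to mechanize. The following sketch is our own illustration, not the paper's artifact: it computes the logical-timed word Γ(ω) from a delay-timed word together with the reset sets γᵢ along the run, for an arbitrary clock set.

```python
def to_logical(delay_word, resets, clocks):
    """Compute the logical-timed word of Eq. (1).

    delay_word: list of (action, t_i) pairs; resets: list of reset sets
    gamma_i, one per transition; clocks: iterable of clock names.
    Returns a list of (action, valuation) pairs."""
    logical = []
    prev = {x: 0.0 for x in clocks}            # clock values after previous step
    for (sigma, t), gamma in zip(delay_word, resets):
        v = {x: prev[x] + t for x in prev}     # all clocks advance by t_i
        logical.append((sigma, dict(v)))       # valuation when sigma_i fires
        for x in gamma:                        # clocks in gamma_i restart at 0
            v[x] = 0.0
        prev = v
    return logical
```

For instance, with one clock x, `to_logical([("a", 1.0), ("b", 3.0)], [{"x"}, set()], ["x"])` records the valuations 1.0 and 3.0, while without the first reset the second valuation would be 4.0.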

The *recognized logical-timed language* of M is given as Lₗ(M) = {Γ(*trace*(ρ)) | ρ is an accepting run of M}, and the *recognized reset-logical-timed language* of M is Lᵣₗ(M) = {Γ(*trace*ᵣ(ρ)) | ρ is an accepting run of M}.

**Definition 2 (Projection of Delay-Timed Words).** *Given a delay-timed word ω = (σ₁, t₁)(σ₂, t₂)...(σₙ, tₙ) ∈ (Σ₁ × ℝ≥₀)^∗ and an alphabet Σ₂, the projection of ω to Σ₂ is a delay-timed word, denoted by ω↓Σ₂ and defined as follows:*

$$\omega\downarrow\_{\Sigma\_2} = \left(\sigma\_{i\_1}, \sum\_{j=1}^{i\_1} t\_j\right) \left(\sigma\_{i\_2}, \sum\_{j=i\_1+1}^{i\_2} t\_j\right) \dots \left(\sigma\_{i\_m}, \sum\_{j=i\_{m-1}+1}^{i\_m} t\_j\right) \tag{2}$$

*where σᵢₖ ∈ Σ₂ is the iₖ'th action in ω, for 1 ≤ k ≤ m.*

Therefore, ω↓Σ₂ keeps exactly the actions σᵢₖ that belong to Σ₂ and modifies the delay time of each σᵢₖ to be the time interval between σᵢₖ₋₁ and σᵢₖ in ω. For instance, let ω = (a, 1)(b, 3)(a, 1)(c, 4)(a, 2) and Σ₂ = {b, c}; then the corresponding projection is ω↓Σ₂ = (b, 4)(c, 5).
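Definition 2 translates directly into code. The following sketch (again our illustration, not part of the paper) implements the projection of Eq. (2) by accumulating the delays of the dropped actions:

```python
def project(word, sigma2):
    """Project a delay-timed word onto the alphabet sigma2 (Eq. (2)).

    word: list of (action, delay) pairs; sigma2: set of actions to keep.
    The delay of each kept action absorbs the delays of all actions
    dropped since the previous kept one."""
    out, acc = [], 0
    for sigma, t in word:
        acc += t                      # time keeps flowing over skipped actions
        if sigma in sigma2:
            out.append((sigma, acc))  # delay since the previous kept action
            acc = 0
    return out
```

On the example above, `project([("a", 1), ("b", 3), ("a", 1), ("c", 4), ("a", 2)], {"b", "c"})` returns `[("b", 4), ("c", 5)]`.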

**Definition 3 (Parallel Composition of Timed Automata).** *Given two timed automata M₁ = (Q₁, q₀¹, Σ₁, F₁, X₁, Δ₁) and M₂ = (Q₂, q₀², Σ₂, F₂, X₂, Δ₂), assume that the clock sets X₁ and X₂ are disjoint. Their parallel composition is the TA M₁‖M₂ = (Q₁ × Q₂, (q₀¹, q₀²), Σ₁ ∪ Σ₂, F₁ × F₂, X₁ ∪ X₂, Δ), where Δ contains the following transitions:*

- *for σ ∈ Σ₁ ∩ Σ₂: ((q₁, q₂), σ, ϕ₁ ∧ ϕ₂, γ₁ ∪ γ₂, (q₁′, q₂′)) ∈ Δ if (q₁, σ, ϕ₁, γ₁, q₁′) ∈ Δ₁ and (q₂, σ, ϕ₂, γ₂, q₂′) ∈ Δ₂;*
- *for σ ∈ Σ₁ \ Σ₂: ((q₁, q₂), σ, ϕ₁, γ₁, (q₁′, q₂)) ∈ Δ if (q₁, σ, ϕ₁, γ₁, q₁′) ∈ Δ₁ and q₂ ∈ Q₂;*
- *for σ ∈ Σ₂ \ Σ₁: ((q₁, q₂), σ, ϕ₂, γ₂, (q₁, q₂′)) ∈ Δ if (q₂, σ, ϕ₂, γ₂, q₂′) ∈ Δ₂ and q₁ ∈ Q₁.*

The language of the composition is the set of its accepting delay-timed words, and L(M₁‖M₂) = {ω | ω ∈ ((Σ₁ ∪ Σ₂) × ℝ≥₀)^∗ and ω↓Σᵢ ∈ L(Mᵢ), i ∈ {1, 2}}.

**Definition 4 (Language Inclusion).** *Given two timed automata M₁ and M₂, if L(M₁)↓Σ₂ = {ω↓Σ₂ | ω ∈ L(M₁)} is a subset of L(M₂), we say M₁ satisfies M₂, denoted by M₁ ⊨ M₂.*

**Definition 5 (Deterministic One-Clock Timed Automata).** *A one-clock timed automaton (OTA) is a timed automaton with a single clock. A deterministic OTA is called a DOTA.*

#### **2.2 Learning Deterministic One-Clock Timed Automata**

In this section, we briefly describe the active learning algorithm for a DOTA M; we refer to [7] for more details. Active learning of a DOTA assumes the existence of a *teacher* who can answer two kinds of queries posed by a *learner*: *membership queries* and *candidate queries*. A membership query asks whether ωₗ ∈ Lₗ(M) for a given logical-timed word ωₗ; a candidate query asks whether the currently learned DOTA A satisfies L(A) = L(M). The main challenge for learning a timed assumption is to obtain the reset information of the logical clocks for each transition. We consider two different settings, depending on whether the teacher also provides clock reset information along with its answers.

A *smart teacher* is one who provides clock reset information along with answers to queries. For a membership query, it accepts a logical-timed word ωₗ as input from the learner. It then returns an answer stating whether the timed word is accepted, together with the reset information of each transition along the trace, that is, the reset-logical-timed word ωᵣₗ.

When the smart teacher receives a candidate query from the learner and the answer is negative, a counterexample is returned in the form of a reset-delay-timed word. The algorithm maintains a *timed observation table* **T** to store the answers to all previous queries. Once the learner has gained sufficient information, i.e. **T** is *closed* and *consistent*, a hypothesis A is constructed from the table. The learner then poses a candidate query to the teacher to determine whether L(A) = L(M). If so, the algorithm terminates with the learned model A. Otherwise, the teacher responds with a reset-delay-timed word ωᵣ as a counterexample. After processing ωᵣ, the algorithm starts a new round of learning. The whole procedure repeats until the teacher gives a positive answer to a candidate query. The complexity of the algorithm is polynomial in the size of the learned model. In practical applications, this setting corresponds to the case where some parts of the model (the clock reset information) are known, e.g. through testing or watchdogs.

When a *normal teacher* is used instead, the learner needs to guess the reset information on each transition discovered in the observation table. At each iteration, the learner guesses all the needed reset information and forms a number of candidate tables. Due to these guesses, the complexity of the algorithm is exponential in the size of the learned model. The following theorem from [7] shows that, for both types of teachers, the algorithm reduces the learning problem to that of learning the reset-logical-timed language.
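The exponential cost of the normal-teacher setting is easy to see: for a trace of n transitions, the learner must in the worst case consider every assignment of reset/no-reset, as in this small illustration of ours:

```python
from itertools import product

def reset_guesses(n):
    """All 2^n reset guesses (True = reset) for an n-transition trace."""
    return [list(bits) for bits in product((False, True), repeat=n)]
```

Already for a 10-step trace, `len(reset_guesses(10))` is 1024 candidate reset assignments; the smart teacher avoids this blow-up entirely.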

**Theorem 1.** *Given two DOTAs M and A, if Lᵣ(M) = Lᵣ(A), then L(M) = L(A).*

### **3 Framework for Learning-Based Compositional Verification of Timed Automata**

Consider a system M = M₁‖M₂ consisting of two deterministic timed automata, and a safety property φ represented as a deterministic timed automaton. This section presents our learning-based verification framework for automatically finding an appropriate assumption A in the AG rule to verify that M satisfies φ. Sect. 3.1 first describes the framework. Then, Sects. 3.2, 3.3 and 3.4 present the main algorithms of the framework in detail. Finally, Sect. 3.5 shows the correctness and termination of the framework.

#### **3.1 Verification Framework via Assumption Learning**

Let Σ₁, Σ₂ and Σ_φ be the alphabets of the TAs M₁, M₂ and φ, respectively. The alphabet of the assumption A₀ is then Σ_A₀ = (Σ₁ ∪ Σ_φ) ∩ Σ₂. The AG rule is stated as follows:

$$\frac{M\_1 \| A\_0 \models \phi, \ M\_2 \models A\_0}{M\_1 \| M\_2 \models \phi} \tag{3}$$

The rule reduces the problem of verifying M₁‖M₂ ⊨ φ to that of finding an assumption A₀, a DOTA satisfying both M₁‖A₀ ⊨ φ and M₂ ⊨ A₀. Here, we consider M₁ and M₂ as general TAs, each being either a DOTA or the composition of a number of DOTAs. The framework we propose is therefore not restricted to verifying the composition of just two components. For a system composed of n > 2 components, we can partition the components into two parts. For instance, if a system consists of four components M = {H₁, H₂, H₃, H₄}, we can let M₁ = H₁‖H₃ and M₂ = H₂‖H₄. To obtain the assumption automatically, we use model learning algorithms. However, the current learning algorithm for DOTAs [7] is not directly applicable. We therefore design a "smart teacher" with a heuristic to answer clock reset information during learning. For this, we also need a model conversion algorithm. We illustrate the learning-based verification framework in Fig. 1. The inputs of the framework are M₁, M₂ and the property φ, and the verification process consists of four steps, which we describe below.

**The First Step.** This step converts the input models into TAs M′₁, M′₂ and φ′ (see Sect. 3.2) without changing the verification result, i.e. checking M′₁‖M′₂ against φ′ is equivalent to checking M₁‖M₂ against φ. The output of this step is used to determine the clock reset information for the assumption learning in the second step. The AG rule (3) is then applied to M′₁, M′₂ and φ′. Thus, if there exists an assumption A such that M′₁‖A ⊨ φ′ and M′₂ ⊨ A, then M′₁‖M′₂ ⊨ φ′. The weakest assumption A_w is the one with which the rule is guaranteed to return a conclusive result and M′₁‖A_w ⊨ φ′.

**Definition 6 (Weakest Assumption).** *Let M′₁, M′₂ and φ′ be the models mentioned above and Σ_A = (Σ′₁ ∪ Σ′_φ) ∩ Σ′₂. The weakest assumption A_w of M′₂ is a timed automaton such that the following two conditions hold: 1) Σ_A_w = Σ_A, and 2) for any timed automaton E with Σ_E = Σ_A and M′₂ ⊨ E, M′₁‖E ⊨ φ′ iff E ⊨ A_w.*

**Fig. 1.** Learning-based compositional verification framework for timed automata

**The Second Step.** A DOTA assumption A is learned through a number of membership queries in this step. The answer to each query involves obtaining the definite clock reset information for each timed word, i.e. whether the clock of A is reset when an action is taken at a specific time. We design a heuristic to obtain such information from the clock reset information of the converted models M′₁, M′₂ and φ′. This allows the framework to handle learning-based compositional verification for multi-clock systems. We refer to Sect. 3.3 for more details.

**The Third and the Fourth Steps.** Once the assumption A is constructed, two candidate queries are issued to check the compositional rule. The first is a subset query checking whether M′₁‖A ⊨ φ′. The second is a superset query checking whether M′₂ ⊨ A. If both candidate queries return true, the compositional rule guarantees that M′₁‖M′₂ ⊨ φ′. Otherwise, a counterexample ctx (either ctx₁ or ctx₂ in Fig. 1) is generated and further analyzed to identify whether ctx witnesses a violation of M′₁‖M′₂ ⊨ φ′. If it does not, ctx is used to update A in the next learning iteration. The details of the candidate queries are discussed in Sect. 3.4.

Therefore, L(A) is a subset of L(A_w) and a superset of L(M′₂)↓Σ_A. It is not guaranteed that a DOTA A can be learned that satisfies L(A) = L(A_w). However, as shown later in Theorem 3, under the condition that L(A_w) is accepted by some DOTA, the learning process terminates as soon as the compositional verification returns a conclusive result, often before A_w itself is learned. This means that verification in the framework usually terminates early, by finding either a counterexample witnessing that M′₁‖M′₂ ⊭ φ′ or an assumption A that satisfies the two premises of the reasoning rule, establishing M′₁‖M′₂ ⊨ φ′.
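The four-step loop of Fig. 1 can be summarized as follows. This is our own high-level sketch: the learner, the two premise checks, and the counterexample analysis are abstracted as callables, and counterexample-driven refinement is assumed to happen inside the learner between iterations.

```python
def verify(learn_assumption, check_premise1, check_premise2, is_real_violation):
    """Drive the AG loop: learn A, check the two premises of rule (3),
    and classify counterexamples until a conclusive answer is found."""
    while True:
        A = learn_assumption()            # step 2: membership queries
        ok1, ctx1 = check_premise1(A)     # step 3: M'1 || A satisfies phi'?
        if not ok1:
            if is_real_violation(ctx1):
                return False, ctx1        # genuine violation of phi'
            continue                      # spurious: refine A with ctx1
        ok2, ctx2 = check_premise2(A)     # step 4: M'2 satisfies A?
        if ok2:
            return True, None             # both premises hold: phi' holds
        if is_real_violation(ctx2):
            return False, ctx2
        # otherwise ctx2 refines A in the next iteration
```

The loop terminates either with `(True, None)` when both premises hold, or with `(False, ctx)` carrying a witness of the violation.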

### **Algorithm 1:** ConvertW(M₁, M₂, φ)

**input:** Two models M₁ and M₂ and the property φ to be verified
**output:** Converted timed automata M′₁, M′₂ and property φ′
**1** M″₁, φ″, M″₂ ← ConvertS(M₁, φ, M₂);
**2** φ′, M′₁, M′₂ ← ConvertS(φ″, M″₁, M″₂);
**3** **return** M′₁, M′₂, φ′;

#### **3.2 Model Conversion**

We use membership queries to learn the DOTA assumption. For a membership query with a logical-timed word ωₗ as input, the teacher's answer includes the clock reset information of the word, which is necessary for obtaining the reset-logical-timed word ωᵣₗ. As shown in [7], a learning algorithm with a normal teacher can only generate this answer by guessing the reset information, which is the cause of the high complexity. We therefore design a *smart teacher* in our framework. The smart teacher generates the answer to a query with input ωₗ by directly making use of the available clock reset knowledge of L(A_w) (related to Σ_A, M₁ and φ). To this end, we implement a *model conversion* from the models M₁, M₂ and φ to the models M′₁, M′₂ and φ′, respectively.

The model conversion algorithm mainly ensures that each action in Σ_A corresponds to unique clock reset information. Given an action σ with σ ∈ Σ_A and σ ∈ Σ₁ (resp. Σ_φ), if there is only one transition labeled σ, or all transitions labeled σ have the same reset clocks, i.e. δ₁[4] = δ₂[4] for any transitions δ₁ and δ₂ with δ₁[2] = δ₂[2] = σ, then the reset information for σ is simply δ[4] of any particular transition labeled σ. If there are transitions labeled σ, say δ₁ and δ₂, with different reset clocks, i.e. δ₁[4] ≠ δ₂[4], we say that the *reset clocks of action σ are inconsistent*.

Reset clock inconsistency makes it difficult for the teacher to determine the clock reset information of an action in a whole run. To deal with this difficulty, we design the model conversion in Algorithm 1, which converts M₁, M₂ and φ into M′₁, M′₂ and φ′. The conversion calls Algorithm 2 twice, introducing auxiliary actions and transitions to resolve reset clock inconsistency in M₁ and in φ, respectively.

The converted models M′₁, M′₂ and φ′ returned by the invocations of Algorithm 2 have the property that all transitions with the same action σ ∈ Σ_A have the same reset clocks; thus M′₁ and φ′ are free of reset clock inconsistency. As shown later in Theorem 2, the verification of M′₁‖M′₂ against φ′ is equivalent to that of M₁‖M₂ against φ.

Algorithm 2, denoted ConvertS(M₁, M₂, M₃), takes three deterministic TAs M₁, M₂ and M₃ as input and converts them into three new TAs M′₁, M′₂ and M′₃ as output. We explain its three main functionalities in the following three paragraphs.

**Check Reset Information in M₁ (Lines 1-6).** Let *Σ* = (Σ_M₁ ∪ Σ_M₂) ∩ Σ_M₃, and let f be a binary relation between *Σ* and 2^X, where X is the set of clocks of M₁ and

### **Algorithm 2:** ConvertS(M₁, M₂, M₃)

**input:** Three timed automata M₁, M₂, M₃
**output:** Converted timed automata M′₁, M′₂ and M′₃
**1** M′₁ ← M₁, M′₂ ← M₂, M′₃ ← M₃;
**2** *Σ* ← (Σ_M₁ ∪ Σ_M₂) ∩ Σ_M₃;
**3** f ← ∅;
**4** **for** δ ∈ Δ′₁ **do**
**5** &nbsp;&nbsp;**if** δ[2] ∈ *Σ* and δ[2] ∉ dom(f) **then**
**6** &nbsp;&nbsp;&nbsp;&nbsp;put ⟨δ[2], δ[4]⟩ into f;
**7** &nbsp;&nbsp;**else if** δ[2] ∈ dom(f) and ⟨δ[2], δ[4]⟩ ∉ f **then**
**8** &nbsp;&nbsp;&nbsp;&nbsp;σ ← δ[2];
**9** &nbsp;&nbsp;&nbsp;&nbsp;σ_new ← introduce new action;
**10** &nbsp;&nbsp;&nbsp;&nbsp;put σ_new into Σ_M′₁, Σ_M′₂, Σ_M′₃;
**11** &nbsp;&nbsp;&nbsp;&nbsp;**for** δ′ ∈ {ω | ω ∈ Δ′₁ and ω[2] = σ and ω[4] = δ[4]} **do**
**12** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;δ′[2] ← σ_new;
**13** &nbsp;&nbsp;&nbsp;&nbsp;**for** δ′ ∈ {ω | ω ∈ Δ′₂ and ω[2] = σ} **do**
**14** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;δ″ ← clone(δ′);
**15** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;δ″[2] ← σ_new;
**16** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;put δ″ into Δ′₂;
**17** &nbsp;&nbsp;&nbsp;&nbsp;**for** δ′ ∈ {ω | ω ∈ Δ′₃ and ω[2] = σ} **do**
**18** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;δ″ ← clone(δ′);
**19** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;δ″[2] ← σ_new;
**20** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;put δ″ into Δ′₃;
**21** **return** M′₁, M′₂, M′₃;

f = ∅ initially. The transitions of M₁ are checked one by one. For a transition δ, if its action δ[2] is in *Σ* but not yet in the domain of f (Line 5), then δ is the first transition labeled δ[2] that has been found, and the pair ⟨δ[2], δ[4]⟩ is added to the relation f (Line 6). If the action of δ is already in dom(f) but its reset clocks δ[4] are inconsistent with the record in f, the algorithm proceeds to handle the inconsistency of the reset clocks (Lines 7-20).

**Introduce Auxiliary Actions in M₁ (Lines 7-12).** If δ[2] ∈ dom(f) ∧ ⟨δ[2], δ[4]⟩ ∉ f (Line 7), we introduce a *new action* (through the variable σ_new) and add it to the alphabets of the output models. Then every transition labeled σ that has the same reset clocks as δ is modified by replacing its action σ with the value of σ_new (Lines 11-12).

**Add Auxiliary Transitions in M₂ and M₃ (Lines 13-20).** Since new actions are introduced in M₁, we need to add auxiliary transitions with each new action to M₂ and M₃ accordingly. Specifically, consider the case where M₁ and M₂ synchronize on action σ via transitions δ and δ′ in the respective models. If δ in M₁ is modified to a transition of M′₁ by renaming its action σ to σ_new, a fresh co-transition δ″ is added to M′₂, which is a copy of δ′ with σ changed to σ_new, so that the synchronization in the composition of M′₁ and M′₂ is preserved (Lines 13-16). The same changes are made to M₃ (Lines 17-20).
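Algorithm 2 can be sketched concretely. In the following Python rendering, which is ours (the data layout and the auxiliary-action naming scheme are assumptions, not the paper's implementation), a model is a dict with an alphabet and a list of mutable transitions `[source, action, guard, resets, target]`, so the paper's δ[2] and δ[4] become `trans[1]` and `trans[3]`:

```python
from copy import deepcopy

def convert_s(m1, m2, m3):
    """Sketch of ConvertS: make the reset clocks of every shared action
    in m1 consistent by renaming offending transitions to fresh actions
    and adding co-transitions to m2 and m3 for synchronization.

    Each model is {'alphabet': set, 'trans': [[src, act, guard, resets, dst], ...]}."""
    m1, m2, m3 = deepcopy(m1), deepcopy(m2), deepcopy(m3)
    shared = (m1["alphabet"] | m2["alphabet"]) & m3["alphabet"]
    f = {}                                     # action -> its recorded reset set
    fresh = 0
    for delta in list(m1["trans"]):
        act, resets = delta[1], frozenset(delta[3])
        if act in shared and act not in f:
            f[act] = resets                    # first transition fixes f(act)
        elif act in f and f[act] != resets:    # reset-clock inconsistency
            fresh += 1
            new_act = act + "'" * fresh        # fresh auxiliary action (our naming)
            for m in (m1, m2, m3):
                m["alphabet"].add(new_act)
            for d in m1["trans"]:              # rename the offending group in m1
                if d[1] == act and frozenset(d[3]) == resets:
                    d[1] = new_act
            for m in (m2, m3):                 # add co-transitions for sync
                for d in [t for t in m["trans"] if t[1] == act]:
                    c = deepcopy(d)
                    c[1] = new_act
                    m["trans"].append(c)
    return m1, m2, m3
```

After the call, every action of the first model carries a unique reset set, which is exactly the property the smart teacher relies on.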

*Example 1.* Fig. 2 shows an example of the conversion. In M₁, there are two transitions labeled with action a, but only one of them resets the clock. To resolve the reset clock inconsistency of M₁, a new action a′ is introduced, and M₁ is converted by changing the action name a of one transition to a′, marked as an orange dashed line. By adding the corresponding new transitions to M₂ and φ, converted versions of these models are obtained as well. In the converted property, the transitions labeled a and a′ still have different reset information, so a further transition is added, marked as a blue dotted line, yielding φ′; M′₁ and M′₂ are changed correspondingly. Now the reset information of every transition labeled a (or an auxiliary action derived from a) in the automata M′₁ and φ′ is uniquely determined.

**Fig. 2.** M₁, M₂ and φ are converted into M′₁, M′₂ and φ′

We now show that the verification of M′₁‖M′₂ against φ′ is equivalent to the original verification of M₁‖M₂ against φ.

**Theorem 2.** *Checking M′₁‖M′₂ ⊨ φ′ is equivalent to checking M₁‖M₂ ⊨ φ.*

*Proof.* We prove M₁‖M₂ ⊨ φ ⇔ M′₁‖M′₂ ⊨ φ′. This is equivalent to proving L(M₁‖M₂‖φ̄) = ∅ ⇔ L(M′₁‖M′₂‖φ̄′) = ∅, where φ̄ and φ̄′ are the complements of φ and φ′, respectively.

We first prove L(M₁‖M₂‖φ̄) ≠ ∅ ⇒ L(M′₁‖M′₂‖φ̄′) ≠ ∅. The left-hand side implies that M₁‖M₂‖φ̄ has at least one accepting run ρ. By the construction of M′₁, M′₂ and φ′, the composed model M′₁‖M′₂‖φ̄′ has the same locations and transition guards as M₁‖M₂‖φ̄, although some auxiliary transitions with renamed actions have been added. So we can construct a run ρ′ in M′₁‖M′₂‖φ̄′ which visits the locations in the same order as ρ. Since ρ is an accepting run, its final location is accepting, which implies that ρ′ is an accepting run of M′₁‖M′₂‖φ̄′, and trace(ρ′) ∈ L(M′₁‖M′₂‖φ̄′).

For L(M′₁‖M′₂‖φ̄′) ≠ ∅ ⇒ L(M₁‖M₂‖φ̄) ≠ ∅: since L(M′₁‖M′₂‖φ̄′) ≠ ∅, there exists at least one accepting run ρ′ in M′₁‖M′₂‖φ̄′. Again by the construction of M′₁, M′₂ and φ′, we can construct an accepting run ρ in M₁‖M₂‖φ̄ by replacing the newly introduced actions along ρ′ with their original names, and trace(ρ) is a witness of L(M₁‖M₂‖φ̄) ≠ ∅.

**Complexity.** The model conversion, Algorithm 1, consists mainly of two invocations of Algorithm 2, which has a nested loop. In the worst case, Algorithm 2 traverses the transitions of M₁ in the outer loop and the transitions of M₁, M₂ and M₃ in the inner loops, so the time complexity is quadratic in the number of transitions.

#### **3.3 Membership Queries**

After model conversion, a number of membership queries are used to learn the DOTA assumption A. For each membership query, the learner provides the teacher with a logical-timed word ωₗ = (σ₁, **v**₁)(σ₂, **v**₂)···(σₙ, **v**ₙ), where σᵢ ∈ Σ_A and |**v**ᵢ| = 1, in order to obtain the clock reset information. Based on the converted models, the teacher supplements the corresponding reset information γᵢ for each σᵢ in ωₗ to construct the reset-logical-timed word ωᵣₗ = (σ₁, **v**₁, γ₁)(σ₂, **v**₂, γ₂)...(σₙ, **v**ₙ, γₙ). Although the learning algorithm we use is associated with one clock, so that the hypothesis we obtain is always a DOTA, the models M′₁ and φ′ may have multiple clocks, since they are not necessarily DOTAs. This raises the question of how to transform the multi-clock reset information into single-clock reset information. To solve this problem, we use a heuristic to generate the one-clock reset information γᵢ for each action σᵢ. Let X be the finite set of clocks of M′₁ and φ′ with |X| > 1, and let x be the single clock of the learned assumption. For each action σᵢ, we tried four heuristics to determine whether x is reset: 1) random assignment; 2) γᵢ is always {x}; 3) γᵢ is always ∅; and 4) a dynamic reset rule (if there exists a reset clock y ∈ X on the corresponding transition, then γᵢ = {x}, otherwise γᵢ = ∅). We adopt the fourth, which yields the lowest verification time.

After obtaining the reset-logical-timed word ωᵣₗ, the teacher further checks whether it satisfies φ′ in the environment of M′₁, by model checking M′₁‖A_ωᵣₗ ⊨ φ′, where A_ωᵣₗ is the automaton constructed from ωᵣₗ.
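The dynamic reset rule (heuristic 4) amounts to a one-liner; the following sketch is our own reading of it, mapping the multi-clock reset sets observed in M′₁ and φ′ to reset decisions for the single learned clock x:

```python
def one_clock_resets(multi_resets, clock="x"):
    """Heuristic 4 (dynamic reset rule): reset the learned clock at step i
    iff some clock of the converted models is reset on that transition.

    multi_resets: list of reset sets gamma_i over the clocks of M'1 and phi'.
    Returns the per-step reset set for the single learned clock."""
    return [{clock} if gamma else set() for gamma in multi_resets]
```

For example, `one_clock_resets([{"y"}, set(), {"y", "z"}])` gives `[{"x"}, set(), {"x"}]`.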

As shown in Fig. 1, the model conversion step is executed only once. It is followed by the execution of the smart teacher we design, which requires only a polynomial number of membership queries for the assumption learning. Without the first step, the framework would have to fall back on a normal teacher, in which case the reset information is obtained by guessing and an exponential number of membership queries is required.

#### **3.4 Candidate Queries**

The candidate queries determine whether the learned hypothesis A satisfies the AG reasoning rule.

**The First Candidate Query.** This step checks whether M′₁‖A ⊨ φ′. If the answer is positive, we proceed to the second candidate query. Otherwise, a counterexample ctx₁ with ctx₁↓Σ_A ∈ L(A) is generated and further analyzed by constructing a TA A_ctx₁ such that M′₁‖A_ctx₁ ⊭ φ′. We then check whether ctx₁↓Σ_A ∈ L(M′₂)↓Σ_A. If the result is positive, we have M′₁‖M′₂ ⊭ φ′. Otherwise, ctx₁↓Σ_A ∈ L(A) \ L(A_w), and ctx₁ serves as a negative counterexample to refine the assumption A via the next round of membership queries.

**The Second Candidate Query.** This step checks whether M′₂ ⊨ A, i.e. L(M′₂)↓Σ_A ⊆ L(A). If so, since M′₁‖A ⊨ φ′ and M′₂ ⊨ A, the verification algorithm terminates and we conclude M′₁‖M′₂ ⊨ φ′. Otherwise, a counterexample ctx₂ is generated and a TA A_ctx₂ is constructed from the timed word ctx₂. We check whether M′₁‖A_ctx₂ ⊭ φ′. If so, since ctx₂↓Σ_A ∈ L(M′₂)↓Σ_A, we conclude M′₁‖M′₂ ⊭ φ′. Otherwise, ctx₂↓Σ_A ∈ L(A_w) \ L(A) is a positive counterexample, and a new round of learning is needed to refine and check A using membership and candidate queries until a conclusive result is obtained.
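Both counterexample analyses share one shape, which we can sketch as follows (the oracle names are ours, not the paper's): ctx witnesses a genuine violation exactly when its projection is a behavior of M′₂ and composing its automaton with M′₁ violates φ′; otherwise ctx refines A.

```python
def analyze_ctx(ctx, proj_in_m2, composition_violates_phi):
    """Classify a counterexample from a failed candidate query.

    proj_in_m2(ctx):               is ctx projected to Sigma_A in L(M'2) projected?
    composition_violates_phi(ctx): does M'1 || A_ctx violate phi'?
    Returns 'violation' when ctx is a real witness, else 'refine'."""
    if proj_in_m2(ctx) and composition_violates_phi(ctx):
        return "violation"        # conclude M'1 || M'2 violates phi'
    return "refine"               # ctx updates A in the next learning round
```

For ctx₁ the second oracle holds by construction and only the first is checked; for ctx₂ it is the other way around.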

#### **3.5 Correctness and Termination**

We now show the correctness and termination of the framework.

**Theorem 3.** *Given two deterministic timed automata M₁ and M₂ and a property φ, if there exists a DOTA that accepts the target language L(A_w), where A_w is the weakest assumption of the converted model M′₂, the proposed learning-based compositional verification returns true if φ holds on M₁‖M₂ and false otherwise.*

*Proof.* By Theorem 2, we only need to consider the converted models M′₁, M′₂ and φ′.

**Termination.** The proposed framework consists of the steps of model conversion, membership and candidate queries. We argue about the termination of the overall framework by showing the termination of each step.

By Algorithm 1 and Theorem 2, the model conversion step terminates. Because the learning algorithm for DOTAs terminates [7], an assumption A is eventually obtained by membership queries. As for the candidate queries, they either conclude M′₁‖M′₂ ⊨ φ′ and terminate, or provide a positive or negative counterexample ctx, that is, ctx↓Σ_A ∈ L(A_w) \ L(A) or ctx↓Σ_A ∈ L(A) \ L(A_w), for the refinement of A.

For the weakest assumption $A_w$: since there exists a DOTA which accepts $L(A_w)$, the framework eventually constructs $A_w$ in some round, producing the positive answer $M^1 \parallel A_w \models \varphi$ to the first candidate query. As shown in Sect. 3.4, we can then check whether $L(M^2)\upharpoonright_{\Sigma_A} \subseteq L(A)$. If the result is positive, we have $M^1 \parallel M^2 \models \varphi$ and the framework terminates. Otherwise, a counterexample $ctx_2\upharpoonright_{\Sigma_A} \in L(M^2)\upharpoonright_{\Sigma_A} \setminus L(A_w)$ is generated. So $M^1 \parallel M^2 \not\models \varphi$, and $ctx_2$ is a witness to the fact that $M^1 \parallel M^2$ violates $\varphi$.

**Correctness.** Since there exists a DOTA that accepts the target language $L(A_w)$, the framework always terminates with a result that is either true or false. The result is true only if both candidate queries return true, which means that $\varphi$ holds on $M^1 \parallel M^2$. Otherwise, a counterexample $ctx$ with $ctx\upharpoonright_{\Sigma_A} \notin L(A_w)$ is generated. Since $M^1 \parallel A_{ctx} \not\models \varphi$ and $ctx\upharpoonright_{\Sigma_A} \in L(M^2)\upharpoonright_{\Sigma_A}$, we have $M^1 \parallel M^2 \not\models \varphi$.

In some cases, there may be no DOTA that accepts $L(A_w)$, and termination of the proposed verification framework cannot be guaranteed. However, the framework is still *sound*: whenever a DOTA assumption is learned and the verification terminates with a result, the result holds. The framework is therefore able to handle more flexible models, such as multi-clock models. We explore this experimentally in Sect. 5.

**Theorem 4.** *Given two deterministic timed automata $M^1$ and $M^2$, which might have multiple clocks, and a property $\varphi$, even if there is no DOTA that accepts the target language $L(A_w)$, the proposed verification framework is still sound.*

*Proof.* Given $M^1$ and $M^2$ which are multi-clock timed automata, suppose that in some round the learned DOTA assumption $A$ satisfies $L(A) \subseteq L(A_w)$ and $L(M^2)\upharpoonright_{\Sigma_A} \subseteq L(A)$. Then both the first and the second candidate queries return positive results; hence the verification terminates and $M^1 \parallel M^2 \models \varphi$ holds. By the same reasoning, in the case that a counterexample $ctx$ is generated, that is, $M^1 \parallel A_{ctx} \not\models \varphi$ and $ctx\upharpoonright_{\Sigma_A} \in L(M^2)\upharpoonright_{\Sigma_A}$, this implies $M^1 \parallel M^2 \not\models \varphi$, and the verification terminates with a valid result.

The framework is not *complete*, though. For an $M_1$ with multiple clocks, a DOTA assumption $A$ such that $L(A) = L(A_w)$ is not guaranteed to exist; thus, the framework is not guaranteed to terminate. Furthermore, for an $M_2$ with multiple clocks, the framework may not be able to learn a DOTA assumption $A$ such that $L(M^2)\upharpoonright_{\Sigma_A} \subseteq L(A)$, even though $M^1 \parallel M^2 \models \varphi$.

### **4 Optimization Methods**

In this section, we present two improvements to the verification framework proposed in Sect. 3. The first reduces the state space and the number of membership queries by exploiting given information about $M^1$ and $\varphi$. The second uses a smaller alphabet than $\Sigma_A = (\Sigma^1 \cup \Sigma^\varphi) \cap \Sigma^2$ to improve verification speed.

#### **4.1 Using Additional Information**

In the process of learning an assumption $A$ with respect to $M^1$ and $\varphi$, we can make better use of the available information about $M^1$ and $\varphi$. Clearly, the more actions can take place from a learned location, the more successor locations and symbolic states are likely to be needed. In general, not all actions are enabled in a location. Since the logical timed words of the models $M^1$ and $\varphi$ are known beforehand, the admissible sequences of actions can be derived. We therefore use this information to remove actions that cannot take place from a given location, reducing the number of successor states. Furthermore, the number of membership queries can be reduced by directly answering those queries whose timed words violate the admissible action sequences. This accelerates the learning process and, to some extent, the verification itself. The experiments in the next section also confirm these improvements.

For example, suppose $M^1$ has two actions *read* and *write*, and it is known that *write* can only be performed after *read* has been executed. We add this information to the learning step of the verification framework: *read* must take place before *write* in any timed word. Thus, for a membership query on a word $\omega_l = \ldots(\mathit{write}, \mathbf{v}_k)\ldots(\mathit{read}, \mathbf{v}_m)\ldots$, where *write* occurs before *read*, a negative answer is returned directly, without the model-checking steps for membership queries described in Sect. 3.3.

The additional information is usually derived from the design rules and other characteristics of the system under study. In the implementation, we provide some basic keywords to describe the rules, e.g., "beforeAfter" specifies the order of two actions, and "startWith" specifies that a certain action must be executed first. The above example is thus encoded as "[beforeAfter]:(read,write)".
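The membership-query pre-filter can be sketched as follows. The rule syntax (`[beforeAfter]`, `[startWith]`) follows the paper; the representation of timed words as `(action, valuation)` pairs is our assumption:

```python
def parse_rule(rule: str):
    """Split a rule like "[beforeAfter]:(read,write)" into its kind and
    the tuple of action names it mentions."""
    kind, args = rule.split(":")
    return kind.strip("[]"), tuple(args.strip("()").split(","))

def violates(rule: str, timed_word) -> bool:
    """Return True if the timed word violates the rule, so the membership
    query can be answered negatively without any model checking."""
    kind, actions = parse_rule(rule)
    seq = [a for (a, _valuation) in timed_word]  # drop clock valuations
    if kind == "startWith":
        return bool(seq) and seq[0] != actions[0]
    if kind == "beforeAfter":  # actions[1] may appear only after actions[0]
        seen_first = False
        for a in seq:
            if a == actions[0]:
                seen_first = True
            elif a == actions[1] and not seen_first:
                return True
    return False
```

For instance, `violates("[beforeAfter]:(read,write)", [("write", 1.0), ("read", 2.5)])` returns `True`, so the teacher answers this query negatively right away.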

#### **4.2 Minimizing the Alphabet of the Assumption**

In our framework, the automated AG procedure uses a fixed assumption alphabet $\Sigma_A = (\Sigma^1 \cup \Sigma^\varphi) \cap \Sigma^2$. However, there may exist an assumption $A_s$ over a smaller alphabet $\Sigma_s \subset \Sigma_A$ that satisfies the two premises of the AG rule. We therefore propose and implement a further improvement that builds the timed assumption over a minimal alphabet. A smaller alphabet directly reduces the number of membership queries and thus speeds up the verification process.

**Theorem 5.** *Given $\Sigma_A = (\Sigma^1 \cup \Sigma^\varphi) \cap \Sigma^2$, if there exists an assumption $A_s$ over a non-empty alphabet $\Sigma_s \subset \Sigma_A$ satisfying $M^1 \parallel A_s \models \varphi$ and $M^2 \models A_s$, then there must exist an assumption $A$ over $\Sigma_A$ satisfying $M^1 \parallel A \models \varphi$ and $M^2 \models A$.*

*Proof.* Based on $A_s$, we construct a timed assumption $A$ over $\Sigma_A$ as follows. For $A_s = (Q_s, q_0^s, \Sigma_s, F_s, X_s, \Delta_s)$, we first build $A = (Q, q_0, \Sigma_A, F, X, \Delta)$ where $Q = Q_s$, $q_0 = q_0^s$, $F = F_s$, $\Delta = \Delta_s$ and $X = X_s$. Then, for every $q \in Q$ and every $\sigma \in \Sigma_A \setminus \Sigma_s$, we add the self-loop $(q, \sigma, \mathit{true}, \emptyset, q)$ to $\Delta$.
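The construction in the proof amounts to adding unconstrained self-loops for every missing action. A minimal sketch (the dictionary-based TA representation is our own, not the paper's):

```python
def lift_assumption(A_s: dict, sigma_A: set) -> dict:
    """Lift assumption A_s over Sigma_s to an assumption A over Sigma_A by
    adding, for each location q and each action sigma in Sigma_A \ Sigma_s,
    a self-loop (q, sigma, "true", frozenset(), q): guard true, no resets."""
    A = {
        "locations": set(A_s["locations"]),
        "initial": A_s["initial"],
        "alphabet": set(sigma_A),
        "accepting": set(A_s["accepting"]),
        "clocks": set(A_s["clocks"]),
        "transitions": list(A_s["transitions"]),
    }
    for q in A["locations"]:
        for sigma in sigma_A - A_s["alphabet"]:
            A["transitions"].append((q, sigma, "true", frozenset(), q))
    return A
```

The added self-loops never block an action of $M^1$ and never reset a clock, which is exactly why the composition argument below goes through unchanged.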

We now prove that with such an $A$, $M^1 \parallel A \models \varphi$ and $M^2 \models A$ still hold, that is, $M^1 \parallel M^2 \models \varphi$. Since the locations of $A$ and $A_s$ are the same, the locations of $M^1 \parallel A$ and $M^1 \parallel A_s$ are the same. Consider the composed model $M^1 \parallel A$ and a newly added transition $\delta_{new} = (q, \sigma, \mathit{true}, \emptyset, q)$ from location $q$ in $A$. Since $\sigma \in \Sigma_A \setminus \Sigma_s$, it is synchronized with a transition of the form $\delta_1 = (q_c, \sigma, \phi_c, \gamma_c, q_c')$ in $M^1$. So in $M^1 \parallel A$, the composed transition with respect to $(q_c, q)$ and $\sigma$ is $((q_c, q), \sigma, \phi_c, \gamma_c, (q_c', q))$. In $M^1 \parallel A_s$, for the same transition $\delta_1$ of $M^1$, although there is no synchronizing transition from location $q$ in $A_s$ (as $\sigma \notin \Sigma_s$, $M^1$ moves alone while $A_s$ stays at $q$), the composed transition is still $((q_c, q), \sigma, \phi_c, \gamma_c, (q_c', q))$. So $M^1 \parallel A \models \varphi$. By the construction of $A$ from $A_s$, since $M^2 \models A_s$, i.e., $L(M^2)\upharpoonright_{\Sigma_s} \subseteq L(A_s)$, it follows that $M^2 \models A$.

The main problem with a smaller alphabet is that the AG rule is no longer complete, as already observed for deterministic finite automata [18]; the problem persists for timed automata. If $\Sigma_s \subset \Sigma_A$, there might not exist an assumption $A_s$ over $\Sigma_s$ that satisfies the two premises of AG even though $M^1 \parallel M^2 \models \varphi$. In this situation, we say that $\Sigma_s$ is incomplete and needs to be refined. Each time we find that $\Sigma_s$ is incomplete, we select another $\Sigma_s' \subset \Sigma_A$ and restart the learning algorithm. If a large number of refinement rounds is needed, verification slows down significantly. To compensate, we reuse the counterexamples that indicated the incompleteness of $\Sigma_s$ in previous loops, storing them in a variable $List_c$. Before starting a new round of learning, we use $List_c$ to judge in advance whether the current $\Sigma_s'$ is appropriate. We say $\Sigma_s'$ is *appropriately selected* only if none of the counterexamples in $List_c$ indicates that $\Sigma_s'$ is incomplete.

With a small alphabet $\Sigma_s \subset \Sigma_A$, we cannot directly conclude the verification result if $M^1 \parallel M^2 \not\models \varphi$. The reason is that any counterexample $ctx$ satisfying $M^1 \parallel A_{ctx} \not\models \varphi \wedge ctx\upharpoonright_{\Sigma_s} \in L(M^2)\upharpoonright_{\Sigma_s}$ is used to witness the incompleteness of $\Sigma_s$, even though in some cases $ctx$ indeed indicates that $M^1 \parallel M^2 \not\models \varphi$ over $\Sigma_A$. As a result, the treatment of such counterexamples decreases the overall verification speed when $M^1 \parallel M^2 \not\models \varphi$. To solve this, we detect real counterexamples earlier: we first check whether $M^1 \parallel A_{ctx} \not\models \varphi \wedge ctx\upharpoonright_{\Sigma_A} \in L(M^2)\upharpoonright_{\Sigma_A}$ holds. If it does, the verification concludes $M^1 \parallel M^2 \not\models \varphi$. Otherwise, $ctx$ is used to refine the assumption over a new $\Sigma_s'$.
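This early check can be sketched on an untimed toy model where component languages are finite sets of action sequences. The flag `violates_phi` stands in for the model-checking verdict $M^1 \parallel A_{ctx} \not\models \varphi$, whose result we take as given:

```python
def project(word, sigma):
    """Project an action sequence onto the sub-alphabet sigma."""
    return tuple(a for a in word if a in sigma)

def classify_ctx(ctx, violates_phi, L_M2, sigma_A):
    """Early real-counterexample detection (sketch). A ctx is real when the
    composition with A_ctx violates phi AND ctx, projected onto the FULL
    alphabet Sigma_A (not the small Sigma_s), is a behaviour of M2."""
    if violates_phi and project(ctx, sigma_A) in {project(w, sigma_A) for w in L_M2}:
        return "real"      # conclude: M1 || M2 does not satisfy phi
    return "refine"        # otherwise: pick a new sub-alphabet and relearn
```

Classifying against $\Sigma_A$ rather than $\Sigma_s$ is the whole point: a word that merely exposes the incompleteness of $\Sigma_s$ falls into the "refine" branch instead of being reported as a violation.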

### **5 Experimental Results**

We implemented the proposed framework in Java. The membership and candidate queries are executed by calling the model checking tool UPPAAL. We evaluated the implementation on a benchmark of AUTOSAR (Automotive Open System Architecture) case studies. All experiments were carried out on a 3.7 GHz AMD Ryzen 5 5600X processor with 16 GB RAM running 64-bit Windows 10. The source code of our tool and experiments is available in [2].

AUTOSAR is an open and standardized software architecture for automotive ECUs (Electronic Control Units). It consists of three layers, from top to bottom: AUTOSAR Software, the AUTOSAR Runtime Environment (RTE), and Basic Software [1]. Its safety guarantees are very important [13,34,40]. A formal timed model of the AUTOSAR architecture consists of several tasks and their corresponding runnables, the communication mechanisms between pairs of runnables, RTE communication controllers, and task schedulers. Varying the number of tasks and runnables, we designed three composed models: the small-scale model AUTOSAR-1 (8 automata) and the larger composed models AUTOSAR-2 (14 automata) and AUTOSAR-3 (14 automata). The properties of the architecture to be checked are: 1) buffers between two runnables never overflow or underflow, and 2) a pair of sender and receiver runnables never executes the *write* action simultaneously. The checking methods we compare in the experiments are: 1) traditional monolithic model checking via UPPAAL, 2) the compositional verification framework we propose (CV), 3) CV with the first improvement, which uses additional information about $M^1$ and $\varphi$ (CV+A), 4) CV with the second improvement, which minimizes the assumption alphabet (CV+M), and 5) CV with both improvements (CV+A+M). Each experiment was conducted five times to compute the average verification time. Tables 1–4 show the detailed verification results for each property using these methods, where a Case ID of the form *n-m-k-l* denotes, respectively, the identifier of the verified property, the number of locations of $M_2$, the number of clocks of $M_2$, and the alphabet size of $M_2$. The Boolean column *Valid* denotes whether the property is satisfied.
The symbols $|Q|$, $|\Sigma|$, $R$, and $T_{mean}$ stand for the number of locations and the alphabet size of the learned assumption, the number of alphabet refinements during learning, and the average verification time in seconds, respectively.

**1) AUTOSAR-1 Experiment.** AUTOSAR-1 consists of 8 timed automata: 4 runnables, 2 buffers, and 2 schedulers used for scheduling the runnables. We partition the system into two parts, where $M_1$ is a DOTA and $M_2$ is composed of 7 DOTAs. The experimental results for this case are recorded in Table 1, where the proposed compositional verification (CV) outperforms monolithic checking via UPPAAL except for cases 1-71424-7-8 and 3-71424-7-8. This is because, for these two cases, the learning algorithm needs more than 30 rounds to refine assumptions using the generated counterexamples. However, with the first improvement (CV+A), i.e., CV with additional information about $M^1$, the verification time drops drastically for these two cases. Similarly, with the second improvement (CV+M), i.e., CV with a minimized alphabet, the verification time decreases thanks to fewer membership queries. With both improvements (CV+A+M), compared with each single one, the checking time varies depending on the case. As shown in Table 1, when checking property 1 with CV+A, the alphabet size of the learned assumption $A$ is the largest one, i.e., 3, so the second improvement can take effect: the verification time using CV+A+M is less than that using CV+A, though still worse than that using CV+M.

We discussed in Sect. 3.5 that the framework can handle models where $M_1$ is a multi-clock timed automaton, though termination is then not guaranteed. We therefore also repartition the AUTOSAR-1 system into two parts for verification, where $M_1$ is composed of 7 DOTAs. The results in Table 2 reveal that the proposed compositional method outperforms UPPAAL in most cases, the exception being case 5-4-1-2. The reason is that UPPAAL may find a counterexample faster than the compositional approach thanks to its on-the-fly technique, which terminates the verification as soon as a counterexample is found.


**Table 1.** Verification Results for AUTOSAR-1 where $M_1$ is a DOTA.

In contrast, our framework needs to spend some time learning the assumption before searching for the counterexample, so the verification takes longer to terminate. In the experiments, we also observe that the time varies with the selection of $M_1$. A proper selection of the components composing $M_1$ and $M_2$ can therefore lead to faster verification, while ensuring termination of the framework.

**Table 2.** Verification Results for AUTOSAR-1 where $M_1$ is a composition of DOTAs.


**2) AUTOSAR-2 Experiment.** AUTOSAR-2 is a more complex system with 14 automata in total, including 6 runnables and a task to which the runnables are mapped, 5 buffers, an RTE, and a scheduler. In this experiment, we select $M_1$ as a composition of several DOTAs. The results in Table 3 show that for properties 1–4, UPPAAL fails to produce checking results due to the large state space, whereas our compositional approach finishes the verification of all the properties within 300 seconds using the same memory budget. This indicates that the framework can reduce the state space significantly in some cases.


**Table 3.** Verification Results for AUTOSAR-2

ROM: run out of memory.

**3) AUTOSAR-3 Experiment.** The system consists of 14 components, where both $M_1$ and $M_2$ are compositions of several DOTAs. The checking results in Table 4 show that the minimal-alphabet improvement obtains the smallest alphabet, of size 1, thus reducing the verification time. However, the additional-information improvement performs poorly in most cases.


**Table 4.** Verification Results for AUTOSAR-3

### **6 Conclusion**

Although assume-guarantee reasoning can help alleviate the state-space explosion problem when model checking a composite model, its practical impact has been limited by the non-trivial human interaction needed to obtain the assumption. In this paper, we propose a learning-based compositional verification framework for deterministic timed automata, where the assumption is learned as a deterministic one-clock timed automaton. We design a model conversion algorithm that acquires the clock-reset information of the learned assumption to reduce the learning complexity, and prove that this conversion preserves the verification results. To make the framework applicable to multi-clock systems, we design a smart teacher with heuristics to answer clock-reset information. We also prove the correctness and termination of the framework. To speed up the verification, we further introduce two improvements to the learning process. We implemented the framework and performed experiments to evaluate our method. The results show that it outperforms monolithic model checking and that the state space can be effectively reduced. Moreover, the improvements have positive effects on most of the studied systems.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Online Causation Monitoring of Signal Temporal Logic**

Zhenya Zhang1(B) , Jie An<sup>2</sup> , Paolo Arcaini<sup>2</sup> , and Ichiro Hasuo<sup>2</sup>

<sup>1</sup> Kyushu University, Fukuoka, Japan zhang@ait.kyushu-u.ac.jp <sup>2</sup> National Institute of Informatics, Tokyo, Japan {jiean,arcaini,hasuo}@nii.ac.jp

**Abstract.** Online monitoring is an effective validation approach for hybrid systems that checks, at runtime, whether the (partial) signals of a system satisfy a specification in, e.g., *Signal Temporal Logic (STL)*. Classic STL monitoring is performed by computing a robustness interval that specifies, at each instant, how far the monitored signals are from violating and from satisfying the specification. However, since a robustness interval monotonically shrinks during monitoring, classic online monitors may fail to report new violations or to precisely describe the system evolution at the current instant. In this paper, we tackle these issues by considering the *causation* of violation or satisfaction instead of using the robustness directly. We first introduce a *Boolean causation monitor* that decides whether each instant is relevant to the violation or satisfaction of the specification. We then extend this monitor to a *quantitative causation monitor* that tells how far an instant is from being relevant to the violation or satisfaction. We further show that classic monitors can be derived from our proposed ones. Experimental results show that the two proposed monitors provide more detailed information about system evolution, without requiring a significantly higher monitoring cost.

**Keywords:** online monitoring · Signal Temporal Logic · monotonicity

### **1 Introduction**

Safety-critical systems require strong correctness guarantees. Due to the complexity of these systems, offline verification may not be able to guarantee their total correctness, as it is often very difficult to assess all possible system behaviors. To mitigate this issue, runtime verification [4,29,36] has been proposed as a

Z. Zhang is supported by JSPS KAKENHI Grant No. JP23K16865 and No. JP23H03372. J. An, P. Arcaini, and I. Hasuo are supported by ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603), JST, Funding Reference number 10.13039/501100009024 ERATO. P. Arcaini is also supported by Engineerable AI Techniques for Practical Applications of High-Quality Machine Learning-based Systems Project (Grant Number JPMJMI20B8), JST-Mirai.

complementary technique that analyzes the system execution at runtime. Online monitoring is such an approach that checks whether the system execution (e.g., given in terms of signals) satisfies or violates a system specification specified in a temporal logic [28,34], e.g., *Signal Temporal Logic (STL)* [30].

*Quantitative online monitoring* is based on the STL *robust semantics* [17,21], which not only tells whether a signal satisfies or violates a specification ϕ (i.e., the classic Boolean satisfaction relation), but also assigns a value in $\mathbb{R} \cup \{\infty, -\infty\}$ (i.e., *robustness*) that indicates *how robustly* ϕ is satisfied or violated. However, differently from the offline assessment of STL formulas, an online monitor needs to reason about *partial signals*, and so the assessment of robustness must be adapted. We consider an established approach [12] employed by *classic online monitors* (ClaM in the following). It consists in computing, instead of a single robustness value, a *robustness interval*: at each monitoring step, ClaM identifies an *upper bound* [R]<sup>U</sup> telling the maximal reachable robustness of any possible suffix signal (i.e., any continuation of the system evolution), and a *lower bound* [R]<sup>L</sup> telling the minimal reachable robustness. If, at some instant, [R]<sup>U</sup> becomes negative, the specification is violated; if [R]<sup>L</sup> becomes positive, the specification is satisfied. In the other cases, the specification validity is unknown.

Consider the simple example in Fig. 1, which shows the monitoring of the speed of a vehicle (upper plot); the specification requires the speed to always stay below 10. The lower plot reports how the upper bound [R]<sup>U</sup> and the lower bound [R]<sup>L</sup> of the reachable robustness change over time. We observe that the initial value of [R]<sup>U</sup> is around 8 and gradually decreases.<sup>1</sup> The monitor detects that the specification is violated at time b = 20, when the speed exceeds 10 and [R]<sup>U</sup> therefore goes below 0. After that, the violation severity progressively gets worse until time b = 30, when [R]<sup>U</sup> reaches −5. After that point, the monitor provides no additional useful information about the system evolution, as [R]<sup>U</sup> remains stuck at −5. However, if we observe the speed signal after b = 30, we notice that (i) the severity of the violation is mitigated, and the "1st violation episode" ends at time b = 35; the monitor ClaM, however, does not report this type of information; (ii) a "2nd violation episode" occurs in the time interval [40, 45]; the monitor ClaM does not distinguish this new violation.

**Fig. 1.** ClaM – Robustness upper and lower bounds of ✷[0,100](v < 10)
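For an always-bounded specification like the one above, the two bounds have a simple closed form on a sampled partial signal: the upper bound is the minimum of 10 − v(t) over the samples seen so far (a suffix can only lower the overall minimum), and the lower bound additionally assumes the worst admissible future value. A minimal sketch — the spatial bounds `v_min`/`v_max` on the speed domain are our assumption:

```python
def always_below_bounds(partial_speed, limit=10.0, v_min=0.0, v_max=25.0):
    """Robustness interval (R_L, R_U) of Box(v < limit) on a partial signal.

    partial_speed: the speed samples observed so far.
    R_U assumes the most favourable continuation (future values near v_min);
    R_L assumes the worst admissible continuation (future values at v_max)."""
    observed = min(limit - v for v in partial_speed)
    r_upper = min(observed, limit - v_min)  # the best future cannot undo the past
    r_lower = min(observed, limit - v_max)  # worst-case continuation
    return r_lower, r_upper
```

With samples `[2, 5, 12]` the interval is `(-15, -2)`: the upper bound has crossed 0, so the violation is detected. Appending a recovering sample `6` leaves the upper bound stuck at −2 — exactly the masking behaviour discussed above.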

<sup>1</sup> The value of the lower bound [R]<sup>L</sup> is not shown in the figure, as it is not relevant here. In the example, it remains constant before b = 100; its value is usually set either according to domain knowledge about the system signals, or to −∞ otherwise.

The reason for the issues reported in the example is that the upper and lower bounds are monotonically decreasing and increasing, respectively. As a consequence, the robustness interval at a given step is "masked" by the history of previous robustness intervals and, e.g., mitigation of the violation severity cannot be detected. Moreover, as an extreme consequence, as soon as the monitor ClaM assesses the violation of the specification (i.e., the upper bound [R]<sup>U</sup> becomes negative) or its satisfaction (i.e., the lower bound [R]<sup>L</sup> becomes positive), the Boolean status of the monitor no longer changes. This characteristic derives directly from the STL semantics and is known as the *monotonicity* [9–11] of classic online monitors. Monotonicity has been recognized as a problem of these monitors in the literature [10,37,40], since it does not allow detecting specific types of information that are "masked". We informally define two types of *information masking* that can occur because of monotonicity:

*evolution masking***:** the monitor may not properly report the evolution of the system execution, e.g., mitigation of violation severity may not be detected; *violation masking***:** as a special case of *evolution masking*, the first violation episode during the system execution "masks" the following ones.

The information that ClaM fails to report because of information masking is very useful in several contexts. First of all, in some systems the first violation of the specification does not mean that the system has stopped operating, and one may want to continue monitoring and detect all the succeeding violations; this is the case, e.g., of the monitoring approach reported by Selyunin et al. [37], in which all the violations of the SENT protocol must be detected. Moreover, a precise description of the system evolution is important for the usefulness of the monitoring; for example, the monitor of the speed in Fig. 1 could be used in a vehicle to check the speed and notify the driver whenever it approaches the critical limit; if the monitor cannot precisely capture the severity of violation, it cannot be used for this type of application.

Some works [10,37,40] try to mitigate the monotonicity issues, by "resetting" the monitor at specific points. A recent approach has been proposed by Zhang et al. [40] (called ResM in the following) that is able to identify each "violation episode" (i.e., it solves the problem of *violation masking*), but does not solve the *evolution masking* problem. For the example in Fig. 1, ResM is able to detect the two violation episodes in intervals [20, 35] and [40, 45], but it is not able to report that the speed decreases after b = 10 (in a non-violating situation), and that the severity of the violation is mitigated after b = 30.

**Contribution.** In this paper, in order to provide more information about the evolution of the monitored system, we propose to monitor the *causation* of violation or satisfaction, instead of considering the robustness directly. To do this, we rely on the notion of *epoch* [5]. At each instant, the *violation (satisfaction) epoch* identifies the time instants at which the evaluation of the atomic propositions of the specification ϕ causes the violation (satisfaction) of ϕ.

Based on the notion of epoch, we define a *Boolean causation monitor* (called BCauM) that, at runtime, not only assesses the specification violation/satisfaction, but also tells whether each instant is relevant to it. Namely, BCauM marks each current instant b as (i) a *violation causation instant*, if b is added to the violation epoch; (ii) a *satisfaction causation instant*, if b is added to the satisfaction epoch; (iii) an *irrelevant instant*, if b is not added to any epoch. We show that BCauM is able to detect all the violation episodes (so solving the *violation masking* issue), as violation causation instants can be followed by irrelevant instants. Moreover, we show that the information provided by the classic Boolean online monitor can be derived from that of the Boolean causation monitor BCauM.

However, BCauM only tells us whether the current instant is a (violation or satisfaction) causation instant, not *how far* the instant is from being one. To this aim, we introduce the notion of *causation distance*, a quantitative measure characterizing how far the signal value at b is from turning b into a causation instant. We then propose the *quantitative causation monitor* (QCauM) that, at each instant, returns its causation distance. We show that using QCauM, besides solving the *violation masking* problem, also solves the *evolution masking* problem. Moreover, we show that both the classic quantitative monitor ClaM and the Boolean causation monitor BCauM can be derived from QCauM.

Experimental results show that the proposed monitors not only provide more information, but do so efficiently, without requiring significant additional monitoring time w.r.t. the existing monitors.

**Outline.** Section 2 reports necessary background. We introduce BCauM in Sect. 3, and QCauM in Sect. 4. Experimental assessment of the two proposed monitors is reported in Sect. 5. Finally, Sect. 6 discusses some related work, and Sect. 7 concludes the paper.

### **2 Preliminaries**

In this section, we review the fundamentals of *signal temporal logic (STL)* in Sect. 2.1, and then introduce the existing classic online monitoring approach in Sect. 2.2.

#### **2.1 Signal Temporal Logic**

Let $T \in \mathbb{R}^+$ be a positive real and $d \in \mathbb{N}^+$ a positive integer. A *d-dimensional signal* is a function $\mathbf{v}\colon [0, T] \to \mathbb{R}^d$, where $T$ is called the *time horizon* of $\mathbf{v}$. Given an arbitrary time instant $t \in [0, T]$, $\mathbf{v}(t)$ is a *d*-dimensional real vector; each dimension concerns a *signal variable* with a certain physical meaning, e.g., speed, RPM, acceleration, etc. In this paper, we fix a set **Var** of variables and assume that a signal $\mathbf{v}$ is *spatially bounded*, i.e., for all $t \in [0, T]$, $\mathbf{v}(t) \in \Omega$, where $\Omega$ is a *d*-dimensional hyper-rectangle.

*Signal temporal logic (STL)* is a widely-adopted specification language, used to describe the expected behavior of systems. In Definition 1 and Definition 2, we respectively review the syntax and the robust semantics of STL [17,21,30].

**Definition 1 (STL syntax).** In STL, the *atomic propositions* α and the *formulas* ϕ are defined as follows:

$$\alpha ::= f(w\_1, \dots, w\_K) > 0 \qquad \varphi ::= \alpha \mid \bot \mid \neg \varphi \mid \varphi \land \varphi \mid \Box\_I \varphi \mid \Diamond\_I \varphi \mid \varphi \mathcal{U}\_I \varphi$$

Here f is a K-ary function f : R^K → R, w1, ..., wK ∈ **Var**, and I is a closed interval over R≥0, i.e., I = [l, u], where l, u ∈ R and l ≤ u. In the case that l = u, we can use l to stand for I. ✷, ✸ and U are temporal operators, known as *always*, *eventually* and *until*, respectively. The always operator ✷ and the eventually operator ✸ are two special cases of the until operator U, where ✸Iϕ ≡ ⊤ UI ϕ and ✷Iϕ ≡ ¬✸I¬ϕ. Other common connectives such as ∨, → are introduced as syntactic sugar: ϕ1 ∨ ϕ2 ≡ ¬(¬ϕ1 ∧ ¬ϕ2), ϕ1 → ϕ2 ≡ ¬ϕ1 ∨ ϕ2.

**Definition 2 (STL robust semantics).** Let **v** be a signal, ϕ be an STL formula and τ ∈ R+ be an instant. The *robustness* R(**v**, ϕ, τ) ∈ R ∪ {∞, −∞} of **v** w.r.t. ϕ at τ is defined by induction on the construction of formulas, as follows.

$$\begin{aligned} \mathrm{R}(\mathbf{v},\alpha,\tau) &:= f(\mathbf{v}(\tau)) & \mathrm{R}(\mathbf{v},\bot,\tau) &:= -\infty & \mathrm{R}(\mathbf{v},\neg\varphi,\tau) &:= -\mathrm{R}(\mathbf{v},\varphi,\tau) \\ \mathrm{R}(\mathbf{v},\varphi\_{1}\wedge\varphi\_{2},\tau) &:= \min\left(\mathrm{R}(\mathbf{v},\varphi\_{1},\tau),\mathrm{R}(\mathbf{v},\varphi\_{2},\tau)\right) \\ \mathrm{R}(\mathbf{v},\square\_{I}\varphi,\tau) &:= \inf\_{t\in\tau+I}\mathrm{R}(\mathbf{v},\varphi,t) & \mathrm{R}(\mathbf{v},\Diamond\_{I}\varphi,\tau) &:= \sup\_{t\in\tau+I}\mathrm{R}(\mathbf{v},\varphi,t) \\ \mathrm{R}(\mathbf{v},\varphi\_{1}\,\mathcal{U}\_{I}\,\varphi\_{2},\tau) &:= \sup\_{t\in\tau+I}\min\left(\mathrm{R}(\mathbf{v},\varphi\_{2},t),\inf\_{t'\in[\tau,t)}\mathrm{R}(\mathbf{v},\varphi\_{1},t')\right) \end{aligned}$$

Here, τ + I denotes the interval [l + τ,u + τ ].

The original STL semantics is Boolean, and represents whether a signal **v** satisfies ϕ at an instant τ, i.e., whether (**v**, τ) |= ϕ. The robust semantics in Definition 2 is a quantitative extension that refines the original Boolean STL semantics, in the sense that R(**v**, ϕ, τ) > 0 implies (**v**, τ) |= ϕ, and R(**v**, ϕ, τ) < 0 implies (**v**, τ) ⊭ ϕ. More details can be found in [21, Proposition 16].
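As an illustration of Definition 2, the sketch below computes robustness over a discrete-time signal, interpreting intervals over the sampled instants (a simplifying assumption of ours; the paper works in dense time). The tuple-based formula encoding is hypothetical:

```python
def robustness(formula, signal, t):
    """Robust semantics of Definition 2 over a discrete-time signal.
    `signal` maps each sampled instant to a dict of variable values;
    `formula` is a nested tuple, e.g. ("always", (l, u), phi)."""
    kind = formula[0]
    if kind == "atom":                       # ("atom", f) with f: state -> R
        return formula[1](signal[t])
    if kind == "not":
        return -robustness(formula[1], signal, t)
    if kind == "and":
        return min(robustness(formula[1], signal, t),
                   robustness(formula[2], signal, t))
    if kind == "always":                     # inf over the shifted interval
        (l, u), phi = formula[1], formula[2]
        return min(robustness(phi, signal, s)
                   for s in signal if t + l <= s <= t + u)
    if kind == "eventually":                 # sup over the shifted interval
        (l, u), phi = formula[1], formula[2]
        return max(robustness(phi, signal, s)
                   for s in signal if t + l <= s <= t + u)
    raise ValueError(f"unknown operator: {kind}")

# speed sampled at integer instants; spec: speed stays below 10 on [0, 4]
sig = {s: {"speed": v} for s, v in enumerate([8, 9, 12, 11, 7])}
phi = ("always", (0, 4), ("atom", lambda st: 10 - st["speed"]))
print(robustness(phi, sig, 0))   # → -2 (worst margin, at speed = 12)
```

A negative value signals violation, and its magnitude reports how severely the worst sample misses the threshold.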

#### **2.2 Classic Online Monitoring of STL**

STL robust semantics in Definition 2 provides an offline monitoring approach for *complete signals*. *Online monitoring*, instead, targets a growing *partial signal* at runtime. Besides the verdicts ⊤ and ⊥, an online monitor can also report the verdict unknown (denoted as ?), which represents a status in which the satisfaction of ϕ by the signal is not yet decided. In the following, we formally define partial signals and introduce online monitors for STL.

Let T be the time horizon of a signal **v**, and let [a, b] ⊆ [0, T] be a subinterval of the time domain [0, T]. A *partial signal* **v**a:b is a function that is only defined on the interval [a, b]; on the remaining domain [0, T]\[a, b], the value of **v**a:b is undefined.

Specifically, if a = 0 and b ∈ (a, T], a partial signal **v**a:b is called a *prefix* (partial) signal; dually, if b = T and a ∈ [0, b), a partial signal **v**a:b is called a *suffix* (partial) signal. Given a prefix signal **v**0:b, a *completion* **v**0:b · **v**b:T of **v**0:b is defined as the concatenation of **v**0:b with a suffix signal **v**b:T.

**Definition 3 (Classic Boolean STL online monitor).** Let **v**0:b be a prefix signal, and ϕ be an STL formula. An online monitor M(**v**0:b, ϕ, τ) returns a verdict in {⊤, ⊥, ?} (namely, true, false, and unknown), as follows:

$$\mathbf{M}(\mathbf{v}\_{0:b}, \boldsymbol{\varphi}, \boldsymbol{\tau}) := \begin{cases} \top & \text{if } \forall \mathbf{v}\_{b:T}. \mathbf{R}(\mathbf{v}\_{0:b} \cdot \mathbf{v}\_{b:T}, \boldsymbol{\varphi}, \boldsymbol{\tau}) > 0 \\ \bot & \text{if } \forall \mathbf{v}\_{b:T}. \mathbf{R}(\mathbf{v}\_{0:b} \cdot \mathbf{v}\_{b:T}, \boldsymbol{\varphi}, \boldsymbol{\tau}) < 0 \\ ? & \text{otherwise} \end{cases}$$

Namely, the verdicts of M(**v**0:b, ϕ, τ) are interpreted as follows:

– ⊤: every possible completion of **v**0:b satisfies ϕ at τ, so satisfaction is already decided;
– ⊥: every possible completion of **v**0:b violates ϕ at τ, so violation is already decided;
– ?: some completions satisfy ϕ and others violate it, so the verdict is not yet decided.
Note that, by Definition 3 only, we cannot synthesize a feasible online monitor, because the possible completions for **v**0:<sup>b</sup> are infinitely many. A constructive online monitor is introduced in [12], which implements the functionality of Definition 3 by computing the *reachable* robustness of **v**0:b. We review this monitor in Definition 4.

**Definition 4 (Classic quantitative STL online monitor (**ClaM**)).** Let **v**0:b be a prefix signal, and let ϕ be an STL formula. We denote by R^α_max and R^α_min the possible *maximum* and *minimum bounds* of the robustness R(**v**, α, τ)². Then, an *online monitor* [R](**v**0:b, ϕ, τ), which returns a sub-interval of [R^α_min, R^α_max] at the instant b, is defined as follows, by induction on the construction of formulas.

$$\begin{aligned} [\mathbf{R}](\mathbf{v}\_{0:b},\alpha,\tau) &:= \begin{cases} \left[f\left(\mathbf{v}\_{0:b}(\tau)\right), f\left(\mathbf{v}\_{0:b}(\tau)\right)\right] & \text{if } \tau \in [0,b] \\ \left[\mathbf{R}\_{\min}^{\alpha}, \mathbf{R}\_{\max}^{\alpha}\right] & \text{otherwise} \end{cases} \\ [\mathbf{R}](\mathbf{v}\_{0:b},\neg\varphi,\tau) &:= -\left[\mathbf{R}\right](\mathbf{v}\_{0:b},\varphi,\tau) \\ [\mathbf{R}](\mathbf{v}\_{0:b},\varphi\_{1}\wedge\varphi\_{2},\tau) &:= \min\left([\mathbf{R}](\mathbf{v}\_{0:b},\varphi\_{1},\tau), [\mathbf{R}](\mathbf{v}\_{0:b},\varphi\_{2},\tau)\right) \\ [\mathbf{R}](\mathbf{v}\_{0:b},\square\_{I}\varphi,\tau) &:= \inf\_{t\in\tau+I}\left([\mathbf{R}](\mathbf{v}\_{0:b},\varphi,t)\right) \\ [\mathbf{R}](\mathbf{v}\_{0:b},\varphi\_{1}\mathcal{U}\_{I}\varphi\_{2},\tau) &:= \sup\_{t\in\tau+I} \min\left([\mathbf{R}](\mathbf{v}\_{0:b},\varphi\_{2},t), \inf\_{t'\in[\tau,t)}[\mathbf{R}](\mathbf{v}\_{0:b},\varphi\_{1},t')\right) \end{aligned}$$

Here, f is defined as in Definition 1, and the arithmetic rules over intervals I_i = [l_i, u_i] are defined as follows: −I := [−u, −l] and min(I_1, I_2) := [min(l_1, l_2), min(u_1, u_2)].

We denote by [R]^U(**v**0:b, ϕ, τ) and [R]^L(**v**0:b, ϕ, τ) the upper bound and the lower bound of [R](**v**0:b, ϕ, τ), respectively. Intuitively, the two bounds together form the reachable robustness interval of the completion **v**0:b · **v**b:T, under any possible suffix signal **v**b:T. For instance, in Fig. 2, the upper bound [R]^U at b = 20 is 0, which indicates that the robustness of the completion of the signal speed, under any suffix, can never be larger than 0.
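The interval computation of Definition 4 can be sketched for atomic propositions under an always operator; the helper names and the dictionary-based prefix encoding below are our own assumptions, not the paper's implementation:

```python
from functools import reduce

def atom_interval(prefix, b, t, bounds):
    """[R](v_{0:b}, α, t) for an atomic proposition: a singleton interval
    once instant t has been observed (t <= b), the vacuous bounds
    [R^α_min, R^α_max] otherwise. `prefix` maps observed instants to the
    value of f at that instant."""
    rmin, rmax = bounds
    if t <= b:
        return (prefix[t], prefix[t])
    return (rmin, rmax)

def interval_min(i1, i2):
    # min over intervals, as defined after Definition 4
    return (min(i1[0], i2[0]), min(i1[1], i2[1]))

def always_interval(prefix, b, horizon, bounds):
    """[R](v_{0:b}, □_[0,horizon] α, 0): pointwise min of atomic intervals
    (integer sampling assumed in this sketch)."""
    ts = range(horizon + 1)
    return reduce(interval_min, (atom_interval(prefix, b, t, bounds) for t in ts))

prefix = {0: 2.0, 1: 1.0, 2: -0.5}                   # observed f-values up to b = 2
print(always_interval(prefix, 2, 4, (-5.0, 5.0)))    # → (-5.0, -0.5)
```

Here the upper bound is already negative at b = 2, so the violation of the whole □ obligation is decided even though two instants are still unobserved.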

The quantitative online monitor ClaM in Definition 4 refines the Boolean one in Definition 3, and the Boolean monitor can be derived from ClaM as follows: M(**v**0:b, ϕ, τ) = ⊤ if [R]^L(**v**0:b, ϕ, τ) > 0; M(**v**0:b, ϕ, τ) = ⊥ if [R]^U(**v**0:b, ϕ, τ) < 0; and M(**v**0:b, ϕ, τ) = ? otherwise.

<sup>2</sup> R(**v**, α, τ) is bounded because **v** is bounded by Ω. In practice, if Ω is not known, we set R^α_max and R^α_min to, respectively, ∞ and −∞.


The classic online monitors are *monotonic* by definition. In the Boolean monitor (Definition 3), with the growth of **v**0:b, M(**v**0:b, ϕ, τ) can only turn from ? to {⊥, ⊤}, but never the other way around. In the quantitative one (Definition 4), as shown in Lemma 1, [R]^U(**v**0:b, ϕ, τ) and [R]^L(**v**0:b, ϕ, τ) are both monotonic, the former decreasing and the latter increasing. An example can be observed in Fig. 2.

**Lemma 1 (Monotonicity of STL online monitor).** Let [R](**v**0:b, ϕ, τ) be the quantitative online monitor for a partial signal **v**0:b and an STL formula ϕ. With the growth of the partial signal **v**0:b, the upper bound [R]^U(**v**0:b, ϕ, τ) monotonically decreases, and the lower bound [R]^L(**v**0:b, ϕ, τ) monotonically increases, i.e., for two time instants b1, b2 ∈ [0, T], if b1 < b2, we have (i) [R]^U(**v**0:b1, ϕ, τ) ≥ [R]^U(**v**0:b2, ϕ, τ), and (ii) [R]^L(**v**0:b1, ϕ, τ) ≤ [R]^L(**v**0:b2, ϕ, τ).

*Proof.* This can be proved by induction on the structure of STL formulas. The detailed proof can be found in the full version [38].
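Lemma 1 can be observed on a toy trace: the upper bound of □ α over a growing prefix never increases, which is exactly the source of the masking problem. A minimal sketch, assuming integer sampling and hypothetical values:

```python
# Upper bound [R]^U of □_[0,4] α as the prefix grows: unobserved instants
# contribute the vacuous upper bound rmax, observed ones their exact value.
def upper_bound(values_seen, horizon, rmax):
    pads = [rmax] * (horizon + 1 - len(values_seen))
    return min(values_seen + pads)

trace = [2.0, 1.0, -0.5, 3.0, 0.2]   # hypothetical f-values of the atom
uppers = [upper_bound(trace[:b + 1], 4, 5.0) for b in range(5)]
print(uppers)   # → [2.0, 1.0, -0.5, -0.5, -0.5]
```

The bound stays at −0.5 even though the signal recovers at t = 3 and t = 4: once decided, a monotonic monitor can never report the recovery.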

### **3 Boolean Causation Online Monitor**

As explained in Sect. 1, monotonicity of classic online monitors causes different types of *information masking*, which prevents some information from being delivered. In this section, we introduce a novel *Boolean causation (online) monitor* BCauM, that solves the *violation masking* issue (see Sect. 1). BCauM is defined based on *online signal diagnostics* [5,40], which reports the *cause* of violation or satisfaction of the specification at the atomic proposition level.

**Definition 5 (Online signal diagnostics).** Let **v**0:b be a partial signal and ϕ be an STL specification. At an instant b, online signal diagnostics returns a *violation epoch* E⊖(**v**0:b, ϕ, τ), under the condition [R]^U(**v**0:b, ϕ, τ) < 0, as follows:

$$\begin{aligned} \mathrm{E}^{\ominus}(\mathbf{v}\_{0:b},\alpha,\tau) &:= \begin{cases} \{\langle\alpha,\tau\rangle\} & \text{if } [\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\alpha,\tau) < 0\\ \emptyset & \text{otherwise} \end{cases} \\ \mathrm{E}^{\ominus}(\mathbf{v}\_{0:b},\neg\varphi,\tau) &:= \mathrm{E}^{\oplus}(\mathbf{v}\_{0:b},\varphi,\tau) \\ \mathrm{E}^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{1}\wedge\varphi\_{2},\tau) &:= \bigcup\_{\substack{i\in\{1,2\}\text{ s.t.}\\ [\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_{i},\tau)<0}} \mathrm{E}^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{i},\tau) \\ \mathrm{E}^{\ominus}(\mathbf{v}\_{0:b},\square\_{I}\varphi,\tau) &:= \bigcup\_{\substack{t\in\tau+I\text{ s.t.}\\ [\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi,t)<0}} \mathrm{E}^{\ominus}(\mathbf{v}\_{0:b},\varphi,t) \\ \mathrm{E}^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{1}\,\mathcal{U}\_{I}\,\varphi\_{2},\tau) &:= \bigcup\_{\substack{t\in\tau+I\text{ s.t.}\\ [\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_{1}\,\mathcal{U}\_{t}\,\varphi\_{2},\tau)<0}} \left(\mathrm{E}^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{2},t) \cup \bigcup\_{t'\in[\tau,t)} \mathrm{E}^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{1},t')\right) \end{aligned}$$

and a *satisfaction epoch* E⊕(**v**0:b, ϕ, τ), under the condition [R]^L(**v**0:b, ϕ, τ) > 0, as follows:

$$\begin{aligned} \mathrm{E}^{\oplus}(\mathbf{v}\_{0:b},\alpha,\tau) &:= \begin{cases} \{\langle\alpha,\tau\rangle\} & \text{if } [\mathrm{R}]^{\mathrm{L}}(\mathbf{v}\_{0:b},\alpha,\tau) > 0\\ \emptyset & \text{otherwise} \end{cases} \\ \mathrm{E}^{\oplus}(\mathbf{v}\_{0:b},\neg\varphi,\tau) &:= \mathrm{E}^{\ominus}(\mathbf{v}\_{0:b},\varphi,\tau) \\ \mathrm{E}^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{1}\wedge\varphi\_{2},\tau) &:= \bigcup\_{\substack{i\in\{1,2\}\text{ s.t.}\\ [\mathrm{R}]^{\mathrm{L}}(\mathbf{v}\_{0:b},\varphi\_{i},\tau)>0}} \mathrm{E}^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{i},\tau) \\ \mathrm{E}^{\oplus}(\mathbf{v}\_{0:b},\square\_{I}\varphi,\tau) &:= \bigcup\_{\substack{t\in\tau+I\text{ s.t.}\\ [\mathrm{R}]^{\mathrm{L}}(\mathbf{v}\_{0:b},\varphi,t)>0}} \mathrm{E}^{\oplus}(\mathbf{v}\_{0:b},\varphi,t) \\ \mathrm{E}^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{1}\,\mathcal{U}\_{I}\,\varphi\_{2},\tau) &:= \bigcup\_{\substack{t\in\tau+I\text{ s.t.}\\ [\mathrm{R}]^{\mathrm{L}}(\mathbf{v}\_{0:b},\varphi\_{1}\,\mathcal{U}\_{t}\,\varphi\_{2},\tau)>0}} \left(\mathrm{E}^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{2},t) \cup \bigcup\_{t'\in[\tau,t)} \mathrm{E}^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{1},t')\right) \end{aligned}$$

If the conditions are not satisfied, E⊖(**v**0:b, ϕ, τ) and E⊕(**v**0:b, ϕ, τ) are both ∅. Note that the definition is recursive; thus the conditions should also be checked when computing the violation and satisfaction epochs of the sub-formulas of ϕ.

Computation for the other operators can be inferred from the presented ones and the STL syntax (Definition 1).

Intuitively, when a partial signal **v**0:b violates a specification ϕ, a violation epoch collects the evaluations (identified by pairs of atomic propositions and instants) of the signal at the atomic proposition level that cause the violation of the whole formula ϕ (and dually for the satisfaction cases). This is done inductively, based on the semantics of the different operators:

– for an atomic proposition α, the pair ⟨α, τ⟩ is collected only if α is already violated at τ;
– for a conjunction, the epochs of the violated conjuncts are collected;
– for ✷I, the epochs of the sub-formula at the violated instants in τ + I are collected;
– for UI, for each witness instant t at which the until obligation is violated, the epochs of ϕ2 at t and of ϕ1 before t are collected.
**Example 1.** The example in Fig. 2 illustrates how an epoch is collected. The specification requires that whenever the speed is higher than 10, the car should decelerate within 5 time units. As shown by the classic monitor, the specification is violated at b = 25, since v becomes higher than 10 at 20 but a remains positive during [20, 25]. Note that the specification can be rewritten as ϕ ≡ ✷[0,100](¬(v > 10) ∨ ✸[0,5](a < 0)). For convenience, we name the sub-formulas of ϕ as follows:

$$\begin{array}{ccc} \varphi' \equiv \neg(v > 10) \lor \lozenge\_{[0,5]}(a < 0) & \varphi\_1 \equiv \neg(v > 10) & \varphi\_2 \equiv \lozenge\_{[0,5]}(a < 0) \\ & \alpha\_1 \equiv v > 10 & \alpha\_2 \equiv a < 0 \end{array}$$

**Fig. 2.** Classic monitor (ClaM) result for the STL specification: ✷[0,100](v > 10 → ✸[0,5](a < 0))

**Fig. 3.** The violation epochs (the red parts) respectively when b = 30 and b = 35

**Fig. 4.** Boolean causation monitor (BCauM) result

Figure 3 shows the violation epochs at two instants 30 and 35. First, at b = 30,

$$\begin{aligned} \mathrm{E}^{\ominus}(\mathbf{v}\_{0:30}, \varphi, 0) &= \left(\bigcup\_{t \in [20, 25]} \mathrm{E}^{\ominus}(\mathbf{v}\_{0:30}, \alpha\_1, t)\right) \cup \left(\bigcup\_{t \in [20, 30]} \mathrm{E}^{\ominus}(\mathbf{v}\_{0:30}, \alpha\_2, t)\right) \\ &= \langle \alpha\_1, [20, 25] \rangle \cup \langle \alpha\_2, [20, 30] \rangle \end{aligned}$$

Similarly, the violation epoch E(**v**0:35, ϕ, 0) at b = 35 is the same as that at b = 30. Intuitively, the epoch at b = 30 shows the cause of the violation of **v**0:30; then since signal a < 0 in [30, 35], this segment is not considered as the cause of the violation, so the epoch remains the same at b = 35. ✁

**Definition 6 (Boolean causation monitor (**BCauM**)).** Let **v**0:b be a partial signal and ϕ be an STL specification. We denote by A the set of atomic propositions of ϕ. At each instant b, a *Boolean causation (online) monitor* BCauM returns a verdict in {⊖, ⊕, ⊙} (called *violation causation*, *satisfaction causation* and *irrelevant*), which is defined as follows,

$$\mathcal{M}(\mathbf{v}\_{0:b}, \varphi, \tau) := \begin{cases} \ominus & \text{if } \exists \alpha \in \mathcal{A}.\ \langle \alpha, b \rangle \in \mathrm{E}^{\ominus}(\mathbf{v}\_{0:b}, \varphi, \tau) \\ \oplus & \text{if } \exists \alpha \in \mathcal{A}.\ \langle \alpha, b \rangle \in \mathrm{E}^{\oplus}(\mathbf{v}\_{0:b}, \varphi, \tau) \\ \odot & \text{otherwise} \end{cases}$$

An instant b is called a *violation/satisfaction causation instant* if *M*(**v**0:b, ϕ, τ) returns ⊖/⊕, or an *irrelevant instant* if *M*(**v**0:b, ϕ, τ) returns ⊙.

Intuitively, if the current instant b (with the related α) is included in the epoch (thus the signal value at b is relevant to the violation/satisfaction of ϕ), BCauM reports a *violation/satisfaction causation* (⊖/⊕); otherwise, it reports *irrelevant* (⊙). Notably, BCauM is non-monotonic, in that even if it reports ⊖ or ⊕ at some instant b, it may still report ⊙ after b. This feature allows BCauM to deliver more information, e.g., it can detect the end of a violation episode and the start of a new one (i.e., it solves the *violation masking* issue in Sect. 1); see Example 2.

**Example 2.** Based on the signal diagnostics in Fig. 3, the Boolean causation monitor BCauM reports the result shown as in Fig. 4.

Compared to the classic Boolean monitor in Fig. 2, BCauM brings more information, in the sense that it detects the end of the violation episode at b = 30, by going from ⊖ to ⊙ when the signal a becomes negative. ✁
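Definition 6 reduces to a membership test on the epochs of Definition 5. A sketch, using the (integer-sampled) epoch of Example 1 and hypothetical proposition names `a1`, `a2`:

```python
def bcaum_verdict(b, violation_epoch, satisfaction_epoch):
    """Definition 6: check whether instant b appears in either epoch.
    Epochs are sets of (atomic_proposition, instant) pairs."""
    if any(t == b for _, t in violation_epoch):
        return "violation-causation"      # ⊖
    if any(t == b for _, t in satisfaction_epoch):
        return "satisfaction-causation"   # ⊕
    return "irrelevant"                   # ⊙

# Epoch from Example 1 at b = 30: ⟨α1, [20, 25]⟩ ∪ ⟨α2, [20, 30]⟩,
# discretized to integer instants for this sketch.
epoch = {("a1", t) for t in range(20, 26)} | {("a2", t) for t in range(20, 31)}
print(bcaum_verdict(30, epoch, set()))   # → violation-causation
print(bcaum_verdict(35, epoch, set()))   # → irrelevant (35 lies past the epoch)
```

The non-monotonicity is visible here: the verdict at 35 returns to ⊙ even though a violation was reported earlier.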

Theorem 1 states the relation of BCauM with the classic Boolean online monitor.

**Theorem 1.** The Boolean causation monitor BCauM in Definition 6 refines the classic Boolean online monitor in Definition 3, in the following sense:

– M(**v**0:b, ϕ, τ) = ⊥ if and only if there exists b′ ∈ [0, b] such that *M*(**v**0:b′, ϕ, τ) = ⊖;
– M(**v**0:b, ϕ, τ) = ⊤ if and only if there exists b′ ∈ [0, b] such that *M*(**v**0:b′, ϕ, τ) = ⊕.
*Proof.* The proof is based on Definitions 5 and 6, Lemma 1 about the monotonicity of classic STL online monitors, and two extra lemmas in the full version [38].

### **4 Quantitative Causation Online Monitor**

Although BCauM in Sect. 3 is able to solve the *violation masking* issue, it still does not provide enough information about the evolution of the system signals, i.e., it does not solve the *evolution masking* issue introduced in Sect. 1. To tackle this issue, we propose a *quantitative (online) causation monitor* QCauM in Definition 7, which is a quantitative extension of BCauM. Given a partial signal **v**0:b, QCauM reports a *violation causation distance* [*R*]⊖(**v**0:b, ϕ, τ) and a *satisfaction causation distance* [*R*]⊕(**v**0:b, ϕ, τ), which, respectively, indicate *how far* the signal value at the current instant b is from turning b into a violation causation instant and from turning b into a satisfaction causation instant.

**Definition 7 (Quantitative causation monitor (**QCauM**)).** Let **v**0:b be a partial signal, and ϕ be an STL specification. At instant b, the quantitative causation monitor QCauM returns a *violation causation distance* [*R*]⊖(**v**0:b, ϕ, τ), as follows:

$$\begin{aligned} [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\alpha,\tau) &:= \begin{cases} f(\mathbf{v}\_{0:b}(\tau)) & \text{if } b = \tau \\ \mathrm{R}\_{\max}^{\alpha} & \text{otherwise} \end{cases} \\ [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\neg\varphi,\tau) &:= -[\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\varphi,\tau) \\ [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{1}\wedge\varphi\_{2},\tau) &:= \min\left([\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{1},\tau), [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{2},\tau)\right) \\ [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{1}\vee\varphi\_{2},\tau) &:= \min\left(\begin{array}{l} \max\left([\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{1},\tau), [\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_{2},\tau)\right), \\ \max\left([\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_{1},\tau), [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{2},\tau)\right) \end{array}\right) \end{aligned}$$


$$\begin{aligned} [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\square\_{I}\varphi,\tau) &:= \inf\_{t\in\tau+I} [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\varphi,t) \\ [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\Diamond\_{I}\varphi,\tau) &:= \inf\_{t\in\tau+I} \max\left([\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\varphi,t), [\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\Diamond\_{I}\varphi,\tau)\right) \\ [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{1}\,\mathcal{U}\_{I}\,\varphi\_{2},\tau) &:= \inf\_{t\in\tau+I} \max\left(\min\left(\inf\_{t'\in[\tau,t)} [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{1},t'), [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\varphi\_{2},t)\right), [\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_{1}\,\mathcal{U}\_{I}\,\varphi\_{2},\tau)\right) \end{aligned}$$

and a *satisfaction causation distance* [*R*]⊕(**v**0:b, ϕ, τ), as follows:

$$\begin{aligned} [\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\alpha,\tau) &:= \begin{cases} f(\mathbf{v}\_{0:b}(\tau)) & \text{if } b = \tau \\ \mathrm{R}\_{\min}^{\alpha} & \text{otherwise} \end{cases} \\ [\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\neg\varphi,\tau) &:= -[\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b},\varphi,\tau) \\ [\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{1}\wedge\varphi\_{2},\tau) &:= \max\left(\begin{array}{l} \min\left([\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{1},\tau), [\mathrm{R}]^{\mathrm{L}}(\mathbf{v}\_{0:b},\varphi\_{2},\tau)\right), \\ \min\left([\mathrm{R}]^{\mathrm{L}}(\mathbf{v}\_{0:b},\varphi\_{1},\tau), [\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{2},\tau)\right) \end{array}\right) \\ [\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{1}\vee\varphi\_{2},\tau) &:= \max\left([\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{1},\tau), [\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{2},\tau)\right) \\ [\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\square\_{I}\varphi,\tau) &:= \sup\_{t\in\tau+I} \min\left([\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\varphi,t), [\mathrm{R}]^{\mathrm{L}}(\mathbf{v}\_{0:b},\square\_{I}\varphi,\tau)\right) \\ [\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\Diamond\_{I}\varphi,\tau) &:= \sup\_{t\in\tau+I} [\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\varphi,t) \\ [\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{1}\,\mathcal{U}\_{I}\,\varphi\_{2},\tau) &:= \sup\_{t\in\tau+I} \min\left(\max\left(\sup\_{t'\in[\tau,t)} [\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{1},t'), [\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:b},\varphi\_{2},t)\right), [\mathrm{R}]^{\mathrm{L}}(\mathbf{v}\_{0:b},\varphi\_{1}\,\mathcal{U}\_{I}\,\varphi\_{2},\tau)\right) \end{aligned}$$

Intuitively, a violation causation distance [*R*]⊖(**v**0:b, ϕ, τ) is the spatial distance of the signal value **v**0:b(b), at the current instant b, from turning b into a violation causation instant, i.e., an instant relevant to the violation of ϕ (the satisfaction case is dual). It is computed inductively on the structure of ϕ:

– for an atomic proposition α, the distance is the current evaluation f(**v**0:b(b)) when b = τ, and the vacuous bound R^α_max otherwise, since in that case b cannot be a violation cause of α at τ;
– for Boolean connectives, min and max combine the causation distances of the sub-formulas with the reachable robustness bounds of their siblings, so that b is charged only for the sub-formulas it can actually influence;
– for temporal operators, the additional max with the upper bound [R]^U rules out instants belonging to an obligation that is not (yet) violated.

**Fig. 5.** Quantitative causation monitor (QCauM) result for Example 1


Similarly, we can also compute the satisfaction causation distance. We use Example 3 to illustrate the quantitative causation monitor for the signals in Example 1.

**Example 3.** Consider the quantitative causation monitor for the signals in Example 1. At b = 30, the violation causation distance is computed as:

$$\begin{aligned} [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:30},\varphi,0) &= \inf\_{t\in[0,100]} [\mathcal{R}]^{\ominus}\left(\mathbf{v}\_{0:30},\varphi',t\right) \\ &= \inf\_{t\in[0,100]} \min\left(\begin{array}{l} \max\left([\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:30},\varphi\_1,t),\,[\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:30},\varphi\_2,t)\right), \\ \max\left([\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:30},\varphi\_1,t),\,[\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:30},\varphi\_2,t)\right) \end{array}\right) \\ &= \inf\_{t\in[0,100]} \min\left(\begin{array}{l} \max\left(-[\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:30},\alpha\_1,t),\,\sup\_{t'\in t+[0,5]}[\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:30},\alpha\_2,t')\right), \\ \max\left(-[\mathrm{R}]^{\mathrm{L}}(\mathbf{v}\_{0:30},\alpha\_1,t),\,\max\left([\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:30},\varphi\_2,t),\,\inf\_{t'\in t+[0,5]}[\mathcal{R}]^{\ominus}\left(\mathbf{v}\_{0:30},\alpha\_2,t'\right)\right)\right) \end{array}\right) \\ &= \max\left(-[\mathrm{R}]^{\mathrm{L}}(\mathbf{v}\_{0:30},\alpha\_1,25),\,[\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:30},\varphi\_2,25),\,\inf\_{t'\in[25,30]}[\mathcal{R}]^{\ominus}\left(\mathbf{v}\_{0:30},\alpha\_2,t'\right)\right) \\ &= \max(-3,-3,-5) = -3 \end{aligned}$$

Similarly, at b = 35, the violation causation distance [*R*]⊖(**v**0:35, ϕ, 0) = 5. See the result of QCauM shown in Fig. 5. Compared to ClaM in Fig. 2, it is evident that QCauM provides much more information about the system evolution; e.g., it reports that, in the interval [15, 20], the system satisfies the specification "more", as the speed decreases. ✁

By using the violation and satisfaction causation distances reported by QCauM jointly, we can infer the verdict of BCauM, as indicated by Theorem 2.

**Theorem 2.** The quantitative causation monitor QCauM in Definition 7 refines the Boolean causation monitor BCauM in Definition 6, in the sense that:

– if [*R*]⊖(**v**0:b, ϕ, τ) < 0, then *M*(**v**0:b, ϕ, τ) = ⊖;
– if [*R*]⊕(**v**0:b, ϕ, τ) > 0, then *M*(**v**0:b, ϕ, τ) = ⊕;
– if [*R*]⊖(**v**0:b, ϕ, τ) > 0 and [*R*]⊕(**v**0:b, ϕ, τ) < 0, then *M*(**v**0:b, ϕ, τ) = ⊙.

*Proof.* The proof is generally based on mathematical induction. First, by Definition 7 and Definition 5, it is straightforward that Theorem 2 holds for the atomic propositions.

Then, assuming that Theorem 2 holds for an arbitrary formula ϕ, we prove that Theorem 2 also holds for a composite formula ϕ′ constructed by applying STL operators to ϕ. The complete proof for all three cases is shown in the full version [38].

As an instance, we show the proof for the first case with ϕ′ = ϕ1 ∨ ϕ2, i.e., we prove that [*R*]⊖(**v**0:b, ϕ1 ∨ ϕ2, τ) < 0 implies *M*(**v**0:b, ϕ1 ∨ ϕ2, τ) = ⊖.

$$\begin{aligned} & [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b}, \varphi\_{1} \vee \varphi\_{2}, \tau) < 0 \\ & \Rightarrow \max\left([\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b}, \varphi\_{1}, \tau), [\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b}, \varphi\_{2}, \tau)\right) < 0 && \text{(by Def. 7 and w.l.o.g.)} \\ & \Rightarrow [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b}, \varphi\_{1}, \tau) < 0 && \text{(by def. of max)} \\ & \Rightarrow \mathcal{M}(\mathbf{v}\_{0:b}, \varphi\_{1}, \tau) = \ominus && \text{(by induction hypothesis)} \\ & \Rightarrow \mathrm{E}^{\ominus}(\mathbf{v}\_{0:b}, \varphi\_{1} \vee \varphi\_{2}, \tau) \supseteq \mathrm{E}^{\ominus}(\mathbf{v}\_{0:b}, \varphi\_{1}, \tau) && \text{(by Def. 5 and Thm. 1)} \\ & \Rightarrow \exists \alpha.\ \langle \alpha, b \rangle \in \mathrm{E}^{\ominus}(\mathbf{v}\_{0:b}, \varphi\_{1} \vee \varphi\_{2}, \tau) && \text{(by def. of } \supseteq\text{)} \\ & \Rightarrow \mathcal{M}(\mathbf{v}\_{0:b}, \varphi\_{1} \vee \varphi\_{2}, \tau) = \ominus && \text{(by Def. 6)} \end{aligned}$$

The relation between the quantitative causation monitor QCauM and the Boolean causation monitor BCauM, disclosed by Theorem 2, can be visualized by comparing Fig. 5 and Fig. 4. Indeed, when the violation causation distance reported by QCauM is negative in Fig. 5, BCauM reports ⊖ in Fig. 4.
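Theorem 2 suggests a direct way to recover the BCauM verdict from the two distances reported by QCauM. A sketch with illustrative numbers (the satisfaction distances below are hypothetical, not taken from Example 3):

```python
def bcaum_from_qcaum(viol_dist, sat_dist):
    """Theorem 2: derive the Boolean causation verdict from the
    violation and satisfaction causation distances of QCauM."""
    if viol_dist < 0:
        return "⊖"                 # b is a violation causation instant
    if sat_dist > 0:
        return "⊕"                 # b is a satisfaction causation instant
    if viol_dist > 0 and sat_dist < 0:
        return "⊙"                 # b is irrelevant
    return "undetermined"          # boundary case: a distance is exactly 0

print(bcaum_from_qcaum(-3.0, -4.0))  # → ⊖ (b = 30 in Example 3)
print(bcaum_from_qcaum(5.0, -2.0))   # → ⊙ (b = 35 in Example 3)
```

Theorem 2 is silent about distances that are exactly zero, which is why the sketch keeps an explicit boundary case.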

Next, we present Theorem 3, which states the relation between the quantitative causation monitor QCauM and the classic quantitative monitor ClaM.

**Theorem 3.** The quantitative causation monitor QCauM in Definition 7 refines the classic quantitative online monitor ClaM in Definition 4, in the sense that, the monitoring results of ClaM can be reconstructed from the results of QCauM, as follows:

$$[\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi,\tau) = \inf\_{t \in [0,b]} [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:t},\varphi,\tau) \tag{1}$$

$$[\mathrm{R}]^{\mathrm{L}}(\mathbf{v}\_{0:b},\varphi,\tau) = \sup\_{t \in [0,b]} [\mathcal{R}]^{\oplus}(\mathbf{v}\_{0:t},\varphi,\tau) \tag{2}$$

*Proof.* The proof is generally based on mathematical induction. First, by Definition 7 and Definition 4, it is straightforward that Theorem 3 holds for the atomic propositions.

Then, we make the global assumption that Theorem 3 holds for an arbitrary formula ϕ, i.e., both inf_{t∈[0,b]} [*R*]⊖(**v**0:t, ϕ, τ) = [R]^U(**v**0:b, ϕ, τ) and sup_{t∈[0,b]} [*R*]⊕(**v**0:t, ϕ, τ) = [R]^L(**v**0:b, ϕ, τ) hold. Based on this assumption, we prove that Theorem 3 also holds for a composite formula ϕ′ constructed by applying STL operators to ϕ.

As an instance, we prove inf_{t∈[0,b]} [*R*]⊖(**v**0:t, ϕ′, τ) = [R]^U(**v**0:b, ϕ′, τ) with ϕ′ = ϕ1 ∨ ϕ2 as follows. The complete proof is presented in the full version [38].

First, if b = τ , it holds that:

$$\begin{aligned} \inf\_{t\in[0,b]}[\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:t},\varphi\_{1}\vee\varphi\_{2},\tau) &= [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:\tau},\varphi\_{1}\vee\varphi\_{2},\tau) \\ &= \max\left([\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:\tau},\varphi\_{1},\tau),[\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:\tau},\varphi\_{2},\tau)\right) && \text{(by Def. 7 and global assumption)} \\ &= [\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_{1}\vee\varphi\_{2},\tau) && \text{(by Def. 4)} \end{aligned}$$

Then, we make a local assumption that, for an arbitrary b, it holds that inf_{t∈[0,b]} [*R*]⊖(**v**0:t, ϕ1 ∨ ϕ2, τ) = [R]^U(**v**0:b, ϕ1 ∨ ϕ2, τ). We prove that, for the next sampling point b′ after b, it holds that,

$$\begin{aligned} & \inf\_{t\in[0,b']} [\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:t},\varphi\_1\vee\varphi\_2,\tau) \\ &= \min\left([\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_1\vee\varphi\_2,\tau),\,[\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b'},\varphi\_1\vee\varphi\_2,\tau)\right) && \text{(by local assump.)} \\ &= \min\left(\begin{array}{l} \max\left([\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_1,\tau),[\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_2,\tau)\right), \\ \max\left([\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b'},\varphi\_1,\tau),[\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b'},\varphi\_2,\tau)\right), \\ \max\left([\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b'},\varphi\_1,\tau),[\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b'},\varphi\_2,\tau)\right) \end{array}\right) && \text{(by Defs. 4 and 7)} \\ &= \min\left(\begin{array}{l} \max\left([\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_1,\tau),[\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_2,\tau)\right), \\ \max\left([\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b'},\varphi\_1,\tau),[\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_2,\tau)\right), \\ \max\left([\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_1,\tau),[\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b'},\varphi\_2,\tau)\right), \\ \max\left([\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b'},\varphi\_1,\tau),[\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b'},\varphi\_2,\tau)\right) \end{array}\right) && \text{(by global assump.)} \\ &= \max\left(\begin{array}{l} \min\left([\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_1,\tau),[\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b'},\varphi\_1,\tau)\right), \\ \min\left([\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b},\varphi\_2,\tau),[\mathcal{R}]^{\ominus}(\mathbf{v}\_{0:b'},\varphi\_2,\tau)\right) \end{array}\right) && \text{(by def. of min, max)} \\ &= \max\left([\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b'},\varphi\_1,\tau),[\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b'},\varphi\_2,\tau)\right) && \text{(by global assump.)} \\ &= [\mathrm{R}]^{\mathrm{U}}(\mathbf{v}\_{0:b'},\varphi\_1\vee\varphi\_2,\tau) && \text{(by Def. 4)} \end{aligned}$$

**Fig. 6.** Refinement among STL monitors

Theorem 3 shows that the result [R]^U(**v**0:b, ϕ, τ) of ClaM can be derived from the results of QCauM by applying inf_{t∈[0,b]} [*R*]⊖(**v**0:t, ϕ, τ). For instance, comparing the results of QCauM in Fig. 5 with the results of ClaM in Fig. 2, we can see that the results in Fig. 2 can be reconstructed from those in Fig. 5.
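Equation (1) of Theorem 3 can be checked on a toy sequence of violation causation distances: taking the running infimum reproduces the monotone upper bound of ClaM, and also shows why ClaM masks later recovery. The numbers are illustrative:

```python
# Hypothetical [R]^⊖(v_{0:t}, φ, τ) values as the prefix grows (t = 0..4).
viol_dists = [4.0, 2.5, -3.0, 1.0, 5.0]

# Theorem 3, Eq. (1): [R]^U(v_{0:b}, φ, τ) = inf over t in [0, b] of [R]^⊖.
running_inf, claM_upper = float("inf"), []
for d in viol_dists:
    running_inf = min(running_inf, d)
    claM_upper.append(running_inf)

print(claM_upper)   # → [4.0, 2.5, -3.0, -3.0, -3.0]
```

The reconstructed upper bound is monotone and stays at −3.0 forever, while the causation distances themselves still report the recovery at t = 3 and t = 4: this is the evolution-masking gap between ClaM and QCauM in miniature.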

**Remark 1.** Figure 6 shows the refinement relations between the six STL monitoring approaches. The left column lists the offline monitoring approaches derived directly from the Boolean and quantitative semantics of STL, respectively. The middle column shows the classic online monitoring approaches. Our two causation monitors, namely BCauM and QCauM, are given in the column on the right. Given a pair (A, B) of approaches, A ← B indicates that approach B refines approach A, in the sense that B can deliver more information than A, and the information delivered by A can be derived from the information delivered by B. The refinement relation in the figure is transitive. Note that the blue arrows are contributed by this paper. As Fig. 6 shows, the relation between BCauM and QCauM is analogous to that between the Boolean and quantitative semantics of STL.

### **5 Experimental Evaluation**

We implemented a tool<sup>3</sup> for our two causation monitors. It is built on top of Breach [15], a widely used tool for monitoring and testing hybrid systems [18]. Consistently with Breach, the monitors target the output signals of Simulink models, as an additional block. Experiments were executed on a macOS machine, 1.4 GHz Quad-Core Intel Core i5, 8 GB RAM, using Breach v1.10.0.

#### **5.1 Experiment Setting**

**Benchmarks.** We perform the experiments on the following two benchmarks. *Abstract Fuel Control (AFC)* is a powertrain control system from Toyota [27], which has been widely used as a benchmark in the hybrid system community [18– 20]. The system outputs the *air-to-fuel* ratio AF, and requires that the deviation of AF from its reference value AFref should not be too large. Specifically, we consider the following properties from different perspectives:

– ϕ^AFC_1 := ✷[10,50](|AF − AFref| < 0.1): the deviation should always be small;

<sup>3</sup> Available at https://github.com/choshina/STL-causation-monitor, and Zenodo [39].


*Automatic transmission (AT)* is a widely-used benchmark [18–20], implementing the transmission controller of an automotive system. It outputs the gear, speed and RPM of the vehicle, which are required to satisfy this safety requirement:

– ϕ^AT_1 := ✷[0,27](speed > 50 → ✸[1,3](RPM < 3000)): whenever the speed is higher than 50, the RPM should be below 3000 within three time units.

**Baseline and Experimental Design.** To assess our two proposed monitors (the Boolean causation monitor BCauM in Definition 6 and the quantitative causation monitor QCauM in Definition 7), we compare them with two baseline monitors: the classic quantitative robustness monitor ClaM (see Definition 4), and the state-of-the-art approach *monitor with reset* ResM [40], which, once the signal violates the specification, resets at that point and forgets the previous partial signal.

Given a model and a specification, we generate input signals by randomly sampling in the input space and feed them to the model. The online output signals are given as inputs to the monitors and the monitoring results are collected. We generate 10 input signals for each model and specification. To account for fluctuations of monitoring times across repetitions<sup>4</sup>, the experiment was executed 10 times for each signal, and we report average results.

#### **5.2 Evaluation**

**Qualitative Evaluation.** We first show the type of information provided by the different monitors. As an example, Fig. 7 reports, for two specifications of the two models, the system output signal (at the top of the two sub-figures) and the monitoring results of the compared monitors. We notice that the signals of both models (top plots) violate the corresponding specifications at multiple points. Let us consider the monitoring results of ϕ<sub>1</sub><sup>AFC</sup>; similar observations apply to ϕ<sub>1</sub><sup>AT</sup>.

When using ClaM, only the first violation right after time 15 is detected (the upper bound of the robustness becomes negative); after that, the upper bound remains constant, without reporting that the system recovers from the violation at around time 17, nor that the specification is violated again four more times.

In contrast, the monitor with reset ResM is able to detect all the violations (as the upper bound becomes greater than 0 when a violation episode ends), but it does not properly report the margin of robustness. Indeed, during the violation episodes, it reports a constant upper bound of around −0.4, although the system violates the specification with different degrees of severity in these intervals. Similarly, when the specification is satisfied after around time 17, the upper bound is just above 0, but actually the system

<sup>4</sup> Note that only the monitoring time changes across different repetitions; monitoring results are instead always the same, as monitoring is deterministic for a given signal.

**Fig. 7.** Examples of the information provided by the different monitors

**Table 1.** Experimental results – Average (avg.) and standard deviation (stdv.) of monitoring and simulation times (ms)


satisfies the specification with different margins. As a consequence, ResM produces sharp changes of the robustness upper bound that do not faithfully reflect the system evolution.

We notice that the Boolean causation monitor BCauM reports information only about the violation episodes, not about the degree of violation/satisfaction. Instead, the quantitative causation monitor QCauM provides very detailed information: it not only reports all the violation episodes, but also properly characterizes the degree to which the specification is violated or satisfied. Indeed, in QCauM, the violation causation distance smoothly increases from violation to satisfaction, thus faithfully reflecting the system evolution.

**Quantitative Assessment of Monitoring Time.** We now discuss the computational cost of monitoring.


**Table 2.** Experimental results of the four monitoring approaches – Monitoring time (ms) – Δ<sub>A</sub> = (QCauM − A)/A

In Table 1, we observe that, for all the monitors, the *monitoring* time is much lower than the *total* time (system execution + monitoring). This shows that, for this type of system, the monitoring overhead is negligible. Still, we compare the execution costs of the different monitors. Table 2 reports the monitoring times of all the monitors for each specification and each signal. Moreover, it reports the percentage difference between the quantitative causation monitor QCauM (the most informative one) and the other monitors.

We first observe that ResM and BCauM have, for the same specification, a high variance of monitoring times across different signals. ClaM and QCauM, instead, provide very consistent monitoring times. This is confirmed by the standard deviation results in Table 1. The consistent monitoring cost of QCauM is a desirable property, as the designers of the monitor can precisely forecast how long monitoring will take, and design the overall system accordingly.

We observe that QCauM is negligibly slower than ClaM for ϕ<sub>1</sub><sup>AFC</sup> and ϕ<sub>2</sub><sup>AFC</sup>, and at most twice as slow for the other two specifications. This additional monitoring cost is acceptable, given the additional information provided by QCauM. Compared to ResM, QCauM is usually slower (at most around twice as slow); also in this case, as QCauM provides more information than ResM, the cost is acceptable.

Compared to the Boolean causation monitor BCauM, QCauM is usually faster, as it does not have to collect epochs, which is a costly operation. However, it is slower for ϕ<sub>3</sub><sup>AFC</sup>, because most of the signals do not violate this specification (and so BCauM does not collect epochs either in this case).

To conclude, QCauM is a monitor able to provide much more information than existing monitors, with an acceptable overhead in terms of monitoring time.

### **6 Related Work**

**Monitoring of STL.** Monitoring can be performed either offline or online. Offline monitoring [16,30,33] targets complete traces and returns either true or false. In contrast, online monitoring deals with partial traces; a three-valued semantics was thus introduced for LTL monitoring [7,8], and later extended to qualitative online monitoring of MTL and STL [24,31], to handle situations in which no conclusive verdict can be reached. Quantitative online monitoring usually provides a quantitative value or a robust satisfaction interval [12–14,25,26]. Based on these approaches, several tools have been developed, e.g., AMT [32,33], Breach [15], and S-Taliro [1]. We refer to the survey [3] for a comprehensive introduction. Recently, in [35], Qin and Deshmukh propose clairvoyant monitoring to forecast future signal values and give probabilistic bounds on the specification validity. In [2], an online monitoring approach is proposed for perception systems with Spatio-temporal Perception Logic [23].

**Monotonicity Issue.** However, most of these works do not handle the monotonicity issue stated in this paper. In [10], Cimatti et al. propose an assumption-based monitoring framework for LTL. It takes user expertise into account and makes the monitor *resettable*, in the sense that it can restart from any discrete time point. In [37], a recovery feature is introduced into the online monitor of [25]. However, that technique is application-specific rather than a general framework. In [40], a reset mechanism is proposed for STL online monitoring. However, as experimentally evaluated in Sect. 5, it essentially provides a solution for the Boolean semantics, and monotonicity still holds between two resetting points.

**Signal Diagnostics.** Signal diagnostics [5,22,32] was originally used in an offline manner, for the purpose of fault localization and system debugging. In [22], the authors propose an approach to automatically identify the single evaluations (namely, epochs) that account for the satisfaction/violation of an STL specification over a complete trace. This information can further serve as a reference for detecting the root cause of bugs in CPS [5,6,32]. The online version of signal diagnostics, which is the basis of our Boolean causation monitor, was introduced in [40]. However, we show in Sect. 5 that the monitor based on this technique is often costly, and unable to deliver the quantitative runtime information provided by the quantitative causation monitor.

### **7 Conclusion and Future Work**

In this paper, we propose a new way of performing STL monitoring, based on causation, that provides more information than classic monitoring based on STL robustness. Concretely, we propose two causation monitors, namely BCauM and QCauM. In particular, BCauM intuitively explains the concept of "causation" monitoring, and thus paves the way to QCauM, which is more practically valuable. We further prove the relation between the proposed causation monitors and the classic ones.

As future work, we plan to improve the efficiency of the monitoring, by avoiding unnecessary computations at some instants. Moreover, we plan to apply it to the monitoring of real-world systems.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Process Equivalence Problems as Energy Games

Benjamin Bisping(B)

Technische Universität Berlin, Berlin, Germany benjamin.bisping@tu-berlin.de https://bbisping.de

Abstract. We characterize all common notions of behavioral equivalence by *one* 6-dimensional energy game, where energies bound capabilities of an attacker trying to tell processes apart. The defender-winning initial credits exhaustively determine which preorders and equivalences from the (strong) linear-time–branching-time spectrum relate processes.

The time complexity is exponential, which is optimal due to trace equivalence being covered. This complexity improves drastically on our previous approach for deciding groups of equivalences where exponential sets of distinguishing HML formulas are constructed on top of a superexponential reachability game. In experiments using the VLTS benchmarks, the algorithm performs on par with the best similarity algorithm.

Keywords: Bisimulation · Energy games · Process equivalence spectrum

### 1 Introduction

Many verification tasks can be understood along the lines of "how equivalent" two models are. Figure 1 replicates a standard example, known for instance from the

Fig. 1. A specification of mutual exclusion Mx, and Peterson's protocol Pe.

c The Author(s) 2023 C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 85–106, 2023. https://doi.org/10.1007/978-3-031-37706-8\_5

textbook *Reactive Systems* [3]: A specification of mutual exclusion Mx as two alternating users <sup>A</sup> and <sup>B</sup> entering their critical section *ecA*/*ec<sup>B</sup>* and leaving *lcA*/*lc<sup>B</sup>* before the other may enter; and the transition system of Peterson's [28] mutual exclusion algorithm Pe, minimized by weak bisimilarity, with internal steps −→ due to the coordination that needs to happen. For Pe to faithfully implement mutual exclusion, it should behave somewhat similarly to Mx.

Semantics in concurrent models must take nondeterminism into account. Setting the degree to which nondeterminism counts induces equivalence notions with subtle differences: Pe and Mx *weakly simulate* each other, meaning that a tree of options from one process can be matched by a similar tree of the other. This implies that they have the same *weak traces*, that is, matching paths. However, they are not weakly *bi-*similar, which would require a higher degree of symmetry than mutual simulation, namely, matching absence of options. There are many more such notions. Van Glabbeek's *linear-time–branching-time spectrum* [21] (cf. Fig. 3) brings order to the hierarchy of equivalences. But it is notoriously difficult to navigate. In our example, one might wonder: Are there notions relating the two *besides* mutual simulation?

Our recent algorithm for *linear-time–branching-time spectroscopy* by Bisping, Nestmann, and Jansen [7,9] is capable of answering equivalence questions for finite-state systems by *deciding the spectrum of behavioral equivalences in one go*. In theory, that is. In practice, the algorithm of [7] runs out of memory when applied to the weak transition relation of even small examples like Pe. The reason for this is that saturating transition systems with the closure of weak steps adds a lot of nondeterminism. For instance, Pe may reach 10 different states by internal steps (−→<sup>∗</sup>). The spectroscopy algorithm of [7] builds a bisimulation game where the defender wins if the game starts at a pair of equivalent processes. To allow all attacks relevant for the spectrum, the [7]-game must consider partitionings of state sets reached through nondeterminism. There are 115,975 ways of partitioning 10 objects. As a consequence, the game graph of [7] comparing Pe and Mx has 266,973 game positions. On top of each position, [7] builds sets of distinguishing formulas of Hennessy–Milner modal logic (HML) [21,24] with minimal expressiveness. These sets may grow exponentially. Game over!

Contributions. In this paper, we adapt the spectroscopy approach of [7,9] to render small verification instances like Pe/Mx feasible. The key ingredients that will make the difference are: understanding the spectrum purely through *depth-properties of HML formulas*; using *multidimensional energy games* [15] instead of reachability games; and exploiting the considered spectrum to drastically *reduce the branching-degree* of the game as well as the height of the energy lattice. Figure 2 lays out the algorithm with pointers to key parts of this paper.


Fig. 2. Overview of the computations <sup>→</sup> and correspondences <sup>∼</sup> we will discuss.


### 2 Distinctions and Equivalences in Transition Systems

Two classic concepts of system analysis form the background of this paper: *Hennessy–Milner logic* (HML) interpreted over *transition systems* goes back to Hennessy and Milner [24] investigating observational equivalence in operational semantics. Van Glabbeek's *linear-time–branching-time spectrum* [21] arranges all common notions of equivalence as a hierarchy of HML sublanguages.

#### 2.1 Transition Systems and Hennessy–Milner Logic

Definition 1 (Labeled transition system). *A* labeled transition system *is a tuple* S = (P, Σ, −→) *where* P *is the set of* processes*,* Σ *is the set of* actions*, and* −→ ⊆ P × Σ × P *is the* transition relation*.*

*By* I(p) *we denote the* actions enabled initially *for a process* p ∈ P*, that is,* I(p) := {a ∈ Σ | ∃p′. p <sup>a</sup>−→ p′}*. We lift the steps to sets with* P <sup>a</sup>−→ P′ *iff* P′ = {p′ | ∃p ∈ P. p <sup>a</sup>−→ p′}*.*
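Definition 1 can be sketched directly in code. The encoding below (processes and actions as strings, the transition relation as a set of triples) is an illustrative assumption:

```python
# Minimal sketch of a labeled transition system (Definition 1),
# with I(p) and the lifting of steps to sets of processes.

class LTS:
    def __init__(self, transitions):
        # transitions: set of (p, a, p2) triples, i.e. p --a--> p2
        self.transitions = set(transitions)

    def initial_actions(self, p):
        """I(p): the actions enabled initially for process p."""
        return {a for (q, a, q2) in self.transitions if q == p}

    def step_set(self, P, a):
        """Lifted step P --a--> P2 with P2 = {p2 | exists p in P. p --a--> p2}."""
        return {q2 for (q, b, q2) in self.transitions if q in P and b == a}

# Toy system: p branches nondeterministically on a; only p1 can do b.
lts = LTS({("p", "a", "p1"), ("p", "a", "p2"), ("p1", "b", "p3")})
```

The lifted step is what later lets an attacker track the *set* of processes a nondeterministic opponent might be in.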

Hennessy–Milner logic expresses *observations* that one may make on such a system. The set of formulas true of a process offers a denotation for its semantics.

Definition 2 (Hennessy–Milner logic). *The* syntax of Hennessy–Milner logic *over a set* Σ *of actions,* HML[Σ]*, is defined by the grammar:*

$$\varphi ::= \langle a \rangle \varphi \;\mid\; \bigwedge \{\psi, \psi, \ldots\} \qquad\qquad \psi ::= \neg\varphi \;\mid\; \varphi$$

Fig. 3. Hierarchy of equivalences/preorders becoming finer towards the top.

*Its semantics* - · <sup>S</sup> *over a transition system* <sup>S</sup> = (P,Σ, −→) *is given as the set of processes where a formula "is true" by:*

$$\begin{aligned} \llbracket \langle a \rangle \varphi \rrbracket^{\mathcal{S}} &:= \{ p \in \mathcal{P} \mid \exists p' \in \llbracket \varphi \rrbracket^{\mathcal{S}}.\; p \xrightarrow{a} p' \} \\ \llbracket \textstyle\bigwedge_{i \in I} \psi_i \rrbracket^{\mathcal{S}} &:= \bigcap \{ \llbracket \psi_i \rrbracket^{\mathcal{S}} \mid i \in I \land \nexists \varphi.\; \psi_i = \neg\varphi \} \;\setminus\; \bigcup \{ \llbracket \varphi \rrbracket^{\mathcal{S}} \mid \exists i \in I.\; \psi_i = \neg\varphi \} \end{aligned}$$

HML basically extends propositional logic with a modal observation operation. Conjunctions then bound trees of future behavior. Positive conjuncts impose lower bounds, negative ones upper bounds. For the scope of this paper, finite bounds suffice, i.e., conjunctions have finite width. The empty conjunction ⊤ := ⋀∅ is usually omitted in writing.
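The semantics of Definition 2 can be sketched as a recursive set computation. The tuple encoding of formulas — `("obs", a, φ)` for ⟨a⟩φ, `("and", [...])` for conjunctions, `("neg", φ)` for negated conjuncts — is an illustrative assumption:

```python
# Hedged sketch of the HML semantics in Definition 2 over a finite
# transition system given as a set of (p, a, p2) triples.

def sem(formula, processes, transitions):
    """Return the set of processes at which the formula is true."""
    kind = formula[0]
    if kind == "obs":
        _, a, phi = formula
        target = sem(phi, processes, transitions)
        # {p | exists p2 in [[phi]] with p --a--> p2}
        return {p for (p, b, q) in transitions if b == a and q in target}
    if kind == "and":
        result = set(processes)  # empty conjunction T holds everywhere
        for psi in formula[1]:
            if psi[0] == "neg":
                result -= sem(psi[1], processes, transitions)  # upper bound
            else:
                result &= sem(psi, processes, transitions)     # lower bound
        return result
    raise ValueError(kind)

# Example 1's failure formula <tau> /\ {not <ecA> T}, on a toy system
# mirroring Fig. 4 (primes written as 'p' suffixes):
TOP = ("and", [])
phi_s = ("obs", "tau", ("and", [("neg", ("obs", "ecA", TOP))]))
procs = {"S", "Div", "S1", "S2", "Sp", "S1p", "S2p"}
trans = {("S", "tau", "Div"), ("S", "tau", "S1"), ("S1", "ecA", "S2"),
         ("Sp", "tau", "S1p"), ("S1p", "ecA", "S2p")}
sat = sem(phi_s, procs, trans)  # S satisfies phi_s via Div; Sp does not
```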

Fig. 4. Example system of an internal decision <sup>τ</sup>−→ against an action step <sup>ec<sub>A</sub></sup>−−→.

We use Hennessy–Milner logic to capture *differences* between program behaviors. Depending on how much of its expressiveness we use, different notions of equivalence are characterized.

Definition 3 (Distinguishing formulas and preordering languages). *A formula* ϕ ∈ HML[Σ] *is said to* distinguish *two processes* p, q ∈ P *iff* p ∈ ⟦ϕ⟧<sup>S</sup> *and* q ∉ ⟦ϕ⟧<sup>S</sup>*. A sublanguage of Hennessy–Milner logic,* O<sub>X</sub> ⊆ HML[Σ]*, either* distinguishes *two processes,* p ⋠<sub>X</sub> q*, if it contains a distinguishing formula, or* preorders *them,* p ⪯<sub>X</sub> q*, otherwise. If processes are preordered in both directions,* p ⪯<sub>X</sub> q *and* q ⪯<sub>X</sub> p*, then they are considered* X-equivalent*,* p ∼<sub>X</sub> q*.*

Fig. 3 charts the *linear-time–branching-time spectrum*. If processes are preordered/equated by one notion of equivalence, they are also preordered/equated by every notion below it. We will later formally characterize the notions through Proposition 1. For a thorough presentation, we point to [23]. For those familiar with the spectrum, the following example serves to refresh memories.

*Example 1.* Fig. 4 shows a tiny slice of the weak-step-saturated version of our initial example from Fig. 1 that is at the heart of why Pe and Mx are not bisimulation-equivalent. The difference between S and S′ is that S can internally transition to Div (labeled <sup>τ</sup>−→) without ever performing an *ec<sub>A</sub>* action. We can express this difference by the formula ϕ<sub>S</sub> := ⟨τ⟩⋀{¬⟨ec<sub>A</sub>⟩}, meaning "after τ, *ec<sub>A</sub>* may be impossible." It is true for S, but not for S′. Knowing a distinguishing formula means that S and S′ cannot be bisimilar by the Hennessy–Milner theorem. The formula ϕ<sub>S</sub> is called a *failure* (or *refusal*) as it specifies a set of actions that are disabled after a trace. In the other direction of comparison, the negation ϕ<sub>S′</sub> := ⋀{¬⟨τ⟩⋀{¬⟨ec<sub>A</sub>⟩}} distinguishes S′ from S. The differences between the two processes cannot be expressed in HML without negation. Therefore the processes are simulation-equivalent, or *similar*, as similarity is characterized by the positive fragment of HML.

#### 2.2 Price Spectra of Behavioral Equivalences

For algorithms exploring the linear-time–branching-time spectrum, it is convenient to have a representation of the spectrum in terms of numbers or "prices" of formulas as in [7]. We, here, use six dimensions to characterize the notions of equivalence depicted in Fig. 3. The numbers define the HML observation languages that characterize the very preorders/equivalences. Intuitively, the colorful

Fig. 5. Pricing of the formula ⟨τ⟩⋀{⟨ec<sub>A</sub>⟩⟨lc<sub>A</sub>⟩⊤, ⟨τ⟩⊤, ¬⟨ec<sub>B</sub>⟩⊤}.

numbers mean: (1) Formula modal depth of observations. (2) Formula nesting depth of conjunctions. (3) Maximal modal depth of deepest positive clauses in conjunctions. (4) Maximal modal depth of the other positive clauses in conjunctions. (5) Maximal modal depth of negative clauses in conjunctions. (6) Formula nesting depth of negations. More formally:

Definition 4 (Energies). *We denote as* energies*,* **En***, the set of* N*-dimensional vectors* ℕ<sup>N</sup>*, and as* extended energies*,* **En**<sub>∞</sub>*, the set* (ℕ ∪ {∞})<sup>N</sup>*.*

*We compare energies component-wise, i.e. ,* (e1,...,e<sup>N</sup> ) <sup>≤</sup> (f1,...,f<sup>N</sup> ) *iff* <sup>e</sup><sup>i</sup> <sup>≤</sup> <sup>f</sup><sup>i</sup> *for each* <sup>i</sup>*. Least upper bounds* sup *are defined as usual as componentwise supremum, as are greatest lower bounds* inf*.*

Definition 5 (Formula prices). *The* expressiveness price expr: HML[Σ] <sup>→</sup> (N)<sup>6</sup> *of a formula interpreted as* 6*-dimensional energies is defined recursively by:*

$$\mathsf{expr}(\langle a\rangle\varphi) := \begin{pmatrix} 1+\mathsf{expr}_1(\varphi) \\ \mathsf{expr}_2(\varphi) \\ \mathsf{expr}_3(\varphi) \\ \mathsf{expr}_4(\varphi) \\ \mathsf{expr}_5(\varphi) \\ \mathsf{expr}_6(\varphi) \end{pmatrix} \qquad \mathsf{expr}(\neg\varphi) := \begin{pmatrix} \mathsf{expr}_1(\varphi) \\ \mathsf{expr}_2(\varphi) \\ \mathsf{expr}_3(\varphi) \\ \mathsf{expr}_4(\varphi) \\ \mathsf{expr}_5(\varphi) \\ 1+\mathsf{expr}_6(\varphi) \end{pmatrix}$$

$$\mathsf{expr}\Big(\bigwedge_{i\in I} \psi_i\Big) := \sup\left(\left\{ \begin{pmatrix} 0 \\ 1+\sup_{i\in I}\mathsf{expr}_2(\psi_i) \\ \sup_{i\in R}\mathsf{expr}_1(\psi_i) \\ \sup_{i\in \mathit{Pos}\setminus R}\mathsf{expr}_1(\psi_i) \\ \sup_{i\in \mathit{Neg}}\mathsf{expr}_1(\psi_i) \\ 0 \end{pmatrix} \right\} \cup \{\mathsf{expr}(\psi_i) \mid i \in I\}\right)$$

*where*

$$\begin{aligned} \mathit{Neg} &:= \{ i \in I \mid \exists \varphi_i'.\; \psi_i = \neg\varphi_i' \} \qquad \mathit{Pos} := I \setminus \mathit{Neg} \\ R &:= \begin{cases} \emptyset & \text{if } \mathit{Pos} = \emptyset \\ \{r\} & \text{for some } r \in \mathit{Pos} \text{ where } \mathsf{expr}_1(\psi_r) \text{ is maximal for } \mathit{Pos}. \end{cases} \end{aligned}$$
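Definition 5 can be sketched as a recursive function. The tuple encoding of formulas (`("obs", a, φ)`, `("and", [...])`, `("neg", φ)`) is the same illustrative assumption as before; the sketch reproduces the prices that Example 2 reports for the formulas of Example 1:

```python
# Hedged sketch of the 6-dimensional expressiveness price expr (Definition 5).

def vmax(vectors):
    """Componentwise supremum; the sup of no vectors is the zero vector."""
    return tuple(max(c) for c in zip(*vectors)) if vectors else (0,) * 6

def expr(phi):
    kind = phi[0]
    if kind == "obs":                      # <a> phi: bump modal depth
        e = expr(phi[2])
        return (1 + e[0],) + e[1:]
    if kind == "neg":                      # not phi: bump negation depth
        e = expr(phi[1])
        return e[:5] + (1 + e[5],)
    conjuncts = phi[1]                     # conjunction
    prices = [expr(psi) for psi in conjuncts]
    neg = [i for i, psi in enumerate(conjuncts) if psi[0] == "neg"]
    pos = [i for i in range(len(conjuncts)) if i not in neg]
    # R: one deepest positive clause goes to dimension 3, the rest to 4.
    r = max(pos, key=lambda i: prices[i][0]) if pos else None
    head = (0,
            1 + vmax(prices)[1],                              # conj. nesting
            prices[r][0] if pos else 0,                       # deepest pos.
            vmax([prices[i] for i in pos if i != r])[0],      # other pos.
            vmax([prices[i] for i in neg])[0],                # neg. clauses
            0)
    return vmax([head] + prices)

TOP = ("and", [])
phi_s = ("obs", "tau", ("and", [("neg", ("obs", "ecA", TOP))]))
phi_sp = ("and", [("neg", phi_s)])
```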

Fig. 6. Cut through the price lattice with dimensions 2 (conjunction nesting) and 5 (negated observation depth).

Figure 5 gives an example of how the prices compound. The colors of the lines match those used for the dimensions and their updates in the other figures. Circles mark the points that are counted. The formula itself expresses a so-called *ready-trace* observation: we observe a trace τ · *ec<sub>A</sub>* · *lc<sub>A</sub>* and, along the way, may check which other options would have been enabled or disabled. Here, we check that τ is enabled and *ec<sub>B</sub>* is disabled after the first τ-step. With the pricing, we can characterize all standard notions of equivalence:

Proposition 1. *On finite systems, the languages of formulas with prices below the coordinates given in Fig. 3 characterize the named notions of equivalence, that is,* p *<sup>X</sup>* <sup>q</sup> *with respect to equivalence* <sup>X</sup>*, iff no* <sup>ϕ</sup> *with* expr(ϕ) <sup>≤</sup> <sup>e</sup><sup>X</sup> *distinguishes* p *from* q*.*

*Example 2.* The formulas of Example 1 have prices: expr(⟨τ⟩⋀{¬⟨ec<sub>A</sub>⟩}) = (2, 2, 0, 0, 1, 1) for ϕ<sub>S</sub> and expr(⋀{¬⟨τ⟩⋀{¬⟨ec<sub>A</sub>⟩}}) = (2, 3, 0, 0, 2, 2) for ϕ<sub>S′</sub>. The prices of the two are depicted as red marks in Fig. 6. This also visualizes how ϕ<sub>S′</sub> is a counterexample for bisimilarity and how ϕ<sub>S</sub> is a counterexample for failure and finer preorders. Indeed, the two preorders are the coarsest ways of telling the processes apart. So, S and S′ are equated by all preorders *below* the marks, i.e. similarity, S ∼<sub>1S</sub> S′, and coarser preorders (S ∼<sub>T</sub> S′, S ∼<sub>E</sub> S′). This carries over to the initial example of Peterson's mutex protocol from Fig. 1, where weak simulation, Pe ∼<sub>1WS</sub> Mx, is the most precise equivalence. Practically, this means that the specification Mx has liveness properties not upheld by the implementation Pe.

*Remark 1.* Definition 5 deviates from our previous formula pricing of [7] in a crucial way: We only collect the *maximal depths of positive clauses*, whereas [7] tracks *numbers of deep and flat positive clauses* (where a flat clause is characterized by an observation depth of 1). Our change to a purely "depth-guided" spectrum will allow us to characterize the spectrum by an energy game and to eliminate the Bell-numbered blow up from the game's branching-degree. The special treatment of the deepest positive branch is necessary to address revival, failure trace, and ready trace semantics, which are popular in the CSP community [17,31].

### 3 An Energy Game of Distinguishing Capabilities

Conventional equivalence problems ask whether a pair of processes is related by a specific equivalence. These problems can be abstracted into a more general "spectroscopy problem" to determine the set of equivalences from a spectrum that relate two processes as in [7]. This section captures the spectrum of Fig. 3 by one rather simple energy game.

### 3.1 Energy Games

Multidimensional energy games are played on graphs labeled by vectors to be added to (or subtracted from) a vector of "energies", where one player must pay attention to the energies not being exhausted. We plan to encode the distinction capabilities of the semantic spectrum as energy levels in an energy game enriched by min{...}-operations that take minima of components. This way, energy levels where the defender has a winning strategy will correspond to equivalences that hold. We will just need updates decrementing or maintaining energy levels.

Definition 6 (Energy updates). *The set of* energy updates*,* **Up***, contains vectors* (u1,...,u<sup>N</sup> ) <sup>∈</sup> **Up** *where each component is of the form*

– u<sub>k</sub> ∈ {−1, 0}*, or*
– u<sub>k</sub> = min<sub>D</sub> *where* D ⊆ {1, ..., N} *and* k ∈ D*.*

*Applying an update to an energy,* upd(e, u)*, where* e = (e<sub>1</sub>, ..., e<sub>N</sub>) ∈ **En** *(or* **En**<sub>∞</sub>*) and* u = (u<sub>1</sub>, ..., u<sub>N</sub>) ∈ **Up***, yields a new energy vector* e′ *whose* k*th component is* e′<sub>k</sub> := e<sub>k</sub> + u<sub>k</sub> *for* u<sub>k</sub> ∈ ℤ *and* e′<sub>k</sub> := min<sub>d∈D</sub> e<sub>d</sub> *for* u<sub>k</sub> = min<sub>D</sub>*. Updates that would cause any component to become negative are illegal.*
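The update application can be sketched as follows; the encoding of min<sub>D</sub> components as `("min", D)` tuples with 1-based dimension indices is an illustrative assumption:

```python
# Hedged sketch of upd(e, u) from Definition 6: components are either
# integers in {-1, 0} or ("min", D) with D a set of 1-based dimensions.

def upd(e, u):
    """Apply update u to energy e; return None for illegal updates
    (i.e. when any component would become negative)."""
    result = []
    for k, uk in enumerate(u):
        if isinstance(uk, tuple) and uk[0] == "min":
            # k-th component becomes the minimum over the dimensions in D
            result.append(min(e[d - 1] for d in uk[1]))
        else:
            result.append(e[k] + uk)
    if any(c < 0 for c in result):
        return None  # illegal update: such plays are won by the defender
    return tuple(result)
```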

Definition 7 (Games). *An* N*-dimensional* declining energy game G[g<sub>0</sub>, e<sub>0</sub>] = (G, G<sub>d</sub>, ↣, w, g<sub>0</sub>, e<sub>0</sub>) *is played on a directed graph uniquely labeled by energy updates, consisting of*

– *a set of* defender positions G<sub>d</sub> ⊆ G*,*
– *a set of* attacker positions G<sub>a</sub> := G \ G<sub>d</sub>*,*
– *an* initial energy budget *vector* e<sub>0</sub> ∈ **En**<sub>∞</sub>*.*

*The notation* g ↣<sup>u</sup> g′ *stands for* g ↣ g′ *and* w(g, g′) = u*.*

Definition 8 (Plays, energies, and wins). *We call the (finite or infinite) paths* ρ = g<sub>0</sub>g<sub>1</sub>... ∈ G<sup>∞</sup> *with* g<sub>i</sub> ↣<sup>u<sub>i</sub></sup> g<sub>i+1</sub> plays *of* G[g<sub>0</sub>, e<sub>0</sub>]*.*

*The* energy level *of a play* ρ *at round* i*,* ELρ(i)*, is recursively defined as* ELρ(0) := <sup>e</sup><sup>0</sup> *and otherwise as* ELρ(i+1) := upd(ELρ(i), ui)*. If we omit the index,* ELρ*, this refers to the final energy level of a finite run* ρ*, i.e. ,* EL<sup>ρ</sup>(|ρ| − 1)*.*

*Plays where energy levels become undefined (negative) are won by the defender. So are infinite plays. If a finite play is stuck (i.e. ,* <sup>g</sup><sup>0</sup> ...g<sup>n</sup> *), the stuck player loses: The defender wins if* <sup>g</sup><sup>n</sup> <sup>∈</sup> <sup>G</sup>a*, and the attacker wins if* <sup>g</sup><sup>n</sup> <sup>∈</sup> <sup>G</sup>d*.*
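The energy-level recursion EL<sub>ρ</sub>(0) = e<sub>0</sub>, EL<sub>ρ</sub>(i+1) = upd(EL<sub>ρ</sub>(i), u<sub>i</sub>) amounts to folding the play's updates over the initial budget. A minimal sketch, restricted to plain decrement updates for brevity (the encodings are assumptions):

```python
# Hedged sketch of Definition 8's energy levels: fold updates over the
# initial budget; None signals an undefined (negative) level, i.e. a
# play won by the defender. min-updates are omitted in this sketch.

def energy_level(e0, updates):
    level = e0
    for u in updates:
        level = tuple(c + d for c, d in zip(level, u))
        if any(c < 0 for c in level):
            return None  # energy exhausted: the defender wins this play
    return level
```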

Proposition 2. *In this model, energy levels can only decline.*


Definition 9 (Strategies and winning budgets). *An* attacker strategy *is a map from play prefixes ending in attacker positions to next game moves,* s: (G<sup>∗</sup> × G<sub>a</sub>) → G *with* s(g<sub>0</sub>...g<sub>a</sub>) ∈ (g<sub>a</sub> ↣ ·)*. Similarly, a* defender strategy *names moves starting in defender positions. If all plays consistent with a strategy* s *ensure that a player wins,* s *is called a* winning strategy *for this player. The player with a winning strategy for* G[g<sub>0</sub>, e<sub>0</sub>] *is said to* win G *from position* g<sub>0</sub> *with initial energy budget* e<sub>0</sub>*. We call* Win<sub>a</sub>(g) = {e | G[g, e] *is won by the attacker*} *the* attacker winning budgets*.*

Proposition 3. *The attacker winning budgets at positions are upward-closed with respect to energy, that is,* e <sup>∈</sup> Win<sup>a</sup>(g) *and* <sup>e</sup> <sup>≤</sup> <sup>e</sup> *implies* <sup>e</sup> <sup>∈</sup> Win<sup>a</sup>(g)*.*

This means we can characterize the set of winning attacker budgets in terms of minimal winning budgets Win<sup>min</sup><sub>a</sub>(g) = Min(Win<sub>a</sub>(g)), where Min(S) selects the minimal elements {e ∈ S | ∄e′ ∈ S. e′ ≤ e ∧ e′ ≠ e}. Clearly, Win<sup>min</sup><sub>a</sub>(g) must be an antichain and is thus finite due to the energies being well-partially-ordered [26]. Dually, we may consider the *maximal* energy levels winning for the defender, Win<sup>max</sup><sub>d</sub>: G → **2**<sup>**En**<sub>∞</sub></sup>, where we need extended energies to bound won half-spaces.
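The Min operator on budget sets can be sketched as follows; representing budgets as a set of tuples is an illustrative assumption:

```python
# Hedged sketch of Min(S): keep the vectors not dominated componentwise
# by any other vector in the set, yielding an antichain.

def minimal(budgets):
    def leq(e, f):
        # componentwise comparison of energy vectors
        return all(a <= b for a, b in zip(e, f))
    return {e for e in budgets
            if not any(leq(f, e) and f != e for f in budgets)}
```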

#### 3.2 The Spectroscopy Energy Game

Let us now look at the "spectroscopy energy game" at the center of our contribution. Figure 7 gives a graphical representation. The intuition is that the attacker shows how to construct formulas that distinguish a process p from every q in a set of processes Q. The energies limit the expressiveness of the formulas. The first dimension bounds for how many turns the attacker may challenge observations

Fig. 7. Schematic spectroscopy game <sup>G</sup>of Definition 10.

of actions. The second dimension limits how often they may use conjunctions to resolve nondeterminism. The third, fourth, and fifth dimensions limit how deeply observations may nest underneath a conjunction, where the fifth stands for negated clauses, the third for one of the deepest positive clauses and the fourth for the other positive clauses. The last dimension limits how often the attacker may use negations to enforce symmetry by swapping sides. The moves closely match productions in the grammar of Definition 2 and prices in Definition 5.

Definition 10. (Spectroscopy energy game). *For a system* <sup>S</sup> = (P,Σ, −→)*, the* 6*-dimensional* spectroscopy energy game <sup>G</sup><sup>S</sup> [g0, e0]=(G, Gd, , w, g0, e0) *consists of*


*where* p, q ∈ P *and* Q, Q<sup>∗</sup> <sup>∈</sup> **<sup>2</sup>**P*, and six kinds of moves:*

– observation moves: (p, Q)<sub>a</sub> ↣<sup>(−1,0,0,0,0,0)</sup> (p′, Q′)<sub>a</sub> *if* p <sup>a</sup>−→ p′*,* Q <sup>a</sup>−→ Q′*,*
– conj. challenges: (p, Q)<sub>a</sub> ↣<sup>(0,−1,0,0,0,0)</sup> (p, Q \ Q<sup>∗</sup>, Q<sup>∗</sup>)<sub>d</sub> *if* Q<sup>∗</sup> ⊆ Q*,*
– conj. revivals: (p, Q, Q<sup>∗</sup>)<sub>d</sub> ↣<sup>(min{1,3},0,0,0,0,0)</sup> (p, Q<sup>∗</sup>)<sub>a</sub> *if* Q<sup>∗</sup> ≠ ∅*,*
– conj. answers: (p, Q, Q<sup>∗</sup>)<sub>d</sub> ↣<sup>(0,0,0,min{3,4},0,0)</sup> (p, q)<sup>∧</sup><sub>a</sub> *if* q ∈ Q*,*
– positive decisions: (p, q)<sup>∧</sup><sub>a</sub> ↣<sup>(min{1,4},0,0,0,0,0)</sup> (p, {q})<sub>a</sub>*, and*
– negative decisions: (p, q)<sup>∧</sup><sub>a</sub> ↣<sup>(min{1,5},0,0,0,0,−1)</sup> (q, {p})<sub>a</sub> *if* p ≠ q*.*

The spectroscopy energy game is a bisimulation game in the tradition of Stirling [33].

Fig. 8. Example 3 spectroscopy energy game, minimal attacker winning budgets, and distinguishing formulas/clauses. (To reduce visual load, only the first components of the updates are written next to the edges; the other components are 0.)

Lemma 1 (Bisimulation game, proof see [5]). p_0 *and* q_0 *are bisimilar iff the defender wins* G[(p_0, {q_0})_a, e_0] *for every initial energy budget* e_0*, i.e. if* (∞, ∞, ∞, ∞, ∞, ∞) ∈ Win^max_d((p_0, {q_0})_a)*.*

In other words, if there are initial budgets winning for the attacker, then the compared processes can be told apart. For G, the attacker's "unknown initial credit problem" in energy games [34] coincides with the "apartness problem" [20] for processes.

*Example 3.* Figure 8 shows the spectroscopy energy game starting at (S, {S′})_a from Example 1. The lower part of each node displays the node's Win^min_a. The magenta HML formulas illustrate distinctions relevant for the correctness argument of the following Subsect. 3.3. Section 4 will describe how to obtain attacker winning budgets and equivalences. The blue "symmetric" positions are definitely won by the defender; we omit the game graph below them. We also omit the move (S′, {S, Div})_a —(0,−1,0,0,0,0)→ (S′, {S}, {Div})_d, which can be ignored as will be discussed in Subsect. 3.4.

### 3.3 Correctness: Tight Distinctions

We will check that winning budgets indeed characterize what equivalences hold by constructing price-minimal distinguishing formulas from attacker budgets.

Definition 11 (Strategy formulas). *Given the set of winning budgets* Wina*, the set of* attacker strategy formulas Strat *for a position with given energy level* e *is defined inductively as follows:*

- ⟨a⟩φ ∈ Strat((p, Q)_a, e) *if* (p, Q)_a —u→ (p′, Q′)_a *with* p −a→ p′ *and* Q −a→ Q′*,* e′ = upd(e, u) ∈ Win_a((p′, Q′)_a)*, and* φ ∈ Strat((p′, Q′)_a, e′)*,*
- φ ∈ Strat((p, Q)_a, e) *if* (p, Q)_a —u→ (p, Q, Q∗)_d*,* e′ = upd(e, u) ∈ Win_a((p, Q, Q∗)_d)*, and* φ ∈ Strat((p, Q, Q∗)_d, e′)*,*
- ⋀_{q∈Q} ψ_q ∈ Strat((p, Q, ∅)_d, e) *if* (p, Q, ∅)_d —u_q→ (p, q)^∧_a*,* e_q = upd(e, u_q) ∈ Win_a((p, q)^∧_a)*, and* ψ_q ∈ Strat((p, q)^∧_a, e_q) *for each* q ∈ Q*,*
- ⋀_{q∈Q∪{∗}} ψ_q ∈ Strat((p, Q, Q∗)_d, e) *if* (p, Q, Q∗)_d —u_q→ (p, q)^∧_a*,* e_q = upd(e, u_q) ∈ Win_a((p, q)^∧_a)*, and* ψ_q ∈ Strat((p, q)^∧_a, e_q) *for each* q ∈ Q*, and if* (p, Q, Q∗)_d —u_∗→ (p, Q∗)_a*,* e_∗ = upd(e, u_∗) ∈ Win_a((p, Q∗)_a)*, and* ψ_∗ ∈ Strat((p, Q∗)_a, e_∗) *is an observation,*
- φ ∈ Strat((p, q)^∧_a, e) *if* (p, q)^∧_a —u→ (p, {q})_a*,* e′ = upd(e, u) ∈ Win_a((p, {q})_a)*, and* φ ∈ Strat((p, {q})_a, e′) *is an observation, and*
- ¬φ ∈ Strat((p, q)^∧_a, e) *if* (p, q)^∧_a —u→ (q, {p})_a*,* e′ = upd(e, u) ∈ Win_a((q, {p})_a)*, and* φ ∈ Strat((q, {p})_a, e′) *is an observation.*

Because of the game structure, we actually know the u needed in each line of the definition. It is u = (−1, 0, 0, 0, 0, 0) in the first case; (0, −1, 0, 0, 0, 0) in the second; (0, 0, 0, min_{3,4}, 0, 0) in the third; (0, 0, 0, min_{3,4}, 0, 0) and (min_{1,3}, 0, 0, 0, 0, 0) in the fourth; (min_{1,4}, 0, 0, 0, 0, 0) in the fifth; and (min_{1,5}, 0, 0, 0, 0, −1) in the last case. Strat((p, q)^∧_a, ·) can contain negative clauses, which do not form proper formulas on their own.

Lemma 2 (Price soundness). φ ∈ Strat((p, Q)_a, e) *implies that* expr(φ) ≤ e *and that* expr(φ) ∈ Win_a((p, Q)_a)*.*

*Proof.* By induction on the structure of ϕ with arbitrary p, Q, e, exploiting the alignment of the definitions of winning budgets and formula prices. Full proof in [5].

Lemma 3 (Price completeness). e_0 ∈ Win_a((p_0, Q_0)_a) *implies that there are elements in* Strat((p_0, Q_0)_a, e_0)*.*

*Proof.* By induction on the tree of winning plays consistent with some attacker winning strategy implied by e_0 ∈ Win_a((p_0, Q_0)_a). Full proof in [5].

Lemma 4 (Distinction soundness). *Every* φ ∈ Strat((p, Q)_a, e) *distinguishes* p *from every* q ∈ Q*.*

*Proof.* By induction on the structure of φ with arbitrary p, Q, e, exploiting that Strat can only construct formulas with the invariant that they are true for p and false for each q ∈ Q. Full proof in [5].

Lemma 5 (Distinction completeness). *If* φ *distinguishes* p *from every* q ∈ Q*, then* expr(φ) ∈ Win_a((p, Q)_a)*.*

*Proof.* By induction on the structure of φ with arbitrary p, Q, exploiting the alignment of game structure and HML semantics and the fact that expr cannot "overtake" inverse updates. Full proof in [5].

Theorem 1 (Correctness). *For any equivalence* X *with coordinate* e_X*,* p ⪯_X q *precisely if all* e_pq ∈ Win^min_a((p, {q})_a) *are above or incomparable to* e_X*, i.e.,* e_pq ≰ e_X*.*

*Proof.* By contraposition, in both directions.


The theorem basically means that by fixing an initial budget in G, we can obtain a characteristic game for any notion from the spectrum.

#### 3.4 Becoming More Clever by Looking One Step Ahead

The spectroscopy energy game G of Definition 10 may branch exponentially with respect to |Q| at conjunction challenges after (p, Q)_a. For the spectrum we are interested in, we can drastically limit the sensible attacker moves to four options by a little lookahead into the enabled actions I(q) of each q ∈ Q and I(p).

Definition 12 (Clever spectroscopy game). *The* clever spectroscopy game*,* G▵*, is defined exactly like the previous spectroscopy energy game of Definition 10, with the conjunction challenges*

$$(p, Q)_{\mathfrak{a}} \xrightarrow{(0,-1,0,0,0,0)}_{\triangle} (p, Q \setminus Q_*, Q_*)_{\mathfrak{d}} \quad \text{with } Q_* \subseteq Q,$$

*restricted to situations where* Q∗ ∈ {∅, {q ∈ Q | I(q) ⊆ I(p)}, {q ∈ Q | I(p) ⊆ I(q)}, {q ∈ Q | I(p) = I(q)}}.
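The four admissible challenge sets are cheap to compute from the enabled-action sets. The following sketch uses invented process names and I-sets purely for illustration:

```python
def clever_partitions(p, Q, I):
    """The only conjunction challenges Q* ⊆ Q the clever game keeps (Def. 12)."""
    return {
        frozenset(),                               # Q* = ∅
        frozenset(q for q in Q if I[q] <= I[p]),   # I(q) ⊆ I(p)
        frozenset(q for q in Q if I[p] <= I[q]),   # I(p) ⊆ I(q)
        frozenset(q for q in Q if I[p] == I[q]),   # I(p) = I(q)
    }

# Invented enabled-action sets for one p and three q's:
I = {"p": {"a", "b"}, "q1": {"a"}, "q2": {"a", "b", "c"}, "q3": {"a", "b"}}
parts = clever_partitions("p", {"q1", "q2", "q3"}, I)
# at most 4 candidate subsets instead of the 2^|Q| = 8 of the full game
```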

Theorem 2 (Correctness of cleverness). *Assume the modal depth of positive clauses* e_4 ∈ {0, 1, ∞}*, that* e_4 ≤ e_3*, and that modal depth of negative clauses* e_5 > 1 *implies* e_3 = e_4*. Then, the attacker wins* G▵[(p_0, Q_0)_a, e] *precisely if they win* G[(p_0, Q_0)_a, e]*.*

*Proof.* The implication from the clever spectroscopy game G▵ to the full spectroscopy game G is trivial, as the attacker's moves in G▵ are a subset of those in G and the defender has the same moves in both games. For the other direction, we have to show that any move (p, Q)_a —(0,−1,0,0,0,0)→ (p, Q ∖ Q∗, Q∗)_d winning at energy level e can be simulated by a winning move (p, Q)_a —(0,−1,0,0,0,0)→▵ (p, Q ∖ Q′∗, Q′∗)_d. Full proof in [5].

### 4 Computing Equivalences

The previous section has shown that attacker winning budgets in the spectroscopy energy game characterize distinguishable processes and, dually, that the defender's wins characterize equivalences. We now examine how to actually compute the winning budgets of both players.

#### 4.1 Computation of Attacker Winning Budgets

The winning budgets of the attacker (Definition 9) are characterized inductively: e ∈ Win_a(g) for an attacker position g ∈ G_a if there is a move g —u→ g′ with upd(e, u) ∈ Win_a(g′); and e ∈ Win_a(g) for a defender position g ∈ G_d if upd(e, u) ∈ Win_a(g′) for *every* move g —u→ g′.
By Proposition 3, it suffices to find the finite set of minimal winning budgets, Win^min_a. Turning this into a computation is not as straightforward as in other energy game models. Due to the min_D-updates, the energy update function upd(·, u) is neither injective nor surjective.

We must choose an inversion function upd⁻¹ that picks minimal solutions and that minimally "casts up" inputs lying outside the image of upd(·, u), i.e., such that upd⁻¹(e′, u) = inf{e | e′ ≤ upd(e, u)}. We compute it as follows:

Definition 13 (Inverse update). *The* inverse update *function is defined as* upd⁻¹(e′, u) := sup({ê} ∪ {m(i) | ∃D. u_i = min_D})*, where* ê_i = e′_i − u_i *for all* i *with* u_i ∈ {0, −1} *and* ê_i = e′_i *otherwise, and where* (m(i))_j = e′_i *for* u_i = min_D *and* j ∈ D*, and* (m(i))_j = 0 *otherwise, for all* i, j*.*

*Example 4.* Let u := (min_{1,3}, min_{1,2}, −1, −1). Then (3, 4, 0, 1) ∉ img(upd(·, u)), but:

$$\begin{aligned} \textsf{upd}^{-1}((3,4,0,1),u) &= \sup\{ (3,4,1,2), (3,0,3,0), (4,4,0,0) \} = (4,4,3,2) \\ \textsf{upd}((4,4,3,2),u) &= (3,4,2,1) \ge (3,4,0,1) \\ \textsf{upd}^{-1}((3,4,2,1),u) &= \sup\{ (3,4,3,2), (3,0,3,0), (4,4,0,0) \} = (4,4,3,2) \end{aligned}$$
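Definition 13 translates directly into code. The sketch below is our own Python rendering (not the paper's Scala source); update components are either 0/−1 or a `("min", D)` tag with D a set of 1-based indices:

```python
# A sketch of upd and upd⁻¹ from Definition 13.
def upd(e, u):
    res = []
    for i, ui in enumerate(u):
        if isinstance(ui, tuple):                      # min_D component
            res.append(min(e[j - 1] for j in ui[1]))
        else:                                          # 0 or −1 component
            res.append(e[i] + ui)
    return tuple(res)

def upd_inv(e2, u):
    # ê: undo the 0/−1 components, keep the min_D components as-is
    base = tuple(e2[i] - u[i] if not isinstance(u[i], tuple) else e2[i]
                 for i in range(len(u)))
    cands = [base]
    for i, ui in enumerate(u):
        if isinstance(ui, tuple):                      # the m(i) vectors
            cands.append(tuple(e2[i] if (j + 1) in ui[1] else 0
                               for j in range(len(u))))
    return tuple(max(c[k] for c in cands) for k in range(len(u)))  # pointwise sup

u = (("min", {1, 3}), ("min", {1, 2}), -1, -1)
upd_inv((3, 4, 0, 1), u)   # → (4, 4, 3, 2), reproducing Example 4
upd((4, 4, 3, 2), u)       # → (3, 4, 2, 1) ≥ (3, 4, 0, 1)
```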

With upd⁻¹, we only need to find the Win^min_a relation as a least fixed point of the inductive description. This is done by Algorithm 1. Every time a new way of winning a position for the attacker is discovered, this position is added to the todo set. Initially, these are the positions where the defender is stuck. The update at an attacker position in Line 8 takes the inversely updated budgets (upd⁻¹) of successor positions as tentative attacker winning budgets. At a defender position, the attacker only wins if they have winning budgets for all follow-up positions (Line 12). Any supremum of such budgets covering all follow-ups will be winning for the attacker (Line 13). At both updates, we only keep the minima as a finite representation of the infinitely many attacker budgets.

```
 1  def compute_winning_budgets(G = (G, G_d, ↠, w)):
 2      attacker_win := [g ↦ {} | g ∈ G]
 3      todo := {g ∈ G_d | g has no outgoing moves}
 4      while todo ≠ ∅:
 5          g := some element of todo
 6          todo := todo ∖ {g}
 7          if g ∈ G_a:
 8              new_attacker_win := Min(attacker_win[g] ∪ {upd⁻¹(e′, u) | g —u→ g′ ∧ e′ ∈ attacker_win[g′]})
 9          else:
10              defender_post := {g′ | g —u→ g′}
11              options := {(g′, upd⁻¹(e′, u)) | g —u→ g′ ∧ e′ ∈ attacker_win[g′]}
12              if defender_post ⊆ dom(options):
13                  new_attacker_win := Min({sup_{g′ ∈ defender_post} strat(g′) | strat ∈ (G → En) ∧ ∀g′. strat(g′) ∈ options(g′)})
14              else:
15                  new_attacker_win := ∅
16          if new_attacker_win ≠ attacker_win[g]:
17              attacker_win[g] := new_attacker_win
18              todo := todo ∪ {g_p | ∃u. g_p —u→ g}
19      Win^min_a := attacker_win
20      return Win^min_a
```

Algorithm 1: Computing attacker winning budgets of a declining energy game G.
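To make the fixed-point iteration of Algorithm 1 concrete, here is a runnable miniature on an invented 2-dimensional declining energy game with only 0/−1 updates (so that upd⁻¹(e′, u) is simply componentwise e′ − u); positions, moves, and names are ours, not the paper's:

```python
from itertools import product

DIM = 2
GA = {"a0", "a1", "a2"}                       # attacker positions
GD = {"d0", "d_stuck"}                        # defender positions
MOVES = {                                     # g -> [(update, successor), ...]
    "a0": [((-1, 0), "d0")],
    "d0": [((0, -1), "a1"), ((0, -1), "a2")],
    "a1": [((-1, 0), "d_stuck")],
    "a2": [((0, -1), "d_stuck")],
    "d_stuck": [],                            # stuck defender: attacker wins
}

def upd_inv(e2, u):                           # inverse of a pure 0/−1 update
    return tuple(ei - ui for ei, ui in zip(e2, u))

def minimize(es):                             # keep only the minimal budgets
    return {e for e in es
            if not any(o != e and all(oi <= ei for oi, ei in zip(o, e))
                       for o in es)}

def compute_winning_budgets():
    win = {g: set() for g in GA | GD}
    todo = {g for g in GD if not MOVES[g]}
    while todo:
        g = todo.pop()
        if g in GA:                           # Line 8: any single successor
            new = minimize(win[g] | {upd_inv(e2, u)
                                     for u, g2 in MOVES[g] for e2 in win[g2]})
        elif not MOVES[g]:                    # stuck defender: any budget wins
            new = {(0,) * DIM}
        elif all(win[g2] for _, g2 in MOVES[g]):  # Lines 12–13: cover them all
            choices = [[upd_inv(e2, u) for e2 in win[g2]] for u, g2 in MOVES[g]]
            new = minimize({tuple(max(e[i] for e in combo) for i in range(DIM))
                            for combo in product(*choices)})
        else:
            new = set()
        if new != win[g]:                     # Lines 16–18: propagate change
            win[g] = new
            todo |= {gp for gp, ms in MOVES.items()
                     if any(g2 == g for _, g2 in ms)}
    return win
```

Here the attacker needs budget (2, 2) at a0: one unit per dimension to reach each stuck-defender branch, combined by the supremum at d0.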

### 4.2 Complexity and How to Flatten It

For finite games, Algorithm 1 is guaranteed to terminate, in time exponential in the branching degree and dimensionality of the game graph.

Lemma 6 (Winning budget complexity, proof see [5]). *For an* N*-dimensional declining energy game of branching degree* o*, Algorithm 1 terminates in* O(|↠| · |G|^N · (o + |G|^{(N−1)·o})) *time, using* O(|G|^N) *space for the output.*

Lemma 7 (Full spectroscopy complexity). *Time complexity of computing winning budgets for the full spectroscopy energy game* G *is in* 2^O(|P|·2^|P|)*.*

*Proof.* Out-degrees o in G can be bounded in O(2^|P|), the whole game graph |↠| ∈ O(|→| · 2^|P| + |P|² · 3^|P|), and game positions |G| ∈ O(|P| · 3^|P|). Inserting with N = 6 in Lemma 6 yields the bound. Full proof in [5].

We have thus established the approach to be double-exponential. The complexity of the previous spectroscopy algorithm [7] has not been calculated. One must presume it to be equal or higher, as its game graph has a Bell-numbered branching degree and as the algorithm constructs formulas, which entails more options than the direct computation of energies. This is what lies behind the introduction's observation that moderate nondeterminism already renders [7] unusable.

Our present energy game reformulation allows us to use two ingredients to do far better than double-exponentially when focussing on the common linear-time–branching-time spectrum:

First, Subsect. 3.4 has established that most of the partitionings in attacker conjunction moves can be disregarded by looking at the initial actions of processes.

Second, Fahrenberg et al. [15] have shown that considering just "capped" energies in a grid **En**_K = {0, …, K}^N can reduce complexity. Such a *flattening of the lattice* turns the space of possible energies into a constant factor (K + 1)^N (with (K + 1)^{N−1}-sized antichains), independent of input size. For Algorithm 1, the space complexity needed for attacker_win drops to O(|G|) and the time complexity to |↠| · 2^O(o). If we are only interested in finitely many notions of equivalence, as in the case of Fig. 3, we can always bound the energies to range up to the maximal appearing number plus one, with the last number representing all numbers outside the bound up to infinity.
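A minimal sketch of this flattening follows; the exact cap semantics is our reading (the top value K acting as absorbing, standing for everything from K up to ∞), not code from the paper:

```python
from itertools import product

N, K = 6, 2        # dimensions; K would be the largest relevant number plus one

def cap(e):
    return tuple(min(ei, K) for ei in e)

def upd_capped(e, u):
    # decline-only updates under the cap; K is treated as absorbing, like ∞
    return tuple(ei if ei == K else max(ei + ui, 0) for ei, ui in zip(e, u))

grid = list(product(range(K + 1), repeat=N))
len(grid)          # (K+1)^N possible energies, independent of the input size
```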

Lemma 8 (Clever spectroscopy complexity). *Time complexity of computing winning budgets for the clever spectroscopy energy game* G▵ *with capped energies is in* 2^O(|P|)*.*

*Proof.* Out-degrees o in G▵ can be bounded in O(|P|), the whole game graph |↠▵| ∈ O(|→| · 2^|P| + |P|² · 2^|P|), and game positions |G▵| ∈ O(|P| · 2^|P|). Inserting in the flattened version of Lemma 6 yields:

$$\begin{split} \mathcal{O}(|{\twoheadrightarrow}_{\triangle}| \cdot 2^{C_0 \cdot o}) &= \mathcal{O}((|{\to}| \cdot 2^{|\mathcal{P}|} + |\mathcal{P}|^2 \cdot 2^{|\mathcal{P}|}) \cdot 2^{C_1 \cdot |\mathcal{P}|}) \\ &= \mathcal{O}((|{\to}| + |\mathcal{P}|^2) \cdot 2^{C_2 \cdot |\mathcal{P}|}) \\ &= \mathcal{O}(|{\to}| \cdot 2^{C_2 \cdot |\mathcal{P}|}) . \end{split}$$

Deciding trace equivalence in nondeterministic systems is PSPACE-hard and will thus take at least exponential time. Therefore, the exponential time of the "clever" spectroscopy algorithm restricted to a finite spectrum is about as good as it may get, asymptotically speaking.

#### 4.3 Equivalences and Distinguishing Formulas from Budgets

For completeness, let us briefly flesh out how to actually obtain equivalence information from the minimal attacker winning budgets Win^min_a((p, {q})_a) we compute.

Definition 14. *For an antichain* Mn ⊆ **En** *characterizing an upper part of the energy space, the complement antichain* M̄n := Max(**En**^∞ ∩ ({(sup E′) − (1, …, 1) | E′ ⊆ Mn} ∪ {e(i) ∈ **En**^∞ | (e(i))_i = (inf Mn)_i − 1 ∧ ∀j ≠ i. (e(i))_j = ∞})) *has the complement energy space as its downset.*

Win^max_d((p, {q})_a), the complement antichain of Win^min_a((p, {q})_a), characterizes *all* preordering formula languages and thus equivalences defined in terms of expressiveness prices for p and q. This might contain multiple, incomparable, notions from the spectrum. Taking both directions, the complement of Win^min_a((p, {q})_a) ∪ Win^min_a((q, {p})_a) will thus characterize the finest intersection of equivalences to equate p and q.

If we just wonder which of the equivalences from the spectrum hold, we may establish this more directly by checking which of them are not dominated by attacker wins.
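This dominance check is a few lines of code. In the sketch below, the spectrum coordinates and the minimal budgets are invented for illustration (the real coordinates come from the paper's spectrum in Fig. 3); only the check itself follows Theorem 1:

```python
INF = float("inf")

def leq(e, f):
    return all(ei <= fi for ei, fi in zip(e, f))

def preordered(win_min, e_X):
    """p ⪯_X q iff the attacker cannot distinguish within budget e_X."""
    return not any(leq(e, e_X) for e in win_min)

TRACES = (INF, 1, 0, 0, 0, 0)                  # illustrative coordinate only
BISIM = (INF, INF, INF, INF, INF, INF)

win_min = {(2, 1, 0, 0, 0, 1)}                 # invented minimal attacker budgets
preordered(win_min, TRACES)   # True: every distinction here needs a negation
preordered(win_min, BISIM)    # False: the attacker wins with some budget
```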

From this information, we can also easily build witness relations to certify that we return sound equivalence results. In particular, the pairs won with arbitrary attacker budgets, {(p, q) | (∞, ∞, ∞, ∞, ∞, ∞) ∈ Win^max_d((p, {q})_a)}, form a bisimulation. Similarly, the strategy formulas of Definition 11 can directly be computed to explain inequivalence.

If we use symbolic winning budgets capped as proposed at the end of Subsect. 4.2, the formula reconstruction will be harder, and Win^min_a((p, {q})_a) might lie below the maximal defender winning budgets if these exceed the bound. But this will not matter as long as we choose a cap beyond the natural numbers that characterize our spectrum.

### 5 Exploring Minimizations

Our algorithm can be used to analyze the equivalence structure of moderately-sized real-world transition systems. In this section, we take a brief look at its performance on the VLTS ("very large transition systems") benchmark suite [18] and return to our initial Peterson example.

The energy spectroscopy algorithm has been added to the Linear-Time–Branching-Time Spectroscope of [7] and can be tried on transition systems at https://equiv.io/.

Table 1 reports the results of running the implementation of [7] and this paper's implementation in variants using the spectroscopy energy game G and the clever spectroscopy energy game G▵. We tested on the VLTS examples of up to 25,000 states and the Peterson example (Fig. 1). The table lists the sizes of the input transition systems P and of their bisimilarity quotient systems P/∼_B. The spectroscopies have been performed on the bisimilarity quotient systems by constructing the game graph underneath positions comparing all pairs of enabledness-equivalent states. The middle three groups of columns list the resource usage for the three implementations using: the [7]-spectroscopy, the energy game G, and the clever game G▵. For each group, the first column reports the traversed game size, and the second gives the time the spectroscopy took in seconds. Where the tests ran out of memory or took longer than five minutes (in the Java Virtual Machine with 8 GB heap space, at 4 GHz, single-threaded), the cells are left blank. The last three columns list the output sizes of state spaces reduced with respect to enabledness ∼_E, traces ∼_T, and simulation ∼_1S. As one would hope, all three algorithms returned the same results.

From the output, we learn that the VLTS examples, in a way, lack diversity: Bisimilarity ∼<sup>B</sup> and trace equivalence ∼<sup>T</sup> mostly coincide on the systems (third and penultimate column).

Concerning the algorithm itself, the experiments reveal that the computation time grows mostly linearly with the size of the game move graph. Our algorithm can deal with bigger examples than [7] (which fails at peterson, vasy\_10\_56 and cwi\_1\_2, and takes more than 500 s for vasy\_8\_24). Even where [7] has a smaller game graph (e.g. cwi\_3\_14), the exponential formula construction renders it slower. Also, the clever game graph is indeed much smaller than the full one for examples with a lot of nondeterminism such as peterson.


Table 1. Sample systems, sizes, and benchmark results.

Of those terminating, the heavily nondeterministic cwi\_1\_2 is the most expensive example. As many coarse notions must record the nondeterministic options, this blowup is to be expected. Comparing to the best similarity algorithm, by Ranzato and Tapparo [29]: they report that their algorithm SA alone tackles cwi\_1\_2. Like our implementation, the prototype of SA [29] ran out of memory while determining similarity for vasy\_18\_73. This is in spite of SA theoretically having optimal complexity, and of similarity being less complex (cubic) than trace equivalence, which we need to cover. The benchmarks in [29] failed at vasy\_10\_56 and vasy\_25\_25, which might be due to 2010's tighter memory constraints (they used 2 GB of RAM) or to the degree to which bisimilarity and enabledness in the models are exploited.

### 6 Conclusion and Related Work

This paper has connected two strands of research in the field of system analysis: The strand of *equivalence games on transition systems* starting with Stirling's bisimulation game [7,12,32,33] and the strand of *energy games for systems of bounded resources* [2,10,11,14–16,27,30,34].

The connection rests on the insight that levels of equivalence correspond to resources available to an attacker who tries to tell two systems apart. This parallel is present in recent work within the security domain [25] just as much as in the first thoughts on observable nondeterminism by Hennessy and Milner [24].

The paper has not examined the precise relationship of the games of Sect. 3 to the whole zoo of VASS, energy, mean-payoff, monotonic [1], and counter games. The spectroscopy energy game deviates slightly from common multienergy games due to minD-updates and due to the attacker being energy-bound (instead of the defender). As the energies cannot be exhausted by defender moves, the game can also be interpreted as a VASS game [2,10] where the attacker is stuck if they run out of energy. Our algorithm complexity matches that of general lower-bounded N-dimensional energy games [15]. Links between our declining energy games and other games from the literature might enable slight improvements of the algorithm. For instance, reachability in VASS games can turn polynomial [11].

In the strand of generalized game characterizations for equivalences [7,12,32], this paper extends applicability for real-world systems. The implementation performs on par with the most efficient similarity algorithm [29]. Given that among the hundreds of equivalence algorithms and tools most primarily address bisimilarity [19], a tool for coarser equivalences is a worthwhile addition. Although our previous algorithm [7] is able to solve the spectroscopy problem, its reliance on super-exponential partitions of the state space makes it ill-fit for transition systems with significant nondeterminism. In comparison, our new algorithm also needs one less layer of complexity because it determines equivalences without constructing distinguishing formulas.

These advances enable a spectroscopy of systems saturated by weak transitions. We can thus analyze weak equivalences such as in the example of Peterson's mutex. For special weak equivalences without a strong counterpart such as branching bisimilarity [22], deeper changes to the modal logic are necessary [6].

The increased applicability has allowed us to exhaustively consider equivalences on the smaller systems of the widely-used VLTS suite [18]. The experiments reveal that the spectrum between trace equivalence and bisimilarity mostly collapses for the examined systems. It may often be reasonable to specify systems in such a way that the spectrum collapses. In a benchmark suite, however, a lack of semantic diversity can be problematic: For instance, otherwise sensible techniques like polynomial-time reductions [13] will not speed up language inclusion testing, and nuances of the weak equivalence spectrum [8] will falsely seem insignificant. One may also overlook errors and performance degradations that appear only for transition systems where equal traces do not imply equivalent branching behavior. We hope this blind spot does not affect the validity of any of the numerous studies relying on VLTS benchmarks.

Acknowledgments. This work benefited from discussion with Sebastian Wolf, with David N. Jansen, with members of the LFCS Edinburgh, and with the MTV research group at TU Berlin, as well as from reviewer comments.

Data Availability. Proofs and updates are to be found in the report version of this paper [5]. The Scala source is on GitHub: https://github.com/benkeks/equivalencefiddle/. A webtool implementing the algorithm runs on https://equiv.io/. An artifact including the benchmarks is archived on Zenodo [4].

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Concurrency**

# **Commutativity for Concurrent Program Termination Proofs**

Danya Lette(B) and Azadeh Farzan

University of Toronto, Toronto, Canada danya@cs.toronto.edu, azadeh@cs.toronto.edu

**Abstract.** This paper explores how using commutativity can improve the efficiency and efficacy of algorithmic termination checking for concurrent programs. If a program run is terminating, one can conclude that all other runs equivalent to it up-to-commutativity are also terminating. Since reasoning about termination involves reasoning about infinite behaviours of the program, the equivalence class for a program run may include infinite words with lengths strictly larger than ω that capture the intuitive notion that some actions may soundly be postponed indefinitely. We propose a sound proof rule which exploits these as well as classic bounded commutativity in reasoning about termination, and devise a way of algorithmically implementing this sound proof rule. We present experimental results that demonstrate the effectiveness of this method in improving automated termination checking for concurrent programs.

### **1 Introduction**

Checking termination of concurrent programs is an important practical problem and has received a lot of attention [3,29,35,37]. A variety of interesting techniques, including thread-modular reasoning [10,34,35,37], causality-based reasoning [29], and well-founded proof spaces [15], among others, have been used to advance the state of the art in reasoning about concurrent program termination. Independently, it has been established that leveraging *commutativity* in proving safety properties can be a powerful tool in improving automated checkers [16–19]. There are many instances of applications of Lipton's reductions [32] in program reasoning [14,28]. Commutativity has been used to simultaneously search for a program with a simple proof and its safety proof [18,19] and to improve the efficiency and efficacy of assertion checking for concurrent programs [16]. Recently [17], *abstract commutativity* relations are formalized and combined to increase the power of commutativity in algorithmic verification.

This paper investigates how using commutativity can improve the efficiency and efficacy of proving the termination of concurrent programs, as an enhancement to existing techniques. The core idea is simple: if we know that a program run ρabρ′ is terminating, and we know that a and b commute, then we can conclude that ρbaρ′ is also terminating. Let us use an example to make this idea concrete for termination proofs of concurrent programs. Consider the two thread

```
Producer Thread:
    while (i < producer_limit) {
        C++; // produce content
        i++;
    }
    barrier++;

Consumer Thread:
    assume(barrier >= producer_num);
    while (j < consumer_limit) {
        j++;
        C--; // consume content
    }
```
**Fig. 1.** Producer/Consumer Template

templates in Fig. 1: one for a producer thread and one for a consumer thread, where i and j are local variables. The assumption is that barrier and the local counters i and j are initialized to 0. The producer generates content (modelled by incrementing a global counter, C++) up to a limit and then, using barrier, signals the consumer to start consuming. Independent of the number of producers and consumers, this synchronization mechanism ensures that the consumers wait for all producers to finish before they start consuming. Note that the producer threads fully commute: each statement in a producer commutes with each statement in another. A producer and a consumer only partially commute.
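The notion of commuting statements can be made concrete by a small state-based check. The sketch below is illustrative only (not the paper's procedure): statements are modelled as state transformers, a blocking assume returns None, and two statements commute on a set of states when both execution orders agree everywhere:

```python
def seq(s, t, st):
    """Run s then t from state st; None models a blocked assume."""
    st1 = s(dict(st))
    return None if st1 is None else t(st1)

def commutes(s, t, states):
    return all(seq(s, t, st) == seq(t, s, st) for st in states)

prod1 = lambda st: {**st, "C": st["C"] + 1, "i": st["i"] + 1}  # producer 1 body
prod2 = lambda st: {**st, "C": st["C"] + 1, "k": st["k"] + 1}  # producer 2 body
bar_inc = lambda st: {**st, "barrier": st["barrier"] + 1}      # barrier++
bar_chk = lambda st: st if st["barrier"] >= 1 else None        # assume(barrier >= 1)

states = [{"C": c, "i": 0, "k": 0, "barrier": b}
          for c in range(2) for b in range(2)]
commutes(prod1, prod2, states)     # True: producer bodies fully commute
commutes(bar_chk, bar_inc, states) # False: barrier synchronization does not
```

The failing pair captures exactly the partial commutativity in the text: from barrier = 0, running the assume first blocks, while incrementing first lets it pass.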

In a program with only two producers, a human would argue at the high level that the *independence* of producer loops implies that their parallel composition is equivalent, up to *commutativity*, to their sequential composition. Therefore, it suffices to prove that the sequential program terminates. In other words, it should suffice to prove that each producer terminates. Let us see how this high level argument can be formalized using commutativity reasoning. Let λ<sup>1</sup> and λ<sup>2</sup> stand for the loop bodies of the two producers. Among others, consider the (syntactic) concurrent program run (λ1λ2)ω; this run may or may not be feasible. Since λ<sup>1</sup> and λ<sup>2</sup> commute, we can transform this run, by making *infinitely many* swaps, to the run λ<sup>ω</sup> <sup>1</sup> λ<sup>ω</sup> <sup>2</sup> . The model checking expert would consider this transformation rather misguided: it appears that we are indefinitely postponing λ<sup>2</sup> in favour of λ1. Moreover, a word with a length strictly larger than ω, called a *transfinite* word, does not have an appropriate representation in language theory because it does not belong to Σω. Yet, the observation that (λ1λ2)<sup>ω</sup> ≡ λ<sup>ω</sup> <sup>1</sup> λ<sup>ω</sup> <sup>2</sup> is the key to a powerful proof rule for termination of concurrent programs: If λ<sup>ω</sup> <sup>1</sup> is terminating and λ<sup>1</sup> commutes against λ2, then we can conclude that (λ1λ2)<sup>ω</sup> is terminating. In other words, the termination proof for the first producer loop implies that all interleaved executions of two producers terminate, without the need for a new proof. Note that the converse is not true; termination of λ<sup>ω</sup> <sup>1</sup> λ<sup>ω</sup> 2 does not necessarily imply the termination of λ<sup>ω</sup> <sup>2</sup> . So, even if we were to replace the second producer with a forever loop, our observation would stand as is. 
Hence, for the termination of the entire program (and not just the run $(\lambda_1\lambda_2)^\omega$), one needs to argue about the termination of both $\lambda_1^\omega$ and $\lambda_2^\omega$, matching the high-level argument. In Sect. 3, we formally state and prove this proof rule, called the *omega-prefix* proof rule, and show how it can be incorporated into an algorithmic verification framework. Using this proof rule, the program consisting of N producers can be proved terminating by proving precisely N single-thread loops terminating.

Now, consider adding a consumer thread to our two producer threads. The consumer loop is independent of the producer threads but the consumer thread, as a whole, is not. In fact, part of the work of a termination prover is to prove that any interleaved execution of a consumer loop with either producer is *infeasible* due to the *barrier* synchronization and therefore terminating. Again, a human would argue that two such cases need to be considered: the consumer crosses the barrier with 0 or 1 producers having terminated. Each case involves several interleavings, but one should not have to prove them correct individually. Ideally, we want a mechanism that can take advantage of commutativity for both cases.

Before we explore this further, let us recall an algorithmic verification template which has proven useful in incorporating commutativity into safety reasoning [16–19] and in proving termination of sequential [25] and parameterized concurrent programs [15]. The workflow is illustrated in Fig. 2. The program and the proof are represented using (Büchi) automata, and module (d) (and consequently module (a)) are implemented as inclusion checks between the languages of these automata. The iteratively refined proof—a language of infeasible syntactic program runs—can be annotated Floyd-Hoare style and generalized using interpolation as in [25]. For module (b), any known technique for reasoning about the termination of simple sequential programs can be used on lassos.

The straightforward way to account for commutativity in this refinement loop would involve module (c): add to Π all program runs equivalent to the existing ones up to commutativity without having a proof for them. In the safety context, it is well-known that checking whether a program is subsumed by the *commutativity closure* of a proof is *undecidable*. We show (in Sect. 3) that the same hurdle exists when doing inclusion checks for program termination.

In the context of safety [16–19], *program reductions* were proposed as an antidote to this undecidability problem: rather than enlarging the proof, one reduces the program and verifies a new program with a subset of the original program runs while maintaining (at least) one representative for each commutativity equivalence class. These representatives are the lexicographically least members of their equivalence classes, and are algorithmically computed based on the idea of the *sleep set* algorithm [22] to construct the automaton for the reduced program. However, using this technique is not possible in termination reasoning where *lassos*, and not finite program runs, are the basic objects.

To overcome this problem, we propose a different class of reductions, called *finite-word reductions*. Inspired by the classical result that an ω-regular language can be faithfully captured as a finite-word language for the purposes of certain

**Fig. 2.** Refinement Loop For Proving Termination.

checks such as inclusion checks [4], we propose a novel way of translating both the program and the proof into finite-word languages. The classical result is based on an exponentially sized construction and does not scale. We propose a polynomial construction that has the same properties for the purpose of our refinement loop. This contribution can be viewed as an efficient translation of termination analysis to safety analysis and is useful independent of the commutativity context. For the resulting finite-word languages, we propose a novel variation of the *persistent set* algorithm to *reduce* the finite-word program language. This reduction technique is aware of the lasso structure in finite words.

Used together, finite-word reductions and omega-prefix generalization provide an approximation of the undecidable commutativity-closure idea discussed above. They combine the idea of closures, from proof generalization schemes like [15] and reductions from safety [16], into one uniform proof rule that both *reduces* the program and *generalizes* the proof up to commutativity to take as much advantage as possible. Neither the reductions nor the generalizations are ideal, which is a necessity to maintain algorithmic computability. Yet, together, they can perform in a near optimal way in practice: for example, with 2 producers and one consumer, the program is proved terminating by sampling precisely 3 terminating lassos (1 for each thread) and 2 infeasible lassos (one for each barrier failure scenario).

Finally, mostly out of theoretical interest, we explore a class of infinite word reductions that have the same theoretical properties as safety reductions, that is, they are *optimal* and their regularity (in this case, ω-regularity) is guaranteed. We demonstrate that if one opts for the *Foata Normal Form (FNF)* instead of lexicographical normal form, one can construct optimal program reductions in the style of [16,18,19] for termination checking. To achieve this, we use the notion of the FNF of infinite words from (infinite) trace theory [13], and prove the ω-regular analogue of the classical result for regular languages: a reduction consisting of only program runs in FNF is ω-regular, optimal, and can be soundly proved terminating in place of the original program (Sect. 3).

To summarize, this paper proposes a way of improving termination checking for concurrent programs by exploiting commutativity to boost existing algorithmic verification techniques. We have implemented our proposed solution in a prototype termination checker for concurrent programs called TerMute, and present experimental results supporting the efficacy of the method in Sect. 6.

### **2 Preliminaries**

#### **2.1 Concurrent Programs**

In this paper, programs are *languages* over an alphabet of program statements Σ. The control flow graph for a *sequential* program with a set of locations Loc, and distinct entry and exit locations, naturally defines a finite automaton (Loc, Σ, δ, entry, {exit}). Without loss of generality, we assume that this automaton is deterministic and has a single exit location. This automaton recognizes *a language* of finite-length words. This is the set of all syntactic program runs that may or may not correspond to an actual program execution.

For the purpose of termination analysis, we are also interested in infinite-length program runs. Given a deterministic finite automaton $A_L = (Q, \Sigma, \delta, q_0, F)$ with no dead states, where $L(A_L) = L \subseteq \Sigma^*$ is a regular language of finite-length syntactic program runs, we define Büchi$(A_L) = (Q, \Sigma, \delta, q_0, Q)$, a Büchi automaton recognizing the language $L^\omega = \{u \in \Sigma^\omega : \forall v \in \mathit{pref}(u).\ v \in \mathit{pref}(L)\}$, where $\mathit{pref}(u)$ denotes $\{w \in \Sigma^* : \exists w' \in \Sigma^* \cup \Sigma^\omega.\ w \cdot w' = u\}$ and $\mathit{pref}(L) = \bigcup_{v \in L} \mathit{pref}(v)$. These are all syntactic infinite program runs that may or may not correspond to an actual program execution.

We represent concurrency via interleaving semantics. A concurrent program is a parallel composition of a fixed number of threads, where each thread is a sequential program. Each thread $P_i$ is recognized by an automaton $A_P^i = (\mathit{Loc}_i, \Sigma_i, \delta_i, \mathit{entry}_i, \{\mathit{exit}_i\})$. We assume the $\Sigma_i$'s are disjoint. The DFA $A_P$ recognizing $P = P_1 \| \dots \| P_n$ is constructed using the standard product construction, recognizing the *shuffle* of the languages of the individual thread DFAs.

The language of infinite runs of this concurrent program, denoted $P^\omega$, is the language recognized by Büchi$(A_P)$. Note that a word in the language $P^\omega$ may not necessarily be the shuffle of infinite runs of its individual threads.

$$\mathsf{P}^{\omega} = \{ u \in \Sigma^{\omega} \mid \exists i:\ u|_{\Sigma_i} \in \mathsf{P}_i^{\omega} \ \land\ \forall j:\ u|_{\Sigma_j} \in \mathsf{pref}(\mathsf{P}_j) \cup \mathsf{P}_j^{\omega} \}$$

In the rest of the paper, we will simply write $P$ when we mean $P^\omega$, for brevity. Note that $P^\omega$ includes *unfair* program runs, for example those in which individual threads are indefinitely starved. As argued in [15], this can easily be fixed by intersecting $P^\omega$ with the set of all fair runs.
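To make the product construction concrete, here is a minimal Python sketch (not from the paper; the dictionary-based DFA encoding is our own) of the shuffle product of thread DFAs with disjoint alphabets. Because the alphabets are disjoint, each statement moves exactly one component of the product state.

```python
from itertools import product

def shuffle_product(threads):
    """Product DFA recognizing the shuffle of per-thread DFAs.

    Each thread is a dict: {"states", "delta": {(q, a): q'}, "entry", "exit"}.
    Thread alphabets are assumed disjoint, so at each product state exactly
    one component moves on any given statement.
    """
    entry = tuple(t["entry"] for t in threads)
    exit_ = tuple(t["exit"] for t in threads)
    states = set(product(*(t["states"] for t in threads)))
    delta = {}
    for qs in states:
        for i, t in enumerate(threads):
            for (q, a), q2 in t["delta"].items():
                if qs[i] == q:
                    # statement a of thread i advances component i only
                    delta[(qs, a)] = qs[:i] + (q2,) + qs[i + 1:]
    return {"states": states, "delta": delta, "entry": entry, "exit": exit_}
```

For two single-statement threads over {a} and {b}, the product accepts exactly the shuffles "ab" and "ba".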

#### **2.2 Termination**

Let $X$ be the domain of the program state, $\Sigma$ a set of statements, and let $[\![\cdot]\!] : \Sigma^* \to \mathcal{P}(X \times X)$ denote a function which maps a sequence of statements to a relation over the program state, satisfying $[\![s_1 s_2]\!] = [\![s_1]\!] \cdot [\![s_2]\!]$ for all $s_1, s_2 \in \Sigma^*$. Define sequential composition of relations in the usual fashion: $r_1 r_2 = \{(x, y) : \exists z.\ (x, z) \in r_1 \land (z, y) \in r_2\}$. We write $s(x)$ to denote $\{y : (x, y) \in [\![s]\!]\} \subseteq X$.

We say that an infinite sequence of statements $\tau \in \Sigma^\omega$ is infeasible if and only if $\forall x \in X\ \exists k \in \mathbb{N}.\ s_1 \dots s_k(x) = \emptyset$, where $s_i$ is the $i$-th statement in the run $\tau$. A program—an ω-regular language $P \subseteq \Sigma^\omega$—is terminating if all of its *infinite* runs are infeasible.
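Over a finite state space, this relational semantics can be checked directly. The sketch below (our own illustration, not the paper's tool) decides infeasibility of a lasso stem·loop^ω by a greatest-fixpoint computation of the states from which the loop admits an infinite execution; the encoding of statements as sets of state pairs is an assumption of the example.

```python
def compose(r1, r2):
    """Sequential composition of relations r1;r2 (sets of state pairs)."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

def image(rel, xs):
    return {y for (x, y) in rel if x in xs}

def lasso_infeasible(sem, states, stem, loop):
    """Infeasibility of the lasso stem·loop^ω over a finite state space.

    sem maps each statement to a relation over `states`. A greatest
    fixpoint finds the states from which loop^ω has an infinite
    execution; the lasso is infeasible iff no state reachable via the
    stem is among them.
    """
    def word_rel(w):
        rel = {(x, x) for x in states}  # identity relation
        for s in w:
            rel = compose(rel, sem[s])
        return rel
    v = word_rel(loop)
    live = set(states)
    while True:  # greatest fixpoint of live = {x : v(x) ∩ live ≠ ∅}
        new = {x for x in live if image(v, {x}) & live}
        if new == live:
            break
        live = new
    return not (image(word_rel(stem), states) & live)
```

For a bounded counter that can only be incremented twice, the loop "inc"^ω is infeasible, while an idling loop is feasible.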

$$\frac{\forall \tau \in P, \ \tau \text{ is infeasible}}{P \text{ is terminating}} \tag{\text{TERM}}$$

**Lassos.** It is not possible to effectively represent all infinite program runs, but we can opt for a slightly stricter rule by restricting our attention to ultimately periodic runs $UP \subseteq \Sigma^\omega$, that is, runs of the form $uv^\omega$ for some finite words $u, v \in \Sigma^*$. These are also typically called *lassos*.

It is *unsound* to replace *all runs* with *all ultimately periodic runs* in rule Term: P may be non-terminating while all its ultimately periodic runs are terminating. Assume that our program P is an ω-regular language and there is a universe $\mathcal{T}$ of known *terminating* programs that are all ω-regular languages. Then, we get the following *sound* rule instead:

$$\frac{\exists \Pi \in \mathcal{T}. P \subseteq \Pi}{P \text{ is terminating}} \tag{\text{TERMUP}}$$

If the inclusion P ⊆ Π does not hold, then it is witnessed by an ultimately periodic run [4]. In a refinement loop in the style of Fig. 2, one can iteratively expand Π based on this ultimately periodic witness (a.k.a. a lasso), and hence have a termination proof construction scheme in which ultimately periodic runs (lassos) are the only objects of interest. Note that if P includes unfair runs of a concurrent program, rather than fixing it, one can instead initialize Π with all the unfair runs of the concurrent program, which is an ω-regular language. This way, the rule becomes a fair termination rule.

#### **2.3 Commutativity and Traces**

An *independence* (or commutativity) relation $I \subseteq \Sigma \times \Sigma$ is a symmetric, antireflexive relation that captures the commutativity of a program's statements: $(s_1, s_2) \in I \implies [\![s_1 s_2]\!] = [\![s_2 s_1]\!]$. In what follows, assume such an $I$ is fixed.

**Finite Traces.** Two finite words $w_1$ and $w_2$ are *equivalent* whenever we can apply a finite sequence of swaps of adjacent independent program statements to transform $w_1$ into $w_2$. Formally, an independence relation $I$ on statements gives rise to an equivalence relation $\equiv_I$ on words by defining $\equiv_I$ to be the reflexive and transitive closure of the relation $\sim_I$, defined as $u s_1 s_2 v \sim_I u s_2 s_1 v \iff (s_1, s_2) \in I$. A Mazurkiewicz trace $[u]_I = \{v \in \Sigma^* : v \equiv_I u\}$ is the corresponding equivalence class; we use "trace" exclusively to denote Mazurkiewicz traces.
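A well-known characterization (a sketch under our encoding, not the paper's): two finite words are Mazurkiewicz-equivalent iff their projections onto every pair of mutually dependent letters coincide. This avoids enumerating swap sequences.

```python
def trace_equiv(u, v, alphabet, indep):
    """Mazurkiewicz equivalence u ≡_I v via the projection characterization:
    u and v are equivalent iff their projections onto every pair of
    mutually dependent letters agree. `indep` is a symmetric, irreflexive
    set of ordered pairs of independent letters."""
    def proj(w, pair):
        return [s for s in w if s in pair]
    for a in alphabet:
        for b in alphabet:
            if (a, b) not in indep:  # dependent pair (includes a == b)
                if proj(u, {a, b}) != proj(v, {a, b}):
                    return False
    return True
```

With $I = \{(a,b),(b,a)\}$, the words abcba and bacab are equivalent (only a/b swaps are needed), while ac and ca are not, since a and c are dependent.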

**Infinite Traces.** Traces may also be defined in terms of dependence graphs (or partial orders). Given a word $\tau = s_1 s_2 \dots$, the dependence graph corresponding to $\tau$ is a labelled, directed, acyclic graph $G = (V, E)$ with labelling function $L : V \to \Sigma$ and vertices $V = \{1, 2, \dots\}$, where $L(i) = s_i$, and $(i, i') \in E$ whenever $i < i'$ and $(L(i), L(i')) \notin I$. Then, $[\tau]_I^\infty$, the equivalence class of the infinite word $\tau$, is precisely the set of *linear extensions* of $G$. Therefore, $\tau' \equiv_I \tau$ iff $\tau'$ is a linear extension of $G$.

For example, Fig. 3(i) illustrates the Hasse diagram of the finite trace $[abcba]_I$, and Fig. 3(ii) the Hasse diagram of the infinite trace $[abc(ab)^\omega]_I^\infty$, where $I = \{(a, b), (b, a)\}$.

**Fig. 3.** Hasse diagrams.

For an infinite word $\tau$, the *infinite trace* $[\tau]_I^\infty$ may contain linear extensions that do not correspond to any word in $\Sigma^\omega$. For example, if $I = \{(a, b), (b, a)\}$, then the trace $[(ab)^\omega]_I^\infty$ includes a member (infinite word) in which all $a$s appear before all $b$s. We use $a^\omega b^\omega$ to denote this word and call such words **transfinite**. This means that $[\tau]_I^\infty \not\subseteq \Sigma^\omega$, even for an ultimately periodic $\tau$.

**Normal Forms.** A trace, as an equivalence class, may be represented unambiguously by one of its member words. *Lexicographical* normal forms [13] are the most commonly used normal forms, and the basis for the commonly known *sleep set* algorithm in partial order reduction [22]. *Foata Normal Forms* (FNF) are less well-known and are used in the technical development of this paper:

**Definition 1 (Foata Normal Form of a finite trace** [13]**).** *For a finite trace* $t$*, define* $FNF(t)$ *as a sequence of sets* $S_1 S_2 \dots S_k$ *(for some* $k \in \mathbb{N}$*) where* $t = \prod_{i=1}^{k} S_i$ *and for all* $i$*:*

$\forall a, b \in S_i.\ a \neq b \implies (a, b) \in I$ *(no dependencies within* $S_i$*)*
$\forall b \in S_{i+1}\ \exists a \in S_i.\ (a, b) \notin I$ *(*$S_{i+1}$ *dependent on* $S_i$*)*

Given a trace's dependence graph, the FNF can be constructed by repeatedly removing sets of minimal elements, that is, sets of vertices with no incoming edges. Although we have defined a trace's FNF as a sequence of sets, we will generally refer to a trace's FNF as a word in which the elements of each set are ordered lexicographically. For example, $FNF([abcba]_I) = ab \cdot c \cdot ab$, where $I = \{(a, b), (b, a)\}$. We overload this notation by writing $FNF([u]_I)$ as $FNF(u)$ and, for a language $L$, $FNF(L) = \{FNF(u) : u \in L\}$.
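The "remove minimal elements" procedure can be sketched in a few lines (our own level-based formulation, equivalent to peeling off minimal sets): each occurrence goes to level 1 plus the maximum level of an earlier dependent occurrence, and block $S_i$ collects the level-$i$ letters.

```python
def foata_nf(word, indep):
    """Foata normal form of the trace [word]_I as a list of blocks.

    Each occurrence is placed at level 1 + (max level of an earlier
    dependent occurrence); block S_i collects the level-i letters,
    ordered lexicographically. `indep` is a symmetric, irreflexive set
    of pairs of independent letters.
    """
    levels = []  # (letter, level) for each occurrence, left to right
    for s in word:
        lvl = 1 + max((l for (t, l) in levels if (s, t) not in indep),
                      default=0)
        levels.append((s, lvl))
    k = max((l for (_, l) in levels), default=0)
    return ["".join(sorted(t for (t, l) in levels if l == lvl))
            for lvl in range(1, k + 1)]
```

This reproduces the example above: for abcba with a, b independent, the blocks are ab, c, ab.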

**Theorem 1 (**[13]**).** L *is a regular language iff the set of its Foata (respectively Lexicographical) normal forms is a regular language.*

### **3 Closures and Reductions**

Commutativity defines an equivalence relation $\equiv_I$ which preserves the termination of a program run.

**Proposition 1.** *For* $\tau, \tau' \in \Sigma^\omega$ *with* $\tau' \equiv_I \tau$*,* $\tau$ *is terminating iff* $\tau'$ *is terminating.*

In the context of a refinement loop in the style of Fig. 2, Proposition 1 suggests one can take advantage of commutativity by including all runs that are equivalent to the ones in Π (which are already proved terminating) in module (c). We formally discuss this strategy next.

Given a language $L$ and an independence relation $I$, define $[L]_I^\infty = \bigcup_{\tau \in L} [\tau]_I^\infty$. Recall from Sect. 2 that, in general, $[\tau]_I^\infty \not\subseteq \Sigma^\omega$. Since programs are represented by ω-regular languages in our formalism, it is safe for us to exclude transfinite words from $[\tau]_I^\infty$ when computing commutativity closures. Define:

$$[L]_I^{\omega} = \bigcup_{\tau \in L} [\tau]_I^{\infty} \cap \Sigma^{\omega} \tag{$\omega$-closure}$$

The following sound proof rule is a straightforward adaptation of Rule TermUP that takes advantage of commutativity-based proof generalization:

$$\frac{\exists \Pi \subseteq \mathcal{T}.\ P \subseteq [\Pi]_I^{\omega}}{P \text{ is terminating}} \tag{TermClosure}$$

Recall the example from Sect. 1 with two producers. The *transfinite* program run $\lambda_1^\omega \lambda_2^\omega$—the sequential composition of the two producers looping forever back to back—does not belong to the ω-closure of any ω-regular language. We generalize the notion of ω-closure to incorporate the idea of such runs in a new proof rule.

Let $\tau$ be a transfinite word (like $a^\omega b^\omega$) and let $\tau'$ be a prefix of $\tau$. If $|\tau'| = \omega$, we say that $\tau'$ is an ω-prefix of $\tau$, written $\tau' \in \mathit{pref}_\omega(\tau)$. A direct definition for when a transfinite word $\tau$ is terminating would be rather contrived, since a word such as $a^\omega b^\omega$ does not correspond to a program execution in the usual sense. However, a very useful property arises when considering the ω-words of $\mathit{pref}_\omega(\tau)$: if an ω-prefix $\tau'$ of a transfinite word $\tau$ is terminating, then all words in $[\tau]_I^\omega$ are terminating.

**Theorem 2 (Omega Prefix Proof Rule).** *Let* $\tau'', \tau' \in \Sigma^\omega$ *and let* $\tau$ *be a transfinite word. If* $\tau \equiv_I \tau''$ *and* $\tau' \in \mathit{pref}_\omega(\tau)$*, then* $\tau'$ *terminates* $\implies$ $\tau''$ *terminates.*

Remark that $[\tau]_I^\omega \subseteq \Sigma^\omega$, so the theorem above uses the usual definition of termination, i.e. termination of ω-words; however, it implicitly defines a notion of termination for some transfinite words.

Define $[\tau]_I^{p\omega}$, the *omega-prefix closure* of $\tau$, as

$$[\tau]^{p\omega}_I = [\tau]^\omega_I \cup \bigcup_{\tau' :\ \tau \in \mathit{pref}_\omega(\tau')} [\tau']^\omega_{I}.$$

Theorem 2 guarantees that, if $\tau$ terminates, then all of $[\tau]_I^{p\omega}$ terminates. The converse, however, does not necessarily hold: $[\tau]_I^{p\omega}$ is not an equivalence class.

*Example 1.* Continuing the example in Fig. 1, recall that $\lambda_1$ and $\lambda_2$ are independent. Let us assume we have a proof that $\lambda_1^\omega$ is terminating. The class $[\lambda_1^\omega]_I^\omega = \{\lambda_1^\omega\}$ does not include any other members, and therefore we cannot conclude the termination status of any other program runs based on it. On the other hand, since $\lambda_1^\omega \in \mathit{pref}_\omega(\lambda_1^\omega \lambda_2^\omega)$ and $[(\lambda_1\lambda_2)^\omega]_I^\omega = [\lambda_1^\omega \lambda_2^\omega]_I^\omega$, we have $(\lambda_1\lambda_2)^\omega \in [\lambda_1^\omega]_I^{p\omega}$. Therefore, we can conclude that $(\lambda_1\lambda_2)^\omega$ is also terminating. Note that $\lambda_2$ can be non-terminating and the argument still stands.

One can replace the closure in Rule TermClosure with omega-prefix closure and produce a new, more powerful, sound proof rule. There is, however, a major obstacle in the way of an algorithmic implementation of Rule TermClosure with either closure scheme: the inclusion check in the premise is not decidable.

**Proposition 2.** $[L]_I^\omega$ *and* $[L]_I^{p\omega}$ *for an* ω*-regular language* $L$ *may not be* ω*-regular. Moreover, it is undecidable to check the inclusions* $L_1 \subseteq [L_2]_I^\omega$ *and* $L_1 \subseteq [L_2]_I^{p\omega}$ *for* ω*-regular languages* $L_1$ *and* $L_2$*.*

#### **3.1 The Compromise: A New Proof Rule**

In the context of safety verification, with an analogous problem, a dual approach was proposed as a way forward [18] based on *program reductions*.

**Definition 2 (**ω**-Reduction and** ωp**-Reduction).** *A language* $R \subseteq P$ *is an* ω*-reduction (resp.* ωp*-reduction) of program* $P$ *under independence relation* $I$ *iff for all* $\tau \in P$ *there is some* $\tau' \in R$ *such that* $\tau \in [\tau']_I^\omega$ *(resp.* $\tau \in [\tau']_I^{p\omega}$*).*

The idea is that a program reduction can be soundly proven in place of the original program but, with strictly fewer behaviours to prove correct, less work has to be done by the prover.

**Proposition 3.** *Let* P *be a concurrent program and* Π *be* ω*-regular. We have:*

*–* $P \subseteq [\Pi]_I^\omega$ *iff there exists an* ω*-reduction* $R$ *of* $P$ *under* $I$ *such that* $R \subseteq \Pi$*.*
*–* $P \subseteq [\Pi]_I^{p\omega}$ *iff there exists an* ωp*-reduction* $R$ *of* $P$ *under* $I$ *such that* $R \subseteq \Pi$*.*

An ω/ωp-reduction R may not always be ω-regular. However, Proposition 3 puts forward a way for us to make a compromise to rule TermClosure for the sake of algorithmic implementability. Consider a *universe* of program reductions Red(P), which does not include *all* reductions. This gives us a new proof rule:

$$\frac{\exists \Pi \in \mathcal{T}.\ \exists R \in \mathsf{Red}(P).\ R \subseteq \Pi}{P \text{ is terminating}} \tag{TermReduc}$$

If Red(P) is the set of *all* ω-reductions (resp. ωp-reductions), then Rule TermReduc becomes logically equivalent to Rule TermClosure (resp. with $[\Pi]_I^{p\omega}$). By choosing a strict subset of all reductions for Red(P), we trade the undecidable premise check of Rule TermClosure for a decidable premise check of Rule TermReduc. The specific algorithmic problem that this paper solves is then the following: what are good candidates for Red(P) such that an effective and efficient algorithmic implementation of Rule TermReduc exists? Moreover, we want this implementation to show significant advantages over the existing algorithms that implement Rule TermUP.

In Sect. 5, we propose *Foata Reductions* as a theoretically clean option for Red(P) in the universe of all ω-reductions. In particular, they have the algorithmically essential property that the reductions do not include any transfinite words. In the universe of ωp-reductions, which does account for transfinite words, such a theoretically clean notion does not exist. This paper instead proposes the idea of mixing closures and reductions as the best algorithmic solution for the undecidable Rule TermClosure, in the form of the following new proof rule:

$$\frac{\exists \Pi \subseteq \mathcal{T}.\ \exists R \in \mathsf{Red}(P).\ R \subseteq [\Pi]_I^{opg}}{P \text{ is terminating}} \tag{TermOP}$$

In Sect. 3.2, we introduce $[\Pi]_I^{opg}$ as an underapproximation of $[\Pi]_I^{p\omega}$ that is guaranteed to be ω-regular and computable. Then, in Sect. 4, we discuss how, through a representation shift from infinite words to finite words, an appropriate class of reductions for Red(P) can be defined and computed.

#### **3.2 Omega Prefix Generalization**

We can implement the underapproximation of $[\Pi]_I^{p\omega}$ by generalizing the proof of termination of each individual lasso in the refinement loop of Fig. 2. Let $u_1, \dots, u_m, v_1, \dots, v_{m'} \in \Sigma$ and consider the lasso $uv^\omega$, where $u = u_1 \dots u_m$, $v = v_1 \dots v_{m'}$, and $m' > 0$. Let $A_{uv^\omega} = (Q, \Sigma, \delta, q_0, \{q_m\})$ be a Büchi automaton consisting of a stem and a loop, with a single accepting state $q_m$ at the head of the loop, recognizing the ultimately periodic word $uv^\omega$—in [25], this automaton is called a *lasso module* of $uv^\omega$. Let $\Sigma_{I_{loop}} = \{a \in \Sigma : \{v_1, \dots, v_{m'}\} \times \{a\} \subseteq I\}$ be the statements that are independent of the statements $v_1, \dots, v_{m'}$ of the loop, and $\Sigma_{I_{stem}} = \{a \in \Sigma : \{u_1, \dots, u_m, v_1, \dots, v_{m'}\} \times \{a\} \subseteq I\} \subseteq \Sigma_{I_{loop}}$ the statements that are independent of all statements appearing in $uv^\omega$.

Define $OPG(A_\tau) = (Q \cup \{q'\}, \Sigma, \delta_{OPG}, q_0, \{q_m\})$ for a lasso $\tau = uv^\omega$, where

$$
\delta_{OPG}(q,a) = \begin{cases}
q & \text{if } q \in \{q_0, \dots, q_{m-1}\} \land a \in \Sigma_{I_{stem}} \\
  & \quad \text{or if } q \in \{q_{m+1}, \dots, q_{m+m'}\} \cup \{q'\} \land a \in \Sigma_{I_{loop}} \\
q' & \text{if } q = q_m \land (a \in \Sigma_{I_{loop}} \text{ or } (m' = 1 \land a = v_1)) \\
\delta(q_m, v_1) & \text{if } q = q' \land a = v_1 \\
\delta(q,a) & \text{o.w.}
\end{cases}
$$

We refer to the language $L(OPG(A_\tau))$ recognized by this automaton as $[\tau]_I^{opg}$ for short. Note that this construction is given for individual lassos; we may generalize it to a (finite) set of lassos by simply taking their union. For a lasso $\tau = uv^\omega$, $OPG(A_\tau)$ is a linearly-sized Büchi automaton whose language satisfies the following:

**Proposition 4.** $[\tau]_I^{opg} \subseteq [\tau]_I^{p\omega}$*.*

Intuitively, this holds because the automaton simply allows us to intersperse the statements of $uv^\omega$ with independent statements; when considering the Mazurkiewicz trace arising from a word interspersed as described, these added independent statements may all be ordered after $uv^\omega$, resulting in a transfinite word with ω-prefix $uv^\omega$.

**Theorem 3.** *If* $\tau$ *is terminating, then every run in* $[\tau]_I^{opg}$ *is terminating.*

This follows directly from Theorem 2 and Proposition 4, and establishes the soundness and algorithmic implementability of Rule TermOP when Red(P) = {P}.
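To make the shape of the OPG automaton concrete, the following Python sketch (our own encoding, not the paper's implementation) builds its transition table for a lasso given by its stem and loop; state names (integers plus "q'") and the dictionary representation are assumptions of the example.

```python
def opg_automaton(stem, loop, alphabet, indep):
    """Transition table of OPG(A_τ) for the lasso stem·loop^ω.

    States 0..m are the stem (m is accepting, at the head of the loop),
    m+1..m+m'-1 are the loop body, plus a fresh state "q'". `indep` is a
    symmetric, irreflexive set of ordered pairs of independent letters.
    """
    m, mp = len(stem), len(loop)
    indep_loop = {a for a in alphabet
                  if all((v, a) in indep for v in loop)}
    indep_stem = {a for a in indep_loop
                  if all((u, a) in indep for u in stem)}
    d = {}
    for i, s in enumerate(stem):            # stem: q_i --u_{i+1}--> q_{i+1}
        d[(i, s)] = i + 1
    for j, s in enumerate(loop):            # loop: returns to q_m at the end
        d[(m + j, s)] = m + j + 1 if j < mp - 1 else m
    for i in range(m):                      # generalization on the stem
        for a in indep_stem:
            d[(i, a)] = i
    for q in list(range(m + 1, m + mp)) + ["q'"]:
        for a in indep_loop:                # generalization on the loop
            d[(q, a)] = q
    d[("q'", loop[0])] = m + 1 if mp > 1 else m   # original δ(q_m, v_1)
    for a in indep_loop | ({loop[0]} if mp == 1 else set()):
        d[(m, a)] = "q'"                    # detour through q'
    return d, 0, m                          # transitions, initial, accepting
```

For stem x and loop y, with z independent of both, the automaton accepts x y^ω interspersed with z's, alternating between the accepting state and "q'".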

### **4 Finite-Word Reductions**

In this section, inspired by the program reductions used in safety verification, we propose a way of using such families of reductions to implement Red(P) in Rule TermReduc. This method can be viewed as a translation of the liveness problem into an equivalent safety problem.

In [4], a finite-word encoding of ω-regular languages was proposed that can be soundly used for checking inclusion in the premise of rules such as Rule TermReduc:

**Definition 3 (**\$**-language** [4]**).** *Let* $L \subseteq \Sigma^\omega$*. Define the* \$*-language of* $L$ *as*

$$\$(L) = \{ u\$v \mid u, v \in \Sigma^* \land uv^\omega \in L \}.$$

If $L$ is ω-regular, then $\$(L)$ is regular [4]. This is proved by construction, but the construction given in [4] is exponential. Since the Büchi automaton for a concurrent program P is already large, an exponential blowup to construct $\$(P)$ can hardly be tolerated. We propose an alternative polynomial construction.

#### **4.1 Efficient Reduction to Safety**

Our polynomial construction, denoted fast\$, consists of linearly many copies of the Büchi automaton recognizing the program language.

**Definition 4 (fast\$).** *Given a Büchi automaton* $A = (Q, \Sigma, \delta, q_0, F)$*, define fast\$*$(A) = (Q_\$, \Sigma \cup \{\$\}, \delta_\$, q_0, F_\$)$ *with* $Q_\$ = Q \cup (Q \times Q \times \{0, 1\})$*,* $F_\$ = \{(q, q, 1) : q \in Q\}$*, and for* $q, r \in Q$*,* $i \in \{0, 1\}$*:*

$$\begin{aligned} \delta_\$(q,a) &= \begin{cases} \{(q,q,0)\} & \text{if } a = \$ \\ \delta(q,a) & \text{o.w.} \end{cases} \\ \delta_\$((q,r,i),a) &= \begin{cases} \{(q,r',1): r' \in \delta(r,a)\} & \text{if } i = 0 \text{ and } r \in F \\ \{(q,r',i): r' \in \delta(r,a)\} & \text{o.w.} \end{cases} \end{aligned}$$

Let $L$ be an ω-regular language and $A$ a Büchi automaton recognizing $L$. We overload the notation and use fast\$$(L)$ to denote the language recognized by fast\$$(A)$. Note that fast\$$(L)$, unlike \$$(L)$, is a construction parametric in the Büchi automaton recognizing the language, rather than the language itself. In general, fast\$$(L)$ under-approximates \$$(L)$. But under the assumption that every alphabet symbol of $\Sigma$ labels at most one transition in the Büchi automaton $A$ (recognizing $L$), we have fast\$$(L) = \$(L)$. This condition is satisfied for any Büchi automaton constructed from the control flow graph of a (concurrent) program, since we may treat each statement appearing in the graph as unique, and these graph edges correspond to the transitions of the automaton.
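Definition 4 translates directly into code. The sketch below (our own dictionary-based NFA encoding) constructs the fast\$ automaton; the guessed state $q$ records where the \$ was read, and the bit flips to 1 once an accepting state of $A$ has been visited while re-running toward $q$.

```python
def fast_dollar(Q, Sigma, delta, q0, F):
    """fast$ construction (Definition 4): states Q ∪ (Q×Q×{0,1}).

    delta maps (state, letter) to a set of successors. Returns
    (states, delta$, q0, F$) over the alphabet Sigma ∪ {"$"}.
    """
    d = {}
    for q in Q:
        d[(q, "$")] = {(q, q, 0)}           # guess the loop head q
        for a in Sigma:
            d[(q, a)] = set(delta.get((q, a), set()))
    for q in Q:
        for r in Q:
            for i in (0, 1):
                for a in Sigma:
                    # bit becomes 1 once an accepting state is left
                    j = 1 if (i == 0 and r in F) else i
                    d[((q, r, i), a)] = {(q, r2, j)
                                         for r2 in delta.get((r, a), set())}
    states = set(Q) | {(q, r, i) for q in Q for r in Q for i in (0, 1)}
    F_dollar = {(q, q, 1) for q in Q}
    return states, d, q0, F_dollar
```

On the one-state Büchi automaton for $a^\omega$, the words \$a and a\$a (encodings of lassos with loop a) are accepted, while \$ alone is not.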

**Theorem 4.** *For any* ω*-regular language* L*, we have fast\$*(L) ⊆ \$(L)*. If* P *is a concurrent program then fast\$*(P) = \$(P)*.*

First, let us observe that in Rule TermUP, we can replace P with fast\$(P) and Π with fast\$(Π) (and hence the universe T with a correspondingly appropriate universe) and derive a new *sound* rule.

**Theorem 5.** *The finite word version of Rule* TermUP *using fast\$ is sound.*

The proof of Theorem 5 follows from Theorem 4. Using fast\$, the program is represented precisely and the proof is under-approximated; therefore, the inclusion check implies the termination of the program.

#### **4.2 Sound Finite Word Reductions**

With a finite-word version of Rule TermUP, the natural question arises whether one can adopt the analogue of the sound proof rule used for safety [18], by introducing an appropriate class of reductions for program termination in the following proof rule:

$$\frac{\exists \Pi \in \mathcal{T}.\ \exists R \in \mathsf{Red}(\$(P)).\ R \subseteq \mathit{fast\$}(\Pi)}{P \text{ is terminating}} \tag{FiniteTermReduc}$$

A language $R$ is a *sound reduction* of $\$(P)$ if the termination of all ultimately periodic words $uv^\omega$, where $u\$v \in R$, implies the termination of all ultimately periodic words of $P$. Since, in $u\$v$, the word $u$ represents the stem of a lasso and the word $v$ represents its loop, it is natural to define equivalence by considering the two parts separately, that is: $u\$v \equiv_I u'\$v'$ iff $u \equiv_I u' \land v \equiv_I v'$. One can use any technique for producing reductions for safety, for example *sleep sets* for lexicographical reductions [18], in order to produce a *sound* reduction that includes representatives from this equivalence relation. Assume that \$ does not commute with any other letter in an extension $I_\$$ of $I$ over $\Sigma \cup \{\$\}$, and observe that the standard finite-word Mazurkiewicz equivalence $u\$v \equiv_{I_\$} u'\$v'$ coincides with $u\$v \equiv_I u'\$v'$ as defined above. Let $\mathit{FRed}(\$(P))$ be the set of all such reductions. An algorithmic implementation of Rule FiniteTermReduc with Red$(\$(P)) = \mathit{FRed}(\$(P))$ may then be taken straightforwardly from the safety literature.

Note, however, that reductions in $\mathit{FRed}(\$(P))$ are more restrictive than their infinite analogues; for example, $uv\$v \notin [u\$v]_I$, whereas $uvv^\omega = uv^\omega$ and therefore $uvv^\omega \equiv_I uv^\omega$ for any $I$. By treating a \$-word of $\$(P)$ as a finite word without recognizing its underlying lasso structure, every word $uv^\omega$ in the program necessarily engenders an infinite family of representatives in $R$—one for each \$-word in $\{u\$v, uv\$v, u\$vv, \dots\} \subseteq \$(P)$ corresponding to $uv^\omega \in P$.

We define the *dollar closure* as a variant of the classic closure that is sensitive to the termination equivalence of the corresponding infinite words:

$$[u\$v]_I^{\$} = \{x\$y : uv^{\omega} \in [xy^{\omega}]_I^{p\omega}\}$$

The termination of $uv^\omega$ is implied by the termination of any $xy^\omega$ such that $x\$y$ is a member of $[u\$v]_I^\$$ (see Theorem 2). However, the converse does not necessarily hold. Therefore, like the omega-prefix closure, $[u\$v]_I^\$$ is not an equivalence class. It suggests a more relaxed condition (than the one used for $\mathit{FRed}(\$(P))$) for the soundness of a reduction:

**Definition 5 (Sound** \$**-Program Reduction).** *A language* $R \subseteq \$(P)$ *is called a* sound \$*-program reduction of* $\$(P)$ *under independence relation* $I$ *iff for all* $uv^\omega \in P$ *we have* $[u\$v]_I^\$ \cap R \neq \emptyset$*.*

A \$-reduction R satisfying the above condition is obviously sound: it must contain a \$-representative x\$y ∈ [u\$v]_I^\$ for each word uv^ω in the program. If R is terminating, then xy^ω is terminating, and therefore so is uv^ω. Moreover, these sound \$-program reductions can be quite parsimonious, since one word can be an omega-prefix corresponding to many classes of program behaviours.

Under this soundness condition, we may now include one representative of [u\$v]_I^\$ for each uv^ω ∈ P in a sound reduction of P. For example, R = {\$a, \$b} is a sound \$-program reduction of P = a^ω || b^ω when (a, b) ∈ I. To illustrate, note that the only traces of P are the three depicted as Hasse diagrams in Fig. 4; the distinct program words (ab)^ω, (aba)^ω, (abaa)^ω, ... all correspond to the same infinite trace shown in Fig. 4(iii). A salient feature of Fig. 4(iii) is that a^ω and b^ω correspond to *disconnected* components of this dependence graph. The omega-prefix rule of Theorem 2 can be interpreted in this graphical context as follows: if any *connected component* of the trace is terminating, then the entire class is terminating.

**Fig. 4.** The only three traces in P = a^ω || b^ω when (a, b) ∈ I.

Recall that module (d) of the refinement loop of Fig. 2 may naturally be implemented as the inclusion check P ⊆ Π, or one of its variations that appear in the proof rules proposed throughout this paper. In a typical inclusion check, the product of the program and the complement of the proof automaton is explored for the reachability of an accept state. Therefore, classic reduction techniques that operate on the program by pruning transitions/states during this exploration are highly desirable in this context. We propose a repurposing of such techniques that shares the simplicity and efficiency of constructing reductions from *FRed*(\$(P)) (in the style of safety) and yet takes advantage of the weaker soundness condition in Definition 5 and performs a more aggressive reduction. In short, a reduced program may be produced by pruning transitions while performing an on-the-fly exploration of the program automaton. In pruning, our goal is to discard transitions that would necessarily form words whose suffixes lead us into the disconnected components of the program traces underlying the program words that have been explored so far. This selective pruning technique is provided by a straightforward adaptation of the well-known safety reduction technique of persistent sets [22]. Consider the program illustrated in Fig. 5(a). In the graph in Fig. 5(b), the green states are explored and the dashed transitions are pruned. This amounts to proving two lassos terminating in the refinement loop of Fig. 2, where each lasso corresponds to one connected component of a program trace.

**Fig. 5.** Example of persistent set selective search.

We compute persistent sets using a variation of Algorithm 1 in Chap. 4 of [22]. In brief, a ∈ Persistent_≺(q) if a is the lexicographically least enabled statement at q according to thread order ≺, if a is an enabled statement from the same thread as another statement a′ ∈ Persistent_≺(q), or if a is dependent on some statement a′ ∈ Persistent_≺(q) from a different thread than a. In addition, \$ is also persistent whenever it is enabled. This set may be computed via a fixed-point algorithm; whenever a statement that is not enabled is added to Persistent_≺(q), then Persistent_≺(q) is simply the set of all enabled statements. Intuitively, this procedure works because transitions are ignored only when they are necessarily independent from all the statements that will be explored imminently; these may soundly be ignored indefinitely or deferred. Transitions that are deferred indefinitely are precisely those that would lead into a disconnected component of a program trace.
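The fixed-point computation just described can be sketched as follows. All names and data shapes here are illustrative, not TerMute's actual API: `enabled` is the set of statements enabled at the state, `thread_of` maps statements to threads, `dependent` is a symmetric predicate on statements, and `order` ranks threads (the order ≺). The \$-statement is omitted for brevity.

```python
def persistent_set(enabled, statements, thread_of, dependent, order):
    """Fixed-point computation of a persistent set at a state, following
    a variation of Algorithm 1 in Chap. 4 of [22]: seed with the enabled
    statement of the least thread, then close under (i) enabled statements
    of an already-included thread and (ii) cross-thread dependent
    statements; if a disabled statement would be included, give up and
    return all enabled statements."""
    if not enabled:
        return set()
    # Seed: the enabled statement of the least thread w.r.t. the order.
    seed = min(enabled, key=lambda a: order[thread_of[a]])
    persistent, work = {seed}, [seed]
    while work:
        b = work.pop()
        for a in statements:
            if a in persistent:
                continue
            same = thread_of[a] == thread_of[b]
            if (same and a in enabled) or (not same and dependent(a, b)):
                if a not in enabled:
                    # Disabled statement would enter the set: fall back.
                    return set(enabled)
                persistent.add(a)
                work.append(a)
    return persistent
```

With two fully independent single-statement threads, only the least thread's statement is persistent, matching the pruning shown in Fig. 5(b); once the statements are dependent, both must be explored.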

The reduced program that arises from the persistent set selective search of \$(P) based on thread order ≺ is denoted by PersistentSS_≺(\$(P)). Figure 5(b) illustrates a reduced program; note that \$-transitions are omitted for simplicity. The reduced program corresponds to the states shown in green. The other program states are unreachable because the only persistent transitions correspond to statements from the least enabled thread; the transitions shown with dashed lines are not persistent.

**Theorem 6 (soundness of finite word reductions).** *Rule* FiniteTermReduc *is a sound proof rule when* Red(\$(P)) = {PersistentSS_≺(\$(P)) : ≺ a thread order}*.*

The theorem holds under the condition that the set T from Rule FiniteTermReduc is the set of all terminating ω-regular languages, and under the assumption that the program is fair (or, equivalently, that the proof includes the unfair runs of P, as discussed in Sect. 2.2), where a fair run is one in which no enabled thread action is indefinitely deferred. The proof of soundness appears in the extended version of this paper [31]. Intuitively, it relies on the fact that PersistentSS_≺(\$(P)) is a \$*-program reduction* for all the fair runs in P.

*Example 2.* Recall the producer-consumer example in Fig. 1, and consider the program with two producers P₁ and P₂ and one consumer C. Let λ₁ denote the loop body of P₁, and λ₂ that of P₂. Concretely, λ₁ = [i < producer_limit] ; C++ ; i++ where [...] is an *assume* statement, and similarly for λ₂. In addition, each loop has an exit statement, which we denote by ι₁ and ι₂. For instance, ι₁ = [i >= producer_limit]. Let ≺ be such that P₁ ≺ P₂ ≺ C.

In A = PersistentSS_≺(\$(P)), P₁ is the *first* thread and therefore persistent; that is, the word \$λ₁ (the \$-word corresponding to λ₁^ω) is in the reduction. Since λ₁ is independent of all statements in P₂ and C, any other run in which P₁ enters the loop (and does not exit via ι₁) will not be included in the reduction. In effect, this means that λ₁^ω is the only representative of [λ₁^ω]_I^{pω} = [λ₁^ω]_I^ω ∪ [λ₁^ω · (P₂+C)^ω]_I^ω in the program reduction.

Even though P₂ seems identical to P₁, the same does not hold for λ₂, because P₂ appears later in the thread order. In this case, [λ₂^ω]_I^{pω} is represented by the family of words (λ₁)*ι₁λ₂^ω.

### **5 Omega Regular Reductions**

In the classic implementation of Rule TermUP [25], ω-regular languages are used to represent the program P and the proof Π. It is therefore natural to ask if Red(P) in Rule TermReduc can be a family of ω-regular program reductions. For *finite* program reductions [16–19], and also for classic POR, lexicographical normal forms are almost always the choice. *Infinite traces* have lexicographic normal forms that are analogous to their finite counterparts [13]. However, these normal forms are not suitable for defining Red(P). For example, if (a, b) ∈ I, then the lexicographic normal form of the trace [(ab)^ω]_I^∞ is a^ω b^ω if a < b, or b^ω a^ω otherwise; both are transfinite words. Fortunately, Foata normal forms do not share this problem.

**Definition 6 (Foata Normal Form of an infinite trace** [13]**).** *The Foata Normal Form FNF*(t) *of an infinite trace* t *is a sequence of non-empty sets* S₁S₂... *such that* t = Π_i S_i *and for all* i*:*

∀a, b ∈ S_i: a ≠ b ⟹ (a, b) ∈ I *(no dependencies within* S_i*)*
∀b ∈ S_{i+1} ∃a ∈ S_i: (a, b) ∉ I *(*S_{i+1} *dependent on* S_i*)*
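For finite traces (and hence for finite prefixes of infinite ones), the Foata levels can be computed greedily: each letter goes one level above the highest level holding an earlier letter it depends on. A minimal sketch, where the independence relation is assumed to be a set of unordered letter pairs:

```python
def foata_normal_form(word, independent):
    """Greedy computation of the Foata levels S1 S2 ... of a finite word.
    A letter always depends on itself, so repeated letters stack upward;
    letters independent of everything seen so far join the lowest level."""
    levels = []  # levels[i] holds the set S_{i+1}
    top = {}     # top[b] = 1 + index of the highest level containing b
    for a in word:
        lvl = 0
        for b, l in top.items():
            if a == b or frozenset((a, b)) not in independent:
                lvl = max(lvl, l)  # a must be placed after this b
        if lvl == len(levels):
            levels.append(set())
        levels[lvl].add(a)
        top[a] = lvl + 1
    return levels
```

With (a, b) ∈ I, the prefix abab yields the levels {a, b}{a, b}, in line with the example below; with a fully dependent alphabet, each letter gets its own level.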

For example, FNF([(ab)^ω]_I^∞) = (ab)^ω if (a, b) ∈ I. To define a reduction based on FNF, we need a mild assumption about the program language.

**Definition 7 (Closedness).** *A language* L ⊆ Σ^∞ *is* closed *under the independence relation* I *iff* [L]_I^∞ ⊆ L*, and is* ω-closed *under* I *iff* [L]_I^ω ⊆ L*.*

It is straightforward to see that for any concurrent program P (as defined in Sect. 2.1) and any valid dependence relation I, P is ω-closed. This means that for any (infinite) program run τ, any other ω-word τ′ that is equivalent to τ is also in the language of the program.

The key result that makes Foata normal forms amenable to automation in the automaton-theoretic framework is the following theorem.

### **Theorem 7.** *If* L ⊆ Σ<sup>ω</sup> *is* ω*-regular and closed, FNF*(L) *is* ω*-regular.*

The proof of this theorem provides a construction for the Büchi automaton that recognizes the language FNF(L); see [31] for more detail. However, this construction is not efficient: for a program P of size Θ(n), the Büchi automaton recognizing FNF(P) can be as large as O(n2^n). Finally, Foata reductions are *minimal* in the exact same sense that lexicographical reductions of finite-word languages are minimal:

**Theorem 8** *(Theorem 11.2.15 of* [13]*). If* L ⊆ Σ^ω *is* ω*-regular and closed, then for all* τ ∈ L*:* τ′ ∈ *FNF*(L) ∩ [τ]_I^ω ⟹ τ′ = τ*.*

Our experimental results in Sect. 6 suggest that this complexity is a major bottleneck on practical benchmarks. Therefore, despite the fact that Foata normal forms put forward an algorithmic solution for the implementation of Rule TermReduc, the inefficiency of the solution makes it unsuitable for practical termination checkers.

### **6 Experimental Results**

The techniques presented in this paper have been implemented in a prototype tool called TerMute, written in Python and C++. The inputs are concurrent integer programs written in a C-like language. TerMute may output "Terminating" or "Unknown", in the latter case also returning a lasso whose termination could not be proved. Ranking functions and invariants are produced using the method described in [24], which is restricted to linear ranking functions of linear lassos. Interpolants are generated using SMTInterpol [6] and MathSAT [7]; the validity of Hoare triples is checked using CVC4 [2].

TerMute may be run in several different modes. FOATA is an implementation of the algorithm described in Sect. 5. The baseline is the core counterexample-guided refinement algorithm of [25], which has been adapted to the finite-word context in order to operate on the automata for \$(P) and \$(Π) of Sect. 4.1. All other modes are modifications of this baseline, maintaining the same refinement scheme, so that we can isolate the impact of adding commutativity reasoning. Hoare triple generalization ("HGen") augments the baseline by making solver calls after each refinement round in order to determine whether edges may soundly be added to the proof for any valid Hoare triples not produced as part of the original proof. "POR" implements the persistent set technique of Sect. 4.2 and "OPG" is the finite-word analogue of the ω-prefix generalization in Sect. 3.2. TerMute can also be run on any combination of these techniques. In what follows, we use TerMute to refer to the portfolio winner among all algorithms that employ *commutativity reasoning*, namely POR, OPG, POR + HGen, POR + OPG, and POR + OPG + HGen.

See [31] for more detail regarding our experimental setup and results.

**Benchmarks.** Our benchmarks include 114 terminating concurrent linear integer programs that range from 2 to 12 threads and cover a variety of patterns commonly used for synchronization, including the use of locks, barriers, and monitors. Some are drawn from the literature on termination verification of concurrent programs, specifically [29,34,37], and the rest were created by us, some of which are based on sequential benchmarks from The Termination Problem Database [38], modified to be multi-threaded. We include programs whose threads display a wide range of independence—from complete independence (e.g. the producer threads in Fig. 1), all the way to complete dependence—and demonstrate a range of complexity with respect to control flow.

**Results.** Our experiments have a timeout of 300 s and a memory cap of 32 GB, and were run on a 12th Gen Intel Core i7-12700K with 64 GB of RAM running Ubuntu 22.04. We experimented with both interpolating solvers and the reported times correspond to the winner of the two. The results are depicted in Fig. 6(a) as a *quantile* plot that compares the algorithms. The total number of benchmarks solved is noted on each curve. FOATA times out on all but the simplest benchmarks, and therefore is omitted from the plot.

The portfolio winner, TerMute, solves 101 benchmarks in total. It solves any benchmark otherwise solved by algorithms without commutativity reasoning (namely, the baseline or HGen). It is also faster on 95 out of 101 benchmarks it solves. The figure below illustrates how often each of the portfolio algorithms emerges as the fastest among these 95 benchmarks.

HGen aggressively generalizes the proof and consequently forces convergence in many fewer refinement rounds. This, however, comes at the cost of a time overhead per round. Therefore, HGen helps in solving more benchmarks, but whenever a benchmark is solvable without it, it is solved much faster. The scatter plot in Fig. 6(b) illustrates this phenomenon when HGen is added to POR+OPG. The plot compares the times of benchmarks solved by both algorithms on a logarithmic scale; the overhead caused by HGen is significant in the majority of cases.

**Fig. 6.** Experimental results for TerMute: (a) quantile plot for the throughput of each algorithm, and (b) scatter plot for the impact of thread order on efficiency.

Recall, from Sect. 4, that the persistent set algorithm is parametrized by an order over the participating threads. The choice of order centrally affects the way the persistent set algorithm works, by influencing which transitions may be explored and, by extension, which words appear in the reduced program. Experimentally, we have observed that the chosen order plays a significant role in how well the algorithms work, but to varying degrees. For instance, for POR, the worst thread order times out on 16% of the benchmarks that the best order solves. For POR+OPG+HGen, the difference is more modest, at 7%. In practice, it is then sensible to run a few instances of TerMute with different random thread orders to increase the chances of better performance.

### **7 Related Work**

The contribution of this paper builds upon sequential program termination provers to produce termination proofs for concurrent programs. As such, any progress in the state of the art in sequential program termination can be used to produce proofs for more lassos, and is, therefore, complementary to our approach. So, we only position this paper in the context of algorithmic concurrent program termination, and the use of commutativity for verification in general, and skip the rich literature on sequential program termination [11,36] or model checking liveness [8,9,26,33].

*Concurrent Program Termination.* The thread-modular approach to proving termination of concurrent programs [10,34,35,37] aims to prove a thread's termination without reasoning directly about its interactions with other threads, but rather by inferring facts about the thread's environment. In [37], this approach is combined with compositional reasoning about termination arguments. Our technique can also be viewed as modular in the sense that lassos – which, like isolated threads, are effectively sequential programs – are dealt with independently of the broader program in which they appear; however, this is distinct from thread-modularity insofar as we reason directly about behaviours arising from the interaction of threads. Whenever a thread-modular termination proof can be automatically generated for the program, that proof is the most efficient in terms of scalability with the number of threads. However, for a thread-modular proof to always exist, local thread states have to be exposed as auxiliary information. The modularity in our technique does not rely on this information at all. Commutativity can be viewed as a way of observing and taking advantage of some degree of non-interference, different from that of thread modularity.

Causal dependence [29] presents an abstraction refinement scheme for proving concurrent programs terminating that takes advantage of the equivalence between certain classes of program runs. These classes of runs are determined by partial orders that capture the causal dependencies between transitions, in a manner reminiscent of the commutativity-based partial orders of Mazurkiewicz traces. The key to the scalability of this method is that it forgoes a containment check in the style of module (d) of Fig. 2. Instead, it covers the space of program behaviour by splitting it into cases. Therefore, for the producer-only instance of the example in Sect. 1, this method can scale to many threads easily, while our commutativity-based technique cannot. Similar to the thread-modular approach, this technique cannot be beaten in scalability for programs that can be split into linearly many cases. However, there is no guarantee (none is given in [29]) that a bounded complete causal trace tableau for a terminating program must exist; for example, when there is a dependency between loops in different threads that causes the program to produce unboundedly many (Mazurkiewicz) traces that have to be analyzed for termination. The advantage of our method is that, once consumers are added to the example in Sect. 1, we can still take advantage of all the existing commutativity to gain more efficiency.

Similar to safety verification, context bounding [3] has been used as a way of under-approximating concurrent programs for termination analysis as well.

*Commutativity in Verification.* Program reductions have been used as a means of simplifying proofs of concurrent and distributed programs before. Lipton's movers [32] have been used to simplify programs for verification. CIVL [27,28] uses a combination of abstraction and reduction to produce *layered programs*; in an interactive setup, the programmer can prove that an implementation satisfies a specification by moving through these layered programs to increasingly more abstract programs. In the context of message-passing distributed systems [12,21], commutativity is used to produce a synchronous (rather than sequential) program with a simpler proof of correctness.

In [16–19] program reductions are used in a refinement loop in the same style as this paper to prove safety properties of concurrent programs. In [18,19], an unbounded class of lexicographical reductions are enumerated with the purpose of finding a simple proof for at least one of the reductions; the thesis being that there can be a significant variation in the simplicity of the proof for two different reductions. In [19], the idea of contextual commutativity—i.e. considering two statements commutative in some context yet not all contexts—is introduced and algorithmically implemented. In [16,17], only one reduction at a time is explored, in the same style as this paper. In [16], a persistent-set-based algorithm is used to produce space-efficient reductions. In [17] the idea of abstract commutativity is explored. It is shown that no *best* abstraction exists that provides a maximal amount of commutativity and, therefore, the paper proposes a way to combine the benefits of different commutativity relations in one verification algorithm. The algorithm in this paper can theoretically take advantage of all of these (orthogonal) findings to further increase the impact of commutativity in proving termination.

*Non-termination.* The problem of detecting non-termination has also been directly studied [1,5,20,23,30]. Presently, our technique does not accommodate proving the non-termination of a program. However, it is relatively straightforward to adapt any such technique (or directly use one of these tools) to accommodate this; in particular, when we fail to find a termination proof for a particular lasso, sequential methods for proving non-termination may be employed to determine if the lasso is truly a non-termination witness. However, it is important to note that a program may be non-terminating while all its lassos are terminating, and the refinement loop in Fig. 2 may just diverge without producing a counterexample in this style; this is a fundamental weakness of using lassos as modules to prove termination of programs.

### **8 Conclusion**

In the literature on the usage of commutativity in safety verification, sound program reductions are constructed by selecting lexicographical normal forms of equivalence classes of concurrent program runs. These are not directly applicable in the construction of sound program reductions for termination checking, since the lexicographical normal forms of infinite traces may not be ω-words. In this paper, we take this apparent shortcoming and turn it into an effective solution. First, these transfinite words are used in the design of the *omega prefix proof rule* (Theorem 2). They also inform the design of the *termination aware* persistent set algorithm described in Sect. 4.2. Overall, this paper contributes mechanisms for using commutativity-based reasoning in termination checking, and demonstrates that, using these mechanisms, one can efficiently check the termination of concurrent programs.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Fast Termination and Workflow Nets**

Piotr Hofman1(B) , Filip Mazowiecki<sup>1</sup> , and Philip Offtermatt1,2

¹ University of Warsaw, Warsaw, Poland
piotr.hofman@uw.edu.pl, f.mazowiecki@mimuw.edu.pl
² Université de Sherbrooke, Sherbrooke, Canada
Philip.Offtermatt@usherbrooke.ca

**Abstract.** Petri nets are an established model of concurrency. A Petri net is terminating if for every initial marking there is a uniform bound on the length of all possible runs. Recent work on the termination of Petri nets suggests that, in general, practical models should terminate fast, *i.e.* in polynomial time. In this paper we focus on the termination of workflow nets, an established variant of Petri nets used for modelling business processes. We partially confirm the intuition on fast termination by showing a dichotomy: workflow nets are either non-terminating or they terminate in linear time.

The central problem for workflow nets is to verify a correctness notion called soundness. In this paper we are interested in generalised soundness which, unlike other variants of soundness, preserves desirable properties like composition. We prove that verifying generalised soundness is coNPcomplete for terminating workflow nets.

In general the problem is PSPACE-complete, thus intractable. We utilize insights from the coNP upper bound to implement a procedure for generalised soundness using MILP solvers. Our novel approach is a semi-procedure in general, but is complete on the rich class of terminating workflow nets, which contains around 90% of benchmarks in a widely-used benchmark suite. The previous state-of-the-art approach for the problem is a different semi-procedure which is complete on the incomparable class of so-called free-choice workflow nets, thus our implementation improves on and complements the state-of-the-art.

Lastly, we analyse a variant of termination time that allows parallelism. This is a natural extension, as workflow nets are a concurrent model by design, but the prior termination time analysis assumes sequential behavior of the workflow net. The sequential and parallel termination times can be seen as upper and lower bounds on the time a process represented as a workflow net needs to be executed. In our experimental section we show that on some benchmarks the two bounds differ significantly, which agrees with the intuition that parallelism is inherent to workflow nets.

**Keywords:** Workflow · Soundness · Fast termination · Generalised soundness · Polynomial time

This work was partially supported by ERC grant INFSYS: 501-D110-60-0196287. P. Offtermatt is now at Informal Systems, Munich, Germany.

c The Author(s) 2023

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 132–155, 2023. https://doi.org/10.1007/978-3-031-37706-8\_7

### **1 Introduction**

*Petri nets* are a popular formalism to model problems in software verification [22], business processes [1] and many more (see [42] for a survey). One of the fundamental problems for such models is the *termination problem*, *i.e.* whether the lengths of all runs are universally bounded. There are two natural variants of this problem. First, if the initial configuration is fixed, then the problem is effectively equivalent to the boundedness problem, known to be EXPSPACE-complete for Petri nets [36,41]. Second, if termination must hold for all initial configurations, the problem is known to be in polynomial time [30], and such nets are known as *structurally terminating*. In this paper we are interested in the latter variant.

Termination time is usually studied for *vector addition systems with states* (VASS), an extension of Petri nets that allows the use of control states. In particular, the aforementioned EXPSPACE and polynomial-time bounds carry over to VASS. In 2018, a deeper study of the termination problem for VASS was initiated [12]. This study concerns the asymptotics of the function f(n) bounding the length of runs, where n bounds the size of the initial configuration. The focus is particularly on classes where f(n) is a polynomial function, suggesting that such classes are more relevant for practical applications. This line of work was later continued for variants of VASS involving probabilities [11] and games [31].

For VASS the function f(n) can asymptotically be as large as F_i(n) in the Grzegorczyk hierarchy for any finite i (recall that F_3(n) is nonelementary and F_ω(n) is Ackermannian) [35,43]. It was known that for terminating Petri nets many problems are considerably simpler [40]. However, to the best of our knowledge, the asymptotic behaviour of f(n) was not studied for Petri nets.

*Our Contributions.* In this paper we focus on *workflow nets*, a class of Petri nets widely used to model business processes [1]. Our first result is the following dichotomy: any workflow net is either non-terminating or f(n) is linear. This confirms the intuition about fast termination of practical models [12]. In our proof, we follow the intuition of applying linear algebra from [40] and rely on recent results on workflow nets [9]. We further show that the optimal constant a_N such that f(n) = a_N · n can be computed in polynomial time. The core of this computation relies on a reduction to continuous Petri nets [19], a well-known relaxation of Petri nets. Then we can apply standard tools from the theory of continuous Petri nets, where many problems are in polynomial time [7,19].
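The linear-algebra intuition behind linear termination can be illustrated with a standard ranking-function argument (this is a toy sketch, not the paper's polynomial-time algorithm via continuous Petri nets): if there are strictly positive place weights y such that firing any transition strictly decreases the weighted token count, then every run from marking m₀ has length at most y · m₀, which is linear in the size of m₀. A naive brute-force search over small integer weights, with C the places × transitions incidence matrix:

```python
from itertools import product

def linear_termination_weights(C, max_weight=5):
    """Search for strictly positive integer place weights y with
    sum_p y[p] * C[p][t] <= -1 for every transition t.  Such a y is a
    linear ranking function: firing any transition decreases y . m by at
    least 1, so run length from m0 is bounded by y . m0."""
    n_places, n_trans = len(C), len(C[0])
    for y in product(range(1, max_weight + 1), repeat=n_places):
        if all(sum(y[p] * C[p][t] for p in range(n_places)) <= -1
               for t in range(n_trans)):
            return y
    return None
```

For instance, a transition that merely moves a token from p₁ to p₂ (column (-1, 1)) admits the weights y = (2, 1), while a token-producing transition admits no such weights. In practice one would solve this as a linear program rather than enumerate; the enumeration only keeps the sketch dependency-free.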

For workflow nets, the central decision problems are related to soundness. There are many variants of this problem (see [2] for a survey). For example, k*-soundness* intuitively verifies that k started processes eventually properly terminate. We are interested in *generalised soundness*, which verifies whether k-soundness holds for all k [25–27]. The exact complexity of the most popular soundness problems was established only recently, in 2022 [9]. Generalised soundness is, surprisingly, PSPACE-complete. Other variants, like k-soundness, are EXPSPACE-complete, thus computationally harder, despite having a seemingly less complex definition. Moreover, unlike k-soundness and other variants, generalised soundness preserves desirable properties like composition [26]. Building on our first result, *i.e.* the dichotomy between non-terminating and linearly terminating workflow nets, our second result is that generalised soundness is coNP-complete for terminating workflow nets.

Finally, we observe that the asymptotics of f(n) are defined with the implicit assumption that transitions are fired sequentially. Since workflow nets are models for parallel executions it is natural to expect that runs would also be performed in parallel. Our definition of parallel executions is inspired by similar concepts for time Petri nets, and can be seen as a particular case [5]. We propose a definition of the optimal running time of runs exploiting parallelism and denote this time g(n), where n bounds the size of the initial marking. We show that the asymptotic behaviour of g(n), similar to f(n), can be computed in polynomial time, for workflow nets with mild assumptions. Together, the two functions f(n) and g(n) can be seen as (pessimistic) upper bound and (optimistic) lower bound on the time needed for the workflow net to terminate.

*Experiments.* Based on our insights, we implement several procedures for problems related to termination in workflow nets. Namely, we implement our algorithms for checking termination, for deciding generalised soundness of workflow nets and for computing the asymptotic behaviour of f(n). We additionally implement procedures to compute f(k), g(k) and decide k-soundness for terminating workflow nets. To demonstrate the efficacy of our procedures, we test our implementation on a popular and well-studied benchmark suite of 1382 workflow nets, originally introduced in [18]. It turns out that the vast majority of instances (roughly 90%) are terminating, thus the class of terminating workflow nets seems highly relevant in practice. Further, we positively evaluate our algorithm for generalised soundness against a recently proposed state-of-the-art approach [10] which semi-decides the property in general, and is further exact on the class of *free-choice workflow nets* [3]. Interestingly, our novel approach for generalised soundness is also a semi-procedure in general, but precise on terminating workflow nets. The approach from [10] is implemented as an ∃∀-formula of FO(Q, <, +), while our approach manages to avoid any quantifier alternation. It turns out that our approach is faster on over 95% of benchmark instances, and thus significantly improves upon the state of the art. The mean analysis time for our approach is just 12.8 ms, while it is about 2 s for the previous state of the art. In addition, the classes of free-choice and terminating workflow nets are incomparable, thus our approach complements the state of the art.

*Related Work.* For general Petri nets and VASS the most well-known problem is reachability, recently shown to be Ackermann-complete [14,33,34]. Despite its high complexity, there are tools for the problem [16,45], including some based on integer and continuous relaxations [6,8,21]. Reachability was also studied in the context of terminating models. In particular, it is PSPACE-complete for structurally terminating Petri nets [40] and EXPSPACE-complete for polynomially terminating VASS [32].

Most algorithms for soundness are based on reductions to reachability [1]; this is the case for the first algorithms for generalised soundness [25,27]. However, such reductions only imply Ackermannian upper bounds on the problem, while a direct study yielded elementary complexities [9].

A different class of approaches for soundness relies on *reduction rules*, which can be applied iteratively to reduce the size of a net while exactly preserving soundness [4,39]. These approaches are not precise in general, but can be for subclasses, *e.g.* for *live and bounded* free-choice workflow nets [15]. We use a certain set of reduction rules [13] for generalised soundness in our experimental evaluation.

There exist many established tools and frameworks for the analysis of workflow nets, for example Woflan [44], WoPeD [20], and ProM [17]. However, when it comes to soundness problems, these tools typically focus on $k$-soundness, with a particular focus on $k = 1$ (except for the tool discussed in [10]).

*Organisation.* In Sect. 2 we define the models, problems and basic notation. In Sect. 3 we prove the dichotomy between non-terminating and linear workflow nets. Then, we show how to compute the linear constants for terminating workflow nets in Sect. 4. Building on the dichotomy, we show that generalised soundness is coNP-complete for linear workflow nets in Sect. 5. In Sect. 6 we define and compute a variant of termination time that takes parallelism into account. We present our experimental results in Sect. 7. Most proofs can be found in the appendix.

### **2 Preliminaries**

We write $\mathbb{N}$, $\mathbb{N}_{>0}$, $\mathbb{Z}$, $\mathbb{Q}$ and $\mathbb{Q}_{\geq 0}$ for the naturals (including 0), the naturals without 0, the integers, the rationals, and the nonnegative rationals, respectively.

Let $N$ be a set of numbers, *e.g.* $N = \mathbb{N}$. For $d, d_1, d_2 \in \mathbb{N}_{>0}$ we write $N^d$ for the set of vectors with elements from $N$ in dimension $d$. Similarly, $N^{d_1 \times d_2}$ is the set of matrices with $d_1$ rows and $d_2$ columns and elements from $N$. We use bold font for vectors and matrices. For $a \in \mathbb{Q}$ and $d \in \mathbb{N}_{>0}$, we write $\boldsymbol{a}^d := (a, a, \ldots, a) \in \mathbb{Q}^d$ (or $\boldsymbol{a}$ if $d$ is clear from context). In particular, $\boldsymbol{0}^d = \boldsymbol{0}$ is the zero vector.

Sometimes it is more convenient to have vectors with coordinates in a finite set. Thus, for a finite set $S$, we write $\mathbb{N}^S$, $\mathbb{Z}^S$, and $\mathbb{Q}^S$ for the sets of vectors over naturals, integers and rationals, respectively. Given a vector $\boldsymbol{v}$ and an element $s \in S$, we write $\boldsymbol{v}(s)$ for the value $\boldsymbol{v}$ assigns to $s$.

Given $\boldsymbol{v}, \boldsymbol{w} \in \mathbb{Q}^S$, we write $\boldsymbol{v} \leq \boldsymbol{w}$ if $\boldsymbol{v}(s) \leq \boldsymbol{w}(s)$ for all $s \in S$, and $\boldsymbol{v} < \boldsymbol{w}$ if $\boldsymbol{v} \leq \boldsymbol{w}$ and $\boldsymbol{v}(s) < \boldsymbol{w}(s)$ for some $s \in S$. The *size* of $S$, denoted $|S|$, is the number of elements in $S$. We define the *norm* of a vector as $\|\boldsymbol{v}\| := \max_{s \in S} |\boldsymbol{v}(s)|$, and the norm of a matrix $\boldsymbol{A} \in \mathbb{Q}^{m \times n}$ as $\|\boldsymbol{A}\| := \max_{1 \leq i \leq m,\, 1 \leq j \leq n} |\boldsymbol{A}(i, j)|$. For a set $S \subseteq \mathbb{Q}^d$, we denote by $\overline{S} \subseteq \mathbb{R}^d$ the closure of $S$ in the Euclidean space.

#### **2.1 (Integer) Linear Programs**

Let $n, m \in \mathbb{N}_{>0}$, $\boldsymbol{A} \in \mathbb{Z}^{m \times n}$, and $\boldsymbol{b} \in \mathbb{Z}^m$. We say that $G := \boldsymbol{A}\boldsymbol{x} \leq \boldsymbol{b}$ is a *system of linear inequalities* with $m$ inequalities and $n$ variables. The *norm* of a system $G$ is defined as $\|G\| := \|\boldsymbol{A}\| + \|\boldsymbol{b}\| + m + n$. An $(m \times n)$*-ILP*, short for *integer linear program*, is a system of linear inequalities with $m$ inequalities and $n$ variables, where we are interested in the integer solutions. An $(m \times n)$*-LP* is such a system where we are interested in the rational solutions. We use the term MILP, short for *mixed integer linear program*, for a system where some variables are allowed to take on rational values, while others are restricted to integer values.

We allow syntactic sugar in ILPs and LPs, such as constraints $x \geq y$, $x = y$, and $x < y$ (the latter in the case of ILPs). Sometimes we are interested in finding optimal solutions. This means we have an objective function, formally a linear function on the variables of the system, and look for a solution that either maximizes or minimizes the value of that function. For LPs, finding an optimal solution can be done in polynomial time, while this is NP-complete for ILPs and MILPs.
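To make the LP side concrete, here is a minimal sketch that optimises over a small system of linear inequalities using SciPy's `linprog`; the example system is our own illustration, not one from the paper.

```python
# Sketch: optimising over a small LP with SciPy (illustrative example only).
# Maximise x1 + x2 subject to x1 + 2*x2 <= 4, x1 - x2 <= 1, x1, x2 >= 0.
from scipy.optimize import linprog

# linprog minimises, so we negate the objective to maximise x1 + x2.
res = linprog(c=[-1, -1],
              A_ub=[[1, 2], [1, -1]],   # rows: x1 + 2*x2 <= 4 and x1 - x2 <= 1
              b_ub=[4, 1],
              bounds=[(0, None), (0, None)],
              method="highs")
assert res.status == 0                  # an optimal solution was found
print(-res.fun, list(res.x))            # optimum 3.0, attained at x1 = 2, x2 = 1
```

Restricting some or all variables to integers turns the same system into a MILP or ILP; solvers then typically switch to branch-and-bound, matching the complexity gap stated above.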

#### **2.2 Petri Nets**

A *Petri net* $\mathcal{N}$ is a triple $(P, T, F)$, where $P$ is a finite set of *places*; $T$ is a finite set of *transitions* such that $T \cap P = \emptyset$; and $F : ((P \times T) \cup (T \times P)) \to \mathbb{N}$ is a function describing its *arcs*. A *marking* is a vector $\boldsymbol{m} \in \mathbb{N}^P$. We say that $\boldsymbol{m}(p)$ is the number of *tokens* in place $p \in P$, and that $p$ is *marked* if $\boldsymbol{m}(p) > 0$. To write markings, we list only non-zero token amounts. For example, $\boldsymbol{m} = \{p_1 : 2, p_2 : 1\}$ is the marking $\boldsymbol{m}$ with $\boldsymbol{m}(p_1) = 2$, $\boldsymbol{m}(p_2) = 1$ and $\boldsymbol{m}(p) = 0$ for all $p \in P \setminus \{p_1, p_2\}$.

Let $t \in T$. We define the vector ${}^\bullet t \in \mathbb{N}^P$ by ${}^\bullet t(p) := F(p, t)$ for $p \in P$. Similarly, the vector $t^\bullet \in \mathbb{N}^P$ is defined by $t^\bullet(p) := F(t, p)$ for $p \in P$. We write the *effect* of $t$ as $\Delta(t) := t^\bullet - {}^\bullet t$. A transition $t$ is *enabled* in a marking $\boldsymbol{m}$ if $\boldsymbol{m} \geq {}^\bullet t$. If $t$ is enabled in the marking $\boldsymbol{m}$, we can *fire* it, which leads to the marking $\boldsymbol{m}' := \boldsymbol{m} + \Delta(t)$, which we denote $\boldsymbol{m} \xrightarrow{t} \boldsymbol{m}'$. We write $\boldsymbol{m} \to \boldsymbol{m}'$ if there exists some $t \in T$ such that $\boldsymbol{m} \xrightarrow{t} \boldsymbol{m}'$.
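The firing rule can be sketched in a few lines of Python; the dictionary encoding of markings and the tiny two-transition net are our own illustration, not the paper's notation.

```python
# Minimal sketch of the firing semantics (our own encoding).
# A transition t is a pair (pre, post) of dicts: •t and t• over the places it touches.
def effect(t):
    """Delta(t) = t• - •t, as a dict over the places touched by t."""
    pre, post = t
    places = set(pre) | set(post)
    return {p: post.get(p, 0) - pre.get(p, 0) for p in places}

def enabled(m, t):
    """t is enabled in marking m iff m >= •t componentwise."""
    pre, _ = t
    return all(m.get(p, 0) >= n for p, n in pre.items())

def fire(m, t):
    """Fire t (assumed enabled): m' = m + Delta(t); zero entries are dropped."""
    assert enabled(m, t)
    m2 = dict(m)
    for p, d in effect(t).items():
        m2[p] = m2.get(p, 0) + d
        if m2[p] == 0:
            del m2[p]
    return m2

# A hypothetical net: t1 moves a token from place "i" to "p", t2 from "p" to "f".
t1 = ({"i": 1}, {"p": 1})
t2 = ({"p": 1}, {"f": 1})
m = {"i": 2}
m = fire(m, t1)            # one token moves from i to p
m = fire(fire(m, t1), t2)  # second token moves to p, then one token reaches f
print(m)                   # {'p': 1, 'f': 1}
```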

A sequence of transitions $\pi = t_1 t_2 \ldots t_n$ is called a *run*. We denote the *length* of $\pi$ as $|\pi| := n$. A run $\pi$ is *enabled* in a marking $\boldsymbol{m}$ iff $\boldsymbol{m} \xrightarrow{t_1} \boldsymbol{m}_1 \xrightarrow{t_2} \boldsymbol{m}_2 \xrightarrow{t_3} \cdots \boldsymbol{m}_{n-1} \xrightarrow{t_n} \boldsymbol{m}'$ for some markings $\boldsymbol{m}_1, \boldsymbol{m}_2, \ldots, \boldsymbol{m}' \in \mathbb{N}^P$. The set of all runs enabled in $\boldsymbol{m}$ is denoted $\mathrm{Runs}^{\boldsymbol{m}}_{\mathcal{N}}$, *i.e.* $\pi \in \mathrm{Runs}^{\boldsymbol{m}}_{\mathcal{N}}$ iff $\pi$ is enabled in $\boldsymbol{m}$. The *effect* of $\pi$ is $\Delta(\pi) := \sum_{i \in [1..n]} \Delta(t_i)$. *Firing* $\pi$ from $\boldsymbol{m}$ leads to a marking $\boldsymbol{m}'$, denoted $\boldsymbol{m} \xrightarrow{\pi} \boldsymbol{m}'$, iff $\pi \in \mathrm{Runs}^{\boldsymbol{m}}_{\mathcal{N}}$ and $\boldsymbol{m}' = \boldsymbol{m} + \Delta(\pi)$. We denote by $\to^*$ the reflexive, transitive closure of $\to$. Given two runs $\pi = t_1 t_2 \ldots t_n$ and $\pi' = t'_1 t'_2 \ldots t'_{n'}$, we denote by $\pi\pi'$ the concatenation $t_1 t_2 \ldots t_n t'_1 t'_2 \ldots t'_{n'}$.

The size of a Petri net is defined as $|\mathcal{N}| := |P| + |T| + |F|$. We define the *norm* of $\mathcal{N}$ as $\|\mathcal{N}\| := \|F\| + 1$, where we view $F$ as a vector in $\mathbb{N}^{(P \times T) \cup (T \times P)}$.

We also consider several variants of the firing semantics of Petri nets which we will need throughout the paper. In the *integer semantics*, we consider markings over $\mathbb{Z}^P$, and transitions can be fired without being enabled. To denote the firing and reachability relations, we use the notations $\to_{\mathbb{Z}}$ and $\to^*_{\mathbb{Z}}$. In the *continuous semantics* [19], we consider markings over $\mathbb{Q}^P_{\geq 0}$. Given $t \in T$ and a *scaling factor*

**Fig. 1.** A Petri net with places $p_1, p_2, p_3, p_4$ and transitions $t_1, t_2$. Marking $\{p_1 : 2, p_4 : 1\}$ is drawn. No transition is enabled.

$\beta \in \mathbb{Q}_{\geq 0}$,¹ the effect of firing $\beta t$ is $\Delta(\beta t) := \beta \cdot \Delta(t)$. Further, $\beta t$ is enabled in a marking $\boldsymbol{m}$ iff $\beta \cdot {}^\bullet t \leq \boldsymbol{m}$. We use $\to_{\mathbb{Q}_{\geq 0}}$ for the continuous semantics, that is, $\boldsymbol{m} \xrightarrow{\beta t}_{\mathbb{Q}_{\geq 0}} \boldsymbol{m}'$ means that $\beta t$ is enabled in $\boldsymbol{m}$ and $\boldsymbol{m}' = \boldsymbol{m} + \Delta(\beta t)$. A *continuous run* $\pi$ is a sequence of factors and transitions $\beta_1 t_1 \beta_2 t_2 \ldots \beta_n t_n$. Enabledness and firing are extended to continuous runs: $\boldsymbol{m} \xrightarrow{\pi}_{\mathbb{Q}_{\geq 0}} \boldsymbol{m}'$ holds iff there exist $\boldsymbol{m}_1, \ldots, \boldsymbol{m}_{n-1}$ such that $\boldsymbol{m} \xrightarrow{\beta_1 t_1}_{\mathbb{Q}_{\geq 0}} \boldsymbol{m}_1 \xrightarrow{\beta_2 t_2}_{\mathbb{Q}_{\geq 0}} \cdots \boldsymbol{m}_{n-1} \xrightarrow{\beta_n t_n}_{\mathbb{Q}_{\geq 0}} \boldsymbol{m}'$. The *length* of $\pi$ is $|\pi|_c := \sum_{i=1}^{n} \beta_i$. Given $\alpha \in \mathbb{Q}_{\geq 0}$ and a run $\pi = \beta_1 t_1 \beta_2 t_2 \ldots \beta_n t_n$, we write $\alpha\pi$ to denote the run $(\alpha\beta_1) t_1 (\alpha\beta_2) t_2 \ldots (\alpha\beta_n) t_n$. We introduce a lemma stating that continuous runs can be rescaled.

**Lemma 1 (Lemma 12(1) in** [19]**).** *Let* $\alpha \in \mathbb{Q}_{\geq 0}$*. Then* $\boldsymbol{m} \xrightarrow{\pi}_{\mathbb{Q}_{\geq 0}} \boldsymbol{m}'$ *if and only if* $\alpha\boldsymbol{m} \xrightarrow{\alpha\pi}_{\mathbb{Q}_{\geq 0}} \alpha\boldsymbol{m}'$*.*

Each run under the normal semantics or the integer semantics is *equivalent* to a continuous run, *i.e.* $t_1 t_2 \ldots t_n \approx 1 t_1\, 1 t_2 \ldots 1 t_n$. Given $\pi \in \mathrm{Runs}^{\boldsymbol{m}}_{\mathcal{N}}$ (*i.e.* a standard run) we define $\alpha\pi := \alpha\pi_c$, where $\pi_c \approx \pi$ is a continuous run. If $\pi_c = \beta_1 t_1 \ldots \beta_n t_n$ with $\beta_i \in \mathbb{N}$ for all $i \in \{1, \ldots, n\}$, then we also call $\pi_c$ a (standard) run, *i.e.* the run where every transition $t_i$ is repeated $\beta_i$ times.

We define the set of continuous runs enabled from $\boldsymbol{m} \in \mathbb{N}^P$ in $\mathcal{N}$ as $\mathrm{CRuns}^{\boldsymbol{m}}_{\mathcal{N}}$. The *Parikh image* of a continuous run $\pi = \beta_1 t_1 \beta_2 t_2 \ldots \beta_n t_n$ is the vector $\boldsymbol{R}_\pi \in \mathbb{Q}^T$ such that $\boldsymbol{R}_\pi(t) = \sum_{i \mid t_i = t} \beta_i$. For a (standard) run $\pi$ we define its Parikh image as $\boldsymbol{R}_\pi := \boldsymbol{R}_{\pi_c}$, where $\pi_c \approx \pi$. Given a vector $\boldsymbol{R} \in \mathbb{Q}^T_{\geq 0}$, we define $\Delta(\boldsymbol{R}) := \sum_{t \in T} \boldsymbol{R}(t) \cdot \Delta(t)$, ${}^\bullet\boldsymbol{R} := \sum_{t \in T} {}^\bullet t \cdot \boldsymbol{R}(t)$ and $\boldsymbol{R}^\bullet := \sum_{t \in T} t^\bullet \cdot \boldsymbol{R}(t)$. Note that $\boldsymbol{R}$ is essentially a run without an order imposed on the transitions. For ease of notation, we define $\Delta(T)$ as a matrix with columns indexed by $T$ and rows indexed by $P$, where $\Delta(T)(t)(p) := \Delta(t)(p)$. Then $\Delta(\boldsymbol{R}) = \Delta(T)\boldsymbol{R}$.
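The fact that the effect of a run depends only on its Parikh image can be illustrated as follows; the net and its transition effects are hypothetical, and the encoding is ours.

```python
# Sketch: the effect Delta(pi) depends only on the Parikh image R_pi.
from collections import Counter

# Effects Delta(t) of a hypothetical net's transitions, as dicts over places.
delta = {
    "t1": {"i": -1, "p": 1},
    "t2": {"p": -1, "f": 1},
}

def parikh(run):
    """Parikh image R_pi: how often each transition occurs in the run."""
    return Counter(run)

def effect_of(R):
    """Delta(R) = sum over t of R(t) * Delta(t), dropping zero entries."""
    out = {}
    for t, count in R.items():
        for p, d in delta[t].items():
            out[p] = out.get(p, 0) + count * d
    return {p: v for p, v in out.items() if v != 0}

# Two runs with the same Parikh image have the same effect:
assert effect_of(parikh(["t1", "t2", "t1", "t2"])) == \
       effect_of(parikh(["t1", "t1", "t2", "t2"]))
print(effect_of(parikh(["t1", "t1", "t2", "t2"])))  # {'i': -2, 'f': 2}
```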

*Example 1.* Consider the Petri net drawn in Fig. 1. Marking $\boldsymbol{m} := \{p_1 : 2, p_4 : 1\}$ enables no transitions. However, we have $\boldsymbol{m} \xrightarrow{t_1 t_2}_{\mathbb{Z}} \{p_3 : 2\}$. We also have $\boldsymbol{m} \xrightarrow{t_2 t_1}_{\mathbb{Z}} \{p_3 : 2\}$, since the transition order does not matter under the integer semantics. Thus, when we take $\boldsymbol{R} = \{t_1 : 1, t_2 : 1\}$, we have $\boldsymbol{m} \xrightarrow{\boldsymbol{R}}_{\mathbb{Z}} \{p_3 : 2\}$.

Under the continuous semantics we can fire $\frac{1}{2}t_1$, which is impossible under the normal semantics. For example, we have $\boldsymbol{m} \xrightarrow{\frac{1}{2}t_1}_{\mathbb{Q}_{\geq 0}} \{p_1 : 1, p_2 : \frac{1}{2}\} \xrightarrow{\frac{1}{2}t_2}_{\mathbb{Q}_{\geq 0}} \{p_1 : 1, p_3 : 1, p_4 : 1\} \xrightarrow{\frac{1}{3}t_1}_{\mathbb{Q}_{\geq 0}} \{p_1 : \frac{1}{3}, p_2 : \frac{1}{3}, p_3 : 1, p_4 : \frac{2}{3}\}$.

¹ Sometimes scaling factors are defined to be at most 1. The definitions are equivalent: scaling larger than 1 can be done by firing the same transition multiple times.

#### **2.3 Workflow Nets**

A *workflow net* is a Petri net $\mathcal{N} = (P, T, F)$ such that:

– there is a distinguished *initial place* $\mathrm{i} \in P$ with no incoming arcs, *i.e.* $F(t, \mathrm{i}) = 0$ for all $t \in T$; and
– there is a distinguished *final place* $\mathrm{f} \in P$ with no outgoing arcs, *i.e.* $F(\mathrm{f}, t) = 0$ for all $t \in T$.

We say that $\mathcal{N}$ is *$k$-sound* iff for all $\boldsymbol{m}$, $\{\mathrm{i}: k\} \to^* \boldsymbol{m}$ implies $\boldsymbol{m} \to^* \{\mathrm{f}: k\}$. Further, we say $\mathcal{N}$ is *generalised sound* iff it is $k$-sound for all $k$.

A place $p \in P$ is *nonredundant* if $\{\mathrm{i}: k\} \to^* \boldsymbol{m}$ for some $k \in \mathbb{N}$ and marking $\boldsymbol{m}$ with $\boldsymbol{m}(p) > 0$, and *redundant* otherwise. We accordingly say that $\mathcal{N}$ is *nonredundant* if all $p \in P$ are nonredundant; otherwise $\mathcal{N}$ is *redundant*. A redundant workflow net can be made nonredundant by removing each redundant place $p \in P$ and all transitions $t$ such that ${}^\bullet t(p) > 0$ or $t^\bullet(p) > 0$. Note that this does not impact the behaviour of the workflow net, as the discarded transitions could not be fired in the original net. A polynomial-time saturation procedure can identify redundant places, see [27, Thm. 8, Def. 10, Sect. 3.2] and [9, Prop. 5.2].
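A saturation procedure of this kind can be sketched as follows; this is only a high-level illustration of the idea (with a hypothetical net), not the exact algorithm of [27] or [9]. It relies on the fact that runs from $\{\mathrm{i}: k_1\}$ and $\{\mathrm{i}: k_2\}$ can be combined into a run from $\{\mathrm{i}: k_1 + k_2\}$, so a transition becomes fireable once each place in its preset is markable on its own.

```python
# Saturation sketch for identifying markable (nonredundant) places.
def nonredundant_places(transitions, initial="i"):
    """transitions: list of (pre, post) dicts over place names.
    Saturate: once every place in a transition's preset is markable,
    the transition is fireable and all places in its postset become markable."""
    markable = {initial}
    changed = True
    while changed:
        changed = False
        for pre, post in transitions:
            if set(pre) <= markable and not set(post) <= markable:
                markable |= set(post)
                changed = True
    return markable

# Hypothetical net: t1: i -> p and t2: p + q -> f. Place q can never be
# marked, so q is redundant, and hence so is f (t2 can never fire).
ts = [({"i": 1}, {"p": 1}), ({"p": 1, "q": 1}, {"f": 1})]
print(sorted(nonredundant_places(ts)))  # ['i', 'p']
```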

If $\mathcal{N}$ is a workflow net, we write $\mathrm{Runs}^k_{\mathcal{N}}$ for the set of runs enabled from the marking $\{\mathrm{i}: k\}$, and $\mathrm{CRuns}^k_{\mathcal{N}}$ for the analogous set of continuous runs. Lemma 1 implies that if $\pi \in \mathrm{Runs}^k_{\mathcal{N}}$ then $\frac{1}{k}\pi \in \mathrm{CRuns}^1_{\mathcal{N}}$. The converse need not hold, as the rescaled continuous run need not have natural coefficients.

*Example 2.* The Petri net in Fig. 1 can be seen as a workflow net with initial place $p_1$ and final place $p_3$. The workflow net is not $k$-sound for any $k$. Further, the net is redundant: $\{\mathrm{i}: k\}$ is a deadlock for every $k$, so places $p_2$, $p_3$ and $p_4$ are redundant.

#### **2.4 Termination Complexity**

Let $\mathcal{N}$ be a workflow net. Let us define $\mathrm{MaxTime}_{\mathcal{N}}(k)$ as the supremum of the lengths of runs enabled in $\{\mathrm{i}: k\}$, that is, $\mathrm{MaxTime}_{\mathcal{N}}(k) = \sup\{|\pi| \mid \pi \in \mathrm{Runs}^k_{\mathcal{N}}\}$. We say that $\mathcal{N}$ is *terminating* if $\mathrm{MaxTime}_{\mathcal{N}}(k) \neq \infty$ for all $k \in \mathbb{N}_{>0}$; otherwise it is *non-terminating*.
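For intuition, $\mathrm{MaxTime}_{\mathcal{N}}(k)$ can be computed by brute force on very small terminating nets; the encoding and the example net below are our own, and this enumeration is of course exponential in general, which is precisely what the bounds in this paper avoid.

```python
# Brute-force sketch of MaxTime for a tiny terminating net (our own encoding).
from functools import lru_cache

# Hypothetical net: t1 moves a token from i to p, t2 from p to f,
# so every initial token contributes at most two transition firings.
TRANSITIONS = [({"i": 1}, {"p": 1}), ({"p": 1}, {"f": 1})]

def max_time(k):
    """Length of a longest run enabled in {i: k}; the net must be terminating."""
    @lru_cache(maxsize=None)
    def longest(m):  # m: marking as a sorted tuple of (place, tokens) pairs
        md = dict(m)
        best = 0
        for pre, post in TRANSITIONS:
            if all(md.get(p, 0) >= n for p, n in pre.items()):
                m2 = dict(md)
                for p, n in pre.items():
                    m2[p] -= n
                for p, n in post.items():
                    m2[p] = m2.get(p, 0) + n
                succ = tuple(sorted((p, n) for p, n in m2.items() if n > 0))
                best = max(best, 1 + longest(succ))
        return best
    return longest((("i", k),))

print([max_time(k) for k in (1, 2, 3)])  # [2, 4, 6]: linear termination time
```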

We say that $\mathcal{N}$ has *polynomial termination time* if there exist $d \in \mathbb{N}$ and $\ell \in \mathbb{R}$ such that for all $k$,

$$\mathrm{MaxTime}_{\mathcal{N}}(k) \le \ell \cdot k^d. \tag{1}$$

Further, $\mathcal{N}$ has *linear termination time* if Eq. (1) holds with $d = 1$. Even more fine-grained, $\mathcal{N}$ has *$a$-linear termination time* if Eq. (1) holds for $\ell = a$ and $d = 1$. Note that any net with $a$-linear termination time also has $(a + b)$-linear termination time for all $b \geq 0$. For ease of notation, we call workflow nets that have linear termination time *linear workflow nets*, and similarly for $a$-linear.

We define $a_{\mathcal{N}} := \inf\{a \in \mathbb{R} \mid \mathcal{N} \text{ is } a\text{-linear}\}$. Note that in particular $\mathcal{N}$ is $a_{\mathcal{N}}$-linear (because the inequality in Eq. (1) is not strict) and that $a_{\mathcal{N}}$ is the smallest constant with this property.

**Fig. 2.** Two workflow nets with the initial marking $\{\mathrm{i}: 1\}$. The workflow net on the left-hand side is terminating in linear time. The workflow net on the right-hand side is the same as the one on the left, but with one extra transition $t_4$. It is non-terminating.

*Example 3.* The net on the left-hand side of Fig. 2 is terminating. For example, from $\{\mathrm{i}: 2\}$ all runs have length at most 3. It is easy to see that from $\{\mathrm{i}: k\}$ all runs have length at most $\frac{3}{2}k$ (*e.g.* the run $(t_1 t_2 t_3)^{k/2}$). The net has $a_{\mathcal{N}} = \frac{3}{2}$.

The net on the right-hand side is non-terminating. From $\{\mathrm{i}: 2\}$, all runs of the form $t_1 t_2 t_4^*$ are enabled. Note that while the net is non-terminating, all runs from $\{\mathrm{i}: 1\}$ have length at most 1 (because $t_3$ and $t_4$ are never enabled).

Our definition of termination time is particular to workflow nets, as there it is natural to have only i marked initially. It differs from the original definition of termination complexity in [12]. In [12] VASS are considered instead of Petri nets, and the initial marking is arbitrary. The termination complexity is measured in the size of the encoding of *m*. The core difference is that in [12] it is possible to have a fixed number of tokens in some places, but arbitrarily many tokens in other places. In Sect. 3 we show an example that highlights the difference between the two notions. Our definition is a more natural fit for workflow nets, and will allow us to reason about soundness. Indeed, our particular definition of termination time allows us to obtain the coNP-completeness result of generalised soundness for linear workflow nets in Sect. 5.

#### **3 A Dichotomy of Termination Time in Workflow Nets**

Let us exhibit behaviour in Petri nets that cannot occur in workflow nets. Consider the net drawn in black in Fig. 3 and the family of initial markings $\{\{p_1 : 1, s_1 : 1, b : n\} \mid n \in \mathbb{N}\}$. From the marking $\{p_1 : 1, s_1 : 1, b : n\}$, all runs have finite length, yet some run has length exponential in $n$. From the marking $\{p_1 : k, s_1 : 1, b : n\}$, the sequence $(t_1 t_2)^k t_4 (t_3)^{2k} t_5$ results in the marking $\{p_1 : 2k, s_1 : 1, b : n - 1\}$. Thus, following this pattern $n$ times leads from $\{p_1 : 1, s_1 : 1, b : n\}$ to $\{p_1 : 2^n, s_1 : 1\}$. This behaviour crucially requires keeping a single token in $s_1$ while having $n$ tokens in $b$.

We can transform the net into a workflow net, as demonstrated by the colored part of Fig. 3. However, observe that then

$$\{\mathrm{i} : 2\} \to^{t_{\mathrm{f}} t_{\mathrm{f}} t_{\mathrm{d}}} \{p_1 : 2, s_1 : 1, s_2 : 1, b : 1\} \to^{t_1 t_2 t_3} \{p_1 : 2, s_1 : 1, s_2 : 1, b : 1, p_3 : 1\}.$$

Note that the sequence t1t2t<sup>3</sup> strictly increased the marking. It can thus be fired arbitrarily many times, and the workflow net is non-terminating.

It turns out that, contrary to standard Petri nets, there exist no workflow nets with exponential termination time.² Instead, there is a dichotomy between non-termination and linear termination time.

**Theorem 1.** *Every workflow net* $\mathcal{N}$ *is either non-terminating or linear. Moreover,* $\mathrm{MaxTime}_{\mathcal{N}}(k) \leq ak$ *for some* $a \leq \|\mathcal{N}\|^{\mathrm{poly}(|\mathcal{N}|)}$*.*

**Fig. 3.** In black: a Petri net $\mathcal{N}$ adapted from [28, Lemma 2.8]. It enables a run of length exponential in $n$ from the marking $\{p_1 : 1, s_1 : 1, b : n\}$. In color: additional places and transitions, which make $\mathcal{N}$ a workflow net.

As explained in Sect. 2.3, we can assume that $\mathcal{N}$ is nonredundant, *i.e.* for all $p \in P$ there exists $k \in \mathbb{N}$ such that $\{\mathrm{i}: k\} \to^* \boldsymbol{m}$ with $\boldsymbol{m}(p) > 0$. The first important ingredient is the following lemma.

**Lemma 2.** *Let* $\mathcal{N} = (P, T, F)$ *be a nonredundant workflow net. Then* $\mathcal{N}$ *is non-terminating iff there exists a nonzero* $\boldsymbol{R} \in \mathbb{N}^T$ *such that* $\Delta(\boldsymbol{R}) \geq \boldsymbol{0}$*.*
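Since a nonzero rational solution $\boldsymbol{R} \geq \boldsymbol{0}$ of $\Delta(T)\boldsymbol{R} \geq \boldsymbol{0}$ can be scaled to a natural one, the condition of Lemma 2 can be tested with an LP. A sketch with SciPy over hypothetical effect matrices (encoding and examples are ours, not the paper's implementation):

```python
# LP sketch for the Lemma 2 condition (illustrative only).
# It suffices to maximise sum(R) subject to Delta(T)R >= 0, R >= 0, sum(R) <= 1:
# a positive optimum witnesses a nonzero solution, which scales to a natural one.
import numpy as np
from scipy.optimize import linprog

def nonterminating(delta_T):
    """delta_T: |P| x |T| effect matrix of a nonredundant workflow net.
    Returns True iff some nonzero R >= 0 satisfies delta_T @ R >= 0."""
    num_p, num_t = delta_T.shape
    # Rewrite in <= form: -delta_T @ R <= 0, plus the normalisation sum(R) <= 1.
    A_ub = np.vstack([-delta_T, np.ones((1, num_t))])
    b_ub = np.concatenate([np.zeros(num_p), [1.0]])
    res = linprog(c=-np.ones(num_t), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * num_t, method="highs")
    return res.status == 0 and -res.fun > 1e-9

# Hypothetical effect matrices; rows are places (i, p, f).
terminating_net = np.array([[-1, 0], [1, -1], [0, 1]])       # t1: i->p, t2: p->f
pumping_net = np.array([[-1, 0, 0], [1, -1, 1], [0, 1, 0]])  # extra t3: p -> 2p
print(nonterminating(terminating_net), nonterminating(pumping_net))  # False True
```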

*Proof (sketch).* The first implication follows from the fact that if we start from a big enough initial marking, then it is possible to fill every place with arbitrarily many tokens. In such a configuration any short run is enabled, so if there is a run with non-negative effect then it is further possible to repeat it infinitely many times. For the other implication we reason as follows. If there is an infinite run, then by Dickson's lemma there are $\boldsymbol{m}, \boldsymbol{m}' \in \mathbb{N}^P$ such that for some $k$ it holds that $\{\mathrm{i}: k\} \xrightarrow{\pi} \boldsymbol{m} \xrightarrow{\rho} \boldsymbol{m}'$ and $\boldsymbol{m}' \geq \boldsymbol{m}$. But then $\Delta(\boldsymbol{R}_\rho) = \boldsymbol{m}' - \boldsymbol{m} \geq \boldsymbol{0}$.

We define $\mathrm{ILP}_{\mathcal{N}}$, with a $|T|$-dimensional vector of variables $\boldsymbol{x}$, as the following system of inequalities: $\boldsymbol{x} \geq \boldsymbol{0}$ and $\Delta(T)\boldsymbol{x} \geq -\{\mathrm{i}: \infty\}$.³ The next lemma follows immediately from the definition of $\to_{\mathbb{Z}}$.

² This is caused by the choice of the family of initial configurations. Fixing the number of initial tokens in some places can be simulated by control states in the VASS model.

³ This $\infty$ is syntactic sugar to omit the inequality for the place $\mathrm{i}$. Formally, $\Delta(T)$ and $\boldsymbol{x}$ should be projected to ignore $\mathrm{i}$.

**Lemma 3 (Adapted from Claim 5.7 in** [9]**).** *For every* $k \in \mathbb{N}$*,* $\boldsymbol{m} \in \mathbb{N}^P$*, and run* $\pi$*, it holds that* $\{\mathrm{i}: k\} \xrightarrow{\pi}_{\mathbb{Z}} \boldsymbol{m}$ *iff* $\boldsymbol{R}_\pi$ *is a solution to* $\mathrm{ILP}_{\mathcal{N}}$ *with the additional constraint* $\sum_{i=1}^{|T|} \Delta(t_i)(\mathrm{i}) \cdot \boldsymbol{R}_\pi(t_i) \geq -k$*.*

*Proof (Sketch for Theorem* 1*).* Because of Lemma 3, the Parikh image of every run (in $\bigcup_{k \in \mathbb{N}} \mathrm{Runs}^k_{\mathcal{N}}$) is a solution $\boldsymbol{R} \in \mathbb{N}^T$ of $\Delta(T)\boldsymbol{R} \geq -\{\mathrm{i}: \infty\}$. So we consider the set of solutions of the system of inequalities $\Delta(T)\boldsymbol{R} \geq -\{\mathrm{i}: \infty\}$. It is a linear set, so the sum of two solutions is again a solution, and any solution can be written as a sum of small solutions with norm smaller than some $c \in \mathbb{N}$. For such small solutions, the length of any corresponding run is at most $|T| \cdot c$. Now observe that if the workflow net is terminating then, because of Lemma 2, there is no nonzero $\boldsymbol{R} \in \mathbb{N}^T$ such that $\Delta(T)\boldsymbol{R} \geq \boldsymbol{0}$. But then $\Delta(\boldsymbol{R})(\mathrm{i}) \leq -1$ holds for every nonzero solution $\boldsymbol{R}$, so in particular for all small solutions. Let us take a run $\pi \in \mathrm{Runs}^k_{\mathcal{N}}$. We decompose $\boldsymbol{R}_\pi$ as a finite sum $\sum_{i=1}^{\ell} \boldsymbol{R}_i$, where the $\boldsymbol{R}_i$ are small solutions. We have $-k \leq \Delta(\boldsymbol{R}_\pi)(\mathrm{i}) = \sum_{i=1}^{\ell} \Delta(\boldsymbol{R}_i)(\mathrm{i}) \leq \sum_{i=1}^{\ell} -1 = -\ell$. Recall that the norm of small solutions is bounded by $c$. It follows that the length of the run $\pi$ is bounded by $\ell \cdot |T| \cdot c \leq k \cdot |T| \cdot c$. So the workflow net is $|T| \cdot c$-linear.

#### **4 Refining Termination Time**

Recall that $a_{\mathcal{N}}$ is the smallest constant such that $\mathcal{N}$ is $a_{\mathcal{N}}$-linear. In this section, we are interested in computing $a_{\mathcal{N}}$. This number is interesting, as it can give insights into the shape and complexity of the net: a large $a_{\mathcal{N}}$ implies complicated runs firing transitions several times, while a small $a_{\mathcal{N}}$ implies some degree of choice, where not all transitions can be fired for each initial token.

The main goal of this section is to show an algorithm for computing $a_{\mathcal{N}}$. Our algorithm handles the more general class of *aggregates* on workflow nets, and we can compute $a_{\mathcal{N}}$ as such an aggregate. More formally, let $\mathcal{N} = (P, T, F)$ be a workflow net. An aggregate is a linear map $f : \mathbb{Q}^T \to \mathbb{Q}$. The aggregate of a (continuous) run is the aggregate of its Parikh image, that is, $f(\pi) := f(\boldsymbol{R}_\pi)$.

*Example 4.* Consider the aggregate $f_{\mathrm{all}}(\pi) := \sum_{t \in T} \boldsymbol{R}_\pi(t) = |\pi|$, which computes the number of occurrences of all transitions. Let us consider two other natural aggregates. The aggregate $f_t(\pi) := \boldsymbol{R}_\pi(t)$ computes the number of occurrences of transition $t$, and $f_p(\pi) := \sum_{t \in T} \Delta(t)(p) \cdot \boldsymbol{R}_\pi(t)$ computes the number of tokens added to place $p$. Another use for aggregates is counting transitions with a different weight for each transition, thus simulating *e.g.* different costs.

Given a workflow net N and an aggregate f we define

$$\sup_{f, \mathcal{N}} = \sup \left\{ \frac{f(\pi)}{k} \;\middle|\; k \in \mathbb{N}_{>0}, \ \pi \in \mathrm{Runs}_{\mathcal{N}}^k \right\}.\tag{2}$$

Let us justify the importance of this notion by relating it to a<sup>N</sup> .

**Proposition 1.** *Let* $\mathcal{N}$ *be a linear workflow net. Then* $a_{\mathcal{N}} = \sup_{f_{\mathrm{all}}, \mathcal{N}}$*.*

*Proof.* Recall that $a_{\mathcal{N}}$ is the smallest number $a$ such that $|\pi| \leq a \cdot k$ for all $k \in \mathbb{N}_{>0}$ and $\pi \in \mathrm{Runs}^k_{\mathcal{N}}$, or, equivalently, such that $\frac{f_{\mathrm{all}}(\pi)}{k} \leq a$. Thus, by definition, $\sup_{f_{\mathrm{all}}, \mathcal{N}} \leq a_{\mathcal{N}}$, and the inequality cannot be strict since $a_{\mathcal{N}}$ is the smallest number with this property.

**Theorem 2.** *Consider a workflow net* $\mathcal{N}$ *and an aggregate* $f$*. The value* $\sup_{f, \mathcal{N}}$ *can be computed in polynomial time.*

**Corollary 1.** *Let* $\mathcal{N} = (P, T, F)$ *be a linear workflow net. The constant* $a_{\mathcal{N}}$ *can be computed in polynomial time.*

In practice, we can use an LP solver to compute the constant a<sup>N</sup> . The algorithm is based on the fact that continuous reachability for Petri nets is in polynomial time [7,19]. We formulate a lemma that relates the values of aggregates under the continuous and standard semantics.
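As an illustration, the following sketch sets up such an LP with SciPy, maximising $f_{\mathrm{all}}$ subject to $\Delta(T)\boldsymbol{x} \geq -\{\mathrm{i}: 1\}$ and $\boldsymbol{x} \geq \boldsymbol{0}$; the two-transition net is a hypothetical example, not one from the paper.

```python
# Sketch: computing a_N = sup of |pi|/k by maximising f_all over an LP.
import numpy as np
from scipy.optimize import linprog

# Effect matrix Delta(T): rows = places (i, p, f), columns = (t1, t2),
# for the hypothetical net t1: i -> p, t2: p -> f.
delta_T = np.array([[-1, 0],
                    [1, -1],
                    [0, 1]])

# Constraints: Delta(T) x >= -{i: 1} and x >= 0; objective: maximise sum(x).
b_lower = np.array([-1.0, 0.0, 0.0])            # the vector -{i: 1}
res = linprog(c=-np.ones(2),                    # linprog minimises, so negate
              A_ub=-delta_T, b_ub=-b_lower,     # Delta(T) x >= b  <=>  -Delta(T) x <= -b
              bounds=[(0, None)] * 2, method="highs")
print(-res.fun)  # 2.0: each initial token can fire t1 once and t2 once
```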

**Lemma 4.** *Let* N *be a Petri net and* f *be an aggregate.*

*1. Let* $\pi \in \mathrm{Runs}^k_{\mathcal{N}}$*. Then* $\frac{1}{k} \cdot \pi \in \mathrm{CRuns}^1_{\mathcal{N}}$ *and* $f(\frac{1}{k} \cdot \pi) = f(\pi)/k$*.*
*2. Let* $\pi_c \in \mathrm{CRuns}^1_{\mathcal{N}}$*. There are* $k \in \mathbb{N}$ *and* $\pi \in \mathrm{Runs}^k_{\mathcal{N}}$ *with* $f(\pi_c) = f(\pi)/k$*.*

*Proof.* Both items are simple consequences of Lemma 1 and the linearity of aggregates. Note that for (2), if $\pi_c = \beta_1 t_1 \ldots \beta_n t_n$, then it suffices to choose $k$ such that $\beta_i \cdot k \in \mathbb{N}$ for all $i \in \{1, \ldots, n\}$.

From the above lemma we immediately conclude the following.

**Corollary 2.** *It holds that* $\sup_{f, \mathcal{N}} = \sup\{f(\pi_c) \mid \pi_c \in \mathrm{CRuns}^1_{\mathcal{N}}\}$*.*

*Proof (of Theorem* 2*).* By Corollary 2, it suffices to compute $\sup\{f(\pi_c) \mid \pi_c \in \mathrm{CRuns}^1_{\mathcal{N}}\}$. Let $S = \{\boldsymbol{R}_{\pi_c} \mid \pi_c \in \mathrm{CRuns}^1_{\mathcal{N}}\}$. As $f(\pi)$ is defined as $f(\boldsymbol{R}_\pi)$, we reformulate our problem as computing $\sup\{f(\boldsymbol{v}) \mid \boldsymbol{v} \in S\}$. Since $f$ is a continuous function, it holds that $\sup\{f(\boldsymbol{v}) \mid \boldsymbol{v} \in S\} = \sup\{f(\boldsymbol{v}) \mid \boldsymbol{v} \in \overline{S}\}$. Let us define $\mathrm{LP}_{f,\mathcal{N}}$ as the LP with variables $\boldsymbol{x} := x_1, \ldots, x_{|T|}$ and constraints $\Delta(T)\boldsymbol{x} \geq -\{\mathrm{i}: 1\}$ and $\boldsymbol{x} \geq \boldsymbol{0}$.

*Claim 1.* It holds that $\boldsymbol{v} \in \overline{S}$ if and only if $\boldsymbol{v}$ is a solution to $\mathrm{LP}_{f,\mathcal{N}}$.

We postpone the proof of Claim 1. The claim allows us to rephrase the computation of $\sup\{f(\boldsymbol{v}) \mid \boldsymbol{v} \in \overline{S}\}$ as maximising the objective $f(\boldsymbol{v})$ over $\mathrm{LP}_{f,\mathcal{N}}$, which can be done in polynomial time.

What remains is the proof of Claim 1, which constitutes the rest of this section. The claim is a special case of the forthcoming Lemma 8, whose formulation and proof require some preparation.

**Definition 1.** *A workflow net is* good *for a set of markings* $M \subseteq \mathbb{Q}^P_{\geq 0}$ *if for every place* $p$ *there are markings* $\boldsymbol{m}, \boldsymbol{m}'$ *and continuous runs* $\pi$ *and* $\pi'$ *such that* $\boldsymbol{m}(p) > 0$*,* $\boldsymbol{m}' \in M$*, and* $\{\mathrm{i}: 1\} \xrightarrow{\pi}_{\mathbb{Q}_{\geq 0}} \boldsymbol{m} \xrightarrow{\pi'}_{\mathbb{Q}_{\geq 0}} \boldsymbol{m}'$*.*

The notion of being good for a set of markings refines nonredundancy. Nonredundancy allows us to mark every place; but if, after marking the place, we want to continue the run and reach a marking in a specific set $M \subseteq \mathbb{Q}^P_{\geq 0}$, then nonredundancy alone does not guarantee that the place can be marked. This motivates Definition 1.

*Example 5.* Let us consider the workflow net depicted in Fig. 4. It is nonredundant, as every place can be marked. But it is not good for $\{\mathrm{f}: 1\}$, as there is no continuous run to the marking $\{\mathrm{f}: 1\}$: in the initial marking the only enabled transition is $t_1$, but firing $\beta t_1$ for any $\beta \in \mathbb{Q}_{\geq 0}$ reduces the total amount of tokens in the net. The lost tokens cannot be recreated, so it is not possible to reach $\{\mathrm{f}: 1\}$.

**Fig. 4.** A Petri net with places $p_1, p_2, p_3$ and transitions $t_1, t_2, t_3$. Marking $\{\mathrm{i}: 1\}$ is drawn.

The important fact is as follows:

**Lemma 5.** *Let* $M \subseteq \mathbb{Q}^P_{\geq 0}$ *be the set of solutions of some LP. Then testing whether a net is good for* $M$ *can be done in polynomial time.*

**Lemma 6.** *Suppose a workflow net* $\mathcal{N}$ *is good for* $M \subseteq \mathbb{Q}^P_{\geq 0}$ *and* $M$ *is a convex set. Then there is a marking* $\boldsymbol{m}^+$ *such that* $\boldsymbol{m}^+(p) > 0$ *for every* $p \in P$*, and there are continuous runs* $\pi$*,* $\pi'$*, and a marking* $\boldsymbol{m}_f \in M$ *such that* $\{\mathrm{i}: 1\} \xrightarrow{\pi}_{\mathbb{Q}_{\geq 0}} \boldsymbol{m}^+ \xrightarrow{\pi'}_{\mathbb{Q}_{\geq 0}} \boldsymbol{m}_f$*.*

Informally, we prove it by taking a convex combination of $|P|$ runs, one for each $p \in P$. The last bit needed for the proof of Lemma 8 is the following lemma, shown in [19].

**Lemma 7 (**[19]**, Lemma 13).** *Let* $\mathcal{N}$ *be a Petri net. Consider* $\boldsymbol{m}_0, \boldsymbol{m} \in \mathbb{N}^P$ *and* $\boldsymbol{v} \in \mathbb{Q}^T_{\geq 0}$ *such that:*

– $\boldsymbol{m} = \boldsymbol{m}_0 + \Delta(\boldsymbol{v})$;
– $\forall p \in {}^\bullet\boldsymbol{v} : \boldsymbol{m}_0(p) > 0$;
– $\forall p \in \boldsymbol{v}^\bullet : \boldsymbol{m}(p) > 0$.

*Then there exists a finite continuous run* $\pi$ *such that* $\boldsymbol{m}_0 \xrightarrow{\pi}_{\mathbb{Q}_{\geq 0}} \boldsymbol{m}$ *and* $\boldsymbol{R}_\pi = \boldsymbol{v}$*.*

**Lemma 8.** *Suppose* $M$ *is a convex set of markings over* $\mathbb{Q}^P_{\geq 0}$ *and that the workflow net is good for* $M$*. Let* $S$ *be the set of Parikh images of continuous runs that start in* $\{\mathrm{i}: 1\}$ *and end in some marking* $\boldsymbol{m}' \in M$*, i.e.*

> S := {R_π | ∃π ∈ *CRuns*^1_N ∃m′ ∈ M *such that* {i: 1} −→^π_{Q≥0} m′}.

*Then v* ∈ S *if and only if there is a marking m* ∈ M *such that* Δ(T)*v* = *m* − {i: 1}*.*

In one direction the proof of the lemma is trivial. In the opposite direction, intuitively, we construct a sequence of runs whose Parikh images converge to *v*. Lemma 6 is used to put ε tokens in every place (for ε → 0), and Lemma 7 to show that there are runs with Parikh image equal to ε*x* + (1 − ε)*v*, for some *x* witnessing Lemma 6. We are ready to prove Claim 1.

*Claim 1.* It holds that *v* ∈ S if and only if *v* is a solution to LP_{f,N}.

*Proof.* Let M be the set of all markings over Q^P_{≥0}, which is clearly convex. As N is nonredundant, we know that every place can be marked via a continuous run, and because M is the set of all markings, we conclude that N is good for M according to Definition 1. Thus M satisfies the prerequisites of Lemma 8. It follows that S is the set of solutions of a system of linear inequalities. Precisely, *v* ∈ S if and only if there is m ∈ Q^P_{≥0} such that Δ(T)*v* = m − {i: 1} and *v* ≥ 0, which is equivalent to Δ(T)*v* ≥ −{i: 1} and *v* ≥ 0, as required.
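To make the characterisation concrete, the following sketch checks the condition of Claim 1 for a given Parikh vector on a small hypothetical net of our own (not one from the paper): two transitions move a token from i through p to f, and a vector *v* is accepted iff *v* ≥ 0 and Δ(T)*v* ≥ −{i: 1}.

```python
from fractions import Fraction

# Hypothetical workflow net: t1 moves a token i -> p, t2 moves it p -> f.
PLACES = ["i", "p", "f"]
PRE = {"t1": {"i": 1}, "t2": {"p": 1}}
POST = {"t1": {"p": 1}, "t2": {"f": 1}}

def delta(v):
    """Marking effect Δ(T)v of firing each transition t for v(t) units."""
    d = {p: Fraction(0) for p in PLACES}
    for t, amount in v.items():
        for p, w in PRE[t].items():
            d[p] -= w * amount
        for p, w in POST[t].items():
            d[p] += w * amount
    return d

def satisfies_lp(v):
    """The LP of Claim 1: v >= 0 and Δ(T)v >= -{i: 1}."""
    if any(x < 0 for x in v.values()):
        return False
    d = delta(v)
    init = {"i": Fraction(1)}
    return all(d[p] >= -init.get(p, Fraction(0)) for p in PLACES)
```

Firing each transition once ({t1: 1, t2: 1}) is feasible, as it moves the single token from i to f; in contrast, {t1: 1, t2: 2} is rejected, since firing t2 twice would consume more from p than t1 ever produces.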

### **5 Soundness in Terminating Workflow Nets**

The dichotomy between linear termination time and non-termination shown in Sect. 3 yields an interesting avenue for framing questions in workflow nets. We know that testing generalised soundness is PSPACE-complete, but the lower bound in [9] relies on a reset gadget which makes the net non-terminating. Indeed, it turns out that the problem is simpler for linear workflow nets.

**Theorem 3.** *Generalised soundness is coNP-complete for linear workflow nets.*

A marking *m* is called a *deadlock* if *Runs*^m_N = ∅. To help prove the coNP upper bound, let us introduce a lemma.

**Lemma 9.** *Let* N *be a terminating nonredundant workflow net. Then* N *is not generalised sound iff there exist* k ∈ N *and a marking* m ∈ N^P *such that* {i: k} −→*_Z m*,* m *is a deadlock, and* m ≠ {f: k}*. Moreover, if* ‖N‖ ≤ 1 *then* {i: k} −→*_Z m *can be replaced with* {i: k} −→*_Q m*.*

The last part of the lemma is not needed for the theoretical results, but it will speed up the implementation in Sect. 7. We can now show Theorem 3.

*Proof (of the coNP upper bound in Theorem 3).* Let N = (P, T, F) and denote T = {t1, ..., tn}. By Lemma 9, N is not generalised sound iff there are k ∈ N and m ∈ N^P such that {i: k} −→*_Z m, m is a deadlock, and m ≠ {f: k}. We reduce this property to an ILP. First, the procedure guesses |T| places p1, ..., pn ∈ P (one for each transition). For each transition ti, place pi will prohibit firing ti by not being marked with enough tokens. We create ILP_{N,p1,...,pn}, which is very similar to ILP_N (see Sect. 3) but adds additional constraints. They state that (Δ(T)x)(pj) − •tj(pj) < 0 for every 1 ≤ j ≤ n.

Let us show that there are p1, ..., pn such that ILP_{N,p1,...,pn} has a solution iff there exist k and a deadlock m such that {i: k} −→*_Z m. Indeed, let x1, ..., xn be a solution of ILP_{N,p1,...,pn}. We denote k = −Σ_{i=1}^{n} Δ(ti)(i) · xi and m = {i: k} + Σ_{i=1}^{n} Δ(ti) · xi. It is clear that {i: k} −→*_Z m. The new constraints ensure that for each ti ∈ T there exists pi ∈ P such that •ti(pi) > m(pi), thus m is a deadlock.

To encode the requirement that m ≠ {f: k}, note that there are three cases: either m(f) ≤ k − 1, m(f) ≥ k + 1, or m(f) = k but m − {f: k} ≠ 0. We guess which case occurs and add the constraint for that case to ILP_{N,p1,...,pn}.
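For intuition, the following sketch replays this argument on a small hypothetical net of our own (not from the paper): given firing counts x and a guessed blocking place for each transition, it derives k and m as in the proof and checks that m is a deadlock different from {f: k}.

```python
# Hypothetical workflow net: t1 moves i -> p, t2 moves p -> f,
# and t3 consumes two tokens from p, producing nothing (tokens are lost).
PLACES = ["i", "p", "f"]
PRE = {"t1": {"i": 1}, "t2": {"p": 1}, "t3": {"p": 2}}
POST = {"t1": {"p": 1}, "t2": {"f": 1}, "t3": {}}

def effect(t):
    """Delta(t): token change per place when firing t once."""
    return {p: POST[t].get(p, 0) - PRE[t].get(p, 0) for p in PLACES}

def is_unsoundness_witness(x, blocking):
    """Check the proof's conditions: with k = -sum_i Delta(t_i)(i)*x_i and
    m = {i: k} + sum_i Delta(t_i)*x_i, every transition t must lack tokens
    at its guessed place blocking[t], and m must differ from {f: k}."""
    k = -sum(effect(t)["i"] * x[t] for t in x)
    m = {p: (k if p == "i" else 0) for p in PLACES}
    for t in x:
        for p in PLACES:
            m[p] += effect(t)[p] * x[t]
    if k < 0 or any(v < 0 for v in m.values()):
        return False  # not a valid marking
    if any(m[blocking[t]] >= PRE[t].get(blocking[t], 0) for t in PRE):
        return False  # some transition is not blocked at its guessed place
    return m != {"i": 0, "p": 0, "f": k}
```

Firing t1 twice and t3 once (x = {t1: 2, t2: 0, t3: 1}) starts from {i: 2} and loses both tokens, reaching the empty marking, a deadlock different from {f: 2}; guessing i for t1 and p for t2 and t3 certifies that this net is not generalised sound.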

The lower bound can be proven using a construction presented in [10, Theorem 2] to show that a problem called *continuous soundness* is coNP-hard on acyclic workflow nets. We say that a workflow net is *continuously sound* iff for all m such that {i: 1} −→*_{Q≥0} m, it holds that m −→*_{Q≥0} {f: 1}. The reduction can be used as is to show that generalised soundness of nets with linear termination time is coNP-hard, but the proof differs slightly. See the appendix for more details.

### **6 Termination Time and Concurrent Semantics**

Note that in Petri nets, transitions may be fired concurrently. Thus, in a sense, our definition of termination time may overestimate the time a workflow actually needs.

In this section we investigate parallel executions for workflow nets. Whereas the termination time is focused on the worst case sequential execution, now we are interested in finding the best case parallel executions. Thus, we provide an optimistic lower bound on the execution time to contrast the pessimistic upper bound investigated in Sect. 3 and Sect. 4.

**Definition 2.** *Given a Petri net* N*, let* π = t1 t2 ... tn ∈ *Runs*^k_N *for some* k ∈ N*. A* block *in* π *is a subsequence of* π*, i.e.* ta, ..., tb *for some* 1 ≤ a ≤ b ≤ n*. We define a* parallel execution *of* π *with respect to* k *as a decomposition of* π *into blocks* π = π1 π2 ... πℓ *such that*

*1. all transitions are pairwise different in a single block; and*
*2.* •R_{πi} ≤ {i: k} + Σ_{j<i} Δ(πj) *for every* 1 ≤ i ≤ ℓ*.*

*The* execution time *of a parallel execution is denoted as* exec(π1 π2 ... πℓ) := ℓ*.*

*Example 6.* We consider parallel executions of the run t1t2t1t2t3t3 with respect to 4 initial tokens. The run can be decomposed into (t1t2)(t1t2)(t3)(t3) but also into (t1)(t2t1)(t2t3)(t3). Both executions have execution time 4. The parallel execution (t1t2)(t1t2t3)(t3) has execution time 3.

We are interested in finding the parallel executions of a run that minimise the execution time. It turns out that the so-called *greedy parallel execution* is such a minimal parallel execution. Given π and k, it is defined inductively on the prefixes of π. Suppose we already have some blocks π1 ... π_{i−1}. To construct block πi, we simply choose the maximal sequence of transitions immediately following the last block π_{i−1} that satisfies the two conditions of Definition 2. In particular, the last decomposition in Example 6 is the greedy parallel execution.
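The greedy construction is straightforward to implement. The sketch below computes the greedy parallel execution of a sequential run on a small hypothetical net of our own (t1 moves a token i → p, t2 moves it p → f; the net of Example 6 is not fully specified here), extending the current block while both conditions of Definition 2 still hold.

```python
# Hypothetical net: t1 moves a token i -> p, t2 moves it p -> f.
PRE = {"t1": {"i": 1}, "t2": {"p": 1}}
POST = {"t1": {"p": 1}, "t2": {"f": 1}}

def block_ok(block, marking):
    """Definition 2: transitions in a block are pairwise different and the
    current marking covers the block's total consumption."""
    if len(set(block)) != len(block):
        return False
    need = {}
    for t in block:
        for p, w in PRE[t].items():
            need[p] = need.get(p, 0) + w
    return all(marking.get(p, 0) >= w for p, w in need.items())

def fire_block(marking, block):
    """Fire all transitions of a block, returning the new marking."""
    m = dict(marking)
    for t in block:
        for p, w in PRE[t].items():
            m[p] = m.get(p, 0) - w
        for p, w in POST[t].items():
            m[p] = m.get(p, 0) + w
    return m

def greedy_execution(run, k):
    """Greedily extend the current block as long as Definition 2 holds."""
    marking, blocks, current = {"i": k}, [], []
    for t in run:
        if block_ok(current + [t], marking):
            current.append(t)
        else:
            marking = fire_block(marking, current)
            blocks.append(current)
            current = [t]
    blocks.append(current)
    return blocks
```

For the run t1 t2 t1 t2 with 2 initial tokens, the greedy execution is (t1)(t2 t1)(t2), of execution time 3 instead of the sequential 4.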

**Lemma 10.** *Consider a run* π *and* k ∈ N*. The greedy parallel execution of* π *has the smallest execution time among all parallel executions of* π *with respect to* k*.*

Consider a workflow net N with the initial marking {i: k}. Let S_k := {π | {i: k} −→^π {f: k}}. We define MinTime_N(k) as the minimal execution time among parallel executions of runs in S_k. If S_k = ∅ then MinTime_N(k) = +∞.

**Lemma 11.** *Let* N *be a workflow net and let* k, x ∈ N*. Deciding whether* MinTime_N(k) ≤ x *is PSPACE-hard, even if we fix* k = 1*.*

As computing MinTime_N(k) is computationally hard, we modify the question and ask about the asymptotic behaviour (similarly to Sect. 4). Thus, we are interested in computing lim_{k→∞} MinTime_N(k)/k. The problem is well defined, as the limit exists. This is interesting because lim_{k→∞} MinTime_N(k)/k corresponds to the average processing time of a single token when the workflow runs (informally speaking) at its maximal efficiency.

**Theorem 4.** *For a given nonredundant, generalised sound workflow net*<sup>4</sup> N*, we can compute* lim_{k→∞} MinTime_N(k)/k *in polynomial time.*

*Proof (sketch).* The main idea relies on the continuous semantics, similarly to the proof of Theorem 2. We show that the limit is equal to the infimum over execution times<sup>5</sup> of continuous runs {i: 1} −→_{Q≥0} {f: 1}. Then we prove the following claim.

<sup>4</sup> These assumptions can be relaxed to *a net good for* {<sup>f</sup> : 1}, see Definition 1.

<sup>5</sup> For a suitably defined parallel execution and execution time of continuous runs.

*Claim 2.* Let *v* ∈ Q^T_{≥0}. Let S_v = {π | {i: 1} −→^π_{Q≥0} {f: 1} and R_π = *v*}. If S_v ≠ ∅ then the infimum over execution times of runs in S_v equals max_{t∈T} *v*(t).

Let S be the set of Parikh images of continuous runs from {i: 1} to {f: 1}. We define f: S → Q_{≥0} by f(*v*) = max_{t∈T} *v*(t). Thus we can reformulate the problem as computing inf{f(*v*) | *v* ∈ S}. The function f is continuous but not linear on S; it is, however, piecewise linear. We define S_t ⊆ S for t ∈ T as follows: S_t = {*v* | *v* ∈ S and *v*(t) ≥ *v*(t′) for all t′ ∈ T}. Observe that f is linear over S_t for every t ∈ T and that S = ⋃_{t∈T} S_t. Thus we can rephrase our problem as computing the minimum over the set {inf{*v*(t) | *v* ∈ S_t} | t ∈ T}.

Thus it is sufficient to show that inf{*v*(t) | *v* ∈ S_t} can be computed in polynomial time for any t ∈ T. Lemma 8 allows us to characterise S as follows: *v* ∈ S iff Δ(T)*v* = {f: 1} − {i: 1} and *v* ≥ **0**. In consequence, S_t can be characterised as the set of solutions of the following system of inequalities

$$\Delta(T)\boldsymbol{\upsilon} = \{\mathbf{f} \colon 1\} - \{\text{i} \colon 1\} \text{ and } \boldsymbol{\upsilon} \ge \mathbf{0} \text{ and } \boldsymbol{\upsilon}(t) \ge \boldsymbol{\upsilon}(t') \text{ for all } t' \in T.$$

This allows us to capture each inf{*v*(t) | *v* ∈ S_t}, for t ∈ T, as an LP problem, which can be solved in polynomial time.

#### **7 Experimental Evaluation**

We have implemented prototypes of several procedures outlined in the paper, namely procedures to 1) decide termination; 2) decide soundness for terminating nets; 3) compute a_N for terminating nets; and 4) compute MinTime_N(1), MaxTime_N(1), and decide 1-soundness for nets with known a_N. The idea behind all procedures is to use our results to encode the properties in LPs/ILPs. To solve these programs, we utilize the MILP solver *Gurobi* [24].

For 1), recall Lemma 2, which states that non-termination of a workflow net N is equivalent to the existence of a Parikh image R ∈ N^T with Δ(R) ≥ **0**. We can instead search for R ∈ Q^T, as any solution can be scaled up to an integral one. Thus, we can encode this condition as an LP in a straightforward manner, and decide termination in polynomial time.<sup>6</sup>
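As an illustration of the criterion (not of our actual implementation, which uses Gurobi), the sketch below replaces the LP by a naive bounded search for a nonzero integral R with Δ(R) ≥ **0**, on two tiny hypothetical nets of our own.

```python
from itertools import product

def delta_vec(pre, post, places, r):
    """Marking effect of firing each transition t exactly r(t) times."""
    d = {p: 0 for p in places}
    for t, cnt in r.items():
        for p, w in pre[t].items():
            d[p] -= w * cnt
        for p, w in post[t].items():
            d[p] += w * cnt
    return d

def nonterminating(pre, post, places, bound=3):
    """Naive stand-in for the LP of Lemma 2: look for a nonzero R in N^T
    with Delta(R) >= 0. The bound keeps the search finite; real nets
    need an actual LP solver."""
    ts = sorted(pre)
    for counts in product(range(bound + 1), repeat=len(ts)):
        if not any(counts):
            continue
        r = dict(zip(ts, counts))
        if all(v >= 0 for v in delta_vec(pre, post, places, r).values()):
            return True
    return False

PLACES = ["i", "f"]
# Terminating: the single transition strictly consumes a token from i.
PRE_A, POST_A = {"t1": {"i": 1}}, {"t1": {"f": 1}}
# Non-terminating: tl reproduces its own input, so Delta(tl) = 0.
PRE_B = {"t1": {"i": 1}, "tl": {"f": 1}}
POST_B = {"t1": {"f": 1}, "tl": {"f": 1}}
```

On the first net every nonzero R drains tokens from i, so the search fails and the net terminates; on the second, R = {tl: 1} has Δ(R) = **0** and witnesses non-termination.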

For 2), we essentially use ILP_{N,p1,...,pn}, as defined in the proof of Theorem 3. A solution to ILP_{N,p1,...,pn} yields a run π such that there exists k ∈ N with {i: k} −→^π_Z m, where m is a deadlock.

We also consider continuous instead of integral variables. Then solutions relate to runs over −→*_Q instead. As hinted at in the last sentence of Lemma 9, both variants yield equivalent results on nets without arc weights, *i.e.* ‖N‖ ≤ 1. However, continuous variables are generally easier to handle for MILP solvers. For brevity, by *integer deadlocks* we refer to the approach using integer variables, and by *continuous deadlocks* to the approach with continuous variables.

<sup>6</sup> This observation and the general approach comes from [30].

For 3), recall the LP given in Claim 1. We can use it to compute sup_{f,N} for any aggregate f, so in particular we can use it to compute sup_{f_all,N}, which is equal to a_N by Equation (2). Here, it only remains to mention that Gurobi not only allows checking feasibility of systems of linear inequalities, but also allows optimizing an objective value, as required by the LP.

For 4), note that if we have the bound a_N on the length of runs from {i: 1}, we can check properties by unrolling runs. The intuition is as follows. We have a_N · |T| integer variables. For step j of the run, we have variables x_{1,j}, x_{2,j}, ..., x_{|T|,j}. The variables for a step encode which transition(s) are fired in that step. We ensure that we encode a run by requiring Σ_{i=1}^{|T|} x_{i,j} ≤ 1 for all j ∈ [1..a_N]. We use integer variables, so either one or no transition is fired in each step.

Alternatively, we encode a parallel execution by imposing the requirements of Definition 2 on steps. We further specify that for all j ∈ [1..a_N], it holds that {i: 1} + Σ_{j′=0}^{j} Σ_{i=1}^{|T|} Δ(ti) x_{i,j′} ≥ **0**, so that the marking reached after each step is nonnegative. To compute MinTime_N(1)/MaxTime_N(1), we minimise/maximise the number of blocks/steps with non-zero transition variables. For 1-soundness, we require reaching a deadlock different from {f: 1}.
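As a toy counterpart of the unrolling idea (replacing the ILP by explicit enumeration; the nets and the bound are our own, not from the paper), the sketch below checks 1-soundness by exploring all runs from {i: 1} of length at most a bound playing the role of a_N, and testing that every deadlock reached is exactly {f: 1}.

```python
def enabled(marking, pre, t):
    """A transition is enabled if the marking covers its preset."""
    return all(marking.get(p, 0) >= w for p, w in pre[t].items())

def fire(marking, pre, post, t):
    """Fire t once, returning the successor marking."""
    m = dict(marking)
    for p, w in pre[t].items():
        m[p] = m.get(p, 0) - w
    for p, w in post[t].items():
        m[p] = m.get(p, 0) + w
    return m

def one_sound(pre, post, bound):
    """Unroll all runs from {i: 1} for at most `bound` steps (the role of
    a_N) and require every deadlock reached to be exactly {f: 1}."""
    final = {"f": 1}
    frontier = [{"i": 1}]
    for _ in range(bound + 1):
        successors = []
        for m in frontier:
            succ = [fire(m, pre, post, t) for t in pre if enabled(m, pre, t)]
            if not succ:  # m is a deadlock
                if {p: v for p, v in m.items() if v} != final:
                    return False
            successors.extend(succ)
        frontier = successors
    return True

# Hypothetical sound net: t1 moves the token i -> p, t2 moves it p -> f.
PRE_A = {"t1": {"i": 1}, "t2": {"p": 1}}
POST_A = {"t1": {"p": 1}, "t2": {"f": 1}}
# Adding t3, which swallows the token from p, makes the net unsound.
PRE_B = {**PRE_A, "t3": {"p": 1}}
POST_B = {**POST_A, "t3": {}}
```

The first net only ever deadlocks in {f: 1}; the second can lose its token via t3 and deadlock in the empty marking, so it is not 1-sound.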

Our prototype is implemented in C#. All experiments were run on an 8-core Intel® Core™ i7-7700 CPU @ 3.60 GHz with Ubuntu 18.04. We limited memory to ∼8 GB. The time was limited to 60 s for checking termination and generalised soundness as well as for computing a_N. It was limited to 15 s for computing MinTime_N(1), MaxTime_N(1) and for checking 1-soundness.

#### **7.1 Benchmark Suite**

We use a popular benchmark suite of 1386 free-choice nets originating from models created in the IBM WebSphere Business Modeler. The instances were originally introduced in [18] and have frequently been studied since, see [13,37,38]. The nets use a slightly different formalisation of workflow nets that allows multiple final places, which can be transformed to standard workflow nets using a technique from [29]. This technique adds transitions and can thus increase a_N, MinTime_N and MaxTime_N. Unfortunately, 4 instances cannot be transformed to workflow nets with this technique, so we remove them. We also apply a set of well-known reduction rules from [13] that reduce the size of instances while keeping all types of soundness intact, and remove instances that are trivially sound after reduction. These rules never increase a_N. While they could in theory increase MinTime_N, this does not happen on our benchmarks. Due to the nature of the reduction rules, it may not be appropriate to run them before analyzing MinTime_N(1), MaxTime_N(1) and a_N, since these numbers would then give no information about the original workflow. Thus we only run experiments on the reduced instances when we check soundness and termination.

In total, we are left with 1382 unreduced and 740 non-trivial reduced instances. Statistics about the sizes of the workflow nets can be seen in the columns under Net Size in Fig. 5. The reduced nets are much smaller than the unreduced ones, even when the nets are not reduced to the trivial net.



**Fig. 5.** Top: Statistics on the net size, and analysis times for deciding termination, and checking generalised soundness via deadlocks and continuous soundness. Bottom: Statistics on the number of terminating/non-terminating and deadlocking/nondeadlocking (thus generalised unsound/generalised sound) nets.

#### **7.2 Termination and Deadlocks**

The time taken to decide termination is shown in the column labelled "Termination" in the top table of Fig. 5. The numbers of terminating and non-terminating nets are shown in the bottom table of Fig. 5. Among both the unreduced and reduced instances, the vast majority are terminating (about 90%). Note that the reduction rules can remove non-termination even when they do not reduce the net to the trivial net, thus the prevalence of terminating instances is even stronger among the reduced instances. In terms of analysis time, termination can be decided in under 25 ms for all instances, with a median of 3 ms.

The top of Fig. 5 shows the analysis times for generalised soundness. We use three algorithms. Columns "Continuous Deadlock" and "Integer Deadlock" show results for our two proposed approaches, and column "Continuous Soundness" shows the performance of a state-of-the-art approach [10] for deciding generalised soundness. Note that both approaches may claim an unsound workflow net to be sound, but they are precise on different classes of nets. The absence of integer deadlocks is equivalent to generalised soundness on terminating nets, see Lemma 9. Similarly, continuous soundness is equivalent to generalised soundness on free-choice nets [10].

In practice, it turns out that our approach for checking the absence of integer deadlocks is faster than the existing approach using continuous soundness on every single instance. Continuous soundness times out on 215 of the unreduced instances (not listed in the table), while neither of the approaches utilizing deadlocks times out on any instance. The performance of continuous soundness is not surprising: continuous soundness is checked by passing an ∃∀-formula of FO(Q, <, +) to an SMT solver. Quantifier alternation increases the complexity of validating such formulas [23]. In comparison, our check for integer deadlocks is implemented using standard ILP techniques, and is thus an existential formula.

The bottom of Fig. 5 shows how many nets are non-terminating, as well as how many are deadlocking (thus not generalised sound). Recall that integer deadlocks and continuous deadlocks are equivalent for nets without arc weights, which all of our nets are. Both types of deadlocks are fast to compute, taking less than 90 ms on each instance. In practice, checking for continuous deadlocks may be useful even for nets with arc weights, since their absence also proves the absence of integer deadlocks. About 50% of the unreduced instances and roughly 75% of the reduced instances are deadlocking. Note that the reduction rules can only make sound instances trivial, and these are by definition not able to reach a deadlock.

#### **7.3 a_N, MinTime_N(1) and MaxTime_N(1)**

The top of Fig. 6 shows the distribution of a_N. This number depends on the number of transitions, so it is hard to put into context. We instead display L := a_N/|T|. Intuitively, this number is an upper bound on the average number of times each transition can be fired per initial token. For example, a net with L = 1 is likely linear, *i.e.* each transition can be fired only once per initial token, while nets with L ≫ 1 may exhibit more complex behaviour, and nets with L ≪ 1 may exhibit high degrees of choice, where runs only visit a part of the net. We group nets with similar L to give an idea of the distribution of the values of L across instances. Our computation of a_N ran out of memory on 8 nets, so the figure displays only 1254 nets. Most nets have L ≤ 1, with a significant number having exactly L = 1. The maximal L is 5.83 among unreduced and 4.33 among reduced instances, while the minimal L is 0.17 and 0.11, respectively.

To display MinTime_N(1) and MaxTime_N(1), we also divide them by the number of transitions, as we did for a_N. We write T_Min := MinTime_N(1)/|T| and T_Max := MaxTime_N(1)/|T|. We are mostly interested in their difference D := T_Max − T_Min. For nets with large D, the difference between the pessimistic sequential and the optimistic parallel execution time is large, thus they might allow a high degree of parallelism. On the contrary, if nets have very small D, they have a sequential structure. We again group nets with similar D, as we did for L above. The results of the analysis are shown in the middle table of Fig. 6.

As we divide by |T| in the definition of D, it would be unusual for it to take on huge values, and indeed all nets have D < 1. Note that even D = 0.5 is significant, as it means that MinTime_N(1) and MaxTime_N(1) differ by half the number of transitions. The table totals only 700 nets. On 111 nets, computing MinTime_N(1) times out; on 32 nets, computing MaxTime_N(1) times out; and both time out on 51 nets. On the remaining 360 nets, there is no execution from {i: 1} to {f: 1}, thus MinTime_N(1) = ∞.

The analysis times for computing a_N, MinTime_N(1) and MaxTime_N(1) are shown in the bottom table of Fig. 6. We group nets by their size |N| = |P| + |T| to show how the analysis times depend on the instance size. We only list 1060 nets,


**Fig. 6.** Top: Statistics on the distribution of L. Middle: Statistics on the distribution of D. Bottom: Statistics on the analysis times for a_N, I_Min and I_Max.

as we omit those where the computation of MinTime_N(1) or MaxTime_N(1) timed out. One interesting observation is that for most instances, particularly small ones, MinTime_N(1) is harder to compute than MaxTime_N(1). However, both are very slow to compute compared to a_N, which indeed never times out on our instances. In fact, a_N takes at most 714 ms to compute for any instance. It is interesting that the time for computing a_N does not seem to depend strongly on the net size. We suspect this might be partly due to the fact that a_N tends to be proportionally smaller for larger instances: bucket [0, 20) has a mean L of 1.04, while the mean is 0.86 for bucket [150, 405).

#### **7.4 1-Soundness**

Lastly, we briefly comment on the time for deciding 1-soundness via unrolling for nets with known a_N. The procedure times out for 71 instances, among which a_N has a mean of 133.88 and a maximum of 256. It takes a mean of 612.66 ms and a maximum of 14431 ms to decide 1-soundness in this way. Unlike in the case of generalised soundness, our procedure for 1-soundness does not seem able to compete with the state of the art. In [18], 1-soundness is decided for many of our instances in a few milliseconds per instance; our approach achieves this only for instances with small a_N (up to about 25).

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Lincheck: A Practical Framework for Testing Concurrent Data Structures on JVM

Nikita Koval1(B) , Alexander Fedorov1,2, Maria Sokolova<sup>1</sup>, Dmitry Tsitelov<sup>3</sup>, and Dan Alistarh<sup>2</sup>

<sup>1</sup> JetBrains, Prague, Czech Republic
ndkoval@ya.ru
<sup>2</sup> IST Austria, Klosterneuburg, Austria
<sup>3</sup> Devexperts, Munich, Germany

Abstract. This paper presents Lincheck, a new practical and user-friendly framework for testing concurrent algorithms on the Java Virtual Machine (JVM). Lincheck provides a simple and declarative way to write concurrent tests: instead of describing *how* to perform the test, users specify *what to test* by declaring all the operations to examine; the framework automatically handles the rest. As a result, tests written with Lincheck are concise and easy to understand. The framework automatically generates a set of concurrent scenarios, examines them using stress-testing or bounded model checking, and verifies that the results of each invocation are correct. Notably, if an error is detected via model checking, Lincheck provides an easy-to-follow trace to reproduce it, significantly simplifying the bug investigation.

To the best of our knowledge, Lincheck is the first production-ready tool on the JVM that offers such a simple way of writing concurrent tests, without requiring special skills or expertise. We successfully integrated Lincheck in the development process of several large projects, such as Kotlin Coroutines, and identified new bugs in popular concurrency libraries, such as a race in Java's standard ConcurrentLinkedDeque and a liveness bug in Java's AbstractQueuedSynchronizer framework, which is used in most of the synchronization primitives. We believe that Lincheck can significantly improve the quality and productivity of concurrent algorithms research and development and become the state-of-the-art tool for checking their correctness.

### 1 Introduction

Concurrent programming is notoriously hard and error-prone. Writing a good and robust test for a concurrent data structure may be even more challenging than implementing it. Programmers produce many such stress tests every day, but they are often nondeterministic, cover only specific cases, and do not catch all the bugs. Both industry and academia need a tool that would simplify writing reliable tests for concurrent data structures.

In this paper, we present Lincheck [1], a new practical framework for JVM-based languages (such as Java, Kotlin, and Scala), which simplifies writing reliable concurrent tests. While most existing tools require writing the algorithm in a special language [2], specifying all possible concurrent scenarios and their outcomes [3–6], or learning a large amount of theory [7,8], Lincheck provides a more pragmatic *declarative* approach. It requires users only to list the data structure operations, thus specifying *what to test* instead of *how*. Taking these operations, Lincheck generates a set of concurrent scenarios and examines them via stress testing or model checking, verifying that the outcome results are correct. The default correctness property is linearizability [9], but various relaxations [10–12] are also supported. One may think of Lincheck as a mix of a fuzzer (which generates concurrent scenarios) and a model checker or stress runner (which examines these scenarios), equipped with an automatic outcome verifier.

*Lincheck by Example.* The "classic" way to write a concurrent test is to manually run parallel threads, invoking the data structure operations in them and checking that some sequential history can explain the produced results. Such tests typically contain hundreds of lines of boilerplate code and cover only easy-to-verify scenarios. Lincheck automates the machinery, making tests short and declarative. To illustrate this, we present a test for the ConcurrentLinkedDeque collection (a double-ended queue, which supports insertions and removals at both ends) of the standard Java library in Listing 1.

The initial state of the testing data structure is specified in the constructor; here, we simply create a new empty deque at line 2. The following lines 4–9 declare the deque operations; they should be annotated with @Operation. Finally, we run the analysis by invoking ModelCheckingOptions.check(..) on the testing class at line 11. Replacing ModelCheckingOptions with StressOptions switches to stress testing, which essentially runs parallel threads.

```
1 class DequeTest {
2     val deque = ConcurrentLinkedDeque<Int>()
3
4     @Operation fun addFirst(e: Int) = deque.addFirst(e)
5     @Operation fun addLast(e: Int) = deque.addLast(e)
6     @Operation fun pollFirst() = deque.pollFirst()
7     @Operation fun pollLast() = deque.pollLast()
8     @Operation fun peekFirst() = deque.peekFirst()
9     @Operation fun peekLast() = deque.peekLast()
10
11     @Test fun runTest() = ModelCheckingOptions()
12         .check(this::class)
13 }
```
Listing 1. Concurrent test via Lincheck for Java's ConcurrentLinkedDeque. The code is written in Kotlin; import statements are omitted.

After executing the test, we get an error presented in Fig. 1. Surprisingly, this class from the standard Java library has a bug; the error was originally detected via Lincheck by the authors [13] (notably, there were several unsuccessful attempts to fix the incorrectness before that [14,15]). Obviously, the produced results are non-linearizable: for pollLast() in the second thread to return -8, it would have to be called before addLast(-6) in the first thread; however, that would require the following peekFirst() to return -6 instead of -8. While Lincheck always prints a failing scenario with incorrect results (if found), the model checker also provides a detailed *interleaving trace* that reproduces the error.

Fig. 1. The incorrect execution of Java's ConcurrentLinkedDeque identified by the Lincheck test from Listing 1 and illustrated by a diagram. To narrow the test output, ConcurrentLinkedDeque is abbreviated to CLD.

Providing a detailed and informative trace is a game-changer. With it, we can easily understand why ConcurrentLinkedDeque is incorrect. The underlying data structure forms a doubly-linked list, with head and tail pointers approximating its first and last nodes. Initially, head and tail point to a logically removed (Node.item == null) node. After addFirst(-8) in the second thread is applied, a new node is added to the beginning; head and tail remain unchanged. Then, pollLast() starts; it finds the last non-empty node (the previously added one) and gets preempted before extracting the element. (The procedure linearizes on changing the Node.item value to null via atomic Compare-and-Set (CAS) instruction.) After invoking addLast(-6) in the first thread, a new node is added to the end of the list. The following peekFirst() does not change the data structure logically but advances the head pointer. Finally, the execution switches back to the second thread. The pollLast() operation successfully removes the node containing -8 (which is no longer the last element), extracting the item via CAS followed by unlinking the node physically. These twelve lines of straightforward code easily find a bug in the standard library of Java and provide a detailed trace that leads to the error, reducing the investigation time from hours to minutes. We also believe that with such an instrument as Lincheck, the bug would not have been released in the first place.

*Practice-Oriented Design.* Lincheck was designed as a tool for testing real-world concurrent code. The following properties of Lincheck are crucial in practice:


*Real-World Applications.* We have successfully integrated Lincheck into the development processes of the Kotlin Coroutines [16] and JCTools [17] libraries, enabling reliable testing of their core data structures, which are often complex and several thousand lines of code long. Lincheck's support for popular workload constraints and linearizability relaxations, and its ability to handle blocking operations, such as those of Mutex and Channel, were crucial for these tests. Furthermore, for over five years, we have successfully used Lincheck in our "Parallel Programming" course to automate the verification of more than 4K student solutions annually.

We have also detected several new bugs [18] in popular libraries, including the previously discussed race in Java's ConcurrentLinkedDeque [13], non-linearizability of NonBlockingHashMapLong from JCTools [19], and liveness bugs in Java's AbstractQueuedSynchronizer [18] and Mutex in Kotlin Coroutines [20].

In conclusion, Lincheck is a powerful and versatile tool for testing complex concurrent programs, providing non-trivial features in terms of generality, ease of use, and performance. We give a comprehensive overview of Lincheck in the rest of the paper and believe that it will save significant time and (mental) energy spent tracking down concurrency bugs.

### 2 Lincheck Overview

We now dive into Lincheck internals, presenting its key features as we go along. The testing process can be broken down into three stages, as depicted in the diagram below. Lincheck generates a set of concurrent scenarios and examines them via either model checking or stress testing, verifying that the results of each scenario invocation satisfy the desired correctness property (linearizability [9] by default). If the outcome is incorrect, the invocation hangs, or the code throws an unexpected exception, the test fails with an error similar to the one in Fig. 1.

*Minimizing Failing Scenarios.* When an error is detected, it is often possible to reproduce it with fewer threads and operations [21]. Lincheck automatically "minimizes" the failing scenario in a greedy way: it repeatedly removes operations from the scenario as long as the test keeps failing, thus finding a *minimal* failing scenario. While this approach is not theoretically optimal, we have found it to work well in practice<sup>1</sup>.
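The greedy strategy can be sketched as follows (a simplified Java illustration, not Lincheck's implementation; the failure predicate is a made-up stand-in for re-running the test on a candidate scenario):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class Minimizer {
    // Greedy minimization: repeatedly drop a single operation as long as the
    // shrunken scenario still fails the test.
    static List<String> minimize(List<String> scenario, Predicate<List<String>> fails) {
        boolean shrunk = true;
        while (shrunk) {
            shrunk = false;
            for (int i = 0; i < scenario.size(); i++) {
                List<String> candidate = new ArrayList<>(scenario);
                candidate.remove(i);
                if (fails.test(candidate)) { // still failing: keep the smaller one
                    scenario = candidate;
                    shrunk = true;
                    break;
                }
            }
        }
        return scenario; // minimal: no single removal keeps the test failing
    }

    public static void main(String[] args) {
        // Toy failure condition: the scenario fails iff it contains both
        // "addFirst" and "pollLast" (stand-ins for the conflicting operations).
        Predicate<List<String>> fails =
            s -> s.contains("addFirst") && s.contains("pollLast");
        List<String> scenario =
            List.of("peekFirst", "addFirst", "addLast", "pollLast", "contains");
        System.out.println(minimize(new ArrayList<>(scenario), fails));
        // prints [addFirst, pollLast]
    }
}
```

Note that the result is minimal with respect to single removals, which matches the greedy strategy above but does not guarantee a globally minimum scenario.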

*User Guide.* This section focuses mainly on the technical aspects behind the Lincheck features. For readers interested in using the framework in their projects, we suggest taking a look at the official Lincheck guide [22].

### 2.1 Phase 1: Scenario Generation

Lincheck allows users to tune the number of parallel threads, the number of operations per thread, and the number of scenarios to be generated when creating ModelCheckingOptions or StressOptions. The framework then generates a set of concurrent scenarios by filling threads with randomly picked operations (annotated with @Operation) and generating (by default, random) arguments for these operations.
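This generation phase can be sketched as follows (a hypothetical, simplified Java illustration, not Lincheck's code; the operation names and the 1..3 argument range mirror the hash-set example discussed next, and the fixed seed is only for reproducibility):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ScenarioGenerator {
    public static void main(String[] args) {
        // Operations playing the role of methods annotated with @Operation.
        String[] operations = {"add", "remove", "contains"};
        Random random = new Random(42); // fixed seed for reproducibility
        int threads = 2, opsPerThread = 3;
        List<List<String>> scenario = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            List<String> thread = new ArrayList<>();
            for (int i = 0; i < opsPerThread; i++) {
                String op = operations[random.nextInt(operations.length)];
                int arg = 1 + random.nextInt(3); // arguments in 1..3, like conf = "1:3"
                thread.add(op + "(" + arg + ")");
            }
            scenario.add(thread);
        }
        scenario.forEach(System.out::println);
        // Sanity check: two threads, three operations each.
        boolean ok = scenario.size() == 2
                && scenario.stream().allMatch(t -> t.size() == 3);
        System.out.println(ok);
    }
}
```

A real generator additionally respects workload constraints and supports pluggable argument generators; this sketch shows only the core idea of random filling.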

*Operation Arguments.* Consider testing a concurrent hash table. If it has a bug, it is more likely to be detected when accessing the same element concurrently. To increase the probability of such scenarios, users can narrow the range of possible elements passed to the operations; Listing 2 illustrates how to configure the test so that the generated elements are always between 1 and 3.

```
@Param(name = "elem", gen = IntGen::class, conf = "1:3")
@OpGroupConfig(name = "writer", nonParallel = true)
class SingleWriterHashSetTest {
    val s = SingleWriterHashSet<Int>()

    @Operation(group = "writer") // never executes concurrently
    fun add(@Param(name = "elem") e: Int) = s.add(e)

    @Operation
    fun contains(@Param(name = "elem") e: Int) = s.contains(e)
```
<sup>1</sup> Finding the *minimum* failing scenario is a highly complex problem, as it may not be based on any of the generated scenarios.

```
    @Operation(group = "writer") // never executes concurrently
    fun remove(@Param(name = "elem") e: Int) = s.remove(e)

    @Test fun runTest() = ModelCheckingOptions()
        .check(this::class)
}
```
Listing 2. Testing a single-writer set with custom argument generation and a single-writer workload constraint.

*Workload Constraints.* Some data structures require that certain operations are never executed concurrently, such as single-producer/single-consumer queues. Lincheck provides out-of-the-box support for such constraints, generating scenarios accordingly. The framework API requires grouping such operations and restricting their parallelism; Listing 2 illustrates how to test a single-writer set.

#### 2.2 Phase 2: Scenario Running

Lincheck uses stress testing and model checking to examine the generated scenarios. The stress testing mode was influenced by JCStress [3], but Lincheck automatically generates scenarios and verifies outcomes, while JCStress requires listing both scenarios and correct results manually. The main issue with stress testing is the complexity of analysing a bug after detecting it. To mitigate this, Lincheck supports bounded model checking, providing detailed traces that reproduce bugs, similar to the one in Fig. 1. The rest of this subsection focuses on the model-checking approach, discussing its most significant details.

*Bounded Model Checker.* The model-checking mode has drawn inspiration from the CHESS (also known as Line-Up) framework for C# [5]. It assumes the sequentially consistent memory model and evaluates all possible schedules with a limited number of context switches. Unlike CHESS, Lincheck bounds the number of schedules rather than context switches, which makes testing time independent of scenario size and algorithm complexity.

In some cases, the specified number of schedules may not be enough to explore all interleavings, so Lincheck studies them evenly, probing logically different scenarios first. For instance, imagine a case where Lincheck is analyzing interleavings with a single context switch and has previously explored only one interleaving, which originated from the first thread containing four atomic operations. Under these circumstances, Lincheck presumes that 25% of the interleavings have been explored when starting from the first thread, while the second thread remains unexplored. As a result, Lincheck becomes more inclined to select the second thread as the starting point for the next exploration.
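The balancing heuristic from this example can be sketched as follows (a toy Java illustration under the numbers stated above, not Lincheck's actual scheduler):

```java
public class ExplorationPlanner {
    // With one context switch, starting in thread t yields as many
    // interleavings as t has atomic operations. We track how many were
    // already explored and start the next exploration in the least-covered
    // thread.
    public static void main(String[] args) {
        int[] totalPerThread = {4, 3};    // interleavings starting in thread 0 / 1
        int[] exploredPerThread = {1, 0}; // one interleaving from thread 0 explored
        int next = 0;
        double best = Double.MAX_VALUE;
        for (int t = 0; t < totalPerThread.length; t++) {
            double coverage = (double) exploredPerThread[t] / totalPerThread[t];
            if (coverage < best) { best = coverage; next = t; }
        }
        // Thread 0 is 25% covered, thread 1 is 0% covered: pick thread 1.
        System.out.println("start next exploration in thread " + next);
    }
}
```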

*Switch Points.* To control the execution, Lincheck inserts internal method calls at shared memory accesses via on-the-fly bytecode transformation using the ASM framework [23]. These internal methods serve as *switch points*, enabling manual context switching. Notably, Lincheck supports shared memory access through AtomicFieldUpdater, VarHandle, and Unsafe, and handles built-in synchronization via MONITORENTER/MONITOREXIT, park/unpark, and wait/notify. Internally, it replaces these synchronization primitives with custom implementations, thus enabling full control of the execution.

*Progress Guarantees.* While exploring potential switch points, Lincheck can detect active synchronization, handling it similarly to locks. This ability to detect blocking code enables Lincheck to check the tested algorithm for *obstruction-freedom*<sup>2</sup>, the weakest non-blocking guarantee [10]. Although the more popular lock- and wait-freedom guarantees are part of Lincheck's future plans, the majority of practical liveness bugs are caused by unexpected blocking code, making the obstruction-freedom check useful for lock-free and wait-free algorithms as well.

*Optimizations.* Lincheck uses various heuristics to speed up the analysis and increase coverage. The most impactful one excludes final field accesses from the analysis, as their values never change. Our internal experiments indicate that this reduces the number of inserted switch points by more than 2× in real-world code. Another important optimization tracks objects that are not shared with other threads, excluding accesses to them from the analysis. This heuristic eliminates an additional 10–15% of switch points in practice.

*Happens-Before.* When an operation starts, Lincheck collects which operations from other threads are already completed to establish the "happens-before" relation; this information is further passed to the results verifier.

*Modular Testing.* When constructing new algorithms, it is common to use existing non-trivial data structures as building blocks. Considering such underlying data structures to be correct and treating their operations as atomic may significantly reduce the number of possible interleavings and check only meaningful ones, thus increasing the testing quality. Lincheck makes it possible with the modular testing feature; please read the official guide for details [22].

*Limitations.* For the model checking mode, the testing data structure must be deterministic to ensure reproducible executions, which is a common requirement for bug-reproducing tools [24]. For algorithms that utilize randomization, Lincheck offers out-of-the-box support by fixing seeds for Random, thus making the latter deterministic. In our experience, Random is the only source of non-determinism in practical concurrent algorithms.

*Model Checking vs Stress Testing.* The primary benefit of using model checking is obtaining a comprehensive trace reproducing the detected error, as demonstrated in Fig. 1. However, the current implementation assumes the sequentially consistent memory model, which can result in missed bugs caused by low-level effects, such as an omitted volatile modifier in Java. We are in the process of incorporating the GenMC algorithm [6,25] to support weak memory models and increase analysis coverage through the partial order reduction

<sup>2</sup> The *obstruction-freedom* property ensures that any operation completes within a limited number of steps if all other threads are stopped.

technique. In the meantime, we suggest using stress testing in addition to model checking.

### 2.3 Phase 3: Verification of Outcome Results

Once a scenario has been executed, the operation results are verified against the specified correctness property, which is linearizability [9] by default. In brief, Lincheck tries to match the operation results to a sequential history that preserves the order of operations within threads and does not violate the "happens-before" relation established during the execution.

*LTS.* Instead of generating all possible sequential executions, Lincheck lazily builds a *labeled transition system (LTS)* [26] and tries to explain the obtained results using it. Roughly, an LTS is a directed graph whose nodes represent data structure states, while edges specify transitions and are labeled with operations and their results. Execution results are considered valid if there exists a finite path in the LTS (i.e., a sequential history) that leads to the same results. Lincheck builds the LTS lazily by invoking operations on the testing data structure in a single thread; thus, the sequential behavior is specified implicitly. Figure 2 illustrates an LTS lazily constructed by Lincheck for verifying the incorrect results of ConcurrentLinkedDeque from Fig. 1.
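The lazy construction can be sketched as follows (an illustrative Java fragment, not Lincheck's implementation; states are encoded as strings, and a trusted sequential deque plays the role of the testing data structure):

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

public class LazyLts {
    // An edge (state, op) -> (state', result) is computed on demand by running
    // op sequentially on a snapshot of the state, then cached for reuse.
    static final Map<String, Map.Entry<String, Integer>> edges = new HashMap<>();

    // States encode deque contents front-to-back, e.g. "" (empty) or "-8,-6".
    static Map.Entry<String, Integer> step(String state, String op) {
        return edges.computeIfAbsent(state + "|" + op, key -> {
            ArrayDeque<Integer> d = new ArrayDeque<>();
            for (String s : state.split(","))
                if (!s.isEmpty()) d.addLast(Integer.parseInt(s));
            Integer res;
            switch (op) {
                case "addFirst(-8)" -> { d.addFirst(-8); res = null; }
                case "pollLast()"   -> res = d.pollLast();
                default             -> throw new IllegalStateException(op);
            }
            StringBuilder next = new StringBuilder();
            for (int v : d) next.append(next.length() == 0 ? "" : ",").append(v);
            return new AbstractMap.SimpleEntry<>(next.toString(), res);
        });
    }

    public static void main(String[] args) {
        var s1 = step("", "addFirst(-8)");        // new edge: computed and cached
        var s2 = step(s1.getKey(), "pollLast()"); // -8 is the only element
        step("", "addFirst(-8)");                 // repeated query: cache hit
        System.out.println(s2.getValue() + " " + edges.size()); // prints -8 2
    }
}
```

Verification then amounts to searching this graph for a path whose edge labels match the observed operation results.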

*Sequential Specification.* By default, Lincheck sequentially manipulates the testing data structure to build an LTS. It is possible to specify the sequential behavior explicitly, providing a separate class with the same methods as those annotated with @Operation. It allows for a single Lincheck test instead of separate sequential and concurrent ones. For API details, please refer to the guide [22].

*Validation Functions.* It is possible to validate the data structure invariants at the end of the test by adding the corresponding function and annotating it with @Validate. For example, we have uncovered a memory leak in the algorithm for removing nodes from a concurrent linked list in [27] by validating that logically removed nodes are unreachable at the end.

Fig. 2. An LTS constructed for verifying ConcurrentLinkedDeque results from Fig. 1.

*Linearizability Relaxations.* In addition to linearizability, Lincheck supports various relaxations, such as quiescent consistency [10], quantitative relaxation [11], and quasi-linearizability [12].

*Blocking Operations.* Some structures are blocking by design, such as Mutex or Channel. Consider a *rendezvous channel*, also known as a "synchronous queue", as an example: senders and receivers perform a rendezvous handshake as part of their protocol (senders wait for receivers and vice versa). If we run send(e) and receive() in parallel, they both succeed. However, executing the operations sequentially results in suspending the first one. To reason about correctness, the *dual data structures* formalism [28] is usually used. Essentially, it splits each operation into two parts at the point of suspension, linearizing these parts separately. We extend this formalism by allowing suspended requests to be cancelled and by making it more efficient for verification.
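The sequential behavior under the dual-data-structure view can be sketched as follows (a simplified sequential Java model, not Kotlin's Channel; an operation with no matching partner suspends as a pending request, which the partner's later call completes):

```java
import java.util.ArrayDeque;

public class RendezvousChannel {
    enum Kind { SEND, RECEIVE }
    // A suspended request: the first half of a split operation.
    record Request(Kind kind, Integer element) {}
    private final ArrayDeque<Request> waiting = new ArrayDeque<>();

    String send(int e) {
        Request head = waiting.peek();
        if (head != null && head.kind() == Kind.RECEIVE) {
            waiting.poll();                       // complete the pending receive
            return "rendezvous: receiver got " + e;
        }
        waiting.add(new Request(Kind.SEND, e));   // request part; no result yet
        return "suspended";
    }

    String receive() {
        Request head = waiting.peek();
        if (head != null && head.kind() == Kind.SEND) {
            waiting.poll();                       // complete the pending send
            return "received " + head.element();
        }
        waiting.add(new Request(Kind.RECEIVE, null));
        return "suspended";
    }

    public static void main(String[] args) {
        RendezvousChannel c = new RendezvousChannel();
        System.out.println(c.send(1));   // no receiver yet: the request suspends
        System.out.println(c.receive()); // completes the pending send
    }
}
```

The two printed lines show exactly the behavior described above: the first sequential operation suspends, and its completion linearizes separately when the partner arrives.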

### 3 Evaluation

Lincheck has already gained adoption in the Kotlin and Java communities, as well as in companies and universities. It has been integrated into the development processes of Kotlin Coroutines [16] and JCTools [17], enabling reliable testing of their core data structures, and was used to find several new bugs in popular concurrency libraries and algorithms published at top-tier conferences. Furthermore, for over five years, we have successfully used Lincheck in our "Parallel Programming" course to automate the verification of more than 4K student solutions per year. Notably, many users appear to especially appreciate Lincheck's low entry threshold and its ability to "explain" errors with detailed traces.

*Novel Bugs Discovered with* Lincheck*.* We have uncovered multiple new concurrency bugs in popular libraries and in authors' implementations of algorithms published at top conferences. These bugs are listed in Table 1 and include some found in the standard Java library. Lincheck detects not only non-linearizability and unexpected-exception bugs but also liveness issues. For example, it identified an obstruction-freedom violation in Java's AbstractQueuedSynchronizer framework, which is the foundation for most synchronization primitives in the standard Java library.

Notably, the tests that uncover the bugs listed in Table 1 are publicly available [18], allowing readers to easily reproduce these bugs.

*Running Time Analysis.* We have designed Lincheck for daily use and expect it to be fast enough in interactive mode. Various factors, including the complexity of the testing algorithm and the number of threads, operations, and invocations, can impact its performance. We suggest using two configurations for the best user experience and robustness: a *fast* configuration for local builds to catch simple bugs quickly and a *long* configuration to perform a more thorough analysis on CI/CD (Continuous Integration) servers:


We assess the performance and reliability of Lincheck with these *fast* and *long* configurations by measuring the testing times and showing whether the expected bugs were detected. We run the experiment on the buggy algorithms listed in Table 1, along with ConcurrentHashMap and ConcurrentLinkedQueue from


Table 1. Novel bugs discovered with Lincheck; tests are publicly available [18].

*<sup>a</sup>* The deadlock in the LogicalOrderingAVL algorithm was originally found by Trevor Brown and later confirmed with Lincheck.

Table 2. Running times of Lincheck tests with *fast* and *long* configurations using both stress testing and model checking (MC) for the listed data structures. Failed tests, which detect bugs, are highlighted with red. Notably, finding a bug may take longer than testing a correct implementation due to scenario minimization.


the Java standard library and a quasi-linearizable LockFreeTaskQueue with Semaphore from Kotlin Coroutines. The results are available in Table 2. The experiment was conducted on a Xiaomi Mi Notebook Pro 2019 with an Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz and 32 GB RAM. The results show that the *fast* configuration ensures short running times, making it suitable for unit tests that do not slow down the build while still uncovering some bugs. However, some bugs are detected only with the *long* configuration, emphasizing the need for more operations and invocations to guarantee correctness. Even so, the running time remains practical and acceptable.

### 4 Related Work

Several excellent tools for linearizability testing and model checking have been proposed, e.g. [4,5,35–41], and some even support relaxed memory models [6,25, 42,43] and linearizability relaxations [36,44]. Due to space limitations, we focus our discussion on the works that shaped Lincheck.

*Inspiration.* Lincheck was originally inspired by the JCStress [3] tool for JVM, which is designed to test the memory model implementation. However, JCStress does not offer a declarative approach to writing tests. The bounded model checker in Lincheck was influenced by CHESS (Line-Up) [5] for C#, which is also non-declarative and does not support linearizability extensions. Lincheck offers several novel features and usability advantages compared to these inspirations, making it a versatile platform for research in testing and model checking. Although other tools such as GenMC [6,25,43] have superior features, Lincheck is designed to be extensible and can integrate new tools. In particular, we are working on incorporating the GenMC algorithm into Lincheck at the moment of writing this paper.

*Lincheck Compared to Other Solutions.* To the best of our knowledge, no other tool offers similar functionality. In particular, Lincheck allows certain operations to never execute in parallel (supporting single-producer/consumer constraints), detects obstruction-freedom violations (which is crucial for checking non-blocking algorithms), provides a way to specify sequential behavior explicitly (enabling oracle-based testing), and supports blocking operations for Kotlin Coroutines. Furthermore, Lincheck is a highly user-friendly framework, featuring a simple API and easy-to-understand output, which we have found users to highly appreciate.

### 5 Discussion

We introduced Lincheck, a versatile and expandable framework for testing concurrent data structures. As Lincheck is not just a tool but a platform for incorporating advancements in concurrency testing and model checking, we plan to integrate cutting-edge model checkers that support weak memory models. Written in Kotlin, Lincheck is also interoperable with native languages such as Swift or C/C++. Our goal is to extend Lincheck testing to these languages, making it the leading tool for checking correctness of concurrent algorithms. We believe that Lincheck has the potential to significantly improve the quality and efficiency of concurrent algorithms development, reducing time and effort to write reliable tests and investigate bugs.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **nekton: A Linearizability Proof Checker**

Roland Meyer<sup>1</sup> , Anton Opaterny1(B) , Thomas Wies2 , and Sebastian Wolff<sup>2</sup>

<sup>1</sup> TU Braunschweig, Braunschweig, Germany {roland.meyer,anton.opaterny}@tu-bs.de <sup>2</sup> New York University, New York, USA {wies,sebastian.wolff}@cs.nyu.edu

**Abstract.** nekton is a new tool for checking linearizability proofs of highly complex concurrent search structures. The tool's unique features are its parametric heap abstraction based on separation logic and the flow framework, and its support for hindsight arguments about future-dependent linearization points. We describe the tool, present a case study, and discuss implementation details.

**Keywords:** separation logic · proof checker · linearizability · flow framework

### **1 Introduction**

We present nekton, a mostly automated deductive program verifier based on separation logic (SL) [23,27]. The tool is designed to aid the construction of linearizability proofs for complex concurrent search structures. Similar to many other SL-based tools [2,8,14,22,33], nekton uses an SMT solver to automate basic SL reasoning. Similar to the original implementation of CIVL [7], it uses non-interference reasoning à la Owicki-Gries [25] to automate thread modularity. What makes nekton stand out among these relatives is its inbuilt support for expressing complex inductive heap invariants using the flow framework [12,13,20] and the ability to (partially) automate complex linearizability arguments that require hindsight reasoning [4,5,15,18,19,24]. Together, these features enable nekton to verify challenging concurrent data structures such as the FEMRS tree [4] with little user guidance.

nekton [17] is derived from the tool plankton [18,19], which shares the same overall goals and features as nekton but strives for full proof automation at the expense of generality. In terms of the trade-off between automation and expressivity, nekton aims to occupy a sweet spot between plankton and general purpose program verifiers. In the following, we discuss nekton's unique features in more detail and explain how it deviates from plankton's design.

The flow framework can be used to express global properties of graph structures in a node-local manner, aiding compositional verification of recursive data structures. The framework is parametric in a *flow domain*, which determines what global information about the graph is provided at each node. Various flow domains have been proposed and have been shown to be useful in concurrency proofs [11,26]. To simplify proof automation, plankton uses a fixed flow domain that is geared towards verifying functional correctness of search structures. In contrast, nekton is parametric in the flow domain. For instance, it supports custom domains for reasoning about overlayed structures and other data-structure-specific invariants. This design choice significantly increases the expressivity of the tool at the cost of a mild increase in the annotation burden for the user. For instance, the FEMRS tree case study that we present in this paper relies on a flow domain that is beyond the scope of plankton. In fact, the flow domain is also beyond state-of-the-art abstract-interpretation-based verification tools for checking linearizability [1]. However, computing relative to a given flow domain is considerably more difficult than computing with a hard-coded one: it requires parametric versions of (1) computing post images, (2) checking entailment, and (3) checking non-interference. Yet, it still allows for sufficient automation, in contrast to general user-defined (recursive) predicates as accepted by, e.g., Viper [22] and VeriFast [9].

The second key feature of nekton is its support for *hindsight reasoning*. Intuitively, hindsight arguments rely on statements of the form "if q holds in the current state and p held in some past state, then r must have held in some intermediate state". Such arguments can greatly simplify the reasoning about complex concurrent algorithms that involve future-dependent linearization points. At a technical level, hindsight reasoning is realized by lifting a state-based separation logic to one defined over computation histories [18,19]. nekton's support for this style of reasoning goes beyond the simple hindsight rule in [18] but does not yet implement the general *temporal interpolation* rule introduced more recently in [19], which is already supported by plankton.

These features set nekton apart from its competitors. First, it offers more expressivity than tools with a higher degree of automation, like plankton [18,19], Cave [29–31], and Poling [34]. Second, its proofs require less annotation effort than the more flexible refinement proofs for fine-grained concurrency, like those of CIVL [7,10] and Armada [16]. Last, it integrates techniques for proving linearizability, which are missing in industrial-grade tools like Anchor [6].

In the remainder of this paper, we provide a high-level overview of the tool (Sect. 2), present a case study (Sect. 3), and discuss implementation details, some of which also concern plankton and have not been reported before (Sect. 4).

#### **2 Input**

nekton checks the correctness of proof outlines for the linearizability of concurrent data structures. Its distinguishing feature compared to its ancestor plankton is that the heap abstraction is not hard-coded inside the tool, but taken as an input parameter. That is, nekton's input is a *heap abstraction* and a set of *proof outlines*, one for each function manipulating the data structure state. The heap abstraction defines how the data structure's heap representation is mapped onto a labeled graph that captures the properties of interest and that can then be reasoned about in separation logic. It also embeds the mechanism for checking linearizability.

nekton works with the recent flow graphs proposed by Krishna et al. [12,13], in their latest formulation due to [18]. Flow graphs augment heap graphs with ghost state. The ghost state can be understood as a certificate formulating global properties of heap graphs in a node-local manner. It takes the form of a so-called flow value that has been propagated through the heap graph and, therefore, brings global information with it. The propagation is like in static analysis, except that we work over heap graphs rather than control-flow graphs. To give an example, assume we want to express the global property that the heap graph is a tree. A helpful certificate would be the path count, the number of paths from a distinguished root node to the node of interest. It allows us to formulate the tree property node-locally, by saying that the path count is always at most one.
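The path-count certificate from this example can be sketched as follows (a small Java illustration with a made-up heap graph; path counts are propagated from the root along edges, and the node-local tree property "path count at most one" is then checked at each node):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PathCount {
    public static void main(String[] args) {
        // Edges of a hypothetical heap graph; "r" is the distinguished root.
        Map<String, List<String>> succ = Map.of(
            "r", List.of("a", "b"),
            "a", List.of("c"),
            "b", List.of("c"), // two paths reach c: the graph is not a tree
            "c", List.of());
        Map<String, Integer> flow =
            new HashMap<>(Map.of("r", 1, "a", 0, "b", 0, "c", 0));
        // Propagate path counts along edges in topological order r, a, b, c.
        for (String n : List.of("r", "a", "b", "c"))
            for (String m : succ.get(n)) flow.merge(m, flow.get(n), Integer::sum);
        // Node-local tree property: every node's path count is at most one.
        boolean isTree = flow.values().stream().allMatch(v -> v <= 1);
        System.out.println(flow.get("c") + " " + isTree); // prints 2 false
    }
}
```

The check is node-local: the violation is detected at node c alone, without inspecting the graph globally.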

Our first input is a flow domain (M, *gen*). The parameter (M, +, 0) is a commutative monoid from which we draw the flow values. The propagation needs standard fixed-point theory: the natural ordering a ≤ a + b for a, b ∈ M on the monoid should form an ω-complete partial order. We expect the user to specify both + and ≤ to avoid the quantifier over the offset in the definition of ≤. The parameter *gen* generates the transfer functions labeling the edges in the heap graph. Transfer functions transform flow values to record information about the global shape. The generator has the type

$$\mathit{gen} : \mathsf{PointerFld} \to (\mathsf{DataFld} \to \mathit{Data}) \to \mathsf{Mon}(M \to M)\,.$$

We assume flow graphs distinguish between pointer fields (PointerFld) and fields that hold data values (DataFld). Flow values are propagated along every pointer field, in a way that depends on the current data values but that does not depend on the target of the field. To see that the data values are important, imagine a node has already been deleted logically but not yet physically from a data structure, as is often the case in lock-free processing. Then the logical deletion would be indicated by a raised flag (a distinguished data field), and we would not forward the current path count. To reason about flow values with SMT solvers, we restrict the allowed types of flow values to

$$M \;::=\; \mathbb{B} \;\mid\; \mathbb{N} \;\mid\; \mathbb{P}(\mathbb{B}) \;\mid\; \mathbb{P}(\mathbb{N}) \;\mid\; M \times M \;\mid\; \dots$$

Flow values are (sets of) Booleans or integers, or products over these base types. When defining a product type, the user has to label each component with a selector allowing one to project a tuple onto that component. Importantly, the user can define the addition operation + for the flow monoid freely over the chosen type, as long as the definition is expressible within the underlying SMT theory (e.g., for N one may choose the usual addition or the maximum as +). The tool likewise inherits the assertion language for integers and Booleans that is supported by the SMT solver. There are two more user-defined inputs that are tightly linked to the heap representation.

**Linearizability.** We establish the linearizability of functions manipulating a data structure with the help of the keyset framework [11,28], which we encode using flows. A crucial problem when proving linearizability is membership queries: we have to determine whether a given key has been in the data structure at some point in time while the function was running. The keyset framework localizes these membership queries from the overall data structure to single nodes. It assigns to each node n a set of keys for which n is responsible, in the sense that n has to answer the membership queries for these keys. This set of keys is n's *keyset*. Imagine we have a singly linked list

$$\mathsf{Head} \xrightarrow{(-\infty,\infty)} (n_1, 5) \xrightarrow{[6,\infty)} (n_2, 7^\dagger) \xrightarrow{[6,\infty)} (n_3, 10) \xrightarrow{[11,\infty)} \bot$$

The shared pointer Head propagates the keys in the interval (−∞, ∞) as a flow value to node n1 holding key 5. This set is called n1's *inset*. The inset of a node n contains all keys *k* for which a search will reach n. If *k* > 5, the search will proceed to n2; otherwise, it will stay at n1. Thus, the keyset of n1 is (−∞, 5]. That is, if *k* ∈ (−∞, 5], the answer to the membership query is determined by the test *k* = 5. Node n1 forwards [6, ∞) to the successor node n2 with key 7. Since n2 has been logically deleted, indicated by the tombstone †, it cannot answer membership queries: its keyset is empty. Instead, the node forwards its entire inset [6, ∞) to node n3, which is now responsible for the keyset [6, 10]. We speak of a framework because whether a given key *k* belongs to a node's keyset or whether it is propagated to one of the node's successors is specific to each data structure, but the way in which the linearizability argument for membership queries is localized to individual flow graph nodes is always the same.
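The inset/keyset propagation for this list can be sketched as follows (an illustrative Java fragment; a finite key range 0..15 stands in for (−∞, ∞), and the forwarding rule encodes the search-structure behavior described above):

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class Keysets {
    record Node(String name, int key, boolean marked) {}

    // Keys a node forwards to its successor: everything above its own key,
    // or its entire inset if the node is logically deleted (marked).
    static Set<Integer> forwarded(Node n, Set<Integer> inset) {
        return n.marked() ? inset
             : inset.stream().filter(k -> k > n.key()).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Finite stand-in for (-inf, inf): keys 0..15.
        Set<Integer> inset = IntStream.rangeClosed(0, 15).boxed()
                .collect(Collectors.toCollection(TreeSet::new));
        Node[] list = { new Node("n1", 5, false),
                        new Node("n2", 7, true),   // tombstoned
                        new Node("n3", 10, false) };
        for (Node n : list) {
            Set<Integer> out = forwarded(n, inset);
            Set<Integer> keyset = new TreeSet<>(inset);
            keyset.removeAll(out);                 // keyset = inset minus forwarded
            System.out.println(n.name() + " responsible for 6: " + keyset.contains(6));
            inset = out;                           // successor's inset
        }
    }
}
```

The output reproduces the example: n1 and the tombstoned n2 are not responsible for key 6, while n3 is.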

In nekton, the user can define P(N) for sets of keys as (a component in) the flow domain of interest. With the parameter *gen*, they can implement the propagation. We also provide flexibility in the definition of keysets and membership queries in the form of two predicates, *rsp* (responsible) and *cnts* (contains). To give an example, we would define

$$\mathit{rsp}(x, k) \;\triangleq\; k \in x{\to}\mathsf{flow}.\mathsf{is} \;\ast\; k \leq x{\to}\mathsf{key} \;\ast\; \neg\, x{\to}\mathsf{marked}\,.$$

With x->flow, we denote x's flow value. The flow domain is a product, and we refer to the component called is. With x->key and x->marked we denote x's key and marked fields. Formally, the dereference notation is a naming convention for logical variables that refer to values of resources defined in the node-local invariant explained below. Reconsider the example and let *k* = 6. The key belongs to the inset [6,∞) that n2 receives from n1. We discussed that the node's keyset is empty, and indeed *rsp*(n2, 6) is false. For n3, we have *rsp*(n3, 6) true. With the predicate *rsp* in place, we can also refer to n->keyset in assertions.

For verifying functions with non-fixed linearization points, nekton implements the hindsight principle [24]. Reasoning with that principle goes as follows. We record information about bygone states of the data structure in past predicates ⟐ a. For example, ⟐(*k* ∈ x->flow.is) says that the key of interest was in the node's inset at some point while the function was running. Moreover, the assertion about the current state may tell us that the key is smaller than the key held by the node and that the node is not marked now, *k* ≤ x->key ∗ ¬x->marked. Then the hindsight principle will guarantee that there has been a state in between the two moments where the node still had the key in its inset, the inequality held true, and the node was unmarked. This is ⟐ *rsp*(x, *k*) as defined above. To draw this conclusion, the hindsight principle inspects the interferences the data structure state may experience from concurrently executed functions. In the example, no interference can unmark a node or change a key. So the predicates encountered in the current state must have held already in the past state when *k* ∈ x->flow.is was true. This form of hindsight reasoning is stronger than the one in [18] but not yet as elaborate as the one in [19]. From a program logic point of view, hindsight reasoning relies on a lifting of state-based to computation-based separation algebras [18].

**Implications.** Reasoning about automatically generated transfer functions is difficult, in particular when they relate different components in a product flow domain. Consider <sup>N</sup> <sup>×</sup> <sup>P</sup>(N) with the first component the path count at a node and the second component the keyset. The transfer functions will never forget to count a path, and so the following implication will be valid over all heap graphs:

$$(x{\to}\mathsf{flow.pcount}) = 0 \quad \implies \quad (x{\to}\mathsf{flow.keyset}) = \emptyset\ . \tag{1}$$

Despite the help of an SMT solver, nekton will fail to establish the validity of such an implication. Therefore, the user may input a set of such formulas that the tool will then take to be valid without further checks. Correctness of a proof is always relative to this set of implications.

#### **2.1 Proof Outlines**

A concurrent data structure consists of a set of structs defining the heap elements and a set of functions for manipulating the data structure state. nekton expects as input a proof outline for each such function. The program logic implemented by nekton is an Owicki-Gries system that, besides partial correctness, requires interference freedom of the given proof outlines. The user is expected to give the interferences as input.

The proof outlines accepted by nekton take the form { *pre* } *po* { *post* } with

*po* ::= com | { *a* } | *po* ; *po* | (*po* + *po*) { *a* } | { *a* } *po*\* { *a* } | atomic *po* .

The proof outlines are partial in that intermediate assertions, say in com1 ; com2, may be omitted. nekton will automatically generate the missing information using strongest postconditions. What has to be given are loop invariants and unifying assertions for the different branches of if-then-else statements. Consecutive assertions { *a* } ; { *b* } are interpreted as a weakening of a to b.

Programs are given in a dialect of C. Commands are assignments to/from variables and memory locations, allocations, assumptions, and acquires/releases of locks

com ::= p = q | p = q->fld | p->fld = q | p = malloc() | assume(cond) | acquire(p->fld) | release(p->fld) .

Here, p, q are program variables, fld is a field name, and dereferences are denoted by an arrow. The language is strictly typed with base types void, bool, and int. The latter represents the mathematical integers, i.e., has an infinite domain. We admit the usual conditions over the base types. Using the struct keyword, users can specify their own types. In addition, nekton supports syntactic sugar like if-then-else, (do-)while loops, non-recursive macros, break and return statements, assertions, simultaneous assignments, and compare-and-swaps. These can be expressed in terms of the core language in the expected way.

The assertion language is a standard separation logic defined over the base types, heap graphs, and the given flow domain. It has the separating conjunction and classical implication (no magic wand). Our heap model is divided into a local and a shared heap, and we use the box operator from RGSep [32] to mark assertions over the shared state. The shared state is represented by an iterated separating conjunction. Since this conjunction refers to a set of nodes and we want to reason first-order, we handle it implicitly. We let each assertion a in a proof outline stand for ∃x̄. a ∗ ⊛_{n ∈ N \ Nodes(a)} *NInv*(n). The iterated separating conjunction is over all nodes that do not occur in a, and asserts a node-local invariant for each of them. The existential quantifier is over all logical variables in the assertion. Keeping it implicit makes the assertions more concise and aids automation.

**Node Invariants.** nekton expects the node-local invariant *NInv*(n) as another input. The role of this invariant is to make use of the flow framework and state global properties of the data structure in a local way. The invariant would say, for instance, that sentinel nodes are never marked. Compared to the implication list, the node-local invariant has the advantage that its claims are actually checked. Technically, the node-local invariant is a separation logic formula that is only allowed to refer to the given node n and its fields. It will often define logical variables like n->flow that refer to the entry of the flow field and can be used outside the node-local invariant. These variables are quantified away by ∃x̄ above.

**Interferences.** Interferences are RGSep actions [32] restricted to the format

$$NInv(x).\ \{\, a \,\} \leadsto [\mathsf{fld}_1, \dots, \mathsf{fld}_n]\ \{\, b \,\} \tag{2}$$

To give an example, we formulate that a concurrently executed function may mark a node using the action *NInv*(x). { ¬x->marked } [marked] { x->marked }. An action refers to a single node in the heap graph as described by the above node-local invariant. The action applies if the assertion a evaluates to true, and modifies the node in a way that satisfies b. Like the invariant, the assertions a and b have to be node-local and only refer to the values of x's fields. The assertions may introduce logical variables that are implicitly existentially quantified and whose scope extends over a and b. Such variables allow us to relate the pre- and post-state of the interference. The fields given in the brackets are the ones that may change under the action. If assertion b does not refer to the value of a field that is given in the list, the field may receive arbitrary values. If a field is not named, it is guaranteed to stay unchanged.
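The semantics of such an action can be pictured with a toy sketch. This is our own encoding, not nekton's internals: a node is a dict of fields, and the action consists of a guard, the set of fields it may change, and a postcondition on the new values.

```python
# Toy model of applying NInv(x). { a } ~> [fld_1, ..., fld_n] { b } to one node.

def apply_action(node, guard, may_change, candidate_updates, post):
    """Return all successors of `node` under one application of the action."""
    if not guard(node):
        return []                      # the action is not enabled on this node
    succs = []
    for update in candidate_updates:   # possible new valuations of the fields
        succ = dict(node)
        # only the listed fields may change; all others stay as they are
        succ.update({f: v for f, v in update.items() if f in may_change})
        if post(succ):                 # keep only updates satisfying b
            succs.append(succ)
    return succs

# The marking action from the text: NInv(x). { not x->marked } [marked] { x->marked }.
node = {"key": 7, "marked": False}
succs = apply_action(
    node,
    guard=lambda n: not n["marked"],
    may_change={"marked"},
    candidate_updates=[{"marked": True}, {"marked": False}],
    post=lambda n: n["marked"],
)
```

On an unmarked node, the only successor is the marked copy; on a node that is already marked, the guard fails and the action produces no successors.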

### **3 Case Study**

We present a linearizability proof of the FEMRS tree [4] conducted with nekton. We omit the data structure's maintenance operation because it leads to flow updates that neither nekton nor any other state-of-the-art technique aimed at automation can handle. Each node in the tree stores one key and points to up to two child nodes left and right, storing keys with lower and higher values, respectively. In addition, each node contains two Boolean fields del and rem for the removal of nodes. This is because the tree distinguishes the logical removal, indicated by the del flag, from the physical unlinking of a node, indicated by the rem flag. As long as a logically removed node has not been unlinked, it can become part of the tree again. The idea is to save the creation of new nodes for keys that are physically but no longer logically part of the tree. Lastly, every node can be locked.

Figure 1 depicts a possible state of the FEMRS tree. Each node is labeled with its key. Dashed nodes have been logically removed. To prove linearizability, we rely on the keyset framework. The inset flow is used to define the keysets, as explained earlier. The edges in the figure are labeled with the flow they propagate. The transfer functions leading to this propagation stem from the following generator *gen*:

**Fig. 1.** A state of the FEMRS tree.

*gen*(fld) ≜ λf. x->del ? f : f \ (fld = left ? [x->key, ∞) : (−∞, x->key]) . The predicates defining the keyset and membership are

$$\begin{aligned} rsp(x,k) \triangleq\ & k \in x{\to}\mathsf{flow.is} \ast k = x{\to}\mathsf{key} \\ \vee\ & k \in x{\to}\mathsf{flow.is} \ast k < x{\to}\mathsf{key} \ast x{\to}\mathsf{left} = \mathsf{nil} \\ \vee\ & k \in x{\to}\mathsf{flow.is} \ast k > x{\to}\mathsf{key} \ast x{\to}\mathsf{right} = \mathsf{nil} \\ cnts(x,k) \triangleq\ & k \in x{\to}\mathsf{flow.is} \ast k = x{\to}\mathsf{key} \ast \neg\, x{\to}\mathsf{del}\ . \end{aligned}$$

In the example, *rsp*(5, 7), *rsp*(15, 15), *rsp*(20, 17), *cnts*(12, 12) and more hold.
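These definitions can be exercised on a small example. The sketch below is our own: Fig. 1 is not reproduced here, so the tree shape is a guess chosen to be consistent with the listed facts, and the encoding (dicts, characteristic functions for insets) is purely illustrative.

```python
import math

# Hypothetical tree consistent with the examples in the text; keys double as names.
TREE = {
    "Root": {"key": -math.inf, "del": False, "left": None, "right": 12},
    12:     {"key": 12, "del": False, "left": 5,    "right": 15},
    5:      {"key": 5,  "del": False, "left": None, "right": None},
    15:     {"key": 15, "del": False, "left": None, "right": 20},
    20:     {"key": 20, "del": False, "left": None, "right": None},
}

def edge(node, fld, f):
    """gen(fld): a deleted node forwards its inset unchanged; otherwise the
    left edge removes [key, inf) and the right edge removes (-inf, key]."""
    if node["del"]:
        return f
    if fld == "left":
        return lambda k: f(k) and k < node["key"]
    return lambda k: f(k) and k > node["key"]

def compute_insets(tree):
    insets = {"Root": lambda k: True}   # the root's inset is (-inf, inf)
    stack = ["Root"]
    while stack:
        name = stack.pop()
        node, f = tree[name], insets[name]
        for fld in ("left", "right"):
            child = node[fld]
            if child is not None:
                insets[child] = edge(node, fld, f)
                stack.append(child)
    return insets

INSETS = compute_insets(TREE)

def rsp(x, k):
    n = TREE[x]
    return INSETS[x](k) and (k == n["key"]
                             or (k < n["key"] and n["left"] is None)
                             or (k > n["key"] and n["right"] is None))

def cnts(x, k):
    n = TREE[x]
    return INSETS[x](k) and k == n["key"] and not n["del"]
```

On this tree, the sample facts from the text hold: node 5 is responsible for 7 (its right child is nil), node 20 is responsible for 17, and node 12 contains 12.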

The set of interferences expresses this: (I1) As long as the lock of the node is not held by the thread under consideration and as long as the node has not been marked unlinked, the child pointers and the (logical and physical) removal flags may change arbitrarily. The proof does not rely, e.g., on the fact that the rem flag is raised only once and only when the del flag is true. (I2) A lock that is not held by the thread may change arbitrarily. (I3) A node that is being physically unlinked ceases to receive flow. The following nekton actions formalize this:

$$NInv(x).\ \{\, x{\to}\mathsf{lock} \neq \mathsf{owned} \ast \neg\, x{\to}\mathsf{rem} \,\} \leadsto [\mathsf{left}, \mathsf{right}, \mathsf{del}, \mathsf{rem}]\ \{\, true \,\} \tag{I1}$$

$$NInv(x).\ \{\, x{\to}\mathsf{lock} \neq \mathsf{owned} \,\} \leadsto [\mathsf{lock}]\ \{\, true \,\} \tag{I2}$$

$$NInv(x).\ \{\, x{\to}\mathsf{lock} \neq \mathsf{owned} \ast x{\to}\mathsf{flow.is} \neq \emptyset \ast x{\to}\mathsf{rem} \,\} \leadsto [\mathsf{is}]\ \{\, x{\to}\mathsf{flow.is} = \emptyset \,\}\ . \tag{I3}$$

We prove the linearizability of the functions contains(*k*), insert(*k*), and remove(*k*). All of them call the auxiliary function locate(*k*), which returns the last edge it traversed during a search for key *k*. Figure 2 gives the proof outline of locate. The proof for the full implementation can be found in [17].

We use a product flow domain <sup>P</sup>(N) <sup>×</sup> <sup>N</sup>. The first component is the inset flow with the generator function discussed above. The second component is the pathcount, whose *gen*() simply yields the identity for all edges. The benefit of the product flow is that we can prove memory safety on the side, while conducting the linearizability proof.

In the node-local invariant, we introduce logical variables like x->left to make the proof more readable. We refer to these variables in the generator function. The invariant for the node pointed to by the shared Root differs from that of the remaining nodes:

$$\begin{aligned} NInv(x) \triangleq\ & x \mapsto \langle\, \mathsf{flow} = \langle x{\to}\mathsf{flow.is},\ x{\to}\mathsf{flow.pcount} \rangle, \\ & \quad\ \mathsf{left} = x{\to}\mathsf{left},\ \mathsf{right} = x{\to}\mathsf{right},\ \mathsf{key} = x{\to}\mathsf{key}, \\ & \quad\ \mathsf{lock} = x{\to}\mathsf{lock},\ \mathsf{del} = x{\to}\mathsf{del},\ \mathsf{rem} = x{\to}\mathsf{rem} \,\rangle \\ & \ast\ NInv_{\mathsf{all}}(x) \ast (x = \mathsf{Root} \Rightarrow NInv_{\mathsf{Root}}(x)) \end{aligned}$$

$$\begin{aligned} NInv_{\mathsf{Root}}(x) \triangleq\ & x{\to}\mathsf{key} = -\infty \ast \neg\, x{\to}\mathsf{del} \ast \neg\, x{\to}\mathsf{rem} \\ & \ast\ x{\to}\mathsf{flow.is} = (-\infty,\infty) \ast x{\to}\mathsf{flow.pcount} = 1 \\ NInv_{\mathsf{all}}(x) \triangleq\ & (\neg\, x{\to}\mathsf{rem} \Rightarrow x{\to}\mathsf{key} \in x{\to}\mathsf{flow.is}) \ast x{\to}\mathsf{flow.pcount} \leq 3 \\ & \ast\ (x{\to}\mathsf{rem} \Rightarrow x{\to}\mathsf{del}) \ast (x{\to}\mathsf{left} = x{\to}\mathsf{right} \Rightarrow x{\to}\mathsf{left} = \mathsf{nil})\ . \end{aligned}$$

The node-local invariant makes the expected claims. The root has key −∞, is neither logically deleted nor unlinked, its inset is (−∞,∞), and its path count is 1.

**Fig. 2.** Proof outline for locate as verified by nekton.

These flow values are established by the data structure's initialization function using an auxiliary edge with an appropriate generator. For all nodes, we have that their key is in the inflow, provided the node has not yet been unlinked, the path count is at most 3, a node has to be logically deleted before it can be unlinked, and the only case in which the left and the right child can coincide is when they are both the null pointer. We treat nil as a node outside the set of nodes N. In particular, this means the node-local invariant does not apply to it. It will follow from the definition of the generator function that the keysets are disjoint. We do not need to state this in the invariant as it is only important when interpreting the verification results.

The assertion on line 9 helps our implication engine, which is designed for conjunctive assertions, deal with the disjunctions.

We explain the implication between Lines 11 and 12. It starts with the assertion *NInv*(p) ∗ *NInv*(c) ∗ ⟐[*NInv*(p) ∗ *k* ∈ p->flow.is] ∗ p->right = c ∗ p->key < *k*. To apply the hindsight principle, we derive the following guarantees from the set of interferences. A node's key is never changed. The only way a node's inset can shrink is by unlinking, after which its left and right pointers are no longer changed. The right child of p is not nil in the current state. From this information, the hindsight principle concludes ⟐[*NInv*(p) ∗ *NInv*(c) ∗ *k* ∈ p->flow.is ∗ p->key < *k* ∗ p->right = c]. Together with the definition of the transfer functions labeling the edges, this assertion yields ⟐[*NInv*(c) ∗ *k* ∈ c->flow.is]. Another hindsight application starts with *NInv*(p) ∗ c = nil ∗ ⟐[*NInv*(p) ∗ *k* ∈ p->flow.is] ∗ p->right = c ∗ p->key < *k*

and moves the facts known in the current state into the past predicate. The definition of *rsp*(x, *<sup>k</sup>*) then yields <sup>⟐</sup>[*NInv*(p) <sup>∗</sup> *rsp*(p, *<sup>k</sup>*)] .

The full proof consists of 99 lines of code, 48 lines of assertions to prove them linearizable, and 56 lines of definitions for the flow domain, interferences, and invariants. nekton takes 45 s to verify the proof's correctness on an Apple M1 Pro.

### **4 Correctness and Implementation**

nekton checks that the verification conditions generated from the given proof outlines hold and that the assertions are interference-free. The program logic from [18,19] then gives the following semantic guarantee: no matter how many client threads execute the data structure functions, partial correctness holds. That is, if a function is executed from a state satisfying the precondition and terminates, it must have reached a state in which the postcondition held true. Termination itself is not guaranteed. The postcondition will relate the function's return value to a statement about membership of the given key in the data structure, and the keyset framework will allow us to conclude linearizability from this relation. The verification conditions will in particular make sure the node invariant is maintained. We discuss the actual checks.

The first step is to derive and check verification conditions for all commands com. If the command is surrounded by assertions, { *<sup>p</sup>* }; com; { *<sup>q</sup>* }, the verification condition is *sp*(*p*, com) <sup>|</sup><sup>=</sup> *<sup>q</sup>*, the strongest postcondition *sp* of *<sup>p</sup>* under com entails *<sup>q</sup>*. If the assertion { *<sup>q</sup>* } is not given, nekton completes the given proof by using *<sup>q</sup>* = *sp*(*p*, com). The verification conditions for loops are similar. For two consecutive assertions { *p* } ; { *q* }, as they occur for example at the end of a branch, the verification condition is *<sup>p</sup>* <sup>|</sup>= *<sup>q</sup>*.
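The completion of a partial outline can be pictured with a toy checker. This is a stand-in sketch, not nekton's engine: `sp` and `entails` are passed in as parameters, and predicates in the toy instantiation are simply sets of reachable states.

```python
# Schematic outline checker: missing intermediate assertions become strongest
# postconditions; given assertions become entailment checks (then weakenings).

def check_outline(pre, steps, post, sp, entails):
    """steps: list of ('com', c) or ('assert', a). True iff all VCs hold."""
    cur = pre
    for kind, item in steps:
        if kind == "com":
            cur = sp(cur, item)          # no assertion given: take sp(cur, com)
        else:
            if not entails(cur, item):   # given assertion: check sp(...) |= q
                return False
            cur = item                   # continue from the stated assertion
    return entails(cur, post)

# Toy instantiation: predicates are sets of states, commands are functions.
sp = lambda states, com: {com(s) for s in states}
entails = lambda p, q: p <= q
steps = [("com", lambda s: s + 1), ("assert", {1, 2, 3}), ("com", lambda s: s * 2)]
ok = check_outline({0, 1}, steps, {2, 4, 6}, sp, entails)
```

Here {0, 1} is mapped to {1, 2}, which entails the stated assertion {1, 2, 3}; doubling then yields a subset of the postcondition, so the outline checks.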

The second step is to check that the assertions { *p* } and { *q* } in the proof are interference-free, i.e., cannot be invalidated by the actions of other threads.

Finally, nekton checks that the interferences given by the user cover the actual interferences of the program. We review the above steps in more detail.

**Strongest Postconditions.** The computation of the strongest postcondition follows the standard axioms for separation logic [23]. However, they do not deal with the flow, which may be modified not only directly by com but also indirectly by an update elsewhere. To deal with such indirect updates, nekton computes a *footprint fp*: a subset of the heap locations that the standard axioms require plus those locations whose flow changes due to com. The footprint yields a decomposition *p* = *fp* ∗ *f* of predicate *p*, where f is a frame that is not affected by the update. From this decomposition, we compute the strongest postcondition as *sp*(*p*, com) = *sp*(*fp*, com) ∗ *f*, using the frame rule. Actually, nekton also shows that the update maintains the node invariant, which only requires a check for *sp*(*fp*, com).

For *fp* to be a footprint wrt. com, all nodes outside *fp* should receive the same flow from *sp*(*fp*, com) as from *fp*. This holds if *fp* and *sp*(*fp*, com) induce the same flow transformer function [20]. To determine a footprint, nekton takes a strategy that is justified by lock-free programming [18]. Starting from the updated nodes, it gathers a small (fixed) set of locations that forms an acyclic subgraph. Acyclicity guarantees that *fp* and *sp*(*fp*, com) have the same transformer iff they agree on the transformation along all paths: if n belongs to *fp* and n->fld does not, then n->fld must point to the same location and transform inflows to outflows in the same way in *fp* and in *sp*(*fp*, com).
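On a toy instance, the transformer-equality check can be mimicked as follows. This is our own simplification: edges carry transfer functions on key sets, and we compare the composed transformation along the unique boundary path before and after unlinking a logically deleted node.

```python
# Footprint check sketch: fp and sp(fp, com) must induce the same
# inflow-to-outflow transformation on every path leaving the footprint.

def boundary_transformer(graph, entry, exit_node):
    """Compose the edge transfer functions along the path entry..exit
    (the footprint is acyclic; each node here has one outgoing edge)."""
    funcs = []
    node = entry
    while node != exit_node:
        nxt, f = graph[node]
        funcs.append(f)
        node = nxt
    def transform(flow):
        for f in funcs:
            flow = f(flow)
        return flow
    return transform

def same_transformer(g1, g2, entry, exit_node, samples):
    t1 = boundary_transformer(g1, entry, exit_node)
    t2 = boundary_transformer(g2, entry, exit_node)
    return all(t1(s) == t2(s) for s in samples)

# Unlinking a deleted node n2: before, n1 -> n2 -> n3; after, n1 -> n3.
drop_le = lambda b: (lambda flow: frozenset(k for k in flow if k > b))
before = {"n1": ("n2", drop_le(5)), "n2": ("n3", lambda f: f)}  # n2 forwards all
after_unlink = {"n1": ("n3", drop_le(5))}
equal = same_transformer(before, after_unlink, "n1", "n3", [frozenset(range(12))])
```

The unlink preserves the transformer because the deleted node forwarded its entire inset anyway; an update that changed the transformation (say, dropping a larger interval) would fail the check.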

The strongest postcondition above is for state-based reasoning. For predicates over computations, which have state and past predicates, we use the following observation: past predicates are never invalidated by commands. This allows us to just copy them to the postcondition: *sp*(*<sup>p</sup>* <sup>∗</sup> <sup>⟐</sup> *<sup>q</sup>*, com) = *sp*(*p*, com) <sup>∗</sup> <sup>⟐</sup> *<sup>p</sup>* <sup>∗</sup> <sup>⟐</sup> *<sup>q</sup>*. Note that we add the precondition as a new past predicate. Moreover, we may add *new* past predicates derived by hindsight arguments. As these derived past predicates are implied by the postcondition, they formally do not strengthen the assertion, but of course help the tool.

**Hindsight Reasoning.** Recall from Sect. 2 that hindsight reasoning draws conclusions of the form ⟐ *p* ∗ *q* ⇒ ⟐ *r* : every computation from a *p*-state must inevitably transition through *r* in order to reach *q*. In nekton, *p* and *q* are restricted to node-local predicates in the sense defined above, and *r* is fixed to *p* ∧ *q*.

To prove the implication, assume it did not hold. Then there is a computation where *p* is invalidated before *q* is established. This is covered by the interferences: there is an action *act_p* invalidating *p* and an action *act_q* establishing *q*. Let *act_p* and *act_q* be *NInv*(n). { *o_p* } [... ]{ ... } and *NInv*(n). { *o_q* } [... ]{ ... }, respectively. There is (always) a decomposition *o_p* = *o_p^i* ∗ *o_p^m* such that *o_p^i* is immutable. Immutability holds if *o_p^i* is shared and interference-free. Consequently, *o_p^i* must still hold when *q* is established. Now, we check if *o_p^i* and *o_q* are contradictory, *o_p^i* ∧ *o_q* |= *false*. If so, *act_q* is not enabled after *act_p*. This, in turn, means *q* cannot be established after *p* is invalidated, so the computation cannot exist. nekton draws the hindsight conclusion if it can prove the contradiction for all pairs *act_p*, *act_q* of interferences that invalidate *p* and establish *q*.
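The check just described can be rendered as a toy procedure. This is our own encoding, not nekton's: predicates and guards are sets of (field, boolean) literals, and an action is a triple of guard, changed fields, and post-state literals.

```python
# Hindsight check sketch: for every pair (act_p invalidating p, act_q
# establishing q), the immutable fragment of act_p's guard must contradict
# act_q's guard; otherwise the conclusion cannot be drawn.

def can_invalidate(p, act):
    return bool({f for f, _ in p} & act[1])           # touches a field of p

def can_establish(q, act):
    guard, changed, post = act
    if not ({f for f, _ in q} & changed):
        return False
    return not any((f, not v) in post for f, v in q)  # post consistent with q

def hindsight_holds(p, q, actions):
    all_changed = set().union(*(a[1] for a in actions))
    for act_p in (a for a in actions if can_invalidate(p, a)):
        # immutable fragment of act_p's guard: fields no action may change
        o_i = {(f, v) for f, v in act_p[0] if f not in all_changed}
        for act_q in (a for a in actions if can_establish(q, a)):
            if not any((f, not v) in act_q[0] for f, v in o_i):
                return False          # no contradiction: cannot conclude
    return True

# List example: nodes can be marked (once) and then unlinked; nothing unmarks.
mark   = (frozenset({("marked", False)}), {"marked"}, frozenset({("marked", True)}))
unlink = (frozenset({("marked", True)}),  {"inset"},  frozenset())
ok = hindsight_holds({("inset", True)}, {("marked", False)}, [mark, unlink])
```

The conclusion goes through here for the reason given in Sect. 2: no interference can unmark a node, so no action establishes ¬marked and the check succeeds vacuously. Adding a hypothetical unmark action makes it fail.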

**Entailment.** Our assertions *p* ∗ ⊛_{i∈I} ⟐ *p_i* consist of a predicate *p* for the current state and a set of past predicates ⟐ *p_i* tracking information about the computation. We have *p* ∗ ⊛_{i∈I} ⟐ *p_i* |= *q* ∗ ⊛_{j∈J} ⟐ *q_j* if *p* |= *q* and ∀j ∃i. ⟐ *p_i* |= ⟐ *q_j*. To show ⟐ *p_i* |= ⟐ *q_j*, we rely on the algorithm for state predicates and prove *p_i* |= *q_j*.
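The rule can be sketched directly, with predicates modeled as sets of states and state entailment as set inclusion (a stand-in for nekton's entailment engine, not its implementation):

```python
# p * ⊛ pasts_p |= q * ⊛ pasts_q  via  p |= q  and  ∀j ∃i. p_i |= q_j.

def entails_with_past(p, pasts_p, q, pasts_q, state_entails):
    return state_entails(p, q) and all(
        any(state_entails(pi, qj) for pi in pasts_p) for qj in pasts_q)

subset = lambda a, b: a <= b
ok = entails_with_past({1, 2}, [{5}, {6, 7}], {1, 2, 3}, [{5, 6}], subset)
```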

Entailment checks *<sup>p</sup>* <sup>|</sup>= *<sup>q</sup>* between state predicates decompose into reasoning about resources and reasoning about logically pure facts. The latter degenerates to an implication in classical logic: nekton uses a straightforward encoding into SMT and discharges it with Z3 [21]. For reasoning about resources, nekton implements a custom matching procedure to correlate the resources in *p* and *q*. The procedure is guided by the program variables x: if the value of x is a in *p* and b in *q*, then a and b are matched, meaning b is renamed to a. The procedure then continues to match the fields of already matched addresses. Finally, nekton checks syntactically if all the resources in *q* occur in *p*.

If nekton fails to prove an implication, it consults the implication list. It takes the implications as they are, and does not try to embed them into a context as would be justified by congruence. nekton does not track the precise implications it has used.

**Interference Freedom.** A state predicate *p* is interference-free wrt. *act* of the form *NInv*(n). { *<sup>r</sup>* } [fld1,..., fld*<sup>n</sup>*]{ *<sup>o</sup>* }, if the strongest postcondition of *<sup>p</sup>* under *act* entails *<sup>p</sup>* itself, *sp*(*p*, *act*) <sup>|</sup>= *<sup>p</sup>*. Towards *sp*(*p*, *act*), let *<sup>p</sup>* = *NInv*(x) <sup>∗</sup> *<sup>q</sup>*, meaning x is an accessible location. Applying *act* to x in *p* acts like an assignment to the fields such that their new values satisfy *o*. The strongest postcondition for this is standard [3]:

$$sp_x(p, \mathit{act}) \triangleq o[n\backslash x] \ast \exists y_1 \dots y_n.\ (p \ast r[n\backslash x])[x{\to}\mathsf{fld}_1 \backslash y_1, \dots, x{\to}\mathsf{fld}_n \backslash y_n]$$

We strengthen *p* with the precondition *r* of *act* to make sure the action is enabled. We use *<sup>r</sup>* [n\x] for *<sup>r</sup>* with <sup>n</sup> replaced by <sup>x</sup>, meaning we instantiate *<sup>r</sup>* to location <sup>x</sup>. We replace the old values of the updated fields with fresh quantified variables and add the fields' new valuation *<sup>o</sup>*[n\x]. Then, the strongest postcondition *sp*(*p*, *act*) applies *spx*(*p*, *act*) to all locations <sup>x</sup> in *<sup>p</sup>*.

**Interference Coverage.** Consider *act1* = *NInv*(x). { *p* } [fld_1,..., fld_n]{ *q* } and *act2* = *NInv*(x). { *r* } [fld′_1,..., fld′_m]{ *o* }. We say that *act1* covers *act2* if *act1* can produce all updates induced by *act2*. This is the case if *r* |= *p*, *o* |= *q*, and { fld′_1,..., fld′_m } ⊆ { fld_1,..., fld_n }. It remains to extract the actual interferences of the program and check if they are covered by the user-specified ones. The extraction is done while computing the strongest postcondition *sp*: the computed footprints *fp* and *sp*(*fp*, com) from above reveal the updated fields as well as the pre- and post-states.
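The coverage condition is simple enough to state directly in code. In this sketch, entailment is again a stand-in (set inclusion over predicates modeled as sets of satisfying states):

```python
# act1 covers act2  iff  r |= p, o |= q, and act2's fields are among act1's.

def covers(act1, act2, entails):
    p, flds1, q = act1
    r, flds2, o = act2
    return entails(r, p) and entails(o, q) and set(flds2) <= set(flds1)

subset = lambda a, b: a <= b
act1 = ({"sA", "sB"}, ("left", "right", "del"), {"tA", "tB"})
act2 = ({"sA"}, ("left",), {"tA"})       # weaker pre/post, fewer fields
act3 = ({"sA"}, ("key",), {"tA"})        # touches a field act1 does not list
```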

**Flow Encoding.** The flow monoid is not yet parsed from the user input but defined programmatically in nekton. The transfer function generator is parsed. nekton has five predefined flow domains, including path counting and keysets, and this set is easy to extend. nekton does not check whether the flow monoid is indeed a monoid and satisfies the requirements of an ω-cpo, nor whether ≤ coincides with the natural partial order.

The main task in dealing with a parametric rather than fixed flow domain is to encode predicates involving the flow into SMT formulas. This encoding is then used to implement the aforementioned components for strongest postconditions, hindsight, entailment, and interferences. Devising the encoding is challenging because it requires a representation of flow values that is sufficiently expressive to define relevant flow domains, yet sufficiently restricted to have efficient SMT solver support (we use Z3 [21]). With the input format described in Sect. 2, we encode flows using the theory of integers and uninterpreted functions.

**Limitations.** For the future, we see several directions for extensions of our current implementation: (i) a parser for flow monoids rather than a programmatic interface, (ii) support for *partial* annotations that are automatically completed by nekton, (iii) the ability to prove atomic triples instead of just linearizability for sets, and (iv) more helpful error messages or counterexamples to guide the proof-writing user.

**Acknowledgments.** This work was funded in part by an Amazon Research Award. The work was also supported by the DFG project *EDS@SYN: Effective Denotational Semantics for Synthesis*. The fourth author is supported by a Junior Fellowship from the Simons Foundation (855328, SW).

**Data Availability Statement.** The nekton tool and case studies generated and/or analysed in the present paper are available in the Zenodo repository [17], https://doi.org/10.5281/zenodo.7931936.

### **References**

1. Abdulla, P.A., Jonsson, B., Trinh, C.Q.: Fragment abstraction for concurrent shape analysis. In: Ahmed, A. (ed.) ESOP 2018. LNCS, vol. 10801, pp. 442–471. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89884-1\_16



### **Overcoming Memory Weakness with Unified Fairness**

*Systematic Verification of Liveness in Weak Memory Models*

Parosh Aziz Abdulla¹, Mohamed Faouzi Atig¹, Adwait Godbole²(B), Shankaranarayanan Krishna³, and Mihir Vahanwala⁴

> ¹ Uppsala University, Uppsala, Sweden
> {parosh,mohamed_faouzi.atig}@it.uu.se
> ² University of California Berkeley, Berkeley, USA
> adwait@berkeley.edu
> ³ IIT Bombay, Mumbai, India
> krishnas@cse.iitb.ac.in
> ⁴ MPI-SWS, Saarbrücken, Germany
> mvahanwa@mpi-sws.org

**Abstract.** We consider the verification of liveness properties for concurrent programs running on weak memory models. To that end, we identify notions of fairness that preclude demonic non-determinism, are motivated by practical observations, and are amenable to algorithmic techniques. We provide both logical and stochastic definitions of our fairness notions, and prove that they are equivalent in the context of liveness verification. In particular, we show that our fairness allows us to reduce the liveness problem (repeated control state reachability) to the problem of simple control state reachability. We show that this is a general phenomenon by developing a uniform framework which serves as the formal foundation of our fairness definition, and can be instantiated to a wide landscape of memory models. These models include SC, TSO, PSO, (Strong/Weak) Release-Acquire, Strong Coherence, FIFO-consistency, and RMO.

### 1 Introduction

Safety and liveness properties are the cornerstones of concurrent program verification. While safety and liveness are complementary, verification methodologies for the latter tend to be more complicated for two reasons. First, checking safety properties, in many cases, can be reduced to the (simple) reachability problem, while checking liveness properties usually amounts to checking repeated reachability of states [47]. Second, concurrency comes with an inherent *scheduling non-determinism*, i.e., at each step, the scheduler may non-deterministically select the next process to run. Therefore, liveness properties need to be accompanied by appropriate fairness conditions on the scheduling policies to prohibit trivial blocking behaviors [42]. In the example of two processes trying to acquire a lock, demonic non-determinism [20] may always favour one process over the other, leading to starvation.

Despite the gap in complexity, the verification of liveness properties has attracted much research in the context of programs running under the classical Sequential Consistency (SC) [40]. An execution of a program under SC is a non-deterministically chosen interleaving of its processes' atomic operations. A write by any given process is immediately visible to all other processes, and reads are made from the most recent write to the memory location in question. SC is (relatively) simple since the only non-determinism comes from interleaving.

*Weak memory models* forego the fundamental SC guarantee of immediate visibility of writes to optimize for performance. More precisely, a write operation by a process may asynchronously be propagated to the other processes. The delay could be owed to physical buffers or caches, or could simply be a virtual one thanks to instruction reorderings allowed by the semantics of the programming language. Hence we have to contend with a (potentially unbounded) number of write operations that are "in transit", i.e., they have been issued by a process but they have yet to reach the other processes. In this manner, weak memory introduces a second source of non-determinism, namely *memory non-determinism*, reflecting the fact that write operations are non-deterministically (asynchronously) propagated to the different processes. Formal models for weak memory, ranging from declarative models [8,21,35,39,41] to operational ones [15,30,43,46], make copious use of non-determinism (non-determinism over entire executions in the case of declarative models and non-deterministic transitions in the case of operational models). While we have seen extensive work on verifying safety properties for programs running under weak memory models, the literature on liveness for programs running under weak memory models is relatively sparse, and it is only recently that we have seen efforts in that direction [5,36].

As mentioned earlier, we need fairness conditions to exclude demonic behaviors when verifying liveness properties. A critical issue here is to come up with an *appropriate* fairness condition, i.e., a condition that (i) is sufficiently strong to eliminate demonic non-determinism and (ii) is sufficiently weak to allow all "good" program behaviors. To illustrate the idea, let us go back to the case of SC. Here, traditional fairness conditions on processes, such as *strong fairness* [31], are too weak if interpreted naively, e.g. "along any program run, each process is scheduled infinitely often". The problem is that even though a strongly fair scheduler may pick a process infinitely often, it may choose to do so only in configurations where the process cannot progress since its guards are not satisfied. Such guards may, for instance, be conditions on the values of the shared variables. For example, executions of the program in Fig. 1 may not terminate under SC, since the second process may only get scheduled when the value of x is 2, thereby looping infinitely around the do-while loop.

Stronger fairness conditions, such as transition fairness and probabilistic fairness [11,27], can help avoid this problem. They imply that any *transition* enabled infinitely often is also taken infinitely often (with probability one in the case of probabilistic fairness). Transition fairness eliminates demonic scheduler non-determinism, and hence it is an appropriate notion of fairness in the case of SC.

```
// process 1             // process 2
r = 0;                   do {
while (r != 1) {           s = x;
  x = 1; x = 2;          } until (s = 1);
  r = y;                 y = 1;
}
```

Fig. 1. Does this program always terminate? Only if we can guarantee that the process to the right will eventually be scheduled to read when x = 1.
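The failure of naive process fairness can be made concrete with a small simulation under SC. The sketch below is our own illustrative encoding of the two processes of Fig. 1 (state dictionary, scheduler names, and step functions are all hypothetical; the right process's exit condition is taken to be s = 1, per the caption). The "unfair" scheduler runs both processes infinitely often, yet only ever schedules the right process after x has been set to 2, so the program never terminates; a coin-flipping (probabilistically fair) scheduler terminates almost surely.

```python
import random

def make_state():
    return {"x": 0, "y": 0, "r": 0, "s": 0, "pc1": 0, "pc2": 0}

def step_left(st):
    # left process of Fig. 1: while (r != 1) { x = 1; x = 2; r = y; }
    pc = st["pc1"]
    if pc == 0:
        st["x"] = 1; st["pc1"] = 1
    elif pc == 1:
        st["x"] = 2; st["pc1"] = 2
    elif pc == 2:
        st["r"] = st["y"]
        st["pc1"] = 3 if st["r"] == 1 else 0

def step_right(st):
    # right process of Fig. 1: do { s = x; } until (s = 1); y = 1;
    pc = st["pc2"]
    if pc == 0:
        st["s"] = st["x"]
        if st["s"] == 1:
            st["pc2"] = 1
    elif pc == 1:
        st["y"] = 1; st["pc2"] = 2

def run(scheduler, max_steps=100_000):
    st = make_state()
    for i in range(max_steps):
        if st["pc1"] == 3 and st["pc2"] == 2:
            return i          # both processes have terminated
        scheduler(st)(st)
    return None               # no termination within the budget

def make_unfair():
    # schedules both processes infinitely often, but the right process
    # only ever runs just after x = 2, so it never reads x == 1
    steps = [step_left, step_left, step_left, step_right]
    i = [0]
    def sched(st):
        i[0] += 1
        return steps[(i[0] - 1) % 4]
    return sched

def fair(st):
    # probabilistically fair: a fair coin flip at every step
    return random.choice([step_left, step_right])

random.seed(0)
assert run(make_unfair()) is None   # starvation despite process fairness
assert run(fair) is not None        # terminates under probabilistic fairness
```

The unfair schedule satisfies "each process is scheduled infinitely often" and still starves the right process, which is exactly why transition fairness (rather than per-process fairness) is needed.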

However, it is unable to eliminate demonic memory non-determinism. The reason is that transition fairness allows runs of the programs where write events occur at a higher frequency than the frequency in which they are propagated to the processes. This means that, in the long run, a process may only see its own writes, potentially preventing its progress and, therefore, the system's progress as a whole. This scenario is illustrated in Fig. 2.

```
// process 1                  // process 2
do { x = 1; }                 do { x = 2; }
until (x = 2 or y = 1);       until (x = 1 or y = 1);
y = 1;                        y = 1;
```

Fig. 2. This program is guaranteed to terminate under any model only if pending propagation is guaranteed not to accumulate unboundedly: e.g. in TSO, each process may never see the other's writes due to an overflowing buffer.

To deal with memory non-determinism, we exploit the fact that the sizes of physical buffers or caches are bounded, and instruction reorderings are bounded in scope. Therefore, in any practical setting, the number of writes in transit at a given moment cannot grow unboundedly. This is what we seek to capture in our formalism. Based on this observation, we propose three new notions of fairness that (surprisingly) all turn out to be equivalent in the context of liveness. First, we introduce *boundedness fairness*, which only considers runs of the system for which there is a bound b on the number of events in transit, in each configuration of the run. Note that the value of b is arbitrary (but fixed for a given run). Boundedness fairness is apposite: (i) it is sufficiently strong to eliminate demonic memory non-determinism, and (ii) it is sufficiently weak to allow all reasonable behaviors (as mentioned above, practical systems bound the number of transient messages). Since we do not fix the value of the bound, this allows parameterized reasoning, e.g., about buffers of any size: our framework does not depend on the actual value of the bound, only on its mere existence. Furthermore, we define two additional related notions of fairness for memory non-determinism. The two new notions rely on *plain configurations*: configurations in which there are no transient operations (all the write operations have reached all the processes). First, we consider *plain fairness*: along each infinite run, the set of plain configurations is visited infinitely often. We then define the *probabilistic* version: each run almost surely visits the set of plain configurations. We show that the three notions of fairness are equivalent (in Sect. 4, we make precise the notion of equivalence we use).

After we have defined our fairness conditions, we turn our attention to the verification problem. We show that verifying repeated reachability under the three fairness conditions, for a given memory model m, is reducible to simple reachability under m. Since our framework does not perform program transformations, we can prove liveness properties for a program P by proving simple reachability on the same program P. As a result, we obtain two important sets of corollaries: if the simple reachability problem is decidable for m, then the repeated reachability problem under the three fairness conditions is also decidable. This is the case when the memory model m is TSO, PSO, SRA, etc. Even when the simple reachability problem is not decidable for m, e.g., when m is RA or RMO, we have still succeeded in reducing the verification of liveness properties under fairness conditions to the verification of simple reachability. This allows leveraging proof methodologies developed for the verification of safety properties under these weak memory models (e.g., [22,29]).

Having identified the fairness conditions and the verification problem, there are two potential approaches, each with its advantages and disadvantages. We either instantiate a framework for individual memory models one after another, or define a general framework in which we can specify multiple memory models and apply the framework "once and for all". The first approach has the benefit of making each instantiation more straightforward; however, we always need to translate our notion of fairness into the specific formulation. In the second approach, although we incur the cost of heavier machinery, we can subsequently take for granted the fact that the notion of fairness is uniform across all models and coincides with our intuition. This allows us to be more systematic in our quest to verify liveness. In this paper, we have thus chosen to adopt the second approach. We define a general model of weak memory in which we represent write events as sequences of messages ordered per variable and process. We augment the message set with additional conditions describing which messages have reached which processes. We use this data structure to specify our fairness conditions and solve our verification problems. We instantiate our framework to apply our results to a wide variety of memory models, such as RMO [12], FIFO consistency, RA, SRA, WRA [34,35], TSO [13], PSO, StrongCOH (the relaxed fragment of RC11) [30], and SC [40].

In summary, we make the following contributions:


– We prove the decidability of liveness properties for models such as TSO, PSO, SRA, WRA, and StrongCOH, opening the door to leveraging existing proof frameworks for simple reachability for other models such as RA.

We give an overview of a wide landscape of memory models in Sect. 3.3, and provide a high-level explanation of the versatility of our framework.

Structure of the Paper. We begin by casting concurrent programs as transition systems in Sect. 2. In Sect. 3, we develop our framework for the memory such that the desired fairness properties can be meaningfully defined across several models. In Sect. 4, we define useful fairness notions and prove their equivalence. Finally, in Sect. 5 we show how the liveness problems of repeated control state reachability reduce to the safety problem of control state reachability, and obtain decidability results. A full version of this paper is available at [6].

### 2 Modelling Concurrent Programs

We consider concurrent programs as systems where a set of processes run in parallel, computing on a set of process-local variables termed as *registers* and communicating through a set of *shared variables*. This inter-process communication, which consists of reads from, writes to, and atomic compare-and-swap operations on shared variables, is mediated by the *memory subsystem*. The overall system can be visualized as a composition of the process and memory subsystems working in tandem. In this section we explain how concurrent programs naturally induce *labelled transition systems*.

#### 2.1 Labelled Transition Systems

A labelled transition system is a tuple T = ⟨Γ, →, Λ⟩ where Γ is a (possibly infinite) set of configurations, → ⊆ Γ × Λ × Γ is a transition relation, and Λ is the set of labels that annotate transitions. We also refer to them as annotations, to disambiguate from instruction labels. We write γ →ˡ γ′ to denote that (γ, l, γ′) ∈ →, in words, that there is a transition from γ to γ′ with label l. We denote the transitive closure of → by →*, and the k-fold self-composition (for k ∈ N) by →ᵏ.
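For finite instances, these definitions can be transcribed directly. The sketch below is our own illustrative encoding (class and method names are hypothetical): it stores → as a set of triples and computes both the k-fold self-composition and the reachability closure →*.

```python
class LTS:
    """A labelled transition system ⟨Γ, →, Λ⟩ given by an explicit
    finite set of transitions (γ, l, γ')."""
    def __init__(self, transitions):
        self.transitions = set(transitions)

    def successors(self, gamma):
        return {g2 for (g1, _l, g2) in self.transitions if g1 == gamma}

    def reach_k(self, gamma, k):
        """Configurations reachable in exactly k steps (the k-fold →)."""
        frontier = {gamma}
        for _ in range(k):
            frontier = {g2 for g in frontier for g2 in self.successors(g)}
        return frontier

    def reach_star(self, gamma):
        """Configurations reachable by →*, via a worklist search."""
        seen, work = {gamma}, [gamma]
        while work:
            g = work.pop()
            for g2 in self.successors(g):
                if g2 not in seen:
                    seen.add(g2)
                    work.append(g2)
        return seen

# a three-configuration cycle: 0 →a 1 →b 2 →c 0
t = LTS({(0, "a", 1), (1, "b", 2), (2, "c", 0)})
assert t.reach_k(0, 2) == {2}
assert t.reach_star(1) == {0, 1, 2}
```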

*Runs and Paths.* A (possibly infinite) sequence of valid transitions ρ = γ₁ → γ₂ → γ₃ ··· is called a run. We say that a run is a γ-run if the initial configuration of the run is γ, and denote the set of γ-runs by Runs(γ). We call a (finite) prefix of a run a *path*. In some cases transition systems are *initialized*, i.e. an initial set Γinit ⊆ Γ is specified. In such cases, we call runs starting from some initial configuration (γ₁ → γ₂ → γ₃ ... with γ₁ ∈ Γinit) initialized runs.

#### 2.2 Concurrent Programs

The sequence of instructions executed by each process is dictated by a concurrent program, which induces a *process subsystem*. We begin by formulating the notion of a program. We assume a finite set P of processes that operate over a (finite) set X of shared variables. Figure 3 gives the grammar for a small but general assembly-like language that we use for defining the syntax of concurrent programs. A program instance, prog, is described by a set of shared variables, var∗, followed by the code of the processes, (proc reg∗ instr∗)∗. Each process p ∈ P has a finite set Regs(p) of (local) *registers*. We assume w.l.o.g. that the sets of registers of the different processes are disjoint, and define Regs(prog) := ∪p∈P Regs(p). We assume that the data domain of both the shared variables and registers is a finite set D, with a special element 0 ∈ D. The code of a process starts by declaring its set of registers, reg∗, followed by a sequence of instructions.

```
prog  ::= var∗ (proc reg∗ instr∗)∗
instr ::= lbl : stmt
stmt  ::= var := reg | reg := var | reg := CAS(var, reg, reg)
        | reg := expr | if reg then lbl | term
```

Fig. 3. A simple programming language.

An instruction i is of the form l : stmt where l is a unique (across all processes) instruction label that identifies the instruction, and stmt is a statement. The labels comprise the set of values the program counters of the processes may take. The problems of (repeated) instruction label reachability, which ask whether a section of code is accessed (infinitely often), are of importance to us.

Read (reg := var) and write (var := reg) statements read the value of a shared variable into a register, and write the value of a register to a shared variable, respectively. The CAS statement is the *compare-and-swap* operation, which atomically executes a read followed by a write. We consider a non-blocking version of the CAS operation which returns a boolean indicating whether the operation was successful (the expected value was read and atomically updated to the new value). The write is performed only if the read matches the expected value.
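The non-blocking CAS semantics described above can be paraphrased on a single memory cell as follows (an illustrative sketch under sequential semantics, not the paper's formal transition rule; the class name is ours):

```python
class Cell:
    """A single shared variable with (sequentialised) CAS semantics."""
    def __init__(self, value=0):
        self.value = value

    def cas(self, expected, new):
        # atomically: read, compare, and conditionally write; instead of
        # blocking on a mismatch, return a boolean success flag
        if self.value == expected:
            self.value = new
            return True
        return False

x = Cell(0)
assert x.cas(0, 5) is True     # read 0 as expected, wrote 5
assert x.cas(0, 7) is False    # x is now 5, so the write is not performed
assert x.value == 5
```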

We assume a set expr of expressions containing a set of operators applied to constants and registers, without referring to the shared variables. The reg := expr statement updates the value of register reg by evaluating expression expr. The exact set of expressions is orthogonal to our treatment, and hence left uninterpreted. The if-statement has its usual interpretation, and control flow commands such as while, for, and goto-statements can be encoded with branching and if-statements as usual.

#### 2.3 Concurrent Programs as Labelled Transition Systems

We briefly explain the abstraction of a concurrent program as a labelled transition system. The details for the process component, i.e. evolution of the program counter and register contents, follow naturally. The key utility of this approach lies in the modelling of the memory subsystem, to which we devote Sect. 3.

*Configurations.* A configuration γ is expressed as a tuple ⟨(L, R), γm⟩, where L maps processes to their current program counter values, R maps registers to their current values, and γm captures the current state of the (weak) memory.

*Transitions.* In our model, a step in our system is either: (a) a silent memory update, or (b) a process executing its current instruction. In case (a), only the memory component γ<sup>m</sup> of γ changes. The relation is governed by the definition of the memory subsystem. In case (b), if the instruction is the terminal one, or assigns an expression to a register, or a conditional, then only the process component (L, R) of γ changes. Here, the relation is obvious. Otherwise, the two components interact via a read, write or CAS, and both undergo changes. Here again, the relation is governed by what the memory subsystem permits.

*Annotations.* Silent memory update steps are annotated with m : Upd. Transitions involving process p executing an instruction that does not involve memory are annotated with p : ⊥. On the other hand, p : R(x, d), p : W(x, d), and p : CAS(x, d, d′, b) represent reads, writes and CAS operations by p respectively. The annotations indicate the variable and the associated values.

To study this transition system, one must understand which transitions, annotated thus, are enabled. For this, it is clear that we must delve into the details of the memory subsystem.

### 3 A Unified Framework for Weak Memory Models

In this section, we present our unified framework for representing weak memory models. We begin by describing the modelling aspects of our framework at a high level.

We use a message-based framework, where each write event generates a *message*. A process can use a write event to justify its read only if the corresponding message has been *propagated* to it by the memory subsystem. The total chronological order in which a process p writes to variable x is given by poloc (per-location program order). We work with models where the order of propagation is consistent with poloc. This holds for several models of varying strengths. This requirement allows us to organise messages into per-variable, per-process *channels*. We discuss these aspects of the framework in Sect. 3.1. Weak memory models define additional causal dependencies over poloc. Reading a message may cause other messages it is dependent on to become illegible. We discuss our mechanism to capture these dependencies in Sect. 3.2. The strength of the constraints levied by causal dependencies varies according to memory model. In Sect. 3.3, we briefly explain how our framework allows us to express causality constraints of varying strength, by considering a wide landscape of weak memory models. We refer the reader to [6] for the technical details of the instantiations.

#### 3.1 Message Structures

Message. A write by a process to a variable leads to the formation of a *message*, which, first and foremost records the value being written. In order to ensure atomicity, a message also records a boolean denoting whether the message can be used to justify the read of an *atomic* read-write operation, i.e. CAS. Finally, to help with the tracking of causal dependencies generated by read events, a message records a set of processes seen <sup>⊆</sup> <sup>P</sup> that have sourced a read from it. Thus, a message is a triple and we define the set of messages as: Msgs <sup>=</sup> <sup>D</sup>×B×2<sup>P</sup>.

Channels. A *channel* e(x, p) is the sequence of messages corresponding to writes to x by process p. The total poloc order of these writes naturally induces the channel order. By design, we will ensure that the configuration holds finitely many messages in each channel. We model each channel as a word over the message set: e(x, p) ∈ Msgs∗. A message structure is a collection of these channels: e : X × P → Msgs∗.
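Messages and channels as defined above admit a direct encoding. The following is a minimal sketch with our own (hypothetical) names, taking the finite domain D to be small integers:

```python
from dataclasses import dataclass, field

@dataclass
class Msg:
    """A message: a triple from Msgs = D × B × 2^P."""
    value: int                               # the written value (from D)
    cas_flag: bool = False                   # may it justify a CAS read?
    seen: set = field(default_factory=set)   # processes that read from it

VARS, PROCS = ["x", "y"], ["p1", "p2"]

# the message structure e : X × P → Msgs*, one channel per (variable, process)
e = {(x, p): [] for x in VARS for p in PROCS}

# writes by p1 to x append to channel e(x, p1), respecting the poloc order
e[("x", "p1")].append(Msg(1))
e[("x", "p1")].append(Msg(2))

assert [m.value for m in e[("x", "p1")]] == [1, 2]
assert e[("y", "p2")] == []
```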

#### 3.2 Ensuring Consistency of Executions

Memory models impose constraints restricting the set of messages that can be read by a process. The framework uses the state elements frontier, source, and constraint to help enforce these constraints. These elements reference positions within each channel, which we now discuss.

Channel Positions. The channel order provides the order of propagation of write messages to any process (which in turn is aligned with poloc). Thus, for any process p′, channel e(x, p) is partitioned into a prefix of messages that are outdated, a null or singleton set of messages that can be used to justify a read, and a suffix of messages that are yet to be propagated. In order to express these partitions, we need to identify not only nodal positions, but also *internodes* (spaces between nodes). To this end, we index channels using the set W = N ∪ N⁺. Positions indexed by N denote nodal positions (with a message), while positions indexed by N⁺ denote internodes. For a channel of length n, the positions are ordered as: 0⁺ < 1 < 1⁺ < 2 < ··· < n < n⁺ = ⊥. A process can read from the message located at e(·, ·)[i] for i ∈ N.
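One convenient concrete encoding of W = N ∪ N⁺ (ours, not the paper's) maps node i to the integer 2i and internode i⁺ to 2i + 1; the interleaved order 0⁺ < 1 < 1⁺ < 2 < ··· then becomes ordinary integer comparison, which is handy for the max over positions used in frontier updates later.

```python
def node(i):
    """Nodal position i ∈ N (holds the i-th message of the channel)."""
    return 2 * i

def internode(i):
    """Internode position i⁺, the space just past node i."""
    return 2 * i + 1

def is_node(pos):
    """Only nodal positions (even, nonzero) can justify a read."""
    return pos % 2 == 0 and pos > 0

# for a channel of length n = 3: 0⁺ < 1 < 1⁺ < 2 < 2⁺ < 3 < 3⁺
order = [internode(0), node(1), internode(1), node(2),
         internode(2), node(3), internode(3)]
assert order == sorted(order)
assert is_node(node(2)) and not is_node(internode(2))
```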

Frontier. With respect to a given process, a message can either be propagated but not readable, propagated and readable, or not yet propagated. Since the propagation order of messages follows channel order, the propagated set of messages forms a prefix of the channel. This prefix-partitioning is achieved by a map frontier : P × X × P → W. If frontier(p, ·, ·) is an internode (of the form i⁺), then the message v = e[i] has been propagated to p, but cannot be read because it is outdated. On the other hand, if frontier(p, ·, ·) = i ∈ N, then the message e[i] can be read by the process. In Fig. 4, frontier(p1, x, p1/p2/p3) equals v1⁺/v2/v3 respectively (the colored nodes). Consequently, the message at index v1 (and the ones before it) is unreadable (denoted by the pattern). On the other hand, the messages at v2, v3 are readable.

Fig. 4. Frontier and source.

Fig. 5. Constraint.

Source. Given process p and variable x, the process can potentially source its read from any of the |P| channels on x. The second state element, source : P × X → P, performs arbitration over this choice of read sources: p can read v only if v = frontier(p, x, source(p, x)). In Fig. 4, while both nodes v2, v3 are not outdated, source(p1, x) = p3, making v3 the (checkered) node which p1 reads from.

Constraint. The constraint element tracks causal dependencies between messages. For each message m and channel, it identifies the last message on the channel that is a causal predecessor of m. It is defined as a map constraint : N × X × P → W. Figure 5 illustrates possible constraint(v3, ·, ·) pointers for message node v3 in the context of the channel configuration in Fig. 4.

Garbage Collection. The frontier state marks the last messages in each channel that can be read by a process. Messages that are earlier than the frontiers of all processes can be effectively eliminated from the system, since they are illegible. We call this garbage collection (denoted as GC).

*The overall memory configuration,*

$$\gamma\_{\mathfrak{m}} = \langle \mathbf{e}, \underbrace{(\mathbb{P} \times \mathbb{X} \times \mathbb{P} \to \mathbb{W})}\_{\text{frontier}}, \underbrace{(\mathbb{P} \times \mathbb{X} \to \mathbb{P})}\_{\text{source}}, \underbrace{(\mathbb{V} \times \mathbb{X} \times \mathbb{P} \to \mathbb{W})}\_{\text{constraint}}\rangle$$

consists of the message structure along with the consistency enforcing state.

Read Transition. Our framework allows a unified read transition relation which is independent of the memory model that we work with. We now discuss this transition rule, which is given in Fig. 6. Suppose process p is reading from variable x. First, we identify the arbitrated process ps which is read from, using the source state. Then we pick the message on the (x, ps) channel which the frontier of p points to. Note that this must be a nodal position in N. The read value is the value in this message. Finally, we update frontier(p, ·, ·) to reflect the fact that all messages in the causal prefix of the read message have propagated to p.

$$\begin{array}{c} p\_s = \gamma.\mathrm{source}(p,\mathsf{x}) \qquad \mathsf{v} = \gamma.\mathrm{frontier}(p,\mathsf{x},p\_s) \qquad \mathsf{v}.\mathsf{value} = \mathsf{d} \\ \gamma\_1 = \gamma[\mathsf{v}.\mathsf{seen} \leftarrow \mathsf{v}.\mathsf{seen} \cup \{p\}] \\ \gamma\_2 = \mathrm{GC}(\gamma\_1[\forall \mathsf{y}.\, \forall p'.\ \mathrm{frontier}(p,\mathsf{y},p') \leftarrow \max(\mathrm{frontier}(p,\mathsf{y},p'), \mathrm{constraint}(\mathsf{v},\mathsf{y},p'))]) \\ \hline \gamma \xrightarrow{p:\mathsf{R}(\mathsf{x},\mathsf{d})} \gamma\_2 \end{array}$$

Fig. 6. The read transition, common to all models across the framework. For <sup>γ</sup> <sup>∈</sup> <sup>Γ</sup>m, γ.frontier, γ.source, γ.constraint represent the respective components of γ. For a node <sup>v</sup> <sup>∈</sup> Msgs, <sup>v</sup>.value <sup>∈</sup> <sup>D</sup> represents the written value in the message node <sup>v</sup>.
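The rule in Fig. 6 can be prototyped directly. In the sketch below (our own simplification: messages are plain dicts, channel positions are integers with node i ↦ 2i and internode i⁺ ↦ 2i + 1, and GC is omitted), a read performs the source lookup, the frontier dereference, the seen update, and the constraint-driven frontier advance via max.

```python
def read(gamma, p, x):
    """One application of the read rule of Fig. 6 (simplified sketch)."""
    ps = gamma["source"][(p, x)]              # arbitrated source process
    pos = gamma["frontier"][(p, x, ps)]
    assert pos % 2 == 0, "frontier must point at a node to justify a read"
    msg = gamma["e"][(x, ps)][pos // 2 - 1]   # node i holds the i-th message
    msg["seen"].add(p)                        # bookkeeping for causality
    # frontier(p, y, p') := max(frontier(p, y, p'), constraint(msg, y, p'))
    for (y, p2) in gamma["e"]:
        key = (p, y, p2)
        c = msg["constraint"].get((y, p2), 1)  # default 1 = 0⁺: no predecessor
        gamma["frontier"][key] = max(gamma["frontier"][key], c)
    return msg["value"]

# p1 wrote 7 to x; that write causally depends on p1's first write to y
gamma = {
    "e": {("x", "p1"): [{"value": 7, "seen": set(),
                         "constraint": {("y", "p1"): 3}}],  # 3 encodes 1⁺
          ("y", "p1"): [{"value": 1, "seen": set(), "constraint": {}}]},
    "source": {("p2", "x"): "p1"},
    "frontier": {("p2", "x", "p1"): 2,   # node 1: the write of 7 is readable
                 ("p2", "y", "p1"): 1},  # 0⁺: nothing on (y, p1) seen yet
}
assert read(gamma, "p2", "x") == 7
# reading x forced p2 past p1's first write to y: it is now outdated (1⁺)
assert gamma["frontier"][("p2", "y", "p1")] == 3
```

This illustrates the point made in Sect. 3.2: reading a message can render its causal predecessors on other channels illegible.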

*Example 1 (Store Buffer (SB)).* Fig. 7 shows the Store Buffer (SB) litmus test. The annotated outcome of store buffering is possible in all of the WRA/RA/SRA/TSO models. Right after p1 (resp. p2) has performed both its writes to x (resp. y), we have e(y, p2) = v⁰y · v¹y, and e(x, p1) = v⁰x · v¹x.

This example illustrates how weak memory models allow non-deterministic delays in the propagation of messages. In this example, frontier(p2, x, p1) = v⁰x and frontier(p1, y, p2) = v⁰y: both processes see non-recent messages. The annotated outcomes are observed if source(p1, y) = p2 and source(p2, x) = p1.

```
// p1                  // p2
x = 0;  // v⁰x         y = 0;  // v⁰y
x = 1;  // v¹x         y = 1;  // v¹y
a = y;  // reads 0     b = x;  // reads 0
```

Fig. 7. SB.

We now turn to a toy example (Fig. 8) to illustrate the dependency enforcing and book-keeping mechanisms we have introduced.

*Example 2.* Consider a program with two shared variables, x, y, and two processes, p1, p2. We omit the channel e(y, p2) for space. Process p1's frontiers are shown in violet, p2's frontiers are shown in orange. We begin with the first memory configuration. The arrow depicts constraint(v1, y, p1) = v2. This situation can arise in a causally consistent model where the writer of v1 was aware of v2 before writing v1. The first transition shows p2 updating and moving its frontier (to v1). This results in a redundant node (v3 in hashed texture) since the frontier of both p1 and p2 has crossed it. This is cleaned up by GC. Now, p2 begins its read from v1. Reading v1, albeit on x, makes all writes by p1 to y prior to v2 redundant. When p2 reads v1, its frontier on e(y, p1) advances as prescribed by constraint(v1, y, p1), as shown in the fourth memory configuration. Note that this makes another message (v4) redundant: all frontiers are past it. Once again, GC discards the message, obtaining the last configuration.

Fig. 8. Update, constraint in action during a read, and garbage collection

#### 3.3 Instantiating the Framework

Versatility of the Framework. The framework we introduce can be instantiated to RMO [12], FIFO consistency, RA, SRA, WRA [34,35], TSO [13], PSO, StrongCOH (the relaxed fragment of RC11) [30], and SC [40].

This claim is established by constructing semantics for each of these models using the components that we have discussed. We provide a summary of the insights, and defer the technical details to the full version [6].

Fig. 9. Memory models, arranged by their strength. An arrow from <sup>A</sup> to <sup>B</sup> denotes that B is strictly more restrictive than A. A green check (resp. red cross) denotes the control state reachability is decidable (resp. undecidable). (Color figure online)

We briefly explain how our framework accounts for the increasing restrictive strength of these memory models. The weakest of these is RMO, which only enforces poloc. There are no other causal dependencies, and thus for any message the constraint on other channels is trivial (the minimal position 0⁺). RMO can be strengthened in two ways: StrongCOH does so by requiring a total order on writes to the same variable, i.e. mox. Here the constraint is nontrivial only on channels of the same variable. On the other hand, FIFO enforces consistency with respect to the program order. Here, the constraint is nontrivial only on channels of the same process. WRA strengthens FIFO by enabling reads to enforce causal dependencies between write messages. This is captured by the non-trivial constraint, and we note that seen (the set of processes to have sourced a read from a message) plays a crucial role here. RA enforces the mox of StrongCOH as well as the causal dependencies of WRA. PSO strengthens StrongCOH by requiring a stronger precondition on the execution of an atomic read-write. More precisely, in any given configuration, for every variable, there is at most one write message that can be used to source a CAS operation, i.e. with the CAS flag set to true. SRA and TSO respectively strengthen RA and PSO by doing away with write races. Here, the Boolean CAS flag in messages is all-important as an enforcer of atomicity. TSO strengthens SRA in the same way as PSO strengthens StrongCOH. Finally, when we get to SC, the model is so strong that all messages are instantly propagated. Here, for any message, the constraint pointer on other channels is ⊥.

### 4 Fairness Properties

Towards the goal of verifying liveness, we use the framework we developed to introduce fairness properties in the classical and probabilistic settings in Sect. 4.1 and Sect. 4.2 respectively. Our approach thus has the advantage of generalising over weak memory. In Sect. 4.3 we relate these fairness properties in the context of repeated control state reachability: a key liveness problem.

#### 4.1 Transition and Memory Fairness

In this section, we consider fairness in the classical (non-probabilistic) case. We begin by defining transition fairness [11], a standard notion of fairness that disallows executions which neglect certain transitions while only taking others. For utility in weak memory liveness, we then augment transition fairness to meet practical assumptions on the memory subsystem. Transition fairness and probabilistic fairness are intrinsically linked [27, Section 11]. Our augmentations are designed to carry over to the probabilistic domain in the same spirit.

Definition 1 (Transition fairness, [11]). *We say that a program execution is transition fair if for every configuration that is reached infinitely often, each transition enabled from it is also taken infinitely often.*

We argued the necessity of transition fairness in the introduction; however, it is vacuously satisfied by an execution that visits any configuration only finitely often. This could certainly be the case in weak memory, where there are infinitely many configurations. To make a case for the implausibility of this scenario, we begin by characterising classes of weak memory configurations.

Definition 2 (Configuration size). *Let* γ *be a program configuration with memory component* (e, frontier, source, constraint)*. We denote the configuration size by* size(γ)*, defined as* Σx∈X Σp∈P len(e(x, p))*, i.e. the total number of messages in the message structure.*

Intuitively, the size of the configuration is the number of messages "in transit", and hence a measure of the weakness of the behaviour of the execution. We note that overly weak behaviour is rarely observed in practice [23,45]. For instance, instruction reorderings that could be observed as weak behaviour are limited in scope. Another source of weak behaviour is the actual reading of stale values at runtime. However, the hardware (i.e. caches, buffers, etc.) that stores these values is finite, and is typically flushed regularly. Informally, the finite footprint of the system architecture (e.g. micro-architecture) implies a bound, albeit hard to compute, on the size of the memory subsystem. Thus, we use the notion of configuration size to define:

Definition 3 (Size Bounded Executions). *An execution* γ0, γ1, . . . *is said to be size bounded if there exists an* N *such that* size(γn) ≤ N *for all* n ∈ ℕ*. If this* N *is specified, we refer to the execution as* N*-bounded.*

The requirement of size-boundedness already reflects the practical considerations above. However, if the bound N is unknown, it is not immediately clear how this translates into a technique for liveness verification. We will now use the same rationale to motivate and develop an alternate augmentation which lends itself more naturally to algorithmic techniques. Recall that we intuitively relate the size of the configuration to the extent of weak behaviour. Now, consider Sequential Consistency (SC), the strongest of the models. All messages are propagated immediately, and hence the configuration has minimal size throughout. We call minimally sized configurations *plain*, and they are of particular interest to us:

Definition 4 (Plain message structure). *A message structure* (V, msgmap, e) *is called plain if, for each variable* x*,* $\sum_p \mathrm{len}(e(x, p)) = 1$*.*
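To make the two definitions concrete, here is a minimal Python sketch, assuming a dictionary-based encoding of the message-structure component e (mapping (variable, process) pairs to lists of in-transit messages); the names `size` and `is_plain` are ours, not from the paper:

```python
# Sketch (names ours): the message-structure component e of a configuration,
# encoded as a dict mapping (variable, process) pairs to lists of
# in-transit messages.

def size(e):
    """Definition 2: total number of messages in the message structure."""
    return sum(len(msgs) for msgs in e.values())

def is_plain(e, variables):
    """Definition 4: exactly one message per variable, summed over processes."""
    return all(sum(len(msgs) for (x, _), msgs in e.items() if x == var) == 1
               for var in variables)

# One pending message per variable: plain, size 2.
e = {("x", "p1"): ["m1"], ("y", "p1"): ["m2"],
     ("x", "p2"): [], ("y", "p2"): []}
```

Adding a second message for x would raise the size to 3 and break plainness.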

Drawing a parallel with SC, one could reason that the recurrence of plain configurations is a hallmark of a system that doesn't exhibit overly weak behaviour. This is captured with the following fairness condition.

Definition 5 (Repeatedly Plain Executions). *An execution* γ0, γ1, . . . *is said to be repeatedly plain if* γi *is a plain configuration for infinitely many* i*.*

Following the memory transition system introduced in Sect. 2 and Sect. 3, we observe that every configuration has a (finite) path to some plain configuration (by performing a sequence of update steps). Hence, if a configuration is visited infinitely often in a fair execution, a plain configuration will also be visited infinitely often. Consequently, size bounded transition fair runs are also repeatedly plain transition fair.

#### 4.2 Probabilistic Memory Fairness

Problems considered in a purely logical setting ask whether *all* executions satisfying a fairness condition fulfill a liveness requirement. However, if the answer is negative, one might be interested in quantifying the fair executions which do not repeatedly reach the control state. We perform this quantification by considering the probabilistic variant of the model proposed earlier, and defining fairness analogously as a property of Markov Chains.

*Markov chains.* A Markov chain is a pair C = ⟨Γ, M⟩ where Γ is a (possibly infinite) set of configurations and M is the transition matrix, which assigns to each possible transition a *transition probability*: M : Γ × Γ → [0, 1]. This matrix must be stochastic, i.e., $\sum_{\gamma' \in \Gamma} M(\gamma, \gamma') = 1$ must hold for every configuration γ.
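As a sanity check on the stochasticity condition, a small Python sketch with a finite dict-of-dicts encoding of M (the configuration names are illustrative):

```python
# Sketch: a finite transition matrix M as a dict of dicts;
# is_stochastic checks sum_{g'} M(g, g') = 1 for every row g.

def is_stochastic(M, tol=1e-9):
    return all(abs(sum(row.values()) - 1.0) <= tol for row in M.values())

M = {"g0": {"g0": 0.5, "g1": 0.5},
     "g1": {"g0": 1.0}}
```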

We can convert our concurrent program transition system (Sect. 2) into a Markov chain M by adding probabilities to the transitions. We assign M(γ, γ′) a nonzero value if and only if the transition γ → γ′ is allowed in the underlying transition system. Markov chain executions are, by construction, transition fair with probability 1. We now present the analog of the repeatedly plain condition.<sup>1</sup>

Definition 6 (Probabilistic Memory Fairness). *A Markov chain satisfies probabilistic memory fairness if a plain configuration is reached infinitely often with probability one.*

This parallel has immense utility because verifying liveness properties for the class of Markov chains called *decisive Markov chains* is well studied. [7] establishes that the existence of a *finite attractor*, i.e. a finite set of states F that is repeatedly reached with probability 1, is sufficient for decisiveness. The above definition asserts precisely that the (finite) set of plain configurations is an attractor.

#### 4.3 Relating Fairness Notions

Although repeatedly plain transition fairness is weaker than size bounded transition fairness and probabilistic memory fairness, these three notions are equivalent with respect to canonical liveness problems, i.e. repeated control state reachability and termination. The proof we present for repeated reachability can be adapted for termination.

Theorem 1. *There exists* N0 ∈ ℕ *such that for all* N ≥ N0*, the following are equivalent for any control state (program counters and register values)* c*:*


<sup>1</sup> A concrete Markov Chain satisfying the declarative definition may be adapted from the one described in [5] in a similar setting.

*Proof.* For each N ∈ ℕ, we construct a connectivity graph GN. The vertices are the finitely many plain configurations γ, along with the finitely many control states c. We draw a directed edge from γi to γj if γj is reachable from γi via configurations of size at most N. We additionally draw an edge from a plain configuration γ to control state c iff c is reachable from γ via configurations of size at most N. We similarly construct a connectivity graph G without bounds on intermediate configuration sizes. We note:


Since plain configurations form an attractor, the graph G is instrumental in deciding repeated control state reachability. Consider the restriction of G to plain configurations, denoted GΓ. Transition fairness (resp. probabilistic memory fairness) implies that γ is visited infinitely often (resp. with probability 1) only if it lies in a bottom strongly connected component (SCC). In turn, any control state c is guaranteed to be reached infinitely often if and only if it is reachable from every bottom SCC of GΓ. The 'if' direction follows from transition fairness and the attractor property, while the converse follows by identifying a finite path to a bottom SCC from which c is not reachable. The equivalence follows because the underlying graph is the same for all three notions of fairness.
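The graph argument can be sketched algorithmically: compute the SCCs of the finite restricted graph, keep the bottom ones, and test reachability of the control state from each. A self-contained Python sketch, assuming graphs as adjacency dicts (all function names are ours):

```python
# Sketch (names ours): graphs as adjacency dicts, node -> set of successors.
# A state is reached infinitely often under the fairness notions above iff
# it is reachable from every bottom SCC. SCCs are computed Kosaraju-style.

def sccs(g):
    nodes = set(g) | {v for vs in g.values() for v in vs}
    order, seen = [], set()

    def dfs(u):
        seen.add(u)
        for v in g.get(u, ()):
            if v not in seen:
                dfs(v)
        order.append(u)  # post-order (finish time)

    for u in nodes:
        if u not in seen:
            dfs(u)
    rev = {u: set() for u in nodes}
    for u in g:
        for v in g[u]:
            rev[v].add(u)
    comps, assigned = [], set()
    for u in reversed(order):           # flood-fill the reverse graph
        if u in assigned:
            continue
        comp, stack = set(), [u]
        while stack:
            w = stack.pop()
            if w not in assigned:
                assigned.add(w)
                comp.add(w)
                stack.extend(rev[w] - assigned)
        comps.append(comp)
    return comps

def bottom_sccs(g):
    # bottom = no edge leaves the component
    return [c for c in sccs(g)
            if all(v in c for u in c for v in g.get(u, ()))]

def reachable(g, src):
    seen, stack = set(), [src]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(g.get(u, ()))
    return seen

def repeatedly_reached(g, c):
    # c is reached infinitely often iff every bottom SCC can reach c
    return all(c in reachable(g, next(iter(comp)))
               for comp in bottom_sccs(g))
```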

### 5 Applying Fairness Properties to Decision Problems

In this section, we show how to decide liveness as a corollary of the proof of Theorem 1. We begin by noting that techniques for termination are subsumed by those for repeated control state reachability. This is because termination is not guaranteed iff one can reach a plain configuration from which a terminal control state is inaccessible. Hence, in the sequel, we focus on repeated control state reachability.

#### 5.1 Deciding Repeated Control State Reachability

We observe that under the fairness conditions we defined, liveness, i.e., repeated control state reachability, reduces to a *safety* query.

*Problem 1 (Repeated control state reachability).* Given a control state (program counters and register values) c, do all infinite executions (in the probabilistic case, a set of measure 1) satisfying fairness condition A reach c infinitely often?

*Problem 2 (Control state reachability).* Given a control state c and a configuration γ, is (c, γm) reachable from γ for some γm?

Theorem 2. *Problem 1 for repeatedly plain transition fairness and probabilistic memory fairness reduces to Problem 2. Moreover, the reduction can compute the* N0 *from Theorem 1 such that it further applies to size bounded transition fairness.*

*Proof.* This follows by using Problem 2 to compute the underlying connectivity graph G from the proof of Theorem 1. A small technical hurdle is that plain configuration reachability is not the same as control state reachability. The key to encoding it as a control state query is the following property: for a configuration γ and a message m ∈ e(x, p), if m is not redundant for any process p′ (formally, frontier(p′, x, p) ≤ m), then there exists a plain configuration γ′ containing m that is reachable from γ via a sequence of update steps. The plan, therefore, is to read and verify that the messages desired in the plain configuration are, and remain, accessible to all processes. Finally, the computation of N0 follows by enumerating the graphs GN.
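The final enumeration step admits a direct sketch: assuming oracles `reach` and `reach_bounded` for (bounded) reachability between the finitely many vertices (both placeholders for Problem 2 queries), build G_N for increasing N until it coincides with G:

```python
# Sketch (all names ours): reach(u, v) and reach_bounded(u, v, N) stand for
# reachability oracles -- unbounded, and restricted to intermediate
# configurations of size at most N, respectively.

def connectivity_graph(vertices, edge):
    return {u: frozenset(v for v in vertices if v != u and edge(u, v))
            for u in vertices}

def compute_N0(vertices, reach, reach_bounded, N_max=1000):
    G = connectivity_graph(vertices, reach)
    for N in range(1, N_max + 1):
        GN = connectivity_graph(vertices,
                                lambda u, v: reach_bounded(u, v, N))
        if GN == G:  # G_N grows monotonically in N and is bounded by G
            return N
    raise RuntimeError("no bound found up to N_max")
```

For instance, with toy oracles where the edge (b, c) needs intermediate configurations of size 5, `compute_N0` returns 5.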

#### 5.2 Quantitative Control State Repeated Reachability

We set the context of a Markov chain C = ⟨Γ, M⟩ that refines the transition system induced by the program. We consider the quantitative variant of repeated reachability, where instead of just knowing whether the probability is one or not, we are interested in computing it.

*Problem 3 (Quantitative control state repeated reachability).* Given a control state c and an error margin ε ∈ ℝ, find a δ such that for Markov chain C, |Prob(γinit ⊨ □♦c) − δ| ≤ ε.

We refer the reader to [6] for details on the standard reduction, from which the following result follows:

Theorem 3. *If Problem 2 is decidable for a memory model, then Problem 3 is computable for Markov chains that satisfy probabilistic memory fairness.*
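For intuition, here is a sketch of an approximation scheme for the simpler query of *eventually* reaching a target set (to which repeated reachability is related via the bottom-SCC analysis of Theorem 1): unfold the chain breadth-first, accumulating the mass of runs that have hit the target, and drop runs that can no longer do so (a Problem 2 query). Decisiveness guarantees the undecided mass vanishes, so the loop terminates. All names are ours and the finite encoding is illustrative:

```python
# Sketch (all names ours). M is a dict-of-dicts transition matrix, target a
# set of configurations, can_reach_target(g) an oracle for whether the
# target is reachable from g. Runs entering a state from which the target
# is unreachable are dropped; their mass counts towards "never reaches".

def approx_reach_prob(M, init, target, can_reach_target, eps):
    if init in target:
        return 1.0
    lo = 0.0                 # mass of runs that already hit the target
    frontier = {init: 1.0}   # undecided mass, per current state
    while sum(frontier.values()) > eps:
        nxt = {}
        for g, p in frontier.items():
            for g2, q in M[g].items():
                if g2 in target:
                    lo += p * q
                elif can_reach_target(g2):
                    nxt[g2] = nxt.get(g2, 0.0) + p * q
        frontier = nxt
    return lo  # the true probability lies in [lo, lo + eps]
```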

#### 5.3 Adapting Subroutines to Our Memory Framework

We now briefly sketch how to adapt known solutions to Problem 2 for PSO, TSO, StrongCOH, WRA and SRA to our framework.

PSO and TSO. Reachability between plain configurations (a special case of Problem 2) under these models has already been proven decidable [12]. The store buffer framework is similar to the one we describe, and hence the results go through. Moreover, [5, Lemmas 3, 4] shows the decidability of our Problem 2 for TSO. The same construction, which uses an augmented program to reduce to plain configuration reachability, works for PSO as well.

StrongCOH. Decidability of reachability under StrongCOH is shown in [1]. The framework used, although quite different in notation, is roughly isomorphic to the one we propose. The relaxed semantics of StrongCOH allow the framework to be set up as a WSTS [2,26], which supports backward reachability analysis, yielding decidability. Backward reachability gives an upward closed set of states that can reach a target label. Checking whether an arbitrary state is in this upward closed set requires a comparison with only the finitely many elements in the basis. This solves Problem 2.
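The basis membership test is a one-liner; a sketch, with `leq` standing in for the model-specific well-quasi-order (names ours):

```python
# Sketch (names ours): a state is in an upward-closed set iff it dominates
# some element of the set's finite basis under the well-quasi-order leq.

def in_upward_closure(state, basis, leq):
    return any(leq(b, state) for b in basis)
```

For instance, with states as multisets (dicts) ordered pointwise, one would take `leq = lambda a, b: all(b.get(k, 0) >= v for k, v in a.items())`.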

WRA and SRA. Decidability of reachability under WRA and SRA has recently been shown in [34]. The proof follows the WSTS approach; however, the model used in the proof has different syntax and semantics from the one we present here. A reconciliation is nevertheless possible, and we briefly sketch it here. A state in the proof model is a map from processes to *potentials*. A potential is a finite set of finite traces that a process may execute. These proof-model states are well-quasi-ordered, and operating on them sets up a WSTS. Backward reachability gives us a set of maps from processes to *potentials* that allow us to reach the target label. The key is to view a process-potential map as a requirement on our message based configuration. The higher a map is in the wqo, the stronger the requirement it enforces. In this sense, the basis of states returned by backward reachability constitutes the minimal requirements our configuration must meet in order for the target label to be reachable. Formally, let γ be a configuration of our framework. The target label is reachable from γ if and only if there exists a process-potential map B in the backward reachable set such that every trace in every process's potential in B is enabled in γ. It suffices to check the existence of B over the finite basis of the backward reachable set. Note that γ is completely arbitrary: this solves our Problem 2.

### 6 Related Work

Fairness. Only recently has fairness for weak memory started receiving increasing attention. The work closest to ours is by [4], who formulate a probabilistic extension for the Total Store Order (TSO) memory model and show decidability results for associated verification problems. Our treatment of fairness is richer, as we relate the same probabilistic fairness with two alternate logical fairness definitions. Similar proof techniques notwithstanding, our verification results are also more general, thanks to the development of a uniform framework that applies to a landscape of models. [37] develop a novel formulation of fairness as a declarative property of event structures. This notion informally translates to "Each message is eventually propagated." We forego axiomatic elegance to motivate and develop stronger practical notions of fairness in our quest to verify liveness.

Probabilistic Verification. There are several works on verification of *finite-state Markov chains* (e.g. [14,33]). However, since the messages in our memory systems are unbounded, these techniques do not apply. There is also substantive literature on the verification of infinite-state probabilistic systems, which have often been modelled as infinite Markov chains [17–19,24,25]. However, their results cannot be directly leveraged to imply ours. The machinery we use for showing decidability relies on decisive Markov chains, a concept formulated in [7] and used in [4].

Framework. On the modelling front, the ability to specify memory model semantics as first-order constraints over the program-order (po), reads-from (rf), and modification-order (mo) relations has led to elegant declarative frameworks based on event structures [9,10,21,28]. There are also approaches that, instead of natively characterizing semantics, prescribe constraints on their ISA-level behaviours in terms of program transformations [38]. On the operational front, there have been works that model individual memory models [43,46] and clusters of similar models [30,35]; however, we are not aware of any operational modelling framework that encompasses as wide a range of models as we do. The operationalization in [16] uses write buffers which resemble our channels; however, their operationalization too focuses on a specific semantics.

### 7 Conclusion, Future Work, and Perspective

*Conclusion.* The ideas developed in Sect. 4 lie at the heart of our contribution: we motivate and define transition fairness augmented with memory size boundedness or the recurrence of plain configurations, as well as the analogous probabilistic memory fairness. These are equivalent for the purpose of verifying repeated control state reachability, i.e. liveness, and lie at the core of the techniques we discuss in Sect. 5. These techniques owe their generality to the versatile framework we describe in Sect. 3.

*Future Work.* There are several interesting directions for future work. We believe that our framework can be extended to handle weak memory models that allow speculation, such as ARM and POWER. In such a case, we would need to extend our fairness conditions to limit the amount of allowed speculation. It is also interesting to mix transition fairness with probabilistic fairness, i.e., use the former to resolve scheduler non-determinism and the latter to resolve memory non-determinism, leading to an (infinite-state) Markov Decision Process model. Along these lines, we can also consider synthesis problems based on 2½-player games. To solve such game problems, we could extend the framework of decisive Markov chains that has been developed for probabilistic and game-theoretic problems over infinite-state systems [7]. A natural next step is developing efficient algorithms for proving liveness properties for programs running on weak memory models. In particular, since we reduce the verification of liveness properties to simple reachability, there is high hope that one can develop CEGAR frameworks relying both on over-approximations, such as predicate abstraction, and under-approximations, such as bounded context-switching [44] and stateless model checking [3,32].

*Perspective.* Leveraging techniques developed over the years by the program verification community, and using them to solve research problems in programming languages, architectures, databases, etc., has substantial potential added value. Although it requires a deep understanding of program behaviors running on such platforms, we believe it is about finding the right concepts, combining them correctly, and then applying the existing rich set of program verification techniques, albeit in a non-trivial manner. The current paper is a case in point. Here, we have used a combination of techniques developed for reactive systems [31], methods for the analysis of infinite-state systems [7], and semantical models developed for weak memory models [12,30,34,35] to obtain, for the first time, a framework for the systematic analysis of liveness properties under weak memory models.

Acknowledgements. This project was partially supported by the Swedish Research Council and SERB MATRICS grant MTR/2019/000095. Adwait Godbole was supported in part by Intel under the Scalable Assurance program, DARPA contract FA8750-20-C0156 and National Science Foundation (NSF) grant 1837132.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Rely-Guarantee Reasoning for Causally Consistent Shared Memory

Ori Lahav<sup>1</sup>, Brijesh Dongol<sup>2</sup>(B), and Heike Wehrheim<sup>3</sup>

<sup>1</sup> Tel Aviv University, Tel Aviv, Israel. orilahav@tau.ac.il
<sup>2</sup> University of Surrey, Guildford, UK. b.dongol@surrey.ac.uk
<sup>3</sup> University of Oldenburg, Oldenburg, Germany. heike.wehrheim@uni-oldenburg.de

Abstract. Rely-guarantee (RG) is a highly influential compositional proof technique for concurrent programs, which was originally developed assuming a sequentially consistent shared memory. In this paper, we first generalize RG to make it parametric with respect to the underlying memory model by introducing an RG framework that is applicable to any model axiomatically characterized by Hoare triples. Second, we instantiate this framework for reasoning about concurrent programs under *causally consistent memory*, which is formulated using a recently proposed *potential-based* operational semantics, thereby providing the first reasoning technique for such semantics. The proposed program logic, which we call Piccolo, employs a novel assertion language allowing one to specify ordered sequences of states that each thread may reach. We employ Piccolo for multiple litmus tests, as well as for an adaptation of Peterson's algorithm for mutual exclusion to causally consistent memory.

### 1 Introduction

Rely-guarantee (RG) is a fundamental compositional proof technique for concurrent programs [21,48]. Each program component P is specified using *rely* and *guarantee* conditions, which means that P can tolerate any environment interference that follows its rely condition, and generate only interference included in its guarantee condition. Two components can be composed in parallel provided that the rely of each component agrees with the guarantee of the other.

The original RG framework and its soundness proof have assumed a sequentially consistent (SC) memory [33], which is unrealistic in modern processor architectures and programming languages. Nevertheless, the main principles behind RG are not at all specific for SC. Accordingly, our first main contribution,

Lahav is supported by the Israel Science Foundation (grants 1566/18 and 814/22) and by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 851811). Dongol is supported by EPSRC grants EP/X015149/1, EP/V038915/1, EP/R025134/2, VeTSS, and ARC Discovery Grant DP190102142. Wehrheim is supported by the German Research Council DFG (project no. 467386514).

© The Author(s) 2023

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 206–229, 2023. https://doi.org/10.1007/978-3-031-37706-8\_11

is to formally decouple the underlying memory model from the RG proof principles, by proposing a generic RG framework parametric in the input memory model. To do so, we assume that the underlying memory model is axiomatized by Hoare triples specifying pre- and postconditions on memory states for each primitive operation (e.g., loads and stores). This enables the formal development of RG-based logics for different shared memory models as instances of one framework: all instances build on a uniform soundness infrastructure for the RG rules (e.g., for sequential and parallel composition), but employ different specialized assertions to describe the possible memory states, so that model-specific soundness arguments are only needed for primitive memory operations.

The second contribution of this paper is an instance of the general RG framework for *causally consistent shared memory*. The latter stands for a family of wide-spread and well-studied memory models weaker than SC, which are sufficiently strong for implementing a variety of synchronization idioms [6,12,26]. Intuitively, unlike SC, causal consistency allows different threads to observe writes to memory in different orders, as long as they agree on the order of writes that are causally related. This concept can be formalized in multiple ways, and here we target a strong form of causal consistency, called *strong release-acquire* (SRA) [28,31] (and equivalent to "causal convergence" from [12]), which is a slight strengthening of the well-known release-acquire (RA) model (used by C/C++11). (The variants of causal consistency only differ for programs with write/write races [10,28], which are rather rare in practice.)

Our starting point for axiomatizing SRA as Hoare triples is the *potential-based* operational semantics of SRA, which was recently introduced with the goal of establishing the decidability of control state reachability under this model [27,28] (in contrast to undecidability under RA [1]). Unlike more standard presentations of weak memory models whose states record information about the *past* (e.g., in the form of store buffers containing executed writes before they are globally visible [36], partially ordered execution graphs [8,20,31], or collections of timestamped messages and thread views [11,16,17,23,25,47]), the states of the potential-based model track possible *futures* ascribing what sequences of observations each thread can perform. We find this approach to be a particularly appealing candidate for Hoare-style reasoning which would naturally generalize SC-based reasoning. Intuitively, while an assertion in SC specifies possible observations at a given program point, an assertion in a potential-based model should specify possible *sequences* of observations.

To pursue this direction, we introduce a novel assertion language, resembling temporal logics, which allows one to express properties of sequences of states. For instance, our assertions can express that a certain thread may currently read <sup>x</sup> = 0, but it will have to read <sup>x</sup> = 1 once it reads <sup>y</sup> = 1. Then, we provide Hoare triples for SRA in this assertion language, and incorporate them in the general RG framework. The resulting program logic, which we call Piccolo, provides a novel approach to reason on concurrent programs under causal consistency, which allows for simple and direct proofs, and, we believe, may constitute a basis for automation in the future.

Fig. 1. Message passing in SC

### 2 Motivating Example

To make our discussion concrete, consider the message passing program (MP) in Figs. 1 and 2, comprising shared variables x and y and local registers a and b. The proof outline in Fig. 1 assumes SC, whereas Fig. 2 assumes SRA. In both cases, at the end of the execution, we show that if <sup>a</sup> is 1, then <sup>b</sup> must also be 1. We use these examples to explain the two main concepts introduced in this paper: (i) a generic RG framework and (ii) its instantiation with a potential-focused assertion system that enables reasoning under SRA.

*Rely-Guarantee.* The proof outline in Fig. 1 can be read as an RG derivation:


$$\mathcal{G}\_2 \triangleq \left\{ \{ \mathbf{y} = 1 \Rightarrow \mathbf{x} = 1 \} \ \mathbf{T}\_2 \longmapsto \mathbf{a} := \texttt{LOAD}(\mathbf{y}), \{ \mathbf{a} = 1 \Rightarrow \mathbf{x} = 1 \} \ \mathbf{T}\_2 \longmapsto \mathbf{b} := \texttt{LOAD}(\mathbf{x}) \right\}$$

7. To perform the parallel composition, (R1, G1) and (R2, G2) should be *non-interfering*. This involves showing that each R ∈ Ri is *stable* under each G ∈ Gj for i ≠ j. That is, if G = {P} τ ↦ c, we require the Hoare triple {P ∩ R} τ ↦ c {R} to hold. In this case, these proof obligations are straightforward to discharge using Hoare's assignment axiom (and are trivial for i = 1 and j = 2, since load instructions leave the memory intact).

*Remark 1.* Classical treatments of RG involve two related ideas [21]: (1) specifying a component by rely and guarantee conditions (together with standard pre- and postconditions); and (2) taking the relies and guarantees to be binary relations over states. Our approach adopts (1) but not (2). Thus, it can be seen as an RG presentation of the Owicki-Gries method [37], as was previously done in [32]. We have not observed an advantage for using binary relations in our examples, but the framework can be straightforwardly modified to do so.

Now, observe that substantial aspects of the above reasoning are *not* directly tied to SC. This includes the Hoare rules for compound commands (such as sequential composition above), the idea of specifying a thread using collections of stable rely assertions and guaranteed guarded primitive commands, and the non-interference condition for parallel composition. To carry out this generalization, we assume that we are provided an assertion language whose assertions are interpreted as *sets of memory states* (which can be much more involved than simple mappings of variables to values), and a set of valid Hoare triples for the primitive instructions. The latter is used for checking validity of primitive triples (e.g., {P} T1 ↦ STORE(x, 1) {Q}), as well as non-interference conditions (e.g., {P ∩ R} T1 ↦ STORE(x, 1) {R}). In Sect. 4, we present this generalization, and establish the soundness of RG principles independently of the memory model.

*Potential-Based Reasoning.* The second contribution of our work is an application of the above to develop a logic for a potential-based operational semantics that captures SRA. In this semantics every memory state records sequences of store mappings (from shared variables to values) that each thread may observe. For example, assuming all variables are initialized to 0, if T1 executed its code until completion before T2 even started (so under SC the memory state is the store {x ↦ 1, y ↦ 1}), we may reach the SRA state in which T1's potential consists of the single store {x ↦ 1, y ↦ 1}, and T2's potential is the sequence of stores:

$$\langle \{ \mathbf{x} \mapsto 0, \mathbf{y} \mapsto 0 \}, \{ \mathbf{x} \mapsto 1, \mathbf{y} \mapsto 0 \}, \{ \mathbf{x} \mapsto 1, \mathbf{y} \mapsto 1 \} \rangle,$$

which captures the stores that T2 may observe in the order it may observe them. Naturally, potentials are *lossy*, allowing threads to non-deterministically lose a subsequence of the current store sequence, so they can progress in their sequences. Thus, T2 can read 1 from y only after it loses the first two stores in its potential, and from this point on it can only read 1 from x. Now, one can see that *all* potentials of T2 at its initial program point are, in fact, subsequences of the above sequence (regardless of where T1 is), and conclude that a = 1 ⇒ b = 1 holds when T2 terminates.
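The lossiness argument can be sketched directly: since potentials may only lose elements, every potential T2 can hold at its initial point is a subsequence of the canonical sequence above. A small Python sketch of the subsequence check, with stores encoded as dicts (names ours):

```python
# Sketch (names ours): stores as dicts, potentials as lists of stores.
# is_subseq(small, big) checks that small is a (not necessarily contiguous)
# subsequence of big, i.e. a possible lossy residue of it.

def is_subseq(small, big):
    it = iter(big)  # each element of small must match strictly later in big
    return all(any(s == b for b in it) for s in small)

# The canonical sequence T2 may observe in the scenario described above:
canonical = [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 1, "y": 1}]
```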

To capture the above informal reasoning in a Hoare logic, we designed a new form of assertions capturing possible locally observable sequences of stores, rather than one global store, which can be seen as a restricted fragment of linear temporal logic. The proof outline using these assertions is given in Fig. 2. In particular, [x = 1] is satisfied by all store sequences in which every store maps x to 1, whereas [y ≠ 1] ; [x = 1] is satisfied by all store sequences that can be split into a (possibly empty) prefix in which the value of y is not 1, followed by a (possibly empty) suffix in which the value of x is 1. Assertions of the form τ ⊨ I state that the potential of thread τ includes only store sequences that satisfy I.
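A sketch of how such assertions can be evaluated over finite store sequences, with `always` standing for [p] and `chop` for [p] ; [q]; the function names are ours:

```python
# Sketch (names ours): always(p) interprets [p] over a store sequence, and
# chop(p, q) interprets [p] ; [q] as "some split into a p-prefix followed
# by a q-suffix". Stores are dicts from variable names to values.

def always(pred):
    return lambda seq: all(pred(s) for s in seq)

def chop(pred_p, pred_q):
    def holds(seq):
        return any(all(pred_p(s) for s in seq[:i]) and
                   all(pred_q(s) for s in seq[i:])
                   for i in range(len(seq) + 1))
    return holds

y_ne_1 = lambda s: s["y"] != 1
x_eq_1 = lambda s: s["x"] == 1
phi = chop(y_ne_1, x_eq_1)   # the assertion [y != 1] ; [x = 1]
```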

The first assertion of T2 is implied by the initial condition, T0 ⊨ [y ≠ 1], since the potential of the parent thread T0 is inherited by the forked child threads, and T2 ⊨ [y ≠ 1] implies T2 ⊨ [y ≠ 1] ; I for any I. Moreover, T2 ⊨ [y ≠ 1] ; [x = 1] is preserved by (i) line 1, because writing 1 to x leaves [y ≠ 1] unchanged and re-establishes [x = 1]; and (ii) line 2, because the semantics of SRA ensures that after T2 reads 1 from y, the thread T2 is confined by T1's potential just before it wrote 1 to y, which has to satisfy the precondition T1 ⊨ [x = 1]. (SRA allows updating the other threads' potentials only when the suffix of the potential after the update is observable by the writer thread.)

In Sect. 6 we formalize these arguments as Hoare rules for the primitive instructions, whose soundness is checked using the potential-based operational semantics and the interpretation of the assertion language. Finally, Piccolo is obtained by incorporating these Hoare rules in the general RG framework.

*Remark 2.* Our presentation of the potential-based semantics for SRA (fully presented in Sect. 5) deviates from the original one in [28], where it was called loSRA. The most crucial difference is that while loSRA's potentials consist of lists of per-location read options, our potentials consist of lists of *stores* assigning a value to every variable. (This is similar in spirit to the adaptation of load buffers for TSO [4,5] to snapshot buffers in [2].) Additionally, unlike loSRA, we disallow empty potential lists, require that the potentials of the different threads agree on the very last value to each location, and handle read-modify-write (RMW) instructions differently. We employed these modifications to loSRA as we observed that direct reasoning on loSRA states is rather unnatural and counterintuitive, as loSRA allows traces that *block* a thread from reading any value from certain locations (which cannot happen in the version we formulate). For example, a direct interpretation of our assertions over loSRA states would allow states in which τ ⊨ [x = v] and τ ⊨ [x ≠ v] both hold (when τ does not have any option to read from x), while these assertions are naturally contradictory when interpreted on top of our modified SRA semantics. To establish confidence in the new potential-based semantics we have proved *in Coq* its equivalence to the standard execution-graph based semantics of SRA (over 5K lines of Coq proofs) [29].

### 3 Preliminaries: Syntax and Semantics

In this section we describe the underlying programming language, leaving the shared-memory semantics parametric.

*Syntax.* The syntax of programs, given in Fig. 3, is mostly standard, comprising primitive (atomic) commands c and compound commands C. The non-standard

$$\begin{array}{llll} \textit{values} & v \in \mathsf{Val} = \{0, 1, \ldots\} & \textit{shared variables} & x, y \in \mathsf{Loc} = \{\mathtt{x}, \mathtt{y}, \ldots\} \\ \textit{local registers} & r \in \mathsf{Reg} = \{\mathtt{a}, \mathtt{b}, \ldots\} & \textit{thread identifiers} & \tau, \pi \in \mathsf{Tid} = \{\mathtt{T}_0, \mathtt{T}_1, \ldots\} \end{array}$$

$$\begin{array}{rcl} e & ::= & r \mid v \mid e + e \mid e = e \mid \neg e \mid e \wedge e \mid e \vee e \mid \ldots \\ c & ::= & r := e \mid \mathtt{STORE}(x, e) \mid r := \mathtt{LOAD}(x) \mid \mathtt{SWAP}(x, e) \\ \tilde{c} & ::= & \langle c, \overline{r} := \overline{e} \rangle \\ C & ::= & c \mid \tilde{c} \mid \mathtt{skip} \mid C \, ; C \mid \mathtt{if}\ e\ \mathtt{then}\ C\ \mathtt{else}\ C \mid \mathtt{while}\ e\ \mathtt{do}\ C \mid C \;{}_{\tau}\!\|_{\tau}\; C \end{array}$$

Fig. 3. Program syntax

$$\frac{\gamma' = \gamma[r \mapsto \gamma(e)]}{r := e \gg \gamma \xrightarrow{\varepsilon} \gamma'} \qquad \frac{l = \mathtt{W}(x, \gamma(e))}{\mathtt{STORE}(x, e) \gg \gamma \xrightarrow{l} \gamma} \qquad \frac{l = \mathtt{R}(x, v) \qquad \gamma' = \gamma[r \mapsto v]}{r := \mathtt{LOAD}(x) \gg \gamma \xrightarrow{l} \gamma'}$$

$$\frac{l = \mathtt{RMW}(x, v, \gamma(e))}{\mathtt{SWAP}(x, e) \gg \gamma \xrightarrow{l} \gamma} \qquad \frac{c \gg \gamma \xrightarrow{l_{\varepsilon}} \gamma_0 \qquad r_1 := e_1 \gg \gamma_0 \xrightarrow{\varepsilon} \gamma_1 \quad \cdots \quad r_n := e_n \gg \gamma_{n-1} \xrightarrow{\varepsilon} \gamma_n}{\langle c, \langle r_1, \ldots, r_n \rangle := \langle e_1, \ldots, e_n \rangle \rangle \gg \gamma \xrightarrow{l_{\varepsilon}} \gamma_n}$$

Fig. 4. Small-step semantics of (instrumented) primitive commands (c̃ ≫ γ −lε→ γ′)

components are instrumented commands c̃, which are meant to atomically execute a primitive command c and a (multiple) assignment r̄ := ē. Such instructions are needed to support auxiliary (a.k.a. ghost) variables in RG proofs. In addition, SWAP (a.k.a. atomic exchange) is an example of an RMW instruction. For brevity, other standard RMW instructions, such as FADD and CAS, are omitted.

Unlike many weak memory models that only support top-level parallelism, we include dynamic thread creation via commands of the form C1 τ1‖τ2 C2, which forks two threads named τ1 and τ2 that execute the commands C1 and C2, respectively. Each Ci may itself comprise further parallel compositions. Since thread identifiers are explicit, we require commands to be *well formed*. Let Tid(C) be the set of all thread identifiers that appear in C. A command C is *well formed*, denoted wf(C), if parallel compositions inside it employ disjoint sets of thread identifiers. This notion is formally defined by induction on the structure of commands, with the only interesting case being wf(C1 τ1‖τ2 C2) iff wf(C1) ∧ wf(C2) ∧ τ1 ≠ τ2 ∧ Tid(C1) ∩ Tid(C2) = ∅.
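To make the well-formedness condition concrete, here is a small Python sketch; the tuple encoding of commands is ours (not from the paper), with primitive commands as plain strings:

```python
# Illustrative sketch (our own encoding): ("par", tid1, C1, tid2, C2) stands
# for C1 tid1||tid2 C2, and ("seq", C1, C2) for C1 ; C2.

def tids(C):
    """Tid(C): thread identifiers appearing in parallel compositions in C."""
    if not isinstance(C, tuple):
        return set()
    if C[0] == "par":
        _, t1, C1, t2, C2 = C
        return {t1, t2} | tids(C1) | tids(C2)
    return set().union(*(tids(sub) for sub in C[1:]))

def wf(C):
    """wf(C): nested parallel compositions employ disjoint thread identifiers."""
    if not isinstance(C, tuple):
        return True
    if C[0] == "par":
        _, t1, C1, t2, C2 = C
        return (wf(C1) and wf(C2) and t1 != t2
                and not (tids(C1) & tids(C2)))
    return all(wf(sub) for sub in C[1:])
```

For instance, wf rejects a composition whose two sides mention a common thread identifier.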

*Program Semantics.* We provide small-step operational semantics to commands independently of the memory system. To connect this semantics to a given memory system, its steps are instrumented with labels, as defined next.

Definition 1. A *label* l takes one of the following forms: a read R(x, vR), a write W(x, vW), a read-modify-write RMW(x, vR, vW), a fork FORK(τ1, τ2), or a join JOIN(τ1, τ2), where x ∈ Loc, vR, vW ∈ Val, and τ1, τ2 ∈ Tid. We denote by Lab the set of all labels.

$$\frac{\tilde{c} \gg \gamma \xrightarrow{l_{\varepsilon}} \gamma'}{\langle \tilde{c}, \gamma \rangle \xrightarrow{l_{\varepsilon}} \langle \mathtt{skip}, \gamma' \rangle} \qquad \frac{\langle C_1, \gamma \rangle \xrightarrow{l_{\varepsilon}} \langle C_1', \gamma' \rangle}{\langle C_1 \, ; C_2, \gamma \rangle \xrightarrow{l_{\varepsilon}} \langle C_1' \, ; C_2, \gamma' \rangle} \qquad \frac{}{\langle \mathtt{skip} \, ; C_2, \gamma \rangle \xrightarrow{\varepsilon} \langle C_2, \gamma \rangle}$$

$$\frac{\gamma(e) = \mathit{true} \Rightarrow i = 1 \qquad \gamma(e) \neq \mathit{true} \Rightarrow i = 2}{\langle \mathtt{if}\ e\ \mathtt{then}\ C_1\ \mathtt{else}\ C_2, \gamma \rangle \xrightarrow{\varepsilon} \langle C_i, \gamma \rangle} \qquad \frac{C' = \mathtt{if}\ e\ \mathtt{then}\ (C \, ; \mathtt{while}\ e\ \mathtt{do}\ C)\ \mathtt{else}\ \mathtt{skip}}{\langle \mathtt{while}\ e\ \mathtt{do}\ C, \gamma \rangle \xrightarrow{\varepsilon} \langle C', \gamma \rangle}$$

Fig. 5. Small-step semantics of commands (⟨C, γ⟩ −lε→ ⟨C′, γ′⟩)

$$\frac{\langle C, \gamma \rangle \xrightarrow{l_{\varepsilon}} \langle C', \gamma' \rangle}{\langle \mathcal{C}_0 \uplus \{\tau \mapsto C\}, \gamma \rangle \xrightarrow{\tau, l_{\varepsilon}} \langle \mathcal{C}_0 \uplus \{\tau \mapsto C'\}, \gamma' \rangle}$$

$$\frac{\mathcal{C}(\tau) = C_1 \;{}_{\tau_1}\!\|_{\tau_2}\; C_2 \qquad \tau_1 \notin \mathit{dom}(\mathcal{C}) \qquad \tau_2 \notin \mathit{dom}(\mathcal{C}) \qquad l = \mathtt{FORK}(\tau_1, \tau_2) \qquad \mathcal{C}' = \mathcal{C} \uplus \{\tau_1 \mapsto C_1, \tau_2 \mapsto C_2\}}{\langle \mathcal{C}, \gamma \rangle \xrightarrow{\tau, l} \langle \mathcal{C}', \gamma \rangle}$$

$$\frac{\mathcal{C} = \mathcal{C}_0 \uplus \{\tau \mapsto C_1 \;{}_{\tau_1}\!\|_{\tau_2}\; C_2, \; \tau_1 \mapsto \mathtt{skip}, \; \tau_2 \mapsto \mathtt{skip}\} \qquad l = \mathtt{JOIN}(\tau_1, \tau_2) \qquad \mathcal{C}' = \mathcal{C}_0 \uplus \{\tau \mapsto \mathtt{skip}\}}{\langle \mathcal{C}, \gamma \rangle \xrightarrow{\tau, l} \langle \mathcal{C}', \gamma \rangle}$$

Fig. 6. Small-step semantics of command pools (⟨C, γ⟩ −τ,lε→ ⟨C′, γ′⟩)

Definition 2. A *register store* is a mapping γ : Reg → Val. Register stores are extended to expressions as expected. We denote by Γ the set of all register stores.

The semantics of (instrumented) primitive commands is given in Fig. 4. Using this definition, the semantics of commands is given in Fig. 5. Its steps take the form ⟨C, γ⟩ −lε→ ⟨C′, γ′⟩ where C and C′ are commands, γ and γ′ are register stores, and lε ∈ Lab ∪ {ε} (ε denotes a thread-internal step). We lift this semantics to *command pools* as follows.
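The shape of the Fig. 4 rules can be mirrored in a small Python sketch; the encoding of commands and labels as tuples is ours (not from the paper), with None playing the role of ε and read values left to be chosen by the memory system:

```python
# Illustrative sketch of primitive-command steps: a register store is a dict,
# and a step returns (label, new_store); label None stands for epsilon.

def step(cmd, gamma):
    kind = cmd[0]
    if kind == "asgn":                     # r := e, silent step
        _, r, e = cmd
        return None, {**gamma, r: e(gamma)}
    if kind == "store":                    # STORE(x, e) emits W(x, gamma(e))
        _, x, e = cmd
        return ("W", x, e(gamma)), gamma
    if kind == "load":                     # r := LOAD(x) emits R(x, v);
        _, r, x, v = cmd                   # v is supplied by the memory system
        return ("R", x, v), {**gamma, r: v}
    if kind == "swap":                     # SWAP(x, e) emits RMW(x, v, gamma(e))
        _, x, e, v = cmd
        return ("RMW", x, v, e(gamma)), gamma
    raise ValueError(kind)
```

For example, `step(("store", "x", lambda g: g["a"] + 1), {"a": 1})` yields the label `("W", "x", 2)` and leaves the register store unchanged.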

Definition 3. A *command pool* is a non-empty partial function C from thread identifiers to commands, such that the following hold:

1. Tid(C(τ1)) ∩ Tid(C(τ2)) = ∅ for every τ1 ≠ τ2 in dom(C).
2. τ ∉ Tid(C(τ)) for every τ ∈ dom(C).

We write command pools as sets of the form {τ1 ↦ C1, ... , τn ↦ Cn}.

Steps for command pools are given in Fig. 6. They take the form ⟨C, γ⟩ −τ,lε→ ⟨C′, γ′⟩, where C and C′ are command pools, γ and γ′ are register stores, and τ : lε (with τ ∈ Tid and lε ∈ Lab ∪ {ε}) is a *command transition label*.

*Memory Semantics.* To give semantics to programs under a memory model, we synchronize the transitions of a command C with a memory system. We leave the memory system parametric, and assume that it is represented by a labeled transition system (LTS) M with a set of states denoted by M.Q, and steps denoted by −→M. The transition labels of a general memory system M consist of non-silent program transition labels (elements of Tid × Lab) and a (disjoint) set M.Θ of internal memory actions, which is again left parametric (used, e.g., for memory-internal propagation of values).

*Example 1.* The simple memory system that guarantees sequential consistency is denoted here by SC. This memory system tracks the most recent value written to each variable and has no internal transitions (SC.Θ = ∅). Formally, it is defined by SC.Q ≜ Loc → Val, and −→SC is given by:

$$\frac{l = \mathtt{W}(x, v_{\mathtt{W}}) \qquad m' = m[x \mapsto v_{\mathtt{W}}]}{m \xrightarrow{\tau, l}_{\mathtt{SC}} m'} \qquad \frac{l = \mathtt{R}(x, v_{\mathtt{R}}) \qquad m(x) = v_{\mathtt{R}}}{m \xrightarrow{\tau, l}_{\mathtt{SC}} m} \qquad \frac{l = \mathtt{RMW}(x, v_{\mathtt{R}}, v_{\mathtt{W}}) \qquad m(x) = v_{\mathtt{R}} \qquad m' = m[x \mapsto v_{\mathtt{W}}]}{m \xrightarrow{\tau, l}_{\mathtt{SC}} m'}$$
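The SC transitions can be mirrored directly in Python; the following sketch (the tuple encoding of labels is our own convention) returns the successor state, or None when the step is disabled:

```python
# Illustrative sketch of the SC memory system: a state m maps locations to
# the most recently written value.

def sc_step(m, label):
    if label[0] == "W":                       # W(x, vW): overwrite x
        _, x, vw = label
        return {**m, x: vw}
    if label[0] == "R":                       # R(x, vR): enabled iff m(x) = vR
        _, x, vr = label
        return m if m[x] == vr else None
    if label[0] == "RMW":                     # RMW(x, vR, vW): read, then write
        _, x, vr, vw = label
        return {**m, x: vw} if m[x] == vr else None
    raise ValueError(label)
```

Reads and RMWs are enabled only when the read value matches the stored one, exactly as the premises m(x) = vR above require.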

The composition of a program with a general memory system is defined next.

Definition 4. The *concurrent system* induced by a memory system M is the LTS whose transition labels are the elements of (Tid × (Lab ∪ {ε})) ⊎ M.Θ; whose states are triples of the form ⟨C, γ, m⟩ where C is a command pool, γ is a register store, and m ∈ M.Q; and whose transitions are "synchronized transitions" of the program and the memory system, using labels to decide what to synchronize on, formally given by:

$$\frac{\langle \mathcal{C}, \gamma \rangle \xrightarrow{\tau, l} \langle \mathcal{C}', \gamma' \rangle \qquad l \in \mathsf{Lab} \qquad m \xrightarrow{\tau, l}_{\mathcal{M}} m'}{\langle \mathcal{C}, \gamma, m \rangle \xrightarrow{\tau, l} \langle \mathcal{C}', \gamma', m' \rangle} \qquad \frac{\langle \mathcal{C}, \gamma \rangle \xrightarrow{\tau, \varepsilon} \langle \mathcal{C}', \gamma' \rangle}{\langle \mathcal{C}, \gamma, m \rangle \xrightarrow{\tau, \varepsilon} \langle \mathcal{C}', \gamma', m \rangle} \qquad \frac{\theta \in \mathcal{M}.\Theta \qquad m \xrightarrow{\theta}_{\mathcal{M}} m'}{\langle \mathcal{C}, \gamma, m \rangle \xrightarrow{\theta} \langle \mathcal{C}, \gamma, m' \rangle}$$

### 4 Generic Rely-Guarantee Reasoning

In this section we present our generic RG framework. Rather than committing to a specific assertion language, our reasoning principles apply on the *semantic level*, using sets of states instead of syntactic assertions. The structure of proofs still follows program structure, thereby retaining RG's compositionality. By doing so, we decouple the semantic insights of RG reasoning from a concrete syntax. Next, we present proof rules serving as blueprints for memory-model-specific proof systems. An instantiation of this blueprint requires lifting the semantic principles to syntactic ones.


Thus, each instance of the framework (for a specific memory system) is left with the task of identifying useful abstractions on states, as well as a suitable formalism, for making the generic semantic framework into a proof system.

*RG Judgments.* We let M be an arbitrary memory system and ΣM ≜ Γ × M.Q. Properties of programs C are stated via *RG judgments*:

$$\mathcal{C} \; \mathit{sat}_{\mathcal{M}} \; (P, \mathcal{R}, \mathcal{G}, Q)$$

where P, Q ⊆ ΣM, R ⊆ P(ΣM), and G is a set of *guarded commands*, each of which takes the form {G} τ → α, where G ⊆ ΣM and α is either an (instrumented) primitive command c̃ or a fork/join label (of the form FORK(τ1, τ2) or JOIN(τ1, τ2)). The latter is needed for considering the effect of forks and joins on the memory state.

*Interpretation of RG Judgments.* RG judgments C sat_M (P, R, G, Q) state that a terminating run of C starting from a state in P, under any concurrent context whose transitions preserve each of the sets of states in R, will end in a state in Q and perform only transitions contained in G. To formally define this statement, following the standard model for RG, these judgments are interpreted on *computations* of programs. Computations arise from runs of the concurrent system (see Definition 4) by abstracting away from concrete transition labels and including arbitrary "environment transitions" representing steps of the concurrent context. Transitions in a computation are thus marked as component (cmp), environment (env), or internal memory (mem) steps.


Note that memory transitions do not occur in the classical RG presentation (since SC does not have internal memory actions).

A *computation* is a (potentially infinite) sequence

$$\xi = \langle \mathcal{C}\_0, \gamma\_0, m\_0 \rangle \xrightarrow{a\_1} \langle \mathcal{C}\_1, \gamma\_1, m\_1 \rangle \xrightarrow{a\_2} \dots$$

with ai ∈ {cmp, env, mem}. We let Clast(ξ), γlast(ξ), and mlast(ξ) denote the components of its last element, when ξ is finite. We say that ξ is a computation *of a command pool* C when C0 = C and for every i ≥ 0:


We denote by *Comp*(C) the set of all computations of a command pool C.

To define validity of RG judgments, we use the following definition.

Definition 5. Let ξ = ⟨C0, γ0, m0⟩ −a1→ ⟨C1, γ1, m1⟩ −a2→ ... be a computation, and C sat_M (P, R, G, Q) an RG judgment.

- if α = c̃ is an instrumented primitive command, then for some lε ∈ Lab ∪ {ε}, we have ⟨{τ ↦ c̃}, γi, mi⟩ −τ,lε→ ⟨{τ ↦ skip}, γi+1, mi+1⟩

(Rules com and seq, among others, for deriving judgments C sat_M (P, R, G, Q).)

Fig. 7. Generic sequential RG proof rules (letting e = {⟨γ, m⟩ | γ(e) = *true*})

• if α ∈ {FORK(τ1, τ2), JOIN(τ1, τ2)}, then mi −τ,α→M mi+1 and γi = γi+1.
– ξ admits Q if ⟨γlast(ξ), mlast(ξ)⟩ ∈ Q whenever ξ is finite and Clast(ξ)(τ) = skip for every τ ∈ dom(Clast(ξ)).

We denote by Assume(P, R) the set of all computations that admit P and R, and by Commit(G, Q) the set of all computations that admit G and Q.

Then, *validity* of a judgment is defined as

⊨ C sat_M (P, R, G, Q) ⇐⇒ *Comp*(C) ∩ Assume(P, R) ⊆ Commit(G, Q)

*Memory Triples.* Our proof rules build on *memory triples*, which specify pre- and postconditions of primitive commands for a memory system M.

Definition 6. A *memory triple for a memory system* M is a tuple of the form {P} τ → α {Q}, where P, Q ⊆ ΣM, τ ∈ Tid, and α is either an instrumented primitive command, a fork label, or a join label. A memory triple for M is *valid*, denoted by M ⊨ {P} τ → α {Q}, if the following hold for every ⟨γ, m⟩ ∈ P, γ′ ∈ Γ, and m′ ∈ M.Q:

– if α is an instrumented primitive command and ⟨{τ ↦ α}, γ, m⟩ −τ,lε→ ⟨{τ ↦ skip}, γ′, m′⟩ for some lε ∈ Lab ∪ {ε}, then ⟨γ′, m′⟩ ∈ Q.

– if α ∈ {FORK(τ1, τ2), JOIN(τ1, τ2)} and m −τ,α→M m′, then ⟨γ, m′⟩ ∈ Q.

*Example 2.* For the memory system SC introduced in Example 1, we have, e.g., memory triples of the form SC ⊨ {e(r := x)} τ → r := LOAD(x) {e} (where e(r := x) is the expression e with all occurrences of r replaced by x).
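This substitution-based triple can be sanity-checked by brute-force enumeration; the sketch below picks a hypothetical instance, taking e to be r = y (so e(r := x) becomes x = y), and verifies that whenever the substituted precondition holds, the load re-establishes e:

```python
# Illustrative brute-force check of the SC triple {e(r := x)} r := LOAD(x) {e}
# for the (hypothetical) choice e = (r = y).

def holds_post(gamma, m):          # e: "r = y"
    return gamma["r"] == m["y"]

def holds_pre(gamma, m):           # e(r := x): "x = y"
    return m["x"] == m["y"]

ok = True
for x in range(3):
    for y in range(3):
        for r in range(3):
            gamma, m = {"r": r}, {"x": x, "y": y}
            if holds_pre(gamma, m):
                gamma2 = {**gamma, "r": m["x"]}   # under SC, the load reads m(x)
                ok = ok and holds_post(gamma2, m)
assert ok
```

The check passes because the load overwrites r with m(x), which by the precondition equals m(y).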

*RG Proof Rules.* We aim at proof rules deriving valid RG judgments. Figure 7 lists (semantic) proof rules based on externally provided memory triples. These rules basically follow RG reasoning for sequential consistency. For example, rule seq states that RG judgments of commands C1 and C2 can be combined when the postcondition of C1 and the precondition of C2 agree, thereby uniting their relies and guarantees. Rule com builds on memory triples. The rule par for parallel composition combines judgments for two components when their relies and guarantees are *non-interfering*. Intuitively speaking, this means that each of the assertions that each thread relied on for establishing its proof is preserved when applying any of the assignments collected in the guarantee set of the other thread. An example of non-interfering rely-guarantee pairs is given in step 7 in Sect. 2. Formally, non-interference is defined as follows:

Definition 7. Rely-guarantee pairs ⟨R1, G1⟩ and ⟨R2, G2⟩ are *non-interfering* if M ⊨ {R ∩ P} τ → α {R} holds for every R ∈ R1 and {P} τ → α ∈ G2, and similarly for every R ∈ R2 and {P} τ → α ∈ G1.

In turn, fork-join combines the proof of a parallel composition with proofs of fork and join steps (which may also affect the memory state). Note that the guarantees also involve guarded commands with FORK and JOIN labels.

Additional rules for consequence and introduction of auxiliary variables are elided here (they are similar to their SC counterparts), and provided in the extended version of this paper [30].

*Soundness.* To establish soundness of the above system we need an additional requirement regarding internal memory transitions (for SC this requirement vacuously holds, as there are no such transitions). We require all relies in R to be *stable under internal memory transitions*, i.e., for all R ∈ R we require

$$\forall \gamma, m, m', \theta \in \mathcal{M}.\Theta.\; m \xrightarrow{\theta}_{\mathcal{M}} m' \Rightarrow (\langle \gamma, m \rangle \in R \Rightarrow \langle \gamma, m' \rangle \in R) \tag{mem}$$

This condition is needed since the memory system can non-deterministically take its internal steps, and the component's proof has to be stable under such steps.

Theorem 1 (Soundness). *⊢ C sat_M (P, R, G, Q) =⇒ ⊨ C sat_M (P, R, G, Q).*

With this requirement, we are able to establish soundness. The proof, which generally follows [48], is given in the extended version of this paper [30]. We write ⊢ C sat_M (P, R, G, Q) for provability of a judgment using the semantic rules presented above.

### 5 Potential-Based Memory System for SRA

In this section we present the potential-based semantics for Strong Release-Acquire (SRA), for which we develop a novel RG logic. Our semantics is based on the one in [27,28], with certain adaptations to make it better suited for Hoare-style reasoning (see Remark 2).

In weak memory models, threads typically have different views of the shared memory. In SRA, we refer to a memory snapshot that a thread may observe as a *potential store*:

Definition 8. A *potential store* is a function δ : Loc → Val × {R, RMW} × Tid. We write val(δ(x)), rmw(δ(x)), and tid(δ(x)) to retrieve the respective components of δ(x). We denote by Δ the set of all potential stores.

Having δ(x) = ⟨v, R, τ⟩ allows reading the value v from x (and further ascribes that such a read reads from a write performed by thread τ, which is technically needed to properly characterize the SRA model). In turn, having δ(x) = ⟨v, RMW, τ⟩ further allows performing an RMW instruction that atomically reads and modifies x.

Potential stores are collected in *potential store lists* describing the values which can (potentially) be read and in what order.

Notation 9. Lists over an alphabet A are written as L = a1 · ... · an where a1, ... , an ∈ A. We also use · to concatenate lists, and write L[i] for the i'th element of L and |L| for the length of L.

A *(potential) store list* is a finite sequence of potential stores ascribing a possible sequence of stores that a thread can observe, in the order it will observe them. The RMW-flags in these lists have to satisfy certain conditions: once the flag for a location is set, it remains set in the rest of the list; and the flag must be set at the end of the list. Formally, store lists are defined as follows.

Definition 10. A *store list* L ∈ L is a non-empty finite sequence of potential stores with *monotone RMW-flags* ending with an RMW, that is: for all x ∈ Loc,

1. if rmw(L[i](x)) = RMW, then rmw(L[j](x)) = RMW for every i<j ≤ |L|, and

2. rmw(L[|L|](x)) = RMW.
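Both conditions are easy to check mechanically; the following Python sketch uses our own encoding of potential stores as dicts mapping each location to a (value, rmw_flag, tid) triple:

```python
# Illustrative check of Definition 10: flags are "R" or "RMW".

def is_store_list(L, locs):
    if not L:                                   # store lists are non-empty
        return False
    for x in locs:
        flags = [store[x][1] for store in L]
        # (1) monotone: once "RMW", the flag stays "RMW" until the end
        if any(a == "RMW" and b == "R" for a, b in zip(flags, flags[1:])):
            return False
        # (2) the last store is RMW-marked for every location
        if flags[-1] != "RMW":
            return False
    return True
```

For instance, a list whose last store carries a plain R flag for some location is rejected.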

Now, SRA states (SRA.Q) consist of *potential mappings* that assign potentials to threads as defined next.

Definition 11. A *potential* D is a non-empty set of potential store lists. A *potential mapping* is a partial function D : Tid ⇀ P(L) \ {∅} that maps thread identifiers to potentials such that all lists agree on the very last potential store (that is, L1[|L1|] = L2[|L2|] whenever L1 ∈ D(τ1) and L2 ∈ D(τ2)).

These potential mappings are "lossy", meaning that potential stores can be arbitrarily dropped. In particular, dropping the first store in a list enables reading from the second. This is formally done by transitioning from a state D to a "smaller" state D′ as defined next.

$$\frac{\begin{array}{c} \forall L' \in \mathcal{D}'(\tau).\ \exists L \in \mathcal{D}(\tau).\ L' = L[x \mapsto \langle v_{\mathtt{W}}, \mathtt{RMW}, \tau \rangle] \\ \forall \pi \in \mathit{dom}(\mathcal{D}) \setminus \{\tau\}.\ \forall L' \in \mathcal{D}'(\pi).\ \exists L_0, L_1.\ L_0 \cdot L_1 \in \mathcal{D}(\pi) \wedge L_1 \in \mathcal{D}(\tau) \wedge L' = L_0[x \mapsto \mathtt{R}] \cdot L_1[x \mapsto \langle v_{\mathtt{W}}, \mathtt{RMW}, \tau \rangle] \end{array}}{\mathcal{D} \xrightarrow{\tau, \mathtt{W}(x, v_{\mathtt{W}})}_{\mathtt{SRA}} \mathcal{D}'} \qquad \frac{\mathcal{D}' \sqsubseteq \mathcal{D}}{\mathcal{D} \xrightarrow{\varepsilon}_{\mathtt{SRA}} \mathcal{D}'} \qquad \frac{\mathcal{D} \lessdot \mathcal{D}'}{\mathcal{D} \xrightarrow{\varepsilon}_{\mathtt{SRA}} \mathcal{D}'}$$


Fig. 8. Steps of SRA (defining δ[x ↦ ⟨v, u, τ⟩](y) to be ⟨v, u, τ⟩ if y = x and δ(y) otherwise, and δ[x ↦ R] to set the RMW-flag of x to R; both lifted pointwise to lists)

Definition 12. The (overloaded) partial order ⊑ is defined as follows:


We also define L ⋖ L′ if L′ is obtained from L by duplication of some stores (e.g., δ1 · δ2 · δ3 ⋖ δ1 · δ2 · δ2 · δ3). This is lifted to potential mappings as expected.

Figure 8 defines the transitions of SRA. The lose and dup steps account for losing and duplicating potential. Note that these are both internal memory transitions (and hence must preserve relies, cf. (mem)). The fork and join steps distribute potentials to forked threads and join them at the end. The read step obtains its value from the first store in the lists of the reader's potential, provided that all these lists agree on that value and on the writer thread identifier. The rmw step atomically performs a read and a write, where the read is restricted to an RMW-marked entry.

Most of the complexity lies in the write step. For the writer thread τ, it updates all entries of x to the newly written value. For every other thread, it updates only a *suffix* (L1) of each store list with the new value. To guarantee causal consistency, this updated suffix cannot be arbitrary: it has to be in the potential of the writer thread (L1 ∈ D(τ)). This is the key to achieving the "shared-memory causality principle" of [28], which ensures causal consistency.
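The effect of the write step on a single store list of some other thread π can be sketched as follows; the function and its names are ours, with stores encoded as dicts mapping a location to a (value, rmw_flag, tid) triple, and the disabled case returning None:

```python
# Illustrative sketch of the SRA write step for one list of another thread.

def upd(store, x, entry):
    """Overwrite the entry of location x."""
    return {**store, x: entry}

def clear_rmw(store, x):
    """Set the RMW-flag of x to plain R, keeping value and thread id."""
    v, _, t = store[x]
    return {**store, x: (v, "R", t)}

def write_other(L, k, x, vw, writer, writer_potential):
    """Split L at position k into L0 . L1 and apply W(x, vw) by `writer`:
    the (un-updated) suffix L1 must be a list in the writer's potential,
    otherwise the step is disabled (None)."""
    L0, L1 = L[:k], L[k:]
    if L1 not in writer_potential:
        return None
    return ([clear_rmw(s, x) for s in L0]
            + [upd(s, x, (vw, "RMW", writer)) for s in L1])
```

Replaying the message-passing scenario below: splitting an all-zero list after its first store and writing 1 to x by T1 updates the suffix and clears the prefix's RMW-flag for x, while a split whose suffix is not in T1's potential is rejected.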

*Example 3.* Consider again the MP program from Fig. 2. After the initial fork step, threads T1 and T2 may have the following store list in their potentials:

$$L = \begin{bmatrix} \mathtt{x} \mapsto \langle 0, \mathtt{RMW}, \mathtt{T}_0 \rangle \\ \mathtt{y} \mapsto \langle 0, \mathtt{RMW}, \mathtt{T}_0 \rangle \end{bmatrix} \cdot \begin{bmatrix} \mathtt{x} \mapsto \langle 0, \mathtt{RMW}, \mathtt{T}_0 \rangle \\ \mathtt{y} \mapsto \langle 0, \mathtt{RMW}, \mathtt{T}_0 \rangle \end{bmatrix} \cdot \begin{bmatrix} \mathtt{x} \mapsto \langle 0, \mathtt{RMW}, \mathtt{T}_0 \rangle \\ \mathtt{y} \mapsto \langle 0, \mathtt{RMW}, \mathtt{T}_0 \rangle \end{bmatrix}.$$

Then, STORE(x, 1) by T1 can generate the following store list for T2:

$$L_2 = \begin{bmatrix} \mathtt{x} \mapsto \langle 0, \mathtt{R}, \mathtt{T}_0 \rangle \\ \mathtt{y} \mapsto \langle 0, \mathtt{RMW}, \mathtt{T}_0 \rangle \end{bmatrix} \cdot \begin{bmatrix} \mathtt{x} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \\ \mathtt{y} \mapsto \langle 0, \mathtt{RMW}, \mathtt{T}_0 \rangle \end{bmatrix} \cdot \begin{bmatrix} \mathtt{x} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \\ \mathtt{y} \mapsto \langle 0, \mathtt{RMW}, \mathtt{T}_0 \rangle \end{bmatrix}.$$

Thus T2 keeps the possibility of reading the "old" value of x. For T1 this is different: the model allows the writing thread to see only its new value of x, so all entries for x in the store list are updated. Thus, for T1 we obtain the store list

$$L_1 = \begin{bmatrix} \mathtt{x} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \\ \mathtt{y} \mapsto \langle 0, \mathtt{RMW}, \mathtt{T}_0 \rangle \end{bmatrix} \cdot \begin{bmatrix} \mathtt{x} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \\ \mathtt{y} \mapsto \langle 0, \mathtt{RMW}, \mathtt{T}_0 \rangle \end{bmatrix} \cdot \begin{bmatrix} \mathtt{x} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \\ \mathtt{y} \mapsto \langle 0, \mathtt{RMW}, \mathtt{T}_0 \rangle \end{bmatrix}.$$

Next, when T1 executes STORE(y, 1), the value of y again has to be updated to 1 in all of T1's entries, yielding

$$L_1' = \begin{bmatrix} \mathtt{x} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \\ \mathtt{y} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \end{bmatrix} \cdot \begin{bmatrix} \mathtt{x} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \\ \mathtt{y} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \end{bmatrix} \cdot \begin{bmatrix} \mathtt{x} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \\ \mathtt{y} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \end{bmatrix}.$$

For T2 the write step may change L2 to

$$L_2' = \begin{bmatrix} \mathtt{x} \mapsto \langle 0, \mathtt{R}, \mathtt{T}_0 \rangle \\ \mathtt{y} \mapsto \langle 0, \mathtt{R}, \mathtt{T}_0 \rangle \end{bmatrix} \cdot \begin{bmatrix} \mathtt{x} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \\ \mathtt{y} \mapsto \langle 0, \mathtt{R}, \mathtt{T}_0 \rangle \end{bmatrix} \cdot \begin{bmatrix} \mathtt{x} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \\ \mathtt{y} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \end{bmatrix}.$$

Thus, thread T2 can still see the old values, or lose the prefix of its list and see the new values. Importantly, it cannot read 1 from y *and then* 0 from x. Note that STORE(y, 1) by T1 *cannot* modify L2 to the list

$$L_2'' = \begin{bmatrix} \mathtt{x} \mapsto \langle 0, \mathtt{R}, \mathtt{T}_0 \rangle \\ \mathtt{y} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \end{bmatrix} \cdot \begin{bmatrix} \mathtt{x} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \\ \mathtt{y} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \end{bmatrix} \cdot \begin{bmatrix} \mathtt{x} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \\ \mathtt{y} \mapsto \langle 1, \mathtt{RMW}, \mathtt{T}_1 \rangle \end{bmatrix},$$

as it requires T1 to have L2 in its *own potential*. This models the intended semantics of message passing under causal consistency.

The next theorem establishes the equivalence of SRA as defined above and opSRA from [28], which is an operational version of the standard strong release-acquire declarative semantics [26,31]. (As a corollary, we obtain the equivalence between the potential-based system from [28] and the variant we define in this paper.)

Our notion of equivalence employed in the theorem is *trace equivalence*. We let a trace of a memory system be a sequence of its transition labels, ignoring ε transitions, and consider traces of SRA starting from the initial state λτ ∈ {T1, ... , TN}. {λx. ⟨0, RMW, T0⟩}, and traces of opSRA starting from the initial execution graph that consists of a write event to every location writing 0 by a distinguished initialization thread T0.

### Theorem 2. *A trace is generated by* SRA *iff it is generated by opSRA.*

The proof of this theorem is by simulation arguments (forward simulation in one direction, backward simulation for the converse). It is mechanized in Coq [29]. The mechanized proof does not consider fork and join steps, but they can be added straightforwardly.


Fig. 9. Assertions of Piccolo

### 6 Program Logic

For the instantiation of our RG framework to SRA, we next (1) introduce the assertions of the logic Piccolo and (2) specify memory triples for Piccolo. Our logic is inspired by *interval logics* like Moszkowski's ITL [35] or duration calculus [13].

*Syntax and Semantics.* Figure 9 gives the grammar of Piccolo. We base it on *extended expressions* which, besides registers, can also involve locations as well as expressions of the form R(x) (indicating that the RMW-flag of x is R). Extended expressions E can hold on entire *intervals* of a store list (denoted [E]). Store lists can be split into intervals satisfying different interval expressions (I1 ; ... ; In) using the ";" operator (called "chop"). In turn, τ ◁ I means that all store lists in τ's potential satisfy I. For an assertion ϕ, we let fv(ϕ) ⊆ Reg ∪ Loc ∪ Tid be the set of registers, locations, and thread identifiers occurring in ϕ, and write R(x) ∈ ϕ to indicate that the term R(x) occurs in ϕ.

As an example consider again MP (Fig. 2). We would like to express that T2, upon seeing y to be 1, cannot see the old value 0 of x anymore. In Piccolo this is expressed as T2 ◁ [y ≠ 1] ; [x = 1]: the store lists of T2 can be split into two intervals (one possibly empty), the first satisfying y ≠ 1 and the second x = 1.
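The two-interval chop used here has a simple executable reading: a list satisfies a chop of two interval expressions iff it splits into a (possibly empty) prefix whose stores all satisfy the first and a suffix whose stores all satisfy the second. A Python sketch (our encoding, with stores as plain value maps):

```python
# Illustrative two-interval chop: E1, E2 are predicates on a single store.

def sat_chop(L, E1, E2):
    """L satisfies [E1] ; [E2]: some split point k (possibly 0 or len(L))
    puts all earlier stores under E1 and all later stores under E2."""
    return any(all(E1(s) for s in L[:k]) and all(E2(s) for s in L[k:])
               for k in range(len(L) + 1))

# Message-passing split: "old" stores where y is still 0, then stores with x = 1
y_old = lambda s: s["y"] == 0
x_new = lambda s: s["x"] == 1
```

A list in which a store with y = 1 precedes a store with x = 0 admits no such split, matching the intuition that after reading the flag, the old value of x is unreachable.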

Formally, an assertion ϕ describes register stores coupled with SRA states:

Definition 13. Let γ be a register store, δ a potential store, L a store list, and D a potential mapping. We let ⟦e⟧γ,δ = γ(e), ⟦x⟧γ,δ = δ(x), and ⟦R(x)⟧γ,δ = if rmw(δ(x)) = R then *true* else *false*. The extension of this notation to any extended expression E is standard. The validity of assertions in γ, D, denoted by γ, D |= ϕ, is defined as follows:


Note that with ∧ and ∨ as well as negation on expressions,<sup>1</sup> the logic provides the operators on sets of states necessary for an instantiation of our RG framework. Further, the requirements on SRA states guarantee certain properties:

<sup>1</sup> Negation occurs just on the level of simple expressions e, which is sufficient for calculating P \ e as required in rules if and while.


Fig. 10. Memory triples for Piccolo using WRITE ∈ {SWAP, STORE} and assuming τ ≠ π


All assertions are preserved by steps lose and dup. This stability is required by our RG framework (Condition (mem))<sup>2</sup>. Stability is achieved here because negations occur on the level of (simple) expressions only; e.g., we cannot express ¬(τ ⋉ [x = v]), meaning that τ must have a store in its potential whose value for x is not v, which would not be stable under lose.

Proposition 1. *If* γ, D |= ϕ *and* D −ε→SRA D′*, then* γ, D′ |= ϕ*.*
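Proposition 1 rests on the observation that interval assertions only constrain the stores that are *present* in a list, so losing stores cannot invalidate them. This can be checked by brute force in a small model (our own encoding, as elsewhere: stores are dicts, and lose is modeled as deleting an arbitrary subset of stores while preserving their order):

```python
from itertools import combinations

def satisfies_chop(store_list, intervals):
    """Check the chop assertion I1 ; ... ; In on a store list."""
    def go(pos, i):
        if i == len(intervals):
            return pos == len(store_list)
        return any(
            all(intervals[i](s) for s in store_list[pos:end]) and go(end, i + 1)
            for end in range(pos, len(store_list) + 1)
        )
    return go(0, 0)

def sublists(lst):
    """All lists obtainable by losing stores (order preserved)."""
    for n in range(len(lst) + 1):
        for idx in combinations(range(len(lst)), n):
            yield [lst[i] for i in idx]

intervals = [lambda s: s["x"] == 0, lambda s: s["x"] == 1]
full = [{"x": 0}, {"x": 0}, {"x": 1}]
assert satisfies_chop(full, intervals)
# every list obtained by losing stores still satisfies the assertion
print(all(satisfies_chop(sub, intervals) for sub in sublists(full)))  # True
```

Dropping elements from a list that splits into satisfying segments just shrinks each segment, so the same split structure survives; this is exactly why the absence of negation on ⋉ assertions matters.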

*Memory Triples.* Assertions in Piccolo describe sets of states, thus can be used to formulate memory triples. Figure 10 gives the base triples for the different primitive instructions.

We see the standard SC rule of assignment (Subst-asgn) for registers followed by a number of stability rules detailing when assertions are not affected by instructions. Axioms Fork and Join describe the transfer of properties from forking thread to forked threads and back.

The next four axioms in the table concern write instructions (either SWAP or STORE). They reflect the semantics of writing in SRA: (1) in the writer thread τ, all stores in all lists get updated (axiom Wr-own). Other threads π will have (2) their lists split into "old" values for x with R flag and the new value for x (Wr-other-1), (3) properties (expressed as Iτ) of suffixes of lists preserved when the writing thread satisfies the same properties (Wr-other-2), and (4) their lists consisting of R-accesses to x followed by properties of the

<sup>2</sup> Such stability requirements are also common to other reasoning techniques for weak memory models, e.g., [19].

writer (Wr-other-3). The last axiom concerns SWAP only: since a SWAP cannot read from store entries marked with the RMW flag R, it discards intervals satisfying [R(x)].

*Example 4.* We employ the axioms to show one proof step for MP, namely one pair in the non-interference check of the rely R2 of T2 with respect to the guarantees G1 of T1:

{T2 ⋉ [y ≠ 1] ; [x = 1] ∧ T1 ⋉ [x = 1]} T1 ↦ STORE(x, 1) {T2 ⋉ [y ≠ 1] ; [x = 1]}

By taking Iτ to be [x = 1], this is an instance of Wr-other-2.

In addition to the axioms above, we use a *shift* rule for load instructions:

$$\text{Ld-shift}\ \ \frac{\{\tau \ltimes I\}\ \tau \mapsto r := \mathtt{LOAD}(x)\ \{\psi\}}{\{\tau \ltimes [(e \wedge E)(r := x)]\ ;\ I\}\ \tau \mapsto r := \mathtt{LOAD}(x)\ \{(e \wedge \tau \ltimes [E]\ ;\ I) \vee \psi\}}$$

A load instruction reads from the first store in the lists; however, if the interval satisfying [(e ∧ E)(r := x)] in [(e ∧ E)(r := x)] ; I is empty, it reads from a list satisfying I. Rule Ld-shift turns this shifting to next stores into a proof rule. Like the standard Hoare rule Subst-asgn, Ld-shift employs backward substitution.

*Example 5.* We exemplify rule Ld-shift on another proof step of example MP, one for local correctness of T2:

$$\{\mathtt{T}_2 \ltimes [\mathtt{y} \neq 1]\ ;\ [\mathtt{x} = 1]\}\ \mathtt{T}_2 \mapsto \mathtt{a} := \mathtt{LOAD}(\mathtt{y})\ \{\mathtt{a} = 1 \Rightarrow \mathtt{T}_2 \ltimes [\mathtt{x} = 1]\}$$

From axiom Stable-ld we get {T2 ⋉ [x = 1]} T2 ↦ a := LOAD(y) {T2 ⋉ [x = 1]}. Using the former as premise for Ld-shift, we obtain {T2 ⋉ [y ≠ 1] ; [x = 1]} T2 ↦ a := LOAD(y) {a ≠ 1 ∨ T2 ⋉ [x = 1]}.

In addition, we include the standard conjunction, disjunction and consequence rules of Hoare logic. For instrumented primitive commands we employ the following rule:

$$\text{Instr}\ \ \frac{\{\psi_0\}\ \tau \mapsto c\ \{\psi_1\}\quad \{\psi_1\}\ \tau \mapsto r_1 := e_1\ \{\psi_2\}\ \ \dots\ \ \{\psi_{n-1}\}\ \tau \mapsto r_n := e_n\ \{\psi_n\}}{\{\psi_0\}\ \tau \mapsto \langle c, \langle r_1, \dots, r_n \rangle := \langle e_1, \dots, e_n \rangle\rangle\ \{\psi_n\}}$$

Finally, it can be shown that all triples derivable from axioms and rules are valid memory triples.

Lemma 1. *If a* Piccolo *memory triple is derivable,* ⊢Piccolo {ϕ} τ ↦ α {ψ}*, then* ⊨SRA {{γ, D | γ, D |= ϕ}} τ ↦ α {{γ, D | γ, D |= ψ}}*.*

```
{T0 ⋉ [x ≠ 2]}
Thread T1              Thread T2
{T1 ⋉ I^x_0}           {T2 ⋉ I^x_012}
1 : STORE(x, 1);       3 : a := LOAD(x);
{T1 ⋉ I^x_1}           {a = 2 ⇒ T2 ⋉ I^x_2}
2 : STORE(x, 2)        4 : b := LOAD(x)
{True}                 {a = 2 ⇒ b = 2}
{a = 2 ⇒ b = 2}
```

Fig. 11. RRC for two threads


Fig. 12. RRC for four threads (a.k.a. CoRR2)

### 7 Examples

We discuss examples verified in Piccolo. Additional examples can be found in the extended version of this paper [30].

*Coherence.* We provide two coherence examples in Figs. 11 and 12, using the notation I^x_{v1v2...vn} = [x = v1] ; [x = v2] ; ... ; [x = vn]. Figure 11 enforces an ordering on the writes to the shared location x on thread T1. The postcondition guarantees that after reading the second write, thread T2 cannot read from the first. Figure 12 is similar, but the writes to x occur on two different threads. The postcondition of the program guarantees that the two reading threads agree on the order of the writes: in particular, if one reading thread (here T3) sees the value 2 then 1, it is impossible for the other reading thread (here T4) to see 1 then 2.

Potential assertions provide a compact and intuitive mechanism for reasoning, e.g., in Fig. 11, the precondition of line 3 precisely expresses the order of values available to thread T2. This presents an improvement over view-based assertions [16], which required a separate set of assertions to encode write order.

*Peterson's Algorithm.* Figure 13 shows Peterson's algorithm for implementing mutual exclusion for two threads [38], together with Piccolo assertions. We depict only the code of thread T1; thread T2 is symmetric. A third thread T3 is assumed, which stops the other two threads at an arbitrary point in time. We

```
Thread T1
{¬a1 ∧ ¬a2 ∧ mx1 = 0}
while ¬stop do
  {¬a1 ∧ (¬a2 ∨ T1 ⋉ [R(turn)] ; [flag2])}
  1 : STORE(flag1, true);
  {¬a1 ∧ T1 ⋉ [flag1] ∧ (¬a2 ∨ T1 ⋉ [R(turn)] ; [flag2])}
  2 : SWAP(turn, 2); a1 := true;
  3 : do
    {a1 ∧ (¬a2 ∨ T1 ⋉ [flag2 ∧ turn ≠ 1] ∨ P)}
    4 : fl1 := LOAD(flag2);
    {a1 ∧ (¬a2 ∨ (fl1 ∧ T1 ⋉ [flag2 ∧ turn ≠ 1]) ∨ P)}
    5 : tu1 := LOAD(turn);
    {a1 ∧ (¬a2 ∨ (fl1 ∧ tu1 ≠ 1 ∧ T1 ⋉ [flag2 ∧ turn ≠ 1]) ∨ P)}
  6 : until ¬fl1 ∨ (tu1 = 1);
  {a1 ∧ (¬a2 ∨ P)}
  7 : STORE(cs, ⊥);
  {a1 ∧ (¬a2 ∨ P)}
  8 : STORE(cs, 0);
  {T1 ⋉ [cs = 0] ∧ a1 ∧ (¬a2 ∨ P)}
  9 : mx1 := LOAD(cs);
  {mx1 = 0 ∧ a1 ∧ (¬a2 ∨ P)}
  10 : STORE(flag1, 0); a1 := false
{mx1 = 0}
```
Fig. 13. Peterson's algorithm, where P = T1 ⋉ [R(turn)] ; [flag2 ∧ turn = 1]. Thread T2 is symmetric and we assume a stopper thread T3 that sets stop to *true*.

use do C until e as a shorthand for C ; while ¬e do C. For correctness under SRA, all accesses to the shared variable turn are via a SWAP, which ensures that turn behaves like an SC variable.

Correctness is encoded via registers mx1 and mx2, into which the contents of the shared variable cs is loaded. Mutual exclusion should guarantee that both registers are 0; thus, neither thread should ever be able to read cs to be ⊥ (as stored in line 7). The proof (like the associated SC proof in [9]) introduces auxiliary variables a1 and a2. Variable ai is initially *false*, set to *true* when thread Ti has performed its swap, and back to *false* when Ti completes.

Once again, potentials provide convenient mechanisms for reasoning about the interactions between the two threads. For example, the assertion T1 ⋉ [R(turn)] ; [flag2] in the precondition of line 2 encapsulates the idea that an RMW on turn (via SWAP(turn, 2)) must read from a state in which flag2 holds, allowing us to establish T1 ⋉ [flag2] as a postcondition (using the axiom Swap-skip). We obtain the disjunct T1 ⋉ [flag2 ∧ turn ≠ 1] after additionally applying Wr-own.
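As a sanity check, the mutual-exclusion property can be verified exhaustively under plain SC by state-space enumeration. The sketch below is our own encoding of the Fig. 13 skeleton (0/1 thread ids, an await loop in place of the do-until, flag and turn read in one atomic step, no stopper thread), so it checks far less than the Piccolo proof, which works under the weaker SRA model:

```python
from collections import deque

def step(tid, pc, sh):
    """Successor (pc', shared') pairs for one SC step of thread tid.
    sh = (flag, turn, in_cs), all components hashable."""
    other = 1 - tid
    flag, turn, in_cs = sh
    if pc == 0:                               # STORE(flag_tid, true)
        f = list(flag); f[tid] = True
        return [(1, (tuple(f), turn, in_cs))]
    if pc == 1:                               # SWAP(turn, other): give way
        return [(2, (flag, other, in_cs))]
    if pc == 2:                               # await !flag_other or turn == tid
        if not flag[other] or turn == tid:
            return [(3, sh)]
        return [(2, sh)]                      # keep spinning
    if pc == 3:                               # enter critical section
        c = list(in_cs); c[tid] = True
        return [(4, (flag, turn, tuple(c)))]
    # pc == 4: leave CS, reset flag, loop around
    f = list(flag); f[tid] = False
    c = list(in_cs); c[tid] = False
    return [(0, (tuple(f), turn, tuple(c)))]

def check_mutex():
    init = ((0, 0), ((False, False), 0, (False, False)))
    seen, work = {init}, deque([init])
    while work:                               # BFS over the finite state space
        pcs, sh = work.popleft()
        if sh[2][0] and sh[2][1]:
            return False                      # both threads in the CS
        for tid in (0, 1):
            for npc, nsh in step(tid, pcs[tid], sh):
                npcs = (npc, pcs[1]) if tid == 0 else (pcs[0], npc)
                if (npcs, nsh) not in seen:
                    seen.add((npcs, nsh))
                    work.append((npcs, nsh))
    return True

print(check_mutex())  # True: no reachable state has both threads in the CS
```

The exploration terminates because the state space (program counters plus the three shared components) is finite; the spinloop merely revisits an already-seen state.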

### 8 Discussion, Related and Future Work

Previous RG-like logics provided ad hoc solutions for other concrete memory models, such as x86-TSO and C/C++11 [11,16,17,32,39,40,47]. These approaches established soundness of the proposed logic with an ad hoc proof that couples together memory and thread transitions. We believe that these logics can be formulated in our proposed general RG framework (which will require extensions for other memory operations, such as fences).

Moreover, Owicki-Gries logics for different fragments of the C11 memory model [16,17,47] used specialized assertions over the underlying view-based semantics. These include *conditional-view assertions* (enabling reasoning about MP) and *value-order assertions* (enabling reasoning about coherence). Both types of assertions are special cases of the potential-based assertions of Piccolo.

Ridge [40] presents an RG reasoning technique tailored to x86-TSO, treating the write buffers of TSO architectures as threads whose steps have to preserve relies. This is similar to our notion of stability of relies under internal memory transitions. Ridge moreover allows memory-model-specific assertions (e.g., on the contents of write buffers).

The OGRA logic [32] for Release-Acquire (a slightly weaker form of causal consistency than the SRA model studied in this paper) takes a different approach, which cannot be directly handled in our framework. It employs simple SC-like assertions at the price of a non-standard non-interference condition which requires a stronger form of stability.

Coughlin et al. [14,15] provide an RG reasoning technique for weak memory models with a semantics defined in terms of *reordering relations* (on instructions). They study both multicopy and non-multicopy atomic architectures, but in all models, the rely-guarantee assertions are interpreted over SC.

Schellhorn et al. [41] develop a framework that extends ITL with a compositional interleaving operator, enabling proof decomposition using RG rules. Each interval represents a sequence of states, strictly alternating between program and environment actions (which may be skip actions). This work is radically different from ours since (1) their states are interpreted using a standard SC semantics, and (2) their intervals represent an *entire execution* of a command as well as the interference from the environment while executing that command.

Under SC, rely-guarantee has been combined with separation logic [44,46], which allows the powerful synergy of reasoning with stable invariants (as in rely-guarantee) and ownership transfer (as in concurrent separation logic). It would be interesting to study a combination of our RG framework with concurrent separation logics for weak memory models, such as [43,45].

Other works have studied the decidability of verification for causal consistency models. In work preceding the potential-based SRA model [28], Abdulla et al. [1] show that verification under RA is undecidable. In other work, Abdulla et al. [3] show that the reachability problem under TSO remains decidable for systems with dynamic thread creation. Investigating this question under SRA is an interesting topic for future work.

Finally, the spirit of our generic approach is similar to Iris [22], Views [18], Ogre and Pythia [7], the work of Ponce de León et al. [34], and recent axiomatic characterizations of weak memory reasoning [19], which all aim to provide a *generic* framework that can be instantiated to underlying semantics.

In the future, we are interested in automating reasoning in Piccolo, starting with automatically checking the validity of program derivations (using, e.g., SMT solvers for specialized theories of sequences or strings [24,42]) and, more ambitiously, synthesizing appropriate Piccolo invariants.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Unblocking Dynamic Partial Order Reduction**

Michalis Kokologiannakis, Iason Marmanis(B), and Viktor Vafeiadis

MPI-SWS, Kaiserslautern, Saarbrücken, Germany {michalis,imarmanis,viktor}@mpi-sws.org

**Abstract.** Existing dynamic partial order reduction (DPOR) algorithms scale poorly on concurrent data structure benchmarks because they visit a huge number of blocked executions due to spinloops.

In response, we develop Awamoche, a sound, complete, and strongly optimal DPOR algorithm that avoids exploring any useless blocked executions in programs with await and confirmation-CAS loops. Consequently, it outperforms the state-of-the-art, often by an exponential factor.

### **1 Introduction**

*Dynamic partial order reduction* (DPOR) [13] has been promoted as an effective verification technique for concurrent programs: starting from a single execution of the program under test, DPOR repeatedly reverses the order of conflicting accesses in order to generate all (meaningfully) different program executions.

Applying DPOR in practice, however, reveals a major performance and scalability bottleneck: it explores a huge number of *blocked executions*, often outnumbering the complete program executions by an exponential factor. Blocked executions most commonly occur in programs with *spinloops*, i.e., loops that do not make progress unless some condition holds. Such loops are usually transformed into *assume statements* [14,18], effectively requiring that the loop exits at its first iteration (and blocking otherwise).

We distinguish three classes of such blocked executions.

The first class occurs in programs with non-terminating spinloops, such as a program awaiting *x >* 42 in a context where *x* = 0. For this program, modeled as the statement assume(*x >* 42), DPOR obviously explores a blocked execution, as the only available value for *x* violates the assume condition. Such blocked executions *should* be explored because they indicate program errors.

The second class occurs in programs with await loops. To see how such loops lead to blocked executions, consider the following program under *sequential consistency* (SC) [23] (initially *x* = *y* = 0),

$$\begin{array}{l} x := 2\\ \mathtt{assume}(y \le 1) \end{array} \left| \begin{array}{l} y := 2\\ \mathtt{assume}(x \le 1) \\ y := 1 \end{array} \right| $$

where each assume models an await loop, e.g., do *a* := *y* while (*a >* 1) for the assume of the first thread. Suppose that DPOR executes this program in a left-toright manner, thereby generating the interleaving *x* := 2*,* assume(*y* ≤ 1)*, y* := 2. At this point, assume(*x* ≤ 1) cannot be executed, since *x* would read 2. Yet, DPOR cannot simply abort the exploration. To generate the interleaving where the first thread reads *y* = 1, DPOR must consider the case where the read of *x* is executed before the *x* := 2 assignment. In other words, DPOR has to explore blocked executions in order to generate non-blocked ones.
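This necessity can be checked by brute force. The following Python sketch (our own encoding: events are tuples, and a failed assume marks the run blocked) enumerates all SC interleavings of the program above:

```python
def interleavings(a, b):
    """All order-preserving interleavings of two event sequences."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

# x := 2 ; assume(y <= 1)   ||   y := 2 ; assume(x <= 1) ; y := 1
T1 = [("write", "x", 2), ("assume", "y", lambda v: v <= 1)]
T2 = [("write", "y", 2), ("assume", "x", lambda v: v <= 1), ("write", "y", 1)]

def run(schedule):
    mem = {"x": 0, "y": 0}
    for ev in schedule:
        if ev[0] == "write":
            mem[ev[1]] = ev[2]
        elif not ev[2](mem[ev[1]]):
            return "blocked"          # a failed assume blocks the run
    return "complete"

results = [run(s) for s in interleavings(T1, T2)]
print(results.count("complete"), results.count("blocked"))  # 2 8
```

Of the ten interleavings, only two complete, and both schedule T2's assume before *x* := 2; a DPOR exploration that starts from a blocked interleaving can only find them by reversing the race on *x*, which is exactly why it cannot abort on blockage.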

The third class occurs in programs with confirmation-CAS loops such as:

$$\begin{array}{l} \mathsf{do} \\ \quad a := x \\ \quad b := f(a) \\ \mathtt{while}\ (\neg\mathsf{CAS}(x, a, b)) \end{array} \quad \text{which is modeled as:} \quad \begin{array}{l} a := x \\ b := f(a) \\ \mathtt{assume}(\mathsf{CAS}(x, a, b)) \end{array}$$

Consider a program comprising two threads running the code above, with *a* and *b* being local variables. Suppose that DPOR first obtains the (blocked) trace where both threads concurrently try to perform their CAS: *a*<sup>1</sup> := *x*, *a*<sup>2</sup> := *x*, CAS(*x, a*1*, b*1), CAS(*x, a*2*, b*2). Trying to satisfy the blocked assume of thread 2 by reversing the CAS instructions is fruitless because then thread 1 will be blocked.
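The same brute-force enumeration applied to two such threads (each modeled, as above, as a read followed by an assumed-successful CAS; we pick f(a) = a + 1 as an arbitrary concrete choice) confirms that the blocked traces are exactly those where the read/CAS pairs overlap:

```python
def interleavings(a, b):
    """All order-preserving interleavings of two event sequences."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

# Each thread: a_i := x ; assume(CAS(x, a_i, a_i + 1))
T1 = [("read", 1), ("cas", 1)]
T2 = [("read", 2), ("cas", 2)]

def run(schedule):
    x, local = 0, {}
    for op, tid in schedule:
        if op == "read":
            local[tid] = x
        elif x == local[tid]:      # CAS succeeds: install f(a) = a + 1
            x = local[tid] + 1
        else:                      # CAS fails: the assume blocks
            return "blocked"
    return "complete"

results = [run(s) for s in interleavings(T1, T2)]
print(results.count("complete"), results.count("blocked"))  # 2 4
```

Only the two non-overlapping schedules complete; in the four overlapping ones, whichever CAS runs second fails, and reversing the two CASes merely swaps which thread blocks.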

In this paper, we show that exploring blocked executions of the second and third classes is unnecessary.

We develop Awamoche, a sound, complete, and optimal DPOR algorithm that avoids generating any blocked executions for programs with await and confirmation-CAS loops. Our algorithm is *strongly optimal* in that no exploration is wasted: it either yields a complete execution or a termination violation. Awamoche extends TruSt [15], an optimal DPOR algorithm that supports weak memory models and has polynomial space requirements, with three new ideas:

- *Stale-read annotations*, which discard revisits that would leave a read blocked (Sect. 3.1).
- *In-place revisits*, which unblock await loops without reversing unrelated races (Sect. 3.2).
- *Speculative revisits*, which handle confirmation-CAS loops (Sect. 3.3).
As we shall see in Sect. 5, supporting these DPOR modifications is by no means trivial when it comes to proving correctness and (strong) optimality. Indeed, TruSt's correctness proof proceeds in a backward manner, assuming a way to determine the last event that was added to a given trace. The presence of in-place and speculative revisits, however, makes this impossible.

We therefore develop a completely different proof that works in a forward manner: from each configuration that is a prefix of a complete trace, we construct a sequence of steps that leads to a larger configuration that is also a prefix of the trace. Our proof assumes that same-location writes are causally ordered, which invariably holds in correct data structure benchmarks, but is otherwise more general than TruSt's, assuming less about the underlying memory model.

Our contributions can be summarized as follows:

Section 2 We describe how and why DPOR encounters blocked executions.

Section 3 We intuitively present Awamoche's three novel key ideas: stale reads, in-place revisits, and speculative revisits.


### **2 DPOR and Blocked Executions**

Before presenting Awamoche, we recall the fundamentals of DPOR (Sect. 2.1), and explain why spinloops lead to blocked explorations (Sect. 2.2).

#### **2.1 Dynamic Partial Order Reduction**

DPOR algorithms verify a concurrent program by enumerating a representative subset of its interleavings. Specifically, they partition the interleavings into equivalence classes (two interleavings are equivalent if one can be obtained from the other by reordering independent instructions), and strive to explore one interleaving per equivalence class. Optimal algorithms [2,15] achieve this goal.

DPOR algorithms explore interleavings dynamically. After running the program and obtaining an initial interleaving, they detect *racy* instructions (i.e., instructions accessing the same variable with at least one of them being a write), and proceed to explore an interleaving where the race is reversed.

Let us clarify the exploration procedure with the following example, where both variables *x* and *y* are initialized to zero.

$$\begin{array}{l|l}\textbf{if} \ (x=0) & x:=1\\\ y:=1 & x:=2 \end{array} \tag{\text{RW}+\text{WW}}$$

The RW+WW program has 5 interleavings that can be partitioned into 3 equivalence classes. Intuitively, the *y* := 1 is irrelevant because the program contains no other access to *y*; all that matters is the ordering among the *x* accesses.
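These counts are easy to reproduce. Since the left thread's events depend on the value it reads, the sketch below (our own encoding) schedules the program step by step and collects every complete trace:

```python
def explore(pc1, pc2, x, trace, traces):
    """DFS over all SC schedules of RW+WW."""
    moved = False
    if pc1 == 0:                        # left: read x (the if condition)
        moved = True
        nxt = 1 if x == 0 else 2        # branch not taken when x != 0
        explore(nxt, pc2, x, trace + [("r_x", x)], traces)
    elif pc1 == 1:                      # left: y := 1 (branch taken)
        moved = True
        explore(2, pc2, x, trace + [("w_y", 1)], traces)
    if pc2 < 2:                         # right: x := 1 then x := 2
        moved = True
        explore(pc1, pc2 + 1, pc2 + 1, trace + [("w_x", pc2 + 1)], traces)
    if not moved:                       # both threads done: record the trace
        traces.append(trace)

traces = []
explore(0, 0, 0, [], traces)
reads = {ev[1] for t in traces for ev in t if ev[0] == "r_x"}
print(len(traces), sorted(reads))  # 5 [0, 1, 2]
```

The five traces fall into three classes keyed by the value the read returns (0, 1, or 2), matching the intuition that only the position of the read among the *x* writes matters.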

The exploration steps for RW+WW can be seen in Fig. 1<sup>1</sup>. DPOR obtains a full trace of the program, while also recording the transitions that it took at each

<sup>1</sup> The exploration procedure has been simplified for presentational purposes. For a full treatment, please refer to [2,15].

**Fig. 1.** A DPOR exploration of RW+WW

step at the respective transition's backtrack set (traces 0 to 2 ). After obtaining a full trace, it initiates a race-detection phase. During this phase, DPOR detects the races between *r<sup>x</sup>* and the two writes *w*<sup>1</sup> and *w*2. (While *w*<sup>1</sup> and *w*<sup>2</sup> also write the same variable, they do not constitute a race, as they are causally related.) For the first race, DPOR adds *w*<sup>1</sup> in the backtrack set of the first transition, so that it can subsequently execute *w*<sup>1</sup> instead of *rx*. For the second one, while *w*<sup>2</sup> is not in the backtrack set of the first transition, *w*<sup>2</sup> cannot be directly executed as the first transition without its causal predecessors (i.e., *w*1) having already executed. Since *w*<sup>1</sup> is already in the backtrack set of the first transition, DPOR cannot do anything else, and the race-detection phase is over.

After the race-detection phase is complete, the exploration proceeds in an analogous manner: DPOR backtracks to the first transition, fires *w*<sup>1</sup> instead of *r<sup>x</sup>* (trace 3 ), re-runs the program to obtain a full trace (trace 4 ), and initiates another race-detection phase. During the latter, a race between *r<sup>x</sup>* and *w*<sup>2</sup> is detected, and *w*<sup>2</sup> is inserted in the backtrack set of the second transition.

Finally, DPOR backtracks to the second transition, executes *w*<sup>2</sup> instead of *r<sup>x</sup>* (trace 5 ), and eventually obtains the full trace 6 . During the last racedetection phase of the exploration, DPOR detects the races between *r<sup>x</sup>* and the two writes *w*<sup>1</sup> and *w*2. As *r<sup>x</sup>* is already in the backtrack set of the first two transitions, DPOR has nothing else to do, and thus concludes the exploration.

Observe that DPOR explored one representative trace from each equivalence class (traces 2 , 4 , and 6 ). To avoid generating multiple equivalent interleavings, optimal DPOR algorithms extend the description above by restricting when a race reversal is considered. In particular, the TruSt algorithm [15] imposes a maximality condition on the part of the trace that is affected by the reversal.

#### **2.2 Assume Statements and DPOR**

$$\mathtt{while}\ (x = 0)\ \{\,\} \left\| \begin{array}{l} x := 1 \\ x := 2 \end{array} \right.\ \ (\text{RW+WW-L}) \qquad \mathtt{assume}\,(x \neq 0) \left\| \begin{array}{l} x := 1 \\ x := 2 \end{array} \right.\ \ (\text{RW+WW-A})$$

**Fig. 2.** A variation of RW+WW with an await loop (left) and an assume (right)

To see how assume statements arise in concurrent programs, suppose that we replace the if-statement of RW+WW with an await loop (Fig. 2). Although the change does not really affect the possible outcomes for *x*, it makes DPOR diverge: DPOR examines executions where the loop terminates in 1, 2, 3, . . . steps. Since, however, the loop has no side-effects, we can actually transform it into an assume(*x*) statement, effectively modeling a loop bound of one.

Doing so guarantees DPOR's termination but not its good performance, and the reason lies in the very nature of DPOR. Indeed, suppose that DPOR executes the first instruction of the left thread and blocks due to the assume statement. At this point, DPOR cannot simply stop the exploration because the assume statement is not satisfied; it has to explore the rest of the program, so that race reversals may make the assume succeed. All in all, DPOR explores two complete and one blocked trace for RW+WW-A.

In general, DPOR cannot know whether some future reversal will ever make an assume succeed. Worse yet, it might be the case that there is an exponential number of traces to be explored (due to the other program threads), until DPOR is certain that the assume statement cannot be unblocked.

To see this, consider the following program where RW+WW-A runs in parallel with some threads accessing *z*:

$$\text{RW+WW-A} \parallel z := 1 \parallel a_1 := z \parallel \dots \parallel a_N := z \qquad \text{(RW+WW-A-PAR)}$$

For the trace of RW+WW-A where the assume fails, DPOR fruitlessly explores 2*<sup>N</sup>* traces in the hope that an access to *x* is found that will unblock the assume statement.

Given that executing an assume statement that fails leads to blocked executions, one might be tempted to consider a solution where assume statements are only scheduled if they succeed. Even though such a solution would eliminate blocking for RW+WW-A, it is not a panacea. To see why, consider a variation of RW+WW-A where the first thread executes assume(*x* = 0) instead of assume(*x* ≠ 0). In such a case, the assume can be scheduled first (as it succeeds), but reversing the races among the *x* accesses will lead to blocked executions. It becomes evident that a more sophisticated solution is required.

#### **3 Key Ideas**

Awamoche, our optimal DPOR algorithm, extends TruSt [15] with three novel key ideas: stale-read annotations (Sect. 3.1), in-place revisits (Sect. 3.2) and speculative revisits (Sect. 3.3). As we will shortly see, these ideas guarantee that Awamoche is *strongly optimal*: it never initiates fruitless explorations, and all explorations lead to executions that are either complete or denote termination violations. In the rest of the paper, we call such executions *useful*.

#### **3.1 Avoiding Blocking Due to Stale Reads**

Race reversals are at the heart of any DPOR algorithm. TruSt distinguishes two categories of race reversals: (1) write-read and write-write reversals, and (2) read-write reversals. While the former category can be performed by modifying the trace directly in place (a "forward revisit"), the latter may require removing events from the trace (a "backward revisit"). To ensure optimality for backward revisits, TruSt checks a certain maximality condition on the events affected by them, namely the read, which will be reading from a different write, and all events to be deleted.

An immediate consequence is that any read events not satisfying TruSt's maximality condition, which we call *stale reads*, will never be affected by a subsequent revisit. As an example, consider the following program with a read that blocks if it reads 0:

$$x := 1 \parallel \mathtt{assume}(x = 1) \tag{W+R}$$

After obtaining the trace *x* := 1; assume(*x* = 1), TruSt forward-revisits the read in place, making it read 0. At this point, we know (1) that the assume will fail, and (2) that neither the read nor the events added before it can be backward-revisited, as the read is reading non-maximally (which violates TruSt's maximality condition). As such, no useful execution is ever going to be reached, and there is no point in continuing the exploration.

Leveraging the above insight, we make Awamoche immediately drop traces where some assume is not satisfied due to a stale read. To do this, Awamoche automatically annotates reads followed by assume statements with the condition required to satisfy the assume, and discards all forward revisits that do not satisfy the annotation.

Even though stale-read annotations are greatly beneficial in reducing blocking, they are merely a remedy, not a cure. As already mentioned, they are only leveraged in write-read reversals, and are thus sensitive to DPOR's exploration order. To completely eliminate blocking, Awamoche performs in-place and speculative revisits, described in the next sections.

#### **3.2 Handling Await Loops with In-Place Revisits**

Awamoche's solution to eliminate blocking is to not blindly reverse all races whenever a trace is blocked, but rather to only try and reverse those that might unblock the exploration.

As an example, consider Rw+ww-A-PAR (Fig. 3). After Awamoche obtains the first full trace, it detects the races among the *z* accesses, as well as the *rx, w*1

**Fig. 3.** Key steps in Awamoche's exploration of rw+ww-a-par

**Fig. 4.** An Awamoche exploration of RW+WW

race. (Recall that Awamoche is based on TruSt and therefore does not consider the ⟨*r*<sub>*x*</sub>, *w*<sub>2</sub>⟩ race in this trace.) At this point, a standard DPOR would start reversing the races among the *z* accesses. Doing so, however, is wasteful, since reversing races after the blockage will only lead to the exploration of more blocked executions.

Instead, Awamoche chooses to reverse the ⟨*r*<sub>*x*</sub>, *w*<sub>1</sub>⟩ race (as this might make the assume succeed), and completely drops the races among the *z* accesses. We call this procedure *in-place revisiting* (denoted by *ir* in Fig. 3). Intuitively, ignoring the *z* races is safe, as they will have the chance to manifest in the trace where the ⟨*r*<sub>*x*</sub>, *w*<sub>1</sub>⟩ race has been reversed.

Indeed, reversing the ⟨*r*<sub>*x*</sub>, *w*<sub>1</sub>⟩ race does make the assume succeed, at which point the exploration proceeds in the standard DPOR way. Awamoche explores 2<sup>*N*</sup> traces where the read of *x* reads 1, and another 2<sup>*N*</sup> where it reads 2. Note that, even though in this example Awamoche explores 2/3 of the traces that standard DPOR explores, as we show in Sect. 6 the difference can be exponential.

**Fig. 5.** An Awamoche exploration of the confirmation-CAS example.

Suppose now that we change the assume(*x*) in Rw+ww-A-PAR to assume(*x* = 42), so that there is no trace where the assume is satisfied. The key steps of Awamoche's exploration can be seen in Fig. 4. Upon obtaining a full trace, all races on *z* are ignored and Awamoche revisits *r*<sub>*x*</sub> in place. Subsequently, as the assume is still not satisfied, Awamoche again revisits *r*<sub>*x*</sub> in place (trace 2). At this point, since there are no other races on *x* to reverse, Awamoche reverses all the races on *z* and finishes the exploration.

In total, Awamoche explores 2<sup>*N*</sup> blocked executions for the updated example, all of which are useful. As *r*<sub>*x*</sub> reads from the latest write to *x* in all these executions and the assume statement (corresponding to an await loop) still blocks, each of these executions constitutes a distinct liveness violation.

#### **3.3 Handling Confirmation CASes with Speculative Revisits**

In-place revisiting alone suffices to eliminate useless blocking in programs whose assume statements arise only due to await loops. It does not, however, eliminate blocking in *confirmation-CAS loops*. Confirmation-CAS loops consist of a speculative read of some shared variable, followed by a (possibly empty) sequence of local accesses and other reads, and a confirmation CAS that only succeeds if it reads from the same write as the speculative read.

As an example, consider the confirmation-CAS example from Sect. 1 and a trace where both reads read the initial value, the CAS of the first thread succeeds, and the CAS of the second thread reads the result of the CAS of the first. Although this trace is blocked and explored by DPOR (since the CAS read of the second thread is reading from the latest, same-location write), it does not constitute an actual liveness violation. In fact, even though the CAS read that blocks does read from the latest, same-location write, the *r* := *x* read in the same loop iteration does not. In order for a blocked trace (involving a loop) to be an actual liveness violation, all reads corresponding to a given iteration need to be reading the latest value, and not just one.

To avoid exploring blocked traces altogether in cases like this, we equip Awamoche with some built-in knowledge about confirmation-CAS loops and treat them specially when reversing races. To see how this is done, we present a run of Awamoche on the confirmation-CAS example of Sect. 1 (see Fig. 5).

While building the first full trace (trace 1), another big difference between Awamoche and standard DPOR algorithms becomes visible: Awamoche does not maintain backtrack sets for confirmation CASes. Indeed, there is no point in reversing a race involving a confirmation CAS, as such a reversal would make the CAS read from a different write than the speculative read, and hence lead to an assume failure.

After obtaining the first full trace (trace 2), Awamoche initiates a race-detection phase. At this point, the final big difference between Awamoche and previous DPORs is revealed: Awamoche reverses races not between reads and CASes, but between *speculative reads*. (While speculative reads are not technically conflicting events, they conflict with the later confirmation CASes.) As can be seen in trace 3, Awamoche schedules the speculative read of the second thread before that of the first thread, so that it explores the scenario where the confirmation of the second thread succeeds before that of the first.

Finally, simply by adding the remaining events of the second thread before those of the first thread, Awamoche explores the second and final trace of the example (trace 4), avoiding blocked traces altogether.

### **4 Await-Aware Model Checking Algorithm**

Awamoche is based on TruSt [15], a state-of-the-art stateless model checking algorithm that explores *execution graphs* [9], and thus seamlessly supports weak memory models. In what follows, we formally define execution graphs (Sect. 4.1), and then present Awamoche (Sect. 4.2).

#### **4.1 Execution Graphs**

An execution graph *G* consists of a set of events (nodes), representing instructions of the program, and a few relations on these events (edges), representing interactions among the instructions.

**Definition 1.** *An* event*, e* ∈ Event*, is either the initialization event* init*, or a thread event* ⟨*t, i, lab*⟩ *where t* ∈ Tid *is a thread identifier, i* ∈ Idx ≜ ℕ *is a serial number inside each thread, and lab* ∈ Lab *is a label that takes one of the following forms:*


*We omit the* ∅ *for read/write labels with no attributes. The functions* tid*,* idx*,* loc*, and* val *respectively return the thread identifier, serial number, location, and value of an event, when applicable. We use* R*,* W*,* B*, and* error *to denote the set of all read, write, block, and error events, respectively, and assume that* init ∈ W*. We use superscripts and subscripts to further restrict those sets (e.g.,* W<sub>*l*</sub> ≜ {init} ∪ {*w* ∈ W | loc(*w*) = *l*}*).*

In the definition above, read and write events come with various attributes. Specifically, we encode successful CAS operations and other similar atomic operations, such as fetch-and-add, as two events: an exclusive read followed by an exclusive write (both marked by the excl attribute). Moreover, we have a spec attribute for speculative reads, and write R<sup>conf</sup> for the corresponding confirmation reads (i.e., the first exclusive, same-location read that is po-after a given *r* ∈ R<sup>spec</sup>). Finally, we have the awt attribute for reads whose outcome is tied to an assume statement, and write R<sup>blk</sup> for the subset of R<sup>awt</sup> containing reads that read a value making the assume fail (see below).

#### **Definition 2.** *An* execution graph *G consists of:*


*We write G.*R *for the set G.*E ∩ R*, and similarly for other sets. Given two events e*<sub>1</sub>*, e*<sub>2</sub> ∈ *G.*E*, we write e*<sub>1</sub> <<sub>*G*</sub> *e*<sub>2</sub> *if e*<sub>1</sub> ≤<sub>*G*</sub> *e*<sub>2</sub> *and e*<sub>1</sub> ≠ *e*<sub>2</sub>*. We write G*|<sub>*E*</sub> *for the restriction of an execution graph G to a set of events E, and G*\\*E for the graph obtained by removing a set of events E.*

Based on the above graph representation, we define *G.*po, which orders events in the same thread according to their *i* component, and *G.*porf, which is the causal order among the graph events, as follows:

$$\begin{aligned} G.\mathtt{po} & \triangleq \{ \langle \mathtt{init}, e \rangle \mid e \in G.\mathtt{E} \setminus \{ \mathtt{init} \} \} \\ & \cup \{ \langle e, e' \rangle \in G.\mathtt{E} \times G.\mathtt{E} \mid \mathtt{tid}(e) = \mathtt{tid}(e') \land \mathtt{idx}(e) < \mathtt{idx}(e') \} \\ G.\mathtt{porf} & \triangleq (G.\mathtt{po} \cup G.\mathtt{rf})^{+} \end{aligned}$$

The semantics of a program *P* under a memory model m is the set of execution graphs corresponding to the program that satisfy the consistency predicate of m. Consistency predicates generally constrain the possible choices of co and rf, thereby indirectly constraining the possible final values of memory locations and the values that reads can return.

TruSt (and, by extension, Awamoche) assumes certain properties of the memory model [15]: porf acyclicity, porf-prefix-closedness, and co-maximal extensibility. Intuitively, extensibility captures the idea that executing a program should never get stuck as long as a thread has more statements to execute.


```
1:  procedure Verify(P)
2:      VisitP(G∅)
3:  procedure VisitP(G)
4:      if ¬consistentm(G) ∨ ∃b ∈ G.Rblk. ¬maximal(G, b) then return
5:      switch a ← nextP(G) do
6:          G ← G ++ a
7:          case a = ⊥
8:              return "Visited full execution graph G"
9:          case a ∈ error
10:             exit("error")
11:         case a ∈ Rconf
12:             e ← maxpo{r ∈ Rspec | tid(r) = tid(a)}
13:             VisitP(SetRF(G, a, G.rf(e)))
14:         case a ∈ R \ Rconf
15:             for w ∈ G.Wloc(a) do
16:                 if a ∈ G.Rspec ∧ ∃b ∈ G.Rspec. ⟨w, b⟩ ∈ G.rf then
17:                     MaybeBackwardRevisitP(SetRF(G, a, w), {b}, a)
18:                 else
19:                     VisitP(SetRF(G, a, w))
20:         case a ∈ W
21:             if WWRace(G) then exit("Write-write race")
22:             VisitP(IPR(G, a))
23:             Revs ← G.Rloc(a) \ dom(G.porf; [a])
24:             MaybeBackwardRevisitP(G, Revs, a)
25:         otherwise
26:             VisitP(G)
```
#### **4.2 Awamoche**

Similarly to TruSt, Awamoche verifies a concurrent program *P* by enumerating all of its consistent execution graphs (see Algorithm 1). In contrast to TruSt, however, Awamoche is *strongly optimal*: it never explores an execution *G* in which some blocked read *r* ∈ *G.*R<sup>blk</sup> reads from a non-co-maximal write. In other words, Awamoche only visits graphs that lead to useful executions<sup>2</sup>. To be able to do so, Awamoche makes stronger assumptions on the underlying memory model m, namely that there are no write-write races, and that m does not allow porf to contradict co (i.e., that co ⊆ porf).

Next, we first describe how TruSt works, and then proceed with Awamoche's modifications.

Given a program *P*, Verify visits all consistent execution graphs of *P* by calling Visit on the execution graph *G*<sub>∅</sub> containing only the initialization event.

<sup>2</sup> Recall that blocked reads that read from maximal writes *are* useful, as they denote liveness violations.

At each step (Line 4), as long as the current graph remains consistent under the specified memory model m, Visit obtains a new event *a* via next*P*(*G*) (Line 5), and extends the current graph *G* with *a* (Line 6). We assume that *G*++*a* adds *a* to *G.*E, and also to *G.*co, in case *a* is a write. (Recall that co ⊆ porf and so *a*'s co-placing is unique.)

If there are no more events to add to the graph, then *G* is complete, and Visit returns (Line 7). If *a* denotes an error, then it is reported to the user and verification terminates (Line 9).

If *a* is a read, Visit needs to examine all possible places *a* could read from. To that end, for each same-location write *w* in *G* (Line 15), Visit recursively explores the possibility that *a* reads from *w* (Line 19). Formally, SetRF(*G, r, w*) returns a graph *G*′ that is identical to *G* except for its rf component:

$$G'.\mathtt{rf} = G.\mathtt{rf} \setminus (G.\mathtt{E} \times \{r\}) \cup \{\langle w, r \rangle\}$$

If *a* is a write, Visit examines both the case where *a* is simply added to *G* (Line 22) and the "*backward-revisit*" cases for each existing same-location read in *G* that could read from *a* (Line 24). When *a* backward-revisits a read *r*, the resulting graph *G*′ only contains the events that were added before *r* or are porf-before *a*, and updates *r* to read from *a*. Since, however, many different backward revisits might lead to the exact same graph *G*′, to ensure optimality, *G*′ is visited only when the current graph *G* forms a *maximal extension* of *G*′. We do not provide TruSt's definition of maximal extensions here, as Awamoche modifies it to achieve strong optimality.

Let us now move to the parts of Algorithm 1 that are Awamoche-specific.

First, Awamoche discards all graphs where some blocked read is reading non-maximally (Line 4). As explained in Sect. 3.2, such reads cannot be revisited and will thus only lead to blocked executions. In addition, to guarantee correctness, Awamoche raises an error if it detects unordered writes (Line 21).

Second, whenever a write event *a* is added, Awamoche revisits all same-location blocked reads *in place*, making them read from *a* (Line 22), and excludes them from the normal backward-revisit procedure (Line 23). Formally, we define IPR(*G, a*) to return a graph *G*′ that is identical to *G* apart from its rf component:

$$G'.\mathtt{rf} = G.\mathtt{rf} \setminus (G.\mathtt{E} \times G.\mathtt{R}^{\mathtt{blk}}_{\mathtt{loc}(a)}) \cup (\{a\} \times G.\mathtt{R}^{\mathtt{blk}}_{\mathtt{loc}(a)})$$

Third, whenever a confirmation read *a* is added (Line 11), i.e., an exclusive read that succeeds an unmatched speculative read *e*, Awamoche only explores the execution where *a* reads from the same write as *e* (Line 13): any other write would make the confirmation CAS fail.

Fourth, whenever a speculative read *a* is added to read from a candidate write *w* and there is another speculative read *b* reading from the same write *w* (Line 16), Awamoche backward-revisits *b* to read from *a*. Note that, due to the atomicity of the confirming CASes, there can be at most one other speculative read *b* reading from *w*, and so Awamoche revisits it to read from *a*, making it blocked, so that it gets revisited in place when the confirming CAS of *a* is added to the graph. (To ensure graph well-formedness, we assume that IPR(*G, b*) does not modify *G* when called with a read argument *b*, and that SetRF(*G, b,* ⊥) makes *b* read from ⊥, which IPR also considers.)

Finally, similarly to TruSt, Awamoche only performs a backward revisit if *G* forms a maximal extension, though Awamoche employs a slightly different definition of maximal extensions. Awamoche's backward-revisit algorithm can be seen in Algorithm 2.

Roughly, Awamoche performs a backward revisit from *a* to *r*, leading to a graph IPR(*G*<sub>*r*</sub>, *a*), if starting from *G*<sub>*r*</sub> without *r* and *a*, and adding *r* and all the deleted events in a co-maximal way (performing in-place revisits along the way), leads to *G*. Formally, we write *G*<sub>1</sub> →<sub>*e*</sub> *G*<sub>2</sub> if there exists *G*′<sub>1</sub> such that *G*<sub>2</sub> = IPR(*G*′<sub>1</sub>, *e*), *G*′<sub>1</sub> = *G*<sub>1</sub> ++ *e*, and:

$$\begin{aligned} G_1'.\mathtt{rf} &= G_1.\mathtt{rf} \cup \{ \langle \max\nolimits_{G.\mathtt{co}_e}, e \rangle \} & G_1'.\mathtt{co} &= G_1.\mathtt{co} & &\text{if } e \in \mathtt{R} \\ G_1'.\mathtt{rf} &= G_1.\mathtt{rf} & G_1'.\mathtt{co} &= G_1.\mathtt{co} \cup \{ \langle w, e \rangle \mid w \in G.\mathtt{W} \} & &\text{if } e \in \mathtt{W} \\ G_1'.\mathtt{rf} &= G_1.\mathtt{rf} & G_1'.\mathtt{co} &= G_1.\mathtt{co} & &\text{otherwise} \end{aligned}$$

We note that, for the special case where *e* ∈ R<sup>spec</sup> and there is *e*′ ∈ *G.*R<sup>spec</sup><sub>loc(*e*)</sub> such that *e*′ is not followed by the matching confirmation CAS, we consider ⊥ as the max<sub>*G.*co<sub>*e*</sub></sub>. As a final remark, note that Awamoche modifies next*P*(*G*) so that (a) after scheduling a speculative read, it keeps scheduling events in the same thread until the respective confirming CAS is added, and (b) it does not schedule events from a thread whose last (speculative) read reads ⊥. These modifications ensure that confirmation patterns are added one at a time, and that in-place revisits take place among confirming CASes and speculative reads.

#### **5 Correctness and Optimality**

Proving Awamoche correct is non-trivial, and we had to develop a novel proof strategy. In what follows, we first review TruSt's proof argument and show why it is inapplicable to Awamoche. Then, we explain our proof strategy (Sect. 5.1) and state our completeness and optimality results (Sect. 5.2).

#### **5.1 Approaches to Correctness**

**TruSt.** The proof of TruSt proceeds in a backward manner. Specifically, TruSt's proof is based on a procedure Prev that, given an execution *G*, recovers the

$$\begin{array}{c} \mathtt{assume}(x \neq 0) \\ y := 1 \end{array} \;\Big\|\; a := y \;\Big\|\; x := 1$$

**Fig. 6.** TruSt: In-place revisits make it impossible to determine the last step taken

unique "previous" execution *G<sup>p</sup>* that the algorithm must reach in order to visit *G*. To do so, assuming a left-to-right addition order of events, Prev(*G*) finds the rightmost porf-maximal event *e* of *G*, and decides whether *e* was added in a non-revisit step, or *e* is a read that was just revisited by a write event located to its right. If *e* was added in a non-revisit step, then *G<sup>p</sup>* is simply *G* without *e*. Otherwise, Prev obtains *G<sup>p</sup>* from *G* in the following way: it removes *e* along with the write *w* that *e* reads from, and then iteratively adds the leftmost available event to *G* in a co-maximal way, until *w* is about to be added.

TruSt's completeness and optimality are proved using Prev. For the former, one can show that each consistent final execution can reach the initial empty execution through a series of Prev steps, and each of these steps is matched by a forward step of TruSt. For the latter, one can show that each step of TruSt is matched by the (unique) Prev step.

To see why we cannot follow a similar approach for Awamoche, consider the program of Fig. 6, along with one of its executions. We will show that in-place revisits make it impossible to trace the algorithm's last step merely by inspecting the execution. Assuming a left-to-right addition order, Awamoche reaches this execution as follows: it first adds R(*x*), R(*y*) and W(*x,* 1) (notice that at this point the first read is blocked), then in-place-revisits R(*x*), and finally adds W(*y,* 1) and backward-revisits R(*y*). This last revisit, however, creates a problem: TruSt's proof assumes that a backward revisit ⟨*r, w*⟩ implies that *w* is located to the right of *r*, which is clearly not the case here. The fact that in Awamoche backward revisits can happen in both directions makes it impossible to trace the algorithm's last step simply by inspecting an execution.

**Awamoche.** In contrast to TruSt, Awamoche's proof proceeds in a forward fashion. For each consistent final execution *G<sup>f</sup>* we show (1) which steps are taken by the algorithm in order to reach *G<sup>f</sup>*, and (2) that these are the only possible ones that lead to *G<sup>f</sup>*. To do so, we first define a notion of a *prefix*: we say that an execution *G* is a prefix of *G*′ (written *G* ⊑ *G*′) if *G*′ can be reached from *G* by a series of *operational steps*. In turn, an operational step is a step that the algorithm may take in the non-revisit case (without demanding that it is the one actually taken by the algorithm), and that may perform in-place revisits as well.

Using this notion of prefixes, our proof defines a procedure Succs that, given a consistent final execution *G<sup>f</sup>* and an execution *G* produced by the algorithm such that *G* ⊑ *G<sup>f</sup>*, returns the minimal sequence of algorithm steps that reach some execution *G*′ for which *G* ⊏ *G*′ ⊑ *G<sup>f</sup>*. Concretely, if next*P*(*G*) can be added to *G* such that the resulting execution *G*′ is a prefix of *G<sup>f</sup>*, Succs returns this addition step. Otherwise, next*P*(*G*) is a read event *r* that must first be revisited by an event *e* in order to reach an execution that is a prefix of *G<sup>f</sup>*. Succs then returns the sequence of algorithm steps that reach the execution resulting from extending *G* with the porf-prefix of *e* and setting *r* to read from *e* (or from ⊥, if *e* is a speculative read). Both completeness and optimality follow from Succs's properties, as well as from the observation that every consistent final execution can be reached by a series of operational steps.

### **5.2 Awamoche: Completeness, Optimality, and Strong Optimality**

Before stating our results, we first formally define useful executions. Recall that these are executions where all blocking reads corresponding to await loops are reading maximally (such executions denote liveness violations), and no confirmation CAS fails.

**Definition 3.** *A consistent execution G is* useful *if every read in G.*Rblk *reads from a G.*co*-maximal write and no confirmation CAS fails.*

Next, we define the class of input programs that satisfy our assumptions.

**Definition 4.** *A program P is* well-formed *if every speculative read is followed by a confirmation CAS with no write in-between, and all writes to locations accessed by speculative reads write distinct values.*

**Completeness and Optimality.** Completeness guarantees that every useful final execution is explored. Awamoche is complete for well-formed programs that do not exhibit write-write races.

**Theorem 1 (Completeness).** *Given a well-formed program P,* Verify(*P*) *either detects a write-write race and exits, or visits every useful final execution of P.*

Optimality states that (1) no equivalent final executions are explored, and (2) there are no *fruitless* explorations that never lead to a consistent final execution.

**Definition 5.** *We call an execution G visited by* Awamoche fruitless *if it does not recursively lead to any* Visit(*P, G<sup>f</sup>* ) *call, for any consistent final execution G<sup>f</sup> .*

Awamoche is optimal for well-formed programs.

**Theorem 2 (Optimality).** *Given a well-formed program P, (1)* Verify(*P*) *never visits two equivalent final executions, and (2) if* Visit(*P, G*) *directly leads to a call to* Visit(*P, G*′) *with G*′ *being fruitless, then* Visit(*P, G*′) *will not initiate any other* Visit *calls.*

Observe that in the optimality theorem above, fruitless exploration can lead to an extra Visit step. The reason for that is the treatment of CASes: the read part of a CAS *c* can be added so that it reads from the same write as a different (successful) CAS. In such a case, there is no way to consistently add the pending write of *c* without revisiting, which in turn may not be able to happen due to Awamoche's maximality condition.

**Strong Optimality.** Strong optimality states that, apart from being optimal, Awamoche visits only useful executions. Awamoche is strongly optimal for well-formed programs.

**Theorem 3 (Strong Optimality).** *Given a well-formed program P,* Verify(*P*) *only visits useful executions.*

### **6 Evaluation**

We implemented Awamoche as a tool that verifies C/C++ programs under the RC11 memory model [22]. Similarly to other stateless model checkers, Awamoche works at the level of the LLVM Intermediate Representation (LLVM-IR).

In what follows, we evaluate the effectiveness of Awamoche's key ideas (namely, stale-read annotations, in-place revisiting, and speculative revisiting), both individually and as a whole. To that end, we evaluate Awamoche on a set of benchmarks that both amplify the weaknesses of standard DPOR and demonstrate the applicability of our approach in realistic workloads. In all our tests, we compare Awamoche against a vanilla version of TruSt, a version of TruSt that employs stale-read annotations (TruSt<sub>stale</sub>), and a version of TruSt that employs both stale-read annotations and in-place revisiting (TruSt<sub>IPR</sub>).

Even though there are other stateless model checking tools that can verify C/C++ programs (namely, GenMC [19] and Nidhugg [1]), we do not compare against them here, as we care about Awamoche's performance relative to TruSt. We only mention in passing that we expect GenMC's performance to be similar to that of TruSt<sub>stale</sub> (as its implementation incorporates various optimizations for assume statements), and Nidhugg's to be similar to TruSt<sub>IPR</sub> (as it employs an optimization with a similar effect to in-place revisiting [14]). We also note that comparing with Nidhugg is difficult, since it operates under a different memory model and does not transform the same types of loops into assume statements as Awamoche (also see Sect. 7).

We draw two major conclusions from our evaluation. First, Awamoche's optimizations yield exponential performance benefits compared to standard DPOR approaches. Second, these benefits apply not only to small synthetic benchmarks, but also extend to realistic concurrent data structures.

*Experimental Setup.* We conducted all experiments on a Dell PowerEdge M620 blade system running a custom Debian-based distribution, with two Intel Xeon E5-2667 v2 CPUs (8 cores @ 3.3 GHz) and 256 GB of RAM. We used LLVM 11.0.1 for Awamoche. Unless explicitly noted otherwise, all reported times are in seconds. We set a timeout limit of 30 min.


**Table 1.** Synthetic benchmarks

orch-run: *N* threads are spawned and wait to be signaled before they start performing thread-local computations.

wait-workers: A worker thread waits for *N* workers to publish their results before it starts running.

nr+nw: A synthetic benchmark where *K* reader threads wait until a variable written *L* times by a writer thread satisfies some condition (which cannot be satisfied).

conf-loop: *N* threads perform a confirmation-CAS loop similar to the one of Sect. 1.

#### **6.1 Results**

Let us first focus on some benchmarks that help us better understand where each of Awamoche's components applies (Table 1). Starting with orch-run, we see that even though blocked executions greatly outnumber complete executions, stale-read annotations alone suffice to bring the number of blocked executions down to zero. This, however, is partly due to luck: in orch-run, main() spawns a number of workers that do not execute until they are signaled by main() using a special variable. In turn, because TruSt<sub>stale</sub> follows a left-to-right scheduling, when DPOR encounters the worker threads, the scenario where they are not signaled is not considered, since it implies reading a stale value.

By contrast, in wait-workers and nr+nw, stale-read annotations are insufficient to eliminate blocking. In these benchmarks, some designated threads wait for the rest of the workers to perform some tasks before proceeding. However, it is not guaranteed that these designated threads will always be processed after the rest of the threads by DPOR, and thus stale-read annotations have little to no effect. Employing in-place revisiting, on the other hand, leads to a dramatic performance improvement: blocked executions are effectively eliminated (the single blocked execution in nr+nw is a liveness violation).


**Table 2.** Real-world benchmarks

mpmc-enq: *N* threads enqueue an item in a multiple-producer multiple-consumer queue.

treiber-push: A lock-free stack implementation. *N* threads are pushing an item.

m-enq: A modification of the Michael-Scott queue without the tail pointer. *N* threads are enqueueing an item.

Analogously to wait-workers and nr+nw, conf-loop demonstrates why in-place revisiting is insufficient when the success of an assume does not depend on a single load, but rather on a sequence of actions (as is the case in confirmation loops). As can be seen, TruSt<sub>IPR</sub> still explores blocked executions, which Awamoche manages to eliminate thanks to speculative revisits.

Moving to the final part of our evaluation, Table 2 demonstrates that the benefits of Awamoche extend to realistic workloads as well. As can be seen from Table 1, none of Awamoche's optimizations is redundant, as they are often all required to eliminate the exploration of blocked executions. Observe, however, that our benchmarks only exercise push or enqueue operations. This is because the respective pop or dequeue operations contain assume statements in their confirmation-CAS loops, and therefore cannot be optimized by Awamoche.

#### **7 Related Work**

The seminal work of Flanagan and Godefroid [13] has spawned a number of papers on DPOR. Among these, Optimal-DPOR [2] and TruSt [15] stand out, as they provide the first optimal DPOR algorithm, and the first optimal DPOR algorithm with polynomial memory consumption, respectively. TruSt is based on [17] and thus has the extra advantage of being parametric in the choice of the underlying weak memory model.

Many works improve on DPOR in one way or another. Several techniques introduce coarser equivalence partitionings to combat the state-space explosion problem (e.g., [3,6–8,10–12]). Other works focus on extending DPOR to weak memory models [1,4,5,17,20,24], while yet others leverage particular programming patterns [14,16,18]. Kokologiannakis, Ren, and Vafeiadis [18], in particular, deal with transforming spinloops into assume statements, the handling of which we optimize in this paper.

Among those, the work that is closest to ours is Godot [14]. Godot is an extension to DPOR that has a similar effect to in-place revisiting in the sense that it only explores executions that are either complete, or denote program termination errors. That said, Godot only works under SC, and cannot handle stale-read annotations or confirmation loops (which are instrumental in scaling the verification of concurrent data structures, as we saw in Sect. 6). In addition, Godot's loop transformation is static (in contrast to Awamoche's, which is dynamic), making it easy to construct examples where Godot's transformation does not work. Finally, even though Godot does not impose a "no write-write race" restriction on the input programs, this restriction is trivially satisfied for models like SC or TSO [26]: in such models, it is sound to transform writes to atomic exchange statements that write the value they read, thereby ordering all writes to each location.

### **8 Conclusion**

We presented Awamoche, the first memory-model-agnostic DPOR algorithm that is sound, complete, and strongly optimal for programs with await and confirmation-CAS loops. Awamoche avoids blocked executions that arise due to await loops by revisiting blocking reads in-place, and deals with confirmation-CAS loops by also considering revisits whenever two speculative reads read from the same write.

As our theoretical and experimental results demonstrate, Awamoche yields exponential benefits over the current state-of-the-art. Yet, it does not support certain more advanced patterns commonly appearing in concurrent programs, the handling of which we leave as future work. Examples of such patterns include confirmation-CAS loops with assume statements between the speculative and the confirmation reads (such statements may arise due to break/continue instructions), elimination backoff data structures, and await loops that use CASes instead of plain reads. We also believe that our key ideas for achieving strong optimality in these cases should be applicable in other scenarios as well, such as in programs with mutual exclusion locks or transactions.

**Acknowledgments.** We thank the anonymous reviewers for their feedback. This work has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 101003349).

### **References**

1. Abdulla, P.A., Aronis, S., Atig, M.F., Jonsson, B., Leonardsson, C., Sagonas, K.: Stateless model checking for TSO and PSO. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 353–367. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_28


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

**Cyber-Physical and Hybrid Systems**

# **3D Environment Modeling for Falsification and Beyond with Scenic 3.0**

Eric Vin<sup>1(B)</sup>, Shun Kashiwa<sup>1</sup>, Matthew Rhea<sup>3</sup>, Daniel J. Fremont<sup>1</sup>, Edward Kim<sup>2</sup>, Tommaso Dreossi<sup>4</sup>, Shromona Ghosh<sup>5</sup>, Xiangyu Yue<sup>6</sup>, Alberto L. Sangiovanni-Vincentelli<sup>2</sup>, and Sanjit A. Seshia<sup>2</sup>

<sup>1</sup> University of California, Santa Cruz, USA {evin,shkashiw,dfremont}@ucsc.edu
<sup>2</sup> University of California, Berkeley, USA
<sup>3</sup> SentinelOne, Mountain View, USA
<sup>4</sup> insitro, San Francisco, USA
<sup>5</sup> Waymo LLC, Mountain View, USA

<sup>6</sup> The Chinese University of Hong Kong, Hong Kong, China

**Abstract.** We present a major new version of Scenic, a probabilistic programming language for writing formal models of the environments of cyber-physical systems. Scenic has been successfully used for the design and analysis of CPS in a variety of domains, but earlier versions are limited to environments that are essentially two-dimensional. In this paper, we extend Scenic with native support for 3D geometry, introducing new syntax that provides expressive ways to describe 3D configurations while preserving the simplicity and readability of the language. We replace Scenic's simplistic representation of objects as boxes with precise modeling of complex shapes, including a ray tracing-based visibility system that accounts for object occlusion. We also extend the language to support arbitrary temporal requirements expressed in LTL, and build an extensible Scenic parser generated from a formal grammar of the language. Finally, we illustrate the new application domains these features enable with case studies that would have been impossible to accurately model in Scenic 2.

**Keywords:** Scenario description language *·* Synthetic data *·* Probabilistic programming *·* Automatic test generation *·* Simulation

### **1 Introduction**

A major challenge in the design of cyber-physical systems (CPS) like autonomous vehicles is the heterogeneity and complexity of their environments. Increasingly, problems of perception, planning, and control in such environments have been tackled using machine learning (ML) algorithms whose behavior is not well-understood. This trend calls for verification techniques for ML-based CPS; however, a significant barrier has been the difficulty of constructing *formal models* that capture the diversity of these systems' environments [25]. Indeed, building such models is a prerequisite not only for verification but for any formal analysis.

Scenic [10,12] is a probabilistic programming language that addresses this challenge by providing a precise yet readable formalism for modeling the environments of CPS. A Scenic program defines a *scenario* describing physical objects in a world, placing a probability distribution on their positions and other properties; a single program can generate many different concrete *scenes* by sampling from this distribution. Scenic also allows defining a stochastic policy describing how agents behave over time, and implementing the resulting dynamic scenarios in a variety of external simulators. Environment models defined in Scenic can be used for many tasks: falsification, as in the VerifAI toolkit [5], but also debugging, training data generation, and real-world experiment design [13]. These tasks have been successfully demonstrated in a variety of domains including autonomous driving [29], aviation [9], and reinforcement learning agents [1].

Despite Scenic's successes, it has several limitations that prevent its use in a number of applications of interest. First, the original language models the world as being *two-dimensional*, since this enables a substantial simplification in the language's syntax (e.g., orientations being a single angle) as well as optimizations in its implementation. The 2D assumption is reasonable for domains such as driving but leaves Scenic unable to properly model environments for aerial and underwater vehicles, for example. There can be problems even for ground vehicles: Scenic could not generate a scene where a robot vacuum is underneath a table, as their 2D bounding boxes would overlap and Scenic would treat them as colliding. The use of bounding boxes rather than precise shapes also leads Scenic to use a simplistic visibility model that ignores occlusion, making it possible for Scenic to claim objects are visible when they are not and vice versa: a serious problem when generating training data for a perception system.

Fundamentally, verification of AI-based autonomous systems requires reasoning about perception and physics in a 3D world. To support such reasoning, a formal environment modeling language must provide faithful representations of 3D geometry. Towards this end, we present Scenic 3.0<sup>1</sup>, a largely backwards-compatible major release featuring:


<sup>1</sup> Available at: https://github.com/BerkeleyLearnVerify/Scenic/.

– **Rewritten Parser**: We give a Parsing Expression Grammar [8] for Scenic, using it to generate a parser with more precise error messages and better support for new syntax and optimization passes.

We first define the new features in Scenic 3 in detail in Sect. 2, working through several toy examples. Then, in Sect. 3, we describe two case studies using Scenic with scenarios that could not be accurately modeled without the new features: falsifying a specification for a robot vacuum and generating training data constrained by an LTL formula for a self-driving car's perception system.

*Related Work.* There are many tools for test and data generation [3]. Some approaches learn from examples [7,26] and so do not provide specific control over scenarios as Scenic does. Approaches based on rules or grammars [17,20, 26] provide some control but have difficulty enforcing requirements over the generated data as a whole. Several probabilistic programming languages have been used for generation of objects and scenes [15,22,23], but none of them provide specialized syntax to lay out geometric scenarios, nor for describing dynamic behaviors. Finally, there has been work on synthetic data generation of 3D scenes and objects using ML techniques such as GANs (e.g., [7,14,30]), but these lack the specificity and controllability provided by a programming language like Scenic.

#### **2 New Features**

#### **2.1 3D Geometry**

The primary new feature in Scenic 3 is the generalization of the language to 3 dimensions. Some changes, like changing the type system so that vectors have length 3, are obvious: here we focus on cases where the existing syntax of Scenic does not easily generalize, using simple scenarios to motivate our design choices.

The first challenge when moving to 3D is the representation of an object's orientation in space: Scenic's existing heading property, providing a single angle, is no longer sufficient. Instead, we introduce yaw, pitch, and roll angles, using the common convention for aircraft that these represent *intrinsic* rotations (i.e., yaw is applied first, then pitch is applied to the resulting orientation, etc.). Using intrinsic angles makes it easy to compose rotations: for example if we point an airplane towards a landing strip with yaw and pitch (either manually or using Scenic's facing toward specifier — more on this below), we can add an additional roll by adding to that property. To further simplify composition, we add a parentOrientation property which specifies the local coordinate system in which the 3 angles above should be interpreted (by default, the global coordinate system). This allows the user to specify an orientation with respect to a previously-computed orientation, for instance that of a tilted surface.
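The intrinsic composition described above can be sketched with rotation matrices. The following is a hand-rolled NumPy sketch of one common Z-Y′-X′′ convention, not Scenic's actual implementation; the function names and the `parent` parameter are our own stand-ins for parentOrientation:

```python
import numpy as np

def rot_z(a):  # yaw
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def rot_y(a):  # pitch
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_x(a):  # roll
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def orientation(yaw, pitch, roll, parent=None):
    """Intrinsic yaw-pitch-roll: yaw is applied first, then pitch in the
    rotated frame, then roll; all interpreted inside the parent frame
    (the global frame by default), mirroring parentOrientation."""
    parent = np.eye(3) if parent is None else parent
    return parent @ rot_z(yaw) @ rot_y(pitch) @ rot_x(roll)
```

Adding an extra roll to an already-computed yaw/pitch orientation is then just one more right-multiplication, which is what makes intrinsic angles compose so conveniently.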

Scenic provides a flexible system of natural language *specifiers* which can be combined to define properties of objects. Consider the following Scenic 3 code:

```
objectA = new Object at (1, 2, 3), facing (45 deg, 0, 90 deg)
objectB = new Object left of objectA by 1
objectC = new Object above objectB by 1,
          facing (Range(0,30) deg, Range(0,30) deg, 0)
```
Here, we use the at specifier to define a specific position for object A; the facing specifier defines the object's orientation using explicit yaw, pitch, and roll angles. We then place object B left of A by 1 unit with the left of specifier: this specifier now not only sets the position property, but also sets the parentOrientation property to the orientation of object A (unless explicitly overridden). Thus object B will be oriented the same way as A. Similarly, object C is positioned relative to B and so inherits its orientation as its parentOrientation. However, this time we use the facing specifier to define random yaw and pitch angles, so object C will face up to 30° off of B.

Another way to specify an object's orientation is the facing toward specifier. This is a case where the 2D semantics become ambiguous in 3D. Consider a scenario where the user wants an airplane to be "facing toward" a runway: the plane's body should be oriented toward the runway (giving its yaw), but it is not clear whether in addition the plane should be pitched downward so that its nose points directly toward the runway. To allow for both interpretations, Scenic 3 has facing toward only specify yaw, while the new facing directly toward specifier also specifies pitch. This is illustrated in Fig. 1.
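The yaw-only versus yaw-and-pitch distinction can be made concrete with a small sketch. This is our own simplified reading of the two specifiers, not Scenic's code; the function name is hypothetical:

```python
import numpy as np

def face_toward(pos, target, directly=False):
    """Angles pointing an object at a target: `facing toward` fixes only
    yaw, while `facing directly toward` also pitches the nose at it."""
    dx, dy, dz = np.asarray(target, float) - np.asarray(pos, float)
    yaw = np.arctan2(dy, dx)
    pitch = np.arctan2(dz, np.hypot(dx, dy)) if directly else 0.0
    return yaw, pitch
```

For an airplane at altitude facing toward a runway, the plane stays level; facing directly toward the runway additionally pitches the nose down.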

Another common practice in 3D space is to place one object *on* another. For example, we may want to place a chair on a floor, or a painting on a wall. Scenic's existing on specifier, which sets the position of an object to be a uniformly random point in a given region, does not suffice for such cases because it would cause the chair to intersect the floor or the painting to penetrate the wall (or both). To fix this issue, we allow each object to define a *base* point, which the on specifier positions instead of the object's center. The default base point is the bottom center of the object's bounding box, suitable for cars and chairs for example; a Painting class could override this to be the back center. Finally, to enable placing objects on each other, objects can provide a topSurface property specifying the surface which is considered the "top" for the purposes of the on specifier. As before, there is a reasonable default (the upward-pointing faces of the object's mesh) that can be overridden. This syntax is illustrated in Fig. 2.

A final 3D complication arises when positioning objects on irregular surfaces. Consider a pair of cars driving up an uneven mountain road, with one 10 m behind the other. We can use the ahead of specifier to place one car 10 m ahead of the other, but then the car will penetrate the road due to its upward slope. Alternatively, the on specifier can correctly place the car so it is tangent to the road, but then we cannot directly specify the distance between the cars. The natural semantics here would be to combine the constraints from *both* specifiers, but this is illegal in Scenic 2 where a given property (such as position) can only be specified by a single specifier at a time. We enable this usage in Scenic 3 by introducing the concept of a *modifying specifier* that modifies the value of a property already defined by another specifier. Specifically, if an object's

**Fig. 1.** Line-of-sight-based orientations in Scenic. The ego ball (highlighted green) is placed above the origin, as seen by the RGB global coordinate axes, with one plane facing towards the ego and another facing directly toward the ego. (Color figure online)

**Fig. 2.** A Scenic program placing a chair on a floor. The Z-axis of the global coordinate axes protrudes from the floor, indicating which direction is up.

position is already specified, the on specifier will *project* that position down onto the given surface. This is illustrated by the green chair in Fig. 3.
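In the simplest case, this modification amounts to keeping the x and y coordinates chosen by the earlier specifier and replacing z with the surface height at that point. A toy sketch with the surface given as a height function (Scenic actually projects onto arbitrary meshes; the names here are ours):

```python
def project_onto_surface(position, height_fn):
    """Modifying-`on` sketch: keep x and y from the earlier specifier
    (e.g. `ahead of`) and drop the point vertically onto the surface."""
    x, y, _ = position
    return (x, y, height_fn(x, y))

# A road climbing at a 10% grade, modeled as a height function of x.
road = lambda x, y: 0.1 * x
```

Placing a car 10 m ahead of another at z = 0 and then projecting onto the sloped road yields the position (10, 0, 1.0): the specified distance is kept while the car stays on the road.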

Note that the green chair is correctly upright on the floor even though it was positioned relative to the cube, and so should inherit parentOrientation from the cube as discussed above. In this situation, the user has provided no explicit orientation for the chair, and both below and on can provide one. To resolve this ambiguity, we introduce a *specifier priority* system, where specifiers have different priorities for the properties they specify (generalizing Scenic's existing system where a specifier could specify a property *optionally*). In our example, below specifies position with priority 1 and parentOrientation with priority 3, while on specifies these with priorities 1 and 2 respectively. So both specifiers determine position (with on modifying the value from below as explained above), but on takes precedence over below when specifying parentOrientation. This yields the expected behavior while still allowing below to determine the orientation when used in combination with specifiers other than on.
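The priority mechanism itself is easy to model: each specifier offers (priority, value) pairs per property, and the lowest-numbered priority wins. A minimal sketch using our own data layout, not Scenic's internals:

```python
def resolve(specifiers):
    """Pick, for each property, the value offered with the highest
    priority (lower number = higher priority)."""
    winners = {}
    for spec in specifiers:
        for prop, (prio, value) in spec.items():
            if prop not in winners or prio < winners[prop][0]:
                winners[prop] = (prio, value)
    return {prop: value for prop, (prio, value) in winners.items()}

# The example from the text: `below` offers parentOrientation at priority 3,
# `on` at priority 2, so `on` wins for that property.
below = {"position": (1, "under the cube"), "parentOrientation": (3, "cube")}
on = {"parentOrientation": (2, "floor")}
```

(This sketch ignores the modifying behavior for position, where on projects the value chosen by below rather than replacing it.)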

**Fig. 3.** A Scenic program placing a green chair on the floor under a rotated cube in midair. A blue chair is placed directly under the cube for clarity. (Color figure online)

#### **2.2 Mesh Shapes and Regions**

Scenic 2's approximation of objects by their bounding boxes was adequate for 2D driving scenarios, for example, but is wholly inadequate in 3D, where objects are commonly far from box-shaped. For example, consider placing a chair tucked in under a table. Since the bounding boxes of these two objects intersect, Scenic 2 would always reject this situation as a collision and try to generate a new scene, even if the chair and table are entirely separate. In Scenic 3, each object has a precise shape given by its shape property, which is set to an instance of the class Shape. The most general Shape class is MeshShape, which represents an arbitrary 3D mesh and can be loaded from standard formats; classes for primitive shapes like spheres are provided for convenience. These shapes are used to perform precise collision and containment checks between objects and regions.

Scenic also supports mesh regions, which can either represent surfaces or volumes in 3D space. For example, given a mesh representing an ocean we might want to sample on the surface for a boat or in the volume for a submarine.

All meshes in Scenic are handled using Trimesh [4], a Python library for triangular meshes, which internally calls out to the tools Blender [27] and OpenSCAD [28] for several operations. These operations tend to be expensive, so Scenic uses several heuristics to cheaply handle simple cases; these can give a 10–1000× speedup when sampling scenes.
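One such heuristic is a cheap bounding-box rejection test before any precise mesh intersection: if even the axis-aligned bounding boxes are disjoint, the expensive check can be skipped entirely. A sketch of the idea, with meshes reduced to vertex arrays (not Trimesh's actual code path):

```python
import numpy as np

def aabb(vertices):
    """Axis-aligned bounding box of a vertex array: (min corner, max corner)."""
    v = np.asarray(vertices, float)
    return v.min(axis=0), v.max(axis=0)

def aabbs_disjoint(box_a, box_b):
    (amin, amax), (bmin, bmax) = box_a, box_b
    return bool(np.any(amax < bmin) or np.any(bmax < amin))

def collide(verts_a, verts_b, precise_check):
    # Cheap reject: objects with disjoint bounding boxes cannot intersect.
    if aabbs_disjoint(aabb(verts_a), aabb(verts_b)):
        return False
    return precise_check(verts_a, verts_b)  # expensive mesh-level test
```

The expensive `precise_check` only runs when the boxes overlap, which is exactly the chair-under-the-table case where bounding boxes alone would give the wrong answer.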

#### **2.3 Precise Visibility Model**

Scenic 2's visibility system simply checks if the bounding box corners of objects are contained in the view cone of the viewing object, which is no longer adequate for 3D scenarios with complex shapes. Visibility checks are now done using ray tracing, and account for objects being able to occlude visibility. In addition to standard pyramidal view cones used for cameras, Scenic correctly handles wraparound view regions such as those of common LiDAR sensors. Visibility checks use a configurable density of rays, and are optimized to only send rays in areas where they could feasibly hit the object.
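The core occlusion test can be sketched as segment-versus-occluder intersection: a sampled point on the target is visible iff some ray from the viewer reaches it without hitting an occluder first. A simplified sketch with spherical occluders (Scenic itself traces rays against full meshes and view regions; the names here are ours):

```python
import numpy as np

def segment_hits_sphere(p0, p1, center, radius):
    """Does the segment from p0 to p1 pass through the sphere?"""
    d = p1 - p0
    f = p0 - center
    a = d @ d
    b = 2.0 * (f @ d)
    c = f @ f - radius ** 2
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return False  # the supporting line misses the sphere entirely
    t1 = (-b - np.sqrt(disc)) / (2.0 * a)
    t2 = (-b + np.sqrt(disc)) / (2.0 * a)
    return (0.0 <= t1 <= 1.0) or (0.0 <= t2 <= 1.0)

def visible(viewer, target_points, occluders):
    """The target is visible if at least one sampled point is reachable
    by an unblocked ray; occluders are (center, radius) spheres."""
    for pt in target_points:
        if not any(segment_hits_sphere(viewer, pt, c, r) for c, r in occluders):
            return True
    return False
```

Sampling several points per object is what lets partially occluded objects still count as visible.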

#### **2.4 Temporal Requirements**

A key feature of Scenic is the ability to declaratively impose constraints on generated scenes using require statements. However, Scenic 2 only provides limited support for *temporal* requirements constraining how a dynamic scenario evolves over time, with the require always and require eventually statements. Slightly more complex examples, like "cars A and B enter the intersection after car C", require the user to explicitly encode them as monitors, which is error-prone and yields verbose, hard-to-read imperative code: this property requires an 8-line monitor in [12].

Scenic 3 extends require to arbitrary properties in Linear Temporal Logic [21], allowing natural properties like this to be concisely expressed:

```
require (carA not in intersection and carB not in intersection
         until carC in intersection)
```
The semantics of the operators always, eventually, next, and until are taken from RV-LTL [2] to properly model the finite length of Scenic simulations.
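To see what a finite-trace semantics means here, consider a simplified two-valued reading of strong until over a finished simulation; RV-LTL additionally distinguishes "presumably true/false" verdicts on inconclusive prefixes, which this sketch collapses. The predicate names below are our own:

```python
def until(p, q, trace):
    """p until q on a finite trace: some state satisfies q, and p holds
    at every state strictly before it; if q never occurs in the trace,
    the (strong) requirement fails."""
    for state in trace:
        if q(state):
            return True
        if not p(state):
            return False
    return False

# States record which cars are currently in the intersection.
neither_ab = lambda s: not s["carA"] and not s["carB"]
c_entered = lambda s: s["carC"]
```

With these predicates, the require statement above succeeds exactly on simulations where carC enters the intersection before carA or carB do.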

#### **2.5 Rewritten Parser**

For interoperability with Python libraries, Scenic is compiled to Python, and the original Scenic parser was implemented on top of the Python parser. This approach imposed serious restrictions on the language design (e.g., forcing non-intuitive operator precedences), made extending the parser difficult, and led to misleading error messages which pointed to the wrong part of the program.

Scenic 3 uses a parser automatically generated from a Parsing Expression Grammar (PEG) [8] for the language. The parser is based on Pegen [24], the parser generator developed for CPython, and the grammar itself was obtained by extending the Python PEG. The new parser outputs an abstract syntax tree representing the structure of the original Scenic code (unlike the old parser), ensuring that syntax errors are correctly localized and simplifying the task of writing analysis and optimization passes for Scenic.

This new parser gives us flexibility in designing and implementing the language. For example, we carefully assigned precedence to the four new temporal operators so that users can naturally express temporal requirements without unnecessary parentheses. There are additional benefits from having a precise machine-readable grammar for Scenic: for instance, as we wrote the grammar, we discovered ambiguities that had previously gone unnoticed and made minor changes to the language to eliminate them. The grammar could also be used to fuzz test the compiler and other tools operating on Scenic programs.

### **3 Case Studies**

In this section, we discuss two case studies in the robotics simulator Webots [19]. The code for both case studies is available in the Scenic GitHub repository [11]. The first case study, performing falsification of a robot vacuum, illustrates a domain that could not be modeled in Scenic 2 due to the lack of 3D support. The second case study, generating data constrained by an LTL formula for testing or training the perception system of an autonomous vehicle, is an example of how the new features in Scenic 3 can significantly improve effectiveness even in one of Scenic's original target domains.

#### **3.1 Falsification of a Robot Vacuum**

In this example we evaluate the iRobot Create [16], a robot vacuum, on its ability to effectively clean a room filled with objects. We use a specification stating that the robot must clean at least a third of the room within 5 min: in Signal Temporal Logic [18], the formula ϕ = F<sub>[0,300]</sub>(*coverage* > 1/3). We use Scenic to generate a complete room and export it to Webots for simulation. The room is surrounded by four walls and contains two main sections: in the dining room section, we place a table of varied width and length randomly on the floor, with 3 chairs tucked in around it and another chair fallen over. In the living room section, we place a couch with a coffee table in front of it, both leaving randomly-sized spaces roughly the diameter of the robot vacuum. We then add a variable number of toys, modeled as small boxes, cylinders, cones, and spheres, placed randomly around the room; for a taller obstacle, we place a stack of 3 box toys somewhere in the room. Finally, we place the vacuum randomly on the floor, and use Scenic's mutate statement to add noise to the positions and yaw of the furniture. Several scenes sampled from this scenario are shown in Fig. 4.

We tested the default controller for the vacuum against 0, 1, 2, 4, 8, and 16-toy variants of our Scenic scenario, running 25 simulations for each variant. For each simulation, we computed the robustness value [6] of our spec ϕ. The average values are plotted in Fig. 5, showing a clear decline as the number of toys increases. Many of the runs actually falsified ϕ: up to 44% with 16 toys.
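For this particular spec, the robustness value reduces to a max over the time window: how far coverage ever gets above the 1/3 threshold (positive means satisfied, negative means falsified). A sketch over a sampled coverage trace, our own simplification of general STL robustness:

```python
def robustness_F(trace, threshold=1.0 / 3.0, t_max=300.0):
    """Robustness of F[0, t_max](coverage > threshold) on a sampled
    trace of (time, coverage) pairs: the best margin achieved in the
    window. Positive => satisfied; negative => falsified."""
    return max(cov - threshold for t, cov in trace if t <= t_max)
```

A run peaking at 40% coverage gets robustness ≈ +0.067, while one stuck at 30% gets ≈ −0.033 and falsifies ϕ.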

There are several aspects of this example that would not be possible in Scenic 2. First, the new syntax in Scenic 3 allows for convenient placement of objects, specifically the use of on in combination with left of and right of, to place the chairs on the appropriate side of the dining table but on the floor. Many of the objects are also above others and have overlapping bounding boxes, but because Scenic now models shapes precisely, it is able to properly register these

**Fig. 4.** Several sampled scenes from the robot vacuum scenario.

**Fig. 5.** Spec. robustness value vs. number of toys, averaged over 25 simulations.

objects as non-intersecting and place them in truly feasible locations (e.g., in Fig. 4, the toy under the dining table in the top left scene and the robot under the coffee table in the bottom right scene).

#### **3.2 Constrained Data Generation for an Autonomous Vehicle**

In this example we generate instances of a potentially-unsafe driving scenario for use in training or testing the perception system of an AV. Consider a car passing in front of the AV in an intersection where the AV must yield, and so needs to detect the other car before it becomes too late to brake and avoid a collision. We want to generate time series of images labeled with whether or not the crossing car is visible, for a variety of different scenes with different city

**Fig. 6.** Intersection simulation images, with visibility label for the crossing car.

layouts to provide various openings and backdrops. Our scenario places both the ego car (the AV) and the crossing car randomly on the appropriate road ahead of the intersection. We place several buildings along the crossing road that block visibility, allowing some randomness in their position and yaw values. We also place several buildings completely randomly behind the crossing road to provide a diverse backdrop of buildings in the images. Finally, we want to constrain data generation to instances of this scenario where the crossing car is not visible until it is close to the AV, as these will be the most challenging for the perception system. Using the new LTL syntax, we simply write:

```
require (not ego can see car) until distance to car < 75
```

Figure 6 shows a simulation sampled from this scenario. In Scenic 2, the crossing car would be wrongly labeled as visible in image (a), since the occluding buildings would not be taken into account. This would introduce significant error into the generated training set, which in previous uses of Scenic had to be addressed by manually filtering out spurious images; this is avoided with the new system.

### **4 Conclusion**

In this paper we presented Scenic 3, a major new version of the Scenic programming language that provides full native support for 3D geometry, a precise occlusion-aware visibility system, support for more expressive temporal operators, and a rewritten extensible parser. These new features extend Scenic's use cases for developing, testing, debugging, and verifying cyber-physical systems to a broader range of application domains that could not be accurately modeled in Scenic 2. Our case study in Sect. 3.1 demonstrated how Scenic 3 makes it easier to perform falsification for CPS with complex 3D environments. Our case study in Sect. 3.2 further showed that even in domains that could already be modeled in Scenic 2, like autonomous driving, Scenic 3 allows for significantly more precise specifications due to its ability to reason accurately about 3D orientations, collisions, visibility, etc.; these concepts are often relevant to the properties we seek to prove about a system or an environment we want to specify. We expect the improvements to Scenic we describe in this paper will impact the formal methods community both by extending Scenic's proven use cases in simulation-based verification and analysis to a much wider range of application domains, and by providing a 3D environment specification language which is general enough to allow a variety of new CPS verification tools to be built on top of it.

In future work, we plan to develop 3D scenario optimization techniques (complementing the 2D methods Scenic already uses) and explore additional 3D application domains such as drones. We also plan to leverage the new parser to allow users to define their own custom specifiers and pruning techniques.

**Acknowledgements.** The authors thank Ellen Kalvan for helping debug and write tests for the prototype, and several anonymous reviewers for their helpful comments. This work was supported in part by DARPA contracts FA8750-16-C0043 (Assured Autonomy) and FA8750-20-C-0156 (Symbiotic Design of Cyber-Physical Systems), by Berkeley Deep Drive, by Toyota through the iCyPhy center, and NSF grants 1545126 (VeHICaL) and 1837132.

#### **References**


30. Wang, K., Savva, M., Chang, A.X., Ritchie, D.: Deep convolutional priors for indoor scene synthesis. ACM Trans. Graph. **37**(4), 70 (2018). https://doi.org/10.1145/3197517.3201362


# **A Unified Model for Real-Time Systems: Symbolic Techniques and Implementation**

S. Akshay<sup>1</sup>, Paul Gastin<sup>2,4</sup>, R. Govind<sup>1(B)</sup>, Aniruddha R. Joshi<sup>1</sup>, and B. Srivathsan<sup>3,4</sup>

<sup>1</sup> Department of CSE, Indian Institute of Technology Bombay, Mumbai, India
{akshayss,govindr,aniruddhajoshi}@cse.iitb.ac.in
<sup>2</sup> Université Paris-Saclay, ENS Paris-Saclay, CNRS, LMF, 91190 Gif-sur-Yvette, France
paul.gastin@ens-paris-saclay.fr
<sup>3</sup> Chennai Mathematical Institute, Chennai, India
sri@cmi.ac.in
<sup>4</sup> CNRS, ReLaX, IRL 2000, Siruseri, India

**Abstract.** In this paper, we consider a model of *generalized timed automata* (GTA) with two kinds of clocks, *history* and *future*, that can express many timed features succinctly, including timed automata, eventclock automata with and without diagonal constraints, and automata with timers.

Our main contribution is a new simulation-based zone algorithm for checking reachability in this unified model. While such algorithms are known to exist for timed automata, and have recently been shown for event-clock automata without diagonal constraints, this is the first result that can handle event-clock automata with diagonal constraints and automata with timers. We also provide a prototype implementation for our model and show experimental results on several benchmarks. To the best of our knowledge, this is the first effective implementation not just for our unified model, but even just for automata with timers or for event-clock automata (with predicting clocks) without going through a costly translation via timed automata. Last but not least, beyond being interesting in their own right, generalized timed automata can be used for model-checking event-clock specifications over timed automata models.

**Keywords:** Real-time systems · Timed automata · Event-clock automata · Clocks · Timers · Verification · Zones · Simulations · Reachability

© The Author(s) 2023

This work was supported by UMI ReLaX, IRL 2000 and DST/CEFIPRA/INRIA Project EQuaVE. S Akshay was supported in part by DST/SERB Matrics Grant MTR/2018/000744. Paul Gastin was partially supported by ANR project Ticktac (ANR-18-CE40-0015).

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 266–288, 2023. https://doi.org/10.1007/978-3-031-37706-8_14

**Fig. 1.** An automaton with clocks (left) and with timers (right) for the same constraints.

#### **1 Introduction**

The idea of adding real-time dynamics to formal verification models started as a hot topic of research in the 1980s [6,11]. Over the years, timed automata [8,9] have emerged as a leading model for finite-state concurrent systems with real-time constraints. Timed automata make use of *clocks*, real-valued variables that increase along with time. Constraints over clock values can be used as guards for transitions, and clocks can be reset to 0 along transitions. It is notable that the early works in this area made use of *timers* to deal with real-time [13,22,32]. Timers are started by setting them to some initial value within a given interval. Their values decrease with time, and a *timeout* event can be used in transitions to detect the instant when a timer becomes 0. Quoting from [6], the shift from timers to clocks in timed automata, as we know them today, is attributed to the fact that: "*apart from some technical conveniences in developing the emptiness algorithm and proving its correctness, the reformulation allows a simple syntactic characterization of determinism for timed automata*". Over the last thirty years, the study of timed automata has led to the development of rich theory and industry-strength verification tools. The use of clocks has also allowed for the extension of the model to more complex constraints and assignments to clocks in transitions [14,17]. Furthermore, considering more sophisticated rates of evolution for clocks gives yet another well-established model, that of hybrid automata [7].

When it comes to the reachability problem, timers do have some nice properties. Let us explain with an example. Figure 1 shows a timed automaton on the left, and an automaton with timers on the right, for the set of words *ab*<sup>∗</sup> such that the time between every consecutive letters is 1. The timed automaton sets clock *x* to 0 and checks for the guard *x* = 1? to enforce the timing constraint. The automaton with timers, on the right, sets a timer *<sup>t</sup>x* to 1, and asks for its expiry in the immediate next action. Clock *<sup>y</sup>* and timer *<sup>t</sup>y* are not necessary for the required timing property, but we add them to illustrate a different aspect that we will describe now. To solve the reachability problem, a symbolic enumeration of the state space is performed. In the timed automaton, at state *q*1, the enumeration gives constraints *y*−*x* = *n* for every *n* ≥ 0. Starting from *y*−*x* = *n* and executing *b* gives *y* − *x* = *n* + 1, due to the combination of guard *x* = 1? and reset *x* := 0. This shows that a naïve symbolic enumeration is not bound to terminate. The question of developing finite abstractions for timed automata has been a central problem of study which started in the late 90s and continues till date (see recent surveys [18,38]). Such an issue does not occur with timers. In the automaton with timers on the right, *<sup>t</sup>x* is set to 1 and *<sup>t</sup>y* is set to some arbitrary value in the transition to *<sup>q</sup>*1. This gives <sup>−</sup><sup>1</sup> <sup>≤</sup> *<sup>t</sup>y* <sup>−</sup> *<sup>t</sup>x* ≤ ∞ for the set of all possible timer values. When *<sup>t</sup>x* times out, the value of *<sup>t</sup>y* could still be any value from 0 to <sup>∞</sup>. 
When *t<sub>x</sub>* is set to 1 again, the set of possible timer values still satisfies the same constraint −1 ≤ *t<sub>y</sub>* − *t<sub>x</sub>* ≤ ∞, leading to a fixed point and hence a finite reachable symbolic state space. The fact that symbolic enumeration terminates on an automaton with timers was already observed in [22]. To our knowledge, later works on timed automata reachability never went back to timers, and we know of no tool support for dealing with models with timers directly. We find this surprising, given that timers occur naturally while modeling real-time systems and, moreover, enjoy this finiteness property.
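This contrast can be sketched concretely on the Fig. 1 example (a toy computation of ours, not the paper's algorithm): the symbolic state of the clock automaton at *q*<sub>1</sub> is the value *n* in *y* − *x* = *n*, which grows without bound, while the symbolic state of the timer automaton, the interval of possible values of *t<sub>y</sub>* − *t<sub>x</sub>*, is mapped to itself.

```python
INF = float("inf")

def clock_step(d):
    # At q1, guard x = 1 and reset x := 0 turn the constraint
    # y - x = d into y - x = d + 1: the symbolic state never repeats.
    return d + 1

def timer_step(lo, hi):
    # (lo, hi) bounds the possible values of t_y - t_x at q1.
    # On the b-transition, t_x times out (t_x = 0, and t_y >= 0 since
    # t_y is still running), then t_x is restarted at 1.
    lo, hi = max(lo, 0.0), hi          # intersect with t_y >= 0
    return lo - 1.0, hi - 1.0          # restarting t_x shifts the difference

# Clock enumeration: 0, 1, 2, ... never revisits a symbolic state.
clock_states = []
d = 0
for _ in range(5):
    clock_states.append(d)
    d = clock_step(d)

# Timer enumeration: (-1, inf) is a fixed point, so it terminates.
assert timer_step(-1.0, INF) == (-1.0, INF)
```

The loop over `clock_step` produces pairwise distinct states, whereas `timer_step` stabilizes after one application.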

In addition to clocks and timers, *event-clocks* are a special type of clock variables, attached to events, that are used to deal with timing constraints [10]. An event-recording clock for event *a* maintains the time since the previous occurrence of *a*, whereas an event-predicting clock for *a* gives the time to the next occurrence of *a*. Event-clocks have been used in the model of event-clock automata (ECA), and also in logics of event-clocks [36]. These works argue that event-clocks can express typical real-time requirements. Theoretically, ECA can be determinized, and hence complemented. Therefore, model-checking an event-clock specification *ϕ* (given as a logic formula or an automaton) over a timed automaton A can be reduced to reachability on the product of A and the ECA for ¬*ϕ*. This makes event-clocks a convenient feature in specifications.

Recently, a symbolic enumeration algorithm for ECA was proposed [3]. It was noticed that when restricted to event-predicting clocks, the symbolic enumeration terminates without any additional checks (similar to the case of timers), whereas in the presence of event-recording clocks, one needs simulation techniques from the timed automata literature. The same work showed how to adapt the best known simulation technique from timed automata to the setting of ECA. However, as discussed above, for model-checking we need a model containing conventional clocks, timers, and event-clocks. To our knowledge, no tool can directly work on such models.

Our goal in this work is to provide a one-stop solution to real-time verification, be it reachability analysis or model-checking (over event-clock specifications), using models with clocks or models with timers. We consider a unified model of a timed automaton over variables that can simulate normal clocks, timers, and event-clocks. Here are our key contributions:

1. We define a new model of generalized timed automata (GTA), which has two types of variables, called *history* clocks and *future* clocks. History clocks generalize normal clocks as well as event-recording clocks, while future clocks generalize event-predicting clocks and timers. However, unlike event-clocks, clocks in GTA are not necessarily associated with events. We also consider a generic syntax that allows diagonal constraints between variables.


*Related Works.* The work that first introduced ECA also proposed a translation from ECA to timed automata. However, this translation is not efficient: in the worst case, it incurs a blowup in the number of clocks and states. In [27,28], an extrapolation approach using maximal constants was studied for ECA. However, it has been observed that simulation-based techniques are both more effective [14,16] and more efficient [5,24–26] than extrapolation for checking reachability. Recently, [3] proposed a zone-based reachability algorithm for diagonal-free ECA, using simulations for finiteness, but there was no accompanying implementation. Diagonal constraints have long been known to allow succinct modeling [15] for the class of timed automata, but only recently was a zone-based algorithm proposed that works directly on such automata. ECA with diagonals are more expressive than ECA [19]. In this work, we propose a zone-based algorithm for a unified model that subsumes ECA with diagonals.

The use of history clocks and prophecy clocks in ECA is in the same spirit as past and future modalities in temporal logics; this makes ECA an attractive model for writing timed specifications. Indeed, this has also led to the development of various temporal logics with event-clocks [1,23,36]. ECA with diagonal constraints have been well studied, for instance in the context of timeline-based planning [19,20]. Finally, while there have been substantial advances in the theory of ECA, to the best of our knowledge, the only tool that handles ECA is Tempo [37], and even this tool is restricted to history clocks.

*Structure of the Paper.* In Sect. 2 we start by defining the generalized model. Section 3 examines its expressiveness, while Sect. 4 deals with the reachability problem and the safe subclass. Section 5 develops the symbolic enumeration technique, while Sect. 6 explains how distance graphs can be extended to this setting. Section 7 is dedicated to finiteness. Finally, we provide our experimental results in Sect. 8 and conclude with Sect. 9. All the missing proofs can be found in the full version of the paper [2].

### **2 Generalized Timed Automata**

In this section we introduce the unified model. While we build on classical ideas from timed automata, almost every aspect is extended, and we highlight these changes below. We define *X* = *X<sub>H</sub>* ⊎ *X<sub>F</sub>* to be a finite set of real-valued variables called *clocks*, where *X<sub>H</sub>* is the set of *history clocks* and *X<sub>F</sub>* is the set of *future clocks*. History clocks always have a non-negative value and can increase arbitrarily along with time. Future clocks always have a non-positive value and can increase only until their values hit 0. History clocks simulate the usual clocks of timed automata and the recording clocks of event-clock automata (ECA), while future clocks simulate timers and the prophecy clocks of ECA. Both kinds of clocks can take a special "undefined" value which marks that they are inactive. To deal with this naturally, we consider an extension of the reals with +∞ and −∞, as in [3]. The difference here is that we also allow the so-called diagonal constraints.

**Extending Clock Constraints.** Let R̄ = R ∪ {−∞, +∞} denote the set of all real numbers along with −∞ and +∞. The usual *<* order on reals is extended to {−∞, +∞} by setting −∞ < *c* < +∞ for all *c* ∈ R, and −∞ < +∞. Similarly, Z̄ = Z ∪ {−∞, +∞} denotes the set of all integers along with −∞ and +∞. Let R<sub>≥0</sub> (resp. R<sub>≤0</sub>) be the set of non-negative (resp. non-positive) reals. Let C = {(◁, *c*) | *c* ∈ R̄ and ◁ ∈ {≤, *<*}} be the set of *weights*.

Let *X* ∪ {0} be the set obtained by extending the clocks of GTA with the special constant clock 0. Note that this clock always has the value 0. Let *Φ*(*X*) denote the set of clock constraints generated by the following grammar: *ϕ* ::= *x* − *y* ◁ *c* | *ϕ* ∧ *ϕ*, where *x*, *y* ∈ *X* ∪ {0}, (◁, *c*) ∈ C and *c* ∈ Z̄. The introduction of the special constant clock 0 allows us to treat constraints with just a single clock as special cases: the constraint *x* ◁ *c* is equivalent to *x* − 0 ◁ *c*, and the constraint *c* ◁ *x* is equivalent to 0 − *x* ◁ −*c*. We often write *x* = *c* as a shorthand for *x* ≤ *c* ∧ *c* ≤ *x*. Constraints of the form *x* − *y* ◁ *c* will be called *atomic constraints*. A constraint of the form *x* − *y* ◁ *c* is a *diagonal* (resp. *non-diagonal*) constraint if *x*, *y* ≠ 0 (resp. *x* = 0 or *y* = 0).

To evaluate the constraints allowed by *Φ*(*X*), we extend addition on real numbers with the convention that (+∞) + *α* = *α* + (+∞) = +∞ for all *α* ∈ R̄, and (−∞) + *β* = *β* + (−∞) = −∞ as long as *β* ≠ +∞. We also extend the unary minus operation from real numbers to R̄ by setting −(+∞) = −∞ and −(−∞) = +∞. Abusing notation, we write *β* − *α* for *β* + (−*α*). Notice that with this extended addition, the minus operation does not distribute over addition<sup>1</sup>.

**Extending Valuations.** A valuation of clocks is a function *v* : *X* ∪ {0} → R̄ which maps the special clock 0 to 0, history clocks to R<sub>≥0</sub> ∪ {+∞} and future clocks to R<sub>≤0</sub> ∪ {−∞}. We denote by V(*X*), or simply V, the set of valuations over *X*. We say that clock *x* is *defined* (resp. *undefined*) in *v* when *v*(*x*) ∈ R (resp. *v*(*x*) ∈ {−∞, +∞}). Let *x*, *y* ∈ *X* ∪ {0} be clocks (including 0) and let (◁, *c*) be a weight. For valuations *v* ∈ V, define *v* |= *y* − *x* ◁ *c* as *v*(*y*) − *v*(*x*) ◁ *c*.
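The extended arithmetic and the satisfaction relation can be prototyped directly. A minimal sketch, assuming our own helper names `ext_add`, `ext_neg` and `sat`, with valuations as Python dictionaries:

```python
INF = float("inf")

def ext_add(a, b):
    """Extended addition: (+inf) + a = a + (+inf) = +inf for all a,
    and (-inf) + b = b + (-inf) = -inf as long as b != +inf."""
    if a == INF or b == INF:
        return INF
    if a == -INF or b == -INF:
        return -INF
    return a + b

def ext_neg(a):
    """Unary minus with -(+inf) = -inf and -(-inf) = +inf."""
    return -a

def sat(v, y, x, op, c):
    """v |= y - x <= c (op '<=') or y - x < c (op '<')."""
    d = ext_add(v[y], ext_neg(v[x]))   # v(y) - v(x), extended arithmetic
    return d <= c if op == "<=" else d < c

v = {"0": 0.0, "x": -INF, "y": 2.0}
# (<=, +inf) is trivially true ...
assert sat(v, "y", "x", "<=", INF)
# ... but no non-trivial constraint holds when v(x) = -inf:
assert not sat(v, "y", "x", "<=", 5.0)
```

Note that Python's built-in `inf + (-inf)` yields `nan`, so `ext_add` implements the paper's convention explicitly; the footnote's failure of distributivity is visible as `ext_neg(ext_add(INF, -INF)) != ext_add(ext_neg(INF), ext_neg(-INF))`.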

<sup>1</sup> Notice that −(*a* + *b*) = (−*a*) + (−*b*) when *a* or *b* is finite or when *a* = *b*. But when *a* = +∞ and *b* = −∞, then −(*a* + *b*) = −∞ whereas (−*a*) + (−*b*) = +∞.

We say that a valuation *v* satisfies a constraint *ϕ* in *Φ*(*X*), denoted as *v* |= *ϕ*, when *v* satisfies all atomic constraints in *ϕ*.

By definition, we can easily check that the constraint *y* − *x* ◁ *c* is equivalent to *true* (resp. *false*) when (◁, *c*) = (≤, +∞) (resp. (◁, *c*) = (*<*, −∞)). Constraints that are equivalent to *true* or *false* will be called trivial, whereas all others are non-trivial constraints. If (◁, *c*) ≠ (≤, +∞) then *v* |= *y* − *x* ◁ *c* never holds when *v*(*x*) = −∞. Also, if *v*(*x*) = *v*(*y*) ∈ {−∞, +∞} then *v* |= *y* − *x* ◁ *c* only holds for (◁, *c*) = (≤, +∞). For a non-trivial constraint *y* − *x* ◁ *c*, we have


We abuse notation and, for *Y* ⊆ *X*, define *Y* ◁ *c* as ⋀<sub>*y*∈*Y*</sub> *y* ◁ *c*, and *Y* = *c* as ⋀<sub>*y*∈*Y*</sub> *y* = *c*. We denote by *v* + *δ* the *valuation* obtained from valuation *v* by increasing by *δ* ∈ R<sub>≥0</sub> the value of all clocks in *X*. Note that, from a given valuation, not every time elapse results in a valuation, since future clocks need to stay at most 0. For example, from a valuation with *v*(*x*) = −3 and *v*(*y*) = −2, where *x*, *y* are future clocks, one can elapse at most 2 time units.

**Extending Resets.** For history clocks, the reset operation sets the clock to 0. For future clocks, the reset operation discards all constraints on the clock, i.e., the clock is *released*. Given that the set of clocks is partitioned into history clocks and future clocks, we use the same notation [*R*]*v* for the change of clocks in *R*, whether it is a reset or a release. Formally, given a set of clocks *R* ⊆ *X*, we define [*R*]*v* as {*v*′ ∈ V | *v*′(*x*) = 0 ∀ *x* ∈ *R* ∩ *X<sub>H</sub>* and *v*′(*x*) = *v*(*x*) ∀ *x* ∉ *R*}. Observe that *the release operation* is implicit: each future clock in *R* may take any value (not necessarily the same) from [−∞, 0] in [*R*]*v*. Note that [*R*]*v* is a singleton when *R* contains only history clocks; this corresponds exactly to the reset operation in timed automata. In that case, we simply write *v*′ = [*R*]*v* instead of {*v*′} = [*R*]*v*. When *R* contains only future clocks, [*R*]*v* is the set of valuations obtained by releasing each clock in *R* while keeping the values of all other clocks unchanged. For *W* ⊆ V, we let [*R*]*W* = ⋃<sub>*v*∈*W*</sub> [*R*]*v*. We have [*R* ∪ *R*′]*W* = [*R*′]([*R*]*W*).
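On concrete valuations, time elapse and change can be sketched as follows (a toy model of ours: valuations are dictionaries, and `choice` stands in for the non-deterministic pick made by a release):

```python
INF = float("inf")

def max_delay(v, future):
    """Future clocks must stay <= 0, so the largest possible
    elapse is min(-v(x)) over the defined future clocks."""
    bounds = [-v[x] for x in future if v[x] != -INF]
    return min(bounds) if bounds else INF

def elapse(v, delta):
    # All clocks grow with time; -inf and +inf are absorbing under +delta.
    return {x: val + delta for x, val in v.items()}

def change(v, R, history, choice):
    """[R]v: reset history clocks in R to 0; release future clocks
    in R to values picked by `choice` (any value in [-inf, 0])."""
    w = dict(v)
    for x in R:
        w[x] = 0.0 if x in history else choice(x)
    return w

v = {"x": -3.0, "y": -2.0}            # two future clocks, as in the text
assert max_delay(v, {"x", "y"}) == 2.0
assert elapse(v, 2.0) == {"x": -1.0, "y": 0.0}
w = change(v, {"y"}, set(), lambda x: -5.0)   # release y to -5
assert w == {"x": -3.0, "y": -5.0}
```

The example reproduces the 2-time-unit bound from the paragraph above: elapsing by 2 drives *y* to 0, after which no further delay is possible.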

**Extending Guards and Transitions.** Before we define GTA, let us focus on the language used to specify transitions. In normal timed automata, as shown in Fig. 2, a transition reads a letter, checks a guard *g* ∈ *Φ*(*X<sub>H</sub>*) and then resets a subset *R* of (history) clocks. But any one transition performs only a single guard and reset pair, and one cannot interleave them.

**Fig. 2.** A transition of TA (left) and of a GTA (right)

We generalize this to our setting with history and future clocks, and also allow arbitrary interleavings of guards and changes (to model this with a TA, one may use a sequence of multiple transitions without delays in between). Formally, an *instantaneous timed program* is generated by the following grammar:

$$\mathrm{prog} ::= \mathrm{guard} \mid \mathrm{change} \mid \mathrm{prog};\ \mathrm{prog}$$

where guard = *g* ∈ *Φ*(*X*) and change = [*R*] for some *R* ⊆ *X*. While guard and change are atomic programs, prog; prog denotes sequential composition. The set of all programs generated by the above grammar is denoted Programs. On a transition, we then simply have a pair of a letter label and an instantaneous timed program, e.g., (*a,* prog) in Fig. 2 (right).

The semantics of programs on a transition must generalize the semantics of guards (defined using the satisfaction relation |= above) and of resets/releases (defined using [*R*] above). But there is an obvious difference between the two: a guard may be crossed only if the valuation before the guard satisfies it, whereas a *change* (reset or release) defines a relation between the valuations before and after the change. To capture both in a uniform way, we define the semantics of programs as relations on pairs of valuations. Formally, for *v*, *v*′ ∈ V and prog ∈ Programs, we define (*v*, *v*′) |= prog, more conveniently written as *v* →<sup>prog</sup> *v*′, inductively:


$$\begin{array}{l}
v \xrightarrow{g} v' \text{ if } v \models g \text{ and } v' = v, \\
v \xrightarrow{[R]} v' \text{ if } v' \in [R]v, \\
v \xrightarrow{\mathrm{prog}_1;\, \mathrm{prog}_2} v' \text{ if } \exists v'' \in \mathbb{V} \text{ such that } v \xrightarrow{\mathrm{prog}_1} v'' \text{ and } v'' \xrightarrow{\mathrm{prog}_2} v'.
\end{array}$$
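The three rules above can be sketched as a small interpreter (a toy sketch of ours: programs are encoded as lists of steps, and the `choices` iterator supplies the values picked non-deterministically by releases):

```python
def run(v, prog, history, choices=iter(())):
    """Execute an instantaneous timed program on valuation v.
    prog is a list of ('guard', atoms) and ('change', clocks) steps;
    an atom (y, x, op, c) stands for y - x op c, with '0' the constant
    clock.  Released future clocks take their values from `choices`.
    Returns the successor valuation, or None if a guard fails."""
    w = dict(v)
    for kind, arg in prog:
        if kind == "guard":
            for y, x, op, c in arg:
                d = w[y] - w[x]
                if not (d <= c if op == "<=" else d < c):
                    return None            # guard fails: no successor
        else:                              # change: reset or release
            for x in arg:
                w[x] = 0.0 if x in history else next(choices)
    return w

# A TA-style transition g; [R]: guard x = 1 (i.e. x <= 1 and -x <= -1),
# then reset the history clock x.
prog = [("guard", [("x", "0", "<=", 1.0), ("0", "x", "<=", -1.0)]),
        ("change", {"x"})]
assert run({"0": 0.0, "x": 1.0}, prog, history={"x"}) == {"0": 0.0, "x": 0.0}
```

Sequential composition is just the left-to-right loop; an intermediate valuation `w` plays the role of *v*″ in the last rule.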

Now, we have all the pieces necessary to define our generalized model.

**Definition 1 (Generalized timed automata).** *A* generalized timed automaton A *is a tuple* (*Q*, *Σ*, *X*, *Δ*, (*q*<sub>0</sub>, *g*<sub>0</sub>), (*Q<sub>f</sub>*, *g<sub>f</sub>*))*, where Q is a finite set of states, Σ is a finite alphabet of actions, X* = *X<sub>F</sub>* ⊎ *X<sub>H</sub>* *is a set of clocks partitioned into future and history clocks, the initialization condition is a pair comprising an initial state q*<sub>0</sub> ∈ *Q and an initial guard g*<sub>0</sub> ∈ *Φ*(*X*) *which must be satisfied by initial valuations; similarly, the final condition is a pair comprising a set of final states Q<sub>f</sub>* ⊆ *Q along with a final guard g<sub>f</sub> that must be satisfied by final valuations; and Δ* ⊆ (*Q* × *Σ* × Programs × *Q*) *is a finite set of transitions. Δ contains transitions of the form* (*q*, *a*, prog, *q*′)*, where q is the source state, q*′ *is the target state, a is the action triggering the transition, and* prog *is the instantaneous timed program that is executed in sequence (from left to right) while firing the transition.*

The semantics of a GTA A = (*Q*, *Σ*, *X*, *Δ*, (*q*<sub>0</sub>, *g*<sub>0</sub>), (*Q<sub>f</sub>*, *g<sub>f</sub>*)) is given by a transition system TS<sub>A</sub> whose states are *configurations* (*q*, *v*) of A, where *q* ∈ *Q* and *v* ∈ V is a valuation. A configuration (*q*, *v*) is initial if *q* = *q*<sub>0</sub> and *v* |= *g*<sub>0</sub>. A configuration (*q*, *v*) is accepting if *q* ∈ *Q<sub>f</sub>* and *v* |= *g<sub>f</sub>*. Transitions of TS<sub>A</sub> are of two forms: (1) *delay transitions*: (*q*, *v*) →<sup>δ</sup> (*q*, *v* + *δ*) if (*v* + *δ*) |= *X<sub>F</sub>* ≤ 0, and (2) *discrete transitions*: (*q*, *v*) →<sup>t</sup> (*q*′, *v*′) if *t* = (*q*, *a*, prog, *q*′) ∈ *Δ* and *v* →<sup>prog</sup> *v*′. Thus, a discrete transition *t* = (*q*, *a*, prog, *q*′), where prog = prog<sub>1</sub>; *...* ; prog<sub>*n*</sub>, can be taken from (*q*, *v*) if there are valuations *v*<sub>1</sub>, ..., *v<sub>n</sub>* such that *v* →<sup>prog<sub>1</sub></sup> *v*<sub>1</sub> →<sup>prog<sub>2</sub></sup> ··· →<sup>prog<sub>*n*</sub></sup> *v<sub>n</sub>* = *v*′. A *run* of a GTA is a finite sequence of transitions from an initial configuration of TS<sub>A</sub>. A run is *accepting* if its last configuration is accepting.

#### **3 Expressivity of GTA and Examples**

The GTA model defined above is rather expressive. Figure 3 illustrates an example which accepts words of the form *a<sup>n</sup>b<sup>m</sup>* with *m* ≤ *n*, where each *a* occurs at time 0, after which the *b*'s are seen one by one, at distance 1 from each other. The history clock *x* is used to ensure the timing constraint. For every *a* that is read, the future clocks *y*, *z* decrease by 1. Hence the future clocks *y*, *z* maintain the negation of the number of *a*'s seen. When the automaton starts reading *b*'s, the future clocks also start elapsing time, and since they cannot go above 0, the number of *b*'s is at most the number of *a*'s. Such a language cannot be accepted by a timed automaton, since for timed automata the untimed language obtained by removing the time stamps is always regular, and *a<sup>n</sup>b<sup>m</sup>* with *m* ≤ *n* is not. The GTA model is not only expressive, it is also convenient to use. To see this, we now show that three classical models of timed systems can be easily captured using GTA. We also illustrate the modeling convenience provided by GTA in Sect. 8, based on experiments.

**Fig. 3.** Example of a GTA

**Timed Automata.** Timed automata (TA) of Alur–Dill [9] can be modeled as GTA as follows: (1) The set of states of the GTA is the same as the set of states of the TA. (2) There are no future clocks in the GTA, and its history clocks are the clocks of the TA. (3) Each transition *q* →<sup>*a*,*g*,*R*</sup> *q*′ of the TA, where *g* is a guard, *a* a letter and *R* a subset of clocks to be reset, is replaced by a transition *q* →<sup>*a*,prog</sup> *q*′ where prog = *g*; [*R*]. (4) Initially, all clocks must be 0, captured by setting *g*<sub>0</sub> = (*X<sub>H</sub>* = 0). (5) The final guard is trivial: *g<sub>f</sub>* = True.

**Event-Clock Automata.** Event-clock automata (ECA) of [10] can be modeled as GTA as follows: (1) The set of states of the GTA is the same as the set of states of the ECA. (2) For each *a* ∈ *Σ*, the GTA has a history clock ←−*a* and a future clock −→*a*. (3) Each transition *q* →<sup>*a*,*g*</sup> *q*′ of the ECA, where *g* is a guard of the ECA and *a* a letter, is replaced by a transition *q* →<sup>*a*,prog</sup> *q*′ where prog := (−→*a* = 0); [−→*a*]; *g*; [←−*a*]. (4) At initialization, history clocks must be undefined (set to +∞), captured by *g*<sub>0</sub> = (*X<sub>H</sub>* = +∞). (5) At acceptance, all future clocks must be undefined, i.e., *g<sub>f</sub>* = (*X<sub>F</sub>* = −∞).

**Automata with Timers.** The third model we consider is that of automata with timers. Timers are timing constructs that are started/initialized with a certain time value at some point/event and *count down* to 0. They measure the time from when they were started until the timer hits 0; the event of hitting 0 is called a *timeout*. However, a timer can also be stopped by a *stop* event at any intermediate point, in which case it is freed for reuse later. Timers are a common construct in protocol specification, e.g., the ITU standard uses timers rather than clocks [30], as do Mealy machines with timers [31].

In our setting, a timer can be seen as a specific instance of a future clock. More precisely, automata with timers can be modeled as GTA as follows: (1) The set of states of the GTA is the same as the set of states of the automaton with timers. (2) The future clocks of the GTA are its timers, and there are no history clocks. (3) Initially, the timers are undefined, captured by *g*<sub>0</sub> = (*X<sub>F</sub>* = −∞), and *g<sub>f</sub>* = True. (4) A transition of the automaton with timers with action *a* from *q* to *q*′ is encoded as *q* →<sup>*a*,prog</sup> *q*′ with:


We note that the timer above differs from a prophecy clock (of ECA), though both are future clocks. Prophecy clocks are released only when the event is seen, so at that point the value of the prophecy clock must be 0. On the other hand, timers can be stopped and released even when their value is not 0. This subtle difference has a surprising impact when we allow diagonal guards.

### **4 The Reachability Problem for GTA**

We are interested in the *reachability problem* for GTA: given a GTA A, does it have an accepting run? For normal TA, the reachability problem is decidable and PSPACE-complete, as shown in [9]. This was proved using the so-called region abstraction, i.e., by exhibiting a finite time-abstract bisimulation. However, such an abstraction does not exist for GTA. As explained in the previous section, GTA capture ECA, and as shown in [27,28], there exist ECA for which there is no finite time-abstract bisimulation. Reachability is nevertheless still decidable in the specific case of ECA, as shown in [10]. We note that the ECA model of [27,28] has no diagonal constraints; in this case they show decidability via zone-extrapolation. In [3], another approach to decidability, via zone simulations, is shown, but again diagonal constraints are disallowed. Even more critically, GTA can capture timers, and a priori we can have diagonal constraints even among timers. So the question we ask is whether reachability is still decidable for GTA. Surprisingly, the answer is no. The intuition is that with future clocks and diagonal constraints, we get the ability to count (cf. Fig. 3).

#### **Theorem 2.** *Reachability for GTA is undecidable.*

*Proof.* We reduce from counter machines. Given a counter machine, we build a GTA with one future clock *y<sub>C</sub>* for each counter *C* and one extra future clock *z*. The reduction uses diagonal constraints between *z* and the future clocks *y<sub>C</sub>*.

Initially and after each transition, the value of the future clock *z* will be 0. Since a future clock has to be non-positive, time elapse is impossible. As an invariant, the value of the future clock *y<sub>C</sub>* is the negation of the value of counter *C*. The operations on counter *C* are encoded with the following programs: (1) zero<sub>*C*</sub> = (*y<sub>C</sub>* = 0); (2) inc<sub>*C*</sub> = [*z*]; (*z* = *y<sub>C</sub>* − 1); [*y<sub>C</sub>*]; (*y<sub>C</sub>* = *z*); [*z*]; (*z* = 0); (3) dec<sub>*C*</sub> = (*y<sub>C</sub>* ≤ −1); [*z*]; (*z* = *y<sub>C</sub>* + 1); [*y<sub>C</sub>*]; (*y<sub>C</sub>* = *z*); [*z*]; (*z* = 0). In the programs inc<sub>*C*</sub> and dec<sub>*C*</sub>, each release of a future clock is followed by a constraint which restricts the value non-deterministically chosen during the release. For instance, [*z*]; (*z* = *y<sub>C</sub>* − 1) is equivalent to the assignment *z* := *y<sub>C</sub>* − 1. Hence, the overall effect of inc<sub>*C*</sub> is *y<sub>C</sub>* := *y<sub>C</sub>* − 1, leaving all other clocks unchanged, including the invariant *z* = 0.
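Since each release here is immediately followed by an equality guard, the released value is forced, and the three programs act deterministically on valuations. A sketch of this effect (a toy transcription of ours, with clocks *y<sub>C</sub>* and *z* as dictionary entries):

```python
def zero_C(v):
    # zero_C = (y_C = 0): a pure guard, no change.
    return v if v["y"] == 0.0 else None

def inc_C(v):
    # [z]; (z = y_C - 1); [y_C]; (y_C = z); [z]; (z = 0)
    z = v["y"] - 1.0             # release z; guard forces z = y_C - 1
    y = z                        # release y_C; guard forces y_C = z
    return {"y": y, "z": 0.0}    # release z; guard forces z = 0

def dec_C(v):
    # (y_C <= -1); [z]; (z = y_C + 1); [y_C]; (y_C = z); [z]; (z = 0)
    if not v["y"] <= -1.0:
        return None              # guard y_C <= -1 fails
    return {"y": v["y"] + 1.0, "z": 0.0}

# y_C tracks the negation of counter C: inc, inc, dec leaves C = 1.
v = {"y": 0.0, "z": 0.0}
v = inc_C(inc_C(v))
assert v["y"] == -2.0
v = dec_C(v)
assert v == {"y": -1.0, "z": 0.0}
assert zero_C(v) is None         # C != 0, so the zero test fails
```

The run confirms both invariants from the proof: *z* is 0 after every program, and *y<sub>C</sub>* equals minus the counter value.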

Given this negative result, what can we do? A careful look at the proof tells us that it is the interplay between diagonal constraints and arbitrary releases of future clocks that leads to undecidability. More precisely, the encoding depends on the fact that the clocks *z* and *y<sub>C</sub>* used in diagonal constraints (*z* = *y<sub>C</sub>* − 1, *z* = *y<sub>C</sub>* + 1 and *y<sub>C</sub>* = *z*) may have arbitrary values when they are released. This suggests a restricted subclass, which we formalize next.

**Definition 3 (Safe GTA).** *Let X<sub>D</sub>* ⊆ *X<sub>F</sub> be a subset of future clocks. A program* prog = *g*<sub>1</sub>; [*R*<sub>1</sub>]; *g*<sub>2</sub>; [*R*<sub>2</sub>]; *...* ; *g<sub>k</sub>*; [*R<sub>k</sub>*]; *g*<sub>*k*+1</sub> *is X<sub>D</sub>-safe if*


*A GTA* A *is X<sub>D</sub>-safe if it only uses X<sub>D</sub>-safe programs on its transitions and the initial guard g*<sub>0</sub> *sets each history clock to either* 0 *or* ∞*.*

Observe that the three examples discussed in Sect. 3 are safe. Timed automata do not have future clocks, so the condition is vacuously true. In ECA, event-predicting clocks are always checked for 0 before being released; hence ECA are safe with *X<sub>D</sub>* = *X<sub>F</sub>*. Automata with timers without diagonal constraints are trivially safe with *X<sub>D</sub>* = ∅. The importance of safety lies in the following theorem, which is the centerpiece of this article.

#### **Theorem 4.** *Reachability for X<sub>D</sub>-safe GTA is decidable.*

We establish this theorem by exhibiting a finite, sound and complete zone-based reachability algorithm for *X<sub>D</sub>*-safe GTA. If the given GTA is not *X<sub>D</sub>*-safe, then we lose the proof of termination (unsurprisingly, since the problem is undecidable), but we still maintain soundness. Thus, even for such GTA, when our algorithm does terminate it gives the correct answer.

### **5 Symbolic Enumeration**

We adapt the G-simulation framework, presented in [26] for timed automata with diagonal constraints, to GTA. Diagonal constraints offer succinct modeling [15], but are quite challenging to handle efficiently in zone-based algorithms, and have led to pitfalls in the past: [14] showed that the erstwhile algorithm based on zone-extrapolations that was implemented in tools is incorrect for models with diagonal constraints; moreover, no extrapolation-based method can work for automata with diagonal constraints. The simulation framework bypasses this impossibility result and is the state of the art for timed automata with diagonal constraints. The framework was extended to event-clock automata without diagonal constraints in [3]. We show that the ideas from [26] and [3] can be suitably combined to give an effective procedure for safe GTA. This extension to GTA enables us to understand the mechanics of diagonal constraints over future clocks.

The algorithm based on the G-simulation framework involves:


We next adapt the static analysis to the GTA setting. The algorithm for the zone graph computation and the implementation of the simulation relation over zones are taken off-the-shelf from [26] and [3], except for a minor adaptation to include diagonal constraints involving future clocks. What is absent, and requires a non-trivial analysis, is the proof of termination. Therefore, we mainly focus on this aspect and devote Sect. 7 to the termination argument.

*G-Simulation and the Static Analysis for GTA.* We fix a GTA A = (*Q*, *Σ*, *X*, *T*, (*q*<sub>0</sub>, *g*<sub>0</sub>), (*Q<sub>f</sub>*, *g<sub>f</sub>*)) for this section. Our goal is to define a simulation relation on the semantics of A, i.e., on TS(A). In the subsequent sections we lift this relation to zones and show its finiteness. A simulation relation on TS(A) is a reflexive, transitive relation (*q*, *v*) ≼ (*q*, *v*′) relating configurations with the same control state such that: (1) for every delay (*q*, *v*) →<sup>δ</sup> (*q*, *v* + *δ*), we have (*q*, *v*′) →<sup>δ</sup> (*q*, *v*′ + *δ*) and (*q*, *v* + *δ*) ≼ (*q*, *v*′ + *δ*); (2) for every transition *t*, if (*q*, *v*) →<sup>t</sup> (*q*<sub>1</sub>, *v*<sub>1</sub>) for some valuation *v*<sub>1</sub>, then (*q*, *v*′) →<sup>t</sup> (*q*<sub>1</sub>, *v*′<sub>1</sub>) for some valuation *v*′<sub>1</sub> with (*q*<sub>1</sub>, *v*<sub>1</sub>) ≼ (*q*<sub>1</sub>, *v*′<sub>1</sub>).

For any set *G* of atomic constraints, we define a *preorder* ≼<sub>*G*</sub> on valuations:

$$v \preceq_G v' \quad \text{if } \forall \varphi \in G,\ \forall \delta \ge 0, \quad v + \delta \models \varphi \implies v' + \delta \models \varphi.$$

Notice that in the definition above, we *do not* restrict *δ* to values such that *v* + *δ* is a valuation: we may have *v*(*x*) + *δ* > 0 for some *x* ∈ *X<sub>F</sub>*. In usual timed automata, this question does not arise, as elapsing any *δ* from any given valuation always results in a valuation. But this point is crucial for the proof of Theorem 5 below.

Intuitively, the preorder above is a simulation wrt the constraints in *G* even after time elapse. But we need it to also be a simulation wrt discrete transitions. To achieve this, the set of constraints *G* should depend on the available discrete transitions. In fact, we define a map G from states to sets of constraints, in such a way that it captures the simulation wrt the discrete actions. In other words, our focus will be to choose state-dependent sets of constraints (given by the map G) depending on A such that the resulting preorder induces a simulation on TS(A).

As a first step towards this, we define, for any set *G* of constraints and any program prog, a set of constraints *G*′ = pre(prog, *G*) such that, if *v* ≼<sub>*G*′</sub> *v*′ and *v* →<sup>prog</sup> *v*<sub>1</sub>, then there exists *v*′ →<sup>prog</sup> *v*′<sub>1</sub> such that *v*<sub>1</sub> ≼<sub>*G*</sub> *v*′<sub>1</sub>. This set is defined inductively as follows (*G* is a set of atomic constraints, *R* is a set of clocks, *g* is an *arbitrary* constraint, *y* − *x* ◁ *c* is an *atomic* constraint):

$$\begin{aligned}
\mathsf{pre}(\mathrm{prog}_1; \mathrm{prog}_2, G) &= \mathsf{pre}(\mathrm{prog}_1, \mathsf{pre}(\mathrm{prog}_2, G)) \\
\mathsf{pre}(g, G) &= \mathsf{split}(g) \cup G \\
\mathsf{pre}([R], \{y - x \mathrel{\triangleleft} c\}) &= \begin{cases}
\{y - x \mathrel{\triangleleft} c\} & \text{if } x, y \notin R \\
\{y \mathrel{\triangleleft} c\} & \text{if } x \in R,\ y \notin R \\
\{-x \mathrel{\triangleleft} c\} & \text{if } x \notin R,\ y \in R \\
\emptyset & \text{if } x, y \in R
\end{cases}
\end{aligned}$$

where split(*g*) is the set of atomic constraints occurring in *<sup>g</sup>*.
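The pre computation can be transcribed directly (our encoding, not the paper's implementation: atoms as `(y, x, op, c)` tuples, `'0'` for the constant clock, a program as a list of guard/change steps, processed right to left):

```python
def pre_change(R, atom):
    """pre([R], {y - x < c}): drop the side(s) affected by R."""
    y, x, op, c = atom
    if y in R and x in R:
        return set()               # both clocks changed: constraint vanishes
    if x in R:
        return {(y, "0", op, c)}   # keep y < c
    if y in R:
        return {("0", x, op, c)}   # keep -x < c
    return {atom}

def pre(prog, G):
    """pre(prog_1; ...; prog_n, G), unfolding the sequencing rule."""
    G = set(G)
    for kind, arg in reversed(prog):
        if kind == "guard":
            G |= set(arg)          # split(g) ∪ G
        else:
            G = set().union(*(pre_change(arg, a) for a in G))
    return G

prog = [("guard", [("x", "0", "<=", 5)]), ("change", {"x"})]
G = {("y", "x", "<", 3)}
# The change rewrites y - x < 3 into y < 3; the guard contributes x <= 5.
assert pre(prog, G) == {("y", "0", "<", 3), ("x", "0", "<=", 5)}
```

A least-fixpoint iteration of equation (1) below can then be run by repeatedly applying `pre` along all transitions until the sets G(*q*) stabilize.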

Now, the choice of a suitable *G* is obtained by static analysis, along the lines of what was done for timed automata with diagonals [24–26], but adapted to our more powerful model. More precisely, we define the map G from *Q* to sets of atomic constraints as the least fixpoint of the set of equations:

$$\mathcal{G}(q) = \{x \le 0 \mid x \in X\_F\} \cup \bigcup\_{q \xrightarrow{a, \text{prog}} q'} \mathsf{pre}(\text{prog}, \mathcal{G}(q')) \tag{1}$$

Finally, based on *G* and the G(*q*) computation, we define a preorder ≼<sub>A</sub> between configurations of TS(A) by (*q*, *v*) ≼<sub>A</sub> (*q*′, *v*′) if *q* = *q*′ and *v* ≼<sub>G(*q*)</sub> *v*′. We then show that ≼<sub>A</sub> is indeed a simulation relation.

#### **Theorem 5.** *The relation* ≼<sub>A</sub> *is a simulation on the transition system* TS<sub>A</sub>*.*

*Zones for GTA and the Zone Graph Computation*. Roughly, *zones* [12] are sets of valuations that can be represented efficiently using constraints between differences of clocks. In this section, we introduce an analogous notion for generalized timed automata. We consider *GTA zones*, or simply *zones*, which are special sets of valuations of GTA. A GTA zone is a set of valuations satisfying a conjunction of constraints of the form $y - x \triangleleft c$, where $x, y \in X \cup \{0\}$, $c \in \mathbb{Z}$ and $\triangleleft \in \{\le, <\}$. Thus zones are an abstract representation of sets of valuations. An abstract configuration, also called a *node*, is a pair consisting of a state and a zone. Firing a transition $t := (q, a, \mathrm{prog}, q')$ in a GTA $\mathcal{A}$ from node $(q, Z)$ results in another node, following a sequence of operations that we now define.

*GTA Zone Operations.* Let $g$ be a guard, $R \subseteq X$ a set of clocks and $Z$ a GTA zone.


*Successor Computation.* We can show that starting from a zone *Z*, the successors after the above operations are also zones (see Theorem 29 in [2]). A guard *g* can be seen as yet another zone and hence guard intersection is just an intersection operation between two zones. Similarly, the change operation preserves zones. Finally, as is usual with timed automata, zones are closed under the time elapse operation.

Thus, for a transition $t := (q, a, \mathrm{prog}, q')$ and a node $(q, Z)$, we can define the successor node $(q', Z')$, and we write $(q, Z) \xrightarrow{t} (q', Z')$, where $Z'$ is the zone computed by the following sequence of operations. Let $\mathrm{prog} = \mathrm{prog}_1; \ldots; \mathrm{prog}_n$, where each $\mathrm{prog}_i$ is an atomic program, i.e., a guard or a change. Then we define zones $Z_1, \ldots, Z_{n+1}$ where $Z_1 = Z$, $Z' = \overrightarrow{Z_{n+1}}$ (the time elapse of $Z_{n+1}$), and for each $1 \le i \le n$, $Z_{i+1} = Z_i \cap g_i$ if $\mathrm{prog}_i$ is a guard $g_i$, and $Z_{i+1} = [R_i]Z_i$ if $\mathrm{prog}_i$ is a change $[R_i]$.
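The successor computation is thus a left-to-right fold of the atomic steps, followed by a time elapse. A minimal sketch, deliberately generic in the three zone operations (guard intersection, change, time elapse), which are passed in as callables and left abstract here:

```python
def successor(Z, prog, intersect, change, elapse):
    """Z_1 = Z; Z_{i+1} = Z_i ∩ g_i (guard) or [R_i]Z_i (change); Z' = elapse(Z_{n+1})."""
    for kind, arg in prog:
        Z = intersect(Z, arg) if kind == 'guard' else change(arg, Z)
    return elapse(Z)
```

The concrete distance-graph implementations of the three operations are the subject of the next section; the fold itself is independent of that representation.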

Now, we can lift zone graphs and simulations from TA to GTA and obtain a symbolic reachability algorithm for GTA.

**Definition 6 (GTA zone graph).** *Given a GTA $\mathcal{A}$, its* GTA zone graph*, denoted* $\mathrm{GZG}(\mathcal{A})$*, is defined as follows. Nodes are of the form $(q, Z)$ where $q$ is a state and $Z$ is a GTA zone. The initial node is $(q_0, \overrightarrow{Z_0})$ where $q_0$ is the initial state and $Z_0$ is the set of all valuations which satisfy the* initial constraint *$g_0$: $Z_0$ is given by $g_0 \wedge (X_F \le 0) \wedge (X_H \ge 0)$. For every node $(q, Z)$ and every transition $t := (q, a, \mathrm{prog}, q')$ of $\mathcal{A}$, there is a transition $(q, Z) \xrightarrow{t} (q', Z')$ in the GTA zone graph. A node $(q, Z)$ is accepting if $q \in Q_f$ and $Z \cap g_f$ is non-empty, i.e., there exists a valuation in $Z$ satisfying the final constraint.*

Similar to the case of zone graphs for timed automata and event zone graphs for ECA, the GTA zone graph can be used to decide reachability for generalized timed automata. A node $(q, Z)$ is said to be reachable (in $\mathcal{A}$) if there is a path from the initial node $(q_0, \overrightarrow{Z_0})$ to $(q, Z)$ in $\mathrm{GZG}(\mathcal{A})$. Thus, reachability of a final state in $\mathcal{A}$ reduces to checking reachability of an accepting node in $\mathrm{GZG}(\mathcal{A})$. However, as in the case of zone graphs for timed automata, $\mathrm{GZG}(\mathcal{A})$ is not guaranteed to be finite. Hence, we need to compute a finite truncation of the GTA zone graph which is still sound and complete for reachability.

**Definition 7 (Simulation on GTA zones and finiteness).** *Let $\preceq$ be a simulation relation on $\mathrm{TS}(\mathcal{A})$. For two GTA zones $Z, Z'$, we say $(q, Z) \preceq (q, Z')$ if for every $v \in Z$ there exists $v' \in Z'$ such that $(q, v) \preceq (q, v')$. The simulation $\preceq$ is said to be finite if for every sequence $(q, Z_1), (q, Z_2), \ldots$ of* reachable *nodes, there exists $j > i$ such that $(q, Z_j) \preceq (q, Z_i)$.*

Now, the reachability algorithm, as in TA, enumerates the nodes of the GTA zone graph and uses the simulation $\preceq_{\mathcal{A}}$ from Theorem 5 to truncate nodes that are smaller with respect to the simulation. In Sect. 7, we will show that $\preceq_{\mathcal{A}}$ is finite when $\mathcal{A}$ is safe, which implies that the reachability algorithm terminates. But before that, we discuss the issue of implementability.
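The enumeration itself is the usual waiting/passed-list scheme: a freshly dequeued node is discarded when it is simulated by an already-explored node. A minimal sketch, parameterized by successor, subsumption and acceptance tests, all of which are assumed callables supplied by the zone machinery:

```python
from collections import deque

def symbolic_reach(init, succ, subsumed, accepting):
    """BFS over the zone graph, pruning nodes subsumed by an explored node."""
    passed = []                 # explored nodes kept for subsumption checks
    work = deque([init])
    while work:
        node = work.popleft()
        if any(subsumed(node, p) for p in passed):
            continue            # node ⪯ p for some explored p: prune
        if accepting(node):
            return True
        passed.append(node)
        work.extend(succ(node))
    return False
```

Termination of this loop is exactly what finiteness of the simulation (Definition 7) guarantees on reachable nodes.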

#### **6 Computing with GTA Zones Using Distance Graphs**

To implement the reachability algorithm described above, we will view zones as *distance graphs*, as is usually done in the literature [12].

Recall the notion of weights: $\mathcal{C} = \{(\triangleleft, c) \mid c \in \overline{\mathbb{R}} \text{ and } \triangleleft \in \{\le, <\}\}$. An order relation $<$ between weights is defined as $(\triangleleft, c) < (\triangleleft', c')$ when either (1) $c < c'$, or (2) $c = c'$ and $\triangleleft$ is $<$ while $\triangleleft'$ is $\le$. Note that since $(<, -\infty) < (\le, -\infty) < (\triangleleft, c) < (<, \infty) < (\le, \infty)$ for all $c \in \mathbb{R}$, this relation is a total order and therefore the min of a finite set of weights is well defined. We also use the commutative and associative sum operation on weights defined in [4]. If $c, c' \in \mathbb{R}$ are finite, the definition is as usual: $(\triangleleft, c) + (\triangleleft', c') = (\triangleleft'', c + c')$ where $\triangleleft'' = \le$ if $\triangleleft = \triangleleft' = \le$, and $\triangleleft'' = <$ otherwise. Infinite weights $\alpha, \beta$ from the list $(<, +\infty), (\le, -\infty), (\le, +\infty), (<, -\infty)$ are all absorbent w.r.t. weaker weights: $\alpha + \beta = \beta + \alpha = \alpha$ if $\alpha$ is stronger than $\beta$ (i.e., $\alpha$ is listed after $\beta$). Also, $\alpha + (\triangleleft, c) = \alpha$ if $c \in \mathbb{R}$ is finite.
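To make the algebra concrete, here is a small Python encoding (our own choice of representation): a weight is a pair `(strict, c)` with `strict = True` for $<$ and `False` for $\le$, with the total order and the absorbing sum as above.

```python
from math import inf

LE, LT = False, True            # (≤, c) and (<, c)

def key(w):
    """Sort key realizing the total order: (<, c) is strictly below (≤, c)."""
    s, c = w
    return (c, 0 if s else 1)

def wmin(a, b):
    """min of two weights under the total order."""
    return a if key(a) <= key(b) else b

# absorption strength, weakest to strongest: (<,+∞), (≤,−∞), (≤,+∞), (<,−∞)
_RANK = {(LT, inf): 0, (LE, -inf): 1, (LE, inf): 2, (LT, -inf): 3}

def wadd(a, b):
    """Sum of weights: pointwise on finite constants, absorbing on infinite ones."""
    (sa, ca), (sb, cb) = a, b
    a_inf, b_inf = abs(ca) == inf, abs(cb) == inf
    if a_inf and b_inf:
        return a if _RANK[a] >= _RANK[b] else b   # the stronger weight wins
    if a_inf:
        return a                                  # infinite absorbs finite
    if b_inf:
        return b
    return (sa or sb, ca + cb)                    # '≤' only if both are '≤'
```

For example, $(\le, 2) + (<, 3) = (<, 5)$ and $(\le, +\infty) + (<, -\infty) = (<, -\infty)$, since $(<, -\infty)$ is the strongest weight in the list.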

A distance graph $\mathcal{G}$ is a weighted directed graph without self-loops, with vertex set $X \cup \{0\} = X_F \cup X_H \cup \{0\}$, and edges labeled with weights from $\mathcal{C} \setminus \{(<, -\infty)\}$. We define its semantics $[\![\mathcal{G}]\!] := \{v \in \mathcal{V} \mid v \models y - x \triangleleft c \text{ for all edges } x \xrightarrow{(\triangleleft, c)} y \text{ in } \mathcal{G}\}$. The weight of edge $x \to y$ is denoted $\mathcal{G}_{xy}$, and we set $\mathcal{G}_{xy} = (\le, \infty)$ if there is no edge $x \to y$. The weight of a path is the sum of the weights of its edges. A cycle in $\mathcal{G}$ is said to be negative if its weight is strictly less than $(\le, 0)$.
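Membership of a valuation in $[\![\mathcal{G}]\!]$ is then a direct check of every edge. A sketch for finite valuations, representing a graph as a dict from vertex pairs to weights and naming the zero clock `'0'` (both representation choices are ours):

```python
def satisfies(v, G):
    """v ∈ [[G]]: v(y) − v(x) ◁ c for every edge (x, y) of weight (strict?, c)."""
    for (x, y), (strict, c) in G.items():
        d = v[y] - v[x]
        if not (d < c if strict else d <= c):
            return False
    return True
```

Here `v` is a dict mapping clocks (and `'0'`, with value 0) to reals; infinite clock values would need the extended arithmetic above and are omitted from this sketch.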

In classical timed automata, the significance of distance graphs stems from the observation that a distance graph has no negative cycle iff its semantics is non-empty. This property does not immediately hold for distance graphs over the extended algebra [4, Section 4.2]. However, we can convert a distance graph $\mathcal{G}$ (in time polynomial in the number of clocks) into a *standard form* where this characterization continues to hold. First, we set $\mathcal{G}'_{0x} = \min(\mathcal{G}_{0x}, (\le, 0))$ for $x \in X_F$ and $\mathcal{G}'_{x0} = \min(\mathcal{G}_{x0}, (\le, 0))$ for $x \in X_H$. Moreover, if $x \in X_F$ then we set $\mathcal{G}'_{x0} = \min(\mathcal{G}_{x0}, (<, \infty))$ if $\mathcal{G}_{xy} \ne (\le, \infty)$ for some $y \ne x$; otherwise we keep $\mathcal{G}'_{x0} = \mathcal{G}_{x0}$. Similarly, if $y \in X_H$ then we set $\mathcal{G}'_{0y} = \min(\mathcal{G}_{0y}, (<, \infty))$ if $\mathcal{G}_{xy} \ne (\le, \infty)$ for some $x \ne y$; otherwise we keep $\mathcal{G}'_{0y} = \mathcal{G}_{0y}$. Finally, for $x, y \in X$ with $x \ne y$ we set $\mathcal{G}'_{xy} = \mathcal{G}_{xy}$. The graph $\mathcal{G}'$ constructed above is called the standardization of $\mathcal{G}$; it is equivalent to $\mathcal{G}$ (i.e., $[\![\mathcal{G}']\!] = [\![\mathcal{G}]\!]$) and it has a negative cycle iff its semantics $[\![\mathcal{G}']\!]$ is empty [4].

Now, suppose $\mathcal{G}$ (in standard form) has no negative cycle. Then we construct $\mathcal{G}'$ by replacing the weight of each edge $x \to y$ by the minimum of the weights of the paths from $x$ to $y$ in $\mathcal{G}$. Such a $\mathcal{G}'$ is called the *normalization* of $\mathcal{G}$ and has several useful properties.
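Normalization is an all-pairs shortest-path computation over the weight algebra. A Floyd–Warshall sketch could look as follows, with the weight order and sum inlined, the graph given as an $n \times n$ matrix indexed by clocks (no-edge entries being $(\le, \infty)$, diagonal entries $(\le, 0)$), and the input assumed to be already in standard form — the matrix layout and these conventions are our assumptions:

```python
from math import inf

LE, LT = False, True                          # (≤, c), (<, c)

def wmin(a, b):                               # total order on weights
    ka = (a[1], 0 if a[0] else 1)
    kb = (b[1], 0 if b[0] else 1)
    return a if ka <= kb else b

def wadd(a, b):                               # absorbing sum over the algebra
    rank = {(LT, inf): 0, (LE, -inf): 1, (LE, inf): 2, (LT, -inf): 3}
    if abs(a[1]) == inf or abs(b[1]) == inf:
        if abs(a[1]) == inf and abs(b[1]) == inf:
            return a if rank[a] >= rank[b] else b
        return a if abs(a[1]) == inf else b
    return (a[0] or b[0], a[1] + b[1])

def normalize(G):
    """Floyd–Warshall: returns (has_negative_cycle, normalized weight matrix)."""
    n = len(G)
    D = [row[:] for row in G]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                D[i][j] = wmin(D[i][j], wadd(D[i][k], D[k][j]))
    # a cycle through i is negative iff D[i][i] drops strictly below (≤, 0)
    neg = any(wmin(D[i][i], (LE, 0)) != (LE, 0) for i in range(n))
    return neg, D
```

On a graph with edges $0 \to 1$ of weight $(\le, 5)$ and $1 \to 0$ of weight $(<, -5)$, the cycle has weight $(<, 0) < (\le, 0)$, so a negative cycle is reported.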

Let $Z$ be a non-empty zone. Writing the constraints in $Z$ as a distance graph, followed by standardizing and normalizing it, results in its *canonical distance graph* $\mathcal{G}(Z)$: we have $[\![\mathcal{G}(Z)]\!] = Z$ and $\mathcal{G}(Z)$ is minimal among the standard graphs $G$ with $[\![G]\!] = Z$. We denote by $Z_{xy}$ the weight of the edge $x \to y$ in $\mathcal{G}(Z)$.

[3] contains the algorithms for the zone operations when there are no diagonal constraints. Successor computation can be done in $O(|X|^2 \cdot |g|)$ time and the simulation check in $O(|X|^2)$. Incorporating intersection with diagonal constraints requires an additional standardization step, since diagonal constraints may break standard form. A detailed explanation of the successor computation of zones is provided in [2]. For the simulation, the algorithm from [26] is used. However, in the presence of diagonal constraints the simulation check becomes NP-complete in general, and the algorithm makes use of heuristics that allow for a faster check in practice. What remains is to show that $\preceq_{\mathcal{A}}$ is a finite simulation for $X_D$-safe GTA.

### **7 Finiteness of the Simulation Relation**

In this section, we show that the simulation relation $\preceq_{\mathcal{A}}$ proposed in Sect. 5 is finite for safe GTA, which proves termination of the symbolic enumeration-based reachability algorithm. We do this in two parts. First, we show that the zones reached during the enumeration satisfy some invariants; in particular, only finitely many values occur in constraints among future clocks. This is, however, not necessarily true for history clocks, and there the simulation comes into play. In the second part of the proof, we combine the invariants with an equivalence relation to show finiteness of the simulation. Below, we sketch these arguments and provide intuition, leaving formal details to [2] due to lack of space.

Throughout this section, we fix an $X_D$-safe GTA $\mathcal{A}$. Let $M = \max\{|c| \mid c \in \mathbb{Z} \text{ is used in some constraint of } \mathcal{A}\}$, called the maximal constant of $\mathcal{A}$. We say that a zone $Z$ is reachable if there is some reachable node $(q, Z)$ in $\mathrm{GZG}(\mathcal{A})$.

**Part 1: Invariants on zones.** We start by showing an important property of reachable zones: closure under valuations that agree on the value of history clocks, and satisfy the same set of safe constraints involving non-history clocks.

We say that a constraint $x - y \triangleleft c$ is *$M$-bounded* if either $c \in \mathbb{R}$ with $|c| \le M$, or $c \in \{-\infty, +\infty\}$. It is *$X_D$-safe* if $x, y \in X_F$ implies $x, y \in X_D$. We say that it is $(X_D, M)$-safe if it is both $M$-bounded and $X_D$-safe.

**Lemma 8.** *Let $v, v' \in \mathcal{V}$ be such that $v{\downarrow}_{X_H} = v'{\downarrow}_{X_H}$ and, for all $(X_D, M)$-safe constraints $y - x \triangleleft c$ with $x, y \in X_F \cup \{0\}$, we have $v \models y - x \triangleleft c$ if and only if $v' \models y - x \triangleleft c$. Let $Z$ be a* reachable *zone. Then, $v \in Z$ if and only if $v' \in Z$.*

The proof (given in [2]) works by establishing that the property is true in the initial zone, and showing that it is invariant under the zone operations used to compute $\mathrm{GZG}(\mathcal{A})$. This proof crucially uses the fact that $\mathcal{A}$ is $X_D$-safe. For the case of releasing a clock $x \in X_F \setminus X_D$, we use the fact that a diagonal constraint involving $x$ may not use another future clock. For the case of releasing a clock $x \in X_D$, we use the fact that the value of the clock must be $0$ or $-\infty$ just before the release. As a non-example, consider Fig. 3. Here, $X_D = \{y, z\}$ and $M = 1$. After two iterations of $a$, the zone $Z_2$ reached is $x = 0 \wedge y = z = -2$. Pick $v : x = 0, y = z = -2$ and $v' : x = 0, y = z = -3$. Notice that both of them satisfy the same set of $(X_D, M)$-safe constraints, but $v \in Z_2$ while $v' \notin Z_2$. Indeed, the automaton is not $X_D$-safe since $y$ and $z$ are released arbitrarily.

From Lemma 8, we get the following corollary (with a more precise statement and proof in [2]). Namely, if a reachable zone $Z$ contains a valuation $v$ in which the difference between two future clocks $x, y$ (including the zero clock) is finite and negative enough, then $Z$ contains valuations where the difference between $x$ and $y$ takes any finite, negative enough value.

**Corollary 9.** *Let $Z$ be a reachable zone and let $v \in Z$. Let $n = \max(1, |X_D|)$. For all $x, y \in X_F \cup \{0\}$, if $-\infty < v(x) - v(y) < -nM$ then, for every $\alpha$ with $-\infty < \alpha < -nM$, we have a valuation $v' \in Z$ with $v'(x) - v'(y) = \alpha$.*

Notice that the property above does not hold if we simply take $n = 1$. For instance, if we have two clocks $x, z \in X_D$ then applying the $(X_D, M)$-safe program $[x, z];\ z = -M \wedge x - z = -M$ from $\mathcal{V}$ results in a zone $Z$ where all valuations $v$ satisfy $v(x) = -2M$. So the property fails with $n = 1$, $x$, and $y = 0$. This is a noteworthy difference between models with and without diagonals.

Using Corollary 9, we can prove the main invariants satisfied by the zones obtained during the enumeration. Essentially, the weights of edges involving non-history clocks come from a finite set which depends on the number of future clocks in *<sup>X</sup>D* and the maximum constant *<sup>M</sup>* of the automaton. This also induces an invariant on the constraint between a history clock and a future clock.

Before stating the result, we first give two technical lemmas from [4] that we use extensively in the proof.

#### **Lemma 10 ([4]).**

- $\alpha \triangleleft c$ *iff* $(\le, \alpha) \le (\triangleleft, c)$ *iff* $(\le, 0) \le (\le, -\alpha) + (\triangleleft, c)$*,*
- $\neg(\alpha \triangleleft c)$ *iff* $(\triangleleft, c) < (\le, \alpha)$ *iff* $(\le, -\alpha) + (\triangleleft, c) < (\le, 0)$ *iff* $(\le, -\alpha) + (\triangleleft, c) \le (<, 0)$*.*

**Lemma 11 ([4]).** *Let $\mathcal{G} = \mathcal{G}(Z)$ for a non-empty GTA zone $Z$, and let $x, y \in X \cup \{0\}$ be a pair of distinct nodes and $\alpha \in \overline{\mathbb{R}}$. There is a valuation $v \in [\![\mathcal{G}]\!]$ with $v(y) - v(x) = \alpha$ if and only if*


**Lemma 12.** *Let $Z$ be a non-empty reachable zone. Let $n = \max(1, |X_D|)$. Then, the normalized distance graph $\mathcal{G}(Z)$ satisfies the following* ($\dagger$) *conditions:*

$\dagger_1$ *For all $x \in X_F$, $y \in X_H \cup \{0\}$: if $Z_{xy}$ is finite, then $(\le, 0) \le Z_{x0} \le (\le, nM)$.*
$\dagger_2$ *For all $x \in X_F$: if $Z_{0x}$ is finite, then $(<, -nM) \le Z_{0x} \le (\le, 0)$.*
$\dagger_3$ *For all $x \in X_H$, $y \in X_F$: if $Z_{0y}$ is finite, then $Z_{x0} + (<, -nM) \le Z_{xy}$.*
$\dagger_4$ *For all $x, y \in X_F$: if $Z_{xy}$ is finite, then $(<, -nM) \le Z_{xy} \le (\le, nM)$.*

*Proof.* We focus on $\dagger_1$ and $\dagger_2$, leaving the more complicated cases to [2].

$\dagger_1$ First, we consider the case where $y = 0$. So we assume that $(\le, 0) \le Z_{x0} < (<, \infty)$, i.e., $Z_{x0}$ is finite. Towards a contradiction, suppose that $(\le, nM) < Z_{x0} < (<, \infty)$. Since $Z$ is non-empty, we know that $(\le, 0) \le Z_{x0} + Z_{0x}$. Then, using Lemma 10, we can find $\alpha \in \overline{\mathbb{R}}$ such that $(\le, \alpha) \le Z_{x0}$, $(\le, -\alpha) \le Z_{0x}$, and $nM < \alpha$. Notice that $\alpha < \infty$ since $Z_{x0} < (<, \infty)$. Further, using Lemma 11, we can get a valuation $v \in Z$ such that $0 - v(x) = \alpha$. Since $nM < \alpha < \infty$, this implies $-\infty < v(x) < -nM$. Let $Z_{x0} = (\triangleleft, c)$. We have $nM < c < \infty$. Using Corollary 9, we can get a valuation $v' \in Z$ such that $-\infty < v'(x) < -c$, a contradiction as it violates the constraint $0 - x \triangleleft c$ of $Z$. Next, assume that $Z_{xy} < (<, \infty)$ for some $y \in X_H$. Since $Z$ is normal, we have $Z_{x0} \le Z_{xy} + Z_{y0} < (<, \infty)$, as $Z_{xy} < (<, \infty)$ and $Z_{y0} \le (\le, 0)$. We then conclude from the first case that $(\le, 0) \le Z_{x0} \le (\le, nM)$.

$\dagger_2$ We have to show that either $Z_{0x} = (\le, -\infty)$ or $(<, -nM) \le Z_{0x} \le (\le, 0)$. Let $Z_{0x} = (\triangleleft, c)$. Suppose $(\le, -\infty) < Z_{0x} < (<, -nM)$. We have $-\infty < c < -nM$. As before, we can find $\alpha$ such that $(\le, \alpha) \le Z_{0x}$, $(\le, -\alpha) \le Z_{x0}$, and $\alpha \ne -\infty$. Then, by Lemma 11, we can find $v \in Z$ with $v(x) = \alpha$. We have $-\infty < v(x) \triangleleft c < -nM$. Now, using Corollary 9, we can get a valuation $v' \in Z$ such that $c < v'(x) < -nM$, which leads to a contradiction as it violates the constraint $x - 0 \triangleleft c$ in the zone.

**Part 2: Equivalence and Finiteness.** We introduce below an equivalence relation $\sim^n_M$ of *finite index* on valuations, depending on $n = \max(1, |X_D|)$ and the maximal constant $M$, and show that, if $G$ is a set of atomic $M$-bounded integral constraints and $Z$ is a zone whose canonical distance graph $\mathcal{G}(Z)$ satisfies the ($\dagger$) conditions, then the downward closure $\downarrow_G Z = \{v \in \mathcal{V} \mid \exists v' \in Z \text{ with } v \preceq_G v'\}$ is a union of $\sim^n_M$ equivalence classes.

First, we define $\sim_M$ on $\alpha, \beta \in \overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$ by $\alpha \sim_M \beta$ if ($\alpha \triangleleft c \iff \beta \triangleleft c$) for all $(\triangleleft, c)$ with $\triangleleft \in \{<, \le\}$ and $c \in \{-\infty, \infty\} \cup \{d \in \mathbb{Z} \mid |d| \le M\}$. In particular, if $\alpha \sim_M \beta$ then ($\alpha = -\infty \iff \beta = -\infty$) and ($\alpha = \infty \iff \beta = \infty$).
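Concretely, $\alpha \sim_M \beta$ holds exactly when $\alpha$ and $\beta$ lie in the same region of the extended real line cut at the integers of magnitude at most $M$: the singletons $\{-\infty\}$ and $\{\infty\}$, the finite values below $-M$, the finite values above $M$, and, in $[-M, M]$, each integer point and each open unit interval between consecutive integers. A small Python check of this characterization (the class naming is our own):

```python
from math import inf, floor

def canon(alpha, M):
    """Name of the ~_M class of alpha ∈ R ∪ {−inf, inf}."""
    if alpha in (-inf, inf):
        return alpha                    # {−∞} and {∞} are singleton classes
    if alpha < -M:
        return 'below'                  # all finite values < −M are equivalent
    if alpha > M:
        return 'above'                  # all finite values > M are equivalent
    f = floor(alpha)
    return (f, alpha == f)              # integer point f, or open interval (f, f+1)

def sim_M(alpha, beta, M):
    """alpha ~_M beta: same truth value for every comparison against c, |c| ≤ M or ±∞."""
    return canon(alpha, M) == canon(beta, M)
```

Finiteness of the index of $\sim_M$ is immediate from this description: there are $4M + 5$ classes in total.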

Next, for valuations $v_1, v_2 \in \mathcal{V}$, we define $v_1 \sim^n_M v_2$ by two conditions: $v_1(x) \sim_{nM} v_2(x)$ and $v_1(x) - v_1(y) \sim_{(n+1)M} v_2(x) - v_2(y)$ for all clocks $x, y \in X$. Notice that we use $(n+1)M$ for differences of values. Clearly, $\sim^n_M$ is an equivalence relation of finite index on valuations. Using this, we can show that the zones reachable in a safe GTA are unions of $\sim^n_M$-equivalence classes.

**Lemma 13.** *Let $G$ be a set of $X_D$-safe $M$-bounded integral constraints which contains both $x \le 0$ and $0 \le x$ for each future clock $x \in X_F$. Let $Z$ be a zone whose canonical distance graph $\mathcal{G}(Z)$ satisfies the* ($\dagger$) *conditions of Lemma 12. Let $v_1, v_2 \in \mathcal{V}$ be valuations with $v_1 \sim^n_M v_2$. Then, $v_1 \in \downarrow_G Z$ iff $v_2 \in \downarrow_G Z$.*

Finally, from Lemmas 12 and 13, we obtain our main theorem of the section.

**Theorem 14.** *The simulation relation $\preceq_{\mathcal{A}}$ is finite if $\mathcal{A}$ is safe.*

*Proof.* Let $(q, Z_0), (q, Z_1), (q, Z_2), \ldots$ be an infinite sequence of *reachable* nodes in the zone graph of $\mathcal{A}$. By Lemma 12, for all $i$, the distance graph $\mathcal{G}(Z_i)$ in canonical form satisfies conditions ($\dagger$).

The set $\mathcal{G}(q)$ contains only $X_D$-safe and $M$-bounded integral constraints. Let $G$ be $\mathcal{G}(q)$ together with the constraints $x \le 0$ and $0 \le x$ for each future clock $x \in X_F$. From Lemma 13 we deduce that, for all $i$, $\downarrow_G Z_i$ is a union of $\sim^n_M$-classes. Since $\sim^n_M$ is of finite index, there are only finitely many unions of $\sim^n_M$-classes. Therefore, we find $i < j$ with $\downarrow_G Z_i = \downarrow_G Z_j$, which implies $Z_j \preceq_G Z_i$. Since $\mathcal{G}(q) \subseteq G$, this also implies $Z_j \preceq_{\mathcal{G}(q)} Z_i$.

**Table 1.** Experimental results obtained by running our prototype implementation and, when possible, the standard reachability algorithm using G-simulation implemented in TChecker. Both implementations use a breadth-first search with simulation. For each model, we give the parameters in parentheses: for ToyECA, we explain the parameterization in [2], while for the others, we report the number of concurrent processes. All experiments were run on an Ubuntu machine with an Intel i5 7th-generation processor and 8 GB RAM, with the timeout set to 60 s.

### **8 Experimental Evaluation**

We have implemented a prototype that takes as input a GTA, as given in Definition 1, and applies our reachability algorithm, in the open-source tool TChecker [29]. To do so, we extend TChecker to allow clocks to be declared as one of *normal*, *history*, *prophecy*, or *timer*, and extend the syntax of edges to allow arbitrary interleavings of guards and clock changes (resets/releases). Our tool, along with the benchmarks used in this paper, can be downloaded from https://github.com/EQuaVe/GTAReach. We present selected results in Table 1, with further details in [2].

First, we consider timed automata models from standard benchmarks [21,34,39]. Despite the overhead induced by our framework (e.g., maintaining general programs on transitions), we are only slightly worse off in running time than the standard algorithm, while visiting and storing the same number of nodes. We illustrate this in rows 1–3 of Table 1 by comparing our tool with the implementation of the state-of-the-art zone-based reachability algorithm using G-simulation introduced in [24–26].

Next, we consider models belonging to the class of ECA without diagonal constraints. We remark that ours is the first implementation of a reachability algorithm that operates on the whole class of ECA directly. We compare against an implementation that first translates the ECA into a timed automaton using the translation proposed in [10], and then runs the state-of-the-art reachability algorithm of [24–26] on this timed automaton. From rows 4–7 of Table 1, we observe significant improvements over the standard approach, both in running time and in the number of visited and stored nodes.

Finally, in rows 8–12, we consider the unified GTA model. As already pointed out, model-checking an event-clock specification $\varphi$ over a timed automaton model $\mathcal{A}$ can be reduced to reachability on the product of the TA $\mathcal{A}$ and the ECA representing $\neg\varphi$. In this spirit, our implementation allows the model to use any combination of *normal* clocks, *history* clocks, *prophecy* clocks, or *timers*, and moreover permits diagonal guards between any of these clocks. To the best of our knowledge, no existing tool offers all these features. We emphasize this with the '−' entries in the G-Sim column of Table 1.

We model simple but useful properties using event clocks, and check these properties on some standard models from the literature, such as CSMA/CD [39], Fire-alarm [35] and the Alternating Bit Protocol (ABP) [33]. Note that for the benchmark Fire-alarm-pattern, the specification is modeled using an ECA with diagonals. As a consequence, the product automaton on which we check reachability contains normal clocks and event clocks. Here, we consider the following ECA specification: no three $a$'s occur within $k$ time units. The negation of this property can be easily modeled by an ECA with two states and a transition on $a$ with the diagonal constraint $\overleftarrow{a} - \overrightarrow{a} \le k$, where $\overleftarrow{a}$ is the history clock recording the time since the previous occurrence of $a$, and $\overrightarrow{a}$ is a future clock predicting the time to the next occurrence of $a$. When reading an $a$, the quantity $\overleftarrow{a} - \overrightarrow{a}$ gives the distance between the next and the previous occurrences. This language is used in [19] to observe that ECA with diagonals are more expressive than ECA. Finally, we remark that the model of ABP contains timers. A more detailed discussion of the models and specifications in these benchmarks is provided in [2].

In conclusion, as can be seen from the experimental results in Table 1, we are able to demonstrate the full power of our reachability algorithm for the unified model of generalized timed automata.

#### **9 Conclusion**

The success of timed automata verification can safely be attributed to the advances in zone-based technology over the last three decades. In fact, [22], the precursor to the seminal works [8,9], already laid the foundations for zones by describing the Difference Bound Matrices (DBM) data structure. Our goal in this work has been to unify timing features defined in different timed models while retaining the ability to use efficient state-of-the-art algorithms for reachability. To do so, we have equipped the model with two kinds of clocks, history and future, and modified the transitions to contain a program that alternates between guards and changes to the variables. For the algorithmic part, we have adapted the G-simulation framework to this more powerful model. The main challenge was to show finiteness of the simulation in this extended setting. To aid the practical use of this generic model, we have developed a prototype implementation that can answer reachability for GTA. We remark that decidability for GTA comes via zones, and not through regions: in fact, since we generalize event-clock automata, we do not have a finite region equivalence for GTA [28].

We conclude with some interesting avenues for future work. An immediate direction is to use generalized timed automata for model-checking timed specifications over real-time systems. Further, the complexity and expressivity of safe GTA are natural and interesting theoretical open questions, and we believe their answers are not obvious. Both questions are answered in the timed automata literature using regions. However, we cannot have a region equivalence for our model, since even for the subclass of ECA it was shown that no finite bisimulation is possible. In particular, it would be interesting to investigate whether a translation from safe GTA to timed automata is possible. Note that even if such a translation exists, it is likely to incur an exponential blowup, since even the translation from ECA to TA incurs one. Coming to the complexity of the reachability problem for safe GTA, it is easy to see that our procedure runs in EXPSPACE, as we have shown that each reachable zone is a union of equivalence classes of a relation of finite index (see Lemma 13). On the other hand, PSPACE-hardness is inherited from timed automata [6,8]. Closing this complexity gap is open. We note that even for timed automata, the precise complexity of the simulation-based reachability algorithm is difficult to analyze, but its selling point is that it works well in practice. Finally, we would also like to investigate liveness verification for GTA, in particular what future clocks bring us in the setting of $\omega$-words.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Closed-Loop Analysis of Vision-Based Autonomous Systems: A Case Study

Corina S. Păsăreanu<sup>1,2(B)</sup>, Ravi Mangal<sup>2</sup>, Divya Gopinath<sup>1</sup>, Sinem Getir Yaman<sup>3</sup>, Calum Imrie<sup>3</sup>, Radu Calinescu<sup>3</sup>, and Huafeng Yu<sup>4</sup>

<sup>1</sup> KBR, NASA Ames, Moffett Field, CA 94035, USA pcorina@andrew.cmu.edu

<sup>2</sup> Carnegie Mellon University, Moffett Field, CA 94035, USA

<sup>3</sup> University of York, York, UK

<sup>4</sup> Boeing Research and Technology, Santa Clara, CA, USA

Abstract. Deep neural networks (DNNs) are increasingly used in safety-critical autonomous systems as perception components processing high-dimensional image data. Formal analysis of these systems is particularly challenging due to the complexity of the perception DNNs, the sensors (cameras), and the environment conditions. We present a case study applying formal probabilistic analysis techniques to an experimental autonomous system that guides airplanes on taxiways using a perception DNN. We address the above challenges by replacing the camera and the network with a compact abstraction whose transition probabilities are computed from the confusion matrices measuring the performance of the DNN on a representative image data set. As the probabilities are estimated based on empirical data, and thus are subject to error, we also compute confidence intervals in addition to point estimates for these probabilities and thereby strengthen the soundness of the analysis. We also show how to leverage local, DNN-specific analyses as run-time guards to filter out misbehaving inputs and increase the safety of the overall system. Our findings are applicable to other autonomous systems that use complex DNNs for perception.

### 1 Introduction

Complex autonomous systems, such as autonomous aircraft taxiing systems [31] and autonomous cars [20,25,42], need to perceive and reason about their environments using high-dimensional data streams (such as images) generated by rich sensors (such as cameras). Machine-learnt components, especially deep neural networks (DNNs), are particularly capable of the required high-dimensional reasoning and hence are increasingly used for perception in these systems. While formal analysis of the safety of these systems is highly desirable due to their safety-critical operational settings and the error-prone nature of learned components, in practice it is very challenging because of the complexity of the system components: the high complexity of the neural networks (which may have thousands or millions of parameters), the complexity of the camera capture process, and the random and hard-to-characterize nature of the environment in which the system operates (i.e., the world itself).

In this work, we describe a formal analysis of a closed-loop autonomous system that addresses the above challenges. Our case study is motivated by a real-world application, namely, an experimental autonomous system for guiding airplanes on taxiways developed by Boeing [3,14]. The key idea is to abstract away altogether the perception components, namely, the perception network and the image generator, i.e., the camera taking images of the world, and replace them with a probabilistic component α that maps (abstractions of) the state of the system to state estimates that are used in downstream decision making in the closed-loop system. The resulting system can then be analyzed with standard (probabilistic) model checkers, such as PRISM [34] or Storm [22].

The approach is *compositional*, in the sense that the probabilistic component is computed separately from the rest of the system. The transition probabilities in α are derived based on *confusion matrices* computed for the DNN (measured on representative data sets). Developers routinely use confusion matrices to evaluate machine learning models, so our analysis is closely aligned with existing workflows, facilitating its adoption in practice.

The size of the probabilistic abstraction α is linear in the size of the output of the DNN, and is independent of the number of the DNN parameters or the complexity of the camera and the environment. We also describe how to leverage additional results obtained from analyzing the DNN in isolation to further refine the abstraction and also increase the safety of the closed-loop system through *run-time guards*. In particular, we leverage rules mined from the DNN model [17] to act as run-time guards for the closed-loop analysis, filtering out inputs that likely lead to invalid DNN behavior. Other methods can also be used (e.g. [17, 18,21,26,32,35]) to catch adversarial or out-of-distribution inputs.

The probabilities in α are estimated based on empirical data, so they are subject to error. We explore the use of *confidence intervals* in addition to point estimates for these probabilities and thereby strengthen the soundness of the analysis [5,7]. Our technique is applicable to other autonomous systems that use DNN-based perception from high-dimensional data.

Related Work. Formal proofs of closed-loop safety have been obtained for systems with low-dimensional sensor readings [11,12,27–30,40]; however, they become intractable for systems that use rich sensors producing high-dimensional inputs such as images.

Other works address the modeling and scalability challenges by constructing *abstractions* of the perception components [24,33]. To model different environment conditions, these abstract models use *non-deterministic* transitions. The resulting closed-loop systems are analyzed with traditional (non-probabilistic) techniques. The abstractions either lack soundness proofs [33] or come with only probabilistic soundness guarantees [24] which do not translate into probabilistic guarantees over the safety of the overall system. VerifAI [16] can find counterexamples to system safety, but can not provide guarantees.

The recent work in [36] aims to verify the safety of the trajectories of a camera-based autonomous vehicle in a given 3D scene. That work uses invariant regions over the input space, grouped by controller action. However, its abstraction captures only one environment condition (i.e., one scene) and one camera model, whereas our approach is not tied to a particular camera model and implicitly considers all possible environment conditions.

In contrast to previous work, we describe a formal analysis that is *probabilistic*, which we believe is natural since the camera images capturing the state of the world are subject to randomness due to the environment; furthermore, DNNs are learnt from data and are not guaranteed to be 100% accurate. Recent work [2] also discusses the use of classification metrics, such as confusion matrices, for quantitative system-level analysis with temporal logic specifications. However, that work does not discuss the computation of the confidence intervals necessary for quantifying the empirical results, nor does it incorporate DNN-specific analyses as we do here. We build on our previous work DeepDECS [6], where the goal is to perform controller synthesis with safety guarantees, so the formalism is more involved. Furthermore, DeepDECS does not consider confidence interval analysis, which we explore here based on some of our other previous works [5,7]. We analyzed center-line tracking using TaxiNet in [31]; that work focuses on the analysis of the network and not on the overall system.

### 2 Autonomous Center-Line Tracking with TaxiNet

Boeing is developing an experimental autonomous system for center-line tracking on taxiways in an airport. The system uses a neural network called TaxiNet for perception. TaxiNet is designed to take a picture of the taxiway as input and return the plane's position with respect to the center-line on the taxiway. It returns two outputs; cross track error (cte), which is the distance in meters of the plane from the center-line and heading error (he), which is the angle in degrees of the plane with respect to the center-line. These outputs are fed to a controller which in turn manoeuvres the plane such that it remains close to the center of the taxiway. This forms a closed-loop system where the perception network continuously receives images as the plane moves on the taxiway. We use this system as a case study and also as a running example throughout the paper.

System Decomposition. The decomposition of this system is illustrated in Fig. 1. The controller sends actions a to the airplane to guide it on the taxiway. The dynamics (which models the movement of the airplane on the airport surface) maps the previous state s and action a to the next state s′.<sup>1</sup> Information about the taxiway is provided by the perception network (p), i.e., TaxiNet. The perception network takes high-dimensional images captured with a *camera* (c), and returns its estimate s*est* of the real state s.

For our application, state s ∈ S captures the position of the airplane on the surface; S is modeled as CTE × HE. The network estimates the state s := (cte, he) based on images taken with a camera placed on the airplane. If the network is 'perfect', then s = s*est*.<sup>2</sup> However, this does not hold in practice.

<sup>1</sup> Velocity may be provided as feedback to the controller; we ignore it here for simplicity.

<sup>2</sup> Assuming the relevant state of the system is recoverable from the input image.

The network is trained on a finite set of images and is not guaranteed to be 100% accurate whereas images observed in operation show a wide variety due to different environment (e.g., light, weather) conditions and imperfections in the camera.

Fig. 1. Closed-loop System

Fig. 2. Abstracted System

Component Modeling. We built a simple discrete model of the airplane dynamics and a discrete-time controller for the system, similar to previous related work [4,23] which also considers discretized control. Since the controller is discretized, we abstract the regression outputs of TaxiNet to view the model as a classifier which predicts the plane's position in discrete states. Treatment of more complex systems with continuous semantics and regression models is left for future work. The main challenge that we address in the paper is the modeling of the perception components (the camera capture process and the network), which we describe in detail in the next section. We model the (abstracted) autonomous system as a Discrete Time Markov Chain (DTMC) [38]; the code for the models is provided in the appendix of an extended version of this paper [37].

Safety Properties. In our study, the goal is to provide *guarantees* for safe behavior with respect to two system-level properties indicated by our industrial partner. The properties specify conditions for safe operation in terms of the allowed cte and he values for the airplane, based on taxiway dimensions. The first property states that the airplane shall never leave the taxiway (i.e., |cte| ≤ 8 meters). The second property states that the airplane shall never turn more than a prescribed angle (i.e., |he| ≤ 35◦), as it would be difficult to maneuver the airplane back from such a position. These two properties can be encoded in PCTL [8] as follows.

$$P = ?[F(|\mathsf{cte}| > 8 \, m)] \qquad \text{(Property 1)}$$

$$P = ?[F(|\mathsf{he}| > 35^{\circ})] \qquad \text{(Property 2)}$$

Here P =? indicates that we want to calculate the probability that the system eventually (F) reaches an error state.
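Such reachability queries over a DTMC can be computed by value iteration. The following is a minimal sketch on a hypothetical three-state chain (not the paper's TaxiNet model), computing the probability of reaching the error state within a bounded number of steps:

```python
# Toy 3-state DTMC (hypothetical, for illustration only):
# state 0 = on track, 1 = drifting, 2 = error (absorbing).
P = [
    [0.90, 0.09, 0.01],
    [0.30, 0.60, 0.10],
    [0.00, 0.00, 1.00],
]

def reach_error(P, error_state, n_steps):
    """Probability of reaching error_state within n_steps, per start state."""
    # v starts as the indicator of the error state and is backed up n_steps times
    v = [1.0 if s == error_state else 0.0 for s in range(len(P))]
    for _ in range(n_steps):
        v = [1.0 if s == error_state else
             sum(P[s][t] * v[t] for t in range(len(P)))
             for s in range(len(P))]
    return v

probs = reach_error(P, error_state=2, n_steps=10)
```

PRISM performs this (and the unbounded variant) symbolically and far more efficiently; the sketch only illustrates the semantics of the P =? query.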

TaxiNet DNN. This is a regression model with 24 layers, including five convolution layers and three dense layers (with 100/50/10 ELU neurons) before the output layer. The inputs to the model are RGB color images of size 360 × 200 pixels. We use a representative data set with 11108 images, shared by our industry partner. The model has a Mean Absolute Error (MAE) of 1.185 for the cte output and 7.86 for the he output. The discrete nature of the controller in our DTMCs induces a discretization on TaxiNet's outputs and the treatment of TaxiNet as a classifier for the purpose of our analysis. The ranges cte ∈ [−8.0 m, 8.0 m] and he ∈ [−35.0◦, 35.0◦] are translated into cte ∈ {0, 1, 2, 3, 4} and he ∈ {0, 1, 2} as shown below.

$$\mathsf{cte} = \begin{cases} 3 & \text{if } -8.0\,m \le \mathsf{cte} < -4.8\,m \\ 1 & \text{if } -4.8\,m \le \mathsf{cte} < -1.6\,m \\ 0 & \text{if } -1.6\,m \le \mathsf{cte} \le 1.6\,m \\ 2 & \text{if } \phantom{-}1.6\,m < \mathsf{cte} \le 4.8\,m \\ 4 & \text{if } \phantom{-}4.8\,m < \mathsf{cte} \le 8.0\,m \end{cases} \qquad \mathsf{he} = \begin{cases} 1 & \text{if } -35.0^{\circ} \le \mathsf{he} < -11.67^{\circ} \\ 0 & \text{if } -11.67^{\circ} \le \mathsf{he} \le 11.66^{\circ} \\ 2 & \text{if } \phantom{-}11.66^{\circ} < \mathsf{he} \le 35.0^{\circ} \end{cases}$$

We use the label "−1" to denote error states, i.e., cte = −1 iff |cte| > 8 m and he = −1 iff |he| > 35◦. For simplicity, we use cte and he to denote both the classifier and regression outputs in other parts of the paper (with the meaning clear from context). Note that none of the input images are labeled by the classifier as "−1", as the outputs of the network are normalized to be within the prescribed bounds; however, this does not preclude the system from reaching an error.
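The binning above can be sketched as two small Python functions. This is an illustrative sketch; the handling of values exactly on the bin edges is our assumption where the paper does not pin it down:

```python
def discretize_cte(cte_m):
    """Bin the cte regression output (meters) into the paper's 5 classes;
    -1 marks the error state |cte| > 8 m. Edge handling is assumed."""
    if cte_m < -8.0 or cte_m > 8.0:
        return -1                 # airplane has left the taxiway
    if cte_m < -4.8:
        return 3
    if cte_m < -1.6:
        return 1
    if cte_m <= 1.6:
        return 0                  # close to the center-line
    if cte_m <= 4.8:
        return 2
    return 4

def discretize_he(he_deg):
    """Bin the he regression output (degrees); -1 marks |he| > 35 degrees."""
    if he_deg < -35.0 or he_deg > 35.0:
        return -1
    if he_deg < -11.67:
        return 1
    if he_deg <= 11.66:
        return 0
    return 2
```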

### 3 Probabilistic Analysis

In this section, we describe the methodology for abstracting and analyzing an autonomous system leveraging probabilistic model checking. The main idea, which we initially explored in [6], is to replace the composition p ◦ c of the camera (denoted as c) and the perception DNN (denoted as p) with a conservative abstraction mapping each system state to every possible estimated state; the transition probabilities are derived empirically based on the confusion matrices computed for the DNN, on a representative data set. We denote this abstraction as <sup>α</sup> : <sup>S</sup> → D(S), mapping system states to a discrete distribution over (estimated) system states. Figure 2 depicts the abstracted autonomous system.

We observe that c can be viewed as a map between state s ∈ S to a distribution over images, denoted as <sup>D</sup>(Img), where img <sup>∈</sup> Img and Img is the set of images. For instance, in the TaxiNet system, state s only captures the position of the airplane with respect to the center-line, but there are many different images that correspond to the same position. This is due to uncontrollable environmental conditions, such as temporary sensor failures or different lighting and weather conditions. Consequently, a single state s can map to a number of different images depending on the environment, and this is modeled by considering <sup>c</sup> to be a probabilistic map of type <sup>S</sup> → D(Img). Given a system state <sup>s</sup>, <sup>α</sup>(s) models the probability of p ◦ c leading to a particular estimated state s*est*; α needs to be probabilistic because c itself is probabilistic and p is not perfectly accurate.

We further describe how we can leverage DNN-specific analysis to improve the accuracy of perception and the safety of the overall system, via the optional addition of run-time guards. For the verification of the closed-loop system, we use the PRISM model checking tool [34]. We also explore methods for analysis of DTMCs with uncertain transition probabilities [5,7], to obtain *probabilistic guarantees* about the validity of our probabilistic safety proofs even though the abstraction probabilities are empirical estimates.

Assumptions. Our analysis assumes that the distribution of inputs to the network remains fixed over time (i.e., it is not subject to distribution shifts). Moreover, the data set of input images used to estimate the probabilities in α is assumed to be *representative*, i.e., constituted of independently drawn samples from this fixed underlying distribution of inputs. Relaxing these assumptions is a challenging but important task for future research.

#### 3.1 Probabilistic Abstractions for Perception

We describe in detail the construction of the probabilistic abstraction α : S → D(S). We do not need access to the camera and only require black-box access to the network for constructing our abstraction.<sup>3</sup> We assume S is a finite set with #S = K, where #S denotes the cardinality of the set S. We use α(s, s*est*) to represent the probability associated with estimated state s*est*. It is defined as,

$$\alpha(s, s\_{est}) := \Pr\_{\mathsf{img} \sim c(s)} [p(\mathsf{img}) = s\_{est}] \tag{1}$$

We estimate the probabilities in α by means of a confusion matrix. Let Img*<sup>s</sup>* ⊆ Img denote a *representative test dataset* of images corresponding to state s, i.e., every sample in Img*<sup>s</sup>* is assumed to be an independently drawn sample from c(s). We assume access to representative test datasets corresponding to every state s ∈ S. Let Img := ∪*s*∈*S* Img*s*. For any test input img ∈ Img, let p<sup>∗</sup>(img) ∈ S be the label (i.e., the true underlying state) of img, which is known since Img is a test dataset. For the sake of technical presentation, we assume a bijective map rep : S → [K] that maps every state in S to a number in [K] := {1, 2,...,K}. We evaluate p on the test dataset Img to construct a K × K confusion matrix C such that, for any k, k′ ∈ [K], the element in row k and column k′ of this matrix is given by the number of inputs from Img with true state rep<sup>−1</sup>(k) that the perception network p classifies as state rep<sup>−1</sup>(k′).

$$\mathcal{C}[k,k'] := \#\left\{ \mathsf{img} \in \overline{\mathsf{Img}} \mid p^\*(\mathsf{img}) = \mathsf{rep}^{-1}(k) \land p(\mathsf{img}) = \mathsf{rep}^{-1}(k') \right\} \tag{2}$$

Given the confusion matrix C, empirical estimates for the probabilities in α are calculated as follows,

$$\alpha(\mathbf{rep}^{-1}(k), \mathbf{rep}^{-1}(k')) := \frac{\mathcal{C}[k, k']}{\sum\_{k'' \in [K]} \mathcal{C}[k, k'']}. \tag{3}$$

<sup>3</sup> Our run-time guard does require white-box access.
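Equation (3) amounts to row-normalizing the confusion matrix. A minimal sketch, using the he observation counts from the TaxiNet example below (rows are true states, columns estimated states):

```python
# Confusion matrix for he on the TaxiNet test set (counts from the paper).
C = [
    [4748, 2139, 148],   # true he = 0
    [91,   2010, 0],     # true he = 1
    [744,  211,  1017],  # true he = 2
]

def alpha_from_confusion(C):
    """Eq. (3): divide each count by its row sum to get transition
    probabilities of the abstraction alpha."""
    return [[c / sum(row) for c in row] for row in C]

alpha_he = alpha_from_confusion(C)
# alpha_he[0][0] is the probability of correctly estimating he = 0
```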

TaxiNet Example. For the TaxiNet application, we construct two probabilistic maps, αcte and αhe, corresponding to each of the state variables cte and he, using a representative test data set with 11108 samples.<sup>4</sup> Thus,


Table 1. Confusion Matrix for he

| true \\ estimated | he*est* = 0 | he*est* = 1 | he*est* = 2 |
|---|---|---|---|
| he = 0 | 4748 | 2139 | 148 |
| he = 1 | 91 | 2010 | 0 |
| he = 2 | 744 | 211 | 1017 |

αcte is of type CTE → D(CTE) and αhe is of type HE → D(HE). Table 1 illustrates the confusion matrix for he. The mapping αhe is computed in a straightforward way: αhe(0, 0) = 4748/(4748 + 2139 + 148) = 0.675, giving the probability of correctly estimating that the value of he is zero. Similarly, αhe(1, 0) = 91/(91 + 2010) = 0.043, giving the probability of incorrectly estimating that the value of he is zero instead of one. The corresponding DTMC code is as follows:

```
[] he=0 → 0.675: (he_est'=0) + 0.304: (he_est'=1) + 0.021: (he_est'=2);
[] he=1 → 0.043: (he_est'=0) + 0.957: (he_est'=1) + 0.0: (he_est'=2);
[] he=2 → 0.377: (he_est'=0) + 0.107: (he_est'=1) + 0.516: (he_est'=2);
```
A similar computation is performed for constructing αcte. The resulting code for the closed-loop system is shown in [37], in the appendix.

#### 3.2 DNN Checks as Run-Time Guards

We use DNN-specific checks as run-time guards to improve the performance of the perception network and therefore the safety of the overall system. We hypothesize that for inputs where the checks pass, the network is more likely to be accurate, and therefore, the system is safer.

For our case study, we distill logical rules from the DNN that characterize misbehavior in terms of intermediate neuron values and use them as run-time guards (as described in Sect. 4). More generally, one can use any off-the-shelf pointwise DNN check, such as local robustness [10,15,19,35,39,41] or confidence checks for well-calibrated networks [21], as a run-time guard (provided it is fast enough to be deployed in practice). For practical reasons (TaxiNet is a regression model; it contains ELU [9] activations; we do not have access to the training data), we cannot use off-the-shelf checks here.

Modeling DNN Checks. Let us denote the application of (one or more) DNN-specific checks as a function check : (Img → S) × Img → B, such that, for perception network p ∈ Img → S and image img ∈ Img, check(p, img) = true if p passes the checks at input img, and check(p, img) = false otherwise.

We further assume that a system that uses DNN checks as a run-time guard attempts to read the camera sensor multiple (one or more) times, until the check passes; and aborts (or goes to a fail-safe state) if the number of consecutive failed checks reaches a certain threshold. This logic can be generalized to consider more sophisticated safe-mode operations; for instance, the system can decelerate

<sup>4</sup> To simplify the DTMCs, we model the updates to cte and he as independent. For more precision, we can compute confusion matrices and α for the pair (cte, he).

and/or notify an operator when the threshold is reached, as this could indicate serious sensor failure or adverse weather conditions.
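The retry-and-abort logic described above can be sketched as follows; `sense`, `check` and `predict` are assumed stand-ins for the camera, the DNN check and the perception network:

```python
def guarded_perceive(sense, check, predict, max_failures):
    """Re-read the sensor until the DNN check passes; give up after
    max_failures consecutive failed checks (a sketch of the run-time
    guard logic, not the paper's implementation)."""
    failures = 0
    while failures < max_failures:
        img = sense()
        if check(img):
            return predict(img)   # only images that pass the check are used
        failures += 1
    return None                   # abort / hand over to a fail-safe mechanism

# usage with stub components: the third reading passes the check
readings = iter([0.1, 0.2, 0.9])
estimate = guarded_perceive(lambda: next(readings),
                            lambda img: img > 0.5,
                            lambda img: "on_centerline",
                            max_failures=5)
```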

To model the effect of the run-time check in our analysis, we can define β as the probability that an image img generated by the camera c, for *any* state s, satisfies check(p, img) = true;

$$\beta := \Pr\_{\mathsf{img} \sim D} [\mathsf{check}(p, \mathsf{img}) = \mathsf{true}] \tag{4}$$

Here <sup>D</sup> is the distribution obtained by *combining* <sup>c</sup>(s) for all states <sup>s</sup> <sup>∈</sup> <sup>S</sup>. <sup>5</sup> To be more precise we can define a separate β*<sup>s</sup>* for each state s. We estimate β using the representative set of images Img,

$$\beta := \frac{\#\overline{\mathsf{Img}}^{true}}{\#\overline{\mathsf{Img}}} \tag{5}$$

where $\overline{\mathsf{Img}}^{true} := \{\mathsf{img} \in \overline{\mathsf{Img}} \mid \mathsf{check}(p, \mathsf{img}) = \mathsf{true}\}$.

For the overall analysis of the closed-loop system, irrespective of the state s, we can assume that the DNN check will pass with a probability β. Moreover, since the perception network only processes images that pass the DNN check, we construct a refined probabilistic abstraction α*true* using conditional probability:

$$\alpha^{true}(s, s\_{est}) := \Pr\_{\mathsf{img}\sim c(s)}[p(\mathsf{img}) = s\_{est} | \mathsf{check}(p, \mathsf{img}) = \mathsf{true}] \tag{6}$$

We can estimate α*true* as before, but the confusion matrix is built using only the images that pass the DNN check, i.e., for dataset Img*true* <sup>⊆</sup> Img.
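The estimation of β (Eq. 5) together with the check-conditioned abstraction α*true* (Eq. 6) can be sketched from a labeled test set. Each record is a (true_state, predicted_state, check_passed) triple; the tiny dataset below is hypothetical, for illustration only:

```python
def beta_and_alpha_true(records, n_states):
    """Estimate beta as the fraction of inputs passing the DNN check, and
    alpha_true from a confusion matrix restricted to those inputs."""
    passed = [r for r in records if r[2]]
    beta = len(passed) / len(records)
    C = [[0] * n_states for _ in range(n_states)]
    for true_s, pred_s, _ in passed:      # count only check-passing inputs
        C[true_s][pred_s] += 1
    alpha_true = [[c / sum(row) if sum(row) else 0.0 for c in row]
                  for row in C]
    return beta, alpha_true

# hypothetical 4-sample test set over 2 states
records = [(0, 0, True), (0, 1, True), (0, 0, False), (1, 1, True)]
beta, alpha_true = beta_and_alpha_true(records, n_states=2)
```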

TaxiNet Example. For TaxiNet, out of 11108 inputs, 9125 inputs (i.e., 82.1%) pass the DNN check resulting in the following code:

```
i : [0..M] init 0;
[] pc=0 & i<M → 0.821: (v'=1) & (pc'=1) & (i'=0) + 0.179: (v'=0) & (i'=i+1);
```

We model the result of applying the DNN check with variable <sup>v</sup>; <sup>v</sup> = 1 if the check returns true for an image and <sup>v</sup> = 0 otherwise. <sup>M</sup> is the number of allowed repeated sensor readings and i is used to count the number of failed DNN checks.

The abstraction for state variables he (αhe) and cte (αcte) is only computed for the inputs that pass the check (i.e., for <sup>v</sup> = 1) based on newly computed confusion matrices. The DTMC code for the closed-loop system with run-time guards is shown in [37], in the appendix.

#### 3.3 Confidence Analysis

The construction of the probabilistic abstractions relies on calculating empirical point estimates of the required probabilities. However, these empirical estimates lack statistical guarantees and can be off by an arbitrary amount from the true probabilities. To address this concern, we experiment with using FACT [5,7]

<sup>5</sup> To simplify the presentation, we omit the precise mathematical formulation for D.

to calculate *confidence intervals* for the probability that the safety properties of the closed-loop system are satisfied. The inputs to FACT are: 1) a parametric DTMC m where each empirically estimated transition probability is represented by a parameter, 2) a PCTL formula φ, 3) an error level δ ∈ (0, 1) and 4) an *observation function* O mapping each state s to a tuple giving the number of observations for each outgoing transition from s; in our case, the number of observations can be obtained directly from the computed confusion matrices, i.e., O(s) = (C[rep(s), 1],..., C[rep(s), K]). FACT synthesizes a (1 − δ)-confidence interval [a, b] ⊆ [0, 1] for the probability that φ is satisfied, given the observations.
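For intuition about where such intervals come from, a confidence interval for a *single* estimated transition probability can be sketched with the standard Wilson score method. This is only a per-parameter illustration; FACT propagates the uncertainty of all parameters through the entire PCTL query, which this sketch does not do:

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """~95% Wilson score interval for a proportion estimated from counts
    (a textbook method; not the algorithm FACT uses internally)."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials
                                   + z * z / (4 * trials * trials))
    return center - half, center + half

# interval around the point estimate alpha_he(0,0) = 4748/7035 ≈ 0.675
lo, hi = wilson_interval(4748, 4748 + 2139 + 148)
```

With thousands of observations per row, the per-transition intervals are tight; the system-level intervals reported in Sect. 4 are wider because the uncertainty compounds along paths of the DTMC.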

TaxiNet Example. The following partial code illustrates the parametric version of the code provided in Sect. 3.1 (with the complete code for the parametric models provided in [37], in the appendix). The first three lines represent the number of observations obtained from the confusion matrix in Table 1.

```
param double x = 4748 2139 148;
param double y = 91 2010;
param double z = 744 211 1017;
...
[] he=0 → x1:(he_est '=0) + x2:(he_est '=1) + (1-x1-x2):(he_est '=2);
[] he=1 → y1:(he_est '=0) + (1-y1):(he_est '=1);
[] he=2 → z1:(he_est '=0) + z2:(he_est '=1) + (1-z1-z2):(he_est '=2);
```
### 4 Experiments

In this section, we report on the experiments that we conducted as part of our probabilistic safety analysis of the center-line tracking autonomous system.

We built two DTMC models, m<sup>1</sup> and m2, denoting the closed-loop center-line tracking system without and with a run-time guard, respectively. The airplane dynamics and the controller are identically modeled in the two DTMCs as discrete components. The code for the models (in PRISM syntax) and more details about the analysis are presented in [37], in the appendix.

Mining Rules for Run-time Guards. We leverage our prior work [17] to extract rules of the form Pre ⟹ Post from the DNN. Post is the condition |cte<sup>∗</sup> − cte| > 1.0 m ∨ |he<sup>∗</sup> − he| > 5◦ on the regression model's outputs, and Pre is a condition over the neuron values in the three dense layers of TaxiNet (cte<sup>∗</sup> and he<sup>∗</sup> denote ground-truth values). The considered Post characterizes invalid behavior (as explained in [31]). If an input satisfies Pre, the DNN check is considered to have failed on that input. Pre can be evaluated efficiently during the forward pass of the model, making it a good run-time guard candidate. Here is an example of a rule for invalid behavior:

$$\begin{aligned} &N\_{1,85} < -0.998 \land N\_{2,50} \le 3.31 \land N\_{1,84} \le -0.994 \land N\_{1,15} > -0.999 \\ &\land N\_{1,21} \le 1.711 \land N\_{1,70} \le 11.088 \land N\_{1,51} > -0.999 \land N\_{1,21} > -0.637 \\ &\implies |\mathsf{cte}^\* - \mathsf{cte}| > 1.0 \ m \lor |\mathsf{he}^\* - \mathsf{he}| > 5^{\circ} \end{aligned}$$

N*i,j* indicates the j*th* neuron in the i *th* dense layer. The conditions over neuron values can be checked during the forward pass of the DNN. If an input satisfies the conditions, it is interpreted as failing the check. If the check consecutively fails M times, the system aborts, meaning that the system stops operating and hands over control to a fail-safe mechanism (such as the pilot). More details on the rules and their deployment as run-time guards are in [37], in the appendix.
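Evaluating such a rule during the forward pass reduces to a conjunction of threshold tests on recorded activations. A hypothetical harness follows; the representation of activations as a map from (layer, index) to value is our assumption, not the paper's implementation:

```python
def rule_fires(neurons):
    """Evaluate the example Pre-condition above; `neurons` maps
    (dense_layer, index) to the activation seen on the forward pass.
    Returns True when the input is flagged as a failed check."""
    N = neurons
    return (N[(1, 85)] < -0.998 and N[(2, 50)] <= 3.31
            and N[(1, 84)] <= -0.994 and N[(1, 15)] > -0.999
            and N[(1, 21)] <= 1.711 and N[(1, 70)] <= 11.088
            and N[(1, 51)] > -0.999 and N[(1, 21)] > -0.637)

# an activation pattern satisfying every conjunct of Pre
acts = {(1, 85): -1.0, (2, 50): 0.0, (1, 84): -1.0, (1, 15): 0.0,
        (1, 21): 0.0, (1, 70): 0.0, (1, 51): 0.0}
failed_check = rule_fires(acts)
```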

Confusion Matrices. The confusion matrices for the classification version of TaxiNet, computed for the two cases (without and with run-time guard) are shown in [37], in the appendix. The tables can be used by developers to better understand the DNN performance. For instance, the results summarized in the confusion matrices indicate that the DNN performs best for inputs lying on the center-line, which can be attributed to training being done mainly using scenarios where the plane follows the center-line. The model appears to perform better when the plane is heading left, as opposed to heading right, which may be due to camera position. These observations can be used by developers to improve the model, by training on more scenarios. Note also that the model does not make 'blatant' errors, mistaking inputs on the *left* as being on the *right* (of center-line) or vice-versa (see e.g., entries with zero observations). Formal proofs can provide guarantees of absence of such transitions.

Fig. 3. Probabilistic model checking results via PRISM

Analysis. We analyzed m<sup>1</sup> and m<sup>2</sup> with respect to the two PCTL properties, P =?[F(cte = −1)] (*Property 1*) and P =?[F(he = −1)] (*Property 2*)<sup>6</sup>. The airplane is assumed to start from an initial position on the center-line, heading straight. For m<sup>2</sup>, i.e., the model with a run-time guard, we also evaluate the probability of the TaxiNet system going to the abort state using the property P =?[F(v = 0 & i = M)] (*Property 3*), where M is the threshold for the number of consecutive run-time check failures.

The probabilities of these properties being satisfied, calculated by PRISM, are shown in Fig. 3, where N is a constant in the DTMCs that dictates the length of the finite-time horizon considered for the analysis. Note that the system has an additional planning layer that calculates the waypoints for the airplane's course on the taxiway. The system is only used for controlling the airplane movement between pairs of waypoints, hence a short horizon suffices.

The confidence intervals computed with FACT are shown in Fig. 4, at different confidence levels (0.95 to 0.99), for <sup>N</sup> = 4. For computing the intervals, we ignore the transitions in the DTMCs that were not observed in our data (see [37] for more details).

<sup>6</sup> We rewrote the properties in terms of the discrete values.

The PRISM analysis scales well; e.g., evaluating *Property 1* for model m<sup>2</sup> (N = 30) requires less than 0.1 s on an M1 MacBook Pro with 16 GB RAM. The numbers are similar for the other queries. However, the confidence analysis does not scale as well; we could not go beyond N = 4 with a timeout of two hours, with *Property 1* being the hardest to check. Newer work, fPMC [13], addresses these scalability challenges, but we found it not yet mature enough to be applied to our models.

Discussion and Lessons Learned. The experiments demonstrate the feasibility of our approach, which enables reasoning about a complex DNN interacting with conventional (discrete-time) components via a simple probabilistic abstraction. Our analysis not only provides qualitative (i.e., an error is reachable or not) but also quantitative (i.e., likelihood of error) results, helping developers assess the risk associated with the analyzed scenario.

Fig. 4. Confidence interval results via FACT

The results highlight the benefit of the run-time guards in improving the safety of the overall system; see Figs. 3(a,b) for lower error probabilities and Figs. 4(a,b) for tighter intervals for m2. The probability of aborting is very small, indicating the efficacy of the fail-safe mechanism (see Fig. 3(c)). More importantly, since the DNN demonstrates higher accuracy on the inputs where the run-time check passes, the results also indicate that *improved accuracy of the DNN translates into improved safety*. The computed probabilities and confidence intervals can be examined by developers and regulators to ensure that system safety is met at the required levels. If the confidence intervals are too large, they can be tightened by adding more data, as guided by the confusion matrices.

Based on our feedback (the confusion matrices), our industrial partner is retraining the perception network. As the system is in its early stages, our industrial partner was more interested in the trends suggested by our analysis than in the exact probability results. For instance, our results indicate that safety will increase with a better-performing network. The partner was also interested in how the DNN-specific analysis contributes to the system-level analysis. A probabilistic analysis is best viewed as an "average-case" analysis rather than a "worst-case" one. Nevertheless, such an analysis is still useful since it conveys whether the system at least behaves safely in the average case.

### 5 Conclusion

We demonstrated a method for analyzing the safety of autonomous systems that use complex DNNs for visual perception. Our abstraction helps separate the concerns of DNN and conventional system development and evaluation. It also enables the integration of heterogeneous artifacts from DNN-specific analysis and system-level probabilistic model checking. The approach produces not only qualitative results but also insights that can be used in quantitative safety assessment for AI/DNN-enabled systems. This is potentially an important step toward filling one of the gaps in quantitative evaluation for future AI certification [1].

Future work involves experimentation with image data sets representing a variety of environmental conditions. We also plan to refine our models by inducing finer partitions on the DNN, and to validate them through simulations. Another research direction is the composition of safety proofs for the system analyzed in different scenarios. Finally, we are working on compositional analysis techniques to achieve worst-case (non-probabilistic) guarantees.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Hybrid Controller Synthesis for Nonlinear Systems Subject to Reach-Avoid Constraints

Zhengfeng Yang<sup>1</sup>, Li Zhang<sup>1</sup>, Xia Zeng<sup>2(B)</sup>, Xiaochao Tang<sup>1</sup>, Chao Peng<sup>1</sup>, and Zhenbing Zeng<sup>3</sup>

<sup>1</sup> Shanghai Key Lab of Trustworthy Computing, East China Normal University, Shanghai, China

{zfyang,cpeng}@sei.ecnu.edu.cn, {lzhang,xctang}@stu.ecnu.edu.cn

<sup>2</sup> School of Computer and Information Science, Southwest University, Chongqing, China

xzeng0712@swu.edu.cn

<sup>3</sup> Department of Mathematics, Shanghai University, Shanghai, China

zbzeng@shu.edu.cn

Abstract. There is a pressing need for learned controllers that endow systems with safety and goal-reaching properties, which are crucial for many safety-critical systems. Reinforcement learning (RL) has been deployed successfully to synthesize controllers from user-defined reward functions encoding desired system requirements. However, synthesizing provably correct controllers with safety and goal-reaching guarantees remains a significant challenge. To address this issue, we design a special hybrid polynomial-DNN controller that is easy to verify without losing expressiveness and flexibility. This paper proposes a novel method to synthesize such a hybrid controller based on RL, low-degree polynomial fitting, and knowledge distillation. It also gives a computational approach that builds and solves a constrained optimization problem derived from the verification conditions to produce barrier certificates and Lyapunov-like functions, which guarantee that every trajectory of the system under the resulting controller, starting from the initial set, satisfies the given safety and goal-reaching requirements. We evaluate the proposed hybrid controller synthesis method on a set of benchmark examples, including several high-dimensional systems. The results validate the effectiveness and applicability of our approach.

Keywords: Formal verification · Controller synthesis · Reinforcement learning · Barrier certificate · Lyapunov-like function

### 1 Introduction

The design of control and decision-making software for autonomous systems is a key part of many industrial applications, such as unmanned aerial vehicles, ground vehicles, and general robots; it has therefore attracted continued attention over the last decade [7,9,12,14]. Among the many research directions in this field, a highly challenging problem is controller synthesis, i.e., building control systems that guarantee safety and reachability simultaneously. As an emerging approach, machine learning has also been developed to tackle this problem in recent years. Several existing techniques focus on learning a control policy from user-defined reward/cost functions that encode the required properties. A typical way is to use the framework of reinforcement learning (RL), which evaluates and improves the controller's performance by interacting with environments and systems. Because of its strong ability to deal with nonlinear and/or uncertain (or nondeterministic) high-dimensional dynamical systems, as well as the universal approximation power of deep neural networks, RL-based controller synthesis has been extensively studied, and substantial progress has been made by different research teams [22,23]. However, formal reasoning about the required properties of such DNN-controlled dynamical systems is an arduous and challenging problem, which still limits the practical use of RL. For safety/reachability verification of the system under the learned controller, one main approach is computing the reachable sets of the system [8,13,30], which requires approximating the solutions to the system's ODEs; the scalability of these approaches is therefore largely restricted. Another major approach is certificate synthesis by solving associated SMT problems [6,16,31], which also has limited scalability owing to the complexity of symbolic computation in general-purpose SMT solvers.
In this paper, we utilize the advantage of RL to train an elaborately designed hybrid controller, which makes the system easier to verify against safety and goal-reaching requirements while maintaining controllability.

Our proposed hybrid controller takes the form of a low-degree polynomial plus a relatively small neural network, called a polynomial-DNN controller. The learning-based process of polynomial-DNN controller synthesis is divided into the following four phases: (1) we first train a well-performing DNN controller by RL with safety and goal-reaching requirements; (2) we then roughly fit the trained DNN by a polynomial with a prescribed low degree bound, as one part of the hybrid structure; (3) we construct a small, special neural network (NN) with the square activation function on the hidden layer and tanh on the output layer as a supplement to the polynomial part, and distill an initial polynomial-DNN controller from the original DNN controller; (4) finally, we use RL, starting from the distilled controller, to fine-tune a well-performing polynomial-DNN controller.

Thanks to the hybrid form consisting of a polynomial and a small NN with a special structure, the obtained hybrid controller is easier to verify while maintaining its expressiveness and flexibility, for two main reasons: (1) regarding verification efficiency, the original DNN is fitted by a low-degree polynomial through coarse approximation, which can be obtained easily and significantly reduces the difficulty of formal verification; (2) the NN part compensates for the controller performance loss caused by the coarse polynomial approximation. Benefiting from this structure, the system with the polynomial-DNN controller can be equivalently transformed into a polynomial form via system recasting, which makes the subsequent verification easily solvable.

The necessity of a polynomial-DNN controller can be explained as follows. Transforming a DNN into polynomial form enables the application of efficient polynomial solving techniques for formal verification, but there is no guarantee that a polynomial of a specified degree bound can fit a DNN with high accuracy; meanwhile, the approximation and the corresponding verification problem become quite complicated as the degree of the polynomial increases, which may also cause verification to fail. Therefore, we resort to low-degree polynomial approximation and simultaneously retrain a small NN to compensate for the loss of accuracy, since a rough approximating polynomial cannot replace the whole DNN controller, and verification may fail for the system controlled by the polynomial part alone. The hybrid controller balances richness of expressiveness against ease of formal verification very well. To check the effectiveness of the proposed approach, we have evaluated the hybrid controller synthesis on a set of commonly used benchmark examples. To summarize, the main contributions of this paper are as follows:


#### 1.1 Related Works

Several research works focus on controller synthesis for the safety requirement; a typical way is to use reinforcement learning or supervised learning to build an overall learning framework for synthesizing safety certificates (such as control barrier functions, CBFs) [1,26–29].

For the goal-reaching requirement, most existing works concentrate on building controllers that drive the system to a specified set within a time bound [8,11,13,30]. Others focus on synthesizing a control policy that makes the system asymptotically converge to a specified goal state set, which is called the stability requirement. Generating Lyapunov function certificates is a practical routine in this respect [3–5,15,25].

In fact, learning a reach-avoid controller, i.e., one meeting both safety and goal-reaching requirements, is a much more complicated problem. An example was given in [10], where a correct-by-construction controller consisting of a reference controller and a tracking controller was built to drive the actual trajectory along the reference trajectory, with different reference controllers pre-designed for different scenarios.

Recently, a new learning-based approach was implemented in [17], where a safe and goal-reaching policy is constructed by jointly learning two additional certificate functions using supervised learning. Note that there is a risk of synthesizing false certificates, as the certificate constraints are only enforced at the sampled points. Although one can perform posterior formal verification to overcome this weakness, such verification is difficult when several DNNs are involved in the system. By comparison, our synthesized hybrid polynomial-DNN controller has clear advantages for formal verification.

### 2 Preliminaries

**Notations**. Let $\mathbb{R}[\mathbf{x}]$ denote the ring of polynomials with coefficients in $\mathbb{R}$ over the variables $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$, and let $\mathbb{R}[\mathbf{x}]^n$ denote the set of $n$-dimensional polynomial vectors. Let $\Sigma[\mathbf{x}] \subset \mathbb{R}[\mathbf{x}]$ be the set of SOS polynomials. The distance from $\mathbf{x}$ to a set $S$ is defined by $\|\mathbf{x}\|_S = \inf_{s \in S} \|\mathbf{x} - s\|_2$. A continuous function $\alpha : [0, a) \to [0, +\infty)$ for some $a > 0$ is said to belong to class $\mathcal{K}$ if it is strictly increasing and satisfies $\alpha(0) = 0$. A continuous function $\beta : (-b, c) \to (-\infty, +\infty)$ for some $b, c > 0$ is said to belong to extended class $\mathcal{K}$ if it is strictly increasing and satisfies $\beta(0) = 0$. A continuous function $\gamma : [0, c) \times [0, \infty) \to [0, +\infty)$ for some $c > 0$ belongs to class $\mathcal{KL}$ if, for each fixed $s$, the mapping $\gamma(r, s)$ belongs to class $\mathcal{K}$ with respect to $r$, and, for each fixed $r$, the mapping $\gamma(r, s)$ is decreasing with respect to $s$ and $\gamma(r, s) \to 0$ as $s \to \infty$.

This section formulates the safety and goal-reaching controller synthesis problem. A controlled continuous dynamical system is modeled by first-order ordinary differential equations

$$
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{u}), \quad \text{with } \mathbf{u} = \mathbf{k}(\mathbf{x}), \tag{1}
$$

where $\mathbf{x} \in \Psi \subseteq \mathbb{R}^n$ are the system states, $\mathbf{u} \in U \subseteq \mathbb{R}^m$ are the control inputs, and $\mathbf{f} \in \mathbb{R}[\mathbf{x}]^n$ is the vector field defined on the state space $D \subseteq \mathbb{R}^n$.

Assume $\mathbf{f}$ satisfies the local Lipschitz condition, which ensures that (1) has a unique solution $\mathbf{x}(t, \mathbf{x}_0)$ in $D$ for every initial state $\mathbf{x}_0 \in D$ at $t = 0$. A dynamical system is equipped with a domain $\Psi \subset D$ and an initial set $\Theta \subset \Psi$, represented as a triple $C \doteq (\mathbf{f}, \Psi, \Theta)$. Given a prespecified unsafe region $X_u \subset D$, we say that the system $C$ is *safe* if no trajectory starting from $\Theta$ can evolve into the unsafe region $X_u$; this property has been widely investigated in safety-critical applications.

Definition 1 (Safety). *For a controlled constrained continuous dynamical system (CCDS) $C = (\mathbf{f},\Psi,\Theta)$ and a given unsafe region $X_u$, the system is safe if for all $\mathbf{x}_0 \in \Theta$ there does not exist $t_1 > 0$ such that*

$$
\forall t \in [0, t_1).\ \mathbf{x}(t, \mathbf{x}_0) \in \Psi \quad \text{and} \quad \mathbf{x}(t_1, \mathbf{x}_0) \in X_u.
$$

At the same time, another important property, called *goal-reaching*, has received much attention; it is a generalization of stability.

Definition 2 (Goal-reaching). *Given a controlled CCDS $C = (\mathbf{f},\Psi,\Theta)$ and a set of goal states $X_g \subset D$, the system $C$ is goal-reaching with respect to the goal set $X_g$ if there exists a $\mathcal{KL}$-function $\gamma$ such that for any $\mathbf{x}_0 \in \Theta$,*

$$\|\mathbf{x}(t)\|_{X_g} \le \gamma(\|\mathbf{x}(0)\|_{X_g}, t) \quad \text{for all } t \ge 0.$$

Definition 3 (Safe and Goal-reaching Controller Synthesis). *Given a controlled CCDS $C = (\mathbf{f},\Psi,\Theta)$ with $\mathbf{f}$ defined by (1), an unsafe set $X_u$, and a goal set $X_g$, design a locally Lipschitz continuous feedback control law $\mathbf{k}$ such that the closed-loop system $C$ with $\mathbf{f} = \mathbf{f}(\mathbf{x}, \mathbf{k}(\mathbf{x}))$ is both safe and goal-reaching as per Definitions 1 and 2.*

The concept of *barrier certificates* plays an important role in the safety verification of continuous systems. The essential idea is to use the zero level set of a barrier certificate $B(\mathbf{x})$ as a barrier separating all reachable states from the unsafe region. The following notion of barrier certificate, adapted from [24], can be used to guarantee the safety of a given controlled CCDS.

Theorem 1 [24]. *Given a controlled CCDS $C = (\mathbf{f},\Psi,\Theta)$ with $\mathbf{f}$ defined by (1), a feedback control law $\mathbf{u} = \mathbf{k}(\mathbf{x})$, and an unsafe region $X_u$, suppose there exists a real-valued function $B : \Psi \to \mathbb{R}$ satisfying the following conditions:*

(i) $B(\mathbf{x}) \ge 0\ \forall \mathbf{x} \in \Theta$, (ii) $B(\mathbf{x}) < 0\ \forall \mathbf{x} \in X_u$, (iii) $B(\mathbf{x}) = 0 \Rightarrow \mathcal{L}_{\mathbf{f}}B(\mathbf{x}) > 0\ \forall \mathbf{x} \in \Psi$,

*where $\mathcal{L}_{\mathbf{f}}B(\mathbf{x})$ denotes the Lie derivative of $B(\mathbf{x})$ along the vector field $\mathbf{f}(\mathbf{x})$, i.e., $\mathcal{L}_{\mathbf{f}}B(\mathbf{x}) = \sum_{i=1}^n \frac{\partial B}{\partial x_i} \cdot f_i(\mathbf{x})$; then $B(\mathbf{x})$ is a barrier certificate for the closed-loop system $C$ with the control law $\mathbf{k}(\mathbf{x})$, and the safety of system $C$ is guaranteed.*
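As a quick, sampled sanity check of these three conditions, consider a toy 1-D closed-loop system and a candidate certificate of our own choosing (not an example from the paper; sampling does not constitute a proof, which is what the SOS encoding in Sect. 4 provides):

```python
import numpy as np

# Toy closed-loop 1-D system xdot = f(x) = -x (illustrative only)
f   = lambda x: -x
B   = lambda x: 1 - x**2                 # candidate barrier certificate
LfB = lambda x: -2 * x * f(x)            # Lie derivative: (dB/dx) * f(x) = 2x^2

theta  = np.linspace(-0.5, 0.5, 101)                      # samples of the initial set
unsafe = np.concatenate([np.linspace(-3.0, -1.5, 50),     # samples of X_u = {|x| >= 1.5}
                         np.linspace(1.5, 3.0, 50)])

cond_i   = bool(np.all(B(theta) >= 0))                    # (i)  B >= 0 on Theta
cond_ii  = bool(np.all(B(unsafe) < 0))                    # (ii) B < 0 on X_u
cond_iii = all(LfB(x) > 0 for x in (-1.0, 1.0))           # (iii) LfB > 0 on {B = 0}
```

Here the zero level set of $B$ is $\{\pm 1\}$, which separates the initial set from the unsafe region, and condition (iii) pushes trajectories back toward the safe side.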

For the goal-reaching controller design, we use a more general Lyapunov-like function, introduced in the following definition.

Definition 4 (Lyapunov-like function). *Given a continuous system $C = (\mathbf{f},\Psi,\Theta)$ and a set of goal states $X_g \subseteq \Psi$, a continuously differentiable real-valued function $V : \Psi \to \mathbb{R}$ is said to be a Lyapunov-like function if*

(i) $\{\mathbf{x} \mid V(\mathbf{x}) \le 0\} \ne \emptyset$ *and* $\{\mathbf{x} \mid V(\mathbf{x}) \le 0\} \subseteq X_g$, (ii) $\mathcal{L}_{\mathbf{f}}V(\mathbf{x}) \le -\beta(V(\mathbf{x}))\ \forall \mathbf{x} \in \Psi$,

*where $\beta$ is some extended class $\mathcal{K}$ function, and $\mathcal{L}_{\mathbf{f}}V(\mathbf{x}) = \sum_{i=1}^n \frac{\partial V}{\partial x_i} \cdot f_i(\mathbf{x})$.*

As mentioned in [17], the above Lyapunov-like function is more general than the classic one used in [3,4,21,25]. A Lyapunov-like function does not require $\mathcal{L}_{\mathbf{f}}V(\mathbf{x})$ to be negative definite everywhere; that is, $\mathcal{L}_{\mathbf{f}}V(\mathbf{x}) > 0$ may occur on $\{\mathbf{x} \mid V(\mathbf{x}) < 0\}$, which makes the condition less restrictive.

Theorem 2. *For a controlled CCDS $C = (\mathbf{f},\Psi,\Theta)$ with $\mathbf{f}$ defined by (1) and a set of goal states $X_g \subseteq \Psi$, if $V(\mathbf{x})$ is a Lyapunov-like function as in Definition 4, then the system under $\mathbf{u} = \mathbf{k}(\mathbf{x})$ is goal-reaching with respect to $X_g$.*

Combining Theorems 1 and 2, we obtain the assertion that the existence of barrier certificates and Lyapunov-like functions guarantees that the closed-loop system under the control law is both safe and goal-reaching. Hereafter, we refer to both barrier and Lyapunov-like functions as certificate functions for simplicity.
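The two conditions of Definition 4 can likewise be sanity-checked by sampling on a toy 1-D system (the dynamics, goal set, and candidate function are our own illustrative choices, not from the paper):

```python
import numpy as np

# Toy system xdot = -x with goal set X_g = [-0.1, 0.1] (illustrative choices)
V    = lambda x: x**2 - 0.01             # candidate Lyapunov-like function
LfV  = lambda x: 2 * x * (-x)            # Lie derivative: (dV/dx) * f(x) = -2x^2
beta = lambda s: 2 * s                   # an extended class-K comparison function

xs = np.linspace(-2.0, 2.0, 401)
# (i)  the sublevel set {V <= 0} is exactly [-0.1, 0.1], contained in X_g
sublevel_ok = bool(np.all(np.abs(xs[V(xs) <= 0]) <= 0.1 + 1e-9))
# (ii) LfV(x) <= -beta(V(x)) everywhere: -2x^2 <= -2x^2 + 0.02 always holds
decrease_ok = bool(np.all(LfV(xs) <= -beta(V(xs)) + 1e-12))
```

Note that $\mathcal{L}_{\mathbf{f}}V$ here is zero at the origin, so $V$ is not a classic Lyapunov function with a strictly negative-definite derivative, yet it satisfies the weaker comparison condition (ii).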

### 3 Hybrid Polynomial-DNN Controllers Training

For the safe and goal-reaching controller synthesis problem, we design an easy-to-verify control policy with the aid of reinforcement learning (RL), based on barrier certificate and Lyapunov-like function generation. As is well known, it is hard for a controller with a simple structure to guarantee safe and goal-reaching behavior for large-scale systems. Conversely, controllers with complex structures allow the system more flexible behavior, but tackling reach-avoid verification of a system with such a complex controller requires much more computational effort. To make verification amenable, we propose a method to learn a controller with a special structure, the *hybrid polynomial-DNN controller*, which is easily verifiable and can be customized to safety and goal-reaching requirements. Specifically, this hybrid controller consists of a polynomial and a small neural network with a single hidden layer. Notably, it is expected to exhibit behavior similar to the original complex DNN controller, but is much easier to verify thanks to its special structure, as will be elaborated in Sect. 4.

To achieve this, we adopt a low-degree polynomial to roughly approximate the DNN. Then we fix the structure of a small neural network and append it to the low-degree polynomial to construct a hybrid-form controller, which is retrained using RL. To accelerate the retraining process, we use distillation to obtain an initialization of the NN part of the hybrid controller. In summary, the learning-based process of hybrid controller synthesis is divided into the following three stages, as shown in Fig. 1.

Fig. 1. Diagram of the training framework.

– Train a deep neural network controller via RL. Based on reinforcement learning, we directly train a deep neural network (DNN) controller for the given control system. Briefly, the RL procedure continuously uses the current controller to drive the system by interacting with the environment, and updates the controller's parameters via rewards and penalties. Through sufficient simulation and training, we expect to obtain a DNN controller that enables the system to avoid the unsafe set and reach the specified target set with high probability.


#### 3.1 Training Well-Performing DNN Controllers Using RL

As illustrated in Fig. 1, the RL method is applied to train a well-performing controller so that the system is able to avoid obstacles and reach the goal region within the time bound.

We construct the reward function by encoding the desired behaviours of the closed-loop system under the DNN controller, which should ensure avoidance of the unsafe region and reachability of the goal region. We hope that RL, guided by the designed reward, synthesizes an ideal controller under which no trajectory of the closed-loop system starting from the initial set $\Theta$ evolves into the unsafe region $X_u$, and every trajectory reaches the desired region $X_g$. The reward function design should therefore address two aspects: reward behaviours that stay far from the unsafe region, and reward behaviours that approach the goal region. For the safety requirement, the reward function should penalize behaviours approaching $X_u$. Thus, this component can be defined as a negated joint Gaussian bump over the system state, whose center and scale are the center and radius of $X_u$, respectively,

$$
reward_u(\mathbf{x}_t) = -e^{-\frac{1}{2}\sum_{i=1}^n \left(\frac{x_i(t) - x_u^i}{\rho_u^i}\right)^2},
$$

where $\mathbf{x}_u = (x_u^1, \ldots, x_u^n) \in X_u \subset D$ is the center of $X_u$ and $\rho_u = (\rho_u^1, \ldots, \rho_u^n)$ is the radius of $X_u$. Similarly, the reward for the goal-reaching purpose can be defined as a joint Gaussian bump,

$$reward_g(\mathbf{x}_t) = e^{-\frac{1}{2}\sum_{i=1}^n \left(\frac{x_i(t) - x_g^i}{\rho_g^i}\right)^2},$$

where $\mathbf{x}_g = (x_g^1, \ldots, x_g^n)$ and $\rho_g$ are the center and the radius of $X_g$, respectively. The entire reward function consists of the above two components, i.e.,

$$reward(\mathbf{x}_t) = \lambda \cdot reward_g(\mathbf{x}_t) + (1 - \lambda) \cdot reward_u(\mathbf{x}_t),$$

to achieve the task of safety and goal reachability, where $0 < \lambda < 1$ is a parameter controlling the weights between $reward_g(\mathbf{x}_t)$ and $reward_u(\mathbf{x}_t)$.
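As a concrete sketch, the two Gaussian-shaped reward components and their convex combination can be written as follows (the 2-D sets and the value of λ are hypothetical; in the paper these rewards drive the RL training):

```python
import math

def gaussian_bump(x, center, radius):
    """exp(-1/2 * sum(((x_i - c_i) / rho_i)^2)); equals 1 at the center, decays away."""
    return math.exp(-0.5 * sum(((xi - ci) / ri) ** 2
                               for xi, ci, ri in zip(x, center, radius)))

def reward(x, x_u, rho_u, x_g, rho_g, lam=0.5):
    reward_u = -gaussian_bump(x, x_u, rho_u)   # penalize states near the unsafe set
    reward_g = gaussian_bump(x, x_g, rho_g)    # reward states near the goal set
    return lam * reward_g + (1 - lam) * reward_u

# Hypothetical 2-D sets: unsafe region centered at (1, 1), goal at (-1, -1)
x_u, rho_u = (1.0, 1.0), (0.5, 0.5)
x_g, rho_g = (-1.0, -1.0), (0.5, 0.5)
r_at_goal   = reward((-1.0, -1.0), x_u, rho_u, x_g, rho_g)   # close to +0.5
r_at_unsafe = reward(( 1.0,  1.0), x_u, rho_u, x_g, rho_g)   # close to -0.5
```

With λ = 0.5 and well-separated sets, the reward peaks near +0.5 at the goal center and dips near −0.5 at the unsafe center, which is the gradient signal the RL agent follows.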

The remaining problem is to train the controller via RL. Here we use Deep Deterministic Policy Gradient (DDPG) [20], a popular RL approach suited to continuous control applications. The DDPG algorithm combines value-based and policy-based methods and is made up of two neural networks: the critic network and the actor network.

To train the desired controller, we first generate a set of initial states from $\Theta$. For each sampled initial state $\mathbf{x}_0$, with the help of $\mathbf{u}_{RL}$, one obtains the associated trajectory as a discrete-time state sequence $\{\mathbf{x}_0, \mathbf{x}_1, \cdots, \mathbf{x}_t, \cdots, \mathbf{x}_m\}$ that does not enter the unsafe area, and then collects the transition tuples $(\mathbf{x}_t, \mathbf{x}_{t+1}, \mathbf{u}_t, reward(\mathbf{x}_t))$ to form a replay buffer. Every few time steps, a batch of data is sampled from the replay buffer to update the parameters of the critic and actor networks; the new controller is then used to simulate trajectories and collect new data until the controller behaves well.
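The transition collection described above can be sketched as a minimal replay buffer (an illustrative skeleton only; a full DDPG implementation additionally maintains actor and critic networks and their target copies):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal buffer of (x_t, x_{t+1}, u_t, reward) transition tuples (sketch)."""
    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)   # oldest transitions are evicted when full

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        # Uniform random sampling decorrelates consecutive transitions
        return random.sample(self.buf, min(batch_size, len(self.buf)))

buf = ReplayBuffer()
for t in range(100):                        # placeholder 1-D transitions
    buf.push((float(t), float(t + 1), 0.0, -0.1))
batch = buf.sample(32)
```

Each sampled batch would feed one gradient update of the critic (via the Bellman target) and the actor (via the deterministic policy gradient).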

#### 3.2 Polynomial Approximation

Following the RL training process in Sect. 3.1, one will typically adopt a complex DNN structure to obtain a well-performing DNN controller. For safety-critical systems, the properties of such synthesized controllers, such as safety and goal-reaching, need to be formally guaranteed. However, verifying specified properties for the closed-loop system under the trained DNN controller is challenging due to its complexity. One could find a high-degree polynomial that approximates the trained DNN with extremely high precision and use it as the controller candidate to be verified with polynomial constraint solving. However, the corresponding verification problem with such a high-degree polynomial controller can incur unbearably high computational complexity, as will be shown in the experiment section.

Based on the DNN controller $\mathbf{u}_{RL}$ trained through RL, we construct an easily verifiable controller of hybrid form, which can keep the system safe and goal-reaching. We first roughly approximate $\mathbf{u}_{RL}$ by a low-degree polynomial, denoted by $p(\mathbf{x})$, as one part. Afterwards, we retrain a small NN with one hidden layer, denoted by $k(\mathbf{x})$, to compensate for the approximation error between $\mathbf{u}_{RL}$ and $p(\mathbf{x})$. The hybrid polynomial-DNN controller is then $p(\mathbf{x}) + k(\mathbf{x})$. The main task of this subsection is to obtain the approximating polynomial $p(\mathbf{x})$ from sampled points.

Concretely, a real coefficient vector $\mathbf{c}$ is used to parameterize a polynomial $p(\mathbf{x}, \mathbf{c})$ of a given degree $d$, i.e., $p(\mathbf{x}, \mathbf{c}) = \sum_j c_j b_j(\mathbf{x})$, where the $b_j(\mathbf{x})$ are monomials of total degree $\le d$. Given the sampled points, we obtain the coefficient vector $\mathbf{c}^*$ by solving a least-squares problem. Thus, $p(\mathbf{x}, \mathbf{c}^*)$ is the approximation of $\mathbf{u}_{RL}(\mathbf{x})$ on $\Psi$, denoted by $p(\mathbf{x})$ for brevity. The residual function $r(\mathbf{x})$ denotes the error between the approximating polynomial $p(\mathbf{x})$ and the DNN controller, i.e., $r(\mathbf{x}) = \mathbf{u}_{RL}(\mathbf{x}) - p(\mathbf{x})$.
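A minimal sketch of this least-squares step in one dimension, with an arbitrary smooth stand-in for $\mathbf{u}_{RL}$ (the multivariate case replaces the monomial basis $1, x, \ldots, x^d$ with all monomials of total degree $\le d$):

```python
import numpy as np

def fit_poly(xs, ys, degree):
    """Least-squares fit of c in p(x, c) = sum_j c_j * x^j (1-D input for brevity)."""
    basis = np.vander(xs, degree + 1, increasing=True)   # columns 1, x, ..., x^d
    c, *_ = np.linalg.lstsq(basis, ys, rcond=None)
    return c

# Stand-in for the trained controller u_RL (an arbitrary smooth function, not the paper's DNN)
u_rl = lambda x: np.tanh(2.0 * x)
xs = np.linspace(-1.0, 1.0, 200)
c = fit_poly(xs, u_rl(xs), degree=3)
p_vals = np.vander(xs, 4, increasing=True) @ c
residual = u_rl(xs) - p_vals        # r(x) = u_RL(x) - p(x), compensated by the NN part
```

The fit is deliberately coarse: a degree-3 polynomial tracks the overall shape, while the nonzero residual is exactly the signal the small NN part is later trained to absorb.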

Having $p(\mathbf{x})$, we cannot simply take it as the controller, because the error $r(\mathbf{x})$ between $\mathbf{u}_{RL}(\mathbf{x})$ and $p(\mathbf{x})$ is not negligible. To account for this, we compensate for the error by fitting the residual function $r(\mathbf{x})$, i.e., by retraining a hybrid controller $p(\mathbf{x}) + k(\mathbf{x}|\theta)$ to rectify the system behavior, where $\theta$ are the learnable parameters of the NN part.

#### 3.3 Training the Residual Controller

In this part, we retrain to compensate for the difference in system behavior between the polynomial part $p(\mathbf{x})$ and the original DNN controller $\mathbf{u}_{RL}$.

The Structure of the Residual Network. We design a special neural network as the compensation, to make the resulting verification problem tractable. As illustrated in Fig. 2, a typical DNN has a layered architecture and can be represented as a composition of its $L$ layers: $k(\mathbf{x}|\theta) = l_L \circ l_{L-1} \circ \cdots \circ l_1(\mathbf{x})$, where $l_i(\mathbf{x}) = \sigma_i(W_i\mathbf{x} + b_i)$ is parameterized by a weight matrix $W_i$ and a bias vector $b_i$; all the parameters are denoted by $\theta$ for brevity. This work takes $\sigma_i$ to be the square activation on the hidden layers and the tanh activation on the output layer $L$, as shown in Fig. 2. This special setting has two advantages: i) the training process converges more easily, helped by the output being normalized to the range $[-1, 1]$; ii) the control system with an NN controller of this type can be transformed into a polynomial form by system recasting (cf. Sect. 4.1 for details). Regarding ii), we introduce a new variable $x_{n+1}$ to represent the NN output, i.e., $x_{n+1} := \tanh(h(\mathbf{x}))$, where $h(\mathbf{x}) := l_{L-1} \circ \cdots \circ l_1(\mathbf{x})$ denotes the polynomial part of the NN. The main observation that allows us to transform the system with this NN controller into an equivalent polynomial system is that the special NN's derivative can be expressed as

$$
\dot{x}_{n+1} = (1 - x_{n+1}^2)\dot{h}.\tag{2}
$$

We construct this small NN with a single hidden layer because such a simple network, added to the polynomial part as compensation, is sufficient to control the systems well.
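The identity behind equation (2) can be checked numerically for a hypothetical one-hidden-layer $h$ with square activation (all weights here are made up for illustration):

```python
import math

# A hypothetical one-hidden-layer h: affine -> square -> affine, so h is a polynomial in x
w1, b1, w2, b2 = 1.5, 0.2, 0.8, -0.1
h  = lambda x: w2 * (w1 * x + b1) ** 2 + b2      # square activation on the hidden layer
dh = lambda x: 2 * w2 * (w1 * x + b1) * w1       # dh/dx

k = lambda x: math.tanh(h(x))                    # the NN part of the hybrid controller

# Equation (2) in x-derivative form: dk/dx = (1 - k(x)^2) * dh/dx
x, eps = 0.3, 1e-6
finite_diff = (k(x + eps) - k(x - eps)) / (2 * eps)
analytic    = (1 - k(x) ** 2) * dh(x)
```

The agreement follows from the chain rule and $\tanh'(u) = 1 - \tanh^2(u)$; along a trajectory the same identity gives $\dot{x}_{n+1} = (1 - x_{n+1}^2)\dot{h}$.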

The Residual Controller Training. We then retrain the hybrid controller $p(\mathbf{x}) + k(\mathbf{x}|\theta)$ using the RL technique described in the previous subsection. To improve training efficiency, the knowledge distillation technique is used to obtain the initialization of the NN part $k(\mathbf{x}|\theta)$. It is easy to achieve

Fig. 2. Structure of the small neural network in the hybrid controller.

this by regarding the residual function $r(\mathbf{x})$ as the ensemble (teacher) network and distilling its knowledge into a small model (the student network). The learned student network realizes the knowledge transfer from the teacher network and provides the initial values of $k(\mathbf{x}|\theta)$ for further training.

We reiterate that the purpose of constructing a hybrid controller by adding $k(\mathbf{x}|\theta)$ to the polynomial part $p(\mathbf{x})$ is for the compensation to make the hybrid controller drive the system as expected. We achieve this not by training $k(\mathbf{x}|\theta)$ to satisfy $\mathbf{u}_{RL} = p(\mathbf{x}) + k(\mathbf{x}|\theta)$, but by requiring that the controller $p(\mathbf{x}) + k(\mathbf{x}|\theta)$ drive the closed-loop system $\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, p(\mathbf{x}) + k(\mathbf{x}|\theta))$ to be safe and goal-reaching.

We need to train the hybrid controller $p(\mathbf{x}) + k(\mathbf{x}|\theta)$ for the above system to obtain the parameters $\theta$. Using the learned parameters of the student network from knowledge distillation as the initialization of $k(\mathbf{x}|\theta)$, we simulate the system to collect a dataset of sampled trajectories and use the DDPG algorithm to achieve the control objectives of safety and goal-reaching, following the reward design elaborated in Sect. 3.1. Once training is completed, we obtain the desired hybrid polynomial-DNN controller $u(\mathbf{x}) = p(\mathbf{x}) + k(\mathbf{x})$, where $p(\mathbf{x})$ is the polynomial part and $k(\mathbf{x})$ is the small neural network.

### 4 Reach-Avoid Verification with Lyapunov-Like Functions and Barrier Certificates Generation

To ensure the safety and goal-reaching properties of the specified control system under the synthesized controller, a relaxed surrogate is to generate a Lyapunov-like function and a barrier certificate, as stated in Theorems 1 and 2. To make the computation tractable, the basic idea is to translate the problem of producing barrier certificates and Lyapunov-like functions into a solvable polynomial optimization problem. Specifically, we first transform the ODEs $\mathbf{f}$ of the CCDS through system recasting; we then abstract the initial set $\Theta$, the unsafe region $X_u$, the goal set $X_g$, and the system domain $\Psi$ by polynomial expressions. Finally, we establish the polynomial optimization problems arising from the barrier certificate and Lyapunov-like function constraints, and solve them to produce a barrier certificate and a Lyapunov-like function, which guarantee the safety and goal-reaching properties, respectively, of the system with the hybrid controller. Notably, the Sum-of-Squares (SOS) relaxation technique is applied to encode each polynomial optimization problem as an SOS problem with bilinear matrix inequality (BMI) constraints.

#### 4.1 Constructing Polynomial Simulations of the Controller Network

In the following, we assume without loss of generality that the control input **u** is one-dimensional, for ease of presentation. Consider a controlled CCDS C = (**f**, Ψ, Θ) with **f** defined by (1), an unsafe set X<sub>u</sub> and a goal set X<sub>g</sub>. Suppose the hybrid controller learned for the safety and goal-reaching requirements is u(**x**) = p(**x**) + k(**x**). Here k(**x**) is a small neural network with the square function as the activation in its hidden layer and tanh in the output layer, i.e., k(**x**) = tanh(h(**x**)), where h is a polynomial, in fact the composition of an affine function and the square function. We remove the non-polynomial term occurring in the controller part of the vector field **f**(**x**, **u**) by introducing x<sub>n+1</sub> = tanh(h(**x**)). Then **ẋ** = **f**(**x**, **u**) is transformed into a polynomial system:

$$\begin{cases}
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, p(\mathbf{x}) + x\_{n+1}), \\
\dot{x}\_{n+1} = (1 - x\_{n+1}^2)\dot{h}(\mathbf{x}).
\end{cases} \tag{3}$$

For simplicity, we denote the recast vector field of (3) as **f̂** ∈ R[**x**]<sup>n+1</sup>.
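The correctness of the second equation of (3) rests on the identity d tanh(s)/ds = 1 − tanh²(s). A quick numerical check (with an arbitrary smooth test signal standing in for the paper's learned polynomial h) confirms the recasting:

```python
import math

# Sanity check of the recasting step (3): if x_{n+1}(t) = tanh(h(t)),
# then dx_{n+1}/dt = (1 - x_{n+1}^2) * dh/dt.

h = lambda t: math.sin(t) + 0.5 * t          # test signal h(t) (an assumption)
hdot = lambda t: math.cos(t) + 0.5           # its exact derivative

t, eps = 0.7, 1e-6
x_np1 = math.tanh(h(t))
# central finite difference of tanh(h(t))
fd = (math.tanh(h(t + eps)) - math.tanh(h(t - eps))) / (2 * eps)
# right-hand side of the recast ODE
rhs = (1 - x_np1 ** 2) * hdot(t)
assert abs(fd - rhs) < 1e-8
```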

Besides the vector field, we need to transform Θ, Ψ, X<sub>u</sub> and X<sub>g</sub>, because of the newly introduced variable. For instance, the initial set becomes Θ̄ := {(**x**, x<sub>n+1</sub>) ∈ R<sup>n+1</sup> | **x** ∈ Θ, x<sub>n+1</sub> = tanh(h(**x**))}. In fact, Θ̄ can be abstracted by a polynomial inclusion: we first compute a hyper-rectangle I := {**x** ∈ R<sup>n</sup> | ∧ l<sub>i</sub> ≤ x<sub>i</sub> ≤ u<sub>i</sub>} as an over-approximation of the bounded compact set Θ through interval analysis, then compute a Taylor model for the term tanh(h(**x**)) on I and obtain p<sub>1</sub>(**x**) − δ<sub>1</sub> ≤ x<sub>n+1</sub> ≤ p<sub>1</sub>(**x**) + δ<sub>1</sub>. This yields the polynomial abstraction Θ̂ of Θ̄. For brevity, let **x̂** denote the variable vector extended with the introduced variable x<sub>n+1</sub>, i.e., **x̂** = (**x**, x<sub>n+1</sub>) = (x<sub>1</sub>, ..., x<sub>n</sub>, x<sub>n+1</sub>)<sup>T</sup>. The other sets Ψ, X<sub>u</sub>, X<sub>g</sub> are dealt with in the same manner, yielding the associated polynomial abstractions Ψ̂, X̂<sub>u</sub>, X̂<sub>g</sub>. These polynomial abstractions can be written as follows

$$\begin{cases} \hat{\boldsymbol{\Theta}} := \{ \hat{\mathbf{x}} \in \mathbb{R}^{n+1} \, | \, \mathbf{x} \in \boldsymbol{\Theta}, \, |x\_{n+1} - p\_1(\mathbf{x})| \le \delta\_1 \}, \\ \hat{\boldsymbol{\Psi}} := \{ \hat{\mathbf{x}} \in \mathbb{R}^{n+1} \, | \, \mathbf{x} \in \boldsymbol{\Psi}, \, |x\_{n+1} - p\_2(\mathbf{x})| \le \delta\_2 \}, \\ \hat{X}\_u := \{ \hat{\mathbf{x}} \in \mathbb{R}^{n+1} \, | \, \mathbf{x} \in X\_u, \, |x\_{n+1} - p\_3(\mathbf{x})| \le \delta\_3 \}, \\ \hat{X}\_g := \{ \hat{\mathbf{x}} \in \mathbb{R}^{n+1} \, | \, \mathbf{x} \in X\_g, \, |x\_{n+1} - p\_4(\mathbf{x})| \le \delta\_4 \}. \end{cases} (4)$$

Finally, we obtain a polynomial CCDS Ĉ = (**f̂**, Ψ̂, Θ̂). Therefore, if **x**(t) is a trajectory of system (1) within the domain specified by Ψ starting from some initial state **x**(t<sub>0</sub>) ∈ Θ, then **x̂**(t) is a trajectory of system (3) within the relaxed domain specified by Ψ̂ starting from the initial state **x̂**(t<sub>0</sub>) ∈ Θ̂ with x<sub>n+1</sub>(t<sub>0</sub>) = tanh(h(**x**(t<sub>0</sub>))).
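The shape of the abstraction p<sub>1</sub>(**x**) − δ<sub>1</sub> ≤ tanh(h(**x**)) ≤ p<sub>1</sub>(**x**) + δ<sub>1</sub> can be illustrated by a least-squares fit with a sampled error estimate. This is only an illustration: a genuine Taylor model produces a *verified* remainder bound, and the scalar h below is an assumption:

```python
import numpy as np

# Illustrative abstraction p1(x) - d1 <= tanh(h(x)) <= p1(x) + d1 over a box I.
# We fit p1 by least squares and estimate d1 by dense sampling; a real Taylor
# model would instead give a sound remainder bound.

h = lambda x: 0.8 * x ** 2 - 0.3 * x + 0.2   # toy scalar h, not the paper's h(x)
lo, hi = -0.5, 0.5                            # interval I from interval analysis

xs = np.linspace(lo, hi, 2001)
ys = np.tanh(h(xs))
coeffs = np.polyfit(xs, ys, 3)                # p1: degree-3 least-squares fit
p1 = np.poly1d(coeffs)
delta1 = float(np.max(np.abs(p1(xs) - ys)))   # sampled error estimate

# The polynomial inclusion then reads: |x_{n+1} - p1(x)| <= delta1 on I.
assert 0 < delta1 < 0.01                      # tight fit on this smooth function
assert np.all(np.abs(p1(xs) - ys) <= delta1 + 1e-12)
```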

Theorem 3. *If the controlled CCDS* Ĉ = (**f̂**, Ψ̂, Θ̂) *with* **f̂** *defined by (3) and with* Θ̂*,* Ψ̂ *and* X̂<sub>u</sub> *defined by (4) is safe, then the original CCDS* C = (**f**, Ψ, Θ) *with the given unsafe set* X<sub>u</sub> *is safe. Moreover, if* B(**x̂**) *is a barrier certificate of* Ĉ *w.r.t.* X̂<sub>u</sub>*, then* B(**x**, tanh(h(**x**))) *is also a barrier certificate of* C *w.r.t.* X<sub>u</sub>*.*

*Proof.* Without loss of generality, assume that **x**(t), t > 0, is a trajectory of the controlled CCDS C starting from the initial state **x**(t<sub>0</sub>) ∈ Θ; then **x̂**(t) with x<sub>n+1</sub>(t) = tanh(h(**x**(t))) is a trajectory of Ĉ starting from the initial state **x̂**(t<sub>0</sub>) ∈ Θ̂. The safety of Ĉ means that no trajectory of Ĉ from an initial state in Θ̂ can reach any unsafe state specified by X̂<sub>u</sub>, which implies that no trajectory of C from the initial state **x**(t<sub>0</sub>) can reach any state specified by X<sub>u</sub>. Furthermore, the vector field **f̂** is obtained from **f** by an equivalent transformation, and Θ̂, Ψ̂ and X̂<sub>u</sub> are the associated polynomial abstractions. Therefore, B(**x**, tanh(h(**x**))) is a barrier certificate of the CCDS C.

Theorem 4. *If the controlled CCDS* Ĉ = (**f̂**, Ψ̂, Θ̂) *with* **f̂** *defined by (3) and with* Θ̂*,* Ψ̂ *and* X̂<sub>g</sub> *defined by (4) is goal-reaching, then the original CCDS* C = (**f**, Ψ, Θ) *with the given goal set* X<sub>g</sub> *is goal-reaching. Moreover, if* V(**x̂**) *is a Lyapunov-like function of* Ĉ *w.r.t.* X̂<sub>g</sub>*, then* V(**x**, tanh(h(**x**))) *is a Lyapunov-like function of* C *w.r.t.* X<sub>g</sub>*.*

*Proof.* Suppose the CCDS C is not goal-reaching for the given goal set X<sub>g</sub>. Then there exist ε > 0 and **x**<sub>0</sub> ∈ Θ such that ‖**x**(t)‖<sub>X<sub>g</sub></sub> > ε for all t > 0. The state **x̂**(t) ∈ Ψ̂ with x<sub>n+1</sub>(t) = tanh(h(**x**(t))) starting from the initial state **x̂**(t<sub>0</sub>) satisfies

$$\|\hat{\mathbf{x}}(t)\|\_{\hat{X}\_g} > \epsilon,\tag{5}$$

because, according to (4), X̂<sub>g</sub> is obtained just by adding a new variable without changing the projection onto the first n dimensions, i.e., X<sub>g</sub>. By the theorem assumption, the CCDS Ĉ is goal-reaching, so there exists T > 0 such that ‖**x̂**(t)‖<sub>X̂<sub>g</sub></sub> < ε for t ≥ T, which contradicts (5). As in Theorem 3, V(**x**, tanh(h(**x**))) is a Lyapunov-like function of C w.r.t. X<sub>g</sub>. This completes the proof.

#### 4.2 Producing Barrier Certificate and Lyapunov-Like Function

For simplicity, hereafter we write Θ̂, Ψ̂, X̂<sub>u</sub> and X̂<sub>g</sub> as follows.

$$\begin{cases} \hat{\boldsymbol{\Theta}} := \{ \hat{\mathbf{x}} \in \mathbb{R}^{n+1} \mid \wedge\_{i=1}^{m\_1} g\_i(\hat{\mathbf{x}}) \ge 0 \}, \quad \hat{\boldsymbol{\Psi}} := \{ \hat{\mathbf{x}} \in \mathbb{R}^{n+1} \mid \wedge\_{j=1}^{m\_2} h\_j(\hat{\mathbf{x}}) \ge 0 \}, \\\hat{X}\_u := \{ \hat{\mathbf{x}} \in \mathbb{R}^{n+1} \mid \wedge\_{k=1}^{m\_3} q\_k(\hat{\mathbf{x}}) \ge 0 \}, \quad \hat{X}\_g := \{ \hat{\mathbf{x}} \in \mathbb{R}^{n+1} \mid \wedge\_{\ell=1}^{m\_4} s\_\ell(\hat{\mathbf{x}}) \ge 0 \}. \end{cases}$$

**Barrier Certificate Generation.** Assume that the barrier function B(**x̂**) is a polynomial of degree at most d, whose coefficients form a vector space of dimension $s(d) = \binom{n+1+d}{d}$ with the canonical basis (**x̂**<sup>α</sup>) of monomials. The coefficients are unknown; denote by **b** = (b<sub>α</sub>) ∈ R<sup>s(d)</sup> the coefficient vector of B(**x̂**), and write

$$B(\hat{\mathbf{x}}, \mathbf{b}) = \sum\_{\alpha \in \mathbb{N}\_d^{n+1}} b\_{\alpha} \hat{\mathbf{x}}^{\alpha} = \sum\_{\alpha \in \mathbb{N}\_d^{n+1}} b\_{\alpha} x\_1^{\alpha\_1} x\_2^{\alpha\_2} \cdots x\_n^{\alpha\_n} x\_{n+1}^{\alpha\_{n+1}},$$

in the canonical basis. As stated in Theorem 1 and Theorem 3, the controlled CCDS C is safe under the designed controller if there exists such a barrier certificate B(**x̂**, **b**) for the CCDS Ĉ. Determining the existence of a barrier certificate B(**x̂**, **b**) can be represented as the following feasibility problem.

$$\begin{cases} \text{find } \mathbf{b} \\ \text{s.t. } B(\hat{\mathbf{x}}, \mathbf{b}) \ge 0, \ \forall \hat{\mathbf{x}} \in \hat{\boldsymbol{\Theta}}, \\ \mathcal{L}\_{\mathbf{f}\_{\mathbf{u}}} B(\hat{\mathbf{x}}, \mathbf{b}) > 0, \ \forall \hat{\mathbf{x}} \in \hat{\Psi} \text{ with } B(\hat{\mathbf{x}}, \mathbf{b}) = 0, \\ B(\hat{\mathbf{x}}, \mathbf{b}) < 0, \ \forall \hat{\mathbf{x}} \in \hat{X}\_{u}. \end{cases} \tag{6}$$
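The number of unknowns in **b** for this template grows combinatorially with the degree bound d, which is what makes low-degree certificates attractive. A minimal check of $s(d) = \binom{n+1+d}{d}$ by brute-force monomial enumeration:

```python
import math
from itertools import product

# Dimension of the barrier template: a degree-<=d polynomial in m = n+1
# variables has s(d) = C(m+d, d) coefficients, one per monomial with |alpha| <= d.

def s(m, d):
    """Number of monomials of total degree <= d in m variables."""
    return math.comb(m + d, d)

def monomial_exponents(m, d):
    """Brute-force enumeration of exponent vectors alpha with |alpha| <= d."""
    return [a for a in product(range(d + 1), repeat=m) if sum(a) <= d]

# Example: n = 2 state variables plus the recast variable x_3, degree d = 2.
assert s(3, 2) == 10
assert len(monomial_exponents(3, 2)) == s(3, 2)
assert s(3, 4) == 35   # a quartic template already needs 35 coefficients
```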

Moreover, the Sum-of-Squares (SOS) relaxation technique is applied to encode the feasibility problem (6) as an SOS program. Given a basic semi-algebraic set K defined by K = {**x̂** ∈ R<sup>n+1</sup> | g<sub>1</sub>(**x̂**) ≥ 0, ..., g<sub>s</sub>(**x̂**) ≥ 0}, where g<sub>i</sub>(**x̂**) ∈ R[**x̂**], 1 ≤ i ≤ s, a sufficient condition for the nonnegativity of a given polynomial f(**x̂**) on K is the representation

$$f(\hat{\mathbf{x}}) = \sigma\_0(\hat{\mathbf{x}}) + \sum\_{i=1}^{s} \sigma\_i(\hat{\mathbf{x}}) g\_i(\hat{\mathbf{x}}),\tag{7}$$

where σ<sub>i</sub>(**x̂**) ∈ Σ[**x̂**]<sub>d</sub>, 0 ≤ i ≤ s. The representation (7) ensures that the polynomial f(**x̂**) is nonnegative on the given semi-algebraic set K.
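A concrete instance of the certificate (7), chosen for illustration and not taken from the paper: on K = {x : x ≥ 0}, the polynomial f(x) = x³ + x admits the representation f = σ₀ + σ₁ · g with σ₀ = 0 and σ₁(x) = x² + 1, which is visibly a sum of squares. The identity can be checked coefficient-wise:

```python
import numpy as np

# Certificate (7) in a concrete instance: on K = {x : g(x) = x >= 0},
# f(x) = x^3 + x is nonnegative because f = sigma0 + sigma1 * g with
# sigma0 = 0 and sigma1(x) = x^2 + 1 = x^2 + 1^2, a sum of squares.
# Coefficient vectors are in descending degree order.

f = np.array([1.0, 0.0, 1.0, 0.0])        # x^3 + x
g = np.array([1.0, 0.0])                   # g(x) = x
sigma1 = np.array([1.0, 0.0, 1.0])         # x^2 + 1, visibly SOS
lhs = np.polymul(sigma1, g)                # sigma1 * g  (sigma0 = 0)
assert np.allclose(lhs, f)
# Since sigma1 = (x)^2 + (1)^2 >= 0 everywhere, f >= 0 wherever g >= 0,
# which is exactly the implication that (7) delivers.
```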

Observing (6), the polynomial L<sub>**f**<sub>**u**</sub></sub> B(**x̂**, **b**) involves the uncertain variable ε ranging over [−μ<sup>∗</sup>, μ<sup>∗</sup>], which can be written as ĥ(ε) ≥ 0 with

$$
\hat{h}(\varepsilon) := (\varepsilon + \mu^\*) (\mu^\* - \varepsilon).
$$

Thus, the problem (6) can be transformed into the following optimization problem through SOS relaxation

$$\begin{cases} \text{find } \mathbf{b} \\ \text{s.t. } B(\hat{\mathbf{x}}, \mathbf{b}) - \sum\_{i} \sigma\_{i}(\hat{\mathbf{x}}) g\_{i}(\hat{\mathbf{x}}) \in \Sigma[\hat{\mathbf{x}}], \\ \mathcal{L}\_{\mathbf{f}\_{\mathbf{u}}} B(\hat{\mathbf{x}}, \mathbf{b}) - \lambda(\hat{\mathbf{x}}) B(\hat{\mathbf{x}}, \mathbf{b}) - \sum\_{j} \phi\_{j}(\hat{\mathbf{x}}) h\_{j}(\hat{\mathbf{x}}) - \nu(\hat{\mathbf{x}}, \varepsilon) \hat{h}(\varepsilon) - \epsilon \in \Sigma[\hat{\mathbf{x}}], \\ -B(\hat{\mathbf{x}}, \mathbf{b}) - \epsilon' - \sum\_{j} \kappa\_{j}(\hat{\mathbf{x}}) q\_{j}(\hat{\mathbf{x}}) \in \Sigma[\hat{\mathbf{x}}], \end{cases} (8)$$

where ε, ε′ > 0, the entries of σ<sub>i</sub>(**x̂**), φ<sub>j</sub>(**x̂**), κ<sub>j</sub>(**x̂**) ∈ Σ[**x̂**], ν(**x̂**, ε) ∈ Σ[**x̂**, ε], and λ(**x̂**) ∈ R[**x̂**]. Note that ε and ε′ are needed to ensure the strict positivity of the polynomials required in the second and third constraints of (6). The feasibility of the constraints in (8) is sufficient to imply the feasibility of the constraints in (6).

Investigating (8), the product of undetermined coefficient parameters from λ(**x̂**) and B(**x̂**, **b**) in the second constraint turns the problem into a bilinear matrix inequality (BMI) problem, which can be solved by the Matlab package PENBMI [18].

Note that the existence of a feasible solution **b**<sup>∗</sup> to problem (8) implies that the system is guaranteed to be safe under the designed controller u(**x**) = p(**x**) + k(**x**).
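Before invoking the BMI solver, a candidate certificate can be cheaply falsified by sampling the three conditions of (6). The following sketch does this for a toy one-dimensional system (ẋ = −x with candidate B(x) = 1 − x, both assumptions for illustration); passing such sampled checks is necessary but of course not sufficient for a proof:

```python
import numpy as np

# Sampling-based falsification of the barrier conditions in (6) for a toy
# 1D system with no control input. Only the SOS/BMI program gives a guarantee;
# this is a cheap pre-check that rejects bad candidates early.

f = lambda x: -x                     # vector field
B = lambda x: 1.0 - x                # candidate barrier certificate
dB = lambda x: -1.0                  # dB/dx
lie = lambda x: dB(x) * f(x)         # Lie derivative L_f B = x

theta = np.linspace(-0.5, 0.5, 101)  # initial set Theta
x_u = np.linspace(2.0, 3.0, 101)     # unsafe set X_u
psi = np.linspace(-3.0, 3.0, 601)    # domain Psi

assert np.all(B(theta) >= 0)                     # B >= 0 on Theta
assert np.all(B(x_u) < 0)                        # B < 0 on X_u
zero_set = psi[np.abs(B(psi)) < 1e-2]            # {x in Psi : B(x) ~ 0}
assert np.all(lie(zero_set) > 0)                 # L_f B > 0 where B = 0
```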

**Lyapunov-like Function Computation.** We further require that the learned controller is guaranteed to be not only safe but also goal-reaching, in the sense of driving the system to converge to the specified goal set. As stated in Theorem 2, the existence of a Lyapunov-like function suffices to prove that the system's behaviors asymptotically converge to the specified goal set X<sub>g</sub>. In a similar manner, we first formalize the goal-reaching verification for system C through Theorem 2 and Theorem 4. Assume that the Lyapunov-like function V(**x̂**) is a polynomial of degree at most d′, whose coefficients form a vector space of dimension $s(d') = \binom{n+1+d'}{d'}$ with the canonical basis (**x̂**<sup>α</sup>) of monomials. We introduce the coefficient parameters of V(**x̂**) as the vector **v** = (v<sub>α</sub>) ∈ R<sup>s(d′)</sup>, and write

$$V(\hat{\mathbf{x}}, \mathbf{v}) = \sum\_{\alpha \in \mathbb{N}\_{d'}^{n+1}} v\_{\alpha} \hat{\mathbf{x}}^{\alpha} = \sum\_{\alpha \in \mathbb{N}\_{d'}^{n+1}} v\_{\alpha} x\_1^{\alpha\_1} x\_2^{\alpha\_2} \cdots x\_{n+1}^{\alpha\_{n+1}},$$

in the canonical basis. By Theorem 4, showing that the controlled CCDS C is goal-reaching under the designed controller reduces to showing that the CCDS Ĉ is goal-reaching, which holds if there exists such a Lyapunov-like function V(**x̂**, **v**). The existence of a Lyapunov-like function can be decided by tackling the following feasibility problem:

$$\begin{cases} \text{find } \mathbf{v} \\ \text{s.t. } \varnothing \neq \{\hat{\mathbf{x}} : V(\hat{\mathbf{x}}, \mathbf{v}) \le 0\} \subseteq \hat{X}\_g, \\\ \mathcal{L}\_{\mathbf{f}\_\mathbf{u}} V(\hat{\mathbf{x}}, \mathbf{v}) \le -\beta(V(\hat{\mathbf{x}}, \mathbf{v})), \; \forall \hat{\mathbf{x}} \in \hat{\Psi}. \end{cases} \tag{9}$$

Similarly, we encode the uncertain variable ε ranging over [−μ, μ] as ĥ(ε) ≥ 0 with ĥ(ε) := (ε + μ)(μ − ε), where ε enters the polynomial L<sub>**f**<sub>**u**</sub></sub> V(**x̂**, **v**) through the controller **u**. For the given goal set X̂<sub>g</sub>, the constraint {**x̂** : V(**x̂**, **v**) ≤ 0} ≠ ∅ can be encoded as V(**x̂**<sub>0</sub>, **v**) ≤ 0 for a point **x̂**<sub>0</sub> ∈ X̂<sub>g</sub>.

Based on the above encodings, problem (9) can be transformed into the following constrained polynomial optimization problem

$$\begin{cases} \text{find } & \mathbf{v} \\ \text{s.t. } & s\_i(\hat{\mathbf{x}}) + \sigma\_i'(\hat{\mathbf{x}})V(\hat{\mathbf{x}}, \mathbf{v}) \in \Sigma[\hat{\mathbf{x}}], \\ & -\mathcal{L}\_{\mathbf{f}\_\mathbf{u}}V(\hat{\mathbf{x}}, \mathbf{v}) - \beta(V(\hat{\mathbf{x}}, \mathbf{v})) - \sum\_j \phi\_j'(\hat{\mathbf{x}})h\_j(\hat{\mathbf{x}}) - \nu'(\hat{\mathbf{x}}, \varepsilon)\hat{h}(\varepsilon) \in \Sigma[\hat{\mathbf{x}}], \\ & -V(\hat{\mathbf{x}}\_0, \mathbf{v}) \in \Sigma[\hat{\mathbf{x}}], \end{cases} (10)$$

where 1 ≤ i ≤ m<sub>4</sub>, 1 ≤ j ≤ m<sub>2</sub>, the entries of σ′<sub>i</sub>(**x̂**), φ′<sub>j</sub>(**x̂**) ∈ Σ[**x̂**], and ν′(**x̂**, ε) ∈ Σ[**x̂**, ε]. For simplicity, we take the extended class-K function β(·) to be β(x) = x or β(x) = r · x (r > 0).

In summary, the safety and goal-reaching verification problem is transformed into the BMI problems (8) and (10) for the parameters **b** and **v**. The solution **b**<sup>∗</sup> to problem (8) yields a barrier certificate B(**x̂**, **b**<sup>∗</sup>), meaning that the closed-loop system under the designed controller u(**x**) = p(**x**) + k(**x**) is safe. The solution **v**<sup>∗</sup> to (10) produces a Lyapunov-like function V(**x̂**, **v**<sup>∗</sup>), meaning that the system asymptotically converges to the specified goal set X<sub>g</sub>.

### 5 Experiments

In this section we first present a nonlinear system to illustrate our approach, and then report an experimental evaluation of our method over a set of benchmark examples, comparing it with two other potential methods. All experiments were conducted on a 3.2 GHz AMD Ryzen 7 3700X CPU under Windows 10 with 16 GB RAM.

*Example 1 (Academic 3D Model* [6]*).* Consider the following continuous dynamical system as the plant:

$$
\begin{bmatrix}
\dot{x} \\
\dot{y} \\
\dot{z}
\end{bmatrix} = \begin{bmatrix}
z + 8y \\
\cdots \\
\cdots
\end{bmatrix}
$$

The system domain is Ψ = {**x** = (x, y, z)<sup>T</sup> ∈ R<sup>3</sup> | −5 ≤ x, y, z ≤ 5}. Our goal is to design a control law u = p(**x**) + k(**x**) such that all trajectories of the closed-loop system under u starting from the initial set

$$\Theta = \{ \mathbf{x} \in \mathbb{R}^3 \, | \, (x + 0.75)^2 + (y + 1)^2 + (z + 0.4)^2 \le 0.35^2 \} $$

will never enter the unsafe region

$$X\_u = \{ \mathbf{x} \in \mathbb{R}^3 \, | \, (x + 0.3)^2 + (y + 0.36)^2 + (z - 0.2)^2 \le 0.30^2 \},$$

and eventually enter the goal set X<sub>g</sub> = {**x** ∈ R<sup>3</sup> | x<sup>2</sup> + y<sup>2</sup> + z<sup>2</sup> ≤ 0.1<sup>2</sup>}.
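The three regions of Example 1 translate directly into membership predicates; the following sketch mirrors the set definitions above and checks a few obvious points:

```python
# Membership tests for the regions of Example 1 (centers and radii taken
# verbatim from the set definitions in the text).

def in_theta(x, y, z):
    return (x + 0.75) ** 2 + (y + 1) ** 2 + (z + 0.4) ** 2 <= 0.35 ** 2

def in_unsafe(x, y, z):
    return (x + 0.3) ** 2 + (y + 0.36) ** 2 + (z - 0.2) ** 2 <= 0.30 ** 2

def in_goal(x, y, z):
    return x ** 2 + y ** 2 + z ** 2 <= 0.1 ** 2

def in_domain(x, y, z):
    return all(-5 <= c <= 5 for c in (x, y, z))

# Quick checks: each ball's center lies in its own set, not in the others,
# and everything lives inside the domain Psi.
assert in_theta(-0.75, -1.0, -0.4) and not in_unsafe(-0.75, -1.0, -0.4)
assert in_unsafe(-0.3, -0.36, 0.2) and not in_goal(-0.3, -0.36, 0.2)
assert in_goal(0.0, 0.0, 0.0) and in_domain(0.0, 0.0, 0.0)
```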

For the controller learning process, we train different NN structures with increasing depth and width as controller templates, until a desired controller is obtained. We eventually obtained a DNN controller with 5 hidden layers of 128 neurons each; smaller sizes failed. Based on this learned DNN controller, we construct a hybrid controller for the system. The polynomial part p(**x**) is obtained by the sampling-based method as follows:

$$\begin{aligned} p(\mathbf{x}) = {} & 0.125 - 3.333x - 5.726y - 10.669z + 1.911x^2 + 1.212xy \\ & + 2.138xz - 1.332y^2 - 10.07yz - 12.952z^2. \end{aligned}$$

The hybrid controller is then constructed as p(**x**) + k(**x**|θ), where k(**x**|θ) is a small NN with one hidden layer. After retraining the system with p(**x**) + k(**x**|θ) plugged in, we obtain the NN part with one hidden layer containing 30 neurons.

Under the hybrid controller p(**x**) + k(**x**), the controlled system can be verified to satisfy the safety and goal-reaching properties via the following barrier certificate B(**x**, tanh(h(**x**))) and Lyapunov-like function V(**x**, tanh(h(**x**))), respectively,

$$\begin{cases} B = 0.641x^2 - 0.143xy + 0.554y^2 + \cdots + 0.004\tanh(h(\mathbf{x})) - 0.353z + 0.061, \\ V = -0.09x^2 - 0.311xy + \cdots + 0.0123\tanh(h(\mathbf{x})) - 0.033x - 0.024z - 0.01, \end{cases}$$

where h(**x**) = 2.248x<sup>2</sup> + 0.962xy + ··· − 0.389z + 9.051.

Fig. 3. Phase portrait of the system in Example 1. Subfigure (a): the zero level set of the barrier certificate B(**x**) (the blue surface) separates the unsafe region X<sub>u</sub> (the red ball) from the initial set Θ (the yellow ball). Subfigure (b): all trajectories (in different colors) from Θ (the yellow ball) reach X<sub>g</sub> (the green ball). (Color figure online)

Figure 3(a) shows the zero level set of the barrier certificate (in blue), which separates X<sub>u</sub> (the red ball) from all trajectories starting from Θ (the yellow ball), and Fig. 3(b) shows simulated trajectories of the system converging to the goal set X<sub>g</sub> (the green ball) under the learned hybrid controller. Therefore, the system is guaranteed to be safe and goal-reaching from the initial set under our learned hybrid controller.

Although a DNN policy obtained by RL may appear to work well in many applications, it is difficult to make strong, provable claims about its correctness, since the neurons, layers, weights and biases are far removed from the intent of the actual controller. As found in [32], state-of-the-art neural network verifiers are ineffective for verifying a neural controller over an infinite time horizon with complex system dynamics. Hence the idea is to learn a controller together with formal reasoning about the specified property. *We conduct the research experiments stated below:*

*RE1: Explore directly learning a polynomial controller to control the system and guarantee its safety and goal-reaching requirements.*

From the verification viewpoint, one may wonder about directly learning a polynomial controller for the system (without appealing to the neural policy at all), using reinforcement learning to synthesize its unknown parameters. Thus the experiment first tried training the controller network with the commonly used square activation function. Training on a data set of 250 trajectories with 3000 data points each was unsuccessful for all the network structures tried (up to 5 layers and 250 neurons); the system still fails when simulated under the trained polynomial controller. As mentioned in [32], Zhu et al. found that despite many experiments on tuning learning rates and rewards, directly training a linear control program to conform to their specification with either reinforcement learning (e.g. policy gradient) or random search was unsuccessful because of undesirable overfitting, even for an example as simple as the inverted pendulum.

*RE2: Explore the effects of using just a polynomial or a small NN to imitate the original DNN to avoid the hybrid form.*

Our method is based on RL to obtain a well-performing DNN controller in general form; then, with the guidance of the learned DNN, a hybrid controller is designed that is verifiable for the safety and goal-reaching properties. The experiment next shows the performance of the hybrid controller synthesis and compares its verification performance with two other RL-guided controller synthesis methods:

(RE2-1) Obtain a polynomial controller by imitating and abstracting the trained DNN controller; under the guidance of the abstracted polynomial controller, the verification of the control system can naturally be encoded as a polynomial constraint solving problem;

(RE2-2) Abstract the DNN controller based on knowledge distillation to obtain a small network with a simple structure, which is expected to maintain the safety and goal-reaching of the original network (on the data set) [11]. Since the posterior verification cannot avoid approximating the neural network with a polynomial, and the upper bound of the approximation error is positively related to the Lipschitz constant, the distilled small network is expected to make the verification succeed thanks to its smaller Lipschitz constant.
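The Lipschitz argument can be made concrete with the standard (often loose) upper bound given by the product of the spectral norms of the weight matrices, valid for 1-Lipschitz activations such as ReLU and tanh. The weights below are random stand-ins, not trained controllers:

```python
import numpy as np

# Standard upper bound on the Lipschitz constant of a feed-forward network
# with 1-Lipschitz activations: the product of the spectral norms of its
# weight matrices. Distilled shallow nets typically admit far smaller bounds.

def lipschitz_upper_bound(weights):
    bound = 1.0
    for W in weights:
        bound *= np.linalg.norm(W, 2)   # spectral norm = largest singular value
    return bound

rng = np.random.default_rng(1)
# A deep 5-hidden-layer, 128-neuron controller vs. a 30-neuron distilled net.
deep = ([rng.normal(size=(128, 3))]
        + [rng.normal(size=(128, 128)) for _ in range(4)]
        + [rng.normal(size=(1, 128))])
small = [rng.normal(size=(30, 3)), rng.normal(size=(1, 30))]

assert lipschitz_upper_bound(small) < lipschitz_upper_bound(deep)
```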


Table 1. Performance Evaluation

We present a detailed experimental evaluation on a set of benchmarks in Table 1. The origins of these 10 widely used examples are given in the first column; n<sub>**x**</sub> and d<sub>**f**</sub> denote the number of state variables and the maximal degree of the polynomials (or of the polynomial abstractions by Taylor models for non-polynomial systems) in the vector fields. The examples have dimension up to 7. u<sub>0</sub>(**x**) denotes the network structure of the DNN controller synthesized by RL directly; for example, the trained DNN controller for C<sub>1</sub> has 4 hidden layers with 128 neurons each. All DNNs use ReLU activation functions except for tanh on the output layer.

Table 1 shows the performance of the three controller synthesis methods guided by the well-trained DNN u<sub>0</sub>(**x**): hybrid controller design, polynomial controller by imitation (denoted *Poly.*), and NN controller by distillation (denoted *Distil.*). Verification for all methods is carried out by generating certificate functions; the time costs are recorded as T<sub>H</sub>, T<sub>P</sub> and T<sub>D</sub> respectively when both a barrier certificate and a Lyapunov-like function have been obtained, and the degrees of the obtained certificate functions are recorded as d<sub>B</sub>, d<sub>V</sub>. Otherwise, '×' marks a failure to compute any barrier certificate or Lyapunov-like function within the degree bound of 6 and the time bound of 3 hours.

In our hybrid controller design method (*Hyb. design*), we uniformly choose p(**x**) of degree 2 and k(**x**) with a single hidden layer, shown in column k(**x**). d<sub>B</sub> and d<sub>V</sub> denote the degrees of the computed barrier certificate B(**x̂**) and Lyapunov-like function V(**x̂**), respectively. T<sub>H</sub> in the last column denotes the verification time cost.

The column *Poly.* reports the results of the method described in (RE2-1) on the benchmarks, further explaining the necessity of the hybrid-form controller. As an ablation study, we use only polynomial approximations of the original DNNs as surrogate controllers and carry out certificate-based verification for them. Considering the control effect, we increase the degree bound of the polynomial templates to 8 to ensure a high-precision approximation. d<sub>P</sub> denotes the lowest degree of a polynomial surrogate controller that passes verification and T<sub>P</sub> the corresponding time cost; '×' means that no such controller was found. The column *Distil.* reports the results of the method in (RE2-2). In this ablation study, we distill simpler NNs with a single hidden layer from the original DNNs and verify the specified properties using the distilled NN controllers. This process is repeated with the number of hidden neurons ranging from 20 up to 50, until either a distilled NN satisfying the specified properties is obtained, whose verification time cost is given in T<sub>D</sub>, or no such simpler NN is found, denoted by '×' in T<sub>D</sub>.

For all 10 examples, we successfully verified the safety and goal-reaching properties of the synthesized hybrid controllers via certificate generation, while the methods based on polynomial surrogate controllers (*Poly.*) and distilled NN controllers (*Distil.*) succeed on 5 and 4 benchmarks, respectively. Moreover, for some examples the *Hyb. design* method finds barrier certificates and Lyapunov-like functions of lower degree. Consequently, the resulting BMI problems have fewer decision variables than for the other methods, which contributes to the effectiveness of the verification procedure.

We compare the efficiency of the methods in terms of the time spent in the verification process on successful examples. On average, T<sub>P</sub> is 4.3 to 9.5 times T<sub>H</sub> on the 5 cases where *Poly.* succeeds. Meanwhile, T<sub>D</sub> is about 8.18 seconds on average, 1.46 times that of T<sub>H</sub> on the four cases where *Distil.* succeeds. Comparing T<sub>H</sub> with T<sub>P</sub> and T<sub>D</sub>, we conclude that verification of the hybrid controllers is much more efficient.

To summarize, Table 1 shows that all the synthesized hybrid controllers have been efficiently verified to render the systems safe and goal-reaching on a set of commonly used benchmark examples, demonstrating that our hybrid polynomial-DNN controller synthesis method is quite promising.

### 6 Conclusion

This paper has presented an approach to synthesize hybrid polynomial-DNN controllers for nonlinear systems such that the closed-loop system is both well-performing and easily verified against the required properties. Our approach integrates low-degree polynomial fitting and knowledge distillation into the RL training process. Thanks to the special structure of the hybrid controller, the controlled system can be transformed into polynomial form. An SOS relaxation based method is applied to generate barrier certificates and Lyapunov-like functions, which verify the safety and goal-reaching properties of the nonlinear control systems equipped with our synthesized hybrid controllers. Extensive experiments demonstrate the effectiveness and scalability of the proposed approach.

Acknowledgments. This work was supported in part by the National Key Research and Development Project, China under Grant 2022YFA1005100, in part by the National Natural Science Foundation of China under Grants (No. 12171159, No. 62272397, No. 61972385, No. 61902325), Shanghai Trusted Industry Internet Software Collaborative Innovation Center, and "Digital Silk Road" Shanghai International Joint Lab of Trustworthy Intelligent Software (Grant No. 22510750100).

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Safe Environmental Envelopes of Discrete Systems

Rômulo Meira-Góes<sup>1(B)</sup>, Ian Dardik<sup>2</sup>, Eunsuk Kang<sup>2</sup>, Stéphane Lafortune<sup>3</sup>, and Stavros Tripakis<sup>4</sup>

<sup>1</sup> School of EECS, Pennsylvania State University, State College, USA
romulo@psu.edu
<sup>2</sup> School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
{idardik,eunsukk}@andrew.cmu.edu
<sup>3</sup> EECS Department, University of Michigan, Ann Arbor, USA
stephane@umich.edu
<sup>4</sup> Khoury College of Computer Science, Northeastern University, Boston, USA
stavros@northeastern.edu

Abstract. A safety verification task involves verifying a system against a desired safety property under certain assumptions about the environment. However, these environmental assumptions may occasionally be violated due to modeling errors or faults. Ideally, the system guarantees its critical properties even under some of these violations, i.e., the system is *robust* against environmental deviations. This paper proposes a notion of *robustness* as an explicit, first-class property of a transition system that captures how robust it is against possible *deviations* in the environment. We model deviations as a set of *transitions* that may be added to the original environment. Our robustness notion then describes the safety envelope of this system, i.e., it captures all sets of extra environment transitions for which the system still guarantees a desired property. We show that being able to explicitly reason about robustness enables new types of system analysis and design tasks beyond the common verification problem stated above. We demonstrate the application of our framework on case studies involving a radiation therapy interface, an electronic voting machine, a fare collection protocol, and a medical pump device.

Keywords: Robustness · Discrete Transition Systems · Model Uncertainty

### 1 Introduction

A common type of verification task involves verifying a system (C) against a desired property (P) under certain assumptions about the environment (E); i.e., C||E |= P. Such assumptions may capture, for example, the expected behavior of a human operator in a safety-critical system, the reliability of the communication channel in a distributed system, or the capabilities of an attacker. However, the actual environment (E′) may occasionally deviate from the original model (E), due to changes or faults in the environment entities (e.g., errors committed by the operator or message loss in the channel). For certain types of deviations, a system that is *robust* would ideally be able to guarantee the property even under the deviated environment; i.e., C||E′ |= P.

This paper proposes the notion of *robustness* as an explicit, first-class property of a transition system that captures how robust it is against possible *deviations* in the environment. A deviation is modeled as a set of *extra transitions* that may be added to the original environment, resulting in a new, deviated environment E′ that has a larger set of behaviors than E does. Then, system C is said to be *robust* to this deviated environment with respect to P if and only if it can still guarantee P even in the presence of the deviation. Finally, the overall *robustness* of C with respect to E and P, denoted Δ, is the largest set of deviations that the system is robust against.

Conceptually, Δ defines the safe operating envelopes of the system: As long as the deployment environment remains within these envelopes, the system can guarantee a desired property. Being able to explicitly reason about Δ enables new types of system analysis and design tasks beyond the common verification problem stated above. Given a pair of alternative system designs, C1 and C2, one could rigorously compare them with respect to their robustness levels; they both may satisfy property P under the normal operating environment E, but one may be more robust to deviations than the other. Given two properties, P1 and P2 (the latter possibly more critical than the former), one could check whether the system would continue to guarantee P2 under a deviated environment even if it fails to do so for P1. Finally, given E, P, and a desired level of robustness, Δ, one could *synthesize* machine C to be robust to Δ.

In this paper, we formalize (1) the proposed notion of robustness and (2) the problem of computing Δ for given C, E, and P. One approach to automatically compute Δ is a brute-force method that enumerates all possible sets of deviations; however, as we will show, this approach is impractical, as the number of deviations is exponential in the size of the environment. To mitigate this, we present an approach for computing Δ by reduction to a controller synthesis problem [35,37].

We have built a prototype of the proposed approach for computing robustness and applied it to several case studies, including models of (1) a radiation therapy interface, (2) an electronic voting machine, (3) a public transportation fare collection protocol, and (4) a medical pump device. Our results show that our approach is capable of computing Δ to provide information about deviations under which these systems are able to guarantee their critical safety properties.

The contributions of this paper are as follows: (i) A novel, formal definition of robustness against environmental deviations (Sect. 4); (ii) A simple, brute-force method for computing robustness and a more efficient approach based on controller synthesis (Sect. 5); and (iii) A prototype tool for computing Δ and an experimental evaluation on several case studies (Sect. 6).

### 2 Motivating Example

As a motivating example, we consider the Therac-25 radiation therapy machine. This machine is infamous for a design flaw that caused radiation overdoses, several of which led to the deaths of patients who received treatment [18]. In this section, we introduce a model for the Therac-25 based on the descriptions in [18] and discuss several methods for analyzing its safety. We show that robustness provides a generally richer analysis than classic verification.

Fig. 1. The Therac-25 is modeled as C*T25* = C*term*||C*beam*||C*turn*: (a) the terminal C*term*; (b) the turntable C*turn*; (c) the normative environment E. C*beam* is in Fig. 7b.

System. We model the Therac-25 as the composition of the following three finite-state machines: (1) C*term*, a computer terminal that nurses use to operate the Therac-25, (2) C*beam*, a beam-emitter that fires a radiation treatment beam in either *X-ray* or *electron* mode, and (3) C*turn*, a turntable that rotates between two hardware components called the *flattener* and the *spreader*. Formally, we define the Therac-25 as the composition of all three machines: C*T25* = C*term*||C*beam*||C*turn*. We show the terminal and turntable in Figs. 1a and 1b respectively. We show the beam in Sect. 6.2 (Fig. 7b), where we present a case study on the Therac-25.

Environment. Nurses operate the Therac-25 by typing at a keyboard connected to a terminal. A nurse begins by choosing a beam mode by typing either an "x" for X-ray or an "e" for electron mode. The nurse then hits the "enter" key and waits for the terminal to display "beam ready" before finally pressing the "b" key to fire the beam. This workflow defines the operating environment which we call E, shown in Fig. 1c.

Safety property. Since the X-ray beams contain a high concentration of radiation, it is imperative that the flattener is in place when the machine fires an X-ray. We capture this key safety property in the following LTL [36] formula:

$$\mathbf{G} \text{(XFIRED } \rightarrow \text{ FLATMODE)}$$

In this formula, XFIRED is a predicate that is true if an X-ray beam was just fired, while FLATMODE is a predicate that is true when the turntable is in flattener mode. We refer to this safety property as P*xflat* in this example.

Safety Analyses. Robustness opens our safety analysis beyond classic verification. We discuss several analysis options below.

(1) Standard Verification: We can check that the Therac-25 is safe within the operating environment, that is, E||C*T25* |= P*xflat*. Standard model checking techniques [2] show that the Therac-25 is indeed safe with respect to E.

(2) Robustness Calculation: Given that the Therac-25 is safe with respect to E, we can calculate its robustness Δ. This calculation identifies the set of safe environmental envelopes of the Therac-25. Importantly, these envelopes reveal the environmental deviations that the Therac-25 can safely handle. For example, in Sect. 6.2, we show that the Therac-25 is robust against the environmental deviations in Fig. 8 in which a nurse repeatedly hits "enter" or the "up" arrow key after choosing a beam mode.

(3) Controller Comparison: Holding the environment E and the property P*xflat* constant, we can compare the robustness of the Therac-25 against other models. In Sect. 6.2, we introduce the Therac-20 (C*T20*) and compare the robustness between C*T25* and C*T20*. Although both machines are safe with respect to the normative environment, we will find that C*T25* is strictly less robust than C*T20*. We will show how contrasting the robustness of the two machines exposes a critical software bug in the Therac-25. Furthermore, we will show that fixing the bug in the Therac-25 makes its robustness equivalent to that of the Therac-20.

(4) Property Comparison: Holding the environment E and the machine C*T25* constant, we can compare the machine's robustness with respect to P*xflat* and a second safety property. For example, we could consider a new safety property P′ that strengthens P*xflat* by additionally enforcing the spreader to be in place when a beam is fired in electron mode. The property P′ might be of interest to avoid an *underdose*, a situation that might result from the flattener being in place when an electron beam is fired. Because P′ is stronger than P*xflat*, a designer may be interested in comparing the robustness between the properties to understand which environmental deviations maintain P*xflat* but violate P′.

### 3 Modeling Formalism

This section describes the underlying formalism used to model the environment, controlled systems, and the properties enforced by them.

Labeled Transition Systems. Given a finite set A, the usual notations |A| and A<sup>∗</sup> denote the cardinality of A and the set of all finite sequences over A respectively. In this work, we use finite labeled transition systems to model the behavior of the environment, the controller, and the property.

Definition 1. *A* labeled transition system *(LTS)* E *is a tuple* ⟨Q*E*, Act*E*, R*E*, q*0,E*⟩*, where* Q*E* *is a finite set of states,* Act*E* *is a finite set of actions,* R*E* ⊆ Q*E* × Act*E* × Q*E* *is the transition relation of* E*, and* q*0,E* ∈ Q*E* *is the initial state.*

LTS E is said to be deterministic if (q, a, q′), (q, a, q′′) ∈ R*E* implies q′ = q′′; otherwise it is nondeterministic. We extend the transition relation R*E* to finite sequences of actions as R*E*∗ ⊆ Q*E* × Act*E*∗ × Q*E* in the usual manner. A *trace* of E is a finite sequence of actions a0 ... an of E complying with the transitions in R*E*∗, i.e., (q*0,E*, a0 ... an, q) ∈ R*E*∗ for some q ∈ Q*E*. The set of all traces of E is denoted by beh(E).

Given LTSs E1 and E2, the parallel composition || defines standard synchronization of E1 and E2 [2,7]. The composed LTS E1||E2 = ⟨Q*E1* × Q*E2*, Act*E1* ∪ Act*E2*, R*E1||E2*, (q*0,E1*, q*0,E2*)⟩ synchronizes over the common actions of E1 and E2 and interleaves the remaining actions. Lastly, given LTSs E1 and E2, we say that E1 is a subset of E2, denoted E1 ⊆ E2, if Q*E1* ⊆ Q*E2*, Act*E1* = Act*E2*, R*E1* ⊆ R*E2*, and q*0,E1* = q*0,E2*.
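To make the formalism concrete, the LTS tuple of Definition 1 and the parallel composition || can be sketched in Python. This is an illustrative encoding, not the paper's tool: states and actions are plain hashable values, transitions are (source, action, target) triples, and the class and function names are ours.

```python
from dataclasses import dataclass
from itertools import product

# Illustrative encoding of Definition 1: an LTS is a tuple of states,
# actions, a transition relation, and an initial state.
@dataclass(frozen=True)
class LTS:
    states: frozenset       # Q_E
    actions: frozenset      # Act_E
    transitions: frozenset  # R_E, as (source, action, target) triples
    init: object            # q_{0,E}

def parallel(e1: LTS, e2: LTS) -> LTS:
    """E1 || E2: synchronize on shared actions, interleave the rest."""
    shared = e1.actions & e2.actions
    trans = set()
    for q1, q2 in product(e1.states, e2.states):
        for s, a, t in e1.transitions:
            if s != q1:
                continue
            if a in shared:
                # synchronization: both components move on a shared action
                trans.update(((q1, q2), a, (t, t2))
                             for s2, a2, t2 in e2.transitions
                             if s2 == q2 and a2 == a)
            else:
                # interleaving: e2 stays put
                trans.add(((q1, q2), a, (t, q2)))
        for s2, a2, t2 in e2.transitions:
            if s2 == q2 and a2 not in shared:
                trans.add(((q1, q2), a2, (q1, t2)))
    return LTS(frozenset(product(e1.states, e2.states)),
               e1.actions | e2.actions,
               frozenset(trans),
               (e1.init, e2.init))
```

The subset relation E1 ⊆ E2 then amounts to componentwise comparisons of the four fields.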

Control Strategy. Let an LTS E represent the environmental model to be controlled. A control strategy, or simply *controller*, for E is a function that maps a finite sequence of actions to a set of actions, i.e., C : Act*E*∗ → 2^Act*E*. A *controlled trace* of E is a trace of E, a0 ... an ∈ beh(E), such that a*i* ∈ C(a0 ... a*i−1*) for any i ≤ n. The set of all controlled traces, denoted by beh(E/C), defines the closed-loop system of C controlling E. For convenience, this closed-loop system is denoted by E/C. In this work, we assume that controller C has finite memory and can be represented by a deterministic LTS. With an abuse of notation, the LTS controller representation is also denoted by C. For convenience, we define controller C = ⟨Q*C*, Act*C*, R*C*, q*0,C*⟩ to have the same actions as E, i.e., Act*C* = Act*E*. In this manner, the closed-loop system E/C can be represented by the composition of environment E and controller C: E/C = E||C.

*Remark 1.* We assume that all elements of the set of actions Act*E* are "controllable" actions that can be acted upon by a controller. However, the nondeterministic transition relation of E can be used to model uncontrollable actions of the environment. After an action a is selected by the controller at state q, the environment decides which state the system will be in, similarly to two-player games [15].

Safety Property. In this work, we consider a class of regular linear-time properties called safety properties over an environment E [2]. A safety property P is represented by a deterministic LTS P that defines the set of accepted behaviors. Usually, the LTS P encodes both the traces that satisfy P and those that violate it by including a sink error state. Formally, any trace that reaches the error state err ∈ Q*P* violates the safety property. An LTS E satisfies property P, denoted by E |= P, whenever the traces in beh(E) do not reach the error state in P. In this manner, we can test if E |= P by composing E||P and checking whether err is reached.
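The test E |= P thus reduces to a reachability check for the error state in the composition. A minimal sketch (the function name and err-detection convention are ours): transitions are (source, action, target) triples, and a product state is erroneous if it is, or contains, the error state.

```python
from collections import deque

# Sketch of checking E |= P: explore the composition E || P and report
# whether the error state of the property LTS is reachable.
def satisfies(transitions, init, err="err"):
    """Return True iff no err-state is reachable from init.
    A state is erroneous if it equals `err` or is a tuple containing `err`
    (the latter covers product states of a composition)."""
    def is_err(q):
        return q == err or (isinstance(q, tuple) and err in q)
    seen, frontier = {init}, deque([init])
    while frontier:
        q = frontier.popleft()
        if is_err(q):
            return False
        for s, a, t in transitions:
            if s == q and t not in seen:
                seen.add(t)
                frontier.append(t)
    return True
```

This is a plain breadth-first search, so it runs in time linear in the size of the composed transition relation.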

Fig. 2. LTSs for the running example

*Example 1.* We describe a simple example that we use as a running example throughout the paper. Figure 2 depicts the environment E, controller C, and property P considered in this example. The environment E defines that action a is immediately followed by action b. Although controller C in Fig. 2b only shows action a, we assume that Act*C* = {a, b}. In this manner, C only allows action a to occur. Lastly, property P defines that action a should happen at most two times while action b should never happen. It follows that E/C |= P since the controller disables action b and the environment only executes one instance of action a.

### 4 Robustness Against Environmental Deviations

#### 4.1 Deviations

*A deviation* is a set of transitions d ⊆ Q*E* × Act*E* × Q*E*. A *deviated system* is defined by augmenting the transitions of environment E with a deviation set:

Definition 2. *Given an LTS* E = ⟨Q*E*, Act*E*, R*E*, q*0,E*⟩ *and a deviation* d ⊆ Q*E* × Act*E* × Q*E**, the* deviated system E*d* *is defined as* E*d* := ⟨Q*E*, Act*E*, R*E* ∪ d, q*0,E*⟩*.*

A controller C that guarantees property P for environment E, i.e., E/C |= P, might violate this property for the deviated environment E*d*, i.e., E*d*/C ⊭ P.

Definition 3. *Controller* C *is a* robust controller *with respect to environment* E*, deviation* d*, and property* P *if* E*d*/C |= P*. Deviation* d *is a* robust deviation *with respect to* E*,* C*, and* P *if* C *is a robust controller with respect to* E*,* d*, and* P*.*

*Remark 2.* In this paper, we are only interested in ensuring safety properties over the controlled system. For this reason, it is sufficient to only consider adding new transitions to the environment. If a controlled system is safe, then deleting transitions from the environment does not violate the safety property.
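Definition 2 is straightforward to operationalize: only the transition relation changes. A one-function sketch over transition sets (the name is ours):

```python
# Def. 2 in code: the deviated system E_d keeps Q_E, Act_E, and q_{0,E},
# and only enlarges the transition relation to R_E ∪ d.
def deviate(transitions, d):
    """Return the transition relation of E_d, given R_E and a deviation d."""
    return frozenset(transitions) | frozenset(d)
```

Per Remark 2, only additions matter here: removing transitions from a safe closed-loop system cannot introduce a safety violation.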

#### 4.2 Comparing Deviations

Each deviation set affects the environment in different ways. To reason about the effects of each deviation set, we compare them using a partial order relation over Q*E* × Act*E* × Q*E*. For deviations d1 and d2 such that d1 ⊆ d2, d2 deviates LTS E more than d1 since beh(E*d1*) ⊆ beh(E*d2*). For this reason, we select the relation ⊆ over Q*E* × Act*E* × Q*E* to be the partial order to compare different deviation sets.

Definition 4. *Given* E *and deviations* d1, d2*,* d<sup>1</sup> *is* at least as powerful *as* d<sup>2</sup> *if* d<sup>2</sup> ⊆ d1*.*

#### 4.3 Robustness

Intuitively, robustness is defined as the set of all possible robust deviations d with respect to the environment E, controller C, and safety property P*saf*. Additionally, we introduce an environmental constraint, P*env*, to capture domain knowledge about the system under analysis. P*env* filters out environment deviations that might not be physically feasible or of interest to analyze. This constraint is captured as a safety property over E, i.e., E |= P*env* states that the environment satisfies the constraint. Formally, our robustness notion is defined as follows:

Definition 5. *Let environment* E*, controller* C*, property* P*saf such that* E/C |= P*saf*, *and environment constraint* P*env such that* E |= P*env be given. The robustness of controller* C *with respect to* E*,* P*saf*, *and* P*env*, denoted by* Δ(E, C, P*saf*, P*env*)*, is a set of robust deviations* Δ ⊆ 2^(Q*E*×Act*E*×Q*E*)*.* Δ *is defined to be the (unique) set of robust deviations satisfying the following conditions:*

*1. every* d ∈ Δ *is a robust deviation with respect to* E*,* C*, and* P*saf and satisfies the environmental constraint, i.e.,* E*d* |= P*env;*
*2. every deviation* d′ *satisfying condition 1 is contained in some element of* Δ*, i.e., there exists* d ∈ Δ *with* d′ ⊆ d*; and*
*3. no element of* Δ *is a strict subset of another element of* Δ*.*
*When* E, C, P*saf , and* P*env are clear from context, we simply write* Δ*. The set* Δ *is also denoted as the safety envelope of* C *with respect to* E*,* P*saf , and* P*env.*

Intuitively, the set Δ defines an upper bound on the possible deviations from E that controller C is robust against. In other words, Δ captures the envelopes for which controller C remains safe.

If a designer does not have domain knowledge about the system, then P*env* can be set to not constrain the environment, i.e., P*env* = Act*E*∗. After computing Δ without environmental constraints, a designer can obtain important information about the system and the environment. In the next analysis iteration, this knowledge can be transformed into environmental constraints to enhance the robustness analysis, i.e., P*env* ⊆ Act*E*∗.

By definition, Δ is always non-empty since d = ∅ is always robust. Moreover, due to conditions 2 and 3, only maximal robust deviations are included in Δ. We show that there is a unique set of deviations that satisfies the conditions of Def. 5. The proof of this lemma is available in [27], p. 23.

Lemma 1. *Given LTS* E*, controller* C*, safety property* P*saf , and environment property* P*env, there is a unique* Δ *that satisfies the conditions in Def. 5.*

*Example 2.* Back to our running example, we investigate robust deviations and Δ. For simplicity, we do not impose any environment constraint, i.e., P*env* = Act*E*∗. Figure 3 shows four robust deviations for our running example, where transitions in green are deviations added to the environment. All robust deviations allow at most two transitions with action a, which is the maximum number allowed by the property. In this example, Δ has three robust deviations that are represented in Figs. 3b–3d. Since the robust deviation shown in Fig. 3a is a subset of both deviations in Fig. 3b and Fig. 3c, it is not included in Δ.

Fig. 3. Robust deviated environments for the running example: (a) a non-maximal robust deviated environment; (b)–(d) maximal robust deviated environments.
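Conditions 2 and 3 of Def. 5 keep only the ⊆-maximal robust deviations, which is exactly why the deviation of Fig. 3a is excluded from Δ in Example 2. This filtering step can be sketched as an antichain computation (the function name is ours):

```python
# Keep only the subset-maximal deviation sets (an antichain under ⊆),
# mirroring conditions 2 and 3 of Def. 5.
def maximal_sets(deviations):
    devs = [frozenset(d) for d in deviations]
    # d is maximal iff no other candidate strictly contains it
    return {d for d in devs if not any(d < other for other in devs)}
```

Applied to the four robust deviations of Fig. 3, this would retain the three maximal ones of Figs. 3b–3d.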

#### 4.4 Problem Statement

Although Def. 5 has formally introduced our notion of robustness, it does not show how to compute robustness. Therefore, we investigate the problem of computing the set Δ.

*Problem 1.* Given E, C, P*saf* , and P*env* as in Def. 5, compute Δ.

#### 4.5 Comparing Robustness

Our robustness definition also allows us to compare the robustness between different controllers as well as different safety properties.

Comparing Controllers. Holding the environment and safety property constant, we can compare the robustness of the controllers.

Definition 6. *Given an environment* E*, controllers* C1 *and* C2*, safety property* P*saf*, *and environment constraint* P*env*, controller* C1 *is* at least as robust *as* C2 *if and only if for all* d2 ∈ Δ(E, C2, P*saf*, P*env*) *there exists* d1 ∈ Δ(E, C1, P*saf*, P*env*) *such that* d2 ⊆ d1*. Equality and strictly less/more robust are defined in the usual manner using* ⊆*.*
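Def. 6 is a direct computation over the two robustness sets. A sketch (the function name is ours), where each Δ is given as a collection of deviation sets:

```python
# Def. 6: C1 is at least as robust as C2 iff every deviation in Delta(C2)
# is contained in some deviation in Delta(C1).
def at_least_as_robust(delta1, delta2):
    return all(any(d2 <= d1 for d1 in delta1) for d2 in delta2)
```

The same check, applied to the Δ sets computed for two different safety properties, implements the property comparison of Def. 7 below.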

Comparing Safety Properties. Holding the environment and controller constant, we can compare the robustness between safety properties.

Definition 7. *Given an environment* E*, controller* C*, safety properties* P*saf,1* *and* P*saf,2*, and environment constraint* P*env*, controller* C *is* at least as robust *with respect to* P*saf,1* *as with respect to* P*saf,2* *if and only if for all* d2 ∈ Δ(E, C, P*saf,2*, P*env*)*, there exists* d1 ∈ Δ(E, C, P*saf,1*, P*env*) *such that* d2 ⊆ d1*.*

### 5 Computing Robustness

This section presents two ways of solving Problem 1. The first is a brute-force algorithm, whereas the second uses control techniques to obtain the solution. Usually, when dealing with regular safety properties, one transforms the safety property into an invariance property. This transformation is obtained by composing the environment with the safety property; then, an invariance property equivalent to the safety property is defined over this composed system [2]. In this composed system, an invariance property is simply defined by a set of safe states. Unfortunately, computing robustness for safety properties does not directly reduce to computing robustness for invariance properties.

When transforming a safety property P*saf* into an invariance property, we compose the environment and the safety property. Let us assume that there are no environmental constraints. In our scenario, the invariance property P*inv* is defined based on the composed system E||C||P*saf*, i.e., P*inv* ⊆ Q*E||C||Psaf*. The composition adds memory to the environment to track whether the safety property has been violated. This memory addition prevents a simple reduction between invariance and safety properties since robustness is defined with respect to the environment: robustness defines new transitions in E, whereas computing robustness with respect to P*inv* defines new transitions in E||C||P*saf*. For this reason, we cannot simply reduce the problem of computing Δ with respect to a safety property to the problem of computing Δ with respect to an invariance property.

#### 5.1 Brute-Force Algorithm

One way of solving Problem 1 is via a brute-force algorithm. Intuitively, this algorithm is broken into two parts: (i) finding the set of robust deviations that satisfy the environmental constraint, and (ii) identifying the maximal ones within this set. In part (i), we verify E*d*||C |= P*saf* and E*d* |= P*env* for all deviations d ⊆ (Q*E* × Act*E* × Q*E*) \ R*E*, which can be solved using standard model checking techniques [2]. Since this algorithm checks whether every deviation set is robust or not, it is clear that it computes Δ.
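Under simplifying assumptions, this brute-force procedure fits in a short sketch: the controller is memoryless and given as a set of enabled actions, the property is a deterministic monitor given as a partial transition map that defaults to the error state, and P*env* is unconstrained. All names are illustrative, not the paper's tool.

```python
from collections import deque
from itertools import chain, combinations

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def robust(env_trans, q0, enabled, p_trans, p0):
    """Check E_d/C |= P_saf by BFS over (environment, monitor) state pairs;
    `env_trans` already includes the deviation d."""
    seen, frontier = {(q0, p0)}, deque([(q0, p0)])
    while frontier:
        qe, qp = frontier.popleft()
        for s, a, t in env_trans:
            if s == qe and a in enabled:
                qp2 = p_trans.get((qp, a), "err")  # missing entry = violation
                if qp2 == "err":
                    return False
                if (t, qp2) not in seen:
                    seen.add((t, qp2))
                    frontier.append((t, qp2))
    return True

def brute_force_delta(states, actions, env_trans, q0, enabled, p_trans, p0):
    """Part (i): test every deviation d ⊆ (Q x Act x Q) \\ R_E for robustness.
    Part (ii): keep only the subset-maximal robust deviations."""
    all_trans = {(q, a, q2) for q in states for a in actions for q2 in states}
    candidates = sorted(all_trans - set(env_trans))
    robust_devs = [frozenset(d) for d in powerset(candidates)
                   if robust(set(env_trans) | set(d), q0, enabled, p_trans, p0)]
    return {d for d in robust_devs
            if not any(d < other for other in robust_devs)}
```

The enumeration is exponential in |Q*E*|²·|Act*E*|, which is exactly the scalability problem that motivates the synthesis-based approach of Sect. 5.2.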

#### 5.2 Controlling the Deviations Without Environmental Constraints

Due to the lack of scalability of the brute-force algorithm, we search for more efficient ways to compute Δ. For readability purposes, we start by describing our algorithm in detail assuming no environmental constraints, i.e., an unconstrained environment P*env* = Act*E*∗. In the next section, we show how to use this algorithm to completely solve Problem 1, i.e., for a possibly constrained environment P*env* ⊆ Act*E*∗.

Overview of the Control Algorithm. At a high level, we transform the problem of computing Δ into a problem of controlling environmental transitions to avoid safety violations. Intuitively, we control deviations to force them to be robust, i.e., we take the viewpoint that we can control transitions in (Q*E* × Act*E* × Q*E*) \ R*E*. Different ways of controlling transitions in (Q*E* × Act*E* × Q*E*) \ R*E* provide different robust deviations.

Fig. 4. Overview of our approach to compute robustness for the unconstrained environment. The inputs are the LTSs of environment E, controller C, and property P*saf*. The set A is the set of all environment transitions, A = Q*E* × Act*E* × Q*E*. The LTSs T1, ..., T*n* ⊆ F represent controlled meta-systems.

Figure 4 provides an overview of our approach. First, we define LTS E*A* to be the deviated system with all possible transitions, i.e., A = Q*E* × Act*E* × Q*E*. The deviated system E*A* is the maximally deviated environment since it encompasses every possible deviated system E*d* for d ⊆ Q*E* × Act*E* × Q*E*.

Next, we compose the deviated environment E*<sup>A</sup>* with controller C and property P*saf* , to create a "meta-system" F. This meta-system provides information about how the deviated environment E*<sup>A</sup>* under the control of C can violate P*saf* . Following this composition, we pose a control problem over the meta-system to prevent any violation of P*saf* . There are multiple ways of controlling this composed system; in our approach, we obtain a finite number of controllers encoded as T*<sup>i</sup>* ⊆ F. These different ways of controlling the meta-system provide different robust deviations from which we can extract Δ. To make our approach concrete, we describe each step in detail using our running example, shown in Fig. 2.

Constructing the Meta-system. The deviated environment E*A*, with A = Q*E* × Act*E* × Q*E*, contains the behavior of any other deviated environment. Therefore, we define the meta-system to be the composition of deviated environment E*A*, controller C, and property P*saf*, i.e., F = E*A*||C||P*saf*. Figure 5a shows the meta-system F for our running example. Since C only has one state, we omit its state from the state names in Fig. 5a, i.e., states in Fig. 5a are written as (q*e*, q*p*) ∈ Q*E* × Q*Psaf* instead of (q*e*, q*c*, q*p*) ∈ Q*E* × Q*C* × Q*Psaf*. All transitions in F are labeled a, omitted in Fig. 5a, since controller C only enables action a. We also identify in F which transitions are derived from the environment (dashed blue) and which are derived from deviations (green). For simplicity, we define a single error state in F to capture every state of the form (q*e*, q*c*, err) ∈ Q*E* × Q*C* × Q*Psaf*.

Fig. 5. Meta-systems: (a) the meta-system F; (b) meta-controller T1; (c) meta-controller T2. All transitions have action a since C only enables action a (see Fig. 2b). Dashed blue transitions represent transitions that are feasible in R*E*, while solid green transitions represent the deviated transitions in (Q*E* × Act*E* × Q*E*) \ R*E*. The shaded area in Fig. 5b contains all safe states in the meta-system.

Controlling the Meta-system. Once the meta-system is constructed, we pose a meta-control problem over F to ensure that the meta-system avoids the error states, i.e., states (q*e*, q*c*, err) ∈ Q*E* × Q*C* × Q*Psaf*. These error states represent safety violations in the closed-loop system. For instance, in Fig. 5a, if transition (2, C) → err occurs, then the closed-loop system violates P*saf* since more than two actions a were executed. In this meta-control problem, a meta-controller can disable transitions in F that originated from deviations in E, i.e., transitions in (Q*E* × Act*E* × Q*E*) \ R*E*.

*Problem 2.* Given meta-system F, synthesize a meta-controller T ⊆ F such that (1) for any (q*e*, q*c*, q*p*) ∈ Q*T*, q*p* ≠ err; and (2) for any ((q*e*, q*c*, q*p*), a, (q′*e*, q′*c*, q′*p*)) ∈ R*F* \ R*T* such that (q*e*, q*c*, q*p*) ∈ Q*T*, it follows that (q*e*, a, q′*e*) ∉ R*E*.

Problem 2 states that the meta-controller is a subset of the meta-system F. We want to maintain the same structure as in F since we need to enforce that the meta-controller does not disable any transition associated with R*E*. Condition (1) in Problem 2 ensures that property P*saf* is not violated. On the other hand, condition (2) guarantees that only transitions assigned to deviations are disabled.

Back to our example, the LTS T1 described by the shaded area in Fig. 5b is a possible meta-controller that satisfies Problem 2. Condition (1) is satisfied since the error state is not included in the shaded area. With respect to condition (2), only solid green transitions are disabled. Figure 5c shows another meta-controller, T2.

To solve Problem 2, one can solve a safety game over F using a fixed-point computation [15,25]. Due to space limitations, we point the reader to [27], p. 23, for the solution to this safety game.
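While the full safety-game solution is deferred to [27], the core greatest-fixed-point step can be sketched generically (names ours). Per Problem 2, the only uncontrollable transitions are those derived from R*E*, which the meta-controller may not disable; a state becomes unsafe if such a transition forces it out of the current candidate set.

```python
# Sketch of computing Inv(Q_F \ Err): iteratively remove states from which
# an uncontrollable transition (one the meta-controller cannot disable)
# leaves the current candidate set of safe states.
def safe_states(states, err_states, uncontrollable):
    """`uncontrollable` is a set of (q, a, q') triples derived from R_E."""
    safe = set(states) - set(err_states)
    changed = True
    while changed:
        changed = False
        for q, a, q2 in uncontrollable:
            if q in safe and q2 not in safe:
                safe.discard(q)
                changed = True
    return safe
```

A meta-controller then keeps exactly the transitions of F between surviving states, disabling only deviation-derived transitions that leave the safe set.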

Extracting Robust Deviations. Each meta-controller that solves Problem 2 relates to a robust deviation. Intuitively, a meta-controller disables deviations that would violate P*saf*. For instance, the meta-controller T1 shown in Fig. 5b disables transition (3, B) → (1, C), which relates to disabling transition 3 −a→ 1 in the environment. Figure 3a depicts the deviated environment related to meta-controller T1. Similarly, Fig. 3b shows the deviated environment associated with meta-controller T2.

To extract a robust deviation from a meta-controller, we have to (1) identify the transitions that the meta-controller has disabled; and (2) project the disabled transitions to transitions in Q*E* × Act*E* × Q*E*. Since a meta-controller is a subset of the meta-system, the disabled transitions are obtained by comparing F and T. Intuitively, the disabled transitions are those that escape the shaded area in Fig. 5.

$$Disabled := \{ (q, a, q') \in R_F \mid q \in Q_T \land (q, a, q') \notin R_T \} \tag{1}$$

For instance, in the case of meta-controller T*1*, the transition ((1, B), a, (1, C)) belongs to the set Disabled. Next, we project the disabled transitions onto transitions in Q*E* × Act*E* × Q*E*, i.e., transitions in the environment.

$$del := \{ (q_e, a, q'_e) \in Q_E \times Act_E \times Q_E \mid ((q_e, q_c, q_p), a, (q'_e, q'_c, q'_p)) \in Disabled \} \tag{2}$$

Transitions in del are the transitions to be deleted from Q*E* × Act*E* × Q*E* so that (Q*E* × Act*E* × Q*E*) \ del is a robust deviation set. If transitions in del are included in a deviation set, they can cause a violation of property P*saf*. In the case of T*1*, the transition (1, a, 1) is included in del. If we keep, for instance, the transition (1, a, 1) as part of a deviation set d, then the closed-loop E*d*/C violates the property P*saf*, since the path (1, A) → (1, B) → (1, C) → err would be feasible in the meta-system.
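Equations (1) and (2) translate directly into set comprehensions. The sketch below uses our own hypothetical encoding (meta-states as triples whose first component is the environment state); it is an illustration, not the paper's implementation.

```python
def disabled(R_F, Q_T, R_T):
    """Equation (1): transitions of F whose source state survives in T
    but whose edge T drops."""
    return {(q, a, q2) for (q, a, q2) in R_F
            if q in Q_T and (q, a, q2) not in R_T}

def deletion_set(dis):
    """Equation (2): project disabled meta-transitions onto environment
    transitions, taking the first component of each meta-state as the
    environment state."""
    return {(q[0], a, q2[0]) for (q, a, q2) in dis}
```

On the running example, disabling the meta-transition from (1, B) to (1, C) on action a projects to the environment transition (1, a, 1).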

Computing Robustness *Δ*. Problem 2 searches for meta-controllers that guarantee the satisfaction of property P*saf*. To compute Δ, we need to obtain a finite number of meta-controllers. Algorithm 1 formalizes our description in Fig. 4. It takes as input the environment E, the controller C, a deviation set d, and a safety property P. From the algorithm overview in Fig. 2, we have that for the unconstrained environment d = A = Q*E* × Act*E* × Q*E* and P = P*saf*.

In Algorithm 1, line 4 computes the largest possible set of invariant states that avoid the error state, i.e., Inv(Q*<sup>F</sup>* \ Err) solves the safety game as shown

Algorithm 1. COMPUTE-ROBUSTNESS

```
Input:  LTSs E, C, P and a deviation set d
Output: set of deviations D
 1: D ← ∅
 2: F ← E_d ‖ C ‖ P
 3: Err ← {(q_e, q_c, q_p) ∈ Q_F | q_p = err}
 4: W ← Inv(Q_F \ Err)
 5: for all S ∈ 2^W \ {∅} do
 6:     T ← Meta-Controller(S, F)
 7:     del ← {(q_e, a, q'_e) ∈ d | ∃ ((q_e, q_c, q_p), a, (q'_e, q'_c, q'_p)) ∈ R_F \ R_T
                                     s.t. (q_e, q_c, q_p) ∈ Q_T}
 8:     D ← D ∪ {d \ del}
 9: while ∃ d1, d2 ∈ D s.t. d1 ⊊ d2 do
10:     D ← D \ {d1}
    return D
11: procedure Meta-Controller(S, F)
12:     S ← Inv(S)
13:     if q_{0,F} ∉ S then
14:         T ← ∅
15:     else
16:         Q_T ← S;  Act_T ← Act_F;  q_{0,T} ← q_{0,F}
17:         R_T ← {(q, a, q') ∈ S × Act_T × S | (q, a, q') ∈ R_F}
    return T
```

in [27], pg. 23. Based on this invariant set, each iteration of the loop (lines 5–8) computes a meta-controller (line 6) and stores its respective robust deviation (line 8). The meta-controller T is also computed using the function Inv. The meta-controller solution ensures that Q*T* ⊆ S. Line 7 computes the environment transitions that must be deleted in order to obtain a robust deviation. The computed robust deviations are stored in D. Lastly, the loop in lines 9–10 ensures that only maximal robust deviations remain in D, which the algorithm outputs as Δ.

In more detail, to solve Problem 2, we must guarantee that the meta-system F does not reach any state in Err := {(q*e*, q*c*, q*p*) ∈ Q*F* | q*p* = err}. Formally, we compute the set Inv(Q*F* \ Err), which contains every state in F that does not reach a state in Err via a transition associated with R*E*. Based on this invariant set, we can extract any meta-controller that remains within it. Informally, the procedure Meta-Controller(S, F) in line 11 of Algorithm 1 computes a meta-controller that remains within the states in S. First, this procedure computes the invariant subset of S, i.e., Inv(S), with respect to the meta-system F (line 12). A meta-controller is then defined by projecting the meta-system F onto the states and transitions in Inv(S) (lines 16–17).
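A minimal sketch of Inv and the Meta-Controller procedure follows, under the assumptions that meta-states are triples whose first component is the environment state and that transitions associated with R*E* are exactly those whose environment projection lies in R*E*; all names and the encoding are ours, not the tool's.

```python
def inv(S, R_F, R_E):
    """Greatest fixed point: largest subset of S that no environment
    transition can force outside of S."""
    S = set(S)
    changed = True
    while changed:
        changed = False
        for (q, a, q2) in R_F:
            # A transition associated with R_E is uncontrollable: if it
            # escapes S, its source state cannot be kept.
            if q in S and q2 not in S and (q[0], a, q2[0]) in R_E:
                S.discard(q)
                changed = True
    return S

def meta_controller(S, F):
    """Lines 11-17 of Algorithm 1 (our reconstruction): project F onto Inv(S)."""
    Q_F, R_F, R_E, q0 = F
    S = inv(S, R_F, R_E)
    if q0 not in S:
        return None  # empty meta-controller (line 14)
    R_T = {(q, a, q2) for (q, a, q2) in R_F if q in S and q2 in S}
    return (S, R_T)
```

In the sketch, controllable (deviation) transitions that leave Inv(S) are simply dropped, while a state with an escaping environment transition must itself be removed.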

The following theorem shows that the Δ computed via Algorithm 1 is equal to Δ as in Definition 5 when P*env* = Act*E*∗, i.e., Algorithm 1 *partially* solves Problem 1.

Theorem 1. *Given LTS* E*, controller* C*, and property* P*saf*, *Algorithm 1 outputs* Δ *as in Definition 5 when* P*env* = Act*E*∗*.*

*Proof.* Sketch. In order to show that Theorem 1 holds, we provide two intermediate lemmas whose proofs are available at [27], pg. 24 (Lemma 2 and Lemma 3). The first lemma states that every meta-controller T produces a robust deviation. In this manner, we show that for every d ∈ Δ, the deviation d is robust. The second lemma shows that for every maximal robust deviation d ∈ Δ, there exists a meta-controller T associated with deviation d. Consequently, Algorithm 1 computes every possible maximal robust deviation.

Using Algorithm 1 to compute Δ for our running example, we obtain Δ that contains the three maximal robust deviations shown in Fig. 3. Lastly, we provide the computational complexity of Algorithm 1.

Theorem 2. *Algorithm 1 outputs* Δ *in* O(2^(|Q*E*||Q*C*|(|Q*P*|−1)))*.*

*Proof.* It follows from the size of <sup>2</sup>*<sup>W</sup>* .

Although Algorithm 1 has exponential complexity, we empirically show in Sect. 6 that it scales better than the brute-force algorithm.

Heuristics to Exploit the Structure of *F*. In Algorithm 1, we compute robust deviations for every possible subset of the largest invariant state set, cf. line 5. To improve the efficiency of Algorithm 1, we provide a sound and complete heuristic that identifies and skips redundant subsets of 2*W* \ {∅}. The heuristic is based on the observation that sets of states that are not directly connected in F correspond to redundant deletion sets from Q*E* × Act*E* × Q*E*. As such, the heuristic exploits the structure of F by performing a depth-first search over its state space, thereby skipping disconnected groups of states. For instance, the heuristic will skip the subset {(1, A), (3, C)} because (1, A) and (3, C) are not connected in F. This subset is redundant because its deletion set del = {((1, A),(1, B)),((1, A),(2, B)),((1, A),(3, B))} is identical to the deletion set of the connected subset {(1, A)}. In the worst case, our heuristic still enumerates the power set of W, i.e., exactly as in line 5.
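The connectivity test behind the heuristic can be sketched as follows, assuming weak (undirected) connectivity over the transition graph of F restricted to the candidate subset; this is our reading of the heuristic, not the tool's code.

```python
from itertools import combinations

def is_connected(S, R_F):
    """Weak connectivity of S in the transition graph of F (edge
    direction ignored, actions irrelevant)."""
    S = set(S)
    if not S:
        return False
    adj = {q: set() for q in S}
    for (q, a, q2) in R_F:
        if q in S and q2 in S:
            adj[q].add(q2)
            adj[q2].add(q)
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        for n in adj[stack.pop()]:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen == S

def candidate_subsets(W, R_F):
    """All non-empty subsets of W, keeping only the connected ones
    (the disconnected ones have redundant deletion sets)."""
    for r in range(1, len(W) + 1):
        for S in combinations(list(W), r):
            if is_connected(S, R_F):
                yield frozenset(S)
```

A disconnected subset such as {(1, A), (3, C)} is skipped because its deletion set coincides with that of one of its connected components.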

#### 5.3 Controlling the Deviations with Environmental Constraints

When introducing environmental constraints, we must eliminate the robust deviations that violate these constraints, as described in Definition 5. One might think that P*env* and P*saf* could be combined into a single safety property for which we then compute Δ. However, this approach does not work, since P*env* must be enforced only by the environment whereas P*saf* is a property of the closed-loop system. Another approach is to check whether P*env* is satisfied for each deviation obtained in the for-loop (lines 5–8) of Algorithm 1. Although this approach is feasible, in practice we want to reduce the number of deviations, using P*env*, before we compute the robust deviations. For this reason, we describe the sequential algorithm shown in Fig. 6, in which Algorithm 1 is used multiple times instead of a single time as in the unconstrained scenario (Sect. 5.2).

Fig. 6. Overview of our approach to compute robustness for constrained environments.

The algorithm to compute robustness for constrained environments can be broken into two parts: (a) computing all maximal environment deviations d̃*i* that satisfy P*env*; and (b) computing robust deviations for each deviated environment E*d̃i* found in part (a). Computing the maximal environments that satisfy P*env* reduces to computing maximal deviations of E with respect to a controller C*all* that allows every environment action. Formally, the behavior of C*all* does not restrain E, i.e., beh(C*all*) = Act*E*∗, and it can be described by a one-state LTS. Therefore, the output of part (a) is the set of maximal deviations d̃*i* with respect to E, C*all*, and P*env*, denoted as maximal environment deviations. Each maximal deviated environment E*d̃i* satisfies P*env*.

Once we have obtained all maximal environment deviations that satisfy P*env*, we focus on finding the maximal robust deviations with respect to C and P*saf*. In other words, we run Algorithm 1 for each maximal deviated environment E*d̃i* together with C and P*saf*. Since each resulting deviation d is a subset of d̃*i*, the perturbed system E*d* satisfies P*env*.

Each maximal deviated environment E*d̃i* generates a set of maximal robust deviations D*i* with respect to C and P*saf*. The final step combines these robust deviations across the different d̃*i*. Since each deviation is maximal only with respect to its own d̃*i*, some may not be maximal in the sense of Definition 5. The post-processing step therefore combines the deviations, eliminates any non-maximal ones, and outputs Δ as in Definition 5. The correctness of this algorithm follows from Theorem 1.
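The two-phase pipeline above can be sketched as follows, treating Algorithm 1 as an opaque function compute_robustness(E, C, d, P) that returns a set of deviations (each a frozenset of environment transitions); every name here is hypothetical.

```python
def constrained_robustness(E, C, P_saf, P_env, A, compute_robustness, C_all):
    """Sketch of the sequential algorithm of Fig. 6 (our reconstruction)."""
    # Phase (a): maximal environment deviations satisfying P_env, computed
    # against the unrestrictive one-state controller C_all.
    env_devs = compute_robustness(E, C_all, A, P_env)
    # Phase (b): robust deviations of each deviated environment w.r.t. C, P_saf.
    delta = set()
    for d_tilde in env_devs:
        delta |= compute_robustness(E, C, d_tilde, P_saf)
    # Post-processing: keep only deviations maximal in the sense of Definition 5
    # ('<' on frozensets is the proper-subset test).
    return {d for d in delta if not any(d < d2 for d2 in delta)}
```

The final comprehension is the post-processing step: it discards any deviation strictly contained in another one returned for a different d̃*i*.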

### 6 Case Studies

#### 6.1 Implementation

We have implemented a prototype tool for computing robustness [28]. The tool accepts a model of an environment, a controller, and a safety property–as well as an optional list of environmental constraints–and outputs Δ. The tool has support for comparing the robustness of two controllers as well as the robustness of a controller with respect to two separate safety properties. Currently, the environment, controller, safety property, and environmental constraints must be encoded in Finite State Process (FSP) notation [23] but this is not a fundamental limitation.

(a) The beam *Cbeam with* hardware interlocks used in the Therac-20. (b) The beam *Cbeam without* hardware interlocks used in the Therac-25.

Fig. 7. The beam components of the two Therac machines. The hardware interlocks give C′*beam* a fifth state, "switching mode", which switches to X-ray mode only after the flattener rotates into place.

We wrote the tool in the Kotlin programming language. Our tool includes an implementation of the brute-force algorithm from Sect. 5.1, as well as an implementation of Algorithm 1 and Algorithm 1 with heuristics. In the following case studies, we leverage the tool to calculate and compare the robustness of several systems. We summarize our performance results for each case study in Sect. 6.6.

#### 6.2 Therac-25

Background. In Sect. 2, we introduced the Therac-25 radiation therapy machine. In this section, we present a case study in which we compare the robustness of the Therac-25 to that of its predecessor, the Therac-20. We begin by showing that the Therac-20 is strictly more robust than the Therac-25. We then use this information to identify and fix a critical safety bug in the Therac-25 model.

Therac-20. The Therac-20 is a radiation therapy machine that was designed before the Therac-25. Unlike the Therac-25, the Therac-20 was not known for causing accidents that led to injuries and death. A key difference between the two machines is that the Therac-20 includes hardware *interlocks* in its beam component (Fig. 7a), while the Therac-25 does not (Fig. 7b). The purpose of the hardware interlocks is to provide a layer of protection at the hardware level for upholding P*xflat*. In our model, the interlocks work by ensuring that the flattener is completely rotated into place before allowing an operator to fire an X-ray beam. Unfortunately, hardware interlocks were considered expensive, so they were omitted from the design of the later Therac-25 model. In the following section, we compare the robustness of the two Therac machines with respect to the normative environment E and the key safety property P*xflat*.

Comparing Controllers. Using standard model checking techniques [2], we can confirm that both the Therac-20 and the Therac-25 are safe with respect

Fig. 8. Visual robustness comparison between the two Therac machines. Both machines are robust against gray transitions, but only the Therac-20 is robust against green transitions. (Color figure online)

Fig. 9. Software fix that eliminates the race condition in the Therac-25.

to E and P*xflat*. Historically, however, the Therac-20 is known to be safer than the Therac-25. Therefore, we refine our safety analysis by also comparing the robustness of the two machines with respect to E, P*xflat*, and an environmental constraint P*env*. P*env*, shown in [27], pg. 26, Fig. 11, restricts the environment to firing the beam at most once.

Our tool reports that the Therac-20 is strictly more robust than the Therac-25. To understand this result, we can examine the difference between the robustness for each machine. We show this difference visually by presenting one maximal robust deviation from each machine in Fig. 8. This figure shows that the Therac-20 is robust against the scenario in which the operator 1) types "e" to select electron beam mode, 2) optionally types "enter", 3) presses the "up" arrow key, and finally 4) types "x" to switch the beam into X-ray mode. The Therac-25, however, is not robust against this scenario. We see this in Fig. 8 because the series of actions must pass through at least one green arrow, where a green arrow indicates a transition that the Therac-25 is not robust against. In fact, the Therac-25 does not have *any* maximal robust deviations that allow this scenario.

The Therac-25's lack of robustness to the scenario above represents a race condition that occurs after the operator switches into X-ray mode from electron mode. In this scenario, if the operator types "enter" and fires the X-ray beam before the flattener rotates into place, the beam will fire an unflattened X-ray at the patient. This critical bug was responsible for real-world radiation overdoses, several of which resulted in death [18].

Fixing the Software Bug. In the previous section, we identified a critical software bug in the Therac-25. Our goal in the current section is to fix this bug entirely in the terminal software, thus avoiding an expensive hardware solution.

In Fig. 7a, we see that the hardware interlocks prevent a race condition by blocking the operator from typing a "b" until the flattener is rotated into place. Thus we can fix the race condition in software by altering the terminal to block the operator from typing a "b" until the flattener is rotated into place. We implement this fix by redesigning the terminal to block all key strokes from the instant it issues a "beam ready" message until the turntable rotates into place, as shown in Fig. 9. Finally, we use our tool to evaluate the robustness of the fix. The tool reports that the fixed Therac-25 design is strictly more robust than the original, and equally robust to the Therac-20.

#### 6.3 Voting

Background. In this section, we consider a case study of an electronic voting machine, introduced in [46]. In this case study, we model the voting machine, a voter, and a corrupt election official who attempts to "flip" the voter's choice. We define the voting machine as the composition of a voting booth and a user interface, shown at [27], pg. 26 in Fig. 12a and Fig. 12b respectively.

In the normative environment–shown in Fig. 10a–the voter enters the booth, enters their password, selects a candidate, clicks the vote button, and finally confirms the choice. Unfortunately, some voters may inadvertently skip the confirmation step and leave the booth early. This deviation from the normative behavior presents an opportunity for the election official to "flip" the intended vote: after the voter leaves the booth, the corrupt official can enter the booth, press "back" and change the vote to their liking. This scenario represents an actual election fraud that took place in the US [38].

(a) Normative environment for the voting machine. (b) The voting machine's robustness is identical with respect to *Pall* and *Pcfm*.

Fig. 10. Models for the voting machine example. In the figures above, the prefix "v" represents actions by the voter.

Comparing Properties. In this case study, we will consider two safety properties, P*all* and P*cfm*, both of which imply the absence of vote flipping. P*all* requires that the election official cannot at any point select, vote, or confirm a candidate. P*cfm* is weaker, only requiring that the election official cannot at any point confirm a candidate selection.

Using our tool for comparison, we see that the voting machine is equally robust with respect to each property. This result is surprising, however, because P*cfm* is weaker than P*all*. To understand it, we examine Fig. 10b, where we present the sole maximal robust deviation for each property. In this figure, it is clear that the voting machine is not robust against any deviation in which the voter enters their password and then exits the booth without confirming their vote. The key insight is that, when an election official has the ability to confirm, it *implies* that the official can also select and vote. We would therefore prefer a voting machine without this implication, since it would reduce the number of points of failure. For example, we could redesign the voting machine to require a password as part of the confirmation step. In light of this insight, a designer could choose to build a margin of safety into the machine's specification by requiring that it is strictly more robust against P*cfm* than P*all*.

#### 6.4 Oyster

Background. The Oyster example was introduced in [41], in which the authors modeled the Oyster card used in the public transportation system of the United Kingdom. In our model, the controller consists of an *entry gate* and an *exit gate*, where the card holder taps the Oyster card at the start and end of their journey, respectively. The environment models the actions of a card holder; in the normative environment, a card holder chooses to tap with either their Oyster card or a credit card, and taps in and out with the chosen card. The key safety property is avoiding an *incomplete journey*, in which a card holder taps in with one card and taps out with a different card.

Calculating Robustness. An incomplete journey is avoided under the normative environment. We calculate the robustness of the system under the two environmental constraints 1) Oyster cards and credit cards give the correct information to the gates and 2) the gates operate correctly and calculate the correct fare when a card is tapped in and out. Unfortunately, the system is not robust to *any* deviations.

#### 6.5 PCA Pump

Background. In this section, we model a patient-controlled analgesia (PCA) pump, originally introduced in [5]. A PCA pump is a medical device that dispenses pain medicine to a patient, offering them partial control over the dose rate. A nurse uses the device interface to program the volume per dosage, as well as a minimum and maximum dose rate to protect the patient from an overdose. The pump includes batteries to power the device in case it is unplugged (e.g., by mistake by the nurse or patient), yet the power may fail if the device runs out of battery. In this case, the device cannot monitor the dosage amount or frequency, which may cause an overdose. Therefore, we define the key safety property P*pfail*

which requires the PCA pump to abstain from administering medicine after a power failure.

In the normative environment, the nurse operates the pump using the following three step workflow: 1) plug in the pump and turn it on, 2) program the desired dosage parameters into the pump and administer the treatment, and 3) turn off the device and unplug it. The nurse begins with step (1) and ends with step (3), but may omit or repeat step (2) as many times as needed. A diagram of the normative environment is available at [27], pg. 26, Fig. 13. Crucially, the pump is safe with respect to this environment and P*pfail* because the workflow assumes that the pump is never unplugged in step (2).

Calculating Robustness. We use our tool to calculate the robustness of the pump with respect to the normative environment, P*pfail*, and an environmental constraint P*env*. In this case study, P*env* restricts the environment to actions that are allowed by the pump's interface. A diagram of the sole maximal robust deviation is available at [27], pg. 27, Fig. 14. The tool reports that the pump is robust against four actions, three of which allow the operator to change settings before administering the treatment, and the fourth allows the operator to turn off the device prematurely after programming the dosage parameters. Unfortunately, the pump is not robust against any deviations in which it is unexpectedly unplugged. This poses a key weakness in the pump that the designers may wish to improve upon.

#### 6.6 Results and Discussion

We have run our tool on the examples and case studies above, and we present our results in Table 1. All tests were run on a MacBook Pro with an M1 Pro chip and 32 GB of RAM. In the table, |Act| is the union of Act*E*, Act*C*, Act*Psaf* and Act*Penv*; |d*max*| is the size of the largest deviation in Δ; and |W*Penv*| is the size of the winning set for each maximal deviation d̃*i* (separated by commas), where NA indicates the absence of an environmental constraint. Furthermore, "Wall Heur" denotes the wall time for running Algorithm 1 with the heuristic, "Wall Plain" denotes the wall time for running plain Algorithm 1, and "TO" indicates a time-out after five minutes.

Our results demonstrate that calculating robustness is tractable across several different case studies. In particular, our tool's performance on the larger PCA pump case study shows promising results in terms of scalability. Furthermore, we have shown that Δ is useful as a means for both analysis and comparison of controllers. For example, in the Therac-25 case study, robustness provided a richer analysis than classic verification that helped us discover–and ultimately fix–a critical race condition. Finally, we have also demonstrated in the voting machine case study that robustness provides a means for comparing two properties with respect to a controller and an environment.


Table 1. Summary of results from running our tool.

### 7 Related Work

Quantitative robustness notions for discrete transition systems have been investigated in several works [3,4,8,16,24,32,40,42]. We capture robustness qualitatively, which avoids the need for external cost functions over the discrete transition systems. The problem of synthesizing robust controllers against designer-specified deviated environments is investigated in [45]. Since [45] focuses on synthesizing robust controllers, their framework does not address the analysis of robustness; moreover, controller robustness is measured quantitatively via a rank function. Robust linear temporal logic (rLTL) extends the binary view of LTL to a 5-valued semantics to capture different levels of property satisfaction [43]. That work is tangential to ours, as it focuses on specifying robustness.

In [17,49], the authors define robustness as a set of environmental behaviors for which a software system can guarantee safety. Defining robustness in the semantic domain–i.e. in terms of behaviors–implicitly describes safe environmental deviations. Our notion of robustness captures safe environmental deviations explicitly in terms of transitions, which offer both syntactic (transitions) and semantic (implied behaviors) information. Transition-based robustness also allows us to capture the safe environmental envelopes of a system; it is not clear how one might efficiently capture this information with only behaviors.

In [29], the authors also define robustness in terms of transitions added to the environment. Their definition of robustness compares the perturbed controlled behavior, i.e., beh(E*d*||f), instead of directly comparing the additional transitions. In this manner, the partial order used to define robustness in [29] differs from our notion of robustness. Moreover, [29] presents an efficient algorithm only for invariance properties. Extending the work in [29], the authors explore the relationship between controller robustness and permissiveness for invariance properties [30].

Robust control in discrete event systems is also an active area of research [1,10,19–21,26,31,33,39,44,47,48]. However, these works usually deal with specific types of faults such as communication delays, loss of information, or deception attacks [1,20,21,26,31,39,47]. We capture model uncertainty with our robustness definition, which can be attributed to these faults. Robustness against model uncertainty is tackled in the works of [10,19,44,48]. In these works, deviations are modeled by the behavior generated by the environment, whereas we model deviations by the inclusion of extra transitions. In [11], a controller realizability problem is studied for environments modeled as modal transition systems, where a controller satisfies a property in all, some, or none of the LTS family. Our notion of robustness explicitly computes which systems in the LTS family satisfy the property.

Lastly, robustness also relates to fault tolerance. Fault tolerance has been studied in the context of distributed systems [13,22,34]. In [6,9,12,14], fault-tolerant programs are synthesized by retrofitting initially fault-intolerant programs. These works focus on specific types of fault models, whereas our robustness model computes the safety envelope that the controller is robust against.

### 8 Conclusion

In this paper, we introduced a new notion of robustness against environmental deviations for discrete-state transition systems. Our notion of robustness is syntactically defined by additional transitions and semantically defined by the controlled behavior generated by these additional transitions. We provided two methods to compute robustness: a brute-force algorithm, and an algorithm based on a controller synthesis problem. We implemented these methods in a prototype tool which we used to analyze several case studies. In these case studies, we demonstrated that our robustness analysis provides crucial information by identifying the environmental envelopes in which the system can guarantee its safety properties.

As part of future work, we plan to extend our work to investigate robustness in the context of partially observable systems as well as stochastic systems such as Markov decision processes (MDPs). We also plan to investigate the benefit of considering additional environmental states, as well as additional transitions, in our robustness analysis. Finally, we plan to extend our work beyond safety properties, e.g., to liveness properties.

Acknowledgements. This project was supported by the US NSF Awards CCF-2144860, CNS-1801342, CNS-1801546, CCF-1918140, and ECCS-2144416.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Verse: A Python Library for Reasoning About Multi-agent Hybrid System Scenarios

Yangge Li(B) , Haoqing Zhu(B) , Katherine Braught , Keyi Shen , and Sayan Mitra(B)

Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Champaign, USA {li213,haoqing3,braught2,keyis2,mitras}@illinois.edu

Abstract. We present the Verse library with the aim of making hybrid system verification more usable for multi-agent scenarios. In Verse, decision making agents move in a map and interact with each other through sensors. The decision logic for each agent is written in a subset of Python and the continuous dynamics is given by a black-box simulator. Multiple agents can be instantiated, and they can be ported to different maps for creating scenarios. Verse provides functions for simulating and verifying such scenarios using existing reachability analysis algorithms. We illustrate capabilities and use cases of the library with heterogeneous agents, incremental verification, different sensor models, and plug-n-play subroutines for post computations.

Keywords: Scenario verification · Reachability · Hybrid Systems

### 1 Introduction

Automatic verification tools for hybrid systems have been used to analyze linear models with thousands of continuous dimensions [1,5,6] and nonlinear models inspired by industrial applications [6,14]. The state of the art and the challenges are discussed in a recent survey [11]. Despite the potentially large user base, this technology is currently inaccessible without formal methods training. Automatic hybrid verification tools [10,13,17,25,31] require the input model to be written in a tool-specific language. Tools like C2E2 [15] attempt to translate models from Simulink/Stateflow, but the language barrier extends down to the underlying mathematical models. The verification algorithms are based on variants of the hybrid automaton [3,21,24], which requires the discrete states (or *modes*) to be spelled out explicitly as a graph, with guards and resets labeling the transitions. We discuss related works in more detail in Sect. 6, including recently developed libraries that address the usability barrier [5,7,8].

c The Author(s) 2023

This research was funded in part by NASA University Leadership Initiative grant (80NSSC22M0070) Robust and Resilient Autonomy for Advanced Air Mobility.

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 351–364, 2023. https://doi.org/10.1007/978-3-031-37706-8\_18

In this paper, we present Verse, a Python library that aims to make hybrid system verification technology more usable for multi-agent scenarios. The key features are as follows: (1) In Verse, users write scenarios in Python. User-defined functions can be used to create complex *agents*, invariant requirements can be written as assert statements, and scenarios can be created by instantiating multiple agents, all using standard Python syntax. Verse parses this scenario and constructs an internal representation of the hybrid automaton for simulation and analysis. (2) Verse introduces an additional structure, called a *map*, for defining the modes and the transitions of a hybrid system. A map contains *tracks* that capture geometric objects (e.g., lanes or waypoints), which makes it possible to create new scenarios simply by instantiating agents on new maps. With track modes, users do not have to explicitly write different modes for a vehicle following different waypoint segments. Finally, (3) Verse comes with functions for simulation and safety verification via reachability analysis. Developers can implement new functions, plug in existing tools, or implement advanced algorithms, e.g., for incremental verification. In this tool paper, we illustrate use cases with heterogeneous agents and different scenario setups, the flexibility of plugging in different reachability algorithms, and the ability to develop more advanced algorithms (Sect. 5). Verse is available at https://github.com/AutoVerse-ai/Verse-library.

### 2 Overview of Verse

We will highlight the key features of Verse with an example. Consider two drones flying along three parallel ∞-shaped tracks that are vertically separated in space (shown by black lines in Fig. 1). Each drone has a simple collision avoidance logic: if it gets too close to another drone on the same track, then it switches to either the track above or the one below. A drone on T1 has both choices. Verse enables creation, simulation, and verification of such scenarios using Python, and provides a collection of powerful functions for building new analysis algorithms.

Fig. 1. *Left:* A 3-d <sup>∞</sup>-shaped map with example track mode labels. *Center:* Simulation of a red drone nearing the blue drone on T1 and nondeterministically moving to T0 or T2. Both branches are computed by Verse's simulate function. *Right:* Computed reachable sets of the two drones cover more possibilities: either drones can switch tracks when they get close. All four branches are explored by Verse. The branch for blue drone moving downwards violates safety as it may collide with the red drone following T1.

*Creating Scenarios.* Agents like the drones in this example are described by a *simulator* and a *decision logic* in an expressive subset of Python (see the code in Fig. 2 and [26] for more details). The decision logic for an ego agent takes as input its current state and the (observable) states of the other agents, and updates the discrete state or the *mode* of the ego agent. For example, in lines 41–43 of Fig. 2, an agent updates its mode to begin a track change if any agent is near it. It may also update the continuous state of the ego agent. The *mode* of an agent, as we shall see in Sect. 3, has two parts: a *tactical mode* corresponding to the agent's decision or discrete state, and a *track mode* that is determined by the map. Using the any and all functions, the agent's decision logic can quantify over other agents in the scene. User-defined functions are also allowed (is\_close, Fig. 2, line 41). Verse parses this decision logic to create an internal representation of the transition graph of the hybrid model with guards and resets. The simulator can be written in any language and is treated as a black box<sup>1</sup>. For the examples discussed in this paper, the simulators are also written in Python. Safety requirements can be specified using assert statements (see Fig. 5).

```
38 def decisionLogic(ego: State, others: List[State], track_map):
39     next = copy.deepcopy(ego)
40     if ego.tactical_mode == TacticalMode.Normal:
41         if any((is_close(ego, other) and ego.track_mode == other.track_mode)
                   for other in others):
42             next.tactical_mode = TacticalMode.MoveDown
43             next.track_mode = track_map.Tg(ego.track_mode, ego.tactical_mode,
                                              TacticalMode.MoveDown)
44         if any((is_close(ego, other) and ego.track_mode == other.track_mode)
                   for other in others):
45             next.tactical_mode = TacticalMode.MoveUp
46
47     if ego.tactical_mode == TacticalMode.MoveUp:
48         if in_interval(track_map.altitude(ego.track_mode) - ego.z, -1, 1):
49             next.tactical_mode = TacticalMode.Normal
50             next.track_mode = track_map.Tg(ego.track_mode, ego.tactical_mode,
                                              TacticalMode.Normal)
51
```
Fig. 2. Decision Logic Code Snippet from drone\_controller.py.

*Maps and Sensors.* The map of a scenario specifies the tracks that the agents can follow. While a map may have infinitely many tracks, they fall into a finite number of *track modes*. For example, in this ∞-shaped map, each layer is assigned a track mode (T0–T2) and all the tracks between each pair of layers are also assigned a track mode (M10, M01, etc.). When an agent makes a decision and changes its tactical mode, the map object determines the new track mode for the agent. The map abstraction makes scenarios succinct and enables portability of agents across different maps. Besides creating maps from scratch, Verse provides functions for generating map objects from OpenDRIVE [4] files.

<sup>1</sup> This design decision keeps Verse relatively independent of how the simulator is implemented. For reachability analysis, Verse currently uses black-box statistical approaches implemented in DryVR [14] and NeuReach [35]. If the simulator is available as a white-box model, such as differential equations, then Verse could use model-based reachability analysis.

The *sensor* function defines which variables from an agent are visible to other agents. The default sensor function allows all agents to see all variables; we discuss how the sensor function can be modified to include bounded noise in Sect. 5. A map, a sensor and a collection of (compatible) agents together define a scenario object (Fig. 3). In the first few lines, the drone agents are created, initialized, and added to the scenario object. A scenario can have heterogeneous agents with different decision logics.

```
32 scenario = Scenario()
33 drone_red = DroneAgent('drone_red', file_name='drone_controller.py')
34 drone_red.set_initial([init_l_1, init_u_1],(CraftMode.Normal, TrackMode.T1))
35 scenario.add_agent(drone_red)
36 drone_blue = DroneAgent('drone_blue', file_name='drone_controller.py')
37 scenario.add_agent(drone_blue)
38
39 scenario.set_map(M6())
40 scenario.set_sensor(BaseSensor())
41
42 traces = scenario.verify(40, time_step)
```
Fig. 3. Scenario specification snippet.

*Simulation and Reachability.* Once a scenario is defined, Verse's simulate function can generate simulations of the system, which can be stored and plotted. As shown in Fig. 1 (Center), a simulation from a single initial state explores all possible branches that can be generated by the decision logics of the interacting agents, up to a specified time horizon. Verse verifies the safety assertions of a scenario by computing over-approximations of the *reachable sets* for each agent and checking these against the predicates defined by the assertions. Figure 1 (Right) visualizes the result of such a computation performed using the verify function. In this example, the safety condition is violated when the blue drone moves downward to avoid the red drone; the other branches of the scenario are proved safe. The simulate and verify functions save a copy of the resulting execution tree, which can be loaded and traversed to analyze the sequences of modes and states that lead to safety violations. Verse makes it convenient to plug in different reachability subroutines. It also provides powerful functions for implementing advanced verification algorithms, such as incremental verification.

### 3 Scenarios in Verse

A *scenario* in Verse is specified by a map, a collection of agents in that map, and a sensor function that defines the part of each agent visible to other agents. We describe these components below, and in Sect. 4, we will discuss how they formally define a hybrid system.

Tracks, Track Modes, and Maps. A *workspace* W is a Euclidean space in which the agents reside (for example, a compact subset of R<sup>2</sup> or R<sup>3</sup>). An agent's continuous dynamics makes it roughly follow certain continuous curves in W, called *tracks*, and occasionally the agent's decision logic changes the track it follows. Formally, a *track* is simply a continuous function ω : [0, 1] → W, but not all such functions are valid tracks. A map M defines the set of tracks Ω<sub>M</sub> that it permits. In a highway map, some tracks will be aligned along the lanes while others will correspond to merges and exits.

We assume that an agent's decision logic does not depend on exactly which of the infinitely many tracks it is following; instead, it depends only on the type of track it is following, or the *track mode*. In the example in Sect. 2, the track modes are T0, T1, M01, etc. Every (blue) track for transitioning from a point on T0 to the corresponding point on T1 has track mode M01. A map has a finite set of track modes L<sub>M</sub> and a labeling function V<sub>M</sub> : Ω<sub>M</sub> → L<sub>M</sub> that maps each track to a track mode. It also has a mapping g<sub>M</sub> : W × L<sub>M</sub> → Ω<sub>M</sub> that maps a position in the workspace and a track mode to a specific track.

Finally, a Verse agent's decision logic can change its internal mode or *tactical mode* P (e.g., Normal to MoveUp). When an agent changes its tactical mode, it may also update the track it is following, and this is encoded in the track graph function Tg<sub>M</sub> : L<sub>M</sub> × P × P → L<sub>M</sub>, which takes the current track mode and the current and next tactical modes, and generates the new track mode the agent should follow. For example, when the tactical mode of a drone changes from Normal to MoveUp while it is on T1, the map function Tg<sub>M</sub>(T1, Normal, MoveUp) = M10 informs the agent that it should follow a track with mode M10. These sets and functions together define a Verse map object M = ⟨L<sub>M</sub>, V<sub>M</sub>, g<sub>M</sub>, Tg<sub>M</sub>⟩. We drop the subscript M when the map being used is clear from context.
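As an illustration of the map tuple ⟨L, V, g, Tg⟩, here is a minimal Python sketch of a three-layer map in the spirit of Sect. 2. The class, its method names, and the Tg table are hypothetical simplifications for exposition, not the actual Verse map API:

```python
from enum import Enum

class TacticalMode(Enum):
    Normal = 0
    MoveUp = 1
    MoveDown = 2

class SimpleMap:
    """Toy three-layer map: track modes T0, T1, T2 for the layers and
    M10, M12 for transition tracks out of T1 (hypothetical names)."""

    track_modes = {"T0", "T1", "T2", "M10", "M12"}  # the finite set L

    # the Tg table: (track mode, current tactical, next tactical) -> new track mode
    _TG = {
        ("T1", TacticalMode.Normal, TacticalMode.MoveUp): "M10",
        ("T1", TacticalMode.Normal, TacticalMode.MoveDown): "M12",
        ("M10", TacticalMode.MoveUp, TacticalMode.Normal): "T0",
        ("M12", TacticalMode.MoveDown, TacticalMode.Normal): "T2",
    }

    def Tg(self, track_mode, cur_tactical, next_tactical):
        return self._TG.get((track_mode, cur_tactical, next_tactical))

    def altitude(self, track_mode):
        # a stand-in for g: each layer maps to a constant-altitude track
        return {"T0": 2.0, "T1": 1.0, "T2": 0.0}.get(track_mode, 1.0)

m = SimpleMap()
assert m.Tg("T1", TacticalMode.Normal, TacticalMode.MoveUp) == "M10"
```

This mirrors the example above: Tg(T1, Normal, MoveUp) = M10, so a drone leaving T1 upward follows a transition track with mode M10.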

Agents. A Verse *agent* is defined by modes and continuous state variables, a decision logic that defines (possibly nondeterministic) discrete transitions, and a flow function that defines continuous evolution. An agent A is *compatible* with a map M if the agent's tactical modes P are a subset of the allowed input tactical modes for Tg. This makes it possible to instantiate the same agent on different compatible maps. The *mode space* for an agent instantiated on map M is the set D = L × P, where L is the set of track modes in M and P is the set of tactical modes of the agent. The *continuous state space* is X = W × Z, where W is the workspace (of M) and Z is the space of the other continuous state variables. The (full) *state space* is the Cartesian product Y = X × D. In the two-drone example in Sect. 2, the continuous state variables are the positions and velocities along the three axes of the workspace. The modes are Normal, T1, MoveUp, M10, etc.

An *agent* A in map M with k − 1 other agents is defined by a tuple A = ⟨Y, Y<sup>0</sup>, G, R, F⟩, where Y is the state space and Y<sup>0</sup> ⊆ Y is the set of initial states. The guard G and reset R functions jointly define the discrete transitions. For a pair of modes d, d′ ∈ D, G(d, d′) ⊆ X<sup>k</sup> defines the condition under which a transition from d to d′ is enabled. The reset function R(d, d′) : X<sup>k</sup> → X specifies how the continuous states of the agent are updated when the mode switch happens. Both of these functions take as input the sensed continuous states of all the other k − 1 agents in the scenario. The G and R functions are not defined separately, but are extracted by the Verse parser from a block of structured Python code as shown in Fig. 2. The discrete states in if conditions and assignments define the source and destination of discrete transitions; if conditions involving continuous states define guards for the transitions, and assignments to continuous states define resets. Expressions with the any and all functions are unrolled to disjunctions and conjunctions according to the number of agents k.
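The unrolling of quantified expressions can be illustrated with a small sketch (hypothetical code, not the Verse parser): an any(...) over the other agents is equivalent to an explicit disjunction with one clause per agent.

```python
def is_close(ego_pos, other_pos, threshold=5.0):
    # hypothetical proximity test standing in for the is_close of Fig. 2
    return abs(ego_pos - other_pos) < threshold

def guard_unrolled(ego, others):
    # any(is_close(ego, o) for o in others) unrolls into a disjunction with
    # one clause per agent: is_close(ego, o_1) or ... or is_close(ego, o_{k-1})
    clauses = [is_close(ego, o) for o in others]
    return any(clauses)

# with k = 3 agents, the guard is a 2-way disjunction
assert guard_unrolled(0.0, [10.0, 3.0]) is True    # second clause holds
assert guard_unrolled(0.0, [10.0, 20.0]) is False  # no clause holds
```

An all(...) expression unrolls to the corresponding conjunction in the same way.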

For example, in Fig. 2, Lines 47–50 define transitions from ⟨MoveUp, M10⟩ to ⟨Normal, T0⟩ and from ⟨MoveUp, M21⟩ to ⟨Normal, T1⟩. The change of track mode is given by the Tg function. The guard for this transition comes from the if condition at Line 48: G(⟨MoveUp, M10⟩, ⟨Normal, T0⟩) = {x | −1 < T0.pz − x.pz < 1} for x ∈ X, as given by the user-defined in\_interval function. Here the continuous states remain unchanged after the transition.

The final component of the agent is the *flow* function F : X × D × R<sub>≥0</sub> → X, which defines the continuous-time evolution of the continuous state. For any initial condition ⟨x0, d0⟩ ∈ Y, F(x0, d0)(·) gives the continuous state of the agent as a function of time. In this paper, we use F as a black-box function (see Footnote 1).

Sensors and Scenarios. For a scenario with k agents, a *sensor* function S : Y<sup>k</sup> → Y<sup>k</sup> defines the continuous observables as a function of the continuous states. To simplify the exposition, in this paper we assume that observables have the same type as the continuous state Y, and that each agent i is observed identically by all other agents. This simple, fully transparent sensor model still allows us to write realistic agents that use only information about nearby agents. In a highway scenario, the part of agent j observable to another agent i may be the relative distance yj = xj − xi, and vice versa, which can be computed as a function of the continuous state variables xj and xi. A different sensor function, which gives nondeterministic noisy observations, appears in Sect. 5.
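A relative-distance sensor of the kind described above can be sketched as follows (a hypothetical 1-d illustration with scalar positions, not Verse's sensor API):

```python
def relative_sensor(states):
    """Hypothetical sensor: agent i observes each agent j as the relative
    position x_j - x_i (and observes itself at 0), computed from the joint
    continuous state of all k agents."""
    k = len(states)
    observed = []
    for i in range(k):
        row = []
        for j in range(k):
            row.append(0.0 if i == j else states[j] - states[i])
        observed.append(row)
    return observed

obs = relative_sensor([0.0, 4.0, 10.0])
assert obs[0][1] == 4.0   # agent 0 sees agent 1 at +4
assert obs[1][2] == 6.0   # agent 1 sees agent 2 at +6
```

Because the observables are a function of the joint state, a decision logic written against them can still be parsed into guards over the underlying continuous variables.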

A Verse *scenario* SC is defined by (a) a map M, (b) a collection of k agent instances {A1, ..., Ak} that are compatible with M, and (c) a sensor S for the k agents. Since all the agents are instantiated on the same (compatible) map M, they share the same workspace. Currently, we require agents to have identical state spaces, i.e., Yi = Yj, but they can have different decision logics and different continuous dynamics.

### 4 Verse Scenario to Hybrid Verification

In this section, we define the underlying hybrid system H(SC) that a Verse scenario SC specifies. The verification questions that Verse is equipped to answer are stated in terms of the behaviors, or *executions*, of H(SC). Verse's notion of a hybrid automaton is close to that in Definition 5 of [14]. The only uncommon aspect of [14] is that the continuous flows may be defined by black-box simulator functions, instead of white-box analytical models (see Footnote 1).

Given a scenario with k agents SC = ⟨M, {A1, ..., Ak}, S⟩, the corresponding hybrid automaton is H(SC) = ⟨X, X<sup>0</sup>, D, D<sup>0</sup>, G, R, TL⟩, where


We denote by ξ.fstate, ξ.lstate, and ξ.ltime the initial state ξ(0), the last state ξ(T), and the duration T, respectively. For a sampling parameter δ > 0 and a length m, a δ-*execution* of a hybrid automaton H = H(SC) is a sequence of m labeled trajectories α := ξ0, d0, ..., ξm−1, dm−1, such that (1) ξ0.fstate ∈ X<sup>0</sup> and d0 ∈ D<sup>0</sup>; (2) for each i ∈ {0, ..., m − 2}, ξi.lstate ∈ G(di, di+1) and ξi+1.fstate = R(di, di+1)(ξi.lstate); and (3) ξi.ltime = δ for each i ≠ m − 1, and ξm−1.ltime ≤ δ.

We define the first and last states of an execution α = ξ0, d0, ..., ξm−1, dm−1 as α.fstate = ξ0.fstate and α.lstate = ξm−1.lstate, and the first and last modes as α.fmode = d0 and α.lmode = dm−1. The set of reachable states is ReachH := {α.lstate | α is an execution of H}. In addition, we denote by ReachH(d) the reachable states in a specific mode d ∈ D, and by ReachH(T) the set of reachable states at time T. Similarly, denoting the unsafe states for mode d by U(d), the safety verification problem for H can be solved by checking whether ∀d ∈ D, ReachH(d) ∩ U(d) = ∅. Next, we discuss Verse functions for verification via reachability.

Verification Algorithms in Verse. The Verse library comes with several built-in verification algorithms, and it provides functions that users can use to implement powerful new algorithms. We describe the basic algorithm and functions in this section.

Consider a scenario SC with k agents and the corresponding hybrid automaton H(SC). For a pair of modes d, d′, the standard discrete post<sub>d,d′</sub> : X → X and continuous post<sub>d,δ</sub> : X → X operators are defined as follows: for any states x, x′ ∈ X, post<sub>d,d′</sub>(x) = x′ iff x ∈ G(d, d′) and x′ = R(d, d′)(x); and post<sub>d,δ</sub>(x) = x′ iff ∀i ∈ {1, ..., k}, x′<sub>i</sub> = F<sub>i</sub>(x<sub>i</sub>, d<sub>i</sub>, δ). These operators are lifted to sets of states in the usual way. Verse provides postCont to compute post<sub>d,δ</sub> and postDisc to compute post<sub>d,d′</sub>. Instead of computing the exact post, postCont and postDisc compute over-approximations using improved implementations of the algorithms in [14]. Verse's verify function implements a reachability analysis algorithm using these post operators. The algorithm constructs an execution tree Tree = ⟨V, E⟩ up to depth m in breadth-first order. Each vertex ⟨S, d⟩ ∈ V is a pair of a set of states and a mode. The root is ⟨X<sup>0</sup>, d<sup>0</sup>⟩. There is an edge from ⟨S, d⟩ to ⟨S′, d′⟩ iff S′ = post<sub>d′,δ</sub>(post<sub>d,d′</sub>(S)). The safety conditions are checked as the tree is constructed. Currently, Verse implements only bounded-time reachability; however, basic unbounded-time analysis with fixed-point checks could be added following [14,32].
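The breadth-first tree construction can be sketched as follows. The post operators here are illustrative stand-ins (a 1-d interval bloat and a single hard-coded transition), not Verse's postCont/postDisc:

```python
from collections import deque

def post_cont(S, d, delta):
    # stand-in for postCont: over-approximate the flow from interval S = (lo, hi)
    # in mode d for time delta by bloating the interval
    lo, hi = S
    return (lo - delta, hi + delta)

def post_disc(S, d):
    # stand-in for postDisc: a single hypothetical transition from mode 0 to
    # mode 1, enabled when the set S intersects the guard "position > 1.0"
    return [(1, S)] if d == 0 and S[1] > 1.0 else []

def reach_tree(S0, d0, delta, depth):
    """Breadth-first construction of the execution tree of (set, mode) vertices."""
    root = (post_cont(S0, d0, delta), d0)
    tree = {root: []}                      # vertex -> list of child vertices
    frontier = deque([(root, 0)])
    while frontier:
        (S, d), level = frontier.popleft()
        if level >= depth:
            continue
        for d2, S_reset in post_disc(S, d):              # discrete post
            child = (post_cont(S_reset, d2, delta), d2)  # continuous post
            tree[(S, d)].append(child)
            tree.setdefault(child, [])
            frontier.append((child, level + 1))
    return tree

tree = reach_tree((0.0, 1.0), 0, 0.5, 3)
```

On this toy input the tree has two vertices: the root in mode 0 and one child in mode 1, connected by an edge computed as postCont after postDisc, matching the edge condition above.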

### 5 Experiments and Use Cases

We evaluate the key features and algorithms in Verse through examples. We consider two types of agents: a 4-d ground vehicle with bicycle dynamics and the Stanley controller [22], and a 6-d drone with an NN controller [23]. Each of these agents can be fitted with one of two types of decision logic: (1) a collision avoidance logic (CA), by which the agent switches to a different available track when it nears another agent on its own track, and (2) a simpler non-player vehicle logic (NPV), by which the agent does not react to other agents (and just follows its own track at constant speed). We denote the car agent with CA logic as agent C-CA, the drone with NPV logic as D-NPV, and so on. We use four 2-d maps (M1–M4) and two 3-d maps (M5–M6). M1 and M2 have 3 and 5 parallel straight tracks, respectively. M3 has 3 parallel tracks with a circular curve. M4 is imported from OpenDRIVE. M6 is the ∞-shaped map used in Sect. 2.

*Safety Analysis with Multiple Drones in a 3-d Map.* The first example is a scenario with two drones, a D-CA agent (red) and a D-NPV agent (blue), in map M5. The safety assertion requires the agents to always be separated by at least 1 m. Figure 4 (*left*) shows the computed reachable set and its projections on the x- and z-positions. Since the agents are separated in space-time, the scenario is verified safe. These plots are generated using Verse's plotting functions.

Fig. 4. Left to right: (1) Computed reachtubes for a 2-drone scenario; (2) the same reachtube projected on the x-dimension, and (3) on the z-dimension. Since there is no overlap in space-time, there is no collision. (4) Reachtube for a 3-drone scenario; the red drone violates the safety condition by entering the unsafe region after moving downward. (Color figure online)

*Checking Multiple Safety Assertions.* Verse supports multiple safety assertions specified using assert statements. For example, the user can specify unsafe regions (lines 77–78) or safe separation between agents (lines 79–82), as shown in Fig. 5. We add a second D-NPV agent to the previous scenario, along with both safety assertions. The result is shown in the rightmost plot of Fig. 4. In this scenario, D-CA violates the safety property by entering the unsafe region after moving downward to avoid a collision. The behavior of D-CA after moving upward is not affected, and there is no violation of safe separation. Verse allows users to extract the set of reachable states and mode transitions that lead to a safety violation.

```
77 assert not (ego.x > 40 and ego.x < 50 and \
78 ego.y > -5 and ego.y < 5 and ego.z > -10 and ego.z < -6), "Unsafe Region"
79 assert not any(ego.x-other.x < 1 and ego.x-other.x > -1 and \
80 ego.y-other.y < 1 and ego.y-other.y > -1 and \
81 ego.z-other.z < 1 and ego.z-other.z > -1 \
82 for other in others), "Safe Separation"
```
Fig. 5. Safety assertions for three drone scenario.

*Changing Maps.* Verse allows users to easily create scenarios with different maps and to port agents across compatible maps. We start with a scenario with one C-CA agent (red) and two C-NPV agents (blue, green) in M1. The safety assertion is that the vehicles should be at least 1 m apart in both the x- and y-dimensions. Figure 6 (*left*) shows the verification result; safety is not violated. However, if we switch to map M3 by changing one line in the scenario definition, reachability analysis shows that a safety violation can happen after C-CA merges left (Fig. 6 (*center*)). In addition, Verse allows importing maps in the OpenDRIVE [4] format. An example is included in the extended version of the paper [26].

Fig. 6. *Left:* the three-car scenario on a map with parallel straight lanes. *Center:* the same scenario on a curved map. *Right:* the same scenario with a noisy sensor. (Color figure online)

*Adding Noisy Sensors.* Verse supports scenarios with different sensor functions. For example, the user can create a noisy sensor function that mimics a realistic sensor with bounded noise. Such sensor functions are easily added to the scenario using the set\_sensor function.

Figure 6 (*right*) shows exactly the same three-car scenario with a noisy sensor, which adds ±0.5 m of noise to the perceived position of every other vehicle. Since the sensed values of other agents only impact the checking of the guards (and hence the transitions) of the agents, Verse internally bloats the reachable set of positions of the other agents by ±0.5 while checking guards. Compared with the behavior of the same agent with no sensor noise (shown in yellow in Fig. 6 (*right*)), the sensor noise enlarges the region over which the transition can happen, causing enlarged reachtubes for the red agent.
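The effect of bloating on guard checking can be sketched with interval arithmetic (hypothetical code; the thresholds and distances below are made-up numbers, not from the scenario):

```python
def bloat(interval, noise):
    # enlarge a position interval [lo, hi] by +/- noise, as done internally
    # when a bounded-noise sensor perceives another agent's position
    lo, hi = interval
    return (lo - noise, hi + noise)

def guard_may_fire(ego_rect, other_rect, close_dist, noise):
    # the guard "the other agent is within close_dist of ego" is checked
    # against the bloated set, so noise can only enlarge the region where
    # a transition may happen
    o_lo, o_hi = bloat(other_rect, noise)
    e_lo, e_hi = ego_rect
    return o_lo <= e_hi + close_dist and o_hi >= e_lo - close_dist

# with a gap of 4.0 between the sets and a threshold of 3.8, the
# transition is enabled only because of the sensor noise
assert guard_may_fire((0.0, 1.0), (5.0, 6.0), 3.8, noise=0.5) is True
assert guard_may_fire((0.0, 1.0), (5.0, 6.0), 3.8, noise=0.0) is False
```

This is exactly why the noisy-sensor reachtubes in Fig. 6 (*right*) are larger: transitions become possible over a wider set of states, never a narrower one.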

*Plugging in Different Reachability Engines.* With a little effort, users can plug different reachability tools into Verse for the postCont computation. The user needs to modify the interface of the reachability tool so that, given a set of initial states, a mode, and a non-negative value δ, the tool outputs the set of reachable states over a δ-period, represented as a set of timed hyperrectangles. Currently, Verse implements postCont using DryVR [14], NeuReach [35], and Mixed Monotone Decomposition [12]. A scenario with two car agents in map M1, verified using both NeuReach and DryVR, is included in the extended version of the paper [26].
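The required interface can be sketched as a small adapter (the names ReachEngineAdapter and ConstantBloatEngine are hypothetical; the toy engine below just bloats a 1-d interval linearly in time, standing in for a real tool such as DryVR):

```python
from typing import List, Tuple

Rect = Tuple[float, float]        # 1-d hyperrectangle: (lower, upper)
TimedRect = Tuple[float, Rect]    # (time, hyperrectangle)

class ReachEngineAdapter:
    """Hypothetical adapter interface: any tool mapping (initial set, mode,
    delta) to a list of timed hyperrectangles can serve as postCont."""

    def post_cont(self, init_set: Rect, mode: str, delta: float) -> List[TimedRect]:
        raise NotImplementedError

class ConstantBloatEngine(ReachEngineAdapter):
    """Toy engine: bloats the initial set linearly in time (illustration only)."""

    def __init__(self, rate: float, steps: int):
        self.rate, self.steps = rate, steps

    def post_cont(self, init_set, mode, delta):
        lo, hi = init_set
        out = []
        for s in range(1, self.steps + 1):
            t = delta * s / self.steps
            out.append((t, (lo - self.rate * t, hi + self.rate * t)))
        return out

engine = ConstantBloatEngine(rate=2.0, steps=4)
tube = engine.post_cont((0.0, 1.0), "Normal", 1.0)
assert tube[-1] == (1.0, (-2.0, 3.0))
```

Any engine exposing this shape of post_cont could then be called by the tree-construction loop in place of the default one.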

*Incremental Verification.* We implemented an incremental verification algorithm in Verse called verifyInc. This algorithm improves on verify by caching and reusing reachtubes, and can be effective when analyzing a sequence of slightly different scenarios. The function verifyInc avoids re-computing post<sub>d,d′</sub> and post<sub>d,δ</sub> when constructing the execution tree by reusing earlier execution runs. Experiments show that verifyInc reduces running time by 10× for two identical runs and by 2× when the decision logic is slightly modified (more details are provided in the extended version of the paper [26]). This exercise illustrates the use of Verse for creating alternative analysis algorithms.
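The caching idea behind such an algorithm can be sketched generically (hypothetical code, not the verifyInc implementation): memoize the post operator on its inputs, so that a repeated query across runs is served from the cache.

```python
def make_cached_post(post_cont):
    """Hypothetical reachtube cache: reuse a previously computed tube when the
    same (initial set, mode, delta) query recurs across runs of slightly
    different scenarios."""
    cache = {}
    stats = {"hits": 0, "misses": 0}

    def cached(init_set, mode, delta):
        key = (init_set, mode, delta)
        if key in cache:
            stats["hits"] += 1
        else:
            stats["misses"] += 1
            cache[key] = post_cont(init_set, mode, delta)
        return cache[key]

    return cached, stats

# toy post operator: bloat the interval by delta
cached_post, stats = make_cached_post(lambda S, d, dt: (S[0] - dt, S[1] + dt))
cached_post((0.0, 1.0), "Normal", 0.1)
cached_post((0.0, 1.0), "Normal", 0.1)  # identical query: served from cache
assert stats == {"hits": 1, "misses": 1}
```

Two identical runs would then hit the cache on every query, which is consistent with the reported 10× speedup for identical runs.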

Table 1 summarizes the running time for verifying all the examples in this section. We additionally include three standard benchmarks: van der Pol (agent V) [20], spacecraft rendezvous (agent S) [20], and gearbox (agent G) [2]. As expected, the running times increase with the number of discrete mode transitions. However, even for a complicated scenario with 7 agents and 37 transitions, verification finishes in under 6 minutes, which suggests some level of scalability. The choice of reachability engine can also impact running time: for the same scenarios in rows 2, 3 and 10, 11, Verse with NeuReach<sup>2</sup> as the reachability engine takes more time than with DryVR.


Table 1. Runtime for verifying the examples in Sect. 5. The columns are: number of agents (#A), agent type (A), map used (Map), reachability engine used (postCont), sensor type (NS), number of mode transitions (#TR), and total run time (Rt). N/A denotes not available.

### 6 Related Work

Automatic hybrid verification tools typically require the input model to be written in a tool-specific language [10,13–15,17,25]. Libraries like JuliaReach [7],

<sup>2</sup> Runtime for NeuReach includes training time.

Hylaa [5], and HyPro [8] share our motivation to reduce the usability barrier by providing reachability analysis APIs for popular programming languages. Verse is distinct in this family in that it supports the creation and analysis of multi-agent scenarios. The work in [33] also supports multiple agents; however, Verse significantly improves usability with maps, scenarios, and decision logics written in Python.

Interactive theorem provers have been used for modeling and verification of multi-agent and hybrid systems [16,19,27,29]. KeYmaera X [19] uses quantified differential dynamic logic for specifying multi-agent scenarios and supports proof search and user-defined tactics. Isabelle/HOL [16], PVS [27], and Maude [29] have also been used for limited classes of hybrid systems. These approaches are geared toward a different user segment in that they provide higher expressive and analytical power to expert users. Verse is inspired by widely used tools for simulating multi-agent scenarios [9,18,28,30,36]. While the models created in these tools can be flexible and expressive, they are currently not amenable to formal verification.

### 7 Conclusions and Future Directions

In this paper, we presented Verse, a new open-source library for broadening the application of hybrid system verification technologies to scenarios involving multiple interacting decision-making agents. There are several future directions for Verse. Verse currently assumes that all agents interact with each other only through the sensor in the scenario and that all agents share the same sensor. This restriction could be relaxed to allow different types of asymmetric sensors. Functions for constructing and systematically sampling scenarios could be developed. Adding post-computation for white-box models by building connections with existing tools [1,10,15] would be a natural next step. Such approaches could utilize the symmetry properties of agent dynamics as in [32,34], but beyond that, new types of symmetry reductions should be possible by exploiting the map geometry.

### References



# **Synthesis**

# Counterexample Guided Knowledge Compilation for Boolean Functional Synthesis

S. Akshay(B) , Supratik Chakraborty(B) , and Sahil Jain

Indian Institute of Technology Bombay, Mumbai, India {akshayss,supratik}@cse.iitb.ac.in, sahil.jain1.c2022@iitbombay.org

Abstract. Given a specification as a Boolean relation between inputs and outputs, Boolean functional synthesis generates a function, called a Skolem function, for each output in terms of the inputs such that the specification is satisfied. In general, there may be many possibilities for Skolem functions satisfying the same specification, and criteria to pick one or the other may vary from specification to specification.

In this paper, we develop a technique to represent the space of Skolem functions in a criteria-agnostic form that makes it possible to subsequently extract Skolem functions for different criteria. Our focus is on identifying such a form and on developing a compilation algorithm for it. Our approach is based on a novel counterexample-guided strategy for existentially quantifying a subset of variables from a specification in negation normal form. We implement this technique, compare our performance with that of other knowledge compilation approaches for Boolean functional synthesis, and show promising results.

### 1 Introduction

Manually designing systems that satisfy complex user-provided specifications can be notoriously tricky. *Automated synthesis* has therefore attracted significant attention from researchers over the past few decades [1–5]. In this paradigm, a user describes the desired behaviour of a system as a relational specification between its inputs and outputs, and an algorithm automatically generates an implementation such that the specification is provably satisfied. In this paper, we focus only on systems with Boolean inputs and outputs, with relational specifications given as Boolean formulas. The synthesis problem in this setting is also called *Boolean functional synthesis*. Formally, let ϕ(*X*, *Y*) be a Boolean formula representing the specification, where *X* = (x<sub>1</sub>, ..., x<sub>m</sub>) is a vector of Boolean inputs and *Y* = (y<sub>1</sub>, ..., y<sub>n</sub>) is a vector of Boolean outputs of the system. Boolean functional synthesis requires us to generate a vector of Boolean functions *Ψ*(*X*) = (ψ<sub>1</sub>(*X*), ..., ψ<sub>n</sub>(*X*)) such that ∀*X* ((∃*Y* ϕ(*X*, *Y*)) ⇔ ϕ(*X*, *Ψ*(*X*))).
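For small m and n, the defining condition ∀X ((∃Y ϕ(X, Y)) ⇔ ϕ(X, Ψ(X))) can be checked by brute force. The following sketch (a hypothetical helper, not from the paper) does so for a toy XOR specification:

```python
from itertools import product

def is_skolem_vector(phi, psi, m, n):
    """Brute-force check of the defining condition: for every input vector X,
    (exists Y. phi(X, Y)) iff phi(X, psi(X))."""
    for X in product([False, True], repeat=m):
        exists_y = any(phi(X, Y) for Y in product([False, True], repeat=n))
        if exists_y != phi(X, psi(X)):
            return False
    return True

# toy specification: the single output y1 must equal x1 XOR x2
phi = lambda X, Y: Y[0] == (X[0] != X[1])
psi = lambda X: (X[0] != X[1],)   # a Skolem function vector for phi
assert is_skolem_vector(phi, psi, m=2, n=1)

# a wrong candidate fails the check
bad = lambda X: (X[0],)
assert not is_skolem_vector(phi, bad, m=2, n=1)
```

This toy specification pins down a unique Skolem function; the point of the paper is the opposite situation, where many Skolem function vectors satisfy the check and the user may prefer one over another.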

Authors' names are in alphabetical order of last names.

© The Author(s) 2023

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 367–389, 2023. https://doi.org/10.1007/978-3-031-37706-8_19

For each <sup>i</sup> ∈ {1,...n}, the function <sup>ψ</sup>i(*X*) is called a Skolem function for <sup>y</sup><sup>i</sup> in ϕ(*X*,*<sup>Y</sup>* ), and *<sup>Ψ</sup>*(*X*) is called a Skolem function vector.

There are several interesting applications of Boolean functional synthesis, including automated program synthesis, circuit repair and debugging, cryptanalysis, and the like [2,6–10]. This has motivated researchers to develop novel algorithms for solving increasingly larger and more complex synthesis benchmarks [11–19]. Each such algorithm generates *a single Skolem function vector* for a given relational specification, thereby providing an implementation of the system. However, there may be many alternative function vectors that also serve as Skolem function vectors for the same specification. Some of these may yield system implementations that are more "desirable" than those obtained from other Skolem function vectors, when non-functional metrics such as the size of the program/circuit needed for implementation, ease of understanding, etc., are considered. Therefore, having a tool output a single Skolem function vector (chosen by the tool, without any user agency in the choice) can be restrictive in terms of the implementation choices available to the user.

One way to address the above problem is to use a knowledge compilation approach, i.e. to compile the specification to a *special normal form* from which it is relatively easy to use downstream logic synthesis tools to generate any Skolem function vector optimizing user-specified criteria. Unfortunately, earlier work on knowledge compilation for Boolean functional synthesis [13,14,20] does not allow us to do this easily. They simply allow efficient synthesis of one (among possibly many) Skolem function vector from the compiled representation. Moreover, the user has no agency in choosing which Skolem function vector is synthesized; all choices are made implicitly deep inside heuristics of the compilation algorithms. For example, if we compile a relational specification to wDNNF [14] or SynNNF [13], the only guarantee we have is that the so-called GACKS Skolem functions (see [14]) can be efficiently synthesized from the compiled forms. But what if these functions are not the user's preferred choice of Skolem functions for an application? Unfortunately, not much can be done if we compile the specification to wDNNF or SynNNF. Similarly, the compilation approach proposed in [20] allows efficient synthesis of Skolem functions of yet another form, but even here, the user hardly has any agency in choosing which (among many alternative) Skolem function vectors is actually output. Existing algorithms therefore effectively restrict the *semantic choice* of Skolem functions with hardly any way for the user to influence this choice. Once the semantic choice has been made by the compiler, the only agency the user has is in *optimizing the implementation of this semantic choice*. We believe the inability of existing compilation approaches to allow the user semantic choice of Skolem functions is a limiting factor in practical usage of these works. In this paper, we take a first step towards remedying this problem.

The central question we ask in this paper is: *Can we compile a Boolean relational specification to a representation that does not restrict the semantic choice of Skolem functions, and yet allows easy deployment of downstream logic synthesis tools to obtain Skolem functions customized to user-provided criteria?* Our main result is an affirmative answer to this question. We also design and implement an algorithm that compiles a given specification in negation normal form to such a representation. We emphasize that our goal in this paper is not to identify specific optimization criteria or to synthesize Skolem functions that optimize some specific criteria. Instead, we focus on developing a representation that makes it possible to use downstream logic optimization tools to synthesize Skolem functions satisfying user-provided criteria. Our experiments show that our approach is competitive, performance-wise, with earlier approaches that severely restrict the semantic choice of Skolem functions.

The primary contributions of this paper can be summarized as follows.


*Related Work.* In knowledge compilation, the general goal is to represent a problem specification in a form that allows specific questions to be answered efficiently (see e.g., [21–23]). In [22,24], representation forms for Boolean functions were proposed that allow efficient enumeration of all satisfying assignments of the function. However, this idea cannot be easily extended to enumerate Skolem functions, since the space of functions is doubly exponentially large in the number of variables. For Boolean functional synthesis, [13,20,25,26] provide normal forms and present compilers that render synthesis of a single Skolem function vector easy. However, they do not provide the user any agency in choosing the Skolem function vector. In fact, the optimizations used in [13] preclude generation of all Skolem function vectors for reasons of efficiency. In the current work, our focus is on symbolically representing the space of all Skolem function vectors, without necessarily converting the given specification to a semantically equivalent one in special normal form. Thus, the problem addressed in this paper is technically different from those addressed in [13,20,25,26]. Nevertheless, our work can be viewed as knowledge representation for all Skolem functions.

### 2 A Motivating Example

We start with a simple example that illustrates some of the problems we wish to address. Suppose we are designing a memoryless arbiter that must arbitrate requests from three users for a shared resource. Let the arbiter inputs be Boolean variables r₁, r₂, r₃, where rᵢ is true iff there is a request from user i. Let the corresponding arbiter outputs be g₁, g₂, g₃, where gᵢ is true iff access is granted to user i. We want the arbiter to satisfy the following properties: (a) at most one user must be granted access at a time, (b) if some user has requested access, some user must be granted access, and (c) a user should be granted access only if she has requested it. The above properties can be encoded as a specification ϕ ≡ ϕ₁ ∧ ϕ₂ ∧ ϕ₃, where ϕ₁ ≡ (g₁ ⇒ ¬(g₂ ∨ g₃)) ∧ (g₂ ⇒ ¬(g₁ ∨ g₃)) ∧ (g₃ ⇒ ¬(g₁ ∨ g₂)), ϕ₂ ≡ (r₁ ∨ r₂ ∨ r₃) ⇒ (g₁ ∨ g₂ ∨ g₃), and ϕ₃ ≡ (g₁ ⇒ r₁) ∧ (g₂ ⇒ r₂) ∧ (g₃ ⇒ r₃).
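For experimentation, the three properties can be written directly as executable predicates. The following is our own hypothetical Python encoding of the specification (the names `phi1`–`phi3`, `phi` and the argument order mirror the text but are not from the paper); the final check confirms that some grant pattern exists for every request pattern, so Skolem functions for g₁, g₂, g₃ exist.

```python
from itertools import product

def phi1(g1, g2, g3):
    # (a) at most one grant at a time
    return [g1, g2, g3].count(True) <= 1

def phi2(r1, r2, r3, g1, g2, g3):
    # (b) some request implies some grant
    return (not (r1 or r2 or r3)) or (g1 or g2 or g3)

def phi3(r1, r2, r3, g1, g2, g3):
    # (c) grant only if requested
    return (not g1 or r1) and (not g2 or r2) and (not g3 or r3)

def phi(r1, r2, r3, g1, g2, g3):
    return (phi1(g1, g2, g3)
            and phi2(r1, r2, r3, g1, g2, g3)
            and phi3(r1, r2, r3, g1, g2, g3))

# Sanity check: for every request pattern, some grant pattern satisfies phi.
assert all(any(phi(*r, *g) for g in product([False, True], repeat=3))
           for r in product([False, True], repeat=3))
```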

It turns out that there are many different Skolem function vectors *Ψ* = (ψ₁, ψ₂, ψ₃) for the above specification, where each ψᵢ gives a Skolem function for gᵢ. We ran two state-of-the-art Boolean functional synthesis tools, viz. Manthan2 [17] and BFSS [14], on this specification. BFSS required us to also specify a linear order of outputs (we will shortly see why), and we used g₁ ≺ g₂ ≺ g₃. Both tools solved the problem in no time, and each reported a Skolem function vector *without any room for the user to influence the choice of Skolem functions*. Specifically, the Skolem functions returned by Manthan2 can be represented by the And-Inverter Graph (AIG) shown in Fig. 1a. Here, each circle represents a two-input AND gate, and each dotted (resp. solid) edge represents a connection with (resp. without) logical negation. Thus, the Skolem functions are: ψ₂ ≡ r₂ ∧ ¬r₁ ∧ ¬r₃, ψ₁ ≡ r₁ ∧ ¬r₃ ∧ ¬g₂ and ψ₃ ≡ r₃ ∧ ¬g₁ ∧ ¬g₂. Running BFSS on the same specification yields Skolem functions represented by the AIG in Fig. 1c. Here, ψ₃ ≡ r₃ ∧ ¬r₁ ∧ ¬r₂, ψ₂ ≡ r₂ ∧ ¬g₃ and ψ₁ ≡ r₁ ∧ ¬g₂ ∧ ¬g₃.

Fig. 1. Unoptimized and optimized AIGs of Skolem functions

Are the Skolem functions generated by the two tools in their simplest forms, or did they miss some opportunities for optimization? To answer this, we used a widely used logic optimization tool, viz. abc [27], to simplify the two AIGs, using commands to minimize the AND gate count and to balance lengths of paths in the AIGs. The resulting simplified AIGs are shown in Fig. 1b (obtained from Fig. 1a) and Fig. 1d (obtained from Fig. 1c). Thus, Manthan2's solution is equivalent to ψ₃ ≡ r₃, ψ₂ ≡ r₂ ∧ ¬r₁ ∧ ¬r₃, ψ₁ ≡ r₁ ∧ ¬r₃, while BFSS' solution is equivalent to ψ₂ ≡ r₂, ψ₁ ≡ r₁ ∧ ¬r₂, ψ₃ ≡ r₃ ∧ ¬r₁ ∧ ¬r₂. Note that the two solutions are semantically equivalent modulo permutation of indices (although this wasn't obvious prior to optimization).

There are some important take-aways from this simple experiment. First, neither Manthan2 nor BFSS gave the user any agency in the semantic choice of the synthesized Skolem functions. The use of the abc tool with user-provided optimization criteria at the end simply gave us a choice of implementation for the Skolem functions already determined by each tool. Significantly, there are choices of Skolem function vectors, viz. ψ₁ ≡ r₁ ∧ (¬r₂ ∨ ¬r₃), ψ₂ ≡ r₂ ∧ (¬r₁ ∨ r₃), ψ₃ ≡ ¬r₁ ∧ ¬r₂ ∧ r₃, that are *ignored* by both Manthan2 and BFSS (and by other tools like CADET [11]). This can lead to ignoring "better" Skolem function vectors in general. The user's criteria for desirability of Skolem functions may differ from one problem instance to another, and may be completely different from what is hard-coded in the innards of a tool like Manthan2/BFSS. For example, the new Skolem function vector considered above admits an AIG representation in which input-to-output shortest (resp. longest) path lengths are equal across all outputs. This may indeed be a desirable feature in some application where variability of output delays matters. However, there is currently no way to influence BFSS/Manthan2 to arrive at Skolem functions optimized per such criteria.
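All three vectors — the simplified Manthan2 and BFSS solutions and the vector missed by both tools — can be validated mechanically. Below is a brute-force sketch in our own encoding (the helper names are ours, and `phi` is re-encoded inline for self-containment); each candidate is checked against the Skolem condition on all eight request patterns.

```python
from itertools import product

def phi(r1, r2, r3, g1, g2, g3):
    # Arbiter spec: mutual exclusion, progress, and grant-only-if-requested.
    return ([g1, g2, g3].count(True) <= 1
            and ((not (r1 or r2 or r3)) or (g1 or g2 or g3))
            and (not g1 or r1) and (not g2 or r2) and (not g3 or r3))

manthan2 = lambda r1, r2, r3: (r1 and not r3, r2 and not r1 and not r3, r3)
bfss     = lambda r1, r2, r3: (r1 and not r2, r2, r3 and not r1 and not r2)
missed   = lambda r1, r2, r3: (r1 and (not r2 or not r3),
                               r2 and (not r1 or r3),
                               not r1 and not r2 and r3)

for psi in (manthan2, bfss, missed):
    for r in product([False, True], repeat=3):
        has_sol = any(phi(*r, *g) for g in product([False, True], repeat=3))
        # Skolem condition: the vector satisfies phi iff phi is realizable at r.
        assert phi(*r, *psi(*r)) == has_sol
```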

The above example also illustrates the important role played by logic optimization in obtaining efficient implementations of Skolem functions generated by state-of-the-art synthesis tools. However, using logic optimization as a postprocessor can only provide a better implementation of (semantically) already-chosen Skolem functions. Fortunately, more than five decades of research in logic optimization has resulted in mature (even commercial) tools that can do much more than just implementation optimization. Specifically, don't-care based optimizations [28] can search within a specified space of (semantically distinct) functions to choose one that is optimized according to given user criteria. Such a choice involves a combined optimization across semantic and implementation choices. Given this capability of logic optimizers, and their indispensable use in synthesis flows, we posit that logic optimizers are the right engines to choose between alternative semantic choices of Skolem functions, in addition to optimizing their implementation. Of course, this requires specifying the semantic space of all Skolem functions in a form that can be easily processed by logic optimizers. State-of-the-art logic optimizers already allow specifying a family of functions using *on-sets* and *don't-care sets* [29]. Therefore, we propose to use this representation for the space of Skolem functions as well.

Before presenting the details of on-sets and don't-care sets for Skolem functions in our example, we note that Skolem functions for different outputs cannot, in general, be chosen independently. For example, ψ₃ ≡ r₃ is generated by Manthan2, and ψ₂ ≡ r₂ is generated by BFSS. However, there is no Skolem function vector with ψ₂ ≡ r₂ and ψ₃ ≡ r₃, since this would lead to g₂ = g₃ = 1 when r₂ = r₃ = 1. Therefore, any representation of the semantic space of all Skolem function vectors *must necessarily* take into account dependence between Skolem functions for different outputs. One way to achieve this is to impose a linear order on the outputs, and to represent the set of Skolem functions for an output in terms of Skolem functions for preceding (in the order) outputs. With this approach, the semantic space of Skolem functions for each output can be expressed by two functions: one representing the set of assignments for which every Skolem function in the represented space must evaluate to 1 (i.e. the on-set), and the other representing assignments for which it is ok for a Skolem function to evaluate to either 0 or 1 (i.e. the don't-care set).
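The failure of this particular combination is easy to confirm by brute force. A sketch with our own hypothetical names: taking ψ₂ = r₂ together with ψ₃ = r₃ grants both g₂ and g₃ when r₂ = r₃ = 1, violating mutual exclusion regardless of how ψ₁ is chosen.

```python
def at_most_one(g1, g2, g3):
    # Mutual exclusion property phi1 of the arbiter spec.
    return [g1, g2, g3].count(True) <= 1

r1, r2, r3 = False, True, True
g2, g3 = r2, r3                     # psi2 = r2 (BFSS), psi3 = r3 (Manthan2)
# No choice of g1 can repair the violation at this input.
assert not any(at_most_one(g1, g2, g3) for g1 in (False, True))
```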

The above representation is analogous to representing vector spaces using a small set of mutually orthogonal basis vectors, where every vector in the space can be expressed as a linear combination of these basis vectors. In a similar manner, let A denote the on-set of a family of Skolem functions, and B denote the don't-care set for the same family. Let GenImpl(B) denote the set of all *generalized implicants* of B, i.e. all formulas ν such that ν ⇒ B. Every Skolem function in the represented space can then be obtained (modulo semantic equivalence) as A ∨ ν, where ν ∈ GenImpl(B). Specifically, for our example, with the ordering g₁ ≺ g₂ ≺ g₃ of outputs (same as that given to BFSS), we have A₁ ≡ ¬r₃ ∧ ¬r₂ ∧ r₁, B₁ ≡ (r₃ ∨ r₂) ∧ r₁, A₂ ≡ ¬r₃ ∧ r₂ ∧ ¬g₁, B₂ ≡ r₃ ∧ r₂ ∧ ¬g₁, A₃ ≡ r₃ ∧ ¬g₂ ∧ ¬g₁, and B₃ = 0. The Karnaugh maps shown below depict how the space of all Skolem function vectors can be visualized in terms of the Aᵢ and Bᵢ. To obtain a specific Skolem function vector, we must place a 1 in each Aᵢ-cell, choose a subset of the Bᵢ-cells and place 1's in those cells and 0's in the remaining Bᵢ-cells. Each such choice provides a semantically distinct Skolem function vector, and every Skolem function vector corresponds to one such choice.
Specifically, the Skolem function vector missed by Manthan2/BFSS can now be easily obtained by choosing the red and blue B₁ cells and the teal B₂ cell to be 1 in the Karnaugh maps. Similarly, Manthan2's solution is obtained by choosing the blue B₁ cell and teal B₂ cell to be 1, and BFSS' solution is obtained by choosing the red B₁ cell and teal B₂ cell to be 1. Allowing a logic optimizer to optimize Skolem functions within the spaces represented by (A₁, B₁, A₂, B₂, A₃, B₃) therefore makes it possible to synthesize each of these Skolem function vectors. This motivates compiling a given specification into an (Aᵢ, Bᵢ) pair for the Skolem functions of each output yᵢ.
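The claim that (A₁, B₁) captures exactly the admissible choices for g₁ can itself be validated exhaustively. Since the arbiter specification is realizable for every request pattern, a function f(r₁, r₂, r₃) is the g₁-component of some Skolem function vector iff, at every input, setting g₁ = f allows some completion of g₂, g₃; and this should coincide with A₁ ⇒ f ⇒ A₁ ∨ B₁. The brute-force sketch below over all 256 candidate functions is our own code, not the paper's.

```python
from itertools import product

def phi(r1, r2, r3, g1, g2, g3):
    # Arbiter spec of this section, re-encoded inline.
    return ([g1, g2, g3].count(True) <= 1
            and ((not (r1 or r2 or r3)) or (g1 or g2 or g3))
            and (not g1 or r1) and (not g2 or r2) and (not g3 or r3))

A1 = lambda r1, r2, r3: not r3 and not r2 and r1   # on-set for g1
B1 = lambda r1, r2, r3: (r3 or r2) and r1          # don't-care set for g1

R = list(product([False, True], repeat=3))
for bits in product([False, True], repeat=8):      # all 256 functions of r1,r2,r3
    f = dict(zip(R, bits))
    extendable = all(any(phi(*r, f[r], g2, g3)
                         for g2, g3 in product([False, True], repeat=2))
                     for r in R)
    in_basis = all((not A1(*r) or f[r]) and (not f[r] or A1(*r) or B1(*r))
                   for r in R)
    assert extendable == in_basis
```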

### 3 Preliminaries and Notation

Let *Z* = (z₁,...,zₙ) be a vector of Boolean variables. A *literal* is a variable (zᵢ) or its complement (¬zᵢ), a *clause* is a disjunction of literals, and a *cube* is a conjunction of literals. For 1 ≤ i ≤ j ≤ n, we use Z_i^j to denote the *slice* (zᵢ,...,z_j) of the vector *Z*. An n-input Boolean function is a mapping from {0, 1}ⁿ to {0, 1}. A Boolean formula ϕ(*Z*) is a syntactic object whose semantics is given by a mapping from {0, 1}ⁿ to {0, 1}. Thus, every Boolean formula represents a unique Boolean function, and every Boolean function can be represented by a (not necessarily unique) Boolean formula. Henceforth, we refer to Boolean formulas and Boolean functions interchangeably.

The *support* of ϕ(*Z*), denoted sup(ϕ), is the set of variables in *Z*. For ease of exposition, we will abuse notation and use *Z* to denote either a vector or the underlying set of elements, depending on the context. A complete (resp. partial) *assignment* π for *Z* is a complete (resp. partial) mapping from *Z* to {0, 1}. The value of variable zᵢ assigned by π is denoted π[zᵢ]. A complete assignment π of *Z* is a *satisfying assignment* for ϕ(*Z*) if the Boolean function represented by ϕ evaluates to 1 when all variables in sup(ϕ) are assigned values given by π. In this case, we say that π |= ϕ. A formula ϕ(*Z*) is *satisfiable* if it has at least one satisfying assignment; otherwise it is *unsatisfiable*. We say that two formulas on n variables are *equivalent* if they represent the same semantic mapping from {0, 1}ⁿ to {0, 1}. Given Boolean formulas ϕ and α with zᵢ ∈ sup(ϕ), we use ϕ[zᵢ → α] to denote the formula obtained by substituting α for every occurrence of zᵢ in ϕ. We use ϕ|_{zᵢ=1} (resp. ϕ|_{zᵢ=0}) to denote the formula obtained by setting zᵢ to 1 (resp. 0) in the formula ϕ(*Z*). The resulting formulas are called the positive (resp. negative) co-factors of ϕ w.r.t. zᵢ. For notational convenience, we use ϕ|_π to denote the formula obtained by repeatedly co-factoring ϕ using the (possibly partial) assignment of variables given by π. As discussed in Sect. 2, we say that a function ϕ′(*Z*) is a *generalized implicant* of ϕ(*Z*) if ϕ′(*Z*) ⇒ ϕ(*Z*). This generalizes the notion of implicants used in the literature, which are restricted to be cubes.
The set of all generalized implicants of ϕ is denoted GenImpl(ϕ).

A Boolean formula ϕ(*Z*) can be represented as a circuit or a Directed Acyclic Graph (DAG) consisting of ¬, ∧ and ∨ gates, with literals at the leaves. Further, it can be converted to a semantically equivalent formula in *Negation Normal Form (NNF)*, i.e., with no ¬-labelled internal nodes, in time linear in the size of the circuit. We consider formulas to be given in NNF unless mentioned otherwise, and interchangeably refer to a Boolean formula and the circuit representing it. If an NNF formula in *Conjunctive Normal Form (CNF)*, i.e., a conjunction of clauses, is unsatisfiable, then there is a subset of its clauses whose conjunction is unsatisfiable. This set is called its *unsatisfiable core*, and a *minimal unsatisfiable core* is one without any proper subset that is also an unsatisfiable core.

The *Boolean functional synthesis* problem, and the notions of *Skolem functions* and *Skolem function vectors*, have already been defined in Sect. 1. Let ϕ(*X*, *Y*) be a Boolean relational specification over inputs *X* and outputs *Y*. A commonly used approach, adopted by several Boolean functional synthesis algorithms [6,14–16], works as follows. Without loss of generality, let y₁ ≺ ··· ≺ yₙ be a linear ordering of the outputs in *Y*. We first define a set of derived specifications ϕ⁽ⁱ⁾(*X*, Y_i^n) for all i ∈ {1,...,n}, where ϕ⁽ⁱ⁾ ⇔ ∃Y_1^{i−1} ϕ(*X*, *Y*). Next, for each i ∈ {1,...,n}, we find a Skolem function for yᵢ from the derived specification ϕ⁽ⁱ⁾(*X*, Y_i^n), by treating yᵢ as the sole output and all of *X*, Y_{i+1}^n as inputs in ϕ⁽ⁱ⁾. Let ψᵢ(*X*, Y_{i+1}^n) denote the Skolem function for yᵢ thus obtained. Finally, we substitute the Skolem functions ψᵢ₊₁,...,ψₙ for yᵢ₊₁,...,yₙ respectively in the Skolem function ψᵢ obtained above. This gives a Skolem function for yᵢ only in terms of *X*. By repeating the above process for all i in decreasing order from n − 1 to 1, we obtain a Skolem function vector for ϕ.
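The pipeline above can be prototyped directly with truth-table semantics. The sketch below is our own exponential-time illustration, not any of the cited tools: `synthesize` is a hypothetical helper that derives ϕ⁽ⁱ⁾ by enumeration, uses the classic self-substitution choice ψᵢ = ϕ⁽ⁱ⁾[yᵢ → 1] for each single-output problem, and substitutes backwards so that every output becomes a function of *X* alone.

```python
from itertools import product

def synthesize(phi, n_out):
    """phi: (X tuple, Y tuple) -> bool. Returns [psi_1, ..., psi_n], each X -> bool."""
    def derived(i):
        # phi^(i)(X, y_i..y_n): existentially quantify y_1..y_{i-1} by enumeration.
        return lambda X, Ysuf: any(phi(X, Ypre + Ysuf)
                                   for Ypre in product([False, True], repeat=i - 1))
    closed = {}                        # closed[i]: X -> value of y_i
    for i in range(n_out, 0, -1):      # build from i = n down to 1
        def psi(X, i=i, d=derived(i)):
            # Substitute the already-closed Skolem functions for y_{i+1}..y_n,
            # then apply the self-substitution choice psi_i = phi^(i)[y_i -> 1].
            later = tuple(closed[j](X) for j in range(i + 1, n_out + 1))
            return d(X, (True,) + later)
        closed[i] = psi
    return [closed[i] for i in range(1, n_out + 1)]

def arbiter(X, Y):
    (r1, r2, r3), (g1, g2, g3) = X, Y
    return ([g1, g2, g3].count(True) <= 1
            and ((not (r1 or r2 or r3)) or (g1 or g2 or g3))
            and (not g1 or r1) and (not g2 or r2) and (not g3 or r3))

psis = synthesize(arbiter, 3)
for X in product([False, True], repeat=3):
    Y = tuple(p(X) for p in psis)
    assert arbiter(X, Y) == any(arbiter(X, Yc)
                                for Yc in product([False, True], repeat=3))
```

On the arbiter of Sect. 2 this particular choice yields a g₃-priority vector; a different choice of ψᵢ within each single-output problem would yield a different (equally valid) vector, which is exactly the semantic freedom discussed in this paper.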

### 4 A New Knowledge Representation for Skolem Functions

We start with a key definition that is motivated by the desire to represent the entire space of Skolem functions arising from a specification compactly, and in a form that is easily amenable to well-established logic synthesis and optimization workflows. Recall from Sect. 2 that for a multi-output specification, Skolem functions for different outputs may be dependent on each other. Hence, the set of Skolem function vectors cannot be expressed as a Cartesian product of sets of Skolem functions for individual outputs. Instead, we impose a linear order on the outputs, and express the Skolem function for one output in terms of the inputs and other outputs that precede it in the order. Such a linear order may be automatically generated, user-provided, or even generated with guidance from the user, e.g., if the user provides a partial order on the outputs. We assume the availability of such an order ≺ in the definition below.

Definition 1. *Let* ϕ(*X*, *Y*) *be a specification over a linearly ordered set of outputs* *Y* = {y₁,...,yₙ}*. We say that output* yᵢ *has a Skolem basis in* ϕ *if there exists a pair of functions* (Aᵢ, Bᵢ) *over* *X* ∪ Y_{i+1}^n *such that a function* f *over* *X* ∪ Y_{i+1}^n *is a Skolem function for* yᵢ *in* ϕ⁽ⁱ⁾ *iff* Aᵢ ⇒ f *and* f ⇒ Aᵢ ∨ Bᵢ *(equivalently, iff* f ⇔ Aᵢ ∨ ν *for some* ν ∈ GenImpl(Bᵢ)*).*


*We call the vector of pairs* (Aᵢ, Bᵢ)_{1≤i≤n} *the Skolem basis vector for* ϕ *wrt* ≺*.*

The Skolem basis vector can be seen as a succinct representation of the Skolem function space, i.e., the set of all Skolem function vectors of ϕ. A natural question that arises at this point is: *Given a specification* ϕ *and order* ≺ *of outputs, does there always exist a Skolem basis for* ϕ *wrt* ≺? Fortunately, as we show in this paper, the answer is a resounding "Yes". Not only that, the Skolem basis for a given ϕ and ≺ is unique up to semantic equivalence of the basis functions. It is important to note that not every set of functions can be represented using just two basis functions. This is easy to see via a counting argument: the number of sets of Boolean functions over m inputs is 2^{2^{2^m}}. However, the number of sets that admit a Skolem basis is (loosely) upper bounded by 2^{2·2^m}, the number of choices of the pair (A, B). Skolem functions are therefore special, since we show that the space of all Skolem functions for every output in every specification always admits representation by two basis functions, regardless of the order ≺. Interestingly, though the definition of the Skolem basis vector needs us to specify an order ≺ on the outputs, somewhat surprisingly, the Skolem function space itself does not depend on the order.

Proposition 1. *Suppose* *Ψ* *is a Skolem function vector for the outputs* *Y* *in terms of inputs* *X* *in* ϕ*. Then, for any order* ≺*,* *Ψ* *can be generated by using the Skolem basis vector of* ϕ *wrt* ≺*, and then substituting, for each* i ∈ {1,...,n}*, the Skolem functions* ψⱼ *for* yⱼ*, where* i < j ≤ n*, in the Skolem function for* yᵢ*.*

*Proof Sketch:* With the ordering y₁ ≺ y₂ ≺ ... ≺ yₙ, let (Aᵢ, Bᵢ) be the corresponding Skolem basis vector. The supports of Aₙ, Bₙ are only the inputs *X*, while the supports of Aᵢ, Bᵢ (for i < n) are *X* ∪ {yᵢ₊₁,...,yₙ}. Let *Ψ* = (ψ₁,...,ψₙ) be an arbitrary Skolem function vector, where each ψᵢ is a function of *X*. By definition of Skolem basis, since ψₙ is a Skolem function for yₙ, it can be obtained from Aₙ and Bₙ (each of which has support *X*). Now consider ψᵢ for 1 ≤ i < n. By definition of Skolem basis, every Skolem function for yᵢ in terms of *X* ∪ {yᵢ₊₁,...,yₙ} can be obtained from Aᵢ and Bᵢ. In particular, if we set yᵢ₊₁ to ψᵢ₊₁ and so on until yₙ to ψₙ, every Skolem function for yᵢ in terms of *X* can be obtained from Aᵢ and Bᵢ.

Another interesting property of the Skolem basis vector is that, when it exists, it is unique. Later, we will show (constructively) that it always exists, and hence we will also have constructed the unique one.

Proposition 2. *For any* yᵢ *in* ϕ*, its Skolem basis, when it exists, is unique.*

*Proof.* Fix i. Let S be the set of all Skolem functions for yᵢ in ϕ⁽ⁱ⁾. From Definition 1, we know that for all f ∈ S, Aᵢ ⇒ f. Hence, Aᵢ ⇒ ⋀_{f∈S} f. However, we also know that Aᵢ ∈ S (this corresponds to choosing the generalized implicant 0 from GenImpl(Bᵢ)). Therefore, ⋀_{f∈S} f ⇒ Aᵢ. It follows from the two implications that Aᵢ ⇔ ⋀_{f∈S} f.

In a similar manner, Definition 1 implies that for all f ∈ S, f ⇒ Aᵢ ∨ Bᵢ. Hence, ⋁_{f∈S} f ⇒ Aᵢ ∨ Bᵢ. However, we know that Aᵢ ∨ Bᵢ ∈ S (this corresponds to choosing the generalized implicant Bᵢ from GenImpl(Bᵢ)). Therefore, Aᵢ ∨ Bᵢ ⇒ ⋁_{f∈S} f. It follows from the two implications that Aᵢ ∨ Bᵢ ⇔ ⋁_{f∈S} f.

Finally, we explain how our new representation of Skolem functions using a Skolem basis vector naturally lends itself to easy processing by downstream logic synthesis and optimization tools. Thus, a Skolem basis vector is not just an arbitrary way to represent the space of all Skolem function vectors; instead, it is strongly motivated by the way modern logic synthesis and optimization tools work to search the semantic space of partially specified functions (i.e. functions specified with on-sets and don't-care sets). Specifically, in logic synthesis and optimization parlance [29], Aᵢ is the *on-set* and Bᵢ is the *don't-care set* for Skolem functions for yᵢ in ϕ. In other words, Aᵢ describes all assignments for which every Skolem function for yᵢ must evaluate to 1, while Bᵢ describes those assignments on which a Skolem function can evaluate to either 1 or 0 without violating the requirement of being a Skolem function for yᵢ in ϕ. Thus, every semantically distinct Skolem function for yᵢ in ϕ can be obtained by choosing a distinct subset of satisfying assignments of Bᵢ and requiring the Skolem function to evaluate to 1 on this subset of assignments, in addition to those determined by Aᵢ. Indeed, state-of-the-art logic synthesis and optimization tools (such as abc [27]) use on-sets and don't-care sets expressed as Boolean functions to represent the space of all realizations of a partially specified function. The don't-cares are then used to optimize the semantic and implementation choices when choosing the optimal realization of such a partially specified function, as per user-provided criteria like area, gate count, delay, power consumption, balance of delays across paths, etc.
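As a tiny illustration on the arbiter example of Sect. 2 (our own sketch, not output of any tool): for the second output, the on-set A₂ ≡ ¬r₃ ∧ r₂ ∧ ¬g₁ alone needs three literals, but absorbing the entire don't-care set B₂ ≡ r₃ ∧ r₂ ∧ ¬g₁ lets an optimizer collapse the function to the two-literal r₂ ∧ ¬g₁ — exactly the kind of combined semantic/implementation choice described above.

```python
from itertools import product

A2 = lambda r2, r3, g1: not r3 and r2 and not g1   # on-set (3 literals)
B2 = lambda r2, r3, g1: r3 and r2 and not g1       # don't-care set
simplified = lambda r2, r3, g1: r2 and not g1      # A2 \/ B2 (2 literals)

# Legality: the simplified function lies in the interval [A2, A2 \/ B2],
# i.e. it is 1 wherever A2 is 1, and only differs from A2 inside B2.
for v in product([False, True], repeat=3):
    assert simplified(*v) == (A2(*v) or B2(*v))
    assert not A2(*v) or simplified(*v)
```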
Indeed, the following guarantee follows rather trivially from Proposition 1.

Proposition 3. *Suppose we have access to a logic optimization tool that finds the optimal semantic and implementation choice of a partially specified function as per user criteria. Using this tool on the Skolem basis vector of* ϕ *wrt* ≺ *yields the optimal choice among all Skolem functions, where optimality of the Skolem function for* yᵢ *is conditioned on the choice of Skolem functions for* yⱼ*, for* 1 ≤ j < i*.*

Having defined and motivated the Skolem basis vector as our new knowledge representation, in the rest of the paper we will show how it can actually be computed, *in theory and in practice*.

### 5 Towards Synthesizing the Skolem Basis Vector

*The Single Output Case:* First, we consider the case of a single output and show that, here, the existence of the Skolem basis is easy to establish, and the basis is also easy to compute.

Theorem 1. *For a single-output specification* ϕ(*X*, y)*, the Skolem basis for* y *in* ϕ *is given by* A ≡ ϕ(*X*, 1) ∧ ¬ϕ(*X*, 0) *and* B ≡ ϕ(*X*, 1) ↔ ϕ(*X*, 0)*. Thus, in this case, the Skolem basis vector for* ϕ *can be computed in time/space linear in the size of the circuit representing* ϕ*.*

*Proof.* Let 2^|X| denote the set of all complete assignments π of *X*. Define S₁ = {π ∈ 2^|X| | π |= ϕ(*X*, 1)} and S₀ = {π ∈ 2^|X| | π |= ϕ(*X*, 0)}. By the definition of S₀ and S₁ (with S̄ᵢ denoting the complement of the set Sᵢ), we have:

– $\pi \in S_1 \cup S_0$ iff $\pi \models \exists y\, \varphi(X, y)$.


Now let $\psi(X)$ be an arbitrary Skolem function for $y$ in $\varphi(X, y)$. Recall that by definition a Skolem function satisfies $\forall X\, \big(\exists y\, \varphi(X, y) \Leftrightarrow \varphi(X, \psi(X))\big)$. It then follows from the above observations that if $\pi \in S_1 \cap \overline{S_0}$, then $\psi(\pi)$ must evaluate to 1. Similarly, if $\pi \in (S_1 \cap S_0) \cup (\overline{S_1} \cap \overline{S_0})$, it makes no difference whether $\psi(\pi)$ evaluates to 0 or 1. Finally, if $\pi \in S_0 \cap \overline{S_1}$, then $\psi(\pi)$ must evaluate to 0. Since $\psi$ was an arbitrary Skolem function for $y$ in $\varphi$, we infer that the Skolem basis for $\mathsf{AllSk}(\varphi)$ is $(A, B)$, where $A \equiv \varphi(X, 1) \wedge \neg\varphi(X, 0)$ represents the set $S_1 \cap \overline{S_0}$, and $B \equiv \big(\varphi(X, 0) \Leftrightarrow \varphi(X, 1)\big)$ represents the set $(S_1 \cap S_0) \cup (\overline{S_1} \cap \overline{S_0})$.
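
To make the construction of Theorem 1 concrete, the following brute-force sketch (our own illustration, not the authors' implementation) computes $(A, B)$ as explicit sets of input assignments and checks that every completion of the don't-care set yields a Skolem function; the example specification is hypothetical.

```python
# Brute-force sketch of Theorem 1: A = phi(X,1) & ~phi(X,0) (on-set),
# B = phi(X,1) <-> phi(X,0) (don't-care set).  The truth-table
# representation and the example spec are ours, for illustration only.
from itertools import product

def skolem_basis(phi, n):
    """phi(bits, y) -> bool; returns (A, B) as sets of n-bit input tuples."""
    A, B = set(), set()
    for bits in product([0, 1], repeat=n):
        p1, p0 = bool(phi(bits, 1)), bool(phi(bits, 0))
        if p1 and not p0:
            A.add(bits)      # every Skolem function must evaluate to 1 here
        if p1 == p0:
            B.add(bits)      # a Skolem function may evaluate to 0 or 1 here
    return A, B

# Hypothetical single-output spec: phi = (x1 & y) | (x2 & ~y)
phi = lambda bits, y: (bits[0] and y) or (bits[1] and not y)
A, B = skolem_basis(phi, 2)

# Every function that is 1 on A, 0 outside A u B, and arbitrary on B
# is a Skolem function for y in phi:
for dc in product([0, 1], repeat=len(B)):
    val = dict(zip(sorted(B), dc))
    psi = lambda bits: 1 if bits in A else val.get(bits, 0)
    for bits in product([0, 1], repeat=2):
        assert bool(phi(bits, psi(bits))) == bool(phi(bits, 0) or phi(bits, 1))
```

Here the inner loop verifies, for each of the $2^{|B|}$ don't-care completions, the defining equation $\varphi(X, \psi(X)) \Leftrightarrow \exists y\, \varphi(X, y)$.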

We next consider the multiple-output case, where our strategy (as is usual for Skolem *function* synthesis) is to reduce to the single-output case above.

*Multiple Outputs and Existential Quantification:* When we have multiple outputs, it follows from the definition of the Skolem basis vector (Definition 1) that the problem reduces to the single-output case if we can compute the derived specifications $\varphi^{(i)}(X, Y^{n}_{i})$. Unfortunately, computing $\varphi^{(i)}(X, Y^{n}_{i})$ cannot always be done efficiently, even when $\varphi(X, Y)$ and the order $\prec$ on $Y$ are given. We compute $\varphi^{(i)}$ from a given $\varphi^{(i-1)}$, where the variable $y_i$ to be quantified is either chosen on-the-fly (giving a dynamic computation of $\prec$) or determined as per a statically provided order. Since $\varphi^{(i+1)} \Leftrightarrow \exists Y^{i}_{1}\, \varphi \Leftrightarrow \exists y_i\, \varphi^{(i)}$ for all $i \in \{1, \ldots, n-1\}$, we first consider how a single output variable can be quantified from a derived specification.

The conceptually simplest way to compute $\exists y_i\, \varphi^{(i)}$ is as $\varphi^{(i)}|_{y_i=1} \vee \varphi^{(i)}|_{y_i=0}$. Unfortunately, this doubles the size of the circuit representation. An alternative is to find a Skolem function, say $\psi_i$, for $y_i$ in $\varphi^{(i)}$, and then use $\varphi^{(i)}[y_i \mapsto \psi_i]$. This works well when $\psi_i$ can be represented compactly. However, an NNF representation of $\psi_i$ can be as large as that of $\varphi^{(i)}$ (e.g., if $\psi_i \equiv \varphi^{(i)}|_{y_i=1}$), in which case we may double the circuit size. We therefore ask *whether it is possible to compute* $\exists y_i\, \varphi^{(i)}$ *by simply substituting a constant (not necessarily a Skolem function) for* $y_i$ *in an NNF formula of almost the same size as* $\varphi^{(i)}$. It turns out that this is possible in two practically relevant cases. In other cases, we transform the circuit to permit such constant substitutions. For notational convenience, in the rest of this section, we omit $i$ and write $y$ and $\varphi$ for $y_i$ and $\varphi^{(i)}$.

*The Case of Unates:* A variable $y$ is *positive (resp. negative) unate* in $\varphi$ if $\varphi|_{y=0} \Rightarrow \varphi|_{y=1}$ (resp. $\varphi|_{y=1} \Rightarrow \varphi|_{y=0}$). A variable is *unate* in $\varphi$ if it is either positive or negative unate in $\varphi$. The following is then easily proved.

Lemma 1. *If* $y$ *is positive unate in* $\varphi$*, then* $\exists y\, \varphi \Leftrightarrow \varphi|_{y=1}$*. Similarly, if* $y$ *is negative unate in* $\varphi$*, then* $\exists y\, \varphi \Leftrightarrow \varphi|_{y=0}$*.*

*Proof.* The proof follows immediately from the definition of positive and negative unateness, and from the fact that $\exists y\, \varphi \Leftrightarrow \varphi|_{y=0} \vee \varphi|_{y=1}$.

As an example, consider $\varphi \equiv (x \wedge (y_1 \vee y_2)) \vee (\neg x \wedge \neg y_2)$. Here, $y_1$ is positive unate in $\varphi$, but $y_2$ is not unate in $\varphi$. However, $y_2$ is negative unate in $\varphi|_{y_1=1}$, which by Lemma 1 is equivalent to $\exists y_1\, \varphi$. This shows that even if a variable is not unate to begin with, it may become unate after some variables are quantified. If we use the order $y_1 \prec y_2$ in our example, both $\exists y_1\, \varphi$ and $\exists y_1 \exists y_2\, \varphi$ can be computed by substituting constants for $y_1$ and $y_2$ in $\varphi$. This is, however, not true for $y_2 \prec y_1$.
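
The unateness test and Lemma 1 can be exercised on this example with a small brute-force sketch (ours, purely illustrative; a real implementation would discharge the implication check with a SAT solver rather than enumeration):

```python
# Sketch: checking positive unateness semantically and quantifying by
# constant substitution (Lemma 1).  Enumeration stands in for SAT.
from itertools import product

def is_positive_unate(phi, n, i):
    """True iff phi|_{v_i=0} => phi|_{v_i=1} for the i-th argument of phi."""
    def subst(bits, b):
        t = list(bits); t[i] = b; return tuple(t)
    return all(not phi(subst(bits, 0)) or phi(subst(bits, 1))
               for bits in product([0, 1], repeat=n))

# The example from the text: phi = (x & (y1 | y2)) | (~x & ~y2),
# with argument order (x, y1, y2).
phi = lambda v: (v[0] and (v[1] or v[2])) or ((not v[0]) and (not v[2]))

assert is_positive_unate(phi, 3, 1)           # y1 is positive unate in phi
# Lemma 1: exists y1. phi  <=>  phi|_{y1=1}, checked by enumeration:
phi1 = lambda v: phi((v[0], 1, v[1]))         # a formula over (x, y2)
for x, y2 in product([0, 1], repeat=2):
    assert bool(phi((x, 0, y2)) or phi((x, 1, y2))) == bool(phi1((x, y2)))
# y2 is negative unate in phi|_{y1=1}, so exists y2 substitutes y2 = 0:
phi2 = lambda v: phi1((v[0], 0))
```

Consistent with the text, `phi2` (i.e., $\exists y_1 \exists y_2\, \varphi$) is obtained purely by constant substitution under the order $y_1 \prec y_2$.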

Fig. 2. NNF circuit representations of formulas $\varphi_1$, $\varphi_1^{+}$, $\varphi_2$, $\varphi_3$.

In general, given a specification $\varphi(X, Y)$ and a linear ordering $\prec$ of outputs, if each output $y_i$ is unate in the derived specification $\varphi^{(i)} \equiv \exists Y^{i-1}_{1}\, \varphi$, then we can apply Lemma 1, Definition 1 and Theorem 1 to synthesize the entire Skolem basis vector for $\varphi$ w.r.t. $\prec$ efficiently. This also suggests a heuristic for finding a (partial) order on the outputs $Y$. Specifically, given a derived specification $\varphi^{(i)}$, we try to find an output variable $y$ in its support such that $y$ is unate in $\varphi^{(i)}$. If such a variable exists, we use it as the next variable in the $\prec$ order, and obtain $\varphi^{(i+1)}$ by using Lemma 1 to compute $\exists y\, \varphi^{(i)}$. As our experiments show (see Sect. 7), and as has also been observed elsewhere [14], this approach is surprisingly effective for finding Skolem functions for many benchmarks.

*The Case of No Conflicts:* Next, we consider another case where quantification can be achieved by substituting constants for variables.

Definition 2. *Let* $\varphi$ *be an NNF formula and* $y \in \mathsf{sup}(\varphi)$*. Suppose we replace every occurrence of* $\neg y$ *in* $\varphi$ *by a fresh variable* $\widehat{y}$ *(*$\widehat{y} \notin \mathsf{sup}(\varphi)$*). The resulting formula is called the* $y$-positive form of $\varphi$ *and is denoted* $\varphi^{+y}$*. The variable* $y$ *is said to be* in conflict *in* $\varphi$ *if there exists an assignment* $\pi : \mathsf{sup}(\varphi) \setminus \{y\} \to \{0, 1\}$ *such that* $\varphi^{+y}|_{\pi} \Leftrightarrow y \wedge \widehat{y}$*. Otherwise, we say that* $y$ *is* conflict-free *in* $\varphi$*.*

The assignment $\pi$ in the above definition is called a *counterexample to conflict-freeness of* $y$ *in* $\varphi$. It is easy to see that both $y$ and $\widehat{y}$ are positive unate in $\varphi^{+y}$. Henceforth, we use $\varphi^{+}$ instead of $\varphi^{+y}$ when $y$ is clear from the context.

We illustrate conflicts and conflict-freeness in Fig. 2. The $y$-positive form of $\varphi_1$ is shown as $\varphi_1^{+}$, where $\widehat{y}$ is a fresh variable. Clearly, $y$ is in conflict in $\varphi_1$ since $\varphi_1^{+}|_{\pi} \Leftrightarrow y \wedge \widehat{y}$ for $\pi : x_1 \mapsto 0, x_2 \mapsto 0$. Similarly, $y$ is in conflict in $\varphi_2$ (as seen with $\pi : x_1 \mapsto 0, x_2 \mapsto 0$). However, $y$ is not in conflict in $\varphi_3$, as there is no assignment $\pi$ of $x_1, x_2$ for which $\varphi_3^{+}|_{\pi} \Leftrightarrow y \wedge \widehat{y}$.

Lemma 2. *If* $y$ *is conflict-free in* $\varphi$*, then* $\exists y\, \varphi \Leftrightarrow \varphi^{+}|_{y=1,\widehat{y}=1}$*.*

*Proof.* Since $y$ is conflict-free in $\varphi$, it follows that $\varphi^{+}|_{y=1,\widehat{y}=1} \Rightarrow \big(\varphi^{+}|_{y=1,\widehat{y}=0} \vee \varphi^{+}|_{y=0,\widehat{y}=1}\big)$. Since all internal nodes in $\varphi^{+}$ are labeled by either $\wedge$ or $\vee$, it also follows that $y$ and $\widehat{y}$ are positive unate in $\varphi^{+}$. Therefore, $\big(\varphi^{+}|_{y=1,\widehat{y}=0} \vee \varphi^{+}|_{y=0,\widehat{y}=1}\big) \Rightarrow \varphi^{+}|_{y=1,\widehat{y}=1}$. The proof is completed by observing that, by definition, $\exists y\, \varphi \Leftrightarrow \big(\varphi|_{y=0} \vee \varphi|_{y=1}\big) \Leftrightarrow \big(\varphi^{+}|_{y=0,\widehat{y}=1} \vee \varphi^{+}|_{y=1,\widehat{y}=0}\big)$.

A notion similar to conflict as defined above was used in [13,20] for defining normal forms for synthesis. The difference is that, unlike in [13,20], we do not require a pre-specified subset of the support to be set to 1 in the assignment $\pi$. To identify conflicts, we define a *conflict formula* $\kappa_{\varphi,y} \equiv \varphi^{+}|_{y=1,\widehat{y}=1} \wedge \neg\varphi^{+}|_{y=1,\widehat{y}=0} \wedge \neg\varphi^{+}|_{y=0,\widehat{y}=1}$. By Definition 2, $y$ is conflict-free in $\varphi$ iff $\kappa_{\varphi,y}$ is unsatisfiable.
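
As an illustration (ours; the small formula below is hypothetical and not one of the $\varphi_i$ from Fig. 2), the conflict formula $\kappa_{\varphi,y}$ can be evaluated by brute force:

```python
# Sketch: Definition 2 and the conflict formula kappa_{phi,y}, with yp
# standing in for the fresh variable that replaces ~y in the y-positive form.
from itertools import product

# Hypothetical NNF spec: phi = (x1 | y) & (~y | x2); its y-positive form:
phi_pos = lambda x1, x2, y, yp: (x1 or y) and (yp or x2)

def kappa(x1, x2):
    """phi+|_{y=1,yp=1} & ~phi+|_{y=1,yp=0} & ~phi+|_{y=0,yp=1}"""
    return bool(phi_pos(x1, x2, 1, 1)
                and not phi_pos(x1, x2, 1, 0)
                and not phi_pos(x1, x2, 0, 1))

# y is in conflict in phi iff kappa is satisfiable; here the only
# counterexample to conflict-freeness is x1 = 0, x2 = 0, where
# phi+ restricted to the assignment becomes exactly y & yp:
witnesses = [bits for bits in product([0, 1], repeat=2) if kappa(*bits)]
```

In practice the satisfiability check on $\kappa_{\varphi,y}$ is of course delegated to a SAT solver rather than enumerated.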

Proposition 4. *For* $1 \le i \le 4$*, there exist* $\varphi_i$ *with* $y_i \in \mathsf{sup}(\varphi_i)$ *s.t. (i)* $y_1$ *is neither unate nor conflict-free in* $\varphi_1$*, (ii)* $y_2$ *is unate but not conflict-free in* $\varphi_2$*, (iii)* $y_3$ *is conflict-free but not unate in* $\varphi_3$*, and (iv)* $y_4$ *is both unate and conflict-free in* $\varphi_4$*.*

The formulas $\varphi_1, \varphi_2, \varphi_3$ from Fig. 2 satisfy conditions (i), (ii) and (iii) respectively. For (iv), consider $\varphi_4 \equiv x \wedge y$, in which $y$ is both unate and conflict-free. Lemmas 1 and 2 and Proposition 4 show that unateness and conflict-freeness are independently useful; combining them, we directly obtain:

Theorem 2. *Given* $\varphi(X, Y)$ *and a linear order* $\prec$ *on* $Y$*, if* $y_i$ *is either unate or conflict-free in* $\varphi^{(i)}$ *for all* $i \in \{1, \ldots, n\}$*, then we can effectively synthesize the Skolem basis vector in time linear in the size of* $\varphi$*.*

We remark that the implications of Theorem 2 go beyond what can be achieved by earlier work on normal forms for synthesis [13,20]. Indeed, there are formulas that are neither in SynNNF nor SAUNF but for which Theorem 2 applies.

Finally, unateness is a semantic property; hence if $y$ is not unate in $\varphi$, it is not unate in any $\mu$ such that $\varphi \Leftrightarrow \mu$. Conflict-freeness, however, has a representational aspect. If $y$ is in conflict in $\varphi$, we can *always* find another NNF formula $\mu$ such that (i) $\mu \Leftrightarrow \varphi$, and (ii) $y$ is conflict-free in $\mu$. To see why, note that if $\mu \equiv (y \wedge \varphi|_{y=1}) \vee (\neg y \wedge \varphi|_{y=0})$, i.e., the Shannon expansion of $\varphi$ w.r.t. $y$, then $\mu \Leftrightarrow \varphi$ and $y$ is conflict-free in $\mu$. However, taking the Shannon expansion may not always be the best way to render an output conflict-free, as it often leads to a blow-up in the size of the expanded formula. In the next section, we give a counterexample-guided algorithm to obtain $\mu$ from $\varphi$ and $y$ that works much more efficiently than Shannon expansion in practice.
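
A quick sanity check of the Shannon-expansion claim on a hypothetical two-input specification (our illustration, with `yp` playing the role of $\widehat{y}$):

```python
# Sketch: the Shannon expansion mu = (y & phi|_{y=1}) | (~y & phi|_{y=0})
# is equivalent to phi and renders y conflict-free, at the cost of
# duplicating the circuit for phi.
from itertools import product

phi = lambda x1, x2, y: (x1 or y) and (not y or x2)   # y is in conflict here
mu  = lambda x1, x2, y: (y and phi(x1, x2, 1)) or (not y and phi(x1, x2, 0))
# y-positive form of mu, with yp replacing ~y:
mu_pos = lambda x1, x2, y, yp: (y and phi(x1, x2, 1)) or (yp and phi(x1, x2, 0))

# mu <=> phi on all assignments:
equiv = all(bool(phi(*b, y)) == bool(mu(*b, y))
            for b in product([0, 1], repeat=2) for y in (0, 1))
# No assignment reduces mu+ to y & yp (the conflict formula is unsat):
conflict_free = all(
    not (mu_pos(*b, 1, 1) and not mu_pos(*b, 1, 0) and not mu_pos(*b, 0, 1))
    for b in product([0, 1], repeat=2))
```

Both checks succeed, illustrating points (i) and (ii) above for this particular $\varphi$.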

### 6 Counterexample-Guided Rectification

Recall from the previous section that if $y$ is in conflict in $\varphi(X, Y)$, then there exists a counterexample (assignment) $\pi : X \cup Y \setminus \{y\} \to \{0, 1\}$ such that $\varphi^{+}|_{\pi} \Leftrightarrow y \wedge \widehat{y}$. In this section, we discuss how we can use such counterexamples to transform $\varphi(X, Y)$ into a specification $\mu(X, Y)$ such that $\mu \Leftrightarrow \varphi$ and $y$ is conflict-free in $\mu$. We call such a transformation a *rectification* of $\varphi$ w.r.t. $y$, and the resulting formula $\mu$ is said to be *rectified* w.r.t. $y$.

Lemma 3. *Let* $\pi$ *be a counterexample to conflict-freeness of* $y$ *in* $\varphi(X, Y)$ *and let* $\xi$ *be a formula satisfying (a)* $\mathsf{sup}(\xi) \subseteq X \cup Y \setminus \{y\}$*, (b)* $\varphi \Rightarrow \xi$*, and (c)* $\xi|_{\pi}$ *is unsatisfiable. Define* $\tau \equiv \varphi \wedge \xi$ *and let* $\tau^{+}$ *denote the positive form of* $\tau$ *w.r.t.* $y$*. Then the following hold: (i)* $\tau \Leftrightarrow \varphi$*, (ii)* $\pi$ *is not a counterexample to conflict-freeness of* $y$ *in* $\tau$*, and (iii) every counterexample to conflict-freeness of* $y$ *in* $\tau$ *is also a counterexample to conflict-freeness of* $y$ *in* $\varphi$*.*

*Proof.* Since $\varphi \Rightarrow \xi$, it follows that $\tau \Leftrightarrow \varphi \wedge \xi \Leftrightarrow \varphi$. This proves claim (i) of Lemma 3. Next, note that since $\pi$ is a counterexample to conflict-freeness of $y$ in $\varphi$, we must have $\varphi^{+}|_{\pi} \Leftrightarrow (y \wedge \widehat{y})$. Since $\xi$ does not have $y$ in its support, it follows that $\tau^{+} \Leftrightarrow \varphi^{+} \wedge \xi$. Therefore, $\tau^{+}|_{\pi} \Leftrightarrow \varphi^{+}|_{\pi} \wedge \xi|_{\pi} \Leftrightarrow (y \wedge \widehat{y}) \wedge \xi|_{\pi}$. However, from the premise of Lemma 3, we know that $\xi|_{\pi}$ is unsatisfiable. Hence $\tau^{+}|_{\pi}$ is false. In particular, $\tau^{+}|_{\pi} \not\Leftrightarrow (y \wedge \widehat{y})$, and hence $\pi$ is not a counterexample to conflict-freeness of $y$ in $\tau$. This proves claim (ii) of Lemma 3. Finally, let $\pi' : X \cup Y \setminus \{y\} \to \{0, 1\}$ be a counterexample to conflict-freeness of $y$ in $\tau$. By definition, $\tau^{+}|_{\pi'} \Leftrightarrow (y \wedge \widehat{y})$. However, $\tau^{+}|_{\pi'} \Leftrightarrow \varphi^{+}|_{\pi'} \wedge \xi|_{\pi'}$. Since all variables in the support of $\xi$ are assigned by $\pi'$, $\xi|_{\pi'}$ must be equivalent to either 0 or 1. If $\xi|_{\pi'}$ is 0, then $\tau^{+}|_{\pi'}$ must also be 0, contradicting $\tau^{+}|_{\pi'} \Leftrightarrow (y \wedge \widehat{y})$. Therefore, $\xi|_{\pi'}$ must be equivalent to 1, and hence $\varphi^{+}|_{\pi'} \Leftrightarrow (y \wedge \widehat{y})$ in order for $\tau^{+}|_{\pi'}$ to be equivalent to $(y \wedge \widehat{y})$. It follows that $\pi'$ must be a counterexample to conflict-freeness of $y$ in $\varphi$. This proves claim (iii) of Lemma 3.

Henceforth, we call a formula $\xi$ satisfying conditions (a), (b) and (c) of Lemma 3 a *partial rectifier* of $\varphi$ w.r.t. $y$. Given $\pi$, it is easy to find a partial rectifier.

Lemma 4. *For all* $v \in X \cup Y \setminus \{y\}$*, let* $\ell_{v,\pi}$ *denote* $v$ *if* $\pi[v] = 1$*, and* $\neg v$ *if* $\pi[v] = 0$*. Let* $\xi_{\pi}$ *be* $\neg\big(\bigwedge_{v \in X \cup Y \setminus \{y\}} \ell_{v,\pi}\big)$*. Then* $\xi_{\pi}$ *satisfies conditions (a), (b) and (c) of Lemma 3.*

The proof follows immediately from the observations: (i) $\pi$ is the only satisfying assignment of $\neg\xi_{\pi}$, and (ii) $\varphi|_{\pi} \Leftrightarrow \big(\varphi^{+}[\widehat{y} \mapsto \neg y]\big)|_{\pi} \Leftrightarrow (y \wedge \widehat{y})[\widehat{y} \mapsto \neg y] \Leftrightarrow 0$. Consequently, $\neg\xi_{\pi} \Rightarrow \neg\varphi$. Although Lemma 4 gives a partial rectifier, it prevents only the assignment $\pi$ from being a counterexample to conflict-freeness of $y$ in $\tau$. Later, we will see a partial rectifier that prevents many more assignments from being counterexamples. For the time being, however, we assume that we have access to a procedure PartialRectifier that takes as inputs $\varphi$ and $\pi$ and outputs a partial rectifier satisfying conditions (a), (b) and (c) of Lemma 3.

The above discussion suggests a simple algorithm, shown as Algorithm RectifyOneOutput below, for rectifying a specification ϕ w.r.t. an output y.

The algorithm first initializes a temporary formula $\mu$ to $\varphi$. It then invokes a propositional satisfiability (SAT) solver to obtain a satisfying assignment $\pi$ of the conflict formula $\kappa_{\mu,y}$ (defined in Sect. 5 just before Proposition 4). The assignment $\pi$ serves as a counterexample to conflict-freeness of $y$ in $\mu$, and is used to obtain a partial rectifier $\xi$ of $\mu$ w.r.t. $y$. The formula $\mu$ is then updated by conjoining it with $\xi$. Lemma 3 guarantees that this gives a specification semantically equivalent to $\varphi$, while removing $\pi$ from the set of counterexamples to conflict-freeness of $y$ in $\mu$. By repeating the process with the updated formula $\mu$, all counterexamples to conflict-freeness of $y$ in $\mu$ are eventually removed.
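
A minimal executable sketch of this loop (ours; brute-force enumeration replaces the SAT solver, the partial rectifier is the negated minterm of Lemma 4, and all names are our own):

```python
# Sketch of Algorithm RectifyOneOutput over a closure-based formula
# representation; yp plays the role of the fresh variable in mu+.
from itertools import product

def rectify_one_output(phi_pos, n):
    """phi_pos(bits, y, yp) over n non-y variables; returns rectified mu+."""
    mu_pos = phi_pos
    while True:
        # "SAT call" on the conflict formula kappa_{mu,y}, by enumeration:
        cex = next((bits for bits in product([0, 1], repeat=n)
                    if mu_pos(bits, 1, 1)
                    and not mu_pos(bits, 1, 0)
                    and not mu_pos(bits, 0, 1)), None)
        if cex is None:
            return mu_pos                    # y is now conflict-free in mu
        # Partial rectifier of Lemma 4: negation of the minterm of cex,
        # conjoined onto mu (Lemma 3 preserves equivalence with phi).
        mu_pos = (lambda f, pi: lambda bits, y, yp:
                  f(bits, y, yp) and bits != pi)(mu_pos, cex)

# Hypothetical spec phi = (x1 | y) & (~y | x2), given in y-positive form:
phi_pos = lambda bits, y, yp: (bits[0] or y) and (yp or bits[1])
rect = rectify_one_output(phi_pos, 2)

# Check: mu <=> phi (substituting yp = ~y), and y is conflict-free in mu:
for b in product([0, 1], repeat=2):
    for y in (0, 1):
        assert bool(rect(b, y, 1 - y)) == bool(phi_pos(b, y, 1 - y))
    assert not (rect(b, 1, 1) and not rect(b, 1, 0) and not rect(b, 0, 1))
```

Here a single iteration suffices; on larger specifications the loop runs once per counterexample that survives the chosen partial rectifier.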

Theorem 3. *Algorithm* RectifyOneOutput *always terminates with a formula* μ *s.t.* μ <sup>⇔</sup> ϕ *and* y *is conflict-free in* μ*.*

*Proof.* The following inductive invariants hold at the end of every iteration of the loop in lines 2–8, thanks to Lemma 3: (i) $\mu \Leftrightarrow \varphi$, and (ii) the set of counterexamples to conflict-freeness of $y$ in $\mu$ has strictly fewer elements than at the start of the iteration. Since the set of counterexamples is finite (at most $2^{|X|+|Y|-1}$ elements), it must eventually become empty. By definition of the conflict formula, $\kappa_{\mu,y}$ must be unsatisfiable when this happens. Hence, the algorithm eventually exits the loop in lines 2–8 and terminates. Since there are no counterexamples to conflict-freeness of $y$ in $\mu$ on termination, $y$ is indeed conflict-free in $\mu$.

*Rectification by Counterexample Generalization:* The idea of counterexample generalization is best illustrated by an example. Consider the specification $\varphi(X, y) \equiv \big((x_1 \wedge x_2) \vee ((x_2 \wedge x_3) \vee y)\big) \wedge \big(\neg y \vee (\neg x_3 \wedge x_4)\big)$, wherein $y$ is in conflict. To see why this is so, consider $\varphi^{+y}$ (henceforth called $\varphi^{+}$), represented as an NNF circuit in Fig. 3. Let $\pi$ be an assignment that assigns 1 to $x_1, x_3$ and 0 to $x_2, x_4$. The values in red below the leaves in Fig. 3 represent this assignment. If we propagate these values upstream to the root of the circuit, we get the values/formulas shown in red adjacent to internal nodes in Fig. 3. This process is akin to *constant/symbol propagation* in symbolic simulation [30]. Note that the root of the circuit is assigned $y \wedge \widehat{y}$ by this process, indicating that $\varphi^{+}|_{\pi} \Leftrightarrow (y \wedge \widehat{y})$. Hence, $y$ is in conflict in $\varphi$ and $\pi$ is a counterexample to conflict-freeness of $y$ in $\varphi$.

Interestingly, the constant/symbol propagation discussed above can yield many more counterexamples beyond $\pi$. Specifically, let $N$ denote the set of coloured nodes in the figure. Suppose we cut the circuit at the nodes in $N$, as shown by the dotted line in Fig. 3. Let the sub-circuit above the cut be denoted $C_N$. Notice that the leaf nodes of $C_N$ are either nodes in $N$ or leaf nodes of the original circuit corresponding to $y$ or $\widehat{y}$. Now consider any assignment $\pi' : \{x_1, x_2, x_3, x_4\} \to \{0, 1\}$ s.t. when we propagate constants/symbols in the original circuit starting with $\pi'$ at the leaves, we get the same values as in Fig. 3 at all nodes

Fig. 3. Circuit representing $\varphi^{+y}$.

in $N$. This ensures that all leaves of $C_N$ have the same constant/symbol as in Fig. 3. Therefore, further constant/symbol propagation must assign exactly the same constant/symbol/formula at every internal node of $C_N$ as in Fig. 3. In particular, the root node is assigned $y \wedge \widehat{y}$, implying that $\pi'$ is a counterexample to conflict-freeness of $y$ in $\varphi$.

Can we characterize all the counterexamples $\pi'$ obtainable by the above method? It turns out we can. First, note from Fig. 3 that the sub-circuits rooted at the orange, purple and green nodes represent the Boolean formulas $x_1 \wedge x_2$, $x_2 \wedge x_3$ and $\neg x_3 \wedge x_4$, respectively. Hence, the set of all counterexamples $\pi'$ obtained above is precisely the set of satisfying assignments of the formula $\beta \equiv \neg(x_1 \wedge x_2) \wedge \neg(x_2 \wedge x_3) \wedge \neg(\neg x_3 \wedge x_4)$. Notice that there are many assignments beyond $\pi$ that satisfy $\beta$, e.g., $x_1 x_2 x_3 x_4 = 0000$ or $0010$ or $1000$, and so on. Thus, we have truly *generalized* the counterexample $\pi$.
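
The generalization can be checked mechanically; the following brute-force sketch (ours) confirms that every satisfying assignment of $\beta$ collapses $\varphi^{+}$ to $y \wedge \widehat{y}$ for the worked example:

```python
# Sketch: counterexample generalization for the running example
# phi = ((x1 & x2) | ((x2 & x3) | y)) & (~y | (~x3 & x4)),
# with yp replacing ~y in the y-positive form.
from itertools import product

phi_pos = lambda x1, x2, x3, x4, y, yp: \
    ((x1 and x2) or (x2 and x3) or y) and (yp or ((not x3) and x4))

# beta, built from the cut nodes (orange, purple, green in Fig. 3):
beta = lambda x1, x2, x3, x4: \
    (not (x1 and x2)) and (not (x2 and x3)) and (not ((not x3) and x4))

count = 0
for bits in product([0, 1], repeat=4):
    if beta(*bits):
        count += 1
        # phi+ restricted to this assignment is exactly y & yp:
        for y, yp in product([0, 1], repeat=2):
            assert bool(phi_pos(*bits, y, yp)) == bool(y and yp)
```

Every model of $\beta$ is thus a counterexample to conflict-freeness of $y$, which is exactly what Lemma 5 below states in general.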

In general, given a specification $\varphi(X, Y)$, an output variable $y$ and a counterexample $\pi : X \cup Y \setminus \{y\} \to \{0, 1\}$ to conflict-freeness of $y$ in $\varphi$, we first construct an NNF circuit representing $\varphi^{+}$. For every node $n$ in the circuit, let $\varphi^{+}_{n}$ denote the sub-formula represented by the sub-circuit rooted at $n$. Next, we assign the values given by $\pi$ to the leaves of the circuit representing $\varphi^{+}$ and propagate these values to the root of the circuit. Let $v_{\pi,n}$ denote the constant/symbol/formula assigned to node $n$ in the circuit by this process. In other words, $v_{\pi,n} \Leftrightarrow \varphi^{+}_{n}|_{\pi}$. We now choose a subset $N$ of nodes $n$ such that (i) $\mathsf{sup}(\varphi^{+}_{n}) \cap \{y, \widehat{y}\} = \emptyset$, (ii) $v_{\pi,n}$ is a constant, and (iii) every path from a non-$y$, non-$\widehat{y}$ leaf to the root passes through a node in $N$. Such a set $N$ can always be found, for example, by choosing $N$ to be the set of non-$y$, non-$\widehat{y}$ leaves. However, as Fig. 3 shows, $N$ need not include only leaf nodes. Let $\beta_{\pi,N}$ denote the formula $\bigwedge_{n \in N} \big(\varphi^{+}_{n} \Leftrightarrow v_{\pi,n}\big)$.

Lemma 5. *Every satisfying assignment of* $\beta_{\pi,N}$ *is a counterexample to conflict-freeness of* $y$ *in* $\varphi$*. Moreover,* $\neg\beta_{\pi,N}$ *satisfies the three conditions required of a partial rectifier as specified in Lemma 3.*

*Proof.* Since every path from a non-$y$, non-$\widehat{y}$ leaf to the root passes through a node in $N$, we can use the nodes in $N$ and the leaves corresponding to $y$ and $\widehat{y}$ to cut the circuit (as shown in Fig. 3). Let $C_N$ denote the sub-circuit above this cut. Let $\pi'$ be a satisfying assignment (not necessarily the same as $\pi$) of $\beta_{\pi,N}$. By definition of $\beta_{\pi,N}$, constant/symbol propagation starting from $\pi'$ assigns the constant value $v_{\pi,n}$ to every node $n \in N$. It follows that for all leaf nodes $l$ of the sub-circuit $C_N$, $v_{\pi',l} = v_{\pi,l}$. Hence, every internal node $m$ of $C_N$ must also have $v_{\pi',m} = v_{\pi,m}$. In particular, the root node gets assigned the same value/symbol/formula that it had when we did constant/symbol propagation starting from $\pi$. In other words, $\varphi^{+}|_{\pi'} \Leftrightarrow \varphi^{+}|_{\pi}$. However, since $\pi$ is a counterexample to conflict-freeness of $y$ in $\varphi$, we know $\varphi^{+}|_{\pi} \Leftrightarrow (y \wedge \widehat{y})$. Therefore, $\varphi^{+}|_{\pi'} \Leftrightarrow (y \wedge \widehat{y})$ and $\pi'$ is a counterexample to conflict-freeness of $y$ in $\varphi$.

To see that $\neg\beta_{\pi,N}$ satisfies the conditions required of a partial rectifier in Lemma 3, note that $\mathsf{sup}(\varphi^{+}_{n}) \cap \{y, \widehat{y}\} = \emptyset$ for every $n \in N$. Therefore, $\mathsf{sup}(\neg\beta_{\pi,N}) \cap \{y, \widehat{y}\}$ is also empty. Next, by definition, if an assignment $\pi' \models \beta_{\pi,N}$, then every node $n \in N$ in the circuit of $\varphi^{+}$ gets assigned the constant value $v_{\pi,n}$. Using the same argument as in the first part of the proof, we can then show that $\varphi^{+}|_{\pi'} \Leftrightarrow (y \wedge \widehat{y})$. Hence $\varphi|_{\pi'} \Leftrightarrow \big(\varphi^{+}[\widehat{y} \mapsto \neg y]\big)|_{\pi'} \Leftrightarrow (y \wedge \widehat{y})[\widehat{y} \mapsto \neg y] \Leftrightarrow 0$. This shows that $\beta_{\pi,N} \Rightarrow \neg\varphi$, in other words, $\varphi \Rightarrow \neg\beta_{\pi,N}$. Finally, $\beta_{\pi,N}|_{\pi} \Leftrightarrow \bigwedge_{n \in N} \big(\varphi^{+}_{n}|_{\pi} \Leftrightarrow v_{\pi,n}\big)$. However, $v_{\pi,n} \Leftrightarrow \varphi^{+}_{n}|_{\pi}$ by definition. Hence $\beta_{\pi,N}|_{\pi} \Leftrightarrow 1$, and hence $\neg\beta_{\pi,N}|_{\pi}$ is unsatisfiable.

The above lemma allows us to use $\neg\beta_{\pi,N}$ as a partial rectifier of $\varphi$ w.r.t. $y$ in Algorithm RectifyOneOutput. Significantly, this eliminates in one shot all counterexamples to conflict-freeness of $y$ in $\varphi$ that are satisfying assignments of $\beta_{\pi,N}$, thereby reducing the number of iterations of the loop in Algorithm RectifyOneOutput. As seen in the example above, $\beta_{\pi,N}$ can indeed have many more satisfying assignments beyond $\pi$. We use this technique to implement the subroutine PartialRectifier in Algorithm RectifyOneOutput. Specifically, we choose the set $N$ such that the longest path of each node $n \in N$ from a leaf of the circuit of $\mu$ is within an empirically determined threshold (20 in our experiments).

*Generalizing Using Unsatisfiable Cores:* It turns out that we can generalize counterexamples even beyond what was achieved above. To see a concrete example, consider the specification $\gamma(X, y) \equiv \varphi(X, y) \wedge \big(\neg y \vee (x_1 \wedge x_2)\big)$, where $\varphi(X, y)$ is the same specification considered in Fig. 3. The NNF circuit representing $\gamma^{+y}$ (or $\gamma^{+}$ for short) is the same as that shown in Fig. 3, with an additional $\wedge$-gate at the root that conjoins the old root with $\widehat{y} \vee (x_1 \wedge x_2)$, built from the $\widehat{y}$ leaf and the output of the orange node. The same assignment $\pi$ as considered earlier serves as a counterexample to conflict-freeness of $y$ in $\gamma$, and the same set $N$ can be chosen to obtain the same partial rectifier $\neg\beta$, where $\beta \equiv \neg(x_1 \wedge x_2) \wedge \neg(x_2 \wedge x_3) \wedge \neg(\neg x_3 \wedge x_4)$. Note, however, that in the circuit for $\gamma^{+}$, if the orange and purple nodes are assigned the value 0 by constant propagation starting from an assignment $\pi'$, the root node must be assigned $y \wedge \widehat{y}$, *regardless of the value assigned to the green node*. Therefore, we could have used $\beta' \equiv \neg(x_1 \wedge x_2) \wedge \neg(x_2 \wedge x_3)$, which represents a larger set of counterexamples than $\beta$. Specifically, $x_1 x_2 x_3 x_4 = 1001$ does not satisfy $\beta$ but satisfies $\beta'$. It follows that rectification using $\neg\beta'$ eliminates more counterexamples in one go than rectification using $\neg\beta$.

In general, given $\varphi$, $y$, $\pi$ and $N$ as in our previous discussion, let $s_n$ be a fresh variable for every node $n \in N$, and define the formula $\rho_{\pi,N} \equiv \varphi \wedge \bigwedge_{n \in N} \big((s_n \Rightarrow (\varphi^{+}_{n} \Leftrightarrow v_{\pi,n})) \wedge s_n\big)$. Since $\varphi \Rightarrow \neg\beta_{\pi,N}$ (see Lemma 5) and since $\beta_{\pi,N} \equiv \bigwedge_{n \in N} (\varphi^{+}_{n} \Leftrightarrow v_{\pi,n})$, it follows that $\rho_{\pi,N}$ is unsatisfiable. Assuming $\varphi$ is satisfiable (otherwise the synthesis problem is itself trivial), every unsatisfiable core of $\rho_{\pi,N}$ must set a subset of the $s_n$ variables to 1. Let $U \subseteq N$ be the set of nodes $n$ s.t. $s_n = 1$ in a minimal unsatisfiable core of $\rho_{\pi,N}$. Then $\rho_{\pi,U} \equiv \varphi \wedge \bigwedge_{n \in U} \big((s_n \Rightarrow (\varphi^{+}_{n} \Leftrightarrow v_{\pi,n})) \wedge s_n\big)$ is unsatisfiable.

Lemma 6. *Lemma 5 holds with* $\beta_{\pi,N}$ *replaced by* $\beta_{\pi,U}$*. Moreover,* $\beta_{\pi,N} \Rightarrow \beta_{\pi,U}$*.*

*Overall Algorithm:* We now present Algorithm FindSkBasisVec. The algorithm initializes a running specification $\alpha$ to $\varphi$. It then repeatedly chooses the next output $y_i$ for which a Skolem basis needs to be computed. The choice of $y_i$ can be as per a static order, or determined on-the-fly heuristically. The algorithm then finds the Skolem basis $(A_i, B_i)$ using Theorem 1, treating $y_i$ as the sole output of the specification $\alpha$. It next updates the running specification $\alpha$ by existentially quantifying $y_i$ from $\alpha$. In order to do this, it first checks if $y_i$ is unate in $\alpha$, and if so, substitutes the appropriate constant for $y_i$ in $\alpha$ to quantify it out. Otherwise, the algorithm invokes Algorithm RectifyOneOutput. Thanks to Theorem 3, we can then effectively and efficiently quantify $y_i$ from $\alpha$ by setting $y_i = 1$ and $\widehat{y_i} = 1$ in the positive form of the formula $\mu$ returned by RectifyOneOutput. Once all outputs are processed, the algorithm outputs the vector of $(A_i, B_i)$ pairs computed as the Skolem basis vector.
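
A compact end-to-end sketch of this flow (ours, over a brute-force truth-table representation; for brevity, quantification below uses the disjunction of cofactors rather than unate substitution or rectification, which the actual algorithm uses to avoid doubling the circuit):

```python
# Sketch of Algorithm FindSkBasisVec: for each output in order, compute
# (A_i, B_i) by Theorem 1 and quantify the output out of the running
# specification alpha.  Names and representation are ours.
from itertools import product

def find_sk_basis_vec(phi, n_inputs, n_outputs):
    """phi(bits): bits = inputs followed by outputs; returns [(A_i, B_i)]."""
    alpha, width = phi, n_inputs + n_outputs
    basis = []
    for _ in range(n_outputs):
        width -= 1              # the last remaining position plays y_i
        A, B = set(), set()
        for bits in product([0, 1], repeat=width):
            p1, p0 = bool(alpha(bits + (1,))), bool(alpha(bits + (0,)))
            if p1 and not p0: A.add(bits)
            if p1 == p0:      B.add(bits)
        basis.append((A, B))
        # Quantify y_i out of alpha (cofactor disjunction, for brevity):
        alpha = (lambda f: lambda bits: f(bits + (1,)) or f(bits + (0,)))(alpha)
    return basis

# Example from Sect. 5: phi = (x & (y1 | y2)) | (~x & ~y2), with bit
# order (x, y1, y2); here the processing order is y2 first, then y1.
phi = lambda v: (v[0] and (v[1] or v[2])) or ((not v[0]) and (not v[2]))
basis = find_sk_basis_vec(phi, 1, 2)
```

Each pair in `basis` is a truth-table rendering of the on-set and don't-care set that the symbolic algorithm produces as circuits.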

Theorem 4. *Algorithm* FindSkBasisVec *terminates with a Skolem basis vector for the specification* ϕ(*X*,*<sup>Y</sup>* )*.*

*Proof.* The proof of termination follows immediately from Theorem 3. The proof of correctness follows from Definition 1, Theorems 1, 3, and Lemmas 1, 2.

Though we developed rectification as a technique for rendering a variable conflict-free with the objective of generating Skolem basis vectors, it can be used independently to compile a Boolean formula into a form that allows efficient quantifier elimination. However, a performance evaluation of rectification versus other quantification techniques in such applications is beyond the scope of this paper.

### 7 Implementation and Experiments

We implemented the above algorithms in C++ using the abc package [27], and ran our tool on a set of 602 Boolean functional synthesis benchmarks (also used in [12,14]). We used an Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20 GHz machine with 40 cores in single-threaded mode (multiple cores were used only to run experiments in parallel). We set an overall timeout of 3600 seconds, within which the timeout for the unate check was 1000 seconds.

*Detailed Analysis of Our Results.* We did an ablation study to understand which part of our approach was most successful in compiling the benchmarks.

Our results are summarized in Fig. 4. Here, "Total solves" denotes the number of benchmarks (out of 602) for which Algorithm FindSkBasisVec completed within the timeout. "PAR2 score" is a widely used weighted performance score, computed as the sum of the time taken (in seconds) for each solved instance and twice the timeout (2 × 3600 s) for each unsolved instance. For benchmarks that were rectified, *for each application of rectification, we verified (using a SAT solver) that the rectified circuit was semantically equivalent* to the original. The time for this verification is included when computing PAR2 scores.

**Fig. 4.** Table of results

In row 3, we note the "Average time" taken (including for verification), in seconds, over all solved instances. In rows 4, 5 and 6, we count, respectively, the number of solved benchmarks where (i) all variables were unate, (ii) some but not all were unate, and (iii) no variables were unate (these add up to row 1). In row 7, we list the number of solved benchmarks for which there was at least one conflict, i.e., a call to the rectification algorithm was needed. Row 8 lists the solved benchmarks with at least one output that was not unate but no outputs having conflicts. The other rows are self-explanatory.
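For concreteness, the PAR2 score used in the text can be computed as follows (a minimal sketch; representing unsolved instances as `None` is our own convention here):

```python
def par2(runtimes, timeout):
    """PAR2 score: each solved instance contributes its runtime (seconds),
    each unsolved instance (None) contributes twice the timeout."""
    return sum(2 * timeout if t is None else t for t in runtimes)
```

With a 3600 s timeout, an unsolved benchmark thus adds 7200 s to the score.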

*Order Dependence.* Since a Skolem basis vector depends on the ordering of outputs, we considered two order variants. In the first, we considered a heuristically determined static order (denoted SO), taken as is from [14]. Then, we tried a heuristic dynamic order (denoted DO): after each output variable is processed, the next is obtained on-the-fly by applying the heuristic from [14].

*Conflict Optimization in Calculating Skolem Basis Vectors.* We found several problem instances where the specification is not realizable, i.e., there exist input values for which no output values can make the specification true. For such instances, it is reasonable to restrict the computation of the Skolem basis vector to a set F of Skolem functions, such that for every Skolem function ψ, there exists ψ′ ∈ F such that ψ and ψ′ differ only on the space of input assignments for which no assignment of outputs would satisfy the specification. It turns out that this can be easily encoded in Algorithm 1 by modifying the conflict formula κ<sub>μ,y</sub> to κ<sub>μ,y</sub> ∧ ϕ(*X*, *Y*′), where *Y*′ is a fresh set of variables. Doing this, along with the static/dynamic ordering, gives us the "CSO" and "CDO" columns in Fig. 4.

*Observations.* With either SO or DO, without conflict optimization, we are able to *compute Skolem basis vectors for 299 of 602 benchmarks* (286 were solved by both, 1 by only DO and 12 by only SO). Interestingly, the static order (SO) led to fewer conflicts than the dynamic order (DO), with which we had to rectify more often. Further, in the presence of conflict optimization, we are able to compute Skolem basis vectors for 309 out of 602 benchmarks. Note that even though the PAR2 score is large, the average time taken is less than 2.5 min, including the time taken for verification. In other words, *when we are able to compute Skolem basis vectors, we are able to do so in remarkably short duration.*

*Comparison with Other Tools/Approaches.* There are no existing tools that synthesize a representation of the space of all Skolem function vectors. Knowledge compilation tools, e.g., C2Syn [13] and NNF2SDD [25,31], come closest, as they try to obtain a single circuit that is semantically equivalent to the original and is in a normal form: the SynNNF form for C2Syn and the SDD form for NNF2SDD. Since Skolem functions can be obtained from these normal forms, they could be potential alternative approaches. In practice, C2Syn performs refinement operations (see [13]) for performance boosting, thereby restricting the space of Skolem function vectors. Even with this optimization, C2Syn can compile only 218 (out of 602) benchmarks, while NNF2SDD compiles only 142 to SDD on the same computing platform.

An apples-to-apples performance comparison of Boolean functional synthesis tools (that synthesize a single Skolem function vector) with our tool (that computes Skolem basis vectors for all Skolem function vectors) is not possible, since two different problems are being solved. Nevertheless, to understand the performance penalty incurred in computing a representation of all Skolem function vectors, we observe from [12] that with a 7200 s timeout and using a more powerful cluster, Manthan [12] (resp. BFSS [14]) could synthesize a single Skolem function vector for ∼356 (resp. 247) out of the same 602 benchmarks. In comparison, with a 3600 s timeout, we are able to compute Skolem basis vectors for ∼300 benchmarks. In [17], an improved and highly engineered tool Manthan2 was developed, which could synthesize a single Skolem function vector for 502 benchmarks within 7200 s. Interestingly, *we are able to compute Skolem basis vectors for 22 benchmarks (out of which 13 have non-unate variables), for which even Manthan2* [17] *fails to synthesize a single Skolem function vector*.

### 8 Conclusion

In this work, we have introduced a representation for the space of Skolem functions, using the notion of a Skolem basis vector. Our representation itself is criteria-agnostic, but allows the use of other existing techniques to optimize Skolem functions wrt different criteria. We develop a compilation algorithm that uses a combination of unateness checks and conflict detection along with a generalized counterexample-guided approach to synthesize the Skolem basis vector. Our next step would be to identify specific problem contexts and optimization criteria, and to integrate our approach with state-of-the-art logic synthesis tools to synthesize specific Skolem functions satisfying the given criteria.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Guessing Winning Policies in LTL Synthesis by Semantic Learning**

Jan Křetínský1,2(B) , Tobias Meggendorfer1,3 , Maximilian Prokop1,2 , and Sabine Rieder<sup>1</sup>

<sup>1</sup> Technical University of Munich, Munich, Germany
jan.kretinsky@tum.de
<sup>2</sup> Masaryk University, Brno, Czech Republic
<sup>3</sup> Institute of Science and Technology Austria, Klosterneuburg, Austria
tobias.meggendorfer@cit.tum.de

**Abstract.** We provide a learning-based technique for guessing a winning strategy in a parity game originating from an LTL synthesis problem. A cheaply obtained guess can be useful in several applications. Not only can the guessed strategy be applied as best-effort in cases where the game's huge size prohibits rigorous approaches, but it can also increase the scalability of rigorous LTL synthesis in several ways. Firstly, checking whether a guessed strategy is winning is easier than constructing one. Secondly, even if the guess is wrong in some places, it can be fixed by strategy iteration faster than constructing one from scratch. Thirdly, the guess can be used in on-the-fly approaches to prioritize exploration in the most fruitful directions.

In contrast to previous works, we (i) reflect the highly structured logical information in the game's states, the so-called semantic labelling, coming from the recent LTL-to-automata translations, and (ii) learn to exploit it properly by learning from previously solved games, bringing the solving process closer to human-like reasoning.

### **1 Introduction**

*LTL Synthesis* [38] is a framework for the automatic construction of reactive systems specified by formulae of linear temporal logic (LTL) [37]. Since LTL is a prominent logic in the area of safety-critical and provably reliable dynamic systems, LTL synthesis is a very tempting option for constructing such systems: it avoids error-prone manual implementation, replacing it with the need for a complete specification of the system (which is not trivial either, but in some cases easier). However, there is also an important computational caveat: the problem of LTL synthesis is 2-EXPTIME-complete. Despite this worst-case infeasibility, many heuristics have been designed that can cope with practical problems, as documented by the yearly progress in the synthesis competition

© The Author(s) 2023

This research was funded in part by the German Research Foundation (DFG) project 427755713 *Group-By Objectives in Probabilistic Verification (GOPro).*

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 390–414, 2023. https://doi.org/10.1007/978-3-031-37706-8\_20

SYNTCOMP [18], which has had an LTL track for a number of years. Yet, many reasonable instances, even in the benchmark set of SYNTCOMP, still remain practically unsolvable. In this paper, we aim at *guessing a solution* through a machine-learning model, even for hard cases, thus possibly providing an applicable answer, in a sense, without reading the input formula. We achieve that by learning from other games and by reflecting *semantic* information, bringing the process closer to human reasoning.

The classic technique for solving LTL synthesis is to translate the LTL formula into a deterministic parity automaton (DPA) and then solve the induced parity game (PG).


Due to the worst-case doubly-exponential blowup in the first step and the poor practical performance of (Safra's [39] and others' [36,40]) determinization procedures, this option was rarely used in practice until direct, more practical translations were given [8,12]. The significantly smaller automata [20] have made this approach feasible and, in fact, winning in SYNTCOMP since then. The approach is implemented in the tool Strix [33], which additionally constructs the DPA/PG *only partially*, on-the-fly, until it finds a winning strategy for one of the players. This helps to overcome some more cases where the DPA is still very large; yet, more complex specifications often remain out of reach.

*Semantic Labelling.* The key difficulty in the on-the-fly exploration is a good heuristic that prioritizes exploration in promising directions, so that a solution can be obtained quickly, without constructing "irrelevant" parts of the game.

*In a concrete state of a PG, is it better to go left or right?* While this question obviously does not have a simple answer in general, we take a step back and, instead of an arbitrary PG, consider the LTL synthesis problem. For instance, consider a state of a PG corresponding to satisfying **G***a*, i.e. "always *a* holds". Then, the letter {*a*} is clearly a better choice (for the system) than ∅. The former leads to the obligation of again satisfying **G***a*; the latter to the obligation ff (falsifying the formula). Taking the former edge does not guarantee winning, but the chances are certainly higher than giving up directly. In order to estimate the chances of winning with some obligation, we can evaluate it by randomly assigning truth values to its temporal subformulae; intuitively, **G***a* can be true or false, so its "trueness" is 0.5, while ff has trueness 0. *Trueness* is examined in [22] and utilized in newer versions of Strix [31] as guidance.
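Trueness, as sketched above, can be computed exactly for small formulae by enumerating truth assignments to the temporal subformulae. A hedged sketch (representing the formula as a predicate over an assignment dict is our own illustrative choice, not the representation used in Strix):

```python
from itertools import product

def trueness(formula, subformulas):
    """Fraction of truth assignments to the (temporal) subformulas under
    which the Boolean combination 'formula' evaluates to true."""
    envs = [dict(zip(subformulas, vals))
            for vals in product([False, True], repeat=len(subformulas))]
    return sum(formula(env) for env in envs) / len(envs)
```

For example, **G***a* viewed as a single temporal subformula has trueness 0.5, while ff has trueness 0, matching the intuition above.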

*Does every state correspond to a goal in LTL? And if so, can we determine which continuation brings us closer to satisfying it?* Recall that the classic translations of LTL to non-deterministic Büchi automata (NBA), stemming from [43], label the states of the NBA with a conjunction of LTL formulae, which are the current goals in this state. For deterministic automata, the situation is inevitably more complex. While the determinization procedures obfuscated any possible such semantic labelling, the more recent approach re-established it, e.g., [8] with

**Fig. 1.** Simple game where it is not clear which edges are "winning".

[26], or [42] with [9]. Besides the overall goal, it is necessary to also monitor the *progress of subgoals*. For example, consider **GF**(*a* ∧ **X***b*), "infinitely often *a* is followed by *b*". No matter what happens, the overall goal remains the same. However, whenever *a* holds, we progress with the subgoal of seeing the *a*-*b* sequence once, yielding a subgoal *b*, which is regarded as promising.

*Our Aim.* In this paper, we aim at *better guessing of winning decisions* than in [22,31]. While the previous work only reflected trueness of the main goal, which is just the percentage of truth assignments leading to satisfaction of a Boolean formula, our approach reflects also (i) the temporal structure of the formulae, (ii) the monitored subgoals, and (iii) learns from previously solved games. On the technical level, we design over 200 *structural features* instead of just trueness, learn an *SVM* classifier comparing which edge is most promising, and use *data from previously solved games*, i.e. which edges are "winning". As it turns out, defining this notion already is surprisingly tricky: We cannot simply use the output of classical strategy improvement algorithms, as there may be multiple, incompatible solutions. Indeed, already for reachability, there are no maximal permissive strategies [3], see Fig. 1. Here the edge (*v*2*, v*3) is winning iff (*v*3*, v*2) is not used, and vice versa; using both makes them losing. Nevertheless, they are "better" than, e.g., the self-loop on *v*1, which is always losing. Thus, we want to value both edges between *v*<sup>2</sup> and *v*<sup>3</sup> equally, and higher than the self loop on *v*1.

*Our Contribution* can be summarized as follows:


Strix already profits from our advice and—modulo our unoptimized advice implementation—speeds up significantly, as we see in Sect. 6.3.

*Usage of our Results:*


*Related Work.* To the best of our knowledge, there is only one other approach to using machine learning in LTL synthesis. Here, the authors train a very powerful model (a hierarchical transformer) in order to directly predict a controller or counterexample solely from the LTL specification [41]. Further, if their prediction is refuted by a classical model checking algorithm, they train a separate hierarchical transformer to repair it [5] until it is correct. While this turns out to be an overall competitive approach that also manages to solve some instances where classical synthesis tools such as Strix [33] fail, it does not yield a complete procedure, as the repair loop is not guaranteed to ever terminate. In this work, we instead aim to improve existing, complete procedures such as the one implemented in Strix by means of machine-learning-based heuristics.

### **2 Preliminaries**

We introduce notation and provide an overview of necessary background knowledge. Due to space constraints, we only briefly comment on several topics and refer the interested reader to the respective literature.

We use N to denote the set of non-negative integers. The constants tt and ff denote *true* and *false*, respectively.

#### **2.1 Synthesis & Games**

The synthesis problem in its general form asks whether a system can be controlled such that it satisfies a given specification under any (possible) environment. Moreover, one often is interested in obtaining a witness to this query, i.e. some *controller* or *strategy* which specifies the system's actions.

*Parity Games* are a standard formalism used in synthesis. A *parity game* is a tuple G = ((*V*, *E*), *v*<sub>0</sub>, *P*, p), where (*V*, *E*) is a finite digraph, *v*<sub>0</sub> ∈ *V* a *starting vertex*, *P* : *V* → {S, E} a *player mapping*, and p : *V* → N a *priority assignment*. Each vertex belongs to one of the two players S (called *system*) and E (called *environment*). In other words, the set of vertices is partitioned into player S's vertices *V*<sub>S</sub> and player E's vertices *V*<sub>E</sub>. See Fig. 2 for an example.

**Fig. 2.** An example parity game, taken from [22]. Rounded rectangles belong to the system S and normal rectangles to the environment E. The vertices are additionally labelled with their priorities.

*Remark 1.* In our implementation priorities are assigned to edges instead of vertices, as this allows for a much more concise representation and suits most translations better. However, for ease of presentation, we consider *state-based acceptance* instead of *transition-based*.

*Playing.* To play the game, a token is placed in the initial vertex *v*<sub>0</sub>. Then, the player owning the token's current vertex moves the token along an outgoing edge of that vertex. This is repeated infinitely, giving rise to an infinite sequence of vertices containing the token *ρ* = *v*<sub>0</sub>*v*<sub>1</sub>*v*<sub>2</sub> ··· ∈ *V*<sup>ω</sup>, called a *play*. We write *ρ<sub>i</sub>* to refer to the *i*-th vertex in a play. A play *ρ* is *winning* (for the system player) if the smallest priority occurring infinitely often is odd. (Using "maximal" instead of "minimal" or "even" instead of "odd" does not fundamentally change the problem at hand.) Formally, we define inf(*ρ*) = {*v* ∈ *V* | ∀*j*. ∃*k* ≥ *j*. *ρ<sub>k</sub>* = *v*} as the set of vertices occurring infinitely often. Since the game graph is finite, this set is always non-empty. The smallest priority occurring infinitely often is given as p(*ρ*) = min{p(*v*) | *v* ∈ inf(*ρ*)}, and the system wins the play *ρ* iff p(*ρ*) is odd.

*Strategies.* A strategy of player *p* is a mapping *σ<sup>p</sup>* : *V<sup>p</sup>* → *E* assigning to each of *p*'s vertices an appropriate edge along which the token will be moved, i.e. (*v, σp*(*v*)) ∈ *E* for all *v* ∈ *Vp*. <sup>1</sup> Once both players fix a strategy, the game is fully determined and a unique run is induced. We call a strategy of system *σ*<sup>S</sup> *winning* if for *all* strategies of the environment *σ*<sup>E</sup> the induced play is winning, i.e. system wins no matter what the environment does.

For example, consider again the game depicted in Fig. 2. Fixing the strategies *σ*<sub>S</sub> = {*v*<sub>0</sub> → (*v*<sub>0</sub>, *v*<sub>2</sub>), *v*<sub>2</sub> → (*v*<sub>2</sub>, *v*<sub>3</sub>), *v*<sub>4</sub> → (*v*<sub>4</sub>, *v*<sub>4</sub>)} and *σ*<sub>E</sub> = {*v*<sub>1</sub> → (*v*<sub>1</sub>, *v*<sub>2</sub>), *v*<sub>3</sub> → (*v*<sub>3</sub>, *v*<sub>3</sub>)} induces the play *v*<sub>0</sub>*v*<sub>2</sub>*v*<sub>3</sub>*v*<sub>3</sub> ···. The set of infinitely often seen priorities equals {3}, hence the system player wins with these strategies. Moreover, the strategy *σ*<sub>S</sub> is winning, since the play always ends up in either *v*<sub>3</sub> or *v*<sub>4</sub>.
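Since fixed positional strategies determine a unique successor in every vertex, the induced play is a "lasso" whose winner can be computed directly. A small sketch (the vertex names and p(*v*<sub>3</sub>) = 3 follow the example; the remaining priorities are assumed values for illustration):

```python
def play_winner(v0, strategy, priority):
    """Follow the combined strategy map (sigma_S and sigma_E merged into one
    successor function) until a vertex repeats, then check whether the
    minimal priority on the closed cycle is odd (system wins)."""
    seen, path, v = {}, [], v0
    while v not in seen:
        seen[v] = len(path)
        path.append(v)
        v = strategy[v]
    cycle = path[seen[v]:]
    return min(priority[u] for u in cycle) % 2 == 1

# The strategies from the example induce v0 v2 v3 v3 ...; with p(v3) = 3 the
# closed cycle has minimal priority 3 (odd), so the system wins.
# (Priorities of v0 and v2 are assumptions.)
strategy = {'v0': 'v2', 'v2': 'v3', 'v3': 'v3'}
priority = {'v0': 2, 'v2': 2, 'v3': 3}
assert play_winner('v0', strategy, priority)
```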

*Synthesis.* With these notions, we can compactly define the synthesis question: *Given a parity game* G*, does there exist a winning strategy for the system player?* In the example above, *σ*<sub>S</sub> is a witness to this question.

<sup>1</sup> Strategies may be more complex, e.g., by using memory. However, "positional" strategies are sufficient for parity games, thus we omit the general definition.

This problem is still intensely studied due to its broad applications. It also is one of the few problems which canonically lie in **NP** ∩ **coNP** (even in **UP** ∩ **coUP** [19]), with recent breakthroughs achieving quasi-polynomial algorithms [4,14,28].

*Extensive-Form Game.* A common notion in game theory is the *extensive-form* game. Intuitively, this means completely "unrolling" the game into an explicit representation. See e.g. [34, Chp. 5–7] for details. In our case, we consider the *game tree*, where each node corresponds to a simple path in the game G. Suppose we are in state *s* = (*v*<sub>1</sub>*,...,v<sub>i</sub>*) of the game tree. Then, the successors of *s* are determined by all successors of *v<sub>i</sub>* in the game, i.e. {*u* | (*v<sub>i</sub>, u*) ∈ *E*}, as follows. If such a successor *u* already occurs along *s*, i.e. a loop is closed, we check whether the corresponding play is winning or losing; the choice then leads to a winning or losing leaf of the tree, respectively. Otherwise, i.e. when no loop is closed by the choice, it leads to *s* ◦ *u*. Essentially, this game tree represents all potential simple paths (and thus, intuitively, all potential positional strategies) that can arise in the game, and each edge corresponds to a particular move of a player (also called a *ply* in game theory). In particular, the tree is finite, however of potentially exponential size. Note that we can restrict to simple paths only because positional strategies are sufficient.

*Minimax Game Solving.* A fundamental way to solve games is the *minimax decision* rule, which intuitively corresponds to exhaustively exploring the extensive-form game (also discussed in [34]). Suppose we assign a value of 0 to "losing" leaves of the game tree and a value of 1 to the "winning" leaves. Then, we can "back-propagate" values by setting *V*(*s*) to the maximum over all successors of *s* if it currently is the turn of the system player, and to the minimum if instead it is the environment's turn (which wants the system to lose). The game is winning if the value in the initial state of the game tree is 1. This approach is also called *backward induction* or *retrograde analysis*: starting from the winning/losing positions of the game, we consider all moves which could lead to such situations.
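The minimax evaluation of the game tree can be sketched directly: unroll simple paths, and treat a closed loop as a leaf whose value is 1 iff the loop is winning for the system (a hedged, exponential-time sketch, for illustration only):

```python
def minimax(v, succ, owner, priority, path=()):
    """Value of the game-tree node whose simple path ends in vertex v:
    closing a loop yields a winning (1) or losing (0) leaf; otherwise the
    system player ('S') maximizes over successors, the environment ('E')
    minimizes."""
    if v in path:
        cycle = path[path.index(v):]
        return int(min(priority[u] for u in cycle) % 2 == 1)
    values = [minimax(u, succ, owner, priority, path + (v,)) for u in succ[v]]
    return max(values) if owner[v] == 'S' else min(values)
```

On a tiny game where the system chooses between a self-loop of priority 1 and one of priority 2, the value of the initial vertex is 1.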

*Strategy Improvement* (or *strategy iteration*, abbreviated by *SI*) is the most prominent practical way of solving parity games, i.e. answering the synthesis question. It received significant attention due to recent practical advances [13,15,17,32] and modern tool developments [6,33]. We explain the approach briefly, since its details are not important for this work. Intuitively, SI starts from arbitrary initial strategies for each player, and then performs the following steps in a loop. First, we check whether either strategy is winning. If yes, the algorithm exits, returning this strategy. Otherwise, one of the strategies is improved by changing its choices in some vertices. If an improvement is not possible, there exists no winning strategy for the respective player. Otherwise, the process is repeated with the new strategy.

This algorithm converges to the correct result in finite time for any initial strategy. However, if this initial strategy is chosen "close" to a winning strategy, then SI intuitively needs to perform fewer steps to converge to an optimal one. Thus, a heuristic which often comes up with a "good" initial strategy may improve the runtime significantly over arbitrary or random initialization.

#### **2.2 Linear Temporal Logic and Reactive Synthesis**

*Linear Temporal Logic* (LTL) [37] is a standard logic used to specify desired behaviour of a system. The syntax usually is given by

$$
\phi ::= \mathbf{ff} \mid a \mid \neg \phi \mid \phi \land \phi \mid \mathbf{X} \phi \mid \phi \,\mathbf{U}\, \phi,
$$

where *a* ∈ AP is an *atomic proposition*, inducing the *alphabet* Σ = 2<sup>AP</sup>. These formulae are interpreted over infinite sequences *w* ∈ Σ<sup>ω</sup> called *ω*-words. A word *w* = *w*<sub>0</sub>*w*<sub>1</sub> ··· ∈ Σ<sup>ω</sup> satisfies the *next* operator **X***φ* iff *φ* is satisfied in the next step. Similarly, the *until* operator *φ***U***ψ* is satisfied iff *φ* holds until *ψ* is eventually satisfied. Usual abbreviations are defined as *finally* **F***φ* ≡ tt **U** *φ* and *globally* **G***φ* ≡ ¬**F**¬*φ*, which require that *φ* holds at least once or always, respectively. Moreover, the construction underlying our work also considers *strong release φ* **M** *ψ* ≡ *ψ* **U** (*ψ* ∧ *φ*), *(weak) release φ* **R** *ψ* ≡ **G***ψ* ∨ (*φ* **M** *ψ*), and *weak until φ* **W** *ψ* ≡ **G***φ* ∨ (*φ* **U** *ψ*). These additional operators allow formulas to be represented in *negation normal form*, i.e. with negation ¬ appearing only in front of atomic propositions. In the interest of space, we refer to [12] for a precise definition of the semantics and a discussion of these subtleties. Understanding these issues is, however, not required for this work.
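The core syntax and the abbreviations above can be captured with a small AST; the following is a sketch (class names are our own illustrative choices, not Owl's API):

```python
from dataclasses import dataclass

# Constructors for the core grammar: ff, a, ¬, ∧, X, U.
@dataclass(frozen=True)
class FF:
    pass

@dataclass(frozen=True)
class AP:
    name: str

@dataclass(frozen=True)
class Not:
    sub: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Next:
    sub: object

@dataclass(frozen=True)
class Until:
    left: object
    right: object

TT = Not(FF())                    # tt = ¬ff

def F(phi):                       # finally: F phi = tt U phi
    return Until(TT, phi)

def G(phi):                       # globally: G phi = ¬F¬phi
    return Not(F(Not(phi)))
```

Frozen dataclasses give structural equality for free, so derived operators unfold to exactly the abbreviations in the text.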

*LTL Synthesis* is an instance of the general synthesis problem, where the specification to be satisfied is given in form of an LTL formula [38]. Due to recent advances [11,12,16,20,21,25], the *automata-based approach* [43] to LTL synthesis received significant attention. In particular, the tool Strix [33], built on top of Owl [24], which in turn implements these ideas, won several iterations of the synthesis competition SYNTCOMP [18]. Essentially, the given LTL formula is translated into an *ω*-automaton, which in turn is transformed into a parity game. Solving the resulting game yields a solution to the original synthesis question.

This game is obtained by "splitting" the automaton, as follows. The set of atomic propositions is split into system- and environment-controlled propositions, i.e. AP = AP<sub>S</sub> ∪ AP<sub>E</sub>, and the players' actions correspond to choosing which of their propositions to enable. Once both players have chosen their propositions' values, the automaton moves to the next state according to these choices. Concretely, for an automaton state *p*, the environment can choose to move into (*p*, *v*) where *v* ⊆ AP<sub>E</sub>, and from there, the system can move to any automaton state *q* = *δ*(*p*, *v* ∪ *v*′) where *v*′ ⊆ AP<sub>S</sub> and *δ* is the transition function of the automaton. In particular, this means that the obtained game is *alternating*, i.e. system and environment take turns in alternation. Moreover, by convention the environment moves first. See e.g. [33] for more details on this approach.
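The splitting construction can be sketched as follows (hedged; `delta`, the state names, and the move representation are illustrative, not Strix's actual data structures):

```python
from itertools import combinations

def subsets(props):
    # All subsets of a set of propositions.
    ps = sorted(props)
    return [frozenset(c) for r in range(len(ps) + 1)
            for c in combinations(ps, r)]

def split_game(states, delta, ap_env, ap_sys):
    """From automaton state p the environment picks v ⊆ AP_E, moving to the
    intermediate vertex (p, v); from there the system picks v' ⊆ AP_S and
    the game moves to the automaton state delta(p, v ∪ v')."""
    env_moves = {p: [(p, v) for v in subsets(ap_env)] for p in states}
    sys_moves = {(p, v): [delta(p, v | vp) for vp in subsets(ap_sys)]
                 for p in states for v in subsets(ap_env)}
    return env_moves, sys_moves
```

For instance, for **G***a* with a system-controlled *a*, the system stays in the good automaton state exactly by enabling *a* in every round.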

*Semantic Translations* from LTL to automata are the key ingredient to our approach. On top of providing a parity game, they also give a *semantic labelling*,

**Fig. 3.** Motivational example to provide guidance through semantic labelling.

i.e. interpretable meaning, to the game's vertices. In particular, the approach introduced in [8] (see also [10–12]) and implemented in Owl [25] intuitively yields for each vertex a list of LTL formulae, which roughly correspond to (sub-)goals which still have to be fulfilled, possibly repetitively.

#### **2.3 Our Goal**

In this work, we want to demonstrate that this semantic labelling can be efficiently exploited for reactive synthesis. As a motivating example for considering semantic labelling, we display a (vastly simplified) labelled game in Fig. 3. We are offered the choice between *a* and ¬*a*. While it is not completely clear that choosing *a* is indeed better, it certainly seems more promising, as the subsequent labelling seems much "easier" to handle. Thus, faced with a choice, we likely would first try to win with *a*. Observe that without the semantic labelling, our best option in this situation would be a random guess. In [22], the authors used a simple, manually designed mechanism trying to capture this notion, called *trueness*. Motivated by the (surprisingly good) results of this approach, we want to tackle this problem by more sophisticated means. Concretely, we want to make meaningful decisions based on the labelling. However, while the theory underpinning semantic translations is quite clean and pleasant [12], the actual labellings appearing in practice are quite complex. To further complicate things, the highly optimized implementation thereof [25] employs several subtle optimizations and special cases. We provide an example showcasing the complexity of this labelling in practice later in Sect. 5, kept brief in the interest of space, and a small real-world example in [23, Appendix A.1]. Since we have a simple intuition which, however, seems difficult to formalize, we opt to tackle this problem through means of machine learning.

#### **3 Previous Approaches and Their Limitations**

In this section, we briefly summarize the ideas of [22] and the inherent problems associated with them. The primary motivation of [22] is to exploit the semantic labelling provided by [25], which gives us an indication of the long term goals in the game. As an analogy, consider the game of chess. Here, the "semantic labelling" is given by the board state, i.e. the position of each piece. This labelling provides us with a reasonable indication of (i) our current situation and (ii) which moves might be better than others. In particular, understanding and evaluating the semantics of the game is what allows humans to have a good intuition about the quality of moves, without thinking through the intractably large game tree. Likewise, this understanding is what enabled algorithms to perform beyond human capabilities.

#### **3.1 Parity Game Solving by Trueness**

A central notion of [22] is *trueness*, an approximation of how close a formula is to being satisfied, i.e. tt. The intuition is that the semantic labelling of states effectively describes "goals" of the system player. If the formula is tt, the system has satisfied all goals and consequently won the game. Likewise, increasing the trueness is indicative of a good move. Remaining with the analogy of chess, trueness somewhat corresponds to counting the number of pieces on the board (or rather the difference between our and the opponent's pieces): If no enemy pieces remain, we certainly have won, and a change of this difference, i.e. capturing an enemy piece or avoiding capture of our own pieces, is a good indicator for the quality of a move. In particular, this discourages moves which immediately lead to a piece being taken.

In [22], the authors propose two ideas. First, they suggest using a trueness-maximizing strategy as the initial one for strategy iteration, i.e. in each state selecting the edge which maximizes (or minimizes, in the case of E) the obtained trueness. Second, they use *Q-Learning*, a popular reinforcement learning approach, as a solver for parity games, i.e. as a competitor to strategy iteration, using three different reward signals. There, each edge is given a reward, mostly based on (the change of) trueness, and these values are then back-propagated until choosing optimal rewards in each step yields a winning strategy.

While they also show Q-Learning to be an interesting avenue, we primarily focus on the "initializing strategy iteration" approach, since our goal is to augment existing strategy iteration solvers. Moreover, the experimental evaluation of [22] suggests that Q-Learning scales poorly to large real-world formulae.

#### **3.2 Problems**

We now outline two key issues of this approach.


We proceed to outline how we tackle these issues by a more sophisticated approach.

#### **4 A New Hope**

We want to improve reactive synthesis by applying machine learning. As already motivated by [22], we approach this problem by identifying "promising" edges and choosing those as the initial strategy for SI. Naturally, as a first step, we need training data for our learning approach. In particular, we need to identify which choices in a game are actually good, i.e. the *ground truth*. As it turns out, this is more complicated than one might expect.

#### **4.1 Obtaining Training Data with SI**

As SI allows us to solve a game and determine winning edges, one might try to employ SI for obtaining a ground truth (as we did initially). However, SI actually provides us with potentially misleading or even conflicting information! As we already hinted in the introduction through Fig. 1, SI cannot give us a canonical ground truth. In the example, one edge is winning iff the other is not used, and vice versa. Thus, SI will yield a strategy which does not take both edges and we would consider one of them losing. Moreover, note that there is no fundamental reason to prefer one edge over the other, so SI might in one run classify the edge from *v*<sup>2</sup> to *v*<sup>3</sup> as good and in a second run (or on a similar game) do the opposite or even consider neither winning. The underlying problem is that parity games do not allow for a unique *maximally permissive* strategy (see e.g. [3]), thus we cannot derive the "suitability" of an edge from a single solution strategy.

#### **4.2 Solving the Game Tree**

Instead of using a particular strategy obtained from SI, we therefore propose to identify "all" solutions, i.e. all edges which are part of a winning strategy. More formally, for each vertex *v* we want to determine the value of each outgoing edge in the corresponding game tree rooted at *v*. To prefer "shorter" solutions over longer ones, we add a beta-decay to the value. Concretely, suppose we consider the game tree state *s* = (*v*<sub>1</sub>, ..., *v<sub>i</sub>*) which ends in a system state *v<sub>i</sub>*. Then, the value of *s* is defined by val(*s*) = *β* · max<sub>*s*′ ∈ successors(*s*)</sub> val(*s*′) for a fixed 0 < *β* < 1.
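A minimal sketch of this recursion (the nested-list tree encoding and the terminal win/loss values 1 and 0 are illustrative assumptions, not the paper's data structures):

```python
# Sketch of the beta-decayed game-tree value: a node is either a terminal
# with a fixed value (1 = win, 0 = loss) or a list of successor nodes.

BETA = 0.9  # fixed decay factor, 0 < beta < 1

def val(node):
    """val(s) = beta * max over successors, so shorter wins score higher."""
    if isinstance(node, (int, float)):  # terminal: 1 = win, 0 = loss
        return float(node)
    return BETA * max(val(succ) for succ in node)

# A win one ply away is worth beta, two plies away beta^2:
tree = [[1], [0, [1]]]
print(val(tree))  # deeper wins are discounted more
```

Because of the decay, the root prefers the branch whose win is closest, which is exactly the intended bias towards short solutions.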

As we already mentioned, evaluating this tree exactly is intractable: it is exponential in the size of the game, which itself is already doubly-exponential in the size of the input formula [27,38]. Thus, we employ a classical technique of game theory.

#### **4.3 Monte Carlo Tree Search (MCTS)**

Intuitively, we explicitly unfold the tree up to a specified depth, e.g. 7 plies, and then assign the results of (guided) random sampling to the occurring leaves, approximating the (beta-decayed) value of the game in these vertices.

We describe our method to approximate the value of a node *s* = (*v*<sub>1</sub>, ..., *v<sub>i</sub>*) in the game tree. In essence, starting from *v<sub>i</sub>*, we randomly select successors, with the following restrictions for each player. The environment plays *optimally*, i.e. if a state is winning for the environment (which we can determine beforehand through classical approaches) we immediately stop sampling and return a value of 0. Otherwise, the environment heuristically tries to delay the play as long as possible (decreasing the value the system player obtains due to beta-decay). In contrast, the system player checks in a one-step lookahead if a choice is trivially winning, i.e. leading to a state labelled tt, always choosing such an edge if one exists. Otherwise, the system randomly chooses among edges which are not trivially losing, i.e. lead to a ff state. If either player closes a loop, i.e. selects a successor which already occurs along the path, we determine the value by checking if the loop is winning or losing. A loss yields a value of 0, while a win yields *β*<sup>length</sup>. In summary, we approximate the probability of winning by playing randomly (avoiding obvious mistakes) against an optimal opponent, under-approximating the true value. We deliberately opt for this random-choice approach to prefer regions where there is less potential for error.
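A simplified playout of this kind can be sketched as follows; the successor-map encoding, the precomputed set of environment-winning vertices, and the `loop_wins` callback are illustrative assumptions, and the tt/ff one-step lookahead of the full method is omitted for brevity:

```python
import random

BETA = 0.9

def playout(graph, start, env_winning, loop_wins, rng=random.Random(0)):
    """One guided random playout approximating the beta-decayed value.

    graph: vertex -> list of successors (illustrative encoding);
    env_winning: vertices already known to be won by the environment;
    loop_wins(path, v): True iff the loop closed by revisiting v is winning.
    """
    path = [start]
    while True:
        v = path[-1]
        if v in env_winning:      # environment plays optimally: value 0
            return 0.0
        nxt = rng.choice(graph[v])  # random choice among the successors
        if nxt in path:             # a loop was closed: check if it wins
            return BETA ** len(path) if loop_wins(path, nxt) else 0.0
        path.append(nxt)
```

Averaging many such playouts from a leaf of the explicitly unfolded tree yields the Monte Carlo estimate assigned to that leaf.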

#### **4.4 Optimizations**

While MCTS makes approximation of the game tree value feasible, we added several further technical improvements to arrive at a practically viable method.

*SCC Decomposition.* We exploit the structure of the game by decomposing it into its strongly connected components (SCCs) and put them in reverse topological order. Computing (or approximating) the value in that order allows for caching: Once a run in the game tree leaves an SCC, it can only reach SCCs further down in the topological order, and, since we compute values in this order, the value of the reached state is already known, allowing us to re-use it immediately.
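As a concrete illustration, Tarjan's algorithm emits SCCs sinks-first, i.e. already in the reverse topological order needed for this caching scheme. A minimal sketch (the successor-map encoding of the game graph is our own illustrative choice):

```python
def tarjan_sccs(graph):
    """Tarjan's algorithm: SCCs are emitted sinks-first, i.e. in reverse
    topological order of the condensation, which is exactly the order in
    which values can be computed once and then cached."""
    index, low, on_stack, stack, out = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:   # v is the root of an SCC: pop it off
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            out.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return out

print(tarjan_sccs({"a": ["b"], "b": ["c"], "c": ["b"]}))  # sink SCC {b, c} first
```

Processing the components in the returned order guarantees that whenever a run leaves an SCC, the value of the state it reaches has already been computed.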

*Pruning.* In addition to employing the MCTS values as game values in the tree expansion, we also use them to prune the game tree. In particular, once we have computed the Monte Carlo values for each state, we restrict the choice of the environment to the successors which yield (close to) the lowest Monte Carlo value (recall that the environment prefers lower values). We empirically chose 0.02 as a threshold, i.e. we only keep those edges for the environment which are within 0.02 of the lowest value. While in theory this might remove crucial paths due to statistical fluctuations of MCTS, in practice it allows for a much deeper game tree, which in our experiments heavily outweighed the theoretical downside.

### **5 Handling the Truth**

We introduced a way to obtain a well-founded notion of "value" (to be precise, an approximation thereof) for a choice, i.e. an indication of how good this choice is. As such, we can rank edges by their value in each state. Intuitively, picking an edge which is ranked very highly should correspond to a good chance of winning. A high value means that even against an optimal player we can very likely close a winning loop, and, due to beta-decay, do so quickly, thus minimizing the chance for an error.

Recall that our goal is to provide a good initial strategy. Thus, the exact values are actually irrelevant, since we only want to give the best edge as initial choice. Instead of trying to predict the exact value, we therefore want to learn this relative ranking. Formally, suppose we consider a system vertex *v* ∈ *V*<sub>S</sub> with edges *E<sub>v</sub>* = {(*v, u*) | (*v, u*) ∈ *E*}. A ranking of edges effectively corresponds to a (total) order ≺<sub>*v*</sub> ⊆ *E<sub>v</sub>* × *E<sub>v</sub>*. The principle of *pairwise ranking* [30] suggests that we learn a function *f* : *E<sub>v</sub>* × *E<sub>v</sub>* → {−1, 1} that classifies pairs of edges depending on which one is the better choice, i.e. *f*(*e, e*′) = 1 if *e* ≺<sub>*v*</sub> *e*′ and −1 otherwise. However, such a function might not be perfect. For example, we could get *f*(*e*<sub>1</sub>*, e*<sub>2</sub>) = 1, *f*(*e*<sub>2</sub>*, e*<sub>3</sub>) = 1, and *f*(*e*<sub>3</sub>*, e*<sub>1</sub>) = 1, which is incompatible with any order. Thus, learning to rank suggests to determine an ordering ≺ that minimizes the *inversions* w.r.t. *f*, i.e. the number of pairs where *f*(*e, e*′) = 1 but *e*′ ≺<sub>*v*</sub> *e*. This problem, called rank aggregation, is known to be **NP**-hard, and we employ a greedy approximation as suggested by [30].
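A minimal sketch of such a greedy aggregation (the concrete selection rule here, repeatedly picking the edge preferred over the most remaining ones, is one simple variant, not necessarily the exact procedure of [30]):

```python
def greedy_rank(edges, f):
    """Greedily build a ranking that approximately minimizes inversions.

    f(e1, e2) returns 1 if e1 is judged the better choice, -1 otherwise
    (an illustrative convention for the pairwise classifier).
    """
    remaining, ranking = list(edges), []
    while remaining:
        # pick the edge that f prefers over the most remaining edges
        best = max(remaining,
                   key=lambda e: sum(f(e, o) for o in remaining if o != e))
        remaining.remove(best)
        ranking.append(best)
    return ranking

# With a consistent comparator, the true order is recovered:
f = lambda a, b: 1 if a < b else -1
print(greedy_rank([3, 1, 2], f))  # → [1, 2, 3]
```

With an inconsistent (cyclic) comparator, the greedy choice still produces some total order, which is the point of the approximation.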

Our concrete goal thus now is to learn such a function *f* based on the semantic labelling of the start and end vertices of the two edges. We want to employ machine learning for this purpose: While the high-level intuition of the semantic labelling is rather clear, the actual implementation used to obtain the games [24] employs numerous optimizations, separate cases, etc. To provide the reader with a sense of the complexity, we display a single edge in the automaton obtained for a simple formula in Fig. 4, and a real-world scenario in [23, Appendix A.1].


**Fig. 4.** A single transition in the automaton computed for the formula (*a*∧**G***b*)∨**GF***c*.

We proceed to describe (i) (some of) the features we use, i.e. which quantities we extract from the labelling, (ii) the model we employ, and (iii) the dataset and methodology used to train our model.

#### **5.1 Features**

In total, we have defined over 200 different features to convert the edges into a usable vector of reals. In the interest of space, we only present the high-level ideas of a small subset covering the most interesting ones.

Since most information is contained in the states rather than in the edges themselves, the majority of our features are defined for the former. An edge is then either associated with the feature value of its successor or with the change in a feature value between its predecessor and successor. As indicated in Fig. 4, the semantic labelling comprises several formulae, namely a "master" formula, which intuitively indicates the global state, and several "monitors" (which themselves comprise several formulae), monitoring repeating sub-goals. We define *base features*, which convert a single formula to a single number. These features can then be applied to both the master as well as monitor formulae, where further aggregation is necessary. Some notable base features are the following:


their trueness) and aggregate by weighted average (rather than maximum). Additionally, we introduce punishments for failing monitors. Intuitively, this encourages long-term progress for temporal goals.

**One Step** Here, the idea is to recommend an assignment to be played in the current state by traversing the syntax tree and propagating recommendations upwards, which is inspired by message passing in graph neural networks. For example, if we see *a* ∧ *b*, we strongly recommend playing *a* and *b*; if we see **F**(*a* ∧ *b*), we take the previous recommendation and tune it down, since **F** is "less urgent". The feature value is obtained by measuring how well the valuation of an edge aligns with the recommended assignment.

#### **5.2 Pair Classification by Support Vector Machines**

To instantiate our pair classification function *f*, we opt for support vector machines. In principle, one could employ any binary classifier, which is why we also experimented with other models such as decision trees, random forests, or gradient-boosted trees. However, SVMs proved to perform best, which we attribute to their great ability to generalize due to their margin-maximizing nature [30]. Additionally, SVMs are rather simple (compared to our other options) and provide us with extra information known as *confidence*. Given by the distance of the predicted sample to the decision hyperplane, its magnitude can be interpreted as how confident the SVM is in its prediction. We denote the confidence of a pair (*e*<sub>1</sub>*, e*<sub>2</sub>) by *c*(*e*<sub>1</sub>*, e*<sub>2</sub>) and use it to slightly alter the greedy ranking algorithm from the literature. To rank the edges of a vertex *v*, each edge *e* ∈ *E<sub>v</sub>* gets assigned a score *s*(*e*) = Σ<sub>*e*′ ∈ *E<sub>v</sub>*, *e*′ ≠ *e*</sub> *c*(*e, e*′). Recall that the confidence *c*(*e, e*′) is negative whenever *e* is predicted to be the worse choice. Finally, we rank the edges according to their score, where a higher score corresponds to a better edge, and the recommended strategy is obtained by playing the highest-ranked edge in each state.
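The confidence-based scoring and ranking can be sketched as follows; the pairwise confidence `c` stands in for the signed distance to the SVM hyperplane (in scikit-learn, the value returned by `decision_function`), and the example comparator is purely illustrative:

```python
def rank_by_confidence(edges, c):
    """Score each edge by summing pairwise confidences c(e, e') and rank
    descending; c is positive when e is predicted the better choice."""
    score = {e: sum(c(e, o) for o in edges if o != e) for e in edges}
    return sorted(edges, key=score.get, reverse=True)

# Illustrative confidence: here a lower edge id happens to be better.
ranking = rank_by_confidence([2, 0, 1], lambda a, b: b - a)
print(ranking)  # → [0, 1, 2]
```

The first element of the ranking is the edge recommended as the initial strategy choice for that state.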

#### **5.3 Further Notes on Implementation**

In addition to the feature extraction, there are several other engineering aspects, which are crucial for the final performance. In this section, we comment on the three most important ones.

*Statewise Feature Normalization.* Before passing the features to the model, we normalize them. Due to possible future applications in on-the-fly solvers, we only consider the feature values of edges from the same state for this normalization. The crucial observation is that this already introduces comparative information into the features. A normalized trueness value of 1, for example, means this edge has the best trueness among all edges from its state, although it does not tell us anything about the absolute value. While the latter might also be important in theory, we observed that in practice the statewise normalized value is more important, with only a few exceptions.
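As an illustration, a min-max variant of such statewise normalization (the paper does not specify the exact normalization scheme; this concrete choice is our assumption):

```python
def normalize_statewise(feature_values):
    """Min-max normalize one feature over the edges of a SINGLE state,
    so a value of 1 means 'best among this state's edges'."""
    lo, hi = min(feature_values), max(feature_values)
    if hi == lo:
        return [0.0] * len(feature_values)  # all edges tie on this feature
    return [(v - lo) / (hi - lo) for v in feature_values]

print(normalize_statewise([0.2, 0.5, 0.4]))  # the best edge gets 1.0
```

Since the normalization only uses values of edges from the same state, it remains applicable when the rest of the game has not been constructed yet.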

*State Classification.* We observed several significantly different behaviours required in different states. For example, in some states we need to exclusively focus on the master formula, while in others only the monitors play a role. This also relates to the underlying principles of the automaton construction. It is very difficult, especially for a simple model like an SVM, to switch between different behaviours. We divide states into three groups which approximate the different classes, and train separate models for each class. The three classes we suggest are (i) states without monitors, (ii) states where the master formula does not change in any successor, (iii) and states that fall into neither category. In addition to having the separate models learn separate behaviours, we can also provide them with separate feature sets that only include relevant information. For example, the first class only requires features of the master formula, whereas these can be neglected in the second one.

*Complement Construction.* The underlying automaton construction uses the fact that the system being able to enforce satisfaction of a formula *ϕ* is equivalent to the environment being able to enforce falsification of ¬*ϕ*. In other words, solving the game for the negated formula with swapped roles yields the same result. However, in the game obtained for ¬*ϕ* the role of "system", the player who chooses second and for whom we learnt the recommendation, i.e. for transitions from states (*p, v*) to *q*, now corresponds to the original environment. This drastically changes the meaning of features. For example, a trueness of 0 suddenly is very desirable. We tackle this by training separate models for both cases. Together with state classification, this yields a total of 6 different models that we assemble for our heuristic.

#### **5.4 Training the Model**

With these ideas at hand, we conclude this section by discussing our dataset, in particular how we preprocess it, and how we train our model.

*Dataset and Preprocessing.* As one of our goals is to exploit human bias in writing LTL formulae, the foundation of our dataset is given by the LTL benchmarks of SYNTCOMP.<sup>2</sup> To further augment the data, we mutate these formulae by randomly replacing temporal operators. This yields new (random) samples that syntactically resemble the original, human-written structure. For practical reasons, we only consider formulae which can be converted to a DPA within 10 min. Ultimately, this leaves us with 405 original and 514 mutated formulae, of which we use 60% each for training, 20% for validation, and 20% for evaluation.

Obtaining the edge pairs for training requires several further steps. First of all, we exclude trivial cases that can easily be detected by simple rules (see Sect. 4.3), allowing our model to focus on complicated cases. Further, we exclude pairs where the ground truth value happens to be equal, as it is unclear which edge the model should predict. In particular, we exclude all edges originating in losing

<sup>2</sup> Available on GitHub https://github.com/SYNTCOMP/benchmarks.

states (since there is no sensible action to recommend). Finally, we only include a limited number of pairs per game in the training set: pairs from the same game tend to look similar, so a few disproportionately large games would otherwise yield a very unbalanced dataset. All remaining edge pairs are added in both orders, i.e. ((*e*<sub>1</sub>*, e*<sub>2</sub>)*, y*) and ((*e*<sub>2</sub>*, e*<sub>1</sub>)*,* −*y*), where *y* ∈ {1*,* −1} indicates which edge is better, in order to encourage the model to learn symmetry.
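The symmetrization of the training pairs can be sketched as follows (the tuple encoding of a labelled pair is an illustrative assumption):

```python
def symmetrize(pairs):
    """Add each labelled pair in both orders, so the dataset contains
    ((e1, e2), y) and ((e2, e1), -y); this nudges the learnt classifier
    towards the symmetry f(e1, e2) = -f(e2, e1)."""
    out = []
    for (e1, e2), y in pairs:
        out.append(((e1, e2), y))
        out.append(((e2, e1), -y))
    return out

print(symmetrize([(("e1", "e2"), 1)]))
```

Without this step, a model trained only on one orientation of each pair could give inconsistent answers depending on argument order.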

*Training.* For each of the 6 models, we first compute the mean and standard deviation of the respective training set and use them to standardize the input to N(0, 1). Further, we perform recursive feature elimination for each state class individually, adapted to features appearing twice (once for each input edge). For each state class, we ended up with 30–40 features.
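A minimal stand-in for this standardization step (in the actual pipeline this corresponds to fitting a scaler such as scikit-learn's `StandardScaler` on the training set only; the data below is illustrative):

```python
from statistics import mean, pstdev

def fit_standardizer(train_rows):
    """Compute per-feature mean/std on the TRAINING set only; the same
    transform is then applied to validation and test data."""
    cols = list(zip(*train_rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant features
    return lambda row: [(x - m) / s for x, m, s in zip(row, mus, sds)]

standardize = fit_standardizer([[1.0, 10.0], [3.0, 30.0]])
print(standardize([1.0, 10.0]))  # → [-1.0, -1.0]
```

Fitting the scaler on the training split alone avoids leaking statistics of the evaluation data into the model.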

For the actual training process, we performed an extensive grid search for several model types (decision trees, random forests, etc., see Sect. 5.2) in order to determine suitable values for the hyper-parameters. As mentioned earlier, we ultimately opted for the SVMs due to their simplicity and generalization abilities.

#### **6 Experimental Evaluation**

In this section, we present the experimental evaluation of our tool SemML. The model was learnt by communicating the relevant data to a Python process running scikit-learn [35]. We then extracted the learnt weights and, based on them, implemented the recommendation procedure in Java, on top of Owl [24]. The artifact can be found at [1], which references a slightly improved version compared to the one we submitted to the artifact evaluation [2].

#### **6.1 Evaluation Goals**

Our primary goal in this work is to show that our approach, enabled by our new ground truth, can be used to solve more complicated instances than the approach of [22], in particular formulae going beyond pure (co-)safety. Thus, our first evaluation goal is the following:

*Research Question 1:* How much does our model based on SVM and the game tree ground truth outperform the trueness-based initial strategy recommendation approach of [22]?

We refer to the trueness-based initial strategy of [22] as TrueSI.

Although not the focus of this work, we ultimately want to improve synthesis through meaningful exploration guidance, in particular by suggesting likely winning edges. Thus, we are interested in how our prototype performs in a real-world scenario.

*Research Question 2:* How do initial strategies recommended by our approach synergize with state-of-the-art synthesis tools?

We address both questions separately.

#### **6.2 RQ1: Quality of Initial Strategy**

*Datasets.* To fairly compare to [22], we consider the same dataset, i.e. randomly generated LTL formulae, split into three categories: "(Co-)Safety", "Near (Co-)Safety", and "Parity". See [22] for details on how these are obtained. In essence, the tool randltl [7] is used to generate random formulae with different biases. Then, we filter out formulae which need more than 10 min to be translated into a parity automaton. As a second dataset, we also use some (original and mutated) SYNTCOMP formulae (the test set described in Sect. 5.4). We only consider formulae where the corresponding game can be won by the system. We do this simply because we can only give recommendations for games which are winning – otherwise there is no preference between edges, since every action is losing by definition. In total, this leaves 262 randomly generated formulae and 123 from SYNTCOMP.

*Metrics.* We consider two metrics for our comparison. Firstly, similar to [22], we consider the fraction of *immediately solved* games, i.e. games where following the actions recommended by SemML or TrueSI directly yields a winning strategy. In light of our motivation to augment SI solvers, we also want to measure how "close" the recommended strategy is to being correct in case it is not immediately winning. To this end, we feed it to (a modified version of) the parity game solver Oink [6] and compute the *(relative) distance* of the obtained strategy, as follows. We count the number of (reachable) states in which the winning strategy determined by Oink differs from the recommended one, i.e. how many "wrong" choices were recommended, and divide it by the total number of (reachable) states. We note that this unfortunately induces a slight bias that we cannot measure: Oink may potentially change winning decisions because of internal details of the algorithm. Ideally, we would want to obtain the minimal distance over all winning strategies; however, this quantity is intractable to compute due to the exponential size of the strategy space. Nevertheless, we believe that this measure strongly correlates with the quality of the strategy.
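The relative distance metric can be sketched as follows, assuming strategies are given as maps from states to chosen edges (an illustrative encoding):

```python
def relative_distance(recommended, winning, reachable):
    """Fraction of reachable states where the recommended edge differs
    from the winning strategy found by the solver (smaller is better)."""
    wrong = sum(1 for s in reachable if recommended[s] != winning[s])
    return wrong / len(reachable)

# Two of three reachable states were "wrong" in this toy example:
d = relative_distance({"a": 1, "b": 2, "c": 1},
                      {"a": 1, "b": 3, "c": 2},
                      ["a", "b", "c"])
print(d)
```

A distance of 0 corresponds to an immediately solved game; the geometric mean over the non-immediate instances is what Table 1 reports.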

We argue that simply measuring the number of iterations required by strategy iteration to converge is too crude a metric: On the one hand, even a "very wrong" strategy can be changed into a winning strategy in a single iteration by changing the choice in every single state. On the other hand, even a nearly correct strategy, requiring only a handful of changes, may need as many iterations. Moreover, this additionally induces the same bias as above.

**Table 1.** Summary of our comparison between TrueSI, the approach of [22], and our tool SemML. We first list the fraction of immediately winning strategies (larger is better), followed by the geometric mean of the relative distance, i.e. the fraction of states in which the decision was adapted by Oink to obtain a winning strategy (smaller is better). For the first comparison, we also consider random initialization as a baseline. For this second comparison to be fair, we only consider games where neither tool yielded an immediately winning strategy.


**Fig. 5.** A detailed comparison on SYNTCOMP formulae. The left plot compares how many games were immediately solved, grouped by size and considering the (arithmetic) mean in each group. SemML's values are displayed by crosses, TrueSI by circles. The right plot compares the relative distance of SemML's and TrueSI's solutions.

*Expectations.* Since our approach incorporates trueness as one of its many features, we expect that our approach should be at least on par with the previous one of [22]. As we also consider long-term temporal information beyond trueness, we particularly expect to outperform TrueSI on larger, more complicated instances.

*Results.* We ran this evaluation on consumer hardware (Intel Core i7-8565U with 16 GB RAM). We summarize our findings in Table 1. Clearly, our approach vastly outperforms the previous one. In particular, while TrueSI perfectly handles (co-)safety formulae, its performance quickly drops when going to more complicated formulae. In comparison, SemML solves the vast majority of formulae immediately, even on the quite complicated SYNTCOMP dataset. We note that these findings are not "absolute" (as is to be expected from machine learning approaches): there are a few instances where the previous approach does perform better. Our baseline comparison to a random initialization approach validates that both approaches indeed solve a non-trivial problem.

Since we are particularly interested in complex, "human written" formulae, we investigate the SYNTCOMP dataset more closely. In Fig. 5, we provide a more detailed view on our two metrics. First, we investigate how the "immediately solving" performance evolves in comparison to the size of the game, which intuitively correlates with the difficulty of the synthesis question. We observe that SemML solves practically all smaller games and still performs well on larger games, compared to TrueSI, which quickly falls off. The second plot displays the relative distances for each instance which neither recommendation solved immediately. We clearly see that the strategies recommended by SemML are better in almost all cases.

This positively answers our first question. Aside from the direct comparison to the previous approach, the significant percentage of immediately solved games has an interesting implication: If SemML solves many games immediately, we can use SemML as a best-effort guidance tool for reactive synthesis questions which are intractable to solve exactly. Moreover, SemML thus presents us with a constant-size representation of a winning strategy for many games, effectively described by a few hundred SVM weights, compared to a decision table for thousands of states in *each* game.

#### **6.3 RQ2: On-the-fly SemML**

In our second experiment, we evaluate the suitability of SemML for real-world parity game solving by using it as guidance tool for the state-of-the-art reactive synthesis tool Strix [33].

*Strix' Anatomy.* We first briefly describe how Strix works and how it uses guidance heuristics. In essence, Strix builds the parity game on-the-fly, i.e. iteratively constructs the parts of the game it deems important. Then, two strategy improvement instances run in parallel, one for either player. States which are not yet explored are treated as losing for both. In this way, if we find a winning strategy for either player on the constructed part of the game, it is winning for the complete game. Otherwise, we need to explore further. Here, a key ingredient for practical efficiency is a heuristic to decide which states should be explored first: If we explore states reachable under the "smallest" winning strategy, we naturally find this strategy as quickly as possible. In its current form, Strix employs trueness for this guidance and selects an *automaton* edge with the *globally* highest trueness for exploration. (Dually, edges with the lowest trueness are also followed, since these are "promising" for the environment.)

*Integration.* We integrate SemML with Strix as follows. Suppose we are asked to compute a global score for an automaton edge *e* = (*p, q*) (recall that SemML gives *local* advice on edges in the *game*). We explicitly build up the game between the automaton states *p* and *q*, i.e. all choices of the environment in *p* followed by the respective system choices. For each occurring system state *s*, we compute the SemML ranking score as explained in Sect. 5.2, i.e. the confidence-based score. This only gives us local information: the magnitude of our score only reflects the preference relative to the actions available in the system state *s* = (*p, v*). Since the previously used trueness proved to be a good indicator of global progress, we multiply our local score by this global value. Finally, to obtain a value for the automaton edge, we take the minimal value over all arising system states, since the environment chooses first. We additionally apply straightforward rules such as assigning values of 0 and 1 to ff and tt states, respectively. Finally, Strix by default employs a decomposition approach, which does not build a single DPA. Since SemML would then not be applicable, we disable this decomposition for the purpose of evaluation.
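This combination can be sketched as follows; all function parameters are illustrative stand-ins for the actual Strix/Owl interfaces, not their real API:

```python
def automaton_edge_score(system_states, local_score, trueness, is_tt, is_ff):
    """Score an automaton edge (p, q) from the game states between p and q.

    local_score(s): SemML's confidence-based score for system state s;
    trueness(s):    the global trueness indicator for s;
    is_tt / is_ff:  predicates for trivially winning / losing states.
    The minimum over system states reflects that the environment moves first.
    """
    def state_value(s):
        if is_tt(s):
            return 1.0
        if is_ff(s):
            return 0.0
        return local_score(s) * trueness(s)  # local advice scaled globally

    return min(state_value(s) for s in system_states)
```

For example, with two intermediate system states the edge inherits the value of the one the environment would steer towards, i.e. the minimum of the combined scores.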

*Dataset.* We considered 188 randomly selected formulae of SYNTCOMP (which were not used in the training of the model), also including unrealizable ones.

*Metrics.* We evaluate the total required time to solve the game and compare to Strix in its normal configuration. Since we expect the unoptimized computation of SemML's advice to take considerable time, we separately measure the required time and additionally perform a comparison with this time subtracted. Since our scoring function is a straightforward SVM, we strongly believe that by tailoring the evaluation to Strix' requirements, it can be significantly sped up. In particular, our advice computation re-constructs information which is computed during the exploration of the automaton but difficult to access without significant changes to both Strix and Owl.

*Expectations.* We do not expect this approach to work to its full potential, because Strix' architecture does not exactly fit our approach (recall that our primary motivation was to compare to [22]). We discuss these differences and possible ways to address them later. Moreover, as we construct the intermediate game states for every recommendation and evaluate the recommender SVM several times, we expect that significant time is spent computing the advice of SemML.

*Results.* We conducted our experiments on a server with an Intel Xeon E5-2630 v4 processor with 256GiB of RAM and employed a 10 min timeout per execution. We summarize our findings in Fig. 6. Strikingly, our approach already performs favourably, despite the differences in architecture, hardly optimized advice computation, and no specific re-training for the task at hand. Excluding the time spent for advice computation, our approach performs significantly better in practically all instances. This answers our second question positively, too.

**Adapting SemML to Strix** In order to adapt our underlying approach, we require several non-trivial changes to SemML. We discuss the "mismatches" between Strix and the current approach, and how they could be addressed. First, Strix selects a globally optimal edge to explore while SemML suggests actions locally. In particular,

**Fig. 6.** Scatter plot comparing Strix with guidance provided by SemML and the default trueness. On the left, we depict the total runtime excluding the time spent for computing the guidance, and on the right we show the total time. We plot all instances for which at least one method produced a result and count timeouts as 20 min (twice the timeout of 10 min). Note that the plot is logarithmic. The dashed lines denote a 10x difference.

our scoring is not trained to compare edges of two different states. While trueness seems to be a good compromise for the time being, we believe that (with significant engineering effort) Strix can be modified to accommodate local recommendations, or, alternatively, a more sophisticated indicator of a state's global relevance can be learnt. Second, Strix performs two searches, one for the environment and one for the system player. However, the parity games we deal with are not entirely symmetric – the environment always moves first. Thus, we cannot directly apply SemML's ranking to environment states, as they have a different structure. Here, we believe that the best solution is to train a separate model for the environment (or rather, six further models). Thirdly, Strix only constructs the automaton explicitly and computes the game implicitly. As such, Strix requests scoring information only for edges in the automaton and not in the game. This can be addressed by closely integrating the scoring computation with the exploration of the automaton – instead of rebuilding the game for each edge (*p, q*), we can compute all scores for all outgoing edges of *p* at once. Finally, as we mentioned, Strix by default applies a decomposition approach which builds several sub-automata. These are also equipped with semantic labelling, however with a different meaning – enough to create a significant hurdle for our learning approach. We note that Strix actually builds automata by communicating with Owl through a highly optimized interface between Java and C++, significantly complicating passing information back and forth between the processes.

### **7 Conclusion**

We demonstrated that semantic labelling can be exploited for practical gains in LTL synthesis. Our experimental evaluation shows that we vastly outperform the simple approach of [22], the first step in this direction. Moreover, despite several mismatches, our approach shows promising results for real-world applications of this idea, i.e. when combined with the state-of-the-art tool Strix.

*Future Work.* As discussed above, the main point for future work is a tight, tailored integration with Strix. In particular, we want to modify our approach to be applicable to the decomposition methods of Strix, modify Strix to consider local guidance, and actually learn for the precise task required by Strix.

Aside from this, we believe that there might be further interesting features (hand-crafted or learnt) which could provide us with additional insights. In particular, we want to employ automated feature extraction, through more sophisticated model architectures such as *transformers* or *graph neural networks*.

### **References**


414 J. Křetínský et al.

43. Vardi, M.Y., Wolper, P.: An automata-theoretic approach to automatic program verification (preliminary report). In: Proceedings of the Symposium on Logic in Computer Science (LICS 1986), Cambridge, Massachusetts, 16–18 June 1986, pp. 332–344. IEEE Computer Society (1986)

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Policy Synthesis and Reinforcement Learning for Discounted LTL

Rajeev Alur<sup>1</sup>, Osbert Bastani<sup>1</sup>, Kishor Jothimurugan<sup>1</sup>, Mateo Perez<sup>2(B)</sup>, Fabio Somenzi<sup>2</sup>, and Ashutosh Trivedi<sup>2</sup>

<sup>1</sup> University of Pennsylvania, Philadelphia, PA, USA
{alur,obastani,kishor}@seas.upenn.edu
<sup>2</sup> University of Colorado Boulder, Boulder, USA
{mateo.perez,fabio,ashutosh.trivedi}@colorado.edu

Abstract. The difficulty of manually specifying reward functions has led to an interest in using linear temporal logic (LTL) to express objectives for reinforcement learning (RL). However, LTL has the downside that it is sensitive to small perturbations in the transition probabilities, which prevents probably approximately correct (PAC) learning without additional assumptions. Time discounting provides a way of removing this sensitivity, while retaining the high expressivity of the logic. We study the use of discounted LTL for policy synthesis in Markov decision processes with unknown transition probabilities, and show how to reduce discounted LTL to discounted-sum reward via a reward machine when all discount factors are identical.

### 1 Introduction

Reinforcement learning [39] (RL) is a sampling-based approach to synthesis in systems with unknown dynamics where an agent seeks to maximize its accumulated reward. This reward is typically a real-valued feedback that the agent receives on the quality of its behavior at each step. However, designing a reward function that captures the user's intent can be tedious and error-prone, and misspecified rewards can lead to undesired behavior, called *reward hacking* [5].

Due to the aforementioned difficulty, recent research [8,17,23,31,35] has shown interest in utilizing high-level logical specifications, particularly linear temporal logic [7] (LTL), to express intent. However, a significant challenge arises because LTL, like other infinite-horizon objectives such as average reward and safety, is sensitive to small changes in transition probabilities: even slight modifications can have a significant impact on the value, for example by making previously unreachable states reachable. Without additional information about the transition probabilities, such as the minimum nonzero transition probability, LTL has been shown not to be probably approximately correct (PAC) [29] learnable [3,43]. Ideally, it is desirable to maintain PAC learnability while still keeping the benefits of a highly expressive temporal logic.

This research was partially supported by ONR award N00014-20-1-2115, NSF grant CCF-2009022, and NSF CAREER award CCF-2146563.

© The Author(s) 2023

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 415–435, 2023. https://doi.org/10.1007/978-3-031-37706-8_21

Fig. 1. Example showing non-robustness of safety specifications.

Discounting can serve as a solution to this problem. Typically, discounting is used to encode time-sensitive rewards (i.e., a payoff is worth more today than tomorrow), but it has a useful secondary effect that payoffs received in the distant future have small impact on the accumulated reward today. This insensitivity enables PAC learning without requiring any prior knowledge of the transition probabilities. In RL, discounted reward is commonly used and has numerous associated PAC learning algorithms [29].

In this work, we examine the discounted LTL of [2] for policy synthesis in Markov decision processes (MDPs) with unknown transition probabilities. We refer to such MDPs as "unknown MDPs" throughout the paper. This logic maintains the syntax of LTL, but discounts the temporal operators. Discounted LTL gives a quantitative preference to traces that satisfy the objective sooner, and those that delay failure as long as possible. The authors of [2] examined discounted LTL in the model checking setting. Exploring policy synthesis and learnability for discounted LTL specifications is novel to this paper.

To illustrate how discounting affects learnability, consider the example MDP [32] shown in Fig. 1. It consists of a safe state s<sub>0</sub>, two sink states s<sub>1</sub> and s<sub>2</sub>, and two actions a<sub>1</sub>, a<sub>2</sub>. Taking action a<sub>i</sub> in s<sub>0</sub> leads to a sink state with probability p<sub>i</sub> and stays in s<sub>0</sub> with probability 1 − p<sub>i</sub>. Suppose we are interested in learning a policy that ensures the system always stays in the state s<sub>0</sub>. Now consider two scenarios: one in which p<sub>1</sub> = 0 and p<sub>2</sub> = δ, and another in which p<sub>2</sub> = 0 and p<sub>1</sub> = δ, where δ > 0 is a small positive value. In the former case, the optimal policy is to always choose a<sub>1</sub> in s<sub>0</sub>; in the latter case, we need to always choose a<sub>2</sub> in s<sub>0</sub>. Furthermore, it can be shown that a near-optimal policy in one case is not near-optimal in the other. However, without knowledge of δ, we cannot bound the number of samples needed to distinguish between the two cases (with high probability). In contrast, the time-discounted semantics of the safety property evaluates to 1 − λ<sup>k</sup>, where k is the number of time steps spent in the state s<sub>0</sub>. Then, for sufficiently small δ, any policy achieves a high value w.r.t. the discounted safety property in both scenarios. In general, small changes to the transition probabilities do not have drastic effects on the nature of near-optimal policies for discounted interpretations of LTL properties.

*Contributions.* Table 1 summarizes results of this paper in the context of known results regarding policy synthesis for various classes of specifications. We consider three key properties of specifications, namely, (1) whether there is a finite-state optimal policy and whether there are known algorithms for (2) computing an optimal policy when the MDP is known, as well as for (3) learning a near-optimal


Table 1. Policy synthesis in MDPs for different classes of specifications.

policy when the transition probabilities are unknown (without additional assumptions). The classes of specifications include reward machines with discounted-sum rewards [24], linear temporal logic (LTL) [7], discounted LTL and a variant of discounted LTL in which all discount factors are identical, which we call *uniformly* discounted LTL. In this paper, we show the following.


*Related Work.* Linear temporal logic (LTL) is a popular and expressive formalism to unambiguously express qualitative safety and progress requirements of Kripke structures and MDPs [7]. The standard approach to model check LTL formulas against MDPs is the *automata-theoretic* approach, where the LTL formulas are first translated to a class of good-for-MDP automata [20], such as limit-deterministic Büchi automata [18,36,37,40], and then efficient graph-theoretic techniques (computing accepting end-components and then maximizing the probability of reaching states in such components) [13,30,40] over the product of the automaton with the MDP can be used to compute optimal satisfaction probabilities and strategies. Since LTL formulas can be translated into (deterministic) automata in doubly exponential time, the probabilistic model checking problem is in 2EXPTIME, with a matching lower bound [11].

Several variants of LTL have been proposed that provide discounted temporal modalities. De Alfaro et al. [15] proposed an extension of μ-calculus with discounting and showed [14] the decidability of model-checking over finite MDPs. Mandrali [33] introduced discounting in LTL by taking a discounted sum interpretation of logic over a trace. Littman et al. [32] proposed geometric LTL as a logic to express learning objectives in RL. However, this logic has unclear semantics for nesting operators. Discounted LTL was proposed by Almagor, Boker, and Kupferman [2], which considers discounting without accumulation. The decidability of the policy synthesis problem for discounted LTL against MDPs is an open problem.

An alternative approach to discounting that ensures PAC learnability is to introduce a fixed time horizon, along with a temporal logic for finite traces. In this setting, the logic LTL<sub>f</sub> [10,16] is the most popular. Using LTL<sub>f</sub> with a finite horizon yields simple algorithms [41], since finite automata suffice for checking properties, but at the expense of the expressivity of the logic: for example, the formulas **GF**p and **FG**p both simply mean that p holds at the end of the trace.

There has been a lot of recent work on reinforcement learning from temporal specifications [1,9,16,19,21,22,24–28,31,32,42,44]. Such approaches often lack strong convergence guarantees. Some methods have been developed to reduce LTL properties to discounted-sum rewards [8,19] while preserving optimal policies; however, they rely on knowledge of certain parameters that depend on the transition probabilities of the unknown MDP. Recent work [3,32,43] has shown that PAC algorithms that do not depend on the transition probabilities do not exist for the class of LTL specifications. There has also been work on learning algorithms for LTL specifications that provide guarantees when additional information about the MDP (e.g., the smallest nonzero transition probability) is available [6,12,17].

### 2 Problem Definition

An alphabet Σ is a finite set of letters. A finite word (resp. ω-word) over Σ is defined as a finite sequence (resp. ω-sequence) of letters from Σ. We write Σ<sup>∗</sup> and Σ<sup>ω</sup> for the set of finite and ω-words over Σ.

A *probability distribution* over a finite set S is a function d : S → [0, 1] such that ∑<sub>s∈S</sub> d(s) = 1. Let D(S) denote the set of all discrete distributions over S.

*Markov Decision Processes.* A Markov Decision Process (MDP) is a tuple M = (S, A, s<sub>0</sub>, P), where S is a finite set of states, s<sub>0</sub> is the initial state, A is a finite set of actions, and P : S × A → D(S) is the transition probability function. An *infinite run* ψ ∈ (S×A)<sup>ω</sup> is a sequence ψ = s<sub>0</sub>a<sub>0</sub>s<sub>1</sub>a<sub>1</sub> ..., where s<sub>i</sub> ∈ S and a<sub>i</sub> ∈ A for all i ∈ Z<sub>≥0</sub>. For any run ψ and any i ≤ j, we let ψ<sub>i:j</sub> denote the subsequence s<sub>i</sub>a<sub>i</sub>s<sub>i+1</sub>a<sub>i+1</sub> ... a<sub>j−1</sub>s<sub>j</sub>. Similarly, a *finite run* h ∈ (S×A)<sup>∗</sup>×S is a finite sequence h = s<sub>0</sub>a<sub>0</sub>s<sub>1</sub>a<sub>1</sub> ... a<sub>t−1</sub>s<sub>t</sub>. We use Z(S, A) = (S×A)<sup>ω</sup> and Z<sub>f</sub>(S, A) = (S×A)<sup>∗</sup>×S to denote the sets of infinite and finite runs, respectively.

A policy π : Z<sub>f</sub>(S, A) → D(A) maps a finite run h ∈ Z<sub>f</sub>(S, A) to a distribution π(h) over actions. We denote by Π(S, A) the set of all such policies. A policy π is deterministic if, for all finite runs h ∈ Z<sub>f</sub>(S, A), there is an action a ∈ A with π(h)(a) = 1.

Given a finite run h = s<sub>0</sub>a<sub>0</sub> ... a<sub>t−1</sub>s<sub>t</sub>, the *cylinder* of h, denoted by Cyl(h), is the set of all infinite runs with prefix h. Given an MDP M and a policy π ∈ Π(S, A), we define the probability of the cylinder set by D<sup>M</sup><sub>π</sub>(Cyl(h)) = ∏<sub>i=0</sub><sup>t−1</sup> π(h<sub>0:i</sub>)(a<sub>i</sub>)P(s<sub>i</sub>, a<sub>i</sub>, s<sub>i+1</sub>). It is known that D<sup>M</sup><sub>π</sub> can be uniquely extended to a probability measure over the σ-algebra generated by all cylinder sets. Let P be a finite set of atomic propositions and Σ = 2<sup>P</sup> denote the set of all valuations of propositions in P. An infinite word ρ ∈ Σ<sup>ω</sup> is a map ρ : Z<sub>≥0</sub> → Σ.

Definition 1 (Discounted LTL). *Given a set of atomic propositions* P*,* discounted LTL *formulas over* P *are given by the grammar*

$$\varphi := b \in \mathcal{P} \mid \neg \varphi \mid \varphi \vee \varphi \mid \mathbf{X}_{\lambda}\, \varphi \mid \varphi\, \mathbf{U}_{\lambda}\, \varphi$$

*where* λ ∈ [0, 1)*. Note that, in general, different temporal operators within the same formula may have different discount factors* λ*. For a formula* ϕ *and a word* ρ = σ<sub>0</sub>σ<sub>1</sub> ... ∈ (2<sup>P</sup>)<sup>ω</sup>*, the semantics* [ϕ, ρ] ∈ [0, 1] *is given by*

$$\begin{aligned} \left[b,\rho\right] &= \mathbb{1}\left(b \in \sigma_{0}\right) \\ \left[\neg\varphi,\rho\right] &= 1 - \left[\varphi,\rho\right] \\ \left[\varphi_{1}\vee\varphi_{2},\rho\right] &= \max\left\{\left[\varphi_{1},\rho\right],\left[\varphi_{2},\rho\right]\right\} \\ \left[\mathbf{X}_{\lambda}\varphi,\rho\right] &= \lambda \cdot \left[\varphi,\rho_{1:\infty}\right] \\ \left[\varphi_{1}\,\mathbf{U}_{\lambda}\,\varphi_{2},\rho\right] &= \sup_{i\geq 0}\left\{\min\left\{\lambda^{i}\left[\varphi_{2},\rho_{i:\infty}\right],\ \min_{0\leq j<i}\lambda^{j}\left[\varphi_{1},\rho_{j:\infty}\right]\right\}\right\} \end{aligned}$$

*where* ρi:<sup>∞</sup> = σiσi+1 ... *denotes the infinite word starting at position* i*.*

Conjunction is defined using ϕ<sub>1</sub> ∧ ϕ<sub>2</sub> = ¬(¬ϕ<sub>1</sub> ∨ ¬ϕ<sub>2</sub>). We use **F**<sub>λ</sub>ϕ = true **U**<sub>λ</sub>ϕ and **G**<sub>λ</sub>ϕ = ¬**F**<sub>λ</sub>¬ϕ to denote the discounted versions of the *finally* and *globally* operators, respectively. Note that when all discount factors equal 1, the semantics corresponds to the usual semantics of LTL.

For this paper, we consider the case of strict discounting, where λ < 1. We refer to the case where the discount factor is the same for all temporal operators as *uniform discounting*. Our definition differs from [2] in two ways: 1) we discount the next operator, and 2) we enforce strict, exponential discounting.

*Example Discounted LTL Specifications.* To develop an intuition of the semantics of discounted LTL, we now present a few example formulas and their meaning.


*Policy Synthesis Problem.* Given an MDP M = (S, A, s0, P), we assume that we have access to a *labelling function* L : S → Σ that maps each state to the set of propositions that hold true in that state. Given any run ψ = s0a0s1a<sup>1</sup> ... we can define an infinite word L(ψ) = L(s0)L(s1)... that denotes the corresponding sequence of labels. Given a policy π for M, we define the value of π with respect to a discounted LTL formula ϕ as

$$\mathcal{J}^{\mathcal{M}}(\pi,\varphi) = \underset{\rho \sim \mathcal{D}_{\pi}^{\mathcal{M}}}{\mathbb{E}}\left[\varphi,\rho\right] \tag{1}$$

and the optimal value for M with respect to ϕ as J<sup>∗</sup>(M, ϕ) = sup<sub>π</sub> J<sup>M</sup>(π, ϕ). We say that a policy π is optimal for ϕ if J<sup>M</sup>(π, ϕ) = J<sup>∗</sup>(M, ϕ). Let Π<sub>opt</sub>(M, ϕ) denote the set of optimal policies. Given an MDP M, a labelling function L and a discounted LTL formula ϕ, the policy synthesis problem is to compute an optimal policy π ∈ Π<sub>opt</sub>(M, ϕ) when one exists.

*Reinforcement Learning Problem.* In reinforcement learning, the transition probabilities P are unknown. Therefore, we need to interact with the environment to learn a policy for a given specification. In this case, it is sufficient to learn an ε-optimal policy π that satisfies J<sup>M</sup>(π, ϕ) ≥ J<sup>∗</sup>(M, ϕ) − ε. We use Π<sup>ε</sup><sub>opt</sub>(M, ϕ) to denote the set of ε-optimal policies. Formally, a learning algorithm A is an iterative process which, in every iteration n, (i) takes a step in M from the current state, (ii) outputs a policy π<sub>n</sub> and (iii) optionally resets the current state to s<sub>0</sub>. We are interested in probably approximately correct (PAC) learning algorithms.

Definition 2 (PAC-MDP). *A learning algorithm* A *is said to be PAC-MDP for a class of specifications* C *if there is a function* η *such that for any* p > 0*,* ε > 0*, MDP* M = (S, A, s<sub>0</sub>, P)*, labelling function* L*, and specification* ϕ ∈ C*, taking* N = η(|S|, |A|, |ϕ|, 1/p, 1/ε)*, with probability at least* 1 − p*, we have*

$$\left| \left\{ n \mid \pi_n \notin \Pi^{\varepsilon}_{opt}(\mathcal{M}, \varphi) \right\} \right| \le N.$$

It has been shown that there is no PAC-MDP algorithm for LTL specifications. Therefore, we are interested in the class of discounted LTL specifications that are strictly discounted, i.e., λ < 1 for every temporal operator.

### 3 Properties of Discounted LTL

In this section, we discuss important properties of discounted LTL regarding the nature of optimal policies. We first show that, under uniform discounting, the amount of memory required for the optimal policy may increase with the discount factor. We then show that, in general, allowing multiple discount factors may result in optimal policies requiring infinite memory. This motivates our restriction to the uniform discounting case in Sect. 4. We end this section by introducing a PAC learning algorithm for discounted LTL.

#### 3.1 Nature of Optimal Policies

It is known that for any (undiscounted) LTL formula ϕ and any MDP M, there exists a *finite-memory* policy that is optimal, i.e., the policy stores only a finite amount of information about the history. Formally, given an MDP M = (S, A, s<sub>0</sub>, P), a finite-memory policy π = (M, δ<sub>M</sub>, μ, m<sub>0</sub>) consists of a finite set of memory states M, a transition function δ<sub>M</sub> : M × S × A → M and an action function μ : M × S → D(A). Given a finite run h = s<sub>0</sub>a<sub>0</sub> ... s<sub>t</sub> = h′s<sub>t</sub>, the policy's action is sampled from μ(δ<sub>M</sub>(m<sub>0</sub>, h′), s<sub>t</sub>), where δ<sub>M</sub> is also used to denote the transition function extended to sequences of state-action pairs. We use Π<sub>f</sub>(S, A) to denote the set of finite-memory policies. In this paper, we will show that uniformly discounted LTL admits finite-memory optimal policies, but that infinite memory may be required in the general case.

Unlike (undiscounted) LTL, discounted LTL allows a notion of satisfaction quality. In discounted LTL, traces which satisfy a reachability objective sooner are given a higher value, and are thus preferred. If an LTL formula cannot be satisfied, the corresponding discounted LTL formula will assign higher values to traces which delay failure as long as possible. These properties of discounted LTL are desirable for enabling notions of promptness, but may yield more complex strategies which try to balance the values of multiple competing subformulas.

*Example 1.* Consider the discounted LTL formula ϕ = **G**<sub>λ</sub>p ∧ **F**<sub>λ</sub>¬p. This formula contains two competing objectives that cannot both be completely satisfied. Increasing the value of **G**<sub>λ</sub>p by increasing the number of p's at the beginning of the trace before the first ¬p decreases the value of **F**<sub>λ</sub>¬p. Under the semantics of conjunction, the value of ϕ is the minimum of the two subformulas. Specifically, the value of ϕ w.r.t. a word ρ is

$$\begin{aligned} \left[\mathbf{G}_{\lambda}p \wedge \mathbf{F}_{\lambda}\neg p, \rho\right] &= \left[\neg \mathbf{F}_{\lambda}\neg p \wedge \mathbf{F}_{\lambda}\neg p, \rho\right] \\ &= \left[\neg (\mathbf{F}_{\lambda}\neg p \vee \neg \mathbf{F}_{\lambda}\neg p), \rho\right] \\ &= 1 - \max\left\{\left[\mathbf{F}_{\lambda}\neg p, \rho\right], \left[\neg \mathbf{F}_{\lambda}\neg p, \rho\right]\right\} \\ &= 1 - \max\left\{\sup_{i\geq 0} \{\lambda^{i}\left[\neg p, \rho_{i:\infty}\right]\},\ 1 - \sup_{i\geq 0} \{\lambda^{i}\left[\neg p, \rho_{i:\infty}\right]\}\right\}, \end{aligned}$$

where ρ<sub>i:∞</sub> is the trace starting from index i. Now consider a deterministic MDP with two states S = {s<sub>1</sub>, s<sub>2</sub>} and two actions A = {a<sub>1</sub>, a<sub>2</sub>}, in which the agent can decide to either stay in s<sub>1</sub> or move to s<sub>2</sub> at any step, and the system stays in s<sub>2</sub> upon reaching s<sub>2</sub>. This MDP can be seen in Fig. 2. We have one proposition p, which holds in state s<sub>1</sub> and not in s<sub>2</sub>. Note that all runs produced by the example MDP are either of the form s<sub>1</sub><sup>ω</sup> or s<sub>1</sub><sup>k</sup>s<sub>2</sub><sup>ω</sup>. The discounted LTL value of runs of the form s<sub>1</sub><sup>ω</sup> is 0. The value of runs of the form ψ = s<sub>1</sub><sup>k</sup>s<sub>2</sub><sup>ω</sup> is

$$v(k) = \left[\varphi, L(\psi)\right] = 1 - \max\{\lambda^k, 1 - \lambda^k\}.$$

A finite-memory policy that stays in s<sub>1</sub> for k steps will yield this value. Since λ<sup>k</sup> is decreasing in k and 1 − λ<sup>k</sup> is increasing in k, the integer value of k that maximizes v(k) lies in the interval [γ − 1, γ + 1], where γ ∈ R satisfies λ<sup>γ</sup> = 1 − λ<sup>γ</sup>. Figure 2 shows this graphically. We have that γ = log(0.5)/log(λ), which is increasing in λ. Hence, the amount of memory required increases with λ.
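The dependence of the optimal switching time on λ can be checked numerically. The sketch below (function names ours) maximizes v(k) = 1 − max{λ<sup>k</sup>, 1 − λ<sup>k</sup>} by brute force and confirms the maximizer lies in [γ − 1, γ + 1].

```python
import math

def v(lam, k):
    # Value of staying in s1 for k steps and then moving to s2 (Example 1).
    return 1 - max(lam ** k, 1 - lam ** k)

def best_k(lam, horizon=10_000):
    # Brute-force argmax of v over a long finite horizon.
    return max(range(horizon), key=lambda k: v(lam, k))
```

For λ = 0.5, 0.9, 0.99 the maximizer is k = 1, 7, 69, tracking γ = log(0.5)/log(λ) ≈ 1.0, 6.6, 69.0: the memory needed grows without bound as λ → 1.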

Fig. 2. An example showing that memory requirements for optimal policies may depend on the discount factor. The red line is λ<sup>k</sup>, the blue line is 1 − λ<sup>k</sup>, and the solid black line is v(k) = 1 − max{λ<sup>k</sup>, 1 − λ<sup>k</sup>}, where k is the number of time steps one remains in s<sub>1</sub>. The dashed vertical line shows the value γ where v(k) is maximized. We have set λ = 0.99. Note that changing the value of λ corresponds to rescaling the x-axis. (Color figure online)

The optimal strategy in the example above tries to balance the values of two competing subformulas. We will now show that extending this idea to the general case of multiple discount factors requires balancing quantities that decay at different speeds. This balancing may require remembering an arbitrarily long history of the trace, i.e., infinite memory is required.

Theorem 1. *There exists an MDP* M = (S, A, s<sub>0</sub>, P)*, a labelling function* L *and a discounted LTL formula* ϕ *such that for all* π ∈ Π<sub>f</sub>(S, A) *we have* J<sup>M</sup>(π, ϕ) < J<sup>∗</sup>(M, ϕ)*.*

*Proof.* Consider the MDP M depicted in Fig. 3. It consists of three states S = {s<sub>0</sub>, s<sub>1</sub>, s<sub>2</sub>} and two actions A = {a<sub>1</sub>, a<sub>2</sub>}. The edges are labelled with actions and the corresponding transition probabilities. There are two propositions P = {p<sub>1</sub>, p<sub>2</sub>}; p<sub>1</sub> holds true in state s<sub>1</sub> and p<sub>2</sub> holds true in state s<sub>2</sub>. The specification is given by ϕ = **F**<sub>λ1</sub>**G**<sub>λ2</sub>p<sub>1</sub> ∧ **F**<sub>λ2</sub>p<sub>2</sub>, where λ<sub>1</sub> < λ<sub>2</sub> < 1.

Fig. 3. The need for infinite memory for achieving optimality in discounted LTL.

For any run ψ that never visits s<sub>2</sub>, we have [ϕ, L(ψ)] = 0 since [**F**<sub>λ2</sub>p<sub>2</sub>, L(ψ)] = 0. Otherwise the run has the form ψ = s<sub>0</sub><sup>k0</sup>s<sub>1</sub><sup>k1</sup>s<sub>2</sub><sup>ω</sup>, where k<sub>0</sub> is stochastic and k<sub>1</sub> is a strategic choice by the agent. To show that playing optimally requires an infinite amount of memory, one just has to show that the optimal choice of k<sub>1</sub> increases with k<sub>0</sub>. This means that the agent must remember k<sub>0</sub>, the number of steps spent in the initial state, via an unbounded counter. Note that every value of k<sub>0</sub> has a non-zero probability in M, and therefore choosing a suboptimal k<sub>1</sub> for even a single value of k<sub>0</sub> causes a decrease in value relative to the policy that always chooses the optimal k<sub>1</sub>.

The value of the run ψ is [ϕ, L(ψ)] = min(λ<sub>1</sub><sup>k0</sup>(1 − λ<sub>2</sub><sup>k1</sup>), λ<sub>2</sub><sup>k0+k1</sup>). Note that λ<sub>1</sub><sup>k0</sup>(1 − λ<sub>2</sub><sup>k1</sup>) increases with k<sub>1</sub> and λ<sub>2</sub><sup>k0+k1</sup> decreases with k<sub>1</sub>. Therefore, taking γ ∈ R to be such that λ<sub>1</sub><sup>k0</sup>(1 − λ<sub>2</sub><sup>γ</sup>) = λ<sub>2</sub><sup>k0+γ</sup>, the optimal choice of k<sub>1</sub> lies in the interval [γ − 1, γ + 1]. Now γ satisfies 1 − λ<sub>2</sub><sup>γ</sup> = (λ<sub>2</sub>/λ<sub>1</sub>)<sup>k0</sup>λ<sub>2</sub><sup>γ</sup>. Since λ<sub>1</sub> < λ<sub>2</sub> < 1, the factor (λ<sub>2</sub>/λ<sub>1</sub>)<sup>k0</sup> grows with k<sub>0</sub>, forcing λ<sub>2</sub><sup>γ</sup> to decrease, so γ increases with k<sub>0</sub>. Therefore, k<sub>1</sub> also increases with k<sub>0</sub>.
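The unbounded-counter argument can be illustrated numerically. In the sketch below (our own, with the illustrative choice λ<sub>1</sub> = 0.5, λ<sub>2</sub> = 0.9), the brute-force optimal k<sub>1</sub> grows strictly with k<sub>0</sub>.

```python
def run_value(l1, l2, k0, k1):
    # [phi, L(psi)] for psi = s0^{k0} s1^{k1} s2^omega (proof of Theorem 1).
    return min(l1 ** k0 * (1 - l2 ** k1), l2 ** (k0 + k1))

def best_k1(k0, l1=0.5, l2=0.9, horizon=500):
    # Brute-force optimal number of steps to remain in s1, given k0 steps in s0.
    return max(range(horizon), key=lambda k1: run_value(l1, l2, k0, k1))
```

Since the optimal k<sub>1</sub> keeps growing with k<sub>0</sub>, no bounded counter over the history suffices.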

#### 3.2 PAC Learning

In the above discussion, we showed that one might need infinite memory to act optimally w.r.t. a discounted LTL formula. However, it can be shown that for any MDP M, labelling function L, discounted LTL formula ϕ and any ε > 0, there is a finite-memory policy π that is ε-optimal for ϕ. In fact, we can show that this class of discounted LTL formulas admits a PAC-MDP learning algorithm.

Theorem 2 (Existence of PAC-MDP). *There exists a PAC-MDP learning algorithm for discounted LTL specifications.*

*Proof (sketch).* Our approach to compute ε-optimal policies for discounted LTL is to compute a policy which is optimal for T steps. The policy will depend on the entire history of atomic propositions that has occurred so far.

Given a discounted LTL specification ϕ, the first step of the algorithm is to determine T. We select T such that for any two infinite words α and β whose first T + 1 letters match, i.e., α<sub>0:T</sub> = β<sub>0:T</sub>, we have |[[ϕ, α]] − [[ϕ, β]]| ≤ ε. Let λ<sub>max</sub> be the maximum discount factor appearing among the temporal operators. Due to the strict discounting of discounted LTL, selecting T ≥ log ε / log λ<sub>max</sub> ensures that |[[ϕ, α]] − [[ϕ, β]]| ≤ λ<sub>max</sub><sup>T</sup> ≤ ε.
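The horizon can be computed directly from this bound (a small sketch; the function name is ours):

```python
import math

def horizon(eps, lam_max):
    # An integer T with lam_max**T <= eps; valid since 0 < lam_max < 1
    # makes both logarithms negative.
    return math.ceil(math.log(eps) / math.log(lam_max))
```

For example, ε = 0.01 and λ<sub>max</sub> = 0.9 give T = 44, and indeed 0.9<sup>44</sup> ≈ 0.0097 ≤ 0.01.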

Now we unroll the MDP for T steps, including the history of the atomic proposition sequence in the state. Given an MDP M = (S, A, s<sub>0</sub>, P) and a labelling L : S → Σ, the unrolled MDP M<sub>T</sub> = (S′, A′, s′<sub>0</sub>, P′) is such that

$$S' = \bigcup_{t=0}^{T} S \times \underbrace{\Sigma \times \dots \times \Sigma}_{t \text{ times}},$$

A′ = A, and P′((s, σ<sub>0</sub>, ..., σ<sub>t−1</sub>), a, (s′, σ<sub>0</sub>, ..., σ<sub>t−1</sub>, σ<sub>t</sub>)) = P(s, a, s′) if 0 ≤ t ≤ T and σ<sub>t</sub> = L(s′), and is 0 otherwise (the MDP goes to a sink state if t > T). The leaves of the unrolled MDP are the states where T timesteps have elapsed; each such state has an associated finite word of length T. For a finite word of length T, we define the value of any formula ϕ to be zero beyond the end of the trace, i.e., [[ϕ, ρ<sub>j:∞</sub>]] = 0 for any j > T. We then compute the values of the finite words associated with the leaves, which are treated as rewards at the final step. We can use existing PAC algorithms to compute an ε-optimal policy w.r.t. this reward for the finite-horizon MDP M<sub>T</sub>, from which we can obtain a 2ε-optimal policy for M w.r.t. the specification ϕ.

### 4 Uniformly Discounted LTL to Reward Machines

In general, optimal strategies for discounted LTL require infinite memory (Theorem 1). However, producing such an example required the use of multiple, varied discount factors. In this section, we will show that finite memory is sufficient for optimal policies under uniform discounting, where the discount factors for all temporal operators in the formula are the same. We will also provide an algorithm for computing these strategies.

Our approach is to reduce uniformly discounted LTL formulas to *reward machines*, which are finite state machines in which each transition is associated with a reward. We show that the value of a given discounted LTL formula ϕ for an infinite word ρ is the discounted-sum reward computed by a corresponding reward machine.

Formally, a reward machine is a tuple R = (Q, δ, r, q<sub>0</sub>, λ), where Q is a finite set of states, δ : Q × Σ → Q is the transition function, r : Q × Σ → R is the reward function, q<sub>0</sub> ∈ Q is the initial state, and λ ∈ [0, 1) is the discount factor. With any infinite word ρ = σ<sub>0</sub>σ<sub>1</sub> ... ∈ Σ<sup>ω</sup>, we can associate a sequence of rewards c<sub>0</sub>c<sub>1</sub> ..., where c<sub>t</sub> = r(q<sub>t</sub>, σ<sub>t</sub>) with q<sub>t</sub> = δ(q<sub>t−1</sub>, σ<sub>t−1</sub>) for t > 0. We use R(ρ) to denote the discounted reward achieved by ρ,

$$\mathcal{R}(\rho) = \sum_{t=0}^{\infty} \lambda^t c_t,$$

and R(w) to denote the partial discounted reward achieved by a finite word w = σ<sub>0</sub>σ<sub>1</sub> ... σ<sub>T</sub> ∈ Σ<sup>∗</sup>, i.e., R(w) = ∑<sub>t=0</sub><sup>T</sup> λ<sup>t</sup>c<sub>t</sub>, where c<sub>t</sub> is the reward at time t.
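As a concrete (hypothetical) data structure, a reward machine and its partial discounted reward can be sketched as follows; the class and method names are ours.

```python
class RewardMachine:
    """Sketch: delta and reward are dicts keyed by (state, letter)."""

    def __init__(self, delta, reward, q0, lam):
        self.delta, self.reward, self.q0, self.lam = delta, reward, q0, lam

    def value(self, word):
        # Partial discounted reward R(w) = sum_t lam^t * c_t of a finite word.
        q, total = self.q0, 0.0
        for t, sigma in enumerate(word):
            total += self.lam ** t * self.reward[q, sigma]
            q = self.delta[q, sigma]
        return total
```

A one-state machine emitting the constant reward 1 − λ illustrates the normalization used later in Sect. 4.1: its partial reward over T letters is 1 − λ<sup>T</sup>, which tends to 1.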

Given a reward machine R and an MDP M, our objective is to maximize the expected value R(ρ) from the reward machine reading the word ρ produced by the MDP. Specifically, the value for a policy π for M is

$$\mathcal{J}^{\mathcal{M}}(\pi,\mathcal{R}) = \underset{\rho \sim \mathcal{D}_{\pi}^{\mathcal{M}}}{\mathbb{E}}[\mathcal{R}(\rho)],$$

where π is optimal if J<sup>M</sup>(π, R) = sup<sub>π′</sub> J<sup>M</sup>(π′, R). Finding such an optimal policy is straightforward: we consider the product of the reward machine R with the MDP M to form a product MDP with a discounted-reward objective. In the product MDP, we can compute optimal policies for maximizing the expected discounted-sum reward using standard techniques such as policy iteration and linear programming. If the transition function of the MDP is unknown, this product can be formed on-the-fly and any RL algorithm for discounted reward can be applied. Using the state space of the reward machine as memory, we can then obtain a finite-memory policy that is optimal for R.
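The product construction with discounted value iteration can be sketched as follows. All names and the toy instance are ours; the hand-built machine below is one that assigns a word the value λ<sup>k</sup>, where k is the first position at which p holds (mirroring **F**<sub>λ</sub>p), under the stated reward normalization.

```python
def product_value_iteration(P, L, delta, reward, q0, lam, sweeps=200):
    # P[s][a] = list of (next_state, prob); L[s] = letter emitted by state s.
    # Product state (s, q): collect reward[q, L[s]], then the machine moves to
    # delta[q, L[s]] while the MDP moves according to the chosen action.
    qs = {q0} | set(delta.values())
    V = {(s, q): 0.0 for s in P for q in qs}
    for _ in range(sweeps):  # Gauss-Seidel sweeps; a contraction since lam < 1
        for (s, q) in V:
            sigma = L[s]
            q2 = delta[q, sigma]
            V[s, q] = reward[q, sigma] + lam * max(
                sum(p * V[s2, q2] for s2, p in P[s][a]) for a in P[s])
    return V

# Toy instance: from state "sa" (p false) the agent may stay or move to the
# absorbing p-state "sb". Rewards of (1 - lam) after the first p make the
# discounted sum equal lam^k for a word whose first p is at position k.
lam = 0.5
Pp, NP = frozenset({"p"}), frozenset()
mdp = {"sa": {"stay": [("sa", 1.0)], "go": [("sb", 1.0)]},
       "sb": {"stay": [("sb", 1.0)]}}
labels = {"sa": NP, "sb": Pp}
delta = {("u0", NP): "u0", ("u0", Pp): "u1", ("u1", NP): "u1", ("u1", Pp): "u1"}
reward = {("u0", NP): 0.0, ("u0", Pp): 1 - lam,
          ("u1", NP): 1 - lam, ("u1", Pp): 1 - lam}
V = product_value_iteration(mdp, labels, delta, reward, "u0", lam)
```

Moving to sb immediately is optimal: the value of the product state (sa, u0) is λ = 0.5, while staying in sa forever would yield 0.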

We have the following theorem showing that we can construct a reward machine R<sub>ϕ</sub> for every uniformly discounted LTL formula ϕ.

Theorem 3. *For any uniformly discounted LTL formula* ϕ*, in which all temporal operators use a common discount factor* λ*, we can construct a reward machine* R<sub>ϕ</sub> = (Q, δ, r, q<sub>0</sub>, λ) *such that for any* ρ ∈ Σ<sup>ω</sup>*, we have* R<sub>ϕ</sub>(ρ) = [ϕ, ρ]*.*

We provide the reward machine construction for Theorem 3 in the next subsection. Using this theorem, one can construct a reward machine R<sub>ϕ</sub> that matches the value of a particular uniformly discounted LTL formula ϕ, and then apply the procedure outlined above for computing optimal finite-memory policies for reward machines.

Corollary 1. *For any MDP* M*, labelling function* L *and a discounted LTL formula* ϕ *in which all temporal operators use a common discount factor* λ*, there exists a finite-memory optimal policy* π ∈ Π*opt*(M, ϕ)*. Furthermore, there is an algorithm to compute such a policy.*

#### 4.1 Reward Machine Construction

For our construction, we examine the case of uniformly discounted LTL formulas with positive discount factors λ ∈ (0, 1). This allows us to divide by λ in our construction. We note that uniformly discounted LTL formulas with λ = 0 can be evaluated after reading the initial letter of the word, and thus have trivial reward machines.

The reward machine R<sup>ϕ</sup> constructed for the uniformly discounted LTL formula ϕ exhibits a special structure. Specifically, all edges within any given strongly-connected component (SCC) of R<sup>ϕ</sup> share the same reward, which is either 0 or 1 − λ, while all other rewards fall within the range of [0, 1 − λ]. We present an inductive construction of the reward machines over the syntax of discounted LTL that maintains these invariants.

Lemma 1. *For any uniformly discounted LTL formula* ϕ *there exists a reward machine* R<sub>ϕ</sub> = (Q, δ, r, q<sub>0</sub>, λ) *such that the following hold:*

Fig. 4. Reward machines for ϕ = p (left) and ϕ = **X**<sub>λ</sub>q (right). The transitions are labeled by the guard and reward.

I<sub>1</sub>*. For any* ρ ∈ Σ<sup>ω</sup>*, we have* R<sub>ϕ</sub>(ρ) = [ϕ, ρ]*.*
I<sub>2</sub>*. There is a partition of the states* Q = ⋃<sub>ℓ=1</sub><sup>L</sup> Q<sub>ℓ</sub> *and a type mapping* χ : [L] → {0, 1 − λ} *such that for any* q ∈ Q<sub>ℓ</sub> *and* σ ∈ Σ*: (a)* δ(q, σ) ∈ ⋃<sub>m=ℓ</sub><sup>L</sup> Q<sub>m</sub>*, and (b) if* δ(q, σ) ∈ Q<sub>ℓ</sub> *then* r(q, σ) = χ(ℓ)*.*
I<sub>3</sub>*. For any* q ∈ Q *and* σ ∈ Σ*, we have* 0 ≤ r(q, σ) ≤ 1 − λ*.*

Our construction proceeds inductively. We define the reward machine for the base case of a single atomic proposition, i.e., ϕ = p, and then give the constructions for negation, the next operator, disjunction, the eventually operator (included for ease of presentation), and the until operator. The ideas used in the constructions for disjunction, the eventually operator, and the until operator build on each other, as they all involve keeping track of the maximum/minimum value over a set of subformulas. We use properties I<sub>1</sub> and I<sub>3</sub> to show correctness, and properties I<sub>2</sub> and I<sub>3</sub> to show finiteness. A summary of the construction and detailed proofs can be found in the full version of this paper [4].

Atomic Propositions. Let ϕ = p for some p ∈ P. The reward machine Rϕ = (Q, δ, r, q0, λ) for ϕ is such that Q = {q0, q1, q2} and δ(q, σ) = q for all q ∈ {q1, q2} and σ ∈ Σ. The reward machine is shown in Fig. 4, where edges are labelled with propositions and rewards. If p ∈ σ, then δ(q0, σ) = q1 and r(q0, σ) = 1 − λ. If p ∉ σ, then δ(q0, σ) = q2 and r(q0, σ) = 0. Finally, r(q1, σ) = 1 − λ and r(q2, σ) = 0 for all σ ∈ Σ. It is easy to see that I1, I2, and I3 hold.
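As a concrete illustration (our own sketch, not part of the paper; the helper names `atom` and `run` and the choice λ = 0.5 are ours), the base-case machine and the evaluation of its discounted reward on a finite word can be written as:

```python
LAM = 0.5  # discount factor λ ∈ (0, 1), fixed arbitrarily for illustration

def atom(p):
    """Reward machine for ϕ = p (Fig. 4, left): from q0 the first letter
    decides between the absorbing states q1 (reward 1 − λ forever) and
    q2 (reward 0 forever)."""
    def delta(q, sigma):
        return (1 if p in sigma else 2) if q == 0 else q
    def reward(q, sigma):
        if q == 0:
            return 1 - LAM if p in sigma else 0.0
        return 1 - LAM if q == 1 else 0.0
    return delta, reward, 0  # (δ, r, q0)

def run(machine, word):
    """Discounted reward Σ_t λ^t · r(q_t, σ_t) accumulated on a finite word."""
    delta, reward, q = machine
    total, disc = 0.0, 1.0
    for sigma in word:
        total += disc * reward(q, sigma)
        q = delta(q, sigma)
        disc *= LAM
    return total
```

On a word whose first letter contains p, the value after n letters is Σ<sub>i&lt;n</sub> λ<sup>i</sup>(1 − λ) = 1 − λ<sup>n</sup>, converging to ⟦ρ, p⟧ = 1; otherwise every reward is 0.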

Negation. Let ϕ = ¬ϕ1 for some LTL formula ϕ1 and let Rϕ1 = (Q, δ, r, q0, λ) be the reward machine for ϕ1. Notice that the reward machine for ϕ can be constructed from Rϕ1 by simply replacing every reward c with (1 − λ) − c, since Σ<sub>i=0</sub><sup>∞</sup> λ<sup>i</sup>(1 − λ) = 1. Formally, Rϕ = (Q, δ, r′, q0, λ) where r′(q, σ) = (1 − λ) − r(q, σ) for all q ∈ Q and σ ∈ Σ. Again, assuming that invariants I1, I2, and I3 hold for Rϕ1, it easily follows that they hold for Rϕ.
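A quick numeric check of the complementation step (again our own sketch; `atom`, `run`, and λ = 0.5 are small helpers of our own, stated here so the fragment is self-contained): flipping every reward c to (1 − λ) − c makes the values of ϕ and ¬ϕ on a length-n prefix sum to Σ<sub>i&lt;n</sub> λ<sup>i</sup>(1 − λ) = 1 − λ<sup>n</sup>.

```python
LAM = 0.5  # discount factor λ, chosen arbitrarily

def atom(p):  # base-case machine for ϕ = p
    def delta(q, s): return (1 if p in s else 2) if q == 0 else q
    def reward(q, s): return (1 - LAM) * (p in s) if q == 0 else (1 - LAM) * (q == 1)
    return delta, reward, 0

def run(machine, word):  # discounted reward of a finite word
    delta, reward, q = machine
    total, disc = 0.0, 1.0
    for s in word:
        total += disc * reward(q, s); q = delta(q, s); disc *= LAM
    return total

def negate(machine):
    """R_¬ϕ: same states and transitions, each reward c becomes (1 − λ) − c."""
    delta, reward, q0 = machine
    return delta, (lambda q, s: (1 - LAM) - reward(q, s)), q0
```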

Next Operator. Let ϕ = **X**λϕ1 for some ϕ1 and let Rϕ1 = (Q, δ, r, q0, λ) be the reward machine for ϕ1. The reward machine for ϕ can be constructed from Rϕ1 by adding a new initial state q′0 with a transition in the first step from it to the initial state of Rϕ1. From the next step on, Rϕ simulates Rϕ1. This has the resulting effect of skipping the first letter and scaling the value by λ. Formally, Rϕ = ({q′0} ⊔ Q, δ′, r′, q′0, λ) where δ′(q′0, σ) = q0 and δ′(q, σ) = δ(q, σ) for all q ∈ Q and σ ∈ Σ. Similarly, r′(q′0, σ) = 0 and r′(q, σ) = r(q, σ) for all q ∈ Q and σ ∈ Σ. Assuming that invariants I1, I2, and I3 hold for Rϕ1, it follows that they hold for Rϕ.
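The wrapper below (our own sketch, with our helper definitions restated for self-containment) adds the fresh initial state and lets us check the defining identity R<sub>**X**λϕ1</sub>(w) = λ · Rϕ1(w<sub>1:∞</sub>) on finite words:

```python
LAM = 0.5  # discount factor λ, chosen arbitrarily

def atom(p):  # base-case machine for ϕ = p
    def delta(q, s): return (1 if p in s else 2) if q == 0 else q
    def reward(q, s): return (1 - LAM) * (p in s) if q == 0 else (1 - LAM) * (q == 1)
    return delta, reward, 0

def run(machine, word):  # discounted reward of a finite word
    delta, reward, q = machine
    total, disc = 0.0, 1.0
    for s in word:
        total += disc * reward(q, s); q = delta(q, s); disc *= LAM
    return total

def next_op(machine):
    """R_{X_λ ϕ1}: a fresh initial state pays reward 0 on the first letter,
    then hands control to the inner machine, which runs unchanged but one
    step later, scaling its contribution by λ."""
    delta, reward, q0 = machine
    def delta2(q, s):
        return ('in', q0) if q == 'init' else ('in', delta(q[1], s))
    def reward2(q, s):
        return 0.0 if q == 'init' else reward(q[1], s)
    return delta2, reward2, 'init'
```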

Disjunction. Let ϕ = ϕ1 ∨ ϕ2 for some ϕ1, ϕ2 and let Rϕ1 = (Q1, δ1, r1, q<sub>0</sub><sup>1</sup>, λ) and Rϕ2 = (Q2, δ2, r2, q<sub>0</sub><sup>2</sup>, λ) be the reward machines for ϕ1 and ϕ2, respectively. The reward machine Rϕ = (Q, δ, r, q0, λ) is constructed from Rϕ1 and Rϕ2 such that for any finite word it maintains the invariant that the discounted reward is the maximum of the rewards provided by Rϕ1 and Rϕ2. Moreover, once it is ascertained that the reward provided by one machine cannot be overtaken by the other on any suffix, Rϕ begins simulating the reward machine with the higher reward.

The construction involves a product construction along with a real-valued component that stores a scaled difference between the total accumulated rewards for ϕ1 and ϕ2. In particular, Q = (Q1 × Q2 × ℝ) ⊔ Q1 ⊔ Q2 and q0 = (q<sub>0</sub><sup>1</sup>, q<sub>0</sub><sup>2</sup>, 0). The reward deficit ζ of a state q = (q1, q2, ζ) denotes the difference between the total accumulated rewards for ϕ1 and ϕ2 divided by λ<sup>n</sup>, where n is the total number of steps taken to reach q. The reward function is defined as follows.

– For q = (q1, q2, ζ), we let f(q, σ) = r1(q1, σ) − r2(q2, σ) + ζ denote the new (scaled) difference between the discounted-sum rewards accumulated by Rϕ1 and Rϕ2. The current reward depends on whether f(q, σ) is positive (accumulated reward from Rϕ1 is higher) or negative, and on whether its sign differs from that of ζ. Formally,

$$r(q, \sigma) = \begin{cases} r\_1(q\_1, \sigma) + \min\{0, \zeta\} & \text{if } f(q, \sigma) \ge 0 \\ r\_2(q\_2, \sigma) - \max\{0, \zeta\} & \text{if } f(q, \sigma) < 0 \end{cases}$$

– For a state qi ∈ Qi, we have r(qi, σ) = ri(qi, σ).

Now we need to make sure that ζ is updated correctly. We also want the transition function to be such that the (reachable) state space is finite and the reward machine satisfies I1, I<sup>2</sup> and I3.

– First, we make sure that, when the difference ζ is too large, the machine transitions to the appropriate state in Q1 or Q2. For a state q = (q1, q2, ζ) with |ζ| ≥ 1, we have

$$\delta(q,\sigma) = \begin{cases} \delta\_1(q\_1,\sigma) & \text{if } \zeta \ge 1 \\ \delta\_2(q\_2,\sigma) & \text{if } \zeta \le -1. \end{cases}$$

– For states with |ζ| < 1, we simply advance both the states and update ζ accordingly. Letting f(q, σ) = r1(q1, σ) − r2(q2, σ) + ζ, we have that for a state q = (q1, q2, ζ) with |ζ| < 1,

$$\delta(q,\sigma) = (\delta\_1(q\_1,\sigma), \delta\_2(q\_2,\sigma), f(q,\sigma)/\lambda). \tag{2}$$

– Finally, for q<sup>i</sup> ∈ Qi, δ(qi, σ) = δi(qi, σ).
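The disjunction construction can be simulated on the fly, without materializing the product state space. The sketch below (our own illustration; all helper names and λ = 0.5 are ours) follows the rules above: it tracks (q1, q2, ζ), pays the case-split reward, rescales ζ by 1/λ as in Eq. (2), and commits to one component once |ζ| ≥ 1.

```python
from itertools import product

LAM = 0.5  # discount factor λ, chosen arbitrarily

def atom(p):  # base-case machine for ϕ = p
    def delta(q, s): return (1 if p in s else 2) if q == 0 else q
    def reward(q, s): return (1 - LAM) * (p in s) if q == 0 else (1 - LAM) * (q == 1)
    return delta, reward, 0

def run(machine, word):  # discounted reward of a finite word
    delta, reward, q = machine
    total, disc = 0.0, 1.0
    for s in word:
        total += disc * reward(q, s); q = delta(q, s); disc *= LAM
    return total

def run_disjunction(m1, m2, word):
    """Simulate R_{ϕ1∨ϕ2} on a finite word, following the construction."""
    (d1, r1, q1), (d2, r2, q2) = m1, m2
    mode, zeta = 'both', 0.0          # 'both' = product state (q1, q2, ζ)
    total, disc = 0.0, 1.0
    for s in word:
        if mode == 'both':
            f = r1(q1, s) - r2(q2, s) + zeta
            rew = r1(q1, s) + min(0.0, zeta) if f >= 0 else r2(q2, s) - max(0.0, zeta)
            if zeta >= 1:             # ϕ2 can no longer overtake ϕ1
                mode, q1 = 'left', d1(q1, s)
            elif zeta <= -1:          # ϕ1 can no longer overtake ϕ2
                mode, q2 = 'right', d2(q2, s)
            else:                     # advance both and rescale the deficit
                q1, q2, zeta = d1(q1, s), d2(q2, s), f / LAM
        elif mode == 'left':
            rew, q1 = r1(q1, s), d1(q1, s)
        else:
            rew, q2 = r2(q2, s), d2(q2, s)
        total += disc * rew
        disc *= LAM
    return total
```

Exhaustively checking all words of length 3 over the alphabet 2^{p,q} confirms the finite-word form of Property I1, Rϕ(w) = max{Rϕ1(w), Rϕ2(w)}, for these machines.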

*Finiteness.* We argue that the (reachable) state space of Rϕ is finite. Let Q<sub>i</sub> = ⊔<sub>ℓ=1</sub><sup>L<sub>i</sub></sup> Q<sub>i</sub><sup>ℓ</sup> for i ∈ {1, 2} be the SCC decompositions of Q1 and Q2 that satisfy property I2 for Rϕ1 and Rϕ2, respectively. Intuitively, if Rϕ stays within Q<sub>1</sub><sup>ℓ</sup> × Q<sub>2</sub><sup>m</sup> × ℝ for some ℓ ≤ L1 and m ≤ L2, then the rewards from Rϕ1 and Rϕ2 are constant; this enables us to infer the reward machine (Rϕ1 or Rϕ2) with the higher total accumulated reward in a finite amount of time, after which we transition to Q1 or Q2. Hence the set of all possible values of ζ in a reachable state (q1, q2, ζ) ∈ Q<sub>1</sub><sup>ℓ</sup> × Q<sub>2</sub><sup>m</sup> × ℝ is finite. This can be shown by induction.

*Property* I1*.* Intuitively, it suffices to show that Rϕ(w) = max{Rϕ1(w), Rϕ2(w)} for every finite word w ∈ Σ<sup>∗</sup>. We show this property along with the fact that for any w ∈ Σ<sup>∗</sup> of length n, if the reward machine reaches a state (q1, q2, ζ), then ζ = (Rϕ1(w) − Rϕ2(w))/λ<sup>n</sup>. This can be proved using induction on n.

*Property* I2*.* This property holds if and only if for every SCC C of Rϕ there is a type c ∈ {0, 1 − λ} such that if δ(q, σ) = q′ for some q, q′ ∈ C and σ ∈ Σ, we have r(q, σ) = c. From the definition of the transition function δ, C cannot contain two states where one is of the form (q1, q2, ζ) ∈ Q1 × Q2 × ℝ and the other is qi ∈ Qi for some i ∈ {1, 2}. Now if C is completely contained in Qi for some i, we can conclude from the inductive hypothesis that the rewards within C are constant (and they are all either 0 or 1 − λ). When all states of C are contained in Q1 × Q2 × ℝ, they must be contained in Q̄1 × Q̄2 × ℝ where Q̄i is some SCC of Rϕi. In such a case, we can show that |C| = 1 and, in the presence of a self-loop on a state within C, the reward must be either 0 or 1 − λ.

*Property* I3*.* We now show that all rewards are bounded between 0 and 1 − λ. Let q = (q1, q2, ζ) and f(q, σ) = r1(q1, σ) − r2(q2, σ) + ζ. We show the bound for the case f(q, σ) ≥ 0; the other case is similar. If ζ ≥ 0, then r(q, σ) = r1(q1, σ) ∈ [0, 1 − λ]. If ζ < 0, then r(q, σ) ≤ r1(q1, σ) ≤ 1 − λ and

$$r(q, \sigma) = r\_1(q\_1, \sigma) + \zeta = f(q, \sigma) + r\_2(q\_2, \sigma) \ge 0.$$

This concludes the construction for ϕ<sup>1</sup> ∨ ϕ2.

Eventually Operator. For ease of presentation, we treat the until operator as a generalization of the *eventually* operator **F**λ and present the latter first. We have that ϕ = **F**λϕ1 for some ϕ1. Let Rϕ1 = (Q1, δ1, r1, q<sub>0</sub><sup>1</sup>, λ) be the reward machine for ϕ1, and let **X**<sup>i</sup><sub>λ</sub> denote the operator **X**λ applied i times. We begin by noting that

$$\mathbf{F}\_{\lambda}\varphi\_1 \equiv \bigvee\_{i\geq 0} \mathbf{X}\_{\lambda}^{i}\varphi\_1 = \varphi\_1 \vee \mathbf{X}\_{\lambda}\varphi\_1 \vee \mathbf{X}\_{\lambda}^2\varphi\_1 \vee \dots$$

The idea of the construction is to keep track of the unrolling of this formula up to the current timestep n,

$$\mathbf{F}^{n}\_{\lambda}\varphi\_{1} = \bigvee\_{n\geq i\geq 0} \mathbf{X}^{i}\_{\lambda}\varphi\_{1} = \varphi\_{1}\vee \mathbf{X}\_{\lambda}\varphi\_{1}\vee \mathbf{X}^{2}\_{\lambda}\varphi\_{1}\vee \dots \vee \mathbf{X}^{n}\_{\lambda}\varphi\_{1}.$$

For this, we will generalize the construction for disjunction. In the disjunction construction, there were states of the form (q1, q2, ζ) where ζ was a bookkeeping parameter that kept track of the difference between Rϕ1(w) and Rϕ2(w), namely, ζ = (Rϕ1(w) − Rϕ2(w))/λ<sup>n</sup> where w ∈ Σ<sup>∗</sup> is some finite word of length n. To generalize this notion to a reward machine for max{R1, ..., Rk}, we will have states of the form {(q1, ζ1), ..., (qk, ζk)} where ζi = (Ri(w) − max<sub>j</sub> Rj(w))/λ<sup>n</sup>. When ζi ≤ −1, then Ri(w) + λ<sup>n</sup> ≤ max<sub>j</sub> Rj(w) and we know that the associated reward machine Ri cannot attain the maximum, so we drop it from our set. We also note that the value of **X**<sup>i</sup><sub>λ</sub>ϕ1 can be determined by simply waiting i steps before starting the reward machine Rϕ1, i.e., λ<sup>i</sup> Rϕ1(ρ<sub>i:∞</sub>) = R<sub>**X**<sup>i</sup><sub>λ</sub>ϕ1</sub>(ρ). This allows us to perform a subset construction for this operator.

For a finite word w = σ0σ1 ... σn ∈ Σ<sup>∗</sup> and a nonnegative integer k, let w<sub>k:∞</sub> denote the subword σk ... σn, which equals the empty word if k &gt; n. We use the notation ⟦**X**<sup>k</sup><sub>λ</sub>ϕ1, w⟧ = λ<sup>k</sup>Rϕ1(w<sub>k:∞</sub>) and define ⟦**F**<sup>k</sup><sub>λ</sub>ϕ1, w⟧ = max<sub>k≥i≥0</sub> ⟦**X**<sup>i</sup><sub>λ</sub>ϕ1, w⟧, which represents the maximum value accumulated by the reward machine of some formula of the form **X**<sup>i</sup><sub>λ</sub>ϕ1 with i ≤ k on a finite word w. The reward machine for **F**λϕ1 will consist of states of the form (v, S), containing a value v for bookkeeping and a set S that keeps track of the states of all R<sub>**X**<sup>i</sup><sub>λ</sub>ϕ1</sub> that may still attain the maximum given a finite prefix w of length n, i.e., reward machine states of all subformulas **X**<sup>i</sup><sub>λ</sub>ϕ1 for n ≥ i ≥ 0 that satisfy ⟦**X**<sup>i</sup><sub>λ</sub>ϕ1, w⟧ + λ<sup>n</sup> &gt; ⟦**F**<sup>n</sup><sub>λ</sub>ϕ1, w⟧, since λ<sup>n</sup> is the maximum additional reward obtainable by any ρ ∈ Σ<sup>ω</sup> with prefix w. The subset S consists of elements of the form (qi, ζi) where qi = δ1(q<sub>0</sub><sup>1</sup>, w<sub>i:∞</sub>) and ζi = (⟦**X**<sup>i</sup><sub>λ</sub>ϕ1, w⟧ − ⟦**F**<sup>n</sup><sub>λ</sub>ϕ1, w⟧)/λ<sup>n</sup>, corresponding to each subformula **X**<sup>i</sup><sub>λ</sub>ϕ1. The value v = max{−1, −⟦**F**<sup>n</sup><sub>λ</sub>ϕ1, w⟧/λ<sup>n</sup>} is a bookkeeping parameter used to initialize new elements in the set S and to stop adding elements to S when v ≤ −1. We now present the construction formally.
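The pruning rule can be checked by brute force from the definitions (our own sketch, not the subset construction itself; `atom`, `run`, and λ = 0.5 are our helpers): compute λ<sup>i</sup> · Rϕ1(w<sub>i:∞</sub>) for every i ≤ n, take the maximum, and keep exactly the indices whose deficit ζ<sub>i</sub> stays above −1.

```python
LAM = 0.5  # discount factor λ, chosen arbitrarily

def atom(p):  # base-case machine for ϕ = p
    def delta(q, s): return (1 if p in s else 2) if q == 0 else q
    def reward(q, s): return (1 - LAM) * (p in s) if q == 0 else (1 - LAM) * (q == 1)
    return delta, reward, 0

def run(machine, word):  # discounted reward of a finite word
    delta, reward, q = machine
    total, disc = 0.0, 1.0
    for s in word:
        total += disc * reward(q, s); q = delta(q, s); disc *= LAM
    return total

def surviving_indices(word, machine):
    """Values ⟦X^i ϕ1, w⟧ = λ^i · R_ϕ1(w[i:]) for 0 ≤ i ≤ n, their maximum
    ⟦F^n ϕ1, w⟧, and the indices i whose deficit
    ζ_i = (⟦X^i ϕ1, w⟧ − ⟦F^n ϕ1, w⟧)/λ^n exceeds −1, i.e. the copies the
    subset construction still tracks after reading w."""
    n = len(word)
    vals = [LAM ** i * run(machine, word[i:]) for i in range(n + 1)]
    best = max(vals)
    alive = [i for i, v in enumerate(vals) if (v - best) / LAM ** n > -1]
    return best, alive
```

On w = ∅{p}, for instance, only the copy started at position 1 survives: the copies started at 0 and 2 can gain at most λ<sup>2</sup> more, which cannot exceed the current maximum.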

We form a reward machine Rϕ = (Q, δ, r, q0, λ) where Q = ℝ × 2<sup>Q1×ℝ</sup> and q0 = (0, {(q<sub>0</sub><sup>1</sup>, 0)}). We define a few functions that ease defining our transition function. Let f(ζ, q, σ) = r1(q, σ) + ζ and m(S, σ) = max<sub>(qi,ζi)∈S</sub> f(ζi, qi, σ). For the subset construction, we define

$$\Delta(S,\sigma) = \bigcup\_{(q,\zeta)\in S} \{ (\delta\_1(q,\sigma),\zeta') : \zeta' = (f(\zeta,q,\sigma) - m(S,\sigma))/\lambda > -1 \}$$

The transition function is

$$\delta((v, S), \sigma) = \begin{cases} \left( v'(S, v, \sigma), \ \Delta(S, \sigma) \cup \{ (q\_0^1, v'(S, v, \sigma)) \} \right) & \text{if } v'(S, v, \sigma) > -1 \\ \left( -1, \ \Delta(S, \sigma) \right) & \text{if } v'(S, v, \sigma) \le -1 \end{cases}$$

where v′(S, v, σ) = (v − m(S, σ))/λ. The reward function is r((v, S), σ) = m(S, σ).

We now argue that Rϕ satisfies properties I1, I2, and I3, and that the set of reachable states in Rϕ is finite, assuming Rϕ1 satisfies I1, I2, and I3.

*Finiteness.* Consider states of the form (v, S) ∈ Q. If v = 0, then it must be that ζi = 0 for all (qi, ζi) ∈ S, since receiving a non-zero reward causes the value of v to become negative. There are only finitely many such states. If −1 &lt; v &lt; 0, then we reach a state (v′, S′) ∈ Q with v′ = −1 in at most n steps, where n is such that v/λ<sup>n</sup> ≤ −1. Therefore, the number of reachable states (v, S) with −1 &lt; v &lt; 0 is also finite. Also, the number of states of the form (−1, S) that can be initially reached (via paths consisting only of states of the form (v′, S′) with v′ &gt; −1) is finite. Furthermore, upon reaching such a state (−1, S), the reward machine is similar to that of a disjunction (maximum) of |S| reward machines. From this we can conclude that the full reachable state space is finite.

*Property* I1*.* The transition function is designed so that the following holds: for any finite word w ∈ Σ<sup>∗</sup> of length n and letter σ ∈ Σ, if δ(q0, w) = (v, S), then m(S, σ) = (⟦**F**<sup>n+1</sup><sub>λ</sub>ϕ1, wσ⟧ − ⟦**F**<sup>n</sup><sub>λ</sub>ϕ1, w⟧)/λ<sup>n</sup>. Since r((v, S), σ) = m(S, σ), we get that Rϕ(w) = ⟦**F**<sup>n</sup><sub>λ</sub>ϕ1, w⟧. Thus, Rϕ(ρ) = ⟦**F**λϕ1, ρ⟧ for any infinite word ρ ∈ Σ<sup>ω</sup>. This property of m(S, σ) follows from the preservation of all the properties outlined in the above description of the construction.

*Property* I2*.* Consider an SCC C in Rϕ such that (v, S) = δ((v, S), w) for some (v, S) ∈ C and w ∈ Σ<sup>∗</sup> of length n &gt; 0. Note that if −1 &lt; v &lt; 0, then (v′, S′) = δ((v, S), w) is such that v′ &lt; v. Thus, it must be that v = 0 or v = −1. If v = 0, then all the rewards must be zero, since any nonzero reward results in v &lt; 0. If v = −1, then it must be that for any (qi, ζi) ∈ S, qi is in an SCC Ci in Rϕ1 with some reward type ci ∈ {0, 1 − λ}. For all ζi to remain fixed (which is necessary, as otherwise some ζi strictly increases or decreases), it must be that all ci are the same, say c. Thus, the reward type for the SCC C in Rϕ equals c.

*Property* I3*.* We can show that for any finite word w ∈ Σ<sup>∗</sup> of length n and letter σ ∈ Σ, if δ(q0, w) = (v, S), then the reward is r((v, S), σ) = m(S, σ) = (⟦**F**<sup>n+1</sup><sub>λ</sub>ϕ1, wσ⟧ − ⟦**F**<sup>n</sup><sub>λ</sub>ϕ1, w⟧)/λ<sup>n</sup>, using induction on n. Since property I3 holds for Rϕ1, we have that 0 ≤ ⟦**F**<sup>n+1</sup><sub>λ</sub>ϕ1, wσ⟧ − ⟦**F**<sup>n</sup><sub>λ</sub>ϕ1, w⟧ ≤ (1 − λ)λ<sup>n</sup>.

Until Operator. We now present the until operator, generalizing the ideas presented for the eventually operator. We have that ϕ = ϕ1**U**λϕ2 for some ϕ1 and ϕ2. Let Rϕ1 = (Q1, δ1, r1, q<sub>0</sub><sup>1</sup>, λ) and Rϕ2 = (Q2, δ2, r2, q<sub>0</sub><sup>2</sup>, λ). Note that

$$\begin{aligned} \varphi\_1 \mathbf{U}\_{\lambda} \varphi\_2 &= \bigvee\_{i \ge 0} (\mathbf{X}\_{\lambda}^i \varphi\_2 \wedge \varphi\_1 \wedge \mathbf{X}\_{\lambda} \varphi\_1 \wedge \dots \wedge \mathbf{X}\_{\lambda}^{i-1} \varphi\_1) \\ &= \varphi\_2 \vee (\mathbf{X}\_{\lambda} \varphi\_2 \wedge \varphi\_1) \vee (\mathbf{X}\_{\lambda}^2 \varphi\_2 \wedge \varphi\_1 \wedge \mathbf{X}\_{\lambda} \varphi\_1) \vee \dots \end{aligned}$$

The goal of the construction is to keep track of the unrolling of this formula up to the current timestep n,

$$\varphi\_1 \mathbf{U}\_{\lambda}^n \varphi\_2 = \bigvee\_{n \ge i \ge 0} (\mathbf{X}\_{\lambda}^i \varphi\_2 \wedge \varphi\_1 \wedge \mathbf{X}\_{\lambda} \varphi\_1 \wedge \dots \wedge \mathbf{X}\_{\lambda}^{i-1} \varphi\_1) = \bigvee\_{n \ge i \ge 0} \psi\_i.$$

Each ψi requires a subset construction in the style of the eventually operator construction to maintain the minimum. We then nest another subset construction in the style of the eventually operator construction to maintain the maximum over the ψi. For a finite word w ∈ Σ<sup>∗</sup>, we use the notation ⟦ψi, w⟧ and ⟦ϕ1**U**<sup>k</sup><sub>λ</sub>ϕ2, w⟧ for the value accumulated by the reward machines corresponding to these formulas on the word w, i.e., ⟦ψi, w⟧ = min{⟦**X**<sup>i</sup><sub>λ</sub>ϕ2, w⟧, min<sub>i&gt;j≥0</sub> ⟦**X**<sup>j</sup><sub>λ</sub>ϕ1, w⟧} and ⟦ϕ1**U**<sup>k</sup><sub>λ</sub>ϕ2, w⟧ = max<sub>k≥i≥0</sub> ⟦ψi, w⟧.
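Before the formal machine, the nested max/min can be computed by brute force directly from these definitions (our own sketch, with our helper definitions restated), which serves as a reference when checking the construction on small words:

```python
LAM = 0.5  # discount factor λ, chosen arbitrarily

def atom(p):  # base-case machine for ϕ = p
    def delta(q, s): return (1 if p in s else 2) if q == 0 else q
    def reward(q, s): return (1 - LAM) * (p in s) if q == 0 else (1 - LAM) * (q == 1)
    return delta, reward, 0

def run(machine, word):  # discounted reward of a finite word
    delta, reward, q = machine
    total, disc = 0.0, 1.0
    for s in word:
        total += disc * reward(q, s); q = delta(q, s); disc *= LAM
    return total

def until_unrolled(word, m1, m2):
    """⟦ϕ1 U^n_λ ϕ2, w⟧ = max_{n≥i≥0} ⟦ψ_i, w⟧ with
    ⟦ψ_i, w⟧ = min{⟦X^i ϕ2, w⟧, min_{i>j≥0} ⟦X^j ϕ1, w⟧} and
    ⟦X^k ϕ, w⟧ = λ^k · R_ϕ(w[k:]), computed by brute force."""
    n = len(word)
    X = lambda m, k: LAM ** k * run(m, word[k:])
    psis = [min([X(m2, i)] + [X(m1, j) for j in range(i)]) for i in range(n + 1)]
    return max(psis)
```

For example, on w = {p}{q} with ϕ1 = p and ϕ2 = q, the maximum is attained by ψ1: q holds one step in, and p covers the step before it.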

Let S = 2<sup>(Q1 ⊔ Q2)×ℝ</sup> be the set of subsets containing (q, ζ) pairs, where q may be from either Q1 or Q2. The reward machine consists of states of the form (v, I, X) where the value v ∈ ℝ and the subset I ∈ S are for bookkeeping, and X ∈ 2<sup>S</sup> is a set of subsets, one for each ψi. Specifically, each element of X is a subset S corresponding to a particular ψi which may still attain the maximum, i.e., ⟦ψi, w⟧ + λ<sup>n</sup> &gt; ⟦ϕ1**U**<sup>n</sup><sub>λ</sub>ϕ2, w⟧. Each element of S is of the form (q, ζ). We have that q ∈ Q2 for at most one element, where q = δ2(q<sub>0</sub><sup>2</sup>, w<sub>k:∞</sub>) and ζ = (⟦**X**<sup>k</sup><sub>λ</sub>ϕ2, w⟧ − ⟦ϕ1**U**<sup>n</sup><sub>λ</sub>ϕ2, w⟧)/λ<sup>n</sup>. For the other elements of S, we have that q ∈ Q1 with q = δ1(q<sub>0</sub><sup>1</sup>, w<sub>k:∞</sub>) and ζ = (⟦**X**<sup>k</sup><sub>λ</sub>ϕ1, w⟧ − ⟦ϕ1**U**<sup>n</sup><sub>λ</sub>ϕ2, w⟧)/λ<sup>n</sup>. If, for any of these elements, the value of its corresponding formula becomes too large to be the minimum for the conjunction forming ψi, i.e., ⟦ψi, w⟧ + λ<sup>n</sup> ≤ ⟦ϕ1**U**<sup>n</sup><sub>λ</sub>ϕ2, w⟧ + λ<sup>n</sup> ≤ ⟦**X**<sup>k</sup><sub>λ</sub>ϕt, w⟧, which occurs when ζ ≥ 1, that element is dropped from S. In order to update X, we add a new S corresponding to ψn at the next timestep. The value v = max{−1, −⟦ϕ1**U**<sup>n</sup><sub>λ</sub>ϕ2, w⟧/λ<sup>n</sup>} is a bookkeeping parameter for initializing new elements in the subsets and for stopping the addition of new elements when v ≤ −1.
The subset I is a bookkeeping parameter that keeps track of the subset construction for ⋀<sub>n&gt;i≥0</sub> **X**<sup>i</sup><sub>λ</sub>ϕ1, which is used to initialize the addition of a subset corresponding to ψn = **X**<sup>n</sup><sub>λ</sub>ϕ2 ∧ (⋀<sub>n&gt;i≥0</sub> **X**<sup>i</sup><sub>λ</sub>ϕ1). We now define the reward machine formally.

We define a few functions that ease defining our transition function. We define δ<sub>∗</sub>(q, σ) = δi(q, σ) and f<sub>∗</sub>(ζ, q, σ) = ri(q, σ) + ζ if q ∈ Qi for i ∈ {1, 2}. We also define n(S, σ) = min<sub>(qi,ζi)∈S</sub> f<sub>∗</sub>(ζi, qi, σ) and m(X, σ) = max<sub>S∈X</sub> n(S, σ). For the subset construction, we define

$$\Delta(S,\sigma,m) = \bigcup\_{(q,\zeta)\in S} \{ (\delta\_\*(q,\sigma),\zeta'):\zeta'<1 \}$$

where ζ′ = (f<sub>∗</sub>(ζ, q, σ) − m)/λ, and

$$T(\mathcal{X}, \sigma, m) = \bigcup\_{S \in \mathcal{X}} \{ \Delta(S, \sigma, m) : n(S, \sigma) > -1 \}.$$

We form a reward machine Rϕ = (Q, δ, r, q0, λ) where Q = ℝ × S × 2<sup>S</sup> and q0 = (0, ∅, {{(q<sub>0</sub><sup>2</sup>, 0)}}). The transition function is

$$\delta((v, I, \mathcal{X}), \sigma) = \begin{cases} \left(v', \ I', \ T(\mathcal{X}, \sigma, m) \cup \{ I' \cup \{ (q\_0^2, v') \} \} \right) & \text{if } v' > -1\\ \left(-1, \ \emptyset, \ T(\mathcal{X}, \sigma, m)\right) & \text{if } v' \le -1 \end{cases}$$

where m = m(X, σ), v′ = (v − m)/λ, and I′ = Δ(I ∪ {(q<sub>0</sub><sup>1</sup>, v′)}, σ, m). The reward function is r((v, I, X), σ) = m(X, σ).

We now show a sketch of correctness, which mimics the proof for the eventually operator closely.

*Finiteness.* Consider states of the form (v, I, X) ∈ Q. If v = 0, then for all S ∈ X and (qi, ζi) ∈ S, it must be that ζi = 0, since receiving a non-zero reward causes the value of v to become negative. Similarly, all ζi = 0 for (qi, ζi) ∈ I when v = 0. There are only finitely many such states. If −1 &lt; v &lt; 0, then we reach a state (v′, I′, X′) ∈ Q with v′ = −1 in at most n steps, where n is such that v/λ<sup>n</sup> ≤ −1. Therefore, the number of reachable states with −1 &lt; v &lt; 0 is also finite. Additionally, the number of states where v = −1 that can be initially reached is finite. Upon reaching such a state (−1, ∅, X), the reward machine is similar to that of a finite disjunction of reward machines for finite conjunctions.

*Property* I1*.* The transition function is designed so that the following holds: for any finite word w ∈ Σ<sup>∗</sup> of length n and letter σ ∈ Σ, if δ(q0, w) = (v, I, X), then m(X, σ) = (⟦ϕ1**U**<sup>n+1</sup><sub>λ</sub>ϕ2, wσ⟧ − ⟦ϕ1**U**<sup>n</sup><sub>λ</sub>ϕ2, w⟧)/λ<sup>n</sup>. Since r((v, I, X), σ) = m(X, σ), we get that Rϕ(w) = ⟦ϕ1**U**<sup>n</sup><sub>λ</sub>ϕ2, w⟧. Thus, Rϕ(ρ) = ⟦ϕ1**U**λϕ2, ρ⟧ for any infinite word ρ ∈ Σ<sup>ω</sup>. This property of m(X, σ) follows from the properties outlined in the construction, which can be shown inductively.

*Property* I2*.* Consider an SCC C of Rϕ and a state (v, I, X) ∈ C. If v = 0, then we must receive zero reward, because non-zero reward causes the value of v to become negative. It cannot be that −1 &lt; v &lt; 0 since, if v &lt; 0, we reach a state (v′, I′, X′) ∈ Q with v′ = −1 in at most n steps, where n is such that v/λ<sup>n</sup> ≤ −1. If v = −1, then we have a state of the form (−1, ∅, X). For this to be in an SCC, all elements of the form (qk, ζk) ∈ S for S ∈ X must be such that qk is in an SCC of its respective reward machine (either Rϕ1 or Rϕ2) with reward type tk ∈ {0, 1 − λ}. Additionally, all tk must be equal, since otherwise some ζk would change along a cycle in the SCC C. Thus, the reward for this SCC C is tk.

*Property* I3*.* This property can be shown by recalling the property above that r((v, I, X), σ) = m(X, σ) = (⟦ϕ1**U**<sup>n+1</sup><sub>λ</sub>ϕ2, wσ⟧ − ⟦ϕ1**U**<sup>n</sup><sub>λ</sub>ϕ2, w⟧)/λ<sup>n</sup>.

### 5 Conclusion

This paper studied policy synthesis for discounted LTL in MDPs with unknown transition probabilities. Unlike LTL, discounted LTL is insensitive to small perturbations of the transition probabilities, which enables PAC learning without additional assumptions. We outlined a PAC learning algorithm for discounted LTL that uses finite memory. We showed that optimal strategies for discounted LTL require infinite memory in general, due to the need to balance the values of multiple competing objectives. To avoid this infinite memory, we examined the case of uniformly discounted LTL, where the discount factors for all temporal operators are identical. We showed how to translate uniformly discounted LTL formulas to finite-state reward machines. This construction shows that finite memory is sufficient, and it provides an avenue to use discounted-reward algorithms, such as reinforcement learning, for computing optimal policies for uniformly discounted LTL formulas.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Synthesizing Permissive Winning Strategy Templates for Parity Games

Ashwani Anand, Satya Prakash Nayak(B) , and Anne-Kathrin Schmuck

Max Planck Institute for Software Systems, Kaiserslautern, Germany {ashwani,sanayak,akschmuck}@mpi-sws.org

Abstract. We present a novel method to compute *permissive winning strategies* in two-player games over finite graphs with ω-regular winning conditions. Given a game graph G and a parity winning condition Φ, we compute a *winning strategy template* Ψ that collects an infinite number of winning strategies for objective Φ in a concise data structure. We use this new representation of sets of winning strategies to tackle two problems arising from applications of two-player games in the context of cyber-physical system design – (i) *incremental synthesis*, i.e., adapting strategies to newly arriving, *additional* ω-regular objectives Φ′, and (ii) *fault-tolerant control*, i.e., adapting strategies to the occasional or persistent unavailability of actuators. The main features of our strategy templates – which we utilize for solving these challenges – are their easy computability, adaptability, and compositionality. For *incremental synthesis*, we empirically show on a large set of benchmarks that our technique vastly outperforms existing approaches if the number of added specifications increases. While our method is not complete, our prototype implementation returns the full winning region in all 1400 benchmark instances, i.e., handling a large problem class efficiently in practice.

### 1 Introduction

Two-player ω-regular games on finite graphs are an established modeling and solution formalism for many challenging problems in the context of correct-by-construction cyber-physical system (CPS) design [2,7,39]. Here, control software actuating a technical system "plays" against the physical environment. The winning strategy of the system player in this two-player game results in software which ensures that the controlled technical system fulfills a given temporal specification for any (possible) event or input sequence generated by the environment. Examples include warehouse robot coordination [36], reconfigurable manufacturing systems [26], and adaptive cruise control [33]. In these applications, the

© The Author(s) 2023

S. P. Nayak and A.-K. Schmuck are supported by the DFG project 389792660 TRR 248–CPEC.

A. Anand and A.-K. Schmuck are supported by the DFG project SCHM 3541/1-1.

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 436–458, 2023. https://doi.org/10.1007/978-3-031-37706-8_22

Fig. 1. Experimental results over 1400 generalized parity games comparing the performance of our tool PeSTel against the state-of-the-art generalized parity solver GenZiel [16]. Data points give the average execution time (in ms) over all instances with the same number of parity objectives. Left: all objectives are given *upfront*. Right: objectives are added *one-by-one*. See Sect. 6 for more details on those experiments.

technical system under control, as well as its requirements, are developing and changing during the design process. It is therefore desirable to allow for maintainable and adaptable control software. This, in turn, requires solution algorithms for two-player ω-regular games which allow for this adaptability.

This paper addresses this challenge by providing a new algorithm to efficiently compute *permissive winning strategy templates* in parity games which enable rich *strategy adaptations*. Given a game graph G = (V, E) and an objective Φ, a winning strategy template Ψ characterizes the winning region W ⊆ V along with three types of local edge conditions – a *safety*, a *co-live*, and a *live-group* template. The conjunction of these basic templates allows us to capture infinitely many winning strategies over G w.r.t. Φ in a simple data structure that is both (i) easy to obtain during synthesis, and (ii) easy to adapt and compose.

We showcase the usefulness of *permissive winning strategy templates* in the context of CPS design by two application scenarios: (i) *incremental synthesis*, where strategies need to be adapted to newly arriving *additional* ω-regular objectives Φ′, and (ii) *fault-tolerant control*, where strategies need to be adapted to the occasional or persistent unavailability of actuators, i.e., system player edges.

We have implemented our algorithms in a prototype tool PeSTel and run it on more than 1400 benchmarks adapted from the SYNTCOMP benchmark suite [21]. These experiments show that our class of templates effectively avoids recomputations for the required strategy adaptations. For *incremental synthesis*, our experimental results are previewed in Fig. 1, where we compare PeSTel against the state-of-the-art solver GenZiel [16] for generalized parity objectives, i.e., finite conjunctions of parity objectives. We see that PeSTel is as efficient as GenZiel whenever all conjuncts of the objective are given *up-front* (Fig. 1(left)) – even outperforming it in more than 90% of the instances. Whenever conjuncts of the objective arrive *one at a time*, PeSTel outperforms the existing approaches significantly if the number of objectives increases (Fig. 1(right)). This shows the potential of PeSTel towards more adaptable and maintainable control software for CPS.

$$\Phi\_1 = \Box\neg\{f\} \Rightarrow \Psi\_1 \text{ (safety)} \qquad \Phi\_2 = \Box\Diamond\{c,d\} \Rightarrow \Psi\_2 = \Psi\_{\mathrm{live}}(\{e\_{ac}, e\_{ad}\}) \qquad \Phi\_3 = \Diamond\Box\neg\{b\} \Rightarrow \Psi\_3 \text{ (co-live)}$$

Fig. 2. A two-player game graph with Player 1 (squares) and Player 0 (circles) vertices, different winning conditions Φi, and corresponding winning strategy templates Ψi.

Illustrative Example. To appreciate the simplicity and easy adaptability of our strategy templates, consider the game graph in Fig. 2(left). The first winning condition Φ1 requires vertex f to never be seen along a play. This can be enforced by Player 0 from the vertices W0 = {a, b, c, d}, called the *winning region*. The safety template Ψ1 ensures that the game always stays in W0 by forcing the edge ede to never be taken. It is easy to see that every Player 0 strategy that follows this rule results in plays which are winning if they start in W0. Now consider the second winning condition Φ2 which requires vertex c or d to be seen infinitely often. This induces the live-group template Ψ2 which requires that whenever vertex a is seen infinitely often, either edge eac or edge ead needs to be taken infinitely often. It is easy to see that any strategy that complies with this edge condition is winning for Player 0 from every vertex, and there are infinitely many such compliant winning strategies. Finally, we consider condition Φ3 requiring vertex b to be seen only finitely often. This induces the strategy template Ψ3 which is a co-liveness template requiring that all edges from Player 0 vertices which unavoidably lead to b (i.e., eab, edb, and ede) are taken only finitely often. We can now combine all templates into a new template Ψ = Ψ1 ∧ Ψ2 ∧ Ψ3 and observe that all strategies compliant with Ψ are winning for Φ = Φ1 ∧ Φ2 ∧ Φ3.

In addition to their compositionality, strategy templates also allow for local strategy adaptations in case of edge-unavailability faults. Consider again the game in Fig. 2 with the objective Φ2. Suppose that Player 0 follows the strategy π: a → d and d → a, which is compliant with Ψ2. If the edge ead becomes unavailable, we would need to re-solve the game for the modified game graph G′ = (V, E \ {ead}). However, given the strategy template Ψ2, we see that the strategy π′: a → c and d → a is actually compliant with Ψ2 over G′. This allows us to obtain a new strategy without re-solving the game.
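This kind of adaptation check can be mimicked computationally. The following Python sketch tests whether a positional strategy complies with a live-group template on a fragment where all vertices belong to Player 0, so that each strategy induces a unique eventually periodic play. The strategies mirror π and π′ above, but the concrete edges (e.g., an edge from c back to a) are our own illustrative assumptions, not the exact graph of Fig. 2.

```python
# Compliance of a positional strategy with a live-group template,
# checked on the cycle of the unique play it induces.  All vertices
# are assumed to be Player 0 vertices here; names are illustrative.

def lasso_cycle(strategy, v):
    """Return the cycle entered by the unique play from v."""
    seen, path = {}, []
    while v not in seen:
        seen[v] = len(path)
        path.append(v)
        v = strategy[v]
    return path[seen[v]:]

def complies_live(strategy, v, live_groups):
    """A live-group fires infinitely often iff one of its edges lies on
    the cycle, which is required whenever one of its sources does."""
    cyc = lasso_cycle(strategy, v)
    cyc_edges = {(cyc[i], cyc[(i + 1) % len(cyc)]) for i in range(len(cyc))}
    return all(
        not ({src for (src, _) in group} & set(cyc))  # no source recurs, or
        or bool(group & cyc_edges)                    # a group edge recurs
        for group in live_groups
    )

psi2 = [{("a", "c"), ("a", "d")}]  # live-group template for the objective Φ2
print(complies_live({"a": "d", "d": "a"}, "a", psi2))  # π  complies: True
print(complies_live({"a": "c", "c": "a"}, "a", psi2))  # π′ complies: True
print(complies_live({"a": "b", "b": "a"}, "a", psi2))  # violates Ψ2: False
```

The check exploits that for a positional strategy on a finite graph, the set of vertices and edges seen infinitely often is exactly the lasso's cycle.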

While these examples demonstrate the potential of templates for strategy adaptation, there exist scenarios where conflicts between templates or graph modifications arise, which require re-computations. Our empirical results, however, show that such conflicts rarely appear in practical benchmarks. This suggests that our technique can handle a large problem class efficiently in practice.

Related Work. The class of templates we use was introduced in [4] and utilized to represent environment assumptions that enable a system to fulfill its specifications in a cooperative setting. Contrary to [4], this paper uses the same class of templates to represent the system's winning strategies in a zero-sum setting.

While the computation of *permissive strategies* for the control of CPS is an established concept in the field of supervisory control<sup>1</sup> [14,42], it has also been addressed in reactive synthesis where the considered specification class is typically more expressive, e.g., Bernet et al. [8] introduce permissive strategies that encompass all the behaviors of positional strategies, and Neider et al. [31] introduce permissiveness to subsume strategies that visit losing loops at most twice. Finally, Bouyer et al. [11] take a quantitative approach to measure the permissiveness of strategies, by minimizing the penalty of not being permissive. However, none of these approaches is optimized towards strategy adaptation, and they thereby typically fail to preserve enough behaviors to effectively satisfy subsequent objectives. A notable exception is the work of Baier et al. [23]. While their strategy templates are more complicated and more costly to compute than ours, they are *maximally* permissive (i.e., capture *all* winning strategies in the game). However, when composing multiple objectives, they restrict templates substantially, which eliminates many compositional solutions that our method retains. This results in higher computation times and lower result quality for incremental synthesis compared to our approach. As no implementation of their method is available, we could not compare both approaches empirically.

Even without the incremental aspect, synthesizing winning strategies for conjunctions of ω-regular objectives is known to be a hard problem – Chatterjee et al. [16] prove that the conjunction of even *two* parity objectives makes the problem NP-complete. They provide a generalization of Zielonka's algorithm, called GenZiel, for generalized parity objectives (i.e., finite conjunctions of parity objectives), which is compared to our tool PeSTel in Fig. 1. While PeSTel is (in contrast to GenZiel) not complete—i.e., there exist realizable synthesis problems for which PeSTel returns no solution—our prototype implementation returns the full winning region in all 1400 benchmark instances.

Fault-tolerant control is a well-established topic in control engineering [9], with recent emphasis on the logical control layer [19,30]. While most of this work is conducted in the context of supervisory control, there are also some approaches in reactive synthesis. While [29,32] consider the *addition* of "disturbance edges" and synthesize a strategy that tolerates as many of them as possible, we look at the complementary problem, where edges, in particular system-player edges, disappear. To the best of our knowledge, the only algorithm that is able to tackle this problem without re-computation considers Büchi games [15]. In contrast, our method is applicable to the more expressive class of parity games.

### 2 Preliminaries

Notation. We use N to denote the set of natural numbers including zero. Given two natural numbers a, b ∈ N with a < b, we use [a; b] to denote the set {n ∈ N | a ≤ n ≤ b}. For any given set [a; b], we write i ∈even [a; b] and i ∈odd [a; b] as shorthand for i ∈ [a; b] ∩ {0, 2, 4, ...} and i ∈ [a; b] ∩ {1, 3, 5, ...}

<sup>1</sup> See [18,28,37] for connections between supervisory control and reactive synthesis.

respectively. Given two sets A and B, a relation R ⊆ A × B, and an element <sup>a</sup> <sup>∈</sup> <sup>A</sup>, we write <sup>R</sup>(a) to denote the set {<sup>b</sup> <sup>∈</sup> <sup>B</sup> <sup>|</sup> (a, b) <sup>∈</sup> <sup>R</sup>}.

Languages. Let Σ be a finite alphabet. The notation Σ<sup>∗</sup> and Σ<sup>ω</sup> respectively denote the set of finite and infinite words over <sup>Σ</sup>, and <sup>Σ</sup><sup>∞</sup> is equal to <sup>Σ</sup><sup>∗</sup> <sup>∪</sup> <sup>Σ</sup>ω. For any word w ∈ Σ∞, w<sup>i</sup> denotes the i-th symbol in w. Given two words u ∈ Σ<sup>∗</sup> and v ∈ Σ∞, the concatenation of u and v is written as the word uv.

Game Graphs. A *game graph* is a tuple G = ⟨V = V0 ∪· V1, E⟩ where (V, E) is a finite directed graph with *vertices* V and *edges* E, and V0, V1 ⊆ V form a partition of V. Without loss of generality, we assume that for every v ∈ V there exists v′ ∈ V s.t. (v, v′) ∈ E. A *play* originating at a vertex v0 is a finite or infinite sequence of vertices ρ = v0v1 ... ∈ V∞.

Winning Conditions/Objectives. Given a game graph G, we consider winning conditions/objectives specified using a formula Φ in *linear temporal logic* (LTL) over the vertex set V, that is, we consider LTL formulas whose atomic propositions are sets of vertices V. In this case the set of desired infinite plays is given by the semantics of Φ, which is an ω-regular language L(Φ) ⊆ Vω. Every game graph with an arbitrary ω-regular set of desired infinite plays can be reduced to a game graph (possibly with a different set of vertices) with an LTL winning condition, as above. The standard definitions of ω-regular languages and LTL are omitted for brevity and can be found in standard textbooks [6]. To simplify notation, we use e = (u, v) in LTL formulas as syntactic sugar for u ∧ ◯v, with ◯ as the LTL *next* operator. We further use a set of edges E′ = {ei}i∈[0;k] as an atomic proposition to denote ⋁i∈[0;k] ei.

Games and Strategies. A *two-player (turn-based) game* is a pair G = (G, Φ) where G is a game graph and Φ is a *winning condition* over G. A strategy of Player i, i ∈ {0, 1}, is a function πi : V∗Vi → V such that for every ρv ∈ V∗Vi it holds that πi(ρv) ∈ E(v). Given a strategy πi, we say that the play ρ = v0v1 ... is *compliant* with πi if vk−1 ∈ Vi implies vk = πi(v0 ... vk−1) for all k. We refer to a play compliant with πi and a play compliant with both π0 and π1 as a πi-*play* and a π0π1-*play*, respectively. We collect all plays originating in a set S and compliant with πi (and compliant with both π0 and π1) in the set L(S, πi) (and L(S, π0π1), respectively). When S = V, we drop the mention of the set in the previous notation, and when S is a singleton {v}, we simply write L(v, πi) (and L(v, π0π1), respectively).

Winning. Given a game <sup>G</sup> = (G, Φ), a play <sup>ρ</sup> in <sup>G</sup> is *winning for* Player 0, if <sup>ρ</sup> ∈ L(Φ), and it is winning for Player 1, otherwise. A strategy <sup>π</sup><sup>i</sup> for Player <sup>i</sup> is *winning from a vertex* <sup>v</sup> <sup>∈</sup> <sup>V</sup> if all plays compliant with <sup>π</sup><sup>i</sup> and originating from v are winning for Player i. We say that a vertex v ∈ V is *winning for* Player i, if there exists a winning strategy π<sup>i</sup> from v. We collect all winning vertices of Player i in the Player i *winning region* W<sup>i</sup> ⊆ V . We always interpret winning w.r.t. Player 0 if not stated otherwise.

Strategy Templates. Let π0 be a Player 0 strategy and Φ be an LTL formula. Then we say π0 *follows* Φ, denoted π0 ⊨ Φ, if every π0-play ρ belongs to L(Φ), i.e., L(π0) ⊆ L(Φ). We refer to a set Ψ = {Ψ1, ..., Ψk} of LTL formulas as a *strategy template* representing the set of strategies that follow Ψ1 ∧ ... ∧ Ψk. We say a strategy template Ψ is *winning from a vertex* v for a game (G, Φ) if every Player 0 strategy following the template Ψ is winning from v. Moreover, we say a strategy template Ψ is *winning* if it is winning from every vertex in W0. In addition, we call Ψ *maximally permissive* for G if every Player 0 strategy π which is winning in G also follows Ψ. With slight abuse of notation, we use Ψ for the set of formulas {Ψ1, ..., Ψk} and the formula Ψ1 ∧ ... ∧ Ψk interchangeably.

Set Transformers. Let <sup>G</sup> = (<sup>V</sup> <sup>=</sup> <sup>V</sup> <sup>0</sup> ∪· <sup>V</sup> <sup>1</sup>, E) be a game graph, <sup>U</sup> <sup>⊆</sup> <sup>V</sup> be a subset of vertices, and <sup>a</sup> ∈ {0, 1} be the player index. Then

$$\mathsf{u}\mathsf{pre}\_G(U) = \{v \in V \mid \forall (v, u) \in E. \, u \in U\} \tag{1}$$

$$\mathsf{cpre}\_G^a(U) = \{ v \in V^a \mid \exists (v, u) \in E. \, u \in U \} \cup \{ v \in V^{1-a} \mid v \in \mathsf{upre}\_G(U) \}\tag{2}$$

The universal predecessor operator upreG(U) computes the set of vertices with all successors in U, and the controllable predecessor operator cpreaG(U) computes the vertices from which Player a can force visiting U in *exactly one* step. In the following, we introduce two types of attractor operators: attraG(U), which computes the set of vertices from which Player a can force at least a single visit to U in *finitely many* steps, and the universal attractor uattrG(U), which computes the set of vertices from which both players are forced to visit U. For the following, let pre ∈ {upre, cprea}.

$$\mathsf{pre}\_G^1(U) = \mathsf{pre}\_G(U) \cup U \qquad \mathsf{pre}\_G^i(U) = \mathsf{pre}\_G(\mathsf{pre}\_G^{i-1}(U)) \cup \mathsf{pre}\_G^{i-1}(U) \qquad (3)$$

$$\mathsf{attr}\_G^a(U) = \cup\_{i \ge 1} \mathsf{cpre}\_G^{a,i}(U) \qquad \mathsf{uattr}\_G(U) = \cup\_{i \ge 1} \mathsf{upre}\_G^i(U) \tag{4}$$
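The fixpoint characterizations in (1)–(4) translate directly into code. The following Python sketch implements upre, cpre, and the attractor on an explicit edge-list representation; the encoding of the game graph as plain Python sets is our own illustrative choice.

```python
# Set transformers (1)-(4) over an explicit game graph: V0 / V1 are the
# Player 0 / Player 1 vertices, edges is a set of (source, target) pairs.
# Every vertex is assumed to have at least one successor, as in the paper.

def upre(edges, V0, V1, U):
    """Universal predecessor: vertices all of whose successors lie in U."""
    return {v for v in V0 | V1
            if all(u in U for (s, u) in edges if s == v)}

def cpre(edges, V0, V1, U, a):
    """Vertices from which Player a forces a visit to U in exactly one step."""
    Va, Vb = (V0, V1) if a == 0 else (V1, V0)
    return ({v for v in Va if any(u in U for (s, u) in edges if s == v)}
            | (Vb & upre(edges, V0, V1, U)))

def attr(edges, V0, V1, U, a):
    """Player-a attractor of U: least fixpoint of cpre above U."""
    A = set(U)
    while True:
        A_next = A | cpre(edges, V0, V1, A, a)
        if A_next == A:
            return A
        A = A_next
```

On a toy graph with Player 0 vertices {a, c}, Player 1 vertex {b}, and edges a→b, a→c, b→a, c→c, the Player 0 attractor of {c} is {a, b, c}: a can move to c directly, and b cannot avoid entering a.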

### 3 Computation of Winning Strategy Templates

Given a 2-player game G with an objective Φ, the goal of this section is to compute a *strategy template* that characterizes a large class of winning strategies of Player 0 from a set of vertices U ⊆ V in a local, permissive, and computationally efficient way. These templates are then utilized in Sect. 5.1 for incremental synthesis. In particular, this section introduces three distinct template classes, namely safety templates (Sect. 3.1), live-group templates (Sect. 3.2), and co-live templates (Sect. 3.3), along with algorithms for their computation via safety, Büchi, and co-Büchi games, respectively. We then turn to general parity objectives, which can be thought of as a sophisticated combination of Büchi and co-Büchi objectives. We show in Sect. 3.4 how the three introduced templates can be derived for a general parity objective by a suitable combination of the previously introduced algorithms for single templates. All presented algorithms have the same worst-case computation time as the standard algorithms solving the respective game. This shows that extracting strategy *templates* instead of 'normal' strategies does not incur an additional computational cost. We prove the soundness of the algorithms and discuss the complexities in the full version [5, Appendix A].

#### 3.1 Safety Templates

We start the construction of strategy templates by restricting ourselves to games with a safety objective, i.e., G = (G, Φ) with Φ := □U for some U ⊆ V. A winning play in a safety game never leaves U ⊆ V. It is well known that such games allow capturing *all* winning strategies by a simple local template which essentially only allows Player 0 moves from winning vertices to other winning vertices. This is formalized in our notation as a safety template as follows.

Theorem 1 ([8, Fact 7]). *Let* G = (G, □U) *be a safety game with winning region* W0 *and* S = {(u, v) ∈ E | u ∈ V0 ∩ W0 ∧ v ∉ W0}*. Then*

$$\Psi\_{\text{UNSAFE}}(S) := \Box \bigwedge\_{e \in S} \neg e,\tag{5}$$

*is a winning strategy template for the game* G *which is also maximally permissive.*

It is easy to see that the computation of the safety template Ψunsafe(S) reduces to computing the winning region W0 in the safety game (G, □U) and extracting S. We refer to the edges in S as *unsafe edges*, and we call the algorithm computing the set S SafetyTemplate(G, U). Note that it runs in O(m) time, where m = |E|, as safety games are solvable in O(m) time.
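As an illustration, here is a direct Python sketch of this computation: solve the safety game by a greatest-fixpoint iteration and read off the unsafe edges of Theorem 1. The graph encoding is hypothetical and not taken from the paper's tool.

```python
# SafetyTemplate sketch: compute the Player 0 winning region W0 of the
# safety game for staying in U, then collect the unsafe edges S, i.e.,
# the Player 0 edges that leave W0.

def safety_template(V0, V1, edges, U):
    W = set(U)
    changed = True
    while changed:                          # greatest-fixpoint iteration
        changed = False
        for v in list(W):
            succ_in_W = {u for (s, u) in edges if s == v and u in W}
            succ_out = {u for (s, u) in edges if s == v and u not in W}
            # Player 0 needs some successor inside W; for Player 1
            # vertices, no successor may lie outside W.
            if (v in V0 and not succ_in_W) or (v in V1 and succ_out):
                W.discard(v)
                changed = True
    S = {(u, v) for (u, v) in edges if u in (V0 & W) and v not in W}
    return W, S
```

For instance, with Player 0 vertex a (edges a→a, a→e), Player 1 vertex b (edge b→a), and U = {a, b}, the winning region is {a, b} and S = {(a, e)}: vertex a must never move to e.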

#### 3.2 Live-Group Templates

As the next step, we now move to simple liveness objectives which require a particular vertex set I ⊆ V to be seen infinitely often. Here, winning strategies need to stay in the winning region (as before) but in addition always eventually need to make progress towards the vertex set I. We capture this required progress by *live-group templates*—given a group of edges H ⊆ E, we require that whenever a source vertex v of an edge in H is seen infinitely often, an edge e ∈ H (not necessarily starting at v) also needs to be taken infinitely often. This template ensures that compliant strategies always eventually make progress towards I, as illustrated by the following example.

*Example 1.* Consider the game graph in Fig. 2 where we require visiting {c, d} infinitely often. To satisfy this objective from vertex a, Player 0 must not get stuck at a, and must not always move to b (since Player 1 can force visiting a again, and stop Player 0 from satisfying the objective). Hence, Player 0 has to always eventually leave a and go to {c, d}. This can be captured by the live-group {eac, ead}. Now if the play comes to a infinitely often, Player 0 will go to either c or d infinitely often, hence satisfying the objective.

Formally, such games are called *Büchi games*, denoted by G = (G = (V, E), Φ) with Φ := □♦I for some I ⊆ V. In addition, a *live-group* H = {ej}j≥0 is a set of edges ej = (sj, tj) with source vertices *src*(H) := {sj}j≥0. Given a set of live-groups H = {Hi}i≥0, we define a live-group template as

$$\Psi\_{\rm{LIVE}}(\mathcal{H}) \coloneqq \bigwedge\_{i \geq 0} \Box \Diamond src(H\_i) \implies \Box \Diamond H\_i. \tag{6}$$

# Algorithm 1. BüchiTemplate(G, I)

Input: A game graph G, and a subset of vertices I
Output: A set of unsafe edges S and a set of live-groups H
1: W0 ← Büchi(G, I); S ← SafetyTemplate(G, W0)
2: G ← G|W0; I ← I ∩ W0
3: H ← ReachTemplate(G, I)
4: return (S, H)
5: procedure ReachTemplate(G, I ⊆ V)
6:   H ← ∅
7:   while I ≠ V do
8:     A ← uattrG(I); B ← cpre0G(A); H ← H ∪ {Edges(B, A)}; I ← A ∪ B
9:   return H

The live-group template says that if some vertex from the source of a live-group is visited infinitely often, then some edge from this group should be taken infinitely often by a strategy following the template.

Intuitively, winning strategy templates for Büchi games consist of a safety template conjuncted with a live-group template. While the former forces all strategies to stay within the winning region W0, the latter enforces progress w.r.t. the goal set I within W0. Therefore, the computation of a winning strategy template for Büchi games reduces to the computation of the unsafe set S defining Ψunsafe(S) in (5) and the live-groups H defining Ψlive(H) in (6). We denote by BüchiTemplate(G, I) the algorithm computing the above, as detailed in Algorithm 1. The algorithm uses some new notation that we define here. The function Büchi solves a Büchi game and returns the winning region (e.g., using the standard algorithm from [17]); Edges(X, Y) = {(u, v) ∈ E | u ∈ X, v ∈ Y} is the set of edges between two subsets of vertices X and Y; and G|U := ⟨U = U0 ∪· U1, E′⟩ with U0 := V0 ∩ U, U1 := V1 ∩ U, and E′ := E ∩ (U × U) denotes the restriction of a game graph G := ⟨V = V0 ∪· V1, E⟩ to a subset of its vertices U ⊆ V. We have the following formal result.
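To make the peeling loop of ReachTemplate (lines 5–9 of Algorithm 1) concrete, here is a Python sketch that follows the pseudocode. It assumes that every vertex of the (restricted) graph can reach I, as is the case inside the Büchi winning region, and the data encoding is again our own illustrative choice.

```python
# ReachTemplate sketch: repeatedly take the universal attractor of I and
# group, layer by layer, the Player 0 edges that step into it; each such
# layer becomes one live-group forcing progress towards I.

def reach_template(V0, V1, edges, I):
    verts = V0 | V1

    def succs(v):
        return {u for (s, u) in edges if s == v}

    def uattr(U):                        # universal attractor of U
        A = set(U)
        while True:
            A_next = A | {v for v in verts if succs(v) <= A}
            if A_next == A:
                return A
            A = A_next

    H, covered = [], set(I)
    while covered != verts:
        A = uattr(covered)
        if A == verts:                   # everything is forced into I already
            break
        B = {v for v in V0 - A if succs(v) & A}      # cpre0 layer outside A
        H.append({(u, v) for (u, v) in edges if u in B and v in A})
        covered = A | B
    return H
```

On the toy graph with Player 0 vertices {a, c}, Player 1 vertex b, edges a→b, a→c, b→a, c→c, and I = {c}, this returns the single live-group {(a, c)}, matching the intuition of Example 1: whenever a recurs, the edge to c must be taken infinitely often.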

Theorem 2. *Given a Büchi game* G = (G, □♦I) *for some* I ⊆ V*, if* (S, H) = BüchiTemplate(G, I) *then* Ψ = {Ψunsafe(S), Ψlive(H)} *is a winning strategy template for the game* G*, computable in time* O(nm)*, where* n = |V| *and* m = |E|*.*

While live-group templates capture infinitely many winning strategies in Büchi games, they are *not* maximally permissive, as exemplified next.

*Example 2.* Consider the game graph in Fig. 2 restricted to the vertex set {a, b, d} with the Büchi objective □♦d. Our algorithm outputs the live-group template Ψ = Ψlive({ead}). Now consider the winning strategy with memory that takes edge eda from d, and from a takes eab after the play suffix bda and ead after the play suffix aba. This strategy does not follow the template—the play (abd)ω is in L(π0) but not in L(Ψ).

#### 3.3 Co-live Templates

We now turn to yet another objective, which is the dual of the one discussed before. The objective requires that eventually only a particular subset of vertices I is seen. A winning strategy for this objective must, after a finite amount of time, avoid leaving I. It is easy to notice that live-group templates cannot ensure this, but it can be captured by *co-live templates*: given a set of edges, eventually these edges are not taken anymore. Intuitively, these are the edges that take or keep a play away from I.

*Example 3.* Consider the game graph in Fig. 2 where we require to eventually stop visiting b, i.e., to stay in I = {a, c, d}. To satisfy this objective from vertex a, Player 0 needs to eventually stop leaving I. Hence, Player 0 has to stop taking the edges {eab, edb, ede}, which can be ensured by marking these edges co-live. Now, since no edges leading to b are taken anymore, the play eventually stays in I, satisfying the objective. We note that this cannot be captured by the live-groups {eaa, eac, ead} and {eda}, since the strategy that alternately visits c and b from Player 0's vertices does not satisfy the objective, but follows the live-groups.

Formally, a co-Büchi game is a game G = (G, Φ) with co-Büchi winning condition Φ := ♦□I for some goal vertices I ⊆ V. A play is winning for Player 0 in such a co-Büchi game if it eventually stays in I forever. The *co-live* template is defined by a set of *co-live* edges D as follows,

$$\Psi\_{\text{COLIVE}}(D) := \bigwedge\_{e \in D} \Diamond \Box \neg e.$$

The intuition behind the winning template is that it forces staying in the winning region using the safety template, and ensures that the play does not move away from the vertex set I infinitely often using the co-live template. We provide the procedure in Algorithm 2 and its correctness in the following theorem. Here, CoBüchi(G, I) is a standard algorithm solving the co-Büchi game with the goal vertices I, which outputs the winning regions for both players [17]. We also use the standard algorithm Safety(G, I) that solves the safety game with the objective to stay in I forever.

Theorem 3. *Given a co-Büchi game* G = (G, ♦□I) *for some* I ⊆ V*, if* (S, D) = coBüchiTemplate(G, I) *then* Ψ = {Ψunsafe(S), Ψcolive(D)} *is a winning strategy template for Player* 0*, computable in time* O(nm) *with* n = |V| *and* m = |E|*.*

#### 3.4 Parity Games

We now consider a more complex but canonical class of ω-regular objectives. Parity objectives are of central importance in the study of synthesis problems as they are general enough to model a large class of qualitative requirements of cyber-physical systems, while enjoying properties like positional determinacy.

# Algorithm 2. coBüchiTemplate(G, I)

Input: A game graph G, and a subset of vertices I
Output: A set of unsafe edges S and a set of co-live edges D
1: S ← ∅; D ← ∅
2: W0 ← CoBüchi(G, I); S ← SafetyTemplate(G, W0)
3: G ← G|W0; I ← I ∩ W0
4: while V ≠ ∅ do
5:   A ← Safety(G, I); D ← D ∪ Edges(A, V\A)
6:   while cpre0G(A) ⊈ A do            ▷ Computes attr0G(A)
7:     B ← cpre0G(A)
8:     D ← D ∪ Edges(B, V\(A ∪ B)) ∪ Edges(B, B)
9:     A ← A ∪ B
10:  G ← G|V\A; I ← I ∩ (V\A)
11: return (S, D)

A parity game is a game <sup>G</sup> = (G, Φ) with parity winning condition <sup>Φ</sup> = *Parity*(P), where

$$\mathit{Parity}(\mathbb{P}) \coloneqq \bigwedge\_{i \,\in\_{\text{odd}}\, [0; d]} \left( \Box \Diamond P\_i \implies \bigvee\_{j \,\in\_{\text{even}}\, [i+1; d]} \Box \Diamond P\_j \right),\tag{7}$$

with Pi = {v ∈ V | P(v) = i} for the priority function P : V → [0; d] that assigns each vertex a priority. A play is winning for Player 0 in such a game if the maximum priority seen infinitely often is even.
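Since a play of the form prefix·(cycle)^ω sees exactly the cycle's priorities infinitely often, checking condition (7) for such an ultimately periodic play reduces to one maximum computation. A minimal Python sketch, with an illustrative priority assignment of our own:

```python
# Parity check for an ultimately periodic play prefix . cycle^omega:
# only the priorities of vertices on the cycle occur infinitely often,
# so the play is winning iff the cycle's maximum priority is even.

def wins_parity(cycle_vertices, priority):
    """True iff the maximum priority seen infinitely often is even."""
    return max(priority[v] for v in cycle_vertices) % 2 == 0

priority = {"a": 1, "b": 2, "c": 3}            # illustrative priorities
print(wins_parity(["a", "b"], priority))       # max is 2, even -> True
print(wins_parity(["b", "c"], priority))       # max is 3, odd  -> False
```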

Although parity objectives subsume previously described objectives, we can construct strategy templates for parity games using the combinations of previously defined templates. To this end, we give the following algorithm.

Theorem 4. *Given a parity game* G = (G, *Parity*(P)) *with priority function* P : V → [0; d]*, if* ((W0, W1), H, D) = ParityTemplate(G, P)*, then* Ψ = {Ψunsafe(S), Ψlive(H), Ψcolive(D)} *is a winning strategy template for the game* G*, where* S = Edges(W0, W1)*. Moreover, the algorithm terminates in time* O(n^(d+O(1)))*, which is the same as that of Zielonka's algorithm.*

We refer the readers to the full version [5, Appendix A.3] for the complete proofs; here we provide the intuition behind the algorithm and illustrate its computation on the parity game in Fig. 3. The algorithm follows the divide-and-conquer approach of Zielonka's algorithm.

Fig. 3. A parity game, where a vertex with priority i has label pi. The dotted edge in red is a co-live edge, while the dashed edges in blue are singleton live-groups. (Color figure online)

# Algorithm 3. ParityTemplate(G, P)

Input: A game graph G, and a priority function P : V → {0, ..., d}
Output: Winning regions (W0, W1), live-groups H, and co-live edges D
1: if d is odd then
2:   A ← attr1G(Pd)
3:   if A = V then return (∅, V), ∅, ∅
4:   else
5:     (W0, W1), H, D ← ParityTemplate(G|V\A, P)
6:     if W0 = ∅ then return (∅, V), ∅, ∅
7:     else
8:       B ← attr0G(W0)
9:       D ← D ∪ Edges(W0, V\W0)
10:      H ← H ∪ ReachTemplate(G, W0)
11:      (W0′, W1′), H′, D′ ← ParityTemplate(G|V\B, P)
12:      return (W0′ ∪ B, W1′), H ∪ H′, D ∪ D′
13: else                                    ▷ d is even
14:   A ← attr0G(Pd)
15:   if A = V then return (V, ∅), ReachTemplate(G, Pd), ∅
16:   else
17:     (W0, W1), H, D ← ParityTemplate(G|V\A, P)
18:     if W1 = ∅ then return (V, ∅), H ∪ ReachTemplate(G|A, Pd), D
19:     else
20:       B ← attr1G(W1)
21:       (W0′, W1′), H′, D′ ← ParityTemplate(G|V\B, P)
22:       return (W0′, W1′ ∪ B), H′, D′

Since the highest occurring priority 6 is even, we first find the vertices A = {d, h} from which Player 0 can force visiting {d} (the vertices with priority 6) in line 14. Then, since A ≠ V, we find the winning strategy template in the rest of the graph G1 = G|V\A. Now the highest priority 5 is odd; hence we compute the region {c} from which Player 1 can ensure visiting priority 5. We again restrict our graph to G2 = G|{a,b,e,f,g}. Again, the highest priority is even. We further compute the region A2 = {a, b} from which Player 0 can ensure visiting priority 4, giving us G3 = G|{e,f,g}. In G3, Player 0 can ensure visiting the highest priority 2, hence satisfying the condition in line 15. Since in this small graph Player 0 needs to keep visiting priority 2 infinitely often, we obtain the live-groups {egf} and {eff} in line 15. Coming one recursive step back to G2, since G3 does not have a winning vertex for Player 1, the condition in line 18 is satisfied. Hence, for the vertices in A2, it suffices to keep visiting priority 4 to win, which is ensured by the live-group {eab} added in line 18. Now, going one recursive step back to G1, we have W0 = {a, b, e, f, g}. If Player 0 can ensure reaching and staying in W0 from the rest of the graph G1, it can satisfy the parity condition. Since from the vertex c, W0 will be reached anyway, we obtain a co-live edge ebc in line 9 to eventually keep the play in W0. Coming back to the initial recursive call, since G1 is again winning for Player 0, they only need to be able to visit priority 6 from every vertex in A, giving another live-group {ehd}.

### 4 Extracting Strategies from Strategy Templates

This section discusses how a strategy that follows a computed winning strategy template can be extracted from the template. As our templates are just particular LTL formulas, one can of course use automata-theoretic techniques for this. However, as the types of templates we presented put only local restrictions on strategies, we can extract a strategy much more efficiently. For instance, the game in Fig. 2 with strategy template Ψ = Ψ_live({e_ac, e_ad}) allows the strategy that simply uses the edges e_ac and e_ad alternately from vertex a.

However, strategy extraction is not as straightforward for every template, even if it only conjoins the three template types we introduced in Sect. 3. For instance, consider again the game graph from Fig. 2 with a strategy template Ψ = {Ψ_unsafe(e_ac, e_ad), Ψ_colive(e_aa, e_ab)}. Here, none of the four choices of Player 0 (i.e., outgoing edges) from vertex a can be taken infinitely often, and hence the only way a play satisfies Ψ is to not visit vertex a infinitely often. On the other hand, given the strategy template Ψ = {Ψ_colive(e_ab, e_db), Ψ_live({e_ab, e_ac, e_db})}, edge e_db is both live and co-live, which raises a conflict for vertex d. Hence, the only way a strategy can follow Ψ is again to ensure that d is not visited infinitely often. We call such situations *conflicts*. Interestingly, the methods we presented in Sect. 3 never create such conflicts; the computed templates are therefore *conflict-free*, as formalized next and proven in the full version [5, Appendix A.4].

Definition 1. *A strategy template* Ψ = {Ψ_unsafe(S), Ψ_colive(D), Ψ_live(H)} *in a game graph* G = (V, E) *is* conflict-free *if the following are true:*

(i) *every Player 0 vertex in the winning region has an outgoing edge that is neither unsafe nor co-live, i.e., an edge not in* S ∪ D*; and*
(ii) *for every live-group* H_j ∈ H *and every source vertex* v *of an edge in* H_j*, some outgoing edge of* v *in* H_j *is not in* S ∪ D*.*

Proposition 1. *Algorithms 1, 2, and 3 always return conflict-free templates.*

Due to the given conflict-freeness, winning strategies are indeed easy to extract from winning strategy templates, as formalized next.

Proposition 2. *Given a game graph* G = (V, E) *with a* conflict-free *winning strategy template* Ψ = {Ψ_unsafe(S), Ψ_colive(D), Ψ_live(H)}*, a winning strategy* π_0 *that follows* Ψ *can be extracted in time* O(m)*, where* m *is the number of edges.*

The proof is straightforward by constructing the winning strategy as follows. We first remove all unsafe and co-live edges from G and then construct a strategy π_0 that alternates between all remaining edges from every vertex in W_0. This strategy is well defined, as condition (i) in Definition 1 ensures that after removing all the unsafe and co-live edges a choice from every vertex remains. Moreover, if the vertex is a source of a live-group edge, condition (ii) in Definition 1 ensures that there are outgoing edges satisfying every live-group. It is easy to see that the constructed strategy indeed follows Ψ and is hence winning from vertices in W_0, as Ψ was a winning strategy template. We call this procedure of strategy extraction ExtractStrategy(G, Ψ).
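A minimal sketch of this extraction, with the game's edges as a successor map and the template as plain edge sets (hypothetical representation). Live-groups need no special handling here: round-robin already takes every remaining edge, and hence every live-group edge, infinitely often from any vertex visited infinitely often.

```python
from itertools import cycle

def extract_strategy(edges, unsafe, colive):
    """ExtractStrategy sketch: drop unsafe and co-live edges, then
    round-robin over each vertex's remaining successors."""
    allowed = {v: [w for w in succs
                   if (v, w) not in unsafe and (v, w) not in colive]
               for v, succs in edges.items()}
    # condition (i) of conflict-freeness: some choice must remain
    assert all(allowed.values()), "template is not conflict-free"
    nxt = {v: cycle(ws) for v, ws in allowed.items()}
    return lambda v: next(nxt[v])  # strategy: vertex -> chosen successor
```

Since each vertex is processed once and each edge filtered once, the construction runs in O(m), matching Proposition 2.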

### 5 Applications of Strategy Templates

This section considers two concrete applications of strategy templates which utilize their structural simplicity and easy adaptability.

In the context of CPS control design problems, it is well known that the game graph of the resulting parity game used for strategy synthesis typically has a physical interpretation and results from behavioral constraints on the *existing technical system* that is subject to control. In particular, following the well-established paradigm of abstraction-based control design (ABCD) [2,7,39], an underlying (stochastic) disturbed non-linear dynamical system can be automatically abstracted into a two-player game graph using standard abstraction tools, e.g. SCOTS [35], ARCS [13], MASCOT [20], P-FACES [22], or ROCS [27].

In contrast to classical problems in reactive synthesis, it is very natural in this context to think about the game graph and the specification as two *different* objects. Here, specifications are naturally expressed via propositions that are defined over sets of states of this underlying game graph, without changing its structure. This separation is for example also present in the known LTL fragment GR(1) [10]. Arguably, this feature has contributed to the success of GR(1)-based synthesis for CPS applications, e.g. [1,3,24,25,38,40,41].

Given this insight, it is natural to define the incremental synthesis problem such that the game graph stays unchanged, while newly arriving specifications are modeled as new parity conditions over the same game graph. Formally, this results in a *generalized parity game* where the different objectives arrive *one at a time*. We show an incremental algorithm for synthesizing winning strategies for such games in Sect. 5.1. Similarly, fault-tolerant control requires the controller to adapt to unavailable actuators within the technical system under control. This naturally translates to the removal of Player 0 edges within the game graph given its physical interpretation. We show how strategy templates can be used to adapt winning strategies to these game graph modifications in Sect. 5.2.

#### 5.1 Incremental Synthesis via Strategy Templates

In this section we consider a 2-player game G with a conjunction Φ = ⋀_{i=1}^{k} Φ_i of multiple parity objectives Φ_i, also called a *generalized* parity objective. However, in comparison to existing work [12,16], we consider the case where the different objectives Φ_i might not all arrive at the same time. The intuition of our algorithm is to solve each parity game (G, Φ_i) separately and then combine the resulting strategy templates Ψ_i into a global template Ψ = ⋀_{i=1}^{k} Ψ_i. This allows us to easily incorporate a newly arriving objective Φ_{k+1}: we only need to solve the parity game (G, Φ_{k+1}) and then combine the resulting template Ψ_{k+1} with Ψ.

While Proposition 1 ensures that every individual template Ψ_i is *conflict-free*, this unfortunately does not imply that their conjunction is also *conflict-free*. Intuitively, combining strategy templates can cause conditions (i) and (ii) in Definition 1 to no longer hold, resulting in a *conflict*. As already discussed in Sect. 4, this requires source vertices U ⊆ V with such conflicts to eventually not be visited anymore. We therefore resolve such conflicts by adding the specification ◇□¬U to every objective and recomputing the templates.

**Algorithm 4.** ComposeTemplate(G, (W'_0, H', D', (Φ'_i)_{i<ℓ}), (Φ_i)_{ℓ≤i≤k}) where Φ_i = *Parity*(P_i)

8: P'_i ← P_i[C_1 ∪ C_2 → 2d_i + 1] for each i ≤ k
9: return ComposeTemplate(G, (W_0, ∅, ∅, ∅), (Φ'_i)_{i≤k}) with Φ'_i = *Parity*(P'_i)

To efficiently formalize this objective change, we note that a parity objective *Parity*(P) with an additional specification ◇□¬U for some U ⊆ V is equivalent to another parity objective *Parity*(P'), where the priority function P' is obtained from P : V → [0; 2d+1] simply by changing the priorities of the vertices in U to 2d+1. We denote such a priority function by P[U → 2d+1]. In particular, we have the following result:
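The transformation P[U → 2d+1] is a one-line rewrite; as a sketch over a dictionary-based priority function (the representation is an assumption):

```python
def raise_priorities(priority, U, d):
    """P[U -> 2d+1]: give every vertex in U the (odd) top priority
    2d+1, so a play satisfying the new parity objective may visit U
    only finitely often."""
    return {v: (2 * d + 1 if v in U else p) for v, p in priority.items()}
```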

Lemma 1. *Given a game graph* G *and two parity objectives* Φ = *Parity*(P)*,* Φ' = *Parity*(P') *such that* P : V → [0; 2d + 1] *and* P' = P[U → 2d + 1] *for some vertex set* U ⊆ V*, it holds that* L(Φ') = L(Φ ∧ ◇□¬U)*. Moreover, if a strategy template is winning from some vertex* u *in the game* G' = (G, Φ')*, then it is also winning from* u *in the game* G = (G, Φ)*.*

Using the above ideas, we present Algorithm 4 to solve generalized parity games (possibly incrementally). If no partial solution to the synthesis problem exists so far, we have ℓ = 0; otherwise the game (G, ⋀_{i<ℓ} Φ_i) was already solved and the respective winning region and templates are known. In both cases, the algorithm starts by computing a winning strategy template for each game (G, Φ_i) for i ∈ {ℓ, …, k} (line 1) and conjoins them with the already computed ones (line 2). Then the algorithm checks for conflicts (lines 3–4). If there is some conflict, the algorithm modifies the objectives to ensure that the conflicted vertices are eventually not visited anymore (line 8), and then re-computes the templates in the game graph restricted to the intersection of the winning regions of all objectives (line 9). If there is no conflict, the algorithm returns the conjunction of the templates, which is conflict-free and hence winning from the intersection of the winning regions of all objectives (line 6). The latter is formalized in the following theorem. The proof can be found in the full version [5, Appendix B.2].
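The conflict check of lines 3–4 can be sketched directly from the two conflict-freeness conditions (edge-set representation and names are hypothetical):

```python
def conflicted_vertices(edges, unsafe, colive, live_groups):
    """Player-0 vertices where the conjoined template raises a
    conflict: every outgoing edge is unsafe or co-live (condition (i)
    fails), or some live-group has no usable outgoing edge left at its
    source (condition (ii) fails)."""
    dead = unsafe | colive
    bad = set()
    for v, succs in edges.items():
        out = {(v, w) for w in succs}
        if out and out <= dead:          # condition (i) violated
            bad.add(v)
        for group in live_groups:
            g_out = out & group
            if g_out and g_out <= dead:  # condition (ii) violated
                bad.add(v)
    return bad
```

On the second example of Sect. 4 (Ψ_colive(e_ab, e_db) conjoined with Ψ_live({e_ab, e_ac, e_db})), this reports vertex d as conflicted.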

Theorem 5. *Given a generalized parity game* G = (G, ⋀_{i≤k} Φ_i) *with* Φ_i = *Parity*(P_i) *and priority functions* P_i : V → [0; 2d_i + 1]*, if* (W_0, H, D, (Φ'_i)_{i≤k}) = ComposeTemplate(G, (V, ∅, ∅, ∅), (Φ_i)_{i≤k})*, then* Ψ = {Ψ_unsafe(S), Ψ_live(H), Ψ_colive(D)} *is a conflict-free strategy template that is winning from* W_0 *in the game* G*, where* S = Edges(W_0, V \ W_0)*. Further,* Ψ *is computable in time* O(k n^{2d+3})*, where* n = |V| *and* d = max_{i≤k} d_i*.*

Due to the conflict checks carried out within Algorithm 4, the returned modified objectives Φ'_i ensure that the *conjunction* Ψ := ⋀_{i=1}^{k} Ψ'_i of winning strategy templates Ψ'_i for the games (G, Φ'_i) is indeed conflict-free. In particular, this conjoined template Ψ is exactly what the algorithm returns. Hence, incrementally running Algorithm 4 is sound. This is an immediate consequence of Theorem 5 and stated as a corollary next.

Corollary 1. *Given a generalized parity game* G = (G, ⋀_{i≤k} Φ_i) *with* Φ_i = *Parity*(P_i) *and priority functions* P_i : V → [0; 2d_i + 1]*, s.t.*

$$\begin{aligned} (\mathcal{W}'\_0, \mathcal{H}', D', (\Phi'\_i)\_{i < \ell}) &:= \text{ComposeTemplate}(G, (V, \emptyset, \emptyset, \emptyset), (\Phi\_i)\_{i < \ell}), \text{ and} \\ (\mathcal{W}\_0, \mathcal{H}, D, (\Phi'\_i)\_{i \le k}) &:= \text{ComposeTemplate}(G, (\mathcal{W}'\_0, \mathcal{H}', D', (\Phi'\_i)\_{i < \ell}), (\Phi\_i)\_{\ell \le i \le k}) \end{aligned}$$

*then* Ψ = {Ψ_unsafe(S), Ψ_live(H), Ψ_colive(D)} *is a conflict-free strategy template that is winning from* W_0 *in the game* G*, where* S = Edges(W_0, V \ W_0)*. Further,* Ψ *is computable in time* O(k n^{2d+3})*, where* n = |V| *and* d = max_{i≤k} d_i*.*

We note that the generalized Zielonka algorithm [16] for solving generalized parity games has time complexity $O(mn^{2\sum_i d_i}) \cdot \binom{\sum_i d_i}{d_1, d_2, \ldots, d_k}$ for a game with n vertices, m edges, and k priority functions P_i with 2d_i priorities each. Clearly, Algorithm 4 has a much better time complexity. However, it is not complete, i.e., it does not always return the complete winning region. This is due to templates not being maximally permissive and hence potentially raising conflicts, which result in additional specifications that are not actually required. The next example shows such an incomplete instance for illustration. We note, however, that Algorithm 4 returned the *full winning region* on *all* benchmarks considered during evaluation, suggesting that such instances rarely occur in practice.

*Example 4.* Consider the game in Fig. 2 with the objective Φ_3 ∧ Φ_4, where Φ_4 = *Parity*(P) and P maps vertices a, b, c, d, e, f to 0, 2, 1, 1, 1, 1, respectively. The winning strategy templates computed by ParityTemplate for objectives Φ_3 and Φ_4 are Ψ_3 = Ψ_colive(e_ab, e_db, e_de) and Ψ_4 = Ψ_live({e_ab, e_db, e_de}), respectively. The conjunction of both templates marks all edges of the live-group, i.e., the relevant outgoing edges of vertices a and d, as co-live. Hence, the algorithm would ensure that the conflicted vertices a and d are eventually not visited anymore. However, the only way to satisfy Φ_3 ∧ Φ_4 is by eventually looping on vertex a. This solution was skipped by the strategy template Ψ_4 because it put edge e_ab in a live-group. Therefore, the algorithm returns the empty set as the winning region, whereas the actual winning region is the whole vertex set.

#### 5.2 Fault-Tolerant Strategy Adaptation

In this section we consider a 2-player parity game G = (G, *Parity*(P)) and a set of faulty Player 0 edges F ⊆ E ∩ (V^0 × V) which might become unavailable during runtime. Given a strategy template Ψ for G, we can use Ψ' = {Ψ, Ψ_unsafe(F)} for the (linear-time) extraction of a new strategy for the game, if Ψ' is conflict-free for G. In this case, no re-computation is needed. If Ψ' is not conflict-free for G, then we can remove the edges in F and compute a new winning strategy template using Algorithm 3. This is formalized in Algorithm 5, where we slightly abuse notation and assume that ParityTemplate only outputs strategy templates. The correctness of Algorithm 5 follows directly from Theorem 4.

Corollary 2. *Given a* 2*-player parity game* G = (G, *Parity*(P)) *with a strategy template* Ψ = ParityTemplate(G, P) *and faulty edge set* F ⊆ E ∩ (V^0 × V)*, it holds that the template* Ψ' *obtained from Algorithm 5 is a winning strategy template for* G|_{E\F}*.*

Faulty edges introduce an additional safety specification, for which our templates are maximally permissive. This implies that Algorithm 5 is *sound and complete*: if there exists a winning strategy for (G|_{E\F}, *Parity*(P)), Algorithm 5 finds one.

Let us now assume that F collects all edges controlling *vulnerable* actuators that *might* become unavailable. In this scenario, Algorithm 5 returns a conservative strategy that *never* uses vulnerable actuators. It might, however, be desirable to use actuators as long as they are available, to obtain better performance. Formally, this application scenario can be defined via a time-dependent graph whose edges change over time, i.e., E_t with E_0 = E are the edges available at time t ∈ ℕ, and F := {e ∈ E | e ∉ E_i for some i}. Given the original parity game G = (G, *Parity*(P)) with a winning strategy template Ψ, we can easily modify ExtractStrategy(G, Ψ) to obtain a time-dependent strategy π_g which reacts to the unavailability of edges: at time t, π_g takes an edge e ∈ E_t \ (S ∪ D) at all vertices without any live-group; at vertices with live-groups, it alternates between the edges satisfying the live-groups whenever they are available, and takes an edge e ∈ E_t \ (S ∪ D) when no live-group edge is available.
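One step of this adapted extraction can be sketched as follows; for simplicity the sketch tries live-group successors in a fixed order rather than alternating between them, and all names are hypothetical:

```python
def adaptive_move(v, E_t, live_succs, fallback_succs):
    """One step of the fault-adaptive strategy pi_g at vertex v, given
    the set E_t of edges available at the current time: prefer a
    successor serving a live-group, otherwise fall back to any
    available edge outside S and D."""
    for w in live_succs.get(v, []):   # live-group edges first
        if (v, w) in E_t:
            return w
    for w in fallback_succs[v]:       # any edge outside S and D
        if (v, w) in E_t:
            return w
    raise RuntimeError("no admissible edge available at " + str(v))
```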

The online strategy π_g can be implemented even without knowing when edges are available<sup>2</sup>, i.e., without knowing the time-dependent edge sequence {E_t}_{t∈ℕ}


**Algorithm 5.**
Input: A parity game G = (G, *Parity*(P)), a strategy template Ψ, and a set of faulty edges F
Output: A new strategy template Ψ'
1: Ψ' ← {Ψ, Ψ_unsafe(F)}
2: if CheckTemplate(G, Ψ') then return Ψ'
3: else
4: &nbsp;&nbsp;&nbsp;&nbsp;return ParityTemplate(G|_{E\F}, P|_{E\F})

<sup>2</sup> We note that it is reasonable to assume that current actuator faults are visible to the controller at runtime, see e.g. [34] for a real water gate control example.

up front. In this case, π_g is obviously winning in G = (G, *Parity*(P)) if Ψ is conflict-free for G|_{E\F}. If this is not the case, one needs to ensure that edges that cause conflicts are always eventually available again, as formalized next.

Definition 2. *Given a parity game* G = (G, *Parity*(P))*, we call the dynamic edge set* {E_i}_{i≥0} *a* guaranteed availability fault (GAF) *if for all plays* ρ = v_0 v_1 … *and all vertices* v ∈ V*: if* v ∈ inf(ρ)*, then for every edge* e = (v, w) ∈ F *there exist infinitely many times* t_0, t_1, … *such that* v_{t_j} = v *and* e ∈ E_{t_j} *for all* j ≥ 0*.*

Intuitively, guaranteed availability faults (GAF) ensure that a faulty edge is always eventually available when a play is in its source vertex. Under this fault, the following fault-correction result holds, which is proven in the full version [5, Appendix B.3].

Proposition 3. *Given a game graph* G *with a parity objective* Φ*, a strategy template* Ψ = {Ψ_unsafe(S), Ψ_live(H), Ψ_colive(D)} *computed by Algorithm 3, and a set* F = {e ∈ E | e ∉ E_i *for some* i} *of faulty edges, the game with this objective is realizable under GAF if for every vertex* v ∈ V^0 *there is an outgoing edge which is not in* S ∪ D ∪ F*.*

This proposition yields a simple linear-time algorithm to check whether the templates computed by Algorithm 3 are GAF-tolerant: check if every vertex in the winning region has an outgoing edge which is not in S ∪ D ∪ F. If this is not the case, the recomputation is non-trivial and out of the scope of this paper. We can, however, collect the vertices which do not satisfy the above property and alert the system engineer that these vulnerable actuators require additional maintenance or protective hardware. Our experimental results in Sect. 6 show that conflicts arising from actuator faults are rare and very local. Our strategy templates allow us to easily localize them, which supports their use for CPS applications.
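The check and the collection of vulnerable vertices can be sketched together (hypothetical names; an empty result means the template tolerates guaranteed availability faults):

```python
def vulnerable_vertices(edges, winning_region, S, D, F):
    """Linear-time GAF-tolerance check from Proposition 3: every
    vertex in the winning region needs an outgoing edge outside
    S, D, and F. Returns the vertices failing the check."""
    blocked = S | D | F
    return {
        v for v in winning_region
        if not any((v, w) not in blocked for w in edges.get(v, []))
    }
```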

### 6 Empirical Evaluation

We have developed a C++-based prototype tool PeSTel<sup>3</sup> (computing Permissive Strategy Templates) that implements Algorithms 1–5. We used PeSTel on the two applications considered in Sect. 5 to demonstrate its superior performance, suggesting its practical relevance. All our experiments were performed on a computer equipped with an Apple M1 Pro 8-core CPU and 16 GB RAM.

Incremental Synthesis. We used PeSTel to solve generalized parity games both in one shot and incrementally. We compare our algorithm with existing algorithms, i.e., GenZiel from [16] and three partial solvers<sup>4</sup> from [12], by executing

<sup>3</sup> Repository URL: https://github.com/satya2009rta/pestel.

<sup>4</sup> While GenZiel is sound and complete [16], we found different randomly generated games where the algorithms from [12] either return a superset or a subset of the winning region, hence compromising soundness or completeness. Since [12] lacks a rigorous proof, it is not clear whether this is an implementation bug or a theoretical mishap, leaving the soundness and completeness guarantees of these algorithms open.

Table 1. Aggregated experimental results on generalized parity game benchmarks with objectives given *up-front* (top) and *one-by-one* (bottom). Subrows: 1st row (mean time) – average computation time (in ms); 2nd row (incomplete) – number of examples where the corresponding tool failed to compute the complete winning region; 3rd row (faster than) – number of examples where PeSTel is faster than the respective tool; 4th row (timeouts) – number of examples where the respective tool timed out (10000 ms).


them on a large set of benchmarks. We generated two types of benchmarks from the games used for the Reactive Synthesis Competition (SYNTCOMP) [21]. Benchmark A was generated by converting parity games into Streett games using standard methods; as each Streett pair can be represented by a {0, 1, 2}-priority parity objective, we represented the complete Streett objective as a conjunction of multiple {0, 1, 2}-priority parity objectives, resulting in a generalized parity game. Benchmark B was generated by adding randomly<sup>5</sup> generated parity objectives to given parity games. We considered 200 examples in Benchmark A and more than 1400 examples in Benchmark B.
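The random objective generator (detailed in footnote 5) can be sketched as follows; this is a simplified sketch, and the even-spread tie-breaking between priorities is an assumption:

```python
import random

def random_parity_objective(vertices, m, rng=random):
    """Assign a random priority function with maximum priority m:
    half of the vertices (chosen at random) get priorities spread
    evenly over 0..m, the other half get uniform random priorities
    in 0..m."""
    vs = list(vertices)
    rng.shuffle(vs)
    half = len(vs) // 2
    priorities = {}
    for idx, v in enumerate(vs[:half]):
        priorities[v] = (idx * (m + 1)) // max(half, 1)  # even spread
    for v in vs[half:]:
        priorities[v] = rng.randint(0, m)
    return priorities
```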

We summarize the complete set of results of the experiments in Table 1 and Fig. 1.<sup>6</sup> We performed two kinds of experiments. First, we solved every generalized parity game in Benchmarks A and B in *one shot* using the different methods. The results are shown in Table 1 (top) and Fig. 1 (left). Although the average time taken by PeSTel is higher than that of GenZiel and one partial solver, it is fastest in more than 90% of the games in both benchmarks. This shows that PeSTel is as efficient as the other methods in most cases. Moreover, for every

<sup>5</sup> The random generator takes three parameters: a game graph G, a number of objectives k, and a maximum priority m; it then generates k random parity objectives with maximum priority m as follows: 50% of the vertices in G are selected randomly, and those vertices are assigned priorities ranging from 0 to m (including 0 and m) such that 1/m-th (of those 50%) of the vertices are assigned priority 0, 1/m-th are assigned priority 1, and so on. The remaining 50% are assigned random priorities ranging from 0 to m. Hence, for every priority, there are at least 1/(2m)-th of the vertices (i.e., 1/m-th of 50% of the vertices) with that priority.

<sup>6</sup> See the full version of this paper [5, Appendix C] for a version of Fig. 1 including all solvers considered in Table 1.

Fig. 4. Experimental results for parity games with faulty edges. Left: percentage of instances with conflicts given a certain percentage of faulty edges. Right: average percentage of vertices that created conflicts given a certain percentage of faulty edges.

game in both benchmarks, PeSTel succeeded in computing the complete winning region, whereas the partial solvers failed to do so in some cases<sup>7</sup>. We note that the instances which are hard for PeSTel are those where the winning region becomes empty, which is quickly detected by GenZiel but only seen by PeSTel after most objectives are (separately) considered.

Second, we solved the examples in Benchmark B by adding the objectives *one-by-one*, i.e., we solved the game with one objective, then we added one more objective and solved it again, and so on. The results are shown in Table 1(bottom) and Fig. 1(right). As PeSTel can use the pre-computed strategy templates if we add a new objective to a game, it outperforms all the other solvers significantly as they need to re-solve the game from scratch every time.

Fault-Tolerant Control. As discussed in Sect. 5.2, strategy templates can be used to implement a fault-tolerant time-dependent strategy if the set of faulty edges F does not cause conflicts with the strategy template. We used PeSTel on over 200 examples of parity games from SYNTCOMP [21] to evaluate the relevance of such conflicts in practice. For this, we randomly selected different percentages of edges to be faulty and checked for conflicts with the given template. The results are summarized in Fig. 4. The left plot shows the number of instances for which a conflict occurs if a certain percentage of randomly selected edges is faulty. We see that the majority of the instances never face a conflict, even when 30% of the edges are faulty. Looking more closely into the instances with conflicts, Fig. 4 (right) shows the average number of conflicting vertices in these benchmarks. Here we see that conflicts occur very locally, at a very small number of vertices. Our strategy templates allow for a linear-time algorithm to localize them, allowing them to be mitigated in practice by additional hardware.

*Remark 1.* We remark again that our results are directly applicable to CPS with continuous dynamics via the paradigm of abstraction-based control design (ABCD). In particular, standard abstraction tools such as SCOTS [35],

<sup>7</sup> Additionally, we outperform all algorithms on the benchmarks considered by Bruyère et al. [12]. We have however chosen to not include them in our analysis as many of their generalized parity games have only one objective and are therefore trivial.

ARCS [13], MASCOT [20], P-FACES [22], or ROCS [27] automatically compute a game graph from the (stochastic) continuous dynamics that can directly be used as an input to PeSTel. The winning strategy computed by PeSTel can further be refined into a correct-by-construction continuous feedback controller for the original dynamical system using standard methods from ABCD. We leave these tool integrations to future work.

### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Synthesizing Trajectory Queries from Examples**

Stephen Mell<sup>1</sup>(B), Favyen Bastani<sup>2</sup>, Steve Zdancewic<sup>1</sup>, and Osbert Bastani<sup>1</sup>

<sup>1</sup> University of Pennsylvania, Philadelphia, PA 19104, USA
{sm1,stevez,obastani}@cis.upenn.edu
<sup>2</sup> Allen Institute for AI, Seattle, WA 98104, USA
favyenb@allenai.org

**Abstract.** Data scientists often need to write programs to process predictions of machine learning models, such as object detections and trajectories in video data. However, writing such queries can be challenging due to the fuzzy nature of real-world data; in particular, they often include real-valued parameters that must be tuned by hand. We propose a novel framework called Quivr that synthesizes trajectory queries matching a given set of examples. To efficiently synthesize parameters, we introduce a novel technique for pruning the parameter space and a novel quantitative semantics that makes this more efficient. We evaluate Quivr on a benchmark of 17 tasks, including several from prior work, and show both that it can synthesize accurate queries for each task and that our optimizations substantially reduce synthesis time.

### **1 Introduction**

Over the past decade, deep neural networks (DNNs) have successfully solved challenging artificial intelligence problems [47,70]. Abstractly, these models can be thought of as providing interfaces to real-world data—e.g., they can provide object classes [30,47], detections [59,60], and trajectories [10,11,83]. Then, these predictions are processed by programs, e.g., to identify driving patterns [5], events in TV broadcasts [28], or animal behaviors [67].

However, writing such programs can be challenging since they must still account for the fuzziness of real data. To do so, these programs typically include real-valued parameters that need to be manually tuned by the user. For example, consider a query over car trajectories designed to identify instances where one car turns in front of another. This query must capture the shape of the trajectory of both the turning car and the car crossing the intersection. In addition, the user must select the appropriate maximum duration from the first car changing lanes to the second car crossing the intersection. Even an expert would require significant experimentation to determine good parameter values; in our experience, it can take up to an hour to tune the parameters for a single query.

Appendices are available in the technical report [51].

© The Author(s) 2023

C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 459–484, 2023. https://doi.org/10.1007/978-3-031-37706-8\_23

We focus on programs that query databases of trajectories output by an object tracker [5,7,8,28,40–42,54]. Given a video, the tracker predicts the positions of objects in each frame (e.g., cars, people, or mice), as well as associations between detections of the same object across successive frames. Applications often require subsequent analysis of these trajectories. For example, in autonomous driving, when a risky scenario is encountered, engineers typically search for additional examples of that driving pattern to improve their planner [63,64,66]—e.g., cars driving too close [82] or stopping in the middle of the road [6]. Object tracking has also been used to track robots [58,81], animals for behavioral analysis [12,67,75], and basketball players for sports analytics [67,85].

We propose an algorithm for synthesizing queries over object trajectories given just a handful of input-output examples. A query takes as input a representation of a trajectory as a sequence of states (e.g., position, velocity, and acceleration) in successive frames of the video, and outputs whether the trajectory matches its semantics. Our query language is based on regular expressions—in particular, a query is a composition of a user-extensible set of predicates using the sequencing, conjunction, and iteration operators. For instance, trajectories might correspond to cars in a video; Fig. 1 shows a query for identifying cars turning at an intersection. As we discuss in Sect. 6, the full query language semantics is rich enough to subsume (variants of) Kleene algebras with tests (KAT) [46] and signal temporal logic (STL) [50]; however, such generality is seldom needed, so we use a pared-down query language that works well in practice.

Our algorithm performs enumerative search over the space of possible queries to identify ones that are consistent with the given examples. A key challenge in our setting is that our predicates have real-valued parameters that must also be synthesized. Thus, our strategy enumerates *sketches*, which are partial programs that only contain holes corresponding to real-valued parameters. For each sketch, we search over the space of real-valued parameters, while using an efficient pruning strategy to reduce the search space. At a high level, we use a quantitative semantics to directly compute "boundary parameters" at which a given example switches from being labeled positive to negative. Then, depending on the target label, we can prune the entire region of the search space on one side of these boundary parameters. We prove that this synthesis strategy comes with soundness and (partial) completeness guarantees.

We implement our approach in a system called Quivr. <sup>1</sup> Our implementation focuses on videos from fixed-position cameras. While our language and synthesis algorithm are general, the predicates we design are tailored to specific settings. We evaluate Quivr on identifying driving patterns in traffic videos, including ones inspired by recent work on autonomous driving [63,64,66], on behavior detection in a dataset of mouse trajectories [72], and on a synthetic task from the temporal logic synthesis literature [44]. We demonstrate how both our parameter pruning strategies and our query evaluation optimizations lead to substantial reductions in the running time of our synthesizer.

<sup>1</sup> Quivr stands for QUery Induction for Video tRajectories.

InLane1 ; Any ; InLane2

**Fig. 1.** (a) A video frame from a traffic camera, along with object trajectories (red) and manually annotated lanes (black). (b) The trajectories selected by the query (bottom), which selects cars turning at the intersection. (Color figure online)

In summary, our contributions are:


### **2 Overview**

We consider a hypothetical scenario where an engineer is designing a control algorithm for an autonomous car and would like to identify certain driving patterns in video data. We show how they can use our framework to synthesize a query to identify car trajectories that exhibit a given behavior.

*Video Data.* Traffic cameras are a rich source of driving behaviors [5,13,61]; one dataset used in our evaluation is YTStreams [7], which includes video from several such cameras. Figure 1(a) shows a single frame from such a video; we have used an object tracker [83] to identify all car trajectories (in red).

*Predicates.* Quivr assumes it is given a set of predicates that match portions of trajectories exhibiting behaviors of interest; during synthesis, it considers queries composed of these predicates. In Fig. 1(a), the engineer has manually annotated the lanes of interest in this video (black), to specify four InLaneK predicates that select trajectories of cars driving in each lane K visible in the video. Predicates may be configured by real-valued parameters. For example,

> ⟨InLane1⟩ ∧ ⟨DispLtθ⟩

InLane1(A) ; Any ; InLane2(A) ∧ InLane2(B)

**Fig. 2.** A single match (top) for the multi-object query (bottom) which captures one car, A, turning into a lane behind another car, B, that is in that lane. The trajectories change color from red to green as a function of time. As can be seen, the car making the right turn does so just after the car going straight passes through the intersection. (Color figure online)

searches for trajectories where the car stays in lane 1 for a period of time and the car has a displacement at most θ between the beginning and end of that period. Note that atomic predicates, like ⟨DispLtθ⟩, can match multiple time steps, whereas in formalisms like regular expressions and temporal logic, atomic predicates are over single time steps. A key feature of our framework is that the set of available predicates is highly extensible, and the user can provide their own. See Sect. 5.1 for the predicates we use in our evaluation.

*Synthesis.* To specify a driving pattern, the engineer provides a small number of initial positive and negative examples of trajectories; then, Quivr synthesizes a query that correctly labels these examples. In Fig. 1(b), we show the result of executing the query shown, which is synthesized to identify left turns in the data. Often, there are multiple queries consistent with the initial examples. While it may be hard for users to sift through the video for positive examples, it is usually easy for them to label a given trajectory. Thus, to disambiguate, Quivr asks the user to label additional trajectories [19,36,62].

*Multi-object Queries.* So far, we have focused on queries that identify trajectories by processing each trajectory in isolation. A key feature of our framework is that users can express queries over multiple trajectories—for example,

> ⟨InLane1(B)⟩ ∧ ⟨ChangeLane2To1(A)⟩ ; ⟨InFront(A, B)⟩.

This query says that car B is in lane 1 while car A changes from lane 2 to lane 1, and car A ends up in front of car B. Note that the predicates now include variables indicating which object they refer to, and the predicate InFront(A, B) refers to multiple objects. An example of a pair of trajectories selected by a multi-object query is shown in Fig. 2.

#### **3 Query Language**

We describe our query language for matching object trajectories in videos. Our system first preprocesses the video using an object tracker to obtain trajectories, which are sequences z = (x0, x1, ..., xn−1) of states xi ∈ X. Then, a query Q in our language maps each trajectory z to a value in B = {0, 1} indicating whether z matches Q. Our language is similar to both STL and KAT. One key difference is that predicates are over arbitrary subsequences of z rather than single states x. In the main paper, we consider a simpler language, but in Appendix A we show how it can be extended to subsume both STL and KAT.

*Trajectories.* We begin by describing the input to a query in our language, which is the representation of one or more concurrent object trajectories in a video.

Consider a space S corresponding to a single object detection in a single video frame—e.g., s ∈ S ⊆ R<sup>6</sup> might encode the 2D position, velocity, and acceleration of the object in image coordinates. When considering m concurrent objects, we let the space of *states* be X = S<sup>m</sup>; then, a *trajectory* z ∈ Z = X<sup>∗</sup> is a sequence z = (x0, x1, ..., xn−1) of states of length |z| = n. We use the notation zi:j = (xi, xi+1, ..., xj−1) to denote a subtrajectory of z.

*Predicates.* We assume a set of predicates Φ is given, where each predicate ϕ ∈ Φ matches trajectories z ∈ Z; we use satϕ(z) ∈ B = {0, 1} to indicate whether ϕ matches z. As discussed below, queries in our language compose these predicates to match more complex patterns.

Next, predicates in our language may have real-valued parameters that must be specified. We denote such a predicate ϕ with parameter θ ∈ R by ϕθ. To enable our synthesis algorithm to efficiently synthesize these real-valued parameters, we leverage monotonicity: every such predicate we have used in our queries is monotone in its parameter. In particular, we assume that the semantics of these predicates have the form

$$[\varphi\_{\theta}](z) := \mathbb{1}(\iota\_{\varphi}(z) \ge \theta),$$

where ιϕ : Z → R is a scoring function. We also assume that the range of ιϕ is bounded (which can be achieved with a sigmoid function, if necessary). For example, for the predicate DispLtθ, we have ιDispLt(z) = −‖x0 − xn−1‖. Thus, ιDispLt(z) ≥ θ says the total displacement is at most −θ. We describe the predicates we include in Sect. 5.1; they can easily be extended.
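As a concrete sketch (the helper names are hypothetical, not Quivr's actual code), the scoring function for DispLt and the monotone predicate it induces might look like:

```python
import math

def iota_disp_lt(z):
    """Scoring function for DispLt: the negated displacement between the
    first and last states of z (here, states are (x, y) positions)."""
    if len(z) == 0:
        return -math.inf  # empty trajectories never match
    (x0, y0), (xn, yn) = z[0], z[-1]
    return -math.hypot(xn - x0, yn - y0)

def disp_lt(theta):
    """DispLt_theta matches z iff iota(z) >= theta, i.e., iff the total
    displacement is at most -theta; monotonically decreasing in theta."""
    return lambda z: iota_disp_lt(z) >= theta
```

Raising θ only shrinks the set of matching trajectories, which is exactly the monotonicity the synthesis algorithm exploits.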

*Syntax.* The syntax of our language is

$$Q ::= \varphi \mid Q \; ; \; Q \mid Q^k \mid Q \land Q,$$

where Q<sup>k</sup> = Q ; Q ; ... ; Q (k times). That is, the base case is a single predicate ϕ, and queries can be composed using sequencing (Q ; Q) and conjunction (Q ∧ Q). Operators for disjunction, negation, Kleene star, and STL's "until" are discussed in Appendix A.2. We describe constraints imposed on our language during synthesis in Sect. 4.7.

*Semantics.* The satisfaction semantics of queries have type [·] : Q → Z → B, where Q is the set of all queries in our language, Z is the set of trajectories, and

$$\begin{aligned} [\varphi](z) &:= \mathbf{sat}\_{\varphi}(z) \\ [Q\_1 \wedge Q\_2](z) &:= [Q\_1](z) \wedge [Q\_2](z) \\ [Q\_1 \; ; \; Q\_2](z) &:= \bigvee\_{k=0}^n [Q\_1](z\_{0:k}) \wedge [Q\_2](z\_{k:n}) \end{aligned}$$

**Fig. 3.** Satisfaction semantics of our query language; z ∈ Z is a trajectory of length n and <sup>ϕ</sup> <sup>∈</sup> <sup>Φ</sup> are predicates. Iteration (Q<sup>k</sup>) can be expressed as repeated sequencing.

B = {0, 1}. In particular, [Q](z) ∈ B indicates whether the query Q matches trajectory z. The semantics are defined in Fig. 3. The base case of a single predicate ϕ checks whether ϕ matches z; conjunction Q1 ∧ Q2 checks whether both conjuncts match; and sequencing Q1 ; Q2 checks whether z can be split into z = z0:k zk:n in a way that Q1 matches z0:k and Q2 matches zk:n. The semantics can be evaluated in time O(|Q| · n<sup>2</sup>).
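A direct, unoptimized rendering of these semantics is sketched below, under the assumption that queries are nested tuples and predicates are Boolean functions on subtrajectories; memoizing the recursion over (i, j) index pairs would recover the O(|Q| · n²) bound.

```python
def evaluate(query, z):
    """Satisfaction semantics of Fig. 3. Queries are nested tuples:
    ('pred', sat_fn) | ('and', q1, q2) | ('seq', q1, q2), where sat_fn
    maps a subtrajectory (a tuple of states) to a bool."""
    def ev(q, i, j):  # does q match z[i:j]?
        tag = q[0]
        if tag == 'pred':
            return q[1](z[i:j])
        if tag == 'and':
            return ev(q[1], i, j) and ev(q[2], i, j)
        if tag == 'seq':  # some split point k must satisfy both halves
            return any(ev(q[1], i, k) and ev(q[2], k, j)
                       for k in range(i, j + 1))
        raise ValueError(f'unknown operator {tag!r}')
    return ev(query, 0, len(z))

# The VelGt example from Sect. 4.3: a predicate on single time steps.
def vel_gt(theta):
    return lambda z: len(z) == 1 and z[0] >= theta

q = ('seq', ('pred', vel_gt(0.5)), ('pred', vel_gt(0.6)))
```

Here `evaluate(q, (0.5, 0.8))` holds (splitting after the first time step), while `evaluate(q, (0.4, 0.8))` does not.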

### **4 Synthesis Algorithm**

We describe our algorithm for synthesizing queries consistent with a given set of examples. It performs a syntax-guided enumerative search over the space of possible queries [3]. In more detail, it enumerates *sketches*, which are partial programs where only parameter values are missing. For each sketch, it uses a quantitative pruning strategy to compute the subset of the input parameters for which the resulting query is consistent with the given examples. A key contribution is how our algorithm uses quantitative semantics for quantitative pruning.

#### **4.1 Problem Formulation**

*Partial Queries.* A *partial query* is in the grammar

$$Q ::= \text{??} \mid \varphi \text{??} \mid \varphi \mid Q \text{; } Q \mid Q^k \mid Q \land Q \text{.}$$

Note that there are two kinds of holes: (i) a *predicate hole* h = ?? that can be filled by a sub-query Q, and (ii) a *parameter hole* h = ϕ?? that can be filled by a real value θh ∈ R. We denote the predicate holes of Q by Hϕ(Q), the parameter holes by Hθ(Q), and let H(Q) = Hϕ(Q) ∪ Hθ(Q). A partial query Q is a *sketch* (denoted Q ∈ Qsketch) [71] if Hϕ(Q) = ∅, and is *complete* (denoted Q ∈ Q̄) if H(Q) = ∅. For example, for Q = ⟨DispLt??1⟩ ∧ ??2, we have Hθ(Q) = {??1} and Hϕ(Q) = {??2}. (We label each hole h = ??i with an identifier i ∈ N to distinguish them.)

*Refinements and Completions.* Given a partial query Q ∈ Q, a predicate hole h ∈ Hϕ(Q), and a production R = Q → f(Q1, ..., Qk), we can *fill* h with R (denoted Q′ = fill(Q, h, R)) by replacing h with f(??1, ..., ??k), where each ??i is a fresh hole; and similarly given a parameter hole h ∈ Hθ(Q) and a value θh ∈ R. We call Q′ a *child* of Q (denoted Q → Q′). Next, we call Q′ a *refinement* of Q (denoted Q →<sup>∗</sup> Q′) if there exists a sequence Q → ... → Q′; if furthermore Q′ ∈ Q̄, we say it is a *completion* of Q. For example, we have

$$??1 \to ??2 \,; \, ??3 \to \langle \mathtt{InLane1} \rangle \,; \, ??3 \to \dots$$

Here, ⟨InLane1⟩ ; ??3 is a child (and refinement) of ??2 ; ??3 obtained by filling ??2 with Q → ⟨InLane1⟩—i.e.,

$$\langle \mathtt{InLane1} \rangle \,; \, ??3 = \mathsf{fill}(??2 \,; \, ??3 , ??2 , Q \to \langle \mathtt{InLane1} \rangle ). $$

*Parameters.* We let θ ∈ R<sup>|Hθ(Q)|</sup> denote a choice of parameters for each h ∈ Hθ(Q), let θh ∈ Θh ⊆ R denote the parameter for hole h, and let Qθ denote the query obtained by filling each h ∈ Hθ(Q) with θh. Note that if Q ∈ Qsketch, then Qθ ∈ Q̄ is complete. For example, consider the sketch

$$Q = \langle \mathtt{DispLt}\_{??1} \rangle \land \langle \mathtt{MinLength}\_{??2} \rangle.$$

This query has two holes, so its parameters are <sup>θ</sup> <sup>∈</sup> <sup>R</sup><sup>2</sup>. If <sup>θ</sup> = (3.2, <sup>5</sup>.0), then θ??1 = 3.2 is used to fill hole ??1 and θ??2 = 5.0 is used to fill ??2. In particular,

$$Q\_{\theta} = \langle \mathtt{DispLt}\_{3.2} \rangle \wedge \langle \mathtt{MinLength}\_{5.0} \rangle.$$

*Query Synthesis Problem.* Given a set of labeled examples W ⊆ Z × B, where B = {0, 1}, our goal is to find a query Q ∈ Q̄ that correctly labels these examples—i.e.,

$$\psi\_W(Q) := \bigwedge\_{(z,y)\in W} ([Q](z) = y).$$

Thus, ψ<sup>W</sup> (Q) indicates whether Q is consistent with the labeled examples W. Our goal is to devise a synthesis algorithm that is sound and complete—i.e., it finds a query that satisfies ψ<sup>W</sup> (Q) = 1 if and only if one exists.

#### **4.2 Algorithm Overview**

Our algorithm enumerates sketches <sup>Q</sup> ∈ Qsketch; for each one, it tries to compute parameter values θ such that the completed query Q<sup>θ</sup> is consistent with W—i.e., ψ<sup>W</sup> (Qθ) = 1. It can either stop once it has found a consistent query, or identify additional queries that are consistent with W. Algorithm 1 shows this high-level strategy—at each iteration, it selects a sketch Q, determines a region B of the parameter space containing consistent parameters <sup>θ</sup> <sup>∈</sup> <sup>B</sup>, and adds (Q, B) to a list of consistent queries that solve the synthesis problem.

The key challenge is searching over the space of continuous parameters θ for a given sketch Q such that Qθ is consistent with W. For efficiency, we rely heavily on pruning the search space. At a high level, consider evaluating a single candidate parameter θ on a single example (z, y) ∈ W—i.e., checking whether [Qθ](z) = y. The key property we exploit is that, as shown in Lemma 1 below, [Qθ](z) is monotonically decreasing in θ.


**Algorithm 1.** High-level synthesis strategy.
1: **procedure** SynthesizeQuery(W)
2: &nbsp;&nbsp;&nbsp;&nbsp;Qcon ← ∅
3: &nbsp;&nbsp;&nbsp;&nbsp;**for** Q ∈ Qsketch **do**
4: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;B ← SynthesizeParameters(W, Q)
5: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Qcon ← Qcon ∪ {(Q, B)}
6: &nbsp;&nbsp;&nbsp;&nbsp;**return** Qcon


Previous work has leveraged this property to prune the search space [49,53,78]. Using a strategy based on binary search, for a given example (z, y) ∈ W, we can identify "boundary" parameters θ to accuracy ε in O(log(1/ε)) steps—i.e., compute θ for which [Qθ−ε](z) = 1 and [Qθ+ε](z) = 0.
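For illustration, this baseline can be sketched as follows (function names are hypothetical; Quivr itself replaces this loop with the quantitative semantics of Sect. 4.6):

```python
def boundary_by_bisection(matches, lo, hi, eps=1e-6):
    """Approximate the boundary theta where a monotonically decreasing
    boolean function `matches(theta)` flips from 1 to 0 on [lo, hi],
    to accuracy eps, in O(log(1/eps)) evaluations."""
    if not matches(lo):
        return None  # never matches on this segment (the "bottom" case)
    if matches(hi):
        return hi    # always matches on this segment (the "top" case)
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if matches(mid):
            lo = mid  # still matches: the boundary lies above mid
        else:
            hi = mid
    return lo         # matches(lo) = 1 and matches(lo + eps) = 0
```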

Our algorithm avoids this binary search process, which can lead to a significant speedup in practice. The key idea is to devise a quantitative semantics for queries that directly computes θ; in fact, this quantitative semantics is closely related to robust temporal logic semantics, where the conjunction and disjunction of the satisfaction semantics are replaced with minimum and maximum, respectively.

#### **4.3 Pruning with Boundary Parameters**

We begin by describing how "boundary parameters" can be used to prune a portion of the search space over parameters. First, for *any* candidate parameter θ, we can prune all parameters θ′ ≤ θ (if [Qθ](z) = 1 and y = 0) or all θ′ ≥ θ (if [Qθ](z) = 0 and y = 1). Pruned regions of the parameter space take the form of hyper-rectangles, which we call *boxes*. For convenience, let ∞ := (∞, ..., ∞).

**Definition 1.** Given x, y ∈ R̄<sup>d</sup>, where R̄ = R ∪ {±∞}, a *box* is an axis-aligned half-open hyper-rectangle (x, y] := {v | xi < vi ≤ yi} ⊆ R<sup>d</sup>.
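In code, membership in such a box is a one-liner (a sketch; `math.inf` stands in for ±∞):

```python
import math

def in_box(v, x, y):
    """Membership test for the half-open box (x, y] = {v | x_i < v_i <= y_i}."""
    return all(xi < vi <= yi for vi, xi, yi in zip(v, x, y))
```

For example, `in_box((0.5, 0.6), (-math.inf, -math.inf), (0.5, 0.6))` holds: each point belongs to the box whose upper corner it is, but not to the box whose lower corner it is.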

The key property ensuring that parameters prune boxes of the search space is that the semantics are monotonically decreasing in θ.

**Lemma 1.** *Given a sketch* Q*, a trajectory* z*, and two candidate parameters* θ, θ′ ∈ R<sup>d</sup> *such that* θ ≤ θ′ *component-wise, we have* [Qθ](z) ≥ [Qθ′](z)*.*

The proof follows by structural induction on the query semantics: the base case follows since the semantics 1(ιϕ(z) ≥ θk) for predicates is monotonically decreasing in θk, and the inductive case follows since conjunction and disjunction are monotonically increasing in their inputs (so the composed semantics remains monotonically decreasing in each θk). Below, we show how monotonicity ensures that we can prune whole regions of the search space if we find boundary parameters.

As an example, suppose we have two trajectories, z0 of a car driving quickly and then slowly, and z1 of a car driving slowly and then quickly, and that we are trying to synthesize a query for W = {(z0, 0), (z1, 1)}. For simplicity, we assume both z0 = (0.9, 0.6) and z1 = (0.5, 0.8) have just two time steps each, with just a single component representing velocity. Furthermore, we assume there is just a single predicate ⟨VelGtθ⟩ matching time steps where the velocity is at least θ, where θ is a real-valued parameter. Since ⟨VelGtθ⟩ matches single time steps, the satisfaction semantics is 0 except on trajectories of length 1, so:

$$\begin{aligned} \iota\_{\text{VelGt}}((z\_0)\_{0:1}) &= 0.9 & \iota\_{\text{VelGt}}((z\_0)\_{1:2}) &= 0.6 & \iota\_{\text{VelGt}}(z\_{i:i}) &= -\infty \\ \iota\_{\text{VelGt}}((z\_1)\_{0:1}) &= 0.5 & \iota\_{\text{VelGt}}((z\_1)\_{1:2}) &= 0.8 & \iota\_{\text{VelGt}}(z\_{0:2}) &= -\infty \end{aligned}$$

Consider the sketch Q = ⟨VelGt??1⟩ ; ⟨VelGt??2⟩. We can see that the candidate parameters (0.5, 0.6) satisfy [Q(0.5,0.6)](z1) = 1:

$$\begin{aligned} [Q\_{(0.5,0.6)}]((z\_1)\_{0:n}) &= \bigvee\_{k=0}^{2} [\langle \mathtt{VelGt}\_{0.5} \rangle]((z\_1)\_{0:k}) \wedge [\langle \mathtt{VelGt}\_{0.6} \rangle]((z\_1)\_{k:n}) \\ &= [\langle \mathtt{VelGt}\_{0.5} \rangle]((z\_1)\_{0:1}) \wedge [\langle \mathtt{VelGt}\_{0.6} \rangle]((z\_1)\_{1:2}) \\ &= \mathbb{1}(0.5 \ge 0.5) \wedge \mathbb{1}(0.8 \ge 0.6) \\ &= 1, \end{aligned}$$

where the second equality holds because ⟨VelGtθ⟩ matches only length-1 trajectories, so the k = 0 and k = 2 cases evaluate to 0. Since the semantics are monotonically decreasing, we have [Qθ](z1) = 1 for any θ ∈ ((−∞, −∞), (0.5, 0.6)].

Notice, however, that if we were to move upward by any ε = (ε1, ε2) > 0, we would have [Q(0.5+ε1, 0.6+ε2)](z1) = 1(0.5 ≥ 0.5 + ε1) ∧ 1(0.8 ≥ 0.6 + ε2) = 0. So we know [Qθ](z1) = 0 for any θ ∈ ((0.5, 0.6), (∞, ∞)]. This is because (0.5, 0.6) lies on the boundary between {θ | [Qθ](z1) = 0} and {θ | [Qθ](z1) = 1}. This boundary plays a key role in our algorithm.

**Definition 2.** Given a sketch Q with d parameter holes and a trajectory z, we say θ ∈ R<sup>d</sup> ∪ {⊥, ⊤} is a *boundary parameter* if one of the following holds:

– θ ∈ R<sup>d</sup> and [Qθ](z) = 1, but [Qθ′](z) = 0 for all θ′ ∈ (θ, ∞]
– θ = ⊥ and [Qθ′](z) = 0 for all θ′ ∈ (−∞, ∞]
– θ = ⊤ and [Qθ′](z) = 1 for all θ′ ∈ (−∞, ∞]

In the first case, by monotonicity, we also have [Qθ′](z) = 1 for all θ′ ∈ (−∞, θ]; thus, θ lies on the boundary between parameters θ′ where Qθ′ evaluates to 1 and those where it evaluates to 0. The second and third cases are where Qθ′ always evaluates to 0 and 1, respectively.

Given a boundary parameter θ for an example (z, y) ∈ W, we can prune (θ, ∞] if y = 1 or (−∞, θ] if y = 0. Intuitively, boundary parameters provide optimal pruning along a fixed direction in the parameter space. Thus, our algorithm focuses on computing boundary parameters for pruning.

**Fig. 4.** (a) shows a boundary parameter, θ1, for z1, and a region that is inconsistent with z<sup>1</sup> and can be pruned (red), as well as a region that is consistent with it (blue). (b) similarly shows a boundary parameter θ<sup>0</sup> for z0. (c) shows the pruning pair composed of θ<sup>0</sup> and θ1, a region consistent with both (blue), and regions inconsistent with either (red). (d) is the same as (c), but if θ<sup>0</sup> and θ<sup>1</sup> swapped places. The labels b<sup>0</sup> through b<sup>8</sup> denote analogous boxes in (c) & (d). (e) shows how, if (d) were the result of the first step of search and b<sup>6</sup> were chosen next, search could proceed. (f) shows ground truth consistent (blue) and inconsistent (red) regions that the search process in (d) & (e) might converge toward. (Color figure online)

In Fig. 4(a), if θ1 is a boundary parameter for z1, we know that the blue region satisfies z1, and thus is consistent with the label 1, while the red region does not satisfy z1, and thus is inconsistent with the label 1. Similarly, in Fig. 4(b), if θ0 is a boundary parameter for z0, we know that the red region satisfies z0, and thus is inconsistent with the label 0, while the blue region does not satisfy z0, and thus is consistent with the label 0.

#### **4.4 Pruning with Pairs of Boundary Parameters**

To extend pruning to the entire dataset W, we could simply prune the union of the individual pruned regions for each (z,y) <sup>∈</sup> <sup>W</sup>. However, one important feature of our approach is that we can also establish regions of the parameter space where the parameters are guaranteed to be consistent with W. To formalize this idea, we introduce the concept of a "pruning pair", which is a pair of boundary parameters which might allow us to find such a consistent region.

**Definition 3.** Given a sketch Q and a dataset W, a pair of boundary parameters θ−, θ+ ∈ R<sup>d</sup> ∪ {⊥, ⊤} is a *pruning pair* for Q and W if all of the following hold:


If θ<sup>−</sup> < θ<sup>+</sup>, the pruning pair (θ−, θ<sup>+</sup>) is *consistent*, and *inconsistent* otherwise.

Our algorithm searches for pruning pairs along a fixed direction—i.e., it considers a curve <sup>L</sup> <sup>⊆</sup> <sup>R</sup><sup>d</sup> and looks for the following pruning pair along <sup>L</sup>:

$$\theta^+ = \sup \left\{ \theta \in L \, \Big| \, \bigwedge\_{z \in W^+} [Q\_{\theta}](z) = 1 \right\}, \quad \theta^- = \inf \left\{ \theta \in L \, \Big| \, \bigwedge\_{z \in W^-} [Q\_{\theta}](z) = 0 \right\}.$$

Intuitively, θ+ is the largest θ ∈ L that correctly classifies all positive examples, and conversely for θ−. We restrict to curves L that are monotonically increasing in all components, in which case the supremum and infimum are well defined since L comes with a total ordering (from its smallest point to its largest) that is consistent with the standard partial order on R<sup>d</sup>. Then, (θ−, θ+) form a pruning pair: θ+ is a boundary parameter for some z ∈ W+, since if we take θ+ to be any larger, then we must have [Qθ](z) = 0 for some z ∈ W+; and similarly for θ−.

Given a curve L, we can compute an approximation to θ<sup>+</sup> and θ<sup>−</sup> via binary search. However, our algorithm avoids the need to do so by directly computing θ<sup>+</sup> and θ<sup>−</sup> using a quantitative semantics, which we describe in Sect. 4.6.

Figure 4(c) shows how the pair of boundary parameters θ0 for z0 and θ1 for z1 (where L is the diagonal line) prunes the parameter space. The blue region is guaranteed to be consistent with W, as it is the intersection of the region below θ+, which must satisfy [Qθ](z1) = 1, and the region above θ−, which must satisfy [Qθ](z0) = 0. Conversely, the red regions are inconsistent with either z0 or z1, and therefore with W. Thus, the red regions can be pruned, whereas the blue regions are solutions to our synthesis problem. Note that the red region is the union of the red regions in Fig. 4(a) and (b), whereas the blue region is the intersection of the blue regions in Fig. 4(a) and (b).

This pattern holds for any consistent pruning pair (θ− < θ+); if instead the pair is inconsistent (θ− ≥ θ+), then the resulting pattern is illustrated in Fig. 4(d); in this case, we can prune the red regions as before, but there is no blue region of solutions. In general, for a d-dimensional parameter space, a pruning pair divides the parameter space into 3<sup>d</sup> boxes (i.e., for each dimension, a box can be below, in line with, or above the center box). The regions below θ− and above θ+ can be pruned, and the region between θ− and θ+ (if one exists) contains synthesis solutions. Precisely, it follows from the definitions and monotonicity that:

**Lemma 2.** *Every* θ ∈ (−∞, θ−] *and every* θ ∈ (θ+, ∞] *is inconsistent with* W*, and every* θ ∈ (θ−, θ+] *is consistent with* W*.*

The remaining boxes need to be further analyzed by our algorithm.

#### **4.5 Pruning Parameter Search Algorithm**

Next, we describe Algorithm 2, which searches over the space of parameters to fill a sketch Q for a given dataset W. The algorithm uses a subroutine that takes a box and returns a pruning pair in that box, which we describe in Sect. 4.6. Given this subroutine, our algorithm maintains a work-list of "unknown" boxes (i.e., unknown whether parameters in these boxes are consistent or inconsistent with W). At each iteration, it pops a box from the work-list (in first-in-first-out order), uses the given subroutine to find a pruning pair inside that box, applies the pruning procedure described in the previous section, and then adds each new unknown box to the work-list.

For the last step, the current box b is divided into 3<sup>d</sup> smaller boxes. The box bcenter := (min{θ−, θ+}, max{θ−, θ+}] is pruned (added to Binc) if the



pair (θ−, θ+) is inconsistent, and contains solutions to the synthesis problem otherwise (added to Bcon). The boxes blower = (−∞, min{θ−, θ+}] and bupper = (max{θ−, θ+}, ∞] are always pruned. The boxes b′ ∈ Bincomp are the remaining corners of b, and always have indeterminate consistency (added to Bunk). The remaining boxes b′ ∈ Bextra are indeterminate if (θ−, θ+) is consistent, and inconsistent otherwise. In our example, if the first step of the algorithm yielded Fig. 4(d), then the second step might pop b6 and yield Fig. 4(e).
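The geometry of this split can be sketched as follows (the representation is hypothetical: a box is a pair of corner tuples, and empty sub-boxes are dropped):

```python
from itertools import product

def split_box(box, lo, hi):
    """Split a box (a, b] into the up-to-3^d sub-boxes induced by the
    pruning-pair corners lo = min{theta-, theta+} and hi = max{theta-,
    theta+}: per dimension, a sub-box lies below lo, between lo and hi,
    or above hi."""
    a, b = box
    d = len(a)
    out = []
    for choice in product(range(3), repeat=d):
        lo_pt = tuple((a[i], lo[i], hi[i])[choice[i]] for i in range(d))
        hi_pt = tuple((lo[i], hi[i], b[i])[choice[i]] for i in range(d))
        if all(l < h for l, h in zip(lo_pt, hi_pt)):  # drop empty boxes
            out.append((lo_pt, hi_pt))
    return out
```

Splitting the unit box at lo = (0.3, 0.3), hi = (0.6, 0.6) yields the 3² = 9 boxes of Fig. 4(c); the center box ((0.3, 0.3), (0.6, 0.6)] plays the role of bcenter.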

The following soundness result follows directly from Lemma 2.

**Theorem 1.** *In Algorithm 2, every* θ ∈ b *for some* b ∈ Bcon *is consistent with* W *for* Q*, and every* θ ∈ b *for some* b ∈ Binc *is inconsistent.*

In addition, the algorithm is complete for almost all parameters:

**Theorem 2.** *The Lebesgue measure of* {<sup>θ</sup> <sup>∈</sup> <sup>b</sup> <sup>|</sup> <sup>b</sup> <sup>∈</sup> <sup>B</sup>unk} → <sup>0</sup> *as* <sup>N</sup> → ∞*.*

See Appendix D.1 for the proof. In other words, all parameters outside a subset of measure zero are eventually classified as consistent or inconsistent; intuitively, the parameters that may never be classified are the ones along the decision boundary. This result holds since at any search depth, the fraction of the parameter space pruned can be lower-bounded.

#### **4.6 Computing Pruning Pairs via Quantitative Semantics**

The pruning algorithm depends on the ability to compute, given a box b, a pruning pair (θ−, θ+) on the restriction of the parameter space to b. Recall that θ+ must be a boundary parameter for some z+ ∈ W+ and must satisfy [Qθ+](z) = 1 for all other z ∈ W+, and θ− must be a boundary parameter for some z− ∈ W− and must satisfy [Qθ−](z) = 0 for all other z ∈ W−.

Given a box b = (θmin, θmax], our algorithm takes L ⊆ R<sup>d</sup> to be the diagonal from θmin to θmax and computes the pruning pair along L. We can naïvely

$$\begin{aligned} [\varphi\_{??i}]\_{v,u}^q(z) &:= \frac{\iota\_\varphi(z) - v\_i}{u\_i} \\ [\varphi]\_{v,u}^q(z) &:= \begin{cases} \infty & \text{if } \mathsf{sat}\_\varphi(z) = 1 \\ -\infty & \text{if } \mathsf{sat}\_\varphi(z) = 0 \end{cases} \\ [Q\_1 \wedge Q\_2]\_{v,u}^q(z) &:= \min\{ [Q\_1]\_{v,u}^q(z), [Q\_2]\_{v,u}^q(z) \} \\ [Q\_1 \; ; \; Q\_2]\_{v,u}^q(z) &:= \max\_{0 \le k \le n} \min\{ [Q\_1]\_{v,u}^q(z\_{0:k}), [Q\_2]\_{v,u}^q(z\_{k:n}) \} \end{aligned}$$

**Fig. 5.** The quantitative semantics of our language, taking a sketch Q, trajectory z, parameter v ∈ R<sup>d</sup>, and positive vector u ∈ R<sup>d</sup> (u > 0 component-wise); n is the length of z.

use binary search: for θ+, we search along L for the point where ⋀z∈W+ [Qθ](z) transitions from 1 to 0, and similarly for θ− and ⋀z∈W− ¬[Qθ](z).

Instead, by leveraging a quantitative semantics, we can directly compute θ+ and θ−, thereby reducing computation time substantially. Given a sketch Q, trajectory z, parameter v ∈ R<sup>d</sup>, and positive vector u ∈ R<sup>d</sup> (u > 0 component-wise), we devise a quantitative semantics [Q]<sup>q</sup>v,u(z) ∈ R̄ such that [Q]<sup>q</sup>v,u(z) · u + v is a boundary parameter. Intuitively, this semantics computes, starting at v, how many u-sized steps must be taken to reach the boundary. (For the uses in our algorithm, the number of steps is always in [0, 1].) Then, for a box b = (θmin, θmax], we can take v = θmin and u = θmax − θmin, and compute

$$\theta^+ = \left(\min\_{z \in W^+} [\![Q]\!]\_{v,u}^q(z)\right) \cdot u + v, \qquad \theta^- = \left(\max\_{z \in W^-} [\![Q]\!]\_{v,u}^q(z)\right) \cdot u + v.$$

We define the quantitative semantics in Fig. 5. The base case of $\varphi_{??}$ adjusts and rescales $\iota_\varphi$ by $v$ and $u$, and the other cases replace conjunction and disjunction in the satisfaction semantics with minimum and maximum. We have the following key result (where $\infty \cdot u := \top$, $-\infty \cdot u := \bot$, $\top + v := \top$, and $\bot + v := \bot$):
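The recursive equations of Fig. 5 can be evaluated directly by structural recursion. The sketch below is a hypothetical rendering under our own encoding (the tuple representation of sketches and the names `quant`, `iota`, and `sat` are ours, not the paper's):

```python
import math

# Minimal illustrative evaluator for the quantitative semantics of Fig. 5.
# A query sketch is a nested tuple:
#   ("hole", i, iota)  -- parametric predicate phi_??i with denotation iota
#   ("pred", sat)      -- parameter-free predicate with satisfaction sat
#   ("and", Q1, Q2)    -- conjunction
#   ("seq", Q1, Q2)    -- sequencing Q1 ; Q2
def quant(Q, z, v, u):
    tag = Q[0]
    if tag == "hole":            # (iota(z) - v_i) / u_i
        _, i, iota = Q
        return (iota(z) - v[i]) / u[i]
    if tag == "pred":            # +/- infinity by satisfaction
        return math.inf if Q[1](z) else -math.inf
    if tag == "and":             # conjunction becomes minimum
        return min(quant(Q[1], z, v, u), quant(Q[2], z, v, u))
    if tag == "seq":             # max over split points k of the min
        n = len(z)
        return max(min(quant(Q[1], z[0:k], v, u),
                       quant(Q[2], z[k:n], v, u))
                   for k in range(n + 1))
    raise ValueError(tag)
```

Multiplying the result by $u$ and adding $v$ then yields a boundary parameter, as in Theorem 3; the sequencing case is quadratic in the trajectory length for a single split, matching the max-min structure of the figure.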

**Theorem 3.** *For a sketch $Q$, trajectory $z$, parameter $v \in \mathbb{R}^d$, and positive vector $u \in \mathbb{R}^d_{>0}$, we have that $[\![Q]\!]_{v,u}^q(z) \cdot u + v$ is a boundary parameter of $z$ for $Q$.*

See Appendix D.2 for the full proof. For intuition, consider $\theta_{\min} = \vec{0}$ and $\theta_{\max} = \vec{1}$ (i.e., the current box $b \subseteq \mathbb{R}^d$ is the unit hypercube). Then, $v = \vec{0}$ and $u = \vec{1}$, so $[\![Q]\!]_{v,u}^q$ reduces to the standard max-min quantitative semantics for temporal logic [25].

Now, if we consider the satisfaction semantics of a base predicate $\varphi_{\theta_i} = \mathbf{1}(\iota_\varphi(z) \ge \theta_i)$, then the value of $\theta_i$ where the semantics flips is exactly $\iota_\varphi(z)$. So any parameter with $i$-th component $\iota_\varphi(z)$ is a boundary parameter, and since $L$ has the same slope in all dimensions, the boundary parameter along $L$ is $\iota_\varphi(z) \cdot \vec{1} + \vec{0} = [\![\varphi_{??i}]\!]_{\vec{0},\vec{1}}^q(z) \cdot \vec{1} + \vec{0}$.

In the inductive cases, it suffices to show that we can replace conjunction and disjunction with minimum and maximum in the semantics. Since the satisfaction semantics is monotonically decreasing, as we move upward along L, at some point we will transition from 1 to 0. A conjunction becomes 0 when either conjunct becomes 0, so the transition will occur when we hit the first of the conjuncts' transition points (their minimum). Dually, a disjunction becomes 0 when both disjuncts become 0, so we will transition at the last of the disjuncts' transition points (their maximum).

Finally, the intuition behind $u$ and $v$ is that they "preprocess" the parameters so that we evaluate along the diagonal of the current box instead of $[\vec{0}, \vec{1}]$.

#### **4.7 Implementation**

We implement our approach in a system called Quivr. It begins by running Algorithm 1 on a small number of labeled examples.

*Active Learning.* With a small number of examples, there are typically many queries that are consistent with the labels, and yet which disagree on the labels of the remaining data. To disambiguate, we use an active learning strategy, asking the user to label specific trajectories that we choose, which are then added to our set of labeled examples. Queries that are not consistent with the new label are discarded. The labeling process continues until the set of consistent queries agrees on the labels of all unlabeled data.

When choosing the trajectory $z^*$ to query the user for next, we select the one on which the set of consistent queries $C$ disagrees most—i.e.,

$$z^\* = \underset{z \in Z}{\text{arg min}} \left| J(z) - \frac{1}{2} \right|,$$

where

$$J(z) := |C|^{-1} \sum\_{Q\_{\theta} \in C} \mathbf{1} \left( \psi\_{(z,y)}(Q\_{\theta}) \right)$$

is the fraction of consistent queries that predict a positive label for trajectory z.
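The selection rule above can be sketched in a few lines. This is a minimal illustration under our own naming (`select_query`, and queries modeled as plain predicates are assumptions, not the paper's API):

```python
# Hypothetical sketch of disagreement-based selection: query the unlabeled
# trajectory on which the consistent queries are closest to an even split
# between positive and negative predictions, i.e. argmin |J(z) - 1/2|.
def select_query(unlabeled, consistent_queries):
    def J(z):
        # fraction of consistent queries predicting a positive label for z
        votes = sum(1 for q in consistent_queries if q(z))
        return votes / len(consistent_queries)
    return min(unlabeled, key=lambda z: abs(J(z) - 0.5))
```

Labeling the selected trajectory then eliminates roughly half of the consistent queries, which is why this rule disambiguates quickly.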

*Search Implementation.* In some cases, searching for consistent parameters may take a very long time. To improve performance, we impose a timeout: for each sketch, we pause search if either: (i) we find some consistent box of parameters or (ii) we've exceeded 25 steps. In both cases, we save the sets of consistent, inconsistent, and unknown boxes. At each step of active learning, the newly labeled example may render previously consistent parameters inconsistent, so we mark all consistent boxes as unknown. We then resume search, again until (i) we find some consistent box (which may be the same one we had before), or (ii) we again exceed 25 steps.

Note that while this timeout may cause us to query the user more often than is strictly necessary, it does not affect either the soundness or completeness of our approach, as we continue search after querying the user.
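The timeout-and-resume loop described above can be sketched as follows. This is an assumed simplification (the `state` dictionary, `step_fn`, and box representation are ours; the paper's search step itself is not shown):

```python
# Hypothetical sketch of the resume loop: boxes live in three pools
# ("consistent", "inconsistent", "unknown"). After a new label arrives,
# previously consistent boxes are demoted to unknown, and search runs
# until a consistent box is found or max_steps is exceeded.
def resume_search(state, step_fn, max_steps=25):
    state["unknown"].extend(state["consistent"])  # new label may invalidate them
    state["consistent"] = []
    steps = 0
    while state["unknown"] and not state["consistent"] and steps < max_steps:
        step_fn(state)  # one pruning/splitting step; moves boxes between pools
        steps += 1
    return state
```

Because the pools are saved across rounds of active learning, no pruning work is ever repeated: each resume continues from the boxes left over by the previous round.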

*Complete Query Selection.* Active learning and evaluation of $F_1$ scores (in Sect. 5) both require complete queries with specific parameters, rather than sketches


**Table 1.** The predicates used for the YTStreams dataset.

with boxes of parameters. Since the set C of consistent queries is infinitely large, we instead use one query for each sketch that is known to have consistent parameters (sketches where search timed out are thus not included). For those sketches, we pick the middle of the box of known-consistent parameters.
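Picking the middle of a known-consistent box is a one-liner; the helper name below is ours, shown only to make the selection concrete:

```python
# Hypothetical helper: instantiate a complete query at the midpoint of a
# known-consistent box [theta_min, theta_max] of parameters.
def representative_parameter(theta_min, theta_max):
    return [(a + b) / 2 for a, b in zip(theta_min, theta_max)]
```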

### **5 Evaluation**

We demonstrate how our approach can be used to synthesize queries to solve interesting tasks: in particular, we show that (i) given just a few initial examples, it can synthesize queries that achieve good performance on a held-out test set, and (ii) our optimizations significantly reduce the synthesis time.

#### **5.1 Experimental Setup**

*Datasets.* We evaluate on two datasets of object trajectories: YTStreams [7], consisting of video and extracted object trajectories from fixed-position traffic cameras, and MABe22 [72], consisting of trajectories of up to three mice interacting in a laboratory setting. We also evaluate on a synthetic maritime surveillance task from the STL synthesis literature [44]. On YTStreams, we use two traffic cameras, one in Tokyo and one in Warsaw, and we consider single cars or pairs of cars. On MABe22, we consider pairs of mice. For the predicates used, see Table 1 for YTStreams, Appendix Table 5 for MABe22, and Appendix Table 6 for maritime surveillance.

*Tasks.* On YTStreams, we manually wrote 5 ground truth queries. Several queries apply to multiple configurations (e.g., different pairs of lanes), resulting in 10 queries total (tasks H–Q in Table 2). The real-valued parameters were chosen manually, by visually examining whether they were selecting the desired trajectories. These queries cover a wide range of behaviors; for instance, they can capture behaviors such as human drivers making unprotected turns, an important challenge for autonomous cars [64], as well as cars trying to pass [66]. We show examples of trajectories selected by three of our multi-object queries in Fig. 6. MABe22 describes 9 queries for scientifically interesting mouse behavior. We implemented the 6 most complex to use as ground truth queries (tasks A–F in Appendix Table 7). The maritime surveillance task has trajectory labels and so does not need a ground truth query (task G).

**Table 2.** Ground-truth queries for the YTStreams dataset. "IDs" indicates which tasks are instances of a given query. Multiple instantiations correspond to different lanes being used for "lane 1" and "lane 2". The first is a one-object Shibuya query, the second is a one-object Warsaw query, and the rest are two-object Warsaw queries.

**Fig. 6.** Trajectories selected by multi-object queries. Each image shows two objects; the color of each one changes from red to green to denote the progression of time. Left: Unprotected right turn into lane with oncoming traffic. Middle: Bottom car drives faster than the top one and passes it. Right: One car driving closely behind the other. (Color figure online)

*Synthesis.* For each task, we divide the set Z of all trajectories into a train set Ztrain and a test set Ztest, using trajectories in the first half of the video for training, and those in the second half for testing. We randomly sample a set of initial labeled examples W from Ztrain, with 2 samples being positive and 10 being negative, and then actively label 25 additional examples from Ztrain. For YTStreams and MABe22, labels are from the ground truth query.

**Table 3.** $F_1$ score after $n$ steps of active learning, with our algorithm for selecting tracks to label ("Q"), an active learning ablation ("R"), an LSTM ("L"), and a transformer ("T"). For Q and R, there may be many queries consistent with the labeled data, so the median $F_1$ score is reported. Bold indicates best score at a given number of steps.


For tractability, we limit search to sketches with at most three predicates, at most two of which may have parameters. In most cases, this excludes the ground truth from the search space.

#### **5.2 Accuracy of Synthesized Queries**

We show that Quivr synthesizes accurate queries from just a few labeled examples. We evaluate the $F_1$ score of the synthesized queries on $Z_{\text{test}}$. Recall that our algorithm returns a list $C$ of consistent queries; we report the median $F_1$ score across $Q \in C$.

*Baselines.* We compare to (i) an ablation where we replace our active learning strategy with an approach that labels $z$ uniformly at random from the remaining unlabeled training examples; (ii) an LSTM [16,33] neural network; and (iii) a transformer neural network [26,29,77]. Because neural networks perform poorly on such small datasets, we pretrain the LSTM on an auxiliary task, namely, trajectory forecasting [43]. Then, we freeze the hidden representations of the learned LSTM and use them as features to train a logistic regression model on our labeled examples. The neural network baselines do active learning by selecting, among the unlabeled trajectories, the one with the highest predicted probability of being positive.

*Results.* We show the $F_1$ score of each of the 17 queries in Table 3 after 0, 5, 10, and 25 steps of active learning. After just 10 steps, our approach provides an $F_1$ score above 0.99 on 10 of 17 queries, and after 25 steps, it yields an $F_1$ score


**Table 4.** Running time (seconds) of synthesis (mean ± standard error) using binary search (B) and quantitative semantics (Q) running on CPU and GPU, with 25 steps of active learning.

above 0.9 on all but 2 queries. Thus, Quivr is able to synthesize accurate queries with relatively little user input. The neural networks achieve poor performance, particularly on the more difficult queries.

#### **5.3 Synthesis Running Time**

Next, we show that quantitative pruning and using a GPU each significantly reduce synthesis time, evaluating total running time for 25 steps of active learning.

*Ablations.* We compare to two ablations: (i) using the binary search approach of [53] to find pruning pairs, rather than using our quantitative semantics, and (ii) evaluating the matrix semantics (Appendix A.1) on a CPU rather than a GPU.

*Results.* In Table 4, we report the running time of our algorithms on a CPU (2× AMD EPYC 7402 24-Core) and a GPU (1× NVIDIA RTX A6000). For binary search, on average, the GPU is 7.6× faster than the CPU. On a GPU, using the quantitative semantics rather than binary search offers another 5.0× speed-up.

### **6 Related Work**

*Monotonicity for Parameter Pruning.* We build on [49] for our parameter pruning algorithm. Their approach has been applied to synthesizing STL formulas for sequence classification by first enumerating sketches and then using monotonicity to find parameters, similar to our binary search baseline [53]. We replace binary search with our novel strategy based on quantitative semantics, leading to a 5.0× speedup. There is also work building on [49] to create logically-relevant distance metrics between trajectories by taking the Hausdorff distance between parameter satisfaction regions (which they call "validity domains"), with applications to clustering [78]. For logics like STL, our quantitative semantics could provide a speedup to their approach.

*Synthesis of Temporal Logic Formulas.* More broadly, there has been work synthesizing parameters in a variant of STL by discretizing the parameter space and then walking the satisfaction boundary [24]; in one dimension, their approach becomes binary search, inheriting its shortcomings. There has been work on synthesizing STL formulas that are satisfied by a closed-loop control model [38], but they assume the ability to find counterexample traces for incorrect STL formulas, which is not applicable to our setting. Another approach is to synthesize parameters in STL formulas using gradient-based optimization [35] or stochastic optimization [45], but we found these methods to be ineffective in our setting, and they do not come with either soundness or completeness guarantees. There is work using decision trees to synthesize STL formulas [1,14,44,48], but these operate on a restricted subset of STL, namely Boolean combinations of a fixed set of template formulas. This restriction prevents these approaches from synthesizing temporal structure, which is a key component of the queries in our domains. Finally, there has been work on active learning of STL formulae using decision trees [48], but it assumes the ability to query for equivalence between a particular STL formula and the ground truth, which is not possible in our setting.

*Synthesizing Constants.* There is work on synthesizing parameters of programs using counterexample-guided inductive synthesis and different theory solvers, including Fourier-Motzkin variable elimination and an SMT solver [2]. Though our synthesis objective can be encoded in the theory of linear arithmetic, it is extremely large, and we have found such solvers to be ineffective in practice.

*Querying Video Data.* There has been recent work on querying object detections and trajectories in video data [5,7,8,28,40–42,54]. The main difference is our focus on synthesis; in addition, these approaches focus on SQL-like operators such as select, inner-join, group-by, etc., over predefined predicates, which cannot capture compositions such as the sequencing and iteration operators in our language, which are necessary for identifying more complex behaviors.

*Neurosymbolic Models.* There has been recent work on leveraging program synthesis in the context of machine learning. For instance, there has been work on using programs to represent high-level structure in images [21–23,74,84], for reinforcement learning [9,34,79,80], and for querying websites [18]; in contrast, we use programs to classify trajectories. The most closely related work is on synthesizing functional programs operating over lists [67,76]. Our language includes key constructs not included in their languages. Most importantly, we include sequencing; in their functional language, such an operator would need to be represented as a nested series of if-then-else operators. In addition, their language does not support predicates that match subsequences; while such a predicate could be added, none of their operators can compose such predicates.

*Quantitative Synthesis.* There has been work on program synthesis with quantitative properties—e.g., on synthesis for producing optimized code [37,57,65], for approximate computing [15,52], for probabilistic programming [56], and for embedded control [17]. These approaches largely focus on search-based synthesis, either using constraint optimization [52], continuous optimization [17], enumerative search [15,57], or stochastic search [37,56,65]. While we leverage ideas from this literature, our quantitative semantics based pruning strategy is novel.

*Quantitative Semantics.* Our quantitative semantics is similar to the "robustness degree" [25] of a temporal logic formula. The difference is that, by adjusting the denotations of the base predicates, our quantitative semantics gives a parameter on the satisfaction boundary. More broadly, there has been work on quantitative semantics for temporal logic for robust constraint satisfaction [20,25,73], and to guide reinforcement learning [39]. There has been work on quantitative regular expressions (QREs) [4], though in general, QREs cannot be efficiently evaluated due to their nondeterminism, and our language is restricted to ensure efficient computation. There has been work on synthesizing QREs for network traffic classification [68], using binary search to compute decision thresholds. Similarly, there has been work using the Viterbi semiring to obtain quantitative semantics for Datalog programs [69], which they use in conjunction with gradient descent to learn the rules of the Datalog program. In contrast, we use our quantitative semantics to efficiently prune the parameter search space in a provably correct way. Finally, there has been work on using GPUs to evaluate regular expressions [55]; however, they focus on regular expressions over strings.

*Query Languages.* Our language is closely related to both signal temporal logic (STL) [50] and Kleene algebras with tests (KAT) [46]. In particular, it can straightforwardly be extended to subsume both (see Appendix A for details), and our pruning strategy applies to this extended language. The addition of Kleene star, required to subsume KAT, worsens the evaluation time. STL has been used to monitor safety requirements for autonomous vehicles [32]. Spatio-Temporal Perception Logic (SPTL) is an extension of STL to support spatial reasoning [31]. Many of its operators are monotone, and thus would benefit from our algorithm. Scenic [27] is a DSL for creating static and dynamic driving scenes, but its focus is on generating scenes rather than querying for behaviors.

### **7 Conclusion**

We have proposed a novel framework called Quivr for synthesizing queries over video trajectory data. Our language is similar to KAT and STL, but supports conjunction and sequencing over multi-step predicates. Given only a few examples, Quivr efficiently synthesizes trajectory queries consistent with those examples. A key contribution of our approach is the use of a quantitative semantics to prune the parameter search space, yielding a 5.0× speedup over the state-of-the-art. In our evaluation, we demonstrate that Quivr effectively synthesizes queries to identify interesting driving behaviors, and that our optimizations dramatically reduce synthesis time.

**Acknowledgements.** We thank the anonymous reviewers for their helpful feedback. This work was supported in part by NSF Award CCF-1910769, NSF Award CCF-1917852, and ARO Award W911NF-20-1-0080.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### **A**

Abdulla, Parosh Aziz I-184 Akshay, S. I-266, I-367, III-86 Albert, Elvira III-176 Alistarh, Dan I-156 Alur, Rajeev I-415 Amilon, Jesper III-281 Amir, Guy II-438 An, Jie I-62 Anand, Ashwani I-436 Andriushchenko, Roman III-113 Apicelli, Andrew I-27 Arcaini, Paolo I-62 Asada, Kazuyuki III-40 Ascari, Flavio II-41 Atig, Mohamed Faouzi I-184

#### **B**

Badings, Thom III-62 Barrett, Clark II-163, III-154 Bastani, Favyen I-459 Bastani, Osbert I-415, I-459 Bayless, Sam I-27 Becchi, Anna II-288 Beutner, Raven II-309 Bisping, Benjamin I-85 Blicha, Martin II-209 Bonchi, Filippo II-41 Bork, Alexander III-113 Braught, Katherine I-351 Britikov, Konstantin II-209 Brown, Fraser III-154 Bruni, Roberto II-41 Bucev, Mario III-398

#### **C**

Calinescu, Radu I-289 Češka, Milan III-113 Chakraborty, Supratik I-367 Chatterjee, Krishnendu III-16, III-86 Chaudhuri, Swarat III-213 Chechik, Marsha III-374 Chen, Hanyue I-40 Chen, Taolue III-255 Chen, Yu-Fang III-139 Choi, Sung Woo II-397 Chung, Kai-Min III-139 Cimatti, Alessandro II-288 Cosler, Matthias II-383 Couillard, Eszter III-437 Czerner, Philipp III-437

#### **D**

Dardik, Ian I-326 Das, Ankush I-27 David, Cristina III-459 Dongol, Brijesh I-206 Dreossi, Tommaso I-253 Dutertre, Bruno II-187

#### **E**

Eberhart, Clovis III-40 Esen, Zafer III-281 Esparza, Javier III-437

#### **F**

Farzan, Azadeh I-109 Fedorov, Alexander I-156 Feng, Nick III-374 Finkbeiner, Bernd II-309 Fremont, Daniel J. I-253 Frenkel, Hadar II-309 Fu, Hongfei III-16 Fu, Yu-Fu II-227, III-329

#### **G**

Gacek, Andrew I-27 Garcia-Contreras, Isabel II-64

© The Editor(s) (if applicable) and The Author(s) 2023 C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 485–488, 2023. https://doi.org/10.1007/978-3-031-37706-8

Gastin, Paul I-266 Genaim, Samir III-176 Getir Yaman, Sinem I-289 Ghosh, Shromona I-253 Godbole, Adwait I-184 Goel, Amit II-187 Goharshady, Amir Kafshdar III-16 Goldberg, Eugene II-110 Gopinath, Divya I-289 Gori, Roberta II-41 Govind, R. I-266 Govind, V. K. Hari II-64 Griggio, Alberto II-288, III-423 Guilloud, Simon III-398 Gurfinkel, Arie II-64 Gurov, Dilian III-281

#### **H**

Hahn, Christopher II-383 Hasuo, Ichiro I-62, II-41, III-40 Henzinger, Thomas A. II-358 Hofman, Piotr I-132 Hovland, Paul D. II-265 Hückelheim, Jan II-265

#### **I**

Imrie, Calum I-289

#### **J**

Jaganathan, Dhiva I-27 Jain, Sahil I-367 Jansen, Nils III-62 Jeż, Artur II-18 Johannsen, Chris III-483 Johnson, Taylor T. II-397 Jonáš, Martin III-423 Jones, Phillip III-483 Joshi, Aniruddha R. I-266 Jothimurugan, Kishor I-415 Junges, Sebastian III-62, III-113

#### **K**

Kang, Eunsuk I-326 Karimi, Mahyar II-358 Kashiwa, Shun I-253 Katoen, Joost-Pieter III-113 Katz, Guy II-438 Kempa, Brian III-483 Kiesl-Reiter, Benjamin II-187 Kim, Edward I-253 Kirchner, Daniel III-176 Kokologiannakis, Michalis I-230 Kong, Soonho II-187 Kori, Mayuko II-41 Koval, Nikita I-156 Kremer, Gereon II-163 Křetínský, Jan I-390 Krishna, Shankaranarayanan I-184 Kueffner, Konstantin II-358 Kunčak, Viktor III-398

#### **L**

Lafortune, Stéphane I-326 Lahav, Ori I-206 Lengál, Ondřej III-139 Lette, Danya I-109 Li, Elaine III-350 Li, Haokun II-87 Li, Jianwen II-288 Li, Yangge I-351 Li, Yannan II-335 Lidström, Christian III-281 Lin, Anthony W. II-18 Lin, Jyun-Ao III-139 Liu, Jiaxiang II-227, III-329 Liu, Mingyang III-255 Liu, Zhiming I-40 Lopez, Diego Manzanas II-397 Lotz, Kevin II-187 Luo, Ziqing II-265

#### **M**

Maayan, Osher II-438 Macák, Filip III-113 Majumdar, Rupak II-187, III-3, III-437 Mallik, Kaushik II-358, III-3 Mangal, Ravi I-289 Marandi, Ahmadreza III-62 Markgraf, Oliver II-18 Marmanis, Iason I-230 Marsso, Lina III-374 Martin-Martin, Enrique III-176 Mazowiecki, Filip I-132 Meel, Kuldeep S. II-132 Meggendorfer, Tobias I-390, III-86 Meira-Góes, Rômulo I-326 Mell, Stephen I-459 Mendoza, Daniel II-383

Metzger, Niklas II-309 Meyer, Roland I-170 Mi, Junri I-40 Milovančević, Dragana III-398 Mitra, Sayan I-351

#### **N**

Nagarakatte, Santosh III-226 Narayana, Srinivas III-226 Nayak, Satya Prakash I-436 Niemetz, Aina II-3 Nowotka, Dirk II-187

#### **O**

Offtermatt, Philip I-132 Opaterny, Anton I-170 Ozdemir, Alex II-163, III-154

#### **P**

Padhi, Saswat I-27 Păsăreanu, Corina S. I-289 Peng, Chao I-304 Perez, Mateo I-415 Preiner, Mathias II-3 Prokop, Maximilian I-390 Pu, Geguang II-288

#### **R**

Reps, Thomas III-213 Rhea, Matthew I-253 Rieder, Sabine I-390 Rodríguez, Andoni III-305 Roy, Subhajit III-190 Rozier, Kristin Yvonne III-483 Rümmer, Philipp II-18, III-281 Rychlicki, Mateusz III-3

#### **S**

Sabetzadeh, Mehrdad III-374 Sánchez, César III-305 Sangiovanni-Vincentelli, Alberto L. I-253 Schapira, Michael II-438 Schmitt, Frederik II-383 Schmuck, Anne-Kathrin I-436, III-3 Seshia, Sanjit A. I-253 Shachnai, Matan III-226 Sharma, Vaibhav I-27

Sharygina, Natasha II-209 Shen, Keyi I-351 Shi, Xiaomu II-227, III-329 Shoham, Sharon II-64 Siegel, Stephen F. II-265 Sistla, Meghana III-213 Sokolova, Maria I-156 Somenzi, Fabio I-415 Song, Fu II-413, III-255 Soudjani, Sadegh III-3 Srivathsan, B. I-266 Stanford, Caleb II-241 Stutz, Felix III-350 Su, Yu I-40 Sun, Jun II-413 Sun, Yican III-16

#### **T**

Takhar, Gourav III-190 Tang, Xiaochao I-304 Tinelli, Cesare II-163 Topcu, Ufuk III-62 Tran, Hoang-Dung II-397 Tripakis, Stavros I-326 Trippel, Caroline II-383 Trivedi, Ashutosh I-415 Tsai, Ming-Hsien II-227, III-329 Tsai, Wei-Lun III-139 Tsitelov, Dmitry I-156

#### **V**

Vafeiadis, Viktor I-230 Vahanwala, Mihir I-184 Veanes, Margus II-241 Vin, Eric I-253 Vishwanathan, Harishankar III-226

#### **W**

Waga, Masaki I-3 Wahby, Riad S. III-154 Wang, Bow-Yaw II-227, III-329 Wang, Chao II-335 Wang, Jingbo II-335 Wang, Meng III-459 Watanabe, Kazuki III-40 Wehrheim, Heike I-206 Whalen, Michael W. I-27 Wies, Thomas I-170, III-350

Wolff, Sebastian I-170 Wu, Wenhao II-265

#### **X**

Xia, Bican II-87 Xia, Yechuan II-288

#### **Y**

Yadav, Raveesh I-27 Yang, Bo-Yin II-227, III-329 Yang, Jiong II-132 Yang, Zhengfeng I-304 Yu, Huafeng I-289 Yu, Yijun III-459 Yue, Xiangyu I-253

#### **Z**

Zdancewic, Steve I-459 Zelazny, Tom II-438 Zeng, Xia I-304 Zeng, Zhenbing I-304 Zhang, Hanliang III-459 Zhang, Li I-304 Zhang, Miaomiao I-40 Zhang, Pei III-483 Zhang, Yedi II-413 Zhang, Zhenya I-62 Zhao, Tianqi II-87 Zhu, Haoqing I-351 Žikelić, Đorđe III-86 Zufferey, Damien III-350