**Armin Biere David Parker (Eds.)**

# **Tools and Algorithms for the Construction and Analysis of Systems**

**26th International Conference, TACAS 2020 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020 Dublin, Ireland, April 25–30, 2020, Proceedings, Part II**

## Lecture Notes in Computer Science 12079

Founding Editors

Gerhard Goos, Germany; Juris Hartmanis, USA

## Editorial Board Members

Elisa Bertino, USA; Wen Gao, China; Bernhard Steffen, Germany; Gerhard Woeginger, Germany; Moti Yung, USA

## Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy; Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany; Benjamin C. Pierce, University of Pennsylvania, USA; Bernhard Steffen, University of Dortmund, Germany; Deng Xiaotie, Peking University, Beijing, China; Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at http://www.springer.com/series/7407


Editors

Armin Biere, Johannes Kepler University, Linz, Austria

David Parker, University of Birmingham, Birmingham, UK

ISSN 0302-9743, ISSN 1611-3349 (electronic)

Lecture Notes in Computer Science

ISBN 978-3-030-45236-0, ISBN 978-3-030-45237-7 (eBook)

https://doi.org/10.1007/978-3-030-45237-7

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2020. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## ETAPS Foreword

Welcome to the 23rd ETAPS! This is the first time that ETAPS took place in Ireland in its beautiful capital Dublin.

ETAPS 2020 was the 23rd instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming language developments, analysis tools, and formal approaches to software engineering. Organizing these conferences in a coherent, highly synchronized conference program enables researchers to participate in an exciting event, having the possibility to meet many colleagues working in different directions in the field, and to easily attend talks of different conferences. On the weekend before the main conference, numerous satellite workshops took place that attracted many researchers from all over the globe. Also, for the second time, an ETAPS Mentoring Workshop was organized. This workshop is intended to help students early in the program with advice on research, career, and life in the fields of computing that are covered by the ETAPS conference.

ETAPS 2020 received 424 submissions in total, 129 of which were accepted, yielding an overall acceptance rate of 30.4%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2020 featured the unifying invited speakers Scott Smolka (Stony Brook University) and Jane Hillston (University of Edinburgh) and the conference-specific invited speakers (ESOP) Işıl Dillig (University of Texas at Austin) and (FASE) Willem Visser (Stellenbosch University). Invited tutorials were provided by Erika Ábrahám (RWTH Aachen University) on the analysis of hybrid systems and Madhusudan Parthasarathy (University of Illinois at Urbana-Champaign) on combining Machine Learning and Formal Methods. On behalf of the ETAPS 2020 attendants, I thank all the speakers for their inspiring and interesting talks!

ETAPS 2020 took place in Dublin, Ireland, and was organized by the University of Limerick and Lero. ETAPS 2020 is further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Tiziana Margaria (general chair, UL and Lero), Vasileios Koutavas (Lero@UCD), Anila Mjeda (Lero@UL), Anthony Ventresque (Lero@UCD), and Petros Stratis (Easy Conferences).

The ETAPS Steering Committee (SC) consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns (Saarbrücken), Marieke Huisman (chair, Twente), Joost-Pieter Katoen (Aachen and Twente), Jan Kofron (Prague), Gerald Lüttgen (Bamberg), Tarmo Uustalu (Reykjavik and Tallinn), Caterina Urban (Inria, Paris), and Lenore Zuck (Chicago).

Other members of the SC are: Armin Biere (Linz), Jordi Cabot (Barcelona), Jean Goubault-Larrecq (Cachan), Jan-Friso Groote (Eindhoven), Esther Guerra (Madrid), Jurriaan Hage (Utrecht), Reiko Heckel (Leicester), Panagiotis Katsaros (Thessaloniki), Stefan Kiefer (Oxford), Barbara König (Duisburg), Fabrice Kordon (Paris), Jan Kretinsky (Munich), Kim G. Larsen (Aalborg), Tiziana Margaria (Limerick), Peter Müller (Zurich), Catuscia Palamidessi (Palaiseau), Dave Parker (Birmingham), Andrew M. Pitts (Cambridge), Peter Ryan (Luxembourg), Don Sannella (Edinburgh), Bernhard Steffen (Dortmund), Mariëlle Stoelinga (Twente), Gabriele Taentzer (Marburg), Christine Tasson (Paris), Peter Thiemann (Freiburg), Jan Vitek (Prague), Heike Wehrheim (Paderborn), Anton Wijs (Eindhoven), and Nobuko Yoshida (London).

I would like to take this opportunity to thank all speakers, attendants, organizers of the satellite workshops, and Springer for their support. I hope you all enjoyed ETAPS 2020. Finally, a big thanks to Tiziana and her local organization team for all their enormous efforts enabling a fantastic ETAPS in Dublin!

February 2020

Marieke Huisman, ETAPS SC Chair and ETAPS e.V. President

## Preface

TACAS 2020 was the 26th edition of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems conference series. TACAS 2020 was part of the 23rd European Joint Conferences on Theory and Practice of Software (ETAPS 2020). The conference was held at the Royal Marine Hotel in Dublin, Ireland, during April 25–30, 2020.

TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, flexibility, and efficiency of tools and algorithms for building systems. TACAS solicited four types of submissions:

- research papers,
- case study papers,
- regular tool papers, and
- tool demonstration papers.


This year 155 papers were submitted to TACAS, consisting of 111 research papers, 8 case study papers, 19 regular tool papers, and 17 tool demo papers. Individual authors were limited to a maximum of three submissions. Each paper was reviewed by at least three Program Committee (PC) members, who also provided feedback on whether certain papers should go through a rebuttal process.

The chairs asked for 59 rebuttals, usually following such rebuttal recommendations by PC members. In parallel to PC reviewing, the Artifact Evaluation Committee (AEC) reviewed the artifacts. A formal summary review of this evaluation was made available to the PC members and taken into account in the discussion phase. The case study chair and the tools chair made sure that identical reviewing and selection criteria were applied within their respective class of papers. After this thorough reviewing, rebuttal and discussion phase, a total of 48 papers were accepted, including 31 research papers, 4 case study papers, 5 regular tool papers and 8 tool demo papers.

As in 2019, TACAS 2020 included an artifact evaluation (AE) for all types of papers. There were two rounds of the AE: for regular tool papers and tool demonstration papers AE was compulsory and artifacts had to be submitted to the first round. For research and case study papers, it was voluntary, and artifacts could be submitted to either the first or the second round. The results of the first round were communicated to the TACAS PC before their discussion phase so that the quality of the artifact could be considered prior to the TACAS decision making. Each artifact was evaluated independently by at least three reviewers. All accepted papers with accepted artifacts received a badge which is added to the title page of the respective paper if desired by the authors.

The AEC used a two-phase reviewing process: reviewers first performed an initial check to see whether the artifact was technically usable and whether the accompanying instructions were consistent, followed by a full evaluation of the artifact. The main criterion for artifact acceptance was consistency with the paper, with completeness and documentation being handled in a more lenient manner as long as the artifact was useful overall.

In the first round, out of 44 artifact submissions, 29 were accepted and 15 were rejected. This corresponds to an acceptance rate of 66%. Out of the 36 artifacts for regular tool papers and tool demonstration papers, 25 artifacts were accepted and 11 artifacts were rejected resulting in an acceptance rate of 69%. In all but five cases, tool papers whose artifacts did not pass the evaluation were rejected. Those 5 artifacts were invited for submission in the second evaluation round and 3 of these artifacts were resubmitted and successfully evaluated. Overall, out of the 20 artifacts submitted to the second evaluation round, 17 were accepted and 3 were rejected resulting in an acceptance rate of 85%.

TACAS 2020 also hosted the 9th International Competition on Software Verification (SV-COMP 2020), chaired and organized by Dirk Beyer. The competition again had high participation: 28 verification systems with developers from 11 countries were submitted for the systematic comparative evaluation, including 3 submissions from industry. Six teams contributed validators for verification witnesses. The TACAS proceedings include the competition report and short papers describing 11 of the participating verification systems. These papers were reviewed by a separate SV-COMP program committee; each of the papers was assessed by at least three reviewers. Two sessions in the TACAS program were reserved for the presentation of the results: the summary by the SV-COMP chair and the participating tools by the developer teams in the first session, and the open community meeting in the second session.

We are grateful to everyone who helped to make TACAS 2020 a success. In particular, we would like to thank all PC members, external reviewers, and the members of the AEC for their detailed and informed reviews and for their discussions during the virtual PC and AEC meetings. The collection and selection of papers was organized through the EasyChair Conference System and the proceedings volumes were published with the help of Springer; we thank them all for their assistance. We also thank the SC for their advice, the Organizing Committee of ETAPS 2020 and its general chair (Tiziana Margaria) and the chair of the ETAPS Executive Board (Marieke Huisman).

March 2020

Armin Biere and David Parker (PC Chairs); Marijn Heule (Case Study Chair); Falk Howar (Tools Chair); Dirk Beyer (Competition Chair); Arnd Hartmanns and Martina Seidl (AEC Chairs)

## Organization

## Program Committee

- Dirk Beyer, LMU Munich, Germany
- Roderick Bloem, TU Graz, Austria
- Alessandro Cimatti, FBK-irst, Italy
- Jan Kretinsky, TU Munich, Germany
- Wenchao Li, Boston University, USA
- Ken McMillan, Microsoft, USA
- Bernhard Steffen, TU Dortmund, Germany
- Christoph Wintersteiger, Microsoft, UK
- Christel Baier, TU Dresden, Germany
- Ezio Bartocci, Vienna University of Technology, Austria
- Armin Biere (Chair), Johannes Kepler University Linz
- Jasmin Blanchette, Vrije Universiteit Amsterdam, The Netherlands
- Hana Chockler, King's College London, UK
- Rance Cleaveland, University of Maryland, USA
- Goran Frehse, Université Grenoble Alpes, France
- Martin Fränzle, Carl von Ossietzky Univ. Oldenburg, Germany
- Orna Grumberg, Technion - Israel Institute of Technology
- Kim Guldstrand Larsen, Aalborg University, Denmark
- Holger Hermanns, Universität des Saarlandes, Germany
- Marijn Heule, Carnegie Mellon University, USA
- Falk Howar, TU Clausthal, IPSSE, Germany
- Benjamin Kiesl, CISPA Helmholtz Center for Inf. Security, Germany
- Laura Kovacs, Vienna University of Technology, Austria
- Aina Niemetz, Stanford University, USA
- Gethin Norman, University of Glasgow, UK
- David Parker (Chair), University of Birmingham, UK
- Corina Pasareanu, CMU/NASA Ames Research Center, USA
- Nir Piterman, University of Gothenburg, Sweden
- Kristin Yvonne Rozier, Iowa State University, USA
- Philipp Ruemmer, Uppsala University, Sweden
- Natasha Sharygina, Università della Svizzera italiana, Switzerland
- Jan Strejček, Masaryk University, Czech Republic
- Michael Tautschnig, Queen Mary University of London, UK
- Jaco van de Pol, Aarhus University, Denmark
- Tom van Dijk, University of Twente, The Netherlands

## Artifact Evaluation Committee


## SV-COMP – Program Committee and Jury


- Hernán Ponce de León (Dartagnan), Bundeswehr University Munich, Germany
- Henrich Lauko (DIVINE), Masaryk University, Czechia
- Felipe R. Monteiro (ESBMC), Fed. Univ. of Amazonas, Brazil
- Benjamin Quiring (GACAL), Northeastern University, USA
- Vaibhav Sharma (Java-Ranger), University of Minnesota, USA
- Philipp Ruemmer (JayHorn), Uppsala University, Sweden
- Peter Schrammel (JBMC), University of Sussex, UK
- Falk Howar (JDart), TU Dortmund, Germany
- Omar Inverso (Lazy-CSeq), Gran Sasso Science Institute, Italy
- Herbert Rocha (Map2Check), Universidade Federal do Amazonas, Brazil
- Philipp Berger (NITWIT), RWTH Aachen, Germany
- Cedric Richter (PeSCo), Paderborn University, Germany
- Saurabh Joshi (Pinaka), IIT Hyderabad, India
- Veronika Šoková (PredatorHP), BUT Brno, Czech Republic
- Willem Visser (SPF), Amazon Web Services, USA
- Marek Chalupa (Symbiotic), Masaryk University, Czech Republic
- Matthias Heizmann (UAutomizer), University of Freiburg, Germany
- Alexander Nutz (UKojak), University of Freiburg, Germany
- Daniel Dietsch (UTaipan), University of Freiburg, Germany
- Priyanka Darke (VeriAbs), Tata Consultancy Services, India
- Raveendra K. Medicherla (VeriFuzz), Tata Consultancy Services, India
- Liangze Yin (Yogar-CBMC), Nat. Univ. of Defense Technology, China

## Steering Committee

- Bernhard Steffen (Chair), TU Dortmund, Germany
- Dirk Beyer, LMU Munich, Germany
- Rance Cleaveland, University of Maryland, USA
- Holger Hermanns, Universität des Saarlandes, Germany
- Kim G. Larsen, Aalborg University, Denmark

## Additional Reviewers

Alexandre Dit Sandretto, Julien; Asadi, Sepideh; Ashok, Pranav; Avigad, Jeremy; Baanen, Tim; Bacci, Giorgio; Bacci, Giovanni; Backeman, Peter; Bae, Kyungmin; Barbosa, Haniel; Bentkamp, Alexander; Berani Abdelwahab, Erzana; Biewer, Sebastian; Blahoudek, Fanda; Blicha, Martin; Bozga, Marius; Bozzano, Marco; Bønneland, Frederik M.; Cerna, David; Ceska, Milan; Chalupa, Marek; Chapoutot, Alexandre; Dierl, Simon; Dureja, Rohit; Ebrahimi, Masoud; Eisentraut, Julia; Endrullis, Jörg; Ernst, Gidon; Esen, Zafer; Fan, Jiameng; Fazekas, Katalin; Fedyukovich, Grigory; Fleury, Mathias; Fokkink, Wan; Forets, Marcelo; Freiberger, Felix; Frenkel, Hadar; Friedberger, Karlheinz; Frohme, Markus; Fu, Feisi; Fürnkranz, Johannes; Giacobbe, Mirco; Gjøl Jensen, Peter; Gossen, Frederik; Goudsmid, Ohad; Griggio, Alberto; Grover, Kush; Gutiérrez, Elena; Haaswijk, Winston; Hadžić, Vedad; Hahn, Ernst Moritz; Hansen, Mikkel; Hartmanns, Arnd; Hecking-Harbusch, Jesko; Hofmann, Jana; Holzner, Stephan; Hugunin, Jasper; Humenberger, Andreas; Hupel, Lars; Hyvärinen, Antti; Irfan, Ahmed; Jasper, Marc; Jaulin, Luc; Jensen, Mathias Claus; Jensen, Peter Gjøl; Jonas, Martin; Jonsson, Bengt; Jonáš, Martin; Kacianka, Severin; Kaminski, Benjamin Lucien; Kanav, Sudeep; Kempa, Brian; Khalimov, Ayrat; Kiourti, Panagiota; Klauck, Michaela; Klüppelholz, Sascha; Koenighofer, Bettina; Kopetzki, Dawid; Krcal, Pavel; Kröger, Paul; Kupferman, Orna; Köhl, Maximilian; Lahkim Bennani, Ismail; Legay, Axel; Lemberger, Thomas; Liang, Chencheng; Lorber, Florian; Ma, Meiyi; Major, Juraj; Mann, Makai; Marcovich, Ron; Marescotti, Matteo; Martins, Ruben; Meggendorfer, Tobias; Mikučionis, Marius; Mitsch, Stefan; Mover, Sergio; Mues, Malte; Murtovi, Alnis; Möhlmann, Eike; Mömke, Tobias; Müller, David; Narváez, David; Naujokat, Stefan; Oliveira da Costa, Ana; Otoni, Rodrigo; Pagel, Jens; Parlato, Gennaro; Paskevich, Andrei; Peppelman, Marijn; Perelli, Giuseppe; Pivoluska, Matej; Popescu, Andrei; Puch, Stefan; Putot, Sylvie; Rebola-Pardo, Adrián; Reynolds, Andrew; Rothenberg, Bat-Chen; Roveri, Marco; Rowe, Reuben; Rüthing, Oliver; Schilling, Christian; Shoukry, Yasser; Spießl, Martin; Srba, Jiri; Stankovic, Miroslav; Stierand, Ingo; Štill, Vladimír; Stjerna, Albin; Stock, Gregory; Stojic, Ivan; Theel, Oliver; Tian, Chun; Tonetta, Stefano; Trtík, Marek; van der Ploeg, Atze; Vom Dorff, Sebastian; Wardega, Kacper; Weininger, Maximilian; Wendler, Philipp; Wimmer, Simon; Winkels, Jan; Yolcu, Emre; Zeljić, Aleksandar; Zhou, Weichao

## Contents – Part II

### Bisimulation


### Logic and Proof





## Contents – Part I

### Program Verification



Multi-agent Safety Verification Using Symmetry Transformations (Hussein Sibai, Navid Mokhlesi, Chuchu Fan, and Sayan Mitra)


### Verifying Concurrent Systems


### Model Checking and Reachability

Partial Order Reduction for Deep Bug Finding in Synchronous Hardware (Makai Mann and Clark Barrett)


## Bisimulation

## **An O(m log n) algorithm for branching bisimilarity on labelled transition systems**

David N. Jansen<sup>1</sup>, Jan Friso Groote<sup>2</sup>, Jeroen J.A. Keiren<sup>2</sup>, and Anton Wijs<sup>2</sup>

<sup>1</sup> State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China, dnjansen@ios.ac.cn

<sup>2</sup> Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands, {J.F.Groote, J.J.A.Keiren, A.J.Wijs}@tue.nl

**Abstract.** Branching bisimilarity is a behavioural equivalence relation on labelled transition systems (LTSs) that takes internal actions into account. It has the traditional advantage that algorithms for branching bisimilarity are more efficient than ones for other weak behavioural equivalences, especially weak bisimilarity. With m the number of transitions and n the number of states, the classic O(mn) algorithm was recently replaced by an O(m(log |Act| + log n)) algorithm [9], which is unfortunately rather complex. This paper combines its ideas with the ideas from Valmari [20], resulting in a simpler O(m log n) algorithm. Benchmarks show that in practice this algorithm is also faster and often far more memory efficient than its predecessors, making it the best option for branching bisimulation minimisation and preprocessing for calculating other weak equivalences on LTSs.

**Keywords:** Branching bisimilarity · Algorithm · Labelled transition systems

## **1 Introduction**

Branching bisimilarity [8] is an alternative to weak bisimilarity [17]. Both equivalences allow the reduction of labelled transition systems (LTSs) containing transitions labelled with internal actions, also known as silent, hidden or τ-actions.

One of the distinct advantages of branching bisimilarity is that, from the outset, an efficient algorithm has been available [10], which can be used to calculate whether two states are equivalent and to calculate a quotient LTS. It has complexity O(mn) with m the number of transitions and n the number of states. It is more efficient than classic algorithms for weak bisimilarity, which use transitive closure (for instance, [16] runs in $O(n^2 m \log n + m n^{2.376})$, where $n^{2.376}$ is the time for computing the transitive closure), and algorithms for weak simulation equivalence (strong simulation equivalence can be computed in O(mn) [12], and for weak simulation equivalence first the transitive closure needs to be computed). The algorithm is also far more efficient than algorithms for trace-based equivalence notions, such as (weak) trace equivalence or weak failure equivalence [16].

Branching bisimilarity also enjoys the nice mathematical property that there exists a canonical quotient with a minimal number of states and transitions (contrary to, for instance, trace-based equivalences). Additionally, as branching bisimilarity is finer than virtually any other behavioural equivalence taking internal actions into account [7], it is ideal for preprocessing: states merged by branching bisimilarity are also equivalent under the coarser equivalence. In order to calculate a desired equivalence, one can first reduce the behaviour modulo branching bisimilarity, before applying a dedicated algorithm on the often substantially reduced transition system. In the mCRL2 toolset [5] this is common practice.

In [9,11] an algorithm to calculate stuttering equivalence on Kripke structures with complexity O(m log n) was proposed. Stuttering equivalence essentially differs from branching bisimilarity in the fact that transitions do not have labels and as such all transitions can be viewed as internal. In these papers it was shown that branching bisimilarity can be calculated by translating LTSs to Kripke structures, encoding the labels of transitions into labelled states following [6,19]. This led to an O(m(log |Act| + log n)) or O(m log m) algorithm for branching bisimilarity.

Besides the time complexity, the algorithm in [9,11] has two disadvantages. First, the translation to Kripke structures introduces a new state and a new transition per action label and target state of a transition, which increases the memory required to calculate branching bisimilarity. This made it far less memory efficient than the classical algorithm of [10], and this was perceived as a substantial practical hindrance. For instance, when reducing systems consisting of tens of millions of states, such as [2], memory consumption is the bottleneck. Second, the algorithm in [9,11] is very complex. To illustrate the complexity, implementing it took approximately half a person-year.

**Contributions.** We present an algorithm for branching bisimilarity that runs directly on LTSs in O(m log n) time and that is simpler than the algorithm of [9,11]. To achieve this we use an idea from Valmari and Lehtinen [20,21] for strong bisimilarity. The standard Paige–Tarjan algorithm [18], which has O(m log n) time complexity for strong bisimilarity on Kripke structures, registers work done in a separate partition of states. Valmari [20] observed that this leads to complexity O(m log m) on LTSs and proposed to use a partition of transitions, whose elements he (and we) call bunches, to register work done. This reduces the time complexity on LTSs to O(m log n).

Using this idea we design our more straightforward algorithm for branching bisimilarity on LTSs. Essentially, this makes the maintenance of action labels particularly straightforward and allows us to simplify the handling of so-called new bottom states [10]. It also leads to a novel main invariant, which we formulate as Invariant 1. It allows us to prove the correctness of the algorithm in a far more straightforward way than before.

We have proven the correctness and complexity of the algorithm in detail [14] and demonstrate that it outperforms all preceding algorithms both in time and space when the LTSs are sizeable. This is illustrated with more than 30 example LTSs. This shows that the new algorithm pushes the state-of-the-art in comparing and minimising the behaviour of LTSs w.r.t. weak equivalences, either directly (branching bisimilarity) or in the form of a preprocessing step (for other weak equivalences).

Despite the fact that this new algorithm is more straightforward than the previous O(m(log |Act| + log n)) algorithm [9], the implementation of the algorithm is still not easy. To guard against implementation errors, we extensively applied random testing, comparing the output with that of other algorithms. The algorithms and their source code are freely available in the mCRL2 toolset [5].

**Overview of the article.** In Section 2 we provide the definition of LTSs and branching bisimilarity. In Section 3 we provide the core algorithm with high-level data structures, correctness and complexity. The subsequent section presents the procedure for splitting blocks, which can be presented as an independent pair of coroutines. Section 5 presents some benchmarks. Proofs and implementation details are omitted in this paper, and can be found in [14].

## **2 Branching bisimilarity**

In this section we define labelled transition systems and branching bisimilarity.

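**Definition 1 (Labelled transition system).** A labelled transition system (LTS) is a triple $A = (S, \mathit{Act}, \rightarrow)$ where

1. $S$ is a finite set of states; the number of states is denoted by $n$.
2. $\mathit{Act}$ is a finite set of actions including the internal action $\tau$.
3. ${\rightarrow} \subseteq S \times \mathit{Act} \times S$ is a transition relation; the number of transitions is denoted by $m$.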


It is common to write $t \xrightarrow{a} t'$ for $(t, a, t') \in {\rightarrow}$. With slight abuse of notation we write $t \xrightarrow{a} t' \in T$ instead of $(t, a, t') \in T$ for $T \subseteq {\rightarrow}$. We also write $t \xrightarrow{a} Z$ for the set of transitions $\{t \xrightarrow{a} t' \mid t' \in Z\}$, and $Z \xrightarrow{a} Z'$ for the set $\{t \xrightarrow{a} t' \mid t \in Z, t' \in Z'\}$. We call all actions except $\tau$ the visible actions. If $t \xrightarrow{a} t'$, we say that from $t$, the state $t'$, the action $a$, and the transition $t \xrightarrow{a} t'$ are reachable.
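As a minimal sketch of this notation, an LTS can be stored as a set of (source, action, target) triples. The names `LTS`, `TAU`, `out_set` and `between` are illustrative assumptions and are not taken from the paper or from mCRL2.

```python
from dataclasses import dataclass

TAU = "tau"  # assumed spelling for the internal action τ


@dataclass(frozen=True)
class LTS:
    states: frozenset
    actions: frozenset
    transitions: frozenset  # set of (source, action, target) triples

    def out_set(self, t, a, Z):
        """Transitions t -a-> t' with t' in Z (the set written t -a-> Z)."""
        return {(s, b, s2) for (s, b, s2) in self.transitions
                if s == t and b == a and s2 in Z}

    def between(self, Z, a, Z2):
        """Transitions t -a-> t' with t in Z and t' in Z2 (written Z -a-> Z')."""
        return {(s, b, s2) for (s, b, s2) in self.transitions
                if s in Z and b == a and s2 in Z2}

    def visible_actions(self):
        """All actions except τ."""
        return self.actions - {TAU}


# A three-state example: 0 -τ-> 1, 0 -a-> 2, 1 -a-> 2.
example = LTS(
    states=frozenset({0, 1, 2}),
    actions=frozenset({TAU, "a"}),
    transitions=frozenset({(0, TAU, 1), (0, "a", 2), (1, "a", 2)}),
)
```

Set comprehensions keep the sketch close to the set-builder notation; an efficient implementation would index transitions by source state instead of scanning the whole set.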

**Definition 2 (Branching bisimilarity).** Let $A = (S, \mathit{Act}, \rightarrow)$ be an LTS. We call a relation $R \subseteq S \times S$ a branching bisimulation relation iff it is symmetric and for all $s, t \in S$ such that $s \mathrel{R} t$ and all transitions $s \xrightarrow{a} s'$ we have:

1. $a = \tau$ and $s' \mathrel{R} t$, or
2. there is a sequence $t \xrightarrow{\tau} \cdots \xrightarrow{\tau} t' \xrightarrow{a} t''$ such that $s \mathrel{R} t'$ and $s' \mathrel{R} t''$.

Two states $s$ and $t$ are branching bisimilar, denoted by $s \mathrel{\underline{\leftrightarrow}_b} t$, iff there is a branching bisimulation relation $R$ such that $s \mathrel{R} t$.
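The two clauses of Definition 2 can be checked directly for a given candidate relation. The following is a brute-force sketch for small examples only; the function names and the encoding of the relation as a set of pairs are assumptions made for illustration, and nothing here reflects how the paper's algorithm works.

```python
TAU = "tau"  # assumed spelling for the internal action τ


def tau_closure(transitions, t):
    """All states reachable from t via zero or more τ-steps."""
    seen, stack = {t}, [t]
    while stack:
        u = stack.pop()
        for (src, a, dst) in transitions:
            if src == u and a == TAU and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def is_branching_bisimulation(transitions, R):
    """Check Definition 2 for a relation R given as a set of state pairs."""
    if any((t, s) not in R for (s, t) in R):
        return False  # R must be symmetric
    for (s, t) in R:
        for (src, a, s2) in transitions:
            if src != s:
                continue
            # Clause 1: an internal step whose target is still related to t.
            if a == TAU and (s2, t) in R:
                continue
            # Clause 2: t -τ-> ... -τ-> t1 -a-> t2 with s R t1 and s2 R t2.
            matched = any(
                (s, t1) in R and (s2, t2) in R
                for t1 in tau_closure(transitions, t)
                for (u, b, t2) in transitions
                if u == t1 and b == a
            )
            if not matched:
                return False
    return True
```

For the LTS with transitions 0 -τ-> 1, 1 -a-> 2 and 3 -a-> 4, the symmetric relation containing (0,3), (1,3) and (2,4) passes this check, which witnesses that state 0 is branching bisimilar to state 3.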

Note that branching bisimilarity is an equivalence relation. Given an equivalence relation $R$, a transition $s \xrightarrow{a} t$ is called inert iff $a = \tau$ and $s \mathrel{R} t$. If $t \xrightarrow{\tau} t_1 \xrightarrow{\tau} \cdots \xrightarrow{\tau} t_{n-1} \xrightarrow{\tau} t_n \xrightarrow{a} t'$ such that $t \mathrel{R} t_i$ for $1 \leq i \leq n$, we say that the state $t_n$, the action $a$, and the transition $t_n \xrightarrow{a} t'$ are inertly reachable from $t$.
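Inert reachability can be sketched as a search that only follows τ-transitions staying inside one equivalence class. The sketch below represents the equivalence by a hypothetical `block_of` dictionary mapping each state to a class identifier, and it assumes that a path of zero inert steps is allowed, so the outgoing transitions of the start state itself are included.

```python
TAU = "tau"  # assumed spelling for the internal action τ


def is_inert(transition, block_of):
    """A transition is inert iff it is a τ-step between equivalent states."""
    s, a, t = transition
    return a == TAU and block_of[s] == block_of[t]


def inertly_reachable(transitions, block_of, t):
    """Transitions whose source is reachable from t via inert τ-steps only."""
    seen, stack = {t}, [t]
    while stack:
        u = stack.pop()
        for tr in transitions:
            if tr[0] == u and is_inert(tr, block_of) and tr[2] not in seen:
                seen.add(tr[2])
                stack.append(tr[2])
    # Every outgoing transition of an inertly reached state qualifies.
    return {tr for tr in transitions if tr[0] in seen}
```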

The equivalence classes of branching bisimilarity partition the set of states.

**Definition 3 (Partition).** For a set $X$ a partition $\Pi$ of $X$ is a disjoint cover of $X$, i.e., $\Pi = \{B_i \subseteq X \mid B_i \neq \emptyset, 1 \leq i \leq k\}$ such that $B_i \cap B_j = \emptyset$ for all $1 \leq i < j \leq k$ and $X = \bigcup_{1 \leq i \leq k} B_i$.

A partition $\Pi'$ is a refinement of $\Pi$ iff for every $B' \in \Pi'$ there is some $B \in \Pi$ such that $B' \subseteq B$.

We will often use that a partition $\Pi$ induces an equivalence relation in the following way: $s \equiv_\Pi t$ iff there is some $B \in \Pi$ containing both $s$ and $t$.
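The three notions above (partition, refinement, induced equivalence) translate into short set-based checks. This is an illustrative sketch with assumed function names, representing a partition as a list of Python sets.

```python
def is_partition(blocks, X):
    """Definition 3: non-empty, pairwise disjoint blocks that cover X."""
    blocks = [set(b) for b in blocks]
    covered = set().union(*blocks)
    return (all(len(b) > 0 for b in blocks)
            # Blocks are pairwise disjoint iff no element is counted twice.
            and sum(len(b) for b in blocks) == len(covered)
            and covered == set(X))


def is_refinement(fine, coarse):
    """Every block of `fine` lies inside some block of `coarse`."""
    return all(any(set(b2) <= set(b) for b in coarse) for b2 in fine)


def equivalent(blocks, s, t):
    """The induced equivalence: s and t share a block."""
    return any(s in b and t in b for b in blocks)
```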

## **3 The algorithm**

In this section we present the core algorithm. In the next section we deal with the actual splitting of blocks in the partition. We start off with an abstract description of this core part.

#### **3.1 High-level description of the algorithm**

The algorithm is a partition refinement algorithm. It iteratively refines two partitions $\Pi_s$ and $\Pi_t$. Partition $\Pi_s$ is a partition of states in $S$ that is coarser than branching bisimilarity. We refer to the elements of $\Pi_s$ as blocks, typically denoted using $B$. Partition $\Pi_t$ partitions the non-inert transitions of $\rightarrow$, where inertness is interpreted with respect to $\equiv_{\Pi_s}$. We refer to the elements of $\Pi_t$ as bunches, typically denoted using $T$.

The partition of transitions $\Pi_t$ records the current knowledge about transitions. Transitions are in different bunches iff the algorithm has established that they cannot simulate each other (i.e., they cannot serve as $s \xrightarrow{a} s'$ and $t' \xrightarrow{a} t''$ in Definition 2).

The partition of states $\Pi_s$ records the current knowledge about branching bisimilarity. Two states are in different blocks iff the algorithm has found a proof that they are not branching bisimilar (this is formalised in Invariant 3). This implies that $\Pi_s$ must be such that states with outgoing transitions in different combinations of bunches are in different blocks (Invariant 1).

Before performing partition refinement, the LTS is preprocessed to contract τ-strongly connected components (τ-SCCs) into a single state without a τ-loop. This step is valid as all states in a τ-SCC are branching bisimilar. Consequently, every block has bottom states, i.e., states without outgoing inert τ-transitions [10].
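The preprocessing step can be sketched with any SCC algorithm restricted to τ-transitions. The version below uses an iterative Kosaraju-style two-pass search; this choice, the function name `contract_tau_sccs`, and the triple-based encoding are assumptions for illustration and need not match the mCRL2 implementation.

```python
TAU = "tau"  # assumed spelling for the internal action τ
_NONE = object()  # sentinel: "no unvisited successor left"


def contract_tau_sccs(states, transitions):
    """Merge every τ-SCC into one representative and drop τ-self-loops."""
    succ = {s: [] for s in states}
    pred = {s: [] for s in states}
    for (s, a, t) in transitions:
        if a == TAU:
            succ[s].append(t)
            pred[t].append(s)

    # Pass 1: depth-first search on the τ-graph, recording completion order.
    order, visited = [], set()
    for root in states:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, it = stack[-1]
            nxt = next((n for n in it if n not in visited), _NONE)
            if nxt is _NONE:
                order.append(node)
                stack.pop()
            else:
                visited.add(nxt)
                stack.append((nxt, iter(succ[nxt])))

    # Pass 2: search the reversed τ-graph in reverse completion order;
    # each search tree found this way is exactly one τ-SCC.
    rep = {}
    for root in reversed(order):
        if root in rep:
            continue
        rep[root] = root
        stack = [root]
        while stack:
            node = stack.pop()
            for p in pred[node]:
                if p not in rep:
                    rep[p] = root
                    stack.append(p)

    # Redirect transitions to representatives; τ-steps inside an SCC vanish.
    new_transitions = {
        (rep[s], a, rep[t]) for (s, a, t) in transitions
        if not (a == TAU and rep[s] == rep[t])
    }
    return set(rep.values()), new_transitions, rep
```

After contraction the τ-graph is acyclic, so following inert τ-transitions from any state of a block must eventually stop, which is why every block then contains a bottom state.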

The core invariant of the algorithm says that if one state in a block can inertly reach a transition in a bunch, all states in that block can inertly reach a transition in this bunch. This can be formulated in terms of bottom states:

**Invariant 1 (Bunches).** $\Pi_s$ is stable under $\Pi_t$, i.e., if a bunch $T \in \Pi_t$ contains a transition with its source state in a block $B \in \Pi_s$, then every bottom state in block $B$ has a transition in bunch $T$.
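Invariant 1 can be tested on small examples with a direct check. In this sketch, `block_of` maps states to block identifiers and `bunch_of` maps each non-inert transition to a bunch identifier; both encodings and the function names are assumptions made for illustration.

```python
TAU = "tau"  # assumed spelling for the internal action τ


def bottom_states(transitions, block_of):
    """States without an outgoing inert τ-transition."""
    non_bottom = {s for (s, a, t) in transitions
                  if a == TAU and block_of[s] == block_of[t]}
    return set(block_of) - non_bottom


def satisfies_invariant_1(transitions, block_of, bunch_of):
    """Every bottom state of B has a transition in each bunch that B touches."""
    bottoms = bottom_states(transitions, block_of)
    # Bunches in which each state has an outgoing transition.
    bunches_of_state = {s: set() for s in block_of}
    for tr, bunch in bunch_of.items():
        bunches_of_state[tr[0]].add(bunch)
    # Bunches containing a transition that starts somewhere in each block.
    bunches_of_block = {}
    for s, bunches in bunches_of_state.items():
        bunches_of_block.setdefault(block_of[s], set()).update(bunches)
    return all(bunches_of_block[block_of[s]] <= bunches_of_state[s]
               for s in bottoms)
```

In the example used in the tests, moving the transition 0 -a-> 2 into its own bunch violates the invariant: block B then touches that bunch, but its bottom state 1 has no transition in it.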

The initial partitions $\Pi_s$ and $\Pi_t$ are the coarsest partitions that satisfy Invariant 1. $\Pi_t$ starts with a single bunch consisting of all non-inert transitions. Then, in $\Pi_s$ we need to separate states with some transition in this bunch from those without. We define $B_{\mathit{vis}}$ to be the set of states from which a visible transition is inertly reachable, and $B_{\mathit{invis}}$ to be the other states. Then $\Pi_s = \{B_{\mathit{vis}}, B_{\mathit{invis}}\} \setminus \{\emptyset\}$.
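The initial state partition can be sketched as a backward search over τ-transitions, starting from the sources of visible transitions. The sketch assumes that inertness is taken with respect to the trivial one-block partition, so every τ-transition counts as inert at this point; the function name and encoding are illustrative.

```python
TAU = "tau"  # assumed spelling for the internal action τ


def initial_partition(states, transitions):
    """Return [B_vis, B_invis] with empty blocks left out."""
    # States with a visible outgoing transition are in B_vis directly.
    vis = {s for (s, a, t) in transitions if a != TAU}
    tau_pred = {}
    for (s, a, t) in transitions:
        if a == TAU:
            tau_pred.setdefault(t, []).append(s)
    # Walk τ-transitions backwards: a τ-predecessor of a B_vis state
    # can also reach a visible transition.
    stack = list(vis)
    while stack:
        u = stack.pop()
        for p in tau_pred.get(u, []):
            if p not in vis:
                vis.add(p)
                stack.append(p)
    invis = set(states) - vis
    return [block for block in (vis, invis) if block]
```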

Transitions in a bunch may have different labels or go to different blocks. In that case, the bunch can be split as these transitions cannot simulate each other. If we manage to achieve the situation where all transitions in a bunch have the same label and go to the same target block, the obtained partition turns out to be a branching bisimulation. Therefore, we want to split each bunch into so-called action-block-slices defined below. We also immediately define some other sets derived from Π<sup>t</sup> and Π<sup>s</sup> as we require them in our further exposition. So, we have:


$$T_{B \xrightarrow{a} B'} = T_{B\rightarrow} \cap T_{\xrightarrow{a} B'} = \{ s \xrightarrow{a} s' \in T \mid s \in B \land s' \in B' \}.$$


The block-bunch-slices and action-block-slices are explicitly maintained as auxiliary data structures in the algorithm in order to meet the required performance bounds. If the partitions Π<sup>s</sup> or Π<sup>t</sup> are adapted, all the derived sets above also change accordingly.

A bunch can be trivial, which means that it only contains one action-block-slice, or it can contain multiple action-block-slices. In the latter case one action-block-slice is split off to become a bunch by itself. However, this may invalidate Invariant 1. Some states in a block may only have transitions in the new bunch while other states have only transitions in the old bunch. Therefore, blocks have to be split to satisfy Invariant 1. Splitting blocks can cause bunches to become non-trivial because action-block-slices fall apart.

This splitting is repeated until all bunches are trivial, and as already stated above, the obtained partition Π<sup>s</sup> is the required branching bisimulation. As the transition system is finite this process of repeated splitting terminates.
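The notion of a trivial bunch can be made concrete as follows. This sketch groups the transitions of a bunch by label and target block; the names `action_block_slices` and `is_trivial`, and the dictionary `block_of` mapping each state to a block identifier, are assumptions for illustration. The actual algorithm maintains these slices incrementally rather than recomputing them.

```python
from collections import defaultdict


def action_block_slices(bunch, block_of):
    """Group the transitions of a bunch by (label, target block)."""
    slices = defaultdict(set)
    for src, label, dst in bunch:
        slices[(label, block_of[dst])].add((src, label, dst))
    return dict(slices)


def is_trivial(bunch, block_of):
    """A bunch is trivial if it consists of at most one action-block-slice."""
    return len(action_block_slices(bunch, block_of)) <= 1
```

The refinement loop described above stops exactly when `is_trivial` would hold for every bunch.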

#### **3.2 Abstract algorithm**

We first present an abstract version of the algorithm in Algorithm 1. Its behaviour is as follows. As long as there are non-trivial bunches (i.e., bunches containing multiple action-block-slices), these bunches need to be split such that they ultimately become trivial. The outer loop (Lines 1.2–1.19) takes a non-trivial bunch T from Π<sup>t</sup>, and from this it moves an action-block-slice $T_{\xrightarrow{a} B'}$ into its own bunch in Π<sup>t</sup> (Line 1.4). Hence, bunch T is reduced to $T \setminus T_{\xrightarrow{a} B'}$.

The two new bunches $T_{\xrightarrow{a} B'}$ and $T \setminus T_{\xrightarrow{a} B'}$ can cause instability, violating Invariant 1. This means there can be blocks with transitions in one new bunch, but some bottom states only have transitions in the other new bunch. For such blocks, stability needs to be restored by splitting them.

To restore this stability we investigate all block-bunch-slices in one of the new bunches, namely $T_{\xrightarrow{a} B'}$. Blocks that do not have transitions in these block-bunch-slices are stable with respect to both bunches. To keep track of the blocks that still need to be split, we partition the block-bunch-slices $T_{B\rightarrow}$ into stable and unstable block-bunch-slices. A block-bunch-slice is stable if we have ensured that it is not a splitter for any block. Otherwise it is deemed unstable, and it needs to be checked whether it is stable, or whether the block B must be split. The first inner loop (Lines 1.5–1.7) inserts all unstable block-bunch-slices into the splitter list. Block-bunch-slices of the shape $T_{B \xrightarrow{a} B'}$ in the splitter list are labelled primary, and other list entries are labelled secondary.

In the second loop (Lines 1.8–1.18), one splitter $T'_{B\rightarrow}$ from the splitter list is taken at a time and its source block is split into R (the part that can inertly reach $T'_{B\rightarrow}$) and U (the part that cannot inertly reach $T'_{B\rightarrow}$) to re-establish stability.

If $T'_{B\rightarrow}$ was a primary splitter of the form $T_{B \xrightarrow{a} B'}$, then we know that U must be stable under $T_{U\rightarrow} \setminus T_{U \xrightarrow{a} B'}$, as every bottom state in B has a transition in the former block-bunch-slice $T_{B\rightarrow}$, and as the states in U have no transition in $T_{B \xrightarrow{a} B'}$, every bottom state in U must have a transition in $T_{B\rightarrow} \setminus T_{B \xrightarrow{a} B'}$. Therefore, at Line 1.11, block-bunch-slice $T_{U\rightarrow} \setminus T_{U \xrightarrow{a} B'}$ can be removed from the splitter list. This is the three-way split from [18].

Some inert transitions may have become non-inert, namely the τ-transitions that go from R to U. There cannot be τ-transitions from U to R. The new non-inert transitions were not yet part of a bunch in Π<sup>t</sup>. So, a new bunch $R \xrightarrow{\tau} U$ is formed for them. All transitions in this new bunch leave R and thus R is the only block that may not be stable under this new bunch. To avoid superfluous work, we split off the unstable part N, i.e., the part that can inertly reach a transition in $R \xrightarrow{\tau} U$ and contains all new bottom states, at Line 1.14. The original bottom states of R become the bottom states of R'. There can be transitions $N \xrightarrow{\tau} R'$ that also become non-inert, and we add these to the new bunch $R \xrightarrow{\tau} U$. As observed in [10], blocks containing new bottom states can become unstable under any bunch. So, stability of N (but not of R') must be re-established, and all block-bunch-slices leaving N are put on the splitter list at Line 1.15.
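The effect of a split on inertness can be sketched in a few lines. The helper names below are hypothetical, transitions are again assumed to be `(source, label, target)` triples, and the sets are recomputed from scratch, whereas the algorithm itself updates them incrementally.

```python
TAU = "tau"  # assumed encoding of the invisible action


def newly_non_inert(R, U, transitions):
    """Tau-transitions from R to U: inert before the split, non-inert after."""
    return {
        (src, label, dst)
        for src, label, dst in transitions
        if label == TAU and src in R and dst in U
    }


def new_bottom_states(R, U, transitions):
    """States of R that had an inert transition in B = R ∪ U but have none in R."""
    B = R | U
    had_inert = {s for s, l, d in transitions if l == TAU and s in R and d in B}
    still_inert = {s for s, l, d in transitions if l == TAU and s in R and d in R}
    return had_inert - still_inert
```

In a block where state 1 has a τ-transition to a state that ends up in U and no other τ-transition, state 1 is reported as a new bottom state of R.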

#### **3.3 Correctness**

The validity of the algorithm follows from a number of major invariants. The main invariant, Invariant 1, is valid at Line 1.2. Additionally, the algorithm satisfies the following three invariants.

**Invariant 2 (Bunches are not unnecessarily split).** For any pair of non-inert transitions $s \xrightarrow{a} s'$ and $t \xrightarrow{a} t'$, if s, t ∈ B and s', t' ∈ B' then $s \xrightarrow{a} s' \in T$ and $t \xrightarrow{a} t' \in T$ for some bunch T ∈ Π<sup>t</sup>.

**Invariant 3 (Preservation of branching bisimilarity).** For all states s, t ∈ S, if s ↔<sub>b</sub> t, then there is some block B ∈ Π<sup>s</sup> such that s, t ∈ B.

**Invariant 4 (No inert loops).** There is no inert loop in a block, i.e., for every sequence $s_1 \xrightarrow{\tau} s_2 \xrightarrow{\tau} \cdots \xrightarrow{\tau} s_n$ with $s_i \in B$ for some block B ∈ Π<sup>s</sup> and n > 1, it holds that $s_1 \neq s_n$.

Invariant 2 indicates that two non-inert transitions that (1) start in the same block, (2) have the same label, and (3) end in the same block, always reside in the same bunch. Invariant 3 says that branching bisimilar states never end up in separate blocks. Invariant 4 ensures that all τ-paths in each block are finite. As a consequence every block has at least one bottom state, and from every state a bottom state can be inertly reached.

The invariants given above allow us to prove that the algorithm works correctly. When the algorithm terminates (and this always happens, see Section 3.5), branching bisimilar states are perfectly grouped in blocks.

**Theorem 1.** From the Invariants 1, 3 and 4, it follows that after the algorithm terminates, ≡<sub>Π<sup>s</sup></sub> = ↔<sub>b</sub>.

Because of the space restrictions here, the proofs are omitted. The interested reader is referred to [14] for the details.

#### **3.4 In-depth description of the algorithm**

To show that the algorithm has the desired O(m log n) time complexity, we now give a more detailed description of the algorithm. The pseudocode of the detailed algorithm is given in Algorithm 2. This algorithm serves two purposes. First of all, it clarifies how the data structures are used, and refines many of the steps in the high-level algorithm. Additionally, time budgets for parts of the algorithm are printed in grey at the right-hand side of the pseudocode. We use these time budgets in Section 3.5 to analyse the overall complexity of the algorithm. We focus on the most important details in the algorithm.

#### **Algorithm 2** Detailed algorithm for branching bisimulation partitioning

At Lines 2.6–2.7, a small action-block-slice $T_{\xrightarrow{a} B'}$ is moved into its own bunch, and T is reduced to $T \setminus T_{\xrightarrow{a} B'}$. All blocks that have transitions in the two new bunches are added to the splitter list in Lines 2.8–2.13. This loop also marks some transitions (in the time complexity annotations we write $\mathit{Marked}(T_{B\rightarrow})$ for the marked transitions of block-bunch-slice $T_{B\rightarrow}$). The function of this marking is similar to that of the counters in [18]: it serves to determine quickly whether a bottom state has a transition in a secondary splitter $T_{B\rightarrow} \setminus T_{B \xrightarrow{a} B'}$ (or slices that are the result of splitting this slice). In general, a bottom state has transitions in some splitter block-bunch-slice if and only if it has marked transitions in this slice. There is one exception: after splitting under a primary splitter $T_{B\rightarrow}$, bottom states in U are not marked. But as they always have a transition in $T_{U\rightarrow} \setminus T_{U \xrightarrow{a} B'}$, U is already stable in this case (see Line 2.19).

The second loop is refined to Lines 2.14–2.30. In every iteration one splitter $T'_{B\rightarrow}$ from the splitter list is considered, and its source block is first split into R and U. Formally, the routine split(B, T) delivers the pair R, U defined by:

$$\begin{aligned} R &= \{ s \in B \mid s \xrightarrow{\tau} s_1 \xrightarrow{\tau} \cdots \xrightarrow{\tau} s_n \xrightarrow{a} s' \text{ where } s_1, \dots, s_n \in B, \; s_n \xrightarrow{a} s' \in T \}, \\ U &= B \setminus R. \end{aligned} \tag{1}$$

We detail its algorithm and discuss its correctness in Section 4.
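As a point of reference, Equation (1) can be computed directly by backward reachability. The sketch below is a plain specification of the result, not the lockstep procedure of Section 4, and it does not meet the time bounds discussed later. It reads Equation (1) so that a state with a transition in T belongs to R itself; the function name and the triple encoding of transitions are assumptions.

```python
TAU = "tau"  # assumed encoding of the invisible action


def split_reference(B, T, transitions):
    """Return (R, U): states of B that can, resp. cannot, inertly reach T."""
    inert_pred = {s: set() for s in B}
    for src, label, dst in transitions:
        if label == TAU and src in B and dst in B:
            inert_pred[dst].add(src)
    # Start from the sources of transitions in T that lie in B ...
    R = {src for src, _, _ in T if src in B}
    todo = list(R)
    # ... and walk backwards along tau-transitions that stay inside B.
    while todo:
        for pred in inert_pred[todo.pop()]:
            if pred not in R:
                R.add(pred)
                todo.append(pred)
    return R, B - R
```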

In Lines 2.21–2.28, the situation is handled when some inert transitions have become non-inert. We mark one of the outgoing transitions of every new bottom state such that we can find the bottom states with a transition in $T_{N\rightarrow}$ in time proportional to the number of such new bottom states.

We illustrate the algorithm in the following example. Note this also illustrates some of the details of the split subroutine, which is discussed in detail in Section 4.

Example 1. Consider the situation in Figure 1a. Observe that block B is stable w.r.t. the bunches T and T'. We have split off a small bunch $T_{\xrightarrow{a} B'}$ from T, and as a consequence, B needs to be restabilised. The bunches put on the splitter list initially are $T_{\xrightarrow{a} B'}$ and $T \setminus T_{\xrightarrow{a} B'}$. When putting these bunches on the splitter list, all transitions in $T_{B \xrightarrow{a} B'}$ are marked, see the m's in Figure 1b. Also, for states that have transitions both in $T_{\xrightarrow{a} B'}$ and in $T \setminus T_{\xrightarrow{a} B'}$, one transition in the latter bunch is marked, see the m's in Figure 1b.

We now first split B w.r.t. the primary splitter $T_{\xrightarrow{a} B'}$ into R, the states that can inertly reach $T_{\xrightarrow{a} B'}$, and U, the states that cannot. In Figure 1b, the states known to be destined for R and those known to be destined for U are indicated by distinct markers. Initially, all states with a marked outgoing transition are destined for R, the remaining bottom state of B is destined for U. The split subroutine proceeds to extend sets R and U in a backwards fashion using two coroutines, marking a state destined for R if one of its successors is already in R, and marking a state destined for U if all its successors are in U. Here, the state in U does not have any incoming inert transitions, so its coroutine immediately terminates and all other states belong to R. Block B is split into subblocks R and U, as shown in Figure 1c. Block U is stable w.r.t. both $T_{\xrightarrow{a} B'}$ and $T \setminus T_{\xrightarrow{a} B'}$.

We still need to split R w.r.t. $T \setminus T_{\xrightarrow{a} B'}$, into $R_1$ and $U_1$, say. For this, we use the marked transitions in $T \setminus T_{\xrightarrow{a} B'}$ as a starting point to compute all bottom states that can reach a transition in $T \setminus T_{\xrightarrow{a} B'}$. This guarantees that the time we use is proportional to the size of $T_{\xrightarrow{a} B'}$. Initially, there is one state destined for $R_1$ and one state destined for $U_1$, both marked in Figure 1c. We now perform the two coroutines in split simultaneously. Figure 1d shows the situation after both coroutines have considered one transition: the $U_1$-coroutine (which calculates the states that cannot inertly reach $T \setminus T_{\xrightarrow{a} B'}$) has initialised the counter untested of one state to 2 on Line 3.9 of Algorithm 3 because two of its outgoing inert transitions have not yet been considered. The $R_1$-coroutine (which calculates the states that can inertly reach $T \setminus T_{\xrightarrow{a} B'}$) has checked the unmarked transition in the splitter $T_{R\rightarrow} \setminus T_{R \xrightarrow{a} B'}$. As the latter coroutine has finished visiting unmarked transitions in the splitter, the $U_1$-coroutine no longer needs to run the slow test loop at Lines 3.13–3.17 of the left column of Algorithm 3. In Figure 1e the situation is shown after two more steps in the coroutines. Each has visited two extra transitions. There are two extra states destined for $R_1$, and one state is destined for $U_1$ with 0 remaining inert transitions, for which we know immediately that it has no transition in $T \setminus T_{\xrightarrow{a} B'}$; these states are marked in the figure. Now, the $R_1$-coroutine is terminated, since it contains more than $\frac{1}{2}|R|$ states, and the remaining incoming transitions of states in $U_1$ are visited. This will not further extend $U_1$. The result of splitting is shown in Figure 1f. Some inert transitions become non-inert, so a new bunch with transitions $R_1 \xrightarrow{\tau} U_1$ is created, and all these transitions are marked m.

Fig. 1: Illustration of splitting of a small block from T and stabilising block B with respect to the new bunches $T_{\xrightarrow{a} B'}$ and $T \setminus T_{\xrightarrow{a} B'}$, as explained in Example 1.

We next have to split $R_1$ with respect to this new bunch into the set of states $N_1$ that can inertly reach a transition in the new bunch, and the set $R'_1$ that cannot inertly reach this bunch. In this case, all states in $R_1$ have a marked outgoing transition, hence $N_1 = R_1$, and $R'_1 = \emptyset$. The coroutine that calculates the set of states that cannot inertly reach a transition in the bunch will immediately terminate because there are no transitions to be considered.

Observe that $R_1$ (= $N_1$) has a new bottom state, marked 'nb'. This means that stability of $R_1$ with respect to any bunch is not guaranteed any more and needs to be re-established. We therefore consider all bunches in which $R_1$ has an outgoing transition. We add $T_{R_1 \xrightarrow{a} B'}$, $T_{R_1\rightarrow} \setminus T_{R_1 \xrightarrow{a} B'}$ and $T'_{R_1\rightarrow}$ to the splitter list as secondary splitters, and mark one outgoing transition from each bottom state in each of these bunches using m. This situation is shown in Figure 1g.

In this case, $R_1$ is stable w.r.t. $T_{R_1 \xrightarrow{a} B'}$ and $T_{R_1\rightarrow} \setminus T_{R_1 \xrightarrow{a} B'}$, i.e., all states in $R_1$ can inertly reach a transition in both bunches. In both cases this is observed immediately after initialisation in split, since the set of states that cannot inertly reach a transition in these bunches is initially empty, and the corresponding coroutine terminates immediately.

Therefore, consider splitting $R_1$ with respect to $T'_{R_1\rightarrow}$. This leads to $R_2$, the set of states that can inertly reach a transition in T', and $U_2$, the set of states that cannot inertly reach a transition in T'. Note there are no marked transitions in $T'_{R_1\rightarrow}$, so initially all bottom states of $R_1$ are destined for $U_2$ (marked in Figure 1h), and there are no states destined for $R_2$. Then we start splitting $R_1$. In the $R_2$-coroutine, we first add the states with an unmarked transition in $T'_{R_1\rightarrow}$ to $R_2$ at Line 3.4r (i.e., in the right column of Algorithm 3) and then all predecessors of the new bottom state need to be considered. When split terminates, there will be no additional states in $U_2$, and the remaining states end up in $R_2$.

The situation after splitting $R_1$ into $R_2$ and $U_2$ is shown in Figure 1i. One of the inert transitions (marked m) becomes non-inert. Furthermore, $R_2$ contains a new bottom state. This is the state with a transition in T'. As each block must have a bottom state, a non-bottom state had to become a bottom state.

We need to continue stabilising $R_2$ w.r.t. bunch $R_2 \xrightarrow{\tau} U_2$, which does not lead to a new split, and we need to restabilise $R_2$ w.r.t. all bunches in which it has an outgoing transition. This also does not lead to new splits, so the situation in Figure 1i after removing the markings is the final result of splitting.

#### **3.5 Time complexity**

Throughout this section, let n be the number of states and m the number of transitions in the LTS. To simplify the complexity notations we assume that n ≤ m + 1. This is not a significant restriction, since it is satisfied by any LTS in which every non-initial state has an incoming transition. We also write in(s) and out(s) for the sets of incoming and outgoing transitions of state s.

We use the principle "Process the smaller half" [13]: when a set is split into two parts, we spend time proportional to the size of the smaller subset. This leads to a logarithmic number of operations assigned to each element. We apply this principle twice, once to new bunches and once to new subblocks. Additionally, we spend some time on new bottom states. This is formulated in the following theorem.

**Theorem 2.** For the main loop of Algorithm 2 we have:

Summing up these time budgets leads to an overall time complexity of O(m log n).
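The logarithmic factor comes from the "process the smaller half" principle: whenever an element lies in the smaller part of a split, the set containing it at least halves, so this can happen at most log<sub>2</sub> n times. The following simulation is only an illustration of that counting argument with arbitrary random splits; the function name and the use of a seeded random generator are assumptions and have no counterpart in the algorithm.

```python
import math
import random


def max_smaller_half_charges(n, seed=0):
    """Split {0..n-1} repeatedly at random positions and charge one unit to
    every element of the smaller part; return the largest total charge."""
    rng = random.Random(seed)
    charges = [0] * n
    worklist = [list(range(n))]
    while worklist:
        part = worklist.pop()
        if len(part) < 2:
            continue
        cut = rng.randint(1, len(part) - 1)  # both sides are non-empty
        left, right = part[:cut], part[cut:]
        smaller = left if len(left) <= len(right) else right
        for element in smaller:
            charges[element] += 1
        worklist += [left, right]
    return max(charges)


# The bound holds for every sequence of splits, not only for this seed.
assert max_smaller_half_charges(1000) <= math.floor(math.log2(1000))
```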

These runtimes are annotated as time budgets in the main loop of Algorithm 2. Line 2.7 moves the transitions of $T_{\xrightarrow{a} B'}$ to their new bunch, and Lines 2.6–2.14 take time proportional to the size of this new bunch.

A new subblock is formed at Line 2.17 (and at the same time, some states in subblock R may become new bottom states). Lines 2.15–2.22 take time proportional to its incoming and outgoing transitions. Similarly, a new subblock is formed in Line 2.23, and Lines 2.23–2.26 take time proportional to this subblock's transitions.

Finally, new bottom states found in R (and separated into N) allow us to spend time proportional to $\mathit{Bottom}(N)_{\rightarrow}$ at Lines 2.15–2.28. At Line 2.27 we need to include not only the current new bottom states but also the future ones because there may be block-bunch-slices that only have transitions from non-bottom states. When N is split under such a block-bunch-slice, at least one of these states will become a bottom state.

Time spent per marked transition fits the time bound because only a small number of transitions is marked: in Lines 2.11 and 2.12, at most two transitions are marked per transition in the small splitter $T_{\xrightarrow{a} B'}$. Line 2.22 marks $R \xrightarrow{\tau} U \subseteq \mathit{out}(R) \cap \mathit{in}(U)$, which is always within the transitions of the smaller subblock. Line 2.28 marks no more transitions than the new bottom states have.

The initialisation in Lines 2.1–2.5 can be performed in O(m) time, where the assumption n ≤ m + 1 is used. Furthermore, we assume that we can access action labels fast enough to bucket sort the transitions in time O(m), which is for instance the case if the action labels are consecutively numbered.

To meet the indicated time budgets, our implementation uses a number of data structures. States are stored in a refinable partition [21], grouped per block, in such a way that we can visit bottom states without spending time on non-bottom states. Transitions are stored in four linked refinable partitions, grouped per source state, per target state, per bunch, and per block-bunch-slice, in such a way that we can visit marked transitions without spending time on unmarked transitions of the block. How these data structures are instrumental for the complexity can be found in [14].

## **4 Splitting blocks**

The function split(B, T), presented in Algorithm 3, refines block B into subblocks R and U, where R contains those states in B that can inertly reach a transition in T, and U contains the states that cannot, as formally specified in Equation (1).

These two sets are computed by two coroutines executing in lockstep: the two coroutines start the same number of loop iterations, so that the overhead is at most proportional to the faster of the two and all work done in both coroutines can be attributed to the smaller of the two subblocks R and U.

As a precondition, split requires that bottom states of B with an outgoing transition in $T_{B\rightarrow}$ have a marked outgoing transition in $T_{B\rightarrow}$. Formally, $\mathit{Bottom}(B) \xrightarrow{\mathit{Marked}(T_{B\rightarrow})} = \mathit{Bottom}(B) \xrightarrow{T_{B\rightarrow}}$. This allows us to compute the initial sets: all states in $B \xrightarrow{\mathit{Marked}(T)}$, i.e., sources of marked transitions in T, are put in R. All bottom states that are not initially in R are put in U.

The sets are extended as follows in the coroutines. For R, first the states in $B \xrightarrow{T \setminus \mathit{Marked}(T)}$ are added that were not yet in R. These are all the sources of unmarked transitions in T. Using backward reachability along inert transitions, R is extended until no more states can be added.


To identify the states in U, observe that a state is in U if all its inert successors are in U and it does not have a transition in $T_{B\rightarrow}$. To compute U, we let a counter untested[t] for every non-bottom state t record the number of outgoing inert transitions to states that are not yet known to be in U. If untested[t] = 0, this means all inert successors of t are guaranteed to be in U, so, provided t does not have a transition in $T_{B\rightarrow}$, one can also add t to U. To take care of the possibility that all inert transitions of t have been visited before all sources of unmarked transitions in $T_{B\rightarrow}$ are added to R, we check all non-inert transitions of t to determine whether they are not in $T_{B\rightarrow}$ at Lines 3.13–3.17.

The coroutine that finishes first, provided that its number of states does not exceed $\frac{1}{2}|B|$, has completely computed the smaller subblock resulting from the refinement, and the other coroutine can be aborted. As soon as the number of states of a coroutine is known to exceed $\frac{1}{2}|B|$, it is aborted, and the other coroutine can continue to identify the smaller subblock. In detail, the runtime complexity of R, U := split(B, T) is:

**–** $O(|R_{\rightarrow}| + |R_{\leftarrow}|)$, if |R| ≤ |U|, and

**–** $O(|\mathit{Marked}(T)| + |U_{\rightarrow}| + |U_{\leftarrow}| + |(\mathit{Bottom}(R) \setminus \mathit{Bottom}(B))_{\rightarrow}|)$, if |U| ≤ |R|.

This complexity is inferred as follows. As we execute the coroutines in lockstep, it suffices to show that the runtime bound for the smaller subblock is satisfied.

In case |R| ≤ |U|, observe $|\mathit{Marked}(T)| \le |R_{\rightarrow}|$, so we get $O(|R_{\rightarrow}| + |R_{\leftarrow}|)$ directly from the R-coroutine. When |U| ≤ |R|, we use time in $O(|\mathit{Marked}(T)|)$ for Line 3.2, and we use time in $O(|U_{\leftarrow}|)$ for everything else except Lines 3.13–3.17. For these latter lines, we distinguish two cases. If it turns out that t has no transition $t \xrightarrow{\alpha} u \in T$, it is a U-state, so we attribute the time to $O(|U_{\rightarrow}|)$. Otherwise, it is an R-state that had some inert transitions in B, but they all are now in $R \xrightarrow{\tau} U$. So t is a new bottom state, and we attribute the time to the outgoing transitions of new bottom states: $O(|(\mathit{Bottom}(R) \setminus \mathit{Bottom}(B))_{\rightarrow}|)$.
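The lockstep execution can be sketched with Python generators. This is a simplified illustration and not Algorithm 3: it assumes that all sources of transitions in T are known up front (so there is no distinction between marked and unmarked transitions and no slow test loop), that the inert transitions inside B are acyclic (Invariant 4), and that `transitions` is a set of `(source, label, target)` triples. All names are hypothetical.

```python
TAU = "tau"  # assumed encoding of the invisible action


def lockstep_split(B, T, transitions):
    """Split B into (R, U) by running two coroutines in lockstep."""
    inert_pred = {s: [] for s in B}
    inert_out = {s: 0 for s in B}
    for src, label, dst in transitions:
        if label == TAU and src in B and dst in B:
            inert_pred[dst].append(src)
            inert_out[src] += 1
    sources = {src for src, _, _ in T if src in B}
    half = len(B) / 2

    def r_coroutine():
        # Backward reachability from the sources of T.
        R, todo = set(sources), list(sources)
        if len(R) > half:
            return None  # too large: R cannot be the smaller subblock
        while todo:
            for pred in inert_pred[todo.pop()]:
                yield  # one unit of work per visited transition
                if pred not in R:
                    R.add(pred)
                    todo.append(pred)
                    if len(R) > half:
                        return None
        return R

    def u_coroutine():
        # A state joins U once all its inert successors are known to be in U.
        untested = dict(inert_out)
        U = {s for s in B if inert_out[s] == 0 and s not in sources}
        todo = list(U)
        if len(U) > half:
            return None
        while todo:
            for pred in inert_pred[todo.pop()]:
                yield
                untested[pred] -= 1
                if untested[pred] == 0 and pred not in sources:
                    U.add(pred)
                    todo.append(pred)
                    if len(U) > half:
                        return None
        return U

    running = {"R": r_coroutine(), "U": u_coroutine()}
    while running:
        for name in list(running):
            try:
                next(running[name])  # advance this coroutine by one step
            except StopIteration as stop:
                del running[name]
                if stop.value is not None:  # finished with the smaller subblock
                    small = stop.value
                    return (small, B - small) if name == "R" else (B - small, small)
    raise AssertionError("unreachable: one subblock has at most |B|/2 states")
```

Because a coroutine is aborted only when its set already exceeds half of B, the other coroutine is then guaranteed to compute the smaller subblock, and the larger one is obtained as the complement.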

## **5 Experimental evaluation**

The new algorithm (JGKW20) has been implemented in the mCRL2 toolset [5] and is available in its 201908.0 release. This toolset also contains implementations of various other algorithms, such as the O(mn) algorithm by Groote and Vaandrager (GV) [10] and the O(m(log |Act| + log n)) algorithm of [9] (GJKW17). In addition, it offers a sequential implementation of the partition-refinement algorithm using state signatures by Blom and Orzan (BO) [3], which has time complexity O(n<sup>2</sup>m). For each state, BO maintains a signature describing which blocks the state can reach directly via its outgoing transitions.
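The idea of signature-based refinement can be sketched as follows. The fragment is a simplification for strong bisimilarity only, in which the signature of a state is the set of pairs (label, target block) of its outgoing transitions. It does not treat inert τ-transitions and is therefore not the BO algorithm for branching bisimilarity; the function name and data encoding are assumptions.

```python
def signature_refinement(states, transitions):
    """Refine a one-block partition until all signatures are stable.

    Returns a dict mapping each state to its block identifier.
    """
    block_of = {s: 0 for s in states}
    while True:
        signature = {s: set() for s in states}
        for src, label, dst in transitions:
            signature[src].add((label, block_of[dst]))
        # States stay together only if they were together before and
        # have the same signature.
        ids = {}
        refined = {}
        for s in states:
            key = (block_of[s], frozenset(signature[s]))
            refined[s] = ids.setdefault(key, len(ids))
        if len(ids) == len(set(block_of.values())):
            return refined  # no block was split in this round
        block_of = refined
```

Each round recomputes all signatures, which is where the higher worst-case complexity of this approach comes from compared to processing only the smaller half.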

In this section, we report on the experiments we have conducted to compare GV, BO, GJKW17 and JGKW20 when applied to practical examples. In the experiments the given LTSs are minimised w.r.t. branching bisimilarity. The set of benchmarks consists of all LTSs offered by the VLTS benchmark set<sup>3</sup> with at least 60,000 transitions. Their name ends in "\_n/1000\_m/1000" and thus describes their size. Additionally, we consider three cases that have been derived from models distributed with the mCRL2 toolset:

<sup>3</sup> http://cadp.inria.fr/resources/vlts.


The software and benchmarks used for the experiments are available online [15]. All experiments have been conducted on individual nodes of the DAS-5 cluster [1]. Each of these nodes was running CentOS Linux 7.4, had an Intel Xeon E5-2698-v3 2.3GHz CPU, and was equipped with 256 GB RAM. Development version 201808.0.c59cfd413f of mCRL2 was used for the experiments.<sup>4</sup>

Table 1 presents the obtained results. Benchmarks are ordered by their number of transitions. On each benchmark, we have applied each algorithm ten times, and report the mean runtime and memory use of these ten runs, rounded to significant digits (estimated using [4] for the standard deviation). A trailing decimal dot indicates that the unit digit is significant. If this dot is missing, there is one insignificant zero. For all presented data the estimated standard deviation is less than 20% of the mean. Otherwise we print '-' in Table 1.

One symbol after a table entry indicates that the measurement is significantly better than the corresponding measurements for the other three algorithms, and another symbol indicates that it is significantly worse. Here, the results are considered significant if, given a hundred tables such as Table 1, one table of running time (resp. memory) is expected to contain spuriously significant results.

Concerning the runtimes, clearly, GV and BO perform significantly worse than the other two algorithms, and JGKW20 in many cases performs significantly better than the others. In particular, JGKW20 is about 40% faster than GJKW17, the fastest older algorithm. Concerning memory use, in the majority of cases GJKW17 uses more memory than the others, while sometimes BO is the most memory-hungry. JGKW20 is much more competitive, in many cases even outperforming every other algorithm.

The results show that when applied to practical cases, JGKW20 is generally the fastest algorithm, and even when other algorithms have similar runtimes, it almost always uses the least memory. This combination makes JGKW20 currently the best option for branching bisimulation minimisation of LTSs.

**Data Availability Statement and Acknowledgement.** The datasets generated and analysed during the current study are available in the figshare repository: https://doi.org/10.6084/m9.figshare.11876688.v1. This work is partly done during a visit of the first author at Eindhoven University of Technology, and a visit of the second author at the Institute of Software, Chinese Academy of Sciences. The first author is supported by the National Natural Science Foundation of China, Grant No. 61761136011.

<sup>4</sup> https://github.com/mCRL2org/mCRL2/commit/c59cfd413f

## **References**



### **Verifying Quantum Communication Protocols with Ground Bisimulation**<sup>*</sup>

Xudong Qin<sup>1,2</sup>, Yuxin Deng<sup>1</sup>, and Wenjie Du<sup>3</sup>

**TACAS Evaluation Artifact 2020 Accepted**

<sup>1</sup> Shanghai Key Laboratory of Trustworthy Computing, MOE International Joint Lab of Trustworthy Software, and International Research Center of Trustworthy Software, East China Normal University, Shanghai, China (steven qxd@126.com, yxdeng@sei.ecnu.edu.cn)

<sup>2</sup> Peng Cheng Laboratory, Shenzhen, China

<sup>3</sup> Shanghai Normal University, Shanghai, China (wenjiedu@shnu.edu.cn)

**Abstract.** One important application of quantum process algebras is to formally verify quantum communication protocols. With a suitable notion of behavioural equivalence and a decision method, one can determine if an implementation of a protocol is consistent with its specification. Ground bisimulation is a convenient behavioural equivalence for quantum processes because of its associated coinduction proof technique. We exploit this technique to design and implement two on-the-fly algorithms for the strong and weak versions of ground bisimulation to check if two given processes in quantum CCS are equivalent. We then develop a tool that can verify interesting quantum protocols such as the BB84 quantum key distribution scheme.

**Keywords:** Quantum process algebra · Bisimulation · Verification · Quantum communication protocols.

## **1 Introduction**

Process algebras provide a useful formal method for specifying and verifying concurrent systems. Their extensions to the quantum setting have also appeared in the literature. For example, Jorrand and Lalire [18,21] defined the *Quantum Process Algebra* (QPAlg) and presented a branching bisimulation to identify quantum processes with the same branching structure. Gay and Nagarajan [15] developed *Communicating Quantum Processes* (CQP), for which Davidson [6] established a bisimulation congruence. Feng et al. [10] have proposed a quantum variant of Milner's CCS [23], called qCCS, and a notion of probabilistic bisimulation for quantum processes, which is then improved to be a general notion of bisimulation that enjoys a congruence property [12]. Later on, motivated by [25], Deng and Feng [9] defined an open bisimulation for quantum processes that makes it possible to separate ground bisimulation and the closedness under super-operator applications, thus providing not only a neater and simpler definition, but also a new technique for proving bisimilarity. In order to avoid the problem of instantiating quantum variables by potentially infinitely many quantum states, Feng et al. [11] extended the idea of symbolic bisimulation [17] for value-passing CCS and provided a symbolic version of open bisimulation for qCCS. They proposed an algorithm for checking symbolic ground bisimulation.

<sup>*</sup> Supported by the National Natural Science Foundation of China (61672229, 61832015) and the Inria-CAS joint project Quasar.

In the current work, we consider the ground bisimulation proposed in [9]. We put forward an on-the-fly algorithm to check if two given processes in qCCS with fixed initial quantum states are ground bisimilar. The algorithm is simpler than the one in [11] because the initial quantum states are determined for the former but can be parametric for the latter. Moreover, in many applications, we are only interested in the correctness of a quantum protocol with a predetermined input of quantum states. This is especially the case in the design stage of a protocol or in the debugging of a program.

The ground bisimulation defined in [9] is a notion of weak bisimulation because a strong transition can be matched by a weak transition where invisible actions are abstracted away. We also consider a strong version where all actions are visible, for which we have a simpler algorithm. Both algorithms are obtained by adapting the on-the-fly algorithm for checking probabilistic bisimulations [8,7], which in turn has its roots in similar algorithms for checking classical bisimulations [14,17]. The basic idea is as follows. A quantum process with an initial quantum state forms a configuration. We describe the operational behaviour of a configuration as a probabilistic labelled transition system (pLTS), where probabilistic transitions arise naturally because measuring a quantum system can entail a probability distribution of post-measurement quantum systems. Ground bisimulations are a strengthening of probabilistic bisimulations by imposing some constraints on quantum variables and the environment states of processes. The skeleton of the algorithm for the strong ground bisimulation resembles that for strong probabilistic bisimulation [8]. The algorithm for the (weak) ground bisimulation is inspired by [28] and uses as a subroutine a procedure from that work. The procedure reduces the problem of finding a matching weak transition to a linear programming problem that can be solved in polynomial time. We have developed a tool that implements both algorithms and can check if two given configurations are strongly or weakly bisimilar. It is useful for validating whether an implementation of a protocol is equivalent to the specification. We have conducted experiments on a few interesting quantum protocols including super-dense coding, teleportation, secret sharing, and several quantum key distribution protocols, in particular the BB84 protocol [5], to analyse the functional correctness of the protocols.

*Other related work* Ardeshir-Larijani et al. [3] proposed a quantum variant of CCS to describe quantum protocols. The syntax of that variant is similar to qCCS but its semantics is very different. The behaviour of a concurrent process is a finite tree and an interleaving is a path from the root to a leaf. By interpreting an interleaving as a superoperator [26], the semantics of a process is a set of superoperators. The equivalence checking between two processes boils down to the equivalence checking between superoperators, which is accomplished by using the stabiliser simulation algorithm invented by Aaronson and Gottesman [1]. Ardeshir-Larijani et al. have implemented their approach in an equivalence checker in Java and verified several quantum protocols from teleportation to secret sharing. However, they are not able to handle the BB84 quantum key distribution protocol because its correctness cannot be specified as an equivalence between interleavings. Our approach is based on ground bisimulation and keeps all the branching behaviour of a concurrent process. Our algorithms for checking ground bisimulations are influenced by the on-the-fly algorithm of Hennessy and Lin for value-passing CCS [17]. We are inspired by the probabilistic bisimulation checking algorithm of Baier et al. [4] for the strong version of ground bisimulation, and by the weak bisimulation checking algorithm of Turrini and Hermanns [28] for the weak version.

Kubota et al. [20] implemented a semi-automated tool to check a notion of symbolic bisimulation and used it to verify the equivalence of BB84 and another quantum key distribution protocol based on entanglement distillation [27]. There are two main differences between their work and ours. (1) Their tool is based on equational reasoning and thus requires a user to provide equations, while our tool is fully automatic. (2) Their semantic interpretation of measurement is different and entails a kind of linear-time semantics for quantum processes that ignores the timepoints of the occurrences of probabilistic branches. In contrast, we use a branching-time semantics. For instance, the occurrence of a measurement before or after a visible action is significant for our semantics but not for the semantics proposed in [20].

Besides equivalence checking, based on either superoperators or bisimulations as mentioned above, model checking is another feasible approach to verify quantum protocols. For instance, Gay et al. developed the QMC model checker [16]. Feng et al. implemented the tool QPMC [13] to model check quantum programs and protocols. There are also other approaches for verifying quantum systems. Abramsky and Coecke [2] proposed a categorical semantics for quantum protocols. Quantomatic [19] is a semi-automated tool based on graph rewriting. Ying [30] established a quantum Hoare logic, which has been implemented in a theorem prover [22].

The rest of the paper is structured as follows. In Section 2 we recall the syntax and semantics of the quantum process algebra qCCS. In Section 3 we present an algorithm for checking ground bisimulations. In Section 4 we report the implementation of the algorithm and some experimental results on verifying a few quantum communication protocols. Finally, we conclude in Section 5 and discuss some future work.

## **2 Quantum CCS**

We introduce a quantum extension of classical CCS (qCCS), which was originally studied in [10,29,12]. Three types of data are considered in qCCS: as classical data we have Bool for booleans and Real for real numbers, and as quantum data we have Qbt for qubits. Consequently, two countably infinite sets of variables are assumed: *cVar* for classical variables, ranged over by $x, y, \ldots$, and *qVar* for quantum variables, ranged over by $q, r, \ldots$. We assume a set *Exp*, which includes *cVar* as a subset and is ranged over by $e, e', \ldots$, of classical data expressions over Real, and a set of boolean-valued expressions *BExp*, ranged over by $b, b', \ldots$, with the usual boolean constants true, false, and operators $\neg$, $\wedge$, $\vee$, and $\rightarrow$. In particular, we let $e \bowtie e'$ be a boolean expression for any $e, e' \in \mathit{Exp}$ and $\bowtie\ \in \{>, <, \geq, \leq, =\}$. We further assume that only classical variables can occur freely in both data expressions and boolean expressions. Two types of channels are used: *cChan* for classical channels, ranged over by $c, d, \ldots$, and *qChan* for quantum channels, ranged over by $\underline{c}, \underline{d}, \ldots$. A relabelling function $f$ is a map on $\mathit{cChan} \cup \mathit{qChan}$ such that $f(\mathit{cChan}) \subseteq \mathit{cChan}$ and $f(\mathit{qChan}) \subseteq \mathit{qChan}$. Sometimes we abbreviate a sequence of distinct variables $q_1, \ldots, q_n$ into $\widetilde{q}$.

$$\begin{array}{ll} qv(\mathsf{nil}) = \emptyset & qv(\tau.P) = qv(P) \\ qv(c?x.P) = qv(P) & qv(c!e.P) = qv(P) \\ qv(\underline{c}?q.P) = qv(P) - \{q\} & qv(\underline{c}!q.P) = qv(P) \cup \{q\} \\ qv(\mathcal{E}[\widetilde{q}].P) = qv(P) \cup \widetilde{q} & qv(M[\widetilde{q};x].P) = qv(P) \cup \widetilde{q} \\ qv(P+Q) = qv(P) \cup qv(Q) & qv(P \parallel Q) = qv(P) \cup qv(Q) \\ qv(P[f]) = qv(P) & qv(P \backslash L) = qv(P) \\ qv(\textbf{if } b \textbf{ then } P) = qv(P) & qv(A(\widetilde{q};\widetilde{x})) = \widetilde{q}. \end{array}$$

**Fig. 1.** Free quantum variables

The terms in qCCS are given by:

$$\begin{array}{rcl} P, Q & ::= & \mathsf{nil} \mid \tau.P \mid c?x.P \mid c!e.P \mid \underline{c}?q.P \mid \underline{c}!q.P \mid \mathcal{E}[\widetilde{q}].P \mid M[\widetilde{q};x].P \mid \\ & & P + Q \mid P \parallel Q \mid P[f] \mid P \backslash L \mid \textbf{if } b \textbf{ then } P \mid A(\widetilde{q};\widetilde{x}) \end{array}$$

where $f$ is a relabelling function and $L \subseteq \mathit{cChan} \cup \mathit{qChan}$ is a set of channels. Most of the constructors are standard as in CCS [23]. We briefly explain a few new constructors. The process $\underline{c}?q.P$ receives a quantum datum along quantum channel $\underline{c}$ and evolves into $P$, while $\underline{c}!q.P$ sends out a quantum datum along quantum channel $\underline{c}$ before evolving into $P$. The symbol $\mathcal{E}$ represents a trace-preserving super-operator applied on the quantum system referred to by the variables $\widetilde{q}$. The process $M[\widetilde{q}; x].P$ measures the state of qubits $\widetilde{q}$ according to the observable $M$ and stores the measurement outcome into the classical variable $x$ of $P$.

Free classical variables can be defined in the usual way, except for the fact that the variable $x$ in the quantum measurement $M[\widetilde{q}; x]$ is bound. A process $P$ is closed if it contains no free classical variable, i.e. $fv(P) = \emptyset$.

The set of free quantum variables of a process $P$, denoted by $qv(P)$, can be inductively defined as in Figure 1. For a process to be legal, we require that


**Fig. 2.** Operational semantics of qCCS. Here in rule (C-Outp), $[\![e]\!]$ is the evaluation of $e$, and in rule (Meas), $E^i_{\widetilde{q}}$ denotes the operator $E^i$ acting on the quantum systems $\widetilde{q}$.

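1. $q \notin qv(P)$ in the process $\underline{c}!q.P$;
2. $qv(P) \cap qv(Q) = \emptyset$ in the process $P \parallel Q$;
3. Each constant $A(\widetilde{q}; \widetilde{x})$ has a defining equation $A(\widetilde{q}; \widetilde{x}) := P$, where $P$ is a term with $qv(P) \subseteq \widetilde{q}$ and $fv(P) \subseteq \widetilde{x}$.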

The first condition says that a quantum system will not be referenced after it has been sent out. This is a requirement of the quantum no-cloning theorem. The second condition says that parallel composition || models separate parties that never reference a quantum system simultaneously.
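The clauses for free quantum variables and the first two legality conditions can be sketched in Python. This is only an illustration: the tagged-tuple encoding of terms (`("qout", c, q, P)`, `("par", P, Q)`, and so on) and the function names are assumptions made here, not part of qCCS or of the tool, and the third condition on defining equations is not checked.

```python
def subterms(p):
    """Immediate sub-processes of a term (hypothetical tagged-tuple encoding)."""
    tag = p[0]
    if tag in ("nil", "const"):
        return []
    if tag in ("sum", "par"):                 # ("sum", P, Q), ("par", P, Q)
        return [p[1], p[2]]
    if tag in ("tau", "relabel", "restrict"):  # ("tau", P), ("relabel", P, f), ("restrict", P, L)
        return [p[1]]
    if tag in ("op", "if"):                   # ("op", qs, P), ("if", b, P)
        return [p[2]]
    return [p[3]]                             # cin, cout, qin, qout, meas


def qv(p):
    """Free quantum variables of a term, clause by clause as in Fig. 1."""
    tag = p[0]
    if tag == "const":                        # A(qs; xs)
        return frozenset(p[1])
    inner = frozenset().union(*[qv(s) for s in subterms(p)])
    if tag == "qin":                          # c?q.P binds q
        return inner - {p[2]}
    if tag == "qout":                         # c!q.P mentions q
        return inner | {p[2]}
    if tag in ("op", "meas"):                 # E[qs].P and M[qs; x].P
        return inner | frozenset(p[1])
    return inner


def legal(p):
    """Check conditions 1 and 2 only (sent qubits are not reused; parallel parts are disjoint)."""
    tag = p[0]
    if tag == "qout" and p[2] in qv(p[3]):
        return False
    if tag == "par" and qv(p[1]) & qv(p[2]):
        return False
    return all(legal(s) for s in subterms(p))


NIL = ("nil",)
sender = ("op", ("q1",), ("qout", "A2B", "q1", NIL))              # E[q1].A2B!q1.nil
receiver = ("qin", "A2B", "q1", ("meas", ("q1",), "x", NIL))      # A2B?q1.M[q1; x].nil
reuse = ("qout", "c", "q", ("op", ("q",), NIL))                   # c!q.E[q].nil
print(sorted(qv(sender)), sorted(qv(receiver)))
print(legal(("par", sender, receiver)), legal(reuse))
```

The term `reuse` is rejected because it applies an operator to `q` after sending it away, which is exactly what the first condition rules out.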

Throughout the paper we implicitly assume the convention that processes are identified up to α-conversion, bound variables differ from each other and they are different from free variables.

Before introducing the operational semantics of qCCS processes, we review the model of probabilistic labelled transition systems (pLTSs). Later on we will interpret the behaviour of quantum processes in terms of pLTSs because quantum measurements give rise to probability distributions naturally.

We begin with some notation. A (discrete) probability distribution over a set $S$ is a function $\Delta: S \to [0,1]$ with $\sum_{s \in S} \Delta(s) = 1$; the support of such a $\Delta$ is the set $\lceil\Delta\rceil = \{ s \in S \mid \Delta(s) > 0 \}$. The point distribution $\overline{s}$ assigns probability 1 to $s$ and 0 to all other elements of $S$, so that $\lceil\overline{s}\rceil = \{s\}$. We only need to use distributions with finite supports, and let $\mathit{Dist}(S)$ denote the set of finite support distributions over $S$, ranged over by $\Delta$, $\Theta$, etc. If $\sum_{k \in K} p_k = 1$ for some collection of $p_k \geq 0$, and the $\Delta_k$ are distributions, then so is $\sum_{k \in K} p_k \cdot \Delta_k$ with $(\sum_{k \in K} p_k \cdot \Delta_k)(s) = \sum_{k \in K} p_k \cdot \Delta_k(s)$.
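As a small illustration of these notions, finite-support distributions can be represented as dictionaries from states to exact rational probabilities. The helper names `point`, `support` and `combine` are chosen here for the sketch and do not come from the paper or its tool.

```python
from fractions import Fraction


def point(s):
    """Point distribution: probability 1 on s, 0 elsewhere."""
    return {s: Fraction(1)}


def support(dist):
    """States that receive strictly positive probability."""
    return {s for s, p in dist.items() if p > 0}


def combine(weighted):
    """Convex combination sum_k p_k * Delta_k, given as a list of (p_k, Delta_k) pairs."""
    assert sum(p for p, _ in weighted) == 1, "weights must sum to 1"
    result = {}
    for p, dist in weighted:
        for s, q in dist.items():
            result[s] = result.get(s, Fraction(0)) + p * q
    return result


half = Fraction(1, 2)
delta = combine([(half, point("s0")), (half, {"s0": half, "s1": half})])
print(delta, support(delta))
```

Using `Fraction` rather than floating point keeps the check that weights sum to 1 exact.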

**Definition 1.** *A* probabilistic labelled transition system *is a triple* $\langle S, \mathit{Act}_\tau, \rightarrow \rangle$*, where* $S$ *is a set of states,* $\mathit{Act}_\tau$ *is a set of visible actions* $\mathit{Act}$ *augmented with the invisible action* $\tau$*, and* ${\rightarrow} \subseteq S \times \mathit{Act}_\tau \times \mathit{Dist}(S)$ *is the transition relation.*

We often write $s \stackrel{\alpha}{\longrightarrow} \Delta$ for $(s, \alpha, \Delta) \in {\rightarrow}$. In pLTSs we not only consider relations between states, but also relations between distributions. Therefore, we make use of the lifting operation below [7].

**Definition 2.** *Let* $\mathcal{R} \subseteq S \times S$ *be a relation between states. Then* $\mathcal{R}^\circ \subseteq \mathit{Dist}(S) \times \mathit{Dist}(S)$ *is the smallest relation that satisfies the two rules: (i)* $s \mathrel{\mathcal{R}} s'$ *implies* $\overline{s} \mathrel{\mathcal{R}^\circ} \overline{s'}$*; (ii)* $\Delta_i \mathrel{\mathcal{R}^\circ} \Theta_i$ *for all* $i \in I$ *implies* $(\sum_{i \in I} p_i \cdot \Delta_i) \mathrel{\mathcal{R}^\circ} (\sum_{i \in I} p_i \cdot \Theta_i)$ *for any* $p_i \in [0,1]$ *with* $\sum_{i \in I} p_i = 1$*, where* $I$ *is a finite index set.*
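For finite-support distributions, membership in the lifted relation can be decided with a maximum-flow computation: connect a source to each state of the left distribution with its probability as capacity, each state of the right distribution to a sink likewise, and related pairs by edges of capacity 1; the distributions are related by the lifting exactly when the maximum flow is 1. The sketch below uses this standard characterisation with a plain breadth-first augmenting-path search. The function name `lifted` and the dictionary encoding are assumptions of this example, not the procedure used in the tool.

```python
from collections import deque
from fractions import Fraction


def lifted(delta, theta, rel):
    """Decide whether delta and theta are related by the lifting of rel (a set of state pairs)."""
    src, snk = ("src",), ("snk",)
    cap = {}  # residual capacities: cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})[v] = c
        cap.setdefault(v, {}).setdefault(u, Fraction(0))  # reverse edge

    for s, p in delta.items():
        add_edge(src, ("L", s), p)
    for t, q in theta.items():
        add_edge(("R", t), snk, q)
    for s in delta:
        for t in theta:
            if (s, t) in rel:
                add_edge(("L", s), ("R", t), Fraction(1))

    flow = Fraction(0)
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {src: None}
        queue = deque([src])
        while queue and snk not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if snk not in parent:
            break
        # Collect the path edges and push the bottleneck capacity along them.
        path = []
        v = snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push
    return flow == 1


half = Fraction(1, 2)
delta = {"s1": half, "s2": half}
theta = {"t1": Fraction(1, 3), "t2": Fraction(2, 3)}
print(lifted(delta, theta, {("s1", "t1"), ("s1", "t2"), ("s2", "t2")}))
print(lifted(delta, theta, {("s1", "t1"), ("s2", "t2")}))
```

In the second call the mass 1/2 of `s1` can only flow to `t1`, which accepts 1/3, so the total flow stays below 1 and the distributions are not related.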

We apply this operation to the relations $\stackrel{\alpha}{\longrightarrow}$ in the pLTS for $\alpha \in \mathit{Act}_\tau$, where we also write $\stackrel{\alpha}{\longrightarrow}$ for $(\stackrel{\alpha}{\longrightarrow})^\circ$. Thus as source of a relation $\stackrel{\alpha}{\longrightarrow}$ we now also allow distributions. But note that $\overline{s} \stackrel{\alpha}{\longrightarrow} \Delta$ is more general than $s \stackrel{\alpha}{\longrightarrow} \Delta$ because if $\overline{s} \stackrel{\alpha}{\longrightarrow} \Delta$ then there is a collection of distributions $\Delta_i$ and probabilities $p_i$ such that $s \stackrel{\alpha}{\longrightarrow} \Delta_i$ for each $i \in I$ and $\Delta = \sum_{i \in I} p_i \cdot \Delta_i$ with $\sum_{i \in I} p_i = 1$.

We write $s \stackrel{\hat{\tau}}{\longrightarrow} \Delta$ if either $s \stackrel{\tau}{\longrightarrow} \Delta$ or $\Delta = \overline{s}$. We define weak transitions $\stackrel{\hat{a}}{\Longrightarrow}$ by letting $\stackrel{\hat{\tau}}{\Longrightarrow}$ be the reflexive and transitive closure of $\stackrel{\hat{\tau}}{\longrightarrow}$ and writing $\Delta \stackrel{\hat{a}}{\Longrightarrow} \Theta$ for $a \in \mathit{Act}$ whenever $\Delta \stackrel{\hat{\tau}}{\Longrightarrow} \stackrel{a}{\longrightarrow} \stackrel{\hat{\tau}}{\Longrightarrow} \Theta$. If $\Delta = \overline{s}$ is a point distribution, we often write $s \stackrel{\hat{a}}{\Longrightarrow} \Theta$ instead of $\overline{s} \stackrel{\hat{a}}{\Longrightarrow} \Theta$.

We now give the semantics of qCCS. For each quantum variable $q$ we assume a 2-dimensional Hilbert space $\mathcal{H}_q$. For any nonempty subset $S \subseteq \mathit{qVar}$ we write $\mathcal{H}_S$ for the tensor product space $\bigotimes_{q \in S} \mathcal{H}_q$ and $\mathcal{H}_{\overline{S}}$ for $\bigotimes_{q \notin S} \mathcal{H}_q$. In particular, $\mathcal{H} = \mathcal{H}_{\mathit{qVar}}$ is the state space of the whole environment consisting of all the quantum variables, which is a countably infinite dimensional Hilbert space.

Let $P$ be a closed quantum process and $\rho$ a density operator on $\mathcal{H}$;<sup>1</sup> the pair $\langle P, \rho \rangle$ is called a *configuration*. We write *Con* for the set of all configurations, ranged over by $\mathcal{C}$ and $\mathcal{D}$. We interpret qCCS with a pLTS whose states are all the configurations definable in the language, and whose transitions are determined by the rules in Figure 2; we have omitted the obvious symmetric counterparts to the rules *(C-Com)*, *(Q-Com)*, *(Int)* and *(Sum)*. The set of actions $\mathit{Act}$ takes the following form, consisting of classical/quantum input/output actions.

$$\mathit{Act} = \{c?v,\ c!v \mid c \in \mathit{cChan},\ v \in \mathsf{Real}\} \cup \{\underline{c}?r,\ \underline{c}!r \mid \underline{c} \in \mathit{qChan},\ r \in \mathit{qVar}\}$$

<sup>1</sup> As $\mathcal{H}$ is infinite dimensional, $\rho$ should be understood as a density operator on some finite dimensional subspace of $\mathcal{H}$ which contains $\mathcal{H}_{qv(P)}$.

We use $cn(\alpha)$ for the set of channel names in action $\alpha$. For example, we have $cn(c?x) = \{c\}$ and $cn(\tau) = \emptyset$.

In the first eight rules in Figure 2, the targets of arrows are point distributions, and we use the slightly abbreviated form $\mathcal{C} \stackrel{\alpha}{\longrightarrow} \mathcal{C}'$ to mean $\mathcal{C} \stackrel{\alpha}{\longrightarrow} \overline{\mathcal{C}'}$.

The rules use the obvious extension of the function $\parallel$ on terms to configurations and distributions. To be precise, $\mathcal{C} \parallel P$ is the configuration $\langle Q \parallel P, \rho \rangle$ where $\mathcal{C} = \langle Q, \rho \rangle$, and $\Delta \parallel P$ is the distribution defined by:

$$(\Delta \parallel P)(\langle Q, \rho \rangle) \stackrel{def}{=} \begin{cases} \Delta(\langle Q', \rho \rangle) \text{ if } Q = Q' \parallel P \text{ for some } Q'\\ 0 & \text{otherwise.} \end{cases}$$

A similar extension applies to $\Delta[f]$ and $\Delta \backslash L$.

Given a configuration $\mathcal{C} = \langle P, \rho \rangle$, the partial trace over the quantum systems of $P$ in this state is $\mathrm{tr}_{qv(P)}(\rho)$, a reduced density operator representing the state of the environment. We give the definition of ground bisimulation and bisimilarity as follows.
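To make the partial trace concrete, the following sketch traces out one qubit of a two-qubit density matrix stored as a 4x4 nested list, with basis states ordered as $|q_0 q_1\rangle$ at index $2 q_0 + q_1$. The function name and the restriction to two qubits are simplifications chosen for this example. If a process owns the first qubit of a Bell pair, the environment is left in the maximally mixed state.

```python
from fractions import Fraction


def partial_trace_2q(rho, traced):
    """Trace out qubit `traced` (0 = first, 1 = second) of a 4x4 two-qubit density matrix."""
    out = [[Fraction(0), Fraction(0)], [Fraction(0), Fraction(0)]]
    for i in range(2):          # row index of the qubit that is kept
        for j in range(2):      # column index of the qubit that is kept
            for k in range(2):  # summed index of the qubit that is traced out
                if traced == 0:
                    row, col = 2 * k + i, 2 * k + j
                else:
                    row, col = 2 * i + k, 2 * j + k
                out[i][j] += rho[row][col]
    return out


half = Fraction(1, 2)
zero = Fraction(0)
# Density matrix of the Bell state (|00> + |11>) / sqrt(2).
bell = [
    [half, zero, zero, half],
    [zero, zero, zero, zero],
    [zero, zero, zero, zero],
    [half, zero, zero, half],
]
env = partial_trace_2q(bell, traced=0)
print(env)
```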

**Definition 3 ([9]).** *A relation* $\mathcal{R} \subseteq \mathit{Con} \times \mathit{Con}$ *is a* ground simulation *if for any* $\mathcal{C} = \langle P, \rho \rangle$*,* $\mathcal{D} = \langle Q, \sigma \rangle$*,* $\mathcal{C} \mathrel{\mathcal{R}} \mathcal{D}$ *implies that* $qv(P) = qv(Q)$*,* $\mathrm{tr}_{qv(P)}(\rho) = \mathrm{tr}_{qv(Q)}(\sigma)$*, and*

**–** *whenever* $\mathcal{C} \stackrel{\alpha}{\longrightarrow} \Delta$*, there is some distribution* $\Theta$ *with* $\mathcal{D} \stackrel{\hat{\alpha}}{\Longrightarrow} \Theta$ *and* $\Delta \mathrel{\mathcal{R}^\circ} \Theta$*.*

*A relation* $\mathcal{R}$ *is a* ground bisimulation *if both* $\mathcal{R}$ *and* $\mathcal{R}^{-1}$ *are ground simulations. We denote by* $\approx$ *the largest ground bisimulation, called* ground bisimilarity*. If the above weak transition* $\mathcal{D} \stackrel{\hat{\alpha}}{\Longrightarrow} \Theta$ *is replaced by a strong transition* $\mathcal{D} \stackrel{\alpha}{\longrightarrow} \Theta$*, we obtain a* strong ground bisimulation*.*

In the rest of the paper, we mainly focus on ground bisimulation and only briefly mention the algorithm for checking strong ground bisimulation.

## **3 Algorithm**

We present an on-the-fly algorithm to check if two configurations are ground bisimilar.

The algorithm maintains two sets NonBisim and Bisim to keep non-bisimilar and bisimilar state pairs, respectively. When the algorithm terminates, Bisim should contain all the state pairs satisfying the bisimulation relation.

The function **Bisim**(t, u), as shown in Algorithm 1, is the main function of the algorithm, which attempts to find the smallest bisimulation containing the pair (t, u). It initialises Bisim and a set named Visited to store the visited pairs, then calls the function **Match** to search for a bisimulation. The function **Match**(t, u, Visited) invokes a depth-first traversal to match a pair of states (t, u) with all their possible behaviours. The set Visited is updated before the traversal for detecting loops. We also match the behaviours of t and u from both directions as we are checking bisimulations. Two states are deemed non-bisimilar in three cases:

1. one of the two states has a transition that cannot be matched by any weak transition from the other;
2. they do not have the same set of free quantum variables;
3. the reduced density operators of their environments, obtained by the partial trace, are different.
The first case is checked by **MatchAction**, and the other two are done in **Match**. We add a pair of states to NonBisim if one of the three cases above has occurred. Otherwise, it will be stored in Bisim.

An auxiliary function **Act**(t) is invoked in **Match** to discover the next actions that t can perform. If t has no more actions to perform, the function returns an empty set.

The function **MatchAction**(α, t, u, Visited) checks the equivalence of configurations by comparing their transitions. The function recursively discovers the next equivalent state pairs between the target states of the transitions. Technically, it checks the condition that if $t \stackrel{\alpha}{\longrightarrow} \Delta$ then there exists some $\Theta$ such that $u \stackrel{\hat{\alpha}}{\Longrightarrow} \Theta$ and $\Delta \mathrel{\mathcal{R}^\circ} \Theta$. Here we use as a subroutine a procedure of [28] to reduce the problem to a linear programming problem that can be solved in polynomial time. The problem is defined in the Appendix. In **MatchAction**, we introduce a predicate **LP**(Δ, u, α, $\mathcal{R}$) which is true if and only if the linear programming problem has a solution. We invoke the function **Close** to construct an equivalence relation $\mathcal{R}$ between $S$ and the states in the support of the target distribution. Note that in Lines 28 and 34 we have two distinct cases because output actions, unlike other types of actions, require the emitted values to be equal.

In general, there are loops in pLTSs. When a state pair to be considered is already contained in Visited, it is assumed to be bisimilar and added to Assumed (Lines 42-43). Later on, if the pair of states is found to be non-bisimilar, the pair is added to NonBisim and a wrong assumption exception (Lines 18-21) is raised to restart the checking process from the original pair of states. Then **Bisim**(t, u) renews the sets Bisim, Visited and Assumed to remove the pairs checked under the wrong assumption (Lines 4-6).

#### **Algorithm 1** Checking ground bisimulation

```
Require: Two pLTSs with initial configurations t and u.
Ensure: A boolean value b_res indicating if the two pLTSs are ground bisimilar.
 1: function GroundBisimulation(t, u) =
 2:   NonBisim := ∅
 3:   function Bisim(t, u) = try {
 4:     Bisim := ∅
 5:     Visited := ∅
 6:     Assumed := ∅
 7:     return Match(t, u, Visited)
 8:   } catch WrongAssumptionException ⇒ Bisim(t, u)
 9:
10:   function Match(t, u, Visited)              ▷ t = ⟨P, ρ⟩ and u = ⟨Q, σ⟩
11:     Visited := Visited ∪ {(t, u)}
12:     b  := ⋀_{α ∈ Act(t)} MatchAction(α, t, u, Visited)
13:     b′ := ⋀_{α ∈ Act(u)} MatchAction(α, u, t, Visited)
14:     b_c1 := qv(P) = qv(Q)
15:     b_c2 := tr_qv(P)(ρ) = tr_qv(P)(σ)
16:     b_res := b ∧ b′ ∧ b_c1 ∧ b_c2
17:     if b_res is tt then Bisim := Bisim ∪ {(t, u)}
18:     else if b_res is ff then
19:       NonBisim := NonBisim ∪ {(t, u)}
20:       if (t, u) ∈ Assumed then
21:         raise WrongAssumptionException
22:     return b_res
23:
24:   function MatchAction(α, t, u, Visited)
25:     switch α do
26:       case c!
27:         for t --c!e_i--> Δ_i do
28:           Assume {t_k} with t_k ∈ ⌈Δ_i⌉ and {u_j} with u ==c!e′_j==> Γ ∧ e_i = e′_j ∧ u_j ∈ ⌈Γ⌉
29:           R := {(t_k, u_j) | Close(t_k, u_j, Visited) = tt}
30:           θ_i := LP(Δ_i, u, α, R)
31:         return ⋀_i θ_i
32:       otherwise
33:         for t --α--> Δ_i do
34:           Assume {t_k} with t_k ∈ ⌈Δ_i⌉ and {u_j} with u ==α̂==> Γ ∧ u_j ∈ ⌈Γ⌉
35:           R := {(t_k, u_j) | Close(t_k, u_j, Visited) = tt}
36:           θ_i := LP(Δ_i, u, α, R)
37:         return ⋀_i θ_i
38:
39:   function Close(t, u, Visited)
40:     if (t, u) ∈ Bisim then return tt
41:     else if (t, u) ∈ NonBisim then return ff
42:     else if (t, u) ∈ Visited then
43:       Assumed := Assumed ∪ {(t, u)}
44:       return tt
45:     else return Match(t, u, Visited)
```
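The control structure of Algorithm 1 (optimistic assumptions on loops, a persistent set of refuted pairs, and a restart when an assumption turns out wrong) can be seen in isolation in the following Python sketch. It is a deliberate simplification and not the algorithm above: it checks strong bisimilarity on an ordinary, non-probabilistic LTS, so there are no distributions, no **LP** predicate and no conditions on quantum variables or density operators. The dictionary encoding of the LTS and all names are assumptions of the example.

```python
class WrongAssumption(Exception):
    """Raised when a pair that was assumed bisimilar is found not to be."""


def bisimilar(trans, t0, u0):
    """On-the-fly strong bisimilarity check; trans maps a state to a list of (action, successor)."""
    non_bisim = set()  # survives restarts: a refuted pair stays refuted

    def attempt():
        bisim, visited, assumed = set(), set(), set()

        def close(t, u):
            if (t, u) in bisim:
                return True
            if (t, u) in non_bisim:
                return False
            if (t, u) in visited:      # pair still being explored: assume it holds
                assumed.add((t, u))
                return True
            return match(t, u)

        def match(t, u):
            visited.add((t, u))
            # Every move of t must be matched by u, and vice versa.
            forth = all(any(b == a and close(t2, u2) for b, u2 in trans[u])
                        for a, t2 in trans[t])
            back = forth and all(any(b == a and close(t2, u2) for b, t2 in trans[t])
                                 for a, u2 in trans[u])
            if back:
                bisim.add((t, u))
            else:
                non_bisim.add((t, u))
                if (t, u) in assumed:
                    raise WrongAssumption
            return back

        return match(t0, u0)

    while True:  # each restart adds at least one pair to non_bisim, so this loop ends
        try:
            return attempt()
        except WrongAssumption:
            continue


lts = {
    "p0": [("a", "p1")], "p1": [("b", "p0")],
    "q0": [("a", "q1")], "q1": [("b", "q2")], "q2": [("a", "q1")],
    "r0": [("a", "r1")], "r1": [],
    "x0": [("a", "x0"), ("a", "x1")], "x1": [],
    "y0": [("a", "y0")],
}
print(bisimilar(lts, "p0", "q0"), bisimilar(lts, "p0", "r0"), bisimilar(lts, "x0", "y0"))
```

The pair (`x0`, `y0`) exercises the restart: it is first assumed bisimilar when the loop on `a` is revisited, then refuted because `x0` can move to the deadlocked state `x1`, and the second run answers directly from the set of refuted pairs.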
Now let us prove the termination and correctness of the algorithm.

**Theorem 1 (Termination).** *Given two configurations* t *and* u*, the function GroundBisimulation(t,u) always terminates.*

*Proof.* The algorithm starts with two empty sets NonBisim and Bisim. The next action to perform is detected in **Match**. Then it invokes the function **MatchAction** to find the next new pair of configurations and recursively calls the function **Match** to check them. Once a state pair is found to be non-bisimilar in function **Match**, it is added into NonBisim. Meanwhile, if it is also contained in the set Assumed, the algorithm restarts a new execution of **Bisim**. Let k denote the number of executions of **Bisim**, and NonBisim<sup>k</sup> be the set NonBisim at the end of **Bisim**<sup>k</sup>. It is easy to show by induction that NonBisim<sup>k</sup> ⊂ NonBisim<sup>k+1</sup> for any k ≥ 0. Since the system under consideration is finite-state, there always exists some n such that NonBisim<sup>n</sup> is the largest set of non-bisimilar state pairs and **Bisim**<sup>n</sup> is the last execution of **Bisim**.

After the execution of **Bisim**<sup>n</sup>, no more exceptions will be raised. Each time **Match** is executed with t and u as its parameters, we add (t, u) into Visited. The quantum variables and the configurations of the quantum registers for t and u are compared. When no more state pairs are added into Visited, the function **Match** will not be invoked again and the whole algorithm will terminate.

**Theorem 2 (Correctness).** *Given two configurations* t *and* u *from two pLTSs, Bisim*(t, u) *returns true if and only if they are ground bisimilar.*

*Proof.* Let **Bisim**<sup>n</sup> be the last execution of **Bisim**. Let NonBisim<sup>n</sup> and Bisim<sup>n</sup> be the values of the two sets NonBisim and Bisim, respectively, recording the checked state pairs at the end of **Bisim**<sup>n</sup>. By inspecting **Match**, we know that NonBisim<sup>n</sup> ∩ Bisim<sup>n</sup> = ∅.

Let us analyse the result returned by **Bisim**<sup>n</sup>, which is the output of the function call **Match**(t, u, Visited). If the result is false then one of the conjuncts in b<sub>res</sub> is invalid, which means that one of the three cases discussed in the beginning of Section 3 occurs; thus t and u are indeed non-bisimilar. If the result is true then Bisim<sup>n</sup> = Visited<sup>n</sup> \ NonBisim<sup>n</sup>. For each pair (t, u) ∈ Bisim<sup>n</sup>, all the conjuncts in b<sub>res</sub> must be true. Both t and u must have the same set of free quantum variables and the same density operators. In addition, they have matching transitions. That is, for any action α, if $t \stackrel{\alpha}{\longrightarrow} \Delta$ then there exists some distribution $\Theta$ such that $u \stackrel{\hat{\alpha}}{\Longrightarrow} \Theta$ and $\Delta \mathrel{\mathcal{R}^\circ} \Theta$. This is true because (i) the relation $\mathcal{R}$ in function **MatchAction** is correctly constructed, and (ii) the lifted relation $\mathcal{R}^\circ$ exists. Below we argue for (i); the existence of the lifting operation in (ii) relies on the validity of the predicate **LP**, whose correctness is established by Theorem 9 in [28].

The algorithm adds a pair into Assumed<sup>n</sup> if the pair to be checked has already been visited and passed the bisimulation checking conditions. It implies that Assumed<sup>n</sup> ⊆ Visited<sup>n</sup>. Furthermore, as there is no wrong assumption detected after the execution of **Bisim**<sup>n</sup>, we have Assumed<sup>n</sup> ⊆ Bisim<sup>n</sup>, which implies that Bisim<sup>n</sup> = Assumed<sup>n</sup> ∪ Bisim<sup>n</sup>. So Bisim<sup>n</sup> constitutes a bisimulation relation containing the initial state pair (t, u).

Before concluding this section, we analyse the time complexity of the algorithm.

**Theorem 3 (Complexity).** *Let the number of configurations reachable from* t *and* u *be n. The time complexity of function Bisim*(t, u) *is polynomial in* n*.*

*Proof.* The number of state pairs is at most $n^2$. The number of state pairs examined in the $k$th execution of **Bisim** is at most $O(n^2 - k)$. Therefore, the total number of state pairs examined is at most $O(n^2 + (n^2 - 1) + \ldots + 1) = O(n^4)$. Note that each state has finitely many outgoing transitions. Given a transition, to check if there exists a weak matching transition, we call the function **LP** at most once. The construction of a flow network and the solution of the linear programming problem are both polynomial in $n$ if we use the algorithm in [28]. Consequently, the whole algorithm is also polynomial in $n$.

For the strong version of ground bisimulation, we are only concerned with the matching of strong transitions. Therefore, Algorithm 1 can be simplified and there is no need for the predicate **LP** in the function **MatchAction**.
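The quartic bound in the proof is an arithmetic series. Assuming, for the purpose of this worked step, that executions of **Bisim** are indexed from $k = 0$ and that there are at most $n^2$ of them, the count of examined pairs adds up as follows:

$$\sum_{k=0}^{n^2-1} (n^2 - k) \;=\; \sum_{j=1}^{n^2} j \;=\; \frac{n^2 (n^2 + 1)}{2} \;=\; O(n^4).$$

For instance, with $n = 2$ the sum is $4 + 3 + 2 + 1 = 10 = \frac{4 \cdot 5}{2}$.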

## **4 Implementation and Experiments**

In this section, we report on an implementation of our approach and provide the experimental results of verifying several quantum communication protocols.

**Fig. 3.** Verification workflow.

#### **4.1 Implementation**

We have implemented both strong and weak ground bisimulation checkers in Python 3.7. The workflow of our tool is sketched in Figure 3. The tool consists of a pLTS generation module and two bisimulation checking modules, devoted to modelling and verification, respectively. The input of this tool is a specification and an implementation of a quantum protocol, both described as qCCS processes, the definitions of user-defined operators, as well as an initialisation of classical and quantum variables. Unlike classical variables, all quantum variables, regarded together as a quantum register, are initialised at the same time so as to allow for superposition states. The final output of the tool is a result indicating whether the specification and the implementation are bisimilar under the same initial states. The tool also stores the bisimilar state pairs and non-bisimilar state pairs in two tables.

The pLTS generation module acts as a preprocessing unit before the verification task. It first translates the input qCCS processes into two abstract syntax trees (ASTs) by a parser. Then the ASTs are transformed into two pLTSs according to the operational semantics given in Figure 2, using the user-defined operators and the initial values of variables. The weak bisimulation checking module implements the weak ground bisimilarity checking algorithm we defined in the last section. It checks whether the initial states of the two generated pLTSs are weakly bisimilar.

The tool is available in [24], where we also provide all the examples for the experiments to be discussed in Section 4.3.

#### **4.2 BB84 Quantum Key Distribution Protocol**

To illustrate the use of our tool, we formalise the BB84 quantum key distribution protocol. Our formalisation follows [11], where a manual analysis of the protocol is provided. Now we perform automatic verification via the ground bisimulation checker.

The BB84 protocol provides a provably secure way to create a private key between two partners with a classical authenticated channel and a quantum insecure channel between them. The protocol does not make use of entangled states. It ensures its security through a basic property of quantum mechanics: if the states to be distinguished are not orthogonal, such as $|0\rangle$ and $|+\rangle$, then information gain about a quantum state is only possible at the expense of changing the state. Let the sender and the receiver be Alice and Bob, respectively. The basic BB84 protocol with a sequence of qubits $\widetilde{q}$ of size $n$ goes as follows:


After the execution of the basic BB84 protocol, the remaining bits of $\widetilde{K}_a$ and $\widetilde{K}_b$ should be the same, provided that the communication channels are perfect and there is no eavesdropper.

*Implementation.* For simplicity, we assume that the sequence $\widetilde{q}$ consists of only one qubit. This is enough to reflect the essence of the protocol. The other qubits used below are auxiliary qubits for the operation Ran.
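The agreement of the remaining key bits under ideal conditions can be illustrated with a purely classical simulation of the measurement statistics. This sketch is not the qCCS model and does not simulate quantum states: it only encodes the standard facts that Bob recovers Alice's bit when he measures in her basis and obtains a uniformly random bit otherwise. The function names and the tuple layout are assumptions of the example.

```python
import random


def bb84_round(rng):
    """One ideal BB84 round for a single qubit (perfect channel, no eavesdropper)."""
    b_a = rng.randint(0, 1)  # Alice's basis choice
    k_a = rng.randint(0, 1)  # Alice's key bit
    b_b = rng.randint(0, 1)  # Bob's basis choice
    # Same basis: Bob's outcome equals Alice's bit; different basis: uniformly random outcome.
    k_b = k_a if b_a == b_b else rng.randint(0, 1)
    return b_a, k_a, b_b, k_b


def sift(rounds):
    """Keep only the rounds in which Alice and Bob happened to use the same basis."""
    key_a = [k_a for b_a, k_a, b_b, k_b in rounds if b_a == b_b]
    key_b = [k_b for b_a, k_a, b_b, k_b in rounds if b_a == b_b]
    return key_a, key_b


rng = random.Random(2020)
rounds = [bb84_round(rng) for _ in range(200)]
key_a, key_b = sift(rounds)
print(len(key_a), key_a == key_b)
```

Roughly half of the rounds survive sifting, since the two bases coincide with probability 1/2.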

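$$\begin{aligned} \text{Alice} &\stackrel{\text{def}}{=} \text{Ran}[q\_1; B\_a].\text{Ran}[q\_1; K\_a].\text{Set}\_{K\_a}[q\_1].H\_{B\_a}[q\_1].A2B!q\_1. \\ &\qquad b2a?B\_b.a2b!B\_a.key\_a!cmp(K\_a, B\_a, B\_b).\textbf{nil}; \\ \text{Bob} &\stackrel{\text{def}}{=} A2B?q\_1.\text{Ran}[q\_2; B\_b].M\_{B\_b}[q\_1; K\_b].b2a!B\_b. \\ &\qquad a2b?B\_a.key\_b!cmp(K\_b, B\_a, B\_b).\textbf{nil}; \\ \text{BB84} &\stackrel{\text{def}}{=} (\text{Alice} \,\|\, \text{Bob}) \setminus \{a2b, b2a, A2B\} \end{aligned}$$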

where there are several special operations:


*Specification.* The specification can be defined as follows using the same operations:

$$\begin{split} \text{BB84}\_{\text{spec}} \stackrel{\text{def}}{=} {} & \text{Ran}[q\_1; B\_a]. \text{Ran}[q\_1; K\_a]. \text{Ran}[q\_2; B\_b]. \\ & (key\_a!cmp(K\_a, B\_a, B\_b). \textbf{nil} \,\|\, key\_b!cmp(K\_a, B\_a, B\_b). \textbf{nil}). \end{split}$$

*Input.* For the implementation of BB84, we need to declare the following variables and operators in the input attached to it.


When modelling the protocol, we use several operators. They should be defined and their definitions are part of the input.


The function cmp is treated as an in-built function, so there is no need to define it in the input.

For the specification BB84<sub>spec</sub>, we only declare the classical bits Ba, Bb, Ka, the qubits q1, q2 and the operator Ran. The variables and operators declared here are the same as those in the input of the implementation.

*Output.* Taking the input discussed above, the tool first generates two pLTSs, with over 150 states for the implementation and 80 states for the specification, and then runs the ground bisimulation checking algorithm. As we can see from the fifth row in Table 1, our tool confirms that ⟨BB84, ρ<sub>0</sub>⟩ ≈ ⟨BB84<sub>spec</sub>, ρ<sub>0</sub>⟩, where ρ<sub>0</sub> denotes the initial state of the quantum register; thus the implementation is faithful to the specification. In the output of the tool, there is an enumeration of 1084 pairs of non-bisimilar states and 3216 pairs of bisimilar states. The pLTSs and the state pairs can be found in [24].

## **4.3 Experimental Results**

We conducted experiments on several quantum communication protocols with a few different input variables. Table 1 provides a summary of our experimental results obtained on a macOS machine with an Intel Core i7 2.5 GHz processor and 16 GB of RAM.


**Table 1.** Experimental results. The columns headed by **Impl** and **Spec** show the numbers of nodes contained in the generated pLTSs of the implementations and specifications, respectively. Column **N** shows the sizes of the sets of non-bisimilar state pairs and Column **B** shows the sizes of the sets of bisimilar state pairs. Column **ms** shows the time cost of the verification in milliseconds.

In each case, we report the final outcome (whether an implementation is ground bisimilar to its specification), the number of nodes in two pLTSs, the numbers of non-bisimilar and bisimilar state pairs in NonBisim and Bisim, respectively, as well as the verification time of our ground bisimulation checking algorithm. The time cost excludes the part of pLTS generation which takes around one second in all the examples.

Besides the protocol discussed in Section 4.2, we also verify other protocols that make use of entangled qubits, such as teleportation and quantum secret sharing. For quantum key distribution, we conduct experiments on the BB84, B92 and E91 protocols.

Not all the cases in Table 1 give the size of the set NonBisim of non-bisimilar state pairs, as the bisimulation checking algorithm may immediately terminate once a negative verification result is obtained, i.e. the two initial states are not bisimilar.

#### **Data Availability Statement**

The datasets generated and/or analyzed during the current study are available in the figshare repository: https://doi.org/10.6084/m9.figshare.11874942.v1.

## **5 Conclusion and Future Work**

We have presented an on-the-fly algorithm to check ground bisimulation for quantum processes in qCCS, and a simpler algorithm for strong ground bisimulation. Based on the algorithms, we have developed a tool to verify quantum communication protocols modelled as qCCS processes. We have carried out experiments on several non-trivial quantum communication protocols from superdense coding to key distribution and found the tool helpful.

As to future work, several interesting problems remain to be addressed. For example, a limitation of the current work is that it compares quantum processes only with predetermined states of quantum registers. Indeed, there are occasions where one would expect two processes to be equivalent for arbitrary initial states, and it is infeasible to enumerate all those states. In such cases the symbolic bisimulations proposed in [11] will be useful. We are considering implementing the algorithm for symbolic ground bisimulation, and then tackling the more challenging symbolic open bisimulation, both proposed in that work. Another problem occurs in the experiment of Section 4.2. The example tested one qubit instead of a sequence of qubits because more qubits lead to a drastic growth of the running time, which shows a limitation of the current approach of explicitly representing state spaces.

## **Appendix**

Algorithm 1 needs to check the condition that if $t \stackrel{\alpha}{\longrightarrow} \Delta$ then there exists some $\Theta$ such that $u \stackrel{\hat{\alpha}}{\Longrightarrow} \Theta$ and $\Delta \mathrel{\mathcal{R}^{\circ}} \Theta$. We use as a subroutine a procedure of [28] to reduce the problem to a network flow problem that can be solved in polynomial time.

Technically, we construct a network graph $G(\Delta, u, \alpha, \mathcal{R}) = (V, E)$ defined as follows. Let $S$ be the set of reachable states, and $\mathcal{R}$ be a binary relation on the states.

Let $\triangle$ and $\blacktriangledown$ be two vertices that represent the source and the sink of the network, respectively. For each visible action α, the set of vertices V is given below

$$V = \{\triangle, \blacktriangledown\} \cup S \cup S^{tr} \cup S\_{\alpha} \cup S\_{\alpha}^{tr} \cup S\_{\bot} \cup S\_{\mathcal{R}}$$

where

$$\begin{aligned} S^{tr} &= \{ v^{tr} | tr = v \stackrel{\beta}{\longrightarrow} \Gamma, \ \beta \in \{ \alpha, \tau \} \}; \\ S\_{\alpha} &= \{ v\_{\alpha} | v \in S \}; \\ S\_{\alpha}^{tr} &= \{ v\_{\alpha}^{tr} | v^{tr} \in S^{tr} \}; \\ S\_{\perp} &= \{ v\_{\perp} | v \in S \}; \\ S\_{\mathcal{R}} &= \{ v\_{\mathcal{R}} | v \in S \}. \end{aligned}$$

and the set of edges E is

$$E = \{ (\triangle, u) \} \cup L\_1 \cup L\_\alpha \cup L\_2 \cup L\_\bot^\alpha \cup L\_{\mathcal{R}}$$

where

$$\begin{split} L\_{1} &= \{ (v, v^{tr}), (v^{tr}, v') \mid tr = v \stackrel{\tau}{\longrightarrow} \varGamma, \ v' \in [\varGamma] \}; \\ L\_{\alpha} &= \{ (v, v\_{\alpha}^{tr}), (v\_{\alpha}^{tr}, v\_{\alpha}') \mid tr = v \stackrel{\alpha}{\longrightarrow} \varGamma, \ v' \in [\varGamma] \}; \\ L\_{2} &= \{ (v\_{\alpha}, v\_{\alpha}^{tr}), (v\_{\alpha}^{tr}, v\_{\alpha}') \mid tr = v \stackrel{\tau}{\longrightarrow} \varGamma, \ v' \in [\varGamma] \}; \\ L\_{\bot}^{\alpha} &= \{ (u\_{\alpha}, u\_{\bot}) \mid u \in S \}; \\ L\_{\mathcal{R}} &= \{ (s\_{\bot}, s\_{\mathcal{R}}'), (s\_{\mathcal{R}}', \blacktriangledown) \mid (s, s') \in \mathcal{R} \}. \end{split}$$

For the invisible action τ, the definition is similar: $V = \{\triangle, \blacktriangledown\} \cup S \cup S^{tr} \cup S\_{\bot} \cup S\_{\mathcal{R}}$ and $E = \{(\triangle, u)\} \cup L\_1 \cup L\_{\bot} \cup L\_{\mathcal{R}}$ where $L\_{\bot} = \{(s, s\_{\bot}) \mid s \in S\}$.

If α is a visible action, we consider the following linear programming problem associated to $G(\Delta, u, \alpha, \mathcal{R})$:

$$\max \sum\_{(s,v)\in E} -f\_{s,v}$$

subject to

$$\begin{aligned} f\_{s,v} &\geq 0 & \text{for each } (s,v) \in E \\ f\_{\triangle,u} &= 1 \\ f\_{v\_{\mathcal{R}},\blacktriangledown} &= \Delta(v) & \text{for each } v \in S \\ \sum\_{(s,v)\in E} f\_{s,v} - \sum\_{(v,w)\in E} f\_{v,w} &= 0 & \text{for each } v \in V \setminus \{\triangle,\blacktriangledown\} \\ f\_{v^{tr},v'} - \varGamma(v') \cdot f\_{v,v^{tr}} &= 0 & \text{for each } tr = v \stackrel{\tau}{\longrightarrow} \varGamma \text{ and } v' \in [\varGamma] \\ f\_{v^{tr}\_{\alpha},v'\_{\alpha}} - \varGamma(v') \cdot f\_{v,v^{tr}\_{\alpha}} &= 0 & \text{for each } tr = v \stackrel{\alpha}{\longrightarrow} \varGamma \text{ and } v' \in [\varGamma] \\ f\_{v^{tr}\_{\alpha},v'\_{\alpha}} - \varGamma(v') \cdot f\_{v\_{\alpha},v^{tr}\_{\alpha}} &= 0 & \text{for each } tr = v \stackrel{\tau}{\longrightarrow} \varGamma \text{ and } v' \in [\varGamma] \end{aligned}$$

The fourth constraint is the flow-conservation constraint. The last three constraints link the source state and the result distribution.

For the invisible action τ, the linear programming problem associated to the network $G(\Delta, u, \tau, \mathcal{R})$ is the same as above except that the last two constraints are dropped.

We denote by **LP**$(\Delta, u, \alpha, \mathcal{R})$ the predicate that is true if and only if the linear programming problem above has a solution.
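To give some intuition for the flow-based formulation, the following Python sketch treats a much simpler problem than the one above: given two fixed distributions Δ and Θ, it decides whether they are related by the lifting of a relation R, using the classical maximum-flow reduction. It is not the network G(Δ, u, α, R) of this appendix, which additionally searches over weak transitions of u. The names `max_flow` and `lifted` and the vertex labels `"src"` and `"snk"` are assumptions made for this illustration.

```python
from collections import deque
from fractions import Fraction

def max_flow(cap, source, sink):
    """Edmonds-Karp on a dict-of-dicts capacity map; returns the flow value."""
    # Residual capacities, with reverse edges initialised to zero.
    res = {u: dict(vs) for u, vs in cap.items()}
    for u, vs in cap.items():
        for v in vs:
            res.setdefault(v, {}).setdefault(u, Fraction(0))
    total = Fraction(0)
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Collect the path, find its bottleneck, and push that much flow.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        total += push

def lifted(delta, theta, rel):
    """True iff delta and theta (both summing to 1) are related by the lifting of rel."""
    cap = {"src": {("l", s): p for s, p in delta.items()}}
    for s in delta:
        # Capacity 1 acts as "unbounded" because the total mass is 1.
        cap[("l", s)] = {("r", t): Fraction(1) for t in theta if (s, t) in rel}
    for t, p in theta.items():
        cap[("r", t)] = {"snk": p}
    return max_flow(cap, "src", "snk") == 1

half = Fraction(1, 2)
delta = {"s1": half, "s2": half}
theta = {"t": Fraction(1)}
assert lifted(delta, theta, {("s1", "t"), ("s2", "t")})
assert not lifted(delta, theta, {("s1", "t")})
```

Exact rational arithmetic (`Fraction`) is used so that the comparison with 1 is not affected by rounding.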

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Deciding the bisimilarity of context-free session types

Bernardo Almeida , Andreia Mordido , and Vasco T. Vasconcelos

LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal

Abstract. We present an algorithm to decide the equivalence of context-free session types, practical to the point of being incorporated in a compiler. We prove its soundness and completeness. We further evaluate its behaviour in practice. In the process, we introduce an algorithm to decide the bisimilarity of simple grammars.

Keywords: Types, Type equivalence, Bisimulation, Algorithm

## 1 Introduction

Session types enhance the expressivity of traditional types for programming languages by allowing the description of structured communication on heterogeneously typed channels [14,15,24]. Traditional session types are *regular* in the sense that the sequences of communication actions admitted by a type are in the union of a regular language (for finite executions) and an ω-regular language (for infinite executions). Introduced by Thiemann and Vasconcelos, context-free session types liberate traditional session types from the shackles of tail recursion, allowing, for example, the safe serialization of arbitrary recursive datatypes [26].

Session types are often used to discipline interactions in concurrent programs. When associated to (bidirectional, heterogeneous) channels, session types describe the permitted patterns of interaction. For example, a type of the form

rec x . +{Leaf: Skip, Node: !Int;x;x}

may describe one end of a communication channel. A process holding such a channel end must first *select* between choices Leaf and Node. If Leaf is chosen, then type Skip forwards the interaction to the continuation, if any. If no continuation is present, then interaction is over. Otherwise, the process must send an integer (!Int) followed by two trees, as witnessed by the recursive calls occurring after the choice of Node. A concurrent process holding the other end of the channel interacts via a *dual* type:

rec y . &{Leaf: Skip, Node: ?Int;y;y}

In this case the process must be ready to offer both choices, Leaf and Node. For the latter option, the process must further receive an integer (?Int), followed by two trees.
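The interaction prescribed by the two dual types can be mimicked by a small Python sketch. It is an illustration only: the channel is modelled as a list of action strings, trees as `None` (leaf) or triples `(value, left, right)`, and the function names are hypothetical.

```python
def send_tree(tree):
    """Actions performed on a channel of type rec x. +{Leaf: Skip, Node: !Int;x;x}."""
    if tree is None:
        return ["+Leaf"]                  # select Leaf; Skip performs no action
    value, left, right = tree
    # Select Node, send the integer, then send the two subtrees.
    return ["+Node", f"!{value}"] + send_tree(left) + send_tree(right)

def receive_tree(actions):
    """Dual reader for rec y. &{Leaf: Skip, Node: ?Int;y;y}; returns (tree, rest)."""
    head, rest = actions[0], actions[1:]
    if head == "+Leaf":
        return None, rest
    value = int(rest[0][1:])              # strip the leading '!'
    left, rest = receive_tree(rest[1:])
    right, rest = receive_tree(rest)
    return (value, left, right), rest

tree = (1, (2, None, None), None)
trace = send_tree(tree)
# The receiver rebuilds the tree and consumes the whole trace.
assert receive_tree(trace) == (tree, [])
```

The receiver needs no length prefix or end marker: the nesting of the type itself tells it when each subtree is complete.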

Regular languages cannot capture such behaviour. The best one can do with regular session types (and without resorting to channel passing) is to use a regular type that allows transmitting trees, as well as many other non tree-like structures. The correct behaviour of processes interacting on such a channel would need to be checked at runtime [2,26].

While the algorithmic aspects of type equivalence for regular session types are well known (Gay and Hole propose an algorithm to decide subtyping [9], from which type equivalence can be derived), the same does not apply to context-free session types. Thiemann and Vasconcelos [26] show that the equivalence of context-free session types is decidable, by reducing the problem to the verification of bisimulation for Basic Process Algebra (BPA) which, in turn, was proved decidable by Christensen, Hüttel, and Stirling [6]. Even though the equivalence problem for context-free session types is known to be decidable, no algorithm has been proposed. Padovani [20] introduces a language with context-free session types that avoids the problem of checking the equivalence of types by requiring annotations in the source code. Annotations result in the structural alignment between code and types. This alignment, enforced by an explicit resumption process operator that breaks sequential composition in types, sidesteps the problem central to this paper: that of checking type equivalence. Furthermore, there are some basic equivalences on types that the compiler is not able to identify [20].

After the breakthrough by Christensen, Hüttel, and Stirling (a result that provides no immediate practical algorithm), the problem of deciding the equivalence of BPA terms has been addressed by several researchers [4,6,8,18]. Most of these works provide no practical algorithm that can be readily used, except the one by Czerwinski and Lasota, which presents a polynomial-time algorithm that decides the bisimilarity of normed context-free processes in O(n<sup>5</sup>) [8]. However, context-free session types are not necessarily normed, which precludes resorting to this algorithm, or using the original result by Baeten, Bergstra, and Klop [3], as well as improvements by Hirshfeld, Jerrum, and Moller [12,13]. Moreover, the complexity estimates for deciding bisimilarity of BPA processes are not promising. Kiefer provided an EXPTIME lower bound for BPA bisimilarity by proving this problem is EXPTIME-hard [19], whereas Jančar has provided a double exponential upper bound for this problem and proved that its complexity is O(2<sup>2<sup>pol(n)</sup></sup>) [17].

The decidability of equivalence for deterministic pushdown automata (DPDA) has also been the subject of much study [16,22,23]. Several techniques have been proposed to solve the problem, but no immediate practical algorithm was available until Henry and Sénizergues provided one [10]. Its poor performance, however, precludes its incorporation in a compiler. Furthermore, the algorithm Henry and Sénizergues propose handles the problem of language equivalence rather than the problem of deciding bisimilarity of DPDAs.

Our algorithm to decide the equivalence of context-free session types also allows deciding the bisimilarity of simple grammars (i.e., deterministic grammars in Greibach Normal Form). It proceeds in three stages. The *first stage* builds a context-free grammar in Greibach Normal Form (GNF), in fact a simple grammar, from two context-free session types in a way that bisimulation is preserved. A basic result from Baeten, Bergstra, and Klop states that any guarded BPA system can be transformed into Greibach Normal Form (GNF) while preserving bisimulation equivalence, but unfortunately no procedure is presented [3]. The *second stage* prunes the grammar by removing unreachable symbols in unnormed sequences of non-terminal symbols. This stage builds on the result of Christensen, Hüttel, and Stirling [6]. The *third stage* constructs an expansion tree, by alternating between expansion and simplification steps. This last stage uses expansion operations proposed by Jančar, Moller, and Hirshfeld [11,18], and simplification rules proposed by Caucal, Christensen, Hüttel, Stirling, Jančar, and Moller [5,6,18]. The finite representation of bisimulations of BPA transition graphs [5,6] is paramount for our results of soundness and completeness.

The branching nature of the expansion tree confers an (at least) exponential complexity to the algorithm. However, our experiments with a concrete implementation, both as a stand-alone tool and incorporated in a compiler [2], are promising. We propose heuristics that decrease the execution time by 89% and reduce the number of timeouts by 95% (see Section 5).

We present an algorithm to decide the equivalence of context-free session types, practical to the point of being readily included in any compiler, an exercise that we conducted in parallel [2]. The main contributions of this work are:


The rest of the paper is organized as follows: an introduction to context-free session types can be found in Section 2, the algorithm in Section 3, the main results in Section 4, evaluation in Section 5, and conclusions in Section 6.

## 2 Context-free session types

This section briefly introduces context-free session types, based on the work of Thiemann and Vasconcelos [26]. The types we consider build upon a denumerable set of *variables* and a set of *choice labels*. Metavariables X, Y, Z range over variables and ℓ over labels. We assume given a set of base types denoted by B. The syntax of types is given by the grammar below.

$$\begin{array}{rcl} S,T & ::= & \mathsf{skip} \mid \sharp B \mid \star \{\ell\_i \colon T\_i\}\_{i \in I} \mid S; T \mid \mu X.T \mid X \\ \sharp & ::= & ! \mid \ ? \\ \star & ::= & \oplus \mid \& \end{array}$$

In type μX.T, variable X is bound in the subterm T. The sets of bound and free variables in a given type are defined accordingly. Notation [T/X]S denotes the result of substituting T for the (free) occurrences of X in S.

Judgement S ✓ characterizes *terminated* types: context-free session types that exhibit no further action [1].

Terminated predicate: T ✓

$$\mathsf{skip}\,\checkmark \qquad X\,\checkmark \qquad \frac{S\,\checkmark \quad T\,\checkmark}{S;T\,\checkmark} \qquad \frac{T\,\checkmark}{\mu X.T\,\checkmark}$$

Notice that all types of the form μX.μX<sub>1</sub> … μX<sub>n</sub>.X, for n ≥ 0, are terminated.
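The terminated predicate is directly executable. The sketch below uses an ad hoc encoding of types as nested Python tuples, which is an assumption of this illustration and not the representation used in the implementation: `("skip",)`, `("msg", polarity, base)`, `("choice", view, ((label, type), ...))`, `("seq", S, T)`, `("mu", X, T)` and `("var", X)`.

```python
def terminated(t):
    """The judgement T ✓: t exhibits no further action."""
    tag = t[0]
    if tag in ("skip", "var"):
        return True                       # axioms for skip and X
    if tag == "seq":
        return terminated(t[1]) and terminated(t[2])
    if tag == "mu":
        return terminated(t[2])
    return False                          # messages and choices offer an action

skip = ("skip",)
# mu X. mu X1. X is terminated, as noted in the text (case n = 1).
assert terminated(("mu", "X", ("mu", "X1", ("var", "X"))))
assert terminated(("seq", skip, skip))
assert not terminated(("seq", skip, ("msg", "!", "int")))
```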

We are not interested in all types generated by the above grammar. If Δ is a list of pairwise distinct variables, then judgement Δ ⊢ T characterises the types of interest: the *well-formed* types.

Type formation system: Δ ⊢ T

$$\Delta \vdash \mathsf{skip} \qquad \Delta \vdash \sharp B \qquad \frac{X \in \Delta}{\Delta \vdash X} \qquad \frac{\Delta \vdash S \quad \Delta \vdash T}{\Delta \vdash S;T} \qquad \frac{\Delta \vdash T\_i \quad (\forall i \in I)}{\Delta \vdash \star\{\ell\_i \colon T\_i\}\_{i \in I}} \qquad \frac{\neg\, T\,\checkmark \quad \Delta, X \vdash T}{\Delta \vdash \mu X.T}$$

Terminated processes have a simple characterisation (types comprising skip, μ and semicolon), which justifies the inclusion of ¬T ✓ in the rules for type formation; Thiemann and Vasconcelos [26] introduce a contractive judgement to the same effect. Type formation serves two main purposes: ensuring that all variables introduced by μ-types are pairwise distinct and that types underneath a μ are not terminated. This can be clearly seen in the formation rule for μ-types, where notation Δ, X is understood as requiring X ∉ Δ. In the sequel we assume that all types are well formed and denote by $\mathcal{T}$ the set of such types.

The set of *actions* is generated by the following grammar.

$$a \quad \coloneqq \sharp B \quad | \quad \star \ell$$

The *labelled transition system* (LTS) for context-free session types is given by $\mathcal{T}$ as the set of *states*, the set of *actions*, and the *transition relation* $S \stackrel{a}{\longrightarrow}\_{\mathcal{T}} T$ defined by the rules below.

Labelled transition system: $S \stackrel{a}{\longrightarrow}\_{\mathcal{T}} T$

$$\begin{gathered} \sharp B \stackrel{\sharp B}{\longrightarrow}\_{\mathcal{T}} \mathsf{skip} \qquad \star \{\ell\_{i} \colon T\_{i}\}\_{i \in I} \stackrel{\star \ell\_{j}}{\longrightarrow}\_{\mathcal{T}} T\_{j} \quad (j \in I) \\ \frac{S \stackrel{a}{\longrightarrow}\_{\mathcal{T}} S'}{S; T \stackrel{a}{\longrightarrow}\_{\mathcal{T}} S'; T} \qquad \frac{S\,\checkmark \quad T \stackrel{a}{\longrightarrow}\_{\mathcal{T}} T'}{S; T \stackrel{a}{\longrightarrow}\_{\mathcal{T}} T'} \qquad \frac{[\mu X.S/X]S \stackrel{a}{\longrightarrow}\_{\mathcal{T}} T}{\mu X.S \stackrel{a}{\longrightarrow}\_{\mathcal{T}} T} \end{gathered}$$

*Type bisimulation* is defined in the usual way from the labelled transition system [21]. We say that a type relation R is a *bisimulation* if, whenever S R T, for all a we have:

– for each S′ with $S \stackrel{a}{\longrightarrow}\_{\mathcal{T}} S'$, there is T′ such that $T \stackrel{a}{\longrightarrow}\_{\mathcal{T}} T'$ and S′ R T′, and
– for each T′ with $T \stackrel{a}{\longrightarrow}\_{\mathcal{T}} T'$, there is S′ such that $S \stackrel{a}{\longrightarrow}\_{\mathcal{T}} S'$ and S′ R T′.

We say that two types are bisimilar, written $S \sim\_{\mathcal{T}} T$, if there is a bisimulation R with S R T.
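The transition rules can be read as a recursive function from a type to its outgoing transitions. The following Python sketch does so under the same ad hoc tuple encoding as assumed for illustration purposes only (`("skip",)`, `("msg", polarity, base)`, `("choice", view, ((label, type), ...))`, `("seq", S, T)`, `("mu", X, T)`, `("var", X)`). Substitution is not capture-avoiding, which is harmless when bound variables are pairwise distinct; the sketch also assumes that unfolding a μ-type eventually exposes an action.

```python
def terminated(t):
    tag = t[0]
    if tag in ("skip", "var"):
        return True
    if tag == "seq":
        return terminated(t[1]) and terminated(t[2])
    return tag == "mu" and terminated(t[2])

def subst(t, x, r):
    """[r/x]t, assuming pairwise distinct bound variables."""
    tag = t[0]
    if tag == "var":
        return r if t[1] == x else t
    if tag == "seq":
        return ("seq", subst(t[1], x, r), subst(t[2], x, r))
    if tag == "mu":
        return t if t[1] == x else ("mu", t[1], subst(t[2], x, r))
    if tag == "choice":
        return ("choice", t[1], tuple((l, subst(b, x, r)) for l, b in t[2]))
    return t                              # skip and messages have no variables

def transitions(t):
    """Map each action a to the unique T' with t --a--> T'."""
    tag = t[0]
    if tag == "msg":
        return {t[1] + t[2]: ("skip",)}
    if tag == "choice":
        return {t[1] + label: branch for label, branch in t[2]}
    if tag == "seq":
        s, u = t[1], t[2]
        if terminated(s):                 # second rule for S;T
            return transitions(u)
        return {a: ("seq", s2, u) for a, s2 in transitions(s).items()}
    if tag == "mu":                       # unfold once and retry
        return transitions(subst(t[2], t[1], t))
    return {}                             # skip and variables are stuck

x = ("var", "x")
tree = ("mu", "x", ("choice", "+", (("Leaf", ("skip",)),
        ("Node", ("seq", ("msg", "!", "Int"), ("seq", x, x))))))
first = transitions(tree)
assert set(first) == {"+Leaf", "+Node"}
assert set(transitions(first["+Node"])) == {"!Int"}
```

Because session types are deterministic, a dictionary keyed by action suffices to represent the transition relation out of a state.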

## 3 An algorithm to decide type bisimilarity

This section presents an algorithm to decide whether two types are in a type bisimulation. In the process we also provide an algorithm to decide the bisimilarity of simple context-free languages. The algorithm comprises three stages:


*Translating types to grammars.* Type variables X are the *non-terminal symbols* and LTS labels a are the *terminal symbols*. Sequences of type variables $\vec{X}$ are called *words*; ε denotes the empty word. A context-free grammar in Greibach Normal Form is a pair $(\vec{X}, \mathcal{P})$ where $\vec{X}$ is the *start word* and $\mathcal{P}$ a *set of productions* of the form $Y \rightarrow a\vec{Z}$ (context-free session types do not require productions of the form Y → ε). Due to the deterministic nature of context-free session types, the grammars we are interested in are *simple*: for each non-terminal symbol Y and terminal symbol a, there is at most one production of the form $Y \rightarrow a\vec{Z}$.

Grammars in Greibach Normal Form naturally induce a labelled transition system by taking words $\vec{X}$ for states, terminal symbols a for actions, and, for the transition relation, $\stackrel{a}{\longrightarrow}\_{\mathcal{P}}$ defined as $X\vec{Y} \stackrel{a}{\longrightarrow}\_{\mathcal{P}} \vec{Z}\vec{Y}$ when $X \rightarrow a\vec{Z} \in \mathcal{P}$. The associated bisimilarity is denoted by $\sim\_{\mathcal{P}}$.
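As a small illustration of the induced transition system, the sketch below represents a simple grammar as a Python dictionary from pairs (non-terminal, terminal) to right-hand-side words, with words as tuples. This representation and the name `step` are assumptions of the example; keying the dictionary by (Y, a) enforces that the grammar is simple.

```python
def step(word, productions):
    """Transitions of a word: rewrite its leftmost non-terminal."""
    if not word:
        return {}                         # the empty word has no transitions
    head, tail = word[0], word[1:]
    return {a: rhs + tail
            for (y, a), rhs in productions.items() if y == head}

# Productions X -> a XY, X -> b, Y -> c.
P = {("X", "a"): ("X", "Y"), ("X", "b"): (), ("Y", "c"): ()}
assert step(("X", "Y"), P) == {"a": ("X", "Y", "Y"), "b": ("Y",)}
assert step((), P) == {}
```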

The *unravelling* function on well-formed context-free session types, taken from Thiemann and Vasconcelos [26], is defined as follows.

$$\begin{aligned} \mathsf{unr}(\mu X.T) &= \mathsf{unr}([\mu X.T/X]T) \\ \mathsf{unr}(S;T) &= \begin{cases} \mathsf{unr}(T) & \text{if } \mathsf{unr}(S) = \mathsf{skip} \\ (\mathsf{unr}(S);T) & \text{if } \mathsf{unr}(S) \neq \mathsf{skip} \end{cases} \\ \mathsf{unr}(T) &= T \qquad \text{in all other cases} \end{aligned}$$

The function terminates under the assumption that types are well formed.
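A direct transcription of the unravelling function into Python may help in reading the definition. The tuple encoding of types (`("skip",)`, `("msg", polarity, base)`, `("choice", view, ((label, type), ...))`, `("seq", S, T)`, `("mu", X, T)`, `("var", X)`) is an assumption of this sketch, as is the simplified substitution, which relies on bound variables being pairwise distinct.

```python
def subst(t, x, r):
    """[r/x]t, assuming pairwise distinct bound variables."""
    tag = t[0]
    if tag == "var":
        return r if t[1] == x else t
    if tag == "seq":
        return ("seq", subst(t[1], x, r), subst(t[2], x, r))
    if tag == "mu":
        return t if t[1] == x else ("mu", t[1], subst(t[2], x, r))
    if tag == "choice":
        return ("choice", t[1], tuple((l, subst(b, x, r)) for l, b in t[2]))
    return t

def unr(t):
    """Unravel t until its head is neither a mu nor a leading skip."""
    tag = t[0]
    if tag == "mu":
        return unr(subst(t[2], t[1], t))
    if tag == "seq":
        s = unr(t[1])
        return unr(t[2]) if s == ("skip",) else ("seq", s, t[2])
    return t

send = ("msg", "!", "int")
loop = ("mu", "X", ("seq", send, ("var", "X")))
# One unfolding exposes the first action of the recursive type.
assert unr(loop) == ("seq", send, loop)
assert unr(("seq", ("skip",), ("msg", "?", "int"))) == ("msg", "?", "int")
```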

Another function, word, builds a word from a type. In the process it updates a global set $\mathcal{P}$ of grammar productions. Word concatenation is denoted by $\vec{X} \cdot \vec{Y}$.

$$\begin{aligned} \mathsf{word}(\mathsf{skip}) &= \varepsilon \\ \mathsf{word}(S; T) &= \mathsf{word}(S) \cdot \mathsf{word}(T) \\ \mathsf{word}(\sharp B) &= Y, \text{ setting } \mathcal{P} := \mathcal{P} \cup \{Y \to \sharp B\} \\ \mathsf{word}(\star \{\ell\_i : T\_i\}\_{i \in I}) &= Y, \text{ setting } \mathcal{P} := \mathcal{P} \cup \{Y \to \star \ell\_i \cdot \mathsf{word}(T\_i) \mid i \in I\} \quad (Y \text{ fresh}) \\ \mathsf{word}(X) &= X \\ \mathsf{word}(\mu X.T) &= X \end{aligned}$$
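The definition of word can be sketched in Python as follows. The encoding of types as tuples, the dictionary of productions keyed by (non-terminal, terminal), and the scheme for generating fresh names (`Y1`, `Y2`, ...) are all assumptions made for this illustration; in particular, the global set of productions is modelled by a dictionary captured in a closure.

```python
import itertools

def make_word():
    """Return a word function together with the production set it updates."""
    productions = {}                      # (non-terminal, action) -> word
    fresh = (f"Y{i}" for i in itertools.count(1))

    def word(t):
        tag = t[0]
        if tag == "skip":
            return ()                     # the empty word
        if tag == "seq":
            return word(t[1]) + word(t[2])
        if tag == "msg":
            y = next(fresh)
            productions[(y, t[1] + t[2])] = ()
            return (y,)
        if tag == "choice":
            y = next(fresh)
            for label, branch in t[2]:
                productions[(y, t[1] + label)] = word(branch)
            return (y,)
        return (t[1],)                    # X and mu X.T both map to X

    return word, productions

word, P = make_word()
assert word(("seq", ("skip",), ("skip",))) == ()
assert word(("seq", ("msg", "!", "int"), ("var", "X"))) == ("Y1", "X")
assert P == {("Y1", "!int"): ()}
```

Note that word alone adds no productions for the variable of a μ-type; those are contributed by the function grm defined later in this section.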

The following lemma relates terminated types to the result of a call to word.

Lemma 1. *Let* ⊢ T*. Then,* T ✓ *if and only if* word(T) = ε*.*

*Proof.* The direct implication follows by rule induction on predicate ✓:


Conversely, if word(T) = ε, using the rules of the definition of word that produce the empty word:


To define the translation of context-free session types to simple grammars, assume that {μX<sub>1</sub>.T<sub>1</sub>, …, μX<sub>n</sub>.T<sub>n</sub>} is the set of all μ-subterms in a given type T. Further assume that i < j whenever X<sub>j</sub> ∈ free(μX<sub>i</sub>.T<sub>i</sub>). That is, the μ-subterms are topologically sorted with respect to their lexical nesting, innermost subterms first. Now we identify unrolled versions of the μ-subterms.

$$\begin{aligned} T\_1' &= [\mu X\_n.T\_n/X\_n] \cdots [\mu X\_2.T\_2/X\_2][\mu X\_1.T\_1/X\_1]T\_1 \\ T\_2' &= [\mu X\_n.T\_n/X\_n] \cdots [\mu X\_2.T\_2/X\_2]T\_2 \\ &\vdots \\ T\_n' &= [\mu X\_n.T\_n/X\_n]T\_n \end{aligned}$$

Clearly each type T′<sub>i</sub> is closed (has no free variables). Notice that if T is a μ-type, then μX<sub>n</sub>.T<sub>n</sub> is T itself.

Finally, given an initial set of productions $\mathcal{P}\_0$, function grm translates a type T into a grammar composed of a start word and a set of productions:

$$\text{grm}(T,\mathcal{P}\_0) = (\text{word}(T), \mathcal{P}\_n)$$

where each $\mathcal{P}\_i$ is computed from $\mathcal{P}\_{i-1}$ by the following recurrence,

$$\mathcal{P}\_i = \mathcal{P}\_i' \cup \{ X\_i \to a\_j \vec{Y}\_j \vec{Z} \mid (Z \to a\_j \vec{Y}\_j) \in \mathcal{P}\_i' \} \text{ where } (Z \vec{Z}, \mathcal{P}\_i') = \mathsf{grm}(\mathsf{unr}(T\_i'), \mathcal{P}\_{i-1})$$

Notice that word(unr(T′<sub>i</sub>)) is a non-empty word because of Lemma 1 and the fact that each T′<sub>i</sub> is non-terminated by hypothesis. The function grm terminates on all inputs (because recursion is always on subterms) and adds a finite number of productions to the original set. Furthermore, because choices in session types do not contain duplicated labels, the function returns a simple grammar.

To run grm on two well-formed types proceed as follows: rename the second type so that bound variables do not overlap with those of the first; start with an empty set of productions; run the algorithm consecutively on the two types to obtain two initial words and a single set of productions.

*Example 1.* Consider the following pair of context-free session types.

$$\begin{aligned} S & \triangleq (\mu X\_1. \& \{n: X\_1; X\_1; ?\text{int}, \ell: ?\text{int} \}); (\mu X\_2. !\text{int}; X\_2; X\_2) \\ T & \triangleq (\mu Y\_1. \& \{n: Y\_1; Y\_1, \ell: \text{skip} \}; ?\text{int}); (\mu Y\_2. !\text{int}; Y\_2) \end{aligned}$$

Starting from the empty set of productions, running grm consecutively on S and on T produces the following set of productions

$$\begin{aligned} X\_1 &\rightarrow \& n \, X\_1 X\_1 X\_3 & X\_3 &\rightarrow ?\text{int} & Y\_1 &\rightarrow \& n \, Y\_1 Y\_1 Y\_3 & Y\_3 &\rightarrow ?\text{int} \\ X\_1 &\rightarrow \& \ell \, X\_4 & X\_4 &\rightarrow ?\text{int} & Y\_1 &\rightarrow \& \ell \, Y\_3 \\ X\_2 &\rightarrow !\text{int} \, X\_2 X\_2 & & & Y\_2 &\rightarrow !\text{int} \, Y\_2 \end{aligned}$$

and two start words X<sub>1</sub>X<sub>2</sub> and Y<sub>1</sub>Y<sub>2</sub>.

*Pruning unnormed productions.* For $\vec{a}$ a non-empty sequence of terminal symbols $a\_1, \ldots, a\_n$, write $\vec{Y} \stackrel{\vec{a}}{\longrightarrow}\_{\mathcal{P}} \vec{Z}$ when $\vec{Y} \stackrel{a\_1}{\longrightarrow}\_{\mathcal{P}} \cdots \stackrel{a\_n}{\longrightarrow}\_{\mathcal{P}} \vec{Z}$. We say that $\vec{Y}$ is *normed* when $\vec{Y} \stackrel{\vec{a}}{\longrightarrow}\_{\mathcal{P}} \varepsilon$ for some $\vec{a}$, and that $\vec{Y}$ is *unnormed* otherwise. When $\vec{Y}$ is normed, the *minimal path* of $\vec{Y}$ is the shortest $\vec{a}$ such that $\vec{Y} \stackrel{\vec{a}}{\longrightarrow}\_{\mathcal{P}} \varepsilon$. In this case, the *norm* of $\vec{Y}$, denoted by $|\vec{Y}|$, is the length of $\vec{a}$. As observed by Christensen, Hüttel, and Stirling [6], any unnormed word is bisimilar to its concatenation with any other word, that is, if $\vec{Y}$ is unnormed, then $\vec{Y} \sim\_{\mathcal{P}} \vec{Y}\vec{X}$. We use this fact to prune unreachable symbols in unnormed words, and we do so in all productions.

*Example 2.* Recall Example 1 and notice that X<sub>2</sub> and Y<sub>2</sub> are both unnormed. Then, the last occurrence of X<sub>2</sub> in production X<sub>2</sub> → !int X<sub>2</sub>X<sub>2</sub> is unreachable, hence we simplify the production to obtain X<sub>2</sub> → !int X<sub>2</sub>.
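The pruning stage can be sketched in Python as a fixed-point computation of norms followed by truncation of right-hand sides. The dictionary representation of productions, the function names `norms` and `prune`, and the ASCII label `&l` standing for &ℓ are assumptions of this illustration; the grammar fragment is modelled on the productions for S in Example 1.

```python
def norms(productions):
    """Norm of each normed non-terminal; unnormed ones are absent from the result."""
    norm = {}
    changed = True
    while changed:                        # iterate until no norm improves
        changed = False
        for (y, _action), rhs in productions.items():
            if all(z in norm for z in rhs):
                candidate = 1 + sum(norm[z] for z in rhs)
                if candidate < norm.get(y, float("inf")):
                    norm[y] = candidate
                    changed = True
    return norm

def prune(productions):
    """Cut every right-hand side just after its first unnormed symbol."""
    norm = norms(productions)
    pruned = {}
    for key, rhs in productions.items():
        cut = next((i + 1 for i, z in enumerate(rhs) if z not in norm), len(rhs))
        pruned[key] = rhs[:cut]
    return pruned

P = {
    ("X1", "&n"): ("X1", "X1", "X3"),
    ("X1", "&l"): ("X4",),
    ("X2", "!int"): ("X2", "X2"),
    ("X3", "?int"): (),
    ("X4", "?int"): (),
}
assert norms(P) == {"X1": 2, "X3": 1, "X4": 1}      # X2 is unnormed
assert prune(P)[("X2", "!int")] == ("X2",)          # as in Example 2
```

Symbols following an unnormed non-terminal can never be reached, so dropping them preserves bisimilarity while shortening the words the later stages must handle.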

*Building an expansion tree.* We base the third stage of the algorithm on the notion of *expansion tree* as proposed by Jančar and Moller [18], adapting an idea by Hirshfeld [11]. The *nodes* in trees are labelled by sets of pairs of words. We say that a node N′ is an *expansion* of N if N′ is a minimal set such that, for every pair $(\vec{X}, \vec{Y}) \in N$,

– if $\vec{X} \rightarrow a\vec{X}'$ then $\vec{Y} \rightarrow a\vec{Y}'$ with $(\vec{X}', \vec{Y}') \in N'$, and
– if $\vec{Y} \rightarrow a\vec{Y}'$ then $\vec{X} \rightarrow a\vec{X}'$ with $(\vec{X}', \vec{Y}') \in N'$.

An *expansion tree* is built from a root node: the singleton set containing the pair of start words obtained by translating the two types into a grammar. A child node is obtained from its parent node by expansion. However, as Jančar and Moller observed, expansions alone often lead to infinite trees. We then alternate between expansion and simplification operations, until either finding an empty node, in which case we decide equivalence positively, or failing to expand all nodes, in which case we decide equivalence negatively. We say that a branch is *successful* if it is infinite or finishes in an empty node; otherwise it is said to be *unsuccessful*.

In the *expansion step*, each node N derives a single child node, obtained as an expansion of N. As we are dealing with simple grammars, no branching is expected in the expansion tree at this step.
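For simple grammars the expansion of a node, when it exists, is unique and can be computed pair by pair. The Python sketch below illustrates this under assumed representations: words are tuples, productions a dictionary keyed by (non-terminal, terminal), and the names `step` and `expand_node` are hypothetical.

```python
def step(word, productions):
    """Transitions of a word: rewrite its leftmost non-terminal."""
    if not word:
        return {}
    head, tail = word[0], word[1:]
    return {a: rhs + tail
            for (y, a), rhs in productions.items() if y == head}

def expand_node(node, productions):
    """Return the expansion of node, or None if some pair cannot be matched."""
    child = set()
    for left, right in node:
        moves_l, moves_r = step(left, productions), step(right, productions)
        if moves_l.keys() != moves_r.keys():
            return None                   # one side offers an action the other lacks
        # Determinism: each action has exactly one matching pair of successors.
        child |= {(moves_l[a], moves_r[a]) for a in moves_l}
    return frozenset(child)

P = {("X", "a"): ("X",), ("Y", "a"): ("Y",), ("Z", "b"): ()}
assert expand_node({(("X",), ("Y",))}, P) == {(("X",), ("Y",))}
assert expand_node({(("X",), ("Z",))}, P) is None
assert expand_node({((), ())}, P) == frozenset()   # a pair of empty words vanishes
```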

The *simplification step* consists of the application of the following rules:

Reflexive rule: Omit from a node any pair of the form $(\vec{X}, \vec{X})$;

Congruence rule: Omit from a node N any pair that belongs to the least congruence containing the ancestors of N;

BPA1 rule: If $(X\_0\vec{X}, Y\_0\vec{Y})$ is in N and $(X\_0\vec{X}', Y\_0\vec{Y}')$ belongs to the ancestors of N, then create a sibling node for N replacing $(X\_0\vec{X}, Y\_0\vec{Y})$ by $(\vec{X}, \vec{X}')$ and $(\vec{Y}, \vec{Y}')$;

BPA2 rule: If $(X\_0\vec{X}, Y\_0\vec{Y})$ is in N and $X\_0$ and $Y\_0$ are normed, then:

Case $|X\_0| \leq |Y\_0|$: Let $\vec{a}$ be a minimal path for $X\_0$ and $\vec{Z}$ the word such that $Y\_0 \stackrel{\vec{a}}{\longrightarrow}\_{\mathcal{P}} \vec{Z}$. Add a sibling node for N including the pairs $(X\_0\vec{Z}, Y\_0)$ and $(\vec{X}, \vec{Z}\vec{Y})$ in place of $(X\_0\vec{X}, Y\_0\vec{Y})$;

Otherwise: Let $\vec{a}$ be a minimal path for $Y\_0$ and $\vec{Z}$ the word such that $X\_0 \stackrel{\vec{a}}{\longrightarrow}\_{\mathcal{P}} \vec{Z}$. Add a sibling node for N including the pairs $(X\_0, Y\_0\vec{Z})$ and $(\vec{Z}\vec{X}, \vec{Y})$ in place of $(X\_0\vec{X}, Y\_0\vec{Y})$.

Contrary to expansion and to the reflexive and congruence simplifications, BPA rules promote branching in the expansion tree. We iteratively apply the simplification rules to ensure the algorithm computes the simplest possible child nodes derived from N. We can easily show that the simplification function that results from applying the reflexive, congruence, and BPA rules has a fixed point in the complete partially ordered set of node-ancestors pairs, where the set of ancestors is fixed. The proof builds a partial order on the sets of node-ancestors pairs and uses Tarski's fixed point theorem [25]. The number of child nodes generated by the application of these rules is finite [6,18]. Notice that the sibling nodes do not exclude the (often) infinite branch resulting from successive expansions.

*Checking the bisimilarity of simple grammars.* Given a set of productions and two start words X and Y (all pruned), function bisimG alternates between simplification and expansion stages, starting with expansion. To avoid getting stuck in an infinite branch of the expansion tree, we use a breadth-first search on the expansion tree: node-ancestor pairs to be processed are stored in a queue. The initial pair inserted in the queue contains the initial node {(X, Y )} and an empty set of ancestors.

$$\mathsf{bisimG}(\vec{X}, \vec{Y}, \mathcal{P}) = \mathsf{expand}(\mathsf{singletonQueue}((\{(\vec{X}, \vec{Y})\}, \emptyset)), \mathcal{P})$$

Predicate expand terminates as soon as all nodes fail to expand (signalled by an empty queue), in which case the algorithm returns **False**, or an empty node is reached, in which case the algorithm returns **True**. Otherwise, it extracts the node n at the front of the queue, simplifies its child node, and recurs.

```
expand(q,P) =
  if empty(q) then False
  else (n, a) = front(q)
       if empty(n) then True
       else if hasChild(n,P)
            then expand(simplify({(child(n,P), a ∪ n)}, dequeue(q),P))
            else expand(dequeue(q),P)
```
The simplification stage distinguishes the case where all type variables are normed, in which case BPA1 is not required to decide equivalence [5,6], from the case where some type variables might be unnormed.

```
rules = if allProductionsNormed(P) then [reflex, congruence, bpa2]
       else [reflex, congruence, bpa1, bpa2]
```
Function simplify applies the various rules iteratively, until reaching a fixed point. The application of the rules (via function apply) produces a set of nodes that are then enqueued. The simplification stage does not introduce new levels in the tree, hence the set of ancestors na is passed to function apply as is.

simplify(na, q,P) = **fold**(enqueue, q, apply(na,rules,P))
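The breadth-first alternation between expansion and simplification can be sketched as follows. This is a minimal Python illustration, not the authors' Haskell implementation: `child` and `simplify` are hypothetical callbacks standing in for the expansion step and for the rule-based simplification, nodes are assumed to be frozensets of pairs, and the `max_steps` bound is added only so that the sketch always halts.

```python
from collections import deque


def expand(queue, child, simplify, max_steps=10_000):
    """Breadth-first traversal of an expansion tree.

    queue    -- deque of (node, ancestors) pairs, both frozensets of pairs
    child    -- hypothetical callback: node -> expanded node, or None when
                the node cannot be expanded
    simplify -- hypothetical callback: (node, ancestors) -> list of nodes
    Returns True when an empty node is reached, False when the queue runs
    empty, and None if the illustrative step bound is exhausted.
    """
    for _ in range(max_steps):
        if not queue:
            return False                  # every branch failed to expand
        node, ancestors = queue.popleft()
        if not node:
            return True                   # empty node: successful branch
        expanded = child(node)
        if expanded is None:
            continue                      # dead branch, simply dropped
        new_ancestors = ancestors | node  # the node becomes an ancestor
        for simplified in simplify(expanded, new_ancestors):
            queue.append((simplified, new_ancestors))
    return None
```

Passing the two steps as parameters keeps the traversal independent of how grammars and rules are represented, which is the only part the sketch tries to capture.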

*Example 3.* The expansion tree for our running example is in Figure 1. Once a successful branch is reached (marked in the figure), $\mathsf{bisimG}(\vec X, \vec Y, \mathcal P)$ returns **True**.

*Checking the bisimilarity of context-free session types.* Function bisimT decides the equivalence of two well-formed and renamed types, S and T. It starts by computing the start words for S and T by first translating S to a grammar and enriching this with the productions for type T. After pruning the productions in the grammar (function prune), the equivalence of S and T is decided using function bisimG.

$$\begin{aligned} \mathsf{bisimT}(S, T) &= \mathsf{bisimG}(\vec{X}, \vec{Y}, \mathsf{prune}(\mathcal{P})) \\ \text{where } (\vec{X}, \mathcal{P}') &= \mathsf{grm}(S, \emptyset) \\ (\vec{Y}, \mathcal{P}) &= \mathsf{grm}(T, \mathcal{P}') \end{aligned}$$

## 4 Correctness of the algorithm

In this section we prove that function bisimT is sound and complete with respect to the meta-theory of context-free session types. We start by showing a full abstraction result between context-free session types and grammars in Greibach Normal Form. Then, based on results from Caucal [5], Christensen, Hüttel, and Stirling [6], Jančar and Moller [18], we conclude that the algorithm we propose is sound and complete.

Fig. 1: An example of an expansion tree

*Type translation is fully abstract.* Sections 2 and 3 introduce bisimulation relations on the set T of types ∼<sup>T</sup> and on a given set P of productions ∼<sup>P</sup> . Our ultimate goal is to prove that we can faithfully analyze the bisimilarity of types by analyzing the bisimilarity of the corresponding grammars. For this purpose, we prove that the translation proposed in Section 3 is a *fully abstract encoding*, i.e., preserves the bisimilarity relation.

We start by showing that the transformation of types to grammars preserves the labelled transitions. The following result states that grammars produced by grm mimic the transitions of the corresponding types and vice versa.

Lemma 2. *Let $(\vec X, \mathcal P') = \mathsf{grm}(S, \emptyset)$ and $(\vec Y, \mathcal P) = \mathsf{grm}(T, \mathcal P')$. Then, $S \xrightarrow{a}_{\mathcal T} T$ if and only if $\vec X \xrightarrow{a}_{\mathcal P} \vec Y$.*

*Proof.* For the direct implication we proceed by rule induction on the hypothesis, using the definition of word.


For the reverse implication, we prove that any transition in the grammar leads to a transition in the corresponding types.


Lemma 3. *If $\mathsf{word}(S) \xrightarrow{a}_{\mathcal P} \vec X$, then there exists $T$ such that $S \xrightarrow{a}_{\mathcal T} T$ and $\vec X = \mathsf{word}(T)$.*

*Proof.* By induction on the definition of word. 

The main result of this subsection follows from Lemmas 2 and 3.

Theorem 1. *Let $(\vec X, \mathcal P') = \mathsf{grm}(S, \emptyset)$ and $(\vec Y, \mathcal P) = \mathsf{grm}(T, \mathcal P')$. Then, $\mathsf{grm}$ is a fully abstract encoding, i.e., $S \sim_{\mathcal T} T$ if and only if $\vec X \sim_{\mathcal P} \vec Y$.*

*Proof.* For the direct implication, assume that $S \sim_{\mathcal T} T$ and let $\mathcal B$ be a bisimulation for $S$ and $T$. Then, consider $\mathcal B' = \{(\mathsf{word}(S_0), \mathsf{word}(T_0)) \mid (S_0, T_0) \in \mathcal B\}$. Obviously, $(\mathsf{word}(S), \mathsf{word}(T)) \in \mathcal B'$. To prove that $\mathcal B'$ is a bisimulation, one assumes that $\mathsf{word}(S_0) \xrightarrow{a}_{\mathcal P} \vec X'$ and proves that there exists $\vec Y'$ such that $\mathsf{word}(T_0) \xrightarrow{a}_{\mathcal P} \vec Y'$ with $(\vec X', \vec Y') \in \mathcal B'$. This proof is done by coinduction on the definition of word and uses Lemmas 2 and 3 and the definition of $\mathcal B'$.

For the reverse implication, assume that $\vec X \sim_{\mathcal P} \vec Y$, with $\vec X = \mathsf{word}(S)$ and $\vec Y = \mathsf{word}(T)$, and let $\mathcal B'$ be a bisimulation for $\vec X$ and $\vec Y$. Then, consider $\mathcal B = \{(S_0, T_0) \mid (\mathsf{word}(S_0), \mathsf{word}(T_0)) \in \mathcal B'\}$. Notice that $(S, T) \in \mathcal B$. The proof that $\mathcal B$ is a bisimulation consists in showing that, given $(S_0, T_0) \in \mathcal B$ such that $S_0 \xrightarrow{a}_{\mathcal T} S_0'$, there exists $T_0'$ such that $T_0 \xrightarrow{a}_{\mathcal T} T_0'$ and $(S_0', T_0') \in \mathcal B$. The proof follows by rule coinduction on the LTS and uses Lemmas 2 and 3.

Now we sketch the proof that pruning grammars also preserves bisimulation. We distinguish the grammars in the context through the subscript of ∼.

Theorem 2. *$\vec X \sim_{\mathcal P} \vec Y$ if and only if $\vec X \sim_{\mathsf{prune}(\mathcal P)} \vec Y$.*

*Proof.* For the direct implication, the bisimulation for $\vec X$ and $\vec Y$ over $\mathcal P$ is also a bisimulation for $\vec X$ and $\vec Y$ over $\mathsf{prune}(\mathcal P)$. For the reverse implication, if $\mathcal B$ is a bisimulation for $\vec X$ and $\vec Y$ over $\mathsf{prune}(\mathcal P)$, then $\mathcal B' = \mathcal B \cup \{(VW, VWZ) \mid (W \to VWZ) \in \mathcal P,\ W \text{ unnormed}\}$ is a bisimulation for $\vec X$ and $\vec Y$ over $\mathcal P$.

*Correctness of the algorithm.* We now focus on the correctness of the function bisimG. Before proceeding to soundness, we recall the *safeness property* introduced by Jančar and Moller [18].

Lemma 4 (Safeness Property). *Given a set of productions $\mathcal P$, $\vec X \sim_{\mathcal P} \vec Y$ if and only if the expansion tree rooted at $\{(\vec X, \vec Y)\}$ has a successful branch.*

Notice that function bisimG builds an expansion tree by alternating between simplification—reflexive, congruence, and BPA—and expansion operations, as proposed by Jančar and Moller. These simplification rules are *safe* [18], in the sense that the application of any rule preserves the bisimulation from a parent node to at least one child node and, reciprocally, that bisimulation on a child node implies the bisimulation of its parent node.

While the safeness property is instrumental in proving soundness, the *finite witness property* is of utmost importance to prove completeness. This result follows immediately from the analysis by Jančar and Moller [18], which capitalizes on results by Caucal [5], and Christensen, Hüttel, and Stirling [6]:

Lemma 5 (Finite Witness Property). *Given a set of productions $\mathcal P$, if $\vec X \sim_{\mathcal P} \vec Y$ then the expansion tree rooted at $\{(\vec X, \vec Y)\}$ has a finite successful branch.*

We refer to Caucal [5] and to Christensen, Hüttel, and Stirling [6] for details on the proof of existence of a finite witness, as stated in Lemma 5. This proof is particularly interesting in that it highlights the importance of the BPA rules and of pruning productions in reaching such a (finite) witness. The results in these two papers also elucidate the reason for the distinction, in the simplification phase, between the cases where all the symbols in the grammar are and are not normed (cf. program variable rules, used by function simplify). The safeness and finite witness properties ensure the termination of the algorithm, its soundness, and its completeness.

Lemma 6 (Termination). *Let $(\vec X, \mathcal P') = \mathsf{grm}(S, \emptyset)$ and $(\vec Y, \mathcal P) = \mathsf{grm}(T, \mathcal P')$. Then, the computation of $\mathsf{bisimG}(\vec X, \vec Y, \mathsf{prune}(\mathcal P))$ always terminates.*

*Proof.* Start by noticing that $\mathsf{prune}(\mathcal P)$ always terminates. For bisimG itself, if $S \sim_{\mathcal T} T$ then, by Theorems 1 and 2, we have $\mathsf{word}(S) \sim_{\mathsf{prune}(\mathcal P)} \mathsf{word}(T)$ and thus the existence of a finite successful branch is ensured by the finite witness property (Lemma 5). Hence, breadth-first search eventually reaches that branch and terminates.

When $S \not\sim_{\mathcal T} T$, we easily conclude that all branches in the expansion tree are finite and thus $\mathsf{bisimG}(\vec X, \vec Y, \mathsf{prune}(\mathcal P))$ terminates. To conclude that all branches are finite, observe that any infinite branch is successful by definition; the safeness property would then imply $\mathsf{word}(S) \sim_{\mathsf{prune}(\mathcal P)} \mathsf{word}(T)$ and, by Theorems 1 and 2, we would have $S \sim_{\mathcal T} T$, a contradiction.

Lemma 7. *Let $(\vec X, \mathcal P') = \mathsf{grm}(S, \emptyset)$ and $(\vec Y, \mathcal P) = \mathsf{grm}(T, \mathcal P')$. If $\mathsf{bisimG}(\vec X, \vec Y, \mathsf{prune}(\mathcal P))$ returns* **True***, then $\vec X \sim_{\mathsf{prune}(\mathcal P)} \vec Y$.*

*Proof.* Function bisimG returns **True** whenever it reaches a (finite) successful branch in the expansion tree rooted at {(X, Y )}, i.e., a branch terminating in an empty node. Conclude with the safeness property, Lemma 4. 

From the previous results, the soundness of our algorithm is now immediate: the algorithm to check the bisimulation of context-free session types is sound with respect to the meta-theory of context-free session types.

Theorem 3 (Soundness). *Let $(\vec X, \mathcal P') = \mathsf{grm}(S, \emptyset)$ and $(\vec Y, \mathcal P) = \mathsf{grm}(T, \mathcal P')$. If $\mathsf{bisimG}(\vec X, \vec Y, \mathsf{prune}(\mathcal P))$ returns* **True***, then $S \sim_{\mathcal T} T$.*

*Proof.* From Theorem 1, Theorem 2, and Lemma 7. 

Given that the algorithm terminates (Lemma 6) and is sound (Theorem 3), we know that if $S \not\sim_{\mathcal T} T$, then $\mathsf{bisimG}(\vec X, \vec Y, \mathsf{prune}(\mathcal P))$ returns **False**, where $(\vec X, \mathcal P') = \mathsf{grm}(S, \emptyset)$ and $(\vec Y, \mathcal P) = \mathsf{grm}(T, \mathcal P')$. We now show that the algorithm to check the bisimulation of context-free session types is complete with respect to the meta-theory of context-free session types. The finite witness property is paramount to achieve this result.

Theorem 4 (Completeness). *Let $(\vec X, \mathcal P') = \mathsf{grm}(S, \emptyset)$ and $(\vec Y, \mathcal P) = \mathsf{grm}(T, \mathcal P')$. If $S \sim_{\mathcal T} T$, then $\mathsf{bisimG}(\vec X, \vec Y, \mathsf{prune}(\mathcal P))$ returns* **True***.*

*Proof.* Assume $S \sim_{\mathcal T} T$. By Theorems 1 and 2, we have $\vec X \sim_{\mathsf{prune}(\mathcal P)} \vec Y$. Hence, Lemma 5 ensures the existence of a finite successful branch in the expansion tree rooted at $\{(\vec X, \vec Y)\}$, i.e., a branch terminating in an empty node. Since our algorithm traverses the expansion tree using breadth-first search, it will eventually reach the empty node and conclude the bisimulation positively.

Theorem 4 ensures that if $\mathsf{bisimG}(\vec X, \vec Y, \mathsf{prune}(\mathcal P))$ returns **False**, then $S \not\sim_{\mathcal T} T$.

## 5 Evaluation

This section discusses the behaviour of our algorithm in the real world. Both for testing and for performance evaluation, we require test suites. We started with a carefully crafted, manually produced, suite of valid and invalid tests. This test suite was assembled by gathering pairs of types that emerged from examples we have studied and from programs we have written in FreeST, a programming language with context-free session types [2]. The tests produced by this method are, on the one hand, small, and, on the other hand, lacking diversity.

We then turned our attention to the automatic generation of test cases. Producing pairs of arbitrary (well-formed) types that share no variables is simple. However, the probability that a randomly generated pair of types turns out to be bisimilar is extremely low. For this reason, we generate arbitrary pairs of types that are bisimilar by construction. Theorem 5 naturally induces an algorithm: given a natural number n (the size of the pair), arbitrarily select for the base case (n = 0) one of the pairs in item 1 of the theorem and for the recursive case (n ≥ 1) one of the pairs in items 2–12.

Theorem 5 (Properties of type bisimilarity).

1. $\mathsf{skip} \sim_{\mathcal T} \mathsf{skip}$ and $B \sim_{\mathcal T} B$;
2. $S; T \sim_{\mathcal T} U; V$ if $S \sim_{\mathcal T} U$ and $T \sim_{\mathcal T} V$;
3. $\mu X.S \sim_{\mathcal T} \mu X.T$ if $S \sim_{\mathcal T} T$;
4. $\{\ell_i : S_i\}_{i \in I} \sim_{\mathcal T} \{\ell_i : T_i\}_{i \in I}$ if $(S_i \sim_{\mathcal T} T_i)_{i \in I}$;
5. $S \sim_{\mathcal T} T; \mathsf{skip}$ and $S \sim_{\mathcal T} \mathsf{skip}; T$ if $S \sim_{\mathcal T} T$;
6. $\{\ell_i : S_i\}_{i \in I}; U \sim_{\mathcal T} \{\ell_i : T_i; V\}_{i \in I}$ if $(S_i \sim_{\mathcal T} T_i)_{i \in I}$ and $U \sim_{\mathcal T} V$;
7. $T \sim_{\mathcal T} S$ if $S \sim_{\mathcal T} T$;
8. $R; (S; T) \sim_{\mathcal T} (U; V); W$ if $R \sim_{\mathcal T} U$, $S \sim_{\mathcal T} V$, and $T \sim_{\mathcal T} W$;
9. $\mu X.\mu Y.S \sim_{\mathcal T} \mu X.[X/Y]T \sim_{\mathcal T} \mu Y.[Y/X]T$ if $S \sim_{\mathcal T} T$;
10. $\mu X.S \sim_{\mathcal T} T$ if $S \sim_{\mathcal T} T$ and $X \notin \mathsf{free}(S)$;
11. $[U/X]S \sim_{\mathcal T} [V/X]T$ if $S \sim_{\mathcal T} T$ and $U \sim_{\mathcal T} V$;
12. $\mu X.S \sim_{\mathcal T} [\mu X.T/X]T$ if $S \sim_{\mathcal T} T$.

*Proof.* 1–3: Bisimulation is a congruence. 4–12: Thiemann and Vasconcelos [26] exhibit the appropriate bisimulations. 
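The generation scheme induced by Theorem 5 can be illustrated with a small sketch. The paper relies on QuickCheck; the Python code below is only an illustration under stated assumptions: types are encoded as nested tuples (a hypothetical encoding), only items 1, 2, 5, 7, and 8 are used, and the helper `atoms` is a sanity check that is valid for this recursion-free, choice-free fragment only.

```python
import random

SKIP = ("skip",)


def seq(s, t):
    """Sequential composition S;T in the hypothetical tuple encoding."""
    return ("seq", s, t)


def gen_pair(n, rng):
    """Return a pair of types of size n that is bisimilar by construction."""
    if n == 0:
        # Item 1: skip ~ skip, and a base message type is bisimilar to itself.
        base = rng.choice([SKIP, ("msg", "!Int"), ("msg", "?Bool")])
        return base, base
    rule = rng.choice(["seq", "skip", "sym", "assoc"])
    if rule == "seq":                       # item 2
        k = rng.randint(0, n - 1)
        s, u = gen_pair(k, rng)
        t, v = gen_pair(n - 1 - k, rng)
        return seq(s, t), seq(u, v)
    if rule == "skip":                      # item 5
        s, t = gen_pair(n - 1, rng)
        return rng.choice([(s, seq(t, SKIP)), (s, seq(SKIP, t))])
    if rule == "sym":                       # item 7
        s, t = gen_pair(n - 1, rng)
        return t, s
    a = rng.randint(0, n - 1)               # item 8
    b = rng.randint(0, n - 1 - a)
    r, u = gen_pair(a, rng)
    s, v = gen_pair(b, rng)
    t, w = gen_pair(n - 1 - a - b, rng)
    return seq(r, seq(s, t)), seq(seq(u, v), w)


def atoms(t):
    """Flatten a type of this fragment into its sequence of non-skip atoms."""
    if t[0] == "seq":
        return atoms(t[1]) + atoms(t[2])
    return [] if t == SKIP else [t]
```

In this fragment two types are bisimilar exactly when their atom sequences coincide, so `atoms(s) == atoms(t)` should hold for every generated pair; the remaining items of the theorem (recursion, choices, substitution) would need a genuine bisimilarity check instead.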

For evaluating the algorithm on non-bisimilar pairs we add the following five anti-axioms to the list in Theorem 5: (1) $\mathsf{skip} \not\sim_{\mathcal T} B$; (2) $?B \not\sim_{\mathcal T}\ !B$; (3) $\mathsf{skip} \not\sim_{\mathcal T} \{\ell_i : S_i\}_{i \in I}$; (4) $\oplus\{\ell_i : S_i\}_{i \in I} \not\sim_{\mathcal T} \&\{\ell_i : S_i\}_{i \in I}$; (5) $\{\ell_i : S_i\}_{i \in I} \not\sim_{\mathcal T} \{\ell_j : S_j\}_{j \in J}$ where $I \subset J$. We generate two types using the same methodology as for the positive case and then discard the data collected when the pair turns out to be bisimilar. This produces pairs of types that are much closer than those obtained by random generation, thus hopefully approaching the reality that compilers face in production.

We used QuickCheck [7] to generate two test suites. That for bisimilar pairs is constructed based on Theorem 5, whereas the construction of non-bisimilar tests relies on Theorem 5 plus the anti-axioms above. Both test suites comprise 2000 entries, featuring types with a number of nodes (in the syntax tree) ranging from 1 to 200.

The base algorithm described in the previous section turns out to behave quite poorly. We then implemented the following variants.


Fig. 2: (a) Distribution of the execution time per variant. Results on the test suite of bisimilar pairs of types are shown in blue and results on the test suite of non-bisimilar pairs in orange. Time is in milliseconds. Scales of 2a and 2c are logarithmic; scale of 2b is linear.

3. *Using a double-ended queue to prepend promising children.* A double-ended queue allows prioritizing nodes with potential to reach an empty node faster. The algorithm prepends (rather than appends) empty nodes or nodes whose pairs $(\vec X, \vec Y)$ are such that $|\vec X| \le 1$ and $|\vec Y| \le 1$. This procedure does not compromise soundness, completeness, or termination because the number of terminal symbols is finite and the algorithm takes advantage of the reflexive and congruence rules to remove previously visited nodes from the queue.
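The prioritisation in variant 3 can be sketched with Python's `collections.deque`. This is an illustration rather than the authors' Haskell code: the names `is_promising` and `enqueue` are hypothetical, words are represented as sequences, and $|\cdot|$ is read here as the length of a word, which is an assumption of the sketch.

```python
from collections import deque


def is_promising(node):
    """True for an empty node or one whose pairs only hold words of length <= 1."""
    return all(len(x) <= 1 and len(y) <= 1 for x, y in node)


def enqueue(queue, node, ancestors):
    """Prepend promising nodes, append all the others."""
    if is_promising(node):
        queue.appendleft((node, ancestors))
    else:
        queue.append((node, ancestors))
```

Since `popleft` serves the front of the deque, promising nodes are explored before older, larger nodes, while the remaining nodes keep their breadth-first order.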

To better understand how the algorithm performs in practice, we tested all the optimisations and their combinations. We evaluate each variant 1–3 individually (denoted by B1–B3) and all their combinations. For instance, B12 denotes the variant obtained from combining optimisations 1 and 2 above. B stands for the base algorithm, bisimT. We implemented the base algorithm and its variants in Haskell, using the Glasgow Haskell Compiler (version 8.6.5). The evaluation was conducted on a machine with an Intel Core i7-6700K at 4.2GHz and 8 GB of RAM running Arch Linux; tests were run under a timeout of 2 minutes.

Figure 2a depicts the distribution of the execution times (in ms) for both test suites and all variants. We observe that the behaviour of the negative tests is roughly the same in all variants. However, the execution time for the positive tests differs from variant to variant. These differences mainly depend on the trade-off between the computational effort required by each optimisation and the efficiency it brings to deciding the equivalence of grammars. We observe that including optimisation 1 improves the execution time, while the others, in general, do not. Combining optimisations has a positive impact on execution time, with the exception of the B23 variant, whose distribution is worse than that of the base case.

Figure 2b shows the number of timeouts for each variant. The base case, B, has 146 positive tests whose execution time exceeds 2 minutes. The distribution of timeouts per variant is consistent with the runtime behaviour shown in Figure 2a. All combinations lead to a reduction in the number of timeouts when compared to the base case.

Variant B1, resulting from considering optimisation 1, performs better than all others, presenting a median of 1.4 milliseconds and 7 timeouts, both for the positive tests. By taking advantage of optimisation 1, the number of timeouts is reduced by 95%. The remaining positive tests take, on average, 1863.38 ms to complete with the base algorithm and 195.68 ms with variant B1, resulting in an 89% reduction in the execution time. This is the variant in production for the FreeST compiler [2].
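Both percentages can be checked directly from the reported figures (146 and 7 timeouts; 1863.38 ms and 195.68 ms on average):

```latex
\[
\frac{146 - 7}{146} = \frac{139}{146} \approx 0.952,
\qquad
\frac{1863.38 - 195.68}{1863.38} = \frac{1667.70}{1863.38} \approx 0.895 .
\]
```

The first ratio rounds to the stated 95%; the second is about 89.5%, which the text reports as 89%.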

The distribution of the execution time of B1 against the size of the input types is depicted in Figure 2c. As expected, the execution time increases considerably with the number of nodes. Although we have carried out tests with a fairly large number of nodes in the abstract syntax trees, we remark that, when used in a compiler, the algorithm will mostly come across types with a reduced number of nodes.

## 6 Conclusion

Context-free session types are a promising tool to describe protocols in concurrent programs. In order to be incorporated in programming languages and effectively used in compilers, a practical algorithm to decide bisimulation is called for. Taking advantage of a process algebra graph representation of types to decide bisimulation [12,13], we developed one such algorithm and proved it correct. The algorithm is incorporated in a compiler for a concurrent functional language equipped with context-free session types [2].

Possible extensions to this work include addressing higher-order session types. We also plan to extend the implementation of the algorithm to cope with context-free grammars in Greibach Normal Form that are not necessarily deterministic.

*Acknowledgements.* We thank Alcides Fonseca for helping with the testing process, and Filipe Casal, Alexandra Silva, and Peter Thiemann for comments and discussions. This work was supported by FCT through the LASIGE Research Unit, ref. UIDB/00408/2020 and by Cost Action CA15123 EUTypes.

## References



## **Sharp Congruences Adequate with Temporal Logics Combining Weak and Strong Modalities**

Frédéric Lang<sup>1</sup>, Radu Mateescu<sup>1</sup>, and Franco Mazzanti<sup>2</sup>

<sup>1</sup> Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP<sup>⋆</sup>, LIG, 38000 Grenoble, France {Frederic.Lang,Radu.Mateescu}@inria.fr <sup>2</sup> ISTI-CNR, Pisa, Italy Franco.Mazzanti@isti.cnr.it

**Abstract.** We showed in a recent paper that, when verifying a modal μ-calculus formula, the actions of the system under verification can be partitioned into sets of so-called weak and strong actions, depending on the combination of weak and strong modalities occurring in the formula. In a compositional verification setting, where the system consists of processes executing in parallel, this partition allows us to decide whether each individual process can be minimized for either divergence-preserving branching (if the process contains only weak actions) or strong (otherwise) bisimilarity, while preserving the truth value of the formula. In this paper, we refine this idea by devising a family of bisimilarity relations, named sharp bisimilarities, parameterized by the set of strong actions. We show that these relations have all the nice properties necessary to be used for compositional verification, in particular congruence and adequacy with the logic. We also illustrate their practical utility on several examples and case-studies, and report about our success in the RERS 2019 model checking challenge.

**Keywords:** Bisimulation · Concurrency · Model checking · Mu-calculus.

## **1 Introduction**

This paper deals with the verification of action-based, branching-time temporal properties expressible in the modal μ-calculus (Lμ) [31] on concurrent systems consisting of processes composed in parallel, usually described in languages with process algebraic flavour. A well-known problem is the state-space explosion that happens when the system state space exceeds the available computer memory.

Compositional verification is a set of techniques and tools that have proven efficient to palliate state-space explosion in many case studies [18]. They may either focus on the construction of the state space reduced for some equivalence relation, such as compositional state space construction [24, 32, 36, 43, 45–47], or on the decomposition of the full system verification into the verification of (expectedly smaller) subsystems, such as compositional reachability analysis [49, 10], assume-guarantee reasoning [41], or partial model checking [1, 34].

<sup>⋆</sup> Institute of Engineering Univ. Grenoble Alpes

In this paper, we focus on property-dependent compositional state space construction, where the reduction to be applied to the system is obtained by analysing the property under verification. We will refine the approach of [37] which, given a formula $\varphi$ of $L_\mu$ to be verified, shows how to extract from $\varphi$ a maximal hiding set of actions and a reduction (minimization for either strong [40] or divergence-preserving<sup>3</sup> branching bisimilarity [20, 23], divbranching for short) that preserves the truth value of $\varphi$. The reduction is chosen according to whether $\varphi$ belongs to an $L_\mu$ fragment named $L_\mu^{dbr}$, which is adequate with divbranching bisimilarity. This fragment consists of $L_\mu$ restricted to *weak* modalities, which match actions preceded by (property-preserving) sequences of hidden actions, as opposed to traditional strong modalities $\langle\alpha\rangle\varphi_0$ and $[\alpha]\varphi_0$, which match only a single action satisfying $\alpha$. If $\varphi$ belongs to $L_\mu^{dbr}$, then the system can be reduced for divbranching bisimilarity; otherwise, it can be reduced for strong bisimilarity, the weakest congruence preserving full $L_\mu$. We call this approach of [37] the mono-bisimulation approach.

We refine the mono-bisimulation approach in [35], by handling the case of $L_\mu$ formulas containing both strong and weak modalities. To do so, fragments named $L_\mu^{strong}(A_s)$ extend $L_\mu^{dbr}$ with strong modalities matching only the actions belonging to a given set $A_s$ of *strong* actions. This induces a partition of the parallel processes into those containing at least one strong action and those not containing any, so that a formula $\varphi \in L_\mu^{strong}(A_s)$ is still preserved if the processes containing strong actions are reduced for strong bisimilarity and the other ones for divbranching bisimilarity. We call this refined approach the combined bisimulations approach. Guidelines are also provided in [35] to extract a set of strong actions from particular $L_\mu$ formulas encoding the operators of widely-used temporal logics, such as CTL [11], ACTL [39], PDL [15], and PDL-Δ [44]. This approach is implemented on top of the CADP verification toolbox [19], and experiments show that it can improve the capabilities of compositional verification on realistic case studies, possibly reducing state spaces by orders of magnitude.

In this paper, we extend these results as follows: (1) We refine the approach by devising a family of new bisimilarity relations, called *sharp bisimilarities*, parameterized by the set of strong actions $A_s$. They are hybrid between strong and divbranching bisimilarities, where strong actions are handled as in strong bisimilarity whereas weak actions are handled as in divbranching bisimilarity. (2) We show that each fragment $L_\mu^{strong}(A_s)$ is adequate with the corresponding sharp bisimilarity, namely, $L_\mu^{strong}(A_s)$ is precisely the set of properties that are preserved by sharp bisimilarity (w.r.t. $A_s$) on all systems. (3) We show that, similarly to strong and divbranching bisimilarities, every sharp bisimilarity is a congruence for parallel composition, which enables it to be used soundly in a compositional verification setting. (4) We define an efficient state space reduction algorithm that preserves sharp bisimilarity and has the same worst-case complexity as divbranching minimization. Although it is not a minimization (i.e., sharp bisimilar states may remain distinguished in the reduced state space), it coincides with divbranching minimization whenever the process it is applied to does not contain strong actions, and with strong minimization in the worst case. Therefore, applying this reduction compositionally always yields state space reduction at least as good as [35], which itself is an improvement over [37]. (5) Finally, we illustrate our approach on case studies and compare our new results with those of [35, 37]. We also report on our recent success in the RERS 2019 challenge, which was obtained thanks to this new approach.

<sup>3</sup> In [18, 37], the name divergence-sensitive is used instead of divergence-preserving branching bisimulation (or branching bisimulation with explicit divergences) [20, 23]. This could lead to confusion with the relation defined in [13], also called divergence-sensitive but slightly different from the former relation. To be consistent in notations, we replace by dbr the abbreviation dsbr used in earlier work.

The paper is organized as follows: Sections 2 and 3 introduce the necessary background about process descriptions and temporal logic. Section 4 defines sharp bisimilarity, states its adequacy with $L_\mu^{strong}(A_s)$, and its congruence property for parallel composition. Section 5 presents the reduction algorithm and shows that it is correct and efficient. Section 6 illustrates our new approach on the case studies. Section 7 discusses related work. Finally, Section 8 concludes and discusses research directions for the future. The proofs of all theorems presented in this paper and a detailed description of how we tackled the RERS 2019 challenge are available in a Zenodo archive.<sup>4</sup>

## **2 Processes, Compositions, and Reductions**

We consider systems of processes whose behavioural semantics can be represented using an LTS (*Labelled Transition System*).

**Definition 1 (LTS).** *Let $\mathcal A$ be an infinite set of actions including the invisible action $\tau$ and visible actions $\mathcal A \setminus \{\tau\}$. An LTS $P$ is a tuple $(\Sigma, A, \longrightarrow, p_{init})$, where $\Sigma$ is a set of states, $A \subseteq \mathcal A$ is a set of actions, ${\longrightarrow} \subseteq \Sigma \times A \times \Sigma$ is the (labelled) transition relation, and $p_{init} \in \Sigma$ is the initial state. We may write $\Sigma_P$, $A_P$, $\longrightarrow_P$ for the sets of states, actions, and transitions of an LTS $P$, and $init(P)$ for its initial state. We assume that $P$ is finite and write $|P|_{st}$ (resp. $|P|_{tr}$) for the number of states (resp. transitions) of $P$. We write $p \xrightarrow{a} p'$ for $(p, a, p') \in {\longrightarrow}$ and $p \xrightarrow{A}$ for $(\exists p' \in \Sigma_P, a \in A)\ p \xrightarrow{a} p'$.*

LTS can be composed in parallel and their actions may be abstracted away using the parallel composition and action mapping defined below, of which action hiding, cut (also known as restriction), and renaming are particular cases.

**Definition 2 (Parallel composition of LTS).** *Let $P$, $Q$ be LTS and $A_{sync} \subseteq \mathcal A \setminus \{\tau\}$. The parallel composition of $P$ and $Q$ with synchronization on $A_{sync}$, written "$P\ |[A_{sync}]|\ Q$", is defined as $(\Sigma_P \times \Sigma_Q, A_P \cup A_Q, \longrightarrow, (init(P), init(Q)))$, where $(p, q) \xrightarrow{a} (p', q')$ if and only if (1) $p \xrightarrow{a}_P p'$, $q' = q$, and $a \notin A_{sync}$, or (2) $p' = p$, $q \xrightarrow{a}_Q q'$, and $a \notin A_{sync}$, or (3) $p \xrightarrow{a}_P p'$, $q \xrightarrow{a}_Q q'$, and $a \in A_{sync}$.*
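The three cases of Definition 2 translate almost literally into code. The following Python sketch is an illustration only: it assumes an LTS is represented as a tuple `(states, actions, transitions, init)` whose transitions are `(source, action, target)` triples, and the function name `compose` is hypothetical.

```python
def compose(p, q, sync):
    """Parallel composition P |[sync]| Q over the full product of states."""
    states_p, actions_p, trans_p, init_p = p
    states_q, actions_q, trans_q, init_q = q
    trans = set()
    for (s, a, s2) in trans_p:
        if a not in sync:
            # Case (1): P moves alone, Q stays in place.
            for t in states_q:
                trans.add(((s, t), a, (s2, t)))
        else:
            # Case (3): both processes move together on a synchronized action.
            for (t, b, t2) in trans_q:
                if b == a:
                    trans.add(((s, t), a, (s2, t2)))
    for (t, a, t2) in trans_q:
        if a not in sync:
            # Case (2): Q moves alone, P stays in place.
            for s in states_p:
                trans.add(((s, t), a, (s, t2)))
    states = {(s, t) for s in states_p for t in states_q}
    return states, actions_p | actions_q, trans, (init_p, init_q)
```

As in the definition, the result ranges over the whole product $\Sigma_P \times \Sigma_Q$; a practical tool would typically restrict it to the states reachable from the initial pair.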

<sup>4</sup> https://doi.org/10.5281/zenodo.3470930

**Definition 3 (Action mapping).** *Let $P$ be an LTS and $\rho : A_P \to 2^{\mathcal A}$ be a total function. We write $\rho(A_P)$ for the image of $\rho$, defined by $\bigcup_{a \in A_P} \rho(a)$. We write $\rho(P)$ for the LTS $(\Sigma_P, \rho(A_P), \longrightarrow, init(P))$ where ${\longrightarrow} = \{(p, a', p') \mid (\exists a \in A_P)\ p \xrightarrow{a}_P p' \land a' \in \rho(a)\}$. An action mapping $\rho$ is admissible if $\tau \in A_P$ implies $\rho(\tau) = \{\tau\}$. We distinguish the following admissible action mappings:*


Parallel composition and action mapping subsume all abstraction and composition operators encodable as *networks of LTS* [42, 18, 33], such as synchronization vectors<sup>5</sup> and the parallel composition, hiding, renaming, and cut operators of CCS [38], CSP [8], mCRL [26], LOTOS [29], E-LOTOS [30], and LNT [9].
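Definition 3 can be illustrated in the same style. The sketch below assumes the tuple representation `(states, actions, transitions, init)` for LTS and a dictionary from actions to sets of actions for $\rho$; the names `TAU`, `apply_mapping`, and `hide` are hypothetical, and `hide` encodes the usual reading of hiding (hidden actions are mapped to $\{\tau\}$, all others to themselves), which is an assumption here.

```python
TAU = "tau"


def apply_mapping(lts, rho):
    """Apply an admissible action mapping rho (dict: action -> set of actions)."""
    states, actions, trans, init = lts
    if TAU in actions and rho[TAU] != {TAU}:
        raise ValueError("rho is not admissible: tau must map to {tau}")
    new_actions = set().union(*(rho[a] for a in actions))
    # Each transition is replaced by one transition per image action;
    # an action mapped to the empty set disappears altogether.
    new_trans = {(s, b, t) for (s, a, t) in trans for b in rho[a]}
    return states, new_actions, new_trans, init


def hide(lts, hidden):
    """Hiding as a particular mapping: hidden actions become tau."""
    _, actions, _, _ = lts
    rho = {a: ({TAU} if a in hidden else {a}) for a in actions}
    rho[TAU] = {TAU}
    return apply_mapping(lts, rho)
```

Mapping an action to several actions duplicates its transitions, and mapping it to the empty set removes them, which is how a single operator can cover renaming, hiding, and cut.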

LTS can be compared and reduced modulo well-known bisimilarity relations, such as strong [40] and (div)branching [20, 23] bisimilarity. We do not give their definitions, which can easily be found elsewhere (e.g., [35]). They are special cases of Definition 7, as shown by Theorem 1. We write $\sim$ (resp. $\sim_{dbr}$) for the strong (resp. divbranching) bisimilarity relation between states. We write $min_{str}(P)$ (resp. $min_{dbr}(P)$) for the quotient of $P$ w.r.t. strong (resp. divbranching) bisimilarity, i.e., the LTS obtained by replacing each state by its equivalence class. The quotient is the smallest LTS of its equivalence class, thus computing the quotient is called minimization. Moreover, these bisimilarities are congruences for parallel composition and admissible action mapping. This allows reductions to be applied at any intermediate step during LTS construction, thus potentially reducing the overall cost. However, since processes may constrain each other by synchronization, composing LTS pairwise following the algebraic structure of the composition expression and applying reduction after each composition can be orders of magnitude less efficient than other strategies in terms of the largest intermediate LTS. Finding an optimal strategy is impossible, as it requires knowing the size of (the reachable part of) an LTS product without actually computing the product. One generally relies on heuristics to select a subset of LTS to compose at each step of LTS construction. In this paper, we will use the *smart reduction* heuristic [12, 18], which is implemented within the SVL [17] tool of CADP [19]. This heuristic tries to find an efficient composition order by analysing the synchronization and hiding structure of the composition.
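To make the notion of quotient concrete, here is a naive signature-based refinement for strong bisimilarity. It is a textbook-style sketch, far less efficient than the algorithms used in tools such as CADP, and it assumes the tuple representation `(states, actions, transitions, init)`; the function names are hypothetical.

```python
def strong_bisim_classes(lts):
    """Map each state to the index of its strong-bisimilarity class."""
    states, _, trans, _ = lts
    block = {s: 0 for s in states}
    while True:
        # Signature of a state: its current block plus the set of
        # (action, target block) pairs it can reach in one step.
        sig = {
            s: (block[s], frozenset((a, block[t]) for (p, a, t) in trans if p == s))
            for s in states
        }
        ids = {}
        new_block = {s: ids.setdefault(sig[s], len(ids)) for s in states}
        if len(ids) == len(set(block.values())):
            return new_block          # no block was split: partition is stable
        block = new_block


def min_str(lts):
    """Quotient of an LTS with respect to strong bisimilarity."""
    states, actions, trans, init = lts
    block = strong_bisim_classes(lts)
    q_trans = {(block[p], a, block[t]) for (p, a, t) in trans}
    return set(block.values()), actions, q_trans, block[init]
```

Each round only splits blocks, so the loop stops after at most as many rounds as there are states.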

<sup>5</sup> For instance, the composition of P and Q where action a of P synchronizes with either b or c of Q can be written as ρ(P) |[b, c]| Q, where ρ maps a onto {b, c}. This example illustrates the usefulness of mapping actions onto sets of actions of arbitrary size.
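The composition ρ(P) |[b, c]| Q can be sketched as follows. The encoding is an assumption made for illustration: transitions are `(source, label, target)` triples, `rho` is a dictionary from actions to sets of actions (actions absent from the dictionary are kept unchanged), and the parallel operator forces synchronization on the labels in `sync` and interleaves all others. The formal definitions of the paper should be preferred where they differ.

```python
def apply_mapping(transitions, rho):
    """Action mapping: replace label a by every label in rho[a].

    Actions not listed in rho are kept; an empty set removes the transition.
    """
    return {(p, b, q) for (p, a, q) in transitions for b in rho.get(a, {a})}

def parallel(trans_p, init_p, trans_q, init_q, sync):
    """Reachable part of P |[sync]| Q, with product states encoded as pairs."""
    init = (init_p, init_q)
    seen, todo, result = {init}, [init], set()
    while todo:
        (p, q) = todo.pop()
        moves = set()
        for (s, a, p2) in trans_p:
            if s == p and a not in sync:
                moves.add((a, (p2, q)))          # P moves alone
        for (s, a, q2) in trans_q:
            if s == q and a not in sync:
                moves.add((a, (p, q2)))          # Q moves alone
        for (s, a, p2) in trans_p:
            for (t, b, q2) in trans_q:
                if s == p and t == q and a == b and a in sync:
                    moves.add((a, (p2, q2)))     # synchronization on a
        for (a, target) in moves:
            result.add(((p, q), a, target))
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return seen, result, init
```

With P offering a single a-transition and `rho = {"a": {"b", "c"}}`, the mapped process offers both b and c, so it can synchronize with either action of Q.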

## **3 Temporal Logics**

**Definition 4 (Modal** *μ***-calculus [31]).** *The modal* μ*-calculus (*Lμ*) is built from action formulas* α *and state formulas* ϕ*, whose syntax and semantics w.r.t. an LTS* P = (Σ, A, −→, pinit) *are defined as follows:*

$$\begin{array}{rlll}
\alpha ::= & a & [\![a]\!]\_A &= \{a\}\\
\mid & \mathsf{false} & [\![\mathsf{false}]\!]\_A &= \emptyset\\
\mid & \alpha\_1 \lor \alpha\_2 & [\![\alpha\_1 \lor \alpha\_2]\!]\_A &= [\![\alpha\_1]\!]\_A \cup [\![\alpha\_2]\!]\_A\\
\mid & \neg \alpha\_0 & [\![\neg \alpha\_0]\!]\_A &= A \setminus [\![\alpha\_0]\!]\_A\\
\varphi ::= & \mathsf{false} & [\![\mathsf{false}]\!]\_P\,\delta &= \emptyset\\
\mid & \varphi\_1 \lor \varphi\_2 & [\![\varphi\_1 \lor \varphi\_2]\!]\_P\,\delta &= [\![\varphi\_1]\!]\_P\,\delta \cup [\![\varphi\_2]\!]\_P\,\delta\\
\mid & \neg \varphi\_0 & [\![\neg \varphi\_0]\!]\_P\,\delta &= \Sigma \setminus [\![\varphi\_0]\!]\_P\,\delta\\
\mid & \langle \alpha \rangle\,\varphi\_0 & [\![\langle \alpha \rangle\,\varphi\_0]\!]\_P\,\delta &= \{ p \in \Sigma \mid \exists\, p \stackrel{a}{\longrightarrow} p'.\; a \in [\![\alpha]\!]\_A \land p' \in [\![\varphi\_0]\!]\_P\,\delta \}\\
\mid & X & [\![X]\!]\_P\,\delta &= \delta(X)\\
\mid & \mu X.\varphi\_0 & [\![\mu X.\varphi\_0]\!]\_P\,\delta &= \bigcup\_{k \geq 0} {\Phi\_0}\_{P,\delta}^{\,k}(\emptyset)
\end{array}$$

*where* X ∈ 𝒳 *are propositional variables denoting sets of states,* δ : 𝒳 → 2<sup>Σ</sup> *is a context mapping propositional variables to sets of states,* [ ] *is the empty context,* δ[U/X] *is the context identical to* δ *except for variable* X*, which is mapped to state set* U*, and the functional* Φ<sub>0</sub><sub>P,δ</sub> : 2<sup>Σ</sup> → 2<sup>Σ</sup> *associated to the formula* μX.ϕ<sub>0</sub> *is defined as* Φ<sub>0</sub><sub>P,δ</sub>(U) = [[ϕ<sub>0</sub>]]<sub>P</sub> δ[U/X]*. For closed formulas, we write* P |= ϕ *(read* P *satisfies* ϕ*) for* p<sub>init</sub> ∈ [[ϕ]]<sub>P</sub> [ ]*.*

Action formulas α are built from actions and Boolean operators. State formulas ϕ are built from Boolean operators, the possibility modality ⟨α⟩ ϕ<sub>0</sub> denoting the states with an outgoing transition labelled by an action satisfying α and leading to a state satisfying ϕ<sub>0</sub>, and the minimal fixed point operator μX.ϕ<sub>0</sub> denoting the least solution of the equation X = ϕ<sub>0</sub> interpreted over 2<sup>Σ</sup>.
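The semantics of Definition 4 can be turned into a small evaluator for finite LTS. The sketch below is an illustration under assumptions: formulas are encoded as nested tuples, an action formula is given directly as the set of actions it denotes, and the least fixed point is computed by iterating the functional from the empty set, which terminates on finite LTS for syntactically monotonic formulas.

```python
def eval_formula(phi, states, transitions, env=None):
    """Return the set of states satisfying the state formula phi.

    Hypothetical encoding of formulas as nested tuples:
      ("false",), ("or", f, g), ("not", f), ("dia", acts, f),
      ("var", "X"), ("mu", "X", f)
    where acts is the set of actions denoted by the action formula.
    """
    env = env or {}
    kind = phi[0]
    if kind == "false":
        return set()
    if kind == "or":
        return (eval_formula(phi[1], states, transitions, env)
                | eval_formula(phi[2], states, transitions, env))
    if kind == "not":
        return set(states) - eval_formula(phi[1], states, transitions, env)
    if kind == "dia":
        target = eval_formula(phi[2], states, transitions, env)
        return {p for (p, a, q) in transitions if a in phi[1] and q in target}
    if kind == "var":
        return set(env[phi[1]])
    if kind == "mu":
        # Iterate the functional from the empty set until it stabilizes.
        current = set()
        while True:
            nxt = eval_formula(phi[2], states, transitions,
                               {**env, phi[1]: current})
            if nxt == current:
                return current
            current = nxt
    raise ValueError(f"unknown operator: {kind}")
```

For instance, the formula μX.(⟨a⟩true ∨ ⟨τ⟩X) is encoded as `("mu", "X", ("or", ("dia", {"a"}, ("not", ("false",))), ("dia", {"tau"}, ("var", "X"))))`.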

The usual derived operators are defined as follows: Boolean connectors true = ¬false and ϕ<sub>1</sub> ∧ ϕ<sub>2</sub> = ¬(¬ϕ<sub>1</sub> ∨ ¬ϕ<sub>2</sub>); necessity modality [α] ϕ<sub>0</sub> = ¬⟨α⟩ ¬ϕ<sub>0</sub>; and maximal fixed point operator νX.ϕ<sub>0</sub> = ¬μX.¬ϕ<sub>0</sub>[¬X/X], where ϕ<sub>0</sub>[¬X/X] is the syntactic substitution of X by ¬X in ϕ<sub>0</sub>. Syntactically, ⟨⟩ and [] have the highest precedence, followed by ∧, then ∨, and finally μ and ν. To have a well-defined semantics, state formulas are syntactically monotonic [31], i.e., in every subformula μX.ϕ<sub>0</sub>, all occurrences of X in ϕ<sub>0</sub> fall in the scope of an even number of negations. Thus, negations can be eliminated by downward propagation. We now introduce the weak modalities of the fragment L<sup>dbr</sup><sub>μ</sub>, proposed in [37].

**Definition 5 (Modalities of** *L<sup>dbr</sup><sub>μ</sub>* **[37]).** *We write* α<sub>τ</sub> *for an action formula such that* τ ∈ [[α<sub>τ</sub>]]<sub>A</sub> *and* α<sub>a</sub> *for an action formula such that* τ ∉ [[α<sub>a</sub>]]<sub>A</sub>*. We consider the following modalities, their* L<sub>μ</sub> *semantics, and their informal semantics:*


**Ultra-weak:** p *is source of a path whose transition labels satisfy* α<sup>τ</sup> *, leading to a state that satisfies* ϕ2*, while traversing only states that satisfy* ϕ1*.*


We also consider the three dual modalities [(ϕ<sub>1</sub>?.α<sub>τ</sub>)∗] ϕ<sub>2</sub> = ¬⟨(ϕ<sub>1</sub>?.α<sub>τ</sub>)∗⟩ ¬ϕ<sub>2</sub>, [(ϕ<sub>1</sub>?.α<sub>τ</sub>)∗.ϕ<sub>1</sub>?.α<sub>a</sub>] ϕ<sub>2</sub> = ¬⟨(ϕ<sub>1</sub>?.α<sub>τ</sub>)∗.ϕ<sub>1</sub>?.α<sub>a</sub>⟩ ¬ϕ<sub>2</sub>, and [ϕ<sub>1</sub>?.α<sub>τ</sub>] ⊣ = ¬⟨ϕ<sub>1</sub>?.α<sub>τ</sub>⟩ @. The fragment L<sup>dbr</sup><sub>μ</sub> adequate with divbranching bisimilarity consists of L<sub>μ</sub> in which the modalities ⟨a⟩ ϕ and [a] ϕ are replaced by the ultra-weak, weak, and weak infinite looping modalities defined above.
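Read as PDL-style regular modalities, the three diamond modalities unfold into plain L<sub>μ</sub> fixed points. The equations below are a sketch derived from the usual identities ⟨β∗⟩ϕ = μX.(ϕ ∨ ⟨β⟩X), ⟨ϕ<sub>1</sub>?.α⟩ϕ = ϕ<sub>1</sub> ∧ ⟨α⟩ϕ, and ⟨β⟩@ = νX.⟨β⟩X, which are assumed here rather than quoted from [37]; recall that μ and ν have the lowest precedence.

$$\begin{aligned}
\langle (\varphi\_1 ?. \alpha\_\tau)^\* \rangle \,\varphi\_2 &= \mu X. \,\varphi\_2 \lor (\varphi\_1 \land \langle \alpha\_\tau \rangle X) && \text{(ultra-weak)}\\
\langle (\varphi\_1 ?. \alpha\_\tau)^\*. \varphi\_1 ?. \alpha\_a \rangle \,\varphi\_2 &= \mu X. \,\varphi\_1 \land (\langle \alpha\_a \rangle \varphi\_2 \lor \langle \alpha\_\tau \rangle X) && \text{(weak)}\\
\langle \varphi\_1 ?. \alpha\_\tau \rangle \,\mathsf{@} &= \nu X. \,\varphi\_1 \land \langle \alpha\_\tau \rangle X && \text{(weak infinite looping)}
\end{aligned}$$

The second line follows from the first by taking ϕ<sub>1</sub> ∧ ⟨α<sub>a</sub>⟩ϕ<sub>2</sub> as the target formula and factoring out ϕ<sub>1</sub>.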

We identify fragments of L<sub>μ</sub> parameterized by a set of strong actions A<sub>s</sub>: each fragment is the set of state formulas in which the action formulas contained in strong modalities are satisfied only by actions of A<sub>s</sub>.

**Definition 6 (***Lstrong <sup>μ</sup>* **(***As***) fragment of** *L<sup>μ</sup>* **[35]).** *Let* A<sup>s</sup> ⊆ A *be a set of actions called strong actions and* α<sup>s</sup> *be any action formula such that* [[αs]]<sup>A</sup> ⊆ As*, called a strong action formula.* Lstrong <sup>μ</sup> (As) *is defined as the set of formulas semantically equivalent to some formula of the following language:*

$$\begin{array}{l} \varphi ::= \mathsf{false} \mid \varphi\_1 \lor \varphi\_2 \mid \neg \varphi\_0 \mid \langle \alpha\_s \rangle \,\varphi\_0 \mid X \mid \mu X. \varphi\_0\\ \mid \quad \langle (\varphi\_1 ?. \alpha\_\tau)^\* \rangle \,\varphi\_2 \mid \langle (\varphi\_1 ?. \alpha\_\tau)^\*. \varphi\_1 ?. \alpha\_a \rangle \,\varphi\_2 \mid \langle \varphi\_1 ?. \alpha\_\tau \rangle \,\mathsf{@} \end{array}$$

*In the context of* L<sup>strong</sup><sub>μ</sub>(A<sub>s</sub>)*, we call* ⟨α<sub>s</sub>⟩ ϕ<sub>0</sub> *a strong modality.*<sup>6</sup>

In [35], we also provide guidelines for extracting a set A<sup>s</sup> from particular L<sup>μ</sup> formulas encoding the operators of widely-used temporal logics, such as CTL [11], ACTL [39], PDL [15], and PDL-Δ [44].

*Example 1.* The PDL formula [true∗.a<sub>1</sub>.a<sub>2</sub>] true belongs to L<sup>strong</sup><sub>μ</sub>({a<sub>2</sub>}) as it is semantically equivalent to [(true?.true)∗.true?.a<sub>1</sub>] [a<sub>2</sub>] true. The CTL formula EF(⟨a<sub>1</sub>⟩true ∧ ⟨a<sub>2</sub>⟩true) belongs both to L<sup>strong</sup><sub>μ</sub>({a<sub>1</sub>}), as it is semantically equivalent to ⟨(true?.true)∗⟩ ⟨(⟨a<sub>1</sub>⟩true?.true)∗.⟨a<sub>1</sub>⟩true?.a<sub>2</sub>⟩ true, and to L<sup>strong</sup><sub>μ</sub>({a<sub>2</sub>}), as it is semantically equivalent to the same formula where a<sub>1</sub> and a<sub>2</sub> are swapped. These formulas do not belong to L<sup>strong</sup><sub>μ</sub>(∅). (This was shown in [35].)

The latter example shows that several minimal sets of strong actions A<sub>s</sub> may correspond to a formula ϕ. Indeed, either the ⟨a<sub>1</sub>⟩true or the ⟨a<sub>2</sub>⟩true modality can be made part of a weak modality, but not both in the same formula.

## **4 Sharp Bisimilarity**

We define the family of sharp bisimilarity relations below. Each relation is hybrid between strong and divbranching bisimilarities, parameterized by the set of strong actions, such that the conditions of strong bisimilarity apply to strong actions and the conditions of divbranching bisimilarity apply to all other actions.

<sup>6</sup> For generality we allow τ ∈ A<sub>s</sub>, to enable strong modalities of the form ⟨α<sub>τ</sub>⟩ ϕ<sub>0</sub>.

**Definition 7 (Sharp bisimilarity).** *A divergence-unpreserving sharp bisimulation w.r.t. a set of actions* A<sub>s</sub> *is a symmetric relation* R ⊆ Σ × Σ *such that if* (p, q) ∈ R *then for all* p <sup>a</sup>−→ p′*, there exists* q′ *such that* (p′, q′) ∈ R *and one of the following holds: (1)* q <sup>a</sup>−→ q′*, or (2)* a = τ*,* τ ∉ A<sub>s</sub>*, and* q′ = q*, or (3)* a ∉ A<sub>s</sub>*, and there exists a sequence of transitions* q<sub>0</sub> <sup>τ</sup>−→ ... <sup>τ</sup>−→ q<sub>n</sub> <sup>a</sup>−→ q′ (n ≥ 0) *such that* q<sub>0</sub> = q*, and for all* i ∈ 1..n*,* (p, q<sub>i</sub>) ∈ R*.*<sup>7</sup> *A sharp bisimulation* R *additionally satisfies the following divergence-preservation condition: for all* (p<sub>0</sub>, q<sub>0</sub>) ∈ R *such that* p<sub>0</sub> <sup>τ</sup>−→ p<sub>1</sub> <sup>τ</sup>−→ p<sub>2</sub> <sup>τ</sup>−→ ... *with* (p<sub>i</sub>, q<sub>0</sub>) ∈ R *for all* i ≥ 0*, there is also an infinite sequence* q<sub>0</sub> <sup>τ</sup>−→ q<sub>1</sub> <sup>τ</sup>−→ q<sub>2</sub> <sup>τ</sup>−→ ... *such that* (p<sub>i</sub>, q<sub>j</sub>) ∈ R *for all* i, j ≥ 0*. Two states* p *and* q *are sharp bisimilar w.r.t.* A<sub>s</sub>*, written* p ∼<sub>A<sub>s</sub></sub> q*, if and only if there exists a sharp bisimulation* R *w.r.t.* A<sub>s</sub> *such that* (p, q) ∈ R*.*
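Conditions (1) to (3) of Definition 7 can be checked mechanically for a given finite relation. The sketch below is an illustration under assumptions: the LTS is a set of `(source, label, target)` triples, the internal action is named `"tau"`, and only the divergence-unpreserving part of the definition is checked (the divergence-preservation condition is left out). It verifies a candidate relation; it does not compute sharp bisimilarity.

```python
from collections import defaultdict

TAU = "tau"  # assumed name of the internal action

def is_du_sharp_bisimulation(R, transitions, strong):
    """Check conditions (1)-(3) of Definition 7 for a finite relation R."""
    R = set(R)
    if any((q, p) not in R for (p, q) in R):
        return False                          # R must be symmetric
    succ = defaultdict(set)
    for (p, a, q) in transitions:
        succ[p].add((a, q))

    def tau_closure(p, q):
        # States q_n reachable from q_0 = q by tau-steps through states
        # q_1..q_n that are all related to p.
        seen, stack = {q}, [q]
        while stack:
            s = stack.pop()
            for (a, t) in succ[s]:
                if a == TAU and t not in seen and (p, t) in R:
                    seen.add(t)
                    stack.append(t)
        return seen

    def matched(p, a, p2, q):
        if any(b == a and (p2, q2) in R for (b, q2) in succ[q]):
            return True                       # case (1)
        if a == TAU and TAU not in strong and (p2, q) in R:
            return True                       # case (2)
        if a not in strong:                   # case (3)
            return any(b == a and (p2, q2) in R
                       for qn in tau_closure(p, q)
                       for (b, q2) in succ[qn])
        return False

    return all(matched(p, a, p2, q)
               for (p, q) in R for (a, p2) in succ[p])
```

On the two LTS p0 −τ→ p1 −a→ p2 and q0 −a→ q1, the symmetric closure of {(p0, q0), (p1, q0), (p2, q1)} passes the check when no action is strong, and fails as soon as a or τ is declared strong.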

Similarly to strong, branching, and divbranching bisimilarities, sharp bisimilarity is an equivalence relation as it is the union of all sharp bisimulations. The quotient of an LTS P w.r.t. sharp bisimilarity is unique and minimal both in number of states and number of transitions.

*Example 2.* Let a, b, ω ∈ A \ {τ} and τ, ω ∉ A<sub>s</sub>. LTS P<sub>i</sub> and P′<sub>i</sub> of Figure 1 satisfy P<sub>i</sub> ∼<sub>A<sub>s</sub></sub> P′<sub>i</sub> (i ∈ 1..7). We give the smallest relation between P<sub>i</sub> and P′<sub>i</sub> whose symmetric closure is a sharp bisimulation w.r.t. A<sub>s</sub>, and the weakest condition for P′<sub>i</sub> to be minimal. Unlike with divbranching bisimilarity, states on the same τ-cycle are not necessarily sharp bisimilar: in P′<sub>7</sub>, if a ∈ A<sub>s</sub> then p′<sub>0</sub> and p′<sub>2</sub> are not sharp bisimilar.

*Example 3.* The LTS of Figure 2(a) is equivalent for ∼{a} to the one of Figure 2(b), which is minimal. We see that sharp bisimilarity reduces more than strong bisimilarity when at least one action (visible or invisible) is weak. Here, τ is the only weak action and the minimized LTS is smaller than the one minimal for strong bisimilarity (only p<sup>1</sup> and p<sup>2</sup> are strongly bisimilar).

If τ ∈ As, then case (2) of Definition 7 cannot apply, i.e., τ -transitions cannot be totally suppressed. As a consequence, looking at case (3), if τ -transitions are present in state q<sup>0</sup> then, due to symmetry, they must have a counterpart in state p. As a result, finite sequences of τ -transitions are preserved. Sharp may however differ from strong bisimilarity in the possibility to compress circuits of τ -transitions that would remain unreduced, as illustrated in Example 4 below.

*Example 4.* If τ ∈ A<sup>s</sup> and a /∈ As, then the LTS of Figure 2(b) (which is minimal for strong bisimilarity) can be reduced to the LTS of Figure 2(c).

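<sup>7</sup> We require that (p, q<sub>i</sub>) ∈ R for all i ∈ 1..n and not the simpler condition (p, q<sub>n</sub>) ∈ R (as usual when defining branching bisimulation) because sharp bisimulation does not have the nice property that (p, q<sub>0</sub>) ∈ R and (p, q<sub>n</sub>) ∈ R imply (p, q<sub>i</sub>) ∈ R for all i ∈ 1..n.

| LTS | Smallest relation | Weakest condition for minimality |
| --- | --- | --- |
| P<sub>1</sub>, P′<sub>1</sub> | {(p<sub>0</sub>, p′<sub>0</sub>), (p<sub>1</sub>, p′<sub>1</sub>), (p<sub>2</sub>, p′<sub>0</sub>), (p<sub>3</sub>, p′<sub>0</sub>)} | P′<sub>1</sub> minimal |
| P<sub>2</sub>, P′<sub>2</sub> | {(p<sub>0</sub>, p′<sub>0</sub>), (p<sub>1</sub>, p′<sub>1</sub>), (p<sub>2</sub>, p′<sub>0</sub>), (p<sub>3</sub>, p′<sub>3</sub>)} | a ∈ A<sub>s</sub> implies P′<sub>2</sub> minimal |
| P<sub>3</sub>, P′<sub>3</sub> | {(p<sub>0</sub>, p′<sub>0</sub>), (p<sub>1</sub>, p′<sub>1</sub>), (p<sub>2</sub>, p′<sub>0</sub>)} | a ≠ ω implies P′<sub>3</sub> minimal |
| P<sub>4</sub>, P′<sub>4</sub> | {(p<sub>0</sub>, p′<sub>0</sub>), (p<sub>1</sub>, p′<sub>1</sub>), (p<sub>2</sub>, p′<sub>0</sub>), (p<sub>3</sub>, p′<sub>0</sub>)} | a ≠ ω implies P′<sub>4</sub> minimal |
| P<sub>5</sub>, P′<sub>5</sub> | {(p<sub>0</sub>, p′<sub>0</sub>), (p<sub>1</sub>, p′<sub>1</sub>), (p<sub>2</sub>, p′<sub>0</sub>), (p<sub>3</sub>, p′<sub>3</sub>)} | b ≠ a ∧ b ∈ A<sub>s</sub> implies P′<sub>5</sub> minimal |
| P<sub>6</sub>, P′<sub>6</sub> | {(p<sub>0</sub>, p′<sub>0</sub>), (p<sub>1</sub>, p′<sub>1</sub>), (p<sub>2</sub>, p′<sub>0</sub>), (p<sub>3</sub>, p′<sub>0</sub>)} | |
| P<sub>7</sub>, P′<sub>7</sub> | {(p<sub>0</sub>, p′<sub>0</sub>), (p<sub>1</sub>, p′<sub>1</sub>), (p<sub>2</sub>, p′<sub>2</sub>), (p<sub>3</sub>, p′<sub>0</sub>)} | a ∈ A<sub>s</sub> implies P′<sub>7</sub> minimal |

**Fig. 1.** Examples of sharp bisimilar LTS

**Fig. 2.** LTS of Examples 3 and 4

The following theorems are new. Theorem 1 expresses that sharp bisimilarity w.r.t. a set of strong actions A<sub>s</sub> is strictly stronger than w.r.t. any set of strong actions strictly included in A<sub>s</sub>. Unsurprisingly, it also establishes that sharp coincides with divbranching when the set of strong actions is empty, and with strong when it comprises all actions (including τ). It follows that the set of sharp bisimilarity relations equipped with set inclusion forms a complete lattice whose supremum is divbranching bisimilarity and whose infimum is strong bisimilarity.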

**Theorem 1.** *(1)* ∼<sub>∅</sub> = ∼<sub>dbr</sub> *(2)* ∼<sub>A</sub> = ∼ *(3) if* A′<sub>s</sub> ⊂ A<sub>s</sub> *then* ∼<sub>A<sub>s</sub></sub> ⊂ ∼<sub>A′<sub>s</sub></sub>*.*

Theorem 2 expresses that sharp bisimilarity w.r.t. A<sub>s</sub> preserves the truth value of all formulas of L<sup>strong</sup><sub>μ</sub>(A<sub>s</sub>), and Theorem 3 that two LTS verifying exactly the same formulas of L<sup>strong</sup><sub>μ</sub>(A<sub>s</sub>) are sharp bisimilar. We can then deduce that L<sup>strong</sup><sub>μ</sub>(A<sub>s</sub>) is adequate with ∼<sub>A<sub>s</sub></sub>, as expressed by Corollary 1.

**Theorem 2.** *If* P ∼<sub>A<sub>s</sub></sub> P′ *and* ϕ ∈ L<sup>strong</sup><sub>μ</sub>(A<sub>s</sub>) *then* P |= ϕ *iff* P′ |= ϕ*.*

**Theorem 3.** *If* (∀ϕ ∈ L<sup>strong</sup><sub>μ</sub>(A<sub>s</sub>)) P |= ϕ *iff* Q |= ϕ*, then* P ∼<sub>A<sub>s</sub></sub> Q*.*

**Corollary 1.** L<sup>strong</sup><sub>μ</sub>(A<sub>s</sub>) *is adequate with* ∼<sub>A<sub>s</sub></sub>*, i.e.,* P ∼<sub>A<sub>s</sub></sub> P′ *if and only if* (∀ϕ ∈ L<sup>strong</sup><sub>μ</sub>(A<sub>s</sub>)) P |= ϕ *iff* P′ |= ϕ*.*

Theorems 4 and 5 express that sharp bisimilarity is a congruence for parallel composition and admissible action mapping. It follows that it is also a congruence for hide, cut, and rename, as expressed by Corollary 2.

**Theorem 4.** *If* P ∼<sub>A<sub>s</sub></sub> P′*,* Q ∼<sub>A<sub>s</sub></sub> Q′ *then* P |[A<sub>sync</sub>]| Q ∼<sub>A<sub>s</sub></sub> P′ |[A<sub>sync</sub>]| Q′*.*

**Theorem 5.** *If* ρ *is admissible and* P ∼<sub>A<sub>s</sub></sub> P′*, then* ρ(P) ∼<sub>A′<sub>s</sub></sub> ρ(P′)*, where* A′<sub>s</sub> = ρ(A<sub>s</sub>) \ ρ(A<sub>P</sub> \ A<sub>s</sub>)*.*

**Corollary 2.** *We write* A<sub>τ</sub> *for* A ∪ {τ}*. If* P ∼<sub>A<sub>s</sub></sub> P′ *then:*


These theorems and corollaries generalize results on strong and divbranching bisimilarity. In particular, the side conditions of Corollary 2 are always true when A<sup>s</sup> = ∅ (divbranching) or A<sup>s</sup> = A (strong).

Since every admissible network of LTS can be translated into an equivalent composition expression consisting of parallel compositions and admissible action mappings, Theorems 4 and 5 imply some congruence property at the level of networks of LTS. However, one must be careful about how the synchronization rules preserve or modify the set of strong actions of components.

In the sequel, we establish formally the relationship between sharp bisimilarity and sharp τ -confluence, a strong form of τ -confluence [27] defined below in a way analogous to strong τ -confluence in [28]. It is known that every τ -transition that is τ -confluent is inert for branching bisimilarity, i.e., its source and target states are branching bisimilar. There are situations where τ -confluence can be detected locally, thus enabling on-the-fly LTS reductions. We present an analogous result that might have similar applications, namely, every τ -transition that is sharp τ -confluent is inert for (divergence-unpreserving) sharp bisimilarity.

**Definition 8 (Sharp** *τ***-confluence).** *Let* P = (Σ, A, −→, p<sub>init</sub>) *and* T ⊆ <sup>τ</sup>−→ *be a set of internal transitions.* T *is sharp* τ*-confluent w.r.t. a set* A<sub>s</sub> *of strong actions if* τ ∉ A<sub>s</sub> *and for all* (p<sub>0</sub>, τ, p<sub>1</sub>) ∈ T*,* a ∈ A*, and* p<sub>2</sub> ∈ Σ*: (1)* p<sub>0</sub> <sup>a</sup>−→ p<sub>2</sub> *implies either* p<sub>1</sub> <sup>a</sup>−→ p<sub>2</sub> *or there exists* p<sub>3</sub> *such that* p<sub>1</sub> <sup>a</sup>−→ p<sub>3</sub> *and* (p<sub>2</sub>, τ, p<sub>3</sub>) ∈ T*, and (2) if* a ∈ A<sub>s</sub> *then* p<sub>1</sub> <sup>a</sup>−→ p<sub>3</sub> *implies either* p<sub>0</sub> <sup>a</sup>−→ p<sub>3</sub> *or there exists* p<sub>2</sub> *such that* p<sub>0</sub> <sup>a</sup>−→ p<sub>2</sub> *and* (p<sub>2</sub>, τ, p<sub>3</sub>) ∈ T*. A transition* p<sub>0</sub> <sup>τ</sup>−→ p<sub>1</sub> *is sharp* τ*-confluent w.r.t.* A<sub>s</sub> *if there is a set of transitions* T *that is sharp* τ*-confluent w.r.t.* A<sub>s</sub> *and such that* (p<sub>0</sub>, τ, p<sub>1</sub>) ∈ T*.*

The difference between strong τ-confluence and sharp τ-confluence is the addition of condition (2), which can be removed to obtain the very same definition of strong τ-confluence as [28]. Strong τ-confluence thus coincides with sharp τ-confluence w.r.t. the empty set of actions. Sharp τ-confluence not only requires that other transitions of the source state of a confluent transition also exist in the target state, but also that the converse is true for strong actions.

If a transition is sharp τ -confluent w.r.t. As, then it is also sharp τ -confluent w.r.t. any subset of As. In particular, sharp τ -confluence is stronger than strong τ -confluence (which is itself stronger than τ -confluence). Theorem 6 formalizes the relationship between sharp τ -confluence and divergence-unpreserving sharp bisimilarity. This result could be lifted to sharp bisimilarity by adding a condition on divergence in the definition of sharp τ -confluence.

**Theorem 6.** *If* τ ∉ A<sub>s</sub> *and* p<sub>0</sub> <sup>τ</sup>−→<sub>P</sub> p<sub>1</sub> *is sharp* τ*-confluent w.r.t.* A<sub>s</sub>*, then* p<sub>0</sub> *and* p<sub>1</sub> *are divergence-unpreserving sharp bisimilar w.r.t.* A<sub>s</sub>*.*

Theorem 6 illustrates a form of reduction that one can expect using sharp bisimilarity when τ /∈ As, namely compression of diamonds of sharp τ -confluent transitions, which are usually generated by parallel composition. The strongest form of sharp τ -confluence (which could be called *ultra-strong* τ -confluence) is when all visible actions are strong. In that case, every visible action present in the source state must be also present in the target state, and conversely. The source and target states are then sharp bisimilar w.r.t. the set of visible actions. Yet, it is interesting to note that they are not necessarily strongly bisimilar, sharp bisimilarity w.r.t. all visible actions being weaker than strong bisimilarity.

There exist weaker forms of τ -confluence [27, 50], which accept that choices between τ -confluent and other transitions are closed by arbitrary sequences of τ -confluent transitions rather than sequences of length 0 or 1. It could be interesting to investigate how the definition of sharp τ -confluence could also be weakened, while preserving inertness for sharp bisimilarity.

## **5 LTS Reduction**

The interest of sharp bisimilarity in the context of compositional verification is the ability to replace components by smaller but still equivalent ones, as allowed by the congruence property. To do so, we need a procedure that enables such a reduction. This is what we address in this section.

A procedure to reduce an LTS P for sharp bisimilarity is proposed as follows: (1) Build P′, consisting of P in which all τ-transitions that immediately precede a transition labelled by a strong action (or all τ-transitions if τ is itself a strong action) are renamed into a special visible action κ ∈ A \ A<sub>P</sub>; (2) Minimize P′ for divbranching bisimilarity; (3) Hide in the resulting LTS all occurrences of κ. The renaming of τ-transitions into κ allows them to be considered temporarily as visible transitions, so that they are not eliminated by divbranching minimization.<sup>8</sup> This algorithm is now defined formally.

**Definition 9.** *Let* P *be an LTS and* A<sub>s</sub> *be a set of strong actions. Let* κ ∈ A \ A<sub>P</sub> *be a special visible action. We write red*<sub>A<sub>s</sub></sub>(P) *for the reduction of* P *defined as the LTS "hide* κ *in min*<sub>dbr</sub>(P′)*", where* P′ = (Σ<sub>P</sub>, A<sub>P</sub> ∪ {κ}, −→, *init*(P)) *and* −→ *is defined as follows:*

$$\begin{aligned} \longrightarrow &= \{ (p, \kappa, p') \mid p \stackrel{a}{\longrightarrow}\_P p' \land \underline{\kappa}(a, p') \} \cup \{ (p, a, p') \mid p \stackrel{a}{\longrightarrow}\_P p' \land \neg \underline{\kappa}(a, p') \} \\ \text{where } \underline{\kappa}(a, p') &= ((a = \tau) \land (\tau \in A\_s \lor p' \stackrel{A\_s}{\longrightarrow}\_P)) \end{aligned}$$

It is clear that *red* <sup>A</sup><sup>s</sup> (P) is a reduction, i.e., it cannot have more states and transitions than P. Since the complexities of the transformation from P to P- and of hiding κ are at worst linear in |P|tr , the complexity of the whole algorithm is dominated by divbranching minimization, for which there exists an algorithm<sup>9</sup> of worst-case complexity O(m log n), where m = |P|tr and n = |P|st [25].
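The renaming and hiding steps of Definition 9 are simple enough to sketch directly. In the illustration below, the action names `"tau"` and `"kappa"`, the triple-based LTS encoding, and the `min_dbr` parameter are assumptions: `min_dbr` stands for any divbranching minimizer supplied by the caller (for instance a wrapper around an existing tool) and is not implemented here.

```python
TAU, KAPPA = "tau", "kappa"  # assumed action names; KAPPA must be fresh

def rename_to_kappa(transitions, strong):
    """Step (1): build the transition relation of P'.

    A tau-transition is renamed into KAPPA if tau is strong or if its
    target state has an outgoing transition labelled by a strong action.
    """
    has_strong_out = {p for (p, a, _) in transitions if a in strong}

    def label(a, target):
        keep = a == TAU and (TAU in strong or target in has_strong_out)
        return KAPPA if keep else a

    return {(p, label(a, q), q) for (p, a, q) in transitions}

def hide_kappa(transitions):
    """Step (3): turn every KAPPA back into tau."""
    return {(p, TAU if a == KAPPA else a, q) for (p, a, q) in transitions}

def sharp_reduce(transitions, strong, min_dbr):
    """Compose the three steps; min_dbr is a caller-supplied minimizer."""
    return hide_kappa(min_dbr(rename_to_kappa(transitions, strong)))
```

Both helper steps make a single pass over the transitions, in line with the linear cost stated above for the transformation from P to P′ and for hiding κ.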

As regards correctness, Theorem 7 states that *red* <sup>A</sup><sup>s</sup> (P) is indeed sharp bisimilar to P. Theorem 8 indicates that the reduction coincides with divbranching minimization if the LTS does not contain any strong action, with strong minimization if τ is a strong action or if the LTS does not contain τ , and that the resulting LTS has a size that lies in between the size of the minimal LTS for divbranching bisimilarity and the size of the minimal LTS for strong bisimilarity.

**Theorem 7.** *For any LTS* P*, we have* P ∼A<sup>s</sup> *red* <sup>A</sup><sup>s</sup> (P)*.*

<sup>8</sup> The letter κ stands for keep uncompressed.

<sup>9</sup> Strictly speaking, the algorithm of [25] implements branching minimization but, as noted by its authors, handling divergences requires only a minor adaptation.

**Theorem 8.** *The following hold for any LTS* P*: (1) if* A<sub>P</sub> ∩ A<sub>s</sub> = ∅ *then red*<sub>A<sub>s</sub></sub>(P) = *min*<sub>dbr</sub>(P)*, (2) if* τ ∉ A<sub>P</sub> \ A<sub>s</sub> *then red*<sub>A<sub>s</sub></sub>(P) = *min*<sub>str</sub>(P)*, and (3)* |*min*<sub>dbr</sub>(P)|<sub>st</sub> ≤ |*red*<sub>A<sub>s</sub></sub>(P)|<sub>st</sub> ≤ |*min*<sub>str</sub>(P)|<sub>st</sub> ∧ |*min*<sub>dbr</sub>(P)|<sub>tr</sub> ≤ |*red*<sub>A<sub>s</sub></sub>(P)|<sub>tr</sub> ≤ |*min*<sub>str</sub>(P)|<sub>tr</sub>*.*

Although sharp reduction is effective in practice, as will be illustrated in the next section, it may fail to compress τ-transitions that are inert for sharp bisimilarity, as the following examples show.

*Example 5.* Consider the LTS of Figure 2(a) (page 9). Its reduction using the above algorithm consists of the three steps depicted below:

The reduced LTS (obtained at step 3) has one more state and two more transitions than the minimal LTS shown in Figure 2(b). Even though all visible actions are strong, our reduction compresses more than strong bisimilarity (recall that the minimal LTS for strong bisimilarity has 7 states and 8 transitions). In general, our reduction reduces more than strong bisimilarity<sup>10</sup> as soon as τ ∉ A<sub>s</sub> (which is the case for most formulas in practice).

*Example 6.* In Figure 1 (page 8), if a ∈ A<sub>s</sub> then *red*<sub>A<sub>s</sub></sub>(P<sub>1</sub>) = P′<sub>1</sub>, *red*<sub>A<sub>s</sub></sub>(P<sub>2</sub>) = P′<sub>2</sub>, and *red*<sub>A<sub>s</sub></sub>(P<sub>6</sub>) = P′<sub>6</sub>, i.e., reduction yields the minimal LTS. Yet, *red*<sub>A<sub>s</sub></sub>(P<sub>3</sub>) = P<sub>3</sub> ≠ P′<sub>3</sub>, i.e., the sharp τ-confluent transition p<sub>0</sub> <sup>τ</sup>−→<sub>P<sub>3</sub></sub> p<sub>2</sub> is not compressed. Similarly, P<sub>4</sub>, P<sub>5</sub>, and P<sub>7</sub> are not minimized using *red*<sub>A<sub>s</sub></sub>.

Devising a minimization algorithm for sharp bisimilarity is left for future work. It could combine elements of existing partition-refinement algorithms for strong and divbranching minimizations, but the following difficulty must be taken into account (basic knowledge about partition-refinement is assumed):

**–** A sequence of τ-transitions is inert w.r.t. the current state partition if its source, target, and intermediate states are all in the same block. To refine a partition for sharp bisimilarity, one must be able to compute efficiently the set of non-inert transitions labelled by weak actions and reachable after an arbitrary sequence of inert transitions. The potential presence of inert cycles has to be considered carefully to avoid useless computations.

<sup>10</sup> The result of reduction is necessarily strong-bisimulation minimal, because if a transition p <sup>τ</sup>−→ p′ is renamed into κ, then the same holds for a τ-transition in every state bisimilar to p, which remains bisimilar after the renaming. In addition, the subsequent divbranching minimization step necessarily merges strongly bisimilar states.

**–** In the case of divbranching bisimilarity, every τ-cycle is inert and can thus be compressed into a single state. This is usually done initially, using the Tarjan algorithm for finding strongly connected components, whose complexity is linear in the LTS size. This guarantees the absence of inert cycles (except self τ-loops) all along the subsequent partition-refinement steps. However, τ-cycles are not necessarily inert for sharp bisimilarity, as illustrated by LTS P′<sub>7</sub> in Figure 1 (page 8). Therefore, τ-cycles cannot be compressed initially. Instead, a cycle inert w.r.t. the current partition may be split into several sub-blocks during a refinement step. To know whether the sub-blocks still contain inert cycles, the Tarjan algorithm may have to be applied again.
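For reference, the initial τ-cycle detection mentioned in the second item can be sketched with Tarjan's algorithm restricted to τ-transitions. The encoding (triples, internal action named `"tau"`) is an assumption, and the recursive formulation is chosen for readability; a large LTS would call for an iterative variant. For sharp bisimilarity, such a computation would have to be repeated on the τ-transitions that are inert w.r.t. the current partition, as discussed above.

```python
TAU = "tau"  # assumed name of the internal action

def tau_sccs(states, transitions):
    """Strongly connected components of the graph of tau-transitions."""
    succ = {p: [] for p in states}
    for (p, a, q) in transitions:
        if a == TAU:
            succ[p].append(q)
    index, low, on_stack = {}, {}, set()
    stack, sccs = [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in states:
        if v not in index:
            visit(v)
    return sccs
```

A component with more than one state (or a state with a τ-self-loop) corresponds to a τ-cycle, which divbranching minimization may collapse but sharp minimization may not.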

Although *red* <sup>A</sup><sup>s</sup> is not a minimization, we will see that it performs very well when used in a compositional setting. The reason is that (1) only a few of the system actions are strong, which limits the number of τ -transitions renamed to κ, and (2) sharp τ -confluent transitions most often originate from the interleaving of τ -transitions that are inert in the components of parallel composition. The above reduction algorithm removes most inert transitions in individual (sequential) LTS, thus limiting the number of sharp τ -confluent transitions in intermediate LTS. Still, better reductions can be expected with a full minimization algorithm, which will compress all τ -transitions that are inert for sharp bisimilarity.

## **6 Experimentation**

We experimented with sharp reduction on the examples presented in [35] (consisting of formulas containing both weak and strong modalities), namely the TFTP (Trivial File Transfer Protocol) and the CTL verification problems on parallel systems of the RERS 2018 challenge. For lack of space, we refer to [35] for more details about these case studies. In both cases, we composed parallel processes in the same order as we did using the combined bisimulations approach, but using sharp bisimilarity instead of strong or divbranching bisimilarity to reduce processes. Experiments were done on a 3GHz/12GB RAM/8-core Intel Xeon computer running Linux, using the specification languages and 32-bit versions of tools provided in the CADP toolbox version 2019-d "Pisa" [19].

The results are given in Figures 3 (TFTP) and 4 (RERS 2018), in terms of the size of the largest intermediate LTS, the size of the final LTS (the LTS obtained after the last reduction step, on which the formula is checked), memory consumption, and time. Each subfigure contains three curves corresponding to the mono-bisimulation approach (using strong bisimulation to reduce all LTS), the combined bisimulations approach, and the sharp bisimulation approach. The former two curves are made from data that were already presented in [35]. Note that the vertical axis of all subfigures is on a logarithmic scale. In the RERS 2018 case, the mono-bisimulation approach gives results only for experiments 101#22 and 101#23, all other experiments failing due to state space explosion.<sup>11</sup>

<sup>11</sup> E.g., smart mono-bisimulation fails on problem 103#23 after generating an intermediate LTS with more than 4.5 billion states and 36 billion transitions (instead of 50,301 states and 334,530 transitions using sharp bisimulation) using Grid'5000 [6].

**Fig. 3.** Experimental results of the TFTP case-study

**Fig. 4.** Experimental results of the RERS 2018 case-study

These results show that sharp bisimilarity incurs much more LTS reduction than the combined bisimulations approach, by a factor close to the one obtained when switching from the mono-bisimulation approach to the combined bisimulations approach. However, in the case of the RERS 2018 examples, this gain on LTS size does not always apply to time and/or memory consumption in the same proportions, except for experiment 103#22. This suggests that our implementation of minimization could be improved.

These experiments were conducted after the closing of the RERS 2018 challenge. Encouraged by the good results obtained with these two approaches, we participated in the 2019 edition<sup>12</sup>, where 180 CTL problems were proposed instead of 9 in 2018. The models on which the properties had to be verified have from 8 to 70 parallel processes and from 29 to 234 actions. Although the models had been given in a wealth of different input formats (communicating automata, Petri nets in PNML format with NUPN information [16], and Promela) suitable for a large number of model checking tools, no team other than ours participated in the parallel challenges. This is a significant difference with 2018, when the challenge was easier, allowing three teams (with different tools) to participate.

We applied smart sharp reduction to these problems, using a prototype program that extracts strong actions automatically from (a restricted set of) CTL formulas used in the competition.<sup>13</sup> This allowed the 180 properties to be checked automatically in less than 2.5 hours (CPU time), and using about 200 MB of RAM only, whereas using strong reduction failed on most of the largest problems. The largest intermediate graph obtained for the whole set of problems has 3364 states. All results were correct and we won all gold medals<sup>14</sup> in this category.<sup>15</sup> Details are available in the Zenodo archive mentioned in the introduction.

## **7 Related Work**

The paper [48] defines on doubly-labelled transition systems (a mix between Kripke structures and LTS) a family of bisimilarity relations derived from divbranching bisimilarity, parameterized by a natural number n, which preserves CTL\* formulas whose nesting of next operators is smaller than or equal to n. Similar to our work, they show that this family of relations (which is distinct from sharp bisimilarity in that there is no distinction between weak and strong actions) fills the gap between strong and divbranching bisimilarities. They apply their bisimilarity relation to slicing rather than compositional verification.

<sup>12</sup> http://rers-challenge.org/2019

<sup>13</sup> The paper [35] presents identities that were used to extract such strong actions.

<sup>14</sup> A RERS gold medal is not a ranking but an achievement, not weakened by the low number of competitors. We also won all gold medals in the "verification of LTL properties on parallel systems" category, using an adaptation of this approach.

<sup>15</sup> http://cadp.inria.fr/news12.html

The paper [2] proposes that, if the formula contains only so-called *selective* modalities, of the form ⟨(¬α<sub>1</sub>)∗.α<sub>2</sub>⟩ ϕ<sub>0</sub>, then all actions but those satisfying α<sub>1</sub> or α<sub>2</sub> can be hidden, and the resulting system can be reduced for τ∗.a-equivalence [14]. Yet, there exist formulas whose strong modalities ⟨α⟩ ϕ<sub>0</sub> cannot translate into anything but the selective modality ⟨(¬true)∗.α⟩ ϕ<sub>0</sub>, meaning that no action at all can be hidden. In this case, τ∗.a-equivalence coincides with strong bisimilarity and thus incurs much less reduction than sharp bisimilarity. Moreover, it is well-known that τ∗.a-equivalence is not a congruence for parallel composition [7], which makes it unsuitable for compositional verification, even to check formulas that contain weak modalities only.

The adequacy of Ldbr <sup>μ</sup> with divbranching bisimilarity is shown in [37]. This paper also claims that ACTL\X is as expressive as <sup>L</sup>dbr <sup>μ</sup> and thus also adequate with divbranching bisimilarity, but a small mistake in the proof had the authors omit that the Ldbr <sup>μ</sup> formula τ @ cannot actually be expressed in ACTL\X. It remains true that ACTL\X is preserved by divbranching bisimilarity.

In [13], it is shown that ACTL\X is adequate with divergence sensitive branching bisimilarity. This bisimilarity relation is equivalent to divbranching bisimilarity [21–23] only in the case of deadlock-free LTS, but it differs in the presence of deadlock states since it does not distinguish a deadlock state from a self τ-loop (which can instead be recognized in L<sup>dbr</sup><sub>μ</sub> with the ⟨τ⟩@ formula).

## **8 Conclusion**

This work enhances the reductions that can be obtained by combining compositional LTS construction with an analysis of the temporal logic formula to be verified. In particular, known results about strong and divbranching bisimilarities have been combined into a new family of relations called sharp bisimilarities, which inherit all nice properties of their ancestors and refine the state of the art in compositional verification.

This new approach is promising. Yet, to be both usable by non-experts and fully efficient, at least two components are still missing: (1) The sets of strong actions, which are a key ingredient in the success of this approach, still have to be computed either using pencil and paper or using tools dedicated to restricted logics; automating their computation in the case of arbitrary L<sub>μ</sub> formulas is not easy, but likely feasible, opening the way to a new research track; finding a minimal set of strong actions automatically is challenging, and since it is not unique, even more challenging is the quest for the set that will incur the best reductions. (2) Efficient algorithms are needed to minimize LTS for sharp bisimilarity; they could probably be obtained by adapting the known algorithms for strong and divbranching minimizations (at least using some kind of signature-based partition refinement algorithm in the style of Blom *et al.* [3–5] in a first step), but this remains to be done.

*Acknowledgements.* The authors thank Hubert Garavel, who triggered our collaboration between Grenoble and Pisa, and Wendelin Serwe for his comments on earlier versions of this paper. They also thank the anonymous referees for the pertinence of their comments, which allowed significant improvements of this paper.

## **References**


University of Twente, Enschede, The Netherlands. Lecture Notes in Computer Science, vol. 1217. Springer (Apr 1997), extended version with proofs available as Research Report VERIMAG RR97-01


Applications and Theory of Petri Nets (ICATPN'91), Gjern, Denmark. Lecture Notes in Computer Science, vol. 674, pp. 427–457. Springer (1993)



## Verification and Efficiency

## **How Many Bits Does it Take to Quantize Your Neural Network?**

Mirco Giacobbe<sup>1,2</sup>, Thomas A. Henzinger<sup>1</sup>, and Mathias Lechner<sup>1</sup>

<sup>1</sup> IST Austria, Klosterneuburg, Austria <sup>2</sup> University of Oxford, Oxford, United Kingdom

**Abstract.** Quantization converts neural networks into low-bit fixed-point computations which can be carried out by efficient integer-only hardware, and is standard practice for the deployment of neural networks on real-time embedded devices. However, like their real-numbered counterparts, quantized networks are not immune to malicious misclassification caused by adversarial attacks. We investigate how quantization affects a network's robustness to adversarial attacks, which is a formal verification question. We show that neither robustness nor non-robustness is monotonic in the number of bits used for the representation and, also, that neither is preserved by quantization from a real-numbered network. For this reason, we introduce a verification method for quantized neural networks which, using SMT solving over bit-vectors, accounts for their exact, bit-precise semantics. We built a tool and analyzed the effect of quantization on a classifier for the MNIST dataset. We demonstrate that, compared to our method, existing methods for the analysis of real-numbered networks often derive false conclusions about their quantizations, both when determining robustness and when detecting attacks, and that existing methods for quantized networks often miss attacks. Furthermore, we applied our method beyond robustness, showing how reducing the number of bits in quantization enlarges the gender bias of a predictor for students' grades.

## **1 Introduction**

Deep neural networks are powerful machine learning models, and are becoming increasingly popular in software development. In recent years, they have pervaded our lives: think about the language recognition system of a voice assistant, the computer vision employed in face recognition or self-driving, not to mention the many decision-making tasks that are hidden under the hood. However, this also subjects them to the resource limits that real-time embedded devices impose. Mainly, the requirements are low energy consumption, as they often run on batteries, and low latency, both to maintain user engagement and to effectively interact with the physical world. This translates into specializing our computation by reducing the memory footprint and instruction set, to minimize cache misses and avoid costly hardware operations. For this purpose, quantization compresses neural networks, which are traditionally run over 32-bit floating-point arithmetic, into computations that require bit-wise and integer-only arithmetic over small words, e.g., 8 bits. Quantization is the standard technique for the deployment of neural networks on mobile and embedded devices, and is implemented in TensorFlow Lite [13]. In this work, we investigate the robustness of quantized networks to adversarial attacks and, more generally, formal verification questions for quantized neural networks.

Adversarial attacks are a well-known vulnerability of neural networks [24]. For instance, a self-driving car can be tricked into confusing a stop sign with a speed limit sign [9], or a home automation system can be commanded to deactivate the security camera by a voice reciting poetry [22]. The attack is carried out by superposing the innocuous input with a crafted perturbation that is imperceptible to humans. Formally, the attack lies within the neighborhood of a known-to-be-innocuous input, according to some notion of distance. The fraction of samples (from a large set of test inputs) that do not admit attacks determines the robustness of the network. We ask ourselves how quantization affects a network's robustness or, dually, how many bits it takes to ensure robustness above some specific threshold. This amounts to proving that, for a set of given quantizations and inputs, there does not exist an attack, which is a formal verification question.

The formal verification of neural networks has been addressed either by overapproximating the space of outputs given a space of attacks, as happens in abstract interpretation, or by searching for a variable assignment that witnesses an attack, as happens in SMT-solving. The first category includes methods that relax the neural networks into computations over interval arithmetic [20], treat them as hybrid automata [27], or abstract them directly by using zonotopes, polyhedra [10], or tailored abstract domains [23]. Overapproximation-based methods are typically fast, but incomplete: they prove robustness but do not produce attacks. On the other hand, methods based on local gradient descent have turned out to be effective in producing attacks in many cases [16], but sacrifice formal completeness. Indeed, the search for adversarial attacks is NP-complete even for the simplest (i.e., ReLU) networks [14], which motivates the rise of methods based on *Satisfiability Modulo Theory* (SMT) and *Mixed Integer Linear Programming* (MILP). SMT-solvers have been shown not to scale beyond toy examples (20 hidden neurons) on monolithic encodings [21], but today's specialized techniques can handle real-life benchmarks such as neural networks for the MNIST dataset. Specialized tools include DLV [12], which subdivides the problem into smaller SMT instances, and Planet [8], which combines different SAT and LP relaxations. Reluplex goes a step further, augmenting LP-solving with a custom calculus for ReLU networks [14]. At the other end of the spectrum, a recent MILP formulation turned out to be effective using off-the-shelf solvers [25]. Moreover, it formed the basis for Sherlock [7], which couples local search and MILP, and for a specialized branch and bound algorithm [4].

None of the techniques mentioned above reasons about the machine-precise semantics of the networks, neither over floating- nor over fixed-point arithmetic; they all reason about a real-number relaxation. Unfortunately, adversarial attacks computed over the reals are not necessarily attacks on execution architectures, in particular for quantized network implementations. We show, for the first time, that attacks and, more generally, robustness and vulnerability to attacks do not always transfer between real and quantized networks, and also do not always transfer monotonically with the number of bits across quantized networks. Verifying the real-valued relaxation of a network may lead to scenarios where (i) the real-valued network is robust but its quantization admits an attack, (ii) the real-valued network is vulnerable while its quantization is robust, or (iii) both are vulnerable but the attack found on the real-valued network does not transfer to its quantization (see Sec. 3).


More generally, we show that all three phenomena can occur non-monotonically with the precision in the numerical representation. In other words, it may occur that a quantized network fulfills a specification while both a higher- and a lower-bit quantization violate it, or that the first violates it and both the higher- and lower-bit quantizations fulfill it; moreover, specific counterexamples may not transfer monotonically across quantizations.

The verification of real-numbered neural networks using the available methods is inadequate for the analysis of their quantized implementations, and the analysis of quantized neural networks needs techniques that account for their bit-precise semantics. Recently, a similar problem has been addressed for binarized neural networks, through SAT-solving [18]. Binarized networks represent the special case of 1-bit quantizations. For many-bit quantizations, a method based on gradient descent has been introduced recently [28]. While efficient (and sound), this method is incomplete and may produce false negatives.

We introduce, for the first time, a complete method for the formal verification of quantized neural networks. Our method accounts for the bit-precise semantics of quantized networks by leveraging the first-order theory of bit-vectors without quantifiers (QF\_BV), to exactly encode hardware operations such as 2's complementation, bit-shift, and integer arithmetic with overflow. On the technical side, we present a novel encoding which balances the layout of long sequences of hardware multiply-add operations occurring in quantized neural networks. As a result, we obtain an encoding into a first-order logic formula which, in contrast to a standard unbalanced linear encoding, makes the verification of quantized networks practical and amenable to modern bit-precise SMT-solving. We built a tool using Boolector [19], evaluated the performance of our encoding, compared its effectiveness against real-numbered verification and gradient descent for quantized networks, and finally assessed the effect of quantization for different networks and verification questions.

We measured the robustness to attacks of a neural classifier involving 890 neurons and trained on the MNIST dataset (handwritten digits), for quantizations between 6 and 10 bits. First, we demonstrated that Boolector, off-the-shelf and using our balanced SMT encoding, can compute every attack within 16 hours, with a median time of 3h 41m, while it timed out on all instances beyond 6 bits using a standard linear encoding. Second, we experimentally confirmed that both Reluplex and gradient descent for quantized networks can produce false conclusions about quantized networks; in particular, spurious results occurred consistently more frequently as the number of bits in quantization decreased. Finally, we discovered that, to achieve an acceptable level of robustness, it takes a higher-bit quantization than is assessed by standard accuracy measures.

Lastly, we applied our method beyond the property of robustness. We also evaluated the effect of quantization upon the gender bias emerging from quantized predictors for students' performance in mathematics exams. More precisely, we computed the maximum predictable grade gap between any two students with identical features except for gender. The experiment showed that a substantial gap existed and was proportionally enlarged by quantization: the lower the number of bits, the larger the gap.

We summarize our contribution in five points. First, we show that the robustness of quantized neural networks is non-monotonic in the number of bits and is non-transferable from the robustness of their real-numbered counterparts. Second, we introduce the first complete method for the verification of quantized neural networks. Third, we demonstrate that our encoding, in contrast to standard encodings, enabled the state-of-the-art SMT-solver Boolector to verify quantized networks with hundreds of neurons. Fourth, we also show that existing methods determine both robustness and vulnerability of quantized networks less accurately than our bit-precise approach, in particular for low-bit quantizations. Fifth, we illustrate how quantization affects the robustness of neural networks, not only with respect to adversarial attacks, but also with respect to other verification questions, specifically fairness in machine learning.

## **2 Quantization of Feed-forward Networks**

A feed-forward neural network consists of a finite set of *neurons* x<sub>1</sub>, ..., x<sub>k</sub> partitioned into a sequence of layers: an *input layer* with n neurons, followed by one or many *hidden layers*, finally followed by an *output layer* with m neurons. Every pair of neurons x<sub>j</sub> and x<sub>i</sub> in respectively subsequent layers is associated with a *weight* coefficient w<sub>ij</sub> ∈ ℝ; if the layer of x<sub>j</sub> is not subsequent to that of x<sub>i</sub>, then we assume w<sub>ij</sub> = 0. Every hidden or output neuron x<sub>i</sub> is associated with a *bias* coefficient b<sub>i</sub> ∈ ℝ. The real-valued semantics of the neural network gives to each neuron a real value: upon a valuation for the neurons in the input layer, every other neuron x<sub>i</sub> assumes its value according to the update rule

$$x\_i = \text{ReLU-N}(b\_i + \sum\_{j=1}^k w\_{ij} x\_j),\tag{1}$$

where ReLU-N : ℝ → ℝ is the *activation function*. Altogether, the neural network implements a function f : ℝ<sup>n</sup> → ℝ<sup>m</sup> whose result corresponds to the valuation for the neurons in the output layer.

The activation function governs the firing logic of the neurons, layer by layer, by introducing non-linearity in the system. Among the most popular activation functions are purely non-linear functions, such as the hyperbolic tangent and the sigmoid function, and piece-wise linear functions, better known as *Rectified Linear Units* (ReLU) [17]. ReLU consists of the function that takes the positive part of its argument, i.e., ReLU(x) = max{x, 0}. We consider the variant of ReLU that imposes a cap value N, known as ReLU-N [15]. Precisely,

$$\text{ReLU-N}(x) = \min\{\max\{x, 0\}, N\},\tag{2}$$

which can be alternatively seen as a concatenation of two ReLU functions (see Eq. 10). As a consequence, all neural networks we treat are full-fledged ReLU networks; their real-valued versions are amenable to state-of-the-art verification tools including Reluplex, but these tools account for neither the exact floating-point nor the exact fixed-point execution model.

Quantizing consists of converting a neural network over real numbers, which is normally deployed on floating-point architectures, into a neural network over integers, whose semantics corresponds to a computation over fixed-point arithmetic [13]. Specifically, fixed-point arithmetic can be carried out by integer-only architectures and possibly over small words, e.g., 8 bits. All numbers are represented in 2's complement over B-bit words, and F bits are reserved for the fractional part: we call the result a *B-bits quantization in QF arithmetic*. More concretely, the conversion follows from the rounding of weight and bias coefficients to the F-th digit, namely b̄<sub>i</sub> = rnd(2<sup>F</sup>b<sub>i</sub>) and w̄<sub>ij</sub> = rnd(2<sup>F</sup>w<sub>ij</sub>), where rnd(·) stands for any rounding to an integer. Then, the fundamental relation between a quantized value ā and its real counterpart a is

$$a \approx 2^{-F}\bar{a}.\tag{3}$$

Consequently, the semantics of a quantized neural network corresponds to the update rule in Eq. 1 after substituting x, w, and b with the respective approximants 2<sup>−F</sup>x̄, 2<sup>−F</sup>w̄, and 2<sup>−F</sup>b̄. Namely, the semantics amounts to

$$\bar{x}\_i = \text{ReLU-}(2^F N)(\bar{b}\_i + \text{int}(2^{-F} \sum\_{j=1}^k \bar{w}\_{ij} \bar{x}\_j)),\tag{4}$$

where int(·) truncates the fractional part of its argument or, in other words, rounds towards zero. In summary, the update rule for the quantized semantics consists of four parts. The first part, i.e., the linear combination ∑<sub>j=1</sub><sup>k</sup> w̄<sub>ij</sub>x̄<sub>j</sub>, propagates all neuron values from the previous layer, obtaining a value with possibly 2B fractional bits. The second scales the result by 2<sup>−F</sup>, truncating the fractional part by, in practice, applying an arithmetic shift to the right of F bits. Finally, the third applies the bias b̄ and the fourth clamps the result between 0 and 2<sup>F</sup>N. As a result, a quantized neural network realizes a function f : ℤ<sup>n</sup> → ℤ<sup>m</sup>, which exactly represents the concrete (integer-only) hardware execution.

We assume that all intermediate values, e.g., of the linear combination, are fully representable: consistently with common execution platforms [13], we always allocate enough bits so that neither underflow nor overflow happens. Hence, any loss of precision from the respective real-numbered network happens exclusively, at each layer, as a consequence of rounding the result of the linear combination to F fractional bits. Notably, rounding causes the robustness to adversarial attacks of quantized networks with different quantization levels to be independent of one another, and independent of their real counterpart.
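As an illustration of the four-part update rule in Eq. 4, the following minimal Python sketch evaluates a single quantized neuron. The function names (`rnd`, `quantize`, `truncate_shift`, `relu_n`, `quantized_neuron`) are hypothetical, and the tie-breaking rule inside `rnd` is an assumption, since the text allows any rounding to an integer. Python integers are unbounded, which matches the assumption that overflow never happens.

```python
import math
from typing import Sequence


def rnd(value: float) -> int:
    """Round to the nearest integer, ties upward (one admissible choice of rnd)."""
    return math.floor(value + 0.5)


def quantize(value: float, frac_bits: int) -> int:
    """Quantize a real coefficient to frac_bits fractional bits: rnd(2^F * value)."""
    return rnd(value * 2 ** frac_bits)


def truncate_shift(acc: int, frac_bits: int) -> int:
    """int(2^-F * acc): drop F fractional bits, rounding towards zero as in Eq. 4.

    For non-negative acc this equals acc >> frac_bits; a plain arithmetic
    shift of a negative acc would round towards minus infinity instead.
    """
    magnitude = abs(acc) >> frac_bits
    return -magnitude if acc < 0 else magnitude


def relu_n(value: int, cap: int) -> int:
    """Clamp value between 0 and cap (Eq. 2 on integers)."""
    return min(max(value, 0), cap)


def quantized_neuron(weights_q: Sequence[int], inputs_q: Sequence[int],
                     bias_q: int, frac_bits: int, cap_n: int) -> int:
    """Evaluate Eq. 4 for one neuron on already quantized operands."""
    acc = sum(w * x for w, x in zip(weights_q, inputs_q))   # part 1: linear combination
    scaled = truncate_shift(acc, frac_bits)                 # part 2: scale by 2^-F
    biased = bias_q + scaled                                # part 3: add the bias
    return relu_n(biased, (2 ** frac_bits) * cap_n)         # part 4: clamp to [0, 2^F N]
```

For instance, in Q2 arithmetic the weights 0.5 and −0.25 quantize to 2 and −1, the inputs 1.5 and 1.0 to 6 and 4, and the bias 0.25 to 1; the neuron then evaluates to 3, i.e., 3/4 after rescaling by 2<sup>−F</sup>, which agrees with the real-valued result for this particular choice of values.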

## **3 Robustness is Non-monotonic in the Number of Bits**

A neural classifier is a neural network that maps an n-dimensional input to one of m classes, each of which is identified by the output neuron with the largest value, i.e., for the output values z<sub>1</sub>, ..., z<sub>m</sub>, the choice is given by

$$\text{class}(z\_1, \ldots, z\_m) = \underset{i}{\text{arg}\max} \, z\_i. \tag{5}$$

For example, a classifier for handwritten digits takes as input the pixels of an image and returns 10 outputs z<sub>0</sub>, ..., z<sub>9</sub>, where the largest indicates the digit the image represents. An adversarial attack is a perturbation for a sample input that, according to some notion of closeness, is indistinguishable from the original, but tricks the classifier into inferring an incorrect class. The attack in Fig. 1 is indistinguishable from the original by the human eye, but induces our classifier to assign the largest value to z<sub>3</sub>, rather than z<sub>9</sub>, misclassifying the digit as a 3. For this example, misclassification happens consistently, both on the real-numbered and on the respective 8-bit quantized network in Q4 arithmetic. Unfortunately, attacks do not necessarily transfer between real and quantized networks, nor between quantized networks of different precision. More generally, attacks and, dually, robustness to attacks are non-monotonic in the number of bits.

**Fig. 1:** Adversarial attack (original + perturbation = attack).

We give a prototypical example for the non-monotonicity of quantized networks in Fig. 2. The network consists of one input, 4 hidden, and 2 output neurons, respectively from left to right. Weights and bias coefficients, which are annotated on the edges, are all fully representable in Q1. For the neurons in the top row we show, respectively from top to bottom, the valuations obtained using a Q3, Q2, and Q1 quantization of the network (following Eq. 4); precisely, we show their fractional counterpart x̄/2<sup>F</sup>. We evaluate all quantizations and obtain that the valuations for the top output neuron are non-monotonic in the number of fractional bits; in fact, the Q1 output dominates the Q3 output, which in turn dominates the Q2 output. Coincidentally, the valuations for the Q3 quantization correspond to the valuations with real-number precision (i.e., they never undergo truncation), indicating that real and quantized networks are similarly incomparable. Notably, all phenomena occur both for quantized networks with rounding towards zero (as we show in the example), and with rounding to the nearest, which is naturally non-monotonic (e.g., 5/16 rounds to 1/2, 1/4, and 3/8 with, resp., Q1, Q2, and Q3).

**Fig. 2:** Neural network with non-monotonic robustness w.r.t. its Q1, Q2, and Q3 quantizations.
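The rounding-to-nearest example above (5/16 in Q1, Q2, and Q3) can be reproduced with exact rational arithmetic. The sketch below is only illustrative: the helper name `round_to_nearest` is hypothetical, and breaking ties upward is an assumption that the Q3 case relies on, since 5/16 lies exactly halfway between 1/4 and 3/8.

```python
import math
from fractions import Fraction


def round_to_nearest(value: Fraction, frac_bits: int) -> Fraction:
    """Round value to the nearest multiple of 2^-frac_bits, ties upward."""
    scale = 2 ** frac_bits
    return Fraction(math.floor(value * scale + Fraction(1, 2)), scale)


# Quantize 5/16 with 1, 2, and 3 fractional bits.
sample = Fraction(5, 16)
rounded = {f"Q{bits}": round_to_nearest(sample, bits) for bits in (1, 2, 3)}
```

The three results are 1/2, 1/4, and 3/8: the value first decreases and then increases as fractional bits are added, so it is not monotonic in the precision.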

Non-monotonicity of the output causes non-monotonicity of robustness, as we can place the decision boundary of the classifier so as to put Q2 into a different class than Q1 and Q3. Suppose the original sample is 3/2 and its class is associated with the output neuron on the top, and suppose attacks can only lie in the neighboring interval 3/2 ± 1. In this case, we obtain that the Q2 network admits an attack, because the bottom output neuron can take 5/2, which is larger than 2. On the other hand, the bottom output can never exceed 3/8 and 1/2, hence Q1 and Q3 are robust. Dually, non-robustness is also non-monotonic as, for the sample 9/2 whose class corresponds to the bottom neuron, for the interval 9/2 ± 2, Q2 is robust while both Q3 and Q1 are vulnerable. Notably, the specific attacks of Q3 and Q1 also do not always coincide, as is the case, for instance, for 7/2.

Robustness and non-robustness are non-monotonic in the number of bits for quantized networks. As a consequence, verifying a high-bit quantization, or a real-valued network, may derive false conclusions about a target lower-bit quantization, in either direction. Specifically, for the question as to whether an attack exists, we may have both (i) false negatives, i.e., the verified network is robust but the target network admits an attack, and (ii) false positives, i.e., the verified network is vulnerable while the target network is robust. In addition, we may also have (iii) true positives with invalid attacks, i.e., both are vulnerable but the found attack does not transfer to the target network. For these reasons, we introduce a verification method for quantized neural networks that accounts for their bit-precise semantics.

## **4 Verification of Quantized Networks using Bit-precise SMT-solving**

Bit-precise SMT-solving comprises various technologies for deciding the satisfiability of first-order logic formulae, whose variables are interpreted as bit-vectors of fixed size. In particular, it produces satisfying assignments (if any exist) for formulae that include bitwise and arithmetic operators, whose semantics corresponds to that of hardware architectures. For instance, we can encode bit-shifts, 2's complementation, multiplication and addition with overflow, and signed and unsigned comparisons. More precisely, this is the quantifier-free first-order theory of bit-vectors (i.e., QF\_BV), which we employ to produce a monolithic encoding of the verification problem for quantized neural networks.

A verification problem for the neural networks f<sub>1</sub>, ..., f<sub>K</sub> consists of checking the validity of a statement of the form

$$
\varphi(y\_1, \dots, y\_K) \implies \psi(f\_1(y\_1), \dots, f\_K(y\_K)), \tag{6}
$$

where ϕ is a predicate over the inputs and ψ over the outputs of all networks; in other words, it consists of checking an input–output relation, which generalizes various verification questions, including robustness to adversarial attacks and fairness in machine learning, which we treat in Sec. 5. For the purpose of SMT solving, we encode the verification problem in Eq. 6, which is a validity question, by its dual satisfiability question

$$\varphi(y\_1, \ldots, y\_K) \land \bigwedge\_{i=1}^K f\_i(y\_i) = z\_i \land \neg \psi(z\_1, \ldots, z\_K), \tag{7}$$

whose satisfying assignments constitute counterexamples for the contract. The formula consists of three conjuncts: the leftmost constrains the input within the assumption, the rightmost forces the output to violate the guarantee, while the one in the middle relates inputs and outputs by the semantics of the neural networks.
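The shape of the dual satisfiability question in Eq. 7 can be illustrated, for a single network (K = 1) over a very small integer domain, by exhaustive enumeration. This is a stand-in for an SMT solver, not the encoding used in this paper, and it is feasible only for toy sizes. The names `find_counterexample` and `toy_net` and the example predicates are hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Vector = Tuple[int, ...]


def find_counterexample(
    net: Callable[[Vector], Vector],
    phi: Callable[[Vector], bool],
    psi: Callable[[Vector], bool],
    domain: Sequence[int],
    arity: int,
) -> Optional[Tuple[Vector, Vector]]:
    """Search for y with phi(y) and z = net(y) and not psi(z), as in Eq. 7."""
    for y in product(domain, repeat=arity):
        if not phi(y):          # leftmost conjunct: input within the assumption
            continue
        z = net(y)              # middle conjunct: z is the network output for y
        if not psi(z):          # rightmost conjunct: the guarantee is violated
            return y, z
    return None                 # unsatisfiable: the contract is valid on the domain


def toy_net(y: Vector) -> Vector:
    """Hypothetical stand-in for a quantized network with one input and two outputs."""
    return (2 * y[0], 6)


def guarantee(z: Vector) -> bool:
    """Example guarantee psi: the first output never exceeds the second."""
    return z[0] <= z[1]
```

With the assumption 0 ≤ y ≤ 2 the search returns `None`, i.e., the contract holds on the enumerated domain, whereas widening the assumption to 0 ≤ y ≤ 4 yields the counterexample input 4 with outputs (8, 6).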

The semantics of the network consists of the bit-level translation of the update rule in Eq. 4 over all neurons, which we encode in the formula

$$\bigwedge\_{i=1}^{k} x\_i = \text{ReLU-}(2^F N)(x\_i') \wedge x\_i' = \bar{b}\_i + \mathbf{ashr}(x\_i'', F) \wedge x\_i'' = \sum\_{j=1}^{k} \bar{w}\_{ij} x\_j. \tag{8}$$

Each conjunct in the formula employs three variables x, x′, and x″ and is made of three respective parts. The first part accounts for the operation of clamping between 0 and 2<sup>F</sup>N, whose semantics is given by the formula ReLU-M(x) = ite(sign(x), 0, ite(x ≥ M, M, x)). Then, the second part accounts for the operations of scaling and biasing. In particular, it encodes the operation of scaling with rounding by truncation, i.e., int(2<sup>−F</sup>x), as an arithmetic shift to the right. Finally, the last part accounts for the propagation of values from the previous layer, which, despite the obvious optimization of pruning away all monomials with null coefficient, often consists of long linear combinations, whose exact semantics amounts to a sequence of multiply-add operations over an accumulator; in particular, encoding it requires care in choosing variable sizes and association layout.

**Fig. 3:** Abstract syntax trees for alternative encodings of a long linear combination of the form ∑<sub>i=1</sub><sup>k</sup> w<sub>i</sub>x<sub>i</sub>.
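The clamping and shifting operations used in Eq. 8 can be mimicked on fixed-width 2's complement words as in the sketch below. It only illustrates the bit-level semantics and is not the SMT encoding itself; the helper names `to_signed`, `ashr`, and `relu_m` are hypothetical, and words are modelled as non-negative Python integers below 2<sup>bits</sup>.

```python
def to_signed(word: int, bits: int) -> int:
    """Interpret a bits-wide word as a 2's complement signed integer."""
    word &= (1 << bits) - 1
    return word - (1 << bits) if word >> (bits - 1) else word


def ashr(word: int, shift: int, bits: int) -> int:
    """Arithmetic right shift: the sign bit is replicated into the vacated bits."""
    return (to_signed(word, bits) >> shift) & ((1 << bits) - 1)


def relu_m(word: int, cap: int, bits: int) -> int:
    """ReLU-M(x) = ite(sign(x), 0, ite(x >= M, M, x)) on a bits-wide word."""
    value = to_signed(word, bits)
    sign = value < 0                      # sign(x): the most significant bit is set
    if sign:
        return 0
    return cap if value >= cap else value
```

For example, on 8-bit words the pattern `0b11111000` (which encodes −8) shifted right by 2 gives `0b11111110` (which encodes −2), and `relu_m` maps that word to 0.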

The size of the bit-vector variables determines whether overflows can occur. In particular, since every monomial w<sub>ij</sub>x<sub>j</sub> consists of the multiplication of two B-bit variables, its result requires 2B bits in the worst case; since summation increases the value linearly, its result requires a number of extra bits that is logarithmic in the number of summands (regardless of the layout). Given this, we avoid overflow by using variables of 2B + log k bits, where k is the number of summands.
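The bit-width bound above can be checked numerically with a short sketch. It assumes that the logarithm is base 2 and rounded up to an integer, and that operands are signed B-bit words; the function names `accumulator_bits` and `fits_worst_case` are hypothetical.

```python
def accumulator_bits(word_bits: int, summands: int) -> int:
    """Width 2B + ceil(log2 k) for a sum of k products of B-bit words."""
    extra = (summands - 1).bit_length()   # equals ceil(log2(k)) for k >= 1
    return 2 * word_bits + extra


def fits_worst_case(word_bits: int, summands: int) -> bool:
    """Check that the largest possible magnitude of the sum fits the signed width."""
    largest_product = (2 ** (word_bits - 1)) ** 2       # (-2^(B-1)) * (-2^(B-1))
    worst_sum = summands * largest_product
    width = accumulator_bits(word_bits, summands)
    return worst_sum <= 2 ** (width - 1) - 1
```

For instance, with B = 8 and k = 784 summands (the number of inputs of the MNIST network in Sec. 5), this gives 16 + 10 = 26 bits.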

The association layout is not unique and, more precisely, varies with the order of construction of the long summation. For instance, associating from left to right produces a linear layout, as in Fig. 3a. Long linear combinations occurring in quantized neural networks are implemented as sequences of multiply-add operations over a single accumulator; this naturally induces a linear encoding. Instead, for the purpose of formal verification, we propose a novel encoding which re-associates the linear combination by recursively splitting the sum into equal parts, producing a *balanced layout* as in Fig. 3b. While linear and balanced layouts are semantically equivalent, we have observed that, in practice, the latter had a positive impact on the performance of the SMT-solver, as we discuss in Sec. 5, where we also compare against other methods and investigate different verification questions.
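The difference between the two association layouts can be sketched by building the syntax trees of a sum explicitly. In the hedged example below, trees are nested tuples `("+", left, right)` and leaves are plain integers standing for already multiplied monomials; the function names are hypothetical, and the sketch only illustrates the tree shapes, not the SMT formulas generated by the tool.

```python
from typing import Sequence, Tuple, Union

Tree = Union[int, Tuple[str, "Tree", "Tree"]]


def linear_layout(terms: Sequence[int]) -> Tree:
    """Associate from left to right: (((t1 + t2) + t3) + ...)."""
    tree: Tree = terms[0]
    for term in terms[1:]:
        tree = ("+", tree, term)
    return tree


def balanced_layout(terms: Sequence[int]) -> Tree:
    """Recursively split the sum into two halves of (almost) equal size."""
    if len(terms) == 1:
        return terms[0]
    mid = len(terms) // 2
    return ("+", balanced_layout(terms[:mid]), balanced_layout(terms[mid:]))


def depth(tree: Tree) -> int:
    """Number of nested additions on the longest path to a leaf."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))


def evaluate(tree: Tree) -> int:
    """Evaluate the sum; both layouts must agree on the result."""
    if not isinstance(tree, tuple):
        return tree
    return evaluate(tree[1]) + evaluate(tree[2])
```

For 8 summands the linear tree has depth 7 while the balanced one has depth 3, and both evaluate to the same value; in general the depth drops from k − 1 to about log<sub>2</sub> k.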

## **5 Experimental Results**

We set up an experimental evaluation benchmark based on the MNIST dataset to answer the following three questions. First, how does our balanced encoding scheme impact the runtime of different SMT solvers compared to a standard linear encoding? Then, how often can robustness properties that are proven for the real-valued network be transferred to the quantized network, and vice versa? Finally, how often do gradient-based attacking procedures miss attacks for quantized networks?

The MNIST dataset is a well-studied computer vision benchmark, which consists of 70,000 handwritten digits represented by 28-by-28 pixel images with a single 8-bit grayscale channel. Each sample belongs to exactly one category in {0, 1, ..., 9}, which a machine learning model must predict from the raw pixel values. The MNIST set is split into 60,000 training and 10,000 test samples.

We trained a neural network classifier on MNIST, following a *post-training quantization* scheme [13]. First, we trained, using TensorFlow with floating-point precision, a network composed of 784 inputs, 2 hidden layers of size 64 and 32 with ReLU-7 activation function, and 10 outputs, for a total of 890 neurons. The classifier yielded a *standard accuracy*, i.e., the ratio of samples that are correctly classified out of all samples in the testing set, of 94.7% on the floating-point architecture. Afterward, we quantized the network with various bit sizes, with the exception of imposing the input layer to be always quantized in 8 bits, i.e., the original precision of the samples. The quantized networks required at least Q3 with 7 total bits to obtain an accuracy above 90% and Q5 with 10 bits to reach 94%. For this reason, we focused our study on the quantizations from 6 to 10 bits in, respectively, Q2 to Q6 arithmetic.

Robust accuracy or, more simply, *robustness* measures the ratio of robust samples: for the distance ε > 0, a sample *a* is robust when, for all its perturbations *y* within that distance, the classifier class ◦ f chooses the original class c = class ◦ f(*a*). In other words, *a* is robust if, for all *y*,

$$|a - y|\_{\infty} \le \varepsilon \implies c = \text{class} \circ f(y), \tag{9}$$

where, in particular, the right-hand side can be encoded as ⋀<sub>j=1</sub><sup>m</sup> z<sub>j</sub> ≤ z<sub>c</sub>, for *z* = f(*y*). Robustness is a validity question as in Eq. 6 and any witness for the dual satisfiability question constitutes an adversarial attack. We checked the robustness of our selected networks over the first 300 test samples from the dataset with ε = 1 on the first 200 and ε = 2 on the next 100; in particular, we tested our encoding using the SMT-solvers Boolector [19], Z3 [5], and CVC4 [3], off the shelf.
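The robustness condition of Eq. 9, with its right-hand side encoded as the conjunction of z<sub>j</sub> ≤ z<sub>c</sub>, can be spelled out as a brute-force check over the integer points of the L<sub>∞</sub> ball. This is a minimal sketch for tiny inputs only, since the ball grows exponentially with the input dimension, and it is not the SMT-based procedure evaluated here; `classify`, `is_robust`, and `toy_net` are hypothetical names.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Vector = Tuple[int, ...]


def classify(outputs: Sequence[int]) -> int:
    """Index of the largest output (first one on ties), as in Eq. 5."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


def is_robust(net: Callable[[Vector], Vector], sample: Vector,
              eps: int) -> Tuple[bool, Optional[Vector]]:
    """Check Eq. 9 by enumeration; return (True, None) or (False, attack)."""
    target = classify(net(sample))
    ranges = [range(a - eps, a + eps + 1) for a in sample]   # |a - y|_inf <= eps
    for y in product(*ranges):
        z = net(y)
        if not all(z_j <= z[target] for z_j in z):           # some z_j > z_c
            return False, y
    return True, None


def toy_net(y: Vector) -> Vector:
    """Hypothetical two-input, two-output stand-in for a quantized classifier."""
    return (y[0] + y[1], 10)
```

On the sample (3, 3), whose class is the second output, `toy_net` is robust for ε = 1 and ε = 2, while for ε = 3 the enumeration finds the attack (5, 6), for which the first output becomes 11.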

Our experiments serve two purposes. The first is evaluating the scalability and precision of our approach. As for scalability, we study how the encoding layout, i.e., linear or balanced, and the number of bits affect the runtime of the SMT-solver. As for precision, we measured the gap between our method and both a formal verifier for real-numbered networks, i.e., Reluplex [14], and the IFGSM algorithm [28], with respect to the accuracy of identifying robust and vulnerable samples. The second purpose of our experiments is evaluating the effect of quantization on the robustness to attacks of our MNIST classifier and, with an additional experiment, measuring the effect of quantization on the gender fairness of a student grades predictor, also demonstrating the expressiveness of our method beyond adversarial attacks.

As we only compared the verification outcomes, any complete verifier for real-numbered networks would lead to the same results as those obtained with Reluplex. Note that these tools verify the real-numbered abstraction of the network using some form of linear real arithmetic reasoning. Consequently, rounding errors introduced by the floating-point implementation of both the network and the verifier are not taken into account.

### **5.1 Scalability and performance**

We evaluated whether our balanced encoding strategy, compared to a standard linear encoding, can improve the scalability of contemporary SMT solvers for quantifier-free bit-vectors (QF_BV) to check specifications of quantized neural networks. We ran all our experiments on an Intel Xeon W-2175 CPU, with 64GB memory, 128GB swap file, and 16 hours of time budget per problem instance. We encoded each instance using the two variants, the standard linear and our balanced layout. We scheduled 14 solver instances in parallel, i.e., the number of physical processor cores available on our machine. While Z3, CVC4 and Yices2 timed out or ran out of memory on the 6-bit network, Boolector could check the instances of our smallest network within the given time budget, independently of the employed encoding scheme. Our results align with the SMT-solver performances reported by the SMT-COMP 2019 competition in the QF_BV division [11]. Consequently, we will focus our discussion on the results obtained with Boolector.

**Table 1:** Median runtimes for bit-exact robustness checks. The term oot refers to timeouts, and oom refers to out-of-memory errors. Due to the poor performance of Z3, CVC4, and Yices2 on our smallest 6-bit network, we abstained from running them on networks with more than 6 bits; these entries are marked by a dash (-).

With the linear layout, Boolector timed out on all instances but those of the smallest network (6 bits), while with the balanced layout it checked all instances with an overall median runtime of 3h 41m. As shown in Tab. 1, the runtime roughly doubles with every additional bit, as also confirmed by the histogram in Fig. 4.

**Fig. 4:** Runtimes for bit-exact adversarial robustness checks of a classifier trained on the MNIST dataset using Boolector and our balanced SMT encodings. Runtime roughly doubles with each additional bit used for the quantization.

Our results demonstrate that our balanced association layout improves the performance of the SMT-solver, enabling it to scale to networks beyond 6 bits. Conversely, a standard linear encoding turned out to be ineffective on all tested SMT solvers. Besides, our method tackled networks with 890 neurons which, while small compared to state-of-the-art image classification models, already pose challenging benchmarks for the formal verification task. In the real-numbered world, for instance, off-the-shelf solvers could initially tackle up to 20 neurons [20], and modern techniques, while faster, are often evaluated on networks below 1000 neurons [14,4].

Additionally, we pushed our method to its limits, refining our MNIST network to a four-layer convolutional network (2 convolutional + 2 fully-connected layers) with a total of 2238 neurons, which achieved a test accuracy of 98.56%. While for the 6-bit quantization we proved robustness for 99% of the tested samples within a median runtime of 3h 39min, for 7 bits and above all instances timed out. Notably, Reluplex also failed on the real-numbered version, reporting numerical instability.

### **5.2 Comparison to other methods**

Looking at existing methods for verification, one has two options to verify quantized neural networks: verifying the real-valued network and hoping the functional property is preserved when quantizing the network, or relying on incomplete methods and hoping no counterexample is missed. The question that emerges is how accurate these two approaches are for verifying the robustness of a quantized network. To answer this question, we used Reluplex [14] to prove the robustness of the real-valued network. Additionally, we compared to the Iterative Fast Gradient Sign Method (IFGSM), which has recently been proposed to generate $\ell_\infty$-bounded adversarial attacks for quantized networks [28]; notably, IFGSM is incomplete in the sense that it may miss attacks. We then compared these two verification outcomes to the ground truth obtained by our approach.

In our study, we employ the following notation. We use the term "false negative" (i) to describe cases in which the quantized network can be attacked, while no attack exists that fools the real-number network. Conversely, the term "false positive" (ii) describes the cases in which a real-number attack exists while the quantized network is robust. Furthermore, we use the term "invalid attack" (iii) to specify attacks produced for the real-valued network that fool the real-valued network but not the quantized network.
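The three terms can be restated as a small decision procedure. The following sketch is only a reading aid; the function names and the label `"agreement"` for matching verdicts are illustrative and do not come from the evaluation scripts.

```python
def transfer_outcome(real_robust: bool, quantized_robust: bool) -> str:
    """Compare the verdict on the real-valued network with the bit-exact verdict."""
    if real_robust and not quantized_robust:
        # (i) the quantized network can be attacked, the real-valued one cannot
        return "false negative"
    if not real_robust and quantized_robust:
        # (ii) a real-valued attack exists, but the quantized network is robust
        return "false positive"
    return "agreement"  # hypothetical label for matching verdicts

def is_invalid_attack(fools_real: bool, fools_quantized: bool) -> bool:
    """(iii) an attack on the real-valued network that does not transfer."""
    return fools_real and not fools_quantized
```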

Regarding the real-numbered encoding, Reluplex accepts only pure ReLU networks. For this reason, we translate our ReLU-N networks into functionally equivalent ReLU networks, by translating each layer with

$$\text{ReLU-N}(W \cdot x + \mathbf{b}) = \text{ReLU}\left(-I \cdot \text{ReLU}(-W \cdot x - \mathbf{b} + N) + N\right). \tag{10}$$
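A translation of this kind can be sanity-checked numerically in the scalar case. The sketch below assumes that ReLU-N(v) = min(max(v, 0), N) and writes the two-layer form as ReLU(N − ReLU(N − v)), i.e., with the outer layer carrying the bias N; the function names and the test grid are illustrative.

```python
def relu(v):
    return max(v, 0.0)

def relu_n(v, n):
    """Clipped ReLU: 0 below 0, identity on [0, n], n above n."""
    return min(max(v, 0.0), n)

def relu_n_via_relu(v, n):
    """Same function built from two plain ReLUs: ReLU(n - ReLU(n - v))."""
    return relu(n - relu(n - v))

# Compare both forms on a grid covering v < 0, 0 <= v <= n and v > n.
N = 7.0
grid = [k / 4 for k in range(-40, 61)]
mismatches = [v for v in grid if relu_n(v, N) != relu_n_via_relu(v, N)]
```

The three regimes explain why the identity holds: for v < 0 the inner ReLU returns N − v and the outer one clips v to 0; for 0 ≤ v ≤ N both ReLUs act as the identity; for v > N the inner ReLU returns 0 and the result is N.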

Out of the 300 samples, at least one method timed out on 56 samples, leaving us with 244 samples whose results were computed over all networks. Tab. 2 depicts how frequently the robustness property could be transferred from the real-valued network to the quantized networks. Not surprisingly, we observed the trend that when increasing the precision of the network, the error between the quantized model and the real-valued model decreases. However, even for the 10-bit model, in 0.8% of the tested samples, verifying the real-valued model leads to a wrong conclusion about the robustness of the quantized network. Moreover, our results show the existence of samples where the 10-bit network is robust while the real-valued one is attackable and vice versa. The invalid attacks illustrate that the higher the precision of the quantization, the more targeted attacks need to be. For instance, while 94% of attacks generated for the real-valued network represented valid attacks on the 7-bit model, this percentage decreases to 80% for the 10-bit network.


**Table 2:** Transferability of vulnerability from the verification outcome of the real-valued network to the verification outcome of the quantized model. While vulnerability is transferable between the real-valued and the higher-precision networks (9 and 10 bits) in most of the tested cases, the discrepancy significantly increases when compressing the networks with fewer bits; see columns (i) and (ii).

Next, we compared how well incomplete methods are suited to reason about the robustness of quantized neural networks. We employed IFGSM to attack the 244 test samples for which we obtained the ground-truth robustness and measured how often IFGSM is correct in assessing the robustness of the network. For the sake of completeness, we performed the same analysis for the real-valued network.


**Table 3:** Transferability of incomplete robustness verification (IFGSM [28]) to ground-truth robustness (ours) for quantized networks. While for the real-valued and 10-bit networks our gradient-based incomplete verification did not miss any possible attack, a non-trivial number of vulnerabilities were missed by IFGSM for the low-bit networks. The row indicated by R compares IFGSM attacking the floating-point implementation to the ground truth obtained, using Reluplex, by verifying the real-valued relaxation of the network.

Our results in Tab. 3 present the trend that with higher precision, e.g., 10 bits or reals, incomplete methods provide a stable estimate of the robustness of the network, i.e., IFGSM was able to find attacks for all non-robust samples. However, for lower precision levels, IFGSM missed a substantial number of attacks, i.e., for the 7-bit network, IFGSM could not find a valid attack for 10% of the non-robust samples.

#### **5.3 The effect of quantization on robustness**

In Tab. 4 we show how standard accuracy and robust accuracy degrade on our MNIST classifier when increasing the compression level. The data indicates a constant discrepancy between standard accuracy and robustness. For real-numbered networks, a similar fact was already known in the literature [26]; we empirically confirm that observation for our quantized networks, whose discrepancy fluctuated between 3 and 4% across all precision levels. Besides, while an acceptable standard accuracy, i.e., larger than 90%, was achieved at 7 bits, an equally acceptable robustness was achieved only at 9 bits.

One relationship not shown in Tab. 4 is that these 4% of non-robust samples are not the same across quantization levels. For instance, we observed samples that are robust for the 7-bit network but attackable when quantizing with 9 and 10 bits. Conversely, there are attacks for the 7-bit network that are robust samples in the 8-bit network.

**Table 4:** Accuracy of the MNIST classifiers on the 244 test samples for which all quantization levels could be checked within the given time budget. The column indicated by R compares the accuracy of the floating-point implementation to the robust accuracy of the real-valued relaxation of the network.

### **5.4 Network specifications beyond robustness**

Concerns have been raised that decisions of an ML system could discriminate against certain groups due to a bias in the training data [2]. A vital issue in quantifying fairness is that neural networks are black boxes, which makes it hard to explain how each input contributes to a particular decision.

We trained a network on a publicly available dataset consisting of 1000 students' personal information and academic test scores [1]. The personal features include gender, parental level of education, lunch plans, and whether the student took a preparation course for the test, all of which are discrete variables. We train a predictor for students' math scores, which is a discrete variable between 0 and 100. Notably, the dataset contains a potential source for gender bias: the mean math score among females is 63.63, while it is 68.73 among males.

The network we trained is composed of 2 hidden layers with 64 and 32 units, respectively. We use a 7-bit quantization-aware training scheme, achieving a 4.14% mean absolute error, i.e., the difference between predicted and actual math scores on the test set.

The network is *fair* if the gender of a person influences the predicted math score by at most the bias β. In other words, checking fairness amounts to verifying that

$$\bigwedge_{i \neq \text{gender}} s_i = t_i \land s_{\text{gender}} \neq t_{\text{gender}} \implies |f(\mathbf{s}) - f(\mathbf{t})| \le \beta,\tag{11}$$

is valid over the variables *s* and *t*, which respectively model two students for which gender differs but all other features are identical; we call them twin students. When we encode the dual formula, we encode two copies of the semantics of the same network: to one copy we give one student *s* and take the respective grade g, to the other we give its twin *t* and take grade h; precisely, we check the negation of the formula in Eq. 11 for unsatisfiability. Then, we compute a tight upper bound for the bias, that is, the maximum possible change in predicted score for any two twins. To compute the tightest bias, we progressively increase β until our encoded formula becomes unsatisfiable.
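The search for the tightest bias can be sketched on a toy model. The code below is a minimal illustration under assumptions: `toy_predictor` is a hypothetical integer-valued predictor over three binary features (not the trained network), and the satisfiability check of the dual formula is replaced by exhaustive enumeration of all twin pairs.

```python
from itertools import product

def toy_predictor(gender, prep_course, lunch):
    """Hypothetical integer score predictor over three binary features."""
    return 60 + 5 * gender + 8 * prep_course + 3 * lunch - 2 * gender * lunch

def counterexample(f, beta):
    """Dual of Eq. 11: search for twins whose predictions differ by more than beta."""
    for prep_course, lunch in product((0, 1), repeat=2):
        g = f(0, prep_course, lunch)  # student s
        h = f(1, prep_course, lunch)  # twin t: only gender differs
        if abs(g - h) > beta:
            return (prep_course, lunch, g, h)
    return None  # "unsatisfiable": the predictor is fair for this beta

def tightest_bias(f):
    """Increase beta until no counterexample remains."""
    beta = 0
    while counterexample(f, beta) is not None:
        beta += 1
    return beta
```

For this toy predictor the gender effect is 5 − 2·lunch, so the loop stops at β = 5; with an SMT backend, `counterexample` would be replaced by a solver call on the encoded formula.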

We measure mean test error and gender bias of the 6- to 10-bit quantizations of the network. We show the results in Tab. 5. The test error was stable between 4.1 and 4.6% among all quantizations, showing that the change in precision did not affect the quality of the network in a way that was perceivable by standard measures. However, our formal analysis confirmed a gender bias in the network, producing twins with a difference of 15 to 21 points in predicted math score. Surprisingly, the bias monotonically increased as the precision level in quantization lowered, indicating to us that quantization plays a role in determining the bias.

**Table 5:** Results for the formal analysis of the gender bias of a students' grade predictor. The maximum gender bias of the network monotonically decreases with increasing precision.

## **6 Conclusion**

We introduced the first complete method for the verification of quantized neural networks which, by SMT solving over bit-vectors, accounts for their bit-precise semantics. We demonstrated, both theoretically and experimentally, that bit-precise reasoning is necessary to accurately ensure the robustness to adversarial attacks of a quantized network. We showed that robustness and non-robustness are non-monotonic in the number of bits for the numerical representation and that, consequently, the analysis of high-bit or real-numbered networks may derive false conclusions about their lower-bit quantizations. Experimentally, we confirmed that real-valued solvers produce many spurious results, especially on low-bit quantizations, and that gradient descent may also miss attacks. Additionally, we showed that quantization indeed affects not only robustness, but also other properties of neural networks, such as fairness. We also demonstrated that, using our balanced encoding, off-the-shelf SMT solving can analyze networks with hundreds of neurons which, despite hitting the limits of current solvers, establishes an encouraging baseline for future research.

## **Acknowledgments**

An early version of this paper was put into the EasyChair repository as EasyChair Preprint no. 1000. This research was supported in part by the Austrian Science Fund (FWF) under grants S11402-N23 (RiSE/SHiNE) and Z211-N23 (Wittgenstein Award), in part by the Aerospace Technology Institute (ATI), the Department for Business, Energy & Industrial Strategy (BEIS), and Innovate UK under the HICLASS project (113213).

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Highly Automated Formal Proofs over Memory Usage of Assembly Code

Freek Verbeek<sup>1,2</sup>, Joshua A. Bockenek<sup>1</sup>, and Binoy Ravindran<sup>1</sup>

<sup>1</sup> Virginia Tech, Blacksburg VA, USA <sup>2</sup> Open University of The Netherlands, Heerlen, The Netherlands

Abstract. We present a methodology for generating a characterization of the memory used by an assembly program, as well as a formal proof that the assembly is bounded to the generated memory regions. A formal proof of memory usage is required for compositional reasoning over assembly programs. Moreover, it can be used to prove low-level security properties, such as integrity of the return address of a function. Our verification method is based on interactive theorem proving, but provides automation by generating pre- and postconditions, invariants, control flow, and assumptions on memory layout. As a case study, three binaries of the Xen hypervisor are disassembled. These binaries are the result of a complex build-chain compiling production code, and contain various complex and nested loops, large and compound data structures, and functions with over 100 basic blocks. The methodology has been successfully applied to 251 functions, covering 12,252 assembly instructions.

Keywords: Formal Verification · Assembly · x86-64 · Memory Usage

## 1 Introduction

This paper presents a formal methodology for reasoning over the memory usage of functions in a software suite. Various security properties require knowledge of memory usage. For example, proving absence of buffer overflows requires proving that a function does not write outside certain memory regions. Control-flow integrity requires showing, among other things, that the return address cannot be overwritten [61]. The security property called non-interference requires reasoning over which parts of the memory are used by which functions [50].

Moreover, memory usage is crucial for compositional reasoning over assembly code. Typically, compositional reasoning requires proving that certain code fragments are spatially independent [45,47]. A proof of memory usage can be used to prove such independence, thereby allowing composition. Consider a function g that at some point calls function f. Compositional reasoning means that a verification effort over f can be reused for verification of g without unfolding it. This at least requires that the verification effort over f establishes that f does not modify the stack frame of g. More generally, compositional reasoning requires at least knowing that f restricts itself to certain parts of the memory. This is exactly what is established by proving memory usage.

Memory usage cannot satisfactorily be expressed at the source-code level. As an illustration, consider formulating a property that a function cannot overwrite its own return address. This requires knowledge on the values of the stack and frame pointers, making it an assembly-level property. At the assembly level, one can easily express a property formulating that the memory at the top of the stack frame (where the return address is stored) should remain unmodified.

Reasoning over assembly, however, is complicated due to the semantic gap between assembly and source code. In assembly code, ostensibly simple computations can be implemented using complex sequences of low-level operations. For example, a simple integer division by 10 can be implemented with a series of bit-level operations. Assembly code does not have types. It is common to, e.g., mix logical bitwise operators with signed integer arithmetic, or floating-point operations with bitvector operations. Assembly code does not have a clear distinction between stack frame and heap. Whether some address refers to a local variable stored in the stack, a global variable, or part of the heap is provable only by adding assumptions on memory layout. Finally, assembly does not have a clear notion of scoping. Function calls are not necessarily clearly delineated, and instead of assuming that a function cannot write to a variable it has no access to (such as a local variable of another function), this has to be proven.
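As an illustration of the division example, the sketch below reproduces in Python a multiply-and-shift sequence of the kind optimizing compilers commonly emit for unsigned 32-bit division by 10. The constant and shift amount are the usual ones for this case, but the concrete instruction sequence in any given binary may differ; `div10_bitlevel` is an illustrative name.

```python
MASK32 = 0xFFFFFFFF
MAGIC = 0xCCCCCCCD  # ceil(2**35 / 10)

def div10_bitlevel(x):
    """Unsigned 32-bit x // 10 using one multiplication and one right shift."""
    assert 0 <= x <= MASK32
    return (x * MAGIC) >> 35
```

Nothing in the multiply-and-shift form reveals that a division by 10 was intended, which is the kind of gap a verification effort at the assembly level has to bridge.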

The contribution of this paper consists of a formal, compositional and highly automated methodology for reasoning over memory usage at the assembly level<sup>3</sup>. Our approach first uses untrusted tools to generate a formal memory usage certificate (see Section 2). This certificate contains (1) theorems on memory usage, (2) the preconditions under which memory usage can be shown, and (3) proof ingredients. These proof ingredients contain assumptions on memory layout, control-flow information, and invariants. Section 2 provides an example of a function that theoretically can overwrite its own return address. We show that the certificate provides preconditions and a formal proof that a return-address-based exploit is not possible under those preconditions.

The certificate and the original assembly are loaded into an interactive theorem prover (ITP). Memory usage in general is an undecidable property (Rice's theorem [48]), which is why we aim for an ITP environment to allow user interaction when necessary. Using the proof ingredients, the certificate is formally proven correct with minimal user interaction, making use of customized proof strategies. Section 3 describes certificate verification and composition.

To demonstrate applicability and scalability, we apply the methodology to x86-64 binaries of the Xen hypervisor [13] (see Section 4). The binaries are obtained via the standard Xen build process, including optimizations. The binaries are disassembled using off-the-shelf disassembly tools. Our methodology is applied to 251 functions; for each function a certificate is automatically generated, and a proof is finished in the Isabelle/HOL theorem prover [44]. Without exception, the manual interaction consists of elementary interactive theorem proving such as applying the proper proof method.

<sup>3</sup> All code and proofs are publicly available [57].

While past work [38,41,25] on assembly-level formal verification exists, the degree of either scalability or automation is limited. As an example of interactive theorem proving, Boyer and Yu verified machine-code implementations of various standard sort and string functions, requiring over 19,000 lines of manually written proof code for the verification of roughly 900 instructions [8]. As an example of automated theorem proving, Tan et al. presented an approach which takes about 6 hours for a 533-instruction string search algorithm [56]. In contrast, this paper involves a degree of user interaction of ≈85 lines of proof code per 1,000 lines of assembly. Our work is able to almost fully automatically verify 12,252 instructions from real-world industrial binaries compiled by a real-world build process. Section 5 discusses prior art, its contrast with the paper's work, and the paper's contributions. To the best of our knowledge, there is no related work that is able to achieve similar scalability and automation on real-world binaries.

## 2 Formal Memory Usage Certificates

Figure 1 provides an example of a formal memory usage certificate (FMUC). The FMUC is generated automatically from an assembly file. This assembly file may be produced from a binary using a disassembler such as objdump, IDA<sup>4</sup>, Ghidra's decompiler<sup>5</sup>, or Capstone [46]. In case source code is available, the assembly code can also be produced directly by a compiler. In this example, the C code of Figure 1a is used solely for presentation; the input to the FMUC generation is the assembly created by disassembling the corresponding binary. For each function in the assembly file, an FMUC is produced. External functions, for example due to dynamic linking, are treated as black boxes (see Section 3.4).

An FMUC consists of two parts: a memory usage theorem and its proof (see Figure 1c). The theorem consists of assumptions implying a Hoare triple [28,40] over the function. The Hoare triple is specific to memory usage. Intuitively, it means that from a state satisfying precondition P, after execution of code fragment f, the state satisfies postcondition Q (as in normal Hoare triples). The Hoare triple also contains a memory region set M. Besides its regular meaning, the Hoare triple expresses that any write that occurs during execution of f occurs within one of the memory regions in this set.

The term memory usage formally denotes an overapproximation of the memory written to by a function. Thus, any address that is not enclosed in one of the regions of M, is guaranteed to be preserved. Set M, however, will also include the memory regions read by the function, for verification purposes.

The precondition P expresses that the instruction pointer rip is at the entry point of the function. It also provides initial symbolic values for all registers and memory regions that are read (e.g., rsp = rsp<sub>0</sub>). Finally, it formulates that the return address is stored at the top of the stack frame. The postcondition Q expresses that the function has returned, i.e., the instruction pointer is equal to the return address and the stack pointer rsp is equal to its original value plus eight. For any callee-saved register, i.e., any register whose value is assumed to be preserved by the function call, it will say that its value is unchanged.

<sup>4</sup> https://www.hex-rays.com/products/ida/index.shtml

<sup>5</sup> https://ghidra-sre.org/

```
int main(int argc, char* argv[]) {
  int* a = (int*)argv;
  int* b = (int*)(argv + 4);
  *(int*)(argv + 2) = *a + *b;
  *(char*)argv = 'a';
  int array[argc];
  for (int i = 0; i < argc; i++) {
    array[i] = argv[i][0] * 2;
  }
  if (is_even(argc)) {
    return array[argc];
  }
  return array[0];
}
```
(a) C Code

```
Block 1149−>120b;
Loop
  Block 123e−>1244;
  If SF ≠ OF Then
    Block 120d−>123a
  Else Break Fi
Pool;
Block 1246−>1249;
Block 124b−>124b; – call to is_even
Block 1250−>1252;
If ZF Then
  Block 1263−>1267
Else Block 1254−>1261 Fi;
Block 1269−>1279;
If ZF Then
  Block 1280−>1285
Else Block 127b−>127b Fi
```
(b) Syntactic Control Flow f

```
thm: MRR =⇒ {P}f{Q; M}
proof:
   apply (check_scf_step)+
   apply (check_scf_while "P123e || P1246")
   apply (check_scf_step)+
where:
  P ≡ rip = 1149 ∧ rsp = rsp0 ∧ ... ∧ ∗[rsp, 8] = ret_addr
  Q ≡ rip = ret_addr ∧ rsp = rsp0 + 8 ∧ ... ∧ ∗[rsp0, 8] = ret_addr
```
(c) Theorem and proof code

$$M = \{a = [\mathtt{rsp}_0, 8],\ b = [\mathtt{fs}_0 + 40, 8],\ c = [\mathtt{rsi}_0 + 36, 4],\ d = [\mathtt{rsp}_0 - 8, 8], \ldots\}$$

$$\text{MRR} = \{a, b, c, d, \ldots\} \text{ are separate}$$

(d) The memory regions and their relations for block 123e−>1244.

$$\begin{aligned}
\text{P123e}(\sigma) = {} & \mathtt{rip} = \mathtt{123e} \\
& \mathtt{rbp} = \mathtt{rsp}_0 - 8 \\
& \mathtt{rdi} = \mathtt{rdi}_0 \\
& \mathtt{rsp} = \mathtt{rsp}_0 - (88 + 16 * ((15 + 4 * \text{sextend}(\langle 31, 0\rangle \mathtt{rdi}_0)) / 16)) \\
& {*}[\mathtt{rsp}_0 - 40, 8] = \mathtt{rsp}_0 - (85 + 16 * ((15 + 4 * \text{sextend}(\langle 31, 0\rangle \mathtt{rdi}_0)) / 16)) \gg 2 \ll 2 \\
& {*}[\mathtt{rsp}_0 - 48, 8] = \text{sextend}(\langle 31, 0\rangle \mathtt{rdi}_0) - 1 \\
& {*}[\mathtt{rsp}_0 - 56, 8] = \mathtt{rsi}_0 + 32 \\
& \ldots
\end{aligned}$$

(e) Invariant at line 0x123e (only 7 out of 23 equations shown)

{P124b} is\_even {P1250; M<sub>is\_even</sub>}

(f) Assumption due to call of function is\_even

Fig. 1: An FMUC. Region [a, s] denotes a region of s bytes starting at 64-bit address a. Notation ∗r denotes reading region r in little-endian fashion. Notation ⟨31, 0⟩rdi<sub>0</sub> takes the lower 32 bits of the register.

The component f of the memory usage theorem is a representation of the control flow of the function in terms of syntactic structures such as basic blocks, loops and if-then-else statements (see Figure 1b). We call this the syntactic control flow (SCF). The SCF is automatically generated from the control flow graph (CFG). A syntactic structure is required because the proof is done using Hoare logic, which is guided by syntax. The proof of an FMUC of an entire function is based on FMUCs per basic block. Thus one FMUC is generated per basic block, and one corollary FMUC for the entire function.

The proof consists of two further proof ingredients: memory region relations and invariants. We zoom in on block 123e−>1244 to explain both of these. The FMUC provides 13 regions for this block, of which 4 are shown (see Figure 1d). Region a stores the return address. Region b depends on the segment register fs and stores the canary [15]. Region c is based on the pointer passed as second argument to the function. Finally, region d is part of the stack frame. The generated memory region relations assume that all these regions are separate. Out of the per-block memory regions and their relations, memory regions and relations for the function as a whole are composed.

For each basic block, an invariant is generated. Stronger invariants can lead to a tighter approximation of memory usage. The invariant assigned to block 123e−>1244 is effectively a loop invariant (see Figure 1e). The frame pointer rbp is equal to the original stack pointer minus eight. Register rdi has not been touched. We also show some of the more complex invariants, such as the value of the stack pointer. In total, the loop invariant provides information on 11 registers and 12 memory locations for this basic block. Note that the FMUC provides preconditions in terms of the initial state of the corresponding basic block. In Section 3.2 these are lifted to preconditions in terms of the initial state of the function.

For this example, we treated is\_even as an external function (see Figure 1f). An assumption was thus generated that expresses that the memory usage of that function suffices to show that the invariant at line 124b implies the invariant at line 1250. This means, among other things, that the memory used by is\_even (denoted M<sub>is\_even</sub>) should not overlap with regions a through d. Section 3.4 provides more information on composition.

The FMUC is generated automatically, except for the three-line proof in Figure 1c. Due to the undecidability of memory usage, interaction may be required. Isabelle/HOL proof strategies are provided to assist in that interaction. Section 3 provides more details. The manual effort required in proving the FMUC for this function consists simply of calling the proper proof strategies. First, check\_scf\_step is run, applying Hoare logic rules and proving correctness of the memory usage until the loop. Then, the proof strategy for dealing with the loop is called, with the invariant generated from the FMUC. Finally, check\_scf\_step is called again, which is able to verify the remainder of the function.

Finally, note that without any assumptions the function could overwrite its own return address at various places. The memory region relations MRR are sufficiently strong to exclude this. These relations thus form the preconditions under which a return-address exploit is impossible. As an example, they assume that regions a and c are separate. This means that the address stored in parameter argv (reflected as rsi<sub>0</sub> at the assembly level) is not allowed to point to a region within the stack frame of function main.

Due to space restrictions, we omit details on the algorithms that generate an FMUC. In general, none of the FMUC generation is part of the trusted computing base. That is, none of the algorithms need to be backed up by formal proofs. The output of the FMUC generation is imported into Isabelle/HOL, where it is proven correct. If there is an error in CFG generation, control flow extraction, symbolic execution, or in the generated invariants, then the certificate cannot be proven in Isabelle/HOL. One exception is the memory region relations. They are assumptions, and if they are internally inconsistent this leads to a vacuous truth. For that reason, Z3 is used to generate them [39], making it impossible to introduce, e.g., a relation where two overlapping regions are considered separate.

## 3 FMUC Verification

This section presents the verification of an FMUC. Both the FMUC and the original assembly are loaded into Isabelle/HOL. The theorem is then proven using the proof ingredients stored in the FMUC. This means that given a step function that models the semantics of the assembly instructions, the Hoare triple is verified.

Let step :: I × S × S → 𝔹 be a transition relation. It takes as input an instruction of type I and two states σ and σ′. It returns true if and only if execution of the instruction in state σ can produce state σ′. Undefined behavior, such as null-pointer dereferencing, is modeled by relating a state to any successor state. The semantics of a syntactic control flow (SCF) are straightforwardly defined by a function exec\_scf :: SCF × S × S → 𝔹 (here SCF denotes the type of a syntactic control flow object). In case of loops the function is defined using a least fixed point construction. This way, if the halting condition is never met, there exists no related σ′.
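The relational reading of exec\_scf can be illustrated with a small Python sketch. It is not the Isabelle/HOL definition: the state is a single integer, the SCF type has only three hypothetical constructors (`Block`, `Seq`, `While`), undefined behavior is not modeled, and the fixed point is computed by plain exploration, which only terminates because the toy state space is finite.

```python
from dataclasses import dataclass
from typing import Callable, Set, Union

State = int  # toy state: one counter; real states hold registers, flags and memory


@dataclass(frozen=True)
class Block:
    step: Callable[[State], Set[State]]  # successor states of one basic block


@dataclass(frozen=True)
class Seq:
    first: "SCF"
    second: "SCF"


@dataclass(frozen=True)
class While:
    cond: Callable[[State], bool]
    body: "SCF"


SCF = Union[Block, Seq, While]


def exec_scf(f: SCF, s: State) -> Set[State]:
    """All states related to s by f; empty if f never terminates from s."""
    if isinstance(f, Block):
        return set(f.step(s))
    if isinstance(f, Seq):
        return {t for m in exec_scf(f.first, s) for t in exec_scf(f.second, m)}
    # While: collect loop-head states until no new one appears (least fixed point).
    seen: Set[State] = set()
    frontier: Set[State] = {s}
    exits: Set[State] = set()
    while frontier:
        cur = frontier.pop()
        seen.add(cur)
        if not f.cond(cur):
            exits.add(cur)          # halting condition met: cur is a final state
        else:
            frontier |= exec_scf(f.body, cur) - seen
    return exits


count_up = While(lambda i: i < 5, Block(lambda i: {i + 1}))
spin = While(lambda i: True, Block(lambda i: {(i + 1) % 4}))

print(exec_scf(count_up, 0))  # the loop exits once the counter reaches 5
print(exec_scf(spin, 0))      # never halts, so no successor state is related
```

The second call shows the point made above: a loop whose halting condition is never met relates the initial state to no σ′ at all.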

First, we define the notion of memory usage with respect to a certain state change:

Definition 1. The set of memory regions M is the memory usage with respect to the state change from σ to σ′ if and only if any byte at an address a not inside one of the regions is unchanged.

$$\text{usage}(M, \sigma, \sigma') \equiv \forall a \, \cdot \, \left( \forall r \in M \, \cdot \, \left[ a, 1 \right] \bowtie r \right) \Longrightarrow \sigma' : \ast[a, 1] = \sigma : \ast[a, 1]$$

Here, the notation σ : ∗[a, s] means reading s bytes in little-endian fashion from memory address a in state σ. The notation r<sub>0</sub> ⋈ r<sub>1</sub> denotes that two regions are separate.
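Definition 1 can be made concrete on a small finite memory. The following Python sketch is an illustration only, not the Isabelle/HOL formalization: memory is modeled as a dictionary from addresses to byte values, a region is assumed to be an `(address, size)` pair, and the names `separate` and `usage` are chosen here for readability.

```python
from typing import Dict, Iterable, Tuple

Region = Tuple[int, int]  # (start address, size in bytes); assumed representation
Memory = Dict[int, int]   # address -> byte value; absent addresses read as 0


def separate(r0: Region, r1: Region) -> bool:
    """True if the two regions share no byte (the relation r0 ⋈ r1)."""
    (a0, s0), (a1, s1) = r0, r1
    return a0 + s0 <= a1 or a1 + s1 <= a0


def usage(regions: Iterable[Region], before: Memory, after: Memory) -> bool:
    """Definition 1: every byte separate from all regions is unchanged."""
    regions = list(regions)
    # Addresses absent from both memories read as 0 in both, so they cannot differ.
    for a in set(before) | set(after):
        outside = all(separate((a, 1), r) for r in regions)
        if outside and before.get(a, 0) != after.get(a, 0):
            return False
    return True


before = {0x100: 1, 0x101: 2, 0x200: 7}
after = {0x100: 9, 0x101: 2, 0x200: 7}     # only the byte at 0x100 was written

print(usage([(0x100, 2)], before, after))  # the write is covered by the region
print(usage([(0x200, 1)], before, after))  # the write at 0x100 is not covered
```

Note that `usage` is an overapproximation: a region may be listed even if none of its bytes actually change.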

Definition 2. A memory usage Hoare triple is defined as:

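$$\{P\}\ f\ \{Q; M\} \equiv \forall \sigma\, \sigma' \cdot P(\sigma) \land \text{exec\_scf}(f, \sigma, \sigma') \longrightarrow Q(\sigma') \land \text{usage}(M, \sigma, \sigma')$$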

In words, Definition 2 states the following: if precondition P holds on the initial state σ and state σ′ can be obtained by executing f from σ, then postcondition Q holds on the produced state σ′ and the values stored outside the memory regions in set M are preserved.
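As a sanity check of Definition 2, the triple can be evaluated by brute force on a tiny state space. The sketch below is a toy model under stated assumptions: a state is a three-byte memory, a program is a hypothetical Python function returning all successor states, and `hoare` enumerates states instead of proving anything.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

State = Tuple[int, ...]   # a tiny memory: state[a] is the byte at address a
Region = Tuple[int, int]  # (start address, size in bytes); assumed representation


def usage(regions: Iterable[Region], s: State, t: State) -> bool:
    """Bytes outside every region are unchanged between s and t."""
    covered = {a for (start, size) in regions for a in range(start, start + size)}
    return all(s[a] == t[a] for a in range(len(s)) if a not in covered)


def hoare(pre: Callable[[State], bool],
          run: Callable[[State], List[State]],
          post: Callable[[State], bool],
          regions: List[Region],
          states: Iterable[State]) -> bool:
    """{pre} run {post; regions}, checked by enumerating the given states."""
    return all(
        post(t) and usage(regions, s, t)
        for s in states if pre(s)
        for t in run(s)
    )


def clear_byte_1(s: State) -> List[State]:
    """Hypothetical program: writes 0 to address 1."""
    return [s[:1] + (0,) + s[2:]]


# All states of a 3-byte memory whose bytes are 0 or 1.
all_states = list(product((0, 1), repeat=3))

ok = hoare(lambda s: True, clear_byte_1, lambda t: t[1] == 0, [(1, 1)], all_states)
too_small = hoare(lambda s: True, clear_byte_1, lambda t: t[1] == 0, [(0, 1)], all_states)
print(ok, too_small)
```

The second triple fails even though the postcondition holds, because the claimed memory usage does not cover the byte that is written.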

#### 3.1 Verification Tools Used

Isabelle/HOL. The theorem prover utilized in this work was Isabelle 2018 [44]. It is a generic tool with a flexible, extensible syntactic framework. Isabelle also provides a powerful proof language known as intelligible semi-automated reasoning (Isar) [59] and a proof strategy language called Eisbach [37]. We made heavy use of the Word library [17]. This library provides a limited-precision integer type, 'a word, where 'a is the number of bits in the integer. Various operations are provided for manipulation of and arithmetic involving formal words, including bit indexing, bit shifting, setting specific bits, and signed and unsigned arithmetic. Operators for inequality are also included, as well as operations for converting between word sizes.

Machine Model and Instruction Semantics. Heule et al. provide semantics of the x86-64 architecture [27]. Instead of manually codifying instruction semantics, they applied machine learning to derive semantics from a live x86 machine. This produced highly reliable semantics: they compared the semantics to manually written semantics based on the Intel reference manuals, and found that in the few cases where they differed the Intel manuals were wrong. Roessle et al. embedded these semantics into the Isabelle/HOL theorem prover and tested the formal Isabelle semantics against live x86 hardware [49]. This formal machine model is the basis of our verification effort.

Symbolic Execution. Bockenek et al. provide an Isabelle/HOL symbolic execution engine based on the above semantics [6]. Effectively, this provides a function symb\_exec that symbolically runs basic blocks. Let a<sub>0</sub> and a<sub>1</sub> be the start and end addresses of the block. A call to symb\_exec(a<sub>0</sub>, a<sub>1</sub>, σ, σ′) returns true if and only if state σ′ is the result of symbolically executing the block from state σ. The symbolic execution is completely written in Isabelle/HOL, meaning that every rewrite rule has been formally proven correct.

#### 3.2 Per-block Verification

Verification starts at the level of individual basic blocks. Figure 2a shows an introduction rule for establishing a Hoare triple over a basic block. The first assumption requires the symbolic execution method to run over a universally quantified symbolic state σ that satisfies the precondition. Any resulting state σ′ should satisfy the postcondition Q, and the set of memory regions M generated for the block should be correct.

The second assumption is required because of an important subtlety: the regions generated in the FMUC are expressed in terms of the initial state of their basic block. However, it makes no sense to express the regions used by individual blocks within a larger function in terms of their own initial state. If a region of a basic block somewhere within a function body depends on, e.g., the value of register rdi at the start of that block, then it is unsound to express that memory region in terms of rdi<sub>0</sub>, i.e., the value of rdi at the start of the function. Therefore, the Hoare triples are defined based on a set of memory regions M′ that solely depends on the initial state of the function. For each block, that set is obtained by taking the generated set of memory regions M (expressed in terms of the initial state of the block) and applying it to any state that satisfies the current invariant. This produces a set of regions expressed in terms of the initial state of the function.

An Isabelle proof strategy has been implemented that, given the proof ingredients from the FMUC, discharges this introduction rule. The proof strategy runs symbolic execution within Isabelle/HOL, proves the postcondition and proves the memory usage. The open variables P, Q, a<sub>0</sub>, a<sub>1</sub> and M are all provided by the FMUC. No interaction is required; for basic blocks the proof is automated.

#### 3.3 Verification of Function Body

$$\begin{array}{c} \forall \sigma\, \sigma' \cdot P(\sigma) \land \text{symb\_exec}(a_0, a_1, \sigma, \sigma') \Longrightarrow Q(\sigma') \land \text{usage}(M(\sigma), \sigma, \sigma') \\ M' = \{\, r \mid \exists \sigma \cdot P(\sigma) \land r \in M(\sigma) \,\} \\ \hline \{P\}\ \textbf{Block}\ a_0\ a_1\ \{Q; M'\} \end{array}$$

(a) Introduction rule for a basic block

Fig. 2: Hoare rules for memory usage

For each syntactic construct, a Hoare rule is defined (see Figure 2). The sequence and conditional rules (only the first is shown) are straightforward: the memory usage is the union of the memory usage of the constituents. Note that the sequence rule is sound only because the memory predicates are independent of the initial state of the basic blocks, as discussed above.

The while rule is based on a loop invariant I. If the memory usage of one iteration of function body f is constrained to the set of memory regions M, then that holds for the entire loop. This sounds counterintuitive. Consider a simple C-like loop iterating from i = 0 while i < 10, with the assignment a[i]=0 as its body, i.e., it writes to the ith element of an array. Verification of the loop requires the invariant I(σ) = i(σ) < 10. The FMUC of the loop body will have a set of memory regions M(σ) = {[a + i(σ), 1]}, i.e., one region of one byte, expressed in terms of the initial state of the basic block. Now consider the application of the introduction rule to the block of the loop body. It will introduce a Hoare triple with:

$$\begin{array}{lcl} M' & = & \{\, r \mid \exists \sigma \cdot I(\sigma) \land r \in M(\sigma) \,\} \\ & = & \{\, r \mid \exists \sigma \cdot i(\sigma) < 10 \land r = [a + i(\sigma), 1] \,\} \\ & = & \{\, [a', 1] \mid a \le a' < a + 10 \,\} \end{array}$$

The set M′ is actually the memory used by the entire loop. This is because the introduction rule applies the state-dependent set of memory regions to any state that satisfies the invariant. This shows that the strength of the generated invariants influences the tightness of the overapproximation of memory usage. A weaker invariant, e.g., i < 20, would produce a larger set of memory regions.
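The effect of the invariant on the resulting set of regions can be reproduced by enumeration. The sketch below is a toy illustration: the base address `A`, the helper names, and the restriction of i to an unsigned byte are assumptions made here so that the set of states is finite.

```python
from typing import Callable, Iterable, Set, Tuple

Region = Tuple[int, int]  # (start address, size in bytes); assumed representation

A = 0x1000  # hypothetical base address of the array a


def block_regions(i: int) -> Set[Region]:
    """M(sigma): the loop body writes one byte at a + i."""
    return {(A + i, 1)}


def lift(invariant: Callable[[int], bool], candidates: Iterable[int]) -> Set[Region]:
    """M' = { r | exists sigma . I(sigma) and r in M(sigma) }, by enumeration."""
    return {r for i in candidates if invariant(i) for r in block_regions(i)}


values = range(256)  # i is treated as an unsigned byte, so there are finitely many states

tight = lift(lambda i: i < 10, values)
loose = lift(lambda i: i < 20, values)

print(sorted(a - A for (a, _) in tight))  # offsets covered under I = (i < 10)
print(len(tight), len(loose))             # the weaker invariant yields more regions
```

Under the invariant i < 10 the lifted set consists of ten one-byte regions at offsets 0 through 9, which is the memory written by the whole loop; the weaker invariant i < 20 doubles that set without being unsound.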

An Isabelle/HOL proof strategy is implemented that automatically applies the proper Hoare logic rule. It is driven by the syntactic control flow provided by the FMUC. For function bodies without loops, this proof strategy requires no further interaction. For each loop entry, it is required to manually apply the weaken rule to show that the postcondition of the block before entry implies the loop invariant. Without exception, each of these proofs could be finished using standard off-the-shelf Isabelle/HOL tools. The part that is usually the most involved – defining the invariants – is taken care of by the FMUC generation.

#### 3.4 Composition

Let f be a function body. Assume that the function has been verified, i.e., a Hoare triple has been proven of the form: {P<sub>f</sub>} f {Q<sub>f</sub>; M<sub>f</sub>}. In order to reuse that verification effort compositionally, function f is considered to be a black box once it is verified. Now consider a function g calling function f:

    a0: push rbp
    a1: call f
    a2: pop rbp
    a3: ret

Let P denote the precondition right before executing the assembly instruction call. Precondition P contains the equality ∗[rsp<sup>g</sup><sub>0</sub> − 8, 8] = rbp<sup>g</sup><sub>0</sub>, expressing that function g has pushed frame pointer rbp into its own local stack frame. Let Q denote the postcondition just after returning, but before executing pop. The postcondition of g expresses that callee-saved register rbp is properly restored, i.e., rbp = rbp<sup>g</sup><sub>0</sub>. That is indeed done by the pop instruction. In order to prove proper restoration of rbp, it must be proven that function f did not overwrite any byte in region [rsp<sup>g</sup><sub>0</sub> − 8, 8]. Additionally, function f must be proven not to overwrite region [rsp<sup>g</sup><sub>0</sub>, 8], which stores the return address of g. For this particular instance of calling f, it thus must be proven that f preserves these two regions.

More generally, function f can be called by various functions other than g. For each call, the specific requirements on which memory regions must be preserved differ. Thus, to be able to verify function f once, and reuse that verification effort for each call, the verification effort must at least contain an overapproximation of the memory written to by function f. Note that this is exactly the requirement when using separation logic [45,47,33]. Separation logic provides a frame rule for compositional reasoning. This frame rule informally states that if a program can be confined to a certain part of a state, properties of this program carry over when the program is part of a bigger system.

We thus provide a version of the frame rule of separation logic, specific to memory usage verification (see Figure 3). Effectively, this rule is used to prove that the memory usage of a caller function g is equal to the memory it uses itself, plus the memory used by function f. It requires four assumptions. First, it assumes function f has been verified for memory usage, with M<sub>f</sub> denoting that memory usage. Second, it assumes that precondition P can be split up into two parts: precondition P<sub>f</sub> required to verify function f, and a separate part P<sub>sep</sub>. The separate part is specific to the actual call of the function. In the example, P<sub>sep</sub> will contain the equality ∗[rsp<sup>g</sup><sub>0</sub> − 8, 8] = rbp<sup>g</sup><sub>0</sub>. Third, the correctness of the set of memory regions M<sub>f</sub> should suffice to prove that the separated part P<sub>sep</sub> is preserved. In the example, this effectively means that M<sub>f</sub> should not overlap with the two regions of g. Fourth, P<sub>sep</sub> and Q<sub>f</sub> should imply postcondition Q.

$$\begin{array}{c} \{P_f\}\ f\ \{Q_f; M_f\} \\ P \Longrightarrow P_f \land P_{\text{sep}} \\ \forall \sigma\, \sigma' \cdot \text{usage}(M_f, \sigma, \sigma') \land P_{\text{sep}}(\sigma) \longrightarrow P_{\text{sep}}(\sigma') \\ Q_f \land P_{\text{sep}} \Longrightarrow Q \\ \hline \{P\}\ \textbf{Call}\ f\ \{Q; M_f\} \end{array}$$

Fig. 3: Frame rule for composition of memory usage
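For the example of g calling f, the third assumption of the frame rule boils down to a separation check between the callee's memory usage and the two regions the caller relies on. The following Python sketch illustrates that check with concrete numbers; the stack address, the region lists and the helper names are hypothetical, and the check is a sufficient condition rather than the rule itself.

```python
from typing import Iterable, List, Tuple

Region = Tuple[int, int]  # (start address, size in bytes); assumed representation


def separate(r0: Region, r1: Region) -> bool:
    """True if the two regions share no byte."""
    (a0, s0), (a1, s1) = r0, r1
    return a0 + s0 <= a1 or a1 + s1 <= a0


def preserved_by(callee_usage: Iterable[Region], protected: Iterable[Region]) -> bool:
    """No region the callee may touch overlaps a region the caller relies on."""
    protected = list(protected)
    return all(separate(m, p) for m in callee_usage for p in protected)


RSP0_G = 0x7FFF_F000  # hypothetical value of rsp on entry to g

caller_regions: List[Region] = [
    (RSP0_G - 8, 8),  # slot holding the saved rbp of g
    (RSP0_G, 8),      # return address of g
]

# After "push rbp" and "call f", the return address of f sits at rsp0 - 16,
# so a well-behaved f only touches memory at or below that slot.
m_f_good = [(RSP0_G - 16, 8), (RSP0_G - 48, 32)]
m_f_bad = [(RSP0_G - 12, 8)]  # overlaps the saved rbp slot of g

print(preserved_by(m_f_good, caller_regions))
print(preserved_by(m_f_bad, caller_regions))
```

With the first memory usage, the equality on the saved rbp slot carries over the call; with the second, restoration of rbp by the pop instruction can no longer be concluded.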

In practice, many functions will not be part of the assembly code under verification (e.g., external calls). We thus have to generate the assumptions required to proceed with verification. To this end, we introduce the following notation:

{P} f {Q; M<sub>f</sub>} ≡ ∃ P<sub>f</sub> Q<sub>f</sub> P<sub>sep</sub> · the four assumptions of the frame rule are satisfied

Making this assumption informally expresses that function f is assumed to have been verified. Its memory usage M<sub>f</sub> is assumed to suffice to prove that we could step from states satisfying P to states satisfying Q.

## 4 Case Study: Xen Project

The Xen Project [13] is a mature, widely-used virtual machine monitor (VMM), also known as a hypervisor. Hypervisors provide a method of managing multiple virtual instances of operating systems (called guests or domains) on a physical host. The Xen hypervisor is a suitable case study because of its security relevance and its complex build process involving real production code. Security is a significant issue in environments where hypervisors are used, such as the Amazon Elastic Compute Cloud (Amazon EC2), Rackspace Cloud, and many other cloud service providers. For example, when one or more physical hosts support virtual guests for any number of distinct users, ensuring isolation of the guest operating systems (OSs) is important. The Xen build process produces multiple binaries that contain functions not present in the Xen source itself. This is due to the inclusion of external static libraries and programs. We used Xen 4.12 compiled with GCC 8.2 via the standard Xen build process. This build process uses various optimization levels, ranging from O1 to O3.

Of the binaries produced by the Xen build process, we considered three: xenstore, xen-cpuid, and qemu-img-xen. The xenstore binary is involved in the functionality of XenStore<sup>6</sup>, a hierarchical data structure shared amongst all Xen domains. The xen-cpuid utility queries the underlying processors and displays information about the features they support. The third binary, qemu-img-xen, consists of over three hundred functions that are not present in the Xen source code. It provides some of the functionality of Quick Emulator (QEMU). QEMU is a free, open-source emulator<sup>7</sup>. Xen uses it to emulate device models (DMs), which provide an interface for hardware storage.


Fig. 4: Case Study Overview

<sup>6</sup> https://wiki.xen.org/wiki/XenStore

<sup>7</sup> https://www.qemu.org/

Our methodology is currently capable of dealing with 71% of the functions present in these binaries (see Figure 4). The supported features include (nested) loops, subcalls, variable argument lists, jumps into other function bodies, and string instructions with the rep prefix. There is no particular limit on function size. The average number of instructions per function analyzed is 49. Some of the functions analyzed have over 300 instructions and over 100 basic blocks.

There are five categories of features we do not support. The first and most common is indirection, accounting for 19%. Indirection involves a call or jump instruction that loads the target address from a register or memory location rather than using a static value. Switch statements and certain uses of goto are the most common causes of indirect jumps. Indirect calls generally result from usage of function pointers. For example, the main functions of all three verified binaries used switch statements in loops in the process of parsing command line options. These statements introduced indirect branches.

The second category involves issues related to generating the memory region relations. This step requires solving linear arithmetic over symbolically computed addresses. Sometimes, addresses are computed using a combination of arithmetic operators with bitwise logical operators. In some of these cases, our translation to Z3 does not produce an answer. As an example, function qcow\_open uses the rotate-left function to compute an address. As another example, function AES\_set\_encrypt\_key produces addresses that are obtained via combinations of bit-shifting, bit masking, and xor-ing.

The instruction repz cmps is currently not supported for technical reasons. It is the assembly equivalent of the function strncmp, but instead writes its result to a flag. Various other string-related instructions with the rep prefix are supported. Functions with recursion, a minority in systems code, are also not supported. Recursive stack frames in our framework are not well-suited to automation. The two recursive functions we encountered both perform file-system-like tasks. Functions do\_chmod and do\_ls are similar respectively to the permission-setting chmod utility, and directory-displaying ls. The final category is functions whose SCF explodes. The issue occurs mostly when loops have multiple entries.

The table in Figure 4 provides an overview of the verification effort. The table shows the absolute counts of functions verified as well as the total number of instructions for those functions. Alongside that information is the number of functions with loops that were verified and how many manual lines of proof were required in total. The vast majority of those manual proof lines were related to the loop count.

## 5 Related Work

Assembly verification has been an active research field for decades. Table 1 provides a non-exhaustive overview of related work. We first address some formal verification efforts at the assembly level. Then we discuss work in which assembly verification played a role in a larger verification context. Finally, verified compilation and static binary analysis tools are discussed.

Assembly-level Verification. Clutterbuck et al. [14] performed formal verification of assembly code using SPACE-8080, a verifiable subset of the Intel 8080 instruction set architecture (ISA) that is analyzable and formally verifiable [12]. Not long after, Bevier et al. presented a systems approach to software verification [5,7]. Their work laid out a methodology for verifying the correctness of all components necessary to execute a program correctly, including compiler, assembler and linker. The methodology was applied to a small OS kernel, Kit [4]. Similarly, Yu and Boyer [60,8] presented operational semantics and mechanized reasoning for approximately 80% of the instructions of the MC68020 microprocessor, over 85 instructions. Their approach utilized symbolic execution of operational semantics. These early efforts required significant interaction. For example, the approach of Yu and Boyer required over 19,000 lines of manually written proof to verify approximately 900 assembly instructions.

Matthews et al. targeted a simple machine model called TINY as well as Java virtual machine (JVM) bytecode using the M5 operational model [38]. Their approach utilizes symbolic execution of code annotated with manually written invariants. It also used verification condition generation to increase automation. This reduced the number of manually written invariants. Both of these assembly-style languages feature a stack for handling scratch variables rather than a register file as x86, ARM, and most other mainstream ISAs do.

Goel et al. presented an approach for modeling and verifying non-deterministic programs on the binary level [25,24]. In addition to formulating the semantics of most user-mode x86 instructions, they provided semantics for common system calls. System call semantics increase the spread of programs that can be fully verified. Their work was applied to multiple small case studies, including a word count program and two kernel-mode memory copying examples.

Bockenek et al. provide an approach to proving memory usage over x86 code [6]. They used a Floyd-style reasoning framework to prove Floyd invariants over functions [21]. They have applied it to functions of the HermitCore unikernel, covering 2,613 assembly instructions. Their approach required a significant amount of manual effort: pre- and postconditions, invariants, the actual regions of memory used and their relations all need to be manually defined.

The main difference between these existing approaches and the methodology presented in this paper concerns automation. Generally, interactive theorem proving over semantics of assembly instructions does not scale due to the amount of intricate user interaction involved. Figure 1e shows, e.g., the complexity of defining an assembly-level invariant even for a small example. Fully automated approaches to formal verification, however, do not scale either. The recent automated approach AUSPICE takes about 6 hours for a 533-instruction string search algorithm [56]. To the best of our knowledge, our methodology is the first that is able to deal with optimized x86-64 binaries produced by production code, with a "manual effort vs. instruction count ratio" of roughly 1 to 11.

Myreen et al. developed decompilation-into-logic [40,41,42]. That work, developed in the HOL4 theorem prover [54], uses operational semantics of machine code to lift programs into a functional form. That functional form can then be used in a Hoare logic framework for program analysis [40]. Decompilation-into-logic has been used for both ARM and x86 ISA machine models, and applied to various large examples, including benchmarks such as a garbage collector, and the Skein hash function. Decompilation-into-logic formally covers the gap between machine code and a HOL model. It is not a verification method in itself, i.e., it does not verify properties over the machine code. It can be used as a component in a binary-level verification methodology [51].

Table 1: Overview of Related Work.

Feng et al. presented stack abstractions for modular verification of assembly code [20,19]. Their work allows for integration of various proof-carrying code systems [43]. As with our work, it utilizes a Hoare-style framework for its verification. The authors applied their work to multiple example functions, such as two factorial implementations. In contrast to our approach, manual annotations are required to provide information regarding invariants and memory layout.

Integrated Assembly-Level Verification Efforts. A major verification effort, based on decompilation-into-logic, is the verification of the seL4 kernel [32,31]. The seL4 project provides a microkernel written in formally proven correct C code. The tool AutoCorres [26] is used for C code verification. Sewell et al. verified a refinement relation between the C source code and an ARM binary, both for non-optimized code and for code optimized at O2 [51]. The major difference with respect to our work is that our methodology targets existing production code, instead of code written with verification in mind. For example, the seL4 source code does not allow taking the addresses of stack variables (such as in Figure 1a): their approach requires a static separation of stack and heap. Neither the seL4 proof effort nor our methodology supports function pointers.

Shi et al. formally verified a real-time operating system (RTOS) for automotive use called ORIENTAIS [52]. Part of their approach involved source-level verification using a combination of Hoare logic and abstract communicating sequential processes (CSP) model analysis [29]. Binary verification was done by lifting the RTOS binary to xBIL, a related hardware verification language [53]. They translated requirements from the OSEK automotive industry standard to source code annotations.

Targeting a case study similar to that of this paper, Dam et al. formally verified a tiny ARMv7 hypervisor, PROSPER [16,3], at the assembly level. Their methodology integrated HOL4 with the Binary Analysis Platform (BAP) [9]. BAP utilizes a custom intermediate language that provides an architecture-agnostic representation of machine instructions and their side effects. HOL4 was used to translate the ARM binary into BAP's intermediate language, using the formal model of the ARM ISA by Fox et al. [22]. The SMT solver Simple Theorem Prover (STP) [23] was used to determine the targets of indirect branches and to discharge the generated verification conditions. While the approach was generally automated, user input was still required to describe software contracts of the hypervisor.

Verified Compilation. In contrast to directly verifying machine or assembly code, one can verify source code and then use verified compilation. Verified compilation establishes a refinement relation between assembly and source code. The CompCert project [36] provides a compiler for a subset of C. Its output has been verified to have the same semantics as the C source code. The seL4 project used CompCert to reduce its trusted code base [31]. Another example of verified compilation is CakeML [35]. It utilizes a subset of Standard ML modeled with big-step operational semantics. The main purpose of verified compilation, however, is not to verify properties over the code. For example, if the source code is vulnerable to a return-address exploit, then the assembly code is vulnerable as well. Verified compilation is thus often accompanied by source code verification. We have argued that for memory usage, assembly-level verification is necessary.

Static Analysis. Static analysis of binary code has been an active research field for decades [34,9,58]. The BitBlaze project [55] provides a tool called Vine which constructs control flow graphs for supplied programs and lifts x86 instructions to its own intermediate language (IL). Though Vine itself is not formally verified, it does support interfacing with the SMT solver STP as well as CVC [1,2]. The tool Infer [10], developed at Facebook, provides in-depth static analysis of LLVM code to detect bugs in C and C++ programs. It utilizes separation logic [47] and bi-abduction [11] to perform its analyses in an automated fashion. It is designed to be integrated into compiler toolchains, in order to provide immediate feedback even in continuous integration scenarios. FindBugs is a static analysis tool for Java code [30]. Rather than relying on formal methods, it searches for common code idioms to detect likely bugs. Common errors it highlights include null pointer dereferences, objects that compare equal not having equal hash codes, and inconsistent synchronization. The tool Splint [18] detects buffer overflows and similar potential security flaws in C code. It relies on annotated preconditions to derive postconditions.

The main difference between these static analysis tools and formal verification is that these tools generally are highly suited to find bugs, but are not able to prove absence of them. They generally apply techniques that are formally unsound, such as depth-bounded searches.

## 6 Conclusion

This paper presents an approach to formal verification of memory usage of functions in a compiled program. Memory usage is a property that expresses an overapproximation of the memory used by assembly code. Memory usage is fundamental to compositional verification of assembly code, as compositionality at least requires proving that functions do not unexpectedly interfere with each other's stack frames. It can also be used to show security-related properties, such as integrity of the return address.

Our approach automatically generates a formal memory usage certificate that includes (1) a set of memory regions read from and written to, (2) postconditions that express sanity constraints over the function (e.g., the return address has not been overwritten, callee-saved registers are restored), and (3) proof ingredients such as the preconditions necessary for formal verification. The certificate is loaded into a theorem prover, where it is verified. Since the problem of memory usage is undecidable, we use an interactive theorem prover. The proof ingredients, combined with custom proof strategies, provide a large degree of automation. They deal with memory aliasing, the control flow of the function, and invariants.

The approach is applied to three binaries of the Xen hypervisor. These binaries contain production code and are the result of a complex build chain. They contain, among others, various nested loops, large and compound data structures, variadic functions, and both in- and external function calls. For 71% of the functions of these binaries, a certificate could be generated and verified. For each of these functions, it has at least been formally proven that the return address is not overwritten. The amount of user interaction is roughly 85 lines of proof code per 1,000 lines of assembly code. The greatest bottleneck is in indirect branching, which accounts for 19% of the functions.

In the near future we aim to support indirect branching. This would allow support of switches, callbacks, and pointers to functions. Additionally, we aim to strengthen the invariant generation. Stronger invariants lead to a tighter overapproximation of memory usage. The challenge here is not only to generate these invariants, but to automate their proof as well. Finally, we want to leverage the certificate to target high-level security properties, such as noninterference.

Data Availability Statement and Acknowledgments All code and proofs are available in the Zenodo repository: 10.5281/zenodo.3676687. Distribution statement: Approved for public release; distribution is unlimited. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR.00112090028, ONR under grant N00014-17-1-2297, and NAVSEA/NEEC under grant N00174-16-C-0018.

## References



## **GASOL: Gas Analysis and Optimization for Ethereum Smart Contracts** ∗ †

Elvira Albert<sup>1,2</sup>, Jesús Correas<sup>2</sup>, Pablo Gordillo<sup>2</sup>, Guillermo Román-Díez<sup>3</sup>, and Albert Rubio<sup>1,2</sup>

<sup>1</sup> Instituto de Tecnología del Conocimiento, Spain

<sup>2</sup> Complutense University of Madrid, Spain

<sup>3</sup> Universidad Politécnica de Madrid, Spain

**Abstract.** We present the main concepts, components, and usage of Gasol, a Gas AnalysiS and Optimization tooL for Ethereum smart contracts. Gasol offers a wide variety of *cost models* that allow inferring the gas consumption associated with selected types of EVM instructions and/or inferring the number of times that such types of bytecode instructions are executed. Among others, we have cost models to measure only storage opcodes, to measure a selected family of gas-consumption opcodes following Ethereum's classification, to estimate the cost of a selected program line, etc. After choosing the desired cost model and the function of interest, Gasol returns to the user an upper bound of the cost for this function. As the gas consumption is often dominated by the instructions that access the storage, Gasol uses the gas analysis to detect under-optimized storage patterns, and includes an (optional) automatic optimization of the selected function. Our tool can be used within an Eclipse plugin for Solidity, which displays the gas and instruction bounds and, when applicable, the gas-optimized Solidity function.

## **1 Introduction and Main Applications**

Ethereum [27] is a global, open-source platform for decentralized applications that has become the world's leading programmable blockchain. Like other blockchains, Ethereum has a native cryptocurrency, named Ether. Unlike other blockchains, Ethereum is programmable using a Turing-complete language, i.e., developers can code smart contracts that control digital value, run exactly as programmed, and are immutable. A smart contract is basically a collection of code (its functions) and data (its state) that resides at a specific address on the Ethereum blockchain. Smart contracts on the Ethereum blockchain are metered using gas. Gas is a unit that measures the amount of computational effort that it will take to execute each operation. Every single operation in Ethereum, be it a transaction or a smart contract instruction execution, requires some amount of gas. The gas consumption of the Ethereum Virtual Machine (EVM) instructions is spelled out in [27]; importantly, instructions that use replicated storage are gas-expensive. Miners get paid an amount in Ether that is equivalent to the total amount of gas it took them to execute a complete operation. The rationale for gas metering is threefold: (i) Paying for gas at the moment of proposing the transaction prevents the emitter from wasting miners' computational power by requiring them to perform worthless intensive work. (ii) Gas fees discourage users from consuming too much replicated storage, which is a valuable resource in a blockchain-based consensus system (this is why storage bytecodes are gas-expensive). (iii) It puts a cap on the number of computations that a transaction can execute, and hence prevents DoS attacks based on non-terminating executions.

<sup>∗</sup>This work was funded partially by the Spanish MCIU, AEI and FEDER (EU) projects RTI2018-094403-B-C31 and RTI2018-094403-B-C33, the MINECO and FEDER (EU) projects TIN2015-69175-C4-2-R and TIN2015-69175-C4-3-R, by the CM projects P2018/TCS-4314 and S2018/TCS-4339 co-funded by EIE Funds of the EU and by the UCM CT27/16-CT28/16 grant.

<sup>†</sup>The software and dataset used during the current study are available at 10.6084/m9.figshare.11876697

Solidity [13] is the most popular language to write Ethereum smart contracts, which are then compiled into EVM bytecode. The Solidity compiler, named **solc**, is able to generate only constant gas bounds. However, when the bounds are parametric expressions that depend on the function parameters, on the contract state, or on the blockchain state (according to the experiments in [8], this happens in almost 10% of the functions), **solc** returns ∞ as the gas bound. This paper presents Gasol [6], a resource analysis and optimization tool that is able to infer parametric bounds and optimize the gas consumption of Ethereum smart contracts. Gasol takes as input a smart contract (either in EVM, disassembled EVM, or Solidity source code), a selection of a cost model among those available in the system (cf. Section 2), and a selected public function, and it automatically infers cost upper bounds for this function. Optionally, the user can enable the gas optimization option (cf. Section 3) to optimize the function w.r.t. storage usage, a highly valuable resource. Gasol has a wide range of applications: (1) It can be used to estimate the gas fee for running transactions, as it soundly over-approximates the gas consumption of functions. (2) It can be used to certify that the contract is free of out-of-gas vulnerabilities, as our bounds ensure that if the gas limit paid by the user is higher than our inferred gas bounds, the contract will not run out of gas. (3) From an attacker's perspective, it can be used to estimate how much Ether (in gas) an adversary has to pour into a contract in order to execute an out-of-gas attack. Also, attacks have been produced by introducing a very large number of underpriced bytecode instructions [23]. Our cost models could allow detecting this second type of attack by measuring how many instructions will be executed (which should be very large) while the associated gas consumption remains very low. (4) As we will show in the paper, the gas analysis can be used to detect gas-expensive fragments of code and automatically optimize them.

## **2 Gas Analysis using Gasol**

Figure 1 gives an overview of the components of the Gasol tool [6]. The programmer can use Gasol during the software development process from its Eclipse plugin, which allows selecting the cost model of interest and the function to be analyzed and/or optimized from the Outline. This selection, together with the compiled EVM code, is sent to the gas analyzer. A technical description of all phases that comprise a gas analysis for EVM smart contracts is given in [8]. Basically, the analyzer uses various tools [3,7] to extract the CFGs and decompile them into a high-level representation from which upper bounds (UB) are produced by using extensions of resource analyzers and solvers [4,5]. However, in our basic gas analyzer named gastap [8], there was only one cost model to compute the overall gas consumption of the function (including the opcode and memory gas costs [27]), while Gasol is an extension of gastap that introduces optimization, a wide variety of analysis options to define novel cost models, and an Eclipse plugin. The UBs are provided to the user in the console as well as in markers for functions within the Eclipse editor. If the user has selected the optimization option, the analyzer detects potential sources of optimization and feeds them to the optimizer to generate an optimized Solidity function within a new file.

**Fig. 1.** Overview of Gasol's components

Fig. 2 displays our Eclipse plugin showing a fragment of the public smart contract ExtraBalToken [1], which we use as a running example. We can see its six state variables and its function fill, which we will analyze and optimize. The right-side window shows Gasol's configuration options to set up the cost model:

(i) Type of resource (gas/instructions): by selecting gas, we estimate the gas consumption according to the gas model in [27] (hence, use Gasol as a gas analyzer); by selecting instructions, we estimate the number of bytecode instructions executed (using Gasol as a standard complexity analyzer).

(ii) Type of instructions: allows selecting which instructions (or group of instructions) will be measured as follows.



**Fig. 2.** Excerpt of smart contract ExtraBalToken in Solidity within Eclipse plugin.





(iii) Filter: this is a text field used to filter out information from the UBs. For gas-family, the user can specify low, mid, etc. For storage, it allows specifying the name of the basic field(s) whose storage will be measured. For line and selected, we can type the line numbers and the names of the bytecode instructions of interest, respectively. Once all options have been selected, we have set up a cost model that is sent together with the EVM code to the gas analyzer, which, after the analysis, outputs a UB for the selected function w.r.t. the cost model activated by the options. This UB is displayed, as shown in Fig. 2, in the console of the Eclipse plugin, and also within markers next to the function definition.

## **3 Gas Optimization using Gasol**

The information yielded by the gas analysis is used in Gasol to detect potential optimizations. Currently, the optimization target is the reduction of the gas consumption associated with the usage of storage. In particular, we aim at replacing multiple accesses to the same (global) storage data within a fragment of code (each write access costs 20,000 gas in the worst case and 5,000 in the best case) by one access that copies the data in storage to a (local) memory position, followed by accesses to this memory position (an access to the local memory costs only 3) and a final update to the storage if needed. The cost model number of instructions for storage-optimization described in Sec. 2 allows us to detect such storage optimizations: for each different field, if we get a bound that is different from one, we know that there may be multiple accesses to the same position in the storage, and we try to replace them by gas-efficient memory accesses. Our transformation is done at the level of the Solidity code, by defining a local variable with the same name as the state variable to transform, and introducing setter and getter functions to access the storage variable. Currently, we can transform accesses to variables of basic types; in the future, we plan to extend it to data structures (maps and arrays). The number of instructions bound for field totalSupply is 2 · data (hence ≠ 1), and our optimization of fill is:

```
function fill(uint[] data) {
  // Copy the storage field into a local variable once, before the loop.
  uint256 totalSupply = get_field_totalSupply();

  if ((msg.sender != owner) || (sealed))
    throw;
  for (uint i = 0; i < data.length; i++) {
    address a = address(data[i] & (D160 - 1));
    uint amount = data[i] / D160;
    if (balanceOf[a] == 0) {
      balanceOf[a] = amount;
      totalSupply += amount;  // updates the local copy, not storage
    }
  }
  // Write the accumulated value back to storage once.
  set_field_totalSupply(totalSupply);
}
```
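The saving behind this transformation can be illustrated with a small back-of-the-envelope model. The Python sketch below is not part of Gasol; it only uses the per-access costs quoted above (20,000 or 5,000 gas per storage write, 3 gas per memory access), ignores storage reads inside the loop, and treats the cost of the initial storage read (`read_cost`) as a hypothetical parameter, since the text does not state it. All function names are illustrative.

```python
# Illustrative cost model, not Gasol's analysis: compares k direct writes to one
# storage field with k writes to a local copy plus a single write-back.

MEMORY_ACCESS_COST = 3          # gas per local-memory access (from the text)
WORST_CASE_WRITE_COST = 20_000  # gas per storage write, worst case (from the text)
BEST_CASE_WRITE_COST = 5_000    # gas per storage write, best case (from the text)


def direct_storage_cost(writes: int, write_cost: int) -> int:
    """Gas spent when every update goes straight to storage."""
    return writes * write_cost


def cached_storage_cost(writes: int, write_cost: int, read_cost: int = 200) -> int:
    """Gas spent when updates hit a local copy and storage is written once.

    read_cost is an assumed placeholder for the initial storage read.
    """
    if writes == 0:
        return read_cost  # only the initial copy; no write-back needed
    return read_cost + writes * MEMORY_ACCESS_COST + write_cost


def saving(writes: int, write_cost: int) -> int:
    """Gas saved by caching; negative when caching does not pay off."""
    return direct_storage_cost(writes, write_cost) - cached_storage_cost(writes, write_cost)
```

Under these assumptions a single write makes the cached version slightly more expensive than the direct one, which is consistent with the analysis only targeting fields whose access bound differs from one.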
The gas bound (using the option All) for the optimized fill yielded by Gasol is 21368 + 20674 · data, which means that, assuming the worst case for write access to storage, the gas consumed inside the loop is 49.45% smaller than the one for the original fill function (the memory gas does not change). Note that, even if we consider the best case of 5,000 for write access to storage for the accesses we have optimized, the gas reduction is still around 20%. This is, in fact, what we have manually estimated using the actual data of the 82 times this function has been executed in the Ethereum blockchain, achieving with Gasol a total saving of almost 60M gas. As our transformation is local to the function, in order to be sound, we check that the transformed global data is not being accessed by transitive calls. For instance, if there were a call from function fill to another function that accesses totalSupply, we would not transform it. Besides, for efficiency, we check whether all accesses are reads (bytecode SLOAD) and, in that case, we do not need to invoke the setter at the end (and thus avoid an unnecessary write access).
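To make application (2) from Sect. 1 concrete, the following Python sketch evaluates the bound reported above for the optimized fill and compares it with a gas limit. It assumes that `data` in the bound stands for the length of the input array; the function names are hypothetical and not part of Gasol.

```python
def optimized_fill_gas_bound(data_length: int) -> int:
    """Upper bound 21368 + 20674 * data reported for the optimized fill.

    Assumes `data` in the bound denotes the length of the input array.
    """
    if data_length < 0:
        raise ValueError("array length cannot be negative")
    return 21368 + 20674 * data_length


def is_safe_from_out_of_gas(gas_limit: int, data_length: int) -> bool:
    """True if the gas limit covers the inferred upper bound."""
    return gas_limit >= optimized_fill_gas_bound(data_length)


def max_safe_length(gas_limit: int) -> int:
    """Largest array length whose bound still fits in the gas limit (-1 if none)."""
    if gas_limit < 21368:
        return -1
    return (gas_limit - 21368) // 20674
```

Because the bound is an over-approximation, a limit that passes this check is sufficient but not necessarily tight: an actual execution may consume less gas than the bound suggests.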

## **4 Related Tools and Conclusions**

Numerous tools are being developed to catch different types of vulnerabilities of smart contracts [20,16,22,19,17,26,18,10,15,9]. As mentioned in Sec. 1, the Solidity compiler solc is not able to give any gas estimation for the running example, as its gas consumption is not constant. Therefore, new gas analysis tools are being developed to detect potential gas-related vulnerabilities and to infer bounds in these complex situations. The purpose of the Gasper and MadMax tools is precisely the detection of gas-related vulnerabilities. MadMax [14] focuses on identifying control- and data-flow patterns inherent to gas-related vulnerabilities; thus, it works as a bug-finder rather than as a gas analyzer like Gasol. Similarly, Gasper identifies gas-costly programming patterns [12] by matching specific control-flow patterns and using SMT solvers and symbolic computation. Thus, it is an optimization detector, not an automatic optimizer like Gasol. The recently developed ebso tool [24] also aims at optimizing the gas consumption of EVM code. In contrast to Gasol, ebso's optimizations are limited to the basic-block level, while our transformation might involve several blocks of the CFG and would not be achievable by ebso's approach. Also, ebso is not guided by the results of an automatic resource analysis that can capture the expensive storage patterns, as in our case. Instead, it is based on a full exploration of all possible alternative instructions (within the considered block) that would lead to the same result and consume less gas. Its authors have obtained a number of rewrite rules that define sequences of bytecode instructions that can be replaced by equivalent ones that consume less. We could easily incorporate such basic-block replacement optimizations within our tool, and it is part of our agenda.

The approach of [21], like ours, aims at inferring precise gas bounds. Their approach is based on symbolically enumerating all execution paths [11] and unwinding loops to a limit. Instead, using resource analysis, Gasol infers the maximal number of iterations for loops and generates accurate gas bounds that are valid for any possible execution of the function and not only for the unwound paths. The approach by Marescotti et al. has not been implemented in the context of EVM, and a tool like Gasol has not been delivered. A line of work orthogonal to ours is the construction of resource-oriented attacks [23] that exploit the weaknesses of the EVM gas model. Gasol's cost models could help detect these resource-oriented attacks by estimating the number of executed bytecode instructions (very high) and their associated gas consumption (very low).

Finally, there is a tendency to define new languages (see Scilla [25], Michelson [2]) for programming smart contracts that provide certain safety guarantees, e.g., Scilla [25] provides predictable gas consumption by disallowing general recursion and while-loops. However, Ethereum is today the most widely used blockchain, and Solidity the most popular programming language to write Ethereum smart contracts, for which a gas analyzer+optimizer is of clear relevance.

## **References**



## CPU Energy Meter: A Tool for Energy-Aware Algorithms Engineering

Dirk Beyer and Philipp Wendler

LMU Munich, Germany

Abstract. Verification algorithms are among the most resource-intensive computation tasks. Saving energy is important for our living environment and to save cost in data centers. Yet, researchers still compare the efficiency of algorithms in terms of consumption of CPU time (or even wall time). Perhaps one reason for this is that measuring the energy consumption of computational processes is not as convenient as measuring the consumed time, and that there is insufficient tool support. To close this gap, we contribute CPU Energy Meter, a small tool that takes care of reading the energy values that Intel CPUs track inside the chip. In order to make energy measurements as easy as possible, we integrated CPU Energy Meter into BenchExec, a benchmarking tool that is already used by many researchers and competitions in the domain of formal methods. As evidence for usefulness, we explored the energy consumption of some state-of-the-art verifiers and report some interesting insights, for example, that energy consumption is not necessarily correlated with CPU time.

Keywords: Energy Measurement · RAPL · Benchmarking · BenchExec

## 1 Introduction

There is a strong demand to save electrical energy, of which nowadays a large portion is used by computational processes. Most importantly, we need to protect the environment that we live in, but we also need to consider that energy usage is one of the most important cost factors in data centers: after computing devices are purchased and installed, the operational cost is dominated by the cost of consumed electrical energy. And since most of the used electrical energy is turned into heat energy, there is follow-up cost for the cooling system, which sets the limits of used energy for each rack in a data center [16].

In order to control energy consumption, we first need to measure it. Work in the area of green software engineering identified a lack of data and insufficient tool support [12]. Energy consumption of an algorithm is often reduced to CPU time, which seems to be a natural choice at first glance, but more accurate measurements show that this reduction leads to wrong conclusions.

Why is energy usage of verification algorithms not measured but only CPU time? Most likely it is technically too difficult for researchers to measure energy consumption, because it would require external hardware that is not common or because internal energy measurements are not well-known and complex to use.

In order to provide a solution to this problem, we contribute an open-source lightweight tool that enables convenient energy measurement for a large range of modern CPUs. The tool CPU Energy Meter makes it easy and convenient to access the energy measurements that the CPU performs for several of its parts. Furthermore, we integrate energy measurement in the benchmarking framework BenchExec, which is widely used by researchers and competitions (e.g., [2]).

Using CPU Energy Meter does not require any extra hardware; instead, it accesses the existing feature for energy measurement called RAPL that Intel CPUs provide. This convenience comes with a limitation: We can only access measurement values for those parts of the computing board that the CPU measures, but not for external equipment, such as hard drives and the power supply itself.

Related Work. Energy measurements should be used for algorithm engineering [1], and there is a strong need for tool support, such as PowerPack [8]. RAPL is being studied as a measurement method for energy consumption [6, 9, 10, 13, 17], and energy measurements that are based on RAPL are being developed for specific scenarios [11, 15, 18, 19] and used to evaluate algorithms [7]. CPU Energy Meter makes energy measurement conveniently accessible to verification researchers. The most closely related project is the Performance API (PAPI) analysis library, which also supports RAPL [19], but this is a large library with a much larger scope than just energy measurements. In contrast, our tool is a ready-to-use solution for energy measurements that is easy to install and use.

## 2 Intel Running Average Power Limit (RAPL)

The Intel Running Average Power Limit (RAPL) [14] is a feature of Intel CPUs that allows measuring and limiting the energy consumption of CPUs. It has been available since the 2nd generation of the Intel Core architecture (code name "Sandy Bridge"), i.e., on Intel Core i3/i5/i7 2000 and newer, as well as Intel Xeon E3/E5/E7 CPUs. This covers a wide range of common CPUs for notebooks, desktops, and servers.

One part of RAPL consists of access to a series of hardware counters in which the CPU accumulates the energy it has consumed. RAPL supports measuring the energy consumption of so-called "domains", and up to five domains are supported by current CPUs: package, PP0, PP1, DRAM, and PSYS. Which hardware units are included in which domain is not clearly specified by Intel, but in general we can use the following assumption: The package domain refers to the whole CPU, the PP0 domain refers to the processor cores, and the PP1 domain refers to other units such as an integrated graphics unit. The domains DRAM and PSYS may provide information on the energy consumption of the RAM and other hardware on the mainboard, but both need special support from the hardware platform and their values may not be comparable between different systems.

There is no official information by Intel on the precision of the measurements except that the counters are updated approximately every 1 ms. The resolution of the values varies between CPUs, but is typically $\frac{1}{2^{16}}$ J or $\frac{1}{2^{14}}$ J, i.e., on the order of $10^{-5}$ J. For the first generation of CPUs with RAPL, the energy consumption was approximated by the CPU and imprecise, but for subsequent generations the precision has been improved [6, 7, 10].

## 3 CPU Energy Meter

Our tool CPU Energy Meter provides access to the energy-measurement features of Intel CPUs to users. It was developed based on the tool Intel Power Gadget for Linux <sup>1</sup>. Our tool is available as open source under the permissive 2-clause BSD license and hosted on GitHub <sup>2</sup>. Installation packages of CPU Energy Meter are available for Debian-based distributions (e.g., Ubuntu).

CPU Energy Meter measures the energy consumption of the CPU(s) of a system for a specific time interval as reported by the RAPL interface (cf. Sect. 2). In order to ensure the highest possible measurement precision with the lowest possible overhead, it reads the RAPL energy counters as rarely as possible instead of using continuous sampling, while at the same time reading the counters often enough to safely detect and account for counter overflows. Furthermore, our tool was developed to use a minimal amount of necessary dependencies and permissions in order to make its installation as easy as possible.
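The overflow handling mentioned above can be sketched as follows. This Python fragment is not taken from CPU Energy Meter; it assumes a 32-bit wrapping raw counter and an energy unit of 1/2^16 J (one of the typical resolutions mentioned in Sect. 2), and all names are hypothetical. It can only account for a single wrap between two reads, which is why the reads must not be too far apart.

```python
COUNTER_BITS = 32                 # assumed width of the raw energy counter
COUNTER_MODULUS = 1 << COUNTER_BITS


def counter_delta(previous_raw: int, current_raw: int) -> int:
    """Counter ticks elapsed between two reads, tolerating a single wrap-around."""
    return (current_raw - previous_raw) % COUNTER_MODULUS


def ticks_to_joules(ticks: int, unit_exponent: int = 16) -> float:
    """Convert raw ticks to Joules for an energy unit of 1 / 2**unit_exponent J."""
    return ticks / (1 << unit_exponent)


class EnergyAccumulator:
    """Accumulates energy over a sequence of sparse counter reads."""

    def __init__(self, first_raw: int, unit_exponent: int = 16) -> None:
        self._last_raw = first_raw
        self._unit_exponent = unit_exponent
        self._total_ticks = 0

    def update(self, raw: int) -> None:
        """Record a new raw reading (valid if less than one full range elapsed)."""
        self._total_ticks += counter_delta(self._last_raw, raw)
        self._last_raw = raw

    @property
    def joules(self) -> float:
        return ticks_to_joules(self._total_ticks, self._unit_exponent)


def max_read_interval_seconds(max_power_watts: float, unit_exponent: int = 16) -> float:
    """Upper limit on the gap between reads so that less than one full
    counter range can elapse at the given power draw."""
    return ticks_to_joules(COUNTER_MODULUS, unit_exponent) / max_power_watts
```

Under these assumptions the full counter range corresponds to 65536 J, so a domain drawing 100 W would need to be read at least about every 655 s. This illustrates why sparse reads can still be safe.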

Requirements. CPU Energy Meter requires a system with one or more Intel CPUs that support the RAPL feature. It needs direct access to the CPUs, thus running in a virtual machine is not supported. Accessing the model-specific registers of CPUs with the energy measurements is done via the Linux kernel module msr <sup>3</sup>, which needs to be loaded and provides device files named /dev/cpu/\*/msr.

Typically, access to these device files is granted only to the user root. In order to not need to execute CPU Energy Meter as root, one can change the file permissions of the device files appropriately (e.g., by granting read permissions to a group msr and making CPU Energy Meter always execute as this group using the "setgid" permission). Furthermore, CPU Energy Meter needs the capability CAP\_SYS\_RAWIO <sup>4</sup>, which can be granted using setcap <sup>5</sup>. The installation packages of CPU Energy Meter attempt to automatically configure the system such that every user can execute the tool without granting any other non-standard permissions to users. In any case (whether executed as root or not), CPU Energy Meter drops all unnecessary permissions as soon as possible using the library "libcap" <sup>6</sup> in order to reduce any risk related to the non-standard permissions.

Usage. CPU Energy Meter is intended primarily to be used by benchmarking frameworks; however, manual execution is also possible. When the tool is executed, it starts the measurements and prints the consumed energy for all supported domains and CPUs of the system as soon as it is killed via the interrupt signal or Ctrl+C. Intermediate measurements are printed when the signal USR1 is received. To manually measure the energy consumption during the execution of a specific command, one can execute the following command line, for example:

cpu-energy-meter & some\_command ; kill -INT %1

<sup>1</sup> https://software.intel.com/en-us/articles/intel-power-gadget

<sup>2</sup> https://github.com/sosy-lab/cpu-energy-meter

<sup>3</sup> http://man7.org/linux/man-pages/man4/msr.4.html

<sup>4</sup> http://man7.org/linux/man-pages/man7/capabilities.7.html

<sup>5</sup> http://man7.org/linux/man-pages/man8/setcap.8.html

<sup>6</sup> https://sites.google.com/site/fullycapable/

This will measure the energy consumption of all CPUs during the whole time that the specified command is running, regardless of whether this energy consumption is caused by the specified command or by other processes running in parallel (this is a limitation of the RAPL feature). Thus, measuring the energy consumption during a specific time period (e.g., 10 s) can be done by replacing some\_command with sleep 10.
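For scripted use, the same start/stop pattern can be expressed without shell job control. The following Python sketch is only an illustration: it assumes that `cpu-energy-meter` is on the `PATH` and behaves as described above (it prints its results when interrupted), and the helper name `measure_command` is hypothetical. The output is returned as raw text because its exact format is not specified here.

```python
import signal
import subprocess
from typing import Sequence


def measure_command(
    workload: Sequence[str],
    meter: Sequence[str] = ("cpu-energy-meter",),
) -> str:
    """Run `workload` while the meter is active and return the meter's raw output.

    As with the shell one-liner, the result covers everything the CPUs did
    during that time, not only the workload itself.
    """
    meter_process = subprocess.Popen(list(meter), stdout=subprocess.PIPE, text=True)
    try:
        subprocess.run(list(workload), check=False)
    finally:
        # Equivalent of `kill -INT %1`: ask the meter to print and exit.
        meter_process.send_signal(signal.SIGINT)
    output, _ = meter_process.communicate()
    return output
```

A call such as `measure_command(["sleep", "10"])` would then correspond to the 10 s measurement described above.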

The output values are given with the unit Joule, and can be formatted either in a way that is optimized for being read by humans (cf. Fig. 1) or parsed by programs.


Fig. 1: Example output of CPU Energy Meter on a single-CPU system of the SkyLake generation (with all five domains supported)

Integration into BenchExec. We have contributed an integration of CPU Energy Meter into the benchmarking framework BenchExec [4], because BenchExec is widely used in the formal-methods community (e.g., SV-COMP [2]). Starting with version 1.16, BenchExec automatically executes CPU Energy Meter if the latter is installed, and it reports the energy results in the same manner as the results of its internal time and memory measurements (BenchExec supports the creation of CSV tables and interactive HTML tables with plots for its benchmarking results). BenchExec will report the energy consumption only if all cores of one or more CPUs are used for each tool execution, because we cannot distinguish between the energy consumption of individual processes.

## 4 Applications

The 8th International Competition on Software Verification (SV-COMP'19) [3] measured energy consumption of verification tools using BenchExec and CPU Energy Meter and for the first time provided an alternative "green" ranking based on energy efficiency (CPU-energy usage divided by achieved score). This ranking was indeed considerably different from the main score-based ranking, with no overlap between the top three green verifiers and the top three verifiers in the category "C-Overall". Furthermore, the winner in the green ranking is two orders of magnitude more efficient than the last tool in the ranking (64 J per score point vs. 4 200 J per score point). This shows an enormous potential of efficiency improvements and energy savings if verification researchers get access to easy measurements of energy usage.
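The green-ranking metric can be written down in a few lines. The Python sketch below is an illustration rather than the competition's scoring code: the verifier names, energies, and scores are invented placeholders, chosen only so that the best and worst entries reproduce the 64 J and 4 200 J per score point quoted above.

```python
from typing import Dict, List, Tuple


def energy_per_score_point(cpu_energy_joules: float, score: float) -> float:
    """Green-ranking metric: CPU-energy usage divided by achieved score."""
    if score <= 0:
        raise ValueError("metric is only defined for a positive score")
    return cpu_energy_joules / score


def green_ranking(results: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Sort verifiers by ascending energy per score point (most efficient first)."""
    ranked = [
        (name, energy_per_score_point(energy, score))
        for name, (energy, score) in results.items()
    ]
    return sorted(ranked, key=lambda entry: entry[1])


# Placeholder data (name -> (CPU energy in J, score)); not SV-COMP results.
example_results = {
    "verifier_a": (640_000.0, 10_000.0),
    "verifier_b": (4_200_000.0, 1_000.0),
    "verifier_c": (900_000.0, 3_000.0),
}
ranking = green_ranking(example_results)
```

The ratio between the last and the first entry is 4 200 / 64, i.e., roughly 66, which is what the text rounds to two orders of magnitude.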

In the following, we analyze in more detail some energy measurements of SV-COMP'19, which provides all raw results online <sup>7</sup>. We pick the results for the submissions Cbmc <sup>8</sup> and CPA-Seq <sup>9</sup> across all categories. CPA-Seq is the winner of the category "C-Overall", written in Java, and employs several different algorithms, some of which are partially parallelized. The garbage collector that is used by the JVM adds some more parallelism. Cbmc is written in C++ and uses bounded model checking in a strictly sequential implementation. Thus, we expect that the energy consumption of these tools has different characteristics. SV-COMP'19 executed both tools for 10 522 tasks (CPU-time limit 900 s per task, Intel Xeon E3-1230 v5 CPU, quad-core with hyper-threading, 3.4 GHz, all 8 processing units of the CPU and 15 GB of memory were available to each tool execution, Ubuntu 18.04 64-bit with Linux kernel 4.15 was the operating system).

<sup>7</sup> https://sv-comp.sosy-lab.org/2019/results/results-verified/All-Raw.zip

<sup>8</sup> http://www.cprover.org/cbmc/

<sup>9</sup> https://cpachecker.sosy-lab.org/

Table 1: Selection of Energy Measurements from SV-COMP'19

We now compare the energy consumption of the RAPL domain "Package" with the CPU time for Cbmc in Fig. 2 and for CPA-Seq in Fig. 3. <sup>10</sup> In the plot, all results that lie on the same line through the origin belong to tool executions for which the energy consumption per second of CPU time (in J/s = W) was the same (this would be the average power of the CPU if measuring wall time instead of CPU time). We provide additional statistics in Table 1 and two graphs that compare the CPU time and the energy consumption of the two tools in Fig. 4.

Insight: *Also for verification tools, high values for CPU time do not imply high values for energy.* Figure 2 has a large vertical area of data points where the CPU time is close to the time limit. For those verification runs, the energy is in the range of 2.0 kJ to 15 kJ. This shows that for a specific CPU time, the energy consumption (and average power, cf. Table 1) for different verification tasks can vary by a factor of 7.
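The factor quoted above follows from simple arithmetic. The Python sketch below reproduces it, assuming for illustration that the CPU time of these runs equals the 900 s limit exactly (the text only says that it is close to the limit); the function and constant names are hypothetical.

```python
def energy_per_cpu_second(energy_joules: float, cpu_time_seconds: float) -> float:
    """Energy per second of CPU time, in J/s = W."""
    if cpu_time_seconds <= 0:
        raise ValueError("CPU time must be positive")
    return energy_joules / cpu_time_seconds


CPU_TIME_LIMIT_S = 900.0   # CPU-time limit per task in SV-COMP'19
LOW_ENERGY_J = 2_000.0     # 2.0 kJ, lower end of the observed range
HIGH_ENERGY_J = 15_000.0   # 15 kJ, upper end of the observed range

low_power = energy_per_cpu_second(LOW_ENERGY_J, CPU_TIME_LIMIT_S)    # about 2.2 W
high_power = energy_per_cpu_second(HIGH_ENERGY_J, CPU_TIME_LIMIT_S)  # about 16.7 W
spread = high_power / low_power                                      # about 7.5
```

Under this simplification the spread is about 7.5, in line with the factor of roughly 7 stated above.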

Insight: *Comparing different verification tools regarding CPU time can lead to different conclusions than energy-based comparisons.* The graph on the left of Fig. 4 compares Cbmc and CPA-Seq regarding CPU time, the graph on the right compares them regarding energy consumption. The difference between the shapes of these two graphs shows that looking at the energy consumption when comparing tools is an interesting addition to comparing only CPU time, and that the similar statistics on power usage with regard to CPU time (cf. lower part of Table 1 and Figs. 2 and 3) can be misleading: if the power-usage characteristics of both tools were the same, the two graphs in Fig. 4 would look similar.

<sup>10</sup> For CPA-Seq, the CPU time is sometimes higher than 900 s because SV-COMP lets tools optionally run for more than the time limit in order to print additional statistics (but any result after the time limit is of course discarded).

Fig. 2: Comparison of CPU time vs. energy consumption for Cbmc

Fig. 4: Comparison of Cbmc and CPA-Seq with regard to CPU time and energy (axes: CPU time for Cbmc (s), CPU time for CPA-Seq (s), CPU energy for Cbmc (kJ), CPU energy for CPA-Seq (kJ))

## 5 Conclusion

Verification algorithms consume large amounts of energy, and thus it is unacceptable to ignore the energy characteristics of algorithms when comparing their quality. Although this matter is understood, the verification community does not measure energy. We believe that this is because measurement of energy is complex and requires a lot of additional effort. The lightweight tool CPU Energy Meter fills this gap: It supports reading Intel-RAPL-based energy measurements in a convenient way and, via its integration into BenchExec, within a tool environment that many verification researchers already use.

An analysis of a large data set from a verification competition invalidates a wide-spread assumption: the data quickly reveal that energy consumption can deviate significantly from the consumed CPU time. Thus, it is not sufficient to measure CPU time.

Data Availability Statement. A replication package for this article including CPU Energy Meter and BenchExec is available at Zenodo [5]. Current versions of CPU Energy Meter are available at https://github.com/sosy-lab/cpu-energy-meter and https://doi.org/10.5281/zenodo.1300309. The dataset from SV-COMP'19 [3] that was analyzed in Sect. 4 is available online at https://sv-comp.sosy-lab.org/2019/results/results-verified/All-Raw.zip.

## References


Int. Green and Sustainable Computing Conference (IGSC). pp. 1–8. IEEE (2015). https://doi.org/10.1109/IGCC.2015.7393710


## Logic and Proof

## **Practical Machine-Checked Formalization of Change Impact Analysis**

Karl Palmskog<sup>1</sup>, Ahmet Celik<sup>2</sup>, and Milos Gligoric<sup>3</sup>

<sup>1</sup> KTH Royal Institute of Technology, Stockholm, Sweden
<sup>2</sup> Facebook, Seattle, WA, USA
<sup>3</sup> The University of Texas at Austin, Austin, TX, USA
palmskog@kth.se, celik@fb.com, gligoric@utexas.edu

**Abstract.** Change impact analysis techniques determine the components affected by a change to a software system, and are used as part of many program analysis techniques and tools, e.g., in regression test selection, build systems, and compilers. The correctness of such analyses usually depends both on domain-specific properties and change impact analysis, and is rarely established formally, which is detrimental to trustworthiness. We present a formalization of change impact analysis with machine-checked proofs of correctness in the Coq proof assistant. Our formal model factors out domain-specific concerns and captures system components and their interrelations in terms of dependency graphs. Using compositionality, we also capture hierarchical impact analysis formally for the first time, which, e.g., can capture when impacted files are used to locate impacted tests inside those files. We refined our verified impact analysis for performance, extracted it to efficient executable OCaml code, and integrated it with a regression test selection tool, one regression proof selection tool, and one build system, replacing their existing impact analyses. We then evaluated the resulting toolchains on several open source projects, and our results show that the toolchains run with only small differences compared to the original running time. We believe our formalization can provide a basis for formally proving domain-specific techniques using change impact analysis correct, and our verified code can be integrated with additional tools to increase their reliability.

**Keywords:** Change impact analysis · Regression test selection · Coq.

## **1 Introduction**

Change impact analysis aims to determine the components affected by a change to a software system, e.g., the modules or files affected by a modified line of code [3,4]. Change impact analysis techniques are used in many program analyses and tools, such as regression test selection (RTS) tools [26, 52, 59, 61], build systems [15, 21, 43, 45], and incremental compilers [48].

Change impact analysis techniques typically mix domain- and language-specific concepts, such as method call graphs and class files, with more abstract notions, such as dependencies, transitive closures, and topological sorts. This can complicate reasoning about the correctness (safety) of a technique. For example, to the best of our knowledge, RTS techniques for Java-like languages have never been argued to be safe (i.e., to never omit tests affected by a change) by machine-checked reasoning, only by high-level pen-and-paper proofs [51,55,60].

In this paper, we present a formalization of key concepts used in many change impact analysis techniques—concepts that are independent of any language or application domain. Our formalization represents system components and their interrelations as vertices and edges in explicit dependency graphs. We consider whether components are impacted by changes between two system revisions by computing transitive closures of modified graph vertices in the inverse of the dependency graph from the old revision. This has been described as "invalidating the upward transitive closure" [14]. Among impacted vertices, we identify those that are checkable, representing, e.g., a test method, that can be re-executed.

We encoded our formal model as a library in the Coq proof assistant, and proved two key correctness properties: soundness and completeness. Soundness, intuitively, states that the outcomes of executing checkable vertices that are unimpacted in the new revision are the same as they would be in the previous revision. Completeness roughly states that all checkable vertices in the new revision are members of the set of all added, impacted, and unimpacted vertices.

Based on our correctness approach, we also defined and proved correct two strategies for hierarchical change impact analysis that are roughly analogous to, on the one hand, file-based incremental builds [43, 54], and on the other hand, hybrid regression test selection [46, 60]. To the best of our knowledge, hierarchical change impact analysis is previously unexplored in formal settings like ours. Ultimately, by proving some basic properties about relations between vertices and results of executing checkable vertices, developers can use our model and library to obtain end-to-end guarantees for domain-specific impact analyses.

To capture our model of system components and their dependencies in Coq, we used the Mathematical Components (MC) library [42] and its representation of relations, finite graphs, and subtypes [25,28,29]. For the formal proofs, we used the SSReflect proof language and followed the idiom of the MC library of leveraging boolean decision procedures in proofs via small-scale reflection [9, 30, 31]. To obtain efficient executable code, we performed several verified refinements of our initial Coq encoding. From our refined functions and datatypes, we then derived a practical tool, dubbed Chip, by carefully extracting Coq code to OCaml and linking it with an assortment of OCaml libraries. Chip can be viewed as a verified component for change impact analysis that can either be integrated into verified systems or used in conventionally developed systems.

To ensure the adequacy of our formal model, we performed an empirical study using Chip. Specifically, we integrated Chip with Ekstazi [26], a tool for class-based regression test selection in Java, with iCoq [11], a tool for regression proof selection in Coq itself, and with Tup [54], a build system similar to make, replacing the existing components for change impact analysis in all these tools. We then compared the outcome and running time between the respective modified and original tool versions when applied to the revision histories of several open-source projects. This approach is along the lines of previous evaluations of formal specifications [8, 20, 33] and RTS techniques [26, 37, 60]. During our evaluation of Chip, we also located and addressed several performance bottlenecks. We make the following contributions:


## **2 Background**

In this section, we give some brief background on change impact analysis and its applications, and on the Coq proof assistant.

## **2.1 Change Impact Analysis**

Broadly, we consider change impact analysis as the activity of identifying the potential consequences of a change to a software system. Formulated in this way, change impact analysis is an old concern in software engineering [4], and remains an active research topic as part of techniques and tools [1,34,53]. In early work, Arnold posited computing transitive closures of statically derived program call graphs as the fundamental technique for change impact analysis [3]. However, later research argues that dynamic analysis can be more precise [36] and lead to faster dependency collection for use in future analyses [26]. Our work aims to capture general concepts used in both static and dynamic approaches [10, 38].

## **2.2 Regression Test Selection and Regression Proof Selection**

Regression test selection (RTS) techniques optimize regression testing – running tests at each project revision to check correctness of recent changes – by deselecting tests that are not affected by the recent changes [50, 59]. Traditionally, RTS techniques maintain for each test a set of code elements (e.g., statements, methods, classes) on which the test depends. When code elements are modified, change impact analysis is used to detect those tests that are potentially affected by the changes. Prior work has studied RTS for various programming languages (e.g., C, C++, and Java), built dependency graphs statically or dynamically, and used various granularities of code elements (e.g., statements, methods, and classes). The meaning of the dependency graph is language-specific, but if the graph is properly constructed, the change impact analysis is independent of the language. For example, Ekstazi [26], a recent RTS tool for Java projects, builds and maintains Java class file dependency graphs dynamically, and when a class file is modified, Ekstazi uses change impact analysis to select all test classes that depend, directly or indirectly, on the modified class.

Regression proof selection (RPS) is the analogue of RTS for formal proofs, which, similarly to tests, can take a long time to check. The RPS technique implemented in the iCoq tool for Coq [12] uses hierarchical selection [11], where impacted files are used to locate impacted proofs to be checked.

### **2.3 Build Systems**

The classic build system make uses file timestamp comparisons to decide whether a task defined in a build script should be run. Dependency graphs are implicitly defined by tasks depending on the completion of other tasks, or on certain files, as expressed in the build script. In contrast to test execution, build script task execution typically produces side effects in the form of new files, e.g., files with object code in ELF format. Modern build systems such as Bazel [5] and CloudMake [21, 27] can use other ways than timestamps to find modified files, e.g., comparing cryptographic hashes of files across revisions. Recent alternative build systems that aim to replace make include Tup [54] and Shake [43]; the former uses an explicit persisted dependency graph.

### **2.4 The Coq Proof Assistant and Mathematical Components**

Coq consists of, on the one hand, a small and powerful purely functional programming language, and on the other hand, a system for specifying properties about programs and proving them [6]. Coq is based on a constructive type theory [17, 18] which effectively reduces proof checking to type checking, and puts programming on the same footing as proving. Mathematical Components (MC) [42] is an extensive Coq library that provides many structures from mathematics, including finite sets, relations, and subtypes; we use the module fingraph, which was derived from Gonthier's proof of the four-color theorem [28].

Datatypes and functions verified inside Coq to have some correctness property can be extracted to a practical programming language such as OCaml [40], and then integrated with libraries; extraction is used in several large-scale software verification projects [39, 57]. Obtaining efficient programs via extraction may require significant engineering because of discord between the requirements for formal correctness and agreeable program runtime behavior [19]. When target languages lack fully formal semantics, as is the case for OCaml, extraction cannot be fully trusted, but empirical evaluations are nevertheless encouraging [24, 58].

## **3 Formal Model**

This section introduces our model, assumptions, and correctness approach.

#### **3.1 Definitions**

**Components**: Our model of change impact analysis uses two finite sets of vertices $V$ and $V'$, where $V \subseteq V'$. Members of these sets represent the components of a system (e.g., files or classes) before and after a change, respectively.

**Artifacts**: We let $A$ be a set of artifacts. An artifact is intended to be a concrete underlying representation of a component, e.g., an abstract syntax tree or the content of a file. We assume that the equality of two artifacts is decidable, i.e., that we can compute for all $a, a' \in A$ whether $a = a'$ or $a \neq a'$. To associate vertices with artifacts, we use two total functions $f : V \to A$ and $f' : V' \to A$. In practice, we expect these functions to map vertices to compact summaries of component representations, such as checksums computed by cryptographic hash functions. Whenever $f(v) \neq f'(v)$ for some $v \in V$, we say that the artifact for $v$ is modified after the revision; otherwise, it is unmodified.

**Graphs**: Let $g$ be a binary relation on $V$. For $v, v' \in V$, we say that $v$ directly depends on $v'$ if $g(v, v')$ holds. For example, if $v$ and $v'$ represent classes in a Java-like language, $v$ may be a subclass of $v'$. We will usually refer to relations like $g$ as (dependency) graphs. We write $g^{-1}$ for the inverse of $g$, i.e., we have $g^{-1}(v, v')$ iff $g(v', v)$. Moreover, we write $g^{*}(v, v')$ for when $v$ and $v'$ are transitively related in $g$, and say that $v$ transitively depends on $v'$. We define the reflexive-transitive closure of a vertex $v \in V$ with respect to a graph $g$ as the set $\{v' \mid g^{*}(v, v')\}$, i.e., as the set of all vertices reachable from $v$ in $g$ (which includes $v$ itself).

**Execution**: We assume there is a subset $E \subseteq V$ of checkable vertices, i.e., it is meaningful to apply some (side-effect free) function check on them and obtain some result. For example, a checkable vertex may represent a test method that either passes or fails when executed.

**Impactedness**: Let $g$ be a dependency graph. We then say that a vertex $v \in V$ is impacted if it is reachable in $g^{-1}$ from some modified vertex. Equivalently, $v$ is impacted iff there is a $v' \in V$ such that $f(v') \neq f'(v')$ and $(g^{-1})^{*}(v', v)$. Additionally, a vertex $v'' \in V'$ is considered fresh whenever $v'' \notin V$.

We take the (disjoint) union of the set $I$ of impacted vertices and the set $F$ of fresh vertices, and consider the checkable vertices in this set, i.e., vertices in $(I \cup F) \cap E$. Intuitively, these are the only vertices that we need to consider in the new revision, since all other vertices in $V$ are unimpacted, and using check on unimpacted vertices will have the same outcome as in the old revision.
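The artifact-based notions of modified and fresh vertices can be sketched in OCaml as follows. This is a minimal illustration, not the paper's code: the file names and contents are hypothetical, and the standard library's `Digest` module (MD5) is used only as a stand-in for a cryptographic hash function.

```ocaml
(* Hypothetical component contents in the old and the new revision. *)
let old_contents =
  [ ("A.java", "class A {}"); ("B.java", "class B extends A {}") ]

let new_contents =
  [ ("A.java", "class A { int x; }");
    ("B.java", "class B extends A {}");
    ("C.java", "class C {}") ]

(* f and f' map a vertex to a compact artifact: a hex-encoded digest.
   Digest (MD5) is only a stand-in for a cryptographic hash. *)
let artifact contents v = Digest.to_hex (Digest.string (List.assoc v contents))
let f = artifact old_contents
let f' = artifact new_contents

(* A vertex of the old revision is modified when f v <> f' v. *)
let modified = List.filter (fun v -> f v <> f' v) (List.map fst old_contents)

(* A vertex of the new revision is fresh when it is absent from the old one. *)
let fresh =
  List.filter (fun v -> not (List.mem_assoc v old_contents))
    (List.map fst new_contents)
```

Here `modified` contains only `"A.java"` and `fresh` contains only `"C.java"`; the unchanged `"B.java"` appears in neither list.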

#### **3.2 Example**

Figure 1 illustrates the core idea of the graph-based change impact analysis approach we model. Figure 1(a) shows the original dependency graph, where, e.g., component 3 depends directly on components 1 and 2, and 5 depends directly on 3 and transitively on 1 and 2; dotted components are checkable. Figure 1(b) shows the inverse graph, with the modified component 1 bolded, and the components impacted by the change in gray (the reflexive-transitive closure of 1 in the inverse graph). Based on these results, we call check on 5, but not on 6.

**Fig. 1.** Dependency graph where component 1 is changed, impacting 3 and 5.
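The example can be replayed with a small OCaml sketch. The edges for components 3 and 5 follow the description above; the edges for components 4 and 6 are not stated in the text and are assumptions made purely for illustration.

```ocaml
module IntSet = Set.Make (Int)

let vertices = [1; 2; 3; 4; 5; 6]

(* deps v = components that v directly depends on (the relation g). *)
let deps = function
  | 3 -> [1; 2]
  | 5 -> [3]
  | 4 -> [2]   (* assumed edge, not given in the text *)
  | 6 -> [4]   (* assumed edge, not given in the text *)
  | _ -> []

(* Inverse graph: rdeps v = components that directly depend on v. *)
let rdeps v = List.filter (fun w -> List.mem v (deps w)) vertices

(* Reflexive-transitive closure of the start vertices under succs. *)
let closure succs start =
  let rec visit seen v =
    if IntSet.mem v seen then seen
    else List.fold_left visit (IntSet.add v seen) (succs v)
  in
  List.fold_left visit IntSet.empty start

(* Components 5 and 6 are the checkable ones in this example. *)
let checkable v = v = 5 || v = 6

(* Component 1 is modified; its closure in the inverse graph is impacted. *)
let impacted = closure rdeps [1]
let to_check = IntSet.elements (IntSet.filter checkable impacted)

let () =
  Printf.printf "impacted: %s\n"
    (String.concat " " (List.map string_of_int (IntSet.elements impacted)));
  Printf.printf "to check: %s\n"
    (String.concat " " (List.map string_of_int to_check))
```

Under these assumptions the impacted set is {1, 3, 5}, so only component 5 is re-checked and component 6 is skipped.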

### **3.3 Correctness Approach**

For correctness, we intuitively show that executing only impacted and fresh vertices that are checkable is enough in the new revision, since the result of executing unimpacted vertices is the same as in the old revision. This means that if we have access to the results of checking vertices in the old revision, we can use those results to obtain the complete outcome for all checkable vertices in the new revision, without going through the work usually required.

Having constructed the set $T$ of tuples of checkable vertices and outcomes from the impacted, fresh, and unimpacted vertices, we can ask (1) whether $T$ is complete, i.e., whether it contains outcomes for all checkable vertices in $V'$, and (2) whether the outcomes in $T$ are sound, i.e., whether they are the same as if we had explicitly called check on the associated vertices.

To be able to prove soundness and completeness, we need to assume several properties relating the dependency graphs and outcomes of executing vertices in both revisions. Informally, we make the following assumptions:


The last assumption implicitly rules out that the underlying operation (e.g., test execution) on a vertex is nondeterministic, which it can be in practice [41].

## **4 Model Encoding**

In this section, we give an overview of our encoding in Coq of the formal model described in the previous section, using theories of finite sets and graphs from the MC library. We use a simplified version of Coq's specification language, Gallina.

### **4.1 Encoding in Coq**

We represent the vertex set $V'$ as a finite type (finType) V', and its subset $V$ as a subtype (subType) V, induced by a decidable predicate P on vertices in V' (of type pred V'). This allows us to define the graph $g$ as a binary decidable relation g on V, i.e., a variable of type rel V, and use the MC library predicate connect to express whether two vertices are transitively related in g. The inverse of g is defined as [rel x y | g y x], which we write as g⁻¹. We use connect to form the set of vertices in the reflexive-transitive closure of a given vertex x with respect to a graph g, and a canonical big operator [7] to form the union of all such closures for elements in a given set m of modified vertices:

```
Def impacted (g : rel V) (m : {set V}) : {set V} :=
 \bigcup_( x | x \in m) [ set y | connect g x y].
```
We characterize this function through MC's reflect ("if and only if"):

```
Thm impactedP g m x :
 reflect (∃ v, v \in m & connect g v x) (x \in impacted g m).
```

The MC library function val injects a subtype element into the corresponding supertype. We use this to capture impacted and fresh vertices in V':

```
Def impacted_V' m : {set V'} := [set (val v) | v in impacted g⁻¹ m].
Def fresh_V' : {set V'} := [set v | ¬ P v].
```

We represent the set of artifacts $A$ as a type A with decidable equality (eqType), and functions $f$ and $f'$ as regular Coq functions f and f'. This allows us to define the set of modified vertices in V', and then take the union (operator :|:) of impacted and fresh vertices:

```
Def mod_V : {set V} := [set v | f v != f' (val v)].
Def impacted_fresh_V' : {set V'} := impacted_V' mod_V :|: fresh_V'.
```
We then use a predicate checkable to form the subset of vertices in V' that can be executed:

```
Def chk_impacted_fresh_V':{set V'} := [set v in impacted_fresh_V' | checkable v].
```
We use a function check, which takes a vertex and returns a term in a result type R (an eqType, e.g., bool), to define a sequence of vertices and results:

```
Def res_impacted_fresh_V' : seq (V' ∗ R) :=
[ seq (v, check v) | v ← enum chk_impacted_fresh_V'].
```
Note that by using a sequence instead of a finite set for these tuples, we ensure R can be any type with decidable equality, such as a message of arbitrary length.

#### **4.2 Correctness Statements**

For stating and proving correctness, we assume we have dependency graphs for the old and new revision, as well as definitions of whether vertices are checkable, and checking functions:

```
Vars (g : rel V) (g' : rel V').
Vars (checkable : pred V) (checkable' : pred V')
 (check : V → R) (check' : V' → R).
```

We then define the graph g for vertices in V', named g\_V':

```
Def insub_g (x y : V') : bool := match insub x, insub y with
 Some x', Some y' ⇒ g x' y' | _, _ ⇒ false end.
Def g_V' : rel V' := [rel x y | insub_g x y].
```
This allows us to formulate the assumption A1 from above:

```
Hyp fg_eq : ∀ (v : V), f v = f' (val v) →
 ∀ (v' : V'), g_V' (val v) v' = g' (val v) v'.
```

The assumption A2 is equally straightforward to define:

```
Hyp chk_f : ∀ v, f v = f' (val v) → checkable v = checkable' (val v).
```

Finally, the assumption A3, when formalized, establishes a relation between vertices in g and g':

```
Hyp chk_V : ∀ (v : V), checkable v → checkable' (val v) →
 (∀ (v' : V'), connect g_V' (val v) v' = connect g' (val v) v') →
 (∀ (v' : V'), connect g_V' (val v) (val v') → f v' = f' (val v')) →
 check v = check' (val v).
```

We now assume we are given a sequence of results for checkable vertices in the old revision, and that this sequence is sound, complete, and duplicate-free:

```
Var res_V : seq (V ∗ R).
Hyp res_VP : ∀ v r, reflect (checkable v ∧ check v = r) ((v,r) \in res_V).
Hyp res_v_uniq : uniq [seq vr.1 | vr ← res_V].
```
We can then filter the sequence of old results to locate unimpacted vertices in the new revision:

```
Def res_unimpacted_V' : seq (V' ∗ R) := [seq (val vr.1, vr.2) |
 vr ← res_V & val vr.1 \notin impacted_V' mod_V].
```
This allows us to form a final sequence of vertex-result pairs:

```
Def res_V' : seq (V' ∗ R) := res_impacted_fresh_V' ++ res_unimpacted_V'.
```

For sanity-checking, we prove the absence of duplicates:

```
Def chk_V' : seq V' := [seq vr.1 | vr ← res_V'].
Thm chk_V'_uniq : uniq chk_V'.
```

We prove that the sequence contains all checkable vertices in V' (completeness):

```
Thm chk_V'_complete (v : V') : checkable' v → v \in chk_V'.
```

Finally, we prove that the results in the sequence are consistent with explicitly calling check' on all vertices in V' (soundness):

```
Thm chk_V'_sound (v : V') (r : R) :
 (v, r) \in res_V' → checkable' v ∧ check' v = r.
```

The formal proofs, which we elide here, mostly reduce to reasoning over the connect predicate and inductively on graph paths.
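To make the statements above concrete, the following OCaml sketch assembles a result sequence for the new revision and tests executable counterparts of the uniqueness, completeness, and soundness properties. All inputs (vertex numbers, the impacted and fresh sets, and the checking function) are hypothetical, and running these tests on one example is of course no substitute for the Coq proofs.

```ocaml
(* Hypothetical inputs: vertices are ints, results are booleans (pass/fail). *)
let new_vertices = [1; 2; 3; 4; 5; 6; 7]
let impacted = [1; 3; 5]               (* assumed output of impact analysis *)
let fresh = [7]                        (* present only in the new revision *)
let checkable' v = List.mem v [5; 6; 7]
let check' v = v <> 5                  (* pretend that vertex 5 now fails *)
let res_old = [ (5, true); (6, true) ] (* results from the old revision *)

(* Run check' only on checkable vertices that are impacted or fresh. *)
let res_impacted_fresh =
  List.filter_map
    (fun v ->
      if (List.mem v impacted || List.mem v fresh) && checkable' v
      then Some (v, check' v)
      else None)
    new_vertices

(* Reuse old results for vertices that are not impacted. *)
let res_unimpacted =
  List.filter (fun (v, _) -> not (List.mem v impacted)) res_old

let res_new = res_impacted_fresh @ res_unimpacted

(* No vertex occurs twice in the result sequence. *)
let uniq =
  let vs = List.map fst res_new in
  List.length (List.sort_uniq compare vs) = List.length vs

(* Every checkable vertex of the new revision has a result. *)
let complete =
  List.for_all
    (fun v -> not (checkable' v) || List.mem_assoc v res_new)
    new_vertices

(* Every recorded result agrees with calling check' directly. For the reused
   result of vertex 6 this relies on the example obeying the assumption that
   unimpacted vertices keep their outcome. *)
let sound = List.for_all (fun (v, r) -> checkable' v && check' v = r) res_new
```

In this example, `check'` is invoked for vertices 5 and 7 only, while the result for vertex 6 is taken from `res_old`.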

## **5 Component Hierarchies**

Let $V$ be a set of vertices representing fine-grained components (e.g., methods), with dependency graph $g_\bot$. Let $U$ be a different set of vertices representing coarse-grained components (e.g., files), associated with a function $p : U \to 2^V$ that defines a partition of $V$. The partition indicates how components in $U$ encapsulate components in $V$, and is associated with a graph $g_\top$ of vertices in $U$ that is consistent with dependencies expressed in $g_\bot$. This approach can be repeated to produce component hierarchies, each time coalescing sets of finer-grained dependencies into single coarser-grained dependencies. Figure 2 illustrates a two-level hierarchy and its component dependencies.

**Fig. 2.** Hierarchy with component sets $U$ and $V$, partition $p$, and dependencies.

Some change impact analysis techniques consider both fine-grained and coarse-grained component levels [11, 46, 60]. A key idea behind these techniques is to exploit the relationships between vertices across granularity levels. In particular, if a vertex $u \in U$ is unmodified after a change, we may be able to immediately conclude that all vertices $v \in p(u)$ are unmodified as well, potentially ruling out that a large subset of $V$ is impacted. In this section, we formalize this intuition using our existing notions to express hierarchical change impact analysis.

#### **5.1 Formal Model of Hierarchies**

Let $f_\bot$ and $f'_\bot$ be the functions mapping vertices to artifacts for $V$ and $V'$ with $V \subseteq V'$, and let $f_\top$ and $f'_\top$ be the corresponding functions for $U$ and $U'$ with $U \subseteq U'$. Let $p$ and $p'$ be partition-inducing functions from $U$ and $U'$ to subsets of $V$ and $V'$, respectively. We make the following assumptions:

H1: For all $u, u' \in U$ and $v, v' \in V$, if $u \neq u'$, $g_\bot(v, v')$, $v \in p(u)$, and $v' \in p(u')$, then $g_\top(u, u')$.

H2: For all $u \in U$, if $f_\top(u) = f'_\top(u)$, then $p(u) = p'(u)$.

H3: For all $u \in U$ and $v \in V$, if $f_\top(u) = f'_\top(u)$ and $v \in p(u)$, then $f_\bot(v) = f'_\bot(v)$.

Intuitively, H1 expresses that whenever two fine-grained components that reside in different coarse-grained components are related, there must be a corresponding relation between their respective coarse-grained components. H2 expresses that whenever a coarse-grained component is unchanged, it contains the same fine-grained components as before. Finally, H3 expresses that a fine-grained component is unchanged if the coarse-grained component that contains it is unchanged. Under these assumptions, there are essentially two distinct strategies we can use to leverage impact analysis for coarse-grained components to analyze fine-grained components.

**Overapproximation strategy**: Let $U'_i$ be the set of impacted and fresh vertices in $U'$, computed as above without considering vertices in $V'$. Consider the set $V'_p = \bigcup_{u \in U'_i} p'(u)$, which contains fresh and potentially impacted vertices in $V'$. Executing all checkable vertices in $V'_p$ may perform needless work for unimpacted vertices, but completely elides analysis of $g_\bot$. This approach essentially corresponds to relying on comparing whole files to decide whether to rerun commands that operate on every component inside these files, as in make.

**Compositional strategy**: Let $U_i$ be the set of impacted vertices in $U$, computed as above. Consider the set $V_p = \bigcup_{u \in U_i} p(u)$ of potentially impacted vertices in $V$. We use this set to scope further analysis. In particular, we use the subgraph $g_p$ of $g_\bot$ induced by $V_p$ to precisely find the impacted vertices in $V$. While unimpacted vertices are then avoided, the additional analysis of $g_p$ may be time-consuming to perform compared to the first strategy. At a high level, this strategy corresponds to the one used in RPS [11] and hybrid RTS [60].
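The two strategies can be contrasted on a small OCaml sketch. The hierarchy below (three files containing six methods) is hypothetical, and for brevity it assumes that no vertices are added between revisions, so fresh vertices are ignored.

```ocaml
module IntSet = Set.Make (Int)

(* Reflexive-transitive closure of the start vertices under succs. *)
let closure succs start =
  let rec visit seen v =
    if IntSet.mem v seen then seen
    else List.fold_left visit (IntSet.add v seen) (succs v)
  in
  List.fold_left visit IntSet.empty start

(* Inverse of a dependency function over a given vertex list. *)
let inverse vertices deps v =
  List.filter (fun w -> List.mem v (deps w)) vertices

(* Hypothetical two-level hierarchy: files 0..2 contain methods 0..5. *)
let files = [0; 1; 2]
let methods = [0; 1; 2; 3; 4; 5]
let p = function 0 -> [0; 1] | 1 -> [2; 3] | _ -> [4; 5]
let g_bot = function 2 -> [0] | 4 -> [5] | _ -> []  (* method dependencies *)
let g_top = function 1 -> [0] | _ -> []             (* file dependencies *)
let modified_files = [0]
let modified_methods = [0]

(* Impacted files: closure of modified files in the inverse of g_top. *)
let impacted_files = closure (inverse files g_top) modified_files

(* Overapproximation strategy: every method inside an impacted file. *)
let v_p =
  IntSet.fold
    (fun u acc -> IntSet.union acc (IntSet.of_list (p u)))
    impacted_files IntSet.empty

(* Compositional strategy: analyze only the subgraph of g_bot induced by v_p. *)
let g_p v = List.filter (fun w -> IntSet.mem w v_p) (g_bot v)

let impacted_methods =
  closure
    (inverse (IntSet.elements v_p) g_p)
    (List.filter (fun v -> IntSet.mem v v_p) modified_methods)

(* Direct analysis of the whole fine-grained graph, for comparison. *)
let direct = closure (inverse methods g_bot) modified_methods
```

In this example the overapproximation selects methods {0, 1, 2, 3}, whereas the compositional strategy narrows this down to {0, 2}, which coincides with the result of analyzing `g_bot` directly.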

## **5.2 Encoding and Correctness in Coq**

To encode hierarchical analysis, we use finite types and functions (now suffixed by top and bot) in the same way as before, while adding partitioning assumptions:

```
Vars (p : U → {set V}) (p' : U' → {set V'}).
Hyp p_pt : partition (\bigcup_(u | u \in U) [set (p u)]) [set: V].
Hyp p'_pt : partition (\bigcup_(u | u \in U') [set (p' u)]) [set: V'].
```

For the overapproximation strategy, we first define impacted sets:

```
Def if_top : {set U'} := impacted_fresh_V' f'_top f_top g_top.
Def p'_if_bot : {set V'} := \bigcup_( u | u \in if_top ) (p' u).
```

Under the assumptions outlined above, we then show formally that p'\_if\_bot is a superset of the results of analysis of V, V', and the graph g\_bot:

```
Thm in_p' (v : V') :
 v \in impacted_fresh_V' f'_bot f_bot g_bot → v \in p'_if_bot.
```

The key fact we use to prove this theorem is the following:

```
Thm connect_top_bot v v' u u' : v \in (p u) → v' \in (p u') →
 connect g_bot v v' → connect g_top u u'.
```
To encode the compositional strategy, we first define impacted sets:

```
Def i_top : {set U} := impacted g_top⁻¹ (mod_V f'_top f_top).
Def p_i_bot : {set V} := \bigcup_( u | u \in i_top ) (p u).
```

Then, we define a subtype and accompanying graph:

```
Def P_V_sub : pred V := fun v ⇒ v \in p_i_bot.
Def V_sub : finType := sig_finType P_V_sub.
Def g_bot_sub : rel V_sub := [rel x y | g_bot (val x) (val y)].
```

This allows us to use our previously defined analysis functions compositionally:

```
Def mod_V_sub := [set v : V_sub | val v \in mod_V f'_bot f_bot].
Def impacted_V_sub := impacted g_bot_sub⁻¹ mod_V_sub.
Def impacted_V'_sub := [set val (val v) | v in impacted_V_sub].
Def impacted_fresh_V'_sub := impacted_V'_sub :|: fresh_V' P_bot.
```

We finally show that the last set is the same as the one we would have obtained by directly analysing the graph g\_bot:

```
Thm impacted_fresh_V'_sub_eq :
impacted_fresh_V'_sub = impacted_fresh_V' f'_bot f_bot g_bot.
```
Using these definitions and results, we proved soundness and completeness for both strategies using the same approach as in Section 4.2.

## **6 Tool Implementation**

While our core definitions of change impact analysis described in Section 4 are executable inside Coq, this does not mean they are efficient or that code extracted from the definitions is immediately usable. We describe two aspects of bringing verified Coq code into our tool Chip: optimizations and encapsulation.

#### **6.1 Optimizations**

Our basic transitive closure function impacted is simple to reason about but not particularly fast in practice, since it fully explores the closures of all elements in the set of modified vertices. To mitigate this, we refined the function by leveraging the depth-first search function dfs from the fingraph MC module to incrementally compute the closure. dfs takes a graph as a function from vertices to neighbor sequences and a depth bound, and terminates as soon as it encounters a known vertex. We perform a stack-efficient left fold with dfs over an input sequence of vertices:

```
Def clos (l : seq V) : seq V := foldl (dfs g #|V|) [::] l.
```

Note that we set the dfs depth bound to the number of elements in the finite type V (written #|V|) to fully explore the graph g. However, one limitation of the MC dfs function is its linear-time sequence membership lookups. We therefore defined a better closure function with logarithmic membership lookup time using sets backed by red-black trees as found in the Coq standard library [2, 23]:

```
Fixpoint sdfs (g : V → seq V) (n : nat) (s : RBT.t) (x : V) : RBT.t :=
 if RBT.mem x s then s else
 if n is n'.+1 then foldl (sdfs g n') (RBT.add x s) (g x) else s.
Def sclos (l : seq V) : seq V := RBT.elements (foldl (sdfs g #|V|) RBT.empty l).
```
We used this closure function to define a function seq\_impacted\_fresh which we proved extensionally equivalent to impacted\_fresh\_V' defined in Section 4.1. We also added many custom extraction directives in Coq to ensure the extracted code uses efficient OCaml library functions, e.g., for list operations [22].
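For readers more familiar with OCaml than Coq, the shape of sdfs and sclos can be mirrored by the following hand-written sketch. It is not the extracted code: it uses the OCaml standard library's `Set` module, which is backed by balanced binary trees rather than the Coq red-black trees, and the example graph is hypothetical.

```ocaml
module IntSet = Set.Make (Int)

(* Fuel-bounded depth-first search that returns immediately on a vertex that
   is already in the visited set s, mirroring the structure of sdfs. *)
let rec sdfs g n s x =
  if IntSet.mem x s then s
  else if n = 0 then s
  else List.fold_left (sdfs g (n - 1)) (IntSet.add x s) (g x)

(* Closure of a list of start vertices; num_vertices plays the role of #|V|,
   which is enough fuel to explore the whole graph. *)
let sclos g num_vertices l =
  IntSet.elements (List.fold_left (sdfs g num_vertices) IntSet.empty l)

(* Hypothetical graph over vertices 0..5, given as successor lists. *)
let g = function
  | 0 -> [1; 2]
  | 1 -> [3]
  | 2 -> [3]
  | 5 -> [0]
  | _ -> []

(* Vertex 5 points to 0 but is not reachable from 0 or 4. *)
let reachable = sclos g 6 [0; 4]
```

Because the visited set is threaded through the fold, vertex 3 is expanded only once even though it is reachable via both 1 and 2.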

### **6.2 Encapsulation**

Before extraction to OCaml, we instantiate the finite types for graph vertices to ordinal finite types, which intuitively contain all natural numbers from 0 up to (but not including) some bound k. These numbers can then become machine integers during extraction, which allows us to provide a simple OCaml interface:

```
val impacted_fresh : int -> int -> (int -> string) ->
  (int -> string) -> (int -> int list) -> int list
```
Here, the first argument is the number of vertices in the new graph, while the second is the number of vertices in the old graph. After these integers follow two functions that map new and old vertices, respectively, to their artifacts in the form of OCaml strings. Then comes a function that defines the adjacent vertices of vertices in the old graph. The result is a list of impacted and fresh vertices.
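The argument order can be illustrated with an unverified stand-in that has the same signature. This sketch is not Chip's implementation; in particular, it assumes that the adjacency function lists the direct dependencies of a vertex (the relation g) and inverts the graph internally, which the description above does not specify.

```ocaml
module IntSet = Set.Make (Int)

(* Unverified stand-in with the same signature as the interface above.
   Assumption: succs v lists the vertices that v directly depends on. *)
let impacted_fresh (num_new : int) (num_old : int) (f' : int -> string)
    (f : int -> string) (succs : int -> int list) : int list =
  let old_vertices = List.init num_old (fun i -> i) in
  (* Old vertices whose artifact differs in the new revision. *)
  let modified = List.filter (fun v -> f v <> f' v) old_vertices in
  (* Inverse of the old graph: preds v = vertices that depend on v. *)
  let preds v = List.filter (fun w -> List.mem v (succs w)) old_vertices in
  let rec visit seen v =
    if IntSet.mem v seen then seen
    else List.fold_left visit (IntSet.add v seen) (preds v)
  in
  let impacted = List.fold_left visit IntSet.empty modified in
  (* Fresh vertices are the ordinals num_old .. num_new - 1. *)
  let fresh = List.init (max 0 (num_new - num_old)) (fun i -> num_old + i) in
  IntSet.elements impacted @ fresh

(* Hypothetical usage: 3 old vertices, 4 new ones; vertex 0 changed and
   vertex 2 depends on vertex 0. *)
let result =
  impacted_fresh 4 3
    (fun v -> [| "a2"; "b"; "c"; "d" |].(v))
    (fun v -> [| "a"; "b"; "c" |].(v))
    (function 2 -> [0] | _ -> [])
```

With these inputs, `result` lists the impacted vertices 0 and 2 followed by the fresh vertex 3.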

Not all computationally meaningful types in Coq can be directly represented in OCaml's type system. Some function calls must therefore circumvent the type system by using calls to the special Obj.magic function [40]. We use this approach in our implementation of the above interface:

```
let impacted_fresh num_new num_old f' f succs =
  Obj.magic (ordinal_seq_impacted_fresh num_new num_old
    (Obj.magic (fun x -> char_list_of_string (f' x)))
    (Obj.magic (fun x -> char_list_of_string (f x)))
    (Obj.magic succs))
```
The interface and implementation for two-level compositional hierarchical selection is a straightforward extension, with an additional argument p of type int -> int list for between-level partitioning.

## **7 Evaluation of the Model**

To evaluate our model and its Coq encoding, we performed an empirical study by integrating Chip with a recently developed RTS tool, Ekstazi, one RPS tool, iCoq, and one build system, Tup. We then ran the modified RTS tool on open-source Java projects used to evaluate RTS techniques [26,37], the modified RPS tool on Coq projects used in its evaluation [11], and the modified build system on C/C++ projects. Finally, we compared the outcomes and running times with those for the unmodified versions of Ekstazi, iCoq, and Tup.

## **7.1 Tool Integration**

Integrating Chip with Ekstazi was challenging, since Ekstazi collects dependencies dynamically and builds only a flat list of dependencies rather than an explicit graph. To overcome this limitation, we modified Ekstazi to build an explicit graph by maintaining a mapping from method callers to their callees. The integration with iCoq was also challenging because of the need for hierarchical selection of proofs and support for deletion of dependency graph vertices. We handle deletion of a vertex in iCoq by temporarily adding it to the new graph with a different artifact (checksum) from before, marked as non-checkable; then, after selection, we purge the vertex. In contrast, the integration with Tup was straightforward, since Tup stores dependencies in an SQLite database. We simply query this database to obtain a graph in the format expected by Chip.

## **7.2 Projects**

**RTS**: We use 10 GitHub projects. Table 1 (top) shows the name of each project, the number of lines of code (LOC) and the number of tests in the latest version control revision we used in our experiments, the SHA of the latest revision, and the URL on GitHub. We chose these projects because they are popular Java projects (in terms of stars) on GitHub, use the Maven build system (supported by Ekstazi), and were recently used in RTS research [37, 60].

**Table 1.** List of Projects Used in the Evaluation (RTS at the Top, RPS in the Middle, and Tup at the Bottom).

**RPS**: We use 4 Coq projects. Table 1 (middle) shows the name of each project, the number of LOC and the number of proofs in the latest revision we used, the latest revision SHA, and URL. We chose these projects because they were used in the evaluation of iCoq [11]; as in that evaluation, we used 10 revisions of StructTact and 24 revisions of the other projects.

**Build system**: We use 6 GitHub projects. Table 1 (bottom) shows the name of each project, the number of LOC and the number of build commands in the latest revision we used, the latest revision SHA, and URL. We chose these projects from the limited set of projects on GitHub that use Tup. We looked for projects that could be built successfully and had at least five revisions; the largest project that met these requirements, in terms of LOC, was Tup itself.

#### **7.3 Experimental Setup**

Our experimental setup closely follows recent work on RTS [37, 60]. That is, our scripts (1) clone one of the projects; (2) integrate the (modified) Ekstazi, iCoq, or Tup; and (3) execute tests on, check proofs for, or build the (up to) 24 latest revisions. For each run, we recorded the end-to-end execution time, which includes time for the entire build run. We also recorded the execution time for change impact analysis alone. Finally, we recorded the number of executed tests, proofs, or commands, which we use to verify the correctness of the results, i.e., we checked that the results for the unmodified tool and Chip were equivalent. We ran all experiments on a 4-core Intel i7-6700 CPU @ 3.40GHz machine with 16GB of RAM, running Ubuntu Linux 17.04. We confirmed that the execution time for each experiment was similar across several runs.

**Table 2.** Execution Time and CIA Time in Seconds for Ekstazi and Chip.

**Table 3.** Execution/CIA Time in Seconds for iCoq and Chip.

## **7.4 Results**

**RTS**: Table 2 shows the execution times for Ekstazi. Column 1 shows the names of the projects. Columns 2 to 4 show the cumulative end-to-end time for RetestAll (i.e., running all tests at each revision), the unmodified RTS tool, and the RTS tool with Chip. Columns 5 and 6 show the cumulative time for change impact analysis (CIA time). The last row in the table shows the cumulative execution time across all projects. We have several findings. First, Ekstazi with Chip performs significantly better than RetestAll, and only slightly worse than the unmodified tool. Considering that we did not prioritize optimizing the integration, we believe that the current execution time differences are small. Second, the CIA time using Chip is slightly higher than the CIA time for the unmodified tool, but we believe this could be addressed by integrating Chip via the Java Native Interface (JNI). The selected tests for all projects and revisions were the same for the unmodified Ekstazi and Ekstazi with Chip.

**RPS**: Table 3 shows the total proof checking time for iCoq and the CIA time for iCoq and Chip. All time values are cumulative time across all the revisions we used. We find that iCoq with Chip has only marginal differences in performance from iCoq for all but the largest project, Verdi. While iCoq with Chip is notably slower in that case, it still saves a significant fraction of time from checking every revision from scratch (RecheckAll). StructTact is an outlier in that RecheckAll is actually faster than both iCoq and iCoq with Chip, due to the overhead from bookkeeping and graph processing in comparison to the project's relatively small size. The selected proofs for all projects and revisions were the same for the unmodified iCoq and iCoq with Chip.

**Build system**: Table 4 shows the total execution time for Tup and the CIA time for Tup and Chip. All time values are cumulative time across all the revisions we used. Unfortunately, the build time for most of the projects is short. However, we can still observe that Chip takes only slightly more time than the original tool to perform change impact analysis. In the future, we plan to evaluate our toolchain on larger projects. The lists of commands for all projects and all revisions were the same for the unmodified Tup and Tup with Chip.

Overall, we believe these results indicate that our formal model is practically relevant and that it is feasible to use Chip as a verified component for change impact analysis in real-world tools.

## **8 Related Work**

**Formalizations of graph algorithms**: Pottier [49] encoded and verified Kosaraju's algorithm for computing strongly connected graph components in Coq. He also derived a practical program for depth-first search by extracting Coq code to OCaml, demonstrating the feasibility of extraction for graph-based programs. Théry subsequently formalized a similar encoding of Kosaraju's algorithm in Coq using the MC fingraph module [56]. Théry and Cohen then formalized and proved correct Tarjan's algorithm for computing strongly connected graph components in Coq [13,16]. Our formalization takes inspiration from Théry and Cohen's work, and adapts some of their definitions and results in a more applied context, with focus on performance of extracted code. Similar graph algorithm formalizations have also been done in the Isabelle/HOL proof assistant [35]. In work particularly relevant to build systems, Guéneau et al. [32] verified both the functional correctness and time complexity of an incremental graph cycle detection algorithm in Coq. In contrast to our reasoning on pure functions and use of extraction, they reason directly on imperative OCaml code.

**Formalizations of build systems**: Christakis et al. [15] formalized a general build language called CloudMake in the Dafny verification tool. Their language is a purely functional subset of JavaScript, and allows describing dependencies between executions of tools and files. Having embedded their language in Dafny, they verify that builds with cached files are equivalent to builds from scratch. In contrast to the focus on generating files in CloudMake, we consider a formal model with an explicit dependency graph and an operation check on vertices whose output is not used as input to other operations. The CloudMake formalization assumes an arbitrary operation exec that can be instantiated using Dafny's module refinement system; we use Coq section variables to achieve similar parametrization for check. We view our Coq development as a library useful to tool builders, rather than a separate language that imposes a specific idiom for expressing dependencies and build operations.

Mokhov et al. [45] presented an analysis of several build systems, including a definition of what it means for such systems to be correct. Their correctness formulation is similar to that of Christakis et al. for cached builds, and relies on a notion of abstract persistent stores expressed via monads. Our vertices and artifacts correspond quite closely to their notions of keys and values, respectively. However, their basic concepts are given as Haskell code, which has less clear meaning and a larger trusted base than Coq or Dafny code. Moreover, they provide no formal proofs. Mokhov et al. [44] subsequently formalized in Haskell a static analysis of build dependencies as used in the Dune build system.

Stores could be added to our model, e.g., by letting checkable vertices be associated with commands that take lists of file names and the current store state as parameters, producing a new state. However, this would in effect entail defining a specific build language inside Coq, which we consider outside the scope of our library and tool.

## **9 Conclusion**

We presented a formalization of change impact analysis and its encoding and correctness proofs in the Coq proof assistant. Our formal model uses finite sets and graphs to capture system components and their interdependencies before and after a change to a system. We locate impacted vertices that represent, e.g., tests to be run or build commands to be executed, by computing transitive closures in the pre-change dependency graph. We also considered two strategies for change impact analysis of hierarchical systems of components. We extracted optimized impact analysis functions in Coq to executable OCaml code, yielding a verified tool dubbed Chip. We then integrated Chip with a regression test selection tool for Java, Ekstazi, one regression proof selection tool for Coq itself, iCoq, and one build system, Tup, by replacing their existing components for impact analysis. We evaluated the resulting toolchains on several open-source projects by comparing the outcome and running time to those for the respective original tools. Our results show the same outcomes with only small differences in running time, corroborating the adequacy of our model and the feasibility of practical verified tools for impact analysis. We also believe our Coq library can be used as a basis for proving correct domain-specific incremental techniques that rely on change impact analysis, e.g., regression test selection for Java and regression proof selection for type theories.

## **Acknowledgments**

The authors thank Ben Buhse, Cyril Cohen, Pengyu Nie, Zachary Tatlock, Thomas Wei, Chenguang Zhu, and the anonymous reviewers for their comments and feedback on this work. This work was partially supported by the US National Science Foundation under Grant No. CCF-1652517.

## **References**


mal Methods. LNCS, vol. 8442, pp. 643–657. Springer, Cham, Switzerland (2014). https://doi.org/10.1007/978-3-319-06410-9_43


A., Tassi, E., Théry, L.: A machine-checked proof of the odd order theorem. In: Blazy, S., Paulin-Mohring, C., Pichardie, D. (eds.) International Conference on Interactive Theorem Proving. LNCS, vol. 7998, pp. 163–179. Springer, Heidelberg, Germany (2013). https://doi.org/10.1007/978-3-642-39634-2_14



#### **What's Decidable About Program Verification Modulo Axioms?**<sup>⋆</sup>

Umang Mathur, P. Madhusudan, and Mahesh Viswanathan

University of Illinois, Urbana Champaign, USA

**Abstract.** We consider the decidability of the verification problem of programs *modulo axioms* — automatically verifying whether programs satisfy their assertions, when the function and relation symbols are interpreted as arbitrary functions and relations that satisfy a set of first-order axioms. Though verification of uninterpreted programs (with no axioms) is already undecidable, a recent work introduced a subclass of *coherent* uninterpreted programs, and showed that they admit decidable verification [26]. We undertake a systematic study of various natural axioms for relations and functions, and study the decidability of the coherent verification problem. Axioms include relations being reflexive, symmetric, transitive, or total order relations, functions restricted to being associative, idempotent or commutative, and combinations of such axioms as well. Our comprehensive results unearth a rich landscape that shows that though several axiom classes admit decidability for coherent programs, coherence is not a panacea as several others continue to be undecidable.

## **1 Introduction**

Programs are typically proved correct against safety specifications by induction: the induction hypothesis is specified using *inductive invariants* of the program, and one proves, inductively, that the reachable states of the program stay within the region defined by the invariants. Though there has been tremendous progress in the field of *decidable logics* for proving that invariants are inductive, finding inductive invariants is almost never fully automatic, and completely automated verification of programs is almost always undecidable.

Programs can be viewed as working over a data domain, with variables storing values over this domain and being updated using constants, functions and relations defined over that domain. Apart from the notable exception of finite data domains, program verification is typically undecidable when the data domain is infinite. In a recent paper, Mathur et al. [26] establish new decidability results when the data domain is infinite. Two crucial restrictions are imposed: data domain functions and relations are assumed to be *uninterpreted*, and programs are assumed to be *coherent* (the meaning of coherence is discussed later in this introduction). The theory of uninterpreted functions is an important theory in SMT solvers that is often used (in conjunction with other theories) to solve feasibility of loop-free program snippets, in bounded model-checking, and to validate verification conditions. The salient aspect of [26] is to show that entire program verification is decidable for the class of coherent programs, without any user-provided inductive invariants (like loop invariants). While the results of [26] were mainly theoretical, there has been recent work on applying this theory to verifying memory-safety of heap-manipulating programs [28].

<sup>⋆</sup> Umang Mathur is partially supported by a Google PhD Fellowship. P. Madhusudan is partially supported by NSF CCF 1527395. Mahesh Viswanathan is partially supported by NSF CCF 1901069.

Data domain functions and relations used in a program usually satisfy special properties and are not, of course, entirely uninterpreted. The results of [26] can be seen as an approximate/abstraction-based verification method in practice: if the program verifies assuming functions and relations to be uninterpreted, then the program is correct for *any* data domain. However, properties of the data domain are often critical in establishing correctness. For example, in order to prove that a sorting program results in sorted arrays, it is important that the binary relation < used to compare elements of the array is a total ordering on the underlying data sort. Consequently, constraining the data domain to satisfy certain axioms results in more accurate modeling for verification.

*In this paper, we undertake a systematic study of the verification of uninterpreted programs when the data-domains are constrained using theories specified by (universally quantified) axioms.* The choice of the axioms we study is guided by two principles. First, we study natural mathematical properties of functions and relations. Second, we choose to study axioms that have a decidable *quantifier-free* fragment of first order logic. The reason is that even single program executions (as defined in Section 3.2) can easily encode quantifier-free formulae (by computing the terms in variables, and asserting Boolean combinations of atomic relations and equality on them). Since we are seeking decidable verification for programs *with loops/iteration*, it makes little sense to examine axioms where even verification of single executions is undecidable.

**Coherence modulo theories:** Mathur et al. [26] define a subclass of programs, called *coherent programs*, for which program verification on uninterpreted domains is decidable; without the restriction of *coherence*, program verification on uninterpreted domains is undecidable. Since our framework is strictly more powerful, we adapt the notion of coherence to incorporate theories. A coherent program [26] is one where all executions satisfy two properties: memoizing and early-assumes. The memoizing property demands that the program computes any term, modulo congruence induced by the equality assumes in the execution, only once. More precisely, if an execution recomputes a term, the term should be stored in a current variable. The early-assumes restriction demands, intuitively, that whenever the program assumes two terms to be equal, it should do so early, before computing superterms of them.

We adapt the above notion to *coherence modulo theories*<sup>1</sup>. The memoizing and early-assumes properties are now required modulo the equalities that are entailed by the axioms. More precisely, if the theory is characterized by a set of axioms A, the memoizing property demands that if a program computes a term t and there was another term t′ that it had computed earlier which is equivalent to t modulo the assumptions made thus far *and the axioms* A, then t′ must be currently stored in a variable. Similarly, the early-assumes condition is also with respect to the axioms: if the program execution observes a new assumption of equality or a relation holding between terms, then we require that any equality entailed newly by it, the previous assumptions, *and the axioms* A does not involve a dropped term. This is a smooth extension of the notion of coherence from [26]; when A = ∅, we essentially retrieve the notion from [26].
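The memoizing property can be made concrete with a small Python sketch. It covers only a deliberately restricted special case, chosen here for illustration: executions with no assume statements and no axioms (A = ∅), where two terms are equivalent exactly when they are syntactically identical. The tuple encoding of statements and the function name are assumptions of this sketch, not notation from the paper.

```python
def memoizing_violations(execution, variables):
    """Return indices of steps that recompute a term no variable still holds.

    Each step is (x, f, args) for x := f(args), or (x, None, (y,)) for x := y.
    Terms are nested tuples; with no assumes and no axioms, equivalence of
    terms is syntactic identity.
    """
    val = {v: ("init", v) for v in variables}  # term held by each variable
    computed = set(val.values())               # every term computed so far
    violations = []
    for i, (x, f, args) in enumerate(execution):
        if f is None:                          # copy: no new term is computed
            val[x] = val[args[0]]
            continue
        t = (f, tuple(val[a] for a in args))
        if t in computed and t not in val.values():
            violations.append(i)               # t was computed, then dropped
        computed.add(t)
        val[x] = t
    return violations

# x := f(y); x := g(x); z := f(y)   -- f(y) is dropped, then recomputed
dropped = [("x", "f", ("y",)), ("x", "g", ("x",)), ("z", "f", ("y",))]
# x := f(y); w := g(x); z := f(y)   -- f(y) is still held by x
kept = [("x", "f", ("y",)), ("w", "g", ("x",)), ("z", "f", ("y",))]
print(memoizing_violations(dropped, ["x", "y", "z"]))
print(memoizing_violations(kept, ["w", "x", "y", "z"]))
```

Handling equality assumptions or axioms would require replacing the syntactic comparison with equivalence modulo the congruence they induce, which this sketch does not attempt.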

#### **Main Contributions**

Our first contribution is an extension of the notion of coherence in [26] to handle the presence of axioms, as described above; this is technically nontrivial and we provide a natural extension.

Under the new notion of coherence, we first study axioms on relations. The EPR (effectively propositional reasoning) [37] fragment of first order logic is one of the few fragments of first order logic that is decidable, and has been exploited for bounded model-checking and verification condition validation in the literature [34,33,32]. We study axioms written in EPR (i.e., universally quantified formulas involving only relations) and show that verification for even coherent programs, modulo EPR axioms, is undecidable.

Given the negative result on EPR, we look at particular natural axioms for relations, which are nevertheless expressible in EPR. In particular, we look at reflexivity, irreflexivity, and symmetry axioms, and show that verification of coherent programs is decidable when the interpretation of some relational symbols is constrained to satisfy these axioms. Our proof proceeds by instrumenting the program with auxiliary assume statements that preserve coherence and subtle arguments that show that verification can be reduced to the case without axioms; decidability then follows from results established in [26].

We then show a much more nontrivial result that verification of coherent programs remains decidable when some relational symbols are constrained to be transitive. The proof relies on new automata constructions that compute streaming congruence closures while interpreting the relations to be transitive.

Furthermore, we show that combinations of reflexivity, irreflexivity, symmetry, and transitivity admit a decidable verification problem for coherent programs. Using this observation, we conclude decidability of verification when certain relations are required to be strict partial orders (irreflexive and transitive) or equivalence relations.

<sup>1</sup> We adapt the definition in a way that preserves the spirit of the definition of coherence. Moreover, if we do not adapt the definition, essentially all axiom classes we study in this paper would be undecidable.

We then consider axioms that capture total orders and show that they too admit a decidable coherent verification problem. Total orders are also expressible in EPR and their formulation in EPR has been used in program verification, as they can be used in lieu of the ordering on integers when only ordering is important. For example, they can be used to model data in sorting algorithms, array indices in modeling distributed systems to model process ids and the states of processes, etc. [34,33].
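For concreteness, the relational axioms discussed so far can be written as universally quantified formulas over a binary relation symbol R, with no function symbols. The formulations below are the standard textbook ones and are given only as a reading aid; the exact axiomatization used in the technical sections may be presented differently. A strict total order is obtained by combining irreflexivity, transitivity, and totality.

$$\begin{aligned}
\text{reflexivity:} &\quad \forall x.\; R(x, x) \\
\text{irreflexivity:} &\quad \forall x.\; \neg R(x, x) \\
\text{symmetry:} &\quad \forall x, y.\; R(x, y) \Rightarrow R(y, x) \\
\text{transitivity:} &\quad \forall x, y, z.\; R(x, y) \land R(y, z) \Rightarrow R(x, z) \\
\text{totality (strict):} &\quad \forall x, y.\; x = y \lor R(x, y) \lor R(y, x)
\end{aligned}$$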

Our next set of results considers axioms on functions. Associativity and commutativity are natural and fundamental properties of functions (like + and ∗), and it is hence natural to abstract such functions using these axioms. (See [14] where such abstractions are used in program analysis.) We first show that verification of coherent programs is decidable when some functions are assumed to be commutative or idempotent. Our proof, similar to the case of reflexive and symmetric relations, relies on reducing verification to the case without axioms using program instrumentation that captures the commutativity and idempotence axioms. However, when a function is required to be associative, the verification problem for coherent programs becomes undecidable. This undecidability result was surprising to us.

The decidability results established for properties of individual relation or function symbols discussed above can be combined to yield decidable verification modulo a set of axioms. That is, the verification of coherent programs with respect to models where relational symbols satisfy some subset of reflexivity/irreflexivity/symmetry/transitivity axioms or none, and function symbols are either uninterpreted, commutative, or idempotent, is decidable.

The decidability results outlined above apply to programs that are coherent modulo the axioms/theories. However, given a program, in order to verify it using our techniques, we would also like to decide whether the program *is* coherent modulo axioms. We prove that for all the decidable axioms above, checking whether programs are coherent modulo the axioms is a decidable problem. Consequently, under these axioms, we can both check whether programs are coherent modulo the axioms and, if they are, verify them.

There are several other results that we mention only in passing. For instance, we show that even for single executions, verifying them modulo equational axioms is undecidable as it is closely related to the word problem for groups. Our positive results for program verification under axioms for functions (commutativity, idempotence) also show that bounded model-checking under such axioms is decidable, which can have its own applications.

Due to the large number of results and technically involved proofs, we give only the main theorems and proof gists for some of these in the paper; details can be found in [27].

## **2 Illustrative Example**

Consider the problem of searching for an element k in a sorted list. There are two simple algorithms for this problem. Algorithm 1 walks through the list from beginning to end, and if it finds k, it sets a Boolean variable exists to T. Notice this algorithm does not exploit the sortedness property of the list. Algorithm 2 also walks through the list, but it stops as soon as it either finds k or reaches an element that is larger than k. If it finds the element it sets a Boolean variable found to T. If both algorithms are run on the same sorted list, then their answers (namely, exists and found) must be the same.

$$\begin{array}{l}
\mathbf{assume}(\mathtt{T} \neq \mathtt{F}); \\
\mathtt{found} := \mathtt{F};\ \mathtt{stop} := \mathtt{F};\ \mathtt{exists} := \mathtt{F};\ \mathtt{sorted} := \mathtt{T}; \\
\mathbf{while}\ (\mathtt{x} \neq \mathtt{NIL})\ \{ \\
\quad \mathbf{if}\ (\mathtt{stop} = \mathtt{F})\ \mathbf{then}\ \{ \\
\qquad \mathbf{if}\ (\mathtt{k} = \mathtt{key}(\mathtt{x}))\ \mathbf{then}\ \mathtt{found} := \mathtt{T}; \\
\qquad \mathbf{if}\ (\mathtt{k} \leq \mathtt{key}(\mathtt{x}))\ \mathbf{then}\ \mathtt{stop} := \mathtt{T}; \\
\quad \} \\
\quad \mathbf{if}\ (\mathtt{k} = \mathtt{key}(\mathtt{x}))\ \mathbf{then}\ \mathtt{exists} := \mathtt{T}; \\
\quad \mathtt{y} := \mathtt{next}(\mathtt{x}); \\
\quad \mathbf{if}\ (\mathtt{y} \neq \mathtt{NIL})\ \mathbf{then}\ \{ \\
\qquad \mathbf{if}\ (\mathtt{key}(\mathtt{x}) \nleq \mathtt{key}(\mathtt{y}))\ \mathbf{then}\ \mathtt{sorted} := \mathtt{F}; \\
\quad \} \\
\quad \mathtt{x} := \mathtt{y}; \\
\} \\
\text{@post: } \mathtt{sorted} = \mathtt{T} \implies \mathtt{found} = \mathtt{exists}
\end{array}$$

[Right panel, not reproduced: diagram of a data model with elements e1, ..., e9 connected by next and key edges, some labeled with the initial values of x, y, NIL, k, T, sorted, F, stop, found, and exists; < is {(e5, e6), (e6, e7), (e7, e5)}.]

**Fig. 1.** Left: Uninterpreted program for finding a key k in a list starting at x with < interpreted as a strict total order. The condition a ≤ b is shorthand for a < b ∨ a = b. Right: A model in which < is not interpreted as a strict total order. The elements in the universe of the model are denoted using circles. Some elements are labeled with variables denoting the initial values of these variables. The edges represent the subterm relation. Not all functions are shown in the figure. The model does not satisfy the post-condition of the program on the left.

Fig. 1 (on the left) shows a program that weaves the above two algorithms together (treating Algorithm 1 as the specification for Algorithm 2). The variable x walks down the list using the next pointer. The variable stop is set to T when Algorithm 2 stops searching in the list. The precondition, namely that the input list is sorted, is captured by tracking another variable sorted whose value is T if consecutive elements are ordered as the list is traversed. The post condition demands that whenever the list is sorted, found and exists be equal when the list has been fully traversed. Note that the program's correctness is specified using only quantifier-free assertions using the same vocabulary as the program.

The program works on a data domain that provides interpretations for the functions key, next, the initial values of the variables, and the relation <. When < is interpreted to be a strict total order, the program is correct. However, if < is not interpreted as a total order, then the program may be incorrectly deemed as buggy. To see this, consider the data model shown on the right in Fig. 1. The data domain has 9 elements in its universe, with the functions next and key interpreted as shown. Initially, x, y have value e1, NIL is e4, k is e7, T and sorted are e8, and F, found, exists, and stop are e9. The interpretation of < is as follows: e5 < e6, e6 < e7, and e7 < e5. Clearly < is not an order, but the program's *sortedness* check "sorted = T" will pass. After the entire list is processed, exists will be set to T when x = e3. On the other hand, stop will be set to T when x = e1 because k = e7 < key(x). Therefore, at the end found = F ≠ exists. The work presented in [26], where all functions and relations are uninterpreted, would therefore declare this program to be incorrect.
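The counterexample can be replayed with a short Python simulation of the program on this data model. The sketch below is illustrative only. The interpretations of next (e1 → e2 → e3 → e4) and key (e1 ↦ e5, e2 ↦ e6, e3 ↦ e7) are reconstructed from the narrative above and should be read as assumptions, as should the function and variable names.

```python
def run_search(nxt, key, lt, x, k, NIL, T, F):
    """Run the list-search program on a data model.

    Returns the final values of (found, exists, sorted).
    """
    assert T != F                                    # assume(T != F)
    le = lambda a, b: a == b or (a, b) in lt         # a <= b is a < b or a = b
    found, stop, exists, sorted_ = F, F, F, T
    while x != NIL:
        if stop == F:
            if k == key[x]:
                found = T
            if le(k, key[x]):
                stop = T
        if k == key[x]:
            exists = T
        y = nxt[x]
        if y != NIL:
            if not le(key[x], key[y]):
                sorted_ = F
        x = y
    return found, exists, sorted_

# Assumed interpretations, reconstructed from the text.
nxt = {"e1": "e2", "e2": "e3", "e3": "e4"}
key = {"e1": "e5", "e2": "e6", "e3": "e7"}
init = dict(x="e1", k="e7", NIL="e4", T="e8", F="e9")

cyclic_lt = {("e5", "e6"), ("e6", "e7"), ("e7", "e5")}   # not an order
ordered_lt = {("e5", "e6"), ("e6", "e7"), ("e5", "e7")}  # strict order on the keys

print(run_search(nxt, key, cyclic_lt, **init))
print(run_search(nxt, key, ordered_lt, **init))
```

With the cyclic interpretation the run ends with sorted = T, found = F, and exists = T, so the postcondition fails. With an interpretation that is a strict total order on the three keys, found and exists agree.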

The goal of this paper is to explore several natural restrictions on data models and study the problem of verifying coherent programs for them. When < is constrained to be a total order, the program in Fig. 1 is correct and coherent. Our results (see Section 5.5) show that verification of such programs when relations are constrained to be strict total orders is decidable, and hence we can build automatic decision procedures that will correctly verify such programs.

## **3 Preliminaries**

We briefly recall the syntax and semantics of uninterpreted programs and the verification problem modulo axioms. Our presentation closely follows [26] and for lack of space, some details have been postponed to [27].

#### **3.1 Program Syntax**

We consider imperative programs with loops over a fixed finite set of variables V and use constant (C), function (F), and predicate (R) symbols belonging to some first order signature Σ = (C, F, R). Programs are then given by the syntax below (f ∈ F, R ∈ R, x, y ∈ V, and **z** is a tuple of variables in V):

$$\begin{aligned} \langle \text{stmt} \rangle &::= x := y \mid x := f(\mathbf{z}) \mid \mathbf{assume}(\langle \text{cond} \rangle) \mid \mathbf{skip} \mid \langle \text{stmt} \rangle; \langle \text{stmt} \rangle \\ & \mid \mathbf{while}\ (\langle \text{cond} \rangle)\ \langle \text{stmt} \rangle \mid \mathbf{if}\ (\langle \text{cond} \rangle)\ \mathbf{then}\ \langle \text{stmt} \rangle\ \mathbf{else}\ \langle \text{stmt} \rangle \\ \langle \text{cond} \rangle &::= x = y \mid R(\mathbf{z}) \mid \neg \langle \text{cond} \rangle \end{aligned}$$

#### **3.2 Executions and Semantics of Uninterpreted Programs**

Executions of programs over stmt are words over the following alphabet

$$\begin{aligned} \Pi = \{ &\text{``}x := y\text{''},\ \text{``}x := f(\mathbf{z})\text{''},\ \text{``}\mathbf{assume}(x = y)\text{''},\ \text{``}\mathbf{assume}(x \neq y)\text{''}, \\ &\text{``}\mathbf{assume}(R(\mathbf{z}))\text{''},\ \text{``}\mathbf{assume}(\neg R(\mathbf{z}))\text{''} \mid x, y \in V,\ \mathbf{z} \text{ is a tuple of variables in } V \} \end{aligned}$$

For a program s ∈ stmt, the set of executions of s, denoted Exec(s) is a regular language over the alphabet Π and is given as follows (similar to [26]).

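$$\begin{aligned}
\mathrm{Exec}(\mathbf{skip}) &= \epsilon \\
\mathrm{Exec}(x := y) &= \text{``}x := y\text{''} \\
\mathrm{Exec}(x := f(\mathbf{z})) &= \text{``}x := f(\mathbf{z})\text{''} \\
\mathrm{Exec}(\mathbf{assume}(c)) &= \text{``}\mathbf{assume}(c)\text{''} \\
\mathrm{Exec}(\mathbf{if}\ c\ \mathbf{then}\ s_1\ \mathbf{else}\ s_2) &= \text{``}\mathbf{assume}(c)\text{''} \cdot \mathrm{Exec}(s_1) + \text{``}\mathbf{assume}(\neg c)\text{''} \cdot \mathrm{Exec}(s_2) \\
\mathrm{Exec}(s_1; s_2) &= \mathrm{Exec}(s_1) \cdot \mathrm{Exec}(s_2) \\
\mathrm{Exec}(\mathbf{while}\ c\ \{s\}) &= [\text{``}\mathbf{assume}(c)\text{''} \cdot \mathrm{Exec}(s)]^{*} \cdot \text{``}\mathbf{assume}(\neg c)\text{''}
\end{aligned}$$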

The set of partial executions of s is the set of prefixes of words in Exec(s) and is also regular.
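The inductive definition of Exec(s) translates directly into a recursive enumeration. The Python sketch below is a finite approximation written for illustration: since Exec(s) is infinite for programs with loops, each loop is unrolled at most `k` times. The tuple encoding of programs, the textual rendering of letters, and the symbolic negation `not(c)` are assumptions of this sketch.

```python
def executions(s, k):
    """Executions of program s, with every loop iterated at most k times.

    Programs are tuples: ("skip",), ("atom", text) for an assignment,
    ("assume", c), ("seq", s1, s2), ("if", c, s1, s2), ("while", c, body).
    Each execution is a tuple of letters (strings).
    """
    tag = s[0]
    if tag == "skip":
        return {()}                                   # the empty word
    if tag == "atom":
        return {(s[1],)}
    if tag == "assume":
        return {(f"assume({s[1]})",)}
    if tag == "seq":
        return {a + b for a in executions(s[1], k) for b in executions(s[2], k)}
    if tag == "if":
        c, s1, s2 = s[1], s[2], s[3]
        return ({(f"assume({c})",) + e for e in executions(s1, k)} |
                {(f"assume(not({c}))",) + e for e in executions(s2, k)})
    if tag == "while":
        c, body = s[1], s[2]
        result, prefixes = set(), {()}
        for _ in range(k + 1):
            # Exit the loop after the iterations accumulated so far.
            result |= {p + (f"assume(not({c}))",) for p in prefixes}
            # Extend every prefix by one more loop iteration.
            prefixes = {p + (f"assume({c})",) + e
                        for p in prefixes for e in executions(body, k)}
        return result
    raise ValueError(f"unknown statement kind: {tag}")

loop = ("while", "x != NIL", ("atom", "x := next(x)"))
for rho in sorted(executions(loop, 2), key=len):
    print(" ; ".join(rho))
```

Taking all prefixes of the enumerated words would give the corresponding bounded set of partial executions.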

A data model M = (U<sub>M</sub>, ⟦·⟧<sub>M</sub>) for signature Σ is a first order structure where U<sub>M</sub> is a universe of elements and ⟦·⟧<sub>M</sub> maps every symbol in Σ to its interpretation. Given a data model M over Σ and an execution ρ ∈ Π<sup>∗</sup>, the semantics of ρ on M is given by eval<sub>M</sub> : Π<sup>∗</sup> × V → U<sub>M</sub>, which gives the valuation of variables in V at the end of an execution; the precise definition is standard and is deferred to [27].

### **3.3 Feasibility of Executions Modulo Axioms**

An execution is said to be *feasible* in a data model if every assumption made in the execution holds on the model. More precisely, an execution ρ is feasible in M if for every prefix σ′ = σ · "**assume**(c)" of ρ, we have

(a) eval<sub>M</sub>(σ, x) = eval<sub>M</sub>(σ, y) if c is '(x = y)',

(b) eval<sub>M</sub>(σ, x) ≠ eval<sub>M</sub>(σ, y) if c is '(x ≠ y)',

(c) (eval<sub>M</sub>(σ, z<sub>1</sub>), ..., eval<sub>M</sub>(σ, z<sub>r</sub>)) ∈ ⟦R⟧<sub>M</sub> if c is 'R(z<sub>1</sub>, ..., z<sub>r</sub>)', and

(d) (eval<sub>M</sub>(σ, z<sub>1</sub>), ..., eval<sub>M</sub>(σ, z<sub>r</sub>)) ∉ ⟦R⟧<sub>M</sub> if c is '¬R(z<sub>1</sub>, ..., z<sub>r</sub>)'.
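For a finite data model, conditions (a) to (d) can be checked by simply running the execution. The sketch below is illustrative only: the tuple encoding of letters, the dictionary layout of the model, and the name `run` are assumptions of this example, not definitions from the paper.

```python
def run(execution, model, init):
    """Run an execution on a data model; return (feasible, valuation reached).

    model: {"funcs": {name: callable}, "rels": {name: set of tuples}}
    init:  initial valuation of the program variables
    Letters (hypothetical encoding):
      ("assign", x, y), ("call", x, f, zs), ("assume_eq", x, y),
      ("assume_neq", x, y), ("assume_rel", R, zs), ("assume_not_rel", R, zs)
    """
    val = dict(init)
    for st in execution:
        kind = st[0]
        if kind == "assign":                      # x := y
            val[st[1]] = val[st[2]]
        elif kind == "call":                      # x := f(z1, ..., zr)
            args = tuple(val[z] for z in st[3])
            val[st[1]] = model["funcs"][st[2]](*args)
        elif kind == "assume_eq":                 # condition (a)
            if val[st[1]] != val[st[2]]:
                return False, val
        elif kind == "assume_neq":                # condition (b)
            if val[st[1]] == val[st[2]]:
                return False, val
        elif kind == "assume_rel":                # condition (c)
            if tuple(val[z] for z in st[2]) not in model["rels"][st[1]]:
                return False, val
        elif kind == "assume_not_rel":            # condition (d)
            if tuple(val[z] for z in st[2]) in model["rels"][st[1]]:
                return False, val
        else:
            raise ValueError(f"unknown letter: {st!r}")
    return True, val

# A three-element model: f is successor modulo 3, R = {(0, 1), (1, 2)}.
model = {"funcs": {"f": lambda a: (a + 1) % 3}, "rels": {"R": {(0, 1), (1, 2)}}}
rho = [("call", "y", "f", ("x",)),
       ("assume_rel", "R", ("x", "y")),
       ("assume_neq", "x", "y")]
```

Starting from x = 0 the execution is feasible in this model; starting from x = 2 it is not, because the pair (2, 0) is not in the interpretation of R.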

Let A be a set of first-order sentences, possibly including ground atomic predicates<sup>2</sup>. We say that a data model M is an A-model, denoted M ⊨ A, if for every ϕ ∈ A, we have M ⊨ ϕ. A formula ϕ is A-valid, denoted A ⊨ ϕ, if ϕ holds in every model M that satisfies A. An execution ρ is said to be *feasible modulo* A if there is an A-model M such that ρ is feasible in M.

#### **3.4 Program Verification Modulo Axioms**

We consider programs annotated with postconditions over the following syntax. Here, x, y and the variables in **z** belong to the set of program variables V, and R ∈ R is a relation symbol in Σ.

$$\mathcal{L}: \qquad \varphi ::= x = y \quad | \quad R(\mathbf{z}) \: \mid \: \varphi \lor \varphi \mid \: \neg \varphi$$

**Definition 1 (Program Verification Modulo Axioms).** *For a program* s *and a set of axioms* A*, we say that* s *satisfies a postcondition* ϕ *over the syntax* L modulo A *if for every* A*-model* M *and for every execution* ρ ∈ Exec(s) *that is feasible in* M*,* M *satisfies* ϕ[eval<sub>M</sub>(ρ, V)/V] *(i.e., where each variable* x ∈ V *is replaced by* eval<sub>M</sub>(ρ, x)*).*

We remark that one can alternatively phrase the verification problem stated above in terms of feasibility. That is, a program s satisfies a postcondition ϕ modulo A iff every execution ρ of s′ is infeasible modulo A (i.e., there is no A-model M such that ρ is feasible in M), where s′ = s; **assume**(¬ϕ).

<sup>2</sup> A ground atomic predicate is of the form t<sub>1</sub> ∼ t<sub>2</sub>, R(t<sub>1</sub>, ..., t<sub>k</sub>) or ¬R(t<sub>1</sub>, ..., t<sub>k</sub>), where ∼ ∈ {=, ≠}, R is a relation symbol, and the t<sub>i</sub> are ground terms.

## **4 Coherence Modulo Axioms**

In this section we extend the notion of coherence from [26], adapting it to our current setting where we restrict data models using axioms A. We will first recall the notion of terms computed by an execution, which will be used to define the notion of coherence.

#### **4.1 Terms Computed and Assumptions Accumulated by Executions**

We will associate a syntactic term TEval(ρ, x) with each variable x ∈ V after a partial execution ρ. Intuitively, every variable x ∈ V stores a constant term x̂ at the beginning of an execution. New terms are computed on function computations, i.e., TEval(ρ · "x := f(z<sub>1</sub>, ..., z<sub>r</sub>)", x) = f(TEval(ρ, z<sub>1</sub>), ..., TEval(ρ, z<sub>r</sub>)). The precise definition is simple and is deferred to [27]. The set of terms computed by an execution ρ is Terms(ρ) = { TEval(ρ′, x) | ρ′ is a prefix of ρ, x ∈ V }.

As an execution proceeds, it accumulates assumptions over the terms it computes, and we will use κ(ρ) to denote the assumptions made by the execution ρ (see [27] for the precise definition). For example, after an equality assume statement "**assume**(x = y)", we accumulate the atomic equality predicate t<sub>x</sub> = t<sub>y</sub>, where t<sub>x</sub> and t<sub>y</sub> are the terms associated with x and y when the assume statement is encountered. Similarly, for the execution ρ = ρ′ · "**assume**(¬R(z<sub>1</sub>, z<sub>2</sub>, ..., z<sub>k</sub>))", we have κ(ρ) = κ(ρ′) ∪ {¬R(TEval(ρ′, z<sub>1</sub>), ..., TEval(ρ′, z<sub>k</sub>))}.
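The bookkeeping described above can be sketched as a single pass over an execution. The encoding below (tuples for letters, `("init", x)` for the initial constant of a variable x, nested tuples for function applications) and the name `teval_and_kappa` are assumptions made for this illustration; the precise definitions are the ones in [27].

```python
def teval_and_kappa(execution, variables):
    """Return (TEval at the end of the execution, Terms, kappa).

    Letters use a hypothetical tuple encoding:
      ("assign", x, y), ("call", x, f, zs), ("assume_eq", x, y),
      ("assume_neq", x, y), ("assume_rel", R, zs), ("assume_not_rel", R, zs)
    """
    term = {x: ("init", x) for x in variables}   # each variable starts with its own constant
    terms = set(term.values())                   # Terms(rho), collected over all prefixes
    kappa = set()                                # accumulated assumptions
    for st in execution:
        kind = st[0]
        if kind == "assign":
            term[st[1]] = term[st[2]]
        elif kind == "call":
            term[st[1]] = (st[2],) + tuple(term[z] for z in st[3])
            terms.add(term[st[1]])
        elif kind in ("assume_eq", "assume_neq"):
            # stored as ("eq", t_x, t_y) or ("neq", t_x, t_y)
            kappa.add((kind[7:], term[st[1]], term[st[2]]))
        elif kind in ("assume_rel", "assume_not_rel"):
            # stored as ("rel", R, t_1, ..., t_k) or ("not_rel", R, t_1, ..., t_k)
            kappa.add((kind[7:], st[1]) + tuple(term[z] for z in st[2]))
        else:
            raise ValueError(f"unknown letter: {st!r}")
    return term, terms, kappa

rho = [("call", "z", "f", ("x", "y")),
       ("assume_not_rel", "R", ("z", "x")),
       ("assume_eq", "z", "y")]
final_term, all_terms, assumptions = teval_and_kappa(rho, ["x", "y", "z"])
```

Note that the assumptions are recorded over terms, not over variable names, so they remain meaningful after the variables are reassigned.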

#### **4.2 Coherence**

Our definition of coherence modulo axioms is a smooth generalization of the definition of coherence in [26]. The notion of coherence consists of two properties: *memoizing* and *early equality assumes*. The memoizing property says, intuitively, that when a term t is computed after executing some prefix σ of an execution, if t is equivalent to some other term modulo the assumptions made in the execution so far, then t must not have been *dropped* at the end of σ, i.e., a program variable must already hold this term. We replace the notion of equivalence of terms in this definition by equivalence modulo the axioms as well.

The notion of early assumes in [26] intuitively says that assumptions of equality (on terms t<sub>1</sub> and t<sub>2</sub>) should be encountered early, that is, earlier than *dropping* any superterm of t<sub>1</sub> or t<sub>2</sub>. This notion of early assumes allows for effectively computing the *congruence closure* on the set of terms computed by the execution, which, in turn, is necessary to accurately maintain which terms are equivalent. However, we observe that the notion in [26] is too restrictive and not entirely necessary. In this paper, we generalize this notion in several ways, to a more semantic one, as follows. Whenever an execution encounters an assumption of equality between two terms, we instead demand only that the equivalences that are *additionally* implied by this new assumption can be inferred *locally*, using the already known congruence between terms in the *window*, i.e., the set of terms pointed to by the program variables when the equality assumption is encountered. Next, we incorporate axioms into this definition by requiring that the notion of equivalence is also modulo the axioms, and we further require that *all* assumptions (equality, disequality, relational) be early (as opposed to restricting only equality assumptions to be early, as in [26]). We will elaborate on these differences using an example after presenting the formal definition next.

Given a set of first-order sentences Γ and ground terms t<sub>1</sub> and t<sub>2</sub>, we say that t<sub>1</sub> ≅<sub>Γ</sub> t<sub>2</sub> if Γ ⊨ t<sub>1</sub> = t<sub>2</sub>.

**Definition 2 (Coherence modulo axioms).** *Let* A *be a set of axioms and let* ρ *be a complete or partial execution over variables* V *. Then,* ρ *is said to be coherent modulo* A *if it satisfies the following two properties.*


**Remark.** We remark that every execution that is coherent as per the definition in [26] is also coherent modulo A = ∅ as in Definition 2. However, the converse is not true, and we illustrate this difference below.

*Example 1.* Let us now illustrate the notion of coherence in the presence of axioms using the execution ρ below.

$$\rho \triangleq \mathbf{z}\_1 := \mathbf{f}(\mathbf{x}, \mathbf{y}) \cdot \mathbf{z}\_2 := \mathbf{f}(\mathbf{y}, \mathbf{x}) \cdot \mathbf{z}\_3 := \mathbf{g}(\mathbf{z}\_1) \cdot \mathbf{z}\_4 := \mathbf{g}(\mathbf{z}\_2) \cdot \mathbf{z}\_3 := \mathbf{z}\_5 \cdot \mathbf{z}\_6 := \mathbf{g}(\mathbf{z}\_1)$$

Let ρ<sub>i</sub> denote the prefix of ρ of length i. Here, TEval(ρ<sub>3</sub>, z<sub>3</sub>) = g(f(x̂, ŷ)), TEval(ρ<sub>5</sub>, z<sub>3</sub>) = ẑ<sub>5</sub> ≠ g(f(x̂, ŷ)) and TEval(ρ<sub>6</sub>, z<sub>6</sub>) = g(f(x̂, ŷ)). When the set of axioms is A = ∅, this execution is not coherent modulo A, as it violates the memoizing requirement at the last statement z<sub>6</sub> := g(z<sub>1</sub>) (no variable stores the term g(f(x̂, ŷ)) after ρ<sub>5</sub>).

Now, consider the axiom set denoting commutativity of f, i.e., A<sub>comm</sub> = {∀u, v. f(u, v) = f(v, u)}. In this case, we observe that f(x̂, ŷ) ≅<sub>A<sub>comm</sub></sub> f(ŷ, x̂) and thus g(f(x̂, ŷ)) ≅<sub>A<sub>comm</sub></sub> g(f(ŷ, x̂)). Also, TEval(ρ<sub>5</sub>, z<sub>4</sub>) = g(f(ŷ, x̂)) ≅<sub>A<sub>comm</sub></sub> g(f(x̂, ŷ)), so the term recomputed by the last statement is still held (up to equivalence) by z<sub>4</sub>. This ensures that ρ is indeed coherent modulo A<sub>comm</sub>.
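The example can be replayed mechanically. The following sketch checks only the memoizing condition, and only in the special case relevant here: the execution contains no assume statements, and the axioms are either empty or state commutativity of some function symbols, so that equivalence of terms can be decided by sorting the arguments of commutative symbols. The encoding and the names `canon` and `memoizing_violations` are assumptions of this illustration; this is not the general decision procedure of the paper.

```python
def canon(t, commutative):
    """Canonical form of a term modulo commutativity of the symbols in `commutative`.

    Terms are ("init", x) for initial constants and (f, t1, ..., tn) otherwise;
    the sketch assumes no function symbol is itself called "init".
    """
    if t[0] == "init":
        return t
    args = [canon(a, commutative) for a in t[1:]]
    if t[0] in commutative:
        args.sort(key=repr)                      # any fixed total order on terms works
    return (t[0],) + tuple(args)

def memoizing_violations(execution, variables, commutative=frozenset()):
    """1-based indices of statements that recompute a dropped term."""
    term = {x: ("init", x) for x in variables}
    seen = set(term.values())                    # canonical forms of all terms computed so far
    violations = []
    for i, st in enumerate(execution, start=1):
        if st[0] == "assign":
            term[st[1]] = term[st[2]]
        elif st[0] == "call":
            t = canon((st[2],) + tuple(term[z] for z in st[3]), commutative)
            if t in seen and t not in set(term.values()):
                violations.append(i)             # computed before, but no variable holds it now
            seen.add(t)
            term[st[1]] = t
    return violations

variables = ["x", "y", "z1", "z2", "z3", "z4", "z5", "z6"]
rho = [("call", "z1", "f", ("x", "y")),
       ("call", "z2", "f", ("y", "x")),
       ("call", "z3", "g", ("z1",)),
       ("call", "z4", "g", ("z2",)),
       ("assign", "z3", "z5"),                   # overwrites z3, dropping g(f(x, y))
       ("call", "z6", "g", ("z1",))]

without_axioms = memoizing_violations(rho, variables)
with_commutativity = memoizing_violations(rho, variables, commutative={"f"})
```

With no axioms the sixth statement is flagged, while with f declared commutative no statement is flagged, matching the discussion above.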

Let CoherentExecs(Σ,V, A) denote the set of executions over the signature Σ and variables V that are coherent modulo the set of axioms A.

**Definition 3.** *A program* s *over signature* Σ *and variables* V *is said to be coherent modulo* A *if* Exec(s) ⊆ CoherentExecs(Σ,V, A)*.*

In this paper, we explore several classes of axioms, studying when the verification problem for coherent programs modulo the axioms is decidable.

## **5 Axioms over Relations**

In this section, we investigate the decidability of the verification problem for coherent programs modulo relational axioms, i.e., axioms which only involve relation symbols R in the signature Σ.

#### **5.1 Verification modulo EPR axioms**

A first-order formula is said to be an EPR formula [37] if it is of the form

$$\exists x\_1 \dots x\_k\ \forall y\_1 \dots y\_m\ \varphi,$$

where ϕ is quantifier-free and purely relational (uses no function symbols).

It is well known that satisfiability of EPR formulas is decidable, in fact by a reduction to Boolean satisfiability [24]. Consequently, the problem of checking whether a single execution is feasible under axioms written in EPR can be shown to be decidable, and has been exploited in bounded model-checking.

Consequently, we could reasonably ask whether verification of coherent programs under EPR axioms is decidable. Surprisingly, we show that it is not (proof details can be found in [27]).

**Theorem 1.** *Verification of uninterpreted coherent programs modulo EPR axioms is undecidable.*

Given the above result, we turn to several classes of quantified axioms, which are all expressible in EPR (and hence have a decidable bounded model checking problem) and examine their decidability for coherent program verification.

#### **5.2 Reflexivity, Irreflexivity, and Symmetry**

We consider program verification under the following axioms (individually):

$$\begin{array}{lcl}\varphi\_{\mathsf{refl}}^{R} & \triangleq \forall x \cdot R(x,x) & \text{(reflexivity)}\\\varphi\_{\mathsf{irref}}^{R} & \triangleq \forall x \cdot \neg R(x,x) & \text{(irreflexivity)}\\\varphi\_{\mathsf{symm}}^{R} & \triangleq \forall x,y \cdot R(x,y) \implies R(y,x) & \text{(symmetry)}\end{array} \tag{1}$$

We show that verification is decidable modulo these axioms using a technique that we call *program instrumentation*. Let us fix a relation R and an axiom ϕ<sup>R</sup><sub>p</sub>, where p ∈ {refl, irref, symm}. The idea is to find a function (in fact, a string homomorphism) h<sup>R</sup><sub>p</sub> such that for any program P, P is correct/coherent modulo {ϕ<sup>R</sup><sub>p</sub>} iff h<sup>R</sup><sub>p</sub>(Exec(P)) is correct/coherent modulo the empty axiom set. Decidability then follows by exploiting the results of [26]. The function h<sup>R</sup><sub>p</sub> will capture the properties of the axiom it is trying to eliminate, and so it will be different for different axioms. We first outline these functions h<sup>R</sup><sub>p</sub>, then state their properties and prove the decidability result.

**Fig. 2.** Implied negative relational assumes for a transitive relation R. The dashed edges represent the relationships inferred from the relations marked by bold edges.

For reflexivity, we transform an execution ρ of P into ρ′, where ρ′ is essentially ρ, except that whenever we see the computation of a term, using an assignment of the form "x := f(**z**)", we immediately insert an assume statement stating that R(x, x) holds. More precisely, the homomorphism is defined as

$$h\_{\mathsf{refl}}^R(a) = \begin{cases} a \cdot \text{``}\mathbf{assume}(R(x, x))\text{''} & \text{if } a = \text{``}x := f(\mathbf{z})\text{''} \\ a & \text{otherwise} \end{cases}$$

The homomorphisms used for irreflexivity and symmetry follow similar lines and are outlined in [27].
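As a small illustration, the reflexivity homomorphism can be written as a function on letters and lifted to executions letter by letter. The tuple encoding of letters and the helper names `h_refl` and `apply_hom` are assumptions of this sketch.

```python
def h_refl(letter, R="R"):
    """Image of one letter under the reflexivity instrumentation for relation R."""
    if letter[0] == "call":                      # letter is "x := f(z)"
        x = letter[1]
        return [letter, ("assume_rel", R, (x, x))]
    return [letter]                              # every other letter is left unchanged

def apply_hom(h, execution):
    """Lift a letter-to-word map to a string homomorphism on executions."""
    return [b for a in execution for b in h(a)]

rho = [("call", "x", "f", ("y",)),
       ("assign", "z", "x"),
       ("assume_not_rel", "R", ("z", "x"))]
rho_instrumented = apply_hom(h_refl, rho)
```

In the instrumented execution the assumption R(x, x) is made explicitly right after x is computed, so the later assumption ¬R(z, x), with z holding the same term as x, contradicts it without any appeal to the reflexivity axiom.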

**Theorem 2.** *For any relation symbol* R *and* p ∈ {refl, irref, symm}*, the problems of coherent verification modulo* {ϕ<sup>R</sup><sub>p</sub>} *and checking coherence modulo* {ϕ<sup>R</sup><sub>p</sub>} *are* PSPACE*-complete.*

#### **5.3 Transitivity**

We now consider the transitivity axiom for a relation R which says

$$
\varphi\_{\mathtt{trans}}^R = \forall x, y, z \cdot R(x, y) \land R(y, z) \implies R(x, z) \quad \text{(transitivity)}\tag{2}
$$

The proof of decidability modulo this axiom is different from, and more complex than, the proofs for reflexivity, irreflexivity, and symmetry. Intuitively, the program instrumentation approach does not seem to work for transitivity. This is because the effects of transitivity can be global. For example, the execution may assert the sequence of relational assumes R(t<sub>1</sub>, t<sub>2</sub>), R(t<sub>2</sub>, t<sub>3</sub>), ..., R(t<sub>n−1</sub>, t<sub>n</sub>) (here, t<sub>1</sub>, ..., t<sub>n</sub> are terms computed by the execution), where some of the intermediate terms may have been dropped by the program (i.e., the variables holding these terms were reassigned). Consequently, relating t<sub>1</sub> and (the possibly newly constructed term) t<sub>n</sub> requires fundamentally new machinery. We modify the automaton construction from [26] so that it maintains the transitive closure of the assumptions the program makes. Our main observation is the following:

**Theorem 3.** *Let* Σ *be a first-order signature and* V *a finite set of program variables. Let* A = {ϕ<sup>R</sup><sub>trans</sub> | R ∈ R<sub>trans</sub>} *for some set of relation symbols* R<sub>trans</sub> *in* Σ*. The following observations hold.*


*Proof Sketch.* These are in some sense a generalization of the automata constructions used to establish decidability in [26]. The automata F<sub>trans</sub> and C<sub>trans</sub> rely on tracking equivalence between values stored in variables, and functional and relational correspondences between these values. However, since some relations may now be transitive, additional relational correspondences (or their absence) may be implied for R ∈ R<sub>trans</sub>. The basic idea is to maintain, for transitive relations R, (a) the transitive closure of the positive relational assumes **assume**(R(·, ·)), and (b) the negative relational assumes implied by the relational assumes seen in an execution. More precisely, if the execution sees the assumes **assume**(R(x, y)) and **assume**(R(y, z)), then we also add the constraint R(x, z) to the automaton's state. Further, if the execution observes **assume**(R(x, y)) and **assume**(¬R(x, z)), then one can infer the constraint ¬R(y, z), and in this case, we accumulate this additional constraint in the state of the automaton. Similarly, if the execution observes **assume**(R(y, z)) and **assume**(¬R(x, z)), then one can infer the constraint ¬R(x, y), which is added to the automaton's state. Both these scenarios are illustrated in Fig. 2. A detailed proof is in [27].
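The three inference rules from the proof sketch can be illustrated by a small saturation procedure over a fixed set of element names. This is only a sketch of the inference step: the automaton of the paper additionally tracks equalities, function applications and variable reassignments over the current window, none of which is modeled here. The name `saturate` and the pair encoding of facts are assumptions of this example.

```python
def saturate(pos, neg):
    """Close R-facts under transitivity and the two implied-negative rules.

    pos: set of pairs (a, b) meaning R(a, b) was assumed
    neg: set of pairs (a, b) meaning not R(a, b) was assumed
    Returns (pos, neg, consistent).
    """
    pos, neg = set(pos), set(neg)
    changed = True
    while changed:
        # R(x, y) and R(y, z)      =>  R(x, z)
        new_pos = {(x, z) for (x, y) in pos for (y2, z) in pos if y == y2} - pos
        # R(x, y) and not R(x, z)  =>  not R(y, z)
        new_neg = {(y, z) for (x, y) in pos for (x2, z) in neg if x == x2}
        # R(y, z) and not R(x, z)  =>  not R(x, y)
        new_neg |= {(x, y) for (y, z) in pos for (x, z2) in neg if z == z2}
        new_neg -= neg
        changed = bool(new_pos or new_neg)
        pos |= new_pos
        neg |= new_neg
    return pos, neg, not (pos & neg)

# R(a, b), R(b, c) and not R(a, c) are jointly inconsistent for a transitive R.
_, _, chain_ok = saturate({("a", "b"), ("b", "c")}, {("a", "c")})
# R(x, y) and not R(x, z) are consistent and imply not R(y, z).
_, implied_neg, fork_ok = saturate({("x", "y")}, {("x", "z")})
```

The loop terminates because both sets only grow and are bounded by the finitely many pairs over the given names.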

As a consequence we have the following result.

**Theorem 4.** *For* A = {ϕ<sup>R</sup><sub>trans</sub> | R ∈ R<sub>trans</sub>}*, the problems of coherent verification modulo* A *and checking coherence modulo* A *are* PSPACE*-complete.*

#### **5.4 Strict Partial Orders**

We now turn our attention to axioms that dictate that certain relations be partial or total orders. The anti-symmetry axiom that holds for non-strict orders introduces subtle complications. Recall that R is anti-symmetric if ∀x, y.R(x, y)∧ R(y, x) ⇒ x = y; this axiom can imply equality between terms if R holds between a pair of terms. Concretely, if R is anti-symmetric, and the program makes assumptions in an execution that R(t1, t2) and R(t2, t1) hold, then any model in which such an execution is feasible must also ensure that t<sup>1</sup> = t2. This implicit equality assumption interferes with the notions of coherence and the automata constructions (proofs of the results in [26] and Theorem 4) that compute a congruence closure on terms in a streaming fashion.

Hence, we only consider *strict* partial orders in this section. Recall that a relation R is a strict partial order if it satisfies the irreflexivity axiom and the transitivity axiom, together denoted A<sup>R</sup><sub>SPO</sub>. We can prove decidability for problems modulo A<sup>R</sup><sub>SPO</sub> by using our algorithms for irreflexivity and transitivity.

**Theorem 5.** *The following problems are* PSPACE*-complete.*


#### **5.5 Strict Total Orders**

A relation R is a strict total order if it is a strict partial order and satisfies:

$$\forall x, y \cdot x \neq y \implies R(x, y) \lor R(y, x) \quad \text{(totality)}\tag{3}$$

Strict total orders are again tricky to handle as the axiom for totality can result in implicit equality between terms. For example, if ¬R(x, y) and ¬R(y, x) then it must be the case that x = y. However, if we restrict ourselves to executions that only have assumes of the form **assume**(R(x, y)) and do not have any assumes on ¬R, i.e., of the form **assume**(¬R(x, y)) then there are no implicit equalities that are entailed.

Unfortunately, in general, program executions can contain negative assumes on R (i.e., assumes of the form **assume**(¬R(x, y))). In order to ensure that executions contain only *positive* assumptions on R, we must be careful when identifying executions of programs with conditionals — branches where the assumption ¬R(x, y) holds must be translated to a branch that assumes R(y, x) and a branch that assumes x = y. We present a detailed translation in [27].
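The effect of this case split can be illustrated at the level of a single execution. The sketch below is not the program-level translation of [27]; it only expands one execution containing negative assumes on R into the executions obtained by replacing each **assume**(¬R(x, y)) with either **assume**(R(y, x)) or **assume**(x = y), which is justified because ¬R(x, y) is equivalent to R(y, x) ∨ x = y when R is a strict total order. The letter encoding and the name `positive_variants` are assumptions of this example.

```python
from itertools import product

def positive_variants(execution, R="R"):
    """All executions obtained by case-splitting every negative assume on R."""
    choices = []
    for st in execution:
        if st[0] == "assume_not_rel" and st[1] == R:
            x, y = st[2]
            # not R(x, y) holds iff R(y, x) or x = y (strict total order)
            choices.append([("assume_rel", R, (y, x)), ("assume_eq", x, y)])
        else:
            choices.append([st])
    return [list(v) for v in product(*choices)]

rho = [("assume_not_rel", "R", ("x", "y")),
       ("call", "z", "f", ("x",))]
variants = positive_variants(rho)
```

An execution with n negative assumes on R yields 2<sup>n</sup> variants, and the original execution is feasible modulo the total order axioms exactly when at least one variant is.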

After such a translation, executions can now have additional equality assumes even if they did not appear in the program. When we refer to coherent programs, we mean that they are coherent according to the above modified notion of executions. This means for such programs to be coherent, all executions must ensure that the additional equality assumes are *early*. And when we talk about coherent verification of programs with total orders, we mean verification for programs that are coherent after this transformation.

We observe that, in the absence of any assumes of the form ¬R(x, y), the verification problem modulo strict total orders reduces to that modulo strict partial orders, giving us the following (A<sup>R</sup><sub>STO</sub> denotes the axioms of irreflexivity, transitivity and totality for the relation R).

**Theorem 6.** *The problems of coherent verification and checking coherence modulo* A<sup>R</sup><sub>STO</sub> *are* PSPACE*-complete.*

## **6 Axioms Over Functions**

We now discuss computational problems modulo axioms that involve function symbols. The treatment of axioms involving functions in the verification of coherent programs is inherently hard. This is because, as in the case of (non-strict) partial orders and strict total orders, the axioms, along with the **assume** steps in the execution, can imply equalities between terms beyond those entailed by just the **assume** steps in the execution. For example, consider the axiom ∀x, y · f(x, y) = f(y, x) constraining f to be a commutative function. Then terms like f(f(x, y), z) are equal to terms like f(z, f(x, y)), and hence when building models we must make sure that functions/relations on such terms are defined in the same way. Terms made equivalent by the functional axioms can be syntactically very different, and keeping track of the equivalence on unbounded executions is hard using finite memory. We consider many natural classes of axioms, and prove both positive and negative results that help delineate the decidability/undecidability boundary.

#### **6.1 Associativity**

We now consider the associativity axiom for a function f.

$$
\varphi^f\_{\mathsf{assoc}} \triangleq \forall x, y, z \cdot f(x, f(y, z)) = f(f(x, y), z) \quad \text{(associativity)} \tag{4}
$$

We show, surprisingly to us, that coherent verification is undecidable modulo {ϕ<sup>f</sup><sub>assoc</sub>}, i.e., even when we have only one axiom that requires only one function to be associative. In fact, the situation is a lot worse: checking the feasibility of even a *single* (even coherent) execution is undecidable in the presence of a single associative function. The proof of the following result uses a reduction from the word problem for finitely generated semigroups [36].

**Theorem 7.** *Given a trace* ρ *that is coherent modulo* {ϕ<sup>f</sup><sub>assoc</sub>}*, it is undecidable to determine if* ρ *is feasible. Therefore, the problem of verifying a program* P *that is coherent modulo* {ϕ<sup>f</sup><sub>assoc</sub>} *is undecidable.*

#### **6.2 Commutativity**

We now consider the commutativity axiom, which is the following

$$
\varphi^f\_{\mathsf{comm}} \triangleq \forall x, y \cdot f(x, y) = f(y, x) \quad \text{(commutativity)} \tag{5}
$$

We augment executions with an auxiliary variable v<sup>∗</sup> ∉ V and transform executions using the following homomorphism that uses the auxiliary variable v<sup>∗</sup>:

$$h\_{\mathsf{comm}}^f(a) = \begin{cases} a \cdot \text{``}v^\* := f(y, x)\text{''} \cdot \text{``}\mathbf{assume}(z = v^\*)\text{''} & \text{if } a = \text{``}z := f(x, y)\text{''} \\ a & \text{otherwise} \end{cases}$$

We show that the above transformation preserves feasibility and coherence, giving us the following result.
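To make the transformation concrete, here is a minimal sketch of the commutativity homomorphism acting on a list of letters. The tuple encoding, the string `"v*"` for the auxiliary variable, and the function name `h_comm` are assumptions of this illustration.

```python
AUX = "v*"   # the auxiliary variable; assumed not to occur among the program variables

def h_comm(execution, f="f"):
    """Instrument every computation z := f(x, y) of the commutative symbol f."""
    out = []
    for letter in execution:
        out.append(letter)
        if letter[0] == "call" and letter[2] == f and len(letter[3]) == 2:
            z, (x, y) = letter[1], letter[3]
            out.append(("call", AUX, f, (y, x)))     # v* := f(y, x)
            out.append(("assume_eq", z, AUX))        # assume(z = v*)
    return out

rho = [("call", "z1", "f", ("x", "y")),
       ("call", "z2", "g", ("z1",))]
rho_instrumented = h_comm(rho)
```

Each use of f is followed by the computation of the swapped application and an equality assumption, so the instance of commutativity needed at that point is stated explicitly in the execution rather than supplied by the axiom.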

**Theorem 8.** *Verification of coherent programs and checking coherence modulo commutativity axioms is decidable and is* PSPACE*-complete.*

#### **6.3 Idempotence**

Next we consider the idempotence axiom for a unary function f:

$$
\varphi^f\_{\mathsf{idem}} \triangleq \forall x \cdot f(x) = f(f(x)) \tag{6}
$$

Again, we show that there is a simple homomorphism h<sup>f</sup><sub>idem</sub> that preserves coherence and feasibility (see [27]) and reduces verification to one without axioms.

**Theorem 9.** *Verification of coherent programs and checking coherence modulo idempotence axioms is* PSPACE*-complete.*

## **7 Combining Axioms**

We have thus far proved decidability results when relations or functions satisfy certain properties like reflexivity/irreflexivity/symmetry/transitivity or commutativity/idempotence. We now show that all of these results can be combined. That is, we can consider a signature where relations and functions are assumed to satisfy some subset of these properties, with some being uninterpreted, and the verification problem will remain decidable for coherent programs.

**Theorem 10.** *Let* A *be a set of axioms where each relation symbol* R *is either a total order or satisfies some (possibly empty) subset of properties out of reflexivity, irreflexivity, symmetry, transitivity, and each function symbol* f *satisfies some (possibly empty) subset out of commutativity and idempotence. The verification problem for coherent programs modulo* A *is* PSPACE*-complete.*

The proof of the above result proceeds by *eliminating* axioms one at a time. We first eliminate the relational axioms (reflexivity, irreflexivity, symmetry) in A using program instrumentation. We then eliminate the functional axioms in A, again using program instrumentation. Our proof relies on this order of elimination of axioms. At this point, the only axioms remaining are those corresponding to transitivity of a subset of relational symbols, which is handled using the automata construction discussed in the proof of Theorem 3.

## **8 Related Work**

The theory of equality with uninterpreted functions (EUF) is widely used in many verification applications, as it has a decidable quantifier-free fragment. EUF has been central to advances in verification of microprocessor control [6,4], hardware verification [1,19] and property directed model checking [18]. EUF has been used as a popular abstraction in software verification [2,3]. Uninterpreted functions have also been studied for equivalence checking and translation validation [35]. Bueno et al. [5] demonstrated the effectiveness of uninterpreted programs for verifying SVCOMP benchmarks against control flow properties.

Mathur et al [26] introduced the class of coherent uninterpreted programs and showed that verification of coherent programs, with or without recursive function calls, is a decidable problem. This is one of the few subclasses of program verification over infinite domains that is known to be decidable. Previous works [13,14,31] have established decidability of verification of classes of uninterpreted programs with heavy syntactic restrictions such as disallowing conditionals inside loops or nested loops, etc. As noted in [26], the notion of coherence is close to the notion of a bounded pathwidth decomposition [38]. A term that is created in a coherent execution stays within some program variable (modulo congruence) until the first time all variables containing that term are over-written, and after this point, the execution never computes it again, and thus, the set of windows that contain a term form a contiguous segment of the program execution. Path decomposition and the related notion of tree decomposition have been exploited many times in the literature to give decidability in verification [25,7,8].

The work in [28] extends the work of [26] to *updatable maps* and identifies extensions of coherence that make verification decidable. It uses this to implement verification algorithms for memory safety for a class of heap-manipulating programs, including traversal algorithms on data structures such as singly linked lists, sorted lists, binary search trees, etc. Combining the results of this paper with these results is an interesting future direction.

The class of EPR formulas that consist of universally quantified formulas over relational signatures is a well-known decidable class of first-order logic [37]. EPR-based reasoning has proved powerful for the verification of large-scale systems [33,29,39], and the Ivy [34,30] system is one of the most notable frameworks that exploit EPR-based reasoning for verifying program snippets without recursion. EPR encoding of order axioms such as reflexivity, symmetry, transitivity and total orders has been used in proving programs working over heaps [20].

The work in Kleene Algebra with Tests (KAT) [22] considers problems involving unbounded recursion and choice with abstractions of data, similar to our work. However, while we treat congruence axioms for equality faithfully in our work, it is unclear to us how to express these in KAT or its extensions [21,23,9]. Furthermore, the restrictions of coherence studied in [26] and the work here that are based on bounded path-width notions seem very different from studies of decidable problems in KAT. A study of whether our results can be adapted to yield decidable fragments for KAT is an interesting future direction.

A notable verification technique with an automata-theoretic foundation that has been very effective in practice is that of trace abstraction due to Heizmann et al. [15,16,17,10,11,12]. In this technique, one *iteratively* constructs regular sets that (incompletely) capture the set of all infeasible executions, eventually striving to cover all failing executions of a program, while handling complex theories such as arithmetic. In contrast, our work builds, in one stroke, complete automata that accept all infeasible traces over a vocabulary; it handles only simple theories with restricted sets of axioms, but yields decidability. Combining these lines of work for efficient software verification is an interesting future direction.

## **9 Conclusions**

By incorporating axioms on functions and relations, the decidability results in this paper enable more faithful automatic verification of programs. It is worth noting that the upper bound for all our decidability results is PSPACE, which is the same as that for Boolean programs. Thus, though we consider programs over infinite domains with additional structure, our verification results have the same complexity as that for programs over Boolean domains.

One future direction is to adapt this technique for practical program verification. In this context, adapting our technique within the automata-theoretic technique of [15,17,16,12,10] seems most promising. Second, there are several program verification techniques that use EPR, and in several of these, EPR is used mainly to establish a linear order on the universe [20]. Automatically verifying such programs using our technique is worth exploring.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Formalized Proofs of the Infinity and Normal Form Predicates in the First-Order Theory of Rewriting**<sup>⋆</sup>

Alexander Lochmann and Aart Middeldorp

Department of Computer Science, University of Innsbruck, Austria {alexander.lochmann,aart.middeldorp}@uibk.ac.at

**Abstract.** We present a formalized proof of the regularity of the infinity predicate on ground terms. This predicate plays an important role in the first-order theory of rewriting because it allows one to express the termination property. The paper also contains a formalized proof of a direct tree automaton construction of the normal form predicate, due to Comon.

**Keywords:** Formalization · First-order theory of rewriting · Tree automata

## **1 Introduction**

Term rewriting [1,18] is an abstract model of computation which underlies much of declarative programming and automated theorem proving. The foundation of rewriting is equational logic. Equations are used from left to right to direct the search for proofs. Fundamental properties like *confluence* (which ensures that different computation paths produce the same result) and *termination* (all computation paths produce a result) are *undecidable* in general. For terminating systems, one is interested in estimating the resources needed to evaluate expressions (space and time *complexity*). Much progress has been made in establishing sufficient and automatable criteria for confluence, termination, complexity, and other properties of rewrite systems. These criteria have been implemented in highly optimized automatic tools that compete on a yearly basis [12, 13]. These competitions, together with the recent advances in SAT [4] and SMT [2] solving, have on the one hand led to specialized techniques that are especially suitable for automation. On the other hand, software bugs observed in the tools gave rise to the more recent activity of *certification* of the output of termination, complexity, and confluence tools. This is done by formalizing the underlying methods in an interactive proof assistant like Coq [3] or Isabelle [15], and using the code generation facilities of these proof assistants to obtain trustworthy programs that can certify the output of the tools.

In this paper we are concerned with the formalization of methods that are used in FORT [16,17], a tool that implements the first-order theory of rewriting for the decidable class of left-linear, right-ground rewrite systems. FORT can be used to decide properties of a given rewrite system and to synthesize rewrite systems that satisfy arbitrary properties expressible in the first-order theory of rewriting. The decision procedure is based on tree automata techniques and goes back to a paper by Dauchet and Tison [7]. In a recent paper [10] the authors formalized results concerning ground tree transducers and RR<sub>n</sub> automata for a fragment of the first-order theory that allows one to express confluence, resulting in a formalized confluence prover for left-linear, right-ground rewrite systems. In this paper we cover the infinity predicate that is crucial for expressing the termination property in the first-order theory of rewriting and an efficient automaton construction of the normal form predicate that is employed in FORT. The former goes back to a technical report by Dauchet and Tison [8] and the latter is based on a paper by Comon [5]. The normal form predicate has other applications as well (e.g. [9,14]). A proof of the construction of [8] is given in [16], but this proof contains a serious mistake that we report at the end of Section 3.

<sup>⋆</sup> This research is supported by FWF (Austrian Science Fund) project P30301.

Our formalizations are based on IsaFoR [19],<sup>1</sup> an Isabelle/HOL library containing numerous abstract results and concrete techniques from the rewriting literature. Our own development can be found at

#### http://cl-informatik.uibk.ac.at/software/fortissimo/tacas2020/

Most definitions, theorems, and lemmata in this paper directly correspond to the formalization. These are indicated by the symbol, which links to an HTML presentation in the PDF version of the paper.

In the next section we recall basic definitions, notation, and results concerning term rewriting and tree automata that we need in the sequel. In Section 3 we present our first main result, a formalized correctness proof of the regularity of the infinity predicate for regular relations. The tree automaton constructed in the correctness proof is not directly executable due to the definition of $Q_\infty$, which plays an important role in the construction of the tree automaton. In Section 4 we present our second main result, an equivalent definition of $Q_\infty$ that is constructive. Our third result, a formalized correctness proof of an efficient tree automata construction of the normal form predicate for left-linear rewrite systems, is the topic of Section 5. We conclude in Section 6 with some statistics of our formalizations as well as a list of tasks that remain to be done for a certified version of FORT.

When we write "formalized" we always mean "formalized in Isabelle/HOL."

## **2 Preliminaries**

Familiarity with term rewriting [1] and tree automata [6] is useful, but we briefly recall important definitions and notation that we use in the remainder.

We assume a given signature F and a set of variables V. Function symbols in F are equipped with a fixed arity. Function symbols of arity zero are called

<sup>1</sup> http://cl-informatik.uibk.ac.at/isafor/

constants. The set of terms built from $\mathcal{F}$ and $\mathcal{V}$ is denoted by $\mathcal{T}(\mathcal{F}, \mathcal{V})$ and inductively defined: a term is either a variable $x \in \mathcal{V}$ or $f(t_1, \dots, t_n)$ for a function symbol $f$ of arity $n$ and terms $t_1, \dots, t_n \in \mathcal{T}(\mathcal{F}, \mathcal{V})$. The set of variables occurring in a term $t$ is denoted by $\mathcal{V}\mathsf{ar}(t)$. A term $t$ with $\mathcal{V}\mathsf{ar}(t) = \varnothing$ is called ground. We write $\mathcal{T}(\mathcal{F})$ for the set of ground terms. Positions are strings of positive integers which are used to address subterms. The empty string is called the root position and denoted by $\epsilon$. The set of positions in a term $t$ is denoted by $\mathcal{P}\mathsf{os}(t)$ and the subterm of $t$ at position $p \in \mathcal{P}\mathsf{os}(t)$ by $t|_p$. We write $s \lhd t$ if $s$ is a proper subterm of $t$, i.e., $s = t|_p$ with $p \neq \epsilon$. We write $t[u]_p$ for the result of replacing the subterm of $t$ at position $p$ with the term $u$. The root symbol of a term $t$ is denoted by $\mathsf{root}(t)$ and $t(p)$ denotes $\mathsf{root}(t|_p)$. We write $p < q$ if $p$ is a proper prefix of $q$. A context $C$ is a term with a hole $\square$. Here $\square \notin \mathcal{F}$ is a special constant. We write $C[t]$ for the result of replacing the hole in $C$ by $t$. A substitution $\sigma$ is a mapping from variables to terms. We write $t\sigma$ for the result of applying $\sigma$ to the term $t$.

A term rewrite system (TRS for short) $\mathcal{R}$ consists of rewrite rules $\ell \to r$ between terms $\ell$ and $r$ over the same signature $\mathcal{F}$ such that $\mathcal{V}\mathsf{ar}(r) \subseteq \mathcal{V}\mathsf{ar}(\ell)$. The rewrite relation $\to_{\mathcal{R}}$ is defined on terms as follows: $s \to_{\mathcal{R}} t$ if there exist a position $p \in \mathcal{P}\mathsf{os}(s)$, a rewrite rule $\ell \to r \in \mathcal{R}$, and a substitution $\sigma$ such that $s|_p = \ell\sigma$ and $t = s[r\sigma]_p$. The reflexive transitive closure of $\to_{\mathcal{R}}$ is denoted by $\to^*_{\mathcal{R}}$. A redex is a substitution instance of a left-hand side of a rewrite rule. Terms that contain a redex as subterm are called reducible. A normal form is a term without redexes. We write $\mathsf{NF}(\mathcal{R})$ for the set of ground normal forms of $\mathcal{R}$. In this paper we consider finite TRSs over finite signatures. The TRSs handled by FORT are left-linear (no duplicate variables in left-hand sides of rewrite rules) and right-ground (no variables in right-hand sides of rewrite rules).
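The definitions above can be made concrete with a small sketch. It is an illustration only, not part of the formalization: terms are assumed to be nested tuples `(symbol, child_1, ..., child_n)`, variables are recognized by membership in a given set, and the rule `f(x, a) → b` in the usage note is a hypothetical example. Because the rules are left-linear, matching needs no consistency check between variable occurrences, and because they are right-ground, the contractum is simply the right-hand side.

```python
def matches(pattern, term, variables):
    """Does the left-linear pattern match the ground term?"""
    if len(pattern) == 1 and pattern[0] in variables:
        return True  # a variable matches any term (left-linearity: no repeated variables)
    return (pattern[0] == term[0] and len(pattern) == len(term)
            and all(matches(p, s, variables)
                    for p, s in zip(pattern[1:], term[1:])))


def subterms(term):
    """Enumerate all subterms of a term, including the term itself."""
    yield term
    for child in term[1:]:
        yield from subterms(child)


def is_normal_form(term, lhss, variables):
    """A ground term is a normal form if no subterm is a redex."""
    return not any(matches(l, s, variables)
                   for s in subterms(term) for l in lhss)


def rewrite_steps(term, rules, variables):
    """All terms reachable from `term` in one rewrite step.

    `rules` is a list of pairs (l, r) with l left-linear and r ground.
    """
    results = []
    for l, r in rules:
        if matches(l, term, variables):
            results.append(r)  # right-ground: r with any substitution is r itself
    for i, child in enumerate(term[1:], start=1):
        for reduct in rewrite_steps(child, rules, variables):
            results.append(term[:i] + (reduct,) + term[i + 1:])
    return results
```

With the hypothetical rule `f(x, a) → b`, the term `f(f(a, a), a)` has two one-step reducts, one at the root and one at position 1, whereas `f(a, b)` is a normal form.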

We now recall some basic notions related to tree automata. A *tree automaton* is a quadruple $\mathcal{A} = (\mathcal{F}, Q, Q_f, \Delta)$ consisting of a finite signature $\mathcal{F}$, a finite set $Q$ of states, disjoint from $\mathcal{F}$, a subset $Q_f \subseteq Q$ of final states, and a set of transition rules $\Delta$. Every transition rule has one of the following two shapes:

**–** $f(p_1, \dots, p_n) \to q$ with $f \in \mathcal{F}$ and $p_1, \dots, p_n, q \in Q$, or

**–** $p \to q$ with $p, q \in Q$.

Transition rules of the second shape are called epsilon transitions. We write $\Delta_\epsilon$ for the set of epsilon transitions. Furthermore, $\Delta_{\neg\epsilon} = \Delta \setminus \Delta_\epsilon$. Transition rules can be viewed as rewrite rules between ground terms in $\mathcal{T}(\mathcal{F} \cup Q)$. The induced rewrite relation is denoted by $\to_\Delta$ or $\to_{\mathcal{A}}$. A ground term $t \in \mathcal{T}(\mathcal{F})$ is *accepted* by $\mathcal{A}$ if $t \to^*_\Delta q$ for some $q \in Q_f$. The set of all accepted terms is denoted by $L(\mathcal{A})$ and a set $L$ of ground terms is *regular* if $L = L(\mathcal{A})$ for some tree automaton $\mathcal{A}$.

Let $\mathcal{A} = (\mathcal{F}, Q, Q_f, \Delta)$ be a tree automaton. A state $q \in Q$ is *reachable* if $t \to^*_\Delta q$ for some term $t \in \mathcal{T}(\mathcal{F})$. We say that $q$ is *productive* if $C[q] \to^*_\Delta q_f$ for some ground context $C$ and final state $q_f \in Q_f$. The automaton $\mathcal{A}$ is *trim* if all states are both reachable and productive. Any tree automaton can be transformed into an equivalent trim automaton. This result has been formalized in IsaFoR by Felgenhauer and Thiemann [11].
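As an informal illustration of acceptance and reachability, the following sketch evaluates a ground term bottom-up and computes the reachable states by a fixed-point iteration. The representation is an assumption made for the example: terms are nested tuples, non-epsilon rules are triples `(f, (p_1, ..., p_n), q)`, and epsilon transitions are pairs `(p, q)`. It is unrelated to the verified code obtained from the formalization.

```python
def eps_closure(states, eps):
    """Close a set of states under epsilon transitions p -> q."""
    closed, todo = set(states), list(states)
    while todo:
        p = todo.pop()
        for a, b in eps:
            if a == p and b not in closed:
                closed.add(b)
                todo.append(b)
    return closed


def run(term, rules, eps):
    """Set of all states q such that term ->* q."""
    symbol, children = term[0], term[1:]
    child_states = [run(c, rules, eps) for c in children]
    hit = {q for f, ps, q in rules
           if f == symbol and len(ps) == len(children)
           and all(p in s for p, s in zip(ps, child_states))}
    return eps_closure(hit, eps)


def accepts(term, rules, eps, final):
    """A term is accepted if it reaches some final state."""
    return bool(run(term, rules, eps) & set(final))


def reachable(rules, eps):
    """States reachable from some ground term (least fixed point)."""
    reach, changed = set(), True
    while changed:
        changed = False
        for _f, ps, q in rules:
            if all(p in reach for p in ps):
                new = eps_closure({q}, eps) - reach
                if new:
                    reach |= new
                    changed = True
    return reach
```

Productive states can be computed by a symmetric fixed point that starts from the final states; removing every state that is not both reachable and productive yields a trim automaton.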

Below we present a formalized proof of a version of the *pumping lemma* that we need later.

**Lemma 1.** *Let* $\mathcal{A} = (\mathcal{F}, Q, Q_f, \Delta)$ *be a tree automaton and* $t \to^*_\Delta q$ *with* $t \in \mathcal{T}(\mathcal{F})$ *and* $q \in Q$*. If* $\mathsf{height}(t) > |Q|$ *then there exist contexts* $C_1$ *and* $C_2 \neq \square$*, a term* $u$*, and a state* $p$ *such that* $t = C_1[C_2[u]]$*,* $u \to^*_\Delta p$*,* $C_2[p] \to^*_\Delta p$*, and* $C_1[p] \to^*_\Delta q$*.*

*Proof.* From the assumptions $t \to^*_\Delta q$ and $\mathsf{height}(t) > |Q|$ we obtain a sequence $(t_1, \dots, t_{n+1}, q_1, \dots, q_{n+1}, D_1, \dots, D_n)$ consisting of ground terms, states, and non-empty contexts with $n > |Q|$ such that

**–** $t_i \to^*_\Delta q_i$ for all $i \leqslant n + 1$,

**–** $D_i[t_i] = t_{i+1}$ and $D_i[q_i] \to^*_\Delta q_{i+1}$ for all $i \leqslant n$, and

**–** $q_{n+1} = q$ and $t_{n+1} = t$

by a straightforward induction proof on $t$. Because $n > |Q|$ there exist indices $1 \leqslant i < j \leqslant n$ such that $q_i = q_j$. We construct the contexts $C_1 = D_n[\dots[D_j]\dots]$ and $C_2 = D_{j-1}[\dots[D_i]\dots]$. Note that $C_2 \neq \square$ as $i < j$. We obtain $C_2[q_i] \to^*_\Delta q_j$ and $C_1[q_j] \to^*_\Delta q_{n+1}$ by induction on the difference $j - i$. By letting $p = q_i = q_j$ and $u = t_i$ we obtain the desired result. $\square$

We conclude this preliminary section with a brief account of RR$_2$ relations, which are binary relations on ground terms over a signature $\mathcal{F}$ whose *encoding* as sets of ground terms over the extended signature $\mathcal{F}^{(2)} = (\mathcal{F} \cup \{\bot\})^2$ with a fresh constant $\bot \notin \mathcal{F}$ is regular. The arity of a symbol $fg \in \mathcal{F}^{(2)}$ is the maximum of the arities of $f$ and $g$. The encoding of two terms $t, u \in \mathcal{T}(\mathcal{F})$ is the unique term $\langle t, u \rangle \in \mathcal{T}(\mathcal{F}^{(2)})$ such that $\mathcal{P}\mathsf{os}(\langle t, u \rangle) = \mathcal{P}\mathsf{os}(t) \cup \mathcal{P}\mathsf{os}(u)$ and $\langle t, u \rangle(p) = fg$ where

$$f = \begin{cases} t(p) & \text{if } p \in \mathcal{P}\mathsf{os}(t) \\ \bot & \text{otherwise} \end{cases} \qquad g = \begin{cases} u(p) & \text{if } p \in \mathcal{P}\mathsf{os}(u) \\ \bot & \text{otherwise} \end{cases}$$

for all positions $p \in \mathcal{P}\mathsf{os}(t) \cup \mathcal{P}\mathsf{os}(u)$. We illustrate this on a concrete example. For the ground terms $t = \mathsf{f}(\mathsf{g}(\mathsf{a}), \mathsf{f}(\mathsf{b}, \mathsf{a}))$ and $u = \mathsf{f}(\mathsf{a}, \mathsf{g}(\mathsf{g}(\mathsf{b})))$ we obtain $\langle t, u \rangle = \mathsf{ff}(\mathsf{ga}(\mathsf{a}\bot), \mathsf{fg}(\mathsf{bg}(\bot\mathsf{b}), \mathsf{a}\bot))$. A tree automaton operating on terms in $\mathcal{T}(\mathcal{F}^{(2)})$ is called an RR$_2$ automaton. The two *projection* operations effectively transform RR$_2$ relations on $\mathcal{T}(\mathcal{F})$ to regular subsets of $\mathcal{T}(\mathcal{F})$.
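The encoding can be computed by a simultaneous traversal of both terms. The sketch below is illustrative and rests on assumptions of its own: terms are nested tuples `(symbol, child_1, ..., child_n)`, a missing subterm is represented by `None`, and the helper `show` is a hypothetical pretty-printer that writes a pair symbol as the concatenation of its two components.

```python
BOT = "⊥"  # padding symbol, assumed not to occur in the signature


def encode(t, u):
    """Encoding <t, u> of two ground terms; None stands for an absent subterm."""
    f = t[0] if t is not None else BOT
    g = u[0] if u is not None else BOT
    ts = t[1:] if t is not None else ()
    us = u[1:] if u is not None else ()
    arity = max(len(ts), len(us))  # arity of fg is the maximum of both arities
    children = tuple(
        encode(ts[i] if i < len(ts) else None,
               us[i] if i < len(us) else None)
        for i in range(arity)
    )
    return ((f, g),) + children


def show(e):
    """Render an encoded term, e.g. ff(ga(a⊥), ...)."""
    (f, g), children = e[0], e[1:]
    head = f + g
    if not children:
        return head
    return head + "(" + ", ".join(show(c) for c in children) + ")"
```

Applied to the two terms of the example above, `show(encode(t, u))` yields `ff(ga(a⊥), fg(bg(⊥b), a⊥))`.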

## **3 Infinity Predicate**

The following formula in the first-order theory of rewriting expresses the termination property:<sup>2</sup>

$$\forall\, t \; \mathsf{FIN}_{\to^+}(t) \;\land\; \neg\, \exists\, u \; (u \to^+ u)$$

<sup>2</sup> The formula characterizes termination of all rewrite systems $\mathcal{R}$ with the property that the induced rewrite relation $\to_{\mathcal{R}}$ is *finitely branching*.

The predicate $\mathsf{FIN}_{\to^+}$ holds for $t \in \mathcal{T}(\mathcal{F})$ if there are only finitely many terms $u \in \mathcal{T}(\mathcal{F})$ such that $t \to^+ u$. We consider its complement as it leads to smaller automata:

$$\neg\, \exists\, t \; (\mathsf{INF}_{\to^+}(t) \lor t \to^+ t)$$

with $\mathsf{INF}_{\to^+} = \{t \in \mathcal{T}(\mathcal{F}) \mid t \to^+_{\mathcal{R}} u \text{ for infinitely many terms } u \in \mathcal{T}(\mathcal{F})\}$.

**Definition 1.** *Let* $\circ$ *be an arbitrary binary relation on* $\mathcal{T}(\mathcal{F})$*. We write* $\mathsf{INF}_\circ$ *for the set* $\{t \in \mathcal{T}(\mathcal{F}) \mid (t, u) \in \circ \text{ for infinitely many terms } u \in \mathcal{T}(\mathcal{F})\}$*.*

In [8] the construction of a tree automaton that accepts $\mathsf{FIN}_\circ$ for an arbitrary RR$_2$ relation $\circ$<sup>3</sup> is given. In [16, Appendix A] a correctness proof of the construction is presented, which contains a serious mistake (reported at the end of this section). In this section we give a rigorous and formalized proof of the regularity of $\mathsf{INF}_\circ$ for arbitrary RR$_2$ relations $\circ$.

**Theorem 1.** *The set* $\mathsf{INF}_\circ$ *is regular for every RR*$_2$ *relation* $\circ \subseteq \mathcal{T}(\mathcal{F}) \times \mathcal{T}(\mathcal{F})$*.*

The following definition originates from [8].

**Definition 2.** *Given a tree automaton* $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$*, the set* $Q_\infty \subseteq Q$ *consists of all states* $q \in Q$ *such that* $\langle \bot, t \rangle \to^*_\Delta q$ *for infinitely many terms* $t \in \mathcal{T}(\mathcal{F})$*.*

*Example 1.* Consider the binary relation

$$\circ = \{ (\mathsf{f}(\mathsf{a}, \mathsf{g}^n(\mathsf{b})), \mathsf{g}^m(\mathsf{f}(\mathsf{a}, \mathsf{b}))) \mid n = 2 \text{ and } m \geqslant 1 \text{ or } n \geqslant 3 \text{ and } m = 1 \}$$

The encoding of $\circ$ is accepted by the RR$_2$ automaton $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$ with $\mathcal{F} = \{\mathsf{a}, \mathsf{b}, \mathsf{f}, \mathsf{g}\}$, $Q = \{0, \dots, 11\}$, $Q_f = \{0\}$, and $\Delta$ consisting of the following transition rules:


For instance,

$$\begin{aligned} \langle \mathsf{f}(\mathsf{a}, \mathsf{g}(\mathsf{g}(\mathsf{b}))), \mathsf{g}(\mathsf{f}(\mathsf{a}, \mathsf{b})) \rangle &= \mathsf{f}\mathsf{g}(\mathsf{af}(\bot \mathsf{a}, \bot \mathsf{b}), \mathsf{g}\bot(\mathsf{g}\bot(\mathsf{b}\bot))) \\ &\to\_{\Delta}^{\*} \mathsf{f}\mathsf{g}(\mathsf{af}(3, 4), \mathsf{g}\bot(\mathsf{g}\bot(7))) \to\_{\Delta}^{\*} \mathsf{f}\mathsf{g}(1, \mathsf{g}\bot(6)) \to\_{\Delta} \mathsf{f}\mathsf{g}(1, 2) \to\_{\Delta} 0 \end{aligned}$$

but $\langle \mathsf{f}(\mathsf{a}, \mathsf{g}(\mathsf{b})), \mathsf{f}(\mathsf{a}, \mathsf{b}) \rangle = \mathsf{ff}(\mathsf{aa}, \mathsf{gb}(\mathsf{b}\bot))$ is not accepted.

We have $Q_\infty = \{5\}$. State 5 is reached by $\langle \bot, \mathsf{g}^n(\mathsf{f}(\mathsf{a}, \mathsf{b})) \rangle$ for all $n \geqslant 0$.

<sup>3</sup> The relation $\to^+_{\mathcal{R}}$ is an RR$_2$ relation for left-linear, right-ground TRSs $\mathcal{R}$. Other uses of $\mathsf{FIN}$ ($\mathsf{INF}$) can be found in [16].

**Definition 3.** *Given a tree automaton* $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$*, we define the tree automaton* $\mathcal{A}_\infty = (\mathcal{F}^{(2)}, Q \cup \bar{Q}, \bar{Q}_f, \Delta \cup \bar{\Delta})$*. Here* $\bar{Q}$ *is a copy of* $Q$ *where every state is dashed:* $\bar{q} \in \bar{Q}$ *if and only if* $q \in Q$*. For every transition rule* $fg(q_1, \dots, q_n) \to q \in \Delta$ *we have the following transition rules in* $\bar{\Delta}$*:*

$$fg(q_1, \ldots, q_n) \to \bar{q} \qquad \text{if } q \in Q_\infty \text{ and } f = \bot \tag{1}$$

$$fg(q_1, \ldots, q_{i-1}, \bar{q}_i, q_{i+1}, \ldots, q_n) \to \bar{q} \qquad \text{for all } 1 \leqslant i \leqslant n \tag{2}$$

*Moreover, for every* $\epsilon$*-transition* $p \to q \in \Delta$ *we add*

$$
\bar{p} \to \bar{q} \tag{3}
$$

*to* Δ¯*. We write* Δ *for* <sup>Δ</sup> <sup>∪</sup> <sup>Δ</sup>¯*.*

Dashed states are created by rules of shape (1) and propagated by rules of shapes (2) and (3). The above construction differs from the one in [8]; instead of (1) the latter contains $fg(q_1, \dots, q_n) \to \bar{q}$ if $q_i \in Q_\infty$ for some $i > \mathsf{arity}(f)$. In an implementation, rather than adding all dashed states and all transition rules of shape (2), the necessary rules would be computed by propagating the dashes created by (1) in order to avoid the appearance of unreachable dashed states. When $\mathcal{A}_\infty$ is used in isolation, a single bit suffices to record that a dashed state occurred during a computation.

*Example 2.* For the tree automaton $\mathcal{A}$ from Example 1 we obtain $\mathcal{A}_\infty$ by adding the following transition rules (the missing rules of shape (2) involve unreachable states):

$$\bot\mathsf{f}(3,4) \to \bar{5} \qquad \bot\mathsf{g}(5) \to \bar{5} \qquad \bot\mathsf{g}(\bar{5}) \to \bar{5} \qquad \mathsf{ag}(\bar{5}) \to \bar{1} \qquad \mathsf{fg}(\bar{1},2) \to \bar{0}$$

The unique final state of $\mathcal{A}_\infty$ is $\bar{0}$. We have $\langle \mathsf{f}(\mathsf{a}, \mathsf{g}(\mathsf{g}(\mathsf{b}))), \mathsf{g}(\mathsf{f}(\mathsf{a}, \mathsf{b})) \rangle \in L(\mathcal{A}_\infty)$ but there is no term $u$ such that $\langle \mathsf{f}(\mathsf{a}, \mathsf{g}(\mathsf{b})), u \rangle \in L(\mathcal{A}_\infty)$.
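The rules of shapes (1), (2), and (3) can be generated mechanically once $Q_\infty$ is known. The following sketch is a naive illustration that adds all rules of shape (2), without the dash propagation optimization mentioned above. Its representation is assumed for the example: symbols of $\mathcal{F}^{(2)}$ are pairs `(f, g)`, rules are triples `((f, g), (q_1, ..., q_n), q)`, and a dashed state is modelled as the pair `("bar", q)`.

```python
BOT = "⊥"


def bar(q):
    """Dashed copy of a state (representation chosen for this sketch)."""
    return ("bar", q)


def dashed_rules(rules, eps, q_inf):
    """Return the additional rules and epsilon transitions of A_infinity."""
    new_rules = []
    for (f, g), ps, q in rules:
        # shape (1): create a dash if q is in Q_infinity and the left component is ⊥
        if q in q_inf and f == BOT:
            new_rules.append(((f, g), ps, bar(q)))
        # shape (2): propagate a dash from exactly one argument position
        for i in range(len(ps)):
            args = ps[:i] + (bar(ps[i]),) + ps[i + 1:]
            new_rules.append(((f, g), args, bar(q)))
    # shape (3): dashed copies of the epsilon transitions
    new_eps = [(bar(p), bar(q)) for p, q in eps]
    return new_rules, new_eps
```

The final states of the resulting automaton are the dashed copies of the original final states. Feeding in the four undashed rules that underlie the rules listed in Example 2, namely `⊥f(3,4) → 5`, `⊥g(5) → 5`, `ag(5) → 1`, and `fg(1,2) → 0` (a fragment obtained here by erasing dashes, not the full automaton of Example 1), together with $Q_\infty = \{5\}$, produces the five rules of Example 2 plus three rules of shape (2) over unreachable dashed states.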

The following preliminary lemma is proved by a straightforward induction argument.

**Lemma 2.** *If* $t \to^*_{\mathcal{A}} p$ *then* $t \to^*_{\mathcal{A}_\infty} p$*. If* $C[p] \to^*_{\mathcal{A}} q$ *then* $C[p] \to^*_{\mathcal{A}_\infty} q$ *and* $C[\bar{p}] \to^*_{\mathcal{A}_\infty} \bar{q}$*.*

**Theorem 2.** *Suppose* $\circ$ *is accepted by the RR*$_2$ *automaton* $\mathcal{A}$*. If* $t \in \mathsf{INF}_\circ$ *then* $\langle t, u \rangle \in L(\mathcal{A}_\infty)$ *for some term* $u \in \mathcal{T}(\mathcal{F})$*.*

*Proof.* From $t \in \mathsf{INF}_\circ$ and $\circ = L(\mathcal{A})$ we obtain $\langle t, u \rangle \in L(\mathcal{A})$ for infinitely many terms $u \in \mathcal{T}(\mathcal{F})$. Since the signature is finite, there are only finitely many ground terms of any given height. Moreover, $\mathsf{height}(\langle t, u \rangle) = \max(\mathsf{height}(t), \mathsf{height}(u))$. Hence there must exist a term $u \in \mathcal{T}(\mathcal{F})$ with $\langle t, u \rangle \in L(\mathcal{A})$ such that

$$\mathsf{height}(t) + |Q| + 1 < \mathsf{height}(u)$$

This is only possible if there are positions $p$ and $q$ such that $p \notin \mathcal{P}\mathsf{os}(t)$, $pq \in \mathcal{P}\mathsf{os}(u)$, and $|Q| < |q|$. From $\mathcal{P}\mathsf{os}(\langle t, u \rangle) = \mathcal{P}\mathsf{os}(t) \cup \mathcal{P}\mathsf{os}(u)$ we obtain $\langle t, u \rangle|_p = \langle \bot, u|_p \rangle$. Since $\langle t, u \rangle \in L(\mathcal{A})$ there exist states $r \in Q$ and $q_f \in Q_f$ such that

$$\langle t, u \rangle = \langle t, u \rangle[\langle \bot, u|_p \rangle]_p \to^*_{\mathcal{A}} \langle t, u \rangle[r]_p \to^*_{\mathcal{A}} q_f$$

where we assume without loss of generality that the last step in the subsequence $\langle \bot, u|_p \rangle \to^*_{\mathcal{A}} r$ uses a non-epsilon transition rule.

From $|Q| < |q|$ and $pq \in \mathcal{P}\mathsf{os}(u)$ we infer $|Q| < \mathsf{height}(\langle \bot, u|_p \rangle)$. Hence we can use the pumping lemma (Lemma 1) to conclude the existence of infinitely many terms $v \in \mathcal{T}(\mathcal{F})$ such that $\langle \bot, v \rangle \to^*_{\mathcal{A}} r$. Hence $r \in Q_\infty$ by Definition 2. Since the last step of $\langle \bot, u|_p \rangle \to^*_{\mathcal{A}} r$ uses a non-epsilon transition rule, we obtain $\langle \bot, u|_p \rangle \to^*_{\mathcal{A}_\infty} \bar{r}$ using Lemma 2 and a final application of a rule of shape (1). Also using Lemma 2 we obtain $\langle t, u \rangle[\bar{r}]_p \to^*_{\mathcal{A}_\infty} \bar{q}_f$ as $\langle t, u \rangle[r]_p \to^*_{\mathcal{A}} q_f$. We conclude $\langle t, u \rangle \in L(\mathcal{A}_\infty)$ as desired. $\square$

For the reverse direction, stated as Theorem 3 below, we need two auxiliary results. The first result is proved by a straightforward induction argument. Here the mapping $\varphi\colon \mathcal{T}(\mathcal{F}^{(2)} \cup Q \cup \bar{Q}) \to \mathcal{T}(\mathcal{F}^{(2)} \cup Q)$ erases all dashes from states.

**Lemma 3.** *If* $t \in \mathcal{T}(\mathcal{F}^{(2)} \cup Q \cup \bar{Q})$ *and* $t \to^*_{\mathcal{A}_\infty} p$ *then* $\varphi(t) \to^*_{\mathcal{A}} \varphi(p)$*.*

With a little bit more effort, we obtain the second auxiliary result. The key step in the proof is identifying the rule of shape (1) that is used to create the first dashed state.

**Lemma 4.** *If* $t \in \mathcal{T}(\mathcal{F}^{(2)})$ *and* $t \to^*_{\mathcal{A}_\infty} \bar{p}$ *then there exist a state* $q \in Q_\infty$*, a context* $C$*, and a term* $s$ *such that* $C[s] = t$*,* $\mathsf{root}(s) = \bot f$ *for some* $f \in \mathcal{F}$*,* $s \to^*_{\mathcal{A}_\infty} \bar{q}$*, and* $C[\bar{q}] \to^*_{\mathcal{A}_\infty} \bar{p}$*.*

**Theorem 3.** *Suppose* $\circ$ *is accepted by the RR*$_2$ *automaton* $\mathcal{A}$*. If* $\langle t, u \rangle \in L(\mathcal{A}_\infty)$ *for some term* $u \in \mathcal{T}(\mathcal{F})$ *then* $t \in \mathsf{INF}_\circ$*.*

*Proof.* From $\langle t, u \rangle \in L(\mathcal{A}_\infty)$ we obtain a final state $\bar{q}_f \in \bar{Q}$ with $\langle t, u \rangle \to^*_{\mathcal{A}_\infty} \bar{q}_f$. Using Lemma 4, we obtain a context $C$, a term $s$ with $\mathsf{root}(s) = \bot f$ for some $f \in \mathcal{F}$, and a state $q \in Q_\infty$ such that $C[s] = \langle t, u \rangle$, $s \to^*_{\mathcal{A}_\infty} \bar{q}$, and $C[\bar{q}] \to^*_{\mathcal{A}_\infty} \bar{q}_f$. Let $p$ be the position of the hole in $C$. From $C[s] = \langle t, u \rangle$ and $\mathsf{root}(s) = \bot f$, we infer $p \in \mathcal{P}\mathsf{os}(u) \setminus \mathcal{P}\mathsf{os}(t)$. Since $q \in Q_\infty$ the set $\{v \in \mathcal{T}(\mathcal{F}) \mid \langle \bot, v \rangle \to^*_{\mathcal{A}} q\}$ is infinite. Hence the set $S = \{u[v]_p \in \mathcal{T}(\mathcal{F}) \mid \langle \bot, v \rangle \to^*_{\mathcal{A}} q\}$ is infinite, too. Let $u[w]_p \in S$. So $\langle \bot, w \rangle \to^*_{\mathcal{A}} q$. Since $C$ is ground and $C[\bar{q}] \to^*_{\mathcal{A}_\infty} \bar{q}_f$, we obtain $C[q] \to^*_{\mathcal{A}} q_f$ from Lemma 3. We have $C[\langle \bot, w \rangle] = \langle t, u[w]_p \rangle$ as $p \in \mathcal{P}\mathsf{os}(u) \setminus \mathcal{P}\mathsf{os}(t)$. It follows that $\langle t, u[w]_p \rangle \in L(\mathcal{A})$ and thus there are infinitely many terms $u$ such that $\langle t, u \rangle \in L(\mathcal{A})$. Since $\circ = L(\mathcal{A})$ we conclude the desired $t \in \mathsf{INF}_\circ$. $\square$

The final step to conclude that the infinity predicate is indeed regular is now easy.

*Proof (of Theorem 1).* Combining Theorem 2 and Theorem 3 yields the following equivalence:

$$t \in \mathsf{INF}_\circ \quad \Longleftrightarrow \quad \langle t, u \rangle \in L(\mathcal{A}_\infty) \text{ for some term } u$$

Hence a tree automaton that accepts $\mathsf{INF}_\circ$ is obtained by subjecting $\mathcal{A}_\infty$ to a projection operation (on the first argument).

Projection on RR$_n$ automata has been formalized in Isabelle/HOL as part of [10]. $\square$

The mistake in the proof given in the appendix of [16] is quoted below and corresponds to the proof of Theorem 2:

The set $U = \{u \in \mathcal{T}(\mathcal{F}) \mid (t, u) \in \circ\}$ is infinite. Since the signature $\mathcal{F}$ is finite, infinitely many terms $u$ in $U$ have a height greater than $t$. Hence there exists a position $p \notin \mathcal{P}\mathsf{os}(t)$ such that the set $U' = \{u \in U \mid p \in \mathcal{P}\mathsf{os}(u)\}$ is infinite. For every $u \in U'$ we have $\langle t, u \rangle|_p = \langle \bot, u|_p \rangle$. Since $\langle t, u \rangle$ is accepted by $\mathcal{A}$ and $Q$ is finite, there must exist a state $q$ such that $\langle \bot, u|_p \rangle \to^*_{\mathcal{A}} q$ for infinitely many terms $u \in U'$. Therefore $q \in Q_\infty$.

The following example refutes the above reasoning, which is the key step in the proof in [16]. It was found in an attempt to formalize the proof.

*Example 3.* Let $t = \mathsf{f}(\mathsf{a}, \mathsf{b})$ and consider the infinite set $U = \{\mathsf{f}(\mathsf{f}(\mathsf{a}, \mathsf{b}), \mathsf{g}^n(\mathsf{b})) \mid n \geqslant 1\}$. The automaton

$$\mathcal{A} = (\{\mathsf{f}, \mathsf{g}, \mathsf{a}, \mathsf{b}\}^{(2)}, \{q_1, \dots, q_6\}, \{q_6\}, \Delta)$$

with $\Delta$ consisting of the transition rules

$$\begin{array}{llll} \mathsf{ff}(q_4, q_5) \to q_6 & \bot\mathsf{a} \to q_2 & \mathsf{bg}(q_1) \to q_5 & \bot\mathsf{b} \to q_1 \\ \mathsf{af}(q_2, q_3) \to q_4 & \bot\mathsf{b} \to q_3 & \bot\mathsf{g}(q_1) \to q_1 & \end{array}$$

accepts the relation $\circ = \{t\} \times U$. Consider the position $p = 11$. We have $p \notin \mathcal{P}\mathsf{os}(t)$ and $p \in \mathcal{P}\mathsf{os}(u)$ for all terms $u \in U$. Hence $U' = U$. Moreover, $\langle t, u \rangle|_p = \langle \bot, \mathsf{a} \rangle = \bot\mathsf{a}$ for all terms $u \in U'$. The only state reachable from $\bot\mathsf{a}$ is $q_2$ and clearly $q_2 \notin Q_\infty$.

## **4 Executable Infinity Predicate**

Owing to the definition of $Q_\infty$, the automaton $\mathcal{A}_\infty$ defined in Definition 3 is not executable. In this section we give an equivalent but executable definition of $Q_\infty$, which we name $Q^e_\infty$:

$$Q^e_{\infty} = \{ q \mid p \leadsto p \text{ and } p \leadsto q \text{ for some state } p \in Q \}$$

Here the relation $\leadsto$ is defined using the inference rules in Figure 1. Before proving that the two definitions are equivalent, we illustrate the definition of $Q^e_\infty$ by revisiting Example 1.

$$\frac{\bot f(p\_1, \ldots, p\_n) \to\_{\Delta} p}{p\_1 \leadsto p \quad \cdots \quad p\_n \leadsto p} \qquad \qquad \frac{p \leadsto q \quad q \to\_{\Delta} r}{p \leadsto r} \qquad \qquad \frac{p \leadsto q \quad q \leadsto r}{p \leadsto r}$$

**Fig. 1.** Inference rules for computing $Q^e_\infty$.

*Example 4.* We obtain $3 \leadsto 5$ and $4 \leadsto 5$ by applying the first inference rule to the transition rule $\bot\mathsf{f}(3, 4) \to 5$. Similarly, $\bot\mathsf{g}(5) \to 5$ gives rise to $5 \leadsto 5$. Since $\mathcal{A}$ has no epsilon transitions, no further inferences can be made. It follows that $Q^e_\infty = \{5\}$.
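The inference rules of Figure 1 amount to a closure computation, which is what makes $Q^e_\infty$ executable. The sketch below saturates the relation $\leadsto$ naively; the data representation (pair symbols `(f, g)`, rules as triples, epsilon transitions as pairs) is an assumption of this example and the code is not the verified implementation.

```python
BOT = "⊥"


def q_inf_executable(rules, eps):
    """Compute Q^e_infinity = { q | p ~> p and p ~> q for some state p }."""
    # first inference rule: ⊥f(p_1, ..., p_n) -> p yields p_i ~> p for every i
    rel = {(p, q) for (f, _g), ps, q in rules if f == BOT for p in ps}
    changed = True
    while changed:
        changed = False
        new = set()
        for p, q in rel:
            # second inference rule: p ~> q and epsilon transition q -> r yield p ~> r
            new.update((p, r) for a, r in eps if a == q)
            # third inference rule: transitivity
            new.update((p, r) for a, r in rel if a == q)
        if not new <= rel:
            rel |= new
            changed = True
    loops = {p for p, q in rel if p == q}
    return {q for p, q in rel if p in loops}
```

On the two rules used in Example 4 the function returns `{5}`. On the automaton of Example 3 it returns `{"q1"}`, which matches the observation there that $q_2 \notin Q_\infty$.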

We call a term in $\mathcal{T}(\{\bot\} \times \mathcal{F})$ *right-only*. A term in $\mathcal{T}((\{\bot\} \times \mathcal{F}) \cup \{\square\})$ with exactly one occurrence of the hole $\square$ is a right-only context.

**Definition 4.** *We denote the composition of* <sup>→</sup><sup>Δ</sup>¬ *and* →<sup>∗</sup> Δ*by* Δ*.*

The proof of the next lemma is straightforward. Note that the relations →<sup>∗</sup> Δ and <sup>∗</sup> <sup>Δ</sup> do not coincide on *mixed* terms, involving function symbols and states.

**Lemma 5.** *Let* C *be a ground context. We have* C[p] <sup>→</sup><sup>∗</sup> <sup>Δ</sup> <sup>q</sup> *if and only if* <sup>p</sup> <sup>→</sup><sup>∗</sup> Δ p *and* C[p ] <sup>∗</sup> <sup>Δ</sup> <sup>q</sup> *for some state* <sup>p</sup> *.* -

**Lemma 6.** $Q_\infty \subseteq Q^e_\infty$

*Proof.* We start by proving the following claim:

if C[p] <sup>∗</sup> <sup>Δ</sup> <sup>q</sup> and <sup>C</sup> is a non-empty right-only context then <sup>p</sup> <sup>q</sup> (∗)

We use induction on the structure of C. If C <sup>=</sup> there is nothing to show. Suppose C <sup>=</sup> <sup>⊥</sup>f(t<sup>1</sup>,...,C ,...,t<sup>n</sup>) where C is the i-th subterm of C. The sequence C[p] <sup>∗</sup> <sup>Δ</sup> <sup>q</sup> can be rearranged as <sup>C</sup>[p] = <sup>⊥</sup>f(t<sup>1</sup>,...,C [p],...,t<sup>n</sup>) <sup>∗</sup> Δ <sup>⊥</sup>f(q<sup>1</sup>,...,q<sup>n</sup>) <sup>→</sup><sup>Δ</sup> <sup>q</sup> <sup>→</sup><sup>∗</sup> <sup>Δ</sup> <sup>q</sup>. We obtain <sup>q</sup><sup>i</sup> <sup>q</sup> and subsequently <sup>q</sup><sup>i</sup> <sup>q</sup> by using the inference rules in Figure 1. If <sup>C</sup> <sup>=</sup> then <sup>p</sup> <sup>=</sup> <sup>q</sup><sup>i</sup> and if <sup>C</sup> <sup>=</sup> then the induction hypothesis yields <sup>p</sup> <sup>q</sup><sup>i</sup> and thus <sup>p</sup> <sup>q</sup> by transitivity. This concludes the proof of (∗). -

Assume $q \in Q_\infty$, so there exist infinitely many terms $t$ such that $\langle \bot, t \rangle \to^*_\Delta q$. Since the signature is finite, there are only finitely many terms of any given height, so among these terms there are terms of arbitrary height. Thus we can fix a term $t$ such that the height of $t$ is greater than the number of states in $Q$. Write $t = f(t_1, \dots, t_n)$. Since the height of $t$ is greater than the number of states in $Q$, there exist a subterm $s$ of $t$, a state $p$, and contexts $C_1$ and $C_2 \neq \square$ such that


4. C<sup>1</sup>[p] <sup>→</sup><sup>∗</sup> Δ q.

From Lemma <sup>5</sup> we obtain a state q such that <sup>p</sup> <sup>→</sup><sup>∗</sup> <sup>Δ</sup> <sup>q</sup> and <sup>C</sup><sup>2</sup>[q ] <sup>∗</sup> <sup>Δ</sup> <sup>p</sup>. Hence q p by (∗). We obtain q q from q p in connection with the inference rule for epsilon transitions. We perform a case analysis of the context C<sup>1</sup>.


For the following lemma, we need the fact that A can be assumed to be trim, so every state is productive and reachable. We may do so because Theorem 1 talks about regular relations, and any automaton that accepts the same language as A will witness the fact that the given relation ◦ is regular.

**Lemma 7.** $Q^e_\infty \subseteq Q_\infty$*, provided that* $\mathcal{A}$ *is trim.*

*Proof.* In connection with the fact that $\mathcal{A}$ accepts $\circ \subseteq \mathcal{T}(\mathcal{F}) \times \mathcal{T}(\mathcal{F})$, trimness of $\mathcal{A}$ entails that any run $t \to^*_\Delta q$ is embedded into an accepting run $C[t] \to^*_\Delta C[q] \to^*_\Delta q_f \in Q_f$. So $C[t] = \langle u, v \rangle$ for some $(u, v) \in \circ$, and hence $t$ must be a well-formed term. Moreover, if $\mathsf{root}(t) = \bot f$ for some $f \in \mathcal{F}$ then $t = \langle \bot, u \rangle$ for some term $u \in \mathcal{T}(\mathcal{F})$. We now show the converse of claim $(\ast)$ in the proof of Lemma 6 for the relation $\to^*_\Delta$:

$$\text{if } p \leadsto q \text{ then } C[p] \to^*_\Delta q \text{ for some ground right-only context } C \neq \square \qquad (\ast\ast)$$

We prove the claim by induction on the derivation of $p \leadsto q$. First suppose $p \leadsto q$ is derived from the transition rule $\bot f(p_1, \dots, p_i, \dots, p_n) \to q$ in $\Delta$ with $p_i = p$. Because all states are reachable by well-formed terms, there exist terms $t_1, \dots, t_n \in \mathcal{T}(\mathcal{F})$ such that $\langle \bot, t_i \rangle \to^*_\Delta p_i$ for all $1 \leqslant i \leqslant n$. Let $C_1 = \bot f(\langle \bot, t_1 \rangle, \dots, \square, \dots, \langle \bot, t_n \rangle)$ where the hole is the $i$-th argument. We have $C_1[p] \to^*_\Delta \bot f(p_1, \dots, p_i, \dots, p_n) \to_\Delta q$. Next suppose $p \leadsto q$ is derived from $p \leadsto q'$ and $q' \to_\Delta q$. The induction hypothesis yields a ground right-only context $C \neq \square$ such that $C[p] \to^*_\Delta q'$. Hence also $C[p] \to^*_\Delta q$. Finally, suppose $p \leadsto q$ is derived from $p \leadsto r$ and $r \leadsto q$. The induction hypothesis yields non-empty ground right-only contexts $C_1$ and $C_2$ such that $C_1[p] \to^*_\Delta r$ and $C_2[r] \to^*_\Delta q$. Hence $C[p] \to^*_\Delta q$ for the context $C = C_2[C_1]$. This concludes the proof of $(\ast\ast)$.

Now let $q \in Q^e_\infty$. So there exists a state $p$ such that $p \leadsto p$ and $p \leadsto q$. Using $(\ast\ast)$, we obtain non-empty ground right-only contexts $C_1$ and $C_2$ such that $C_1[p] \to^*_\Delta p$ and $C_2[p] \to^*_\Delta q$. Since all states are reachable, there exists a ground term $t \in \mathcal{T}(\mathcal{F}^{(2)})$ such that $t \to^*_\Delta p$. Hence $C_2[t] \to^*_\Delta q$ and, by the observation made at the beginning of the proof, $C_2[t]$ is a well-formed term. Since $C_2$ is right-only, it follows that $t = \langle \bot, u \rangle$ for some term $u \in \mathcal{T}(\mathcal{F})$. Now consider the infinitely many terms $t_n = C_2[C_1^n[t]]$ for $n \geqslant 0$. We have $t_n \to^*_\Delta q$ and $t_n$ is right-only by construction. Hence $q \in Q_\infty$. $\square$

**Corollary 1.** $Q^e_\infty = Q_\infty$*, provided that* $\mathcal{A}$ *is trim.*

## **5 Normal Form Predicate**

The normal form predicate NF can be defined in the first-order theory of rewriting as

NF(t) ⇐⇒ ¬ ∃ u (t <sup>→</sup> u)

and this gives rise to the following procedure:


Since projection may transform a deterministic tree automaton into a nondeterministic one, this is inefficient. In this section we provide a direct construction of a tree automaton that accepts the set of ground normal forms of a left-linear TRS, which goes back to Comon [5], and present a formalized correctness proof. Throughout this section R is assumed to be left-linear.

We start with defining some preliminary concepts.

**Definition 5.** *Given a signature $\mathcal{F}$, we write $\mathcal{F}\_\perp$ for the extension of $\mathcal{F}$ with a fresh constant symbol $\bot$. Given $t \in \mathcal{T}(\mathcal{F}, \mathcal{V})$, $t^\perp$ denotes the result of replacing all variables in $t$ by $\bot$:*

$$x^\perp = \bot \qquad \qquad f(t\_1, \ldots, t\_n)^\perp = f(t\_1^\perp, \ldots, t\_n^\perp)$$

*We define the partial order $\leqslant$ on $\mathcal{T}(\mathcal{F}\_\perp)$ as the least congruence that satisfies $\bot \leqslant t$ for all terms $t \in \mathcal{T}(\mathcal{F}\_\perp)$:*

$$\overline{\bot \leqslant t} \qquad\qquad\qquad \frac{t\_1 \leqslant u\_1 \quad \cdots \quad t\_n \leqslant u\_n}{f(t\_1, \dots, t\_n) \leqslant f(u\_1, \dots, u\_n)} \tag{8}$$

*The partial map $\uparrow\colon \mathcal{T}(\mathcal{F}\_\perp) \times \mathcal{T}(\mathcal{F}\_\perp) \to \mathcal{T}(\mathcal{F}\_\perp)$ is defined as follows:*

$$\bot \uparrow t = t \uparrow \bot = t \qquad \qquad f(t\_1, \ldots, t\_n) \uparrow f(u\_1, \ldots, u\_n) = f(t\_1 \uparrow u\_1, \ldots, t\_n \uparrow u\_n)$$

It is not difficult to show that $t \uparrow u$ is the least upper bound of comparable terms $t$ and $u$.
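Definition 5 translates almost literally into a functional program. The following OCaml sketch is an illustration only and is not taken from the formalization; the type `term` and the names `bot_of`, `leq` and `merge` are hypothetical, and `merge` returns `None` where $\uparrow$ is undefined.

```ocaml
(* Terms over the signature extended with the fresh constant Bot.
   Var is only needed for left-hand sides of rewrite rules. *)
type term =
  | Bot
  | Var of string
  | Fn of string * term list

(* bot_of t replaces every variable of t by Bot. *)
let rec bot_of = function
  | Bot | Var _ -> Bot
  | Fn (f, ts) -> Fn (f, List.map bot_of ts)

(* leq s t decides s <= t for variable-free terms: Bot is below
   everything and the order is closed under function symbols. *)
let rec leq s t =
  match s, t with
  | Bot, _ -> true
  | Fn (f, ss), Fn (g, ts) ->
      f = g && List.length ss = List.length ts && List.for_all2 leq ss ts
  | _ -> false

(* merge s t is Some u if the partial map is defined with result u,
   and None otherwise. *)
let rec merge s t =
  match s, t with
  | Bot, u | u, Bot -> Some u
  | Fn (f, ss), Fn (g, ts) when f = g && List.length ss = List.length ts ->
      let args = List.map2 merge ss ts in
      if List.for_all Option.is_some args
      then Some (Fn (f, List.map Option.get args))
      else None
  | _ -> None

(* Small helpers to build terms over the symbols a, g, h, f. *)
let a = Fn ("a", [])
let g t = Fn ("g", [t])
let h t = Fn ("h", [t])
let f t1 t2 t3 = Fn ("f", [t1; t2; t3])

let s1 = bot_of (f (g a) (Var "x") (Var "y"))  (* f(g(a), Bot, Bot) *)
let s2 = f Bot (h Bot) Bot
let joined = merge s1 s2   (* defined: both arguments have a common upper bound *)
let clash = merge a (g a)  (* undefined: the root symbols differ *)
```

Representing the partial map by an option type keeps the undefined case explicit, so that callers have to decide how to treat incompatible terms.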

**Definition 6.** *Let $\mathcal{R}$ be a TRS over a signature $\mathcal{F}$. We write $T^\perp$ for the set $\{t^\perp \mid t \text{ is a proper subterm of } \ell \text{ for some } \ell \to r \in \mathcal{R}\} \cup \{\bot\}$. The set $T^\uparrow$ is obtained by closing $T^\perp$ under $\uparrow$.*

*Example 5.* Consider the TRS $\mathcal{R}$ consisting of the following rules:

$$\mathsf{h}(\mathsf{f}(\mathsf{g}(\mathsf{a}),x,y)) \to \mathsf{g}(\mathsf{a}) \qquad \mathsf{g}(\mathsf{f}(x,\mathsf{h}(x),y)) \to x \qquad \mathsf{h}(\mathsf{f}(x,y,\mathsf{h}(\mathsf{a}))) \to \mathsf{h}(x)$$

We start by collecting the subterms of the left-hand sides:

$$T^{\perp} = \{ \perp, \mathfrak{a}, \mathfrak{g}(\mathfrak{a}), \mathfrak{h}(\perp), \mathfrak{h}(\mathfrak{a}), \mathfrak{f}(\mathfrak{g}(\mathfrak{a}), \perp, \perp), \mathfrak{f}(\perp, \mathfrak{h}(\perp), \perp), \mathfrak{f}(\perp, \perp, \mathfrak{h}(\mathfrak{a})) \}$$

Closing $T^\perp$ under $\uparrow$ adds the following terms:

$$\begin{aligned} \mathsf{f}(\mathsf{g}(\mathsf{a}),\bot,\bot) &\uparrow \mathsf{f}(\bot,\mathsf{h}(\bot),\bot) = \mathsf{f}(\mathsf{g}(\mathsf{a}),\mathsf{h}(\bot),\bot) \\ \mathsf{f}(\bot,\bot,\mathsf{h}(\mathsf{a})) &\uparrow \mathsf{f}(\bot,\mathsf{h}(\bot),\bot) = \mathsf{f}(\bot,\mathsf{h}(\bot),\mathsf{h}(\mathsf{a})) \\ \mathsf{f}(\mathsf{g}(\mathsf{a}),\mathsf{h}(\bot),\bot) &\uparrow \mathsf{f}(\bot,\mathsf{h}(\bot),\mathsf{h}(\mathsf{a})) = \mathsf{f}(\mathsf{g}(\mathsf{a}),\mathsf{h}(\bot),\mathsf{h}(\mathsf{a})) \end{aligned}$$

**Lemma 8.** *The set $T^\uparrow$ is finite.*

*Proof.* If $t \uparrow u$ is defined then $\mathcal{P}\mathsf{os}(t \uparrow u) = \mathcal{P}\mathsf{os}(t) \cup \mathcal{P}\mathsf{os}(u)$. It follows that the positions of terms in $T^\uparrow \setminus T^\perp$ are positions of terms in $T^\perp$. Since $T^\perp$ is finite, there are only finitely many such positions. Hence the finiteness of $T^\uparrow$ follows from the finiteness of $\mathcal{F}$.

Although the above proof is simple enough, the proof we formalized is a different one, based on a concrete algorithm to compute $T^\uparrow$. The algorithm presented below is in fact an instance of a general saturation procedure, which is of independent interest.

**Definition 7.** *Let $f\colon U \times U \to U$ be a (possibly partial) function and let $S$ be a finite subset of $U$. The* closure *$C\_f(S)$ is the least extension of $S$ with the property that $f(a, b) \in C\_f(S)$ whenever $a, b \in C\_f(S)$ and $f(a, b)$ is defined.*

The following lemma provides a sufficient condition for closures to exist. The proof gives a concrete algorithm to compute the closure.

**Lemma 9.** *If $f$ is a total, associative, commutative, and idempotent function then $C\_f(S)$ exists and is finite.*

*Proof.* A straightforward induction proof reveals that for every $a \in C\_f(S)$ there exist elements $a\_1, \ldots, a\_n \in S$ such that $a = f(a\_1, f(a\_2, \ldots f(a\_{n-1}, a\_n) \ldots))$. Select an arbitrary element $b \in S$. If $b$ is among $a\_1, \ldots, a\_n$ then, using the properties of $f$, we obtain either $a = b$ or $a \in \{f(b, c) \mid c \in C\_f(S \setminus \{b\})\}$. If $b$ is not among $a\_1, \ldots, a\_n$ then $a \in C\_f(S \setminus \{b\})$. Hence

$$C\_f(S) = C\_f(S \setminus \{b\}) \cup \{b\} \cup \{f(b,c) \mid c \in C\_f(S \setminus \{b\})\}$$

for every $b \in S$. Since $S$ is finite, this gives rise to an iterative algorithm to compute $C\_f(S)$, which is given in Listing 5. In each iteration only finitely many elements are added. Hence $C\_f(S)$ is finite.

```
saturate(S):
   I ← ∅
   for all x ∈ S do
      I ← {x} ∪ I ∪ {f(x, y) | y ∈ I}
   return I
```
**Listing 1.** Iterative closure algorithm.
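The iteration in the listing above can be written as a fold. The OCaml sketch below is an illustration and not the formalized algorithm; it assumes that `f` is total, associative, commutative and idempotent as required by Lemma 9, represents finite sets as duplicate-free lists, and reuses the name `saturate` from the listing.

```ocaml
(* Iterative closure of a finite list under a binary function f.
   Assumption: f is total, associative, commutative and idempotent. *)
let saturate (f : 'a -> 'a -> 'a) (s : 'a list) : 'a list =
  (* insert x unless it is already present *)
  let add acc x = if List.mem x acc then acc else x :: acc in
  List.fold_left
    (fun i x ->
       (* one iteration: I becomes {x} + I + { f x y | y in I } *)
       let combined = List.map (f x) i in
       List.fold_left add i (x :: combined))
    [] s

(* Bitwise or and max are both associative, commutative and idempotent. *)
let or_closure = List.sort compare (saturate ( lor ) [1; 2; 4])
let max_closure = List.sort compare (saturate max [3; 1; 2])
```

Each element of the input is processed exactly once, so the number of calls to `f` is bounded by the size of the input times the size of the result.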

Since our function ↑ is partial, we need to lift it to a total function that preserves associativity and commutativity. In our abstract setting this entails finding a binary predicate P on U such that f(a, b) is defined if P(a, b) holds. In addition, the following properties need to be fulfilled:


For the details we refer to the formalization.

**Definition 8.** *The tree automaton $\mathcal{A}\_{\mathsf{NF}(\mathcal{R})} = (\mathcal{F}, Q, Q\_f, \Delta)$ is defined as follows: $Q = Q\_f = T^\uparrow$ and $\Delta$ consists of all transition rules*

$$f(p\_1, \ldots, p\_n) \to q \tag{8}$$

*such that $f(p\_1, \ldots, p\_n)$ is no redex and $q$ is the maximal element of $Q$ satisfying $q \leqslant f(p\_1, \ldots, p\_n)$.*<sup>4</sup>

*Example 6.* For the TRS $\mathcal{R}$ of Example 5, the tree automaton $\mathcal{A}\_{\mathsf{NF}(\mathcal{R})}$ consists of the following transition rules:

$$\begin{array}{llll} \mathbf{a} \rightarrow 1 & \mathbf{g}(p) \rightarrow \begin{cases} 2 & \text{if } p = 1 \\ 0 & \text{if } p \notin \{1, 6, 9, 10\} \end{cases} & \mathbf{h}(p) \rightarrow \begin{cases} 4 & \text{if } p = 1 \\ 3 & \text{if } p \notin \{1, 8, 10\} \end{cases} \\\\ \mathbf{f}(p, q, r) \rightarrow \begin{cases} 5 & \text{if } p = 2, q \notin \{3, 4\} \\ 6 & \text{if } p \neq 2, q \in \{3, 4\}, r \neq 4 \\ 7 & \text{if } q \notin \{3, 4\}, r = 4 \\ 8 & \text{if } p = 2, q \in \{3, 4\}, r \neq 4 \\ 9 & \text{if } p \neq 2, q \in \{3, 4\}, r = 4 \\ 10 & \text{if } p = 2, q \in \{3, 4\}, r = 4 \\ 0 & \text{otherwise} \end{cases} \end{array}$$

Here we use the following abbreviations:

$$\begin{array}{llll} 0 = \bot & 3 = \mathsf{h}(\bot) & 6 = \mathsf{f}(\bot, \mathsf{h}(\bot), \bot) & 8 = \mathsf{f}(\mathsf{g}(\mathsf{a}), \mathsf{h}(\bot), \bot) \\ 1 = \mathsf{a} & 4 = \mathsf{h}(\mathsf{a}) & 7 = \mathsf{f}(\bot, \bot, \mathsf{h}(\mathsf{a})) & 9 = \mathsf{f}(\bot, \mathsf{h}(\bot), \mathsf{h}(\mathsf{a})) \\ 2 = \mathsf{g}(\mathsf{a}) & 5 = \mathsf{f}(\mathsf{g}(\mathsf{a}), \bot, \bot) & & 10 = \mathsf{f}(\mathsf{g}(\mathsf{a}), \mathsf{h}(\bot), \mathsf{h}(\mathsf{a})) \end{array}$$

<sup>4</sup> Since states are terms from T<sup>∞</sup> here, Definition 5 applies.


As can be seen from the above example, the tree automaton $\mathcal{A}\_{\mathsf{NF}(\mathcal{R})}$ is not completely defined. Unlike the construction in [5], we do not have an additional state that is reached by all reducible ground terms.

Before proving that $\mathcal{A}\_{\mathsf{NF}(\mathcal{R})}$ accepts the ground normal forms of $\mathcal{R}$, we first show that $\mathcal{A}\_{\mathsf{NF}(\mathcal{R})}$ is well-defined, which amounts to showing that for every $f(p\_1, \ldots, p\_n)$ with $f \in \mathcal{F}$ and $p\_1, \ldots, p\_n \in T^\uparrow$ the set of states $q$ such that $q \leqslant f(p\_1, \ldots, p\_n)$ has a maximum element with respect to the partial order $\leqslant$.

**Lemma 10.** *For every term $t \in T^\uparrow$ the set $\{s \in T^\uparrow \mid s \leqslant t\}$ has a unique maximal element.*

*Proof.* Let $S = \{s \in T^\uparrow \mid s \leqslant t\}$. Because $\bot \leqslant t$ and $\bot \in T^\uparrow$, $S \neq \varnothing$. If $s\_1, s\_2 \in S$ then $s\_1 \leqslant t$ and $s\_2 \leqslant t$ and thus $s\_1 \uparrow s\_2$ is defined and satisfies $s\_1 \uparrow s\_2 \leqslant t$. Since $T^\uparrow$ is closed under $\uparrow$, $s\_1 \uparrow s\_2 \in T^\uparrow$ and thus $s\_1 \uparrow s\_2 \in S$. Consequently, $S$ has a unique maximal element.
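The argument in the proof of Lemma 10 suggests a direct way to compute the target state of a transition rule: merge all states that lie below the given term. The OCaml sketch below illustrates this idea under stated assumptions; the names `term`, `leq`, `merge` and `max_below` are hypothetical, and the list of states is assumed to contain `Bot` and to be closed under the partial map.

```ocaml
(* Variable-free terms over the signature extended with Bot. *)
type term = Bot | Fn of string * term list

(* The partial order generated by Bot <= t. *)
let rec leq s t =
  match s, t with
  | Bot, _ -> true
  | Fn (f, ss), Fn (g, ts) ->
      f = g && List.length ss = List.length ts && List.for_all2 leq ss ts
  | Fn _, Bot -> false

(* The partial map as an option-valued function. *)
let rec merge s t =
  match s, t with
  | Bot, u | u, Bot -> Some u
  | Fn (f, ss), Fn (g, ts) when f = g && List.length ss = List.length ts ->
      let args = List.map2 merge ss ts in
      if List.for_all Option.is_some args
      then Some (Fn (f, List.map Option.get args))
      else None
  | _ -> None

(* Maximum of the states below t. The accumulator always stays below t,
   so merging it with another state below t cannot fail. *)
let max_below (states : term list) (t : term) : term =
  List.fold_left
    (fun acc s ->
       if leq s t then
         (match merge acc s with
          | Some u -> u
          | None -> acc (* not reachable when acc and s are both below t *))
       else acc)
    Bot states

let a = Fn ("a", [])
let g t = Fn ("g", [t])
let h t = Fn ("h", [t])
let f t1 t2 t3 = Fn ("f", [t1; t2; t3])

(* A small set of states that is closed under merging. *)
let states =
  [ Bot; a; g a; h Bot; h a;
    f (g a) Bot Bot; f Bot (h Bot) Bot; f (g a) (h Bot) Bot ]
```

For instance, `max_below states (f (g a) (h a) a)` merges the three states headed by `f`, all of which lie below the argument.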

The next lemma is a trivial consequence of the fact that $\mathcal{A}\_{\mathsf{NF}(\mathcal{R})}$ has no epsilon transitions.

**Lemma 11.** *The tree automaton $\mathcal{A}\_{\mathsf{NF}(\mathcal{R})}$ is deterministic.*

**Lemma 12.** *If $t \in \mathcal{T}(\mathcal{F})$ with $t \to\_\Delta^\ast q$ and $s^\perp \leqslant t^\perp$ for a proper subterm $s$ of some left-hand side of $\mathcal{R}$ then $s^\perp \leqslant q$.*

*Proof.* We use structural induction on $t$. Let $t = f(t\_1, \ldots, t\_n)$. We have $t \to\_\Delta^\ast f(q\_1, \ldots, q\_n) \to\_\Delta q$. We proceed by case analysis on $s$. If $s$ is a variable then $s^\perp = \bot$ and, as $\bot$ is minimal with respect to $\leqslant$, we obtain $s^\perp \leqslant q$. Otherwise we must have $\mathsf{root}(s) = f$ from the assumption $s^\perp \leqslant t^\perp$. So we may write $s = f(s\_1, \ldots, s\_n)$. The induction hypothesis yields $s\_i^\perp \leqslant q\_i$ for all $1 \leqslant i \leqslant n$. Hence $s^\perp = f(s\_1^\perp, \ldots, s\_n^\perp) \leqslant f(q\_1, \ldots, q\_n)$. Additionally we have $s^\perp \in Q$ by Definition 8 as $s$ is a proper subterm of a left-hand side of $\mathcal{R}$. Since $f(q\_1, \ldots, q\_n) \to q$ is a transition rule, we obtain $f(s\_1, \ldots, s\_n)^\perp \leqslant q$ from the maximality of $q$.

Using the previous result we can prove that no redex of $\mathcal{R}$ reaches a state in $\mathcal{A}\_{\mathsf{NF}(\mathcal{R})}$.

**Lemma 13.** *If $t \in \mathcal{T}(\mathcal{F})$ is a redex then $t \to\_\Delta^\ast q$ for no state $q \in T^\uparrow$.*

*Proof.* We have $\ell^\perp \leqslant t$ for some left-hand side $\ell$ of $\mathcal{R}$. For a proof by contradiction, assume $t \to\_\Delta^\ast q$. Write $t = f(t\_1, \ldots, t\_n)$. We have $t \to\_\Delta^\ast f(q\_1, \ldots, q\_n) \to\_\Delta q$ and obtain $\ell^\perp \leqslant f(q\_1, \ldots, q\_n)$ by a case analysis on $\ell$ and Lemma 12. Therefore the transition rule $f(q\_1, \ldots, q\_n) \to\_\Delta q$ cannot exist by Definition 8.

**Lemma 14.** *If $t \to\_\Delta^\ast q$ and $t \in \mathcal{T}(\mathcal{F})$ then $q \leqslant t$.*

*Proof.* We use structural induction on $t$. Let $t = f(t\_1, \ldots, t\_n)$. We have $t \to\_\Delta^\ast f(q\_1, \ldots, q\_n) \to\_\Delta^\ast q$. The induction hypothesis yields $q\_i \leqslant t\_i$ for all $1 \leqslant i \leqslant n$ and thus also $f(q\_1, \ldots, q\_n) \leqslant f(t\_1, \ldots, t\_n)$. We have $q \leqslant f(q\_1, \ldots, q\_n)$ by Definition 8 and thus $q \leqslant t$ by the transitivity of $\leqslant$.

**Lemma 15.** *If $t \in \mathsf{NF}(\mathcal{R})$ then $t \to\_\Delta^\ast q$ for some state $q \in T^\uparrow$.*

*Proof.* We use structural induction on $t$. Let $t = f(t\_1, \ldots, t\_n)$. Since $t\_1, \ldots, t\_n \in \mathsf{NF}(\mathcal{R})$ we obtain $f(t\_1, \ldots, t\_n) \to\_\Delta^\ast f(q\_1, \ldots, q\_n)$ from the induction hypothesis. Suppose $f(q\_1, \ldots, q\_n)$ is a redex, so $\ell^\perp \leqslant f(q\_1, \ldots, q\_n)$ for some left-hand side $\ell$ of $\mathcal{R}$. From Lemma 14 we obtain $q\_i \leqslant t\_i$ for all $1 \leqslant i \leqslant n$ and thus $f(q\_1, \ldots, q\_n) \leqslant f(t\_1, \ldots, t\_n)$. Hence $\ell^\perp \leqslant f(t\_1, \ldots, t\_n)$. This however contradicts the assumption that $t$ is a normal form. (Here we need left-linearity of $\mathcal{R}$.) Therefore $f(q\_1, \ldots, q\_n)$ is no redex and thus, using Lemma 10, there exists a transition $f(q\_1, \ldots, q\_n) \to q$ in $\Delta$ and thus $t \to\_\Delta^\ast q$.

**Theorem 4.** *If $\mathcal{R}$ is a left-linear TRS then $L(\mathcal{A}\_{\mathsf{NF}(\mathcal{R})}) = \mathsf{NF}(\mathcal{R})$.*

*Proof.* Let $t \in \mathcal{T}(\mathcal{F})$. If $t \in \mathsf{NF}(\mathcal{R})$ then $t \to\_\Delta^\ast q$ for some state $q \in T^\uparrow$ by Lemma 15. Since all states in $T^\uparrow$ are final, $t \in L(\mathcal{A}\_{\mathsf{NF}(\mathcal{R})})$.

Next assume $t \notin \mathsf{NF}(\mathcal{R})$. Hence $t = C[s]$ for some redex $s$. According to Lemma 13, $s$ does not reach a state in $\mathcal{A}\_{\mathsf{NF}(\mathcal{R})}$. Hence also $t$ cannot reach a state and thus $t \notin L(\mathcal{A}\_{\mathsf{NF}(\mathcal{R})})$.
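For experimenting with Theorem 4 it can be useful to have a direct test for ground normal forms against which an automaton construction may be compared. The OCaml sketch below is such a reference test and not the automaton itself; it relies on the observation used in the proofs above that, for a left-linear left-hand side $\ell$, a ground term $t$ is an instance of $\ell$ exactly when $\ell^\perp \leqslant t$. All names are hypothetical, and only the two left-linear left-hand sides headed by $\mathsf{h}$ from Example 5 are used.

```ocaml
type term = Bot | Var of string | Fn of string * term list

(* Replace all variables by Bot. *)
let rec bot_of = function
  | Bot | Var _ -> Bot
  | Fn (f, ts) -> Fn (f, List.map bot_of ts)

(* The partial order generated by Bot <= t. *)
let rec leq s t =
  match s, t with
  | Bot, _ -> true
  | Fn (f, ss), Fn (g, ts) ->
      f = g && List.length ss = List.length ts && List.for_all2 leq ss ts
  | _ -> false

(* A ground term is a redex if it lies above the Bot-instance of some
   left-hand side. Assumption: all left-hand sides are linear. *)
let is_redex lhss t = List.exists (fun l -> leq (bot_of l) t) lhss

(* A term is a normal form if neither it nor any subterm is a redex. *)
let rec is_normal_form lhss t =
  not (is_redex lhss t)
  && (match t with
      | Fn (_, ts) -> List.for_all (is_normal_form lhss) ts
      | Bot | Var _ -> true)

let a = Fn ("a", [])
let g t = Fn ("g", [t])
let h t = Fn ("h", [t])
let f t1 t2 t3 = Fn ("f", [t1; t2; t3])

let lhss =
  [ h (f (g a) (Var "x") (Var "y"));
    h (f (Var "x") (Var "y") (h a)) ]
```

A term such as `g (h (f a a (h a)))` is rejected because one of its subterms is an instance of the second left-hand side.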

## **6 Conclusion and Future Work**

In this paper we presented formalized correctness proofs of the regularity of the infinity and normal form predicates in the first-order theory of rewriting. For the former we also provided an executable version, which is important for checking certificates that will be provided in a future version of FORT. Our results are an important step towards the ultimate goal of proving the correctness of the decisions reported by FORT, but much work remains to be done. We are developing a certification language which reflects the high-level proof steps in the decision procedure for the full first-order theory of rewriting. This language will be independent of FORT. In particular, details of the intermediate tree automata computed by FORT will not be part of certificates. This keeps the certificates small and avoids having to implement a verified (and expensive) equivalence check on tree automata. We will provide executable Isabelle code for each of the constructs in the certification language, and so this involves replaying the automata constructions in Isabelle.

We conclude the paper by providing some details of the size of our formalization in Table 1.

*Acknowledgments.* We thank Bertram Felgenhauer and T. V. H. Prathamash for contributions in the early stages of this work. The comments by the reviewers helped to improve the presentation of the paper.

## **References**

1. Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press (1998). https://doi.org/10.1017/CBO9781139172752


**Table 1.** Formalization data.


Construction and Analysis of Systems. Lecture Notes in Computer Science, vol. 11429, pp. 156–166 (2019). https://doi.org/10.1007/978-3-030-17502-3_10



## **Fold/Unfold Transformations for Fixpoint Logic**

Naoki Kobayashi<sup>1</sup> , Grigory Fedyukovich<sup>2</sup> , and Aarti Gupta<sup>3</sup>

<sup>1</sup> The University of Tokyo, Tokyo, Japan, koba@is.s.u-tokyo.ac.jp

<sup>2</sup> Florida State University, Tallahassee, USA, grigory@cs.fsu.edu

<sup>3</sup> Princeton University, Princeton, USA, aartig@cs.princeton.edu

**Abstract.** Fixpoint logics have recently been drawing attention as common foundations for automated program verification. We formalize fold/unfold transformations for fixpoint logic formulas and show how they can be used to enhance a recent fixpoint-logic approach to automated program verification, including automated verification of relational and temporal properties. We have implemented the transformations in a tool and confirmed its effectiveness through experiments.

## **1 Introduction**

A wide range of program properties can be verified by reducing to satisfiability/validity in a fixpoint logic [3–6, 18, 20, 22, 23, 29, 35]. In this paper, we build on top of MuArith, a first-order logic with least/greatest fixpoint operators and integer arithmetic, recently proposed by Kobayashi et al. [22]. It offers a powerful tool to handle the full class of modal μ-calculus properties of while-programs (imperative programs with loops but without general recursion). In contrast, earlier studies on temporal program verification require different methods for each subclass of the modal μ-calculus properties, such as LTL [12,16,28], CTL [2,3,13,34], and CTL<sup>∗</sup> [11]. The recent program verifier based on MuArith [22] is effective in practice, i.e., by exploiting general-purpose solvers for Satisfiability Modulo Theories (SMT) and Constrained Horn Clauses (CHC), it can outperform tools designed specifically for CTL verification of C programs [13].

Despite these promising results, the generality of the fixpoint logic approach comes at a cost. Since fixpoint logic formulas obtained by reduction from various verification problems often involve nested fixpoint operators, it can be challenging to check the validity of these formulas automatically. To enhance the capability of fixpoint logic provers, in this paper, we propose novel fold/unfold transformations and prove their correctness. These transformations are generally used to simplify relational verification, and in particular, to reduce the number of recurrences used in the program (or a set of programs) under analysis. Originally proposed for logic programming [8, 19, 32], they have been recently adopted for determining the satisfiability of CHC [15,26] and allow discovery of *relational* invariants for a pair of loopy (or recursive) programs, as opposed to invariants within each individual program. Our transformations can be regarded as extensions of such transformations for a fixpoint logic, where quantifiers and arbitrarily nested least/greatest fixpoint operators are allowed.

We also present a procedure that seeks a way to apply the proposed fold/unfold transformations efficiently. Besides the non-determinism in the choice of which fixpoint formulas to unfold, our "fold" operation replaces a formula φ with P (where P is the predicate defined by P = φ) and requires various kinds of reasoning to convert the current goal formula to a form E[φ], where the form of E can be more complex than in the case of fold/unfold transformations for logic programming or CHC.

We have implemented the transformations and integrated them with the program verifier Mu2CHC [22] based on MuArith. We considered a number of examples of MuArith formulas which include formulas obtained from program verification problems for checking relational and temporal properties. Our new transformations allowed Mu2CHC to solve these formulas, which would not be doable otherwise.

To sum up, our contributions are: (i) a formalization of fold/unfold transformations for a fixpoint logic and proofs of their soundness, (ii) demonstration of the usefulness of the proposed transformations for verification of relational and temporal properties of programs, and (iii) a concrete procedure for automated transformation and its implementation and experiments.

The rest of this paper is structured as follows. Section 2 reviews the definition of the first-order fixpoint logic MuArith [22], and reductions from program verification problems to validity checking in MuArith. Section 3 formalizes our transformations and proves their correctness. Section 4 shows applications of our transformations to verification of relational and temporal properties of recursive programs. Section 5 reports an implementation and experimental results. Section 6 discusses related work and Section 7 concludes the paper.

## **2 First-Order Fixpoint Logic** MuArith

We review the first-order fixpoint logic MuArith [22] in this section. MuArith is a variation of Mu-Arithmetic studied by Lubarsky [25] and Bradfield [7], obtained by replacing natural numbers with integers.

### **2.1 Syntax**

The set of (propositional) formulas, ranged over by ϕ, is defined by the following grammar.

$$\begin{array}{rcl} \varphi \text{ (formulas)} & ::= & a\_1 \ge a\_2 \mid P^{(k)}(a\_1, \dots, a\_k) \mid \varphi\_1 \lor \varphi\_2 \mid \varphi\_1 \land \varphi\_2 \mid \forall x. \varphi \mid \exists x. \varphi \\ P^{(k)} \text{ ($k$-ary predicates)} & ::= & X^{(k)} \mid \lambda(x\_1, \dots, x\_k). \varphi \mid \mu X^{(k)}(x\_1, \dots, x\_k). \varphi \mid \nu X^{(k)}(x\_1, \dots, x\_k). \varphi \\ a \text{ (arithmetic expressions)} & ::= & n \mid x \mid a\_1 + a\_2 \mid a\_1 - a\_2 \end{array}$$

The metavariable $\varphi$ represents a proposition, and $P$ denotes a predicate on (a tuple of) integers. We write $\top$ for $0 \ge 0$ and $\bot$ for $0 \ge 1$. In examples, we may also use other relational symbols such as $>$ and $=$. The metavariable $x$ denotes an integer variable, and the metavariable $X^{(k)}$ denotes a $k$-ary first-order predicate variable. We write $\mathit{ar}(X^{(k)})$ for the arity of the predicate variable $X^{(k)}$, i.e., $k$; we often omit the superscript $(k)$ and just write $X$ for a predicate variable. The predicate $\mu X^{(k)}(x\_1, \ldots, x\_k).\varphi$ (resp. $\nu X^{(k)}(x\_1, \ldots, x\_k).\varphi$) denotes the least (resp. greatest) predicate $X$ such that $X(x\_1, \ldots, x\_k)$ equals $\varphi$.

*Example 1.* The predicate $\mu X(x).(x = 0 \lor X(x - 1))$ denotes the least predicate $X$ such that $X(x) \equiv x = 0 \lor X(x - 1) \equiv x = 0 \lor x - 1 = 0 \lor X(x - 2) \equiv \cdots$, i.e., $\lambda(x).x \ge 0$. In contrast, $\nu X(x).(x = 0 \lor X(x - 1))$ denotes $\lambda(x).\top$.

We write $\mathbf{FV}(\varphi)$ for the set of free (predicate and integer) variables in $\varphi$; $\forall x$, $\exists x$, $\mu X^{(k)}$, $\nu X^{(k)}$, and $\lambda x$ are binders. We sometimes write $\widetilde{x}$ for a sequence of variables $x\_1, \ldots, x\_k$. We often write $\overline{\varphi}$ and $\overline{X}$ for the De Morgan dual of a formula $\varphi$ and a predicate variable $X$, respectively. For example, $\overline{\mu X(x).x = 0 \lor X(x - 1)} = \nu \overline{X}(x).x \neq 0 \land \overline{X}(x - 1)$. Here, $\overline{X}$ is a predicate variable, so the right-hand side is α-equivalent to $\nu X(x).x \neq 0 \land X(x - 1)$. The overline for $X$ is used to indicate that it corresponds to the dual of $X$ in the original formula $\mu X(x).x = 0 \lor X(x - 1)$.

### **2.2 Semantics**

In this subsection, we define the formal semantics of formulas. Let $\mathbf{Z}$ be the set of integers, and $\mathbf{B} = \{\top\_{\mathbf{B}}, \bot\_{\mathbf{B}}\}$, with $\bot\_{\mathbf{B}} \sqsubseteq\_{\mathbf{B}} \top\_{\mathbf{B}}$. Let $\mathbf{D}\_k$ be the set $\mathbf{Z}^k \to \mathbf{B}$ of functions (where $\mathbf{Z}^k$ denotes the set of tuples consisting of $k$ integers). We define the partial order $\sqsubseteq\_k$ on $\mathbf{D}\_k$ by:

$$f \sqsubseteq\_k g \Leftrightarrow \forall n\_1, \dots, n\_k \in \mathbf{Z}. f(n\_1, \dots, n\_k) \sqsubseteq\_{\mathbf{B}} g(n\_1, \dots, n\_k).$$

Note that $(\mathbf{D}\_k, \sqsubseteq\_k)$ is a complete lattice, with $\lambda x\_1. \cdots \lambda x\_k.\bot\_{\mathbf{B}}$ and $\lambda x\_1. \cdots \lambda x\_k.\top\_{\mathbf{B}}$ as the least and greatest elements. We write $\bot\_k$ and $\top\_k$ for $\lambda x\_1. \cdots \lambda x\_k.\bot\_{\mathbf{B}}$ and $\lambda x\_1. \cdots \lambda x\_k.\top\_{\mathbf{B}}$, respectively. We also write $\sqcap^{(k)}$ (resp., $\sqcup^{(k)}$) for the greatest lower (resp., least upper) bound with respect to $\sqsubseteq\_k$. We often omit $k$ and $\mathbf{B}$ and just write $\sqsubseteq$, $\bot$, $\top$, $\sqcap$, $\sqcup$, etc. We often identify $\mathbf{B}$ and $\mathbf{D}\_0 = \mathbf{Z}^0 \to \mathbf{B}$. We write $\mathbf{D}\_k \to \mathbf{D}\_\ell$ for the set of monotonic functions from $\mathbf{D}\_k$ to $\mathbf{D}\_\ell$.

We write $\mathbf{Env}$ for the set of functions that map each integer variable to an integer, and each $k$-ary predicate variable to an element of $\mathbf{D}\_k$. For a formula $\varphi$ (resp., a predicate $P$ and an expression $a$) and an environment $\rho \in \mathbf{Env}$ such that $\mathbf{FV}(\varphi) \subseteq \mathit{dom}(\rho)$ (resp., $\mathbf{FV}(P) \subseteq \mathit{dom}(\rho)$ and $\mathbf{FV}(a) \subseteq \mathit{dom}(\rho)$), Fig. 1 defines the semantics $[\![\cdot]\!]\_\rho$ of $\varphi$ (resp., $P$ and $a$), where for a monotonic function $F \in \mathbf{D}\_k \to \mathbf{D}\_k$, $\mathbf{LFP}^{(k)}(F) = \sqcap^{(k)}\{f \in \mathbf{D}\_k \mid f \sqsupseteq\_k F(f)\}$ and $\mathbf{GFP}^{(k)}(F) = \sqcup^{(k)}\{f \in \mathbf{D}\_k \mid f \sqsubseteq\_k F(f)\}$. When $\varphi$ and $P$ are closed (i.e., do not contain free variables), we just write $[\![\varphi]\!]$ and $[\![P]\!]$ for $[\![\varphi]\!]\_\emptyset$ and $[\![P]\!]\_\emptyset$ respectively. By abuse of notation, we often write $\varphi \sqsubseteq \psi$ if $[\![\varphi]\!]\_\rho \sqsubseteq [\![\psi]\!]\_\rho$ for any (valid) environment $\rho$ such that $\mathbf{FV}(\varphi) \cup \mathbf{FV}(\psi) \subseteq \mathit{dom}(\rho)$, and $\varphi \equiv \psi$ if $[\![\varphi]\!]\_\rho = [\![\psi]\!]\_\rho$; similarly for predicates. For example, $\exists z.(x > z \land z > y) \equiv (x > y + 1) \sqsupseteq (x > y + 2)$.

$$\begin{aligned}
[\![a\_1 \ge a\_2]\!]\_\rho &= \begin{cases} \top & \text{if } [\![a\_1]\!]\_\rho \ge [\![a\_2]\!]\_\rho \\ \bot & \text{if } [\![a\_1]\!]\_\rho < [\![a\_2]\!]\_\rho \end{cases} \\
[\![P(a\_1, \ldots, a\_k)]\!]\_\rho &= [\![P]\!]\_\rho([\![a\_1]\!]\_\rho, \ldots, [\![a\_k]\!]\_\rho) \\
[\![X]\!]\_\rho &= \rho(X) \\
[\![\varphi\_1 \lor \varphi\_2]\!]\_\rho &= [\![\varphi\_1]\!]\_\rho \sqcup [\![\varphi\_2]\!]\_\rho \\
[\![\varphi\_1 \land \varphi\_2]\!]\_\rho &= [\![\varphi\_1]\!]\_\rho \sqcap [\![\varphi\_2]\!]\_\rho \\
[\![\forall x.\varphi]\!]\_\rho &= \sqcap\_{n \in \mathbf{Z}} [\![\varphi]\!]\_{\rho\{x \mapsto n\}} \\
[\![\exists x.\varphi]\!]\_\rho &= \sqcup\_{n \in \mathbf{Z}} [\![\varphi]\!]\_{\rho\{x \mapsto n\}} \\
[\![\lambda(x\_1, \ldots, x\_k).\varphi]\!]\_\rho &= \lambda(n\_1, \ldots, n\_k) \in \mathbf{Z}^k.[\![\varphi]\!]\_{\rho\{x\_1 \mapsto n\_1, \ldots, x\_k \mapsto n\_k\}} \\
[\![\mu X^{(k)}(x\_1, \ldots, x\_k).\varphi]\!]\_\rho &= \mathbf{LFP}^{(k)}(\lambda f \in \mathbf{D}\_k.\lambda(n\_1, \ldots, n\_k).[\![\varphi]\!]\_{\rho\{X \mapsto f, x\_1 \mapsto n\_1, \ldots, x\_k \mapsto n\_k\}}) \\
[\![\nu X^{(k)}(x\_1, \ldots, x\_k).\varphi]\!]\_\rho &= \mathbf{GFP}^{(k)}(\lambda f \in \mathbf{D}\_k.\lambda(n\_1, \ldots, n\_k).[\![\varphi]\!]\_{\rho\{X \mapsto f, x\_1 \mapsto n\_1, \ldots, x\_k \mapsto n\_k\}}) \\
[\![n]\!]\_\rho &= n \\
[\![x]\!]\_\rho &= \rho(x) \\
[\![a\_1 + a\_2]\!]\_\rho &= [\![a\_1]\!]\_\rho + [\![a\_2]\!]\_\rho \\
[\![a\_1 - a\_2]\!]\_\rho &= [\![a\_1]\!]\_\rho - [\![a\_2]\!]\_\rho
\end{aligned}$$

**Fig. 1.** The semantics of formulas.

*Example 2.* Recall the formula $\mu X(x).x = 0 \lor X(x - 1)$ from Example 1. We have $[\![\mu X(x).x = 0 \lor X(x - 1)]\!]\_\emptyset = \mathbf{LFP}^{(1)}(F)$, with $F = \lambda f \in \mathbf{D}\_1.\lambda n \in \mathbf{Z}.(n = 0) \sqcup f(n - 1)$. Since for any $m$, $F^m(\lambda x \in \mathbf{Z}.\bot) = \lambda n \in \mathbf{Z}.0 \le n \le m - 1$, we have $\mathbf{LFP}^{(1)}(F) = \lambda n \in \mathbf{Z}.0 \le n$ (here, $\le$ denotes the semantic relation on integers). In contrast, $[\![\nu X(x).x = 0 \lor X(x - 1)]\!]\_\emptyset = \mathbf{GFP}^{(1)}(F) = \lambda n \in \mathbf{Z}.\top$.
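The approximation sequence in Example 2 can be observed on a finite window of integers. The OCaml sketch below is only an illustration: predicates on integers are modelled as functions of type `int -> bool`, and the names `step`, `bottom`, `top`, `iterate` and `range` are hypothetical. It does not compute the least fixpoint itself, which would require infinitely many iterations.

```ocaml
(* The function F of Example 2 on predicates over integers. *)
let step (f : int -> bool) : int -> bool = fun n -> n = 0 || f (n - 1)

(* Least and greatest elements of the lattice of unary predicates. *)
let bottom : int -> bool = fun _ -> false
let top : int -> bool = fun _ -> true

(* iterate m f applies step m times to f. *)
let rec iterate m f = if m <= 0 then f else iterate (m - 1) (step f)

(* The integers from lo to hi, both included. *)
let range lo hi = List.init (hi - lo + 1) (fun i -> lo + i)

(* After m steps from bottom the predicate holds exactly for 0 <= n <= m - 1. *)
let matches_example m =
  List.for_all
    (fun n -> iterate m bottom n = (0 <= n && n <= m - 1))
    (range (-5) 10)

(* Starting from top, every iterate remains true on the whole window. *)
let top_is_stable m = List.for_all (fun n -> iterate m top n) (range (-5) 10)
```

The second check reflects that $\lambda n.\top$ is a fixpoint of $F$, which is why the greatest fixpoint in Example 2 is the constantly true predicate.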

### **2.3 Program Verification as Validity Checking of** MuArith **Formulas**

Various verification problems for first-order recursive programs can be reduced to validity of MuArith formulas. We refer the reader to [22] for a general reduction schema from temporal properties to MuArith formulas. However, as shown in this subsection, some formulas require additional handling that motivates the need for new transformations to be presented in Section 3.

Consider the following functional program (written in the syntax of OCaml) that multiplies two numbers.

    let rec mult(x, y) = if y=0 then 0 else x + mult(x,y-1)

Then, the ternary relation *Mult*(x, y, r) that expresses "mult(x, y) terminates and returns r" is expressed as the following MuArith formula:

$$\mu Mult(x, y, r). (y = 0 \land r = 0) \lor \exists s. (y \neq 0 \land r = x + s \land Mult(x, y - 1, s)).$$
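To get a feeling for the least fixpoint, one can evaluate its finite approximants. The OCaml sketch below is an illustration under stated assumptions: the hypothetical function `mult_approx k` corresponds to unfolding the body of the formula at most `k` times starting from the false predicate, and the existential quantifier is eliminated by hand using the fact that `r = x + s` forces `s = r - x`.

```ocaml
(* k-th approximant of Mult: true iff the formula can be established
   with at most k unfoldings of the fixpoint. *)
let rec mult_approx k x y r =
  k > 0
  && ((y = 0 && r = 0)
      || (y <> 0 && mult_approx (k - 1) x (y - 1) (r - x)))

(* The program from the text, used for comparison. *)
let rec mult (x, y) = if y = 0 then 0 else x + mult (x, y - 1)
```

For a call with `y >= 0` the relation is reached after `y + 1` unfoldings, whereas for negative `y` no approximant holds, which matches the fact that the program does not terminate on such inputs.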

This lets us express a partial correctness property "if P(x, y) holds and mult(x, y) terminates and returns r, then Q(x, y, r) holds" by: ∀x, y, r. P(x, y) ∧ *Mult*(x, y, r) ⇒ Q(x, y, r). It can further be rewritten to the following MuArith formula:

$$\forall x, y, r. \overline{P}(x, y) \lor \overline{Mult}(x, y, r) \lor Q(x, y, r), \tag{1}$$

where $\overline{P}$ and $\overline{Mult}$ are respectively the De Morgan duals of P and *Mult*; $\overline{Mult}$ can be expressed by:

$$\nu \overline{Mult}(x, y, r). (y \neq 0 \lor r \neq 0) \land \forall s. (y = 0 \lor r \neq x + s \lor \overline{Mult}(x, y - 1, s)).$$

The total correctness property "if P(x, y), then mult(x, y) terminates and returns r, such that Q(x, y, r)" can be expressed by: ∀x, y. P(x, y) ⇒ ∃r. *Mult*(x, y, r) ∧ Q(x, y, r), which is equivalent to the MuArith formula:

$$\forall x, y. \overline{P}(x, y) \lor \left(\exists r. Mult(x, y, r) \land Q(x, y, r)\right)$$

As a special case, the termination property "if y ≥ 0 then mult(x, y) terminates" can be expressed by:

$$\forall x, y. y < 0 \lor \exists r. Mult(x, y, r). \tag{2}$$

We can also express relational properties of programs such as the equivalence of two programs. Let us consider another implementation of multiplication:

```
let mult2(x,y) =
  let rec multacc(x,y,a) = if y=0 then a else multacc(x,y-1,x+a)
in multacc(x,y,0)
```
Then predicate *Multacc*(x, y, a, r) which represents "multacc(x, y, a) terminates and returns r" can be expressed by:

$$\mu Multacc(x, y, a, r). (y = 0 \land r = a) \lor (y \neq 0 \land Multacc(x, y - 1, x + a, r)).$$

Thus, the equivalence of mult and mult2 can be expressed by: $\forall x, y, r. Mult(x, y, r) \Leftrightarrow Multacc(x, y, 0, r)$, which can be expressed by the conjunction of the MuArith formulas:

$$\forall x, y, r.\overline{Mult}(x, y, r) \lor Multacc(x, y, 0, r) \tag{3}$$

$$\forall x, y, r.Mult(x, y, r) \lor \overline{Multacc}(x, y, 0, r) \tag{4}$$

where $\overline{Multacc}$ is the De Morgan dual of *Multacc*, defined analogously to $\overline{Mult}$.
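As a sanity check on the equivalence property, the two programs can be compared on a finite range of inputs. This is a minimal sketch: `mult` and `mult2` are reproduced from the text, while `agree_up_to` is a hypothetical helper. It only covers $y \ge 0$ (where both programs terminate) and is no substitute for proving formulas (3) and (4).

```
(* The two implementations from the text. *)
let rec mult (x, y) = if y = 0 then 0 else x + mult (x, y - 1)

let mult2 (x, y) =
  let rec multacc (x, y, a) = if y = 0 then a else multacc (x, y - 1, x + a) in
  multacc (x, y, 0)

(* Hypothetical helper: compare both programs for -b <= x <= b and 0 <= y <= b. *)
let agree_up_to b =
  let ok = ref true in
  for x = -b to b do
    for y = 0 to b do
      if mult (x, y) <> mult2 (x, y) then ok := false
    done
  done;
  !ok
```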

**Motivation.** Kobayashi et al. [22] presented a method for proving the validity of MuArith formulas. It can prove formula (1) valid: since the formula contains neither μ nor ∃, it is reducible to the satisfiability problem for CHC [4]. However, the method is not powerful enough for formulas (2) and (3), which express termination and program equivalence, respectively. It first tries to eliminate existential quantifiers and μ-formulas, so that the resulting formula can be reduced to the satisfiability of CHC. But it fails when the witness of an existential quantifier (i.e., $r$ such that $\exists r.\varphi$) is not bounded by a linear expression; e.g., in the case of (2), the witness for $\exists r$ is the non-linear expression $x \times y$. This is unfortunate, as methods specialized for proving program termination, e.g. [18], can easily prove the termination of program mult. Thus, in order to exploit the advantage of the uniform approach to program verification based on MuArith, we need to strengthen the method for proving MuArith formulas.

### **2.4 Auxiliary Definitions**

We introduce additional definitions on formulas, which will be used later in our formalization of fold/unfold-like transformations. A $(k, \ell)$-*context* (or just a context) is an expression obtained from an $\ell$-ary predicate by replacing a $k$-ary predicate variable with $[\,]$ (in other words, a context is a predicate that may contain $[\,]$ as a special predicate variable). For a context $C$ and a predicate $P$ (that does not contain free occurrences of variables bound in $C$), we write $C[P]$ for the predicate obtained by replacing $[\,]$ with $P$. For example, $C = \lambda(x, y).\exists z.[\,](x, z, y)$ is a $(3, 2)$-context, and $C[\lambda(x, y, z).(x > y \land y > z)]$ is $\lambda(x, y).\exists z.(\lambda(x, y, z).(x > y \land y > z))(x, z, y)$, which is equivalent to $\lambda(x, y).\exists z. x > z \land z > y$.
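One way to read a context operationally is as a function from predicates to predicates. The sketch below models the $(3, 2)$-context from the example this way. It assumes a bounded search as a stand-in for $\exists z$ over the integers; `exists_in`, `c`, `p`, and `simplified` are hypothetical names introduced for illustration.

```
(* Hypothetical bounded stand-in for ∃z: z ranges over [-bound, bound]. *)
let exists_in bound p =
  let rec go z = z <= bound && (p z || go (z + 1)) in
  go (-bound)

(* The (3,2)-context C = λ(x, y).∃z.[ ](x, z, y), as a map from ternary
   predicates to binary predicates. *)
let c ~bound (hole : int * int * int -> bool) : int * int -> bool =
  fun (x, y) -> exists_in bound (fun z -> hole (x, z, y))

(* The predicate λ(x, y, z). x > y ∧ y > z that is plugged into the hole. *)
let p (x, y, z) = x > y && y > z

(* The simplified form of C[P] given in the text: λ(x, y).∃z. x > z ∧ z > y. *)
let simplified ~bound (x, y) = exists_in bound (fun z -> x > z && z > y)
```

Within the search window, `c ~bound p` and `simplified ~bound` agree pointwise, and both hold exactly when some integer lies strictly between `y` and `x`.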

For a function $F \in \mathbf{D}\_k \to \mathbf{D}\_\ell$, we say that $F$ is *continuous* if it preserves least upper bounds, i.e., $F(\bigsqcup\_{f \in S} f) = \bigsqcup\_{f \in S} F(f)$ for any (possibly infinite) set $S \subseteq \mathbf{D}\_k$. Similarly, we say that $F$ is *co-continuous* if it preserves greatest lower bounds, i.e., $F(\bigsqcap\_{f \in S} f) = \bigsqcap\_{f \in S} F(f)$. For example, $\lambda f. f \land \varphi$ and $\lambda f. f \lor \varphi$ (both in $\mathbf{D}\_0 \to \mathbf{D}\_0$) are both continuous and co-continuous for any $\varphi \in \mathbf{D}\_0$. In contrast, $\lambda f.\exists x.f(x) \in \mathbf{D}\_1 \to \mathbf{D}\_0$ is continuous but not co-continuous;<sup>4</sup> $\lambda f.\forall x.f(x) \in \mathbf{D}\_1 \to \mathbf{D}\_0$ is co-continuous but not continuous. We say that a context $C$ is continuous if its semantics, i.e., $\lambda f.[\![C[X]]\!]\{X \mapsto f\}$, is; analogously for co-continuity.

The following lemma (which follows immediately from the definition) provides a syntactic condition that is sufficient for the co-continuity of a context.

**Lemma 1.** *Let $C$ be a $(k, \ell)$-context. If $C$ can be generated by the following syntax, then $C$ is co-continuous.*<sup>5</sup>

$$C ::= \left[ \right] \mid \lambda(x\_1, \ldots, x\_k) . C \mid C(a\_1, \ldots, a\_k) \mid C \land \varphi \mid \varphi \land C \mid C \lor \varphi \mid \varphi \lor C \mid \forall x. C$$

*Remark 1.* The syntax and semantics of MuArith were defined based on hierarchical fixpoint equations (HES) in [22]. The above semantics is equivalent to that of [22], modulo the standard conversions between fixpoint formulas and HES.

<sup>4</sup> In fact, let $F = \lambda f.\exists x.f(x) \in \mathbf{D}\_1 \to \mathbf{D}\_0$ and $S = \{\lambda x. x \ge n \mid n \in \mathbf{Z}\}$. Then $F(f) = \top$ for any $f \in S$, but $F(\bigsqcap\_{f \in S} f) = F(\lambda x.\bot) = \bot$.

<sup>5</sup> Here, for the sake of simplicity, we mix the syntax of contexts that yield predicates and propositions.

## **3 Fold/Unfold-Like Transformations**

In this section, we present new fold/unfold-like transformations for MuArith, to enhance the power of MuArith validity checkers. We first informally review fold/unfold transformations for logic programming and explain what kind of transformation we wish to apply to MuArith formulas in Section 3.1. We then prove theorems that justify such transformations in Sections 3.2 and 3.3.

## **3.1 Overview of Transformations for** MuArith

**Revisiting Fold/Unfold Transformations for Logic Programming.** The original concept [32] is presented in the following example, where each recurrence is represented by a CHC (i.e., an implication involving uninterpreted predicates *Even* and *Odd*).

$$\begin{array}{ll}Even(x)\Leftarrow x=0 &Even(x)\Leftarrow x>0,Even(x-2)\\ Odd(x)\Leftarrow x=1 & Odd(x)\Leftarrow x>0, Odd(x-2)\end{array}$$

We wish to prove that ⊥ ⇐ *Even*(x), *Odd*(x). Many of the existing CHC solvers, such as HoICE [9] and Z3 [24], fail to prove it as they do not handle the divisibility constraints well. After defining a new predicate *EvenOdd* as *EvenOdd*(x) ⇐ *Even*(x), *Odd*(x) and unfolding *Even*, we obtain the following new CHCs.

$$EvenOdd(x) \Leftarrow x = 0, Odd(x) \qquad EvenOdd(x) \Leftarrow x > 0, Even(x-2), Odd(x)$$

By unfolding *Odd*(x) in the first CHC, its body becomes inconsistent. By unfolding *Odd*(x) in the second CHC, we obtain the following new CHCs.

$$\begin{array}{l} EvenOdd(x) \Leftarrow x > 0, Even(x-2), x = 1 \\ EvenOdd(x) \Leftarrow x > 0, Even(x-2), Odd(x-2) \end{array}$$

By unfolding $Even(x - 2)$, the body of the first CHC becomes inconsistent. Now, the part "$Odd(x - 2), Even(x - 2)$" in the second CHC matches the definition of *EvenOdd*, so we can "fold" it and obtain the following new CHC.

$$EvenOdd(x) \Leftarrow x > 0, EvenOdd(x-2)$$

The least solution for *EvenOdd* is λx.⊥, hence we have now obtained ⊥ ⇐ *Even*(x), *Odd*(x) without synthesizing interpretations of *Even* and *Odd* over the divisibility constraints.
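The outcome of the fold step can be checked on a finite range. In the sketch below, the CHCs are read as recursive Boolean functions, which is adequate here because every recursive call is guarded by $x > 0$ and therefore terminates. The function names and the helper `disjoint_on` are hypothetical, and the test is illustrative rather than a proof.

```
(* Least solutions of the original CHCs, read as recursive functions. *)
let rec even x = x = 0 || (x > 0 && even (x - 2))
let rec odd x = x = 1 || (x > 0 && odd (x - 2))

(* The folded clause EvenOdd(x) ⇐ x > 0, EvenOdd(x − 2): it has no base
   case, so its least solution is empty. *)
let rec even_odd x = x > 0 && even_odd (x - 2)

(* Hypothetical helper: no x in [lo, hi] satisfies Even and Odd together,
   and EvenOdd is false everywhere on the range. *)
let disjoint_on lo hi =
  let ok = ref true in
  for x = lo to hi do
    if even x && odd x then ok := false;
    if even_odd x then ok := false
  done;
  !ok
```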

**Transformations for** MuArith**.** The above example can be reformulated in MuArith. Predicates *Even* and *Odd* are expressed as follows.

$$
\mu \, Even(x).x = 0 \vee (x > 0 \wedge Even(x-2))\tag{5}
$$

$$
\mu \, Odd(x).x = 1 \vee (x > 0 \wedge Odd(x-2))\tag{6}
$$

We wish to prove that $Even(x) \land Odd(x)$ is inconsistent, i.e., $\forall x.\overline{Even}(x) \lor \overline{Odd}(x)$ is valid, where $\overline{Even}$ and $\overline{Odd}$ are:

$$\nu \overline{Even}(x).x \neq 0 \land (x \le 0 \lor \overline{Even}(x-2))\tag{7}$$

$$\nu \overline{Odd}(x). x \neq 1 \land (x \le 0 \lor \overline{Odd}(x-2))\tag{8}$$

Now, let $Y(x) = \overline{Even}(x) \lor \overline{Odd}(x)$, which can be rewritten as follows.

$$\begin{aligned} Y(x) &\equiv \left( x \neq 0 \land \left( x \le 0 \lor \overline{Even}(x-2) \right) \right) \lor \left( x \neq 1 \land \left( x \le 0 \lor \overline{Odd}(x-2) \right) \right) \\ &\equiv \left( x \le 0 \lor x \ne 1 \lor \overline{Even}(x-2) \right) \land \left( x \le 0 \lor \overline{Even}(x-2) \lor \overline{Odd}(x-2) \right) \\ &\equiv x \le 0 \lor \overline{Even}(x-2) \lor \overline{Odd}(x-2) \equiv x \le 0 \lor Y(x-2) \end{aligned}$$

Based on this, we wish to replace $Y$ with $\nu Y(x). x \le 0 \lor Y(x - 2)$; then the validity of $\forall x.Y(x)$ would follow immediately. As we will see later in Section 3.3, this transformation is indeed sound.

Intuitively, the above transformation works as follows. Given a formula $C[X]$, which contains a fixpoint formula $X$ defined by the equation $X = D[X]$, introduce a new predicate $Y$ such that $Y = C[X]$. Then, unfold $X$ to $D[X]$ and obtain $Y = C[D[X]]$. Then, rewrite $C[D[X]]$ to a formula of the form $E[C[X]]$. By "folding" $C[X]$, we obtain $Y = E[Y]$, which serves as a new definition clause for $Y$. We wish to apply this kind of transformation not only to ν-only formulas like the one above, but also to formulas involving μ and quantifiers, as discussed below.

Recall formula (2) from Section 2.3. Let $X(x, y) = \exists r. Mult(x, y, r)$. Then,

$$\begin{aligned} X(x,y) &\equiv \exists r. ((y = 0 \land r = 0) \lor \exists s. (y \neq 0 \land r = x + s \land Mult(x, y - 1, s))) \\ &\equiv y = 0 \lor (y \neq 0 \land \exists s. Mult(x, y - 1, s)) \\ &\equiv y = 0 \lor (y \neq 0 \land X(x, y - 1)). \end{aligned}$$

As justified later in Section 3.2, we can then replace $X$ with $\mu X(x, y). y = 0 \lor (y \neq 0 \land X(x, y - 1))$. We are then left with the formula $\forall x, y. y < 0 \lor X(x, y)$, which can be proved valid by Mu2CHC [22], the existing MuArith validity checker.

Let us also recall a generalized version of formula (3):

$$\forall x, y, a, r.\overline{Mult}(x, y, r) \lor Multacc(x, y, a, r + a),$$

which contains μ and ν. Let $Y(x, y, a, r) = \overline{Mult}(x, y, r) \lor Multacc(x, y, a, r + a)$. Then, we have:

$$\begin{array}{c} Y(x,y,a,r) \equiv ((y \neq 0 \lor r \neq 0) \land \forall s.(y = 0 \lor r \neq x + s \lor \overline{Mult}(x, y - 1, s))) \\ \lor (y = 0 \land r + a = a) \lor (y \neq 0 \land Multacc(x, y - 1, x + a, r + a)) \\ \equiv (y = 0 \Rightarrow r \neq 0 \lor r + a = a) \\ \land (y \neq 0 \Rightarrow (\overline{Mult}(x, y - 1, r - x) \lor Multacc(x, y - 1, x + a, r + a))) \\ \equiv y \neq 0 \Rightarrow Y(x, y - 1, x + a, r - x) \end{array}$$

As justified in Section 3.3, we can replace $Y$ with $\nu Y(x, y, a, r).(y = 0 \lor Y(x, y - 1, x + a, r - x))$, giving us $\forall x, y, a, r.Y(x, y, a, r)$ immediately.

Although the above transformations are sound, the soundness of fold/unfold transformations for MuArith is delicate in general. For example, consider the formula $\exists x. x \ge y \land X(x, y)$, where:

$$X \stackrel{\triangle}{=} \nu X(x, y). x \ge y + 1 \land X(x, y + 1).$$

It is obviously false since there exists no $x$ that satisfies $x \ge y \land x \ge y + 1 \land x \ge y + 2 \land \cdots \equiv \forall n \ge 0. x \ge y + n$. Let $Y(y) = \exists x. x \ge y \land X(x, y)$. Then,

$$\begin{array}{c} Y(y) \equiv \exists x. (x \ge y \land x \ge y + 1 \land X(x, y+1)) \\ \equiv \exists x. (x \ge y + 1 \land X(x, y+1)) \equiv Y(y+1). \end{array}$$

Based on this, one may be tempted to replace $Y$ with $\nu Y(y).Y(y + 1) \equiv \lambda y.\top$, but that is obviously wrong.
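The failure can be seen on the finite approximants of $X$. In the sketch below (all names hypothetical), `approx m` stands for the $m$-fold unfolding of $X$ starting from $\top$. Each approximant admits a witness $x$ with $x \ge y$, but the witness must grow with $m$, so no single $x$ works for all of them. This is consistent with the context $\lambda y.\exists x. x \ge y \land [\,](x, y)$ not preserving the greatest lower bound of the approximants.

```
(* m-fold unfolding of νX(x, y). x ≥ y + 1 ∧ X(x, y + 1), starting from ⊤.
   It holds exactly when x ≥ y + i for every 1 ≤ i ≤ m. *)
let rec approx m (x, y) = m <= 0 || (x >= y + 1 && approx (m - 1) (x, y + 1))

(* A witness for ∃x. x ≥ y ∧ approx m (x, y): the smallest one is y + m. *)
let witness m y = y + m
```

For every `m`, `approx m (witness m y, y)` holds while `approx (m + 1) (witness m y, y)` does not, so every finite approximant of $Y$ is satisfiable even though the limit is not.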

In the next two subsections, we present theorems that justify all the transformations above except the last (invalid) one.

#### **3.2 Transformations for** *μ***-Formulas**

In this subsection, we prove a theorem that enables the replacement of a predicate of the form $C[\mu X.D[X]]$ with one of the form $\mu Y.E[Y]$, and apply it to justify the transformation for $\exists r. Mult(x, y, r)$ discussed in the previous subsection. The corresponding transformation for ν-formulas is discussed in the next subsection. The theorem is stated as follows.

**Theorem 1.** *Let $C$, $D$ and $E$ be $(k, \ell)$-, $(k, k)$-, and $(\ell, \ell)$-contexts respectively. If $C[D[X]] \sqsupseteq E[C[X]]$ holds for any $k$-ary predicate $X$, then we have:*

$$C[\mu X(x\_1, \ldots, x\_k).D[X](x\_1, \ldots, x\_k)] \sqsupseteq \mu Y(y\_1, \ldots, y\_\ell).E[Y](y\_1, \ldots, y\_\ell).$$

The theorem follows easily from the definition of the semantics of the least fixpoint operator.

*Proof.* Suppose $C[D[X]] \sqsupseteq E[C[X]]$. Then, we have

$$C[\mu X(\tilde{x}).D[X](\tilde{x})] \equiv C[D[\mu X(\tilde{x}).D[X](\tilde{x})]] \sqsupseteq E[C[\mu X(\tilde{x}).D[X](\tilde{x})]].$$

Since $\mu Y(\tilde{y}).E[Y](\tilde{y})$ is the least predicate $Y$ such that $Y \sqsupseteq E[Y]$, we have $C[\mu X(\tilde{x}).D[X](\tilde{x})] \sqsupseteq \mu Y(\tilde{y}).E[Y](\tilde{y})$ as required. □

To see how the theorem above enables fold/unfold-like transformations, suppose that we wish to prove a formula of the form $Y \equiv C[\mu X(\tilde{x}).D[X](\tilde{x})]$. It suffices to prove $C[D[\mu X(\tilde{x}).D[X](\tilde{x})]]$, obtained by unfolding $X$. If the assumption $C[D[X]] \sqsupseteq E[C[X]]$ holds, we can change the goal to $E[C[\mu X(\tilde{x}).D[X](\tilde{x})]]$. Thus, by the theorem, it suffices to prove $\mu Y(\tilde{y}).E[Y](\tilde{y})$, which is obtained by "folding" $C[\mu X(\tilde{x}).D[X](\tilde{x})]$ to $Y$. Note that the theorem guarantees only that the transformation provides an *underapproximation* of the original predicate. A stronger condition is required for equivalence; see Corollary 1 given later. Note also that finding an appropriate context $E$ may not be easy in general; we discuss how to mechanically find $E$ in Section 5.

*Example 3.* Recall again formula (2) from Section 2.3. Let us define C, D, and E by:

$$\begin{aligned} C &\stackrel{\triangle}{=} \lambda(x,y). \exists r. [](x,y,r) \\ E &\stackrel{\triangle}{=} \lambda(x,y). y = 0 \lor (y \neq 0 \land [](x,y-1)) \\ D &\stackrel{\triangle}{=} \lambda(x,y,r). (y = 0 \land r = 0) \lor \exists s. (y \neq 0 \land r = x + s \land [](x,y-1,s)). \end{aligned}$$

Then, for any ternary predicate X, we have:

$$\begin{array}{l} C[D[X]] \equiv \lambda(x,y). \exists r. ((y = 0 \land r = 0) \lor \exists s. (y \neq 0 \land r = x + s \land X(x, y - 1, s))) \\ \equiv \lambda(x,y). y = 0 \lor \exists r, s. (y \neq 0 \land r = x + s \land X(x, y - 1, s)) \\ \equiv \lambda(x,y). y = 0 \lor (y \neq 0 \land \exists s. X(x, y - 1, s)) \equiv E[C[X]]. \end{array}$$

By Theorem 1, we have $C[Mult] \sqsupseteq \mu Y(x, y). y = 0 \lor (y \neq 0 \land Y(x, y - 1))$. Thus, the goal $\forall x, y. y < 0 \lor \exists r. Mult(x, y, r)$ has been reduced to:

$$\forall x, y. y < 0 \lor (\mu Y(x, y). y = 0 \lor (y \neq 0 \land Y(x, y - 1)))(x, y),$$

which can be proved valid by Mu2CHC.
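The relationship between $\exists r. Mult(x, y, r)$ and the folded predicate can be observed on finite approximants. The sketch below is an illustration under stated assumptions: the inner $\exists s$ is eliminated by hand using $s = r - x$, the outer $\exists r$ is replaced by a bounded search, and the names `mult_approx`, `folded_approx`, `exists_r`, and `agree` are hypothetical.

```
(* m-th approximant of Mult; ∃s.(r = x + s ∧ X(x, y − 1, s)) is written
   as X(x, y − 1, r − x). *)
let rec mult_approx m (x, y, r) =
  m > 0 && ((y = 0 && r = 0) || (y <> 0 && mult_approx (m - 1) (x, y - 1, r - x)))

(* m-th approximant of μY(x, y). y = 0 ∨ (y ≠ 0 ∧ Y(x, y − 1)). *)
let rec folded_approx m (x, y) =
  m > 0 && (y = 0 || (y <> 0 && folded_approx (m - 1) (x, y - 1)))

(* Hypothetical bounded stand-in for ∃r: r ranges over [-bound, bound]. *)
let exists_r bound p =
  let rec go r = r <= bound && (p r || go (r + 1)) in
  go (-bound)

(* Do ∃r. Mult_m(x, y, r) and Y_m(x, y) coincide for |x|, |y| <= range? *)
let agree m ~range ~bound =
  let ok = ref true in
  for x = -range to range do
    for y = -range to range do
      if exists_r bound (fun r -> mult_approx m (x, y, r)) <> folded_approx m (x, y)
      then ok := false
    done
  done;
  !ok
```

The search bound must be at least `range * range` so that the witness $r = x \times y$ lies inside the window. Both approximants hold exactly for $0 \le y \le m - 1$, which matches the intuition that the folded predicate no longer needs the non-linear witness.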

#### **3.3 Fold/Unfold for** *ν***-Formulas**

We now prove a theorem that allows us to replace a predicate of the form C[νX.D[X]] with one of the form νY.E[Y ]. It is similar to Theorem 1, but requires more conditions. Recall Lemma 1, which provides a sufficient syntactic condition for the co-continuity.

**Theorem 2.** *Let $C$, $D$ and $E$ be $(k, \ell)$-, $(k, k)$-, and $(\ell, \ell)$-contexts respectively. Suppose that the following conditions hold: (i) $C[\top^{(k)}] \sqsupseteq \top^{(\ell)}$, (ii) $C[D[X]] \sqsupseteq E[C[X]]$, and (iii) $C$ is co-continuous. Then $C[\nu X(x\_1, \ldots, x\_k).D[X](x\_1, \ldots, x\_k)] \sqsupseteq \nu Y(y\_1, \ldots, y\_\ell).E[Y](y\_1, \ldots, y\_\ell)$.*

*Proof.* For $F \in \mathbf{D}\_k \to \mathbf{D}\_k$ and an ordinal $\gamma$, we define $F^\gamma(\top^{(k)})$ inductively by: $F^0(\top^{(k)}) = \top^{(k)}$, $F^{\gamma+1}(\top^{(k)}) = F(F^\gamma(\top^{(k)}))$, and $F^\gamma(\top^{(k)}) = \bigsqcap\_{\gamma' < \gamma} F^{\gamma'}(\top^{(k)})$ if $\gamma$ is a limit ordinal. By abuse of notation, we write $D^\gamma[\top^{(k)}]$ for $F^\gamma(\top^{(k)})$, where $F$ is the semantics of $D$, if $D$ is a $(k, k)$-context. Since there exists an ordinal $\gamma$ such that $\nu X.D[X] = D^\gamma[\top^{(k)}]$ and $\nu Y.E[Y] = E^\gamma[\top^{(\ell)}]$, it suffices to show that $C[D^\gamma[\top^{(k)}]] \sqsupseteq E^\gamma[\top^{(\ell)}]$ holds for any ordinal $\gamma$, by transfinite induction on $\gamma$. The base case where $\gamma = 0$ follows immediately from the first condition. If $\gamma$ is a successor ordinal $\gamma' + 1$, then

$$C[D^\gamma[\top]] \sqsupseteq E[C[D^{\gamma'}[\top]]] \sqsupseteq E[E^{\gamma'}[\top]] \equiv E^\gamma[\top].$$

Here, we have used the induction hypothesis in the second inequality. If $\gamma$ is a limit ordinal, then we have:

$$C[D^\gamma[\top]] \equiv C[\sqcap\_{\gamma'<\gamma}(D^{\gamma'}[\top])] \equiv \sqcap\_{\gamma'<\gamma} C[D^{\gamma'}[\top]] \sqsupseteq \sqcap\_{\gamma'<\gamma} E^{\gamma'}[\top] \equiv E^\gamma[\top].$$

Here we have used co-continuity in the second step. We have thus proved that $C[D^\gamma[\top^{(k)}]] \sqsupseteq E^\gamma[\top^{(\ell)}]$ holds for any ordinal $\gamma$. We therefore have $C[\nu X(\tilde{x}).D[X](\tilde{x})] \sqsupseteq \nu Y(\tilde{y}).E[Y](\tilde{y})$ as required. □

*Example 4.* Recall the formula $\forall x, y, a, r.\overline{Mult}(x, y, r) \lor Multacc(x, y, a, r + a)$ discussed in Section 3.1. Let us define $C$, $D$, $E$ by:

$$\begin{array}{l} C \stackrel{\triangle}{=} \lambda(x, y, a, r). [\,](x, y, r) \lor Multacc(x, y, a, r + a) \\ D \stackrel{\triangle}{=} \lambda(x, y, r). \left( (y \neq 0 \lor r \neq 0) \land \forall s. (y = 0 \lor r \neq x + s \lor [\,](x, y - 1, s)) \right) \\ E \stackrel{\triangle}{=} \lambda(x, y, a, r). y = 0 \lor [\,](x, y - 1, x + a, r - x) \end{array}$$

They satisfy all the three conditions of Theorem 2. In particular, for any ternary predicate X, we have

$$\begin{array}{l} C[D[X]] \equiv \lambda(x, y, a, r). ((y \neq 0 \lor r \neq 0) \land\\ \qquad \forall s. (y = 0 \lor r \neq x + s \lor X(x, y - 1, s))) \lor Multacc(x, y, a, r + a) \\ \equiv \lambda(x, y, a, r). ((y \neq 0 \lor r \neq 0) \land\\ \qquad \forall s. (y = 0 \lor r \neq x + s \lor X(x, y - 1, s))) \\ \qquad \lor (y = 0 \land r + a = a) \lor (y \neq 0 \land Multacc(x, y - 1, x + a, r + a)) \\ \equiv \lambda(x, y, a, r). y = 0 \lor X(x, y - 1, r - x) \lor Multacc(x, y - 1, x + a, r + a) \\ \equiv E[C[X]], \end{array}$$

based on the corresponding transformations shown in Section 3.1. We have thus $\forall x, y, a, r.\overline{Mult}(x, y, r) \lor Multacc(x, y, a, r + a) \Leftarrow \forall x, y, a, r.(\nu Y(x, y, a, r). y = 0 \lor Y(x, y - 1, x + a, r - x))(x, y, a, r)$, and the right-hand side can be proved valid by Mu2CHC.
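On terminating inputs, the generalized formula of Example 4 says that whenever mult(x, y) returns r, multacc(x, y, a) returns r + a. The sketch below tests this reading on a finite range; `multacc` is lifted to the top level for convenience, and `generalized_holds` is a hypothetical helper. The check is illustrative only and covers $y \ge 0$.

```
let rec mult (x, y) = if y = 0 then 0 else x + mult (x, y - 1)
let rec multacc (x, y, a) = if y = 0 then a else multacc (x, y - 1, x + a)

(* Hypothetical helper: multacc (x, y, a) = mult (x, y) + a for all
   -b <= x, a <= b and 0 <= y <= b. *)
let generalized_holds b =
  let ok = ref true in
  for x = -b to b do
    for a = -b to b do
      for y = 0 to b do
        if multacc (x, y, a) <> mult (x, y) + a then ok := false
      done
    done
  done;
  !ok
```

The accumulator argument `a` is what makes the fold step possible: unfolding once changes the accumulator from `a` to `x + a`, so a statement fixed at `a = 0` would not match its own unfolding.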

Note that Theorems 1 and 2 guarantee the soundness of the replacement of $C[\alpha X(x\_1, \ldots, x\_k).D[X](x\_1, \ldots, x\_k)]$ with $\alpha Y(y\_1, \ldots, y\_\ell).E[Y](y\_1, \ldots, y\_\ell)$ (for $\alpha \in \{\mu, \nu\}$), but not completeness: the validity of $C[\alpha X(x\_1, \ldots, x\_k).D[X](x\_1, \ldots, x\_k)]$ does not necessarily imply that of $\alpha Y(y\_1, \ldots, y\_\ell).E[Y](y\_1, \ldots, y\_\ell)$. Actually, by combining Theorem 1 and the dual version of Theorem 2, we obtain the following corollary, which guarantees completeness under a stronger condition.

**Corollary 1.** *Let $C$, $D$ and $E$ be $(k, \ell)$-, $(k, k)$-, and $(\ell, \ell)$-contexts respectively. Suppose that the following conditions hold: (i) $C[\bot^{(k)}] \sqsubseteq \bot^{(\ell)}$, (ii) $C[D[X]] \equiv E[C[X]]$, and (iii) $C$ is continuous. Then $C[\mu X(x\_1, \ldots, x\_k).D[X](x\_1, \ldots, x\_k)] \equiv \mu Y(y\_1, \ldots, y\_\ell).E[Y](y\_1, \ldots, y\_\ell)$.*

## **4 Further Examples**

In this section, we give more examples to demonstrate the utility of our transformations for relational/temporal property verification of recursive programs.

#### **4.1 Relational Reasoning on Recursive Programs**

Below we discuss an example that is beyond the reach of state-of-the-art CHC solvers (see, e.g., [33], the end of Section 5).

*Example 5.* Consider the goal $\forall x, y, z, r.(Mult(x + y, z, r) \Rightarrow \exists s, t. Mult(x, z, s) \land Mult(y, z, t) \land r = s + t)$, which is equivalent to:

$$\forall x, y, z, r. (\overline{Mult}(x+y, z, r) \lor \exists s, t. (Mult(x, z, s) \land Mult(y, z, t) \land r = s + t)),$$

where *Mult* and $\overline{Mult}$ are as given in Section 2.3. The following contexts $C$, $D$, and $E$ satisfy the three conditions of Theorem 2.

$$\begin{array}{l} C \stackrel{\triangle}{=} \lambda(x, y, z, r). [\,](x + y, z, r) \lor \exists s, t. (Mult(x, z, s) \land Mult(y, z, t) \land r = s + t) \\ D \stackrel{\triangle}{=} \lambda(x, z, r). (z \neq 0 \lor r \neq 0) \land (z = 0 \lor [\,](x, z - 1, r - x)) \\ E \stackrel{\triangle}{=} \lambda(x, y, z, r). (z = 0 \lor (z \neq 0 \land [\,](x, y, z - 1, r - x - y))). \end{array}$$

By Theorem 2, we have $C[\overline{Mult}] \sqsupseteq \nu Y(x, y, z, r).E[Y](x, y, z, r) \equiv \lambda(x, y, z, r).\top$. We have thus proved that $\forall x, y, z, r.C[\overline{Mult}](x, y, z, r)$ (i.e., $\forall x, y, z, r.(Mult(x + y, z, r) \Rightarrow \exists s, t. Mult(x, z, s) \land Mult(y, z, t) \land r = s + t)$) is valid.

## **4.2 Proving Temporal Properties**

Here we give an example of proving a liveness property of a recursive program by using our transformation. The example is a variation of the example discussed in [22], but it cannot be handled by their method for proving MuArith formulas.

*Example 6.* Consider the following OCaml program:

```
let rec sum n = if n=0 then 0 else n+sum(n-1)
let rec loop x = if x=0 then () else loop (x-1)
let rec repeat n = let x = sum n in loop x; repeat(n+1)
let main() = repeat 0
```
Suppose that we wish to prove that the function repeat is called infinitely often. The reduction from linear-time temporal property verification to MuArith yields the problem of determining the validity of *Repeat*(0), where:

$$\begin{array}{l} Repeat \stackrel{\triangle}{=} \nu Repeat(n). (\exists x. Sum(n, x)) \land (\forall x. \overline{Sum}(n, x) \lor Loop(x)) \land Repeat(n+1) \\\ Sum \stackrel{\triangle}{=} \mu Sum(n, x). (n = 0 \land x = 0) \lor (n \neq 0 \land \exists r. Sum(n-1, r) \land x = n + r) \\\ Loop \stackrel{\triangle}{=} \mu Loop(x). x = 0 \lor (x \neq 0 \land Loop(x-1)). \end{array}$$

Here, $\overline{Sum}$ is the De Morgan dual of *Sum*. The validity of this formula cannot be proved by Mu2CHC due to the existential quantifier. Note that Mu2CHC replaces each existential quantifier $\exists x.\varphi$ with a bounded quantifier $\exists x \le a.\varphi$, and $a$ must be a linear expression. In the example above, $x$ is not linearly bounded by $n$. To remove the existential quantifier, let

$$\begin{aligned} &C \stackrel{\triangle}{=} \lambda n. \exists x. [\,](n, x) \\ &E \stackrel{\triangle}{=} \lambda n. n = 0 \lor (n \neq 0 \land [\,](n - 1)) \\ &D \stackrel{\triangle}{=} \lambda (n, x). (n = 0 \land x = 0) \lor (n \neq 0 \land \exists r. [\,](n - 1, r) \land x = n + r). \end{aligned}$$

Since $C[D[X]] \sqsupseteq E[C[X]]$ holds, we can apply Theorem 1 to underapproximate $\exists x. Sum(n, x)$ by $\mu X(n). n = 0 \lor (n \neq 0 \land X(n - 1))$. Therefore, the goal has been reduced to $Repeat'(0)$ where

$$\begin{array}{c} Repeat' \stackrel{\triangle}{=} \nu Repeat'(n). X(n) \land (\forall x. \overline{Sum}(n, x) \lor Loop(x)) \land Repeat'(n+1) \\\ X \stackrel{\triangle}{=} \mu X(n). n = 0 \lor (n \neq 0 \land X(n-1)), \end{array}$$

which can be proved valid by Mu2CHC automatically.
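To see concretely why the witness for $\exists x. Sum(n, x)$ is not linearly bounded, the following sketch searches for the first $n$ at which `sum n` exceeds a given linear expression. `sum` is the function from Example 6; `first_exceeding` and its labeled parameters `a`, `b`, and `limit` are hypothetical and introduced only for illustration.

```
let rec sum n = if n = 0 then 0 else n + sum (n - 1)

(* Hypothetical helper: the least n in [0, limit] with sum n > a * n + b,
   or None if there is no such n in that range. *)
let first_exceeding ~a ~b ~limit =
  let rec go n =
    if n > limit then None
    else if sum n > a * n + b then Some n
    else go (n + 1)
  in
  go 0
```

Since `sum n` equals $n(n+1)/2$ for $n \ge 0$, any candidate bound of the form $a \cdot n + b$ is eventually exceeded, which is why bounding $\exists x$ by a linear expression cannot work here.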

## **5 Algorithm and Evaluation**

In this section, we first present an algorithm for our transformation and then outline its implementation and report on experimental results.

#### **5.1 Algorithm**

Theorems 1 and 2 given in Section 3 state sufficient conditions for our fold/unfold transformation to be sound. In this subsection, we discuss how to systematically apply the theorems and how to find a context E.

To make it easy to find $E$, we restrict input formulas of our transformations to those of the form $X(f(x, y)) \lor Y(g(x, y))$, $X(f(x, y)) \land Y(g(x, y))$, and $\exists y.X(f(x, y))$, where $X$ and $Y$ are predicates defined by fixpoint operators, and $f(x, y)$ and $g(x, y)$ denote (possibly sequences of) terms that may contain free variables $x$ and $y$. For the sake of simplicity, we assume here that the definitions for $X$ and $Y$ are independent; $X$ cannot be obtained by unfolding $Y$, and vice versa. Transformations for more complex formulas like the one in Example 5 can be achieved by repeatedly applying the transformations for smaller contexts.

The transformation algorithm for disjunctive formulas is shown in Algorithm 1. It takes as input a formula $\Phi = X(f(x, y)) \lor Y(g(x, y))$ and outputs an underapproximation $\Phi'$ of $\Phi$. It can take $[\,](f(x, y)) \lor Y(g(x, y))$ or $X(f(x, y)) \lor [\,](g(x, y))$ as the context $C$ and apply Theorem 2 if $X$ or $Y$ is defined by ν, and Theorem 1 otherwise (line 1). On line 2, the algorithm unfolds $X$ and $Y$<sup>6</sup> and then normalizes the resulting formula to a conjunctive normal form (CNF), where quantified formulas are treated as atomic. It then applies the "fold" transformation to each conjunct $\psi\_i$. To this end, for each $\psi\_i$ that contains $X(s\_1) \lor Y(s\_2)$, the algorithm finds terms $t\_1$ and $t\_2$ such that $X(s\_1) \lor Y(s\_2) \equiv X(f(t\_1, t\_2)) \lor Y(g(t\_1, t\_2))$; this is achieved by solving the unification constraints $s\_1 \equiv f(x', y')$ and $s\_2 \equiv g(x', y')$ modulo arithmetic theories, where $x'$ and $y'$ are treated as variables but $x$ and $y$ are treated as constants. Finally, the algorithm replaces $X(s\_1) \lor Y(s\_2)$ with $Z(t\_1, t\_2)$, where $Z(x, y)$ is a new predicate that corresponds to $X(f(x, y)) \lor Y(g(x, y))$.

We omit the transformation algorithm for conjunctive formulas since it is similar to the case above, except that the new predicate $Z$ is bound by μ (note that condition (i) of Theorem 2 may not be satisfied), and that it converts the unfolded formula to a disjunctive normal form (DNF), instead of CNF.

The algorithm for existential formulas is shown in Algorithm 2. It unfolds $X$, normalizes existential quantifiers, and obtains a DNF. In the normalization of existential quantifiers, it moves existential quantifiers inwards (by using, e.g., the law $\exists x.(\psi\_1 \lor \psi\_2) \equiv (\exists x.\psi\_1) \lor (\exists x.\psi\_2)$) and eliminates them as much as possible (by using, e.g., equality-based quantifier elimination). For each disjunct $\psi\_i$ of the form $(\exists z.X(s)) \land \psi'\_i$, it finds $t\_x$ and $t\_z$ such that $f(t\_x, y) \equiv [t\_z/z]s$ (again, by performing unification modulo arithmetic theories), and replaces the disjunct with $Z(t\_x) \land \psi'\_i$. Here, $Z(t\_x)$ corresponds to $\exists y.X(f(t\_x, y))$, and $t\_z$ serves as a witness for $X(f(t\_x, y)) \Rightarrow \exists z.X(s)$.

**Algorithm 2:** Fold/unfold for ∃

**Input:** Formula $\Phi$ of the form $\exists y.X(f(x, y))$, where $X$ is a predicate defined by μ or ν.

**Output:** A formula $\Phi'$ such that $\Phi \sqsupseteq \Phi'$.

1. $\bigvee\_i \psi\_i \leftarrow \mathit{dnf}(\mathit{normalize}\_\exists(\mathit{unfold}(\Phi)))$;
2. **for each** $\psi\_i$ **do**
3. &nbsp;&nbsp;**if** $\psi\_i$ has the form $(\exists z.X(s)) \land \psi'\_i$, and $f(t\_x, y) \equiv [t\_z/z]s$, where $\mathbf{FV}(t\_x) \subseteq \{x\}$, $\mathbf{FV}(t\_z) \subseteq \{x, y\}$ **then**
4. &nbsp;&nbsp;&nbsp;&nbsp;$\psi\_i \leftarrow Z(t\_x) \land \psi'\_i$;
5. **return** $\mu Z(x). \bigvee\_i \psi\_i$;
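As a rough illustration of the step that moves existential quantifiers inwards, the sketch below works on a toy formula type. Everything in it is an assumption made for illustration and is not taken from MuFolder: the type `formula`, the representation of an atom as a name plus the variables it mentions, and the function `push_exists`. Equality-based quantifier elimination and unification modulo arithmetic are not modeled.

```
(* Toy formulas: an atom carries a name and the variables it mentions. *)
type formula =
  | Atom of string * string list
  | And of formula * formula
  | Or of formula * formula
  | Exists of string * formula

let rec free_vars = function
  | Atom (_, vs) -> vs
  | And (a, b) | Or (a, b) -> free_vars a @ free_vars b
  | Exists (x, a) -> List.filter (fun v -> v <> x) (free_vars a)

let occurs x a = List.mem x (free_vars a)

(* Push ∃ inwards: distribute over ∨, skip conjuncts that do not mention
   the bound variable, and drop quantifiers whose variable is unused. *)
let rec push_exists = function
  | Atom _ as a -> a
  | And (a, b) -> And (push_exists a, push_exists b)
  | Or (a, b) -> Or (push_exists a, push_exists b)
  | Exists (x, body) ->
    (match push_exists body with
     | Or (a, b) -> Or (push_exists (Exists (x, a)), push_exists (Exists (x, b)))
     | And (a, b) when not (occurs x a) -> And (a, push_exists (Exists (x, b)))
     | And (a, b) when not (occurs x b) -> And (push_exists (Exists (x, a)), b)
     | a when not (occurs x a) -> a
     | a -> Exists (x, a))
```

Applied to a formula shaped like the unfolding of $\exists r. Mult(x, y, r)$, `push_exists` leaves the quantifier only on the subformulas that mention `r`. A subsequent elimination pass, not shown here, would then remove quantifiers such as the one over `r = 0`.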

#### **5.2 Implementation and Experiments**

We have implemented the transformation in a tool called MuFolder, based on the algorithms discussed above, on top of the AdtInd theorem prover [37], using its routines for pattern-matching, normalization, and simplification. For the implication checks, MuFolder uses the Z3 SMT solver [27]. MuFolder can be tested at https://www.kb.is.s.u-tokyo.ac.jp/~koba/mu/.

<sup>6</sup> If none of the $\psi\_i$'s are changed in the loop on lines 3-5, we may backtrack and unfold $X$ and $Y$ more than once.


**Table 1.** Experiments.

We have evaluated MuFolder on several benchmarks outlined in Table 1. These benchmarks include formulas obtained from relational and temporal verification problems, some of which have been taken from the benchmark set for Unno et al.'s induction-based CHC solver [33] and modified to include both μ and ν. We have confirmed that all the benchmark problems can be solved by our approach within a few seconds. To our knowledge, except for formulas 7 and 8 (for which the method of [33] can be used) and 10 and 11 (for which Mu2CHC works), neither Mu2CHC (without our transformation) nor the existing CHC solvers can directly prove the validity of the formulas. Note that formula 12 comes from Example 6. The combination of the transformation with Mu2CHC enables fully automated verification of Example 6.

## **6 Related Work**

As already mentioned, fold/unfold transformations were originally proposed for logic programming [32], and later extended to CHC (a.k.a. constraint logic programs) [1,17]. Those transformations were originally proposed to speed up program execution, but recently, Mordvinov and Fedyukovich [26] and De Angelis et al. [15] have shown that related transformations are also useful in the context of verification based on CHC solving. Those transformations correspond to the transformation for the ν-only fragment of MuArith.<sup>7</sup> Our transformation can thus be considered an extension of fold/unfold-like transformations to MuArith, which allows alternations of least/greatest fixpoints. Sato [31] studied an extension of fold/unfold transformations for a first-order logic, where negations and quantifiers are allowed in clause bodies; thus, some mixtures of least/greatest fixpoints are allowed. The correctness of his transformation is, however, based on a three-valued logic, hence different from MuArith. The correctness of most of the transformations mentioned above is guaranteed by some syntactic conditions, while our transformation is based on semantic conditions.

<sup>7</sup> This is because, although the semantics of each predicate is interpreted as the least fixpoint, the predicates occur in negative positions in goal clauses.

Unno et al. [33] proposed a method for automatically solving CHC problems by using induction. Their method is based on a tailor-made proof system; hence it is difficult to integrate the method with other CHC or MuArith solvers (in fact, that disadvantage motivated the above-mentioned work of De Angelis et al. [15]). Their method goes slightly beyond CHC satisfiability (or the ν-only fragment of MuArith) but cannot deal with complex combinations of least/greatest fixpoints and quantifiers (like $\forall x, y, z, r.(Mult(x + y, z, r) \Rightarrow \exists s, t. Mult(x, z, s) \land Mult(y, z, t) \land r = s + t)$, discussed in Section 4).

As mentioned in Section 1, fixpoint logic-based approaches to program verification (including CHC-based ones) have been drawing attention. Kobayashi et al. [22, 23, 35] have shown that temporal property verification of (higher-order) programs can be reduced to the validity checking of (higher-order) fixpoint logic formulas. They proposed a concrete method for checking validity of first-order fixpoint formulas and implemented a validity checking tool Mu2CHC. As discussed already, our transformations can be used to improve the capability of Mu2CHC. Another thread of work on a fixpoint logic-based approach to system verification is that of Parameterized Boolean Equation Systems (PBES) [21]. Actually, MuArith may be considered an instance of PBES, where data are restricted to integers. Groote, Willemse, and others [10, 14, 21, 30, 36] studied applications of PBES to verification of infinite state systems, and devised various techniques for solving PBES. To our knowledge, however, they have not studied fold/unfold transformations for PBES.

## **7 Conclusions**

We have formalized fold/unfold-like transformations for a fixpoint logic, and shown that they are useful for verification of relational/temporal properties of recursive programs. We have implemented the transformations, and shown their effectiveness through experiments.

**Acknowledgments.** We would like to thank anonymous referees for useful comments, especially for bringing the work on PBES to our attention. This work was supported in part by the University of Tokyo-Princeton Strategic Partnership Grant, JSPS KAKENHI Grant Number JP15H05706, and NSF (USA) award FMitF 1837030.

## **References**

1. Bensaou, N., Guessarian, I.: Transforming constraint logic programs. In: STACS 94, 11th Annual Symposium on Theoretical Aspects of Computer Science, Caen, France, February 24-26, 1994, Proceedings. LNCS, vol. 775, pp. 33–46. Springer (1994). https://doi.org/10.1007/3-540-57785-8_129


27-30, 2013. Proceedings. LNCS, vol. 8052, pp. 470–484. Springer (2013). https://doi.org/10.1007/978-3-642-40184-8_33


ference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedings. LNCS, vol. 4963, pp. 337–340. Springer (2008). https://doi.org/10.1007/978-3-540-78800-3_24


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Tools and Case Studies

## **Verifying OpenJDK's** LinkedList **using KeY**

Hans-Dieter A. Hiep<sup>1(✉)</sup>, Olaf Maathuis<sup>3</sup>, Jinting Bian<sup>1</sup>, Frank S. de Boer<sup>1</sup>, Marko van Eekelen<sup>2</sup>, and Stijn de Gouw<sup>2</sup>

<sup>1</sup> CWI, Science Park 123, 1098 XG Amsterdam, The Netherlands {hdh,j.bian,frb}@cwi.nl

<sup>2</sup> Open University, P.O. Box 2960, 6401 DL Heerlen, The Netherlands {marko.vaneekelen,stijn.degouw}@ou.nl

<sup>3</sup> Achmea, P.O. Box 700, 7300 HC Apeldoorn, The Netherlands olaf.maathuis@achmea.nl

**Abstract.** As a particular case study of the formal verification of state-of-the-art, real software, we discuss the specification and verification of a corrected version of the implementation of a linked list as provided by the Java Collection framework.

**Keywords:** Java standard library · deductive verification · KeY · Java Modeling Language · case study · bug

## **1 Introduction**

Software libraries are the building blocks of millions of programs, and they run on the devices of billions of users every day. Therefore, their correctness is of the utmost importance. The importance and potential of formal software verification as a means of rigorously validating state-of-the-art, real software and improving it, is convincingly illustrated by its application to TimSort, the default sorting library in many widely used programming languages, including Java and Python, and platforms like Android (see [7,9]): a crashing implementation bug was found.

The Java implementation of TimSort belongs to the Java Collection framework which provides implementations of basic data structures and is among the most widely used libraries. Nonetheless, over the years, 877 bugs in the Collections Framework have been reported in the official OpenJDK bug tracker.

Due to the intrinsic complexity of modern software, the possibility of interventions by a human verifier is indispensable for proving correctness. This holds in particular for the Java Collection library, where programs are expected to behave correctly for inputs of arbitrary size. As a particular case study, we discuss the formal verification of a corrected version of the implementation of a linked list as specified by the class LinkedList of the Java Collection framework in Java 8. Apart from the fact that the data structure of a linked list is one of the basic structures for storing and maintaining unbounded data, this is an interesting case study because it provides further evidence that formal verification of real software can lead to major improvements and correctness guarantees.

Fig. 1: Workflow

We follow the general workflow underlying the TimSort case as depicted in Fig. 1. The workflow starts with a formalisation of the informal documentation of the Java code in the Java Modeling Language [10,16]. This formalisation goes hand in hand with the formal verification: failed verification attempts can provide information about further refinements of the specs. A failed verification attempt may also indicate an error in the code, and can as such be used for the generation of test cases to detect the error at run-time.

LinkedList is the only List implementation in the Collection Framework that allows collections of unbounded size. During verification we found out that the Java linked list implementation does not correctly take into account the Java integer overflow semantics. It is exactly for large lists (≥ 2<sup>31</sup> items) that the implementation breaks. This basic observation gave rise to a number of test cases which show that Java's LinkedList class breaks 22 methods out of a total of 25 methods of the List interface!<sup>4</sup>

On the basis of these test cases we propose in Sect. 2 a code revision of the Java linked list implementation, and formally specify and verify its correctness in Sect. 3 with respect to the Java integer overflow semantics. Section 4 discusses the main challenges posed by this case study and related work.

This case study has been carried out using the state-of-the-art KeY theorem prover [3], because it formalizes the integer overflow semantics of Java and it allows Java programs to be "loaded" directly. An archive of proof files and the KeY version used in this study is available on-line in the Zenodo repository [2].

## **2** LinkedList **in OpenJDK**

LinkedList was introduced in Java version 1.2 as part of Java's Collection Framework in 1998. The LinkedList class is part of the type hierarchy of this framework: LinkedList implements the List interface, and also supports all general Collection methods as well as the methods from the Queue and Deque interfaces. The List interface provides positional access to the elements of the list, where each element is indexed by Java's primitive int type.

The structure of the LinkedList class is shown in Listing 1. This class has three attributes: a size field, which stores the number of elements in the list, and two fields that store a reference to the first and last node. Internally, it uses the private static nested Node class to represent the items in the list. A static nested private class behaves like a top-level class, except that it is not visible outside the enclosing class (LinkedList, in this case). Nodes are doubly linked; each node is connected to the preceding (field prev) and succeeding node

<sup>4</sup> We filed a bug report to Oracle's security team. Once the report is made public by the Java maintainers, we will add the URL as metadata to our repository [2].

```
public class LinkedList<E>
      extends AbstractSequentialList<E>
      implements List<E>, Deque<E>, ... {
  transient int size = 0;
  transient Node<E> first;
  transient Node<E> last;

  private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;
    Node(Node<E> p, E i, Node<E> n) ...
  }
  ...
}

public boolean add(E e) {
  linkLast(e);
  return true;
}

void linkLast(E e) {
  final Node<E> l = last;
  final Node<E> newNode =
      new Node<>(l, e, null);
  last = newNode;
  if (l == null) first = newNode;
  else l.next = newNode;
  size++;
  modCount++;
}
```
Listing 1: The LinkedList class defines a doubly-linked list data structure.

```
public int indexOf(Object o) {
    int index = 0;
    if (o == null) {
        for (Node<E> x = first; x != null; x = x.next) {
            if (x.item == null)
                return index;
            index++;
        }
    } else {
        for (Node<E> x = first; x != null; x = x.next) {
            if (o.equals(x.item))
                return index;
            index++;
        }
    }
    return -1;
}
```
Listing 2: The indexOf method searches for an element from the first node on.

(field next). These fields contain null in case no preceding or succeeding node exists. The data itself is contained in the item field of a node.

LinkedList contains 57 methods. Due to space limitations, we now focus on three characteristic methods: see Listing 1 and Listing 2. Method add(E) calls method linkLast(E), which creates a new Node object to store the new item and adds the new node to the end of the list. Finally the new size is determined by unconditionally incrementing the value of the size field, which has type int. Method indexOf(Object) returns the position (of type int) of the first occurrence of the specified element in the list, or −1 if it's not present.

Each linked list consists of a sequence of nodes. Sequences are finite, indexing of sequences starts at zero, and we write σ[i] to mean the ith element of some sequence σ. A chain is a sequence σ of nodes of length n > 0 such that: the prev reference of the first node σ[0] is null, the next reference of the last node σ[n − 1] is null, the prev reference of node σ[i] is node σ[i − 1] for every index 0 < i < n, and the next reference of node σ[i] is node σ[i + 1] for every index 0 ≤ i < n − 1. The first and last references of a linked list are either both null to represent the empty linked list, or there is some chain σ between the first and last node, viz. σ[0] = first and σ[n − 1] = last. Figure 2 shows example instances. Also see standard literature such as Knuth's [15, Section 2.2.5].

Fig. 2: Three example linked lists: empty, with a chain of one node, and with a chain of two nodes. Items themselves are not shown.

We make a distinction between the actual size of a linked list and its cached size. In principle, the size of a linked list can be computed by walking through the chain from the first to the last node, following the next reference, and counting the number of nodes. For performance reasons, the Java implementation also maintains a cached size. The cached size is stored in the linked list instance.
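The difference between the two notions of size can be sketched in a few lines. This is only an illustration: `ChainSketch` and its nested `Node` class are hypothetical stand-ins for the private classes of the real implementation, and the walk counts in a `long` so that the counter itself cannot wrap around.

```java
// Hypothetical illustration; not the OpenJDK classes.
final class ChainSketch {
    // Stand-in for the private static Node class of LinkedList.
    static final class Node<E> {
        E item;
        Node<E> next;
        Node<E> prev;

        Node(Node<E> prev, E item, Node<E> next) {
            this.prev = prev;
            this.item = item;
            this.next = next;
        }
    }

    // Actual size: walk the chain from the first node, following next
    // references, and count the nodes in a long.
    static <E> long actualSize(Node<E> first) {
        long count = 0;
        for (Node<E> x = first; x != null; x = x.next) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Node<String> a = new Node<>(null, "a", null);
        Node<String> b = new Node<>(a, "b", null);
        a.next = b;
        int cachedSize = 2; // what an int size field would hold after two additions
        System.out.println("actual = " + actualSize(a) + ", cached = " + cachedSize);
    }
}
```

The walk takes time linear in the length of the chain, which is why caching the size in a field is attractive; the two values agree only as long as the cached counter is maintained correctly.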

Two basic properties of doubly-linked lists are acyclicity and unique first and last nodes. Acyclicity is the statement that for any indices 0 ≤ i < j < n the nodes σ[i] and σ[j] are different. First and last nodes are unique: for any index i such that σ[i] is a node, the next of σ[i] is null if and only if i = n − 1, and prev of σ[i] is null if and only if i = 0. Each item is stored in a separate node, and the same item may be stored in different nodes when duplicate items are present in the list.

#### **2.1 Integer overflow bug**

The size of a linked list is encoded by a signed 32-bit integer (Java's primitive int type) that has a two's complement binary representation where the most significant bit is a sign bit. The values of int are bounded and between −2<sup>31</sup> (Integer.MIN_VALUE) and 2<sup>31</sup> − 1 (Integer.MAX_VALUE), inclusive. Adding one to the maximum value, 2<sup>31</sup> − 1, results in the minimum value, −2<sup>31</sup>: the carry of addition is stored in the sign bit, thereby changing the sign.
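The wrap-around can be observed directly. The following minimal sketch uses a hypothetical class name, `IntOverflowDemo`; it contrasts plain `int` addition, which wraps silently, with the standard library method `Math.addExact`, which throws an `ArithmeticException` instead.

```java
// Hypothetical demo class; only Integer and Math from java.lang are used.
final class IntOverflowDemo {
    // Plain int addition wraps around silently (two's complement).
    static int wrappingIncrement(int value) {
        return value + 1;
    }

    // Math.addExact throws ArithmeticException instead of wrapping.
    static boolean incrementOverflows(int value) {
        try {
            Math.addExact(value, 1);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrappingIncrement(Integer.MAX_VALUE) == Integer.MIN_VALUE); // true
        System.out.println(incrementOverflows(Integer.MAX_VALUE)); // true
        System.out.println(incrementOverflows(41)); // false
    }
}
```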

Since the linked list implementation maintains one node for each element, its size is implicitly bounded by the number of node instances that can be created. Until 2002, the JVM was limited to a 32-bit address space, imposing a limit of 4 gigabytes (GiB) of memory. In practice this is insufficient to create 2<sup>31</sup> node instances, as it would leave at most 2 bytes per node. Since 2002, a 64-bit JVM has been available, allowing much larger amounts of addressable memory. Depending on the available memory, in principle it is now possible to create 2<sup>31</sup> or more node instances. In practice such lists can be constructed today on systems with 64 gigabytes of memory, e.g., by repeatedly adding elements. However, for such large lists, at least 20 methods break because of signed integer overflow. For example, several methods crash with a run-time exception or exhibit unexpected behavior!

Integer overflow bugs are a common attack vector for security vulnerabilities: even if the overflow bug may seem benign, its presence may serve as a small step in a larger attack. Integer overflow bugs can be exploited more easily on large memory machines used for 'big data' applications. Already, real-world attacks involve Java arrays with approximately 2<sup>32</sup>/5 elements [11, Section 3.2].

The Collection interface allows for collections with over Integer.MAX_VALUE elements. For example, its documentation (Javadoc) explicitly states the behavior of the size() method: 'Returns the number of elements in this collection. If this collection contains more than Integer.MAX_VALUE elements, returns Integer.MAX_VALUE'. The special case ('more than ...') for large collections is necessary because size() returns a value of type int.
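One way a collection could honor this part of the Javadoc is to track its element count in a `long` and clamp the value that size() reports. The sketch below is an assumption made for illustration, with a hypothetical class and method name; it is not how LinkedList is implemented.

```java
// Hypothetical helper, assuming the element count is tracked in a long.
final class ClampedSizeSketch {
    // Returns the value that size() is documented to return for a
    // collection holding elementCount (>= 0) elements.
    static int documentedSize(long elementCount) {
        return (int) Math.min(elementCount, (long) Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(documentedSize(3L));       // 3
        System.out.println(documentedSize(1L << 31)); // 2147483647
    }
}
```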

When add(E) is called and unconditionally increments the size field, an overflow happens after adding 2<sup>31</sup> elements, resulting in a negative size value. In fact, as the Javadoc of the List interface describes, this interface is based on integer indices of elements: 'The user can access elements by their integer index (position in the list), ...'. For elements beyond Integer.MAX_VALUE, it is unclear what integer index should be used. Since there are only 2<sup>32</sup> different integer values, at most 2<sup>32</sup> node instances can be associated with a unique index. For larger lists, elements cannot be uniquely addressed anymore using an integer index. In essence, as we shall see in more detail below, the bounded nature of the 32-bit integer indices implies that the design of the List interface breaks down for large lists on 64-bit architectures. The above observations have many ramifications: it can be shown that 22 of 25 methods in the List interface are broken. Remarkably, the actual size of the linked list remains correct as the chain is still in place: most methods of the Queue interface still work.

#### **2.2 Reproduction**

We have run a number of test cases to show the presence of bugs caused by the integer overflow. The running Java version was Oracle's JDK8 (build 1.8.0_201-b09) that has the same LinkedList implementation as in OpenJDK8. Before running a test case, we set up an empty linked list instance. Below, we give a high-level overview of the test cases. Each test case uses letSizeOverflow() or addElementsUntilSizeIs0(): these repeatedly call the method add() to fill the linked list with null elements, and the latter method also adds a last element ("this is the last element") causing size to be 0 again.

1. Directly after size overflows, the size() method returns a negative value, violating what the corresponding Javadoc stipulates: its value should remain Integer.MAX_VALUE = 2<sup>31</sup> − 1.

```
letSizeOverflow();
System.out.println("linkedList.size() = " + linkedList.size() + ", actual: " + count);
// linkedList.size() = -2147483648, actual: 2147483648
```
Clearly this behavior is in contradiction with the documentation. The actual number of elements is determined by having a field count (of type long) that is incremented each time the method add() is called.

2. The query method get(int) returns the element at the specified position in the list. It throws an IndexOutOfBoundsException exception when size is negative. From the informal specification, it is unclear what indices should be associated with elements beyond Integer.MAX_VALUE.

```
letSizeOverflow();
System.out.println(linkedList.get(0));
// Exception in thread "main" IndexOutOfBoundsException: Index: 0, Size: -2147483648
// at java.util.LinkedList.checkElementIndex(LinkedList.java:555) ...
```
3. The method toArray() returns an array containing all of the elements in this list in proper sequence (from first to last element). When size is negative, this method throws a NegativeArraySizeException exception. Furthermore, since the array size is bounded by 2<sup>31</sup> − 1 elements<sup>5</sup>, the contract of toArray() is unsatisfiable for lists larger than this. The method Collections.sort(List<T>) sorts the specified list into ascending order, according to the natural ordering of its elements. This method calls toArray(), and therefore also throws a NegativeArraySizeException.

```
letSizeOverflow();
Collections.sort(linkedList);
// Exception in thread "main" NegativeArraySizeException
// at java.util.LinkedList.toArray(LinkedList.java:1050)...
```
4. Method indexOf(Object o) returns the index of the first occurrence of the specified element in this list, or −1 if this list does not contain the element. However, due to the overflow, it is possible to have an element in the list associated to index −1, which breaks the contract of this method.

```
addElementsUntilSizeIs0();
String last;
System.out.println("linkedList.getLast() = " + (last = linkedList.getLast()));
// linkedList.getLast() = This is the last element
System.out.println("linkedList.indexOf(" + last + ") = " + linkedList.indexOf(last));
// linkedList.indexOf(This is the last element) = -1
```
5. Method contains(Object o) returns true if this list contains the specified element. If an element is associated with index −1, it will indicate wrongly that this particular element is not present in the list.

```
addElementsUntilSizeIs0();
String last;
System.out.println("linkedList.getLast() = " + (last = linkedList.getLast()));
// linkedList.getLast() = This is the last element
System.out.println("linkedList.contains(" + last + ") = " + linkedList.contains(last));
// linkedList.contains(This is the last element) = false
```
Specifically, method letSizeOverflow() adds 2<sup>31</sup> elements, which causes size to overflow. Method addElementsUntilSizeIs0() first adds 2<sup>32</sup> − 1 elements: the value of size is then −1. Then, it adds the last element, and size is 0 again. All elements added are null, except for the last element. For test cases 4 and 5, we deliberately misuse the overflow bug to associate an element with index −1. This means that method indexOf(Object) for this element returns −1, which according to the documentation means that the element is not present. For test cases 1, 2 and 3 we needed 65 gigabytes of memory for the JRE on a VM with 67 gigabytes of memory. For test cases 4 and 5 we needed 167 gigabytes of memory for the JRE on a VM with 172 gigabytes of memory. All test cases were carried out on a machine in a private cloud (SURFsara), which provides instances that satisfy these system requirements.
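The arithmetic behind these test cases can be replayed without allocating any nodes. In the sketch below, all names are hypothetical: a narrowing cast from `long` to `int` keeps the low 32 bits, which gives the same value as incrementing an `int` counter that many times. The bounds check `0 <= index < size` is an assumption about what get(int) tests, chosen to be consistent with the exception reported in test case 2.

```java
// Hypothetical replay of the counter values in the test cases.
final class SizeWraparoundSketch {
    // Value of an int counter that started at 0 and was incremented
    // 'increments' times; the cast keeps the low 32 bits.
    static int counterAfter(long increments) {
        return (int) increments;
    }

    // Assumed bounds check for positional access: 0 <= index < size.
    static boolean isElementIndex(int index, int size) {
        return index >= 0 && index < size;
    }

    public static void main(String[] args) {
        int afterOverflow = counterAfter(1L << 31);   // after letSizeOverflow()
        int beforeLast = counterAfter((1L << 32) - 1); // before the last add
        int afterLast = counterAfter(1L << 32);        // after addElementsUntilSizeIs0()
        System.out.println(afterOverflow); // -2147483648
        System.out.println(beforeLast);    // -1
        System.out.println(afterLast);     // 0
        // With a negative size, even index 0 fails the bounds check.
        System.out.println(isElementIndex(0, afterOverflow)); // false
    }
}
```

The same computation explains test cases 4 and 5: the last of 2<sup>32</sup> elements sits at position 2<sup>32</sup> − 1, so an `int` index counter that is incremented once per preceding node holds −1 when that element is reached.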

<sup>5</sup> In practice, the maximum array length turns out to be 2<sup>31</sup> − 5, as some bytes are reserved for object headers, but this may vary between Java versions [11,14].

#### **2.3 Mitigation**

There are multiple directions for mitigating the overflow bug: do not fix, fail fast, long size field and long or BigInteger indices. Due to lack of space, we describe only the fail fast solution. This solution stays reasonably close to the original implementation of LinkedList and does not leave any behavior unspecified.

In the fail fast solution, we ensure that the overflow of size may never occur. Whenever elements would be added that cause the size field to overflow, the operation throws an exception and leaves the list unchanged. As the exception is triggered right before the overflow would otherwise occur, the value of size is guaranteed to be bounded by Integer.MAX_VALUE, i.e. it never becomes negative.

This solution requires a slight adaptation of the implementation: for methods that increase the size field, only one additional check has to be performed before a LinkedList instance is modified. This checks whether the result of the method causes an overflow of the size field. Under this condition, an IllegalStateException is thrown. Thus, only in states where size is less than Integer.MAX_VALUE, it is acceptable to add a single element to the list.

We shall work in a separate class called BoundedLinkedList: this is the improved version that does not allow more than 2<sup>31</sup> − 1 elements. Compared to the original LinkedList, two methods are added, isMaxSize() and checkSize():

```
private boolean isMaxSize() {
  return size == Integer.MAX_VALUE;
}
private void checkSize() {
  if (isMaxSize())
    throw new IllegalStateException("Not enough space");
}
```
These methods implement an overflow check. The latter method is called before any modification occurs that increases the size by one: this ensures that size never overflows. Some methods now differ when compared to the original LinkedList, as they involve an invocation of the checkSize() method.
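How such a check could be wired into add(E) is sketched below. This is a simplified illustration, not the BoundedLinkedList class itself: the class name `BoundedListSketch`, the placement of the `checkSize()` call at the start of `add`, and the method `pretendSizeForDemo` are all assumptions. The last of these exists only so that the check can be exercised without allocating 2<sup>31</sup> − 1 nodes.

```java
// Hypothetical, simplified sketch of the fail fast idea.
final class BoundedListSketch<E> {
    private static final class Node<E> {
        E item;
        Node<E> next;
        Node<E> prev;

        Node(Node<E> prev, E item, Node<E> next) {
            this.prev = prev;
            this.item = item;
            this.next = next;
        }
    }

    private int size = 0;
    private Node<E> first;
    private Node<E> last;

    private boolean isMaxSize() {
        return size == Integer.MAX_VALUE;
    }

    private void checkSize() {
        if (isMaxSize())
            throw new IllegalStateException("Not enough space");
    }

    public boolean add(E e) {
        checkSize(); // fail fast: nothing has been modified yet
        linkLast(e);
        return true;
    }

    private void linkLast(E e) {
        final Node<E> l = last;
        final Node<E> newNode = new Node<>(l, e, null);
        last = newNode;
        if (l == null) first = newNode;
        else l.next = newNode;
        size++; // cannot overflow: checkSize() ensured size < Integer.MAX_VALUE
    }

    public int size() {
        return size;
    }

    // Demonstration-only hook (hypothetical): sets the cached size directly,
    // which deliberately breaks the link between size and the chain.
    void pretendSizeForDemo(int n) {
        size = n;
    }
}
```

Because the check runs before any field is written, a rejected add(E) leaves the list exactly as it was, which is the property the fail fast solution asks for.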

## **3 Specification and verification of** BoundedLinkedList

The aim of our specification and verification effort is to verify formalizations of the given Javadoc specifications (stated in natural language) of the LinkedList. This includes establishing absence of overflow errors. Moreover, we restrict our attention only to the revised BoundedLinkedList and not to the rest of the Collection Framework or Java classes: methods that involve parameters with interface types, Java serialization or Java reflection are considered out of scope.

(Bounded)LinkedList inherits from AbstractSequentialList, but we consider its inherited methods out of scope. These are methods that operate on other collections, such as removeAll or containsAll, and methods that have other classes as return type, such as iterator. However, these methods call methods overridden by (Bounded)LinkedList, and cannot cause an overflow by themselves.

We have made use of KeY's stub generator to generate dummy contracts for other classes that BoundedLinkedList depends on, such as for the inherited interfaces and abstract super classes: these contracts conservatively specify that every method may arbitrarily change the heap. The stub generator moreover deals with generics by erasing the generic type parameters. For exceptions we modify their stub contract to assume that their constructors are pure, viz. leaving existing objects on the heap unchanged. An important stub contract is the equality method of the absolute super class Object, which we have adapted: we assume every object has a side-effect free, terminating and deterministic implementation of its equality method<sup>6</sup>:

```
public class Object {
    /*@ public normal_behavior
      @ requires true;
      @ ensures \result == self.equals(param0);
      @*/
    public /*@ helper strictly_pure @*/ boolean
      equals(/*@ nullable */ Object param0);
    ...
}
```
#### **3.1 Specification**

Following our workflow, we have iterated a number of times before the specifications we present here were obtained. This is a costly procedure, as revising some specifications requires redoing most verification effort. Until sufficient information is present in the specification, proving for example termination of a method is difficult or even impossible: from stuck verification attempts, and an intuitive idea of why a proof is stuck, the specification is revised.

Ghost fields. We use JML's ghost fields: these are logical fields that, for each object, are assigned a value in a heap. The values of these fields are conceptual, i.e. only used for specification and verification purposes. At run-time, these fields are not present and cannot affect the course of execution. Our improved class is annotated with two ghost fields: nodeList and nodeIndex.

The type of the nodeList ghost field is an abstract data type of sequences, a KeY built-in. This type has standard constructors and operations that can be used in contracts and in JML set annotations. A sequence has a length, which is finite but unbounded. The type of a sequence's length is \bigint. In KeY a sequence is unityped: all its elements are of the any sort, which can be any Java object reference or primitive, or built-in abstract data type. One needs to apply appropriate casts and track type information for a sequence of elements in order to cast elements of the any sort to any of its subsorts.

The nodeIndex ghost field is used as a ghost parameter with unbounded but finite integers as type. This ghost parameter is only used for specifying the behavior of the methods unlink(Node) and linkBefore(Object, Node). The ghost parameter tracks at which index the Node argument is present in the nodeList. This information is implicit and not needed at run-time.

<sup>6</sup> In reality, there are Java classes for which equality is not terminating. A nice example is LinkedList itself, where adding a list to itself leads to a StackOverflowError when testing equality with a similar instance. We consider the issue out of scope of this study as this behavior is explicitly described by the Javadoc.

Class invariant. The ghost field nodeList is used in the class invariant of our improved implementation, see below. We relate the fields first and last that hold a reference to a Node instance, and the chain between first and last, to the contents of the sequence in the ghost field nodeList. This allows us to express properties in terms of nodeList, where they reflect properties about the chain on the heap. One may compare this invariant with the description of chains as given in Sect. 2.

```
1 //@ private ghost \seq nodeList;
2 //@ private ghost \bigint nodeIndex;
3 /*@ invariant
4 @ nodeList.length == size &&
5 @ nodeList.length <= Integer.MAX_VALUE &&
6 @ (\forall \bigint i; 0 <= i < nodeList.length;
7 @ nodeList[i] instanceof Node) &&
8 @ ((nodeList == \seq_empty && first == null && last == null)
9 @ || (nodeList != \seq_empty && first != null &&
10 @ first.prev == null && last != null &&
11 @ last.next == null && first == (Node)nodeList[0] &&
12 @ last == (Node)nodeList[nodeList.length-1])) &&
13 @ (\forall \bigint i; 0 < i < nodeList.length;
14 @ ((Node)nodeList[i]).prev == (Node)nodeList[i-1]) &&
15 @ (\forall \bigint i; 0 <= i < nodeList.length-1;
16 @ ((Node)nodeList[i]).next == (Node)nodeList[i+1]);
17 @*/
```
The actual size of a linked list is the length of the ghost field nodeList, whereas the cached size is stored in a 32-bit signed integer field size. On line 4, the invariant expresses that these two must be equal. Since the length of a sequence (and thus nodeList) is never negative, this implies that the size field never overflows. On line 5, this is made explicit: the real size of a linked list is bounded by Integer.MAX_VALUE. Line 5 is redundant as it follows from line 4, since a 32-bit integer never has a value larger than this maximum value. The condition on lines 6–7 requires that every node in nodeList is an instance of Node which implies it is non-null.

A linked list is either empty or non-empty. On line 8, if the linked list is empty, it is specified that first and last must be null references. On lines 9–12, if the linked list is non-empty, it is specified that first and last are non-null and moreover that the prev field of the first Node and the next field of the last Node are null. The nodeList must have as first element the node pointed to by first, and last as last element. In any case, but vacuously true if the linked list is empty, the nodeList forms a chain of nodes: lines 13–16 describe that, for every node at index 0 < i < size, the prev field must point to its predecessor, and similarly for successor nodes.
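For readers less familiar with JML, the same conditions can be written as an ordinary run-time check. The sketch below is only an analogue under stated assumptions: the ghost field has no run-time counterpart, so a `java.util.List` is used in its place, and the names `InvariantSketch`, `Node` and `invariantHolds` are hypothetical. The comments refer to the line numbers of the invariant.

```java
// Hypothetical run-time analogue of the class invariant.
final class InvariantSketch {
    static final class Node {
        Object item;
        Node next;
        Node prev;

        Node(Node prev, Object item, Node next) {
            this.prev = prev;
            this.item = item;
            this.next = next;
        }
    }

    static boolean invariantHolds(java.util.List<Node> nodeList, int size,
                                  Node first, Node last) {
        int n = nodeList.size(); // an int, so line 5 holds trivially here
        if (n != size) return false;                          // line 4
        for (Node node : nodeList) {
            if (node == null) return false;                   // lines 6-7
        }
        if (n == 0) {
            return first == null && last == null;             // line 8
        }
        if (first == null || first.prev != null) return false; // lines 9-10
        if (last == null || last.next != null) return false;   // lines 10-11
        if (first != nodeList.get(0)) return false;            // line 11
        if (last != nodeList.get(n - 1)) return false;         // line 12
        for (int i = 1; i < n; i++) {
            if (nodeList.get(i).prev != nodeList.get(i - 1)) return false; // lines 13-14
        }
        for (int i = 0; i < n - 1; i++) {
            if (nodeList.get(i).next != nodeList.get(i + 1)) return false; // lines 15-16
        }
        return true;
    }
}
```

Such a check could only ever test particular instances; the point of the JML invariant is that KeY proves it for all reachable states.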

We note three interesting properties that are implied by the above invariant: acyclicity, unique first and unique last node. These properties can be expressed as JML formulas as follows:

```
(\forall \bigint i; 0 <= i < nodeList.length - 1;
    (\forall \bigint j; i < j < nodeList.length;
        nodeList[i] != nodeList[j])) &&
(\forall \bigint i; 0 <= i < nodeList.length;
    ((Node)nodeList[i]).next == null <==> i == nodeList.length - 1) &&
(\forall \bigint i; 0 <= i < nodeList.length;
    ((Node)nodeList[i]).prev == null <==> i == 0)
```
These properties are not literally part of our invariant, but their validity is proven interactively in KeY as a consequence of the invariant. Otherwise, we would need to reestablish also these properties each time we show the invariant holds.

Methods. All methods within scope are given a JML contract that specifies their normal and exceptional behavior. As an example contract, consider the lastIndexOf(Object) method in Listing 3: it searches through the chain of nodes until it finds a node with an item equal to the argument. This method is interesting due to a potential overflow of the resulting index. BoundedLinkedList together with all method specifications is available on-line [2].

#### **3.2 Verification**

We start by giving a general strategy we apply to verify proof obligations. We also describe in more detail how to produce a single proof, in this case lastIndexOf(Object). This gives a general feel for how proving in KeY works. This method is neither trivial, nor very complicated to verify. In this manner, we have produced proofs for each method contract that we have specified.

Overview of verification steps. When verifying a method, we first instruct KeY to perform symbolic execution. Symbolic execution is implemented by a number of proof rules that transform modal operators on program fragments in JavaDL. During symbolic execution, the goal sequent is automatically simplified, potentially leading to branches. Since our class invariant contains a disjunction (either the list is empty or not), we do not want these cases to be split early in the symbolic execution. Thus we instruct KeY to delay unfolding the class invariant. When symbolic execution is finished, goals may still contain updated heap expressions that must be simplified further. After this has been done, one can compare the open goals to the method body and its annotations, and see whether the open goals in KeY look familiar and check whether they are true.

In the remaining part of the proof the user must find an appropriate mix between interactive and automatic steps. If a sequent is provable, there may be multiple ways to construct a closed proof tree. At (almost) every step the user has a choice between applying steps manually or automatically. Choosing which rules to apply manually requires some experience: clever rule application decreases the size of the proof tree. Certain rules, such as the cut rule, are never applied automatically. The cut rule splits a proof tree into two parts by introducing a detour, but it significantly reduces the size of a proof and thus the effort required to produce it. For example, the acyclicity property can be introduced using cut.

Verification example. The method lastIndexOf has two contracts: one involves a null argument, and another involves a non-null argument. Both proofs are similar. Moreover, the proof for indexOf(...) is similar but involves the next reference instead of the prev reference. This contract is interesting, since proving its correctness shows the absence of the overflow of the index variable.

#### **Proposition.** lastIndexOf(Object) as specified in Listing 3 is correct.

Proof. Set strategy to default strategy, and set max. rules to 5,000, class axiom delayed. Finish symbolic execution on the main goal. Set strategy to 1,000 rules

```
/*@
 @ also
 @ ...
 @ public normal_behavior
 @ requires
 @ o != null;
 @ ensures
 @ \result >= -1 && \result < nodeList.length;
 @ ensures
 @ \result == -1 ==>
 @ (\forall \bigint i; 0 <= i < nodeList.length;
 @ !o.equals(((Node)nodeList[i]).item));
 @ ensures
 @ \result >= 0 ==>
 @ (\forall \bigint i; \result < i < nodeList.length;
 @ !o.equals(((Node)nodeList[i]).item)) &&
 @ o.equals(((Node)nodeList[\result]).item);
 @*/
public /*@ strictly_pure @*/ int
lastIndexOf(/*@ nullable @*/ Object o) {
   int index = size;
   if (o == null) {
       ...
   } else {
       /*@
         @ maintaining
         @ (\forall \bigint i; index <= i < nodeList.length;
         @ !o.equals(((Node)nodeList[i]).item));
         @ maintaining
         @ 0 <= index && index <= nodeList.length;
         @ maintaining
         @ 0 < index && index <= nodeList.length ==>
         @ x == (Node)nodeList[index - 1];
         @ maintaining
         @ index == 0 <==> x == null;
         @ decreasing
         @ index;
         @ assignable
         @ \strictly_nothing;
         @*/
       for (Node x = last; x != null; x = x.prev) {
           index--;
           if (o.equals(x.item))
               return index;
       }
   }
   return -1;
}
```
Listing 3: Method lastIndexOf(Object) annotated with JML. Searches the list from last to first for an element. Returns −1 if this element is not present in the list; otherwise returns the index of the node that was equal to the argument. Only the contract and branch in which the argument is non-null is shown due to space restrictions. Methods such as indexOf, removeFirstOccurrence and removeLastOccurrence are very similar.

and select DefOps arithmetical rules. Close all provable goals under the root. One goal remains. Perform update simplification macro on the whole sequent, perform propositional with split macro on the sequent, and close provable goals on the root of the proof tree. There is a remaining case:

**–** Case index − 1 = 0 ↔ x.prev = null: split the equivalence. First case: suppose index − 1 = 0; then x = self.nodeList[0] = self.first and self.first.prev = null, which is solvable through unfolding the invariant and equational rewriting. Second case: suppose x.prev = null. Then either index = 1 or index > 1 (from splitting index ≥ 1). The first of these is trivial (close provable goal), and the second requires instantiating quantified statements from the invariant, leading to a contradiction: we have supposed x.prev = null, but x = self.nodeList[index − 1], self.nodeList[index − 1].prev = self.nodeList[index − 2] and self.nodeList[index − 2] ≠ null.
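To make the loop invariant of Listing 3 more tangible, the following is a minimal sketch that evaluates the `maintaining` clauses as runtime checks at every loop head. It is not the verified BoundedLinkedList code: the class name `CheckedLastIndexOf`, the helper `require`, and the array `nodeList` (a stand-in for the ghost field) are assumptions made for illustration, and a dynamic check on a few inputs is of course much weaker than the KeY proof.

```java
import java.util.List;

/** Toy doubly linked list; nodeList stands in for the ghost field of the paper. */
final class CheckedLastIndexOf {
    static final class Node {
        final Object item;
        Node prev, next;
        Node(Object item) { this.item = item; }
    }

    final Node[] nodeList; // node at position i (illustrative stand-in for the ghost field)
    final Node last;
    final int size;

    CheckedLastIndexOf(List<?> items) {
        size = items.size();
        nodeList = new Node[size];
        Node previous = null;
        for (int i = 0; i < size; i++) {
            Node n = new Node(items.get(i));
            n.prev = previous;
            if (previous != null) previous.next = n;
            nodeList[i] = n;
            previous = n;
        }
        last = previous;
    }

    private static void require(boolean condition, String clause) {
        if (!condition) throw new IllegalStateException("violated: " + clause);
    }

    /** Evaluates the four 'maintaining' clauses of Listing 3 for the current loop state. */
    private void checkLoopInvariant(Object o, int index, Node x) {
        require(0 <= index && index <= nodeList.length, "index within bounds");
        for (int i = index; i < nodeList.length; i++)
            require(!o.equals(nodeList[i].item), "no match at positions >= index");
        if (0 < index) require(x == nodeList[index - 1], "x is the node at index - 1");
        require((index == 0) == (x == null), "index == 0 <==> x == null");
    }

    /** Non-null branch of lastIndexOf; the invariant is checked at each loop head. */
    int lastIndexOf(Object o) {
        int index = size;
        Node x = last;
        while (true) {
            checkLoopInvariant(o, index, x);
            if (x == null) return -1;
            index--;
            if (o.equals(x.item)) return index;
            x = x.prev;
        }
    }
}
```

The last clause, index == 0 <==> x == null, is the one whose preservation leads to the remaining proof case discussed above.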

Interesting verification conditions. The acyclicity property is used to close verification conditions that arise as a result of potential aliasing of node instances: it is used as a separation lemma. For example, after a method has changed the next field of an existing node, we want to reestablish that all nodes remain reachable from the first through next fields (i.e., "connectedness"): one proves that the update of next only affects a single node, and does not introduce a cycle. We prove this by using the fact that two node instances are different if they have a different index in nodeList, which follows from acyclicity. Below, we sketch an argument why the acyclicity property follows from the invariant. A video shows how the argument goes in KeY, see [1, 0:55–11:30].

### **Proposition.** Acyclicity follows from the linked list invariant.

Proof. By contradiction: suppose a linked list of size n > 1 is not acyclic. Then there are two indices, 0 ≤ i < j < n, such that the nodes at index i and j are equal. Then it must hold that for all j ≤ k < n, the node at k is equal to the node at k − (j − i). This follows by induction. Base case: if k = j, then node j and node j − (j − i) = i are equal by assumption. Induction step: suppose the node at k is equal to the node at k − (j − i); then, if k + 1 < n, it also holds that node k + 1 equals node k + 1 − (j − i): this follows from the fact that node k + 1 and node k + 1 − (j − i) are the next of node k (with k < n − 1) and of node k − (j − i), respectively. Since the latter are equal, the former must be equal too. Now, the statement that node k equals node k − (j − i) for all j ≤ k < n holds in particular when k = n − 1. However, by the property that only the last node has a null value for next, and a non-last node has a non-null value for its next field, we derive a contradiction: if nodes k and k − (j − i) are equal then all their fields must also have equal values, but node k has a null and node k − (j − i) has a non-null next field!
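The induction in the proof above can be summarised compactly as follows. This is only a restatement in LaTeX notation; the abbreviation d = j − i and the predicate name P are introduced here for readability and do not appear in the KeY development.

```latex
\begin{align*}
&\text{Let } d = j - i > 0 \text{ and } P(k) :\equiv \mathit{node}_k = \mathit{node}_{k-d}
  \quad \text{for } j \le k < n.\\
&\text{Base: } P(j) \text{ holds, since } \mathit{node}_j = \mathit{node}_i = \mathit{node}_{j-d}.\\
&\text{Step: } P(k) \wedge k + 1 < n \implies
  \mathit{node}_{k+1} = \mathit{node}_k.\mathit{next}
  = \mathit{node}_{k-d}.\mathit{next} = \mathit{node}_{k+1-d}.\\
&\text{Hence } P(n-1): \ \mathit{node}_{n-1} = \mathit{node}_{n-1-d},
  \text{ but } \mathit{node}_{n-1}.\mathit{next} = \mathit{null}
  \ne \mathit{node}_{n-1-d}.\mathit{next}.
\end{align*}
```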

Summary of verification effort. The total effort of our case study was about 7 man months. The largest part of this effort was finding the right specification. KeY supports various ways to specify Java code: model fields/methods, pure methods, and ghost variables. For example, using pure methods, contracts are specified by expressing the content of the list before/after the method using the pure method get(i), which returns the item at index i. This led to rather complex proofs: essentially it led to reasoning in terms of relational properties on programs (i.e. get(i) before vs get(i) after the method under consideration). After 2.5 man months of writing partial specifications and partial proofs in these different formalisms, we decided to go with ghost variables, as this was the only formalism in which we succeeded in proving non-trivial methods.

It then took ≈ 4 man months of iterating in our workflow through (failed) partial proof attempts and refining the specs until they were sufficiently complete. In particular, changes to the class invariant were "costly", as this typically caused proofs of all the methods to break (one must prove that all methods preserve the class invariant). The possibility to interact with the prover was crucial to pinpoint the cause of a failed verification attempt, and we used this feature of KeY extensively to find the right changes/additions to the specifications.

After the introduction of the field nodeList, several methods could be proved very easily, with a very low number of interactive steps or even automatically. Methods unlink(Node) and linkBefore(Object, Node) could not be proven without knowing the position of the node argument. We introduced a new ghost field, nodeIndex, that acts like a ghost parameter. Luckily, this did not affect the class invariant, and existing proofs that did not make use of the new ghost field were unaffected.

Once the specifications are (sufficiently) complete, we estimate that it only took approximately 1 or 1.5 man weeks to prove all methods. This can be reduced further if informal proof descriptions are given. Moreover, we have recorded a video of a 30 minute proof session where the method unlinkLast is proven correct with respect to its contract [1].

Proof statistics. The table below summarizes the main proof statistics for all methods. The last two columns are not metrics of the proof, but they indicate the total lines of code (LoC) and the total lines of specifications (LoSpec).


We found that the most difficult proofs were those for the method contracts of clear(), linkBefore(Object,Node), unlink(Node), node(int) and remove(Object). The number of interactive steps seems a rough measure of the effort required. However, we note that it is not a reliable representation of the difficulty of a proof: an experienced user can produce a proof with very few interactive steps, while an inexperienced user may take many more steps. The proofs we have produced are by no means minimal.

## **4 Discussion**

In this section we discuss some of the main challenges of verifying the real-world Java implementation of a LinkedList, as opposed to the analysis of an idealized mathematical linked list.

Extensive use of Java language constructs. The LinkedList class uses a wide range of Java language features. This includes nested classes (both static and non-static), inheritance, polymorphism, generics, exception handling, object creation and foreach loops. To load and reason about the real-world LinkedList source code requires an analysis tool with high coverage of the Java language, including support for the aforementioned language features.

Support for intricate Java semantics. The Java List interface is position based, and associates with each item in the list an index of Java's int type. The bugs described in Section 2.1 were triggered on large lists, in which integer overflows occurred. Thus, while an idealized mathematical integer semantics is much simpler for reasoning, it could not be used to analyze the bugs we encountered! It is therefore critical that the analysis tool faithfully supports Java's semantics, including Java's integer (overflow) behavior.
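The difference between mathematical integers and Java's int can be sketched in a few lines. The class and method names below (`SizeOverflowSketch`, `incrementUnchecked`, `incrementChecked`, `isElementIndex`) are hypothetical and only mimic a size counter; nothing here allocates a large list or reproduces the actual LinkedList code.

```java
final class SizeOverflowSketch {
    /** Mimics an unchecked 'size++' on a Java int: wraps around silently. */
    static int incrementUnchecked(int size) {
        return size + 1;
    }

    /** The same increment, but failing loudly instead of wrapping. */
    static int incrementChecked(int size) {
        return Math.addExact(size, 1); // throws ArithmeticException on overflow
    }

    /** A typical bounds check; with a negative size no index is accepted. */
    static boolean isElementIndex(int index, int size) {
        return index >= 0 && index < size;
    }

    public static void main(String[] args) {
        int size = incrementUnchecked(Integer.MAX_VALUE);
        System.out.println(size);                    // prints -2147483648
        System.out.println(isElementIndex(0, size)); // prints false
    }
}
```

Under an idealized integer semantics the first call would simply yield 2<sup>31</sup>, so the wrap-around to a negative value, and everything that follows from it, would be invisible to the analysis.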

Collections have a huge state space. A Java collection is an object that contains other objects (of a reference type). Collections can typically grow to an arbitrary (but in practice, bounded) size. By their very nature, collections thus intrinsically have a large state. To make this more concrete: triggering the bugs in LinkedList requires at least 2<sup>31</sup> elements (and 64 GiB of memory), and each element, since it is of a reference type, has at least 2<sup>32</sup> values. This poses serious problems to fully automated analysis methods that explore the state space.

Interface specifications. Several of the LinkedList methods take a parameter of an interface type. For example, the addAll method takes two arguments, the second of which is of the Collection type:

```
public boolean addAll(int index, Collection c) {
        ...
        Object[] a = c.toArray();
        ...
}
```
As KeY follows the design by contract paradigm, verification of LinkedList's addAll method requires a contract for each of the other methods called, including the toArray method in the Collection interface. How can we specify interface methods, such as Collection.toArray? The stub generator generates a conservative contract: it may arbitrarily modify the heap and return any array. Simple conditions on parameters or the return value are easily expressed, but meaningful contracts that relate the behavior of the method to the contents of the collection require some notion of state to capture all mutations of the collection, so that previous calls to methods in the interface that contributed to the current contents of the collection are taken into account. Model fields/methods [3, Section 9.2] are a widely used mechanism for abstract specification. A model field or method is represented in a concrete class in terms of the concrete state given by its fields. In this case, as only the interface type Collection is known rather than a concrete class, such a representation cannot be defined. Thus the behavior of the interface cannot be fully captured by specifications in terms of model fields/variables, including for methods such as Collection.toArray. Ghost variables cannot be used either, since ghost variables are updated by adding set statements in method bodies, and interfaces do not contain method bodies. This raises the question: how to specify behavior of interface methods?<sup>7</sup>
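To illustrate why a caller of Collection.toArray can rely on nothing beyond a contract, consider the following deliberately ill-behaved implementation. The class `SideEffectingCollection` and its field `toArrayCalls` are hypothetical and chosen only for illustration: the array it returns is unrelated to its reported size, and each call changes the object's state.

```java
import java.util.AbstractCollection;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedList;

/** Hypothetical Collection whose toArray ignores its contents and mutates state. */
final class SideEffectingCollection extends AbstractCollection<Object> {
    int toArrayCalls = 0; // state changed as a side effect of toArray

    @Override public Iterator<Object> iterator() { return Collections.emptyIterator(); }

    @Override public int size() { return 0; } // claims to be empty

    @Override public Object[] toArray() {
        toArrayCalls++;
        return new Object[] { "call " + toArrayCalls }; // yet returns one element
    }

    public static void main(String[] args) {
        SideEffectingCollection c = new SideEffectingCollection();
        LinkedList<Object> list = new LinkedList<>();
        list.addAll(0, c); // addAll consults c.toArray(), as in the snippet above
        System.out.println(c.size() + " vs " + list.size());
    }
}
```

Since addAll obtains the elements through c.toArray(), the resulting list is determined by whatever that call returns, not by what size() or iterator() report. The compiler accepts this class, so only a specification of the interface could rule such behavior out.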

Verifiable code revisions. We fixed the LinkedList class by explicitly bounding its maximum size to Integer.MAX_VALUE elements, but other solutions are possible. Rather than using integer indices for elements, one could change to an index of type long or BigInteger. Such a code revision is however incompatible with the general Collection and List interfaces (whose method signatures mandate the use of integer indices), thereby breaking all existing client code that uses LinkedList. Clearly this is not an option in a widely used language like Java, or any language that aims to be backwards compatible.

It raises the challenge: can we find code revisions that are compatible with existing interfaces and client classes? We can take this challenge even further: can we use our workflow to find compatible code revisions that are also amenable to formal verification? The existing code in general is not designed for verification. For example, the LinkedList class exposes several implementation details to classes in the java.util package: all fields, including size, are package private (not private!), which means they can be assigned a new value directly (without calling any methods) by other classes in that package. This includes setting size to negative values. As we have seen, the class malfunctions for negative size values. In short, this means that the LinkedList itself cannot enforce its own invariants anymore: its correctness now depends on the correctness of other classes in the package. The possibility to avoid calling methods to access the list's fields may yield a small performance gain, but it precludes a modular analysis: to assess the correctness of LinkedList one must now analyze all classes in the same package (!) to determine whether they make benign changes (if any) to the fields of the list. Hence, we recommend encapsulating such implementation details, including making at least all fields private.
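The effect of package-private fields on modular reasoning can be sketched as follows. All three classes (`ExposedList`, `SamePackageClient`, `EncapsulatedList`) are hypothetical miniatures, not code from java.util; they only show that a class in the same package can break the invariant size ≥ 0 without calling any method of the list.

```java
/** Toy list whose size field is package-private, as described for LinkedList. */
final class ExposedList {
    int size = 0; // package-private: writable by any class in the same package

    void add() { size++; }
    boolean isEmpty() { return size == 0; }
}

/** Another class in the same package that bypasses all methods of ExposedList. */
final class SamePackageClient {
    static void corrupt(ExposedList list) {
        list.size = -1; // compiles; no method of ExposedList is involved
    }
}

/** Encapsulated variant: size >= 0 can be established by inspecting this class alone. */
final class EncapsulatedList {
    private int size = 0;

    void add() { size = Math.addExact(size, 1); } // fails instead of overflowing
    int size() { return size; }
}
```

For `EncapsulatedList`, every write to size occurs in the class itself, so a proof that its methods preserve the invariant does not depend on any other class in the package.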

Proof reuse. Section 3.2 discussed the proof effort (in person months). It revealed that while the total effort was 6-7 person months, once the specifications are in place after many iterations of the workflow, producing the actual final proofs took only 1-2 weeks! But minor specification changes often require redoing nearly the whole proof, which causes much delay in finding the right specification. Other program verification case studies [3,4,8,9] show similarly that the main bottleneck today is specification, not verification. This calls for techniques to optimize proof reuse when the specification is slightly modified, allowing for a more rapid development of specifications.

<sup>7</sup> Since the representation of classes that implement the interface is unknown in the interface itself, a particularly challenging aspect here is: how to specify the footprint of an interface method, i.e.: what part of the heap can be modified by the method in the implementing class?

Status of the challenges. Most of these challenges are still open. The challenge concerning "Interface specifications" could perhaps be addressed by defining an abstract state of an interface by using/developing some form of a trace specification that maps a sequence of calls to the interface methods to a value, together with a logic to reason about such trace specifications.

The challenges related to code revisions and proof reuse are compounded for analysis tools that use very fine-grained proof representations. For example, proofs in KeY consist of actual rule applications (rather than higher level macro/strategy applications), and proof rule applications explicitly refer to the indices of the (sub) formulas the rule is applied to. This results in a fragile proof format, where small changes to the specifications or source code (such as a code refactoring) break the proof.

The KeY system covered the Java language features sufficiently to load and statically verify the LinkedList source code. KeY also supports various integer semantics, allowing us to analyze LinkedList with the actual Java integer overflow semantics. As KeY is a theorem prover (based on deductive verification), it does not explore the state space of the class under consideration, thus solving the problem of the huge state space of Java collections. We could not find any other tools that solved these challenges, so we decided at that point to use KeY.

However, other state-of-the-art systems such as Coq, Isabelle and PVS support proof scripts. Those proofs are typically described at a much more coarse-grained level when compared to KeY. It would be interesting to see to what extent Java language features and semantics can be handled in (extensions of) such higher level proof script languages.

#### **4.1 Related work**

Knüppel et al. [14] provide a report on the specification and verification of some methods of the classes ArrayList, Arrays, and Math of the OpenJDK Collections framework using KeY. Their report is mainly meant as a "stepping stone towards a case study for future research." To the best of our knowledge, no formal specification and verification of the actual Java implementation of a linked list has been investigated. In general, the data structure of a linked list has been studied mainly in terms of pseudo code of an idealized mathematical abstraction (see [18] for an Eiffel version and [12] for a Dafny version).

This paper (and [14]) has shown that the specification and verification of actual library software poses a number of serious challenges to formal verification. In our case study, we used KeY to verify Java's linked list. Other formalizations of Java also exist, such as Bali [17] and Jinja [13] (using the general-purpose theorem prover Isabelle/HOL), OpenJML [6] (a prover dedicated to Java programs), and VerCors [5] (focusing on concurrent Java programs, translated into Viper/Z3). However, these formalizations do not have a complete enough Java semantics to be able to analyze the bugs presented in this paper. In particular, these formalizations seem to have no built-in support for integer overflow arithmetic, although it can be added manually.

## **Self-references**


## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original authors and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Analysing installation scenarios of Debian packages *-*

Benedikt Becker<sup>1</sup> , Nicolas Jeannerod<sup>2</sup> , Claude Marché<sup>1</sup> , Yann Régis-Gianas<sup>2</sup>,<sup>3</sup> , Mihaela Sighireanu<sup>2</sup> , and Ralf Treinen<sup>2</sup>

<sup>1</sup> Université Paris-Saclay, Univ. Paris-Sud, CNRS, Inria, LRI, 91405, Orsay, France <sup>2</sup> Université de Paris, IRIF, CNRS, F-75013 Paris, France <sup>3</sup> Inria, F-75013 Paris, France

Abstract. The Debian distribution includes more than 28 thousand maintainer scripts, almost all of which are written in Posix shell. These scripts are executed with root privileges at installation, update, and removal of a package, which makes them critical for system maintenance. While Debian policy provides guidance for package maintainers producing the scripts, few tools exist to check the compliance of a script with it. We report on the application of a formal verification approach based on symbolic execution to find violations of some non-trivial properties required by Debian policy in maintainer scripts. We present our methodology and give an overview of our toolchain. We obtained promising results: our toolchain is effective in analysing a large set of Debian maintainer scripts and it pointed out over 150 policy violations that led to reports (more than half already fixed) on the Debian Bug Tracking system.

Keywords: Quality Assurance · Safety Properties · Debian · Software Package Installation · Shell Scripts · High-Level View of File Hierarchies · Symbolic Execution · Feature Tree Constraints

## 1 Introduction

The Debian distribution is one of the oldest free software distributions, providing today 60 000 binary packages built from more than 31 000 software source packages with an official support for nine different CPU architectures. It is one of the most used GNU/Linux distributions, and serves as the basis for some derived distributions like Ubuntu.

A software package of Debian contains an archive of files to be placed on the target machine when installing the package. The package may come with a number of so-called maintainer scripts which are executed when installing, upgrading, or removing the package. A current version<sup>4</sup> of the Debian distribution contains 28 814 maintainer scripts in 12 592 different packages, 9 771 of which

<sup>-</sup> This work has been partially supported by the ANR project CoLiS, contract number ANR-15-CE25-0001.

<sup>4</sup> sid for amd64, including contrib and non-free, as of October 6, 2019

are completely or partially written by hand. These scripts are used for tasks like cleaning up, configuration, and repairing mistakes introduced in older versions of the distribution. Since they may have to perform any action on the target machine, the scripts are almost exclusively written in some general-purpose scripting language that allows for invoking any Unix command.

The whole installation process is orchestrated by dpkg, a Debian-specific tool, which executes the maintainer scripts of each package according to scenarios. The dpkg tool and the scripts require root privileges. For this reason, the failure of one of these scripts may lead to effects ranging from mildly annoying (like spurious warnings) to catastrophic (removal of files belonging to unrelated packages, as already reported [39]). When an execution error of a maintainer script is detected, the dpkg tool attempts an error unwind, but the success of this operation depends again on the correct behaviour of maintainer scripts. There is no general mechanism to simply undo the unwanted effects of a failed installation attempt, short of using a file system implementation providing for snapshots.

The Debian policy [4] aims to normalise, in natural language, important technical aspects of packages. Concerning the maintainer scripts we are interested in, it states that the standard shell interpreter is Posix shell, with the consequence that 99% of all maintainer scripts are written in this language. The policy also sets down the control flow of the different stages of the package installation process, including attempts of error recovery, defines how dpkg invokes maintainer scripts, and states some requirements on the execution behaviour of scripts. One of these requirements is the idempotency of scripts. To this day, most of these properties are checked on a very basic syntactic level (using tools like lintian [1]), by automated testing (like the piuparts suite [2]), or simply left until someone stumbles upon a bug and reports it to Debian.

The goal of our study is to improve the quality of the installation of software packages in the Debian distribution using a formal and automated approach. We focus on bug finding for three reasons. Firstly, a real Unix-like operating system is obviously too complex to be described completely and accurately by some formal model. Secondly, formal correctness properties may be difficult for Debian maintainers to grasp, especially when they are expressed on an abstract model. Finally, when a bug is detected, even on a system abstraction, one can try to reproduce it on a real system and, if confirmed, report it to the authors. This has a real and immediate impact on the quality of the software and helps to promote the usage of formal methods to a community that often is rather sceptical towards methods and tools coming from academic research.

The bugs in Debian maintainer scripts that we attempt to find may come at different levels: simple syntax errors (which may go unnoticed due to the unsafe design of the Posix shell language), non-compliance with the requirements of the Debian policy, usage of unofficial or undocumented features, or failure of a script in a situation where it is supposed to succeed.

The challenges are multiple: The Posix shell language is highly dynamic and recalcitrant to static analysis, both on a syntactic and semantic level. A Unix file system implementation contains many features that are difficult to model, e.g., ownership, permissions, timestamps, symbolic links, and multiple hard links to regular files. There is an immense variety of Unix commands that may be invoked from scripts, all of which have to be modelled in order to be treated by our tools. To address properties of scripts required by the Debian policy, we need to capture the transformation done by the script on a file system hierarchy. For this, we need some kind of logic that is expressive enough, and still allows for automated reasoning methods. A particular challenge is checking the idempotency property for script execution because it requires relational reasoning. For this, we encode the semantics of a script as a logic formula specifying the relation between the input and the output of the script, and we check that it is equivalent to its composition with itself. Finally, all these challenges have to be met at the scale of tens of thousands of scripts.
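The idempotency check described above, comparing a relation with its composition with itself, can be illustrated on a finite toy model. The sketch below is not part of the toolchain, which works on logic formulas over file system hierarchies: it represents a relation as an explicit set of (input, output) pairs over string-named states, and the names `IdempotencySketch`, `pair`, `compose` and `isIdempotent` are hypothetical.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class IdempotencySketch {
    /** Builds one (input state, output state) pair of a relation. */
    static Map.Entry<String, String> pair(String in, String out) {
        return new SimpleImmutableEntry<>(in, out);
    }

    /** Relational composition: (a, c) is in the result iff (a, b) is in first and (b, c) in second. */
    static Set<Map.Entry<String, String>> compose(Set<Map.Entry<String, String>> first,
                                                   Set<Map.Entry<String, String>> second) {
        Set<Map.Entry<String, String>> result = new HashSet<>();
        for (Map.Entry<String, String> ab : first)
            for (Map.Entry<String, String> bc : second)
                if (ab.getValue().equals(bc.getKey()))
                    result.add(pair(ab.getKey(), bc.getValue()));
        return result;
    }

    /** A relation is idempotent in this sense if running twice relates the same states as running once. */
    static boolean isIdempotent(Set<Map.Entry<String, String>> relation) {
        return compose(relation, relation).equals(relation);
    }
}
```

For instance, a relation modelling "create the file if it is absent", {(absent, present), (present, present)}, equals its self-composition, whereas a relation that toggles between absent and present does not.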

The contributions of this work for this case study are:


We start in the next section with an overview of our method illustrated on a concrete example. Section 3 explains in greater detail the elements of our toolchain, the particular challenges, the hypotheses that we could make for the specific Debian use case at hand, and the solution that we have found. Section 4 presents the results we have found so far on the Debian packages, and the lessons learnt. We conclude in Section 5 by discussing additional outcomes of this study, the related and future work.

## 2 Overview of the case study and analysis methodology

#### 2.1 Debian packages

Three components of a Debian binary package play an important role in the installation process: the static content, i.e., the archive of files to be placed on the target machine when installing the package; the lists of dependencies and predependencies, which tell us which packages can be assumed present at different moments; and the maintainer scripts, i.e., a possibly empty subset of four scripts called preinst, postinst, prerm, and postrm. We found (Section 4.2) that 99% of the maintainer scripts in Debian are written in Posix shell [22].

Our running example is the binary package rancid-cgi [31]. It comes with only two maintainer scripts: preinst and postinst. The preinst script is included in Fig. 1. If the symbolic link /etc/rancid/lg.conf exists then it is

```
if [ -h /etc/rancid/lg.conf ]; then
    rm /etc/rancid/lg.conf
fi
if [ -e /etc/rancid/apache.conf ]; then
    rm /etc/rancid/apache.conf
fi
```
Fig. 1. preinst script of the rancid-cgi package

removed; if the file /etc/rancid/apache.conf exists, no matter its type, it is also removed. Both removal operations use the Posix command rm which, without options, cannot remove directories. Hence, if /etc/rancid/apache.conf is a directory, this script fails while trying to remove it.
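The failure case can be replayed on a small concrete model of the file system. The sketch below is a strong simplification and not the symbolic analysis of the toolchain: a file system is a map from paths to a file kind, symbolic link targets are ignored (so the test -e is approximated by mere presence of the path), and the names `PreinstModel`, `Kind`, `rm` and `preinst` are hypothetical.

```java
import java.util.Map;

/** Concrete toy model of the preinst script of Fig. 1. */
final class PreinstModel {
    enum Kind { REGULAR, SYMLINK, DIRECTORY }

    static final String LG_CONF = "/etc/rancid/lg.conf";
    static final String APACHE_CONF = "/etc/rancid/apache.conf";

    /** Models 'rm path' without options: returns exit code 0 on success, 1 on failure. */
    static int rm(Map<String, Kind> fs, String path) {
        Kind kind = fs.get(path);
        if (kind == null || kind == Kind.DIRECTORY) return 1; // missing path or directory
        fs.remove(path);
        return 0;
    }

    /** Models the two 'if ... then rm ... fi' blocks and returns the exit code of the script. */
    static int preinst(Map<String, Kind> fs) {
        if (fs.get(LG_CONF) == Kind.SYMLINK) {   // [ -h /etc/rancid/lg.conf ]
            rm(fs, LG_CONF);                      // always succeeds in this model
        }
        if (fs.containsKey(APACHE_CONF)) {        // [ -e /etc/rancid/apache.conf ], approximated
            return rm(fs, APACHE_CONF);           // fails if the path is a directory
        }
        return 0;
    }
}
```

Calling `preinst` on a state in which /etc/rancid/apache.conf has kind DIRECTORY yields a non-zero exit code and leaves the directory in place, which corresponds to the error case described above.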

We did a statistical analysis of maintainer scripts in Debian to help us design our intermediate language, see Section 4.2 for details. We found that, for instance, most variables in these scripts can be expanded statically and hence are used like constants; most while loops can be translated into for loops; recursive functions are not used at all; redirections are almost always used to discard the standard output of commands.

#### 2.2 Managing package installation

The maintainer scripts are invoked by the dpkg utility when installing, removing or upgrading packages. Roughly speaking, for installation dpkg calls the preinst before the package static content is unpacked, and calls the postinst afterwards. For deinstallation, it calls the prerm before the static content is removed and calls the postrm afterwards. The precise sequence of script invocations and the actual script parameters are defined by informal flowcharts in the Debian policy [4, Appendix 9]. Fig. 2 shows the flowchart for the package installation. dpkg may be asked to: install a package that was not previously installed (Fig. 2), install a package that was previously removed but not purged, upgrade a package, remove a package, purge a package previously removed, remove and purge a package. These tasks include 39 possible execution paths, 4 of them presented in Fig. 2.

The Debian policy contains [4, Chapters 6 and 10] several requirements on maintainer scripts. This case study targets checking the requirements regarding the execution of scripts, and considers out of scope some other kinds of requirements, e.g., the permissions of script files. The requirements of interest are checked by different tools of our toolchain presented in Section 3. For example, the different ways to invoke a maintainer script are handled by the analysis of scenarios (Section 3.5) calling the scripts. Different requirements on the usage of the shell language are checked by the syntactic analysis (Section 3.1), like the usage of -e mode or of authorised shell features that are optional in the Posix standard. Some of the usage requirements can be detected by a semantic analysis; this is done in our toolchain by a translation into a formally defined language, called CoLiS (Section 3.1). Finally, requirements concerning the behaviour of scripts include the usage of exit codes and the idempotency of scripts. The last property is difficult to formalise since it refers to possible unforeseen failures (see discussion in Section 4.4). Checking behavioural properties requires reasoning about the semantics of the scripts, which is done by a symbolic execution in our toolchain (Section 3.4). We also check some requirements that are simply common sense and that are not stated in the policy, e.g., invoking Unix commands with correct options. This is done by the semantic analysis (Section 3.1).

Fig. 2. Debian flowchart for installing a package [4, Appendix 9] (The states represent calls to maintainer scripts with their arguments and the status returned by dpkg at the end of the process is in bold.)

#### 2.3 Principles and workflow of the analysis method

Our goal is to check the above properties of maintainer scripts in a formal way, by analysing each script and the composition of scripts in the execution paths exhibited by the flowcharts of dpkg. We call scenario either an execution path of dpkg, a single execution of a script, or a double execution of a script with the same parameters (to check idempotency); refer to Section 3.5 for more details.

The analysis should consider a variety of states for the system on which the execution takes place. Yet we assume the following hypotheses: the scripts are executed in a root process without concurrency with other user or root processes, the static content of the package is successfully unpacked, the dependencies defined by the package are present (fact checked by dpkg), and the /bin/sh command implements the standard Posix.1-2017 Shell Command Language with the additional features described in the Debian policy [4, Chapter 10].

The components of our toolchain for the analysis of a scenario are summarised in Fig. 3 and detailed in Section 3. Given a package and one scenario, the scenario player extracts the static content and the maintainer scripts, prepares the initial symbolic state of the scenario, symbolically executes the steps of the scenario to compute a symbolic relation between the input and the output states of the file system for each outcome of the scenario, and produces a diagnosis.

Fig. 3. Toolchain for analysis of a scenario on a given package (see Section 2.3)

#### 2.4 Presentation of results

The results computed by the scenario player are presented in a set of web pages, one per scenario, and a summary page for the package [34]. Each scenario may have several computed exit codes; for an error code, the associated symbolic relation is translated automatically into a diagnosis message.

For example, consider the simple scenario of a call to the script preinst given in Fig. 1. The result web page includes the diagram in Fig. 4, which is obtained by interpreting the symbolic relation computed by the scenario player for the error exit code. The diagram represents an abstraction of the initial file system on the left, an abstraction of the file system at the end of the script's execution on the right, and the relation between these abstractions (dotted lines). In this diagram, a plain edge represents the parent relation in the file hierarchy. A dotted edge describes a similarity relation, e.g., the trees rooted at /etc coincide except on the child named rancid. ⊥ denotes the absence of a node. Finally, a leaf can be annotated by a property, e.g., the annotation dir on the leaf /etc/rancid/apache.conf. The diagram shows that the preinst script leads to an error state when the file /etc/rancid/apache.conf is a directory, since the rm command cannot remove directories.

Fig. 4. Example of diagnosis: error case for preinst call in the package rancid-cgi

Finally, another set of generated web pages provides statistics on the coverage and the errors found for the full set of scenarios of the Debian distribution.

## 3 Design and implementation of the tool chain

The toolchain, as described in Fig. 3, hinges on a symbolic execution engine which computes the overall effect of a script on the file system as a symbolic relation between the input and the output file system. This section details this execution engine, which is composed of (i) a front-end that parses the script and translates it into a script in a formally defined intermediate language called CoLiS, and (ii) a back-end that symbolically executes the CoLiS scripts to get, for each outcome of the script, the relation between input and output file systems encoded by a tree constraint.

#### 3.1 Front-end

Shell parser. The syntax of the Posix shell language is unconventional in many aspects. For this reason, the implementation of a parser for Posix shell cannot simply reuse the standard techniques solely based on code generators. Most shell implementations fall back to manually written character-level parsers, which are difficult to maintain and to trust. morbig [30] is a parser that uses code generators as much as possible to keep the parser implementation at a high level of abstraction, simplifying maintenance and improving our ability to check that it complies with the Posix standard.

The CoLiS language. It was first presented in 2017 [23]. Its design aims to avoid some pitfalls of the shell, and to make explicit the dangerous constructions we cannot eliminate. It has a clear syntax and a formally defined semantics. We provide an automated and direct translation from Posix shell. The correctness of the translation from shell to CoLiS cannot be proven formally but must be trusted based on manual review of translations and tests.

For this case study, we improved the language proposed formerly [23] to increase the number of analysed Debian maintainer scripts. First, we added a number of constructs to the language. Second, we provide a formal semantics for the new constructs and we align the previous semantics [23] to the one of the Posix shell for a few other constructs. These changes and a complete description of the current CoLiS language are described in a technical report [6]. Fig. 5 shows the CoLiS version of the preinst script of the rancid-cgi package, shown previously in Fig. 1. Notice the syntax for string arguments and for lists of arguments that requires mandatory usage of delimiters. Generally speaking, the syntax of CoLiS is designed so as to remove potential ambiguities [6].

```
if test [ '-h'; '/etc/rancid/lg.conf' ] then
  rm [ '/etc/rancid/lg.conf' ]
fi
if test [ '-e'; '/etc/rancid/apache.conf' ] then
  rm [ '/etc/rancid/apache.conf' ]
fi
```
Fig. 5. preinst script of the rancid-cgi package in CoLiS

The toolchain for analysing CoLiS scripts is designed with formal verification in mind: the syntax, semantics, and interpreters of CoLiS are implemented using the Why3 environment [7] for formal verification. More precisely, the syntax of CoLiS is defined abstractly (as abstract syntax trees, AST for short) by an algebraic datatype in Why3. The semantics of CoLiS is then defined by a set of inductive predicates [6] that encode a mostly standard, big-step operational semantics. The semantic rules cover the contents of variables and input/output buffers used during the evaluation of a CoLiS script, but they do not specify the contents of the file system and the behaviour of Posix commands. The judgements and rules are parameterised by bounds on the number of loop iterations and the number of (recursively) nested function calls, which makes it possible to formalise the correctness of the symbolic interpreter. Each bound is either a non-negative integer, or ∞ for unbounded execution, and remains constant throughout the evaluation of a CoLiS instruction. We refer to [6] for the details.
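To make the role of the AST and of the loop bound concrete, the following is a minimal Python sketch of a bounded big-step evaluator. It is not the Why3 formalisation: the four constructors (`Call`, `Seq`, `If`, `While`), the `LoopLimit` exception, and the `utility` callback are hypothetical names chosen for illustration, and the behaviour of commands is deliberately left to the caller, mirroring the fact that the semantic rules do not specify it.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple, Union


@dataclass(frozen=True)
class Call:            # call of a utility with a list of string arguments
    name: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Seq:             # first ; second
    first: "Instr"
    second: "Instr"


@dataclass(frozen=True)
class If:              # if cond then body fi
    cond: "Instr"
    then: "Instr"


@dataclass(frozen=True)
class While:           # while cond do body done
    cond: "Instr"
    body: "Instr"


Instr = Union[Call, Seq, If, While]


class LoopLimit(Exception):
    """Raised when a loop would exceed the given bound."""


def run(instr: Instr, utility: Callable[[str, Tuple[str, ...]], bool],
        loop_bound: float = math.inf) -> bool:
    """Return True on success and False on failure (simplified behaviour)."""
    if isinstance(instr, Call):
        return utility(instr.name, instr.args)
    if isinstance(instr, Seq):
        # Simplification: the result of the first instruction is ignored.
        run(instr.first, utility, loop_bound)
        return run(instr.second, utility, loop_bound)
    if isinstance(instr, If):
        if run(instr.cond, utility, loop_bound):
            return run(instr.then, utility, loop_bound)
        return True
    if isinstance(instr, While):
        iterations, result = 0, True
        while run(instr.cond, utility, loop_bound):
            if iterations >= loop_bound:   # the bound itself never changes
                raise LoopLimit()
            iterations += 1
            result = run(instr.body, utility, loop_bound)
        return result
    raise TypeError(f"unknown instruction: {instr!r}")


# The preinst script of Fig. 5 as an AST.
preinst = Seq(
    If(Call("test", ("-h", "/etc/rancid/lg.conf")),
       Call("rm", ("/etc/rancid/lg.conf",))),
    If(Call("test", ("-e", "/etc/rancid/apache.conf")),
       Call("rm", ("/etc/rancid/apache.conf",))),
)

calls = []


def stub_utility(name, args):
    """Hypothetical oracle: every test succeeds, every rm fails."""
    calls.append((name, *args))
    return name == "test"


result = run(preinst, stub_utility)
```

With this stub the second `rm` fails, so `result` is `False`, which corresponds to the error outcome discussed for Fig. 4. Passing a finite `loop_bound` turns a non-terminating loop into a `LoopLimit` outcome instead of divergence.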

A concrete interpreter for the CoLiS language is implemented in Why3. Its formal specifications (preconditions and post-conditions) state the soundness of the interpreter, i.e., that any result corresponds to the formal semantics with unbounded number of loop iterations and unbounded nested function calls. The specifications are checked using automated theorem provers [23].

Translation from shell to CoLiS. This is done automatically, but it is not formally proven. Indeed, a formal semantics of shell was missing until very recently [21]. For the control flow constructs, the AST of the shell script is translated into the AST of CoLiS. For the strings (words in shell), the translation generates either a string CoLiS expression or a list of CoLiS expressions depending on the content of the shell string. This translation makes explicit the string evaluation in shell, in particular the implicit string splitting. At present, the translator rejects 23% of shell scripts because it does not cover all constructs of the shell, e.g., usage of globs, variables with parameters, and advanced uses of redirections.
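The implicit string splitting mentioned above can be illustrated by a small Python sketch. It is only a simplified model, assuming the default field separators (whitespace) and no globbing; the fragment kinds `literal`, `quoted_var`, and `unquoted_var` are hypothetical names and not the constructs of the actual translator. A shell word is given as a sequence of fragments and evaluates to a list of fields: an unquoted variable may contribute zero, one, or several fields, whereas a quoted variable always stays within one field.

```python
def expand_word(fragments, env):
    """Evaluate one shell word (a list of (kind, text) fragments) to fields."""
    fields = []
    current = None            # None means that no field has been started yet
    for kind, text in fragments:
        if kind == "literal":
            current = (current or "") + text
        elif kind == "quoted_var":
            # "$x": never split, and yields a field even when empty.
            current = (current or "") + env.get(text, "")
        elif kind == "unquoted_var":
            # $x: split on whitespace (default separators assumed).
            value = env.get(text, "")
            parts = value.split()
            if value[:1].isspace() and current is not None:
                fields.append(current)
                current = None
            for index, part in enumerate(parts):
                if index > 0:
                    fields.append(current)
                    current = None
                current = (current or "") + part
            if value[-1:].isspace() and current is not None:
                fields.append(current)
                current = None
        else:
            raise ValueError(f"unknown fragment kind: {kind}")
    if current is not None:
        fields.append(current)
    return fields


env = {"files": "a.conf b.conf"}
unquoted = expand_word([("unquoted_var", "files")], env)   # like: rm $files
quoted = expand_word([("quoted_var", "files")], env)       # like: rm "$files"
```

Here `unquoted` is a list of two arguments while `quoted` is a single argument containing a space, which is the kind of difference that a translation into explicit strings and lists of strings makes visible.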

The conformance of the CoLiS script with the original shell script is not proven formally but tested by manual review and some automatic tests. For the latter, we developed a tool that automatically compares the results of the CoLiS interpreter on the CoLiS script with the results of the Debian default shell (dash) on the original shell script. This tool uses a test suite of shell scripts built to cover all constructs of the CoLiS language. The test suite allowed us to fix the translator and the formal semantics of CoLiS and, as an additional outcome, it revealed a lack of conformance between the Debian default shell and Posix<sup>5</sup>.

<sup>5</sup> https://www.mail-archive.com/dash@vger.kernel.org/msg01683.html

Fig. 6. Examples of feature trees showing directories (t1), sub-directories (t2), a regular file and a symbolic link (t3).


Fig. 7. Basic constraints, from left to right: a feature, a regular file node, a directory node, a tree similarity, a feature absence, a maybe

#### 3.2 Feature trees and constraints

We employ models and logics to describe transformations of UNIX file systems. Feature trees [32,3,33] turn out to be suitable models for this case study. We have proposed a logic suitable to express file system transformations by extending previously existing logics. For the sake of space, we provide a concise overview of the model and logic used in this case study.

Feature trees. The models we consider here are trees with features (taken from F, an infinite set of legal file names) on the edges, the dir kind on the nodes and any kind (dir, reg or symlink) on the leaves. Examples are given in Fig. 6.

Constraints. To specify properties of feature tree models, we modify our first order logic [26] to suit this case study's needs. For the sake of presentation, we use a graphical representation of quantifier-free conjunctive clauses of this logic. See the technical report [24] for a detailed presentation.

The core basic constraints are presented in Fig. 7. The feature constraint expresses that y is a subtree of x accessible from the root of x via feature f. The kind constraints express that the root of a tree has the given kind (dir, reg or symlink). The similarity constraint expresses that x and y have the same children with the same names except for the children whose names are in F, a finite set of features, where they may differ.

For performance reasons, we added two more constraints; these do not increase the expressive power but help to prevent combinatorial explosion of formulas. The absence constraint expresses that either x is not a directory or x does not have a feature f at its root. The maybe constraint expresses that either x is not a directory, or it does not have a feature f at its root, or it has one that leads to y.
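As an illustration, the five basic constraints can be read as predicates over concrete feature trees. The Python sketch below is an assumption-laden model, not the representation used in the toolchain: the class `Node` and the function names are hypothetical, subtree equality is structural, and the kinds of the roots are ignored by the similarity check.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """A feature tree: a kind at the root and named edges to subtrees."""
    kind: str = "dir"                       # "dir", "reg" or "symlink"
    children: Dict[str, "Node"] = field(default_factory=dict)


def feature(x: Node, f: str, y: Node) -> bool:
    """x[f]y: y is the subtree of x reached from its root via feature f."""
    return f in x.children and x.children[f] == y


def has_kind(x: Node, kind: str) -> bool:
    """The root of x has the given kind."""
    return x.kind == kind


def similar(x: Node, y: Node, excluded) -> bool:
    """x and y agree on all children whose names are not in `excluded`."""
    names = (set(x.children) | set(y.children)) - set(excluded)
    return all(x.children.get(n) == y.children.get(n) for n in names)


def absent(x: Node, f: str) -> bool:
    """Either x is not a directory or x has no feature f at its root."""
    return x.kind != "dir" or f not in x.children


def maybe(x: Node, f: str, y: Node) -> bool:
    """Either f is absent from x, or it leads to y."""
    return absent(x, f) or x.children[f] == y


# Hypothetical example: a directory before and after removing lg.conf.
lg = Node("reg")
rancid_before = Node("dir", {"lg.conf": lg, "apache.conf": Node("reg")})
rancid_after = Node("dir", {"apache.conf": Node("reg")})
```

For these two trees, `similar(rancid_before, rancid_after, {"lg.conf"})` and `absent(rancid_after, "lg.conf")` both hold, which is the shape of relation used later for the success case of rm. The definition of `maybe` also shows why it adds no expressive power: it is a disjunction of an absence and a feature constraint.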

Fig. 8. A conjunctive clause

A model of a formula is a valuation that maps variables to feature trees. For instance, consider the valuation that associates t1 to x, t2 to y and t3 to z, where t1, t2 and t3 are the trees defined in Fig. 6; it satisfies the formula in Fig. 8.

Satisfiability. We designed a set of transformation rules [26] that turns any Σ1 formula into an irreducible form that is either false or a satisfiable formula. This is convenient in our setting because we can detect unsatisfiable formulas as soon as possible and keep the irreducible form instead of the original formula, speeding up further computations. Our toolchain includes an implementation of this system, using an efficient representation of irreducible Σ1-formulas as trees themselves. Finally, the system of rules is also extended to a quantifier elimination procedure, showing that the whole first-order logic is decidable.

#### 3.3 Specifications of UNIX commands

The specification of the UNIX commands uses our feature tree logic to express their effect on the file system. The specification formalises the description given in natural language in the Posix standard [22, Chapter Utilities] and, for some commands, in GNU manual pages. We only specified (most of) the UNIX commands called by the maintainer scripts.

The full specification is available in a separate technical report [24]. We present here its main ingredients. A UNIX command has the form: "cmd options paths", where "cmd" is a command name, "options" is a list of options, and "paths" is one or more absolute or relative paths (i.e., sequences of file names and symbols "." and ".."). For each combination of command name and option, we provide a list of formulas specifying the success and failure cases. A success or failure case formula has two free variables r and r′, which represent the root of the file system before and after the command execution. For some combinations of command names and options, the specification is not provided, but computed by the symbolic execution of a CoLiS script. This script captures the command behaviour by calling other (primitive) commands.

Path resolution. An important ingredient in command specification is the constraint encoding the resolution of a path in the file system. For this, we define a predicate resolve(r, cwd, p, z) stating that "when the root of the file system is r and the current working directory is the sequence of features cwd, the path p resolves and goes to variable z". The constraint defining this predicate is a Σ1 conjunction of basic constraints; it does not deal with symbolic link files on the path. For example, the constraint resolve(r, cwd, /etc/rancid/lg.conf, z) is represented by the path starting from r and ending in z in Fig. 9.

Fig. 9. Specification of success case for rm /etc/rancid/lg.conf

Fig. 10. Specification of error cases of rm /etc/rancid/lg.conf: explicit cases on the left, compact specification on the right
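The intended meaning of resolve(r, cwd, p, z) can be sketched on concrete trees as follows. This Python function is a hypothetical concrete counterpart of the predicate, not the constraint itself: directories are modelled as nested dictionaries, any other value is a non-directory leaf, symbolic links are not followed, and an unresolvable path is signalled by `None`. Edge cases such as the empty path or a trailing slash are not treated faithfully.

```python
def resolve(root, cwd, path):
    """Return the node that `path` leads to, or None if it does not resolve.

    `root` is the tree of the whole file system and `cwd` is the current
    working directory given as a list of features from the root.
    """
    components = path.split("/")
    if not path.startswith("/"):
        components = list(cwd) + components      # relative path: start in cwd
    stack = [root]                               # nodes from root to current
    for name in components:
        if name == "":
            continue                             # leading or doubled slash
        if not isinstance(stack[-1], dict):
            return None                          # cannot go below a leaf
        if name == ".":
            continue
        if name == "..":
            if len(stack) > 1:                   # ".." at the root stays put
                stack.pop()
            continue
        if name not in stack[-1]:
            return None                          # missing feature
        stack.append(stack[-1][name])
    return stack[-1]


# Hypothetical file system with one regular file.
fs = {"etc": {"rancid": {"lg.conf": "reg"}}}
found = resolve(fs, [], "/etc/rancid/lg.conf")
```

Each step of the loop corresponds to one feature constraint along the path in Fig. 9, which is why the constraint for a successful resolution is a plain conjunction, while every step is also a separate way in which the resolution can fail.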

For some commands, a failure of path resolution may cause the failure of the command. To specify these failure cases, we have to use the negation of the predicate resolve, which generates a number of clauses which is linear in the length of the resolved path. Fig. 10 shows, in the three left-most constraints, the error cases for the resolution of the path to /etc/rancid/lg.conf. Because the internal representation of formulas keeps only conjunctive clauses, this may produce a state explosion of constraints when the command uses several paths. To obtain a compact internal representation of these error cases, we employ the maybe shorthand, as shown on the right of Fig. 10.

Let us consider the command rm /etc/rancid/lg.conf. Its specification includes one success case, given in Fig. 9: the resolution of the path /etc/rancid/lg.conf succeeds in the initial file system, denoted by r, and the resulting file system, denoted by r′, is similar to r except for the absence of the feature lg.conf. The specification also includes one error case, given in Fig. 10, where the path cannot be resolved to a regular path, and therefore the initial and final file systems are the same.
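The two cases of this specification can be mimicked on concrete file systems. The sketch below is a simplified model under explicit assumptions: only rm without options on a single absolute path other than the root, directories as nested dictionaries, no symbolic links and no permissions. The function name `rm` and its return convention (a success flag and the resulting root) are chosen for illustration.

```python
import copy


def rm(root, path):
    """Model of `rm path`: return (success, resulting file system)."""
    *parents, name = [part for part in path.split("/") if part]
    directory = root
    for feature in parents:
        if not isinstance(directory, dict) or feature not in directory:
            return False, root             # error case: path does not resolve
        directory = directory[feature]
    if not isinstance(directory, dict) or name not in directory:
        return False, root                 # error case: last feature missing
    if isinstance(directory[name], dict):
        return False, root                 # error case: target is a directory
    # Success case: the result is similar to the input except for `name`.
    new_root = copy.deepcopy(root)
    target_directory = new_root
    for feature in parents:
        target_directory = target_directory[feature]
    del target_directory[name]
    return True, new_root


# Hypothetical initial state in which apache.conf is a directory.
before = {"etc": {"rancid": {"lg.conf": "reg", "apache.conf": {}}}}
ok_lg, after_lg = rm(before, "/etc/rancid/lg.conf")
ok_apache, after_apache = rm(before, "/etc/rancid/apache.conf")
```

In the error cases the returned file system is the input itself, matching the statement that the initial and final file systems are the same. The second call reproduces, on a concrete state, the error of the preinst script shown in Fig. 4.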

It is important to notice that specifications of commands are parameterised by their path(s) argument(s): for each concrete value of such paths, an appropriate constraint is produced. This fact is essential for using our symbolic engine, because the variables of a constraint denote nodes of the file system, but there is no notion of variable denoting file names or paths.

#### 3.4 Analysis by symbolic execution

With a similar approach as for the concrete interpreter (Section 3.1), we designed and implemented a symbolic interpreter for the CoLiS language in Why3. Guided by a proof-of-concept symbolic interpreter for a simple IMP language [5], the main design choices for the symbolic interpreter for CoLiS are:


The Why3 code for the symbolic interpreter is annotated with post-conditions to express that it computes an over-approximation [5] of the concrete states that are reachable without exceeding the given bound on loop iterations. This property is formally proven using automated provers. The OCaml code is automatically extracted from Why3, and provides an executable symbolic interpreter with strong guarantees of soundness with respect to the concrete formal semantics.

Notice that our symbolic engine supports neither parallel executions, nor file permissions, nor file timestamps. This is another source of over-approximation, but also of under-approximation, meaning that our approach can miss bugs whose triggering relies on these features.

The symbolic interpreter provides a symbolic semantics for the given script: given an initial symbolic state that represents the possible initial shape of the file system, it returns a triple of sets of symbolic input/output relations, respectively for normal result, error result (corresponding to non-zero exit code) and result when a loop limit is reached. Error results are unexpected for Debian maintainer scripts, and these cases have to be inspected manually. To help this inspection, a visualisation of symbolic relations was designed, as already described in Fig. 4.

#### 3.5 Scenarios

So far, we have presented how we analyse individual maintainer scripts. In reality, the Debian policy specifies in natural language in which order and with which arguments these scripts are invoked during package installation, upgrade, or removal (see, for instance, Fig. 2). We have specified these scenarios in a loop-free custom language. These scenarios define what happens after the success or the failure of a script execution. They also specify when the static content is unpacked. Furthermore, our toolchain makes it possible to define the assumptions that can be made on an initial file system before executing a scenario, for instance the File System Hierarchy Standard [38]. Our toolchain reports on packages that may remain in an unexpected state after the execution of one of these scenarios.

For instance, the installation scenario of the package rancid-cgi may leave that package in the state not-installed, which is reported by our toolchain using the diagram in Fig. 4.

## 4 Results and impact

#### 4.1 Coverage of the case study

The tools used and the datasets analysed during the current study are available in the Zenodo repository [36].

We execute the analysis on a machine equipped with 40 hyperthreaded Intel Xeon CPUs @ 2.20GHz and 750GB of RAM. To obtain a reasonable execution time, we limit the processing of one script to 60 seconds and 8GB of RAM. The time limit might seem low, but experience shows that the few scripts (in 30 packages) that exceed this limit actually require hours of processing because they make heavy use of dpkg-maintscript-helper. On our corpus of 12 592 packages with 28 814 scripts, the analysis runs in about half an hour.

All of those scripts that are syntactically correct with respect to the Posix standard (99.9%) are parsed successfully by our parser. The translation of the parsed scripts into our intermediary language CoLiS succeeds for 77% of them; the translation fails mainly because of the use of globs, variables with parameters and advanced uses of redirections.

Our toolchain then attempts to run 113 328 scenarios (12 592 packages with scripts, 9 scenarios per package). Out of those, 45 456 scenarios (40%) are run completely and 13 149 (12%) partially. This is because scenarios have several branches and, although a branch might encounter a failure, we try to get some information on the execution of the other branches. For the same reason, one scenario might encounter several failures. In total, we encounter 67 873 failures. Failures have multiple origins, but the two main ones are (i) scenarios that include a script that we cannot convert (28% of failures), and (ii) scripts that use commands unsupported by our tools, or unsupported features of supported commands (71% of failures).
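The figures in this paragraph are mutually consistent, as the following short Python check shows. It uses only the numbers stated above; the variable names are chosen for illustration.

```python
packages_with_scripts = 12_592
scenarios_per_package = 9
total_scenarios = packages_with_scripts * scenarios_per_package

complete = 45_456      # scenarios run completely
partial = 13_149       # scenarios run partially


def percentage(part, whole):
    """Share of `part` in `whole`, rounded to the nearest percent."""
    return round(100 * part / whole)


complete_share = percentage(complete, total_scenarios)
partial_share = percentage(partial, total_scenarios)
at_least_partial = complete + partial
```

The product gives the 113 328 scenarios, the two shares round to 40% and 12%, and 58 605 scenarios are executed at least partially.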

Among the scenarios that we manage to execute at least partially, 19 reach an unexpected end state. These are potential bugs. We have examined them manually to remove false positives due to approximations done by our methodology or the toolchain. We discuss in Section 4.3 the main classes of true bugs revealed by this process.

#### 4.2 Corpus mining

The latest version of the Debian sid distribution on which we ran our tools dates from October 6, 2019. It contains 60 000 packages, 12 592 of which contain at least one maintainer script, which leads to 28 814 scripts. In total, these scripts contain 442 364 source lines of code, 15 lines on average, and up to 1 138 for the largest script. Among them we find 220 bash scripts, 2 dash scripts, 14 perl scripts, and one ELF executable; the rest are Posix shell scripts.

Table 1. Bugs found between 2016 and 2019 in Debian sid distributions

In the process of designing our tools, and in order to validate our hypotheses, we ran statistical analysis on this corpus of scripts. The construction of our tool for statistical analysis is described in a technical report [25] where we also detail a few of our findings. To summarise, analysing the corpus revealed that:


This analysis had an important impact on the project by guiding the design choices of CoLiS, which Unix commands we should specify and in which order, etc. This also helped us to discover a few bugs, e.g., scripts invoking Unix commands with invalid options.

#### 4.3 Bugs found

We ran our toolchain on several snapshots of the Debian sid distribution taken between 2016 and 2019, the latest one being October 6, 2019. We reported over this period a total of 152 bugs to the Debian Bug Tracking System [37]. Some of them have immediately been confirmed by the package maintainer (for instance, [16]), and 96 of them have already been resolved.

Table 1 summarises the main categories of bugs we reported. Simple lexical analysis already detects 95 violations of the Debian Policy, for instance scripts that do not specify the interpreter to be used, or that do not use the -e mode [9]. The shell parser (Section 3.1) detects 3 scripts that use shell constructs not allowed by the Posix standard, or in a context where the Posix standard states that the behaviour is undefined [15]. There are also 3 miscellaneous bugs, like using unsafe shell constructs. The mining tool (Section 4.2) detects 5 scripts that invoke Unix commands with wrong options and 29 scripts that mix up redirection of standard-output and standard-error. The translation from the shell to the CoLiS language (Section 3.1) detects 9 scripts with wrong test expressions [11]. These may stay unnoticed during superficial testing since the shell confuses, when evaluating the condition of an if-then-else, an error exception with the Boolean value False. Inspection of the symbolic semantics extracted by the symbolic execution (Section 3.4) finds 5 scripts with semantic errors. Among these is the bug [16] of the package rancid-cgi already explained in Section 2.4. During the formalisation of Debian tools (see Section 3.3), we found 3 bugs. These include in particular a bug [12] in the dpkg-maintscript-helper command which is used 10 306 times in our corpus of maintainer scripts, and was fixed in the meantime.
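The per-category counts given in this paragraph add up to the total of 152 reported bugs. The Python snippet below tallies them; the category labels are paraphrases of the text above and not necessarily the row names of Table 1.

```python
bugs_by_category = {
    "policy violations found by lexical analysis": 95,
    "non-Posix or undefined shell constructs": 3,
    "miscellaneous": 3,
    "Unix commands invoked with wrong options": 5,
    "mixed-up redirections of stdout and stderr": 29,
    "wrong test expressions": 9,
    "semantic errors found by symbolic execution": 5,
    "bugs in Debian tools found during formalisation": 3,
}

total_bugs = sum(bugs_by_category.values())
largest_category = max(bugs_by_category, key=bugs_by_category.get)
```

The tally also makes explicit that the lightweight lexical checks account for the majority of the reports, while the symbolic execution contributes few but deeper findings.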

#### 4.4 Lessons learnt

One basic problem when trying to analyse maintainer scripts is to understand precisely the meaning of the policy document. For instance, one of the more intriguing requirements is that maintainer scripts have to be idempotent (Section 6.2 in [4]). While it is common knowledge that a mathematical function f is idempotent when f(f(x)) = f(x) for any x, the meaning is much less clear in the context of Debian maintainer scripts as the policy goes on to explain "If the first call failed, or aborted half way through for some reason, the second call should merely do the things that were left undone the first time, if any, and exit with a success status if everything is OK." We suppose that this refers to causes of error external to the script itself (power failure, full disk, etc.), and that there might be an intervention by the system administrator between the two invocations. Since we cannot even explain in natural language what precisely that means, let alone formalise it, we decided to model at the moment only a rough under-approximation of that property that only compares executions by their exit code. This allowed us to detect a bug [14].
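A check that only compares exit codes can be sketched as follows. The exact criterion used in the toolchain is not spelled out here, so this Python model assumes one plausible reading: a script is flagged when its first execution succeeds and a second execution with the same parameters, on the state left by the first, fails. Scripts are modelled as functions that mutate a set of existing paths and return an exit code; all names are hypothetical.

```python
import copy


def double_run(script, initial_fs):
    """Run `script` twice in a row and return both exit codes."""
    fs = copy.deepcopy(initial_fs)      # keep the caller's state untouched
    first = script(fs)
    second = script(fs)
    return first, second


def passes_exit_code_check(script, initial_fs):
    """Flag only the case 'first run succeeds, second run fails'."""
    first, second = double_run(script, initial_fs)
    return not (first == 0 and second != 0)


def postinst_mkdir(fs):
    """Model of `mkdir /var/lib/demo`: fails if the directory exists."""
    if "/var/lib/demo" in fs:
        return 1
    fs.add("/var/lib/demo")
    return 0


def postinst_mkdir_p(fs):
    """Model of `mkdir -p /var/lib/demo`: succeeds in both cases."""
    fs.add("/var/lib/demo")
    return 0


codes = double_run(postinst_mkdir, set())
```

This is an under-approximation in the sense discussed above: a script that passes the check may still leave different file systems after one and after two executions, since only the exit codes are compared.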

We found that identifying bugs in maintainer scripts always requires human examination. Automated tools make it possible to point out potential problems in a large corpus, but deciding whether such a problem actually deserves a bug report, and of what severity level, requires some experience with the Debian processes. This is most visible with semantic bugs in scripts, since an error exit code does not imply that there is a bug. Indeed, if a script detects a situation it cannot handle then it must signal an error and produce a useful error message. Deciding whether a detected error case is justified or accidental requires human judgement.

Filing bug reports demands some caution, and observance of rules and common practices in the community. For instance, the Debian Developers Reference [18] requires approval by the community before so-called mass bug filing. Consequently, we always sought advice before sending batches of bugs, either on the Debian developers mailing list, or during Debian conferences.

## 5 Conclusion

The corpus of Debian maintainer scripts is an interesting case study for analysis due to its size, the challenging features of the scripting language, and the relational properties that must be analysed. The results are very promising. First, we reported 152 bugs [37] to the Debian Bug Tracking System, 96 of which have already been resolved by Debian maintainers. Second, the toolchain performs the analysis of a package in seconds and of the full distribution in less than an hour, which makes it fit for integration in the workflow of Debian maintainers or for quality assurance at the level of the whole distribution. Integration of our toolchain in the lintian tool will not be possible since it would add a lot of external dependencies to that tool, and since the reports generated by our tool still require human evaluation (see Section 4.4).

This study had several additional outcomes. The toolchain includes tools for parsing and light static analysis of shell scripts [30], an engine for the symbolic execution of imperative languages based on first-order logics representation of program configurations [5], and an efficient decision procedure for feature tree logics. We also provide a formal specification of Posix commands used in Debian scripts in terms of a first-order logic [24].

We are not aware of a project dealing with this kind of problem or obtaining comparable results. To our knowledge, the only existing attempt to analyse a complete corpus of package maintainer scripts was done in the context of the Mancoosi project [19]. In this work, the analysis, mainly syntactic, resulted in a set of building blocks used in maintainer scripts that may be used in a DSL. In a series of papers [20,28,29], Ntzik et al. consider formal reasoning about Posix scripts manipulating the file system, based on (concurrent) separation logic. Not only do they employ a different logic (a second-order logic), but they also focus on (manual) proof techniques for correctness and not on automatic techniques for finding bugs. Moreover, they consider general scripts and properties that are not relational (like idempotency). There have been few attempts to formalise the shell. Greenberg [21] recently offered an executable formal semantics of Posix shell that will serve as a foundation for shell analysis tools. Abash [27] contains a formalisation of parts of the bash language and an abstract interpretation tool for the analysis of arguments passed by scripts to Unix commands; this work focused on identifying security vulnerabilities.

The successful outcome of this case study revealed new challenges that we aim to address in future work. In order to increase the coverage of our analysis and the acceptance by Debian maintainers, the translation from shell should cover more features, additional Unix commands should be formally specified, and the model should capture more features of the file system, e.g., permissions, or symbolic links. The efficiency of the analysis can still be improved by using a more compact representation of disjunctive constraints in feature tree logics or by exploiting the genericity of the symbolic execution engine to include other logic based symbolic representations that may be more efficient and precise. Finally, we want to use the computed constraints on scenarios to check new properties of scripts like equivalence of behaviours.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Endicheck: Dynamic Analysis for Detecting Endianness Bugs

Roman Kapl and Pavel Parízek

Department of Distributed and Dependable Systems, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic

Abstract. Computers store numbers in two mutually incompatible ways: little-endian or big-endian. They differ in the order of bytes within the representation of numbers. This ordering is called endianness. When two computer systems, programs or devices communicate, they must agree on which endianness to use, in order to avoid misinterpretation of numeric data values.

We present Endicheck, a dynamic analysis tool for detecting endianness bugs, which is based on the popular Valgrind framework. It helps developers to find those code locations in their program where they forgot to swap bytes properly. Endicheck requires fewer source code annotations than existing tools, such as Sparse used by Linux kernel developers, and it can also detect potential bugs that would only manifest if the given program was run on a computer with the opposite endianness. Our approach has been evaluated and validated on the Radeon SI Linux OpenGL driver, which is known to contain endianness-related bugs, and on several open-source programs. Results of experiments show that Endicheck can successfully identify many endianness-related bugs and provide useful diagnostic messages together with the source code locations of respective bugs.

## 1 Introduction

Modern computers represent and store numbers in two mutually incompatible ways: little-endian (with the least-significant byte first) or big-endian (with the most-significant byte first). The byte order is also referred to as *endianness*.
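The two byte orders can be observed directly with the `struct` module of the Python standard library, where the format prefix `<` selects little-endian and `>` selects big-endian encoding. The value below is an arbitrary example.

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)   # least-significant byte first
big = struct.pack(">I", value)      # most-significant byte first

# Reading big-endian bytes as if they were little-endian yields another number.
misread = struct.unpack("<I", big)[0]
```

The same four bytes appear in reverse order in `little` and `big`, and `misread` is 0x78563412 rather than the original value, which is exactly the kind of misinterpretation that an agreement on endianness prevents.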

Processor architectures typically define a *native endianness*, in which the processor stores all data. When two computer systems or programs exchange data (e.g., via a network), they must first agree on which endianness to use, in order to avoid misinterpretation of numeric data values. Also devices connected to computers may have control interfaces with endianness different from the host's native endianness.

Therefore, programs communicating with other computers and devices need to swap the bytes inside all numerical values to the correct endianness. We use the term *target endianness* to identify the endianness a program should use for data exchanged with a particular external entity. Note that in some cases it is not necessary to know whether the target endianness is actually little-endian or big-endian. When the knowledge is important within the given context, we use the term *concrete endianness*.

If the developer forgets to transform data into the correct target endianness, the bug can often go unnoticed for a long time because software is nowadays usually developed and tested on the little-endian x86 or ARM processor architecture. For example, if two identical programs running on a little-endian architecture communicate over the network using a big-endian protocol, a missing byte-order transformation in the same place in code will not be observed. Our work on this project was, in the first place, motivated by a concrete manifestation of this general issue. The Linux OpenGL driver for Radeon SI graphics cards (the Mesa 17.4 version) does not work on big-endian computers due to an endianness-related bug<sup>1</sup>, as the first author discovered when he was working on an industrial project that involved PowerPC computers in which Radeon graphics cards were to be deployed.
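The scenario of two identical programs can be reproduced in a few lines of Python. This is a toy model under stated assumptions: a host is characterised only by its byte order, given as the `struct` prefix `<` or `>`, and the hypothetical functions `send_without_swap` and `receive_without_swap` stand for code that forgets the conversion to and from the big-endian order required by the protocol.

```python
import struct

LITTLE, BIG = "<", ">"


def send_without_swap(value, host_order):
    """Buggy sender: writes the value in the host's own byte order."""
    return struct.pack(host_order + "I", value)


def receive_without_swap(data, host_order):
    """Buggy receiver: reads the value in the host's own byte order."""
    return struct.unpack(host_order + "I", data)[0]


def send_correct(value):
    """Correct sender for a big-endian (network order) protocol."""
    return struct.pack("!I", value)


value = 0x0A0B0C0D

# Two little-endian hosts: the two missing swaps cancel out.
wire = send_without_swap(value, LITTLE)
seen_by_little_peer = receive_without_swap(wire, LITTLE)

# The same sender talking to a big-endian host: the bug becomes visible.
seen_by_big_peer = receive_without_swap(wire, BIG)
```

Between the two little-endian hosts the value arrives intact although the bytes on the wire violate the protocol, so ordinary testing on such machines cannot reveal the defect. Detecting it without access to a big-endian machine requires inspecting the data that leaves the program.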

We are aware of only a few approaches to the detection of endianness bugs, and these are based on static analysis and manually written source code annotations. An example is Sparse [11], a static analysis tool used by Linux kernel developers to identify code locations where byte-swaps are missing. The analysis performed by Sparse works essentially in the same way as type checking for C programs, and relies on the use of specialized *bitwise* data types, such as \_\_le16 and \_\_be32, for all variables with non-native endianness. Sparse treats integers with different concrete endianness as having mutually incompatible types, and the specialized types are also not compatible with regular C integer types. In addition, macros like le32\_to\_cpu are provided to enable safe conversion between values of the bitwise integer types and integer values of regular types. Such macros are specially annotated so that the analysis can recognize them, and developers are expected to use only those macros.

The biggest advantage of bitwise types is that a developer cannot assign a regular native endianness integer value to a variable of a bitwise type, or vice versa. Their nature also prevents the developer from using them in arithmetic operations, which do not work correctly on values with non-native byte order. On the other hand, a significant limitation of Sparse is that developers have to properly define the bitwise types for all data where endianness matters, and in particular to enable identification of data with concrete endianness — Sparse would produce imprecise results otherwise. Substantial manual effort is therefore required to create all the bitwise types and annotations.
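The principle behind such type-based checking can be approximated in plain C, without Sparse, by wrapping each non-native integer in a distinct struct type so that the ordinary C type checker rejects accidental mixing. The sketch below only illustrates this idea; the names `be32_t`, `cpu_to_be32_s` and `be32_to_cpu_s` are hypothetical and are not part of Sparse or the Linux kernel.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit value stored in big-endian byte order. Because it is a struct,
   it cannot be assigned to or from a plain uint32_t, and it cannot be used
   in arithmetic without an explicit conversion. */
typedef struct { uint8_t bytes[4]; } be32_t;

/* Native integer -> big-endian representation (independent of the host). */
static be32_t cpu_to_be32_s(uint32_t x) {
    be32_t r = {{ (uint8_t)(x >> 24), (uint8_t)(x >> 16),
                  (uint8_t)(x >> 8),  (uint8_t)x }};
    return r;
}

/* Big-endian representation -> native integer. */
static uint32_t be32_to_cpu_s(be32_t v) {
    return ((uint32_t)v.bytes[0] << 24) | ((uint32_t)v.bytes[1] << 16) |
           ((uint32_t)v.bytes[2] << 8)  |  (uint32_t)v.bytes[3];
}
```

With these definitions, a statement such as `uint32_t n = cpu_to_be32_s(1500) + 1;` fails to compile, which mirrors the protection described above. As with Sparse, every affected variable still has to be declared with the wrapper type by hand.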

Our goals in this whole project were to explore an approach based on dynamic analysis, and to reduce the amount of necessary annotations in the source code of a subject program. We present Endicheck, a dynamic analysis tool for detecting endianness bugs that is implemented as a plugin for the Valgrind framework [6]. The main purpose of the dynamic analysis performed by Endicheck is to track endianness of all data values in the running subject program and report when any data leaving the program has the wrong endianness. The primary target domain consists of programs written in C or C++, and in which developers need to explicitly deal with endianness of data values.

While the method for endianness tracking that we present is to a large degree inspired by dynamic *taint* analyses (see, e.g., [8]), our initial experiments showed that usage of existing taint analysis techniques and tools does not give good results, especially with respect to precision. For example, an important limitation of the basic taint analysis, when used for endianness checking, is that it would report false positives on data that needs no byte-swapping, such as single byte-sized values. Therefore, we had to modify and extend the existing taint analysis algorithms for the purpose of endianness checking. During our work on Endicheck, we also had to solve many associated technical challenges, especially regarding storage and propagation of metadata that contain the endianness information; this includes, for example, precise tracking of single-byte values.

<sup>1</sup> https://bugs.freedesktop.org/show\_bug.cgi?id=99859

Endicheck is meant to be used only during the development and testing phases of the software lifecycle, mainly because it incurs a substantial runtime overhead that is not acceptable for production deployment. Before Endicheck can be used, the subject program needs to be modified, but only to inform the analysis engine where the byte order is being swapped and where data values are leaving the program. In C and C++ programs, byte-order swapping is typically done by macros provided in the system C library, such as htons/htonl or those defined in the endian.h header file. Thus only these macros need to be annotated. During the development of Endicheck, we redefined each of those macros such that the custom variant calls the original macro and adds the necessary annotations; for examples, see Figure 1 in Section 4 and the customized header file inet.h<sup>2</sup>. Similarly, data also tend to leave the program only through a few procedures. For some programs, the appropriate place to check for correct endianness is the send/write family of system calls.

Endicheck is released under the GPL license. Its source code is available at https://github.com/rkapl/endicheck.

The rest of the paper is structured as follows. Section 2 begins with a more thorough overview of the dynamic analysis used by Endicheck, and then provides details about the way endianness information for data values is stored and propagated. This represents our main technical contribution, together with the evaluation of Endicheck on the Radeon SI driver and several other real programs, which is described in Section 5. Besides that, we also provide some details about the implementation of Endicheck (Section 3) together with a short user guide (Section 4).

## 2 Dynamic Analysis for Checking Endianness

We have already mentioned that the dynamic analysis used by Endicheck to detect endianness bugs is a special variant of taint analysis, since it uses and adapts some related concepts. In the rest of this paper, we use the term *endianness analysis*.

## 2.1 Algorithm Overview

Here we present a high-level overview of the key aspects of the endianness analysis. Like common taint and data-flow analysis techniques (see, e.g., [4] and [8]), our dynamic endianness analysis tracks flow of data through program execution, together with some metadata attached to specific data values. The analysis needs to attach metadata to all memory locations for which endianness matters, and maintain them properly. Metadata associated with a sequence of bytes (memory locations) that makes a numeric data value then capture its endianness. Similarly to many dynamic analyses, the metadata are stored using a mechanism called *shadow memory* [7] [9]. We give more details about the shadow memory in Section 2.2.

<sup>2</sup> https://github.com/rkapl/endicheck/blob/master/endicheck/ec-overlay/arpa/inet.h

Although we mostly focus on checking that the program being analyzed does not transmit data of incorrect endianness to other parties, there is also the opposite problem: ensuring that the program does not use data of other than native endianness. For this reason, our endianness analysis could be also used to check whether all operands of an arithmetic instruction have the correct native endianness — this is important because arithmetic operations are unlikely to produce correct results otherwise. Note, however, that checking of native endianness for operands has not yet been implemented in the Endicheck tool.

The basic principle behind the dynamic endianness analysis is to watch instructions as they are being executed and check endianness at specific code locations, such as the calls of I/O functions. We use the term *I/O functions* to identify all system calls and other functions that encapsulate data exchange between a program and external entities (e.g., writing or reading data to/from a hard disk, or network communication) in a specific endianness. When the program execution reaches the call of an I/O function, Endicheck checks whether all its arguments have the proper endianness. Note that the user of Endicheck specifies the set of I/O functions by annotations (listed in Section 4).

In order to properly maintain the endianness information stored in the shadow memory, our analysis needs to track almost every instruction being executed during the run of a subject program. The analysis receives notifications about relevant events from the Valgrind dynamic analysis engine. All the necessary code for tracking individual instructions (processing the corresponding events), updating endianness metadata (inside the shadow memory), and checking endianness at the call sites of I/O functions, is added to the subject program through dynamic binary instrumentation. Further technical details about the integration of Endicheck into Valgrind are provided later in Section 3.

Two distinguishing aspects of the endianness analysis — the format of metadata stored in the shadow memory and the way metadata are propagated during the analysis of program execution — are described in the following subsections.

#### 2.2 Shadow Memory

A very important requirement on the organization and structure of shadow memory was full transparency for any C/C++ or machine code program. The original layout of heap and stack has to be preserved during the analysis run, since Endicheck (and Valgrind in general) targets C and C++ programs that typically rely on the precise layout of data structures in memory. Consequently, Endicheck cannot allocate the space for shadow memory (metadata) within the data structures of the analyzed program.

When designing the endianness analysis, we decided to use the mechanism supported by Valgrind [7], which allows client analyses to store a tag for each byte in the virtual address space of the analyzed program without changing its memory layout. This mechanism keeps a translation table (similar to page tables used by operating systems) that maps memory pages to *shadow pages* where the metadata are stored.
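The page-table idea can be sketched as follows. This is a simplified illustration and not Valgrind's actual implementation: it assumes a 32-bit address space and 64 KiB pages, allocates shadow pages lazily, and uses the hypothetical names `shadow_get` and `shadow_set`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BITS 16u                       /* 64 KiB pages (assumption) */
#define PAGE_SIZE (1u << PAGE_BITS)
#define NUM_PAGES (1u << (32u - PAGE_BITS)) /* covers a 32-bit address space */

/* Translation table: one pointer per page, NULL until the page is first written. */
static uint8_t *shadow_pages[NUM_PAGES];

/* Reads the tag of one byte; untouched pages report the default tag 0. */
static uint8_t shadow_get(uint32_t addr) {
    const uint8_t *page = shadow_pages[addr >> PAGE_BITS];
    return page ? page[addr & (PAGE_SIZE - 1u)] : 0;
}

/* Writes the tag of one byte, allocating the shadow page on demand.
   Returns 0 on success and -1 if the allocation fails. */
static int shadow_set(uint32_t addr, uint8_t tag) {
    uint8_t **slot = &shadow_pages[addr >> PAGE_BITS];
    if (*slot == NULL) {
        *slot = calloc(PAGE_SIZE, 1);
        if (*slot == NULL) return -1;
    }
    (*slot)[addr & (PAGE_SIZE - 1u)] = tag;
    return 0;
}
```

Because the tags live in separately allocated pages, the layout of the analyzed program's own heap and stack is left untouched, which is the transparency requirement stated above.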

The naive approach would be to follow the same principles as taint analyses, i.e. reuse the idea of taint bits, and mark each byte of memory as being either of native endianness or target endianness. However, our endianness analysis actually uses a richer format of metadata and individual tags, which improves the analysis precision.

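Rich Metadata Format. In this format of metadata, each byte of memory and each processor register is annotated with one of the following tags that represent available knowledge about the endianness of stored data values.

– native: the byte belongs to a value in native endianness (the default for values produced by the program)

– target: the byte belongs to a value in target endianness

– byte-sized: the byte is a single-byte value, which needs no byte-swapping

– unknown: the analysis cannot say whether the value is already byte-swapped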


In addition to these four tags, each byte of memory can also be annotated with the empty flag, indicating that the byte's value is zero. Now we give more details about the meaning of these tags, and discuss some of the associated challenges.

Single-byte values. Our approach to precise handling of single-byte values is motivated by the way arithmetic operations are processed. Determining the correct size of the result of an arithmetic operation (in terms of the number of actually used bytes) is difficult in practice, because compilers often choose to use instructions that operate on wider types than actually specified by the developer in program source code. This means the analysis cannot, in some cases, precisely determine whether the result of an arithmetic operation has only a single byte. Our solution is to always mark the least-significant byte of the result with the tag byte-sized. Such an approach guarantees that if only the least-significant byte of an integer value is actually used, it does not trigger any endianness errors when checked, because the respective memory location is not tagged as native. On the other hand, if the whole integer value is really used (or at least more than just the least-significant byte), there is one byte marked with the tag byte-sized and the rest of the bytes are marked as native, thus causing an endianness error when checked.
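The effect of this rule can be sketched with a simulated tag array. The code below is an illustration under our own assumptions, not Endicheck's implementation; the type `byte_tag` and both functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TAG_NATIVE, TAG_TARGET, TAG_BYTE_SIZED, TAG_UNKNOWN } byte_tag;

/* Tags the result of an arithmetic operation that is `size` bytes wide.
   `lsb_index` is the position of the least-significant byte in memory:
   0 on a little-endian host, size - 1 on a big-endian host. */
static void tag_arith_result(byte_tag *tags, size_t size, size_t lsb_index) {
    for (size_t i = 0; i < size; i++)
        tags[i] = TAG_NATIVE;
    tags[lsb_index] = TAG_BYTE_SIZED;
}

/* Returns 1 if any of the `used` bytes starting at `first` is tagged native,
   i.e. if checking those bytes would raise an endianness error. */
static int has_native_byte(const byte_tag *tags, size_t first, size_t used) {
    for (size_t i = first; i < first + used; i++)
        if (tags[i] == TAG_NATIVE) return 1;
    return 0;
}
```

For a 4-byte result on a little-endian host, checking only the first byte finds no native tag, while checking all four bytes finds three of them and is therefore reported.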

Empty byte flag. Usage of the empty flag helps to improve performance of the endianness analysis when processing byte-shuffling instructions, because all operations with empty flags are simpler than operations with the actual values. However, this flag can be soundly used only when the operands are byte-wise disjoint, i.e. when each byte is zero (empty) in at least one of the operands. Arithmetic operations are handled in a simplified way: they never mark bytes as empty in the result. Consequently, while the empty flag implies that the given byte is zero, the reverse implication does not hold.
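One way to picture the byte-wise disjoint case is the shadow computation for a bitwise OR, as used when a value is assembled from shifted parts. The sketch below is our own simplified reading of the rule, with hypothetical names; in particular, falling back to the native tag when both operand bytes are non-empty is an assumption made for this example, and the precise rules are those documented in [3].

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TAG_NATIVE, TAG_TARGET, TAG_BYTE_SIZED, TAG_UNKNOWN } byte_tag;

/* Shadow information for one byte: its endianness tag plus the empty flag. */
typedef struct { byte_tag tag; int empty; } shadow_byte;

/* Shadow of r = a | b, computed byte by byte. */
static void shadow_or(shadow_byte *r, const shadow_byte *a,
                      const shadow_byte *b, size_t size) {
    for (size_t i = 0; i < size; i++) {
        if (a[i].empty) {
            r[i] = b[i];              /* 0 | b == b, so b's shadow carries over */
        } else if (b[i].empty) {
            r[i] = a[i];              /* a | 0 == a */
        } else {
            r[i].tag = TAG_NATIVE;    /* operands overlap: assumed fallback */
            r[i].empty = 0;
        }
    }
}
```

When two halves of an already byte-swapped 16-bit value are combined, each result byte inherits the target tag from the operand that is non-empty at that position, without inspecting the data values themselves.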

Unknown tag. We introduced the tag unknown in order to better handle data values for which the analysis cannot say whether they are already byte-swapped. Endicheck uses this tag especially for uninitialized data. Values marked with the tag unknown are not reported as erroneous by default, but this behavior is configurable. We discuss other related problems, concerning especially precision, below in Section 2.4.

## 2.3 Propagation of Metadata

An important aspect of the endianness analysis is that data values produced by the subject program are marked as having the native endianness by default. This behavior matches the prevailing case, because data produced by most instructions (e.g., by arithmetic operations) and constant values can be assumed to have native endianness.

In general, metadata are propagated upon execution of an instruction according to the following policy:


Endicheck also correctly passes metadata through routines such as memcpy and certain byte-shuffling operations (e.g., shift <<= and >>=). Complete details for all categories of instructions and routines are provided in the master's thesis of the first author [3].

The only way to create data with the target tag is via explicit annotation from the user. Specifically, the user needs to add annotations to *byte-swapping* functions in order to set the target tag on return values.

#### 2.4 Discussion: Analysis Design and Precision

The basic scenario that is obviously supported by our analysis is the detection of endianness bugs when the target and native endianness are different. However, the design of our analysis ensures that it can be useful even in cases when the native endianness is the same as the target endianness. Although byte-swapping functions then become identities, the endianness analysis can still find data that would not be byte-swapped if the two endiannesses were different; it can do this by setting the respective tags when data pass through the byte-swapping functions. In addition, the endianness analysis can also be used to detect the opposite direction of errors: programs using non-native endianness data values (e.g., received as input) without byte-swapping them first.
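The following sketch illustrates why a missing byte-swap remains detectable when the swap is an identity. It simulates the tags in ordinary C and passes the host byte order as a parameter so that both situations can be tried on one machine; all names are hypothetical, and only two of the tags are modeled.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TAG_NATIVE, TAG_TARGET } byte_tag;

/* A 32-bit value together with one simulated tag per byte. */
typedef struct { uint32_t value; byte_tag tags[4]; } tagged32;

/* Converts to a big-endian target endianness. `host_is_big_endian` is passed
   in explicitly so that both host kinds can be simulated on one machine. */
static tagged32 to_be32_tagged(uint32_t x, int host_is_big_endian) {
    tagged32 r;
    r.value = host_is_big_endian
        ? x   /* the byte swap degenerates to the identity */
        : ((x << 24) | ((x & 0xFF00u) << 8) | ((x >> 8) & 0xFF00u) | (x >> 24));
    for (int i = 0; i < 4; i++)
        r.tags[i] = TAG_TARGET;   /* set unconditionally, like an annotation */
    return r;
}

/* Check at an I/O function: every byte must carry the target tag. */
static int check_target(const tagged32 *v) {
    for (int i = 0; i < 4; i++)
        if (v->tags[i] != TAG_TARGET) return 0;
    return 1;
}
```

A value that never passed through `to_be32_tagged` keeps its native tags and fails `check_target` on both kinds of host, even though on the big-endian host its bytes happen to be correct.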

Endicheck does not handle constants and immediate values in instructions very well, since the analysis cannot automatically recognize their endianness and therefore cannot determine whether the data need byte-swapping or not. Constants stored in the data section of a binary executable represent the main practical problem for the analysis, because the data section does not have any structure: it is just a stream of bytes. Our solution is to mark data sections initially with the tag unknown. If this is not sufficient, a user must annotate the constants in the program source code to indicate whether they already have the correct endianness.

A possible source of false bug reports is unused bytes within a block of memory that has undefined content, unless the memory was cleared with 0s right after its allocation. This may occur, for example, when some fields inside C structures have specific alignment requirements. Some space between individual fields inside the structure layout is then unused, and marked either with the tag unknown or with the tag left over from the previous content of the memory block.
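The padding in question can be made visible with `offsetof`. The struct below is a hypothetical example, not taken from any of the analyzed programs; the exact amount of padding depends on the platform ABI, and clearing the whole record before filling it is the common practical way to avoid stale bytes in the gaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A hypothetical record: one byte followed by a 32-bit field. */
struct record {
    uint8_t  kind;
    uint32_t length;   /* usually aligned to 4 bytes, leaving a gap after kind */
};

/* Number of unused padding bytes between `kind` and `length`. */
static size_t padding_after_kind(void) {
    return offsetof(struct record, length) - sizeof(uint8_t);
}

/* Initializes a record after clearing all of its bytes, padding included. */
static void record_init(struct record *r, uint8_t kind, uint32_t length) {
    memset(r, 0, sizeof *r);
    r->kind = kind;
    r->length = length;
}
```

If such a record is passed whole to an I/O function, any padding bytes reported by `padding_after_kind` are checked along with the fields, which is how the false reports described above arise.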

## 3 Implementation

We distribute the Endicheck tool in the form of an open source software package that was initially created as a fork of the Valgrind source code repository. Although tools and plugins for Valgrind can be maintained as separate projects, forking allowed us to make changes to the Valgrind core and use its build/test infrastructure. Within the whole source tree of Endicheck, which includes the forked Valgrind codebase, the code specific to Endicheck is located in the endicheck directory. It consists of these modules:


In the rest of this section, we briefly describe how Endicheck uses the Valgrind infrastructure and a few other important features. Additional technical details about the implementation are provided in the master's thesis of the first author [3].

Usage of Valgrind infrastructure. Endicheck depends on the Valgrind core (i) for dynamic just-in-time instrumentation [6] of a target binary program and (ii) for the actual dynamic analysis of program execution. The subject binary program is instrumented with code that carries out all the tasks required by our endianness analysis — especially recording of important events and tracking information about the endianness of data values. When implementing the Endicheck plugin, we only had to provide code doing the instrumentation itself and define what code has to be injected at certain locations in the subject program. Note also that for the analysis to work correctly and provide accurate results, Valgrind instruments all components of the subject program that may possibly handle byte-swapped data, including application code, the system C library and other libraries. During the analysis run, Valgrind notifies the Endicheck plugin about execution of relevant instructions and Endicheck updates the information about endianness of affected data values accordingly. Besides instrumentation and the actual dynamic analysis, other features and mechanisms of the Valgrind framework used by Endicheck include: utility functions, origin tracking, and developer-friendly error reporting.

Origin tracking [1] is a mechanism that can help users in debugging the endianness issues. An error report contains two stack traces: one identifies the source code location of the call to the I/O function where the wrong endianness of some data value was detected, and the second trace, provided by origin tracking, identifies the source code location where the value has originated. In Endicheck, the origin information (identifier of the stack trace and execution context) is stored alongside the other metadata in the shadow memory for all values. We decided to use this approach because almost all values need origin tracking, since they can be sources of errors — in contrast to Memcheck, where only the uninitialized values can be sources of errors.

During our experiments with the Radeon SI OpenGL driver (described in Section 5.1), we noticed that the driver maps the device memory into the user-space process. In that case, there is no single obvious point at which to check the endianness of data that leave the program through the mapped memory. To solve this problem and support memory-mapped I/O, we extended our analysis to automatically check endianness at all writes to regions of the mapped device memory. We implemented this feature in such a way that each byte of a device memory region is tagged with a special flag protected; Endicheck can then find very quickly whether some region of memory is mapped to a device or not. Note that the flag is associated with a memory location, while the endianness tags (described in Section 2.2) are associated with *data values*. Therefore, the special flag is not copied, e.g. when execution of memcpy is analyzed; it can only be set explicitly by the user.
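The difference between a flag attached to a location and a tag attached to a value can be sketched with a small simulated memory. This is an illustration under simplifying assumptions (only two tags, one error counted per offending byte), and all names are hypothetical rather than Endicheck's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 64u

typedef enum { TAG_NATIVE, TAG_TARGET } byte_tag;

static uint8_t  mem[MEM_SIZE];            /* simulated program memory        */
static byte_tag value_tag[MEM_SIZE];      /* travels with the data value     */
static uint8_t  protected_flag[MEM_SIZE]; /* belongs to the location itself  */
static unsigned error_count;              /* stand-in for an error report    */

/* Marks [start, start + size) as mapped device memory. */
static void mark_protected(size_t start, size_t size) {
    memset(&protected_flag[start], 1, size);
}

/* Instrumented copy: data and value tags move, the protected flag does not.
   Writing a non-target byte into a protected location counts as an error. */
static void tracked_memcpy(size_t dst, size_t src, size_t size) {
    for (size_t i = 0; i < size; i++) {
        mem[dst + i] = mem[src + i];
        value_tag[dst + i] = value_tag[src + i];
        if (protected_flag[dst + i] && value_tag[dst + i] != TAG_TARGET)
            error_count++;
    }
}
```

Copying native data into the protected region raises errors, copying target data does not, and copying out of the region leaves the destination unprotected.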

## 4 User Guide

The recommended way to install Endicheck is building from the source code. Instructions are provided in the README file at the project web site. When Endicheck has been installed, a user can run it by executing the following command:

valgrind --tool=endicheck [OPTIONS...] PROGRAM ARGS...

Origin tracking is enabled by the option --track-origins=yes.

Annotations. In order to analyze a given program, some annotations typically must be added into the program source code. A user of Endicheck has to mark the byte-swapping functions and the I/O functions (through which data values are leaving the program), because these functions cannot be reliably detected in an automated way.

The specific annotations are defined in the C header file endicheck.h. Here follows the list of supported annotations, together with explanation of their meaning:

– EC\_MARK\_ENDIANITY(start, size, endianness)

This annotation marks a region of memory from start to start+size-1 as having the given endianness. It should be used in byte-swapping functions. Target endianness is represented by the symbol EC\_TARGET.


Figure 1 shows an example program that demonstrates usage of the most important annotations (EC\_MARK\_ENDIANITY and EC\_CHECK\_ENDIANITY). If the call to htobe32 inside main is removed, Endicheck will report an endianness bug. This example also demonstrates possible ways to easily annotate standard functions, like htobe32 and write.

#### 262 R. Kápl and P. Parízek

```
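#include <byteswap.h>            /* bswap_32 */
#include <stdint.h>              /* uint32_t */
#include <unistd.h>              /* write */
#include <valgrind/endicheck.h>

/* Annotated byte-swapping function: swaps on little-endian hosts and
   marks the result as having target endianness. */
#undef htobe32                   /* drop the libc macro, if one is visible */
uint32_t htobe32(uint32_t x) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    x = bswap_32(x);
#endif
    EC_MARK_ENDIANITY(&x, sizeof(x), EC_TARGET);
    return x;
}

/* Annotated I/O function: checks the endianness of the buffer before
   the data leave the program. */
int ec_write(int file, const void *buf, size_t count) {
    EC_CHECK_ENDIANITY(buf, count, NULL);
    return write(file, buf, count);
}
#define write ec_write           /* later calls to write go through the check */

int main(void) {
    uint32_t x = 0xDEADBEEF;
    x = htobe32(x);
    write(0, &x, sizeof(x));
    return 0;
}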
```
Fig. 1. Small example program with Endicheck annotations.

## 5 Evaluation

We evaluated the Endicheck tool, namely its ability to find endianness bugs, its precision and its overhead, by means of a case study on the Radeon SI driver, several open-source programs and a standardized performance benchmark. For the Radeon SI driver and each of the open-source programs, we provide a link to its source code repository (and identification of the specific version that we used for our evaluation) within the artifact that is referenced from the project web site.

## 5.1 Case Study

Our case study is Radeon SI, the Linux OpenGL driver for Radeon graphics cards, starting with the SI (Southern Islands) line of cards and continuing to the current models.

Since these Radeon cards are little-endian, the driver must byte-swap all data when running on a big-endian architecture such as PowerPC. However, the Radeon SI driver (in the Mesa 17.4 version) does not perform the necessary byte-swapping operations, and therefore simply does not work in the case of PowerPC — it crashes either the GPU or OpenGL programs using the driver. In particular, endianness bugs in this version of the Radeon SI driver cause the Glxgears demo on PowerPC to crash. We give more details about the bugs we have found in Section 5.2.

An important feature of the whole Linux OpenGL stack is that all layers, including the user-space program, communicate not only using calls of library functions and system calls, but they also extensively use mapping of the device memory directly into the user process. Given such an environment, Endicheck has to correctly handle (1) the flow of data through the whole OpenGL stack by instrumenting all the libraries used, and (2) communication through the shared memory that is used by the driver. This is why the support for mapped memory in Endicheck, through marking of device memory with a special flag, as described above in Section 3, is essential.

#### 5.2 Search for Bugs

For the purpose of evaluating Endicheck's ability to find endianness bugs, we picked a diverse set of open-source programs (in addition to the Radeon SI driver), including the following: BusyBox, OpenTTD, X.Org and ImageMagick. All programs are listed in Table 1. The only criterion was to select programs written in C that communicate over the network or store data in binary files, since only such programs may possibly contain endianness bugs. We also document our experience with fixing the endianness bugs in the Radeon SI driver and other programs.

One of the stated goals for Endicheck was to reduce the number of annotations that a user must add into the program source code in order to enable search for endianness bugs. Therefore, below we report the relevant measurements and discuss whether (and to what degree) this goal has been achieved.

In the rest of this section, first we discuss application of Endicheck on the Radeon SI driver (our case study) and then we present results for other programs.

Radeon SI case study. Within our case study, we used the Glxgears demo program as a test harness for the Radeon SI driver. We first ran Glxgears on the x86 architecture, and after fixing all the issues found and reported by Endicheck, we moved the same graphics card to a PowerPC host computer and continued testing there.

In the case of the Radeon SI driver, all byte-swapping functions are located in a single file of one library (Gallium) on the OpenGL stack. Therefore, to enable search for endianness bugs in Radeon SI, we had to make just two changes: (1) annotate the function radeon\_drm\_cs\_add\_buffer as an I/O function and (2) annotate the byte-swapping functions in Gallium. Overall, we had to add or change about 40 lines of source code, including annotations, in a single place. All our changes are published in the repository https://rkapl.cz/repos/git/roman/mesa. It contains the source code of Mesa augmented with our annotations and fixes for the endianness-related bugs in Radeon SI described below. For fixes of bugs found by Endicheck, we included the original Endicheck report in the commit message, under the ECNOTE header.

Figure 2 contains an example bug report produced by Endicheck with enabled origin tracking on Glxgears. The error report itself has three main parts (in this order): the problem description, origin stack trace (captured when the offending value is created) and point-of-check stack trace (recorded when some annotated I/O function is encountered). We show only fragments of stack traces for illustration (and to save space).

```
Thread 9 gallium_drv:0:
Memory does not contain data of Target endianness
Problem was found in block 0x41BF000 (named radeon_add_buffer)
at offset 0, size 8:
   0x41BF000: NNNNNNNN
The value was probably created at this point:
   at 0x8B787F7: si_init_msaa_functions (si_state_msaa.c:94)
   by 0x8B4F979: si_create_context (si_pipe.c:279)
       ...
   by 0x4C46661: glXCreateContext (glxcmds.c:427)
   by 0x10B67A: make_window.constprop.1 (glxgears.c:559)
   by 0x109A86: main (glxgears.c:777)
The endianness check was requested here:
   at 0x8B85C45: radeon_drm_cs_add_buffer (radeon_drm_cs.c:375)
   by 0x8B4A58B: si_set_constant_buffer (r600_cs.h:74)
   by 0x8B708D0: si_set_framebuffer_state (si_state.c:2934)
       ...
   by 0x55357FB: start_thread (pthread_create.c:465)
   by 0x5861B0E: clone (clone.S:95)
```
Fig. 2. Error report from Endicheck run on the Glxgears demo program

The problem description identifies the currently active thread, the nature of the error and the memory region containing the erroneous value. The memory region is identified by its address and an optional name provided by the program ("radeon\_add\_buffer" in this case). Metadata are printed just for the part of the memory region that contains data with the wrong endianness, using this convention: N = native, U = undefined.

This particular error report (Figure 2) indicates that an array of floating-point values describing the *multisampling pattern* is not byte-swapped. Note that IEEE 754 floating-point values also obey the endianness of the host platform, at least on the architectures x86, x64 and ARM. To repair the corresponding bug, we had to insert calls of byte-swapping functions at the code location where the floating-point array is produced.
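Byte-swapping a floating-point value can be sketched as follows. The code assumes that `float` is a 32-bit IEEE 754 type and uses hypothetical function names; it is not the fix that was applied to the driver, only an illustration of the operation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverses the four bytes of a float's object representation. The result is
   returned as raw bits, because a byte-swapped float is generally not a
   meaningful floating-point number and should not be used in arithmetic. */
static uint32_t float_to_swapped_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined way to read the bits */
    return (bits << 24) | ((bits & 0xFF00u) << 8) |
           ((bits >> 8) & 0xFF00u) | (bits >> 24);
}

/* Inverse direction: swapped raw bits back to a native float. */
static float swapped_bits_to_float(uint32_t swapped) {
    uint32_t bits = (swapped << 24) | ((swapped & 0xFF00u) << 8) |
                    ((swapped >> 8) & 0xFF00u) | (swapped >> 24);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Keeping the swapped value in an integer type rather than in a `float` variable avoids accidental arithmetic on it, in the same spirit as the bitwise types discussed in the introduction.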

During our experiments with Radeon SI and Glxgears, four endianness bugs in total were detected by Endicheck on the x86 architecture before testing on PowerPC. After we fixed the bugs, the Glxgears demo ran successfully. This shows that Endicheck detected all bugs it was supposed to and provided reports useful enough for the bugs to be fixed. Here we also need to emphasize that the Glxgears demo, naturally, does not exercise all code in the Radeon SI driver, and fixing the whole driver would require a lot of additional work.

Other programs. As we said at the beginning of this section, we evaluated Endicheck's ability to find endianness bugs and precision on a set of realistic programs. Our primary goal in this part of the evaluation was to assess the following aspects:


Before trying to answer these questions, we wanted to be sure that the subject programs contain endianness bugs. However, some of the programs that we considered (OpenTTD, OpenArena and ImageMagick) are written in such a way that realistic endianness bugs cannot be injected into their codebase. ImageMagick uses a C++ abstraction layer for binary streams, which also handles endianness. OpenArena uses bit-oriented encoding for most parts of the network communication. OpenTTD uses an abstraction layer too, but the developer can still make an endianness-related mistake in certain cases, such as storing an array of uint16\_t values as an array of uint8\_t values. We manually injected synthetic endianness bugs into the code of all the programs where this was possible. In this process, we also annotated the byte-swapping functions (like htonl). The bugs were created by removing one usage of byte-swapping functions.

The results of experiments are summarized in Table 1. For each program, the table provides the following information: whether it was possible to analyze the program at all, whether some endianness bugs were found, overhead related to false warnings, and how many lines of source code were added or changed in relation to Endicheck annotations. Data for the Radeon SI driver are also included in the table for completeness.


Table 1. Search for bugs: precision and necessary annotations

Data in Table 1 show that Endicheck could find the introduced bug in all the cases. Furthermore, Endicheck found two genuine endianness-related bugs in X.Org. The bugs were confirmed by the developers of X.Org and fixed in upstream<sup>3</sup>.

Endicheck also reports some false warnings, but their numbers are not overwhelming. Four cases in total occurred for the Radeon SI driver and OpenTTD (two in each). This is a manageable amount, and the warnings can even be suppressed using further annotations.

#### 5.3 Performance

In this section, we report on the performance of Endicheck in terms of execution time overhead it introduces. We compare the performance data for programs instrumented with Endicheck, programs instrumented by the Memcheck plugin for Valgrind and programs without any instrumentation. For the purpose of experiments, we used the standardized benchmark SPEC CPU2000. Even though SPEC CPU2000 is a general benchmark, not tailored for endianness analysis, results of experiments with this benchmark indicate the performance of Endicheck when doing a real analysis, because the control-flow paths exercised within Endicheck and the Valgrind core during an experiment do not depend on the specific metadata (tag values).

<sup>3</sup> https://gitlab.freedesktop.org/search?group\_id=&project\_id=371&repository\_ref=master&scope=commits&search=Roman+Kapl

We ran all experiments on a T550 ThinkPad notebook with 12 GiB of RAM and an i5-5200 processor clocked at 2.20 GHz, under Arch Linux from Q2 2018. The SPEC2000 test harness was used for all the runs, with iteration count set to 3. We compiled both Memcheck and Endicheck with GCC v7.3.0 using default options. Note that we had to omit the benchmark program "gap", because it produced invalid results when compiled with this version of GCC.

In the description of specific experiments, tables with results and their discussion, we use the following abbreviations:


Execution time. We divided our experiments designed for measuring the execution time into two groups. Our motivation was to ensure that all experiments, including the EC-OT configuration that incurs a large overhead, finish within a reasonable time limit. In the first group, we ran the full range of configurations on the "test" data set provided by SPEC CPU2000, which is small compared to the full "reference" set, and used MC as the baseline for comparisons. Table 2 shows results for experiments in this group. All execution time data provided in this table are relative to MC, with the exception of data for the native configuration. The second group of experiments uses the full "reference" data set from SPEC CPU2000. Results for this group are provided in Table 3. In this case, we used the data for native (uninstrumented) programs as the baseline.


Table 2. Execution times for the SPEC CPU2000 test data set, relative to Memcheck.


Table 3. Execution times for the SPEC CPU2000 reference data set, relative to native runs.

Data in Table 3 indicate that the average slowdown of Memcheck is by a factor of 16.59. Endicheck, in comparison, slows down the analyzed program by a factor of 35.31. This means that Endicheck has roughly twice the overhead of Memcheck with default options. According to data in Table 2, the corresponding relative slowdown of Endicheck with respect to Memcheck is 1.65x. This difference between the results for the reference and test data sets is caused by the different ratio of the time spent instrumenting the code versus the time spent running the instrumented code.
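The comparison between the two data sets can be made explicit with a small calculation. The following sketch only uses the three numbers quoted in the text; the variable names are ours.

```python
# Average slowdown factors relative to native runs, as quoted for Table 3.
memcheck_slowdown = 16.59
endicheck_slowdown = 35.31

# Overhead of Endicheck relative to Memcheck on the reference data set.
relative_reference = endicheck_slowdown / memcheck_slowdown

# Relative slowdown quoted for the test data set (Table 2).
relative_test = 1.65

print(f"reference data set: {relative_reference:.2f}x")
print(f"test data set:      {relative_test:.2f}x")
```

On the reference data set the ratio is about 2.13, compared to 1.65 on the test data set.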

However, data in both tables also show that the performance of Endicheck with origin tracking lags behind that of Memcheck with the same option. It was still usable for our Radeon SI OpenGL tests, but measurements indicate that there is room for optimization. Nevertheless, some relative slowdown between the configurations EC-OT and MC-OT probably cannot be avoided, because Endicheck must track origin information for much more data than Memcheck. In our experiments, we observed that creating the origin information is the most expensive operation involved. When the origin tags are created for each superblock, instead of for every instruction, the execution times drop roughly by a factor of two (see the columns EC-OT and EC-IT).

#### 5.4 Discussion

Based on the case study and results of experiments presented in the previous sections, we make the following general conclusions:


Regarding the annotation burden, we already mentioned that the user has to carefully mark, in particular, all the I/O functions and byte-swapping functions, so that Endicheck can correctly update endianness tags associated with memory locations during the run of the analysis. While it would be possible to recognize byte-swapping functions automatically, e.g. by static code analysis, the endianness analysis would then have to be run on a machine whose native endianness differs from the target endianness, so that actual byte-swaps are present.

Another limitation of Endicheck from the practical perspective is handling of complex data transformations, a problem shared with taint analysis. The metadata cannot be correctly preserved through transformations such as encryption/decryption and compression/decompression. However, in many cases, the problem could be avoided by requiring an endianness check to be performed just before the respective transformation.

## 6 Related Work

As far as we know, the Sparse tool [11] used by Linux kernel developers, which we already mentioned, is the only publicly available specialized tool tackling the problem of finding endianness bugs. The main advantage of Endicheck over Sparse is better precision in some cases, i.e. fewer false bug reports, since dynamic analysis, which observes actual program execution and runtime data values, is typically more precise than static analysis. Endicheck also does not require as many annotations of functions and variables as Sparse: when using Endicheck, typically just a few places in the program source code need to be annotated manually. More specifically, Sparse expects that the input program code uses (i) the specialized bitwise data types (e.g., \_\_le32) for all variables where endianness matters and (ii) the macros for conversion between regular types and bitwise types (e.g., le32\_to\_cpu). With Endicheck, developers only have to annotate the byte-swapping functions used by the program (e.g., htons and htonl from the C library). On the other hand, Sparse has better coverage of program code, as it is based on static analysis.
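To make the class of bug targeted by both tools concrete, the following sketch shows a missing byte-swap using Python's standard `struct` module. The function names are hypothetical and the example is independent of Endicheck and Sparse; the `<I` format stands in for what a little-endian host emits when the call to htonl is forgotten.

```python
import struct


def encode_length_buggy(n: int) -> bytes:
    # Bug: "<I" forces little-endian layout, i.e. the bytes a little-endian
    # host would send if the byte-swap (htonl) were forgotten.
    return struct.pack("<I", n)


def encode_length_fixed(n: int) -> bytes:
    # "!I" selects network (big-endian) byte order, the effect of htonl.
    return struct.pack("!I", n)


def decode_length(data: bytes) -> int:
    # The receiver follows the protocol and reads network byte order.
    return struct.unpack("!I", data)[0]


print(decode_length(encode_length_fixed(1)))  # 1
print(decode_length(encode_length_buggy(1)))  # 16777216
```

The buggy variant goes unnoticed when sender and receiver make the same mistake on machines of the same endianness, which is why such bugs are hard to find by ordinary testing.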

The Valgrind dynamic analysis framework [6] comes bundled with a set of bug detection tools. Very popular is the Memcheck tool [5] for detecting memory access errors and leaks, which also served as an inspiration for the design and implementation of Endicheck. We mention the tool here, because it actually performs a variant of dynamic taint analysis — it marks each bit of the program memory as valid or invalid (tainted).

Closely related is also the runtime type checker Hobbes [2] for binary executables, which can detect some kinds of type mismatch bugs common in C programs. In order to reduce the number of false bug reports and to delimit integer values, Hobbes uses the mechanism of continuation markers — the first byte of each value has the marker unset, and the remaining bytes are set to indicate that they represent a continuation of an existing value. The analysis technique used by Hobbes could be modified to track endianness of integer values instead of distinguishing between pointers and integers, since one can model integers of different endianness as values that have different types (also like in the case of Sparse).

Another approach with functionality similar to Endicheck has been implemented within the LLVM/Clang plugin called DataFlowSanitizer [10]. It is a dynamic analysis framework that (i) enables programs to define tags for data values and check for specific tags, both through its API functions, and (ii) propagates all tags with the data.
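The general idea of propagating tags together with data can be sketched in a few lines. This is a minimal illustration of label propagation, not the API of DataFlowSanitizer or the implementation of Endicheck; the class `Labelled`, the helper `has_label` and the label name are our own assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labelled:
    """An integer value together with the set of labels attached to it."""
    value: int
    labels: frozenset = frozenset()

    def __add__(self, other: "Labelled") -> "Labelled":
        # The result of an operation carries the labels of both operands.
        return Labelled(self.value + other.value, self.labels | other.labels)


def has_label(x: Labelled, label: str) -> bool:
    """Check whether a value carries a specific label."""
    return label in x.labels


received = Labelled(40, frozenset({"from_network"}))  # hypothetical label
local = Labelled(2)
total = received + local
print(total.value, has_label(total, "from_network"))  # 42 True
```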

## 7 Conclusion

We have presented a new dynamic analysis tool, Endicheck, for detecting endianness bugs in C/C++ programs. The tool is built upon the Valgrind framework. Endicheck provides a useful, and in many settings also preferable, alternative to static analysis tools like Sparse, because (1) it reports quite precise results (i.e., a low number of false warnings) due to the nature of dynamic analysis and (2) it requires fewer annotations (and other changes) in the source code of the subject program in order to be able to detect missing byte-swap operations. The results of our experimental evaluation show that Endicheck can (1) handle large complex programs and (2) identify actual endianness bugs, and that it has a practical performance overhead. Endicheck could also be used in automated testing scenarios, as a useful alternative to testing programs on both little- and big-endian processor architectures. A testing environment based on Endicheck might be easier to set up than an environment based, for example, on virtual machines.

#### 7.1 Future Work

Possible extensions of Endicheck, which could improve its precision and practical usefulness even further, include:


Another way to detect endianness bugs more precisely is to use comparative runs (i.e., a kind of equivalence checking). The key idea is to run a program on two machines, where one has a big-endian architecture and the other has a little-endian architecture, and compare the data leaving both variants of the program. This approach has the potential to be the most accurate, because it can even detect problems in cases when data leaving the program are encrypted or compressed. On the other hand, it cannot always detect situations when the program forgets to byte-swap input data, unless the error affects one of the output values with *concrete* endianness.

Acknowledgments. This work was partially supported by the Czech Science Foundation project 18-17403S and partially supported by the Charles University institutional funding project SVV 260451.

## References

1. Bond, M.D., Nethercote, N., Kent, S.W., Guyer, S.Z., McKinley, K.S.: Tracking Bad Apples: Reporting the Origin of Null and Undefined Value Errors. In: Proceedings of OOPSLA 2007. ACM (2007)


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Describing and Simulating Concurrent Quantum Systems**

Richard Bornat<sup>1,4</sup>, Jaap Boender<sup>2,5</sup>, Florian Kammueller<sup>1,6</sup>, Guillaume Poly<sup>3,7</sup>, and Rajagopal Nagarajan<sup>1,8</sup>

<sup>1</sup> Department of Computer Science, Middlesex University, London, UK; <sup>2</sup> Hensoldt Cyber GmbH, Taufkirchen, Germany; <sup>3</sup> Widmee, Région de Bordeaux, France

<sup>4</sup> R.Bornat@mdx.ac.uk; <sup>5</sup> jacob.boender@hensoldt-cyber.com; <sup>6</sup> F.Kammueller@mdx.ac.uk; <sup>7</sup> guillaume.gwigwi.poly@gmail.com; <sup>8</sup> R.Nagarajan@mdx.ac.uk

**Abstract.** We present a programming language for describing and analysing concurrent quantum systems. We have an interpreter for programs in the language, using a symbolic rather than a numeric calculator, and we give its performance on examples from quantum communication and cryptography.

Quantum cryptographic protocols such as BB84 QKD [3] and E92 QKD [7] offer unconditional statistical security. These protocols have been implemented in commercial products; various QKD networks have been built around the world; and China has launched a dedicated satellite for quantum communication. The security of the protocols has been established information-theoretically, but their implementations may have security loopholes. We intend to investigate the security question, eventually by using formal methods to verify the properties of implementations, but first by simulation of protocols expressed as programs.

Large companies are developing full-stack solutions for implementing quantum algorithms, and quantum computers will likely be network-linked. Although we have focused on quantum communication and cryptography protocols, aspects of our work will be applicable to distributed quantum computation.

Concurrent quantum systems, such as communication and cryptographic protocols, assume physically-separated agents (Alice, Bob, etc.) who communicate by sending each other qubits (quantum bits: polarised photons, for example) and classical bit-strings. There are a few dedicated, high-level programming languages for quantum systems such as Microsoft's Q# [2]. They focus on single-machine computation and lack a treatment of communication, but a protocol simulation must ensure, for example, that a qubit transferred from one agent to another can't be used again by the sender and can't be used by the receiver before it is sent. We decided therefore to take a process-calculus approach, and we have implemented a tool inspired by CQP [9]. Our implementation is called qtpi [1], and uses symbolic rather than numeric quantum calculation. Programs are checked statically, before they run, to ensure that they obey real-world restrictions on the use of qubits (no cloning, no sharing). Unlike CQP, which preserves all possible outcomes, labelling each with a probability, qtpi takes a single execution path, making probabilistic choices between outcomes.

We have used qtpi to simulate simple protocols such as teleportation, and some more involved ones including the quantum key-distribution protocols BB84 [3] and E92 [7]. Each of these involves transmission of qubits and public transmission of classical messages (in the case of BB84, over an authenticated channel [13]), all of which is simulated. It is early days in our development of the tool, so there is as yet no provision for formal proof, but in these examples we can already simulate well over 1M qubit transfers per minute on a small laptop – i.e. we can simulate largish examples in a useful time.

## **1 Processes**

Protocols are carried out by agents which send each other messages but share no other information. We simulate agents by processes which share no data or variables. Typical protocol steps from the literature are


In addition an agent may perform a calculation, such as generating 1000 random bits or encrypting/decrypting a message or checking the values received in a message. Calculations aren't protocol steps and don't affect qubit state, though they often depend on the results of measuring qubits and their results often influence subsequent protocol steps. Our processes have analogues of protocol steps and calculations. In addition we are able to create processes, to choose conditionally between different processes and to set up a collection of processes running simultaneously.

The aim of our work is to mathematically analyse programs which describe quantum systems. Towards that end we have a semantics of quantum-mechanical calculation [5], written in Coq [10]. That is work in progress: for the time being we are able to execute our protocol-programs using our simulator [1].

## **1.1 A programming language**

Our language has two distinct notations: a protocol-step language, which is derived from the pi-calculus [11], and a functional calculation language, somewhat in the style of Miranda [12]. Neither language has assignment, although qubit measurement does change program state and so needs special attention. The protocol-step language has recursion, but only tail recursion: i.e. nothing can follow a process invocation step (but note that parallel execution of sub-processes provides more complexity).

Following the pi-calculus we use channels to communicate between processes. So Alice doesn't send to Bob, she sends down a channel which Bob can read from – or perhaps it might be Eve, if there is interference. Channels are values, so you can set up communication between two processes by giving them the same channel-argument when you create them, and you can send channel values in messages to alter connections dynamically.

In the protocol-step language steps are separated by dots ('.') and choices are made between processes rather than single or multiple steps. Channels are created by (new c); send is C!E, .., E; receive is C?(x, .., x); qubits are created by (newq q); quantum gating is Q, .., Q>>G; quantum measurement is Q-/-(x).

In the expression language there is function application (f arg), arithmetic and Boolean calculation, conditional choice and recursion. It uses infinite-precision rationals for numerical calculations.

## **1.2 Symbolic quantum calculation**

Quantum calculations can be described using quantum circuits: diagrams such as Fig. 1 show how qubits (one per line) are put through gates (boxes, line-connectors) and/or measured (meter symbols) giving a classical 0/1 result.

In quantum mechanics the state of a qubit is a vector $a|0\rangle + b|1\rangle$, with $|a|^2 + |b|^2 = 1$. Here $|0\rangle$ and $|1\rangle$ are the computational basis vectors, $a$ and $b$ are complex amplitudes, and $|a|^2$ and $|b|^2$ give the probability of measuring the state as $|0\rangle$ or $|1\rangle$. In qtpi a single isolated qubit is therefore a pair of complex numbers, and quantum gates, such as the H, X and Z gates of Fig. 1, are square matrices of complex numbers which modify the state by multiplication. The state of $n$ entangled qubits is a $2^n$-element vector, and matrices which manipulate all of it have to be $2^n \times 2^n$, so calculations with large entanglements can rapidly grow out of the range of straightforward simulation. Luckily, quantum security protocols typically work with a small number of qubits at a time.
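The representation just described can be illustrated numerically. The sketch below uses floating-point complex numbers rather than the symbolic values used by qtpi, and the helper names `apply_gate` and `probabilities` are ours.

```python
import math


def apply_gate(gate, state):
    """Multiply a 2x2 matrix (a list of rows) by a 2-element state vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]


def probabilities(state):
    """Probability of measuring 0 and of measuring 1."""
    return [abs(amplitude) ** 2 for amplitude in state]


h = 1 / math.sqrt(2)
H = [[h, h], [h, -h]]   # Hadamard gate
X = [[0, 1], [1, 0]]    # bit flip
Z = [[1, 0], [0, -1]]   # phase flip

zero = [1 + 0j, 0j]            # the basis state |0>
plus = apply_gate(H, zero)     # the state |+>
print(probabilities(plus))     # both probabilities are close to 0.5

# The state of n entangled qubits needs 2**n amplitudes.
print(2 ** 10)                 # 1024
```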

Because our calculations are simple, we can afford to implement them symbolically. We use $h$ for $\sqrt{1/2}$; it is also equal to $\sin(\pi/4)$ and $\cos(\pi/4)$. A great many formulae can be expressed in terms of powers of $h$: for example $\cos(\pi/8) = \sqrt{(1 + h)/2}$.

Symbolic calculation involves lots of symbolic simplification. That makes it relatively slow, compared to calculation with floating-point numbers, but it is absolutely accurate – h<sup>2</sup> + h<sup>2</sup>, for example, is exactly 1. When measuring, we must convert symbolic probabilities into numbers. But that is part of a statistical calculation, so minor inaccuracy is acceptable.
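The difference between symbolic and floating-point calculation can be sketched as follows. This is a deliberately small model, not qtpi's calculator: it only handles numbers of the form a + b·h with rational a and b, using the identity h·h = 1/2, and the class name `HNum` is our own.

```python
import math
from fractions import Fraction


class HNum:
    """A number a + b*h with rational a and b, where h*h = 1/2."""

    def __init__(self, a=0, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __add__(self, other):
        return HNum(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b*h)(c + d*h) = (a*c + b*d/2) + (a*d + b*c)*h
        return HNum(self.a * other.a + self.b * other.b / 2,
                    self.a * other.b + self.b * other.a)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

    def __repr__(self):
        return f"{self.a} + {self.b}*h"


h = HNum(0, 1)
one = HNum(1)
print(h * h + h * h == one)   # True: the symbolic result is exactly 1

# The same sum in floating point is subject to rounding.
approx = math.sqrt(0.5)
print(approx * approx + approx * approx)
```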

**Fig. 1.** Quantum circuit for teleportation

```
proc System () =
  (newq x=|+>, y=|0>) x,y>>CNot .
  (new c:^bit*bit) | Alice(x,c) | Bob(y,c)
proc Alice (x:qbit, c:^bit*bit) =
    (newq z)
    out!["initially Alice's z is "] . outq!(qval z) . out!["\n"] .
    z,x>>CNot . z>>H . z-/-(vz) . x-/-(vx) . c!vz,vx . _0
proc Bob(y:qbit, c:^bit*bit) =
    c?(b1,b2) .
    y >> match b1,b2 . + 0b0,0b0 . I
                       + 0b0,0b1 . X
                       + 0b1,0b0 . Z
                       + 0b1,0b1 . Z*X .
    out!["finally Bob's y is "] . outq!(qval y) . out!["\n"] . _0
```
**Fig. 2.** Teleportation of an unknown quantum state, with logging

## **1.3 No cloning**

In the real quantum world there is no way of cloning a qubit – you can't start with a qubit in some arbitrary state and finish up with two qubits in that state. That, plus the fact that measurement irrevocably alters a qubit's state, is what provides quantum security protocols with unconditional security – though the uncertainty of measurement means that the guarantee is probabilistic, not absolute. A programming language which simulates quantum effects should therefore not allow copying of the value of a qubit variable. We use language restrictions to facilitate anti-cloning checks: in particular we severely restrict the use of qubits in data structures, in messages, and after measurement or transmission. Those checks are partly implemented by typechecking, partly by an efficient static symbolic execution before simulation begins.

## **1.4 Other notable features**

Randomised priority queues of runnable processes and waiting communication offers ensure non-deterministic execution, and are used to eliminate infinite unfairness. Logging steps can be pushed into subprocesses to clarify protocol descriptions, leaving a marker in the logged process to show where it should occur (see examples in artifact [6]). Type descriptions are almost entirely optional.

## **2 Straightforward description**

Our aim is to provide a programming language in which protocol descriptions are transparently easy to read. For example, Fig. 2 shows teleportation [4] using three processes: Alice and Bob carry out the protocol, and System sets up the communication between them. The calculation follows the circuit in Fig. 1, but is shared between agents obeying the anti-cloning restrictions.

The System process creates qubits x and y (newq ..), initialised to $|+\rangle$ and $|0\rangle$, and entangles them using a CNot gate (x,y>> ..). It creates a channel c which carries pairs of bits (new c ..), and then splits into two subprocesses: one becomes Alice, taking one of the qubits and the channel; the other becomes Bob, with the other qubit and the same channel. Those processes run in parallel.

The Alice process creates a new qubit z, without specifying its state, and logs that state (the anti-cloning restrictions make this tricky). Then it puts z and x through a CNot gate (z,x>> ..), puts z alone through a Hadamard gate (z>>H), and finally measures first z (z-/-(vz)), then x (x-/-(vx)), giving bits vz and vx. Finally it sends those bits to Bob on the c channel (c!...). The overall effect is subtle, because first System's actions entangle x and y, so that measurement of x constrains y, and then Alice entangles z, x and y, so that measurement of z constrains both x and y.

The Bob process waits to receive Alice's message (c? ..), and calculates a gate (match ..) to process the results depending on one of four possibilities for the two bits it receives (note one of the gates is the matrix product of Z and X). It puts y through that gate (y>> ..) and logs the result. The output of this program is always

`initially Alice's z is 2:(a2|0>+b2|1>)`

`finally Bob's y is 1:(a2|0>+b2|1>)`

where a<sub>2</sub> and b<sub>2</sub> are unknown symbolic amplitudes. A sample execution trace, edited for brevity, shows the states produced by Alice's actions: qubit 0 is x, 1 is y, 2 is z; initially 0 and 1 are entangled, and the first step entangles all three.

```
Alice (2:(a2|0>+b2|1>),0:[0;1](h|00>+h|11>)) >> Cnot;
  result (2:[2;0;1](h*a2|000>+h*a2|011>+h*b2|101>+h*b2|110>),
          0:[2;0;1](h*a2|000>+h*a2|011>+h*b2|101>+h*b2|110>))
Alice 2:[2;0;1](h*a2|000>+h*a2|011>+h*b2|101>+h*b2|110>) >> H;
  result 2:[2;0;1]
            (h(2)*a2|000>+h(2)*b2|001>+h(2)*b2|010>+h(2)*a2|011>
             +h(2)*a2|100>-h(2)*b2|101>-h(2)*b2|110>+h(2)*a2|111>)
Alice: 2: (.. as above ..) -/- ;
  result 0 and (0:[0;1](h*a2|00>+h*b2|01>+h*b2|10>+h*a2|11>),
                1:[0;1](h*a2|00>+h*b2|01>+h*b2|10>+h*a2|11>))
Alice: 0:[0;1](h*a2|00>+h*b2|01>+h*b2|10>+h*a2|11>) -/- ;
  result 1 and 1:(b2|0>+a2|1>)
Chan 2: Alice -> Bob (0,1)
Bob 1:(b2|0>+a2|1>) >> X; result 1:(a2|0>+b2|1>)
```
Tracing several executions shows that Alice's measurements don't always give the same results in vz, vx and qubit 1, so Bob doesn't always use the same gate(s). The qubit z is never sent in a message, is destroyed by Alice's measurement, and its amplitudes are unknown to the program, but y always finishes up in the state that z began in. Without symbolic calculation we couldn't do such a simulation.
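The claim that y always finishes in the state that z began in can be checked numerically for all four measurement outcomes. The following sketch is not qtpi code: it uses floating-point amplitudes for one concrete choice of a and b instead of symbolic ones, orders the three qubits as (z, x, y), and all function names are ours. Bob's corrections follow the table in Fig. 2 (I, X, Z, and Z*X, i.e. X first and then Z).

```python
import math

H_AMP = 1 / math.sqrt(2)


def index(z, x, y):
    """Position of the basis state |z x y> in the 8-element state vector."""
    return (z << 2) | (x << 1) | y


def cnot_z_x(state):
    """CNot with control z and target x: flip x wherever z is 1."""
    out = [0j] * 8
    for z in (0, 1):
        for x in (0, 1):
            for y in (0, 1):
                out[index(z, x ^ z, y)] += state[index(z, x, y)]
    return out


def hadamard_z(state):
    """Hadamard gate applied to qubit z only."""
    out = [0j] * 8
    for x in (0, 1):
        for y in (0, 1):
            a0, a1 = state[index(0, x, y)], state[index(1, x, y)]
            out[index(0, x, y)] = H_AMP * (a0 + a1)
            out[index(1, x, y)] = H_AMP * (a0 - a1)
    return out


def bob_state(state, vz, vx):
    """State of y after Alice measures (vz, vx) and Bob applies his gate."""
    y0, y1 = state[index(vz, vx, 0)], state[index(vz, vx, 1)]
    norm = math.sqrt(abs(y0) ** 2 + abs(y1) ** 2)
    y0, y1 = y0 / norm, y1 / norm
    if vx:              # X gate
        y0, y1 = y1, y0
    if vz:              # Z gate, applied after X
        y1 = -y1
    return y0, y1


def teleport_all_outcomes(a, b):
    """Run the protocol for z = a|0> + b|1> and return y for each outcome."""
    state = [0j] * 8
    for z, amplitude in ((0, a), (1, b)):
        for bit in (0, 1):      # x and y start as h|00> + h|11>
            state[index(z, bit, bit)] = amplitude * H_AMP
    state = hadamard_z(cnot_z_x(state))
    return {(vz, vx): bob_state(state, vz, vx)
            for vz in (0, 1) for vx in (0, 1)}


a, b = 0.6, 0.8j
for outcome, (y0, y1) in teleport_all_outcomes(a, b).items():
    print(outcome, abs(y0 - a) < 1e-12 and abs(y1 - b) < 1e-12)
```

Unlike the symbolic trace above, a numeric check of this kind only covers the particular amplitudes it is run with.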

## **3 Performance on examples**

We can run various simulations of the quantum key-distribution protocol BB84 [3], with Alice and Bob and various Eve processes. In order to generate a one-time key to encrypt an n-bit message, Alice needs to send many more bits than n, and our simulation allows us to experiment with various parameters of her calculation to see what happens. Here is a shortened display of part of the output of an example simulation (timing measurements made on VirtualBox Ubuntu 18.10, on a 7-year-old MacBook Air with 8GB RAM):

```
length of message? 4000; length of a hash key? 40;
minimum number of checkbits? 500; number of sigmas? 10;
number of trials? 100

13718 qubits per trial; 0 interfered with; 100 succeeded
```

It takes about 0.6 seconds for each trial, but overall it makes 1.3M qubit transfers and measurements in 60 CPU seconds. With an intercept-and-resend Eve, the same exchanges take 95 seconds, but Eve's interference is detected every time. With a very short message and very few checkbits we can show that even such a naive Eve can sometimes win, as statistical analysis predicts.

Our simulation of E92 QKD [7] uses 20 000 entangled qubit pairs per trial for the same-sized problem. Because the protocol calculations are more complicated and our calculation language is interpreted rather than compiled, simulation takes over 4 CPU minutes.

Qtpi can handle larger entanglements. In about 13 seconds it's able to set up and measure one 'brick' (ten qubits, all CZ-entangled) of the measurement-based quantum computing mechanism in [8] – but that's too small to be useful, and larger entanglements are exponentially worse.

## **4 Conclusions**

We have a quantum programming language which allows description of protocols with multiple agents. It has protection, built from well-understood computer science foundations, against cloning of qubits within a simulation. It is not yet able to deal efficiently with entanglements of more than a few qubits. Its symbolic calculator is fast enough for the protocols we have examined.

## **5 Data Availability and Acknowledgements**

The qtpi interpreter and the examples referred to in the paper are available at https://doi.org/10.6084/m9.figshare.11882592. Our research was supported by UK National Cyber Security Centre through the VeTSS project "Formal Verification of Quantum Security Protocols using Coq". Nagarajan was also supported by EU Cost Action IC1405 "Reversible Computation - Extending Horizons of Computing". We thank Simon Gay for helpful discussions.

## **References**



## **EMTST: Engineering the Meta-theory of Session Types**

David Castro, Francisco Ferreira, and Nobuko Yoshida

Imperial College London, {d.castro-perez, f.ferreira-ruiz, n.yoshida}@imperial.ac.uk

**Abstract** Session types provide a principled programming discipline for structured interactions. They represent a wide spectrum of type-systems for concurrency. Their type safety is thus extremely important. EMTST is a tool to aid in representing and validating theorems about session types in the Coq proof assistant. On paper, these proofs are often tricky, and error prone. In proof assistants, they are typically long and difficult to prove. In this work, we propose a library that helps validate the theory of session types calculi in proof assistants. As a case study, we study two of the most used binary session types systems: we show the impossibility of representing the first system in α-equivalent representations, and we prove type preservation for the revisited system. We develop our tool in the Coq proof assistant, using locally nameless for binders and small scale reflection to simplify the handling of linear typing environments.

**Keywords:** Concurrency · proof assistants · meta-theory · session-types.

## **1 Introduction**

Given the prevalence of distributed computing and multi-core processors, concurrency is a key aspect of modern computing. The transition from sequential models of computation to concurrent systems has huge practical and theoretical consequences. Message passing calculi (like the π-calculus) have been used to model these systems since their introduction by Milner et al. [15]. Notably, in many cases *typing disciplines* are used as a way to control concurrent and distributed behaviour. Certifying basic typed π-calculi is important for both the safety of implementations and the trustworthiness of new theories.

In this work, we concentrate on providing tools for reasoning about *session types* [10], a typing discipline for structured interactions in distributed systems. Session types are applied to a wide range of problems, and their properties, such as deadlock-freedom, are well studied. These calculi are very expressive, and rather complex, with features like shared and linear communication channels, name passing, and fresh name generation. Given this complexity, it is not surprising that some innocent-looking extensions violated the type safety properties of the calculus in several papers in the literature (as pointed out by [23]). As a consequence, interest in mechanisation and formal proofs has risen significantly as a means to increase trust in these systems.

Type systems offer certain security properties by construction. These guarantees are backed by rigorous proofs (these proofs constitute the meta-theory of the system). However, these proofs are cumbersome to write, maintain and extend. Proof assistants aim to help with these problems. In this work, we develop the EMTST library to aid in the implementation of session calculi type systems. As a form of validation, we implement and replicate results in the meta-theory of binary session types. Concretely, we use the Coq proof assistant [20] to study the representation and meta-theory of the two systems described in [23].

EMTST uses *locally nameless* (LN) [1, 5] variable binders to represent syntax. The tool implements an LN library with extended support for multiple binding scopes, and a robust environment implementation suitable for the challenges of session typing disciplines. The library and lemmas are written taking advantage of boolean reflection through the use of the Ssreflect [7] library.

We implement two case studies from [23]. We refer to the first as *the original system* and to the second as *the revised system*. Notably, the way the original system handles names (see Sect. 3.1) makes its representation impossible when using intrinsically α-convertible terms (e.g. locally nameless, de Bruijn indices, and many others). Furthermore, in Sect. 3.2 we discuss how the revised system allows us to implement and prove type preservation. In hindsight, this problem appears evident, but it is an unexpected consequence, and it shows that mechanising proofs brings further understanding even to well-established and thoroughly studied systems. EMTST and our case studies are available at https://github.com/emtst/emtst-proof.

The rest of the paper is structured in the following way: in the next section we introduce the ideas and design behind EMTST, our library for mechanising the meta-theory of session types. Subsequently, in Sect. 3, we present the two case studies: the original system from [23, 11] in Sect. 3.1 and the revisited system in Sect. 3.2. We finish by giving a conclusion and discussing related work.

## **2 EMTST: a Tool for Representing the Meta-theory of Session Types**

The study of meta-theory (i.e., proving that a system has the expected properties) gives us confidence in the design. Additionally, proof formalisations not only give us confidence in the results, but also often result in new insights about a problem. This is due to the fact that successful mechanisations require very precise specifications and careful thought to define and revisit all the concepts. In this context, EMTST is a tool that implements locally nameless (initially proposed by [8, 14, 13], and more recently further developed in [1, 5]) with multiple binding scopes, and a robust typing environment implementation using boolean reflection (by building on top of ssreflect [7]).

The key concept of LN is to use de Bruijn indices [2] for bound variables and names (sometimes called "atoms" in the literature) for free variables. A representation of syntax is well formed, namely *locally closed*, when this invariant is respected (i.e., no de Bruijn index is free). Finally, in order to deal with open terms, there are two convenient operations on syntax: one to *open* binders in terms, and one to *close* binders. The former substitutes a bound variable with a fresh name, and the latter does the converse. For more details, refer to our tech report [4], the references, and the implementation.
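The open and close operations can be sketched on a small example. For brevity, the sketch uses the untyped lambda calculus rather than a session calculus, and it is written in Python rather than Coq; the names `open_term`, `close_term` and `locally_closed` are ours and do not refer to definitions in EMTST.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BVar:          # bound variable: a de Bruijn index
    index: int


@dataclass(frozen=True)
class FVar:          # free variable: a name (atom)
    name: str


@dataclass(frozen=True)
class Lam:           # binder; its body may refer to index 0
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def open_term(t, name, k=0):
    """Replace the bound variable with index k by the free name."""
    if isinstance(t, BVar):
        return FVar(name) if t.index == k else t
    if isinstance(t, FVar):
        return t
    if isinstance(t, Lam):
        return Lam(open_term(t.body, name, k + 1))
    return App(open_term(t.fun, name, k), open_term(t.arg, name, k))


def close_term(t, name, k=0):
    """Replace the free name by the bound variable with index k."""
    if isinstance(t, FVar):
        return BVar(k) if t.name == name else t
    if isinstance(t, BVar):
        return t
    if isinstance(t, Lam):
        return Lam(close_term(t.body, name, k + 1))
    return App(close_term(t.fun, name, k), close_term(t.arg, name, k))


def locally_closed(t, depth=0):
    """True when no de Bruijn index points outside its enclosing binders."""
    if isinstance(t, BVar):
        return t.index < depth
    if isinstance(t, FVar):
        return True
    if isinstance(t, Lam):
        return locally_closed(t.body, depth + 1)
    return locally_closed(t.fun, depth) and locally_closed(t.arg, depth)


identity = Lam(BVar(0))
body = open_term(identity.body, "x")           # FVar("x")
print(locally_closed(identity), locally_closed(identity.body))  # True False
print(Lam(close_term(body, "x")) == identity)  # True
```

Opening a binder with a name that does not yet occur in the term and closing it again yields the original term, which is why a source of fresh names is needed.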

## **2.1 Environments and Multiple Name Scopes**

```
Module Type ATOM.
 Parameter atom : Set.
 Definition t := atom.
 (* atoms can be compared to booleans *)
 Parameter eq_atom : atom -> atom -> bool.
 Parameter eq_reflect : forall (a b : atom),
   ssrbool.reflect (a = b) (eq_atom a b).
 Parameter atom_eqMixin : Equality.mixin_of atom.
 Canonical atom_eqType := EqType atom atom_eqMixin.
 Parameter fresh : seq atom -> atom.
 Parameter fresh_not_in : forall l, (fresh l) \notin l.
 (* ... *)
End ATOM.
```
**Figure 1.** The type of atoms

*Locally nameless implementation* is in three files. The first (theories/Atom.v) provides the basic definition and specification of atoms to act as names, the second one (theories/AtomScopes.v) provides a way to create multiple disjoint sets of names for representing variables in the different scopes that session types require (e.g. variables and channel names), and the final one (theories/Env.v) implements contexts and typings as finite maps, with emphasis on supporting the linearity requirements of various session typing disciplines.

We use module types and parametrised modules to abstract the type of atoms together with their supported operations. Figure 1 shows the interface for working with atoms: how to compare them, a function to obtain a fresh atom given a finite sequence of atoms (definition: fresh), and a proof that the fresh atom is actually fresh (definition: fresh\_not\_in).

*Environments.* Environments are parametrised over two types, one for the keys and one for the values. An environment env is either undefined or a finite map with unique keys. All operations maintain this invariant: any operation that would lead to a duplicated key makes the environment undefined. We define the expected operations and lemmas over the type env, and we provide an extensive library of proved theorems about environments that is tailored to support linear and affine systems.
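The invariant described above can be illustrated with a small Python sketch. It is not the Coq implementation: the class `Env` and its methods `add` and `look` are hypothetical names chosen for this example, and the sketch only models the rule that a duplicated key makes the environment undefined.

```python
class Env:
    """Finite map with unique keys; an undefined environment has no entries at all."""

    def __init__(self, entries=None, defined=True):
        # `None` models the undefined environment.
        self._entries = dict(entries or {}) if defined else None

    @property
    def defined(self):
        return self._entries is not None

    def add(self, key, value):
        """Add a binding; a duplicated key (or an undefined input) yields undefined."""
        if not self.defined or key in self._entries:
            return Env(defined=False)
        return Env({**self._entries, key: value})

    def look(self, key):
        """Return the value bound to `key`, or None if absent or undefined."""
        return self._entries.get(key) if self.defined else None
```

In this sketch undefinedness is absorbing: once a duplicate has been inserted, every later operation keeps the environment undefined, which is one way to keep statements about linear contexts free of side conditions on key uniqueness.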

EMTST is used in the two formalisations in Sect. 3.1 and 3.2, and we claim it is also suitable for other mechanisations where resource sensitivity and locally nameless are required. A release version of EMTST is available at [3] and the public repository at: https://github.com/emtst/emtst-proof.

## **3 Two Case Studies on Binary Session Types**

EMTST is intended to help with the complex binding structure of concurrent calculi that have names as a first class notion together with linear or affine typing disciplines. We study two seminal session type systems in the literature. First, the *original system*, Honda, Vasconcelos and Kubo's binary session type system [11], which is a milestone in the development of type systems for concurrent process calculi. This system types structured interaction between processes and supports channel mobility, that is, higher-order sessions. Second, we implement the revisited session type presentation from [23], inspired by [6]. Our technical report [4] contains an extensive presentation.

#### **3.1 The Original System**


Figure 2 presents the syntax following [23], where names are ranged over by a, b, c, . . . and channels are ranged over by k and k'. Notice that all the places where there are variable binders are denoted with parentheses followed by a dot (e.g., k?(x).P). The syntax is straightforwardly defined as the proc inductive type in theories/SyntaxO.v and, following the LN technique, the locally closed predicate, which formalises the binding structure, is defined as the predicate lc.

Besides its syntax, the original system is specified by its reduction, congruence and typing relations. We want to call attention to an important reduction rule for passing names:

$$\text{[Pass-NM]} \qquad \mathbf{throw}\ k[k']; P \mid \mathbf{catch}\ k(k').Q \longrightarrow P \mid Q$$

This rule states that when passing a channel k', the receiving end has to bind a channel using the same name (or be α-convertible to that name). Notably, the name k' is a bound name in the receiving end, and the restriction imposed by the rule is a subtle change to the equality up-to α-conversion convention. Moreover, relaxations of that requirement may break subject reduction; a complete discussion is presented in Sect. 3 of [23]. As it is, this rule cannot be formalised in a representation that cannot distinguish between α-equivalent terms, since in these representations one cannot talk about the actual name of a bound variable. This is fundamentally what it means to be *up-to* α*-equality*. As a consequence, in locally nameless we are forced to specify the following rule:

$$\text{[Pass-LN]} \qquad \frac{\mathsf{lc}\ P \qquad \mathsf{body}\ Q}{\mathbf{throw}\ k[k']; P \mid \mathbf{catch}\ k().Q \longrightarrow P \mid Q^{k'}}$$

In this version of the rule, the bound name is just an anonymous de Bruijn index, and when it is opened it is assigned the same name k'. This change might look innocent, but it breaks subject reduction. In theories/TypesO.v, we show that the same counterexample from [23] is typable and that it breaks subject reduction. This is presented in the CounterExample module and in the oft\_reduced lemma. In the next section, we discuss how this problem was addressed.

### **3.2 The Revised System**


As discussed in Sect. 3.1 and [23], the presentation of the original session types calculus [11] makes extending it (and representing it in LN) a delicate operation. Fortunately, the revised system (also from [23], inspired by [6]) proposes a solution. Indeed, this solution is readily implementable using LN (and many other representations with implicit α-equivalence).

The key insight in the design of the revisited system is considering *channel endpoints* instead of just *channels*. As before, a new channel is created when a requested session is accepted, and each continuation gets one of the *endpoints* of the newly created channel.


**Figure 3.** Syntax representation annotated with binders

For the revisited system's formalisation we distinguish four categories of binders (as shown in Figure 3): first, expression variables, with names from the set AEV; then shared channel variables from ASC; also linear channel variables from ALC; and finally channel names from ACN (these names can also be bound in restrictions). Channel names are not variables, but objects that exist at runtime.

Multiple disjoint sets of names simplify reasoning about free names (concretely, they avoid freshness problems among different kinds of binders). This is an engineering compromise: having more kinds of binders duplicates some easy theorems but, in exchange, simplifies the harder theorems that rely on facts about LN open/close operations. Other compromises are possible.

This concludes the technical development, and represents a full proof of subject reduction for binary types, following the revised system<sup>1</sup> as defined in [23].

## **4 Related Work and Conclusions**

We presented EMTST, a tool conceived to aid in the mechanisation of session calculi. Our tool supports locally nameless representations with many disjoint atom scopes and a versatile representation of environments, all while taking advantage of the small scale reflection style of proofs. We validated our design by formalising the subject reduction proof for a full session calculus type system. We also explored issues with adequacy when, for example, systems contain fragile specifications.

Tools like Metalib [22] (implemented based on [1]) and AutoSubst [18] exist, but lack the ability to represent different binding scopes in the same syntax. Also, Polonowski [17] implements a library for generic environments. While this library is similar to ours, it does not make use of boolean reflection, which, in our opinion, simplifies dealing with the equality of environments. While these libraries were influential, our requirements of multiple scopes of binding and boolean reflection proofs meant that we needed to develop EMTST, our own fit for purpose library.

Finally, formalisations of session types in proof assistants exist in the literature (e.g., [21, 24, 19, 16, 9]). Most of them use ad-hoc binder representations and are not necessarily meant to be reused or general enough for other developments. This paper and the EMTST library are a step towards making such reuse easier. For that purpose we developed the library and validated its claims by formalising existing systems from the literature. In the process (see Sect. 3.1 vs Sect. 3.2), we showed how early mechanisation would help avoid problems in the presentation of a system. In the future, we plan to extend our use of the library to reason about multiparty session types [12] and other systems.

## **Acknowledgements**

This work was supported in part by EPSRC projects EP/K011715/1, EP/K034413/1, EP/L00058X/1, EP/N027833/1, EP/N028201/1, and EP/T006544/1.

<sup>1</sup> A minor difference is that we use a simpler version of recursion compared to the original paper.

## **Bibliography**



## Games and Automata

#### **Solving Mean-Payoff Games via Quasi Dominions**<sup>⋆</sup>

Massimo Benerecetti, Daniele Dell'Erba, and Fabio Mogavero

Università degli Studi di Napoli Federico II, Naples, Italy

**Abstract.** We propose a novel algorithm for the solution of *mean-payoff games* that merges together two seemingly unrelated concepts introduced in the context of parity games, *small progress measures* and *quasi dominions*. We show that the integration of the two notions can be highly beneficial and significantly speeds up convergence to the problem solution. Experiments show that the resulting algorithm performs orders of magnitude better than the asymptotically-best solution algorithm currently known, without sacrificing on the worst-case complexity.

## **1 Introduction**

In this article we consider the problem of solving *mean-payoff games*, namely infinite-duration perfect-information two-player games played on weighted directed graphs, each of whose vertices is controlled by one of the two players. The game starts at an arbitrary vertex and, during its evolution, each player can take moves at the vertices it controls, by choosing one of the outgoing edges. The moves selected by the two players induce an infinite sequence of vertices, called a play. The payoff of any prefix of a play is the sum of the weights of its edges. A play is winning if it satisfies the game objective, called *mean-payoff objective*, which requires that the limit of the *mean payoff*, taken over the prefix lengths, never falls below a given *threshold* ν.

Mean-payoff games were first introduced and studied by Ehrenfeucht and Mycielski in [20], who showed that positional strategies suffice to obtain the optimal value. A slightly generalized version was also considered by Gurvich *et al.* in [24]. Positional determinacy entails that the decision problem for these games lies in NPTime ∩ CoNPTime [34], and it was later shown to belong to UPTime ∩ CoUPTime [25], UPTime being the class of unambiguous nondeterministic polynomial time. This result gives the problem a rather peculiar complexity status, shared by very few other problems, such as integer factorization [1, 22] and parity games [25]. Despite various attempts [7, 19, 24, 30, 34], no polynomial-time algorithm for the mean-payoff game problem is known so far.

A different formulation of the game objective allows one to define another class of quantitative games, known as *energy games*. The *energy objective* requires that, given an initial value c, called *credit*, the sum of c and the *payoff* of every prefix of the play never falls below 0. These games, however, are tightly connected to mean-payoff games, as the two types of games have been proved to be log-space equivalent [11]. They are also related to other more complex forms of quantitative games. In particular, unambiguous polynomial-time reductions [25] exist from these games to *discounted payoff* [34] and *simple stochastic games* [18].

<sup>⋆</sup> Partially supported by GNCS 2019 & 2020 projects "Metodi Formali per Tecniche di Verifica Combinata" and "Ragionamento Strategico e Sintesi Automatica di Sistemi Multi-Agente".

Recently, a fair amount of work in formal verification has been directed to consider, besides correctness properties of computational systems, also quantitative specifications, in order to express performance measures and resource requirements, such as quality of service, bandwidth and power consumption and, more generally, bounded resources. Mean-payoff and energy games also have important practical applications in system verification and synthesis. In [14] the authors show how quantitative aspects, interpreted as penalties and rewards associated to the system choices, allow for expressing optimality requirements encoded as mean-payoff objectives for the automatic synthesis of systems that also satisfy parity objectives. With similar application contexts in mind, [9] and [8] further contribute to that effort, by providing complexity results and practical solutions for the verification and automatic synthesis of reactive systems from quantitative specifications expressed in linear time temporal logic extended with mean-payoff and energy objectives. Further applications to temporal networks have been studied in [16] and [15]. Consequently, efficient algorithms to solve mean-payoff games become essential ingredients to tackle these problems in practice.

Several algorithms have been devised in the past for the solution of the decision problem for mean-payoff games, which asks whether there exists a strategy for one of the players that grants the mean-payoff objective. The very first deterministic algorithm was proposed in [34], where it is shown that the problem can be solved with $O(n^3 \cdot m \cdot W)$ arithmetic operations, with n and m the number of positions and moves, respectively, and W the maximal absolute weight in the game. A strategy improvement approach, based on iteratively adjusting a randomly chosen initial strategy for one player until a winning strategy is obtained, is presented in [31], which has an exponential upper bound. The algorithm by Lifshits and Pavlov [29], which runs in time $O(n \cdot m \cdot 2^n \cdot \log_2 W)$, computes the "potential" of each game position, which corresponds to the initial credit that the player needs in order to win the game from that position. Algorithms based on the solution of linear feasibility problems over the tropical semiring have also been provided in [2–4]. The best known deterministic algorithm to date, which requires $O(n \cdot m \cdot W)$ arithmetic operations, was proposed by Brim *et al.* [13]. They adapt to energy and mean-payoff games the notion of progress measures [28], as applied to parity games in [26]. The approach was further developed in [17] to obtain the same complexity bound for the optimal strategy synthesis problem. A strategy-improvement refinement of this technique has been introduced in [12]. Finally, Bjork *et al.* [6] proposed a randomized strategy-improvement based algorithm running in time $\min\{O(n^2 \cdot m \cdot W),\ 2^{O(\sqrt{n \cdot \log n})}\}$.

Our contribution is a novel mean-payoff progress measure approach that enriches such measures with the notion of *quasi dominions*, originally introduced in [5] for parity games. These are sets of positions with the property that as long as the opponent chooses to play to remain in the set, it loses the game for sure, hence its best choice is always to try to escape. A quasi dominion from where escaping is not possible is a winning set for the other player. Progress measure approaches, such as the one of [13], typically focus on finding the best choices of the opponent and little information is gathered on the other player. In this sense, they are intrinsically asymmetric. Enriching the approach with quasi dominions can be viewed as a way to also encode the best choices of the player, information that can be exploited to speed up convergence significantly. The main difficulty here is that suitable lift operators in the new setting do not enjoy monotonicity. Such a property makes proving completeness of classic progress measure approaches almost straightforward, as monotonic operators do admit a least fixpoint. Instead, the lift operator we propose is only inflationary (specifically, non-decreasing) and, while still admitting fixpoints [10, 33], need not have a least one. Hence, providing a complete solution algorithm proves more challenging. The advantages, however, are significant. On the one hand, the new algorithm still enjoys the same worst-case complexity of the best known algorithm for the problem proposed in [13]. On the other hand, we show that there exist families of games on which the classic approach requires a number of operations that can be made arbitrarily larger than the one required by the new approach. Experimental results also witness the fact that this phenomenon is by no means isolated, as the new algorithm performs orders of magnitude better than the algorithm developed in [13].

## **2 Mean-Payoff Games**

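A two-player turn-based *arena* is a tuple $\mathcal{A} = \langle \mathrm{Ps}_\oplus, \mathrm{Ps}_\ominus, Mv \rangle$, with $\mathrm{Ps}_\oplus \cap \mathrm{Ps}_\ominus = \emptyset$ and $\mathrm{Ps} \triangleq \mathrm{Ps}_\oplus \cup \mathrm{Ps}_\ominus$, such that $\langle \mathrm{Ps}, Mv \rangle$ is a finite directed graph without sinks. $\mathrm{Ps}_\oplus$ (*resp.*, $\mathrm{Ps}_\ominus$) is the set of positions of player $\oplus$ (*resp.*, $\ominus$) and $Mv \subseteq \mathrm{Ps} \times \mathrm{Ps}$ is a left-total relation describing all possible moves. A *path* in $V \subseteq \mathrm{Ps}$ is a finite or infinite sequence $\pi \in \mathrm{Pth}(V)$ of positions in $V$ compatible with the move relation, *i.e.*, $(\pi_i, \pi_{i+1}) \in Mv$, for all $i \in [0, |\pi| - 1)$. A positional *strategy* for player $\alpha \in \{\oplus, \ominus\}$ on $V \subseteq \mathrm{Ps}$ is a function $\sigma_\alpha \in \mathrm{Str}_\alpha(V) \subseteq (V \cap \mathrm{Ps}_\alpha) \to \mathrm{Ps}$, mapping each $\alpha$-position $v$ in the domain of $\sigma_\alpha$ to a position $\sigma_\alpha(v)$ compatible with the move relation, *i.e.*, $(v, \sigma_\alpha(v)) \in Mv$. With $\mathrm{Str}_\alpha(V)$ we denote the set of all $\alpha$-strategies on $V$, while $\mathrm{Str}_\alpha$ denotes $\bigcup_{V \subseteq \mathrm{Ps}} \mathrm{Str}_\alpha(V)$. A *play* in $V \subseteq \mathrm{Ps}$ from a position $v \in V$ *w.r.t.* a pair of strategies $(\sigma_\oplus, \sigma_\ominus) \in \mathrm{Str}_\oplus(V) \times \mathrm{Str}_\ominus(V)$, called $((\sigma_\oplus, \sigma_\ominus), v)$*-play*, is a path $\pi \in \mathrm{Pth}(V)$ such that $\pi_0 = v$ and, for all $i \in [0, |\pi| - 1)$, if $\pi_i \in \mathrm{Ps}_\oplus$ then $\pi_{i+1} = \sigma_\oplus(\pi_i)$ else $\pi_{i+1} = \sigma_\ominus(\pi_i)$. The *play function* $\mathsf{play} : (\mathrm{Str}_\oplus(V) \times \mathrm{Str}_\ominus(V)) \times V \to \mathrm{Pth}(V)$ returns, for each position $v \in V$ and pair of strategies $(\sigma_\oplus, \sigma_\ominus) \in \mathrm{Str}_\oplus(V) \times \mathrm{Str}_\ominus(V)$, the maximal $((\sigma_\oplus, \sigma_\ominus), v)$-play $\mathsf{play}((\sigma_\oplus, \sigma_\ominus), v)$. If a pair $(\sigma_\oplus, \sigma_\ominus) \in \mathrm{Str}_\oplus(V) \times \mathrm{Str}_\ominus(V)$ induces a finite play starting from position $v \in V$, then $\mathsf{play}((\sigma_\oplus, \sigma_\ominus), v)$ identifies the maximal prefix of that play that is contained in $V$.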

A *mean-payoff game* (MPG for short) is a tuple $\Game = \langle \mathcal{A}, \mathrm{Wg}, \mathsf{wg} \rangle$, where $\mathcal{A}$ is an arena, $\mathrm{Wg} \subset \mathbb{Z}$ is a finite set of integer weights, and $\mathsf{wg} : \mathrm{Ps} \to \mathrm{Wg}$ is a *weight function* assigning a weight to each position. $\mathrm{Ps}^{+}$ (*resp.*, $\mathrm{Ps}^{-}$) denotes the set of positive-weight positions (*resp.*, non-positive-weight positions). For convenience, we shall refer to non-positive weights as negative weights. Notice that this definition of MPG is equivalent to the classic formulation in which the weights label the moves, instead. The weight function naturally extends to paths, by setting $\mathsf{wg}(\pi) \triangleq \sum_{i=0}^{|\pi|-1} \mathsf{wg}(\pi_i)$. The goal of player $\oplus$ (*resp.*, $\ominus$) is to maximize (*resp.*, minimize) $\mathsf{v}(\pi) \triangleq \liminf_{i \to \infty} \frac{1}{i} \cdot \mathsf{wg}(\pi_{\leq i})$, where $\pi_{\leq i}$ is the prefix up to index $i$. Given a threshold $\nu$, a set of positions $V \subseteq \mathrm{Ps}$ is a $\oplus$-*dominion*, if there exists a $\oplus$-strategy $\sigma_\oplus \in \mathrm{Str}_\oplus(V)$ such that, for all $\ominus$-strategies $\sigma_\ominus \in \mathrm{Str}_\ominus(V)$ and positions $v \in V$, the induced play $\pi = \mathsf{play}((\sigma_\oplus, \sigma_\ominus), v)$ satisfies $\mathsf{v}(\pi) > \nu$. The pair of winning regions $(\mathrm{Wn}_\oplus, \mathrm{Wn}_\ominus)$ forms a $\nu$-mean partition. Assuming $\nu$ integer, the $\nu$-mean partition problem is equivalent to the 0-mean partition one, as we can subtract $\nu$ from the weights of all the positions. As a consequence, the MPG decision problem can be equivalently restated as deciding whether player $\oplus$ (*resp.*, $\ominus$) has a strategy to enforce $\liminf_{i \to \infty} \frac{1}{i} \cdot \mathsf{wg}(\pi_{\leq i}) > 0$ (*resp.*, $\liminf_{i \to \infty} \frac{1}{i} \cdot \mathsf{wg}(\pi_{\leq i}) \leq 0$), for all the resulting plays $\pi$.
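Since positional strategies induce plays of the form prefix followed by a repeated cycle, the mean payoff of such a play is the average weight of its cycle. The following Python sketch illustrates this; the function names and the sample weights are assumptions made for the example, and `prefix_mean` divides by the number of positions read, which differs from the indexing used above by one position without affecting the limit.

```python
from fractions import Fraction
from itertools import chain, cycle, islice


def mean_payoff(weights, loop):
    """Limit mean payoff of any play that ends up repeating `loop` forever."""
    return Fraction(sum(weights[v] for v in loop), len(loop))


def prefix_mean(weights, prefix, loop, length):
    """Mean weight of the first `length` positions of the play prefix . loop^omega."""
    play = islice(chain(prefix, cycle(loop)), length)
    return Fraction(sum(weights[v] for v in play), length)


# Hypothetical weights on three positions (weights label positions, as above).
weights = {"a": 3, "c": 0, "d": 1}
limit = mean_payoff(weights, ["c", "d"])
approx = prefix_mean(weights, ["a"], ["c", "d"], 1001)
```

With threshold 0, this play is winning for player ⊕, because the cycle through c and d has positive average weight; the finite prefix through a only affects how fast the prefix means approach the limit.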

## **3 Solving Mean-Payoff Games via Progress Measures**

The abstract notion of progress measure [28] has been introduced as a way to encode global properties on paths of a graph by means of simpler local properties of adjacent vertices. In the context of MPGs, the graph property of interest, called *mean-payoff property*, requires that the mean payoff of every infinite path in the graph be non-positive. More precisely, in game theoretic terms, a *mean-payoff progress measure* witnesses the existence of a strategy $\sigma_\ominus$ for player ⊖ such that each path in the graph induced by fixing that strategy on the arena satisfies the desired property. A mean-payoff progress measure associates with each vertex of the underlying graph a value, called *measure*, taken from the set of extended natural numbers $\mathbb{N}_\infty \triangleq \mathbb{N} \cup \{\infty\}$, endowed with an ordering relation ≤ and an addition operation +, which extend the standard ordering and addition over the naturals in the usual way. Measures are associated with positions in the game and the measure of a position v can intuitively be interpreted as an estimate of the payoff that player ⊕ can enforce on the plays starting in v. In this sense, they measure "how far" v is from satisfying the mean-payoff property, with the maximal measure ∞ denoting failure of the property for v. More precisely, the ⊖-strategy induced by a progress measure ensures that measures do not increase along the paths of the induced graph. This ensures that every path eventually gets trapped in a non-positive-weight cycle, witnessing a win for player ⊖.

To obtain a progress measure, one starts from some suitable association of the positions of the game with measures. The local information encoded by these measures is then propagated back along the edges of the underlying graph so as to associate with each position the information gathered along plays of some finite length starting from that position. The propagation process is performed according to the following intuition. The measures of positions adjacent to v are propagated back to v only if those measures push v further away from the property. This propagation is achieved by means of a measure stretch operation +, which adds, when appropriate, the measure of an adjacent position to the weight of a given position. This is established by comparing the measure of v with those of its adjacent positions, since, for each position v, the mean-payoff property is defined in terms of the sum of the weights encountered along the plays from that position. The process ends when no position can be pushed further away from the property and each position is not dominated by any, respectively one, of its adjacents, depending on whether that position belongs to player ⊕ or to player ⊖, respectively. The positions that did not reach measure ∞ are those from which player ⊖ can win the game and the set of measures currently associated with such positions forms a mean-payoff progress measure.

To make the above intuitions precise, we introduce the notions of measure function and progress measure, and an algorithm for computing progress measures correctly. It is worth noticing that the progress-measure based approach as described in [13], called SEPM from now on, can be easily recast equivalently in the form below. A *measure function* $\mu : \mathrm{Ps} \to \mathbb{N}_\infty$ maps each position $v$ in the game to a suitable measure $\mu(v)$. The order ≤ of the measures naturally induces a pointwise partial order $\sqsubseteq$ on the measure functions defined in the usual way, namely, for any two measure functions $\mu_1$ and $\mu_2$, we write $\mu_1 \sqsubseteq \mu_2$ if $\mu_1(v) \leq \mu_2(v)$, for all positions $v$. The set of measure functions over a measure space, together with the induced ordering $\sqsubseteq$, forms a *measure-function space*.

**Definition 1 (Measure-Function Space).** *The* measure-function space *is the partial order* $\mathcal{F} \triangleq \langle \mathrm{MF}, \sqsubseteq \rangle$*, whose components are defined as follows:*


*The* ⊕-denotation *(*resp.*,* ⊖-denotation*) of a measure function* $\mu \in \mathrm{MF}$ *is the set* $\|\mu\|_\oplus \triangleq \mu^{-1}(\infty)$ *(*resp.*,* $\|\mu\|_\ominus \triangleq \overline{\mu^{-1}(\infty)}$*) of all positions having maximal (*resp.*, non-maximal) measure associated within* $\mu$*.*

Consider a position $v$ with an adjacent $u$ with measure $\eta$. A measure update of $\eta$ *w.r.t.* $v$ is obtained by the stretch operator $+ : \mathbb{N}_\infty \times \mathrm{Ps} \to \mathbb{N}_\infty$, defined as $\eta + v \triangleq \max\{0, \eta + \mathsf{wg}(v)\}$, which corresponds to the payoff estimate that the given position will obtain by choosing to follow the move leading to $u$.

A *mean-payoff progress measure* is such that the measure associated with each game position v need not be increased further in order to beat the actual payoff of the plays starting from v. In particular, it can be defined by taking into account the opposite attitude of the two players in the game. While player ⊕ tries to push toward higher measures, player ⊖ will try to keep the measures as low as possible. A measure function in which the payoff of each ⊕-position (*resp.*, ⊖-position) v is not dominated by the payoff of all (*resp.*, some of) its adjacents augmented with the weight of v itself meets the requirements.

**Definition 2 (Progress Measure).** *A measure function* μ ∈ MF *is a* progress measure *if the following two conditions hold true, for all positions* v ∈ Ps*:*

1. $\mu(u) + v \leq \mu(v)$*, for all adjacents* $u \in Mv(v)$ *of* $v$*, if* $v \in \mathrm{Ps}_\oplus$*;*
2. $\mu(u) + v \leq \mu(v)$*, for some adjacent* $u \in Mv(v)$ *of* $v$*, if* $v \in \mathrm{Ps}_\ominus$*.*
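The two conditions of Definition 2 can be checked directly. The following Python sketch does so under a few assumptions: a game is a dictionary mapping each position to a hypothetical triple (owner, weight, successors), with `"+"` for player ⊕ and `"-"` for the opponent, and the two-position game used here is invented for the example.

```python
import math

INF = math.inf


def stretch(eta, weight):
    """Stretch of measure eta w.r.t. a position of the given weight: max{0, eta + wg(v)}."""
    return INF if eta == INF else max(0, eta + weight)


def is_progress_measure(game, mu):
    """Check the two conditions of the definition on every position."""
    for v, (owner, weight, successors) in game.items():
        dominated = [stretch(mu[u], weight) <= mu[v] for u in successors]
        if owner == "+" and not all(dominated):
            return False  # condition 1 fails: some adjacent pushes v higher
        if owner == "-" and not any(dominated):
            return False  # condition 2 fails: no adjacent keeps v low enough
    return True


# Invented game: u belongs to the opponent, v to player "+".
game = {
    "u": ("-", -1, ["u", "v"]),
    "v": ("+", 2, ["u"]),
}
```

In this invented game, the function mapping u to 0 and v to 2 satisfies both conditions, while lowering the measure of v to 1 violates the first one.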

The following theorem states the fundamental property of progress measures, namely, that every position with a non-maximal measure is won by player ⊖.

#### **Theorem 1 (Progress Measure).** $\|\mu\|_\ominus \subseteq \mathrm{Wn}_\ominus$*, for all progress measures* $\mu$*.*

In order to obtain a progress measure from a given measure function, one can iteratively adjust the current measure values in such a way to force the progress condition above among adjacent positions. To this end, we define the *lift operator* lift: MF → MF as follows:

$$\mathsf{lift}(\mu)(v) \triangleq \begin{cases} \max\{\mu(w) + v : w \in Mv(v)\}, & \text{if } v \in \mathrm{Ps}_{\oplus}; \\ \min\{\mu(w) + v : w \in Mv(v)\}, & \text{otherwise.} \end{cases}$$

Note that the lift operator is clearly monotone and, therefore, admits a least fixpoint. A mean-payoff progress measure can be obtained by repeatedly applying this operator until a fixpoint is reached, starting from the minimal measure function $\mu_0 \triangleq \{v \in \mathrm{Ps} \mapsto 0\}$ that assigns measure 0 to all the positions in the game. The following *solver operator* applied to $\mu_0$ computes the desired solution: $\mathsf{sol} \triangleq \mathsf{lfp}\, \mu \,.\, \mathsf{lift}(\mu) : \mathrm{MF} \to \mathrm{MF}$. Observe that the measures generated by the procedure outlined above have a fairly natural interpretation. Each positive measure, indeed, under-approximates the weight that player ⊕ can enforce along finite prefixes of the plays from the corresponding positions. This follows from the fact that, while player ⊕ maximizes its measures along the outgoing moves, player ⊖ minimizes them. In this sense, each positive measure witnesses the existence of a positively-weighted finite prefix of a play that player ⊕ can enforce. Let $S \triangleq \sum \{\mathsf{wg}(v) \in \mathbb{N} : v \in \mathrm{Ps} \wedge \mathsf{wg}(v) > 0\}$ be the sum of all the positive weights in the game. Clearly, the maximal payoff of a simple play in the underlying graph cannot exceed $S$. Therefore, a measure greater than $S$ witnesses the existence of a cycle whose payoff diverges to infinity and is won, thus, by player ⊕. Hence, any measure strictly greater than $S$ can be substituted with the value ∞. This observation establishes the termination of the algorithm and is instrumental to its completeness proof. Indeed, at the fixpoint, the measures actually coincide with the highest payoff player ⊕ is able to guarantee.
Soundness and completeness of the above procedure have been established in [13], where the authors also show that, despite the algorithm requiring $O(n \cdot S) = O(n^2 \cdot W)$ lift operations in the worst-case, with $n$ the number of positions and $W$ the maximal positive weight in the game, the overall cost of these lift operations is $O(S \cdot m \cdot \log S) = O(n \cdot m \cdot W \cdot \log(n \cdot W))$, with $m$ the number of moves and $O(\log S)$ the cost of arithmetic operations to compute the stretch of the measures.

## **4 Solving Mean-Payoff Games via Quasi Dominions**

Let us consider the simple example game depicted in Figure 1, where the shape of each position indicates the owner, circles for player ⊕ and squares for its opponent ⊖, and, in each label, the letter w after the slash corresponds to the associated weight, where we assume k > 1. Starting from the smallest measure function $\mu_0 = \{a, b, c, d \mapsto 0\}$, the first application of the lift operator returns $\mu_1 = \{a \mapsto k;\ b, c \mapsto 0;\ d \mapsto 1\} = \mathsf{lift}(\mu_0)$. After that step, the following iterations of the fixpoint alternately update positions c and d, since the other ones already satisfy the progress condition. Since $c \in \mathrm{Ps}_\ominus$, the lift operator chooses for it the measure computed along the move (c, d), thus obtaining $\mu_2(c) = \mathsf{lift}(\mu_1)(c) = \mu_1(d) = 1$. Subsequently, d is updated to $\mu_3(d) = \mathsf{lift}(\mu_2)(d) = \mu_2(c) + 1 = 2$. A progress measure is obtained after exactly 2k+1 iterations, when the measure of c reaches value k and d value k+1. Note, however, that the choice of the move (c, d) is clearly a losing strategy for player ⊖, as remaining in the highlighted region would make the payoff from position c diverge. Therefore, the only reasonable choice for player ⊖ is to exit from that region by taking the move leading to position a. An operator able to diagnose this phenomenon early on could immediately discard the move (c, d) and jump directly to the correct payoff obtained by choosing the move to position a. As we shall see, such an operator might lose the monotonicity property and recovering the completeness of the resulting approach will prove more involved.
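The slow convergence described above can be reproduced with a short Python sketch of the classic lift iteration. Several points are assumptions: the figure is not available here, so `example_game` is reconstructed from the prose (in particular, the owner of b and its self-loop are guesses), the function names are hypothetical, and the operator is applied to all positions simultaneously rather than one position at a time.

```python
import math

INF = math.inf


def example_game(k):
    """Game reconstructed from the prose: position -> (owner, weight, successors)."""
    return {
        "a": ("-", k, ["a", "b", "d"]),
        "b": ("-", 0, ["b"]),  # assumption: b is a zero-weight sink loop
        "c": ("-", 0, ["a", "d"]),
        "d": ("+", 1, ["c"]),
    }


def stretch(eta, weight, bound):
    """max{0, eta + wg(v)}, with every value above `bound` replaced by infinity."""
    if eta == INF:
        return INF
    value = max(0, eta + weight)
    return INF if value > bound else value


def lift(game, mu, bound):
    """One simultaneous application of the lift operator to all positions."""
    lifted = {}
    for v, (owner, weight, successors) in game.items():
        candidates = [stretch(mu[u], weight, bound) for u in successors]
        lifted[v] = max(candidates) if owner == "+" else min(candidates)
    return lifted


def solve(game):
    """Iterate lift from the all-zero measure function up to a fixpoint."""
    bound = sum(w for _, w, _ in game.values() if w > 0)  # the value S
    mu = {v: 0 for v in game}
    rounds = 0
    while True:
        lifted = lift(game, mu, bound)
        if lifted == mu:
            return mu, rounds
        mu, rounds = lifted, rounds + 1
```

On the reconstructed game, the number of rounds that change some measure grows linearly with k, while the final measures of c and d are k and k+1, in line with the run described above.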

In the rest of this article we devise a progress operator that does precisely that. We start by providing a notion of *quasi dominion*, originally introduced for parity games in [5], which can be exploited in the context of MPGs.

**Definition 3 (Quasi Dominion).** *A set of positions* $Q \subseteq \mathrm{Ps}$ *is a* quasi ⊕-dominion *if there exists a* ⊕*-strategy* $\sigma_\oplus \in \mathrm{Str}_\oplus(Q)$*, called* ⊕-witness for Q*, such that, for all* ⊖*-strategies* $\sigma_\ominus \in \mathrm{Str}_\ominus(Q)$ *and positions* $v \in Q$*, the play* $\pi = \mathsf{play}((\sigma_\oplus, \sigma_\ominus), v)$*, called* $(\sigma_\oplus, v)$-play in Q*, satisfies* $\mathsf{wg}(\pi) > 0$*. If the condition* $\mathsf{wg}(\pi) > 0$ *holds only for infinite plays* $\pi$*, then* $Q$ *is called* weak quasi ⊕-dominion*.*

Essentially, a quasi ⊕-dominion consists of a set Q of positions starting from which player ⊕ can force plays in Q of positive weight. Analogously, any infinite play that player ⊕ can force in a weak quasi ⊕-dominion has positive weight. Clearly, any quasi ⊕-dominion is also a weak quasi ⊕-dominion. Moreover, the latter are closed under subsets, while the former are not. It is an immediate consequence of the definition above that all infinite plays induced by the ⊕-witness, if any, necessarily have infinite weight and, thus, are winning for player ⊕. Indeed, every such play $\pi$ is regular, *i.e.*, it can be decomposed into a prefix $\pi'$ and a simple cycle $(\pi'')^\omega$, *i.e.*, $\pi = \pi'(\pi'')^\omega$, since the strategies we are considering are memoryless. Now, $\mathsf{wg}((\pi'')^\omega) > 0$, so $\mathsf{wg}(\pi'') > 0$, which implies $\mathsf{wg}((\pi'')^\omega) = \infty$. Hence, $\mathsf{wg}(\pi) = \infty$.

Fig. 1: An MPG.

**Proposition 1.** *Let* Q *be a weak quasi* ⊕*-dominion with* σ<sub>⊕</sub> ∈ Str<sub>⊕</sub>(Q) *one of its* ⊕*-witnesses and* Q′ ⊆ Q*. Then, for all* ⊖*-strategies* σ<sub>⊖</sub> ∈ Str<sub>⊖</sub>(Q′) *and positions* v ∈ Q′ *the following holds: if the* (σ<sub>⊕</sub>↾Q′, v)*-play* π = play((σ<sub>⊕</sub>↾Q′, σ<sub>⊖</sub>), v) *is infinite, then* wg(π) = ∞*.*

From Proposition 1, it directly follows that, if a weak quasi ⊕-dominion Q is *closed w.r.t.* its ⊕*-witness*, namely all the induced plays are infinite, then it is a ⊕-dominion, hence is contained in Wn⊕.

Consider again the example of Figure 1. The set of positions Q ≜ {a, c, d} forms a quasi ⊕-dominion whose ⊕-witness is the only possible ⊕-strategy, mapping position d to c. Indeed, any infinite play remaining in Q forever and compatible with that strategy (*e.g.*, the play from position c when player ⊖ chooses the move from c leading to d, or the one from a to itself, or the one from a to d) grants an infinite payoff. Any finite compatible play, instead, ends in position a (*e.g.*, the play from c when player ⊖ chooses the move from c to a and then the one from a to b), giving a payoff of at least k > 0. On the other hand, Q′ ≜ {c, d} is only a weak quasi ⊕-dominion, as player ⊖ can force a play of weight 0 from position c by choosing the exiting move (c, a). However, the internal move (c, d) would lead to an infinite play in Q′ of infinite weight.

The crucial observation here is that the best choice for player ⊖ in any position of a (weak) quasi ⊕-dominion is to exit from it as soon as it can, while the best choice for player ⊕ is to remain inside it as long as possible. The idea of the algorithm we propose in this section is precisely to exploit the information provided by the quasi dominions in the following way. Consider the example above. In position a, player ⊖ must choose to exit from Q = {a, c, d} by taking the move (a, b), without changing its measure, which corresponds to its weight k. On the other hand, the best choice for player ⊖ in position c is to exit from the weak quasi dominion Q′ = {c, d} by choosing the move (c, a) and lifting its measure from 0 to k. Note that this contrasts with the minimal measure-increase policy for player ⊖ employed in [13], which would keep choosing to leave c in the quasi dominion by following the move to d, which gives the minimal increase in measure of value 1. Once c is out of the quasi dominion, though, the only possible move for player ⊕ is to follow c, taking measure k + 1. The resulting measure function is the desired progress measure.

In order to make this intuitive idea precise, we need to be able to identify quasi dominions first. Interestingly enough, the measure functions μ defined in the previous section do allow us to identify a quasi dominion, namely the set of positions μ<sup>−1</sup>(0) having positive measure. Indeed, as observed at the end of that section, a positive measure witnesses the existence of a positively-weighted finite play that player ⊕ can enforce from that position onward, which is precisely the requirement of Definition 3. In the example of Figure 1, μ<sub>0</sub><sup>−1</sup>(0) = ∅ and μ<sub>1</sub><sup>−1</sup>(0) = {a, c, d} are both quasi dominions, the first one *w.r.t.* the empty ⊕-witness and the second one *w.r.t.* the ⊕-witness σ<sub>⊕</sub>(d) = c.

We shall keep the quasi-dominion information in pairs (μ, σ), called *quasi-dominion representations* (qdr, for short), composed of a measure function μ and a ⊕-strategy σ, which corresponds to one of the ⊕-witnesses of the set of positions with positive measure in μ. The connection between these two components is formalized in the definition below, which also provides the partial order over which the new algorithm operates.

**Definition 4 (**QDR **Space).** *The* quasi-dominion-representation space *is the partial order* Q ≜ ⟨QDR, ⊑⟩*, whose components are defined as follows:*

1. QDR *is the set of pairs* ϱ ≜ (μ<sub>ϱ</sub>, σ<sub>ϱ</sub>)*, called quasi-dominion representations, composed of a measure function* μ<sub>ϱ</sub> *and a* ⊕*-strategy* σ<sub>ϱ</sub>*, where* Q(ϱ) *denotes the set of positions with positive measure in* μ<sub>ϱ</sub>*, for which the following conditions hold:*
	- *(a)* Q(ϱ) *is a quasi* ⊕*-dominion enjoying* σ<sub>ϱ</sub> *as a* ⊕*-witness;*
	- *(b) the* ⊕*-denotation of* μ<sub>ϱ</sub> *is a* ⊕*-dominion;*
	- *(c)* μ<sub>ϱ</sub>(v) ≤ μ<sub>ϱ</sub>(σ<sub>ϱ</sub>(v)) + v*, for all* ⊕*-positions* v ∈ Q(ϱ) ∩ Ps<sub>⊕</sub>*;*
	- *(d)* μ<sub>ϱ</sub>(v) ≤ μ<sub>ϱ</sub>(u) + v*, for all* ⊖*-positions* v ∈ Q(ϱ) ∩ Ps<sub>⊖</sub> *and* u ∈ *Mv*(v)*.*

*The* α-denotation *of a* qdr ϱ*, with* α ∈ {⊕, ⊖}*, is the* α*-denotation of its measure function* μ<sub>ϱ</sub>*.*

Condition 1a is obvious. Condition 1b, instead, requires that every position with infinite measure is indeed won by player ⊕ and is crucial to guarantee the completeness of the algorithm. Finally, Conditions 1c and 1d ensure that every positive measure under-approximates the actual weight of some finite play within the induced quasi dominion. This is formally captured by the following proposition, which can be easily proved by induction on the length of the play.

**Proposition 2.** *Let* ϱ *be a* qdr *and* vπu *a finite path starting at position* v ∈ Ps *and terminating in position* u ∈ Ps*, compatible with the* ⊕*-strategy* σ<sub>ϱ</sub>*. Then,* μ<sub>ϱ</sub>(v) ≤ wg(vπ) + μ<sub>ϱ</sub>(u)*.*

It is immediate to see that every MPG admits a non-empty QDR space, since the pair (μ<sub>0</sub>, σ<sub>0</sub>), with μ<sub>0</sub> the smallest measure function and σ<sub>0</sub> the empty strategy, trivially satisfies all the required conditions.

**Proposition 3.** *Every* MPG *has a non-empty* QDR *space associated with it.*

The solution procedure we propose, called QDPM from now on, can intuitively be broken down into an alternation of two phases. The first one tries to lift the measures of positions outside the quasi dominion Q(ϱ) in order to extend it, while the second one lifts the positions inside Q(ϱ) that can be forced to exit from it by player ⊖. The algorithm terminates when no new position can be absorbed within the quasi dominion and no measure needs to be lifted to allow the ⊖-winning positions to exit from it, when possible. To this end, we define a controlled lift operator lift : QDR × 2<sup>Ps</sup> × 2<sup>Ps</sup> ⇀ QDR that works on qdrs and takes two additional parameters, a source and a target set of positions. The intended meaning is that we want to restrict the application of the lift operation to the positions in the source set S, while using only the moves leading to the target set T. The different nature of the two types of lifting operations is reflected in the actual values of the source and target parameters.

$$\text{lift}(\varrho, \mathcal{S}, \mathcal{T}) \triangleq \varrho^\star, \text{ where}$$


$$\mu_{\varrho^\star}(v) \triangleq \begin{cases} \max\{\mu_{\varrho}(u) + v : u \in Mv(v) \cap \mathcal{T}\}, & \text{if } v \in \mathcal{S} \cap \mathrm{Ps}_{\oplus}; \\ \min\{\mu_{\varrho}(u) + v : u \in Mv(v) \cap \mathcal{T}\}, & \text{if } v \in \mathcal{S} \cap \mathrm{Ps}_{\ominus}; \\ \mu_{\varrho}(v), & \text{otherwise}; \end{cases}$$

and, for all ⊕-positions v ∈ Q(ϱ⋆) ∩ Ps<sub>⊕</sub>, we choose σ<sub>ϱ⋆</sub>(v) ∈ argmax<sub>u∈Mv(v)∩T</sub> μ<sub>ϱ</sub>(u) + v, if μ<sub>ϱ⋆</sub>(v) ≠ μ<sub>ϱ</sub>(v), and σ<sub>ϱ⋆</sub>(v) = σ<sub>ϱ</sub>(v), otherwise. Except for the restriction on the outgoing moves considered, which are those leading to the targets in T, the lift operator acts on the measure component of a qdr very much like the original lift operator does. In order to ensure that the result is still a qdr, however, the lift operator must also update the ⊕-witness of the quasi dominion. This is required to guarantee that Conditions 1a and 1c of Definition 4 are preserved. If the measure of a ⊕-position v is not affected by the lift, the ⊕-witness must not change for that position. However, if the application of the lift operation increases the measure, then the ⊕-witness on v needs to be updated to any move (v, u) that grants measure μ<sub>ϱ⋆</sub>(v) to v. In principle, more than one such move may exist and any one of them can serve as witness.

The solution corresponds to the inflationary fixpoint [10, 33] of the two phases mentioned above, sol ≜ ifp ϱ . prg<sub>+</sub>(prg<sub>0</sub>(ϱ)) : QDR ⇀ QDR, defined by the progress operators prg<sub>0</sub> and prg<sub>+</sub>. The first phase is computed by the operator prg<sub>0</sub> : QDR ⇀ QDR as follows: prg<sub>0</sub>(ϱ) ≜ sup{ϱ, lift(ϱ, Ps ∖ Q(ϱ), Ps)}. This operator is responsible for enforcing the progress condition on the positions outside the quasi dominion Q(ϱ) that do not satisfy the inequalities between the measures along a move leading to Q(ϱ) itself. It does that by applying the lift operator with the complement Ps ∖ Q(ϱ) as source and no restrictions on the moves. Those positions that acquire a positive measure in this phase contribute to enlarging the current quasi dominion. Observe that the strategy component of the qdr is updated so that it is a ⊕-witness of the new quasi dominion. To guarantee that measures never decrease, the supremum *w.r.t.* the QDR-space ordering is taken as result.

**Lemma 1.** μ<sub>ϱ</sub> *is a progress measure over* Ps ∖ Q(ϱ)*, for all fixpoints* ϱ *of* prg<sub>0</sub>*.*

The second phase, instead, implements the mechanism intuitively described above while analyzing the simple example of Figure 1. This is achieved by the operator prg<sub>+</sub> reported in Algorithm 1. The procedure iteratively examines the current quasi dominion and lifts the measures of the positions that must exit from it. Specifically, it processes Q(ϱ) layer by layer, starting from the outer layer of positions that must escape. The process ends when a, possibly empty, closed weak quasi dominion is obtained. Recall that all the positions in a closed weak quasi dominion are necessarily winning for player ⊕, due to Proposition 1. We distinguish two sets of positions in Q(ϱ): those that already satisfy the progress condition and those that do not. The measures of the first ones already witness an escape route from Q(ϱ). The other ones, instead, are those whose current choice is to remain inside it. For instance, when considering the measure function μ<sub>2</sub> in the example of Figure 1, position a belongs to the first set, while positions c and d belong to the second one, since the choice of c is to follow the internal move (c, d).

Since the only positions that change measure are those in the second set, only such positions need to be examined. To identify them, which form a weak quasi dominion Δ(ϱ) strictly contained in Q(ϱ), we proceed as follows. First, we collect the set npp(ϱ) of positions in Q(ϱ) that do not satisfy the progress condition, called the *non-progress positions*. Then, we compute the set of positions that will have no choice other than reaching npp(ϱ), by computing the inflationary fixpoint of a suitable pre operator.

$$\begin{split} \mathsf{npp}(\varrho) \triangleq\; & \{ v \in \mathsf{Q}(\varrho) \cap \mathrm{Ps}_{\oplus} : \exists u \in Mv(v) \,.\, \mu_{\varrho}(v) < \mu_{\varrho}(u) + v \} \\ & \cup \{ v \in \mathsf{Q}(\varrho) \cap \mathrm{Ps}_{\ominus} : \forall u \in Mv(v) \,.\, \mu_{\varrho}(v) < \mu_{\varrho}(u) + v \}; \\ \mathsf{pre}(\varrho, \mathrm{Q}) \triangleq\; & \mathrm{Q} \cup \{ v \in \mathsf{Q}(\varrho) \cap \mathrm{Ps}_{\oplus} : \sigma_{\varrho}(v) \in \mathrm{Q} \} \\ & \cup \{ v \in \mathsf{Q}(\varrho) \cap \mathrm{Ps}_{\ominus} : \forall u \in Mv(v) \setminus \mathrm{Q} \,.\, \mu_{\varrho}(v) < \mu_{\varrho}(u) + v \}. \end{split}$$

The final result is Δ(ϱ) ≜ (ifp Q . pre(ϱ, Q))(npp(ϱ)). Intuitively, Δ(ϱ) contains all the ⊕-positions that are forced to reach npp(ϱ) via the quasi-dominion ⊕-witness and all the ⊖-positions that can only avoid reaching npp(ϱ) by strictly increasing their measure, which player ⊖ obviously wants to prevent.
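The computation of the non-progress positions and of Δ can be sketched in a few lines of Python. This is only an illustration on the example discussed later around Figure 2: the measures and the ⊕-witness are the ones quoted in the text, whereas the owners, the move sets, and the weights (in particular those of a, b, and e) are assumptions chosen to be consistent with those measures, not data taken from the figure.

```python
# Hypothetical reconstruction of the Figure 2 example. MU and SIGMA are quoted
# in the text; OWNER, WG, and MV are assumptions consistent with them.
OWNER = {"a": "-", "b": "+", "c": "-", "d": "-", "e": "-", "f": "+"}
WG = {"a": 3, "b": -1, "c": 1, "d": 0, "e": 0, "f": 0}
MV = {"a": ["e"], "b": ["a"], "c": ["a", "f"],
      "d": ["b", "c"], "e": ["e"], "f": ["d", "f"]}
MU = {"a": 3, "b": 2, "c": 1, "d": 1, "e": 0, "f": 1}
SIGMA = {"b": "a", "f": "d"}


def stretch(v, u):
    """mu(u) + v: measure of u extended by the weight of v, never below 0."""
    return max(0, MU[u] + WG[v])


def quasi_dominion():
    """Positions with positive measure."""
    return {v for v in MU if MU[v] > 0}


def npp():
    """Positions of the quasi dominion violating the progress condition."""
    result = set()
    for v in quasi_dominion():
        gains = [MU[v] < stretch(v, u) for u in MV[v]]
        if (any(gains) if OWNER[v] == "+" else all(gains)):
            result.add(v)
    return result


def pre(region):
    """One step of the pre operator over the current region."""
    q = quasi_dominion()
    forced_plus = {v for v in q
                   if OWNER[v] == "+" and SIGMA.get(v) in region}
    forced_minus = {v for v in q if OWNER[v] == "-" and all(
        MU[v] < stretch(v, u) for u in MV[v] if u not in region)}
    return region | forced_plus | forced_minus


def delta():
    """Inflationary fixpoint of pre, started from the non-progress positions."""
    region = npp()
    while True:
        grown = pre(region)
        if grown == region:
            return region
        region = grown
```

Under these assumptions, `npp()` contains only c and `delta()` grows it first with d and then with f, in line with the discussion of that example.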

It is important to observe that, from a functional viewpoint, the progress operator prg<sub>+</sub> would work just as well if applied to the entire quasi dominion Q(ϱ), since it would simply leave unchanged the measure of those positions that already satisfy the progress condition. However, it is crucial that only the positions in Δ(ϱ) are processed in order to achieve the best asymptotic complexity bound known to date. We shall return to this point later on.

At each iteration of the while-loop of Algorithm 1, let Q denote the current (weak) quasi dominion, initially set to Δ(ϱ) (Line 1). The algorithm first identifies the positions in Q that can immediately escape from it (Line 2). Those are *(i)* all the ⊖-positions with a move leading outside of Q and *(ii)* the ⊕-positions v whose ⊕-witness σ<sub>ϱ</sub> forces v to exit from Q, namely σ<sub>ϱ</sub>(v) ∉ Q, and that cannot strictly increase their measure by choosing to remain in Q. While the condition for ⊖-positions is obvious, the one for ⊕-positions requires some explanation. The crucial observation here is that, while player ⊕ does indeed prefer to remain in the quasi dominion, it can only do so while ensuring that by changing strategy it does not enable infinite plays within Q that are winning for the adversary. In other words, the new ⊕-strategy must still be a ⊕-witness for Q and this can only be ensured if the new choice strictly increases its measure. The operator esc : QDR × 2<sup>Ps</sup> → 2<sup>Ps</sup> formalizes the idea:

$$\begin{split} \mathsf{esc}(\varrho, \mathrm{Q}) \triangleq\; & \{ v \in \mathrm{Q} \cap \mathrm{Ps}_{\ominus} : Mv(v) \setminus \mathrm{Q} \neq \emptyset \} \\ & \cup \{ v \in \mathrm{Q} \cap \mathrm{Ps}_{\oplus} : \sigma_{\varrho}(v) \notin \mathrm{Q} \land \forall u \in Mv(v) \cap \mathrm{Q} \,.\, \mu_{\varrho}(u) + v \leq \mu_{\varrho}(v) \} . \end{split}$$

Consider, for instance, the example in Figure 2 and a qdr ϱ such that μ<sub>ϱ</sub> = {a ↦ 3; b ↦ 2; c, d, f ↦ 1; e ↦ 0} and σ<sub>ϱ</sub> = {b ↦ a; f ↦ d}. In this case, we have Q(ϱ) = {a, b, c, d, f} and Δ(ϱ) = {c, d, f}, since c is the only non-progress position, d is forced to follow c in order to avoid the measure increase required to reach b, and f is forced by the ⊕-witness to reach d. Now, consider the situation where the current weak quasi dominion is Q = {c, f}, *i.e.* after d has escaped from Δ(ϱ). The escape set of Q is {c, f}. To see why the ⊕-position f is escaping, observe that μ<sub>ϱ</sub>(f) + f = 1 = μ<sub>ϱ</sub>(f) and that, indeed, should player ⊕ choose to change its strategy and take the move (f, f) to remain in Q, it would obtain an infinite play with payoff 0, thus violating the definition of weak quasi dominion.

Before proceeding, we want to stress an easy consequence of the definition of the notion of escape set and Conditions 1c and 1d of Definition 4, *i.e.*, that every escape position of the quasi dominion Q(ϱ) can only assume its weight as possible measure inside a qdr ϱ, as reported in the following proposition. This observation, together with Proposition 2, ensures that the measure of a position v ∈ Q(ϱ) is an under-approximation of the weight of all finite plays leaving Q(ϱ).

Fig. 2: Another MPG.

**Proposition 4.** *Let* ϱ *be a* qdr*. Then,* μ<sub>ϱ</sub>(v) = wg(v) > 0*, for all* v ∈ esc(ϱ, Q(ϱ))*.*

Now, going back to the analysis of the algorithm, if the escape set is non-empty, we need to select the escape positions that need to be lifted in order to satisfy the progress condition. The main difficulty is to do so in such a way that the resulting measure function still satisfies Condition 1d of Definition 4, for all the ⊖-positions with positive measure. The problem occurs when a ⊖-position can exit either immediately or passing through a path leading to another position in the escape set. Consider again the example above, where Q = Δ(ϱ) = {c, d, f}. If position d immediately escapes from Q using the move (d, b), it would change its measure to μ′(d) = μ(b) + d = 2 > μ(d) = 1. Now, position c has two ways to escape, either directly with move (c, a) or by reaching the other escape position d passing through f. The first choice would set its measure to μ(a) + c = 4. The resulting measure function, however, would not satisfy Condition 1d of Definition 4, as the new measure of c would be greater than μ′(d) + c = 2, preventing us from obtaining a qdr. Similarly, if position d escapes from Q passing through c via the move (c, a), we would have μ(d) = μ(c) + d = (μ(a) + c) + d = 4 > 2 = μ(b) + d, still violating Condition 1d. Therefore, in this specific case, the only possible way to escape is to reach b. The solution to this problem is simply to lift in the current iteration only those positions that obtain the lowest possible measure increase, hence position d in the example, leaving the lift of c to some subsequent iteration of the algorithm that would choose the correct escape route via d. To do so, we first compute the minimal measure increase, called the *best-escape forfeit*, that each position in the escape set would obtain by exiting the quasi dominion immediately. The positions with the lowest possible forfeit, called *best-escape positions*, can all be lifted at the same time. The intuition is that the measure of all the positions that escape from a (weak) quasi dominion will necessarily be increased by at least the minimal best-escape forfeit. This observation is at the core of the proof of Theorem 2 (see the appendix), ensuring that the desired properties of qdrs are preserved by the operator prg<sub>+</sub>. The set of best-escape positions is computed by the operator bep : QDR × 2<sup>Ps</sup> → 2<sup>Ps</sup> as follows: bep(ϱ, Q) ≜ argmin<sub>v∈esc(ϱ,Q)</sub> bef(μ<sub>ϱ</sub>, Q, v), where the operator bef : MF × 2<sup>Ps</sup> × Ps → ℕ<sub>∞</sub> computes, for each position v in a quasi dominion Q, its best-escape forfeit:

$$\mathsf{bef}(\mu, \mathrm{Q}, v) \triangleq \begin{cases} \max\{\mu(u) + v - \mu(v) : u \in Mv(v) \setminus \mathrm{Q}\}, & \text{if } v \in \mathrm{Ps}_{\oplus}; \\ \min\{\mu(u) + v - \mu(v) : u \in Mv(v) \setminus \mathrm{Q}\}, & \text{otherwise.} \end{cases}$$

In our example, bef(μ, Q, c) = μ(a) + c − μ(c) = 4 − 1 = 3, while bef(μ, Q, d) = μ(b) + d − μ(d) = 2 − 1 = 1. Therefore, bep(ϱ, Q) = {d}.
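The escape set and the best-escape positions of this example can be recomputed with the following Python sketch. The measures and the ⊕-witness are those quoted in the text; the owners, moves, and weights of c, d, and f are assumptions inferred from the arithmetic above (for instance, μ(a) + c = 4 with μ(a) = 3 suggests weight 1 for c), since the figure itself is not reproduced here.

```python
# Fragment of the Figure 2 example restricted to positions c, d, f.
# MU and SIGMA come from the text; OWNER, WG, and MV are assumptions.
OWNER = {"c": "-", "d": "-", "f": "+"}
WG = {"c": 1, "d": 0, "f": 0}
MV = {"c": ["a", "f"], "d": ["b", "c"], "f": ["d", "f"]}
MU = {"a": 3, "b": 2, "c": 1, "d": 1, "e": 0, "f": 1}
SIGMA = {"b": "a", "f": "d"}


def stretch(v, u):
    """mu(u) + v: measure of u extended by the weight of v, never below 0."""
    return max(0, MU[u] + WG[v])


def esc(region):
    """Positions of region that can immediately escape from it."""
    out = set()
    for v in region:
        if OWNER[v] == "-":
            # A move leading outside the region suffices for the opponent.
            escaping = any(u not in region for u in MV[v])
        else:
            # The witness leaves the region and no internal move pays more.
            escaping = SIGMA[v] not in region and all(
                stretch(v, u) <= MU[v] for u in MV[v] if u in region)
        if escaping:
            out.add(v)
    return out


def bef(region, v):
    """Best-escape forfeit of v: measure increase when leaving region now."""
    gains = [stretch(v, u) - MU[v] for u in MV[v] if u not in region]
    return max(gains) if OWNER[v] == "+" else min(gains)


def bep(region):
    """Escape positions with the lowest best-escape forfeit."""
    escaping = esc(region)
    lowest = min(bef(region, v) for v in escaping)
    return {v for v in escaping if bef(region, v) == lowest}
```

With Q = {c, d, f}, the sketch yields forfeits 3 for c and 1 for d, so only d is lifted in the current iteration; with Q = {c, f} both remaining positions are escape positions.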

Once the set E of best-escape positions is identified (Line 3), the procedure lifts them restricting the possible moves to those leading outside the current quasi dominion (Line 4). Those positions are, then, removed from the set (Line 5), thus obtaining a smaller weak quasi dominion ready for the next iteration.

The algorithm terminates when the (possibly empty) current quasi dominion Q is closed. By virtue of Proposition 1, all those positions belong to Wn<sub>⊕</sub> and their measure is set to ∞ by means of the operator win : QDR × 2<sup>Ps</sup> ⇀ QDR (Line 6), which also computes the winning ⊕-strategy on those positions, as follows: win(ϱ, Q) ≜ ϱ⋆, where μ<sub>ϱ⋆</sub> ≜ μ<sub>ϱ</sub>[Q ↦ ∞] and, for all ⊕-positions v ∈ Q(ϱ⋆) ∩ Ps<sub>⊕</sub>, we choose σ<sub>ϱ⋆</sub>(v) ∈ argmax<sub>u∈Mv(v)∩Q</sub> μ<sub>ϱ</sub>(u) + v, if σ<sub>ϱ</sub>(v) ∉ Q, and σ<sub>ϱ⋆</sub>(v) = σ<sub>ϱ</sub>(v), otherwise. Observe that, since we know that every ⊕-position v ∈ Q ∩ Ps<sub>⊕</sub> whose current ⊕-witness leads outside Q is not an escape position, any move (v, u) within Q that grants the maximal stretch μ<sub>ϱ</sub>(u) + v strictly increases its measure and, therefore, is a possible choice for a ⊕-witness of the ⊕-dominion Q.
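The following Python sketch puts the pieces described so far together: the controlled lift, the two progress operators, and their alternation until a fixpoint. It is a simplified illustration rather than the authors' implementation: the fixpoint test only compares measures, the supremum in prg<sub>0</sub> is left implicit because lifted positions start from measure 0, and the game returned by `make_game` is a hypothetical reconstruction of Figure 1 inferred from the text (owners, moves, and weights are assumptions). Comments refer to the line numbers of Algorithm 1 as cited in the text.

```python
INF = float("inf")


def make_game(k):
    """Assumed reconstruction of Figure 1: (owner, weight, moves)."""
    owner = {"a": "-", "b": "-", "c": "-", "d": "+"}
    wg = {"a": k, "b": 0, "c": 0, "d": 1}
    mv = {"a": ["a", "b", "d"], "b": ["b"], "c": ["a", "d"], "d": ["c"]}
    return owner, wg, mv


def stretch(g, mu, v, u):
    """mu(u) + v, truncated at 0."""
    return max(0, mu[u] + g[1][v])


def positive(mu):
    return {v for v in mu if mu[v] > 0}


def lift(g, mu, sigma, source, target):
    owner, _, mv = g
    new_mu, new_sigma = dict(mu), dict(sigma)
    for v in source:
        options = [u for u in mv[v] if u in target]
        if not options:
            continue  # assumption: positions without admissible moves are skipped
        pick = max if owner[v] == "+" else min
        best = pick(options, key=lambda u: stretch(g, mu, v, u))
        new_mu[v] = stretch(g, mu, v, best)
        if owner[v] == "+" and new_mu[v] != mu[v]:
            new_sigma[v] = best  # update the witness only where the measure changed
    return new_mu, new_sigma


def prg0(g, mu, sigma):
    # Lifted positions have measure 0, so the lifted qdr is already the supremum.
    return lift(g, mu, sigma, set(mu) - positive(mu), set(mu))


def delta(g, mu, sigma):
    owner, _, mv = g
    q = positive(mu)
    gains = lambda v, us: [mu[v] < stretch(g, mu, v, u) for u in us]
    region = {v for v in q
              if (any if owner[v] == "+" else all)(gains(v, mv[v]))}
    while True:
        grown = region | {v for v in q if (
            sigma.get(v) in region if owner[v] == "+"
            else all(gains(v, [u for u in mv[v] if u not in region])))}
        if grown == region:
            return region
        region = grown


def esc(g, mu, sigma, region):
    owner, _, mv = g
    return {v for v in region if (
        any(u not in region for u in mv[v]) if owner[v] == "-"
        else sigma.get(v) not in region and all(
            stretch(g, mu, v, u) <= mu[v] for u in mv[v] if u in region))}


def bef(g, mu, region, v):
    owner, _, mv = g
    diffs = [stretch(g, mu, v, u) - mu[v] for u in mv[v] if u not in region]
    return max(diffs) if owner[v] == "+" else min(diffs)


def prg_plus(g, mu, sigma):
    owner, _, mv = g
    mu, sigma = dict(mu), dict(sigma)
    region = delta(g, mu, sigma)                                 # Line 1
    while True:
        escaping = esc(g, mu, sigma, region)                     # Line 2
        if not escaping:
            break
        forfeit = {v: bef(g, mu, region, v) for v in escaping}
        low = min(forfeit.values())
        best = {v for v in escaping if forfeit[v] == low}        # Line 3
        mu, sigma = lift(g, mu, sigma, best, set(mu) - region)   # Line 4
        region = region - best                                   # Line 5
    for v in region:                                             # Line 6 (win)
        if owner[v] == "+" and sigma.get(v) not in region:
            sigma[v] = max((u for u in mv[v] if u in region),
                           key=lambda u: stretch(g, mu, v, u))
    mu = {v: (INF if v in region else m) for v, m in mu.items()}
    return mu, sigma


def sol(g):
    """Alternate prg0 and prg+ until the measure stops changing."""
    mu, sigma, rounds = {v: 0 for v in g[0]}, {}, 0
    while True:
        new_mu, new_sigma = prg_plus(g, *prg0(g, mu, sigma))
        if new_mu == mu:
            return mu, rounds
        mu, sigma, rounds = new_mu, new_sigma, rounds + 1
```

On the assumed game with k = 5, `sol(make_game(5))` reaches the measure {a ↦ 5, b ↦ 0, c ↦ 5, d ↦ 6} after two measure-changing rounds: the first one only applies prg<sub>0</sub>-style lifts, the second one lets c escape through (c, a) and then lifts d.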

At this point, it should be quite evident that the progress operator prg<sub>+</sub> is responsible for enforcing the progress condition on the positions inside the quasi dominion Q(ϱ); thus, the following necessarily holds.

**Lemma 2.** μ<sub>ϱ</sub> *is a progress measure over* Q(ϱ)*, for all fixpoints* ϱ *of* prg<sub>+</sub>*.*

In order to prove the correctness of the proposed algorithm, we first need to ensure that any quasi-dominion space Q is indeed closed under the operators prg<sub>0</sub> and prg<sub>+</sub>. This is established by the following theorem, which states that the operators are total functions on that space.

**Theorem 2.** *The operators* prg<sub>0</sub> *and* prg<sub>+</sub> *are total inflationary functions.*

Since both operators are inflationary, so is their composition, which admits a fixpoint. Therefore, the operator sol is well defined. Moreover, following the same considerations discussed at the end of Section 3, it can be proved that the fixpoint is obtained after at most n · (S + 1) iterations. Let ifp<sup>k</sup> X . F(X) denote the k-th iteration of an inflationary operator F. Then, we have the following theorem.

**Theorem 3 (Termination).** *The solver operator* sol ≜ ifp ϱ . prg<sub>+</sub>(prg<sub>0</sub>(ϱ)) *is a well-defined total function. Moreover, for every* ϱ ∈ QDR *it holds that* sol(ϱ) = (ifp<sup>k</sup> ϱ⋆ . prg<sub>+</sub>(prg<sub>0</sub>(ϱ⋆)))(ϱ)*, for some index* k ≤ n · (S + 1)*, where* n *is the number of positions in the* MPG *and* S ≜ Σ{wg(v) ∈ ℕ : v ∈ Ps ∧ wg(v) > 0} *is the total sum of its positive weights.*

As already observed, Figure 1 exemplifies an infinite family of games with a fixed number of positions and increasing maximal weight k over which the SEPM algorithm requires 2k + 1 iterations of the lift operator. On the contrary, QDPM needs exactly two iterations of the solver operator sol to find the progress measure, starting from the smallest measure function μ<sub>0</sub>. Indeed, the first iteration returns a measure function μ<sub>1</sub> = sol(μ<sub>0</sub>), with μ<sub>1</sub>(a) = k, μ<sub>1</sub>(b) = μ<sub>1</sub>(c) = 0, and μ<sub>1</sub>(d) = 1, while the second one, μ<sub>2</sub> = sol(μ<sub>1</sub>), identifies the smallest progress measure, with μ<sub>2</sub>(a) = μ<sub>2</sub>(c) = k, μ<sub>2</sub>(b) = 0, and μ<sub>2</sub>(d) = k + 1. From these observations, the next result immediately follows.

**Theorem 4.** *An infinite family of* MPG*s, indexed by* k*, exists on which* QDPM *requires a constant number of measure updates, while* SEPM *requires* O(k) *such updates.*
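The SEPM side of this comparison can be reproduced with a small Python sketch that applies a plain lift operator to all positions simultaneously until nothing changes. The game is a hypothetical reconstruction of Figure 1 inferred from the text (owners, moves, and weights are assumptions), and the truncation of measures at a top element, which the classic progress-measure algorithm needs in general, is omitted because it never triggers on this example.

```python
def sepm_iterations(k):
    """Count the lift applications needed on the assumed Figure 1 game."""
    # Assumed structure: a, b, c belong to the opponent, d to player (+).
    owner = {"a": "-", "b": "-", "c": "-", "d": "+"}
    wg = {"a": k, "b": 0, "c": 0, "d": 1}
    mv = {"a": ["a", "b", "d"], "b": ["b"], "c": ["a", "d"], "d": ["c"]}

    mu = {v: 0 for v in owner}  # smallest measure function
    steps = 0
    while True:
        lifted = {}
        for v in mu:
            # Stretch of each successor measure by the weight of v.
            stretches = [max(0, mu[u] + wg[v]) for u in mv[v]]
            best = max(stretches) if owner[v] == "+" else min(stretches)
            lifted[v] = max(mu[v], best)  # measures never decrease
        if lifted == mu:
            return mu, steps
        mu, steps = lifted, steps + 1
```

For instance, `sepm_iterations(5)` performs 11 = 2k + 1 measure-changing applications before reaching {a ↦ 5, b ↦ 0, c ↦ 5, d ↦ 6}, and the count grows linearly with k.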

From Theorem 1 and Lemmas 1 and 2 it follows that the solution provided by the algorithm is indeed a progress measure, hence establishing soundness. Completeness follows from Theorem 3 and from Condition 1b of Definition 4 that ensures that all the positions with infinite measure are winning for player ⊕.

**Theorem 5 (Correctness).** *The* ⊖*-denotation of* sol(ϱ) *coincides with* Wn<sub>⊖</sub>*, for every* ϱ ∈ QDR*.*

The following lemma ensures that each execution of the operator prg<sub>+</sub> strictly increases the measure of all the positions in Δ(ϱ).

**Lemma 3.** *Let* ϱ⋆ ≜ prg<sub>+</sub>(ϱ)*. Then,* μ<sub>ϱ⋆</sub>(v) > μ<sub>ϱ</sub>(v)*, for all positions* v ∈ Δ(ϱ)*.*

Recall that each position can be lifted at most S + 1 = O(n · W) times and, by the previous lemma, the complexity of sol only depends on the cumulative cost of such lift operations. We can then express the total cost as the sum, over the set of positions in the game, of the cost of all the lift operations performed on each position. Each such operation can be computed in time linear in the number of incoming and outgoing moves of the corresponding lifted position v, namely O((|*Mv*(v)| + |*Mv*<sup>−1</sup>(v)|) · log S), with O(log S) the cost of each arithmetic operation involved. Summing up, the actual asymptotic complexity of the procedure can, therefore, be expressed as O(n · m · W · log(n · W)).

**Theorem 6 (Complexity).** QDPM *requires time* O(n · m · W · log(n · W)) *to solve an* MPG *with* n *positions,* m *moves, and maximal positive weight* W*.*
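The bound can be retraced with a short calculation. It relies only on the quantities introduced above, together with two elementary facts: every move is counted once as outgoing and once as incoming, and the sum S of the positive weights is at most n · W.

$$\sum_{v \in \mathrm{Ps}} (S + 1) \cdot \big(|Mv(v)| + |Mv^{-1}(v)|\big) \cdot \log S \;=\; (S + 1) \cdot 2m \cdot \log S \;=\; O\big(n \cdot m \cdot W \cdot \log(n \cdot W)\big),$$

since $\sum_{v} |Mv(v)| = \sum_{v} |Mv^{-1}(v)| = m$ and $S \leq n \cdot W$.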

## **5 Experimental Evaluation**

In order to assess the effectiveness of the proposed approach, we implemented both QDPM and SEPM [13], the most efficient known solution to the problem and the one most closely related to QDPM, in C++ within Oink [32]. Oink has been developed as a framework to compare parity game solvers. However, extending the framework to deal with MPGs is not difficult. The forms of the arenas of the two types of games essentially coincide, the only relevant difference being that MPGs allow negative numbers to label game positions. We ran the two solvers against randomly generated MPGs of various sizes. <sup>1</sup>

<sup>1</sup> The experiments were carried out on a 64-bit 3.9GHz quad-core machine, with Intel i5-6600K processor and 8GB of RAM, running Ubuntu 18.04.

Figure 3 compares the solution time, expressed in seconds, of the two algorithms on 4000 games, each with 10<sup>4</sup> positions and randomly assigned weights in the range [−15 × 10<sup>3</sup>, 15 × 10<sup>3</sup>]. The scale of both axes is logarithmic. The experiments are divided into 4 clusters, each containing 1000 games. The benchmarks in different clusters differ in the maximal number m of outgoing moves per position, with m ∈ {10, 20, 40, 80}. These experiments clearly show that QDPM substantially outperforms SEPM. Most often, the gap between the two algorithms is between two and three orders of magnitude, as indicated by the dashed diagonal lines. It also shows that SEPM is particularly sensitive to the density of the underlying graph, as its performance degrades significantly as the number of moves increases. The maximal solution time was 21000 sec. for SEPM and 0.017 sec. for QDPM. Figure 4, instead, compares the two algorithms fixing the maximal out-degree of the underlying graphs to 2, in the left-hand picture, and to 40, in the right-hand one, while increasing the number of positions from 10<sup>3</sup> to 10<sup>5</sup> along the x-axis. Each picture displays the performance results on 2800 games. Each point shows the total time to solve 100 randomly generated games with that given number of positions, which increases by 1000 up to size 2·10<sup>4</sup> and by 10000 thereafter. In both pictures the scale is logarithmic. For the experiments in the right-hand picture we had to set a timeout for SEPM of 45 minutes per game, which was hit most of the time on the bigger ones. Once again, QDPM significantly outperforms SEPM on both kinds of benchmarks, with a gap of more than an order of magnitude on the first ones, and a gap of more than three orders of magnitude on the second ones. The results also confirm that the performance gap grows considerably as the number of moves per position increases.

Fig. 3: Random games with 10<sup>4</sup> positions.

Fig. 4: Total solution times in seconds of SEPM and QDPM on 5600 random games.

We are not aware of actual concrete benchmarks for MPGs. However, exploiting the standard encoding of parity games into mean-payoff games [25], we can compare the behavior of SEPM and QDPM on concrete verification problems encoded as parity games. For completeness, Table 1 reports some experiments on such problems. The table reports the execution times, expressed in seconds, required by the two algorithms to solve instances of two classic verification problems: the Elevator Verification and the Language Inclusion problems. These two benchmarks are included in the PGSolver [23] toolkit and are often used as benchmarks for parity game solvers. The first benchmark is a *verification under fairness* constraints of a simple model of an elevator, while the second one encodes the *language inclusion* problem between a non-deterministic Büchi automaton and a deterministic one. The results on various instances of those problems confirm that QDPM significantly outperforms the classic progress measure approach. Note also that the translation into MPGs, which encodes priorities as weights whose absolute value is exponential in the values of the priorities, leads to games with weights of high magnitude. Hence, the results in Table 1 provide further evidence that QDPM is far less dependent on the absolute value of the weights. They also show that QDPM can be very effective for the solution of real-world qualitative verification problems.

It is worth noting, though, that the translation from parity to MPGs gives rise to weights that are exponentially distant from each other [25]. As a consequence, the resulting benchmarks are not necessarily representative of MPGs, being a very restricted subclass. Nonetheless, they provide evidence of the applicability of the approach in practical scenarios.
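To give a feeling for how quickly the weights grow, the sketch below applies one common formulation of the parity-to-mean-payoff encoding, in which a position with priority p in a game with n positions receives weight (−n)<sup>p</sup>. The function name is hypothetical, and both the exact formula and the sign convention (even priorities favouring the maximizing player) are assumptions that may differ in detail from the encoding used for the experiments.

```python
def parity_to_mean_payoff_weights(priorities):
    """Map each position's priority p to the weight (-n)**p.

    `priorities` is a dict from position names to non-negative integer
    priorities; n is the number of positions. Even priorities yield positive
    weights and odd ones negative weights (assumed sign convention).
    """
    n = len(priorities)
    return {v: (-n) ** p for v, p in priorities.items()}
```

Even a tiny game with three positions and priorities 0, 1, and 2 obtains weights 1, −3, and 9, which illustrates why the translated benchmarks contain weights that are exponentially distant from each other.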


Table 1: Concrete verification problems.

## **6 Concluding Remarks**

We proposed a novel solution algorithm for the decision problem of MPGs that integrates progress measures and quasi dominions. We argue that the integration of these two concepts may offer a significant speed-up in convergence to the solution, at no additional computational cost. This is evidenced by the existence of a family of games on which the combined approach can perform arbitrarily better than a classic progress-measure-based solution. Experimental results also show that the introduction of quasi dominions can often reduce solution times by up to three orders of magnitude, suggesting that the approach may be very effective in practical applications as well. We believe that the integration approach we devised is general enough to be applied to other types of games. In particular, the application of quasi dominions in conjunction with progress-measure-based approaches, such as those of [27] and [21], may lead to practically efficient quasi-polynomial algorithms for parity games and their quantitative extensions.

## **References**



## **Partial-Order Reduction for Parity Games with an Application on Parameterised Boolean Equation Systems**

Thomas Neele, Tim A.C. Willemse, and Wieger Wesselink

Eindhoven University of Technology, Eindhoven, The Netherlands {t.s.neele, t.a.c.willemse, j.w.wesselink}@tue.nl

**Abstract.** Partial-order reduction (POR) is a well-established technique to combat the problem of state-space explosion. We propose POR techniques that are sound for parity games, a well-established formalism for solving a variety of decision problems. As a consequence, we obtain the first POR method that is sound for model checking for the full modal μ-calculus. Our technique is applied to, and implemented for the fixed point logic called *parameterised Boolean equation systems*, which provides a high-level representation of parity games. Experiments indicate that substantial reductions can be achieved.

## **1 Introduction**

In the field of formal methods, model checking [2] is a popular technique to analyse the behaviour of concurrent processes. However, the arbitrary interleaving of these parallel processes can cause an exponential blowup, which is known as the *state-space explosion* problem. Several approaches have been identified to alleviate this issue, by reducing the state space *on-the-fly*, *i.e.*, while generating it. Two established techniques are *symmetry reduction* [13] and *partial-order reduction* (POR) [8,26,30]. Whereas symmetry reduction can only be applied to systems that contain several copies of a component, POR also applies to heterogeneous systems. However, a major drawback of POR is that most variants at best preserve only a fragment of a given logic, such as LTL or CTL\* without the next operator (LTL−<sup>X</sup>/CTL<sup>∗</sup> <sup>−</sup><sup>X</sup>) [7] or the weak modal <sup>μ</sup>-calculus [28]. Furthermore, the variants of POR that preserve a branching time logic impose significant restrictions on the reduction by only allowing the prioritisation of exactly one action at a time. This decreases the amount of reduction achieved.

In this paper, we address these shortcomings by applying POR on parity games. A parity game is an infinite-duration, two-player game played on a directed graph with decorations on the nodes, in which the players *even* (denoted ◇) and *odd* (denoted □) strive to win the nodes of the graph. An application of parity games is encoding a model checking question: a combination of a model, in the form of a transition system, and a formal property, formulated in the modal μ-calculus [16]. In such games, every node v represents the combination of a state s from the transition system and a (sub)formula ϕ. Under a typical encoding, player ◇ wins in v if and only if ϕ holds in s.

In the context of model checking, parity games suffer from the same state-space explosion that models do. Exploring the state space of a parity game under POR can be a very effective way to address this. Our contributions are as follows:


Our approach has two distinct benefits over traditional POR techniques that operate on transition systems. First, it is the first work that enables the use of partial-order reduction for model checking for the full modal μ-calculus. Second, the conditions that we propose are strictly weaker than those necessary to preserve the branching structure of a transition system used in other approaches to POR for branching time logics [7,28], increasing the effectiveness of POR.

The experiments with our implementation for solving PBESs are quite promising. Our results show that, in particular, those instances in which PBESs encode model checking problems involving large state spaces benefit from the use of partial-order reduction. In such cases, a significant size reduction is possible, even when checking complex μ-calculus formulae, and the time penalty of conducting the static analysis is more than made up for by the speed-up in the state space exploration phase.

*Related Work.* There are several proposals for using partial-order reduction for branching-time logics. Groote and Sellink [9] define several forms of *confluence reduction* and prove which behavioural equivalences (and by extension, which fragments of logics) are preserved. In confluence reduction, one tries to identify internal transitions that can safely be prioritised, leading to a smaller state space. Ramakrishna and Smolka [28] propose a notion that coincides with strong confluence from [9], preserving weak bisimilarity and the corresponding logic, the weak modal μ-calculus.

Similar ideas are presented by Gerth *et al.* in [7]. Their approach is based on the *ample set* method [26] and preserves a relation they call visible bisimulation and the associated logic CTL$_{-X}$. To preserve the branching structure, they introduce a *singleton proviso* which, contrary to our theory, can greatly impair the amount of reduction that can be achieved (see our Example 3, page 7).

Valmari [33] describes the *stubborn sets* method for LTL$_{-X}$ model checking. In general, stubborn sets allow for larger reductions than ample sets. While investigating the use of stubborn sets for parity games, we identified a subtle issue in one of the stubborn set conditions (called **D1** in [33]). When applied to LTSs or KSs, this means that LTL$_{-X}$ is not necessarily preserved. Moreover, using the condition in the setting of parity games may result in games with different winners; for an example, see our technical report [24]. In [21], we further explore the consequences of the faulty condition for stubborn-set based POR techniques that can be found in the literature. We here resort to a strengthened version of condition **D1** that does not suffer from these issues.

Similar to our approach, Peled [27] applies POR on the product of a transition system and a Büchi automaton, which represents an LTL$_{-X}$ property. It is important to note, though, that this original theory is not sound, as discussed in [29]. Kan *et al.* [14] improve on Peled's ideas and manage to preserve all of LTL. To achieve this, they analyse the Büchi automaton that corresponds to the LTL formula to identify which part is stutter insensitive. With this information, they can reduce the state space in the appropriate places and preserve the validity of the LTL formula under consideration.

The recent work by Bønneland *et al.* [3] is close to ours in spirit, applying stubborn-set based POR to *reachability games*. Such games can be used for synthesis and for model checking reachability properties. Although the conditions on reduction they propose seem unaffected by the aforementioned issue with **D1**, unfortunately, their POR theory is nevertheless unsound, as we next illustrate.

In reachability games, player 1 tries to reach one of the *goal* states, while player 2 tries to avoid them. Bønneland *et al.* propose a condition **R** that guarantees that all goal states in the full game are also reachable in the reduced game. However, the reverse is not guaranteed: paths that do not contain a goal state are not necessarily preserved, essentially endowing player 1 with more power.

Consider the (solitaire) reachability game depicted on the right, in which all edges belong to player 2 and the only goal state is indicated with grey. Player 2 wins the non-reduced game by avoiding the goal state via the edges labelled with a and then b. However, {b} is a stubborn set in the initial state (according to the conditions of [3]), and the dashed transitions are thus eliminated in the reduced game. Hence, player 2 is forced to move the token to the goal state and player 1 wins in the reduced game. In the meantime, the authors of [3] confirmed and resolved the issue in [4].

*Outline.* We give a cursory overview of parity games in Section 2. In Section 3 we introduce partial-order reduction for parity games, and we introduce a further improvement in Section 3.3. Section 4 briefly introduces the PBES fixed point logic, and in Section 5, we describe how to effectively implement parity-game based POR for PBESs. We present the results of our experiments with parity-game based POR for PBESs in Section 6. We conclude in Section 7.

## **2 Preliminaries**

Parity games are infinite-duration, two-player games played on a directed graph. The objective of the players, called *even* (denoted by ◇) and *odd* (denoted by □), is to win nodes in the graph.

**Definition 1.** *A* parity game *is a directed graph* $G = (V, E, \Omega, P)$*, where*


We write $s \to t$ whenever $(s,t) \in E$. The set of successors of a node $s$ is denoted with $\mathit{succ}(s) = \{t \mid s \to t\}$. We use $\bigcirc$ to denote an arbitrary player and $\bar{\bigcirc}$ to denote its opponent.

A parity game is played as follows: initially, a token is placed on some node of the graph. The owner of the node can decide where to move the token; the token may be moved along one of the outgoing edges. This process continues ad infinitum, yielding an infinite path of nodes that the token moves through. Such a path is called a *play*. A play $\pi$ is won by player ◇ if the minimal priority that occurs infinitely often along $\pi$ is even. Otherwise, it is won by player □.
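The winning condition can be illustrated on ultimately periodic plays, that is, plays consisting of a finite prefix followed by a cycle that is repeated forever. The following is a minimal sketch under that assumption; the function name `play_winner`, the dictionary encoding of priorities and the example priorities are hypothetical and not taken from the paper.

```python
from typing import Dict, Hashable, Sequence

EVEN, ODD = "even", "odd"


def play_winner(priority: Dict[Hashable, int], cycle: Sequence[Hashable]) -> str:
    """Winner of an ultimately periodic play prefix . cycle^omega.

    Only the nodes on the cycle occur infinitely often, so the finite
    prefix does not influence the winner and is not passed in.
    """
    if not cycle:
        raise ValueError("the repeated part of a play must be non-empty")
    lowest = min(priority[node] for node in cycle)
    return EVEN if lowest % 2 == 0 else ODD


# Hypothetical priorities for two nodes that are visited alternately forever.
example_priorities = {"s1": 0, "s2": 1}
print(play_winner(example_priorities, ["s1", "s2"]))
```

Plays that are not ultimately periodic are outside the scope of this sketch, although the winning condition itself applies to them in the same way.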

To reason about moves that a player may want to take, we use the concept of *strategies*. A strategy $\sigma_{\bigcirc} : V^+ \to V$ for player $\bigcirc$ is a partial function that determines where $\bigcirc$ moves the token next, after the token has passed through a finite sequence of nodes. More formally, for all sequences $s_1 \ldots s_n$ such that $P(s_n) = \bigcirc$, it holds that $\sigma_{\bigcirc}(s_1 \ldots s_n) \in \mathit{succ}(s_n)$. If $s_n$ belongs to $\bar{\bigcirc}$, $\sigma_{\bigcirc}(s_1 \ldots s_n)$ is undefined. A play $s_1, s_2, \ldots$ is *consistent* with a strategy $\sigma$ if and only if $\sigma(s_1 \ldots s_i) = s_{i+1}$ for all $i$ such that $\sigma(s_1 \ldots s_i)$ is defined. A player $\bigcirc$ wins in a node $s$ if and only if there is a strategy $\sigma_{\bigcirc}$ such that all plays that start in $s$ and that are consistent with $\sigma_{\bigcirc}$ are won by player $\bigcirc$.

*Example 1.* Consider the parity game on the right. Here, priorities are inscribed in the nodes and the nodes are shaped according to their owner (◇ or □). Let $\pi$ be an arbitrary, possibly empty, sequence of nodes. In this game, the strategy $\sigma_◇$, partially defined as $\sigma_◇(\pi s_1) = s_2$ and $\sigma_◇(\pi s_2) = s_1$, is winning for ◇ in $s_1$ and $s_2$. After all, the minimal priority that occurs infinitely often along $(s_1 s_2)^\omega$ is 0, which is even. Player □ can win node $s_3$ with the strategy $\sigma_□(\pi s_3) = s_4$. Note that player ◇ is always forced to move the token from node $s_4$ to $s_3$.

## **3 Partial-Order Reduction**

In model checking, arbitrary interleaving of concurrent processes can lead to a combinatorial explosion of the state space. By extension, parity games that encode model checking problems for concurrent processes suffer from the same phenomenon. *Partial-order reduction* (POR) techniques help combat the blowup. Several variants of POR exist, such as *ample sets* [26], *persistent sets* [8] and *stubborn sets* [30,31]. The current work is based on Valmari's stubborn set theory as it can easily deal with nondeterminism [32].

## **3.1 Weak Stubborn Sets**

Partial-order reduction relies on edge labels, here referred to as *events* and typically denoted with the letter j, to categorise the set of edges in a graph and determine independence of edges. In a typical application of POR, such events and edge labellings are deduced from a high-level syntactic description of the graph structure (see also Section 4). A *reduction function* subsequently uses these events when producing an equivalent *reduced* graph structure from the same high-level description. For now, we tacitly assume the existence of a set of events and edge labellings for parity games and refer to the resulting structures as *labelled parity games*.

**Definition 2.** *A* labelled parity game *is a triple* $L = (G, S, \ell)$*, where* $G = (V, E, \Omega, P)$ *is a parity game,* $S$ *is a non-empty set of events and* $\ell : S \to 2^E$ *is an edge labelling.*

For the remainder of this section, we fix an arbitrary labelled parity game $L = (G, S, \ell)$. We write $s \xrightarrow{j} t$ whenever $s \to t$ and $(s,t) \in \ell(j)$. The same notation extends to longer executions $s \xrightarrow{j_1 \ldots j_n} t$. We say an event $j$ is enabled in a node $s$, notation $s \xrightarrow{j}$, if and only if there is a transition $s \xrightarrow{j} t$ for some $t$. The set of all enabled events in a node $s$ is denoted with $\mathit{enabled}_G(s)$. An event $j$ is *invisible* if and only if $s \xrightarrow{j} t$ implies $P(s) = P(t)$ and $\Omega(s) = \Omega(t)$. Otherwise, $j$ is *visible*.
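To make the notions of enabled and invisible events concrete, the following sketch represents the edge labelling as a dictionary from events to sets of edges. The representation and the names `enabled` and `is_invisible` are assumptions made for illustration; the paper does not prescribe a data structure.

```python
from typing import Dict, Hashable, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]
Labelling = Dict[str, Set[Edge]]  # event -> set of edges carrying that event


def enabled(label: Labelling, s: Node) -> Set[str]:
    """Events j for which some edge (s, t) is labelled with j."""
    return {j for j, edges in label.items() if any(src == s for src, _ in edges)}


def is_invisible(label: Labelling, owner: Dict[Node, str],
                 priority: Dict[Node, int], j: str) -> bool:
    """An event is invisible iff all its edges preserve owner and priority."""
    return all(owner[s] == owner[t] and priority[s] == priority[t]
               for s, t in label[j])


# A small hypothetical labelled game with three nodes and two events.
label = {"a": {(0, 1)}, "b": {(0, 2), (1, 2), (2, 2)}}
owner = {0: "even", 1: "even", 2: "odd"}
priority = {0: 1, 1: 1, 2: 0}
print(sorted(enabled(label, 0)), is_invisible(label, owner, priority, "a"))
```

In this toy game, event `a` is invisible because its only edge connects two nodes with the same owner and priority, whereas `b` is visible because the edge from node 0 to node 2 changes both.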

A *reduction function* indicates which edges are to be explored in each node, based on the events associated to the edges. Given some initial node $\hat{s}$, such a function induces a unique *reduced labelled parity game* as follows.

**Definition 3.** *Let* $\hat{s} \in V$ *be a node and* $r : V \to 2^S$ *a* reduction function*. The* reduced labelled parity game *induced by* $r$ *and starting from* $\hat{s}$ *is defined as* $L_r = (G_r, S, \ell_r)$*, where* $\ell_r(j) = \ell(j) \cap E_r$ *and* $G_r = (V_r, E_r, \Omega, P)$ *is such that:*

**–** $E_r = \{(s,t) \in E \mid \exists j \in r(s) : (s,t) \in \ell(j)\}$ *is the transition relation under* $r$*;*

**–** $V_r = \{s \mid \hat{s} \mathrel{E_r^*} s\}$ *is the set of nodes reachable with* $E_r$*, where* $E_r^*$ *is the reflexive transitive closure of* $E_r$*.*

Note that a reduced labelled parity game is only well-defined when $r(s) \cap \mathit{enabled}_G(s) \neq \emptyset$ for every node $s \in V_r$; if this property does not hold, $E_r$ is not total. Even if totality of $E_r$ is guaranteed, the same node $s$ may be won by different players in $L$ and $L_r$ if no restrictions are imposed on $r$. The following conditions on $r$, as we will show, are sufficient to ensure both totality and preservation of the winner. Below, we say an event $j$ is a *key event* in $s$ iff for all executions $s \xrightarrow{j_1 \ldots j_n} s'$ such that $j_1 \notin r(s), \ldots, j_n \notin r(s)$, we have $s' \xrightarrow{j}$. Key events are typically denoted $j_{key}$.
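The construction of the reduced game can be sketched as a breadth-first exploration that, in each node, follows only the edges whose events are selected by the reduction function. This is an illustrative sketch, not the authors' implementation: the name `reduce_game` and the dictionary-based labelling are assumptions, and, unlike Definition 3, only edges that leave reachable nodes are collected.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def reduce_game(label: Dict[str, Set[Edge]],
                r: Callable[[Node], Set[str]],
                start: Node) -> Tuple[Set[Node], Set[Edge]]:
    """Return the nodes and edges reachable from `start` under reduction r."""
    nodes, edges = {start}, set()
    queue = deque([start])
    while queue:
        s = queue.popleft()
        targets = {t for j in r(s) for (src, t) in label.get(j, ()) if src == s}
        if not targets:
            # r(s) contains no enabled event, so the reduced game is not total.
            raise ValueError(f"no selected event is enabled in node {s!r}")
        for t in targets:
            edges.add((s, t))
            if t not in nodes:
                nodes.add(t)
                queue.append(t)
    return nodes, edges


# Hypothetical diamond: events a and b commute, c is a self-loop for totality.
label = {"a": {(0, 1), (2, 3)}, "b": {(0, 2), (1, 3)}, "c": {(3, 3)}}
selection = {0: {"a"}, 1: {"b"}, 2: {"a"}, 3: {"c"}}
print(reduce_game(label, lambda s: selection[s], 0))
```

In the diamond, selecting only `a` in node 0 means that node 2 is never generated, which is the kind of saving POR aims for. Whether such a selection is sound depends on the conditions imposed on `r`, which this sketch does not check.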

**Definition 4.** *We say that a reduction function* $r : V \to 2^S$ *is a* weak stubborn set *iff for all nodes* $s \in V$*, the following conditions hold*<sup>1</sup>*:*

**D1** *For all* $j \in r(s)$ *and* $j_1 \notin r(s), \ldots, j_n \notin r(s)$*, if* $s \xrightarrow{j_1} s_1 \xrightarrow{j_2} \cdots \xrightarrow{j_n} s_n \xrightarrow{j} s'_n$*, then there are nodes* $s', s'_1, \ldots, s'_{n-1}$ *such that* $s \xrightarrow{j} s' \xrightarrow{j_1} s'_1 \xrightarrow{j_2} \cdots \xrightarrow{j_n} s'_n$*. Furthermore, if* $j$ *is invisible, then* $s_i \xrightarrow{j} s'_i$ *for every* $1 \le i < n$*.*

**D2w** $r(s)$ *contains a key event in* $s$*.*

<sup>1</sup> As noted before, the condition **D1** that we propose is stronger than the version in the literature [30,33], since that one suffers from the *inconsistent labelling problem* [21], which also manifests itself in the parity game setting; see our technical report [24].


Below, we also use (weak) stubborn set to refer to the set of events $r(s)$ in some node $s$. First, note that every key event, which we typically denote by $j_{key}$, in a node $s$ is enabled in $s$, by taking $n = 0$ in **D2w**; this guarantees totality of $E_r$. Condition **D1** ensures that whenever an enabled event is selected for the stubborn set, it does not disable executions not in $r(s)$. A stubborn set can never be empty, due to **D2w**. In a traditional setting where POR is applied on a transition system, the combination of **D1** and **D2w** is sufficient to preserve deadlocks. Condition **V** enforces that either all visible events are selected for the stubborn set, or none are. Condition **L** prevents the so-called *action-ignoring problem*, where a certain event is never selected for the stubborn set and ignored indefinitely. Combined, **I** and **L** preserve plays with invisible events only.
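On a finite, explicitly given labelled game, the definition of a key event can be checked by visiting every node that is reachable from $s$ through events outside $r(s)$ and testing whether the candidate event is enabled there. The sketch below assumes such an explicit representation; the name `is_key_event` and the dictionary-based labelling are hypothetical, and a practical POR implementation would rely on static approximations instead of this exhaustive search.

```python
from typing import Dict, Hashable, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def is_key_event(label: Dict[str, Set[Edge]], r_s: Set[str],
                 s: Node, j: str) -> bool:
    """True iff j is enabled in every node reachable from s (including s
    itself, the case n = 0) using only events that are not in r_s."""
    outside = {edge for k, edges in label.items() if k not in r_s for edge in edges}
    enabled_at = {src for src, _ in label.get(j, ())}
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        if u not in enabled_at:
            return False
        for src, t in outside:
            if src == u and t not in seen:
                seen.add(t)
                stack.append(t)
    return True


# Hypothetical diamond in which a and b commute; c is only enabled in node 3.
label = {"a": {(0, 1), (2, 3)}, "b": {(0, 2), (1, 3)}, "c": {(3, 3)}}
print(is_key_event(label, {"a"}, 0, "a"), is_key_event(label, {"c"}, 0, "c"))
```

With $r(s) = \{a\}$ in node 0, the only execution outside $r(s)$ leads to node 2, where `a` is still enabled, so `a` is a key event. Event `c` is not enabled in node 0 and therefore cannot be a key event there.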

We use the example below to further illustrate the purpose of, and need for, conditions **V**, **I** and **L**. In particular, the example illustrates that the winning player in the original game and the reduced game might be different if one of these conditions is not satisfied.

*Example 2.* See the three parity games of Figure 1. From left to right, these games show a reduced game under a reduction function satisfying **D1** and **D2w** but not **V**, **I** or **L**, respectively. In each case, we start exploration from the node called $\hat{s}$, using the reduction function to follow the solid edges; consequently, the strategy $\sigma$ that is winning in the original game is lost.

Note that the games in Figure 1 are from a subclass of parity games called *weak solitaire*, illustrating the need for the identified conditions even in restricted settings. A game is *weak* if the priorities along all its paths are non-decreasing, *i.e.*, if $s \to t$ then $\Omega(s) \le \Omega(t)$. A game is *solitaire* if only one player can make non-trivial choices. Weak solitaire games can encode the model checking of safety properties; solitaire games can capture logics such as LTL and ACTL$^*$.

**Fig. 1.** Three games that show the winner is not necessarily preserved if we drop one of the conditions **V**, **I** or **L**, respectively. The dashed nodes and edges are present in the original game, but not in the reduced game. The edges taken from $\hat{s}$ by the winning strategy in the original game are indicated below each game.
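Both properties are easy to test on an explicit game graph. The sketch below reads "only one player can make non-trivial choices" as "at most one player owns a node with more than one successor"; that reading, as well as the function names and the encoding of the game, are assumptions made for illustration.

```python
from typing import Dict, Hashable, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def is_weak(edges: Set[Edge], priority: Dict[Node, int]) -> bool:
    """Priorities never decrease along an edge."""
    return all(priority[s] <= priority[t] for s, t in edges)


def is_solitaire(edges: Set[Edge], owner: Dict[Node, str]) -> bool:
    """At most one player owns a node with more than one successor."""
    out_degree: Dict[Node, int] = {}
    for s, _ in edges:
        out_degree[s] = out_degree.get(s, 0) + 1
    choosers = {owner[s] for s, degree in out_degree.items() if degree > 1}
    return len(choosers) <= 1


# Hypothetical game: node "a" has a real choice, "b" and "c" only self-loops.
edges = {("a", "b"), ("a", "c"), ("b", "b"), ("c", "c")}
owner = {"a": "even", "b": "odd", "c": "odd"}
priority = {"a": 0, "b": 1, "c": 2}
print(is_weak(edges, priority), is_solitaire(edges, owner))
```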

Before we argue for the correctness of our POR approach in the next section, we finish with a small example that illustrates how our approach improves over existing methods for branching time logics.

*Example 3.* The conditions **C1**-**C3** of Gerth *et al.* [7] preserve LTL$_{-X}$ and are similar in spirit to our conditions. However, to preserve the branching structure, needed for preservation of CTL$_{-X}$, the following *singleton proviso* is introduced:

**C4** Either $\mathit{enabled}_G(s) \subseteq r(s)$ or $|r(s)| = 1$.

This extra condition can severely impact the amount of reduction achieved: consider the following two processes, where n ≥ 1 is some large natural number.

[Figure: two sequential processes with $n + 1$ states each. In the first process, consecutive states are connected by two parallel transitions labelled $a_i$ and $a'_i$, for $1 \le i \le n$; the second process has the same shape with labels $b_i$ and $b'_i$.]

The cross product of these processes contains $(n+1)^2$ states. In the initial state, neither $\{a_1, a'_1\}$ nor $\{b_1, b'_1\}$ is a valid stubborn set, due to **C4**. However, the labelled parity game constructed using these processes and the μ-calculus formula $\nu X.([-]X \wedge \mu Y.(\langle - \rangle Y \vee \langle a_n \rangle \mathit{true}))$ has a very similar shape that *can* be reduced by prioritising transitions that correspond to $b_i$ or $b'_i$ for some $1 \le i \le n$. Note that this formula cannot be represented in LTL; condition **C4** is therefore essential for the correctness. While several optimisations for CTL$_{-X}$ model checking under POR are proposed in [19], unlike our approach, those optimisations only work for certain classes of CTL$_{-X}$ formulas and not in general.
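The effect of the singleton proviso in the initial state of this example can be replayed with a few lines of code. The sketch below encodes condition **C4** literally as a predicate on sets of events; the function name `satisfies_c4` and the string names for the events are hypothetical.

```python
from typing import Set


def satisfies_c4(enabled_events: Set[str], r_s: Set[str]) -> bool:
    """Singleton proviso: select all enabled events, or exactly one event."""
    return enabled_events <= r_s or len(r_s) == 1


# Events enabled in the initial state of the product of the two processes.
initially_enabled = {"a1", "a1'", "b1", "b1'"}

for candidate in ({"a1", "a1'"}, {"b1", "b1'"}, {"a1"}, set(initially_enabled)):
    print(sorted(candidate), satisfies_c4(initially_enabled, candidate))
```

The two candidate sets $\{a_1, a'_1\}$ and $\{b_1, b'_1\}$ are rejected because they contain two events without covering all enabled events. Whether a singleton such as $\{a_1\}$ is acceptable depends on the remaining conditions **C1**-**C3**, which this sketch does not model.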

## **3.2 Correctness**

Condition **D2w** suffices, as we already argued, to preserve totality of the transition relation of the reduced labelled parity game. Hence, we are left to argue that the reduced game preserves and reflects the winner of the nodes of the original game; this is formally claimed in Theorem 1. We do so by constructing a strategy in the reduced game that mimics the winning strategy in the original game. The plays that are consistent with these two strategies are then shown to be *stutter equivalent*, which suffices to preserve the winner.

Fix a labelled parity game $L = (G, S, \ell)$, a node $\hat{s}$, a weak stubborn set $r$ and the reduced labelled parity game $L_r = (G_r, S, \ell_r)$ induced by $r$ and $\hat{s}$. We assume $r$ and $\hat{s}$ are such that $G_r$ has a finite state space. Below, ω is the set containing all natural numbers and the smallest infinite ordinal number.

**Definition 5.** *Let* $\pi = s_0 s_1 s_2 \ldots$ *and* $\pi' = t_0 t_1 t_2 \ldots$ *be two paths in* $G$*. We say* $\pi$ *and* $\pi'$ *are stutter equivalent, notation* $\pi \triangleq \pi'$*, if and only if one of the following conditions holds:*

**–** $\pi$ *and* $\pi'$ *are both finite and there exists a non-decreasing partial function* $f : \omega \to \omega$*, with* $f(0) = 0$ *and* $f(|\pi|-1) = |\pi'|-1$*, such that for all* $0 \le i < |\pi|$ *and* $i' \in [f(i), f(i+1))$*, it holds that* $P(s_i) = P(t_{i'})$ *and* $\Omega(s_i) = \Omega(t_{i'})$*.*

**–** $\pi$ *and* $\pi'$ *are both infinite and there exists an unbounded, non-decreasing total function* $f : \omega \to \omega$*, with* $f(0) = 0$*, such that for all* $i$ *and* $i' \in [f(i), f(i+1))$*, it holds that* $P(s_i) = P(t_{i'})$ *and* $\Omega(s_i) = \Omega(t_{i'})$*.*

**Lemma 1.** *All infinite stutter equivalent paths have the same winner.*
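For finite paths, the intuition behind stutter equivalence can be illustrated with the textbook characterisation: two paths are stutter equivalent when they agree after collapsing consecutive nodes that have the same owner and priority. The sketch below uses that characterisation, which is an assumption about the intent of the definition rather than a literal transcription of it; the names `destutter` and `stutter_equivalent_finite` are hypothetical.

```python
from itertools import groupby
from typing import Dict, Hashable, List, Sequence, Tuple

Node = Hashable


def destutter(path: Sequence[Node], owner: Dict[Node, str],
              priority: Dict[Node, int]) -> List[Tuple[str, int]]:
    """Collapse maximal runs of nodes with equal owner and priority."""
    observations = ((owner[s], priority[s]) for s in path)
    return [key for key, _ in groupby(observations)]


def stutter_equivalent_finite(p1: Sequence[Node], p2: Sequence[Node],
                              owner: Dict[Node, str],
                              priority: Dict[Node, int]) -> bool:
    """Compare two finite paths up to repetition of owner/priority pairs."""
    return destutter(p1, owner, priority) == destutter(p2, owner, priority)


owner = {"a": "even", "b": "even"}
priority = {"a": 0, "b": 1}
print(stutter_equivalent_finite(["a", "a", "b"], ["a", "b", "b", "b"], owner, priority))
```

Since repeating a node's owner and priority finitely often does not change which priorities occur infinitely often, the same intuition explains why infinite stutter equivalent plays have the same winner.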

In the lemmata below, we write $\to_r$ to stress which transition must occur in $G_r$.

**Lemma 2.** *Suppose* $s_0 \xrightarrow{j_1} \cdots \xrightarrow{j_n} s_n \xrightarrow{j} s'_n$ *for* $j_1 \notin r(s_0), \ldots, j_n \notin r(s_0)$ *and* $j \in r(s_0)$*. Then for some* $s'_0, \ldots, s'_n$*, both* $s_0 \xrightarrow{j}_r s'_0 \xrightarrow{j_1} \cdots \xrightarrow{j_n} s'_n$ *and* $s_0 \ldots s_n s'_n \triangleq s_0 s'_0 \ldots s'_n$*.*

**Lemma 3.** *Suppose* $s_0 \xrightarrow{j_1} s_1 \xrightarrow{j_2} \ldots$ *such that* $j_i \notin r(s_0)$ *for every* $j_i$ *occurring on this execution. Then, the following holds:*


We remark that Lemma 3 also holds for reduced labelled parity games that have an infinite state space, but where all the events are finitely branching. The proof of correctness, *viz.*, Theorem 1, uses the alternative executions described by Lemmata 2 and 3. For full details, we refer to [24]; we here limit ourselves to sketching the intuition behind the application of these lemmata.

*Example 4.* The structure of Figure 2, in which parallel edges have the same label, visualises part of a game in which the solid edges labelled $j_1 j_2 j_3$ are part of a winning play for one of the players. This play is mimicked by the path that follows the edges $j_{key}\, j_2\, j_1\, j'_{key}\, j_3$, drawn with dashes. The new play reorders the events $j_1$, $j_2$ and $j_3$ according to the construction of Lemma 2 and introduces the key events $j_{key}$ and $j'_{key}$ according to the construction of Lemma 3.

The following theorem shows that partial-order reduction preserves the winning player in all nodes of the reduced game. Its proof is inspired by [30] and [2, Lemma 8.21], and uses the aforementioned lemmata.

**Fig. 2.** Example of how $j_1, j_2, j_3$ is mimicked by introducing $j_{key}$ and $j'_{key}$ and moving $j_2$ to the front (dashed trace). Transitions that are drawn in parallel have the same label.

**Theorem 1.** *If* $G_r$ *has a finite state space then it holds that for every node* $s$ *in* $G_r$*, the winner of* $s$ *in* $G_r$ *is equal to the winner of* $s$ *in* $G$*.*

## **3.3 Optimising D2w**

The theory we have introduced identifies and exploits rectangular structures in the parity game. This is especially apparent in condition **D1**. However, parity games obtained from model checking problems also often contain triangular structures, due to the (sometimes implicit) nesting of conjunctions and disjunctions, as the following example demonstrates.

*Example 5.* Consider the process $(a \parallel b) \cdot c$, in which actions a and b are executed in (interleaved) parallel, and action c is executed upon termination of both a and b. The μ-calculus property $\mu X.([a]X \wedge [b]X \wedge \langle - \rangle \mathit{true})$, also expressible in LTL, expresses that the action c must unavoidably be done within a finite number of steps; clearly this property holds true of the process. Below, the LTS is depicted on the left and a possible parity game encoding of our liveness property on this state space is depicted on the right. The edges in the labelled parity game that originate from the subformula $\langle - \rangle \mathit{true}$ are labelled with d.

Whereas the state space of the process can be reduced by prioritising a or b, the labelled parity game cannot be reduced due to the presence of a d-labelled edge in every node. For example, if $s$ is the top-left node in the labelled parity game, then $r(s) = \{a, d\}$ violates condition **D1**, since the execution $s \xrightarrow{bd}$ exists, but $s \xrightarrow{db}$ does not.

In order to deal with games that contain triangular structures, we propose a condition that is weaker than **D2w**.

**D2t** There is an event $j \in r(s)$ such that for all $j_1 \notin r(s), \ldots, j_n \notin r(s)$, if $s \xrightarrow{j_1} s_1 \xrightarrow{j_2} \cdots \xrightarrow{j_n} s_n$, then either $s_n \xrightarrow{j}$ or there are nodes $s', s'_1, \ldots, s'_n$ such that $s \xrightarrow{j} s' \xrightarrow{j_1} s'_1 \xrightarrow{j_2} \cdots \xrightarrow{j_n} s'_n$ and for all $i$, $s_i = s'_i$ or $s_i \xrightarrow{j} s'_i$.

Theorem 1 holds even for reduction functions satisfying the weak stubborn set conditions in which condition **D2t** is used instead of condition **D2w**. The proof thereof resorts to a modified construction of a mimicking winning strategy that is based on Lemma 4, described below, instead of Lemma 3.

**Lemma 4.** *Let* $r$ *be a reduction function satisfying conditions* **D1***,* **D2t***,* **V***,* **I** *and* **L***. Suppose* $s_0 \xrightarrow{j_1} s_1 \xrightarrow{j_2} \ldots$ *such that* $j_i \notin r(s_0)$ *for every* $j_i$ *occurring on this execution. Then, the following holds:*

  • $s_n \xrightarrow{j_{key}} s'_n$ *or* $s_n = s'_n$*; and*

  • $s_0 \xrightarrow{j_{key}}_r s'_0 \xrightarrow{j_1} \cdots \xrightarrow{j_n} s'_n$ *and* $s_0 \ldots s_n \triangleq s_0 s'_0 \ldots s'_n$*.*

**–** *If the execution is infinite, there exists another execution* $s_0 \xrightarrow{j_{key}}_r s'_0 \xrightarrow{j_1} s'_1 \xrightarrow{j_2} \ldots$ *and* $s_0 s_1 \cdots \triangleq s_0 s'_0 s'_1 \ldots$*.*

We remark that the concepts of triangular and rectangular structures bear similarities to the concept of weak confluence from [9].

## **4 Parameterised Boolean Equation Systems**

Parity games are used, among others, to solve *parameterised Boolean equation systems* (PBESs) [10], which, in turn, are used to answer, *e.g.*, first-order modal μ-calculus model checking problems [5]. In the remainder of this paper, we show how to apply POR in the context of solving a PBES (and, hence, the encoded decision problem). We first introduce PBESs and show how they induce labelled parity games.

Parameterised Boolean equation systems are sequences of fixed point equations over predicate formulae, *i.e.*, first-order logic formulae with second-order variables. A PBES is given in the context of an abstract data type, which is used to reason about data. Non-empty data sorts of the abstract data type are typically denoted with the letters $D$ and $E$. The corresponding semantic domains are $\mathbb{D}$ and $\mathbb{E}$. We assume that sorts $B$ and $N$ represent the Booleans and the natural numbers respectively, and have $\mathbb{B}$ and $\mathbb{N}$ as semantic counterparts. The set of data variables is $\mathcal{V}$, and its elements are usually denoted with $d$ and $e$. To interpret expressions with variables, we use a *data environment* $\delta$, which maps every variable in $\mathcal{V}$ to an element of the corresponding sort. The semantics of an expression $f$ in the context of such an environment is denoted $[\![f]\!]\delta$. For instance, $[\![x < 2 + y]\!]\delta$ holds true iff $\delta(x) < 2 + \delta(y)$. To update an environment, we use the notation $\delta[v/d]$, which is defined as $\delta[v/d](d) = v$ and $\delta[v/d](d') = \delta(d')$ for all variables $d' \neq d$.
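Data environments and their updates can be mimicked with dictionaries. The following sketch is purely illustrative: the function names `update` and `x_less_than_two_plus_y` are hypothetical, and the expression from the text is hard-coded instead of being parsed from an abstract data type specification.

```python
from typing import Any, Dict

Environment = Dict[str, Any]  # data variable -> semantic value


def update(delta: Environment, var: str, value: Any) -> Environment:
    """Return delta[value/var]: equal to delta except that var maps to value."""
    updated = dict(delta)  # copy, so the original environment is unchanged
    updated[var] = value
    return updated


def x_less_than_two_plus_y(delta: Environment) -> bool:
    """Semantics of the expression x < 2 + y in environment delta."""
    return delta["x"] < 2 + delta["y"]


delta = {"x": 3, "y": 0}
print(x_less_than_two_plus_y(delta), x_less_than_two_plus_y(update(delta, "y", 2)))
```

Copying the dictionary keeps environments immutable from the caller's perspective, which matches the mathematical reading of $\delta[v/d]$ as a new function.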

For lack of space, we only consider PBESs in *standard recursive form* (SRF) [22,23], a normal form in which each right-hand side of an equation is a *guarded* formula instead of an arbitrary (monotone) predicate formula. We remark that a PBES can be rewritten to SRF in linear time, while the number of equations grows linearly in the worst case [23, Proposition 2].

Let $\mathcal{X}$ be a countable set of predicate variables. In the exposition that follows we assume for the sake of simplicity (but without loss of generality) that all predicate variables $X \in \mathcal{X}$ are of type $D$. We permit ourselves the use of non-uniformly typed predicate variables in our example.

**Definition 6.** *A guarded formula* $\phi$ *is a disjunctive or conjunctive formula of the form:*

$$\bigvee_{j \in J} \exists e_j{:}E_j.\, f_j \land X_j(g_j) \quad \text{or} \quad \bigwedge_{j \in J} \forall e_j{:}E_j.\, f_j \Rightarrow X_j(g_j)$$

*where* $J$ *is an index set, each* $f_j$ *is a Boolean expression, referred to as* guard*, every* $e_j$ *is a (bound) variable of sort* $E_j$*, each* $g_j$ *is an expression of type* $D$ *and each* $X_j$ *is a predicate variable of type* $D$*. A guarded formula* $\phi$ *is said to be* total *if for each data environment* $\delta$*, there is a* $j \in J$ *and* $v \in \mathbb{E}_j$ *such that* $[\![f_j]\!]\delta[v/e_j]$ *holds true.*

The denotational semantics of a guarded formula is given in the context of a data environment $\delta$ for interpreting data expressions and a *predicate environment* $\eta : \mathcal{X} \to 2^{\mathbb{D}}$, yielding an interpretation of $X_j(g_j)$ as the truth value $[\![g_j]\!]\delta \in \eta(X_j)$. Given a predicate environment and a data environment, a guarded formula induces a monotone operator on the complete lattice $(2^{\mathbb{D}}, \subseteq)$. By Tarski's theorem, least (μ) and greatest (ν) fixed points of such operators are guaranteed to exist.

**Definition 7.** *A* parameterised Boolean equation *in SRF is an equation that has the shape* $(\mu X(d{:}D) = \phi(d))$ *or* $(\nu X(d{:}D) = \phi(d))$*, where* $\phi(d)$ *is a total guarded formula in which* $d$ *is the only free data variable. A* parameterised Boolean equation system *in SRF is a sequence of parameterised Boolean equations in SRF, in which no two equations have the same left-hand side variable.*

Henceforward, let $\mathcal{E} = (\sigma_1 X_1(d{:}D) = \varphi_1(d)) \ldots (\sigma_n X_n(d{:}D) = \varphi_n(d))$ be a fixed, arbitrary PBES in SRF, where $\sigma_i \in \{\mu, \nu\}$. The set of *bound predicate variables* of $\mathcal{E}$, denoted $\mathsf{bnd}(\mathcal{E})$, is the set $\{X_1, \ldots, X_n\}$. If the predicate variables occurring in the guarded formulae $\varphi_i(d)$ of $\mathcal{E}$ are taken from $\mathsf{bnd}(\mathcal{E})$, then $\mathcal{E}$ is said to be *closed*; we only consider closed PBESs. Every bound predicate variable is assigned a *rank*, where $\mathsf{rank}_{\mathcal{E}}(X_i)$ is the number of alternations in the sequence of fixpoint symbols $\nu \sigma_1 \sigma_2 \ldots \sigma_i$. Observe that $\mathsf{rank}_{\mathcal{E}}(X_i)$ is *even* iff $\sigma_i = \nu$. We use the function $\mathsf{op}_{\mathcal{E}} : \mathsf{bnd}(\mathcal{E}) \to \{\vee, \wedge\}$ to indicate for each predicate variable in $\mathcal{E}$ whether the associated equation is disjunctive or conjunctive. As a notational convenience, we write $J_i$ to refer to the index set of the guarded formula $\varphi_i(d)$, and we assume that the index sets are disjoint for different equations.
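The rank of each bound variable follows directly from the sequence of fixpoint symbols: starting from an implicit leading ν, the rank increases by one at every alternation. A small sketch, in which the function name `ranks` and the string encoding `"mu"`/`"nu"` are assumptions:

```python
from typing import List, Sequence


def ranks(fixpoints: Sequence[str]) -> List[int]:
    """Rank of each equation, given its fixpoint symbols in order.

    The rank of equation i is the number of alternations in the sequence
    nu, sigma_1, ..., sigma_i, so it is even exactly when sigma_i is nu.
    """
    result = []
    previous, rank = "nu", 0
    for sigma in fixpoints:
        if sigma not in ("mu", "nu"):
            raise ValueError(f"unknown fixpoint symbol: {sigma!r}")
        if sigma != previous:
            rank += 1
        previous = sigma
        result.append(rank)
    return result


print(ranks(["nu", "mu"]))        # a nu-equation followed by a mu-equation
print(ranks(["mu", "mu", "nu"]))  # a leading mu already counts as an alternation
```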

The standard denotational fixed point semantics of a closed PBES associates a subset of $\mathbb{D}$ to each bound predicate variable (*i.e.*, their meaning is independent of the predicate environment used to interpret guarded formulae). For details of the standard denotational fixed point semantics of a PBES we refer to [10]. We forego the denotational semantics and instead focus on the (provably equivalent, see *e.g.* [23,6]) game semantics of a PBES in SRF.

**Definition 8.** *The* solution *to* $\mathcal{E}$ *is a mapping* $[\![\mathcal{E}]\!] : \mathsf{bnd}(\mathcal{E}) \to 2^{\mathbb{D}}$*, defined as* $[\![\mathcal{E}]\!](X_i) = \{v \in \mathbb{D} \mid (X_i, v) \text{ is won by } ◇ \text{ in } G_{\mathcal{E}}\}$*, where* $X_i \in \mathsf{bnd}(\mathcal{E})$ *and* $G_{\mathcal{E}}$ *is the parity game associated to* $\mathcal{E}$*. The game* $G_{\mathcal{E}} = (V, E, \Omega, P)$ *is defined as:*


Note that the parity game G<sub>E</sub> may have an infinite state space when D is infinite. In practice, we are often interested in the part of the parity game that is reachable from some initial node (X, v); this is often (but not always) finite. This is illustrated by the following example.

*Example 6.* Consider the following PBES in SRF:

$$\begin{aligned} (\nu X(b{:}B) &= (b \wedge X(\mathit{false})) \vee \exists n{:}N.\, n \le 2 \wedge Y(b, \mathit{if}(b,n,0))) \\ (\mu Y(b{:}B, n{:}N) &= \mathit{true} \Rightarrow Y(\mathit{false},0)) \end{aligned}$$

The six nodes in the parity game which are reachable from (X, *true*) are depicted in Figure 3. The horizontally drawn edges all stem from the clause ∃n:N. n ≤ 2 ∧ Y(b, *if*(b, n, 0)). Vertical edges stem from the clause b ∧ X(*false*) (on the left) or the clause *true* ⇒ Y(*false*, 0) (on the right). The self-loop also stems from the clause *true* ⇒ Y(*false*, 0). Player □ wins all nodes in this game, and thus *true* ∉ ⟦E⟧(X).
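The reachable part of the game in Example 6 can be reproduced with a small explicit exploration. The sketch below hard-codes the two equations of the example; the node encoding as Python tuples and the function names are assumptions for illustration, and the quantifier over n is enumerated over 0, 1, 2 because the guard n ≤ 2 rules out all other values.

```python
def successors(node):
    """Successors of a node in the game underlying the PBES of Example 6."""
    if node[0] == "X":
        _, b = node
        result = set()
        if b:
            # clause  b /\ X(false)
            result.add(("X", False))
        for n in range(3):
            # clause  exists n:N. n <= 2 /\ Y(b, if(b, n, 0))
            result.add(("Y", b, n if b else 0))
        return result
    # equation for Y has the single clause  true => Y(false, 0)
    return {("Y", False, 0)}


def reachable(initial):
    """Collect all nodes reachable from `initial` with a worklist search."""
    seen = {initial}
    todo = [initial]
    while todo:
        node = todo.pop()
        for succ in successors(node):
            if succ not in seen:
                seen.add(succ)
                todo.append(succ)
    return seen


nodes = reachable(("X", True))
```

Starting from (X, *true*), the search visits the two X-nodes, the three nodes (Y, *true*, n) for n ≤ 2, and the node (Y, *false*, 0) with its self-loop.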

As suggested by the above example, each edge is associated to (at least) one clause in E. Consequently, we can use the index sets J<sub>i</sub> to event-label the edges emanating from nodes associated with the equation for X<sub>i</sub>. We denote the set of all events in E by evt(E), defined as evt(E) = ⋃<sub>X<sub>i</sub> ∈ bnd(E)</sub> J<sub>i</sub>. Event j ∈ J<sub>i</sub> is *invisible* if rank<sub>E</sub>(X<sub>i</sub>) = rank<sub>E</sub>(X<sub>j</sub>) and op<sub>E</sub>(X<sub>i</sub>) = op<sub>E</sub>(X<sub>j</sub>), and *visible* otherwise.

**Fig. 3.** Reachable part of the parity game underlying the PBES of Example 6, when starting from node (X, *true*).

**Definition 9.** *Let* G<sub>E</sub> *be the parity game associated to* E*. The labelled parity game associated to* E *is the structure* (G<sub>E</sub>, evt(E), ℓ)*, where* G<sub>E</sub> *is as defined in Def. 8, and, for* j ∈ J<sub>i</sub>*,* ℓ(j) *is defined as the set* {((X<sub>i</sub>, v), (X<sub>j</sub>, w)) ∈ E | f<sub>j</sub>δ[v/d] *holds true and* w = g<sub>j</sub>δ[v/d] *for some* δ}*.*

## **5 PBES Solving Using POR**

A consequence of the partial-order reduction theorem is that a reduced parity game suffices for computing the truth value of X(e) for a given PBES E with X ∈ bnd(E). However, **D1**, **D2w**/**D2t** and **L** are conditions on the (reduced) state space as a whole and, hence, hard to check locally. We therefore approximate these conditions in such a way that we can construct a stubborn set *on-the-fly*.

From here on, let E be a PBES in SRF and (G, S, ℓ), with G = (V,E,Ω,P), its labelled parity game. The most common local condition for **L** is the *stack proviso* **L**<sub>S</sub> [26]. This proviso assumes that the state space is explored with *depth-first search* (DFS), and it uses the *Stack* that stores unexplored nodes to determine whether a cycle is being closed. If so, the node will be *fully expanded*, *i.e.*, r(s) = S.

**L**<sub>S</sub> For all nodes s ∈ V<sub>r</sub>, either *succ*<sub>G<sub>r</sub></sub>(s) ∩ *Stack* = ∅ or r(s) = S.
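To make the stack proviso concrete, the following sketch shows a depth-first exploration that fully expands a node whenever one of its reduced successors lies on the search stack. It is an illustration under stated assumptions, not the implementation used in the paper: the callbacks `enabled` and `reduce` are hypothetical hooks, and the stack is modelled as the set of nodes on the current DFS path.

```python
def explore(initial, enabled, reduce):
    """Depth-first exploration with a stack proviso (illustrative sketch).

    enabled(s) returns a dict mapping each enabled event to its set of
    successor nodes. reduce(s, en) proposes a subset of those events as
    r(s). Both are hypothetical hooks supplied by the caller.
    Returns a dict mapping each explored node to its explored successors.
    """
    edges = {}
    on_stack = set()

    def dfs(s):
        on_stack.add(s)
        en = enabled(s)
        chosen = set(reduce(s, en))
        succ = {t for j in chosen for t in en.get(j, ())}
        if succ & on_stack:
            # A cycle is being closed: fully expand s, i.e. r(s) = S.
            succ = {t for targets in en.values() for t in targets}
        edges[s] = succ
        for t in sorted(succ):
            if t not in edges:
                dfs(t)
        on_stack.discard(s)

    dfs(initial)
    return edges


# Toy game: events a and b form a cycle between nodes 0 and 1; event c
# leads to node 2 and would be ignored forever without the proviso.
graph = {0: {"a": {1}, "c": {2}}, 1: {"b": {0}, "c": {2}}, 2: {}}
result = explore(
    0,
    lambda s: graph[s],
    lambda s, en: {j for j in en if j != "c"} or set(en),
)
```

In the toy game the reduction always postpones event c. When the search reaches node 1, its only reduced successor is node 0, which is still on the stack, so node 1 is fully expanded and node 2 is found.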

Locally approximating conditions **D1** and **D2w** requires a static analysis of the PBES. For this, we draw upon ideas from [17] and extend these to properly deal with non-determinism. To reason about which events are independent, we rely on the idea of *accordance*.

**Definition 10.** *Let* j, j′ ∈ S*. We define the* accordance *relations DNL, DNS, DNT and DNA on* S *as follows:*


Note that *DNL* and *DNT* are not necessarily symmetric. An illustration of the left-according, square-according and triangle-according conditions is given below.

[Diagram: the left-according, square-according and triangle-according conditions for events j and j′.]

Accordance relations safely approximate the independence of events. The dependence of events, required for satisfying **D2w**, can be approximated using Godefroid's *necessary enabling sets* [8].

**Definition 11.** *Let* j *be an event that is disabled in some node* s*. A* necessary-enabling set *(NES) for* j *in* s *is any set NES*<sub>s</sub>(j) ⊆ S *such that for every execution* $s \xrightarrow{j_1 \ldots j_n j}$ *there is at least one* j<sub>i</sub> *such that* j<sub>i</sub> ∈ *NES*<sub>s</sub>(j)*.*

For every node and event there might be more than one NES. In particular, every superset of a NES is also a NES. A larger-than-needed NES may, however, have a negative impact on the reduction that can be achieved. In a PBES with multiple parameters per predicate variable, computing a NES can be done by determining which parameters influence the validity of guards f<sub>j</sub> and which parameters are changed in the update functions g<sub>j</sub>. A more accurate NES may be computed using techniques to extract a control flow from a PBES [15].
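The parameter-based computation described above can be sketched as a simple read/write analysis. The code below is a coarse over-approximation under stated assumptions: the dictionaries `guard_reads` and `update_writes` are hypothetical inputs that would come from a static analysis, and the resulting set does not depend on the node s, so it is a valid but possibly larger-than-needed NES for every node in which the event is disabled.

```python
def necessary_enabling_set(event, guard_reads, update_writes):
    """Over-approximate a NES for `event` by a read/write analysis.

    guard_reads[j]   : parameters that influence the guard of event j
    update_writes[j] : parameters that the update function of j may change
    (both are hypothetical inputs for this sketch)

    A disabled event can only become enabled after some parameter read by
    its guard has changed, so every event that may write such a parameter
    is included.
    """
    needed = guard_reads[event]
    return {k for k, written in update_writes.items() if written & needed}


guard_reads = {"j": {"x"}, "k": {"y"}, "m": {"x", "y"}}
update_writes = {"j": {"y"}, "k": {"x"}, "m": set()}
nes_j = necessary_enabling_set("j", guard_reads, update_writes)
```

Here only event k writes the parameter x that the guard of j reads, so the computed set for j is {k}. An event whose guard reads more parameters, such as m, obtains a larger set.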

The following lemmata show how the accordance relations and necessary-enabling sets can be used to implement conditions **D1**, **D2w** and **D2t**, respectively. A combination of Lemma 5 and 6 in a deterministic setting appeared as Lemma 1 in [17]. Note that as a notational convention we write R(j) to denote the projection {j′ | (j, j′) ∈ R} of a binary relation R.

**Lemma 5.** *A reduction function* r *satisfies D1 in node* s ∈ V *if for all* j ∈ r(s)*:*

**–** *if* j *is disabled in* s*, then NES*<sub>s</sub>(j) ⊆ r(s) *for some NES*<sub>s</sub>*; and*

**–** *if* j *is enabled in* s*, then DNL*(j) ⊆ r(s)*.*

**Lemma 6.** *A reduction function* r *satisfies D2w in a node* s ∈ V *if there is an enabled event* j ∈ r(s) *such that DNS*(j) ⊆ r(s)*.*

**Lemma 7.** *A reduction function* r *satisfies D2t in a node* s *if there is an enabled event* j ∈ r(s) *such that DNA*(j) ⊆ r(s)*.*
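One way to read Lemmas 5 and 6 operationally is as a closure computation: start from an enabled key event together with its *DNS* set, and keep adding the events that Lemma 5 demands until nothing changes. The sketch below follows this reading. It is an assumption about how the lemmas could be applied, not the algorithm of the paper, and all names (`stubborn_set`, `dnl`, `dns`, `nes`) are hypothetical; the relations are passed as dictionaries that map an event j to the projection R(j).

```python
def stubborn_set(key, enabled, dnl, dns, nes):
    """Close {key} under the requirements of Lemma 5 and Lemma 6.

    key     : an enabled event chosen as starting point (Lemma 6)
    enabled : set of events enabled in the current node
    dnl[j]  : projection DNL(j), required for enabled events (Lemma 5)
    dns[j]  : projection DNS(j), required for the key event (Lemma 6)
    nes[j]  : one chosen NES for each disabled event j (Lemma 5)
    """
    if key not in enabled:
        raise ValueError("the key event must be enabled")
    result = set()
    todo = [key] + sorted(dns.get(key, ()))
    while todo:
        j = todo.pop()
        if j in result:
            continue
        result.add(j)
        required = dnl.get(j, set()) if j in enabled else nes.get(j, set())
        todo.extend(k for k in required if k not in result)
    return result


chosen = stubborn_set(
    "a",
    enabled={"a", "b"},
    dnl={"a": {"c"}, "b": {"a"}},
    dns={"a": set()},
    nes={"c": {"b"}, "d": {"a"}},
)
```

In the example, event a pulls in c via *DNL*, the disabled event c pulls in b via its NES, and event d is never required, so it stays outside the computed set.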

More reduction can be achieved if a PBES is partly or completely 'deterministic', in which case some of the conditions can be relaxed. We say that an event j is *deterministic*, denoted by *det*(j), if for all nodes t, t′, t″ ∈ V, if $t \xrightarrow{j} t'$ and $t \xrightarrow{j} t''$, then also t′ = t″. This means event-determinism can be characterised as follows:

*det*(j) iff f<sub>j</sub>δ and f<sub>j</sub>δ′ implies g<sub>j</sub>δ = g<sub>j</sub>δ′ for all δ, δ′ with δ(d) = δ′(d).

The following lemma specialises Lemma 5 and shows how knowledge of deterministic events can be applied to potentially improve the reduction.

**Lemma 8.** *A reduction function* r *satisfies D1 in a node* s *if for all* j ∈ r(s)*:*


**–** *if* ¬*det*(j) *and* j *is enabled in* s*, then DNL*(j) ⊆ r(s)*.*

Since relations *DNS* and *DNL* are incomparable we cannot decide *a priori* which should be used for deterministic events. However, Lemma 8 permits choosing one of the accordance sets on-the-fly. This choice can be made based on a heuristic function, similar to the function for NESs proposed in [17].

## **6 Experiments**

We implemented the ideas from the previous section in a prototype tool, called pbespor, as part of the mCRL2 toolset [5]; it is written in C++. Our tool converts a given input PBES to a PBES in SRF, runs a static analysis to compute the accordance relations (see Section 5), and uses a depth-first exploration to compute the parity game underlying the PBES in SRF. The static analysis relies on an external SMT solver (we use Z3 in our experiments). To limit the amount of static analysis required and to improve the reduction, the implementation contains a rudimentary way of identifying whether the same event occurs in multiple PBES equations. Experiments are conducted on a machine with an Intel Xeon 6136 CPU @ 3 GHz, running mCRL2 with Git commit hash dd36f98875.

To measure the effectiveness of our implementation, we analysed the following mCRL2 models<sup>2</sup>: Anderson's mutual exclusion protocol [1], the dining philosophers problem, the gas station problem [11], Hesselink's handshake register [12], Le Lann's leader election protocol [18], Milner's Scheduler [20] and the Krebs cycle of ATP production in biological cells (model inspired by [25]). Most of these models are scalable. Each model is subjected to one or more requirements phrased as mCRL2's first-order modal μ-calculus formulae. Where possible, Table 1 provides a CTL<sup>∗</sup> formula that captures the essence of the requirement.

We analyse the effectiveness of our partial-order reduction technique by measuring the reduction of the size of the state space, and the time that is required to generate the state space. Since the static analysis that is conducted can require a non-negligible amount of time, we pay close attention to the various forms of static analysis that can be conducted. In particular, we compare the total time and effectiveness (in terms of reduction) of running the following static analyses:

<sup>2</sup> The models are archived online at https://doi.org/10.5281/zenodo.3602969.

**Table 1.** Runtime (analysis + exploration; in seconds) and number of states when exploring either the full state space or the reduced state space, for four different static analysis approaches. Figures printed in boldface indicate which of the additional static analyses is able to achieve the largest reduction over 'basic' (if any).


**–** computing left-accordance (*DNL*) vs. over-approximating it with all events.


As a baseline for comparisons, we take a basic static analysis (over-approximated DNL, over-approximated NES, **D2w**), see column 'basic' in Table 1. In order to guarantee termination of the static analysis phase, we set a timeout of 200ms per formula that is sent to the solver. Table 1 reports on the statistics we obtained for exploring the full state space and the four possible POR configurations described above; the table is sorted with respect to the time needed for a full exploration. The time we list consists of the time needed to conduct the analysis plus the time needed for the exploration.

For most small instances, the time required for static analysis dominates any speed-up gained by the state space reduction. When the state spaces are larger, achieving a speed-up becomes more likely, while the highest overhead suffered by 'basic' is 55% (Hesselink, cache consistency). Significant reduction can be achieved even for non-trivial properties, such as 'lann.5' with 'no data loss'. Scheduler is an extreme case: its processes have very few dependencies, leading to an exponential reduction, both in terms of the state space size and in terms of time. In several cases, the use of a NES or **D2t** brings extra reduction (highlighted in bold). Moreover, the extra time required to conduct the additional analysis seems limited. The use of DNL, on the other hand, never pays off in our experiments; it even results in a slightly larger state space in two cases.

We note that there are also models, not listed in Table 1, where our static analysis does not yield any useful results and no reduction is achieved. Even if in such cases a reduction would be possible in theory, the current static analysis engines are unable to deal with the more complex data types often used in such models; *e.g.*, recursively defined lists or infinite sets, represented symbolically with higher-order constructions. This calls for further investigations into static analysis theories that can effectively deal with complex data.

Finally, we point out that in the case of, *e.g.*, the dining philosophers problem, the relative reduction under the 'no deadlock' property is much better than under the '∀-∀*eat*' property. This demonstrates the impact properties can have on the reductions achievable, and it also points at a phenomenon we have not stressed in the current work, *viz.*, the impact of identifying events on the reductions achievable. We explain the phenomenon in the following example.

*Example 7.* Consider the LTS and the parity game on the right. The parity game encodes the property νX.([−]X ∧ ∀i. μY.([ai]Y ∧ −*true*)), which is equivalent to <sup>∀</sup>-ai, on this LTS. The event xy represents the transition from fixpoint X into Y , which does not involve an action from the LTS. Note that the complete state space is encoded in the fixpoint X. Due to the absence of some transitions in the part of the state space encoded in fixpoint Y , neither a<sup>1</sup> nor a<sup>2</sup> is according with xy. Hence, the only stubborn set in the initial node is {a1, a2, xy}, which yields no reduction.

Improving the event identification procedure can yield more reduction. For instance, if, for each i (bound in the universal quantifier), a different event xy<sub>i</sub> is created, then both a<sub>1</sub>, xy<sub>2</sub> and a<sub>2</sub>, xy<sub>1</sub> will be according. If we disregard the visibility of xy<sub>1</sub> and xy<sub>2</sub>, four nodes can be eliminated.

## **7 Conclusion**

We have presented an approach for applying partial-order reduction on parity games. This has two main advantages over POR applied on LTSs or Kripke structures: our approach supports the full modal μ-calculus, not just a fragment thereof, and the potential for reduction is greater, because we do not require a singleton proviso. Furthermore, we have shown how the ideas can be implemented with PBESs as a high-level representation. In future work, we aim to gain more insight into the effect of identifying events across PBES equations in several ways. We also want to investigate the possibility of solving a reduced parity game while it is being constructed. In certain cases, one may be able to decide the winner of the original game from this partial solution.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Polynomial Identification of** *ω***-Automata**

Dana Angluin<sup>1</sup>, Dana Fisman<sup>2</sup>, and Yaara Shoval<sup>2</sup>

<sup>1</sup> Yale University, New Haven, CT, USA <sup>2</sup> Ben-Gurion University, Be'er Sheva, Israel

**Abstract.** We study identification in the limit using polynomial time and data for models of ω-automata. On the negative side we show that non-deterministic ω-automata (of types Büchi, coBüchi, Parity or Muller) cannot be polynomially learned in the limit. On the positive side we show that the ω-language classes IB, IC, IP, and IM that are defined by deterministic Büchi, coBüchi, parity, and Muller acceptors that are isomorphic to their right-congruence automata (that is, the right congruences of languages in these classes are fully informative) are identifiable in the limit using polynomial time and data. We further show that for these classes a characteristic sample can be constructed in polynomial time.

**Keywords:** identification in the limit, characteristic sample, ω-regular.

## **1 Introduction**

With the growing success of machine learning in efficiently solving a wide spectrum of problems, we are witnessing an increased use of machine learning techniques in formal methods for system design. One thread in recent literature uses general purpose machine learning techniques for obtaining more efficient verification/synthesis algorithms. Another thread, following the automata theoretic approach to verification [33,21], works on developing grammatical inference algorithms for verification and synthesis purposes. Grammatical inference (aka automata learning) refers to the problem of automatically inferring from examples a finite representation (e.g. an automaton, a grammar, or a formula) for an unknown language. The term model learning [31] was coined for the task of learning an automaton model for an unknown system. A large body of work has developed learning techniques for different automata types (e.g. I/O automata [1], register automata [20], symbolic automata [14], ω-automata [7], and program automata [25]) and has shown its usability in a diverse range of tasks.<sup>3</sup>

In grammatical inference, the learning algorithm does not learn a language, but rather a finite representation of it. The complexity of learning algorithms may vary greatly by switching representations. For instance, if one wishes to learn regular languages, one may consider representations using deterministic finite automata (DFAs), non-deterministic finite automata (NFAs), regular expressions, linear grammars, etc. Since the translations between two such formalisms are not necessarily polynomial, a polynomial learnability result for one representation does not necessarily imply a polynomial learnability result for another representation. Let ℂ be a class of representations C with a size measure size(C) (e.g. for DFAs the size measure can be the number of states in the minimal automaton). We extend size(·) to the languages recognized by representations in ℂ by defining size(L) to be the minimum of size(C) over all C representing L. In this paper we restrict attention to automata representations, namely, acceptors.

<sup>⋆</sup> This research was supported by grant 2016239 from the United States – Israel Binational Science Foundation (BSF).

<sup>3</sup> E.g., tasks such as black-box checking [28], specification mining [2], assume-guarantee reasoning [13], regular model checking [18], learning verification fixed-points [32], learning interfaces [27], analyzing botnet protocols [12] or smart card readers [10], finding security bugs [10], error localization [11], and code refactoring [26,29].

There are various learning paradigms considered in the grammatical inference literature, roughly classified into passive and active. We mention here the two central ones. In passive learning the model of learning from finite data refers to the following problem: given a finite sample T ⊆ Σ<sup>∗</sup> × {0, 1} of labeled words, a learning algorithm **A** should return an acceptor C that agrees with the sample T. That is, for every (w, l) ∈ T the following holds: w ∈ ⟦C⟧ iff l = 1 (where ⟦C⟧ is the language accepted by C). The class ℂ is identifiable in the limit using polynomial time and data if and only if there exists a polynomial time algorithm **A** that takes as input a labeled sample T and outputs an acceptor C ∈ ℂ that is consistent with T, and **A** also satisfies the following condition. If L is any language recognized by an automaton from class ℂ, then there exists a labeled sample T<sub>L</sub> consistent with L of length bounded by a polynomial in size(L), and for any labeled sample T consistent with L such that T<sub>L</sub> ⊆ T, on input T the algorithm **A** produces an acceptor C that recognizes L. In this case, T<sub>L</sub> is termed a characteristic sample for the algorithm **A**. In some cases (e.g., DFAs) there is also a polynomial time algorithm to compute a characteristic sample for **A**, given an acceptor C ∈ ℂ.
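The agreement requirement between an acceptor and a sample of labeled finite words can be checked directly. The following sketch does so for a DFA; the dictionary layout of the DFA and the function names are assumptions chosen for illustration only.

```python
def accepts(dfa, word):
    """Run a (possibly incomplete) DFA on a finite word."""
    state = dfa["initial"]
    for symbol in word:
        state = dfa["delta"].get((state, symbol))
        if state is None:  # missing transition: the word is rejected
            return False
    return state in dfa["accepting"]


def consistent(dfa, sample):
    """Check that w is accepted iff its label is 1, for every (w, l) in the sample."""
    return all(accepts(dfa, word) == (label == 1) for word, label in sample)


# DFA over {a, b} accepting words with an even number of a's.
even_a = {
    "initial": 0,
    "accepting": {0},
    "delta": {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1},
}
sample = [("", 1), ("a", 0), ("aba", 1), ("ab", 0)]
agrees = consistent(even_a, sample)
```

A learner in the passive setting must return some acceptor for which this check succeeds; identification in the limit additionally asks that the returned acceptor recognizes the target language once the sample contains a characteristic sample.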

In active learning the model of query learning [5] assumes the learner communicates with an oracle (sometimes called teacher) that can answer certain types of queries about the language. The most common type of queries are membership queries (is w ∈ L where L is the unknown language) and equivalence queries (is ⟦A⟧ = L where A is the current hypothesis for an acceptor recognizing L). Equivalence queries are typically assumed to return a counterexample, i.e. a word in ⟦A⟧ \ L or in L \ ⟦A⟧.

With regard to ω-automata (automata on infinite words), most of the works consider query learning. The representations learned so far include: (L)<sub>\$</sub> [15], a non-polynomial reduction to finite words; families of DFAs (FDFA) [7,8,6,22]; strongly unambiguous Büchi automata (SUBA) [3]; and deterministic weak parity automata (DWPA) [23]. Among these only the last is learnable in polynomial time using membership queries and proper equivalence queries.

One of the main obstacles in obtaining a polynomial learning algorithm for ω-regular languages is that they do not in general have a Myhill-Nerode characterization; that is, there is no theorem correlating the states of a minimal automaton of some of the common automata types (Büchi, Parity, Muller, etc.) to the equivalence classes of the right congruence of the language. The right congruence relation for an ω-language L relates two finite words x and y iff there is no infinite suffix z differentiating them, that is, x ∼<sub>L</sub> y (for x, y ∈ Σ<sup>∗</sup>) iff ∀z ∈ Σ<sup>ω</sup>. xz ∈ L ⇐⇒ yz ∈ L. In our quest for finding a polynomial query learning algorithm for a subclass of the ω-regular languages, we have studied subclasses of languages for which such a relation holds [4], and termed them fully informative. We use IB, IC, IP, IM to denote the classes of languages that are fully informative of type Büchi, coBüchi, Parity and Muller, respectively. A language L is said to be fully informative of type X for X ∈ {B, C, P, M} if there exists a deterministic automaton of type X which is isomorphic to the automaton derived from ∼<sub>L</sub>. While a lot of properties about these classes are now known, in particular that they span the entire hierarchy of ω-regular properties [34], a polynomial learning algorithm for them has not been found yet.

In this paper we show that the classes IB, IC, IP, IM can be identified in the limit using polynomial time and data. We further show that there is a polynomial time algorithm to compute a characteristic sample given an acceptor C ∈ IX. A corollary of this result is that the class of languages accepted by DWPAs (which as mentioned above is polynomially learnable in the query learning setting) also has a polynomial size characteristic sample. On the negative side, we show that the classes NBA, NCA, NPA, NMA of non-deterministic Büchi, coBüchi, Parity and Muller automata, resp., cannot be identified in the limit using polynomial data.

## **2 Preliminaries**

**Automata and Acceptors.** An automaton is a tuple A = ⟨Σ, Q, q<sub>ι</sub>, δ⟩ consisting of a finite totally ordered alphabet Σ of symbols, a finite set Q of states, an initial state q<sub>ι</sub> ∈ Q, and a transition function δ : Q × Σ → 2<sup>Q</sup>. A run of an automaton on a finite word v = a<sub>1</sub>a<sub>2</sub>...a<sub>n</sub> is a sequence of states q<sub>0</sub>, q<sub>1</sub>, ..., q<sub>n</sub> such that q<sub>0</sub> = q<sub>ι</sub>, and for each 0 ≤ i < n, q<sub>i+1</sub> ∈ δ(q<sub>i</sub>, a<sub>i+1</sub>). A run on an infinite word is defined similarly and results in an infinite sequence of states. We say that A is deterministic if |δ(q, a)| ≤ 1 and complete if |δ(q, a)| ≥ 1, for every q ∈ Q and a ∈ Σ. We extend δ to domain Q × Σ<sup>∗</sup> in the usual manner, and abbreviate δ(q, σ) = {q′} as δ(q, σ) = q′.

By augmenting an automaton with an acceptance condition α, obtaining a tuple ⟨Σ, Q, q<sub>ι</sub>, δ, α⟩, we get an acceptor, a machine that accepts some words and rejects others. An acceptor accepts a word if at least one of the runs on that word is accepting. For finite words the acceptance condition is a set F ⊆ Q and a run on a word v is accepting if it ends in an accepting state, i.e., if δ(q<sub>ι</sub>, v) contains an element of F. For infinite words, there are various acceptance conditions in the literature; we consider four: Büchi, coBüchi, parity, and Muller. The Büchi and coBüchi acceptance conditions are also a set F ⊆ Q. A run of a Büchi automaton is accepting if it visits F infinitely often. A run of a coBüchi automaton is accepting if it visits F only finitely many times. A parity acceptance condition is a map κ : Q → ℕ assigning each state a natural number termed a color (or priority). A run is accepting if the **minimum** color visited infinitely often is **odd**. A Muller acceptance condition is a set of sets of states α = {F<sub>1</sub>, F<sub>2</sub>, ..., F<sub>k</sub>} for some k ∈ ℕ and F<sub>i</sub> ⊆ Q for i ∈ [1..k]. A run of a Muller automaton is accepting if the set S of states visited infinitely often in the run is a member of α. We use ⟦A⟧ to denote the set of words accepted by a given acceptor A. We use NBA, NPA, NMA, NCA for non-deterministic Büchi, parity, Muller and coBüchi automata. We also use NBA, NPA, NMA and NCA for the classes of languages they recognize. The first three recognize the full class of ω-regular languages while the fourth recognizes only a subset of it.
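For a deterministic and complete automaton, all four acceptance conditions depend only on the set of states visited infinitely often, and for an ultimately periodic word u(v)<sup>ω</sup> this set can be computed by iterating v until a state repeats at a block boundary. The sketch below illustrates this; the restriction to deterministic complete automata, the dictionary encoding of δ and all function names are assumptions made for the example.

```python
def inf_states(delta, initial, u, v):
    """States visited infinitely often on u(v)^omega (deterministic, complete)."""
    state = initial
    for symbol in u:
        state = delta[(state, symbol)]
    seen = {}    # state at the start of a v-block -> index of that block
    blocks = []  # states visited while reading each v-block
    while state not in seen:
        seen[state] = len(blocks)
        visited = []
        for symbol in v:
            state = delta[(state, symbol)]
            visited.append(state)
        blocks.append(visited)
    # From block seen[state] onwards the run repeats forever.
    return {q for block in blocks[seen[state]:] for q in block}


def buchi(inf, F):
    return bool(inf & F)       # F is visited infinitely often

def cobuchi(inf, F):
    return not (inf & F)       # F is visited only finitely many times

def parity(inf, kappa):
    return min(kappa[q] for q in inf) % 2 == 1  # minimum color is odd

def muller(inf, alpha):
    return frozenset(inf) in alpha


# Two-state automaton that remembers the last letter read.
delta = {(q, s): "q" + s for q in ("qa", "qb") for s in "ab"}
inf_1 = inf_states(delta, "qa", "b", "ab")  # run on b(ab)^omega
inf_2 = inf_states(delta, "qa", "a", "b")   # run on a(b)^omega
```

With F = {qa}, the Büchi condition accepts b(ab)<sup>ω</sup> and rejects a(b)<sup>ω</sup>, while the coBüchi condition with the same F does the opposite.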

**Right congruences.** An equivalence relation ∼ on Σ<sup>∗</sup> is a right congruence if x ∼ y implies xv ∼ yv for every x, y, v ∈ Σ<sup>∗</sup>. The index of ∼, denoted |∼|, is the number of equivalence classes of ∼. Given a language L ⊆ Σ<sup>∗</sup>, its canonical right congruence ∼<sub>L</sub> is defined as follows: x ∼<sub>L</sub> y iff ∀z ∈ Σ<sup>∗</sup> we have xz ∈ L ⇐⇒ yz ∈ L. For a word v ∈ Σ<sup>∗</sup> the notation [v] is used for the equivalence class of ∼ in which v resides.

With a right congruence ∼ of finite index one can naturally associate an automaton M<sub>∼</sub> = ⟨Σ, Q, q<sub>ι</sub>, δ⟩ as follows: the set of states Q consists of the equivalence classes of ∼. The initial state q<sub>ι</sub> is the equivalence class [ε]. The transition function δ is defined by δ([u], a) = [ua]. Similarly, given a complete deterministic automaton M = ⟨Σ, Q, q<sub>ι</sub>, δ⟩ we can naturally associate with it a right congruence as follows: x ∼<sub>M</sub> y iff M reaches the same state when reading x or y. The Myhill-Nerode Theorem states that a language L is regular iff ∼<sub>L</sub> is of finite index. Moreover, if L is accepted by a DFA A then ∼<sub>A</sub> refines ∼<sub>L</sub>. Finally, the index of ∼<sub>L</sub> gives the size of the minimal complete DFA for L.

For an ω-language L ⊆ Σ<sup>ω</sup>, the right congruence ∼<sub>L</sub> is defined similarly, by quantifying over ω-words. That is, x ∼<sub>L</sub> y iff ∀z ∈ Σ<sup>ω</sup> we have xz ∈ L ⇐⇒ yz ∈ L. Given a deterministic automaton M we can define ∼<sub>M</sub> exactly as for finite words. However, for ω-regular languages, the relation ∼<sub>L</sub> does not suffice to obtain a "Myhill-Nerode" characterization. As an example consider the language L = (a+b)<sup>∗</sup>(bba)<sup>ω</sup>. We have that ∼<sub>L</sub> consists of just one equivalence class, since for any x ∈ Σ<sup>∗</sup> and w ∈ Σ<sup>ω</sup> we have that xw ∈ L iff w has (bba)<sup>ω</sup> as a suffix. But an ω-acceptor recognizing L obviously needs more than a single state.
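The single equivalence class in this example can be observed on ultimately periodic words: membership of u(v)<sup>ω</sup> in L does not depend on u at all. The sketch below decides membership by comparing v<sup>ω</sup> with the three rotations of (bba)<sup>ω</sup>, using the standard fact that v<sup>ω</sup> = r<sup>ω</sup> iff v<sup>|r|</sup> = r<sup>|v|</sup>. The function name `in_L` is an assumption for illustration.

```python
def in_L(u, v):
    """Decide whether u(v)^omega belongs to L = (a+b)^*(bba)^omega.

    The finite prefix u is irrelevant: the word is in L iff v^omega
    equals r^omega for a rotation r of bba, which holds iff
    v^3 == r^len(v) (both strings have length 3 * len(v)).
    """
    if not v:
        raise ValueError("the periodic part v must be non-empty")
    rotations = ("bba", "bab", "abb")
    return any(v * 3 == r * len(v) for r in rotations)


checks = [in_L("", "bba"), in_L("aab", "abb"), in_L("b", "babbab"), in_L("", "ab")]
```

Since the result is the same for every choice of u, any two finite words x and y are related by ∼<sub>L</sub>, which is exactly why ∼<sub>L</sub> has a single class here.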

**The classes IB, IC, IP and IM.** A language L is in IB (resp., IC, IP, IM) if there exists a deterministic Büchi (resp., coBüchi, parity, Muller) acceptor A such that L = ⟦A⟧ and there is a 1-to-1 relationship between the states of A and the equivalence classes of ∼<sub>L</sub>: if x ∼<sub>L</sub> y then x and y reach the same state q in A, and an ω-word z is accepted from q iff xz ∈ L (which holds iff yz ∈ L). These classes are more expressive than one might conjecture: it was shown in [4] that in every class of the infinite Wagner hierarchy [34] there are languages in IM and IP. Moreover, in a small experiment reported in [4], among randomly generated Muller automata, the vast majority turned out to be in IM.

**Examples and samples.** Since we need finite representations of examples, ω-words in our case, we work with ultimately periodic words, that is, words of the form u(v)<sup>ω</sup> where u ∈ Σ<sup>∗</sup> and v ∈ Σ<sup>+</sup>. It is known that two regular ω-languages are equivalent iff they agree on the set of ultimately periodic words, so this choice is not limiting. The example u(v)<sup>ω</sup> is concretely represented by the pair (u, v) of finite strings, and its length is |u| + |v|. A labeled example is a pair (u(v)<sup>ω</sup>, l), where the label l is either 0 or 1. A sample is a finite set of labeled examples such that no example is assigned two different labels. The length of a sample is the sum of the lengths of the examples that appear in it. A sample T and a language L are consistent with each other if and only if for every labeled example (u(v)<sup>ω</sup>, l) ∈ T, l = 1 iff u(v)<sup>ω</sup> ∈ L. A sample and an acceptor are consistent with each other if and only if the sample and the language recognized by the acceptor are consistent with each other. The following results give two useful procedures on examples that are computable in polynomial time.

**Claim 1.** Given u<sub>1</sub>, u<sub>2</sub> ∈ Σ<sup>∗</sup> and v<sub>1</sub>, v<sub>2</sub> ∈ Σ<sup>+</sup>, if u<sub>1</sub>(v<sub>1</sub>)<sup>ω</sup> ≠ u<sub>2</sub>(v<sub>2</sub>)<sup>ω</sup> then they differ in at least one of the first ℓ symbols, where ℓ = max(|u<sub>1</sub>|, |u<sub>2</sub>|) + |v<sub>1</sub>|·|v<sub>2</sub>|.
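Claim 1 yields a polynomial-time equality test for ultimately periodic words: it suffices to compare a prefix of length max(|u<sub>1</sub>|, |u<sub>2</sub>|) + |v<sub>1</sub>|·|v<sub>2</sub>|. A minimal sketch of this test follows; the helper names `symbol` and `equal_words` are assumptions for illustration.

```python
def symbol(u, v, i):
    """Return the i-th symbol (0-based) of the omega-word u(v)^omega."""
    return u[i] if i < len(u) else v[(i - len(u)) % len(v)]


def equal_words(u1, v1, u2, v2):
    """Decide u1(v1)^omega == u2(v2)^omega by comparing a bounded prefix."""
    if not v1 or not v2:
        raise ValueError("periodic parts must be non-empty")
    bound = max(len(u1), len(u2)) + len(v1) * len(v2)
    return all(symbol(u1, v1, i) == symbol(u2, v2, i) for i in range(bound))


same = equal_words("a", "ba", "ab", "ab")   # both denote (ab)^omega
different = equal_words("", "ab", "", "aba")
```

The test is useful, for example, to check that a sample does not assign two different labels to the same ω-word given by two different pairs (u, v).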

Let suffixes(u(v)<sup>ω</sup>) denote the set of all ω-words that are suffixes of u(v)<sup>ω</sup>.

**Claim 2.** The set suffixes($u(v)^\omega$) consists of at most $|u| + |v|$ different examples: one of the form $u'(v)^\omega$ for every nonempty suffix $u'$ of $u$, and one of the form $(v_2 v_1)^\omega$ for every division of $v$ into a non-empty prefix and suffix as $v = v_1 v_2$.
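The enumeration in Claim 2 can be sketched in Python as follows. This is only an illustration under our own conventions: an example is a pair `(u, v)` of strings, and we read the division of `v` as allowing an empty `v2`, so that `(v)^ω` itself is listed and the total is exactly `|u| + |v|` pairs (some of which may denote the same ω-word).

```python
def suffixes(u: str, v: str) -> list[tuple[str, str]]:
    """List pairs (u', v') representing the omega-word suffixes of u(v)^omega."""
    # One example u'(v)^omega for every nonempty suffix u' of u.
    result = [(u[i:], v) for i in range(len(u))]
    # One example (v2 v1)^omega for every rotation of v (j = 0 gives v itself).
    result += [("", v[j:] + v[:j]) for j in range(len(v))]
    return result
```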

**Identification in the limit using polynomial time and data** We consider the notion of identification in the limit using polynomial time and data. This criterion of learning was introduced by [16], who showed that regular languages of finite strings represented by DFAs are learnable in this sense. We follow a more general definition given by [19]. The definition has two requirements: (1) a learning algorithm **A** that runs in polynomial time on a set of labeled examples and produces a hypothesis consistent with the examples, and (2) that for every language $L$ in the class, there exists a set $T_L$ of labeled examples of size polynomial in a measure of size of $L$ such that on any set of labeled examples containing $T_L$, the algorithm **A** outputs a hypothesis correct for $L$. Condition (1) ensures polynomial time, while condition (2) ensures polynomial data. The latter is not a worst-case measure; there could be arbitrarily large finite samples for which **A** outputs an incorrect hypothesis. However, de la Higuera shows that identifiability in the limit with polynomial time and data is closely related to a model of a learner and a helpful teacher introduced by [17].

## **3 Negative Results**

We start with negative results. We show that when the representation at hand is non-deterministic, polynomial identification is not feasible.

**Theorem 3.** The class NBA cannot be identified in the limit using polynomial data.

Proof. The proof follows the idea given in the negative result for learning in the limit NFAs from polynomial data [19]. For any integer $M \ge 2$, let $p_1, \ldots, p_m$ be the set of all primes less than or equal to $M$. For each such $M$, consider the NBA $B_M$ over a two-letter alphabet $\Sigma = \{a, b\}$ with $p_1 + p_2 + \cdots + p_m + 2$ states, where state $0$ has $a$-transitions to state $(p, 1)$ for each $p \in \{p_1, p_2, \ldots, p_m\}$. State $(p, i)$ has an $a$-transition to state $(p, i \oplus_p 1)$, where $\oplus_p$ is addition modulo $p$. All states except the states $(p, 0)$ have a $b$-transition to state $b$. The state $b$ has a self-loop on $b$. The only accepting state is $b$. The NBA $B_M$ for $M = 5$ is given in Fig. 1.

Fig. 1: The NBA $B_M$ for $M = 5$.

The NBA $B_M$ accepts the set of all words of the form $a^k b^\omega$ such that $k$ is not a positive multiple of $\ell = p_1 \cdot p_2 \cdots p_m$. Note that the size of the shortest ultimately periodic word in $a^* b^\omega \setminus [\![B_M]\!]$ is $\ell + 1$, and thus, to distinguish the language $[\![B_M]\!]$ from the language $a^* b^\omega$, a word of at least this size must be provided. Since the number of primes not greater than $M$ is $\Theta(M/\log M)$ and since each prime is of size at least 2, the data must be of size at least $2^{\Theta(M/\log M)}$, while the number of states of $B_M$ is $O(M^2)$.
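To make the gap between automaton size and data size concrete, the following Python sketch computes the number of states of $B_M$ and the product of the primes, and decides acceptance of $a^k b^\omega$ directly from the transition structure described in the proof. The function names are illustrative assumptions, not part of the paper.

```python
from math import prod


def primes_up_to(m: int) -> list[int]:
    """All primes p with 2 <= p <= m (trial division; fine for small m)."""
    return [p for p in range(2, m + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def bm_stats(m: int) -> dict[str, int]:
    """Number of states of B_M and the product l of the primes up to M."""
    ps = primes_up_to(m)
    return {"states": sum(ps) + 2, "ell": prod(ps)}


def bm_accepts(m: int, k: int) -> bool:
    """Whether B_M accepts a^k b^omega."""
    if k == 0:
        return True  # state 0 has a b-transition to the accepting state b
    # After k >= 1 letters a, the branch for prime p is in state (p, k mod p);
    # only the states (p, 0) lack a b-transition to state b.
    return any(k % p != 0 for p in primes_up_to(m))
```

For $M = 5$ this gives 12 states, while the shortest rejected word in $a^* b^\omega$ is $a^{30} b^\omega$.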

Since NBAs are a special case of non-deterministic parity automata (NPA) and non-deterministic Muller automata (NMA) it follows that these models too cannot be identified in the limit using polynomial data. Note that indeed the NBA in the proof of Theorem 3 can be regarded as an NPA by setting the color of state b to 1 and the color of all other states to 0. Likewise it can be regarded as an NMA by defining the accepting set as {{b}}.

**Corollary 1.** The classes NPA and NMA cannot be identified in the limit using polynomial data.

While NBAs are not a special case of non-deterministic coBüchi automata (NCA), it can be shown that NCAs too cannot be identified in the limit from polynomial data. This is in some sense surprising, since NCAs are not more expressive than DCAs, their deterministic counterpart, and accept a very small subclass of the regular ω-languages.

**Theorem 4.** The class NCA cannot be identified in the limit using polynomial data.

Proof. The proof is almost identical to that of Theorem 3. The only difference is that it considers the automaton $C_M$ that takes exactly the same form as $B_M$ from that proof but with accepting and non-accepting states switched. Since $C_M$ clearly accepts the same language as $B_M$, with the same number of states, the rest of the proof proceeds in exactly the same way.

## **4 Outline for the positive results**

The rest of the paper is devoted to the positive results. To show that a class is identified in the limit using polynomial time and data there are two steps: (i) constructing a sample of words $T_L$, the so-called characteristic sample, of size polynomial in the given acceptor $M$ for the language $L$ at hand, and (ii) providing a learning algorithm that for every given sample $T$ returns an acceptor consistent with that sample, and in addition for any sample $T$ that subsumes $T_L$ returns an acceptor that exactly recognizes $L$.

Since the construction of the characteristic sample is simpler we start with that. We show that the classes IB, IC, IP and IM have characteristic samples of size polynomial in the number of states of the acceptor, and that the characteristic sample can be constructed in polynomial time. The definition of an acceptor is composed of two steps: (a) the definition of the automaton and (b) the definition of the acceptance condition. Some words are put in the sample to help retrieve the automaton and some to help retrieve the acceptance condition. We view the characteristic sample as a union of two parts, $T_{\mathrm{Aut}}$ (for retrieving the automaton) and $T_{\mathrm{Acc}}$ (for retrieving the acceptance condition). The learning algorithm first constructs the automaton, then retrieves the acceptance condition.

In Section 5 we discuss the construction of $T_{\mathrm{Aut}}$, which is common to all the classes we consider, as they all are isomorphic to the automaton of the right congruence. In Section 6 we show how an algorithm can retrieve the automaton using the labeled words in $T_{\mathrm{Aut}}$. In Section 7 we discuss the construction of $T_{\mathrm{Acc}}$, which concerns the acceptance condition of the DPA. This part is the most involved one. We first associate with a DPA a canonical forest of its strongly connected components. From this canonical forest we build the $T_{\mathrm{Acc}}$ part of the characteristic sample. In Section 8 we show a learning algorithm that can retrieve in polynomial time the acceptance condition of the DPA from labeled examples in $T_{\mathrm{Acc}}$. This implies that IP (as well as its special cases IB and IC) can be learned in the limit from polynomial time and data. In Section 9 we show that the class IM can also be learned in the limit from polynomial time and data.

## **5 The characteristic sample for the automaton**

In this section we show how to construct the $T_{\mathrm{Aut}}$ part of the sample. We first show that any two states that are distinguishable in the automaton are distinguishable by words of length polynomial in the number of states.

#### **5.1 Polynomial construction of short distinguishing words**

Let $M$ be an acceptor in one of the classes IB, IC, IP or IM with states $Q$ over alphabet $\Sigma$. If $M$ is in one of the first three classes we use $\max\{|\Sigma|, |Q|\}$ for its size measure. If $M \in$ IM we use $\max\{|\Sigma|, |Q|, m\}$ for its size measure, where $m$ is the number of sets in the acceptance condition $\alpha$. We say that states $q_1$ and $q_2$ of $M$ are distinguishable if there exists a word $z \in \Sigma^\omega$ that is accepted from one but not the other (and that $z$ is a distinguishing word). First we show that any two distinguishable states of $M$ are distinguishable by an ultimately periodic word of size polynomial in $M$. Then we show how to use these words to construct the $T_{\mathrm{Aut}}$ part of the characteristic sample.

**Proposition 5.** If two states of a DMA, DPA, DBA or DCA of n states are distinguishable, then they are distinguishable by an ultimately periodic ω-word of length bounded by n<sup>2</sup> + n<sup>4</sup>.

Proof. We prove that for a DMA M of n states, if two distinct states q<sup>1</sup> and q<sup>2</sup> are distinguishable, then they are distinguishable by an ultimately periodic ω-word of length bounded by n<sup>2</sup>+n<sup>4</sup>. Since any DPA, DBA or DCA is equivalent to an isomorphic DMA, the above result holds also for DPAs, DBAs and DCAs.

Because $q_1$ and $q_2$ are distinguishable, there exists an ultimately periodic ω-word $x(y)^\omega$ that is accepted from exactly one of the two states. For each nonnegative integer $k$ and $i = 1, 2$, let $q_i(k)$ be the state visited after $k$ symbols of $x(y)^\omega$ have been read, starting with state $q_i$. Also, let $C_i$ be the set of states visited infinitely often by the sequence $q_i(k)$, which determines the acceptance or rejection of $x(y)^\omega$ from $q_i$. The sequence of pairs $(q_1(k), q_2(k))$ for $k = 0, 1, \ldots$ takes on at most $n^2$ different values. Let $C$ be the set of pairs visited infinitely often by this sequence. The two projections $\pi_1(C)$ and $\pi_2(C)$ are $C_1$ and $C_2$.

Let $\ell$ be the minimum value for which $(q_1(k), q_2(k))$ visits only pairs in $C$ for all $k \ge \ell$. Let $x'$ be the prefix of $x(y)^\omega$ consisting of $\ell$ symbols. By removing symbols between repeated pairs $(q_1(k), q_2(k))$ from $x'$ we obtain a string $u$ of length at most $n^2$ that reaches the pair $(q_1(\ell), q_2(\ell))$ from $(q_1(0), q_2(0))$. Let $m$ be the minimum value for which $(q_1(k), q_2(k))$ for $\ell \le k \le m$ visits all the pairs of $C$ and returns to $(q_1(\ell), q_2(\ell))$, and let $y'$ be the string from symbol $\ell$ to $m - 1$ of $x(y)^\omega$. Distinguishing a subsequence of pairs that visits each element of $C$ once, we can remove from $y'$ sequences of symbols between repeated pairs that do not include a distinguished pair between them. Thus we obtain a string $v$ of length at most $|C| n^2$ that starts at $(q_1(\ell), q_2(\ell))$, visits all the distinguished pairs and returns to the starting pair. Since $|C| \le n^2$, the length of $u(v)^\omega$ is at most $n^2 + n^4$. Also, since the set of states visited infinitely often on input $u(v)^\omega$ from $q_i$ is $C_i$, we have that $u(v)^\omega$ is accepted from exactly one of $q_1$ and $q_2$.

For DPAs as well as DMAs there is a polynomial time algorithm to determine whether two states are distinguishable and to find a distinguishing ω-word u(v)<sup>ω</sup> if they are. This result relies on a polynomial time algorithm to test the equivalence of two DPAs or two DMAs and return an example u(v)<sup>ω</sup> on which they differ if not [9]. Since DBA and DCA are special cases of a DPA, a polynomial construction of a distinguishing word applies to them as well.

#### **5.2 Constructing the characteristic sample for the automaton**

We now show how to construct the $T_{\mathrm{Aut}}$ part of the characteristic sample, given an acceptor $M$ in one of the classes IM, IP, IB or IC. Let $n$ be the number of states of $M$. We may assume that every state of $M$ is reachable from the initial state $q_\iota$. The algorithm constructs a set $S$ of $n$ access strings by breadth-first search in the transition graph of $M$ such that $S$ is prefix-closed and contains exactly one lexicographically least string of shortest possible length reaching each state of $M$ from the initial state. Using Proposition 5, the algorithm may also construct a set $E$ of at most $n^2$ distinguishing experiments that contains, for each pair $q_1$ and $q_2$ of distinct states of $M$, an ω-word $u(v)^\omega$ of length at most $n^2 + n^4$ that is accepted from exactly one of the states $q_1$ and $q_2$.

Part one of the sample, $T_{\mathrm{Aut}}$, consists of all the examples in $(S \cdot E) \cup (S \cdot \Sigma \cdot E)$, labeled to be consistent with $M$. There are at most $(1 + |\Sigma|) n^3$ labeled examples in $T_{\mathrm{Aut}}$, each of length bounded by a polynomial in $n$. This information is enough to allow the polynomial time learning algorithm to reconstruct a transition graph isomorphic to that of $M$.
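The construction of the access strings and of the unlabeled examples can be sketched in Python as follows. This is a minimal illustration under assumptions of our own: the transition function is a total dictionary `delta[(state, symbol)]`, symbols are single characters, and the experiments `E` are supplied as pairs `(u, v)`; labeling each example would additionally require running the acceptor.

```python
from collections import deque


def access_strings(delta, q0, alphabet):
    """Breadth-first search returning {state: shortest, lexicographically least access string}."""
    access = {q0: ""}
    queue = deque([q0])
    while queue:
        q = queue.popleft()
        for a in sorted(alphabet):       # fixed symbol order gives the least string
            r = delta[(q, a)]
            if r not in access:
                access[r] = access[q] + a
                queue.append(r)
    return access


def t_aut_examples(access, alphabet, experiments):
    """Unlabeled examples of (S . E) united with (S . Sigma . E), as pairs (u, v)."""
    S = sorted(access.values(), key=lambda s: (len(s), s))
    prefixes = S + [s + a for s in S for a in sorted(alphabet)]
    return {(p + u, v) for p in prefixes for (u, v) in experiments}
```

Because every prefix of a breadth-first access string is itself the access string of an earlier state, the resulting set is prefix-closed.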

**Proposition 6.** Let $M'$ be any deterministic automaton that is consistent with the sample $T_{\mathrm{Aut}}$. Then $M'$ has at least $n$ states, and if $M'$ has exactly $n$ states then $M$ and $M'$ have isomorphic transition graphs.

Proof. The states of $M'$ reached from the initial state by the access strings in $S$ must all be distinct, because for any pair of different strings $s_1, s_2 \in S$, there exists a word $u(v)^\omega \in E$ such that $s_1 \cdot u(v)^\omega$ and $s_2 \cdot u(v)^\omega$ have different labels in $T_{\mathrm{Aut}}$. Thus $M'$ must have at least $n$ distinct states.

Assume that $M'$ has exactly $n$ states. Given the state $q$ of $M'$ reached by some $s \in S$ and a symbol $\sigma \in \Sigma$, the labeled examples $s \cdot \sigma \cdot u(v)^\omega$ in $T_{\mathrm{Aut}}$ for all $u(v)^\omega \in E$ uniquely determine which string $s' \in S$ corresponds to the state reached in $M'$ from $q$ on input symbol $\sigma$. Thus the transition graph of $M'$ is isomorphic to the transition graph of $M$.

## **6 Learning the automaton**

Let $L$ denote the language to be learned, and $M$ denote an acceptor of $n$ states that is isomorphic to its right congruence automaton and recognizes $L$. Let the input sample of labeled examples be $T$. We now describe a learning algorithm **A** that makes use of the information in the given sample $T$ to construct an automaton. If $T$ subsumes $T_{\mathrm{Aut}}$ the returned automaton will be isomorphic to the acceptor $M$.

From the sample $T$, the algorithm constructs a set $E$ of strings that serve as experiments used to distinguish states, as follows. For each labeled example $(u(v)^\omega, l)$ in $T$, all of the elements of suffixes($u(v)^\omega$) are placed in $E$. Thus if the sample $T$ includes $T_{\mathrm{Aut}}$, then for any pair of states of $M$ the set $E$ includes an experiment that distinguishes them.

Starting with the empty string $\varepsilon$, the algorithm attempts to build up a prefix-closed set $S$ of finite strings that reach different states of $M$ from the initial state. Initially, $S_1 = \{\varepsilon\}$. After $S_k$ has been constructed, the algorithm attempts to determine, for each $s \in S_k$ and each symbol $\sigma \in \Sigma$ in the ordering defined on $\Sigma$, whether $s \cdot \sigma$ reaches the same state as some string already in $S_k$ or a new state. If for each string $s'$ in $S_k$ there exists some $u(v)^\omega \in E$ such that the sample $T$ has different labels for $s \cdot \sigma \cdot u(v)^\omega$ and $s' \cdot u(v)^\omega$, then this is evidence that $s \cdot \sigma$ reaches a new state, and $S_{k+1}$ is set to $S_k \cup \{s \cdot \sigma\}$. If no such pair $s$ and $\sigma$ is found, then the final set $S$ is $S_k$. Because $M$ has only $n$ states, this case is reached with $k \le n$. If the sample $T$ subsumes $T_{\mathrm{Aut}}$ then this process will discover exactly the strings reaching all $n$ states of $M$ used in the construction of $T_{\mathrm{Aut}}$; otherwise, it may terminate early.

In the second phase, the algorithm uses the strings in $S$ as names for states and constructs a transition function $\delta'$ using $S$ and $E$. For each $s \in S$ and $\sigma \in \Sigma$, we know that there is at least one $s' \in S$ such that there is no $u(v)^\omega \in E$ for which $s \cdot \sigma \cdot u(v)^\omega$ and $s' \cdot u(v)^\omega$ have different labels in $T$ (possibly because one or more of these examples are not in $T$ at all). The algorithm selects one such $s'$ and defines $\delta'(s, \sigma) = s'$. If the strings in $S$ actually reach all the states of $M$ and the choice of $s'$ is unique in each case, then $\delta'$ will be isomorphic to the transition function of $M$. This will be the case if the sample $T$ includes $T_{\mathrm{Aut}}$, because then among the elements of $E$ will be experiments that distinguish any pair of states of $M$; otherwise, $\delta'$ may not be correct.
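The two phases can be sketched in Python as follows. This is an illustration under assumptions of our own rather than the authors' implementation: the sample is a dictionary from pairs `(u, v)` to labels, symbols are single characters, and the helper `normalize` (not from the paper) maps every representation of an ω-word to one canonical pair so that labels can be looked up. A missing label counts as "not distinguished", as in the text.

```python
def normalize(u, v):
    """Canonical pair for u(v)^omega: primitive period, shortest possible u."""
    n = len(v)
    for d in range(1, n + 1):            # reduce v to its primitive root
        if n % d == 0 and v[:d] * (n // d) == v:
            v = v[:d]
            break
    while u and u[-1] == v[-1]:          # fold the tail of u into the period
        u, v = u[:-1], v[-1] + v[:-1]
    return u, v


def learn_automaton(sample, alphabet):
    """Return (S, delta): access strings used as state names, and transitions."""
    T = {normalize(u, v): l for (u, v), l in sample.items()}
    E = set()                            # experiments: all suffixes of all examples
    for (u, v) in sample:
        E |= {normalize(u[i:], v) for i in range(len(u) + 1)}
        E |= {normalize("", v[j:] + v[:j]) for j in range(len(v))}

    def label(prefix, e):
        return T.get(normalize(prefix + e[0], e[1]))   # None if not in the sample

    def distinguished(s1, s2):
        return any(label(s1, e) is not None and label(s2, e) is not None
                   and label(s1, e) != label(s2, e) for e in E)

    S, grew = [""], True
    while grew:                          # phase 1: collect provably new states
        grew = False
        for s in S:
            for a in sorted(alphabet):
                if all(distinguished(s + a, t) for t in S):
                    S.append(s + a)
                    grew = True
                    break
            if grew:
                break
    # Phase 2: send s.a to the first state name it is not distinguished from.
    delta = {(s, a): next(t for t in S if not distinguished(s + a, t))
             for s in S for a in sorted(alphabet)}
    return S, delta
```

The `next(...)` call in the second phase always finds a candidate, because the first phase stops only when every extension `s + a` fails to be distinguished from at least one string already in `S`.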

## **7 Characteristic sample for a DPA**

The construction of TAcc, the part of the characteristic sample used for retrieving the accepting condition of a DPA, builds on the construction of a forest of SCCs associated with a given DPA, which we term the canonical forest. Its properties and its construction are described next.

### **7.1 Constructing the canonical forest of a DPA**

We start with some definitions and simple claims. Let $P = (\Sigma, Q, q_\iota, \delta, \kappa)$ be a deterministic parity acceptor (DPA). A set of states $C \subseteq Q$ is a strongly connected component (SCC) if and only if $C$ is nonempty and for every $q_1, q_2 \in C$, there exists a nonempty string $v \in \Sigma^+$ such that $\delta(q_1, v) = q_2$ and for every prefix $u$ of $v$, $\delta(q_1, u) \in C$. Note that an SCC need not be maximal, and that a singleton $\{q\}$ is an SCC if and only if the state $q$ has a self-loop, that is, $\delta(q, \sigma) = q$ for some $\sigma \in \Sigma$. For any ω-word $w$, the set $C$ of states visited infinitely often in the run of $P$ on input $w$ is an SCC of $P$.

**Claim 7.** If $C_1$ and $C_2$ are SCCs of $P$ and $C_1 \cap C_2 \neq \emptyset$, then $C_1 \cup C_2$ is also an SCC of $P$.

If P is a DPA and R ⊆ Q is any set of states, define SCCs(R) to be the set of all C such that C ⊆ R and C is an SCC of P. Also define maxSCCs(R) to be the maximal elements of SCCs(R) with respect to the subset ordering.

**Claim 8.** If P is a DPA and R ⊆ Q is any set of states, then the elements of maxSCCs(R) are pairwise disjoint, and every set C ∈ SCCs(R) is a subset of exactly one element of maxSCCs(R).

If P is a DPA, we extend its coloring function κ to any nonempty set R of states by κ(R) = min{κ(q) | q ∈ R}. We define the parity of R to be 1 if κ(R) is odd, and 0 otherwise. For an ω-word w, if the SCC C is the set of states visited infinitely often in the run of P on w, then w is accepted by P iff the parity of C is 1. Note that the union of two sets of parity b is also of parity b. For any set of states R ⊆ Q, we define minStates(R) to be the set of states q ∈ R such that κ(q) = κ(R), that is, the states of R that are assigned the minimum color among all states of R.
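The acceptance condition can be evaluated on an ultimately periodic word by computing the set of states visited infinitely often. The following Python sketch does this under assumptions of our own: `delta` is a total dictionary `delta[(state, symbol)]`, `kappa` is a dictionary from states to colors, and the word is given as a pair `(u, v)` of strings with `v` nonempty.

```python
def inf_states(delta, q0, u, v):
    """States visited infinitely often in the run on u(v)^omega from q0."""
    q = q0
    for a in u:
        q = delta[(q, a)]
    seen = {}       # state at the start of a copy of v -> index of that copy
    visited = []    # visited[i] = states entered while reading the i-th copy of v
    while q not in seen:
        seen[q] = len(visited)
        block = set()
        for a in v:
            q = delta[(q, a)]
            block.add(q)
        visited.append(block)
    # The run now repeats the copies of v from index seen[q] onwards forever.
    return frozenset().union(*visited[seen[q]:])


def dpa_accepts(delta, kappa, q0, u, v):
    """Accept iff the minimum color seen infinitely often is odd (parity 1)."""
    return min(kappa[q] for q in inf_states(delta, q0, u, v)) % 2 == 1
```

The loop reads at most one copy of `v` per state, so the running time is polynomial in the number of states and the length of the example.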

**The Canonical Forest** Using these definitions we can show that there exists a forest associated with a DPA that has the following interesting properties. We provide an example for a canonical forest for a given DPA at the end of the current subsection.

**Theorem 9.** Let P = (Σ, Q, q0, δ, κ) be a DPA. There exists a canonical forest F∗(P) that is unique up to isomorphism and has the following properties.


Proof. The root nodes of F∗(P) are the elements of maxSCCs(Q) and are SCCs that are pairwise disjoint, by Claim 8. Let C be one of them, and assume its parity is b. Let T be the set of SCCs that are subsets of C and of parity 1 − b. If T = ∅ then C has no children and is a leaf of F∗(P). Otherwise, the children of C are the maximal elements of T with respect to the subset ordering. The children of C must be pairwise disjoint because if they share a state, then their union is an SCC contained in C of parity 1 − b and is a proper superset of at least one of them, violating maximality. No child of C can contain an element of minStates(C) because otherwise the parity of the child would be b. Thus the union of the children of C must be a proper subset of C. These conditions imply that there are at most |Q| nodes in the forest, and that it is unique up to isomorphism.

Let $D$ be any SCC of $P$. Then $D \in$ SCCs($Q$), so by Claim 8, because the roots of $F^*(P)$ are the elements of maxSCCs($Q$), there is a unique root node $C_0$ such that $D \subseteq C_0$. Suppose the parity of $C_0$ is $b$. If $D$ is not a subset of any of the children of $C_0$, then it cannot have parity $1 - b$, so the choice $C = C_0$ satisfies the required condition. If, however, $D$ is a subset of some child $C_1$ of $C_0$, then because the children of $C_0$ are pairwise disjoint, $C_1$ is the only child of $C_0$ that contains $D$. Again, if $D$ is not a subset of any of the children of $C_1$ then $D$ and $C_1$ must have the same parity, and the choice $C = C_1$ satisfies the condition. Otherwise, we continue down the tree rooted at $C_0$ until a node $C$ is found that satisfies the condition. Note that if we arrive at a leaf $C_k$, then $D$ is not a subset of any of the children of $C_k$ (there are none) and $D$ must have the same parity as $C_k$ because otherwise $C_k$ would have at least one child.

**The Canonical Coloring** The canonical forest F∗(P) allows us to define a canonical coloring κ<sup>∗</sup> for P, as follows. The states in (Q \ maxSCCs(Q)) are not contained in any SCC of P and do not affect the acceptance or rejection of any ω-word. For definiteness, we assign them κ∗(q) = 0. For each node C of F∗(P), we define Δ(C) to be the set of states of C that are not contained in the union of the children of C. For a root node C of parity b, we define κ∗(q) = b for all q ∈ Δ(C). Let C be an arbitrary node of F∗(P). If the states of Δ(C) have been assigned color k by κ<sup>∗</sup> and D is a child of C, then the states of Δ(D) are assigned color k + 1 by κ∗. We observe that if q<sup>1</sup> ∈ Δ(C) and q<sup>2</sup> is in a child of C, then κ∗(q1) < κ∗(q2), and κ∗(q1) is of the same parity as C.

**Theorem 10.** Let $P = (\Sigma, Q, q_0, \delta, \kappa)$ be a DPA, and let $P'$ be $P$ with the canonical coloring $\kappa^*$ for $P$ in place of $\kappa$. Then $P$ and $P'$ recognize the same ω-language.

Proof. Let $w$ be an ω-word and let $D$ be the SCC consisting of the states visited infinitely often in the run of $P$ (and also of $P'$) on input $w$. Let $C$ be the unique node of $F^*(P)$ such that $D$ is a subset of $C$ and is not a subset of any of the children of $C$. Thus $D$ contains at least one $q \in \Delta(C)$. In $P$ the parity of $D$ is the same as the parity of $C$, which is the same as the parity of $\kappa^*(q)$, which is equal to the parity of $D$ in $P'$. Thus either both $P$ and $P'$ accept $w$ or both reject $w$.
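Given a canonical forest, the canonical coloring can be assigned by a simple top-down traversal. The Python sketch below assumes a representation of our own choosing: a node is a triple `(states, parity, children)` and the forest is a list of root nodes.

```python
def canonical_coloring(all_states, forest):
    """Assign the canonical colors; forest is a list of (states, parity, children)."""
    color = {q: 0 for q in all_states}    # states outside every SCC get color 0
    stack = [(node, node[1]) for node in forest]   # a root of parity b starts at b
    while stack:
        (C, _, children), k = stack.pop()
        covered = set().union(*(D for D, _, _ in children))
        for q in set(C) - covered:        # Delta(C): states of C in no child of C
            color[q] = k
        stack.extend((child, k + 1) for child in children)
    return color
```

Since parent and child have opposite parities in the canonical forest, increasing the color by one at each level keeps the parity of each color equal to the parity of the node it belongs to.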

**Computing the Canonical Forest** We now show that, given a DPA P = (Σ, Q, q0, δ, κ), we can compute the canonical forest of P in polynomial time. We first define a (possibly non-canonical) forest Fκ(P) using the given coloring κ. The root nodes are the elements of maxSCCs(Q), the set of all maximal SCCs of P. Once we have defined a node C of the forest, the children are the elements of the set maxSCCs(C \ minStates(C)), that is, the maximal SCCs contained in C with the set of states of minimum color removed. If this set is empty, the node has no children and is a leaf. Note that in contrast to the case of the canonical forest, in Fκ(P) the children of a node are not constrained to be of parity opposite to that of the parent.

By construction each node in the forest $F_\kappa(P)$ is an SCC of $P$. If $D$ is a descendant of $C$ in the forest, then $D$ is a proper subset of $C$, and $\kappa(C) < \kappa(D)$. Because the roots are pairwise disjoint and the children of any node are pairwise disjoint, the sets minStates($C$) for nodes $C$ in the forest are pairwise disjoint and nonempty, so there are at most $|Q|$ nodes. Because a linear time algorithm for computing strongly connected components can be used to compute the children of a node, the forest $F_\kappa(P)$ may be computed in polynomial time in the size of the given DPA $P$.

Fig. 2: (a) Transition graph of DPA $P$ with states colored by $\kappa$. (b) Non-canonical forest $F_\kappa(P)$, with parities of nodes. (c) Canonical forest $F^*(P)$, with parities of nodes. (d) Transition graph of $P$ with the canonical coloring $\kappa^*$.

To obtain the canonical forest F∗(P) from the possibly non-canonical forest Fκ(P), we may repeatedly merge pairs of adjacent nodes of the same parity until every pair of adjacent nodes are of different parity. That is, if C is a node of parity b and D is a child of C of parity b, then D ⊆ C, and we merge D into C by deleting D and making all the children of D direct children of C. Repeating this operation until there are no parent/child pairs of equal parity yields the canonical forest F∗(P). This computation can be done in polynomial time.
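The computation can be sketched in Python as follows. This is an illustration under assumptions of our own, not the authors' implementation: `delta` is a total dictionary `delta[(state, symbol)]`, `kappa` maps states to colors, maximal SCCs are found by plain reachability rather than a linear-time algorithm, and a node is a triple `(states, parity, children)`. Merging is done on the fly: a same-parity child is replaced by its own opposite-parity descendants.

```python
def max_sccs(delta, alphabet, R):
    """Maximal SCCs inside R (a singleton counts only if it has a self-loop)."""
    R = set(R)
    succ = {q: {delta[(q, a)] for a in alphabet} & R for q in R}

    def reach(q):   # states reachable from q by a nonempty path inside R
        seen, todo = set(), list(succ[q])
        while todo:
            r = todo.pop()
            if r not in seen:
                seen.add(r)
                todo.extend(succ[r])
        return seen

    reach_map = {q: reach(q) for q in R}
    return {frozenset(r for r in reach_map[q] if q in reach_map[r])
            for q in R if q in reach_map[q]}


def canonical_forest(delta, alphabet, kappa, states):
    """Roots of the canonical forest as nested (states, parity, children) triples."""
    def parity(C):
        return min(kappa[q] for q in C) % 2

    def children(C, b):
        k = min(kappa[q] for q in C)
        rest = {q for q in C if kappa[q] != k}        # C minus minStates(C)
        out = []
        for D in max_sccs(delta, alphabet, rest):
            if parity(D) != b:
                out.append(node(D))                    # opposite parity: a child
            else:
                out.extend(children(D, b))             # same parity: merge into C
        return out

    def node(C):
        b = parity(C)
        return (C, b, children(C, b))

    return [node(C) for C in max_sccs(delta, alphabet, states)]
```

The order of siblings in the returned lists is arbitrary, which is harmless since the forest is only defined up to isomorphism.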

Note that to obtain a canonical forest for a given DBA (resp., DCA) we can simply first color states in F by 1 (resp. 0) and in Q\F by 2 (resp., 1) and then compute the canonical forest for the resulting DPA. In both cases the canonical forest will be of depth at most two, since in DBA an accepting SCC cannot be subsumed by a rejecting SCC (and vice versa in DCA).

**An Example** Figure 2(a) shows the transition graph of an example DPA $P$ with states a through m, labeled by the colors assigned by $\kappa$. There is a directed edge from state $q_1$ to state $q_2$ if there exists a symbol $\sigma \in \Sigma$ such that $\delta(q_1, \sigma) = q_2$. Figure 2(b) shows the non-canonical SCC forest $F_\kappa(P)$ of $P$, with the nodes labeled by their parities. Figure 2(c) shows the canonical SCC forest $F^*(P)$ of $P$, with the nodes labeled by their parities. Figure 2(d) shows the transition graph of $P$ re-colored using the canonical coloring $\kappa^*$.

#### **7.2 Constructing the characteristic sample for a DPA**

We can now construct $T_{\mathrm{Acc}}$, the second part of the characteristic sample for a DPA $P$. The sample $T_{\mathrm{Acc}}$ consists of one example $u(v)^\omega$ for each node $C$ of the canonical forest $F^*(P)$, where $u$ is a string that reaches a state $q$ in $C$ from the initial state $q_0$, and $v$ is a nonempty string that, starting from $q$, visits every state of $C$ and no state outside of $C$ and returns to $q$. The length of the example $u(v)^\omega$ can be taken to be bounded by $n + n^2$. The example $u(v)^\omega$ is labeled 1 if it is accepted by $P$ and otherwise is labeled 0. Then $T_{\mathrm{Acc}}$ contains at most $n$ labeled examples, each of length polynomial in $n$. The final characteristic sample for $L = [\![P]\!]$ is $T_L = T_{\mathrm{Aut}} \cup T_{\mathrm{Acc}}$. The sample $T_L$ contains $O(|\Sigma| n^3)$ labeled examples, each of length at most $O(n^4)$, which is polynomial in size($L$).

## **8 The learning algorithm for a DPA**

We can now describe the learning algorithm **A** that makes use of the information in TL. Similar to Gold's construction, the algorithm optimistically assumes that the sample includes a characteristic sample, and if that assumption fails to produce an acceptor consistent with the sample, the algorithm defaults to producing a table-lookup acceptor to ensure that its hypothesis is consistent with the sample. The algorithm we describe is sufficient to establish the theoretical results, but for practical applications much more effort should be expended to find good heuristic choices to avoid defaulting too easily.

Let $L$ denote the language to be learned, and $P$ denote a DPA of $n$ states that is isomorphic to its right congruence automaton and recognizes $L$. The first and second phases of the algorithm are as described in Section 6: in the first phase the algorithm builds the set $S$ of states of the automaton, and in the second phase it builds the transition relation $\delta'$. In the third phase, the acceptance condition (namely the coloring) is determined. In this phase, the algorithm may default to returning the table-lookup DPA for $T$. We first explain the construction of the table-lookup DPA, then describe the third phase.

**A table-lookup DPA** A table-lookup DPA for a given sample $T$ is constructed by finding the shortest prefix of each example $u(v)^\omega$ in $T$ that distinguishes it from all other examples in $T$ and placing these prefixes in a trie-like structure. At each leaf of the trie is a structure accepting (or rejecting, depending on the label of the example) the appropriate suffix of the unique example that arrives at that leaf. By Claim 1, this DPA can be constructed in time polynomial in the length of the sample $T$. Note that this construction is easily modified to give a DBA, DCA or DMA instead of a DPA. As an example, for the sample $T = \{(a(b)^\omega, 1), ((ab)^\omega, 1), (ab(baa)^\omega, 0)\}$, the corresponding prefixes are abbb, aba, and abba, and the table-lookup DPA for $T$ is shown in Figure 3, with states labeled by colors 0 and 1.

Fig. 3: Table-lookup DPA for $T = \{(a(b)^\omega, 1), ((ab)^\omega, 1), (ab(baa)^\omega, 0)\}$.
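The first step of the table-lookup construction, finding the shortest distinguishing prefix of each example, can be sketched in Python as follows. The function names are ours, and the sketch assumes that the examples, given as pairs `(u, v)`, denote pairwise distinct ω-words; by Claim 1 the search then stops after polynomially many symbols.

```python
def prefix(u, v, n):
    """The first n symbols of u(v)^omega."""
    return "".join(u[i] if i < len(u) else v[(i - len(u)) % len(v)]
                   for i in range(n))


def distinguishing_prefixes(examples):
    """Shortest prefix of each example that no other example shares."""
    result = []
    for i, (u, v) in enumerate(examples):
        n = 0
        while any(prefix(u, v, n) == prefix(u2, v2, n)
                  for j, (u2, v2) in enumerate(examples) if j != i):
            n += 1      # some other example still agrees on the first n symbols
        result.append(prefix(u, v, n))
    return result
```

On the sample from the text, `distinguishing_prefixes([("a", "b"), ("", "ab"), ("ab", "baa")])` yields the prefixes `abbb`, `aba` and `abba`.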

**Determining the coloring** In the third phase, the algorithm attempts to define a coloring of the states in $S$. The algorithm constructs the set $Z$ of all subsets $C$ of $S$ such that for some labeled example $(u(v)^\omega, l)$ in $T$, the subset $C$ is the set of elements of $S$ that are visited infinitely often in the run on input $u(v)^\omega$ starting at $\varepsilon$ using the transition function $\delta'$. If in this process two examples with different labels are found to yield the same set $C$, the learning algorithm defaults to the table-lookup DPA for $T$. Otherwise, each set $C$ in $Z$ is associated with the label of the example(s) that yield $C$. The set $Z$ is partially ordered by the subset relation. The learning algorithm then attempts to construct a forest $F'$ with nodes that are elements of $Z$, corresponding to the canonical forest of $P$. Initially, $F'$ contains as roots all the maximal elements of $Z$. If these are not pairwise disjoint, it defaults to the table-lookup DPA for $T$. Otherwise, for each unprocessed element $C$ in $F'$, it computes the set of all $D \in Z$ such that $D \subseteq C$, $D$ has the opposite label to $C$, and $D$ is maximal with these properties, and makes each such $D$ a child of $C$. When all the children of a node $C$ have been determined, the algorithm checks two conditions: (1) that the children of $C$ are pairwise disjoint, and (2) that there is at least one $s \in C$ that is not in any child of $C$. If either of these conditions fails, then it defaults to the table-lookup DPA for $T$. If both conditions are satisfied, then the node $C$ is marked as processed. When there are no more unprocessed nodes, the construction of $F'$ is complete. Note that $F'$ can have at most $n$ nodes, because $S$ has at most $n$ elements.

When the construction of $F'$ completes, for each node $C$ in $F'$ let $\Delta(C)$ denote the elements of $C$ that do not appear in any of its children. Then the learning algorithm assigns colors to the elements of $S$ starting from the roots of $F'$, as follows. If $C$ is a root with label $l$, then $\kappa'(s) = l$ for all $s \in \Delta(C)$. If the elements of $\Delta(C)$ have been assigned color $k$ and $D$ is a child of $C$, then $\kappa'(s) = k + 1$ for all $s \in \Delta(D)$. When this process is complete, any uncolored strings $s$ are assigned $\kappa'(s) = 0$. If the resulting DPA $P'$ is consistent with the sample $T$, the learning algorithm outputs $P'$ and halts. If the sample $T$ includes both $T_{\mathrm{Aut}}$ (to specify the automaton) and $T_{\mathrm{Acc}}$ (to specify the coloring), then $F'$ will be isomorphic to the canonical forest $F^*(P)$ and $\kappa'$ will correspond to the canonical coloring $\kappa^*$, and $P'$ will recognize the target language $L$.

If the process described above does not result in a DPA that is consistent with the sample T, then the algorithm defaults to constructing the table-lookup DPA for T.
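The forest construction and coloring of the third phase can be sketched in Python as follows. This is a simplified illustration under assumptions of our own: `Z` is already given as a dictionary from frozensets of state names to labels (so the check for conflicting labels has been done), and the function returns `None` where the algorithm would default to the table-lookup DPA. The final consistency check against the sample is not included.

```python
def color_from_sets(S, Z):
    """Return a coloring {state name: color}, or None to signal the fallback."""
    def disjoint(sets):
        sets = list(sets)
        return all(not (a & b) for i, a in enumerate(sets) for b in sets[i + 1:])

    def maximal(sets):
        return [c for c in sets if not any(c < d for d in sets)]

    roots = maximal(list(Z))
    if not disjoint(roots):
        return None
    color = {s: 0 for s in S}             # strings in no node keep color 0
    todo = [(C, Z[C]) for C in roots]     # a root with label l gets color l
    while todo:
        C, k = todo.pop()
        # Children: maximal sets strictly inside C carrying the opposite label.
        kids = maximal([D for D in Z if D < C and Z[D] != Z[C]])
        own = C - frozenset().union(*kids)     # Delta(C)
        if not disjoint(kids) or not own:
            return None
        for s in own:
            color[s] = k
        todo.extend((D, k + 1) for D in kids)
    return color
```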

The learning algorithm also works for the classes IB and IC: In the case of IB and IC we need to define a set F rather than a coloring κ. After constructing the forest, the set F is determined to contain the states in the root nodes that are not in the leaves. Thus we have the following.

**Theorem 11.** The classes IB, IC and IP are identifiable in the limit using polynomial time and data. Moreover, characteristic samples can be computed in polynomial time.

A corollary of Theorem 11 is that the class of languages recognized by deterministic weak parity acceptors (DWPA), which was shown to be polynomially learnable using membership and equivalence queries in [24], is identifiable in the limit using polynomial time and data. This class (which is equivalent to the intersection DBA ∩ DCA) was shown to be a subset of IM in [30], and to be a subset of IP in [4].

**Corollary 2.** The class DWPA is identifiable in the limit using polynomial time and data. Moreover, characteristic samples can be computed in polynomial time.

## **9 The sample** *TAcc* **and the learning algorithm for a DMA**

The above results can be extended to the class IM. Recall that we define the size measure for a DMA to be max{|Σ|, |Q|, m}, where m is the number of sets in the acceptance condition. For the characteristic sample TL, TAut remains the same, but TAcc contains, for each accepting set C, an example u(v)<sup>ω</sup> for which C is the set of states visited infinitely often. In the learning algorithm, the construction of the transition function remains the same. Instead of attempting to construct a coloring function, the learning algorithm finds, for each labeled example (u(v)<sup>ω</sup>, 1) ∈ T, the set C of states s that are visited infinitely often on input u(v)<sup>ω</sup> starting from ε and using the transition function δ, and adds C to the acceptance condition. If the construction does not result in a DMA consistent with T, then it defaults to producing a table-lookup DMA for T. Because, in addition, as stated in Section 5.1, a characteristic sample can be computed in polynomial time, we have the following.
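As a minimal sketch of this step, the set of states visited infinitely often on an ultimately periodic word u(v)<sup>ω</sup> can be computed by reading u and then repeating v until the state at the start of a v-iteration recurs. The encoding below is an assumption made for illustration: the transition function is a dictionary keyed by (state, letter), states are integers rather than access strings, v is non-empty, and the names `inf_states` and `dma_acceptance` are hypothetical.

```python
def inf_states(delta, start, u, v):
    """States visited infinitely often by the run on u(v)^omega from start."""
    q = start
    for a in u:
        q = delta[(q, a)]
    first_seen = {}  # state at the start of a v-iteration -> iteration index
    visited = []     # states visited during each v-iteration
    while q not in first_seen:
        first_seen[q] = len(visited)
        during = set()
        for a in v:
            q = delta[(q, a)]
            during.add(q)
        visited.append(during)
    # Everything from the first occurrence of q onwards repeats forever.
    return frozenset().union(*visited[first_seen[q]:])


def dma_acceptance(delta, start, sample):
    """Collect one accepting set per positively labeled example ((u, v), 1)."""
    return {inf_states(delta, start, u, v) for (u, v), label in sample if label == 1}
```

The loop terminates because there are finitely many states, so some state at an iteration boundary must repeat.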

**Theorem 12.** The class IM is identifiable in the limit using polynomial time and data. Moreover, a characteristic sample can be computed in polynomial time.

## **10 Discussion**

We have shown that the non-deterministic classes of ω-automata NBA, NPA, NMA and NCA cannot be identified in the limit using polynomial data. A negative result regarding query learning of the first three classes was recently obtained in [3]. That result makes a plausible assumption of cryptographic hardness, which is not required here. On the positive side, we have shown that the classes IB, IC, IP and IM can be identified in the limit using polynomial time and data, and moreover that a characteristic sample can be constructed in polynomial time. The construction builds on the definition of a canonical forest for a DPA, which may be of use in other contexts as well. The question whether the deterministic classes DBA, DPA, DMA and DCA can be polynomially learned in the limit remains open.

## **References**


34. Wagner, K.W.: A hierarchy of regular sequence sets. In: 4th Symposium on Mathematical Foundations of Computer Science (MFCS). pp. 445–449 (1975)

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## SV-COMP 2020

## Advances in Automatic Software Verification: SV-COMP 2020

Dirk Beyer

**TACAS SV-COMP Artifact 2020 Accepted**

LMU Munich, Germany

Abstract. This report describes the 2020 Competition on Software Verification (SV-COMP), the 9th edition of a series of comparative evaluations of fully automatic software verifiers for C and Java programs. The competition provides a snapshot of the current state of the art in the area, and has a strong focus on replicability of its results. The competition was based on 11 052 verification tasks for C programs and 416 verification tasks for Java programs. Each verification task consisted of a program and a property (reachability, memory safety, overflows, termination). SV-COMP 2020 had 28 participating verification systems from 11 countries.

Keywords: Formal Verification · Program Analysis · Competition

## 1 Introduction

The Competition on Software Verification (SV-COMP) serves as the showcase of the state of the art in the area of automatic software verification. SV-COMP 2020 is the 9th edition of the competition and presents an overview of the currently achieved results by tool implementations that are based on the most recent ideas, concepts, and algorithms for fully automatic verification. This competition report describes the (updated) rules and definitions, presents the competition results, and discusses some interesting facts about the execution of the competition experiments. The competition measures its own success by evaluating whether the objectives of the competition were achieved. To the objectives discussed earlier (1-4 [14]) we add two further objectives that deserve mentioning (5-6):


We now discuss the outcome of SV-COMP 2020 with respect to these objectives: (1) There were 28 participating software systems from 11 countries, using many different technologies (cf. Table 6). SV-COMP is considered an important event in the verification community. (2) The sv-benchmarks repository is considered one of the largest and most diverse collections of verification tasks in C and Java. The community dedicates a lot of maintenance effort, as the issue tracker <sup>1</sup> and the pull requests <sup>2</sup> on GitHub show. (3) SV-COMP has established a format for defining verification tasks, a standard specification language, and a set of functions to express non-deterministic values. Verification results are validated using verification witnesses and six different validators. (4) We received positive feedback from industry, reporting that it is helpful to look up the newest and best available verification tools, regarding the categories of interest. Several systems from industry have participated since 2017. (5) Participating in SV-COMP is also a challenge because the entry requirements are strict: the tools have to be packaged such that all necessary non-standard components are contained, the tools need to provide meaningful log output, the tool parameters have to be specified in the BenchExec benchmark-definition format, and a tool-info module needs to be implemented. All experiments are required to be fully replicable. It is a motivating experience to observe the learning of first-time participants. (6) Running large-scale performance experiments requires an infrastructure with considerable computing resources, which are not necessarily available to all tool developers. Through this competition and the preruns, the participants get the opportunity to repeatedly run experiments on the full benchmark set of verification tasks of the competition. The preruns and final run sum up to over one million verification runs and ten million witness-validation runs.

Related Competitions. It is well-understood that competitions are an important evaluation method, and there are many other competitions in the field of formal methods. The TOOLympics <sup>3</sup> [7] event in 2019 (part of the 25-years-of-TACAS celebration) presented 16 competitions in the area. Most closely related are the competitions RERS <sup>4</sup> [45] and VerifyThis <sup>5</sup> [46]. While SV-COMP <sup>6</sup> performs replicable experiments in a *controlled* environment (dedicated resources, resource limits), the RERS Challenges give more room for exploring combinations of interactive with automatic approaches without limits on the resources, and the VerifyThis Competition focuses on evaluating approaches and ideas rather than on *fully automatic* verification.

Large benchmark collections are extremely important to make approaches comparable and to agree on what constitutes interesting problems to solve. There are other large benchmark collections as well (e.g., by SPEC <sup>7</sup>), but the sv-benchmarks suite <sup>8</sup> is (a) free of charge, and (b) tailored to the state of the art in software verification. Benchmark repositories of various competitions and challenges also contribute to each other. For example, the sv-benchmarks suite contains programs that were originally used in RERS <sup>9</sup>, in termCOMP <sup>10</sup>, and in VerifyThis <sup>11</sup>. There is a flow of benchmarks in the other direction as well: The competition SMT-COMP [32] uses SMT formulas that were generated from programs of the sv-benchmarks collection. For example, the k-induction engine of CPAchecker was used to generate more than 1000 SMT formulas for the quantifier-free theory of arrays and bit-vectors (QF\_ABV) <sup>12</sup>.

<sup>1</sup> https://github.com/sosy-lab/sv-benchmarks/issues

<sup>2</sup> https://github.com/sosy-lab/sv-benchmarks/pulls

<sup>3</sup> https://tacas.info/toolympics.php

<sup>4</sup> http://rers-challenge.org

<sup>5</sup> http://etaps2016.verifythis.org

<sup>6</sup> https://sv-comp.sosy-lab.org

<sup>7</sup> https://www.spec.org

## 2 Organization, Definitions, Formats, and Rules

Procedure. SV-COMP 2020's overall organization did not change in comparison to the earlier editions [8, 9, 10, 11, 12, 13, 14]. SV-COMP is an open competition, where all verification tasks are known before the submission of the participating verifiers, which is necessary due to the complexity of the C language. During the *benchmark submission* phase, new verification tasks were collected, classified, and added to the existing benchmark suite (i.e., SV-COMP uses an accumulating benchmark suite). During the *training* phase, the teams inspected the verification tasks and trained their verifiers (the verification tasks also received fixes and quality improvements). During the *evaluation* phase, verification runs were performed with all competition candidates, and the system descriptions and archives were reviewed by the competition jury. The participants received the results of their verifier directly via e-mail, and after a few days of inspection, the results were publicly announced on the competition web site. The *Competition Jury* consisted again of the chair and one member of each participating team. Team representatives of the jury are listed in Table 5.

Qualification and License Requirements. As a new feature in SV-COMP 2020, a rule was introduced that allows the organizer to reuse systems that participated in previous years, and to enter new systems, provided that the developers were given the chance to contribute a submission themselves (both options were not used this time). Starting 2018, SV-COMP required that the verifier must be publicly available for download and has a license that


<sup>8</sup> https://github.com/sosy-lab/sv-benchmarks

<sup>9</sup> https://github.com/sosy-lab/sv-benchmarks/blob/svcomp20/c/eca-rers2012/README.txt

<sup>10</sup> https://github.com/sosy-lab/sv-benchmarks/blob/svcomp20/c/termination-restricted-15/ README.txt

<sup>11</sup> https://github.com/sosy-lab/sv-benchmarks/blob/svcomp20/c/verifythis/README.txt

<sup>12</sup> https://clc-gitlab.cs.uiowa.edu:2443/SMT-LIB-benchmarks-inc/QF\_ABV/tree/master/ 20190307-CPAchecker\_kInduction-SoSy\_Lab

```
 1 format_version: '1.0'
 2
 3 # old file name: floppy_true-unreach-call_true-valid-memsafety.i.cil.c
 4 input_files: 'floppy.i.cil-3.c'
 5
 6 properties:
 7   - property_file: ../properties/unreach-call.prp
 8     expected_verdict: true
 9   - property_file: ../properties/valid-memsafety.prp
10     expected_verdict: false
11     subproperty: valid-memtrack
```
Fig. 1: Example task definition for program floppy.i.cil-3.c

Validation of Results. The validation of the results based on verification witnesses [19, 20] was done as in previous years (2017–2019), mandatory for *both* answers True or False. A few categories were excluded from validation if the validators did not sufficiently support a certain kind of program or property. Two new validators participated in SV-COMP 2020: Nitwit [66] and MetaVal [25].

Verification Tasks — Explicit Task-Definition Files. The notion of verification tasks did not change and we refer to previous reports for more details [10, 13]. We developed a new format for task definitions that was used for the Java category already in SV-COMP 2019. Technically, we need a verification task (a pair of a program and a specification to verify) to feed as input to the verifier, and an expected result against which we check the answer that the verifier returns. Previously, the above-mentioned three components were specified in the file name of the program; now all the information is stored in an extra file that contains a structured definition of the verification tasks for a program. For each program, the repository contains the program file and a task-definition file. Consider an example program that is available under the name floppy.i.cil-3.c: This program comes now with its task-definition file floppy.i.cil-3.yml. Figure 1 shows this task definition. The new format was used in SV-COMP 2019 for the Java category [14] and in the competition on software testing, Test-Comp 2019 [15].

The task definition uses the YAML format as underlying structured data format. It contains a version id of the format (line 1) and can contain comments (line 3). The field input\_files specifies the input program (example: 'floppy.i.cil-3.c'), which is either one file or a list of files. The field properties lists all properties of the specification for this program. Each property has a field property\_file that specifies the property file (example: ../properties/unreach-call.prp) and a field expected\_verdict that specifies the expected result (example: true).
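To make the structure of a task definition concrete, the following sketch shows how a consumer might read the fields described above. It is an illustration only: the dictionary mirrors what a YAML loader would produce for the task definition of Fig. 1 (loading the file itself is omitted), and the helper names `input_files` and `expected_verdict` are hypothetical, not part of any SV-COMP tooling.

```python
# Dictionary mirroring the task definition for floppy.i.cil-3.c (Fig. 1),
# as a YAML loader would typically return it.
task_definition = {
    "format_version": "1.0",
    "input_files": "floppy.i.cil-3.c",
    "properties": [
        {
            "property_file": "../properties/unreach-call.prp",
            "expected_verdict": True,
        },
        {
            "property_file": "../properties/valid-memsafety.prp",
            "expected_verdict": False,
            "subproperty": "valid-memtrack",
        },
    ],
}


def input_files(task):
    """Return the input programs as a list (the field is one file or a list)."""
    files = task["input_files"]
    return [files] if isinstance(files, str) else list(files)


def expected_verdict(task, property_name):
    """Look up (verdict, subproperty) for a property file such as
    'unreach-call.prp'; returns None if the task does not list it."""
    for prop in task["properties"]:
        if prop["property_file"].endswith("/" + property_name):
            return prop["expected_verdict"], prop.get("subproperty")
    return None
```

Here `expected_verdict(task_definition, "valid-memsafety.prp")` yields the verdict `False` together with the subproperty `valid-memtrack`, while a property that the task does not list yields `None`.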

Categories, Properties, Scoring Schema, and Ranking. The categories are listed in Tables 7 and 8 and described in detail on the competition web site.<sup>13</sup> Figure 2 shows the category composition. For the definition of the properties and the property format, we refer to the 2015 competition report [11]. All specifications are available in the directory c/properties/ of the benchmark repository. Table 1 lists the properties and their syntactical representation as overview. Property G valid-memcleanup, and thus, the category *MemCleanup*, was used for the first time in SV-COMP 2019. The categories *AWS-C-Common* and *OpenBSD* were added for SV-COMP 2020.

<sup>13</sup> https://sv-comp.sosy-lab.org/2020/benchmarks.php

Fig. 2: Category structure for SV-COMP 2020; category *C-FalsificationOverall* contains all verification tasks of *C-Overall* without *Termination*; *Java-Overall* contains all Java verification tasks

Table 1: Properties used in SV-COMP 2020 (unchanged since 2019 [14])

Table 2: Scoring schema for SV-COMP 2020 (unchanged since 2017 [13])

The scoring schema is identical for SV-COMP 2017–2020: Table 2 provides the overview and Fig. 3 visually illustrates the score assignment for one property. The scoring schema still contains the special rule for unconfirmed correct results for expected result True that was introduced in the transitioning phase: one point is assigned if the answer matches the expected result but the witness was not confirmed.
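The score assignment for a single verification run can be sketched as a small function. Only the one-point rule for unconfirmed correct True results is stated in the text above; the remaining point values (+2 and +1 for confirmed correct results, −16 and −32 for wrong results, 0 for unknown) are assumptions based on the schema commonly reported for SV-COMP and should be checked against Table 2. The function name `score` is hypothetical.

```python
def score(expected, answer, witness_confirmed):
    """Points for one verification run.

    expected: the expected verdict (True or False)
    answer: the verifier's answer (True, False, or None for unknown)
    witness_confirmed: whether a validator confirmed the witness
    """
    if answer is None:
        return 0                       # unknown, timeout, crash (assumed 0)
    if answer != expected:
        # Assumed penalties: wrong proof -32, false alarm -16.
        return -32 if answer else -16
    if witness_confirmed:
        # Assumed rewards: correct True +2, correct False +1.
        return 2 if answer else 1
    # Special rule from the text: unconfirmed correct True gets one point;
    # an unconfirmed correct False is assumed to get none.
    return 1 if answer else 0
```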

Fig. 3: Visualization of the scoring schema for the reachability property (from [13], c Springer-Verlag)

The ranking was again decided based on the sum of points (normalized for meta categories). In case of a tie, the ranking was decided based on success run time, which is the total CPU time over all verification tasks for which the verifier reported a correct verification result. *Opt-out from Categories* and *Score Normalization for Meta Categories* was done as described previously [9] (page 597).
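The ranking rule amounts to a two-key sort, which the following sketch illustrates. The verifier names and numbers are made up for the example, and score normalization for meta categories is not covered.

```python
def rank(results):
    """Order verifiers by descending score; ties are broken by the smaller
    success run time (CPU time over correctly solved tasks).

    results: list of (verifier_name, score, success_cpu_time_seconds)
    """
    return sorted(results, key=lambda r: (-r[1], r[2]))


# Hypothetical results: B and C tie on score, C has the smaller run time.
example = [("A", 120, 900.0), ("B", 150, 4000.0), ("C", 150, 2500.0)]
ranking = [name for name, _, _ in rank(example)]
```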

## 3 Reproducibility

All major components used in the competition are available in public version repositories. This allows independent replication of the SV-COMP experiments. An overview of the components that contribute to the reproducible setup of SV-COMP is provided in Fig. 4, and the details are given in Table 3. The SV-COMP 2016 report [12] describes all components of the SV-COMP organization and how we ensure that all parts are publicly available for maximal replicability.

We have published the competition artifacts at Zenodo to guarantee their long-term availability and immutability. These artifacts comprise the verification tasks, the produced competition results, and the produced verification witnesses. The DOIs and references are given in Table 4. The archive for the competition results includes the raw results in BenchExec's XML exchange format, the log output of the verifiers and validators, and a mapping from file names to SHA-256 hashes. The hashes of the files are useful for validating the exact contents of a file, and for accessing the files inside the archive that contains the verification witnesses.
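Checking a file against such a SHA-256 hash can be done with Python's standard `hashlib` module, as in the sketch below. The function names are hypothetical, and the layout of the mapping inside the archive is not assumed here; the sketch only shows how one file is compared with one expected hash.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex-encoded SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's contents have exactly the expected hash."""
    return sha256_of_file(path) == expected_hex.lower()
```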

To provide a more transparent way of accessing the exact versions of the verifiers that were used in the competition, all verifier archives are stored in a public Git repository. GitLab was used to host the repository for the verifier archives due to its generous repository size limit of 10 GB. The final size of the Git repository is 5.78 GB.

Fig. 4: SV-COMP components and the execution flow

Table 3: Publicly available components for replicating SV-COMP 2020


## 4 Results and Discussion

The results of the competition experiments represent the state of the art in fully automatic software-verification tools. The report shows the results, in terms of effectiveness (number of verification tasks that can be solved and correctness of the results, as accumulated in the score) and efficiency (resource consumption in terms of CPU time). The results are presented in the same way as in previous years, such that the improvements compared to last year are easy to identify. The results presented in this report were inspected and approved by the participating teams. We now discuss the highlights of the results.

Participating Verifiers. Table 5 and the competition web site <sup>14</sup> provide an overview of the participating verification systems. Table 6 lists the algorithms and techniques that are used in the verification tools.

Computing Resources. The resource limits were the same as in the previous competitions [12]: Each verification run was limited to 8 processing units (cores), 15 GB of memory, and 15 min of CPU time. The witness validation was limited to 2 processing units, 7 GB of memory, and 1.5 min of CPU time for violation witnesses and 15 min of CPU time for correctness witnesses. The machines for running the experiments are part of a compute cluster that consists of 168 machines; each verification run was executed on an otherwise completely unloaded, dedicated machine, in order to achieve precise measurements. Each machine had one Intel Xeon E3-1230 v5 CPU, with 8 processing units each, a frequency of 3.4 GHz, 33 GB of RAM, and a GNU/Linux operating system (x86\_64-linux, Ubuntu 18.04 with Linux kernel 4.15). We used BenchExec [23] to measure and control computing resources (CPU time, memory, CPU energy) and VerifierCloud <sup>15</sup> to distribute, install, run, and clean up verification runs, and to collect the results. The values for time and energy are accumulated over all cores of the CPU. To measure the CPU energy, we use CPU Energy Meter [24] (integrated in BenchExec [23]).

<sup>14</sup> https://sv-comp.sosy-lab.org/2020/systems.php

<sup>15</sup> https://vcloud.sosy-lab.org

Table 4: Artifacts published for SV-COMP 2020

Table 5: Competition candidates with tool references and representing jury members

Table 6: Algorithms and techniques that the competition candidates offer

One complete verification execution of the competition consisted of 138 074 verification runs (each verifier on each verification task of the selected categories according to the opt-outs), consuming 491 days of CPU time and 130 kWh of CPU energy (without validation). Witness-based result validation required 684 858 validation runs (each validator on each verification task for categories with witness validation, and for each verifier), consuming 311 days of CPU time. Each tool was executed several times, in order to make sure no installation issues occur during the execution. Including preruns, the infrastructure managed a total of 1 018 781 verification runs consuming 4.8 years of CPU time, and 10 705 227 validation runs consuming 6.9 years of CPU time.

Quantitative Results. Table 7 presents the quantitative overview of all tools and all categories. The head row mentions the category, the maximal score for the category, and the number of verification tasks. The tools are listed in alphabetical order; every table row lists the scores of one verifier. We indicate the top three candidates by formatting their scores in bold face and in larger font size. An empty table cell means that the verifier opted-out from the respective main category (perhaps participating in subcategories only, restricting the evaluation to a specific topic). More information (including interactive tables, quantile plots for every category, and also the raw data in XML format) is available on the competition web site <sup>16</sup> and in the results artifact (see Table 4).

Table 8 reports the top three verifiers for each category. The run time (column 'CPU Time') and energy (column 'CPU Energy') refer to successfully solved verification tasks (column 'Solved Tasks'). We also report the number of tasks for which no witness validator was able to confirm the result (column 'Unconf. Tasks'). The columns 'False Alarms' and 'Wrong Proofs' report the number of verification tasks for which the verifier reported wrong results, i.e., reporting a counterexample when the property holds (incorrect False) and claiming that the program fulfills the property although it actually contains a bug (incorrect True), respectively.

Score-Based Quantile Functions for Quality Assessment. We use score-based quantile functions [9, 23] because these visualizations make it easier to understand the results of the comparative evaluation. The web site <sup>16</sup> and the results archive (see Table 4) include such a plot for each category. As an example, we show the plot for category *C-Overall* (all verification tasks) in Fig. 5. A total of 11 verifiers participated in category *C-Overall*, for which the quantile plot shows the overall performance over all categories (scores for meta categories are normalized [9]). A more detailed discussion of score-based quantile plots, including examples of what insights one can obtain from the plots, is provided in previous competition reports [9, 12].
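The data points of a score-based quantile function for one verifier can be computed roughly as follows. This is a sketch under assumptions rather than the plotting code used for the competition: correct runs are sorted by CPU time and their scores are accumulated, and the curve is assumed to start at the sum of the negative scores of the wrong results. The precise definition is given in [9]; the function name and input encoding are hypothetical, and normalization for meta categories is omitted.

```python
def quantile_points(runs):
    """Compute (accumulated score, CPU time) points for one verifier.

    runs: list of (score, cpu_time_seconds, correct) tuples, one per run.
    """
    # Assumed offset: penalties of wrong results shift the curve to the left.
    offset = sum(score for score, _, correct in runs if not correct)
    # Correct runs ordered from fastest to slowest.
    correct_runs = sorted((t, score) for score, t, correct in runs if correct)
    points = []
    accumulated = offset
    for cpu_time, score in correct_runs:
        accumulated += score
        points.append((accumulated, cpu_time))
    return points
```

Under these assumptions, a curve that extends further to the right indicates a higher total score, and a curve that stays lower indicates shorter run times.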

<sup>16</sup> https://sv-comp.sosy-lab.org/2020/results


Table 7: Quantitative overview over all results; empty cells represent opt-outs


Table 8: Overview of the top-three verifiers for each category (measurement values for CPU time and energy rounded to two significant digits)

Fig. 5: Quantile functions for category *C-Overall*. Each quantile function illustrates the quantile (x-coordinate) of the scores obtained by correct verification runs below a certain run time (y-coordinate). More details were given previously [9]. A logarithmic scale is used for the time range from 1 s to 1000 s, and a linear scale is used for the time range between 0 s and 1 s.

Alternative Rankings. The community suggested to report a couple of alternative rankings that honor different aspects of the verification process as complement to the official SV-COMP ranking. Table 9 is similar to Table 8, but contains the alternative ranking categories *Correct* and *Green Verifiers*. Column 'Quality' gives the score in score points, column 'CPU Time' the CPU usage of successful runs in hours, column 'CPU Energy' the CPU usage of successful runs in kWh, column 'Solved Tasks' the number of correct results, column 'Wrong Results' the sum of false alarms and wrong proofs in number of errors, and column 'Rank Measure' gives the measure to determine the alternative rank.

*Correct Verifiers — Low Failure Rate.* The right-most columns of Table 8 report that the verifiers achieve a high degree of correctness (all top three verifiers in the C track have less than 2 % wrong results). The winners of category *Java-Overall* produced not a single wrong answer. The first category in Table 9 uses a failure rate as rank measure: $\frac{\text{number of incorrect results}}{\text{total score}}$, the number of errors per score point (E/sp). We use E as unit for number of incorrect results and sp as unit for total score. It is remarkable to see that the worst result was 0.38 E/sp in SV-COMP 2019 and is now improved to 0.032 E/sp, which is an order of magnitude better.

*Green Verifiers — Low Energy Consumption.* Since a large part of the cost of verification is given by the energy consumption, it might be important to also consider the energy efficiency. The second category in Table 9 uses the energy consumption per score point as rank measure: $\frac{\text{total CPU energy}}{\text{total score}}$, with the unit J/sp. It is interesting to see that the worst result from SV-COMP 2019 was 4 200 J/sp, and now it is improved to 2 200 J/sp.
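Both rank measures are simple ratios, as the following sketch shows. The input numbers are invented for illustration and do not come from Table 9; the function names are hypothetical, and the energy input is assumed to be given in kilojoule.

```python
def failure_rate(wrong_results, total_score):
    """Errors per score point (E/sp)."""
    return wrong_results / total_score


def energy_per_score_point(cpu_energy_kj, total_score):
    """Joule per score point (J/sp), from CPU energy given in kilojoule."""
    return cpu_energy_kj * 1000.0 / total_score


# Hypothetical verifier: 8 wrong results, 880 kJ, 4000 score points.
errors_per_sp = failure_rate(8, 4000)             # 0.002 E/sp
joule_per_sp = energy_per_score_point(880, 4000)  # 220 J/sp
```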


Table 9: Alternative rankings; quality is given in score points (sp), CPU time in hours (h), energy in kilojoule (kJ), wrong results in errors (E), rank measures in errors per score point (E/sp), joule per score point (J/sp), and score points (sp)

Table 10: Confirmation rate of verification witnesses in SV-COMP 2020


Verifiable Witnesses. All SV-COMP verifiers are required to justify the result (True or False) by producing a verification witness (except for those categories for which no witness validator is available). We used six independently developed witness-based result validators [19, 20, 21, 25, 66].

The majority of witnesses that the verifiers produced can be confirmed by the results-validation process. Interestingly, the confirmation rate for the True results is significantly higher than for the False results. Table 10 shows the confirmed versus unconfirmed results: the first column lists the verifiers of category *C-Overall*, the three columns for result True report the total, confirmed, and unconfirmed number of verification tasks for which the verifier answered with True, respectively, and the three columns for result False report the total, confirmed, and unconfirmed number of verification tasks for which the verifier answered with False, respectively. More information (for all verifiers) is given in the detailed tables on the competition web site <sup>16</sup> and in the results artifact; all verification witnesses are also contained in the witnesses artifact (see Table 4). Result validation is an important topic also in other competitions (e.g., in the SAT competition [5, 69]).

Fig. 6: Number of participating teams for each year

## 5 Conclusion

SV-COMP 2020, the 9th edition of the Competition on Software Verification, attracted 28 participating teams from 11 countries (see Fig. 6 for the participation numbers). SV-COMP continues to offer a broad overview of the state of the art in automatic software verification. The competition does not only execute the verifiers and collect results, but also validates the verification results, using six independently developed results validators. The number of verification tasks was increased to 11 052 in C and to 416 in Java. As before, the large jury and the organizer made sure that the competition follows the high quality standards of the TACAS conference, in particular with respect to the important principles of fairness, community support, and transparency.

Data Availability Statement. The verification tasks and results of the competition are published at Zenodo, as described in Table 4. All components and data that are necessary for reproducing the competition are available in public version repositories, as specified in Fig. 4 and Table 3. Furthermore, the results are presented online on the competition web site for easy access: https://sv-comp.sosy-lab.org/2020/results/.

## References


Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### 2LS: Heap Analysis and Memory Safety (Competition Contribution)<sup>⋆</sup>

Viktor Malík<sup>⋆⋆,3</sup>, Peter Schrammel<sup>1,2</sup>, and Tomáš Vojnar<sup>3</sup>

<sup>1</sup>Diffblue Ltd, Oxford, UK <sup>2</sup>University of Sussex, Brighton, UK <sup>3</sup>FIT, Brno University of Technology, Brno, CZ

Abstract. 2LS is a framework for analysis of sequential C programs based on the CPROVER infrastructure and template-based synthesis techniques for checking both safety and termination. The paper presents the main improvements done in 2LS since 2018, which concern mainly the way 2LS handles dynamically allocated objects and structures as well as combinations of abstract domains.

## 1 Overview

2LS is a static analysis and verification tool for sequential C programs. At its core, it uses the kIkI algorithm (k-invariants and k-induction) [1], which integrates bounded model checking, k-induction, and abstract interpretation into a single, scalable framework. kIkI relies on incremental SAT solving in order to find proofs and refutations of assertions, as well as to perform termination analysis [2].

The 2019 and 2020 competition versions of 2LS feature product and power abstract domain combinations supporting invariant inference for programs manipulating shape and content of dynamic data structures [4]. Moreover, the 2020 version came with further enhancements for handling advanced features of memory allocation and made a step towards a support of generic abstract domain combinations.

Architecture. The architecture of 2LS has been described in previous competition contributions [7,5]. In brief, 2LS is built upon the CPROVER infrastructure [3] and thus uses *GOTO programs* as the internal program representation. The analysed program is translated into an acyclic, over-approximate single static assignment (SSA) form, in which loops are cut at the edges returning to the loop head. Subsequently, 2LS refines this over-approximation by computing inductive invariants in various abstract domains represented by parametrised logical formulae, so-called templates [1]. The competition version uses the zones domain for numerical variables combined with our shape domain for pointer-typed variables. The SSA form is bit-blasted into a propositional formula and given to a SAT solver. The kIkI algorithm then incrementally amends the formula to perform loop unwindings and invariant inference based on template-based synthesis [1].

<sup>⋆</sup> The Czech authors were supported by the project 20-07487S of the Czech Science Foundation.

<sup>⋆⋆</sup> Jury member: imalik@fit.vut.cz.

## 2 New Features

The major improvements of 2LS since 2018 are mostly related to analysis of heap-manipulating programs. We build on the shape domain presented in 2018 [5] and introduce abstract domain combinations that allow us to analyse both shape and content of dynamic data structures. Furthermore, we introduce special handling for the case when the address of a freed heap object is re-used for the next allocation.

Apart from improved verification of heap-manipulating programs, we also introduce a generic skeleton of an abstract domain join algorithm, which is a step towards supporting generic abstract domain combinations.

### 2.1 Combinations of Abstract Domains

The capability of 2LS to jointly analyse shape and content of dynamic data structures takes advantage of the template-based synthesis engine of 2LS. Invariants are computed in various abstract domains, each of which has the form of a template, and the analysis engine handles the domain combinators.

Memory model. In our memory model, we represent dynamically allocated objects by so-called *abstract dynamic objects*. Each such object is an abstraction of a number of concrete dynamic objects allocated by the same malloc call [4].

Shape Domain. For analysing the shape of the heap, we use an improved version of the shape domain that we introduced in 2018 [5]. The domain over-approximates the *points-to* relation between pointers and symbolic addresses of memory objects in the analysed program: for each pointer-typed variable and each pointer-typed field of an abstract dynamic object p, we compute the set of all addresses that p may point to [4].

Template Polyhedra Domain. For analysing numerical values, we use the template polyhedra abstract domains, particularly the *interval* and the *zones* domains [1].

Shape and Polyhedra Domain Combination. Since both domains have the form of a template formula, we simply use them side-by-side in a product domain combination: the resulting formula is a conjunction of the two template formulae [4].

Figure 1. Unbounded singly-linked list abstracted by an abstract dynamic object $ao_1$.

This combination allows 2LS to infer, e.g., invariants describing an unbounded singly-linked list whose nodes contain values between 1 and 10. We show an example of such a list in Figure 1. Here, all list nodes are abstracted by a single abstract dynamic object $ao_1$ (i.e. we assume that they are all allocated at the same program location). The invariant inferred by 2LS for such a list might look as follows:

$$(ao_1.next = \&ao_1 \lor ao_1.next = \text{NULL}) \land ao_1.val \in [1, 10].$$

The first conjunct, a disjunction, describes the shape of the list: the *next* field of each node points to some node of the list or to NULL<sup>1</sup>. The second conjunct is an invariant in the interval domain over all values stored in the list: it expresses the fact that the value of each node lies in the interval between 1 and 10.
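To make the meaning of such a product invariant concrete, the following Python sketch checks both conjuncts against a concrete list. It is only an illustration: the `Node` class and the `satisfies_invariant` helper are hypothetical names that do not come from 2LS, and the list `nodes` plays the role of the concrete objects abstracted by the single abstract dynamic object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    """Concrete list node; all nodes are assumed to come from one malloc site."""
    val: int
    next: Optional["Node"] = None


def satisfies_invariant(nodes):
    """Check (next = &ao_1 or next = NULL) and val in [1, 10] for every node."""
    # Shape conjunct: next is NULL or points to an object represented by ao_1.
    shape_ok = all(n.next is None or any(n.next is m for m in nodes) for n in nodes)
    # Interval conjunct: every stored value lies between 1 and 10.
    data_ok = all(1 <= n.val <= 10 for n in nodes)
    return shape_ok and data_ok


# Build the list 3 -> 7 -> 10 -> NULL.
tail = Node(10)
mid = Node(7, tail)
head = Node(3, mid)
print(satisfies_invariant([head, mid, tail]))  # True
```

A node holding 11, or a node whose `next` points outside the represented set, would violate the interval or the shape conjunct respectively.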

### 2.2 Symbolic Paths

To improve the precision of the analysis, we let 2LS compute different invariants for different *symbolic paths* taken by the analysed program. We require a symbolic path to express which loops were executed at least once. This allows us to distinguish situations in which an abstract dynamic object does not represent any actually allocated object, and hence the invariant for such an abstract dynamic object is not valid [4].

The symbolic path domain allows us to iteratively compute a set of symbolic paths $p_1, \dots, p_n$ (represented by guard variables in the SSA) with associated shape and data invariants $I_1, \dots, I_n$. The aggregated invariant is then $(p_1 \Rightarrow I_1) \land \dots \land (p_n \Rightarrow I_n)$, which corresponds to a power domain combination.
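The aggregated invariant can be read as a conjunction of guarded invariants. The sketch below evaluates such a conjunction in Python; the guard names, the state layout, and the helper `aggregated_invariant` are assumptions made for illustration and do not reflect how 2LS encodes guards in the SSA.

```python
def aggregated_invariant(paths):
    """Build the conjunction of implications (p_i => I_i) as a predicate."""
    def holds(guards, state):
        # An implication p => I is violated only when p holds and I does not.
        return all((not guards[p]) or inv(state) for p, inv in paths)
    return holds


# Two hypothetical symbolic paths: loop skipped vs. loop executed at least once.
paths = [
    ("loop_skipped", lambda s: s["head"] is None),
    ("loop_entered", lambda s: s["head"] is not None and 1 <= s["val"] <= 10),
]
inv = aggregated_invariant(paths)

# On the path that skips the loop, only the first invariant is required to hold.
print(inv({"loop_skipped": True, "loop_entered": False}, {"head": None, "val": 0}))
```

Because each invariant is only required on its own path, the data invariant over list values places no constraint on executions in which no list node was ever allocated.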

### 2.3 Re-using Freed Memory Object for Next Allocations

Figure 2. Re-using a freed object

In C, it is possible that, after a free is called, the freed memory is subsequently re-used when a malloc is called afterwards. Due to this, it may happen that the error state in the program in Figure 2 is reachable. This situation is, however, difficult to handle for 2LS as its memory model creates a unique abstract dynamic object for each malloc call. To overcome this limitation, we have introduced a special variable fr that is non-deterministically set to the value of the freed pointer at each free call. If two pointers x, y are compared in the analysed program using a relational operator op, we transform the comparison x op y into

$$(x \mathbin{op} y) \leftrightarrow \bigl((x \neq fr \lor nondet_x) \land (y \neq fr \lor nondet_y)\bigr).\tag{1}$$

Here, $nondet_x$ and $nondet_y$ are unconstrained boolean variables modelling a non-deterministic choice. If neither x nor y has been freed, then the result of Eq. (1) is equal to x op y, but if either of the pointers might have been freed, then the result of Eq. (1) is non-deterministic, which makes our analysis sound for the described case.
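The effect of Eq. (1) can be checked by enumerating the two non-deterministic booleans. The following Python sketch does this for concrete values; modelling addresses as integers and the helper names are assumptions made for illustration only.

```python
import operator
from itertools import product


def transformed_compare(x, y, fr, op, nondet_x, nondet_y):
    """Evaluate (x op y) <-> ((x != fr or nondet_x) and (y != fr or nondet_y))."""
    guard = (x != fr or nondet_x) and (y != fr or nondet_y)
    return op(x, y) == guard  # '==' on booleans encodes the biconditional


def possible_results(x, y, fr, op):
    """Collect every outcome over all choices of the two nondet booleans."""
    return {transformed_compare(x, y, fr, op, nx, ny)
            for nx, ny in product([False, True], repeat=2)}


# Neither pointer equals fr: the result is exactly x == y.
print(possible_results(1, 2, 3, operator.eq))  # {False}
# x equals the freed address: both outcomes become possible.
print(possible_results(1, 2, 1, operator.eq) == {True, False})  # True
```

In the second call the comparison `1 == 2` is concretely false, yet the transformed expression may evaluate to true, which is what allows the analysis to account for a freed address being handed out again.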

### 2.4 Generic Abstract Domain Templates

As mentioned in Section 1, abstract domains are represented in 2LS by so-called templates. The main purpose of templates is to reduce the second-order problem of finding an inductive invariant to a first-order problem of finding values of template parameters. Apart from defining the form of the template (a parametrised logical formula), each abstract domain also needs to specify an algorithm that joins the current values of template parameters with a model of satisfiability returned by an SMT solver. However, most of the domains use a similar approach to this algorithm, and therefore adding a new abstract domain to 2LS requires one to write an algorithm whose skeleton has already been written in existing domains.

<sup>1</sup> Here, $ao_1.f$ is an abstraction of the f fields of all concrete objects represented by $ao_1$. Analogously, $\&ao_1$ is an abstraction of symbolic addresses of all represented objects.

To overcome this problem, we proposed a generic algorithm suitable for all existing abstract domains (see [6] for details). The main idea is based on the fact that most of the templates are conjunctions of multiple formulae, where each has its own parameter and describes a part of the analysed program, e.g., properties of a single program variable.
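The idea of separating a domain-independent skeleton from a domain-specific join can be sketched as follows. This is not the algorithm from [6]: the `Row` class, the interval join, and the representation of a solver model as a dictionary are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Interval = Optional[Tuple[int, int]]  # None is bottom (no value seen yet)


@dataclass
class Row:
    """One conjunct of a template: a parameter describing a single variable."""
    var: str
    value: Interval = None


def join_interval(old: Interval, observed: int) -> Interval:
    """Domain-specific part: smallest interval containing old and the new value."""
    if old is None:
        return (observed, observed)
    return (min(old[0], observed), max(old[1], observed))


def generic_join(rows: List[Row], model: Dict[str, int],
                 join: Callable[[Interval, int], Interval]) -> bool:
    """Domain-independent skeleton: update each row with the solver model.

    Returns True if any parameter changed, i.e. the invariant was weakened.
    """
    changed = False
    for row in rows:
        new_value = join(row.value, model[row.var])
        if new_value != row.value:
            row.value, changed = new_value, True
    return changed


rows = [Row("x"), Row("y")]
generic_join(rows, {"x": 0, "y": 5}, join_interval)
generic_join(rows, {"x": 3, "y": 5}, join_interval)
print([(r.var, r.value) for r in rows])  # [('x', (0, 3)), ('y', (5, 5))]
```

Under this split, a new domain would only supply its own row type and `join` function, while the loop over the conjuncts stays shared.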

While this extension did not bring any additional functionality that would increase the score of 2LS in this year's edition of SV-COMP, it opened up possibilities for future enhancements. In particular, (1) it simplifies adding new abstract domains capable of analysing program properties that 2LS is currently not able to handle, and (2) it is a significant step towards supporting generic abstract domain combinations that would allow 2LS to arbitrarily combine abstract domains and therefore analyse complex properties of programs requiring simultaneous reasoning in multiple domains.

## 3 Strengths and Weaknesses

One of the main strengths of 2LS is verification of programs requiring joint reasoning about shape and content of dynamic data structures. In 2019, we contributed 10 benchmarks into the ReachSafety category requiring such reasoning. The domain combination described in Section 2.1 allows 2LS to successfully verify 9 out of 10 of these benchmarks (the last one has timed out), making it the only tool capable of this apart from the category winner. Also, 2LS is notably strong in analysing termination, which is supported by the third place in the Termination category.

Still, there remain a lot of challenges and limitations. The main problem is that 2LS still lacks reasoning about array contents, and that it does not yet support recursion.

## 4 Tool Setup

The competition submission is based on 2LS version 0.8.<sup>2</sup> The archive contains the binaries needed to run 2LS (2ls-binary, goto-cc), so no further installation is needed. There is also a wrapper script 2ls, which BenchExec uses to run the tool over the verification benchmarks. See the wrapper script also for the relevant command-line options given to 2LS. Further information about the contents of the archive can be found in the README file. The tool info module for 2LS is called two_ls.py and the benchmark definition file is 2ls.xml. As a back end, the competition submission of 2LS uses Glucose 4.0. 2LS competes in all categories except Concurrency and Java.

## 5 Software Project

2LS is implemented in C++ and it is maintained by Peter Schrammel with contributions by the community.<sup>3</sup> It is publicly available at http://www.github.com/diffblue/2ls under a BSD-style license.

<sup>2</sup> Executable available at https://doi.org/10.5281/zenodo.3678347.

<sup>3</sup> https://github.com/diffblue/2ls/graphs/contributors

## References


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **COASTAL: Combining Concolic and Fuzzing for Java (Competition Contribution)**

Willem Visser and Jaco Geldenhuys

Stellenbosch University, Stellenbosch, South Africa {visserw,geld}@sun.ac.za

**Abstract.** COASTAL is a program analysis tool for Java programs. It combines concolic execution and fuzz testing in a framework with built-in concurrency, allowing the two approaches to cooperate naturally.

## **1 Verification Approach and Software Architecture**

COASTAL analyses Java bytecode with an approach that combines concolic execution and fuzz testing in a unified framework. It uses the ASM bytecode manipulation library [2] to add code to compiled class files to monitor and interact with the system under test (SUT). The concurrent COASTAL components that carry out the analysis are shown in Figure 1:


<sup>-</sup>Jury member

**Fig. 1.** COASTAL architecture

**–** Divers, surfers, strategies, and the pathtree signal their actions via a publish-subscribe system. When events are published to the message broker, one or more observers are notified. The observers may, in turn, emit messages that direct the operation of COASTAL.

#### **1.1 Strategies**

As an example, a depth-first strategy is a simple configuration of COASTAL where the strategy employs only a single diver. The diver produces one path condition that is processed by the strategy by negating the last (deepest) constraint, and sending it to an SMT solver, which produces new input values (if any) that will explore the modified path. If a modified path condition is unsatisfiable, the last constraint is discarded and the process repeats. All path conditions are added to the pathtree as they are discovered. At the end of the analysis, the pathtree contains a summary of the execution tree of the SUT.
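The depth-first strategy can be sketched in a few lines of Python. This is a toy model rather than COASTAL code: the system under test is a small Python function, the "diver" records branch outcomes through a callback, and `solve` is a brute-force stand-in for the SMT solver over a small integer domain. All names are hypothetical.

```python
def sut(x, branch):
    """Toy system under test; `branch` records every decision it takes."""
    if branch("x > 5", lambda v: v > 5, x):
        if branch("x % 2 == 0", lambda v: v % 2 == 0, x):
            return "A"
        return "B"
    return "C"


def run_diver(x):
    """Run the SUT on a concrete input and return its path condition."""
    path = []

    def branch(label, pred, value):
        taken = pred(value)
        path.append((label, pred, taken))
        return taken

    sut(x, branch)
    return path


def solve(constraints, domain=range(-20, 21)):
    """Brute-force stand-in for an SMT solver over a small integer domain."""
    for v in domain:
        if all(pred(v) == want for _, pred, want in constraints):
            return v
    return None


def depth_first(seed=0):
    """Explore all paths by repeatedly negating the deepest open constraint."""
    pathtree = []   # summary of every path condition discovered so far
    stack = []      # entries: (label, predicate, outcome, already_negated)
    x = seed
    while x is not None:
        path = run_diver(x)
        pathtree.append(tuple((label, taken) for label, _, taken in path))
        # Constraints deeper than the current stack are new and not yet negated.
        stack += [(l, p, t, False) for l, p, t in path[len(stack):]]
        x = None
        while stack and x is None:
            label, pred, taken, negated = stack.pop()
            if negated:
                continue  # both outcomes explored: discard and go shallower
            prefix = [(l, p, t) for l, p, t, _ in stack]
            x = solve(prefix + [(label, pred, not taken)])
            if x is not None:
                stack.append((label, pred, not taken, True))
            # If unsatisfiable, the constraint stays discarded and we go on.
    return pathtree


print(len(depth_first()))  # 3
```

Starting from the seed input 0, the sketch discovers the three paths of the toy program one at a time, which mirrors the single-diver behaviour described above.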

Other strategies include breadth-first and random exploration. Like depth-first exploration, these strategies use only one diver and explore one path condition at a time. On the other hand, a generational strategy negates all the constraints of a path condition, one by one, and produces many potential input values. In this case, multiple divers can be used concurrently. Users can also deploy multiple strategies at the same time.
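In contrast to the depth-first case, a generational step produces several new inputs from a single path condition. The sketch below illustrates this under simplifying assumptions: the path condition comes from a hypothetical straight-line sequence of three checks, and `solve` is again a brute-force substitute for an SMT solver.

```python
def path_condition(x):
    """Hypothetical diver: return (predicate, outcome) pairs along x's path."""
    checks = [lambda v: v > 5, lambda v: v % 2 == 0, lambda v: v < 15]
    return [(c, c(x)) for c in checks]


def solve(constraints, domain=range(-20, 21)):
    """Brute-force stand-in for an SMT solver over a small integer domain."""
    for v in domain:
        if all(pred(v) == want for pred, want in constraints):
            return v
    return None


def generational_children(x):
    """Negate each constraint in turn (keeping its prefix) and solve for inputs."""
    pc = path_condition(x)
    children = []
    for i, (pred, taken) in enumerate(pc):
        candidate = pc[:i] + [(pred, not taken)]
        new_input = solve(candidate)
        if new_input is not None:   # unsatisfiable negations yield no input
            children.append(new_input)
    return children


print(generational_children(0))  # [6, -19]
```

Each returned input could be handed to a separate diver, which is why this strategy lends itself to concurrent execution.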

**Fuzzing strategies.** The user can employ surfers to perform straightforward fuzz testing (in the style of AFL [1,5,6]). Surfers use very little instrumentation: unlike the divers, which instrument every bytecode instruction, surfers record only the outcomes of branching points. The "path condition" produced by a surfer is therefore a series of (mostly binary) choices that can be added to the pathtree; it lacks any details about the reason for the choice (for example, instead of "x > 5" it may simply record "false"), but the shape of the path is preserved. Multiple divers and multiple surfers are deployed concurrently and operate interactively.

**Hybrid strategies.** More advanced strategies can combine concolic and fuzzing analysis to exploit the strengths of both approaches: surfers (fuzzing) can rapidly explore new territory of the execution space, while divers (concolic) can investigate hard-to-reach corners. Such hybrid strategies enqueue (semi-)random inputs on s in and the results contribute to a "skeletal" pathtree. Since surfers produce results at a high rate, the easy-to-explore parts of the execution space are more quickly saturated. Unexplored regions of the pathtree are passed to the divers, and their results, in turn, open up new regions for the surfers to explore.

## **1.2 Observers and Models**

COASTAL was designed with extensibility in mind. One example is the use of observers. Any component is allowed to subscribe to the various message streams, and can interact with the system by publishing messages of their own, or by making direct calls to the public COASTAL API. Examples of observer tasks include:


In theory, strategies themselves could be implemented as observers. But since they are central to the operation of COASTAL, they are given special treatment.

Users can replace system- or user-level libraries by more appropriate models, either as a whole or on a method-by-method basis. For example, a complex library implementation of String.substring() can be replaced with a simpler, more efficient model that produces the same result and the same symbolic constraints.

## **2 Strengths and weaknesses**

The tool's strength lies in the combination of concolic and fuzzing analysis, but COASTAL is still under development and a "deep" bug (now fixed) prevented the use of fuzzing. Participation in SV-COMP [3] was invaluable in this regard: Several bugs and missing functionality were revealed and corrected.

**Results.** COASTAL does not output any incorrect answers, but produces an unknown result in 19% of cases. This is shown in column "Count" below.


For many cases, the answer is produced instantaneously (column "Immediate"). In the case of unknown answers, this indicates that COASTAL aborted its analysis because of an as-yet unsupported feature such as symbolic array sizes. For the 79 − 27 = 52 non-immediate unknown answers, COASTAL timed out because of large search spaces.

The longest-running true answer required 2 diver runs, each taking 20.48 s (printtokens_eqchk.yml), whereas the longest-running false answer required 141 diver runs, each taking 0.54 s (spec1-5_product1.yml). This highlights a fundamental weakness of the tool: a long-running SUT takes longer to analyse. A generational strategy where multiple divers execute concurrently can ameliorate this problem, but on average does not find errors as quickly as the breadth-first strategy. This points to the need to refine the generational strategy to prioritize shallow unexplored paths.

## **3 Tool setup**

**Download.** http://doi.org/10.5281/zenodo.3679243 [7]

**Configuration.** COASTAL is configured to use a breadth-first search strategy and a single diver. Z3 [4] is set as the constraint solver. (It is the only external tool required to run COASTAL and a Linux executable version is included in the download above.) Path conditions are limited to 800 conjuncts, and a time limit of 240 seconds is set. Symbolic strings are limited to 25 characters. Custom models are used for some Java classes: Character, String, StringBuilder, Pattern, Matcher, Scanner. COASTAL competed in the JavaOverall category.

**Installation.** The download above is self-contained. The COASTAL project at https://github.com/DeepseaPlatform/coastal/ includes shell scripts to package and run COASTAL for SV-COMP in the extra/svcomp subdirectory. The scripts need an external copy of the Z3 solver to be available.

## **4 Software Project**

COASTAL is developed by the authors at Stellenbosch University, South Africa. It is available at https://github.com/DeepseaPlatform/coastal/ and is distributed under the GNU Lesser General Public License version 3.

## **References**



## Dartagnan: Bounded Model Checking for Weak Memory Models (Competition Contribution)

**TACAS SV-COMP Artifact 2020 Accepted**

Hernán Ponce-de-León-<sup>1</sup> , Florian Furbach<sup>2</sup>, Keijo Heljanko<sup>3</sup>, and Roland Meyer<sup>2</sup>

<sup>1</sup>University of the Bundeswehr Munich, Munich, Germany <sup>2</sup>TU Braunschweig, Braunschweig, Germany <sup>3</sup>University of Helsinki and HIIT, Helsinki, Finland

Abstract. Dartagnan is a bounded model checker for concurrent programs under weak memory models. What makes it different from other tools is that the memory model is not hard-coded inside Dartagnan but taken as part of the input. For SV-COMP'20, we take as input sequential consistency (i.e. the standard interleaving memory model) extended by support for atomic blocks. Our point is to demonstrate that a universal tool can be competitive and perform well in SV-COMP. Being a bounded model checker, Dartagnan's focus is on disproving safety properties by finding counterexample executions. For programs with bounded loops, Dartagnan performs an iterative unwinding that results in a complete analysis. The SV-COMP'20 version of Dartagnan works on Boogie code. The C programs of the competition are translated internally to Boogie using SMACK.

## 1 Overview and Software Architecture

Dartagnan is a bounded model checker for concurrent programs under weak memory models. It expects as input a program P annotated with a reachability condition S, a memory model M, and an unrolling bound k. It recursively unwinds all loops in P up to the bound k. The unwound program is converted into an SMT formula that symbolically represents all candidate executions. The memory model filters out some candidates using a second formula, as we explain below. Events of a candidate execution model (instances of) program instructions, like memory accesses, local computations, and conditional/unconditional jumps. Edges model relations between events, including program order (the order within a thread), data-dependencies (an assigned variable is used within an expression), reads-from (matching each read with the write from which it takes its value), and coherence (the order in which writes commit to the memory).

<sup>-</sup>Jury member.

Fig. 1. CAT model used for SV-COMP'20.

A memory model can be understood as a predicate over candidate executions that declares some of them valid. We describe memory models in the CAT language [2]. A memory model is defined as a set of relations (those mentioned above and others derived as unions, transitive/reflexive closures, compositions, etc.) and constraints over them (emptiness, acyclicity and irreflexivity). Given a memory model, we construct a formula that evaluates to true precisely under the candidate executions that are valid according to the memory model. Figure 1 shows the memory model used for SV-COMP'20. To support atomic blocks, Dartagnan adds a specific edge (rmw) for every pair of events between \_\_VERIFIER\_atomic\_begin() and its matching \_\_VERIFIER\_atomic\_end() or in a \_\_VERIFIER\_atomic\_ function. We encode atomicity for sequential consistency (SC) as the empty intersection of rmw and paths starting and ending with an external communication (i.e. between different threads). This means that once an atomic block starts, external communications with the block are forbidden until all events in the block have been executed.
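To illustrate how a memory model acts as a predicate over candidate executions, the following Python sketch checks one acyclicity constraint on an explicit candidate. It uses the textbook characterisation of SC as acyclicity of po ∪ rf ∪ co ∪ fr, with fr derived as rf⁻¹ ; co. That constraint is an assumption standing in for the CAT model of Figure 1, atomic blocks are not modelled, and Dartagnan itself encodes these checks symbolically rather than enumerating relations.

```python
def compose(r1, r2):
    """Relational composition r1 ; r2."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}


def inverse(r):
    return {(b, a) for a, b in r}


def acyclic(rel):
    """Depth-first search for a cycle in the graph given by the edge set."""
    succ = {}
    for a, b in rel:
        succ.setdefault(a, set()).add(b)
    state = {}  # node -> "open" while on the DFS stack, "done" afterwards

    def visit(n):
        if state.get(n) == "open":
            return False
        if state.get(n) == "done":
            return True
        state[n] = "open"
        ok = all(visit(m) for m in succ.get(n, ()))
        state[n] = "done"
        return ok

    return all(visit(n) for n in list(succ))


def valid_under_sc(po, rf, co):
    fr = compose(inverse(rf), co)  # a read precedes writes that overwrite its source
    return acyclic(po | rf | co | fr)


# Store buffering: thread 0 runs a: W x; b: R y, thread 1 runs c: W y; d: R x.
# ix and iy are the initial writes to x and y.
po = {("a", "b"), ("c", "d")}
co = {("ix", "a"), ("iy", "c")}
both_read_initial = {("ix", "d"), ("iy", "b")}
print(valid_under_sc(po, both_read_initial, co))  # False
```

The candidate in which both reads observe the initial values contains the cycle a → b → c → d → a and is therefore filtered out, whereas a candidate in which b reads from c is accepted.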

Dartagnan comes with a rich assertion language inspired by Herd [1]. Assertions define inequalities over the values of local and global variables. They can be used freely throughout the code, rather than being limited to the end of the execution. Semantically, our assertions do not stop the execution but record the failure and continue. To achieve this, each instruction assert(exp) is transformed to a local computation f ← exp where the fresh variable f ∈ F stores the value of exp at the corresponding point of the execution. We refer to the formula $\bigvee_{f \in F} \neg f$ as the reachability condition.

The formula for candidate executions of the program, the formula for validity under the given memory model, and the reachability condition together (in conjunction) yield the SMT encoding of the reachability problem at hand. Any solution to the conjunction corresponds to an execution that is valid according to the memory model and violates at least one assertion. Details on the encoding can be found in [8,9].

Dartagnan implements a may-alias analysis to improve pointer precision and a novel relation analysis. The latter technique reduces the SMT encoding to those parts of the relations that might affect the consistency with the memory model, resulting in a considerably smaller formula. Relation analysis improves the performance by up to two orders of magnitude [4,5]. We remark that related approaches represent each candidate execution explicitly [1,6]. Thanks to the symbolic representation of executions and static analysis techniques such as relation analysis, Dartagnan is often more efficient [4,5].

Figure 2 shows the overall architecture of Dartagnan. It reads programs written in the litmus format of Herd [1] or the intermediate verification language Boogie [7]. For the competition, C programs are compiled to LLVM and then translated internally to Boogie using the SMACK tool [11]. The SMT solver is Z3 [3]. When a violation is found, Dartagnan returns a witness execution.

Fig. 2. Dartagnan's architecture.

## 2 Strengths and Weaknesses

The main strength of Dartagnan is its fully configurable memory model. Unfortunately, in SV-COMP'20 there is no category for verification tasks under weak memory models. On the SV-COMP'20 benchmarks, Dartagnan reports only one incorrect result, being beaten in that aspect only by CPAchecker, DIVINE, Lazy-CSeq and Yogar-CBMC; three of them category winners. The incorrect result is related to the use of pointer arithmetic which is currently not supported by our alias analysis.

Its main strength is also its main weakness: Dartagnan's performance cannot quite match that of other verifiers that were developed specifically for sequential consistency. Dartagnan performs particularly poorly on benchmarks with large atomic blocks. This is the case for most of the verification tasks in the pthread-wmm group which represent 83% of the ConcurrencySafety category. The problem is that Dartagnan adds rmw edges for all pairs in an atomic block. This results in a large encoding (even using relation analysis) and severely impacts its performance.
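A quick count shows why large atomic blocks are costly. The sketch below assumes one rmw edge per pair of events ordered by program order, which is an assumption about the encoding; the point is only that the number of pairs grows quadratically with the block size.

```python
from itertools import combinations


def rmw_edges(events):
    """All pairs of events inside one atomic block, ordered by program order."""
    return list(combinations(events, 2))


for n in (4, 16, 64):
    print(n, len(rmw_edges(range(n))))
# 4 6
# 16 120
# 64 2016
```

A block with n events thus contributes n(n − 1)/2 edges under this assumption, each of which has to be represented in the formula.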

## 3 Tool Setup and Configuration

Besides the program to be verified, Dartagnan expects a CAT file containing the memory model of interest. For SV-COMP'20, this is the extension of sequential consistency given in Figure 1. The tool is run by executing the following command:

`$ java -jar dartagnan/target/dartagnan-V-jar-with-dependencies.jar -cat <CAT file> -i <program file> [options]`

Placeholder V is the tool version (currently 2.0.5) and options is used to configure the unrolling bound, the alias analysis, and the fixpoint encoding. The full list of options can be found on the project website (see Section 4).

To make sure not to miss a violation, the competition version of Dartagnan implements an iterative approach. Initially, the bounded model checking algorithm is called with an unrolling bound of one. If it finds a violation or can prove that all loops have been unrolled completely (this is done using unwinding assertions), the verification process terminates with a conclusive answer. If not, Dartagnan increases the bound by one and repeats the process. For programs with an infinite state space, our tool does not terminate.
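The iterative scheme can be summarised by the following Python sketch. The function `bounded_check` is a hypothetical stand-in for one bounded model checking call, the "program" is reduced to a loop iteration count and an optional iteration at which an assertion fails, and the `max_bound` parameter is an addition of this sketch to keep it terminating; it is not an option of the tool as described.

```python
def bounded_check(program, k):
    """Toy BMC call: returns (violation_found, all_loops_fully_unrolled)."""
    explored = min(k, program["iterations"])
    bad = program["violation_at"]
    violation = bad is not None and bad <= explored
    # Stands in for the unwinding assertion: no iteration beyond k exists.
    complete = k >= program["iterations"]
    return violation, complete


def verify(program, max_bound=None):
    """Increase the unrolling bound until a conclusive answer is reached."""
    k = 1
    while max_bound is None or k <= max_bound:
        violation, complete = bounded_check(program, k)
        if violation:
            return ("FAIL", k)
        if complete:
            return ("PASS", k)
        k += 1
    return ("UNKNOWN", max_bound)


print(verify({"iterations": 3, "violation_at": None}))  # ('PASS', 3)
print(verify({"iterations": 5, "violation_at": 2}))     # ('FAIL', 2)
```

Without the artificial `max_bound`, a program whose loops never unroll completely and that contains no violation would keep the loop running forever, matching the non-termination noted above.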

Dartagnan participates in the ConcurrencySafety category. No specification file is required. The artifact is available on [10]. To reproduce the results of the competition, the tool can be executed with the following wrapper script:

```
$ Dartagnan-SVCOMP.sh <program file>
```
## 4 Software Project and Contributors

The project home page is https://github.com/hernanponcedeleon/Dat3M. Dartagnan is open source software distributed under the MIT license.

Acknowledgement: We thank Dirk Beyer and Philipp Wendler for their help during the process of integrating Dartagnan into the competition framework. We also thank Natalia Gavrilenko for her contributions to the development of the bounded model checking engine of the tool [4,5].

## References





## VeriAbs : Verification by Abstraction and Test Generation (Competition Contribution)

Mohammad Afzal<sup>1</sup>, Supratik Chakraborty<sup>2</sup>, Avriti Chauhan<sup>1</sup>, Bharti Chimdyalwar<sup>1</sup>, Priyanka Darke<sup>1</sup>, Ashutosh Gupta<sup>2</sup>, Shrawan Kumar<sup>1</sup>, Charles Babu M<sup>3</sup>, Divyesh Unadkat<sup>1,2</sup>, and R Venkatesh<sup>1</sup>

<sup>1</sup> Tata Research Development and Design Center, Pune, India <sup>2</sup> Indian Institute of Technology, Bombay, India <sup>3</sup> Chennai Mathematical Institute, Chennai, India

Abstract. VeriAbs is a strategy selection based reachability verifier for C code. It analyzes the structure of loops, and intervals of inputs to choose one of the four verification strategies implemented in VeriAbs. In this paper, we present VeriAbs version 1.4 with updates in three strategies. We add an array verification technique called *full-program induction*, and enhance the existing techniques of loop pruning, *k*-path interval analysis, and disjunctive loop summarization. These changes have improved the verification of programs with arrays, and unstructured loops and unstructured control flows.

## 1 Verification Approach

VeriAbs is a reachability checker for C code that employs a portfolio of techniques and works by smartly selecting a sequence of techniques for each problem instance. Specifically, it performs structural and interval analysis of the input code to determine a sequence of suitable verification techniques, or a strategy [2]. An earlier version of the tool appeared in [9]. Figure 1 shows the architecture with this year's enhancements in dashed lines. When the input program contains unstructured loops, VeriAbs performs fuzz testing in parallel with *k*-induction. If the program does not contain unstructured loops but loops manipulating arrays, VeriAbs applies array abstraction techniques like loop shrinking, loop pruning, and full-program induction [7] in sequence. If the program contains inputs of very short ranges, VeriAbs applies explicit state model checking, and loop invariant generation using program behaviour, syntax and counter-examples in parallel [2]. Otherwise VeriAbs applies *k*-path interval analysis, loop abstraction, loop summarization, bounded model checking, and *k*-induction in the order presented in the architecture. If any technique successfully (in)validates the encoded properties, the tool reports the result, generates the witness, and exits. We next explain the enhancements made to VeriAbs this year.

### 1.1 Tool Enhancements

*Full-Program Induction.* VeriAbs applies full-program induction as presented in [7] to programs manipulating arrays of a symbolic size $N$ given as a parameter. It takes as input a parameterized program represented by $P_N$, annotated with parameterized pre- and post-conditions represented by $\varphi(N)$ and $\psi(N)$ respectively, and checks the validity of the Hoare triple $\{\varphi(N)\}\,P_N\,\{\psi(N)\}$ for all values of $N$ ($> 0$). We summarize the technique in [7] here.

<sup>-</sup>Jury member, corresponding author: priyanka.darke@tcs.com

Fig. 1. Architecture Diagram. Verification results: **S** = program **S**afe, **F** = property **F**ails, **U** = **U**nknown.

In the base case, it verifies that the given Hoare triple holds for a fixed number of values of $N$ (say for $N = 1$). If the check fails, a property violation is reported. It then hypothesizes that the Hoare triple $\{\varphi(N-1)\}\,P_{N-1}\,\{\psi(N-1)\}$ holds for $N > 1$, where $P_{N-1}$ is the program with parameter $N-1$. In the induction step, the technique synthesizes a code fragment $\partial P_N$, called the *difference program*, such that $\{\varphi(N)\}\,P_N\,\{\psi(N)\}$ is valid iff $\{\varphi(N)\}\,P_{N-1};\partial P_N\,\{\psi(N)\}$ is valid. The *difference program* is the computation to be performed after the program $P_{N-1}$ has executed to reach the same state as $P_N$. It then computes a formula $\partial\varphi(N)$, called the *difference pre-condition*, such that $\varphi(N)$ is implied by the conjunction of $\varphi(N-1)$ and $\partial\varphi(N)$, and that $\partial\varphi(N)$ continues to hold after the execution of $P_{N-1}$. The induction step now needs to prove the validity of $\{\psi(N-1) \land \partial\varphi(N)\}\,\partial P_N\,\{\psi(N)\}$. It uses weakest pre-condition computation to infer formulas $\mathit{pre}(N)$ over the variables and arrays whose values were computed by $P_{N-1}$ and subsequently read in $\partial P_N$. The base case is checked for $\mathit{pre}(N)$, which is subsequently used to strengthen the pre- and post-conditions in the inductive step. The technique thus inducts over the entire program via the parameter $N$, instead of inducting over individual loops by using specialized predicates as in [6]. Full-program induction does not rely on inductive invariants for each loop in the program.
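The decomposition into a smaller program followed by a difference program can be illustrated on a toy example. The Python sketch below is not taken from [7]: the two-loop program, its hand-written difference program, and the post-condition are assumptions chosen for illustration, and running the checks for a few concrete values of N only tests the decomposition, whereas the actual technique proves the Hoare triples symbolically.

```python
def program(n):
    """P_N: fill a[0..N-1] with 1..N, then sum the array in a second loop."""
    a = [0] * n
    for i in range(n):
        a[i] = i + 1
    total = 0
    for i in range(n):
        total += a[i]
    return a, total


def difference_program(state, n):
    """dP_N: the extra work P_N performs compared with P_{N-1}."""
    a, total = state
    a = a + [n]          # the write a[N-1] = N from the first loop
    total += a[n - 1]    # the final iteration of the second loop
    return a, total


def post(state, n):
    """psi(N): the sum equals N(N+1)/2."""
    return state[1] == n * (n + 1) // 2


# Base case for N = 1.
print(post(program(1), 1))  # True
# Running P_{N-1} followed by dP_N reaches the same state as P_N.
print(all(difference_program(program(n - 1), n) == program(n) for n in range(2, 20)))  # True
```

Note that the difference program spans both loops at once, which reflects the remark above that the induction is over the whole program rather than over individual loops.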

```
1 b=0, d=0, c=30;
2 a = *;
3 if (a == 10)
4 c = 30; //Path P1
5 else if (a < 10)
6 b = 3; //Path P2
7 else if (a > 10)
8 d = 31; //Path P3
9 if (c==30 && a==10)
10 d = 31;
11 if(a >= 10)
12 assert(d == 31);
```
*k-Path Interval Analysis.* VeriAbs implements a *k*-path interval analysis which is an extension of the standard nonrelational interval domain [2]. It maintains the path-wise data ranges of variables along a configurable *k* number of paths at each program point, thus matching the precision of relational domains. When the number of paths at the join point exceeds *k*, a subset of paths are merged to maintain *k* paths at the join point. In previous versions, arbitrary subsets of paths were merged. For SV-COMP 2020, the join operation identifies variables of interest (VOIs) with respect to the given property to decide which paths to merge such that VOIs can retain precise values.

Fig. 2. Example

Consider the example shown in Figure 2 with a valid property at line 12 to be analyzed with *k=2* and the VOI d. It can be seen that three paths – P1, P2 and P3 join at line number 9. The enhanced join operation merges paths P1 and P2 so that the resultant paths are as follows:

```
P1+P2: {a=[MIN,10], b=[0,3], c=30, d=0},
P3: {a =[11,MAX], b=0, c=30, d=31}.
```
This information at the join point helps validate the property. Earlier, the join operation could merge the path P3 with P1 or P2, leading to an imprecise interval – [0,31] of d at the join point, resulting in spurious property violation. Our implementation considers variables used in the encoded property as the VOIs.
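The VOI-aware join can be sketched as follows. This is a minimal Python illustration, not VeriAbs code: the selection rule (merge the pair of paths whose merge widens the VOI intervals the least) and the 32-bit bounds for MIN and MAX are assumptions made for the example.

```python
from itertools import combinations

MIN, MAX = -2**31, 2**31 - 1  # assumed 32-bit integer range

def hull(i1, i2):
    """Smallest interval containing both intervals."""
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))

def merge(p1, p2):
    """Merge two paths variable by variable."""
    return {v: hull(p1[v], p2[v]) for v in p1}

def voi_loss(p1, p2, vois):
    """Width added to the VOI intervals if p1 and p2 were merged."""
    loss = 0
    for v in vois:
        lo, hi = hull(p1[v], p2[v])
        widest = max(p1[v][1] - p1[v][0], p2[v][1] - p2[v][0])
        loss += (hi - lo) - widest
    return loss

def join(paths, k, vois):
    """Merge paths until at most k remain, preferring merges that keep VOIs precise."""
    paths = list(paths)
    while len(paths) > k:
        i, j = min(combinations(range(len(paths)), 2),
                   key=lambda ij: voi_loss(paths[ij[0]], paths[ij[1]], vois))
        merged = merge(paths[i], paths[j])
        paths = [p for n, p in enumerate(paths) if n not in (i, j)] + [merged]
    return paths

# The three paths of Figure 2 at the join point (line 9).
P1 = {"a": (10, 10), "b": (0, 0), "c": (30, 30), "d": (0, 0)}
P2 = {"a": (MIN, 9), "b": (3, 3), "c": (30, 30), "d": (0, 0)}
P3 = {"a": (11, MAX), "b": (0, 0), "c": (30, 30), "d": (31, 31)}
result = join([P1, P2, P3], k=2, vois=["d"])
```

With `d` as the only VOI, merging P1 and P2 costs nothing, so `result` contains the merged path with `d = [0,0]` and the untouched P3, matching the two paths listed above.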

*Loop Pruning* is an array abstraction technique that defines a set of criteria (and a resulting set of program transformation rules) such that, if loops processing arrays satisfy them, it is sufficient to analyze the first few elements instead of the entire array [14]. In this version, pruning has been extended to programs containing nested loops and multidimensional arrays. By structural analysis, we identify whether elements of the multidimensional array are processed *uniformly* in loops. If so, we compute reduced dimensions of the array (for example, a[m][m] may be reduced to a[4][4]). We have also refined the pruning criteria to improve its applicability over multidimensional and dynamically allocated arrays. The current implementation of array pruning solves 56 additional SV-COMP'20 ReachSafety benchmarks compared to the previous version.

*Disjunctive Loop Summarization.* VeriAbs analyses interleavings of unique paths within a loop to produce its disjunctive summary to find errors and proofs [2]. In the current version, VeriAbs extends this technique in the following situations: (a) while it earlier restricted affine transformations to identity matrices, we now allow diagonal matrices with finite monoid [4]; (b) we use the approach of generating *flattenings* as shown in [4] for loops which are *flattable*; (c) we use VeriAbs' general philosophy of deriving over-approximate summaries using the techniques in [12] when a precise disjunctive summary is not derivable.

## 2 Software Architecture

VeriAbs is primarily developed in Java and Perl. It implements all program analyses (except full-program induction) and program transformers in Prism [13], the TCS Research program analysis framework. It transforms programs processing multidimensional or dynamically allocated arrays in loops to equivalent programs with symbolically sized 1D arrays. This transformed program is consumed by VAJRA v1.0 [7], the tool that implements full-program induction. VAJRA uses the LLVM v6.0.0 [15] compiler infrastructure for program transformations and the Z3 SMT solver v4.8.7 [10] for checking the validity of Hoare triples and for computing weakest pre-conditions. For BMC, VeriAbs uses the C Bounded Model Checker (CBMC) v5.10 [8] with the Glucose Syrup SAT solver v4.0 [3]. For fuzz testing, we enhance American Fuzzy Lop [16] to allow test case mutation within valid data ranges generated by *k*-path interval analysis for better path coverage. VeriAbs uses *k*-induction with continuously refined invariants as implemented in CPAchecker v1.8 [5] for improved precision over our existing lightweight implementation of *k*-induction.

In this version, we additionally derive disjunctive invariants for correctness witnesses using abstract acceleration and abstract interpretation, and add them to the control flow automaton generated by CPAchecker. If all implemented techniques fail, we use techniques implemented in Ultimate Automizer v3204b741 [11] to generate correctness witnesses.

## 3 Strengths and Weaknesses

The main strengths of VeriAbs are (1) strategy selection that correlates strengths of verification techniques and input code properties, and (2) a portfolio of sound techniques. Weaknesses: (1) Long strategies – in the worst case, a strategy executed by VeriAbs chains ten techniques and is therefore time consuming. Hence, smarter and shorter strategies are needed. (2) Nonlinear expressions in loops – loop abstractions in VeriAbs assign non-deterministic values to variables modified in such expressions. (3) Multidimensional arrays in loops manipulating noncontiguous locations – these are limitations of loop shrinking and pruning. These weaknesses are not limitations of the state-of-the-art, and appropriate techniques, if integrated into VeriAbs, can be easily invoked by the strategy selector to enable verification of such programs.

## 4 Tool Setup and Configuration

The VeriAbs SV-COMP 2020 executable is available for download at https://gitlab.com/sosy-lab/sv-comp/archives-2019/tree/master/2020/veriabs.zip. To install the tool, download the archive, extract its contents, and then follow the installation instructions in VeriAbs/INSTALL.txt. To execute VeriAbs, the user needs to specify the property file of the respective verification category using the --property-file option and the -64 option for programs with a 64-bit architecture. The witness is generated in the current working directory as witness.graphml. A sample command is as follows:

VeriAbs/scripts/veriabs <-64> --property-file ALL.prp example.c

VeriAbs participated in the ReachSafety and the SoftwareSystems-ReachSafety categories of SV-COMP 2020. The BenchExec wrapper script for the tool is veriabs.py and the benchmark description file is veriabs.xml.

## 5 Software Project and Contributors

VeriAbs is maintained by some members of the Foundations of Computing group at TCS Research [1]. They can be contacted at veriabs.tool@tcs.com. We are thankful to the developers of American Fuzzy Lop, CBMC, CPAchecker, Glucose Syrup, LLVM, UAutomizer and Z3 for allowing us to use the tools within VeriAbs.

## References



## **GACAL: Conjecture-based Verification (Competition Contribution)**

Benjamin Quiring and Panagiotis Manolios

Northeastern University, Boston MA, USA

**Abstract.** GACAL verifies C programs by searching over the space of possible invariants, using traces of the input program to identify potential invariants. GACAL uses the ACL2s theorem prover to verify these potential invariants, using an interface provided by ACL2s for connecting with external tools. GACAL iteratively searches for and proves invariants of increasing complexity until the program is verified.

## **1 Verification Approach**

GACAL is a tool for verifying reachability queries in C programs by iteratively and efficiently performing conjecture generation and conjecture verification. Conjecture generation involves searching through the space of possible conjectures using evaluation-based testing to identify likely-to-hold conjectures, and conjecture verification consists of using software verification technology to verify these conjectures. Our initial motivation was to develop a computational agent that can automatically complete the Invariant Game [1], in which players suggest invariants that are used by a reasoning engine to verify imperative programs, which we did with success. GACAL is a more fully developed form of the underlying conjecture generation ideas. This section presents a brief overview of GACAL's basic structure and methods for conjecture-based verification, and then discusses these, as well as associated challenges, in more depth. Section 2 provides information about the GACAL project, Section 3 provides an evaluation of GACAL, and Section 4 concludes this paper and discusses future work.

In GACAL, conjectures are potential invariants paired with program locations. Evaluation-based testing consists of evaluating possible invariants using execution-produced program traces. The ACL2s theorem prover [2] verifies conjectures using a graph representation of the input program. To search through the space of conjectures, GACAL first constructs a space of terms, which are C-expressions composed of the constants, variables, and arithmetic/bitwise operators in the program. Terms are combined using relational and logical operators to create possible invariants, and possible invariants which hold in all generated program traces are promoted to potential invariants and turned into conjectures. Discovered potential invariants are then analyzed using ACL2s and, if proven, used to verify the program. In the case that the program cannot be verified from the currently proven invariants, the above process is repeated: construct new, more complex, terms, find potential invariants via testing on program traces, prove potential invariants, attempt program verification, and repeat. At a high level, this loop is the heart of GACAL's conjecture-based verification.

Jury member: quiring.b@northeastern.edu

GACAL's approach to verification presents challenges which fall into two categories: how to minimize the number of generated conjectures, and how to optimize the interactions with ACL2s. The techniques GACAL uses to address these challenges, as well as a more in-depth explanation of the previously mentioned methods, are outlined below.

**Term and Invariant Construction** GACAL builds the space of terms by iteratively constructing all terms of a fixed size, where the size of a term is the number of constants, variables, and operators in that term. GACAL uses a collection of rewrite rules to filter the newly constructed terms: terms which can be rewritten to an equivalent form that has already been constructed are not kept. The size partial order on terms allows GACAL to perform rewriting effectively. Furthermore, the term constructor searches for new rewrite rules by evaluating and comparing terms under a set of random assignments to find pairs of equivalent terms. The discovered equivalences are generalized and turned into rewrite rules which are added to the collection of rewrite rules. We designed the rewriting techniques to have the property that all terms which cannot be rewritten are semantically distinct. In general, the term space is at least asymptotically exponential in size, and the rewriting techniques above, for the class of problems we consider, significantly improve the asymptotics.
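The size-ordered construction and the discovery of equivalent terms by evaluation can be sketched as follows. This Python illustration is not GACAL's implementation (which is written in Common Lisp and uses generalized rewrite rules): here a term is simply dropped when its values on all sample assignments coincide with those of an already kept term, and the operator set, atoms, and fixed sample assignments are assumptions chosen for the example (GACAL draws random assignments).

```python
from itertools import product

# Hypothetical operator set; GACAL uses the operators found in the input program.
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def evaluate(term, env):
    """Evaluate a term: an int constant, a variable name, or (op, left, right)."""
    if isinstance(term, int):
        return term
    if isinstance(term, str):
        return env[term]
    op, left, right = term
    return OPS[op](evaluate(left, env), evaluate(right, env))

def build_terms(atoms, max_size, samples):
    """Enumerate terms by size, keeping one representative per observed behavior."""
    by_size = {1: []}
    seen = {}           # value signature -> first (smallest) term with that signature
    equivalences = []   # (kept term, dropped term): candidate rewrite rules

    def consider(term, bucket):
        signature = tuple(evaluate(term, env) for env in samples)
        if signature in seen:
            equivalences.append((seen[signature], term))
        else:
            seen[signature] = term
            bucket.append(term)

    for atom in atoms:
        consider(atom, by_size[1])
    for n in range(2, max_size + 1):
        by_size[n] = []
        # A binary term of size n has one operator plus operands of sizes ls and rs.
        for ls in range(1, n - 1):
            rs = n - 1 - ls
            for left, right, op in product(by_size[ls], by_size[rs], OPS):
                consider((op, left, right), by_size[n])
    return by_size, equivalences

samples = [{"x": 2, "y": 3}, {"x": -1, "y": 5}, {"x": 7, "y": -4}]
by_size, equivalences = build_terms(["x", "y", 0, 1], 3, samples)
```

Because smaller terms are generated first, a pair such as `x` and `x + 0` is always resolved in favor of the smaller term. Agreement on a handful of samples only suggests equivalence, which is why the discovered pairs are candidates to be generalized rather than facts.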

Possible invariants are C-expressions of the form x == y, x < y, x <= y, and P || Q, where x, y are terms and P, Q are possible invariants. We allow multiple invariants to be associated with each program location; hence, we do not need explicit conjunction. We note that the space of possible invariants is closed under logical negation. GACAL filters out possible invariants which can be rewritten to an equivalent form that has already been created, reducing the size of the invariant space. The order in which the invariant space is searched is deterministic and independent of the given program, and was chosen because it worked well for the benchmark programs. At a high level, GACAL inspects more specific invariants before more general invariants (e.g. x == y before x <= y).

**Trace Generation** To produce traces through the program GACAL creates many initial program states which randomly seed the result of all nondeterministic behaviors that occur during execution of the program, making them deterministic. For example, a seeded pseudo-random number generator can obtain values for 'nondeterministic integer' expressions. The initial states are propagated through the program for a bounded number of steps, generating a set of states associated with each program location. These initial traces are not changed during the course of verification.

Testing on program traces is essential to GACAL's conjecture generation, but programs may, for example, contain loops with many iterations or not terminate, and so obtaining traces which correspond to complete program executions may be computationally infeasible or impossible. To address this, GACAL creates additional types of traces which approximate the input program's behavior. The first type of these traces generalizes large constants to small and/or nondeterministic values, which allows loops with originally many iterations to be completed. The second type uses the counter-example generation abilities of ACL2s [3,4,5] to generate states at any program location which satisfy all currently proven invariants at that location, which are then propagated through the program. As GACAL proves more invariants, it recomputes the second type of traces to obtain a better approximation of the program. Since invariants tested on these traces are later checked for correctness, the fact that the traces may not reflect the original program's behavior does not introduce unsoundness. The states from the above two methods are only used to test invariants at a program location if there are no states from the original traces produced for that location, and if traces cannot be found at all then GACAL assumes all invariants are potential.
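Seeded trace generation and trace-based filtering of possible invariants can be sketched together. The following Python example is an assumption-laden illustration, not GACAL code: the toy loop, the hand-picked term set, and the single tracked location (the loop head) are all hypothetical.

```python
import random

def run(seed, max_steps=1000):
    """Toy program: n = nondet(); i = 0; s = 0; while (i < n) { s += 2; i++; }
    Returns the states observed at the loop head, for a bounded number of steps."""
    rng = random.Random(seed)
    n = rng.randint(0, 20)  # seeded stand-in for a 'nondeterministic integer'
    i, s, states = 0, 0, []
    for _ in range(max_steps):
        states.append({"i": i, "s": s, "n": n})
        if not i < n:
            break
        s += 2
        i += 1
    return states

# Hypothetical term space and the relational operators named in the text.
TERMS = {
    "i": lambda st: st["i"],
    "s": lambda st: st["s"],
    "n": lambda st: st["n"],
    "2*i": lambda st: 2 * st["i"],
}
RELATIONS = {
    "==": lambda a, b: a == b,  # more specific relations are tried first
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
}

def potential_invariants(states):
    """Keep the possible invariants that hold in every recorded state."""
    found = []
    for left_name, left in TERMS.items():
        for right_name, right in TERMS.items():
            if left_name == right_name:
                continue
            for rel_name, holds in RELATIONS.items():
                if all(holds(left(st), right(st)) for st in states):
                    found.append(f"{left_name} {rel_name} {right_name}")
    return found

states = [st for seed in range(50) for st in run(seed)]
candidates = potential_invariants(states)
```

Every run records the exit state with `i == n`, so `i < n` is rejected while `i <= n` and `s == 2*i` survive. Survivors are only potential invariants; they still have to be proven.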

**Conjecture Verification** To prove conjectures, GACAL uses an algorithm which takes previously proven invariants as well as currently unproven potential invariants and iteratively removes invariants which cannot be proven until it reaches a fixpoint. This process requires a large number of verification queries, and for most programs, checking these queries with ACL2s accounts for the majority of execution time. To improve the ability of ACL2s to reason about GACAL queries, we developed an arithmetic library consisting of ACL2s theorems about the GACAL-supported C operators. Additionally, GACAL caches previous queries and their results, which allows it to answer queries that are similar to cached queries without using the theorem prover. Finally, GACAL saves counter-examples that ACL2s provides when it falsifies queries and uses them to falsify new queries.
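The fixpoint computation can be sketched as below. This Python example is a simplified stand-in rather than GACAL's algorithm: the "prover" is a brute-force check over a small finite state space instead of ACL2s, the cache only matches identical queries, and the toy transition system and candidate set are hypothetical.

```python
N = 5  # loop bound of the toy program: i = 0; s = 0; while (i < N) { i++; s += 2; }
STATES = [(i, s) for i in range(-3, 10) for s in range(-3, 20)]
INIT = (0, 0)

def step(state):
    """One loop iteration, or None if the loop has exited."""
    i, s = state
    return (i + 1, s + 2) if i < N else None

CANDIDATES = {
    "s == 2*i": lambda i, s: s == 2 * i,
    "i <= N": lambda i, s: i <= N,
    "i >= 0": lambda i, s: i >= 0,
    "s <= 8": lambda i, s: s <= 8,   # false: s reaches 10
    "s != 7": lambda i, s: s != 7,   # true, but only provable with help from s == 2*i
}

cache = {}  # (candidate, assumptions) -> result; exact-match reuse only

def provable(name, assumed):
    """Brute-force stand-in for a prover: does the candidate hold initially and
    survive one step from every state satisfying all assumed invariants?"""
    key = (name, frozenset(assumed))
    if key not in cache:
        check = CANDIDATES[name]
        ok = check(*INIT)
        for state in STATES:
            if not ok:
                break
            if all(CANDIDATES[a](*state) for a in assumed):
                nxt = step(state)
                if nxt is not None and not check(*nxt):
                    ok = False
        cache[key] = ok
    return cache[key]

def prove_fixpoint(names):
    """Drop unprovable candidates until the remaining set proves itself."""
    current = set(names)
    while True:
        kept = {n for n in current if provable(n, current)}
        if kept == current:
            return kept
        current = kept
```

Calling `prove_fixpoint(CANDIDATES)` discards `s <= 8` in the first round and keeps the other four. `s != 7` illustrates why candidates are checked together: on its own it is not inductive, but it becomes provable once `s == 2*i` is among the assumptions.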

## **2 Tool Setup and Software Project and Architecture**

The competition submission<sup>1</sup> uses GACAL version 1.0. GACAL requires Python 3, Java, and Common Lisp, and the competition archive contains all files necessary to run GACAL without further installation. Other relevant information may be found in the README file. GACAL only competes in the C ReachSafety-Loops category. GACAL is maintained by Benjamin Quiring and Panagiotis Manolios, and is implemented primarily in Common Lisp. The external tools used by GACAL are the Eclipse CDT parser and the ACL2 Sedan [2]. GACAL is publicly available at https://gitlab.com/acl2s/conjecture-generation/gacal under a GNU GPLv3 license.

GACAL does not handle all C language features. Most importantly, GACAL does not handle arrays and types other than 32-bit unsigned and signed integers. There is no theoretical reason for this. GACAL does not correctly model C semantics for undefined behavior in signed arithmetic. There is a bug in the contest submission for translating goto statements into our graph representation of programs which affects a small number of benchmarks.

<sup>1</sup> Available at https://gitlab.com/sosy-lab/sv-comp/archives-2020 and Zenodo [6].

## **3 Evaluation**

GACAL performs best on programs it can execute to completion because this allows us to produce high quality traces covering all program locations. When this is not the case, GACAL often creates false conjectures which lead to a large number of theorem prover queries. Additionally, we note GACAL's execution time depends on the size of the term and invariant spaces, which grow exponentially based on the number of program variables, constants, and operations. The current version of GACAL verifies 66 of the 109 benchmark programs it parses, and the top three tools on this distribution verified 102, 70, and 70. There was one program which no other tools could verify, though GACAL succeeded.

The core of GACAL consists of potential invariant generation using program traces and the rewriting methods as outlined above. We found that the addition of the arithmetic library is essential to our ability to reason about unsigned arithmetic and the mod operator: it allows GACAL to verify 10% more total programs (which deal primarily with the listed features) and cuts the average time to query ACL2s by 33% on the verification queries which were not caught by the caching. We found that the additional trace generation methods did not significantly increase the number of programs that were verified, though they did decrease the average time for verifying a program. The caching of proof results and counter-examples is able to eliminate 85% of all verification queries from being submitted to ACL2s for checking, which increases the number of programs which are verified by over 10% and almost halves the average cost to verify a program. The caching methods also amplify the benefits of the library and extra trace generation methods.

## **4 Conclusions and Future Work**

There are many ways to improve GACAL, including incorporating classical analyses such as range analysis, abstract interpretation, and symbolic evaluation, as well as handling a larger subset of the C language. Another improvement to GACAL is to perform the search for disjunctive invariants more efficiently; currently GACAL often finds many potential but false disjunctive conjectures, which result in a large number of verification queries. One way to improve the search may be to analyze the program to find meaningful hypotheses, which could considerably lower the number of tested and generated conjectures.

We believe that GACAL provides evidence that our conjecture-based verification techniques can be used to improve current software verification tools, as we were able to verify a competitive number of programs on the distribution we parse and we were able to verify a program that all other tools failed to verify, despite not using any of the classical analyses identified above.

## **References**



## **Java Ranger at SV-COMP 2020 (Competition Contribution)**

Vaibhav Sharma<sup>1,⋆</sup>, Soha Hussein<sup>1,2</sup>, Michael W. Whalen<sup>1</sup>, Stephen McCamant<sup>1</sup>, and Willem Visser<sup>3</sup>

> <sup>1</sup> University of Minnesota, Minneapolis, MN, USA {vaibhav, husse200, mwwhalen, smccaman}@umn.edu <sup>2</sup> Ain Shams University, Cairo, Egypt soha.hussien@cis.asu.edu.eg <sup>3</sup> Stellenbosch University, Stellenbosch, South Africa visserw@sun.ac.za

**Abstract.** Path-merging is a known technique for accelerating symbolic execution. One technique, named "veritesting" by Avgerinos et al., uses summaries of bounded control-flow regions and has been shown to accelerate symbolic execution of binary code. But, when applied to symbolic execution of Java code, veritesting needs to be extended to summarize dynamically dispatched methods and exceptional control-flow. Such an extension of veritesting has been implemented in Java Ranger, which is built as an extension of Symbolic PathFinder, a symbolic executor for Java bytecode. In this paper, we briefly describe the architecture of Java Ranger and describe its setup for SV-COMP 2020.

## **1 Approach**

Symbolic execution is a well-known program analysis technique that has been applied to many applications such as test generation [3,7], equivalence checking [6,8], and vulnerability finding [13]. However, when applied to large software, symbolic execution can suffer from scalability challenges caused by path explosion. Path-merging techniques such as veritesting [1] and dynamic state merging [4] help alleviate these scalability limitations. In particular, veritesting attempts to construct a static summary of a multi-path region and use it. Veritesting has been shown to significantly accelerate symbolic execution of binary code. Given that a large amount of software in use today is still written in Java, it is desirable to bring the benefits of veritesting to symbolic execution of Java as well. However, features such as dynamic dispatch make path-merging for Java code challenging [11]. The summary of a multi-path region that contains a dynamically-dispatched method call can only be constructed if the method to be called can also be summarized. Java Ranger (JR) extends the current state-of-the-art path-merging ideas presented by Avgerinos et al. [1] by first building static summaries which are later transformed using runtime information such as the dynamic type of an object reference used for accessing a field. Java Ranger is built as an extension to Symbolic PathFinder (SPF) [5].

<sup>⋆</sup>Jury member

## **2 Architecture**

Java Ranger is implemented as an SPF listener that watches for symbolic branch conditions in branching instructions. On encountering a symbolic branch instruction, JR attempts to create a summary for the multi-path region that begins at that branch instruction and ends at its exit points. A multi-path region is a region of code that begins at a branch instruction with a symbolic branch condition. An exit point of a multi-path region is either (1) the first program location in a control-flow path through the multi-path region which could not be summarized, or (2) the location of the immediate post-dominator of the multi-path region. This mechanism is also explained by Sharma et al. [12] in Figure 4.
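The general idea of summarizing a multi-path region, rather than forking at its branch, can be sketched in a few lines. The Python example below is an illustration of path-merging with if-then-else expressions only; it is not Java Ranger's implementation and does not model exit points, method summaries, or dynamic dispatch. The tuple-based expression encoding and all function names are assumptions.

```python
def ite(cond, then_val, else_val):
    """Build an if-then-else expression, simplifying when both sides agree."""
    if then_val == else_val:
        return then_val
    return ("ite", cond, then_val, else_val)

def summarize_region(cond, then_updates, else_updates, env):
    """Merge both sides of a two-way region into a single symbolic state."""
    merged = dict(env)
    for var in set(then_updates) | set(else_updates):
        merged[var] = ite(cond,
                          then_updates.get(var, env.get(var)),
                          else_updates.get(var, env.get(var)))
    return merged

def evaluate(expr, inputs):
    """Evaluate a symbolic expression under concrete inputs."""
    if isinstance(expr, tuple):
        tag = expr[0]
        if tag == "ite":
            chosen = expr[2] if evaluate(expr[1], inputs) else expr[3]
            return evaluate(chosen, inputs)
        if tag == ">":
            return evaluate(expr[1], inputs) > evaluate(expr[2], inputs)
        if tag == "+":
            return evaluate(expr[1], inputs) + evaluate(expr[2], inputs)
        raise ValueError(f"unknown operator {tag}")
    if isinstance(expr, str):
        return inputs[expr]  # symbolic input variable
    return expr              # concrete constant

# Region: if (x > 0) { y = x + 1; } else { y = 2; }   with x symbolic.
env = {"x": "x", "y": 0, "z": 5}
merged = summarize_region((">", "x", 0), {"y": ("+", "x", 1)}, {"y": 2}, env)
```

After summarization, execution continues on one path with `y` bound to `ite(x > 0, x + 1, 2)` instead of two paths with separate path conditions; variables the region does not write, such as `z`, are unchanged.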

## **3 Strengths And Weaknesses**

Since JR addresses scalability limitations of symbolic execution, its strength can only be observed when running it over large software. However, JR falls back to vanilla symbolic execution when it finds no opportunity for path-merging. SV-COMP 2020 had 416 verification tasks in the Java track. More information on SV-COMP 2020 can be found in its competition report [2]. JR instantiated at least one static summary on 96 of the 416 benchmarks. The summary for a multi-path region can be instantiated more than once on each benchmark because it is possible that the symbolic executor will encounter the same multi-path region more than once while running the benchmark. In total, JR instantiated 356 unique summaries. The total number of instantiated summaries used by JR was 20,182. JR also inlined a method summary a total of 62,857 times while instantiating these summaries.

JR also had an "unknown" conclusion on 40 of the 416 SV-COMP 2020 verification tasks. 22 of the 40 were caused by our JR configuration, which turned off support for symbolic strings because we found SPF's support for solving string constraints was not stable. 9 "unknown" conclusions were reached due to missing support for symbolic array lengths in multi-dimensional arrays. 8 of the 40 occurred due to a timeout. The last "unknown" result occurs in the equivalence check verification task in the ApacheCLI benchmark due to JR's use of a depth limit.

We made use of two depth limit parameters in SV-COMP 2020. The first was a limit on the exploration depth of our baseline symbolic executor, SPF. The second was a limit on the recursive depth to which our method summaries would be inlined. While we wished to avoid the use of any such limit, we found similar kinds of limits were used by many participating tools in SV-COMP 2019. It is common to use some kind of limitation when applying symbolic execution tools in practice, since they can get bogged down by path explosion or related problems, and path-merging helps with but does not eliminate this issue. The Java verification category of SV-COMP 2020 did not score a tool's answer differently if it used a depth limit for producing that answer. Instead, the use of a depth limit is reflected in each tool's score only if it caused the tool to produce an incorrect answer. We describe these depth limits and JR's configuration options in the following section.

## **4 Tool Setup and Configuration**

Java Ranger's setup is very similar to the setup used by SPF. Since Java Ranger is simply an extension of SPF, the Java Ranger directory can be specified as a valid jpf-symbc extension of JPF. A JR configuration requires the following additions.

`veritestingMode = <1-5>`

veritestingMode specifies the path-merging features to be enabled, with each higher number adding a new feature to the set of features enabled by the previous number. Setting veritestingMode to 1 runs vanilla SPF. Setting it to 2 enables path-merging for multi-path regions with no method calls and a single exit point. Setting it to 3 adds path-merging for multi-path regions that make method calls where the method can be summarized by Java Ranger. Setting it to 4 adds path-merging for multi-path regions with more than one exit point caused by exceptional behavior and unsummarized method calls. Setting it to 5 adds path-merging for summarizing return instructions in multi-path regions by treating them as an additional exit point.

`performanceMode = <true or false>`

Setting performanceMode to true causes Java Ranger to minimize the number of solver calls to check the feasibility of the path condition when summarizing a multi-path region with multiple exit points.

`TARGET_CLASSPATH_WALA=<classpath of target code>`

Java Ranger needs this variable to be set as an environment variable. It is not part of the .jpf configuration file. This environment variable tells Java Ranger where it should expect to find code that needs to be statically summarized.

`jitAnalysis=<true or false>`

When turned on (the default value), this option causes JR to summarize multi-path regions when it encounters them. When turned off, JR attempts to summarize all multi-path regions reachable in a statically-computed interprocedural call graph up to a configurable limit.

`recursiveDepth=<an integer value>`

This option forces JR to restrict inlining of method summaries up to the value provided for this option. We set this parameter to 12 for SV-COMP 2020.

The following option is a JPF [14] configuration option which we also used for SV-COMP 2020.

`search.depth_limit=<an integer value>`

This option forces JPF to restrict its exploration to the depth provided as the value for this option. JPF constructs a tree of possible choices and explores the tree in a heuristic order, depth-first by default. Since JR is built as an extension to SPF, which is in turn built as an extension to JPF, we were able to restrict JR's exploration of choices using this option. We set this parameter to the value 13 for SV-COMP 2020.
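Put together, the options described in this section might appear in a .jpf configuration file as in the following sketch. Only the values 12 and 13 are the ones reported above for SV-COMP 2020; the values chosen for veritestingMode and performanceMode are illustrative assumptions, and the entries that select the target program are omitted.

```
# Java Ranger options; values marked "illustrative" are assumptions.
# illustrative: enable all path-merging features
veritestingMode = 5
# illustrative
performanceMode = true
# default value according to the description above
jitAnalysis = true
# value used for SV-COMP 2020
recursiveDepth = 12
# JPF option; value used for SV-COMP 2020
search.depth_limit = 13
```

TARGET_CLASSPATH_WALA does not appear here because it is set in the environment rather than in the configuration file.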

## **5 Software Project and Contributors**

Java Ranger is an extension of SPF. It is maintained on GitHub [9]. The version of Java Ranger that participated in SV-COMP 2020 is publicly available [10]. For more information, please contact the authors of this paper.

## **6 Acknowledgments**

The research described in this paper has been supported in part by the National Science Foundation under grant 1563920.

## **References**



## JDart: Dynamic Symbolic Execution for Java Bytecode (Competition Contribution)

Malte Mues and Falk Howar

Dortmund University of Technology, Dortmund, Germany malte.mues@tu-dortmund.de falk.howar@tu-dortmund.de

Abstract. JDart performs dynamic symbolic execution of Java programs: it executes programs with concrete inputs while recording symbolic constraints on executed program paths. A constraint solver is then used for generating new concrete values from recorded constraints that drive execution along previously unexplored paths. JDart is built on top of the Java PathFinder software model checker and uses the JConstraints library for the integration of constraint solvers.

## 1 Overview

JDart is a dynamic symbolic execution engine for the JVM built on top of Java PathFinder (JPF) [11]. Dynamic symbolic execution [4,6] (sometimes also referred to as concolic execution) executes programs with concrete values while recording symbolic constraints for execution paths. The approach combines the benefits of fast concrete execution with the possibility of generating new concrete values, triggered by symbolic constraints, that exercise previously unexplored program behaviors. JDart can be used for checking assertions in Java programs: concolic execution will explore new program paths until either (a) an assertion violation is discovered, (b) all program paths have been explored, or (c) resource limits of the analysis are exhausted.

The initial driver of the development of JDart was the need for an analysis that is robust enough to handle large and complex systems, concretely the AutoResolver software for prediction and resolution of airplane loss of separation developed at NASA Ames Research Center [7]. Though JDart provides a robust and scalable platform for dynamic symbolic analysis of Java programs [7], we had to extend its functionality in several ways in order to be able to compete at SV-COMP 2020 [1]. We developed:


Fig. 1: Architecture of JDart [7].

While (1) enabled JDart to enter the competition, (2) accounts for the largest part of improvements over our own baseline, and (3) contributes to better performance on some benchmarks with assertion violations in big state spaces.

## 2 Architecture

JDart combines dynamic execution with recording and analysis of symbolic path constraints. It runs as an extension of the JPF software model checker [11]. In particular, JDart uses the Java virtual machine implemented by JPF and its capabilities for annotating values on the stack and the heap with symbolic information. The tool itself is written in Java and uses JConstraints [5] for encoding SMT problems. Moreover, JConstraints acts as a frontend to an SMT solver (e.g., Z3 [3]) used for finding concrete values that drive the analysis.

Figure 1 illustrates the architecture of JDart. The tool consists of three layers: Concrete analysis frontends make up the top layer (e.g., generation of method summaries, generation of test suites, assertion checking). The main components record and analyze execution paths (Explorer) and perform concolic execution (Executor). The Executor uses concolic implementations of bytecode instructions. These bytecodes are executed instead of the original JPF bytecodes. A concolic bytecode tracks the symbolic representation of a value and annotates a concrete value with its symbolic counterpart. Whenever execution takes a branching decision based on a concrete value with a symbolic annotation, the symbolic value is added to the constraints tree maintained by the Explorer. A constraint solver is used for finding concrete values that drive execution along unexplored paths of the tree.

Leveraging the modular architecture of JDart and JConstraints, we implemented a meta-constraint solver for finding small concrete values for symbolic numeric variables. This allows JDart to find assertion violations faster and with less resource consumption in cases where a symbolic variable controls the number or length of execution paths (e.g., symbolic array size or a symbolic loop bound). The meta-constraint solver performs multiple calls to an SMT solver, adding successively weaker bounds to numeric variables. E.g., for a path constraint ϕ over symbolic numeric variable x, the solver adds bounds (−z ≤ x) ∧ (x ≤ z) with z ∈ (1, 2, 3, 5, 8, 13, 21, ...), i.e., the first numbers in the Fibonacci sequence. If the solver finds a model for the constraint, JDart uses this model for driving concolic execution. In case no model is found in a fixed number of attempts, the SMT solver is called without added bounds. The number of attempts is a configuration parameter of JDart and was fixed to 7 for SV-COMP 2020.
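The bounding strategy can be illustrated with a minimal sketch. It is not JDart's implementation: the names `fibonacci_bounds`, `toy_solve`, and `bounded_solve` are hypothetical, and a brute-force search over a finite integer range stands in for the SMT solver (a real solver would search the whole domain in the unbounded fallback call).

```python
from itertools import product
from typing import Callable, Dict, Iterator, Optional, Sequence

Model = Dict[str, int]
Constraint = Callable[[Model], bool]


def fibonacci_bounds(attempts: int) -> Iterator[int]:
    """Yield the bounds 1, 2, 3, 5, 8, ... for successive solver calls."""
    a, b = 1, 2
    for _ in range(attempts):
        yield a
        a, b = b, a + b


def toy_solve(phi: Constraint, names: Sequence[str],
              bound: Optional[int]) -> Optional[Model]:
    """Stand-in for one SMT call (assumption: brute force, not a real solver).

    With a bound z, every variable x is restricted to -z <= x <= z.
    With bound None, the toy solver falls back to a larger fixed range.
    """
    limit = 200 if bound is None else bound
    for values in product(range(-limit, limit + 1), repeat=len(names)):
        model = dict(zip(names, values))
        if phi(model):
            return model
    return None


def bounded_solve(phi: Constraint, names: Sequence[str],
                  attempts: int = 7) -> Optional[Model]:
    """Try successively weaker bounds, then solve without added bounds."""
    for z in fibonacci_bounds(attempts):
        model = toy_solve(phi, names, z)
        if model is not None:
            return model
    return toy_solve(phi, names, None)


if __name__ == "__main__":
    print(list(fibonacci_bounds(7)))                    # [1, 2, 3, 5, 8, 13, 21]
    print(bounded_solve(lambda m: m["x"] > 4, ["x"]))   # {'x': 5}
```

With the path constraint x > 4, the first three bounded calls fail and the call with z = 5 yields x = 5, a small value that keeps loops or arrays controlled by x short.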

The analysis performed by JDart can be bounded by termination strategies. When checking assertions, the termination strategy is to stop at the first occurrence of an assertion violation. Additional strategies could bound the depth of the symbolic analysis, bound the runtime, or terminate on specific errors. We refer the reader to [7] for a more detailed and complete discussion of the features of JDart.

## 3 Strengths and Weaknesses

JDart scored 524 points (max. of 602) in the Java track and placed third, behind JBMC (527 points) [2] and Java Ranger (549 points) [9]. All other tools scored considerably fewer points than JDart (the next best is COASTAL [10] with 472). Like Java Ranger and JBMC, JDart did not report a single incorrect verdict. JDart exhibits the general strengths and weaknesses of dynamic and symbolic analysis approaches for Java programs:


Unbounded Behavior. Based on principles of symbolic execution, JDart does not terminate on unbounded loops or in case of unbounded recursion, leading to a number of timeouts on the corresponding set of benchmarks.

## 4 Tool Setup

The source code of JDart used for the competition artifact [8] is available on GitHub<sup>1</sup>. JDart is designed as a plug-in to JPF and relies on ant as a build system. One of its dependencies is the jpf-core project [11]. The other dependency is the JConstraints library, which was configured to use Z3 [3] with incremental solving as a constraint solver for SV-COMP 2020.

For the competition, JDart is wrapped by the run-jdart.sh shell script which generates .jpf configuration files, specifying which benchmark to analyze and the global configuration options to JDart: For SV-COMP 2020 all termination criteria except for assertion violations are disabled, executing JDart as an almost unbounded assertion checker (the only bound in place is an upper bound of 127 on maximal length of String variables). The shell script records and interprets the output of JDart and can also report the version of JDart.

## 5 Software Project

The version of JDart that was used in SV-COMP 2020 is maintained by the Automated Quality Assurance Group at Technical University of Dortmund (in particular by the authors of this paper) and is available under the Apache License, version 2.0, on GitHub<sup>1</sup>. An initial version of JDart was developed by the authors of [7] at NASA Ames Research Center and Carnegie Mellon University. The original version of JDart is available on GitHub<sup>2</sup>.

Acknowledgments. We are grateful for the work on JDart and JConstraints by the respective original authors. Our success would not have been possible without their contributions.

## References

1. Beyer, D.: Advances in automatic software verification: SV-COMP 2020. In: Proc. TACAS (2). LNCS 12079, Springer (2020), https://www.sosy-lab.org/research/pub/2020-TACAS.Advances\_in\_Automatic\_Software\_Verification\_SV-COMP\_2020.pdf

<sup>1</sup> https://github.com/tudo-aqua/jdart, Commit c7e30a29b98a69df2c7c96ae39b90ba0fe00e204

<sup>2</sup> https://github.com/psycopaths/jdart


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Map2Check: Using Symbolic Execution and Fuzzing (Competition Contribution)

Herbert Rocha<sup>1,-</sup>, Rafael Menezes<sup>3</sup>, Lucas C. Cordeiro<sup>2</sup>, and Raimundo Barreto<sup>3</sup>

<sup>1</sup> Department of Computer Science, Federal University of Roraima, Roraima, Brazil, herbert.rocha@ufrr.br <sup>2</sup> Department of Computer Science, University of Manchester, Manchester, United Kingdom <sup>3</sup> Institute of Computing, Federal University of Amazonas, Amazonas, Brazil

Abstract. Map2Check is a software verification tool that combines fuzzing, symbolic execution, and inductive invariants. It automatically checks safety properties in C programs by adopting source code instrumentation to monitor data (e.g., memory pointers) from the program's executions using LLVM compiler infrastructure. For SV-COMP 2020, we extended Map2Check to exploit an iterative deepening approach using LibFuzzer and Klee to check for safety properties. We also use Crab-LLVM to infer program invariants based on reachability analysis. Experimental results show that Map2Check can handle a wide variety of safety properties in several intricate verification tasks from SV-COMP 2020.

## 1 Overview

Fuzzing involves providing random data as input to a program and then checking for crashes. By contrast, path-based symbolic execution is an entirely static method that symbolically explores the program state-space [1]. Due to a focus on single runs, fuzzing techniques scale up relatively well. Path-based symbolic execution gives more confidence in the verification results, but it suffers from the path-explosion problem, which limits its scalability. Here we exploit an iterative approach using fuzzing and symbolic execution to implement a tool named Map2Check v7.3.1. Our main original contributions include: (i) use LibFuzzer [7] to provide random data as input to C programs to quickly expose "shallow" bugs, i.e., those that do not require complex data input; (ii) implement a new runtime library and instrumentation approach to monitor for crashes, failing built-in assertions, and pointer safety; (iii) adopt Crab-LLVM [11] to infer invariants; (iv) exploit a sequential approach with LibFuzzer and KLEE [3] to check safety properties in a novel way; and (v) adopt MetaSMT as a wrapper around various SMT solvers, e.g., Boolector [2] and Yices [4], previously not supported by our tool. The SV-COMP'20 results show that Map2Check can be useful in both falsifying and proving reachability error and pointer safety-related properties.

<sup>-</sup>Jury member

## 2 Verification Approach

Map2Check uses compiler techniques to analyze C programs using the LLVM compiler infrastructure, thereby tracking pointer addresses and variable assignments in the LLVM bitcode [8]. In order to hold all values used in the analysis, a container API is employed in Map2Check. The tool also generates *built-in* assertions and checks them with an approach that combines fuzzing (to falsify properties) and symbolic execution (to prove correctness). Fig. 1 illustrates the Map2Check flow, which has the following main steps: (i) convert the C code into the LLVM IR using Clang [5]; (ii) simplify the code via constant propagation and dead code elimination after the code instrumentation; (iii) apply further Clang optimizations (e.g., canonicalize natural loops and promote memory to register); (iv) add Map2Check library functions to check the analyzed LLVM bitcode; (v) generate inputs for Map2Check instrumented functions by executing LibFuzzer and then KLEE with Crab-LLVM; and (vi) generate the witness file by identifying each basic block executed in the control-flow graph of the LLVM IR.

Fig. 1. Map2Check Verification Flow.

In order to explore the program states and to generate inputs for the Map2Check instrumented functions, the LibFuzzer implementation works by creating a custom entry point, which takes an array of bytes (of type uint8\_t). Thus, our implementation consists of generating concrete values for the non-deterministic inputs that are our fuzz targets. Additionally, we run multiple LibFuzzer processes in parallel, where *N* fuzzing jobs should run to completion, i.e., until a bug is found or time/iteration limits are reached. Our fuzzing is coverage-guided (e.g., clang coverage), i.e., it tries to maximize the code coverage of a program. In our case, we adopted the inline-8bit-counters option of LLVM's built-in code coverage instrumentation (SanitizerCoverage), with which the compiler inserts inline counters that are incremented on every edge.

The KLEE implementation works by creating a variable for the used data type, makes it symbolic, and then returns its value. As a result, KLEE produces concrete inputs for different program executions. We extend our KLEE implementation by adopting MetaSMT [6], which is an Embedded Domain Specific Language for SMT solvers. The API provided by MetaSMT is translated at compile-time, through template metaprogramming, into the native APIs provided by the SMT solvers [9]. Therefore, the overhead introduced by MetaSMT is small.

In order to improve the KLEE core solver execution, KLEE is run with: a counterexample caching solver, which can be used to avoid calling the underlying solver in certain situations; and MetaSMT, which is employed to construct expressions that will be cached for each constraint to facilitate expression reuse. Note that symbolic execution often requires concrete solutions for satisfiable queries (e.g., before calling an external function, all symbolic bytes need to be replaced by concrete values), and it benefits from simplifying constraints and reusing query results [9]. Therefore, the KLEE cache solver is an important optimization, in particular the counterexample cache, which is based on the observation that many constraint sets are in a subset/superset relation.
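The subset/superset observation can be made concrete with a small sketch. It is not KLEE's code: the class `CounterexampleCache` and its methods are hypothetical, and constraints are treated as opaque strings. The sketch relies on two facts: a query that contains an unsatisfiable constraint set is itself unsatisfiable, and an assignment that satisfies a constraint set also satisfies every subset of it.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Assignment = Dict[str, int]
Lookup = Tuple[str, Optional[Assignment]]


class CounterexampleCache:
    """Toy cache mapping constraint sets to earlier solver results."""

    def __init__(self) -> None:
        self._sat: Dict[FrozenSet[str], Assignment] = {}
        self._unsat: Set[FrozenSet[str]] = set()

    def store(self, constraints: Iterable[str],
              assignment: Optional[Assignment]) -> None:
        """Record a solver result; assignment None means unsatisfiable."""
        key = frozenset(constraints)
        if assignment is None:
            self._unsat.add(key)
        else:
            self._sat[key] = assignment

    def lookup(self, constraints: Iterable[str]) -> Lookup:
        """Return ("unsat", None), ("sat", assignment), or ("miss", None)."""
        query = frozenset(constraints)
        # A cached unsatisfiable subset makes the whole query unsatisfiable.
        for cached in self._unsat:
            if cached <= query:
                return "unsat", None
        # A cached satisfiable superset yields a solution for the query.
        for cached, assignment in self._sat.items():
            if cached >= query:
                return "sat", assignment
        return "miss", None


if __name__ == "__main__":
    cache = CounterexampleCache()
    cache.store({"x > 5", "x < 3"}, None)
    cache.store({"y > 0", "y < 10"}, {"y": 1})
    print(cache.lookup({"x > 5", "x < 3", "z == 0"}))  # ('unsat', None)
    print(cache.lookup({"y > 0"}))                     # ('sat', {'y': 1})
```

Only a "miss" requires a call to the underlying solver, whose result would then be stored for later queries.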

To check the unreachability of an error location, we reduced the number of states in the analyzed program to be explored by supplying invariants to the back-end solvers. We adopted Crab-LLVM [11] to infer inductive invariants as constraints to the error location. The invariants are automatically introduced into the program as assumptions (before verification), and then KLEE receives the code as input. Crab-LLVM is a static analyzer that employs an abstract interpretation engine over LLVM bitcode based on the Crab library, which uses abstract domains such as intervals, octagons, and polyhedra. Crab is built on top of IKOS<sup>1</sup> (Inference Kernel for Open Static Analyzers) to support a collection of abstract domains and fixpoint iterators.

## 3 Software Architecture

Map2Check v7.3.1 is implemented as a source-to-source transformation tool in C/C++ using LLVM (v6.0). Map2Check uses Clang (v6.0) as a front-end to parse a C program and to generate the respective LLVM bitcode, which is used in the code transformation to track pointers and variable assignments. It uses LibFuzzer [7] (v6.0) and KLEE [3] (v2.0, as the symbolic execution engine) to automatically produce inputs that execute different program paths. MetaSMT (v4.rc2) provides the API to the reasoning engines. For SV-COMP'20, we adopted Yices (v2.5.1), which is used by KLEE to check constraints over bit-vectors and arrays and which substantially improved our results. Crab-LLVM [11] is used in reachability mode to infer inductive invariants for LLVM bitcode.

## 4 Strengths and Weaknesses of the Approach

Map2Check analyzed intricate verification tasks. The tool achieved 2nd place in the ReachSafety-Arrays subcategory; in the ReachSafety-BitVectors category, Map2Check achieved a score of 46, thereby presenting better results than Pinaka, UKojak, VeriFuzz, and DIVINE. In other subcategories, our tool generated correct-unconfirmed and incorrect true results. These results are, in part, explained by bugs in Map2Check's witness generation and by its limited ability to handle the overapproximating invariants produced by Crab-LLVM. We are investigating how to extend our tool by combining the data from fuzzing with KLEE as program assumptions using template invariants.

In the MemSafety category, Map2Check achieved a score of −68. However, our tool achieved notable results in comparison with state-of-the-art tools, e.g., in the MemSafety-Heap subcategory it achieved a score of 174, which outperforms UAutomizer, ESBMC, DIVINE, and CBMC. Most incorrect results are, in part, explained by bugs in the pointer tracking of our memory model, which could be improved by a trace semantics with program optimizations as relations on sets of the trace. Unfortunately, in the NoOverflows category, the score was −89. The incorrect results are, in part, explained by bugs in the overflow analyzer. One way to improve this result is by combining the CPU flag postcondition test (LLVM supports several intrinsic functions, e.g., an add operation returns a structure with the result and overflow flag) with Sanitizers checking.

<sup>1</sup> https://ti.arc.nasa.gov/opensource/ikos/

## 5 Tool Setup and Configuration

In order to run our map2check-wrapper.py script [10],<sup>2</sup> one must set the property file (-p) and the verification task; it provides as result: *TRUE + Witness*, *FALSE + Witness*, or *UNKNOWN*. For each error-path or correctness witness, a file (called witness.graphml) with the witness proof is generated in the Map2Check root-path folder. The dependencies, e.g., Clang and Yices tools, are included in the Map2Check distribution. The BenchExec tool info module is named map2check.py and Map2Check participates in SV-COMP'20 (as in the map2check.xml benchmark definition) in the following categories: ReachSafety-Arrays, ReachSafety-BitVectors, ReachSafety-ControlFlow, ReachSafety-Heap, ReachSafety-Loops, ReachSafety-Recursive, MemSafety, and NoOverflows.

## 6 Software Project

Map2Check v7.3.1 <sup>3</sup> is open source software distributed under the GPL license. We provide instructions for building Map2Check from the source in the file README (including the description of all dependencies). Map2Check is a joint project with the Federal University of Roraima and the Federal University of Amazonas in Brazil.

## References


<sup>2</sup> https://gitlab.com/sosy-lab/sv-comp/archives-2020/blob/master/2020/map2check.zip

<sup>3</sup> https://github.com/hbgit/Map2Check



## PredatorHP Revamped (Not Only) for Interval-Sized Memory Regions and Memory Reallocation (Competition Contribution) *-*

Petr Peringer, Veronika Šoková<sup>--</sup>, and Tomáš Vojnar

Brno University of Technology, Faculty of Information Technology, Centre of Excellence IT4Innovations, Czech Republic

Abstract. This paper concentrates on improvements of the PredatorHP shape analyzer in the past two years, including, e.g., improved handling of interval-sized memory regions and new support for memory reallocation. The paper characterizes PredatorHP's participation in SV-COMP 2020, pointing out its strengths and weaknesses and the way they were influenced by the latest changes in the tool.

## 1 Verification Approach and Software Architecture

We first briefly recap the main ideas behind PredatorHP and then discuss significant improvements that have been done in the tool in the past two years.

## 1.1 The Predator Shape Analyzer

Predator is implemented using C++ and the Boost libraries as a GCC plug-in on top of the Code Listener framework [2], which we recently upgraded to work with GCC 7.4.0. Moreover, as shown below, we extended Code Listener by adding a type analysis phase before the compiled code is passed to the shape analysis implemented in Predator. In case a memory safety property is to be checked and there are no complex types, such as structures, unions, arrays, strings, or pointers in the program under analysis (including possibly unreachable code), we directly assume the program to be memory safe.

<sup>-</sup> This work was supported by the Czech Ministry of Education, Youth and Sports within the IT4Innovations Excellence in Science (NPUII) project No. LQ1602.

<sup>--</sup> Jury member, email: isokova@fit.vutbr.cz.

The main aim of Predator is *shape analysis* of sequential C programs that use low-level C pointer statements to implement various kinds of lists (singly- or doubly-linked, possibly circular, nested, and/or shared). Predator looks for various *memory-related errors* (invalid pointer dereferences, double free operations, memory leaks, etc.), and it also checks validity of *assertions* present in the code. Predator uses *abstract interpretation* based on the domain of *symbolic memory graphs* (SMGs) [1]. Predator abstracts uninterrupted sequences of singly- or doubly-linked memory regions into appropriate kinds of list segments. Further, Predator abstracts numerical values (either values stored in memory regions, sizes of the regions, or offsets of pointers) using intervals with constant bounds. The constants used as the bounds have a pre-defined maximum/minimum value defined in the configuration of Predator (+32/-32 for SV-COMP'20). If the maximum/minimum value is exceeded, the bound is set to plus or minus infinity. Predator uses *summaries* to speed up analysis of programs structured into functions. Recursive programs are, however, analysed up to a given call depth only.
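The interval abstraction with limited constant bounds can be sketched as follows. This is an illustration only, not Predator's code: `Interval`, `clamp`, and `join` are hypothetical names, and the sketch assumes that any bound outside the range from −32 to +32 is replaced by the infinity that keeps the interval an over-approximation.

```python
import math
from dataclasses import dataclass

LIMIT = 32  # bound limit reported for SV-COMP'20 (+32/-32)


@dataclass(frozen=True)
class Interval:
    lo: float  # lower bound, possibly -math.inf
    hi: float  # upper bound, possibly math.inf


def clamp(iv: Interval, limit: int = LIMIT) -> Interval:
    """Replace bounds outside [-limit, limit] by minus/plus infinity."""
    lo = iv.lo if -limit <= iv.lo <= limit else -math.inf
    hi = iv.hi if -limit <= iv.hi <= limit else math.inf
    return Interval(lo, hi)


def join(a: Interval, b: Interval, limit: int = LIMIT) -> Interval:
    """Smallest interval covering both arguments, with limited bounds."""
    return clamp(Interval(min(a.lo, b.lo), max(a.hi, b.hi)), limit)


if __name__ == "__main__":
    print(clamp(Interval(0, 40)))                  # Interval(lo=0, hi=inf)
    print(join(Interval(0, 5), Interval(-40, 3)))  # Interval(lo=-inf, hi=5)
```

Limiting the constants that may appear as bounds keeps the number of distinct intervals finite, at the price of losing precision for larger values.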

PredatorHP, i.e., the *Predator Hunting Party* [3,4], whose flow of control is shown on the right, is implemented as a Python script and is used to increase the efficiency and precision of the analysis. Namely, PredatorHP runs the base *Predator verifier* in parallel with several *Predator hunters* that do not use the list-segment abstraction, do not join semantically different SMGs, and do not use function summaries with matching of call parameters based on SMG entailment. While the Predator verifier can claim a program correct, it does not report errors, in order to avoid false alarms caused by abstraction. Predator hunters are classified as *breadth-first* (BFS) and *depth-first* (DFS). The DFS hunters have a limit on the search depth defined as a certain number of GCC's GIMPLE instructions. The hunters can normally only report errors. The only exception is when the verified program has a finite state space that is fully explored by the BFS hunter in the given time limit.
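The division of labour between the verifier and the hunters can be summarized by the following sketch of how their answers might be combined. It is not the actual PredatorHP script, which reacts to whichever process finishes first; the function `combine_verdicts` and the encoding of results as the strings "true", "false", and "unknown" are assumptions made for illustration.

```python
from typing import Dict


def combine_verdicts(verifier: str, hunters: Dict[str, str]) -> str:
    """Combine results of the verifier and the hunters into one verdict.

    Each result is "true" (no error found), "false" (error found), or
    "unknown". Hunter keys are labels such as "BFS" or "DFS 900".
    """
    # Hunters work without abstraction, so an error they find is trusted.
    if any(result == "false" for result in hunters.values()):
        return "false"
    # The verifier over-approximates: its "true" is trusted, while its
    # "false" may be a false alarm and is therefore ignored.
    if verifier == "true":
        return "true"
    # The BFS hunter may claim correctness only when it has explored the
    # whole (finite) state space; depth-limited DFS hunters never may.
    if hunters.get("BFS") == "true":
        return "true"
    return "unknown"


if __name__ == "__main__":
    idle = {"BFS": "unknown", "DFS 900": "unknown", "DFS 1900": "unknown"}
    print(combine_verdicts("false", idle))                          # unknown
    print(combine_verdicts("unknown", {**idle, "DFS 900": "false"}))  # false
```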

In SV-COMP'20, based on empirical data, the BFS hunter does not use Predator's *VarKiller*, which removes dead variables from SMGs. This led to a significant speedup on 5 verification tasks (and some slowdown on 3 tasks). Further, the most shallow DFS 200 hunter, searching up to the depth of 200 instructions and used in PredatorHP up to SV-COMP'19, was removed as it did not bring any advantage over the DFS 900 hunter, and a DFS 1900 hunter was added to handle more complex tasks (in particular, memsafety-ext2/split\_list\_test05-1 and ntdrivers/floppy.i.cil-3). However, note that the DFS 900 hunter remains needed as otherwise 11 verification tasks would time out.

## 1.2 Recent Modifications of PredatorHP

One of the main improvements of the latest version of Predator is that its SMG-based analysis has been extended to support *memory reallocation* on the heap. If a reallocation operation is executed on an SMG, two new SMGs are produced. The first one models the case when a new object of the required size is created, data from the old object are copied into the new object, and the old object is freed. In the second case, the existing object is resized. If the size decreases, Predator checks that no memory leak happens because some pointer field is removed or invalidated (in case it is partially removed).

Another improvement concerns working with *interval-sized memory regions*, which arise when allocating structures or arrays of parametric size. Although even older versions of Predator were able to create such regions, the way in which they could be treated in the subsequent analysis of the program was very limited. In particular, it was impossible to dereference interval-sized regions, and hence Predator was very weak when analysing programs with structures or arrays whose size is not fixed in advance. This situation was first improved for SV-COMP'19 in the following pragmatic way.

Namely, whenever Predator hits a conditional statement that would previously yield an interval value with fixed bounds (such as the statement if (n>=0 && n<10) for so-far unconstrained n), it splits the further analysis into as many branches as there are values in the interval, each of them continuing with one concrete value from the interval. After the split, no further interval-based allocations and dereferences, which the previous version of Predator used to fail on, happen. In order for the splitting not to cause a memory explosion, the latest version of Predator contains a parameter that controls the maximum size of split intervals, which was set to 300 in SV-COMP'20.
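A minimal sketch of the splitting step follows. The function `split_interval` is hypothetical, and two details are assumptions rather than facts from the text: the limit of 300 is treated as inclusive, and an interval that is too large is simply left unsplit (signalled by `None`).

```python
from typing import List, Optional

MAX_SPLIT_SIZE = 300  # maximum size of split intervals used in SV-COMP'20


def split_interval(lo: int, hi: int,
                   max_size: int = MAX_SPLIT_SIZE) -> Optional[List[int]]:
    """Return one concrete value per analysis branch for the interval [lo, hi].

    Returns None when the interval has more than max_size values, in
    which case the caller keeps the interval value instead of splitting.
    """
    size = hi - lo + 1
    if size > max_size:
        return None
    return list(range(lo, hi + 1))


if __name__ == "__main__":
    # if (n >= 0 && n < 10) constrains n to the interval [0, 9].
    print(split_interval(0, 9))     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(split_interval(0, 1000))  # None
```

For the condition from the text, the analysis continues in ten branches, one for each value of n from 0 to 9.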

The above modification of Predator concerned dealing with memory regions whose size is given by an interval with finite bounds. In case one of the bounds is infinite, Predator has been extended to *sample* the interval and perform the further analysis with the sampled values. Currently, the sampling is done simply by taking some number of concrete values from the given interval starting/ending with the bound that is fixed (of course, for memory regions, only unboundedness from above makes sense). The number of considered samples is currently set to 3. Of course, this strategy cannot be used to soundly verify correctness of programs, and so it is used for detecting bugs only.
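The sampling step can be sketched as below. The function `sample_half_bounded` is hypothetical, and the choice of consecutive values next to the fixed bound is an assumption; the text only states that some number of values is taken starting or ending with that bound.

```python
from typing import List, Optional

SAMPLE_COUNT = 3  # number of samples currently used according to the text


def sample_half_bounded(lo: Optional[int], hi: Optional[int],
                        count: int = SAMPLE_COUNT) -> List[int]:
    """Pick concrete values from an interval with exactly one finite bound.

    A bound of None stands for minus infinity (lo) or plus infinity (hi).
    """
    if lo is not None and hi is None:
        # Unbounded from above: start at the lower bound and go up.
        return [lo + i for i in range(count)]
    if lo is None and hi is not None:
        # Unbounded from below: start at the upper bound and go down.
        return [hi - i for i in range(count)]
    raise ValueError("exactly one bound must be finite")


if __name__ == "__main__":
    print(sample_half_bounded(8, None))  # [8, 9, 10]
```

Because only a few values are analysed, a run that finds no error says nothing about the remaining values, which matches the restriction of this strategy to bug detection.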

Although the above-mentioned treatment of intervals was primarily designed for dealing with interval-sized memory regions, it can help in other cases of dealing with integers too. Namely, it can help both when dealing with integer data and when dealing with interval-based pointer offsets.

Next, we have implemented checking whether all dynamically allocated memory has been deallocated when a function with the *noreturn attribute* (such as abort or exit) is called. The implementation simply searches the SMG representing the memory at the moment of a call of a noreturn function and checks that it does not contain any valid dynamically allocated object.

We have also added support for the *clobber* instruction of GIMPLE, which terminates the lifetime of local variables of code blocks. Upon this instruction, Predator now marks the concerned memory region as deallocated, allowing it to detect *invalid dereferences* of *objects local to a block* from outside of the block. Further, we have added support for the *modulo* and *bitwise-or* instructions and created models of the standard library functions strcmp and realloc. This fixed several problems such as reporting false alarms when assigning fully-overlapping structures.

Finally, we improved the generation of witnesses. Apart from some bug fixes, we changed the trace generation for the reachability category. Namely, in this category, if some trace ends with an error other than calling \_\_VERIFIER\_error, the analysis recovers and continues to search for other traces.

## 2 Strengths and Weaknesses

The main strength of PredatorHP is that it treats code with various kinds of unbounded lists in a *sound* and *efficient* way. Predator hunters then allow it to quickly handle programs with a small finite state space (e.g., benchmarks from list-simple) and avoid many false alarms that could otherwise happen. Interestingly, among the 328 correct tasks in *ReachSafety-Heap*, *MemSafety-Heap*, and *MemSafety-LinkedLists*, only 98 use unbounded data structures, out of which the Predator verifier (and, of course, no hunter) handles 56 %. Next, out of the 328 tasks, 83 do not use linked data structures nor arrays, and 147 use them but are finite-state. The Predator verifier and the BFS hunter handle 93 % of the 83 tasks that are so trivial that even the verifier does not use any abstraction. Out of the 147 tasks, 53 tasks are handled by both of them, while 2 tasks are handled solely by the verifier and 75 solely by the BFS hunter.

A weakness of Predator is that it specialises in dealing with lists, and so it handles structures such as trees, skip-lists, or arrays in a bounded way, i.e., for error detection, only. Another weakness of Predator has traditionally been its weak treatment of nonpointer data. We have tried to improve on the latter weakness by the described heuristics for dealing with intervals of integers with a specific aim to improve the way Predator handles memory regions of parametric size. The results of PredatorHP on SV-COMP'20 benchmarks with arrays show that the heuristics did help. Indeed, the interval sampling heuristic allowed us to correctly detect 10 errors in tasks from array-memsafety, array-examples, and loops. Moreover, the interval-splitting heuristic also helped on some benchmarks for dealing with interval-based sizes, offsets, and/or integer data. Namely, it removed 8 unknown results in *ReachSafety* and 4 such results in *MemSafety*.

The new type analysis looking for the presence of complex types allowed Predator to skip its main analysis loop in 77 tasks in the *MemSafety* category, of which 13 tasks (from termination-crafted) contain recursion, which Predator could not handle, and 6 tasks (from locks) would otherwise time out. Due to the new support for reallocation, Predator verifies all tasks containing a call of realloc. Due to the added support for clobber instructions, Predator detects invalid memory accesses in benchmarks accessing variables outside of the block in which they were created. All other new improvements described above also helped in some cases and allowed PredatorHP to win 1st place in the *MemSafety* category and in the *ReachSafety-Heap* sub-category.

## 3 Contributors, Software Project, and the Tool Setup

The main author of Predator is Kamil Dudka. Besides him and the PredatorHP team, Petr Müller, Michal Kotoun, and numerous other people listed in the docs/THANKS file in the distribution of Predator have contributed to Predator.

Predator is an open source software project distributed under GNU GPLv3. The source code used in SV-COMP'20 is available too<sup>1</sup>. The README-SVCOMP-2020 file shipped with it describes how to build the tool. The script predatorHP.py serves to run the tool, taking a verification task file as a single positional argument. Paths to both the property file and the desired witness file are accepted via long (named) command-line options. The verification outcome is printed to the standard output. To run PredatorHP in the BenchExec environment, the predatorhp.py wrapper and the predatorhp.xml benchmark definition can be used. In SV-COMP'20, PredatorHP participated in the *MemSafety* category and in the *ReachSafety-Heap* sub-category.

<sup>1</sup> http://www.fit.vutbr.cz/research/groups/verifit/tools/predator-hp

## References



## **Symbiotic 7: Integration of Predator and More**<sup>-</sup> **(Competition Contribution)**

Marek Chalupa<sup>1,--</sup>, Tomáš Jašek<sup>1</sup>, Lukáš Tomovič<sup>1</sup>, Martin Hruška<sup>2</sup>, Veronika Šoková<sup>2</sup>, Paulína Ayaziová<sup>1</sup>, Jan Strejček<sup>1</sup>, and Tomáš Vojnar<sup>2</sup>

<sup>1</sup> Masaryk University, Brno, Czech Republic <sup>2</sup> Brno University of Technology, FIT, IT4Innovations Centre of Excellence, CZ, Brno, Czech Republic

**Abstract.** Symbiotic 7 brings improvements in all parts of the tool. In particular, we integrated the advanced shape analysis implemented in Predator to our instrumentation process for memory safety checking. Further, we extended our slicer to correctly handle non-terminating programs. This new slicing is applied in termination analysis, where we also added instrumentation for detection of simple cycles in the program state space. The witness generation process changed as well.

## **1 Verification Approach**

Symbiotic 7 follows the same basic schema as all previous versions [4,5]: the program to be verified is first instrumented (if needed), then reduced by static program slicing, and finally symbolically executed using Klee [2]. We describe the main modifications since Symbiotic 5 (participating in SV-COMP 2018) as modifications in Symbiotic 6 (competing in 2019) have not been published.

**Memory safety checking improvements** Symbiotic uses a static pointer analysis to detect instructions that can potentially violate memory safety. To check these instructions, Symbiotic 5 [5,3] instrumented the program with code that keeps records about allocated memory and uses the records to assert the validity of potentially misbehaving instructions. Then we sliced the program with respect to these assertions and called Klee to check assertion validity.

Since Symbiotic 6, we slice the program directly with respect to the potentially misbehaving instructions without inserting any additional code. Then we call Klee to check memory safety of the sliced program.

<sup>⋆</sup> M. Chalupa, T. Jašek, P. Ayaziová, and J. Strejček have been supported by the Czech Science Foundation grant GA18-02177S. M. Hruška, V. Šoková, and T. Vojnar have been supported by the IT4Innovations Excellence in Science project (LQ1602) and the FIT BUT internal project FIT-S-20-6427.

<sup>⋆⋆</sup> Jury member and corresponding author: chalupa@fi.muni.cz.

Symbiotic 7 newly integrates Predator [6], a static analyzer specialized in memory safety. We first run Predator in its over-approximating mode and in a configuration that analyses all branches in the given program and tries to recover from found errors. If Predator says that the program is safe, we simply answer *true*. Otherwise, we take bug reports from Predator and combine them with the results of our static pointer analysis to get a more precise (i.e., smaller) set of potentially misbehaving instructions. Then we proceed as in Symbiotic 6.
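The decision logic of this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Symbiotic's code: the function name, the representation of instructions as strings, and in particular the use of set intersection to "combine" the two results are hypothetical.

```python
def select_targets(predator_safe, predator_reports, pointer_suspects):
    """Return (verdict, targets) for the memory safety pre-analysis.

    predator_safe    -- True if the over-approximating Predator run found no error
    predator_reports -- instructions mentioned in Predator's bug reports
    pointer_suspects -- instructions flagged by the static pointer analysis
    """
    if predator_safe:
        # Predator over-approximates, so "safe" can be answered directly.
        return "true", set()
    # Assumption: "combining" keeps only instructions flagged by both analyses.
    targets = set(pointer_suspects) & set(predator_reports)
    # No verdict yet: the targets would be handed to slicing and Klee.
    return None, targets
```

With these assumptions, a Predator report on instructions `i1` and `i3` and a pointer-analysis result of `i1` and `i2` would leave only `i1` to be checked by symbolic execution.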

Symbiotic 7 is also the first version that can distinguish between *valid-memcleanup* and *valid-memtrack* properties. To do this, our clone of Klee now reconstructs the shape of memory at the program exit if unfreed memory is found: Klee starts with local and global variables and resolves pointers in these (if any). Then it resolves pointers in the pointed memory, etc. This way we can find out if the unfreed memory is reachable via a chain of dereferences or not.
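The reachability check described above amounts to a graph traversal from the variables through the points-to relation. The following Python sketch illustrates the idea on an explicit graph. The function names and the dictionary-based memory model are assumptions made for this example and do not reflect how Klee represents memory.

```python
from collections import deque


def reachable_objects(roots, points_to):
    """Collect everything reachable from the roots by following pointers."""
    seen = set()
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        if obj in seen:
            continue
        seen.add(obj)
        # Resolve the pointers stored in this object (if any).
        queue.extend(points_to.get(obj, ()))
    return seen


def classify_unfreed(roots, points_to, unfreed):
    """Label each unfreed object as still reachable at exit or not."""
    live = reachable_objects(roots, points_to)
    return {obj: "reachable" if obj in live else "unreachable"
            for obj in unfreed}
```

For example, with a global `g` pointing to `h1`, and `h1` pointing to `h2`, the unfreed objects `h1` and `h2` are classified as reachable, while an unfreed `h3` with no incoming pointer is unreachable. Under the usual reading of the SV-COMP properties, only the latter kind of leak would be relevant for *valid-memtrack*.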

**Termination analysis** Symbiotic 6 introduced simple support for the termination property: a call to \_\_VERIFIER\_error is inserted before trivial infinite loops, e.g., while (true); loops. If the symbolic execution detects that such a call is reachable, Symbiotic answers *false* as the program can reach an infinite loop. If all paths of the program are explored by symbolic execution without reaching any of these calls, all program executions are clearly terminating and we answer *true* (an infinite program path cannot be fully explored by symbolic execution). Note that program slicing was disabled for non-termination checking in Symbiotic 6 as the slicer could remove infinite loops in some specific cases.

Symbiotic 7 brings two improvements. First, since we extended our slicer to correctly handle non-terminating programs [7], we now apply slicing with slicing criteria set to all exit points (including the instrumented error calls) of the program. Second, we instrument the program with checks for simple cycles in the state space. The instrumentation detects non-nested loops with a single entry for which it can conservatively determine a set $\{V_1, \dots, V_k\}$ that includes all variables potentially modified by the loop. At the beginning of the loop body, we insert assignments that store the value of each variable $V_i$ into a new variable $V'_i$. At the end of the loop body, we insert the assertion $\mathtt{assert}(V_1 \neq V'_1 \lor \dots \lor V_k \neq V'_k)$ to check a change in the vector of these variables. If this assertion is violated, the program has a non-terminating execution.
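The effect of this instrumentation can be illustrated by a small Python simulation of an instrumented loop. It is a sketch under simplifying assumptions: the loop state is a dictionary, the loop is executed concretely instead of symbolically, and all names (`run_instrumented_loop`, the example bodies, the iteration budget) are hypothetical.

```python
def run_instrumented_loop(state, cond, body, max_iterations=1000):
    """Execute `while cond(state): body(state)` with the cycle check added."""
    state = dict(state)
    for _ in range(max_iterations):
        if not cond(state):
            return "terminates"
        saved = dict(state)  # V'_i := V_i at the beginning of the loop body
        body(state)
        # Instrumented assertion: some V_i must differ from its copy V'_i.
        if all(state[name] == saved[name] for name in saved):
            return "non-terminating"  # assertion violated: the state repeats
    return "unknown"  # budget exhausted without a verdict


def less_than_ten(state):
    return state["x"] < 10


def count_up(state):
    state["x"] += 1


def stuck_at_three(state):
    if state["x"] < 3:
        state["x"] += 1


def toggle(state):
    state["x"] = 1 - state["x"]
```

Starting from `x = 0` with the condition `less_than_ten`, the body `count_up` terminates, `stuck_at_three` violates the assertion once `x` reaches 3, and `toggle` is not detected because its cycle spans two iterations. The last case shows why the check covers only simple cycles.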

**Error path replay** Although the slicer in Symbiotic now provides algorithms that preserve non-termination properties of programs, outside the *Termination* category we still use the original *non-termination insensitive* slicing as it may remove more instructions. The price is, however, that Symbiotic may report false alarms: an unreachable error location situated below an infinite loop may become reachable when the loop is sliced out. To fix this issue, we try to reproduce each error found by symbolic execution in the original (unsliced) program. If the error is reproduced, we report it as a real error. Otherwise, we say *unknown*.

**Improved witness generation** Symbiotic 5 and 6 generated violation witnesses that describe only the initialization of non-deterministic variables at the beginning of the main function. Symbiotic 7, on the other hand, generates violation witnesses that contain a complete test vector, i.e., the whole sequence of values returned from \_\_VERIFIER\_nondet\_\* functions during the error path replay. To get and correctly identify all these values, we have modified our fork of Klee to support interpretation of \_\_VERIFIER\_nondet\_\* functions (and other undefined functions in general) internally. Currently, more than 99% of our violation witnesses (outside the *Termination* category) are confirmed. Symbiotic 7 still generates trivial correctness witnesses if no error is found.
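The relation between a test vector and error path replay can be illustrated with a toy model in Python. This is a sketch only: the program is modelled as a Python function that obtains its non-deterministic inputs from a callback, and the names `replay`, `ErrorReached`, and `toy_program` are hypothetical and unrelated to Klee's interface.

```python
class ErrorReached(Exception):
    """Raised by the toy program when it reaches its error location."""


def replay(program, test_vector):
    """Run the program with inputs taken from the test vector, in order."""
    values = iter(test_vector)

    def nondet():
        # Stands in for a non-deterministic input function of the program.
        return next(values)

    try:
        program(nondet)
    except ErrorReached:
        return "false"    # the error is reproduced: report a real error
    except StopIteration:
        return "unknown"  # the vector is too short for this execution
    return "unknown"      # the error was not reproduced


def toy_program(nondet):
    x = nondet()
    y = nondet()
    if x + y == 10:
        raise ErrorReached()
```

Here the vector `[3, 7]` reproduces the error, whereas `[1, 1]` does not, so the second replay ends with *unknown*.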

**Other improvements** Other improvements in Symbiotic 7 used in SV-COMP 2020 include a faster data dependence analysis (a part of slicing) and better handling of assume statements in the slicer. Symbiotic can now also continue verification if the instrumentation or the slicer crashes or exceeds its time limit. In such a case, Klee is run on the original program, which has been optimized only by standard llvm optimizations. For SV-COMP 2020, we set a time limit of 400 s on instrumentation and a time limit of 300 s on slicing.
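The fallback behaviour can be sketched as follows. The stage functions are passed in as parameters and are purely hypothetical stand-ins for the instrumentation, slicing, and optimization steps. Only the two time limits are taken from the text.

```python
def prepare_for_klee(program, instrument, slice_program, optimize,
                     instrument_limit=400, slice_limit=300):
    """Return the program that would be handed to symbolic execution.

    Each stage is assumed to raise RuntimeError on a crash and
    TimeoutError when it exceeds its time limit (given in seconds).
    """
    try:
        instrumented = instrument(program, instrument_limit)
        return slice_program(instrumented, slice_limit)
    except (RuntimeError, TimeoutError):
        # Fall back to the original program with standard optimizations only.
        return optimize(program)
```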

## **2 Software Architecture**

Symbiotic 7 is built on top of llvm 8.0.1 [8]. The tool consists of a set of modules written in C++ that process llvm bitcode, and Python scripts that chain these modules according to the given configuration.

For use in Symbiotic, we have fixed several bugs in Predator's llvm backend and ported it to llvm 8.0.1. Further, we have extended Predator to distinguish between safe and possibly erroneous program instructions.

Symbiotic uses its own fork of Klee that contains several modifications compared to the mainstream Klee. In particular, the fork has been extended to handle symbolic-sized memory allocations, to process marks delimiting the lifetime of scoped variables, to check for memory leaks, and to generate violation witnesses in the SV-COMP format.

## **3 Strengths and Weaknesses**

In SV-COMP 2020 [1], Symbiotic 7 won the *SoftwareSystems* category and placed second in the *MemSafety* category and the *FalsificationOverall* meta category. Overall, Symbiotic finished in fourth place.

The main reason for winning *SoftwareSystems* is having only a few incorrect answers. Indeed, Symbiotic did not win in the number of correct answers in any of the *SoftwareSystems* subcategories. However, we had only 4 incorrect answers and all of them in the subcategory *DeviceDriversLinux64*. This subcategory is huge and these incorrect answers have only a small impact on the weighted score.

In *MemSafety*, we took the second place after PredatorHP which executes several instances of the Predator tool with different configurations in parallel. Symbiotic calls just one of these instances as mentioned above. Additionally, PredatorHP uses gcc, while we use Predator running on llvm, which is not as mature as the former. Also, we had a number of new *unknown* answers because Klee does not support pointer comparisons, which we incorrectly did not detect in the previous versions of Symbiotic.

In general, Symbiotic's results stem from the good performance of Klee supported by efficient static analysis and slicing: the official results show that Symbiotic can decide many benchmarks very quickly.

The main weakness of our tool is the inherent complexity of symbolic execution and the limited possibility of analysing potentially unbounded loops or infinite paths with this technique. Indeed, as symbolic execution actually follows all paths in the program, it does not terminate if the program contains an unbounded loop or an infinite path (unless an error is found). Even when the number of paths is finite and all the paths are finite, symbolic execution usually runs out of resources if the number of paths is large. Although this problem is slightly alleviated by program slicing, our tool still does not scale well on complex programs.

## **4 Tool Setup and Configuration**

- --prp=file, which sets the property specification file to use,
- --witness=file, which sets the output file for the witness,
- --32, which sets the 32-bit environment,
- --help, which shows the full list of possible options.

## **5 Software Project and Contributors**

Symbiotic 6 and 7 have been developed by M. Chalupa, T. Jašek, M. Vitovská, M. Šimáček, L. Tomovič, and P. Ayaziová under the supervision of J. Strejček. Predator has been adjusted for the described integration by M. Hruška and V. Šoková under the supervision of T. Vojnar. Symbiotic and its components are available under the MIT license. The project is hosted by the Faculty of Informatics, Masaryk University. Klee, llvm, and Predator are also available under open-source licenses. The source code of the project and references to all its components can be found at:

https://github.com/staticafi/symbiotic

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


## **Ultimate Taipan with Symbolic Interpretation and Fluid Abstractions (Competition Contribution)**

Daniel Dietsch (✉), Matthias Heizmann (✉), Alexander Nutz, Claus Schätzle, and Frank Schüssele

University of Freiburg, Freiburg im Breisgau, Germany {dietsch,heizmann}@cs.uni-freiburg.de

**Abstract.** Ultimate Taipan is a software model checker that combines trace abstraction with abstract interpretation on path programs. In this year's version, we replaced our abstract interpretation engine and now use a combination of multiple abstraction functions, fixpoint computation, algebraic program analysis, and SMT solving. Our new approach will allow us to integrate new techniques more easily.

## **1 Verification Approach**

Ultimate Taipan is a software model checker that combines trace abstraction [8] and abstract interpretation [5]. Taipan's algorithm follows the trace abstraction verification scheme for reachability: it constructs an abstraction of the program as a nested word automaton (NWA). Initially, this NWA has the same graph structure as the program's interprocedural control flow graph (ICFG): its states are program locations, its transitions are labeled with program statements, and the states corresponding to error locations are accepting. Hence, the automaton recognizes a language whose symbols are statements and whose words are sequences of statements (which we call traces) that lead to an error location. If the language of the abstraction automaton is empty, no error location can be reached and the program is safe. If there is a trace in the language, the algorithm needs to determine whether it is a *feasible* trace, i.e., a trace that corresponds to an actual program execution. A feasible trace constitutes an actual counterexample, and if one is found the algorithm terminates. If an infeasible trace is found, Taipan's algorithm differs from trace abstraction: it does not analyze only the trace itself, but constructs a path program<sup>1</sup> from this trace. It then tries to synthesize inductive invariants for the whole path program [7]. From these invariants, a new automaton is constructed whose language contains only infeasible traces. The new abstraction is then constructed as the difference of the old abstraction automaton and this new automaton, i.e., the infeasible traces are removed from the old abstraction. If the path program's invariant at the error location is not *false*, the computed invariants are too weak to prove infeasibility, and Taipan falls back to using interpolating SMT solvers to compute new invariants that are strong enough to discharge the trace.
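The refinement loop of this scheme can be illustrated with a toy Python model. It is a sketch under strong simplifications: a finite set of traces stands in for the language of the abstraction automaton, set difference stands in for the automaton difference, and the feasibility check and the generalization step are supplied as hypothetical callbacks instead of SMT-based procedures.

```python
def trace_abstraction(error_traces, is_feasible, generalize):
    """Toy refinement loop over a finite set of error traces.

    error_traces -- traces (tuples of statements) accepted by the abstraction
    is_feasible  -- callback deciding whether a trace is a real execution
    generalize   -- callback returning a set of infeasible traces that are
                    excluded together with the analyzed trace
    """
    abstraction = set(error_traces)
    while abstraction:              # non-empty language: pick some trace
        trace = min(abstraction)    # deterministic choice for the example
        if is_feasible(trace):
            return "unsafe", trace  # genuine counterexample
        # Refinement: remove the trace and its infeasible generalization.
        infeasible = set(generalize(trace)) | {trace}
        abstraction -= infeasible
    return "safe", None             # empty language: no error trace is left
```

The `generalize` callback plays the role of the invariant-based automaton: the more infeasible traces it returns at once, the fewer iterations the loop needs.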

Daniel Dietsch — Jury Member

<sup>1</sup> A path program is a projection of the program to the trace.

Taipan's old algorithm used abstract interpretation to analyze path programs. In this year's iteration, we use a new approach, which is motivated by two drawbacks of our old algorithm. Firstly, extending an abstract interpretation engine with new abstract domains is labor-intensive and error-prone. Each abstract domain has an abstract post operator describing the effect program statements have on abstract states. For each abstract domain and each type of program statement, the abstract post operator has to be defined and implemented, and re-use between domains is complicated. Furthermore, each abstract domain needs its own representation of an abstract state, so that exchanging information between multiple domains requires explicit conversions. Secondly, abstract interpretation always abstracts. Because each abstract domain has its own abstract state representation, it is usually not possible to implement a precise post operator. Hence, every application of post is an abstraction, which leads to unnecessary loss of precision.

**Fig. 1:** Overview of the symbolic interpretation engine.

Our new approach is inspired by *Algebraic Program Analysis* [9, 4], the renewed interest in this technique (e.g., [6]), and *Logical Interpretation* [10]. From algebraic program analysis, we use the modularity to combine different techniques in a unifying framework. From logical interpretation, we take the idea of a shared representation of abstract program states as SMT formulas, over which abstraction operators can compute fixpoints.

An overview of our approach is depicted in Figure 1. The approach consists of two major components, the ICFG interpreter and the DAG interpreter.

The ICFG interpreter component generates for a (partial) interprocedural control flow graph (ICFG) and a subset of its program locations (locations of interest, LOI) a set of path expressions represented as RegexDAGs. A RegexDAG is a directed acyclic graph with vertices that are labeled with regular expressions over the program's statements without calls and returns but with summary and enter statements. Each RegexDAG has exactly one sink node that represents a location of interest. We use summary statements when we call to and return from a procedure on a path to a LOI, and enter statements when we do not return until we reach the LOI.

The DAG interpreter component then analyses a RegexDAG in topological order by applying different operators (Call Sum., Loop Sum., post op.) to the different vertex labels. All operators take a program state expressed as an SMT formula φ and a regular expression over program statements (i.e., a vertex label) and produce a new (possibly abstracted) program state that captures all the effects. If a vertex has multiple incoming edges, the different input states are simply joined with a logical disjunction (∨). Some of these operators depend again on the ICFG interpreter to compute their result. The most basic operator is the post operator (post op.), which computes strongest post for star-free regular expressions and optionally applies an abstraction function to the result. The choice of abstraction function and whether to apply it is governed by different heuristics that can be changed. We call these heuristics *fluids*. The other operators are the call summarization (Call Sum.) and loop summarization (Loop Sum.) operators. The call summarization operator computes a summary for a procedure call, either with or without considering the context. The loop summarization operator computes a summary for the Kleene-star operator of regular expressions. Our current implementation does this by computing a fixpoint and resolving nested loops by recursively inserting summaries. The different operators (post, call summarization, loop summarization) are completely modular and can be considered black boxes for the interplay between the two main components. When the DAG interpreter reaches the sink vertex of the RegexDAG, it returns the disjunction of this sink's input program states as the invariant for this LOI.
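The traversal performed by the DAG interpreter can be sketched in Python. The sketch makes several assumptions that are not part of Ultimate: program states are modelled as finite sets of concrete values (so that set union plays the role of the disjunction ∨), each vertex label is replaced by a Python function acting as a black-box operator, and the names `interpret_dag`, `predecessors`, and `operators` are hypothetical. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def interpret_dag(predecessors, operators, initial_state, sink):
    """Propagate states through a DAG in topological order.

    predecessors  -- maps each vertex to the list of its predecessors
    operators     -- maps each non-sink vertex to a function on states
    initial_state -- input state of vertices without predecessors
    sink          -- the single sink vertex (the location of interest)
    """
    out = {}
    sink_input = None
    for vertex in TopologicalSorter(predecessors).static_order():
        incoming = predecessors.get(vertex, ())
        if incoming:
            # Join the input states; union models the logical disjunction.
            state_in = frozenset().union(*(out[p] for p in incoming))
        else:
            state_in = frozenset(initial_state)
        if vertex == sink:
            sink_input = state_in  # the joined input is the result
        else:
            out[vertex] = frozenset(operators[vertex](state_in))
    return sink_input
```

For a diamond-shaped DAG in which one branch adds 1 and the other adds 2 to the initial value 0, the state computed for the sink is the set containing 1 and 2.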

## **2 Strengths and Weaknesses**

Our new approach is easy to extend with new abstraction functions, fluids, and loop acceleration techniques. Compared to the previous approach, we also gain much more precision by, e.g., having a reduced product between different kinds of abstraction without writing a transformation function: we can just use the logical conjunction. Using SMT formulas as the representation of program states also allows us to reuse many of Ultimate's existing tools that deal with SMT, in particular simplification, quantifier elimination, rewriting, and debugging.

Nevertheless, our current implementation is not as effective as the old one, because we did not finish porting the various abstract domains. We currently only support a basic interval abstraction and an explicit value abstraction, which severely limits the efficiency of our approach. We are also missing more intricate loop acceleration implementations and optimized fluid configurations, and our implementation does not yet support recursion.

## **3 Architecture, Setup, Configuration, and Project**

Ultimate Taipan is a part of the open-source program analysis framework Ultimate<sup>2,3</sup>, written in Java and licensed under LGPLv3<sup>4</sup>. We used Taipan version 0.1.25-f470102c in our competition submission, which is available as a .zip archive from multiple sources<sup>5,6,7</sup>. Our submission requires Java 1.8 and Python 3.x. The submission contains an executable version of Taipan for Linux platforms, the binaries of the required SMT solvers Z3<sup>8</sup>, CVC4<sup>9</sup>, and Mathsat<sup>10</sup>, as well as a Python script, Ultimate.py, which maps the SV-COMP interface to Ultimate's command line interface. Taipan is invoked with

./Ultimate.py --spec prop.prp --file input.c --architecture 32bit|64bit --full-output

where prop.prp is the SV-COMP property file, input.c is the input C file, 32bit or 64bit is the architecture, and --full-output enables verbose output to stdout. The output of Taipan is written to the file Ultimate.log. A violation [3] or correctness [2] witness may be written to the file witness.graphml. The benchmarking tool BenchExec [1] supports Taipan through the tool-info module ultimatetaipan.py<sup>11</sup>. Taipan participates in all categories, as declared in its SV-COMP benchmark definition file utaipan.xml<sup>12</sup>.
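For scripted use, the invocation above can be assembled programmatically. The helper below is a hypothetical convenience function, not part of Ultimate. It only builds the argument list from the options named in the text and assumes that it is run from the directory containing Ultimate.py.

```python
def taipan_command(property_file, input_file, architecture, full_output=True):
    """Build the argument list for invoking Taipan through Ultimate.py."""
    if architecture not in ("32bit", "64bit"):
        raise ValueError("architecture must be '32bit' or '64bit'")
    command = [
        "./Ultimate.py",
        "--spec", property_file,
        "--file", input_file,
        "--architecture", architecture,
    ]
    if full_output:
        command.append("--full-output")  # verbose output on stdout
    return command
```

The resulting list can be passed to `subprocess.run`. After the run, the log would be expected in Ultimate.log and a witness, if one is produced, in witness.graphml, as described above.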

## **References**


<sup>2</sup> https://ultimate.informatik.uni-freiburg.de

<sup>3</sup> https://github.com/ultimate-pa/ultimate

<sup>4</sup> https://www.gnu.org/licenses/lgpl-3.0.en.html

<sup>5</sup> https://gitlab.com/sosy-lab/sv-comp/archives-2020/blob/master/2020/utaipan.zip

<sup>6</sup> https://github.com/ultimate-pa/ultimate/releases/download/v0.1.25/UltimateTaipan-linux.zip

<sup>7</sup> https://doi.org/10.5281/zenodo.3678625

<sup>8</sup> https://github.com/Z3Prover/z3

<sup>9</sup> https://cvc4.cs.nyu.edu/

<sup>10</sup> http://mathsat.fbk.eu/

<sup>11</sup> https://github.com/sosy-lab/benchexec/blob/master/benchexec/tools/ultimatetaipan.py

<sup>12</sup> https://github.com/sosy-lab/sv-comp/blob/master/benchmark-defs/utaipan.xml


[10] A. Tiwari and S. Gulwani. Logical Interpretation: Static Program Analysis Using Theorem Proving. In CADE, volume 4603 of LNCS, pages 147–166. Springer, 2007.


## Author Index

Abate, Alessandro I-97 Afzal, Mohammad II-383 Ahmed, Daniele I-97 Akshay, S. I-387 Albert, Elvira II-118 Almaawi, Alyas I-115 Almeida, Bernardo II-39 An, Jie I-444 Angluin, Dana II-325 Ayaziová, Paulína II-413 Babu M, Charles II-383 Baier, Christel I-324 Barreto, Raimundo II-403 Barrett, Clark I-367 Bartocci, Ezio I-492 Becker, Benedikt II-235 Bendík, Jaroslav I-135 Benerecetti, Massimo II-289 Berger, Philipp I-40 Beyer, Dirk I-3, II-126, II-347 Biagi, Marco I-463 Bian, Jinting II-217 Bockenek, Joshua A. II-98 Boender, Jaap II-271 Bornat, Richard II-271 Bozga, Marius I-228 Budde, Carlos E. I-463, I-483 Castro, David II-278 Celik, Ahmet II-137 Černá, Ivana I-135 Chakraborty, Supratik I-22, II-383 Chalupa, Marek II-413 Chauhan, Avriti II-383 Chen, Mingshuai I-444 Chimdyalwar, Bharti II-383 Cimatti, Alessandro I-155 Cordeiro, Lucas C. II-403 Correas, Jesús II-118 Cubuktepe, Murat I-287

D'Argenio, Pedro R. I-463 Dangl, Matthias I-3 Darke, Priyanka II-383 de Boer, Frank S. II-217 de Gouw, Stijn II-217 Delgrange, Florent I-346 Dell'Erba, Daniele II-289 Deng, Yuxin II-21 Dietsch, Daniel II-418 Dixon, Alex I-405 Du, Wenjie II-21 Dubut, Jérémy I-191 Esparza, Javier I-228 Fan, Chuchu I-173 Fedyukovich, Grigory II-195 Ferreira, Francisco II-278 Fisman, Dana II-325 Frenkel, Hadar I-211 Frohn, Florian I-58 Funke, Florian I-324 Furbach, Florian II-378 Gastin, Paul I-387 Geatti, Luca I-155 Geldenhuys, Jaco II-373 Giacobbe, Mirco II-79 Gligoric, Milos II-137 Goel, Aman I-413 Gordillo, Pablo II-118 Griggio, Alberto I-155 Groote, Jan Friso II-3 Grumberg, Orna I-211 Gupta, Aarti II-195 Gupta, Ashutosh I-22, II-383

Hahn, Ernst Moritz I-306 Hamers, Ruben I-266 Hasuo, Ichiro I-191 Heizmann, Matthias II-418 Heljanko, Keijo II-378 Henzinger, Thomas A. II-79 Hiep, Hans-Dieter A. II-217 Howar, Falk II-398 Hruška, Martin II-413 Huisman, Marieke I-247 Hussein, Soha II-393

Iosif, Radu I-228

Jansen, David N. II-3 Jansen, Nils I-287 Jantsch, Simon I-324 Jašek, Tomáš II-413 Jeannerod, Nicolas II-235 Jongmans, Sung-Shik I-266 Joosten, Sebastiaan J. C. I-247 Junges, Sebastian I-287

Kammueller, Florian II-271 Kápl, Roman II-254 Katoen, Joost-Pieter I-40, I-287, I-346 Katsumata, Shin-ya I-191 Keiren, Jeroen J. A. II-3 Khurshid, Sarfraz I-115 Kimberly, Greg I-155 King, Andy I-79 Kobayashi, Naoki II-195 Kolčák, Juraj I-191 Kovács, Laura I-492 Krishna, S I-387 Kumar, Shrawan II-383

Lang, Frédéric II-57 Lazić, Ranko I-405 Lechner, Mathias II-79 Lochmann, Alexander II-178

Maathuis, Olaf II-217 Madhusudan, P. II-158 Malík, Viktor II-368 Mann, Makai I-367 Manolios, Panagiotis II-388 Marché, Claude II-235 Mateescu, Radu II-57 Mathur, Umang II-158 Mazzanti, Franco II-57 McCamant, Stephen II-393 Meel, Kuldeep S. I-115

Menezes, Rafael II-403 Meyer, Roland II-378 Middeldorp, Aart II-178 Mitra, Sayan I-173 Mogavero, Fabio II-289 Mokhlesi, Navid I-173 Monti, Raúl E. I-463 Mordido, Andreia II-39 Mues, Malte II-398 Mutius, Joshua von I-425

Nagarajan, Rajagopal II-271 Neele, Thomas II-307 Nutz, Alexander II-418

Okudono, Takamasa I-79 Oortwijn, Wytse I-247

Palmskog, Karl II-137 Parízek, Pavel II-254 Pasareanu, Corina I-211 Perez, Mateo I-306 Peringer, Petr II-408 Peruffo, Andrea I-97 Poly, Guillaume II-271 Ponce-de-León, Hernán II-378

Qin, Xudong II-21 Quatmann, Tim I-346 Quiring, Benjamin II-388

Randour, Mickael I-346 Ravindran, Binoy II-98 Régis-Gianas, Yann II-235 Rocha, Herbert II-403 Román-Díez, Guillermo II-118 Roychowdhury, Sparsa I-387 Rubio, Albert II-118

Sakallah, Karem I-413 Schätzle, Claus II-418 Schewe, Sven I-306 Schrammel, Peter II-368 Schüssele, Frank II-418 Sharma, Vaibhav II-393 Sheinvald, Sarai I-211 Shoval, Yaara II-325 Sibai, Hussein I-173 Sifakis, Joseph I-228

Sighireanu, Mihaela II-235 Šoková, Veronika II-408, II-413 Somenzi, Fabio I-306 Sprunger, David I-191 Stankovič, Miroslav I-492 Stoelinga, Mariëlle I-463 Strejček, Jan II-413 Švejda, Jan I-40

Tomovič, Lukáš II-413 Tonetta, Stefano I-155 Topcu, Ufuk I-287 Treinen, Ralf II-235 Trivedi, Ashutosh I-306

Unadkat, Divyesh I-22, II-383 Usman, Muhammad I-115

van de Pol, Jaco I-247 van Eekelen, Marko II-217 Vasconcelos, Vasco T. II-39 Venkatesh, R II-383

Verbeek, Freek II-98 Visser, Willem II-373, II-393 Viswanathan, Mahesh II-158 Vojnar, Tomáš II-368, II-408, II-413

Wang, Kaiyuan I-115 Wang, Wenxi I-115 Welzel, Christoph I-228 Wendler, Philipp II-126 Wesselink, Wieger II-307 Whalen, Michael W. II-393 Wijs, Anton II-3 Willemse, Tim A. C. II-307 Wimmer, Simon I-425 Wojtczak, Dominik I-306

Yamada, Akihisa I-191 Yoshida, Nobuko II-278

Zhan, Bohua I-444 Zhan, Naijun I-444 Zhang, Miaomiao I-444