**Brigitte Pientka · Cesare Tinelli (Eds.)**

# LNAI 14132

# **Automated Deduction – CADE 29**

**29th International Conference on Automated Deduction, Rome, Italy, July 1–4, 2023, Proceedings**

# Lecture Notes in Computer Science

# **Lecture Notes in Artificial Intelligence 14132**

Founding Editor Jörg Siekmann

Series Editors

Randy Goebel, *University of Alberta, Edmonton, Canada*

Wolfgang Wahlster, *DFKI, Berlin, Germany*

Zhi-Hua Zhou, *Nanjing University, Nanjing, China*

The series Lecture Notes in Artificial Intelligence (LNAI) was established in 1988 as a topical subseries of LNCS devoted to artificial intelligence.

The series publishes state-of-the-art research results at a high level. As with the LNCS mother series, the mission of the series is to serve the international R & D community by providing an invaluable service, mainly focused on the publication of conference and workshop proceedings and postproceedings.


*Editors*

Brigitte Pientka, McGill University, Montreal, QC, Canada

Cesare Tinelli, The University of Iowa, Iowa City, IA, USA

ISSN 0302-9743, ISSN 1611-3349 (electronic)

Lecture Notes in Artificial Intelligence

ISBN 978-3-031-38498-1, ISBN 978-3-031-38499-8 (eBook)

https://doi.org/10.1007/978-3-031-38499-8

LNCS Sublibrary: SL7 – Artificial Intelligence

© The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access publication.

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

### **Preface**

This volume contains the proceedings of the 29th International Conference on Automated Deduction (CADE-29). CADE is the major forum for the presentation of research in all aspects of automated deduction, including foundations, applications, implementations, and practical experience. CADE-29 was held on 1–4 July 2023, hosted at the Faculty of Civil and Industrial Engineering of the Sapienza University of Rome, Italy, and colocated with the 8th International Conference on Formal Structures for Computation and Deduction (FSCD). CADE-29 emphasized the breadth of topics that are of interest, including applications in and beyond computer science and mathematics, and the use/contribution of automated deduction in AI.

The Program Committee (PC) examined 74 submissions this year and decided to accept 33 of them (28 full papers and 5 short papers or system descriptions). Submissions were single-blind and each of them was reviewed by at least three PC members or their external reviewers. The criteria for evaluation were originality and significance, technical quality, comparison with related work, quality of presentation, and reproducibility of experiments.

The program of the conference included three invited talks, two of which were joint talks with FSCD:


A fourth invited talk, "Automated Reasoning with Data," was given by *Moshe Vardi* as recipient of the 2023 Herbrand Award.

The conference hosted several workshops, and one competition on July 4–6:


In addition to the best paper awards, three CADE awards were presented at the conference:




Sincere thanks go to the many people who contributed to the success of CADE-29 — the authors, the participants, the invited speakers, the members of the PC, the external subreviewers, the general chair, the workshop and tutorial chair, the publicity chair, the staff at Springer, and the EasyChair team.

CADE-29 gratefully acknowledges the support of the CADE trustees, the board of the Association for Automated Reasoning, ACM SIGLOG, and the sponsors Amazon Web Services and Springer.

July 2023 Brigitte Pientka Cesare Tinelli

# **Organization**

### **General Chair**



### **Program Committee**

Florian Rabe (FAU Erlangen-Nürnberg, Germany); Giles Reger (AWS and University of Manchester, UK); Martina Seidl (Johannes Kepler University Linz, Austria); Alexander Steen (University of Greifswald, Germany); Martin Suda (Czech Technical University in Prague, Czechia); Sophie Tourret (Inria, France and MPI for Informatics, Germany); Sarah Winkler (Free University of Bozen-Bolzano, Italy)

### **Additional Reviewers**

Erika Ábrahám, Antonis Achilleos, Takahito Aoto, François Bobot, James Brotherston, Chad Brown, Guillaume Burel, Filip Bártek, David Cerna, Kaustuv Chaudhuri, Md Solimul Chowdhury, Gabriel Ebner, Raul Fervari, Pascal Fontaine, Thibault Gauthier, Khalil Ghorbal, Alessandro Gianola, Stéphane Graham-Lengrand, Jonathan Huerta y Munive, Sohei Ito, Jan Jakubuv, Albert Jiang, Ariel Kellison, Patrick Koopmann, Temur Kutsia, Dennis Müller, Jakob Nordström, Miroslav Olšák, Eugenio Orlandelli, Pedro Orvalho, Nicolas Peltier, Bartosz Piotrowski, Gian Luca Pozzato, Mathias Preiner, Stanisław Purgał, Gianluca Redondi, Joseph Reeves, Martin Riener, Colin Rothgang, Navid Roux, Reuben Rowe, Claudio Sacerdoti Coen, Luca San Mauro, Jan Frederik Schaefer, Tanja Schindler, Anders Schlichtkrull, Ying Sheng, Nicholas Smallbone, Guilherme Toledo, Dmitriy Traytel, Makarius Wenzel, Yechuan Xia, Akihisa Yamada, Emre Yolcu

### **Board of CADE Trustees**

Pascal Fontaine (University of Liège, Belgium); Jürgen Giesl (RWTH Aachen, Germany); Marijn Heule (Carnegie Mellon University, USA); Neil Murray (University at Albany, USA); Cláudia Nalon (University of Brasília, Brazil); Brigitte Pientka (McGill University, Canada); André Platzer (Karlsruhe Institute of Technology, Germany); Andrew Reynolds (University of Iowa, USA); Philipp Rümmer (University of Regensburg, Germany); Renate Schmidt (University of Manchester, UK); Stephan Schulz (DHBW Stuttgart, Germany); Sophie Tourret (Inria, France and MPI, Germany)

### **Board of the Association for Automated Reasoning**


Christoph Benzmüller (University of Bamberg and FU Berlin, Germany); Jürgen Giesl (RWTH Aachen, Germany); Philipp Rümmer (University of Regensburg, Germany); Sophie Tourret (Inria, France and MPI, Germany)

### **Sponsors**

# **Invited Talks**

### λ**-Superposition: From Theory to Trophy**

Jasmin Blanchette<sup>1,2,3</sup>

<sup>1</sup> Ludwig-Maximilians-Universität München, Munich, Germany jasmin.blanchette@lmu.de

<sup>2</sup> Max-Planck-Institut für Informatik, Saarland Informatics Campus, Saarbrücken, Germany

<sup>3</sup> Université de Lorraine, CNRS, Inria, LORIA, Nancy, France

This extended abstract describes work performed in collaboration with Alexander Bentkamp, Simon Cruanes, Visa Nummelin, Stephan Schulz, Sophie Tourret, Petar Vukmirović, and Uwe Waldmann on the design and implementation of λ-superposition, in the context of the Matryoshka research project.

When I conceived Matryoshka in 2015, my ambition was to develop higher-order provers that perform well on higher-order proof obligations originating from Isabelle/HOL [11] and other proof assistants. Lawrence Paulson had noticed that the performance on truly higher-order goals left much to be desired and that "given the inherent difficulty of performing higher-order reasoning using first-order theorem provers, the way forward is to integrate Sledgehammer with an actual higher-order theorem prover, such as LEO-II" [13]. However, the subsequent integration of LEO-II [4] and Satallax [7] failed to bring the expected benefits [16]. My hypothesis was that most Isabelle problems have a large first-order component and the existing higher-order provers were not optimized for this kind of reasoning.

To obtain higher-order provers that excel at first-order reasoning, I proposed to start with a highly successful first-order calculus, superposition, and generalize it, as much as possible, in a graceful way, culminating with a higher-order calculus. Provers implementing this calculus would combine the strengths of native higher-order provers and the strengths of the superposition provers that served as Sledgehammer backends: E [14], SPASS [6], and Vampire [5].

To tackle the challenge of designing this calculus, which we call λ-superposition, we identified three milestones that we reached in turn. We first designed a superposition-like calculus for a λ-free, Boolean-free higher-order logic (also called applicative first-order logic) [1]. This logic supports partial application of function symbols (e.g., f or f a, where f is binary) and application of variables (e.g., *y* a). Already at this stage, the first serious issue arose with the term order that superposition uses to prune the search space. We were able to work around the issue by introducing a new inference rule called argument congruence. For this and the other milestones, much of the work went into ensuring refutational completeness.

For the second milestone, we designed a superposition-like calculus for a logic that supports λ-abstractions but not interpreted Booleans [3]. One difficulty that arose is that inferences need to perform higher-order unification. Unfortunately, higher-order unification is ill-behaved: It is undecidable and can yield a possibly infinite stream of unifiers. Moreover, due to interactions with the term order, we need to perform full unification (including flex-flex pairs) [17] and not simply preunification [10].

For the third milestone, we added support for interpreted Booleans [2]. This step was based on ideas by Ganzinger and Stuber [9]. They showed how to support logical symbols inside a superposition-like calculus, but fell short of including an interpreted Boolean type. Thus, we extended Ganzinger and Stuber's work [12] and used it as the basis of a graceful generalization to higher-order logic.

Whenever we designed a calculus, we also made sure to implement it in the Zipperposition prover [8]. Zipperposition was originally developed by Cruanes to explore induction, arithmetic, and deduction modulo. It is written in OCaml and is highly extensible. He extended it with a pragmatic higher-order mode with support for λ-abstractions and extensionality, without any completeness guarantees. This mode formed the basis for our subsequent work. Empirical evaluations on TPTP and Sledgehammer benchmarks were initially disappointing, but after some extensive tuning and new ideas for heuristics, Zipperposition became highly competitive, finishing first in the higher-order theorem division of the CADE ATP System Competition (CASC) in 2020, 2021, and 2022. Inspired by a similar integration in Leo-III [15] and Satallax, Zipperposition incorporates E as a backend to tackle first-order subproblems.

We also implemented λ-superposition in the high-performance prover E [18,19]. The E implementation is pragmatic and sacrifices completeness. For example, the possibly infinite stream of unifiers is truncated to make it finite, and some of the most explosive rules of λ-superposition are omitted. Probably because Zipperposition has a portfolio of modes extensively tuned against the TPTP library and uses a version of E as a backend, E finished only second in the higher-order theorem division of CASC 2022. On the other hand, E finished first in the Sledgehammer division of the same competition. Despite this, the performance improvement over Sledgehammer's first-order backends is small. I suspect that Isabelle problems are even more first-order than I thought.

We learned a few other lessons in the process:


**Acknowledgment.** I thank Alexander Bentkamp, Stephan Schulz, Mark Summerfield, Petar Vukmirović, and Uwe Waldmann for textual suggestions. This research has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant No. 713999, Matryoshka). This research has also received funding from the Netherlands Organization for Scientific Research (NWO) under the Vidi program (project No. 016.Vidi.189.037, Lean Forward).

### **References**


### **Nominal Techniques for Software Specification and Verification**

Maribel Fernández

Department of Informatics, King's College London, London, UK Maribel.Fernandez@kcl.ac.uk

**Abstract.** In this talk we discuss the nominal approach to the specification of languages with binders and some applications to programming languages and verification.

**Keywords:** Binding Operator · Nominal Logic · Nominal Rewriting · Unification · Equational Axioms · Type Systems

#### **Overview**

The nominal approach to the specification of languages with binding operators, introduced by Gabbay and Pitts [20, 21, 28], has its roots in nominal set theory [27]. Its user-friendly syntax and first-order presentation (indeed, nominal logic [25, 26] is defined as a theory in first-order logic) make formal reasoning about binding operators similar to conventional on-paper reasoning.

Nominal logic uses the well-understood concept of *permutation groups acting on sets* to provide a rigorous, first-order treatment of common informal practice to do with fresh and bound names. Nominal matching and nominal unification [36, 37] (which work modulo α-equivalence) are decidable and efficient algorithms exist [7, 8, 9, 22], which are the basis for efficient implementations of nominal rewriting [17–19, 34].
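As an illustration of how name swapping and freshness yield α-equivalence, the following minimal sketch checks α-equivalence of ground terms (no unknowns) in the nominal style. It is not taken from any of the cited systems: the classes `Atom`, `App`, `Abs` and the functions `swap`, `fresh`, `alpha_eq` are hypothetical names chosen for this example, and the abstraction rule used is the textbook one, namely that [a]s ≈ [b]t holds when a is fresh for t and s ≈ (a b)·t.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Abs:
    binder: str
    body: "Term"


Term = Union[Atom, App, Abs]


def swap(a: str, b: str, t: Term) -> Term:
    """Apply the transposition (a b) to every atom in t, binders included."""
    def sw(n: str) -> str:
        return b if n == a else a if n == b else n
    if isinstance(t, Atom):
        return Atom(sw(t.name))
    if isinstance(t, App):
        return App(swap(a, b, t.fun), swap(a, b, t.arg))
    return Abs(sw(t.binder), swap(a, b, t.body))


def fresh(a: str, t: Term) -> bool:
    """Freshness a # t: the atom a does not occur free in t."""
    if isinstance(t, Atom):
        return t.name != a
    if isinstance(t, App):
        return fresh(a, t.fun) and fresh(a, t.arg)
    return t.binder == a or fresh(a, t.body)


def alpha_eq(s: Term, t: Term) -> bool:
    """Alpha-equivalence of ground terms via swapping and freshness."""
    if isinstance(s, Atom) and isinstance(t, Atom):
        return s.name == t.name
    if isinstance(s, App) and isinstance(t, App):
        return alpha_eq(s.fun, t.fun) and alpha_eq(s.arg, t.arg)
    if isinstance(s, Abs) and isinstance(t, Abs):
        if s.binder == t.binder:
            return alpha_eq(s.body, t.body)
        # [a]s' ~ [b]t' when a # t' and s' ~ (a b).t'
        return fresh(s.binder, t.body) and alpha_eq(
            s.body, swap(s.binder, t.binder, t.body))
    return False
```

Note that `swap` renames binders as well as free atoms; this is what distinguishes the permutation-based treatment from capture-avoiding substitution. Nominal unification additionally handles unknowns with suspended permutations, which this sketch leaves out.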

A number of systems (such as Nominal Isabelle [35]) highlighted the benefits of the nominal approach, which gave rise to elegant formalisations of Gödel's theorems [24] and the π-calculus [5] and to advances in programming language semantics [23]. However, there are still some obstacles to the inclusion of nominal features in programming languages and verification environments.

In this talk, I will present our current work towards incorporating nominal techniques into two widely-used rule-based first-order verification environments: the K specification framework [30] and the Maude programming language [11, 12].

An important component of rule-based programming and verification environments is the algorithm used to check equivalence of terms and to solve equations (unification). In practice, unification problems arise in the context of equational axioms (e.g., to take into account associative and commutative (AC) operators [6, 13, 14, 32, 33]). The first part of the talk will discuss notions of α-equivalence modulo associativity and commutativity axioms [1], extensions of nominal matching and unification to deal with AC operators [2], and the use of nominal narrowing [3] to deal with equational theories presented by convergent nominal rewriting rules.

Partially funded by the Royal Society (International Exchanges, grant number IES\R2\212106).

Another important component of rule-based programming and verification environments is the type system. In the second part of the talk, I will discuss type systems for nominal languages (including polymorphic systems [15] and intersection systems [4]). Dependent type theories, the dominant approach to formalising programming languages, have been extended with nominal features [10, 29, 31]. A lambda-less nominal dependent type system is available [16] and we are currently working on a type checker for this system.

The talk is structured as follows: we will start with the definition of nominal logic (including the notions of fresh atoms and alpha-equivalence) followed by a brief introduction to nominal matching and unification. We will then define nominal rewriting, a generalisation of first-order rewriting that provides in-built support for alpha-equivalence following the nominal approach. Finally, we will discuss notions of nominal unification and rewriting modulo AC operators and briefly overview typed versions of nominal languages.

**Acknowledgements.** I am grateful to my PhD students and co-authors for many fruitful collaborations.

### **References**


### **How Can We Trust AI?**

Mateja Jamnik

Department of Computer Science and Technology, University of Cambridge, UK mateja.jamnik@cl.cam.ac.uk

**Abstract.** Not too long ago most headlines talked about our fear of AI. Today, AI is ubiquitous, and the conversation has moved on from whether we should use AI to how we can trust the AI systems that we use in our daily lives. In this talk I look at some key technical ingredients that help us build confidence and trust in using intelligent technology. I argue that intuitiveness, interaction, explainability and inclusion of human domain knowledge are essential in building this trust. I present some of the techniques and methods we are building for making AI systems that think and interact with humans in more intuitive and personalised ways, enabling humans to better understand the solutions produced by machines, and enabling machines to incorporate human domain knowledge in their reasoning and learning processes.

**Keywords:** Human-like Computing · Artificial Intelligence · Knowledge Representation · Machine Learning · Automated Reasoning · Cognitive Science.

Over the years the work in this talk has been supported by the EPSRC, Leverhulme Trust, ERC, NSF, Cambridge Gates Foundation, Cambridge Trust.




# **Certified Core-Guided MaxSAT Solving**

Jeremias Berg<sup>1</sup>, Bart Bogaerts<sup>2</sup>, Jakob Nordström<sup>3,4</sup>, Andy Oertel<sup>3,4(✉)</sup>, and Dieter Vandesande<sup>2</sup>

<sup>1</sup> HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland

<sup>2</sup> Artificial Intelligence Laboratory, Vrije Universiteit Brussel, Brussels, Belgium

<sup>3</sup> University of Copenhagen, Copenhagen, Denmark

<sup>4</sup> Lund University, Lund, Sweden

andy.oertel@cs.lth.se

**Abstract.** In the last couple of decades, developments in SAT-based optimization have led to highly efficient maximum satisfiability (MaxSAT) solvers, but in contrast to the SAT solvers on which MaxSAT solving rests, there has been little parallel development of techniques to prove the correctness of MaxSAT results. We show how pseudo-Boolean proof logging can be used to certify state-of-the-art core-guided MaxSAT solving, including advanced techniques like structure sharing, weight-aware core extraction and hardening. Our experimental evaluation demonstrates that this approach is viable in practice. We are hopeful that this is the first step towards general proof logging techniques for MaxSAT solvers.

**Keywords:** MaxSAT · core-guided search · proof logging · certifying algorithms

### **1 Introduction**

Combinatorial optimization is one of the most impressive, and most intriguing, success stories in computer science. This area deals with computationally very challenging problems, which are widely believed to require exponential time in the worst case [21,49]. In spite of this, during the last couple of decades astonishing progress has been made on so-called combinatorial solvers for a number of different algorithmic paradigms such as Boolean satisfiability (SAT) solving and optimization [15], constraint programming (CP) [72], and mixed integer programming (MIP) [1,16]. Today, such solvers are routinely used to solve real-world problems with hundreds of thousands or even millions of variables.

While the performance of modern combinatorial solvers is truly impressive, one negative aspect is that they are highly complex pieces of software, and it is well documented that even mature state-of-the-art solvers sometimes give wrong results [2,18,25,37]. This can be fatal for applications where correctness is a non-negotiable demand. Perhaps the most successful approach for addressing this problem so far is the requirement in the SAT solving community that solvers should be *certifying* [3,62], meaning that when given a formula a solver should output not only a verdict whether the formula is satisfiable or unsatisfiable, but also an efficiently machine-verifiable *proof log* establishing that this verdict is guaranteed to be correct. One can then feed the input formula, the verdict, and the proof log to a special, dedicated *proof checker*, and accept the result if the proof checker agrees that the proof log shows that the solver computation is valid. Over the years, different proof formats such as *RUP* [43], *TraceCheck* [14], *DRAT* [44,45], *GRIT* [27], and *LRAT* [26] have been developed, and for almost a decade *DRAT* proof logging has been compulsory in the (main track of the) SAT competition. However, there has been very limited progress in designing analogous proof logging techniques for more powerful algorithmic paradigms.

Our focus in this work is on the optimization paradigm that is arguably closest to SAT solving, namely *maximum satisfiability* or *MaxSAT* solving [8,56], and the challenge of developing proof logging techniques for MaxSAT solvers.

#### **1.1 Previous Work**

Since essentially all modern MaxSAT solvers are based on repeated invocations of SAT solvers, a first question is why SAT proof logging techniques are not sufficient. While *DRAT* is a very powerful proof system, it seems that the overhead of generating proofs of correctness for the rewriting steps in between SAT solver calls in MaxSAT solvers is too large to be tolerable for practical purposes. Another, related, problem is that for optimization problems one needs to reason about the objective function, which *DRAT* struggles to do since its language is limited to disjunctive clauses. But perhaps the biggest challenge is that while modern SAT solving is completely dominated by the *conflict-driven clause learning (CDCL)* method [11,59,66], for MaxSAT there is a rich variety of approaches including *linear SAT-UNSAT* (or *model-improving search*) [31,54,68], *core-guided search* [4,7,35,67], *implicit hitting set (IHS)* search [28,29], and some recent work on branch-and-bound methods [57] (where we stress that the lists of references are far from exhaustive).

One tempting solution to circumvent this heterogeneity of solving approaches is to treat the MaxSAT solver as a black box and use a single call to a certifying SAT solver to prove optimality of the final solution found. However, there are several problems with this proposal. Firstly, we would still need proof logging to ensure that the input to the SAT solver is a correct encoding of a claim of optimality for the correct problem instance. Secondly, such a SAT call could be extremely expensive, running counter to the goal of proof logging with low (and predictable) overhead. Finally, even if the SAT-call approach could be made to work efficiently, this would just certify the final result, and would not help validate the correctness of the reasoning of the solver. For these reasons, our goal is to provide proof logging for the actual computations of the MaxSAT algorithm.

While some proof systems and tools have been developed specifically for MaxSAT [19,34,48,53,64,65,69–71], none of them comes close to providing general-purpose proof logging, because they apply only for very specific algorithm implementations and/or fail to capture the full range of reasoning used in an algorithmic approach. A recent work [75] by two co-authors of the current paper instead leverages the pseudo-Boolean proof logging system VeriPB [76] to certify correctness of the unweighted linear SAT-UNSAT solver QMaxSAT. VeriPB is similar in spirit to *DRAT*, but operates with more general 0–1 linear inequalities rather than just clauses. This simplifies reasoning about optimization problems, and also makes it possible to capture the powerful MaxSAT solver inferences in a more concise way. VeriPB has previously been used for proof logging of enhanced SAT solving techniques [17,42] and pseudo-Boolean solving [38], as well as for providing proof-of-concept tools for a nontrivial range of techniques in constraint programming [33,41] and subgraph solving [39,40].

#### **1.2 Our Contributions**

In this work, we use VeriPB to provide, to the best of our knowledge for the first time, efficient proof logging for the full range of techniques in a cutting-edge MaxSAT solver. We consider the state-of-the-art core-guided solver CGSS [47], based on RC2 [46], and show how to enhance CGSS to output proofs of correctness of its reasoning, including sophisticated techniques such as stratification [6,58], intrinsic-at-most-one constraints [46], hardening [6], weight-aware core-extraction [13], and structure sharing [47]. We find that the overhead for such proof logging is perfectly manageable, and although there is certainly room to improve the proof verification time, our experiments demonstrate that already a first proof-of-concept implementation of this approach is practically feasible.

It has been shown previously [32,39,52] that proof logging can also serve as a powerful debugging tool. This is because faulty reasoning is likely to lead to unsound proofs, which can be detected even if the solver produces correct output for all test cases. We exhibit yet another example of this—some proofs for which we struggled to make the verification work turned out to reveal two well-hidden bugs in RC2 and CGSS that earlier extensive testing had failed to uncover.

Although it still remains to provide proof logging for other MaxSAT approaches such as (general, weighted) linear SAT-UNSAT and implicit hitting set (IHS) search, we are optimistic that our work could serve as an important step towards general adoption of proof logging techniques for MaxSAT solvers.

#### **1.3 Outline of This Paper**

After reviewing preliminaries for pseudo-Boolean reasoning and core-guided MaxSAT solving in Sects. 2 and 3, we explain how core-guided MaxSAT solvers can be equipped with proof logging methods in Sect. 4. In Sect. 5 we present our experimental evaluation, after which some concluding remarks and directions for future research are given in Sect. 6.

### **2 Preliminaries**

We start with a review of some standard material which can be found, e.g., in [20,38,42]. A *literal* $\ell$ over a Boolean variable $x$ (taking values in $\{0, 1\}$, which we identify with false and true, respectively) is $x$ itself or its negation $\overline{x}$, where $\overline{x} = 1 - x$. A *pseudo-Boolean (PB)* constraint is a 0-1 integer linear inequality $C \doteq \sum_i a_i \ell_i \ge A$ (where $\doteq$ denotes syntactic equality). When convenient, we can assume without loss of generality that PB constraints are in *normalized form* [10]; i.e., all literals $\ell_i$ are over distinct variables and the coefficients $a_i$ and the *degree (of falsity)* $A$ are non-negative integers. The set of literals in $C$ is denoted $\mathit{lits}(C)$. The *negation* of $C$ is $\neg C \doteq \sum_i a_i \ell_i \le A - 1$ (rewritten in normalized form when needed). A *pseudo-Boolean formula* is a conjunction $F \doteq \bigwedge_j C_j$ of PB constraints. Note that a disjunctive clause can be viewed as a PB constraint with all coefficients and the degree equal to 1, and so formulas in conjunctive normal form (CNF) are special cases of PB formulas.
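The normalized form and the negation of a PB constraint can be made concrete with a small sketch. The representation is an assumption made for this example only: a literal is a nonzero integer in DIMACS style (`3` for $x_3$, `-3` for $\overline{x}_3$), and a normalized constraint is a pair of a dictionary from literals to positive coefficients and a non-negative degree. The function names `normalize` and `negate` are hypothetical and do not come from VeriPB or any solver.

```python
def normalize(terms, degree):
    """Normalize sum(a * lit) >= degree, given as (coeff, lit) pairs.

    Returns (coeffs, degree) with positive coefficients, at most one
    literal per variable, and a non-negative degree.
    """
    coeffs = {}
    for a, lit in terms:
        if a < 0:
            # a*l = a + |a|*~l, so flip the literal and raise the degree by |a|
            a, lit, degree = -a, -lit, degree - a
        coeffs[lit] = coeffs.get(lit, 0) + a
    for lit in [l for l in coeffs if l > 0]:
        if -lit in coeffs:
            # x + ~x = 1: cancel the common part of the two coefficients
            m = min(coeffs[lit], coeffs[-lit])
            coeffs[lit] -= m
            coeffs[-lit] -= m
            degree -= m
    coeffs = {l: a for l, a in coeffs.items() if a > 0}
    return coeffs, max(degree, 0)


def negate(coeffs, degree):
    """Negation of a normalized constraint: sum(a * lit) <= degree - 1."""
    return normalize([(-a, l) for l, a in coeffs.items()], 1 - degree)


# The clause x1 v x2 is the PB constraint x1 + x2 >= 1 ...
clause = ({1: 1, 2: 1}, 1)
# ... and its negation normalizes to ~x1 + ~x2 >= 2 (both literals false).
negated_clause = negate(*clause)
```

For a normalized constraint, `negate` produces $\sum_i a_i \overline{\ell}_i \ge \sum_i a_i - A + 1$, which is the form the negation takes after rewriting.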

A *(partial) assignment* $\rho$ is a (partial) function from variables to $\{0, 1\}$, which we extend to literals by respecting the meaning of negation. Applying $\rho$ to a constraint $C$ yields $C{\restriction}_{\rho}$ by substituting the variables assigned in $\rho$ by their values, and for a formula $F \doteq \bigwedge_j C_j$ we define $F{\restriction}_{\rho} \doteq \bigwedge_j C_j{\restriction}_{\rho}$. The constraint $C$ is *satisfied* by $\rho$ if $\sum_{\rho(\ell_i)=1} a_i \ge A$, and $\rho$ satisfies $F$ if it satisfies all $C \in F$, in which case $F$ is *satisfiable*. A formula lacking satisfying assignments is *unsatisfiable*. We say that $F$ *implies* $C$, denoted $F \models C$, if any assignment satisfying $F$ also satisfies $C$.

An *objective* $O \doteq \sum_i w_i \ell_i + M$ is an affine function over literals $\ell_i$ to be minimized by (total) assignments $\alpha$ satisfying $F$. The *value* (or *cost*) of an objective $O$ under such an $\alpha$, which we refer to as a *solution*, is $O(\alpha) = \sum_{\alpha(\ell_i)=1} w_i + M$. We write $\mathit{coeff}(O, \ell_i)$ to denote the coefficient $w_i$ of a literal $\ell_i \in \mathit{lits}(O)$.
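The notions of satisfaction, implication, and objective value can be spelled out as a brute-force sketch for tiny instances. Everything here is an illustrative assumption: literals are DIMACS-style integers, a constraint is a pair `(coeffs, degree)`, a total assignment is a dictionary from variables to 0 or 1, and the helper names are invented for this example. The implication check enumerates all assignments and is exponential; it serves only to make the definition concrete.

```python
from itertools import product


def lit_value(lit, alpha):
    """Value of a literal under a total assignment alpha (variable -> 0/1)."""
    v = alpha[abs(lit)]
    return v if lit > 0 else 1 - v


def satisfies(alpha, constraint):
    """alpha satisfies sum(a_i * l_i) >= A iff the true literals' weight reaches A."""
    coeffs, degree = constraint
    return sum(a for l, a in coeffs.items() if lit_value(l, alpha) == 1) >= degree


def implies(formula, constraint, variables):
    """F |= C: every total assignment satisfying F also satisfies C."""
    for bits in product([0, 1], repeat=len(variables)):
        alpha = dict(zip(variables, bits))
        if all(satisfies(alpha, c) for c in formula) and not satisfies(alpha, constraint):
            return False
    return True


def cost(alpha, weights, constant):
    """Objective value O(alpha) = sum of weights of true literals + constant M."""
    return constant + sum(w for l, w in weights.items() if lit_value(l, alpha) == 1)


# F = (x1 v x2) and (~x1 v x2), written as PB constraints with degree 1.
formula = [({1: 1, 2: 1}, 1), ({-1: 1, 2: 1}, 1)]
alpha = {1: 0, 2: 1}
```

With the hypothetical objective $3x_1 + 2\overline{x}_2 + 1$, the assignment `alpha` is a solution whose cost is just the constant term, since both objective literals are false under it.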

The foundation of the pseudo-Boolean proof logging in this paper is the *cutting planes* proof system [24], which is a method to iteratively derive new constraints implied by a pseudo-Boolean formula $F$. If $C$ and $D$ have been derived before or are *axiom constraints* in $F$, then any positive *linear combination* of these constraints can be derived. *Literal axioms* $\ell \ge 0$ can also be added to any previously derived constraints. For a constraint $\sum_i a_i \ell_i \ge A$ in normalized form, *division* by a positive integer $d$ derives $\sum_i \lceil a_i/d \rceil \ell_i \ge \lceil A/d \rceil$, and we also add a *saturation* rule that derives $\sum_i \min\{a_i, A\} \cdot \ell_i \ge A$ (where the soundness of these rules crucially depends on the normalized form). It is well known that any PB constraint implied by $F$ can be derived using these rules.
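The three derivation rules can be sketched on the same illustrative representation (DIMACS-style integer literals, normalized constraints as `(coeffs, degree)` pairs). The function names `add`, `divide`, and `saturate` are hypothetical. The small derivation at the end shows how addition followed by division simulates a resolution step on two clauses.

```python
def add(c1, c2, m1=1, m2=1):
    """Positive linear combination m1*c1 + m2*c2, renormalized via x + ~x = 1."""
    (co1, d1), (co2, d2) = c1, c2
    coeffs = {l: m1 * a for l, a in co1.items()}
    for l, a in co2.items():
        coeffs[l] = coeffs.get(l, 0) + m2 * a
    degree = m1 * d1 + m2 * d2
    for l in [l for l in coeffs if l > 0 and -l in coeffs]:
        m = min(coeffs[l], coeffs[-l])
        coeffs[l] -= m
        coeffs[-l] -= m
        degree -= m
    return {l: a for l, a in coeffs.items() if a > 0}, max(degree, 0)


def divide(c, d):
    """Division: round every coefficient and the degree up after dividing by d."""
    coeffs, degree = c
    return {l: -(-a // d) for l, a in coeffs.items()}, -(-degree // d)


def saturate(c):
    """Saturation: no coefficient needs to exceed the degree."""
    coeffs, degree = c
    capped = {l: min(a, degree) for l, a in coeffs.items()}
    return {l: a for l, a in capped.items() if a > 0}, degree


# From x1 + x2 >= 1 and x1 + ~x2 >= 1, addition gives 2*x1 >= 1 after
# cancelling x2 + ~x2 = 1, and division by 2 rounds this up to x1 >= 1.
c1 = ({1: 1, 2: 1}, 1)
c2 = ({1: 1, -2: 1}, 1)
summed = add(c1, c2)
resolvent = divide(summed, 2)
```

The rounding in `divide` is where the rule gains strength over plain rational arithmetic: $2x_1 \ge 1$ has the fractional solution $x_1 = 1/2$, whereas the divided constraint $x_1 \ge 1$ excludes it.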

A constraint $C$ is said to *unit propagate* the literal $\ell$ to true under an assignment $\rho$ if $C{\restriction}_{\rho}$ cannot be satisfied unless $\ell$ is true. During *unit propagation* on $F$ under $\rho$, we extend $\rho$ iteratively by any propagated literals until an assignment $\rho'$ is reached under which no constraint $C \in F$ is propagating or some constraint $C$ wants to propagate a literal that has already been assigned to the opposite value. The latter case is called a *conflict*, since $C$ is *violated* by $\rho'$. We say that $F$ implies $C$ by *reverse unit propagation (RUP)*, and that $C$ is a *RUP constraint* with respect to $F$, if $F \land \neg C$ unit propagates to conflict under the empty assignment. It is not hard to see that $F \models C$ holds if $C$ is a RUP constraint, and as a convenient shorthand we will add a RUP rule for deriving new constraints.
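A naive RUP check for normalized PB constraints can be sketched as follows. This is an illustration under the same assumed representation (integer literals, `(coeffs, degree)` pairs), not the algorithm used by any actual proof checker, and the function names are hypothetical. The sketch relies on the *slack* of a constraint, the total coefficient of its non-falsified literals minus its degree: a negative slack means the constraint is violated, and an unassigned literal whose coefficient exceeds the slack must be set to true.

```python
def unit_propagate(formula):
    """Propagate from the empty assignment.

    Returns (conflict, true_lits), where true_lits is the set of literals
    assigned true when propagation stopped.
    """
    true_lits = set()
    changed = True
    while changed:
        changed = False
        for coeffs, degree in formula:
            # Weight still attainable: literals not yet falsified.
            slack = sum(a for l, a in coeffs.items() if -l not in true_lits) - degree
            if slack < 0:
                return True, true_lits  # constraint violated: conflict
            for l, a in coeffs.items():
                unassigned = l not in true_lits and -l not in true_lits
                if unassigned and a > slack:
                    true_lits.add(l)  # falsifying l would violate the constraint
                    changed = True
    return False, true_lits


def negate(constraint):
    """Negation of a normalized constraint, again in normalized form."""
    coeffs, degree = constraint
    return {-l: a for l, a in coeffs.items()}, sum(coeffs.values()) - degree + 1


def is_rup(formula, constraint):
    """C is RUP w.r.t. F if F together with the negation of C propagates to conflict."""
    conflict, _ = unit_propagate(list(formula) + [negate(constraint)])
    return conflict


# From (x1 v x2) and (~x1 v x2), the unit constraint x2 >= 1 follows by RUP.
formula = [({1: 1, 2: 1}, 1), ({-1: 1, 2: 1}, 1)]
```

The check is sound but incomplete as a test for implication: a constraint can be implied by $F$ without being derivable by unit propagation alone.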

In addition to deriving constraints that are implied by a formula F, we also allow deriving so-called *redundant* constraints C that are *not* implied by F as long as some optimal solution is guaranteed to be preserved. This is done by extending the proof system with a *redundance-based strengthening* rule [17,42]. We will only need the special case of this rule saying that for a fresh variable z and for any constraint <sup>D</sup> . = - <sup>i</sup> <sup>a</sup>i<sup>i</sup> <sup>≥</sup> <sup>A</sup> we can introduce the *reified constraints*

$$C^{\Rightarrow}_{\text{reif}}(z, D) \doteq A\overline{z} + \sum_{i} a_{i}\ell_{i} \ge A \tag{1a}$$

$$C^{\Leftarrow}_{\text{reif}}(z, D) \doteq \Bigl(\sum_{i} a_i - A + 1\Bigr) z + \sum_{i} a_i \overline{\ell}_i \ge \sum_{i} a_i - A + 1 \tag{1b}$$

encoding the implications $z \Rightarrow D$ and $z \Leftarrow D$, respectively. We refer to z as the *reification variable*, and when D is clear from context, we will sometimes write just $C^{\Rightarrow}_{\text{reif}}(z)$ for (1a) and $C^{\Leftarrow}_{\text{reif}}(z)$ for (1b).
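That (1a) and (1b) encode exactly the two implications can be checked by brute force on small instances. The following Python sketch does this for a constraint D over positive literals; the function names are illustrative assumptions, and the check is a sanity test rather than a proof.

```python
from itertools import product

def holds_1a(z, lits, coeffs, A):
    # (1a): A * ~z + sum(a_i * l_i) >= A
    return A * (1 - z) + sum(a * l for a, l in zip(coeffs, lits)) >= A

def holds_1b(z, lits, coeffs, A):
    # (1b): (sum(a_i) - A + 1) * z + sum(a_i * ~l_i) >= sum(a_i) - A + 1
    m = sum(coeffs) - A + 1
    return m * z + sum(a * (1 - l) for a, l in zip(coeffs, lits)) >= m

def reification_is_exact(coeffs, A):
    """Check (1a) <=> (z => D) and (1b) <=> (z <= D) on all 0/1 assignments."""
    for z, *lits in product((0, 1), repeat=len(coeffs) + 1):
        d = sum(a * l for a, l in zip(coeffs, lits)) >= A
        if holds_1a(z, lits, coeffs, A) != ((not z) or d):
            return False
        if holds_1b(z, lits, coeffs, A) != (bool(z) or not d):
            return False
    return True

print(reification_is_exact([2, 1, 1], 2))
```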

The *maximum satisfiability (MaxSAT) problem* can be described conveniently as a special case of pseudo-Boolean optimization. A discussion on the equivalence of the following and the—more classical—clause-centric definition can be found in, for instance, [8,55]. An instance (F, O) of the (weighted partial) MaxSAT problem consists of a CNF formula F and an objective function O written as a non-negative affine combination of literals. The goal is to find a solution α that satisfies F and minimizes O(α). We say that such a solution α is *optimal* for the instance and that the optimal cost of the instance (F, O) is O(α).

#### **3 The OLL Algorithm for Core-Guided MaxSAT Solving**

We now proceed to discuss the core-guided MaxSAT solving in CGSS, which is based on the OLL algorithm [5,63], and describe the main heuristics used in efficient implementations of this algorithm. Given a MaxSAT instance $(F_{orig}, O_{orig})$, OLL takes an optimistic view and attempts to find an assignment satisfying $F_{orig}$ in which $O_{orig}$ equals its constant term (i.e., all literals in $lits(O_{orig})$ are false). If such a solution exists, it is clearly optimal. Otherwise, the solver will extract a *core* K, which is a clause such that (i) K only contains objective literals, i.e., $lits(K) \subseteq lits(O_{orig})$, and (ii) $F_{orig}$ implies K, which means that any solution to $F_{orig}$ has to set at least one literal in $lits(K)$ to true. The *cost* $w(K, O) = \min\{coeff(O, \ell) : \ell \in lits(K)\}$ of a core K is the smallest coefficient in the objective O of any literal in K. The core K is used to (conceptually) reformulate the instance into $(F_{ref}, O_{ref})$, which has the same minimal-cost solutions. The constant term *LB* in $O_{ref}$ is a lower bound on the optimal cost of the instance, and the reformulation is done in such a way that the lower bound increases by exactly the cost of the core K as defined above.

In more detail, the algorithm maintains a reformulated objective $O_{ref}$ (initialized to $O_{orig}$) such that the (non-normalized) pseudo-Boolean constraint

$$O_{orig} \ge O_{ref} \doteq \sum_{b \in lits(O_{orig})} coeff(O_{orig}, b) \cdot b \ge \sum_{b' \in lits(O_{ref})} coeff(O_{ref}, b') \cdot b' + LB \tag{2}$$

is satisfied by all solutions of $F_{ref}$. Note that the constraint (2), which we refer to as an *objective reformulation constraint*, implies that the constant term *LB* is a lower bound on the optimal cost.

In each iteration, a SAT solver is queried for a solution α to $F_{ref}$ with $O_{ref}(\alpha) = LB$. If such an α exists, the constraint (2) yields that $O_{orig}(\alpha) = LB$, and so α is a minimal-cost solution to $(F_{orig}, O_{orig})$. Otherwise, the solver returns a new core K that requires at least one literal in $lits(O_{ref})$ to be set to 1. This implies that the optimal cost is strictly larger than *LB*, and the core K is used for a new reformulation step.

The objective reformulation step adds new clauses to $F_{ref}$ encoding the constraints $y_{K,k} \Leftarrow \sum_{b \in lits(K)} b \ge k$ for $k = 2, \ldots, |K|$. The new variables $y_{K,k}$ are added to $O_{ref}$ with coefficient $w(K, O_{ref})$ equalling the cost of K, and the coefficient in $O_{ref}$ of each literal in K is decreased by the same amount. Finally, the lower bound *LB* (the constant term of $O_{ref}$) is also increased by $w(K, O_{ref})$. Since $y_{K,k}$ encodes that at least k literals in K are true, we have the equality $\sum_{b \in lits(K)} b = 1 + \sum_{k=2}^{|K|} y_{K,k}$, where the additive 1 comes from the fact that at least one literal in K has to be true, and the reformulation step is just applying this equality multiplied by $w(K, O_{ref})$ to $O_{ref}$. Notice that the variables added during objective reformulation can later be discovered in other cores. In practice, all implementations of OLL we are aware of encode the semantics of counting variables incrementally [60]. This means that initially only the variable $y_{K,2}$ is defined, and the variable $y_{K,i+1}$ is introduced only after $y_{K,i}$ is found in a core.
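The bookkeeping of a single reformulation step can be sketched in Python as follows. This is only an illustration of the arithmetic on the objective: the dictionary representation, the function name `reformulate`, and the variable names are assumptions, and the clauses defining the counting variables are not generated here.

```python
def reformulate(objective, lower_bound, core, counting_vars):
    """One OLL reformulation step on O_ref (a dict literal -> coefficient).

    core: the objective literals in the core K.
    counting_vars: names of the counting variables y_{K,k} added for K
                   (only y_{K,2} at first if they are introduced incrementally).
    """
    w = min(objective[b] for b in core)  # the cost w(K, O_ref)
    new_objective = dict(objective)
    for b in core:
        new_objective[b] -= w
        if new_objective[b] == 0:
            del new_objective[b]  # literal no longer carries any cost
    for y in counting_vars:
        new_objective[y] = new_objective.get(y, 0) + w
    return new_objective, lower_bound + w

# Objective 5*b1 + 5*b2 + b3 + b4 with core {b1, b2}.
obj, lb = reformulate({"b1": 5, "b2": 5, "b3": 1, "b4": 1}, 0, ["b1", "b2"], ["y_K1_2"])
print(obj, lb)
```

Applying the function a second time with the core {b3, b4} and a new variable reproduces an objective with only the two counting variables and lower bound 6.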

Implementations of OLL for MaxSAT—including the CGSS solver that we enhance with proof logging in this work—extend the algorithm with a number of heuristics such as stratification [6,58], hardening [6], the intrinsic-at-most-ones technique [46], weight-aware core extraction [13], and structure sharing [47].

*Stratification* extracts cores not over all literals in $O_{ref}$ but only over those whose coefficient is above some bound $w_{strat}$. This steers search toward cores containing literals with high coefficients, resulting in larger increases of *LB*. Once no more cores over such variables can be found, the algorithm lowers $w_{strat}$, terminating only after no more cores can be found with $w_{strat} = 1$. The fact that no more cores containing only variables with coefficients above $w_{strat}$ exist is detected by the SAT solver returning a (possibly non-optimal) solution α. The minimal cost $O_{orig}(\alpha)$ of all such solutions gives an upper bound *UB* on the optimal cost of the instance, allowing OLL to terminate as soon as *LB* = *UB*.

*Hardening* fixes literals in $O_{ref}$ to 0 based on information provided by the current upper and lower bounds *UB* and *LB*. If for any $b \in lits(O_{ref})$ it holds that $coeff(O_{ref}, b) + LB > UB$, then any solution α with b = 1 would have higher cost than the current best solution known, and would thus not be optimal.

The *intrinsic-at-most-one* technique identifies subsets $S \subseteq lits(O_{ref})$ of objective literals such that $\sum_{b \in S} \overline{b} \le 1$ is implied, i.e., any solution can assign at most one literal in S to 0. This is used both to increase the lower bound and to reformulate the objective. If we let $w_{min} = \min\{coeff(O_{ref}, b) : b \in S\}$, then S implies a lower bound increase of $LB_S = (|S| - 1) \cdot w_{min}$. Additionally, we define a new variable $\ell_S$ by the clause $\ell_S + \sum_{b \in S} \overline{b} \ge 1$ to indicate if in fact all literals in S are true, and introduce it in the reformulated objective with coefficient $w_{min}$. This means that we remove the already known lower bound $LB_S$ from $O_{ref}$ and transfer the possible additional cost $w_{min}$ from S to the variable $\ell_S$.
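The effect of an at-most-one set on the objective can be sketched in the same style. The function below is an illustrative assumption about the bookkeeping only (names and data layout are not from CGSS); it lowers the coefficients of the literals in S, adds the new variable, and raises the lower bound.

```python
def amo_reformulate(objective, lower_bound, subset, new_var):
    """Reformulate O_ref for a set S in which at most one literal can be false."""
    w_min = min(objective[b] for b in subset)
    new_objective = dict(objective)
    for b in subset:
        new_objective[b] -= w_min
        if new_objective[b] == 0:
            del new_objective[b]
    new_objective[new_var] = w_min  # cost paid only if all literals in S are true
    return new_objective, lower_bound + (len(subset) - 1) * w_min

obj, lb = amo_reformulate({"a": 3, "b": 2, "c": 2}, 0, ["a", "b", "c"], "l_S")
print(obj, lb)
```

For the three-literal set in the usage line, the guaranteed cost of two true literals with the smallest coefficient is moved into the lower bound, and only the literal with a larger coefficient keeps a residual cost.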

*Weight-aware core extraction* (WCE) delays objective reformulation, and the accompanying increase in new variables and clauses, for as long as possible. When a new core K is extracted by a solver that uses WCE, initially only the coefficient of each $b \in lits(K)$ is lowered and the lower bound *LB* is increased by $w(K, O_{ref})$. Then the SAT solver is invoked again with all literals that still have coefficients above $w_{strat}$ in $O_{ref}$ set to 0. When the SAT solver finds a satisfying assignment extending the assumptions, all objective reformulation steps are performed at once. This is correct since the final effect is the same as if the cores had been discovered one by one, each immediately followed by objective reformulation. Notice that this core extraction loop is guaranteed to terminate since the coefficient of at least one variable is decreased to 0 for each new core. *Structure sharing* is a recent extension to weight-aware core extraction that makes use of the potential overlap in cores detected in order to achieve more compact encodings of counting variable semantics.

#### **4 Proof Logging for the OLL Algorithm for MaxSAT**

We have now reached a point where we can describe the contribution of this work, namely how to add proof logging to an OLL-based core-guided MaxSAT solver, including all the state-of-the-art techniques described in Sect. 3.

In our proof logging routines we maintain the invariants described next. The reformulated objective $O_{ref}$ is already implicitly tracked by the solver and at all times it is possible to derive that $O_{orig} \ge O_{ref}$ as in (2). We also keep track of the current upper bound *UB* on $O_{orig}$ and best solution $\alpha_{best}$ found so far. All cores that have been found and processed are in the set $\mathcal{K}$.

*SAT Solver Calls.* The CDCL SAT solvers used in core-guided MaxSAT algorithms can support *DRAT* proof logging, and since the proof format used by VeriPB is a strict extension of *DRAT* (modulo small and purely syntactical modifications) it is straightforward to provide proof logging for the part of the reasoning done in SAT solver calls, and to add all learned clauses to the proof checker database.

Each invocation of the SAT solver returns either a new solution α or a new core K. When a solution α with $O_{orig}(\alpha) < UB$ is obtained, it is logged in the proof, which adds the *objective-improving constraint*

$$O_{orig} \le UB - 1 \tag{3a}$$

(which is

$$\sum_{b \in lits(O_{orig})} coeff(O_{orig}, b) \cdot \overline{b} \ge 1 + \sum_{b \in lits(O_{orig})} coeff(O_{orig}, b) - UB \tag{3b}$$

in normalized form). A technical side remark is that later solutions with cost at least *UB* cannot successfully be logged, since they violate the constraint (3a) added to the proof checker database, and so the proof logging routines make sure to only log solutions that improve the current upper bound.

If the SAT solver instead returns a new core K, this clause is guaranteed to be a reverse unit propagation (RUP) clause with respect to the set of clauses currently in the solver database, and so we can use the RUP rule to add K to the proof checker database (which contains a superset of the clauses known by the solver). For our book-keeping, we also add K to the set $\mathcal{K}$. A special case is that K could be the contradictory empty clause, corresponding to the pseudo-Boolean constraint $0 \ge 1$. This means that there are no solutions to the formula.

To optimize the efficiency of proof verification, constraints should be deleted from the proof when they are no longer needed. Since SAT solver proofs are only used to prove *unsatisfiability*, this does not cause any issues there, but when certifying *optimality* we have to be careful in order not to create better-than-optimal solutions (which could happen if, e.g., constraints in the input formula are removed). The *checked deletion* rule [17] ensuring this in VeriPB does not have any analogue in *DRAT*, so some care is needed here when translating SAT solver proofs into the VeriPB format.

*Incremental Totalizer with Structure Sharing.* Different implementations of OLL for MaxSAT differ in which encoding is used for the counting variables introduced during objective reformulation [9,50,51]. The two solvers we consider use totalizers [9], so we start by explaining this encoding and then show how to provide proof logging for the clauses added to the proof checker database.

The totalizer encoding for a set $I = \{\ell_1, \ldots, \ell_n\}$ of literals is a CNF formula T that defines *counting variables* $y_{I,j}$ for $j = 1, \ldots, n$ such that for any assignment that satisfies T the variable $y_{I,j}$ is true if and only if $\sum_{i=1}^{n} \ell_i \ge j$. The structure of T can be viewed as a binary tree, with literals in I at the leaves and with each internal node η associated with variables counting the true leaf literals in the subtree rooted at η. The variables $y_{I,j}$ are associated with the root of the tree.

More formally, given a set of literals I, we construct a binary tree with leaves labelled by the literals in I. For every node η of T, let $lits(\eta)$ denote the leaves in the subtree rooted at η; where it is convenient, we will overload I to also refer to the root node. For each internal node η, the totalizer encoding introduces the counting variables $S_{\eta} = \{y_{\eta,1}, \ldots, y_{\eta,|lits(\eta)|}\}$, the meaning of which can be encoded recursively in terms of the variables $S_{\eta_1}$ and $S_{\eta_2}$ for the children $\eta_1$ and $\eta_2$ of η by the (pseudo-Boolean form of the) clauses

$$C^{\Leftarrow}_{\eta}(\alpha,\beta,\sigma) \doteq y_{\eta,\sigma} + \overline{y}_{\eta_1,\alpha} + \overline{y}_{\eta_2,\beta} \ge 1 \tag{4a}$$

$$C^{\Rightarrow}_{\eta}(\alpha,\beta,\sigma) \doteq \overline{y}_{\eta,\sigma+1} + y_{\eta_1,\alpha+1} + y_{\eta_2,\beta+1} \ge 1 \tag{4b}$$

for all integers α, β, σ such that $\alpha + \beta = \sigma$ and $0 \le \alpha \le |lits(\eta_1)|$, $0 \le \beta \le |lits(\eta_2)|$, and $0 \le \sigma \le |lits(\eta)|$. We use the notational conventions in (4a)–(4b) that $y_{\ell,1} = \ell$ for all leaves $\ell$, and that $y_{\eta,0} = 1$ and $y_{\eta,|lits(\eta)|+1} = 0$ for all nodes η (so that clauses containing $y_{\eta,0}$ or $y_{\eta,|lits(\eta)|+1}$ can be simplified to binary clauses or be omitted when they are satisfied). The clauses $C^{\Rightarrow}_{\eta}(\alpha, \beta, \sigma)$ in (4b) are not necessarily added to the clause database of the MaxSAT solver, but are sometimes included for improved propagation.
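To see that the clauses (4a)–(4b) pin down the counting variables of a node, the following Python sketch enumerates them for a node whose children have `n1` and `n2` leaves and records which values of $y_{\eta,j}$ they force when the children count `c1` and `c2` true leaves. The function name and interface are assumptions made for this illustration; the conventions $y_{\eta,0} = 1$ and $y_{\eta,|lits(\eta)|+1} = 0$ are modelled by the comparison `c >= j`.

```python
from itertools import product

def forced_root_values(n1, n2, c1, c2):
    """Values of y_{eta,1..n1+n2} forced by clauses (4a) and (4b)."""
    n = n1 + n2
    y1 = lambda j: c1 >= j  # y_{eta1,j}; true for j = 0, false for j = n1 + 1
    y2 = lambda j: c2 >= j  # y_{eta2,j}; true for j = 0, false for j = n2 + 1
    forced = {}
    for alpha, beta in product(range(n1 + 1), range(n2 + 1)):
        sigma = alpha + beta
        # (4a): y_{eta,sigma} + ~y_{eta1,alpha} + ~y_{eta2,beta} >= 1
        if sigma >= 1 and y1(alpha) and y2(beta):
            forced[sigma] = True
        # (4b): ~y_{eta,sigma+1} + y_{eta1,alpha+1} + y_{eta2,beta+1} >= 1
        if sigma + 1 <= n and not y1(alpha + 1) and not y2(beta + 1):
            forced[sigma + 1] = False
    return forced

# Children with 2 and 3 leaves, of which 1 and 2 are true.
print(forced_root_values(2, 3, 1, 2))
```

In the usage line, three leaves are true in total, so the clauses force the first three counting variables of the parent to true and the remaining two to false.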

We now turn to the question of how to derive the clauses (4a)–(4b) encoding the meaning of the counting variables $y_{I,j}$ in the proof. This is a two-step process. First, reified pseudo-Boolean (and, in general, non-clausal) constraints $C^{\Rightarrow}_{\text{reif}}(y_{\eta,j})$ and $C^{\Leftarrow}_{\text{reif}}(y_{\eta,j})$ as in (1a)–(1b), encoding that $y_{\eta,j}$ holds if and only if $\sum_{\ell \in lits(\eta)} \ell \ge j$, are derived by redundance-based strengthening. Then the clauses added to the MaxSAT solver are derived from these pseudo-Boolean constraints. Although we omit the details due to space constraints, it is not hard to show that for any internal node η with children $\eta_1$ and $\eta_2$, the clauses $C^{\Leftarrow}_{\eta}(\alpha, \beta, \sigma)$ and $C^{\Rightarrow}_{\eta}(\alpha, \beta, \sigma)$ in (4a)–(4b) can be derived from the constraints $C^{\Leftarrow}_{\text{reif}}(y_{\eta,\sigma})$, $C^{\Rightarrow}_{\text{reif}}(y_{\eta,\sigma})$, $C^{\Leftarrow}_{\text{reif}}(y_{\eta_1,\alpha})$, $C^{\Rightarrow}_{\text{reif}}(y_{\eta_1,\alpha})$, $C^{\Leftarrow}_{\text{reif}}(y_{\eta_2,\beta})$, and $C^{\Rightarrow}_{\text{reif}}(y_{\eta_2,\beta})$ by standard cutting planes derivations as in [75]. In particular, the certification of these totalizers can be done incrementally: clauses in the encoding can be derived as the corresponding counter variables are lazily introduced in the OLL algorithm.

This approach is also compatible with structure sharing, where subtrees of a previously constructed totalizer tree can be reused (to avoid doing the same work twice). The only constraints from a subtree rooted at $\eta^*$ that are needed when generating another totalizer encoding at a higher level are the constraints $C^{\Rightarrow}_{\text{reif}}(y_{\eta^*,\sigma})$ and $C^{\Leftarrow}_{\text{reif}}(y_{\eta^*,\sigma})$ defining the counter variables in the subtree root $\eta^*$.

To decrease the memory usage of the proof checker, it can be useful to *delete* reification constraints from the proof once we know that they will no longer be needed. Without structure sharing, for an internal node η, once all clauses that mention $y_{\eta,j}$ are created, the constraints $C^{\Leftarrow}_{\text{reif}}(y_{\eta,j})$ and $C^{\Rightarrow}_{\text{reif}}(y_{\eta,j})$ will not be used anymore and can thus be deleted. On the other hand, structure sharing reuses as many counting variables as possible, even over multiple iterations of weight-aware core extraction. This means that $C^{\Leftarrow}_{\text{reif}}(y_{\eta,j})$ and $C^{\Rightarrow}_{\text{reif}}(y_{\eta,j})$ need to be retained, even after all clauses in the totalizer encoding for all parents of node η have been created.

*Objective Reformulation.* If counting variables $y_{K,i}$ for $i = 2, \ldots, s_K$ have been introduced for the core K, then the objective reformulation with respect to K is derived with the help of the constraint

$$\sum_{b \in K} b \ge 1 + \sum_{i=2}^{s_K} y_{K,i} \tag{5a}$$

(or

$$\sum_{b \in K} b + \sum_{i=2}^{s_K} \overline{y}_{K,i} \ge s_K \tag{5b}$$

in normalized form). The constraint (5b) can in turn be obtained from the core clause K and the reified constraints $C^{\Rightarrow}_{\text{reif}}(y_{K,j})$. It is clear that this should be possible, since the latter constraints define the variables $y_{K,j}$ precisely so that (5b) should hold, and we refer to Algorithm 5 in [38] for the details. Also, each time a new counting variable $y_{K,j}$ is introduced for a core K, we add it to (5b) to maintain this constraint as an invariant.

To illustrate how this update works, suppose we have a core $K \doteq \sum_{i=1}^{n} b_i \ge 1$ for which $\sum_{i=1}^{n} b_i + \sum_{i=2}^{s_K-1} \overline{y}_{K,i} \ge s_K - 1$ has already been derived. The next counting variable $y_{K,s_K}$ is introduced by the reification $s_K \cdot \overline{y}_{K,s_K} + \sum_{i=1}^{n} b_i \ge s_K$. The previous constraint is multiplied by $s_K - 1$ and added to the new reified constraint, yielding $s_K \cdot \sum_{i=1}^{n} b_i + (s_K - 1) \cdot \sum_{i=2}^{s_K-1} \overline{y}_{K,i} + s_K \cdot \overline{y}_{K,s_K} \ge (s_K - 1) \cdot s_K + 1$. Dividing this last constraint by $s_K$ results in $\sum_{i=1}^{n} b_i + \sum_{i=2}^{s_K} \overline{y}_{K,i} \ge s_K$, which is the desired updated constraint.
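The arithmetic of this update can be replayed mechanically. In the following Python sketch a constraint is abbreviated, purely as an assumption for illustration, to a tuple of four numbers: the common coefficient of the literals $b_i$, the common coefficient of the already present negated counting variables, the coefficient of the new negated counting variable, and the degree.

```python
def ceil_div(a, d):
    """Integer ceiling of a / d for a positive divisor d."""
    return -(-a // d)

def extend_5b(s):
    """Replay the update of (5b) when the counting variable y_{K,s} is added."""
    previous = (1, 1, 0, s - 1)  # sum b_i + sum_{i<s} ~y_{K,i} >= s - 1
    reified = (1, 0, s, s)       # s * ~y_{K,s} + sum b_i >= s
    # Multiply the previous constraint by s - 1 and add the reified constraint.
    summed = tuple((s - 1) * p + r for p, r in zip(previous, reified))
    # Division by s rounds every coefficient and the degree up.
    divided = tuple(ceil_div(c, s) for c in summed)
    return summed, divided

print(extend_5b(3))  # ((3, 2, 3, 7), (1, 1, 1, 3))
```

For $s_K = 3$ the sum has degree $(3 - 1) \cdot 3 + 1 = 7$, and rounding up after division by 3 yields degree 3 with all coefficients equal to 1, as claimed.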

For a set of extracted cores $\mathcal{K}$, we can derive the objective reformulation constraint $O_{orig} \ge O_{ref}$ by multiplying (5b) for each $K \in \mathcal{K}$ by the cost $w(K, O_{ref})$ of K and summing up all these multiplied constraints. The fact that we have an inequality $O_{orig} \ge O_{ref}$ rather than an equality is due to the incremental use of totalizers. More specifically, if $s_K = |lits(K)|$ held for every $K \in \mathcal{K}$, it would be possible to derive $O_{orig} = O_{ref}$ instead. Here we would like to stress one subtlety for developing proof logging for OLL: as the algorithm progresses and more output variables of totalizers are introduced (i.e., the counters $s_K$ increase), the reformulated objective potentially also increases. Because counting variables are added when $s_K$ increases, we have the inequality $O_{orig} \ge O^{new}_{ref} \ge O^{old}_{ref}$. For this reason, the old constraint $O_{orig} \ge O^{old}_{ref}$ cannot be used to derive $O_{orig} \ge O^{new}_{ref}$ after objective reformulation. Instead, we have to derive $O_{orig} \ge O_{ref}$ from scratch each time the solver argues with the reformulated objective. For doing this we need to have access to the entire set $\mathcal{K}$ of cores.

*Proving Optimality.* When the solver has found an optimal solution and established a matching lower bound, optimality is certified in the proof log using a proof by contradiction from the objective reformulation constraint $O_{orig} \ge O_{ref}$ in (2) and the (normalized form of the) objective-improving constraint $O_{orig} \le UB - 1$ in (3b). If we add these two constraints and cancel like terms, we get

$$\sum_{b' \in lits(O_{ref})} coeff(O_{ref}, b') \cdot \overline{b}' \ge 1 - UB + LB + \sum_{b' \in lits(O_{ref})} coeff(O_{ref}, b') \,. \tag{6}$$

Since we have *UB* = *LB* when the optimal solution has been found, and since $\sum_{b' \in lits(O_{ref})} coeff(O_{ref}, b') \cdot \overline{b}'$ cannot possibly exceed $\sum_{b' \in lits(O_{ref})} coeff(O_{ref}, b')$, the constraint (6) can be simplified to contradiction $0 \ge 1$.

*Intrinsic At-Most-One Constraints.* Certifying intrinsic at-most-one constraints for a set $S \subseteq lits(O_{ref})$ of literals requires deriving (i) the at-most-one constraint stating that at most one $b \in S$ is assigned to 0 by any solution and (ii) constraints defining the variable $\ell_S$. Such sets S are detected by unit propagation that implicitly derives implications $\overline{b}_i \Rightarrow b_j$ in the form of binary clauses $b_i + b_j \ge 1$ for every pair of variables in S. In the proof log, all these binary clauses can be obtained by RUP steps, after which the at-most-one constraint $\sum_{b \in S} \overline{b} \le 1$ (which is $\sum_{b \in S} b \ge |S| - 1$ in normalized form) is derived by a standard cutting planes derivation (see, e.g., [24]).

The reified constraints $\ell_S \Leftarrow \sum_{b \in S} b \ge |S|$ and $\ell_S \Rightarrow \sum_{b \in S} b \ge |S|$ defining the variable $\ell_S$ (which, following (1b) and (1a), are $\ell_S + \sum_{b \in S} \overline{b} \ge 1$ and $|S| \cdot \overline{\ell}_S + \sum_{b \in S} b \ge |S|$, respectively, in normalized form) are derived by redundance-based strengthening. Note that the latter constraint does not exist in the MaxSAT solver, but we need it in the proof in order to derive the objective reformulation for the at-most-one constraint.

*Hardening.* Formally, hardening corresponds to deriving $\overline{b} \ge 1$ in the proof for some literal $b \in lits(O_{ref})$ for which $UB < LB + coeff(O_{ref}, b)$ holds. Such an inequality $\overline{b} \ge 1$ is implied by RUP if we first derive the constraint (6), since assigning b = 1 results in (6) being violated.
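Both the optimality argument and the hardening argument come down to the slack of constraint (6), i.e., the largest value its left-hand side can still take minus its right-hand side. The Python sketch below computes this slack; the function name and the dictionary representation of $O_{ref}$ are assumptions for illustration.

```python
def slack_of_6(objective, lower_bound, upper_bound, true_lits=()):
    """Maximum left-hand side of (6) minus its right-hand side.

    objective: dict literal -> coefficient of O_ref (without the constant LB).
    true_lits: objective literals assumed to be 1, whose negations contribute 0.
    A negative result means (6) is violated under these assumptions.
    """
    total = sum(objective.values())
    max_lhs = total - sum(objective[b] for b in true_lits)
    rhs = 1 - upper_bound + lower_bound + total
    return max_lhs - rhs

o_ref = {"b3": 1, "b4": 1, "y": 5}
print(slack_of_6(o_ref, 5, 7, ["y"]))   # -4: assuming y = 1 violates (6)
print(slack_of_6(o_ref, 5, 7, ["b3"]))  # 0: b3 = 1 is still possible
print(slack_of_6({"y": 5, "z": 1}, 6, 6))  # -1: UB = LB gives contradiction
```

The slack simplifies to $UB - LB - 1$ minus the coefficients of the literals assumed true, so it is negative for any literal satisfying the hardening condition, and negative without any assumptions once *UB* = *LB*.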

*Upper Bound Estimation.* A final technical proof logging detail is that some implementations of the OLL algorithm for MaxSAT, including the Python-based version of CGSS, do not use the actual cost of the solution found by the SAT solver as the upper bound *UB* when hardening. In order to avoid the overhead in Python of extracting the solution from the SAT solver, an upper bound estimate $UB_{est}$ is computed instead based on the initial assignment passed to the SAT solver in the call. Since any valid estimate is at least the cost of the solution found (i.e., $UB_{est} \ge UB$), hardening steps based on $UB_{est}$ can be justified by first deriving $O_{orig} \le UB_{est} - 1$, which follows from the latest objective-improving constraint (3a). However, in order to handle solutions correctly in the proof, the proof logging routines need to extract the solution found by the solver and compute the actual cost, which means that a Python-based solver will not be able to avoid this overhead when running with proof logging.

*Worked-Out Example.* We end this section with a complete, worked-out example of OLL solving and proof logging for the toy MaxSAT instance (F, O) with formula $F = \{(b_1 \lor x), (\lnot x \lor b_2), (b_3 \lor b_4)\}$ and objective $O = 5b_1 + 5b_2 + b_3 + b_4$.

After initialization, the internal SAT solver of the OLL algorithm is loaded with the clauses of F and the proof consists of constraints (1)–(3) in Table 1. The OLL search begins by invoking the SAT solver on the clauses in F in order to check the existence of any solutions. Assume the SAT solver returns the solution $\alpha_1$ assigning $b_1 = b_3 = b_4 = 1$ and $b_2 = x = 0$. This solution has objective value $O(\alpha_1) = O_{orig}(\alpha_1) = 7$, so the algorithm updates *UB* = 7 and logs the objective-improving constraint (4) in Table 1, equivalent to $O_{orig} \le 6$.

Assume the stratification bound $w_{strat}$ is initialised to 2. Then the solver is invoked with $b_1 = b_2 = 0$ and returns the core $K_1 \doteq b_1 + b_2 \ge 1$, which is added to the proof as constraint (5). As already mentioned, core clauses are guaranteed to be RUP with respect to the set of clauses in the SAT solver database, which are also added to the proof.

For simplicity, we ignore WCE and structure sharing in this example, meaning that the solver next reformulates the objective based on $K_1$ by introducing clauses enforcing $y_{K_1,2} \Leftarrow (b_1 + b_2 \ge 2)$ for the new counting variable $y_{K_1,2}$. This is done by (i) introducing the pseudo-Boolean constraints (6) and (7) in Table 1 by reification, and (ii) deriving the clauses corresponding to these constraints. While the MaxSAT solver only uses the implication (6), the proof also requires constraint (7) corresponding to $y_{K_1,2} \Rightarrow (b_1 + b_2 \ge 2)$. Conveniently, in this toy example $y_{K_1,2} \Leftarrow (b_1 + b_2 \ge 2)$ is already the clause $\overline{b}_1 + \overline{b}_2 + y_{K_1,2} \ge 1$, so step (ii) is not needed. For the general case, we derive totalizer clauses as explained earlier in this section. Conceptually, we now replace $5b_1 + 5b_2$ by $5y_{K_1,2} + 5$ to obtain the reformulated objective $O_{ref} = b_3 + b_4 + 5y_{K_1,2} + 5$ with lower bound *LB* = 5. The core $K_1$ says that at least one of $b_1$ and $b_2$ must be true, thus incurring a cost of 5, and $y_{K_1,2}$ is added to the objective to indicate if both of them incur cost.

**Table 1.** Example proof produced by a certified OLL solver.

Since it now holds that $coeff(O_{ref}, y_{K_1,2}) + LB = 5 + 5 > 7 = UB$, the literal $y_{K_1,2}$ is hardened to 0. In order to certify this hardening step, i.e., derive $\overline{y}_{K_1,2} \ge 1$, the proof logger first derives the objective reformulation constraint $5b_1 + 5b_2 + b_3 + b_4 \ge b_3 + b_4 + 5y_{K_1,2} + 5$ enforced by line (8) in Table 1. The objective-improving and objective reformulation constraints are then added together to get constraint (9), after which $\overline{y}_{K_1,2} \ge 1$ is obtained by a RUP step.

The next SAT solver call with $b_3 = b_4 = 0$ returns as core the input clause $b_3 + b_4 \ge 1$, and reformulation (lines (11)–(13)) yields $O_{ref} = 5y_{K_1,2} + y_{K_2,2} + 6$ with *LB* = 6. Now suppose the SAT solver finds the solution $\alpha_2$ with $b_2 = b_3 = x = 1$ and all other variables set to 0, resulting in the objective-improving constraint (15). Since $O_{orig}(\alpha_2) = 6 = LB$, the solver terminates and reports $\alpha_2$ to be optimal. To certify that this is correct, another objective reformulation constraint (16) is derived, after which the contradictory constraint (17) is obtained by adding (15) and (16). This proves that solutions with cost less than 6 do not exist.
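Since the toy instance has only five variables, its optimal cost can be confirmed independently of any proof by exhaustive enumeration. The following Python sketch is such a sanity check; it is not part of the solver or the proof checker, and the function names are chosen for illustration.

```python
from itertools import product

def satisfies(b1, b2, b3, b4, x):
    """Clauses (b1 or x), (not x or b2), (b3 or b4) of the toy formula F."""
    return bool((b1 or x) and ((not x) or b2) and (b3 or b4))

def cost(b1, b2, b3, b4, x):
    """Objective O = 5*b1 + 5*b2 + b3 + b4."""
    return 5 * b1 + 5 * b2 + b3 + b4

def optimal_cost():
    """Minimum cost over all satisfying assignments."""
    return min(cost(*a) for a in product((0, 1), repeat=5) if satisfies(*a))

alpha_1 = (1, 0, 1, 1, 0)  # b1 = b3 = b4 = 1, b2 = x = 0
alpha_2 = (0, 1, 1, 0, 1)  # b2 = b3 = x = 1, all other variables 0
print(cost(*alpha_1), cost(*alpha_2), optimal_cost())  # 7 6 6
```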

**Fig. 1.** Running time of CGSS with and without proof logging.

**Fig. 2.** CGSS running time compared to time required for proof checking.

#### **5 Experimental Evaluation**

To evaluate the proof logging techniques developed in this paper, we have implemented them in the state-of-the-art MaxSAT solver CGSS [22,47], which uses the OLL algorithm and structure-sharing totalizers. We employed VeriPB [76], extended to parse MaxSAT instances in the standard WCNF format, to verify the certificates of correctness emitted by the certifying solver.

Our experiments were conducted on machines with an 11th Gen Intel(R) Core(TM) i5-1145G7 @ 2.60 GHz CPU and 16 GB of memory. Each benchmark ran exclusively on a single machine with a memory limit of 14 GB and a time limit of 3 600 s for solving with CGSS and 36 000 s for checking the certificates with VeriPB. As benchmarks we used all 594 weighted and 607 unweighted instances from the complete track of the MaxSAT Evaluation 2022 [61], where an instance (F, O) is *unweighted* if all coefficients $coeff(O, \ell)$ are equal. The data from our experiments can be found in [12].

*Overhead of Proof Logging.* To evaluate the overhead in solver running time, we compared the standard CGSS solver [23] without proof logging (but with the bug fixes discussed below) to CGSS with proof logging as described in this paper. With proof logging 803 instances are solved within the resource limits, which is 3 instances fewer than without proof logging (see Fig. 1). Adding proof logging slowed down CGSS by about 8.8% in the median over all solved instances. For 95% of the instances CGSS with proof logging was at most 36.2% slower. Thus, the proof logging overhead seems perfectly manageable and should present no serious obstacles to using proof logging in core-guided MaxSAT solvers.

*Overhead of Proof Checking.* To assess the efficiency of proof checking, we compared the running time of CGSS with proof logging to the time taken by VeriPB for checking the generated proofs. The instances that were not solved by CGSS within the resource limits were filtered out, since the running time for checking an incomplete proof is inconclusive.

**Table 2.** Illustration of discovered bug (where $y_{i,k}$ should be read as $y_{K_i,k}$).

VeriPB successfully checked the proofs for 747 out of the 803 instances solved by CGSS (see Fig. 2); 42 instances failed due to the memory limit and 14 instances failed due to the time limit. Checking the proof took about 3 times the solving time in the median for successfully checked instances. About 87% of the successfully checked instances were checked within 10 times the solving time.

Proof checking time compared to solver running time varies widely, but our experiments indicate that the performance of VeriPB is sufficient in most cases, and verification time scales linearly with the size of the proof for a majority of the instances. However, there is room to improve VeriPB, where the focus so far has been on proof logging strength rather than performance. For the instances where checking is 100 times slower than solving, the main bottleneck is the proof generated by the SAT solver, which could be addressed by standard techniques for checking *DRAT* proofs, and checking logged solutions (when objective-improving constraints (3a) are added) could also be implemented more efficiently.

*Bugs Discovered by Proof Logging.* Our work on implementing proof logging in CGSS led to the discovery of two bugs, which were also present in the solver RC2 on which CGSS is based, but have now been fixed in CGSS in commit 5526d04 and in RC2 in commit d0447c3. The bugs are due to a slightly different implementation of OLL compared to the description in Sect. 3.

First, when a counting variable $y_{K_{\mathit{old}},i}$ for a core $K_{\mathit{old}}$ appears for the first time in a later core $K_{\mathit{new}}$, the next counting variable $y_{K_{\mathit{old}},i+1}$ is added to the reformulated objective with coefficient $w(K_{\mathit{new}}, O_{\mathit{new}})$ rather than $w(K_{\mathit{old}}, O_{\mathit{old}})$. The coefficient of $y_{K_{\mathit{old}},i+1}$ is then further increased when $y_{K_{\mathit{old}},i}$ is found in future cores. Second, rather than computing the upper bound $\mathit{UB}$ from an actual solution, CGSS uses a weaker estimate $\mathit{UB}_{\mathit{est}}$ obtained by summing the current lower bound and the coefficients of all literals $b$ where $\mathit{coeff}(O_{\mathit{ref}}, b) < w_{\mathit{strat}}$ (meaning that these literals were not set to 0 in the SAT solver call, and so could potentially be true in the solution).

The bugs we detected could lead to the solver producing an overly optimistic estimate $\mathit{UB}_{\mathit{est}} < \mathit{UB}$. The first way this can happen is when the contributions of counting variables $y_{K,k}$ in the reformulated objective are underestimated due to too small coefficients. The second is when the coefficient of $y_{K_{\mathit{old}},i+1}$ is first lowered below $w_{\mathit{strat}}$ and then raised above this threshold again when $y_{K_{\mathit{old}},i}$ is found in a core. Then CGSS fails to assume $y_{K_{\mathit{old}},i+1} = 0$ in future solver calls. These bugs can result in erroneous hardening, as detailed in the next example.

*Example 1.* Given a MaxSAT instance $(F, O)$ with $F = \{\bigvee_{i=1}^{5} b_i,\ (o_1 \lor o_2)\} \cup \{b_i \lor e_i \mid i = 1, \ldots, 5\}$ and $O = \sum_{i=1}^{5} 10 \cdot b_i + 11 \cdot e_1 + 14 \cdot e_2 + 11 \cdot e_3 + 3 \cdot e_4 + 2 \cdot e_5 + o_1 + o_2$, assume the stratification bound is $w_{\mathit{strat}} = 2$. Table 2 displays a possible CGSS run for this instance, except that for simplicity we assume one core extraction per iteration and no use of any other heuristics. The upper half of the table lists the variables set to 0 in solver calls, the extracted core, and the lower bound derived from it. The lower half of the table provides the reformulated objective. Even though the coefficient of $y_{K_1,3}$ is increased to 8 after the fourth core, this variable is not set to 0 in subsequent iterations, which allows the solver to finish the stratification level after extracting 6 cores with a solution that sets to true the variables $b_1, b_2, b_3, b_5, e_4, o_1, o_2, y_{K_2,2}$ and $y_{K_1,i}$ for $i = 1, \ldots, 4$, and all other variables to false. The cost of this solution is 45.

Now CGSS would incorrectly estimate $\mathit{UB}_{\mathit{est}} = \mathit{LB} + 4 = 28$, since $y_{K_1,3}$ and $y_{K_2,2}$ (abbreviated as $y_{1,3}$ and $y_{2,2}$ in the table) both have coefficient 1 in the current reformulated objective. This is lower than the cost 45 of the solution found (and even than the optimum 36), and erroneously allows hardening (which considers $y_{K_1,3}$ with the correct coefficient 8) to fix $y_{K_1,3} = 0$, even though $b_1$, $b_2$ and $b_3$ (and hence also $y_{K_1,3}$) are true in every minimal-cost solution.
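The two cost figures quoted in this example (45 for the solution found, 36 for the optimum) can be checked by exhaustive enumeration over the original variables. The following is a minimal sketch, not part of CGSS or VeriPB: it assumes the first hard clause of $F$ is the disjunction $b_1 \lor \dots \lor b_5$, and the helper names `satisfies` and `cost` are purely illustrative. Counting variables are ignored, since they do not occur in the original objective.

```python
from itertools import product

# Objective coefficients of the original instance (b1..b5 and e1..e5; o1, o2 have weight 1).
B_COEFF = [10] * 5
E_COEFF = [11, 14, 11, 3, 2]

def satisfies(b, e, o):
    """Hard clauses, as read here: (b1 v ... v b5), (o1 v o2), and (bi v ei) for each i."""
    return any(b) and any(o) and all(bi or ei for bi, ei in zip(b, e))

def cost(b, e, o):
    """Value of the objective O: sum of the weights of the variables set to true."""
    return (sum(w for w, x in zip(B_COEFF, b) if x)
            + sum(w for w, x in zip(E_COEFF, e) if x)
            + sum(o))

# Enumerate all 2^12 assignments and keep the feasible ones.
solutions = []
for bits in product([False, True], repeat=12):
    b, e, o = bits[:5], bits[5:10], bits[10:]
    if satisfies(b, e, o):
        solutions.append((cost(b, e, o), b, e, o))

optimum = min(c for c, _, _, _ in solutions)
optimal = [(b, e, o) for c, b, e, o in solutions if c == optimum]

# The solution described in the example: b1, b2, b3, b5, e4, o1, o2 are true.
found_b = (True, True, True, False, True)
found_e = (False, False, False, True, False)
found_o = (True, True)
found_cost = cost(found_b, found_e, found_o)

# Are b1, b2 and b3 true in every minimal-cost solution?
b123_forced = all(b[0] and b[1] and b[2] for b, _, _ in optimal)
print(optimum, found_cost, b123_forced)
```

Under the stated reading of $F$, the enumeration agrees with the text: the optimum is 36, the solution found has cost 45, and $b_1$, $b_2$, $b_3$ are true in every optimal assignment, so an estimate of 28 cannot be a valid upper bound.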

In our computational experiments there were cases of faulty hardening, but all incorrectly fixed values happened to agree with some optimal solution and so we never observed incorrect results. Proof logging detected the problem, however, since the derivations of the buggy hardening steps failed during proof checking. Interestingly, what proof logging did *not* turn up was any examples of mistaken claims $O_{\mathit{orig}} \leq \mathit{UB}_{\mathit{est}} - 1$ when the cost of a found solution was estimated. The issue with mistaken estimates due to faulty stratification was instead discovered while analyzing and fixing the hardening bug. The moral of this is that even if all results are certified as correct, this does not certify that the code is free from bugs that have not yet manifested themselves. However, proof logging still guarantees that even if the solver has undiscovered bugs, we can always trust computed results for which the accompanying proofs pass verification.

#### **6 Concluding Remarks**

In this work, we develop pseudo-Boolean proof logging techniques for core-guided MaxSAT solving and implement them in the solver CGSS [47] with support for the full range of sophisticated reasoning techniques it uses. To the best of our knowledge, this is the first time a state-of-the-art MaxSAT solver has been enhanced to output machine-verifiable proofs of correctness. We have made a thorough evaluation on benchmarks from the MaxSAT Evaluation 2022 using the VeriPB proof checker [17,42], and find that proof logging overhead is perfectly manageable and that proof verification time, while leaving room for improvement, is definitely practically feasible. Our work also showcases the benefit of proof logging as a debugging tool—erroneous proofs produced by CGSS revealed two subtle bugs in the solver that previous extensive testing had failed to uncover.

Regarding proof verification time, further investigation is needed into the rare cases where verification is much slower (say, more than a factor 10) than solving. There are reasons to believe, though, that this is not a problem of MaxSAT proof logging per se, but rather is explained by features not yet added to VeriPB, which is a tool currently undergoing very active development. So far, the proof checker has been optimized for other types of reasoning than the clausal reverse unit propagation (RUP) steps that dominate SAT proofs. Also, VeriPB lacks the ability to trim proofs during checking as in [44]. Finally, introducing a binary proof format in addition to plain-text proofs would be another way to boost performance of proof checking. But these are matters of engineering rather than research, and can be taken care of once the proof logging technology as such has been developed and has proven its worth.

The focus of this work is on core-guided MaxSAT solving, but we would like to extend our techniques to solvers using linear SAT-UNSAT (LSU) solving (such as Pacose [68]) and implicit hitting set (IHS) search (such as MaxHS [28,29]). Although there are certainly nontrivial technical challenges that will need to be overcome, we are optimistic that our work paves the way towards a unified proof logging system for the full range of modern MaxSAT solving approaches. Going beyond MaxSAT, it would also be interesting to extend VeriPB proof logging to pseudo-Boolean solvers using core-guided search [30] or IHS [73,74], and perhaps even to similar techniques in constraint programming [36] and answer set programming [5].

**Acknowledgements.** This work was partly carried out while some of the authors were visiting the Simons Institute for the Theory of Computing at UC Berkeley for the extended reunion of the program "Satisfiability: Theory, Practice, and Beyond" during the spring of 2023. We also benefited greatly from the Dagstuhl Seminar 22411 "Theory and Practice of SAT and Combinatorial Solving". Additionally, we acknowledge several inspirational discussions on certifying solvers and proof logging with, among others, Ambros Gleixner, Stephan Gocht, and Ciaran McCreesh. The computational experiments were enabled by resources provided by LUNARC at Lund University.

Jeremias Berg was fully supported by the Academy of Finland under grant 342145. Bart Bogaerts and Dieter Vandesande were supported by Fonds Wetenschappelijk Onderzoek – Vlaanderen (project G070521N) and by the EU ICT-48 2020 project TAILOR (GA 952215). Jakob Nordström was supported by the Swedish Research Council grant 2016-00782 and the Independent Research Fund Denmark grant 9040-00389B. Andy Oertel was supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

### **References**



# Superposition with Delayed Unification

Ahmed Bhayat<sup>1</sup>, Johannes Schoisswohl<sup>2</sup>, and Michael Rawson<sup>2</sup>

<sup>1</sup> University of Manchester, Manchester, UK, ahmed.bhayat@manchester.ac.uk

<sup>2</sup> TU Wien, Vienna, Austria, {johannes.schoisswohl,michael.rawson}@tuwien.ac.at

Abstract. Classically, in saturation-based proof systems, unification has been considered atomic. However, it is also possible to move unification to the calculus level, turning the steps of the unification algorithm into inferences. For calculi that rely on unification procedures returning large or even infinite sets of unifiers, integrating unification into the calculus is an attractive method of dovetailing unification and inference. This applies, for example, to AC-superposition and higher-order superposition. We show that first-order superposition remains complete when moving unification rules to the calculus level. We discuss some of the benefits this has even for standard first-order superposition and provide an experimental evaluation.

### 1 Introduction

Unification is a key feature in many proof calculi, particularly those based on the saturation framework. It acts as a filter, reducing the number of inferences that need to be carried out by instantiating terms only to the degree necessary. However, many unification algorithms have large time complexities and produce large, or even infinite, sets of unifiers. This is the case, for example, for AC-unification, which can produce a doubly exponential number of unifiers [10], and higher-order unification, which can produce an infinite set of unifiers [20]. This motivates the study of how unification rules can be integrated into proof calculi to allow them to dovetail with standard calculus rules. One way to achieve this is to use the concept of unification with abstraction [13,17]. The general idea is that during the unification process, instead of solving all unification pairs, certain pairs are retained and added to the conclusion of an inference as negative *constraint* literals. Calculus-level unification inferences then work on such literals to solve these constraints and remove the literals in the case they are unifiable. Note how this differs from constrained resolution-style calculi such as [4,15] where the constraints are completely separate from the rest of the clause and are not subject to inferences.

To demonstrate the idea of dedicated unification inferences in combination with unification with abstraction, we provide the following example.

$$C_1 = f(g(a, x)) \not\approx t \qquad C_2 = f(g(a, b)) \approx t$$

A standard superposition calculus would proceed by unifying $f(g(a, b))$ and $f(g(a, x))$ with the unifier $\sigma = \{x \to b\}$ and then rewriting $C_1$ with $C_2$ to derive $t\sigma \not\approx t\sigma$. Equality resolution on $t\sigma \not\approx t\sigma$ would then derive $\bot$. It is also possible to proceed by rewriting $C_1$ with $C_2$ *without* computing $\sigma$ and instead add the constraint literal $g(a, x) \not\approx g(a, b)$ to the conclusion to derive $t \not\approx t \lor g(a, x) \not\approx g(a, b)$. A dedicated unification inference could then decompose the constraint literal, resulting in $t \not\approx t \lor a \not\approx a \lor b \not\approx x$. Further unification inferences could bind $x$ to $b$, and remove the trivial pairs $a \not\approx a$ and $t \not\approx t$ to derive $\bot$.

In this paper, we investigate moving unification to the calculus level for standard first-order superposition. Whilst this may seem like a regressive step, as we lose much of unification's power to act as a filter on inferences and hence produce many more clauses, we think the investigation is valuable for two reasons.

Firstly, by showing how syntactic first-order unification can be lifted to the calculus level, we provide a roadmap for how more complex unification problems can be lifted to the calculus level. This may prove particularly useful in the higher-order case, where abstraction may expose terms to standard calculus rules that were unavailable before. Moreover, we note that in our calculus we do not turn the entire unification problem into a constraint, but rather a subproblem. Whilst this may be merely an interesting detail for first-order unification, for more complex unification problems, such a method could be used to eagerly solve simple unification subproblems whilst delaying complex subproblems by adding them as constraints.

Secondly, one of the most expensive operations in first-order theorem provers is the maintenance of indices. Indices are crucial to the performance of modern solvers, as they facilitate the efficient retrieval of terms unifiable or matchable with a query term. However, solvers typically spend a large amount of time inserting and removing terms from indices as well as unifying against terms in the indices. This is particularly the case in the presence of the AVATAR architecture [24] wherein a change in the model can trigger the insertion and removal of thousands of terms from various indices. By moving unification to the calculus level, we can replace complex indices with simple hash maps, since to trigger an inference we merely need to check for top symbol equality and not unifiability. Insertion and deletion become O(1) time operations. However, for first-order logic, we do not expect the time gained to offset the downsides of extra inferences carried out and extra clauses created. Our experimental results back up this hypothesis (see Sect. 7). Our main contributions are:


### 2 Preliminaries

*Syntax.* We consider standard monomorphic first-order logic with equality. We assume a signature consisting of a finite set of (monomorphically) typed function symbols and a single predicate, equality, denoted by $\approx$. A non-equality atom $A$ can be expressed using equality as $A \approx \top$ where $\top$ is a special function symbol [18]. Terms are formed in the normal way from variables and function symbols. We commonly use $s$, $t$ or $u$ or their primed variants to refer to terms. We write $s : \tau$ to show that term $s$ has type $\tau$. A term is ground if it contains no variables. We use the notation $\overline{s}_n$ to refer to a tuple or list of terms of length $n$. More generally, we use the overbar notation to refer to tuples and lists of various objects. Where the length of the tuple or list is not relevant, we drop the subscript. By $s_i$ we denote the $i$th element of the tuple $\overline{s}_n$. Literals are positive or negative equalities written as $s \approx t$ and $s \not\approx t$ respectively. We use $s \mathrel{\dot{\approx}} t$ to refer to either a positive or a negative equality. Clauses are multisets of literals. A clause that contains no literals is known as the empty clause and denoted $\bot$.

A substitution is a mapping from variables to terms. We assume, w.l.o.g., that all substitutions are idempotent. We commonly denote substitutions using $\sigma$ and $\theta$ and denote the application of a substitution $\sigma$ to a term $s$ by $s\sigma$. A substitution $\theta$ is grounding for a term $s$ if $s\theta$ is ground. The definition of grounding substitution can be extended to literals and clauses in the obvious manner. A substitution $\sigma$ is a unifier of terms $s$ and $t$ if $s\sigma = t\sigma$. A unifier $\sigma$ is more general than a unifier $\sigma'$ if there exists a substitution $\rho$ such that $\sigma\rho = \sigma'$. With respect to syntactic first-order unification, if two terms are unifiable then they have a single most general unifier up to variable naming [1].
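The notions of substitution application, unifier, and "more general than" can be made concrete with a small sketch. The term encoding (variables as strings, applications as tuples headed by the function symbol) and the helper names `apply`, `compose` and `is_unifier` are assumptions made for illustration only; substitutions are assumed idempotent, as in the text.

```python
def apply(subst, term):
    """Apply a substitution (dict: variable name -> term) to a term."""
    if isinstance(term, str):                      # a variable
        return subst.get(term, term)
    # an application ("f", t1, ..., tn); constants are 1-tuples such as ("b",)
    return (term[0],) + tuple(apply(subst, arg) for arg in term[1:])

def compose(sigma, rho):
    """Substitution 'sigma followed by rho', i.e. t(sigma rho) = (t sigma) rho."""
    result = {x: apply(rho, t) for x, t in sigma.items()}
    for x, t in rho.items():
        result.setdefault(x, t)                    # bindings of rho not overridden by sigma
    return result

def is_unifier(subst, s, t):
    """A substitution unifies s and t if both become syntactically equal."""
    return apply(subst, s) == apply(subst, t)

# s = f(x, y) and t = f(g(y), y)
s = ("f", "x", "y")
t = ("f", ("g", "y"), "y")

sigma = {"x": ("g", "y")}                          # a most general unifier
sigma_prime = {"x": ("g", ("b",)), "y": ("b",)}    # a less general unifier
rho = {"y": ("b",)}                                # witness: sigma rho = sigma'
```

Here both `sigma` and `sigma_prime` unify `s` and `t`, and `compose(sigma, rho)` equals `sigma_prime`, which is exactly the condition under which $\sigma$ is more general than $\sigma'$.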

A transitive irreflexive relation over terms is known as an ordering. The superposition calculus we present below is, as usual, parameterised by a simplification ordering on ground terms. An ordering $\succ$ is a simplification ordering if it possesses the following properties. It is total on ground terms. It is compatible with contexts, meaning that if $s \succ t$, then $u[s] \succ u[t]$. It is well-founded. Note that every simplification ordering has the subterm property, namely that if $t$ is a proper subterm of $s$, then $s \succ t$. For non-ground terms, the only property that is required of the ordering is that it is stable under substitution. That is, if $s \succ t$ then for all substitutions $\sigma$, $s\sigma \succ t\sigma$. We extend the ordering to literals in the standard fashion via its multiset extension. A positive literal $s \approx s'$ is treated as the multiset $\{s, s'\}$, whilst a negative literal $s \not\approx s'$ is treated as the multiset $\{s, s, s', s'\}$. The ordering is extended to clauses by its two-fold multiset extension. We use $\succ$ to denote the ordering on terms and its multiset extensions to literals and clauses.
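The lifting of a term ordering to literals and clauses can be sketched as follows. This is an illustrative sketch only: integers stand in for ground terms under some total simplification ordering (so `operator.gt` plays the role of $\succ$), a literal is encoded as a hypothetical triple `(s, t, positive)`, and the function names are not taken from any prover.

```python
from collections import Counter
from operator import gt

def multiset_greater(m, n, greater):
    """Multiset extension of a strict order: m > n iff m != n and every element
    that n has in excess is dominated by some element that m has in excess."""
    m, n = Counter(m), Counter(n)
    only_m, only_n = m - n, n - m          # Counter subtraction keeps positive counts only
    if not only_m and not only_n:
        return False                        # equal multisets
    return all(any(greater(x, y) for x in only_m) for y in only_n)

def literal_multiset(s, t, positive):
    """{s, t} for a positive literal, {s, s, t, t} for a negative one."""
    return [s, t] if positive else [s, s, t, t]

def literal_greater(lit1, lit2, greater):
    return multiset_greater(literal_multiset(*lit1), literal_multiset(*lit2), greater)

def clause_greater(c1, c2, greater):
    """Two-fold multiset extension: a clause is a multiset of literal multisets."""
    key = lambda lit: tuple(sorted(literal_multiset(*lit)))
    return multiset_greater([key(l) for l in c1], [key(l) for l in c2],
                            lambda a, b: multiset_greater(a, b, greater))

# With terms 3 > 2 > 1:
neg_2_1 = (2, 1, False)    # 2 !~ 1  ->  {2, 2, 1, 1}
pos_2_1 = (2, 1, True)     # 2 ~ 1   ->  {2, 1}
pos_3_1 = (3, 1, True)     # 3 ~ 1   ->  {3, 1}
```

With this encoding a negative literal is larger than the positive literal over the same terms, while a positive literal containing a larger term (`pos_3_1`) still dominates `neg_2_1`.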

*Semantics.* An interpretation is a pair $(\mathcal{U}, \mathcal{I})$, where $\mathcal{U}$ is a set of typed universes and $\mathcal{I}$ is an interpretation function, such that for each function symbol $f : \tau_1 \times \cdots \times \tau_n \to \tau$ in the signature, $\mathcal{I}(f)$ is a concrete function of type $\mathcal{U}_{\tau_1} \times \cdots \times \mathcal{U}_{\tau_n} \to \mathcal{U}_{\tau}$. A valuation $\xi$ is a function that maps each variable $x : \tau$ to a member of $\mathcal{U}_{\tau}$. For a given interpretation $\mathcal{M}$ and valuation $\xi$, we use $[\![t]\!]^{\xi}_{\mathcal{M}}$ to represent the denotation of $t$ in $\mathcal{M}$ given $\xi$. A positive literal $s \approx t$ is true in an interpretation $\mathcal{M}$ for valuation $\xi$ if $[\![s]\!]^{\xi}_{\mathcal{M}} = [\![t]\!]^{\xi}_{\mathcal{M}}$ and false otherwise. A negative literal $s \not\approx t$ is true in an interpretation $\mathcal{M}$ for valuation $\xi$ if $s \approx t$ is false. A clause $C$ holds in an interpretation $\mathcal{M}$ for valuation $\xi$ if one of its literals is true in $\mathcal{M}$ for $\xi$. An interpretation $\mathcal{M}$ *models* a clause $C$ if $C$ holds in $\mathcal{M}$ for every valuation. An interpretation models a clause set if it models every clause in the set. A set of clauses $M$ entails a set of clauses $N$, denoted $M \models N$, if every model of $M$ is also a model of $N$.
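For a finite universe, these definitions can be executed directly. The sketch below makes two simplifying assumptions that are not in the text: it is single-sorted (one universe instead of typed universes), and it encodes terms as in the earlier sketches, with literals as hypothetical triples `(positive, s, t)`. The names `denote`, `literal_true` and `models` are illustrative.

```python
from itertools import product

def denote(term, interp, valuation):
    """Denotation of a term: variables via the valuation, symbols via the interpretation."""
    if isinstance(term, str):
        return valuation[term]
    return interp[term[0]](*(denote(a, interp, valuation) for a in term[1:]))

def literal_true(lit, interp, valuation):
    positive, s, t = lit
    equal = denote(s, interp, valuation) == denote(t, interp, valuation)
    return equal if positive else not equal

def variables(term):
    if isinstance(term, str):
        return {term}
    return set().union(*(variables(a) for a in term[1:]))

def models(universe, interp, clause):
    """True if the clause holds under every valuation of its variables."""
    xs = sorted(set().union(*(variables(s) | variables(t) for _, s, t in clause)))
    return all(
        any(literal_true(lit, interp, dict(zip(xs, values))) for lit in clause)
        for values in product(universe, repeat=len(xs))
    )

# A two-element universe where f swaps the elements and a denotes 0.
universe = [0, 1]
interp = {"f": lambda v: 1 - v, "a": lambda: 0}

involution = [(True, ("f", ("f", "x")), "x")]     # f(f(x)) ~ x
fixpoint = [(True, ("f", "x"), "x")]              # f(x) ~ x
no_fixpoint = [(False, ("f", "x"), "x")]          # f(x) !~ x
```

In this interpretation `involution` and `no_fixpoint` are modelled and `fixpoint` is not; the empty clause is modelled by no interpretation, since no literal can be true.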

### 3 Calculus

Intuitively, what we are aiming for with our calculus is that whenever standard superposition applies a substitution $\sigma$ to a conclusion with the side condition "$\sigma$ is a unifier of terms $t_1$ and $t_2$", our calculus adds a constraint $t_1 \not\approx t_2$ to the conclusion. The calculus then has further inference rules that mimic the steps of a first-order unification algorithm and work on negative literals. Our presentation below does not quite follow this intuition. Instead, if the unification problem is trivial we solve it immediately. If it is non-trivial, we carry out a single step of unification and add the resulting sub-problems as constraints. Our reasons for doing this are two-fold.


Wherever we present a clause as a subclause $C'$ and a literal $l$ (e.g. $C' \lor l$), we denote the entire clause by the same name as the subclause without the dash (e.g. we refer to the clause $C' \lor l$ by $C$). As in the classical superposition calculus, our calculus is parameterised by a *selection function* that is used to restrict the number of applicable inferences in order to avoid the search space growing unnecessarily. A selection function $\mathit{sel}$ is a function that maps a clause to a subset of its negative literals. We say that literal $l$ is $\sigma$-*eligible* in a clause $C' \lor l$ if it is selected in $C$ ($l \in \mathit{sel}(C)$), or there are no selected literals and $l\sigma$ is maximal in $C\sigma$. Strict $\sigma$-eligibility is defined in a like fashion, with maximality replaced by strict maximality. Where $\sigma$ is empty, we sometimes speak of eligibility instead of $\sigma$-eligibility. In what follows, $\mathcal{CS}$ is a multiset of literals that we refer to as *constraints*.

$$\begin{aligned} &\frac{D' \lor f(\overline{t}_n) \approx t' \qquad C' \lor s[f(\overline{s}_n)] \mathrel{\dot{\approx}} s'}{C' \lor D' \lor s[t'] \mathrel{\dot{\approx}} s' \lor \mathcal{CS}} \text{ Sup} \\\\ &\frac{D' \lor x \approx t' \qquad C' \lor s[f(\overline{s}_n)] \mathrel{\dot{\approx}} s'}{(C' \lor D' \lor s[t'] \mathrel{\dot{\approx}} s')\sigma} \text{ VSup} \end{aligned}$$

where $\sigma = \{x \to f(\overline{s}_n)\}$, and $\mathcal{CS} = t_1 \not\approx s_1 \lor \ldots \lor t_n \not\approx s_n$. Both rules share the following side conditions. Let $t$ stand for either $f(\overline{t}_n)$ or $x$. For Sup, the substitution $\sigma$ mentioned in the side conditions is of course empty.


$$\frac{C' \lor f(\overline{t}_n) \approx v' \lor f(\overline{s}_n) \approx v}{C' \lor v \not\approx v' \lor f(\overline{s}_n) \approx v \lor \mathcal{CS}} \text{ EqFact} \qquad \frac{C' \lor u' \approx v' \lor u \approx v}{(C' \lor v \not\approx v' \lor u \approx v)\sigma} \text{ VEqFact}$$

For EqFact, $\mathcal{CS} = t_1 \not\approx s_1 \lor \ldots \lor t_n \not\approx s_n$. For VEqFact, either $u$ or $u'$ must be a variable and $\sigma$ is the most general unifier of $u$ and $u'$. The side conditions for EqFact are:


The side conditions for VEqFact are:


The calculus also contains the following resolution/unification inferences. We refer to these as unification inferences, because each inference represents carrying out a single step of the well-known Robinson unification algorithm [11].

$$\frac{C' \lor f(\overline{s}_n) \not\approx f(\overline{t}_n)}{C' \lor \mathcal{CS}} \text{ Decompose} \qquad \frac{C' \lor x \not\approx t}{C'\sigma} \text{ Bind} \qquad \frac{C' \lor s \not\approx s}{C'} \text{ ReflDel}$$

where for Bind, $\sigma = \{x \to t\}$ and $x$ does not occur in $t$. For Decompose, $f(\overline{s}_n) \neq f(\overline{t}_n)$ and $\mathcal{CS} = t_1 \not\approx s_1 \lor \ldots \lor t_n \not\approx s_n$. All three inferences require that the final literal be $\sigma$-eligible in $C\sigma$ (for Decompose and ReflDel, $\sigma$ is empty). We provide some examples to show how the calculus works.

*Example 1.* Consider the unsatisfiable clause set:

$$C_1 = f(x, g(x)) \not\approx t \qquad C_2 = f(g(b), y) \approx t$$

A Sup inference between $C_1$ and $C_2$ results in clause $C_3 = t \not\approx t \lor x \not\approx g(b) \lor g(x) \not\approx y$. A ReflDel inference on $C_3$ results in the clause $C_4 = x \not\approx g(b) \lor g(x) \not\approx y$. An application of Bind on $C_4$ with $\sigma = \{x \to g(b)\}$ results in $C_5 = g(g(b)) \not\approx y$. Another application of Bind then leads to $\bot$.
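The unification inferences used in this example can be replayed with a short sketch. It is an illustration under simplifying assumptions, not the implementation evaluated in the paper: eligibility, selection and the term ordering are ignored, the rule is always applied to the first negative literal that admits one, and the encoding (variables as strings, applications as tuples, literals as triples `(positive, s, t)`) and all function names are hypothetical.

```python
def apply(subst, term):
    """Apply a substitution (dict: variable -> term) to a term."""
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(apply(subst, a) for a in term[1:])

def occurs(var, term):
    """Occurs check: does the variable appear in the term?"""
    if isinstance(term, str):
        return var == term
    return any(occurs(var, a) for a in term[1:])

def step(clause):
    """Apply ReflDel, Bind or Decompose to the first negative literal that admits one.
    Returns (rule_name, new_clause), or None if no rule applies."""
    for i, (positive, s, t) in enumerate(clause):
        if positive:
            continue
        rest = clause[:i] + clause[i + 1:]
        if s == t:                                        # ReflDel: drop s !~ s
            return "ReflDel", rest
        if isinstance(s, str) or isinstance(t, str):      # Bind: x !~ t with x not in t
            var, term = (s, t) if isinstance(s, str) else (t, s)
            if not occurs(var, term):
                sigma = {var: term}
                return "Bind", [(p, apply(sigma, l), apply(sigma, r)) for p, l, r in rest]
            continue
        if s[0] == t[0] and len(s) == len(t):             # Decompose: same top symbol
            return "Decompose", rest + [(False, a, b) for a, b in zip(s[1:], t[1:])]
    return None

def solve_constraints(clause):
    """Repeat unification inferences until none applies; return the rule trace and result."""
    trace = []
    while clause:
        result = step(clause)
        if result is None:
            break
        rule, clause = result
        trace.append(rule)
    return trace, clause

# C3 = t !~ t  v  x !~ g(b)  v  g(x) !~ y, treating t and b as constants.
b, t = ("b",), ("t",)
c3 = [(False, t, t), (False, "x", ("g", b)), (False, ("g", "x"), "y")]
trace, final = solve_constraints(c3)
print(trace, final)
```

Starting from $C_3$, the sketch applies ReflDel, Bind and Bind in that order and ends with the empty clause, matching the derivation described in the example.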

*Example 2.* Consider the unsatisfiable clause set:

$$C_1 = x \approx c \qquad C_2 = f(a, c) \not\approx t \qquad C_3 = f(c, c) \approx t$$

A VSup inference between $C_1$ and $C_2$ results in clause $C_4 = f(c, c) \not\approx t$. A Sup inference between $C_3$ and $C_4$ results in the clause $C_5 = t \not\approx t \lor c \not\approx c \lor c \not\approx c$. A triple application of ReflDel starting from $C_5$ derives $\bot$.

*Note 1.* We abuse terminology and use *inference* and *inference rule* to refer both to schemas such as shown above, as well as concrete instances of such schemas. Given an inference ι, we refer to the tuple of its premises by *prems*(ι), to its maximal premise by *mprem*(ι), and to its conclusion by *concl*(ι).

#### 4 Redundancy Criterion

We utilise Waldmann et al.'s framework [25] for proving the completeness of our calculus. Hence, our redundancy criterion is based on their intersected lifted criterion. In instantiating the framework, we roughly follow Bentkamp et al. [6]. Let the calculus defined above be referred to as $\mathit{Inf}$. We introduce a ground inference system $\mathit{GInf}$ that coincides with standard superposition [3]. That is, it contains the well-known three inferences, Sup, EqFact and EqRes. We refer to these inferences by GSup, GEqFact and GEqRes to indicate that they are only applied to ground clauses. Following the notation of the framework, we write $\mathit{Inf}(N)$ ($\mathit{GInf}(N)$) to denote the set of all $\mathit{Inf}$ ($\mathit{GInf}$) inferences with premises in a clause set $N$. We introduce a grounding function $\mathcal{G}$ that maps terms, literals and clauses to the sets of their ground instances. For example, given a clause $C$, $\mathcal{G}(C)$ is the set $\{C\theta \mid \theta \text{ is a grounding substitution}\}$. We extend the function $\mathcal{G}$ to clause sets by letting $\mathcal{G}(N) = \bigcup_{C \in N} \mathcal{G}(C)$ where $N$ is a set of clauses.

A ground clause $C$ is redundant with respect to a set of ground clauses $N$ if there are clauses $C_1, \ldots, C_n \in N$ such that for $1 \leq i \leq n$, $C_i \prec C$ and $C_1, \ldots, C_n \models C$. The set of all ground clauses redundant with respect to a set of ground clauses $N$ is denoted $\mathit{GRed}_{Cl}(N)$.

A clause $C$ is redundant with respect to a set of clauses $N$ if for every $D \in \mathcal{G}(C)$, $D$ is redundant with respect to $\mathcal{G}(N)$ or there is a clause $C' \in N$ such that $D \in \mathcal{G}(C')$ and $C \sqsupset C'$, where $\sqsupset$ is the strict subsumption relation. That is, $C \sqsupset C'$ if $C$ is subsumed by $C'$, but $C'$ is not subsumed by $C$. The set of all clauses redundant with respect to a set of clauses $N$ is denoted $\mathit{Red}_{Cl}(N)$.

In order to define redundant inferences, we have to pay careful attention to selection functions. For non-ground clauses, we fix a selection function $\mathit{sel}$. We then let $\mathcal{G}(\mathit{sel})$ be a set of selection functions on ground clauses with the following property. For each $\mathit{gsel} \in \mathcal{G}(\mathit{sel})$, for every ground clause $C$, there exists a clause $D$ such that $C \in \mathcal{G}(D)$ and the literals selected in $C$ by $\mathit{gsel}$ correspond to those selected in $D$ by $\mathit{sel}$. We write $\mathit{GInf}^{\mathit{gsel}}$ to show that the ground inference system $\mathit{GInf}$ is parameterised by the selection function $\mathit{gsel}$. Let $\iota$ be an inference in $\mathit{Inf}$. We extend the grounding function $\mathcal{G}$ to a family of grounding functions $\mathcal{G}^{\mathit{gsel}}$ for each $\mathit{gsel} \in \mathcal{G}(\mathit{sel})$. Each function $\mathcal{G}^{\mathit{gsel}}$ maps terms, literals and clauses as above, and maps members of $\mathit{Inf}$ to subsets of $\mathit{GInf}^{\mathit{gsel}}$ as follows.<sup>1</sup>

Definition 1 (Ground Instance of an Inference). *Let* $\iota$ *be of the form* $\frac{C_1, \ldots, C_n}{E \lor \mathcal{CS}}$*. An inference* $\iota_g \in \mathit{GInf}^{\mathit{gsel}}$ *is in* $\mathcal{G}^{\mathit{gsel}}(\iota)$ *if it is of the form* $\frac{C_1\theta, \ldots, C_n\theta}{E\theta}$ *for some grounding substitution* $\theta$*. In this case, we say that* $\iota_g$ *is the* $\theta$*-ground instance of* $\iota$*. Note that we ignore the constraints in the definition of ground instances.*

A ground inference $\frac{C_1, \ldots, C_n, C}{E}$ with maximal premise $C$ is redundant with respect to a clause set $N$ if for $1 \leq i \leq n$, $C_i \in \mathit{GRed}_{Cl}(N)$ or $C \in \mathit{GRed}_{Cl}(N)$ or there exist clauses $D_1, \ldots, D_m \in N$ such that for $1 \leq i \leq m$, $D_i \prec C$ and $D_1, \ldots, D_m \models E$. The set of all ground inferences redundant with respect to a set $N$ is denoted $\mathit{GRed}_I^{\mathit{gsel}}(N)$.

An inference $\iota$ is redundant with respect to a clause set $N$ if for every $\mathit{gsel} \in \mathcal{G}(\mathit{sel})$ and for every $\iota' \in \mathcal{G}^{\mathit{gsel}}(\iota)$, $\iota' \in \mathit{GRed}_I^{\mathit{gsel}}(\mathcal{G}(N))$. In words, every ground instance of the inference is redundant with respect to $\mathcal{G}(N)$. We denote the set of all redundant inferences with respect to a set $N$ as $\mathit{Red}_I(N)$.

A clause set N is saturated up to redundancy by an inference system *Inf* if every member of *Inf* (N) is redundant with respect to N.

*Note 2.* Given the definition of clause redundancy above, the ReflDel inference can be utilised as a *simplification* inference. That is, the conclusion of the inference renders the premise redundant.

#### 5 Refutational Completeness

To prove refutational completeness we utilise the above-mentioned framework of Waldmann et al. [25]. In particular, we use Theorem 14 from the paper to lift completeness from the ground level to the non-ground level. We restate Theorem 14 here for clarity and to keep the paper self-contained. We then present it in our notation. Let $\mathit{GRed} = (\mathit{GRed}_I^{\mathit{gsel}}, \mathit{GRed}_{Cl})$ and $\mathit{Red} = (\mathit{Red}_I, \mathit{Red}_{Cl})$.

Theorem 14 (from Waldmann et al. [25]). *If* $(\mathit{GInf}^q, \mathit{Red}^q)$ *is statically refutationally complete w.r.t.* $\models^q$ *for every* $q \in Q$ *and if for every* $N \subseteq \mathbf{F}$ *that is saturated w.r.t.* $\mathit{FInf}$ *and* $\mathit{Red}^{\cap\mathcal{G}}$ *there exists a* $q$ *such that* $\mathit{GInf}^q(\mathcal{G}^q(N)) \subseteq \mathcal{G}^q(\mathit{FInf}(N)) \cup \mathit{Red}_I^q(\mathcal{G}^q(N))$*, then* $(\mathit{FInf}, \mathit{Red}^{\cap\mathcal{G}})$ *is statically refutationally complete w.r.t.* $\models^{\cap}_{\mathcal{G}}$*.*

Theorem 14 (from Waldmann et al. in our Notation). *If* $(\mathit{GInf}^{\mathit{gsel}}, \mathit{GRed})$ *is statically refutationally complete w.r.t.* $\models$ *for every* $\mathit{gsel} \in \mathcal{G}(\mathit{sel})$ *and if for every clause set* $N$ *that is saturated w.r.t.* $\mathit{Inf}$ *and* $\mathit{Red}$ *there exists a* $\mathit{gsel}$ *such that* $\mathit{GInf}^{\mathit{gsel}}(\mathcal{G}^{\mathit{gsel}}(N)) \subseteq \mathcal{G}^{\mathit{gsel}}(\mathit{Inf}(N)) \cup \mathit{Red}_I(\mathcal{G}^{\mathit{gsel}}(N))$*, then* $(\mathit{Inf}, \mathit{Red})$ *is statically refutationally complete w.r.t.* $\models_{\mathcal{G}}$*.*

<sup>1</sup> When a grounding function $\mathcal{G}^{\mathit{gsel}}$ acts on a clause, literal or term, we commonly drop the $\mathit{gsel}$ superscript as the selection function plays no role in the grounding of these.

Thus, in our context, the set $Q$ is $\mathcal{G}(sel)$, the ground inference system $\mathit{GInf}^q$ maps to $\mathit{GInf}^{gsel}$, the ground redundancy criterion $\mathit{Red}^q$ is $(\mathit{GRed}^{gsel}_I, \mathit{GRed}_{Cl})$ and the ground entailment relation $\models^q$ maps to standard entailment on first-order clauses. Moreover, the non-ground inference system $\mathit{FInf}$ maps to $\mathit{Inf}$ and the redundancy criterion $\mathit{Red}^{\cap\mathcal{G}}$ maps to $(\mathit{Red}_I, \mathit{Red}_{Cl})$. Note that this final mapping is not exact, as the criterion $\mathit{Red}^{\cap\mathcal{G}}$ does not allow a tiebreaker ordering, such as the strict subsumption relation, to be used in the definition of non-ground redundancy. However, this mismatch is easily repaired, since Theorem 16 of the framework paper extends the result of Theorem 14 to the case where tiebreaker orderings are used.

As our ground inference systems $\mathit{GInf}^{gsel}$ are ground superposition systems, their static refutational completeness with respect to standard entailment and standard redundancy is a well-known result; see, for example, [2]. To apply Theorem 14 and show the static refutational completeness of $\mathit{Inf}$, it remains for us to prove Lemmas 1 and 3 below.


Lemma 1. *For every $gsel \in \mathcal{G}(sel)$, the grounding function $\mathcal{G}^{gsel}$ is a grounding function in the sense of the framework.*

*Proof.* We need to show that properties (G1)–(G3) defined by Waldmann et al. hold for grounding functions. These properties are:

(G1) for every $\bot \in \mathbf{F}_\bot$, $\emptyset \neq \mathcal{G}(\bot) \subseteq \mathbf{G}_\bot$;

(G2) for every $C \in \mathbf{F}$, if $\bot \in \mathcal{G}(C)$ and $\bot \in \mathbf{G}_\bot$ then $C \in \mathbf{F}_\bot$;

(G3) for every $\iota \in \mathit{FInf}$, if $\mathcal{G}(\iota) \neq \mathit{undef}$, then $\mathcal{G}(\iota) \subseteq \mathit{Red}_I(\mathcal{G}(\mathit{concl}(\iota)))$.

As properties (G1) and (G2) relate to the grounding of terms and clauses, and our grounding of these is fully standard, we skip them. We prove (G3), which in our terminology is: for every $\iota \in \mathit{Inf}$, $\mathcal{G}^{gsel}(\iota) \subseteq \mathit{GRed}^{gsel}_I(\mathcal{G}(\mathit{concl}(\iota)))$. This can be achieved by showing that for every $\iota' \in \mathcal{G}^{gsel}(\iota)$, there exists a set of clauses $\mathcal{C} \subseteq \mathcal{G}(\mathit{concl}(\iota))$ such that $\mathcal{C} \models \mathit{concl}(\iota')$ and $C_i \prec \mathit{mprem}(\iota')$ for each $C_i \in \mathcal{C}$. In what follows, let $\theta$ be the substitution by which $\iota'$ is a grounding of $\iota$.

If $\mathcal{CS}$ is the empty set in $\mathit{concl}(\iota)$, then $\mathit{concl}(\iota)\theta = \mathit{concl}(\iota')$ and hence $\mathit{concl}(\iota)\theta \models \mathit{concl}(\iota')$. Moreover, $\mathit{concl}(\iota)\theta \in \mathcal{G}(\mathit{concl}(\iota))$ and thus $\mathit{concl}(\iota)\theta \prec \mathit{mprem}(\iota')$.

On the other hand, if $\mathcal{CS}$ is not empty, let $u = f(\overline{t}_n)$ and $u' = f(\overline{s}_n)$ be the two terms within $\mathit{prems}(\iota)$ from which the constraints are created. By the existence of $\iota'$, we have that $u\theta = u'\theta$, and hence that $t_i\theta = s_i\theta$ for $1 \le i \le n$. Hence, every literal in $\mathcal{CS}\theta$ has the form $t \not\approx t$ and is trivially false in every interpretation. Thus, we still have $\mathit{concl}(\iota)\theta \models \mathit{concl}(\iota')$. Moreover, by the subterm property of the ordering we have that $t_i\theta \not\approx s_i\theta$ is smaller than the maximal/selected literal of $\mathit{mprem}(\iota')$ for $1 \le i \le n$, and hence that $\mathit{concl}(\iota)\theta \prec \mathit{mprem}(\iota')$.

Lemma 2. *Let $\sigma$ be the most general unifier of terms $s$ and $s'$, and $\theta$ be any unifier of the same terms. Then for any term $t$, $(t\sigma)\theta = t\theta$.*

*Proof.* Since $\sigma$ is the most general unifier, there must be a substitution $\rho$ such that $\sigma\rho = \theta$. Hence $(t\sigma)\theta = (t\sigma)\sigma\rho = t\sigma\rho = t\theta$, where the second-to-last step follows from the fact that $\sigma$ is idempotent.
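Lemma 2 can be checked mechanically on small instances. The following Python sketch is illustrative only and is not taken from the paper or its implementation: the term encoding (variables as strings, applications and constants as tuples) and the helper names `apply_subst` and `mgu` are assumptions made for this example. It computes an idempotent most general unifier and compares $(t\sigma)\theta$ with $t\theta$ for one concrete unifier $\theta$.

```python
def is_var(t):
    # Assumption: variables are plain strings, other terms are tuples (symbol, arg1, ...).
    return isinstance(t, str)

def apply_subst(t, sub):
    """Apply the substitution `sub` (dict: variable -> term) once to term `t`."""
    if is_var(t):
        return sub.get(t, t)
    return (t[0],) + tuple(apply_subst(a, sub) for a in t[1:])

def occurs(x, t):
    """Occurs check: does variable `x` appear in term `t`?"""
    if is_var(t):
        return x == t
    return any(occurs(x, a) for a in t[1:])

def mgu(s, t):
    """Return an idempotent most general unifier of s and t, or None if none exists."""
    sub = {}
    todo = [(s, t)]
    while todo:
        a, b = todo.pop()
        a, b = apply_subst(a, sub), apply_subst(b, sub)
        if a == b:
            continue
        if is_var(b) and not is_var(a):
            a, b = b, a
        if is_var(a):
            if occurs(a, b):
                return None
            # Keep the substitution idempotent: eliminate `a` from all existing bindings.
            sub = {v: apply_subst(r, {a: b}) for v, r in sub.items()}
            sub[a] = b
        elif a[0] != b[0] or len(a) != len(b):
            return None
        else:
            todo.extend(zip(a[1:], b[1:]))
    return sub

c = ("c",)
s1 = ("f", "X", ("g", "Y"))
s2 = ("f", ("g", "Z"), "X")
sigma = mgu(s1, s2)                          # most general unifier of s1 and s2
theta = {"X": ("g", c), "Y": c, "Z": c}      # some other (ground) unifier of s1 and s2
t = ("h", "X", "Y", "Z", "W")                # an arbitrary term
lemma2_holds = apply_subst(apply_subst(t, sigma), theta) == apply_subst(t, theta)
```

The idempotence of `sigma` is what makes the single-pass `apply_subst` sufficient here; with a triangular (non-idempotent) representation the equality would have to be checked after fully resolving the bindings.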

Lemma 3. *For every clause set $N$ saturated by $\mathit{Inf}$, there exists a $gsel \in \mathcal{G}(sel)$ such that $\mathit{GInf}^{gsel}(\mathcal{G}(N)) \subseteq \mathcal{G}^{gsel}(\mathit{Inf}(N)) \cup \mathit{GRed}^{gsel}_I(\mathcal{G}(N))$.*

*Proof.* For every $D \in \mathcal{G}(N)$ there must exist a clause $C \in N$ such that $D \in \mathcal{G}(C)$. Fix an arbitrary well-founded ordering on clauses, and let $C = \mathcal{G}^{-1}(D)$ denote the smallest clause with respect to this ordering such that $D \in \mathcal{G}(C)$. We then choose the $gsel \in \mathcal{G}(sel)$ that, for a clause $D \in \mathcal{G}(N)$, selects the literals corresponding to those selected by $sel$ in $\mathcal{G}^{-1}(D)$. Given this $gsel$, we need to show that every inference with premises in $\mathcal{G}(N)$ is either the ground instance of an inference with premises in $N$, or is redundant with respect to $\mathcal{G}(N)$.

A Sup inference is redundant if the term $t$ replaced in the second premise occurs at or below a variable. The proof is exactly the same as in the standard proof of the completeness of superposition [3], so we do not repeat it. All other inferences can be shown to be the ground instance of inferences from clauses in $N$.

Let $\iota \in \mathit{GInf}^{gsel}$ be the following GSup inference with premises in $\mathcal{G}(N)$.

$$\frac{D'\theta \lor t\theta \approx t'\theta \qquad C'\theta \lor s\theta[t\theta] \mathrel{\dot\approx} s'\theta}{C'\theta \lor D'\theta \lor s\theta[t'\theta] \mathrel{\dot\approx} s'\theta}$$

where $\mathcal{G}^{-1}(D\theta) = D = D' \lor t \approx t'$, $\mathcal{G}^{-1}(C\theta) = C = C' \lor s \mathrel{\dot\approx} s'$ and $\iota$ fulfils all the side conditions of GSup. Let $\sigma$ be any substitution. The literal $t\theta \approx t'\theta$ being strictly maximal in $D\theta$ implies that $t\sigma \approx t'\sigma$ is strictly maximal in $D\sigma$ due to the stability under substitution of $\succ$. The literal $s\theta[t\theta] \mathrel{\dot\approx} s'\theta$ being (strictly) eligible in $C\theta$ with respect to $gsel$ implies that $s\sigma \mathrel{\dot\approx} s'\sigma$ is strictly eligible in $C\sigma$ with respect to $sel$. Let $p$ be the position of $t\theta$ within $s\theta$ and let $u$ be the subterm of $s$ at $p$. Since the term $t\theta$ does not occur below a variable of $C$, such a position must exist. Moreover, $u$ cannot be a variable, since if it were, $t\theta$ would occur at a variable of $C$. As $\theta$ is a unifier of $u$ and $t$, it must be the case that either $t$ is a variable, or $u$ and $t$ have the same top symbol. Further, $D\theta \prec C\theta$ implies that $C\sigma \not\preceq D\sigma$, $t\theta \succ t'\theta$ implies that $t\sigma \not\preceq t'\sigma$, and $s\theta[t\theta] \succ s'\theta$ implies $s\sigma \not\preceq s'\sigma$. Thus, if $t$ is not a variable, there exists the following Sup inference $\iota'$ from clauses $D$ and $C$.

$$\frac{D' \lor t \approx t' \qquad C' \lor s[u] \mathrel{\dot\approx} s'}{C' \lor D' \lor s[t'] \mathrel{\dot\approx} s' \lor \mathcal{CS}}$$

We have that $(C' \lor D' \lor s[t'] \mathrel{\dot\approx} s')\theta = \mathit{concl}(\iota)$. That is, the grounding of the conclusion of $\iota'$, less the constraint literals, is equal to the conclusion of $\iota$. Thus, $\iota$ is the $\theta$-ground instance of $\iota'$ as per Definition 1. If $t$ is a variable $x$, then there exists the following VSup inference $\iota'$ from clauses $D$ and $C$.

$$\frac{D' \lor x \approx t' \qquad C' \lor s[u] \mathrel{\dot\approx} s'}{(C' \lor D' \lor s[t'] \mathrel{\dot\approx} s')\sigma}$$

where $\sigma = \{x \to u\}$ is the most general unifier of $t$ and $u$. Thus, we can use Lemma 2 to show that $\mathit{concl}(\iota')\theta = \mathit{concl}(\iota)$, and again $\iota$ is the $\theta$-ground instance of $\iota'$.

Let $\iota \in \mathit{GInf}^{gsel}$ be the following GEqFact inference with premise in $\mathcal{G}(N)$.

$$\frac{C'\theta \lor u'\theta \approx v'\theta \lor u\theta \approx v\theta}{C'\theta \lor v\theta \not\approx v'\theta \lor u\theta \approx v\theta}$$

where $u'\theta = u\theta$, $\mathcal{G}^{-1}(C\theta) = C = C' \lor u' \approx v' \lor u \approx v$ and $\iota$ fulfils all the side conditions of GEqFact. Let $\sigma$ be any substitution. The literal $u\theta \approx v\theta$ being maximal in $C\theta$ implies that $u\sigma \approx v\sigma$ is maximal in $C\sigma$. Since $\theta$ is a unifier of $u$ and $u'$, at least one of them must be a variable, or they must share a top symbol. Moreover, $u\theta \succ v\theta$ implies that $u\sigma \not\preceq v\sigma$ and $u'\theta \succ v'\theta$ implies that $u'\sigma \not\preceq v'\sigma$. If neither $u$ nor $u'$ is a variable, there exists the following EqFact inference $\iota'$ from $C$.

$$\frac{C' \lor u' \approx v' \lor u \approx v}{C' \lor v \not\approx v' \lor u \approx v \lor \mathcal{CS}}$$

We have $(C' \lor v \not\approx v' \lor u \approx v)\theta = \mathit{concl}(\iota)$, making $\iota$ the $\theta$-ground instance of $\iota'$ as per Definition 1. If either $u$ or $u'$ is a variable, there exists the following VEqFact inference $\iota'$ from $C$.

$$\frac{C' \lor u' \approx v' \lor u \approx v}{(C' \lor v \not\approx v' \lor u \approx v)\sigma}$$

where $\sigma$ is the most general unifier of $u$ and $u'$. Thus, we can use Lemma 2 to show that $\mathit{concl}(\iota')\theta = \mathit{concl}(\iota)$.

Finally, let $\iota \in \mathit{GInf}^{gsel}$ be the following GEqRes inference with premise in $\mathcal{G}(N)$.

$$\frac{C'\theta \lor s\theta \not\approx s'\theta}{C'\theta}$$

where $s\theta = s'\theta$, $\mathcal{G}^{-1}(C\theta) = C = C' \lor s \not\approx s'$ and $\iota$ fulfils all the side conditions of GEqRes. Let $\sigma$ be any substitution. The literal $s\theta \not\approx s'\theta$ being eligible with respect to $gsel$ in $C\theta$ implies that $s \not\approx s'$ is eligible in $C$ with respect to $sel$. Since $\theta$ is a unifier of $s$ and $s'$, at least one of them must be a variable, or they must share a top symbol. If $s = s'$, then there exists the following ReflDel inference $\iota'$ from $C$.

$$\frac{C' \lor s \not\approx s}{C'}$$

Otherwise we have two options. If $s$ (or analogously $s'$) is a variable, then there is the following Bind inference $\iota'$ from $C$.

$$\frac{C' \lor x \not\approx s'}{C'\sigma}$$

Otherwise $s$ and $s'$ must share a top symbol and there is the following Decompose inference $\iota'$ from $C$.

$$\frac{C' \lor f(\overline{s}_n) \not\approx f(\overline{t}_n)}{C' \lor \mathcal{CS}}$$

In the first case, we have $\mathit{concl}(\iota')\theta = \mathit{concl}(\iota)$. In the second case, $\sigma$ is the most general unifier of $s$ and $s'$, so we can use Lemma 2 to show that $\mathit{concl}(\iota')\theta = \mathit{concl}(\iota)$. In the last case, we have that $C'\theta = \mathit{concl}(\iota)$. Thus in all cases, $\iota$ is the $\theta$-ground instance of $\iota'$.
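The three-way case split used for GEqRes above can be summarised as a small dispatch on the shape of a negative literal $s \not\approx s'$. The Python sketch below is a hypothetical illustration, not the Vampire implementation: the term encoding (variables as strings, applications as tuples) and the function name `unification_step` are assumptions, and the occurs check on Bind follows the side condition mentioned in Sect. 7 ($x$ is not a subterm of $t$). For Decompose it returns the argument pairs from which the constraint literals $\mathcal{CS}$ would be built.

```python
def is_var(t):
    # Assumption: variables are strings; other terms are tuples (symbol, arg1, ..., argn).
    return isinstance(t, str)

def occurs(x, t):
    """Occurs check: does variable `x` appear in term `t`?"""
    if is_var(t):
        return x == t
    return any(occurs(x, a) for a in t[1:])

def unification_step(s, t):
    """Classify the negative literal s != t.

    Returns a pair (rule name, payload):
      ("ReflDel", None)       if both sides are syntactically equal,
      ("Bind", {x: term})     if one side is a variable not occurring in the other,
      ("Decompose", pairs)    if both sides share a top symbol; `pairs` are the
                              argument pairs that become constraint literals,
      (None, None)            if no rule applies (symbol clash or occurs-check failure).
    """
    if s == t:
        return ("ReflDel", None)
    if is_var(t) and not is_var(s):
        s, t = t, s                      # the "analogous" case: variable on the right
    if is_var(s):
        if occurs(s, t):
            return (None, None)
        return ("Bind", {s: t})
    if s[0] == t[0] and len(s) == len(t):
        return ("Decompose", list(zip(s[1:], t[1:])))
    return (None, None)

a, b = ("a",), ("b",)
step_refl = unification_step(("f", a), ("f", a))
step_bind = unification_step(("g", b), "X")
step_dec = unification_step(("f", "X", a), ("f", b, "Y"))
step_clash = unification_step(("f", a), ("g", a))
```

In the proof only the first three outcomes arise, because the ground instance $\iota$ exists and hence $\theta$ unifies $s$ and $s'$; the fourth outcome is included so that the sketch is total on arbitrary inputs.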

Using Lemmas 1 and 3 we can instantiate Theorem 14 to prove the static refutational completeness of $\mathit{Inf}$. There is a slight issue here, as Theorem 14 gives us refutational completeness with respect to Herbrand entailment, that is, $N \models_{\mathcal{G}} M$ if $\mathcal{G}(N) \models \mathcal{G}(M)$. We would like to prove completeness with respect to entailment as defined in Sect. 2 (known as Tarski entailment). This issue can easily be resolved by showing that the two concepts are equivalent with regard to refutations, which can be achieved in a manner similar to Bentkamp et al. (Lemma 4.19 of [6]).

Theorem 1 (Static refutational completeness). *For a set of clauses $N$ saturated up to redundancy by $\mathit{Inf}$, $N \models \bot$ if and only if $\bot \in N$.*

Theorem 17 of Waldmann et al.'s framework can be used to derive dynamic refutational completeness from static refutational completeness. We refer readers to the framework for the formal definition of dynamic refutational completeness.

Theorem 2 (Dynamic refutational completeness). *The inference system $\mathit{Inf}$ is dynamically refutationally complete with respect to the redundancy criterion $(\mathit{Red}_I, \mathit{Red}_{Cl})$.*

### 6 Extending to Higher-Order Logic

We sketch how the ideas above can be extended to higher-order logic. This is ongoing research, and many of the technical details have yet to be fully worked out. Here, we provide a (very) informal description and then provide examples. The higher-order unification problem is undecidable and there can exist a potentially infinite number of incomparable most general unifiers for a pair of terms [12]. Existing higher-order paramodulation style calculi deal with this issue in two main ways. One method is to abandon completeness and only unify to some predefined depth [22]. Another approach is to produce potentially infinite streams of unifiers and interleave the fetching of items from such streams with the standard saturation procedure [7]. Our idea is to solve easy sub-problems eagerly, such as when terms are first-order or in the pattern fragment [16], and add harder sub-problems as constraints. We then utilise dedicated inferences on negative literals to mimic the rules of Huet's well-known (pre-)unification procedure [12]. We think that inferences similar to the following two could be sufficient to achieve refutational completeness.

$$\frac{C' \lor x\,\overline{s}_n \not\approx f\,\overline{t}_m}{(C' \lor x\,\overline{s}_n \not\approx f\,\overline{t}_m)\{x \to \lambda \overline{y}_n.\, f\,(z_1\,\overline{y}_n) \dots (z_m\,\overline{y}_n)\}} \text{ Imitate}$$

$$\frac{C' \lor x\,\overline{s}_n \not\approx f\,\overline{t}_m}{(C' \lor x\,\overline{s}_n \not\approx f\,\overline{t}_m)\{x \to \lambda \overline{y}_n.\, y_i\,(z_1\,\overline{y}_n) \dots (z_p\,\overline{y}_n)\}} \text{ Project}$$

In both rules, each $z_i$ is a fresh variable of the relevant type, and $x\,\overline{s}_n \not\approx f\,\overline{t}_m$ is selected in $C$. Project has $k \le n$ conclusions, one for each $y_i$ of suitable type. We hope that through a careful definition of the selection function, along with the use of purification, we can avoid the need to apply unification inferences to flex-flex literals (negative literals where both sides of the equality have variable heads). Moreover, we are hopeful that the calculus we propose can remain complete without the need for inferences that carry out superposition beneath variables, such as the FluidSup rule of λ-superposition [7] and the SubVarSup rule of combinatory-superposition [9].

*Example 3.* Consider the unsatisfiable clause set:

$$C_1 = f \, y \, (x \, a) \, (x \, b) \, \not\approx t \qquad C_2 = f \, c \, a \, b \approx t$$

A Sup inference between $C_1$ and $C_2$ results in the clause $C_3 = t\sigma \not\approx t\sigma \lor x\,a \not\approx a \lor x\,b \not\approx b$ where $\sigma = \{y \to c\}$. Assume that the literal $x\,a \not\approx a$ is selected in $C_3$. We can carry out either a Project step or an Imitate step on this literal. The result of a Project step is $C_4 = (t\sigma \not\approx t\sigma \lor (\lambda z.\, z)\,a \not\approx a \lor x\,b \not\approx b)\{x \to \lambda z.\, z\}$. Applying the substitution and $\beta$-reducing results in $C_5 = t\sigma \not\approx t\sigma \lor a \not\approx a \lor b \not\approx b$, from which it is easy to reach a contradiction.
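The Project step of Example 3 can be replayed on the two constraint literals with a few lines of Python. This is a minimal sketch under stated assumptions: the tuple encoding of λ-terms and the helper names `subst` and `beta` are hypothetical, types are ignored, and substitution is only capture-safe because every term substituted here is closed. The literal $t\sigma \not\approx t\sigma$ is left out, since $t$ is not specified in the example.

```python
# Assumed encoding: ("var", name), ("const", name), ("app", fun, arg), ("lam", binder, body).

def subst(t, name, repl):
    """Replace free occurrences of variable `name` in `t` by `repl`.
    Capture is not handled: the sketch assumes `repl` is a closed term."""
    kind = t[0]
    if kind == "var":
        return repl if t[1] == name else t
    if kind == "const":
        return t
    if kind == "app":
        return ("app", subst(t[1], name, repl), subst(t[2], name, repl))
    if t[1] == name:                     # binder shadows `name`: nothing to replace below
        return t
    return ("lam", t[1], subst(t[2], name, repl))

def beta(t):
    """Beta-normalise `t` (terminates on the simply typed terms used here)."""
    kind = t[0]
    if kind == "app":
        fun, arg = beta(t[1]), beta(t[2])
        if fun[0] == "lam":
            return beta(subst(fun[2], fun[1], arg))
        return ("app", fun, arg)
    if kind == "lam":
        return ("lam", t[1], beta(t[2]))
    return t

A, B = ("const", "a"), ("const", "b")
x = ("var", "x")
identity = ("lam", "z", ("var", "z"))    # the Project binding x -> (lambda z. z)

# Constraint literals of C3, each stored as a pair (lhs, rhs) of a disequation.
c3_constraints = [(("app", x, A), A), (("app", x, B), B)]

# Apply the binding and beta-reduce both sides, as in the step from C4 to C5.
c5_constraints = [
    (beta(subst(lhs, "x", identity)), beta(subst(rhs, "x", identity)))
    for lhs, rhs in c3_constraints
]
```

After the step both remaining literals have identical sides, matching the literals $a \not\approx a$ and $b \not\approx b$ of $C_5$.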

*Example 4 (Example 1 of Bentkamp et al.* [7]*).* Consider the unsatisfiable clause set:

$$C_1 = f\ a \approx c \qquad C_2 = h\left(y\,b\right)\left(y\,a\right) \not\approx h\left(g\left(f\,b\right)\right)\left(g\,c\right)$$

An EqRes inference on $C_2$ results in $C_3 = y\,b \not\approx g\,(f\,b) \lor y\,a \not\approx g\,c$. An Imitate inference on the first literal of $C_3$, followed by the application of the substitution and some $\beta$-reduction, results in $C_4 = g\,(z\,b) \not\approx g\,(f\,b) \lor g\,(z\,a) \not\approx g\,c$. A further double application of EqRes gives us $C_5 = z\,b \not\approx f\,b \lor z\,a \not\approx c$. We again carry out Imitate on the first literal followed by an EqRes to leave us with $C_6 = x\,b \not\approx b \lor f\,(x\,a) \not\approx c$. We can now carry out a Sup inference between $C_1$ and $C_6$ resulting in $C_7 = x\,b \not\approx b \lor c \not\approx c \lor x\,a \not\approx a$, from which it is simple to derive $\bot$ via an application of Project on either the first or the third literal. Note that the empty clause was derived without the need for an inference that simulates superposition underneath variables, unlike in [7].

*Example 5 (Example 2 of Bentkamp et al.* [7]*).* Consider the unsatisfiable clause set:

$$C_1 = f\ a \approx c \qquad C_2 = h\left(y\left(\lambda x.\, g\left(f\ x\right)\right)a\right)y \not\approx h\left(g\,c\right)\left(\lambda w\,x.\, w\,x\right)$$

An EqRes inference on $C_2$ results in $C_3 = y\,(\lambda x.\, g\,(f\,x))\,a \not\approx g\,c \lor y \not\approx \lambda w\,x.\, w\,x$. Assuming that the second literal is selected,<sup>2</sup> an EqRes inference results in $C_4 = (y\,(\lambda x.\, g\,(f\,x))\,a \not\approx g\,c)\{y \to \lambda w\,x.\, w\,x\}$. Simplifying $C_4$ by applying the substitution and $\beta$-reducing, we obtain $g\,(f\,a) \not\approx g\,c$. Superposing $C_1$ onto this clause we end up with $C_5 = g\,c \not\approx g\,c$, from which the empty clause can easily be derived. Note again that the empty clause has been derived without recourse to a FluidSup-like inference.

### 7 Experimental Results

We implemented the calculus in the Vampire theorem prover [14]. We also implemented a variant of the calculus that uses fingerprint indices [19] as an imperfect filter. The completeness proof indicates that a superposition inference only needs to be carried out when the two terms can *possibly* unify. Therefore, we store terms in fingerprint indices, which act as fast imperfect filters for finding unification partners, and only carry out superposition inferences with terms returned by the index. This restricts, somewhat, the number of inferences that take place, at the expense of some loss of speed. Thus, it represents a midway path between eager unification and delayed unification. As a final twist, we implemented a version of the calculus that uses fingerprint indices and additionally solves constraint literals of the form $x \not\approx t$ (where $x$ is not a subterm of $t$) and $t \not\approx t$ eagerly. Thus, in this version of the calculus there is no need for the Bind and ReflDel rules.
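To illustrate what "imperfect filter" means here, the following Python sketch shows the general idea behind fingerprint-based unifiability checks. It is written from the general description of fingerprint indexing [19] and is not Vampire's index: the sampled positions in `POSITIONS`, the term encoding, and all function names are assumptions, and function symbols are assumed never to be named `A`, `B` or `N`. Each term is abstracted to a tuple of features at fixed positions, and two terms are kept as candidates only if their features are pairwise compatible.

```python
def is_var(t):
    # Assumption: variables are strings; other terms are tuples (symbol, arg1, ..., argn).
    return isinstance(t, str)

def feature(t, pos):
    """Feature of `t` at position `pos` (a tuple of 1-based argument indices):
    'A' if a variable sits at `pos`, 'B' if `pos` lies below a variable,
    'N' if `pos` exists in no instance of `t`, else the function symbol at `pos`."""
    for i in pos:
        if is_var(t):
            return "B"
        if i >= len(t):                  # t has only len(t) - 1 arguments
            return "N"
        t = t[i]
    return "A" if is_var(t) else t[0]

def compatible(f1, f2):
    """Can two terms with these features at the same position still unify?"""
    if f1 == "B" or f2 == "B":
        return True                      # anything may appear below a variable
    if f1 == "N" or f2 == "N":
        return f1 == f2                  # a missing position only matches a missing one
    if f1 == "A" or f2 == "A":
        return True                      # a variable matches any existing subterm
    return f1 == f2                      # two function symbols must coincide

POSITIONS = [(), (1,), (2,), (1, 1)]     # hypothetical choice of sampled positions

def fingerprint(t):
    return tuple(feature(t, p) for p in POSITIONS)

def may_unify(s, t):
    """Necessary (not sufficient) condition for s and t to be unifiable."""
    return all(compatible(x, y) for x, y in zip(fingerprint(s), fingerprint(t)))

a, b = ("a",), ("b",)
keeps_partner = may_unify(("f", "X", a), ("f", ("g", b), a))    # genuinely unifiable
drops_clash = may_unify(("f", a, b), ("f", ("g", b), a))        # clash a / g at position 1
false_positive = may_unify(("f", "X", "X"), ("f", a, b))        # not unifiable, yet kept
```

The last case shows why the filter is imperfect: the fingerprint does not track that both arguments are the same variable, so a non-unifiable pair passes the check and any mismatch is only discovered later.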

We compared each of these approaches with the standard superposition calculus implemented in Vampire. We refer to the standard calculus as Vampire and the delayed inference calculus without fingerprint indices as Vampire\*.<sup>3</sup> We refer to the delayed inference calculus with fingerprint indices as Vampire†. Finally, we refer to the calculus that eagerly solves some constraint literals as Vampire‡.<sup>4</sup>

<sup>2</sup> Most orderings would select the first literal. In this case, we can still derive a contradiction, but the proof is longer.

<sup>3</sup> Our implementation can be found at https://github.com/vprover/vampire/tree/delayed-unification. To run the new calculus, use option -duc on. To run the standard calculus, set the option duc to off.

We tested these approaches against each other on benchmarks from the CASC 2023 system competition [23]. As our new approach is not currently compatible with higher-order or polymorphic input, we restricted the comparison to monomorphic first-order problems. Namely, we used the 500 benchmarks in the FNE and FEQ categories. These are monomorphic, first-order benchmarks that either include equality (FEQ) or do not contain equality (FNE). All benchmarks in the set are theorems. The results can be seen in Table 1. All experiments were run on a node cluster located at The University of Manchester. Each node in the cluster is equipped with 192 gigabytes of RAM and 32 Intel® Xeon processors with two threads per core. Each configuration was given 100 s of CPU time per problem and run in single core mode. Vampire was run with options --mode casc, which causes it to use a tuned portfolio of strategies. All other variants were run with options --mode casc --forced\_options duc=on, which forces the use of the new calculus on top of the aforementioned portfolio.


Table 1. Summary of experimental results

The calculi based on delayed unification perform poorly in comparison to standard superposition. This is unsurprising, as syntactic first-order unification is already an efficient process. By replacing it with delayed unification, we gain little in terms of time, but pay a heavy penalty in terms of the number of inferences carried out. The use of fingerprint indices helps somewhat in mitigating this issue, but not a great deal. Eagerly solving trivial constraints shows more promise and is actually able to solve two problems that the standard calculus cannot (within the time limit). These are the benchmarks CSR036+3.p and LAT347+3.p.

### 8 Related Work

The only other proof calculi that we are aware of that explicitly integrate unification rules at the calculus level are the higher-order paramodulation calculi [8,22] and lazy paramodulation [21]. However, these calculi are paramodulation calculi and do not incorporate certain concepts of redundancy so crucial to the success of superposition provers. Moreover, the completeness proofs for these calculi are based on very different techniques to the Bachmair & Ganzinger style model building proofs commonly employed in the completeness proofs of superposition calculi.

<sup>4</sup> The code for both Vampire† and Vampire‡ can be found at branch https://github.com/vprover/vampire/tree/delayed-unif-with-fp. Vampire† was built from commit c04a08feb5db3e7468a1fa and Vampire‡ from commit fa2f139302b6a7a6487e73. Again, option -duc on is required for the new calculi to run.

There are other calculi that in some form do represent the folding of unification into the calculus, but the link between the unification rules and the calculus is less clear. For example, the recent work by one of the authors of this paper [13] relating to reasoning about linear arithmetic moves theory reasoning relating to a number of equations from the unification algorithm to the calculus level. A different example, by another author of this paper, is the combinatory-superposition calculus [9], which essentially folds higher-order combinatory unification into the calculus. In both cases, the relationship between the unification algorithm and the calculus rules is not obvious.

There are other methods of dovetailing unification with inference rules. For example, a unification procedure can be modified to return a stream of results. This stream can be interrupted in order to carry out further inferences and then returned to later. This is the approach taken by the higher-order Zipperposition prover [7] in order to handle the infinite sets of unifiers returned by higher-order unification. Conceptually, this is a very different solution to using constraints, since the intermediate terms created during unification are not available to the entire calculus as they are in our approach. Furthermore, from an implementation perspective, streams of unifiers are a far greater departure from the standard saturation architecture than the adding of constraints. Unification can also be partially delayed by preprocessing techniques such as Brand's modification method and its developments [5].

As mentioned in the introduction, abstraction resembles the basic strategy [4,15], where unification problems are added to the constraint part of a clause. Periodically, these constraints can be checked for satisfiability and clauses with unsatisfiable constraints removed. However, in the basic strategy, the constraints do not interact with the rest of the proof calculus. Moreover, redundancy of clauses can no longer be defined in terms of ground instances, but only in terms of ground instances that satisfy the constraints. This significantly affects the simplification machinery of superposition/resolution.

Unification with abstraction was first introduced, to the best of our knowledge, by Reger et al. in [17] in the context of theory reasoning. However, the concept was introduced in an ad-hoc fashion with no theoretical analysis of its impact on the completeness of the underlying calculus. Recently, the relationship between unification modulo an equational theory and unification with abstraction has been analysed [13] and a framework developed linking the two. It remains to explore whether the current work can fit into that framework.

#### 9 Conclusion

We have developed a first-order superposition calculus that delays unification through the use of constraints, and proved its completeness. Whilst the calculus does not perform well in practice, we feel that the calculus and its completeness proof form a template that can be followed to prove the completeness of calculi that involve unification procedures more complex than syntactic first-order unification, for example unification modulo a set of equations $E$. Some of the crucial features of our approach are: (1) the carrying out of partial unification and adding the remaining unification pairs back as constraints, and (2) the ignoring of constraint literals in the definition of redundant inference. In particular, feature (1) may well be crucial in taming issues relating to undecidable unification problems. For example, in higher-order logic, where unification is undecidable, it is common to run unification to a particular depth and then give up if termination has not occurred. Of course, this harms completeness. With our approach it should be possible to add the remaining unification pairs back as constraints and maintain completeness. In the future, we would like to generalise our approach into a framework that can be used to prove the completeness of a variety of calculi, as long as the unification problem for the underlying terms meets certain conditions. We would also like to explore instantiating such a framework to prove the completeness of particular calculi of interest to us, such as AC-superposition and higher-order superposition.

Acknowledgements. We acknowledge funding from the ERC Consolidator Grant ARTIST 101002685, the TU Wien Doctoral College SecInt, and the FWF SFB project SpyCoDe F8504.

### References



# **On Incremental Pre-processing for SMT**

Nikolaj Bjørner<sup>1</sup> and Katalin Fazekas<sup>2</sup>

<sup>1</sup> Microsoft Research, Redmond, USA, nbjorner@microsoft.com

<sup>2</sup> TU Wien, Vienna, Austria, katalin.fazekas@tuwien.ac.at

**Abstract.** We introduce a calculus for incremental pre-processing for SMT and instantiate it in the context of z3. It identifies when powerful formula simplifications can be retained when adding new constraints. Use cases that could not be solved in incremental mode can now be solved incrementally thanks to the availability of pre-processing. Our approach admits a class of transformations that preserve satisfiability, but not equivalence. We establish a taxonomy of pre-processing techniques that distinguishes cases where new constraints are modified or constraints previously added have to be replayed. We then justify the soundness of the proposed incremental pre-processing calculus.

### **1 Introduction**

Pre-processing is a central ingredient for scaling automated deduction. These techniques apply targeted global simplification steps that can drastically reduce the complexity of problems before search techniques that use mainly local inference steps are invoked. They are used across several solver domains, spanning SAT, SMT, first-order automated theorem proving, constraint programming, and integer programming. With the exception of SAT solvers, prior techniques do not combine well when new constraints are added incrementally to a pre-processed state. Solvers have the option to restart pre-processing from scratch. This model is viable if the overall number of solver calls is small compared to the time spent solving, but is not practical for scenarios where many minor variations of a set of main constraints are queried. Such scenarios may be found in applications of dynamic symbolic execution or symbolic model checking.

A procedure to incorporate pre- and in-processing techniques [27] into incremental SAT solvers was introduced in [18], where such incremental in-processing allowed a dramatic improvement in the performance of bounded model checking applications. In the case of SAT, the effect of a simplification step is recorded in a *reconstruction stack*. Each eliminated clause is saved on that stack together with a partial assignment, called its *witness*, that is used to show the redundancy of the eliminated clause. For example, the redundancy of a blocked clause is witnessed by its blocked literal, a literal upon which all resolvents are tautological [26,32]. The reconstruction stack has two very important roles in SAT solvers. First of all, it has all the information that is necessary for model reconstruction [25]. When the elimination of a clause is not model-preserving, its witness on the stack tells how to modify or extend any found solution of the simplified formula such that it then satisfies the removed clause as well. Beyond that, the reconstruction stack makes it possible to recognize all those previous simplification steps that are potentially invalidated by an incrementally added new constraint. For example, literals that were blocked in the global state of the previous clauses might not be blocked any more in the presence of some new constraints. Finding these clauses and their cone of influence on the reconstruction stack makes it possible to *undo* only the problematic previous simplification steps, thereby allowing pre- and in-processing to be incremental [18].
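The role of the reconstruction stack in model reconstruction can be seen on a tiny propositional example. The Python sketch below is a simplified illustration and not the data structure of any particular SAT solver: clauses are sets of DIMACS-style integer literals, the witness is a single literal rather than a general partial assignment, and the names `is_blocked` and `reconstruct` are assumptions for this example.

```python
def is_blocked(clause, lit, formula):
    """`clause` is blocked on `lit` if every resolvent on `lit` with a clause
    of `formula` containing the complement of `lit` is a tautology."""
    for other in formula:
        if -lit in other:
            resolvent = (clause - {lit}) | (other - {-lit})
            if not any(-l in resolvent for l in resolvent):
                return False
    return True

def reconstruct(model, stack):
    """Repair a total model (set of true literals) of the simplified formula by
    replaying the reconstruction stack from the most recent entry backwards."""
    model = set(model)
    for clause, witness in reversed(stack):
        if not (clause & model):         # the eliminated clause is falsified
            model.discard(-witness)      # flip the witness literal to true
            model.add(witness)
    return model

def satisfies(model, formula):
    return all(clause & model for clause in formula)

# Original formula: (x1 or x2) and (not x1 or not x2) and (x2 or x3).
original = [frozenset({1, 2}), frozenset({-1, -2}), frozenset({2, 3})]

# (x1 or x2) is blocked on x1, so it is removed and recorded with its witness.
eliminated = original[0]
blocked = is_blocked(eliminated, 1, original)
stack = [(eliminated, 1)]
simplified = original[1:]

# A model of the simplified formula that falsifies the eliminated clause.
model_simplified = {-1, -2, 3}
model_original = reconstruct(model_simplified, stack)
```

Here the model found for the simplified formula sets both $x_1$ and $x_2$ to false; replaying the stack flips the witness literal $x_1$, which yields a model of all three original clauses.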

Motivated by incremental in-processing SAT solvers, our goal here is to pave a path towards a similar mechanism in the context of SMT solvers. However, SMT problems extend propositional SAT formulas in several dimensions: the base theory of SMT is the theory of equality over uninterpreted functions and predicates, and SMT formulas may contain quantifiers as well as constants and functions that have interpretations over theories. Concrete cases of incremental SMT pre-processing were considered in [19]. While most of the formula simplification techniques of SAT solvers are captured by well-studied redundancy properties [23], such a unified understanding and description of SMT pre-processing techniques has not yet been established. Though some redundancy notions of SAT solvers can be directly embedded or generalized to SMT [30], a notion that appears to capture simplifications in SMT in many cases is that of a *substitution*: an uninterpreted constant or function is defined into a solved form and the constraints are simplified based on the solution. When new constraints containing the solved function symbols are added after pre-processing, our method distinguishes between simplifications that allow applying the substitution to the new formula and those that require removing the substitution and re-adding the old constraints that were simplified. We have found it useful to characterize pre-processing simplifications by the following categories.

*Equivalence Preserving Simplifications.* Many simplification methods are based on equivalence preserving simplifications. For example, $x > x - y + 1$ simplifies to $y > 1$. They are automatically incremental by virtue of not changing the set of models. Developing equivalence preserving simplifications is a significant area of research and engineering by itself. A good example is using and-inverter graphs (AIGs) for simplifying propositional and first-order formulas [24,45]. The main challenge with developing equivalence preserving simplifications in an incremental setting is to make them efficient.

*Rigid Constrained Simplifications.* An important class of simplifications is based on eliminating variables by finding solutions to them. In the formula $x \leq y + 1 \land x \geq y + 1 \land \varphi[x, y]$ we can solve for $x$ (or $y$) by setting $x \simeq y + 1$ and then substituting the solution for $x$ into $\varphi$. The simplified formula is $\varphi[y + 1, y]$. The models of the original formula must all satisfy the equality $x \simeq y + 1$. This property makes it possible to reuse the simplification when later adding a formula $\psi[x, y]$. It can be added by applying the solution for $x$: $\psi[y + 1, y]$. A model of $\varphi[y + 1, y] \land \psi[y + 1, y]$ must conversely correspond to a model of the original formulas $\varphi[x, y]$ and $\psi[x, y]$. The solution $x \mapsto y + 1$ is used in a *model converter* to establish the original model. Some pre-processing techniques translate constraints from one domain to another. For example, formulas over bounded integers can be solved by translation into bit-vectors. This translation can be described with a set of equalities where bounded integers are solved for their bit-vector representation (see later an example in Table 1).

*Under Constrained Simplifications.* The rigid constrained simplifications already cover a significant class of pre-processing methods. Allowing incremental solving for variables has a profound practical effect on using Z3 incrementally in user scenarios. There is however a larger class of simplifications that also allow eliminating variables but do not preserve solutions to the eliminated variable. These simplifications have the same or more solutions for symbols in the original formula and we call them *under-constrained*. For example, the formula $((x \simeq y \land y < z + u) \lor y \geq z \cdot u)$ contains $x$ in only one position. It can be replaced by the formula $((b \land y < z + u) \lor y \geq z \cdot u)$ where $b$ is fresh. Similarly, introducing definitions of fresh symbols does not eliminate solutions to symbols in the original formula. Lastly, when removing redundant clauses, the new formula may have more solutions. Tseitin transformation introduces definitions that allow removing redundant, non-CNF, formulas.

*Over Constrained Simplifications.* Symmetry reduction [14,38] and strengthening using propagation redundancy criteria [37] are prominent examples of simplifications that apply strengthening to reduce the search space. These transformations fall outside the classes covered by our main result. We leave it to future work to examine whether or how to incorporate strengthening: one avenue is to leverage assumption literals [16] to temporarily enable strengthenings either as part of pre-processing or during search [39].

Table 1 summarizes the main categories of pre-processing techniques discussed so far. This paper develops a calculus of incremental pre-processing for rigid constrained, under-constrained, clause elimination, and introduction of definitions. However, it does not discuss further over-constrained simplifications.

In this paper we introduce the concept of *simplification modulo substitutions* and show that the main SMT pre-processing methods maintain such a property. Based on that, we show how to apply or revert the effect of previous pre-processing steps when new formulas are added after simplification.

#### **2 Preliminaries**

We assume the usual notions of first-order logic with equality, satisfiability, logical consequence and theory, as described e.g. in [17]. An interpretation $\mathcal{M}$ for a signature $\Sigma$ (or $\Sigma$-model) consists of a non-empty set $\mathcal{U}_{\mathcal{M}}$ called the universe of the model, and a mapping $(\cdot)^{\mathcal{M}}$ assigning to each variable and constant symbol an element of $\mathcal{U}_{\mathcal{M}}$, to each $n$-ary function symbol $f$ in $\Sigma$ an $n$-ary function $f^{\mathcal{M}}$ from $\mathcal{U}_{\mathcal{M}}^n$ to $\mathcal{U}_{\mathcal{M}}$, and to each $n$-ary predicate symbol $p$ in $\Sigma$ an $n$-ary function from the set $\mathcal{U}_{\mathcal{M}}^n$ to distinguished values representing true and false. Note that to keep the presentation simple, we only consider a single universe in the models. Interpretations extend to terms by composition.

**Table 1.** Main categories of pre-processing techniques found in SMT solvers. Function *ite* is an abbreviation for *if-then-else* and *bv2int* is a function that maps a bit-vector to an integer value.

We use the term *symbols* to refer to uninterpreted symbols (variables) and function symbols. Given a model $\mathcal{M}$ and a symbol $x$, the model $\mathcal{M}[x \mapsto a]$ is exactly the same as $\mathcal{M}$, except that $x$ is interpreted as $a$, where $a \in \mathcal{U}_{\mathcal{M}}$ for 0-ary symbols and $a$ is a function over $\mathcal{U}_{\mathcal{M}}$ for $n$-ary function or predicate symbols.

**Lemma 1 (Translation Lemma** [41]**).** *If* $F$ *is a formula and* $t$ *is a term s.t. no variable in* $t$ *occurs bound in* $F$*, then* $\mathcal{M} \models F[t/x]$ *iff* $\mathcal{M}[x \mapsto t^{\mathcal{M}}] \models F$*.*

Note that we may use λ terms to represent updates to function and predicate symbols. The interpretation of a λ term is a function.

We use the term *Skolem symbols* for $n$-ary function symbols (where $n = 0$ is possible) that cannot occur in input formulas. Only pre-processing methods may introduce Skolem symbols, which guarantees that they are fresh.

**Convention 1 (Variable non-capture).** *Throughout this paper we assume that free and bound variables are disjoint, such that when we substitute a term* t *for a variable* x *in formula* F*, none of the variables in* t *are captured.*

**Definition 1 (Labeled substitution).** $\langle x \leftarrow t; \Psi \rangle^{\mathbb{B}}$ *represents a substitution of* $x$ *by* $t$*, justified by the formula* $\Psi$*. The label* $\mathbb{B}$ *is either* $\top$ *or* $\bot$ *and it indicates whether the map* $x \mapsto t$ *may be used as an equal replacement of* $\Psi$*.*

*Example 1.* The labeled substitution $\langle x \leftarrow y + 1; x \simeq y + 1 \rangle^{\bot}$ represents the substitution of $x$ by $y + 1$ justified by the formula $x \simeq y + 1$. The label $\bot$ of the substitution indicates that applying the substitution on a formula $F$ where $x \simeq y + 1$ is present does not change the set of models of the formula.

**Definition 2.** *Given* $\theta = \langle x_1 \leftarrow t_1; \Psi_1 \rangle^{\mathbb{B}_1} \langle x_2 \leftarrow t_2; \Psi_2 \rangle^{\mathbb{B}_2} \ldots \langle x_n \leftarrow t_n; \Psi_n \rangle^{\mathbb{B}_n}$ *and an interpretation* $\mathcal{M}$*, we define the interpretation* $\mathcal{M}\theta$ *as follows:*

$$\mathcal{M}\varepsilon = \mathcal{M}$$

$$\mathcal{M}\theta \langle x \leftarrow t; \Psi \rangle^{\mathbb{B}} = (\mathcal{M}[x \longmapsto t^{\mathcal{M}}])\theta$$

**Definition 3.** *Given* $\theta = \langle x_1 \leftarrow t_1; \Psi_1 \rangle^{\mathbb{B}_1} \langle x_2 \leftarrow t_2; \Psi_2 \rangle^{\mathbb{B}_2} \ldots \langle x_n \leftarrow t_n; \Psi_n \rangle^{\mathbb{B}_n}$ *and a formula* $F$*, we define the formula* $F\theta$ *as follows:*

$$F\varepsilon = F$$

$$F\langle x \leftarrow t; \Psi\rangle^{\mathbb{B}}\theta = (F[t/x])\theta$$

Informally, a sequence of substitutions θ is applied to interpretations from right to left (i.e. backwards), while to formulas from left to right (i.e. forward). Further, note that the translation lemma generalizes in a straight-forward way to substitutions.
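The two directions of application can be illustrated with a small Python sketch. It is only an illustration under simplifying assumptions: terms are nested tuples over integer arithmetic, labels and justifications $\Psi$ are omitted from $\theta$, and the names `substitute`, `evaluate`, `apply_to_formula` and `apply_to_model` are hypothetical helpers, not part of any solver API.

```python
import operator
from typing import Dict, List, Tuple, Union

Term = Union[int, str, tuple]      # integer constant, symbol name, or (op, left, right)
Subst = List[Tuple[str, Term]]     # theta as [(x1, t1), ..., (xn, tn)]; labels and Psi omitted

OPS = {"+": operator.add, "-": operator.sub, ">": operator.gt, "<=": operator.le}

def substitute(term: Term, x: str, t: Term) -> Term:
    """Return term[t/x]; position 0 of a tuple is the operator and is never replaced."""
    if isinstance(term, str):
        return t if term == x else term
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(arg, x, t) for arg in term[1:])
    return term

def evaluate(term: Term, model: Dict[str, int]):
    """Interpret a term or atom in a model (interpretations extend by composition)."""
    if isinstance(term, str):
        return model[term]
    if isinstance(term, tuple):
        op, left, right = term
        return OPS[op](evaluate(left, model), evaluate(right, model))
    return term

def apply_to_formula(formula: Term, theta: Subst) -> Term:
    """F theta: substitutions are applied left to right (Definition 3)."""
    for x, t in theta:
        formula = substitute(formula, x, t)
    return formula

def apply_to_model(model: Dict[str, int], theta: Subst) -> Dict[str, int]:
    """M theta: substitutions are applied right to left (Definition 2)."""
    for x, t in reversed(theta):
        model = {**model, x: evaluate(t, model)}
    return model

# theta = <x <- y + 1> <y <- z + 1>, applied to the atom x > z and the model {z: 5}
theta = [("x", ("+", "y", 1)), ("y", ("+", "z", 1))]
formula = (">", "x", "z")
model = {"z": 5}
print(apply_to_formula(formula, theta))
print(apply_to_model(model, theta))
```

On this instance, evaluating `apply_to_formula(formula, theta)` in `model` gives the same truth value as evaluating `formula` in `apply_to_model(model, theta)`, which is the generalized translation lemma for this particular $\theta$.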

#### **3 Incremental Pre-processing**

In this section we introduce a calculus to describe incremental pre-processing for SMT based on the following notion.

**Definition 4 (Simplification modulo** θ**).** *We say that the formula* F *simplifies to* <sup>F</sup> *modulo* <sup>θ</sup>*, denoted* <sup>F</sup> <sup>θ</sup> <sup>F</sup> *if*

*– If* $\mathcal{M} \models F$ *then there is a model* $\mathcal{M}'$ *such that* $\mathcal{M}' \models F'$ *and* $\mathcal{M}'$ *agrees with* $\mathcal{M}$ *on all symbols that are in* $F$ *or in background theories or not in* $F'$*.*

*– If* $\mathcal{M}' \models F'$ *then* $\mathcal{M}'\theta \models F$*.*

It follows that simplification allows transitive chaining assuming that symbols are not recycled.

**Lemma 2 (Transitivity of simplification).** *Let* $F$ *simplify to* $F'$ *modulo* $\theta$ *and* $F'$ *simplify to* $F''$ *modulo* $\theta'$ *such that every symbol that is both in* $F$ *and* $F''$ *also occurs in* $F'$ *(i.e. old symbols are not re-introduced). Then* $F$ *simplifies to* $F''$ *modulo* $\theta\theta'$*.*

#### **3.1 Simplification Rules**

There are several possible situations where the concept of simplification modulo substitutions can be used to capture potential simplification steps. For example, a useful special case for simplification modulo $\theta$ is when a formula $F$ implies an equality $x \simeq t$ that can then be turned into a substitution to simplify $F$.

*Example 2.* The formula $\mathit{isCons}(x) \land F[x]$ implies $\exists h, t \,.\, x \simeq \mathit{cons}(h, t)$, where $h, t$ are fresh variables (corresponding to the head and tail of a cons list). We may substitute $x$ by $\mathit{cons}(h, t)$ in $F[x]$ to eliminate $x$. The literal $\mathit{isCons}(\mathit{cons}(h, t))$ is equivalent to true and $F[\mathit{cons}(h, t)]$ is a model simplification of the original formula modulo $x \simeq \mathit{cons}(h, t)$.

There are also useful special cases where a formula $F$ *does not* imply an equality $x \simeq t$, but the same equality may still be used to simplify $F$.

*Example 3.* In the formula $F := ((x \simeq 3 \land x > u) \lor y > u) \land u > z$ we can substitute $x \mapsto 3$ and retain simplification. The formula $F$ simplifies to $F[3/x] := (3 > u \lor y > u) \land u > z$, but $F$ does not imply $x = 3$.
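The claims of Example 3 can be sanity-checked by brute force over a small integer range. The following Python sketch is only a finite check under that assumption (the domain `range(6)` and the function names are chosen for illustration); it tests both conditions of simplification modulo $x \mapsto 3$ and confirms that $F$ has a model with $x \neq 3$.

```python
from itertools import product

DOMAIN = range(6)  # small finite domain, chosen only for illustration

def formula(x, y, u, z):
    """F := ((x = 3 and x > u) or y > u) and u > z."""
    return ((x == 3 and x > u) or y > u) and u > z

def simplified(y, u, z):
    """F[3/x] after simplification: (3 > u or y > u) and u > z."""
    return (3 > u or y > u) and u > z

def check_example_3():
    # Every model of F is a model of the simplified formula on the shared symbols.
    forward = all(simplified(y, u, z)
                  for x, y, u, z in product(DOMAIN, repeat=4) if formula(x, y, u, z))
    # Every model of the simplified formula becomes a model of F after setting x to 3.
    backward = all(formula(3, y, u, z)
                   for y, u, z in product(DOMAIN, repeat=3) if simplified(y, u, z))
    # F does not imply x = 3: some model of F assigns another value to x.
    not_implied = any(formula(x, y, u, z) and x != 3
                      for x, y, u, z in product(DOMAIN, repeat=4))
    return forward, backward, not_implied

print(check_example_3())
```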

There are also cases where substitutions are not suitable to describe the relation between $F$ and $F'$. It is easier to characterize these by the property that $F'$ is a proper subset of $F$.

*Example 4.* A blocked clause $p \lor C$ can be removed from a set of formulas without changing satisfiability: $F, (p \lor C)$ simplifies to $F$ modulo $p \mapsto p \lor \neg C$. If we were to substitute $p$ by $p \lor \neg C$ everywhere in $F$ it would weaken clauses where $p$ occurs positively.

Finally, it is possible to accommodate cases where pre-processing *introduces* definitions, such as through the *unfold* transformation (see Sect. 6.5), or by Skolemization and Tseitin transformations.

*Example 5.* The Skolemization of $\forall x \,.\, \exists y \,.\, p(x, y)$ is $\forall x \,.\, p(x, f_{sk}(x))$. Here the original quantified formula is replaced by the Skolemized formula.

We model the pre-processing performed by an SMT solver as a sequence of abstract states where each state consists of two components: a formula F and an ordered sequence of labeled substitutions θ. Based on the shown cases, we formulate the following conditions for applying simplification rules in Fig. 1.

**Fig. 1.** A calculus for pre-processing in SMT

The side conditions are formulated so that they identify a minimal set of conjuncts $\Psi$ of $F$ involved with the solution for $x$. Note that a simplification remains valid when adding conjuncts that do not contain $x$. The Update rule broadly handles a set of simplifications, including proof rules from DRAT systems, introduction of definitions, and Skolemization. It may be presented in forms where $\Phi$ or $\Psi$ or the substitution are empty. The substitution $x \mapsto t$ generally represents a tuple of symbols $x$ replaced by terms $t$. To simplify presentation we only discuss the case where $x$ is a single symbol and we elide rules that preserve equivalence. The Update rule records $\Psi$ so it can later be re-added in case a new constraint mentions $x$. This may be overkill when $\Phi[t/y] = \Psi$ for $y$ fresh (in Sect. 4 we will show another rule, Invert, that adds only the equality $y \simeq t$ in such cases).

**Lemma 3.** *If* $F \Rightarrow \exists y \,.\, x \simeq t[y]$*, s.t.* $y \notin F$*,* $x \notin t$*, and* $t$ *is substitutable for* $x$ *in* $F$*, then* $F$ *simplifies to* $F[t[y]/x]$ *modulo* $x \mapsto t$*.*

*Proof.* Let $\mathcal{M}$ be an interpretation s.t. $\mathcal{M} \models F$. Then $\mathcal{M} \models F \land \exists y \,.\, x \simeq t[y]$ and by definition of the satisfaction relation, there must exist an $a \in \mathcal{U}_{\mathcal{M}}$, s.t. $\mathcal{M}[y \mapsto a] \models F \land x \simeq t[y]$. Let $\mathcal{M}'$ denote $\mathcal{M}[y \mapsto a]$. From $\mathcal{M}' \models F \land x \simeq t[y]$ follows that $x^{\mathcal{M}'} = t[y]^{\mathcal{M}'}$ and so $F^{\mathcal{M}'} = F[t[y]/x]^{\mathcal{M}'}$. Since $\mathcal{M}' \models F$, we have that $\mathcal{M}' \models F[t[y]/x]$. For the other direction, when $\mathcal{M}' \models F[t[y]/x]$, due to Lemma 1, $\mathcal{M}'[x \mapsto t[y]^{\mathcal{M}'}] \models F$. Hence, $F$ simplifies to $F[t[y]/x]$ modulo $x \mapsto t$.

**Corollary 1.** *The side-condition for* Rigid *implies that* $F$ *simplifies to* $F[t/x]$ *modulo* $x \mapsto t$*.*

**Lemma 4.** *Assume* $\Psi \subseteq F$*,* $x \notin F \setminus \Psi$ *and* $\Psi$ *simplifies to* $\Psi[t/x]$ *modulo* $x \mapsto t$*, then* $F$ *simplifies to* $F[t/x]$ *modulo* $x \mapsto t$*.*

*Proof.* Since $x \notin F \setminus \Psi$, $(F \setminus \Psi) = (F \setminus \Psi)[t/x]$, thus $(F \setminus \Psi)$ simplifies to $(F \setminus \Psi)[t/x]$ modulo $x \mapsto t$. Then, since $\Psi$ simplifies to $\Psi[t/x]$ modulo $x \mapsto t$, it follows that $F$ simplifies to $F[t/x]$ modulo $x \mapsto t$.

Lemma 3 established that the side-condition for Rigid ensures simplification modulo θ. We therefore have the following corollaries.

**Corollary 2.** *If a formula* $F'$ *is derived from* $F$ *by the inferences from Fig. 1, then* $F$ *simplifies to* $F'$ *modulo* $x \mapsto t$*.*

The other rules enforce preservation of satisfiability in their side-conditions.

**Corollary 3.** *The rules from Fig. 1 preserve satisfiability.*

The transitive application of the simplifications also preserves satisfiability in a way that extends the notion of simplification modulo a substitution.

**Proposition 1.** *Consider a formula* $F_0$ *and a state* $F \parallel \theta$ *derived from* $F_0 \parallel \varepsilon$ *using the rules from Fig. 1. Then* $F_0$ *simplifies to* $F$ *modulo* $\theta$*.*

*Proof.* It follows as Corollary 2 notes that each application of a rule from Fig. 1 is a simplification modulo and Lemma 2 notes that simplification modulo is transitive.

Informally, Proposition 1 means that using $\theta$, one can transform any model of the simplified formula into a model of the original input formula. Note that the simplified $F$ may contain fresh Skolem symbols that do not occur in $F_0$.

#### **3.2 Pre-processing Replay**

The rules of Fig. 1 capture possible pre-processing steps that can be applied on a single SMT problem. We now describe the scenario where we add additional constraints $\Phi$ to a pre-processed state. Without incremental pre-processing we have the option to conjoin $\Phi$ to the original formula $F_0$ and re-run pre-processing. The goal of incremental pre-processing is to retain as much of the effect of previous work as possible.

We will show that for pre-processing steps derived by rule Rigid it is possible to apply the corresponding substitution to $\Phi$ directly, while the other simplification steps may require re-introducing formulas that were previously removed. We call this process of applying the effect of simplifications to a new formula pre-processing *replay*. Figure 2 shows an imperative implementation of pre-processing replay.

**Fig. 2.** Algorithm Replay

Our main proposition summarizes the main property of Replay and ensures that an arbitrary formula Φ can be added mid-stream after pre-processing.

**Proposition 2.** *Let* $F \parallel \theta$ *be a state resulting from pre-processing* $F_0$*, and let* $F' \land \Phi' \parallel \theta'$ *be a state produced by applying procedure Replay to* $\Phi$ *and* $\theta$*, then* $F_0 \land \Phi$ *is equi-satisfiable to* $F' \land \Phi'$*.*

To establish Proposition 2 we will introduce a calculus for reverting the effect of simplifications. It is shown in Fig. 3 and comprises two rules: one adds a formula with a substitution to $F$, the other both reverts the effect of a simplification and adds the reverted formula to $F$. The inferences rely on a side-condition that the formulas $\Phi$, $\Psi$ are *clean* relative to the substitution $\theta$.

**Definition 5.** *A formula* $\Phi$ *is clean w.r.t. a substitution sequence* $\theta$ *iff*

*–* $\theta = \varepsilon$*, or*

*–* $\theta = \langle x \leftarrow t; \Psi \rangle^{\mathbb{B}}\theta'$*,* $x \notin \Phi$ *and* $\Phi$ *is clean with respect to* $\theta'$*, or*

*–* $\theta = \langle x \leftarrow t; \Psi \rangle^{\bot}\theta'$ *and* $\Phi[t/x]$ *is clean with respect to* $\theta'$*.*

$$\begin{array}{lll} \text{Add}: & & \\ F \parallel \theta & \Longrightarrow\ F, \Phi \theta \parallel \theta & \text{if } \Phi \text{ is clean w.r.t. } \theta \\ \text{Undo}: & & \\ F \parallel \theta_0 \langle x \gets t; \Psi \rangle^{\mathbb{B}} \theta & \Longrightarrow\ F, \Psi \theta \parallel \theta_0 \theta & \text{if } \Psi \text{ is clean w.r.t. } \theta \end{array}$$

**Fig. 3.** A calculus for reverting pre-processing. Undo reverts a simplification by reintroducing a constraint. It prunes θ until Add applies for a new constraint Φ.

Thus, intuitively, Φ is clean w.r.t. θ if Φθ uses only Rigid substitutions from θ.
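To make the interplay of cleanness, Add and Undo concrete, the following Python sketch replays a new constraint against a substitution sequence. It is a reading of Definition 5 and the rules of Fig. 3 only, not a transcription of the algorithm in Fig. 2 and not Z3's implementation. Assumptions made for illustration: formulas are nested tuples, substituted symbols are 0-ary, the field `rigid=True` stands for the label $\bot$, and all names (`Entry`, `first_blocker`, `undo`, `replay`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

Term = Union[int, str, tuple]   # constant, symbol name, or (op, arg, ...)

@dataclass
class Entry:
    """Labeled substitution <x <- t; psi>^B; rigid=True plays the role of the label ⊥."""
    x: str
    t: Term
    psi: Term
    rigid: bool

def occurs(x: str, term: Term) -> bool:
    if isinstance(term, tuple):
        return any(occurs(x, arg) for arg in term[1:])
    return term == x

def substitute(term: Term, x: str, t: Term) -> Term:
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(arg, x, t) for arg in term[1:])
    return t if term == x else term

def first_blocker(phi: Term, theta: List[Entry]) -> Optional[int]:
    """Index of the first entry that makes phi not clean w.r.t. theta, or None if clean."""
    for k, entry in enumerate(theta):
        if entry.rigid:
            phi = substitute(phi, entry.x, entry.t)   # third case of Definition 5
        elif occurs(entry.x, phi):
            return k                                  # non-rigid entry whose symbol occurs
    return None

def apply_subst(phi: Term, theta: List[Entry]) -> Term:
    """phi theta, intended for formulas that are clean w.r.t. theta."""
    for entry in theta:
        phi = substitute(phi, entry.x, entry.t)
    return phi

def undo(formulas: List[Term], theta: List[Entry], i: int) -> None:
    """Rule Undo on entry i; later entries that block its justification are undone first."""
    psi = theta[i].psi
    while True:
        k = first_blocker(psi, theta[i + 1:])
        if k is None:
            break
        undo(formulas, theta, i + 1 + k)
    formulas.append(apply_subst(psi, theta[i + 1:]))
    del theta[i]

def replay(formulas: List[Term], theta: List[Entry], phi: Term) -> None:
    """Prune theta with Undo until phi is clean, then apply rule Add."""
    while True:
        k = first_blocker(phi, theta)
        if k is None:
            break
        undo(formulas, theta, k)
    formulas.append(apply_subst(phi, theta))

theta = [Entry("x", ("+", "y", 1), ("=", "x", ("+", "y", 1)), rigid=True),
         Entry("u", 0, ("<=", "u", "y"), rigid=False)]
formulas: List[Term] = []
replay(formulas, theta, (">", "x", "u"))
print(formulas)
print([entry.x for entry in theta])
```

In this run the new constraint mentions `u`, which was eliminated by a non-rigid entry, so that entry is undone and its justification is re-added; the rigid entry for `x` is kept and simply applied to the new constraint.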

We now establish that formulas that are clean relative to θ can be added (after substitution) to formulas while maintaining models. The substitution used in rigid updates corresponds to equalities that are consequences.

**Lemma 5.** *Given a state* $F' \parallel \theta\theta'$ *derived from the state* $F \parallel \theta$ *and formula* $\Phi$ *that is clean with respect to* $\theta'$*, then* $F \land \Phi$ *simplifies to* $F' \land \Phi\theta'$ *modulo* $\theta'$*.*

*Proof.* We examine the two directions.


The correctness of the Add rule is now immediate:

**Corollary 4.** *Let* $F \parallel \theta$ *be derived from* $F_0 \parallel \varepsilon$*, and* $\Phi$ *clean with respect to* $\theta$*, then* $F_0 \land \Phi$ *simplifies modulo* $\theta$ *to* $F \land \Phi\theta$*.*

*Proof.* It follows from Lemma 5.

With Proposition 1 we established that Rigid, Flex and Update maintain that $F_0$ simplifies to $F$ modulo $\theta$. We need to show the same for rule Undo. The first step is to establish that the formula removed by each of the pre-processing rules can be re-added without affecting simplification.

**Lemma 6.** *Given an inference* $F \parallel \theta \Longrightarrow F' \parallel \theta \langle x \leftarrow t; \Psi \rangle^{\mathbb{B}}$ *by either of the rules* Rigid*,* Update*,* Flex *the formula* $F$ *simplifies to* $F', \Psi$ *modulo* $\varepsilon$*.*

*Proof.* The proof is by case analysis by the rule that is applied.


**Lemma 7.** *Given* $F \parallel \theta \langle x \leftarrow t; \Psi \rangle^{\mathbb{B}} \theta' \Longrightarrow^{\text{Undo}} F, \Psi\theta' \parallel \theta\theta'$*, s.t.* $F_0$ *simplifies to* $F$ *modulo* $\theta \langle x \leftarrow t; \Psi \rangle^{\mathbb{B}} \theta'$*, then* $F_0$ *simplifies to* $F, \Psi\theta'$ *modulo* $\theta\theta'$*.*

*Proof.* Consider an inference $F_1 \parallel \theta \Longrightarrow F_2 \parallel \theta \langle x \leftarrow t; \Psi \rangle^{\mathbb{B}}$. Lemma 6 establishes that the formula $F_1$ simplifies to $F_2, \Psi$ modulo $\varepsilon$. Lemma 5 establishes that $F_2, \Psi$ simplifies to $F, \Psi\theta'$ modulo $\theta'$. Chaining the definition of simplification modulo transitively establishes the lemma.

With Corollary 4 and Lemma 7 we have then established Proposition 2.

It is worth examining why the side-conditions for simplification modulo are used. As the following example shows, transformations that only preserve satisfiability but strengthen formulas cannot be used easily in an incremental setting.

*Example 6.* Let $F_0$ be the satisfiable formula $x \simeq y \land y \leq z \land z \simeq v$. In that formula $x, y$ are equal, and $z, v$ are equal. Let us assume that we simplify via the solution where the classes are merged (i.e. where $y \simeq z$). It is satisfiability preserving. It suggests a transformation that we call Flex$^{\dagger}$.

$$\frac{x \simeq y \land y \leq z \land z \simeq v \parallel \varepsilon}{x \simeq z \land z \simeq v \parallel \langle y \leftarrow z; (x \simeq y \land y \leq z) \rangle^{\top}} \ \text{Flex}^{\dagger}$$

The resulting state is still satisfiable. Now Undo can be applied without any problems. The result is still satisfiable, but not equivalent to $F_0$ (it does not have the models where the two equivalence classes are not merged).

$$\frac{x \simeq z \land z \simeq v \parallel \langle y \gets z; (x \simeq y \land y \leq z) \rangle^{\top}}{(x \simeq y \land y \leq z) \land x \simeq z \land z \simeq v \parallel \varepsilon} \ \text{Undo}$$

Adding the constraint $y \simeq z - 1$ to $F_0$ would be satisfiable, but adding it to our formula is unsatisfiable.
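The failure in Example 6 can be reproduced with a brute-force check over a small integer range. The Python sketch below is a finite sanity check only (the domain `range(4)` and the function names are assumptions made for illustration); the unsatisfiable case also holds over all integers, because the formula obtained after Undo forces $y \simeq z$.

```python
from itertools import product

DOMAIN = range(4)  # small finite domain, chosen only for illustration

def f0(x, y, z, v):
    """F0: x = y and y <= z and z = v."""
    return x == y and y <= z and z == v

def after_undo(x, y, z, v):
    """State after the strengthening step and Undo: (x = y and y <= z) and x = z and z = v."""
    return (x == y and y <= z) and x == z and z == v

def new_constraint(x, y, z, v):
    """Incrementally added constraint: y = z - 1."""
    return y == z - 1

def satisfiable(*constraints):
    """True if some assignment over DOMAIN satisfies all given constraints."""
    return any(all(c(*m) for c in constraints) for m in product(DOMAIN, repeat=4))

print(satisfiable(f0, new_constraint))          # original formula plus new constraint
print(satisfiable(after_undo, new_constraint))  # strengthened formula plus new constraint
```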

### **4 Simplification Methods**

Many simplification methods used in practice during pre-processing are equivalence preserving. These methods include formula rewriting, constant propagation, NNF conversion, quantifier elimination, and bit-blasting. They do not require the methodology from this paper and have been integral in Z3 since its inception. We will here discuss main simplification pre-processing routines that do not preserve equivalence and how they relate to our taxonomy.

#### **4.1 Equality Solving**

One of the most useful pre-processing techniques eliminates symbols when they can be *solved*, that is, a constraint implies an equality $x \simeq t$, where $t$ is a term that does not contain $x$. Equality solving corresponds to finding unitary solutions to unification problems modulo theories. Most uses of equality solving are captured by transformations justified by rule Rigid. In Z3, equality solving comprises a two stage process:


To elaborate, let $E$ be a set of solution candidates $x_1 = t_1, \ldots, x_n = t_n$. The candidates may contain multiple equalities using the same symbol. For example, $E$ could be $x = f(x), x = g(y), y = h(z)$. We can't use the solution $x = f(x)$ because $x$ already occurs in $f(x)$. But we can use the solution $x = g(y), y = h(z)$ processed in this order as first $x$ is replaced by $g(y)$, then $y$ is replaced by $h(z)$. In the second stage we extract from $E$ a subset of equalities $x_{i_1} = t_{i_1}, \ldots, x_{i_k} = t_{i_k}$, where $x_{i_j}$ are distinct and $t_{i_j}$ are terms such that $x_{i_j} \notin t_{i_{j'}}$ for $j \leq j'$. The subset is in triangular form.
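A simple way to obtain a triangular subset is a greedy pass over the candidates. The Python sketch below is only one possible strategy, stated under assumptions: terms are nested tuples whose first component is a function name, the greedy order follows the input order (so the result depends on it), and the names `variables` and `triangular_subset` are hypothetical. It is not the procedure used in Z3.

```python
from typing import List, Set, Tuple, Union

Term = Union[int, str, tuple]      # constant, variable name, or (function_name, arg, ...)
Equation = Tuple[str, Term]        # candidate x = t

def variables(term: Term) -> Set[str]:
    """Variables occurring in a term; position 0 of a tuple is the function name."""
    if isinstance(term, str):
        return {term}
    if isinstance(term, tuple):
        result: Set[str] = set()
        for arg in term[1:]:
            result |= variables(arg)
        return result
    return set()

def triangular_subset(candidates: List[Equation]) -> List[Equation]:
    """Greedily select equalities so that no solved variable occurs in its own
    term or in the term of a later selected equality."""
    selected: List[Equation] = []
    solved: Set[str] = set()
    for x, t in candidates:
        t_vars = variables(t)
        if x in solved or x in t_vars or t_vars & solved:
            continue  # duplicate solution, occurs check failure, or cyclic dependency
        selected.append((x, t))
        solved.add(x)
    return selected

candidates = [("x", ("f", "x")), ("x", ("g", "y")), ("y", ("h", "z"))]
print(triangular_subset(candidates))
```

On the candidates from the text, the sketch skips $x = f(x)$ and keeps $x = g(y), y = h(z)$ in that order. A candidate such as $y = h(x)$ following $x = g(y)$ would be rejected, since it would re-introduce the already solved $x$.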

*Example 7.* We illustrate two applications of Rigid for eliminating two symbols from three equations. The choice of the first two equations is arbitrary. An alternative simplification could choose to eliminate $x$ and $z$ instead. It is not possible, however, to eliminate all three variables.

$$\begin{array}{l} F, x \simeq y + 1, y \simeq z + 1, z \simeq f(x) \parallel \theta \\ \Longrightarrow^{\text{Rigid}} F[y + 1/x], y \simeq z + 1, z \simeq f(y + 1) \parallel \theta \langle x \leftarrow y + 1; x \simeq y + 1 \rangle^{\bot} \\ \Longrightarrow^{\text{Rigid}} F[y + 1/x, z + 1/y], z \simeq f(z + 2) \parallel \theta \langle x \leftarrow y + 1; x \simeq y + 1 \rangle^{\bot} \langle y \leftarrow z + 1; y \simeq z + 1 \rangle^{\bot} \end{array}$$

The set of unification modulo theories facilities used in Z3 is based on extracting simple definitions. Foremost, for a conjunct $x \simeq t$ of $\varphi$, where $x$ is uninterpreted, $x \neq t$, include the equality candidate $x \simeq t$. Other equality candidates are included from formulas of the form $\mathit{ite}(c, x \simeq t, x \simeq s)$ and arithmetic equalities of the form $x + s \simeq t$, such that $x \simeq t - s$ is a solution candidate for $x$. Note that solution candidates are not necessarily unique for an equality. The constraint $x + y \simeq t$ can be used as solution to both $x$ and $y$. If $x$ has a nested occurrence within $t$, the solution for $y$, but not $x$, can be used. Equality solving interacts with simplification pre-processing: equalities over algebraic data-types can be assumed to be in decomposed form already since rewriting simplification decomposes equalities of the form $\mathit{cons}(h_1, t_1) \simeq \mathit{cons}(h_2, t_2)$ into $h_1 \simeq h_2 \land t_1 \simeq t_2$. Equality solving can be extended modulo theories in several directions. Arithmetical equalities can be extracted from Diophantine equation solving and polynomial equality factorization as part of establishing a Gröbner basis. Equalities can be extracted from inequalities [6,31]. Other theories, such as the theory of arrays, allow extracting solutions from equalities $\mathit{store}(a, i, v) \simeq t$, where $a$ is a symbol that does not occur in $t, i, v$, as $a \simeq \mathit{store}(t, i, w)$, together with the constraint $\mathit{select}(t, i) \simeq v$, where $w$ is fresh. We leave a study of the cost/benefits of these approaches within the context of incremental pre-processing to future work.

Equality solving is extended to sub-formulas in the following way: when a positive sub-formula implies an equality $x \simeq t$ and the symbol $x$ does not occur outside of the sub-formula, then $x$ can be replaced by $t$ within the sub-formula. The solution is no longer *rigid constrained* but can be justified by Flex.

*Example 8.* Suppose $x \notin F, \Psi$, then we can use Flex to justify the simplification

$$F, (x \simeq t \land \Phi[x]) \lor \Psi \parallel \theta \Longrightarrow^{\text{Flex}} F, \Phi[t] \lor \Psi \parallel \theta \langle x \leftarrow t; (x \simeq t \land \Phi[x]) \lor \Psi \rangle^{\top}$$

#### **4.2 Unconstrained Sub-terms**

Symbols that have a single occurrence in a formula may be solved for based on context. For example, with the formula $x \leq y, y < z, z \leq u, p(u), q(u)$, the constant $x$ can be eliminated by using the solution $x \simeq y$. Then $y$ can be eliminated by setting $y \simeq z - 1$, and finally $z \simeq u$.

Invertibility of unconstrained symbols (see e.g. [7,8]) in an incremental setting for bit-vectors was introduced in [19]. The method implements the following proof-rule, exemplified for the term $x + t$, containing the only occurrence of $x$.

$$\text{Invert}: \quad F[x + t] \parallel \theta \Longrightarrow F[y] \parallel \theta \langle x \leftarrow y - t; y \simeq x + t \rangle^{\top} \quad \text{if } x \text{ occurs uniquely in } F,\ y \text{ is fresh}$$

To justify rule Invert in our setting, it suffices to check the condition from Lemma 6. Alternatively, we can use the generic rule Update when applying unconstrained simplifications. The rule Invert is more efficient than using Update because the latter requires adding back an entire conjunction $\Psi$ where the invertible term $x + t$ occurs. Invertibility can also be used to justify elimination of nested definitions. A definition $F \land ((x \simeq t \land \Phi[x]) \lor \Psi)$ (see Example 8), where $x \notin F, \Psi$, can first be rewritten as $F \land ((x \simeq t \land \Phi[t]) \lor \Psi)$. Then $x \simeq t$ is invertible because it contains the only occurrence of $x$. The new constraint is $F \land ((y \land \Phi[t]) \lor \Psi)$ where $y$ is a fresh Boolean symbol.

Invertibility conditions are theory dependent. Figure 4 exemplifies main invertibility conditions for arithmetic<sup>1</sup>.

$$\begin{array}{lcl} F[t - x] & \parallel \theta \Longrightarrow^{\text{Invert}} F[y] & \parallel \theta \langle x \leftarrow t - y; y \simeq t - x \rangle^{\top} \\ F[x \cdot x'] & \parallel \theta \Longrightarrow^{\text{Invert}} F[y] & \parallel \theta \langle x, x' \leftarrow y, 1; y \simeq x \cdot x' \rangle^{\top} \\ F[x \le t] & \parallel \theta \Longrightarrow^{\text{Invert}} F[y] & \parallel \theta \langle x \leftarrow \mathit{ite}(y, t, t+1); y \simeq x \le t \rangle^{\top} \\ F[t \le x] & \parallel \theta \Longrightarrow^{\text{Invert}} F[y] & \parallel \theta \langle x \leftarrow \mathit{ite}(y, t, t-1); y \simeq t \le x \rangle^{\top} \end{array}$$

**Fig. 4.** Invertibility rules for symbols $x, x'$ that occur uniquely in $F$; $y$ is fresh.
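Each rule in Fig. 4 pairs a solution for the eliminated symbol with the equality that justifies it. The Python sketch below checks, by brute force over a small integer range, that plugging each solution into its justification yields a true equality. The range and the function name `check_invert_rules` are assumptions made for illustration; this is a finite sanity check, not a proof.

```python
from itertools import product

INTS = range(-3, 4)      # small integer range, chosen only for illustration
BOOLS = (False, True)

def check_invert_rules():
    """For each rule of Fig. 4, test that the chosen solution satisfies its justification."""
    return {
        # F[t - x]: x <- t - y, justified by y = t - x
        "t - x": all(t - (t - y) == y for t, y in product(INTS, repeat=2)),
        # F[x * x']: x, x' <- y, 1, justified by y = x * x'
        "x * x'": all(y * 1 == y for y in INTS),
        # F[x <= t]: x <- ite(y, t, t + 1), justified by y = (x <= t) for Boolean y
        "x <= t": all(((t if y else t + 1) <= t) == y for t, y in product(INTS, BOOLS)),
        # F[t <= x]: x <- ite(y, t, t - 1), justified by y = (t <= x) for Boolean y
        "t <= x": all((t <= (t if y else t - 1)) == y for t, y in product(INTS, BOOLS)),
    }

print(check_invert_rules())
```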

Z3 uses a heap ordered by occurrence counts to identify candidates for invertibility. It first processes all symbols with occurrence count 1. If it is possible to eliminate a symbol with occurrence count 1, the occurrence counts of sub-terms under the term that gets eliminated are decreased. The elimination process stops once the heap only contains symbols with occurrence counts above 1.

#### **4.3 Symbol Elimination and Macros**

SAT solvers use symbol elimination [15] to simplify clauses. The first-order version [11] remains timely in more recent works as well [28]. A predicate $p$ can be eliminated if it occurs at most once in every clause either positively or negatively. Clauses that contain $p$ are replaced by resolvents by applying binary resolution exhaustively, after which the clauses containing $p$ are removed.

*Example 9.* We illustrate symbol elimination for the ground case with two clauses, and $F$ such that $p \notin F$, as an instance of the Update rule.

$$\begin{array}{l} F, p(t) \lor \Phi, \neg p(s) \lor \Psi \parallel \theta \Longrightarrow^{\text{Update}} \\ F, s \not\simeq t \lor \Phi \lor \Psi \parallel \theta \langle p \gets \lambda x \,.\, p(x) \lor (x \simeq t \land \neg \Phi); p(t) \lor \Phi, \neg p(s) \lor \Psi \rangle^{\top} \end{array}$$

The same elimination technique can also be applied to Horn clauses where p does not occur both in the head and body of any rule. A solution for the eliminated predicate is a conjunction of the upper bounds for p or a disjunction of lower bounds for p. It is generally a quantified formula. If the involved clauses admit quantifier free interpolants, the solution can also be computed using an interpolant from a solution to the reduced system [4]. Thus, the term t in a substitution x → t may only be computed *after* an initial model is known.

There are many cases where symbols can be eliminated incrementally and justified by the Rigid rule:

– Macros $\forall x \,.\, f(x) \simeq t[x]$, $\forall x \,.\, f(x) + s \simeq t$ are handled as $\forall x \,.\, f(x) \simeq t - s$, assuming $f$ is not free in $s, t$. Then replace occurrences $f(a)$ by $t[a]$, respectively $t[a] - s[a]$.

<sup>1</sup> A summary of rules used for other theories can be found online: https://microsoft.github.io/z3guide/docs/strategies/summary#tactic-elim-uncnstr.


Macro elimination can be extended to ordered structures and to combinations of theories [42]. It has been integral to making quantified reasoning with bit-vectors [44] practical. We claim that first-order in-processing rules based on blocked clauses, asymmetric tautology elimination, and covered clauses, known from SAT [29], can also be captured by Update. We substantiate the claim with an example, but leave a comprehensive treatment for future work:

*Example 10.* Consider the clause C := p(x) ∨ q(x) and F := ¬p(x) ∨ p(f(x)) ∨ r(x), ¬p(x) ∨ p(f(x)) ∨ p(g(x)). The variable x is universally quantified. Then C can be rewritten to p(x) ∨ q(x) ∨ p(f(x)) without affecting satisfiability. The covered literal p(f(x)) was added to C as it occurs in every resolvent with p(x). The model for p has to be fixed, however. The model update is a first-order lifting of the propositional case.

$$\begin{array}{l} F, p(x) \lor q(x) \parallel \theta \Longrightarrow^{\text{Update}} \\ F, p(x) \lor q(x) \lor p(f(x)) \parallel \theta \langle p \gets \lambda x \; . \; p(x) \lor p(f(x)); \; \forall x \; . \; p(x) \lor q(x) \rangle^{\top} \end{array}$$

To illustrate unification constraints in model updates, consider the clause C := p(h(x)) ∨ q(x) and p′ := λx . p(x) ∨ ∃y . x ≃ h(y) ∧ ¬q(y):

$$\begin{array}{l} F, p(h(x)) \lor q(x) \parallel \theta \Longrightarrow^{\text{Update}} \\ F, p(h(x)) \lor q(x) \lor p(f(h(x))) \parallel \theta \langle p \gets p'; \; \forall x \; . \; p(h(x)) \lor q(x) \rangle^{\top} \end{array}$$

### **5 Implementation**

We have implemented incremental pre-processing as an integral component of a new SMT solver, part of Z3. It can be enabled by setting the option sat.smt=true from the command line. It includes simplification by equality solving, elimination of uninterpreted sub-terms and macro detection as described in Sect. 4<sup>2</sup>. The primary reason for supporting incremental pre-processing has been usability. GitHub issues pointing to performance cliffs when switching to incremental mode are recurrent. A distilled example where pre-processing can solve formulas is as follows:

*Example 11.* Consider the benchmark.

```
(set-option :unsat_core true) (set-option :sat.smt true)
(declare-const exp Int) (push)
(assert (! (= exp 1) :named assumption))
(assert (not (= 2 (^ 2 exp)))) (check-sat) (get-unsat-core)
```
<sup>2</sup> See https://microsoft.github.io/z3guide/docs/strategies/simplifiers for a summary of simplifiers.

The legacy solver of Z3 cannot solve it because it only knows about constant folding when expanding the definition of exponentiation (the symbol ^). With incremental propagation, the assertion (not (= 2 (^ 2 exp))) simplifies to false.

Simplifiers interoperate with user scopes: SMT solvers support scoping using operations push and pop. All assertions made within a push are invalidated by a matching pop. To allow simplifiers to inter-operate with recursive function definitions they track symbols used in the bodies of recursive functions as *frozen*. Those symbols are excluded from solving. Similar to CaDiCaL's implementation for replaying clauses (see [18]), our implementation of Replay stores the domain of θ in a hash-set to bypass processing formulas that have no symbols in θ.
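The bypass described for Replay can be sketched as follows. This is a simplified illustration, not Z3's implementation: terms are modeled as nested tuples with string leaves, θ is a plain dictionary from solved symbols to terms, and the helper names `symbols`, `substitute`, and `replay` are hypothetical.

```python
def symbols(term):
    """Collect the symbol names occurring in a term given as nested tuples."""
    if isinstance(term, str):
        return {term}
    head, *args = term
    return {head}.union(*(symbols(a) for a in args))


def substitute(term, theta):
    """Replace every leaf that theta solves by its definition."""
    if isinstance(term, str):
        return theta.get(term, term)
    head, *args = term
    return (head, *(substitute(a, theta) for a in args))


def replay(formula, theta):
    """Apply theta to a newly asserted formula, skipping untouched formulas."""
    domain = theta.keys()  # dict keys serve as the hash-set of solved symbols
    if symbols(formula).isdisjoint(domain):
        return formula  # fast path: no solved symbol occurs in the formula
    return substitute(formula, theta)


theta = {"x": ("+", "y", "1")}
untouched = ("<", "a", "b")
rewritten = replay(("<", "x", "z"), theta)
```

A production implementation would cache the symbol set per formula instead of recomputing it, but the membership test against a hash-set is the point of the sketch.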

### **6 Related Work**

#### **6.1 Pre- and In-processing for SAT and QBF**

Pre-processing for SAT has received significant attention with the milestone work in Satellite [15] and then using notions of blocked clauses [27] and solution reconstruction [25]. Pre-processing techniques for QBF are discussed for example in [3,22]. The main pre-processing methods for propositional satisfiability solvers can be captured using our rule Update (see Example 4 for an instance of blocked clause elimination simplification). For the case where ¬p ∨ D is a blocked clause, the model update is the de-Morgan dual: removing ¬p ∨ D triggers the update M[p → (p ∧ D)<sup>M</sup>].
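For the propositional case, the model update for removed blocked clauses can be sketched as the usual reconstruction step: if the current model falsifies a removed clause, its blocking literal is made true. The encoding (clauses as sets of nonzero integers, models as dictionaries) and the function names are assumptions made for this illustration.

```python
def holds(clause, model):
    """A clause is a set of nonzero ints; model maps each variable to a bool."""
    return any(model[abs(lit)] == (lit > 0) for lit in clause)


def reconstruct(model, removed):
    """Repair a model for the removed blocked clauses.

    `removed` lists (clause, blocking_literal) pairs in removal order;
    they are replayed in reverse, flipping the blocking literal whenever
    the clause is falsified.
    """
    model = dict(model)
    for clause, blocking_literal in reversed(removed):
        if not holds(clause, model):
            model[abs(blocking_literal)] = blocking_literal > 0
    return model


# Variable 1 plays the role of p and variable 2 the role of D.
removed = [({-1, 2}, -1)]
repaired = reconstruct({1: True, 2: False}, removed)
```

For the blocked clause ¬p ∨ D, the repaired value of p coincides with p ∧ D evaluated in the old model, which is the update stated above.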

The work [18] introduces an inference system that also addresses *redundant* clauses and represents model updates using a notion of *witness labeled clauses*. The semantic content of the rules used for SAT is captured by Update. However, we elided tracking redundant clauses in this work. The case for SMT motivates the specialized rules Rigid, Flex and Invert.

#### **6.2 Pre-processing for SMT**

Pre-processing simplification is integral in all main SMT solvers, including [2,33]. Incremental pre-processing with special attention to bit-vectors was introduced in [19]. Transformations considered in this thesis can be represented by the Rigid and Invert rules. Z3 exposes pre-processing simplifications as tactics [13] and allows users to compose them to suit specific needs of applications.

Invertibility conditions are used in [34] to guide local search. This work also considers a candidate value for all symbols. For example, F[x · t] is invertible to F[y] if t evaluates to 1.

#### **6.3 Pre-processing for MIP**

*Pre-solving* is terminology for pre-processing for mixed-integer linear programming solvers. There is a significant repertoire of pre-solving methods integrated in leading MIP solvers. Their effects are well documented in the newer survey [1], which provides an updated perspective to [20]. Pre-solving was developed earlier in [40]. The main methods can be categorized as operating on single rows (single constraints) or single columns (single variables), multiple rows, and multiple columns, and use global information about the tableau. They include also methods known from other domains, such as literal probing also found in SAT solvers, and symmetry reduction for sparse systems [38]. We are not aware of under-constrained simplifications used in mainstream MIP solvers. Only symmetry reduction stands out as outside the scope of incremental pre-solve methods.

*Example 12.* Pre-processing that combines two rows or combines two columns relies on efficient indexing [21] to be effective. The two column non-zero cancellation method considers the situation where the coefficients to two variables maintain a high degree of correlation. Consider the following formula

$$2x + 4y + z \le 5 \land x + 2y + u \le 6 \land 3x + y + z \le 3 \land \varphi \text{ where } x, y \notin \varphi.$$

The coefficients to x, y in the first two inequalities are related by the affine relation given by λ = 2. In this case the system can be reformulated, justified by rule Rigid, by introducing a fresh variable v and using the inequalities

$$2v + z \le 5 \land v + u \le 6 \land 3v - 5y + z \le 3 \land \varphi.$$
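The reformulation can be checked term by term. The following derivation assumes that the fresh variable stands for v = x + 2y, which the text leaves implicit, so that x = v − 2y:

$$\begin{array}{rcl} 2x + 4y + z & = & 2(x + 2y) + z \;=\; 2v + z \\ x + 2y + u & = & v + u \\ 3x + y + z & = & 3(v - 2y) + y + z \;=\; 3v - 5y + z \end{array}$$

Under this assumption, x no longer occurs in the system and the first two rows each lose one non-zero coefficient.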

#### **6.4 Pre-processing in First- and Higher-Order Provers**

Pre-processing is also an important part of first-order theorem provers. Techniques for creating small clausal normal forms have long attracted attention [35]. Main simplifications [24] are based on detecting definitions similar to what is described in Sect. 4.3, but with the extra twist of ensuring that simplifications preserve first-order decidability, such as ensuring that formulas remain within the EPR fragment. Furthermore, a variant of AIGs with nodes representing quantifiers is used to detect shared structure. While [24] is only concerned with establishing preservation of satisfiability, we note that the classification as model equivalent from Sect. 4.3 extends to the cases considered. In-processing inspired by SAT was pursued for first-order [29,43] and recently for higher-order settings [5].

#### **6.5 Constrained Horn Clauses**

Constrained Horn Clauses [4] enjoy a tight connection with Logic Programming, where several transformation techniques were developed [10,12], including incremental consequence propagation [36]. *Fold* [9] transformations introduce auxiliary predicates and rules that correspond to replacing a code-block with an auxiliary procedure. They are justified by Rigid. *Unfold* transformations can be justified by Update and correspond to macro elimination.

### **7 Summary**

We introduced a calculus of pre-processing for SMT. It distinguishes simplifications that are *rigid* and so can be applied to new formulas as substitutions. Other simplified formulas may need to be re-introduced, similar to re-introducing removed clauses in SAT. We examined several of the pre-processing methods studied in SAT, ATP, MIP and SMT as instances of the calculus. We leave empirical and algorithmic studies of new pre- and in-processing methods to future work. Another angle we have left on the table is reconciling pre-processing with in-processing. For SAT, it was useful to develop a calculus that accounted for both irredundant and redundant clauses. In our current effort we have set this angle aside in favour of establishing main properties on replaying substitutions.

**Acknowledgment.** Thanks to the reviewers for their extensive constructive feedback and to Diego Olivier Fernandez Pons for introducing us to MIP pre-solving. The research was partially funded by the Austrian Science Fund (FWF) under project No. T-1306.

### **References**



# Verified Given Clause Procedures

Jasmin Blanchette<sup>1,2,3</sup>, Qi Qiu<sup>4</sup>, and Sophie Tourret<sup>2,3</sup>

<sup>1</sup> Ludwig-Maximilians-Universität München, Munich, Germany, jasmin.blanchette@ifi.lmu.de

<sup>2</sup> Université de Lorraine, CNRS, Inria, LORIA, Nancy, France, sophie.tourret@inria.fr

<sup>3</sup> Max-Planck-Institut für Informatik, Saarland Informatics Campus, Saarbrücken, Germany

<sup>4</sup> Université Claude Bernard Lyon-1, LIRIS CNRS UMR 5205, Université de Lyon, Lyon, France, qi.qiu@univ-lyon1.fr

Abstract. Resolution and superposition provers rely on the given clause procedure to saturate clause sets. Using Isabelle/HOL, we formally verify four variants of the procedure: the well-known Otter and DISCOUNT loops as well as the newer iProver and Zipperposition loops. For each of the variants, we show that the procedure guarantees saturation, given a fair data structure to store the formulas that wait to be selected. Our formalization of the Zipperposition loop clarifies some fine points previously misunderstood in the literature.

Keywords: Saturation provers · Proof assistants · Stepwise refinement

### 1 Introduction

Resolution [13] and superposition [2] provers are based on *saturation*. In a first approximation, these provers perform all possible inferences between the available clauses. The full truth, however, is more complex: Provers may delete clauses that are considered *redundant*; for example, with resolution, if p(x) is in the clause set, then both p(a) and p(x) ∨ q(x) are redundant and could be deleted.

The procedure that saturates a set of clauses (or, more generally, formulas) up to redundancy is called the *given clause procedure* [10, Sect. 2.3]. It has several variants. The two main variants are the Otter loop [10] and the DISCOUNT loop [1]. In this paper, we also consider the iProver [8] and Zipperposition loops [17]; they are variants of the Otter and DISCOUNT loops, respectively.

In its simplest form, the procedure distinguishes between *passive* and *active* formulas. Formulas start as passive. One passive formula is selected as the *given clause*.<sup>1</sup> Deletions and simplifications with respect to other passive and active formulas are then performed; for example, if the given clause is redundant with respect to an active formula, the given clause can be deleted, and if the given clause makes an active formula redundant, that formula can be deleted. Moreover, simplifications can take place; for example, in a superposition prover, if the term order specifies b ≻ a, the given clause is b ≈ a, and p(b) is an active formula, the active formula can be simplified to p(a) and made passive again.

<sup>1</sup> We keep the traditional name "given clause" even though our formulas are not necessarily clauses.

© The Author(s) 2023

B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 61–77, 2023. https://doi.org/10.1007/978-3-031-38499-8\_4

Next, if the given clause has not been deleted, it is moved to the active set. All inferences between the given clause and formulas in the active set are then performed, and the resulting conclusions are put in the passive set. This procedure is repeated, starting with the selection of a new given clause, until the distinguished formula ⊥ is derived or the passive set is empty.
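The procedure just described can be made concrete for propositional resolution. The following is a minimal sketch under simplifying assumptions, not one of the verified loops: clauses are frozensets of nonzero integers, subsumption is the only deletion criterion, there is no simplification step, and the passive set is a FIFO queue so that no clause waits forever.

```python
from collections import deque


def resolvents(c, d):
    """All non-tautological binary resolvents of two propositional clauses."""
    out = set()
    for lit in c:
        if -lit in d:
            r = (c - {lit}) | (d - {-lit})
            if not any(-l in r for l in r):
                out.add(r)
    return out


def given_clause(initial):
    passive = deque(initial)  # FIFO selection: a simple fair strategy
    active = set()
    seen = set(initial)
    while passive:
        given = passive.popleft()
        if not given:
            return "unsat", active  # the empty clause was derived
        if any(a <= given for a in active):
            continue  # forward deletion: given is subsumed by an active clause
        active = {a for a in active if not given <= a}  # backward deletion
        active.add(given)
        for a in list(active):
            for r in resolvents(given, a):
                if r not in seen:
                    seen.add(r)
                    passive.append(r)
    return "saturated", active


status, _ = given_clause(
    [frozenset({1}), frozenset({-1, 2}), frozenset({-2})]
)
```

The loop stops either when the empty clause, which plays the role of ⊥, is selected or when the passive queue runs empty.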

The main metatheorem about this procedure states that if the given clause is chosen fairly (i.e., no passive formula is ignored forever), then the active set will be saturated (up to redundancy) at the limit. As a corollary, if the proof calculus is refutationally complete (i.e., it derives ⊥ from any inconsistent set), then the prover based on the calculus will be refutationally complete as well.

We present an Isabelle/HOL [12] formalization of the Otter, DISCOUNT, iProver, and Zipperposition loops, culminating in a statement and proof of the main metatheorem for each one. We build on the pen-and-paper *saturation framework* developed by Waldmann, Tourret, Robillard, and Blanchette [18,19] and formalized in Isabelle/HOL by Tourret and Blanchette [16]. The framework is an elaboration of Bachmair–Ganzinger-style saturation [3, Sect. 4]. Waldmann et al. include descriptions of the four "loops" as instances of the framework, as Examples 71, 74, 81, and 82 [19]; our formalization follows these descriptions.

Among the four loops, the oldest one is the Otter loop. It originates from Otter, a resolution-based theorem prover for first-order logic introduced in 1988 [11]. Otter was the first prover to present the given clause algorithm, in its simplest form as described above.

The DISCOUNT loop followed a few years later as a byproduct of the DISCOUNT system [7], itself built to distribute proof tasks among processors. What distinguishes a DISCOUNT loop is that it really treats the passive set as passive. Its formulas serve only as the pool from which to choose the next given clause; they are never involved in deletions or other simplifications. Another key difference between the two loops is the decoupling of the scheduling of an inference and the production of its conclusion, which makes DISCOUNT able to propagate deletions and simplifications to discard inferences before their conclusions enter the passive set. For example, suppose that, in DISCOUNT, an inference

$$\frac{\mathsf{p}(x)\vee\mathsf{p}(\mathsf{a})\quad\neg\mathsf{p}(y)\vee\mathsf{q}(y)}{\mathsf{p}(x)\vee\mathsf{q}(y)}$$

called ι is scheduled, in a derivation using first-order resolution. Then suppose that, before ι is realized, p(a) is generated (e.g., as the result of the factorization of p(x) ∨ p(a)). This triggers the deletion of p(x) ∨ p(a), which has become redundant. Then ι becomes an orphan inference, since one of its premises is no longer in the active set. It can be deleted without threatening the procedure's completeness. In contrast, in an Otter loop, if ι is scheduled before p(a) is selected as the given clause, ι's conclusion p(x) ∨ q(y) is directly added to the passive set, where it can be simplified.
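Orphan deletion amounts to a filter over the scheduled inferences. The sketch below is a hypothetical illustration: the `Inference` record and the representation of formulas as strings are assumptions made here and do not mirror the Isabelle formalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inference:
    name: str
    premises: tuple  # the formulas the inference was scheduled with


def delete_orphans(scheduled, active):
    """Keep only scheduled inferences whose premises are all still active."""
    return {inf for inf in scheduled if all(p in active for p in inf.premises)}


iota = Inference("iota", ("p(x) | p(a)", "~p(y) | q(y)"))
scheduled = {iota}

# p(a) has been generated and p(x) | p(a) deleted as redundant.
active = {"p(a)", "~p(y) | q(y)"}
remaining = delete_orphans(scheduled, active)
```

In the scenario above, ι loses its first premise and is therefore discarded before its conclusion is ever computed.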

What we call the iProver loop [8] is an extension of the Otter loop with a transition that removes a formula C if C is made redundant by a formula set M. This terminology is from Waldmann et al. [19, Example 74]. This rule, introduced when iProver was extended to handle the superposition calculus [8], combines an inference step with a step that simplifies the active set.

The last and most elaborate loop variant we present is the Zipperposition loop. Zipperposition is a higher-order theorem prover based on λ-superposition [4]. Its given clause procedure is designed to work with higher-order logic. Due to the explosiveness of higher-order unification, a single pair of premises can yield infinitely many conclusions. For example, the higher-order resolution inference

$$\frac{\mathsf{p}\,(\mathsf{f}\,(y\;\mathsf{a})) \lor \mathsf{q}\;y \quad \neg\,\mathsf{p}\,(z\,(\mathsf{f}\;\mathsf{a}))}{\mathsf{q}\,(\lambda x.\;\mathsf{f}\,(\ldots(\mathsf{f}\;x)\ldots))}$$

where y and z are variables, produces infinitely many conclusions of the form q (λx. f<sup>n</sup> x) for n ∈ ℕ. Thus, the passive set must be able to store possibly infinite sequences of lazily performed inferences. The Zipperposition loop was described by Vukmirović et al. [17] and by Waldmann et al. [19, Example 82].<sup>2</sup> Vukmirović et al. describe the loop's implementation in Zipperposition, which we believe to be correct. In contrast, Waldmann et al. present an abstract version of the loop and connect it, via stepwise refinement, to their saturation framework, obtaining the main metatheorem. However, in the latter work, the details are not worked out. Thanks to the Isabelle formalization, we note and address several issues such that we now have a first rigorous (in fact, fully formal) presentation of the essence of the Zipperposition loop including the metatheorem.
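Lazily performed inferences map naturally onto generators. The sketch below is only an analogy and does not reflect Zipperposition's data structures: `conclusions` enumerates the conclusions of the inference above as strings, starting at n = 0 (an assumption about where the enumeration begins), and the hypothetical `round_robin` shows one fair way to interleave several such streams.

```python
from itertools import count, islice


def conclusions():
    """Lazily enumerate q (λx. f^n x) for n = 0, 1, 2, ..."""
    for n in count():
        body = "x"
        for _ in range(n):
            body = f"f ({body})"
        yield f"q (λx. {body})"


def round_robin(streams):
    """Fairly interleave possibly infinite streams, one element per turn."""
    streams = [iter(s) for s in streams]
    while streams:
        alive = []
        for s in streams:
            try:
                yield next(s)
                alive.append(s)
            except StopIteration:
                pass  # exhausted streams are dropped
        streams = alive


first_three = list(islice(conclusions(), 3))
mixed = list(islice(round_robin([conclusions(), ["r"]]), 4))
```

Because each turn takes only one element from each stream, an infinite stream cannot starve the finite one.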

Our work is part of IsaFoL (Isabelle Formalization of Logic), an effort that aims at developing a formal library of results about logic and automated reasoning [6]. The Isabelle files amount to about 7000 lines of code. They were developed using the 2022 edition of Isabelle and are available in the *Archive of Formal Proofs* (*AFP*) [5], where they are updated to follow Isabelle's evolution.

This work joins a long list of verifications of calculi and provers. We refer to Blanchette [6, Sect. 5] for an overview of such works. The most closely related works are the two proofs of completeness of Bachmair and Ganzinger's resolution prover RP, by Schlichtkrull, Blanchette, Traytel, and Waldmann [14] and by Tourret and Blanchette [16] as well as the proof of completeness of ordered (unfailing) completion by Hirokawa, Middeldorp, Sternagel, and Winkler [9]. Instead of focusing on a single prover, here we consider general prover architectures. Via refinement, our results can be applied to individual provers.

### 2 Abstract Given Clause Procedures

To prove the main metatheorem for each of the four loops, we build on the saturation framework. The framework defines two highly abstract given clause procedures, called GC ("given clause") and LGC ("lazy given clause") [19, Sect. 4]. They are formalized in the file Given\_Clause\_Architectures.thy of the *AFP* entry Saturation\_Framework [15].

<sup>2</sup> Both groups of researchers include Blanchette and Tourret.

GC is an idealized Otter-style loop. It operates on sets of labeled formulas. Formulas have the generic type 'f, and labels have the generic type 'l. One label, active, identifies active formulas, and the other labels correspond to passive formulas. GC is defined as a two-rule transition system ⇝<sub>GC</sub>. In Isabelle syntax:

$$\begin{array}{l} \mathsf{inductive } (\leadsto\_{\mathsf{GC}}) :: (\textit{'f} \times \textit{'l}) \textit{ set} \Rightarrow (\textit{'f} \times \textit{'l}) \textit{ set} \Rightarrow \textit{bool} \text{ where} \\ \quad \textit{process} \colon N\_1 = N \cup M \implies N\_2 = N \cup M' \implies M \subseteq \textsf{Red}\_{\mathsf{F}} \left( N \cup M' \right) \implies \\ \qquad \textsf{active\\_subset } M' = \emptyset \implies N\_1 \leadsto\_{\mathsf{GC}} N\_2 \\ \mid \textit{infer} \colon N\_1 = N \cup \{ (C, L) \} \implies N\_2 = N \cup \{ (C, \textsf{active}) \} \cup M \implies \\ \qquad L \neq \textsf{active} \implies \textsf{active\\_subset } M = \emptyset \implies \\ \qquad \textsf{Inf\\_between } (\textsf{fst} \mathbin{\text{\textasciigrave}} \textsf{active\\_subset } N) \; \{ C \} \\ \qquad \quad \subseteq \textsf{Red}\_{\mathsf{I}} \left( \textsf{fst} \mathbin{\text{\textasciigrave}} ( N \cup \{ (C, \textsf{active}) \} \cup M ) \right) \implies \\ \qquad N\_1 \leadsto\_{\mathsf{GC}} N\_2 \end{array}$$

When presenting Isabelle code, we will focus on the main ideas and not explain all the Isabelle syntax or all the symbols that occur in the code. We refer to Waldmann et al. [19] for mathematical statements of the key concepts and to the Isabelle theory files for the formal definitions.

Informally, the transition relation ⇝<sub>GC</sub> is defined as an inductive predicate equipped with two introduction rules, *process* and *infer*. Both rules allow a transition from N<sub>1</sub> to N<sub>2</sub> under some conditions:


The main metatheorem for GC states that if the set of passive formulas is empty at the limit of a derivation, the active formula set is saturated at the limit.

The lazy variant LGC generalizes the DISCOUNT loop. It operates on pairs (T, N), where T :: 'f *inference set* is a set of inferences that have been scheduled but not yet performed and N :: ('f × 'l) *set* is a set of labeled formulas. It consists of four rules that can be summarized as follows:


– The *delete\_orphan\_infers* rule can be used to delete a scheduled inference if one of its premises has been deleted.

The main metatheorem for LGC states that if the set of scheduled inferences and the set of passive formulas are empty at the limit of a derivation starting in an initial state, the active formula set is saturated at the limit.

### 3 Otter and iProver Loops

The Otter loop [10] works on five-tuples (N, X, P, Y, A), where N is the set of *new* formulas; X is a subsingleton (i.e., the empty set or a singleton {C}) storing a formula moving from N to P; P is the set of so-called *passive* formulas (although, strictly speaking, the formulas in N, X, and Y are also passive); Y is a subsingleton storing the *given clause*, which moves from P to A; and A is the set of *active* formulas. All the sets are finite in practice.

Initial states have the form (N, ∅, ∅, ∅, ∅). Inferences are assumed to be finitary, meaning that the set of inferences with C and formulas from A as premises (formally written Inf\_between A {C}) is finite if A is finite. Premiseless inferences are disallowed.

Otter Loop without Fairness. The first version of the Otter loop, formalized in Otter\_Loop.thy, does not make any fairness assumption on the choice of the given clause. The guarantee it offers is correspondingly weak: If the sets N, X, P, and Y are empty at the limit of a derivation starting in an initial state, then A is saturated. But there is no guarantee that N, X, P, and Y are empty at the limit. Later in this section, we will show how to ensure this generically.

The transition system ⇝<sub>OL</sub> for the Otter loop without fairness is as follows:

inductive (⇝<sub>OL</sub>) :: ('f × OL\_label) *set* ⇒ ('f × OL\_label) *set* ⇒ *bool* where

*choose\_n*: C ∉ N ⟹ state (N ∪ {C}, ∅, P, ∅, A) ⇝<sub>OL</sub> state (N, {C}, P, ∅, A)

| *delete\_fwd*: C ∈ Red<sub>F</sub> (P ∪ A) ∨ (∃C′ ∈ P ∪ A. C′ ≼· C) ⟹ state (N, {C}, P, ∅, A) ⇝<sub>OL</sub> state (N, ∅, P, ∅, A)

| *simplify\_fwd*: C ∈ Red<sub>F</sub> (P ∪ A ∪ {C′}) ⟹ state (N, {C}, P, ∅, A) ⇝<sub>OL</sub> state (N, {C′}, P, ∅, A)

| *delete\_bwd\_p*: C′ ∈ Red<sub>F</sub> {C} ∨ C ≺· C′ ⟹ state (N, {C}, P ∪ {C′}, ∅, A) ⇝<sub>OL</sub> state (N, {C}, P, ∅, A)

| *simplify\_bwd\_p*: C′ ∈ Red<sub>F</sub> {C, C″} ⟹ state (N, {C}, P ∪ {C′}, ∅, A) ⇝<sub>OL</sub> state (N ∪ {C″}, {C}, P, ∅, A)

| *delete\_bwd\_a*: C′ ∈ Red<sub>F</sub> {C} ∨ C ≺· C′ ⟹ state (N, {C}, P, ∅, A ∪ {C′}) ⇝<sub>OL</sub> state (N, {C}, P, ∅, A)

| *simplify\_bwd\_a*: C′ ∈ Red<sub>F</sub> {C, C″} ⟹ state (N, {C}, P, ∅, A ∪ {C′}) ⇝<sub>OL</sub> state (N ∪ {C″}, {C}, P, ∅, A)

| *transfer*: state (N, {C}, P, ∅, A) ⇝<sub>OL</sub> state (N, ∅, P ∪ {C}, ∅, A)

| *choose\_p*: C ∉ P ⟹ state (∅, ∅, P ∪ {C}, ∅, A) ⇝<sub>OL</sub> state (∅, ∅, P, {C}, A)

| *infer*: Inf\_between A {C} ⊆ Red<sub>I</sub> (A ∪ {C} ∪ M) ⟹ state (∅, ∅, P, {C}, A) ⇝<sub>OL</sub> state (M, ∅, P, ∅, A ∪ {C})

The state function converts a five-tuple into a set of labeled formulas—an equivalent representation that is often more convenient formally. The labels are New (for N), XX (for X), Passive (for P), YY (for Y ), and Active (for A, corresponding to active in GC).
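The conversion performed by the state function can be pictured as follows. This is an informal Python analogue, assuming plain sets of hashable formulas and strings as labels; the Isabelle definition is the authoritative one.

```python
def state(N, X, P, Y, A):
    """Turn a five-tuple of formula sets into one set of labeled formulas."""
    labeled = set()
    components = (
        ("New", N),
        ("XX", X),
        ("Passive", P),
        ("YY", Y),
        ("Active", A),
    )
    for label, formulas in components:
        labeled |= {(formula, label) for formula in formulas}
    return labeled


example = state({"c1"}, set(), {"c2"}, set(), {"c3"})
```

Empty components simply contribute no labeled formulas, so subsingletons need no special treatment.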

The first nine rules all refine GC's *process* rule, whereas the tenth rule, *infer*, refines GC's *infer*. More precisely: The first rule moves a formula from N to X. The second and third rules delete or simplify the formula in X. The fourth to seventh rules delete or simplify other formulas using the formula in X. The eighth rule moves a formula from X to P. The ninth rule moves a formula from P to Y. And the tenth rule moves a formula from Y to A and performs all inferences with formulas in A or otherwise ensures that the inferences are made redundant.

Following Waldmann et al., the rules introducing new formulas (namely, the *simplify* rules and *infer*) allow adding arbitrary formulas to the state and are therefore not sound. Since the metatheorems are about completeness, there is no harm in allowing unsound steps, such as skolemization. If desired, soundness can be required simply by adding the assumption N ⊨ N′ for each step N ⇝<sub>OL</sub> N′ in a derivation.

Compared with most descriptions of the Otter loop in the literature, the above formalization (and Example 71 in Waldmann et al. [19] on which it is based) is abstract and nondeterministic, allowing arbitrary interleavings of deletions, simplifications, and inferences. By not committing to a specific strategy, we keep our code widely applicable: Our abstract Otter loop can be used as the basis of refinement steps targeting a wide range of deterministic procedures implementing specific strategies. We will see the same approach used for all the loops. We note that Bachmair and Ganzinger made a similar choice for their ordered resolution prover RP [3, Sect. 4].

Otter Loop with Fairness. Below we introduce a fair version of the Otter loop, called ⇝<sub>OLf</sub> and formalized in Fair\_Otter\_Loop\_Def.thy. This new version is closer to an implementation.

inductive (⇝<sub>OLf</sub>) :: ('p, 'f) *OLf\_state* ⇒ ('p, 'f) *OLf\_state* ⇒ *bool* where

*choose\_n*: C ∉ N ⟹ (N ∪ {C}, None, P, None, A) ⇝<sub>OLf</sub> (N, Some C, P, None, A)

| *delete\_fwd*: C ∈ Red<sub>F</sub> (elems P ∪ A) ∨ (∃C′ ∈ elems P ∪ A. C′ ≼· C) ⟹ (N, Some C, P, None, A) ⇝<sub>OLf</sub> (N, None, P, None, A)

| *simplify\_fwd*: C′ ≺<sub>S</sub> C ⟹ C ∈ Red<sub>F</sub> (elems P ∪ A ∪ {C′}) ⟹ (N, Some C, P, None, A) ⇝<sub>OLf</sub> (N, Some C′, P, None, A)

. . .

| *choose\_p*: P ≠ empty ⟹ (∅, None, P, None, A) ⇝<sub>OLf</sub> (∅, None, remove (select P) P, Some (select P), A)

| *infer*: Inf\_between A {C} ⊆ Red<sub>I</sub> (A ∪ {C} ∪ M) ⟹ (∅, None, P, Some C, A) ⇝<sub>OLf</sub> (M, None, P, None, A ∪ {C})

The definition of ⇝<sub>OLf</sub> differs from that of ⇝<sub>OL</sub> in two main respects:


Also note that the state is now directly represented as a five-tuple (without the mediation of the state function), where the subsingletons are of type 'f *option*, with values of the forms None and Some C.

Formula Queue. The queue that represents the passive formula set P is formalized in its own file, Prover\_Queue.thy. The file defines an abstract type of queue and the operations on it (empty, select, add, remove, and elems). It also expresses the fairness assumption on the select function:

If a sequence of queue operations starting from an empty queue contains infinitely many removals of the selected element, then the queue is empty at the limit.

Moreover, the file contains an example implementation of the queue as a FIFO queue. This ensures that the abstract requirements on the queue, including fairness, are satisfiable.
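The queue interface can be illustrated with a small FIFO implementation. This Python class is a hypothetical analogue of the five operations named above, written in an immutable style; it is not a transcription of Prover\_Queue.thy, and details such as the handling of duplicates are assumptions.

```python
class FifoQueue:
    """Immutable first-in, first-out queue of formulas."""

    def __init__(self, items=()):
        self._items = tuple(items)

    @staticmethod
    def empty():
        return FifoQueue()

    def select(self):
        """Return the oldest formula; the queue must be nonempty."""
        return self._items[0]

    def add(self, formula):
        return FifoQueue(self._items + (formula,))

    def remove(self, formula):
        """Remove every occurrence of the formula."""
        return FifoQueue(x for x in self._items if x != formula)

    def elems(self):
        return set(self._items)


q = FifoQueue.empty().add("a").add("b")
after_one_step = q.remove(q.select())
```

Selecting the oldest element is what makes the strategy fair: every formula that stays in the queue eventually reaches the front, provided the selected element keeps being removed.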

iProver Loop with Fairness. To obtain an iProver loop from an Otter loop, only one extra rule is needed. The fair version of the iProver loop is formalized in Fair\_iProver\_Loop.thy as follows:

```
inductive (⇝ILf) :: ('p, 'f) OLf_state ⇒ ('p, 'f) OLf_state ⇒ bool where
  ol: St ⇝OLf St' ⟹ St ⇝ILf St'
| red_by_children: C ∈ RedF (A ∪ M) ∨ M = {C'} ∧ C' ≺· C ⟹
    (∅, None, P, Some C, A) ⇝ILf (M, None, P, None, A)
```
The first rule, *ol*, executes any ⇝<sub>OLf</sub> rule as an iProver loop rule. The second rule, *red\_by\_children*, replaces a formula C by a set of formulas M that make it redundant. As M, iProver would use a set of simplified formulas produced by inferences with C as a premise and formulas from A ∪ {C} as further premises. The rule is stated in a more general, unsound form.

We prove the main metatheorem first for the iProver loop. Then, since an Otter derivation is an iProver derivation (in which the second rule, *red\_by\_children*, is not used), the result carries over directly to the Otter loop. The Isabelle statement, located in Fair\_iProver\_Loop.thy, is as follows:

theorem *fair\_IL\_Liminf\_saturated* assumes full\_chain (⇝<sub>ILf</sub>) *Sts* and is\_initial\_OLf\_state (*Sts* ! 0) shows saturated (Liminf *Sts*)

Informally, this states that if *Sts* is a complete ⇝<sub>ILf</sub> derivation starting in a state of the form (N, None, empty, None, ∅), then the limit is saturated. The limit (strictly speaking, limit inferior) is defined by

$$\mathsf{Liminf }Xs = \bigcup\_{i<|Xs|} \bigcap\_{j: i \le j \land j < |Xs|}Xs \text{ ! }j$$

where *Xs* ! j returns the element at index j of the finite or infinite sequence *Xs*. In Isabelle, such sequences are represented by the type 'a *llist* of "lazy lists."
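For finite sequences, the definition of Liminf can be evaluated directly. The function below is a small illustration restricted to finite Python lists of sets; the infinite case handled by Isabelle's lazy lists is outside its scope.

```python
def liminf(xs):
    """Union over i of the intersection of xs[j] for all j >= i."""
    result = set()
    for i in range(len(xs)):
        result |= set.intersection(*map(set, xs[i:]))
    return result


limit = liminf([{1, 2}, {2, 3}, {3}])
```

On a nonempty finite sequence the result coincides with the last element, since the intersection starting at the last index is that element and every earlier intersection is contained in it.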

This metatheorem is proved within the scope of the passive set queue's fairness assumption. It is derived from the metatheorem about the transition system ⇝<sub>IL</sub> without fairness, which is inherited from the abstract procedure GC.

*Proof Sketch.* The main difficulty is to show that N, X, P, and Y are empty at the limit. Once this is shown, we can apply the main metatheorem for GC, which states that if there are no passive formulas at the limit, the active formula set is saturated.

Let *St*<sub>0</sub> ⇝<sub>IL</sub> *St*<sub>1</sub> ⇝<sub>IL</sub> ··· be a complete derivation, where *St*<sub>i</sub> = (N<sub>i</sub>, X<sub>i</sub>, P<sub>i</sub>, Y<sub>i</sub>, A<sub>i</sub>). If the derivation is finite, it is easy to show that the final state, and hence the limit, must be of the form (∅, None, empty, None, A), as desired.

Otherwise, for an infinite derivation, we assume in turn that the limit of N, X, P, or Y is nonempty and show that this leads to a contradiction. We start with N. Let i be an index such that N*<sup>i</sup>* ∩N*i*+1 ∩··· = ∅, which exists by the definition of limit. This means that N*i*, N*i*+1,... are all nonempty. By invariance, we can show that Y*i*, Y*i*+1,... are all empty. Thus, if we have a transition from *St <sup>j</sup>* to *St <sup>j</sup>*+1 for j ≥ i, it cannot be *infer* (via *ol*) or *red\_by\_children*. It can be shown that for the remaining transition rules, we have *St <sup>i</sup>* <sup>1</sup> *St <sup>i</sup>*+1 <sup>1</sup> ··· , where <sup>1</sup> is the converse of the lexicographic combination <sup>1</sup> of three well-founded relations:


Since the lexicographic combination of well-founded relations is well founded, the chain $\mathit{St}_i \sqsupset_1 \mathit{St}_{i+1} \sqsupset_1 \cdots$ cannot be infinite, a contradiction.
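For reference, the lexicographic combination of three relations can be spelled out as follows. The names $\prec_a$, $\prec_b$, $\prec_c$, and $\prec_{\mathrm{lex}}$ are placeholders introduced here for illustration and are not the notation of the formalization:

$$(a, b, c) \prec_{\mathrm{lex}} (a', b', c') \iff a \prec_a a' \;\lor\; (a = a' \land b \prec_b b') \;\lor\; (a = a' \land b = b' \land c \prec_c c')$$

In an infinite descending chain for $\prec_{\mathrm{lex}}$, the first component could strictly decrease only finitely often because $\prec_a$ is well founded, so it would eventually be constant; the same then holds for the second component, which would leave an infinite descending chain for $\prec_c$. This is impossible, which is the well-foundedness argument used in the proof sketch.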

Next, we consider the X component. If X is nonempty forever, the only possible transition rules are deletions and simplifications, and both make the entire state decrease with respect to ≺≺S. Again, we get a contradiction.

Next, we consider the P component. The fairness assumption for the queue guarantees that P is empty at the limit, on the condition that the *choose\_p* rule is executed infinitely often. Since P is assumed not to be empty at the limit, *choose\_p* must be executed only finitely often. Let i be an index from which no *choose\_p* step takes place. We then have $\mathit{St}_i \sqsupset_2 \mathit{St}_{i+1} \sqsupset_2 \cdots$, where $\sqsupset_2$ is the converse of the lexicographic combination $\sqsubset_2$ of two well-founded relations:


Again, we get a contradiction.

Finally, for Y , the only two transitions possible, *infer* and *red\_by\_children*, are to a state where Y is empty afterward, contradicting the hypothesis that Y is nonempty forever.

### 4 DISCOUNT Loop

The DISCOUNT loop [1] works on four-tuples (T, P, Y, A), where T is the set of *scheduled* ("to do") inferences; P is the set of so-called *passive* formulas (although, strictly speaking, any formula in Y is also passive); Y is a subsingleton storing the *given clause*; and A is the set of *active* formulas. All the sets are finite.

Initial states have the form (∅, P, ∅, ∅). Inferences are assumed to be finitary. We disallow premiseless inferences. Waldmann et al. [19, Example 81] allow them and let the T component of initial states consist of all of them. However, in their "reasonable strategy," they implicitly assume that T is finite, in which case premiseless inferences can be immediately performed and replaced by the resulting formulas inserted in P.

DISCOUNT Loop without Fairness. The first version of the DISCOUNT loop, formalized in DISCOUNT\_Loop.thy, does not make any fairness assumption on the choice of the inference to compute or the given clause. There is no guarantee that T, P, and Y are empty at the limit, but if they are, then A is saturated at the limit. Here is an extract of the definition, omitting the *delete\_bwd* and *simplify\_fwd* rules:

```
inductive (⇝DL) ::
  'f inference set × ('f × DL_label) set ⇒
  'f inference set × ('f × DL_label) set ⇒ bool
where
  compute_infer: ι ∈ Red_I (A ∪ {C}) ⟹
    state (T ∪ {ι}, P, ∅, A) ⇝DL state (T, P, {C}, A)
| choose_p:
    state (T, P ∪ {C}, ∅, A) ⇝DL state (T, P, {C}, A)
| delete_fwd: C ∈ Red_F A ∨ (∃C' ∈ A. C' ≼· C) ⟹
    state (T, P, {C}, A) ⇝DL state (T, P, ∅, A)
  ...
| simplify_bwd: C' ∈ Red_F {C, C''} ⟹
    state (T, P, {C}, A ∪ {C'}) ⇝DL state (T, P ∪ {C''}, {C}, A)
| schedule_infer: T' = Inf_between A {C} ⟹
    state (T, P, {C}, A) ⇝DL state (T ∪ T', P, ∅, A ∪ {C})
| delete_orphan_infers: T' ∩ Inf_from A = ∅ ⟹
    state (T ∪ T', P, Y, A) ⇝DL state (T, P, Y, A)
```
The state function converts a four-tuple (T, P, Y, A) into a pair (T,N), where N is a set of labeled formulas. The labels are Passive (for P), YY (for Y ), and Active (for A, corresponding to active in LGC). The rules *compute\_infer*, *schedule\_infer*, and *delete\_orphan\_infers* refine the LGC rules of the same names; the other rules refine *process*.
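The labeling performed by the state function can be pictured with a small Python sketch. It is an illustration under simplifying assumptions: formulas and inferences are plain strings, labels are the strings `"Passive"`, `"YY"`, and `"Active"`, and the name `state` is reused only for readability.

```python
def state(T, P, Y, A):
    """Map a four-tuple (T, P, Y, A) to a pair (T, N), where N is the set of
    formulas from P, Y, and A, each tagged with the label of its origin."""
    N = (
        {(C, "Passive") for C in P}
        | {(C, "YY") for C in Y}
        | {(C, "Active") for C in A}
    )
    return frozenset(T), frozenset(N)

scheduled, labeled = state({"iota1"}, {"p"}, {"y"}, {"a1", "a2"})
print(sorted(labeled))
```

Because a formula is tagged with its origin, the same formula occurring in two components yields two distinct labeled formulas.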

DISCOUNT Loop with Fairness. In the fair version of the DISCOUNT loop, formalized in Fair\_DISCOUNT\_Loop.thy, the scheduled inferences and the passive formulas are organized as a single queue. A state is then a triple (P, Y, A), where P is the single queue that merges T and P from the above DISCOUNT loop, and Y and A are as above. Elements of P have the forms Passive\_Inference ι and Passive\_Formula C. The select function of P is assumed to be fair: If select is called infinitely often, every element in the queue will eventually be chosen and the limit of P will be empty.
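A first-in, first-out policy is one simple way to obtain a fair select function, since only finitely many elements precede any given element. The Python sketch below illustrates this idea; the class `FifoQueue`, its method names, and the tuple encoding of `Passive_Formula` and `Passive_Inference` elements are assumptions made for the example and do not mirror the Isabelle interface exactly.

```python
class FifoQueue:
    """Immutable FIFO queue; `select` always returns the oldest element."""

    def __init__(self, items=()):
        self._items = tuple(items)

    def is_empty(self):
        return not self._items

    def add(self, elem):
        return FifoQueue(self._items + (elem,))

    def remove(self, elem):
        items = list(self._items)
        if elem in items:
            items.remove(elem)  # drops the first occurrence only
        return FifoQueue(items)

    def select(self):
        if self.is_empty():
            raise LookupError("select on empty queue")
        return self._items[0]

def drain(queue):
    """Repeatedly select and remove, returning the order of selection."""
    order = []
    while not queue.is_empty():
        chosen = queue.select()
        order.append(chosen)
        queue = queue.remove(chosen)
    return order

q = (FifoQueue()
     .add(("Passive_Formula", "C1"))
     .add(("Passive_Inference", "i1"))
     .add(("Passive_Formula", "C2")))
print(drain(q))
```

Other selection policies can also be fair; FIFO is used here only because its fairness is easy to see.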

The definition of the transition system is as follows:

    inductive (⇝DLf) :: ('p, 'f) DLf_state ⇒ ('p, 'f) DLf_state ⇒ bool where
      compute_infer: P ≠ empty ⟹ select P = Passive_Inference ι ⟹ ι ∈ Red_I (A ∪ {C}) ⟹
        (P, None, A) ⇝DLf (remove (select P) P, Some C, A)
    | choose_p: P ≠ empty ⟹ select P = Passive_Formula C ⟹
        (P, None, A) ⇝DLf (remove (select P) P, Some C, A)
    | delete_fwd: C ∈ Red_F A ∨ (∃C' ∈ A. C' ≼· C) ⟹
        (P, Some C, A) ⇝DLf (P, None, A)
      ...
    | simplify_bwd: C' ∉ A ⟹ C'' ≺S C' ⟹ C' ∈ Red_F {C, C''} ⟹
        (P, Some C, A ∪ {C'}) ⇝DLf (add (Passive_Formula C'') P, Some C, A)
    | schedule_infer: set ιs = Inf_between A {C} ⟹
        (P, Some C, A) ⇝DLf (fold (add ∘ Passive_Inference) ιs P, None, A ∪ {C})
    | delete_orphan_infers: ιs ≠ [] ⟹ set ιs ⊆ passive_inferences_of P ⟹ set ιs ∩ Inf_from A = ∅ ⟹
        (P, Y, A) ⇝DLf (fold (remove ∘ Passive_Inference) ιs P, Y, A)

We note the following:


As with the Otter and iProver loops, the most important result is saturation at the limit:

```
theorem fair_DL_Liminf_saturated:
  assumes
    full_chain (⇝DLf) Sts and
    is_initial_DLf_state (Sts ! 0)
  shows saturated (labeled_formulas_of (Liminf_fstate Sts))
```
*Proof Sketch.* The proof amounts to showing that the sets P and Y are empty at the limit. This is easy to show for finite derivations, so we focus on infinite ones. We proceed by contradiction. For P, the fairness assumption for the select function of the queue guarantees that P is empty at the limit, on the condition that the *compute\_infer* and *choose\_p* rules are collectively executed infinitely often. Since P is assumed not to be empty at the limit, these two rules must be executed only finitely often. Let i be an index from which no *compute\_infer* or *choose\_p* step takes place. We then have $\mathit{St}_i \sqsupset \mathit{St}_{i+1} \sqsupset \cdots$, where $\sqsupset$ is the converse of the lexicographic combination $\sqsubset$ of two well-founded relations:


Since the lexicographic combination of well-founded relations is well founded, the chain $\mathit{St}_i \sqsupset \mathit{St}_{i+1} \sqsupset \cdots$ cannot be infinite, a contradiction.

Finally, we consider Y. If Y is nonempty forever, the only possible transitions make the entire state decrease with respect to $\sqsubset$. This yields a contradiction.

### 5 Zipperposition Loop

The Zipperposition loop [17] as described by Waldmann et al. [19, Example 82] works on four-tuples (T, P, Y, A), where the components have the same roles as in the DISCOUNT loop: T is the *scheduled* set, P is the *passive* set, Y is the *given clause*, if any, and A is the *active* set. For technical reasons, we need to enrich the state with a ghost component D ("done"), of type 'f *inference set*, resulting in a five-tuple (T, D, P, Y, A). All the sets are finite.

The hallmark of the Zipperposition loop is that it can handle infinitary inferences. We assume that Inf\_between A {C} is countable if A is finite. (This assumption is implicit in Waldmann et al.) To store the infinitely many conclusions of an inference, T contains possibly infinite sequences of inferences, instead of individual inferences. Premiseless inferences are also allowed. Initial states have the form (T, ∅, P, ∅, ∅), where T contains all the premiseless inferences of the underlying proof calculus and only those.

The implementation in Zipperposition by Vukmirović et al. [17] deviates from Waldmann et al. in one important respect: Instead of sequences of inferences, Zipperposition works with sequences of *subsingletons* of inferences. The special value ∅ is returned when no progress is made in computing an inference, to give control back to the given clause procedure. In the setting of Waldmann et al., this special value can be replaced by a tautology (e.g., ⊤ or t ≈ t), which the given clause procedure can delete as redundant.

Zipperposition Loop without Fairness. The first version of the Zipperposition loop, formalized in Zipperposition\_Loop.thy, does not make any fairness assumption on the choice of the inference to compute or the given clause. Here is an extract of the definition:

    inductive (⇝ZL) ::
      'f inference set × ('f × DL_label) set ⇒
      'f inference set × ('f × DL_label) set ⇒ bool
    where
      compute_infer: ι0 ∈ Red_I (A ∪ {C}) ⟹
        zl_state (T + {LCons ι0 ιs}, D, P, ∅, A) ⇝ZL zl_state (T + {ιs}, D ∪ {ι0}, P ∪ {C}, ∅, A)
    | choose_p:
        zl_state (T, D, P ∪ {C}, ∅, A) ⇝ZL zl_state (T, D, P, {C}, A)
    | delete_fwd: C ∈ Red_F A ∨ (∃C' ∈ A. C' ≼· C) ⟹
        zl_state (T, D, P, {C}, A) ⇝ZL zl_state (T, D, P, ∅, A)
      ...
    | schedule_infer: inferences_of T' = Inf_between A {C} ⟹
        zl_state (T, D, P, {C}, A) ⇝ZL zl_state (T + T', D − inferences_of T', P, ∅, A ∪ {C})
    | delete_orphan_infers: set ιs ∩ Inf_from A = ∅ ⟹
        zl_state (T + {ιs}, D, P, Y, A) ⇝ZL zl_state (T, D ∪ set ιs, P, Y, A)

The zl\_state function converts a five-tuple (T, D, P, Y, A) into a pair (U, N), where


We use a multiset for the T component. Waldmann et al. use a set, but this is not very realistic because an implementation cannot in general detect duplicate infinite sequences.

The D component addresses a subtle issue in Waldmann et al. If we did not subtract D in the definition of U, the completeness theorem we would obtain from the LGC layer above would require the T component to be empty at the limit. However, a given inference ι might appear in T multiple times and hence always be present, even if we keep on removing copies of it, if new copies are continuously added. The issue goes away if we add ι to D whenever we compute it, in *compute\_infer*: the inference is then not present in U (i.e., inferences\_of T − D). In other words, computing an inference makes it momentarily disappear, even if there are multiple copies of it in T.
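The effect of the ghost component can be replayed on a tiny example. The Python sketch below is a simplification made for illustration: sequences are finite tuples, the multiset T is a `collections.Counter`, and `compute_infer` omits the redundancy premise and the formula added to P. The function names follow the text, but the code is not taken from the formalization.

```python
from collections import Counter

def inferences_of(T):
    """All inferences occurring in some sequence stored in the multiset T."""
    return {iota for seq in T for iota in seq}

def U(T, D):
    """Scheduled inferences as seen by the abstract layer: inferences_of T - D."""
    return inferences_of(T) - D

def compute_infer(T, D, seq):
    """Pop the head of one copy of `seq` from T and record it in D."""
    head, tail = seq[0], seq[1:]
    T_next = T - Counter([seq]) + Counter([tail])
    return T_next, D | {head}

# Two copies of the one-element sequence containing the inference "iota".
T = Counter({("iota",): 2})
D = set()
before = U(T, D)
T, D = compute_infer(T, D, ("iota",))
after = U(T, D)
print(before, after, inferences_of(T))
```

After the step, a copy of the inference is still stored in T, yet it no longer belongs to U, which is the "momentary disappearance" described above.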

Admittedly, it is not easy to develop a robust intuitive understanding of how D works, but what matters ultimately is that D allows us to obtain a usable main metatheorem. The metatheorem states that if the set of scheduled inferences and the set of passive formulas are empty at the limit of a derivation starting in an initial state, the active formula set is saturated at the limit. We will also see, via an additional refinement layer, that the ghost component is truly a ghost and can be omitted once it has served its purpose.

Zipperposition Loop with Fairness. Unlike the fair DISCOUNT loop, the fair Zipperposition loop, formalized in Fair\_Zipperposition\_Loop.thy, keeps T and P separate. An extract of the Isabelle definition follows:

    inductive (⇝ZLf) :: ('t, 'p, 'f) ZLf_state ⇒ ('t, 'p, 'f) ZLf_state ⇒ bool where
      compute_infer: (∃ιs ∈ t_llists T. ιs ≠ LNil) ⟹ t_pick_elem T = (ι0, T') ⟹ ι0 ∈ Red_I (A ∪ {C}) ⟹
        (T, D, P, None, A) ⇝ZLf (T', D ∪ {ι0}, p_add C P, None, A)
    | choose_p: P ≠ p_empty ⟹
        (T, D, P, None, A) ⇝ZLf (T, D, p_remove (p_select P) P, Some (p_select P), A)
    | delete_fwd: C ∈ Red_F A ∨ (∃C' ∈ A. C' ≼· C) ⟹
        (T, D, P, Some C, A) ⇝ZLf (T, D, P, None, A)
      ...
    | schedule_infer: inferences_of ιss = Inf_between A {C} ⟹
        (T, D, P, Some C, A) ⇝ZLf (fold t_add_llist ιss T, D − inferences_of ιss, P, None, A ∪ {C})
    | delete_orphan_infers: ιs ∈ t_llists T ⟹ set ιs ∩ Inf_from A = ∅ ⟹
        (T, D, P, Y, A) ⇝ZLf (t_remove_llist ιs T, D ∪ set ιs, P, Y, A)

The presence of two queues introduces some complications. Waldmann et al. [19, Example 82] claim that "to produce fair derivations, a prover needs to choose the sequence in ComputeInfer fairly and to choose the formula in ChooseP fairly." However, this does not suffice: A counterexample would apply *compute\_infer* infinitely often in a fair fashion, retrieving elements from some infinite sequences, without ever applying *choose\_p* (whose choice of formula would then be vacuously fair). The solution is to add a fairness assumption stating that *compute\_infer* is applied at most finitely many times before *choose\_p* is applied—or, in other words, that if *compute\_infer* is applied infinitely often, then so is *choose\_p*. This leads to the following main metatheorem:
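One simple way for a prover to meet this extra assumption is to bound the number of *compute\_infer* steps performed between two consecutive *choose\_p* steps. The Python sketch below only generates such a schedule of rule names; the generator `bounded_schedule` and its parameter are hypothetical, and the sketch ignores whether a rule is applicable in the current state (for example, *choose\_p* needs a nonempty P) as well as all other rules.

```python
from itertools import islice

def bounded_schedule(max_infers):
    """Yield rule names forever, with exactly `max_infers` compute_infer
    steps before each choose_p step."""
    while True:
        for _ in range(max_infers):
            yield "compute_infer"
        yield "choose_p"

prefix = list(islice(bounded_schedule(2), 7))
print(prefix)
```

In any infinite run that follows such a schedule, *compute\_infer* being applied infinitely often implies the same for *choose\_p*, which is the shape of the third assumption of the metatheorem.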

```
theorem fair_ZL_Liminf_saturated:
  assumes
    full_chain (⇝ZLf) Sts and
    is_initial_ZLf_state (Sts ! 0) and
    infinitely_often compute_infer_step Sts ⟶
      infinitely_often choose_p_step Sts
  shows saturated (labeled_formulas_of (Liminf_zl_fstate Sts))
```
*Proof Sketch.* Recall that zl\_state maps (T, D, P, Y, A) to a pair (U, N). In the abstract LGC layer, U and the passive subset of N are required to be empty at the limit. To obtain the same effect in $\leadsto_{\mathsf{ZLf}}$, we must show that the sets U, P, and Y are empty at the limit. This is easy to show for finite derivations, so we focus on infinite ones. We proceed by contradiction.

We start with U. We first show that there must be infinitely many *compute\_infer* steps. Assume that there are finitely many. Then there exists an index i from which no more *compute\_infer* steps take place. We then have $\mathit{St}_i \sqsupset \mathit{St}_{i+1} \sqsupset \cdots$, where $\sqsupset$ is the converse of the lexicographic combination $\sqsubset$ of four well-founded relations:


We get a contradiction. Having shown that there are infinitely many *compute\_infer* steps, we exploit the queue's fairness to show that one of these steps will choose any given inference ι from the queue. Thanks to the D trick, ι will then momentarily vanish from U, ensuring that it is not in the limit. The same argument applies for any inference ι, showing that U is empty at the limit.

Next, we show that P is empty at the limit. We start by showing that there must be infinitely many *choose\_p* steps. Assume that there are finitely many. Then, by the third assumption, there must be finitely many *compute\_infer* steps as well. Let i be an index from which no more *compute\_infer* steps take place. We then have $\mathit{St}_i \sqsupset \mathit{St}_{i+1} \sqsupset \cdots$, as above, yielding a contradiction.

Finally, we show that Y is empty at the limit. Let i be an index such that $Y_i \cap Y_{i+1} \cap \cdots \neq \emptyset$. Since a *compute\_infer* step is possible only if Y is empty, no such steps are possible from index i. Again, we have $\mathit{St}_i \sqsupset \mathit{St}_{i+1} \sqsupset \cdots$, a contradiction.

Queue of Formula Sequences. The queue data structure used for the T component of the Zipperposition loop needs to store a finite number of possibly infinite sequences of inferences. It is formalized in Prover\_Lazy\_List\_Queue.thy. It provides the following operations on abstract queue and element types 'q and 'e:

    fixes
      empty :: 'q and
      add_llist :: 'e llist ⇒ 'q ⇒ 'q and
      remove_llist :: 'e llist ⇒ 'q ⇒ 'q and
      pick_elem :: 'q ⇒ 'e × 'q and
      llists :: 'q ⇒ 'e llist multiset

The fairness requirement on implementations of the abstract queue interface takes the following form:

If a sequence of queue operations contains infinitely many pick\_elem steps and ι is at the head of one of the sequences stored in the queue, then either the sequence will be entirely removed (by orphan deletion) or ι will eventually be chosen.

A syntactically stronger formulation of fairness, where ι may occur anywhere in a sequence, is derived as a corollary:

If a sequence of queue operations contains infinitely many pick\_elem steps and ι occurs in one of the sequences stored in the queue at some index in the sequence, then either the sequence (possibly amputated from its leading elements) will be entirely removed or ι will eventually be chosen.

As a proof of concept, the theory file contains an example implementation of the queue as a FIFO queue. The proof that this FIFO queue is fair is the most finicky proof of our entire development.
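To give an idea of what such a queue can look like, here is a Python sketch that stores possibly infinite sequences as iterators and serves them round-robin. It is one plausible reading of a FIFO policy, written for illustration; it is not a transcription of the Isabelle implementation, it is mutable rather than functional, and it omits `remove_llist`. The class name `LazyListQueue` is an assumption.

```python
from collections import deque
from itertools import count

class LazyListQueue:
    """Queue of possibly infinite sequences, served in round-robin order."""

    def __init__(self):
        self._seqs = deque()

    def add_llist(self, seq):
        self._seqs.append(iter(seq))

    def pick_elem(self):
        """Return the head of the oldest nonempty sequence and move the
        remainder of that sequence to the back of the queue."""
        while self._seqs:
            it = self._seqs.popleft()
            try:
                head = next(it)
            except StopIteration:
                continue  # exhausted sequences are dropped
            self._seqs.append(it)
            return head
        raise LookupError("no element available")

q = LazyListQueue()
q.add_llist(count(0))     # an infinite sequence 0, 1, 2, ...
q.add_llist(["a", "b"])   # a finite sequence
picked = [q.pick_elem() for _ in range(6)]
print(picked)
```

Because the sequence that was just served goes to the back, the head of every stored sequence is reached after finitely many `pick_elem` calls, even when other sequences are infinite.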

Zipperposition Loop without Ghost Fields. In the last step of our development, we remove the D state component. D is useful to retrieve a usable main metatheorem for $\leadsto_{\mathsf{ZL}}$, but it is not explicitly referenced in the metatheorem for the fair variant $\leadsto_{\mathsf{ZLf}}$. The resulting transition system $\leadsto_{\mathsf{ZLfw}}$, formalized in Fair\_Zipperposition\_Loop\_without\_Ghosts.thy, operates on four-tuples (T, P, Y, A). Each transition is identical to the corresponding $\leadsto_{\mathsf{ZLf}}$ transition, omitting the D component. The main metatheorem is also essentially the same.

### 6 Conclusion

We presented an Isabelle/HOL formalization of four variants of the given clause procedure, starting from Tourret and Blanchette's formalization of two abstract given clause procedures [16]. We relied extensively on stepwise refinement to derive properties of more concrete transition systems from more abstract ones.

Our main findings concern the Zipperposition loop. We found that the refinement proof is not as straightforward as previously thought [19, Example 82] and requires a nontrivial abstraction function. In addition, we discovered a fairness condition—the necessity of avoiding computing inferences forever without selecting a formula—that was not mentioned before in the literature, and we clarified other fine points.

Acknowledgments. We thank Andrei Popescu for helping us with a coinductive proof that arose when removing the ghosts in the Zipperposition loop. We thank Uwe Waldmann for sharing with us his encyclopedic knowledge of the various given clause loops and their history. Finally, we thank Mark Summerfield and the anonymous reviewers for many helpful suggestions.

This research has received funding from the Netherlands Organization for Scientific Research (NWO) under the Vidi program (project No. 016.Vidi.189.037, Lean Forward).

### References

1. Avenhaus, J., Denzinger, J., Fuchs, M.: DISCOUNT: a system for distributed equational deduction. In: Hsiang, J. (ed.) RTA 1995. LNCS, vol. 914, pp. 397–402. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-59200-8\_72


(eds.) IJCAR 2020. LNCS (LNAI), vol. 12166, pp. 316–334. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-51074-9\_18

19. Waldmann, U., Tourret, S., Robillard, S., Blanchette, J.: A comprehensive framework for saturation theorem proving. J. Autom. Reason. 66(4), 499–539 (2022). https://doi.org/10.1007/s10817-022-09621-7


# QSMA: A New Algorithm for Quantified Satisfiability Modulo Theory and Assignment

Maria Paola Bonacina<sup>1(✉)</sup>, Stéphane Graham-Lengrand<sup>2</sup>, and Christophe Vauthier<sup>3</sup>

> <sup>1</sup> Università degli Studi di Verona, Verona, Italy, mariapaola.bonacina@univr.it
>
> <sup>2</sup> SRI International, Menlo Park, USA, stephane.graham-lengrand@csl.sri.com
>
> <sup>3</sup> École Normale Supérieure, Paris, France, cvauthier@clipper.ens.psl.eu

Abstract. This paper presents and proves totally correct a new algorithm, called QSMA, for the satisfiability of a quantified formula modulo a complete theory and an initial assignment. The optimized variant of QSMA implemented in YicesQS is described and shown to preserve total correctness. A report on the performance of YicesQS at the 2022 SMT competition is included. YicesQS ran in the LIA, NIA, LRA, NRA, and BV categories and ranked second for the "largest contribution" award (single queries). It was the only solver to solve all LRA instances, where it was about two orders of magnitude faster than the second best solver (Z3).

### 1 Introduction

Applications of automated reasoning generate formulas involving both quantifiers and symbols defined in background theories. For example, software verification needs reasoners that decide the satisfiability of quantified formulas modulo theories such as data structures and arithmetic (e.g., [20]). Therefore, endowing SMT solvers with quantifier reasoning (e.g., [3,9,11–14,22]), enriching first-order theorem provers with built-in theories (e.g., [1,2,19]), and integrating provers and solvers [7], are major research objectives.

If there is a single background theory T, the T-satisfiability of quantified formulas can be reduced to that of quantifier-free formulas if T admits quantifier elimination (QE): for every formula ϕ there exists a quantifier-free formula F that is T-equivalent to ϕ. Since computing F can be prohibitively expensive (e.g., exponential in linear rational arithmetic (LRA) and doubly exponential in linear integer arithmetic (LIA) [8]), QE is not a practical solution.

In this paper we propose a practical solution in the form of a new algorithm called QSMA. In QSMA the computation of quantifier-free *model-based under-approximations* (MBU) and *model-based over-approximations* (MBO) of quantified formulas embodies a lazy approach to QE, which is tailored for T-satisfiability. MBU generates a quantifier-free implicant of the given formula that is true in the given model. *Model(-guided) generalization* for linear [12] and nonlinear real arithmetic (NRA) [17] is an instance of MBU. MBO generates a quantifier-free implied formula that is false in the given model. *Model interpolation* for NRA [17] is an instance of MBO.

The QSMA algorithm assumes that the theory T is *complete*. By its recursive nature, QSMA solves a generalized form of the satisfiability problem, called *quantified SMA* (*satisfiability modulo theory and assignment*): given a formula ϕ with *arbitrary quantification*, and an *initial assignment* to Boolean or first-order subterms of ϕ, find a theory model of ϕ that extends the initial assignment, or report that none exists. In addition to QSMA and its total correctness, we present an optimized variant named OptiQSMA, which preserves total correctness and is implemented in the YicesQS solver built on top of Yices 2. A report on experimental results from the 2022 SMT competition and a discussion complete the paper. We begin with a high-level view of QSMA.

#### 1.1 High-Level View of the **QSMA** Algorithm

The QSMA algorithm works by progressively instantiating quantified variables. Consider a formula ϕ of the form $\exists\bar{x}_1.\forall\bar{x}_2.\exists\bar{x}_3 \ldots F[\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots]$ where F is quantifier-free. For example, suppose the theory is LRA, ϕ = ∃x.∀y.∃z.F and F = z ≥ 0 ∧ x ≥ 0 ∧ y + z ≥ 0. Say that QSMA assigns x←0. Whatever value is chosen for y, the algorithm can show that ϕ is true in LRA by assigning z←max(0, −y). If F = z ≥ 0 ∧ x ≥ 0 ∧ y + z ≤ 0, no matter which (non-negative) value QSMA chooses for x, it can show that ϕ is false in LRA by picking y←1, because there is no value for z that satisfies z ≥ 0 ∧ z ≤ −1.
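The two LRA examples can be sanity-checked numerically. The Python sketch below evaluates both versions of F on a finite grid of rationals; this only illustrates the witness z←max(0, −y) and the counterexample y←1, and is neither a proof nor an implementation of QSMA. The function names and the grid are assumptions made here.

```python
from fractions import Fraction

def f_true(x, y, z):
    """First example: F = z >= 0 and x >= 0 and y + z >= 0."""
    return z >= 0 and x >= 0 and y + z >= 0

def f_false(x, y, z):
    """Second example: F = z >= 0 and x >= 0 and y + z <= 0."""
    return z >= 0 and x >= 0 and y + z <= 0

# A finite sample of rationals between -5 and 5 in steps of 1/4.
grid = [Fraction(n, 4) for n in range(-20, 21)]

# With x = 0, the witness z = max(0, -y) satisfies F for every sampled y.
true_case = all(f_true(Fraction(0), y, max(Fraction(0), -y)) for y in grid)

# With y = 1, no sampled z satisfies F, whatever non-negative x is chosen.
false_case = not any(
    f_false(x, Fraction(1), z) for x in grid if x >= 0 for z in grid
)
print(true_case, false_case)
```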

For an example that is not in prenex normal form, consider a formula ϕ of the form $\exists x.((\forall y_1.F_1[x, y_1]) \Rightarrow (\forall y_2.F_2[x, y_2]))$, where $F_1$ and $F_2$ are quantifier-free. QSMA sees the formula as $\exists x.((\exists y_1.\neg F_1[x, y_1]) \lor (\neg\exists y_2.\neg F_2[x, y_2]))$, and then as $\exists x.(p_1 \lor \neg p_2)$, where $p_1$ and $p_2$ are proxy Boolean variables for the quantified subformulas. QSMA assigns values to x, $p_1$, and $p_2$. If $p_1$ is assigned true, the algorithm tries to extend the assignment with a value for $y_1$ that satisfies $\neg F_1[x, y_1]$. If $p_2$ is assigned false, the algorithm tries to show that there is no value for $y_2$ that satisfies $\neg F_2[x, y_2]$.

Without loss of generality (¬¬ converts ∀ into ¬∃¬), we consider formulas

$$\varphi = \exists \bar{x}.\, F[\bar{z}, \bar{x}, \bar{p}] \{ p_i \gets \exists \bar{y}_i.\, G_i[\bar{z}, \bar{x}, \bar{y}_i] \}_{i=1}^{k}.$$

$F[\bar{z}, \bar{x}, \bar{p}]$ denotes a quantifier-free formula where the variables $\bar{z}$, $\bar{x}$, and $\bar{p}$ occur. Tuples $\bar{z}$ and $\bar{x}$ contain the first-order variables occurring free in F. Formula F is quantifier-free because the quantified subformulas $\varphi_i = \exists\bar{y}_i.G_i[\bar{z}, \bar{x}, \bar{y}_i]$ are replaced by proxy Boolean variables $\bar{p} = p_1, \ldots, p_k$. Given an initial assignment to the free variables $\bar{z}$, we construct a QSMA*-tree* for ϕ. QSMA starts trying to satisfy $F[\bar{z}, \bar{x}, \bar{p}]$. If it fails, it means that ϕ is false under the initial assignment. If it succeeds, there are two cases. If k = 0, formula ϕ is true under the initial assignment. If k > 0, the algorithm descends recursively to consider the QSMA-subtrees for the $\varphi_i$ subformulas (1 ≤ i ≤ k). If QSMA assigned true to $p_i$, it tries to show that $\varphi_i$ is true. If QSMA assigned false to $p_i$, it tries to show that $\varphi_i$ is false. If it succeeds for all QSMA-subtrees, formula ϕ is true under the initial assignment. For this, the model built by QSMA should satisfy $F[\bar{z}, \bar{x}, \bar{p}] \land \bigwedge_{i=1}^{k}(p_i \Leftrightarrow \varphi_i)$. Otherwise, formula ϕ is false under the initial assignment.

### 2 Preliminaries

A signature Σ is given by a set S of sorts and a set of sorted symbols. Given a class $V = (V^s)_{s \in S}$ of disjoint sets of sorted variables, Σ[V]-formulas, Σ-sentences, and Σ[V]-interpretations are defined as usual. A Σ-structure is a Σ[∅]-interpretation. We use x, y, z for first-order variables, p for Boolean ones, and $\bar{x}$, $\bar{y}$, $\bar{z}$, and $\bar{p}$ for tuples of such variables. We also use ϕ and ψ for formulas, F and G for quantifier-free formulas, M for interpretations, |= for satisfaction and entailment, = for identity, ⊎ for disjoint union, and \ for set difference. *FV*(ϕ) is the set of the variables occurring free in ϕ. Slightly abusing the notation, *FV*(ϕ) is also treated as a tuple. Implication is written ⇒ and logical equivalence is written ⇔. If $V_1 \subseteq V_2$ (i.e., $V_1^s \subseteq V_2^s$ for all s ∈ S), a Σ[$V_2$]-interpretation $M_2$ is an *extension* of a Σ[$V_1$]-interpretation $M_1$ to $V_2$, if $M_2$ interprets the variables in $V_2^s \setminus V_1^s$ for all s ∈ S and is otherwise identical to $M_1$.

A theory T is defined by a signature Σ and a set of Σ-sentences called T-axioms. A model of T, or T-model, is a Σ-structure that satisfies the T-axioms. A T[V]-model is a Σ[V]-interpretation that is a T-model when the interpretation of variables is ignored. A theory T is *complete*, if it is consistent, and for all Σ-sentences F, either F or ¬F is provable from the T-axioms. In this paper we deal with a single theory T that has a unique T-model $M_0$, so that the interpretation of everything except variables is fixed. Therefore T is complete, for Σ-sentences T-validity, T-satisfiability, and truth in $M_0$ coincide, all T[V]-models are extensions of $M_0$, and a T-satisfiability procedure is concerned only with assignments to variables. Since there are one theory and one signature, we write formula for Σ[V]-formula and model for T-model or T[V]-model. A *conservative theory extension* $T^+$ of T adds to Σ special constants, called *values*, to name elements in the domain of $M_0$ as needed. Conservative means that a T-satisfiable formula is also $T^+$-satisfiable.

The *quantified SMA problem* for theory T asks whether $M_0 \models \varphi$ for an arbitrary formula ϕ and an initial assignment of values to the variables in *FV*(ϕ). Formulas have the form $\varphi = \exists\bar{x}.F[\bar{z}, \bar{x}, \bar{p}]\{p_i \gets \exists\bar{y}_i.G_i[\bar{z}, \bar{x}, \bar{y}_i]\}_{i=1}^{k}$ described in the introduction, where *FV*(ϕ) = $\bar{z}$ and quantified variables are standardized apart. If *FV*(ϕ) = ∅, we still have SMA problems when considering subformulas under an assignment to existentially quantified variables.

### 3 The **QSMA** Framework

The QSMA algorithm works with a tree representation of a formula ϕ. A node n in the tree is labeled with a pair $(\bar{x}, F)$, where $\bar{x}$ is a tuple of first-order variables, called the *local variables* of n, and F is a quantifier-free formula. The local variables are implicitly existentially quantified: they are existentially quantified variables whose quantifiers have been stripped, so that they are locally free, so to speak, and can be assigned by the algorithm. An arc from a node n to a child node b is labeled with a Boolean variable p. This Boolean variable stands as a *proxy* for the quantified subformula represented by the subtree rooted at node b. Therefore, the Boolean variable p is also considered a proxy of b itself.

A formula ϕ may have free variables *FV*(ϕ) = $\bar{z}$, whose assignment is given initially as part of the SMA problem instance. These variables are called *rigid*, because their assignments do not change during the tree traversal. As the algorithm traverses the tree, the local variables of a node n are *rigid* from the point of view of a child node b: their assignments do not change during the traversal of the subtree rooted at b. Therefore, we represent a formula ϕ as a pair formed by a tuple of rigid variables and a labeled tree. Slightly abusing the terminology, we call this pair a QSMA*-tree*. The root of a tree T is denoted *root*(T).

Definition 1 (QSMA-tree). *Given* $\varphi = \exists\bar{x}.F[\bar{z}, \bar{x}, \bar{p}]\{p_i \gets \exists\bar{y}_i.G_i[\bar{z}, \bar{x}, \bar{y}_i]\}_{i=1}^{k}$*, where FV*(ϕ) = $\bar{z}$ *and* $\varphi_i = \exists\bar{y}_i.G_i[\bar{z}, \bar{x}, \bar{y}_i]$*,* 1 ≤ i ≤ k*, the* QSMA-tree *for* ϕ *is the pair* G = ($\bar{z}$, T)*, where* $\bar{z}$ *is called the tuple of the rigid variables of* G*, and* T *is a labeled tree defined inductively as follows:*


If subformula $\varphi_i$ occurs more than once in ϕ, the same proxy variable $p_i$ is used for all occurrences. The *ancestors* of a node n in T are the nodes on the unique path from *root*(T) to n excluding n itself. If node n in T is labeled $(\bar{x}, F)$, its k outgoing arcs are labeled $p_1, \ldots, p_k$, and $\bar{x}_1, \ldots, \bar{x}_m$ are the local variables of the ancestors of n, then $\mathit{FV}(F) \subseteq \{\bar{z}, \bar{x}_1, \ldots, \bar{x}_m, \bar{x}, p_1, \ldots, p_k\}$. The set of the *assignable variables at node* n is $\mathit{Var}(n) = \bar{x} \uplus \{p_1, \ldots, p_k\}$. The set of the *rigid variables at node* n is $\mathit{Rigid}(n) = \bar{z} \uplus \bar{x}_1 \uplus \cdots \uplus \bar{x}_m$. Thus, $\mathit{FV}(F) \subseteq \mathit{Rigid}(n) \cup \mathit{Var}(n)$, $\mathit{Rigid}(\mathit{root}(T)) = \bar{z}$, and the QSMA-subtree rooted at node n is $G_n = (\mathit{Rigid}(n), T_n)$. For a node n with label $(\bar{x}, F)$, the components of the label are denoted $n.\bar{x}$ and n.F. The label of the arc from n to a child b is denoted b.p.

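*Example 1.* Given ∃x.((∀y<sub>1</sub>.F<sub>1</sub>[x, y<sub>1</sub>]) ⇒ (∀y<sub>2</sub>.F<sub>2</sub>[x, y<sub>2</sub>])) from Sect. 1.1, let ϕ = ∃x.((∃y<sub>1</sub>.¬F<sub>1</sub>[x, y<sub>1</sub>]) ∨ (¬∃y<sub>2</sub>.¬F<sub>2</sub>[x, y<sub>2</sub>])) = ∃x.(p<sub>1</sub> ∨ ¬p<sub>2</sub>){p<sub>i</sub> ← ∃y<sub>i</sub>.¬F<sub>i</sub>[x, y<sub>i</sub>]}<sub>i=1</sub><sup>2</sup>. The QSMA-tree for ϕ has root r labeled (x, p<sub>1</sub> ∨ ¬p<sub>2</sub>) with left child b<sub>1</sub> labeled (y<sub>1</sub>, ¬F<sub>1</sub>[x, y<sub>1</sub>]), right child b<sub>2</sub> labeled (y<sub>2</sub>, ¬F<sub>2</sub>[x, y<sub>2</sub>]), and arcs from r to b<sub>1</sub> and from r to b<sub>2</sub> labeled p<sub>1</sub> and p<sub>2</sub>, respectively. Note how *FV*(r.F) ⊆ {x, p<sub>1</sub>, p<sub>2</sub>}, *Var*(r) = {x, p<sub>1</sub>, p<sub>2</sub>}, and *Rigid*(r) = ∅. Also, *FV*(b<sub>1</sub>.F) ⊆ {x, y<sub>1</sub>}, *FV*(b<sub>2</sub>.F) ⊆ {x, y<sub>2</sub>}, *Var*(b<sub>1</sub>) = {y<sub>1</sub>}, *Var*(b<sub>2</sub>) = {y<sub>2</sub>}, and *Rigid*(b<sub>1</sub>) = *Rigid*(b<sub>2</sub>) = {x}.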
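The variable sets of Example 1 can be recomputed mechanically. The following is a minimal sketch, not part of the QSMA implementation: the class `Node` and the functions `var` and `rigid` are hypothetical names, and the labels n.F are kept as plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    xs: Tuple[str, ...]          # local variables of the node (n.x)
    F: str                       # quantifier-free label n.F, kept as text
    p: str = ""                  # proxy on the arc entering this node (b.p)
    children: List["Node"] = field(default_factory=list)


def var(n: Node) -> Set[str]:
    """Var(n): local variables of n plus the proxies on its outgoing arcs."""
    return set(n.xs) | {b.p for b in n.children}


def rigid(root: Node, target: Node, z: Tuple[str, ...] = ()) -> Set[str]:
    """Rigid(target): the rigid variables z plus the local variables of all ancestors."""
    def walk(n: Node, acc: Set[str]) -> Optional[Set[str]]:
        if n is target:
            return acc
        for b in n.children:
            found = walk(b, acc | set(n.xs))
            if found is not None:
                return found
        return None

    result = walk(root, set(z))
    if result is None:
        raise ValueError("target is not a node of the tree")
    return result


# The tree of Example 1: root r with children b1 and b2.
b1 = Node(("y1",), "¬F1[x, y1]", p="p1")
b2 = Node(("y2",), "¬F2[x, y2]", p="p2")
r = Node(("x",), "p1 ∨ ¬p2", children=[b1, b2])

print(sorted(var(r)), sorted(rigid(r, r)), sorted(rigid(r, b1)))
```

Since ϕ in Example 1 is closed, the tuple z is empty, which is why `rigid(r, r)` is the empty set.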

*Example 2.* Consider ∀x.((∃y<sub>1</sub>.(x ≃ 2·y<sub>1</sub>)) ⇒ (∃y<sub>2</sub>.(3·x ≃ 2·y<sub>2</sub>))). A double negation eliminates the ∀, yielding ¬(∃x.((∃y<sub>1</sub>.(x ≃ 2·y<sub>1</sub>)) ∧ (∀y<sub>2</sub>.(3·x ≄ 2·y<sub>2</sub>)))). Again, a double negation eliminates the ∀, producing ¬(∃x.((∃y<sub>1</sub>.(x ≃ 2·y<sub>1</sub>)) ∧ (¬(∃y<sub>2</sub>.(3·x ≃ 2·y<sub>2</sub>))))). Let ϕ = ∃x.((∃y<sub>1</sub>.(x ≃ 2·y<sub>1</sub>)) ∧ (¬(∃y<sub>2</sub>.(3·x ≃ 2·y<sub>2</sub>)))) = ∃x.(p<sub>1</sub> ∧ ¬p<sub>2</sub>){p<sub>1</sub> ← ∃y<sub>1</sub>.(x ≃ 2·y<sub>1</sub>), p<sub>2</sub> ← ∃y<sub>2</sub>.(3·x ≃ 2·y<sub>2</sub>)}. The original formula is true in LRA iff ϕ is false in LRA. The QSMA-tree for ϕ has root r labeled (x, p<sub>1</sub> ∧ ¬p<sub>2</sub>) with left child b<sub>1</sub> labeled (y<sub>1</sub>, x ≃ 2·y<sub>1</sub>), right child b<sub>2</sub> labeled (y<sub>2</sub>, 3·x ≃ 2·y<sub>2</sub>), and arcs from r to b<sub>1</sub> and from r to b<sub>2</sub> labeled p<sub>1</sub> and p<sub>2</sub>, respectively. The variable sets of this tree are as in Example 1.

Conversely, given a QSMA-tree G = (z̄, T), we can associate a formula n.ψ to any node n in T and hence to the QSMA-subtree G<sub>n</sub> = (*Rigid*(n), T<sub>n</sub>).

Definition 2 (Formula at a node). *Given a* QSMA*-tree* G = (¯z, T)*, for all nodes* n *of* T*, the* formula n.ψ at node n *is defined inductively as follows:*


If G = (¯z, T) is the QSMA-tree for ϕ and r = *root*(T), then r.ψ = ϕ.

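*Example 3.* For the QSMA-tree in Example 2, b<sub>1</sub>.ψ = ∃y<sub>1</sub>.(x ≃ 2·y<sub>1</sub>), b<sub>2</sub>.ψ = ∃y<sub>2</sub>.(3·x ≃ 2·y<sub>2</sub>), and r.ψ = ∃x.(p<sub>1</sub> ∧ ¬p<sub>2</sub>){p<sub>1</sub> ← ∃y<sub>1</sub>.(x ≃ 2·y<sub>1</sub>), p<sub>2</sub> ← ∃y<sub>2</sub>.(3·x ≃ 2·y<sub>2</sub>)} = ∃x.((∃y<sub>1</sub>.(x ≃ 2·y<sub>1</sub>)) ∧ ¬(∃y<sub>2</sub>.(3·x ≃ 2·y<sub>2</sub>))) = ϕ.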

Since the input formula ϕ is represented as a QSMA-tree G = (¯z, T), the problem of satisfying ϕ becomes the problem of satisfying G. Therefore, we define *satisfaction of a* QSMA*-tree* next. Slightly abusing the notation, we use |= also for satisfaction of QSMA-trees.

Definition 3 (Satisfaction of a QSMA-tree). *Given a* QSMA*-tree* G = (z̄, T) *with* r = *root*(T)*, and an extension* M *of* M<sub>0</sub> *to Rigid*(r) = z̄*,* M |= G *if there exists an extension* M′ *of* M *to Var*(r) *such that (i)* M′ |= r.F*, and (ii) for all children* b *of* r*,* M′(b.p) = true *iff* M′ |= G<sub>b</sub>*.*

The QSMA algorithm works by traversing the QSMA-tree G = (z̄, T), and at each node n in T it assigns the assignable variables in *Var*(n) = x̄ ⊎ {p<sub>1</sub>,...,p<sub>k</sub>}. This assignment corresponds to the extension M′ in Definition 3. Let b be a child of n: the Boolean variable b.p labeling the arc from n to b is a proxy for the quantified subformula b.ψ of the formula n.ψ. If M′(b.p) = true, the aim of the algorithm is to show that b.ψ is true, and if M′(b.p) = false, the aim is to show that b.ψ is false. Therefore Condition (ii) in Definition 3 says M′ |= G<sub>b</sub> if M′(b.p) = true and M′ ⊭ G<sub>b</sub> if M′(b.p) = false. The next theorem shows that satisfying a formula ϕ and satisfying the QSMA-tree for ϕ correspond.

Theorem 1. *For all formulas* ϕ *with FV* (ϕ)=¯z*, for all models* M *extending* M<sup>0</sup> *to* z¯*, if* G *is the* QSMA*-tree for* ϕ *then* M |= G *iff* M |= ϕ*.*

Checking whether M |= G by testing all possible extensions M′ would not do, because for most theories (e.g., LRA) there is an infinite number of extensions. We need a way to weed out large parts of the space of candidate models. Let ⟦ϕ⟧ denote the set of ϕ's models. We introduce *under-approximations* and *over-approximations* of ϕ in order to under-approximate and over-approximate ⟦ϕ⟧.

Definition 4 (Under- and over-approximation). *Let* ϕ *be a formula with FV* (ϕ)=¯z*. Quantifier-free formulas* U *and* O *with FV* (U) = *FV* (O)=¯z *are, respectively, an* under-approximation *and an* over-approximation *of* ϕ*, if for all extensions* M *of* M<sup>0</sup> *to* z¯*,* M |= U *implies* M |= ϕ *and* M |= ϕ *implies* M |= O*.*

It follows that ⟦U⟧ ⊆ ⟦ϕ⟧ ⊆ ⟦O⟧. Let G = (z̄, T) be the QSMA-tree for ϕ, and U and O under- and over-approximations of ϕ, respectively. Then, M |= U implies M |= ϕ, which implies M |= G by Theorem 1. Thus, satisfying an under-approximation is a sufficient condition to have a solution. On the other hand, M |= ¬O implies M ⊭ ϕ, which implies M ⊭ G by Theorem 1. By the contrapositive, if M |= G then M ⊭ ¬O, that is, M |= O. Thus, satisfying an over-approximation is a necessary condition to have a solution. In order to construct such approximations, we assume a solver for theory T (and model M<sub>0</sub>) offering:
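The two implications of Definition 4 can be checked by brute force on a toy instance. The sketch below assumes a finite integer domain in place of a theory such as LRA; the formula `phi` and the candidate approximations `U` and `O` are made up for illustration and do not come from the paper.

```python
DOMAIN = range(-10, 11)  # assumed finite stand-in for the universe of the theory


def phi(z: int) -> bool:
    """phi[z] = ∃x. (z < x ∧ x < 5), with x ranging over DOMAIN."""
    return any(z < x < 5 for x in DOMAIN)


def U(z: int) -> bool:
    """Candidate under-approximation: z < 0."""
    return z < 0


def O(z: int) -> bool:
    """Candidate over-approximation: z < 8."""
    return z < 8


def is_under(u, f) -> bool:
    """M |= u implies M |= f, for every assignment to z in DOMAIN."""
    return all(f(z) for z in DOMAIN if u(z))


def is_over(o, f) -> bool:
    """M |= f implies M |= o, for every assignment to z in DOMAIN."""
    return all(o(z) for z in DOMAIN if f(z))


print(is_under(U, phi), is_over(O, phi))
```

Swapping the roles fails as expected: `O` is not an under-approximation (z = 5 satisfies `O` but not `phi`), and `U` is not an over-approximation (z = 2 satisfies `phi` but not `U`).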


MBU and MBO produce, respectively, an under-approximation and an over-approximation. Formula U[z̄] is true in model M and implies ∃x̄.F[z̄, x̄], and hence can be seen as an *interpolant between model and formula*. It was called *model generalization* [12,17], because U[z̄] may have other models in addition to M. Formula O[z̄] follows from ∃x̄.F[z̄, x̄] and is false in M, and hence can be seen as a *reverse interpolant between formula and model*, called *model interpolant* [17].

### 4 The **QSMA** Algorithm and Its Total Correctness

Let G = (z̄, T) be the QSMA-tree for input formula ϕ with *FV*(ϕ) = z̄. Given a model M extending M<sub>0</sub> to z̄, the QSMA algorithm determines whether M |= G. Suppose that U and O are under- and over-approximations of ϕ, respectively. Picture ⟦U⟧, ⟦ϕ⟧, and ⟦O⟧ as bubbles. The ⟦U⟧ bubble is inside the ⟦ϕ⟧ bubble, which is inside the ⟦O⟧ bubble. The idea of the algorithm is to zoom in on a model of ϕ, by progressively weakening U, so that the ⟦U⟧ bubble inflates, and progressively strengthening O, so that the ⟦O⟧ bubble deflates. The algorithm operates in this manner for all subformulas of ϕ: for all nodes n of T it maintains under- and over-approximations n.U and n.O of n.ψ, progressively weakening n.U and strengthening n.O. The weakening of n.U is done by introducing a disjunction with an MBU. The strengthening of n.O is done by introducing a conjunction with an MBO. The goal is that M satisfies n.U ∨ ¬n.O. As soon as M satisfies n.U, we know that M |= G<sub>n</sub>. As soon as M satisfies ¬n.O, we know that M ⊭ G<sub>n</sub>.


Fig. 1. Pseudocode of the main function of the QSMA algorithm

The main function QSMA (Fig. 1) initializes n.U to ⊥ (under-approximation of all formulas and identity for disjunction) and n.O to ⊤ (over-approximation of all formulas and identity for conjunction) for all nodes n of T. Then QSMA calls the function subtreeIsSolved (Fig. 2) with arguments *root*(T) and M.

Function subtreeIsSolved takes a node n and a model M extending M<sub>0</sub> to *Rigid*(n) and determines whether M |= G<sub>n</sub>. If M |= n.U it returns *true*; if M |= ¬n.O it returns *false* (lines 3–5 in Fig. 2). Otherwise (i.e., M |= ¬n.U ∧ n.O), it enters a loop whose body contains the following steps:


<sup>1</sup> See https://mariapaola.github.io/CDSATandQSMA.html for a copy of this paper with the proofs inserted.

Fig. 2. Pseudocode of the auxiliary functions of the QSMA algorithm


by MBU's specification we know that M |= MBU(L′, *FV*(L′) \ *Rigid*(n), M). This update ensures that M |= n.U. Then subtreeIsSolved returns *true* (line 16).

7. If solutionForallChildren returns *false*, the control returns to line 7. Suppose that solutionForallChildren returned *false*, because it found a child b of n such that M(b.p) = true and subtreeIsSolved(b,M) returned *false*. Then the call subtreeIsSolved(b,M) updated the formula b.O (line 10). Suppose that solutionForallChildren returned *false*, because it found a child b of n such that M(b.p) = false and subtreeIsSolved(b,M) returned *true*. Then the call subtreeIsSolved(b,M) updated the formula b.U (line 15). Either way the state has changed, variable L gets a new formula on line 7, and the subsequent call to SMA will not produce the same model.

*Example 4.* Apply subtreeIsSolved to the root of the QSMA-tree in Example 1. Formula L gets p<sub>1</sub> ∨ ¬p<sub>2</sub>. SMA produces an M′ that assigns values to x, p<sub>1</sub>, and p<sub>2</sub>. Suppose that M′ satisfies p<sub>1</sub> ∨ ¬p<sub>2</sub> by assigning true to p<sub>1</sub>. In the recursive call on b<sub>1</sub>, formula L gets ¬F<sub>1</sub>[x, y<sub>1</sub>]. If SMA produces an M″ that extends M′ with an assignment to y<sub>1</sub> such that M″ |= ¬F<sub>1</sub>[x, y<sub>1</sub>], we have a model. Suppose that M′ satisfies p<sub>1</sub> ∨ ¬p<sub>2</sub> by assigning false to p<sub>2</sub>. In the recursive call on b<sub>2</sub>, formula L gets ¬F<sub>2</sub>[x, y<sub>2</sub>]. If SMA fails to produce an M″ that extends M′ with an assignment to y<sub>2</sub> such that M″ |= ¬F<sub>2</sub>[x, y<sub>2</sub>], we have a model.
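The refinement loop can be mimicked on a toy scale. The sketch below is not the pseudocode of Fig. 2. It makes several assumptions: a finite domain (`DOMAIN`) so that SMA can be replaced by brute-force enumeration, n.U and n.O stored as sets of rigid-variable assignments (a point-wise form of generalization-by-substitution), and hypothetical names (`Node`, `candidates`, `subtree_is_solved`) throughout.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Dict, List, Tuple

DOMAIN = range(0, 7)  # assumed finite universe; a real theory such as LRA is infinite
Model = Dict[str, object]


@dataclass
class Node:
    xs: Tuple[str, ...]                       # local variables n.x
    F: Callable[[Model], bool]                # quantifier-free label n.F
    p: str = ""                               # proxy on the incoming arc (b.p)
    children: List["Node"] = field(default_factory=list)
    rigid: Tuple[str, ...] = ()               # Rigid(n), filled in by set_rigid
    U: set = field(default_factory=set)       # rigid points known to satisfy n.psi
    not_O: set = field(default_factory=set)   # rigid points known to falsify n.psi


def set_rigid(n: Node, rigid: Tuple[str, ...] = ()) -> None:
    n.rigid = tuple(rigid)
    for b in n.children:
        set_rigid(b, n.rigid + n.xs)


def key(n: Node, M: Model) -> tuple:
    """Projection of M onto Rigid(n)."""
    return tuple(M[v] for v in n.rigid)


def candidates(n: Node, M: Model):
    """Brute-force stand-in for SMA: extensions of M to Var(n) that satisfy n.F
    and are not yet excluded by the children's approximations."""
    proxies = [b.p for b in n.children]
    for vals in product(DOMAIN, repeat=len(n.xs)):
        for bools in product([True, False], repeat=len(proxies)):
            M1 = {**M, **dict(zip(n.xs, vals)), **dict(zip(proxies, bools))}
            if not n.F(M1):
                continue
            # proxy true: the child's O must hold; proxy false: the child's U must fail
            if all(key(b, M1) not in (b.not_O if M1[b.p] else b.U)
                   for b in n.children):
                yield M1


def subtree_is_solved(n: Node, M: Model) -> bool:
    if key(n, M) in n.U:
        return True
    if key(n, M) in n.not_O:
        return False
    while True:
        M1 = next(candidates(n, M), None)
        if M1 is None:                 # no candidate left: strengthen n.O
            n.not_O.add(key(n, M))
            return False
        if all(subtree_is_solved(b, M1) == M1[b.p] for b in n.children):
            n.U.add(key(n, M))         # every proxy value is justified: weaken n.U
            return True
        # otherwise some child updated its U or O, which excludes M1 next time


# Tree shaped like Example 2, with = over the finite domain instead of LRA.
b1 = Node(("y1",), lambda M: M["x"] == 2 * M["y1"], p="p1")
b2 = Node(("y2",), lambda M: 3 * M["x"] == 2 * M["y2"], p="p2")
root = Node(("x",), lambda M: M["p1"] and not M["p2"], children=[b1, b2])
set_rigid(root)
answer = subtree_is_solved(root, {})

# ∃x. ¬(∃y. y = x), which is false over the domain.
c = Node(("y",), lambda M: M["y"] == M["x"], p="p")
root2 = Node(("x",), lambda M: not M["p"], children=[c])
set_rigid(root2)
answer2 = subtree_is_solved(root2, {})
print(answer, answer2)
```

Note that the first formula comes out true here, unlike in LRA: over the domain 0..6 the value x = 6 has a witness y1 = 3 but no witness for y2, since 9 lies outside the domain. The sketch only illustrates the control flow, in which each disagreement between a proxy value and a recursive call updates a child's approximation and thereby excludes the failed candidate.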

Theorem 2. *The function* subtreeIsSolved *is partially correct: if the preconditions hold and the function halts, then the postconditions hold.*

For termination, we begin with the MBU and MBO functions. Let T be LRA with a theory extension LRA<sup>+</sup> that adds constant symbols q̃ for all rational numbers q. Consider an MBU function such that MBU(F[z̄, x], x, M) = F[z̄, x]{x←q̃} and M |= F[z̄, q̃]. This kind of MBU is called *generalization-by-substitution* [12]. While F[z̄, q̃] is an under-approximation of ∃x.F[z̄, x], this MBU is not a good choice for termination. By applying MBU repeatedly with an infinite enumeration of rational constants, the QSMA algorithm could build an infinite sequence of under-approximations (⋁<sub>i=1</sub><sup>n</sup> F[z̄, x]{x←q̃<sub>i</sub>})<sub>n∈ℕ</sub> none of which is LRA-equivalent to ∃x.F[z̄, x]. The next definition excludes such MBU functions, by requiring that for a given formula and variable tuple (that depends on the formula), MBU can generate only finitely many formulas.
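A concrete instance makes the problem visible. The sketch below picks the formula F[z, x] = (z < x) as an assumed example (it is not taken from the paper): ∃x.F[z, x] is true for every rational z, yet any finite disjunction of substitution instances z < q<sub>i</sub> is falsified by z = max q<sub>i</sub>.

```python
from fractions import Fraction


def F(z: Fraction, x: Fraction) -> bool:
    """F[z, x] = (z < x); the formula ∃x.F[z, x] holds for every rational z."""
    return z < x


def under_approx(qs):
    """Disjunction of the instances F[z, x]{x <- q} for the constants chosen so far."""
    return lambda z: any(F(z, q) for q in qs)


def counterexample(qs) -> Fraction:
    """A value of z that satisfies ∃x.F[z, x] but falsifies the disjunction."""
    return max(qs)


qs = [Fraction(i) for i in range(1, 6)]   # constants q_1, ..., q_5
U = under_approx(qs)
z = counterexample(qs)
print(U(z), F(z, z + 1))
```

However many constants are added to `qs`, `counterexample` still returns a point outside the under-approximation, so the sequence never reaches a formula equivalent to ∃x.F[z, x].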

Definition 5 (Finite basis). *An* MBU *function has* finite basis *if the set* {MBU(F[z̄, x̄], x̄, M) | M : *extension of* M<sub>0</sub> *to* z̄ *such that* M |= ∃x̄.F[z̄, x̄]} *is finite for all quantifier-free formulas* F[z̄, x̄] *and tuples* x̄*.*

The notion of an MBO function having a finite basis is defined in the same way with ⊭ in place of |=.

Lemma 1. *If* MBU *and* MBO *have finite basis, for all (possibly infinite) series of calls* {subtreeIsSolved(n,M*i*)}*i, all satisfying the preconditions and all terminating, formulas* n.U *and* n.O *are updated only a finite number of times.*

Once nontermination due to MBU or MBO is excluded even for an infinite series of halting calls, termination is proved by induction on the QSMA-tree.

Theorem 3. *If the* MBU *and* MBO *functions have finite basis, whenever the preconditions are satisfied the function* subtreeIsSolved *halts.*

*Example 5.* Apply subtreeIsSolved to the root of the QSMA-tree in Example 2. Formula L gets p<sub>1</sub> ∧ ¬p<sub>2</sub>. SMA produces an M′ that assigns values to x, p<sub>1</sub>, and p<sub>2</sub>. Suppose that M′ assigns 1 to x, while it must assign true to p<sub>1</sub> and false to p<sub>2</sub>. In the recursive call on b<sub>1</sub>, formula L gets x ≃ 2·y<sub>1</sub>. If SMA produces an M″ that extends M′ with y<sub>1</sub> ← 1/2, we have a model of G<sub>b<sub>1</sub></sub>. In the recursive call on b<sub>2</sub>, formula L gets 3·x ≃ 2·y<sub>2</sub>. If SMA produces an M″ that extends M′ with y<sub>2</sub> ← 3/2, we have a model of G<sub>b<sub>2</sub></sub>, but because M′(p<sub>2</sub>) = false, there is no model of G. Indeed, formula ϕ of Example 2 is false as the original formula is true.

### 5 The **OptiQSMA** Algorithm and Its Total Correctness

YicesQS implements an optimized variant of QSMA, called OptiQSMA, that reduces the number of recursive calls to subtreeIsSolved by entrusting more work to each call to SMA. Reconsider the behavior of QSMA in Example 4. We can avoid a recursive call to subtreeIsSolved by asking SMA to satisfy (p<sub>1</sub> ∨ ¬p<sub>2</sub>) ∧ (p<sub>1</sub> ⇒ ¬F<sub>1</sub>[x, y<sub>1</sub>]) in lieu of p<sub>1</sub> ∨ ¬p<sub>2</sub>. This way, if the candidate model returned by SMA assigns true to p<sub>1</sub>, it also assigns to x and y<sub>1</sub> values that satisfy ¬F<sub>1</sub>[x, y<sub>1</sub>]. This means that ∃y<sub>1</sub>.¬F<sub>1</sub>[x, y<sub>1</sub>] is found true without recursion. On the other hand, if p<sub>2</sub> is assigned false, the algorithm still has to make the recursive call to see if it can satisfy ∃y<sub>2</sub>.¬F<sub>2</sub>[x, y<sub>2</sub>].

The idea of OptiQSMA is to do a look-ahead on a path in the QSMA-tree, doing the work in one shot rather than through recursive calls on all the nodes in the path. The look-ahead applies to a path such that the Boolean labels of all the arcs in the path are assigned true by the candidate model. The following definition builds a formula to allow the look-ahead.

Definition 6 (Look-ahead formula). *Given a* QSMA*-tree* G = (z̄, T)*, for all nodes* n *of* T *the* look-ahead formula *of* n *is LF*(n) = n.F ∧ ⋀<sub>n→b</sub> (b.p ⇒ *LF*(b))*, where the conjunction ranges over all arcs* n → b *from* n *to a child* b*.*
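The recursion in Definition 6 is easy to spell out. The following is a small sketch that builds the look-ahead formula as text for the tree of Example 1; `Node` and `lf` are hypothetical names, and labels are plain strings rather than solver terms.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    F: str                      # quantifier-free label n.F, kept as text
    p: str = ""                 # proxy on the arc entering this node (b.p)
    children: List["Node"] = field(default_factory=list)


def lf(n: Node) -> str:
    """LF(n) = n.F ∧ (b.p ⇒ LF(b)) for every child b of n."""
    if not n.children:
        return n.F
    parts = [f"({n.F})"]
    for b in n.children:
        sub = lf(b)
        if b.children:          # parenthesize nested conjunctions
            sub = f"({sub})"
        parts.append(f"({b.p} ⇒ {sub})")
    return " ∧ ".join(parts)


# The tree of Example 1.
b1 = Node("¬F1[x, y1]", p="p1")
b2 = Node("¬F2[x, y2]", p="p2")
r = Node("p1 ∨ ¬p2", children=[b1, b2])
print(lf(r))
```

For this tree the result is the label of the root conjoined with one implication per child, so a candidate model that sets a proxy to true must already satisfy the label of the corresponding child.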

The next definition distinguishes the nodes that are handled together in one shot without recursion and those where recursion is still needed. Nodes of the first kind are called *no alternation nodes*, because such nodes are on a path as described above, where all Boolean labels are assigned true and hence there is no alternation between true and false. Nodes of the second kind are called *first alternation nodes*, because they are the nodes reached by the first arc whose Boolean label is assigned false.

Fig. 3. Pseudocode of the main function of the OptiQSMA algorithm

Definition 7 (No alternation nodes and first alternation nodes). *Given a* QSMA*-tree* G = (z̄, T)*, for all nodes* n *of* T *and extensions* M *of* M<sub>0</sub> *to FV*(*LF*(n))*, the set* NAN(n,M) *of the* no-alternation nodes from n according to M *(resp. the set* FAN(n,M) *of the* first-alternation nodes from n according to M*) contains all and only the nodes* b *such that: (i)* b *is a descendant of* n *through a path* n → n<sub>1</sub> → ... → n<sub>q</sub> → b *(*q ≥ 0*), (ii)* ∀i*,* 1 ≤ i ≤ q*,* M(n<sub>i</sub>.p) = true*, and (iii)* M(b.p) = true *(resp.* M(b.p) = false*).*

A node b ∈ FAN(n,M) such that q = 0 in Condition (i) of Definition 7 is a child of n: for a child there is no optimization. The OptiQSMA algorithm seeks a candidate model M that satisfies *LF*(n) and recurses only on the nodes in FAN(n,M). Therefore, the definition of *satisfaction with look-ahead*, denoted |=*la*, follows the pattern of Definition 3, replacing r.F with *LF*(r) and Condition (ii) of Definition 3 with a condition for the nodes in the FAN set.
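Definition 7 amounts to a single traversal that descends only through arcs whose proxy is true. The sketch below computes both sets for a small made-up tree; `Node`, `nan_fan`, and the proxy names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    p: str = ""                 # proxy on the arc entering this node (b.p)
    children: List["Node"] = field(default_factory=list)


def nan_fan(n: Node, M: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Return (NAN(n, M), FAN(n, M)) as lists of node names."""
    nan: List[str] = []
    fan: List[str] = []
    for b in n.children:
        if M[b.p]:
            nan.append(b.name)                 # path of true proxies continues
            sub_nan, sub_fan = nan_fan(b, M)
            nan += sub_nan
            fan += sub_fan
        else:
            fan.append(b.name)                 # first false proxy: stop descending
    return nan, fan


# r has children a and b; a has children c and d; b has child e.
e = Node("e", "pe")
c, d = Node("c", "pc"), Node("d", "pd")
a = Node("a", "pa", [c, d])
b = Node("b", "pb", [e])
r = Node("r", children=[a, b])

M = {"pa": True, "pc": False, "pd": True, "pb": False, "pe": True}
print(nan_fan(r, M))
```

Node e belongs to neither set: its own proxy is true, but the arc into its parent b is assigned false, so the path from r is cut at b.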

Definition 8 (Satisfaction with look-ahead). *Given a* QSMA*-tree* G = (z̄, T) *with* r = *root*(T) *and an extension* M *of* M<sub>0</sub> *to Rigid*(r) = z̄*,* M |=<sub>la</sub> G *if there exists an extension* M′ *of* M *to FV*(*LF*(r)) *such that (i)* M′ |= *LF*(r) *and (ii) for all nodes* b ∈ FAN(r,M′)*,* M′ ⊭<sub>la</sub> G<sub>b</sub>*.*

Since M′(b.p) = false for the nodes b ∈ FAN(r,M′), the |=<sub>la</sub> relation is negated in Condition (ii). The next theorem shows that the optimization does not change the problem.

Theorem 4. *Given a* QSMA*-tree* G = (¯z, T) *and an extension* M *of* M<sup>0</sup> *to* z¯*,* M |= G *if and only if* M |=*la* G*.*

Fig. 4. Pseudocode of the auxiliary functions of the OptiQSMA algorithm

The OptiQSMA algorithm maintains under-approximations n.U of n.ψ for all nodes n, but not over-approximations. Accordingly, the main function OptiQSMA (Fig. 3) initializes only n.U for all nodes n, and then calls optiSubtreeIsSolved (Fig. 4). This function returns SAT(U) if M |=<sub>la</sub> G and UNSAT(O) if M ⊭<sub>la</sub> G. The formula U is an under-approximation of r.ψ (r = *root*(T)) such that M |= U. The formula O is an over-approximation of r.ψ such that M ⊭ O. The main function OptiQSMA has no use for U and O and merely returns *true* or *false* accordingly. Function optiSubtreeIsSolved builds and returns under-approximations and over-approximations recursively. The reason for saving only under-approximations is practical, and will become clear after the illustration of optiSubtreeIsSolved. This function takes a node n and a model M extending M<sub>0</sub> to *Rigid*(n) and determines whether M |=<sub>la</sub> G<sub>n</sub>, by executing a loop whose body contains the following steps:


are not kept. Otherwise, there is potential for satisfaction with look-ahead. Function optiSubtreeIsSolved initializes the formula reasons to ⊤ and invokes solutionForallDescendants passing reasons by reference.

4. Function solutionForallDescendants considers first all descendants b in FAN(n,M), and calls optiSubtreeIsSolved(b,M) for each of them. If this call returns SAT(U), it means that M |=<sub>la</sub> G<sub>b</sub>; solutionForallDescendants weakens b.U by disjunction with U and returns *false*.

If optiSubtreeIsSolved(b,M) returns UNSAT(O), it means that M ⊭<sub>la</sub> G<sub>b</sub>, and we move on to the next descendant in FAN(n,M). Prior to that, reasons is strengthened by conjunction with ¬b.p ⇒ ¬O. For all descendants b in NAN(n,M), solutionForallDescendants strengthens reasons by conjunction with b.p.

5. If solutionForallDescendants returns *true*, optiSubtreeIsSolved builds formula L′ as *LF*(n) ∧ reasons, and returns SAT(U), where U is the outcome of the application of MBU to L′ and M. Otherwise, the control returns to line 3. Since solutionForallDescendants returned *false*, it means that it found a node b in FAN(n,M) for which optiSubtreeIsSolved(b,M) returned SAT(U) and the formula b.U was updated (line 17). Therefore the state has changed, variable L gets a new formula on line 3, and the subsequent call to SMA will not produce the same model.

In the experiments it turned out that storing over-approximations for all nodes is less efficient than using them to compute L′ and then forgetting them. Thus, the over-approximation O encapsulated in the UNSAT(O) value returned by a recursive call to optiSubtreeIsSolved is used to build the temporary formula reasons, but it is not saved, and reasons is used to compute L′.

Theorem 5. *The function* optiSubtreeIsSolved *is partially correct: if the preconditions hold and the function halts, then the postconditions hold.*

The proof of partial correctness of optiSubtreeIsSolved shows that every model that satisfies L′ = (*LF*(n) ∧ reasons) fulfills Definition 8. In this sense, reasons is an explanation of why a model is found with look-ahead.

Theorem 6. *If the* MBU *and* MBO *functions have finite basis, whenever the preconditions are satisfied the function* optiSubtreeIsSolved *halts.*

### 6 The YicesQS Solver and Experimental Results

The OptiQSMA algorithm is implemented in YicesQS to equip Yices 2 with support for quantifiers for complete theories (unrelated to Yices 2 support for quantifiers in UF).<sup>2</sup> MBO is available as model interpolation from Yices's MCSAT [10] solver for quantifier-free formulas, including theory-specific techniques for bitvectors (BV) [15] and arithmetic. The latter are based on NLSAT [16] and ultimately on Cylindrical Algebraic Decomposition (CAD). Basic MBU is done as generalization-by-substitution [12] and improved with *model-based projection* (e.g., [18]) for arithmetic, and *invertibility conditions* [21], including ε-terms, for BV. In YicesQS model-based projection also is based on CAD.

<sup>2</sup> See https://github.com/disteph/yicesQS and https://yices.csl.sri.com/.

Fig. 5. Plot for BV.

In the 2022 SMT competition, YicesQS entered the single-query, non-incremental tracks of BV, LRA, LIA, NRA, and NIA (nonlinear integer arithmetic). The experiments were run on the StarExec cluster with a 20 min timeout per benchmark and 60 GB of memory. The benchmarks were a subset of the SMT-LIB collection. The results presented below were computed by running the competition script join.sh on the raw data from StarExec,<sup>3</sup> sorting the data, and producing the plots that are available online.<sup>4</sup> A description of the participating solvers can be found on the competition website.<sup>5</sup>

Figure 5 shows the results for BV, where YicesQS quickly solved a high number of benchmarks (compared for example with CVC5), but was not outstanding, possibly because YicesQS 2022 makes limited use of invertibility conditions for model interpolation. Figure 6 shows the results for the four arithmetics. The columns on the left list the number of solved instances and the time to solve them for each logic and solver. In the plot on the right, each color corresponds to a solver and a point (x, y) of that color means that the x-th fastest-solved benchmark was solved by that solver in time y (log scale). 2021 Z3 is included because in some of these logics it performed slightly better than 2022 Z3. The logic where YicesQS performed best is LRA: it was the only solver to solve all 1,003 benchmarks. Z3 2021 was second best, solving 948 benchmarks with a total runtime about 100 times higher. YicesQS has neither a special treatment (e.g., simplex-based) of linear problems, nor integer-specific techniques: it relies on CAD-based techniques for MBU and MBO also for integer problems. Thus, it is somewhat average on LIA and NIA. NRA and NIA are undecidable (NRA due to division by 0) and hence they lie outside of the theoretical framework of QSMA. YicesQS answers should still be correct, but termination can be lost. With Z3 being a non-competing participant in the SMT 2022 competition, YicesQS came second for *Largest Contribution* (single queries), because of its overall performance in the four arithmetics, where it also came first for satisfiable instances and in the 24 sec timeout setup (instead of 20 min).

<sup>3</sup> https://github.com/SMT-COMP/smt-comp/tree/master/2022/results.

<sup>4</sup> http://www.csl.sri.com/users/sgl/Work/Cade2023-data/index.html.

<sup>5</sup> https://smt-comp.github.io/2022/participants.html.

Fig. 6. Plots for the four arithmetics.

### 7 Discussion: Related Work and Future Work

Quantified SMT was approached by a procedure with an ∃-solver and a ∀-solver for prenex normal form formulas with ∃∀ prefix [12]. A formulation as a game between an ∃-player and a ∀-player appeared with the *QSAT algorithm* [3] for prenex normal form formulas with (∃∀)<sup>+</sup> prefix. QSMA accepts arbitrary formulas with quantifiers in arbitrary positions.

Both QSAT and QSMA work for a generic theory T over basic T-specific components. QSAT uses *model-based projection* [3,18] and a solver for quantifier-free satisfiability that supports UNSAT cores. Model-based projection is an instance of MBU. An UNSAT core (as a conjunction) is an MBO in the special case where the input assignment is Boolean. While MBO can produce UNSAT cores, MBO generalizes the concept of UNSAT core with theory-specific reasoning when there are *non-Boolean input assignments*, as is the case in QSMA. It is unclear whether the combination of UNSAT cores and theory-specific MBU can emulate MBO or provide the same benefits. QSAT is implemented in Z3 and it is the default solver for LIA, LRA, and NRA.

YicesQS is a recent implementation that only participated in the SMT competition in 2021 and 2022. Directions for further development include augmenting integer reasoning, and improving model interpolation in BV by a better usage of invertibility conditions. Another lead for future work is to compose QSMA within the *CDSAT framework for conflict-driven reasoning in unions of theories* [4–6]. For this, one may need to drop the assumption that there is a unique model M<sup>0</sup> and only its extensions need to be considered, which will be a generalization also in the single theory case. As most known MBU and MBO functions are for single theories, one may have to study how to get MBU and MBO functions for a union of theories from such functions for the component theories. Another issue is the interplay between QSMA's recursive descent over the QSMA-tree for the formula and CDSAT's conflict-driven search.

Acknowledgements. Part of this work was done while the first and third authors were visiting SRI International, whose support is much appreciated. This material is based upon work supported by NSF with awards CCF-1816936 and CCF-1817204. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the US Government or NSF.

### References

1. Althaus, E., Kruglov, E., Weidenbach, C.: Superposition modulo linear arithmetic SUP(LA). In: Ghilardi, S., Sebastiani, R. (eds.) FroCoS 2009. LNCS (LNAI), vol. 5749, pp. 84–99. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04222-5_5



# **Uniform Substitution for Dynamic Logic with Communicating Hybrid Programs**

Marvin Brieger<sup>1(B)</sup>, Stefan Mitsch<sup>2</sup>, and André Platzer<sup>2,3</sup>

<sup>1</sup> LMU Munich, Munich, Germany
marvin.brieger@sosy.ifi.lmu.de
<sup>2</sup> Carnegie Mellon University, Pittsburgh, USA
smitsch@cs.cmu.edu, platzer@kit.edu
<sup>3</sup> Karlsruhe Institute of Technology, Karlsruhe, Germany

**Abstract.** This paper introduces a uniform substitution calculus for dLCHP, the dynamic logic of communicating hybrid programs. Uniform substitution enables parsimonious prover kernels by using axioms instead of axiom schemata. Instantiations can be recovered from a single proof rule responsible for soundness-critical instantiation checks rather than being spread across axiom schemata in side conditions. Even though communication and parallelism reasoning are notorious for necessitating subtle soundness-critical side conditions, uniform substitution when generalized to dLCHP manages to limit and isolate their conceptual overhead. Since uniform substitution has proven to simplify the implementation of hybrid systems provers substantially, uniform substitution for dLCHP paves the way for a parsimonious implementation of theorem provers for hybrid systems with communication and parallelism.

**Keywords:** Uniform substitution · Parallel programs · Differential dynamic logic · Assumption-commitment reasoning · CSP

### **1 Introduction**

Hybrid systems and parallel systems are notoriously subtle to analyze. Combining both not only culminates these subtleties but is further complicated because parallel hybrid systems are interlocked by synchronization in a shared global time. The *dynamic logic of communicating hybrid programs* dLCHP [6] tames the complexity of parallel hybrid systems providing a compositional proof calculus that disentangles reasoning into purely discrete, continuous, and communication pieces. However, the calculus is subject to schematic side conditions whose implementation is generally error-prone causing large soundness-critical code bases [30]. In particular, compositional reasoning about parallelism as in the idealized proof rule in Fig. 1 holds the challenge to exhaustively characterize *all* side conditions required to make *all* instances of this proof rule sound. Proof systems for discrete parallelism [1,19,27,35,44,46] already have complicated side conditions, but complexity only increases with continuous interactions in shared global time.

$$\frac{[\alpha]\varphi \qquad [\beta]\psi}{[\alpha \parallel \beta](\varphi \wedge \psi)} \quad (\ast\ast\ast)$$

**Fig. 1.** The proof rule is only sound under subtle side conditions (∗∗∗).

In order to support compositional reasoning for parallel hybrid systems, this paper generalizes Church's uniform substitution [8] and develops a uniform substitution calculus [30–32] for dLCHP. Uniform substitution modularizes the calculus itself enabling its parsimonious implementation. Although applicable to discrete parallelism, the dLCHP development resolves the inherent challenge that parallel hybrid systems always synchronize in time.

Uniform substitution adopts a finite list of concrete formulas as axioms instead of an infinite set of formulas via axiom schemata with side conditions. This enables theorem provers without the extensive algorithmic checks otherwise required for each schema to sort out unsound instances. Thanks to the proof rule US for uniform substitution, only sound instances derive from the axioms such that the parallel composition rule in dLCHP could be adopted almost literally as above, but with all the soundness-critical checking encapsulated solely in rule US. Thanks to US's checking, parallel systems reasoning even reduces to a single parallel injection axiom [α]ψ → [α ∥ β]ψ that merely describes the preservation of property ψ of one parallel component α in the parallel system α ∥ β. Proofs about α ∥ β reduce to a sequence of property embeddings with this axiom from local abstractions of the subcomponents, which combine soundly due to US.

Soundness checks in uniform substitution are ultimately determined by the binding structures as identified in the static semantics. The development of uniform substitution for dLCHP is, therefore, grounded in the following key observation: Communication and parallelism both cause additional binding structure that needs attention in the substitution process performed by rule US:

(B I) Expressions depend on communication along (co)finite channel sets (besides finitely many free variables), which, by the core substitution principle [8], must not be introduced free into contexts where they are written.

(B II) Subprograms in a parallel context need to be restricted in the variables and channels written as compositional proof rules for parallelism require local abstractions of subprograms not depending on the internals of the context [35].

Grounded in the need for abstraction (B II), [α]ψ → [α ∥ β]ψ can only be adopted as a sound axiom schema if α and β do not share state, and if program β does not interfere with the contract ψ, i.e., (i) ψ has no free variables bound by β (with exceptions), and (ii) ψ does not depend on communication channels written by β (except for channels joint with α). This extensive side condition would need nontrivial soundness-critical implementations of dLCHP axiom schemata. Still, uniform substitution can be lifted with only small changes locally checking for clashes with written channels, and prohibited variables or channels.
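To give a feeling for what a schematic implementation would have to check, the following sketch encodes the side condition as set operations. It is an illustration only, not the dLCHP calculus: the function name and all parameters are hypothetical, "shared state" is approximated by overlapping bound variables, and the unspecified exceptions are passed in explicitly as `exempt_vars`.

```python
def parallel_injection_ok(psi_free_vars, psi_channels,
                          alpha_bound_vars, alpha_channels,
                          beta_bound_vars, beta_channels,
                          exempt_vars=frozenset()):
    """Conservative check of the side condition for [α]ψ → [α ∥ β]ψ.

    All arguments are sets of names. `exempt_vars` stands for the
    exceptions that the text leaves unspecified (an assumption here).
    """
    # α and β must not share state
    shared_state = (alpha_bound_vars & beta_bound_vars) - exempt_vars
    # (i) ψ has no free variables bound by β, up to the exceptions
    var_clash = (psi_free_vars & beta_bound_vars) - exempt_vars
    # (ii) ψ does not depend on channels written by β, except joint ones
    channel_clash = (psi_channels & beta_channels) - alpha_channels
    return not (shared_state or var_clash or channel_clash)


# Hypothetical footprints; treating "mu" as exempt is an assumption of this example.
ok = parallel_injection_ok(
    psi_free_vars={"x"}, psi_channels={"ch"},
    alpha_bound_vars={"x", "mu"}, alpha_channels={"ch"},
    beta_bound_vars={"y", "mu"}, beta_channels={"ch", "dh"},
    exempt_vars={"mu"},
)
clash = parallel_injection_ok(
    psi_free_vars={"y"}, psi_channels={"ch"},
    alpha_bound_vars={"x", "mu"}, alpha_channels={"ch"},
    beta_bound_vars={"y", "mu"}, beta_channels={"ch", "dh"},
    exempt_vars={"mu"},
)
print(ok, clash)
```

With uniform substitution, a check of this kind is not attached to the axiom itself but is performed once, inside rule US, when the axiom is instantiated.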

The modularity of uniform substitution is the key to the parsimonious implementation [23] of the theorem prover KeYmaera X [11] for differential dynamic logic dL and differential game logic dGL [29], thus paving the way for a straightforward theorem prover implementation of dLCHP. Since dLCHP conservatively generalizes dL [6], its uniform substitution calculus inherits the complete [33] axiomatic treatment of differential equation invariants [30]. All proofs are in [7].

### **2 Dynamic Logic of Communicating Hybrid Programs**

This section briefly recaps dLCHP [6], the dynamic logic of communicating hybrid programs (CHPs). It combines hybrid programs [28] with CSP-style communication and parallelism [15]. By assumption-commitment (ac) reasoning [22,46,47], dLCHP allows compositional verification of parallelism in dL. For uniform substitution, function and predicate symbols, and program constants are added.

#### **2.1 Syntax**

The set of variables V = V<sub>ℝ</sub> ∪ V<sub>ℕ</sub> ∪ V<sub>𝒯</sub> has real (V<sub>ℝ</sub>), integer (V<sub>ℕ</sub>), and trace (V<sub>𝒯</sub>) variables. For each x ∈ V<sub>ℝ</sub>, the differential symbol x′ is in V<sub>ℝ</sub>, too. The designated variable μ ∈ V<sub>ℝ</sub> represents the shared global time. The set of channel names is Ω. By convention, x, y ∈ V<sub>ℝ</sub>, n ∈ V<sub>ℕ</sub>, h ∈ V<sub>𝒯</sub>, ch ∈ Ω, and z ∈ V. A channel set Y ⊆ Ω is (co)finite. Vectorial expressions are denoted ē. Moreover, f<sup>𝕄</sup>, g<sup>𝕄</sup> are 𝕄-valued function symbols and p, q, r are predicate symbols, where argument sorts are annotated by : 𝕄<sub>1</sub>, ..., 𝕄<sub>k</sub>. Finally, a, b are program constants.

**Definition 1 (Terms).** *Terms consist of real* (Trm<sub>ℝ</sub>)*, integer* (Trm<sub>ℕ</sub>)*, channel* (Trm<sub>Ω</sub>)*, and trace* (Trm<sub>𝒯</sub>) *terms, and are defined by the grammar below, where* θ, θ<sub>1</sub>, θ<sub>2</sub> ∈ ℚ[V<sub>ℝ</sub>] ⊂ Trm<sub>ℝ</sub> *are polynomials in* V<sub>ℝ</sub>*:*


Real terms are polynomials in V<sub>ℝ</sub> enriched with function symbols f<sup>ℝ</sup>(Y, ē) (including constants c ∈ ℚ) that only depend on communication along channels Y and terms ē, differential terms (θ)′, and val(te) and time(te), which access the value and the timestamp of the last communication in te, respectively. By convention, θ ∈ ℚ[V<sub>ℝ</sub>] denotes a pure polynomial in V<sub>ℝ</sub> without (·)′, val(·), and time(·), as they occur in programs. For simplicity, we do not define ℚ[V<sub>ℝ</sub>] ⊂ Trm<sub>ℝ</sub> as a fifth term sort but use the convention that function symbols g<sup>ℝ</sup> can only be replaced with ℚ[V<sub>ℝ</sub>]-terms. Integer terms are variables n, function symbols f<sup>ℕ</sup>(Y, ē) (including constants 0, 1), addition, and length |te| of trace term te.<sup>1</sup> The function symbol f<sup>Ω</sup>(Y, ē) includes constants ch ∈ Ω, and chan(te) is channel access. Trace terms record the communication history of programs. They encompass variables h, function symbols f<sup>𝒯</sup>(Y, ē) (including the empty trace ε), communication items ⟨ch, θ<sub>1</sub>, θ<sub>2</sub>⟩ with value θ<sub>1</sub> and timestamp θ<sub>2</sub>, projection te ↓ Y onto channels Y, and access te[ie] of the ie-th item in te. Where useful, op(ē) denotes built-in function symbols of fixed interpretation, e.g., · + ·.

<sup>1</sup> Omitting multiplication results in decidable Presburger arithmetic [34].

dLCHP's context-sensitive program and formula syntax presumes notions of free and bound variables (Sect. 2.3) defined on the context-free syntax:

**Definition 2 (Programs).** Communicating hybrid programs *are defined by the following grammar, where* θ ∈ ℚ[V<sub>ℝ</sub>] *is a polynomial in* V<sub>ℝ</sub> *and* χ ∈ FOL<sub>ℝ</sub> *is a formula of first-order real arithmetic. In* α ∥ β*, the subprograms must not share state but can share time and history, i.e.,* BV(α) ∩ BV(β) ⊆ {μ, μ′} ∪ V<sub>𝒯</sub>*.*<sup>2</sup>

$$\begin{aligned} \alpha, \beta ::={} & a(\!|Y, \bar{z}|\!) \mid x := \theta \mid x := * \mid\ ?\chi \mid \{x' = \theta \,\&\, \chi\} \mid \alpha; \beta \mid \alpha \cup \beta \mid \alpha^* \mid \\ & ch(h)!\theta \mid ch(h)?x \mid \alpha \parallel \beta \end{aligned}$$

The program constant a(|Y, z̄|) restricts the written channels to Y ⊆ Ω and the bound variables to z̄ ⊆ V<sub>ℝ</sub> ∪ V<sub>𝒯</sub>, where Y and z̄ are (co)finite. Instead of a(|Y, z̄|), write a if Y and z̄ can be arbitrary. Assignment x := θ updates x to θ, nondeterministic assignment x := ∗ assigns an arbitrary real value to x, and the test ?χ does nothing if χ holds and aborts the computation otherwise. The continuous evolution {x′ = θ & χ} follows the ODE x′ = θ for any duration as long as formula χ is not violated. The global time μ evolves with every continuous evolution according to the ODE μ′ = 1. Sequential composition α; β executes β after α, choice α ∪ β executes α or β nondeterministically, α<sup>∗</sup> repeats α zero or more times, ch(h)!θ sends θ along channel ch, and ch(h)?x receives a value into variable x along channel ch. The trace variable h records communication. Finally, α ∥ β executes α and β in parallel, synchronized in global time μ.

*Example 3.* The program ct<sup>∗</sup> ∥ ve<sup>∗</sup> models a simplified cruise control [24]. The vehicle ve repeatedly receives a target velocity v<sup>tr</sup><sub>ve</sub> from the controller ct along channel tar. The target v<sup>tr</sup><sub>ct</sub> sent by ct is in range [0, V]. Hence, ve's velocity v<sub>ve</sub> stays in range [0, V] within the ε > 0 time units until the next communication if v<sub>ve</sub> ∈ [0, V] held initially. The evolution {t′ = 1} allows passage of time in ct.

$$\begin{aligned} \mathsf{ct} &\equiv v_{\mathsf{ct}}^{\mathrm{tr}} := *;\ ?(0 \le v_{\mathsf{ct}}^{\mathrm{tr}} \le V);\ \mathsf{tar}(h)!v_{\mathsf{ct}}^{\mathrm{tr}};\ \{t' = 1\} \\ \mathsf{ve} &\equiv \mathsf{tar}(h)?v_{\mathsf{ve}}^{\mathrm{tr}};\ a_{\mathsf{ve}} := \frac{v_{\mathsf{ve}}^{\mathrm{tr}} - v_{\mathsf{ve}}}{\epsilon};\ t_0 := \mu;\ \{v_{\mathsf{ve}}' = a_{\mathsf{ve}} \,\&\, \mu - t_0 \le \epsilon\} \end{aligned}$$
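To build intuition for why the velocity stays in [0, V], the following is a minimal simulation sketch of the cruise control, not part of the formal development. It assumes that each round consists of one communication followed by an evolution of some duration of at most ε, and it samples targets and durations at random. The function name `simulate_cruise_control` and its parameters are hypothetical.

```python
import random

def simulate_cruise_control(v0, V, eps, rounds, seed=0):
    """Simulate rounds of the cruise control and return (time, velocity) pairs.

    Assumption: every round is one communication on `tar` followed by an
    evolution of duration at most eps (the domain mu - t0 <= eps).
    """
    rng = random.Random(seed)
    t, v = 0.0, v0
    history = [(t, v)]
    for _ in range(rounds):
        target = rng.uniform(0.0, V)       # ct: pick a target in [0, V] and send it
        accel = (target - v) / eps         # ve: a := (v_tr - v) / eps
        duration = rng.uniform(0.0, eps)   # any duration allowed by the domain
        v += accel * duration              # exact solution of v' = a for constant a
        t += duration
        history.append((t, v))
    return history

history = simulate_cruise_control(v0=5.0, V=10.0, eps=0.5, rounds=1000)
tolerance = 1e-9  # slack for floating-point rounding
print(all(-tolerance <= v <= 10.0 + tolerance for _, v in history))
```

Since the acceleration is constant during an evolution, the velocity after a duration d ≤ ε is a convex combination of the old velocity and the target, which is why checking the endpoints of each evolution suffices in this sketch.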

**Definition 4 (Formulas).** *Formulas are defined by the grammar below for relations* ∼*, terms* e<sub>1</sub>, e<sub>2</sub> ∈ Trm *of equal sort, and* z ∈ V*. Moreover, the ac-formulas are unaffected by state change in* α*, i.e.,* (FV(A) ∪ FV(C)) ∩ BV(α) ⊆ V<sub>𝒯</sub>*.*

$$\varphi, \psi, \mathsf{A}, \mathsf{C} ::= e_1 \sim e_2 \mid p(Y, \bar{e}) \mid \neg \varphi \mid \varphi \land \psi \mid \forall z\, \varphi \mid [\alpha]\psi \mid [\alpha]_{\{\mathsf{A}, \mathsf{C}\}} \psi$$

The formulas combine first-order dynamic logic with ac-reasoning. Predicate symbols p(Y, ē) depend on channels Y and terms ē. The ac-box [α]<sub>{A,C}</sub>ψ expresses that C holds after each communication event and ψ in the final state, for all runs of α whose incoming communication satisfies A. Other connectives ∨, →, ↔ and quantifiers ∃z ϕ ≡ ¬∀z ¬ϕ can be derived. The relations ∼ include = for all term sorts, ≥ on real and integer terms, and prefixing ⪯ on trace terms.

<sup>2</sup> Previous work [6] disallows reading of variables bound in parallel as their change is not observable. This restriction is conceptually desirable but not soundness-critical. Here we drop it for simplicity, but it could be maintained by US as well.

By convention, the predicate symbol q<sup>R</sup> can only be replaced with formulas of first-order real arithmetic. It serves as placeholder for tests χ in CHPs.

*Example 5.* The cruise control from Example 3 is safe if its velocity stays in range [0, V]. This can be expressed with the formula ϕ → [ct<sup>∗</sup> ∥ ve<sup>∗</sup>]ψ<sub>safe</sub>, where ψ<sub>safe</sub> ≡ 0 ≤ v<sub>ve</sub> ≤ V and ϕ ≡ ψ<sub>safe</sub> ∧ ε > 0 ∧ V > 0.

#### **2.2 Semantics**

A *trace* τ = (τ<sub>1</sub>, ..., τ<sub>k</sub>) is a finite chronological sequence of communication events τ<sub>i</sub> = ⟨ch<sub>i</sub>, d<sub>i</sub>, s<sub>i</sub>⟩, where ch<sub>i</sub> ∈ Ω, d<sub>i</sub> ∈ ℝ is the communicated value, and s<sub>i</sub> ∈ ℝ is a timestamp such that s<sub>i</sub> ≤ s<sub>j</sub> for 1 ≤ i < j ≤ k. A *recorded trace* τ = (τ<sub>1</sub>, ..., τ<sub>k</sub>) additionally carries a trace variable h<sub>i</sub> ∈ V<sub>𝒯</sub> with each event, i.e., τ<sub>i</sub> = ⟨h<sub>i</sub>, ch<sub>i</sub>, d<sub>i</sub>, s<sub>i</sub>⟩. For variable z ∈ V<sub>𝕄</sub> and 𝕄 ∈ {ℝ, ℕ, 𝒯}, let type(z) = 𝕄. A *state* v maps each z ∈ V to a value v(z) ∈ type(z). The sets of traces, recorded traces, and states are denoted 𝒯, 𝒯<sub>rec</sub>, and S, respectively.

For d ∈ type(z), the state v<sub>z</sub><sup>d</sup> is the modification of v at z to d. For τ ∈ 𝒯<sub>rec</sub>, the trace τ(h) ∈ 𝒯 is obtained from the subsequence of τ carrying h ∈ V<sub>𝒯</sub> by removing the carried variable. *State-trace concatenation* v · τ ∈ S for τ ∈ 𝒯<sub>rec</sub> appends τ(h) to v at h for all h ∈ V<sub>𝒯</sub>. The *projection* τ ↓ Y of a (recorded) trace τ is the subsequence of all communication events in τ whose channel is in Y ⊆ Ω. The *state projection* v ↓ Y ∈ S modifies v at h to v(h) ↓ Y for all h ∈ V<sub>𝒯</sub>.
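The trace operations above can be illustrated with a small sketch. It assumes a finite representation in which a recorded trace is a Python list of events and a state is a dictionary; the names `Event`, `project`, `by_recorder`, and `concat` are hypothetical and chosen for illustration only.

```python
from typing import NamedTuple

class Event(NamedTuple):
    recorder: str   # trace variable h carrying the event
    channel: str    # channel name ch
    value: float    # communicated value d
    time: float     # timestamp s

def project(trace, channels):
    """tau ↓ Y: keep the events whose channel is in Y."""
    return [e for e in trace if e.channel in channels]

def by_recorder(trace, h):
    """tau(h): events carried by h, with the carried variable removed."""
    return [(e.channel, e.value, e.time) for e in trace if e.recorder == h]

def concat(state, trace, trace_vars):
    """v · tau: append tau(h) to v(h) for every trace variable h."""
    new_state = dict(state)
    for h in trace_vars:
        new_state[h] = list(state.get(h, [])) + by_recorder(trace, h)
    return new_state

tau = [Event("h", "tar", 3.0, 0.0), Event("h", "obs", 1.0, 0.5)]
v = {"x": 1.0, "h": []}
print(project(tau, {"tar"}))
print(concat(v, tau, {"h"}))
```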

An *interpretation* I assigns a function I(f<sup>𝕄</sup> : 𝕄<sub>1</sub>, ..., 𝕄<sub>k</sub>) : 𝕄<sub>1</sub> × ⋯ × 𝕄<sub>k</sub> → 𝕄 to each function symbol f<sup>𝕄</sup>, which is smooth in all real-valued arguments if 𝕄 = ℝ, and a relation I(p : 𝕄<sub>1</sub>, ..., 𝕄<sub>k</sub>) ⊆ 𝕄<sub>1</sub> × ⋯ × 𝕄<sub>k</sub> to each k-ary predicate symbol p.

**Definition 6 (Term Semantics).** *The* valuation Iv[[e]] ∈ ℝ ∪ ℕ ∪ Ω ∪ 𝒯 *of term* e *in interpretation* I *and state* v *is defined as follows:*

$$\begin{aligned} Iv[\![z]\!] &= v(z) \\ Iv[\![f(Y, e_1, \ldots, e_k)]\!] &= I(f)(I\tilde{v}[\![e_1]\!], \ldots, I\tilde{v}[\![e_k]\!]) \quad \text{where } \tilde{v} = v \downarrow Y \\ Iv[\![\mathtt{op}(e_1, \ldots, e_k)]\!] &= \mathtt{op}(Iv[\![e_1]\!], \ldots, Iv[\![e_k]\!]) \quad \text{for built-in } \mathtt{op} \in \{\cdot + \cdot, \cdot \downarrow Y, \ldots\} \\ Iv[\![(\theta)']\!] &= \sum_{x \in V_{\mathbb{R}}} v(x') \frac{\partial Iv[\![\theta]\!]}{\partial x} \end{aligned}$$

The projection ṽ = v ↓ Y ensures that f(Y, ē) only depends on Y, i.e., the communication in v along channels Y<sup>∁</sup> outside Y does not matter. The differentials (θ)′ have a semantics describing the local rate of change of θ [30].
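The effect of the state projection in the function-symbol case can be sketched with a tiny evaluator. This is an illustrative fragment only: terms are nested tuples, trace values are lists of (channel, value, time) triples, differentials are omitted, and the names `project_state` and `eval_term` are hypothetical.

```python
def project_state(state, channels):
    """v ↓ Y: project every trace-valued entry of the state onto channels Y."""
    return {
        z: [ev for ev in val if ev[0] in channels] if isinstance(val, list) else val
        for z, val in state.items()
    }

def eval_term(term, interp, state):
    """Evaluate a term of the illustrative fragment in interpretation and state."""
    kind = term[0]
    if kind == "var":                       # ("var", z)
        return state[term[1]]
    if kind == "len":                       # ("len", te), the length |te|
        return len(eval_term(term[1], interp, state))
    if kind == "plus":                      # ("plus", e1, e2)
        return eval_term(term[1], interp, state) + eval_term(term[2], interp, state)
    if kind == "fun":                       # ("fun", f, Y, [e1, ..., ek])
        _, name, channels, args = term
        projected = project_state(state, channels)   # arguments see only v ↓ Y
        return interp[name](*(eval_term(a, interp, projected) for a in args))
    raise ValueError(f"unknown term: {term!r}")

interp = {"f": lambda n: n + 1}
term = ("fun", "f", {"tar"}, [("len", ("var", "h"))])
v1 = {"h": [("tar", 3.0, 0.0), ("obs", 1.0, 0.5)]}
v2 = {"h": [("tar", 3.0, 0.0)]}
# v1 and v2 differ only in communication outside {"tar"}, so both agree on f.
print(eval_term(term, interp, v1) == eval_term(term, interp, v2))
```

Without the enclosing function symbol, the plain length term distinguishes the two states, which is exactly the dependency the channel annotation Y rules out.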

The denotational semantics of CHPs [6] combines dL's Kripke semantics [30] with a linear history semantics [47] and a global notion of time. Denotations are subsets of 𝒟 = S × 𝒯<sub>rec</sub> × S<sub>⊥</sub> with S<sub>⊥</sub> = S ∪ {⊥}. Final state ⊥ marks an unfinished computation, i.e., it still can be continued or was aborted due to a failing test. If (w′ = ⊥ and τ′ ⪯ τ), where ⪯ is the prefix relation on traces, or (τ′, w′) = (τ, w), then (τ′, w′) is a prefix of (τ, w), written (τ′, w′) ⪯ (τ, w). Since (even empty) communication of unfinished computations is still observable, denotations D ⊆ 𝒟 of CHPs are prefix-closed and total, i.e., (v, τ, w) ∈ D and (τ′, w′) ⪯ (τ, w) implies (v, τ′, w′) ∈ D, and ⊥<sub>𝒟</sub> ⊆ D with ⊥<sub>𝒟</sub> = S × {ε} × {⊥}. Moreover, all (v, τ, w) ∈ D are chronological, i.e., v(μ) ≤ w(μ), and if τ = (τ<sub>1</sub>, ..., τ<sub>k</sub>) ≠ ε, then, letting τ<sub>i</sub>(μ) = ⟨h<sub>i</sub>, ch<sub>i</sub>, d<sub>i</sub>, s<sub>i</sub>⟩(μ) = s<sub>i</sub>, we have v(μ) ≤ τ<sub>1</sub>(μ) and, if w ≠ ⊥, τ<sub>k</sub>(μ) ≤ w(μ). Note that τ is chronological as all traces are.

The interpretation I(a(|Y, z̄|)) ⊆ 𝒟 of a program constant a(|Y, z̄|) is a prefix-closed and total set of chronological computations that (i) only communicate along (write) channels Y and (ii) only bind variables z̄. More precisely, for all (v, τ, w) ∈ I(a(|Y, z̄|)), we have (i) τ ↓ Y<sup>∁</sup> = ε, and (ii) v = w on V<sub>𝒯</sub> and w · τ = v on z̄<sup>∁</sup>. For D, M ⊆ 𝒟, we define D<sub>⊥</sub> = {(v, τ, ⊥) | (v, τ, w) ∈ D}, and (v, τ, w) ∈ D ▷ M if (v, τ<sub>1</sub>, u) ∈ D and (u, τ<sub>2</sub>, w) ∈ M exist with τ = τ<sub>1</sub> · τ<sub>2</sub>. For states w<sub>α</sub>, w<sub>β</sub>, the merged state w<sub>α</sub> ⊕ w<sub>β</sub> is ⊥ if one of the substates w<sub>α</sub> or w<sub>β</sub> is ⊥. Otherwise, w<sub>α</sub> ⊕ w<sub>β</sub> = w<sub>α</sub> on BV(α) and w<sub>α</sub> ⊕ w<sub>β</sub> = w<sub>β</sub> on BV(α)<sup>∁</sup> (or, equivalently by syntactic well-formedness, on BV(β)<sup>∁</sup> and BV(β), respectively). If Y is the set of all channel names occurring in α, we write τ ↓ α for τ ↓ Y.
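The auxiliary operations on denotations can be sketched over finite sets of runs. The sketch assumes that a denotation is a finite list of (initial state, trace, final state) triples, that states are dictionaries, and that `None` stands for the marker ⊥; the names `bottom_part`, `compose`, and `merge` are hypothetical.

```python
BOT = None  # stands for the marker of an unfinished computation

def bottom_part(D):
    """All runs of D with the final state replaced by the unfinished marker."""
    return [(v, tau, BOT) for (v, tau, _w) in D]

def compose(D, M):
    """Continue finished runs of D with runs of M that start in their final state."""
    return [
        (v, tau1 + tau2, w)
        for (v, tau1, u) in D if u is not BOT
        for (u2, tau2, w) in M if u2 == u
    ]

def merge(w_alpha, w_beta, bv_alpha):
    """Merged final state: w_alpha on BV(alpha), w_beta elsewhere."""
    if w_alpha is BOT or w_beta is BOT:
        return BOT
    return {z: (w_alpha[z] if z in bv_alpha else w_beta[z]) for z in w_beta}

D = [({"x": 0}, [], {"x": 1})]
M = [({"x": 1}, [("h", "ch", 1.0, 0.0)], {"x": 2})]
print(compose(D, M))
print(merge({"x": 1, "y": 0}, {"x": 0, "y": 2}, {"x"}))
```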

**Definition 7 (Program semantics).** *Given an interpretation* I*, the* semantics I[[α]] ⊆ 𝒟 *of a CHP* α *is defined as follows, where* ⊥<sub>𝒟</sub> = S × {ε} × {⊥} *and* ⊨ *denotes the satisfaction relation (Definition 8):*

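$$\begin{aligned}
I[\![a(\!|Y, \bar{z}|\!)]\!] &= I(a(\!|Y, \bar{z}|\!)) \\
I[\![x := \theta]\!] &= \bot_{\mathcal{D}} \cup \{(v, \epsilon, w) \mid w = v_x^{d} \text{ where } d = Iv[\![\theta]\!]\} \\
I[\![x := *]\!] &= \bot_{\mathcal{D}} \cup \{(v, \epsilon, w) \mid w = v_x^{d} \text{ where } d \in \mathbb{R}\} \\
I[\![?\chi]\!] &= \bot_{\mathcal{D}} \cup \{(v, \epsilon, v) \mid Iv \models \chi\} \\
I[\![\{x' = \theta \,\&\, \chi\}]\!] &= \bot_{\mathcal{D}} \cup \{(\varphi(0), \epsilon, \varphi(s)) \mid \varphi(\zeta) = \varphi(0) \text{ on } \{x, x', \mu, \mu'\}^\complement \text{ and } I\varphi(\zeta) \models x' = \theta \land \mu' = 1 \land \chi \\
&\qquad \text{for all } \zeta \in [0, s] \text{ and a solution } \varphi : [0, s] \to S \text{ with } \varphi(\zeta)(z') = \tfrac{d\varphi(t)(z)}{dt}(\zeta) \text{ for } z \in \{x, \mu\}\} \\
I[\![ch(h)!\theta]\!] &= \{(v, \tau, w) \mid (\tau, w) \preceq (\langle h, ch, d, v(\mu) \rangle, v) \text{ where } d = Iv[\![\theta]\!]\} \\
I[\![ch(h)?x]\!] &= \{(v, \tau, w) \mid (\tau, w) \preceq (\langle h, ch, d, v(\mu) \rangle, v_x^{d}) \text{ where } d \in \mathbb{R}\} \\
I[\![\alpha \cup \beta]\!] &= I[\![\alpha]\!] \cup I[\![\beta]\!] \\
I[\![\alpha; \beta]\!] &= I[\![\alpha]\!]_{\bot} \cup (I[\![\alpha]\!] \triangleright I[\![\beta]\!]) \\
I[\![\alpha^*]\!] &= \bigcup_{n \in \mathbb{N}} I[\![\alpha^n]\!] \quad \text{where } \alpha^0 \equiv\ ?\top \text{ and } \alpha^{n+1} \equiv \alpha; \alpha^n \\
I[\![\alpha_1 \parallel \alpha_2]\!] &= \{(v, \tau, w_{\alpha_1} \oplus w_{\alpha_2}) \mid (v, \tau \downarrow \alpha_j, w_{\alpha_j}) \in I[\![\alpha_j]\!] \text{ for } j = 1, 2, \\
&\qquad w_{\alpha_1} = w_{\alpha_2} \text{ on } \{\mu, \mu'\}, \text{ and } \tau = \tau \downarrow (\alpha_1 \parallel \alpha_2)\}
\end{aligned}$$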

The semantics is indeed prefix-closed, total, and chronological by construction. Communication τ of α<sub>1</sub> ∥ α<sub>2</sub> is implicitly characterized via its subsequences for the subprograms. By τ = τ ↓ (α<sub>1</sub> ∥ α<sub>2</sub>), there is no non-causal communication. Joint communication and the whole computation are synchronized in global time by the projections and by w<sub>α<sub>1</sub></sub> = w<sub>α<sub>2</sub></sub> on {μ, μ′}, respectively. Likewise, by projection, communication is synchronously recorded by trace variables.

**Definition 8 (Formula semantics).** *The* satisfaction Iv ⊨ φ *of a* dLCHP *formula* φ *in interpretation* I *and state* v *is inductively defined as follows:*


$$\{Iv \cdot \tau' \mid \tau' \prec \tau\} \models \mathsf{A} \text{ implies } Iv \cdot \tau \models \mathsf{C} \tag{commit}$$

$$\left( \{ Iv \cdot \tau' \mid \tau' \preceq \tau \} \models \mathsf{A} \text{ and } w \neq \bot \right) \text{ implies } Iw \cdot \tau \models \psi \tag{post}$$

*Where* U ⊨ ϕ *for a set of interpretation-state pairs* U *and any formula* ϕ *if* Iv ⊨ ϕ *for all* Iv ∈ U*. In particular,* ∅ ⊨ ϕ*.*

In items 6 and 7, reachable worlds are built from states v and w, and communication τ, as change of state *and* communication are observable. The strict prefix ≺ for the assumption in case (commit) in item 6 excludes (when A ≡ C) the circularity that commitment C can be shown in states where it is assumed.

#### **2.3 Static Semantics**

In the uniform substitution process, checks of free and bound variables, as well as accessed and written channels, separate sound from unsound axiom instantiations. As parallelism requires fine-grained control over channels, the static semantics for dL [30] is lifted to a communication-aware static semantics for dLCHP. It uses accessed channels to characterize the subsequence of a communication trace influencing truth of a formula even more precisely than free variables.

To precisely grasp free and bound variables, and accessed and written channels, Definition 9 gives a semantic characterization. In this section, formulas are considered truth-valued, i.e., Iv[[φ]] = **tt** if Iv ⊨ φ and Iv[[φ]] = **ff** if Iv ⊭ φ.

**Definition 9 (Static semantics).** *For term or formula* e*, and program* α*, free variables* FV(e) *and* FV(α)*, bound variables* BV(α)*, accessed channels* CN(e)*, and written channels* CN(α) *form the static semantics.*

$$\begin{aligned}
\mathrm{FV}(e) &= \{z \in V \mid \exists I, v, \tilde{v} \text{ such that } v = \tilde{v} \text{ on } \{z\}^\complement \text{ and } Iv[\![e]\!] \neq I\tilde{v}[\![e]\!]\} \\
\mathrm{CN}(e) &= \{ch \in \Omega \mid \exists I, v, \tilde{v} \text{ such that } v \downarrow \{ch\}^\complement = \tilde{v} \downarrow \{ch\}^\complement \text{ and } Iv[\![e]\!] \neq I\tilde{v}[\![e]\!]\} \\
\mathrm{FV}(\alpha) &= \{z \in V \mid \exists I, v, \tilde{v}, \tau, w \text{ such that } v = \tilde{v} \text{ on } \{z\}^\complement \text{ and } (v, \tau, w) \in I[\![\alpha]\!], \\
&\qquad \text{and there is no } (\tilde{v}, \tilde{\tau}, \tilde{w}) \in I[\![\alpha]\!] \text{ such that } \tilde{\tau} = \tau \text{ and } w = \tilde{w} \text{ on } \{z\}^\complement\} \\
\mathrm{BV}(\alpha) &= \{z \in V \mid \exists I, (v, \tau, w) \in I[\![\alpha]\!] \text{ such that } w \neq \bot \text{ and } (w \cdot \tau)(z) \neq v(z)\} \\
\mathrm{CN}(\alpha) &= \{ch \in \Omega \mid \exists I, (v, \tau, w) \in I[\![\alpha]\!] \text{ such that } \tau \downarrow \{ch\} \neq \epsilon\}
\end{aligned}$$

The already subtle static semantics of hybrid systems [30] becomes even more subtle with communication and parallelism. For example, CHPs (silently) synchronize with the global time μ, which is free and bound in ODEs, and the differential μ′ is bound, i.e., μ ∈ FV({x′ = θ & χ}) and μ, μ′ ∈ BV({x′ = θ & χ}) if the evolution has a run of non-zero duration, regardless of whether μ occurs in x. Since reachable worlds of CHPs consist of communication *and* state, bound variables BV(α) of program α compare v with the state-trace concatenation w · τ rather than with w alone, which would miss τ. Consequently, h ∈ BV(ch(h)!θ) ⊆ FV(ch(h)!θ), which also reflects that the initial communication never gets lost.
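In an implementation, the semantic sets of Definition 9 are typically replaced by syntactic overapproximations. The following sketch computes such overapproximations of BV(α) and CN(α) for programs encoded as nested tuples. The encoding, the variable names `mu` and `mu'`, and the function names are assumptions made for illustration and are not taken from the paper.

```python
def bound_vars(prog):
    """Syntactic overapproximation of BV for a tuple-encoded program."""
    kind = prog[0]
    if kind == "test":                      # ("test", chi)
        return set()
    if kind == "assign":                    # ("assign", x, theta), also x := *
        return {prog[1]}
    if kind == "send":                      # ("send", ch, h, theta): recorder h is bound
        return {prog[2]}
    if kind == "recv":                      # ("recv", ch, h, x): binds h and x
        return {prog[2], prog[3]}
    if kind == "ode":                       # ("ode", x, theta): global time is bound too
        x = prog[1]
        return {x, x + "'", "mu", "mu'"}
    if kind == "loop":                      # ("loop", alpha)
        return bound_vars(prog[1])
    if kind in ("seq", "choice", "par"):    # binary operators
        return bound_vars(prog[1]) | bound_vars(prog[2])
    raise ValueError(f"unknown program: {prog!r}")

def written_channels(prog):
    """Syntactic overapproximation of CN for a tuple-encoded program."""
    kind = prog[0]
    if kind in ("send", "recv"):
        return {prog[1]}
    if kind in ("test", "assign", "ode"):
        return set()
    if kind == "loop":
        return written_channels(prog[1])
    if kind in ("seq", "choice", "par"):
        return written_channels(prog[1]) | written_channels(prog[2])
    raise ValueError(f"unknown program: {prog!r}")

# The vehicle ve from Example 3, with expressions kept as opaque strings.
ve = ("seq", ("recv", "tar", "h", "v_tr"),
      ("seq", ("assign", "a", "(v_tr - v) / eps"),
       ("seq", ("assign", "t0", "mu"), ("ode", "v", "a"))))
print(sorted(bound_vars(ve)))
print(sorted(written_channels(ve)))
```

Note how the send case reports the recorder variable h as bound, mirroring h ∈ BV(ch(h)!θ).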

**Lemma 10 (Bound effect property).** *The sets* BV(α) *and* CN(α) *are the smallest sets with the* bound effect property for program α*. That is,* v = w *on* V<sub>𝒯</sub> *and* v = w · τ *on* BV(α)<sup>∁</sup> *if* w ≠ ⊥*, and* τ ↓ CN(α)<sup>∁</sup> = ε *for all* (v, τ, w) ∈ I[[α]]*.*

By the following *communication-aware* coincidence property, terms and formulas only depend on their free variables, which for trace variables can be further refined to the subtraces whose channels are accessed. This subtrace-level precision is crucial in the soundness proof of the parallel injection axiom as it allows dropping β from [α ∥ β]ψ only if β does not write channels of ψ that are not also written by α. The signature Σ(·) of an expression denotes all occurring symbols.

**Lemma 11 (Coincidence for terms and formulas).** *The sets* FV(e) *and* CN(e) *are the smallest sets with the* communication-aware coincidence property for term or formula e*. That is, if* v ↓ CN(e) = ṽ ↓ CN(e) *on* FV(e) *and* I = J *on* Σ(e)*, then* Iv[[e]] = Jṽ[[e]]*. In particular, for formula* φ*:* Iv ⊨ φ *iff* Jṽ ⊨ φ*.*

Programs communicate but do *not* depend on the recorded history, thus the coincidence property for programs is not communication-aware. However, programs can produce the same communication starting from coinciding states.

**Lemma 12 (Coincidence for programs).** *The set* FV(α) *is the smallest set with the* coincidence property for program α*. That is, if* v = ṽ *on* X ⊇ FV(α)*, and* I = J *on* Σ(α)*, and* (v, τ, w) ∈ I[[α]]*, then* (ṽ, τ̃, w̃) ∈ J[[α]] *exists such that* w = w̃ *on* X*, and* τ = τ̃*, and (*w = ⊥ *iff* w̃ = ⊥*).*

### **3 Uniform Substitution for dLCHP**

In dLCHP, a uniform substitution [30] σ maps function and predicate symbols to terms (of equal sort) and formulas, respectively, while substituting the arguments of the symbol for their placeholders in the replacement, and program constants are mapped to CHPs. For example, σ = {f(·) ↦ · + 1, a ↦ ch(h)?v; {x′ = v}} replaces all occurrences of function symbol f with · + 1, while the reserved 0-ary function symbol · marks the positions for the parameter of f in the replacement. Moreover, σ replaces the program constant a with the program ch(h)?v; {x′ = v}.

The key to sound uniform substitution is that new free variables must not be introduced into a context where they are bound [8]. In the presence of communication, likewise, *new channel access must not be introduced into contexts* where the channel is written (B I). For parallelism, substitution *must not reveal internals* of the parallel context to the local abstraction of a subprogram (B II), and must not violate state disjointness. The one-pass approach [32] used for dLCHP postpones these checks *and* simply applies the substitution recursively while collecting written variables and channels as taboo set (Fig. 2), thus operates linearly in the input. Clashes between the taboo, and new free variables and channel access are only checked locally at the replacement site. Likewise, clashes between the permitted channels and variables of a program constant, and its replacement program are checked locally.

The substitution operator σ<sup>U,W</sup><sub>Z</sub>(α) for program α takes an input taboo U ⊆ V ∪ Ω and a parallel context W ⊆ V, and returns, if defined, the substitution result and a set of output taboos Z ⊆ V ∪ Ω. For terms and formulas, the substitution operator σ<sup>U</sup> only takes a taboo U ⊆ V ∪ Ω as input. The substitution process clashes, i.e., prevents unsound instantiation, if it were to introduce a free variable or accessed channel into a context where it is bound (B I) *or* if it were to write variables and channels violating abstraction (B II). Moreover, substitution preserves well-formedness of programs and formulas, i.e., substitution clashes if replacements were to violate well-formedness.

**Fig. 2.** Application of uniform substitution for taboo U and parallel context W, where W<sub>U,γ</sub> ≡ W ∪ (BV(σ<sup>U,W</sup>(γ)) ∖ ({μ, μ′} ∪ V<sub>𝒯</sub>)) for any program γ, and e ↓ Y for term e is the recursive push-down of projection ↓ Y, where p(Y<sub>0</sub>, e) ↓ Y ≡ p(Y<sub>0</sub> ∩ Y, e).

The side condition (FV(σf(·)) ∪ CN(σf(·))) ∩ U = ∅ implements locally that the replacement for f must not introduce free parameters that are tabooed by U (B I). The substitution {· ↦ σ<sup>U</sup>(e ↓ Y)}<sup>∅</sup> is responsible for the argument e,<sup>3</sup> where ∅ suffices as the taboo U is already checked on e ↓ Y. By the projection, e ↓ Y only depends on channels Y. Quantification ∀z taboos the bound variable z. Program α in a box or ac-box has an empty parallel context ∅.

The substitution σ<sup>U,W</sup><sub>Z</sub>(α) computes the output taboo Z by adding the written variables and channels of program α to U, e.g., real variable x for assignment x := θ and, for receiving ch(h)?x, additionally channel ch and trace variable h. The output taboo Z is passed to ac-formulas and postconditions of boxes and ac-boxes for recursive checks for clashes w.r.t. (B I). Crucially for soundness, Lemma 13 below proves that σ<sup>U,W</sup><sub>Z</sub>(·) correctly computes the output taboo Z.
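The interplay of input taboo, local clash check, and output taboo can be sketched for a tiny fragment. This is a simplified illustration and not the substitution of Fig. 2: it only covers unary function symbols, assignment, receiving, sequential composition, boxes, and quantifiers, it ignores the parallel context W and channel access in terms, and it keeps variables and channels in one set of strings. All names (`Clash`, `subst_term`, `subst_prog`, `subst_fml`) and the tuple encoding are hypothetical.

```python
class Clash(Exception):
    """Raised when a replacement would introduce a tabooed free variable."""

def free_vars(term):
    kind = term[0]
    if kind == "var":
        return {term[1]}
    if kind in ("num", "dot"):              # "dot" is the reserved placeholder
        return set()
    if kind == "plus":
        return free_vars(term[1]) | free_vars(term[2])
    if kind == "app":                       # ("app", f, arg)
        return free_vars(term[2])
    raise ValueError(f"unknown term: {term!r}")

def fill(term, arg):
    """Replace the placeholder in a replacement term by the argument."""
    kind = term[0]
    if kind == "dot":
        return arg
    if kind == "plus":
        return ("plus", fill(term[1], arg), fill(term[2], arg))
    if kind == "app":
        return ("app", term[1], fill(term[2], arg))
    return term

def subst_term(sigma, taboo, term):
    kind = term[0]
    if kind in ("var", "num"):
        return term
    if kind == "plus":
        return ("plus", subst_term(sigma, taboo, term[1]),
                subst_term(sigma, taboo, term[2]))
    if kind == "app":
        _, f, arg = term
        new_arg = subst_term(sigma, taboo, arg)
        if f not in sigma:
            return ("app", f, new_arg)
        replacement = sigma[f]
        if free_vars(replacement) & taboo:  # local check at the replacement site
            raise Clash(f"replacement of {f} mentions tabooed variables")
        return fill(replacement, new_arg)
    raise ValueError(f"unknown term: {term!r}")

def subst_prog(sigma, taboo, prog):
    """Return the substituted program together with its output taboo."""
    kind = prog[0]
    if kind == "assign":                    # ("assign", x, e)
        _, x, e = prog
        return ("assign", x, subst_term(sigma, taboo, e)), taboo | {x}
    if kind == "recv":                      # ("recv", ch, h, x)
        _, ch, h, x = prog
        return prog, taboo | {ch, h, x}
    if kind == "seq":
        first, mid = subst_prog(sigma, taboo, prog[1])
        second, out = subst_prog(sigma, mid, prog[2])
        return ("seq", first, second), out
    raise ValueError(f"unknown program: {prog!r}")

def subst_fml(sigma, taboo, fml):
    kind = fml[0]
    if kind == "geq":
        return ("geq", subst_term(sigma, taboo, fml[1]),
                subst_term(sigma, taboo, fml[2]))
    if kind == "forall":                    # quantification taboos z
        _, z, body = fml
        return ("forall", z, subst_fml(sigma, taboo | {z}, body))
    if kind == "box":                       # postcondition sees the output taboo
        _, prog, post = fml
        new_prog, out = subst_prog(sigma, taboo, prog)
        return ("box", new_prog, subst_fml(sigma, out, post))
    raise ValueError(f"unknown formula: {fml!r}")

# [y := 1] f(x) >= 0
phi = ("box", ("assign", "y", ("num", 1)),
       ("geq", ("app", "f", ("var", "x")), ("num", 0)))
print(subst_fml({"f": ("plus", ("dot",), ("num", 1))}, set(), phi))
```

In this sketch, replacing f(·) by · + 1 succeeds, whereas replacing f(·) by · + y raises `Clash` because y is bound by the assignment and therefore tabooed in the postcondition. The argument x itself is never checked against the taboo, since only the replacement can introduce new free variables.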

The taboo U ∪ W passed to nested expressions contains the parallel context W to prevent free variables in replacements of function and predicate symbols that are bound in parallel. This prepares the substitution process to preserve the syntax restrictions for parallel composition from previous work [6].<sup>4</sup> Substitution for evolution {x′ = θ & χ} considers that the global time μ, μ′ is always implicitly bound regardless of whether it occurs in x, x′. The fixpoint notation σ<sup>Z,W</sup><sub>Z</sub>(α) for the replacement of repetition α<sup>∗</sup> ensures that the output taboo of the first iteration is tabooed in the subsequent iterations [32]. Computing the parallel context of α and β in case α ∥ β requires one additional pass for both subprograms because what they potentially bind after substitution adds to the parallel context of the respective other subprogram.

**Lemma 13 (Correct output taboo).** *Application* σ<sup>U,W</sup><sub>Z</sub>(α) *of uniform substitution retains input taboo* U *and correctly adds the bound variables and written channels of program* α*, i.e.,* Z ⊇ U ∪ BV(σ<sup>U,W</sup><sub>Z</sub>(α)) ∪ CN(σ<sup>U,W</sup><sub>Z</sub>(α))*.*

The side condition of σ<sup>U,W</sup><sub>Z</sub>(a(|Y, z̄|)) maintains local abstraction of subprograms (B II) because the replacement cannot bind more than a(|Y, z̄|), thus cannot bind variables and channels of an abstraction that is independent of a(|Y, z̄|). This also preserves state-disjointness (well-formedness) of parallel programs.

#### **3.1 Semantic Effect of Uniform Substitution**

The key ingredients for proving soundness of uniform substitution are Lemmas 16 and 17 below. They prove that the effect of the syntactic transformation applied by uniform substitution can be equally mimicked by semantically modifying the interpretation of function and predicate symbols, and program constants. This adjoint interpretation σ<sup>∗</sup><sub>w</sub>I for interpretation I and state w changes how symbols are interpreted according to their syntactic replacements in the substitution σ.

<sup>3</sup> Extension to vectorial arguments is straightforward.

<sup>4</sup> For α ∥ β, the restriction is (V(α) ∩ BV(β)) ∪ (V(β) ∩ BV(α)) ⊆ {μ, μ′} ∪ V<sub>𝒯</sub> [6]. However, in this paper, programs obey a less restrictive syntax for simplicity.

**Definition 14 (Adjoint substitution).** *For interpretation* I *and state* w*, the* adjoint interpretation σ<sup>∗</sup><sub>w</sub>I *changes the meaning of function and predicate symbols, and program constants according to the substitution* σ *evaluated in state* w*:*

$$\begin{aligned} \sigma_w^* I(f^{\mathbb{M}} : \mathbb{M}_{\text{arg}}) &: \mathbb{M}_{\text{arg}} \to \mathbb{M};\ d \mapsto I_{\cdot}^{d} w[\![\sigma f(\cdot)]\!] \quad \text{where } \mathbb{M}, \mathbb{M}_{\text{arg}} \in \{\mathbb{R}, \mathbb{N}, \Omega, \mathcal{T}\} \\ \sigma_w^* I(p : \mathbb{M}_{\text{arg}}) &= \{d \in \mathbb{M}_{\text{arg}} \mid I_{\cdot}^{d} w \models \sigma p(\cdot)\} \quad \text{where } \mathbb{M}_{\text{arg}} \in \{\mathbb{R}, \mathbb{N}, \Omega, \mathcal{T}\} \\ \sigma_w^* I(a(\!|Y, \bar{z}|\!)) &= I[\![\sigma a]\!] \end{aligned}$$

We follow the observation for dGL [32] that the more liberal one-pass substitution requires stronger coincidence between the substitution and the adjoint on neighborhoods of the original state. Where the dGL soundness proof has succeeded by a neighborhood semantics of state on taboos, the dLCHP proof succeeds with a generalization to a neighborhood semantics of state and communication on taboos. The neighborhood of a state consists of its variations:

**Definition 15 (Variation).** *For a set* U ⊆ V ∪ Ω*, a state* v *is a* U-variation of state w *if* v *and* w *only differ on variables or projections onto channels in* U*, i.e.,* v ↓ (U<sup>∁</sup> ∩ Ω) = w ↓ (U<sup>∁</sup> ∩ Ω) *on* U<sup>∁</sup> ∩ V*.*
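The notion of variation can be made concrete with a small check on finite states. The sketch assumes that states are dictionaries over the same variables, that trace values are lists of (channel, value, time) triples, and that the taboo is passed as two separate sets for variables and channels; the name `is_variation` is hypothetical.

```python
def is_variation(v, w, taboo_vars, taboo_channels):
    """Check whether v is a variation of w w.r.t. the given taboo sets.

    v and w may differ arbitrarily on tabooed variables. On all other
    variables they must agree after dropping communication along tabooed
    channels from trace values.
    """
    def untabooed(value):
        if isinstance(value, list):         # trace value: project away tabooed channels
            return [ev for ev in value if ev[0] not in taboo_channels]
        return value

    return all(untabooed(v[z]) == untabooed(w[z])
               for z in w if z not in taboo_vars)

w = {"x": 1.0, "y": 2.0, "h": [("tar", 3.0, 0.0)]}
v = {"x": 5.0, "y": 2.0, "h": [("tar", 3.0, 0.0), ("obs", 1.0, 0.5)]}
print(is_variation(v, w, {"x"}, {"obs"}))
print(is_variation(v, w, {"x"}, set()))
```

In the usage example, v is a variation of w for taboo {x, obs}: it differs from w only in x and in communication along obs. Without obs in the taboo, the extra event recorded by h breaks the variation property.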

The proofs of Lemmas 16 and 17 follow a lexicographic induction on the structure of substitution, and term, formula, or program. In Lemma 17, the induction is mutual for formulas and programs.

**Lemma 16 (Semantic uniform substitution).** *The term* e *evaluates equally over* U*-variations under uniform substitution* σ<sup>U</sup> *and adjoint interpretation* σ<sup>∗</sup><sub>w</sub>I*, i.e.,* Iv[[σ<sup>U</sup>(e)]] = σ<sup>∗</sup><sub>w</sub>Iv[[e]] *for all* U*-variations* v *of* w*.*

**Lemma 17 (Semantic uniform substitution).** *The formula* φ *and the program* α *have equal truth value and semantics, respectively, over* U*-variations under uniform substitution* σ<sup>U</sup> *and adjoint interpretation* σ<sup>∗</sup><sub>w</sub>I*, i.e.,*

*1. for all* U*-variations* v *of* w*:* Iv ⊨ σ<sup>U</sup>(φ) *iff* σ<sup>∗</sup><sub>w</sub>Iv ⊨ φ

*2. for all* (U ∪ W)*-variations* v *of* w*:* (v, τ, o) ∈ I[[σ<sup>U,W</sup><sub>Z</sub>(α)]] *iff* (v, τ, o) ∈ σ<sup>∗</sup><sub>w</sub>I[[α]]

#### **3.2 Uniform Substitution Proof Rule**

The proof rule US for uniform substitution is the single point of truth for the sound instantiation of axioms (plus renaming of bound variables [30] and written channels, e.g., [x := θ]ψ(x) to [y := θ]ψ(y) and [ch(h)?x]ψ(ch) to [dh(h)?x]ψ(dh)). Soundness of the rule, i.e., that validity of its premise implies validity of the conclusion, immediately follows from Lemma 17. Since the substitution process starts with no taboos, σ(φ) is short for σ<sup>∅</sup>(φ). If the substitution clashes, i.e., σ<sup>∅</sup>(φ) is not defined, then rule US is not applicable.

**Theorem 18 (**US **is sound).** *The proof rule* US *is sound.*

$$\frac{\phi}{\sigma(\phi)}\ \text{US}$$

Unlike dL [30] and dGL [32], dLCHP has a context-sensitive syntax for programs and formulas (see Definition 2 and Definition 4). By Proposition 19, uniform substitution, however, preserves syntactic well-formedness. Since all axioms in Sect. 4 will be well-formed, only well-formed formulas can be derived in dLCHP.

**Proposition 19 (**US **preserves well-formedness).** *The result* σ<sup>U</sup>(φ) *(if defined) of applying uniform substitution to a well-formed formula* φ *is well-formed.*

### **4 Axiomatic Proof Calculus**

Figure 3 presents a sound proof calculus for dLCHP. The significant difference to dLCHP's schematic calculus [6] is that it completely abandons soundness-critical side conditions, internalizing them syntactically in the axioms. Only axiom **[]<sub>WA</sub>** was adjusted to obtain a symbolic representation, and an ac-version **K<sub>AC</sub>** of modal modus ponens is included. Now, distribution of ac-boxes over conjuncts **[]<sub>AC</sub>∧** and ac-monotonicity **M[·]<sub>AC</sub>** derive from **K<sub>AC</sub>** and are thus dropped. Except for these small changes, soundness is inherited from the schematic axioms [6].

Algebraic laws for reasoning about traces [6] can be easily adapted to uniform substitution as well [7]. Decidable first-order real arithmetic [41] and Presburger arithmetic [34] have corresponding oracle proof rules [6].

*Remark 20.* To obtain a truly finite list of axioms from Fig. 3, symbolic (co)finite sets can be finitely axiomatized as a boolean algebra together with extensionality, which can be unrolled to a finite disjunction for (co)finite sets [7].

**Parallel Composition.** The parallel injection axiom **[∥]<sub>AC</sub>** in Fig. 3 decomposes parallel CHPs by local abstraction (B II). Unlike dLCHP's [6] and Hoare-style [46,47] schematic calculi for ac-reasoning, axiom **[∥]<sub>AC</sub>** internalizes the noninterference property [6, Def. 7] that determines valid instances of formula

$$[\alpha]_{\{\mathsf{A},\mathsf{C}\}}\psi \to [\alpha \parallel \beta]_{\{\mathsf{A},\mathsf{C}\}}\psi \tag{1}$$

purely syntactically. To focus on noninterference, a(|Y<sub>a</sub>, z̄<sub>a</sub>|) ∥<sub>wf</sub> b(|Y<sub>b</sub>, z̄<sub>b</sub>|) abbreviates the well-formed parallel composition a(|Y<sub>a</sub>, z̄<sub>a</sub>|) ∥ b(|Y<sub>b</sub>, (z̄<sub>b</sub> ∩ z̄<sub>a</sub><sup>∁</sup>) ∪ {μ, μ′} ∪ V<sub>𝒯</sub>|) using the operator ∥<sub>wf</sub> for program constants a(|Y<sub>a</sub>, z̄<sub>a</sub>|), b(|Y<sub>b</sub>, z̄<sub>b</sub>|). This notation ensures disjoint parallel state except for the global time μ, μ′ and recorder variables V<sub>𝒯</sub>.

Intuitively, axiom **[∥]<sub>AC</sub>** restricts β in Eq. (1) such that α overapproximates the behavior of α ∥ β influencing A, C, or ψ. For this purpose, noninterference internalized in b(|Y<sub>b</sub> ∩ (Y<sup>∁</sup> ∪ Y<sub>a</sub>), z̄<sup>∁</sup>|) forbids b to bind variables z̄ that are free in the postcondition p(Y, z̄), and Y<sup>∁</sup> forbids b to write channels Y (except for channels Y<sub>a</sub> written by a because joint parallel communication can already be observed from a, too). Moreover, parallel programs always agree on the global time μ, μ′ and the communication recorded by trace variables V<sub>𝒯</sub>. Therefore, the operator ∥<sub>wf</sub> explicitly allows their sharing even if z̄<sup>∁</sup> disallows it. Note that Y<sub>a</sub> and Y, and z̄<sub>a</sub> and z̄ may overlap but can also be disjoint.
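The noninterference condition can be read as a simple check on sets. The following sketch assumes that the free variables and accessed channels of the contract and the bound variables and written channels of the programs are already available as finite sets, and it treats the global time and all trace variables as the shared exceptions. The function name, parameter names, and the variable names `mu` and `mu'` are hypothetical.

```python
GLOBAL_TIME = {"mu", "mu'"}

def may_drop_parallel(fv_contract, cn_contract, bv_beta, cn_beta, cn_alpha, trace_vars):
    """Check whether beta does not interfere with a contract proven for alpha.

    (i)  beta binds no free variable of the contract, except for the global
         time and trace variables, which parallel programs always share.
    (ii) beta writes no channel accessed by the contract, except for channels
         that alpha writes as well.
    """
    shared = GLOBAL_TIME | set(trace_vars)
    state_ok = not ((fv_contract & bv_beta) - shared)
    channel_ok = not ((cn_contract & cn_beta) - cn_alpha)
    return state_ok and channel_ok

# Cruise control: the safety postcondition only mentions v and V.
fv_safe, cn_safe = {"v", "V"}, set()
bv_ct = {"v_tr_ct", "t", "t'", "h", "mu", "mu'"}
bv_ve = {"v_tr_ve", "a", "t0", "v", "v'", "h", "mu", "mu'"}
print(may_drop_parallel(fv_safe, cn_safe, bv_ct, {"tar"}, {"tar"}, {"h"}))  # drop ct
print(may_drop_parallel(fv_safe, cn_safe, bv_ve, {"tar"}, {"tar"}, {"h"}))  # drop ve
```

Under these assumptions, the controller can be dropped from the safety contract because it binds no variable of the postcondition, whereas the vehicle cannot because it binds the velocity.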



**Fig. 3.** dLCHP proof calculus

Despite its asymmetric shape, axiom **[∥\_]AC** decomposes [α ∥ β](φ ∧ ψ) into [α]φ and [β]ψ (if they mutually do not interfere) via independent proofs for [α ∥ β]φ and [α ∥ β]ψ, which drop either α or β by **[∥\_]AC** modulo commutativity.

**Axiom System.** For each program statement, there is either a dynamic or an ac-axiom because the respective other version derives by axiom **[]<sub>⊤,⊤</sub>** or **[ε]AC**. Axioms **[:=]**, **[:∗]**, and **[?]** are as in dL [30]. Axioms **[;]AC**, **[∪]AC**, and **[<sup>∗</sup>]AC** for decomposition, and I**AC** for induction carefully generalize their versions in differential dynamic logic [30] and dynamic logic [14] to ac-reasoning. Sending is handled step-wise: axiom **[**ch**!]AC** flattens the assumption-commitments, and axiom **[**ch**!]** executes the effect on the recorder h. The duality **[**ch**?]AC** turns receiving into arbitrary sending, which only synchronizes if it agrees with the parallel context on the value. Axiom W**[]AC** is included for convenience. Axiom **[μ]** materializes the flow of global time μ such that dL's axiomatization of continuous evolution [30] becomes applicable, which requires the ODE shape x̄′ = f<sub>R</sub>(x̄). The axiomatic proof rules G**AC**, MP, ∀, and CE are an ac-version of Gödel's generalization rule, modus ponens, quantifier elimination, and contextual equivalence, respectively.

The axiom **[]**W**<sup>A</sup>** can weaken assumptions. Its slight change compared to dLCHP's schematic calculus [6] exploits that the compositionality condition W<sup>A</sup> is only required for a's reachable worlds. Interestingly, dLCHP's monotonicity rule M**[·]AC** [6] does not derive from modal modus ponens K**AC** and Gödel generalization G**AC** in analogy to dL [30] but needs W**[]AC** to handle monotonicity of assumptions. The latter does not fit into G**AC** because necessitating the assumption in G**AC** would render the derivation of [α]<sub>{⊤,⊤}</sub>⊤ by G**AC** impossible.

Axioms using postcondition P ≡ p(Y, z̄), e.g., in **[;]AC**, allow any replacement of P since accessed channels Y ⊆ Ω and free variables z̄ ⊆ V<sub>R</sub> ∪ V<sub>T</sub> can be arbitrary. Replacements of assumptions R ≡ r(Y, h̄) and commitments Q ≡ q(Y, h̄) can instead only mention trace variables h̄ ⊆ V<sub>T</sub> bound in their context. This reflects that trace variables are the only interface between the program α and the ac-formulas A and C in an ac-box [α]<sub>{A,C}</sub>ψ (well-formedness).

**Theorem 21 (Soundness).** *The proof calculus for* dLCHP *presented in Fig. 3 is sound as an instantiation of the schematic calculus [6].*

**Clashes.** Clashes sort out unsound instantiations of axioms. Unlike in dL and dGL [30,32], whose clashes are solely due to tabooed variables in terms and formulas, clashes in dLCHP can also be due to tabooed channels, and even due to taboos in programs. For example, the substitution σ = {a ↦ gh(h)!1, b ↦ ch(h)!2, p ↦ ψ, r ↦ ⊤, q ↦ ⊤} with ψ ≡ |h ↓ ch| > 0 ∧ |h ↓ dh| > 0 ∧ y < 0 clashes below, where Y = {ch, dh}, z̄ ≡ h, y, R ≡ r(Y), and Q ≡ q(Y). Writing channel ch in the replacement for b would break the local abstraction of a, as ch is accessed in ψ but not written in the replacement for a. Thus, the clash indeed sorts out an unsound instantiation.

$$\frac{[a(|\{\text{gh}\},h|)]\_{\{\mathsf{R},\mathsf{Q}\}}\,p(Y,\bar{z}) \to [a(|\{\text{gh}\},h|) \parallel\_{\text{wf}} b(|\{\text{ch}\} \cap (Y^{\complement} \cup \{\text{gh}\}), \bar{z}^{\complement}|)]\_{\{\mathsf{R},\mathsf{Q}\}}\,p(Y,\bar{z})}{[\text{gh}(h)!1]\_{\{\top,\top\}}\psi \to [\text{gh}(h)!1 \parallel \text{ch}(h)!2]\_{\{\top,\top\}}\psi} \ \text{clash}$$

In contrast, σ = {a ↦ ch(h)?x; gh(h)!1, b ↦ ch(h)!2, p ↦ ψ, r ↦ ⊤, q ↦ ⊤} does not clash below, where Y = {ch, dh}, Y<sub>a</sub> = {ch, gh}, and the other abbreviations are as above, because ch ∈ Y<sup>∁</sup> ∪ Y<sub>a</sub> = {dh}<sup>∁</sup>. Intuitively, the ch-communication of b remains observable after dropping b from the parallel composition as it is joint with a.

$$\dfrac{\dfrac{\ast}{[a(|Y\_a,h,x|)]\_{\{\mathsf{R},\mathsf{Q}\}}\,p(Y,\bar{z}) \to [a(|Y\_a,h,x|) \parallel\_{\text{wf}} b(|\{\text{ch}\} \cap (Y^{\complement} \cup Y\_a), \bar{z}^{\complement}|)]\_{\{\mathsf{R},\mathsf{Q}\}}\,p(Y,\bar{z})}}{[\text{ch}(h)?x; \text{gh}(h)!1]\_{\{\top,\top\}}\psi \to [(\text{ch}(h)?x; \text{gh}(h)!1) \parallel \text{ch}(h)!2]\_{\{\top,\top\}}\psi} \ \text{US}$$

Also note that by the operator ∥<sub>wf</sub> for well-formed parallel composition, the recorder variable h can be shared without causing a clash above. However, clashes prevent instantiations that would violate syntactic well-formedness of programs (Definition 2) by binding the same state variable in parallel:

$$\frac{[a(|\emptyset,x|)]\_{\{r,q\}}\,p(x,y) \to [a(|\emptyset,x|) \parallel\_{\text{wf}} b(|\emptyset,\{x,y\}^{\complement}|)]\_{\{r,q\}}\,p(x,y)}{[x:=y]\_{\{\top,\top\}}\,y = x \to [x:=y \parallel x:=0]\_{\{\top,\top\}}\,y = x} \ \text{clash}$$

Well-formedness of programs and formulas is ensured in the axioms by well-formed parallel composition ∥<sub>wf</sub> and by the limitation to trace variables h̄ in R<sub>j</sub> ≡ r<sub>j</sub>(Y, h̄) and Q<sub>j</sub> ≡ q<sub>j</sub>(Y, h̄) in ac-boxes [α]<sub>{R<sub>j</sub>,Q<sub>j</sub>}</sub>ψ in Fig. 3, respectively. By Proposition 19, uniform substitution always preserves well-formedness.

*Example 22.* The proof tree below decomposes safety (Example 5) of cruise control (Example 3) into safety 1 of controller ct and branch 2 to be continued to safety of the vehicle ve. The lower subproof introduces the ac-formulas

$$\mathsf{A} \equiv \mathsf{C} \equiv \left( |h \downarrow \text{tar}| > 0 \to 0 \le \mathsf{val}(h \downarrow \text{tar}) \le V \right),$$

using axiom **[]**W**<sup>A</sup>** to abstract from the communication between ct and ve. The upper subproof uses the parallel injection axiom **[∥\_]AC** to drop ve. Uniform substitution US does not clash as the commitment C only refers to joint communication of ct and ve. Other applications of US (e.g., for **[]**W**<sup>A</sup>**) are omitted. Rule Prop denotes propositional reasoning. Abbreviations are as follows: α ≡ a(|tar, vtr ct, t, t′, μ, μ′, h|), R ≡ r(tar, h), Q ≡ q(tar, h), P ≡ p(tar).

#### **5 Related Work**

Uniform substitution for differential dynamic logic dL [30] generalizes Church's uniform substitution for first-order logic [8, §35, 40]. Unlike the lifting from dL to differential game logic dGL [31], dLCHP generalizes into the complementary direction of communication and parallelism. Unlike schematic calculi [2,19,27, 44,46], whose treacherous schematic simplicity relies on encoding all subtlety of parallel systems in significant soundness-critical side conditions, our development builds upon a minimalistic non-schematic parallel injection axiom *and* sound instantiation encapsulated in uniform substitution. This provides a new, more atomic and more modular understanding of parallel systems overcoming the root cause for large soundness-critical prover kernels [5,9,12,16,18,36]. Usage of uniform substitution reduced the kernel of the theorem prover KeYmaera from 105 kLOC to 2 kLOC in KeYmaera X [23]. We expect dLCHP's integration into KeYmaera X to stay in the same order of magnitude.

To the best of our knowledge, assumption-commitment reasoning [22,46]<sup>5</sup> has no tool support, which might be due to vast implementation effort. The latter can be underpinned by analogy with tools [5,9,16,18,36] for verification of shared-variables concurrency, some of which use rely-guarantee reasoning [36,39]. Unlike uniform substitution for dLCHP, which enables a straightforward implementation of a small prover kernel, they all rely on large soundness-critical code bases. Unlike refinement checking for CSP [12] and discrete-time CSP [4], dLCHP supports safety properties of dense-time hybrid systems. Contrary to our goal of small prover kernels, implementations of model checkers [12] are inherently large.

Beyond embeddings of concurrency reasoning for discrete systems into proof assistants [3,25,26,38], dLCHP can verify parallel hybrid systems synchronizing in shared global time. The latter imposes even more complicated binding structures than parallel or hybrid systems alone but dLCHP's uniform substitution calculus continues to manage them in a modular way.

The recent tool HHLPy [37] for hybrid CSP (HCSP) [17] is limited to the sequential fragment. Unlike extending HHLPy to parallelism, which would require extensive soundness-critical side conditions and a treatment of the duration calculus, integrating dLCHP into KeYmaera X [11] boils down to adding a finite list of concrete object level formulas as axioms and only small changes to the uniform substitution process. In contrast to dLCHP's compositional parallel systems calculus [6], HCSP calculi [13,20,42] are non-compositional [6] as they either unroll exponentially many interleavings from the operational semantics [13,42] or can only decompose independent parallel components [20] causing limited ability to reason about complex systems. Former HCSP tools [43,45] only implement a non-compositional calculus [20] reinforcing the significance of our approach for managing parallel hybrid systems reasoning. Other hybrid process algebras defer to model checkers for reasoning [10,21,40]. Further discussion of dLCHP is in [6].

### **6 Conclusion**

This paper introduced a sound one-pass uniform substitution calculus for the dynamic logic of communicating hybrid programs dLCHP, thereby mastering the significant challenge of developing simple sound proof calculi for parallel hybrid systems with communication. Uniform substitution can separate even notoriously complicated binding structures from parallelism with communication in multi-dynamical logics into axioms and their instantiation. In the case of dLCHP, this applies to channel access in predicates and the need for local abstraction of subprograms in parallel statements, and it even turns out that uniform substitution can maintain a context-sensitive syntax along the way. Thanks to uniform substitution, parallel systems reasoning reduces to multiple uses of an asymmetric parallel injection axiom.

Now, with uniform substitution a straightforward implementation of dLCHP in KeYmaera X is only one step away.

<sup>5</sup> Assumption-commitment and rely-guarantee reasoning are specific patterns for message-passing and shared variables concurrency, respectively. The broader assume-guarantee principle has been used across diverse areas for various purposes.

**Acknowledgments.** This project was funded in part by the Deutsche Forschungsgemeinschaft (DFG) – 378803395 (ConVeY), an Alexander von Humboldt Professorship, and by the AFOSR under grant number FA9550-16-1-0288.

### **References**



# An Isabelle/HOL Formalization of the SCL(FOL) Calculus

Martin Bromberger<sup>1</sup>, Martin Desharnais<sup>1,2</sup>(✉), and Christoph Weidenbach<sup>1</sup>

<sup>1</sup> Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany {mbromber,desharnais,weidenbach}@mpi-inf.mpg.de

<sup>2</sup> Graduate School of Computer Science, Saarland Informatics Campus, Saarbrücken, Germany

Abstract. We present an Isabelle/HOL formalization of Simple Clause Learning for first-order logic without equality: SCL(FOL). The main results are formal proofs of soundness, non-redundancy of learned clauses, termination, and refutational completeness. Compared to the unformalized version, the formalized calculus is simpler and more general, some results such as non-redundancy are stronger and some results such as non-subsumption are new. We found one bug in a previously published version of the SCL Backtrack rule. Compared to related formalizations, we introduce a new technique for showing termination based on non-redundant clause learning.

Keywords: interactive theorem proving · automated theorem proving · first-order logic · CDCL · SCL · non-redundant clause learning

### 1 Introduction

The SCL ("Clause Learning from Simple Models" or simply "Simple Clause Learning") family of calculi lifts a conflict-driven clause learning (CDCL) approach to first-order logic: SCL(FOL) is for first-order logic without equality [8,10], SCL(T) is for first-order logic with theories [6], SCL(EQ) is for first-order logic with equality [12], and HSCL is for exhaustive partial models exploration in first-order logic without equality [7]. In its original formulation [10], SCL(FOL) required exhaustive propagation and a precise strategy for the application of the rules in order to learn non-redundant clauses. This was improved upon by SCL(T) [6] by dropping exhaustive propagation and weakening the strategy, i.e., any run according to the strategy in [10] is also a run according to the strategy in [6]. The SCL(FOL) version presented in Bromberger et al. [8] integrates those changes and additionally refines the Backtrack rule.

We present an Isabelle/HOL formalization of the non-executable specification of SCL(FOL) based on and developed in parallel to Bromberger et al. The main results are soundness, non-redundancy of learned clauses, termination, and refutational completeness. In contrast to the goal of Bromberger et al. to guide toward an implementation, our goal is to be as simple and general as possible. For that, we (i) simplified the calculus (e.g., no more explicit tracking of decision levels), (ii) generalized the calculus (e.g., multiple acceptable positions in the Backtrack rule), (iii) strengthened existing theorems (e.g., Theorem 11 on non-redundancy), and (iv) proved new theorems (e.g., Corollary 12 on non-subsumption).

This work is part of the IsaFoL (Isabelle Formalization of Logic) effort [2], which aims at developing a library of results about logical calculi. The Isabelle theory files are available in the *Archive of Formal Proofs* (AFP) [9] and amount to more than 11 000 lines of source text. They build heavily upon many other entries of the AFP: (i) First\_Order\_Terms [17] for first-order terms, term substitutions, and MGU; (ii) Ordered\_Resolution\_Prover [14–16] for the clausal calculus, clause substitutions, Herbrand interpretation, and compactness of first-order logic; and (iii) Saturation\_Framework\_Extensions [5,18] for entailment of the clausal calculus. We contributed many lemmas and definitions back to both the Isabelle distribution and the aforementioned AFP entries (e.g., over 50 to First\_Order\_Terms). We made heavy use of the Isar language [19] to write structured proofs, the Sledgehammer tool [13] for proof automation, and locales [1] (Isabelle's parameterized module system) to structure our development and reuse existing components from the AFP entries. To ease associating the main results in this paper with their counterparts in the Isabelle development, names in monospace are taken verbatim from the formalization.

The formalization follows the basic ideas of the existing formalizations of the first-order resolution calculus [16] and propositional CDCL calculi [3,4]. Compared to propositional logic, first-order logic adds a number of challenges: the extra term level requires considering variables, substitutions, groundings, and the concept of factorization. To preserve completeness, propagation of ground literals must no longer be exhaustive, resulting in a level-wise exploration w.r.t. a bounding atom. Inside this bound, the calculus always terminates. If one level does not suffice to find a refutation, the bound can be increased and exploration can be continued. For unsatisfiable formulas, we prove the existence of a bound sufficient to derive ⊥, which guarantees that only finitely many levels need to be explored.

The paper is organized as follows. Section 2 recaps the SCL(FOL) calculus from Bromberger et al. as the basis of our formalization presented in Sect. 3. We first present the Isabelle formalization of the abstract rules of the SCL(FOL) calculus. Then we prove invariants preserved by the rules starting from the initial state (Lemma 1). Subsequently, we prove soundness (Theorem 7), non-redundancy of learned clauses (Theorem 11), termination with respect to a fixed bound (Theorem 18), and finally refutational completeness with respect to an appropriate bound (Theorem 20). We discuss important aspects of the formalization and proof ideas here and refer the reader to the formalization for more details. The paper ends with a short conclusion of the obtained results.

### 2 The SCL(FOL) Calculus

We briefly recap basic first-order logic notions and the SCL(FOL) calculus presented in Bromberger et al. We consider an untyped, first-order logic without equality. A *term* is defined inductively as either a variable x or a function application f(t⃗) for a constant f and a (possibly empty) list of terms t⃗. An *atom* is a predicate symbol applied to a list of term arguments. A *literal* is either a positive atom A or a negative atom ¬A. For literals we write L or K. The atom of a literal may be selected with atom(A) = A and atom(¬A) = A. The complement of a literal is defined as comp(A) = ¬A and comp(¬A) = A. A disjunctive *clause* is a finite multiset of literals. For clauses we write C or D. We use the syntax L ∨ C and C ∨ D synonymously with the multiset sums {L} + C and C + D respectively. We also use the syntax ⊥ synonymously with the empty multiset {}. All variables in clauses are to be understood as universally quantified.
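The notions of atom, complement, and clause-as-multiset can be sketched in a few lines of Python. The encoding below (atoms as strings, literals as `(positive, atom)` pairs, clauses as `collections.Counter`) is an assumption made for illustration only and is not taken from the Isabelle formalization.

```python
from collections import Counter

# Assumed encoding: a literal is a pair (positive?, atom); atoms are plain strings.
def atom(lit):
    """atom(A) = A and atom(¬A) = A."""
    return lit[1]

def comp(lit):
    """comp(A) = ¬A and comp(¬A) = A."""
    positive, a = lit
    return (not positive, a)

def clause(*lits):
    """A clause is a finite multiset of literals."""
    return Counter(lits)

P = (True, "P(a)")
notP = (False, "P(a)")
Q = (True, "Q(b)")

# L ∨ C is the multiset sum {L} + C, so duplicate literals are kept.
C = clause(P) + clause(P, Q)
bottom = clause()  # ⊥ is the empty multiset
```

Using a multiset rather than a set matters later: the Factorize rule only makes sense if a clause can contain the same literal more than once.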

*Substitutions* are total unary functions from variables to terms. A substitution σ may be applied to a variable x, a term t, an atom A, a literal L, or a clause C, denoted xσ, tσ, Aσ, Lσ, or Cσ respectively. Substitution application is left-associative, i.e., Cσ<sub>1</sub>σ<sub>2</sub> = (Cσ<sub>1</sub>)σ<sub>2</sub>. The domain of a substitution σ is defined as dom(σ) = {x | xσ ≠ x}. The composition of two substitutions σ<sub>1</sub> and σ<sub>2</sub> is defined as the function σ<sub>1</sub> ◦ σ<sub>2</sub> = (λx. xσ<sub>1</sub>σ<sub>2</sub>). A substitution γ is a *grounding* for a term t, an atom A, a literal L, or a clause C if tγ, Aγ, Lγ, or Cγ are respectively ground, i.e., if they do not contain variables. A substitution ρ is a *renaming* if it is injective and xρ is a variable for all variables x. The *inverse* of a renaming ρ is any function ρ<sup>−1</sup> from terms to variables such that ρ<sup>−1</sup>(xρ) = x for all variables x. The *restriction* of a substitution σ to a set of variables V is defined as the function (λx. if x ∈ V then xσ else x). A substitution σ is *idempotent* if σ ◦ σ = σ. A substitution υ is a *unifier* for a set of terms T if t<sub>1</sub>υ = t<sub>2</sub>υ for all terms t<sub>1</sub> ∈ T and t<sub>2</sub> ∈ T. A substitution μ is a *most general unifier* (MGU) for a set of terms T if μ is a unifier for T and, for every unifier υ for T, there exists a substitution σ such that μ ◦ σ = υ. A substitution μ is an *idempotent, most general unifier* (IMGU) for a set of terms T if μ is a unifier for T and μ ◦ υ = υ for all unifiers υ for T; note that μ is an IMGU iff it is both idempotent and an MGU.

When formalizing logical calculi, IMGUs are preferable because they allow applying groundings to a term both directly and after applying an IMGU, i.e., tγ = tμγ for all terms t, IMGUs μ, and groundings γ that are themselves unifiers of the unified terms (this follows from μ ◦ γ = γ). Non-idempotent MGUs do not have this property, as the following counterexample shows. Consider the terms t<sub>1</sub> = f(x, y, z) and t<sub>2</sub> = f(w, y, z), the grounding γ = {x ↦ a, y ↦ b, z ↦ c, w ↦ a}, and the non-idempotent MGU μ = {x ↦ w, y ↦ z, z ↦ y}, where x, y, z, w are variables and a, b, c are ground constants. Then we have t<sub>1</sub>γ = f(a, b, c) ≠ f(a, c, b) = t<sub>1</sub>μγ. In published literature, an IMGU is often meant instead of an MGU; the idempotency requirement is often kept implicit because standard implementations for computing MGUs actually produce IMGUs.

The function gnd(C) = {Cγ | Cγ is ground} expresses the set of all groundings of a clause C. The function gnd(N) = (⋃C ∈ N. gnd(C)) expresses the set of all groundings of a set of clauses N; its subset whose clauses are restricted to atoms less than or equal to a bound β w.r.t. an order ≺<sub>B</sub> is defined as gnd<sub>⪯<sub>B</sub>β</sub>(N) = {C ∈ gnd(N) | ∀L ∈ C. atom(L) ⪯<sub>B</sub> β}. Note that gnd(gnd(N)) = gnd(N). The strict order ≺<sub>B</sub> is total on ground literals and is such that for each β there are only finitely many literals L with L ≺<sub>B</sub> β. An example of such an order could be KBO without zero-weight symbols. Note that LPO does not satisfy the last condition of a ≺<sub>B</sub> order although it is a well-founded and total order.

Herbrand entailment is defined as (I ⊨<sub>H</sub> N ⟷ (∀C ∈ N. I ⊨<sub>H</sub> C)) for a set of clauses N, (I ⊨<sub>H</sub> C ⟷ (∃L ∈ C. I ⊨<sub>H</sub> L)) for a clause C, (I ⊨<sub>H</sub> A ⟷ A ∈ I), and (I ⊨<sub>H</sub> ¬A ⟷ A ∉ I) for a literal with atom A; note that the symbol ⊨<sub>H</sub> is overloaded. Ground entailment is defined as (N<sub>1</sub> ⊨<sub>G</sub> N<sub>2</sub> ⟷ (∀I. I ⊨<sub>H</sub> N<sub>1</sub> ⟶ I ⊨<sub>H</sub> N<sub>2</sub>)). First-order entailment is defined as (N<sub>1</sub> ⊨ N<sub>2</sub> ⟷ gnd(N<sub>1</sub>) ⊨<sub>G</sub> gnd(N<sub>2</sub>)). A set of ground clauses N is satisfiable if there exists a Herbrand interpretation I such that I ⊨<sub>H</sub> N; otherwise, it is unsatisfiable.
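For finite sets of ground clauses, these definitions can be checked by brute force. The sketch below assumes interpretations are sets of ground atoms, literals are `(positive, atom)` pairs, and that the list `atoms` passed to `ground_entails` contains every atom occurring in either clause set; it illustrates the definitions and is not how the formalization proceeds.

```python
from itertools import chain, combinations

# Assumed encoding: an interpretation I is a set of ground atoms (strings);
# a literal is (positive?, atom); a clause is a list of literals.

def sat_lit(I, lit):
    """I |=_H A iff A ∈ I, and I |=_H ¬A iff A ∉ I."""
    positive, a = lit
    return (a in I) if positive else (a not in I)

def sat_clause(I, C):
    """I |=_H C iff some literal of C holds in I."""
    return any(sat_lit(I, L) for L in C)

def sat_set(I, N):
    """I |=_H N iff every clause of N holds in I."""
    return all(sat_clause(I, C) for C in N)

def ground_entails(N1, N2, atoms):
    """N1 |=_G N2, checked by enumerating every interpretation over `atoms`."""
    subsets = chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1))
    return all(sat_set(set(I), N2) for I in subsets if sat_set(set(I), N1))

P, Q = (True, "P"), (True, "Q")
notP = (False, "P")
```

With this encoding the empty clause `[]` is satisfied by no interpretation, so a clause set entails `[[]]` exactly when it is unsatisfiable.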

An annotated literal is the pairing of a literal with an annotation. We call it a *decision literal* when the annotation is a natural number n indicating the literal's level (i.e., that it is the nth decision) and a *propagation literal* when the annotation is a closure of the clause the literal originated from. The literal of an annotated literal K is denoted lit(K) and the annotation is denoted ann(K). The level of a clause is the maximum level of its literals. A *trail* is a finite sequence of annotated ground literals: it grows from left to right. The empty trail is written ε and appending a new annotated literal K to a trail Γ is written Γ, K. The concatenation of two trails Γ<sub>1</sub> and Γ<sub>2</sub> is written Γ<sub>2</sub>, Γ<sub>1</sub>. A trail Γ can be converted to a set with set(Γ).

A literal L is true under trail Γ if L ∈ {lit(K)|K ∈ set(Γ)}. A literal L is false under trail Γ if comp(L) ∈ {lit(K)|K ∈ set(Γ)}. A literal L is defined in a trail Γ if L is true or false under Γ; otherwise, it is undefined. A clause C is true under trail Γ if (∃L ∈ C. L is true under Γ). A clause C is false under trail Γ if (∀L ∈ C. L is false under Γ). A clause C is defined in a trail Γ if (∀L ∈ C. L is defined in Γ); otherwise, it is undefined.

The SCL(FOL) calculus is defined as a transition system operating on states (Γ; N; U; β; k; C) where Γ is a trail, N is a finite set of initial clauses, U is a finite set of learned clauses, β is a bounding atom restricting the considered ground literals, k is a natural number counting the number of decisions taken in Γ, and C is either ⊤ or a clause closure (C; γ) such that Cγ is ground and false in Γ. The initial state is (ε; N; ∅; β; 0; ⊤) for some initial clause set N and bound β.

The transition relation ⇒<sub>SCL</sub> is a mapping between states. The rules below are from Bromberger et al. and serve as a reference for the Isabelle formalization described in Sect. 3.

Propagate (Γ; N; U; β; k; ⊤) ⇒<sub>SCL</sub> (Γ, Lγ<sup>((C<sub>0</sub> ∨ L)μ; γ)</sup>; N; U; β; k; ⊤) if (C ∨ L) ∈ (N ∪ U), C = C<sub>0</sub> ∨ C<sub>1</sub>, C<sub>1</sub>γ = Lγ ∨ ··· ∨ Lγ, C<sub>0</sub>γ does not contain Lγ, μ is the IMGU of the literals in C<sub>1</sub> and L, (C ∨ L)γ is ground, (C ∨ L)γ ≺<sub>B</sub> {β}, C<sub>0</sub>γ is false under Γ, and Lγ is undefined in Γ.

Decide (Γ; N; U; β; k; ⊤) ⇒<sub>SCL</sub> (Γ, Lγ<sup>k+1</sup>; N; U; β; k + 1; ⊤) if L ∈ C for a C ∈ (N ∪ U), Lγ is a ground literal undefined in Γ, and Lγ ≺<sub>B</sub> β.

Conflict (Γ; N; U; β; k; ⊤) ⇒<sub>SCL</sub> (Γ; N; U; β; k; (C; γ)) if C ∈ (N ∪ U) and Cγ is false under Γ for a grounding substitution γ.

These rules construct a (partial) model via the trail Γ for N ∪ U until a conflict, i.e., a clause false under Γ, is found. The above rules always terminate, because there are only finitely many ground literals L with L ≺<sub>B</sub> β. It might be necessary to successively increase β for full refutational completeness.

Skip (Γ, K; N; U; β; k; (C; γ)) ⇒<sub>SCL</sub> (Γ; N; U; β; k − i; (C; γ)) if comp(K) does not occur in Cγ; if K is a decision literal then i = 1; otherwise, i = 0.

Factorize (Γ; N; U; β; k; (C ∨ L ∨ L′; γ)) ⇒<sub>SCL</sub> (Γ; N; U; β; k; ((C ∨ L)μ; γ)) if Lγ = L′γ and μ = IMGU(L, L′).

Note that this rule may be used multiple times if the conflicting clause contains more than two duplicates of a given literal or if multiple distinct literals have duplicates.

$$\begin{array}{ll} \mathsf{Resolve} & (\Gamma, K \gamma\_D ^{(D \vee K; \gamma\_D)}; N; U; \beta; k; (C \vee L; \gamma\_C)) \\ & \Rightarrow\_{\mathrm{SCL}} & (\Gamma, K \gamma\_D ^{(D \vee K; \gamma\_D)}; N; U; \beta; k; ((C \vee D) \mu; \gamma\_C \circ \gamma\_D)) \\ \mathrm{if } K \gamma\_D = \mathrm{comp}(L \gamma\_C), \, \mu = \mathrm{IMGU}(K, \mathrm{comp}(L)). \end{array}$$

The clauses D ∨ K and C ∨ L are assumed to have disjoint variables.

Backtrack (Γ<sub>0</sub>, K, Γ<sub>1</sub>, comp(Lγ)<sup>k</sup>; N; U; β; k; (C ∨ L; γ)) ⇒<sub>SCL</sub> (Γ<sub>0</sub>; N; U ∪ {C ∨ L}; β; j; ⊤) if Cγ is of level i < k, Γ<sub>0</sub>, K is the minimal trail subsequence such that there is a grounding substitution γ for which (C ∨ L)γ is false under Γ<sub>0</sub>, K but not under Γ<sub>0</sub>, and Γ<sub>0</sub> is of level j.

The clause C ∨L added by the rule Backtrack to U is called a *learned clause*. The empty clause ⊥ can only be generated by rule Resolve or be already present in N, hence, as usual for CDCL-style calculi, the generation of ⊥ together with the clauses in N ∪ U represent a resolution refutation.

A sequence of SCL rule applications is called a *reasonable run* if the rule Decide does not enable an immediate application of rule Conflict. A sequence of SCL rule applications is called a *regular run* if it is a reasonable run and the rule Conflict has precedence over all other rules.

### 3 Formalization of the SCL(FOL) Calculus

The formalization introduces some new concepts absent from Sect. 2. A multiset C can be converted to a set, i.e., without duplicates, with set(C). The multiplicity of an element x in a multiset C is denoted by count(C, x). The cardinality of a multiset (the sum of the multiplicities of its elements) is denoted by |C|. The multiset whose only element is x with multiplicity n is denoted by repeat(n, x); note that count(repeat(n, x), x) = n, and set(repeat(n, x)) = {x} if n > 0. The multiset extension of an order on literals extends the order to multisets containing literals; we use the Huet-Oppen specification [11], one of several equivalent alternatives for this extension. The *adaptation* of a substitution σ to a renaming ρ is a function whose domain is the renamed domain of σ and whose codomain is the same as σ; it is defined as the function (λx. if x ∈ {yρ | y ∈ dom(σ)} then (ρ<sup>−1</sup> x)σ else x). A substitution γ is a *merged grounding* of a grounding γ<sub>A</sub> for a set of variables A and a grounding γ<sub>B</sub> for a set of variables B if (A ∩ B = {} ⟶ (∀x ∈ A. xγ<sub>A</sub> is ground) ⟶ (∀x ∈ B. xγ<sub>B</sub> is ground) ⟶ (∀x ∈ A. xγ = xγ<sub>A</sub>) ∧ (∀x ∈ B. xγ = xγ<sub>B</sub>)); an example of a function that fulfills this specification is (λx. if x ∈ A then xγ<sub>A</sub> else xγ<sub>B</sub>). The length of a trail Γ is denoted by |Γ|. The nth right-most element of a trail Γ is denoted by Γ[n]; we use zero-based indexing where the right-most element is the 0th element. The Herbrand interpretation of a trail Γ is defined as HI(Γ) = (⋃K ∈ set(Γ). case lit(K) of A ⇒ {A} | ¬A ⇒ {}).
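Several of these auxiliary notions have direct counterparts in ordinary code. The sketch below is an illustration under assumed encodings (multisets as `collections.Counter`, trails as lists with the newest entry last, groundings as dictionaries); the function names are hypothetical and do not mirror the Isabelle constants.

```python
from collections import Counter

def repeat(n, x):
    """Multiset whose only element is x, with multiplicity n."""
    return Counter({x: n}) if n > 0 else Counter()

def card(C):
    """|C|: the sum of the multiplicities of the elements of C."""
    return sum(C.values())

def nth_from_right(trail, n):
    """Γ[n] with zero-based indexing from the right-most (newest) element."""
    return trail[len(trail) - 1 - n]

def herbrand_interpretation(trail):
    """HI(Γ): the atoms of the positive literals on the trail."""
    return {a for (positive, a), _ann in trail if positive}

def merged_grounding(gamma_a, A, gamma_b):
    """(λx. if x ∈ A then x γ_A else x γ_B), for groundings given as dicts."""
    return lambda x: gamma_a.get(x, x) if x in A else gamma_b.get(x, x)

trail = [((True, "P"), "†"), ((False, "Q"), "†"), ((True, "R"), "closure")]
gamma = merged_grounding({"x": "a"}, {"x"}, {"y": "b"})
```

The guard `n > 0` in `repeat` keeps `set(repeat(0, x))` empty, matching the side condition in the text.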

The formalization also changes some existing concepts. No distinction is made between *atoms* and terms, so first-order terms are used everywhere in place of atoms. The level annotation of a *decision literal* is not required anymore and is replaced by a † marker; it is now written K<sup>†</sup> = (K; †) for some literal K. A *propagation literal* is written (Kγ<sub>D</sub>)<sup>(K; D; γ<sub>D</sub>)</sup> = (Kγ<sub>D</sub>; (D; K; γ<sub>D</sub>)) for some literal K, clause D, and grounding γ<sub>D</sub>. Note that the propagated literal is explicitly separated from its clause in the closure annotation; this eases the formulation of the additional invariants 5 and 6 of Lemma 1, that the respective clause is always false under the respective trail. For the *trail* Γ, K, the Isabelle formalization uses the constructor List.Cons K Γ, which actually grows from right to left. However, we keep the well-established left-to-right convention in this paper because it significantly eases the presentation. A state is a tuple (Γ; U; C) where Γ is a trail, U is a finite set of learned clauses, and C is an optional clause closure. The individual components can be selected with trail((Γ; U; C)) = Γ, learned((Γ; U; C)) = U, and conflict((Γ; U; C)) = C. The *initial state* is (ε; {}; ⊤), i.e., empty trail, no learned clauses, and no conflicting closure. The finite set of initial clauses N and the bounding atom β are no longer stored in the state but are rather parameters of the transition relation; this was done to highlight the fact that they are never modified by any rule. The natural number k counting the number of decisions, used in Sect. 2 to determine an appropriate backtracking point, turned out not to be necessary and was dropped entirely. We assume the existence of a binary relation on atoms ≺<sub>B</sub> such that (∀β. {t | t ≺<sub>B</sub> β} is finite) but dropped the requirement for ≺<sub>B</sub> to be a strict order total on ground terms. We also do not lift ≺<sub>B</sub> to literals and clauses, but always use it at the atom level. We define the relation ⪯<sub>B</sub> as the reflexive closure of ≺<sub>B</sub>.

The transition relation ⇒<sub>SCL</sub><sup>N,β</sup> is a binary predicate between states and is parameterized by the finite set N of initial clauses and the bounding atom β. It is defined as the disjunction of the following rules. Following each rule, we highlight the main differences from Sect. 2 not already covered.

Propagate (Γ; U; ⊤) ⇒<sub>Propagate</sub><sup>N,β</sup> (Γ, (Lμγ)<sup>(Lμ; C<sub>0</sub>μ; γ)</sup>; U; ⊤) if (L ∨ C) ∈ (N ∪ U), γ is a grounding for L ∨ C, (∀K ∈ (L ∨ C). atom(Kγ) ⪯<sub>B</sub> β), C<sub>0</sub> = {K ∈ C | Kγ ≠ Lγ}, C<sub>1</sub> = {K ∈ C | Kγ = Lγ}, C<sub>0</sub>γ is false under Γ, Lγ is undefined in Γ, and μ is an IMGU for all terms in {atom(K) | K ∈ (L ∨ C<sub>1</sub>)}.

Compared to Sect. 2, we express the splitting of C into C<sub>0</sub> and C<sub>1</sub> formally as set operations and replace ≺<sub>B</sub> with ⪯<sub>B</sub>. This replacement has no effect on the results, but allowing the bound β to be in gnd<sub>⪯<sub>B</sub>β</sub>(N) eases the proof of Lemma 21, where the largest element of the (finite) unsatisfiable core is directly used as the new bound. There are also situations where the maximal element of a signature is required to derive a contradiction: a strict bound requires artificially extending the signature, while a non-strict bound does not.
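The splitting of C in the Propagate rule can be sketched on a small example. The Python code below is only an illustration under stated assumptions: terms are nested tuples, variables and constants are strings, a literal is a (polarity, atom) pair, and C<sub>0</sub> is taken to collect the literals whose instance under γ differs from Lγ while C<sub>1</sub> collects those whose instance equals Lγ. The helper names `apply`, `apply_lit`, and `split_clause` are hypothetical.

```python
def apply(term, subst):
    """Apply a substitution (dict from variable names to terms) to a term."""
    if isinstance(term, str):
        return subst.get(term, term)
    head, *args = term
    return (head, *(apply(arg, subst) for arg in args))


def apply_lit(lit, subst):
    """Apply a substitution to a (polarity, atom) literal."""
    polarity, atom = lit
    return (polarity, apply(atom, subst))


def split_clause(L, C, gamma):
    """Split C into C0 (instance differs from L gamma) and C1 (instance equals L gamma)."""
    target = apply_lit(L, gamma)
    C0 = [K for K in C if apply_lit(K, gamma) != target]
    C1 = [K for K in C if apply_lit(K, gamma) == target]
    return C0, C1


# Clause P(x) v P(y) v not Q(x), with L = P(x) and grounding {x -> a, y -> a}.
L = (True, ("P", "x"))
C = [(True, ("P", "y")), (False, ("Q", "x"))]
gamma = {"x": "a", "y": "a"}
C0, C1 = split_clause(L, C, gamma)
```

Here `C1` contains P(y), which collapses onto P(x) under the grounding and is therefore unified with L by the IMGU, while `C0` contains ¬Q(x), whose ground instance must be false under the trail for the propagation to apply.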

Decide (Γ; U; ⊤) ⇒<sup>N,β</sup><sub>Decide</sub> (Γ, (Lγ)<sup>†</sup>; U; ⊤) if (L ∨ C) ∈ N, γ is a grounding for L, Lγ is undefined in Γ, and atom(Lγ) ⪯<sub>B</sub> β.

Compared to Sect. 2, we replace ≺<sub>B</sub> with ⪯<sub>B</sub> and take the decision literal from N instead of N ∪ U. The ground instances of literals of U are a subset of the ground instances of literals of N, so it is redundant to also consider U here.

Conflict (Γ; U; ⊤) ⇒<sup>N,β</sup><sub>Conflict</sub> (Γ; U; (C; γ)) if C ∈ (N ∪ U), γ is a grounding for C, and Cγ is false under Γ.

Skip (Γ, K; U; (C; γ)) ⇒<sup>N,β</sup><sub>Skip</sub> (Γ; U; (C; γ)) if comp(lit(K)) ∉ Cγ.

Factorize (Γ; U; (L ∨ L′ ∨ C; γ)) ⇒<sup>N,β</sup><sub>Factorize</sub> (Γ; U; ((L ∨ C)μ; γ)) if Lγ = L′γ and μ is the IMGU for the terms atom(L) and atom(L′).

Resolve (Γ; U; (L ∨ C; γ<sub>C</sub>)) ⇒<sup>N,β</sup><sub>Resolve</sub> (Γ; U; ((Cρ<sub>C</sub> ∨ Dρ<sub>D</sub>)μ; γ)) if Γ = Γ′, (Kγ<sub>D</sub>)<sup>(K; D; γ<sub>D</sub>)</sup>, Kγ<sub>D</sub> = comp(Lγ<sub>C</sub>), ρ<sub>C</sub> and ρ<sub>D</sub> are renamings such that the variables of (L ∨ C)ρ<sub>C</sub> and (K ∨ D)ρ<sub>D</sub> are disjoint, μ is the IMGU for the terms atom(L)ρ<sub>C</sub> and atom(K)ρ<sub>D</sub>, γ′<sub>C</sub> and γ′<sub>D</sub> are adaptations of γ<sub>C</sub> and γ<sub>D</sub> to the renamings ρ<sub>C</sub> and ρ<sub>D</sub> respectively, and γ is a merged grounding of γ′<sub>C</sub> for the variables of (L ∨ C)ρ<sub>C</sub> and γ′<sub>D</sub> for the variables of (K ∨ D)ρ<sub>D</sub>.

Note that the definition of merged grounding implies the following equalities: μ ∘ γ = γ, Lρ<sub>C</sub>γ = Lγ<sub>C</sub>, Cρ<sub>C</sub>γ = Cγ<sub>C</sub>, Kρ<sub>D</sub>γ = Kγ<sub>D</sub>, and Dρ<sub>D</sub>γ = Dγ<sub>D</sub>.

Compared to Sect. 2, we explicitly rename the merged clauses to avoid variable-name clashes instead of assuming disjoint variables, and use an abstract specification for the merged grounding instead of forcing substitution composition. The latter makes our rule more general by allowing more freedom to an implementation.

Backtrack (Γ, Γ′, K; U; (L ∨ C; γ)) ⇒<sup>N,β</sup><sub>Backtrack</sub> (Γ; {L ∨ C} ∪ U; ⊤) if K = comp(Lγ) and (∄γ′. (L ∨ C)γ′ is ground and false under Γ).

Compared to Sect. 2, we allow backtracking to any non-conflicting trail instead of specifying the position. This makes our rule more general by, again, allowing more freedom to an implementation. The minimally backtracking strategy introduced in Definition 4 brings back equivalence to the Backtrack rule of Sect. 2.

Isabelle Technicalities. We define the SCL rules in the scl\_fol\_calculus locale. It fixes an abstract binary relation ≺<sub>B</sub> as a locale parameter and assumes that it bounds a finite number of atoms. It also fixes an abstract function to generate variable renamings as a locale parameter and assumes its correctness; this function is not required for the specification of the calculus but is required in multiple proofs. Most of the following definitions and theorems are in the context of this locale. Each SCL rule is defined separately as an inductive predicate. Having separate definitions allows us to refer to the rules individually in subsequent definitions and theorems. Using inductive predicates, as opposed to plain definitions, is convenient because Isabelle automatically generates some useful introduction and elimination lemmas, and configures structured Isar syntax for case analysis.

From the SCL rules, we can prove a number of invariants about states. Most of them are intuitive, while a few are technicalities of the Isabelle formalization. We will use the invariants as hypotheses for many of the main lemmas and theorems.

Lemma 1 (scl\_state\_invariants). *Let* (Γ; U; 𝒞) *be a state w.r.t.* ⇒<sup>N,β</sup><sub>SCL</sub>*. The following invariants hold for the initial state* (ε; {}; ⊤) *and are each individually preserved by the SCL rules.*

- ∀D K γ Γ′ Γ″. Γ = Γ′, (Kγ)<sup>(K; D; γ)</sup>, Γ″ ⟶ Dγ *is false under* Γ′
- ∀C γ. 𝒞 = (C; γ) ⟶ Cγ *is false under* Γ
- ∀K ∈ *set*(Γ). ∀D K′ γ. *ann*(K) = (D; K′; γ) ⟶ *lit*(K) = K′γ
- ∀K ∈ *set*(Γ). ∀D K′ γ. K = (K′γ)<sup>(K′; D; γ)</sup> ⟶ *comp*(K′γ) ∉ Dγ
- ∀K ∈ D. ∃D′ ∈ N. ∃K′ ∈ D′. ∃σ. K′σ = K
- ∀K ∈ *set*(Γ). ∃L ∈ N ∪ U. ∃σ. Lσ = *lit*(K)

The SCL calculus is defined as a transition system where many decisions are deferred to strategies. A *strategy* specifies a transition system whose transitions are a subset of those from an existing transition system. We say that a strategy S *restricts* a transition system T (or symmetrically that T is *restricted* by S) if (∀x y. S x y −→ T x y). Note that strategies can be chained to iteratively apply more restrictions.

We define the reasonable and regular strategies restricting the ⇒<sup>N,β</sup><sub>SCL</sub> relation in order to prove the main results of this paper.

Definition 2. *The reasonable strategy* ⇒<sup>N,β</sup><sub>Rea-SCL</sub> *restricts the SCL calculus by preventing decisions that immediately lead to a conflict. Such situations could be replaced by a propagation. Formally:*

$$S \Rightarrow\_{\text{Rea-SCL}}^{N,\beta} S' \quad \longleftrightarrow \quad S \Rightarrow\_{SCL}^{N,\beta} S' \land (S \Rightarrow\_{Decide}^{N,\beta} S' \longrightarrow (\nexists S''.\ S' \Rightarrow\_{Conflict}^{N,\beta} S''))$$

Definition 3. *The regular strategy* ⇒<sup>N,β</sup><sub>Reg-SCL</sub> *restricts the reasonable strategy by prioritizing the Conflict rule over any other. Formally:*

$$S \Rightarrow\_{\text{Reg-SCL}}^{N,\beta} S' \quad \longleftrightarrow \quad S \Rightarrow\_{\text{Rea-SCL}}^{N,\beta} S' \land ((\exists S''.\ S \Rightarrow\_{Conflict}^{N,\beta} S'') \longrightarrow S \Rightarrow\_{Conflict}^{N,\beta} S')$$

While not required for the coming results, we also define the minimally backtracking strategy to express the constraint on the backtracking position found in Sect. 2.

Definition 4. *The minimally backtracking strategy* ⇒<sup>N,β</sup><sub>Min-Bac-SCL</sub> *restricts the regular strategy by requiring that backtracking removes the shortest possible suffix of the trail. Formally:*

$$S \Rightarrow\_{\text{Min-Bac-SCL}}^{N,\beta} S' \quad \longleftrightarrow \quad S \Rightarrow\_{\text{Reg-SCL}}^{N,\beta} S' \land (S \Rightarrow\_{Backtrack}^{N,\beta} S' \longrightarrow \text{trail}(S') \text{ is the longest prefix of } \text{trail}(S) \text{ not in conflict with the learned clause})$$

All three strategies build on one another and ultimately restrict the SCL relation. We can express this formally as implications, of which the first can be used to show that coming results (e.g., Corollaries 13 and 19) also hold for the minimally backtracking strategy.

Lemma 5 (strategy\_restrictions). *The minimally backtracking strategy restricts the regular strategy, which restricts the reasonable strategy, which restricts the SCL calculus. Formally:*

- ∀N β S S′. S ⇒<sup>N,β</sup><sub>Min-Bac-SCL</sub> S′ ⟶ S ⇒<sup>N,β</sup><sub>Reg-SCL</sub> S′
- ∀N β S S′. S ⇒<sup>N,β</sup><sub>Reg-SCL</sub> S′ ⟶ S ⇒<sup>N,β</sup><sub>Rea-SCL</sub> S′
- ∀N β S S′. S ⇒<sup>N,β</sup><sub>Rea-SCL</sub> S′ ⟶ S ⇒<sup>N,β</sup><sub>SCL</sub> S′

The bounding atom β restricts the calculus to only consider the finitely many ground atoms less than or equal to β w.r.t. ≺<sub>B</sub>; this will play an important role in the termination proof. When SCL terminates, it has either derived a contradiction or found a model for the bounded groundings of the initial clauses. Because β is usually chosen heuristically, the model might be unsatisfactory for the considered use case and one may want to continue execution with a bigger bound. This is allowed if the new bound properly extends the previous bound β w.r.t. ⪯<sub>B</sub>.

Theorem 6 (monotonicity\_wrt\_bound). *If the ground atoms bound by* β *are a subset of the ground atoms bound by* β′*, formally if* (∀A. A *is ground* ⟶ A ⪯<sub>B</sub> β ⟶ A ⪯<sub>B</sub> β′)*, then the SCL, reasonable SCL, regular SCL, and minimally backtracking transitions w.r.t.* β *are also transitions w.r.t.* β′*, formally*

- ∀N S S′. S ⇒<sup>N,β</sup><sub>SCL</sub> S′ ⟶ S ⇒<sup>N,β′</sup><sub>SCL</sub> S′,
- ∀N S S′. S ⇒<sup>N,β</sup><sub>Rea-SCL</sub> S′ ⟶ S ⇒<sup>N,β′</sup><sub>Rea-SCL</sub> S′,
- ∀N S S′. S ⇒<sup>N,β</sup><sub>Reg-SCL</sub> S′ ⟶ S ⇒<sup>N,β′</sup><sub>Reg-SCL</sub> S′, *and*
- ∀N S S′. S ⇒<sup>N,β</sup><sub>Min-Bac-SCL</sub> S′ ⟶ S ⇒<sup>N,β′</sup><sub>Min-Bac-SCL</sub> S′.

Theorem 6 implies that all properties w.r.t. a bound β also hold w.r.t. a compatible bound β′. Its hypothesis is fulfilled if ⪯<sub>B</sub> is transitive on ground atoms, β and β′ are ground atoms, and β ⪯<sub>B</sub> β′. The bounding atom could even be increased at any point in an SCL run, not just when the calculus has terminated.

The different rules and strategies considered so far express a single step of computation for the SCL calculus; they offer a good level of granularity to both understand and mechanize the details of the calculus. But many results of the following sections ought to express properties of the calculus as a whole. We express such results in terms of a run from the initial state. A *run* is the reflexive, transitive closure of a rule or strategy, e.g., S (⇒<sup>N,β</sup><sub>SCL</sub>)<sup>∗</sup> S′ is an SCL run from the state S to the state S′.
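To make the notion of a run concrete, the following Python sketch computes the states related to a start state by the reflexive, transitive closure of a step function. It assumes a finite toy transition system given as a dictionary; `reachable`, `is_stuck`, `TOY`, and `toy_step` are hypothetical names, and the search only terminates when finitely many states are reachable.

```python
def reachable(step, start):
    """All states S such that there is a run (zero or more steps) from start to S."""
    seen = {start}          # reflexivity: the start state is always included
    todo = [start]
    while todo:
        state = todo.pop()
        for successor in step(state):
            if successor not in seen:
                seen.add(successor)
                todo.append(successor)
    return seen


def is_stuck(step, state):
    """A state is stuck if no further step is possible."""
    return len(step(state)) == 0


# Toy transition system: each state maps to its direct successors.
TOY = {"s0": ["s1", "s2"], "s1": ["s3"], "s2": ["s3"], "s3": [], "s4": ["s0"]}


def toy_step(state):
    return TOY.get(state, [])
```

In this toy system, the runs from `s0` end in the single stuck state `s3`, and `s4` is not reachable from `s0` even though `s0` is reachable from `s4`.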

The soundness of the individual SCL rules is shown by invariant 10. We now consider the soundness of terminating runs of the SCL calculus as a whole.

Theorem 7 (correct\_termination). *Let* S = (Γ; U; 𝒞) *be a state w.r.t.* ⇒<sup>N,β</sup><sub>SCL</sub>*. If invariants 2, 3, 5, 6 and 10 hold for* S*, and if* S *is a stuck state with some restrictions, formally if*

$$\begin{array}{ll} - & \nexists S'.\ S \Rightarrow\_{Propagate}^{N,\beta} S',\\ - & \nexists S'.\ S \Rightarrow\_{Decide}^{N,\beta} S' \land (\nexists S''.\ S' \Rightarrow\_{Conflict}^{N,\beta} S''),\\ - & \nexists S'.\ S \Rightarrow\_{Conflict}^{N,\beta} S',\\ - & \nexists S'.\ S \Rightarrow\_{Skip}^{N,\beta} S',\\ - & \nexists S'.\ S \Rightarrow\_{Resolve}^{N,\beta} S',\\ - & \nexists S'.\ S \Rightarrow\_{Backtrack}^{N,\beta} S' \text{ and the backtracking is minimal,} \end{array}$$

*then either the conflicting clause* ⊥ *has been derived and the groundings gnd*(N) *of the initial clauses* N *are unsatisfiable, or there is no conflicting clause and the groundings gnd*<sub>⪯<sub>B</sub>β</sub>(N) *of the initial clauses* N *are satisfiable by the trail, formally either*

- (∃γ. 𝒞 = (⊥; γ)) ∧ (∄I. I ⊨<sub>H</sub> *gnd*(N)), *or*
- 𝒞 = ⊤ ∧ HI(Γ) ⊨<sub>H</sub> *gnd*<sub>⪯<sub>B</sub>β</sub>(N)*.*

Note that no hypothesis restricts the usage of the Factorize rule because it is an optional step of conflict resolution that has no impact on satisfiability.

Theorem 7 holds for a family of strategies, in contrast to Theorem 5 from Bromberger et al., which was only shown for what is here called the minimally backtracking strategy. This family of strategies contains any strategy that preserves the required invariants and is restricted by the minimally backtracking strategy. From Lemma 5 we know that these two requirements are fulfilled by the SCL relation but also by the reasonable, regular, and minimally backtracking strategies. This leads to a more intuitive corollary based on runs.

Corollary 8 (correct\_termination\_strategies). *If an SCL, reasonable SCL, regular SCL, or minimally backtracking SCL run starting from the initial state* (ε; {}; ⊤) *terminates in a state* S = (Γ; U; 𝒞)*, formally any of*

- (ε; {}; ⊤) (⇒<sup>N,β</sup><sub>SCL</sub>)<sup>∗</sup> S ∧ (∄S′. S ⇒<sup>N,β</sup><sub>SCL</sub> S′),
- (ε; {}; ⊤) (⇒<sup>N,β</sup><sub>Rea-SCL</sub>)<sup>∗</sup> S ∧ (∄S′. S ⇒<sup>N,β</sup><sub>Rea-SCL</sub> S′),
- (ε; {}; ⊤) (⇒<sup>N,β</sup><sub>Reg-SCL</sub>)<sup>∗</sup> S ∧ (∄S′. S ⇒<sup>N,β</sup><sub>Reg-SCL</sub> S′), *or*
- (ε; {}; ⊤) (⇒<sup>N,β</sup><sub>Min-Bac-SCL</sub>)<sup>∗</sup> S ∧ (∄S′. S ⇒<sup>N,β</sup><sub>Min-Bac-SCL</sub> S′),

*then the conclusion of Theorem 7 holds.*

Note that each strategy is used with positive polarity in the "run" hypothesis and negative polarity in the "no-more-step" hypothesis. For this reason, it is impossible to provide a corollary with a single requirement to restrict or be restricted by any known strategy.

Traditional saturation-based calculi for first-order logic, e.g., Resolution and Superposition, can learn redundant clauses and thus their implementations require costly checks for non-redundancy. SCL(FOL) learns only non-redundant clauses. Thus, an implementation would not need to check for (forward) non-redundancy. We first repeat the definition of *standard redundancy* as found in [18].

Definition 9. *A clause* C *is redundant w.r.t. a set of clauses* N *and a strict order on clauses* ≺ *if* (∀C′ ∈ *gnd*(C). {D ∈ *gnd*(N) | D ≺ C′} ⊨<sub>G</sub> C′)*.*
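For ground clauses, where each clause is its own only ground instance, the redundancy criterion can be checked by brute force. The sketch below is an illustration only: literals are assumed to be non-zero integers (negative for negated atoms), the clause order is supplied by the caller, and the example order `by_atoms` compares descending lists of atoms while ignoring polarity. None of these names come from the formalization.

```python
from itertools import product


def entails(premises, clause):
    """Propositional entailment by enumerating all interpretations."""
    atoms = sorted({abs(l) for c in list(premises) + [clause] for l in c})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))

        def holds(c):
            return any(model[abs(l)] == (l > 0) for l in c)

        if all(holds(c) for c in premises) and not holds(clause):
            return False  # found a model of the premises falsifying the clause
    return True


def is_redundant(clause, clause_set, smaller):
    """Ground redundancy: the strictly smaller clauses of the set entail the clause."""
    return entails([d for d in clause_set if smaller(d, clause)], clause)


def by_atoms(d, c):
    """Example strict order: compare descending atom lists lexicographically."""
    return sorted(map(abs, d), reverse=True) < sorted(map(abs, c), reverse=True)
```

With atoms 1 and 2, the clause `[1, 2]` is redundant w.r.t. `[[1]]` because the smaller clause `[1]` entails it, whereas `[1]` is not redundant w.r.t. `[[1, 2]]` because no smaller clause is available as a premise.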

We first prove non-redundancy w.r.t. a trail-induced dynamic order and then lift this result to non-redundancy w.r.t. a static order.

Definition 10. *A trail* Γ *induces a well-founded, strict partial order* ≺<sup>Γ</sup>*, total on all atoms in* Γ*'s literals. Assuming* Γ *has the form* L<sub>n</sub><sup>∗</sup>, ..., L<sub>2</sub><sup>∗</sup>, L<sub>1</sub><sup>∗</sup>, L<sub>0</sub><sup>∗</sup> *for all* ∗ ∈ {†, (D, γ<sub>D</sub>) *for some* D *and* γ<sub>D</sub>}*, we have the following ordering.*

$$atom(L\_n) \prec^{\Gamma} \dots \prec^{\Gamma} atom(L\_2) \prec^{\Gamma} atom(L\_1) \prec^{\Gamma} atom(L\_0)$$

*In other words, "older" elements on the left are smaller than "newer" elements on the right. Formally:*

$$t\_1 \prec^{\Gamma} t\_2 \longleftrightarrow \left( \exists i < |\Gamma|.\ \exists j < i.\ t\_1 = atom(\operatorname{lit}(\Gamma[i])) \land t\_2 = atom(\operatorname{lit}(\Gamma[j])) \right)$$

Compared to Bromberger et al., the trail-induced order is defined on atoms instead of literals and non-redundancy is proven for any lifting to literals.
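The trail-induced order on atoms can be sketched directly from its informal description. In the Python code below, a trail is assumed to be a list of (polarity, atom) pairs written oldest first, following the left-to-right convention of the paper rather than the newest-first indexing of the Isabelle list; annotations are omitted and `trail_less` is a hypothetical name.

```python
def trail_less(trail, t1, t2):
    """t1 is smaller than t2 if t1's atom was added to the trail before t2's atom."""
    atoms = [atom for (_polarity, atom) in trail]
    return t1 in atoms and t2 in atoms and atoms.index(t1) < atoms.index(t2)


# Trail P(a), not Q(a), R(a), oldest literal first.
trail = [(True, "P(a)"), (False, "Q(a)"), (True, "R(a)")]
```

Atoms that do not occur on the trail are incomparable to everything, so the order is partial in general but total on the atoms of the trail's literals.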

Theorem 11 (dynamic\_non\_redundancy\_regular\_scl). *Following conflict resolution in a regular run, formally if*

$$\begin{array}{ll} - & (\epsilon; \{\}; \top) \left( \Rightarrow\_{Reg\text{-}SCL}^{N,\beta} \right)^{\*} (\varGamma; U; \top), \\ - & (\varGamma; U; \top) \Rightarrow\_{Conflict}^{N,\beta} \ S\_{1}, \\ - & S\_{1} \left( \Rightarrow\_{Skip, Factorize, Resolve}^{N,\beta} \right)^{+} S\_{n}, \; and \\ - & S\_{n} \Rightarrow\_{Backtrack}^{N,\beta} \ S\_{1+n}, \end{array}$$

*then neither is the learned clause* C = *conflict*(S<sub>n</sub>) *generalized by any initial or learned clause, formally* (∄D ∈ N ∪ U. ∃σ. Dσ = C)*, nor is it redundant w.r.t.* N ∪ U *and the order we get by first lifting the trail-induced order* ≺<sup>Γ</sup> *from atoms to literals and then taking its multiset extension.*

Dynamic non-redundancy with respect to the trail-induced order does not by itself release an implementation from performing backward non-redundancy checks, but it is a strong guarantee on the quality of learned clauses. For backward redundancy checks an order needs to be used that encompasses all dynamic trail-induced orders. An order based on a strict multiset relation has this property. So for backward redundancy we can, e.g., delete subsumed clauses.

Corollary 12 (static\_non\_subsumption\_regular\_scl). *If a regular run starting from the initial state* (ε; {}; ⊤) *learns a clause* C*, formally if*

$$\begin{array}{ll} - \ (\epsilon; \{\}; \top) \left( \Rightarrow\_{Reg\text{-}SCL}^{N,\beta} \right)^{\*} \ (\Gamma; U; (C; \gamma)) \ and \\ - \ (\Gamma; U; (C; \gamma)) \Rightarrow\_{Backtrack}^{N,\beta} \ S, \end{array}$$

*then* C *is not subsumed by any of the initial or learned clauses, formally* ∄D ∈ N ∪ U. ∃σ. Dσ ⊆ C*.*

All non-redundancy results can be generalized to an arbitrary strategy restricting the regular strategy. We only show one example here and refer the reader to the formalization for the others.

Corollary 13 (dynamic\_non\_redundancy\_strategy). *Following conflict resolution in the run of a strategy restricting regular SCL, formally if*

$$\begin{array}{ll} - & (\epsilon; \{\}; \top) \left(\Rightarrow\_{Strategy}^{N,\beta}\right)^{\*} (\varGamma; U; \top), \\ - & (\varGamma; U; \top) \Rightarrow\_{Conflict}^{N,\beta} S\_{1}, \\ - & S\_{1} \left(\Rightarrow\_{Skip, Factorize, Resolve}^{N,\beta}\right)^{+} S\_{n}, \\ - & S\_{n} \Rightarrow\_{Backtrack}^{N,\beta} S\_{1+n}, \text{ and} \\ - & \forall S \, S'. \, S \Rightarrow\_{Strategy}^{N,\beta} S' \longrightarrow S \Rightarrow\_{Reg\text{-}SCL}^{N,\beta} S', \end{array}$$

*then neither is the learned clause generalized by any initial or learned clause, formally* (∄D ∈ N ∪ U. ∃σ. Dσ = C)*, nor is it redundant w.r.t.* N ∪ U *and the order we get by first lifting the trail-induced order* ≺<sup>Γ</sup> *from atoms to literals and then taking its multiset extension.*

During the development of this formalization, we discovered that the original Backtrack rule found in [6] allows learning a duplicate of the last learned clause, which violates the stated non-redundancy of learned clauses. The original Backtrack rule ensures that the conflict closure is not false under the new trail, but the learned clause could still be in conflict w.r.t. another grounding. Following this conflict, the Backtrack rule would be immediately applicable and would learn the same clause again. This could only happen a finite number of times as backtracking reduces the length of the (finite) trail. As an example, consider the set of clauses N = {P(x), Q(y), ¬Q(z) ∨ R(z), ¬R(w) ∨ S(w), ¬P(v) ∨ ¬S(v)}, and a big enough β. The following SCL run was valid with the original Backtrack rule. Note that the notation for the trail was shortened to save space.

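$$\begin{array}{ll} & (\epsilon; \{\}; \top) \\ (\Rightarrow\_{Decide}^{N,\beta})^{\*} & (P(a), Q(a), P(b), Q(b); \{\}; \top) \\ (\Rightarrow\_{Propagate}^{N,\beta})^{\*} & (P(a), Q(a), P(b), Q(b), R(b)^{(R(z); \neg Q(z); z \to b)}, S(b)^{(S(w); \neg R(w); w \to b)}; \{\}; \top) \\ \Rightarrow\_{Conflict}^{N,\beta} & (P(a), Q(a), P(b), Q(b), R(b)^{(R(z); \neg Q(z); z \to b)}, S(b)^{(S(w); \neg R(w); w \to b)}; \{\}; (\neg P(v) \lor \neg S(v); v \to b)) \\ \Rightarrow\_{Resolve+Skip}^{N,\beta} & (P(a), Q(a), P(b), Q(b), R(b)^{(R(z); \neg Q(z); z \to b)}; \{\}; (\neg P(v) \lor \neg R(v); v \to b)) \\ \Rightarrow\_{Resolve+Skip}^{N,\beta} & (P(a), Q(a), P(b), Q(b); \{\}; (\neg P(v) \lor \neg Q(v); v \to b)) \\ \Rightarrow\_{Backtrack}^{N,\beta} & (P(a), Q(a), P(b); \{\neg P(v) \lor \neg Q(v)\}; \top) \\ \Rightarrow\_{Conflict+Skip}^{N,\beta} & (P(a), Q(a); \{\neg P(v) \lor \neg Q(v)\}; (\neg P(v) \lor \neg Q(v); v \to a)) \\ \Rightarrow\_{Backtrack}^{N,\beta} & (P(a); \{\neg P(v) \lor \neg Q(v)\}; \top) \end{array}$$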

This counterexample was only discovered when we failed to prove Theorem 11 in Isabelle. Note that this formalization is based on and was developed simultaneously with Bromberger et al., which originally inherited the Backtrack rule from [10]. The solution, which was promptly integrated into this formalization and Bromberger et al., is for the Backtrack rule to find a position without conflict w.r.t. the learned clause. Note that the original Backtrack rule reaches such a state after having learned the same clause finitely often, which has no effect on the set of learned clauses because sets ignore duplicates. Thus, the original Backtrack rule did not invalidate the other properties of the SCL calculus. This discovery is strong evidence of the usefulness of mechanized formalization for both published work and ongoing research: the Isabelle formalization led to the discovery of a previously unknown bug and it guided the development of the refinement.
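The counterexample can be replayed on a small scale. The Python sketch below is a simplification under stated assumptions: annotations are dropped, trail literals are (polarity, predicate, constant) triples, the learned clause ¬P(v) ∨ ¬Q(v) is represented as a list of (polarity, predicate) pairs over its single variable v, and groundings range over the constants a and b only. The function names are hypothetical.

```python
def has_false_instance(clause, trail, constants):
    """True if some instance v -> c of the one-variable clause is false under the trail."""
    on_trail = set(trail)
    return any(
        all((not polarity, pred, c) in on_trail for (polarity, pred) in clause)
        for c in constants
    )


def longest_safe_prefix(clause, trail, constants):
    """Longest trail prefix under which no instance of the (non-empty) clause is false."""
    for k in range(len(trail), -1, -1):
        if not has_false_instance(clause, trail[:k], constants):
            return trail[:k]


trail = [(True, "P", "a"), (True, "Q", "a"), (True, "P", "b"), (True, "Q", "b")]
learned = [(False, "P"), (False, "Q")]      # not P(v) or not Q(v)
constants = ["a", "b"]
```

Under these assumptions, the trail P(a), Q(a), P(b) reached by the first Backtrack step of the run still falsifies the instance v ↦ a of the learned clause, whereas `longest_safe_prefix` returns the trail consisting of P(a) alone, which matches the final state of the run.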

A calculus expressed as a state machine terminates if the transition relation starting from the initial state is well-founded following the arrow direction. We prove well-foundedness of regular SCL in three steps: (1) we first prove well-foundedness of SCL without backtracking, denoted ⇒<sup>N,β</sup><sub>SCL-no-Back</sub>; (2) we then prove that a regular run can only learn finitely many clauses; and (3) from these two results we finally prove well-foundedness of regular SCL. Step 1 is novel to the formalization. Prior work in Bromberger et al. focuses exclusively on the Backtrack rule (step 2) in order to prove termination of regular SCL (step 3). Also novel to the formalization are decreasing measuring functions for steps 1 and 2.

Definition 14. *The measuring function* M3(N, β,S) *for SCL without backtracking maps a set of initial clauses* N*, a bounding atom* β*, and a state* S *to a 4-tuple. The tuple elements are (1) a boolean identifying whether the state is conflict-free, (2) a (finite) set overapproximating the literals that could be added to the trail, (3) a (finite) list overapproximating the numbers of resolution steps that could be performed at each position in the trail, and (4) the (finite) cardinality of the conflicting clause. Formally:*

$$\begin{aligned} \mathcal{M}\_1(\beta,\Gamma) &= \{L \mid atom(L) \preceq\_B \beta\} - \{lit(K) \mid K \in set(\Gamma)\} \\ \mathcal{M}\_2(\epsilon, C) &= \epsilon \\ \mathcal{M}\_2((\Gamma, K^{\dagger}), C) &= \mathcal{M}\_2(\Gamma, C), 0 \\ \mathcal{M}\_2((\Gamma, (K\gamma)^{(K; D; \gamma)}), C) &= \operatorname{let}\, n = count(C, comp(K\gamma)) \text{ in } \mathcal{M}\_2(\Gamma, C \vee repeat(n, D\gamma)), n \\ \mathcal{M}\_3(N, \beta, (\Gamma; U; \top)) &= (True; \mathcal{M}\_1(\beta, \Gamma); \epsilon; 0) \\ \mathcal{M}\_3(N, \beta, (\Gamma; U; (C; \gamma))) &= (False; \{\}; \mathcal{M}\_2(\Gamma, C); |C|) \end{aligned}$$

With this, we can prove termination of SCL without backtracking (step 1).

Theorem 15 (termination\_scl\_without\_back). *SCL without backtracking is well-founded on all states reachable by an SCL-without-backtracking run starting from the initial state, formally on* {S | (ε; {}; ⊤) (⇒<sup>N,β</sup><sub>SCL-no-Back</sub>)<sup>∗</sup> S}*.*
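The third component of the measure in Definition 14, the list over-approximating the resolution steps per trail position, can be sketched for ground clauses. The code below is an interpretation rather than a transcription of the Isabelle definition: literals are assumed to be non-zero integers with negation as complement, the trail is listed oldest first, and each entry is a pair of a literal and either `None` for a decision or the remaining ground literals Dγ of the propagating clause. The name `measure_resolutions` is hypothetical.

```python
def measure_resolutions(trail, conflict):
    """Per trail position, bound the number of resolution steps still possible."""
    counts = []
    clause = list(conflict)              # the conflict clause as a multiset
    for lit, reason in reversed(trail):  # process the newest literal first
        if reason is None:
            counts.append(0)             # decisions are never resolved upon
        else:
            n = clause.count(-lit)       # occurrences of the complement
            clause = clause + n * list(reason)
            counts.append(n)
    return list(reversed(counts))        # align the result with the trail


# Atoms: 1 = P(b), 2 = Q(b), 3 = R(b), 4 = S(b).
trail = [(1, None), (2, None), (3, [-2]), (4, [-3])]
conflict = [-1, -4]                      # not P(b) or not S(b)
```

For this trail and conflict, one resolution step is counted at each of the two propagated positions and none at the decisions.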

We now turn to proving termination of regular SCL with backtracking by first defining an appropriate measuring function.

Definition 16. *The measuring function* M4(β,S) *for the rule Backtrack maps a bounding atom* β *and a state* S *to a finite set of clauses without duplicates. It computes an over-approximation of the set of clauses that could still be learned modulo duplicates. Formally:*

$$\mathcal{M}\_4(\beta, S) := 2^{\{L \mid atom(L) \preceq\_B \beta\}} - \{set(C) \mid C \in \operatorname{gnd}(\operatorname{ learned}(S))\}$$
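On a tiny ground example, this measure can be computed explicitly. The sketch below assumes that literals are non-zero integers and that the bounded literals and the ground instances of the learned clauses are given directly as Python collections; `powerset` and `measure_back` are hypothetical names.

```python
from itertools import combinations


def powerset(items):
    """All subsets of a finite collection, as a set of frozensets."""
    items = list(items)
    return {
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    }


def measure_back(bounded_literals, ground_learned):
    """Clauses (as literal sets) over the bounded literals that are not yet learned."""
    return powerset(bounded_literals) - {frozenset(c) for c in ground_learned}


literals = {1, -1, 2, -2}
before = measure_back(literals, [])
after = measure_back(literals, [[-1, -2]])
```

Learning the ground clause `[-1, -2]` removes exactly one element from the measure, and a clause with duplicate literals such as `[-1, -2, -1]` removes the same element because clauses are compared as sets.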

We then prove that it decreases every time we learn a new clause (step 2).

Lemma 17 (M*\_back\_after\_regular\_backtrack*). *Following conflict resolution in a regular run, formally if*

$$\begin{array}{ll} - & (\epsilon; \{\}; \top) \left( \Rightarrow\_{Reg\text{-}SCL}^{N,\beta} \right)^{\*} (\varGamma; U; \top), \\ - & (\varGamma; U; \top) \Rightarrow\_{Conflict}^{N,\beta} \ S\_{1}, \\ - & S\_{1} \left( \Rightarrow\_{Skip, Factorize, Resolve}^{N,\beta} \right)^{+} S\_{n}, \; and \\ - & S\_{n} \Rightarrow\_{Backtrack}^{N,\beta} S\_{1+n}, \; then \end{array}$$

*1. the ground conflict is distinct from all groundings of initial and learned clauses modulo duplicates, formally* (∃C γ. *conflict*(S<sub>n</sub>) = (C; γ) ∧ *set*(Cγ) ∉ {*set*(D) | D ∈ *gnd*(N ∪ U)})*, and*

*2. the set of clauses that could potentially be learned strictly diminishes, formally* ℳ<sub>4</sub>(β, S<sub>1+n</sub>) ⊂ ℳ<sub>4</sub>(β, S<sub>n</sub>)*.*

Lemma 17 is novel to the formalization. Together with Theorem 15 it allows us to prove termination of regular SCL with backtracking (step 3).

Theorem 18 (termination\_regular\_scl). *Regular SCL is well-founded on all states reachable by a regular-SCL run starting from the initial state, formally on* {S | (ε; {}; ⊤) (⇒<sup>N,β</sup><sub>Reg-SCL</sub>)<sup>∗</sup> S}*.*

All termination results can be generalized to an arbitrary strategy restricting the regular strategy. We only show one example here and refer the reader to the formalization for the others.

Corollary 19 (termination\_strategy). *If a strategy restricts regular SCL, formally if* (∀S S′. S ⇒<sup>N,β</sup><sub>Strategy</sub> S′ ⟶ S ⇒<sup>N,β</sup><sub>Reg-SCL</sub> S′)*, then it is well-founded on all states reachable by a run using this strategy and starting from the initial state, formally on* {S | (ε; {}; ⊤) (⇒<sup>N,β</sup><sub>Strategy</sub>)<sup>∗</sup> S}*.*

All theorems until now were first expressed and proven using invariants and then the versions expressed using runs were derived. However, Theorem 18 posed an interesting problem because its proof requires the backtracking step to have knowledge of the trail when a conflict last occurred. But this information is lost in the SCL state due to the Skip rule shrinking the trail. We did define an invariant that expresses the historical form of the trail and its properties derived from the regular strategy, but it is complex and the added value compared to working directly on a regular run is questionable. For simplicity, we chose not to present this invariant in this paper.

Together, soundness and termination allow us to prove refutational completeness of the regular SCL calculus w.r.t. a fixed bound.

Theorem 20 (completeness\_wrt\_bound). *If the groundings gnd*<sub>⪯<sub>B</sub>β</sub>(N) *of the initial clauses* N *are unsatisfiable, then all regular SCL runs starting from the initial state terminate and derive the conflicting clause* ⊥*, formally*


Theorem 20 is only defined w.r.t. a bound, but fortunately we can prove that there must always exist an appropriate bound.

Lemma 21 (ex\_bound\_if\_unsat). *If the relation* ≺<sub>B</sub> *is a well-founded, strict order, total on ground atoms and the groundings gnd*(N) *of the initial clauses* N *are unsatisfiable, then there exists a bound* β *such that the groundings gnd*<sub>⪯<sub>B</sub>β</sub>(N) *are unsatisfiable.*

Note that while Lemma 21 proves the existence of an appropriate bound, it provides no constructive way of finding one. What one can do is follow along Theorem 6 and iteratively increase a heuristically chosen bound until an appropriate one is found; if the set of initial clauses is unsatisfiable, this will terminate.
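The idea of iteratively increasing the bound can be sketched in a propositional setting. The code below is a stand-in and not an SCL implementation: atoms are assumed to be positive integers ordered by `<=`, ground clauses are lists of non-zero integers, and a brute-force satisfiability check replaces an actual SCL run on the bounded groundings. All function names are hypothetical.

```python
from itertools import product


def satisfiable(clauses):
    """Brute-force propositional satisfiability check."""
    atoms = sorted({abs(l) for c in clauses for l in c})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False


def bounded(clauses, beta):
    """Keep the ground clauses whose atoms are all less than or equal to beta."""
    return [c for c in clauses if all(abs(l) <= beta for l in c)]


def find_refuting_bound(clauses, bounds):
    """Try increasing bounds until the bounded clause set is unsatisfiable."""
    for beta in bounds:
        if not satisfiable(bounded(clauses, beta)):
            return beta
    return None  # no candidate bound was large enough


# 1, 1 -> 2, 2 -> 3, and not 3: unsatisfiable only once atom 3 is within the bound.
clauses = [[1], [-1, 2], [-2, 3], [-3]]
```

Here `find_refuting_bound(clauses, [1, 2, 3])` returns 3: with the bounds 1 and 2 the bounded clause sets are still satisfiable, and only the bound 3 includes the whole unsatisfiable core.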

Isabelle Technicalities. Lemma 21's hypothesis that ≺<sub>B</sub> is a well-founded, total, strict order cannot be expressed as a theorem-local hypothesis. The reason is that the compactness theorem for clausal first-order logic requires terms to be an instance of the wellorder type class, which is not the case in the scl\_fol\_calculus locale, where the assumptions on the ≺<sub>B</sub> relation are kept minimal. Because Isabelle does not allow instantiating a type class with a concrete type inside a locale or theorem, we define a new locale that extends scl\_fol\_calculus and adds a type class requirement on the first-order term constants. This enables the type-class system to automatically instantiate the wellorder type class for terms using the previously registered Knuth-Bendix order. We then instantiate the ≺<sub>B</sub> relation of scl\_fol\_calculus with the Knuth-Bendix order. These type class and locale gymnastics could be avoided if the formalization of the compactness theorem was refactored to offer a predicate-based version alongside the existing type-class-based version.

### 4 Conclusion

We generalized and formalized the SCL(FOL) calculus in Isabelle/HOL. The main results are formal proofs of soundness, non-redundancy of learned clauses, termination, and refutational completeness. Because the formalization was performed simultaneously with Bromberger et al., they could benefit from each other. A mechanized formalization must consider low-level details, but it is also the opportunity to identify the most important aspects of the theory and abstract over details needed in the context of an actual implementation. For example, we abstracted from the level of a state to define the Backtrack rule and replaced it with an abstract specification of the result. A level was used in all pen-and-paper presentations of the calculus in order to have a constructive way of going back to the maximal trail where the learned clause propagates. The abstraction makes it possible to investigate several versions of the Backtrack rule and to base the soundness result on a version with a minimal requirement, i.e., that the learned clause is no longer false with respect to the trail.

The formalization did uncover a small bug in the calculus, but also showed that its effect was very localized and naturally led to a solution. Another benefit of the formalization is how much it supports refactoring and exploratory experimentation. When making a change to a definition or a conjecture, Isabelle immediately and exhaustively points to the parts that need to be adapted. Very often, proofs can automatically be adapted using proof automation tools such as Sledgehammer. This was invaluable to quickly try out ideas or change subtle parts of the calculus. One such example is in the Resolve rule, where the formalization first used substitution composition as found in the original calculus and later replaced it by an abstract specification of merged grounding. This idea came from a private discussion sketching an eventual C implementation where it became clear that substitution composition would be a costly operation. We then introduced the abstract specification of merged grounding and fixed the formalization by following the mistakes reported by Isabelle.

Acknowledgments. We thank Sophie Tourret and Jasmin Blanchette for many fruitful discussions. We thank the anonymous reviewers for their constructive feedback on this paper.

### References


Schulz, S., Ternovska, E. (eds.) The 8th International Workshop on the Implementation of Logics, IWIL 2010, Yogyakarta, Indonesia, 9 October 2011. EPiC Series in Computing, vol. 2, pp. 1–11. EasyChair (2010). https://doi.org/10.29007/36dt



# **SCL(FOL) Can Simulate Non-Redundant Superposition Clause Learning**

Martin Bromberger<sup>1</sup>, Chaahat Jain<sup>1,2</sup>, and Christoph Weidenbach<sup>1(✉)</sup>

<sup>1</sup> Max Planck Institute for Informatics, Saarbrücken, Germany

*{*mbromber,cjain,weidenbach*}*@mpi-inf.mpg.de

<sup>2</sup> Graduate School of Computer Science, Saarbrücken, Germany

**Abstract.** We show that SCL(FOL) can simulate the derivation of non-redundant clauses by superposition for first-order logic without equality. Superposition-based reasoning is performed with respect to a fixed reduction ordering. The completeness proof of superposition relies on the grounding of the clause set. It builds a ground partial model according to the fixed ordering, where minimal false ground instances of clauses then trigger non-redundant superposition inferences. We define a respective strategy for the SCL calculus such that clauses learned by SCL and superposition inferences coincide. From this perspective the SCL calculus can be viewed as a generalization of the superposition calculus.

**Keywords:** first-order reasoning · superposition · SCL · non-redundant clause learning

### **1 Introduction**

Superposition [1,2,18] is currently considered the prime calculus for first-order logic reasoning, and all leading first-order theorem provers implement a variant thereof [14,16,20,22]. More recently, the family of SCL calculi (Clause Learning from Simple Models, or just Simple Clause Learning) [4,8,9,11,17] was introduced. First experimental results [3] are available, as are first steps towards an overall implementation [5,7].

The main differences between superposition and SCL for first-order logic without equality are: (i) superposition assumes a fixed ordering on literals whereas the ordering in SCL is dynamic and evolves out of the satisfiability of clauses, (ii) superposition performs single superposition left and factoring inferences whereas SCL typically performs several such inferences to derive a single learned clause, (iii) the superposition model operator is not effective on the non-ground clause level whereas the SCL model assumption is effective. For first-order logic without equality, superposition reduces to ordered resolution combined with the powerful superposition redundancy criterion. Our simulation result cannot be one-to-one because an SCL learned clause is typically generated by several superposition inferences, and superposition factoring inferences are performed by SCL only in the context of resolution inferences. The simulation result considers the ground case, where the superposition strategy used in the completeness proof only triggers non-redundant inferences [1]. We call this strategy SUP-MO (Definition 5). Overall first-order superposition completeness is then obtained by a lifting argument to the non-ground clause level. We actually show that a superposition refutation of some ground clause set can be simulated by an SCL refutation on the same clause set, such that they coincide on all superposition left (ordered resolution) inferences. For the superposition calculus we refer to [1] and for SCL to [9], where all main properties of both calculi have meanwhile been verified inside the Isabelle framework [10,19,21].

For example, consider a superposition refutation of the simple ground clause set

$$N_{\mathrm{SUP}}^0 = \{ (C_1)\; P(a) \lor P(a), \quad (C_2)\; \neg P(a) \lor Q(b), \quad (C_3)\; \neg Q(b) \}$$

with respect to a KBO [13], where all symbols have weight one, and precedence $a \prec b \prec P \prec Q$. Superposition generates only non-redundant clauses. Then with respect to the usual superposition ordering extension to literals and clauses we get $(C_1) \prec_{\text{KBO-SUP}} (C_2) \prec_{\text{KBO-SUP}} (C_3)$ and the superposition model operator produces the Herbrand model $N^0_{\mathrm{SUP},I} = \emptyset$. Now clause $(C_1)$ is the minimal false clause, triggering a factoring inference resulting in $(C_4)\; P(a)$ and clause set $N^1_{\mathrm{SUP}} = N^0_{\mathrm{SUP}} \cup \{(C_4)\; P(a)\}$. The clause $P(a)$ cannot be derived by SCL because factoring is only performed in the context of resolution inferences. Now $(C_4)$ is the smallest clause in $N^1_{\mathrm{SUP}}$ and the superposition model operator produces $N^1_{\mathrm{SUP},I} = \{P(a), Q(b)\}$ with minimal false clause $(C_3)$. A superposition left inference between $(C_3)$ and $(C_2)$ generates $(C_5)\; \neg P(a)$ and $N^2_{\mathrm{SUP}} = N^1_{\mathrm{SUP}} \cup \{(C_5)\; \neg P(a)\}$. The generation of $\neg P(a)$ can now be simulated by SCL by constructing the SCL trail $[P(a)^1\, Q(b)^{\neg P(a) \lor Q(b)}]$ out of $N^0_{\mathrm{SUP}} = N^0_{\mathrm{SCL}}$, leading to the learned clause $(C_5)\; \neg P(a)$ and respective clause set $N^2_{\mathrm{SCL}} = N^0_{\mathrm{SCL}} \cup \{(C_5)\; \neg P(a)\}$. Note that $P(a)$ could have also been propagated, see Sect. 2 rule Propagate, but this would eventually not lead to the learned clause $(C_5)\; \neg P(a)$ but to $\bot$.
Finally, the superposition model operator produces $N^2_{\mathrm{SUP},I} = \{P(a), Q(b)\}$ with minimal false clause $(C_5)$ and infers $\bot$. The SCL simulation generates the trail $[P(a)^{P(a)}]$ and then learns $\bot$ as well out of a conflict with $(C_5)$. Note that this SCL trail is based on a factoring of $(C_1)$ to $P(a)$ that was the explicit first step of the superposition refutation. Recall that by using an exhaustive propagation strategy, SCL would start with the trail $[P(a)^{P(a)}\, Q(b)^{\neg P(a) \lor Q(b)}]$ and immediately derive $\bot$. Exhaustive propagation is not a good strategy in general, because first-order logic clauses may enable infinitely many propagations. Even together with the typical SCL restriction to finitely many ground instances, there are exponentially many propagations possible, in general. Therefore, the *regular* strategy defined in [9] does not require exhaustive propagation, but guarantees non-redundant clause learning. The SCL-SUP strategy (Definitions 8 and 10), which simulates superposition SUP-MO runs, is also a regular strategy (Lemma 17).

The paper is organized as follows. After recalling the needed concepts of SCL and superposition in Sect. 2, we present the simulation result in Sect. 3. We show that any superposition refutation of a ground clause set producing only non-redundant inferences through the SUP-MO strategy can be simulated via the SCL-SUP strategy. We establish the 14 simulation invariants of Definition 7 by induction on the length of the superposition refutation: for the initial state (Lemma 13), for intermediate superposition inference steps (Lemma 14), and for the final refutation (Lemmas 15 and 16). For the simulation we do not consider selection in superposition inferences, in favor of a less complicated presentation. The paper ends with a discussion of the obtained results. A full version of the paper including all proofs is available on arXiv [6].

### **2 Preliminaries**

We assume a first-order language without equality where N denotes a clause set; C, D denote clauses; L, K, H denote literals; A, B denote atoms; P, Q, R denote predicates; t, s terms; f, g, h function symbols; a, b, c constants; and x, y, z variables. Atoms, literals, clauses and clause sets are considered as usual, where in particular clauses are identified both with their disjunction and multiset of literals [9]. The complement of a literal is denoted by the function comp. The function atom(L) denotes the atomic part of a literal. Semantic entailment $\models$ is defined as usual where variables in clauses are assumed to be universally quantified. Substitutions σ, τ are total mappings from variables to terms, where $\mathrm{dom}(\sigma) := \{x \mid x\sigma \neq x\}$ is finite and $\mathrm{codom}(\sigma) := \{t \mid x\sigma = t, x \in \mathrm{dom}(\sigma)\}$. Their application is extended to literals, clauses, and sets of such objects in the usual way. A term, atom, clause, or a set of these objects is *ground* if it does not contain any variable. A substitution σ is *ground* if codom(σ) is ground. A substitution σ is *grounding* for a term t, literal L, clause C if tσ, Lσ, Cσ is ground, respectively. The function mgu denotes the *most general unifier* of two terms, atoms, literals. We assume that any mgu of two terms or literals does not introduce any fresh variables and is idempotent. A *closure* is denoted as C · σ and is a pair of a clause C and a substitution σ that is grounding for C. The function ground returns the set of all ground instances of a literal, clause, or clause set with respect to the signature of the respective clause set.

A *(partial) model* $M$ for a clause set $N$ is a satisfiable set of ground literals. A ground clause $C$ is true in $M$, denoted $M \models C$, if $C \cap M \neq \emptyset$, and false otherwise. A ground clause set $N$ is true in $M$, denoted $M \models N$, if all clauses from $N$ are true in $M$. A *(partial) Herbrand model* $I$ for a clause set $N$ is a set of ground atoms. A ground clause $C$ is true in $I$, denoted $I \models_H C$, if there is an atom $A \in C$ such that $A \in I$, or there is a negative literal $\neg A \in C$ such that $A \notin I$, and false otherwise. A ground clause set $N$ entails a ground clause $C$, denoted $N \models C$, if $M \models N$ implies $M \models C$ for all models $M$.
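The two truth notions differ in how they treat undefined atoms, which a small sketch can make concrete. The encoding of a ground literal as a pair `(atom, is_positive)` and both function names are assumptions made for illustration only; they do not come from the paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical encoding: a ground literal is (atom, is_positive).
Literal = Tuple[str, bool]


def true_in_partial_model(clause: Iterable[Literal], model: Set[Literal]) -> bool:
    """M |= C holds iff the clause shares at least one literal with M."""
    return any(lit in model for lit in clause)


def true_in_herbrand_model(clause: Iterable[Literal], interp: Set[str]) -> bool:
    """I |=_H C holds iff a positive atom of C is in I or a negated atom of C is not in I."""
    return any((atom in interp) == positive for atom, positive in clause)


# The clause ¬P(a) ∨ Q(b) from the introductory example.
c2 = [("P(a)", False), ("Q(b)", True)]
herbrand_result = true_in_herbrand_model(c2, set())
partial_result = true_in_partial_model(c2, {("P(a)", True)})
```

In the empty Herbrand model the clause is true because P(a) is not in the model, whereas in the partial model {P(a)} it counts as false because neither of its literals occurs in the model.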

We identify sets and sequences whenever appropriate. However, the trail of an SCL run is always a sequence of ground literals.

Let ≺ denote a well-founded, total, strict ordering on ground literals. This ordering is then lifted to clauses and clause sets by its respective multiset extension. We overload ≺ for literals, clauses, clause sets if the meaning is clear from the context. The ordering is lifted to the non-ground case via instantiation: we define $C \prec D$ if for all grounding substitutions σ it holds $C\sigma \prec D\sigma$. We define $\preceq$ as the reflexive closure of $\prec$ and $N^{\preceq C} := \{D \mid D \in N \text{ and } D \preceq C\}$.

**Definition 1 (Clause Redundancy).** *A ground clause* $C$ *is* redundant *with respect to a ground clause set* $N$ *and an order* $\prec$ *if* $N^{\preceq C} \models C$*. A clause* $C$ *is* redundant *with respect to a clause set* $N$ *and an order* $\prec$ *if for all* $C' \in \mathrm{ground}(C)$ *it holds that* $C'$ *is redundant with respect to* $\mathrm{ground}(N)$*.*

Let $\prec_B$ denote a well-founded, total, strict ordering on ground atoms such that for any ground atom $A$ there are only finitely many ground atoms $B$ with $B \prec_B A$. For example, an instance of such an ordering could be KBO without zero-weight symbols. (Note that LPO does not satisfy the last condition of a $\prec_B$ ordering although it is a well-founded, total, strict ordering.) The ordering $\prec_B$ is lifted to literals by comparing the respective atoms and if the atoms of two literals are the same, then the negative version of the literal is larger than the positive version. It is lifted to clauses by a multiset extension.
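The lifting from atoms to literals and clauses can be sketched as follows. This is only an illustration under stated assumptions: the atom ordering is supplied as an explicit, hypothetical `rank` dictionary rather than computed by a KBO, literals are encoded as `(atom, is_positive)`, and the function names are invented. Because the literal ordering is total, the multiset extension can be decided by comparing the literals of both clauses sorted in descending order.

```python
from typing import Dict, List, Tuple

Literal = Tuple[str, bool]          # hypothetical encoding: (atom, is_positive)
Clause = Tuple[Literal, ...]        # a ground clause as a multiset of literals


def literal_key(lit: Literal, rank: Dict[str, int]) -> Tuple[int, int]:
    """Compare atoms first; for equal atoms the negative literal is larger."""
    atom, positive = lit
    return (rank[atom], 0 if positive else 1)


def clause_key(clause: Clause, rank: Dict[str, int]) -> List[Tuple[int, int]]:
    """Literals sorted in descending order; lexicographic comparison of these
    lists coincides with the multiset extension of a total ordering."""
    return sorted((literal_key(lit, rank) for lit in clause), reverse=True)


def clause_less(c: Clause, d: Clause, rank: Dict[str, int]) -> bool:
    return clause_key(c, rank) < clause_key(d, rank)


# Atom ordering assumed for the introductory example: P(a) before Q(b).
rank = {"P(a)": 0, "Q(b)": 1}
c1 = (("P(a)", True), ("P(a)", True))      # P(a) ∨ P(a)
c2 = (("P(a)", False), ("Q(b)", True))     # ¬P(a) ∨ Q(b)
c3 = (("Q(b)", False),)                    # ¬Q(b)
ordered = clause_less(c1, c2, rank) and clause_less(c2, c3, rank)
```

With this atom ordering the sketch reproduces the clause ordering stated for the introductory example, with (C1) smaller than (C2) and (C2) smaller than (C3).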

*The SCL(FOL) Calculus:* The inference rules of SCL(FOL) [9] are represented by an abstract rewrite system. They operate on a problem state, a six-tuple $(\Gamma; N; U; \beta; k; D)$ where $\Gamma$ is a sequence of annotated ground literals, the *trail*; $N$ and $U$ are the sets of *initial* and *learned* clauses; $\beta$ is a ground literal limiting the size of the trail; $k$ counts the number of decisions; and $D$ is either $\top$, $\bot$ or a clause closure $C \cdot \sigma$ such that $C\sigma$ is ground and false in $\Gamma$. Literals in $\Gamma$ are either annotated with a number, also called a level, i.e., they have the form $L^k$ meaning that $L$ is the $k$-th guessed decision literal, or they are annotated with a closure that propagated the literal to become true. A ground literal $L$ is of *level* $i$ with respect to a problem state $(\Gamma; N; U; \beta; k; D)$ if $L$ or $\mathrm{comp}(L)$ occurs in $\Gamma$ and the first decision literal left from $L$ ($\mathrm{comp}(L)$) in $\Gamma$, including $L$, is annotated with $i$. If there is no such decision literal then its level is zero. A ground clause $D$ is of *level* $i$ with respect to a problem state $(\Gamma; N; U; \beta; k; D)$ if $i$ is the maximal level of a literal in $D$. The level of the empty clause $\bot$ is 0. Recall $D$ is a non-empty closure or $\top$ or $\bot$. Similarly, a trail $\Gamma$ is of level $i$ if the maximal literal in $\Gamma$ is of level $i$.
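The level of a literal can be read off a trail mechanically. The sketch below assumes a hypothetical trail encoding as a list of `(literal, annotation)` pairs, where the annotation is an integer for a decision literal and a ground clause (a tuple of literals) for a propagated literal; the name `level` and the choice to return `None` for undefined literals are illustrative assumptions.

```python
from typing import List, Optional, Tuple, Union

Literal = Tuple[str, bool]                 # hypothetical encoding: (atom, is_positive)
Clause = Tuple[Literal, ...]
Annotation = Union[int, Clause]            # decision level or propagating clause


def comp(lit: Literal) -> Literal:
    """Complement of a ground literal."""
    return (lit[0], not lit[1])


def level(lit: Literal, trail: List[Tuple[Literal, Annotation]]) -> Optional[int]:
    """Level of a literal: annotation of the closest decision at or left of it.

    Returns 0 if no decision precedes the literal and None if the literal
    is undefined in the trail (an assumption of this sketch)."""
    for idx, (entry, _) in enumerate(trail):
        if entry in (lit, comp(lit)):
            for _, annotation in reversed(trail[: idx + 1]):
                if isinstance(annotation, int):
                    return annotation
            return 0
    return None


# Trail of the introductory example: P(a) decided at level 1, Q(b) propagated.
reason = (("P(a)", False), ("Q(b)", True))
trail = [(("P(a)", True), 1), (("Q(b)", True), reason)]
level_of_not_q = level(("Q(b)", False), trail)
```

Here the literal ¬Q(b) has level 1, because Q(b) was propagated after the first decision P(a).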

A literal/atom L/A is *undefined* in Γ if neither L/A nor comp(L)/comp(A) occur in Γ. The start state of SCL is $(\epsilon; N; \emptyset; \beta; 0; \top)$ for some initial clause set $N$ and bound $\beta$. The rules below are exactly the rules from [9] and serve as a reference for our simulation proof in Sect. 3.

$$\textbf{Propagate}\quad (\Gamma; N; U; \beta; k; \top) \;\Rightarrow_{\text{SCL}}\; (\Gamma, L\sigma^{(C_0 \lor L)\delta \cdot \sigma}; N; U; \beta; k; \top)$$

provided $C \lor L \in (N \cup U)$, $C = C_0 \lor C_1$, $C_1\sigma = L\sigma \lor \cdots \lor L\sigma$, $C_0\sigma$ does not contain $L\sigma$, $\delta$ is the mgu of the literals in $C_1$ and $L$, $(C \lor L)\sigma$ is ground, $(C \lor L)\sigma \prec_\beta \{\beta\}$, $C_0\sigma$ is false under $\Gamma$, and $L\sigma$ is undefined in $\Gamma$.

**Decide** $(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\text{SCL}} (\Gamma, L\sigma^{k+1}; N; U; \beta; k+1; \top)$

provided $\mathrm{atom}(L)$ occurs in $C$ for a $C \in (N \cup U)$, $L\sigma$ is a ground literal undefined in $\Gamma$, and $L\sigma \prec_\beta \beta$.

**Conflict** $(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\text{SCL}} (\Gamma; N; U; \beta; k; D \cdot \sigma)$

provided $D \in (N \cup U)$, $D\sigma$ false in $\Gamma$ for a grounding substitution $\sigma$.

**Skip** $(\Gamma, L; N; U; \beta; k; D \cdot \sigma) \Rightarrow_{\text{SCL}} (\Gamma; N; U; \beta; k - i; D \cdot \sigma)$

provided comp(L) does not occur in Dσ, if L is a decision literal then i = 1, otherwise i = 0.

**Factorize** $(\Gamma; N; U; \beta; k; (D \lor L \lor L') \cdot \sigma) \Rightarrow_{\text{SCL}} (\Gamma; N; U; \beta; k; (D \lor L)\eta \cdot \sigma)$

provided $L\sigma = L'\sigma$, $\eta = \mathrm{mgu}(L, L')$.

**Resolve** $(\Gamma, L\delta^{(C \lor L) \cdot \delta}; N; U; \beta; k; (D \lor L') \cdot \sigma) \Rightarrow_{\text{SCL}} (\Gamma, L\delta^{(C \lor L) \cdot \delta}; N; U; \beta; k; (D \lor C)\eta \cdot \sigma\delta)$

provided $L\delta = \mathrm{comp}(L'\sigma)$, $\eta = \mathrm{mgu}(L, \mathrm{comp}(L'))$.

**Backtrack** $(\Gamma_0, K, \Gamma_1, \mathrm{comp}(L\sigma)^k; N; U; \beta; k; (D \lor L) \cdot \sigma) \Rightarrow_{\text{SCL}} (\Gamma_0; N; U \cup \{D \lor L\}; \beta; j; \top)$

provided $D\sigma$ is of level $i < k$, and $\Gamma_0, K$ is the minimal trail subsequence such that there is a grounding substitution $\tau$ with $(D \lor L)\tau$ false in $\Gamma_0, K$ but not in $\Gamma_0$, and $\Gamma_0$ is of level $j$.
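For ground clauses, the side conditions of Propagate reduce to a simple check that also shows how duplicate literals are factored implicitly. The following sketch is a simplification under explicit assumptions: it handles only ground clauses, ignores the bound β, drops trail annotations, and uses a hypothetical literal encoding `(atom, is_positive)` and invented function names.

```python
from typing import List, Sequence, Tuple

Literal = Tuple[str, bool]          # hypothetical encoding: (atom, is_positive)
Clause = Tuple[Literal, ...]


def comp(lit: Literal) -> Literal:
    """Complement of a ground literal."""
    return (lit[0], not lit[1])


def propagation_candidates(
    clauses: Sequence[Clause], trail: Sequence[Literal]
) -> List[Tuple[Literal, Clause]]:
    """Ground Propagate check: literal L of a clause can be propagated if L is
    undefined and all literals different from L are false under the trail.
    Duplicates of L are dropped together, which is the implicit factoring."""
    defined = set(trail) | {comp(lit) for lit in trail}
    found = []
    for clause in clauses:
        for lit in set(clause):
            rest = [other for other in clause if other != lit]
            if lit not in defined and all(comp(other) in trail for other in rest):
                found.append((lit, clause))
    return found


c1 = (("P(a)", True), ("P(a)", True))      # P(a) ∨ P(a)
c2 = (("P(a)", False), ("Q(b)", True))     # ¬P(a) ∨ Q(b)
c3 = (("Q(b)", False),)                    # ¬Q(b)
on_empty_trail = {lit for lit, _ in propagation_candidates([c1, c2, c3], [])}
```

On the empty trail, P(a) is a candidate from P(a) ∨ P(a) even though the literal occurs twice, which matches the remark in the introduction that P(a) could also have been propagated.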

A sequence of rule applications of a particular calculus is called a *run* of the calculus. A *strategy* for a calculus restricts the set of runs we actually allow by imposing further conditions on the allowed rule applications.

**Definition 2 (SCL Runs).** *A sequence of SCL rule applications is called a* reasonable run *if the rule Decide does not enable an immediate application of rule Conflict. A sequence of SCL rule applications is called a* regular run *if it is a reasonable run and the rule Conflict has precedence over all other rules.*

All regular SCL runs are sound, only derive non-redundant clauses, always terminate, and SCL with a regular strategy is refutationally complete (for first-order logic without equality) [9].

*The Superposition Calculus:* Superposition [1,2,18] is a calculus for first-order logic reasoning that also infers/learns new clauses like SCL. In contrast to SCL, it does these inferences based on a static ordering ≺ and, at the level of inference rules, independent of a partial model. A permissible ordering ≺ for the superposition calculus is always a well-founded, total, strict ordering on ground literals. This ordering is then lifted to clauses and clause sets by its respective multiset extension. A problem state in the superposition calculus is just a set N of clauses. The start state is the initial clause set. Due to the restriction to first-order logic without equality, the most basic version of the superposition calculus consists just of the following two rules (without selection):

**Superposition Left** $(N \uplus \{C_1 \lor P(t_1,\ldots,t_n),\; C_2 \lor \neg P(s_1,\ldots,s_n)\}) \Rightarrow_{\text{SUP}} (N \cup \{C_1 \lor P(t_1,\ldots,t_n),\; C_2 \lor \neg P(s_1,\ldots,s_n)\} \cup \{(C_1 \lor C_2)\sigma\})$

where (i) $P(t_1,\ldots,t_n)\sigma$ is strictly maximal in $(C_1 \lor P(t_1,\ldots,t_n))\sigma$, (ii) $\neg P(s_1,\ldots,s_n)\sigma$ is maximal, (iii) $\sigma$ is the mgu of $P(t_1,\ldots,t_n)$ and $P(s_1,\ldots,s_n)$.

**Factoring** $(N \uplus \{C \lor P(t_1,\ldots,t_n) \lor P(s_1,\ldots,s_n)\}) \Rightarrow_{\text{SUP}} (N \cup \{C \lor P(t_1,\ldots,t_n) \lor P(s_1,\ldots,s_n)\} \cup \{(C \lor P(t_1,\ldots,t_n))\sigma\})$

where (i) $P(t_1,\ldots,t_n)\sigma$ is maximal in $(C \lor P(t_1,\ldots,t_n) \lor P(s_1,\ldots,s_n))\sigma$, (ii) $\sigma$ is the mgu of $P(t_1,\ldots,t_n)$ and $P(s_1,\ldots,s_n)$.

Let sfac(C) represent a clause obtained by exhaustively applying superposition Factoring on C. Recall that superposition Factoring only applies to maximal positive literals. Let sfac(N) represent the clause set N after every clause has been exhaustively factorized by superposition Factoring.
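In the ground case, exhaustive Factoring amounts to merging all copies of the maximal literal when that literal is positive. The sketch below illustrates this reading; it is restricted to ground clauses, and the explicit `rank` dictionary for the atom ordering, the literal encoding `(atom, is_positive)`, and the function names are assumptions of the sketch.

```python
from typing import Dict, Tuple

Literal = Tuple[str, bool]          # hypothetical encoding: (atom, is_positive)
Clause = Tuple[Literal, ...]


def literal_key(lit: Literal, rank: Dict[str, int]) -> Tuple[int, int]:
    """Compare atoms first; for equal atoms the negative literal is larger."""
    atom, positive = lit
    return (rank[atom], 0 if positive else 1)


def sfac(clause: Clause, rank: Dict[str, int]) -> Clause:
    """Ground sketch of sfac: keep one copy of the maximal literal if it is
    positive; otherwise the clause is returned unchanged."""
    if not clause:
        return clause
    top = max(clause, key=lambda lit: literal_key(lit, rank))
    if not top[1]:
        return clause                      # maximal literal is negative
    rest = tuple(lit for lit in clause if lit != top)
    return rest + (top,)


rank = {"P(a)": 0, "Q(a)": 1}
factored = sfac((("P(a)", False), ("Q(a)", True), ("Q(a)", True)), rank)
```

For the clause ¬P(a) ∨ Q(a) ∨ Q(a) the result is ¬P(a) ∨ Q(a), while a clause whose maximal literal is negative is left untouched.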

Although the superposition calculus itself is independent of a partial model and may learn non-redundant clauses, the completeness proof of superposition in [1] is based on a strategy that builds ground partial models according to the fixed ordering ≺, where minimal false ground instances of clauses then trigger non-redundant superposition inferences. Note that the completeness proof relies on a grounding of the clause set that may lead to infinitely many clauses. However, the strategy from the completeness proof can also be seen as a superposition strategy for an initial clause set, where all clauses are already ground. On ground, finite clause sets, superposition restricted to the strategy only infers non-redundant clauses, always terminates, and is complete. The partial model needed in each step of the strategy is constructed according to the following model operator:

**Definition 3 (Superposition Model Operator).** *Let* $N$ *be a set of ground clauses. Then* $N_I$ *is the Herbrand model according to the superposition model operator for clause set* $N$ *and it is constructed recursively over the partial Herbrand models* $N_C$ *for all* $C \in N$*:*

$$\begin{array}{lcl} N_C = \bigcup_{D \prec C} \delta_D & \quad & N_I = \bigcup_{C \in N} \delta_C\\[4pt] \delta_D = \begin{cases} \{B\} & \text{if } D = D' \lor B,\ B \text{ strictly maximal in } D, \text{ and } N_D \not\models_H D \\ \emptyset & \text{otherwise} \end{cases} \end{array}$$

*We say that a clause* $C$ *is* productive *(wrt. the model construction of a clause set* $N$*) if* $\delta_C \neq \emptyset$*. We say that a clause* $C$ produces *an atom* $B$ *(wrt. the model construction of a clause set* $N$*) if* $\delta_C = \{B\}$*.*

After constructing the model $N_I$ for a clause set $N$, the strategy selects the smallest clause in $N$ that is false in $N_I$. The strategy then selects a fitting inference rule based on the reason why the clause is false in $N_I$. The newly inferred clause either changes the model in the next step or changes the smallest clause that is false. This is the strategy used in the superposition completeness proof [1].

**Definition 4 (Minimal False Clause).** *The minimal false clause* $C \in N$ *is the smallest clause in* $N$ *according to* $\prec$ *such that* $N_C \cup \delta_C \not\models_H C$*.*
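For a finite set of ground clauses, the model operator and the minimal false clause can be computed in a single pass over the clauses in ascending order. The following is a minimal sketch under stated assumptions: the atom ordering is given by a hypothetical `rank` dictionary, literals are encoded as `(atom, is_positive)`, clauses are tuples of literals, and all function names are invented for illustration.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Literal = Tuple[str, bool]          # hypothetical encoding: (atom, is_positive)
Clause = Tuple[Literal, ...]


def literal_key(lit: Literal, rank: Dict[str, int]) -> Tuple[int, int]:
    atom, positive = lit
    return (rank[atom], 0 if positive else 1)


def clause_key(clause: Clause, rank: Dict[str, int]) -> List[Tuple[int, int]]:
    # Descending literal keys decide the multiset extension of a total ordering.
    return sorted((literal_key(lit, rank) for lit in clause), reverse=True)


def true_in_herbrand_model(clause: Clause, interp: Set[str]) -> bool:
    return any((atom in interp) == positive for atom, positive in clause)


def construct(
    clauses: Iterable[Clause], rank: Dict[str, int]
) -> Tuple[Set[str], Optional[Clause]]:
    """Return (N_I, minimal false clause or None) for a ground clause set."""
    interp: Set[str] = set()
    min_false: Optional[Clause] = None
    for clause in sorted(set(clauses), key=lambda c: clause_key(c, rank)):
        if true_in_herbrand_model(clause, interp):
            continue                       # true in N_C, hence not productive
        keys = clause_key(clause, rank)
        top_positive = bool(keys) and keys[0][1] == 0
        strictly_max = len(keys) == 1 or (len(keys) > 1 and keys[0] != keys[1])
        if top_positive and strictly_max:
            top = max(clause, key=lambda lit: literal_key(lit, rank))
            interp.add(top[0])             # the clause produces its maximal atom
        elif min_false is None:
            min_false = clause             # false in N_C and not productive
    return interp, min_false


rank = {"P(a)": 0, "Q(b)": 1}
c1 = (("P(a)", True), ("P(a)", True))
c2 = (("P(a)", False), ("Q(b)", True))
c3 = (("Q(b)", False),)
c4 = (("P(a)", True),)
model0, false0 = construct([c1, c2, c3], rank)
model1, false1 = construct([c1, c2, c3, c4], rank)
```

On the introductory example this yields the empty model with minimal false clause (C1) for the initial clause set, and the model {P(a), Q(b)} with minimal false clause (C3) once (C4) has been added, in line with the run described in Sect. 1.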

**Definition 5 (Superposition Model-Operator Strategy: SUP-MO).** *The* superposition model-operator strategy *is defined over the minimal false clause with respect to the current clause set* $N$*. The strategy can encounter the following cases:*


The first two cases of the SUP-MO strategy also describe its final states according to [1]. In all other states there is always exactly one rule applicable according to the SUP-MO strategy, which also means that SUP-MO is never stuck.

**Lemma 6 (SUP-MO Applicability).** *Let* $N$ *be a set of ground clauses. If* $N$ *has a minimal false clause* $C \neq \bot$*, then there exists exactly one rule applicable to* $N$ *according to the SUP-MO strategy.*

### **3 SCL Simulates Superposition**

In general, it is not possible to simulate all inferences of the superposition calculus with SCL because SCL only learns/infers non-redundant clauses, whereas syntactic superposition inferences have no such guarantees. Moreover, the inferences by SCL are all based on conflicts according to a partial model driven by the satisfiability of clause instances, whereas the inferences by superposition are based on a static ordering ≺. We can mitigate these differences by restricting superposition with the SUP-MO strategy because SUP-MO has non-redundancy guarantees and it infers new clauses based on minimal false clauses with respect to a ground partial model.

Let $N^0$ be a set of ground clauses, totally ordered by a superposition reduction ordering $\prec$. Let $N^i$ (for $i > 0$) be the result of $i$ steps of the superposition calculus applied to $N^0$ according to the SUP-MO strategy, i.e., $N^0 \Rightarrow_{\text{SUP-MO}} N^1 \Rightarrow_{\text{SUP-MO}} \ldots \Rightarrow_{\text{SUP-MO}} N^i$. Again, all $N^i$ are sets of ground clauses, totally ordered by a superposition reduction ordering $\prec$. The SCL strategy SCL-SUP that simulates superposition restricted to SUP-MO runs is defined inductively on the clause ordering $\prec$. To guide and to prove the correctness of our simulation, we assign to each SCL state and every clause some additional information. For this purpose, every SCL state is annotated with a triple $(i, C, \gamma)$, where $i$ is an integer that states that the SCL state simulates the superposition state $N^i$; $C$ is the last clause that was used as a decision aid by the strategy; and $\gamma$ is a function such that $\gamma(C) = \mathrm{sfac}(C)$ if $\mathrm{sfac}(C) \in N^i$ and $\gamma(C) = C$ otherwise. The SCL state also simulates the model construction for $N^i$ up to $N^i_{C'} \cup \delta_{C'}$, where $C' = \gamma(C)$. The annotated states are written $(\Gamma; N^0; U; \beta; k; E)_{(i,C,\gamma)}$. The overall start state is then $(\epsilon; N^0; \emptyset; \beta; 0; \top)_{(0,\bot,\gamma)}$, where we assume $\beta$ large enough so $A \prec_\beta \beta$ for all $A \in \mathrm{atom}(N^0)$, $\bot \notin N^0$, and $\gamma(C) = \mathrm{sfac}(C)$ if $\mathrm{sfac}(C) \in N^0$ and $\gamma(C) = C$ otherwise. We will later see that the annotated integer is not relevant for the actual choice of SCL rules by the SCL-SUP strategy but only to prove that the strategy actually simulates superposition.
Moreover, we define a new ordering $\prec_\gamma$ based on our superposition ordering $\prec$ and function $\gamma$ such that $C \prec_\gamma D$ if $\gamma(C) \prec \gamma(D)$.

**Definition 7 (State Simulation).** *Let* $(\Gamma; N^0; U; \beta; k; E)_{(i,D,\gamma)}$ *be an SCL state for the input clauses* $N^0$*. Let* $L$ *be the maximal literal in* $D$ *if* $D \neq \bot$ *and the minimal literal according to* $\prec$ *otherwise. Let* $N^0 \Rightarrow_{\text{SUP-MO}} N^1 \Rightarrow_{\text{SUP-MO}} \ldots \Rightarrow_{\text{SUP-MO}} N^i$ *be the superposition run following the SUP-MO strategy starting from the input clause set* $N^0$*. Let* $D' = \gamma(D)$*. Then we say that the SCL state* $(\Gamma; N^0; U; \beta; k; E)_{(i,D,\gamma)}$ simulates $N^i$ *and the model construction up to* $N^i_{D'} \cup \delta_{D'}$ *if*

	- **(ix)** *for every clause* $C \in N^i$ *with* $C \preceq \gamma(D)$ *that produces an atom* $B$*, i.e.,* $\delta_C = \{B\}$*, there exists* $C' \in N^0 \cup U$ *such that* $C = \gamma(C')$ *and* $C' \preceq_\gamma D$*.*
	- **(x)** $\Gamma$ *contains only decisions if* $E = \top$
	- **(xi)** $E \notin \{\top, \bot\}$ *iff* $\Gamma = \Gamma', B^{\mathrm{sfac}(D)}$*,* $\Gamma'$ *contains only decisions, there exists* $E' \in N^i$ *where* $\gamma(E) = E' = E$ *is the minimal false clause in* $N^i$*, and*

The above invariants can be summarized as follows: (i) All ground atoms encountered are known from the start and the trail bound $\beta$ is large enough so SCL can Decide/Propagate them. (ii)–(iv) Every initial clause $C$ or inferred clause by SUP-MO must coincide with an initial clause $C'$ or learned clause by SCL; this means on the one hand that for every clause $C'$ learned by SCL-SUP, SUP-MO infers a clause $C$ that is identical up to factoring; on the other hand it means that for every clause $C$ inferred by SUP-MO, SCL-SUP learns a clause $C'$ that entails $C$ (i.e., $C' \models C$) and is at most as large as $C$ wrt. $\gamma$. (v)–(ix) The partial models constructed by SCL-SUP and SUP-MO coincide and any atom $B$ in $N_C \cup \delta_C$ produced by clause $D$ has a clause $D'$ on the SCL side that could propagate $B$ and vice versa. (x)–(xiii) Ensure that any Conflict in SCL-SUP corresponds to a minimal false clause and that the trail is always constructed in such a way that the Resolve applications per Conflict call are limited to the maximal literal in the conflict; this property is needed because otherwise the next clause that would be learned by SCL no longer coincides with the clauses learned by SUP-MO. (xiv) Describes the final state in case the input clause set is unsatisfiable.

Now that we have defined what an SCL state must look like in order to simulate a superposition state, we define SCL-SUP, the SCL strategy that eventually simulates a SUP-MO run. First, note that not all states visited by SCL-SUP satisfy the invariants of Definition 7. However, the invariants hold again after each so-called *atomic sequence* of SCL-SUP steps. Second, one atomic sequence of SCL-SUP steps may skip over several successive superposition states. The reason is that SCL can and must skip all steps of SUP-MO that occur because the maximal literal in a clause is not strictly maximal, i.e., superposition Factoring steps. SCL performs factoring implicitly in its Propagate rule, so SCL never has to explicitly simulate case (4) of Definition 5. Third, the definition of the SCL-SUP strategy is split in two parts and each part describes some atomic sequences of SCL-SUP steps.

**Definition 8 (SCL Superposition Strategy: SCL-SUP Part 1).** *Let* $S_0 = (\Gamma; N^0; U; \beta; k; \top)_{(i,C,\gamma)}$ *be an SCL state with additional annotations for the strategy. Let* $D$ *be the next largest clause from* $C$ *in the ordering* $\prec_\gamma$ *with respect to the ground clause set* $N^0 \cup U$*. Let* $L$ *be the maximal literal of* $D$*. Let* $[\neg A_1, \neg A_2, \ldots, \neg A_n]$ *be all negative literals such that for all* $i$ *we have* $A_i \prec L$*, all* $A_i$ *undefined in* $\Gamma$*,* $A_i$ *occurs in* $N^0 \cup U$*, and* $A_i \prec A_{i+1}$*. Let* $D' = \gamma(D)$ *be in* $N^i$ *such that* $\mathrm{sfac}(D) = \mathrm{sfac}(D')$*. Let* $j_0 + 1$ *be the number of occurrences of* $L$ *in* $D'$ *and* $j = i + j_0$*. Then the* SCL Superposition Strategy *(SCL-SUP) performs the following steps to* $S_0$ *(possibly without any actual SCL rule applications, just changing the state annotation):*


*A (potentially empty) sequence of SCL rule applications according to SCL-SUP is called an* atomic sequence *of SCL-SUP steps if it starts from a state* $S_0$ *and ends in a state* $S_2$ *outlined in the cases (2a-c).*

The first part of the strategy simulates the recursive construction of the partial model used in the SUP-MO strategy (see Definition 3). It assumes that the model is already constructed up to the current annotated clause $C$ and extends this model for the next largest clause $D \in (N^0 \cup U)$. To this end, it uses the rule Decide in step (1) to set all atoms $A$ to false that are still undefined but can no longer be produced by any clause greater or equal to $D$. Next the strategy makes a case distinction. Step (2a) handles the case where $D$ corresponds to a clause $D'$ in the superposition state (modulo some Factoring steps skipped by SCL) that produces atom $B$; SCL-SUP then adds $B$ to the trail with the rule Decide because producing/adding this atom does not falsify a clause. Step (2b) handles a case similar to step (2a), but here producing/adding the atom $B$ to the trail results in a minimal false clause $E$; in order to force a resolution step between clause $D$ and $E$, SCL-SUP first uses Propagate to add $B$ to the trail and then applies Conflict to $E$. Step (2c) handles the case where $D$ corresponds to a clause $D'$ that will not produce an atom $B$ even modulo some Factoring steps; in this case no further SCL rule applications are necessary as the SUP-MO model will not change. Note that the annotated function $\gamma$ is needed so the SCL state knows when the superposition state would have applied Factoring to a clause $C$, which also means that it is now treated as its factorized version $\gamma(C) = \mathrm{sfac}(C)$ in our inductive clause ordering.

*Example 9.* Let us now further demonstrate the three different cases of the first part of the SCL-SUP strategy with the help of an example. Let $N^0$ be our initial set of clauses:

$$\begin{array}{rl} N^0 = \{ & (C_1)\ P(a), \quad (C_2)\ \neg P(b) \lor Q(a), \quad (C_3)\ \neg P(a) \lor Q(a) \lor Q(a), \\ & (C_4)\ P(a) \lor \neg Q(a), \quad (C_5)\ \neg P(a) \lor \neg Q(a)\ \} \end{array}$$

We compare the run of SCL-SUP for N<sup>0</sup> with the run of SUP-MO for N<sup>0</sup> to demonstrate that both runs coincide. As superposition ordering, we choose an LPO with precedence a ≺ b ≺ P ≺ Q. This means that the atoms are ordered P(a) ≺ P(b) ≺ Q(a) ≺ Q(b) and the clauses in N<sup>0</sup> are ordered C<sub>1</sub> ≺ C<sub>2</sub> ≺ C<sub>3</sub> ≺ C<sub>4</sub> ≺ C<sub>5</sub>. The initial SUP-MO state is simply the clause set N<sup>0</sup> and the initial SCL-SUP state is (ε, N<sup>0</sup>, ∅, β, 0, ⊤)<sub>(0,⊥,γ<sub>0</sub>)</sub>, where γ<sub>0</sub>(C) = C for all clauses C. In the first step, SCL-SUP selects the clause C<sub>1</sub> as its new decision aid because it is the next largest clause in N<sup>0</sup> compared to ⊥. Then SCL-SUP continues with step (1) of Definition 8. In this step SCL-SUP does nothing because there are no atoms smaller than P(a). Next, SCL-SUP detects that the maximal literal of C<sub>1</sub> is positive, ε ⊭ C<sub>1</sub>, and that the trail [P(a)<sup>1</sup>] does not result in a conflict. Therefore, SCL-SUP follows step (2a) of Definition 8 and Decides P(a), which results in the state ([P(a)<sup>1</sup>], N<sup>0</sup>, ∅, β, 1, ⊤)<sub>(0,C<sub>1</sub>,γ<sub>0</sub>)</sub>. Meanwhile, SUP-MO starts constructing a model for N<sup>0</sup>, beginning with the clause C<sub>1</sub>. The result is that C<sub>1</sub> is productive, δ<sub>C<sub>1</sub></sub> = {P(a)} and N<sup>0</sup><sub>C<sub>1</sub></sub> = ∅, which coincides with our new SCL trail.

SCL-SUP considers the clause C<sub>2</sub> as its new decision aid and continues with step (1) of Definition 8. This time there is an atom smaller than the maximal literal of C<sub>2</sub>, namely P(b). Therefore, SCL-SUP Decides ¬P(b) in step (1) of Definition 8, which results in ([P(a)<sup>1</sup>, ¬P(b)<sup>2</sup>], N<sup>0</sup>, ∅, β, 2, ⊤)<sub>(0,C<sub>2</sub>,γ<sub>0</sub>)</sub>. Next, SCL-SUP detects that the maximal literal of C<sub>2</sub> is positive but that [P(a)<sup>1</sup>, ¬P(b)<sup>2</sup>] ⊨ C<sub>2</sub>. Therefore, SCL-SUP follows step (2c) of Definition 8 and ends this atomic sequence immediately. SUP-MO continues the model construction for N<sup>0</sup> with the clause C<sub>2</sub>. The clause C<sub>2</sub> is not productive because N<sup>0</sup><sub>C<sub>2</sub></sub> ⊨<sub>H</sub> C<sub>2</sub>, where N<sup>0</sup><sub>C<sub>2</sub></sub> = δ<sub>C<sub>1</sub></sub> = {P(a)} and δ<sub>C<sub>2</sub></sub> = ∅, which again coincides with our new SCL trail as Herbrand models do not explicitly define atoms assigned to false.

SCL-SUP now considers the clause C<sub>3</sub> as its new decision aid and continues with step (1) of Definition 8. In this step SCL-SUP does nothing because all atoms smaller than Q(a) are already assigned. Next, SCL-SUP detects that the maximal literal of C<sub>3</sub> is positive, [P(a)<sup>1</sup>, ¬P(b)<sup>2</sup>] ⊭ C<sub>3</sub>, and that the clause C<sub>5</sub> is false with respect to the trail [P(a)<sup>1</sup>, ¬P(b)<sup>2</sup>, Q(a)<sup>sfac(C<sub>3</sub>)</sup>]. Therefore, SCL-SUP follows step (2b) of Definition 8, i.e., it Propagates Q(a) and applies Conflict to C<sub>5</sub>, resulting in ([P(a)<sup>1</sup>, ¬P(b)<sup>2</sup>, Q(a)<sup>sfac(C<sub>3</sub>)</sup>], N<sup>0</sup>, ∅, β, 2, C<sub>3</sub>)<sub>(1,C<sub>2</sub>,γ<sub>1</sub>)</sub>, where γ<sub>1</sub> is identical to γ<sub>0</sub> except that γ<sub>1</sub>(C<sub>3</sub>) = sfac(C<sub>3</sub>) = ¬P(a) ∨ Q(a). Note that SCL-SUP must change the state annotations because the maximal literal in C<sub>3</sub> is not strictly maximal, so SCL-SUP skips and eventually silently performs the Factorization step performed by SUP-MO. Note also that in the changed clause ordering ≺<sub>γ<sub>1</sub></sub> the order of C<sub>2</sub> and C<sub>3</sub> changed, i.e., C<sub>3</sub> ≺<sub>γ<sub>1</sub></sub> C<sub>2</sub>, which corresponds to sfac(C<sub>3</sub>) ≺ C<sub>2</sub>. Meanwhile, SUP-MO continues the model construction for N<sup>0</sup> with the clause C<sub>3</sub>. The clause C<sub>3</sub> is not productive because its maximal literal is not strictly maximal, so δ<sub>C<sub>3</sub></sub> = ∅ and N<sup>0</sup><sub>C<sub>3</sub></sub> ∪ δ<sub>C<sub>3</sub></sub> ⊭<sub>H</sub> C<sub>3</sub>, which makes C<sub>3</sub> the minimal false clause in N<sup>0</sup>. SUP-MO resolves this conflict by applying Factoring to C<sub>3</sub>, which means SUP-MO infers the clause C<sub>6</sub> = sfac(C<sub>3</sub>) = ¬P(a) ∨ Q(a). The new clause order in superposition state N<sup>1</sup> = N<sup>0</sup> ∪ {C<sub>6</sub>} is C<sub>1</sub> ≺ C<sub>6</sub> ≺ C<sub>2</sub> ≺ C<sub>3</sub> ≺ C<sub>4</sub> ≺ C<sub>5</sub>, which matches the changed ordering C<sub>3</sub> ≺<sub>γ<sub>1</sub></sub> C<sub>2</sub> because C<sub>6</sub> = γ<sub>1</sub>(C<sub>3</sub>). Next, SUP-MO updates its model construction for N<sup>1</sup>. The result is that C<sub>1</sub> and C<sub>6</sub> are productive and that N<sup>1</sup><sub>C<sub>6</sub></sub> ∪ δ<sub>C<sub>6</sub></sub> = {P(a), Q(a)}, which matches the current SCL trail. Moreover, if we continue the model construction up to C<sub>5</sub> then no new literals are produced and C<sub>5</sub> also turns into the minimal false clause for N<sup>1</sup>.
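The model construction traced in this example can be replayed mechanically. The following is a minimal sketch for the ground, non-equational case only; the literal encoding, the helper names (`lit_key`, `clause_key`, `build_model`) and the use of sorted-descending lexicographic comparison for the multiset extension are assumptions made for illustration, not the paper's formalisation.

```python
from typing import Dict, List, Optional, Set, Tuple

# Atom order from Example 9: P(a) < P(b) < Q(a) < Q(b).
ATOM_RANK = {"P(a)": 0, "P(b)": 1, "Q(a)": 2, "Q(b)": 3}

Literal = Tuple[str, bool]      # (atom, is_positive)
Clause = Tuple[Literal, ...]    # a multiset of ground literals


def lit_key(lit: Literal) -> Tuple[int, int]:
    """Literal order: A < not A < B whenever atom A < atom B."""
    atom, positive = lit
    return (ATOM_RANK[atom], 0 if positive else 1)


def clause_key(clause: Clause) -> List[Tuple[int, int]]:
    """Multiset extension of a total order, compared via descending sort."""
    return sorted((lit_key(l) for l in clause), reverse=True)


def is_true(clause: Clause, model: Set[str]) -> bool:
    """A clause holds if some literal agrees with the Herbrand model."""
    return any((atom in model) == positive for atom, positive in clause)


def build_model(clauses: Dict[str, Clause]):
    """Return (model, producers, name of the minimal false clause or None)."""
    order = sorted(clauses, key=lambda n: clause_key(clauses[n]))
    model: Set[str] = set()
    producers: Dict[str, str] = {}
    for name in order:
        clause = clauses[name]
        if not clause or is_true(clause, model):
            continue
        top = max(clause, key=lit_key)
        atom, positive = top
        # Productive: false so far, maximal literal positive and strictly maximal.
        if positive and clause.count(top) == 1:
            model.add(atom)
            producers[atom] = name
    false: Optional[str] = next(
        (n for n in order if not is_true(clauses[n], model)), None
    )
    return model, producers, false


N0: Dict[str, Clause] = {
    "C1": (("P(a)", True),),
    "C2": (("P(b)", False), ("Q(a)", True)),
    "C3": (("P(a)", False), ("Q(a)", True), ("Q(a)", True)),
    "C4": (("P(a)", True), ("Q(a)", False)),
    "C5": (("P(a)", False), ("Q(a)", False)),
}
# N1 adds the factor C6 = sfac(C3).
N1 = dict(N0, C6=(("P(a)", False), ("Q(a)", True)))

if __name__ == "__main__":
    for label, clause_set in (("N0", N0), ("N1", N1)):
        model, producers, false = build_model(clause_set)
        print(label, sorted(model), producers, false)
```

Under these assumptions the sketch reports C<sub>3</sub> as the minimal false clause of N<sup>0</sup> and, once C<sub>6</sub> is added, the model {P(a), Q(a)} with C<sub>5</sub> as the minimal false clause of N<sup>1</sup>, in line with the run described above.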

**Definition 10 (SCL Superposition Strategy: SCL-SUP Part 2).** *Let* <sup>S</sup><sup>0</sup> = (Γ, Bsfac(C) ; <sup>N</sup><sup>0</sup>;U; <sup>β</sup>; <sup>k</sup>; <sup>E</sup>)(i,C,γ) *be an SCL state with* <sup>E</sup> ∈ {, ⊥} *and additional annotations for the strategy. Let* L <sup>=</sup> <sup>¬</sup>B *be the maximal literal of* E*. Let* Γ *contain only decision literals. Let all atoms* A *occurring in* N<sup>0</sup> <sup>∪</sup> <sup>U</sup> *with* A <sup>≺</sup> B *be defined in* Γ *following the order* <sup>≺</sup>*, i.e., for all* A *occurring in* N<sup>0</sup> <sup>∪</sup> U *with* A <sup>≺</sup> B *there exist* Γ *and* Γ *such that* Γ <sup>=</sup> Γ , L<sup>A</sup>, Γ*,* <sup>L</sup><sup>A</sup> <sup>=</sup> <sup>A</sup> *or* <sup>L</sup><sup>A</sup> <sup>=</sup> <sup>¬</sup><sup>A</sup> *and all atoms* <sup>A</sup> <sup>∈</sup> <sup>N</sup><sup>0</sup> <sup>∪</sup><sup>U</sup> *with* <sup>A</sup> <sup>≺</sup> <sup>A</sup> *are defined in* <sup>Γ</sup> *. Let* E *be contained in* N<sup>i</sup> *. Let* <sup>j</sup><sup>0</sup> *be the number of occurrences of* <sup>L</sup> *in* <sup>E</sup> *and* <sup>j</sup> <sup>=</sup> <sup>i</sup> <sup>+</sup> <sup>j</sup><sup>0</sup>*. Let* sfac(C) = <sup>C</sup><sup>1</sup> <sup>∨</sup> <sup>B</sup> *and* <sup>E</sup> <sup>=</sup> <sup>E</sup> <sup>∨</sup> <sup>E</sup>*, where* <sup>E</sup> *contains all occurrences of* <sup>L</sup> *in* E*. Then the* SCL Superposition Strategy *(SCL-SUP) performs the following steps to* S<sup>0</sup>*:*


*A (potentially empty) sequence of SCL rule applications according to SCL-SUP is called an* atomic sequence *of SCL-SUP steps if it starts from a state* S<sub>0</sub> *and ends in a state* S<sub>5</sub> *outlined in the cases (2a) and (5a-c).*

The second part of the strategy simulates the actual inferences resulting from a minimal false clause found in step (2b) of Definition 8 or found in steps (4a) and (4c) of Definition 10. These inferences always correspond to Superposition Left steps of the SUP-MO strategy that resolve minimal false clauses E in N<sup>i</sup> with maximal literal ¬B with the clause C′ in N<sup>i</sup> that produced B. Note, however, that SCL-SUP may combine several Superposition Left steps of the SUP-MO strategy into one new learned clause. This is the case whenever the maximal literal ¬B in the minimal false clause E in N<sup>i</sup> is not strictly maximal. In this case, the next minimal false clause will always correspond to the last inferred clause, the maximal literal of this clause will still be ¬B, the clause producing B will again be C′, and therefore the next Superposition Left partner is again C′. Moreover, all of the skipped inferences are actually redundant with respect to the final inference E′<sub>2</sub> in this chain, which explains why SCL-SUP is still capable of simulating SUP-MO although it skips the intermediate inferences. The actual SCL-SUP clause E<sub>2</sub> corresponding to the final SUP-MO inference E′<sub>2</sub> is computed in steps (1) and (2) of Definition 10 with greedy applications of the rules Resolve and Factorize. The following steps of Definition 10 take care of the four different ways in which E′<sub>2</sub> changes the model and the minimal false clause in N<sup>j</sup>. The first case is that E′<sub>2</sub> = ⊥, so SUP-MO has reached a final state. This case is handled by step (2a) of Definition 10, which simply empties the trail with applications of the rule Skip so that the resulting SCL state has the form of an SCL-SUP final state. The second case is that the maximal literal L<sub>1</sub> in E′<sub>2</sub> is negative. In this case, the model for N<sup>i</sup> and N<sup>j</sup> is still the same and just the minimal false clause changes to E′<sub>2</sub>. This case is handled by steps (2b)–(4a) of Definition 10, which Backtrack before comp(L<sub>1</sub>) was decided, propagate it instead and apply Conflict to E<sub>2</sub>. In the third and fourth case the maximal literal L<sub>1</sub> in E′<sub>2</sub> is positive. In this case, the model for N<sup>i</sup> and N<sup>j</sup> actually changes because E′<sub>2</sub> is always productive. Case (2b)–(4b) of Definition 10 handles the case where producing L<sub>1</sub> leads to no new minimal false clause, and case (2b)–(4c) of Definition 10 handles the case where it does. Both cases work symmetrically to steps (2a) and (2b) of Definition 8.

*Example 11.* We continue Example 9 to demonstrate cases (1)→(4a) and (1)→(2a) of the second part of the SCL-SUP strategy. We left the runs in the SCL state ([P(a)<sup>1</sup>, ¬P(b)<sup>2</sup>, Q(a)<sup>sfac(C<sub>3</sub>)</sup>], N<sup>0</sup>, ∅, β, 2, C<sub>3</sub>)<sub>(1,C<sub>2</sub>,γ<sub>1</sub>)</sub> that simulates the superposition state N<sup>1</sup>, where

$$\begin{array}{lll} N^1 = \{(C\_1)\ P(a), & (C\_2)\ \neg P(b) \lor Q(a), & (C\_3)\ \neg P(a) \lor Q(a) \lor Q(a),\\ \phantom{N^1 = \{}(C\_4)\ P(a) \lor \neg Q(a), & (C\_5)\ \neg P(a) \lor \neg Q(a), & (C\_6)\ \neg P(a) \lor Q(a)\} \end{array}$$

and C<sub>5</sub> became the minimal false clause in N<sup>1</sup> after C<sub>1</sub> and C<sub>6</sub> together produced the partial model {P(a), Q(a)}. SUP-MO continues from the state N<sup>1</sup> by applying Superposition Left to C<sub>5</sub> and C<sub>6</sub>. In the new state N<sup>2</sup> = N<sup>1</sup> ∪ {(C<sub>7</sub>) ¬P(a) ∨ ¬P(a)} the new clause order is C<sub>1</sub> ≺ C<sub>7</sub> ≺ C<sub>6</sub> ≺ C<sub>2</sub> ≺ C<sub>3</sub> ≺ C<sub>4</sub> ≺ C<sub>5</sub> and, after constructing the model for C<sub>1</sub>, which again produces P(a), the clause C<sub>7</sub> becomes the new minimal false clause. SCL-SUP follows (1) of Definition 10 and applies Resolve to C<sub>5</sub> and sfac(C<sub>3</sub>) = C<sub>6</sub>, resulting in the state ([P(a)<sup>1</sup>, ¬P(b)<sup>2</sup>, Q(a)<sup>sfac(C<sub>3</sub>)</sup>], N<sup>0</sup>, ∅, β, 2, C<sub>7</sub>)<sub>(2,C<sub>2</sub>,γ<sub>1</sub>)</sub>. Then SCL-SUP continues with steps (2b) and (3) by applying Skip twice and Backtrack once to jump to the state (ε, N<sup>0</sup>, {C<sub>7</sub>}, β, 0, ⊤)<sub>(2,C<sub>2</sub>,γ<sub>1</sub>)</sub>. Next, SCL-SUP continues with step (4a) because the maximal literal of C<sub>7</sub> is ¬P(a) and therefore negative. This means SCL-SUP will add P(a) to the trail again, but this time by applying Propagate to C<sub>1</sub>, and afterwards it applies Conflict to C<sub>7</sub>. The resulting state ([P(a)<sup>sfac(C<sub>1</sub>)</sup>], N<sup>0</sup>, {C<sub>7</sub>}, β, 0, C<sub>7</sub>)<sub>(2,C<sub>1</sub>,γ<sub>1</sub>)</sub> again matches the SUP-MO state N<sup>2</sup>.

SUP-MO continues from the state N<sup>2</sup> by applying Superposition Left to C<sub>7</sub> and C<sub>1</sub>, resulting in N<sup>3</sup> = N<sup>2</sup> ∪ {(C<sub>8</sub>) ¬P(a)}. Since C<sub>8</sub> has the same maximal literal as C<sub>7</sub>, it automatically becomes the next minimal false clause in N<sup>3</sup>. As a result, SUP-MO applies Superposition Left to C<sub>8</sub> and C<sub>1</sub>, which returns N<sup>4</sup> = N<sup>3</sup> ∪ {(C<sub>9</sub>) ⊥}, a final state that proves the unsatisfiability of N<sup>0</sup>. Meanwhile, SCL-SUP simulates both Superposition Left steps with one atomic SCL-SUP sequence. It starts with step (1) of Definition 10 and applies Resolve twice, resulting in the state ([P(a)<sup>1</sup>, ¬P(b)<sup>2</sup>, Q(a)<sup>sfac(C<sub>3</sub>)</sup>], N<sup>0</sup>, ∅, β, 2, ⊥)<sub>(4,C<sub>2</sub>,γ<sub>1</sub>)</sub>. Then it continues with step (2a) of Definition 10 and applies Skip until the trail is empty. The resulting state (ε, N<sup>0</sup>, ∅, β, 2, ⊥)<sub>(4,⊥,γ<sub>1</sub>)</sub> is a final state and proves unsatisfiability of N<sup>0</sup>.
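The chain of inferences C<sub>6</sub>, C<sub>7</sub>, C<sub>8</sub>, C<sub>9</sub> from Examples 9 and 11 can be recomputed with two small helpers. This is a sketch under simplifying assumptions: clauses are ground and non-equational, so Superposition Left is modelled as a resolution step on one occurrence of the given atom; the ordering side conditions are not checked; and the function names are hypothetical.

```python
from typing import Tuple

Literal = Tuple[str, bool]      # (atom, is_positive)
Clause = Tuple[Literal, ...]    # a multiset of ground literals


def factoring(clause: Clause, atom: str) -> Clause:
    """Merge two positive occurrences of `atom` into a single one."""
    lit = (atom, True)
    if clause.count(lit) < 2:
        raise ValueError("Factoring needs two positive occurrences of the atom")
    rest = list(clause)
    rest.remove(lit)            # drops exactly one occurrence
    return tuple(rest)


def superposition_left(false_clause: Clause, producer: Clause, atom: str) -> Clause:
    """Resolve one occurrence of `not atom` in the false clause with `atom`
    in the producing clause (ground, non-equational case)."""
    neg, pos = (atom, False), (atom, True)
    if neg not in false_clause or pos not in producer:
        raise ValueError("premises do not contain the complementary literals")
    left = list(false_clause)
    left.remove(neg)
    right = list(producer)
    right.remove(pos)
    return tuple(right + left)


C1: Clause = (("P(a)", True),)
C3: Clause = (("P(a)", False), ("Q(a)", True), ("Q(a)", True))
C5: Clause = (("P(a)", False), ("Q(a)", False))

C6 = factoring(C3, "Q(a)")                 # not P(a) or Q(a)
C7 = superposition_left(C5, C6, "Q(a)")    # not P(a) or not P(a)
C8 = superposition_left(C7, C1, "P(a)")    # not P(a)
C9 = superposition_left(C8, C1, "P(a)")    # the empty clause

if __name__ == "__main__":
    for name, clause in (("C6", C6), ("C7", C7), ("C8", C8), ("C9", C9)):
        print(name, clause)
```

SCL-SUP does not materialise C<sub>8</sub> as a learned clause; the sketch only replays the SUP-MO side of the run, where both intermediate clauses are derived.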

*Example 12.* The next example demonstrates the atomic sequence (1)→(4b) of the second part of the SCL-SUP strategy. Let N<sup>0</sup> be our initial set of clauses:

$$N^0 = \{(C\_1)P(a), \quad (C\_2)\neg P(b), \quad (C\_3)\neg P(a)\lor Q(a), \quad (C\_4)\ P(b)\lor \neg Q(a)\}$$

As superposition ordering, we choose an LPO with precedence a ≺ b ≺ P ≺ Q. This means that the atoms are ordered P(a) ≺ P(b) ≺ Q(a) ≺ Q(b) and the clauses in N<sup>0</sup> are ordered C<sub>1</sub> ≺ C<sub>2</sub> ≺ C<sub>3</sub> ≺ C<sub>4</sub>. In order to keep the example short, we skip the initial SCL-SUP steps and continue directly with the state S = ([P(a)<sup>1</sup>, ¬P(b)<sup>2</sup>, Q(a)<sup>sfac(C<sub>3</sub>)</sup>], N<sup>0</sup>, ∅, β, 2, C<sub>4</sub>)<sub>(0,C<sub>3</sub>,γ)</sub>, where γ(C) = C for all clauses C and β = Q(b). This state simulates the superposition state N<sup>0</sup> up to the model construction for C<sub>3</sub>, where N<sup>0</sup><sub>C<sub>3</sub></sub> ∪ δ<sub>C<sub>3</sub></sub> = δ<sub>C<sub>1</sub></sub> ∪ δ<sub>C<sub>3</sub></sub> = {P(a), Q(a)} and C<sub>4</sub> is the minimal false clause. SUP-MO continues from the state N<sup>0</sup> by applying Superposition Left to C<sub>4</sub> and C<sub>3</sub>. In the new state N<sup>1</sup> = N<sup>0</sup> ∪ {(C<sub>5</sub>) ¬P(a) ∨ P(b)} the new clause order is C<sub>1</sub> ≺ C<sub>5</sub> ≺ C<sub>2</sub> ≺ C<sub>3</sub> ≺ C<sub>4</sub> and the partial model up to C<sub>5</sub> is N<sup>0</sup><sub>C<sub>5</sub></sub> ∪ δ<sub>C<sub>5</sub></sub> = δ<sub>C<sub>1</sub></sub> ∪ δ<sub>C<sub>5</sub></sub> = {P(a)} ∪ {P(b)}, which turns C<sub>2</sub> into the next minimal false clause. SCL-SUP simulates the above steps by following the atomic sequence (1)→(4b) of Definition 10. The result is the state ([P(a)<sup>1</sup>, P(b)<sup>sfac(C<sub>5</sub>)</sup>], N<sup>0</sup>, {C<sub>5</sub>}, β, 1, C<sub>2</sub>)<sub>(1,C<sub>5</sub>,γ)</sub>, which again matches our current superposition state and model.

Without clause C<sub>2</sub>, SCL-SUP would apply the atomic sequence (1)→(4a) of Definition 10 to S, resulting in the state ([P(a)<sup>1</sup>, P(b)<sup>2</sup>], N<sup>0</sup> \ {C<sub>2</sub>}, {C<sub>5</sub>}, β, 2, ⊤)<sub>(1,C<sub>5</sub>,γ)</sub>. This matches the state N<sup>1</sup> \ {C<sub>2</sub>} and its partial model up to C<sub>5</sub>, which is still the same as for N<sup>1</sup> with the exception that it does not lead to a minimal false clause.
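For reference, the single Superposition Left step of Example 12 can be written out as an inference; in the ground, non-equational setting used here it amounts to resolving on the atom Q(a), which C<sub>3</sub> produces and whose negation is maximal in the minimal false clause C<sub>4</sub>:

$$\frac{(C\_3)\ \neg P(a) \lor Q(a) \qquad (C\_4)\ P(b) \lor \neg Q(a)}{(C\_5)\ \neg P(a) \lor P(b)}$$

The maximal literal of the conclusion is the positive literal P(b), and since P(b) ≺ ¬P(b), the new clause is placed between C<sub>1</sub> and C<sub>2</sub> in the clause order, as stated in the example.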

In order to actually show that every SCL-SUP run simulates a SUP-MO run, we need to prove three properties. The first property is that each state visited by an SCL-SUP run must simulate a state visited by the corresponding SUP-MO run. Note that this property does not yet say anything about the order in which SCL-SUP simulates the SUP-MO states. This property can also be seen as a soundness argument for our strategy.

**Lemma 13 (Initial SCL State Simulates Initial Superposition State).** *The initial SCL state* (ε; N<sup>0</sup>; ∅; β; 0; ⊤)<sub>(0,⊥,γ)</sub> *simulates the initial superposition state* N<sup>0</sup> *and the model construction up to* N<sup>0</sup><sub>⊥</sub> ∪ δ<sub>⊥</sub>*.*

**Lemma 14 (SCL-SUP Preserves Simulation).** *Let the SCL state* S = (Γ; N<sup>0</sup>; U; β; k; E)<sub>(i,C,γ)</sub> *simulate the superposition state* N<sup>i</sup> *and the corresponding model construction up to* N<sup>i</sup><sub>C′</sub> ∪ δ<sub>C′</sub>*, where* C′ = γ(C)*. Let the SCL state* S′ = (Γ′; N<sup>0</sup>; U′; β; k′; E′)<sub>(j,D,γ′)</sub> *be the result of one atomic sequence of SCL-SUP steps. Then there exists a clause* D′ ∈ N<sup>j</sup> *with* γ′(D) = D′ *and* S′ *simulates the superposition state* N<sup>j</sup> *and the model construction up to* N<sup>j</sup><sub>D′</sub> ∪ δ<sub>D′</sub>*.*

The second property is that each atomic sequence of SCL-SUP steps always makes progress in the simulation. This means that each atomic sequence of SCL-SUP steps either advances the superposition state N<sup>i</sup> simulated by the current SCL state S = (Γ; N<sup>0</sup>; U; β; k; E)<sub>(i,D,γ)</sub>, i.e., it increases the annotated i, or it still simulates the same superposition state N<sup>i</sup> but advances the simulation of the model construction operator, i.e., it increases the annotated clause D and keeps i the same. Note that it can actually happen that an atomic sequence of SCL-SUP steps skips over several superposition states. This property can also be seen as a termination argument for our strategy because SUP-MO always terminates on ground clause sets.

**Lemma 15 (SCL-SUP Advances the Simulation).** *Let the SCL state* S = (Γ; N<sup>0</sup>; U; β; k; E)<sub>(i,D,γ)</sub> *simulate the superposition state* N<sup>i</sup> *and the model construction up to* N<sup>i</sup><sub>D</sub> ∪ δ<sub>D</sub>*. Let the SCL state* S′ = (Γ′; N<sup>0</sup>; U′; β; k′; E′)<sub>(j,D′,γ′)</sub> *be the next state reachable by one atomic sequence of SCL-SUP steps. Then either* i < j *or* i = j *and* γ = γ′ *and* D ≺<sub>γ</sub> D′*.*
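The progress condition of Lemma 15 can be illustrated on the annotations (i, D, γ) read off Examples 9 and 11. The following sketch is only an illustration: the encoding of annotations as tuples, the names `gamma0`/`gamma1`/`bot`, and the explicit rank tables standing in for the orderings ≺<sub>γ</sub> are assumptions, and the run lists only the annotations at the ends of atomic sequences.

```python
from typing import Dict, List, Tuple

Annotation = Tuple[int, str, str]   # (i, annotated clause D, name of gamma)

# Position of each annotated clause under the two orderings of the example.
# gamma1 maps C3 to sfac(C3), which moves C3 below C2.
RANK: Dict[str, Dict[str, int]] = {
    "gamma0": {"bot": 0, "C1": 1, "C2": 2, "C3": 3, "C4": 4, "C5": 5},
    "gamma1": {"bot": 0, "C1": 1, "C3": 2, "C2": 3, "C4": 4, "C5": 5},
}


def advances(prev: Annotation, nxt: Annotation) -> bool:
    """Either i grows, or i and gamma stay fixed and D grows w.r.t. gamma."""
    i, d, gamma = prev
    j, d_next, gamma_next = nxt
    if i < j:
        return True
    return i == j and gamma == gamma_next and RANK[gamma][d] < RANK[gamma][d_next]


# Annotations at the end of each atomic sequence in Examples 9 and 11.
RUN: List[Annotation] = [
    (0, "bot", "gamma0"),
    (0, "C1", "gamma0"),
    (0, "C2", "gamma0"),
    (1, "C2", "gamma1"),
    (2, "C1", "gamma1"),
    (4, "bot", "gamma1"),
]

if __name__ == "__main__":
    print(all(advances(a, b) for a, b in zip(RUN, RUN[1:])))
```

The measure is lexicographic: the annotated clause may drop (for instance from C<sub>2</sub> back to C<sub>1</sub>) as long as the index i of the simulated superposition state grows.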

The last missing property shows that the SCL-SUP strategy can always advance the current SCL state whenever the simulated superposition state can be advanced by the SUP-MO strategy. This means SCL-SUP is never stuck when SUP-MO can still progress. This property holds because states satisfying the simulation invariants in Definition 7 either are correct final states or satisfy the preconditions of Definition 8 or Definition 10. This property can also be seen as a partial correctness argument for our strategy.

**Lemma 16 (SCL-SUP Correctness of Final States).** *Let the SCL state* S = (Γ; N<sup>0</sup>; U; β; k; E)<sub>(i,D,γ)</sub> *simulate the superposition state* N<sup>i</sup> *and the model construction up to* N<sup>i</sup><sub>γ(D)</sub> ∪ δ<sub>γ(D)</sub>*. Let there be no more states reachable from* S *following an atomic sequence of SCL-SUP steps. Then* S *is a* final state*, i.e., either (i)* E = ⊥*,* D = ⊥*,* ⊥ ∈ N<sup>i</sup>*, and* N<sup>0</sup> *is unsatisfiable or (ii)* Γ ⊨ N<sup>0</sup>*.*

We can also show that any SCL-SUP run is also a regular run. Although this is not strictly necessary for the simulation proof, it is beneficial because it means that SCL-SUP inherits many properties that hold for SCL restricted to a regular strategy. For instance, all learned clauses are non-redundant and SCL-SUP always terminates.

**Lemma 17 (SCL-SUP is a Regular SCL Strategy).** *SCL-SUP is a regular SCL strategy if it is executed on a state* S = (Γ; N<sup>0</sup>; U; β; k; E)<sub>(i,C,γ)</sub> *that simulates a superposition state* N<sup>i</sup> *and the corresponding model construction up to* N<sup>i</sup><sub>γ(C)</sub> ∪ δ<sub>γ(C)</sub>*.*

### **4 Conclusion**

We have shown that the SCL(FOL) calculus [9] can simulate model driven superposition [1] refutations deriving only non-redundant clauses. The superposition calculus cannot simulate SCL refutations due to its static a priori ordering. In general, an SCL(FOL) learned clause is generated out of several resolution and factorization steps. From this perspective the SCL(FOL) calculus is more general and flexible than the superposition calculus. Furthermore, it only generates non-redundant clauses whereas any superposition implementation generates redundant clauses due to the syntactic application of the superposition inference rules.

Selection in superposition can also be simulated, but requires an additional branch in the SCL-SUP strategy, because selection of non-maximal, negative literals by superposition requires a different trail ordering for SCL in order to simulate a respective superposition left inference.

For future work, we plan to lift our simulation result from the ground case to the non-ground case. This lifting will require the extension of the SCL calculus by an additional rule that learns clauses that are computed as intermediate steps during the conflict analysis. This rule was left out of previous versions of SCL because we would never use it in a CDCL inspired SCL-run and because it would have complicated the termination and non-redundancy proofs for SCL. Nevertheless, we are confident that the rule can be designed in such a way that all properties of the original calculus still hold.

Considering the extension to the non-ground case, this result can be used in various directions. First, it can be used to develop an alternative implementation of the superposition calculus. Given a fixed ordering, the trail can be developed according to the ordering, generating only non-redundant superposition inferences. On the other hand, the concept of finite saturation can be kept, thereby preserving a strong mechanism for detecting satisfiability. Secondly, the result means that SCL can be used to naturally combine propagation-driven reasoning with fixed-ordering-driven reasoning. This might overcome some of the issues of the current first-order portfolio approaches implemented in the state-of-the-art provers.

Another calculus contained in first-order reasoning portfolios is InstGen [12, 15]. It abstracts a first-order clause set to propositional logic via a grounding with a single constant. If a CDCL SAT solver proves the abstraction unsatisfiable, the first-order clause set is unsatisfiable too. Otherwise, the model found on the propositional level triggers an instantiation inference of a first-order clause. The instance rules out the previously found propositional model modulo the abstraction.

The CDCL model building after grounding can be simulated via a respective SCL trail. This will then lead to a stuck state if SCL is restricted to the InstGen grounding. Now let C be the false first-order clause selected by InstGen for an instance. Then the SCL stuck state can be extended to a conflict state for C. Then SCL will not learn an instance of C, but a related clause that also rules out the previously found model on the propositional level. This way the relationship between InstGen and SCL can be investigated as well.

**Acknowledgements.** We thank our reviewers for their careful reading and constructive comments.

### **References**



# Formal Reasoning About Influence in Natural Sciences Experiments

Florian Bruse, Martin Lange, and Sören Möller

Theoretical Computer Science/Formal Methods, University of Kassel, Kassel, Germany martin.lange@uni-kassel.de

Abstract. We present a simple calculus for deriving statements about the local behaviour of partial, continuous functions over the reals, within a collection of such functions associated with the elements of a finite partial order. We show that the calculus is sound in general and complete for particular partial orders and statements. The motivation for this work is drawn from an attempt to foster digitalisation in secondary-education classrooms, in particular in experimental lessons in natural science classes. This provides a way to formally model experiments and to automatically derive the truth of hypotheses made about certain phenomena in such experiments.

Keywords: formal modelling · proof system · continuous functions · completeness

### 1 Introduction

Formal reasoning using proof rules is a well-established mechanism for explaining and deriving the truth of statements, both in general-purpose first- and higher-order logics [2,16] as well as special-purpose logics in arithmetic [5], knowledge discovery [15], program verification [13] etc. Here we are concerned with the problem of proving statements about the local "behaviour" of certain real-valued functions. A proof calculus for such simple statements may be interesting purely for its logical (meta-)properties. There is, however, also a very concrete motivation for this work: digitalisation of experiments in natural sciences in secondary-education classrooms. Studies show how digitalisation can benefit such teaching-learning environments [10,18], not least by channeling pupils' interaction through a software tool to enforce better learning [11].

In classes of natural sciences like biology, physics and chemistry, pupils are often taught some background knowledge about particular subjects which they then need to put to the test experimentally. For this, they are given a *research question* which typically asks them to discover and formulate a particular phenomenon in the form of a so-called *hypothesis*, and to validate its correctness experimentally. Take for instance as an "experiment" in a physics class the standard European alternating current at 230 V and 50 Hz. The way that voltage fluctuates over time (in other words, time *influences* voltage) and voltage induces (resp. *influences*) a current forms the background theory, and a research question could for instance be: *how does the current change over time*? We aim to provide digital technology that can answer such questions automatically in order to give valid feedback to a pupil about their success in this task.

Formal models for processes from natural sciences have been proposed in the literature [19], like Petri nets [6,12] or hybrid automata [1,3]. They allow for precise modelling of experiments; the price to pay is that of undecidability of model checking already, let alone validity checking. Moreover, they rely on exact knowledge about the nature of influences in such experiments, and this can often only be described by differential equations. Hence, determining correctness of a hypothesis requires sophisticated algebraic or numerical methods.

Here, we model experiments abstractly as *influence schemes*, that is sets C of statements about certain parts of an influence, allowing them to be built from observations for instance. Correctness of a hypothesis H then is the question of whether H logically follows from C. We provide the framework for modelling experiments and hypotheses about influences in the form of a simple language of statements, a formal semantics via collections of partial continuous functions, and a proof calculus for logical consequence in this language. We show that it is sound in general, complete for a large and useful class of hypotheses and experiment models, i.e. influence schemes, and that it is polynomial-time decidable.

The completeness proof uses elements that are similar to constructions for general logics. A key ingredient is normalisation, essentially a saturation process comparable to the construction of Hintikka or maximally consistent sets, cf. [7, 17]. Another one is the effective construction of countermodels for such saturated sets, cf. [8,9,14]. The details of these constructions are of course tailored to the specifics of the mixed discrete-continuous structures here, dealing with properties of collections of (partial) continuous functions associated with pairs of elements of some underlying finite partial order.

The paper is organised as follows. Section 2 introduces the mathematical basics in terms of functions on the reals, statements, influence schemes, hypotheses etc. Section 3 presents the proof calculus including its soundness. Section 4 begins by showing that the proof calculus is generally incomplete, as the relatively simple statements cannot make assertions capturing certain phenomena arising with functions on the reals. We then develop a restriction on influence schemes and show that completeness does hold in this case. The full proofs of technical lemmas are omitted for reasons of space restriction. Section 5 discusses the computational problem of proof search. Section 6 concludes with remarks on further work.

### 2 Modelling Influence

Statements and Influence Schemes. In all of the following, V = {a, b, ...} denotes a finite set of *variables*, and we assume that these are partially ordered by ≤, with < denoting its strict part.

An *interval* (of reals) is denoted [x, y] for x, y ∈ Q ∪ {−∞,∞} with x ≤ y. Abusing standard notation, we write, e.g. [−∞, 10] rather than (−∞, 10] for the set of all real numbers z with z ≤ 10, since we only consider intervals that are closed at rational bounds (for purposes of effective representation) and semiopen only at infinities. This provides a common notation for intervals and saves us making case distinctions everywhere, depending on the interval bounds.
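The interval convention above can be sketched in code as follows. This is only an illustration under stated assumptions: the class name `Interval` and its methods are hypothetical and not taken from the paper; rational bounds are modelled with `fractions.Fraction` and infinite bounds with `math.inf`.

```python
from dataclasses import dataclass
from fractions import Fraction
import math


@dataclass(frozen=True)
class Interval:
    """Interval [lo, hi]; bounds are rationals or -inf / +inf (hypothetical helper)."""
    lo: object
    hi: object

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("lower bound must not exceed upper bound")

    def contains(self, z):
        # z is meant to be a real number, so an infinite bound is never attained.
        return self.lo <= z <= self.hi

    def is_singleton(self):
        return self.lo == self.hi

    def subset_of(self, other):
        return other.lo <= self.lo and self.hi <= other.hi


# [-inf, 10] stands for the set of all reals z with z <= 10.
below_ten = Interval(-math.inf, Fraction(10))
```

With this single representation, no case distinction on open or closed bounds is needed when testing membership or inclusion.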

A V-*statement* is a 5-tuple S = (a, I, q, I′, b), typically written as a I q I′ b, s.t. a, b ∈ V with a < b, and I, I′ are intervals in the sense above. I is called the *domain*, denoted *dom*(S), and is required to be a non-singleton interval. I′ is the *range*, denoted *rng*(S). Finally, q ∈ Q := {↗, ↘, →, ⇝} is called a *behaviour*. It describes the gradient of the influence abstractly as either *monotonic*, *antitonic*, *constant* or *arbitrary*, respectively. When the variables a, b involved in the statement S are clear from or irrelevant for the context, we often simply write I q I′.

The statement S is used to formalise the assertion "*variable a influences variable b on the interval I in a way described by q, s.t. varying the value for a in this interval results in b taking values from the interval I′.*"

A V-*influence scheme*, or simply influence scheme if V is clear from the context, is a finite set C of V-statements. Intuitively, an influence scheme describes the way that certain variables influence each other in an abstract way.

*Example 1.* We build an influence scheme for the AV-voltage experiment. The relevant variables are t for *time*, v for *voltage* and c for *current*, ordered by t < v < c. A theory of how voltage alternates over time (in the standard European alternating 230 V/50 Hz setting) and how it induces a current at a resistance of 326 Ω can be formalised as follows. Remember that a scheme is a finite set of statements like t [0, 5] ↗ [0, 326] v etc. Each can easily be visualised as a rectangle in the 2-dimensional plane for the pair of involved variables: horizontal and vertical edges determine domain and range, and the behaviour can be shown as a label on the rectangle. A particular influence scheme C with 20 statements is shown in Fig. 1 as grey rectangles in this way. The behaviours in the graph in the middle are left out for better visibility; they are all supposed to be ↗ (monotonic).

The orange lines in the graphs of Fig. 1 represent a so-called *influence experiment*, as will be explained below. At this point, it can be used to show that influence schemes as formal models of experiments can be obtained through data sampling. Note how the borders of the rectangles in the scheme C coincide with values of the functions represented by the orange lines in most cases.

Note that the scheme C shown in Fig. 1 contains no statements for the pair (t, c) of variables. This does not mean that time does not influence current in this scheme: clearly, if time influences voltage, and voltage influences current, then time exerts some influence on current. Hence, a valid question is whether the statement H shown as a blue rectangle follows logically from the scheme C, in the sense that whenever time influences voltage and voltage influences current in the way described by C, time then also influences current in the way described by H. We use the letter H for such a statement as it plays the role of a *hypothesis*: in logical terms it is just a statement, but from an application point of view it is special in that it signifies an implicit question about its truth with respect to a scheme.

Fig. 1. An influence scheme (grey rectangles), a hypothesis (blue dashed rectangle) and an influence experiment (orange lines) between time, voltage and current. (Color figure online)

A Formal Semantics. In order to give a well-defined meaning to the question whether H follows from C for a given scheme C and hypothesis H, we introduce a formal interpretation of statements in so-called *influence experiments*. We need to recall and define a few technicalities about functions over the reals.

An *influence* is a partial function f : R ⇀ R s.t. *dom*(f) is a non-singleton interval in the sense above, and f is continuous on its domain in the usual sense. We write f(x) = ⊥ if x ∉ *dom*(f). When composing partial functions we assume undefined values to be absorbing, i.e. g(f(x)) = ⊥ if f(x) = ⊥.

An influence f is called *monotonic*, *antitonic* or *constant* on [x, y] ⊆ *dom*(f), if for all z, z′ ∈ [x, y] with z ≤ z′ we have f(z) ≤ f(z′), respectively f(z) ≥ f(z′) and f(z) = f(z′). It *satisfies* the statement S = [x, y] q [x′, y′], written f |= S, if the following two conditions are met.


Since every constant function is monotonic and antitonic, and each of these is also an arbitrary one, we naturally obtain a partial order ⪯ on behaviours that features unique infima and suprema, shown in Fig. 3. Note that, whenever f |= I q I′ and q ⪯ q′, then also f |= I q′ I′.
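The order on behaviours can be made concrete with a small sketch. The string names and the helper functions `leq`, `sup` and `inf` are hypothetical; the order itself is the one described in the text (constant below monotonic and antitonic, which are both below arbitrary).

```python
# Hypothetical string names for the four behaviours.
CONST, MONO, ANTI, ARB = "constant", "monotonic", "antitonic", "arbitrary"

# UP[q] lists every behaviour that is at least as weak as q.
UP = {
    CONST: {CONST, MONO, ANTI, ARB},
    MONO: {MONO, ARB},
    ANTI: {ANTI, ARB},
    ARB: {ARB},
}


def leq(q, r):
    """True if behaviour q is at least as strong as behaviour r."""
    return r in UP[q]


def sup(q, r):
    """Least behaviour that is weaker than or equal to both q and r."""
    common = UP[q] & UP[r]
    return next(c for c in common if all(leq(c, d) for d in common))


def inf(q, r):
    """Greatest behaviour that is stronger than or equal to both q and r."""
    lower = {c for c in UP if leq(c, q) and leq(c, r)}
    return next(c for c in lower if all(leq(d, c) for d in lower))
```

For instance, `sup(MONO, ANTI)` is `ARB`, while `inf(MONO, ANTI)` is `CONST`: a function that is both monotonic and antitonic on an interval is constant there.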

We are now ready to define the formal semantics of influence schemes.

Definition 1. *Let V be as above. A V-*influence experiment *is a collection F of influences, namely one function F<sub>a,b</sub> for each pair (a, b) s.t. a < b, altogether satisfying the following* coherence property *(CP).*

*– For all* a, b, c ∈ V *s.t.* a<b*,* b<c *and all* x ∈ R*:* F*a,c*(x) = F*b,c*(F*a,b*(x))*.*

*F satisfies the V-statement S = a I q I′ b, written F |= S, if F<sub>a,b</sub> |= I q I′. F satisfies the V-influence scheme C, written F |= C, if F |= S for all S ∈ C.*

CP together with the absorption of ⊥ in function composition is the reason for demanding the variables to be partially ordered: F<sub>a,a</sub>, for any variable a, would have to be the total identity function to satisfy CP. And then we would have F<sub>b,a</sub> = F<sub>a,b</sub><sup>−1</sup> for any a, b. Thus, by demanding that F<sub>a,b</sub> is only defined whenever a < b we avoid problems arising with non-invertible functions.

*Example 2.* Figure 1 shows a particular time-voltage-current experiment F as three influences drawn as orange graphs. It represents the way that voltage alternates in time along a sine curve with amplitude 230 · √2 ≈ 326 V and frequency 50 Hz. Electric current depends linearly on voltage in this experiment, with the factor of 1/326 used here suggesting an electrical resistance of 326 Ω. The coherence property then demands a third influence F<sub>t,c</sub> as their composition on the domain of F<sub>t,v</sub>, namely [0, ∞], which is also a sine curve.

Let C be the influence scheme shown in Fig. 1 and introduced in Example 1. Clearly F ⊭ C because F satisfies neither the second (degenerate) rectangle, representing the statement t [3, 7] → [264, 264] v, nor the fifth, representing t [12, 16] ↘ [−310, −192] v. This is because F<sub>t,v</sub> is neither constant on [3, 7] nor antitonic on [12, 16], and because it assumes values outside of the statements' ranges on these domains, e.g. F<sub>t,v</sub>(5) = 326 ∉ [264, 264] and F<sub>t,v</sub>(15) = −326 ∉ [−310, −192].
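The numbers in this example can be replayed with a short sketch. It assumes that time is measured in milliseconds (which is what makes the sine curve peak at t = 5 for 50 Hz) and represents ⊥ by `None`; the function names `F_tv`, `F_vc`, `F_tc`, `compose` and `in_range` are hypothetical.

```python
import math


def F_tv(t):
    """Voltage in V at time t (assumed to be in ms); undefined (None) for t < 0."""
    if t < 0:
        return None  # plays the role of the undefined value
    return 326 * math.sin(2 * math.pi * 50 * t / 1000)


def F_vc(v):
    """Current depends linearly on voltage with factor 1/326."""
    return v / 326


def compose(g, f):
    """Composition g after f in which undefined values are absorbing."""
    def h(x):
        y = f(x)
        return None if y is None else g(y)
    return h


# The coherence property fixes the third influence as the composition.
F_tc = compose(F_vc, F_tv)


def in_range(value, lo, hi, tol=1e-9):
    """Membership in [lo, hi] up to a small floating-point tolerance."""
    return lo - tol <= value <= hi + tol
```

Evaluating `F_tv(5)` and `F_tv(15)` gives values of about 326 and −326, which lie outside [264, 264] and [−310, −192] respectively, matching the argument above. Sampling `F_tc` on [12.5, 15] stays within [−1.05, −0.5], the range of the hypothesis H stated in Example 3; sampling can of course only refute a statement, not establish it.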

Note that satisfaction of a statement S by an influence f means that the graph of f enters the rectangle representing S through its left edge and leaves it only through its right edge, and within this rectangle it displays the behaviour stated in S. This is the case for instance for the hypothesis H drawn as a blue rectangle: F |= H indeed. But this does not allow any conclusion to be drawn about whether H follows from C in any way.

The interpretation of an influence scheme through influence experiments naturally gives rise to a notion of logical consequence: we say that the V-statement H *follows* from the V-influence scheme C, written C |= H, if F |= H for all V-influence experiments F s.t. F |= C. Thus, an influence scheme C can be seen as a finite representation of an (uncountable) number of V-experiments, which yields the abstract nature of these schemes as mentioned in the introduction.

The semantics also gives rise to a natural notion of equivalence between schemes: C and C′ are *equivalent*, written C ≡ C′, if for all F we have F |= C iff F |= C′. Note that this is the case iff for all hypotheses H we have C |= H iff C′ |= H. Equivalent schemes can therefore be seen as (possibly different) descriptions of the same experimental setup, up to a certain amount of imprecision determined by the description of the experimental setup through discrete statements.

### 3 The Calculus of Influence

The concept of consequence between a scheme and a hypothesis provides the foundations for a logical approach to modelling experimental setups and correctness of hypotheses w.r.t. them. Ideally, the consequence relation |= would be decidable, since this would provide a way to automatically check the correctness of a hypothesis w.r.t. a given scheme. In this section we develop a proof-theoretic characterisation of |= in terms of a provability predicate ⊢. Ideally, ⊢ would be sound and complete w.r.t. |=, i.e. a statement would follow from an influence scheme iff it is provably derivable from it. Then decidability of ⊢ (cf. Sect. 5) would yield the basis for automatic reasoning about influence in experimental setups.

Fig. 2. Proof rules for correctness of a statement w.r.t. an influence scheme C. See Fig. 3 for the definitions of the order on behaviours and of ⊗.

Henceforth, let V and a V-influence scheme C be fixed. We say that a V-statement H is *provable* w.r.t. C, written C ⊢ H, if there is a finite proof for H in the proof system whose rules are shown in Fig. 2.

We will briefly explain the intuition behind each of them. The rule (F), which serves as an axiom, essentially states that any statement which is part of the scheme follows from it. (G) expresses the fact that experiments are comprised of potentially partial functions whose domain is always some interval. It states that any function F<sub>a,b</sub> which shows some behaviour on the interval [x, y], and some behaviour on the interval [x′, y′] where y < x′, must also be defined on the interval [y, x′]. However, we cannot determine better bounds than infinities on its values, nor a non-arbitrary behaviour there.

Rule (T) expresses the transitivity principle laid out in the coherence property of V-experiments: when a influences b s.t. a-values in I lead to b-values in I<sub>1</sub>, and I<sub>1</sub> ⊆ I<sub>2</sub>, and b-values in I<sub>2</sub> lead to c-values in I′, then a-values in I lead to c-values in I′. Moreover, the behaviour of the influence from a to c can be derived from the ones from a to b and from b to c via the multiplication table for ⊗ shown in Fig. 3.
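A minimal sketch of the transitivity step follows. Since the table of Fig. 3 is not reproduced in this text, the function `otimes` below is an assumption: it encodes what holds for compositions of real functions in general (composing with a constant part gives a constant, equal directions compose to monotonic, opposite directions to antitonic, anything else is arbitrary). The tuple layout and the names `rule_T` and `subset` are hypothetical.

```python
CONST, MONO, ANTI, ARB = "constant", "monotonic", "antitonic", "arbitrary"


def otimes(q1, q2):
    """Assumed behaviour of a composition, given the behaviours of both parts."""
    if CONST in (q1, q2):
        return CONST
    if ARB in (q1, q2):
        return ARB
    return MONO if q1 == q2 else ANTI


def subset(i, j):
    """Interval inclusion for intervals given as (lo, hi) pairs."""
    return j[0] <= i[0] and i[1] <= j[1]


def rule_T(s_ab, s_bc):
    """Combine a I q1 I1 b and b I2 q2 I' c into a I (q1 (x) q2) I' c.

    Statements are tuples (source, domain, behaviour, range, target).
    Returns None if the side condition I1 subset-of I2 fails.
    """
    a, dom, q1, i1, b = s_ab
    b2, i2, q2, rng, c = s_bc
    if b != b2 or not subset(i1, i2):
        return None
    return (a, dom, otimes(q1, q2), rng, c)


# Hypothetical statements, not taken from the scheme of Fig. 1.
s1 = ("a", (0, 1), ANTI, (2, 3), "b")
s2 = ("b", (2, 4), MONO, (5, 9), "c")
derived = rule_T(s1, s2)
```

Here `derived` is the statement a [0, 1] antitonic [5, 9] c; the side condition fails, and `None` is returned, as soon as the range of the first statement is not contained in the domain of the second.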

Rule (I<sup>−</sup>) expresses weakening of statements w.r.t. the involved intervals. Any function which maps values from I<sub>1</sub> to values in I<sub>2</sub> must also do so for values from a subset of I<sub>1</sub>, and their range is naturally limited by any superset of I<sub>2</sub>. On the other hand, (I<sup>+</sup>) represents an important strengthening principle: any function that maps values from I<sub>1</sub> to I<sub>2</sub> and values from I′<sub>1</sub> to I′<sub>2</sub> must map values from I<sub>1</sub> ∩ I′<sub>1</sub> to I<sub>2</sub> ∩ I′<sub>2</sub>. Note that the rule is only (meaningfully) applicable if I<sub>1</sub> ∩ I′<sub>1</sub> ≠ ∅. Moreover, the behaviour on the intersection can be determined from those on the two involved intervals. For instance, if F<sub>a,b</sub> is monotonic on I<sub>1</sub> and antitonic on I′<sub>1</sub> then it must be both monotonic and antitonic on I<sub>1</sub> ∩ I′<sub>1</sub>, hence, it must in fact be constant there.

Fig. 3. Order ⪯ (left) and multiplication ⊗ (right) on behaviours.
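The intersection principle can be illustrated as follows. This is a sketch only: statements are simplified to triples (domain, behaviour, range) for a fixed pair of variables, the names are hypothetical, and rejecting an empty range intersection is an added assumption (such a pair of statements could not be satisfied together).

```python
CONST, MONO, ANTI, ARB = "constant", "monotonic", "antitonic", "arbitrary"


def inf(q, r):
    """Greatest behaviour that is at least as strong as both q and r."""
    if q == r:
        return q
    if q == ARB:
        return r
    if r == ARB:
        return q
    return CONST  # covers monotonic vs antitonic and anything vs constant


def rule_I_plus(s1, s2):
    """Intersect two statements (domain, behaviour, range) on the same pair."""
    (d1, q1, r1), (d2, q2, r2) = s1, s2
    dom = (max(d1[0], d2[0]), min(d1[1], d2[1]))
    rng = (max(r1[0], r2[0]), min(r1[1], r2[1]))
    if dom[0] >= dom[1]:
        return None  # domains of statements must be non-singleton intervals
    if rng[0] > rng[1]:
        return None  # assumption: no common value, nothing sensible to derive
    return (dom, inf(q1, q2), rng)


# Hypothetical statements: monotonic on [0, 2], antitonic on [1, 3].
result = rule_I_plus(((0, 2), MONO, (0, 5)), ((1, 3), ANTI, (2, 8)))
```

The result is the statement [1, 2] constant [2, 5], in line with the observation that monotonic and antitonic together force constant behaviour on the common part of the domains.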

Rules (L<sup>+</sup><sub>↗</sub>)–(R<sup>+</sup><sub>↘</sub>) express further strengthening principles which are applicable in situations where two statements are made about the behaviour of a function on adjacent intervals. Suppose, for instance, that F<sub>a,b</sub> maps values from [x, y] monotonically into [l, u], and values from [y, z] somehow into [l′, u′]. In particular, we have F<sub>a,b</sub>(y) ≤ u since y ∈ [x, y], and F<sub>a,b</sub>(y) ≤ u′ since y ∈ [y, z], i.e. F<sub>a,b</sub>(y) ≤ min(u, u′). By monotonicity, for all z′ with x ≤ z′ ≤ y we must have F<sub>a,b</sub>(z′) ≤ min(u, u′) as well. Hence, from the knowledge about the monotonic behaviour of F<sub>a,b</sub> on [x, y] and the upper bound on an adjacent interval to the right of it, we can possibly infer a tighter upper bound on the values of F<sub>a,b</sub> on [x, y]. This is what rule (L<sup>+</sup><sub>↗</sub>) does. The other three rules (L<sup>+</sup><sub>↘</sub>), (R<sup>+</sup><sub>↗</sub>) and (R<sup>+</sup><sub>↘</sub>) cover the analogous cases of the behaviour being antitonic or the adjacent statement being on the other side.
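The strengthening step worked out above, for a monotonic statement with a neighbour on its right, amounts to a single `min`. The sketch below uses hypothetical names and the simplified triple format (domain, behaviour, range); the other three cases would be symmetric.

```python
MONO, ARB = "monotonic", "arbitrary"


def tighten_left_monotonic(left, right):
    """Tighten the upper bound of a monotonic statement using its right neighbour.

    left = ((x, y), q, (l, u)) and right = ((y, z), q', (l', u')).
    Returns left unchanged if it is not monotonic or the domains are not adjacent.
    """
    (x, y), q, (l, u) = left
    (y2, _z), _q2, (_l2, u2) = right
    if q != MONO or y != y2:
        return left
    # F(y) <= min(u, u'), and by monotonicity this bounds F on all of [x, y].
    return ((x, y), q, (l, min(u, u2)))


# Hypothetical statements on adjacent domains [0, 1] and [1, 2].
tightened = tighten_left_monotonic(((0, 1), MONO, (0, 10)), ((1, 2), ARB, (3, 6)))
```

In this example the upper bound 10 on [0, 1] is reduced to 6, because the value at the shared point 1 cannot exceed the neighbour's upper bound.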

Rule (J) can be used to infer statements about the behaviour of a function on parts of its domain which are comprised of several intervals. If F<sub>a,b</sub> maps values from [x, y] into I<sub>1</sub> with behaviour q, and values from [y, z] into I<sub>2</sub> with behaviour q′, then it maps values from [x, z] into I<sub>1</sub> ∪ I<sub>2</sub>, provided that this is an interval. Moreover, the behaviour on the larger interval can be determined from q and q′ by simply taking the supremum w.r.t. ⪯. This is obviously associative, which allows us to write sup(q<sub>1</sub>, ..., q<sub>n</sub>) without ambiguity.

Note that (J) is also a weakening rule: for instance, from S<sub>1</sub> = a [0, 1] ⇝ [0, 1] b and S<sub>2</sub> = a [1, 2] ⇝ [1, 2] b we can infer S = a [0, 2] ⇝ [0, 2] b, describing any influence F<sub>a,b</sub> that maps values from [0, 2] to [0, 2], for instance F<sub>a,b</sub>(x) = 2 − x. I.e. we have F |= S, but F ⊭ S<sub>1</sub> and F ⊭ S<sub>2</sub>. Likewise, (Q<sup>−</sup>) allows the weakening of behaviours. It states that a function which possesses a certain behaviour on an interval also possesses any weaker behaviour on this interval.
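The join step and the weakening example can be replayed in a few lines. As before, this is a sketch with hypothetical names and statements simplified to triples (domain, behaviour, range); the test that the union of the ranges is an interval is implemented as an overlap check.

```python
CONST, MONO, ANTI, ARB = "constant", "monotonic", "antitonic", "arbitrary"


def sup(q, r):
    """Least behaviour that is at least as weak as both q and r."""
    if q == r:
        return q
    if q == CONST:
        return r
    if r == CONST:
        return q
    return ARB


def rule_J(s1, s2):
    """Join two statements on adjacent domains [x, y] and [y, z]."""
    (x, y), q1, (l1, u1) = s1
    (y2, z), q2, (l2, u2) = s2
    if y != y2:
        return None  # domains are not adjacent
    if max(l1, l2) > min(u1, u2):
        return None  # the union of the ranges is not an interval
    return ((x, z), sup(q1, q2), (min(l1, l2), max(u1, u2)))


S1 = ((0, 1), ARB, (0, 1))
S2 = ((1, 2), ARB, (1, 2))
S = rule_J(S1, S2)


def F(x):
    """The influence used in the text as a witness that (J) weakens."""
    return 2 - x
```

`S` is the statement [0, 2] arbitrary [0, 2]. The function `F` stays within [0, 2] on [0, 2], but `F(0.5)` is 1.5, which lies outside the range [0, 1] of `S1`, so the joined statement admits influences that the premises exclude.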

Finally, rule (C) expresses a simple principle: an influence of variable a onto b whose values can be bounded by a singleton interval is of constant behaviour.

*Example 3.* A proof of C ⊢ H for the scheme C and the hypothesis H = t [12.5, 15] ↘ [−1.05, −0.5] c shown in Fig. 1 (cf. Example 1) is given in Fig. 4. The subtrees that are abbreviated by vertical dots are very similar to their siblings and therefore omitted in order to keep the tree small.

Fig. 4. Proof of the hypothesis H from the scheme C in Example 1.

The following theorem then guarantees that C |= H holds, too.

Theorem 1 (Soundness). *Let C be an influence scheme and S be a statement. If C ⊢ S then C |= S.*

*Proof.* First we observe that all the rules are sound in the sense that if C |= T for all premises T of some rule, then C |= S for its conclusion S. This is trivial for rule (F) and can easily be shown by contradiction for the other 11 rules. The theorem can then easily be shown by induction on the height of a proof tree for C ⊢ S.

### 4 Completeness for Elementary Diamond-Free Schemes

General Incompleteness. We remark that the calculus of influence is not complete in general. Consider the variable order a<b<c and the scheme C (in grey) and hypothesis H (in dashed blue) represented by the following rectangles.

It seems that H does not follow from C because it demands constant behaviour of an influence F<sub>b,c</sub> on the interval [1, 2] while C only prescribes monotonic behaviour there. However, we do have C |= H, for the following reason: combining the statement S<sub>1</sub> about (a, b), with domain [1, 2] and range [1, 2], with the monotonic statement b [1, 2] ↗ [1, 2] c yields a statement about (a, c) with domain [1, 2] and range [1, 2]. Together with a second statement about (a, c) with the same domain and range we get a [1, 2] → [1, 2] c, i.e. we must have that F<sub>a,c</sub> is constant on [1, 2] for any F with F |= C. Since F<sub>a,c</sub> = F<sub>b,c</sub> ∘ F<sub>a,b</sub> and F<sub>a,b</sub> cannot be constant on [1, 2] because of the two statements neighbouring S<sub>1</sub>, we must indeed have that F<sub>b,c</sub> is constant on [1, 2]. Thus, C |= H but the rules do not support this kind of *backwards* reasoning (from (a, c) to (b, c)). Hence, we have C ⊬ H.

There are two principal ways to go from here: either extend the calculus by rules formalising this kind of reasoning, or try to achieve completeness for a restricted class of schemes and hypotheses only. We do the latter; the former would require a significant extension of the machinery as the example above shows: backwards reasoning introduces nondeterminism, and in order to resolve it one needs to take contexts of statements into account. This suggests that general completeness may only be achieved through a general extension of the format of rules. Note also that completeness cannot hold for a class of schemes containing inconsistent ones, where C is said to be *consistent* if there is some F s.t. F |= C. The reason is that we have C |= H for any H whenever C is inconsistent, even when H makes an assertion about variables not occurring in C in which case it is clear that H cannot be derived from C.

Normalisation. We develop some general machinery that is useful for obtaining completeness in a restricted case. For a scheme C and variables a, b with a < b we write C<sub>a,b</sub> for the set of statements S ∈ C s.t. S = a I q I′ b for some I, q, I′.

Definition 2. We call a scheme C *separated* if for all a, b ∈ V with a < b there are n ∈ N and x<sub>1</sub> < ... < x<sub>n+1</sub> ∈ Q ∪ {−∞, ∞}, behaviours q<sub>1</sub>, ..., q<sub>n</sub> and intervals [l<sub>1</sub>, u<sub>1</sub>], ..., [l<sub>n</sub>, u<sub>n</sub>] s.t.

$$\mathcal{C}\_{a,b} = \left\{\, [x\_1, x\_2]\, q\_1\, [l\_1, u\_1],\ [x\_2, x\_3]\, q\_2\, [l\_2, u\_2],\ \dots,\ [x\_n, x\_{n+1}]\, q\_n\, [l\_n, u\_n] \,\right\}.$$

This induces a natural notion of *left* and *right neighbour* of a statement T in a separated scheme, denoted *lnb*(T) and *rnb*(T) when they exist.

We say that such a separated C is *minimal* if for all i = 1,...,n we have

a) if q<sub>i</sub> = ↗ (monotonic) then u<sub>i</sub> ≤ u<sub>i+1</sub> and l<sub>i−1</sub> ≤ l<sub>i</sub>,

b) if q<sub>i</sub> = ↘ (antitonic) then l<sub>i</sub> ≥ l<sub>i+1</sub> and u<sub>i−1</sub> ≥ u<sub>i</sub>,

c) if q<sub>i</sub> = → (constant) then u<sub>i</sub> ≤ min(u<sub>i−1</sub>, u<sub>i+1</sub>) and l<sub>i</sub> ≥ max(l<sub>i−1</sub>, l<sub>i+1</sub>),

where we set l<sub>0</sub> = l<sub>n+1</sub> := −∞ and u<sub>0</sub> = u<sub>n+1</sub> := ∞ to avoid case distinctions.
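The three minimality conditions translate directly into a check over the sequence of statements for one pair of variables. The sketch below is illustrative only: the function name `is_minimal` and the triple format (behaviour, l, u) are hypothetical, and the sentinel bounds follow the convention just stated.

```python
import math

CONST, MONO, ANTI, ARB = "constant", "monotonic", "antitonic", "arbitrary"


def is_minimal(stmts):
    """Check conditions a) to c) for a left-to-right list of (behaviour, l, u)."""
    n = len(stmts)
    # Sentinels l_0 = l_{n+1} = -inf and u_0 = u_{n+1} = +inf.
    l = [-math.inf] + [s[1] for s in stmts] + [-math.inf]
    u = [math.inf] + [s[2] for s in stmts] + [math.inf]
    for i in range(1, n + 1):
        q = stmts[i - 1][0]
        if q == MONO and not (u[i] <= u[i + 1] and l[i - 1] <= l[i]):
            return False
        if q == ANTI and not (l[i] >= l[i + 1] and u[i - 1] >= u[i]):
            return False
        if q == CONST and not (
            u[i] <= min(u[i - 1], u[i + 1]) and l[i] >= max(l[i - 1], l[i + 1])
        ):
            return False
    return True


# Hypothetical sequences: the first can still be tightened, the second cannot.
loose = [(MONO, 0, 5), (ARB, 2, 4)]
tight = [(MONO, 0, 4), (ARB, 2, 4)]
```

In `loose`, the monotonic statement has upper bound 5 although its right neighbour only allows values up to 4, so condition a) fails; `tight` is the result of lowering that bound to 4.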

C is called *transitive* if for all a, b, c ∈ V with a < b < c and all x, y ∈ R we have the following: if x ∈ I<sub>1</sub>, y ∈ I<sub>2</sub> for some statement a I<sub>1</sub> q<sub>1</sub> I<sub>2</sub> b ∈ C, and y ∈ I<sub>3</sub> for some statement b I<sub>3</sub> q<sub>2</sub> I<sub>4</sub> c ∈ C, then there is a statement a I<sub>5</sub> q<sub>3</sub> I<sub>6</sub> c ∈ C s.t. x ∈ I<sub>5</sub> and I<sub>6</sub> ⊆ I<sub>4</sub>.

C is called *normalised* if it is separated, minimal and transitive.

So, intuitively, separation and minimality mean that the statements in a normalised scheme can be arranged as a sequence of horizontally adjacent rectangles, for each pair of variables a, b, with no gaps in between, and no statement can be strengthened further because of its left or right neighbours (compare this to the strengthening rules (L<sup>+</sup><sub>↗</sub>)–(R<sup>+</sup><sub>↘</sub>)). Transitivity means that C is complete in the sense that whenever it allows F<sub>a,b</sub>(x) = y and F<sub>b,c</sub>(y) = z for some x, y, z, then it must also predict the possibility of F<sub>a,c</sub>(x) = z.

Fig. 5. A normalisation C<sup>∗</sup> (red) of the influence scheme C from Example 1 (grey). (Color figure online)

Lemma 1 (Normalisation Lemma). *Let C be a consistent scheme. There is a normalised scheme C<sup>∗</sup> s.t. C<sup>∗</sup> ≡ C and for all T ∈ C<sup>∗</sup> we have C ⊢ T.*

*Proof.* (Sketch) We successively transform C into C<sup>∗</sup> using operations that follow rule applications. (G), (I<sup>+</sup>) and (I<sup>−</sup>) (in restricted form) can be used to obtain separation, (L<sup>+</sup><sub>↗</sub>)–(R<sup>+</sup><sub>↘</sub>) to ensure minimality, and (T) together with (J) to ensure transitivity. The trick is then to arrange the process of saturating C by adding new statements and replacing some with others in a terminating way.

In the following, we will write C<sup>∗</sup> to denote a normalised scheme obtained from C that satisfies the conditions of this lemma. Note that C<sup>∗</sup> is not necessarily unique; for example, statements with adjacent domains and equal ranges and behaviours can be merged using rule (J), or statements can be split w.r.t. their domain using (I<sup>−</sup>), without breaking the conditions of the lemma.

*Example 4.* Figure 5 shows the result of normalising the scheme C from Example 1 (grey rectangles) as a scheme C<sup>∗</sup> with 11+25+11=47 statements shown as red rectangles. It should be clear that the hypothesis H, also depicted here as a blue rectangle, does indeed follow from C<sup>∗</sup>: intuitively, it is impossible to draw an influence experiment into these diagrams as three functions that traverse through the red rectangles in the prescribed ways without also traversing through the blue rectangle correctly.

Figure 5 suggests the use of the normalisation process for proof construction: a close inspection of the example proof in Fig. 4 allows the origin of the red rectangles touched by the hypothesis H to be traced back to the grey ones from the original scheme.

Countermodel Construction. The following two lemmas contain one of the main ingredients for obtaining a completeness result: they show how to construct an influence for a particular statement in a normalised scheme and how to extend it piecewise to one that satisfies all the statements for the same variables in this scheme. Note that this does not construct an influence experiment (yet) as it does not show how to construct influences for other pairs of variables.

We first make an observation about the possibility to satisfy statements in a normalised scheme by particular influences. A sequence S<sub>1</sub>, ..., S<sub>n</sub> of statements S<sub>i</sub> = a [x<sub>i</sub>, y<sub>i</sub>] q<sub>i</sub> [x′<sub>i</sub>, y′<sub>i</sub>] b is called *connected* if y<sub>i</sub> = x<sub>i+1</sub>, i.e. S<sub>i+1</sub> = *rnb*(S<sub>i</sub>), for all i < n. A *connector* for S<sub>1</sub>, ..., S<sub>n</sub> is an influence f s.t. *dom*(f) = [x<sub>1</sub>, y<sub>n</sub>] and, for all i ≤ n, we have f |= [x<sub>i</sub>, y<sub>i</sub>] q<sub>i</sub> [x′<sub>i</sub>, y′<sub>i</sub>]. Such a connector f is *strict* if, additionally, for all i ≤ n we have f ⊭ [x<sub>i</sub>, y<sub>i</sub>] q′ [x′<sub>i</sub>, y′<sub>i</sub>] for any q′ ≺ q<sub>i</sub>. It is *range-covering* if there are x, y ∈ [x<sub>1</sub>, y<sub>n</sub>] such that f(x) = min{x′<sub>1</sub>, ..., x′<sub>n</sub>} and f(y) = max{y′<sub>1</sub>, ..., y′<sub>n</sub>}. Sometimes, we will need to construct connectors for single statements S, which are simply sequences of length 1.

Lemma 2 (Connectors Lemma). *Let C be consistent and normalised and S = a [x, x′] q [y, y′] b ∈ C.*


*Proof.* (Sketch) Parts (a)–(c) essentially boil down to a case distinction, depending on the behaviour q. However, it is relatively easy to observe that the requirements in all three cases are always satisfiable by a function that is either linear or composed of two linear functions on the interval [x, x′], making use of the intuitive fact that in a rectangle, with two points given on the left and right edge and one in the middle, it is always possible to draw a (straight) line within this rectangle from the left point to the middle one, and then continue it to the right one. Part (d) requires a decomposition of the sequence S<sub>1</sub>, ..., S<sub>n</sub> according to their behaviours.

An immediate consequence of this is the possibility to build influences for not just a single statement in a normalised scheme, but in fact for all the statements concerning the same pair of variables. This crucially relies on parts (b) and (c) of Lemma 2.

Lemma 3 (Small Extension Lemma). *Let V be a partially ordered set of variables, a, b ∈ V s.t. a < b, and C be a consistent and normalised V-influence scheme s.t.*

$$\mathcal{C}\_{a,b} = \{ \underbrace{[x\_1, x\_2] \, q\_1 \, I\_1}\_{T\_1}, \underbrace{[x\_2, x\_3] \, q\_2 \, I\_2}\_{T\_2}, \dots, \underbrace{[x\_n, x\_{n+1}] \, q\_n \, I\_n}\_{T\_n} \}.$$

*Let 1 ≤ j ≤ k ≤ n and f be a connector for T<sub>j</sub>, ..., T<sub>k</sub>. Then there is an influence f′ s.t. dom(f′) = [x<sub>1</sub>, x<sub>n+1</sub>], f′ |= T<sub>i</sub> for all i = 1, ..., n, and f(x) = f′(x) for all x ∈ [x<sub>j</sub>, x<sub>k+1</sub>].*

Completeness for Elementary Schemes over Diamond-Free Orders. Let ℭ be a class of pairs of schemes and statements. We say that the calculus of influence is *complete for* ℭ if for all (C, S) ∈ ℭ we have: if C |= S then C ⊢ S. We now concentrate on a class that allows for a construction proving completeness, and which still captures a large class of experiments and hypotheses occurring in the natural sciences; cf. the concluding section for a discussion of that.

We call a pair (a, b) of variables *elementary* if a < b and there is no c s.t. a < c < b. Any finite partial order is the (reflexive-)transitive closure of a finite set of elementary pairs. A statement a I q I′ b is called *elementary* if (a, b) is elementary. A scheme C is called *elementary* if all T ∈ C are elementary.

We say that the partial order ≤ is *diamond-free* if for all a, b, c, d: if a ≤ b ≤ d and a ≤ c ≤ d then b ≤ c or c ≤ b. In a finite diamond-free partial order, for every pair (a, b) with a < b there is a unique sequence c<sub>1</sub>, ..., c<sub>n</sub> for some n ≥ 0 s.t. (a, c<sub>1</sub>), (c<sub>n</sub>, b) and (c<sub>i</sub>, c<sub>i+1</sub>) for i = 1, ..., n − 1 are all elementary.
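Both notions are easy to check on a finite order. The following sketch assumes that the strict order is given as a transitively closed set of pairs; the function names `elementary_pairs` and `is_diamond_free` are hypothetical.

```python
from itertools import product


def elementary_pairs(variables, less):
    """Pairs (a, b) with a < b and no c strictly in between.

    `less` is the strict order as a set of pairs, assumed transitively closed.
    """
    return {
        (a, b)
        for (a, b) in less
        if not any((a, c) in less and (c, b) in less for c in variables)
    }


def is_diamond_free(variables, less):
    """Check: a <= b <= d and a <= c <= d imply b <= c or c <= b."""
    def leq(a, b):
        return a == b or (a, b) in less

    return all(
        leq(b, c) or leq(c, b)
        for a, b, c, d in product(variables, repeat=4)
        if leq(a, b) and leq(b, d) and leq(a, c) and leq(c, d)
    )


# The chain t < v < c from Example 1.
chain_vars = ["t", "v", "c"]
chain = {("t", "v"), ("v", "c"), ("t", "c")}

# A hypothetical diamond: a below b and c, both of which are below d.
diamond_vars = ["a", "b", "c", "d"]
diamond = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("a", "d")}
```

For the chain, the elementary pairs are (t, v) and (v, c), and the order is diamond-free. The second order is not, since b and c are incomparable although both lie between a and d.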

In a diamond-free elementary scheme, all *derivable* non-elementary statements can be traced back to applications of the transitivity rule (T). Moreover, in any normalisation of a diamond-free elementary scheme obtained as in Lemma 1, all non-elementary statements can be traced back to an application of rule (T).

Lemma 4 (Decomposition Lemma). *Let C be an elementary scheme over a diamond-free partial order and C<sup>∗</sup> be a normalisation of C obtained via Lemma 1. Suppose T = a I q I′ c ∈ C<sup>∗</sup> such that (a, c) is non-elementary. Then there is b with a < b < c and S = a I q<sub>1</sub> I<sub>1</sub> b and S′<sub>1</sub>, ..., S′<sub>n</sub> such that S, S′<sub>1</sub>, ..., S′<sub>n</sub> ∈ C<sup>∗</sup>, joining S′<sub>1</sub>, ..., S′<sub>n</sub> via (J) yields S′ = b I<sub>2</sub> q<sub>2</sub> I′ c, and q = q<sub>1</sub> ⊗ q<sub>2</sub>, I<sub>1</sub> ⊆ I<sub>2</sub>.*

The key ingredients are that all non-elementary statements in C<sup>∗</sup> are derivable in C, and the fact that C<sup>∗</sup> is normalised, whence a derivation of T in C can be used to generate a derivation of T in C<sup>∗</sup>. Note that w.l.o.g. we can assume that I<sub>1</sub> = I<sub>2</sub> in the above lemma.

Now let C be an elementary diamond-free scheme. We observe that any influence experiment that satisfies all statements in C on elementary relations automatically satisfies all *derivable* statements on non-elementary relations due to correctness of the rules in the calculus of influence, in particular their observance of the coherence principle. This yields the following.

Lemma 5 (Sufficiency Lemma). *Let C be an elementary and diamond-free scheme, and let C<sup>∗</sup> be a normalisation of C obtained via Lemma 1. Then any influence experiment that satisfies all elementary statements in C<sup>∗</sup> satisfies* all *statements of C<sup>∗</sup>.*

The next lemma then contains the heart of the completeness proof. It shows how to construct counterexamples, in the form of specific influence experiments, for normalised schemes and hypotheses that appear to state something different to what is contained in the normalised scheme.

Lemma 6 (Counterexample Lemma). *Let C be a consistent, elementary scheme over a diamond-free partial order and C<sup>∗</sup> be a normalisation of C obtained via Lemma 1. Let a, b ∈ V s.t. a < b and*

$$\mathcal{C}\_{a,b}^\* = \{ \underbrace{[x\_1, x\_2] \, q\_1 \, [l\_1, u\_1]}\_{T\_1}, \underbrace{[x\_2, x\_3] \, q\_2 \, [l\_2, u\_2]}\_{T\_2}, \dots, \underbrace{[x\_n, x\_{n+1}] \, q\_n \, [l\_n, u\_n]}\_{T\_n} \}.$$

*Let H = a [x<sub>0</sub>, y<sub>0</sub>] q [l, u] b. If one of the following conditions holds, then there is an influence experiment F s.t. F |= C<sup>∗</sup> but F ⊭ H.*


*Proof.* (Sketch) We give a high-level, intuitive idea of the construction. If (a, b) is elementary, it suffices to find an F<sub>a,b</sub> such that F<sub>a,b</sub> |= C<sup>∗</sup><sub>a,b</sub> but F<sub>a,b</sub> ⊭ H. The functions for the other elementary relations can be interpreted in an arbitrary fashion such that F<sub>c,d</sub> satisfies C<sup>∗</sup><sub>c,d</sub> for all (c, d). This is always possible since C, and hence C<sup>∗</sup>, is consistent. The interpretations of the non-elementary relations are then obtained automatically via the coherence principle; note that this always satisfies any statements on the respective non-elementary relations due to Lemma 5.

Case (a) is the simpler one. Here, [x<sub>0</sub>, y<sub>0</sub>] ⊈ [x<sub>1</sub>, x<sub>n+1</sub>]. Hence, it suffices to construct an experiment F s.t. *dom*(F<sub>a,b</sub>) = [x<sub>1</sub>, x<sub>n+1</sub>], whence F ⊭ H. F |= C can be ensured by simply truncating the domain of any influence experiment that satisfies C. Such an experiment exists since C is consistent.

For case (b), H disagrees with the statements in C<sup>∗</sup><sub>a,b</sub> in at least one of two ways: (i) it restricts the values of an experiment at some point x more than the unique statement T<sub>i</sub> in the sequence in C<sup>∗</sup><sub>a,b</sub> covering x does. Then we pick a value y that is covered by the vertical interval in T<sub>i</sub> but not in H, use Lemma 2 (a) to obtain a connector that runs through this point (x, y) and extend it to an influence using Lemma 3 to ensure F |= C but F ⊭ H. Or (ii) the behaviour stated in H is strictly stronger than those in the corresponding statements in C<sup>∗</sup><sub>a,b</sub>. Then we obtain a strict connector for these statements using Lemma 2 (d) and extend it accordingly using Lemma 3. Strictness ensures that the influence F<sub>a,b</sub> has the behaviours required by C<sup>∗</sup> but not by H, hence F ⊭ H as well.

If (a, b) is not elementary, by the decomposition lemma (Lemma 4) there is a sequence a = c<sub>1</sub>, ..., c<sub>n</sub> = b of elementary relations and a sequence S<sub>1</sub>, ..., S<sub>n−1</sub> of statements derivable in C<sup>∗</sup> that satisfy the requirements of Lemma 4. We omit case (a). If we are in case (b) (i), again we pick a point (x, y) not covered by H, but by the statements in C<sup>∗</sup><sub>a,b</sub>. We then generate a sequence of points (x<sub>i</sub>, y<sub>i</sub>) for i ≤ n such that x = x<sub>1</sub> and y<sub>i</sub> = x<sub>i+1</sub> for all i < n and y<sub>n</sub> = y. It then suffices to invoke Lemma 2 (a) and Lemma 3 to complete the individual relations F<sub>c<sub>i</sub>,c<sub>i+1</sub></sub> such that they go through the point (x<sub>i</sub>, y<sub>i</sub>).

For the case (b) (ii), it suffices to build interpretations of the F*<sup>c</sup>i,ci*+1 that are strict w.r.t. S*i*. However, for i > 1, the statement T*<sup>i</sup>* might not exist in C<sup>∗</sup>, but may only be derivable via (J). We use Lemma 2 (d) to obtain a strict, rangecovering connector for the sequence of statements that derive S*<sup>i</sup>* and, again, use Lemma 3 to complete it into an influence for F*<sup>c</sup>i,ci*+1 . Since these connectors are range-covering, we obtain a strict interpretation for F*a,b* from these intermediate F*ci,ci*+1 , which is the desired contradiction.

#### Theorem 2 (Completeness for Elementary Diamond-Free Schemes).

*The calculus of influence is complete for the class of consistent and elementary schemes over diamond-free partial orders, and arbitrary hypotheses.*

*Proof.* Let $\mathcal{C}$ be consistent and elementary, and let its underlying partial order $\le$ be diamond-free. Let $\mathcal{C}^*$ be a normalisation of $\mathcal{C}$ obtained via Lemma 1. Hence, $\mathcal{C}^*$ is also consistent. Let $H = a \; [x, y] \; q \; I \; b$ s.t. $a < b$ and suppose that

$$\mathcal{C}^*_{a,b} = \{ \underbrace{[x_1, x_2] \, q_1 \, I_1}_{T_1}, \underbrace{[x_2, x_3] \, q_2 \, I_2}_{T_2}, \dots, \underbrace{[x_n, x_{n+1}] \, q_n \, I_n}_{T_n} \}.$$

Moreover, by Lemma 1 we have $\mathcal{C} \vdash T_i$ for all $i = 1, \ldots, n$.

If $x < x_1$ or $y > x_{n+1}$ then Lemma 6 (a) would yield a contradiction to the assumption that $\mathcal{C}^* \models H$. Thus, there are $i$ and $j$ s.t. $x \in [x_i, x_{i+1}]$ and $y \in [x_j, x_{j+1}]$. Now we must have $\bigcup_{h=i}^{j} I_h \subseteq I$ and $\sup(q_i, \ldots, q_j) \le q$, for otherwise Lemma 6 (b) would yield a contradiction to the assumption that $\mathcal{C}^* \models H$.

Let $T := a \; [x_i, x_{j+1}] \; \sup(q_i, \ldots, q_j) \; (I_i \cup \ldots \cup I_j) \; b$. By repeated applications of rule (J), $T$ is provable from $T_i, \ldots, T_j$, whence $\mathcal{C} \vdash T$. Moreover, $H$ is provable from $T$ by at most one application of rule $(I^-)$ and $(Q^-)$ each. So $\mathcal{C} \vdash H$ as well.

The completeness proof shows that for any consistent scheme there is always a satisfying experiment that is comprised of stepwise linear functions. One may argue that this does not capture the heart of functional behaviour in natural sciences. It is possible, though, to require influences not only to be continuous but even differentiable (on their domains). To fulfil this requirement, one could simply use splines of order 3 in the proof of Lemma 2 with their first derivative being 0 at the left and right edges of each rectangle.

### 5 Proof Search and Empirical Results

We observe that the consequence relation between influence schemes and hypotheses is in fact polynomial-time decidable, using a bottom-up approach.

Theorem 3. *The problem of deciding, given a scheme $\mathcal{C}$ and a hypothesis $H$, whether or not $\mathcal{C} \vdash H$ holds, is decidable in time $|\mathcal{C}|^{O(1)}$.*

*Proof.* A close inspection of the proof rules shows that rule $(I^-)$ can always be pushed downwards in a proof and successive applications of it can be shortened to a single one, s.t. $\mathcal{C} \vdash H$ iff there is some $H'$ which is provable from $\mathcal{C}$ without using rule $(I^-)$, and $H$ can be derived from $H'$ by a single application of $(I^-)$.

Next we observe that all rules except $(I^-)$ have the following property: the bounds of domain and range of the conclusion are bounds of the domain or range of some premise. This guarantees termination of a simple bottom-up procedure for proof search: saturate $\mathcal{C}$ by applications of all rules other than $(I^-)$. The number of different statements created this way is bounded by $4 \cdot v^2 \cdot b^4 = O(|\mathcal{C}|^6)$, where $v$ is the number of variables occurring in $\mathcal{C}$, and $b$ is the number of different interval bounds occurring in it. For each of these statements, check whether $H$ can be derived using $(I^-)$. This can be done in time polynomial in $|\mathcal{C}|$.
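The bottom-up procedure in this proof is a fixed-point saturation. The following Python sketch illustrates only that control structure and is not the authors' tool: statements are simplified to tuples `(a, b, lo, hi)`, and `join_adjacent` is a hypothetical stand-in for a rule whose conclusion reuses interval bounds of its premises. The actual rules of the calculus are not reproduced here.

```python
from itertools import product


def join_adjacent(s, t):
    """Toy rule (assumption, not a rule of the paper): merge two statements
    over the same pair of variables whose domain intervals meet end to start.
    The bounds of the conclusion are bounds of the premises."""
    (a1, b1, lo1, hi1), (a2, b2, lo2, hi2) = s, t
    if (a1, b1) == (a2, b2) and hi1 == lo2:
        yield (a1, b1, lo1, hi2)


def saturate(statements, rules):
    """Close a set of statements under binary rules until nothing new appears."""
    known = set(statements)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Iterate over a snapshot so that `known` can grow inside the loop.
            for s, t in product(tuple(known), repeat=2):
                for new in rule(s, t):
                    if new not in known:
                        known.add(new)
                        changed = True
    return known


def statement_bound(v, b):
    """The bound 4 * v^2 * b^4 on the number of statements, as in the proof."""
    return 4 * v**2 * b**4


scheme = {("a", "b", 0, 1), ("a", "b", 1, 2), ("a", "b", 2, 5)}
closure = saturate(scheme, [join_adjacent])
```

Termination follows for the same reason as in the proof: no rule introduces a new interval bound, so only finitely many statements can ever be generated.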

An implementation of a proof search tool, written in Python, is publicly available.<sup>1</sup> The repository also contains formalisations of some influence schemes and examples of statements whose derivability can be checked using the tool. A deeper look at the implementation details is beyond the scope of this paper and deferred for space considerations. It uses a more sophisticated top-down proof search that constructs only the relevant part of the normalisation of a scheme, i.e. only "around" those statements that can occur in a proof for the given hypothesis H. This can not only contain statements about other variables due to rule (T) but also statements further away from H because rules (L<sup>+</sup> )–(R<sup>+</sup> ) can transmit requirements on underlying influence experiments along the horizontal axis.

### 6 Conclusion

We presented a simple language for statements about the behaviour of functions in a collection that can be interpreted as a way that different entities influence one another. We gave it a formal semantics and devised a proof calculus to characterise the (uncountable) notion of logical consequence that is generally sound and complete for a large class of schemes that covers typical cases occurring in the formal modelling of experimental setups from natural science classes.

It remains to be seen whether the calculus can be extended logically (by further rules for instance) to completely capture a larger class of influence schemes.

Future work will also comprise a number of extensions of the calculus for the purpose of obtaining higher expressiveness. Some experimental setups are inherently temporal in the sense that the influence which a asserts on b depends on a value range of a and a point in time, as in "*Yeast grows at temperatures between 15 and 40° during the next five minutes.*" We have made a proposal to incorporate time in [4]. It also incorporates the ability to make refined assertions about the behaviour of an influence, as in "*Voltage increase is at most* 65.4 V msec<sup>−1</sup>." This replaces the abstract behaviours etc. by intervals like [0, 65.4], and the geometric interpretation of a statement becomes a trapezoid.

Formal statements could also include a third interval denoting time points, and influence experiments become collections of binary real-valued functions which interpret cuboids in three-dimensional real spaces. This would also be an approach to model the combined effect of several variables on another variable, even if the modeling of time as a special variable is not desired.

Acknowledgement. We thank Shahla Rasulzade for discussions that have led to this work, and for suggesting to study a temporal extension thereof.

<sup>1</sup> https://github.com/SoerenMoeller/influence\_solver.

### References



# **A Theory of Cartesian Arrays (with Applications in Quantum Circuit Verification)**

Yu-Fang Chen<sup>1</sup>, Philipp Rümmer<sup>2,3(✉)</sup>, and Wei-Lun Tsai<sup>1</sup>

<sup>1</sup> IIS, Academia Sinica, Taipei, Taiwan

<sup>2</sup> University of Regensburg, Regensburg, Germany, ph_r@gmx.net

<sup>3</sup> Uppsala University, Uppsala, Sweden

**Abstract.** We present a theory of Cartesian arrays, which are multidimensional arrays with support for the projection of arrays to subarrays, as well as for updating sub-arrays. The resulting logic is an extension of Combinatorial Array Logic (CAL) and is motivated by the analysis of quantum circuits: using projection, we can succinctly encode the semantics of quantum gates as quantifier-free formulas and verify the end-to-end correctness of quantum circuits. Since the logic is expressive enough to represent quantum circuits succinctly, it necessarily has a high complexity; as we show, it suffices to encode the k-color problem of a graph under a succinct circuit representation, an NEXPTIME-complete problem. We present an NEXPTIME decision procedure for the logic and report on preliminary experiments with the analysis of quantum circuits using this decision procedure.

### **1 Introduction**

There has been extensive research on logics to reason about array data-types in programs. Arrays can concisely represent the values of an unbounded number of memory locations, and have been successfully applied to verify industrial-scale programs [11,15,29]. An array formula encoding the semantics of a program path is typically linear in the number of program statements. Much of the existing work focuses on one-dimensional arrays and uses nesting to handle the case of multiple dimensions.

This paper studies a logic called *Cartesian Array Logic (CaAL)*, in which multi-dimensional arrays are treated as first-class citizens. The motivation for designing this logic comes from developing a tailor-made theory for reasoning about *quantum circuits* or *programs*, which need a fundamentally different representation of states than classical programs. *Quantum states* exist in a *superposition* of classical states. Figure 1 gives an example of a 5-qubit quantum state, which can be interpreted as a probability distribution over $2^5$ classical states; every classical state, which can be seen as a string of $n$ bits, is associated with a probability of being observed.

Current SMT-based solutions for reasoning about quantum programs [3] encode program paths to a Satisfiability Modulo Theories (SMT) formula over the theory of real numbers. For an $n$-qubit quantum program, the direct encoding uses $2^n$ variables to represent the execution of a quantum circuit, one variable per classical state. The formula representing a quantum circuit is exponential in the circuit size.

In the Cartesian Array Logic designed in this paper, one can instead encode an $n$-qubit quantum state as an array $s : (\mathbb{B}^n \Rightarrow \mathbb{C})$ that maps each classical state to a complex number $c$ encoding the probability of this classical state being observed. The squared absolute value $|c|^2$ is the probability that the complex number $c$ encodes. *Quantum gates*, the basic operating units of a *quantum circuit*, can be viewed as functions that transform one quantum state to another. We show that CaAL can concisely encode the semantics of quantum gates, so that a path formula becomes linear in the circuit size. The semantics of a quantum circuit is the composition of the gate encodings.

*Structure of the Paper.* The syntax and formal semantics of the CaAL logic are given in Sect. 2. In the same section, we show that this logic is quite expressive: it can easily encode the satisfiability problem of quantified Boolean formulas (QBF). We show that deciding the logic is, in fact, NEXPTIME-hard by a polynomial reduction from the k-color problem of a succinct circuit representation of graphs [23]. As an application, in Sect. 3, we show that the logic can concisely encode the semantics of *quantum circuits*, using $\mathbb{B}^n$ as the index type and $\mathbb{C}$ as the value type. In Sect. 4, we present a decision procedure for CaAL, extending the classical approach of read-over-write propagation used for arrays. In the worst case, our procedure might perform an exponential number of such propagations; hence, if the underlying logic can be decided in NP, our logic can be decided in NEXPTIME. The preliminary experimental results of applying this decision procedure to quantum circuit verification can be found in Sect. 5.

*Contributions* of the paper are (i) a new array logic, CaAL, with native support for multi-dimensional arrays; (ii) a proof that the satisfiability problem of CaAL is NEXPTIME-hard; (iii) a linear encoding of the semantics of quantum circuits in CaAL; (iv) an NEXPTIME decision procedure for CaAL without nested array sorts; and (v) a preliminary evaluation of our approach using standard quantum circuits.

*Related Work on Verification of Quantum Circuits.* Although quantum states can be naturally represented as arrays, the connection between array theories and quantum circuit verification is novel, to the best of our knowledge. In the past, people have considered automated quantum circuit verification based on automata [7], various types of equivalence checking [1,9,19,33], abstract interpretation [24,34], and model checking [13,21,32]. However, techniques based on satisfiability modulo theories (SMT) are still lacking. The closest work to ours is a symbolic execution and verification framework of quantum circuits [3]. The work encodes quantum circuit verification problems into SMT with the theory of real numbers, using variables in trigonometric functions, e.g., sin x, which might lose precision in corner cases. As mentioned, their approach requires $2^n$ variables to encode an $n$-qubit circuit in the worst case. As far as we know, our work is the first SMT-based approach that allows a precise and succinct encoding and verification of quantum circuits.

*Related Work on Array Theories.* There is a large body of research on array decision procedures for SMT, going back to the 1980s, and most SMT solvers implement at least the theory of extensional arrays (with operations *read* and *write*/*store*) in our paper, as standardized in SMT-LIB [2]. Stump et al. [29] presented a decision procedure for this theory, which formed the basis for many later procedures. An extension of the theory, called Combinatorial Array Logic (CAL), with functions for *constant arrays* and for the *point-wise extension of functions* was presented by De Moura et al. [11]. CAL served as the main inspiration for our work and is in this paper extended further by adding *projections* and *updates of sub-arrays*. An extension of CAL with *cardinality constraints* was presented by Raya et al. [25]. Christ et al. [8] present an algorithm for the theory of arrays where lemmas are created lazily based on weak equivalences; this method was later extended to handle *constant arrays* [20].

There are also many more generalized decision procedures for arrays. For instance, Ganesh et al. [16] focus on the combined theory of arrays and bitvectors and present a decision procedure based on pre-processing, bit-blasting, and linear arithmetic solving. Brummayer et al. present a decision procedure for the same theory that introduces lemmas lazily, guided by congruence closure [6]. An extended array theory tailored to software, including operations memset and memcpy, was presented by Falke et al. [12]. More recently, several theories of finite arrays were proposed. Bonacina et al. [5] extend the standard theory of arrays with an abstract notion of length, and present a decision procedure based on the CDSAT framework. Wang et al. [31] consider a logic extending CAL with a length function, as well as operations for concatenation, slicing, and repetition of arrays, and identify a decidable fragment. Sheng et al. [27] propose a theory of sequences that combines the standard array operations with a length function, concatenation, and slicing. All those logics cannot directly encode quantum circuits in a similar style as CaAL, however, since no projection operation is available.

### **2 A Theory of Cartesian Arrays**

### **2.1 Preliminaries**

We work in the setting of multi-sorted first-order logic with equality; see, e.g., [18]. A signature is a tuple $\Sigma = (\Sigma^S, \Sigma^F, \Sigma^P)$ consisting of a set $\Sigma^S$ of sorts, a set $\Sigma^F$ of function symbols, and a set $\Sigma^P$ of predicates. Predicates and functions have fixed arity and argument sorts, and functions have a fixed result sort. Given a signature $\Sigma$ and a set $X$ of sorted variables, we define the usual notions of $\Sigma$-terms, $\Sigma$-atoms, $\Sigma$-literals, $\Sigma$-formulas, and $\Sigma$-sentences. Formulas are evaluated over $\Sigma$-structures $M = (D, I)$ that interpret every sort $\sigma \in \Sigma^S$ as a non-empty domain $I(\sigma) \subseteq D$, predicates $p \in \Sigma^P$ as relations $I(p)$, and functions $f \in \Sigma^F$ as set-theoretical functions $I(f)$. We slightly abuse notation and assume that variables $x \in X$ are also mapped to values $I(x)$ by $M$. The evaluation of terms, formulas, etc., is defined as is common; the equality symbol $=$ is assumed to be interpreted as the equality relation on $D$. A theory $T$ over $\Sigma$ is a set of $\Sigma$-sentences. A $\Sigma$-formula $\phi$ is called $T$-satisfiable if there is a $\Sigma$-structure $M$ satisfying both the $T$-axioms and $\phi$.

#### **2.2 Definition of the Theory of Cartesian Arrays**

Cartesian arrays are introduced in the context of a base signature $\Sigma_B$ and a base $\Sigma_B$-theory $T_B$, which provides the index and value sorts for arrays. The signature $\Sigma_{\mathrm{CaAL}} = (\Sigma^S_{\mathrm{CaAL}}, \Sigma^F_{\mathrm{CaAL}}, \Sigma^P_{\mathrm{CaAL}})$ of CaAL is then defined as follows. The set of sorts is the least set $\Sigma^S_{\mathrm{CaAL}}$ such that (i) $\Sigma^S_B \subseteq \Sigma^S_{\mathrm{CaAL}}$, and (ii) $\sigma, \tau \in \Sigma^S_{\mathrm{CaAL}}$ and $n \in \mathbb{N}_{>0}$ imply $(\sigma^n \Rightarrow \tau) \in \Sigma^S_{\mathrm{CaAL}}$. A sort $(\sigma^n \Rightarrow \tau)$ is an array sort of arity $n$ with index sort $\sigma$ and value sort $\tau$.


**Table 1.** Operations included in $\Sigma^F_{\mathrm{CaAL}}$ for each sort $(\sigma^n \Rightarrow \tau)$.

The set $\Sigma^F_{\mathrm{CaAL}}$ includes $\Sigma^F_B$, as well as the operations listed in Table 1 for every array sort $(\sigma^n \Rightarrow \tau)$. The operators $\cdot[\cdot, \ldots, \cdot]$ and *store* are the functions for reading from and writing to arrays, as in the standard theory of arrays. $K$ and $\mathit{map}_f$ correspond to the functions introduced in CAL [11]; in particular, any base function $f \in \Sigma^F_B$ is lifted to an operator on arrays using $\mathit{map}_f$. The operators *proj* and *arrayStore* are specific to our theory CaAL, and can be used to project an $n$-dimensional array to an $(n-1)$-dimensional sub-array by fixing the value of the $k$-th index, and to update the corresponding portion of the original array, respectively. The set $\Sigma^P_{\mathrm{CaAL}}$ coincides with $\Sigma^P_B$. Semantics is defined by the axiom schemata in Table 2.

*Example 1.* We illustrate the use of two-dimensional arrays $s, s' : (\mathbb{B}^2 \Rightarrow \mathbb{C})$ to encode two-qubit quantum states. Suppose that $s$ represents the state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, and $s' = X_2(s)$ is the quantum state after applying an $X$ gate (the quantum version of a "not"-gate) on the 2nd qubit of $s$. The matrix representations of $s$ and $s'$ are as follows; note that the rows for $x_2 = 0$ and $x_2 = 1$ are swapped between $s$ and $s'$.

$$s = \begin{array}{c|cc} & x_1 = 0 & x_1 = 1 \\ \hline x_2 = 0 & \frac{1}{\sqrt{2}} & 0 \\ x_2 = 1 & 0 & \frac{1}{\sqrt{2}} \end{array}\,, \qquad s' = \begin{array}{c|cc} & x_1 = 0 & x_1 = 1 \\ \hline x_2 = 0 & 0 & \frac{1}{\sqrt{2}} \\ x_2 = 1 & \frac{1}{\sqrt{2}} & 0 \end{array}\,.$$

The projection $\mathit{proj}_1(s, k)$ maps the matrix $s$ to its $k$-th column vector, specifically the column with $x_1 = k$. In CaAL, we can construct $s'$ from $s$ as $s' = \mathit{arrayStore}_2(\mathit{arrayStore}_2(K(0), 1, \mathit{proj}_2(s, 0)), 0, \mathit{proj}_2(s, 1))$. To compute the sum of the two matrices, we use $\mathit{map}_+(s, s')$, which is also utilized for other quantum gate operations.
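The construction in Example 1 can be replayed on explicit finite arrays. The sketch below is an illustrative model only, under the assumption that an array of sort $(\mathbb{B}^n \Rightarrow \tau)$ is stored as a Python dictionary from index tuples to values; the helper names `const`, `proj`, `array_store`, and `map_f` are chosen here for illustration and are not part of any solver API.

```python
from itertools import product
from math import sqrt

B = (0, 1)  # index sort: the Booleans, written 0 and 1


def const(n, x):
    """K(x): the n-dimensional array with value x at every index."""
    return {idx: x for idx in product(B, repeat=n)}


def proj(a, k, i):
    """proj_k(a, i): fix the k-th index (1-based) to i and drop it."""
    return {idx[:k - 1] + idx[k:]: v for idx, v in a.items() if idx[k - 1] == i}


def array_store(a, k, i, b):
    """arrayStore_k(a, i, b): replace the slice of a with k-th index i by b."""
    out = dict(a)
    for idx in a:
        if idx[k - 1] == i:
            out[idx] = b[idx[:k - 1] + idx[k:]]
    return out


def map_f(f, *arrays):
    """map_f(a_1, ..., a_m): apply f point-wise to arrays of equal shape."""
    return {idx: f(*(arr[idx] for arr in arrays)) for idx in arrays[0]}


r = 1 / sqrt(2)
# s encodes (|00> + |11>) / sqrt(2); indices are (x1, x2).
s = {(0, 0): r, (0, 1): 0.0, (1, 0): 0.0, (1, 1): r}

# s' = arrayStore_2(arrayStore_2(K(0), 1, proj_2(s, 0)), 0, proj_2(s, 1))
s_prime = array_store(
    array_store(const(2, 0.0), 2, 1, proj(s, 2, 0)), 2, 0, proj(s, 2, 1)
)

total = map_f(lambda x, y: x + y, s, s_prime)  # map_+(s, s')
```

In this model `s_prime[(x1, x2)]` equals `s[(x1, 1 - x2)]`, which is the effect of the $X$ gate on the second qubit.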

Several extensions of the theory of Cartesian arrays are possible but beyond the scope of this paper. Those include (i) arrays with multiple different index sorts, as opposed to just n copies of the same index sort σ; and (ii) a theory that also includes point-wise extensions of predicates.

**Table 2.** Axioms of the Theory of Cartesian Arrays. As shorthand notation, we write $\bar{i} : \sigma^n$ for a vector of $n$ index variables $i_1 : \sigma, \ldots, i_n : \sigma$.

$$\begin{aligned}
&\forall a : (\sigma^n \Rightarrow \tau),\ \bar{i} : \sigma^n,\ x : \tau.\ \ \mathit{store}(a, \bar{i}, x)[\bar{i}] = x && (1)\\
&\forall a : (\sigma^n \Rightarrow \tau),\ \bar{i} : \sigma^n,\ \bar{j} : \sigma^n,\ x : \tau.\ \ \bar{i} = \bar{j} \lor \mathit{store}(a, \bar{i}, x)[\bar{j}] = a[\bar{j}] && (2)\\
&\forall a, b : (\sigma^n \Rightarrow \tau).\ \exists \bar{i} : \sigma^n.\ \ a = b \lor a[\bar{i}] \neq b[\bar{i}] && (3)\\
&\forall x : \tau,\ \bar{i} : \sigma^n.\ \ K(x)[\bar{i}] = x && (4)\\
&\forall a_1 : (\sigma^n \Rightarrow \tau_1), \ldots, a_k : (\sigma^n \Rightarrow \tau_k),\ \bar{i} : \sigma^n.\ \ \mathit{map}_f(a_1, \ldots, a_k)[\bar{i}] = f(a_1[\bar{i}], \ldots, a_k[\bar{i}]) && (5)\\
&\forall a : (\sigma^n \Rightarrow \tau),\ \bar{i} : \sigma^n.\ \ \mathit{proj}_k(a, i_k)[i_1, \ldots, i_{k-1}, i_{k+1}, \ldots, i_n] = a[\bar{i}] && (6)\\
&\forall a : (\sigma^n \Rightarrow \tau),\ b : (\sigma^{n-1} \Rightarrow \tau),\ \bar{i} : \sigma^n.\ \ \mathit{arrayStore}_k(a, i_k, b)[\bar{i}] = b[i_1, \ldots, i_{k-1}, i_{k+1}, \ldots, i_n] && (7)\\
&\forall a : (\sigma^n \Rightarrow \tau),\ b : (\sigma^{n-1} \Rightarrow \tau),\ \bar{i} : \sigma^n,\ j : \sigma.\ \ j = i_k \lor \mathit{arrayStore}_k(a, j, b)[\bar{i}] = a[\bar{i}] && (8)
\end{aligned}$$

#### **2.3 Complexity of Satisfiability in CaAL**

We now study the hardness of satisfiability of quantifier-free CaAL formulas. The quantified Boolean formula problem (QBF) generalizes the Boolean satisfiability problem by allowing *existential* and *universal* quantifiers to be applied to variables. Its satisfiability problem is *PSPACE-complete* [28]. Without loss of generality, we can assume that QBF formulas are in *prenex normal form* $Q_1 x_1. Q_2 x_2. \cdots Q_n x_n. \phi$, which consists of a Boolean formula $\phi$ over $n$ Boolean variables $x_1, \ldots, x_n$, and a prefix of quantifiers $Q_1, Q_2, \ldots, Q_n \in \{\forall, \exists\}$.

To reduce the satisfiability problem of QBF to CaAL, we assume that the base theory provides a sort $\mathbb{B}$ with the standard operations. This sort will be used for both indices and values. An array $\mathsf{toCaAL}(\phi) : (\mathbb{B}^n \Rightarrow \mathbb{B})$ encoding the semantics of $\phi$ is defined recursively as follows:

- $\mathsf{toCaAL}(x_k) = \mathit{arrayStore}_k(K(0), 1, K(1))$.
- $\mathsf{toCaAL}(\neg\phi) = \mathit{map}_\neg(\mathsf{toCaAL}(\phi))$.
- $\mathsf{toCaAL}(\phi_1 \land \phi_2) = \mathit{map}_\land(\mathsf{toCaAL}(\phi_1), \mathsf{toCaAL}(\phi_2))$.

Observe that $\mathit{arrayStore}_k(K(0), 1, K(1))[i_1, \ldots, i_k, \ldots, i_n] = i_k$, and note that the size of $\mathsf{toCaAL}(\phi)$ is linear in the size of $\phi$. We can construct a CaAL formula that is equisatisfiable with $Q_1 x_1. \cdots Q_n x_n. \phi$ as follows:

$$\mathsf{QElim}(Q_1 x_1. \cdots Q_n x_n. \phi) = (q_1[0] \odot_1 q_1[1]) \land \bigwedge_{i=2}^{n} q_{i-1} = \mathit{map}_{\odot_i}(\mathit{proj}_i(q_i, 0), \mathit{proj}_i(q_i, 1)) \land q_n = \mathsf{toCaAL}(\phi)$$

where $\odot_i = \land$ when $Q_i = \forall$, and $\odot_i = \lor$ otherwise. Note that the QBF formula $Q_1 x_1. \cdots Q_n x_n. \phi$ is valid if and only if the CaAL formula $\mathsf{QElim}(Q_1 x_1. \cdots Q_n x_n. \phi)$ is satisfiable.
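To make the reduction concrete, the following Python sketch evaluates the encoding on explicit arrays. It is a brute-force illustration, not a decision procedure: arrays are assumed to be dictionaries over Boolean index tuples, formulas are nested tuples `('var', k)`, `('not', p)`, `('and', p, q)`, and all helper names are hypothetical. Since the equations of the encoding determine $q_n, \ldots, q_1$ uniquely, satisfiability amounts to computing them and checking the first conjunct.

```python
from itertools import product

B = (False, True)


def const(n, x):
    """K(x) as an n-dimensional array."""
    return {idx: x for idx in product(B, repeat=n)}


def proj(a, k, i):
    """proj_k(a, i): fix the k-th index (1-based) to i and drop it."""
    return {idx[:k - 1] + idx[k:]: v for idx, v in a.items() if idx[k - 1] == i}


def array_store(a, k, i, b):
    """arrayStore_k(a, i, b): replace the slice of a with k-th index i by b."""
    out = dict(a)
    for idx in a:
        if idx[k - 1] == i:
            out[idx] = b[idx[:k - 1] + idx[k:]]
    return out


def map_f(f, *arrays):
    """Point-wise lifting of f to arrays of equal shape."""
    return {idx: f(*(arr[idx] for arr in arrays)) for idx in arrays[0]}


def to_caal(phi, n):
    """Array over B^n holding the truth table of phi."""
    tag = phi[0]
    if tag == "var":
        return array_store(const(n, False), phi[1], True, const(n - 1, True))
    if tag == "not":
        return map_f(lambda x: not x, to_caal(phi[1], n))
    if tag == "and":
        return map_f(lambda x, y: x and y, to_caal(phi[1], n), to_caal(phi[2], n))
    raise ValueError(f"unknown connective: {tag}")


def qelim(quantifiers, phi):
    """Evaluate the encoding for a prefix given as a list of 'forall'/'exists'."""
    def op(q):
        return (lambda x, y: x and y) if q == "forall" else (lambda x, y: x or y)

    n = len(quantifiers)
    q = to_caal(phi, n)                      # q_n
    for i in range(n, 1, -1):                # q_{i-1} from q_i
        q = map_f(op(quantifiers[i - 1]), proj(q, i, False), proj(q, i, True))
    return op(quantifiers[0])(q[(False,)], q[(True,)])


x1, x2 = ("var", 1), ("var", 2)
# x1 xor x2, written with "not" and "and" only
xor = ("and", ("not", ("and", x1, x2)),
       ("not", ("and", ("not", x1), ("not", x2))))
```

For example, `qelim(["forall", "exists"], xor)` evaluates the formula $\forall x_1. \exists x_2.\ x_1 \oplus x_2$.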

**Theorem 1.** *The satisfiability problem of CaAL over* B *is PSPACE-hard.*

This lower bound can be improved, however. The k-colorability problem for graphs with succinct circuit representation is NEXPTIME-complete [23]. This problem can be reduced to the satisfiability problem of CaAL in polynomial time as well.

Consider an undirected graph with $2^n$ nodes, and let $\phi(\bar{x}, \bar{x}')$ be a Boolean circuit encoding the edge relation of the graph: $\phi(\bar{x}, \bar{x}')$ evaluates to true whenever there is an edge $(\bar{x}) \to (\bar{x}')$ in the graph. The k-colorability of the graph can be characterized as the following formula, where $c : (\mathbb{B}^n \Rightarrow \mathbb{N})$ is an array representing the color of each node:

$$\forall \bar{x}, \bar{x}' : \mathbb{B}^n.\ \phi(\bar{x}, \bar{x}') \to \big(c[\bar{x}] \neq c[\bar{x}'] \land c[\bar{x}] < k \land c[\bar{x}'] < k\big).$$

In a similar way as for QBF, we encode $\phi$ as an array formula $\phi'$ of linear size, in which $a_\phi : (\mathbb{B}^n \times \mathbb{B}^n \Rightarrow \mathbb{B})$ is an array variable representing the edge relation. We then create two intermediate arrays $a, b : (\mathbb{B}^n \times \mathbb{B}^n \Rightarrow \mathbb{N})$ and use the following formula in CaAL to encode the relation $\forall \bar{x}, \bar{x}' : \mathbb{B}^n.\ a[\bar{x}, \bar{x}'] = c[\bar{x}] \land b[\bar{x}, \bar{x}'] = c[\bar{x}']$:

$$\begin{aligned}
\mathsf{EqColor}(a, b, c) \equiv{} & a = a_n \land c = a_0 \land \bigwedge_{j=1}^{n} \mathit{proj}_{j+n}(a_j, 0) = \mathit{proj}_{j+n}(a_j, 1) = a_{j-1} \ \land \\
& b = b_n \land c = b_0 \land \bigwedge_{j=1}^{n} \mathit{proj}_{j}(b_j, 0) = \mathit{proj}_{j}(b_j, 1) = b_{j-1}
\end{aligned}$$

Then we encode the k-color problem with the following CaAL formula:

$$\phi' \land \mathsf{EqColor}(a, b, c) \land \mathit{map}_f(a_\phi, a, b) = K(1)$$

$$\text{where} \quad f(e, \mathit{col}_1, \mathit{col}_2) \equiv e \rightarrow (\mathit{col}_1 \neq \mathit{col}_2 \land \mathit{col}_1 < k \land \mathit{col}_2 < k).$$
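The shape of this encoding can be checked on small explicit graphs. The sketch below is a brute-force illustration under simplifying assumptions: the edge relation is given directly as a Python predicate `edge` instead of a succinct circuit, arrays are dictionaries over index tuples, candidate colorings are enumerated exhaustively, and every function name is hypothetical. It follows the chains $a_n, \ldots, a_0$ and $b_n, \ldots, b_0$ of EqColor and then tests $\mathit{map}_f(a_\phi, a, b) = K(1)$.

```python
from itertools import product

B = (0, 1)


def proj(a, k, i):
    """proj_k(a, i): fix the k-th index (1-based) to i and drop it."""
    return {idx[:k - 1] + idx[k:]: v for idx, v in a.items() if idx[k - 1] == i}


def map_f(f, *arrays):
    """Point-wise lifting of f to arrays of equal shape."""
    return {idx: f(*(arr[idx] for arr in arrays)) for idx in arrays[0]}


def eq_color(a, b, c, n):
    """Check EqColor(a, b, c): a ignores its last n indices, b its first n."""
    a_j, b_j = a, b
    for j in range(n, 0, -1):
        a0, a1 = proj(a_j, j + n, 0), proj(a_j, j + n, 1)
        b0, b1 = proj(b_j, j, 0), proj(b_j, j, 1)
        if a0 != a1 or b0 != b1:
            return False
        a_j, b_j = a0, b0          # these play the role of a_{j-1}, b_{j-1}
    return a_j == c and b_j == c


def k_colorable(edge, n, k):
    """Search for a coloring c such that map_f(a_phi, a, b) is constantly true."""
    nodes = list(product(B, repeat=n))
    a_phi = {x + y: edge(x, y) for x in nodes for y in nodes}

    def f(e, col1, col2):
        return (not e) or (col1 != col2 and col1 < k and col2 < k)

    for colors in product(range(k), repeat=len(nodes)):
        c = dict(zip(nodes, colors))
        a = {x + y: c[x] for x in nodes for y in nodes}
        b = {x + y: c[y] for x in nodes for y in nodes}
        if eq_color(a, b, c, n) and all(map_f(f, a_phi, a, b).values()):
            return True
    return False


def square(x, y):
    """4-cycle on B^2: nodes are adjacent iff they differ in exactly one bit."""
    return sum(p != q for p, q in zip(x, y)) == 1


def complete(x, y):
    """Complete graph on B^2."""
    return x != y
```

The exhaustive loop over colorings is exponential in the number of nodes, which is itself exponential in $n$; it is meant only to illustrate what the formula expresses.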

**Theorem 2.** *The satisfiability problem of CaAL is NEXPTIME-hard.*

### **3 Array Semantics of Quantum Circuits**

As an application, we show that CaAL can encode the semantics of quantum circuits. Below, we only give a short overview of quantum circuits and define notations; for more details, see, e.g., the textbook of Nielsen and Chuang [22].

In an $n$-qubit quantum system, a state is a *superposition* of *computational basis states* $\{|j\rangle \mid j \in \{0, 1\}^n\}$. For example, for a system with three qubits $x_1$, $x_2$, and $x_3$, the computational basis state $|101\rangle$ (in Dirac notation) denotes a state in which both $x_1$ and $x_3$ are set to 1, and $x_2$ is set to 0. An $n$-qubit quantum state $s$ is then denoted as a formal sum $\sum_{j \in \{0,1\}^n} c_j \cdot |j\rangle$, where $c_0, c_1, \ldots, c_{2^n - 1} \in \mathbb{C}$ are *complex probability amplitudes* satisfying the constraint that $\sum_{j \in \{0,1\}^n} |c_j|^2 = 1$. Intuitively, $|c_j|^2$ is the probability that when we measure the quantum state $s$ in the computational basis, we obtain the basis state $|j\rangle$. The constraint $\sum_{j \in \{0,1\}^n} |c_j|^2 = 1$ states that probabilities need to sum up to 1 over all computational basis states.

We can record a quantum state as an array that maps a computational basis state to its complex probability amplitude. The state $s$ is represented as an array $s : (\mathbb{B}^n \Rightarrow \mathbb{C})$ satisfying $s[j] = c_j$ for all $j \in \{0, 1\}^n$; slightly abusing notation, we denote both the state and the array by $s$.

#### **3.1 Quantum Circuits**

A *quantum circuit* consists of a sequence of *quantum gates*. Each quantum gate defines a specific transformation of quantum states. For example, the Pauli-X gate (the quantum version of the classical "not" gate) on the $k$-th qubit transforms a state $s$ to $s'$ satisfying $\forall i \in \{0, 1\}^{k-1}, b \in \{0, 1\}, j \in \{0, 1\}^{n-k} : s'[ibj] = s[i\bar{b}j]$, i.e., it negates the $k$-th index bit. Another example is the Pauli-Z gate on the $k$-th qubit, which transforms a state $s$ to $s'$ satisfying $\forall i \in \{0, 1\}^{k-1}, b \in \{0, 1\}, j \in \{0, 1\}^{n-k} : s'[ibj] = \mathit{ite}(b, -1 \cdot s[ibj], s[ibj])$. Here, probability amplitudes are multiplied with $-1$ when $b$ is 1, and are unchanged otherwise.

**Fig. 2.** The EPR circuit, consisting of an H and a CX gate with control qubit (•) and target qubit (⊕).

An H gate, or Hadamard gate, on the $k$-th qubit transforms a state $s$ to $s'$ satisfying $\forall i \in \{0, 1\}^{k-1}, b \in \{0, 1\}, j \in \{0, 1\}^{n-k}$:

$$s'[ibj] = \mathit{ite}\Big(b, \frac{s[i0j] - s[i1j]}{\sqrt{2}}, \frac{s[i0j] + s[i1j]}{\sqrt{2}}\Big).$$

Notice that the amplitude of a basis state of $s'$ is affected by the amplitudes of two basis states of $s$, enabling a more diverse superposition. The division by $\sqrt{2}$ is for normalizing the probability sum.

A more advanced class of gates are multiple-qubit gates. The CX gate ("controlled-X") on the control qubit $c$ and target qubit $t$ applies an X gate to $t$ when $c$ is 1, and is the identity otherwise. Formally, assuming $c < t$, the gate transforms a state $s$ to $s'$ satisfying $\forall i_1 \in \{0, 1\}^{c-1}, b_c \in \{0, 1\}, i_2 \in \{0, 1\}^{t-c-1}, b_t \in \{0, 1\}, i_3 \in \{0, 1\}^{n-t}$:

$$s'[i_1 b_c i_2 b_t i_3] = \mathit{ite}(b_c, s[i_1 b_c i_2 \bar{b}_t i_3], s[i_1 b_c i_2 b_t i_3]).$$

The Toffoli gate CCX ("controlled-controlled-X gate") has two control qubits $c, d$ and applies the X gate to the target qubit $t$ only when $c = d = 1$.

We have introduced enough quantum gates to define the EPR circuit (Fig. 2), named after Einstein, Podolsky, and Rosen, which constructs the Bell state: it is a 2-qubit circuit converting the basis state $|00\rangle$ to the maximally entangled state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. Starting from a state $s$, represented as an array that maps 00 to 1 and all other indices to 0, the circuit first applies H on the first qubit $x_1$ (denoted $H_1$ in this paper) to produce the quantum state $s'$ with $s'[00] = s'[10] = \frac{1}{\sqrt{2}}$ and $s'[11] = s'[01] = 0$. Then a $CX_{1,2}$ gate converts it further to $s''$ with $s''[00] = s''[11] = \frac{1}{\sqrt{2}}$ and $s''[01] = s''[10] = 0$. Notice that $CX_{1,2}$ converts $|10\rangle$ to $|11\rangle$, i.e., when $x_1$ is 1, it negates $x_2$.
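The gate definitions above can be executed directly on explicit state arrays. The following Python sketch is a naive simulator for illustration only, assuming a state is a dictionary from bit tuples $(x_1, \ldots, x_n)$ to amplitudes; the function names are hypothetical, and the exponential-size representation is exactly what the symbolic encoding in CaAL is meant to avoid.

```python
from itertools import product
from math import isclose, sqrt


def basis_state(bits):
    """Array mapping the given bit tuple to 1 and every other index to 0."""
    bits = tuple(bits)
    return {idx: (1.0 if idx == bits else 0.0)
            for idx in product((0, 1), repeat=len(bits))}


def with_bit(idx, k, b):
    """Copy of idx whose k-th position (1-based) is set to b."""
    return idx[:k - 1] + (b,) + idx[k:]


def x_gate(s, k):
    """Pauli-X: s'[i b j] = s[i (not b) j]."""
    return {idx: s[with_bit(idx, k, 1 - idx[k - 1])] for idx in s}


def z_gate(s, k):
    """Pauli-Z: negate the amplitude where the k-th bit is 1."""
    return {idx: (-v if idx[k - 1] == 1 else v) for idx, v in s.items()}


def h_gate(s, k):
    """Hadamard: combine the two amplitudes that differ only in the k-th bit."""
    out = {}
    for idx in s:
        s0, s1 = s[with_bit(idx, k, 0)], s[with_bit(idx, k, 1)]
        out[idx] = (s0 - s1) / sqrt(2) if idx[k - 1] == 1 else (s0 + s1) / sqrt(2)
    return out


def cx_gate(s, c, t):
    """Controlled-X: flip the t-th bit wherever the c-th bit is 1."""
    return {idx: (s[with_bit(idx, t, 1 - idx[t - 1])] if idx[c - 1] == 1 else v)
            for idx, v in s.items()}


def total_probability(s):
    """Sum of squared absolute amplitudes; 1 for a valid quantum state."""
    return sum(abs(v) ** 2 for v in s.values())


# EPR circuit: H on qubit 1, then CX with control 1 and target 2.
s = basis_state((0, 0))
s1 = h_gate(s, 1)
s2 = cx_gate(s1, 1, 2)
```

Running the two gates on the state that maps 00 to 1 reproduces the intermediate state $s'$ and the Bell state $s''$ described in the text.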

*Note on Complexity.* Simulation of a quantum circuit is bounded-error quantum polynomial time (BQP) hard, a complexity class that is incomparable with NP, as it can compute exactly the probability amplitudes of a quantum state after executing a circuit. We will show that the Cartesian array logic can encode the semantics of quantum circuits, so one can also use the logic for quantum circuit simulation. Hence, an exponential-time procedure is the best deterministic algorithm we can hope for when solving CaAL formulas.

**Table 3.** Semantics of quantum gates in Cartesian array logic. We use $s$ and $s'$ to denote the quantum state before and after executing the circuit.

### **3.2 Interpretation of Quantum Gates**

We show the encoding of quantum gates in CaAL in Table 3. Notice that this gate set includes several universal gates (e.g., H, CX, and T [10]) that can approximate *any quantum gate* to an arbitrary precision requirement. Arbitrary degree rotation can also be supported using the theory of reals as the base theory. This paper presents a precise encoding that only requires a theory of integers. In the table, we use $s$ and $s'$ to denote the quantum states (encoded as arrays) before and after executing a quantum gate. To encode $s' = X_k(s)$, negating the $k$-th qubit, we use $\mathit{proj}_k(s', 0) = \mathit{proj}_k(s, 1) \wedge \mathit{proj}_k(s', 1) = \mathit{proj}_k(s, 0)$: the slice of $s'$ at index $k = 0$ equals the slice of $s$ at index $k = 1$, and vice versa. The handling of Z, S, and T gates is similar, using the *map* function to multiply the array values with different constants. Note that here we use $\omega$ to represent $e^{\pi i/4} = \cos\frac{\pi}{4} + i\sin\frac{\pi}{4} = \frac{1}{\sqrt{2}} + \frac{i}{\sqrt{2}}$, the unit vector that is at an angle of 45° to the positive real axis in the complex plane. Later we will show that this representation allows a precise algebraic representation of complex numbers using a five-tuple of integers. Observe that $\omega^4 = -1$. The Y gate combines the two constructions; it negates the $k$-th index qubit and multiplies each projection with different constant coefficients. For the H, $R_x(\frac{\pi}{2})$, and $R_y(\frac{\pi}{2})$ gates, we use a binary *map* function to update the amplitudes. For the controlled gates, we use the projection function to classify the cases according to the control bits and apply the X or Z gate only when all control bits are 1.

*Example 2.* We use CaAL to verify the correctness of the EPR circuit (Fig. 2): the circuit transforms the state $|00\rangle$ to $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. For this, the initial state of the circuit is encoded as an array expression, the H and CX gates are encoded according to Table 3, and the intended final state of the circuit is represented as a negated equation:

$$\begin{array}{ll} & s\_0 = store(K(0), (0,0), 1) \\ \wedge & proj\_1(s\_1, 0) = map\_{(\cdot + \cdot)/\sqrt{2}}(proj\_1(s\_0, 0), proj\_1(s\_0, 1)) \\ \wedge & proj\_1(s\_1, 1) = map\_{(\cdot - \cdot)/\sqrt{2}}(proj\_1(s\_0, 0), proj\_1(s\_0, 1)) \\ \wedge & proj\_1(s\_2, 0) = proj\_1(s\_1, 0) \\ \wedge & proj\_2(proj\_1(s\_2, 1), 0) = proj\_2(proj\_1(s\_1, 1), 1) \\ \wedge & proj\_2(proj\_1(s\_2, 1), 1) = proj\_2(proj\_1(s\_1, 1), 0) \\ \wedge & s\_2 \neq store(store(K(0), (1, 1), \frac{1}{\sqrt{2}}), (0, 0), \frac{1}{\sqrt{2}}) \end{array}$$

The formula is unsatisfiable if and only if the EPR circuit correctly performs the transformation.
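To make the array operators in this example concrete, the sketch below models arrays over the index domain {0, 1} as Python dictionaries and evaluates the constraints of Example 2 functionally instead of handing them to a solver. It is an illustration under stated assumptions: the helper names (`K`, `store`, `proj`, `array_store`, `map2`, `close`) and their argument order are chosen here, amplitudes are floats rather than exact values, and `array_store` is only a finite-domain reading of the *arrayStore* operation.

```python
import math
from itertools import product

BITS = (0, 1)

def K(value, dims):
    """Constant array with `dims` index positions over {0, 1}."""
    return {idx: value for idx in product(BITS, repeat=dims)}

def store(a, idx, value):
    """Copy of a with the cell at idx overwritten."""
    b = dict(a)
    b[idx] = value
    return b

def proj(a, k, i):
    """Fix the k-th index (1-based) of a to i and drop that position."""
    return {idx[:k - 1] + idx[k:]: v for idx, v in a.items() if idx[k - 1] == i}

def array_store(b, k, j, c):
    """Copy of b whose slice at k-th index j is replaced by array c."""
    out = dict(b)
    for idx, v in c.items():
        out[idx[:k - 1] + (j,) + idx[k - 1:]] = v
    return out

def map2(f, a, b):
    """Point-wise application of a binary function to two arrays."""
    return {idx: f(a[idx], b[idx]) for idx in a}

def close(a, b, tol=1e-12):
    """Approximate array equality, needed because amplitudes are floats."""
    return a.keys() == b.keys() and all(abs(a[i] - b[i]) <= tol for i in a)

def epr_final_state():
    r = math.sqrt(2)
    s0 = store(K(0.0, 2), (0, 0), 1.0)
    p0, p1 = proj(s0, 1, 0), proj(s0, 1, 1)
    # H on qubit 1: the two slices of s1 are the scaled sum and difference.
    s1 = array_store(K(0.0, 2), 1, 0, map2(lambda x, y: (x + y) / r, p0, p1))
    s1 = array_store(s1, 1, 1, map2(lambda x, y: (x - y) / r, p0, p1))
    # CX_{1,2}: slice 0 is kept, slice 1 has its two cells swapped.
    hi = proj(s1, 1, 1)
    swapped = array_store(K(0.0, 1), 1, 0, proj(hi, 1, 1))
    swapped = array_store(swapped, 1, 1, proj(hi, 1, 0))
    s2 = array_store(K(0.0, 2), 1, 0, proj(s1, 1, 0))
    s2 = array_store(s2, 1, 1, swapped)
    target = store(store(K(0.0, 2), (1, 1), 1 / r), (0, 0), 1 / r)
    return s2, target
```

Here `close(*epr_final_state())` checks that the state forced by the gate constraints coincides with the intended final state, which is why the negated last conjunct cannot be satisfied.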

*Representation of Complex Numbers.* To achieve accuracy with no loss of precision, in this paper, when working with $\mathbb{C}$, we use a subset of the complex numbers that the following algebraic encoding can express (cf. [7,30,35]):

$$(\frac{1}{\sqrt{2}})^k (a + b\omega + c\omega^2 + d\omega^3),\tag{9}$$


where $a, b, c, d, k \in \mathbb{Z}$. A complex number is then represented by a five-tuple $(a, b, c, d, k)$. Although the considered set of numbers is only a small subset of $\mathbb{C}$, it is closed under the operations needed to encode quantum gates, and it can arbitrarily closely approximate any complex number. For this, note that $(a, 0, c, 0, k)$ represents $\frac{1}{\sqrt{2}^k}(a + c\omega^2) = \frac{a}{\sqrt{2}^k} + \frac{ci}{\sqrt{2}^k}$, and pick suitable $a$, $c$, and $k$. The representation is also sufficient to describe a set of quantum gates that can implement universal quantum computation (Table 3).

**Table 4.** Tableau proof rules of the decision procedure for CaAL.
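The closure of the five-tuple encoding (9) under addition and multiplication can be sketched as follows. This is an illustrative sketch, not the implementation used in the paper: the function names are assumptions, tuples are not normalized (so one number can have several representations), and the identity $\sqrt{2} = \omega - \omega^3$ is used to align exponents before adding.

```python
import cmath
import math

OMEGA = cmath.exp(1j * math.pi / 4)    # omega = e^{i*pi/4}, used only for checking
SQRT2 = (0, 1, 0, -1, 0)               # sqrt(2) = omega - omega^3

def mul(p, q):
    """Multiply two five-tuples: polynomial product reduced with omega^4 = -1."""
    coeffs = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            if i + j < 4:
                coeffs[i + j] += p[i] * q[j]
            else:
                coeffs[i + j - 4] -= p[i] * q[j]
    return (*coeffs, p[4] + q[4])

def with_exponent(p, k):
    """Rewrite p with exponent k >= p[4] without changing the represented value."""
    while p[4] < k:
        a, b, c, d, _ = mul(p, SQRT2)  # numerator times sqrt(2)
        p = (a, b, c, d, p[4] + 1)     # compensated by one more factor 1/sqrt(2)
    return p

def add(p, q):
    """Add two five-tuples after bringing them to a common exponent."""
    k = max(p[4], q[4])
    p, q = with_exponent(p, k), with_exponent(q, k)
    return tuple(x + y for x, y in zip(p[:4], q[:4])) + (k,)

def to_complex(p):
    """Floating-point value of a five-tuple, for sanity checks only."""
    numerator = sum(coef * OMEGA ** i for i, coef in enumerate(p[:4]))
    return numerator / math.sqrt(2) ** p[4]
```

For instance, the amplitude $\frac{1}{\sqrt{2}}$ produced by an H gate is the tuple `(1, 0, 0, 0, 1)`, and all arithmetic on it stays within integers; `to_complex` is only a convenience for comparing against floating-point values.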

### **4 A Decision Procedure for Cartesian Arrays**

We now present a decision procedure for quantifier-free CaAL. Our calculus is an extension of the calculus for CAL [11] with rules for the *proj* and *arrayStore* operations. For the sake of presentation, we use the setting of analytic tableaux [14], although the same proof rules can also be used in a model-constructing calculus [11].

As a simplifying assumption, in this section we furthermore require that the index sorts $\sigma$ of an array sort $(\sigma^n \Rightarrow \tau)$ represent *infinite* domains. This assumption can be lifted in the same way as for CAL [11], but the details are orthogonal to the task of supporting the new array operations.

#### **4.1 Preliminaries**

A tableau [14] is a finite tree growing downwards, in which each node is labelled with a formula, the root is labelled with the formula to be refuted, and the children of each node are derived from the formulas on the branch leading to the node using one of the available proof rules. We assume a tableau calculus equipped with a set of standard rules [14]: (i) α- and β-rules for eliminating Boolean connectives ∧, ∨; (ii) δ-rules for eliminating existential quantifiers ∃; (iii) rules for reasoning about positive and negative equalities $x = y$ between variables, which include rules for closing proof branches; (iv) rules implementing a decision procedure for the base theory $T_B$.

Our calculus operates on *flat* formulas, which are formulas in which functions $f$ only occur in equations $y = f(\bar{x})$ in positive positions, i.e., underneath an even number of negations, with $y, \bar{x}$ being variables. Every formula can be converted to a flat formula by introducing a linear number of new variables.

We define proof rules using the following notation:

$$\text{rule} \quad \frac{\phi\_1 \quad \phi\_2 \quad \cdots \quad \phi\_k}{\psi\_1 \mid \cdots \mid \psi\_m}$$

The rule is applicable if the premises $\phi_1, \dots, \phi_k$ occur on a proof branch, and has the effect of expanding the tableau: the proof branch is split into $m$ new branches, to which the formulas $\psi_1, \dots, \psi_m$, respectively, are appended.

In the premises of a rule, we frequently include assumptions $x \sim y$ that require that the equality $x = y$ follows from positive equalities between variables on the proof branch. We also use premises $x : \sigma$, stating that $x$ is a variable of sort $\sigma$ occurring on the proof branch.

#### **4.2 Proof Rules**

The rules of our calculus are shown in Table 4. The rules idx, K⇓, store⇓, store⇑, map⇓, and map⇑ coincide with the rules used for CAL [11], and define the semantics of the operators K, *store*, and *map*. Extensionality is implemented by the rule ext, which can be applied for any two array variables $a, b$ of the same type occurring on a branch.

The semantics of *proj* and *arrayStore* is defined, in a similar way as for *store*, by upward and downward propagation of array reads. Since *arrayStore*k(b, j, c) combines two arrays b, c into a single new array, downward propagation has to route reads either to b or to c. Upward propagation from c is always possible, while reads on b can only be propagated if they are not overwritten by c.

For the sake of presentation, we write the conclusion in the rules map⇓, map⇑, and ext in non-flat form, and assume that the transformation to a flat formula happens implicitly by adding existentially quantified variables representing the sub-terms.

Congruence reasoning is necessary only for array reads, and is implemented using the rule readCong. For simplicity, in our formulation the rule splits over the cases $\bar{i} = \bar{j}$ and $\bar{i} \neq \bar{j}$, and effectively searches for an arrangement of the index variables satisfying a formula. An actual implementation could rely on equality propagation being performed by a theory combination procedure.

As one of the more tricky points, the completeness of the calculus sometimes requires new array reads to be generated. This aspect is covered by the rules $\epsilon_{\neq}$ and $\epsilon_{\delta}$ in CAL [11], which, however, cannot be used directly in our setting of multi-dimensional arrays. To obtain completeness, our calculus sometimes has to construct reads by combining different index variables occurring on a branch, and sometimes has to invent index values that are distinct from all indexes occurring in a formula. The introduction of corresponding new reads is handled by the rules freshIdx and read.

*Example 3.* Consider arrays $a, b : (\mathbb{Z}^2 \Rightarrow \mathbb{Z})$, and the formulas

$$proj\_1(a, i) = K(42) \land proj\_2(a, j) = K(43) \tag{10}$$

$$a = K(42) \land b = store(a, (i, i), 43) \land proj\_1(b, i) = K(43) \tag{11}$$

Both formulas are unsatisfiable, but cannot be refuted using the rules discussed so far. In (10), no reads $a[\cdots]$ exist, so that no propagations can be performed by any of the rules. It is necessary to identify the constraints on the value $a[i, j]$ as contradictory. The rule read can be used to introduce a new formula $\exists v.\ v = a[i, j]$ on a proof branch, after which the rules *proj*⇑ and K⇓ can be applied.

To show that (11) is unsatisfiable, we need to consider a point $(i, j)$ with $j \neq i$ and derive that $a[i, j] = b[i, j] = 42$, contradicting $\mathit{proj}_1(b, i) = K(43)$. The introduction of a fresh index value $j$ (different from $i$) is handled by the rule freshIdx, which relies on the index sort $\sigma$ representing an infinite domain. Once the existence of an index $j \neq i$ has been asserted, the rule read can be used to introduce an equation $v = a[i, j]$, and the contradiction can be derived.
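The role of the fresh index in formula (11) can be seen in a small brute-force experiment. The sketch below is only an illustration over finite index domains (the calculus itself assumes infinite index sorts), and the function name `formula_11_satisfiable` is an assumption made here. It shows that (11) has a model when the index domain contains a single element, and has none as soon as a second index $j \neq i$ exists.

```python
from itertools import product

def formula_11_satisfiable(domain):
    """Search for an index i in `domain` under which formula (11) holds."""
    for i in domain:
        a = {idx: 42 for idx in product(domain, repeat=2)}   # a = K(42)
        b = dict(a)
        b[(i, i)] = 43                                        # b = store(a, (i, i), 43)
        row = {j: b[(i, j)] for j in domain}                  # proj_1(b, i)
        if all(value == 43 for value in row.values()):        # proj_1(b, i) = K(43)
            return True
    return False
```

With `domain = [0]` the only cell of row `i` is the overwritten one, so the formula holds; with `domain = [0, 1]` the row contains an untouched cell with value 42, which is exactly the point $(i, j)$ that freshIdx and read make explicit on a proof branch.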

#### **4.3 Correctness and Complexity**

**Theorem 3.** *The presented tableau calculus is sound and complete for flat quantifier-free CaAL formulas: there is a closed tableau for a formula* φ *if and only if* φ *is unsatisfiable.*

*Proof. Soundness:* As usual, we identify each proof branch with the conjunction of its formulas and a tableau with the disjunction of its proof branches. It can be shown that the tableau before expansion using a proof rule is equi-satisfiable to the tableau after the expansion, modulo the array axioms in Table 2.

*Completeness:* We make the simplifying assumption that φ only contains arrays with (infinite) index sort $\sigma$ and value sort $\tau$, and in particular that array sorts are not nested. Completeness for the general case follows by recursively applying model construction.

Consider then the systematic construction of a tableau for a formula φ by exhaustively applying proof rules under the following restrictions: (i) regularity, i.e., rules are only applied if they lead to new formulas being added to each generated branch; (ii) rule freshIdx can only be applied once on a branch, only after ext has been applied to all pairs $a, b$ of array variables on the branch, and choosing $i_1, \dots, i_k$ as the set of all variables of sort $\sigma$ on the branch.

Observe that this systematic application of rules terminates: the calculus never introduces new array variables so that only finitely many applications of ext are possible. Note that ext and freshIdx are the only rules introducing new index variables. Since freshIdx is applied at most once on a branch, the set of index variables is bounded, and there is only a bounded number of array reads $v = a[\bar{i}]$.

Assume now that a tableau for φ cannot be closed, i.e., has at least one branch $B$ that cannot be closed, although all possible rule applications have been performed. We extract a model of φ from $B$. Suppose that $M_T = (D_T, I_T)$ is a model that interprets the non-array variables (including index variables), satisfying all literals on $B$ that do not contain array variables, and denote the equivalence class of an array variable $a$ on $B$ by $[a] = \{b \mid a \sim b\}$. Extending $I_T$, we construct an interpretation $I$ with $I((\sigma^n \Rightarrow \tau)) = I_T(\sigma)^n \to I_T(\tau)$ being a function space, and the theory functions $\cdot[\cdot]$, *store*, K, $\mathit{map}_f$, *proj* and *arrayStore* having their expected meaning. $I$ is constructed in such a way that all array literals on $B$ are satisfied; the satisfaction of compound formulas on $B$, and in particular of φ, then follows like in the standard Hintikka construction [14].

The interpretation $I(a)$ of an array variable $a : (\sigma^n \Rightarrow \tau)$ is derived from the array reads on $[a]$ occurring on $B$. The main difficulty is to consistently interpret the (infinitely many) elements of the array that are not mentioned explicitly on $B$. For this, denote the index variable introduced by the unique freshIdx application on $B$ by $\epsilon$, and observe that its value $I_T(\epsilon)$ is distinct from the value of all other index variables. We will use values read from $I_T(\epsilon)$-locations as default values for the arrays. Let

$$R\_a = \{ (\langle I\_T(i\_1), \dots, I\_T(i\_n) \rangle, I\_T(v)) \mid v = b[\bar{i}] \text{ occurs on } B \text{ and } a \sim b \} $$

be the set of array reads for $a : (\sigma^n \Rightarrow \tau)$. The relation $R_a$ describes a non-empty, consistent (but partial) valuation of the array elements, due to the exhaustive application of rules read and readCong.

The gaps in $R_a$ will be filled with default values introduced by $\epsilon$. For this, we define a precedence ordering ${\preceq} \subseteq I_T(\sigma)^* \times I_T(\sigma)^*$ over index vectors; intuitively, $\bar{c} \preceq \bar{d}$ if $\bar{c}$ and $\bar{d}$ agree in all components, unless $d_k = I_T(\epsilon)$, which is interpreted as don't-care:

$$\langle c\_1, \dots, c\_k \rangle \preceq \langle d\_1, \dots, d\_m \rangle \quad \text{iff} \quad k = m \text{ and } \forall i \in \{1, \dots, k\} : c\_i = d\_i \vee d\_i = I\_T(\epsilon)$$

The value of array variable $I(a) \in I((\sigma^n \Rightarrow \tau))$ is then:

$$I(a) = \left\{ (\bar{c}, x) \mid \begin{array}{l} (\bar{d}, x) \in R\_a, \text{ where } \bar{c} \preceq \bar{d} \\ \text{and for all } (\bar{d}', x') \in R\_a: \text{ if } \bar{c} \preceq \bar{d}' \text{ then } \bar{d} \preceq \bar{d}' \end{array} \right\}$$

To see that $I(a)$ is functionally consistent, note that whenever $(\bar{d}, x)$ and $(\bar{d}', x')$ exist in $R_a$ such that $\bar{c} \preceq \bar{d}$ and $\bar{c} \preceq \bar{d}'$, then there is also some $(\bar{d}'', x'') \in R_a$ such that $\bar{c} \preceq \bar{d}'' \preceq \bar{d}, \bar{d}'$. This is because the rule read has been applied exhaustively.
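The way default values fill the gaps between explicit reads can be sketched in a few lines. The code below is an illustration, not part of the decision procedure: `EPS` is a stand-in chosen here for the value of the fresh index introduced by freshIdx, `reads` plays the role of the read relation for one array, and the names `preceq` and `lookup` are assumptions.

```python
EPS = object()   # stand-in for the value of the fresh index variable

def preceq(c, d):
    """c precedes d if both agree component-wise, with EPS in d as don't-care."""
    return len(c) == len(d) and all(ci == di or di is EPS for ci, di in zip(c, d))

def lookup(reads, c):
    """Value of the array at index vector c: the most specific matching read wins."""
    candidates = [d for d in reads if preceq(c, d)]
    for d in candidates:
        # d is most specific if it precedes every other matching read.
        if all(preceq(d, other) for other in candidates):
            return reads[d]
    raise KeyError(c)
```

For example, with `reads = {(EPS, EPS): 0, (1, EPS): 5, (1, 2): 7}`, the cell `(1, 2)` is taken from the explicit read, any other cell in row 1 falls back to the row default, and all remaining cells use the global default. The sketch assumes that a most specific matching read exists, which in the proof is guaranteed by the exhaustive application of the rule read.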

It remains to be shown that $I$ satisfies all array literals. By construction, equations $a = b$ will be satisfied. To see that equations $v = a[\bar{i}]$ hold, note that $I(a) \supseteq R_a$. Disequations $a \neq b$ are satisfied due to the exhaustive application of ext: there has to be some vector $\bar{i}$ of index variables such that $a[\bar{i}] \neq b[\bar{i}]$.

All other array literals are positive equations of the form $x = f(\bar{y})$, and hold because exhaustive propagation of read atoms was performed. As an example, consider an equation $a = \mathit{proj}_k(b, j)$; it has to be shown that $I(a) = \{(\langle c_1, \dots, c_{k-1}, c_{k+1}, \dots, c_n \rangle, x) \mid (\bar{c}, x) \in I(b),\ c_k = I_T(j)\}$. Observe that $R_a = \{(\langle c_1, \dots, c_{k-1}, c_{k+1}, \dots, c_n \rangle, x) \mid (\bar{c}, x) \in R_b,\ c_k = I_T(j)\}$ due to the rules *proj*⇓ and *proj*⇑. Consider then a point $(\bar{c}, x) \in I(a)$, defined by $(\bar{d}, x) \in R_a$, and the corresponding index vectors $\bar{c}' = \langle c_1, \dots, c_{k-1}, I_T(j), c_k, \dots, c_{n-1} \rangle$ and $\bar{d}' = \langle d_1, \dots, d_{k-1}, I_T(j), d_k, \dots, d_{n-1} \rangle$ in $R_b$, and show that $(\bar{c}', x) \in I(b)$ is defined by $(\bar{d}', x) \in R_b$.

The proof of the theorem highlights the restrictions necessary to obtain a decision procedure for CaAL: all rules should be applied under the condition of regularity, and the rule freshIdx has to be restricted to at most one application per branch, and only after applications of ext have been performed.

To evaluate runtime, like in the proof of Theorem 3 we make the assumption that there are no nested array sorts, i.e., index and value sorts are themselves not arrays. To avoid degenerate cases when evaluating runtime, we assume that a formula φ cannot be smaller than the maximum arity of occurring array variables. We then get:

**Lemma 1.** *The satisfiability problem of quantifier-free CaAL formulas* φ *without nested array sorts is in NEXPTIME, assuming that the satisfiability problem of the base theory is in NP.*

*Proof.* This follows from the proof of Theorem 3. On every branch, the rule ext can be applied at most quadratically often, and the number of index variables occurring on a branch is polynomial in the size of the input formula φ. The number of distinct read atoms $v = a[\bar{i}]$ that can be introduced on a branch, and therefore the number of rule applications altogether, is then polynomially bounded by the number of variables in φ, and exponentially bounded in the maximum arity of array variables in φ. After exhaustive application of the rules in Table 4, solving an at most exponential number of base theory formulas (with at most exponential size) on a branch is in NEXPTIME.

#### **4.4 Optimizations**

The calculus and decision procedure are primarily designed with simplicity in mind, rather than focusing on practical efficiency. Although the procedure's complexity may not be reduced below NEXPTIME, incorporating various optimizations can yield significant practical improvements. Two obvious improvements to be considered are: (i) The detection of **linear array variables**, which are essentially variables that are assigned to at most once in array literals [11]. It is enough to perform upward propagation (rules ⇑) only for non-linear variables. (ii) The **restriction of the number of reads** introduced using the rule read. In practice, only a few of the generated equations are actually needed to ensure completeness. Instead of generating all possible reads eagerly, a procedure could focus on the other rules first, and only introduce additional reads when it is detected that default values are missing for some sub-arrays. We believe that other refinements presented in [11] can be carried over to our decision procedure as well.


**Table 5.** Experimental results. We list the **circuit** name, the number of **qubits** and **gates** in the circuit, the verification **result**, and the execution **time**.

### **5 Preliminary Experimental Result**

We have implemented the decision procedure proposed for CaAL, the encoding of quantum gates using array operations, and of complex numbers as five-tuples of integers in the SMT solver Princess [26]. The implementation is still a proof of concept and largely unoptimized, so that the results reported in this section should be considered preliminary. We evaluate the performance of CaAL based on a set of benchmarks for quantum circuit verification. All experiments were conducted on a server with an AMD EPYC 7742 64-core processor (1.5 GHz), 1,152 GiB of RAM, and a 1 TB SSD running Ubuntu 20.04.5 LTS but were run with only one core for the sake of fairness. Files to reproduce the experiment can be found in https://zenodo.org/record/7970588. The experimental results are shown in Table 5. Specifically, we tested four different verification problems with different circuit sizes.


For Grover's algorithm, XXX = Single means we check the correctness of the circuit against a specific oracle, and XXX = All means we check against all possible oracles. We manually injected two bugs (by altering one gate) into two examples to demonstrate bug-catching capability. With a timeout of 60 min, our implementation can analyze circuits with at most 7 qubits and at most 85 gates, which are still relatively small circuits. Analyzing the results, we discovered that, in particular, the H gates used to create a superposition state at the beginning of a circuit are challenging for the array decision procedure, as they lead to an exponential number of array reads being created.

### **6 Conclusions**

We have presented CaAL, an expressive logic of extensional arrays, with operations for reading and storing values, creating constant arrays, a point-wise extension of functions on array values to arrays, projection of arrays, and updating array slices. We have established that checking the satisfiability of quantifier-free CaAL formulas is NEXPTIME-complete, for a base theory in NP and non-nested arrays. The root cause for the complexity of CaAL (as opposed to the NP complexity of CAL and the standard theory of arrays) is that formulas can be constructed in which a cell in one array has dependencies to an exponential number of cells in another array. In our decision procedure, such situations lead to an exponential number of reads generated during propagation. High degrees of dependency are typical, however, for quantum circuits.

We believe that CaAL is a suitable framework for reasoning about quantum circuits. Due to the expressiveness of the logic, the encoding of quantum gates becomes remarkably succinct and elegant (Table 3), and easily understandable both for researchers in quantum circuit verification and people in automated reasoning. While theoretically optimal, we consider the decision procedure proposed for CaAL only as a first step: the high complexity of CaAL implies that brute-force approaches like saturation are unlikely to scale to interesting instances. As future work, we therefore plan to explore the use of abstraction methods and of more succinct array representations in the decision procedure, thus making it possible to exploit the highly structured nature of typical quantum circuits in the solving process. We also plan to investigate whether interesting fragments of CaAL with lower complexity can be identified.

**Acknowledgements.** This work has been partially funded by the Swedish Research Council (VR) under grant 2018-04727, the Swedish Foundation for Strategic Research (SSF) under the project WebSec (Ref. RIT17-0011), the Wallenberg project UPDATE, and the NSTC QC project under Grant no. NSTC 111-2119-M-001-004- and 112-2119-M-001-006-.

### **References**



# **SAT-Based Subsumption Resolution**

Robin Coutelier<sup>1(✉)</sup>, Laura Kovács<sup>2</sup>, Michael Rawson<sup>2</sup>, and Jakob Rath<sup>2</sup>

<sup>1</sup> U. Liège, Liège, Belgium robin.coutelier@student.uliege.be

<sup>2</sup> TU Wien, Vienna, Austria

**Abstract.** Subsumption resolution is an expensive but highly effective simplifying inference for first-order saturation theorem provers. We present a new SAT-based reasoning technique for subsumption resolution, without requiring radical changes to the underlying saturation algorithm. We implemented our work in the theorem prover Vampire, and show that it is noticeably faster than the state of the art.

### **1 Introduction**

Saturation-based proof search is a popular approach to first-order theorem proving [6,14,18]. In addition to efficient inference systems [1,8], saturation provers also implement *redundancy elimination* to reduce the size of the search space. Redundancy elimination deletes clauses from the search space by showing them to be logical consequences of other (smaller) clauses, and therefore redundant. However, checking whether a first-order formula is implied by another first-order formula is undecidable, and so eliminating redundant clauses is in general undecidable too. In practice, saturation systems apply cheaper conditions for redundancy elimination, such as removing equational tautologies by congruence closure or deleting subsumed clauses by establishing multiset inclusion. Recently, SAT solving has been applied to efficiently detect and remove subsumed clauses [10]. *We extend SAT-based reasoning in first-order theorem proving to a combination of subsumption and resolution, subsumption resolution* [2] *(Sect.* 4*).*

Both subsumption and subsumption resolution are NP-complete [4]. To improve efficiency in practice, we (i) encode subsumption resolution as SAT formulas over (match) set constraints (Sect. 5) and (ii) directly integrate CDCL SAT solving for checking subsumption resolution in first-order theorem proving (Sect. 6). We implement our approach in the theorem prover Vampire [6], improving the state-of-the-art in first-order reasoning (Sect. 7).

**Related Work.** Subsumption and subsumption resolution are some of the most powerful and frequently used redundancy criteria in saturation-based provers. Subsumption resolution is supported as *contextual literal cutting* in [14], along with efficient approaches for detecting multiset inclusions among clauses [6,13, 18]. Special cases of unit deletion as a by-product of subsumption tests are also proposed in [16]. Much attention has been given to refinements of *term indexing* [13,16] to drastically reduce the set of candidate clauses checked for subsumption. Recently, these approaches have been complemented by SAT solving [10], reducing subsumption checking to SAT. Our work generalises this approach by solving for both subsumption and subsumption resolution via SAT.

SAT solvers have been applied widely to first-order theorem proving, including but not limited to AVATAR [17], instance-based methods [5], heuristic grounding [14], global subsumption [12] and combinations thereof [11], but using SAT solvers for classical subsumption methods is under-explored. To the best of our knowledge, SAT solving for subsumption resolution has so far not been addressed in the landscape of automated reasoning.

### **2 Illustrative Examples and Main Contributions**

Let us illustrate a few challenges of subsumption resolution, which motivate our approach to solving it (Sect. 4). Given a pair of clauses L and M, denoted as (L, M), the problem is to decide whether M can be simplified by L via a special case of logical consequence. In Fig. 1 we show examples where it is not obvious for which pairs $(L_i, M_i)$ subsumption resolution can be applied.


**Fig. 1.** Illustrative examples.

In fact, subsumption resolution can only be applied to $(L_1, M_1)$. Later, we show how our approach determines that $M_1$ can be shortened in the presence of $L_1$ (Example 3.1), but also how the remaining pairs cannot apply subsumption resolution (Examples 5.1, 5.2, and 4.1). For example, $(L_4, M_4)$ is filtered by *pruning* to bypass the SAT routine altogether.

#### **Our Contributions**


### **3 Preliminaries**

We assume familiarity with first-order logic with equality. We include standard Boolean connectives and quantifiers in the language, and the constants ⊤, ⊥ for truth and falsehood. We use x, y, z for first-order variables, c, d, e for constants, f, g for functions, p, q, r for atoms, l, m for literals, and L, M for clauses, all potentially with indices. If L is a clause $l_1 \vee \dots \vee l_n$, we sometimes consider it as a multiset of its literals $l_i$, and write |L| for its cardinality (i.e. the number n of literals in L). The empty clause is denoted □. Free variables are universally quantified. An expression E is a term, atom, literal, clause, or formula.

**Substitutions and Matches.** A substitution σ is a (partial) mapping from variables to terms. The result of applying a substitution σ to an expression E is denoted σ(E) and is the expression obtained by simultaneously replacing each variable x in E by σ(x). For example, the application of σ := {x → f(c)} to the clause L := {p(x), q(x, y)} yields σ(L) = {p(f(c)), q(f(c), y)}. Note that σ(L) is a logical consequence of L.

A *matching substitution*, in short a *match*, between literals l and m is a substitution σ such that σ(l) = m. For example, the match of p(x) onto p(f(c)) is {x → f(c)}. Two matches are *compatible* and can be combined in the same substitution iff they do not assign different terms to the same variable. For example, the substitutions {x → f(c), y → g(d)} and {x → f(c), z → h(e)} are compatible, but {x → f(c)} and {x → g(c)} are not.
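Matching and compatibility of matches can be sketched as follows. This is a minimal illustration under assumptions made here: variables are Python strings, constants and compound terms or atoms are tuples whose first component is the symbol (so `c` is `("c",)` and `f(c)` is `("f", ("c",))`), and the function names are hypothetical.

```python
def match(pattern, target, subst=None):
    """Extend subst to a substitution s with s(pattern) == target, or return None."""
    subst = dict(subst or {})
    if isinstance(pattern, str):                      # pattern is a variable
        if pattern in subst:
            return subst if subst[pattern] == target else None
        subst[pattern] = target
        return subst
    if (isinstance(target, str) or pattern[0] != target[0]
            or len(pattern) != len(target)):
        return None                                   # symbols or arities differ
    for p_arg, t_arg in zip(pattern[1:], target[1:]):
        subst = match(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst

def compatible(s1, s2):
    """Two matches are compatible if they agree on all shared variables."""
    return all(s2[x] == t for x, t in s1.items() if x in s2)

def combine(s1, s2):
    """Union of two compatible matches, or None if they conflict."""
    return {**s1, **s2} if compatible(s1, s2) else None
```

With this representation, `match(("p", "x"), ("p", ("f", ("c",))))` yields the match of p(x) onto p(f(c)) from the text, and `compatible` reproduces the two compatibility examples above.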

**Saturation and Redundancy.** Many first-order systems apply the superposition calculus [1] in a saturation loop [8]. Given an input set F of clauses, saturation iteratively derives logical consequences and adds them to F. By soundness and completeness of superposition, if □ is derived the system can report unsatisfiability of F; if □ is not encountered and no further clauses can be derived, the system reports satisfiability of F.

Saturation is more efficient when F is as small as possible. For this reason, saturation-based provers also employ *simplifying* inferences. Simplifying inferences reduce the number or size of clauses in F. This is formalised using the following notion of *redundancy*: a ground clause M is redundant in a set of ground clauses F if M is a logical consequence of clauses in F that are strictly smaller than M w.r.t. a fixed simplification ordering ≻. A non-ground clause M is redundant in a set of clauses F if each ground instance of M is redundant in the set of ground instances of F. If M is redundant in F, then M can be removed from F while retaining completeness.

**Subsumption.** A clause L *subsumes* a distinct clause M iff there is a substitution σ such that

$$
\sigma(L) \subseteq\_M M \tag{1}
$$

where $\subseteq_M$ denotes multiset inclusion. We also say that M is *subsumed* by L. Note that subsumed clauses are redundant.

Removing subsumed clauses M from the search space F is implemented through a simplifying rule, checking condition (1) over pairs of clauses (L, M) from F. Matches between every literal in L to some literal in M are checked; if a compatible set of matches is found, then M can be removed from F.

**Subsumption Resolution.** Subsumption resolution aims to remove one redundant literal from a clause. Clauses M and L are said to be the main and side premise of subsumption resolution, respectively, iff there is a substitution σ, a set of literals L′ ⊆ L and a literal m′ ∈ M such that

$$
\sigma(L') = \{\neg m'\} \quad \text{and} \quad \sigma(L \setminus L') \subseteq M \setminus \{m'\}.\tag{2}
$$

If so, M can be replaced by M \ {m′}. Subsumption resolution is hence the rule

$$\text{(SR)} \quad \frac{L \quad \cancel{M}}{M \setminus \{m'\}}$$

We indicate the deletion of a clause M by drawing a line through it (~~M~~), and we refer to the literal m′ of M as the *resolution literal* of SR. Intuitively, subsumption resolution is binary resolution followed by subsumption of one of its premises by the conclusion. However, by combining two inferences into one it can be treated as a simplifying inference, which is advantageous from the perspective of proof search dynamics.

*Example 3.1.* Consider L<sub>1</sub>, M<sub>1</sub> of Fig. 1. Subsumption resolution is applied by using the substitution σ := {x<sub>1</sub> → g(y<sub>1</sub>), x<sub>2</sub> → c, x<sub>3</sub> → e}. Note that σ(L<sub>1</sub>) = p(g(y<sub>1</sub>), c) ∨ p(f(c), e). σ(L<sub>1</sub>) and M<sub>1</sub> can be resolved to obtain p(g(y<sub>1</sub>), c). The clause p(g(y<sub>1</sub>), c) subsumes M<sub>1</sub>, since it is a sub-multiset of M<sub>1</sub>. We have

$$\frac{p(x\_1, x\_2) \lor p(f(x\_2), x\_3) \quad \cancel{p(g(y\_1), c) \lor \neg p(f(c), e)}}{p(g(y\_1), c)}$$

### **4 SAT-Based Subsumption Resolution**

We describe the main steps of our SAT-based approach for deciding the applicability of subsumption resolution on a pair (L, M) of clauses. The core of our work solves (2) by finding match substitutions between literals in L and M. Our technique is summarised in Algorithm 1.

**Pruning.** The first step of Algorithm 1 *prunes* pairs (L, M) of clauses that cannot be simplified by subsumption resolution due to a syntactic restriction over symbols in L and M, *viz.* whether the set of predicates in L is a subset of the predicates in M. If not, then there is a literal in L that cannot be matched to any literal in M, and hence subsumption resolution cannot be applied.

*Example 4.1.* The clause pair (L<sub>4</sub>, M<sub>4</sub>) from Fig. 1 is pruned by Algorithm 1: the sets of predicates in L<sub>4</sub> and M<sub>4</sub> are respectively {p, q, r} and {p, q}, implying that the literal r(x<sub>3</sub>) of L<sub>4</sub> cannot be matched to any literal in M<sub>4</sub>.
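The predicate-set check can be sketched as follows. The literal representation `(polarity, predicate, args)` and the function names are assumptions of this sketch, and the two clauses are made up to have the predicate sets of Example 4.1 (only the literal r(x3) is taken from the text).

```python
def predicates(clause):
    """Set of predicate symbols occurring in a clause."""
    return {pred for (_polarity, pred, _args) in clause}

def pruned_for_sr(side, main) -> bool:
    """True if some predicate of `side` is absent from `main`,
    so that no literal-wise matching of `side` into `main` can exist."""
    return not predicates(side) <= predicates(main)

# Hypothetical clauses with predicate sets {p, q, r} and {p, q}.
L4 = [(True, "p", ("x1",)), (True, "q", ("x2",)), (True, "r", ("x3",))]
M4 = [(True, "p", (("c",),)), (False, "q", (("d",),))]
print(pruned_for_sr(L4, M4))
```

The check ignores polarity, since subsumption resolution may match one literal of L to the complement of a literal of M.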



**Match Set.** The *match set* of Algorithm 1 computes matching substitutions over literals of L and M. The match set *ms* consists of a sparse matrix that assigns each literal pair (l<sub>i</sub>, m<sub>j</sub>) ∈ L × M a substitution σ<sub>i,j</sub> such that σ<sub>i,j</sub>(l<sub>i</sub>) = m<sub>j</sub> or σ<sub>i,j</sub>(l<sub>i</sub>) = ¬m<sub>j</sub>. In addition, a polarity P<sub>i,j</sub> is assigned to (l<sub>i</sub>, m<sub>j</sub>) as follows: P<sub>i,j</sub> = + if σ<sub>i,j</sub>(l<sub>i</sub>) = m<sub>j</sub>, and P<sub>i,j</sub> = − if σ<sub>i,j</sub>(l<sub>i</sub>) = ¬m<sub>j</sub>. This matrix is sparse because in general not all literal pairs (l<sub>i</sub>, m<sub>j</sub>) ∈ L × M can be matched. Additionally, it is again possible to prune (L, M) while filling the match set: if a row of the match set is empty, then there is some literal in L that cannot be matched to any literal in M. In this case, subsumption resolution cannot use L to simplify M, so the pair (L, M) is pruned.
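A match set of this shape can be built as sketched below, here for the clauses L1 and M1 of Example 3.1. The dictionary-based sparse matrix, the literal representation `(polarity, predicate, args)` and all function names are assumptions of this sketch; an actual implementation would use the prover's own term structures.

```python
def match_term(pattern, target, subst):
    """Extend `subst` so that it maps `pattern` onto `target`; None if impossible."""
    subst = dict(subst)
    if isinstance(pattern, str):                  # variable of the side premise
        if subst.get(pattern, target) != target:
            return None
        subst[pattern] = target
        return subst
    if isinstance(target, str) or pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p_arg, t_arg in zip(pattern[1:], target[1:]):
        subst = match_term(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst

def build_match_set(side, main):
    """Return {(i, j): (substitution, polarity)}, or None if some row is empty."""
    ms = {}
    for i, (pol_l, pred_l, args_l) in enumerate(side, start=1):
        row_empty = True
        for j, (pol_m, pred_m, args_m) in enumerate(main, start=1):
            # Match the atoms as terms; the polarity is recorded separately.
            subst = match_term((pred_l, *args_l), (pred_m, *args_m), {})
            if subst is not None:
                ms[(i, j)] = (subst, "+" if pol_l == pol_m else "-")
                row_empty = False
        if row_empty:
            return None                           # prune: l_i matches nothing in M
    return ms

c, e = ("c",), ("e",)
L1 = [(True, "p", ("x1", "x2")), (True, "p", (("f", "x2"), "x3"))]
M1 = [(True, "p", (("g", "y1"), c)), (False, "p", (("f", c), e))]
for key, entry in build_match_set(L1, M1).items():
    print(key, entry)
```

Only three of the four entries are populated: p(f(x2), x3) cannot be matched onto p(g(y1), c) because the symbols f and g differ.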

**SAT Solver.** The *solver* of Algorithm 1 is the CDCL-based SAT solver introduced previously [10], which supports reasoning over matching substitutions in addition to standard propositional reasoning. This solver also features direct support for *AtMostOne* constraints. Solver performance was tuned for subsumption, which we retain for subsumption resolution. Each propositional variable v is associated with a substitution σ<sub>v</sub>, and the solver ensures that all substitutions σ<sub>v</sub>, for which v is assigned ⊤ in the current model, are compatible. Conceptually, a global substitution σ satisfying the invariant σ = ⋃{σ<sub>v</sub> | v = ⊤} is kept in the SAT solver. In the following, we will write this binding as v ⇒ σ<sub>v</sub> ⊆ σ.

*Example 4.2.* Suppose propositional variables v<sub>1</sub> and v<sub>2</sub> are associated with substitutions σ<sub>1</sub> := {x → y} and σ<sub>2</sub> := {x → z}, respectively. As σ<sub>1</sub> and σ<sub>2</sub> are incompatible, the solver will block assigning v<sub>1</sub> = ⊤ and v<sub>2</sub> = ⊤ simultaneously since it would break the above invariant.
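The invariant can be illustrated with a small bookkeeping class. This is only a sketch of the idea: the class and method names are hypothetical, and backtracking, propagation and everything else a CDCL solver does are left out.

```python
class SubstitutionStore:
    """Keeps sigma equal to the union of sigma_v over all variables v set to true."""

    def __init__(self, bindings):
        self.bindings = bindings   # propositional variable -> its substitution sigma_v
        self.true_vars = []
        self.sigma = {}

    def try_assign_true(self, v) -> bool:
        """Set v to true unless sigma_v clashes with the current global substitution."""
        sigma_v = self.bindings[v]
        if any(self.sigma.get(x, t) != t for x, t in sigma_v.items()):
            return False           # assignment blocked: it would break the invariant
        self.sigma.update(sigma_v)
        self.true_vars.append(v)
        return True

# Example 4.2: sigma_1 = {x -> y} and sigma_2 = {x -> z} are incompatible.
store = SubstitutionStore({"v1": {"x": "y"}, "v2": {"x": "z"}})
print(store.try_assign_true("v1"), store.try_assign_true("v2"))
```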

**Encoding Constraints.** Given the match set of (L, M), we formalise the subsumption resolution problem (2) as the conjunction of four constraints over matching substitutions. Our formalisation is given in Theorem 5.1 and is complete in the following sense: subsumption resolution can be applied over (L, M) iff the constraints of Theorem 5.1 are simultaneously satisfiable. Application of subsumption resolution is tested via satisfiability checking over our constraints from Theorem 5.1. Encodings of our subsumption resolution constraints are given in Sect. 5.

**Building the Conclusion.** If a model is found for the constraints encoding subsumption resolution, the conclusion M \ {m′} of SR is built using the model.

### **5 Subsumption Resolution and SAT Encodings**

As mentioned in Sect. 4, we turn the application of subsumption resolution SR over (L, M) into the satisfiability checking problem of Algorithm 1. We give our formalisation of SR in Theorem 5.1, followed by two encodings to SAT (Sects. 5.1–5.2) and adjustments to subsumption (Sect. 5.3).

**Theorem 5.1 (Subsumption Resolution Constraints).** *Clauses* M *and* L *are the main and side premise, respectively, of an instance of the subsumption resolution rule SR iff there exists a substitution* σ *that satisfies the following four properties:*

$$
\text{existence} \qquad \exists i\, j.\ \sigma(l\_i) = \neg m\_j \tag{3}
$$

$$
\text{uniqueness} \qquad \exists j'.\ \forall i\, j.\ \left(\sigma(l\_i) = \neg m\_j \Rightarrow j = j'\right) \tag{4}
$$

$$
\text{completeness} \qquad \forall i.\ \exists j.\ \left(\sigma(l\_i) = \neg m\_j \lor \sigma(l\_i) = m\_j\right) \tag{5}
$$

$$
\text{coherence} \qquad \forall j.\ \left(\exists i.\ \sigma(l\_i) = m\_j \Rightarrow \forall i.\ \sigma(l\_i) \neq \neg m\_j\right) \tag{6}
$$

We relate these constraints to the definition of subsumption resolution (2). The **existence** property (3) requires a literal m<sub>j</sub> in M such that a literal l<sub>i</sub> of L can be matched to ¬m<sub>j</sub>, ensuring the existence of the resolution literal in SR. **Uniqueness** (4) asserts that the resolution literal m<sub>j</sub> of SR is unique, which is required because SR performs only a single resolution step. **Completeness** (5) requires each literal in L be matched either to the complement of a resolution literal, or to a literal in M. Since each (complementary) literal in L is matched to one (resolution) literal of M, the completeness property ensures that the conclusion of SR subsumes M. Finally, **coherence** (6) states that all literals in M must be matched by literals in L with uniform polarity. This implies that all literals of L other than the resolution literal are present in the conclusion of SR. We note that these constraints can be used to recreate Example 3.1.

*Example 5.1.* The clause pair (L2, M2) of Fig. 1 does not satisfy the uniqueness property: both the match between p(x1) and ¬p(y) and the match between q(x2) and ¬q(c) are negative and so no substitution can satisfy all constraints simultaneously. Therefore, subsumption resolution cannot be applied over (L2, M2).

*Example 5.2.* The clause pair (L3, M3) violates the coherence property for all possible σ, since a negative map from p(x1) to ¬p(y) cannot coexist with a positive map from ¬p(x2) to ¬p(y). Subsumption resolution cannot be performed over (L3, M3).

#### **5.1 Direct SAT Encoding of Subsumption Resolution**

We present our encoding of subsumption resolution constraints as a SAT problem, allowing us to use Algorithm 1 for deciding the application of SR. In the sequel we consider the clauses L, M as in Theorem 5.1.

**Compatibility.** We introduce indexed propositional variables b<sup>+</sup><sub>i,j</sub> and b<sup>−</sup><sub>i,j</sub> to represent σ(l<sub>i</sub>) = m<sub>j</sub> and σ(l<sub>i</sub>) = ¬m<sub>j</sub> respectively, which we use to track compatible matching substitutions between literals of L and M. More precisely, a propositional variable is created if and only if the corresponding match is possible (i.e., in the formulas below, if no match exists, replace the corresponding propositional variable by ⊥). As it is not possible to have simultaneously a substitution σ<sub>i,j</sub>(l<sub>i</sub>) = m<sub>j</sub> and σ<sub>i,j</sub>(l<sub>i</sub>) = ¬m<sub>j</sub>, we also write b<sub>i,j</sub> to mean either b<sup>+</sup><sub>i,j</sub> or b<sup>−</sup><sub>i,j</sub> when the polarity of the match is irrelevant. Following Sect. 4, the variables are bound to their substitutions:

$$\text{SAT-based compatibility} \qquad \bigwedge\_{i} \bigwedge\_{j} [b\_{i,j} \Rightarrow \sigma\_{i,j} \subseteq \sigma] \tag{7}$$

**SR Constraints.** Constraints (3)–(6) of Theorem 5.1 employ *bounded* quantification over the finite number of literals in L, M. Expanding these quantifiers over their respective domains, we translate them into the following SAT formulas:

$$\text{SAT-based existence} \qquad \bigvee\_{i} \bigvee\_{j} b\_{i,j}^{-} \tag{8}$$

$$\text{SAT-based uniqueness} \qquad \bigwedge\_{j} \bigwedge\_{i} \bigwedge\_{i' \ge i} \bigwedge\_{j' > j} \neg b\_{i,j}^{-} \lor \neg b\_{i',j'}^{-} \tag{9}$$

$$\text{SAT-based completeness} \qquad \bigwedge\_{i} \bigvee\_{j} b\_{i,j} \tag{10}$$

$$\text{SAT-based coherence} \qquad \bigwedge\_{j} \bigwedge\_{i} \bigwedge\_{i'} \neg b\_{i,j}^{+} \lor \neg b\_{i',j}^{-} \tag{11}$$

**SR as SAT Problem.** Based on the above, application of subsumption resolution is decided by the satisfiability of (7) ∧ (8) ∧ (9) ∧ (10) ∧ (11). This SAT formula extended with substitutions represents the result of encodeConstraint() in Algorithm 1 and is used further in Algorithm 3. When this formula is satisfiable, we construct the substitution σ required for SR by

$$\sigma = \bigcup \{ \sigma\_{i,j} \mid b\_{i,j} = \top \}.$$

From the model of the SAT solver, we extract the first literal b<sup>−</sup><sub>i,j</sub> assigned ⊤, from which we conclude that the j-th literal in M is the resolution literal of SR. As such, application of SR over L and M results in replacing M by M \ {m<sub>j</sub>}.

*Remark 5.1.* Implicitly, each literal l<sub>i</sub> is mapped to at most one literal m<sub>j</sub>. Indeed, if there were several literals m<sub>j</sub> such that σ(l<sub>i</sub>) = m<sub>j</sub> or σ(l<sub>i</sub>) = ¬m<sub>j</sub>, then either the respective matches are not compatible (guarded by the compatibility property (7)), there are identical literals in M, or M is a tautology (which is not allowed).

*Remark 5.2.* While we defined b<sub>i,j</sub> to be true if, and *only* if, σ<sub>i,j</sub> ⊆ σ, we only encode the sufficient condition b<sub>i,j</sub> ⇒ σ<sub>i,j</sub> ⊆ σ. The completeness property (10) together with Remark 5.1 states that each l<sub>i</sub> must have exactly one match to some m<sub>j</sub> or ¬m<sub>j</sub>. Therefore, if σ<sub>i,j</sub> ⊆ σ then the respective b<sub>i,j</sub> must be true and the condition also becomes necessary: b<sub>i,j</sub> ⇐ σ<sub>i,j</sub> ⊆ σ.

*Example 5.3.* Consider the pair (L<sub>1</sub>, M<sub>1</sub>) of Fig. 1. The match set ms of Algorithm 1 is:

$$
\sigma\_{i,j} = \begin{bmatrix}
\{x\_1 \mapsto g(y\_1), x\_2 \mapsto c\} & \{x\_1 \mapsto f(c), x\_2 \mapsto e\} \\
\perp & \{x\_2 \mapsto c, x\_3 \mapsto e\}
\end{bmatrix} \quad P\_{i,j} = \begin{bmatrix} + & - \\ & - \end{bmatrix}
$$

Since σ<sub>2,1</sub> is incompatible with any substitution, b<sub>2,1</sub> = ⊥ need not be defined. This also allows us to disregard SAT clauses that are trivially satisfied. The existence (8) and completeness (10) properties cannot have empty clauses: this is easily detected while filling the match set, and the instance of SR is pruned. Adding falsified literals in these constraints is unnecessary. The uniqueness (9) and coherence (11) properties have only negative polarity literals and therefore there is no need to add clauses containing b<sub>2,1</sub>. In light of the previous comment, we use variables b<sup>+</sup><sub>1,1</sub>, b<sup>−</sup><sub>1,2</sub> and b<sup>−</sup><sub>2,2</sub> and encode SR using the following constraints:

$$\begin{aligned} b\_{1,1}^{+} & \Rightarrow \{x\_{1} \mapsto g(y\_{1}), x\_{2} \mapsto c\} \subseteq \sigma & \quad \text{SAT-based compatibility of } b\_{1,1}^{+}\\ b\_{1,2}^{-} & \Rightarrow \{x\_{1} \mapsto f(c), x\_{2} \mapsto e\} \subseteq \sigma & \quad \text{SAT-based compatibility of } b\_{1,2}^{-}\\ b\_{2,2}^{-} & \Rightarrow \{x\_{2} \mapsto c, x\_{3} \mapsto e\} \subseteq \sigma & \quad \text{SAT-based compatibility of } b\_{2,2}^{-}\\ b\_{1,2}^{-} \lor b\_{2,2}^{-} & \qquad \text{SAT-based existence}\\ b\_{1,1}^{+} \lor b\_{1,2}^{-} & \qquad \text{SAT-based completeness, } i = 1\\ b\_{2,2}^{-} & \qquad \text{SAT-based completeness, } i = 2 \end{aligned}$$

The uniqueness (9) and coherence (11) properties are trivial here because the problem is simple: all b<sup>−</sup><sub>i,j</sub> have the same j, and no literal m<sub>j</sub> can be mapped with different polarities. By using SAT solving from Algorithm 1 over the above SAT constraints, we obtain the SAT model b<sup>+</sup><sub>1,1</sub> ∧ ¬b<sup>−</sup><sub>1,2</sub> ∧ b<sup>−</sup><sub>2,2</sub>, with b<sup>−</sup><sub>2,2</sub> the first literal assigned with negative polarity. The application of SR over (L<sub>1</sub>, M<sub>1</sub>) yields the conclusion M<sub>1</sub> \ {m<sub>2</sub>} = p(g(y<sub>1</sub>), c), replacing M<sub>1</sub>.
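The model of Example 5.3 can be reproduced by brute force. The sketch below is an assumption-laden illustration: it enumerates all assignments instead of running a CDCL solver, checks the properties of Theorem 5.1 semantically rather than clause by clause, and treats terms as opaque strings; the names `union_if_compatible` and `solve_sr` are hypothetical.

```python
from itertools import product

# Match set of Example 5.3: (i, j) -> (substitution, polarity).
ms = {
    (1, 1): ({"x1": "g(y1)", "x2": "c"}, "+"),
    (1, 2): ({"x1": "f(c)", "x2": "e"}, "-"),
    (2, 2): ({"x2": "c", "x3": "e"}, "-"),
}

def union_if_compatible(substs):
    """Union of the given substitutions, or None if two of them clash."""
    sigma = {}
    for s in substs:
        for var, term in s.items():
            if sigma.setdefault(var, term) != term:
                return None
    return sigma

def solve_sr(ms, n_side):
    """Return (true variables, sigma, index of the resolution literal) or None."""
    keys = sorted(ms)
    for bits in product([False, True], repeat=len(keys)):
        true = [k for k, b in zip(keys, bits) if b]
        sigma = union_if_compatible(ms[k][0] for k in true)        # compatibility (7)
        neg_cols = {j for (i, j) in true if ms[(i, j)][1] == "-"}
        pos_cols = {j for (i, j) in true if ms[(i, j)][1] == "+"}
        rows = {i for (i, _j) in true}
        if (sigma is not None
                and len(neg_cols) == 1                             # existence and uniqueness
                and rows == set(range(1, n_side + 1))              # completeness
                and not (neg_cols & pos_cols)):                    # coherence
            return true, sigma, neg_cols.pop()
    return None

print(solve_sr(ms, n_side=2))
```

The only satisfying assignment sets b+(1,1) and b−(2,2) to true, so the second literal of M1 is the resolution literal, in agreement with the example.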

#### **5.2 Indirect SAT Encoding of Subsumption Resolution**

SAT-based formulas (9) and (11) may yield many constraints, with worst-case complexity O(|L|<sup>2</sup>|M|<sup>2</sup>). In practice such situations rarely occur, since the match set ms is sparsely populated. Nevertheless, to alleviate this worst-case complexity, we further constrain the approach of Sect. 5.1. We introduce structuring propositional variables c<sub>j</sub> such that c<sub>j</sub> is ⊤ iff there exists a literal l<sub>i</sub> with σ(l<sub>i</sub>) = ¬m<sub>j</sub>, which we encode as:

$$\text{SAT-based structurality} \qquad \bigwedge\_{j} \left[ \neg c\_{j} \lor \bigvee\_{i} b\_{i,j}^{-} \right] \land \bigwedge\_{j} \bigwedge\_{i} \left( c\_{j} \lor \neg b\_{i,j}^{-} \right) \tag{12}$$

**SR as revised SAT problem.** While the compatibility property (7) remains unchanged, the SR constraints of Theorem 5.1 are revised as given below.

$$\text{SAT-based revised existence} \qquad \bigvee\_{j} c\_{j} \tag{13}$$

$$\text{SAT-based revised uniqueness} \qquad AtMostOne(\{c\_j, j = 1, \ldots, |M|\}) \tag{14}$$

$$\text{SAT-based revised completeness} \qquad \bigwedge\_{i} \bigvee\_{j} b\_{i,j} \tag{15}$$

$$\text{SAT-based revised coherence} \qquad \bigwedge\_{j} \bigwedge\_{i} \left( \neg c\_{j} \lor \neg b\_{i,j}^{+} \right) \tag{16}$$

Similarly to Sect. 5.1, application of subsumption resolution is decided via Algorithm 1 by checking satisfiability of (7) ∧ (12) ∧ (13) ∧ (14) ∧ (15) ∧ (16). Using the above SAT formula as the result of encodeConstraint() in Algorithm 1, the worst-case behaviour is eliminated in exchange for O(|M|) propositional variables c<sub>j</sub>. While the direct encoding of Sect. 5.1 is more efficient on small problems as it requires fewer variables and constraints, the indirect encoding of this section is expected to behave better on larger problems (see Sect. 7).

*Remark 5.3.* Note that the uniqueness property (14) is handled via *AtMostOne* constraints, based on the approach of [10]. If a variable c<sub>j</sub> is set to ⊤, then our SAT solver in Algorithm 1 infers that all other variables c<sub>j′</sub> are set to ⊥.

*Example 5.4.* Consider again the clause pair (L<sub>1</sub>, M<sub>1</sub>) of Fig. 1. Compared to Example 5.3, our revised encoding of SR requires one additional variable c<sub>2</sub>, as m<sub>2</sub> in Example 5.3 is used with negative polarity. The revised constraints are:
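Instantiating (7) and (12)–(16) on the match set of Example 5.3 plausibly gives the constraints below. This is a reconstruction from the formulas of this section, under the assumptions that only c<sub>2</sub> is created (m<sub>1</sub> has no negative match) and that trivially satisfied clauses are dropped as in Example 5.3.

$$\begin{aligned}
b\_{1,1}^{+} & \Rightarrow \{x\_{1} \mapsto g(y\_{1}), x\_{2} \mapsto c\} \subseteq \sigma & \quad \text{SAT-based compatibility of } b\_{1,1}^{+}\\
b\_{1,2}^{-} & \Rightarrow \{x\_{1} \mapsto f(c), x\_{2} \mapsto e\} \subseteq \sigma & \quad \text{SAT-based compatibility of } b\_{1,2}^{-}\\
b\_{2,2}^{-} & \Rightarrow \{x\_{2} \mapsto c, x\_{3} \mapsto e\} \subseteq \sigma & \quad \text{SAT-based compatibility of } b\_{2,2}^{-}\\
\neg c\_{2} \lor b\_{1,2}^{-} \lor b\_{2,2}^{-} & & \quad \text{SAT-based structurality, } j = 2\\
c\_{2} \lor \neg b\_{1,2}^{-} & & \quad \text{SAT-based structurality, } i = 1,\ j = 2\\
c\_{2} \lor \neg b\_{2,2}^{-} & & \quad \text{SAT-based structurality, } i = 2,\ j = 2\\
c\_{2} & & \quad \text{SAT-based revised existence}\\
b\_{1,1}^{+} \lor b\_{1,2}^{-} & & \quad \text{SAT-based revised completeness, } i = 1\\
b\_{2,2}^{-} & & \quad \text{SAT-based revised completeness, } i = 2
\end{aligned}$$

Under these assumptions, revised uniqueness (14) is trivial because c<sub>2</sub> is the only structuring variable, and revised coherence (16) is trivial because no positive match onto m<sub>2</sub> exists.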


The SAT solver returns b<sup>+</sup><sub>1,1</sub> ∧ ¬b<sup>−</sup><sub>1,2</sub> ∧ b<sup>−</sup><sub>2,2</sub> ∧ c<sub>2</sub> as a solution to the above SAT problem, from which the application of SR yields a similar result to that of Example 5.3.

*Remark 5.4.* We note that our method naturally supports commutative predicates, such as equality. Let ≃ denote object-level equality. Suppose we have literals l<sub>i</sub> := a ≃ b and m<sub>j</sub> := c ≃ d. Two propositional variables with associated matching substitutions σ<sub>i,j</sub> and σ′<sub>i,j</sub> are introduced, where σ<sub>i,j</sub> matches a ≃ b against c ≃ d and σ′<sub>i,j</sub> matches a ≃ b against d ≃ c. If zero or one matches exist, then the problem behaves exactly like the non-symmetric case. If both matches exist, then σ<sub>i,j</sub> and σ′<sub>i,j</sub> must be incompatible: otherwise, c and d would be identical terms and the trivial literal m<sub>j</sub> would have been eliminated. Therefore, our SAT-based encodings for subsumption resolution do not need to be adapted and behave as expected.

#### **5.3 SAT Constraints for Subsumption**

In the new framework of Algorithm 1, the formulation suggested by [10] was adjusted to work with subsumption resolution. Algorithm 1 needs very little adaptation for subsumption: the *encodeConstraint*() method uses the encoding below, and the conclusion need not be built as only the satisfiability of the formulas is relevant. The re-written SAT encoding becomes:

$$\text{multiplicity conservation} \qquad \bigwedge\_{j} AtMostOne(\{b\_{i,j}^{+}, i = 1, \ldots, |L|\}) \tag{19}$$

Note that the set of propositional variables used in our SAT-based formulas (17)–(19) encoding subsumption is a subset of the variables used by our SAT-based subsumption resolution constraints.
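To illustrate why multiplicity conservation matters for condition (1), the following sketch decides subsumption from a match set by brute force. It is not the SAT encoding itself: it assumes, following condition (1), that every literal of L needs one positive match, that the chosen matches are compatible, and that no literal of M is used twice. The function name and the two match sets are made up for this illustration.

```python
from itertools import product

def subsumes(ms, n_side):
    """Decide sigma(L) ⊆_M M from a match set {(i, j): (substitution, polarity)}."""
    rows = [[j for (i, j), (_s, pol) in ms.items() if i == r and pol == "+"]
            for r in range(1, n_side + 1)]
    for choice in product(*rows):          # one positive match per literal of L
        if len(set(choice)) < n_side:      # some m_j used twice: not a multiset inclusion
            continue
        sigma, ok = {}, True
        for r, j in enumerate(choice, start=1):
            for var, term in ms[(r, j)][0].items():
                ok = ok and sigma.setdefault(var, term) == term
        if ok:
            return True
    return False

# Hypothetical pair L = p(x) ∨ p(y), M = p(c): both literals of L match p(c),
# but as multisets sigma(L) = p(c) ∨ p(c) is not included in M.
ms_single = {(1, 1): ({"x": "c"}, "+"), (2, 1): ({"y": "c"}, "+")}
# With M = p(c) ∨ p(d) the two literals of L can use distinct literals of M.
ms_double = {**ms_single, (1, 2): ({"x": "d"}, "+"), (2, 2): ({"y": "d"}, "+")}
print(subsumes(ms_single, 2), subsumes(ms_double, 2))
```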

**Pruning for Subsumption.** The pruning technique described in Sect. 4 can be adapted into a stronger form for subsumption. In this case, we will check for multi-set inclusion between multi-sets of (predicates, polarity) pairs.
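The stronger check can be sketched with multisets of (predicate, polarity) pairs. As before, the literal representation `(polarity, predicate, args)`, the function name and the example clauses are assumptions made for this sketch.

```python
from collections import Counter

def pruned_for_subsumption(side, main) -> bool:
    """True if the (predicate, polarity) multiset of `side` is not included in that of `main`."""
    need = Counter((pred, pol) for (pol, pred, _args) in side)
    have = Counter((pred, pol) for (pol, pred, _args) in main)
    return any(have[key] < count for key, count in need.items())

# Hypothetical clauses: L = p(x) ∨ p(y), M = p(c) ∨ ¬p(d).
L = [(True, "p", ("x",)), (True, "p", ("y",))]
M = [(True, "p", (("c",),)), (False, "p", (("d",),))]
print(pruned_for_subsumption(L, M))
```

Here the predicate sets coincide, so the weaker check of Sect. 4 does not prune the pair, whereas the multiset check does: L needs two positive p-literals and M offers only one.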

### **6 SAT-Based Subsumption Resolution in Saturation**

In this section we discuss the integration of our SAT-based subsumption resolution approach within saturation-based proof search.

**Forward/Backward Simplifications.** For the purpose of efficient reasoning, saturation algorithms use two main variants of simplification inferences implementing redundancy. *Forward* simplifications are applied on a newly generated clause M to check whether M can be simplified by an existing clause L. *Backward* simplifications use a newly generated clause L to check whether L can simplify existing clauses M. Backward simplification tends to be more expensive.

*ms* ← createMatchSet()
*solver* ← createSatSolver(*ms*)
**procedure** Subsumption(L, M)
    F<sub>S</sub>, F<sub>SR</sub> ← pruned(L, M) ▷ F<sub>S</sub> (resp. F<sub>SR</sub>) gets true if subsumption (resp. subsumption resolution) cannot succeed
    fillMatchSet(*ms*, L, M) ▷ Build the whole match set, and update F<sub>S</sub> and F<sub>SR</sub>
    **if** F<sub>S</sub> **then** ▷ subsumption cannot be applied
        **return** NoSubsumption
    encodeConstraints(*solver*, *ms*) ▷ SAT constraints of Section 5.3
    **if** *solver*.solve() is SAT **then**
        **return** Subsumed
    **else**
        **return** NoSubsumption

**SAT-Based Subsumption Resolution in Saturation.** Since subsumption is a stronger form of simplification, subsumption is checked before subsumption resolution. This means that subsumption resolution is applied only if subsumption fails for all candidate premises. We integrate Algorithm 1 within saturation so that it is used both for subsumption and subsumption resolution.

Algorithms 2–3 display a variation of the integration of our SAT-based approach for checking subsumption resolution during saturation. Since most of the setup of subsumption is also required for subsumption resolution, both simplification rules are set up at the same time. As such, whenever turning to subsumption resolution, the same match set ms from Algorithm 2 can be reused, while also taking advantage of pruning steps performed during subsumption.

We modified the forward simplification algorithm as described in Algorithm 4. In this new setting, checking the same pair (L, M) for subsumption directly followed by subsumption resolution enables us to use Algorithms 2 and 3 efficiently. Algorithm 4 pays the price of checking subsumption resolution even if subsumption may succeed, but in practice inefficiencies in this respect are rarely observed.

**procedure** ForwardSimplify(M, F)
    M<sup>∗</sup> ← NoSubsumptionResolution
    **for** L ∈ F \ {M} **do**
        **if** subsumption(L, M) is Subsumed **then** ▷ using Algorithm 2
            F ← F \ {M}
            **return** ⊤ ▷ M is subsumed and removed
        **if** M<sup>∗</sup> = NoSubsumptionResolution **then**
            M<sup>∗</sup> ← subsumptionResolution(L, M) ▷ using Algorithm 3
    **if** M<sup>∗</sup> ≠ NoSubsumptionResolution **then**
        F ← F \ {M} ∪ {M<sup>∗</sup>} ▷ M<sup>∗</sup> is the conclusion of subsumption resolution between L and M
        **return** ⊤
    **return** ⊥

**Algorithm 5.** Evaluation of SAT-based subsumption resolution

**procedure** ForwardSimplifyWrapper(M, F)
    s ← startTimer()
    r ← ForwardSimplify(M, F) ▷ Benchmarked method
    ▷ Prevent modification of F
    e ← endTimer()
    writeInFile(e − s)
    r′ ← Oracle(M, F)
    checkCoherence(r, r′) ▷ Empiric check
    **return** r′
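The control flow of the forward simplification loop of Algorithm 4 can be sketched as follows. This is one reading of the algorithm, not the Vampire implementation: the two checks are passed in as functions, `None` stands for NoSubsumptionResolution, and the propositional helpers used in the example are hypothetical stand-ins for Algorithms 2 and 3.

```python
NO_SR = None  # stands for NoSubsumptionResolution

def forward_simplify(main, clauses, subsumes, subsumption_resolution):
    """Try to delete or shorten `main` using the other clauses; mutates `clauses`.

    Returns True iff `main` was removed or replaced.
    """
    conclusion = NO_SR
    for side in clauses:
        if side is main:
            continue
        if subsumes(side, main):              # subsumption is checked first
            clauses.remove(main)
            return True
        if conclusion is NO_SR:               # keep the first successful SR only
            conclusion = subsumption_resolution(side, main)
    if conclusion is not NO_SR:
        clauses.remove(main)
        clauses.append(conclusion)
        return True
    return False

# Propositional stand-ins: clauses are frozensets of literals such as "a" or "-a".
def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def prop_subsumes(side, main):
    return side <= main

def prop_sr(side, main):
    for lit in side:
        if neg(lit) in main and side - {lit} <= main - {neg(lit)}:
            return main - {neg(lit)}
    return NO_SR

clauses = [frozenset({"a", "b"}), frozenset({"a", "-b", "c"})]
print(forward_simplify(clauses[1], clauses, prop_subsumes, prop_sr), clauses)
```

Subsumption by a later clause still takes precedence over a subsumption resolution found earlier, because the conclusion is only installed after the loop has finished.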

**Role of Indices.** When applying inferences that require terms or literals to unify or match, modern automated first-order theorem provers typically use *term indices* [9] to consider only viable candidates within the set of clauses. Subsumption and subsumption resolution are no exception. Our testbed system Vampire currently uses a substitution tree to index clauses for matching by their literals (Sect. 7).

### **7 Implementation and Experiments**

We implemented and integrated our SAT-based subsumption resolution approach in the saturation-based first-order theorem prover Vampire [6].<sup>1</sup>

<sup>1</sup> https://github.com/vprover/vampire/tree/robin c-subsumption resolution.

**Versions compared.** We use the following versions of Vampire in our evaluation:


**Experimental Setting.** To evaluate our work, we used the examples of the TPTP library (version 8.1.2) [15]. In our evaluation, 24 926 problems were used out of the 25 257 TPTP problems; the remaining problems are not supported by Vampire (e.g., problems with both higher-order operators and polymorphism).

Our experimental evaluation was done on a machine with two 32-core AMD Epyc 7502 CPUs clocked at 2.5 GHz and 1006 GiB of RAM (split into 8 memory nodes of 126 GiB shared by 8 cores). Each benchmark problem was run with the options -sa otter -t 60, meaning that we used the Otter saturation algorithm [7] with a 60-second time-out. We use the Otter strategy because it is the most aggressive in terms of simplification and therefore runs the most subsumption resolutions. We turned off the AVATAR framework (-av off) in order to have full control over SAT-based reasoning in Vampire.

**Evaluation Setup.** Our evaluation process is summarised in Algorithm 5, incorporating the following notes.


For the reasons above, we decided to measure the run time of a complete execution of Algorithm 4. To prevent the branches taken from changing between methods, an Oracle is used to choose the path to follow. The Oracle is based on our indirect SAT encoding (Vampire<sup>∗</sup><sub>I</sub>). This way, the same computation graph is used for all evaluated methods. To avoid preheating the cache, we run the Oracle after the respective evaluated method, so that the cache is in a normal state for the evaluated method. To measure the run time of Algorithm 4, a Wrapper method was built on top of the Forward Simplify procedure of Algorithm 4. This Wrapper replaces the Forward Simplify loop in Vampire with minimal changes to the code. To empirically verify the correctness of our results, we used the Wrapper to compare the result of the evaluated method with the result of the Oracle.

**Experimental Details and Analysis.** Fig. 2 lists the cumulative instances solved by the respective Vampire versions, highlighting the strength of forward simplifications for effective saturation.

**Fig. 2.** Cumulative instances of applying subsumption resolution, using the TPTP examples. A point (n, t) on the graph means that n forward simplify loops were executed in less than t μs. The flatter the curve, the faster the Vampire version is.

**Table 1.** Average time spent in the Forward Simplify loop. Vampire<sup>∗</sup><sub>D</sub> is the fastest method, closely followed by Vampire<sup>∗</sup><sub>I</sub>. However, the indirect encoding is much more stable and has a lower variance.


*Remark 7.1.* Our experimental summary in Fig. 2 shows the total number of Forward Simplify loops run in 60 s. However, the average and standard deviation were computed only on the intersection of the problems solved. That is, only the Forward Simplify loops finished by all the methods are taken into account. Otherwise, if a hard problem is solved in, for instance, 1 000 000 μs by one method and times out for another, the average for the better method would increase a lot, but the weaker method would not be penalised. Table 1 summarises the average solving time of our evaluation.
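The effect described in this remark can be made concrete with a short sketch. The data layout (one dictionary of loop timings per method), the function name and the use of the population standard deviation are assumptions of this illustration, and the numbers are invented.

```python
from statistics import mean, pstdev

def stats_on_common(timings):
    """timings: {method: {loop_id: time_us}}; a loop missing for a method did not finish.

    Returns {method: (mean, standard deviation)} over loops finished by all methods.
    """
    common = sorted(set.intersection(*(set(t) for t in timings.values())))
    return {method: (mean([t[k] for k in common]), pstdev([t[k] for k in common]))
            for method, t in timings.items()}

# Invented timings: method A finishes a hard loop (id 3) that B never completes.
timings = {
    "A": {1: 10, 2: 20, 3: 1_000_000},
    "B": {1: 30, 2: 50},
}
print(stats_on_common(timings))
```

Averaging A over all three of its loops would give a mean above 300 000 μs, even though A is faster than B on every loop both methods completed.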

**Comparison of Encodings.** We correlated the constraint building and SAT solving time with the length of clauses, using the different encodings of Sects. 5.1– 5.2. Figure 3 shows that on larger clauses, the average computation time increases faster for the direct encoding than for the indirect encoding.



**Fig. 3.** Average time (μs) spent on creating and solving SAT-based subsumption resolution constraints.

**Table 2.** Number of TPTP problems solved by the considered versions of Vampire. The run was made using the options -sa otter -av off with a timeout of 60 s. The **Gain/Loss** column reports the difference of solved instances compared to Vampire<sub>M</sub>.


**Experimental Summary.** Our experiments show that Vampire<sup>∗</sup><sub>I</sub> yields the most stable approach for SAT-based subsumption resolution (Table 1), especially when it comes to solving large instances (Fig. 3). Our results demonstrate the superiority of SAT-based subsumption resolution used with forward simplifications in saturation (e.g., Vampire<sup>∗</sup><sub>D</sub> and Vampire<sup>∗</sup><sub>I</sub>), as concluded by Table 2.

### **8 Conclusion**

We advocate SAT solving for improving saturation-based first-order theorem proving. We encode powerful simplification rules, in particular subsumption resolution, as SAT problems, triggering eager and efficient reasoning steps for the purpose of keeping proof search small. Our experiments with Vampire showcase the benefit of SAT-based subsumption. In the future, we aim to further extend simplification rules with SAT solving, in particular focusing on subsumption demodulation for equality reasoning [3].

**Acknowledgements.** This work is the result of a research internship hosted at TU Wien and defended at the University of Liège. The authors would like to thank Pascal Fontaine for valuable discussions and comments. We acknowledge funding from the ERC Consolidator Grant ARTIST 101002685 and the FWF SFB project SpyCoDe F8504.

### **References**



# **A More Pragmatic CDCL for IsaSAT and Targetting LLVM (Short Paper)**

Mathias Fleury<sup>1,2,3</sup> and Peter Lammich<sup>4,5</sup>

<sup>1</sup> University Freiburg, Freiburg, Germany, fleury@cs.uni-freiburg.de
<sup>2</sup> Max-Planck-Institut für Informatik, Saarbrücken, Germany
<sup>3</sup> Johannes Kepler University Linz, Linz, Austria
<sup>4</sup> University of Manchester, Manchester, UK
<sup>5</sup> University of Twente, Twente, Netherlands, p.lammich@utwente.nl

**Abstract.** IsaSAT is the most advanced verified SAT solver, but it did not yet feature inprocessing (to simplify and strengthen clauses). In order to improve performance, we enriched the base calculus to not only do CDCL but also inprocess clauses. We also replaced the target of our code synthesis by Isabelle/LLVM. With these improvements, we can solve 4 times more SAT Competition 2022 problems than the original IsaSAT version, and 4.5 times more problems than any other verified SAT solver we are aware of. Additionally, our changes significantly reduce the trusted code base of our verification.

### **1 Introduction**

SAT solving is a very important tool that has been extensively used in various applications like mathematics or cryptography. To ensure the correctness of the answer provided by a SAT solver, there are two approaches: either producing a certificate that can be checked independently or verifying a SAT solver. The first approach has been extensively studied and works very well in practice [19,26,28] – only checked proofs are counted in the SAT Competition [2].

The second approach, i.e., verifying a whole SAT solver, is orders of magnitude more complex than checking a certificate. To this end, the goal of the IsaFoL (Isabelle Formalization of Logic) [3] effort is to develop methodology and libraries for formalizing modern research in automated reasoning. In this context, we have verified a CDCL calculus (conflict-driven clause learning) and a two-watched literals data structure (Sect. 2). To show that they are useful, we have developed the verified SAT solver IsaSAT [8], which we later optimized [12]. To our surprise, it won the EDA Challenge 2021, defeating all the non-verified solvers, but, as expected, it finished last at the SAT Competition 2022 [2]. However, the former used a much shorter timeout (200 s, not announced before the competition) whereas the latter uses 5000 s.

In this paper, we present our new developments in IsaSAT, which make our solver arguably the most advanced formally verified SAT solver to date: inprocessing and verifying fast LLVM code [20] rather than slow functional code.

Inprocessing is a critical feature of modern SAT solvers (e.g., every winner of the SAT Competition since 2013 includes it). In order to use it in our formally verified solver, we had to extend our verified CDCL calculus: Our new PCDCL calculus includes features to encompass various inprocessing techniques, even if we have not yet implemented every possible technique (Sect. 3).

We generate IsaSAT by exporting a model in the interactive theorem prover Isabelle [22] to executable code. Earlier we used Isabelle's default code generator to export to Standard ML (SML). However, the performance was not sufficient – especially memory consumption was very high. Thus, we switched to Isabelle/LLVM [18], which generates LLVM intermediate representation (LLVM IR). Apart from allowing faster imperative code, it also reduced the trusted code base (Sect. 4), replacing the rather niche MLton [27] compiler by only the backend of the widely used LLVM.

Porting our entire development to Isabelle/LLVM required some changes and some cleanup. Moreover, when we implemented and verified inprocessing, we realized that some design decisions needed to be improved. In Sect. 5, we report on our experiences and lessons learned while porting and extending IsaSAT.

Finally, we have benchmarked IsaSAT on the problems from the SAT Competition 2022. We show that just porting IsaSAT from SML to Isabelle/LLVM significantly improved the performance, and the new inprocessing techniques combined with heuristic improvements give us another significant increase, demonstrating the usefulness of our PCDCL calculus (Sect. 6).

This presentation is an extended version of our (non-peer-reviewed) system descriptions from the EDA Challenge 2021 [13] and the SAT Competition 2022 [6]. Compared to those versions, we provide many more details on PCDCL, our experience porting the development to LLVM, and performance tests.

### **2 Preliminaries**

**CDCL.** CDCL is a procedure that builds a partial assignment called *a trail* either by guessing (called *deciding*) or propagating information. If the partial assignment is a model, the algorithm stops. If there is a conflict between the partial assignment and a clause, the partial assignment is repaired and a new clause is learned. For more details (beyond the scope of this paper), we refer the reader to the *Handbook of Satisfiability* [7].

We use a transition system for our formalization of CDCL [8]. Its state consists of the trail *M*, the (multi)sets of initial and learned clauses (*N* and *U*), and the conflict clause to analyze (or None if there is none). We show one rule, decide, that adds *L* to the current assignment *M*:

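$$\begin{array}{l} \mathsf{inductive}\ \mathsf{decide} :: \mathit{st} \Rightarrow \mathit{st} \Rightarrow \mathit{bool}\ \mathsf{where} \\ \quad \mathsf{undefined\\_lit}\ M\ L \Longrightarrow |L| \in |N| \Longrightarrow \\ \quad \mathsf{decide}\ (M, N, U, \mathsf{None})\ (L \cdot M, N, U, \mathsf{None}) \end{array}$$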

If no conflict has been found so far (None), we add the new literal *L* at the beginning of the trail *M*. We prove that our set of rules is terminating and correct [8].
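As an illustration only, the decide rule can be mimicked on plain Python data. This is a minimal sketch under assumptions that are not part of the formalization: literals are non-zero integers in DIMACS style, the trail is a tuple with the newest literal first, and the names `State` and `decide` are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Optional, Tuple

Clause = FrozenSet[int]

class State(NamedTuple):
    trail: Tuple[int, ...]        # M, newest literal first
    initial: Tuple[Clause, ...]   # N
    learned: Tuple[Clause, ...]   # U
    conflict: Optional[Clause]    # None if no conflict is being analyzed

def decide(state: State, lit: int) -> Optional[State]:
    """Return the successor state of the decide rule, or None if it does not apply."""
    if state.conflict is not None:
        return None
    atoms_of_n = {abs(l) for clause in state.initial for l in clause}
    # The literal must be undefined in the trail and its atom must occur in N.
    undefined = lit not in state.trail and -lit not in state.trail
    if undefined and abs(lit) in atoms_of_n:
        return state._replace(trail=(lit,) + state.trail)
    return None

s0 = State((), (frozenset({1, 2}), frozenset({-1, 3})), (), None)
s1 = decide(s0, -2)
```

Here `s1` extends the empty trail with the decision `-2`, while deciding `2` afterwards, or deciding a literal whose atom does not occur in `N`, is rejected.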

**Code Synthesis.** To generate the IsaSAT code, we start from abstract rules like decide and gradually refine them to deterministic functions using the Refinement Framework [16]. Then, we rely on Sepref [17] to synthesize code: It takes an (Isabelle) function and synthesizes a new version, replacing functional data structures (like lists) by imperative data structures (like arrays). There are two versions of the tool. The older version, which we used before [8,12], uses Imperative HOL [9] and Isabelle's standard (trusted) code generator [14] to export code into various functional languages. We used Standard ML (SML) with the compiler MLton [27], because it offers (by far) the best performance for our use case. The new Sepref is part of the Isabelle/LLVM library (developed by the second author) and generates LLVM IR from a model of LLVM IR inside the theorem prover. The code generator interprets a shallow embedding of Isabelle/LLVM as equivalent to similar-looking LLVM code. This reduces the trusted code base in two ways: first, the trusted pretty printer is simpler, and, second, instead of the rather niche full compiler MLton, we use only the backend of the widely used LLVM [20].

The biggest difference is that Imperative HOL allows arbitrarily large arrays and integers, whereas Isabelle/LLVM is more realistic, requiring integers (in particular array offsets, see Sect. 5.1) to have a fixed bit-width.

**Related Work.** Our goal is to produce a fully verified SAT solver, without any runtime checks, that both *terminates* and returns a *correct model* while using efficient data structures. No other solver achieves all three goals. The SAT solver TrueSAT from Andrici and Ciobaca [1] relies on the original DPLL and uses less efficient data structures (including counters instead of watch lists), but it terminates. Historically, this would roughly correspond to SAT solvers from the early 90s. However, it only uses stateless heuristics, and it is not clear if the approach can be extended to CDCL (where the solver learns and keeps new clauses) or to stateful heuristics (like VSIDS [21]). The solvers versat [23] and CreuSAT [25] go in a similar direction with CDCL instead of DPLL, but prove a weaker correctness property: they only show that an UNSAT result is correct, while a SAT result requires an additional check. Also, termination is not proved. Only proving this partial property makes many proofs considerably easier, in particular when adding restarts. Oe et al.'s solver versat [23] was the first partially verified solver that could run benchmarks from the SAT Competition. More recently, Skotåm [25] has verified in his Master's thesis the SAT solver CreuSAT using the Creusot framework (relying on Why3 internally). While CreuSAT is much faster than versat in our tests, its correctness relies on (trusted) SMT solvers, and the proofs are not checked by a small kernel like our Isabelle code. However, the verification also takes much less time (a few minutes compared to several hours).

Modern SAT solvers use inprocessing to make the subsequent CDCL run heuristically faster [15]. In particular, clauses are strengthened and global transformations (e.g., to remove variables) are applied. Two techniques (that we do not support), variable elimination and variable addition, slowly change the models of the formula by changing the set of variables. The SAT solver then reconstructs a model of the original formula at the end. Fazekas et al. [11] made this compatible with incremental SAT solving. All other inprocessing techniques fit into our extended CDCL described in the next section.

### **3 Pragmatic CDCL for Inprocessing**

SAT solvers nowadays apply a combination of CDCL (most of the time) and inprocessing (sometimes). Therefore, we extended our calculus similarly. At the core, we have our terminating CDCL. We also allow for formula transformation and restarts. We call the combination *pragmatic CDCL* or PCDCL.

**Splitting the Clause Set.** Inprocessing makes it possible to strengthen and simplify clauses. However, we want models of the final set of clauses to remain models of the initial set of clauses. Deleting clauses is not possible: if we start with the clauses *A* ∨ *C* and *B* ∨ ¬*B*, removing the tautology means that the model *A* of *A* ∨ *C* is not a model of the initial clause set anymore. Hence we want to keep the literal *B* without considering the tautology for propagation/conflict.

To solve the issue we split our set of clauses into two parts: clauses that are useful for propagation and clauses that can be ignored but are kept for their literals. Thus we keep the set of all literals A constant. For our proof of refinement to the original CDCL, we have to make sure that the new behavior is also possible in the original calculus – in particular we do not miss propagations or conflicts. In the case of tautologies, this is simple (they are never used). If we consider subsumption, like *A* ∨ *B* subsumes *A* ∨ *B* ∨ *C*, whenever the latter propagates, then the former is a conflict. Therefore, the behavior is compatible.
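The subsumption argument can be checked on a small instance. The following sketch is illustrative only: literals are integers, a partial assignment is the set of literals that are true, and the helper `status` is a hypothetical name, not part of IsaSAT. It covers the case where the extra literal *C* is the one being propagated.

```python
def status(clause, assignment):
    """Classify a clause under a partial assignment (the set of true literals)."""
    if any(lit in assignment for lit in clause):
        return "satisfied"
    unassigned = [lit for lit in clause if -lit not in assignment]
    if not unassigned:
        return "conflict"        # every literal is false
    if len(unassigned) == 1:
        return "propagates"      # exactly one literal is left
    return "undetermined"

A, B, C = 1, 2, 3
subsuming = {A, B}
subsumed = {A, B, C}
assignment = {-A, -B}            # A and B are false, C is unassigned

subsumed_status = status(subsumed, assignment)
subsuming_status = status(subsuming, assignment)
```

With *A* and *B* false, the subsumed clause would propagate *C*, but the subsuming clause is already a conflict, so ignoring the subsumed clause loses nothing.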

While the idea of splitting our clauses seems surprising, the additional clause sets are only required for the connection to our CDCL transition system, and we entirely remove them when generating the code. Moreover, the refinement is easier as we do not have to update our heuristics to remove literals (and potentially shorten arrays). Finally, this is similar to the behavior of SAT solvers like Kissat [4]: while the clauses are removed, all literals of the problem are set.

In our original refinement, we have split the clauses to distinguish between clauses of length 1 (where we cannot distinguish two distinct literals and thus they cannot fit into our two-watched literals data structures) and longer clauses, but the aim was only distinguishing on the length.

One important point to notice is that the role of our clause sets changes. In our original CDCL, *N* was the (immutable) set of initial clauses and *U* contained the redundant clauses that can be removed at any point: *N* ensures that we do not gain new models during our transformations. Now, the set changes: strengthening an irredundant clause from *N* also shortens the clause that is in there. Therefore, a naive version could remove literals.

Overall we have 4 sets of clauses: the irredundant clauses *N* and the redundant clauses *U*, and each one is divided into the active clauses (*N<sub>a</sub>* and *U<sub>a</sub>*) and the inactive (discarded) clauses (*N<sub>d</sub>* and *U<sub>d</sub>*). For example, tautologies or subsumed clauses are discarded, but remain in *N*, so literals are never removed. In our development there are actually three sets (containing a literal set at level 0 or tautologies, subsumed clauses, and false clauses) to reduce the number of case distinctions in some proofs. We never demote irredundant clauses to redundant ones, but we can promote them.

**Inprocessing Rules.** Our aim when picking the rules is to be general (e.g., we can learn any useful clause) and then to specialize the rules to specific techniques. We will show this with the example of subsumption-resolution [7]. When doing subsumption-resolution, we resolve two clauses together if the conclusion is shorter. Then we can remove either one or both of the antecedents. For example, resolving *A* ∨ *B* ∨ *C* with *A* ∨ ¬*C* produces the clause *A* ∨ *B*, which subsumes the former clause. If the latter clause was *A* ∨ *B* ∨ ¬*C*, the resolvent would actually subsume both clauses.
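The example can be replayed with a few lines of Python. This is only a sketch on sets of integer literals; the helper names `resolve` and `subsumes` are hypothetical and do not correspond to functions of the formalization.

```python
def resolve(c1, c2, atom):
    """Resolve c1 (containing atom) with c2 (containing -atom) on atom."""
    assert atom in c1 and -atom in c2
    return (c1 - {atom}) | (c2 - {-atom})

def subsumes(c, d):
    """A clause subsumes another one if its literals are a subset."""
    return c <= d

A, B, C = 1, 2, 3
first = frozenset({A, B, C})
second = frozenset({A, -C})
resolvent = resolve(first, second, C)          # A or B

second_long = frozenset({A, B, -C})
resolvent_long = resolve(first, second_long, C)  # again A or B
```

In the first case the resolvent subsumes only `first`; in the second case it subsumes both antecedents, so both can be deactivated.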

One of the most important inprocessing rules learns any possible clause. To simplify the presentation, we will only give the rules operating on the learned clauses, but similar rules exist for the initial set of clauses.

$$\begin{array}{l} \mathsf{inductive}\ \mathsf{cdcl\\_learn\\_clause} :: \text{'}\mathit{prag\\_st} \Rightarrow \text{'}\mathit{prag\\_st} \Rightarrow \mathit{bool}\ \mathsf{where} \\ \quad N \uplus N\_d \models C \Longrightarrow \mathsf{distinct\\_mset}\ C \Longrightarrow \neg\, \mathsf{tautology}\ C \Longrightarrow \mathsf{count\\_decided}\ M = 0 \Longrightarrow \\ \quad \mathsf{cdcl\\_learn\\_clause}\ (M, N, U, \mathsf{None}, N\_d, U\_d) \\ \qquad (M, N, U \uplus \lbrace C \rbrace, \mathsf{None}, N\_d, U\_d) \end{array}$$

The side conditions not only include that the clause is entailed and duplicate-free, but also that the clause is not a tautology and that we do not break CDCL invariants (count\_decided *M* = 0). Then we can deactivate subsumed clauses:

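$$\begin{array}{l} \mathsf{inductive}\ \mathsf{cdcl\\_subsumed} :: \text{'}\mathit{prag\\_st} \Rightarrow \text{'}\mathit{prag\\_st} \Rightarrow \mathit{bool}\ \mathsf{where} \\ \quad C \subseteq D \Longrightarrow \mathsf{count\\_decided}\ M = 0 \Longrightarrow \\ \quad \mathsf{cdcl\\_subsumed}\ (M, N, U \uplus \lbrace C, D \rbrace, \mathsf{None}, N\_d, U\_d) \\ \qquad (M, N, U \uplus \lbrace C \rbrace, \mathsf{None}, N\_d, U\_d \uplus \lbrace D \rbrace) \end{array}$$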

We combine these rules to express subsumption-resolution: We first learn the clause obtained by resolution. Then we can remove the antecedents. If either antecedent is in *N*, we also have to promote the conclusion from *U* to *N*. The advantage of our approach is that we can express other inprocessing techniques without adding new rules, only by specializing them.

Overall we have 9 rules with some overlap with CDCL (propagation and conflict), but mostly simplification of clauses (removing true clauses and false literals from clauses) and pure literal deletion: When a literal always appears positively (or always negatively), we can set this literal to be true unconditionally (later removing all clauses containing it): every model after adding the clause is also a model of the original set of clauses, but not the other way around. This is the first transformation that does not preserve models in IsaSAT or any other verified SAT solver.
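Pure literal deletion, and the fact that it loses models, can be illustrated as follows. The sketch works on integer literals and uses the hypothetical helper names `pure_literals` and `eliminate_pure`; it is not how IsaSAT implements the rule.

```python
def pure_literals(clauses):
    """Literals whose negation does not occur in any clause."""
    literals = {lit for clause in clauses for lit in clause}
    return {lit for lit in literals if -lit not in literals}

def eliminate_pure(clauses):
    """Set all pure literals to true and drop the clauses they satisfy."""
    pure = pure_literals(clauses)
    remaining = [clause for clause in clauses if not (clause & pure)]
    return pure, remaining

clauses = [frozenset({1, 2}), frozenset({-2, 3}), frozenset({2, -3})]
pure, remaining = eliminate_pure(clauses)

# A model of the original clauses that sets the pure literal 1 to false.
lost_model = {-1, 2, 3}
is_model_of_original = all(any(lit in lost_model for lit in c) for c in clauses)
```

The assignment `lost_model` satisfies the original clauses but is excluded once literal `1` is fixed to true, which is why the transformation only preserves models in one direction.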

**Refinement of Subsumption-Resolution.** While the definition of subsumption-resolution is very simple, the refinement to code was challenging.

We verified forward subsumption [7] following CaDiCaL [5] (unbounded however, so all clauses selected heuristically are checked). We sort clauses by size and check if the current candidate is subsumed by one of the smaller clauses. Because we use two-watched literals, we need to distinguish between the binary clauses (that can produce new units) and the other clauses. In the end, we implemented two forward subsumption passes: one for binary clauses only and the other for larger clauses.

To subsume the candidates, we build occurrence lists and populate them with binary clauses, whereas Kissat [5] reuses watch lists. Moreover, we need a new marking data structure for efficient detection of subsumption-resolution.
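The overall shape of a forward subsumption pass can be sketched as below. This is a deliberately naive, quadratic version under stated assumptions: clauses are frozensets of integer literals, there are no occurrence lists or marking structures, and the function name `forward_subsumption` is hypothetical.

```python
def forward_subsumption(clauses):
    """Check each clause against the already kept, smaller-or-equal clauses."""
    kept = []
    discarded = []
    # sorted() is stable, so clauses of equal size keep their input order.
    for candidate in sorted(clauses, key=len):
        if any(smaller <= candidate for smaller in kept):
            discarded.append(candidate)   # subsumed by a kept clause
        else:
            kept.append(candidate)
    return kept, discarded

clauses = [
    frozenset({1, 2, 3}),
    frozenset({1, 2}),
    frozenset({2, 3, 4}),
    frozenset({4}),
]
kept, discarded = forward_subsumption(clauses)
```

Occurrence lists, as used in the verified implementation, avoid comparing a candidate against every kept clause; the sketch omits them for brevity.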

### **4 Correctness of the Code and Completeness**

Our specification model\_if\_satisfiable takes the multiset of clauses and returns a model (if there is one) or None if the clauses are unsatisfiable. Our implementation IsaSATSML opts takes an array containing the clauses and returns an optional array containing the assignment, assuming that the clauses do not contain duplicated literals or the empty clause (precondition proper\_lits\_no\_dups\_⊥). The additional argument opts activates and deactivates certain techniques for solving. The following theorem states that our implementation refines the specification:

**Theorem 1 (SML End-to-End Correctness).** *The following refinement relation holds:*

> (IsaSATSML opts, model\_if\_satisfiable) ∈ [proper\_lits\_no\_dups\_⊥] clauses\_assn → option\_model\_assn

The LLVM version is nearly the same. It can handle duplicated literals and the empty clause. Moreover, the new specification model\_if\_satisfiable\_bounded allows for an *unknown* result if arrays would grow larger than the size permitted by the fixed bit-width. While this limit does not exist in Imperative HOL, it exists in practice as no machine supports arrays that large. Therefore, we technically weakened our theorem, but did not change practical guarantees on the generated code. For IsaSATSML we start [12] with 64-bit unsigned integers and only switch to GMP integers if the arrays grow too large.

**Theorem 2 (LLVM End-to-End Correctness).** *The following refinement relation holds:*

> (IsaSATLLVM opts, RETURN ◦ model\_if\_satisfiable\_bounded) ∈ [proper\_lits] clauses\_assn → option\_model\_assn

Moreover, the change from SML to LLVM reduces the trusted code base: The Isabelle/LLVM model is closer to the actual LLVM, such that the trusted pretty-printer is simpler. LLVM is also more low-level, such that fewer parts of the compiler have to be trusted. Finally, the LLVM compiler is more widely used and tested than the rather niche MLton compiler we used before.

### **5 Experience Porting the Development to LLVM**

We report on the challenges we faced when updating the huge IsaSAT formalization (Sect. 5.1). Moreover, we report on the unverified parts of IsaSAT (Sect. 5.2), and finally compile some lessons learned (Sect. 5.3).

### **5.1 Required Changes**

Before porting the development to LLVM, we removed our only remaining source of unbounded integers: the clause indices during the garbage collection. As garbage collection does not happen very often, we did not expect this to make a difference. Surprisingly, it turned out to have a performance impact.

Isabelle/LLVM is an entire tool set, including a fork of the original Sepref tool. While related to the original Sepref tool, there are different libraries, and the development of the two versions has diverged.

Initially, we tried to support both versions of Sepref. We ended up with two sets of files for code synthesis, and duplication of some libraries (to provide constants defined in Isabelle/LLVM but not in SeprefSML). This significantly complicated our refinement approach, although we made it conceptually cleaner during the porting. Then, we realized that IsaSATLLVM was much faster than IsaSATSML (we observed a factor 2 on our test files), and decided to discontinue the SML backend.

With this, some workarounds for SML-specific performance issues also became obsolete (like the tuple uint32 \* bool \* uint64 being much less efficient than combining the uint32 and the Boolean into a single 64-bit number).
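The kind of packing that such a workaround relies on can be sketched as follows. The bit layout (flag in the lowest bit, value above it) and the names `pack` and `unpack` are assumptions made for this illustration; the text does not specify the layout used in IsaSATSML.

```python
def pack(value32: int, flag: bool) -> int:
    """Combine a 32-bit unsigned value and a Boolean into one machine word."""
    assert 0 <= value32 < 2**32
    return (value32 << 1) | int(flag)   # assumed layout: flag in bit 0

def unpack(word: int):
    """Recover the 32-bit value and the Boolean from a packed word."""
    return word >> 1, bool(word & 1)

word = pack(123456, True)
value, flag = unpack(word)
largest = pack(2**32 - 1, True)
```

Even the largest 32-bit value plus the flag needs only 33 bits, so the packed word fits comfortably into a 64-bit number.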

**Compilation.** We have experimented with compilation flags before to improve performance. We know from the SML code that we need to increase the level of inlining, because many small functions make the verification easier. The same applies for LLVM and the easiest solution is to use link-time optimization that increases the inlining level as a side effect. However, this makes profiling impossible – exactly like the SML code. So there is no regression here.

**Tuples.** In 2021, we observed a major performance regression of the synthesis, caused by a new feature in SeprefLLVM: pointer-equality tracking caused quadratic behaviour for case-splits of tuples. As our solver state is a large tuple, synthesis became impossible (several dozen minutes for simple functions).

To avoid the issue, we decided to work around it on the abstract level, using getter and setter functions for the state's components, rather than case splitting. Now, every function on the state first gets the required components, updates them, and then puts them back. For example:

```
definition rescore_conflict :: clause_index ⇒ isasat ⇒ isasat where
  rescore_conflict C S = do{
    let (M,S) = extract_trail S;
    ... (*reads the trail M and can change it*) ...
    let S = update_trail M S;
    RETURN S
  }
```
This makes synthesis much faster. However, the ownership model of Sepref does not allow aliasing, nor do our refinement relations allow leaving a 'gap' in the state where we moved out an element. As an easy work-around, we resorted to placing dummy-values, like empty lists, in the state, hoping that LLVM would optimize away the allocations and deallocations for these values. However, this did not happen: In the hot-spot of the SAT solver, the propagation loop, the dummy value for the trail was recreated and freed each time. Thus, we locally resorted to unfolding our code to make sure that we need only one free in the inner propagation loop. We leave a more principled solution of this problem (possibly changing Sepref) to future work.

We even attempted to go one step further (as the state-of-the-art SAT solver Kissat [4] does) and simply pass a pointer to the state structure as an argument. Since we had already changed our refinement to use accessors, we simply had to change them to work on a pointer. However, we never managed to make the synthesized code efficient. We observed code that was slower by a factor of 10. Hand-optimizing the accessors (basically making sure that LLVM understands that we care only about one component) reduced this to a factor of 2. Once we realized that the LLVM optimizer was replacing the pointer by the structure passed directly as argument, we gave up on that approach.

### **5.2 Unverified Parts**

In the generated SAT solver, there are some parts that we cannot verify. First, the parser is not verified, because the file system has no model in Isabelle (unlike in CakeML, where, however, conditions apply). Therefore, we link the verified code with an unverified C program, which provides the parser and command line interface.

Second, Isabelle/LLVM does not support any output (like statistics, or the DRAT proofs [28] required for the SAT Competition). For the SML version, we could use a feature of Isabelle's code generator to (axiomatically) implement a function by some external function (e.g. a function that does nothing in the model, by a printing function). As Isabelle/LLVM does not yet have such a feature, we resorted to post-processing the generated code (i.e., a function that does nothing in the model, is replaced by a printing function or even a function storing some literals for DRAT proofs). Note that this post-processing is not required for IsaSAT to work (but it won't print DRAT proofs).

#### **5.3 Lessons Learned**

**Lesson 1: Embrace Duplication.** We have already highlighted the importance of the set of all possible literals A, in particular to establish a bound on the size of various arrays. At first, we tried to avoid duplicating this set across the different components on the specification side. This, however, resulted in a closer coupling of the various refinement proofs, impeding modularity: data structures that, conceptually, are just a small part of the whole state, have to be formalized on the whole state, just to have the set A available. We solved this problem by duplicating the set A on the abstract level for all new data structures. Note that this duplication is removed in a later refinement stage.

**Lesson 2: The Limits are Isabelle Files.** Checking our Isabelle files takes nearly two hours. This can be explained by three factors: 1. the heuristic and code synthesis amounts to 91 000 loc, making it a very large formalization; 2. the synthesis is single-threaded (for technical reasons); 3. Sepref encourages a style that is not very parallel: every refinement starts with a call to a tactic refine\_vcg that generates the goals (meaning that all successive tactics have to wait). To improve performance we have attempted [12] to generate more standard proofs in Isar (by generating the text corresponding to the theorems to prove), but it is not clear that this style is faster, as a huge number of variables are generated (this style is required for more complicated proofs, however).

In order to improve Isabelle's performance and speed up the testing of new heuristics in IsaSATLLVM, we have split the files into three parts: the shared definitions of the functions to refine, the (single-threaded) synthesis, and the correctness proof of the refinement. Even with these optimizations, proof checking still takes 2 h. There is also no clear improvement path. The old SML version has a similar problem, but it is overall faster because it has fewer features, making it less critical.

**Lesson 3: Performance Bugs exist.** In order to improve performance, we need to measure and observe performance. To solve that problem, IsaSAT prints statistics and produces some timing information. The statistics during the run made identifying scheduling bugs for the different techniques possible: we accidentally ran some techniques way too often or barely ever. Especially because we increase the interval between two inprocessing rounds geometrically, simple statistics at the end of the run are not sufficient. One interesting performance bug we found was that we accidentally inverted reducing clauses (marking them as removed) and garbage collection (physically removing them). Therefore, we would nearly always physically delete clauses. We never saw this issue, because we also printed the statistics inverted. To help debugging performance, we produce some timing information by measuring time in the C program.


**Fig. 1.** CDF of the performance of SAT solvers


This helps to identify bottlenecks but also outliers where one technique is particularly slow and requires some limits or a change in the scheduling to avoid slowing down the solver too much. This makes it possible to identify errors like allocations in loops. The overall timing matches what we expect from other SAT solvers (although usually they spend more time on inprocessing and less on propagation).

### **6 Performance**

In order to study the performance we have run 3 different IsaSAT versions: the original SML solver (using MLton with the LLVM backend), the first port of the IsaSAT solver, and the current version with inprocessing and various other improvements on heuristics that do not require any change on our PCDCL calculus, notably rephasing and target phases [10] (but no local search) and the alternation between aggressive restarts (heuristically seems better for UNSAT) and few restarts (seems better for SAT) following the ideas of Chanseok Oh [24].

We ran all the benchmarks from the SAT Competition 2022 on an Intel Xeon E5-2620 v4 CPU at 2.10 GHz (with turbo-mode disabled) with a memory limit of 7 GB and a timeout of 5000 s. For comparison, we have included versat [23] and CreuSAT [25]. For completeness, we have included Kissat [6] (more precisely the bulky version submitted for the anniversary track).

The results are given in Fig. 1 as a CDF (the higher the curve, the more solved problems). The first surprise is that CreuSAT performs similarly to IsaSATSML (37 vs 40 solved problems), worse than expected given the results reported in the Master's thesis [25], which tested on the 2015 benchmarks. We suspect that this is due to the garbage collection and the fact that problems from the SAT Competition have become harder.

There is a clear improvement when going from the SML version to the LLVM version (98 solved), while the latest version solves 166. The SML version produces 335 out-of-memory errors (OOMs). The base LLVM version is more memory efficient (23 OOMs), like the latest IsaSAT version (19 OOMs) or CaDiCaL, which has the same memory layout (17 OOMs). However, there is still a large gap to the performance level of Kissat and its inprocessing techniques.

### **7 Conclusion**

We have reported on updating our verified SAT solver IsaSAT to a more powerful base calculus (our pragmatic CDCL) which can express inprocessing, and to the more efficient Isabelle/LLVM backend. We have also compiled important lessons learned from proof-engineering and maintaining large formalizations like IsaSAT (∼200 kloc of proofs).

Our changes made IsaSAT solve 4 times more problems (166/40), making it the most efficient verified SAT solver. At the same time, our verification is more complete than the next fastest verified solvers.

Most techniques (including the two most important, vivification and probing) either fit into our new PCDCL base calculus or do not require any change (like random walk [10] that is conjectured to be the reason for the major performance improvement in 2020). One major technique that we cannot currently express is variable elimination, because models are changed and need to be fixed. We leave the required extensions to our PCDCL for future work.

**Acknowledgments.** The work presented here was done over several years and several work places. The first author was supported for some time by Austrian Science Fund (FWF), NFN S11408-N23 (RiSE), and the LIT AI Lab funded by the State of Upper Austria. We thank the anonymous reviewers for their detailed comments.

### **References**


Competition 2020 - Solver and Benchmark Descriptions. Department of Computer Science Report Series B, vol. B-2020-1, pp. 51–53. University of Helsinki (2020)



# **Proving Non-Termination by Acceleration Driven Clause Learning (Short Paper)**

Florian Frohn and Jürgen Giesl

LuFG Informatik 2, RWTH Aachen University, Aachen, Germany florian.frohn@cs.rwth-aachen.de, giesl@informatik.rwth-aachen.de

**Abstract.** We recently proposed *Acceleration Driven Clause Learning* (ADCL), a novel calculus to analyze satisfiability of *Constrained Horn Clauses* (CHCs). Here, we adapt ADCL to transition systems and introduce ADCL-NT, a variant for disproving termination. We implemented ADCL-NT in our tool LoAT and evaluate it against the state of the art.

### **1 Introduction**

Termination is one of the most important properties of programs, and thus termination analysis is a very active field of research. Here, we are concerned with *dis*proving termination of *transition systems* (TSs), a popular intermediate representation for verification of programs written in more expressive languages.

*Example 1.* Consider the following TS $\mathcal{T}$ with entry-point $\mathsf{init}$ and two further *locations* $\ell\_1$, $\ell\_2$ over the variables $x, y, z$, where $x', y', z'$ represent the values of $x, y, z$ *after* applying a transition, and $\overline{\overline{x}}$, $x{+}{+}$, and $x{-}{-}$ abbreviate $x' = x$, $x' = x + 1$, and $x' = x - 1$. The first two transitions are a variant<sup>1</sup> of chc-LIA-Lin 052 from the *CHC Competition '22* [7] and the last two are a variant<sup>2</sup> of flip2 rec.jar-obl-8 from the *Termination and Complexity Competition (TermComp)* [21].

$$\mathsf{init} \to \ell\_1 \left[ x' \le 0 \land z' \ge 5000 \land y' \le z' \right] \tag{$\tau\_{\mathsf{i}}$}$$

$$\ell\_1 \to \ell\_1 \left[ y \le 2 \cdot z \land x{+}{+} \land \left( \left( x < z \land \overline{\overline{y}} \right) \lor \left( x \ge z \land y{+}{+} \right) \right) \land \overline{\overline{z}} \right] \tag{$\tau\_{\ell\_1}$}$$

$$\ell\_1 \to \ell\_2 \left[ x = y \land x > 2 \cdot z \land \overline{\overline{x}} \land \overline{\overline{y}} \right] \tag{$\tau\_{\ell\_1 \to \ell\_2}$}$$

$$\ell\_2 \to \ell\_2 \left[ x = y \land x > 0 \land \overline{\overline{x}} \land y{-}{-} \right] \tag{$\tau\_{\ell\_2}^{=}$}$$

$$\ell\_2 \to \ell\_2 \left[ x > 0 \land y > 0 \land x' = y \land \left( (x > y \land y' = x) \lor (x < y \land \overline{\overline{y}}) \right) \right] \tag{$\tau\_{\ell\_2}^{\neq}$}$$

<sup>1</sup> We generalized the example to make it more interesting, and we added the condition $y \le 2 \cdot z$ to enforce termination of $\tau\_{\ell\_1}$.

<sup>2</sup> We combined the transitions for the cases $x > y$ and $x < y$ into the equivalent transition $\tau\_{\ell\_2}^{\neq}$ to demonstrate how our approach can deal with disjunctions in conditions.

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 235950644 (Project GI 274/6-2).

© The Author(s) 2023

B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 220–233, 2023. https://doi.org/10.1007/978-3-031-38499-8\_13

At $\ell\_1$, $\mathcal{T}$ operates in two "phases": First, just $x$ is incremented until $x$ reaches $z$ (1st disjunct of $\tau\_{\ell\_1}$). Then, $x$ and $y$ are incremented until $y$ reaches $2 \cdot z + 1$ (2nd disjunct of $\tau\_{\ell\_1}$). If $x = y = c$ holds for some $c > 1$ at that point (which is the case if $x \le y = z$ holds initially), then the execution can continue at $\ell\_2$ as follows:

$$\ell\_2(c, c, c\_z) \longrightarrow\_{\tau\_{\ell\_2}^{=}} \ell\_2(c, c - 1, c\_z) \longrightarrow\_{\tau\_{\ell\_2}^{\neq}} \ell\_2(c - 1, c, c\_z) \longrightarrow\_{\tau\_{\ell\_2}^{\neq}} \ell\_2(c, c, c\_z) \longrightarrow\_{\tau\_{\ell\_2}^{=}} \dots$$

Here, $\ell\_2(c, c, c\_z)$ means that the current location is $\ell\_2$ and the values of $x, y, z$ are $c, c, c\_z$. The 1st and 2nd step with $\tau\_{\ell\_2}^{\neq}$ satisfy the 1st ($x > y \land \ldots$) and 2nd ($x < y \land \ldots$) disjunct of the condition of $\tau\_{\ell\_2}^{\neq}$, respectively. Thus, $\mathcal{T}$ does not terminate.
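The run described in Example 1 can be replayed concretely. The following sketch is only a hand-written simulation of the transitions as stated above, starting from one admissible initial valuation ($x = 0$, $y = z = 5000$); the function names are hypothetical and the code is unrelated to how LoAT represents transition systems.

```python
def step_l1(x, y, z):
    """One step of the self-loop at l1, or None if its guard y <= 2*z fails."""
    if y > 2 * z:
        return None
    # 1st disjunct: x < z, y unchanged; 2nd disjunct: x >= z, y incremented.
    return (x + 1, y, z) if x < z else (x + 1, y + 1, z)

def step_eq(x, y, z):
    """Self-loop at l2 for x = y: requires x > 0, keeps x, decrements y."""
    return (x, y - 1, z) if x == y and x > 0 else None

def step_neq(x, y, z):
    """Self-loop at l2 for x != y: requires x, y > 0 and sets x' = y."""
    if x > 0 and y > 0 and x > y:
        return (y, x, z)      # 1st disjunct: y' = x
    if x > 0 and y > 0 and x < y:
        return (y, y, z)      # 2nd disjunct: y unchanged
    return None

# Initial valuation satisfying x <= 0, z >= 5000, y <= z (here y = z).
state, steps = (0, 5000, 5000), 0
while (nxt := step_l1(*state)) is not None:
    state, steps = nxt, steps + 1

x, y, z = state
can_enter_l2 = x == y and x > 2 * z
cycle = step_neq(*step_neq(*step_eq(*state)))
```

For this valuation the loop at $\ell\_1$ runs for 10001 steps before the transition to $\ell\_2$ becomes enabled, and three further steps at $\ell\_2$ return to the same valuation, which witnesses the infinite run.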

Example 1 is challenging for state-of-the-art tools for several reasons. First, more than 5000 steps are required to reach $\ell\_2$, so reachability of $\ell\_2$ is difficult to prove for approaches that unroll the transition relation or use other variants of iterative deepening. Thus, chc-LIA-Lin 052 is beyond the capabilities of most other state-of-the-art tools for proving reachability.

Second, the pattern "$\tau\_{\ell\_2}^{=}$, 1st disjunct of $\tau\_{\ell\_2}^{\neq}$, 2nd disjunct of $\tau\_{\ell\_2}^{\neq}$" must be found to prove non-termination. Therefore, flip2 rec.jar-obl-8 (which does not use disjunctions) cannot be solved by other state-of-the-art termination tools.

Third, Example 1 contains disjunctions, which are not supported by many termination tools. Presumably, the reason is that most techniques for (dis)proving termination of loops are restricted to conjunctions (e.g., due to the use of templates and Farkas' Lemma). While disjunctions can be avoided by splitting disjunctive transitions according to the DNF of their conditions, this leads to an exponential blow-up in the number of transitions.

We present an approach that can prove non-termination of systems like Example 1 automatically. To this end, we tightly integrate non-termination techniques into our recent *Acceleration Driven Clause Learning (ADCL)* calculus [16], which was originally designed for CHCs but can also be used to analyze TSs.

Due to the use of acceleration techniques that compute the transitive closure of recursive transitions, ADCL finds long witnesses of reachability automatically. If acceleration techniques cannot be applied, it unrolls the transition relation, so it can easily detect complex patterns of transitions that admit non-terminating runs. Finally, ADCL reduces reasoning about disjunctions to reasoning about conjunctions by considering conjunctive variants of disjunctive transitions. Thus, combining ADCL with non-termination techniques for conjunctive transitions allows for disproving termination of TSs with complex Boolean structure.

After introducing preliminaries in Sect. 2, Sect. 3 presents a straightforward adaption of ADCL to TSs. Section 4 introduces our main contribution: ADCL-NT, a variant of ADCL for proving non-termination. Finally, in Sect. 5, we discuss related work and demonstrate the power of our approach by comparing it with other state-of-the-art tools. All proofs can be found in [19].

### **2 Preliminaries**

We assume familiarity with basics from many-sorted first-order logic. $\mathcal{V}$ is a countably infinite set of variables and $\mathcal{A}$ is a first-order theory over a $k$-sorted signature $\Sigma\_{\mathcal{A}}$ with carrier $\mathcal{C}\_{\mathcal{A}} = (\mathcal{C}\_{\mathcal{A},1}, \ldots, \mathcal{C}\_{\mathcal{A},k})$. $\mathsf{QF}(\Sigma\_{\mathcal{A}})$ is the set of all quantifier-free first-order formulas over $\Sigma\_{\mathcal{A}}$, which are w.l.o.g. assumed to be in negation normal form, and $\mathsf{QF}\_{\land}(\Sigma\_{\mathcal{A}})$ only contains conjunctions of $\Sigma\_{\mathcal{A}}$-literals. Given a first-order formula $\eta$ over $\Sigma\_{\mathcal{A}}$, $\sigma$ is a *model* of $\eta$ (written $\sigma \models\_{\mathcal{A}} \eta$) if it is a model of $\mathcal{A}$ with carrier $\mathcal{C}\_{\mathcal{A}}$, extended with interpretations for $\mathcal{V}$ such that $\eta$ is satisfied. As usual, $\models\_{\mathcal{A}} \eta$ means that $\eta$ is valid, and $\eta \equiv\_{\mathcal{A}} \eta'$ means $\models\_{\mathcal{A}} \eta \iff \eta'$.

We write $\vec{x}$ for sequences, and $x\_i$ is the $i$-th element of $\vec{x}$. We use "::" for concatenation of sequences, where we identify sequences of length 1 with their elements, so we may write, e.g., $x :: \mathit{xs}$ instead of $[x] :: \mathit{xs}$.

**Transition Systems.** Let $d \in \mathbb{N}$ be fixed, and let $\vec{x}, \vec{x}' \in \mathcal{V}^d$ be disjoint vectors of pairwise different variables. Each $\psi \in \mathsf{QF}(\Sigma\_{\mathcal{A}})$ induces a relation $\longrightarrow\_{\psi}$ on $\mathcal{C}\_{\mathcal{A}}^d$ where $\vec{s} \longrightarrow\_{\psi} \vec{t}$ iff $\psi[\vec{x}/\vec{s}, \vec{x}'/\vec{t}]$ is satisfiable. So for the condition $\psi := (x = y \land x > 0 \land x' = x \land y' = y - 1)$ of $\tau^{=}\_{\ell\_2}$, we have $(4, 4, 4) \longrightarrow\_{\psi} (4, 3, 7)$. $L \supseteq \{\mathsf{init}, \mathsf{err}\}$ is a finite set of *locations*. A *configuration* is a pair $(\ell, \vec{s}) \in L \times \mathcal{C}\_{\mathcal{A}}^d$, written $\ell(\vec{s})$. A *transition* is a triple $\tau = (\ell, \psi, \ell') \in L \times \mathsf{QF}(\Sigma\_{\mathcal{A}}) \times L$, written $\ell \to \ell'\,[\psi]$, and its *condition* is $\mathsf{cond}(\tau) := \psi$. W.l.o.g., we assume $\ell \neq \mathsf{err}$ and $\ell' \neq \mathsf{init}$. Then $\tau$ induces a relation $\longrightarrow\_{\tau}$ on configurations where $s \longrightarrow\_{\tau} t$ iff $s = \ell(\vec{s})$, $t = \ell'(\vec{t})$, and $\vec{s} \longrightarrow\_{\psi} \vec{t}$. So, e.g., $\ell\_2(4, 4, 4) \longrightarrow\_{\tau^{=}\_{\ell\_2}} \ell\_2(4, 3, 7)$. We call $\tau$ *recursive* if $\ell = \ell'$, *conjunctive* if $\psi \in \mathsf{QF}\_{\land}(\Sigma\_{\mathcal{A}})$, *initial* if $\ell = \mathsf{init}$, and *safe* if $\ell' \neq \mathsf{err}$. Moreover, we define $(\ell \to \ell'\,[\psi])|\_{\psi'} := \ell \to \ell'\,[\psi']$. A *transition system* (TS) $\mathcal{T}$ is a finite set of transitions, and it induces the relation $\longrightarrow\_{\mathcal{T}} := \bigcup\_{\tau \in \mathcal{T}} \longrightarrow\_{\tau}$.

*Chaining* $\tau = \ell\_s \to \ell\_t\,[\psi]$ and $\tau' = \ell'\_s \to \ell'\_t\,[\psi']$ yields $\mathsf{chain}(\tau, \tau') := (\ell\_s \to \ell'\_t\,[\psi\_c])$ where $\psi\_c := \psi[\vec{x}'/\vec{x}''] \land \psi'[\vec{x}/\vec{x}'']$ for fresh $\vec{x}'' \in \mathcal{V}^d$ if $\ell\_t = \ell'\_s$, and $\psi\_c := \bot$ (meaning *false*) if $\ell\_t \neq \ell'\_s$. So $\longrightarrow\_{\mathsf{chain}(\tau, \tau')} = \longrightarrow\_{\tau} \circ \longrightarrow\_{\tau'}$, and $\mathsf{chain}(\tau\_{\ell\_1 \to \ell\_2}, \tau^{=}\_{\ell\_2}) = \ell\_1 \to \ell\_2\,[\psi]$ where $\psi \equiv\_{\mathcal{A}} (x = y \land x > 2 \cdot z \land x > 0 \land x' = x \land y' = y - 1)$. For non-empty, finite sequences of transitions we define $\mathsf{chain}([\tau]) := \tau$ and $\mathsf{chain}([\tau\_1, \tau\_2] :: \vec{\tau}) := \mathsf{chain}(\mathsf{chain}(\tau\_1, \tau\_2) :: \vec{\tau})$. We lift notations for transitions to finite sequences via chaining. So $\mathsf{cond}(\vec{\tau}) := \mathsf{cond}(\mathsf{chain}(\vec{\tau}))$, $\vec{\tau}$ is *recursive* if $\mathsf{chain}(\vec{\tau})$ is recursive, $\longrightarrow\_{\vec{\tau}} = \longrightarrow\_{\mathsf{chain}(\vec{\tau})}$, etc. If $\tau$ is initial and $\mathsf{cond}(\tau :: \vec{\tau}) \not\equiv\_{\mathcal{A}} \bot$, then $(\tau :: \vec{\tau}) \in \mathcal{T}^{+}$ is a *finite run*. $\mathcal{T}$ is safe if every finite run is safe. If there is a $\sigma$ such that $\sigma \models\_{\mathcal{A}} \mathsf{cond}(\vec{\tau}')$ for every finite prefix $\vec{\tau}'$ of $\vec{\tau} \in \mathcal{T}^{\omega}$, then $\vec{\tau}$ is an *infinite run*. If no infinite run exists, then $\mathcal{T}$ is *terminating*.
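The identity $\longrightarrow\_{\mathsf{chain}(\tau, \tau')} = \longrightarrow\_{\tau} \circ \longrightarrow\_{\tau'}$ can be checked by brute force on a small finite domain. The following Python sketch is an illustration only: the domain, the two conditions `psi_inc` and `psi_dec`, and all function names are hypothetical, locations are omitted (both transitions are assumed to share the relevant location), and the fresh intermediate variables $\vec{x}''$ are modelled by enumerating all intermediate states.

```python
from itertools import product

# Hypothetical finite carrier, used only so that relations can be enumerated.
DOMAIN = range(0, 5)
STATES = list(product(DOMAIN, repeat=2))  # a state is a pair (x, y)


def relation(cond):
    """All pairs (s, t) of states such that cond(s, t) holds."""
    return {(s, t) for s in STATES for t in STATES if cond(s, t)}


def compose(first, second):
    """Relational composition: a step of `first` followed by a step of `second`."""
    successors = {}
    for mid, t in second:
        successors.setdefault(mid, set()).add(t)
    return {(s, t) for s, mid in first for t in successors.get(mid, ())}


def chain_cond(psi, psi_next):
    """Chained condition: some intermediate state links psi and psi_next."""
    return lambda s, t: any(psi(s, mid) and psi_next(mid, t) for mid in STATES)


def psi_inc(s, t):
    (x, y), (x2, y2) = s, t
    return x < 3 and x2 == x + 1 and y2 == y


def psi_dec(s, t):
    (x, y), (x2, y2) = s, t
    return x == y and x > 0 and x2 == x and y2 == y - 1


chained = relation(chain_cond(psi_inc, psi_dec))
composed = compose(relation(psi_inc), relation(psi_dec))
```

On this bounded domain, `chained` and `composed` coincide, which mirrors the statement above. The check says nothing about infinite carriers, where the intermediate state has to be handled symbolically.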

**Acceleration.** *Acceleration techniques* compute the transitive closure of relations. In the following definition, we only consider relations defined by conjunctive formulas, since many existing acceleration techniques do not support disjunctions [4], or have to resort to approximations in the presence of disjunctions [13].

**Definition 2 (Acceleration).** *An* acceleration technique *is a function* $\mathsf{accel} : \mathsf{QF}\_{\land}(\Sigma\_{\mathcal{A}}) \to \mathsf{QF}\_{\land}(\Sigma\_{\mathcal{A}'})$ *such that* $\longrightarrow^{+}\_{\psi} = \longrightarrow\_{\mathsf{accel}(\psi)}$*, where* $\mathcal{A}'$ *is a first-order theory. For recursive conjunctive transitions* $\tau$*, we define* $\mathsf{accel}(\tau) := \tau|\_{\mathsf{accel}(\mathsf{cond}(\tau))}$*.*

So we clearly have $\longrightarrow^{+}\_{\tau} = \longrightarrow\_{\mathsf{accel}(\tau)}$. Note that most theories are not "closed under acceleration". E.g., accelerating the Presburger formula $x'\_1 = x\_1 + x\_2 \land x'\_2 = x\_2$ yields the non-linear formula $n > 0 \land x'\_1 = x\_1 + n \cdot x\_2 \land x'\_2 = x\_2$. If neither $\mathbb{N}$ nor $\mathbb{Z}$ are contained in $\mathcal{C}\_{\mathcal{A}}$, then an additional sort for the range of $n$ is required in the formula that results from applying $\mathsf{accel}$. Hence, Definition 2 allows $\mathcal{A}' \neq \mathcal{A}$.
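The Presburger example can be sanity-checked numerically. The sketch below (function names are hypothetical) compares $n$ explicit applications of $x'\_1 = x\_1 + x\_2 \land x'\_2 = x\_2$ with the closed form $x'\_1 = x\_1 + n \cdot x\_2 \land x'\_2 = x\_2$ on a bounded range of integers. It is a test on finitely many inputs, not a proof.

```python
def iterate(x1, x2, n):
    """Apply the step x1' = x1 + x2, x2' = x2 exactly n times."""
    for _ in range(n):
        x1, x2 = x1 + x2, x2
    return x1, x2


def accelerated(x1, x2, n):
    """Closed form of n > 0 steps: x1' = x1 + n * x2, x2' = x2."""
    if n <= 0:
        raise ValueError("the accelerated formula requires n > 0")
    return x1 + n * x2, x2


def agree(bound):
    """Compare both versions for all |x1|, |x2| <= bound and 1 <= n <= bound."""
    values = range(-bound, bound + 1)
    return all(
        iterate(x1, x2, n) == accelerated(x1, x2, n)
        for x1 in values
        for x2 in values
        for n in range(1, bound + 1)
    )
```

The product `n * x2` in `accelerated` is exactly the non-linear term that takes the accelerated formula out of Presburger arithmetic.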

#### **3 ADCL for Transition Systems**

We originally proposed the ADCL calculus to analyze satisfiability of linear *Constrained Horn Clauses* (CHCs) [16]. Here, we rephrase it for TSs, and in Sect. 4, we modify it for proving non-termination. The adaption to TSs is straightforward as TSs can be transformed into equivalent linear CHCs and vice versa (see, e.g., [10]).

To bridge the gap between transitions $\tau$ where $\mathsf{cond}(\tau) \in \mathsf{QF}(\Sigma\_{\mathcal{A}})$ and acceleration techniques for formulas from $\mathsf{QF}\_{\land}(\Sigma\_{\mathcal{A}})$, ADCL uses *syntactic implicants*.

**Definition 3 (Syntactic Implicants [16, Def. 6]).** *If* $\psi \in \mathsf{QF}(\Sigma\_{\mathcal{A}})$*, then:*

$$\begin{aligned} \mathsf{sip}(\psi, \sigma) &:= \bigwedge \{ \pi \text{ is a literal of } \psi \mid \sigma \models\_{\mathcal{A}} \pi \} && \text{if } \sigma \models\_{\mathcal{A}} \psi \\ \mathsf{sip}(\psi) &:= \{ \mathsf{sip}(\psi, \sigma) \mid \sigma \models\_{\mathcal{A}} \psi \} \\ \mathsf{sip}(\tau) &:= \{ \tau|\_{\psi'} \mid \psi' \in \mathsf{sip}(\mathsf{cond}(\tau)) \} \\ \mathsf{sip}(\mathcal{T}) &:= \bigcup\_{\tau \in \mathcal{T}} \mathsf{sip}(\tau) \end{aligned}$$

*Here,* sip *abbreviates* syntactic implicant projection*.*

As $\mathsf{sip}(\psi, \sigma)$ is restricted to literals from $\psi$, $\mathsf{sip}(\psi)$ is finite. Syntactic implicants ignore the semantics of literals. So we have, e.g., $(X > 1) \notin \mathsf{sip}(X > 0 \land X > 1) = \{X > 0 \land X > 1\}$. It is easy to show $\psi \equiv\_{\mathcal{A}} \bigvee \mathsf{sip}(\psi)$, and thus $\longrightarrow\_{\mathcal{T}} = \longrightarrow\_{\mathsf{sip}(\mathcal{T})}$.

Since $\mathsf{sip}(\tau)$ is worst-case exponential in the size of $\mathsf{cond}(\tau)$, we do not compute it explicitly. Instead, ADCL constructs a run $\vec{\tau}$ step by step, and to perform a step with $\tau$, it searches for a model $\sigma$ of $\mathsf{cond}(\vec{\tau} :: \tau)$. If such a model exists, it appends $\tau|\_{\mathsf{sip}(\mathsf{cond}(\tau), \sigma)}$ to $\vec{\tau}$. This corresponds to a step with a conjunctive variant of $\tau$ whose condition is satisfied by $\sigma$. In other words, our calculus constructs $\mathsf{sip}(\mathsf{cond}(\tau), \sigma)$ "on the fly" when performing a step with $\tau$, where $\sigma \models\_{\mathcal{A}} \mathsf{cond}(\vec{\tau} :: \tau)$.
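A minimal Python sketch of this "on the fly" projection is given below. It is not the implementation used in the paper: the formula encoding (nested tuples in negation normal form, with literals given as a label plus a Python predicate over the model) and all names are assumptions made for illustration.

```python
def holds(formula, sigma):
    """Evaluate an NNF formula under the model sigma (a dict of variable values)."""
    kind = formula[0]
    if kind == "lit":
        return formula[2](sigma)
    if kind == "and":
        return all(holds(sub, sigma) for sub in formula[1:])
    if kind == "or":
        return any(holds(sub, sigma) for sub in formula[1:])
    raise ValueError(f"unknown connective: {kind}")


def literals(formula):
    """Yield all literals occurring in the formula."""
    if formula[0] == "lit":
        yield formula
    else:
        for sub in formula[1:]:
            yield from literals(sub)


def sip(formula, sigma):
    """Labels of the literals of `formula` satisfied by `sigma`, read as a conjunction."""
    if not holds(formula, sigma):
        raise ValueError("sigma is not a model of the formula")
    selected = []
    for _, label, predicate in literals(formula):
        if predicate(sigma) and label not in selected:
            selected.append(label)
    return selected


# Hypothetical disjunctive condition with two "phases".
phase1 = ("and",
          ("lit", "x < z", lambda s: s["x"] < s["z"]),
          ("lit", "x' = x + 1", lambda s: s["x'"] == s["x"] + 1))
phase2 = ("and",
          ("lit", "x >= z", lambda s: s["x"] >= s["z"]),
          ("lit", "y' = y + 1", lambda s: s["y'"] == s["y"] + 1))
condition = ("or", phase1, phase2)
```

Given a model such as `{"x": 0, "z": 5, "x'": 1, "y": 0, "y'": 0}`, `sip(condition, model)` keeps only the literals of the first phase. The projection is purely syntactic: it tests each literal against the model and never reasons about implications between literals.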

The core idea of ADCL is to learn new, *non-redundant* transitions via acceleration. Essentially, a transition is redundant if its transition relation is a subset of another transition's relation. Thus, redundant transitions are not useful for (dis-)proving safety.

**Definition 4 (Redundancy, [16, Def. 8]).** *A transition* $\tau$ *is* (strictly) redundant *w.r.t.* $\tau'$*, denoted* $\tau \sqsubseteq \tau'$ *(*$\tau \sqsubset \tau'$*) if* $\longrightarrow\_{\tau} \subseteq \longrightarrow\_{\tau'}$ *(*$\longrightarrow\_{\tau} \subset \longrightarrow\_{\tau'}$*). For a TS* $\mathcal{T}$*, we have* $\tau \sqsubseteq \mathcal{T}$ *(*$\tau \sqsubset \mathcal{T}$*) if* $\tau \sqsubseteq \tau'$ *(*$\tau \sqsubset \tau'$*) for some* $\tau' \in \mathcal{T}$*.*

In the sequel, we assume oracles for redundancy, satisfiability of QF(ΣA) formulas, and acceleration. In practice, we use incomplete techniques instead (see Sect. 5).

From now on, let $\mathcal{T}$ be the TS that is being analyzed with ADCL. A *state* of ADCL consists of a TS $\mathcal{S}$ that augments $\mathcal{T}$ with *learned transitions*, a run $\vec{\tau}$ of $\mathcal{S}$ called the *trace*, and a sequence of sets of *blocking transitions* $[B\_i]\_{i=0}^{k}$, where transitions that are redundant w.r.t. $B\_k$ must not be appended to the trace.

The following definition introduces the ADCL calculus. It extends the trace step by step (using the rule Step, which performs an evaluation step with a transition) and learns new transitions via acceleration (Accelerate) whenever a suffix of the trace is recursive. To avoid non-terminating ADCL-derivations, our notion of *redundancy* from Definition 4 is used to backtrack whenever a suffix of the trace corresponds to a special case of another (learned) transition (Covered). Moreover, Backtrack is used whenever a run cannot be continued. A more detailed explanation of ADCL is provided after Definition 5.

**Definition 5 (ADCL [16, Def. 9, 10]).** *A* state *is a triple* $(\mathcal{S}, [\tau\_i]\_{i=1}^{k}, [B\_i]\_{i=0}^{k})$ *where* $\mathcal{S} \supseteq \mathcal{T}$ *is a TS,* $\bigcup\_{i=0}^{k} B\_i \subseteq \mathsf{sip}(\mathcal{S})$*, and* $[\tau\_i]\_{i=1}^{k} \in \mathsf{sip}(\mathcal{S})^{\ast}$*. The transitions in* $\mathsf{sip}(\mathcal{T})$ *are called* original *and the transitions in* $\mathsf{sip}(\mathcal{S}) \setminus \mathsf{sip}(\mathcal{T})$ *are* learned*. A transition* $\tau\_{k+1} \sqsubseteq B\_k$ *is* blocked*, and* $\tau\_{k+1} \not\sqsubseteq B\_k$ *is* active *if* $\mathsf{chain}([\tau\_i]\_{i=1}^{k+1})$ *is an initial transition with satisfiable condition (i.e.,* $[\tau\_i]\_{i=1}^{k+1}$ *is a run). Let*

$$\mathsf{bt}(\mathcal{S}, [\tau\_i]\_{i=1}^k, [B\_0, \dots, B\_k]) := (\mathcal{S}, [\tau\_i]\_{i=1}^{k-1}, [B\_0, \dots, B\_{k-1} \cup \{\tau\_k\}])$$

*where* bt *abbreviates "backtrack". Our calculus is defined by the following rules.*

$$\begin{array}{c}
\dfrac{}{\mathcal{T} \leadsto (\mathcal{T}, [], [\varnothing])}\ \text{(Init)} \qquad \dfrac{\tau \in \mathsf{sip}(\mathcal{S}) \text{ is active}}{(\mathcal{S}, \vec{\tau}, \vec{B}) \leadsto (\mathcal{S}, \vec{\tau} :: \tau, \vec{B} :: \varnothing)}\ \text{(Step)} \\[2ex]
\dfrac{\vec{\tau}^{\circlearrowleft} \text{ is recursive} \qquad |\vec{\tau}^{\circlearrowleft}| = |\vec{B}^{\circlearrowleft}| \qquad \mathsf{accel}(\vec{\tau}^{\circlearrowleft}) = \tau \not\sqsubseteq \mathsf{sip}(\mathcal{S})}{(\mathcal{S}, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B} :: \vec{B}^{\circlearrowleft}) \leadsto (\mathcal{S} \cup \{\tau\}, \vec{\tau} :: \tau, \vec{B} :: \{\tau\})}\ \text{(Accelerate)} \\[2ex]
\dfrac{\vec{\tau}' \sqsubset \mathsf{sip}(\mathcal{S}) \quad \text{or} \quad \vec{\tau}' \sqsubseteq \mathsf{sip}(\mathcal{S}) \land |\vec{\tau}'| > 1}{s = (\mathcal{S}, \vec{\tau} :: \vec{\tau}', \vec{B}) \leadsto \mathsf{bt}(s)}\ \text{(Covered)} \\[2ex]
\dfrac{\text{all transitions from } \mathsf{sip}(\mathcal{S}) \text{ are inactive} \qquad \tau \text{ is safe}}{s = (\mathcal{S}, \vec{\tau} :: \tau, \vec{B}) \leadsto \mathsf{bt}(s)}\ \text{(Backtrack)} \\[2ex]
\dfrac{\vec{\tau} \text{ is unsafe}}{(\mathcal{S}, \vec{\tau}, \vec{B}) \leadsto \mathsf{unsafe}}\ \text{(Refute)} \qquad \dfrac{\text{all transitions from } \mathsf{sip}(\mathcal{S}) \text{ are inactive}}{(\mathcal{S}, [], [B]) \leadsto \mathsf{safe}}\ \text{(Prove)}
\end{array}$$

We write $\overset{\mathsf{I}}{\leadsto}, \overset{\mathsf{S}}{\leadsto}, \ldots$ to indicate that the rule Init, Step, ... was used. Step adds a transition to the trace. When the trace has a recursive suffix, Accelerate allows for learning a new transition which then replaces the recursive suffix on the trace, or we may backtrack via Covered if the recursive suffix is redundant. Note that Covered does not apply if $\vec{\tau}' \sqsubseteq \mathsf{sip}(\mathcal{S})$ and $|\vec{\tau}'| = 1$, as it could immediately undo every Step, otherwise. If no further Step is possible, Backtrack applies. Note that Backtrack and Covered block the last transition from the trace so that we do not perform the same Step again. If $\vec{\tau}$ is an unsafe run, Refute yields $\mathsf{unsafe}$, and if the entire search space has been exhausted without finding an unsafe run (i.e., if all initial transitions are blocked), Prove yields $\mathsf{safe}$.
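The effect of $\mathsf{bt}$ on a state can be made concrete with a small Python sketch. The state encoding below (a triple of a TS placeholder, a list of transition names, and a list of sets of blocked transition names) is an assumption made for illustration and is not taken from an actual implementation.

```python
def bt(state):
    """Backtrack: drop the last trace element and block it at the previous level."""
    system, trace, blocks = state
    if not trace:
        raise ValueError("cannot backtrack on an empty trace")
    if len(blocks) != len(trace) + 1:
        raise ValueError("expected one more blocking set than trace elements")
    *rest_trace, last = trace
    *rest_blocks, previous, _dropped = blocks
    # B_k is discarded and tau_k is added to B_{k-1}.
    return (system, rest_trace, rest_blocks + [previous | {last}])
```

Applying `bt` twice to a state with a two-element trace first blocks the second transition at level 1 and then blocks the first transition at level 0, after which the trace is empty. The blocking set of the dropped level is discarded, so transitions blocked only at that level become available again.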

The definition of ADCL in [16] is more liberal than ours: In our setting, Accelerate may only be applied if the learned transition is non-redundant, and our definition of "active transitions" enforces that the first transition on the trace is always an initial transition. In [16], these requirements are not enforced by the definition of ADCL, but by the definition of *reasonable strategies* [16, Def. 14]. For simplicity, we integrated these requirements into Definition 5. Additionally, Covered should be preferred over Accelerate, and Accelerate should be preferred over Step.

*Example 6.* We apply ADCL to a version of Example 1 with the additional transition

$$\ell\_1 \to \mathsf{err}\left[x = y \land x > 2 \cdot z \land x' = x \land y' = y \land z' = z\right] \tag{$\tau\_{\mathsf{err}}$}$$

$$\begin{array}{rll}
\mathcal{T} & \overset{\mathsf{I}}{\leadsto} (\mathcal{T}, [], [\varnothing]) & \\
& \overset{\mathsf{S}}{\leadsto}{}^{2}\ (\mathcal{T}, [\tau\_{\mathsf{i}}, \tau\_{\ell\_1}|\_{\psi\_{x < z}}], [\varnothing, \varnothing, \varnothing]) & (x \leq 1 \land z \geq 5k \land y \leq z) \\
& \overset{\mathsf{A}}{\leadsto} (\mathcal{S}\_1, [\tau\_{\mathsf{i}}, \tau^{+}\_{x < z}], [\varnothing, \varnothing, \{\tau^{+}\_{x < z}\}]) & (x \leq z \land z \geq 5k \land y \leq z) \\
& \overset{\mathsf{S}}{\leadsto} (\mathcal{S}\_1, [\tau\_{\mathsf{i}}, \tau^{+}\_{x < z}, \tau\_{\ell\_1}|\_{\psi\_{x \geq z}}], [\varnothing, \varnothing, \{\tau^{+}\_{x < z}\}, \varnothing]) & (x = z + 1 \land z \geq 5k \land y \leq z + 1) \\
& \overset{\mathsf{A}}{\leadsto} (\mathcal{S}\_2, [\tau\_{\mathsf{i}}, \tau^{+}\_{x < z}, \tau^{+}\_{x \geq z}], [\varnothing, \varnothing, \{\tau^{+}\_{x < z}\}, \{\tau^{+}\_{x \geq z}\}]) & (x \geq y \land x > z \geq 5k \land y \leq 2 \cdot z + 1) \\
& \overset{\mathsf{S}}{\leadsto} (\mathcal{S}\_2, [\tau\_{\mathsf{i}}, \tau^{+}\_{x < z}, \tau^{+}\_{x \geq z}, \tau\_{\mathsf{err}}], [\varnothing, \varnothing, \{\tau^{+}\_{x < z}\}, \{\tau^{+}\_{x \geq z}\}, \varnothing]) & (x = 2 \cdot z + 1 = y \land z \geq 5k) \\
& \overset{\mathsf{R}}{\leadsto} \mathsf{unsafe} &
\end{array}$$

Here, $5k$ abbreviates 5000, $\psi\_{x < z}$ and $\psi\_{x \geq z}$ denote the 1st and 2nd disjunct of $\mathsf{cond}(\tau\_{\ell\_1})$, and:

$$\begin{aligned} \tau^{+}\_{x < z} &:= \ell\_{1} \to \ell\_{1} \left[ n > 0 \land x' = x + n \land x + n \leq z \land y' = y \land z' = z \right] \\ \tau^{+}\_{x \geq z} &:= \ell\_{1} \to \ell\_{1} \left[ y + n - 1 \leq 2 \cdot z \land n > 0 \land x' = x + n \land x \geq z \land y' = y + n \land z' = z \right] \\ \mathcal{S}\_{1} &:= \mathcal{T} \cup \{ \tau^{+}\_{x < z} \} \\ \mathcal{S}\_{2} &:= \mathcal{S}\_{1} \cup \{ \tau^{+}\_{x \geq z} \} \end{aligned}$$

On the right, we show formulas describing the configurations that are reachable with the current trace. Every $\leadsto$-derivation starts with Init. The first two Steps add the initial transition $\tau\_{\mathsf{i}}$ and an element of $\mathsf{sip}(\tau\_{\ell\_1})$ to the trace. Since $x < z$ holds after applying $\tau\_{\mathsf{i}}$, the only possible choice for the latter is $\tau\_{\ell\_1}|\_{\psi\_{x < z}}$.

As $\tau\_{\ell\_1}|\_{\psi\_{x < z}}$ is recursive, it is accelerated and replaced with $\mathsf{accel}(\tau\_{\ell\_1}|\_{\psi\_{x < z}}) = \tau^{+}\_{x < z}$, which simulates $n$ steps with $\tau\_{\ell\_1}|\_{\psi\_{x < z}}$. Moreover, $\tau^{+}\_{x < z}$ is also added to the current set of blocking transitions, as we always have $\longrightarrow^{2}\_{\tau} \subseteq \longrightarrow\_{\tau}$ for learned transitions $\tau$ and thus adding them to the trace twice in a row is pointless.

Next, $\tau\_{\ell\_1}$ is applicable again. As neither $x < z$ nor $x \geq z$ holds for all reachable configurations, we could continue with any element of $\mathsf{sip}(\tau\_{\ell\_1}) = \{\tau\_{\ell\_1}|\_{\psi\_{x < z}}, \tau\_{\ell\_1}|\_{\psi\_{x \geq z}}\}$. We choose $\tau\_{\ell\_1}|\_{\psi\_{x \geq z}}$, so that the recursive transition $\tau\_{\ell\_1}|\_{\psi\_{x \geq z}}$ can be accelerated to $\tau^{+}\_{x \geq z}$. Then $\tau\_{\mathsf{err}}$ applies, and the proof is finished via Refute.

For our purposes, the most important property of ADCL is the following.

**Theorem 7.** *If* $\mathcal{T} \leadsto^{\ast} (\mathcal{S}, \vec{\tau}, \vec{B})$ *and* $\vec{\tau}$ *is non-empty, then* $\mathsf{cond}(\vec{\tau}) \not\equiv\_{\mathcal{A}} \bot$ *and* $\longrightarrow\_{\vec{\tau}} \subseteq \longrightarrow^{+}\_{\mathcal{T}}$*. So if* $\mathcal{T} \leadsto^{\ast} \mathsf{unsafe}$*, then* $\mathcal{T}$ *is unsafe.*

The other properties of ADCL that were shown in [16] immediately carry over to our setting, too: if $\mathcal{T} \leadsto^{\ast} \mathsf{safe}$, then $\mathcal{T}$ is safe; if $\mathcal{T}$ is unsafe, then $\mathcal{T} \leadsto^{\ast} \mathsf{unsafe}$; in general, $\leadsto$ does not terminate. The proofs are analogous to [16].

### **4 Proving Non-Termination with ADCL-NT**

From now on, we assume that the analyzed TS T does not contain unsafe transitions. To prove non-termination, we look for a corresponding *certificate*.

**Definition 8 (Certificate of Non-Termination).** *Let* $\tau = \ell \to \ell\,[\ldots]$*. A satisfiable formula* $\psi$ certifies non-termination of $\tau$*, written* $\psi \models^{\infty}\_{\mathcal{A}} \tau$*, if for any model* $\sigma$ *of* $\psi$*, there is an infinite sequence* $\ell(\sigma(\vec{x})) = s\_1 \longrightarrow\_{\tau} s\_2 \longrightarrow\_{\tau} \ldots$

There exist many techniques for finding certificates of non-termination automatically, see Sect. 5. However, Definition 8 has several shortcomings. First, the problem of finding such certificates becomes very challenging if $\mathsf{cond}(\tau)$ contains disjunctions. Second, it is insufficient to consider a single transition when only non-singleton sequences $\vec{\tau}$ such that $\mathsf{chain}(\vec{\tau})$ is recursive admit non-terminating runs. Third, just finding a certificate $\psi$ of non-termination for some $\vec{\tau} \in \mathcal{T}^{\ast}$ does not suffice for proving non-termination of $\mathcal{T}$. Additionally, a proof that the pre-image of $\longrightarrow\_{\vec{\tau}|\_{\psi}}$ is reachable from an initial configuration is required. All of these problems can be solved by integrating the search for certificates of non-termination into the ADCL calculus.

**Definition 9 (ADCL-NT).** *To prove non-termination, we extend ADCL with the rule* Nonterm *and modify* Covered *as shown below. We write* $\leadsto\_{\mathsf{nt}}$ *for the relation defined by the (modified) rules from Definition 5 and* Nonterm*.*

$$\begin{array}{c}
\dfrac{\vec{\tau}^{\circlearrowleft} \text{ is recursive} \qquad \vec{\tau}^{\circlearrowleft} \sqsubset \mathsf{sip}(\mathcal{S}) \ \text{ or } \ \vec{\tau}^{\circlearrowleft} \sqsubseteq \mathsf{sip}(\mathcal{S}) \land |\vec{\tau}^{\circlearrowleft}| > 1}{s = (\mathcal{S}, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B}) \leadsto\_{\mathsf{nt}} \mathsf{bt}(s)}\ \text{(Covered)} \\[2ex]
\dfrac{\mathsf{chain}(\vec{\tau}^{\circlearrowleft}) = \ell \to \ell\,[\ldots] \qquad \psi \models^{\infty}\_{\mathcal{A}} \vec{\tau}^{\circlearrowleft} \qquad \tau = \ell \to \mathsf{err}\,[\psi] \not\sqsubseteq \mathsf{sip}(\mathcal{S})}{(\mathcal{S}, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B}) \leadsto\_{\mathsf{nt}} (\mathcal{S} \cup \{\tau\}, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B})}\ \text{(Nonterm)}
\end{array}$$

So the idea of Nonterm is to apply a technique which searches for a certificate of non-termination to a recursive suffix of the trace. Apart from introducing Nonterm, we restricted Covered to recursive suffixes. The reason is that backtracking when the trace has a redundant, non-recursive suffix may prevent us from analyzing loops, resulting in a precision issue.

*Example 10.* Let $\mathcal{T} := \{\tau\_{\mathsf{i}}, \tau'\_{\mathsf{i}}, \tau\_{\ell}, \tau\_{\ell'}\}$ where

$$
\tau\_{\mathsf{i}} := \mathsf{init} \to \ell \left[ \top \right] \quad \tau\_{\mathsf{i}}' := \mathsf{init} \to \ell' \left[ \top \right] \quad \tau\_{\ell} := \ell \to \ell' \left[ \top \right] \quad \tau\_{\ell'} := \ell' \to \ell \left[ \top \right].
$$

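and $\top$ means *true*. Due to the loop $\ell \longrightarrow\_{\tau\_{\ell}} \ell' \longrightarrow\_{\tau\_{\ell'}} \ell$, $\mathcal{T}$ is clearly non-terminating. Without requiring that $\vec{\tau}^{\circlearrowleft}$ is recursive in Covered, $\mathcal{T}$ can be analyzed as follows: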

$$\begin{split} \mathcal{T} & \overset{\mathsf{I}}{\leadsto}\_{\mathsf{nt}} (\mathcal{T}, [], [\varnothing]) \overset{\mathsf{S}}{\leadsto}{}\_{\mathsf{nt}}^{2} (\mathcal{T}, [\tau\_{\mathsf{i}}, \tau\_{\ell}], [\varnothing, \varnothing, \varnothing]) \overset{\mathsf{C}}{\leadsto}\_{\mathsf{nt}} (\mathcal{T}, [\tau\_{\mathsf{i}}], [\varnothing, \{\tau\_{\ell}\}]) \overset{\mathsf{B}}{\leadsto}\_{\mathsf{nt}} (\mathcal{T}, [], [\{\tau\_{\mathsf{i}}\}]) \\ & \overset{\mathsf{S}}{\leadsto}{}\_{\mathsf{nt}}^{2} (\mathcal{T}, [\tau'\_{\mathsf{i}}, \tau\_{\ell'}], [\{\tau\_{\mathsf{i}}\}, \varnothing, \varnothing]) \overset{\mathsf{C}}{\leadsto}\_{\mathsf{nt}} (\mathcal{T}, [\tau'\_{\mathsf{i}}], [\{\tau\_{\mathsf{i}}\}, \{\tau\_{\ell'}\}]) \overset{\mathsf{B}}{\leadsto}\_{\mathsf{nt}} (\mathcal{T}, [], [\{\tau\_{\mathsf{i}}, \tau'\_{\mathsf{i}}\}]) \overset{\mathsf{P}}{\leadsto}\_{\mathsf{nt}} \mathsf{safe} \end{split}$$

The 1st application of Covered is possible as $[\tau\_{\mathsf{i}}, \tau\_{\ell}] \sqsubseteq \tau'\_{\mathsf{i}}$ and the 2nd application of Covered is possible as $[\tau'\_{\mathsf{i}}, \tau\_{\ell'}] \sqsubseteq \tau\_{\mathsf{i}}$. Note that the trace never contains both $\tau\_{\ell}$ and $\tau\_{\ell'}$, but both transitions are needed to prove non-termination.

Recall the shortcomings of Definition 8 mentioned above. First, due to the use of syntactic implicants, ADCL-NT reduces reasoning about arbitrary transitions to reasoning about conjunctive transitions. Second, as Nonterm considers a suffix $\vec{\tau}^{\circlearrowleft}$ of the trace, it can prove non-termination of sequences of transitions. Third, ADCL's capability to prove reachability directly carries over to our goal of proving non-termination. So in contrast to most other approaches (see Sect. 5), ADCL-NT does not have to resort to other tools or techniques for proving reachability.

We only search for a certificate of non-termination for $\vec{\tau}^{\circlearrowleft}$ if ADCL-NT established reachability of the pre-image of $\longrightarrow\_{\vec{\tau}^{\circlearrowleft}}$ beforehand. Note, however, that this does not imply reachability of the pre-image of $\longrightarrow\_{\ell \to \mathsf{err}\,[\psi]}$, as $\psi$ entails $\mathsf{cond}(\vec{\tau}^{\circlearrowleft})$, but not the other way around. Hence, we cannot directly derive non-termination of $\mathcal{T}$ when Nonterm applies. Regarding the strategy for $\leadsto\_{\mathsf{nt}}$, one should try to use Nonterm once for each recursive suffix of the trace.

*Example 11.* Reconsider Example 1. Up to (excluding) the second-last step, the derivation from Example 6 remains unchanged. Then we get

$$\begin{array}{rll}
& (\mathcal{S}\_2, [\tau\_{\mathsf{i}}, \tau^{+}\_{x < z}, \tau^{+}\_{x \geq z}], [\ldots]) & (x \geq y \land x > 5k) \\
\overset{\mathsf{S}}{\leadsto}{}\_{\mathsf{nt}}^{4} & (\mathcal{S}\_2, [\tau\_{\mathsf{i}}, \tau^{+}\_{x < z}, \tau^{+}\_{x \geq z}, \tau\_{\ell\_1 \to \ell\_2}, \tau^{=}\_{\ell\_2}, \tau^{\neq}\_{\ell\_2}|\_{\psi\_{x > y}}, \tau^{\neq}\_{\ell\_2}|\_{\psi\_{x < y}}], [\ldots]) & (1 \equiv\_2 y = x > 10k) \\
\overset{\mathsf{N}}{\leadsto}\_{\mathsf{nt}} & (\mathcal{S}\_3, [\tau\_{\mathsf{i}}, \tau^{+}\_{x < z}, \tau^{+}\_{x \geq z}, \tau\_{\ell\_1 \to \ell\_2}, \tau^{=}\_{\ell\_2}, \tau^{\neq}\_{\ell\_2}|\_{\psi\_{x > y}}, \tau^{\neq}\_{\ell\_2}|\_{\psi\_{x < y}}], [\ldots]) & (1 \equiv\_2 y = x > 10k) \\
\overset{\mathsf{S}}{\leadsto}\_{\mathsf{nt}} & (\mathcal{S}\_3, [\tau\_{\mathsf{i}}, \tau^{+}\_{x < z}, \tau^{+}\_{x \geq z}, \tau\_{\ell\_1 \to \ell\_2}, \tau^{=}\_{\ell\_2}, \tau^{\neq}\_{\ell\_2}|\_{\psi\_{x > y}}, \tau^{\neq}\_{\ell\_2}|\_{\psi\_{x < y}}, \tau\_{\mathsf{err}}], [\ldots]) & \\
\overset{\mathsf{R}}{\leadsto}\_{\mathsf{nt}} & \mathsf{unsafe} &
\end{array}$$

where

$$\begin{aligned} \psi\_{x > y} &:= x > 0 \land y > 0 \land x' = y \land x > y \land y' = x & \tau\_{\mathsf{err}} &:= \ell\_2 \to \mathsf{err}\left[x = y > 1\right] \\ \psi\_{x < y} &:= x > 0 \land y > 0 \land x' = y \land x < y \land y' = y & \mathcal{S}\_3 &:= \mathcal{S}\_2 \cup \{\tau\_{\mathsf{err}}\} \end{aligned}$$

The formulas on the right describe the values of $x$ and $y$ that are reachable with the current trace, where $1 \equiv\_2 y$ means that $y$ is odd. After the first Step with $\tau\_{\ell\_1 \to \ell\_2}$, just $\tau^{=}\_{\ell\_2}$ can be used, as $\mathsf{cond}(\tau\_{\ell\_1 \to \ell\_2})$ implies $x' = y'$. While $\tau^{=}\_{\ell\_2}$ is recursive, Accelerate cannot be applied next, as $\longrightarrow\_{\tau^{=}\_{\ell\_2}} = \longrightarrow^{+}\_{\tau^{=}\_{\ell\_2}}$, so the learned transition would be redundant. Thus, we continue with $\tau^{\neq}\_{\ell\_2}$, projected to $x > y$ (as $\mathsf{cond}(\tau^{=}\_{\ell\_2})$ implies $x' = y' + 1$). Again, all transitions that could be learned are redundant, so Accelerate does not apply. We next use $\tau^{\neq}\_{\ell\_2}$ projected to $x < y$, as the previous Step swapped $x$ and $y$. As the suffix $[\tau^{=}\_{\ell\_2}, \tau^{\neq}\_{\ell\_2}|\_{\psi\_{x > y}}, \tau^{\neq}\_{\ell\_2}|\_{\psi\_{x < y}}]$ of the trace does not terminate (see Example 1), Nonterm applies. So we learn the transition $\tau\_{\mathsf{err}}$, which is then added to the trace to finish the proof.

**Theorem 12.** *If* $\mathcal{T} \leadsto^{\ast}\_{\mathsf{nt}} \mathsf{unsafe}$*, then* $\mathcal{T}$ *does not terminate.*

While Theorem 12 establishes the soundness of our approach, we now investigate completeness. In contrast to ADCL for safety (Sect. 3), ADCL-NT is not refutationally complete, but the proof is non-trivial. So in the following, we show that there are non-terminating TSs $\mathcal{T}$ where $\mathcal{T} \not\leadsto^{\ast}\_{\mathsf{nt}} \mathsf{unsafe}$. To prove incompleteness, we adapt the construction from the proof that ADCL does not terminate [16, Thm. 18]. There, states $(\mathcal{S}, \vec{\tau}, \vec{B})$ were extended by a component $\mathcal{L}$ that maps every element of $\mathsf{sip}(\mathcal{S})$ to a regular language over $\mathsf{sip}(\mathcal{T})$. However, the proof of [16, Thm. 18] just required reasoning about finite (prefixes of infinite) runs, but we have to reason about infinite runs. So in our setting $\mathcal{L}$ maps each element $\tau$ of $\mathsf{sip}(\mathcal{S})$ to a regular or an $\omega$-regular language over $\mathsf{sip}(\mathcal{T})$, i.e., $\mathcal{L}(\tau) \subseteq \mathsf{sip}(\mathcal{T})^{\ast}$ or $\mathcal{L}(\tau) \subseteq \mathsf{sip}(\mathcal{T})^{\omega}$. We lift $\mathcal{L}$ from $\mathsf{sip}(\mathcal{S})$ to sequences of transitions as follows.

$$\mathcal{L}(\varepsilon) := \varepsilon \qquad \qquad \mathcal{L}(\vec{\tau} :: \tau) := \mathcal{L}(\vec{\tau}) :: \mathcal{L}(\tau) \quad \text{if} \quad \mathcal{L}(\tau) \subseteq \mathsf{sip}(\mathcal{T})^{\ast}$$

Here, "::" denotes language concatenation (i.e., $\mathcal{L}\_1 :: \mathcal{L}\_2 = \{\vec{\tau}\_1 :: \vec{\tau}\_2 \mid \vec{\tau}\_1 \in \mathcal{L}\_1, \vec{\tau}\_2 \in \mathcal{L}\_2\}$) and we only consider sequences where $\mathcal{L}(\tau)$ is regular (not $\omega$-regular) to ensure that $\mathcal{L}$ is well defined. So while we lift other notations to sequences of transitions via chaining, $\mathcal{L}(\vec{\tau})$ does *not* stand for $\mathcal{L}(\mathsf{chain}(\vec{\tau}))$.

**Definition 13 (ADCL-NT with Regular Languages).** *We extend states by a fourth component* $\mathcal{L}$*, and adapt* Init*,* Accelerate*, and* Nonterm *as follows:*

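$$\begin{array}{c}
\dfrac{\mathcal{L}(\tau) = \{\tau\} \text{ for all } \tau \in \mathsf{sip}(\mathcal{T})}{\mathcal{T} \leadsto\_{\mathsf{nt}} (\mathcal{T}, [], [\varnothing], \mathcal{L})}\ \text{(Init)} \\[2ex]
\dfrac{\vec{\tau}^{\circlearrowleft} \text{ is recursive} \qquad |\vec{\tau}^{\circlearrowleft}| = |\vec{B}^{\circlearrowleft}| \qquad \mathsf{accel}(\vec{\tau}^{\circlearrowleft}) = \tau \not\sqsubseteq \mathsf{sip}(\mathcal{S})}{(\mathcal{S}, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B} :: \vec{B}^{\circlearrowleft}, \mathcal{L}) \leadsto\_{\mathsf{nt}} (\mathcal{S} \cup \{\tau\}, \vec{\tau} :: \tau, \vec{B} :: \{\tau\}, \mathcal{L} \uplus (\tau \mapsto \mathcal{L}(\vec{\tau}^{\circlearrowleft})^{+}))}\ \text{(Accelerate)} \\[2ex]
\dfrac{\mathsf{chain}(\vec{\tau}^{\circlearrowleft}) = \ell \to \ell\,[\ldots] \qquad \psi \models^{\infty}\_{\mathcal{A}} \vec{\tau}^{\circlearrowleft} \qquad \tau = \ell \to \mathsf{err}\,[\psi] \not\sqsubseteq \mathsf{sip}(\mathcal{S})}{(\mathcal{S}, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B}, \mathcal{L}) \leadsto\_{\mathsf{nt}} (\mathcal{S} \cup \{\tau\}, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B}, \mathcal{L} \uplus (\tau \mapsto \mathcal{L}(\vec{\tau}^{\circlearrowleft})^{\omega}))}\ \text{(Nonterm)}
\end{array}$$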

*All other rules from Definition 5 leave the last component of the state unchanged.*

Here, $\mathcal{L}(\pi)^{+} := \bigcup\_{n \in \mathbb{N}\_{\geq 1}} \mathcal{L}(\pi)^{n}$, and $\mathcal{L}(\pi)^{\omega}$ is the $\omega$-regular language consisting of all words that result from concatenating infinitely many elements of $\mathcal{L}(\pi) \setminus \{\varepsilon\}$.

In Accelerate and Nonterm, $\mathsf{chain}(\vec{\tau}^{\circlearrowleft})$ is recursive. Thus, $\vec{\tau}^{\circlearrowleft}$ does not contain unsafe transitions. Hence, $\mathcal{L}(\vec{\tau}^{\circlearrowleft})$ and thus also $\mathcal{L}(\vec{\tau}^{\circlearrowleft})^{+}$ are well defined and regular, and $\mathcal{L}(\vec{\tau}^{\circlearrowleft})^{\omega}$ is $\omega$-regular. Moreover, the use of "$\uplus$" is justified by the condition $\tau \not\sqsubseteq \mathsf{sip}(\mathcal{S})$. The next lemma states two crucial properties about $\mathcal{L}$.

**Lemma 14.** *Assume* $\mathcal{T} \leadsto^{*}_{nt} (S, \vec\tau, \vec{B}, \mathcal{L})$ *and let* $\tau = (\ell \to \ell'\ [\psi]) \in \mathsf{sip}(S)$*.*


Based on this lemma, we can prove that our extension of $\leadsto_{nt}$ from Definition 13 is not refutationally complete. Then refutational incompleteness of ADCL-NT as introduced in Definition 9 follows immediately. The reason is that $\mathcal{L}$ is only used in the premise of Init in Definition 13, but there the requirement "$\mathcal{L}(\tau) = \{\tau\}$ for all $\tau \in \mathsf{sip}(\mathcal{T})$" is trivially satisfiable by choosing $\mathcal{L}$ accordingly.

**Theorem 15.** *There is a non-terminating TS* $\mathcal{T}$ *such that* $\mathcal{T} \not\leadsto^{*}_{nt} \mathsf{unsafe}$*.*

*Proof (Sketch).* As in the proof of [16, Thm. 18], for any (original or learned) transition $\tau$ such that $\mathcal{L}(\tau)$ is regular, $\mathcal{L}(\tau)$ contains at most one square-free word (i.e., a word without a non-empty infix $w :: w$). Thus, if $\mathcal{L}(\tau)$ is $\omega$-regular, then $\mathcal{L}(\tau)$ does not contain an infinite square-free word. Moreover, as in the proof of [16, Thm. 18], one can construct a TS $\mathcal{T}$ that admits a single infinite run $\vec\tau$, and this infinite run is square-free. Thus, there is no transition $\tau$ such that $\mathcal{L}(\tau)$ contains a suffix of $\vec\tau$, i.e., no $\leadsto_{nt}$-derivation starting with $\mathcal{T}$ corresponds to $\vec\tau$. Hence, by Lemma 14, assuming $\mathcal{T} \leadsto^{*}_{nt} \mathsf{unsafe}$ results in a contradiction.
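The notion of square-freeness used in this argument can be checked mechanically on finite words. The following Python sketch is only an illustration (the function name `is_square_free` and the transition names are hypothetical); it shows why iterating a word, as acceleration does, immediately produces squares.

```python
def is_square_free(word):
    """Return True if `word` has no non-empty infix of the form w :: w."""
    n = len(word)
    for start in range(n):
        # `half` is the length of the candidate w; w :: w must fit into the word.
        for half in range(1, (n - start) // 2 + 1):
            first = word[start:start + half]
            second = word[start + half:start + 2 * half]
            if first == second:
                return False
    return True


# Words over hypothetical transition names: a single pass through [t0, t1]
# is square-free, but every further iteration of the same loop is not.
once = ("t0", "t1")
twice = once + once
```

For instance, `is_square_free(once)` holds, whereas `is_square_free(twice)` fails because `twice` itself has the form $w :: w$.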

Since ADCL can prove unsafety as well as safety, it is natural to ask if there is a dual to ADCL-NT that can prove termination. The most obvious approach would be the following: Whenever the trace has a recursive suffix $\vec\tau^{\circlearrowleft}$, then termination of $\vec\tau^{\circlearrowleft}$ needs to be proven before the next $\leadsto$-step. The following example shows that this is not enough to ensure that $\mathcal{T} \leadsto^{+}_{nt} \mathsf{safe}$ implies termination of $\mathcal{T}$.

*Example 16.* Let $\mathcal{T} := \{\tau_{\mathsf{i}} = \mathsf{init} \to \ell\ [\psi_{\mathsf{i}}]\} \cup \{\tau_m = \ell \to \ell\ [\psi_m] \mid 0 \leq m \leq 2\}$ and

$$
\psi_{\mathsf{i}} := x' = 0 \quad \psi_0 := x = 0 \land x' = 1 \quad \psi_1 := x = 1 \land x' = 2 \quad \psi_2 := x = 2 \land x' = 1.
$$

As we have $\ell(1) \to_{\tau_1} \ell(2) \to_{\tau_2} \ell(1)$, $\mathcal{T}$ is clearly non-terminating. We get:

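$$
\begin{aligned}
\mathcal{T} &\leadsto^{\text{I}}_{nt} (\mathcal{T}, [], [\emptyset]) \\
&(\leadsto^{\text{S}}_{nt})^{3}\ (\mathcal{T}, [\tau_{\mathsf{i}}, \tau_0, \tau_1], \emptyset^4) \\
&\leadsto^{\text{A}}_{nt} (S_1, [\tau_{\mathsf{i}}, \tau_{01}], \emptyset^2 :: \{\tau_{01}\}) \\
&\leadsto^{\text{S}}_{nt} (S_1, [\tau_{\mathsf{i}}, \tau_{01}, \tau_2], \emptyset^2 :: \{\tau_{01}\} :: \emptyset) \\
&\leadsto^{\text{A}}_{nt} (S_2, [\tau_{\mathsf{i}}, \tau_{012}], \emptyset^2 :: \{\tau_{01}, \tau_{012}\}) \\
&\leadsto^{\text{S}}_{nt} (S_2, [\tau_{\mathsf{i}}, \tau_{012}, \tau_1], \emptyset^2 :: \{\tau_{01}, \tau_{012}\} :: \emptyset) \\
&\leadsto^{\text{C}}_{nt} (S_2, [\tau_{\mathsf{i}}, \tau_{012}], \emptyset^2 :: \{\tau_{01}, \tau_{012}, \tau_1\}) \\
&\leadsto^{\text{B}}_{nt} (S_2, [\tau_{\mathsf{i}}], \emptyset :: \{\tau_{012}\}) \\
&\leadsto^{*}_{nt} (S_2, [\tau_{\mathsf{i}}], \emptyset :: \{\tau_{012}, \tau_0, \tau_{01}\}) \\
&\leadsto^{\text{B}}_{nt} (S_2, [], [\{\tau_{\mathsf{i}}\}]) \\
&\leadsto^{\text{P}}_{nt} \mathsf{safe}
\end{aligned}
$$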

After three Steps, we accelerate the recursive suffix $[\tau_0, \tau_1]$ of the trace, resulting in $\tau_{01} = \ell \to \ell\ [x = 0 \land x' = 2]$ and $S_1 = \mathcal{T} \cup \{\tau_{01}\}$. After one more Step, $[\tau_{01}, \tau_2]$ is accelerated to $\tau_{012} = \ell \to \ell\ [x = 0 \land x' = 1]$ and we get $S_2 = S_1 \cup \{\tau_{012}\}$. After the next Step, $[\tau_{012}, \tau_1]$ is redundant w.r.t. $\tau_{01}$, so Covered applies. Then we Backtrack, as no other transitions are active. The next Steps also yield states that allow for backtracking (as their traces have the redundant suffixes $[\tau_0, \tau_1]$ and $[\tau_{01}, \tau_2]$), so we can finally apply Backtrack again and finish with Prove.

Note that whenever the trace has a recursive suffix, then it leads from $\ell(i)$ to $\ell(j)$ where $i \neq j$, i.e., each such suffix is trivially terminating. In particular, the cycle $\ell(1) \to_{\tau_1} \ell(2) \to_{\tau_2} \ell(1)$ is not apparent in any of the states.
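The observations of Example 16 can be replayed with a small Python sketch. It is a simplification under stated assumptions: only the location $\ell$ is modelled, and each condition $\psi_m$ is represented as a partial function from the value of $x$ to the value of $x'$, which is possible here because every $\psi_m$ fixes both values. The names `TAU`, `chain`, and `run` are hypothetical.

```python
# Transitions tau_0, tau_1, tau_2 of Example 16 as partial functions x -> x'.
TAU = {"t0": {0: 1}, "t1": {1: 2}, "t2": {2: 1}}


def chain(first, second):
    """Chaining: take `first`, then `second`, as a partial function."""
    return {x: second[y] for x, y in first.items() if y in second}


t01 = chain(TAU["t0"], TAU["t1"])  # corresponds to x = 0 and x' = 2
t012 = chain(t01, TAU["t2"])       # corresponds to x = 0 and x' = 1


def run(x, steps):
    """Follow the original transitions (deterministic here) for `steps` steps."""
    trace = [x]
    for _ in range(steps):
        successors = [t[x] for t in TAU.values() if x in t]
        if not successors:
            break
        x = successors[0]
        trace.append(x)
    return trace
```

Chaining `t01` or `t012` with itself yields the empty function, so both learned transitions are trivially terminating, whereas `chain(TAU["t1"], TAU["t2"])` maps 1 to 1 and thus witnesses the cycle that never shows up as a recursive suffix of the trace.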

This example reveals a fundamental problem when adapting ADCL for proving termination: ADCL ensures that all reachable *configurations* are covered, which is crucial for proving safety, but there are no such guarantees for all *runs*. Therefore, we think that adapting ADCL for proving termination requires major changes.

### **5 Related Work and Experiments**

We presented ADCL-NT, a variant of ADCL for proving non-termination. The key insight is that tightly integrating techniques to detect non-terminating transitions into ADCL allows for handling classes of TSs that are challenging for other techniques. In particular, ADCL-NT can find non-terminating executions involving disjunctive transitions or complex patterns of transitions. Moreover, it tightly couples the search for non-terminating configurations and the proof of their reachability, whereas other approaches usually separate these two steps.

**Related Work.** There are many techniques to find certificates of non-termination [2,14,15,22,23,25]. We could use any of them (they are black boxes for ADCL-NT).

Most non-termination techniques for TSs first search for non-terminating configurations, and then prove their reachability [5,6,9,22], or they extract and analyze *lassos* [23]. In contrast, ADCL-NT tightly integrates the search for non-terminating configurations and reachability analysis.

Earlier versions of our tool LoAT [12,15] also interleaved both steps using a technique akin to the state elimination method to transform finite automata to regular expressions. This technique cannot handle disjunctions, and it is incomplete for reachability. Hence, LoAT is now solely based on ADCL-NT.

**Implementation.** So far, our implementation in our tool LoAT is restricted to integer arithmetic. It uses the technique from [15] for acceleration and finding certificates of non-termination, the SMT solvers Z3 [26] and Yices [11], the recurrence solver PURRS [1], and libFAUDES [24] to implement the automata-based redundancy check from [16].

**Experiments.** To evaluate our implementation in LoAT, we used the 1222 *Integer Transition Systems* (ITSs) and the 335 C *Integer Programs* from the *Termination Problems Database* [28] used in *TermComp* [21]. The C programs are small, hand-crafted examples that often require complex proofs. The ITSs are significantly larger, as they were obtained from automatic transformations of C or Java programs. Moreover, they contain a lot of "noise", e.g., branches where termination is trivial or variables that are irrelevant for (non-)termination. Thus, they are well suited to test the scalability and robustness of the tools.

We compared our implementation (LoAT ADCL) with other leading termination analyzers: iRankFinder [2,9], T2 [6], Ultimate [8], VeryMax [3,22], and the previous version of LoAT [15] (LoAT '22). For T2, VeryMax, and Ultimate, we took the versions of their last *TermComp* participations (2015, 2019, and 2022). For iRankFinder, we used the configuration from the evaluation of [15], which is tailored towards proving non-termination. We excluded AProVE [20], as it cannot prove non-termination of ITSs, and it uses LoAT and T2 as backends when analyzing C programs. Moreover, we excluded Ultimate from the evaluation on ITSs, as it cannot parse them. All experiments were run on StarExec [27] with 300 s wallclock timeout, 1200 s CPU timeout, and 128 GB memory limit per example.


The table above shows the results for ITSs, where the column "unique" contains the number of examples that could be solved by the respective tool, but no others. It shows that LoAT ADCL is the most powerful tool for proving non-termination of ITSs. The main reasons for the improvement are that LoAT ADCL builds upon a complete technique for proving reachability (in contrast to, e.g., LoAT '22), and the close integration of non-termination techniques into a technique for proving reachability, whereas most competing tools separate these steps from each other.

If we only consider the examples where non-termination is proven, LoAT ADCL is also the fastest tool. If we consider all examples, then the *average* runtime of LoAT ADCL is significantly higher. This is not surprising, as ADCL-NT does not terminate in general. So while it is very fast in most cases (as witnessed by the very low *median* runtime), it times out more often than the other tools.

For C integer programs, the best tools are very close (VeryMax: 103×No, LoAT ADCL: 102×No, Ultimate: 100×No). Regarding runtimes, the situation is analogous to ITSs. See [18] for detailed results, more information about our evaluation, and a pre-compiled binary. LoAT is open-source and available on GitHub [17].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **COOL 2 – A Generic Reasoner for Modal Fixpoint Logics (System Description)**

Oliver Görlitz<sup>1</sup>, Daniel Hausmann<sup>2</sup>, Merlin Humml<sup>1(✉)</sup>, Dirk Pattinson<sup>3</sup>, Simon Prucker<sup>1</sup>, and Lutz Schröder<sup>1</sup>

<sup>1</sup> Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany merlin.humml@fau.de

<sup>2</sup> Gothenburg University, Gothenburg, Sweden

<sup>3</sup> Australian National University, Canberra, Australia

**Abstract.** There is a wide range of modal logics whose semantics goes beyond relational structures, and instead involves, e.g., probabilities, multi-player games, weights, or neighbourhood structures. Coalgebraic logic serves as a unifying semantic and algorithmic framework for such logics. It provides uniform reasoning algorithms that are easily instantiated to particular, concretely given logics. The *COOL 2* reasoner provides an implementation of such generic algorithms for coalgebraic modal fixpoint logics. As concrete instances, we obtain in particular reasoners for the aconjunctive and alternation-free fragments of the graded μ-calculus and the alternating-time μ-calculus. We evaluate the tool on standard benchmark sets for fixpoint-free graded modal logic and alternating-time temporal logic (ATL), as well as on a dedicated set of benchmarks for the graded μ-calculus.

### **1 Introduction**

Modal and temporal logics are established tools in the specification and verification of systems. While many such logics are interpreted over relational transition systems, the semantics of quite a number of important logics goes beyond the relational setup, involving, for instance, probabilities [20,30], concurrent games as in alternating-time logics [1,36], monotone neighbourhood structures as in game logic [34] and concurrent dynamic logic [37], or integer transition weights as in the multigraph semantics [5] of the graded μ-calculus [25]. *Coalgebraic logic* [4] provides a uniform semantic and algorithmic framework for these logics, based on the paradigm of *universal coalgebra* [38]. It provides reasoning algorithms of optimal complexity at various levels of expressiveness, up to the coalgebraic μ-calculus [3,21–23]. These algorithms are parametric in the transition type of systems (weighted, probabilistic, game-based etc.) as well as in suitable choices of modalities specific to the given system type. Their instantiation to specific logics requires providing either a set of next-step modal tableau rules satisfying a suitable completeness criterion [41] or, more generally, a plug-in algorithm that determines satisfiability for an extremely simple *one-step logic* that describes the interaction between modalities, and consists of (conjunctions of) modal operators applied to variables only [29].

D. Hausmann: Supported by the ERC Consolidator grant D-SynMA (No. 772459)

M. Humml: Supported by Deutsche Forschungsgemeinschaft (DFG) as part of the Research Training Group 2475 (grant number 393541319/GRK2475/1-2019) and the project 'RAND' (grant number 377333057).

L. Schröder: Supported by Deutsche Forschungsgemeinschaft (DFG) under project no. 419850228.

B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 234–247, 2023. https://doi.org/10.1007/978-3-031-38499-8\_14

The *COalgebraic Ontology Logic solver (COOL)* provides reasoning support for coalgebraic logics based on these generic algorithms. The first version of the tool [15] provided reasoning support for fixpoint-free coalgebraic hybrid logic with global assumptions, using a global caching principle [13]. In the present paper, we present *COOL 2*, which provides reasoning support for coalgebraic fixpoint logics, specifically for both the aconjunctive fragment and the alternation-free fragment of the coalgebraic μ-calculus. By instantiation, we obtain in particular the first implemented reasoners for the graded μ-calculus [26] (for which a set of coalgebraic modal tableau rules has been described in the literature [41]; however, this rule set has later turned out to be incomplete, cf. Remark 2.3) and the alternating-time μ-calculus [1]. We describe the structure of the tool including implementational details, and present evaluation results, focusing on the graded μ-calculus and alternating-time temporal logic (ATL). Additional details on the evaluation can be found in the full version [17].

*Related Work:* We have already mentioned work in coalgebraic logic on which COOL is based [3,13,21–23,41]. COOL is conceptually a successor of the *Coalgebraic Logic Satisfiability Solver (CoLoSS)* [2] but does not share any of its code. CoLoSS implements fixpoint-free logics, and is entirely unoptimised. The first version of COOL [15] has been evaluated on fixpoint-free next-step logics.

COOL also covers various relational modal logics, for which there are numerous specialised reasoners, including highly optimised description logic reasoners such as FaCT++ [44], Pellet [42], RACER [18], and HermiT [12]. As these systems do not support fixpoint logics, a comparison would be of limited value. In previous work, COOL has been evaluated on various relational fixpoint logics, and has been shown to perform favourably on Computation Tree Logic [23] (in comparison to reasoners featured in a previous systematic evaluation [14]), as well as on the aconjunctive fragment of the modal μ-calculus [22] (in comparison to MLSolver [11]). A reasoner for (next-step) graded modal logic has been evaluated against various description logic reasoners [43], using however the above-mentioned incomplete set of modal tableau rules.

For the same reasons, we refrain from evaluating COOL 2 against reasoners for coalition logic, i.e. the fixpoint-free fragment of the alternating-time μ-calculus, such as CLProver [32]. The only implemented reasoner for any fragment of the alternating-time μ-calculus that does include fixpoints still appears to be the tableau reasoner TATL for alternating-time temporal logic [6,7]. TATL has been compared to COOL on random formulas in previous work [23].

### **2 Satisfiability in the Coalgebraic** *µ***-Calculus**

COOL 2 is a satisfiability checker for the coalgebraic μ-calculus [3], that is, for the extension of coalgebraic modal logic with extremal fixpoint operators. Formulas of this logic are interpreted over coalgebras, where the semantics of modal operators is defined by means of so-called *predicate liftings* [41]; we recapitulate examples of system types and modalities subsumed by this paradigm in Example 2.1.

*Syntax:* Formulas are built relative to a set Var of fixpoint variables and a *modal similarity type* Λ, that is, a set of modal operators with assigned finite arities that is closed under duals, with $\overline{\heartsuit}$ ∈ Λ denoting the dual of ♥ ∈ Λ. Formulas ψ, φ, . . . of the *coalgebraic* μ*-calculus* over Λ are given by the grammar

$$\psi, \phi ::= \bot \mid \top \mid \psi \land \phi \mid \psi \lor \phi \mid \heartsuit(\psi_1, \ldots, \psi_n) \mid X \mid \mu X.\, \psi \mid \nu X.\, \psi,$$

where ♥ ∈ Λ has arity n and X ∈ Var. A formula χ is *aconjunctive* if for every conjunction ψ ∧ φ that is a subformula of χ, at most one of the formulas ψ and φ contains a free fixpoint variable X that is bound by a least fixpoint operator μX. While the logic does not contain negation as an explicit operator, full negation can be defined as usual; e.g. we have ¬♥ψ = $\overline{\heartsuit}$¬ψ and ¬μX. ψ = νX. ¬ψ[¬X/X], using ¬¬X = X.
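The aconjunctivity condition, as stated above, can be checked by a simple traversal of the syntax tree. The following Python sketch is illustrative only and is not taken from COOL 2: formulas are encoded as nested tuples (an assumed encoding), atoms are treated as nullary modalities, and the names `free_vars` and `is_aconjunctive` are hypothetical.

```python
def free_vars(f):
    """Free fixpoint variables of a formula given as nested tuples."""
    tag = f[0]
    if tag == "var":
        return {f[1]}
    if tag in ("mu", "nu"):
        return free_vars(f[2]) - {f[1]}
    if tag in ("and", "or"):
        return free_vars(f[1]) | free_vars(f[2])
    if tag == "mod":  # ("mod", name, [arguments])
        return set().union(*(free_vars(a) for a in f[2]))
    return set()      # ("top",) and ("bot",)


def is_aconjunctive(f, binders=None):
    """Check that no conjunction has two conjuncts with free mu-bound variables."""
    binders = binders or {}  # maps each variable in scope to "mu" or "nu"
    tag = f[0]
    if tag in ("mu", "nu"):
        return is_aconjunctive(f[2], {**binders, f[1]: tag})
    if tag == "and":
        active = [any(binders.get(x) == "mu" for x in free_vars(g)) for g in f[1:]]
        if all(active):
            return False
    if tag in ("and", "or"):
        return all(is_aconjunctive(g, binders) for g in f[1:])
    if tag == "mod":
        return all(is_aconjunctive(a, binders) for a in f[2])
    return True


X = ("var", "X")
eventually_p = ("mu", "X", ("or", ("mod", "p", []), ("mod", "dia", [X])))
both_mu = ("mu", "X", ("and", ("mod", "dia", [X]), ("mod", "box", [X])))
both_nu = ("nu", "X", ("and", ("mod", "dia", [X]), ("mod", "box", [X])))
```

Here `eventually_p` and `both_nu` pass the check, while `both_mu` does not, since both conjuncts contain the μ-bound variable X.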

Both the theoretical satisfiability checking algorithm and its implementation in COOL 2 operate on the *Fischer-Ladner closure* [21,24,27] of the target formula. The *alternation depth* (e.g. [21,29,33]) of a formula is the maximum depth of dependent alternating nestings of least and greatest fixpoints within the formula. Formulas with alternation depth 1 are *alternation-free*.

*Semantics:* Formulas are interpreted over F-coalgebras, that is, structures

$$(C, \xi: C \to FC),$$

where F : Set → Set is a functor determining the branching type of the systems at hand; thus ξ(x) ∈ FC encodes the transitions from x ∈ C, structured according to F. Modalities ♥ ∈ Λ of arity n are interpreted as *predicate liftings*, that is, families of maps $[\![\heartsuit]\!]_U : (2^U)^n \to 2^{FU}$ (for U ∈ Set) that assign predicates on FU to n-tuples of predicates on U, subject to a *naturality* condition [35,40]. On a coalgebra (C, ξ), the semantics of formulas is defined inductively in the usual way for the propositional operators and fixpoints, and by $[\![\heartsuit(\psi_1, \ldots, \psi_n)]\!] = \xi^{-1}[[\![\heartsuit]\!]_C([\![\psi_1]\!], \ldots, [\![\psi_n]\!])]$ for modalities.

A closed formula ψ is *satisfiable* if there is a coalgebra (C, ξ) and a state x ∈ C such that x ∈ $[\![\psi]\!]$. A formula ψ is *valid* if ¬ψ is not satisfiable.

**Example 2.1.** (1) The standard *modal* μ*-calculus* [24] is obtained using the functor F = P(A) × P, where A is a fixed set of atoms, the similarity type Λ = {♦, □, a, ¬a | a ∈ A}, and predicate liftings

$$\begin{aligned} [\![\lozenge]\!]_C(B) &= \{ (A, Z) \in 2^A \times 2^C \mid Z \cap B \neq \emptyset \} & \quad [\![a]\!]_C &= \{ (A, Z) \in 2^A \times 2^C \mid a \in A \} \\ [\![\square]\!]_C(B) &= \{ (A, Z) \in 2^A \times 2^C \mid Z \subseteq B \} & \quad [\![\neg a]\!]_C &= \{ (A, Z) \in 2^A \times 2^C \mid a \notin A \} \end{aligned}$$

The expressive power of the modal μ-calculus is demonstrated by the formulas

$$
\mu X.\nu Y.(p \land \Diamond Y) \lor \Diamond X \qquad\qquad \nu X.\mu Y.(p \land \Diamond X) \lor \Diamond Y.
$$

The former is a co-Büchi formula expressing the existence of a path on which p holds forever, from some point on; the latter formula expresses the Büchi property that there is a path on which the atom p is satisfied infinitely often.
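On a finite relational structure, the extensions of both formulas can be computed by nested fixpoint iteration. The following Python sketch evaluates them on a small, made-up Kripke structure; the helper names (`diamond`, `lfp`, `gfp`, `buchi`, `co_buchi`) and the example structure are assumptions for illustration and concern model checking only, not the satisfiability algorithm implemented in COOL 2.

```python
def diamond(succ, target):
    """States that have at least one successor in `target`."""
    return {s for s, ts in succ.items() if ts & target}


def lfp(f):
    """Least fixpoint of a monotone function on sets, iterating from the empty set."""
    current = set()
    while True:
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt


def gfp(f, universe):
    """Greatest fixpoint of a monotone function, iterating from the full state set."""
    current = set(universe)
    while True:
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt


def buchi(succ, p):
    """nu X. mu Y. (p and <>X) or <>Y"""
    return gfp(lambda x: lfp(lambda y: (p & diamond(succ, x)) | diamond(succ, y)), succ)


def co_buchi(succ, p):
    """mu X. nu Y. (p and <>Y) or <>X"""
    return lfp(lambda x: gfp(lambda y: (p & diamond(succ, y)) | diamond(succ, x), succ))


# Made-up structure: 0 -> 1 -> 2 -> 1 and a self-loop on 3.
SUCC = {0: {1}, 1: {2}, 2: {1}, 3: {3}}
```

With p true exactly in state 1, the Büchi formula holds in states 0, 1, and 2 (the cycle between 1 and 2 visits p infinitely often), whereas the co-Büchi formula holds nowhere, since no path eventually stays inside p.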


*Satisfiability Checking:* We proceed to recall the satisfiability checking algorithm for the coalgebraic μ-calculus that forms the basis of the implementation within COOL 2. This algorithm adapts the automata-based approach to satisfiability checking for the standard μ-calculus, and generalises the treatment of modal steps by parametrizing over a solver for the *one-step satisfiability* problem of the logic, which concerns satisfiability of formulae with exactly one layer of next-step modalities [21]. It thus avoids the necessity of tractable sets of tableaux rules for modal operators. Under mild assumptions on the complexity of the one-step satisfiability problem of the base logic at hand ('*tractability*'), the algorithm witnesses a, typically optimal, upper bound ExpTime for the complexity of the satisfiability problem; unlike a previous algorithm [4], the algorithm thus has optimal runtime also in cases where no tractable sets of modal tableaux rules are known, such as the graded (or, more generally, Presburger) μ-calculus (further cases of this kind include the probabilistic μ-calculus with polynomial inequalities [21] and the unrestricted form of the *alternating-time* μ*-calculus with disjunctive explicit strategies* [16]).

The algorithm constructs and solves a parity game that characterises satisfiability of the input formula χ. In this game one player attempts to construct a tableau structure for χ while the opposing player attempts to refute the existence of such a structure. Modal steps in this tableau construction are treated by using instances of the one-step satisfiability problem for the logic at hand, thereby generalising traditional modal tableau rules. The winning condition of the game is encoded by a non-deterministic parity automaton $A_\chi$, reading infinite words that encode sequences of step-wise formula evaluations (so-called *formula traces*) within a coalgebra; such words encode branches in the constructed tableau structure. Conjunctions give rise to nondeterminism in this automaton, and the parity condition of the automaton is used to accept exactly those words that encode sequences of formula evaluations in which some least fixpoint is unfolded infinitely often. To use the language accepted by $A_\chi$ as the winning condition in a parity game, we transform $A_\chi$ to an equivalent deterministic parity automaton $B_\chi$. This automaton then is paired with the tableau construction to yield a parity game in which the existential player aims to show the existence of a tableau structure in which all branches are rejected by $B_\chi$, and that is built in such a way that modalities always are jointly one-step satisfiable. To ensure the latter property, the modal moves in the game invoke instances of the one-step satisfiability problem of the base logic. For more details on one-step satisfiability and the overall algorithm, see [17,21].

**Corollary 2.2 (**[21]**).** *Suppose that the one-step satisfiability problem is tractable. Then the satisfiability problem of the corresponding instance of the coalgebraic* μ*-calculus is in* ExpTime*.*

**Remark 2.3.** As mentioned above, previous algorithms for the coalgebraic μ-calculus (also implemented in COOL 2) rely on complete sets of modal tableau rules, specifically on one-step cutfree complete sets of so-called *one-step rules* [41]; such rules (in their incarnation as tableau rules) have a premiss with exactly one layer of modal operators and a purely propositional conclusion. A typical example is the usual tableau rule for the modal logic K: 'To satisfy $\Box a_1 \land \dots \land \Box a_n \land \neg\Box a_0$, satisfy $a_1 \land \dots \land a_n \land \neg a_0$'. It has been shown that the existence of a tractable one-step cutfree complete set of one-step rules implies tractability of one-step satisfiability [29], i.e. the approach via one-step satisfiability is more general.

As indicated in the introduction, a tractable one-step cutfree complete set of one-step rules for graded modal logic has been claimed in the literature [41,43] but has since turned out to be incomplete; we give a counterexample in the full version [17]. (A similar rule for Presburger modal logic [28] has also been shown to be in fact incomplete [29].)

### **3 Implementation**

The previous version COOL [15] only implements fixpoint-free (coalgebraic) logics, such as standard modal logic, probabilistic modal logic, or coalition logic. The main novelty of the new version COOL 2, described here, is

– the addition of fixpoint constructs to the previously implemented logics, supporting alternation-free and aconjunctive fragments of the resulting μ-calculi, and implementing on-the-fly solving to allow early termination

– support for treating modal steps both by tableaux rules (when a suitable rule set exists), and by one-step satisfiability checking (in the remaining cases)

In more detail, COOL 2 is written in OCaml and implements the satisfiability checking algorithm described in Sect. 2, treating modal steps by solving instances of the one-step satisfiability problem<sup>1</sup>. For logics where a suitable set of modal tableau rules is implemented, those are used for the treatment of modal steps, rather than relying on one-step satisfiability (unless the user explicitly chooses otherwise); in these cases, COOL 2 essentially implements the algorithm described in [29]. The current implementation supports the alternation-free and the aconjunctive fragments of the standard μ-calculus (both serial and non-serial), the monotone μ-calculus [19], the alternating-time μ-calculus (i.e. coalition logic with fixpoint operators), and the graded μ-calculus. Tractable tableaux rules are available for all cases except for the graded μ-calculus, for which COOL 2 uses the one-step satisfiability algorithm to decide satisfiability. In particular, COOL 2 is the only existing reasoner for the graded μ-calculus (as well as the only reasoner covering the alternating-time μ-calculus beyond ATL).

The concrete logic used can be selected via a command-line parameter setting up the data structures in COOL 2 accordingly before parsing and checking the syntax of the given formula χ. COOL 2 then builds the determinised automaton $B_\chi$, yielding the parity game described above in a step-wise manner, repeatedly adding nodes in *expansion steps* that explore the game. In the case of simpler alternation-free formulas, the Miyano-Hayashi method [31] is used to construct $B_\chi$, resulting in asymptotically smaller games with a Büchi winning condition; for the more involved aconjunctive formulas, the implementation uses the permutation method for determinisation of limit-deterministic parity automata [9,22]. Nodes in the constructed game are marked as either unexpanded, undecided, unsatisfiable, or satisfiable.

Optional *solving steps* may take place at any point during the construction of Bχ, depending on runtime parameters of COOL 2; these steps compute the winning regions of the partial game that has been constructed so far and accordingly mark nodes as satisfiable or unsatisfiable, if possible. The reasoner terminates as soon as the initial node is marked satisfiable or unsatisfiable. If this does not allow for early termination, the game eventually becomes fully explored, at which point a final (obligatory) solving step for the complete game is guaranteed to mark the initial node, thereby ensuring termination.

We detail the implementation of the two main procedures within COOL 2.

*Implementation of Expansion Steps.* The propositional expansion steps in the game construction for nodes v are performed using the propositional satisfiability solver MiniSat [8] to compute a word that encodes consistent propositional formula manipulations for v. Afterwards, the successor of v in $B_\chi$ under this word is computed and added to the game.

When the one-step satisfiability based algorithm of COOL 2 is used, modal expansion steps for nodes v create new game nodes for each subset κ of the modalities that are to be jointly satisfied at v; this is done by computing the successor of v in $B_\chi$ that is reached by manipulating each formula from κ.

<sup>1</sup> Sources are available at https://git8.cs.fau.de/software/cool.

When the tableau-based algorithm of COOL 2 is used, the modal expansion step for a node v instead computes all applications of a modal rule matching v and inserts, for each such rule application, and each conjunctive clause κ in the conclusion of the rule application, the new game node that is reached from v in $B_\chi$ by manipulating the modalities that constitute κ. Intuitively, using tableau rules reduces the search space by only adding nodes found in the conclusion of some matching rule application.

Any node that is added by some expansion step is initially marked as undecided. Crucially, all expansion steps perform on-the-fly determinisation, that is, given a game node v and a word that encodes a sequence of formula manipulations, the newly added game node is computed using only the information stored in v.

*Implementation of Solving Steps.* A single solving step computes the winning regions in the parity game that has been constructed up to this point, and marks nodes accordingly. The game solving is done using either the parity game solver PGSolver [10] or a native implementation provided by COOL 2 that solves the game by fixpoint iteration.
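As an illustration of solving a game by fixpoint iteration, the following Python sketch computes the winning region of the existential player in a Büchi game via the standard nested fixpoint νY. μX. (F ∩ CPre(Y)) ∪ CPre(X). It is a generic textbook-style sketch rather than COOL 2's native solver: the names `cpre` and `solve_buchi`, the explicit graph encoding, and the example game are assumptions, and a player who cannot move is assumed to lose.

```python
def cpre(owner, succ, target):
    """Nodes from which player 0 can force the next node to lie in `target`."""
    result = set()
    for v, ts in succ.items():
        if owner[v] == 0 and any(t in target for t in ts):
            result.add(v)  # player 0 picks a suitable successor
        elif owner[v] == 1 and all(t in target for t in ts):
            result.add(v)  # every choice of player 1 stays in target
    return result


def solve_buchi(owner, succ, accepting):
    """Nodes from which player 0 can force infinitely many visits to `accepting`."""
    win = set(succ)  # outer greatest fixpoint starts with all nodes
    while True:
        goal = accepting & cpre(owner, succ, win)
        reach = set()  # inner least fixpoint: player 0 can force reaching `goal`
        while True:
            nxt = goal | cpre(owner, succ, reach)
            if nxt == reach:
                break
            reach = nxt
        if reach == win:
            return win
        win = reach


# Made-up example game: player 1 owns only node "b".
OWNER = {"a": 0, "b": 1, "c": 0, "d": 0}
SUCC = {"a": ["b", "d"], "b": ["a", "c"], "c": ["c"], "d": ["d"]}
```

For the accepting set {a, c}, player 0 wins from a, b, and c, but not from d, whose only move is a self-loop outside the accepting set.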

If the one-step satisfiability-based algorithm is used, an assigned modal node v is satisfiable if its modalities are jointly one-step satisfiable in those successors of v that are satisfiable themselves. An enumerative representation of the game thus contains existential moves to all subsets Π of subsets of modalities of v that are sufficiently large for one-step satisfaction of the modalities of v, followed by universal moves to nodes induced by any κ ∈ Π; the full game thus is of doubly-exponential size. This can be avoided by inlining the modal steps, thereby evading the intermediate nodes Π. The winning region can then be computed in single-exponential time by using COOL 2's native fixpoint iteration over a function that computes the two-tiered modal steps in one go.

Decision procedures for the one-step satisfiability problems in the relational and the graded case are implemented in COOL 2 along the lines of the algorithms described in [21, Example 6] (in the graded case, nondeterministic guessing is replaced with a recursive search procedure).

If the algorithm based on modal tableau rules is used, the treatment of modal steps follows the tableaux-based algorithm that is given in [3]. States v are satisfiable if for all rule applications that match v, the conclusion of the application contains a conjunctive clause κ such that the node induced by κ is satisfiable.

COOL 2 also allows the user to specify the desired frequency of optional game solving steps, including the options once and adaptive. With the option once, no intermediate solving takes place so that the game is fully constructed and solved just once, at the very end of the execution. With the option adaptive, intermediate solving takes place, but the frequency of solving reduces as the size of the constructed graph increases; this option implements *on-the-fly* solving and allows for finishing early in cases where a small model or refutation exists.

### **4 Evaluation**

We conduct experiments in order to evaluate the performance of the various algorithms implemented in COOL in comparison with each other, as well as in comparison with other tools (where applicable).<sup>2</sup> Complete definitions of all formula series used in the evaluation as well as additional experimental results can be found in the full version [17].

*Experiments:* In a first experiment, we compare COOL 2 with the established reasoner FaCT++, which supports the description logic SROIQ(D) (subsuming fixpoint-free graded modal logic), using the following series of formulas from Snell et al. [43].

$$\mathsf{Cardinality}(n) := \langle n-1 \rangle \neg p \land \langle n-1 \rangle p \land [n] \neg q \land [n] q \tag{Sat}$$

$$\mathsf{CardinalityU}(n) := \langle n-1 \rangle \neg p \land \langle n-1 \rangle p \land [n] \neg q \land [n-1]q \tag{\text{UnSat}}$$

Intuitively, the satisfiable Cardinality(n) formulas express that there are at least 2n successors and that both q and ¬q are satisfied in at most n successors, each; similarly the unsatisfiable CardinalityU(n) formulas state that there are at least 2n successors, and that q and ¬q hold in at most n and n − 1 successors, respectively; the latter statements imply that there are at most 2n−1 successors, yielding a contradiction.
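This counting argument can be confirmed by brute force for small n. The Python sketch below assumes the usual reading of graded modalities, which matches the intuitive description above: ⟨k⟩φ holds if more than k successors satisfy φ, and [k]φ holds if at most k successors violate φ. A candidate model is summarised by the number of successors for each valuation of p and q; all names in the sketch are hypothetical.

```python
from itertools import product

VALUATIONS = list(product([False, True], repeat=2))  # (p, q) of a successor


def count(counts, pred):
    """Number of successors whose valuation (p, q) satisfies `pred`."""
    return sum(c for (p, q), c in counts.items() if pred(p, q))


def cardinality_holds(counts, n, unsat_variant=False):
    """Evaluate Cardinality(n), or CardinalityU(n) if `unsat_variant` is set."""
    q_box_grade = n - 1 if unsat_variant else n
    return (
        count(counts, lambda p, q: not p) > n - 1            # <n-1> not p
        and count(counts, lambda p, q: p) > n - 1            # <n-1> p
        and count(counts, lambda p, q: q) <= n               # [n] not q
        and count(counts, lambda p, q: not q) <= q_box_grade  # [n] q or [n-1] q
    )


def satisfiable(n, unsat_variant=False):
    """Search all successor counts; counts above n always violate a box formula."""
    for cs in product(range(n + 1), repeat=4):
        if cardinality_holds(dict(zip(VALUATIONS, cs)), n, unsat_variant):
            return True
    return False
```

Restricting each count to at most n loses no candidates, because every successor satisfies either q or ¬q and both are bounded by n.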

Going beyond next-step formulas, we continue by devising various complex series of graded μ-calculus formulas that involve (nested) fixpoints and express non-trivial properties of graded trees, automata and games.

– We obtain a series of unsatisfiable formulas by requiring the existence of an (n + 1)-branching tree in which p holds everywhere while at the same time requiring that this tree contains some state with n + 2 successors that satisfy p:

$$\mathsf{TreeU}(n) = (\nu X. \langle n \rangle (p \land X) \land [n+1] \neg p) \land (\mu Y. \langle n+1 \rangle p \lor \langle n \rangle (p \land Y)) \quad \text{(UnSat)}$$

– Next we turn our attention to graded formulas involving parity conditions. We devise a series of valid formulas expressing that graded parity automata can be transformed to graded Büchi automata accepting a superlanguage of the original automaton:

$$\mathsf{ParityToBuechi}(n,k) := \mathsf{Parity}(n,k) \to \mathsf{Buechi}(n,k) \tag{Valid}$$

Here, Parity(n, k) encodes parity acceptance with k priorities and grade n while Buechi(n, k) expresses Büchi acceptance by a nondeterministic automaton that eventually guesses the maximal priority that occurs infinitely often; the negated formula ¬ParityToBuechi(n, k) is unsatisfiable.

<sup>2</sup> Scripts and executables that allow for reproducing our experiments can be found at DOI 10.5281/zenodo.8042581.

– Rabin conditions are given by families of pairs (i<sub>j</sub>, f<sub>j</sub>)<sub>j≤k</sub> of sets i<sub>j</sub>, f<sub>j</sub> of states, and express the constraint that there is some j ≤ k such that states from i<sub>j</sub> (*infinite*) are visited infinitely often and states from f<sub>j</sub> (*finite*) are visited only finitely often. We can express Rabin conditions with k pairs (and one-step property ψ), Büchi properties and satisfaction of single Rabin pairs by formulas Rabin(k, ψ), Buechi(f, ψ) and RabinPair(i, f, ψ), respectively. Then we obtain valid formulas stating that the existence of an (n + 1)-branching tree that satisfies the Rabin condition on each path implies that there is a path satisfying a simpler Büchi condition or a single Rabin pair, respectively:

$$\mathsf{RabinToBuechi}(k,n) := \mathsf{Rabin}(k,\langle n\rangle) \to \mathsf{Buechi}(i\_1 \lor \dots \lor i\_k,\langle 0\rangle) \tag{Valid}$$

$$\mathsf{Rabin}\mathsf{ToRpar}(k,n) := \mathsf{Rabin}(k,\langle n\rangle) \to \bigvee\_{1 \le j \le k} \mathsf{Rabin}\mathsf{Pair}(i\_j, f\_j, \langle 0\rangle) \tag{Valid}$$

– Coming to games, we specify the winning regions in graded Büchi and Rabin games by formulas BuechiG(f, n) and RabinG(k, n), respectively; in such graded games, players are required to have at least n winning moves at their nodes in order to win. The following valid formulas then express that winning strategies in graded Rabin games with k pairs guarantee that some node from i<sub>1</sub> ∪ ... ∪ i<sub>k</sub> is visited infinitely often:

$$\mathsf{RabinGame}(k,n) := \mathsf{RabinG}(k,n) \to \mathsf{BuechiG}(i\_1 \vee \ldots \vee i\_k, n) \tag{Valid}$$

In a final experiment on alternating-time formulas, we compare COOL 2 with TATL [6] on the ATL example formulas given in [6] as well as on additional formula series. For instance, we turn the formula ⟨1⟩Gp ∧ ¬⟨2⟩F⟨1⟩Gp (written here using ATL syntax) from [6] into a series Nest(n) with an increasing number of nested operators; the formulas then alternate between being satisfiable and unsatisfiable:

$$\chi(0) = p \qquad \chi(i+1) = \neg \langle 2 \rangle F \langle 1 \rangle G\, \chi(i) \qquad \mathsf{Nest}(n) = \langle 1 \rangle G\, p \land \chi(n).$$
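The recursive definition of the series translates directly into a small generator. The following sketch produces the formulas as plain strings; the ASCII notation (`~` for negation, `&` for conjunction, `<a>` for coalition modalities) is a hypothetical rendering chosen for illustration and is not the input syntax of COOL 2 or TATL.

```python
def chi(i):
    """chi(0) = p and chi(i + 1) = ~<2>F <1>G chi(i)."""
    formula = "p"
    for _ in range(i):
        formula = f"~<2>F <1>G {formula}"
    return formula

def nest(n):
    """Nest(n) = <1>G p & chi(n)."""
    return f"<1>G p & {chi(n)}"
```

With this rendering, `nest(1)` yields `<1>G p & ~<2>F <1>G p`, which corresponds to the example formula from which the series is derived.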

*Results:* We conducted all experiments on a virtual machine with four 2.3 GHz vCPUs and 8 GB of RAM. We compare with a 64-bit binary of FaCT++ v1.6.5 and with TATL. We compute all results with a timeout of 60 seconds and average the results over multiple executions. For the execution and measurement we use hyperfine<sup>3</sup>. Below, 'COOL' and 'COOL on-the-fly' refer to invoking COOL 2 with solving rate once and adaptive, respectively.
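The measurement protocol (a fixed timeout and averaging over several executions) can be approximated with a few lines of Python. This is only a rough sketch of the idea, not the authors' scripts and not a substitute for hyperfine; the function name `mean_runtime` and its parameters are assumptions made for illustration.

```python
import statistics
import subprocess
import sys
import time

def mean_runtime(cmd, runs=3, timeout=60.0):
    """Mean wall-clock time of `cmd` in seconds, or None if a run hits the timeout."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        try:
            subprocess.run(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, timeout=timeout)
        except subprocess.TimeoutExpired:
            return None  # reported as a timeout, like the dagger in the tables
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)
```

A call such as `mean_runtime([sys.executable, "-c", "pass"])` times a trivial command; a reasoner invocation would be passed as the argument list instead.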

Results for the Cardinality and CardinalityU series are shown in Fig. 1 and Fig. 2, respectively. From n = 10 and n = 8 onwards, COOL 2 outperforms FaCT++ considerably. An explanation for this could be that FaCT++ appears to treat multiplicities in a naïve way while COOL 2 employs the more efficient one-step satisfiability algorithm.

Results for the unsatisfiable tree property are shown in Fig. 3. As these formulas contain fixpoint operators, a comparison with FaCT++ is not possible. While COOL 2 is generally capable of handling quite large branching factors, this experiment showcases the drawbacks of on-the-fly solving in the case that a formula cannot be decided early so that repeated attempts of solving the game early lead to overhead computations.

<sup>3</sup> https://github.com/sharkdp/hyperfine.

**Fig. 1.** Runtimes for Cardinality(n) **Fig. 2.** Runtimes for CardinalityU(n)

**Fig. 3.** Runtimes for TreeU(n) **Fig. 4.** Runtimes for ¬ParityToBuechi(n, k)

Runtimes for COOL 2 (using on-the-fly solving) on the unsatisfiable series of parity formulas ¬ParityToBuechi(n, k) are shown in Fig. 4. The results indicate that increasing the number of priorities k has a much stronger effect on the runtime than increasing multiplicities n in the modalities. This is in accordance with expectations as increasing k leads to much larger determinized automata and resulting satisfiability games, while increasing n only complicates the modal steps in the game while leaving the global game structure unchanged.

Results for the Rabin families of formulas are given in the table below, with † indicating a timeout of 60 s. COOL 2 is able to handle reasonably large formulas describing Rabin properties of automata and games, with the series for n = 1 expressing properties of standard automata (solved using tableau rules), and the series with n = 2 properties of graded automata with multiplicity 2 (solved using one-step satisfiability).

In accordance with previous experiments on random ATL formulas of larger sizes in [23], COOL 2 generally outperforms TATL by a large margin, starting from formulas containing at least five modalities or involving nesting of temporal operators; this trend is confirmed by Fig. 5, which shows the stepped execution times for the series Nest that alternates between being satisfiable and unsatisfiable.


**Fig. 5.** Runtimes for the ATL series Nest(n)

In summary, COOL 2 shows promising performance in comparison to TATL and FaCT++, as well as for practical applicability. On graded formulas without fixpoints, COOL 2 scales much better than FaCT++ with regard to increasing multiplicities. In the presence of fixpoints, COOL 2 still scales well and can handle multiplicities that should be sufficient for practical use. The formula series ¬ParityToBuechi appears to show the limits of COOL 2 with the current implementation of graded one-step satisfiability checking. Nonetheless, our results indicate that COOL 2 is capable of automatically proving or refuting involved properties of (graded) ω-automata and games in reasonable time.

### **5 Conclusion**

We have described and evaluated the current version COOL 2 of the *CO*algebraic *O*ntology *L*ogic reasoner (COOL). Future development will include the implementation of additional instance logics, such as the probabilistic and graded μ-calculus with linear inequalities, as well as support for the full coalgebraic μ-calculus via on-the-fly determinisation of *unrestricted* Büchi automata, using the Safra-Piterman construction.

### **References**



# **Choose Your Colour: Tree Interpolation for Quantified Formulas in SMT**

Elisabeth Henkel<sup>1</sup>(✉), Jochen Hoenicke<sup>2</sup>, and Tanja Schindler<sup>3</sup>

<sup>1</sup> University of Freiburg, Freiburg im Breisgau, Germany, henkele@informatik.uni-freiburg.de
<sup>2</sup> Certora, Tel Aviv-Yafo, Israel, jochen@certora.com
<sup>3</sup> University of Liège, Liège, Belgium, tanja.schindler@uliege.be

**Abstract.** We present a generic tree-interpolation algorithm in the SMT context with quantifiers. The algorithm takes a proof of unsatisfiability using resolution and quantifier instantiation and computes interpolants (which may contain quantifiers). Arbitrary SMT theories are supported, as long as each theory itself supports tree interpolation for its lemmas. In particular, we show this for the theory combination of equality with uninterpreted functions and linear arithmetic. The interpolants can be tweaked by virtually assigning each literal in the proof to interpolation partitions (colouring the literals) in arbitrary ways. The algorithm is implemented in SMTInterpol.

**Keywords:** Tree Interpolation · Quantified Formulas · SMT

### **1 Introduction**

Craig interpolants [7] were originally proposed to reason about proof complexity. In the last two decades, research reignited when interpolants proved useful for software verification, in particular for generating invariants [15]. Tree interpolants are useful for verifying programs with recursion [12], and for solving non-linear Horn-clause constraints [23], which can be used for thread modular reasoning [10,16] and verifying array programs [20]. For many verification problems, reasoning about first-order quantified formulas is needed. Quantified formulas are, among others, needed to model unsupported theories or to express global properties of arrays [19], for example, sortedness [3,24].

An interpolation problem is an unsatisfiable conjunction of several input formulas, the partitions of the interpolation problem. An interpolant summarises the contribution of a single or multiple partitions to the unsatisfiability. Interpolants can be computed from resolution proofs. However, most methods require localised proofs where each literal is associated with some input partition [22]. Proofs generated by SMT solvers, especially with quantifier instantiations, usually contain mixed terms and literals created during the solving process that cannot be associated with a single input formula.

In this paper, we extend our work on proof tree preserving sequence interpolation of quantified formulas [13]. The method presented therein allows for the computation of inductive sequence interpolants from instantiation-based resolution proofs of quantified formulas in the theory of uninterpreted functions. The key idea of this method is to perform a virtual modification of mixed terms introduced through quantifier instantiations, which makes it possible to compute an inductive sequence of interpolants on a single, non-local proof tree.

We extend the interpolation algorithm to compute tree interpolants and to support arbitrary SMT theories (with the single restriction that such a theory itself must support tree interpolation for its lemmas). We simplify the treatment of mixed terms by virtually flattening all literals independently of the partitioning. We show that the literals can be coloured (assigned to a partition) arbitrarily, and that for every colouring, correct interpolants are produced. The interpolants contain quantifiers for the flattening variables that bridge different partitions, and by choosing colours sensibly the number of quantifiers can be reduced. In contrast to previous works [1,12] which produce tree interpolants by repeated binary interpolation and require multiple proofs, our method computes a tree interpolant from a single proof.

*Related Work.* Many practical algorithms to compute interpolants have been presented. We focus here on proof-based methods that either work in the presence of quantifiers, or that can compute tree interpolants, or both.

Our work builds on the method presented in [4] for computing interpolants from instantiation-based proofs in SMT. It is based on *purifying* quantifier instantiations by introducing variables for terms not fitting the partition, and adding defining auxiliary equalities as a new input clause in the proof. Our method introduces these variables and equalities only virtually for computing the partial interpolants. Thus, tree interpolants can be computed from a single proof of unsatisfiability, while in [4] a purified proof is required for each partition.

There exist several methods to compute interpolants for quantified formulas inductively from superposition-based proofs. In [2], each literal is given a *label* (similar to our colouring) used to project the clause to the different partitions. First, a *provisional* interpolant is computed that may contain local symbols. These symbols are replaced by quantified variables to obtain an interpolant. In contrast to our method, the approach only works when the provisional interpolants contain at most local constants, i.e., no local functions or predicates, and the assignment of labels is not as flexible as our colouring. The method in [17] is based on a slightly modified proof, where substitution steps are done separately. First, a *relational* interpolant is computed, which may contain local function symbols, but only shared predicates. In logic without equality, or when the only local symbols are constants, the relational interpolant can be turned into an interpolant by quantifying over non-shared terms, respecting their dependencies.

A very different method based on summarising subproofs is presented in [9]. The proof is split into subproofs belonging to a single partition. The relevant subproofs are summarised in an *intermediant* stating that their premises imply their conclusion. If the subproofs contain only symbols of the respective partition, the resulting formula is an interpolant. If the proof can be split in that way, the method works for any theory and proof system, but for tree interpolation, a different proof would be required for each partitioning.

Tree interpolants can be computed by repeated binary interpolation from formulas where the children interpolants are included, as discussed in [12]. In the propositional setting, [11] discusses under which conditions sets of interpolants with certain relations, such as tree interpolants, can be obtained by binary interpolation on different partitionings of the same formula. The method is implemented in OpenSMT, but the solver, and therefore the interpolation engine, does not support quantifiers.

A general framework for computing tree interpolants for ground formulas from a single proof has been presented in [5]. It works for combinations of equality-interpolating theories and is based on projecting *mixed literals* using auxiliary variables and predicates. Additionally, the rule for computing a resolvent's interpolant from its antecedents' interpolants is more involved. The method cannot deal with quantifier instantiations, nor with terms mixing subterms from different partitions. We discuss in Sect. 6 how it can be combined with the interpolation method for quantified formulas presented in this paper.

The first implementation of a tree interpolation algorithm in the presence of quantifiers and theories was in Vampire [1]. It is based on repeatedly computing binary interpolants for modified interpolation problems, similar to [12]. For each binary computation, the proof must be localised in order to be able to compute interpolants. In contrast, our method computes tree interpolants in one go from a single proof that has been obtained without knowledge of the partitioning of the tree interpolation problem. To the best of our knowledge, Vampire is the only other tool that is able to compute tree interpolants in the presence of quantifiers.

### **2 Notation**

We assume that the reader is familiar with first-order logic. We define a *theory* T by its *signature*, that contains constant, function and predicate symbols, and its set of *axioms*, closed formulas that fix the meaning of those function and predicate symbols that are *interpreted* by the theory.

A *term* is a variable or the application of an n-ary function symbol to n terms. An *atom* is the application of an n-ary predicate to n terms, and a *literal* is an atom or its negation. A *clause* is a disjunction of literals, and a formula is in *conjunctive normal form* (CNF) if it is a conjunction of clauses. We use ⊤ (resp. ⊥) for the formula that is always true (resp. false).

We will demonstrate our algorithm using the theory of equality, and the theory of linear arithmetic (with rationals and/or integers). The theory of equality establishes reflexivity, symmetry, and transitivity of the equality predicate =, and congruence for each *uninterpreted* function symbol. For simplicity of the presentation, uninterpreted constants are considered as 0-ary functions, and uninterpreted predicate symbols as uninterpreted functions with Boolean return value. The theory of linear arithmetic contains the predicates ≤, <, rational constants c, the binary addition function +, and a family of unary multiplication functions c·, one for each rational constant c. These symbols have their usual semantics, and the main theory lemmas are trichotomy (x < y ∨ x = y ∨ x > y) and a variant of Farkas' lemma. For simplicity, we apply arithmetic conversions implicitly and treat x ≤ y, y ≥ x, and 1 · x + (−1) · y ≤ 0 as the same literal, and x > y as its negated literal.

We denote constants by a, b, c, functions by f, g, h, variables by v, x, y, z, and terms by s, t. We use ℓ for literals, C for clauses, and φ, F, I for formulas.

For a term t, the outermost (or *head*) function symbol is denoted by *hd*(t). The set of all uninterpreted function symbols occurring in a formula F is *symb*(F) and the set of all free variables in F is *FreeVars*(F). The result of substituting in a formula F each occurrence of a variable x by a term t is denoted by F{x → t}. By x̄ and t̄, we denote the list of variables x<sub>1</sub>, ..., x<sub>n</sub> and terms t<sub>1</sub>, ..., t<sub>n</sub>, respectively. We use the symbol ≡ to denote equivalence between formulas, and to assign a formula to a formula variable.

### **3 Preliminaries**

*Craig Interpolation.* A binary *Craig interpolant* [7] for an unsatisfiable conjunction A ∧ B is a formula I that is implied by A, contradicts B, and contains only symbols that occur in both A and B. Tree interpolants generalise this notion to several partitions arranged in a tree-like structure.

**Definition 1 (Tree interpolation).** *A* tree interpolation problem (V, E, F) *is a labelled binary tree where* V *is a set of nodes connected by directed edges* E ⊆ V × V *pointing towards the root node. Every node except for the root node has one outgoing edge to its parent, and each non-leaf node has exactly two incoming edges. The* partitions P ⊆ V *of the tree interpolation problem are the leaf nodes. The labelling function* F *assigns a formula to each partition* p ∈ P *of the tree such that their conjunction is unsatisfiable. We use st*(v) ⊆ P *to denote the set of leaves in the subtree of the node* v*, i.e., the set of leaves for which a path to the node* v *exists.*

*A* tree interpolant *for the interpolation problem* (V, E, F) *is a labelling function* I *for all nodes with the following properties:*


*Remarks.* In contrast to the earlier definition of tree interpolation [1,5], only the leaves of the tree are labelled by F here. A tree interpolation problem with labelled inner nodes can be transformed to our formalism by adding a leaf child to each such node. A non-binary tree can be extended to a binary tree by adding more internal nodes. If the interpolants of the newly created nodes are ignored, the remaining interpolants are tree interpolants according to the earlier definition for tree interpolation.

A binary interpolant of A and B corresponds to the tree interpolant of the tree containing just two leaves A and B; more precisely, it is the interpolant labelling the first leaf. Conversely, each interpolant I(v) of a tree interpolant is also a binary interpolant of the formulas in the partitions A := *st*(v) and A<sup>c</sup> := P \ *st*(v). Since the set A defines v uniquely, we can also use I<sub>A</sub> to denote I(v). We call a symbol A*-local* if it only occurs in partitions in A, A<sup>c</sup>*-local* if it only occurs in partitions in A<sup>c</sup>, and *shared* if it occurs in both. The interpolant may only contain shared symbols.
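The notions *st*(v), A-local, A<sup>c</sup>-local and shared are easy to compute for a concrete tree. The sketch below uses the five-node tree and the three formulas that serve as the running example in Sect. 4; the function names and the encoding of formulas by their sets of uninterpreted symbols are assumptions made for illustration.

```python
def subtree_leaves(v, edges):
    """st(v): the leaves from which a path to v exists (edges point to the parent)."""
    children = [child for child, parent in edges if parent == v]
    if not children:
        return {v}
    return set().union(*(subtree_leaves(c, edges) for c in children))

def classify(symbol, a, leaves, symbols_of):
    """Classify a symbol that occurs in some leaf as A-local, Ac-local or shared."""
    occurrences = {p for p in leaves if symbol in symbols_of[p]}
    if occurrences <= a:
        return "A-local"
    if occurrences <= leaves - a:
        return "Ac-local"
    return "shared"

# Edges (child, parent) and the uninterpreted symbols of each leaf formula.
EDGES = {(1, 123), (23, 123), (2, 23), (3, 23)}
SYMBOLS = {1: {"g", "h"}, 2: {"g", "b"}, 3: {"f", "g", "b"}}
LEAVES = set(SYMBOLS)
```

For the inner node 23, `subtree_leaves(23, EDGES)` is `{2, 3}`; with A = {2, 3}, the symbol g is shared, h is A<sup>c</sup>-local, and b and f are A-local, so I(23) may mention g but not h, b or f.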

*Theory Combination.* We assume that the solver uses Nelson–Oppen style theory combination sharing equalities without explicitly introducing auxiliary variables, and that each lemma in the proof belongs to one theory. Subterms in these lemmas containing symbols from a different theory are treated as if they were auxiliary variables. We further assume that there is a theory-specific interpolation procedure for the lemmas. In this paper, we do not have the assumption that theories are equality-interpolating. We introduce quantifiers in the interpolants for such theories. However, our approach can also be combined with equality-interpolating theories and corresponding procedures to avoid quantifiers, see Sect. 6.

*CNF Transformation and Quantifiers.* We assume that complex input formulas are transformed to CNF by Tseitin-encoding, which introduces Boolean proxy atoms. Existentially quantified variables are replaced with Skolem constants or functions (if nested under a universal quantifier) and conjunctions are lifted over universal quantifiers. Complex subformulas under a universal quantifier are replaced by uninterpreted predicates, taking as arguments the quantified variables. Quantified Tseitin-style axioms give the meaning for these predicates. Thus, we end up with quantified clauses of the form ∀x̄. ℓ<sub>1</sub>(x̄) ∨ ··· ∨ ℓ<sub>n</sub>(x̄), which we treat as a proxy literal. Instances of quantified clauses are created using instantiation lemmas of the form ¬(∀x̄. ℓ<sub>1</sub>(x̄) ∨ ··· ∨ ℓ<sub>n</sub>(x̄)) ∨ ℓ<sub>1</sub>(t̄) ∨ ··· ∨ ℓ<sub>n</sub>(t̄), where t̄ are ground terms. Note that the proxy atom for a quantified formula occurs only positively in input clauses and negated in instantiation lemmas. We note that all preprocessing steps are done locally for each input formula, and that auxiliary predicates and Skolem functions are fresh predicate or function symbols. An interpolant of the preprocessed formulas is also an interpolant of the original formulas, because the auxiliary symbols are not shared between different input formulas and will never appear in the interpolant.

*Proofs.* A *resolution proof* for the unsatisfiability of a formula in CNF is a derivation of the empty clause ⊥ using the resolution rule

$$\frac{C\_1 \lor \ell \qquad C\_2 \lor \neg \ell}{C\_1 \lor C\_2}$$

where C<sub>1</sub> and C<sub>2</sub> are clauses, and ℓ is a literal called the *pivot* (literal). A resolution proof can be represented by a tree, or more generally, if the same subproof is used more than once, by a directed acyclic graph (DAG). In our setting, the DAG has three types of leaves: *input clauses*, *theory lemmas*, i.e., clauses that are valid in the theory T, and *instantiation lemmas* of the form ¬(∀x̄. φ(x̄)) ∨ φ(t̄). The inner nodes are clauses obtained by resolution, and the unique root node is the empty clause ⊥.

Binary interpolants can be computed from a resolution proof by computing so-called partial interpolants for each clause. Each proof step proves a clause C as a consequence of the input A ∧ B, hence it proves that A ∧ B ∧ ¬C is unsatisfiable. If each literal in the proof is assigned to, or *coloured* with, either partition A or B, a *partial interpolant* for each intermediate step is the interpolant of A ∧ ¬C ⇂ A and B ∧ ¬C ⇂ B, where the projection ¬C ⇂ A extracts from the conjunction ¬C all literals that are coloured with partition A. McMillan showed for propositional logic that partial interpolants (cf. Definition 2 in [18]) can be computed recursively for each resolution step as the disjunction of the partial interpolants of the antecedents if the pivot is coloured as A, and their conjunction if it is coloured as B.
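The recursive combination rule can be tried out on a tiny propositional proof. The sketch below is an illustration only, not code from SMTInterpol: formulas are nested tuples, all names are hypothetical, and the rule for leaves (the disjunction of the B-coloured literals for a clause from A, the conjunction of the negated A-coloured literals for a clause from B) is the usual propositional one and is assumed here rather than taken from the text above.

```python
from itertools import product

def leaf(clause, origin, colour):
    """Partial interpolant of an input clause taken from partition `origin`."""
    if origin == "A":
        # Disjunction of the B-coloured literals of the clause.
        return ("or", [("lit", a, pol) for a, pol in sorted(clause) if colour[a] == "B"])
    # Conjunction of the negated A-coloured literals of the clause.
    return ("and", [("lit", a, not pol) for a, pol in sorted(clause) if colour[a] == "A"])

def resolve(left, right, pivot, colour):
    """Resolve on `pivot` (positive in `left`, negative in `right`)."""
    (c1, i1), (c2, i2) = left, right
    assert (pivot, True) in c1 and (pivot, False) in c2
    resolvent = (c1 - {(pivot, True)}) | (c2 - {(pivot, False)})
    op = "or" if colour[pivot] == "A" else "and"  # disjunction for A-pivots
    return resolvent, (op, [i1, i2])

def evaluate(formula, env):
    if formula[0] == "lit":
        return env[formula[1]] == formula[2]
    values = [evaluate(sub, env) for sub in formula[1]]
    return any(values) if formula[0] == "or" else all(values)

def atoms(formula):
    if formula[0] == "lit":
        return {formula[1]}
    return set().union(*(atoms(sub) for sub in formula[1]))

def holds(clauses, env):
    return all(any(env[a] == pol for a, pol in c) for c in clauses)

# A = p and (not p or q);  B = (not q or r) and not r.  The shared atom q is coloured B.
colour = {"p": "A", "q": "B", "r": "B"}
A = [frozenset({("p", True)}), frozenset({("p", False), ("q", True)})]
B = [frozenset({("q", False), ("r", True)}), frozenset({("r", False)})]

nodes_a = [(c, leaf(c, "A", colour)) for c in A]
nodes_b = [(c, leaf(c, "B", colour)) for c in B]
step1 = resolve(nodes_a[0], nodes_a[1], "p", colour)            # derives q
step2 = resolve(step1, nodes_b[0], "q", colour)                 # derives r
empty, interpolant = resolve(step2, nodes_b[1], "r", colour)    # derives the empty clause

def is_interpolant(formula):
    """Brute-force check: implied by A and inconsistent with B."""
    envs = [dict(zip("pqr", bits)) for bits in product([False, True], repeat=3)]
    implied_by_a = all(evaluate(formula, e) for e in envs if holds(A, e))
    contradicts_b = not any(evaluate(formula, e) and holds(B, e) for e in envs)
    return implied_by_a and contradicts_b
```

In this example the computed formula is logically equivalent to q, the only shared atom. The brute-force check confirms the two semantic conditions, and `atoms(interpolant)` confirms the symbol condition.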

### **4 Colouring of Terms and Literals**

In this section, we fix an interpolation problem (V, E, F), with partitions P ⊆ V. We use the following example to illustrate our interpolation algorithm.

*Example 1 (Running example).* Take the tree interpolation problem with nodes V = {123, 1, 23, 2, 3} and edges E = {(1, 123), (23, 123), (2, 23), (3, 23)} (see also Fig. 1), where the partitions P = {1, 2, 3} are labelled with F(p) ≡ φ<sub>p</sub> where

$$
\phi\_1 \equiv \forall x. \; g(h(x)) \le x, \quad \phi\_2 \equiv \forall y. \; g(y) \ge b, \quad \phi\_3 \equiv \forall z. \; f(g(z)) \ne f(b).
$$

The conjunction of the three formulas is unsatisfiable. Instantiating φ<sub>1</sub> with b gives g(h(b)) ≤ b. Instantiating φ<sub>2</sub> with h(b) gives g(h(b)) ≥ b. Together they imply g(h(b)) = b. However, this contradicts φ<sub>3</sub> instantiated with h(b). This proof creates, among others, the new literal g(h(b)) ≤ b. The term g(h(b)) contains function symbols that do not occur in a common partition.
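To make the shape of such a proof concrete, the argument above can be written as a set of leaf clauses, following the conventions of Sect. 3. This is one possible set of leaves under the stated instantiations; the proof actually produced by a solver may differ. Besides the unit input clauses φ<sub>1</sub>, φ<sub>2</sub> and φ<sub>3</sub> (proxy literals), it uses three instantiation lemmas and two theory lemmas:

$$\begin{aligned}
&\neg \phi\_1 \lor g(h(b)) \le b && \text{(instantiation of } \phi\_1 \text{ with } b\text{)}\\
&\neg \phi\_2 \lor g(h(b)) \ge b && \text{(instantiation of } \phi\_2 \text{ with } h(b)\text{)}\\
&\neg \phi\_3 \lor f(g(h(b))) \ne f(b) && \text{(instantiation of } \phi\_3 \text{ with } h(b)\text{)}\\
&g(h(b)) < b \lor g(h(b)) = b \lor g(h(b)) > b && \text{(trichotomy)}\\
&g(h(b)) \ne b \lor f(g(h(b))) = f(b) && \text{(congruence)}
\end{aligned}$$

Resolving the input clauses with the instantiation lemmas yields the three ground instances. Since g(h(b)) ≤ b is the negation of g(h(b)) > b and g(h(b)) ≥ b is the negation of g(h(b)) < b, resolution with the trichotomy lemma yields g(h(b)) = b, the congruence lemma then yields f(g(h(b))) = f(b), and a final resolution step with the instance of φ<sub>3</sub> derives the empty clause.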

We recall that by *symb*(F(p)), we denote the uninterpreted function symbols occurring in the formula F(p). We also keep track of the partitions where a symbol occurs:

**Definition 2 (Partitions).** *The* partitions of a function symbol f *are the partitions where this symbol occurs:*

$$ partitions(f) = \{ p \in P \mid f \in symb(F(p)) \}.$$

McMillan's interpolation algorithm assumes that all symbols of a literal occur in one partition, such that the literal can be coloured with that partition. This is no longer the case in SMT, because new literals are created during the proof search, especially in the presence of instantiation lemmas. Our solution to this problem is to split each literal into many smaller literals and assign each of them to a partition. To keep the presentation simple, we flatten all (non-proxy) literals using a fresh variable for each application term. Thus, for every term t occurring in the resolution proof, we create a fresh variable v<sub>t</sub> and associate with it a set of flattening equalities. In each literal, the top-level terms are replaced with their associated variable, and the defining equalities are conjoined.

**Definition 3 (Flattening).** *For a term* t*, we introduce a fresh variable* v<sub>t</sub>*, and similarly for all its subterms. The associated set of flattening equalities FlatEQ*(t) *is defined as follows:*

$$FlatEQ(t) = \{v\_{f(t\_1, \ldots, t\_n)} = f(v\_{t\_1}, \ldots, v\_{t\_n}) \mid f(t\_1, \ldots, t\_n) \text{ is a subterm of } t\}.$$

*The flattened version of a literal is*

$$flatten(\ell) \equiv \begin{cases} v\_{t\_1} = v\_{t\_2} & \text{if } \ell \equiv t\_1 = t\_2\\ c\_1 \cdot v\_{t\_1} + \dots + c\_n \cdot v\_{t\_n} \le c & \text{if } \ell \equiv c\_1 \cdot t\_1 + \dots + c\_n \cdot t\_n \le c \end{cases}$$

*and the associated set of flattening equalities is as follows*

$$FlatEQ(\ell) = \begin{cases} FlatEQ(t\_1) \cup FlatEQ(t\_2) & \text{if } \ell \equiv t\_1 = t\_2\\ FlatEQ(t\_1) \cup \dots \cup FlatEQ(t\_n) & \text{if } \ell \equiv c\_1 \cdot t\_1 + \dots + c\_n \cdot t\_n \le c. \end{cases}$$

*The flattened version of a negated literal is the negation of the flattened literal, i.e., flatten*(¬ℓ) ≡ ¬*flatten*(ℓ)*. The set of flattening equalities for a negated literal is the set of flattening equalities for the literal itself, i.e., FlatEQ*(¬ℓ) = *FlatEQ*(ℓ)*.*

The conjunction of the equalities in *FlatEQ*(t) implies that v<sub>t</sub> = t. Similarly, the conjunction *flatten*(ℓ) ∧ *FlatEQ*(ℓ) implies the literal ℓ and is equisatisfiable to ℓ. Proxy literals like quantified formulas are not flattened, as they will never occur in a partial interpolant. For such a proxy literal, *flatten*(∀x.φ(x)) ≡ ∀x.φ(x) and *FlatEQ*(∀x.φ(x)) = ∅.

*Example 2 (Flattening).* Consider the literal g(h(b)) ≤ b. Its flattened version is *flatten*(g(h(b)) ≤ b) ≡ v<sub>g(h(b))</sub> ≤ v<sub>b</sub>, and the set of flattening equalities is

$$\begin{aligned} FlatEQ(g(h(b)) \le b) &= FlatEQ(g(h(b))) \cup FlatEQ(b) \\ &= \{v\_{g(h(b))} = g(v\_{h(b)}), v\_{h(b)} = h(v\_b), v\_b = b\}. \end{aligned}$$
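Flattening as in Definition 3 is a short recursion over the term structure. The sketch below represents a ground term as a nested tuple whose first component is the head symbol and renders the fresh variable v<sub>t</sub> as the string `v[t]`; both choices, as well as the function names, are assumptions made for illustration. Only literals of the form s ≤ t are handled.

```python
def show(term):
    """Render a term such as ("g", ("h", ("b",))) as the string g(h(b))."""
    head, *args = term
    return head if not args else f"{head}({','.join(show(a) for a in args)})"

def var(term):
    """Name of the fresh variable associated with a term."""
    return f"v[{show(term)}]"

def flat_eq(term):
    """FlatEQ(t): one defining equality per subterm of t, as a set of strings."""
    head, *args = term
    rhs = head if not args else f"{head}({','.join(var(a) for a in args)})"
    equalities = {f"{var(term)} = {rhs}"}
    for arg in args:
        equalities |= flat_eq(arg)
    return equalities

def flatten_leq(lhs, rhs):
    """Flattened version and flattening equalities of the literal lhs <= rhs."""
    return f"{var(lhs)} <= {var(rhs)}", flat_eq(lhs) | flat_eq(rhs)
```

Applied to the terms g(h(b)) and b, `flatten_leq` returns the flattened literal `v[g(h(b))] <= v[b]` together with the three equalities listed in Example 2.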

To define partial interpolants, we colour each atom with some partition, denoted by *colour*(ℓ) ∈ P. The negated atom always has the same colour. For proxy atoms created during the CNF conversion, it is important to colour them with the input partition from which they were created. The colour of other literals can be chosen arbitrarily, but a good heuristic would choose a partition where most of the outermost function symbols occur. Each flattening equality is associated with all partitions where the corresponding function symbol occurs. The *projection* of auxiliary equations on a partition p, denoted by *FlatEQ*(ℓ) ⇂ p, is defined as the conjunction of the equalities (v<sub>f(t<sub>1</sub>,...,t<sub>n</sub>)</sub> = f(v<sub>t<sub>1</sub></sub>, ..., v<sub>t<sub>n</sub></sub>)) ∈ *FlatEQ*(ℓ) where p ∈ *partitions*(f).

Finally, we define the projection of a literal ℓ to a partition p. The *projection kernel* ℓ ⇂<sup>−</sup> p is *flatten*(ℓ) if p = *colour*(ℓ), and ⊤ otherwise. The *projection of ℓ to* p is defined as ℓ ⇂ p ≡ ℓ ⇂<sup>−</sup> p ∧ *FlatEQ*(ℓ) ⇂ p. We define the projection to a set of partitions ℓ ⇂ A with A ⊆ P (and similarly ℓ ⇂<sup>−</sup> A) as the conjunction of all projections ℓ ⇂ p with p ∈ A. For a conjunction of literals F ≡ ℓ<sub>1</sub> ∧ ··· ∧ ℓ<sub>n</sub>, we define F ⇂ p ≡ ℓ<sub>1</sub> ⇂ p ∧ ··· ∧ ℓ<sub>n</sub> ⇂ p, and similarly for F ⇂ A, F ⇂<sup>−</sup> p and F ⇂<sup>−</sup> A.

*Example 3 (Projection of literals).* Consider again the literal g(h(b)) ≤ b from our running example (Example 1), and assume that we arbitrarily assign it to partition 2, i.e., *colour*(g(h(b)) ≤ b) = 2. We have *partitions*(g) = {1, 2, 3}, *partitions*(h) = {1} and *partitions*(b) = {2, 3}. The projections are hence:

$$\begin{aligned} (g(h(b)) \le b) \downharpoonright 1 &\equiv v\_{g(h(b))} = g(v\_{h(b)}) \land v\_{h(b)} = h(v\_b) \\ (g(h(b)) \le b) \downharpoonright 2 &\equiv v\_{g(h(b))} \le v\_b \land v\_{g(h(b))} = g(v\_{h(b)}) \land v\_b = b \\ (g(h(b)) \le b) \downharpoonright 3 &\equiv v\_{g(h(b))} = g(v\_{h(b)}) \land v\_b = b \end{aligned}$$
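The three projections of Example 3 can be recomputed mechanically from the colour of the literal and the partitions of its function symbols. The sketch below again encodes terms as nested tuples and returns each projection as a list of conjuncts, where a kernel equal to ⊤ is simply omitted; the encoding, the names and the restriction to literals of the form s ≤ t are assumptions made for illustration.

```python
def show(term):
    head, *args = term
    return head if not args else f"{head}({','.join(show(a) for a in args)})"

def subterms(term):
    yield term
    for arg in term[1:]:
        yield from subterms(arg)

def partitions(symbol, symbols_of):
    """Partitions in which an uninterpreted function symbol occurs (Definition 2)."""
    return {p for p, symbols in symbols_of.items() if symbol in symbols}

def project(lhs, rhs, colour, p, symbols_of):
    """Conjuncts of the projection of the literal lhs <= rhs to partition p."""
    conjuncts = []
    if p == colour:
        # The projection kernel is the flattened literal only for the literal's colour.
        conjuncts.append(f"v[{show(lhs)}] <= v[{show(rhs)}]")
    seen = []
    for t in list(subterms(lhs)) + list(subterms(rhs)):
        if t in seen or p not in partitions(t[0], symbols_of):
            continue
        seen.append(t)
        args = ",".join(f"v[{show(a)}]" for a in t[1:])
        conjuncts.append(f"v[{show(t)}] = {t[0]}" + (f"({args})" if args else ""))
    return conjuncts

# Uninterpreted symbols of the three formulas of the running example.
SYMBOLS = {1: {"g", "h"}, 2: {"g", "b"}, 3: {"f", "g", "b"}}
GHB, B_TERM = ("g", ("h", ("b",))), ("b",)
```

Calling `project(GHB, B_TERM, 2, p, SYMBOLS)` for p = 1, 2, 3 reproduces the three lines of Example 3. Choosing a different colour only moves the flattened inequality to another partition, while the flattening equalities stay where their function symbols occur.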

Similar to the last paragraph in Sect. 3, we define a partial interpolant of a clause C as an interpolant of the input problem and ¬C. More precisely, it is the tree interpolant of a slightly modified tree interpolation problem, where the projection ¬C ⇂ p is added to each leaf node p ∈ P. Since this step adds flattening variables potentially shared between several partitions, these variables can occur in the interpolants. The following definition accounts for the variables occurring in the projection of a clause.

**Definition 4 (Supported variable).** *We call a variable* $v\_t$ supported by a clause $C$ *if its corresponding term* $t$ *is a subterm of a non-proxy literal in* $C$*.*

The partial tree interpolant of a clause $C$ may then contain a variable $v\_t$ as long as it is supported by the clause $C$.

**Definition 5 (Partial tree interpolant).** *A* partial tree interpolant *for a clause* $C$ *is a tree interpolant as defined in Definition 1 for the tree interpolation problem* $(V, E, F')$ *where the leaves are labelled with* $F'(p) \equiv F(p) \land \neg C \downarrow p$*. For the symbol condition, all variables supported by the clause may occur in all partial interpolants.*

### **5 Interpolation for Quantified Formulas**

In the following, we describe how to compute tree interpolants for instantiation-based resolution proofs. We assume that each literal has been assigned to exactly one partition of the tree interpolation problem, as described in the previous section. Following McMillan's algorithm, we compute partial tree interpolants inductively over the proof tree. The leaves of the proof tree are theory lemmas, for which we use theory-specific interpolation procedures, or they are input clauses or instantiation lemmas, for which we compute partial tree interpolants as described below. The inner nodes are obtained by resolution steps, for which we follow McMillan's algorithm to combine interpolants, and additionally treat variables that violate the symbol condition, as described later in this section.

#### **5.1 Interpolation Algorithm**

We start by explaining how the interpolants for leaf nodes are computed. Our algorithm computes interpolants separately for each node $v \in V$ in the tree interpolation problem. As mentioned in the preliminaries, we set $A = st(v)$ and use $I\_A$ to denote the interpolant $I(v)$.

*Input Clauses.* We assume that each input clause occurs in exactly one partition. The partial tree interpolant for an input clause $C$ from partition $p$ is given by $I\_A \equiv \neg(\neg C \downarrow^- A^c)$ if $p \in A$, and $I\_A \equiv \neg C \downarrow^- A$ if $p \notin A$.

Note that the literals can be assigned to a different partition than the clause. Although it makes sense to assign a literal to the same partition as the input clause it occurs in, this is not possible when the literal occurs in several input clauses. Therefore, the above formulas are not necessarily $\top$ or $\bot$. Proxy literals always have the same colour as the input clause and will therefore never appear in the interpolant.

*Instantiation Lemmas.* The partial tree interpolant for an instantiation lemma $C$ obtained from a quantified input clause $\forall x.\, \phi(x)$ from partition $colour(\forall x.\, \phi(x))$ is computed in the same way as for input clauses.
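The case distinction for input clauses and instantiation lemmas amounts to selecting which literal kernels survive the projection. The following C sketch makes this selection explicit under the assumption that partition sets are bit masks and that a clause has at most 16 literals; `LeafInterpolant` and `leaf_interpolant` are hypothetical names introduced for illustration. The result is read as the conjunction of the flattened negated literals at the returned positions, negated as a whole if `negated` is set; an empty conjunction stands for $\top$.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned PartSet;             /* bit p-1 set iff partition p is in the set */
#define PART(p) (1u << ((p) - 1))

typedef struct {
    bool negated;       /* true: I_A is the negation of the conjunction */
    size_t count;       /* number of literals whose kernel is kept */
    size_t index[16];   /* positions of those literals within the clause */
} LeafInterpolant;

/* colours[i] is the colour of the i-th literal of C; clause_part is the
   partition the clause (or the quantified formula it instantiates) is from. */
LeafInterpolant leaf_interpolant(int clause_part, const int *colours,
                                 size_t n, PartSet A) {
    assert(n <= 16);
    LeafInterpolant itp = { .negated = (A & PART(clause_part)) != 0, .count = 0 };
    for (size_t i = 0; i < n; i++) {
        bool in_A = (A & PART(colours[i])) != 0;
        /* p in A: keep kernels coloured in A^c; p not in A: keep those in A. */
        if (in_A != itp.negated)
            itp.index[itp.count++] = i;
    }
    return itp;
}
```

For the instantiation lemma $\neg\phi\_1 \lor g(h(b)) \le b$ of the running example, with literal colours 1 and 2 and clause partition 1, the sketch keeps only the second literal for $A = \{1\}$ (negated) and for $A = \{2,3\}$ (not negated), and keeps nothing for $A = \{1,2,3\}$, which corresponds to $\bot$.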

*Theory Lemmas.* We only require that for each theory one can compute a partial tree interpolant for its lemmas, or to be more precise, the flattened negated lemmas. Thus, we can reuse any existing procedure. For self-containment, we cover transitivity, congruence, trichotomy and Farkas lemmas, which are the kind of lemmas our solver produces for the theory of equality and linear arithmetic.<sup>1</sup>

<sup>1</sup> Branches in linear integer arithmetic [8] are decisions on inequality literals and are handled by our resolution rule.

For a *transitivity* lemma with the corresponding conflict $\neg C \equiv t\_1 = t\_2 \land \dots \land t\_{n-1} = t\_n \land t\_1 \neq t\_n$ we can ignore the auxiliary equations introduced by flattening the terms, as the projection kernel is also a transitivity lemma. A partial tree interpolant is computed by summarising for each $A$ the chains of the flattened equalities (and, if applicable, the single disequality) that are assigned to a partition $p \in A$. More precisely, let $i\_1 < \dots < i\_m$ be the boundary indices such that $colour(t\_{i\_j - 1} = t\_{i\_j}) \in A$ and $colour(t\_{i\_j} = t\_{i\_j + 1}) \notin A$ or vice versa. Set $i\_1 = 1$ if $t\_1 \neq t\_n$ and $t\_1 = t\_2$ are in different partitions and $i\_m = n$ if $t\_{n-1} = t\_n$ and $t\_1 \neq t\_n$ are in different partitions. If $m = 0$, then all colours of the equalities are in $A$ and the interpolant is $\bot$, or they are all in $A^c$ and the interpolant is $\top$. Otherwise, the interpolant summarises the equalities between the boundary indices that have a colour in $A$: if $colour(t\_1 \neq t\_n) \notin A$, then the interpolant is $I\_A \equiv v\_{i\_1} = v\_{i\_2} \land v\_{i\_3} = v\_{i\_4} \land \dots \land v\_{i\_{m-1}} = v\_{i\_m}$, otherwise the interpolant is $I\_A \equiv v\_{i\_2} = v\_{i\_3} \land \dots \land v\_{i\_{m-2}} = v\_{i\_{m-1}} \land v\_{i\_m} \neq v\_{i\_1}$. Here, $v\_i$ denotes the auxiliary variable introduced for $t\_i$.
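The boundary-index construction for transitivity lemmas can be written down directly. The C sketch below is a minimal illustration under stated assumptions: the caller supplies, for each equality of the chain and for the disequality, only whether its colour lies in $A$, and the interpolant is returned as a list of index pairs. All names (`ItpKind`, `Conjunct`, `transitivity_interpolant`, `MAX_TERMS`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TERMS 64

typedef enum { ITP_FALSE, ITP_TRUE, ITP_CONJUNCTION } ItpKind;

/* One conjunct v_lhs = v_rhs (or v_lhs != v_rhs) over 1-based term indices. */
typedef struct {
    int lhs, rhs;
    bool is_equality;
} Conjunct;

/* eq_in_A[j-1] states whether colour(t_j = t_{j+1}) is in A for j = 1..n-1;
   diseq_in_A states the same for the disequality t_1 != t_n.
   out must have room for n / 2 + 1 conjuncts. */
ItpKind transitivity_interpolant(int n, const bool *eq_in_A, bool diseq_in_A,
                                 Conjunct *out, size_t *out_len) {
    assert(n >= 2 && n <= MAX_TERMS);
    int bound[MAX_TERMS];
    int m = 0;

    /* Boundary indices: terms where the colouring switches between A and A^c. */
    if (diseq_in_A != eq_in_A[0]) bound[m++] = 1;
    for (int j = 2; j <= n - 1; j++)
        if (eq_in_A[j - 2] != eq_in_A[j - 1]) bound[m++] = j;
    if (eq_in_A[n - 2] != diseq_in_A) bound[m++] = n;

    *out_len = 0;
    if (m == 0)                       /* all literals lie on one side */
        return diseq_in_A ? ITP_FALSE : ITP_TRUE;

    /* The literals form a cycle, so m is even. Pair up the boundaries of
       the A-coloured segments; a disequality in A closes the last segment. */
    int first = diseq_in_A ? 1 : 0;
    for (int j = first; j + 1 < m; j += 2)
        out[(*out_len)++] = (Conjunct){ bound[j], bound[j + 1], true };
    if (diseq_in_A)
        out[(*out_len)++] = (Conjunct){ bound[m - 1], bound[0], false };
    return ITP_CONJUNCTION;
}
```

As a usage example, for the conflict $t\_1 = t\_2 \land t\_2 = t\_3 \land t\_1 \neq t\_3$ where only $t\_2 = t\_3$ is coloured outside $A$, the sketch returns the single conjunct $v\_3 \neq v\_2$, which follows from the $A$-part and contradicts the remaining equality.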

The flattened version of the conflict corresponding to a *congruence* lemma $C \equiv f(t\_1,\dots,t\_n) = f(s\_1,\dots,s\_n) \lor t\_1 \neq s\_1 \lor \dots \lor t\_n \neq s\_n$ is

$$\begin{aligned} v\_{f(t\_1,\dots,t\_n)} &\neq v\_{f(s\_1,\dots,s\_n)} \land v\_{t\_1} = v\_{s\_1} \land \dots \land v\_{t\_n} = v\_{s\_n} \\ \land v\_{f(t\_1,\dots,t\_n)} &= f(v\_{t\_1},\dots,v\_{t\_n}) \land v\_{f(s\_1,\dots,s\_n)} = f(v\_{s\_1},\dots,v\_{s\_n}) \\ \land \bigwedge \{\ell \mid \ell \in FlatEQ(t), t \in \{t\_1,\dots,t\_n,s\_1,\dots,s\_n\}\}. \end{aligned}$$

Note that the formula is still a congruence conflict if we drop the last line. Consequently, the flattening equalities for the arguments of the $f$-applications, and for their subterms, are not needed in the computation of a partial interpolant; they only establish the implication between the flattened and the original lemma. To obtain a partial tree interpolant, we first choose an arbitrary partition $p\_f \in partitions(f)$. The partial tree interpolant is computed as follows.

$$I\_A \equiv \begin{cases} \neg(\neg C \downarrow^- A^c) & \text{if } p\_f \in A\\ \neg C \downarrow^- A & \text{otherwise} \end{cases}$$

For a *trichotomy* lemma $C \equiv t\_1 = t\_2 \lor t\_1 > t\_2 \lor t\_1 < t\_2$, both $I\_A \equiv \neg C \downarrow^- A$ and $I\_A \equiv \neg(\neg C \downarrow^- A^c)$ are partial interpolants. We can always choose the projection that contains at most one literal.

A *Farkas* lemma has the form $C \equiv \neg(s\_1 \le b\_1) \lor \dots \lor \neg(s\_n \le b\_n)$ where $s\_i$ is of the form $c\_{i1} \cdot v\_1 + \dots + c\_{im} \cdot v\_m$ and $b\_i, c\_{ij}$ are numeric (integer) constants. It is a valid lemma if there are Farkas coefficients (numeric integer constants) $k\_1,\dots,k\_n > 0$ with $\sum\_{i=1}^{n} k\_i \cdot s\_i = 0$ and $\sum\_{i=1}^{n} k\_i \cdot b\_i < 0$. We assume that the lemma is flattened and all $v\_i$ are variables. The flattening equalities can be omitted from the lemma without changing its validity. For a set of partitions $A$, we denote by $L\_A := \{i \mid colour(s\_i \le b\_i) \in A\}$ the indices where $s\_i \le b\_i$ is $A$-local. The partial tree interpolant for a Farkas lemma is computed by summing up the $A$-local literals multiplied by their Farkas coefficients. We obtain $I\_A \equiv (\sum\_{i \in L\_A} k\_i \cdot s\_i) \le (\sum\_{i \in L\_A} k\_i \cdot b\_i)$. Variables whose coefficients sum to zero are removed from the inequality. If $A$ contains all inequalities, they sum up to the conflict $0 \le \sum\_{i=1}^{n} k\_i \cdot b\_i$ and we set $I\_A \equiv \bot$.
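The weighted sum for Farkas lemmas is simple to compute. The C sketch below assumes a dense row-major coefficient matrix, machine integers without overflow, and bit-mask partition sets; the function name `farkas_interpolant` and its parameter layout are hypothetical choices for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned PartSet;             /* bit p-1 set iff partition p is in the set */
#define PART(p) (1u << ((p) - 1))

/* Sums the A-local rows of a flattened Farkas conflict.
   coeff is row-major with n rows (inequalities) and m columns (variables):
   row i encodes s_i = coeff[i*m]*v_1 + ... + coeff[i*m + m-1]*v_m <= bound[i].
   On return, out_coeff and out_bound describe I_A as "sum <= bound".
   Returns true if every inequality is A-local; I_A is then set to false. */
bool farkas_interpolant(size_t n, size_t m, const long *coeff, const long *bound,
                        const long *k, const int *colour, PartSet A,
                        long *out_coeff, long *out_bound) {
    assert(n > 0);
    size_t local = 0;
    *out_bound = 0;
    for (size_t j = 0; j < m; j++)
        out_coeff[j] = 0;
    for (size_t i = 0; i < n; i++) {
        assert(k[i] > 0);                       /* Farkas coefficients are positive */
        if ((A & PART(colour[i])) == 0)
            continue;                           /* inequality is not A-local */
        local++;
        for (size_t j = 0; j < m; j++)
            out_coeff[j] += k[i] * coeff[i * m + j];
        *out_bound += k[i] * bound[i];
    }
    return local == n;
}
```

As a hypothetical usage example (not from the paper), take the conflict $x - y \le 0$, $y - z \le 0$, $z - x \le -1$ with all coefficients $k\_i = 1$, where the first two inequalities are coloured 1 and the third is coloured 2. For $A = \{1\}$ the sketch returns the coefficients $(1, 0, -1)$ and the bound $0$, i.e. $x - z \le 0$; the variable $y$ has coefficient zero and drops out.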

**Theorem 1.** *The interpolants as defined in this section are valid partial tree interpolants for the respective leaf nodes.*

The proof for this theorem is a straight-forward case distinction over the type of leaf node. Details can be found in [14].

*Resolution Steps.* In a resolution step, we obtain the partial interpolant of the resolvent using the partial interpolants of the premises.

$$\frac{C\_1 \lor \ell : I\_A^1 \qquad C\_2 \lor \neg \ell : I\_A^2}{C\_1 \lor C\_2 : I\_A^3}$$


As the first step, we follow McMillan's algorithm and combine the interpolants of the premises either with $\lor$ or with $\land$ depending on whether the pivot literal is $A$-local or $A^c$-local. For tree interpolants, this is done separately for each node of the tree interpolation problem, and a literal is seen as $A$-local if its colour is one of the leaves in the subtree of the node.

$$I\_A^3 \equiv \begin{cases} I\_A^1 \lor I\_A^2 & \text{if } colour(\ell) \in A \\ I\_A^1 \land I\_A^2 & \text{if } colour(\ell) \notin A \end{cases}$$

The formula $I\_A^3$ computed above may still contain variables supported by the antecedents that are no longer supported by the resolvent $C\_1 \lor C\_2$. Each of those *unsupported* variables must either be replaced by its definition or bound by a quantifier in the partial tree interpolant. More precisely, let $v\_t$ be an unsupported variable such that $t$ is not a subterm of any other term $t'$ with $v\_{t'} \in FreeVars(I\_A^3)$. Such a variable always exists, as there is always an outermost unsupported variable. Let $t = f(t\_1,\dots,t\_n)$. We replace $I\_A^3$ as follows:

$$I\_A^3 \equiv \begin{cases} \exists x. I\_A^3 \{ v\_t \mapsto x \} & \text{if } f \text{ is } A\text{-local, i.e., } partitions(f) \subseteq A, \\ \forall x. I\_A^3 \{ v\_t \mapsto x \} & \text{if } f \text{ is } A^c\text{-local, i.e., } partitions(f) \cap A = \emptyset, \\ I\_A^3 \{ v\_t \mapsto f(v\_{t\_1}, \dots, v\_{t\_n}) \} & \text{if } f \text{ is shared (otherwise)}. \end{cases}$$

We do this repeatedly for all variables in $FreeVars(I\_A^3)$ that are unsupported. The variables may be treated in any order that respects the partial order induced by the subterm relation as described above. However, all interpolants of the tree interpolant must use the same order.
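Both parts of a resolution step are decided by simple set tests on partitions. The following C sketch shows only these decisions, not the formula manipulation itself, and assumes that partition sets are bit masks; the names `resolution_connective` and `eliminate_unsupported` are hypothetical.

```c
#include <assert.h>

typedef unsigned PartSet;             /* bit p-1 set iff partition p is in the set */
#define PART(p) (1u << ((p) - 1))

typedef enum { CONN_OR, CONN_AND } Connective;
typedef enum { BIND_EXISTS, BIND_FORALL, SUBSTITUTE_DEFINITION } Elimination;

/* Step 1: disjunction if the pivot's colour lies in A, conjunction otherwise. */
Connective resolution_connective(int pivot_colour, PartSet A) {
    assert(pivot_colour >= 1);
    return (A & PART(pivot_colour)) ? CONN_OR : CONN_AND;
}

/* Step 2: treatment of an unsupported variable v_t with t = f(...),
   decided by comparing partitions(f) with A. */
Elimination eliminate_unsupported(PartSet symbol_parts, PartSet A) {
    assert(symbol_parts != 0);
    if ((symbol_parts & ~A) == 0)
        return BIND_EXISTS;            /* f is A-local: partitions(f) within A */
    if ((symbol_parts & A) == 0)
        return BIND_FORALL;            /* f is A^c-local: no overlap with A */
    return SUBSTITUTE_DEFINITION;      /* f is shared */
}
```

With the symbol partitions of the running example and $A = \{1\}$, the sketch substitutes the definition for $g$ (shared), binds the variable for $h$ existentially ($A$-local), and binds the variable for $b$ universally ($A^c$-local).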

**Theorem 2.** *If* $I\_A^1$ *is a partial tree interpolant of* $C\_1 \lor \ell$ *and* $I\_A^2$ *is a partial tree interpolant of* $C\_2 \lor \neg\ell$*, then* $I\_A^3$ *as computed above, after the removal of unsupported variables, is a partial tree interpolant of* $C\_1 \lor C\_2$*.*

The proof for this theorem is given in [14].

*Example 4 (Resolution).* Take the running example and suppose $\ell \equiv g(h(b)) = b$ is the pivot, $I\_{\{1\}}^1 \equiv v\_{g(h(b))} \le v\_b$ and $I\_{\{1\}}^2 \equiv \top$. The interpolants are combined as $I\_{\{1\}}^1 \land I\_{\{1\}}^2$ since $colour(\ell) \notin \{1\}$. This results in the interpolant $v\_{g(h(b))} \le v\_b$. After the resolution step, we assume that $v\_{g(h(b))}$, $v\_{h(b)}$, $v\_b$ are no longer supported. The outermost variable is $v\_{g(h(b))}$, which must be replaced by its definition: $g(v\_{h(b)}) \le v\_b$. Now $v\_{h(b)}$ is bound by a quantifier, and since $h$ only occurs in partition 1, an existential quantifier is used: $\exists y.\, g(y) \le v\_b$. In the final step, $v\_b$ is bound by a universal quantifier since $b$ does not occur in 1, yielding $\forall x. \exists y.\, g(y) \le x$.

Note that the order of eliminating variables is important. If $v\_b$ had been chosen in the first step despite occurring in $FlatEQ(g(h(b)))$, the resulting formula would have been $\exists y. \forall x.\, g(y) \le x$. This formula is not logically equivalent and is indeed not a valid interpolant, as it does not follow from $\forall x.\, g(h(x)) \le x$.

**Fig. 2.** Resolution proof for Example 1 with input clauses, instantiation lemmas, theory lemmas, and resolvents.

**Theorem 3.** *The algorithm in this section computes valid tree interpolants from a proof of unsatisfiability.*

*Proof.* By induction, every node in the proof tree is labelled by a valid partial tree interpolant: Theorem 1 is the base case and Theorem 2 the inductive step. The proof of unsatisfiability ends with the empty clause and its partial interpolant is a tree interpolant for the original problem.

#### **5.2 Full Interpolation Example**

We illustrate the algorithm on our running example (Example 1). Consider the tree interpolation problem given in Fig. 1. The symbol $b$ occurs in partitions 2 and 3, $f$ in 3, $g$ in 1, 2, and 3, and $h$ in 1. Our goal is to compute tree interpolants $I\_{\{1\}}$, $I\_{\{2\}}$, and $I\_{\{3\}}$ for the leaf nodes such that $\phi\_1$ implies $I\_{\{1\}}$, $\phi\_2$ implies $I\_{\{2\}}$, and $\phi\_3$ implies $I\_{\{3\}}$, and tree interpolant $I\_{\{2,3\}}$ such that $I\_{\{2,3\}}$ is implied by $I\_{\{2\}} \land I\_{\{3\}}$, and $I\_{\{1\}} \land I\_{\{2,3\}}$ implies $\bot$.

Figure 2 shows an instantiation-based resolution proof for the unsatisfiability of $\phi\_1 \land \phi\_2 \land \phi\_3$. First, we assign each literal occurring in the proof tree to exactly one partition. We colour each proxy literal for a quantified formula by a partition in which it occurs, e.g., $colour(\forall x.\, g(h(x)) \le x) = 1$. For the other literals, we can choose arbitrary colours. We assign the literals $g(h(b)) = b$, $g(h(b)) \le b$, and $g(h(b)) \ge b$ to partition 2, and the literal $f(g(h(b))) = f(b)$ to partition 3. We then compute for each literal $\ell$ the projection onto each partition, i.e., $\ell \downarrow p\_i$. For $\ell \equiv g(h(b)) \le b$ assigned to partition 2, the projections are given in Example 3. As $g(h(b)) \ge b$ and $g(h(b)) = b$ are assigned to the same partition as $\ell$ and only differ in the comparison operator, their projections only differ in the comparison operator of the flattened version of the original literal. For the remaining literal $f(g(h(b))) = f(b)$, we get the following projections:

$$\begin{aligned} f(g(h(b))) = f(b) \downarrow 1 &\equiv v\_{g(h(b))} = g(v\_{h(b)}) \land v\_{h(b)} = h(v\_b) \\ f(g(h(b))) = f(b) \downarrow 2 &\equiv v\_{g(h(b))} = g(v\_{h(b)}) \land v\_b = b \\ f(g(h(b))) = f(b) \downarrow 3 &\equiv v\_{f(g(h(b)))} = v\_{f(b)} \land v\_{f(g(h(b)))} = f(v\_{g(h(b))}) \\ &\phantom{{}\equiv{}} \land v\_{g(h(b))} = g(v\_{h(b)}) \land v\_{f(b)} = f(v\_b) \land v\_b = b \end{aligned}$$

We now compute partial tree interpolants for each node in the proof tree. The first input clause $C \equiv \phi\_1$ on the top left of the proof tree is from partition 1. The partial interpolants $I\_{\{1\}}$ and $I\_{\{1,2,3\}}$ are set to $\neg(\neg C \downarrow^- A^c) \equiv \bot$, and $I\_{\{2\}}$, $I\_{\{3\}}$, and $I\_{\{2,3\}}$ are set to $\neg C \downarrow^- A \equiv \top$. For the input clauses $\phi\_2$ and $\phi\_3$, the interpolants are computed analogously. To summarise:

$$\begin{array}{l|ccccc} & I\_{\{1\}} & I\_{\{2\}} & I\_{\{3\}} & I\_{\{2,3\}} & I\_{\{1,2,3\}} \\ \hline \phi\_1 & \bot & \top & \top & \top & \bot \\ \phi\_2 & \top & \bot & \top & \bot & \bot \\ \phi\_3 & \top & \top & \bot & \bot & \bot \end{array}$$

We now compute the partial tree interpolants for the instantiation lemma on the top right of the proof tree. As for the input clauses, we set $I\_{\{1\}}$ to $\neg(\neg C \downarrow^- A^c)$, i.e., to $\neg(\neg C \downarrow^- 2 \land \neg C \downarrow^- 3) \equiv v\_{g(h(b))} \le v\_b$. Analogously, we compute all other partial tree interpolants for the three instantiation lemmas:

$$\begin{array}{l|ccccc} & I\_{\{1\}} & I\_{\{2\}} & I\_{\{3\}} & I\_{\{2,3\}} & I\_{\{1,2,3\}} \\ \hline \neg\phi\_1 \lor g(h(b)) \le b & v\_{g(h(b))} \le v\_b & v\_{g(h(b))} > v\_b & \top & v\_{g(h(b))} > v\_b & \bot \\ \neg\phi\_2 \lor g(h(b)) \ge b & \top & \bot & \top & \bot & \bot \\ \neg\phi\_3 \lor f(g(h(b))) \neq f(b) & \top & \top & \bot & \bot & \bot \end{array}$$

For the trichotomy lemma, the partial tree interpolants can be set to $\neg C \downarrow^- A$ or $\neg(\neg C \downarrow^- A^c)$. Due to our colouring, all literals in the lemma are either in $A$ or in $A^c$. To get the simplest partial interpolants, we set $I\_{\{1\}}$ and $I\_{\{3\}}$ to $\neg C \downarrow^- A \equiv \top$, and $I\_{\{2\}}$ and $I\_{\{2,3\}}$ to $\neg(\neg C \downarrow^- A^c) \equiv \bot$:

$$\begin{array}{l|ccccc} & I\_{\{1\}} & I\_{\{2\}} & I\_{\{3\}} & I\_{\{2,3\}} & I\_{\{1,2,3\}} \\ \hline \text{trichotomy lemma} & \top & \bot & \top & \bot & \bot \end{array}$$

For the congruence lemma, we have $p\_f = 3$. The partial tree interpolants $I\_{\{1\}}$ and $I\_{\{2\}}$ are set to $\neg C \downarrow^- A$ as $p\_f \notin A$ for these partitions. We get $I\_{\{1\}} \equiv \top$ (neither of the flattened literals in $\neg C$ is contained in the projection kernel) and $I\_{\{2\}} \equiv v\_{g(h(b))} = v\_b$, since we chose 2 as the colour of this literal. Similarly, $I\_{\{3\}}$ and $I\_{\{2,3\}}$ are set to $\neg(\neg C \downarrow^- A^c)$. We get $I\_{\{3\}} \equiv v\_{g(h(b))} \neq v\_b$ and $I\_{\{2,3\}} \equiv \bot$:

$$\begin{array}{l|ccccc} & I\_{\{1\}} & I\_{\{2\}} & I\_{\{3\}} & I\_{\{2,3\}} & I\_{\{1,2,3\}} \\ \hline g(h(b)) \neq b \lor f(g(h(b))) = f(b) & \top & v\_{g(h(b))} = v\_b & v\_{g(h(b))} \neq v\_b & \bot & \bot \end{array}$$

Having computed the partial tree interpolants for all leaves in the proof tree, we now compute the partial tree interpolants for each resolvent. If the colour of the pivot literal is in the $A$-part, i.e., $colour(\ell) \in A$, the partial tree interpolant of the resolvent is the disjunction of the partial tree interpolants of its antecedents. Otherwise, if $colour(\ell) \in A^c$, we build the conjunction of the partial tree interpolants of its antecedents. In the resolution step for the resolvent clause $C\_3 \equiv g(h(b)) \le b$, the pivot literal is assigned to partition 1, i.e., $colour(\forall x.\, g(h(x)) \le x) = 1$. To obtain $I\_{\{1\}}$, we hence build the disjunction of the partial interpolants of the antecedents $C\_1 \equiv \forall x.\, g(h(x)) \le x$ and $C\_2 \equiv \neg(\forall x.\, g(h(x)) \le x) \lor g(h(b)) \le b$, so we get $I\_{\{1\}} \equiv I\_{\{1\}}^1 \lor I\_{\{1\}}^2 \equiv v\_{g(h(b))} \le v\_b$. Similarly, we obtain $I\_{\{2\}}$, $I\_{\{3\}}$ and $I\_{\{2,3\}}$ by conjoining the respective partial interpolants. Since the top-left interpolant is only $\top$ or $\bot$ and the colouring of the pivot literal ensures that we either build the conjunction with $\top$ or the disjunction with $\bot$, the resulting tree interpolant of the resolvent is the same as for the top-right clause. The variables $v\_{g(h(b))}$ and $v\_b$ are both supported by $C\_3$ and thus allowed to appear in the partial interpolant. The resolution steps of the other inner nodes are very similar in that their partial interpolants always equal the partial interpolant of one of their antecedents.

The last resolution step is a bit more involved. We have already computed the tree interpolant for partition 1 in Example 4 as $I\_{\{1\}} \equiv \forall x. \exists y.\, g(y) \le x$. For partition 2, the disjunction $v\_{g(h(b))} > v\_b \lor v\_{g(h(b))} = v\_b$ can be simplified to $v\_{g(h(b))} \ge v\_b$. The outermost variable $v\_{g(h(b))}$ is then replaced by $g(v\_{h(b)})$, since $g$ occurs in 1 and 2. Then for $v\_{h(b)}$ a universal quantifier is introduced, since $h$ only occurs in partition 1, resulting in $\forall y.\, g(y) \ge v\_b$. Finally, $v\_b$ is replaced by $b$, since it occurs in both 2 and 3. This results in $I\_{\{2\}} \equiv \forall y.\, g(y) \ge b$. We omit the computation of the partial interpolants for partition 3 and the node $\{2,3\}$. The partial tree interpolant computed in this step is the tree interpolant of the full interpolation problem.

### **6 Combination with Equality-Interpolating Theories**

In Sects. 4 and 5, we assign each literal to exactly one partition, such that we can apply McMillan's algorithm to combine partial interpolants of the antecedents to obtain a partial interpolant for the resolvent. In the presence of equality-interpolating theories [25], we can also allow for *mixed* literals where only outermost terms must be assigned to one partition. More precisely, we can allow for equalities $t\_1 = t\_2$ where the left-hand side $t\_1$ is in one partition and the right-hand side $t\_2$ in another, or linear constraints of the form $c\_1 \cdot t\_1 + \dots + c\_n \cdot t\_n \bowtie c\_0$ with constants $c\_i$ and $\bowtie \in \{=, \le, <, \ge, >\}$, where each $t\_i$ is assigned to one partition. Such literals can be treated by applying proof tree preserving tree interpolation [5].

A mixed literal $\ell \equiv t\_1 = t\_2$ is coloured with two colours $p\_1$ and $p\_2$, so that each colour can be chosen to contain the outermost symbols of $t\_1$ and $t\_2$, respectively. The projections are $\ell \downarrow^- p\_1 \equiv v\_{t\_1} = v\_\ell$, $\ell \downarrow^- p\_2 \equiv v\_\ell = v\_{t\_2}$ and for the negated literal $\neg\ell \downarrow^- p\_1 \equiv EQ\_1(v\_\ell, v\_{t\_1})$ and $\neg\ell \downarrow^- p\_2 \equiv EQ\_2(v\_\ell, v\_{t\_2})$, where $v\_\ell$ is a fresh variable and $EQ\_1$, $EQ\_2$ are shared uninterpreted predicates with $\forall x, y.\, \neg(EQ\_1(x, y) \land EQ\_2(x, y))$, that are only used for the interpolation algorithm. The partial interpolants for a lemma containing mixed literals will contain the auxiliary variable $v\_\ell$. If a negated mixed equality occurs in the conflict (the negated lemma), we further require that $v\_\ell$ occurs only in literals of the form $EQ\_i(v\_\ell, s)$ for some shared term $s$. Valid interpolants will naturally have this shape, as the interpolated conflict also contains $v\_\ell$ only as first parameter of an $EQ\_i$. We then introduce a new combination rule in the first part of interpolating resolution steps: For a mixed literal $\ell$, the two interpolants $I^1[EQ\_i(v\_\ell, s)]$ and $I^2(v\_\ell)$ are combined to $I^1[I^2(s)]$, i.e., interpolant $I^2(s)$ replaces the $EQ$-literals occurring in the interpolant $I^1$ to form the resolvent interpolant. This eliminates the variable $v\_\ell$ without introducing a quantifier. The remaining part is unchanged, i.e., we still introduce quantifiers for unsupported flattening variables. A proof that the first step produces a valid resolvent interpolant can be found in [5]. This method produces quantifier-free interpolants if the input formulas were quantifier-free. An example for this method can be found in [13].

### **7 Implementation in SMTInterpol**

We implemented the algorithm in SMTInterpol<sup>2</sup> [6] with a few alterations. First, we used the combination with equality-interpolating theories described in the previous section. Second, we do not apply flattening explicitly. Instead of using an auxiliary variable, the interpolation algorithms for the lemmas include the corresponding term directly. This may yield an interpolant containing symbols that are not allowed, because the auxiliary variable was shared but its corresponding function symbol is local to one partition. Only in that case do we introduce fresh variables for these subterms and replace each offending subterm in the interpolant with its variable. This creates the same interpolants as our presented algorithm, because the latter replaces each variable that stands for a shared function symbol by its definition in the end.

SMTInterpol also supports literals that are shared. If this is done naïvely, the computed interpolants may violate the tree inductivity property (third property in Definition 1). We solve this by treating each literal as occurring in one designated partition when interpolating a lemma (minimizing the number of alternating chains in transitivity lemmas). We then apply Pudlák's resolution rule [21] that has a case for shared literals. Our implementation colours each input literal with all partitions it occurs in. For new terms created in the proof, the colour that matches the most outermost function symbols is chosen. If the term uses only symbols from one partition, then it is coloured with that partition. Equalities and inequalities between terms of different partitions are handled with the equality-interpolating procedure to avoid introducing quantifiers when it is not necessary.

### **8 Conclusion**

We presented a tree interpolation algorithm for SMT formulas with quantifiers. The key idea is to virtually flatten each conflict corresponding to a clause in the resolution proof, such that the literals in the flattened version are non-mixed and can be assigned to the different partitions. The colouring of the original literals can even be chosen arbitrarily. Depending on the assigned colours, partial interpolants may contain flattening variables that bridge different partitions, which eventually must be bound by quantifiers.

Our algorithm computes tree interpolants from a single, non-local proof of unsatisfiability obtained independently of the partitioning of the interpolation problem. It supports quantifiers and arbitrary SMT theories, given that the theory itself supports tree interpolation for its lemmas, and we provided the algorithms for the theory of equality and the theory of linear rational arithmetic.

<sup>2</sup> Official webpage: https://ultimate.informatik.uni-freiburg.de/smtinterpol/ Code available under LGPLv3 at https://github.com/ultimate-pa/smtinterpol.

Correctness proofs for our algorithm are available in [14]. The algorithm is implemented in the open-source SMT solver SMTInterpol.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Proving Termination of C Programs with Lists**

Jera Hensel and Jürgen Giesl

LuFG Informatik 2, RWTH Aachen University, Aachen, Germany {hensel,giesl}@informatik.rwth-aachen.de

**Abstract.** There are many techniques and tools to prove termination of C programs, but up to now these tools were not very powerful for fully automated termination proofs of programs whose termination depends on recursive data structures like lists. We present the first approach that extends powerful techniques for termination analysis of C programs (with memory allocation and explicit pointer arithmetic) to lists.

### **1 Introduction**

In [11,16,17,25], we introduced an approach for automatic termination analysis of C that also handles programs whose termination relies on the relation between allocated memory addresses and the data stored at such addresses. This approach is implemented in our tool AProVE [14]. Instead of analyzing C directly, AProVE compiles the program to LLVM code using Clang [9]. Then it constructs a (finite) symbolic execution graph (SEG) such that every program run corresponds to a path through the SEG. AProVE proves memory safety during the construction of the SEG to ensure absence of undefined behavior (which would also allow non-termination). Afterwards, the SEG is transformed into an integer transition system (ITS) such that all paths through the SEG (and hence, the C program) are terminating if the ITS is terminating. To analyze termination of the ITS, AProVE applies standard techniques and calls the tools T2 [7] and LoAT [12,13] to detect non-termination of ITSs. However, like other termination tools for C, up to now AProVE supported dynamic data structures only in a very restricted way.

In this paper, we introduce a novel technique to analyze C programs on lists. In the program below, nondet\_uint returns a random unsigned integer. The for loop creates a list of n random numbers if n > 0. The while loop traverses this list via pointer arithmetic: Starting with tail, it computes the address of the next field of the current element by adding the offset of the next field within a list to the address of the current element, and then dereferences the computed address (i.e., it reads the content of the next field). The offset is obtained via offsetof, defined in the C header stddef.h.<sup>1</sup> Since the list is acyclic and the next pointer of its last element is the null pointer, list traversal always terminates. Of course, the while loop could also traverse the list via ptr = ptr->next, but in C, memory accesses can be combined with pointer arithmetic. This example contains both the access via curr->next (when initializing the list) and pointer arithmetic (when traversing the list).

```
#include <stddef.h>  // offsetof, NULL
#include <stdlib.h>  // malloc
// returns an arbitrary unsigned integer
extern unsigned int nondet_uint(void);
struct list {
 unsigned int value;
 struct list* next; };
int main() {
 // initialize length
 unsigned int n = nondet_uint();
 // initialize list of length n
 struct list* tail = NULL;
 struct list* curr;
 for (unsigned int k = 0; k < n; k++) {
   curr = malloc(sizeof(struct list));
   curr->value = nondet_uint();
   curr->next = tail;
   tail = curr; }
 // traverse list
 struct list* ptr = tail;
 while(ptr != NULL) {
   ptr = *((struct list**)((void*)ptr +
         offsetof(struct list, next)));}}
```
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 235950644 (Project GI 274/6-2).

© The Author(s) 2023

B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 266–285, 2023. https://doi.org/10.1007/978-3-031-38499-8\_16
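To illustrate the remark that field access and pointer arithmetic reach the same memory, the following minimal sketch traverses a list in both ways and counts its elements. The helper names build\_list, length\_by\_field, length\_by\_offset, and free\_list are hypothetical and not part of the paper's example. The sketch uses char\* instead of void\* for the byte-wise arithmetic, since arithmetic on void\* is a compiler extension, and it fills the value fields with the loop counter in place of nondet\_uint().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct list {
  unsigned int value;
  struct list* next;
};

/* Builds a list of n elements by prepending, as in the for loop of the
   example program. Returns NULL for n == 0. */
struct list* build_list(unsigned int n) {
  struct list* tail = NULL;
  for (unsigned int k = 0; k < n; k++) {
    struct list* curr = malloc(sizeof(struct list));
    assert(curr != NULL);  /* assumption: allocation succeeds */
    curr->value = k;       /* stands in for nondet_uint() */
    curr->next = tail;
    tail = curr;
  }
  return tail;
}

/* Traversal via ordinary field access. */
unsigned int length_by_field(const struct list* ptr) {
  unsigned int len = 0;
  while (ptr != NULL) {
    len++;
    ptr = ptr->next;
  }
  return len;
}

/* Traversal via byte-wise pointer arithmetic on the address of the element. */
unsigned int length_by_offset(const struct list* ptr) {
  unsigned int len = 0;
  while (ptr != NULL) {
    len++;
    ptr = *(struct list* const*)((const char*)ptr +
                                 offsetof(struct list, next));
  }
  return len;
}

/* Releases all elements of the list. */
void free_list(struct list* ptr) {
  while (ptr != NULL) {
    struct list* next = ptr->next;
    free(ptr);
    ptr = next;
  }
}
```

Both traversal functions terminate for the same reason: the list is acyclic and its last next pointer is NULL.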

We present a new general technique to infer *list invariants* via symbolic execution, which express all properties that are crucial for memory safety and termination. In our example, the list invariant contains the information that dereferencing the next pointer in the while loop is safe and that one finally reaches the null pointer. In general, our novel list invariants allow us to abstract from detailed information about lists (e.g., about their intermediate elements) such that abstract states with "similar" lists can be merged and generalized during the symbolic execution in order to obtain finite SEGs. At the same time, list invariants express enough information about the lists (e.g., their length, their start address, etc.) such that memory safety and termination can still be proved.

We define the abstract states used for symbolic execution in Sect. 2. In Sect. 3, after recapitulating the construction of SEGs, we adapt our techniques for merging and generalizing states from [25] to infer list invariants. Moreover, we adapt those rules for symbolic execution that are affected by introducing list invariants. Section 4 discusses the generation of ITSs and the soundness of our approach. Section 5 gives an overview of related work. Moreover, we evaluate the implementation of our approach in the tool AProVE using benchmark sets from *SV-COMP* [3] and the *Termination Competition* [15]. All proofs can be found in [18].

*Limitations.* To ease the presentation, in this paper we treat integer types as unbounded. Moreover, we assume that a program consists of a single non-recursive function and that values may be stored at any address. Our approach can also deal with bitvectors, data alignments, and programs with arbitrarily many (possibly recursive) functions, see [11,16,25] for details. However, so far only lists without sharing can be handled by our new technique. Extending it to more general recursive data structures is one of the main challenges for future work.

### **2 Abstract States for Symbolic Execution**

The LLVM code for the for loop is given below. It is equivalent to the code produced by Clang without optimizations on a 64-bit computer. We explain it in detail in Sect. 3. To ease readability, we omitted instructions and keywords that are irrelevant for our presentation, renamed variables, and wrote list instead of struct.list. Moreover, we gave the C instructions before the corresponding LLVM code. The code consists of several *basic blocks* including cmpF and bodyF (corresponding to the loop comparison and body).

<sup>1</sup> Note that ptr + n increases ptr by n times the size of the type \*ptr. As we want to increase ptr by a number of bytes and ptr is not an i8 pointer, we first cast ptr to void\*. Then the address ((void\*)ptr + offsetof(struct list, next)) contains the next pointer, so we cast our computed address to struct list\*\* before dereferencing it.

We now recapitulate the *abstract states* of [25] used for symbolic execution and extend them by a component $LI$ for list invariants, i.e., they have the form $((\mathtt{b}, i), LV, AL, PT, LI, KB)$. The first component is a *program position* $(\mathtt{b}, i)$, indicating that instruction $i$ of block $\mathtt{b}$ is executed next. $Pos \subseteq (Blks \times \mathbb{N})$ is the set of all program positions, and $Blks$ are all basic blocks.

The second component is a partial injective function $LV : \mathcal{V}\_{\mathcal{P}} \rightharpoonup \mathcal{V}\_{sym}$, which maps *local program variables* $\mathcal{V}\_{\mathcal{P}}$ of the program $\mathcal{P}$ to an infinite set $\mathcal{V}\_{sym}$ of symbolic variables with $\mathcal{V}\_{sym} \cap \mathcal{V}\_{\mathcal{P}} = \emptyset$. We identify $LV$ with the set of equations $\{\mathtt{x} = LV(\mathtt{x}) \mid \mathtt{x} \in domain(LV)\}$ and we often extend $LV$ to a function from $\mathcal{V}\_{\mathcal{P}} \uplus \mathbb{N}$ to $\mathcal{V}\_{sym} \uplus \mathbb{N}$ by defining $LV(n) = n$ for all $n \in \mathbb{N}$.

```
list = type { i32, list* }
define i32 @main() { ...
cmpF:
  ; k < n
  0: k = load i32, i32* k_ad
  1: kltn = icmp ult i32 k, n
  2: br i1 kltn, label bodyF, label initPtr
bodyF:
  ; curr = malloc(sizeof(struct list));
  0: mem = call i8* @malloc(i64 16)
  1: curr = bitcast i8* mem to list*
  ; curr->value = nondet_uint();
  2: nondet = call i32 @nondet_uint()
  3: curr_val = getelementptr list,
                list* curr, i32 0, i32 0
  4: store i32 nondet, i32* curr_val
  ; curr->next = tail;
  5: tail = load list*, list** tail_ptr
  6: curr_next = getelementptr list,
                 list* curr, i32 0, i32 1
  7: store list* tail, list** curr_next
  ; tail = curr;
  8: store list* curr, list** tail_ptr
  ; k++
  9: kinc = add i32 k, 1
  10: store i32 kinc, i32* k_ad
  11: br label cmpF
  ... }
```
The third component of each state is a set $AL$ of (bytewise) allocations $[\![v\_1, v\_2]\!]$ with $v\_1, v\_2 \in \mathcal{V}\_{sym}$, which indicate that $v\_1 \le v\_2$ and that all addresses between $v\_1$ and $v\_2$ have been allocated. We require any two entries $[\![v\_1, v\_2]\!]$ and $[\![w\_1, w\_2]\!]$ from $AL$ with $v\_1 \neq w\_1$ or $v\_2 \neq w\_2$ to be disjoint.
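For concrete addresses, the two requirements on the allocation component (lower bound not above upper bound, distinct entries disjoint) can be checked directly. The following minimal sketch does this for an array of distinct entries. The type alloc and the functions allocs\_disjoint and al\_well\_formed are hypothetical names chosen for illustration; in the actual approach these properties are expressed as formulas over symbolic variables rather than checked on numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One allocation: all addresses start..end (inclusive) are allocated. */
struct alloc {
  uintptr_t start;
  uintptr_t end;
};

/* Two inclusive address ranges are disjoint iff one ends before the
   other starts. */
bool allocs_disjoint(struct alloc a, struct alloc b) {
  return a.end < b.start || b.end < a.start;
}

/* Checks, for an array of n distinct entries, that start <= end holds for
   every entry and that all entries are pairwise disjoint. */
bool al_well_formed(const struct alloc* al, size_t n) {
  assert(al != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (al[i].start > al[i].end) return false;
    for (size_t j = i + 1; j < n; j++) {
      if (!allocs_disjoint(al[i], al[j])) return false;
    }
  }
  return true;
}
```

Two 16-byte list elements at addresses 100 and 200, for instance, pass the check, whereas two ranges that share even a single address do not.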

The fourth and fifth components $PT$ and $LI$ model the memory contents. $PT$ contains "points-to" entries of the form $v\_1 \hookrightarrow\_{\mathtt{ty}} v\_2$ where $v\_1, v\_2 \in \mathcal{V}\_{sym}$ and $\mathtt{ty}$ is an LLVM type, meaning that the address $v\_1$ of type $\mathtt{ty}$ points to $v\_2$. In contrast, the set $LI$ of *list invariants* (which is new compared to [25]) does not describe pointwise memory contents but contains invariants $v\_{ad} \overset{v\_\ell}{\hookrightarrow}\_{\mathtt{ty}} [(\mathit{off}\_i : \mathtt{ty}\_i : v\_i..\hat{v}\_i)]\_{i=1}^{n}$ where $n \in \mathbb{N}\_{>0}$, $v\_{ad}, v\_\ell, v\_i, \hat{v}\_i \in \mathcal{V}\_{sym}$, $\mathit{off}\_i \in \mathbb{N}$ for all $1 \le i \le n$, $\mathtt{ty}$ and $\mathtt{ty}\_i$ are LLVM types for all $1 \le i \le n$, and there is exactly one "recursive field" $1 \le j \le n$ such that $\mathtt{ty}\_j = \mathtt{ty}\*$.<sup>2</sup> Such an invariant represents a struct $\mathtt{ty}$ with $n$ fields that corresponds to a recursively defined list of length $v\_\ell$. Here, $v\_{ad}$ points to the first list element, the $i$-th field starts at address $v\_{ad} + \mathit{off}\_i$ (i.e., with offset $\mathit{off}\_i$)<sup>3</sup> and has type $\mathtt{ty}\_i$, and the values of the $i$-th fields of the first and last list element are $v\_i$ and $\hat{v}\_i$, respectively. For example, the following list invariant (1) represents all lists of length $x\_\ell$ and type list whose elements store a 32-bit integer in their first field and the pointer to the next element in their second field with offset 8. The first list element starts at address $x\_{\mathtt{mem}}$, the second starts at address $x\_{\mathtt{next}}$, and the last element contains the null pointer. Moreover, the first element stores the integer value $x\_{\mathtt{nd}}$ and the last list element stores the integer $\hat{x}\_{\mathtt{nd}}$.

$$x\_{\mathtt{mem}} \overset{x\_\ell}{\hookrightarrow}\_{\mathtt{list}} [(0 : \mathtt{i32} : x\_{\mathtt{nd}}..\hat{x}\_{\mathtt{nd}}),\ (8 : \mathtt{list}\* : x\_{\mathtt{next}}..0)] \tag{1}$$

<sup>2</sup> Soundness of our approach is not affected if there are other recursive fields, but our symbolic execution technique for list traversal on list invariants in Sect. 3.2.2 can only be applied if the traversal is done along field j.

<sup>3</sup> The field offsets can be computed using the data layout string in the LLVM program.

For example, this invariant represents the list with the allocation $[\![x\_{\mathtt{mem}}, x\_{\mathtt{mem}}+15]\!]$, where the first four bytes store the integer 5 and the last eight bytes store the pointer $x\_{\mathtt{next}}$, and the allocation $[\![x\_{\mathtt{next}}, x\_{\mathtt{next}}+15]\!]$, where the first four bytes store the integer 2 and the last eight bytes store the null pointer (i.e., the address 0). Here, we have $x\_\ell = 2$. Section 3.2.2 will show that the expressiveness of our list invariants is indeed needed to prove termination of programs that traverse a list.
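The two-element list from this example can be written down concretely to see which data the invariant retains. The following sketch computes, for a non-empty NULL-terminated list, the length, the value fields of the first and last element, and the next pointer of the first element. The struct list\_summary and the function summarize are hypothetical names introduced for illustration only; the concrete offsets 0 and 8 from invariant (1) are not checked here, since they depend on the 64-bit data layout assumed in the paper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list {
  unsigned int value;
  struct list* next;
};

/* Concrete counterpart of the information kept by invariant (1). */
struct list_summary {
  size_t length;             /* corresponds to x_l */
  unsigned int first_value;  /* corresponds to x_nd */
  unsigned int last_value;   /* corresponds to \hat{x}_nd */
  const struct list* second; /* corresponds to x_next */
};

/* Summarizes a non-empty, acyclic, NULL-terminated list. Returns false
   for the empty list, which no list invariant represents. */
bool summarize(const struct list* head, struct list_summary* out) {
  assert(out != NULL);
  if (head == NULL) return false;
  out->length = 0;
  out->first_value = head->value;
  out->second = head->next;
  for (const struct list* p = head; p != NULL; p = p->next) {
    out->length++;
    out->last_value = p->value;  /* overwritten until the last element */
  }
  return true;
}
```

For the list storing 5 and then 2, the summary has length 2, first value 5, last value 2, and a non-null pointer to the second element. The values of intermediate elements of longer lists are deliberately not recorded, which mirrors the abstraction performed by list invariants.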

The last component of a state is a *knowledge base* $KB$ of quantifier-free first-order formulas that express integer arithmetic properties of $\mathcal{V}\_{sym}$. We identify *sets* of first-order formulas $\{\varphi\_1, \ldots, \varphi\_m\}$ with their conjunction $\varphi\_1 \wedge \ldots \wedge \varphi\_m$.

A special state *ERR* is reached if we cannot prove absence of undefined behavior (e.g., if memory safety might be violated by dereferencing the null pointer).

As an example, the following abstract state (2) represents concrete states at the beginning of the block cmpF, where the program variable curr is assigned the symbolic variable $x\_{\mathtt{mem}}$, the allocation $[\![x\_{\mathtt{k\\_ad}}, x^{end}\_{\mathtt{k\\_ad}}]\!]$ consisting of 4 bytes stores the value $x\_{\mathtt{kinc}}$, and $x\_{\mathtt{mem}}$ points to the first element of a list of length $x\_\ell$ (equal to $x\_{\mathtt{kinc}}$) that satisfies the list invariant (1). (This state will later be obtained during the symbolic execution, see State O in Fig. 3 in Sect. 3.1.)

$$\begin{array}{l} ((\mathtt{cmpF}, 0),\ \{\mathtt{curr} = x\_{\mathtt{mem}},\ \mathtt{kinc} = x\_{\mathtt{kinc}},\ \ldots\},\ \{[\![x\_{\mathtt{k\\_ad}}, x^{end}\_{\mathtt{k\\_ad}}]\!],\ \ldots\},\ \{x\_{\mathtt{k\\_ad}} \hookrightarrow\_{\mathtt{i32}} x\_{\mathtt{kinc}},\ \ldots\}, \\ \ \{x\_{\mathtt{mem}} \overset{x\_\ell}{\hookrightarrow}\_{\mathtt{list}} [(0 : \mathtt{i32} : x\_{\mathtt{nd}}..\hat{x}\_{\mathtt{nd}}),\ (8 : \mathtt{list}\* : x\_{\mathtt{next}}..0)]\},\ \{x^{end}\_{\mathtt{k\\_ad}} = x\_{\mathtt{k\\_ad}} + 3,\ x\_\ell = x\_{\mathtt{kinc}},\ \ldots\}) \end{array} \tag{2}$$

A state $s = (p, LV, AL, PT, LI, KB)$ is *represented by a formula* $\langle s \rangle$ which contains $KB$ and encodes $AL$, $PT$, and $LI$ in first-order logic. This allows us to use standard SMT solving for all reasoning during the construction of the SEG. Moreover, $\langle s \rangle$ is also used for the generation of the ITS afterwards. The encoding of $AL$ and $PT$ is as in [25], see [18]: $\langle s \rangle$ contains formulas which express that allocated addresses are positive, that allocations represent disjoint memory areas, that equal addresses point to equal values, and that addresses are different if they point to different values. For each element of $LI$, we add the following new formulas to $\langle s \rangle$ which express that the list length $v\_\ell$ is $\ge 1$ and the address $v\_{ad}$ of the first element is not null. If $v\_\ell = 1$, then the values $v\_i$ and $\hat{v}\_i$ of the fields in the first and the last element are equal. If $v\_\ell \ge 2$, then the next pointer $v\_j$ in the first element must not be null. Finally, if there is a field whose values $v\_k$ and $\hat{v}\_k$ differ in the first and the last element, then the length $v\_\ell$ must be $\ge 2$.

$$\begin{array}{l}
\{\, v\_\ell \ge 1 \wedge v\_{ad} \ge 1 \mid (v\_{ad} \overset{v\_\ell}{\hookrightarrow}\_{\mathtt{ty}} [(\mathit{off}\_i : \mathtt{ty}\_i : v\_i..\hat{v}\_i)]\_{i=1}^{n}) \in LI \,\} \;\cup \\
\{\, \bigwedge\_{i=1}^{n} v\_i = \hat{v}\_i \mid (v\_{ad} \overset{v\_\ell}{\hookrightarrow}\_{\mathtt{ty}} [(\mathit{off}\_i : \mathtt{ty}\_i : v\_i..\hat{v}\_i)]\_{i=1}^{n}) \in LI \text{ and } \models \langle s \rangle \Rightarrow v\_\ell = 1 \,\} \;\cup \\
\{\, v\_j \ge 1 \mid (v\_{ad} \overset{v\_\ell}{\hookrightarrow}\_{\mathtt{ty}} [(\mathit{off}\_i : \mathtt{ty}\_i : v\_i..\hat{v}\_i)]\_{i=1}^{n}) \in LI \text{ with } \mathtt{ty}\_j = \mathtt{ty}\* \text{ and } \models \langle s \rangle \Rightarrow v\_\ell \ge 2 \,\} \;\cup \\
\{\, v\_\ell \ge 2 \mid (v\_{ad} \overset{v\_\ell}{\hookrightarrow}\_{\mathtt{ty}} [(\mathit{off}\_i : \mathtt{ty}\_i : v\_i..\hat{v}\_i)]\_{i=1}^{n}) \in LI \text{ and } \exists\, k \in \mathbb{N}\_{>0},\ k \le n, \text{ s.t. } \models \langle s \rangle \Rightarrow v\_k \neq \hat{v}\_k \,\}
\end{array}$$
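As a worked instance, the four formula sets can be spelled out for the list invariant (1), where $n = 2$, the recursive field is $j = 2$, the first-element values are $x\_{\mathtt{nd}}$ and $x\_{\mathtt{next}}$, and the last-element values are $\hat{x}\_{\mathtt{nd}}$ and $0$. Here $\langle s \rangle$ denotes the formula representing the state $s$ that contains the invariant. This is only an instantiation of the definition above, not an additional encoding:

$$\begin{array}{ll}
x\_\ell \ge 1 \wedge x\_{\mathtt{mem}} \ge 1 & \text{(always)} \\
x\_{\mathtt{nd}} = \hat{x}\_{\mathtt{nd}} \wedge x\_{\mathtt{next}} = 0 & \text{if } \models \langle s \rangle \Rightarrow x\_\ell = 1 \\
x\_{\mathtt{next}} \ge 1 & \text{if } \models \langle s \rangle \Rightarrow x\_\ell \ge 2 \\
x\_\ell \ge 2 & \text{if } \models \langle s \rangle \Rightarrow x\_{\mathtt{nd}} \neq \hat{x}\_{\mathtt{nd}} \text{ or } \models \langle s \rangle \Rightarrow x\_{\mathtt{next}} \neq 0
\end{array}$$

For a one-element list, the first and the last element coincide, so the next pointer of the first element is the null pointer stored in the last element.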

In *concrete* states $c$, all values of variables and memory contents are determined uniquely. To ease the formalization, we assume that all integers are unsigned and refer to [16] for the general case. So for all $v \in \mathcal{V}\_{sym}(c)$ (i.e., all $v \in \mathcal{V}\_{sym}$ occurring in $c$) we have $\models \langle c \rangle \Rightarrow v = n$ for some $n \in \mathbb{N}$. Moreover, here $PT$ only contains information about allocated addresses and $LI = \emptyset$ since the abstract knowledge in list invariants is unnecessary if all memory contents are known.

**Fig. 1.** SEG for the First Iteration of the for Loop

For instance, all concrete states $((\mathtt{cmpF}, 0), LV, AL, PT, \emptyset, KB)$ represented by the state (2) contain $\ell$ allocations of 16 bytes for some $\ell \ge 1$, where in the first four bytes a 32-bit integer is stored and in the last eight bytes the address of the next allocation (or 0, in case of the last allocation) is stored.

See [18] for a formal definition to determine which concrete states are represented by a state $s$. To this end, as in [25] we define a *separation logic* formula $\langle s \rangle\_{SL}$ which also encodes the knowledge contained in the memory components of states. To extend this formula to list invariants, we use a fragment similar to *quantitative* separation logic [4], extending conventional separation logic by list predicates. For any state $s$, we have $\models \langle s \rangle\_{SL} \Rightarrow \langle s \rangle$, i.e., $\langle s \rangle$ is a weakened version of $\langle s \rangle\_{SL}$ that we use for symbolic execution and the termination proof.

### **3 Symbolic Execution with List Invariants**

We first recapitulate the construction of SEGs. Then, Sect. 3.1 extends the technique for *merging* and generalization of states from [25] to infer list invariants. Finally, we adapt the rules for symbolic execution to list invariants in Sect. 3.2.

Our symbolic execution starts with a state A at the first instruction of the first block (called entry in our example). Figure 1 shows the first iteration of the for loop. Dotted arrows indicate that we omitted some symbolic execution steps. For every state, we perform symbolic execution by applying the corresponding inference rule as in [25] to compute its successor state(s) and repeat this until all paths end in return states. We call an SEG with this property *complete*.

As an example, we recapitulate the inference rule for the load instruction in the case where a value is loaded from allocated and initialized memory. It loads the value of type $\mathtt{ty}$ that is stored at the address $\mathtt{ad}$ to the program variable $\mathtt{x}$. Let $size(\mathtt{ty})$ denote the size of $\mathtt{ty}$ in bytes for any LLVM type $\mathtt{ty}$. If we can prove that there is an allocation $[\![v\_1, v\_2]\!]$ containing all addresses $LV(\mathtt{ad}), \ldots, LV(\mathtt{ad}) + size(\mathtt{ty}) - 1$ and there exists an entry $(w\_1 \hookrightarrow\_{\mathtt{ty}} w\_2) \in PT$ such that $w\_1$ is equal to the address $LV(\mathtt{ad})$ loaded from, then we transform the state $s$ at position $p = (\mathtt{b}, i)$ to a state $s'$ at position $p^{+} = (\mathtt{b}, i+1)$. In $s'$, a fresh symbolic variable $w$ is assigned to $\mathtt{x}$ and $w = w\_2$ is added to $KB$. We write $LV[\mathtt{x} := w]$ for the function where $LV[\mathtt{x} := w](\mathtt{x}) = w$ and $LV[\mathtt{x} := w](\mathtt{y}) = LV(\mathtt{y})$ for all $\mathtt{y} \neq \mathtt{x}$.
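The prose description above can be arranged as an inference rule. The following is a sketch reconstructed from that description only; the layout and the phrasing of the side conditions are assumptions and may differ from the formulation in [25]. Here $[\![v\_1, v\_2]\!]$ denotes an allocation, $\hookrightarrow\_{\mathtt{ty}}$ a points-to entry, and $\langle s \rangle$ the formula representing the state $s$:

$$\begin{array}{l}
\textbf{load from initialized allocated memory} \quad (p : \mathtt{x = load\ ty,\ ty\*\ ad}) \\[4pt]
\dfrac{s = (p, LV, AL, PT, LI, KB)}{s' = (p^{+}, LV[\mathtt{x} := w], AL, PT, LI, KB \cup \{w = w\_2\})} \quad \text{if} \\[8pt]
\bullet\ \text{there is } [\![v\_1, v\_2]\!] \in AL \text{ with } \models \langle s \rangle \Rightarrow (v\_1 \le LV(\mathtt{ad}) \wedge LV(\mathtt{ad}) + size(\mathtt{ty}) - 1 \le v\_2), \\
\bullet\ \text{there is } (w\_1 \hookrightarrow\_{\mathtt{ty}} w\_2) \in PT \text{ with } \models \langle s \rangle \Rightarrow w\_1 = LV(\mathtt{ad}), \\
\bullet\ w \in \mathcal{V}\_{sym} \text{ is fresh.}
\end{array}$$

The memory components $AL$, $PT$, and $LI$ are unchanged, since a load only reads memory.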


In our example, the entry block comprises the first three lines of the C program and the initialization of the pointer to the loop variable k: First, a non-deterministic unsigned integer is assigned to n, i.e., $(\mathtt{n} = v\_{\mathtt{n}}) \in LV^{B}$, where $v\_{\mathtt{n}}$ is not restricted. Moreover, memory for the pointers tail\_ptr and k\_ad is allocated and they point to tail = NULL and k = 0, respectively (tail\_ptr = $v\_{\mathtt{tp}}$ and k\_ad = $v\_{\mathtt{k\\_ad}}$ with $(v\_{\mathtt{tp}} \hookrightarrow\_{\mathtt{list}\*} 0), (v\_{\mathtt{k\\_ad}} \hookrightarrow\_{\mathtt{i32}} 0) \in PT^{B}$). For simplicity, in Fig. 1 we use concrete values directly instead of introducing fresh variables for them. Since we assume a 64-bit architecture, tail\_ptr's allocation contains 8 bytes. For the integer value of k, only 4 bytes are allocated. Alignments and pointer sizes depend on the memory layout and are given in the LLVM program.

State C results from B by evaluating the load instruction at (cmpF, 0), see the above load rule. Thus, there is an *evaluation edge* from B to C.

The next instruction is an integer comparison whose Boolean return value depends on whether the unsigned value of k is less than the one of n. If we cannot decide the validity of a comparison, we refine the state into two successor states. Thus, the states D and E (with $(v\_{\mathtt{n}} > 0) \in KB^{D}$ and $(v\_{\mathtt{n}} \le 0) \in KB^{E}$) are reached by *refinement edges* from State C. Evaluating D yields kltn = 1 in F. Therefore, the branch instruction leads to the block bodyF in State G. State E is evaluated to a state with kltn = 0. This path branches to the block initPtr and terminates quickly as tail\_ptr points to an empty list.

The instruction at (bodyF, 0) allocates 16 bytes of memory starting at $v\_{\mathtt{mem}}$ in State H. The next instruction casts the pointer to the allocation from i8\* to list\* and assigns it to curr. Now the allocated area can be treated as a list element. Then nondet\_uint() is invoked to assign a 32-bit integer to nondet.

**Fig. 2.** Second Iteration of the for Loop

The getelementptr instruction computes the address of the integer field of the list element by indexing this field (the second i32 0) based on the start address (curr). The first index (i32 0) specifies that a field of \*curr itself is computed and not of another list stored after \*curr. Since the address of the integer value of the list element coincides with the start address of the list element, this instruction assigns $v\_{\mathtt{mem}}$ to curr\_val. Afterwards, the value of nondet is stored at curr\_val ($v\_{\mathtt{mem}} \hookrightarrow\_{\mathtt{i32}} v\_{\mathtt{nd}}$), the value 0 stored at $v\_{\mathtt{tp}}$ is loaded to tail, and a second getelementptr instruction computes the address of the recursive field of the current list element ($v\_{\mathtt{cn}} = v\_{\mathtt{mem}} + 8$) and assigns it to curr\_next, leading to State J. In the path to K, the values of tail and curr are stored at curr\_next and tail\_ptr, respectively ($v\_{\mathtt{cn}} \hookrightarrow\_{\mathtt{list}\*} 0$, $v\_{\mathtt{tp}} \hookrightarrow\_{\mathtt{list}\*} v\_{\mathtt{mem}}$). Finally, the incremented value of k is assigned to kinc and stored at k\_ad ($v\_{\mathtt{k\\_ad}} \hookrightarrow\_{\mathtt{i32}} 1$).
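The address arithmetic performed by the two getelementptr instructions can be mimicked in C. The following sketch computes, for a base address and an element index, the addresses of the value field and of the next field. The function names gep\_value and gep\_next are hypothetical, and the sketch relies on sizeof and offsetof of the host compiler rather than on the LLVM data layout string; on the 64-bit layout assumed in the paper these are 16 bytes for the struct and offsets 0 and 8 for the two fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list {
  unsigned int value;
  struct list* next;
};

/* Address computed by "getelementptr list, list* base, i32 idx, i32 0":
   field 0 (value) of the idx-th struct stored at base. */
uintptr_t gep_value(uintptr_t base, size_t idx) {
  return base + idx * sizeof(struct list) + offsetof(struct list, value);
}

/* Address computed by "getelementptr list, list* base, i32 idx, i32 1":
   field 1 (next) of the idx-th struct stored at base. */
uintptr_t gep_next(uintptr_t base, size_t idx) {
  return base + idx * sizeof(struct list) + offsetof(struct list, next);
}
```

With index 0, as in the running example, the address of the value field coincides with the start address of the element, which is why the symbolic execution assigns the start address itself to curr\_val.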

To ensure a finite graph construction, when a program position is reached for the second time, we try to merge the states at this position to a *generalized* state. However, this is only meaningful if the domains of the $LV$ functions of the two states coincide (i.e., the states consider the same program variables). Therefore, after the branch from the loop body back to cmpF (see State L in Fig. 2), we evaluate the loop a second time and reach M. Here, a second list element with value $w\_{\mathtt{nd}}$ and a next pointer $w\_{\mathtt{cn}}$ pointing to $v\_{\mathtt{mem}}$ has been stored at a new allocation $[\![w\_{\mathtt{mem}}, w^{end}\_{\mathtt{mem}}]\!]$. Now, curr points to the new element and k has been incremented again, so k\_ad points to 2.

#### **3.1 Inferring List Invariants and Generalization of States**

As mentioned, our goal is to merge L and M to a more general state O that represents all states which are represented by L or M. The challenging part during generalization is to find loop invariants automatically that always hold at this position and provide sufficient information to prove termination of the loop. For O, we can neither use the information that curr points to a struct whose next field contains the null pointer (as in L), nor that its next field points to another struct whose next field contains the null pointer (as in M).

With the approach of [25], when merging states like L and M where a list has different lengths, the merged state would only contain those list elements that are allocated in both states (often this is only the first element). Then elements which are the null pointer in one but not in the other state are lost. Hence, proving memory safety (and thus, also termination) fails when the list is traversed afterwards, since now there might be next pointers to non-allocated memory.

We solve this problem by introducing *list invariants*. In our example, we will infer an invariant stating that curr points to a list of length $x\_\ell \ge 1$. This invariant also implies that all struct fields are allocated and that there is no sharing.

To this end, we adapt the merging heuristic from [25]. To merge two states $s$ and $s'$ at the same program position with $domain(LV^{s}) = domain(LV^{s'})$, we introduce a fresh symbolic variable $x\_{\mathtt{var}}$ for each program variable $\mathtt{var}$ and use instantiations $\mu\_s$ and $\mu\_{s'}$ which map $x\_{\mathtt{var}}$ to the corresponding symbolic variables of $s$ and $s'$. For the merged state $\overline{s}$, we set $LV^{\overline{s}}(\mathtt{var}) = x\_{\mathtt{var}}$. Moreover, we identify corresponding variables that only occur in the memory components and extend $\mu\_s$ and $\mu\_{s'}$ accordingly. In a second step, we check which constraints from the memory components and the knowledge base hold in both states in order to find invariants that we can add to the memory components and the knowledge base of $\overline{s}$. For example, if $[\![\mu\_s(x), \mu\_s(x^{end})]\!] \in AL^{s}$ and $[\![\mu\_{s'}(x), \mu\_{s'}(x^{end})]\!] \in AL^{s'}$ for $x, x^{end} \in \mathcal{V}\_{sym}$, then $[\![x, x^{end}]\!]$ is added to $AL^{\overline{s}}$. To extend this heuristic to lists, we have to regard several memory entries together. If there is an $\mathtt{ad} \in \mathcal{V}\_{\mathcal{P}}$ such that $\mu\_s(x\_{\mathtt{ad}}) = v^{start}\_1$ and $\mu\_{s'}(x\_{\mathtt{ad}}) = w^{start}\_1$ both point to lists of type $\mathtt{ty}$ but of different lengths $\ell\_s \neq \ell\_{s'}$ with $\ell\_s, \ell\_{s'} \ge 1$, then we create a list invariant.

For a state $s$ we say that $v^{start}\_1$ *points to a list of type* $\mathtt{ty}$ *with* $n$ *fields and length* $\ell\_s$ *with allocations* $[\![v^{start}\_k, v^{end}\_k]\!]$ *and values* $v\_{k,i}$ (for $1 \le k \le \ell\_s$ and $1 \le i \le n$) if the following conditions (*a*)–(*d*) hold:


Condition (a) states that $\mathtt{ty}$ is a list type with $n$ fields, where the pointer to the next element is in the $j$-th field. In (b) we ensure that each list element has a unique allocation of the correct size where $v^{start}\_1$ is the start address of the first allocation. Condition (c) requires that for the $k$-th element, the initial address plus the $i$-th offset points to a value $v\_{k,i}$ of type $\mathtt{ty}\_i$. Finally, (d) states that the recursive field of each element indeed points to the initial address of the next element.

Then, for fresh $x\_\ell, x\_i, \hat{x}\_i \in \mathcal{V}\_{sym}$, we add the following list invariant to $LI^{\overline{s}}$.

$$x\_{\mathtt{ad}} \overset{x\_\ell}{\hookrightarrow}\_{\mathtt{ty}} [(\mathit{off}\_i : \mathtt{ty}\_i : x\_i..\hat{x}\_i)]\_{i=1}^{n} \tag{3}$$

To ensure that the allocations expressed by the list invariant are disjoint from all allocations in $AL^{\overline{s}}$, we do not use the list allocations $[\![v^{start}\_k, v^{end}\_k]\!]$ to infer generalized allocations in $AL^{\overline{s}}$. Similarly, to create $PT^{\overline{s}}$, we only use entries $v \hookrightarrow\_{\mathtt{ty}} w$ from $PT^{s}$ and $PT^{s'}$ where $v$ is disjoint from the list addresses, i.e., where $\models \langle s \rangle \Rightarrow v < v^{start}\_k \vee v > v^{end}\_k$ holds for all $1 \le k \le \ell\_s$ and analogously for $s'$. Moreover, we add formulas to $KB^{\overline{s}}$ stating that (A) the length $x\_\ell$ of the list is at least the smaller length of the merged lists, (B) $x\_\ell$ is equal to all variables $x$ which result from merging variables $v$ and $w$ that are equal to the lengths $\ell\_s$ and $\ell\_{s'}$ in $s$ and $s'$, and (C) the symbolic variable $x\_i$ for the value of the $i$-th field of the first list element is equal to all variables $x$ with $\mu\_s(x) = v\_{1,i}$ and $\mu\_{s'}(x) = w\_{1,i}$ where $v\_{1,i}$ and $w\_{1,i}$ are the values of the $i$-th field of the first list element in $s$ and $s'$ (and analogously for the values $\hat{x}\_i$ of the last list element):

$$\begin{array}{ll}
\text{(A)} & \min(\ell\_s, \ell\_{s'}) \le x\_\ell \\[4pt]
\text{(B)} & \bigwedge\_{x \in \mu\_s^{-1}(v) \cap \mu\_{s'}^{-1}(w)} x\_\ell = x \quad \text{for all } v, w \in \mathcal{V}\_{sym} \text{ with } \models \langle s \rangle \Rightarrow v = \ell\_s \text{ and } \models \langle s' \rangle \Rightarrow w = \ell\_{s'} \\[4pt]
\text{(C)} & \bigwedge\_{x \in \mu\_s^{-1}(v\_{1,i}) \cap \mu\_{s'}^{-1}(w\_{1,i})} x\_i = x \ \text{ and } \bigwedge\_{x \in \mu\_s^{-1}(v\_{\ell\_s,i}) \cap \mu\_{s'}^{-1}(w\_{\ell\_{s'},i})} \hat{x}\_i = x \quad \text{for all } 1 \le i \le n.
\end{array}$$

**Fig. 3.** Merging of States

To identify the variables in the list invariant (3) of $\overline{s}$ with the corresponding values in $s$ and $s'$, the instantiations $\mu\_s$ and $\mu\_{s'}$ are extended such that $\mu\_s(x\_\ell) = \ell\_s$, $\mu\_{s'}(x\_\ell) = \ell\_{s'}$, $\mu\_s(x\_i) = v\_{1,i}$, $\mu\_{s'}(x\_i) = w\_{1,i}$, $\mu\_s(\hat{x}\_i) = v\_{\ell\_s,i}$, and $\mu\_{s'}(\hat{x}\_i) = w\_{\ell\_{s'},i}$ for all $1 \le i \le n$. Similarly, if there already exist list invariants in $s$ and $s'$, for each pair of corresponding variables a new variable is introduced and mapped to its origin by $\mu\_s$ and $\mu\_{s'}$. This adaptation of the merging heuristic only concerns the result of merging but not the rules *when* to merge two states. Thus, the same reasoning as in [25] can be used to prove soundness and termination of merging.

In our example, L and M contain lists of length $\ell\_L = 1$ and $\ell\_M = 2$. To ease the presentation, we re-use variables that are known to be equal instead of introducing fresh variables. If $x\_{\mathtt{mem}}$ is the variable for the program variable curr, we have $\mu\_L(x\_{\mathtt{mem}}) = v\_{\mathtt{mem}}$ and $\mu\_M(x\_{\mathtt{mem}}) = w\_{\mathtt{mem}}$. Indeed, $v\_{\mathtt{mem}}$ resp. $w\_{\mathtt{mem}}$ points to a list with values $v\_{k,i}$ resp. $w\_{k,i}$ as defined in (a)–(d): For the type list with $n = 2$, $\mathtt{ty}\_1 = \mathtt{i32}$, $\mathtt{ty}\_2 = \mathtt{list}\*$, $\mathit{off}\_1 = 0$, $\mathit{off}\_2 = 8$, and $j = 2$ (see (a)), we have $[\![v\_{\mathtt{mem}}, v^{end}\_{\mathtt{mem}}]\!] \in AL^{L}$ and $[\![v\_{\mathtt{mem}}, v^{end}\_{\mathtt{mem}}]\!], [\![w\_{\mathtt{mem}}, w^{end}\_{\mathtt{mem}}]\!] \in AL^{M}$, all consisting of $size(\mathtt{list}) = 16$ bytes, see (b). We have $(v\_{\mathtt{mem}} \hookrightarrow\_{\mathtt{i32}} v\_{\mathtt{nd}}), (v\_{\mathtt{cn}} \hookrightarrow\_{\mathtt{list}\*} 0) \in PT^{L}$ with $(v\_{\mathtt{cn}} = v\_{\mathtt{mem}} + 8) \in KB^{L}$ and $(v\_{\mathtt{mem}} \hookrightarrow\_{\mathtt{i32}} v\_{\mathtt{nd}}), (v\_{\mathtt{cn}} \hookrightarrow\_{\mathtt{list}\*} 0), (w\_{\mathtt{mem}} \hookrightarrow\_{\mathtt{i32}} w\_{\mathtt{nd}}), (w\_{\mathtt{cn}} \hookrightarrow\_{\mathtt{list}\*} v\_{\mathtt{mem}}) \in PT^{M}$ with $(v\_{\mathtt{cn}} = v\_{\mathtt{mem}} + 8), (w\_{\mathtt{cn}} = w\_{\mathtt{mem}} + 8) \in KB^{M}$ (see (c)), so the first list element in M points to the second one (see (d)). Therefore, when merging L and M to a new state O (see Fig. 3), the lists are merged to a list invariant of variable length $x\_\ell$ and we add the formulas (A) $1 \le x\_\ell$ and (B) $x\_\ell = x\_{\mathtt{kinc}}$ to $KB^{O}$. By (C), the i32 value of the first element is identified with $x\_{\mathtt{nd}}$, since $\mu\_L(x\_{\mathtt{nd}})$ is equal to the first value of the first list element in L and $\mu\_M(x\_{\mathtt{nd}})$ is equal to the first value of the first list element in M. Similarly, the values of the last list elements are identified with 0, as in L and M.

After merging $s$ and $s'$ to a generalized state $\overline{s}$, we continue symbolic execution from $\overline{s}$. The next time we reach the same program position, we might have to merge the corresponding states again. As described in [25], we use a heuristic for constructing the SEG which ensures that after a finite number of iterations, a state is reached that only represents concrete states that are also represented by an *already existing* (more general) state in the SEG. Then symbolic execution can continue from this more general state instead. So with this heuristic, the construction always ends in a complete SEG or an SEG containing the state *ERR*.

We formalized the concept of "generalization" by a symbolic execution rule in [25]. Here, the state $\overline{s}$ is a generalization of $s$ if the conditions (g1)–(g6) hold.

Condition (g1) prevents cycles consisting only of refinement and generalization edges in the graph. Condition (g2) states that the instantiation $\mu : \mathcal{V}\_{sym}(\overline{s}) \to \mathcal{V}\_{sym}(s) \cup \mathbb{Z}$ maps symbolic variables from the more general state $\overline{s}$ to their counterparts from the more specific state $s$ such that they correspond to the same program variable. Conditions (g3)–(g6) ensure that all knowledge present in $\overline{KB}$, $\overline{AL}$, $\overline{PT}$, and $\overline{LI}$ still holds in $s$ with the applied instantiation.

$$\begin{array}{l}
\textbf{generalization with instantiation } \mu \\[4pt]
\dfrac{s = (p, LV, AL, PT, LI, KB)}{\overline{s} = (p, \overline{LV}, \overline{AL}, \overline{PT}, \overline{LI}, \overline{KB})} \quad \text{if} \\[8pt]
\text{(g1) } s \text{ has an incoming evaluation edge} \\
\text{(g2) } domain(LV) = domain(\overline{LV}) \text{ and } LV(\mathtt{var}) = \mu(\overline{LV}(\mathtt{var})) \text{ for all } \mathtt{var} \in \mathcal{V}\_{\mathcal{P}} \text{ where } LV \text{ and } \overline{LV} \text{ are defined} \\
\text{(g3) } \models \langle s \rangle \Rightarrow \mu(\overline{KB}) \\
\text{(g4) if } [\![x\_1, x\_2]\!] \in \overline{AL} \text{, then } [\![v\_1, v\_2]\!] \in AL \text{ with } \models \langle s \rangle \Rightarrow v\_1 = \mu(x\_1) \wedge v\_2 = \mu(x\_2) \\
\text{(g5) if } (x\_1 \hookrightarrow\_{\mathtt{ty}} x\_2) \in \overline{PT} \text{, then } (v\_1 \hookrightarrow\_{\mathtt{ty}} v\_2) \in PT \text{ with } \models \langle s \rangle \Rightarrow v\_1 = \mu(x\_1) \wedge v\_2 = \mu(x\_2) \\
\text{(g6) } \ldots
\end{array}$$

Condition (g6) is new compared to [25] and takes list invariants into account. So for every list invariant $\overline{l}$ of $\overline{s}$ there is either a corresponding list invariant $l$ in $s$ such that lists represented by $l$ in $s$ are also represented by $\overline{l}$ in $\overline{s}$, or there is a concrete list in $s$ that is represented by $\overline{l}$ in $\overline{s}$. The last condition of the latter case ensures that disjointness between the memory domains of $PT$ and $LI$ is preserved. See [18] for the soundness proof of the extended generalization rule, i.e., that every concrete state represented by $s$ is also represented by $\overline{s}$.

Our merging technique always yields generalizations according to this rule, i.e., the edges from L and M to O in Fig. 3 are generalization edges. Here, one chooses $\mu\_L$ and $\mu\_M$ such that $\mu\_L(x\_{\mathtt{mem}}) = v\_{\mathtt{mem}}$, $\mu\_L(x\_\ell) = 1$, $\mu\_L(x\_{\mathtt{nd}}) = v\_{\mathtt{nd}}$, $\mu\_L(\hat{x}\_{\mathtt{nd}}) = v\_{\mathtt{nd}}$, $\mu\_L(x\_{\mathtt{next}}) = 0$, $\mu\_M(x\_{\mathtt{mem}}) = w\_{\mathtt{mem}}$, $\mu\_M(x\_\ell) = 2$, $\mu\_M(x\_{\mathtt{nd}}) = w\_{\mathtt{nd}}$, $\mu\_M(\hat{x}\_{\mathtt{nd}}) = v\_{\mathtt{nd}}$, and $\mu\_M(x\_{\mathtt{next}}) = v\_{\mathtt{mem}}$. In both cases, all conditions of the second case of (g6) with $\ell\_L = 1$ and $\ell\_M = 2$ are satisfied. With $\mu\_L(x\_{\mathtt{kinc}}) = 1$ resp. $\mu\_M(x\_{\mathtt{kinc}}) = 2$, we also have $\models \langle L \rangle \Rightarrow \mu\_L(x\_\ell) = \mu\_L(x\_{\mathtt{kinc}})$ resp. $\models \langle M \rangle \Rightarrow \mu\_M(x\_\ell) = \mu\_M(x\_{\mathtt{kinc}})$.

**Fig. 4.** Extending a List Invariant

### **3.2 Adapting List Invariants**

To handle and modify list invariants, three of our symbolic execution rules have to be changed. Section 3.2.1 presents a variant of the store rule where the list invariant is *extended* by an element. In Sect. 3.2.2, we adapt the load rule to load values from the first list element and we present a variant of the getelementptr rule for list *traversal*. Soundness of our new rules is proved in [18]. For all other instructions, the symbolic execution rules from [25] remain unchanged.

#### **3.2.1 List Extension**

After merging L and M, symbolic execution continues from the more general state O in Fig. 3. Here, the values of k and kinc and the length of the list are not concrete but any positive (resp. non-negative) value with x<sub>ℓ</sub> = x<sub>kinc</sub> = x<sub>k</sub> + 1. The symbolic execution of O is similar to the steps from B to J in Sect. 3 (see Fig. 1). First, the value x<sub>kinc</sub> stored at k_ad is loaded to k. To distinguish whether k < n still holds, the next state is refined. From the refined state with k < n, we enter the loop body again. A new block ⟦y<sub>mem</sub>, y<sup>end</sup><sub>mem</sub>⟧ of 16 bytes is allocated and y<sub>mem</sub> is assigned to mem and curr. Then, a new unknown value y<sub>nd</sub> is assigned to nondet. The address of the i32 value of the current element (equal to y<sub>mem</sub>) is computed by the first getelementptr instruction of the loop and the value y<sub>nd</sub> of nondet is stored at it. The second getelementptr instruction computes the address y<sub>cn</sub> of the recursive field and results in State P in Fig. 4, where y<sub>cn</sub> = y<sub>mem</sub> + 8 is added to *KB*<sup>P</sup>. Now, store sets the address of the next field to the head of the list created in the previous iteration. Since this instruction extends the list by an element, instead of adding y<sub>cn</sub> ↪<sub>list\*</sub> x<sub>mem</sub> to *PT*<sup>Q</sup>, we extend the list invariant: The length is set to y<sub>ℓ</sub> and identified with x<sub>ℓ</sub> + 1 in *KB*<sup>Q</sup>. The pointer x<sub>mem</sub> to the first element is replaced by y<sub>mem</sub>, while the first recursive field in the list gets the value x<sub>mem</sub>. Since (y<sub>mem</sub> ↪<sub>i32</sub> y<sub>nd</sub>) ∈ *PT*<sup>P</sup>, y<sub>nd</sub> is the value of the first i32 integer in the list. We remove all entries from *PT*<sup>Q</sup> that are already contained in the new list invariant, e.g., y<sub>mem</sub> ↪<sub>i32</sub> y<sub>nd</sub>.

To formalize this adaptation of list invariants, we introduce a modified rule for store in addition to the one in [25]. It handles the case where there is a concrete list at some address v<sub>start</sub>, pa points to the m-th field of this list's first element, one wants to store a value *t* at the address pa, and one already has a list invariant l for the "tail" of the list in the j-th field (if m ≠ j) resp. for the list at the address *t* (if m = j). In all other cases, the ordinary store rule is applied.

More precisely, let the list invariant l describe a list of length v<sub>ℓ</sub> at the address v<sub>ad</sub>. Then l is replaced by a new list invariant l′ which describes the list at the address v<sub>start</sub> after storing t at the address pa. Irrespective of whether m ≠ j or m = j, the resulting list at v<sub>start</sub> has the list at v<sub>ad</sub> as its "tail" and thus, its length v′<sub>ℓ</sub> is v<sub>ℓ</sub> + 1. We prevent sharing of different elements by removing the allocation ⟦v<sub>start</sub>, v<sub>end</sub>⟧ of the list and all points-to information of pointers in ⟦v<sub>start</sub>, v<sub>end</sub>⟧.

**list extension (p : "store ty t, ty\* pa", t ∈ V<sub>P</sub> ∪ ℕ, pa ∈ V<sub>P</sub>)**

$$\frac{s = (p,\, LV,\, AL,\, PT,\, LI,\, KB)}{s' = (p^{+},\, LV,\, AL \setminus \lbrace ⟦v\_{start}, v\_{end}⟧ \rbrace,\, PT',\, LI \setminus \lbrace l \rbrace \cup \lbrace l' \rbrace,\, KB')} \quad \text{if}$$

- there is $l = (v\_{ad} \hookrightarrow^{v\_{\ell}}\_{lty} [(\mathit{off}\_i : lty\_i : w\_i..\hat{w}\_i)]\_{i=1}^{n}) \in LI$ with $lty\_j = lty\ast$
- there is $⟦v\_{start}, v\_{end}⟧ \in AL$ with $\models \langle s \rangle \Rightarrow v\_{end} = v\_{start} + \mathit{size}(lty) - 1$
- there exists $1 \leq m \leq n$ such that $ty = lty\_m$ and $\models \langle s \rangle \Rightarrow LV(\texttt{pa}) = v\_{start} + \mathit{off}\_m$
- $\models \langle s \rangle \Rightarrow v\_{ad} = v\_j$ if $m \neq j$ and $\models \langle s \rangle \Rightarrow v\_{ad} = LV(t)$ if $m = j$
- for all $1 \leq i \leq n$ with $i \neq m$ there exist $v^{start}\_i, v\_i \in \mathcal{V}\_{sym}$ with $\models \langle s \rangle \Rightarrow v^{start}\_i = v\_{start} + \mathit{off}\_i$ and $(v^{start}\_i \hookrightarrow\_{lty\_i} v\_i) \in PT$
- $PT' = \lbrace (x\_1 \hookrightarrow\_{sy} x\_2) \in PT \mid\; \models \langle s \rangle \Rightarrow (v\_{end} < x\_1) \lor (x\_1 + \mathit{size}(sy) - 1 < v\_{start}) \rbrace$
- $l' = (v\_{start} \hookrightarrow^{v'\_{\ell}}\_{lty} [(\mathit{off}\_i : lty\_i : v\_i..\hat{w}\_i)]\_{i=1}^{n})$
- $KB' = KB \cup \lbrace v\_m = LV(t),\; v'\_{\ell} = v\_{\ell} + 1 \rbrace$, where $v\_m, v'\_{\ell}$ are fresh
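The effect of the list extension rule on a single list invariant can be illustrated by a small Python sketch. It is not the AProVE implementation: the classes `Field` and `ListInvariant`, the helper `fresh`, and the representation of knowledge-base facts as strings are all hypothetical simplifications, and the side conditions of the rule (allocation lookup, entailment checks, removal of points-to entries) are assumed to have been checked already.

```python
from dataclasses import dataclass
from itertools import count

_counter = count()

def fresh(prefix):
    """Return a fresh symbolic variable name (hypothetical naming scheme)."""
    return f"{prefix}_{next(_counter)}"

@dataclass(frozen=True)
class Field:
    offset: int   # off_i: byte offset of the field inside a list element
    ty: str       # lty_i: type of the field
    first: str    # symbolic value of this field in the first element
    last: str     # symbolic value of this field in the last element

@dataclass(frozen=True)
class ListInvariant:
    start: str     # symbolic address of the first element
    length: str    # symbolic variable for the list length
    fields: tuple  # one Field per struct field

def extend(inv, v_start, known_first, m, stored):
    """Prepend the element at v_start to the list described by inv.

    known_first maps each field index i != m to the value that the
    points-to entries give for that field; `stored` is the value written
    into field m by the store instruction.
    """
    new_len = fresh("len")   # plays the role of v'_l
    v_m = fresh("v")         # plays the role of v_m
    fields = tuple(
        # first values come from the new element, last values are kept
        Field(f.offset, f.ty, v_m if i == m else known_first[i], f.last)
        for i, f in enumerate(inv.fields)
    )
    kb_new = {f"{v_m} = {stored}", f"{new_len} = {inv.length} + 1"}
    return ListInvariant(v_start, new_len, fields), kb_new

# Mirrors the step from State P to State Q: store x_mem into the next field
# (index 1) of the new element at y_mem, whose i32 field holds y_nd.
old = ListInvariant("x_mem", "x_l", (Field(0, "i32", "x_nd", "xhat_nd"),
                                     Field(8, "list*", "x_next", "xhat_next")))
new, kb_new = extend(old, "y_mem", {0: "y_nd"}, m=1, stored="x_mem")
```

In this sketch the values of the last element are carried over unchanged, while the start address, the length, and the values of the first element are replaced, which is the essence of the rule.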

#### **3.2.2 List Traversal**

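After the current element y<sub>mem</sub> is stored at x<sub>tp</sub> and the value x<sub>kinc</sub> of k is incremented to y<sub>kinc</sub> and stored at x<sub>k_ad</sub>, we reach a state R at position (cmpF, 0) by the branch instruction. However, our already existing state O is more general than R, i.e., we can draw a generalization edge from R to O using the generalization rule with the instantiation μ<sub>R</sub> where μ<sub>R</sub>(x<sub>mem</sub>) = y<sub>mem</sub>, μ<sub>R</sub>(x<sub>nd</sub>) = y<sub>nd</sub>, μ<sub>R</sub>(x<sub>cn</sub>) = y<sub>cn</sub>, μ<sub>R</sub>(x<sub>k</sub>) = x<sub>kinc</sub>, μ<sub>R</sub>(x<sub>kinc</sub>) = y<sub>kinc</sub>, μ<sub>R</sub>(x<sub>ℓ</sub>) = y<sub>ℓ</sub>, μ<sub>R</sub>(x̂<sub>nd</sub>) = x̂<sub>nd</sub>, and μ<sub>R</sub>(x<sub>next</sub>) = x<sub>mem</sub>. Thus, the cycle of the first loop closes here.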

**Fig. 5.** Traversing a List Invariant

As mentioned, in the path from O to R there is a state at position (cmpF, 1) which is refined (similar to State C). If k < n holds, we reach R. The other path with k ≥ n leads out of the for loop to the block initPtr followed by the while loop (see State S and the corresponding LLVM code on the side). The value x<sub>mem</sub> at address tail_ptr is loaded to tail' and stored at a new pointer variable ptr. State T is reached after the first iteration of the while loop body. Here, block cmpW loads the value x<sub>mem</sub> stored at ptr to str. Since it is not the null pointer, we enter bodyW, which corresponds to the body of the while loop. First, x<sub>mem</sub> is cast to an i8 pointer. Then getelementptr computes a pointer x<sub>np</sub> to the next element by adding 8 bytes to x<sub>mem</sub>. After another cast back to a list\* pointer, we load the content of the new pointer to next. To this end, we need the following new variant of the load rule to load values that are described by a list invariant.


With this new load rule, the content of the new pointer is identified as x<sub>next</sub>. It is loaded to next and stored at x<sub>ptr</sub>. Then we return to the block cmpW (State T). Merging T with its predecessor at the same program position is not possible yet since the domains of the respective *LV* functions do not coincide. Now, x<sub>next</sub> is loaded to str and compared to the null pointer. Since we do not have information about x<sub>next</sub>, T's successor state is refined to a state with x<sub>next</sub> = 0 (which starts a path out of the loop to a return state), and to a state with x<sub>next</sub> ≥ 1, which reaches U after a few evaluation steps, see Fig. 5. Now, getelementptr computes the pointer x′<sub>np</sub> = x<sub>next</sub> + 8 to the third element of the list, which is assigned to next_ptr. U contains x<sub>ℓ</sub> ≥ 2 since the first and the last pointer value are known to be different (x<sub>next</sub> ≠ 0). This information is crucial for creating a new list invariant starting at x<sub>next</sub>, which is used in the next iteration of the loop. Therefore, if our list invariant did not contain variables for the first and the last pointer, we could not prove termination of the program. In such a case where the pointer to the third element of a list invariant is computed and the length of the list is at least two, we *traverse* the list invariant to retain the correspondence between the computed pointer x′<sub>np</sub> and the new list invariant. In the resulting state V, we represent the first list element by an allocation ⟦x<sub>mem</sub>, x<sup>end</sup><sub>mem</sub>⟧ and preserve all knowledge about this element that was encoded in the list invariant (x<sup>end</sup><sub>mem</sub> = x<sub>mem</sub> + 15, x<sub>mem</sub> ↪<sub>i32</sub> x<sub>nd</sub>, x<sub>np</sub> ↪<sub>list\*</sub> x<sub>next</sub>). Moreover, we adapt the list invariant such that it now represents the list at x<sub>next</sub> (i.e., without its first element) starting with the value x′<sub>nd</sub>. We also relate the length of the new list invariant to the length of the former one (x′<sub>ℓ</sub> = x<sub>ℓ</sub> − 1).
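The traversal step described in this paragraph can be sketched in Python as follows. This only illustrates the prose description and is not the formal rule from [18]: the tuple encoding of a list invariant, the helper `fresh`, and the string form of the facts added to the knowledge base are hypothetical, and the sketch assumes that the list length is already known to be at least 2.

```python
from itertools import count

_counter = count()

def fresh(prefix):
    """Return a fresh symbolic variable name (hypothetical naming scheme)."""
    return f"{prefix}_{next(_counter)}"

def traverse(inv, elem_size, rec):
    """Split the first element off a list invariant.

    inv = (start, length, fields) with fields = [(offset, type, first, last), ...];
    rec is the index of the recursive field.
    Returns the allocation and points-to entries for the first element,
    the invariant for the remaining list, and the new knowledge-base facts.
    """
    start, length, fields = inv
    end = fresh("end")
    kb_new = {f"{end} = {start} + {elem_size - 1}"}
    points_to, new_fields = [], []
    for off, ty, first, last in fields:
        if off == 0:
            addr = start
        else:
            addr = fresh("addr")
            kb_new.add(f"{addr} = {start} + {off}")
        points_to.append((addr, ty, first))   # knowledge about the first element
        new_fields.append((off, ty, fresh("first"), last))  # new, unknown first values
    new_len = fresh("len")
    kb_new.add(f"{new_len} = {length} - 1")
    # The remaining list starts where the recursive field of the first element points.
    new_inv = (fields[rec][2], new_len, new_fields)
    return (start, end), points_to, new_inv, kb_new

inv = ("x_mem", "x_l", [(0, "i32", "x_nd", "xhat_nd"),
                        (8, "list*", "x_next", "xhat_next")])
alloc, points_to, new_inv, kb_new = traverse(inv, elem_size=16, rec=1)
```

As in the step from U to V, the first element becomes an explicit 16-byte allocation with its own points-to entries, while the invariant now starts at x_next and its length is one smaller.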

Thus, in addition to the rule for getelementptr in [25], we now introduce rules for list traversal via getelementptr. The rule below handles the case where the address calculation is based on the type i8 and the getelementptr instruction adds the number of bytes given by the term t to the address pa. Here, the offsets in our list invariants are needed to compute the address of the accessed field. We also have similar rules for list traversal via field access (i.e., where the next element is accessed using curr'->next as in the for loop) and for the case where we cannot prove that the length v<sub>ℓ</sub> of the list is at least 2, see [18].


We continue the symbolic execution of State V in our example and finally obtain a complete SEG with a path from a state W at the position (cmpW, 0) to the next state W′ at this position, and a generalization edge back from W′ to W using an instantiation μ<sub>W′</sub>. Both W and W′ contain a list invariant similar to T where instead of the length x<sub>ℓ</sub> in T, we have the symbolic variables z<sub>ℓ</sub> and z′<sub>ℓ</sub> in W and W′, where μ<sub>W′</sub>(z<sub>ℓ</sub>) = z′<sub>ℓ</sub> (see [18] for more details).

### **4 Proving Termination**

To prove termination of a program P, as in [25] the cycles of the SEG are translated to an integer transition system whose termination implies termination of P. The edges of the SEG are transformed into ITS transitions whose application conditions consist of the state formulas ⟨s⟩ and equations to identify corresponding symbolic variables of the different states. For evaluation and refinement edges, the symbolic variables do not change. For generalization edges, we use the instantiation μ to identify corresponding symbolic variables. In our example, the ITS has cyclic transitions of the following form:

$$\begin{aligned} O(x\_{\mathtt{n}}, x\_{\mathtt{k}}, x\_{\mathtt{kinc}}, \ldots) &\to^{+} R(x\_{\mathtt{n}}, x\_{\mathtt{k}}, x\_{\mathtt{kinc}}, \ldots) &| & \quad x\_{\mathtt{kinc}} = x\_{\mathtt{k}} + 1 \wedge x\_{\mathtt{n}} > x\_{\mathtt{k}} \wedge \ldots \\ R(x\_{\mathtt{n}}, x\_{\mathtt{k}}, x\_{\mathtt{kinc}}, \ldots) &\to O(x\_{\mathtt{n}}, x\_{\mathtt{kinc}}, \ldots) \\ W(z\_{\ell}, z'\_{\ell}, \ldots) &\to^{+} W'(z\_{\ell}, z'\_{\ell}, \ldots) &| & \quad z'\_{\ell} = z\_{\ell} - 1 \wedge z\_{\ell} \ge 1 \wedge \ldots \\ W'(z\_{\ell}, z'\_{\ell}, \ldots) &\to W(z'\_{\ell}, \ldots) \end{aligned}$$

The first cycle resulting from the generalization edge from R to O terminates since k is increased until it reaches n. The generalization edge yields a condition identifying x<sub>kinc</sub> in R with x<sub>k</sub> in O, since μ<sub>R</sub>(x<sub>k</sub>) = x<sub>kinc</sub>. With the conditions x<sub>kinc</sub> = x<sub>k</sub> + 1 and x<sub>n</sub> > x<sub>k</sub> (from *KB*<sup>O</sup>), the resulting transitions of the ITS are terminating. The second cycle from the generalization edge from W′ to W terminates since the length of the list starting with curr' decreases. Although there is no program variable for the length, due to our list invariants the states contain variables for this length, which are also passed to the ITS. Thus, the ITS contains the variable z<sub>ℓ</sub> (where z<sub>ℓ</sub> in W is identified with z′<sub>ℓ</sub> in W′ due to μ<sub>W′</sub>(z<sub>ℓ</sub>) = z′<sub>ℓ</sub>). Since the condition z′<sub>ℓ</sub> = z<sub>ℓ</sub> − 1 is obtained on the path from W to W′ and z<sub>ℓ</sub> ≥ 1 is part of ⟨W⟩ due to the list invariant with length z<sub>ℓ</sub> in *LI*<sup>W</sup>, the resulting transitions of the ITS clearly terminate. Analogous to [25, Cor. 11 and Thm. 13], we obtain the following theorem. To prove that a complete SEG represents all program paths, in [25] we used the LLVM semantics defined by the Vellvm project [26]. One now also has to prove soundness of those symbolic execution rules which were modified due to the new concept of list invariants (i.e., generalization, list extension, and list traversal), see [18].

**Theorem 1 (Memory Safety and Termination).** *Let* P *be a program with a complete SEG* G*. Since a complete SEG does not contain ERR,* P *is memory safe for all concrete states represented by the states in* G*.* <sup>4</sup> *If the ITS corresponding to* G *is terminating, then* P *is also terminating for all states represented by* G*.*
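The termination argument for the two ITS cycles discussed above can be illustrated by simulating them in Python and recording a candidate ranking function along the way. This is only a sanity check on concrete inputs, not a termination proof as produced by an ITS termination prover; the function names are hypothetical, and only the variables that matter for the argument are modelled.

```python
def first_cycle(x_n, x_k):
    """Follow O -> R -> O while the guard x_n > x_k holds.

    Returns the values of the candidate ranking function x_n - x_k
    observed at each visit of O.
    """
    measures = []
    while x_n > x_k:                # condition from KB_O
        measures.append(x_n - x_k)
        x_kinc = x_k + 1            # condition x_kinc = x_k + 1
        x_k = x_kinc                # generalization edge: x_k in O is x_kinc in R
    return measures

def second_cycle(z_len):
    """Follow W -> W' -> W while z_len >= 1; the list length is the measure."""
    measures = []
    while z_len >= 1:               # z_l >= 1 from the list invariant in W
        measures.append(z_len)
        z_next = z_len - 1          # z'_l = z_l - 1 on the path from W to W'
        z_len = z_next              # generalization edge back to W
    return measures

def decreasing_and_positive(measures):
    """Check that the measure is positive and strictly decreases in every step."""
    return all(m > 0 for m in measures) and all(
        a > b for a, b in zip(measures, measures[1:])
    )

assert decreasing_and_positive(first_cycle(5, 2))
assert decreasing_and_positive(second_cycle(3))
```

The second measure exists only because the list invariant introduces a symbolic variable for the list length; the program itself has no variable holding this value.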

### **5 Conclusion, Related Work, and Evaluation**

We presented a new approach for automated proofs of memory safety and termination of C/LLVM-programs on lists. It first constructs a symbolic execution graph (SEG) which overapproximates all program runs. Afterwards, an integer transition system (ITS) is generated from this graph whose termination is proved using standard techniques. The main idea of our new approach is the extension of the states in the SEG by suitable *list invariants*. We developed techniques to infer and modify list invariants automatically during the symbolic execution.

During the construction of the SEG, the list invariants abstract from a concrete number of memory allocations to a list of allocations of variable length while preserving knowledge about some of the contents (the values of the fields of the first and the last element) and the list shape (the start address of the first element, the list length, and the content of the last recursive pointer which allows us to distinguish between cyclic and acyclic lists). They also contain information on the memory arrangement of the list fields which is needed for programs that access fields via pointer arithmetic. The symbolic variables for the list length and the first and last values of list elements are preserved when generating an ITS from the SEG. Thus, they can be used in the termination proof of the ITS (e.g., the variables for list length can occur in ranking functions).

In [5,6,22] we developed a technique for termination analysis of Java, based on a program transformation to *integer term rewrite systems* instead of ITSs. This approach does not require specific list invariants as recursive data structures on the heap are abstracted to terms. However, these terms are unsuitable for C, since they cannot express memory allocations and the connection to their contents.

<sup>4</sup> Our approach can only prove but not disprove memory safety, i.e., a SEG with the state ERR just means that we failed in showing memory safety.

Separation logic predicates for termination of list programs were also used in [1], but their list predicates only consider the list length and the recursive field, and no other fields or offsets. The tools Cyclist [24] and HipTNT+ [19] are integrated in separation logic systems which also allow defining heap predicates. However, they require annotations and hints on which parameters of the list predicates are needed as a termination measure. The tool 2LS [20] also provides basic support for dynamic data structures. But none of these approaches is suitable if termination depends on the contents or the shape of data structures combined with pointer arithmetic. In [10], programs can be annotated with arithmetic and structural properties to reason about termination. In contrast, our approach does not need hints or annotations, but finds termination arguments fully automatically.

We implemented our approach in AProVE [25]. While C programs with lists are very common, existing tools can hardly prove their termination. Therefore, the current benchmark collections for termination analysis contain almost no list programs. In 2017, a benchmark set<sup>5</sup> of 18 typical C-programs on lists was added to the *Termination* category of the *Competition on Software Verification* (*SV-COMP*) [3], where 9 of them are terminating. Two of these 9 programs do not need list invariants, because they just create a list without operating on it afterwards. The remaining seven terminating programs create a list and then traverse it, search for a value, or append lists and compute the length afterwards. Only a few tools in *SV-COMP* produced correct termination proofs for programs from this set: HipTNT+ and 2LS failed for all of them. CPAchecker [2] and PeSCo [23] proved termination and non-termination for one of these programs in 2020. UAutomizer [8] proved termination for two and non-termination for seven programs. The termination proofs of CPAchecker, PeSCo, and UAutomizer only concern the programs that just create a list. Our new version of AProVE is the only termination prover<sup>6</sup> that succeeds if termination depends on the shape or contents of a list after its creation. Note that for non-termination, a proof is a single non-terminating program path, so here list invariants are less helpful.

For the *Termination Competition* [15] 2022, we submitted 18 terminating C programs on lists<sup>7</sup> (different from the ones at *SV-COMP*), where two of them just create a list. Three traverse it afterwards (by a loop or recursion), and ten search for a value, where for nine of them the list contents are also relevant for termination. Three programs perform common operations like inserting or deleting an element. UAutomizer proves termination for a program that just creates a list but not for programs operating on the list afterwards. With our approach, AProVE succeeds on 17 of the 18 programs. Overall, AProVE and UAutomizer were the two most powerful tools for termination of C in *SV-COMP* 2022 and the *Termination Competition* 2022, with UAutomizer winning the former and AProVE winning the latter competition.

To download AProVE, run it via its web interface, and for details on our experiments, see https://aprove-developers.github.io/recursive structs.

<sup>5</sup> https://github.com/sosy-lab/sv-benchmarks/tree/master/c/termination-memorylinkedlists.

<sup>6</sup> We did not compare with the tool VeriFuzz [21], since it does not prove termination but only tests for non-termination and thus, it is unsound for inferring termination.

<sup>7</sup> https://github.com/TermCOMP/TPDB/tree/master/C/Hensel 22.

### **References**



# **Reasoning About Regular Properties: A Comparative Study**

Tomáš Fiedor, Lukáš Holík (✉), Martin Hruška, Adam Rogalewicz, Juraj Síč, and Pavol Vargovčík

Brno University of Technology, Brno, Czech Republic {ifiedortom,holik,ihruska,rogalew,sicjuraj,ivargovcik}@fit.vutbr.cz

**Abstract.** Several new algorithms for deciding emptiness of Boolean combinations of regular languages and of languages of alternating automata have been proposed recently, especially in the context of analysing regular expressions and in string constraint solving. The new algorithms demonstrated a significant potential, but they have never been systematically compared, neither among each other nor with the state-of-the-art implementations of existing (non)deterministic automata-based methods. In this paper, we provide such a comparison as well as an overview of the existing algorithms and their implementations. We collect a diverse benchmark mostly originating in or related to practical problems from string constraint solving, analysing LTL properties, and regular model checking, and evaluate the collected implementations on it. The results reveal the best tools and hint at what the best algorithms and implementation techniques are. Roughly, although some advanced algorithms are fast, such as antichain algorithms and reductions to IC3/PDR, they are not as overwhelmingly dominant as sometimes presented and there is no clear winner. The simplest NFA-based technology may sometimes be a better choice, depending on the problem source and the implementation style. We believe that our findings are relevant for the development of automata techniques as well as for related fields such as string constraint solving.

### **1 Introduction**

Efficient representation of regular properties of finite words has been the subject of research for a long time, with applications and results spanning much of the field of formal reasoning, including regular expression matching, verification, testing, modelling, or general decision procedures of logics. When regular properties are combined using Boolean and similar operations, interesting decision problems are PSPACE-complete. This includes the most essential problem of language emptiness (further just emptiness). The textbook approaches that use deterministic automata are plagued by state space explosion. Determinization and complementation are done by the exponential subset construction, and conjunction is quadratic. This motivated the research on efficient algorithms for non-deterministic and alternating finite automata (NFA and AFA, respectively).

Using nondeterminism and alternation, one can gain one or two levels of exponential savings in the size of automata, respectively. Alternation in the context of automata was first studied in [24] and [18,38,53], and extensively in the context of automata over infinite words and temporal logics (e.g., [57,58,66,76]). It adds conjunctive branching to the disjunctive non-deterministic branching and allows avoiding the blow-up in the automata size completely. However, from the perspective of the worst-case complexity, the gained succinctness is paid back by the PSPACE-completeness of language emptiness. Still, the more succinct representation gives more opportunities for clever heuristics that combat the worst-case complexity and work in practical cases, essentially by avoiding re-creation of the entire (non)deterministic representation.

Several very promising techniques and their implementations were proposed in recent years. The latest advances in testing AFA emptiness appeared in the context of analysing combinations of regular expressions and in string solving. A group of these techniques is based on reducing AFA emptiness to reachability in a Boolean transition system and using existing implementations of model-checking algorithms, most notably of IC3/PDR [15,46], such as ABC [17], nuXmv [22], or IC3Ref [16], to solve it [27,28,47,80]. The most recent contribution from [73] extends the SMT-solver Z3 with symbolic derivatives, a generalisation of Antimirov derivatives of regular expressions. Z3 uses them to convert a combination of regular expressions into an alternating/Boolean automaton and tests its language emptiness on the fly through the classical de-alternation and a search for an accepting configuration.

A slightly older algorithm for testing equivalence of AFA (convertible to an emptiness test) is based on computing bisimulation up-to congruence [30]. It generalizes the original NFA-equivalence test of [11]. The congruence closure algorithms were preceded by the antichain algorithms that optimize the subset construction by subsumption pruning [41,82], and by the first attempt to apply model checking algorithms, namely the algorithm Impact of [63], to emptiness of combinations of regular properties [40]. Lastly, the area of string constraint solving gave rise to a large variety of string constraint solvers. They approach combinations of regular properties through a spectrum of clever techniques based e.g. on automata, transformations to other types of constraints, reasoning on lengths of strings, Parikh images, etc. (e.g. Z3 [65,73], CVC4/5 [7,68], Z3Str4 [9], OSTRICH [25,26], Trau [4,5] to name a few).

These works demonstrate a significant promise, but they are presented in specific, often narrow contexts and under varying views on the state of the art. Consequently, they have never been sufficiently compared against each other. Even comparisons against the most efficient implementations of the more standard techniques based on (non)deterministic automata are rare. String solvers were compared only against string solvers, and advanced AFA-emptiness tests were compared only against the basic de-alternation. A somewhat interesting comparison was done only between NFA-antichain and up-to-congruence-based language inclusion and equivalence tests in [11] and in [39], and between the basic antichain-based AFA emptiness and a version that uses abstract interpretation [41]. A number of works also take as their baseline implementations of automata or string solvers which, even though being respectable tools in their own right, are currently not the fastest solvers of combinations of regular properties in either category. On top of that, all the mentioned works on solving combinations of regular properties use only narrow benchmarks, often mutually exclusive.

Systematic comparisons of tools and algorithms on meaningful benchmarks are obviously needed to answer the questions 'What to use?' and 'What to compare with?', and generally for the field of reasoning about regular properties and automata to progress. We thus present a comparison of implementations of major algorithms. We compare the tools on a large benchmark of problems that we have collected from other works, from string constraint solving problems, analysis of regular expressions, regular model checking, and analysing LTL properties of systems. We believe that it is currently the most comprehensive benchmark in existence. Our main focus is on examples around string solving and analysis of regular expressions, which is also where most of the recent development has happened. These benchmarks mostly allow for a relatively simple representation of automata transition functions. Even though the alphabets in examples coming from this area are large (e.g. UNICODE with up to 2<sup>32</sup> symbols), the alphabet size can, in most cases, be reduced to a few symbols by working with alphabet minterms (classes of indistinguishable symbols) instead of individual symbols. The issue of effective symbolic representation of transition relations with large alphabets then does not dominate the evaluation, although it would be critical in other application areas, such as deciding WS1S (monadic second-order logic of one successor) or linear integer arithmetic [20,44,81].
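The idea of alphabet minterms can be illustrated with a minimal Python sketch: two symbols are indistinguishable if every predicate used in the problem gives the same answer for both. The function name `minterms` and the example predicates are assumptions made for illustration; the sketch enumerates a small explicit alphabet, whereas tools for large alphabets compute minterms symbolically from the predicates.

```python
from collections import defaultdict

def minterms(alphabet, predicates):
    """Partition the alphabet into classes of indistinguishable symbols.

    Two symbols fall into the same class iff all predicates agree on them.
    """
    classes = defaultdict(list)
    for sym in alphabet:
        signature = tuple(p(sym) for p in predicates)  # truth value per predicate
        classes[signature].append(sym)
    return list(classes.values())

# Five symbols collapse into three classes under the predicates
# "is a digit" and "is a letter".
print(minterms("ab01_", [str.isdigit, str.isalpha]))
# [['a', 'b'], ['0', '1'], ['_']]
```

An automaton over the original alphabet can then be translated to one whose alphabet has one symbol per class.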

We have obtained results that paint the basic landscape of the available techniques and tools. They identify tools and approaches which are likely to work well and should be used as the baseline in comparisons. We also provide a relatively diverse and large benchmark to be used in comparisons. The results broadly confirm that the new algorithms represent a leap in efficiency compared to the technology of DFA and also make a reduction of a problem to language emptiness of an alternating automaton an attractive option. On the other hand, they challenge some folklore knowledge and conclusions implied elsewhere. For instance, reductions to IC3/PDR, although yielding one of the fastest algorithms, are not as vastly superior as sometimes presented. Some practically relevant benchmark categories are best solved by a combination of an antichain algorithm with a SAT solver. Others, surprisingly many in fact, are best solved by a simple efficiency-oriented implementation of basic algorithms for nondeterministic automata. Our results also underscore that there is no universal silver bullet. The particular kind of the problem, determined to a large degree by its source, is a decisive factor that should be taken into account when choosing and tuning a solver.

We will maintain and further grow the benchmark set, at GitHub [1], as well as the framework for the entire comparison, at [2], in order for it to be easily usable and extensible by others.

### **2 Preliminaries**

A *(nondeterministic) finite automaton (NFA)* over Σ is a tuple A = (Q, Δ, I, F) where Q is a finite set of *states*, Δ is a set of *transitions* of the form $q \xrightarrow{a} r$ with q, r ∈ Q and a ∈ Σ, I ⊆ Q is the set of *initial states*, and F ⊆ Q is the set of *final states*. A *run* of A over a word w ∈ Σ<sup>∗</sup> is a sequence $q\_0 \xrightarrow{a\_1} q\_1 \xrightarrow{a\_2} \cdots \xrightarrow{a\_n} q\_n$ where for all 1 ≤ i ≤ n, it holds that a<sub>i</sub> ∈ Σ ∪ {ε}, w = a<sub>1</sub> · a<sub>2</sub> ··· a<sub>n</sub>, and either $q\_{i-1} \xrightarrow{a\_i} q\_i \in \Delta$ or q<sub>i−1</sub> = q<sub>i</sub>, a<sub>i</sub> = ε. The run is *accepting* if q<sub>0</sub> ∈ I and q<sub>n</sub> ∈ F, and the *language* L(A) of A is the set of all words for which A has an accepting run.

The automaton is *deterministic (DFA)* if for every state q and symbol a, Δ has at most one transition $q \xrightarrow{a} r$. Any NFA can be determinized by the *subset construction*, which creates the DFA A<sub>d</sub> = (2<sup>Q</sup>, Δ<sub>d</sub>, {I}, {S | S ∩ F ≠ ∅}) where $S \xrightarrow{a} S' \in \Delta\_d$ iff S′ = {r | q ∈ S ∧ $q \xrightarrow{a} r$ ∈ Δ}. The basic automata constructions implementing Boolean operations with languages, for A = (Q, Δ, I, F) and A′ = (Q′, Δ′, I′, F′), are intersection, A ∩ A′ = (Q × Q′, Δ<sup>×</sup>, I × I′, F × F′) where $(q, q') \xrightarrow{a} (r, r') \in \Delta^{\times}$ iff $q \xrightarrow{a} r \in \Delta$ and $q' \xrightarrow{a} r' \in \Delta'$, non-deterministic union A ∪ A′ = (Q ∪ Q′, Δ ∪ Δ′, I ∪ I′, F ∪ F′), deterministic union by product, which is the same as ∩ except that the final states are F × Q′ ∪ Q × F′, and complementation, which consists of determinization and complementing the final states.
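The subset construction and the product construction can be sketched in Python as follows. The encoding of an NFA as a triple of a transition dictionary, a set of initial states and a set of final states is an assumption made for this sketch, as are the function names. Unlike the textbook definition, the sketch builds only the reachable subsets and explores the product on the fly, which is a common implementation choice.

```python
from collections import deque

def determinize(delta, initial, final, alphabet):
    """Subset construction restricted to reachable subsets.

    delta maps (state, symbol) to a set of successor states.
    """
    start = frozenset(initial)
    dfa_delta, seen, work = {}, {start}, deque([start])
    while work:
        S = work.popleft()
        for a in alphabet:
            T = frozenset(r for q in S for r in delta.get((q, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                work.append(T)
    dfa_final = {S for S in seen if S & frozenset(final)}
    return seen, dfa_delta, start, dfa_final

def intersection_is_empty(nfa1, nfa2, alphabet):
    """Explore the product automaton on the fly and look for a final pair."""
    (d1, i1, f1), (d2, i2, f2) = nfa1, nfa2
    start = {(p, q) for p in i1 for q in i2}
    seen, work = set(start), deque(start)
    while work:
        p, q = work.popleft()
        if p in f1 and q in f2:
            return False            # an accepting run of the product exists
        for a in alphabet:
            for p2 in d1.get((p, a), ()):
                for q2 in d2.get((q, a), ()):
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        work.append((p2, q2))
    return True

# Example NFAs over {a, b} as (delta, initial, final) triples.
ends_in_a = ({(0, "a"): {0, 1}, (0, "b"): {0}}, {0}, {1})
only_b = ({(0, "b"): {0}}, {0}, {0})
starts_with_a = ({(0, "a"): {1}, (1, "a"): {1}, (1, "b"): {1}}, {0}, {1})

states, _, _, dfa_final = determinize(*ends_in_a, "ab")
```

Here `intersection_is_empty(ends_in_a, only_b, "ab")` holds because no word both ends in a and consists only of b, while the intersection of `ends_in_a` and `starts_with_a` contains the word a.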

*Alternating Automata.* An *alternating finite automaton (AFA)* in the most general form would be a tuple M = (Σ, P, Q, Δ, I, F) where, denoting by B(X) the Boolean predicate formulae over variables X: 1) Σ is a finite alphabet; 2) P is a set of unary *symbol predicates* with a free variable x; 3) Q is a finite set of *states*; 4) Δ : Q → B(Q ∪ P) is a *transition function* where states of Q have only positive occurrences; 5) I ∈ B(Q) is a positive *initial condition*; and 6) F ∈ B(Q) is a negative *final/accepting condition*.<sup>1</sup>

It can be interpreted as the *forward NFA* M<sub>f</sub> = (Σ, 𝒫(Q), Δ<sub>f</sub>, $\mathcal{I}$, $\mathcal{F}$) with states C ⊆ Q called *configurations* of M. Assume a many-sorted interpretation of formulae over the variables Q of the type Boolean (values 0 and 1) and the variable x of the type Σ. A set of states C ⊆ Q is understood as an assignment Q → {0, 1} in which C(q) = 1 corresponds to q ∈ C. A pair (C, a), a ∈ Σ, is understood as the same assignment extended with x ↦ a. The satisfaction relation ⊨ between a formula and a configuration C or a pair (C, a) is defined as usual. The transition relation Δ<sub>f</sub> then contains a transition $C \xrightarrow{a} C'$ iff $(C', a) \models \bigwedge\_{q \in C} \Delta(q)$, and $\mathcal{I}$ and $\mathcal{F}$ are the sets of configurations that satisfy I and F, respectively. It is common to define Δ<sub>f</sub> to contain only the smallest transitions, that is, for a given C and a, only the transitions $C \xrightarrow{a} C'$ with the ⊆-minimal target C′ are in Δ<sub>f</sub>.<sup>2</sup> The language of M, L(M), is the language of M<sub>f</sub>.

The AFA can equivalently be interpreted as the *backward NFA*, the automaton $\mathcal{M}^{\mathrm{b}} = (\Sigma, \mathcal{P}(Q), \Delta^{\mathrm{b}}, \mathcal{I}, \mathcal{F})$ where $C \xrightarrow{a} C' \in \Delta^{\mathrm{b}}$ if $(C', a) \models \Delta(q)$ for each $q \in C$. Here it is enough to take, for a given $C'$ and $a$, only the transition with the $\subseteq$-largest source $C$<sup>3</sup> (this makes the transition relation backward deterministic).
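The backward interpretation can be illustrated on a toy example. The following sketch is only one reading of the definitions above, under several assumptions: formulae are encoded as Python predicates over a configuration (and a symbol), the configurations satisfying the final condition are enumerated explicitly (which is exponential and only viable for tiny automata), and all names are hypothetical. For each target configuration and symbol it computes the $\subseteq$-largest source, which suffices because the transition formulae are positive in states.

```python
from itertools import combinations


def afa_is_empty(states, alphabet, delta, init, final):
    """Backward search: start from configurations satisfying `final`,
    repeatedly take the largest predecessor, stop when `init` is satisfied.

    delta[q](config, symbol) evaluates the transition formula of state q,
    init(config) and final(config) evaluate the initial and final conditions.
    """
    states = list(states)
    all_configs = (
        frozenset(c)
        for n in range(len(states) + 1)
        for c in combinations(states, n)
    )
    work = [c for c in all_configs if final(c)]
    seen = set(work)
    while work:
        target = work.pop()
        if init(target):
            return False  # an initial configuration reaches a final one
        for a in alphabet:
            # Largest source: every state whose formula holds in (target, a).
            source = frozenset(q for q in states if delta[q](target, a))
            if source not in seen:
                seen.add(source)
                work.append(source)
    return True


# Hypothetical AFA: the word must contain an 'a' (obligation p) and a 'b'
# (obligation r); a configuration is accepting when no obligation is pending.
DELTA = {
    "p": lambda c, a: a == "a" or "p" in c,
    "r": lambda c, a: a == "b" or "r" in c,
}
NONEMPTY = afa_is_empty(
    "pr", "ab", DELTA,
    init=lambda c: "p" in c and "r" in c,
    final=lambda c: not c,
)
# Hypothetical AFA whose single obligation can never be discharged.
EMPTY = afa_is_empty(
    "p", "ab", {"p": lambda c, a: "p" in c},
    init=lambda c: "p" in c,
    final=lambda c: "p" not in c,
)
```

In the first example the search goes from the empty configuration to `{p}` (or `{r}`) and then to `{p, r}`, which satisfies the initial condition, so `NONEMPTY` is `False` (the language is not empty); in the second example no new configuration is ever discovered and `EMPTY` is `True`.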

*Boolean Automata.* Alternating automata may be extended to Boolean finite automata (BFA) by allowing any Boolean combination in the initial, final, and transition formulae (states in the initial and transition formulae may occur negatively, states in the final formula may occur positively). Note that the extension of AFA to BFA is not dramatic, as a BFA is easily encoded as an AFA with only double the size, by the following steps: 1) for each $q \in Q$, add a state $\bar{q}$ with $\Delta(\bar{q}) = \neg\Delta(q)$, 2) transform all formulas in $I$, $F$, $\Delta$ to DNF, 3) replace all literals $\neg q$ by $\bar{q}$ in $\Delta$ and $I$ and replace literals $q$ by $\neg\bar{q}$ in $F$.

*Restricted Forms of AFA Transition Relation.* The general form of AFA, as defined above, is the most succinct. It provides space for most optimizations, such as in [77]. Automata in this form are generated from LTL conversions of [34] used in [30,77]. On the other hand, only a small subset of algorithms and tools support AFA in this most liberal form. A common restriction (used e.g. in [30]) is to separate symbols from states in the transition formulae, that is, having $\Delta(q)$ in the form $\varphi \wedge \psi$ with $\varphi \in \mathbb{B}(P)$, $\psi \in \mathbb{B}(Q)$. We call such AFA *separated*. The transition relation can then be seen as a function $Q \to \mathbb{B}(P) \times \mathbb{B}(Q)$. Separated AFA are often considered with the state formula in the disjunctive normal form (e.g. in [36,41]), which we call the *DNF form*, and $\Delta$ then may be seen as a set of transitions of the form $q \xrightarrow{\varphi} c$ where $c$ is a (positive) clause of $\psi$.

<sup>1</sup> This is not the most standard definition of AFA but it allows us to later cover and categorize their common syntactic variants. See e.g. [18,41,57] for more standard definitions.

<sup>2</sup> A state in a configuration is understood as a constraint. The fewer constraints, the more can be accepted from the configuration. Transitions to more constrained configurations are useless.

<sup>3</sup> Going backward, larger configurations are more permissive. Transitions from the same target with smaller configurations are useless.

*The Decision Problems.* We will concentrate on two decision problems:


### **3 Existing Algorithms and Tools**

In this section, we give an overview of the existing approaches and tools implementing AFA and BRE emptiness.

#### **3.1 Representation of Automata Transition Relations**

In the simplest form, a predicate on an automaton transition represents a single letter from the alphabet. This is called an *explicit transition*. Explicit automata are simple, allow for low-level optimizations, and the implementation of complex algorithms for them is manageable (such as advanced algorithms for computing simulations [23,50,70]). The technique of a-priori mintermization, which replaces the alphabet by the alphabet of minterms, that is, classes of indistinguishable symbols, makes explicit automata usable also when alphabets are large. However, when the number of minterms tends to explode, explicit automata do not scale.
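To make the notion of minterms concrete, the following sketch partitions a finite alphabet into classes of symbols that no predicate can tell apart. It assumes, for illustration only, that predicates are given as explicit sets of characters; the function name and the example predicates are hypothetical.

```python
from collections import defaultdict


def minterms(alphabet, predicates):
    """Group symbols by the set of predicates they satisfy.

    Two symbols end up in the same class iff every predicate treats
    them the same way, so each class can act as one explicit letter.
    """
    classes = defaultdict(set)
    for symbol in alphabet:
        signature = tuple(symbol in p for p in predicates)
        classes[signature].add(symbol)
    return sorted(classes.values(), key=min)


# Hypothetical character classes appearing on automata transitions.
PREDICATES = [set("abc"), set("abcxyz"), set("019")]
CLASSES = minterms("abcxyz019_", PREDICATES)
```

Here ten symbols collapse into four minterms. With $n$ predicates there can be up to $2^n$ non-empty classes, which is the explosion mentioned above.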

Various implementations of automata have been using transition predicates implemented as BDDs, Boolean formulae, formulae over SMT-theory of bit-vectors, intervals of numbers, etc. This has been systematized in the works on *symbolic automata* [31,33,79], where the symbol predicates may be taken from any effective Boolean algebra (and the automata are in the separated form). Even more compact than symbolic automata are representations of the transition relation used in the WS1S solver MONA or in some of the implementations of AFA, which in a way drop the restriction to the separated form. We will discuss the concrete implementations below.

#### **3.2 (Non)deterministic Finite Automata**

The baseline approach to solve BRE is to use DFA or NFA. Boolean operations are implemented as the classical constructions listed in Sect. 2. Automata may be kept deterministic, or they are kept non-deterministic whenever possible and determinized only before complementing. An important ingredient of achieving efficiency is usually to minimize automata at least once every few operations (important e.g. in applications such as regular model checking [12] or some approaches to string solving [4,10,25]). The deterministic approaches construct the minimal DFA by the Hopcroft, Moore, Brzozowski, or the Huffman algorithm [19,52,54,64], the non-deterministic approach may use simulation [23,45,50,55,70] or bisimulation [48,69,75] based reduction methods. Simulation reduces significantly more but is much costlier. DFA/NFA are implemented in many libraries. Here we select a representative sample.

<sup>4</sup> $L \subseteq L'$ is emptiness of $L \cap \overline{L'}$ and equivalence is emptiness of $(L \cap \overline{L'}) \cup (\overline{L} \cap L')$.

First, ENFA is the simplest tool, our own implementation of NFA, which was originally meant to play the role of a baseline. It uses explicit automata with mintermization. It is implemented in C++, with efficiency in mind, but with no extensive optimizations (roughly, transitions from a state are stored in a two-layered data structure, the first layer divided and ordered by symbols, and the second layer ordered by the target state). It uses an off-the-shelf implementation of one of the newest generation algorithms for computing simulation [23,50,70] (which achieve good efficiency through the use of the partition-relation data structure) taken from the VATA tree automata library [59] (namely, its implementation of [50]).<sup>5</sup>

The BRICS automata library [67] is often considered a baseline in comparisons [67]. It uses primarily deterministic automata and transition relation represented symbolically using character ranges. It is written in Java and relatively optimized.

The AUTOMATA library [78], made in C#, implements symbolic NFA/DFA parametrized by an effective Boolean algebra. We use it with the default algebra of BDDs. AUTOMATA has been developed for a long time and has accumulated many optimizations and novel techniques for handling symbolic automata (e.g., optimized minimization [32]).

MONA [44], written in C, is the most influential and optimized implementation of deterministic automata. It specialises in deciding WS1S formulae, which besides Boolean combinations includes also quantification. The decision procedure generates DFA with complex transition relations over large alphabets of bit-vectors. For this purpose, MONA uses a compact representation of the transition relation: a single MTBDD for all transitions originating in a state, with the target states in its leaves. MONA can represent only a DFA, hence it always implicitly determinizes.

VATA [59], written in C++, is a library implementing non-deterministic tree automata. As NFA are a special case of tree automata, we can use it as an implementation of the basic constructions for explicit NFA. It is relatively optimized. We include it into the comparison for its fast implementation of the antichain inclusion checking [12,49], which for NFA boils down to the inclusion check of [36].

#### **3.3 Alternating Automata**

*De-alternation.* The basic approach to AFA emptiness is *de-alternation*, transformation to an NFA, either the forward $\mathcal{M}^{\mathrm{f}}$ or the backward $\mathcal{M}^{\mathrm{b}}$, followed by testing the emptiness of the resulting NFA. Both NFAs are constructed by a variation on the NFA subset construction. We are not aware of any tool using pure de-alternation, and we believe that it would not be competitive. The forward algorithm is however the basis of [73] used in Z3 where it is run on the fly with a novel symbolic derivative construction (discussed also in the paragraph on string constraint solvers).

<sup>5</sup> In our experiment, simulation is only used after parsing and has minimal overall impact.

*Interpolation Based Abstraction Refinement.* Attempts to harness model checking algorithms for AFA emptiness appeared in the context of string solving and processing of regular expressions. To the best of our knowledge, the earliest attempt was [40], where conjunctions of regular constraints were solved using the interpolation-based algorithm of [62]. The interpolation-based abstraction refinement, namely the algorithm Impact of [63], was also used in [56]. That work concentrated on a more general problem, solving emptiness of AFA over data words with an infinite data domain (that can relate past and current values of data variables). Their tool JALTIMPACT [3] (in Java), which we include in our comparison, can be run on our benchmark too.

*Reduction to Reachability and IC3/PDR.* The work of [80] presented the first translation of string constraints (mostly BRE) into reachability in a Boolean transition system (circuit) that was then solved by the model checker nuXmv [22]. This was de facto the first reduction of AFA emptiness to reachability in a Boolean transition system (BTS).

Let us briefly overview the basic principle of the reduction. The *forward BTS* for an AFA $\mathcal{M}$ has configurations that are Boolean assignments to $Q$, initial and final configurations satisfy $I$ and $F$, respectively, and transitions are given by the formula $\Phi^{\mathrm{f}}_{\Delta} : \bigwedge_{q \in Q} (q \to [\Delta(q)]')$. Here we use $[\varphi]'$ to denote the formula obtained from $\varphi$ by substituting every state $q$ by its primed version $q'$, and we will also denote by $[C]'$ the primed version $\{q' \mid q \in C\}$ of a configuration $C$. A *successor* of a configuration $C$ is any configuration $\bar{C}$ such that $[\bar{C}]'$ satisfies $\exists Q\, \exists \alpha\, (\Phi^{\mathrm{f}}_{\Delta} \wedge \bigwedge_{q \in C} q)$ (the symbol variable $\alpha$ is of the bit-vector sort). *Reachability* is then the transitive and reflexive closure of the successor relation and the *reachability problem* asks whether a final configuration is reachable from an initial one. It is the case if and only if $L(\mathcal{M})$ is not empty. The forward reduction has been used in [80]. Alternatively, the *backward BTS* for $\mathcal{M}$ has the initial configurations satisfying $F$, final configurations satisfying $I$, and the successor relation given by the formula $\Phi^{\mathrm{b}}_{\Delta} : \bigwedge_{q \in Q} (q' \to \Delta(q))$.

The work [28] applied IC3/PDR [15,46], implemented in IC3Ref [16], together with the backward BTS reduction to solve emptiness of BRE and obtained very encouraging results. The implementation used in [28], called Qzy, is, however, proprietary and not publicly available. A similar approach was taken by [47], where a string constraint was translated to a multi-tape AFA and then to a BTS by the forward translation, and given to IC3/PDR to solve through the tools nuXmv [22] or ABC [17]. Results of [77] seem to indicate that the backward translation is better and the same is suggested by the comparison in [27,28] in which the string solver Sloth [47], based on the forward reduction, was much slower than Qzy, based on the backward reduction. In this comparison, we include our own C++ implementation BWIC3 of the backward reduction based on the model checker ABC.

*Antichains.* Antichain algorithms presented in [82] were the first breakthrough in solving BRE. They use subsumption relations between the states of the automata constructed by variations of the subset construction to prune the constructions. They were used to test language universality and inclusion of NFAs and AFA emptiness. The AFA emptiness test, namely, is based on an on-the-fly search for an accepting state of $\mathcal{M}^{\mathrm{f}}$ or for an initial state of $\mathcal{M}^{\mathrm{b}}$. Subsumption prunes discovered states that are larger (smaller for the backward algorithm) than others.
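The pruning idea can be sketched for NFA language inclusion. The code below is an illustrative, unoptimized variant under our own assumptions (explicit NFAs given as a transition dictionary plus initial and final state sets, hypothetical function names); it is not the implementation of VATA or of [82]. It searches pairs of a state of the first automaton and a set of states of the second one, and discards any pair whose set component is a superset of an already explored pair with the same state.

```python
def post(delta, states, a):
    """Successors of a set of states under symbol a."""
    return frozenset(r for q in states for r in delta.get((q, a), ()))


def included(alphabet, aut_a, aut_b):
    """Check L(A) ⊆ L(B) for NFAs given as (delta, initial, final)."""
    delta_a, init_a, final_a = aut_a
    delta_b, init_b, final_b = aut_b
    antichain = []  # explored pairs (p, S), kept minimal with respect to S
    work = [(p, frozenset(init_b)) for p in init_a]
    while work:
        p, s = work.pop()
        # Subsumption: a smaller S with the same p was already explored.
        if any(p == p2 and s2 <= s for p2, s2 in antichain):
            continue
        if p in final_a and not (s & final_b):
            return False  # some word is accepted by A but not by B
        antichain = [
            (p2, s2) for p2, s2 in antichain if not (p2 == p and s <= s2)
        ]
        antichain.append((p, s))
        for a in alphabet:
            target = post(delta_b, s, a)
            for r in delta_a.get((p, a), ()):
                work.append((r, target))
    return True


# Hypothetical NFAs over {a, b}: words ending in "ab", and words whose
# second-to-last symbol is 'a'.
ENDS_AB = ({(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}, {0}, {2})
SECOND_LAST_A = (
    {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}},
    {0},
    {2},
)
```

On this example, the pair `(0, {0, 1})` is pruned because `(0, {0})` has already been explored, which is the kind of saving the subsumption relation provides.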

The antichain algorithms were enhanced and generalized in a number of works, e.g. with a more aggressive pruning by the simulation-based subsumption [6,36], or by counterexample-guided abstraction refinement in [41]. In this comparison, we include the NFA inclusion check implemented in the VATA tree automata library [59]. We also experimented with a student-made implementation of the antichain AFA emptiness check of [41] that uses abstraction refinement (the original implementation is no longer maintained and we were not able to run it). However, not being able to achieve a competitive performance, we excluded it from the comparison. One reason for the poor performance may be that the simplest form of AFA, the explicit DNF form (used in the original version [41]), might be too inefficient and costly to construct in our examples, partly due to a large number of minterms induced by the AFA emptiness benchmark.

We implemented (in C++) the antichain AFA emptiness test of [36] that integrates tightly with a SAT solver to handle the general form of AFA with large alphabets. We will refer to it as ANTISAT. We will briefly explain its principle. It essentially implements the reachability test for the backward BTS discussed in the previous paragraph. A configuration $C$ is represented by the conjunction $\varphi_C = \bigwedge_{q \in Q \setminus C} \neg q$. Note that $\varphi_C$ is satisfied by the downward closure of $C$, which are all configurations included in (subsumed by) $C$. To compute predecessors of configurations represented by $\varphi_C$, the SAT solver (namely MiniSAT [37]) is called on the formula $\Phi : \Psi^{\mathrm{b}}_{\Delta} \wedge \varphi_C \wedge \mathit{Ach}$. Here, $\mathit{Ach}$ excludes all already discovered configurations from the solution. It is a conjunction of clauses $c_D : \bigvee_{q \in Q \setminus D} q$ for every previously discovered configuration $D$. The SAT solver discovers a satisfying assignment $\nu$, which is turned into a new configuration $C' = \nu \cap Q$ (that is, the values of the symbol bits constituting the bit-vector $\alpha$ are omitted from $\nu$). Unless $C'$ is initial, it is queued for further predecessor computation and is immediately added to $\mathit{Ach}$ through the interface of incremental SAT solving as the clause $c_{C'}$. Finally, only maximal predecessors of $C$ are of interest, as the non-maximal ones are subsumed by them. We enforce the maximality of $C'$ through working directly with the internal SAT solver structures: at decision points, the SAT solver is forced to give priority to decisions that assign 1 to state variables.

*Bisimulation up-to Congruence.* A later class of algorithms, here referred to as *up-to algorithms*, checks equivalence as a bisimulation between configurations of AFA, and utilises the up-to congruence technique to prune the search space. The first algorithm on NFA equivalence [11] was extended to alternating automata emptiness check in [30]. These algorithms are close to antichains. As shown in [11], the pruning potential of the up-to techniques is in theory the same or larger than that of antichains. A disadvantage of the up-to congruence technique is the need for expensive evaluation of congruence closures. The more extensive experiments of [39] show antichain algorithms as faster, with the exception of randomly generated automata with small alphabets and very dense transition relations. We include in the comparison the Java implementation of the AFA emptiness check of [30] (emptiness reduces to equivalence with a trivial empty AFA), which we refer to as BISIM. The other implementations of up-to algorithms we are aware of, from [39] and [11], are single-purpose programs that decide equivalence of two NFAs, hence we would be able to run them on a very small fraction of our benchmark only.

#### **3.4 String Constraints Solvers**

There are dozens of string constraint solvers that implement, to varying degrees, support for deciding combinations of regular properties. String languages are rich and BRE are not the absolute priority of the solvers, hence they generally perform worse on them than specialised tools. However, string solvers implement a wide range of unique techniques and pragmatic heuristics that may work in specific instances. Representatives of the solvers with the most mature implementations (also used in most comparisons in the literature) are Z3 [65,73] and CVC5 [7,68]. CVC5 solves BRE mostly through rewriting rules. Recently, [73] extended Z3 with an approach based on the Antimirov derivative automata construction generalised to symbolic automata and extended regular expressions. Essentially, the construction produces a symbolic AFA/BFA and checks its emptiness on the fly while running the forward de-alternation. As shown in [73], it is significantly more efficient in solving BRE than other SMT solvers (including CVC5).
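The flavour of derivative-based emptiness checking can be conveyed by a small sketch. Note that it swaps in the simpler Brzozowski derivatives over an explicit finite alphabet rather than the symbolic Antimirov-style construction of [73], and that the tuple encoding of expressions, the constructor names, and the examples are all assumptions made for illustration. Unions and intersections are stored as sets so that syntactically similar derivatives coincide, which the termination of the search relies on.

```python
EMPTY, EPS = ("empty",), ("eps",)


def cat(r, s):
    if EMPTY in (r, s):
        return EMPTY
    return s if r == EPS else r if s == EPS else ("cat", r, s)


def alt(*rs):
    items = set()
    for r in rs:
        items |= r[1] if r[0] == "or" else {r}
    items.discard(EMPTY)
    if len(items) <= 1:
        return next(iter(items), EMPTY)
    return ("or", frozenset(items))


def conj(*rs):
    items = set()
    for r in rs:
        items |= r[1] if r[0] == "and" else {r}
    if EMPTY in items:
        return EMPTY
    return next(iter(items)) if len(items) == 1 else ("and", frozenset(items))


def neg(r):
    return r[1] if r[0] == "not" else ("not", r)


def star(r):
    return r if r[0] == "star" else ("star", r)


def nullable(r):
    """Does the language of r contain the empty word?"""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    if tag == "or":
        return any(map(nullable, r[1]))
    if tag == "and":
        return all(map(nullable, r[1]))
    return not nullable(r[1])  # tag == "not"


def deriv(r, a):
    """Brzozowski derivative of r with respect to the symbol a."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if a in r[1] else EMPTY
    if tag == "cat":
        left = cat(deriv(r[1], a), r[2])
        return alt(left, deriv(r[2], a)) if nullable(r[1]) else left
    if tag == "star":
        return cat(deriv(r[1], a), r)
    if tag == "or":
        return alt(*(deriv(s, a) for s in r[1]))
    if tag == "and":
        return conj(*(deriv(s, a) for s in r[1]))
    return neg(deriv(r[1], a))  # tag == "not"


def is_empty(r, alphabet):
    """Explore derivatives on the fly; stop at the first nullable one."""
    seen, work = {r}, [r]
    while work:
        s = work.pop()
        if nullable(s):
            return False
        for a in alphabet:
            d = deriv(s, a)
            if d not in seen:
                seen.add(d)
                work.append(d)
    return True


# Hypothetical BRE over {a, b, c}: words ending in "ab" versus words whose
# second-to-last symbol is 'a'.
SIG, A, B = ("chr", frozenset("abc")), ("chr", frozenset("a")), ("chr", frozenset("b"))
ENDS_AB = cat(star(SIG), cat(A, B))
SECOND_LAST_A = cat(star(SIG), cat(A, SIG))
```

For instance, `is_empty(conj(ENDS_AB, neg(SECOND_LAST_A)), "abc")` checks the inclusion of the first language in the second one, in the spirit of footnote 4.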

#### **3.5 Other Approaches and Tools**

Although we believe that we have collected a representative subset of existing algorithms and tools, we have not collected all interesting specimens. Some were not available, some were difficult to run or prepare the inputs for, some seemed covered by experimentation in other works. Including these tools and algorithms in the comparison could still be interesting and we leave it for future work (we plan to keep extending the tool base as well as the benchmark set). Namely, the tool DPRLE [51], used in the comparison in [28], seemed to be mostly, though not entirely consistently, outperformed by the IC3/PDR approach implemented in Qzy. The implementation of NFA antichain and up-to congruence techniques used in [39] seems efficient, with its NFA antichain inclusion twice as fast as that of VATA. The up-to congruence NFA equivalence checking of [11] could be fast too ([11] and [39] report somewhat conflicting results). There are numerous NFA/DFA libraries, e.g. the C alternative of BRICS [61] or the Java implementation of symbolic NFA of [29]. ALASKA [35] might contain interesting implementations of antichain algorithms but is no longer maintained or available. Our comparison is missing a basic implementation of antichain-powered de-alternation for explicit AFA in the DNF form, which, if not overwhelmed by a large number of minterms, could reach a good performance through simple fast data structures, similarly to our ENFA.

### **4 Benchmarks**

We collected as comprehensive a benchmark as possible, harvesting examples used in previous works as well as generating some of our own. It is available together with the whole experiment from [2] and on GitHub [1] (we plan to maintain and grow the benchmark and welcome contributors).

The main focus of the current benchmark set is on the areas where most of the recent development in solving AFA and BRE emptiness has happened, namely string constraint solving and the analysis of regular expressions used in analysing and filtering texts. Atomic regular properties are here mostly given in the form of regular expressions over UNICODE character classes. The alphabet is large but the number of minterms is mostly small or moderate. This is true also for our examples from regular model checking. Symbolic handling of complex transition relations over large alphabets is thus not absolutely crucial and the experiment can stay focused on the main algorithms for emptiness check. For that reason, we do not include benchmarks from solving WS1S [21], the primary target of MONA, or Presburger arithmetic with automata [13,81], where the techniques of handling symbolic alphabets are indispensable. Techniques specialising in this kind of problem would deserve their own study. Our benchmarks where the symbolic alphabet representation is still rather important are AFA coming from (combinations of) LTL properties, with alphabets of sets of atomic propositions, and from translations of string constraint problems to AFA with complex multi-track alphabets.<sup>6</sup>

*Boolean Combinations of Regular Expressions.* This group of BRE contains benchmarks on which we can run all tools, including those based on NFA and DFA. They have small to moderate numbers of minterms (about 30 on average, at most over a hundred).


(1) [a-c]a[a-c]{ +1} ∩[a-c]a[a-c]{} (long strings),

<sup>6</sup> We did not attempt to generate purely random problems. First, purely random automata generated e.g. by [74] seem to have different characteristics than automata coming from practical problems (e.g. in [12,39]). Second, although generating random NFA is possible with a generator controlled by three simple parameters which give a manageable parameter-value space covering all NFA, it is not clear how to similarly generate random AFA or BRE. On the other hand, we do include a benchmark based on randomly generated LTL formulae, which we consider relatively close to realistic LTL specifications.

<sup>7</sup> https://github.com/lorisdanto/symbolicautomata/blob/master/benchmarks/src/main/java/regexconverter/pattern%4075.txt.


*AFA Benchmark.* The second group of examples contains AFA not easily convertible to BRE. Here we can run only tools that handle general AFA emptiness. Some of these benchmarks also have large sets of minterms (easily reaching thousands) and complex formulae in the AFA transition function, hence converting them to restricted forms such as separated DNF or explicit may be very costly. This also seems to be the main reason why our implementation of [41] could not compete.


<sup>8</sup> https://drive.google.com/file/d/1eOYGvm3C8sQ-9iyfZ8qx42K54hgrFNTC.

### **5 The Comparison**

We ran our experiments on Debian GNU/Linux 11, with an Intel Core 3.4 GHz processor, 8 CPU cores, and 20 GB RAM. All experiments were run with a timeout of 60 s (increasing the timeout did not have a significant impact). Additional details as well as the virtual machine with the entire benchmark are available at [2].

*Benchmarking Infrastructure.* The initial difficulty is that the tools expect different input formats and forms of automata and the benchmarks come in different formats as well. We converted all benchmarks to our internal AFA format, from which we generated formats supported by the AFA handling tools JALTIMPACT, BWIC3, ANTISAT, and BISIM, or we extended the tools with a parser. The BRE benchmarks come from various sources. We first convert them into a master file which specifies the Boolean combination of atomic NFA, each atomic NFA stored in a separate file. The SMT-lib format is generated for Z3 and CVC5. In the case of b-hand-made, b-param, and b-smt, the atomic automata are translated from regular expressions using the parser of BRICS, while in the case of b-regex, where the regexes contain features not supported by BRICS, we use the parser from BISIM. b-smt and b-hand-made require first translating from SMT-lib to a regular expression. In the case of b-armc-incl, the atomic automata come directly as NFAs, and are converted into formats of the individual BRE solvers (we again wrote parsers for some of the solvers), and to our AFA format for the AFA solvers. Every BRE solver was extended by an interpreter of the master file that reads the NFA/DFA from the generated solver-specific files (except the SMT solvers, which read SMT-lib). We note that due to some difficulties with internal structures, we currently cannot run BRICS on b-armc-incl, and due to the lack of a converter from complex regular expressions and from pure NFA to the SMT format, we do not run Z3 and CVC5 on b-regex and on b-armc-incl.

*Measured Data.* We will present the results obtained with BRE (where we run all the tools) and with AFA emptiness (where we run BWIC3, ANTISAT, BISIM, and JALTIMPACT) separately. We also separate the results on examples from applications from results on parametric hand-made examples.

Table 1 summarizes the statistics from evaluating the benchmarks. The table lists: (i) the average time, (ii) the median time, and (iii) the number of timeouts and number of errors (mostly, a tool ran out of memory, failed with a bad allocation, or ran into a segmentation fault). A few errors, e.g. in CVC5 or BISIM, were due to unsupported features in the inputs. The tools' performance is then visualised on cactus plots in Fig. 1. For each tool, the plot shows the progress of the tool on each benchmark: the $y$ axis is the cumulative time taken on the benchmark, with the individual examples on the $x$ axis ordered by the runtime taken by the tool. Timeouts are omitted. In the appendix, we also show a set of scatter plots that compare, for every benchmark, the three best performing tools.
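The data behind one curve of such a cactus plot can be computed as in the following sketch. The function name, the convention that a failed run is recorded as `None`, and the sample runtimes are assumptions made for illustration; they are not taken from our measurement scripts.

```python
from itertools import accumulate


def cactus_points(runtimes, timeout=60.0):
    """Return (number of solved examples, cumulative time) pairs for one tool.

    Examples are ordered by the runtime of the tool; timeouts and
    failed runs (recorded as None) are omitted.
    """
    solved = sorted(t for t in runtimes if t is not None and t < timeout)
    return list(enumerate(accumulate(solved), start=1))


# Hypothetical runtimes in seconds of one tool on five examples.
POINTS = cactus_points([0.5, None, 2.0, 0.25, 61.0])
```

Plotting the second component against the first, with a logarithmic time axis, gives one line of the plot per tool.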

Finally, we compared the tools on the parametric benchmarks a-ltl-param and b-param. We illustrate the results in Fig. 2. Each graph shows the times for the increasing value of the specific parameter $k$ on the $x$ axis.


**Table 1.** Summary of AFA and BRE benchmarks. Table lists (i) the average, (ii) the median, and (iii) the number of timeouts and errors (in brackets). Winners are highlighted in bold.

#### **5.1 Discussion**

Based on the measurements, we make several observations.

Firstly, the tool which combines universality (it can be run on AFA as well as on BRE emptiness) with the most consistent good performance is BWIC3. It dominates most of the AFA emptiness benchmark, shows great or very good performance on the BRE benchmark, and often stands out on the parametric examples. Moreover, the measurements reported in [28] suggest that the backward BTS reduction has even more potential. This is visible namely from the comparison of our results on the parametric benchmarks diff-sat, diff-unsat, inter-sat, and inter-unsat. Our implementation matched the result of [28] on diff-sat and partially on inter-sat, saw a worse trend on diff-unsat and a much worse trend on inter-unsat. A likely culprit is a different underlying model checker, ABC [17] in our implementation versus IC3Ref [16] in [28]. However, IC3Ref was not used out of the box in [28]; harnessing it efficiently for problems of our kind is not entirely trivial.

Secondly, the results on application-related BRE (all BRE except the parametric examples in b-param) quite surprisingly favour the tools based mostly on relatively basic NFA algorithms. The overall best is the simplest tool of all, our implementation ENFA of basic NFA constructions. Close to the performance of ENFA is VATA, which uses the antichain inclusion checking on b-armc-incl and b-regex (the fact that explicit complementation of ENFA is faster than the antichain of VATA suggests that the inclusion benchmarks are not particularly hard). VATA specialises in the more general tree automata, which probably causes unnecessary overhead. AUTOMATA also performs well. It uses slightly more advanced algorithms than ENFA (such as lazy evaluation of difference, though without antichain pruning). Its symbolic representation of transition functions with BDDs probably does not provide much advantage here. This result challenges the view that translating complex problems, arising for instance in string constraint solving, into AFA in order to use the sophisticated machinery of AFA solvers is an obvious silver bullet. Organizing the computation into smaller NFA operations, where, moreover, partial results can be minimized and re-used, and a simpler and hence more flexible NFA technology is used, might be a better strategy (this seems to work very well for instance in our recent prototype string constraint solver [10]).

**Fig. 1.** Cactus plots of AFA and BRE benchmarks. The $y$ axis is the cumulative time taken on the benchmark in logarithmic scale, benchmarks on the $x$ axis are ordered by the runtime of each tool.

Our AFA emptiness test ANTISAT based on the antichain algorithm and a SAT solver has an interesting performance. As can be seen on the cactus plots, besides its absolute domination on a-ltlf-spec, it is significantly faster than other tools on a large portion of the other AFA emptiness benchmarks, but struggles on the rest. The examples where it dominates are often automata with a structure resembling a lasso (or several lassos) with a long handle. The other implementation of an antichain algorithm, NFA/NTA inclusion in VATA, also shows a good performance. Together, this points to the overall strength of antichain algorithms.

**Fig. 2.** Models of runtime on parametric benchmarks based on the specific parameter $k$ with timeout 60 s. The sawtooths represent cases where the tool failed on the benchmark for some $k$ while solving the benchmarks for $k-1$ and $k+1$. For brevity, we draw the models only until they start continually failing.

The SMT string constraint solvers are not among the best in the benchmark related to practical applications, but are competitive (especially Z3), and win on some parametric cases. This may be because various heuristics unique to SMT solvers kick in, especially rewriting that reduces one type of constraint to another. For instance, Z3 seems to solve exppaths1 with the help of rewriting to the sub-string constraint in the theory of sequences. In general, the measurements on parametric examples underscore the fact that no algorithm is universally the best and their relative performance may vary drastically depending on the kind of input.

Although the mediocre performance of the other tools can be partially explained by their focus on a different kind of problem or a dated underlying technology, and each of them is respectable in its own right, a point can be made against relying on them as a baseline in comparisons of tools for solving our kind of problem. MONA, optimized for a different setting (complex alphabets of bit-vectors with many minterms), is held back by the implicit determinization, and, in our case, probably by the overhead of the symbolic representation. It also frequently runs out of the 32-bit address space for BDD nodes. The same holds for BRICS, which also always determinizes. The low performance of BISIM is surprising relative to the good results of the up-to algorithms reported in [11,30]. It is more consistent with [39], where up-to algorithms were not winning against antichains on the more practical examples. Our results however do not directly contradict the results of [30] itself, since it does not compare with the fast tools identified here and relies to a large degree on parametric and random benchmarks. There is also always the possibility that we have prepared the input in a way not ideal for the tool. For instance, transformation to the separated AFA, required by BISIM, is not entirely trivial. Further investigation of this and a comparison with some other implementation of the up-to techniques seems to be needed. The lack of raw speed of JALTIMPACT on BRE and AFA emptiness is to be expected considering that it is meant for a different kind of system, AFA over data words. The stable trends shown in the graphs suggest that an implementation of an interpolation-based abstraction refinement optimized for BRE and AFA emptiness might have potential.

*Main Takeaways.* The backward reduction of AFA emptiness to BTS reachability in combination with IC3 is very fast and extremely versatile, showing very good performance on almost all benchmarks. However, on BRE related to real-world applications, simple NFA algorithms actually tend to have the best raw performance, with the simplest implementation of NFA being the best. Antichain algorithms also work well, even significantly better than other algorithms on specific kinds of AFA. These seem to be the tools to use. Reasonable implementations of the backward BTS reduction with IC3, of antichains, and of basic NFA should also be the baseline of comparisons.

MONA and BRICS, based on DFA, as well as JALTIMPACT, focused on data words rather than on pure regular properties, do not reach the performance of the best tools. Also, BISIM did not confirm the power of up-to algorithms. SMT solvers, Z3 especially, are competitive, but cannot be considered the top of the state of the art.

Generally, the particular kind and source of benchmark is a decisive factor influencing the performance of tools, as especially visible on the parametric benchmark.

*Threats to Validity.* Our results must be taken with a grain of salt as the experiment contains inherent room for error. Although we tried to be as fair as possible, not knowing every tool intimately, the conversions between formats and kinds of automata, discussed at the start of Sect. 5, might have introduced biases into the experiment. Tools are written in different languages and some have parameters which we might have used in a sub-optimal way (we use the tools in their default settings), or, in the case of libraries, we could have used a sub-optimal combination of functions. We also did not measure memory peaks, which could be especially interesting e.g. when the tools are deployed in a cloud. We are, however, confident that our main conclusions are well justified and the experiment gives a good overall picture. The entire experiment is available for anyone to challenge or improve upon [2].

**Acknowledgments.** This work has been supported by the Czech Ministry of Education, Youth and Sports ERC.CZ project LL1908, and the FIT BUT internal project FIT-S-20-6427.

### **References**



# **Program Synthesis in Saturation**

Petra Hozzová<sup>1(✉)</sup>, Laura Kovács<sup>1</sup>, Chase Norman<sup>2</sup>, and Andrei Voronkov<sup>3,4</sup>

> <sup>1</sup> TU Wien, Vienna, Austria, petra.hozzova@tuwien.ac.at
>
> <sup>2</sup> UC Berkeley, Berkeley, USA
>
> <sup>3</sup> University of Manchester, Manchester, UK
>
> <sup>4</sup> EasyChair, Manchester, UK

**Abstract.** We present an automated reasoning framework for synthesizing recursion-free programs using saturation-based theorem proving. Given a functional specification encoded as a first-order logical formula, we use a first-order theorem prover to both establish validity of this formula and discover program fragments satisfying the specification. As a result, when deriving a proof of program correctness, we also synthesize a program that is correct with respect to the given specification. We describe properties of the calculus that a saturation-based prover capable of synthesis should employ, and extend the superposition calculus in a corresponding way. We implemented our work in the first-order prover Vampire, extending the successful applicability of first-order proving to program synthesis.

**Keywords:** Program Synthesis · Saturation · Superposition · Theorem Proving

### **1 Introduction**

Program synthesis constructs code from a given specification. In this work we focus on synthesis using functional specifications summarized by valid first-order formulas [1,14], ensuring that our programs are provably correct. While being a powerful alternative to formal verification [20], synthesis faces intrinsic computational challenges. One of these challenges is posed to the reasoning backend used for handling program specifications, as the latter typically include first-order quantifier alternations and interpreted theory symbols. As such, efficient reasoning with both theories and quantifiers is imperative for any effort towards program synthesis.

In this paper we address this demand for recursion-free programs. We advocate the use of first-order theorem proving for extracting code from correctness proofs of functional specifications given as first-order formulas $\forall \overline{x}.\exists y.F[\overline{x}, y]$. These formulas state that "for all (program) inputs $\overline{x}$ there exists an output $y$ such that the input-output relation (program computation) $F[\overline{x}, y]$ is valid". Given such a specification, we synthesize a recursion-free program while also deriving a proof certifying that the program satisfies the specification.

The programs we synthesize are built using first-order theory terms extended with if−then−else constructors. To ensure that our programs yield computational models, i.e., that they can be evaluated for given values of input variables x, we restrict the programs we synthesize to only contain *computable* symbols.

**Our Approach in a Nutshell.** In order to synthesize a recursion-free program, we prove its functional specification using saturation-based theorem proving [11,15]. We extend saturation-based proof search with answer literals [5], allowing us to track substitutions into the output variable $y$ of the specification. These substitutions correspond to the sought program fragments and are conditioned on clauses they are associated with in the proof. When we derive a clause corresponding to a program branch if $C$ then $r$, where $C$ is a condition and $r$ a term and both $C$, $r$ are computable, we store it and continue proof search assuming that $\neg C$ holds; we refer to such conditions $C$ as (program) branch conditions. The saturation process for both proof search and code construction terminates when the conjunction of negations of the collected branch conditions becomes unsatisfiable. Then we synthesize the final program satisfying the given (and proved) specification by assembling the recorded program branches (see e.g. Examples 1–3).

The main challenges of making our approach effective come with (i) integrating the construction of the programs with if−then−else into the proof search, thus turning proof search into *program search/synthesis*, and (ii) guiding program synthesis to only computable branch conditions and programs.

**Contributions.** We make the following contributions, which address the above challenges:<sup>1</sup>


<sup>1</sup> Proofs of our results are given in the extended version [8] of our paper.

### **2 Preliminaries**

We assume familiarity with standard multi-sorted first-order logic with equality. We denote variables by $x, y$, terms by $s, t$, atoms by $A$, literals by $L$, clauses by $C, D$, formulas by $F, G$, all possibly with indices. Further, we write $\sigma$ for Skolem constants. We reserve the symbol $\square$ for the *empty clause*, which is logically equivalent to $\bot$. Formulas and clauses with free variables are considered implicitly universally quantified (i.e. we consider closed formulas). By $\simeq$ we denote the equality predicate and write $t \not\simeq s$ as a shorthand for $\neg\, t \simeq s$. We use a distinguished *integer sort*, denoted by $\mathbb{Z}$. When we use standard integer predicates $<, \leq, >, \geq$, functions $+, -, \ldots$ and constants $0, 1, \ldots$, we assume that they denote the corresponding interpreted integer predicates and functions with their standard interpretations. Additionally, we include a conditional term constructor if−then−else in the language, as follows: given a formula $F$ and terms $s, t$ of the same sort, we write if $F$ then $s$ else $t$ to denote the term $s$ if $F$ is valid and $t$ otherwise.

An *expression* is a term, literal, clause or formula. We write $E[t]$ to denote that the expression $E$ contains the term $t$. For simplicity, $E[s]$ denotes the expression $E$ where all occurrences of $t$ are replaced by the term $s$. A *substitution* $\theta$ is a mapping from variables to terms. A substitution $\theta$ is a *unifier* of two expressions $E$ and $E'$ if $E\theta = E'\theta$, and is a *most general unifier* (*mgu*) if for every unifier $\eta$ of $E$ and $E'$, there exists a substitution $\mu$ such that $\eta = \theta\mu$. We denote the mgu of $E$ and $E'$ with $\mathtt{mgu}(E, E')$. We write $F_1, \ldots, F_n \vdash G_1, \ldots, G_m$ to denote that $F_1 \land \ldots \land F_n \to G_1 \lor \ldots \lor G_m$ is valid, and extend the notation also to validity modulo a theory $T$. Symbols occurring in a theory $T$ are *interpreted* and all other symbols are *uninterpreted*.

#### **2.1 Computable Symbols and Programs**

We distinguish between *computable* and *uncomputable* symbols in the signature. The set of computable symbols is given as part of the specification. Intuitively, a symbol is computable if it can be evaluated and hence is allowed to occur in a synthesized program. A term or a literal is *computable* if all symbols it contains are computable. A symbol, term or literal is *uncomputable* if it is not computable.
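The definition of a computable term can be illustrated with a minimal sketch. The term representation, the function names and the example signature (a group operation `*` and two Skolem constants as computable symbols, the inverse `i` as uncomputable) are assumptions made for this illustration only; they are not taken from the implementation in Vampire.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Term:
    """A first-order term: a symbol applied to zero or more argument terms."""
    symbol: str
    args: Tuple["Term", ...] = ()


def symbols_of(term: Term) -> FrozenSet[str]:
    """Collect every symbol occurring in the term."""
    result = {term.symbol}
    for arg in term.args:
        result |= symbols_of(arg)
    return frozenset(result)


def is_computable(term: Term, computable: FrozenSet[str]) -> bool:
    """A term is computable iff all symbols it contains are computable."""
    return symbols_of(term) <= computable


# Hypothetical signature: "*", "s1" and "s2" are computable, "i" is not.
COMPUTABLE = frozenset({"*", "s1", "s2"})
s1, s2 = Term("s1"), Term("s2")
product = Term("*", (s2, s1))               # s2 * s1
inverse = Term("i", (Term("*", (s1, s2)),))  # i(s1 * s2)

print(is_computable(product, COMPUTABLE))
print(is_computable(inverse, COMPUTABLE))
```

Under these assumptions, `s2 * s1` may occur in a synthesized program, whereas `i(s1 * s2)` may not, because it contains the uncomputable symbol `i`.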

A *functional specification*, or simply just a *specification*, is a formula

$$\forall \overline{x}. \exists y. F[\overline{x}, y]. \tag{1}$$

The variables $\overline{x}$ of a specification (1) are called *input variables*. Note that while we use specifications with a single variable $y$, our work can analogously be used with a tuple of variables $\overline{y}$ in (1).

Let $\overline{\sigma}$ denote a tuple of Skolem constants. Consider a computable term $r[\overline{\sigma}]$ such that the instance $F[\overline{\sigma}, r[\overline{\sigma}]]$ of (1) holds. Since $\overline{\sigma}$ are fresh Skolem constants, the formula $\forall \overline{x}.F[\overline{x}, r[\overline{x}]]$ also holds; we call such $r[\overline{x}]$ a *program* for (1) and say that the program $r[\overline{x}]$ *computes a witness* of (1).


**Fig. 1.** The superposition calculus Sup.

Further, if $\forall \overline{x}.(F_1 \land \ldots \land F_n \to F[\overline{x}, r[\overline{x}]])$ holds for computable formulas $F_1, \ldots, F_n$, we write $\langle r[\overline{x}], \bigwedge_{i=1}^{n} F_i \rangle$ to refer to a *program with conditions* $F_1, \ldots, F_n$ for (1). In the sequel, we refer to (parts of) programs with conditions also as *conditional branches*. In Sect. 4 we show how to build programs for (1) by composing programs with conditions for (1) (see Corollary 3).

#### **2.2 Saturation and Superposition**

Saturation-based proof search implements *proving by refutation* [11]: to prove validity of $F$, a saturation algorithm establishes unsatisfiability of $\neg F$. First-order theorem provers work with clauses, rather than with arbitrary formulas. To prove a formula $F$, first-order provers negate $F$, which is further skolemized and converted to clausal normal form (CNF). The CNF of $\neg F$ is denoted by $\mathtt{cnf}(\neg F)$ and represents a set $S$ of initial clauses. First-order provers then *saturate* $S$ by computing logical consequences of $S$ with respect to a sound inference system $\mathcal{I}$. The saturated set of $S$ is called the *closure* of $S$ and the process of computing the closure of $S$ is called *saturation*. If the closure of $S$ contains the empty clause $\square$, the original set $S$ of clauses is unsatisfiable, and hence the formula $F$ is valid.
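To make the refutation loop concrete, the following minimal sketch saturates a set of propositional clauses. It is an illustration under simplifying assumptions: it uses propositional binary resolution rather than the superposition calculus, it has no selection function and no redundancy elimination beyond dropping tautologies, and the names `resolvents` and `saturate` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Set

# A literal is a non-zero integer; -n is the negation of n.
Clause = FrozenSet[int]


def resolvents(c1: Clause, c2: Clause) -> Set[Clause]:
    """All non-tautological binary resolvents of two propositional clauses."""
    out: Set[Clause] = set()
    for lit in c1:
        if -lit in c2:
            resolvent = (c1 - {lit}) | (c2 - {-lit})
            if not any(-l in resolvent for l in resolvent):
                out.add(frozenset(resolvent))
    return out


def saturate(initial: Set[Clause]) -> bool:
    """Return True iff the empty clause belongs to the closure of `initial`."""
    closure = set(initial)
    if frozenset() in closure:
        return True
    while True:
        new: Set[Clause] = set()
        for c1, c2 in combinations(closure, 2):
            new |= resolvents(c1, c2)
        if frozenset() in new:
            return True   # refutation found: the initial set is unsatisfiable
        if new <= closure:
            return False  # saturated without deriving the empty clause
        closure |= new


# Prove F = (p and (p -> q)) -> q by refuting cnf(not F) = {p, not p or q, not q}.
P, Q = 1, 2
negated_goal = {frozenset({P}), frozenset({-P, Q}), frozenset({-Q})}
print(saturate(negated_goal))
```

Since the closure of the negated formula contains the empty clause, the formula itself is valid. Dropping the clause `{-Q}` makes the set satisfiable, and the loop then stops once no new clause can be derived.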

We may extend the set $S$ of initial clauses with additional clauses $C_1, \ldots, C_n$. If $C$ is derived by saturating this extended set, we say $C$ is derived from $S$ *under additional assumptions* $C_1, \ldots, C_n$.

The *superposition calculus*, denoted as Sup and given in Fig. 1, is the most common inference system used by saturation-based provers for first-order logic with equality [15]. The Sup calculus is parametrized by a *simplification ordering* on terms and a *selection function*, which selects in each non-empty clause a non-empty subset of literals (possibly also positive literals). We denote selected literals by underlining them. An inference rule can be applied on the given premise(s) if the literals that are underlined in the rule are also selected in the premise(s). For a certain class of selection functions, the superposition calculus Sup is *sound* (if $\square$ is derived from $F$, then $F$ is unsatisfiable) and *refutationally complete* (if $F$ is unsatisfiable, then $\square$ can be derived from it).

#### **2.3 Answer Literals**

Answer literals [5] provide a question answering technique for tracking substitutions into given variables throughout the proof. Suppose we want to find a witness for the validity of the formula

$$\exists y. F[y]. \tag{2}$$

Within saturation-based proving, we first derive the skolemized negation of (2) and add an *answer literal* using a fresh predicate ans with argument y, yielding

$$\forall y. \left(\neg F[y] \lor \mathtt{ans}(y)\right). \tag{3}$$

We then saturate the CNF of (3), while ensuring that answer literals are not selected for performing inferences. If the clause $\mathtt{ans}(t_1) \lor \ldots \lor \mathtt{ans}(t_m)$ is derived during saturation, note that this clause contains only answer literals in addition to the empty clause; hence, in this case we proved unsatisfiability of $\forall y.\neg F[y]$, implying validity of (2). Moreover, $t_1, \ldots, t_m$ provides a *disjunctive answer*, i.e. witness, for the validity of (2); that is, $F[t_1] \lor \ldots \lor F[t_m]$ holds [12]. In particular, if we derive the clause $\mathtt{ans}(t)$ during saturation, we found a *definite answer* $t$ for (2), namely $F[t]$ is valid.

**Answer Literals with** if−then−else**.** The derivation of disjunctive answers can be avoided by modifying the inference rules to only derive clauses containing at most one answer literal. One such modification is given within the A(R) calculus for binary resolution [22], where R is a so-called strongly liftable term restriction. The A(R)-calculus replaces the binary resolution rule when both premises contain an answer literal by the following A-resolution rule:

$$\frac{A \lor C \lor \mathtt{ans}(r) \quad \neg A' \lor C' \lor \mathtt{ans}(r')}{(C \lor C' \lor \mathtt{ans}(\mathtt{if}\ A \ \mathtt{then}\ r' \ \mathtt{else}\ r))\theta} \text{ ( $A$ -resolution)},$$

where $\theta := \mathtt{mgu}(A, A')$ and the restriction $R(\mathtt{if}\ A\ \mathtt{then}\ r'\ \mathtt{else}\ r)$ holds.
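The difference between a disjunctive answer and an if−then−else answer can be seen on a small example that is not taken from the source: assume the single axiom $p(a) \lor p(b)$, the goal $\exists y.\, p(y)$, and that the restriction $R$ admits the term constructed below. Plain binary resolution (BR) ends in a disjunctive answer (step 4), while applying the rule above to the same premises yields a single answer literal (step 4').

$$\begin{aligned} &1.\ \ p(a) \lor p(b) && \text{[axiom]} \\ &2.\ \ \neg p(y) \lor \mathtt{ans}(y) && \text{[negated goal with answer literal]} \\ &3.\ \ p(b) \lor \mathtt{ans}(a) && \text{[BR 1., 2.]} \\ &4.\ \ \mathtt{ans}(a) \lor \mathtt{ans}(b) && \text{[BR 3., 2.]} \\ &4'.\ \mathtt{ans}(\mathtt{if}\ p(b)\ \mathtt{then}\ b\ \mathtt{else}\ a) && \text{[$A$-resolution 3., 2.]} \end{aligned}$$

In step 4' the rule is instantiated with $A = p(b)$, $r = a$, $A' = p(y)$, $r' = y$ and $\theta = \{y \mapsto b\}$, with $C$ and $C'$ empty.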

In our work we go beyond the A-resolution rule and modify both the superposition calculus and the saturation algorithm to reason not only about answer literals but also about their use of if−then−else terms (see Sects. 4–5).

#### **3 Illustrative Example**

Let us illustrate our approach to program synthesis. We use answer literals in saturation to construct programs with conditions while proving specifications (1). By adding an answer literal to the skolemized negation of (1), we obtain

$$\forall y. (\neg F[\overline{\sigma}, y] \lor \mathtt{ans}(y)),$$

**Fig. 2.** Axioms defining a group. Uninterpreted function symbols i(·), e, ∗ represent the inverse, the identity element, and the group operation, respectively.

where $\overline{\sigma}$ are the skolemized input variables $\overline{x}$. When we derive a unit clause $\mathtt{ans}(r[\overline{\sigma}])$ during saturation, where $r[\overline{\sigma}]$ is a computable term, we construct a program for (1) from the definite answer $r[\overline{\sigma}]$ by replacing $\overline{\sigma}$ with the input variables $\overline{x}$, obtaining the program $r[\overline{x}]$. Hence, deriving computable definite answers by saturation allows us to synthesize programs for specifications.

*Example 1.* Consider the group theory axioms (A1)–(A3) of Fig. 2. We are interested in synthesizing a program for the following specification:

$$\forall x. \exists y. \; x \ast y \simeq e \tag{4}$$

In this example we assume that all symbols are computable. To synthesize a program for (4), we add an answer literal to the skolemized negation of (4) and convert the resulting formula to CNF (preprocessing). We consider the set S of clauses containing the obtained CNF and the axioms (A1)-(A3). We saturate S using Sup and obtain the following derivation:<sup>2</sup>


Using the above derivation, we construct a program for the functional specification (4) as follows: we replace $\sigma$ in the definite answer $i(\sigma)$ by $x$, yielding the program $i(x)$. Note that for each input $x$, our synthesized program computes the inverse $i(x)$ of $x$ as an output. In other words, our synthesized program for (4) ensures that each group element $x$ has a right inverse $i(x)$.
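The synthesized program $i(x)$ can be sanity-checked on a concrete model of the group axioms. The sketch below assumes, purely for illustration, the additive group of integers modulo 7; the names `op`, `inv`, `program` and `satisfies_spec` are hypothetical. Such a check on one model does not replace the proof obtained by saturation.

```python
N = 7  # assumed model: integers modulo 7 under addition
E = 0  # identity element e


def op(a: int, b: int) -> int:
    """Group operation * of the assumed model."""
    return (a + b) % N


def inv(a: int) -> int:
    """Inverse i(a) of the assumed model."""
    return (-a) % N


def program(x: int) -> int:
    """The synthesized program for specification (4): return i(x)."""
    return inv(x)


def satisfies_spec(x: int) -> bool:
    """Check the body of (4) for the computed witness: x * y = e."""
    return op(x, program(x)) == E


print(all(satisfies_spec(x) for x in range(N)))
```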

While Example 1 yields a definite answer within saturation-based proof search, our work supports the synthesis of more complex recursion-free programs (see Examples 2–3) by composing program fragments derived in the program search (Sect. 4) as well as by using answer literals with if−then−else to effectively handle disjunctive answers (Sect. 5).

<sup>2</sup> For each formula in the derivation, we also list how the formula has been derived. For example, formula 5 is the result of superposition (Sup) with formula 4 and axiom A1, whereas binary resolution (BR) has been used to derive formula 6 from 5 and 1.

### **4 Program Synthesis with Answer Literals**

We now introduce our approach to saturation-based program synthesis using answer literals (Algorithm 1). We focus on recursion-free program synthesis and present our work in a more general setting. Namely, we consider functional specifications whose validity may depend on additional assumptions (e.g. additional program requirements) $A_1, \ldots, A_n$, where each $A_i$ is a closed formula:

$$A_1 \land \dots \land A_n \to \forall \overline{x}. \exists y. F[\overline{x}, y] \tag{5}$$

Note that specification (1) is a special case of (5). However, since $A_1, \ldots, A_n$ are closed formulas, (5) is equivalent to $\forall \overline{x}.\exists y.(A_1 \land \ldots \land A_n \to F[\overline{x}, y])$, which is a special case of (1).

Given a functional specification (5), we use answer literals to synthesize programs with conditions (Sect. 4.1) and extend saturation-based proof search to reason about answer literals (Sect. 4.2). For doing so, we add the answer literal ans(y) to the skolemized negation of (5) and obtain

$$A_1 \land \dots \land A_n \land \forall y. (\neg F[\overline{\sigma}, y] \lor \mathtt{ans}(y)). \tag{6}$$

We saturate the CNF of (6), while ensuring that answer literals are not selected within the inference rules used in saturation. We guide saturation-based proof search to derive clauses $C[\overline{\sigma}] \lor \mathtt{ans}(r[\overline{\sigma}])$, where $C[\overline{\sigma}]$ and $r[\overline{\sigma}]$ are computable.

#### **4.1 From Answer Literals to Programs**

Our next result ensures that, if we derive the clause $C[\overline{\sigma}] \lor \mathtt{ans}(r[\overline{\sigma}])$, the term $r[\overline{\sigma}]$ is a definite answer under the assumption $\neg C[\overline{\sigma}]$ (Theorem 1). We note that we do not terminate saturation-based program synthesis once a clause $C[\overline{\sigma}] \lor \mathtt{ans}(r[\overline{\sigma}])$ is derived. We rather record the program $r[\overline{x}]$ with condition $\neg C[\overline{x}]$ (and possibly also other conditions), replace clause $C[\overline{\sigma}] \lor \mathtt{ans}(r[\overline{\sigma}])$ by $C[\overline{\sigma}]$, and continue saturation (Corollary 2). As a result, upon establishing validity of (5), we synthesized a program for (5) (Corollary 3).

**Theorem 1 [Semantics of Clauses with Answer Literals].** *Let* $C$ *be a clause not containing an answer literal. Assume that, using a saturation algorithm based on a sound inference system* $\mathcal{I}$*, the clause* $C \lor \mathtt{ans}(r[\overline{\sigma}])$ *is derived from the set of clauses consisting of initial assumptions* $A_1, \ldots, A_n$*, the clausified formula* $\mathtt{cnf}(\neg F[\overline{\sigma}, y] \lor \mathtt{ans}(y))$ *and additional assumptions* $C_1, \ldots, C_m$*. Then,*

$$A_1, \ldots, A_n, C_1, \ldots, C_m \vdash C, F[\overline{\sigma}, r[\overline{\sigma}]].$$

*That is, under the assumptions* $C_1, \ldots, C_m, \neg C$*, the computable term* $r[\overline{\sigma}]$ *provides a definite answer to* (5)*.*

We further use Theorem 1 to synthesize programs with conditions for (5).

**Corollary 2 [Programs with Conditions].** *Let* $r[\overline{\sigma}]$ *be a computable term and* $C[\overline{\sigma}]$ *a ground computable clause not containing an answer literal. Assume that clause* $C[\overline{\sigma}] \lor \mathtt{ans}(r[\overline{\sigma}])$ *is derived from the set of initial clauses* $A_1, \ldots, A_n$*, the clausified formula* $\mathtt{cnf}(\neg F[\overline{\sigma}, y] \lor \mathtt{ans}(y))$ *and additional ground computable assumptions* $C_1[\overline{\sigma}], \ldots, C_m[\overline{\sigma}]$*, by using saturation based on a sound inference system* $\mathcal{I}$*. Then,*

$$\left\langle r[\overline{x}], \bigwedge_{j=1}^{m} C_j[\overline{x}] \land \neg C[\overline{x}] \right\rangle$$

*is a program with conditions for* (5)*.*

Note that a program with conditions $\langle r[\overline{x}], \bigwedge_{j=1}^{m} C_j[\overline{x}] \land \neg C[\overline{x}] \rangle$ corresponds to a conditional (program) branch if $\bigwedge_{j=1}^{m} C_j[\overline{x}] \land \neg C[\overline{x}]$ then $r[\overline{x}]$: only if the condition $\bigwedge_{j=1}^{m} C_j[\overline{x}] \land \neg C[\overline{x}]$ is valid, then $r[\overline{x}]$ is computed for (5).

We use programs with conditions $\langle r[\overline{x}], \bigwedge_{j=1}^{m} C_j[\overline{x}] \land \neg C[\overline{x}] \rangle$ to finally synthesize a program for (5). To this end, we use Corollary 2 to derive programs with conditions, and once their conditions cover all possible cases given the initial assumptions $A_1, \ldots, A_n$, we compose them into a program for (5).

**Corollary 3 [From Programs with Conditions to Programs for** (5)**].** *Let* $P_1[\overline{x}], \ldots, P_k[\overline{x}]$*, where* $P_i[\overline{x}] = \langle r_i[\overline{x}], \bigwedge_{j=1}^{i-1} C_j[\overline{x}] \land \neg C_i[\overline{x}] \rangle$*, be programs with conditions for* (5)*, such that* $\bigwedge_{i=1}^{n} A_i \land \bigwedge_{i=1}^{k} C_i[\overline{x}]$ *is unsatisfiable. Then* $P[\overline{x}]$*, given by*

$$\begin{aligned} P[\overline{x}] &:= \text{if } \neg C_1[\overline{x}] \text{ then } r_1[\overline{x}] \\ &\qquad \text{else if } \neg C_2[\overline{x}] \text{ then } r_2[\overline{x}] \\ &\qquad \dots \\ &\qquad \text{else if } \neg C_{k-1}[\overline{x}] \text{ then } r_{k-1}[\overline{x}] \\ &\qquad \text{else } r_k[\overline{x}], \end{aligned} \tag{7}$$

*is a program for* (5)*.*

Note that since the conditional branches of (7) cover all possible cases to be considered over $\overline{x}$, we do not need the condition if $\neg C_k$. In particular, if $k = 1$, i.e. $\bigwedge_{i=1}^{n} A_i \land C_1[\overline{x}]$ is unsatisfiable, then the synthesized program for (5) is $r_1[\overline{x}]$.
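Construction (7) can be sketched as a small combinator that turns a list of pairs $(C_i, r_i)$ into an executable program. The function `compose` and the two-branch maximum example are assumptions made for this illustration; they are not taken from the source.

```python
from typing import Callable, Sequence, Tuple

# A branch pairs a condition C_i with a result term r_i, both over the inputs.
Branch = Tuple[Callable[..., bool], Callable[..., int]]


def compose(branches: Sequence[Branch]) -> Callable[..., int]:
    """Build P from (C_1, r_1), ..., (C_k, r_k) following construction (7)."""
    if not branches:
        raise ValueError("at least one branch is required")

    def program(*xs: int) -> int:
        for cond, result in branches[:-1]:
            if not cond(*xs):          # if not C_i[x] then r_i[x]
                return result(*xs)
        return branches[-1][1](*xs)    # else r_k[x]; not C_k is never tested

    return program


# Assumed example: a maximum of two integers.
# C_1 is x1 < x2 with r_1 = x1, and C_2 is x1 >= x2 with r_2 = x2,
# so the conjunction of C_1 and C_2 is unsatisfiable.
maximum = compose([
    (lambda x1, x2: x1 < x2, lambda x1, x2: x1),
    (lambda x1, x2: x1 >= x2, lambda x1, x2: x2),
])

print(maximum(3, 5), maximum(5, 3))
```

The last condition is carried along only to mirror the statement of the corollary; as noted above, it is never evaluated.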

#### **4.2 Saturation-Based Program Synthesis**

Our program synthesis results from Theorem 1, Corollary 2 and Corollary 3 rely upon a saturation algorithm using a sound (but not necessarily complete) inference system $\mathcal{I}$. In this section, we present our modifications to extend state-of-the-art saturation algorithms with answer literal reasoning, allowing us to derive clauses $C[\overline{\sigma}] \lor \mathtt{ans}(r[\overline{\sigma}])$, where both $C[\overline{\sigma}]$ and $r[\overline{\sigma}]$ are computable. In Sects. 5–6 we then describe modifications of the inference system $\mathcal{I}$ to implement rules over clauses with answer literals.


Our saturation algorithm is given in Algorithm 1. In a nutshell, we use Corollary 2 to construct programs from clauses $C[\overline{\sigma}] \lor \mathtt{ans}(r[\overline{\sigma}])$ and replace clauses $C[\overline{\sigma}] \lor \mathtt{ans}(r[\overline{\sigma}])$ by $C[\overline{\sigma}]$ (lines 7–10 of Algorithm 1). The newly added computable assumptions $C[\overline{\sigma}]$ are used to guide saturation towards deriving programs with conditions where the conditions contain $C[\overline{x}]$; these programs with conditions are used for synthesizing programs for (5), as given in Corollary 3.

Compared to a standard saturation algorithm used in first-order theorem proving (e.g. lines 4–5 of Algorithm 1), Algorithm 1 implements additional steps for processing newly derived clauses $C[\overline{\sigma}] \lor \mathtt{ans}(r[\overline{\sigma}])$ with answer literals (lines 6–10). As a result, Algorithm 1 establishes not only the validity of the specification (5) but also synthesizes a program (lines 12–13). Throughout the algorithm, we maintain a set $\mathcal{P}$ of programs with conditions derived so far and a set $\mathcal{C}$ of additional assumptions. For each new clause $C_i$, we check if it is in the form $C[\overline{\sigma}] \lor \mathtt{ans}(r[\overline{\sigma}])$ where $C[\overline{\sigma}]$ is ground and computable (line 7). If yes, we construct a program with conditions $\langle r[\overline{x}], \bigwedge_{C' \in \mathcal{C}} C' \land \neg C[\overline{x}] \rangle$, extend $\mathcal{C}$ with the additional assumption $C[\overline{x}]$, and replace $C_i$ by $C[\overline{\sigma}]$ (lines 8–10). Then, when we derive the empty clause, we construct the final program as follows. We first collect all clauses that participated in the derivation of $\square$. We use this clause collection to filter the programs in $\mathcal{P}$: we only keep a program originating from a clause $C[\overline{\sigma}] \lor \mathtt{ans}(r[\overline{\sigma}])$ if the condition $C[\overline{\sigma}]$ was used in the proof, obtaining programs $P_1, \ldots, P_k$. From $P_1, \ldots, P_k$ we then synthesize the final program $P$ using the construction (7) from Corollary 3.
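The bookkeeping described for lines 7–10 can be sketched as follows. This is a loose illustration based only on the prose above, not a transcription of Algorithm 1: clauses are kept abstract (literals as strings), the check that the clause is ground and computable is assumed to be supplied as a flag, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Clause:
    literals: FrozenSet[str]      # non-answer literals, kept abstract as strings
    answer: Optional[str] = None  # the argument r of ans(r), if present
    eligible: bool = False        # assumed flag: C ground and computable, r computable


@dataclass
class SynthesisState:
    # Each entry is (r, assumptions collected so far, C), read as the program
    # with conditions: r under all earlier assumptions and the negation of C.
    programs: List[Tuple[str, Tuple[Clause, ...], Clause]] = field(default_factory=list)
    assumptions: List[Clause] = field(default_factory=list)


def process_new_clause(clause: Clause, state: SynthesisState) -> Clause:
    """Record a program with conditions and strip the answer literal."""
    if clause.answer is None or not clause.eligible:
        return clause  # ordinary clause: saturation continues unchanged
    condition = Clause(clause.literals, None, True)
    state.programs.append((clause.answer, tuple(state.assumptions), condition))
    state.assumptions.append(condition)  # later branches may assume C
    return condition                     # C replaces C or ans(r)


state = SynthesisState()
first = process_new_clause(Clause(frozenset({"s1*s2 != s2*s1"}), "z", True), state)
second = process_new_clause(Clause(frozenset(), "t", True), state)
print(first, second, len(state.programs))
```

In this sketch the second call strips the answer literal from a unit answer clause and therefore returns the empty clause, at which point the recorded programs would be filtered and composed as described above.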

*Remark 1.* Compared to [22] where potentially large programs (with conditions) are tracked in answer literals, Algorithm 1 removes answer literals from clauses and constructs the final program only after saturation found a refutation of the negated (5). Our approach has two advantages: first, we do not have to keep track of potentially many large terms using if−then−else, which might slow down saturation-based program synthesis. Second, our work can naturally be integrated with clause splitting techniques within saturation (see Sect. 7).

### **5 Superposition with Answer Literals**

We note that our saturation-based program synthesis approach is not restricted to a specific calculus. Algorithm 1 can thus be used with *any sound* set of inference rules, including theory-specific inference rules, e.g. [10], as long as the rules allow derivation of clauses in the form $C \lor \mathtt{ans}(r)$, where $C$, $r$ are computable and $C$ is ground. That is, the rules should only derive clauses with at most one answer literal, and should not introduce uncomputable symbols into answer literals.

In this section we present changes tailored to the superposition calculus Sup, yet, without changing the underlying saturation process of Algorithm 1. We first introduce the notion of an abstract unifier [17] and define a computable unifier – a mechanism for dealing with the uncomputable symbols in the reasoning instead of introducing them into the programs. The use of such a unifier in any sound calculus is explained, with particular focus on the Sup calculus.

**Definition 1 (Abstract unifier** [17]**).** *An* abstract unifier *of two expressions* $E_1, E_2$ *is a pair* $(\theta, D)$ *such that:*


Intuitively speaking, an abstract unifier combines disequality constraints $D$ with a substitution $\theta$ such that the substitution is a unifier of $E_1, E_2$ if the constraints $D$ are not satisfied.

**Definition 2 (Computable unifier).** *A* computable unifier *of two expressions* $E_1, E_2$ *with respect to an expression* $E_3$ *is an abstract unifier* $(\theta, D)$ *of* $E_1, E_2$ *such that the expression* $E_3\theta$ *is computable.*

For example, let $f$ be computable and $g$ uncomputable. Then $(\{y \mapsto f(z)\}, z \not\simeq g(x))$ is a computable unifier of the terms $f(g(x)), y$ with respect to $f(y)$. Further, $(\{y \mapsto f(g(x))\}, \emptyset)$ is an abstract unifier of the same terms, but not a computable unifier with respect to $f(y)$.
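The example can be replayed with a minimal sketch that applies both substitutions to $f(y)$ and checks computability of the result. The term representation and helper names are assumptions made for this illustration; the sketch only checks the two given pairs and does not compute unifiers.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Term:
    symbol: str
    args: Tuple["Term", ...] = ()
    is_var: bool = False


def var(name: str) -> Term:
    return Term(name, (), True)


def app(symbol: str, *args: Term) -> Term:
    return Term(symbol, tuple(args))


def substitute(t: Term, theta: Dict[str, Term]) -> Term:
    """Apply the substitution theta (variable name -> term) to t."""
    if t.is_var:
        return theta.get(t.symbol, t)
    return Term(t.symbol, tuple(substitute(a, theta) for a in t.args))


def is_computable(t: Term, uncomputable: FrozenSet[str]) -> bool:
    """A term is computable iff it contains no uncomputable symbol."""
    if t.is_var:
        return True
    return t.symbol not in uncomputable and all(
        is_computable(a, uncomputable) for a in t.args)


UNCOMPUTABLE = frozenset({"g"})
x, y, z = var("x"), var("y"), var("z")
e1, e2, e3 = app("f", app("g", x)), y, app("f", y)

theta_comp = {"y": app("f", z)}           # paired with the constraint z != g(x)
theta_mgu = {"y": app("f", app("g", x))}  # paired with no constraint

print(is_computable(substitute(e3, theta_comp), UNCOMPUTABLE))
print(is_computable(substitute(e3, theta_mgu), UNCOMPUTABLE))
```

The first substitution keeps $g$ out of $f(y)$ at the price of the constraint: $f(g(x))$ and $f(z)$ coincide only once $z$ is instantiated to $g(x)$, which is exactly the case the constraint $z \not\simeq g(x)$ excludes.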

**Ensuring Computability of Answer Literal Arguments.** We modify the rules of a sound inference system I to use computable unifiers with respect to the answer literal argument instead of unifiers. Since a computable unifier may entail disequality constraints D, we add D to the conclusions of the inference rules. That is, for an inference rule of I as below

$$\frac{C_1 \quad \cdots \quad C_n}{C\theta}, \tag{8}$$

where $\theta$ is a substitution such that $E\theta \simeq E'\theta$ holds for some expressions $E, E'$, we extend $\mathcal{I}$ with the following $n$ inference rules with computable unifiers:

$$\frac{C_1 \lor \mathtt{ans}(r) \quad C_2 \quad \cdots \quad C_n}{(D \lor C \lor \mathtt{ans}(r))\theta'}, \qquad \cdots, \qquad \frac{C_1 \quad C_2 \quad \cdots \quad C_n \lor \mathtt{ans}(r)}{(D \lor C \lor \mathtt{ans}(r))\theta'}, \tag{9}$$

where $(\theta', D)$ is a computable unifier of $E, E'$ with respect to $r$ and none of $C_1, \ldots, C_n$ contains an answer literal. We obtain the following result.




**Fig. 3.** Selected rules of the extended superposition calculus Sup for reasoning with answer literals, with underlined literals being selected.

**Lemma 4 [Soundness of Inferences with Answer Literals].** *If the rule* (8) *is sound, the rules* (9) *are sound as well.*

We note that we keep the original rule (8) in $\mathcal{I}$, but impose that none of its premises $C_1, \ldots, C_n$ contains an answer literal. Clearly, neither the thus modified rule (8) nor the new rules (9) introduce uncomputable symbols into answer literals. Rather, these rules add disequality constraints $D$ into their conclusions and immediately select $D$ for further applications of inference rules. Such a selection guides the saturation process in Algorithm 1 to first discharge the constraints $D$ containing uncomputable symbols with the aim of deriving a clause $C' \lor \mathtt{ans}(r')$ where $C'$ is computable. The clause $C' \lor \mathtt{ans}(r')$ is then converted into a program with conditions using Corollary 2.

**Superposition with Answer Literals.** We make the inference rule modifications (8), together with the addition of new rules (9), for each inference rule of the Sup calculus from Fig. 1. Further, we also ensure that rules with multiple premises, when applied on several premises containing answer literals, *derive clauses with at most one answer literal*. We therefore introduce the following two rule modifications. (i) We use the if−then−else constructor to combine answer literals of premises, by adapting the use of if−then−else within binary resolution [13,14,22] to superposition rules. (ii) We use an answer literal from only one of the rule premises in the rule conclusion and add a new disequality constraint $r \not\simeq r'$ between the premises' answer literal arguments, similar to the constraints $D$ of the computable unifier. Analogously to the computable unifier constraints, we immediately select this disequality constraint $r \not\simeq r'$.

The resulting extension of the Sup calculus with answer literals is given in Fig. 3. In addition to the rules of Fig. 3, the extended calculus contains rules constructed as (9) for superposition and binary resolution rules of Fig. 1. Using Lemma 4, we conclude the following.

**Lemma 5 [Soundness of Sup with Answer Literals].** *The inference rules from Fig. 3 of the extended Sup calculus with answer literals are sound.*

By the soundness results of Lemmas 4–5, Corollaries 2–3 imply that, when applying the calculus of Fig. 3 in the saturation-based program synthesis approach of Algorithm 1, we construct correct programs.

*Example 2.* We illustrate the use of Algorithm 1 with the extended Sup calculus of Fig. 3, strengthening our motivation from Sect. 3 with if−then−else reasoning. To this end, consider the functional specification over group theory:

$$\forall x, y. \exists z. (x \ast y \not\simeq y \ast x \to z \ast z \not\simeq e), \tag{10}$$

asserting that, if the group is not commutative, there is an element whose square is not $e$. In addition to the axioms (A1)–(A3) of Fig. 2, we also use the right identity axiom (A2') $\forall x.\ x \ast e \simeq x$.<sup>3</sup> Based on Algorithm 1, we obtain the following derivation of the program for (10):

1. $\sigma_1 \ast \sigma_2 \not\simeq \sigma_2 \ast \sigma_1 \lor \mathtt{ans}(z)$ [preprocessed specification]
2. $e \simeq z \ast z \lor \mathtt{ans}(z)$ [preprocessed specification]
3. $\sigma_1 \ast \sigma_2 \not\simeq \sigma_2 \ast \sigma_1$ [answer literal removal 1. (Algorithm 1, line 10)]
4. $x \ast (x \ast y) \simeq e \ast y \lor \mathtt{ans}(x)$ [Sup 2., A3]
5. $e \simeq x \ast (y \ast (x \ast y)) \lor \mathtt{ans}(x \ast y)$ [Sup A3, 2.]
6. $x \ast (x \ast y) \simeq y \lor \mathtt{ans}(x)$ [Sup 4., A2]
7. $x \ast e \simeq y \ast (x \ast y) \lor \mathtt{ans}(\mathtt{if}\ e \simeq x \ast (y \ast (x \ast y))\ \mathtt{then}\ x\ \mathtt{else}\ x \ast y)$ [Sup 6., 5.]
8. $y \ast (x \ast y) \simeq x \lor \mathtt{ans}(\mathtt{if}\ e \simeq x \ast (y \ast (x \ast y))\ \mathtt{then}\ x\ \mathtt{else}\ x \ast y)$ [Sup 7., A2']
9. $x \ast y \simeq y \ast x \lor \mathtt{ans}(\mathtt{if}\ x \ast (y \ast x) \simeq y\ \mathtt{then}\ x\ \mathtt{else}\ \mathtt{if}\ e \simeq x \ast (y \ast (x \ast y))\ \mathtt{then}\ x\ \mathtt{else}\ x \ast y)$ [Sup 6., 8.]
10. $\mathtt{ans}(\mathtt{if}\ \sigma_1 \ast (\sigma_2 \ast \sigma_1) \simeq \sigma_2\ \mathtt{then}\ \sigma_1\ \mathtt{else}\ \mathtt{if}\ e \simeq \sigma_1 \ast (\sigma_2 \ast (\sigma_1 \ast \sigma_2))\ \mathtt{then}\ \sigma_1\ \mathtt{else}\ \sigma_1 \ast \sigma_2)$ [BR 9., 3.]
11. $\square$ [answer literal removal 10. (Algorithm 1, line 10)]

<sup>3</sup> We include axiom (A2') only to shorten the presentation of the obtained derivation.

The programs with conditions collected during saturation-based program synthesis, in particular corresponding to steps 3. and 11. above, are:

$$\begin{aligned} P_1[x, y] &:= \langle z,\ x \ast y \simeq y \ast x \rangle \\ P_2[x, y] &:= \langle \texttt{if } x \ast (y \ast x) \simeq y \texttt{ then } x \texttt{ else (if } e \simeq x \ast (y \ast (x \ast y)) \texttt{ then } x \texttt{ else } x \ast y),\ x \ast y \not\simeq y \ast x \rangle \end{aligned}$$

Note the variable z, representing an arbitrary witness, in P<sub>1</sub>[x, y]. An arbitrary value is a correct witness in case x ∗ y ≃ y ∗ x holds, as in this case (10) is trivially satisfied. Thus, we do not need to consider the case x ∗ y ≃ y ∗ x separately. Hence, we construct the final program P[x, y] only from P<sub>2</sub>[x, y] and obtain:

$$P[x,y] := \texttt{if } x \ast (y \ast x) \simeq y \texttt{ then } x \texttt{ else (if } e \simeq x \ast (y \ast (x \ast y)) \texttt{ then } x \texttt{ else } x \ast y)$$

We conclude this section by illustrating the benefits of computable unifiers.

*Example 3.* Consider the group theory specification

$$\forall x, y.\ \exists z.\ z \ast (i(x) \ast i(y)) = e,\tag{11}$$

describing the inverse element z of i(x) ∗ i(y). We annotate the inverse i(·) as uncomputable to disallow the trivial solution i(i(x) ∗ i(y)). Using computable unifiers, we synthesize the program y ∗ x; that is, a program computing y ∗ x as the inverse of i(x) ∗ i(y).
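As a sanity check that is independent of the synthesis procedure, the witness y ∗ x can be verified by hand. The following short derivation is only a sketch; it assumes associativity together with the standard group identities x ∗ i(x) = e and e ∗ y = y, which hold in every group:

$$\begin{aligned} (y \ast x) \ast (i(x) \ast i(y)) &= y \ast (x \ast (i(x) \ast i(y))) \\ &= y \ast ((x \ast i(x)) \ast i(y)) \\ &= y \ast (e \ast i(y)) \\ &= y \ast i(y) \\ &= e. \end{aligned}$$

Note that the witness itself contains no occurrence of the uncomputable symbol i, even though the verification uses it.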

#### **6 Computable Unification with Abstraction**

When compared to the Sup calculus of Fig. 1, our extended Sup calculus with answer literals from Fig. 3 uses computable unifiers instead of mgus. To find computable unifiers, we introduce Algorithm 2 by extending a standard unification algorithm [7,18] and an algorithm for unification with abstraction of [17]. Algorithm 2 combines computable unifiers with mgu computation, resulting in the computable unifier θ := mgucomp(E<sub>1</sub>, E<sub>2</sub>, E<sub>3</sub>) to be further used in Fig. 3.

Algorithm 2 modifies a standard unification algorithm to ensure computability of E<sub>3</sub>θ. Changes compared to a standard unification algorithm are highlighted. Algorithm 2 does not add s ↦ t to θ if s is a variable in E<sub>3</sub> and t is uncomputable. Instead, if t is f(t<sub>1</sub>,...,t<sub>n</sub>) where f is computable but not all t<sub>1</sub>,...,t<sub>n</sub> are computable, we extend θ by s ↦ f(x<sub>1</sub>,...,x<sub>n</sub>) and then add equations x<sub>1</sub> = t<sub>1</sub>,...,x<sub>n</sub> = t<sub>n</sub> to the set of equations E to be processed. Otherwise, f is uncomputable and we perform an abstraction: we consider s and t to be unified under the condition that s ≃ t holds. Therefore we add a constraint s ≄ t to the set of literals D which will be added to any clause invoking the computable unifier. To discharge the literal s ≄ t, one must prove s ≃ t. While s can be later substituted for other terms, as long as we use mgucomp, s will never be substituted for an uncomputable term. Thus, we conclude the following result.

**Theorem 6.** *Let* E<sub>1</sub>, E<sub>2</sub>, E<sub>3</sub> *be expressions. Then* (θ, D) := mgucomp(E<sub>1</sub>, E<sub>2</sub>, E<sub>3</sub>) *is a computable unifier.*

#### **Algorithm 2.** Computable Unification with Abstraction

    function mgucomp(E₁, E₂, E₃)
      if E₃ is uncomputable then fail
      let E be a set of equations and θ be a substitution; E := {E₁ = E₂}; θ := {}
      let 𝒟 be a set of disequalities; 𝒟 := ∅
      repeat
        if E is empty then
          return (θ, D), where D is the disjunction of literals in 𝒟
        Select an equation s = t in E and remove it from E
        if s coincides with t then
          do nothing
        else if s is a variable and s does not occur in t then
          if s does not occur in E₃ or t is computable then
            θ := θ ∘ {s ↦ t}; E := E{s ↦ t}
          else if t = f(t₁,...,tₙ) and f is computable then
            θ := θ ∘ {s ↦ f(x₁,...,xₙ)};
            E := E{s ↦ f(x₁,...,xₙ)} ∪ {x₁ = t₁,...,xₙ = tₙ}, where x₁,...,xₙ are fresh variables
          else if t = f(t₁,...,tₙ) and f is uncomputable then
            𝒟 := 𝒟 ∪ {s ≄ t}
        else if s is a variable and s occurs in t then
          fail
        else if t is a variable then
          E := E ∪ {t = s}
        else if s and t have different top-level symbols then
          fail
        else if s = f(s₁,...,sₙ) and t = f(t₁,...,tₙ) then
          E := E ∪ {s₁ = t₁,...,sₙ = tₙ}
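To make the control flow of Algorithm 2 concrete, here is a minimal Python sketch. It is not the Vampire implementation: the names `Var`, `App`, `mgu_comp` and the fresh-variable prefix `_v` are hypothetical, a term is assumed to be computable iff it contains no symbol from the set `unc`, and "s occurs in E₃" is read as "s occurs in the current instance E₃θ", so that fresh variables introduced for a variable of E₃ are also protected. Constraints are returned as a list of pairs (s, t) standing for s ≄ t and are not rewritten by later bindings.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fun: str
    args: tuple = ()


def computable(t, unc):
    """Assumption: a term is computable iff it uses no symbol from `unc`."""
    return isinstance(t, Var) or (
        t.fun not in unc and all(computable(a, unc) for a in t.args))


def occurs(v, t):
    return t == v if isinstance(t, Var) else any(occurs(v, a) for a in t.args)


def subst(t, v, r):
    """Replace every occurrence of variable v in t by r."""
    if isinstance(t, Var):
        return r if t == v else t
    return App(t.fun, tuple(subst(a, v, r) for a in t.args))


def mgu_comp(e1, e2, e3, unc):
    """Return (theta, constraints), or None if unification fails."""
    if not computable(e3, unc):
        return None
    eqs, theta, constraints, fresh = [(e1, e2)], {}, [], count()

    def bind(v, r):
        nonlocal eqs, e3
        for k in list(theta):
            theta[k] = subst(theta[k], v, r)
        theta[v] = r
        eqs = [(subst(a, v, r), subst(b, v, r)) for a, b in eqs]
        e3 = subst(e3, v, r)  # keep E3θ up to date

    while eqs:
        s, t = eqs.pop(0)
        if s == t:
            continue
        if isinstance(s, Var):
            if occurs(s, t):
                return None  # occurs check
            if not occurs(s, e3) or computable(t, unc):
                bind(s, t)
            elif t.fun not in unc:
                # computable head, uncomputable argument: decompose
                xs = tuple(Var(f"_v{next(fresh)}") for _ in t.args)
                bind(s, App(t.fun, xs))
                eqs.extend(zip(xs, t.args))
            else:
                constraints.append((s, t))  # abstraction: record s ≄ t
        elif isinstance(t, Var):
            eqs.append((t, s))
        elif s.fun != t.fun or len(s.args) != len(t.args):
            return None
        else:
            eqs.extend(zip(s.args, t.args))
    return theta, constraints
```

For instance, with `unc = {"i"}`, unifying `z` with `i(x)` while protecting `ans(z)` binds nothing and returns the single constraint `(z, i(x))`, whereas unifying `z` with `f(i(x), y)` binds `z` to `f(_v0, y)` and abstracts only the uncomputable argument.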

### **7 Implementation and Experiments**

**Implementation.** We implemented our saturation-based program synthesis approach in the Vampire prover [11]. We used Algorithm 1 with the extended Sup calculus of Fig. 3. The implementation, consisting of approximately 1100 lines of C++ code, is available at https://github.com/vprover/vampire/tree/synthesis-pr. The synthesis functionality can be turned on using the option --question_answering synthesis.

Vampire accepts functional specifications in an extension of the SMT-LIB2 format [4], by using the new command assert-not to mark the specification. We consider interpreted theory symbols to be computable. Uninterpreted symbols can be annotated as uncomputable via the command (set-option :uncomputable (symbol1 ... symbolN)).

Our implementation also integrates Algorithm 1 with the AVATAR architecture [26]. We modified the AVATAR framework to only allow splitting over ground computable clauses that do not contain answer literals. Further, if we derive a clause C[σ] ∨ ans(r[σ]) with AVATAR assertions C<sub>1</sub>[σ],...,C<sub>m</sub>[σ], where C[σ] is ground and computable, we replace it by the clause C[σ] ∨ ⋁<sub>i=1</sub><sup>m</sup> ¬C<sub>i</sub>[σ] ∨ ans(r[σ]) without any assertions. We then immediately record a program with conditions ⟨r[x], ¬C[x] ∧ ⋀<sub>i=1</sub><sup>m</sup> C<sub>i</sub>[x]⟩, and replace the clause by C[σ] ∨ ⋁<sub>i=1</sub><sup>m</sup> ¬C<sub>i</sub>[σ] (see lines 7–10 of Algorithm 1), which may be then further split by AVATAR.

Finally, our implementation simplifies the programs we synthesize. If during Algorithm 1 we record a program ⟨z, F⟩, where z is a variable, we do not use this program in the final program construction (line 12 of Algorithm 1) even if F occurs in the derivation of □ (see Example 2).

**Examples and Experimental Setup.** The goal of our experimental evaluation is to showcase the benefits of our approach on problems that are deemed to be hard, even unsolvable, by state-of-the-art synthesis techniques. We therefore focused on first-order theory reasoning and evaluated our work on the group theory problems of Examples 1–3, as well as on integer arithmetic problems.

As the SMT-LIB2 format can easily be translated into the SyGuS 2.1 syntax [16], we compared our results to cvc5 1.0.4 [3], supporting SyGuS-based synthesis [2]. Our experiments were run on an AMD Epyc 7502, 2.5 GHz CPU with 1 TB RAM, using a 5 min time limit per example. Our benchmarks as well as the configurations for our experiments are available at: https://github.com/vprover/vampire_benchmarks/tree/master/synthesis

**Experimental Results with Group Theory Properties.** Vampire synthesizes the solutions of the Examples 1–3 in 0.01, 13, and 0.03 s, respectively. Since these examples use uninterpreted functions, they cannot be encoded in the SyGuS 2.1 syntax, showcasing the limits of other synthesis tools.

**Experimental Results with Maximum of n ≥ 2 Integers.** For the maximum of 2 integers, the specification is ∀x<sub>1</sub>, x<sub>2</sub> ∈ ℤ. ∃y ∈ ℤ. (y ≥ x<sub>1</sub> ∧ y ≥ x<sub>2</sub> ∧ (y = x<sub>1</sub> ∨ y = x<sub>2</sub>)), and the program we synthesize is if x<sub>1</sub> < x<sub>2</sub> then x<sub>2</sub> else x<sub>1</sub>. Both our work and cvc5 are able to synthesize programs choosing the maximal value for up to n = 23 input variables, as summarized below. For n > 23, both Vampire and cvc5 time out.
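The relation between the specification and the synthesized program for n = 2 can be illustrated with a small bounded check. The sketch below is plain Python and unrelated to the Vampire or cvc5 tooling; the function names `spec`, `program` and `check` are hypothetical, and a finite test over a bounded range is of course not a proof over ℤ.

```python
from itertools import product


def spec(x1, x2, y):
    """Body of the specification: y is an upper bound and equals one of the inputs."""
    return y >= x1 and y >= x2 and (y == x1 or y == x2)


def program(x1, x2):
    """The synthesized program: if x1 < x2 then x2 else x1."""
    return x2 if x1 < x2 else x1


def check(bound=20):
    """Test the program against the specification on [-bound, bound]^2."""
    values = range(-bound, bound + 1)
    return all(spec(a, b, program(a, b)) for a, b in product(values, repeat=2))
```

Calling `check()` evaluates the specification on every pair in the chosen range with the program's output as the witness y.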


**Experimental Results with Polynomial Equations.** Vampire can synthesize the solution of polynomial equations; for example, for ∀x<sub>1</sub>, x<sub>2</sub> ∈ ℤ. ∃y ∈ ℤ. (y<sup>2</sup> = x<sub>1</sub><sup>2</sup> + 2x<sub>1</sub>x<sub>2</sub> + x<sub>2</sub><sup>2</sup>), we synthesize x<sub>1</sub> + x<sub>2</sub>. Vampire finds the corresponding program in 26 s using simple first-order reasoning, while cvc5 fails in our setup.

### **8 Related Work**

Our work builds upon deductive synthesis [14] adapted for the resolution calculus [13,22]. We extend this line of work with saturation-based program synthesis, by using adjustments of the superposition calculus.

Component-based synthesis of recursion-free programs [21] from logical specifications is addressed in [6,21,24]. The work of [21] uses first-order theorem proving to prove specifications and extract programs from proofs. In [6,24], ∃∀ formulas are produced to capture specifications over component properties and SMT solving is applied to find a term satisfying the formula, corresponding to a straight-line program. We complement [21] with saturation-based superposition proving and avoid template-based SMT solving from [6,24].

A prominent line of research comes with syntax guided synthesis (SyGuS) [1], where functional specifications are given using a context-free grammar. This grammar yields program templates to be synthesized via an enumerative search procedure based on SMT solving [3,9]. We believe our work is complementary to SyGuS, by strengthening first-order reasoning for program synthesis, as evidenced by Examples 1–3.

The sketching technique [19,25] synthesizes program assignments to variables, using an alternative framework to the program synthesis setting we rely upon. In particular, sketching addresses domains that do not involve input logical formulas as functional specifications, such as example-guided synthesis [23].

### **9 Conclusions**

We extend saturation-based proof search to saturation-based program synthesis, aiming to derive recursion-free programs from specifications. We integrate answer literals with saturation, and modify the superposition calculus and unification to synthesize computable programs. Our initial experiments show that a first-order theorem prover becomes an efficient program synthesizer, potentially opening up interesting avenues toward recursive program synthesis, for example using saturation-based proving with induction.

**Acknowledgements.** We thank Haniel Barbosa for support with experiments with cvc5. This work was partially funded by the ERC CoG ARTIST 101002685, and the FWF grants LogiCS W1255-N23 and LOCOTES P 35787.

### **References**



# **A Uniform Formalisation of Three-Valued Logics in Bisequent Calculus**

Andrzej Indrzejczak and Yaroslav Petrukhin

Department of Logic, University of Lodz, Łódź, Poland andrzej.indrzejczak@filhist.uni.lodz.pl, iaroslav.petrukhin@edu.uni.lodz.pl

**Abstract.** We present a uniform characterisation of three-valued logics by means of bisequent calculus (BSC). It is a generalised form of sequent calculus (SC) where rules operate on the ordered pairs of ordinary sequents. BSC may be treated as the weakest kind of system in the rich family of generalised SC operating on items being some collections of ordinary sequents. This family covers several forms of hypersequent and nested sequent calculi introduced to provide decent SC for several non-classical logics. It seems that for many non-classical logics, including some many-valued, paraconsistent and modal logics, this reasonably modest generalization of standard SC is sufficient. In this paper we examine a variety of three-valued logics and show how they can be formalised in the framework of bisequent calculus. All provided systems are cut-free and satisfy the subformula property. Also the interpolation theorem is constructively proved for some logics.

**Keywords:** Bisequent Calculus · Cut elimination · Many-valued Logic · Three-valued logic · Interpolation Theorem

### **1 Introduction**

The aim of this paper is to provide a uniform characterization of a variety of three-valued logics by means of a simple cut-free generalised sequent calculus (SC) called bisequent calculus (BSC). It is the weakest kind of system in the rich family of generalised sequent calculi operating on collections of ordinary sequents [23]. If we restrict our interest to structures built of two sequents only, we obtain a limiting case of either hypersequent or nested sequent calculi; it is what we call bisequent calculus.

Is such a restricted calculus of any use? Hypersequent calculi may already be seen as a quite restrictive form of generalised SC, yet they were shown to be useful in many fields (see, e.g., [25] for a survey of applications of hypersequent calculi in modal logic, and [37] for their use in fuzzy logic). BSC is even more restrictive but preliminary work on its application is promising. It was already successfully applied to first-order modal logic **S5** [23] and to the class of four-valued quasi-relevant logics [27]. In what follows we will focus on another application of such a minimal framework: three-valued logics.

Funded by the European Union (ERC, ExtenDD, project number: 101054714). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.

Several proof systems of different kinds have been proposed so far for many-valued logics (see e.g. Hähnle [20] for a survey). The most direct and popular approach to the construction of many-valued sequent or tableau systems is based on the idea of a syntactic representation of n values either by means of n-sided sequents (e.g. [8,45,56]) or by n labels attached to formulae or sets of formulae (e.g. [11,53,55]). This solution was presented by many authors and, despite its popularity, has many drawbacks (see [25] for discussion). A significant improvement in the construction of efficient SC or tableau systems for many-valued logic was proposed independently by Doherty [15] and Hähnle [19], where labels correspond not to single values but to their sets (sets-as-signs). Among other proof-theoretic approaches to many-valued logics let us mention Caleiro and Marcelino's [10] analytic calculi for many-valued non-deterministic logics as well as the result by Grätz [18], who has recently developed analytic tableau systems based on sets-as-signs DNF representations with a correspondence to canonical sequent calculi.

Although BSC is a strictly syntactical calculus, its semantical interpretation makes it similar to the sets-as-signs approach. A fuller discussion of this issue is provided in [27]. BSC is uniform in the sense that all three-valued logics are characterised by the same set of axiomatic sequents, and in the case of logics having the same set of connectives (i.e. defined in the same way) the rules are identical even if the set of designated values or the consequence relation is defined in a different way. In this sense BSC is more uniform than several other approaches where either the set of axioms must be changed or rules for connectives must be different (even if described by means of the same table). In particular, BSC is superior in this respect to the generalised calculus presented in [25].

Section 2 has a rather encyclopaedic character and provides a self-contained description of a representative selection of three-valued logics. Section 3 contains a case study of BSC for **K**<sub>3</sub> and **LP**. In Sect. 4 we provide rules for connectives of all logics introduced in Sect. 2. Section 5 shows how BSC can be applied to prove interpolation for some three-valued paraconsistent and paracomplete logics. We finish with remarks on possible extensions and a comparison with other approaches to the formalisation of many-valued logics.

### **2 Logics**

We will examine several three-valued propositional logics determined by three-element matrices with classical-like connectives (negation, disjunction, conjunction, and implication, plus the usual three-valued modal-style connectives); we are not going to consider other types of connectives because of the lack of space. The languages of these logics are freely generated algebras similar to three-element algebras of values. Logics are interpreted by homomorphisms from languages to algebras such that h(c<sup>n</sup>(ϕ<sub>1</sub>,...,ϕ<sub>n</sub>)) = c(h(ϕ<sub>1</sub>),...,h(ϕ<sub>n</sub>)) for every n-ary connective c<sup>n</sup> and the corresponding operation c.

Let us consider as the starting point two three-element Kleene algebras of the form A<sub>3</sub> = ⟨A, O⟩, where A = {0, u, 1} and O contains a unary operation ¬ : A ⟶ A and binary operations ∘ : A × A ⟶ A, where ∘ ∈ {∧, ∨, →}. The operations are defined by the following truth tables in the strong and weak Kleene algebra, the latter considered also by Bochvar [9] (negation is the same in both):


We obtain four matrices by specifying a set of designated values D either as {1} or {1, u}. These are called SM<sup>3</sup><sub>1</sub>, SM<sup>3</sup><sub>2</sub>, WM<sup>3</sup><sub>1</sub> and WM<sup>3</sup><sub>2</sub> (where S stands for strong, W for weak, and 1 and 2 indicate the number of designated values). In general we will call matrices with D = {1} 1-matrices, and those with D = {1, u} 2-matrices. Accordingly we will also call logics determined by 1-matrices and 2-matrices 1-logics and 2-logics, respectively. For any matrix M we define a relation of matrix consequence in the standard way:

Γ ⊨<sub>M</sub> ϕ iff for any homomorphism h: if h(Γ) ⊆ D, then h(ϕ) ∈ D.

Logics are identified with their matrix consequences. In particular, logics determined by these matrices are **K**<sub>3</sub> (strong Kleene 1-logic) [31], **LP**, the logic of paradox of Asenjo and Priest (the corresponding 2-logic) [2,42], **K**<sup>w</sup><sub>3</sub> (weak Kleene 1-logic) [31], and **PWK** (paraconsistent weak Kleene 2-logic) of Halldén [21].
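The definition of matrix consequence can be made concrete with a brute-force check over all valuations. The sketch below assumes the standard strong Kleene operations under the order 0 < u < 1 (¬a = 1 − a, ∧ = min, ∨ = max, a → b = max(1 − a, b)) and encodes u as 0.5; the tuple encoding of formulas and the names `value`, `atoms` and `entails` are hypothetical choices made for illustration only.

```python
from itertools import product

U = 0.5  # encoding of the third value u (an assumption of this sketch)


def value(f, h):
    """Strong Kleene value of formula f under valuation h (dict atom -> value)."""
    if isinstance(f, str):
        return h[f]
    op, *args = f
    v = [value(g, h) for g in args]
    if op == "not":
        return 1 - v[0]
    if op == "and":
        return min(v)
    if op == "or":
        return max(v)
    if op == "imp":
        return max(1 - v[0], v[1])
    raise ValueError(f"unknown connective {op!r}")


def atoms(f):
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(g) for g in f[1:]))


def entails(premises, conclusion, designated):
    """Matrix consequence: designated premises force a designated conclusion."""
    names = sorted(atoms(conclusion).union(*(atoms(p) for p in premises)))
    for vs in product((0, U, 1), repeat=len(names)):
        h = dict(zip(names, vs))
        if all(value(p, h) in designated for p in premises) \
                and value(conclusion, h) not in designated:
            return False
    return True


K3 = {1}       # designated values of the 1-matrix
LP = {1, U}    # designated values of the 2-matrix
```

With this encoding, `entails([("and", "p", ("not", "p"))], "q", K3)` holds while the same call with `LP` fails (paraconsistency), and `entails([], ("or", "p", ("not", "p")), LP)` holds while the `K3` version fails (paracompleteness).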

Let us consider a few modifications of strong and weak Kleene logics. The first is McCarthy's logic **K**<sup>→</sup><sub>3</sub> [36] (also called Kleene's sequential logic and studied by Fitting [16]); the second is its interesting modification presented by Komendantskaya [32] under the name **K**<sup>←</sup><sub>3</sub>. Both are defined by means of the following truth tables (again, negation is unchanged):


Both **K**<sup>→</sup><sub>3</sub> and **K**<sup>←</sup><sub>3</sub> are logics determined by 1-matrices. An important property of **K**<sub>3</sub>, **K**<sup>w</sup><sub>3</sub>, **K**<sup>→</sup><sub>3</sub>, and **K**<sup>←</sup><sub>3</sub> is that they are the only three-valued logics with one designated value which produce partial recursive predicates (see [31,32] for more details).

Several other important logics are obtained by changing the definitions of → and ¬. Consider Łukasiewicz's [34], Słupecki's [49], Heyting's [22] implications as well as Heyting's [22], Bochvar's [9], Post's and dual Post's [41] negations. Let us also consider yet another pair of additive conjunction and disjunction, arising in Łukasiewicz's logic:


SM<sup>3</sup><sub>1</sub> with Łukasiewicz's implication (instead of Kleene's one) yields the famous Łukasiewicz logic **Ł**<sub>3</sub>, the first many-valued logic. In **Ł**<sub>3</sub> we may deal with two pairs of conjunction and disjunction. We have: ϕ ∨ ψ = (ϕ →<sub>L</sub> ψ) →<sub>L</sub> ψ and ϕ ∧ ψ = ¬(¬ϕ ∨ ¬ψ), but ϕ ∧<sub>L</sub> ψ = ¬(ϕ →<sub>L</sub> ¬ψ) and ϕ ∨<sub>L</sub> ψ = ¬ϕ →<sub>L</sub> ψ. SM<sup>3</sup><sub>1</sub> with Słupecki's implication is an alternative to **Ł**<sub>3</sub> having the deduction theorem. It was studied by Słupecki, Bryll, and Prucnal [49] as well as Avron [4], under the name **GM**<sub>3</sub>. If we change negation and implication of SM<sup>3</sup><sub>1</sub> to Heyting's ones, then we get Heyting's [22] logic **G**<sub>3</sub>, a close relative of intuitionistic logic (named after Gödel, who also studied it [17]; this logic was investigated by Jaśkowski as well [29]). The disjunction of SM<sup>3</sup><sub>1</sub> and Post's cyclic negation form Post's logic **P**<sub>3</sub> [41], which is known for being functionally complete in the three-valued setting. In [40], a dual cyclic negation ¬<sub>DP</sub> was suggested (it reverses the direction of cyclicality of Post's negation). SM<sup>3</sup><sub>2</sub> with Heyting's implication and Bochvar's negation was investigated by Osorio and Carballido [38] under the name **G′**<sub>3</sub>. In the case of SM<sup>3</sup><sub>2</sub> the following connectives are interesting as well: Sobociński's [51] conjunction, disjunction, and implications as well as D'Ottaviano/DaCosta/Jaśkowski/Słupecki's implication [13,30,48]:


Sobociński's logic **S**<sub>3</sub> is obtained from SM<sup>3</sup><sub>2</sub> by the replacement of all binary connectives of this matrix with Sobociński's original ones. This logic may be treated as a relevant logic. However, a more popular three-valued relevant logic is Anderson and Belnap's **RM**<sub>3</sub> [1], which is obtained from SM<sup>3</sup><sub>2</sub> only by the replacement of its implication with Sobociński's one. Note that earlier Sobociński [50] considered yet another implication →<sub>S</sub>. SM<sup>3</sup><sub>2</sub> with the implication due to D'Ottaviano/DaCosta/Jaśkowski/Słupecki (first mentioned by Słupecki [48]) instead of Kleene's one was independently studied by several authors: D'Ottaviano and da Costa themselves [13,14], Asenjo and Tamburino [3], Batens [7] (under the name PI<sup>s</sup>), Avron [5] (under the name **RM**<sup>⊃</sup><sub>3</sub>), and Rozonoer [44] (under the name **PCont**). An important extension of this logic is **J**<sub>3</sub> by D'Ottaviano and da Costa [13]. It has an additional connective which is Łukasiewicz's tabular possibility operator (see below; we also present Łukasiewicz's tabular necessity operator).


As is easy to guess, since **RM**<sub>3</sub> may be viewed as a relevant logic, it should be paraconsistent as well. Moreover, **J**<sub>3</sub>, **S**<sub>3</sub>, **LP**, and many other three-valued logics with two designated values are paraconsistent (in contrast, three-valued logics with one designated value are paracomplete). One of the most famous three-valued paraconsistent logics is Sette's logic **P**<sup>1</sup> [47]. It has Bochvar's negation and the binary connectives presented above (both 1 and u are designated). There is a version of **P**<sup>1</sup> with Kleene's negation introduced by Carnielli and Marcos [12,35] and called **P**<sup>2</sup>. A paracomplete companion of **P**<sup>1</sup>, the logic **I**<sup>1</sup>, was presented by Sette and Carnielli [46]: it has Heyting's negation and the binary connectives presented below (the implication was first introduced by Bochvar [9]). Its version with Kleene's negation is **I**<sup>2</sup> due to Marcos [35]. Both **I**<sup>1</sup> and **I**<sup>2</sup> have one designated value.


Last but not least, let us mention Rescher's [43] and Tomova's [57] implications (added above). These implications can be added to SM<sup>3</sup><sub>1</sub>. Tomova [57] introduced the concept of natural implication. In the three-valued case with one designated value there are only 6 natural implications: Łukasiewicz's, Słupecki's, Heyting's, Bochvar's, Rescher's, and Tomova's. In the case with two designated values there are 24 natural implications, including Heyting's and Rescher's as well as D'Ottaviano/DaCosta/Jaśkowski/Słupecki's implication, both Sobociński's implications, and Sette's implication.

### **3 Bisequent Calculus for K<sub>3</sub> (and LP)**

Bisequents in BSC are ordered pairs of sequents Γ ⇒ Δ | Π ⇒ Σ, where Γ, Δ, Π, Σ are finite (possibly empty) multisets of formulae. We will call the left component of a bisequent the 1-sequent and the right component the 2-sequent. Bisequents with all elements being atomic will also be called atomic. In what follows *B* stands for arbitrary bisequents and *S* for sequents.

Let us define the calculus BSC-K<sub>3</sub> which provides an adequate formalisation of **K**<sub>3</sub>. A bisequent Γ ⇒ Δ | Π ⇒ Σ is axiomatic iff it has nonempty Γ ∩ Σ or Γ ∩ Δ or Π ∩ Σ. In fact this set of axioms is fixed for all considered calculi. If the constants ⊤, ⊥, U (the last for a fixed undefined proposition) are added, we must add axioms of the form: Γ ⇒ Δ, ⊤ | Π ⇒ Σ; Γ ⇒ Δ | Π ⇒ Σ, ⊤; ⊥, Γ ⇒ Δ | Π ⇒ Σ; Γ ⇒ Δ | ⊥, Π ⇒ Σ; U, Γ ⇒ Δ | Π ⇒ Σ and Γ ⇒ Δ | Π ⇒ Σ, U.

The set of rules characterising the operations of the strong Kleene algebra consists of the following schemata:

$$\begin{array}{ll} (\neg\Rightarrow\mid)\ \dfrac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\neg\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma} & (\Rightarrow\neg\mid)\ \dfrac{\Gamma \Rightarrow \Delta \mid \varphi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta, \neg\varphi \mid \Pi \Rightarrow \Sigma} \\[3ex] (\mid\neg\Rightarrow)\ \dfrac{\Gamma \Rightarrow \Delta, \varphi \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \neg\varphi, \Pi \Rightarrow \Sigma} & (\mid\Rightarrow\neg)\ \dfrac{\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \neg\varphi} \end{array}$$

$$\begin{array}{ll} (\wedge\Rightarrow\mid)\ \dfrac{\varphi, \psi, \Gamma \Rightarrow \Delta \mid S}{\varphi \wedge \psi, \Gamma \Rightarrow \Delta \mid S} & (\Rightarrow\wedge\mid)\ \dfrac{\Gamma \Rightarrow \Delta, \varphi \mid S \qquad \Gamma \Rightarrow \Delta, \psi \mid S}{\Gamma \Rightarrow \Delta, \varphi \wedge \psi \mid S} \\[3ex] (\mid\wedge\Rightarrow)\ \dfrac{S \mid \varphi, \psi, \Gamma \Rightarrow \Delta}{S \mid \varphi \wedge \psi, \Gamma \Rightarrow \Delta} & (\mid\Rightarrow\wedge)\ \dfrac{S \mid \Gamma \Rightarrow \Delta, \varphi \qquad S \mid \Gamma \Rightarrow \Delta, \psi}{S \mid \Gamma \Rightarrow \Delta, \varphi \wedge \psi} \\[3ex] (\Rightarrow\vee\mid)\ \dfrac{\Gamma \Rightarrow \Delta, \varphi, \psi \mid S}{\Gamma \Rightarrow \Delta, \varphi \vee \psi \mid S} & (\vee\Rightarrow\mid)\ \dfrac{\varphi, \Gamma \Rightarrow \Delta \mid S \qquad \psi, \Gamma \Rightarrow \Delta \mid S}{\varphi \vee \psi, \Gamma \Rightarrow \Delta \mid S} \\[3ex] (\mid\Rightarrow\vee)\ \dfrac{S \mid \Gamma \Rightarrow \Delta, \varphi, \psi}{S \mid \Gamma \Rightarrow \Delta, \varphi \vee \psi} & (\mid\vee\Rightarrow)\ \dfrac{S \mid \varphi, \Gamma \Rightarrow \Delta \qquad S \mid \psi, \Gamma \Rightarrow \Delta}{S \mid \varphi \vee \psi, \Gamma \Rightarrow \Delta} \\[3ex] (\Rightarrow\rightarrow\mid)\ \dfrac{\Gamma \Rightarrow \Delta, \psi \mid \varphi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta, \varphi \rightarrow \psi \mid \Pi \Rightarrow \Sigma} & (\mid\Rightarrow\rightarrow)\ \dfrac{\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \psi}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \rightarrow \psi} \\[3ex] (\rightarrow\Rightarrow\mid)\ \dfrac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}{\varphi \rightarrow \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma} & (\mid\rightarrow\Rightarrow)\ \dfrac{\Gamma \Rightarrow \Delta, \varphi \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta \mid \psi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \varphi \rightarrow \psi, \Pi \Rightarrow \Sigma} \end{array}$$

Note that all rules satisfy the subformula property and other desirable properties of well-behaved SC. In particular, they are context independent in the sense that validity-preservation of rules is unaffected by the deletion or addition of the same parameters in the premisses and conclusion. This feature will be of special importance for the proof of the interpolation theorem. One may easily observe that in the case of the rules for strong ∧, ∨ we have just the standard G3 rules but repeated in both components. Rules for negation and implication have a different character since side formulae and the principal formula are in different sequents in all cases.

Bisequents as such do not directly correspond to standard consequence relations in suitable matrices. Hence before we define the notion of a proof in BSC-K<sub>3</sub> (or any other logic) it is better to start with a more general concept. A proof-search tree for a bisequent B in BSC-L, where L is any logic, is a tree of bisequents with B as the root and nodes generated by rules of BSC-L. A proof-search tree is complete iff every leaf is atomic, and it is axiomatic iff all leaves are axiomatic. The height of a proof-search tree is defined as the length of its longest branch. A simple consequence of the subformula property of rules is:

**Proposition 1.** *Every proof-search tree may be extended to a complete proof-search tree.*

The notion of a proof in BSC-K<sub>3</sub> is introduced not only by restricting the class of proof-search trees in BSC-K<sub>3</sub> to axiomatic ones but also by restricting the class of admissible roots. In general the rationale for bisequents is that the 1-sequent corresponds to the consequence relation in 1-matrices and the 2-sequent to the consequence relation in 2-matrices. Since **K**<sub>3</sub> is characterised by a 1-matrix we have:

BSC-K<sub>3</sub> ⊢ B iff there is an axiomatic proof-search tree for B := Γ ⇒ ϕ | ⇒.
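As a small illustration of this definition (a sketch for atomic p and q, not an example taken from the paper), the following axiomatic proof-search tree shows that p ∧ ¬p ⇒ q | ⇒ is provable in BSC-K<sub>3</sub>. The leaf is axiomatic because p occurs both in Γ and in Σ:

$$\dfrac{\dfrac{p \Rightarrow q \mid\ \Rightarrow p}{p, \neg p \Rightarrow q \mid\ \Rightarrow}\ (\neg\Rightarrow\mid)}{p \wedge \neg p \Rightarrow q \mid\ \Rightarrow}\ (\wedge\Rightarrow\mid)$$

Read bottom-up, (∧⇒|) splits the conjunction inside the 1-sequent, and (¬⇒|) moves p from under the negation into the succedent of the 2-sequent.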

We define the L-validity (L-satisfiability) of bisequents in the following way:

**L** ⊨ Γ ⇒ Δ | Π ⇒ Σ iff every homomorphism h satisfies Γ ⇒ Δ | Π ⇒ Σ. The latter holds for h iff for some ϕ: either (ϕ ∈ Γ and h(ϕ) ≠ 1) or (ϕ ∈ Δ and h(ϕ) = 1) or (ϕ ∈ Π and h(ϕ) = 0) or (ϕ ∈ Σ and h(ϕ) ≠ 0).

Clearly **L** ⊭ Γ ⇒ Δ | Π ⇒ Σ iff for some h, all elements of Γ are true, all elements of Δ are either false or undefined, all elements of Π are either true or undefined and all elements of Σ are false. In this case we say that h falsifies this bisequent.
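The falsification condition translates directly into a brute-force validity test. The sketch below assumes the strong Kleene operations (¬a = 1 − a, ∧ = min, ∨ = max, a → b = max(1 − a, b)) with u encoded as 0.5; the formula encoding and the names `value`, `atoms`, `falsifies` and `valid` are hypothetical and serve illustration only.

```python
from itertools import product

U = 0.5  # encoding of the undefined value u (an assumption of this sketch)


def value(f, h):
    """Strong Kleene value of formula f under valuation h."""
    if isinstance(f, str):
        return h[f]
    op, *args = f
    v = [value(g, h) for g in args]
    if op == "not":
        return 1 - v[0]
    if op == "and":
        return min(v)
    if op == "or":
        return max(v)
    if op == "imp":
        return max(1 - v[0], v[1])
    raise ValueError(f"unknown connective {op!r}")


def atoms(f):
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(g) for g in f[1:]))


def falsifies(h, gamma, delta, pi, sigma):
    """h falsifies Γ ⇒ Δ | Π ⇒ Σ: Γ true, Δ not true, Π not false, Σ false."""
    return (all(value(f, h) == 1 for f in gamma)
            and all(value(f, h) != 1 for f in delta)
            and all(value(f, h) != 0 for f in pi)
            and all(value(f, h) == 0 for f in sigma))


def valid(gamma, delta, pi, sigma):
    """A bisequent is valid iff no valuation falsifies it."""
    parts = (gamma, delta, pi, sigma)
    names = sorted(set().union(*(atoms(f) for part in parts for f in part)))
    return not any(falsifies(dict(zip(names, vs)), *parts)
                   for vs in product((0, U, 1), repeat=len(names)))
```

For example, `valid(["p"], [], [], ["p"])` holds, matching the axiom condition on Γ ∩ Σ, whereas `valid([], ["p"], ["p"], [])` fails because h(p) = u falsifies the bisequent; this is why a shared formula in Δ and Π does not yield an axiom.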

Obviously, all axiomatic bisequents are valid for any logic L. As for the rules they are not only sound (i.e. validity-preserving) but also invertible; namely it holds:

**Theorem 1.** *For all rules of BSC-K<sub>3</sub>, all premisses are* **K**<sub>3</sub>*-valid iff the conclusion is* **K**<sub>3</sub>*-valid.*

*Proof.* Straightforward proof by tedious checking.

A simple consequence of this theorem is that for every rule the conclusion is falsified by some h iff at least one premiss is falsified by the same h.

**Theorem 2 (Soundness).** *If* BSC-K<sub>3</sub> ⊢ Γ ⇒ ϕ |⇒*, then* Γ ⊨<sub>K<sub>3</sub></sub> ϕ*.*

*Proof.* By induction on the height of the proof, using Theorem 1.

Invertibility of all rules implies that the proof-search process is confluent, i.e. that the order in which rules are applied does not affect the result. In particular, B is provable iff every proof-search tree for B may be extended to obtain a proof.

**Theorem 3 (Completeness).** *If* Γ ⊨<sub>K<sub>3</sub></sub> ϕ*, then* BSC-K<sub>3</sub> ⊢ Γ ⇒ ϕ |⇒*.*

*Proof.* Assume that Γ ⊨<sub>K<sub>3</sub></sub> ϕ but BSC-K<sub>3</sub> ⊬ Γ ⇒ ϕ |⇒. Hence in every complete proof-search tree for Γ ⇒ ϕ |⇒ there is at least one branch whose leaf is a non-axiomatic atomic bisequent, falsified by some h. Since, by Theorem 1, every rule passes this falsifying valuation from a premiss to its conclusion, the root is also falsified, contrary to our assumption.

As a simple consequence we obtain also a decision procedure for **K**<sup>3</sup> (and for other logics L with complete BSC-L). Another by-product of our proof is that the following cut rules are admissible in BSC-K<sup>3</sup> (and other logics):

$$(Cut \mid)\quad \frac{\Gamma \Rightarrow \Delta, \varphi \mid \Lambda \Rightarrow \Theta \qquad \varphi, \Pi \Rightarrow \Sigma \mid \Xi \Rightarrow \Omega}{\Gamma, \Pi \Rightarrow \Delta, \Sigma \mid \Lambda, \Xi \Rightarrow \Theta, \Omega}$$

$$(\mid Cut)\quad \frac{\Gamma \Rightarrow \Delta \mid \Lambda \Rightarrow \Theta, \varphi \qquad \Pi \Rightarrow \Sigma \mid \varphi, \Xi \Rightarrow \Omega}{\Gamma, \Pi \Rightarrow \Delta, \Sigma \mid \Lambda, \Xi \Rightarrow \Theta, \Omega}$$

Moreover, we can constructively prove that these cut rules are admissible in the same way as it is done for four-valued logics in [27]. Due to lack of space we omit this issue here.

Note that the rules stated above provide a BSC not only for **K**<sub>3</sub> but also for **LP**. The only difference is that in **LP** we consider as provable all bisequents of the form ⇒| Γ ⇒ ϕ, which is a consequence of the fact that **LP** is determined by a 2-matrix. All the results established for BSC-K<sub>3</sub> hold for BSC-LP.
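The proof-search procedure behind these results can be sketched in Python. This is a minimal sketch under stated assumptions: the rules for ¬, ∧ and ∨ are taken in the usual BSC-K<sub>3</sub> form (they are not restated in this section), formulae are encoded as strings (atoms) or tuples such as `("and", A, B)`, and all names in the code are hypothetical. An atomic bisequent Γ ⇒ Δ | Π ⇒ Σ is assumed to be axiomatic when Γ ∩ Δ, Γ ∩ Σ or Π ∩ Σ is non-empty, which is exactly the case in which no h can falsify it.

```python
GAMMA, DELTA, PI, SIGMA = range(4)

# Assumed K3-style negation rules: the negated formula moves to this position.
NEG_TARGET = {GAMMA: SIGMA, DELTA: PI, PI: DELTA, SIGMA: GAMMA}
# Positions in which a binary connective needs one premiss only (no branching).
ONE_PREMISS = {"and": (GAMMA, PI), "or": (DELTA, SIGMA)}


def is_atom(formula):
    return isinstance(formula, str)


def axiomatic(bisequent):
    gamma, delta, pi, sigma = bisequent
    return bool(gamma & delta or gamma & sigma or pi & sigma)


def add(parts, position, formulas):
    new = list(parts)
    new[position] = new[position] | frozenset(formulas)
    return tuple(new)


def premisses(bisequent, position, formula):
    """Premisses of the rule whose principal formula sits at `position`."""
    rest = list(bisequent)
    rest[position] = rest[position] - {formula}
    op, args = formula[0], formula[1:]
    if op == "not":
        return [add(rest, NEG_TARGET[position], args)]
    if position in ONE_PREMISS[op]:
        return [add(rest, position, args)]
    return [add(rest, position, [arg]) for arg in args]


def provable(bisequent):
    """Expand any compound formula; by confluence the choice does not matter."""
    for position, formulas in enumerate(bisequent):
        for formula in formulas:
            if not is_atom(formula):
                return all(provable(p)
                           for p in premisses(bisequent, position, formula))
    return axiomatic(bisequent)


def k3_entails(premises, conclusion):
    # Root of the form  Gamma => phi | =>
    return provable((frozenset(premises), frozenset([conclusion]),
                     frozenset(), frozenset()))


def lp_entails(premises, conclusion):
    # Root of the form  => | Gamma => phi
    return provable((frozenset(), frozenset(),
                     frozenset(premises), frozenset([conclusion])))
```

With this encoding, `k3_entails([], ("or", "p", ("not", "p")))` fails while the corresponding `lp_entails` call succeeds, and conversely for deriving `"q"` from `("and", "p", ("not", "p"))`; the only difference between the two procedures is the form of the root.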

### **4 Bisequent Calculi for Other Logics**

We provide sets of rules adequate for all logics described in Sect. 2. Every operation will be characterised by four rules, introducing it into the antecedents and succedents of the 1- and 2-sequent. The rules are devised on the basis of geometrical insights drawn from the tabular representation of the respective connective: to establish the premisses of the rule with the principal formula in one of the four positions in a bisequent, we just examine its table. For example, if the indicated values of the arguments form a rectangle, one premiss is enough; in the case of more complex shapes, two or three premisses are required. Since the construction of rules on the basis of tables is not deterministic, we do not propose any algorithm for it; however, at the end of this section we will illustrate the method with one example. In every case it holds that either:

$$\Gamma \models_L \varphi \text{ iff } \text{BSC-L} \vdash \Gamma \Rightarrow \varphi \mid \Rightarrow \quad \text{or} \quad \Gamma \models_L \varphi \text{ iff } \text{BSC-L} \vdash\ \Rightarrow \mid \Gamma \Rightarrow \varphi$$

depending on whether ⊨<sub>L</sub> denotes the consequence relation of a logic characterised by 1-matrices or by 2-matrices. Adequacy of BSC-L for all concrete logics is proved in the same way as for BSC-K<sub>3</sub>. Therefore we limit our presentation to a systematic characterisation of the rules from which the BSC for a given logic can be composed.

We start with rules for the respective unary operations (including Łukasiewicz's modalities):

$$(\mid \neg_H \Rightarrow)\ \frac{S \mid \Pi \Rightarrow \Sigma, \varphi}{S \mid \neg\varphi, \Pi \Rightarrow \Sigma} \qquad (\mid \Rightarrow \neg_H)\ \frac{S \mid \varphi, \Pi \Rightarrow \Sigma}{S \mid \Pi \Rightarrow \Sigma, \neg\varphi}$$

$$(\neg_B \Rightarrow \mid)\ \frac{\Gamma \Rightarrow \Delta, \varphi \mid S}{\neg\varphi, \Gamma \Rightarrow \Delta \mid S} \qquad (\Rightarrow \neg_B \mid)\ \frac{\varphi, \Gamma \Rightarrow \Delta \mid S}{\Gamma \Rightarrow \Delta, \neg\varphi \mid S}$$

$$(\mid \neg_P \Rightarrow)\ \frac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \qquad \varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \neg\varphi, \Pi \Rightarrow \Sigma} \qquad (\mid \Rightarrow \neg_P)\ \frac{\Gamma \Rightarrow \Delta, \varphi \mid \varphi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \neg\varphi}$$

$$(\neg_{DP} \Rightarrow \mid)\ \frac{\Gamma \Rightarrow \Delta, \varphi \mid \varphi, \Pi \Rightarrow \Sigma}{\neg\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma} \qquad (\Rightarrow \neg_{DP} \mid)\ \frac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \qquad \varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta, \neg\varphi \mid \Pi \Rightarrow \Sigma}$$

The remaining rules in each case (namely (¬<sub>H</sub>⇒ |), (⇒¬<sub>H</sub> |), (| ¬<sub>B</sub>⇒), (|⇒¬<sub>B</sub>), (¬<sub>P</sub>⇒ |), (⇒¬<sub>P</sub> |), (| ¬<sub>DP</sub>⇒) and (|⇒¬<sub>DP</sub>)) are like the respective rules of BSC-K<sub>3</sub>. Consider the premisses of (|⇒¬<sub>P</sub>) and (¬<sub>DP</sub>⇒ |), which display two occurrences of the same side formula: in semantical terms this has the effect of evaluating ϕ as undefined.

$$(\Diamond \Rightarrow \mid)\ \frac{\Gamma \Rightarrow \Delta \mid \varphi, \Pi \Rightarrow \Sigma}{\Diamond\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma} \qquad (\Rightarrow \Diamond \mid)\ \frac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\Gamma \Rightarrow \Delta, \Diamond\varphi \mid \Pi \Rightarrow \Sigma}$$

$$(\mid \Diamond \Rightarrow)\ \frac{\Gamma \Rightarrow \Delta \mid \varphi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \Diamond\varphi, \Pi \Rightarrow \Sigma} \qquad (\mid \Rightarrow \Diamond)\ \frac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \Diamond\varphi}$$

$$(\Box \Rightarrow \mid)\ \frac{\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}{\Box\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma} \qquad (\Rightarrow \Box \mid)\ \frac{\Gamma \Rightarrow \Delta, \varphi \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta, \Box\varphi \mid \Pi \Rightarrow \Sigma}$$

$$(\mid \Box \Rightarrow)\ \frac{\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \Box\varphi, \Pi \Rightarrow \Sigma} \qquad (\mid \Rightarrow \Box)\ \frac{\Gamma \Rightarrow \Delta, \varphi \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \Box\varphi}$$

Not surprisingly, the rules introducing a modal formula into the antecedents or into the succedents of 1- and 2-sequents have the same premisses; this is a consequence of the fact that such a formula is never undefined. The same remark applies to the rules for ¬<sub>H</sub> and ¬<sub>B</sub>.
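The invertibility of the unary rules can be checked mechanically. The sketch below is an illustration under assumptions: the truth tables are those implied by the rules themselves (the tables of Sect. 2 are not repeated here), the value encoding `1`, `"u"`, `0` and all identifiers are hypothetical, and contexts are ignored because every rule shares them between premisses and conclusion. A rule passes the check when, for each value of ϕ, the conclusion can be falsified exactly when some premiss can.

```python
U = "u"
VALUES = (1, U, 0)

# Values that a falsifying valuation may assign to a formula in each position.
ALLOWED = {"Gamma": {1}, "Delta": {0, U}, "Pi": {1, U}, "Sigma": {0}}

# Assumed tables, read off from the rules in the text.
TABLES = {
    "neg_H": {1: 0, U: 0, 0: 1},
    "neg_B": {1: 0, U: 1, 0: 1},
    "neg_P": {1: U, U: 0, 0: 1},
    "neg_DP": {1: 0, U: 1, 0: U},
    "diamond": {1: 1, U: 1, 0: 0},
    "box": {1: 1, U: 0, 0: 0},
}

# rule name -> (operator, position of the principal formula,
#               premisses, each given as the positions that receive phi)
RULES = {
    "(|¬H⇒)": ("neg_H", "Pi", [["Sigma"]]),
    "(|⇒¬H)": ("neg_H", "Sigma", [["Pi"]]),
    "(¬B⇒|)": ("neg_B", "Gamma", [["Delta"]]),
    "(⇒¬B|)": ("neg_B", "Delta", [["Gamma"]]),
    "(|¬P⇒)": ("neg_P", "Pi", [["Sigma"], ["Gamma"]]),
    "(|⇒¬P)": ("neg_P", "Sigma", [["Delta", "Pi"]]),
    "(¬DP⇒|)": ("neg_DP", "Gamma", [["Delta", "Pi"]]),
    "(⇒¬DP|)": ("neg_DP", "Delta", [["Sigma"], ["Gamma"]]),
    "(◇⇒|)": ("diamond", "Gamma", [["Pi"]]),
    "(⇒◇|)": ("diamond", "Delta", [["Sigma"]]),
    "(|◇⇒)": ("diamond", "Pi", [["Pi"]]),
    "(|⇒◇)": ("diamond", "Sigma", [["Sigma"]]),
    "(□⇒|)": ("box", "Gamma", [["Gamma"]]),
    "(⇒□|)": ("box", "Delta", [["Delta"]]),
    "(|□⇒)": ("box", "Pi", [["Gamma"]]),
    "(|⇒□)": ("box", "Sigma", [["Delta"]]),
}


def invertible(operator, position, premisses):
    """Conclusion falsifiable for value v  iff  some premiss falsifiable for v."""
    table = TABLES[operator]
    for v in VALUES:
        conclusion = table[v] in ALLOWED[position]
        some_premiss = any(all(v in ALLOWED[p] for p in premiss)
                           for premiss in premisses)
        if conclusion != some_premiss:
            return False
    return True
```

A premiss listing two positions, such as `["Delta", "Pi"]`, forces ϕ to be undefined, which is how the double occurrence of the side formula is modelled here.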

The set of rules for weak ∧, ∨, → is also partly identical with that for BSC-K<sub>3</sub>. The identical rules are (∧<sub>w</sub>⇒ |), (⇒∧<sub>w</sub> |), (|⇒∨<sub>w</sub>), (| ∨<sub>w</sub>⇒), (|⇒→<sub>w</sub>) and (| →<sub>w</sub>⇒). In the remaining cases we have three-premiss rules:

$$(\mid \wedge_w \Rightarrow)\quad \frac{\Gamma \Rightarrow \Delta \mid \varphi, \psi, \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta, \varphi \mid \varphi, \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta, \psi \mid \psi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \varphi \wedge \psi, \Pi \Rightarrow \Sigma}$$

$$(\mid \Rightarrow \wedge_w)\quad \frac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi, \psi \qquad \varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \psi \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \wedge \psi}$$

$$(\Rightarrow \vee_w \mid)\quad \frac{\Gamma \Rightarrow \Delta, \varphi, \psi \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta, \varphi \mid \varphi, \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta, \psi \mid \psi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta, \varphi \vee \psi \mid \Pi \Rightarrow \Sigma}$$

$$(\vee_w \Rightarrow \mid)\quad \frac{\varphi, \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma \qquad \varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \psi \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\varphi \vee \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}$$

$$(\Rightarrow \rightarrow_w \mid)\quad \frac{\Gamma \Rightarrow \Delta, \psi \mid \varphi, \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta, \varphi \mid \varphi, \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta, \psi \mid \psi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta, \varphi \rightarrow \psi \mid \Pi \Rightarrow \Sigma}$$

$$(\rightarrow_w \Rightarrow \mid)\quad \frac{\varphi, \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi, \psi \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\varphi \rightarrow \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}$$

In the case of **K**<sub>3</sub><sup>→</sup> and **K**<sub>3</sub><sup>←</sup> the specific rules are:

$$(\mid \Rightarrow \wedge_{mC})\ \frac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \qquad \varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \psi}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \wedge \psi} \qquad (\mid \wedge_{mC} \Rightarrow)\ \frac{\Gamma \Rightarrow \Delta \mid \varphi, \psi, \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta, \varphi \mid \varphi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \varphi \wedge \psi, \Pi \Rightarrow \Sigma}$$

$$(\vee_{mC} \Rightarrow \mid)\ \frac{\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\varphi \vee \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma} \qquad (\Rightarrow \vee_{mC} \mid)\ \frac{\Gamma \Rightarrow \Delta, \varphi, \psi \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta, \varphi \mid \varphi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta, \varphi \vee \psi \mid \Pi \Rightarrow \Sigma}$$

$$(\Rightarrow \rightarrow_{mC} \mid)\ \frac{\Gamma \Rightarrow \Delta, \psi \mid \varphi, \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta, \varphi \mid \varphi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta, \varphi \rightarrow \psi \mid \Pi \Rightarrow \Sigma} \qquad (\rightarrow_{mC} \Rightarrow \mid)\ \frac{\varphi, \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\varphi \rightarrow \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}$$

$$(\mid \Rightarrow \wedge_K)\ \frac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \psi \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \wedge \psi} \qquad (\mid \wedge_K \Rightarrow)\ \frac{\Gamma \Rightarrow \Delta \mid \varphi, \psi, \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta, \psi \mid \psi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \varphi \wedge \psi, \Pi \Rightarrow \Sigma}$$

$$(\vee_K \Rightarrow \mid)\quad \frac{\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \psi \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}{\varphi \vee \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}$$

$$(\Rightarrow \lor\_K \mid) \quad \frac{\Gamma \Rightarrow \Delta, \varphi, \psi \mid \varPi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta, \psi \mid \psi, \varPi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta, \varphi \lor \psi \mid \varPi \Rightarrow \Sigma}$$

$$(\Rightarrow \rightarrow_K \mid)\quad \frac{\Gamma \Rightarrow \Delta, \psi \mid \varphi, \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta, \psi \mid \psi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta, \varphi \rightarrow \psi \mid \Pi \Rightarrow \Sigma}$$

$$(\rightarrow_K \Rightarrow \mid)\quad \frac{\psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi, \psi}{\varphi \rightarrow \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}$$

The remaining rules in both cases are identical with (∧⇒ |), (⇒∧ |), (|⇒∨), (| ∨⇒), (|⇒→) and (| →⇒) from BSC-K<sub>3</sub>.

The implication of Łukasiewicz [34] and his specific additive ∧<sub>L</sub> and ∨<sub>L</sub> are characterised by the following rules:

$$\begin{array}{lll} (|\wedge\_{L}\rightarrow\rangle & \frac{\varphi,\Gamma\Rightarrow\Delta\mid\psi,\Pi\Rightarrow\Sigma\qquad\psi,\Gamma\Rightarrow\Delta\mid\varphi,\Pi\Rightarrow\Sigma\\ \hline \Gamma\Rightarrow\Delta\mid\varphi\land\psi,\Pi\Rightarrow\Sigma\\ (|\Rightarrow\_{L}) & \frac{\Gamma\Rightarrow\Delta,\varphi,\psi\mid\Pi\Rightarrow\Sigma\qquad\varphi,\Gamma\Rightarrow\Delta\mid\Pi\Rightarrow\Sigma,\psi\end{array} & \begin{array}{lll} \varphi,\Gamma\Rightarrow\Delta\mid\Pi\Rightarrow\Sigma,\psi & \begin{array}{c} \psi,\Gamma\Rightarrow\Delta\mid\Pi\Rightarrow\Sigma,\varphi\\ \hline \Gamma\Rightarrow\Delta\mid\Pi\Rightarrow\Sigma,\varphi\land\psi \end{array} \end{array} \\\\ (|\vee\_{L}\rightarrow\rangle & \frac{\varphi,\Gamma\Rightarrow\Delta\mid\psi,\Pi\Rightarrow\Sigma\qquad\psi,\Gamma\Rightarrow\Delta\mid\varphi,\Pi\Rightarrow\Sigma\\ (|\Rightarrow\_{L}) & \frac{\Gamma\Rightarrow\Delta,\varphi,\psi\mid\Pi\Rightarrow\Sigma\qquad\varphi,\Gamma\Rightarrow\Delta\mid\Pi\Rightarrow\Sigma,\psi\end{array} & \begin{array}{ccc} \varphi,\Gamma\Rightarrow\Delta\mid\Pi\Rightarrow\Sigma,\varphi\\ \hline \Gamma\Rightarrow\Delta\mid\Pi\Rightarrow\Sigma,\psi\vee\psi\\ \hline \Gamma\Rightarrow\Delta\mid\Pi\Rightarrow\Sigma,\varphi\lor\psi\\ \hline \end{array} \end{array}$$
 
$$(\Rightarrow \rightarrow_L \mid)\quad \frac{\varphi, \Gamma \Rightarrow \Delta, \psi \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta \mid \varphi, \Pi \Rightarrow \Sigma, \psi}{\Gamma \Rightarrow \Delta, \varphi \rightarrow \psi \mid \Pi \Rightarrow \Sigma}$$

$$(\rightarrow_L \Rightarrow \mid)\quad \frac{\Gamma \Rightarrow \Delta, \varphi \mid \psi, \Pi \Rightarrow \Sigma \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\varphi \rightarrow \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}$$

The remaining rules are identical with (∧⇒ |), (⇒∧ |), (⇒∨ |), (∨⇒ |), (|⇒→) and (| →⇒) from BSC-K<sub>3</sub>.

For Sobociński's connectives we have:

$$(\land\_S \Rightarrow \mid) \quad \frac{\varphi, \Gamma \Rightarrow \Delta \mid \psi, \Pi \Rightarrow \Sigma \qquad \psi, \Gamma \Rightarrow \Delta \mid \varphi, \Pi \Rightarrow \Sigma}{\varphi \land \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}$$

$$(\Rightarrow \wedge_S \mid)\quad \frac{\Gamma \Rightarrow \Delta, \varphi, \psi \mid \Pi \Rightarrow \Sigma \qquad \varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \psi \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\Gamma \Rightarrow \Delta, \varphi \wedge \psi \mid \Pi \Rightarrow \Sigma}$$

$$(\mid \Rightarrow \vee_S)\quad \frac{\Gamma \Rightarrow \Delta, \varphi \mid \Pi \Rightarrow \Sigma, \psi \qquad \Gamma \Rightarrow \Delta, \psi \mid \Pi \Rightarrow \Sigma, \varphi}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \vee \psi}$$

$$(\mid \vee_S \Rightarrow)\quad \frac{\Gamma \Rightarrow \Delta \mid \varphi, \psi, \Pi \Rightarrow \Sigma \qquad \varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \psi \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\Gamma \Rightarrow \Delta \mid \varphi \vee \psi, \Pi \Rightarrow \Sigma}$$

$$(\mid \Rightarrow \rightarrow_S)\quad \frac{\varphi, \Gamma \Rightarrow \Delta, \psi \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta \mid \varphi, \Pi \Rightarrow \Sigma, \psi}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \rightarrow \psi}$$

$$(\mid \rightarrow_S \Rightarrow)\quad \frac{\Gamma \Rightarrow \Delta, \varphi \mid \psi, \Pi \Rightarrow \Sigma \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\Gamma \Rightarrow \Delta \mid \varphi \rightarrow \psi, \Pi \Rightarrow \Sigma}$$

The remaining rules are as in BSC-K<sub>3</sub>.

Sette's connectives are characterised by the following rules:

$$(\Rightarrow \wedge_{Se} \mid)\quad \frac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \qquad \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \psi}{\Gamma \Rightarrow \Delta, \varphi \wedge \psi \mid \Pi \Rightarrow \Sigma}$$

$$(\wedge_{Se} \Rightarrow \mid)\ \frac{\Gamma \Rightarrow \Delta \mid \varphi, \psi, \Pi \Rightarrow \Sigma}{\varphi \wedge \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma} \qquad (\Rightarrow \vee_{Se} \mid)\ \frac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi, \psi}{\Gamma \Rightarrow \Delta, \varphi \vee \psi \mid \Pi \Rightarrow \Sigma}$$

$$(\vee_{Se} \Rightarrow \mid)\quad \frac{\Gamma \Rightarrow \Delta \mid \varphi, \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta \mid \psi, \Pi \Rightarrow \Sigma}{\varphi \vee \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}$$

$$(\rightarrow_{Se} \Rightarrow \mid)\quad \frac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \qquad \Gamma \Rightarrow \Delta \mid \psi, \Pi \Rightarrow \Sigma}{\varphi \rightarrow \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}$$

$$(\Rightarrow \rightarrow_{Se} \mid)\ \frac{\Gamma \Rightarrow \Delta \mid \varphi, \Pi \Rightarrow \Sigma, \psi}{\Gamma \Rightarrow \Delta, \varphi \rightarrow \psi \mid \Pi \Rightarrow \Sigma} \qquad (\mid \Rightarrow \rightarrow_{Se})\ \frac{S \mid \varphi, \Pi \Rightarrow \Sigma, \psi}{S \mid \Pi \Rightarrow \Sigma, \varphi \rightarrow \psi}$$

$$(\mid \rightarrow_{Se} \Rightarrow)\quad \frac{S \mid \Pi \Rightarrow \Sigma, \varphi \qquad S \mid \psi, \Pi \Rightarrow \Sigma}{S \mid \varphi \rightarrow \psi, \Pi \Rightarrow \Sigma}$$

(| ∧<sub>Se</sub>⇒), (|⇒∧<sub>Se</sub>), (|⇒∨<sub>Se</sub>), (| ∨<sub>Se</sub>⇒) are as in BSC-K<sub>3</sub>. Finally, the Carnielli and Sette connectives characterising **I<sup>1</sup>** and **I<sup>2</sup>**:

$$(\mid \Rightarrow \wedge_C)\quad \frac{\Gamma \Rightarrow \Delta, \varphi \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta, \psi \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \wedge \psi}$$

$$(\mid \wedge_C \Rightarrow)\ \frac{\varphi, \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \varphi \wedge \psi, \Pi \Rightarrow \Sigma} \qquad (\mid \Rightarrow \vee_C)\ \frac{\Gamma \Rightarrow \Delta, \varphi, \psi \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \vee \psi}$$

$$(\mid \vee_C \Rightarrow)\quad \frac{\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \varphi \vee \psi, \Pi \Rightarrow \Sigma}$$

$$(\mid \rightarrow_C \Rightarrow)\quad \frac{\Gamma \Rightarrow \Delta, \varphi \mid \Pi \Rightarrow \Sigma \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \varphi \rightarrow \psi, \Pi \Rightarrow \Sigma}$$

$$(\mid \Rightarrow \rightarrow_C)\ \frac{\varphi, \Gamma \Rightarrow \Delta, \psi \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \rightarrow \psi} \qquad (\Rightarrow \rightarrow_C \mid)\ \frac{\varphi, \Gamma \Rightarrow \Delta, \psi \mid S}{\Gamma \Rightarrow \Delta, \varphi \rightarrow \psi \mid S}$$

$$(\rightarrow_C \Rightarrow \mid)\quad \frac{\Gamma \Rightarrow \Delta, \varphi \mid S \qquad \psi, \Gamma \Rightarrow \Delta \mid S}{\varphi \rightarrow \psi, \Gamma \Rightarrow \Delta \mid S}$$

(∧<sub>C</sub>⇒ |), (⇒∧<sub>C</sub> |), (⇒∨<sub>C</sub> |), (∨<sub>C</sub>⇒ |) are as in BSC-K<sub>3</sub>.

We finish with the characterisation of the remaining implications introduced in Sect. 2. In most cases it is obtained by combining rules which were previously introduced. In particular:

Słupecki's [49] implication is characterised by means of: (|⇒→) and (| →⇒) from BSC-K<sub>3</sub> as well as (⇒→<sub>C</sub> |) and (→<sub>C</sub>⇒ |).

Heyting's implication [22] is characterised by means of: (→<sub>L</sub>⇒ |), (⇒→<sub>L</sub> |), (|⇒→<sub>Se</sub>), (| →<sub>Se</sub>⇒).

D'Ottaviano/da Costa/Jaśkowski/Słupecki's implication [13,30,48] is characterised by means of: (→⇒ |), (⇒→ |), (|⇒→<sub>Se</sub>), (| →<sub>Se</sub>⇒).

Rescher's implication [43] is characterised by means of: (→<sub>L</sub>⇒ |), (⇒→<sub>L</sub> |), (|⇒→<sub>S</sub>), (| →<sub>S</sub>⇒).

Tomova's implication [22] is characterised by means of: (→<sub>L</sub>⇒ |), (⇒→<sub>L</sub> |), (|⇒→<sub>C</sub>), (| →<sub>C</sub>⇒).

Only in the case of Sobociński's implication →′<sub>S</sub> do we have a pair of new rules:

$$(\rightarrow'_S \Rightarrow \mid)\quad \frac{\psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi, \psi \qquad \Gamma \Rightarrow \Delta, \varphi \mid \varphi, \psi, \Pi \Rightarrow \Sigma}{\varphi \rightarrow \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}$$

$$(\Rightarrow \rightarrow'_S \mid)\quad \frac{\varphi, \Gamma \Rightarrow \Delta, \psi \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta \mid \varphi, \Pi \Rightarrow \Sigma, \psi \qquad \Gamma \Rightarrow \Delta, \psi \mid \psi, \Pi \Rightarrow \Sigma, \varphi}{\Gamma \Rightarrow \Delta, \varphi \rightarrow \psi \mid \Pi \Rightarrow \Sigma}$$

The remaining two rules are: (|⇒→<sub>Se</sub>) and (| →<sub>Se</sub>⇒).

Let us show how (⇒→′<sub>S</sub> |) was obtained on the basis of the table for →′<sub>S</sub> from p. 4. ϕ → ψ is either false or undefined, which corresponds to four cells:


The remaining premiss covers the cells with 0 in the first and second rows, i.e. those where ψ is 0 while ϕ is 1 or u. Note that since the left premiss covers the first row and the right premiss covers the last row, we could alternatively formulate the middle premiss as Γ ⇒ Δ, ϕ | ϕ, Π ⇒ Σ, ψ to cover exactly the cell with 0 in the second row (here ϕ is just u); but since ψ is 0 in two rows where ϕ is 1 or u, we can be more economical here. The reader can check that many rules can be formulated in alternative ways. We always tried to find the most economical representation, one which can also be used easily for a syntactic proof of the cut elimination theorem (to be shown in the extended version of this paper).
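The cell-covering argument can be replayed mechanically. The following Python sketch is an illustration only: the value encoding `1`, `"u"`, `0`, the position names and the helper `cells` are assumptions introduced here. Each premiss of (⇒→′<sub>S</sub> |) is described by the positions in which ϕ and ψ occur, and `cells` returns the pairs of values (ϕ, ψ) under which that premiss can be falsified.

```python
from itertools import product

U = "u"
VALUES = (1, U, 0)

# Values that a falsifying valuation may assign to a formula in each position.
ALLOWED = {"Gamma": {1}, "Delta": {0, U}, "Pi": {1, U}, "Sigma": {0}}


def cells(premiss):
    """Pairs (phi, psi) of values compatible with every occurrence in `premiss`.

    `premiss` is a list of (argument, position) pairs, with argument
    "phi" or "psi".
    """
    covered = set()
    for phi, psi in product(VALUES, repeat=2):
        value = {"phi": phi, "psi": psi}
        if all(value[arg] in ALLOWED[pos] for arg, pos in premiss):
            covered.add((phi, psi))
    return covered


# Premisses of the rule, left to right.
left = [("phi", "Gamma"), ("psi", "Delta")]                    # phi, G => D, psi | ...
middle = [("phi", "Pi"), ("psi", "Sigma")]                     # ... | phi, P => S, psi
right = [("psi", "Delta"), ("psi", "Pi"), ("phi", "Sigma")]    # G => D, psi | psi, P => S, phi

# Alternative, less economical middle premiss mentioned in the text.
middle_alt = [("phi", "Delta"), ("phi", "Pi"), ("psi", "Sigma")]

covered = cells(left) | cells(middle) | cells(right)
covered_alt = cells(left) | cells(middle_alt) | cells(right)
```

Both `covered` and `covered_alt` come out as the same four cells, (1, u), (1, 0), (u, 0) and (0, u); the economical middle premiss simply overlaps with the left one on the cell (1, 0).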

Now, consider an arbitrary connective c of the logic **L**, the corresponding operation c as characterised by suitable matrix determining **L** in Sect. 2, and the four rules for c. It holds:

**Theorem 4.** *For all presented rules characterising arbitrary* c *of any* **L***: all premisses are* **L***-valid iff the conclusion is* **L***-valid.*

*Proof.* This is an analogue of Theorem 1 for any considered logic **L** which implies adequacy of respective BSC-L.

### **5 Interpolation**

We present a constructive proof of the interpolation theorem for some logics, based on the strategy proposed by Muskens and Wintein [58]. It was originally applied in a tableau setting for the Belnap-Dunn four-valued logic as well as for **K**<sub>3</sub> and **LP**. Here we demonstrate that BSC can also be used for showing that interpolation holds for some paracomplete and paraconsistent logics. Let **L** ∈ {**I<sup>1</sup>**, **I<sup>2</sup>**, **P<sup>1</sup>**, **P<sup>2</sup>**}.

**Theorem 5.** *For any contingent formulae* ϕ, ψ*, if* ϕ ⊨<sub>L</sub> ψ*, then we can construct an interpolant for* **I<sup>1</sup>**, **I<sup>2</sup>** *on the basis of proof-search trees for* ϕ ⇒|⇒ *and* ⇒ ψ |⇒ *and an interpolant for* **P<sup>1</sup>**, **P<sup>2</sup>** *on the basis of proof-search trees for* ⇒| ϕ ⇒ *and* ⇒|⇒ ψ *in suitable BSC-L.*

*Proof.* We will demonstrate the case of BSC-I<sup>1</sup>; the case of BSC-I<sup>2</sup> is identical and the cases of BSC-P<sup>1</sup> and BSC-P<sup>2</sup> are dual, so we only comment on them at the key points. Assume that ϕ ⊨<sub>I<sup>1</sup></sub> ψ; hence by completeness we have a cut-free proof of ϕ ⇒ ψ |⇒ in BSC-I<sup>1</sup>. Now produce complete proof-search trees for ϕ ⇒|⇒ and ⇒ ψ |⇒. Since ϕ, ψ are contingent, both trees have some non-axiomatic leaves. Let Γ<sub>1</sub> ⇒ Δ<sub>1</sub> | Π<sub>1</sub> ⇒ Σ<sub>1</sub>, ..., Γ<sub>k</sub> ⇒ Δ<sub>k</sub> | Π<sub>k</sub> ⇒ Σ<sub>k</sub> be the list of non-axiomatic atomic leaves of the proof-search tree for ϕ ⇒|⇒ and Θ<sub>1</sub> ⇒ Λ<sub>1</sub> | Ξ<sub>1</sub> ⇒ Ω<sub>1</sub>, ..., Θ<sub>n</sub> ⇒ Λ<sub>n</sub> | Ξ<sub>n</sub> ⇒ Ω<sub>n</sub> the corresponding list taken from the proof-search tree for ⇒ ψ |⇒. It holds:

*Claim (1).* For any i ≤ k and j ≤ n, Γ<sub>i</sub>, Θ<sub>j</sub> ⇒ Δ<sub>i</sub>, Λ<sub>j</sub> | Π<sub>i</sub>, Ξ<sub>j</sub> ⇒ Σ<sub>i</sub>, Ω<sub>j</sub> is an axiomatic atomic bisequent.

To see this, take a tree for ϕ ⇒|⇒ and add ψ to the succedents of all 1-sequents in the tree. Due to the context independence of all rules it is a correct proof-search tree. Now to each leaf Γ<sub>i</sub> ⇒ Δ<sub>i</sub>, ψ | Π<sub>i</sub> ⇒ Σ<sub>i</sub> append a tree for ⇒ ψ |⇒, but with Γ<sub>i</sub> added to each antecedent and Δ<sub>i</sub> added to each succedent of the 1-sequents, and similarly with Π<sub>i</sub> and Σ<sub>i</sub> in all 2-sequents. In the resulting proof-search tree we have leaves of the form Γ<sub>i</sub>, Θ<sub>j</sub> ⇒ Δ<sub>i</sub>, Λ<sub>j</sub> | Π<sub>i</sub>, Ξ<sub>j</sub> ⇒ Σ<sub>i</sub>, Ω<sub>j</sub> for all i ≤ k and j ≤ n. If at least one of them were not axiomatic, then BSC-I<sup>1</sup> ⊬ ϕ ⇒ ψ |⇒, contrary to the assumption.

Next, for every Γ<sub>i</sub> ⇒ Δ<sub>i</sub> | Π<sub>i</sub> ⇒ Σ<sub>i</sub>, i ≤ k, define the following sets, where the unions range over all j ≤ n:

$$\begin{array}{l} \Gamma'_i = \Gamma_i \cap \left(\bigcup \Lambda_j \cup \bigcup \Omega_j\right) \\ \Delta'_i = \Delta_i \cap \bigcup \Theta_j \\ \Pi'_i = \Pi_i \cap \bigcup \Omega_j \\ \Sigma'_i = \Sigma_i \cap \left(\bigcup \Theta_j \cup \bigcup \Xi_j\right) \end{array}$$

Since every Γ<sub>i</sub>, Θ<sub>j</sub> ⇒ Δ<sub>i</sub>, Λ<sub>j</sub> | Π<sub>i</sub>, Ξ<sub>j</sub> ⇒ Σ<sub>i</sub>, Ω<sub>j</sub> is axiomatic, we are guaranteed that Γ′<sub>i</sub> ∪ Δ′<sub>i</sub> ∪ Π′<sub>i</sub> ∪ Σ′<sub>i</sub> ≠ ∅. Note also that AT(Γ′<sub>i</sub> ∪ Δ′<sub>i</sub> ∪ Π′<sub>i</sub> ∪ Σ′<sub>i</sub>) ⊆ AT(ϕ) ∩ AT(ψ), where AT stands for the set of atoms. Now define an interpolant Int(ϕ, ψ) for the considered logics. For **I<sup>1</sup>**, **I<sup>2</sup>** it has the same form:

$$
\bigwedge \Gamma_1' \wedge \bigwedge \neg \Sigma_1' \wedge \neg \left( \bigvee \neg \Pi_1' \vee \bigvee \Delta_1' \right) \vee \dots \vee \bigwedge \Gamma_k' \wedge \bigwedge \neg \Sigma_k' \wedge \neg \left( \bigvee \neg \Pi_k' \vee \bigvee \Delta_k' \right),
$$

where ¬Π means the set of negations of all elements in Π.

For **P<sup>1</sup>**, **P<sup>2</sup>** Int(ϕ, ψ) is defined as:

$$\bigwedge \Pi_1' \wedge \bigwedge \neg \Delta_1' \wedge \neg \left( \bigvee \neg \Gamma_1' \vee \bigvee \Sigma_1' \right) \vee \dots \vee \bigwedge \Pi_k' \wedge \bigwedge \neg \Delta_k' \wedge \neg \left( \bigvee \neg \Gamma_k' \vee \bigvee \Sigma_k' \right)$$
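The construction of the primed sets and of Int(ϕ, ψ) for **I<sup>1</sup>**, **I<sup>2</sup>** can be sketched in Python. The sketch assumes that the non-axiomatic atomic leaves of both proof-search trees are already available as 4-tuples of frozensets of atom names; the function names and the textual rendering of the interpolant are hypothetical. The text does not say how empty conjunctions or disjunctions are to be written, so the sketch simply omits empty components, which is an assumption of this example.

```python
def union(sets):
    result = frozenset()
    for s in sets:
        result |= s
    return result


def primed_sets(leaf, psi_leaves):
    """Primed sets of one leaf of the tree for phi, relative to the psi leaves."""
    gamma, delta, pi, sigma = leaf
    thetas = union(l[0] for l in psi_leaves)
    lambdas = union(l[1] for l in psi_leaves)
    xis = union(l[2] for l in psi_leaves)
    omegas = union(l[3] for l in psi_leaves)
    return (gamma & (lambdas | omegas),
            delta & thetas,
            pi & omegas,
            sigma & (thetas | xis))


def disjunct(primed):
    """One disjunct of Int for I1/I2; empty components are omitted (assumption)."""
    gamma, delta, pi, sigma = primed
    conjuncts = sorted(gamma) + ["¬" + a for a in sorted(sigma)]
    inner = ["¬" + a for a in sorted(pi)] + sorted(delta)
    if inner:
        conjuncts.append("¬(" + " ∨ ".join(inner) + ")")
    return " ∧ ".join(conjuncts)


def interpolant(phi_leaves, psi_leaves):
    return " ∨ ".join("(" + disjunct(primed_sets(leaf, psi_leaves)) + ")"
                      for leaf in phi_leaves)


# Example: phi = p ∧ q, psi = p ∨ r.
# The tree for  phi => | =>  has the single leaf  p, q => | =>,
# the tree for  => psi | =>  has the single leaf  => p, r | =>.
E = frozenset()
phi_leaves = [(frozenset({"p", "q"}), E, E, E)]
psi_leaves = [(E, frozenset({"p", "r"}), E, E)]
example = interpolant(phi_leaves, psi_leaves)
```

In this example only p is shared between the two leaves, so the computed interpolant consists of the single disjunct p.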

We can show that:

*Claim (2).* Int(ϕ, ψ) is an interpolant for ϕ ⊨<sub>L</sub> ψ.

*Proof.* As an example, we present the proof for BSC-I<sup>1</sup>. For the sake of the proof let us recall that BSC-I<sup>1</sup> consists of the rules characterising ∧<sub>C</sub>, ∨<sub>C</sub>, →<sub>C</sub> and ¬<sub>H</sub>. However, most of the rules needed for the proof are identical with the respective rules of BSC-K<sub>3</sub>, so the label C is omitted from their names in these cases; this makes it easier to recognise where the specific rules (concretely (| ∨<sub>C</sub>⇒) and (|⇒∨<sub>C</sub>)) are required.

Since for every ⋀Γ′<sub>i</sub> ∧ ⋀¬Σ′<sub>i</sub> ∧ ¬(⋁¬Π′<sub>i</sub> ∨ ⋁Δ′<sub>i</sub>) all (negated) atoms are by definition taken from AT(ϕ) ∩ AT(ψ), we must only prove that BSC-I<sup>1</sup> ⊢ ϕ ⇒ Int(ϕ, ψ) |⇒ and BSC-I<sup>1</sup> ⊢ Int(ϕ, ψ) ⇒ ψ |⇒ (the same for BSC-I<sup>2</sup>), and that BSC-P<sup>1</sup> ⊢ ⇒| ϕ ⇒ Int(ϕ, ψ) and BSC-P<sup>1</sup> ⊢ ⇒| Int(ϕ, ψ) ⇒ ψ (and the same for BSC-P<sup>2</sup>).

Again take a complete proof-search tree for ϕ ⇒|⇒ and add Int(ϕ, ψ) to the succedent of every 1-sequent. For every leaf Γ<sub>i</sub> ⇒ Δ<sub>i</sub>, Int(ϕ, ψ) | Π<sub>i</sub> ⇒ Σ<sub>i</sub> apply (⇒∨ |) to get

$$
\Gamma\_i \Rightarrow \Delta\_i, \bigwedge \Gamma\_i' \land \bigwedge \neg \Sigma\_i' \land \neg (\bigvee \neg \Pi\_i' \lor \bigvee \Delta\_i'), \operatorname{Int}(\varphi, \psi)^{-i} \mid \Pi\_i \Rightarrow \Sigma\_i, \Delta
$$

where Int(ϕ, ψ)−*<sup>i</sup>* is the rest of the disjunction (if any). Applying (⇒∧|) we obtain three bisequents:

$$\begin{array}{l} \text{(a)} \ I\_{i} \Rightarrow \Delta\_{i}, \bigwedge I'\_{i}, Int(\varphi, \psi)^{-i} \mid \varPi\_{i} \Rightarrow \Sigma\_{i} \\\text{(b)} \ I\_{i} \Rightarrow \Delta\_{i}, \bigwedge \neg \Sigma'\_{i}, Int(\varphi, \psi)^{-i} \mid \varPi\_{i} \Rightarrow \Sigma\_{i} \\\text{(c)} \ I'\_{i} \Rightarrow \Delta\_{i}, \neg(\bigvee \neg \Pi'\_{i} \lor \bigvee \Delta'\_{i}), Int(\varphi, \psi)^{-i} \mid \varPi\_{i} \Rightarrow \Sigma\_{i}. \end{array}$$

Systematically applying (⇒∧|) to (a) we obtain Γ<sub>i</sub> ⇒ Δ<sub>i</sub>, p, Int(ϕ, ψ)<sup>−i</sup> | Π<sub>i</sub> ⇒ Σ<sub>i</sub> for each p ∈ Γ′<sub>i</sub>, and since Γ′<sub>i</sub> ⊆ Γ<sub>i</sub> they are all axiomatic. Similarly with (b), but now we first obtain Γ<sub>i</sub> ⇒ Δ<sub>i</sub>, ¬p, Int(ϕ, ψ)<sup>−i</sup> | Π<sub>i</sub> ⇒ Σ<sub>i</sub> for each p ∈ Σ′<sub>i</sub>. After the application of (⇒¬|) we obtain Γ<sub>i</sub> ⇒ Δ<sub>i</sub>, Int(ϕ, ψ)<sup>−i</sup> | p, Π<sub>i</sub> ⇒ Σ<sub>i</sub>, which is axiomatic since Σ′<sub>i</sub> ⊆ Σ<sub>i</sub>. For (c) we first apply (⇒¬|) and obtain Γ<sub>i</sub> ⇒ Δ<sub>i</sub>, Int(ϕ, ψ)<sup>−i</sup> | ⋁¬Π′<sub>i</sub> ∨ ⋁Δ′<sub>i</sub>, Π<sub>i</sub> ⇒ Σ<sub>i</sub>. By (| ∨<sub>C</sub> ⇒) we obtain ⋁¬Π′<sub>i</sub>, Γ<sub>i</sub> ⇒ Δ<sub>i</sub>, Int(ϕ, ψ)<sup>−i</sup> | Π<sub>i</sub> ⇒ Σ<sub>i</sub> and ⋁Δ′<sub>i</sub>, Γ<sub>i</sub> ⇒ Δ<sub>i</sub>, Int(ϕ, ψ)<sup>−i</sup> | Π<sub>i</sub> ⇒ Σ<sub>i</sub>. Systematic application of (∨⇒|) to the latter produces axiomatic bisequents p, Γ<sub>i</sub> ⇒ Δ<sub>i</sub>, Int(ϕ, ψ)<sup>−i</sup> | Π<sub>i</sub> ⇒ Σ<sub>i</sub> for each p ∈ Δ′<sub>i</sub>. Systematic application of (∨⇒|) to the former produces ¬p, Γ<sub>i</sub> ⇒ Δ<sub>i</sub>, Int(ϕ, ψ)<sup>−i</sup> | Π<sub>i</sub> ⇒ Σ<sub>i</sub> for each p ∈ Π′<sub>i</sub>. After application of (¬⇒|) they also yield axiomatic sequents. Hence we have a proof of ϕ ⇒ Int(ϕ, ψ) |⇒.

We have to do the same with a complete proof-search tree for ⇒ ψ |⇒, but now adding Int(ϕ, ψ) to the antecedent of every 1-sequent in the tree. For every leaf Int(ϕ, ψ), Θ<sub>j</sub> ⇒ Λ<sub>j</sub> | Ξ<sub>j</sub> ⇒ Ω<sub>j</sub> we apply (∨⇒|) to each disjunct of Int(ϕ, ψ) until we get the leaves ⋀Γ′<sub>1</sub> ∧ ⋀¬Σ′<sub>1</sub> ∧ ¬(⋁¬Π′<sub>1</sub> ∨ ⋁Δ′<sub>1</sub>), Θ<sub>j</sub> ⇒ Λ<sub>j</sub> | Ξ<sub>j</sub> ⇒ Ω<sub>j</sub>, …, ⋀Γ′<sub>k</sub> ∧ ⋀¬Σ′<sub>k</sub> ∧ ¬(⋁¬Π′<sub>k</sub> ∨ ⋁Δ′<sub>k</sub>), Θ<sub>j</sub> ⇒ Λ<sub>j</sub> | Ξ<sub>j</sub> ⇒ Ω<sub>j</sub>. To each such leaf we apply (∧⇒|), obtaining bisequents of the form Γ′<sub>i</sub>, ¬Σ′<sub>i</sub>, ¬(⋁¬Π′<sub>i</sub> ∨ ⋁Δ′<sub>i</sub>), Θ<sub>j</sub> ⇒ Λ<sub>j</sub> | Ξ<sub>j</sub> ⇒ Ω<sub>j</sub> for i ≤ k, j ≤ n. In each case the application of (¬⇒|) yields Γ′<sub>i</sub>, Θ<sub>j</sub> ⇒ Λ<sub>j</sub> | Ξ<sub>j</sub> ⇒ Ω<sub>j</sub>, Σ′<sub>i</sub>, ⋁¬Π′<sub>i</sub> ∨ ⋁Δ′<sub>i</sub>. The application of (|⇒ ∨<sub>C</sub>) to ⋁¬Π′<sub>i</sub> ∨ ⋁Δ′<sub>i</sub> yields Γ′<sub>i</sub>, Θ<sub>j</sub> ⇒ Λ<sub>j</sub>, ⋁¬Π′<sub>i</sub>, ⋁Δ′<sub>i</sub> | Ξ<sub>j</sub> ⇒ Ω<sub>j</sub>, Σ′<sub>i</sub>. Systematic application of (⇒∨|) and (⇒¬|) gives leaves of the form Γ′<sub>i</sub>, Θ<sub>j</sub> ⇒ Λ<sub>j</sub>, Δ′<sub>i</sub> | Π′<sub>i</sub>, Ξ<sub>j</sub> ⇒ Ω<sub>j</sub>, Σ′<sub>i</sub>. Since for every i ≤ k, j ≤ n, Γ<sub>i</sub>, Θ<sub>j</sub> ⇒ Δ<sub>i</sub>, Λ<sub>j</sub> | Π<sub>i</sub>, Ξ<sub>j</sub> ⇒ Σ<sub>i</sub>, Ω<sub>j</sub> is axiomatic, these primed versions are axiomatic too. Assume the contrary; then there must be, e.g., some p ∉ Γ′<sub>i</sub> such that either p ∈ Γ<sub>i</sub> ∩ Λ<sub>j</sub> or p ∈ Γ<sub>i</sub> ∩ Ω<sub>j</sub> (or similarly for other pairs generating axioms). But this is impossible, since by definition Γ′<sub>i</sub> must contain such p (and the same holds for the other primed sets).

The proof for BSC-I<sup>2</sup> is identical since the only difference between these two logics is that **I<sup>1</sup>** has Heyting's negation whereas **I<sup>2</sup>** has Kleene's negation. But the two BSC rules for negation which are used in the proof are common to both negations. The proofs for **P<sup>1</sup>** and **P<sup>2</sup>** are dual to the above and use the slightly different definition of Int(ϕ, ψ) specified above. Again the two logics differ only with respect to negation, but the rules used in the proof are common to Bochvar's and Kleene's negations.

Finally, note that this proof may also be applied to other logics, but in some cases it is convenient to extend their languages. For example, interpolants for some logics can be defined as disjunctions of the following formulae:

$$\begin{array}{ll} \text{For } \mathbf{K\_3}\text{:} & \bigwedge \varGamma'\_i \wedge \bigwedge \neg \Sigma'\_i \wedge \bigwedge \neg\_B \varDelta'\_i \wedge \bigwedge \neg\_B \neg \Pi'\_i \\ \text{For } \mathbf{LP}\text{:} & \bigwedge \varPi'\_i \wedge \bigwedge \neg \Delta'\_i \wedge \bigwedge \neg\_H \Sigma'\_i \wedge \bigwedge \neg\_H \neg \Gamma'\_i \\ \text{For } \mathbf{G\_3}\text{:} & \bigwedge \varGamma'\_i \wedge \neg\_H (\bigwedge \varPi'\_i \to \bigvee \Sigma'\_i) \wedge \bigwedge \neg\_B \Delta'\_i \\ \text{For } \mathbf{G'\_3}\text{:} & \bigwedge \varPi'\_i \wedge \bigwedge \neg\_B \Delta'\_i \wedge \bigwedge \neg\_H \Sigma'\_i \wedge \bigwedge \neg\_B \neg\_B \Gamma'\_i \end{array}$$

### **6 Conclusion**

Bisequent calculi can be seen as one of the possible syntactical realizations of the so-called Suszko's thesis [54] in the treatment of many-valued logics. According to Suszko, every logic is two-valued in the sense that all values are divided into designated and non-designated, and this is reflected in the definition of the consequence relation. In the case of bisequent calculi it is additionally made evident that two possible choices of designated values can be made. However, on a deep level a BSC is similar to some other proposed formalisations mentioned in the Introduction. On the one hand, bisequents resemble several labelled approaches where labels denote sets of values; a difference is that instead of labels the position of a formula in a bisequent is crucial, hence the method is strictly syntactical. On the other hand, there is a similarity with Avron's [4] and Avron, Ben-Naim, and Konikowska's [6] sequent calculi with special rules defined for negated formulae; a difference is that BSC satisfies the ordinary subformula property and purity conditions to the effect that in the schemata of rules only one (occurrence of a) connective is involved. The price is that instead of standard sequents we use a pair of them.

As we mentioned in the Introduction there is one more general difference. In the case of labelled calculi or Avron's SC we have the same input for 1- and 2-logics, whereas in BSC a different input for both classes of logics is defined; a 1 or a 2-sequent in a bisequent. A consequence of our choice is that for every pair of 1- and 2-logic with the same connectives (like e.g. **K**<sup>3</sup> and **LP**) the rules and axioms are identical. In contrast, in other mentioned approaches for such pairs of related logics, the respective calculi must differ either with respect to some axioms (closure conditions in tableaux) or to rules. It seems that the present solution where systems differ only with respect to the input is more economical and uniform. In fact we can consider also logics determined by different notions of consequence relations while still keeping the rules and axioms intact. Two relations considered in the text express informally the situation where either truth is preserved or non-falsity is preserved. But two other possibilities are open as well: Γ ⇒|⇒ ψ corresponds to the notion of no-counterexample consequence (see e.g. Lehmann [33], Paoli [39]), whereas ⇒ ϕ | Γ ⇒ corresponds to the liberal consequence which leads from non-falsity to truth. This level of uniformity follows from the fact that rules of BSC are not computed on the basis of any normal (disjunctive or conjunctive) form, like in other approaches, but on the basis of geometrical insights illustrated in Sect. 4.

Finally, notice that the application of BSC may be extended easily to first-order languages. It is quite obvious how to define suitable rules for quantifiers. But the proof of adequacy requires more refined methods than those applied here, so for lack of space we limited ourselves to the propositional case. However, we finish the paper with one more problem for further investigation: the application of first-order BSC to the formalisation of neutral free logics, and in particular to specific theories of definite descriptions based on some Fregean ideas (see e.g. Lehmann [33], Stenlund [52]). Since sequent and tableau calculi for such theories built on positive and negative free logics were already provided in [24,26,28], this paper offers a proper ground for the extension of these results to neutral free logics.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Proving Almost-Sure Innermost Termination of Probabilistic Term Rewriting Using Dependency Pairs**

Jan-Christoph Kassing and Jürgen Giesl

LuFG Informatik 2, RWTH Aachen University, Aachen, Germany kassing@cs.rwth-aachen.de, giesl@informatik.rwth-aachen.de

**Abstract.** Dependency pairs are one of the most powerful techniques to analyze termination of term rewrite systems (TRSs) automatically. We adapt the dependency pair framework to the probabilistic setting in order to prove almost-sure innermost termination of probabilistic TRSs. To evaluate its power, we implemented the new framework in our tool AProVE.

### **1 Introduction**

Techniques and tools to analyze innermost termination of term rewrite systems (TRSs) automatically are successfully used for termination analysis of programs in many languages (e.g., Java [10,35,38], Haskell [18], and Prolog [19]). While there exist several classical orderings for proving termination of TRSs (e.g., based on polynomial interpretations [30]), a *direct* application of these orderings is usually too weak for TRSs that result from actual programs. However, these orderings can be used successfully within the *dependency pair* (DP) framework [2,16,17]. This framework allows for modular termination proofs (e.g., which apply different orderings in different sub-proofs) and is one of the most powerful techniques for termination analysis of TRSs that is used in essentially all current termination tools for TRSs, e.g., AProVE [20], MU-TERM [22], NaTT [40], TTT2 [29], etc.

On the other hand, *probabilistic* programs are used to describe randomized algorithms and probability distributions, with applications in many areas. To use TRSs also for such programs, *probabilistic term rewrite systems* (PTRSs) were introduced in [8,9]. In the probabilistic setting, there are several notions of "termination". A program is *almost-surely terminating* (AST) if the probability for termination is 1. As remarked in [24]: "AST is the classical and most widely-studied problem that extends termination of non-probabilistic programs, and is considered as a core problem in the programming languages community". A strictly stronger notion is *positive almost-sure termination* (PAST), which requires that the expected runtime is finite. While there exist many automatic approaches to prove (P)AST of imperative programs on numbers (e.g., [1,4,11,15,21,24–26,32–34,36]), there are only a few automatic approaches for programs with complex non-tail recursive structure [7,12], and even fewer approaches which are also suitable for algorithms on recursive data structures [3,6,31,39]. The approach of [39] focuses on algorithms on lists and [31] mainly targets algorithms on trees, but they cannot easily be adjusted to other (possibly user-defined) data structures. The calculus of [6] considers imperative programs with stack, heap, and pointers, but it is not yet automated. Moreover, the approaches of [3,6,31,39] analyze expected runtime, while we focus on AST.

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 235950644 (Project GI 274/6-2) and DFG Research Training Group 2236 UnRAVeL.

© The Author(s) 2023

B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 344–364, 2023. https://doi.org/10.1007/978-3-031-38499-8\_20

PTRSs can be used to model algorithms (possibly with complex recursive structure) operating on algebraic data types. While PTRSs were introduced in [8, 9], the first (and up to now only) tool to analyze their termination automatically was presented in [3], where orderings based on interpretations were adapted to prove PAST. Moreover, [14] extended general concepts of abstract rewrite systems (e.g., confluence and uniqueness of normal forms) to the probabilistic setting.

As mentioned, already for non-probabilistic TRSs a *direct* application of orderings (as in [3]) is limited in power. To obtain a powerful approach, one should combine such orderings in a modular way, as in the DP framework. In this paper, we show for the first time that an adaption of dependency pairs to the probabilistic setting is possible and present the first DP framework for probabilistic term rewriting. Since the crucial idea of dependency pairs is the modularization of the termination proof, we analyze AST instead of PAST, because it is well known that AST is compositional, while PAST is not (see, e.g., [25]). We also present a novel adaption of the technique from [3] for the direct application of polynomial interpretations in order to prove AST (instead of PAST) of PTRSs.

We start by briefly recapitulating the DP framework for non-probabilistic TRSs in Sect. 2. Then we recall the definition of PTRSs based on [3,9,14] in Sect. 3 and introduce a novel way to prove AST using polynomial interpretations automatically. In Sect. 4 we present our new probabilistic DP framework. The implementation of our approach in the tool AProVE is evaluated in Sect. 5. We refer to [28] for all proofs (which are much more involved than the original proofs for the non-probabilistic DP framework from [2,16,17]).

#### **2 The DP Framework**

We assume familiarity with term rewriting [5] and regard TRSs over a finite signature Σ and a set of variables V. A *polynomial interpretation* Pol is a Σ-algebra with carrier set ℕ which maps every function symbol f ∈ Σ to a polynomial f<sub>Pol</sub> ∈ ℕ[V]. For a term t ∈ T(Σ, V), Pol(t) denotes the interpretation of t by the Σ-algebra Pol. An arithmetic inequation like Pol(t<sub>1</sub>) > Pol(t<sub>2</sub>) *holds* if it is true for all instantiations of its variables by natural numbers.

**Theorem 1 (Termination With Polynomial Interpretations** [30]**).** *Let* R *be a TRS and let* Pol : T(Σ, V) → ℕ[V] *be a monotonic polynomial interpretation (i.e.,* x > y *implies* f<sub>Pol</sub>(…, x, …) > f<sub>Pol</sub>(…, y, …) *for all* f ∈ Σ*). If for every* ℓ → r ∈ R*, we have* Pol(ℓ) > Pol(r)*, then* R *is terminating.*

The search for polynomial interpretations is usually automated by SMT solving. Instead of polynomials over the naturals, Theorem 1 (and the other termination criteria in the paper) can also be extended to polynomials over the non-negative reals, by requiring that whenever a term is "strictly decreasing", then its interpretation decreases at least by a certain fixed amount δ > 0.

*Example 2.* Consider the TRS R<sub>div</sub> = {(1), …, (4)} for division from [2].

$$\begin{array}{rcll} \mathsf{minus}(x, \mathcal{O}) & \to & x & \quad (1) \\ \mathsf{minus}(\mathsf{s}(x), \mathsf{s}(y)) & \to & \mathsf{minus}(x, y) & \quad (2) \\ \mathsf{div}(\mathcal{O}, \mathsf{s}(y)) & \to & \mathcal{O} & \quad (3) \\ \mathsf{div}(\mathsf{s}(x), \mathsf{s}(y)) & \to & \mathsf{s}(\mathsf{div}(\mathsf{minus}(x, y), \mathsf{s}(y))) & \quad (4) \end{array}$$

Termination of R<sub>minus</sub> = {(1), (2)} can be proved by the polynomial interpretation that maps minus(x, y) to x + y + 1, s(x) to x + 1, and O to 0. However, a direct application of classical techniques like polynomial interpretations fails for R<sub>div</sub>. These techniques correspond to so-called (quasi-)simplification orderings [13] which cannot handle rules like (4) where the left-hand side is embedded in the right-hand side if y is instantiated with s(x). In contrast, the dependency pair framework is able to prove termination of R<sub>div</sub> automatically.
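The interpretation given for the minus-rules can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical encoding of terms as nested tuples; it only samples a finite grid of natural numbers, so it illustrates the inequalities rather than proving them. For these linear polynomials the differences Pol(ℓ) − Pol(r) are the constants 1 and 2, so the sampled result agrees with the symbolic argument.

```python
from itertools import product

# Terms are nested tuples ("f", arg1, ..., argn); variables are plain strings.
# This encoding is an assumption made for illustration only.
RULES_MINUS = [
    (("minus", "x", ("O",)), "x"),                             # rule (1)
    (("minus", ("s", "x"), ("s", "y")), ("minus", "x", "y")),  # rule (2)
]

# Interpretation from the text: minus(x, y) -> x + y + 1, s(x) -> x + 1, O -> 0.
POL = {
    "minus": lambda x, y: x + y + 1,
    "s": lambda x: x + 1,
    "O": lambda: 0,
}

def evaluate(term, env):
    """Evaluate a term under POL with the variables bound by env."""
    if isinstance(term, str):
        return env[term]
    head, *args = term
    return POL[head](*(evaluate(a, env) for a in args))

def strictly_decreasing(rules, bound=6):
    """Sample-based check of Pol(lhs) > Pol(rhs) on a finite grid of naturals."""
    for lhs, rhs in rules:
        for x, y in product(range(bound), repeat=2):
            env = {"x": x, "y": y}
            if not evaluate(lhs, env) > evaluate(rhs, env):
                return False
    return True

print(strictly_decreasing(RULES_MINUS))
```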

We now recapitulate the DP framework and its core processors, and refer to, e.g., [2,16,17,23] for more details. In this paper, we restrict ourselves to the DP framework for *innermost* rewriting (denoted "<sup>i</sup>→<sub>R</sub>"), because our adaption to the probabilistic setting relies on this evaluation strategy (see Sect. 4.1).

**Definition 3 (Dependency Pair).** *Let* R *be a (finite) TRS. We decompose its signature* Σ = Σ<sub>C</sub> ⊎ Σ<sub>D</sub> *such that* f ∈ Σ<sub>D</sub> *if* f = root(ℓ) *for some rule* ℓ → r ∈ R*. The symbols in* Σ<sub>C</sub> *and* Σ<sub>D</sub> *are called* constructors *and* defined symbols*, respectively. For every* f ∈ Σ<sub>D</sub>*, we introduce a fresh* tuple symbol f<sup>#</sup> *of the same arity. Let* Σ<sup>#</sup> *denote the set of all tuple symbols. To ease readability, we often write* F *instead of* f<sup>#</sup>*. For any term* t = f(t<sub>1</sub>, …, t<sub>n</sub>) ∈ T(Σ, V) *with* f ∈ Σ<sub>D</sub>*, let* t<sup>#</sup> = f<sup>#</sup>(t<sub>1</sub>, …, t<sub>n</sub>)*. Moreover, for any* r ∈ T(Σ, V)*, let* Sub<sub>D</sub>(r) *be the set of all subterms of* r *with defined root symbol. For a rule* ℓ → r *with* Sub<sub>D</sub>(r) = {t<sub>1</sub>, …, t<sub>n</sub>}*, one obtains the* n *dependency pairs (DPs)* ℓ<sup>#</sup> → t<sub>i</sub><sup>#</sup> *with* 1 ≤ i ≤ n*.* DP(R) *denotes the set of all dependency pairs of* R*.*

*Example 4.* For the TRS Rdiv from Example 2, we get the following dependency pairs.

$$\begin{array}{rcll} \mathsf{M}(\mathsf{s}(x), \mathsf{s}(y)) & \to & \mathsf{M}(x, y) & \quad (5) \\ \mathsf{D}(\mathsf{s}(x), \mathsf{s}(y)) & \to & \mathsf{M}(x, y) & \quad (6) \\ \mathsf{D}(\mathsf{s}(x), \mathsf{s}(y)) & \to & \mathsf{D}(\mathsf{minus}(x, y), \mathsf{s}(y)) & \quad (7) \end{array}$$
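The construction of Definition 3 can be sketched in a few lines of Python. The tuple encoding of terms and the names `dependency_pairs` and `sharp` are assumptions made for illustration; tuple symbols are written here as `minus#` and `div#` instead of M and D.

```python
# Terms are nested tuples ("f", arg1, ..., argn); variables are plain strings.
# This encoding is an assumption made for illustration only.
R_DIV = [
    (("minus", "x", ("O",)), "x"),                               # rule (1)
    (("minus", ("s", "x"), ("s", "y")), ("minus", "x", "y")),    # rule (2)
    (("div", ("O",), ("s", "y")), ("O",)),                       # rule (3)
    (("div", ("s", "x"), ("s", "y")),
     ("s", ("div", ("minus", "x", "y"), ("s", "y")))),           # rule (4)
]

def defined_symbols(rules):
    """Root symbols of left-hand sides."""
    return {lhs[0] for lhs, _ in rules}

def subterms(term):
    """Yield the term itself and all of its subterms."""
    yield term
    if not isinstance(term, str):
        for arg in term[1:]:
            yield from subterms(arg)

def sharp(term):
    """Replace the root symbol f by the tuple symbol f#."""
    return (term[0] + "#",) + term[1:]

def dependency_pairs(rules):
    """One pair lhs# -> t# for every subterm t of a right-hand side with defined root."""
    defined = defined_symbols(rules)
    pairs = []
    for lhs, rhs in rules:
        for sub in subterms(rhs):
            if not isinstance(sub, str) and sub[0] in defined:
                pair = (sharp(lhs), sharp(sub))
                if pair not in pairs:
                    pairs.append(pair)
    return pairs

for lhs, rhs in dependency_pairs(R_DIV):
    print(lhs, "->", rhs)
```

For R<sub>div</sub> this yields three pairs, corresponding to (5), (6), and (7).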

The DP framework uses *DP problems* (D, R) where D is a (finite) set of DPs and R is a (finite) TRS. A (possibly infinite) sequence t<sub>0</sub><sup>#</sup>, t<sub>1</sub><sup>#</sup>, t<sub>2</sub><sup>#</sup>, … with t<sub>i</sub><sup>#</sup> <sup>i</sup>→<sub>D,R</sub> ∘ <sup>i</sup>→<sub>R</sub><sup>∗</sup> t<sub>i+1</sub><sup>#</sup> for all i is an (innermost) (D, R)-*chain*. Here, <sup>i</sup>→<sub>D,R</sub> is the restriction of →<sub>D</sub> to rewrite steps where the used redex is in normal form w.r.t. R. A chain represents subsequent "function calls" in evaluations. Between two function calls (corresponding to steps with D) one can evaluate the arguments with R. For example, D(s<sup>2</sup>(O), s(O)), D(s(O), s(O)) is a (DP(R<sub>div</sub>), R<sub>div</sub>)-chain, as D(s<sup>2</sup>(O), s(O)) <sup>i</sup>→<sub>DP(R<sub>div</sub>),R<sub>div</sub></sub> D(minus(s(O), O), s(O)) <sup>i</sup>→<sub>R<sub>div</sub></sub><sup>∗</sup> D(s(O), s(O)), where s<sup>2</sup>(O) is s(s(O)).

A DP problem (D, R) is called *innermost terminating* (iTerm) if there is no infinite innermost (D, R)-chain. The main result on dependency pairs is the *chain criterion* which states that a TRS R is iTerm iff (DP(R), R) is iTerm. The key idea of the DP framework is a *divide-and-conquer* approach which applies *DP processors* to transform DP problems into simpler sub-problems. A *DP processor* Proc has the form Proc(D, R) = {(D<sub>1</sub>, R<sub>1</sub>), …, (D<sub>n</sub>, R<sub>n</sub>)}, where D, D<sub>1</sub>, …, D<sub>n</sub> are sets of dependency pairs and R, R<sub>1</sub>, …, R<sub>n</sub> are TRSs. A processor Proc is *sound* if (D, R) is iTerm whenever (D<sub>i</sub>, R<sub>i</sub>) is iTerm for all 1 ≤ i ≤ n. It is *complete* if (D<sub>i</sub>, R<sub>i</sub>) is iTerm for all 1 ≤ i ≤ n whenever (D, R) is iTerm.

So given a TRS R, one starts with the initial DP problem (DP(R), R) and applies sound (and preferably complete) DP processors repeatedly until all sub-problems are "solved" (i.e., sound processors transform them to the empty set). This allows for modular termination proofs, since different techniques can be applied on each resulting "sub-problem" (D<sub>i</sub>, R<sub>i</sub>). The following three theorems recapitulate the three most important processors of the DP framework.

The (innermost) (D, R)*-dependency graph* is a control flow graph that indicates which dependency pairs can be used after each other in a chain. Its node set is D and there is an edge from ℓ<sub>1</sub><sup>#</sup> → t<sub>1</sub><sup>#</sup> to ℓ<sub>2</sub><sup>#</sup> → t<sub>2</sub><sup>#</sup> if there exist substitutions σ<sub>1</sub>, σ<sub>2</sub> such that t<sub>1</sub><sup>#</sup>σ<sub>1</sub> <sup>i</sup>→<sub>R</sub><sup>∗</sup> ℓ<sub>2</sub><sup>#</sup>σ<sub>2</sub>, and both ℓ<sub>1</sub><sup>#</sup>σ<sub>1</sub> and ℓ<sub>2</sub><sup>#</sup>σ<sub>2</sub> are in normal form w.r.t. R. Any infinite (D, R)-chain corresponds to an infinite path in the dependency graph, and since the graph is finite, this infinite path must end in some strongly connected component (SCC).<sup>1</sup> Hence, it suffices to consider the SCCs of this graph independently. The (DP(R<sub>div</sub>), R<sub>div</sub>)-dependency graph can be seen on the right.

[Figure: the (DP(R<sub>div</sub>), R<sub>div</sub>)-dependency graph with the nodes (5), (6), and (7).]

**Theorem 5 (Dep. Graph Processor).** *For the SCCs* D<sub>1</sub>, …, D<sub>n</sub> *of the* (D, R)*-dependency graph,* Proc<sub>DG</sub>(D, R) = {(D<sub>1</sub>, R), …, (D<sub>n</sub>, R)} *is sound and complete.*

While the exact dependency graph is not computable in general, there are several techniques to over-approximate it automatically, see, e.g., [2,17,23]. In our example, applying Proc<sub>DG</sub> to the initial problem (DP(R<sub>div</sub>), R<sub>div</sub>) results in the smaller problems ({(5)}, R<sub>div</sub>) and ({(7)}, R<sub>div</sub>) that can be treated separately.
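The decomposition into SCCs can be illustrated with a small Python sketch. It assumes a deliberately crude over-approximation of the dependency graph that only compares root symbols (an edge is drawn whenever the root of one right-hand side equals the root of another left-hand side); this is an illustrative assumption and not the estimation used in the cited work. Following the footnote on SCCs, only components containing a non-empty cycle are returned.

```python
# Each DP is summarised by the root symbols of its left- and right-hand side.
DPS = {
    5: ("M", "M"),   # M(s(x), s(y)) -> M(x, y)
    6: ("D", "M"),   # D(s(x), s(y)) -> M(x, y)
    7: ("D", "D"),   # D(s(x), s(y)) -> D(minus(x, y), s(y))
}

def estimated_graph(dps):
    """Crude over-approximation: edge a -> b if the rhs root of a is the lhs root of b."""
    return {a: {b for b in dps if dps[a][1] == dps[b][0]} for a in dps}

def reachable(graph, start):
    """Nodes reachable from start via a non-empty path."""
    seen, stack = set(), list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen

def sccs(graph):
    """Maximal sets in which every node reaches every node by a non-empty path."""
    reach = {n: reachable(graph, n) for n in graph}
    components = []
    for n in graph:
        if n in reach[n]:  # n lies on a cycle
            comp = frozenset(m for m in reach[n] if n in reach[m])
            if comp not in components:
                components.append(comp)
    return components

print(sccs(estimated_graph(DPS)))
```

On the example, the sketch returns the two components {(5)} and {(7)}; the pair (6) lies on no cycle and is therefore dropped.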

The next processor removes rules that cannot be used to evaluate right-hand sides of dependency pairs when their variables are instantiated with normal forms.

<sup>1</sup> Here, a set D′ of dependency pairs is an *SCC* if it is a maximal cycle, i.e., it is a maximal set such that for any ℓ<sub>1</sub><sup>#</sup> → t<sub>1</sub><sup>#</sup> and ℓ<sub>2</sub><sup>#</sup> → t<sub>2</sub><sup>#</sup> in D′ there is a non-empty path from ℓ<sub>1</sub><sup>#</sup> → t<sub>1</sub><sup>#</sup> to ℓ<sub>2</sub><sup>#</sup> → t<sub>2</sub><sup>#</sup> which only traverses nodes from D′.

**Theorem 6 (Usable Rules Processor).** *Let* R *be a TRS. For every* f ∈ Σ ⊎ Σ<sup>#</sup> *let* Rules<sub>R</sub>(f) = {ℓ → r ∈ R | root(ℓ) = f}*. For any* t ∈ T(Σ ⊎ Σ<sup>#</sup>, V)*, its* usable rules U<sub>R</sub>(t) *are the smallest set such that* U<sub>R</sub>(x) = ∅ *for all* x ∈ V *and* U<sub>R</sub>(f(t<sub>1</sub>, …, t<sub>n</sub>)) = Rules<sub>R</sub>(f) ∪ ⋃<sub>i=1</sub><sup>n</sup> U<sub>R</sub>(t<sub>i</sub>) ∪ ⋃<sub>ℓ→r ∈ Rules<sub>R</sub>(f)</sub> U<sub>R</sub>(r)*. The* usable rules *for the DP problem* (D, R) *are* U(D, R) = ⋃<sub>ℓ<sup>#</sup>→t<sup>#</sup> ∈ D</sub> U<sub>R</sub>(t<sup>#</sup>)*. Then* Proc<sub>UR</sub>(D, R) = {(D, U(D, R))} *is sound but not complete.*<sup>2</sup>

For the DP problem ({(7)}, R<sub>div</sub>) only the minus-rules are usable and thus Proc<sub>UR</sub>({(7)}, R<sub>div</sub>) = {({(7)}, {(1), (2)})}. For ({(5)}, R<sub>div</sub>) there are no usable rules at all, and thus Proc<sub>UR</sub>({(5)}, R<sub>div</sub>) = {({(5)}, ∅)}.
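The usable rules of Theorem 6 form a least fixpoint, which a simple traversal can compute. The following Python sketch is written under stated assumptions: terms are nested tuples, rules are identified by their numbers, and the function name `usable_rules` is hypothetical. An accumulator guarantees that each rule's right-hand side is explored only once.

```python
# Terms are nested tuples ("f", arg1, ..., argn); variables are plain strings.
# Rules are keyed by their numbers from Example 2 (an illustrative encoding).
R_DIV = {
    1: (("minus", "x", ("O",)), "x"),
    2: (("minus", ("s", "x"), ("s", "y")), ("minus", "x", "y")),
    3: (("div", ("O",), ("s", "y")), ("O",)),
    4: (("div", ("s", "x"), ("s", "y")),
        ("s", ("div", ("minus", "x", "y"), ("s", "y")))),
}

def rules_for(rules, symbol):
    """Numbers of all rules whose left-hand side has the given root symbol."""
    return {n for n, (lhs, _) in rules.items() if lhs[0] == symbol}

def usable_rules(rules, term, acc=None):
    """Least set satisfying the equations for the usable rules of a term."""
    acc = set() if acc is None else acc
    if isinstance(term, str):          # variables contribute nothing
        return acc
    new = rules_for(rules, term[0]) - acc
    acc |= new
    for n in new:                      # right-hand sides of newly added rules
        usable_rules(rules, rules[n][1], acc)
    for arg in term[1:]:               # arguments of the term itself
        usable_rules(rules, arg, acc)
    return acc

# Right-hand sides of the DPs (7) and (5); the tuple symbols D and M have no rules.
print(usable_rules(R_DIV, ("D", ("minus", "x", "y"), ("s", "y"))))
print(usable_rules(R_DIV, ("M", "x", "y")))
```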

The last processor adapts classical orderings like polynomial interpretations to DP problems.<sup>3</sup> In contrast to their direct application in Theorem 1, we may now use weakly monotonic polynomials f<sub>Pol</sub> that do not have to depend on all of their arguments. The reduction pair processor requires that all rules and dependency pairs are weakly decreasing and it removes those DPs that are strictly decreasing.

**Theorem 7 (Reduction Pair Processor with Polynomial Interpretations).** *Let* Pol : T(Σ ⊎ Σ<sup>#</sup>, V) → ℕ[V] *be a weakly monotonic polynomial interpretation (i.e.,* x ≥ y *implies* f<sub>Pol</sub>(…, x, …) ≥ f<sub>Pol</sub>(…, y, …) *for all* f ∈ Σ ⊎ Σ<sup>#</sup>*). Let* D = D<sub>≥</sub> ⊎ D<sub>></sub> *with* D<sub>></sub> ≠ ∅ *such that:*

1. *For every* ℓ → r ∈ R*, we have* Pol(ℓ) ≥ Pol(r)*.*
2. *For every* ℓ<sup>#</sup> → t<sup>#</sup> ∈ D*, we have* Pol(ℓ<sup>#</sup>) ≥ Pol(t<sup>#</sup>)*.*
3. *For every* ℓ<sup>#</sup> → t<sup>#</sup> ∈ D<sub>></sub>*, we have* Pol(ℓ<sup>#</sup>) > Pol(t<sup>#</sup>)*.*


*Then* Proc<sub>RP</sub>(D, R) = {(D<sub>≥</sub>, R)} *is sound and complete.*

The constraints of the reduction pair processor for the remaining DP problems ({(7)}, {(1), (2)}) and ({(5)}, ∅) are satisfied by the polynomial interpretation which maps O to 0, s(x) to x + 1, and all other non-constant function symbols to the projection on their first arguments. Since (7) and (5) are strictly decreasing, Proc<sub>RP</sub> transforms both ({(7)}, {(1), (2)}) and ({(5)}, ∅) into DP problems of the form (∅, …). As Proc<sub>DG</sub>(∅, …) = ∅ and all processors used are sound, this means that there is no infinite innermost chain for the initial DP problem (DP(R<sub>div</sub>), R<sub>div</sub>) and thus, R<sub>div</sub> is innermost terminating.
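The constraints described in the prose (rules weakly decreasing, the DPs (5) and (7) strictly decreasing) can be illustrated for the stated interpretation with a short Python sketch. The tuple encoding of terms is an assumption, and the check samples a finite grid of naturals, so it serves as an illustration and not as a proof.

```python
from itertools import product

# Terms are nested tuples ("f", arg1, ..., argn); variables are plain strings.
RULES = [
    (("minus", "x", ("O",)), "x"),                              # rule (1)
    (("minus", ("s", "x"), ("s", "y")), ("minus", "x", "y")),   # rule (2)
]
DP5 = (("M", ("s", "x"), ("s", "y")), ("M", "x", "y"))
DP7 = (("D", ("s", "x"), ("s", "y")), ("D", ("minus", "x", "y"), ("s", "y")))

# Interpretation from the text: O -> 0, s(x) -> x + 1, and every other
# non-constant symbol is projected to its first argument.
POL = {
    "O": lambda: 0,
    "s": lambda x: x + 1,
    "minus": lambda x, y: x,
    "M": lambda x, y: x,
    "D": lambda x, y: x,
}

def evaluate(term, env):
    """Evaluate a term under POL with the variables bound by env."""
    if isinstance(term, str):
        return env[term]
    head, *args = term
    return POL[head](*(evaluate(a, env) for a in args))

def decreasing(pair, strict, bound=6):
    """Sample-based check of Pol(lhs) >= Pol(rhs), or > if strict is set."""
    lhs, rhs = pair
    for x, y in product(range(bound), repeat=2):
        env = {"x": x, "y": y}
        left, right = evaluate(lhs, env), evaluate(rhs, env)
        if left < right or (strict and left == right):
            return False
    return True

rules_ok = all(decreasing(rule, strict=False) for rule in RULES)
dps_ok = decreasing(DP5, strict=True) and decreasing(DP7, strict=True)
print(rules_ok, dps_ok)
```

Note that rule (1) is only weakly decreasing under this interpretation, which is permitted here but would not suffice for the direct application in Theorem 1.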

### **3 Probabilistic Term Rewriting**

Now we recapitulate *probabilistic TRSs* [3,9,14] and present a novel criterion to prove almost-sure termination automatically by adapting the direct application of polynomial interpretations from Theorem 1 to PTRSs. In contrast to TRSs, a PTRS has finite<sup>4</sup> multi-distributions on the right-hand side of rewrite rules.

<sup>2</sup> For a complete version of the usable rules processor, one has to use a more involved notion of DP problems with more components that we omit here for readability [16].

<sup>3</sup> In this paper, we only regard the reduction pair processor with polynomial interpretations, because for most other classical orderings it is not clear how to extend them to probabilistic TRSs, where one has to consider "expected values of terms".

**Definition 8 (Multi-Distribution).** *A finite* multi-distribution μ *on a set* A ≠ ∅ *is a finite multiset of pairs* (p : a)*, where* 0 < p ≤ 1 *is a probability and* a ∈ A*, such that* ∑<sub>(p:a)∈μ</sub> p = 1*.* FDist(A) *is the set of all finite multi-distributions on* A*. For* μ ∈ FDist(A)*, its* support *is the multiset* Supp(μ) = {a | (p : a) ∈ μ *for some* p}*.*

**Definition 9 (PTRS).** *A* probabilistic rewrite rule *is a pair* ℓ → μ ∈ T(Σ, V) × FDist(T(Σ, V)) *such that* ℓ ∉ V *and* V(r) ⊆ V(ℓ) *for every* r ∈ Supp(μ)*. A* probabilistic TRS *(PTRS) is a finite set* R *of probabilistic rewrite rules. Similar to TRSs, the PTRS* R *induces a* rewrite relation →<sub>R</sub> ⊆ T(Σ, V) × FDist(T(Σ, V)) *where* s →<sub>R</sub> {p<sub>1</sub> : t<sub>1</sub>, …, p<sub>k</sub> : t<sub>k</sub>} *if there is a position* π*, a rule* ℓ → {p<sub>1</sub> : r<sub>1</sub>, …, p<sub>k</sub> : r<sub>k</sub>} ∈ R*, and a substitution* σ *such that* s|<sub>π</sub> = ℓσ *and* t<sub>j</sub> = s[r<sub>j</sub>σ]<sub>π</sub> *for all* 1 ≤ j ≤ k*. We call* s →<sub>R</sub> μ *an* innermost *rewrite step (denoted* s <sup>i</sup>→<sub>R</sub> μ*) if every proper subterm of the used redex* ℓσ *is in normal form w.r.t.* R*.*

*Example 10.* As an example, consider the PTRS R<sub>rw</sub> with the only rule g(x) → {1/2 : x, 1/2 : g(g(x))}, which corresponds to a symmetric random walk.

As proposed in [3], we *lift* →<sub>R</sub> to a rewrite relation between multi-distributions in order to track all probabilistic rewrite sequences (up to non-determinism) at once. For any 0 < p ≤ 1 and any μ ∈ FDist(A), let p · μ = {(p · q : a) | (q : a) ∈ μ}.

**Definition 11 (Lifting).** *The* lifting ⇒ ⊆ FDist(T(Σ, V)) × FDist(T(Σ, V)) *of a relation* → ⊆ T(Σ, V) × FDist(T(Σ, V)) *is the smallest relation with:*


For a PTRS $\mathcal{R}$, we write $\Longrightarrow\_{\mathcal{R}}$ and $\stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{R}}$ for the liftings of $\to\_{\mathcal{R}}$ and $\stackrel{\mathsf{i}}{\to}\_{\mathcal{R}}$, respectively.

*Example 12.* For instance, we obtain the following $\stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{R}\_{\mathsf{rw}}}$-rewrite sequence:

$$\begin{array}{l} \{1:\mathsf{g}(\mathcal{O})\} \stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{R}\_{\mathsf{rw}}} \{1/2:\mathcal{O},\ 1/2:\mathsf{g}^{2}(\mathcal{O})\} \stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{R}\_{\mathsf{rw}}} \{1/2:\mathcal{O},\ 1/4:\mathsf{g}(\mathcal{O}),\ 1/4:\mathsf{g}^{3}(\mathcal{O})\} \\ \stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{R}\_{\mathsf{rw}}} \{1/2:\mathcal{O},\ 1/8:\mathcal{O},\ 1/8:\mathsf{g}^{2}(\mathcal{O}),\ 1/8:\mathsf{g}^{2}(\mathcal{O}),\ 1/8:\mathsf{g}^{4}(\mathcal{O})\} \end{array}$$

Note that the two occurrences of $\mathcal{O}$ and $\mathsf{g}^2(\mathcal{O})$ in the multi-distribution above could be rewritten differently if the PTRS had rules resulting in different terms. So it should be distinguished from $\{5/8 : \mathcal{O}, 1/4 : \mathsf{g}^2(\mathcal{O}), 1/8 : \mathsf{g}^4(\mathcal{O})\}$.
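The rewrite sequence of Example 12 can be replayed mechanically. The following is a minimal sketch, not the authors' implementation: it encodes the term $\mathsf{g}^n(\mathcal{O})$ by the integer `n` (an encoding chosen only for this illustration) and assumes, as the example suggests, that a lifted step keeps normal forms unchanged and rewrites every other entry of the multi-distribution once. The names `step`, `lift`, and `nf_probability` are hypothetical.

```python
from fractions import Fraction

# The term g^n(O) is encoded by the integer n (an assumption of this sketch).
def step(n):
    """One innermost R_rw step on g^n(O) with n >= 1: g(x) -> {1/2 : x, 1/2 : g(g(x))}."""
    half = Fraction(1, 2)
    return [(half, n - 1), (half, n + 1)]

def lift(mu):
    """Lifted step on a multi-distribution, given as a list of (probability, term) pairs.
    Entries for the normal form O (n == 0) are kept, all other entries are rewritten once."""
    result = []
    for p, n in mu:
        if n == 0:
            result.append((p, n))
        else:
            result.extend((p * q, m) for q, m in step(n))
    return result

def nf_probability(mu):
    """Total probability of normal forms in the multi-distribution."""
    return sum((p for p, n in mu if n == 0), Fraction(0))

mu = [(Fraction(1), 1)]          # {1 : g(O)}
for _ in range(3):
    mu = lift(mu)
print(mu)                        # entries are kept separately, as in a multi-distribution
print(nf_probability(mu))
```

Because the list keeps duplicate entries apart, the two occurrences of $\mathcal{O}$ and of $\mathsf{g}^2(\mathcal{O})$ remain separate, which is exactly the distinction between a multi-distribution and an ordinary distribution noted above.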

<sup>4</sup> Since our goal is the automation of termination analysis, in this paper we restrict ourselves to finite PTRSs with finite multi-distributions.

To express the concept of almost-sure termination, one has to determine the probability for normal forms in a multi-distribution.

**Definition 13 ($|\mu|\_{\mathcal{R}}$).** *For a PTRS* $\mathcal{R}$*,* $\mathrm{NF}\_{\mathcal{R}} \subseteq \mathcal{T}(\Sigma, \mathcal{V})$ *denotes the set of all normal forms w.r.t.* $\mathcal{R}$*. For any* $\mu \in \mathrm{FDist}(\mathcal{T}(\Sigma, \mathcal{V}))$*, let* $|\mu|\_{\mathcal{R}} = \sum\_{(p:t) \in \mu,\, t \in \mathrm{NF}\_{\mathcal{R}}} p$*.*

*Example 14.* Consider the multi-distribution $\mu = \{1/2 : \mathcal{O}, 1/8 : \mathcal{O}, 1/8 : \mathsf{g}^2(\mathcal{O}), 1/8 : \mathsf{g}^2(\mathcal{O}), 1/8 : \mathsf{g}^4(\mathcal{O})\}$ from Example 12 and $\mathcal{R}\_{\mathsf{rw}}$ from Example 10. Then $|\mu|\_{\mathcal{R}\_{\mathsf{rw}}} = 1/2 + 1/8 = 5/8$.

**Definition 15 ((Innermost) AST).** *Let* $\mathcal{R}$ *be a PTRS and* $(\mu\_n)\_{n \in \mathbb{N}}$ *be an infinite* $\Longrightarrow\_{\mathcal{R}}$*-rewrite sequence, i.e.,* $\mu\_n \Longrightarrow\_{\mathcal{R}} \mu\_{n+1}$ *for all* $n \in \mathbb{N}$*. Note that* $\lim\_{n \to \infty} |\mu\_n|\_{\mathcal{R}}$ *exists, since* $|\mu\_n|\_{\mathcal{R}} \le |\mu\_{n+1}|\_{\mathcal{R}} \le 1$ *for all* $n \in \mathbb{N}$*.* $\mathcal{R}$ *is* almost-surely terminating (AST) *(*innermost almost-surely terminating (iAST)*) if* $\lim\_{n \to \infty} |\mu\_n|\_{\mathcal{R}} = 1$ *holds for every infinite* $\Longrightarrow\_{\mathcal{R}}$*-rewrite sequence (*$\stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{R}}$*-rewrite sequence)* $(\mu\_n)\_{n \in \mathbb{N}}$*.*

*Example 16.* For the (unique) infinite extension $(\mu\_n)\_{n \in \mathbb{N}}$ of the $\stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{R}\_{\mathsf{rw}}}$-rewrite sequence in Example 12, we have $\lim\_{n \to \infty} |\mu\_n|\_{\mathcal{R}\_{\mathsf{rw}}} = 1$. Indeed, $\mathcal{R}\_{\mathsf{rw}}$ is AST (but not PAST, i.e., the expected number of rewrite steps is infinite for every term containing $\mathsf{g}$).
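To get a feeling for how slowly this limit is approached, one can track the probability of the normal form numerically. The sketch below is only an illustration under the same integer encoding of $\mathsf{g}^n(\mathcal{O})$ as position `n`; for efficiency it merges equal terms into one entry, which does not change the probability of normal forms. The function name `nf_mass_after` is hypothetical.

```python
from fractions import Fraction
from collections import defaultdict

def nf_mass_after(steps):
    """Probability of the normal form O after `steps` lifted R_rw steps, starting from g(O)."""
    dist = {1: Fraction(1)}                 # position n stands for g^n(O)
    for _ in range(steps):
        nxt = defaultdict(Fraction)
        for n, p in dist.items():
            if n == 0:
                nxt[0] += p                 # normal forms stay where they are
            else:
                nxt[n - 1] += p / 2         # g(x) -> x
                nxt[n + 1] += p / 2         # g(x) -> g(g(x))
        dist = nxt
    return dist.get(0, Fraction(0))

masses = [nf_mass_after(n) for n in (0, 1, 3, 10, 50, 200)]
for n, m in zip((0, 1, 3, 10, 50, 200), masses):
    print(n, float(m))
```

The values are non-decreasing and bounded by 1, as stated in Definition 15, but they approach 1 only slowly, which is in line with the remark that the system is AST but not PAST.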

Theorem 17 introduces a novel technique to prove AST automatically using a direct application of polynomial interpretations.

**Theorem 17 (Proving AST with Polynomial Interpretations).** *Let* $\mathcal{R}$ *be a PTRS, let* $\mathrm{Pol} : \mathcal{T}(\Sigma, \mathcal{V}) \to \mathbb{N}[\mathcal{V}]$ *be a monotonic, multilinear*<sup>5</sup> *polynomial interpretation (i.e., for all* $f \in \Sigma$*, all monomials of* $f\_{\mathrm{Pol}}(x\_1, \ldots, x\_n)$ *have the form* $c \cdot x\_1^{e\_1} \cdot \ldots \cdot x\_n^{e\_n}$ *with* $c \in \mathbb{N}$ *and* $e\_1, \ldots, e\_n \in \{0, 1\}$*). If for every rule* $\ell \to \{p\_1 : r\_1, \ldots, p\_k : r\_k\} \in \mathcal{R}$*,*

- *(1) there exists a* $1 \le j \le k$ *with* $\mathrm{Pol}(\ell) > \mathrm{Pol}(r\_j)$ *and*
- *(2)* $\mathrm{Pol}(\ell) \ge \sum\_{1 \le j \le k} p\_j \cdot \mathrm{Pol}(r\_j)$*,*

*then* $\mathcal{R}$ *is AST.*

In [3], it was shown that PAST can be proved by using multilinear polynomials and requiring a strict decrease in the expected value of each rule. In contrast, we only require a weak decrease of the expected value in (2) and in addition, at least one term in the support of the right-hand side must become strictly smaller (1). As mentioned, the proof for Theorem 17 (and for all our other new results and observations) can be found in [28]. The proof idea is based on [32], but it extends their approach from while-programs on integers to terms. However, in contrast to [32], PTRSs can only deal with constant probabilities, since all variables stand for terms, not for numbers. Note that the constraints (1) and (2) of our new criterion in Theorem 17 are equivalent to the constraint of the classical Theorem 1 in the special case where the PTRS is in fact a TRS (i.e., all rules have the form $\ell \to \{1 : r\}$).

<sup>5</sup> As in [3], multilinearity ensures "monotonicity" w.r.t. expected values, since multilinearity implies $f\_{\mathrm{Pol}}(\ldots, \sum\_{1 \le j \le k} p\_j \cdot \mathrm{Pol}(r\_j), \ldots) = \sum\_{1 \le j \le k} p\_j \cdot \mathrm{Pol}(f(\ldots, r\_j, \ldots))$.

*Example 18.* To prove that $\mathcal{R}\_{\mathsf{rw}}$ is AST with Theorem 17, we can use the polynomial interpretation that maps $\mathsf{g}(x)$ to $x + 1$ and $\mathcal{O}$ to $0$.
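The two constraints of Theorem 17 can be checked by hand for this interpretation: (1) holds since $x + 1 > x$, and (2) holds since $x + 1 \ge 1/2 \cdot x + 1/2 \cdot (x + 2)$. The sketch below mechanizes this check for the single rule of the random walk. It is only an illustration: it handles linear polynomials in one variable, stored as pairs `(a, b)` for `a*x + b`, and all helper names are hypothetical.

```python
from fractions import Fraction

# A linear polynomial a*x + b in one variable is stored as the pair (a, b).
X = (Fraction(1), Fraction(0))

def g(poly):
    """Interpretation g_Pol(x) = x + 1, applied to a polynomial."""
    a, b = poly
    return (a, b + 1)

def strictly_greater(p, q):
    # a*x + b > c*x + d holds for all natural x  iff  a >= c and b > d
    return p[0] >= q[0] and p[1] > q[1]

def weakly_greater(p, q):
    # a*x + b >= c*x + d holds for all natural x  iff  a >= c and b >= d
    return p[0] >= q[0] and p[1] >= q[1]

def expected(rhs):
    """Expected value of the right-hand side, computed coefficient-wise."""
    return (sum(p * poly[0] for p, poly in rhs), sum(p * poly[1] for p, poly in rhs))

half = Fraction(1, 2)
lhs = g(X)                                  # Pol(g(x)) = x + 1
rhs = [(half, X), (half, g(g(X)))]          # {1/2 : x, 1/2 : g(g(x))}

cond1 = any(strictly_greater(lhs, poly) for _, poly in rhs)   # constraint (1)
cond2 = weakly_greater(lhs, expected(rhs))                    # constraint (2)
print(cond1, cond2)
```

Note that the expected value of the right-hand side equals the left-hand side here, so a criterion demanding a strict decrease in expectation would not apply to this rule.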

### **4 Probabilistic Dependency Pairs**

We introduce our new adaption of DPs to the probabilistic setting in Sect. 4.1. Then we present the processors for the probabilistic DP framework in Sect. 4.2.

#### **4.1 Dependency Tuples and Chains for Probabilistic Term Rewriting**

We first show why straightforward adaptions are unsound. A natural idea to define DPs for probabilistic rules $\ell \to \{p\_1 : r\_1, \ldots, p\_k : r\_k\} \in \mathcal{R}$ would be (8) or (9):

$$\{\ell^\# \to \{p\_1 : r\_1, \dots, p\_j : t\_j^\#, \dots, p\_k : r\_k\} \mid t\_j \in \text{Sub}\_D(r\_j) \text{ with } 1 \le j \le k\} \tag{8}$$

$$\{\ell^\# \to \{p\_1 : t\_1^\#, \dots, p\_k : t\_k^\#\} \: \mid \: t\_j \in \text{Sub}\_D(r\_j) \text{ for all } 1 \le j \le k\}\tag{9}$$

For (9), if $\text{Sub}\_D(r\_j) = \emptyset$, then we insert a fresh constructor $\bot$ into $\text{Sub}\_D(r\_j)$ that does not occur in $\mathcal{R}$. So in both (8) and (9), we replace $r\_j$ by a single term $t\_j^\#$ in the right-hand side. The following example shows that this notion of probabilistic DPs does not yield a sound chain criterion. Consider the PTRSs $\mathcal{R}\_1$ and $\mathcal{R}\_2$:

$$\mathcal{R}\_1 = \{ \mathbf{g} \rightarrow \{ 1/2 : \mathcal{O}, 1/2 : \mathbf{f}(\mathbf{g}, \mathbf{g}) \} \} \quad \mathcal{R}\_2 = \{ \mathbf{g} \rightarrow \{ 1/2 : \mathcal{O}, 1/2 : \mathbf{f}(\mathbf{g}, \mathbf{g}, \mathbf{g}) \} \} \quad (10)$$

$\mathcal{R}\_1$ is AST since it corresponds to a symmetric random walk stopping at 0, where the number of $\mathsf{g}$s denotes the current position. In contrast, $\mathcal{R}\_2$ is not AST as it corresponds to a random walk where there is an equal chance of reducing the number of $\mathsf{g}$s by 1 or increasing it by 2. For both $\mathcal{R}\_1$ and $\mathcal{R}\_2$, (8) and (9) would result in the only dependency pair $\mathsf{G} \to \{1/2 : \mathcal{O}, 1/2 : \mathsf{G}\}$ and $\mathsf{G} \to \{1/2 : \bot, 1/2 : \mathsf{G}\}$, resp. Rewriting with this DP is clearly AST, since it corresponds to a program that flips a coin until one gets heads and then terminates. So the definitions (8) and (9) would not yield a sound approach for proving AST.
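The difference between the two systems can also be seen numerically. The following sketch is not part of the paper's argument; it uses the standard branching-process view, in which every $\mathsf{g}$ independently either disappears or is replaced by `copies` new occurrences of $\mathsf{g}$, so that the probability $q$ that a single $\mathsf{g}$ eventually reaches a normal form is the least solution of $q = 1/2 + 1/2 \cdot q^{\mathit{copies}}$. The function name and the iteration count are arbitrary choices of this illustration.

```python
def termination_probability(copies, iterations=100_000):
    """Least fixed point of q = 1/2 + 1/2 * q**copies, approximated by iterating from q = 0."""
    q = 0.0
    for _ in range(iterations):
        q = 0.5 + 0.5 * q ** copies
    return q

q1 = termination_probability(2)   # R_1: g -> {1/2 : O, 1/2 : f(g, g)}
q2 = termination_probability(3)   # R_2: g -> {1/2 : O, 1/2 : f(g, g, g)}
print(q1, q2)
```

For two copies the iteration creeps towards 1, while for three copies it settles at $(\sqrt{5} - 1)/2 \approx 0.618$, the smaller root of $q^3 - 2q + 1 = (q - 1)(q^2 + q - 1)$ in $[0, 1]$. The coin-flip dependency pair obtained from (8) or (9) cannot distinguish these two cases.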

$\mathcal{R}\_1$ and $\mathcal{R}\_2$ show that the number of occurrences of the same subterm in the right-hand side $r$ of a rule matters for AST. Thus, we now regard the *multiset* $\text{MSub}\_D(r)$ of all subterms of $r$ with defined root symbol to ensure that multiple occurrences of the same subterm in $r$ are taken into account. Moreover, instead of pairs we regard *dependency tuples* which consider all subterms with defined root in $r$ at once. Dependency tuples were already used when adapting DPs for complexity analysis of (non-probabilistic) TRSs [37]. We now adapt them to the probabilistic setting and present a novel rewrite relation for dependency tuples.

**Definition 19 (Transformation** $\mathrm{dp}$**).** *If* $\text{MSub}\_D(r) = \{t\_1, \ldots, t\_n\}$*, then we define* $\mathrm{dp}(r) = \mathsf{c}\_n(t\_1^\#, \ldots, t\_n^\#)$*. To make* $\mathrm{dp}(r)$ *unique, we use the lexicographic ordering* $<$ *on positions where* $t\_i = r|\_{\pi\_i}$ *and* $\pi\_1 < \ldots < \pi\_n$*. Here, we extend* $\Sigma\_C$ *by fresh* compound *constructor symbols* $\mathsf{c}\_n$ *of arity* $n$ *for* $n \in \mathbb{N}$*.*
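A small sketch of the transformation may help to see the ordering on positions at work. It is only an illustration under assumed conventions: terms are nested tuples `("symbol", arg1, ..., argn)`, variables are plain strings, and the marked symbol $f^\#$ is written as the upper-cased name of $f$ (the paper abbreviates these further, e.g. to $\mathsf{D}$ and $\mathsf{M}$). A preorder traversal visits subterms in lexicographic order of their positions.

```python
# Terms are nested tuples ("symbol", arg1, ..., argn); variables are plain strings.
def defined_subterms(term, defined):
    """All subterms with defined root symbol, in lexicographic order of positions (preorder)."""
    if isinstance(term, str):              # a variable has no defined root
        return []
    found = [term] if term[0] in defined else []
    for arg in term[1:]:
        found.extend(defined_subterms(arg, defined))
    return found

def sharp(term):
    """Mark the root symbol; upper-casing the name is a convention of this sketch."""
    return (term[0].upper(),) + term[1:]

def dp(term, defined):
    """Build c_n(t_1#, ..., t_n#) from the multiset of subterms with defined root."""
    subterms = defined_subterms(term, defined)
    return ("c%d" % len(subterms),) + tuple(sharp(t) for t in subterms)

# Right-hand side s(div(minus(x, y), s(y))) with defined symbols minus and div
r = ("s", ("div", ("minus", "x", "y"), ("s", "y")))
print(dp(r, {"minus", "div"}))
```

Duplicated subterms are kept, so a right-hand side such as $\mathsf{f}(\mathsf{g}, \mathsf{g})$ with defined symbol $\mathsf{g}$ yields a compound term with two marked copies of $\mathsf{g}$.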

When rewriting a subterm $t\_i^\#$ of $\mathsf{c}\_n(t\_1^\#, \ldots, t\_n^\#)$ with a dependency tuple, one obtains terms with nested compound symbols. To abstract from nested compound symbols and from the order of their arguments, we introduce the following normalization.

**Definition 20 (Normalizing Compound Terms).** *For any term* $t$*, its* content $\mathrm{cont}(t)$ *is the multiset defined by* $\mathrm{cont}(\mathsf{c}\_n(t\_1, \ldots, t\_n)) = \mathrm{cont}(t\_1) \cup \ldots \cup \mathrm{cont}(t\_n)$ *and* $\mathrm{cont}(t) = \{t\}$ *otherwise. For any term* $t$ *with* $\mathrm{cont}(t) = \{t\_1, \ldots, t\_n\}$*, the term* $\mathsf{c}\_n(t\_1, \ldots, t\_n)$ *is a* normalization *of* $t$*. For two terms* $t, t'$*, we define* $t \approx t'$ *if* $\mathrm{cont}(t) = \mathrm{cont}(t')$*. We define* $\approx$ *on multi-distributions in a similar way: whenever* $t\_j \approx t'\_j$ *for all* $1 \le j \le k$*, then* $\{p\_1 : t\_1, \ldots, p\_k : t\_k\} \approx \{p\_1 : t'\_1, \ldots, p\_k : t'\_k\}$*.*

So for example, $\mathsf{c}\_3(x, x, y)$ is a normalization of $\mathsf{c}\_2(\mathsf{c}\_1(x), \mathsf{c}\_2(x, y))$. We do not distinguish between terms and multi-distributions that are equal w.r.t. $\approx$ and we write $\mathsf{c}\_n(t\_1, \ldots, t\_n)$ for any term $t$ with a compound root symbol where $\mathrm{cont}(t) = \{t\_1, \ldots, t\_n\}$, i.e., we consider all such $t$ to be normalized.
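The content of a term and one of its normalizations are easy to compute. The sketch below again uses nested tuples for terms and strings for variables, and it assumes that compound symbols are exactly those named `c` followed by digits; sorting the arguments is just one way to pick a canonical normalization and is a choice of this illustration.

```python
from collections import Counter
import re

COMPOUND = re.compile(r"c\d+$")   # assumption: compound symbols are named c0, c1, c2, ...

def cont(term):
    """Multiset of the non-compound terms found below nested compound symbols."""
    if not isinstance(term, str) and COMPOUND.match(term[0]):
        total = Counter()
        for arg in term[1:]:
            total += cont(arg)
        return total
    return Counter([term])

def normalize(term):
    """One normalization of the term: a single compound symbol over the sorted content."""
    items = sorted(cont(term).elements(), key=repr)
    return ("c%d" % len(items),) + tuple(items)

nested = ("c2", ("c1", "x"), ("c2", "x", "y"))
print(normalize(nested))
```

Two terms are related by $\approx$ exactly when their `cont` multisets coincide, so comparing the `Counter` objects decides the equivalence.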

For any rule $\ell \to \{p\_1 : r\_1, \ldots, p\_k : r\_k\} \in \mathcal{R}$, the natural idea would be to define its *dependency tuple (DT)* as $\ell^\# \to \{p\_1 : \mathrm{dp}(r\_1), \ldots, p\_k : \mathrm{dp}(r\_k)\}$. Then innermost *chains* in the probabilistic setting would result from alternating a DT-step with an arbitrary number of $\mathcal{R}$-steps (using $\stackrel{\mathsf{i}}{\Longrightarrow}{}^{\ast}\_{\mathcal{R}}$). However, such chains would not necessarily correspond to the original rewrite sequence and thus, the resulting chain criterion would not be sound.

*Example 21.* Consider the PTRS $\mathcal{R}\_3 = \{\mathsf{f}(\mathcal{O}) \to \{1 : \mathsf{f}(\mathsf{a})\},\ \mathsf{a} \to \{1/2 : \mathsf{b}\_1, 1/2 : \mathsf{b}\_2\},\ \mathsf{b}\_1 \to \{1 : \mathcal{O}\},\ \mathsf{b}\_2 \to \{1 : \mathsf{f}(\mathsf{a})\}\}$. Its DTs would be $\mathcal{D}\_3 = \{\mathsf{F}(\mathcal{O}) \to \{1 : \mathsf{c}\_2(\mathsf{F}(\mathsf{a}), \mathsf{A})\},\ \mathsf{A} \to \{1/2 : \mathsf{c}\_1(\mathsf{B}\_1), 1/2 : \mathsf{c}\_1(\mathsf{B}\_2)\},\ \mathsf{B}\_1 \to \{1 : \mathsf{c}\_0\},\ \mathsf{B}\_2 \to \{1 : \mathsf{c}\_2(\mathsf{F}(\mathsf{a}), \mathsf{A})\}\}$. $\mathcal{R}\_3$ is not iAST, because one can extend the rewrite sequence

$$\{1:\mathsf{f}(\mathcal{O})\} \stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{R}\_3} \{1:\mathsf{f}(\mathsf{a})\} \stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{R}\_3} \{1/2:\mathsf{f}(\mathsf{b}\_1), 1/2:\mathsf{f}(\mathsf{b}\_2)\} \stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{R}\_3} \{1/2:\mathsf{f}(\mathcal{O}), 1/2:\mathsf{f}(\mathsf{f}(\mathsf{a}))\} \tag{11}$$

to an infinite sequence without normal forms. The resulting chain starts with

$$\begin{array}{rl} & \{1 : \mathsf{c}\_1(\mathsf{F}(\mathcal{O}))\} \\ \stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{D}\_3} & \{1 : \mathsf{c}\_2(\mathsf{F}(\mathsf{a}), \mathsf{A})\} \\ \stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{D}\_3} & \{1/2 : \mathsf{c}\_2(\mathsf{F}(\mathsf{a}), \mathsf{B}\_1),\ 1/2 : \mathsf{c}\_2(\mathsf{F}(\mathsf{a}), \mathsf{B}\_2)\} \\ \stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{R}\_3} & \{1/4 : \mathsf{c}\_2(\mathsf{F}(\mathsf{b}\_1), \mathsf{B}\_1),\ \underline{1/4 : \mathsf{c}\_2(\mathsf{F}(\mathsf{b}\_2), \mathsf{B}\_1)},\ 1/4 : \mathsf{c}\_2(\mathsf{F}(\mathsf{b}\_1), \mathsf{B}\_2),\ 1/4 : \mathsf{c}\_2(\mathsf{F}(\mathsf{b}\_2), \mathsf{B}\_2)\} \end{array}$$

The second and third term in the last distribution do not correspond to terms in the original rewrite sequence (11). After the next $\mathcal{D}\_3$-step which removes $\mathsf{B}\_1$, no further $\mathcal{D}\_3$-step can be applied to the underlined term anymore, because $\mathsf{b}\_2$ cannot be rewritten to $\mathcal{O}$. Thus, the resulting chain criterion would be unsound, as every chain $(\mu\_n)\_{n \in \mathbb{N}}$ in this example contains such $\mathcal{D}\_3$-normal forms and therefore, it is AST (i.e., $\lim\_{n \to \infty} |\mu\_n|\_{\mathcal{D}\_3} = 1$ where $|\mu\_n|\_{\mathcal{D}\_3}$ is the probability for $\mathcal{D}\_3$-normal forms in $\mu\_n$). So we have to ensure that when $\mathsf{A}$ is rewritten to $\mathsf{B}\_1$ via a DT from $\mathcal{D}\_3$, then the "copy" $\mathsf{a}$ of the redex $\mathsf{A}$ is rewritten via $\mathcal{R}\_3$ to the corresponding term $\mathsf{b}\_1$ instead of $\mathsf{b}\_2$. Thus, after the step with $\stackrel{\mathsf{i}}{\Longrightarrow}\_{\mathcal{R}\_3}$ we should have $\mathsf{c}\_2(\mathsf{F}(\mathsf{b}\_1), \mathsf{B}\_1)$ and $\mathsf{c}\_2(\mathsf{F}(\mathsf{b}\_2), \mathsf{B}\_2)$, but not $\mathsf{c}\_2(\mathsf{F}(\mathsf{b}\_2), \mathsf{B}\_1)$ or $\mathsf{c}\_2(\mathsf{F}(\mathsf{b}\_1), \mathsf{B}\_2)$.

Therefore, for our new adaption of DPs to the probabilistic setting, we operate on *pairs*. Instead of having a rule $\ell \to \{p\_1 : r\_1, \ldots, p\_k : r\_k\}$ from $\mathcal{R}$ and its corresponding dependency tuple $\ell^\# \to \{p\_1 : \mathrm{dp}(r\_1), \ldots, p\_k : \mathrm{dp}(r\_k)\}$ separately, we couple them together to $\langle \ell^\#, \ell \rangle \to \{p\_1 : \langle \mathrm{dp}(r\_1), r\_1 \rangle, \ldots, p\_k : \langle \mathrm{dp}(r\_k), r\_k \rangle\}$. This type of rewrite system is called a *probabilistic pair term rewrite system (PPTRS)*, and its rules are called *coupled dependency tuples*. Our new DP framework works on *(probabilistic) DP problems* $(\mathcal{P}, \mathcal{S})$, where $\mathcal{P}$ is a PPTRS and $\mathcal{S}$ is a PTRS.

**Definition 22 (Coupled Dependency Tuple).** *Let* $\mathcal{R}$ *be a PTRS. For every* $\ell \to \mu = \{p\_1 : r\_1, \ldots, p\_k : r\_k\} \in \mathcal{R}$*, its* coupled dependency tuple *(or simply* dependency tuple*,* DT*) is* $\mathcal{DT}(\ell \to \mu) = \langle \ell^\#, \ell \rangle \to \{p\_1 : \langle \mathrm{dp}(r\_1), r\_1 \rangle, \ldots, p\_k : \langle \mathrm{dp}(r\_k), r\_k \rangle\}$*. The set of all coupled dependency tuples of* $\mathcal{R}$ *is denoted by* $\mathcal{DT}(\mathcal{R})$*.*

*Example 23.* The following PTRS $\mathcal{R}\_{\mathsf{pdiv}}$ adapts $\mathcal{R}\_{\mathsf{div}}$ to the probabilistic setting.

$$\mathsf{minus}(x,\mathcal{O}) \to \{1:x\} \quad \text{(12)}\qquad \mathsf{minus}(\mathsf{s}(x),\mathsf{s}(y)) \to \{1:\mathsf{minus}(x,y)\} \quad \text{(13)}$$

$$\text{div}(\mathcal{O}, \mathbf{s}(y)) \to \{1: \mathcal{O}\} \tag{14}$$

$$\mathsf{div}(\mathsf{s}(x), \mathsf{s}(y)) \rightarrow \{ 1/2 : \mathsf{div}(\mathsf{s}(x), \mathsf{s}(y)),\ 1/2 : \mathsf{s}(\mathsf{div}(\mathsf{minus}(x, y), \mathsf{s}(y))) \} \tag{15}$$

In (15), we now do the actual rewrite step with a chance of $1/2$ or the terms stay the same. Our new probabilistic DP framework can prove automatically that $\mathcal{R}\_{\mathsf{pdiv}}$ is iAST, while (as in the non-probabilistic setting) a direct application of polynomial interpretations via Theorem 17 fails. We get $\mathcal{DT}(\mathcal{R}\_{\mathsf{pdiv}}) = \{(16), \ldots, (19)\}$:

$$
\langle \mathsf{M}(x,\mathcal{O}), \mathsf{minus}(x,\mathcal{O}) \rangle \to \{1 : \langle \mathsf{c}\_{0}, x \rangle\} \tag{16}
$$

$$\langle \mathsf{M}(\mathsf{s}(x), \mathsf{s}(y)), \mathsf{minus}(\mathsf{s}(x), \mathsf{s}(y)) \rangle \to \{1 : \langle \mathsf{c}\_1(\mathsf{M}(x, y)), \mathsf{minus}(x, y) \rangle \} \tag{17}$$

$$
\langle \mathsf{D}(\mathcal{O}, \mathsf{s}(y)), \mathsf{div}(\mathcal{O}, \mathsf{s}(y)) \rangle \to \{1 : \langle \mathsf{c}\_{0}, \mathcal{O} \rangle\} \tag{18}
$$

$$\begin{aligned} \langle \mathsf{D}(\mathsf{s}(x),\mathsf{s}(y)),\mathsf{div}(\mathsf{s}(x),\mathsf{s}(y)) \rangle \to \{ & 1/2:\langle\mathsf{c}\_{1}(\mathsf{D}(\mathsf{s}(x),\mathsf{s}(y))),\mathsf{div}(\mathsf{s}(x),\mathsf{s}(y))\rangle, \\ & 1/2:\langle\mathsf{c}\_{2}(\mathsf{D}(\mathsf{minus}(x,y),\mathsf{s}(y)),\mathsf{M}(x,y)),\mathsf{s}(\mathsf{div}(\mathsf{minus}(x,y),\mathsf{s}(y)))\rangle\} \end{aligned} \tag{19}$$

**Definition 24 (PPTRS,** $\stackrel{\mathsf{i}}{\rightarrowtail}\_{\mathcal{P},\mathcal{S}}$**).** *Let* $\mathcal{P}$ *be a finite set of rules of the form* $\langle \ell^\#, \ell \rangle \to \{p\_1 : \langle d\_1, r\_1 \rangle, \ldots, p\_k : \langle d\_k, r\_k \rangle\}$*. For every such rule, let* $\mathrm{proj}\_1(\mathcal{P})$ *contain* $\ell^\# \to \{p\_1 : d\_1, \ldots, p\_k : d\_k\}$ *and let* $\mathrm{proj}\_2(\mathcal{P})$ *contain* $\ell \to \{p\_1 : r\_1, \ldots, p\_k : r\_k\}$*. If* $\mathrm{proj}\_2(\mathcal{P})$ *is a PTRS and* $\mathrm{cont}(d\_j) \subseteq \mathrm{cont}(\mathrm{dp}(r\_j))$ *holds*<sup>6</sup> *for all* $1 \le j \le k$*, then* $\mathcal{P}$ *is a* probabilistic pair term rewrite system (PPTRS)*.*

*Let* $\mathcal{S}$ *be a PTRS. Then a normalized term* $\mathsf{c}\_n(s\_1, \ldots, s\_n)$ rewrites *with the PPTRS* $\mathcal{P}$ *to* $\{p\_1 : b\_1, \ldots, p\_k : b\_k\}$ *w.r.t.* $\mathcal{S}$ *(denoted* $\stackrel{\mathsf{i}}{\rightarrowtail}\_{\mathcal{P},\mathcal{S}}$*) if there are an* $1 \le i \le n$*, an* $\langle \ell^\#, \ell \rangle \to \{p\_1 : \langle d\_1, r\_1 \rangle, \ldots, p\_k : \langle d\_k, r\_k \rangle\} \in \mathcal{P}$*, a substitution* $\sigma$ *with* $s\_i = \ell^\#\sigma \in \mathrm{NF}\_{\mathcal{S}}$*, and for all* $1 \le j \le k$ *we have* $b\_j = \mathsf{c}\_n(t^j\_1, \ldots, t^j\_n)$ *where*

- *(i)* $t^j\_{i'} = s\_{i'}$ *for all* $1 \le j \le k$ *or*
- *(ii)* $t^j\_{i'} = s\_{i'}[r\_j\sigma]\_\tau$ *for all* $1 \le j \le k$*,*

*if* $s\_{i'}|\_\tau = \ell\sigma$ *for some position* $\tau$ *and if* $\ell \to \{p\_1 : r\_1, \ldots, p\_k : r\_k\} \in \mathcal{S}$*. So* $s\_{i'}$ *stays the same in all* $b\_j$ *or we can apply the rule from* $\mathrm{proj}\_2(\mathcal{P})$ *to rewrite* $s\_{i'}$ *in all* $b\_j$*, provided that this rule is also contained in* $\mathcal{S}$*. Note that even if the rule is applicable, the term* $s\_{i'}$ *can still stay the same in all* $b\_j$*.*

<sup>6</sup> The reason for $\mathrm{cont}(d\_j) \subseteq \mathrm{cont}(\mathrm{dp}(r\_j))$ instead of $\mathrm{cont}(d\_j) = \mathrm{cont}(\mathrm{dp}(r\_j))$ is that in this way processors can remove terms from the right-hand sides of DTs, see Theorem 32.

*Example 25.* For $\mathcal{R}\_3$ from Example 21, the (coupled) dependency tuple for the $\mathsf{f}$-rule is $\langle \mathsf{F}(\mathcal{O}), \mathsf{f}(\mathcal{O}) \rangle \to \{1 : \langle \mathsf{c}\_2(\mathsf{F}(\mathsf{a}), \mathsf{A}), \mathsf{f}(\mathsf{a}) \rangle\}$ and the DT for the $\mathsf{a}$-rule is $\langle \mathsf{A}, \mathsf{a} \rangle \to \{1/2 : \langle \mathsf{c}\_1(\mathsf{B}\_1), \mathsf{b}\_1 \rangle, 1/2 : \langle \mathsf{c}\_1(\mathsf{B}\_2), \mathsf{b}\_2 \rangle\}$. With the lifting $\stackrel{\mathsf{i}}{\rightrightarrows}\_{\mathcal{P},\mathcal{S}}$ of $\stackrel{\mathsf{i}}{\rightarrowtail}\_{\mathcal{P},\mathcal{S}}$, we get the following sequence which corresponds to the rewrite sequence (11) from Example 21.

$$\begin{aligned} \{1: \mathsf{c}\_{1}(\mathsf{F}(\mathcal{O}))\} & \stackrel{\mathsf{i}}{\rightrightarrows}\_{\mathcal{DT}(\mathcal{R}\_{3}), \mathcal{R}\_{3}} \{1: \mathsf{c}\_{2}(\mathsf{F}(\mathsf{a}), \mathsf{A})\} \\ & \stackrel{\mathsf{i}}{\rightrightarrows}\_{\mathcal{DT}(\mathcal{R}\_{3}), \mathcal{R}\_{3}} \{1/2: \mathsf{c}\_{2}(\mathsf{F}(\mathsf{b}\_{1}), \mathsf{B}\_{1}),\ 1/2: \mathsf{c}\_{2}(\mathsf{F}(\mathsf{b}\_{2}), \mathsf{B}\_{2})\} \end{aligned} \tag{20}$$

So with the PPTRS, when rewriting $\mathsf{A}$ to $\mathsf{B}\_1$ in the second step, we can simultaneously rewrite the inner subterm $\mathsf{a}$ of $\mathsf{F}(\mathsf{a})$ to $\mathsf{b}\_1$ or keep $\mathsf{a}$ unchanged, but we cannot rewrite $\mathsf{a}$ to $\mathsf{b}\_2$. This is ensured by $\mathsf{b}\_1$ in the second component of $\langle \mathsf{A}, \mathsf{a} \rangle \to \{1/2 : \langle \mathsf{c}\_1(\mathsf{B}\_1), \mathsf{b}\_1 \rangle, \ldots\}$, since by Definition 24, if $s\_{i'}$ contains $\ell\sigma$ at some arbitrary position $\tau$, then one can (only) use the rule in the second component of the DT to rewrite $\ell\sigma$ (i.e., here we have $s\_{i'} = \mathsf{F}(\mathsf{a})$, $s\_i = \mathsf{A}$, and $s\_{i'}|\_\tau = \mathsf{a}$). A similar observation holds when rewriting $\mathsf{A}$ to $\mathsf{B}\_2$. Recall that with the notion of chains in Example 21, one *cannot simulate* every possible rewrite sequence, which leads to unsoundness. In contrast, with the notion of coupled DTs and PPTRSs, every possible rewrite sequence *can be simulated* which ensures soundness of the chain criterion. Of course, due to the ambiguity in (i) and (ii) of Definition 24, one could also create other "unsuitable" $\stackrel{\mathsf{i}}{\rightrightarrows}\_{\mathcal{DT}(\mathcal{R}\_3), \mathcal{R}\_3}$-sequences where $\mathsf{a}$ is not reduced to $\mathsf{b}\_1$ and $\mathsf{b}\_2$ in the second step, but is kept unchanged. This does not affect the soundness of the chain criterion, since every rewrite sequence of the original PTRS can be simulated by a "suitable" chain. To obtain completeness of the chain criterion, one would have to avoid such "unsuitable" sequences.
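The effect of the coupling can be reproduced with a few lines of code. The sketch below is a strong simplification and not the authors' definition: it only handles ground rules (so no substitution is needed), it does not check the normal-form side condition, and a single flag selects option (i) or option (ii) for all other arguments at once, whereas Definition 24 allows this choice per argument. Terms are nested tuples, and all function names are hypothetical.

```python
from fractions import Fraction

def replace_first(term, old, new):
    """Replace the leftmost-outermost occurrence of `old` in `term`; returns (term, replaced?)."""
    if term == old:
        return new, True
    if isinstance(term, str):
        return term, False
    args = list(term[1:])
    for k, arg in enumerate(args):
        args[k], done = replace_first(arg, old, new)
        if done:
            return (term[0],) + tuple(args), True
    return term, False

def coupled_step(args, i, rule, rewrite_copies=True):
    """Rewrite argument i of a normalized compound term with a ground coupled DT.
    rule = (lhs_sharp, lhs, [(p, [d-terms], r), ...]); args are the compound term's arguments."""
    lhs_sharp, lhs, outcomes = rule
    assert args[i] == lhs_sharp
    result = []
    for p, d_terms, r in outcomes:
        new_args = []
        for k, s in enumerate(args):
            if k == i:
                new_args.extend(d_terms)                       # content of d_j replaces s_i
            elif rewrite_copies:
                new_args.append(replace_first(s, lhs, r)[0])   # option (ii): same index j
            else:
                new_args.append(s)                             # option (i): keep s unchanged
        result.append((p, new_args))
    return result

half = Fraction(1, 2)
a, A = ("a",), ("A",)
rule_a = (A, a, [(half, [("B1",)], ("b1",)), (half, [("B2",)], ("b2",))])
print(coupled_step([("F", a), A], 1, rule_a))
```

Since the copy of the redex inside $\mathsf{F}(\mathsf{a})$ is rewritten with the right-hand side that belongs to the same index $j$, the mixed terms $\mathsf{c}\_2(\mathsf{F}(\mathsf{b}\_2), \mathsf{B}\_1)$ and $\mathsf{c}\_2(\mathsf{F}(\mathsf{b}\_1), \mathsf{B}\_2)$ cannot arise; calling the function with `rewrite_copies=False` yields one of the "unsuitable" steps in which $\mathsf{a}$ is kept.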

We also introduce an analogous rewrite relation for PTRSs, where we can apply the same rule simultaneously to the same subterms in a single rewrite step.

**Definition 26 ($\stackrel{\mathsf{i}}{\rightarrowtail}\_{\mathcal{S}}$).** *For a PTRS* $\mathcal{S}$ *and a normalized term* $\mathsf{c}\_n(s\_1, \ldots, s\_n)$*, we define* $\mathsf{c}\_n(s\_1, \ldots, s\_n) \stackrel{\mathsf{i}}{\rightarrowtail}\_{\mathcal{S}} \{p\_1 : b\_1, \ldots, p\_k : b\_k\}$ *if there are an* $1 \le i \le n$*, an* $\ell \to \{p\_1 : r\_1, \ldots, p\_k : r\_k\} \in \mathcal{S}$*, a position* $\pi$*, a substitution* $\sigma$ *with* $s\_i|\_\pi = \ell\sigma$ *such that every proper subterm of* $\ell\sigma$ *is in* $\mathrm{NF}\_{\mathcal{S}}$*, and for all* $1 \le j \le k$ *we have* $b\_j = \mathsf{c}\_n(t^j\_1, \ldots, t^j\_n)$ *where*

- *(i)* $t^j\_{i'} = s\_{i'}$ *for all* $1 \le j \le k$ *or*
- *(ii)* $t^j\_{i'} = s\_{i'}[r\_j\sigma]\_\tau$ *for all* $1 \le j \le k$*, if* $s\_{i'}|\_\tau = \ell\sigma$ *for some position* $\tau$*.*

So for example, the lifting $\stackrel{\mathsf{i}}{\rightrightarrows}\_{\mathcal{S}}$ of $\stackrel{\mathsf{i}}{\rightarrowtail}\_{\mathcal{S}}$ for $\mathcal{S} = \mathcal{R}\_3$ rewrites $\{1 : \mathsf{c}\_2(\mathsf{f}(\mathsf{a}), \mathsf{a})\}$ to both $\{1/2 : \mathsf{c}\_2(\mathsf{f}(\mathsf{b}\_1), \mathsf{b}\_1), 1/2 : \mathsf{c}\_2(\mathsf{f}(\mathsf{b}\_2), \mathsf{b}\_2)\}$ and $\{1/2 : \mathsf{c}\_2(\mathsf{f}(\mathsf{a}), \mathsf{b}\_1), 1/2 : \mathsf{c}\_2(\mathsf{f}(\mathsf{a}), \mathsf{b}\_2)\}$.

A straightforward adaption of "chains" to the probabilistic setting using $\stackrel{\mathsf{i}}{\rightrightarrows}\_{\mathcal{P},\mathcal{S}} \circ \stackrel{\mathsf{i}}{\rightrightarrows}{}^{\ast}\_{\mathcal{S}}$ would force us to use steps with DTs from $\mathcal{P}$ at the same time for all terms in a multi-distribution. Therefore, instead we view a rewrite sequence on multi-distributions as a tree (e.g., the tree representation of the rewrite sequence (20) from Example 25 is on the right). Regarding the paths in this tree (which represent rewrite sequences of terms with certain probabilities) allows us to adapt the idea of chains, i.e., that one uses only finitely many $\mathcal{S}$-steps before the next step with a DT from $\mathcal{P}$.

**Definition 27 (Chain Tree).** $\mathfrak{T} = (V, E, L, P)$ *is an (innermost)* $(\mathcal{P}, \mathcal{S})$-chain tree *if*


Conditions 1–5 ensure that the tree represents a valid rewrite sequence and the last condition is the main property for chains.

**Definition 28 ($|\mathfrak{T}|\_{\mathrm{Leaf}}$, iAST).** *For any innermost* $(\mathcal{P}, \mathcal{S})$*-chain tree* $\mathfrak{T}$ *we define* $|\mathfrak{T}|\_{\mathrm{Leaf}} = \sum\_{v \in \mathrm{Leaf}} p\_v$*. We say that* $(\mathcal{P}, \mathcal{S})$ *is* iAST *if we have* $|\mathfrak{T}|\_{\mathrm{Leaf}} = 1$ *for every innermost* $(\mathcal{P}, \mathcal{S})$*-chain tree* $\mathfrak{T}$*.*

While we have $|\mathfrak{T}|\_{\mathrm{Leaf}} = 1$ for every finite chain tree $\mathfrak{T}$, for infinite chain trees $\mathfrak{T}$ we may have $|\mathfrak{T}|\_{\mathrm{Leaf}} < 1$ or even $|\mathfrak{T}|\_{\mathrm{Leaf}} = 0$ if $\mathfrak{T}$ has no leaf at all.
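The leaf probability of an infinite tree can be approximated by unfolding the tree to a finite depth and summing only the leaves found so far. The sketch below works on plain probability trees rather than actual chain trees (the labelling with terms and the conditions of Definition 27 are not modelled), and the two example trees `coin` and `loop` are hypothetical.

```python
from fractions import Fraction

def leaf_mass(node, depth):
    """Sum of the probabilities of all leaves within `depth` levels of a lazily unfolded tree.
    node = (probability, children_function); a node whose function returns [] is a leaf."""
    p, children = node
    kids = children()
    if not kids:
        return p
    if depth == 0:
        return Fraction(0)      # inner node that is not unfolded further: no leaf mass yet
    return sum(leaf_mass(k, depth - 1) for k in kids)

def leaf(p):
    return (p, lambda: [])

def coin(p):
    # inner node: with probability 1/2 stop in a leaf, otherwise flip again
    return (p, lambda: [leaf(p / 2), coin(p / 2)])

def loop(p):
    # inner node with a single successor of probability 1: a tree without any leaf
    return (p, lambda: [loop(p)])

print(leaf_mass(coin(Fraction(1)), 10))   # grows towards 1 as the depth increases
print(leaf_mass(loop(Fraction(1)), 50))   # stays 0 at every depth
```

The first tree is infinite but its leaf probability converges to 1, whereas the second tree has no leaf at all and therefore leaf probability 0.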

With this new type of DTs and chain trees, we now obtain an analogous chain criterion to the non-probabilistic setting.

**Theorem 29 (Chain Criterion).** *A PTRS* $\mathcal{R}$ *is iAST if* $(\mathcal{DT}(\mathcal{R}), \mathcal{R})$ *is iAST.*

In contrast to the non-probabilistic case, our chain criterion as presented in the paper is *sound* but not *complete* (i.e., we do not have "iff" in Theorem 29). However, we also developed a refinement where our chain criterion is made complete by also storing the positions of the defined symbols in dp(r) [27]. In this way, one can avoid "unsuitable" chain trees, as discussed at the end of Example 25.

Our notion of DTs and chain trees is only suitable for *innermost* evaluation. To see this, consider the PTRSs $\mathcal{R}'\_1$ and $\mathcal{R}'\_2$ which both contain $\mathsf{g} \to \{1/2 : \mathcal{O}, 1/2 : \mathsf{h}(\mathsf{g})\}$, but in addition $\mathcal{R}'\_1$ has the rule $\mathsf{h}(x) \to \{1 : \mathsf{f}(x, x)\}$ and $\mathcal{R}'\_2$ has the rule $\mathsf{h}(x) \to \{1 : \mathsf{f}(x, x, x)\}$. Similar to $\mathcal{R}\_1$ and $\mathcal{R}\_2$ in (10), $\mathcal{R}'\_1$ is AST while $\mathcal{R}'\_2$ is not. In contrast, both $\mathcal{R}'\_1$ and $\mathcal{R}'\_2$ are iAST, since the innermost evaluation strategy prevents the application of the $\mathsf{h}$-rule to terms containing $\mathsf{g}$. Our DP framework handles $\mathcal{R}'\_1$ and $\mathcal{R}'\_2$ in the same way, as both have the same DT $\langle \mathsf{G}, \mathsf{g} \rangle \to \{1/2 : \langle \mathsf{c}\_0, \mathcal{O} \rangle, 1/2 : \langle \mathsf{c}\_2(\mathsf{H}(\mathsf{g}), \mathsf{G}), \mathsf{h}(\mathsf{g}) \rangle\}$ and a DT $\langle \mathsf{H}(x), \mathsf{h}(x) \rangle \to \{1 : \langle \mathsf{c}\_0, \mathsf{f}(\ldots) \rangle\}$. Even if we allowed the application of the second DT to terms of the form $\mathsf{H}(\mathsf{g})$, we would still obtain $|\mathfrak{T}|\_{\mathrm{Leaf}} = 1$ for every chain tree $\mathfrak{T}$. So a DP framework to analyze "full" instead of innermost AST would be considerably more involved.

#### **4.2 The Probabilistic DP Framework**

Now we introduce the probabilistic dependency pair framework which keeps the core ideas of the non-probabilistic framework. So instead of applying one ordering for a PTRS directly as in Theorem 17, we want to benefit from modularity. Now a *DP processor* $\mathrm{Proc}$ is of the form $\mathrm{Proc}(\mathcal{P}, \mathcal{S}) = \{(\mathcal{P}\_1, \mathcal{S}\_1), \ldots, (\mathcal{P}\_n, \mathcal{S}\_n)\}$, where $\mathcal{P}, \mathcal{P}\_1, \ldots, \mathcal{P}\_n$ are PPTRSs and $\mathcal{S}, \mathcal{S}\_1, \ldots, \mathcal{S}\_n$ are PTRSs. A processor $\mathrm{Proc}$ is *sound* if $(\mathcal{P}, \mathcal{S})$ is iAST whenever $(\mathcal{P}\_i, \mathcal{S}\_i)$ is iAST for all $1 \le i \le n$. It is *complete* if $(\mathcal{P}\_i, \mathcal{S}\_i)$ is iAST for all $1 \le i \le n$ whenever $(\mathcal{P}, \mathcal{S})$ is iAST. In the following, we adapt the three main processors from Theorems 5, 6, and 7 to the probabilistic setting and present two additional processors.

The (innermost) $(\mathcal{P}, \mathcal{S})$-*dependency graph* indicates which DTs from $\mathcal{P}$ can rewrite to each other using the PTRS $\mathcal{S}$. The possibility of rewriting with $\mathcal{S}$ is not related to the probabilities. Thus, for the dependency graph, we can use the *non-probabilistic variant* $\mathrm{np}(\mathcal{S}) = \{\ell \to r\_j \mid \ell \to \{p\_1 : r\_1, \ldots, p\_k : r\_k\} \in \mathcal{S}, 1 \le j \le k\}$.
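Computing the non-probabilistic variant amounts to forgetting the probabilities and splitting every rule into one ordinary rule per element of its support. A minimal sketch, in which rules are pairs of a left-hand side and a list of `(probability, right-hand side)` entries and terms are plain strings (both are representation choices of this illustration):

```python
from fractions import Fraction

def non_probabilistic(ptrs):
    """np(S): one ordinary rule lhs -> r_j for every right-hand side in the support of a rule."""
    return [(lhs, r) for lhs, outcomes in ptrs for _, r in outcomes]

half = Fraction(1, 2)
# The PTRS R_3 from Example 21, with terms written as plain strings
r3 = [
    ("f(O)", [(Fraction(1), "f(a)")]),
    ("a", [(half, "b1"), (half, "b2")]),
    ("b1", [(Fraction(1), "O")]),
    ("b2", [(Fraction(1), "f(a)")]),
]
print(non_probabilistic(r3))
```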

**Definition 30 (Dep. Graph).** *The node set of the* $(\mathcal{P}, \mathcal{S})$-dependency graph *is* $\mathcal{P}$ *and there is an edge from* $\langle \ell\_1^\#, \ell\_1 \rangle \to \{p\_1 : \langle d\_1, r\_1 \rangle, \ldots, p\_k : \langle d\_k, r\_k \rangle\}$ *to* $\langle \ell\_2^\#, \ell\_2 \rangle \to \ldots$ *if there are substitutions* $\sigma\_1, \sigma\_2$ *and* $t^\# \in \mathrm{cont}(d\_j)$ *for some* $1 \le j \le k$ *such that* $t^\#\sigma\_1 \stackrel{\mathsf{i}}{\to}{}^{\ast}\_{\mathrm{np}(\mathcal{S})} \ell\_2^\#\sigma\_2$ *and both* $\ell\_1^\#\sigma\_1$ *and* $\ell\_2^\#\sigma\_2$ *are in* $\mathrm{NF}\_{\mathcal{S}}$*.*

For $\mathcal{R}\_{\mathsf{pdiv}}$ from Example 23, the $(\mathcal{DT}(\mathcal{R}\_{\mathsf{pdiv}}), \mathcal{R}\_{\mathsf{pdiv}})$-dependency graph is on the side. In the non-probabilistic DP framework, every step with $\stackrel{\mathsf{i}}{\to}\_{\mathcal{D},\mathcal{R}}$ corresponds to an edge in the $(\mathcal{D}, \mathcal{R})$-dependency graph. Similarly, in the probabilistic setting, every path from one node of $P$ to the next node of $P$ in a $(\mathcal{P}, \mathcal{S})$-chain tree corresponds to an edge in the $(\mathcal{P}, \mathcal{S})$-dependency graph. Since every infinite path in a chain tree contains infinitely many nodes from $P$, when tracking the arguments of the compound symbols, every such path traverses a cycle of the dependency graph infinitely often. Thus, it again suffices to consider the SCCs of the dependency graph separately. So for our example, we obtain $\mathrm{Proc}\_{\mathsf{DG}}(\mathcal{DT}(\mathcal{R}\_{\mathsf{pdiv}}), \mathcal{R}\_{\mathsf{pdiv}}) = \{(\{(17)\}, \mathcal{R}\_{\mathsf{pdiv}}), (\{(19)\}, \mathcal{R}\_{\mathsf{pdiv}})\}$. To automate the following two processors, the same over-approximation techniques as for the non-probabilistic dependency graph can be used.

**Theorem 31 (Prob. Dep. Graph Processor).** *For the SCCs* $\mathcal{P}\_1, \ldots, \mathcal{P}\_n$ *of the* $(\mathcal{P}, \mathcal{S})$*-dependency graph,* $\mathrm{Proc}\_{\mathsf{DG}}(\mathcal{P}, \mathcal{S}) = \{(\mathcal{P}\_1, \mathcal{S}), \ldots, (\mathcal{P}\_n, \mathcal{S})\}$ *is sound and complete.*
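Once the dependency graph is known, the processor reduces to a standard computation of the strongly connected components that contain a cycle. The sketch below illustrates this for the running example. The edge set is an assumption derived by hand from the DTs (16)–(19), since the figure with the graph is not reproduced here: the term $\mathsf{M}(x, y)$ can reach the left-hand sides of (16) and (17), $\mathsf{D}(\mathsf{minus}(x, y), \mathsf{s}(y))$ can reach those of (18) and (19), and $\mathsf{D}(\mathsf{s}(x), \mathsf{s}(y))$ can reach that of (19). The SCC computation uses simple mutual reachability rather than a linear-time algorithm.

```python
def reachable(graph, start):
    """Nodes reachable from `start` in one or more steps."""
    seen, stack = set(), [start]
    while stack:
        for succ in graph[stack.pop()]:
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen

def nontrivial_sccs(graph):
    """Strongly connected components that contain at least one cycle."""
    reach = {v: reachable(graph, v) for v in graph}
    sccs = []
    for v in graph:
        if v in reach[v]:                                    # v lies on a cycle
            scc = frozenset(w for w in reach[v] if v in reach[w])
            if scc not in sccs:
                sccs.append(scc)
    return sccs

# Assumed edges of the dependency graph for R_pdiv, nodes named by equation number
graph = {16: [], 17: [16, 17], 18: [], 19: [16, 17, 18, 19]}
print(nontrivial_sccs(graph))
```

With these edges, the only components with a cycle are the singleton sets containing (17) and (19), which matches the two DP problems stated for the example.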

Next, we introduce a new *usable terms processor* (a similar processor was also proposed for the DTs in [37]). Since we regard dependency *tuples* instead of pairs, after applying $\mathrm{Proc}\_{\mathsf{DG}}$, the right-hand sides of DTs $\langle \ell\_1^\#, \ell\_1 \rangle \to \ldots$ might still contain terms $t^\#$ where no instance $t^\#\sigma\_1$ rewrites to an instance $\ell\_2^\#\sigma\_2$ of a left-hand side of a DT (where we only consider instantiations such that $\ell\_1^\#\sigma\_1$ and $\ell\_2^\#\sigma\_2$ are in $\mathrm{NF}\_{\mathcal{S}}$, because only such instantiations are regarded in chain trees). Then $t^\#$ can be removed from the right-hand side of the DT. For example, in the DP problem $(\{(19)\}, \mathcal{R}\_{\mathsf{pdiv}})$, the only DT (19) has the left-hand side $\mathsf{D}(\mathsf{s}(x), \mathsf{s}(y))$. As the term $\mathsf{M}(x, y)$ in (19)'s right-hand side cannot "reach" $\mathsf{D}(\ldots)$, the following processor removes it, i.e., $\mathrm{Proc}\_{\mathsf{UT}}(\{(19)\}, \mathcal{R}\_{\mathsf{pdiv}}) = \{(\{(21)\}, \mathcal{R}\_{\mathsf{pdiv}})\}$, where (21) is

$$\begin{aligned} \langle \mathsf{D}(\mathsf{s}(x),\mathsf{s}(y)),\mathsf{div}(\mathsf{s}(x),\mathsf{s}(y)) \rangle \to \{ & 1/2 : \langle \mathsf{c}\_{1}(\mathsf{D}(\mathsf{s}(x),\mathsf{s}(y))),\mathsf{div}(\mathsf{s}(x),\mathsf{s}(y)) \rangle, \\ & 1/2 : \langle \mathsf{c}\_{1}(\mathsf{D}(\mathsf{minus}(x,y),\mathsf{s}(y))),\mathsf{s}(\mathsf{div}(\mathsf{minus}(x,y),\mathsf{s}(y))) \rangle \}. \end{aligned} \tag{21}$$

So both Theorems 31 and 32 are needed to fully simulate the dependency graph processor in the probabilistic setting, i.e., they are both necessary to guarantee that the probabilistic DP processors work analogously to the nonprobabilistic ones (which in turn ensures that the probabilistic DP framework is similar in power to its non-probabilistic counterpart). This is also confirmed by our experiments in Sect. 5 which show that disabling the processor of Theorem 32 affects the power of our approach. For example, without Theorem 32, the proof that Rpdiv is iAST in the probabilistic DP framework would require a more complicated polynomial interpretation. In contrast, when using both processors of Theorems 31 and 32, then one can prove iAST of Rpdiv with the same polynomial interpretation that was used to prove iTerm of Rdiv (see Example 36).

**Theorem 32 (Usable Terms Processor).** *Let* $\ell\_1^{\#}$ *be a term and* $(\mathcal{P}, \mathcal{S})$ *be a DP problem. We call a term* $t^{\#}$ usable *w.r.t.* $\ell\_1^{\#}$ *and* $(\mathcal{P}, \mathcal{S})$ *if there is a* $\langle \ell\_2^{\#}, \ell\_2 \rangle \to \ldots \in \mathcal{P}$ *and substitutions* $\sigma\_1, \sigma\_2$ *such that* $t^{\#}\sigma\_1 \xrightarrow{\mathsf{i}}{}^{\ast}\_{\mathrm{np}(\mathcal{S})} \ell\_2^{\#}\sigma\_2$ *and both* $\ell\_1^{\#}\sigma\_1$ *and* $\ell\_2^{\#}\sigma\_2$ *are in* $\mathrm{NF}\_{\mathcal{S}}$*. If* $d = \mathsf{c}\_n(t\_1^{\#}, \ldots, t\_n^{\#})$*, then* $\mathcal{UT}(d)\_{\ell\_1^{\#}, \mathcal{P}, \mathcal{S}}$ *denotes the term* $\mathsf{c}\_m(t\_{i\_1}^{\#}, \ldots, t\_{i\_m}^{\#})$*, where* $1 \le i\_1 < \ldots < i\_m \le n$ *are the indices of all terms* $t\_i^{\#}$ *that are usable w.r.t.* $\ell\_1^{\#}$ *and* $(\mathcal{P}, \mathcal{S})$*. The transformation that removes all non-usable terms in the right-hand sides of dependency tuples is denoted by:*

$$\begin{aligned} \mathcal{T}\_{\mathrm{UT}}(\mathcal{P}, \mathcal{S}) = \bigl\{ & \langle \ell^{\#}, \ell \rangle \to \{ p\_1 : \langle \mathcal{UT}(d\_1)\_{\ell^{\#}, \mathcal{P}, \mathcal{S}}, r\_1 \rangle, \dots, p\_k : \langle \mathcal{UT}(d\_k)\_{\ell^{\#}, \mathcal{P}, \mathcal{S}}, r\_k \rangle \} \\ & \bigm| \langle \ell^{\#}, \ell \rangle \to \{ p\_1 : \langle d\_1, r\_1 \rangle, \dots, p\_k : \langle d\_k, r\_k \rangle \} \in \mathcal{P} \bigr\} \end{aligned}$$

*Then* $\mathrm{Proc}\_{\mathrm{UT}}(\mathcal{P}, \mathcal{S}) = \{(\mathcal{T}\_{\mathrm{UT}}(\mathcal{P}, \mathcal{S}), \mathcal{S})\}$ *is sound and complete.*

To adapt the *usable rules processor*, we adjust the definition of usable rules such that it regards every term in the support of the distribution on the righthand side of a rule. The usable rules processor only deletes non-usable rules from S, but not from proj2(P). This is sufficient, because according to Definition 24, rules from proj2(P) can only be applied if they also occur in S.

**Theorem 33 (Probabilistic Usable Rules Processor).** *Let* $(\mathcal{P}, \mathcal{S})$ *be a DP problem. For every* $f \in \Sigma \uplus \Sigma^{\#}$ *let* $\mathrm{Rules}\_{\mathcal{S}}(f) = \{\ell \to \mu \in \mathcal{S} \mid \mathrm{root}(\ell) = f\}$*. For any term* $t \in \mathcal{T}(\Sigma \uplus \Sigma^{\#}, \mathcal{V})$*, its* usable rules $\mathcal{U}\_{\mathcal{S}}(t)$ *are the smallest set such that* $\mathcal{U}\_{\mathcal{S}}(x) = \emptyset$ *for all* $x \in \mathcal{V}$ *and* $\mathcal{U}\_{\mathcal{S}}(f(t\_1, \ldots, t\_n)) = \mathrm{Rules}\_{\mathcal{S}}(f) \cup \bigcup\_{i=1}^{n} \mathcal{U}\_{\mathcal{S}}(t\_i) \cup \bigcup\_{\ell \to \mu \in \mathrm{Rules}\_{\mathcal{S}}(f),\, r \in \mathrm{Supp}(\mu)} \mathcal{U}\_{\mathcal{S}}(r)$*. The* usable rules *for* $(\mathcal{P}, \mathcal{S})$ *are* $\mathcal{U}(\mathcal{P}, \mathcal{S}) = \bigcup\_{\ell^{\#} \to \mu \in \mathrm{proj}\_1(\mathcal{P}),\, d \in \mathrm{Supp}(\mu)} \mathcal{U}\_{\mathcal{S}}(d)$*. Then* $\mathrm{Proc}\_{\mathrm{UR}}(\mathcal{P}, \mathcal{S}) = \{(\mathcal{P}, \mathcal{U}(\mathcal{P}, \mathcal{S}))\}$ *is sound.*

*Example 34.* For the DP problem ({(21)}, <sup>R</sup>pdiv) only the minus-rules are usable and thus ProcUR({(21)}, <sup>R</sup>pdiv) = {({(21)}, {(12),(13)})}. For ({(17)}, <sup>R</sup>pdiv) there are no usable rules at all, hence ProcUR({(17)}, <sup>R</sup>pdiv) = {({(17)}, <sup>∅</sup>)}.
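The fixed-point definition of usable rules in Theorem 33 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the term encoding, the rule names, and the concrete rule set `RULES` (an assumed stand-in for the rules of Rpdiv, which are defined earlier in the paper and not repeated in this section) are hypothetical and are not taken from AProVE.

```python
from fractions import Fraction

# Encoding (an assumption of this sketch): a variable is a str,
# a function application is a tuple (symbol, arg_1, ..., arg_n).
# A probabilistic rule is (lhs, [(probability, rhs), ...]).
HALF = Fraction(1, 2)
RULES = {  # assumed stand-in for the rules of R_pdiv
    "minus_base": (("minus", "x", ("O",)), [(1, "x")]),
    "minus_step": (("minus", ("s", "x"), ("s", "y")), [(1, ("minus", "x", "y"))]),
    "div_base": (("div", ("O",), ("s", "y")), [(1, ("O",))]),
    "div_step": (("div", ("s", "x"), ("s", "y")),
                 [(HALF, ("div", ("s", "x"), ("s", "y"))),
                  (HALF, ("s", ("div", ("minus", "x", "y"), ("s", "y"))))]),
}

def usable_rules(term, rules, found=None):
    """Collect the names of all rules usable for `term` (least fixed point)."""
    found = set() if found is None else found
    if isinstance(term, str):          # variables have no usable rules
        return found
    symbol, *args = term
    for name, (lhs, dist) in rules.items():
        if lhs[0] == symbol and name not in found:
            found.add(name)
            for _prob, rhs in dist:    # every term in the support counts
                usable_rules(rhs, rules, found)
    for arg in args:
        usable_rules(arg, rules, found)
    return found

def usable_rules_of_problem(first_components, rules):
    """Union of the usable rules of all first components d of the DTs."""
    found = set()
    for d in first_components:
        usable_rules(d, rules, found)
    return found

# First components of the two right-hand side entries of DT (21).
DT21_FIRST = [
    ("c1", ("D", ("s", "x"), ("s", "y"))),
    ("c1", ("D", ("minus", "x", "y"), ("s", "y"))),
]
print(sorted(usable_rules_of_problem(DT21_FIRST, RULES)))
```

Under these assumptions only the two minus-rules are collected for (21), which matches the statement of Example 34.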

For the *reduction pair processor*, we again restrict ourselves to multilinear polynomials and use analogous constraints as in our new criterion for the direct application of polynomial interpretations to PTRSs (Theorem 17), but adapted to DP problems $(\mathcal{P}, \mathcal{S})$. Moreover, as in the original reduction pair processor of Theorem 7, the polynomials only have to be weakly monotonic. For every rule in $\mathcal{S}$ or $\mathrm{proj}\_1(\mathcal{P})$, we require that the expected value is weakly decreasing. The reduction pair processor then removes those DTs $\langle \ell^{\#}, \ell \rangle \to \{p\_1 : \langle d\_1, r\_1 \rangle, \ldots, p\_k : \langle d\_k, r\_k \rangle\}$ from $\mathcal{P}$ where in addition there is at least one term $d\_j$ that is strictly decreasing. Recall that we can also rewrite with the original rule $\ell \to \{p\_1 : r\_1, \ldots, p\_k : r\_k\}$ from $\mathrm{proj}\_2(\mathcal{P})$, provided that it is also contained in $\mathcal{S}$. Therefore, to remove the dependency tuple, we also have to require that the rule $\ell \to r\_j$ is weakly decreasing. Finally, we have to use $\mathsf{c}$*-additive* interpretations (with $(\mathsf{c}\_n)\_{\mathrm{Pol}}(x\_1, \ldots, x\_n) = x\_1 + \ldots + x\_n$) to handle compound symbols and their normalization correctly.

**Theorem 35 (Probabilistic Reduction Pair Processor).** *Let* $\mathrm{Pol} : \mathcal{T}(\Sigma \uplus \Sigma^{\#}, \mathcal{V}) \to \mathbb{N}[\mathcal{V}]$ *be a weakly monotonic, multilinear, and* $\mathsf{c}$*-additive polynomial interpretation. Let* $\mathcal{P} = \mathcal{P}\_{\geq} \uplus \mathcal{P}\_{>}$ *with* $\mathcal{P}\_{>} \neq \emptyset$ *such that:*


*Then* ProcRP(P, <sup>S</sup>) = {(P≥, <sup>S</sup>)} *is sound and complete.*

*Example 36.* The constraints of the reduction pair processor for the two DP problems from Example 34 are satisfied by the c-additive polynomial interpretation which again maps <sup>O</sup> to 0, <sup>s</sup>(x) to <sup>x</sup> + 1, and all other non-constant function symbols to the projection on their first arguments. As in the nonprobabilistic case, this results in DP problems of the form (∅,...) and subsequently, ProcDG(∅,...) yields ∅. By the soundness of all processors, this proves that Rpdiv is iAST.

So with the new probabilistic DP framework, the proof that Rpdiv is iAST is analogous to the proof that Rdiv is iTerm in the original DP framework (the proofs even use the same polynomial interpretation in the respective reduction pair processors). This indicates that our novel framework for PTRSs has the same essential concepts and advantages as the original DP framework for TRSs. This is different from our previous adaption of dependency pairs for complexity analysis of TRSs, which also relies on dependency tuples [37]. There, the power is considerably restricted, because one does not have full modularity as one cannot decompose the proof according to the SCCs of the dependency graph.

In proofs with the probabilistic DP framework, one may obtain DP problems $(\mathcal{P}, \mathcal{S})$ that have a non-probabilistic structure (i.e., every DT in $\mathcal{P}$ has the form $\langle \ell^{\#}, \ell \rangle \to \{1 : \langle d, r \rangle\}$ and every rule in $\mathcal{S}$ has the form $\ell' \to \{1 : r'\}$). We now introduce a processor that allows us to switch to the original non-probabilistic DP framework for such (sub-)problems. This is advantageous, because due to the use of dependency *tuples* instead of pairs in $\mathcal{P}$, in general the constraints of the probabilistic reduction pair processor of Theorem 35 are harder to satisfy than the ones of the reduction pair processor of Theorem 7. Moreover, Theorem 7 is not restricted to multilinear polynomial interpretations and the original DP framework has many additional processors that have not yet been adapted to the probabilistic setting.

**Theorem 37 (Probability Removal Processor).** *Let* $(\mathcal{P}, \mathcal{S})$ *be a probabilistic DP problem where every DT in* $\mathcal{P}$ *has the form* $\langle \ell^{\#}, \ell \rangle \to \{1 : \langle d, r \rangle\}$ *and every rule in* $\mathcal{S}$ *has the form* $\ell' \to \{1 : r'\}$*. Let* $\mathrm{np}(\mathcal{P}) = \{\ell^{\#} \to t^{\#} \mid \ell^{\#} \to \{1 : d\} \in \mathrm{proj}\_1(\mathcal{P}),\, t^{\#} \in \mathrm{cont}(d)\}$*. Then* $(\mathcal{P}, \mathcal{S})$ *is iAST iff the non-probabilistic DP problem* $(\mathrm{np}(\mathcal{P}), \mathrm{np}(\mathcal{S}))$ *is iTerm. So if* $(\mathrm{np}(\mathcal{P}), \mathrm{np}(\mathcal{S}))$ *is iTerm, then the processor* $\mathrm{Proc}\_{\mathrm{PR}}(\mathcal{P}, \mathcal{S}) = \emptyset$ *is sound and complete.*

### **5 Conclusion and Evaluation**

Starting with a new "direct" technique to prove almost-sure termination of probabilistic TRSs (Theorem 17), we presented the first adaption of the dependency pair framework to the probabilistic setting in order to prove innermost AST automatically. This is not at all obvious, since most straightforward ideas for such an adaption are unsound (as discussed in Sect. 4.1). So the challenge was to find a suitable definition of dependency pairs (resp. tuples) and chains (resp. chain trees) such that one can define DP processors which are sound and work analogously to the non-probabilistic setting (in order to obtain a framework which is similar in power to the non-probabilistic one). While the soundness proofs for our new processors are much more involved than in the non-probabilistic case, the new processors themselves are quite analogous to their non-probabilistic counterparts and thus, adapting an existing implementation of the non-probabilistic DP framework to the probabilistic one does not require much effort.

We implemented our contributions in our termination prover AProVE, which yields the first tool to prove almost-sure innermost termination of PTRSs on arbitrary data structures (including PTRSs that are not PAST). In our experiments, we compared the direct application of polynomials for proving AST (via our new Theorem 17) with the probabilistic DP framework. We evaluated AProVE on a collection of 67 PTRSs which includes many typical probabilistic algorithms. For example, it contains the following PTRS Rqs for probabilistic quicksort.

$$\begin{aligned} \mathsf{rotate}(\mathsf{cons}(x,xs)) &\to \{ \tfrac{1}{2} : \mathsf{cons}(x,xs), \; \tfrac{1}{2} : \mathsf{rotate}(\mathsf{app}(xs, \mathsf{cons}(x, \mathsf{nil}))) \} \\ \mathsf{qs}(\mathsf{nil}) &\to \{ 1 : \mathsf{nil} \} \\ \mathsf{qs}(\mathsf{cons}(x, xs)) &\to \{ 1 : \mathsf{qsHelp}(\mathsf{rotate}(\mathsf{cons}(x, xs))) \} \\ \mathsf{qsHelp}(\mathsf{cons}(x, xs)) &\to \{ 1 : \mathsf{app}(\mathsf{qs}(\mathsf{low}(x, xs)), \mathsf{cons}(x, \mathsf{qs}(\mathsf{high}(x, xs)))) \} \end{aligned}$$

The rotate-rules rotate a list randomly often (they are AST, but not terminating). Thus, by choosing the first element of the resulting list, one obtains a random pivot element for the recursive call of quicksort. In addition to the rules above, Rqs contains rules for list concatenation (app), and rules such that low(x, *xs*) (resp. high(x, *xs*)) returns all elements of the list *xs* that are smaller (resp. greater or equal) than x, see [28]. Using the probabilistic DP framework, AProVE can prove iAST of Rqs and many other typical programs.
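The behaviour of the rotate-rule and the resulting random pivot choice can be simulated directly. The following Python sketch assumes that both branches of the rotate-rule have probability 1/2; the function names and the list encoding are illustrative and do not correspond to AProVE's input format. The loop in `rotate` has no fixed bound on its number of iterations, but it stops with probability 1, which mirrors "AST, but not terminating".

```python
import random

def rotate(xs, rng):
    """Rotate a non-empty list a random number of times.

    Each round stops with probability 1/2 (assumed) and otherwise moves
    the head of the list to its end. Returns the list and the step count.
    """
    xs = list(xs)
    steps = 0
    while xs and rng.random() >= 0.5:
        xs.append(xs.pop(0))
        steps += 1
    return xs, steps

def quicksort(xs, rng):
    """Quicksort whose pivot is the head of a randomly rotated list."""
    if not xs:
        return []
    rotated, _ = rotate(xs, rng)
    pivot, rest = rotated[0], rotated[1:]
    low = [y for y in rest if y < pivot]      # strictly smaller elements
    high = [y for y in rest if y >= pivot]    # greater or equal elements
    return quicksort(low, rng) + [pivot] + quicksort(high, rng)

rng = random.Random(0)
print(quicksort([3, 1, 2, 3, 0], rng))
```

The sorted result does not depend on the random choices; only the sequence of pivots and the number of rotation steps do.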

61 of the 67 examples in our collection are iAST and AProVE can prove iAST for 53 (87%) of them. Here, the DP framework proves iAST for 51 examples and the direct application of polynomial interpretations via Theorem 17 succeeds for 27 examples. (In contrast, proving PAST via the direct application of polynomial interpretations as in [3] only works for 22 examples.) The average runtime of AProVE per example was 2.88 s (where no example took longer than 8 s). So our experiments indicate that the power of the DP framework can now also be used for probabilistic TRSs.

We also performed experiments where we disabled individual processors of the probabilistic DP framework. More precisely, we disabled either the usable terms processor (Theorem 32), both the dependency graph and the usable terms processor (Theorems 31 and 32), or all processors except the reduction pair processor of Theorem 35. Our experiments show that disabling processors indeed affects the power of the approach, in particular for larger examples with several defined symbols (e.g., then AProVE cannot prove iAST of Rqs anymore). So all of our processors are needed to obtain a powerful technique for termination analysis of PTRSs.

Due to the use of dependency *tuples* instead of pairs, the probabilistic DP framework does not (yet) subsume the direct application of polynomials completely (two examples in our collection can only be proved by the latter, see [28]). Therefore, currently AProVE uses the direct approach of Theorem 17 in addition to the probabilistic DP framework. In future work, we will adapt further processors of the original DP framework to the probabilistic setting, which will also allow us to integrate the direct approach of Theorem 17 into the probabilistic DP framework in a modular way. Moreover, we will develop processors to prove AST of full (instead of innermost) rewriting. Further work may also include processors to disprove (i)AST and possible extensions to analyze PAST and expected runtimes as well. Finally, one could also modify the formalism of PTRSs in order to allow non-constant probabilities which depend on the sizes of terms.

For details on our experiments and for instructions on how to run our implementation in AProVE via its *web interface* or locally, we refer to https://aprovedevelopers.github.io/ProbabilisticTermRewriting/.

**Acknowledgements.** We are grateful to Marcel Hark, Dominik Meier, and Florian Frohn for help and advice.

#### **References**


LNCS (LNAI), vol. 12167, pp. 436–447. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-51054-1\_28



# **Verification of NP-Hardness Reduction Functions for Exact Lattice Problems**

Katharina Kreuzer(B) and Tobias Nipkow

Technical University of Munich, Boltzmannstr. 3, 85748 Garching, Germany k.kreuzer@tum.de

**Abstract.** This paper describes the formal verification of NP-hardness reduction functions of two key problems relevant in algebraic lattice theory: the closest vector problem and the shortest vector problem, both in the infinity norm. The formalization uncovered a number of problems with the existing proofs in the literature. The paper describes how these problems were corrected in the formalization. The work was carried out in the proof assistant Isabelle.

**Keywords:** verification · NP-hardness · lattice problems · integer programming

### **1 Introduction**

In recent years, algebraic lattices have received increasing attention for their use in post-quantum cryptography. Algebraic lattices are additive, discrete subgroups of $\mathbb{R}^n$, i.e. sets of points in $\mathbb{R}^n$ with additional structure. One can also define lattices over finite fields, rings or modules as used in many modern post-quantum cryptosystems such as the CRYSTALS suites, NTRU and Saber.

Two problems form the very basis for computationally hard problems on lattices, namely the closest vector problem (CVP) and the shortest vector problem (SVP). Given a finite set of basis vectors in $\mathbb{R}^n$, the set of all linear combinations with integer coefficients forms a lattice. In optimization form, the SVP asks for the shortest nonzero vector in the lattice and the CVP asks for the lattice vector closest to some given target vector, both with respect to some given norm.

When working over the reals, the $p$-norm (for $p \ge 1$) is defined as $\|x\|\_p = \sqrt[p]{\sum\_i |x\_i|^p}$. The most common examples are the Euclidean norm $\|x\|\_2$ and the infinity norm $\|x\|\_\infty = \max\_i \{|x\_i|\}$, which is the limit for $p \to \infty$.
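As a small numerical illustration of these definitions, the following Python sketch computes the p-norm and the infinity norm and shows the p-norm approaching the infinity norm for growing p. The function names are chosen for this example only.

```python
def p_norm(x, p):
    """p-norm of a vector x for a finite p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def inf_norm(x):
    """Infinity norm: the largest absolute entry of x."""
    return max(abs(xi) for xi in x)

x = [3, -4]
for p in (1, 2, 8, 64):
    print(p, p_norm(x, p))
print("inf", inf_norm(x))
```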

We have formalized, corrected and verified a number of NP-hardness proofs from the literature, uncovering a number of mistakes along the way. The first NP-hardness proof of the CVP and SVP in infinity norm is due to van Emde-Boas [7]. For other norms (especially for the Euclidean norm), there is only a randomized reduction for the NP-hardness of the SVP so far [2]. For the CVP, NP-hardness has been shown in any $p$-norm for $p \ge 1$. One exemplary proof can be found in the book by Micciancio and Goldwasser [15, Chapter 3, Thm 3.1].

This work was supported by the Research Training Group GRK 2428 CONVEY of the German Research Council (DFG).

© The Author(s) 2023

B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 365–381, 2023. https://doi.org/10.1007/978-3-031-38499-8\_21

The CVP and SVP were the starting point for lattice-based post-quantum cryptography [16]. Moreover, the relevance of these problems can also be seen from the rich literature on approximation results. For example, the LLL algorithm by Lenstra, Lenstra and Lovász [12] gives a polynomial-time algorithm for lattice basis reduction which solves integer linear programs in fixed dimensions. Using this reduced basis, one can find good approximations to the CVP using Babai's algorithm [3] for certain approximation factors. Still, for arbitrary dimensions, the problem remains NP-hard. Further approximation results for the CVP, SVP and integer programming can be found elsewhere [6,9,10,14,19]. These approximation problems are used in cryptography. However, we will focus on the exact CVP and SVP in this paper.

A number of more basic NP-hardness proofs have been formalized in several theorem provers so far. For example, there are formalizations of the Cook-Levin Theorem in Coq [8] and Isabelle [4]. Formalizing Karp's 21 NP-hard problems (including the Subset Sum and Partition Problems assumed to be NP-hard in this paper) in Isabelle is an ongoing project.

#### **1.1 Contributions**

In this paper we present NP-hardness proofs of the CVP and SVP in infinity norm that have been verified in a proof assistant. We roughly follow the book by Micciancio and Goldwasser [15, Chapter 3, Thm 3.1] and the report by van Emde-Boas [7]. However, many problems with the original proofs were encountered during the formalization efforts. We will look at different approaches and their advantages or problems.

We also verified the proof of NP-hardness of the CVP for any finite <sup>p</sup> <sup>≥</sup> <sup>1</sup> from the book by Micciancio and Goldwasser. This verification did not uncover any problems with the informal proof. Thus we do not discuss it in detail.

These formalizations were carried out with the help of the proof assistant Isabelle [17,18] and are available online [11]. They comprise 5200 lines. To the authors' knowledge, they are the first formalizations of hardness proofs for lattice problems. Because of the importance of the SVP and CVP and the problems in existing proofs, we consider our proofs a contribution to the foundations of verified cryptography. However, we do not claim that these hardness results directly imply quantum-resistance of any lattice-based cryptosystems.

#### **1.2 Overview**

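The paper is structured as follows. Section 2 introduces the foundations. The rest of the paper is dedicated to the proofs, which are phrased as the following two polynomial-time reduction chains:

$$\text{Subset Sum} \le\_p \text{CVP} \qquad\qquad \text{Partition} \le\_p \text{BHLE} \le\_p \text{SVP}$$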


Subset Sum and Partition are famous fundamental problems whose NP-hardness has been proved many times in the literature and which we take for granted.

Section 3 presents the reduction of Subset Sum to the CVP. Differences between our formalization and the book by Micciancio and Goldwasser [15] are presented with examples that demonstrate problems with the original proof. Moreover, an example is given why the generalization to the SVP given in [15] does not work.

Therefore we turn to the early proof of NP-hardness of the SVP by van Emde Boas [7]. This proof uses the Bounded Homogeneous Linear Equations problem (BHLE) which is introduced in Sect. 4. The formalization of this proof is one of the major achievements in this paper. It posed a significant challenge since it often relied on human intuition and had to be restructured appropriately to allow a formal proof. The main proof steps are explained and difficulties in the formalization effort are described. This proof only works in infinity norm and we explain why. In Sect. 5, the reduction from BHLE to the SVP is given. Again, this proof was quite elaborate to formalize as there were inaccuracies and a lot of intuition was involved. Differences between the formal proof and [7] are explained by examples.

In Sect. 6, we have a quick look at the reduction proof for the CVP in p-norm (for finite <sup>p</sup> <sup>≥</sup> 1). In the case of the SVP there only exists a randomized hardness proof in Euclidean norm by Ajtai [1] up to now.

Finally, the time complexity of the reduction functions is considered in Sect. 7. We conclude the paper with a short summary and outlook.

### **2 Foundations**

This section introduces known foundations mainly to fix the terminology and notation: problem reductions, lattices, and the combinatorial problems under consideration (CVP, SVP, Partition and Subset Sum).

#### **2.1 Problem Reductions**

Formally, a *decision problem* is given by a set $\Gamma$ of problem *instances* and a set $P \subseteq \Gamma$ of *YES-instances*. We often identify the decision problem with its set of YES-instances when the instance set $\Gamma$ is obvious and not explicitly defined. In this paper we will often phrase problems informally (e.g. "decide if $p$ is prime") rather than give them explicitly as sets. For example, the decision problem "decide if a natural number $p$ is prime" will be formalized in the following way: the set of problem instances is $\Gamma = \mathbb{N}$ (in Isabelle these are all elements of type nat); and the YES-instances are $P = \{p \in \mathbb{N} \mid p \text{ is prime}\}$ (in Isabelle this is a set of type nat set).

**Definition 1 (Problem reduction).** *Let* $A \subseteq \Gamma$ *and* $B \subseteq \Delta$ *be two problems. A function* $f : \Gamma \to \Delta$ *is a reduction from* $A$ *to* $B$ *if it fulfills the following properties:*

- $f$ *can be computed in polynomial time, and*
- *for all* $a \in \Gamma$*:* $a \in A \Leftrightarrow f(a) \in B$*.*


If A is NP-hard, a reduction to B proves NP-hardness of B.

In this paper we present reduction functions informally (e.g. "an $a$ is reduced to a $b$ that is constructed like this") and often with copious amounts of "..." to construct vectors etc. Of course, in the formalization these reduction functions are spelled out in complete detail. Since all operations used in the reduction functions in this paper are elementary, the polynomial time property has not been formalized but is briefly discussed in Sect. 7. The focus of our paper is on the proofs of $a \in A \Leftrightarrow f(a) \in B$.

#### **2.2 Lattice-Based Computational Problems**

To have a better understanding, we will first introduce lattices as such. Lattices are a structured set of points. They form an additive, discrete subgroup of R*<sup>n</sup>*. Formally, we define the following.

**Definition 2 (Lattice).** *Let* $A = \{a\_1, \ldots, a\_n\} \subset \mathbb{R}^n$ *be a set of linearly independent vectors. Then the integer span of* $A$ *forms a lattice* $\mathcal{L}$*, that is:*

$$\mathcal{L} = \left\{ \sum\_{i=1}^{n} c\_i a\_i \;\middle|\; c\_i \in \mathbb{Z} \right\}$$

**Fig. 1.** Two exemplary lattices in $\mathbb{R}^2$

*Example 1.* In Fig. 1 two examples of lattices in $\mathbb{R}^2$ are depicted. The red point is the origin. The two blue arrows show the basis vectors $a\_1$ and $a\_2$ that are linearly independent and span the lattice. Every integer combination of the two blue arrows is a black point, an element of the lattice.

We can see that the grid spanned by the basis vectors is discrete and has some recurring structures. These structures are determined by the basis vectors: the angle between them and their length. In Fig. 1a, the angle between the two basis vectors is 90°, yielding a rectangular fundamental domain. In Fig. 1b, in contrast, we have an angle of 60° between the basis vectors and equal length. This produces a fundamental domain of an equilateral triangle.
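The integer span from Definition 2 can be enumerated for a bounded range of coefficients. The Python sketch below does this for a hypothetical rectangular basis; the basis vectors and the coefficient bound are assumptions made for illustration and are not read off Fig. 1.

```python
from itertools import product

def lattice_points(basis, bound):
    """All integer combinations sum_i c_i * a_i with |c_i| <= bound.

    `basis` is a list of integer vectors of equal length. Only a finite
    window of the (infinite) lattice is returned.
    """
    dim = len(basis[0])
    points = set()
    for coeffs in product(range(-bound, bound + 1), repeat=len(basis)):
        point = tuple(sum(c * a[j] for c, a in zip(coeffs, basis))
                      for j in range(dim))
        points.add(point)
    return points

# Hypothetical basis with a right angle between the two vectors.
window = lattice_points([(2, 0), (0, 1)], bound=1)
print(sorted(window))
```

Points such as (1, 0) never appear in the output, because they are not integer combinations of the chosen basis vectors.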

Indeed, the automorphism group of a lattice is a symmetry group, see Conway [5, Chapter 3.4]. For example, in Fig. 1a the symmetry group is **pmm** and in Fig. 1b it is **p3m1** [13].

In the rest of the text and in the formalization we restrict to finite bases over Z (instead of R), simply for computability reasons. Of course bases over Q can be transformed into bases over Z by scaling all basis vectors.

The starting point of most known hard problems on lattices are the shortest vector problem and the closest vector problem. They are defined below (as usual in decision and not in optimization form). The lattice $\mathcal{L} \subseteq \mathbb{Z}^n$ is assumed to be generated by a finite basis in $\mathbb{Z}^n$.

**Definition 3 (Closest Vector Problem (CVP)).** *Given a lattice* $\mathcal{L}$*, a vector* $b \in \mathbb{Z}^n$ *and an estimate* $k$*, decide whether there exists a vector* $v \in \mathcal{L}$ *such that*

$$\|v - b\| \le k$$

**Definition 4 (Shortest Vector Problem (SVP)).** *Given a lattice* $\mathcal{L}$ *and an estimate* $k$*, decide whether there exists a vector* $v \in \mathcal{L}$ *such that*

$$\|v\| \le k \text{ and } v \ne 0$$
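To make the two decision problems concrete, the following Python sketch checks them by brute force in the infinity norm. It only searches coefficient vectors with entries in a bounded range, so it is an illustration under that assumption and not a complete decision procedure; the function names and the example basis are hypothetical.

```python
from itertools import product

def inf_norm(v):
    return max(abs(c) for c in v)

def lattice_vectors(basis, bound):
    """Yield all integer combinations of `basis` with |c_i| <= bound."""
    dim = len(basis[0])
    for coeffs in product(range(-bound, bound + 1), repeat=len(basis)):
        yield tuple(sum(c * a[j] for c, a in zip(coeffs, basis))
                    for j in range(dim))

def cvp_bounded(basis, target, k, bound):
    """Is some enumerated lattice vector within distance k of `target`?"""
    return any(inf_norm([vi - bi for vi, bi in zip(v, target)]) <= k
               for v in lattice_vectors(basis, bound))

def svp_bounded(basis, k, bound):
    """Is some enumerated nonzero lattice vector of norm at most k?"""
    return any(any(v) and inf_norm(v) <= k
               for v in lattice_vectors(basis, bound))

basis = [(2, 0), (0, 2)]
print(cvp_bounded(basis, (1, 1), 1, bound=2))
print(svp_bounded(basis, 1, bound=2))
```

Note that the SVP check explicitly excludes the zero vector, as required by Definition 4.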

#### **2.3 Partition and Subset Sum Problems**

Recall that we plan to prove NP-hardness of the CVP and SVP in the case of the infinity norm by reducing the well-studied NP-complete Subset Sum and Partition problems to the CVP and SVP. We state the definitions.

**Definition 5 (Partition problem).** *Given a finite list of integers* $a\_1, \ldots, a\_n$*, decide whether there exists a partition of* $\{1, \ldots, n\}$ *into subsets* $I$ *and* $\{1, \ldots, n\} \setminus I$ *such that*

$$\sum\_{i \in I} a\_i = \sum\_{i \in \{1...n\} \backslash I} a\_i$$

The Partition problem can be seen as a special case of the Subset Sum problem.

**Definition 6 (Subset Sum problem).** *Given a finite list of integers* $a\_1, \ldots, a\_n$ *and an integer* $s$*, decide whether there exists a subset* $S$ *of* $\{1, \ldots, n\}$ *such that*

$$\sum\_{i \in S} a\_i = s$$
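Both problems can be decided by exhaustive search on small instances, which also makes the remark that Partition is a special case of Subset Sum explicit. The Python sketch below is exponential in the list length and is meant for illustration only; the function names and the 0-based indexing are choices of this example.

```python
from itertools import combinations

def subset_sum(a, s):
    """Return a set of 0-based indices whose entries sum to s, or None."""
    for size in range(len(a) + 1):
        for idx in combinations(range(len(a)), size):
            if sum(a[i] for i in idx) == s:
                return set(idx)
    return None

def partition(a):
    """Partition as the Subset Sum instance with target sum(a) / 2."""
    total = sum(a)
    if total % 2 != 0:       # an odd total cannot be split evenly
        return None
    return subset_sum(a, total // 2)

print(subset_sum([1, 1, 1], 1))
print(partition([1, 2, 3]))
```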

#### **2.4 Notation**

Throughout the paper we use traditional mathematical notation, in particular the graphical "...". The formal Isabelle notation is by necessity more verbose (and precise). Our formalization employs both lists and vectors as a type for finite sequences and converts between them where necessary. For reasons of presentation we blur this distinction in the paper.

### **3 CVP**

In this section, we formalize the proof of the NP-hardness of the CVP in the infinity norm along the lines of [15, p. 48, Chap. 3.2, Thm 3.1] by reducing Subset Sum to the CVP.

An instance $a\_1, \ldots, a\_n, s$ of Subset Sum is mapped to the following instance of the CVP:

$$\mathcal{L} = \begin{pmatrix} a\_1 & \cdots & a\_n \\ a\_1 & \cdots & a\_n \\ 2 & & 0 \\ & \ddots & \\ 0 & & 2 \end{pmatrix} \cdot \mathbb{Z}^n \qquad b = \begin{pmatrix} s-1 \\ s+1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \qquad k = 1 \tag{1}$$

We proved the following theorem:

**Theorem 1.** *The above mapping is a reduction from the Subset Sum problem to the CVP (in infinity norm).*

This implies that the CVP (in infinity norm) is an NP-hard problem.
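The mapping in Eq. (1) is easy to write down and to test on small instances. The Python sketch below is an informal illustration and not the Isabelle formalization: the function names are hypothetical, and the check only searches coefficient vectors in {0, 1}^n. That restriction is exhaustive for this particular instance, because the rows with the entries 2 require |2x_i − 1| ≤ 1 and hence x_i ∈ {0, 1}.

```python
from itertools import product

def reduce_subset_sum_to_cvp(a, s):
    """Build the generating matrix (as rows), target b and bound k of Eq. (1)."""
    n = len(a)
    rows = [list(a), list(a)]
    rows += [[2 if i == j else 0 for j in range(n)] for i in range(n)]
    target = [s - 1, s + 1] + [1] * n
    return rows, target, 1

def inf_distance(rows, x, target):
    """Infinity-norm distance between the lattice vector rows * x and target."""
    return max(abs(sum(r * c for r, c in zip(row, x)) - t)
               for row, t in zip(rows, target))

def cvp_is_yes(rows, target, k):
    n = len(rows[0])
    return any(inf_distance(rows, x, target) <= k
               for x in product((0, 1), repeat=n))

def subset_sum_is_yes(a, s):
    return any(sum(ai for ai, xi in zip(a, x) if xi) == s
               for x in product((0, 1), repeat=len(a)))

for a, s in [([1, 1, 1], 1), ([2], 1), ([3, 5, 7], 12), ([3, 5, 7], 4)]:
    print(a, s, subset_sum_is_yes(a, s), cvp_is_yes(*reduce_subset_sum_to_cvp(a, s)))
```

On each of these instances the two answers agree, as Theorem 1 states for all instances.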

The reduction function used by Micciancio and Goldwasser [15] actually looks a bit different. The image of $a\_1, \ldots, a\_n, s$ would be

$$B = \begin{pmatrix} a\_1 & \cdots & a\_n \\ 2 & & 0 \\ & \ddots & \\ 0 & & 2 \end{pmatrix} \qquad \mathcal{L} = B \cdot \mathbb{Z}^n \qquad b = \begin{pmatrix} s \\ 1 \\ \vdots \\ 1 \end{pmatrix} \qquad k = 1 \tag{2}$$

However, the proof in [15, p. 49] with this reduction function works only for $p < \infty$. It goes along the lines of the following idea: Take $k = \sqrt[p]{n}$. In the case of $p = \infty$, we get $k = \lim\_{p \to \infty} \sqrt[p]{n} = 1$. Then we can formulate the following equality (equation (3.5) in [15, p. 49]):

$$\|Bx - b\|\_{p}^{p} = \left| \sum\_{i=1}^{n} a\_i x\_i - s \right|^{p} + \sum\_{i=1}^{n} |2x\_i - 1|^p \tag{3}$$

Given a YES-instance $a\_1, \ldots, a\_n, s$ of Subset Sum, there exists a vector $x = (x\_1, \ldots, x\_n) \in \{0, 1\}^n$ such that $\sum\_{i=1}^{n} a\_i x\_i - s = 0$ and $|2x\_i - 1| = 1$ for all $i$. Then $\|Bx - b\|\_p^p = n$, which proves this case.

Conversely, given a YES-instance of the CVP defined by $\mathcal{L}$, $b$ and $k$ that are the image of $a\_1, \ldots, a\_n, s$ under the reduction function as in (2), we get $\|Bx - b\|\_p^p \le n$. Since all values are integers, we have $|2x\_i - 1| \ge 1$. It follows that $\sum\_{i=1}^{n} a\_i x\_i - s = 0$ and $|2x\_i - 1| = 1$. Thus, we can deduce that $a\_1, \ldots, a\_n, s$ was indeed a YES-instance of Subset Sum.

The major problem we encountered was that this proof works fine for $p < \infty$ but for $p = \infty$, the sum in (3) becomes a maximum instead. The equation then reads

$$\|Bx - b\|\_{\infty} = \max\left( \left| \sum\_{i=1}^{n} a\_i x\_i - s \right|, |2x\_i - 1| \text{ for } 1 \le i \le n \right)$$

This invalidates the arguments in the proof since $\sum\_{i=1}^{n} a\_i x\_i - s$ can now take any value in $\{-1, 0, 1\}$. The constraints are too lax to ensure the equality to zero.
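A tiny instance shows the problem concretely. The Subset Sum instance below (a single number 2 with target 1) is chosen here for illustration and is not taken from [15] or from the formalization. It is a NO-instance, yet its image under the mapping of Eq. (2) is a YES-instance of the CVP in infinity norm. The search over {0, 1}^n is exhaustive because |2x_i − 1| ≤ 1 forces x_i ∈ {0, 1}.

```python
from itertools import product

def reduce_as_in_eq2(a, s):
    """Generating matrix (as rows), target and bound of Eq. (2)."""
    n = len(a)
    rows = [list(a)] + [[2 if i == j else 0 for j in range(n)] for i in range(n)]
    return rows, [s] + [1] * n, 1

def inf_distance(rows, x, target):
    return max(abs(sum(r * c for r, c in zip(row, x)) - t)
               for row, t in zip(rows, target))

def cvp_is_yes(rows, target, k):
    n = len(rows[0])
    return any(inf_distance(rows, x, target) <= k
               for x in product((0, 1), repeat=n))

def subset_sum_is_yes(a, s):
    return any(sum(ai for ai, xi in zip(a, x) if xi) == s
               for x in product((0, 1), repeat=len(a)))

a, s = [2], 1                      # illustrative instance (an assumption)
print(subset_sum_is_yes(a, s))     # no subset of [2] sums to 1
print(cvp_is_yes(*reduce_as_in_eq2(a, s)))
```

Here x = 0 gives the lattice vector (0, 0), whose distance to the target (1, 1) is exactly 1 in the infinity norm, so the bound k = 1 is met although no subset sums to 1.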

A solution was to alter the matrix and target vector and add another entry. The matrix and target vector we used are given in Eq. (1). Using both $s - 1$ and $s + 1$ in the target vector forces the linear combination $\sum\_i c\_i a\_i$ to be exactly $s$ in the hardness proof, since $\left| \sum\_i c\_i a\_i - (s \pm 1) \right| \le 1$ must hold for both signs.

In personal communication, Daniele Micciancio, one of the authors of [15], suggested using a constant $c > 1$ and the instance

$$
\mathcal{L} = \begin{pmatrix} c \cdot a\_1 & \cdots & c \cdot a\_n \\ 2 & & 0 \\ & \ddots & \\ 0 & & 2 \end{pmatrix} \cdot \mathbb{Z}^n \qquad b = \begin{pmatrix} c \cdot s \\ 1 \\ \vdots \\ 1 \end{pmatrix} \qquad k = 1
$$

This solves the problem as well and can be implemented using e.g. c = 2. This technique is described later in the book [15, pp. 49–51] when trying to explain the NP-hardness proof for the SVP in the infinity norm.

#### **3.1 Towards the SVP**

The authors of [15] argue that the reduction for the SVP can be obtained by generating an instance of the SVP from the Subset Sum instance $a\_1, \dots, a\_n, s$ in the following way. For $c > 1$, e.g. $c = 2$, take

$$B = \begin{pmatrix} c \cdot a\_1 & \cdots & c \cdot a\_n & c \cdot s \\ 2 & & 0 & 1 \\ & \ddots & & \vdots \\ 0 & & 2 & 1 \end{pmatrix} \qquad \mathcal{L} = B \cdot \mathbb{Z}^{n+1} \qquad k = 1$$

The authors claim that every shortest vector in the image of the reduction function has $-1$ as last coefficient. For example, let a YES-instance of the SVP be defined by the generating matrix $B$ of the lattice and let $x = (x\_1, \dots, x\_n, -1)^T$ be the coefficients such that $Bx$ is a shortest vector. Then we know that

$$\lVert Bx \rVert\_{\infty} = \left\lVert \begin{pmatrix} c \cdot (x\_1 a\_1 + \dots + x\_n a\_n - s) \\ 2x\_1 - 1 \\ \vdots \\ 2x\_n - 1 \end{pmatrix} \right\rVert\_{\infty} \le 1$$

Since $c > 1$ and all entries are integers, it follows that $x\_1 a\_1 + \dots + x\_n a\_n - s = 0$, which yields a solution for the given Subset Sum instance $a\_1, \dots, a\_n, s$.

However, this reduction does not always work as the following example shows:

*Example 2.* Consider the Subset Sum instance $(a\_1, a\_2, a\_3, s) = (1, 1, 1, 1)$. This is a YES-instance, since a solution is given by $x\_1 = 1$, $x\_2 = 0$ and $x\_3 = 0$. The basis matrix of the corresponding SVP would be (with $c > 1$)

$$B = \begin{pmatrix} c & c & c & c \\ 2 & 0 & 0 & 1 \\ 0 & 2 & 0 & 1 \\ 0 & 0 & 2 & 1 \end{pmatrix}$$

Take for example the vector $v = B \cdot (-1, -1, -1, 3)^T = (0, 1, 1, 1)^T$. It has infinity norm 1 and is thus a shortest vector in the lattice generated by $B$. However, its last coefficient is 3 and not $-1$. The corresponding scaled "solution" for Subset Sum would be $(1/3, 1/3, 1/3, -1)$, but since only integer values are allowed in the solution space, this is not a solution in our sense.

We consider another example. Let the Subset Sum instance be $a\_1 = 3$, $s = 1$. We can easily see that this is not a YES-instance, i.e. there exists no solution. Still, the corresponding SVP instance given via the reduction function is generated by the matrix

$$B' = \begin{pmatrix} c \cdot 3 & c \cdot 1 \\ 2 & 1 \end{pmatrix}$$

In this case the coefficients $(-1, 3)^T$ yield a shortest vector in the lattice spanned by $B'$, since

$$\left\lVert B' \begin{pmatrix} -1 \\ 3 \end{pmatrix} \right\rVert\_{\infty} = \left\lVert \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\rVert\_{\infty} \le 1$$

Thus, $B'$ defines a YES-instance of the SVP, but the original Subset Sum instance is not a YES-instance.
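
Both counterexamples can be recomputed mechanically. The snippet below is a small illustrative check in Python (the helper names `matvec` and `inf_norm` are hypothetical), with $c = 2$ chosen as one admissible value of the constant.

```python
def matvec(B, x):
    """Matrix-vector product over the integers."""
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in B]

def inf_norm(v):
    """Infinity norm of an integer vector."""
    return max(abs(vi) for vi in v)

c = 2  # one admissible choice of the constant c > 1

# Example 2: (a1, a2, a3, s) = (1, 1, 1, 1)
B = [[c, c, c, c],
     [2, 0, 0, 1],
     [0, 2, 0, 1],
     [0, 0, 2, 1]]
v = matvec(B, [-1, -1, -1, 3])
print(v, inf_norm(v))          # [0, 1, 1, 1] 1

# Second example: a1 = 3, s = 1, which is a NO-instance of Subset Sum
B_prime = [[3 * c, c],
           [2, 1]]
w = matvec(B_prime, [-1, 3])
print(w, inf_norm(w))          # [0, 1] 1
```

In both cases the first row cancels for every value of $c$, because the coefficient vector annihilates the row $(a\_1, \dots, a\_n, s)$ itself.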

In [15], it is stated for the infinity norm that any shortest vector yields a solution for the Subset Sum Problem, which is not the case in these examples: we cannot ensure that a shortest vector always has −1 as a last coordinate.

Although the proof in [15] does not work out as expected, there is still the reduction proof by van Emde-Boas [7] which reduces a problem called the Bounded Homogeneous Linear Equation problem to the SVP in infinity norm. This will be discussed in the next two sections.

### **4 Bounded Homogeneous Linear Equations**

A technical report by Peter van Emde-Boas [7] gives another reduction proof for the NP-hardness of the SVP in infinity norm. The author first reduces the Partition Problem to a problem called Bounded Homogeneous Linear Equation (BHLE) which is then reduced to the SVP.

#### **Definition 7 (Bounded Homogeneous Linear Equations problem).**

*Given a finite vector of integers* $b \in \mathbb{Z}^n$ *and a positive integer* $k$*, decide whether there exists an* $x \in \mathbb{Z}^n \setminus \{0\}$ *with* $\lVert x \rVert\_\infty \le k$ *such that*

$$\langle b, x \rangle = 0$$

We have verified a reduction from Partition to BHLE, and thus BHLE is NP-hard.

#### **Theorem 2.** *There is a reduction from Partition to BHLE in infinity norm.*

The proof is carefully engineered and rather intricate. Differences to the original proof and problems encountered during the formalization are:


Let us have a look at the proof and its difficulties in the formalization in more detail. We start from a Partition instance $a = a\_1, \dots, a\_n$. Note that we ignore the trivial case $n = 0$ in this presentation (but deal with it in the formal proofs), which means $n - 1 \ge 0$. We reduce $a$ to a BHLE instance $b$ as follows:

– Define

$$M = 2 \cdot (\sum\_{i=1}^{n} |a\_i|) + 1\tag{4}$$

– For $1 \le i < n$ generate a 5-tuple

$$\begin{aligned} b\_{i,1} &= a\_i + M \cdot (5^{4i-4} + 5^{4i-3} + 5^{4i-1}) \\ b\_{i,2} &= M \cdot (5^{4i-3} + 5^{4i}) \\ b\_{i,3} &= M \cdot (5^{4i-4} + 5^{4i-2}) \\ b\_{i,4} &= a\_i + M \cdot (5^{4i-2} + 5^{4i-1} + 5^{4i}) \\ b\_{i,5} &= M \cdot (5^{4i-1}) \\ b\_i &= b\_{i,1}, b\_{i,2}, b\_{i,4}, b\_{i,5}, b\_{i,3} \end{aligned} \tag{5}$$

Note that $b\_{i,3}$ has moved to the last position in $b\_i$.

– For i = n generate only a 4-tuple:

$$\begin{aligned} b\_{n,1} &= a\_n + M \cdot (5^{4n-4} + 5^{4n-3} + 5^{4n-1}) \\ b\_{n,2} &= M \cdot (5^{4n-3} + 1) \\ b\_{n,4} &= a\_n + M \cdot (5^{4n-2} + 5^{4n-1} + 1) \\ b\_{n,5} &= M \cdot (5^{4n-1}) \\ b\_n &= b\_{n,1}, b\_{n,2}, b\_{n,4}, b\_{n,5} \end{aligned} \tag{6}$$

Note that


In summary, the entry $b\_{i,3}$ is uniformly in the last position in the $b\_i$ but omitted from the final $b\_n$.

The Partition instance $a$ of length $n$ is reduced to a vector $b$ of length $5n - 1$:

$$b = (b\_1, \dots, b\_{n-1}, b\_n) \tag{7}$$
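
The construction of $b$ can be written down directly. The following Python function is a minimal sketch of the mapping described above, not the formalized reduction function; the name `partition_to_bhle` is hypothetical, indices are 1-based as in the text, and the case $n = 0$ is not handled.

```python
def partition_to_bhle(a):
    """Map a Partition instance a_1..a_n (n >= 1) to the BHLE vector b."""
    n = len(a)
    M = 2 * sum(abs(v) for v in a) + 1                 # the constant M
    b = []
    for i in range(1, n + 1):                          # 1-based index as in the text
        ai = a[i - 1]
        e = [5 ** (4 * i - j) for j in (4, 3, 2, 1)]   # 5^(4i-4), ..., 5^(4i-1)
        top = 5 ** (4 * i) if i < n else 1             # 5^(4i), replaced by 1 for i = n
        block = [ai + M * (e[0] + e[1] + e[3]),        # b_{i,1}
                 M * (e[1] + top),                     # b_{i,2}
                 ai + M * (e[2] + e[3] + top),         # b_{i,4}
                 M * e[3]]                             # b_{i,5}
        if i < n:
            block.append(M * (e[0] + e[2]))            # b_{i,3}, moved to the end
        b += block
    return b

print(len(partition_to_bhle([3, 1, 2])))   # 14, i.e. 5n - 1 for n = 3
print(partition_to_bhle([4]))              # [1183, 54, 1363, 1125]
```

Python integers have arbitrary precision, so the powers of 5 cause no overflow even for longer inputs.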

The NP-hardness proof now follows in three steps:


#### **4.1 Auxiliary Lemma**

As a first step, the proof needs a short auxiliary lemma from number theory.

**Lemma 1.** *Let* $x, y, c \in \mathbb{Z}^n$ *and* $M$ *be an integer. Assume that* $M > \sum\_{i=1}^{n} |x\_i|$ *and that* $|c\_i| \le 1$ *for all* $1 \le i \le n$*. Furthermore, let the following equation hold:*

$$\sum\_{i=1}^{n} c\_i \cdot (x\_i + M \cdot y\_i) = 0 \tag{8}$$

*Then we have*

$$
\langle c, x \rangle = 0 \qquad \text{and} \qquad \langle c, y \rangle = 0
$$

In this lemma, we can reinterpret $x\_i + M \cdot y\_i$ from (8) as a number in base $M$ with lowest digit $x\_i$. Even with a coefficient $c\_i$, the lowest digit in base $M$ has to be zero, as well as the rest. By splitting off the lowest digits consecutively, we can show that indeed all digits in base $M$ have to equal zero.
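
As a sanity check (not a proof), Lemma 1 can be tested exhaustively on small vectors. The Python sketch below is illustrative only: the function name and the search bounds are arbitrary choices, and for each $x$ only the smallest admissible $M$ is tried.

```python
from itertools import product

def lemma1_holds(n=2, bound=2):
    """Exhaustively test Lemma 1 for vectors of length n with entries in [-bound, bound]."""
    rng = range(-bound, bound + 1)
    for x in product(rng, repeat=n):
        M = sum(abs(xi) for xi in x) + 1           # smallest integer with M > sum |x_i|
        for y in product(rng, repeat=n):
            for c in product((-1, 0, 1), repeat=n):
                lhs = sum(ci * (xi + M * yi) for ci, xi, yi in zip(c, x, y))
                if lhs == 0:
                    cx = sum(ci * xi for ci, xi in zip(c, x))
                    cy = sum(ci * yi for ci, yi in zip(c, y))
                    if cx != 0 or cy != 0:
                        return False               # counterexample found
    return True

print(lemma1_holds())   # True
```

The check reflects the argument behind the lemma: $|\langle c, x \rangle| < M$ and $M$ divides $\langle c, x \rangle$, so $\langle c, x \rangle = 0$, and then $\langle c, y \rangle = 0$ follows.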

### **4.2** $a \in$ **Partition** $\Longrightarrow$ $b \in$ **BHLE**

This direction is quite easy. Let $a\_1, \dots, a\_n$ be a YES-instance of Partition with partitioning set $I$. We will show that the following vector $x$ is a solution to the corresponding BHLE:

$$\begin{aligned} x &= (x\_1, \dots, x\_{n-1}, x\_n) \\ x\_i &= \begin{cases} 1, -1, 0, -1, 0 & i \in I \land n \in I \\ 0, 0, -1, 1, 1 & i \in I \land n \notin I \\ 0, 0, -1, 1, 1 & i \notin I \land n \in I \\ 1, -1, 0, -1, 0 & i \notin I \land n \notin I \end{cases} \qquad 1 \le i < n \\ x\_n &= 1, -1, 0, -1 \end{aligned}$$

We have to show that $\langle b, x \rangle = 0$. This is proven by plugging in the definitions and rearranging terms in the sum of the scalar product such that they cancel out. As a last step in the proof, we need to show that $\lVert x \rVert\_\infty \le 1$. For the infinity norm this is quite easy. However, it would not be true for other norms. For $p \ge 1$ and $p < \infty$ we have for $n \ge 1$:

$$\|x\|\_p = \sqrt[p]{3n} > 1$$

Thus, the chosen solution $x$ only satisfies the norm bound in the infinity norm.
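
The cancellation can be observed on a concrete instance. The Python sketch below is illustrative and rests on two assumptions: it rebuilds $b$ with a hypothetical helper `bhle_instance`, and it reads the case distinction for $x\_i$ as "index $i$ lies on the same side of the partition as the last index $n$", using 1-based indices.

```python
def bhle_instance(a):
    """BHLE vector b for a Partition instance a_1..a_n (n >= 1)."""
    n = len(a)
    M = 2 * sum(abs(v) for v in a) + 1
    b = []
    for i in range(1, n + 1):
        ai = a[i - 1]
        e = [5 ** (4 * i - j) for j in (4, 3, 2, 1)]   # 5^(4i-4), ..., 5^(4i-1)
        top = 5 ** (4 * i) if i < n else 1
        block = [ai + M * (e[0] + e[1] + e[3]), M * (e[1] + top),
                 ai + M * (e[2] + e[3] + top), M * e[3]]
        if i < n:
            block.append(M * (e[0] + e[2]))            # b_{i,3} goes last
        b += block
    return b

def bhle_solution(n, I):
    """Candidate solution x for a partitioning set I of 1-based indices."""
    same, other = [1, -1, 0, -1, 0], [0, 0, -1, 1, 1]
    x = []
    for i in range(1, n):
        # assumption: first pattern iff i and n are on the same side of the partition
        x += same if (i in I) == (n in I) else other
    return x + [1, -1, 0, -1]

a, I = [3, 1, 2], {1}                  # 3 = 1 + 2
b, x = bhle_instance(a), bhle_solution(len(a), I)
print(sum(bi * xi for bi, xi in zip(b, x)))    # 0
print(max(abs(xi) for xi in x))                # 1
print(sum(1 for xi in x if xi != 0))           # 9 nonzero entries, i.e. 3n
```

The count of $3n$ nonzero entries of absolute value 1 is what makes $\lVert x \rVert\_p$ equal to $\sqrt[p]{3n}$ for finite $p$.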

### **4.3** $a \in$ **Partition** $\Longleftarrow$ $b \in$ **BHLE**

This direction is harder. Let $b$ be a YES-instance of BHLE. That is, there exists a nonzero $x$ such that $\langle b, x \rangle = 0$ and $\lVert x \rVert\_\infty \le 1$. We have to show that there is a partition $I$ on $a\_1, \dots, a\_n$ with $\sum\_{i \in I} a\_i = \sum\_{i \in \{1 \dots n\} \setminus I} a\_i$.

The proof idea works as follows. First, we apply the auxiliary lemma and get a constraint on the $a\_i$ on the one hand, and a condition on the $x\_i$ with coefficients that are powers of 5 on the other hand. Using this condition on the $x\_i$, we generate equational constraints on the entries of $x$ by looking at the digits in base 5. We argue that a number equals zero if and only if all its digits are zero.

The generated equations lead to a good characterisation of $x$, namely the weight $w = x\_{5(n-1)+1}$. From the assumption that $\lVert x \rVert\_\infty \le 1$, we deduce $|w| \le 1$. Again, this step can only be reasoned in the infinity norm. For other $p$-norms, this argumentation breaks as we need the property $|w| \le 1$ to complete the proof. Using the value of $w$, we can construct a partitioning set $I$ with the required property from the equation on the $a\_i$.

### **5 SVP**

Knowing that the BHLE is indeed an NP-hard problem, we reduce it to the SVP. Then we can conclude that the SVP in infinity norm is NP-hard.

**Theorem 3.** *There is a reduction from BHLE to the SVP in infinity norm.*

Again some difficulties were met when formalizing the proof for the above theorem. First of all, note that the terminology in [7] and nowadays is a bit different. In [7], the shortest vector problem only denotes the shortest vector problem in the Euclidean norm. What we call the shortest vector problem in the infinity norm is named closest vector problem in [7]. To make terminology even more confusing, our understanding of the closest vector problem is called the nearest vector problem in [7]. To make the notation clear, we provide a table for reference in Fig. 2.


**Fig. 2.** Notation

A more mathematical problem encountered was that the reduction itself used in [7] was not entirely correct. In the reduction two factors $k' = k + 1$ and $k''$ were introduced. These factors should have certain properties to allow the arguments of the reduction proof to go through. However, this is only true when tweaking these factors a bit to make the whole proof watertight. We will now have a closer look.

Given the BHLE instance $b = (b\_1, \dots, b\_n)$ and $k$, create the following SVP instance:

$$
\mathcal{L} = \begin{pmatrix} 1 & & 0 & 0 \\ & \ddots & & \vdots \\ 0 & & 1 & 0 \\ (k+1) \cdot b\_1 & \cdots & (k+1) \cdot b\_n & k'' \end{pmatrix} \cdot \mathbb{Z}^{n+1} \qquad k = k
$$

where $k''$ is the factor in question. In the technical report, we have

$$k'' = 2 \cdot (k+1) \cdot (\sum\_{i} b\_i) + 1$$

The following example, however, shows that this factor is not sufficient.

*Example 3.* Consider the BHLE instance given by $b = (1, -1)$ and $k = 1$. This is a YES-instance, since the vector $(1, 1)$ yields the expected properties.

Define the following matrices.

$$B\_0 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & -2 & 1 \end{pmatrix} \qquad B\_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & -2 & 9 \end{pmatrix} \qquad B\_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 6 & -6 & 25 \end{pmatrix}$$

The associated SVP instance is the lattice generated by $B\_0$. Then the vector $(0, 0, 1)^T$ with infinity norm 1 is a solution to the SVP instance generated by the basis matrix $B\_0$. However, since the last entry is nonzero, this does not provide a solution for BHLE. Contrary to this example, the proof in the technical report shows that for all SVP solutions the last entry must be zero.

The argument in the technical report breaks at this point because $b\_1 + b\_2 = 0$, which makes $k'' = 1$ very small. One step to prevent this is to use the absolute values of the $b\_i$ in $k''$ instead. The new $k\_1''$ we consider is

$$k\_1'' = 2 \cdot (k+1) \cdot \left(\sum\_i |b\_i|\right) + 1$$

With this new factor $k\_1''$ we get the generating matrix $B\_1$ and the vector $(0, 0, 1)$ is no longer a shortest vector.

Still, this is not enough. Consider the same $b = (1, -1)$ as above, but let $k = 5$. Then we get $B\_2$ as the generating matrix of the SVP lattice. The vector $x = (0, 5, 1)^T$ is a shortest vector whose last entry is nonzero. Again it contradicts the proof in the technical report. The reason this time is the following: the argument that $(k+1)(\sum\_{i=1}^{n} x\_i b\_i)$ and $k\_1''$ have different relative sizes fails. Indeed, we have

$$
\left\| \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 6 & -6 & 25 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 5 \\ 1 \end{pmatrix} \right\|\_{\infty} = \left\| \begin{pmatrix} 0 \\ 5 \\ -5 \end{pmatrix} \right\|\_{\infty} = 5 \le k
$$

We can obtain different relative sizes of $(k+1)(\sum\_{i=1}^{n} x\_i b\_i)$ and $k\_1''$ by defining

$$k\_2'' = 2 \cdot k \cdot (k+1) \cdot \left(\sum\_i |b\_i|\right) + 1\tag{9}$$

Now we can make sure that the last entry of a solution to the SVP problem is indeed zero. For the proof of Theorem 3 we consider the reduction given by

$$\mathcal{L} = \underbrace{\begin{pmatrix} 1 & & 0 & 0 \\ & \ddots & & \vdots \\ 0 & & 1 & 0 \\ (k+1) \cdot b\_1 & \cdots & (k+1) \cdot b\_n & k\_2'' \end{pmatrix}}\_B \cdot \mathbb{Z}^{n+1} \qquad k = k$$

where $B$ denotes the basis matrix generating the lattice $\mathcal{L}$ as given above.

Consider a solution $x = (x\_1, \dots, x\_{n+1})$ of the SVP with $\lVert Bx \rVert\_\infty \le k$. Then we have

$$Bx = \begin{pmatrix} 1 & & 0 & 0 \\ & \ddots & & \vdots \\ 0 & & 1 & 0 \\ (k+1) \cdot b\_1 & \cdots & (k+1) \cdot b\_n & k\_2'' \end{pmatrix} \cdot \begin{pmatrix} x\_1 \\ \vdots \\ x\_n \\ x\_{n+1} \end{pmatrix} = \begin{pmatrix} x\_1 \\ \vdots \\ x\_n \\ (k+1)(\sum\_{i=1}^n x\_i b\_i) + x\_{n+1} \cdot k\_2'' \end{pmatrix}$$

As this yields a solution to the SVP, we get:

$$|(k+1)(\sum\_{i=1}^{n} x\_i b\_i) + x\_{n+1} \cdot k\_2''| \le k \tag{10}$$

Then we calculate:

$$\begin{aligned} (k+1)(\sum\_{i=1}^n x\_i b\_i) + x\_{n+1} \cdot k\_2'' &\le (k+1)(\sum\_{i=1}^n |x\_i| |b\_i|) + x\_{n+1} \cdot k\_2'' \\ &\le (k+1)k(\sum\_{i=1}^n |b\_i|) + x\_{n+1} \cdot k\_2'' \end{aligned}$$

Assuming that $x\_{n+1} \ne 0$, we have

$$|(k+1)k(\sum\_{i=1}^{n}|b\_i|)| < |2 \cdot k \cdot (k+1) \cdot (\sum\_{i}|b\_i|) + 1| = |k\_2''| \le |x\_{n+1} \cdot k\_2''|$$

Thus the two summands indeed have different relative sizes and can never cancel each other out. This leads to a contradiction to (10). Therefore, $x\_{n+1} = 0$ must be true and $(x\_1, \dots, x\_n)$ constitutes a solution to the BHLE when using $k\_2''$ as in (9).
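
The three candidate factors can be compared by a small search on Example 3. The Python sketch below is illustrative only; the function names are hypothetical, and the search looks for integer coefficients with $\lVert Bx \rVert\_\infty \le k$ and a nonzero last coefficient, which is exactly what the corrected factor is meant to rule out.

```python
from itertools import product

def factor_report(b, k):
    return 2 * (k + 1) * sum(b) + 1                      # k'' as in the technical report

def factor_abs(b, k):
    return 2 * (k + 1) * sum(abs(v) for v in b) + 1      # k''_1

def factor_fixed(b, k):
    return 2 * k * (k + 1) * sum(abs(v) for v in b) + 1  # k''_2, Eq. (9)

def bad_coefficients(b, k, k2):
    """Return some x with ||Bx||_inf <= k and x_{n+1} != 0, or None (assumes k2 > 0).

    B is the identity extended by the last row ((k+1)*b, k2), so the first
    n entries of Bx are x_1..x_n themselves.
    """
    for x in product(range(-k, k + 1), repeat=len(b)):
        S = (k + 1) * sum(xi * bi for xi, bi in zip(x, b))
        L = (abs(S) + k) // k2                           # |x_{n+1}| cannot exceed this
        for last in range(-L, L + 1):
            if last != 0 and abs(S + last * k2) <= k:
                return list(x) + [last]
    return None

b = (1, -1)
print(bad_coefficients(b, 1, factor_report(b, 1)) is not None)   # True
print(bad_coefficients(b, 1, factor_abs(b, 1)))                  # None
print(bad_coefficients(b, 5, factor_abs(b, 5)) is not None)      # True
print(bad_coefficients(b, 5, factor_fixed(b, 5)))                # None
```

The witness returned by the search need not be the vector named in the text; any coefficient vector with nonzero last entry shows that the respective factor is too small.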

### **6 Other** *p***-Norms**

Up to now, we have investigated lattice problems under the infinity norm. Even though this yields nice hardness results, in practice the Euclidean norm is used more often. Unfortunately, when considering $p$-norms things do not play out as nicely. In this section, we assume $1 \le p < \infty$ whenever we talk about a specific $p$.

For the CVP, there is a generalisation of the proof for every $p$-norm in [15, p. 48, Chap. 3.2, Thm 3.1] which we also formalized. Let $a\_1, \dots, a\_n, s$ be an instance of Subset Sum. The reduction function maps this instance to:

$$\mathcal{L} = \begin{pmatrix} a\_1 & \cdots & a\_n \\ 2 & & 0 \\ & \ddots & \\ 0 & & 2 \end{pmatrix} \cdot \mathbb{Z}^n \qquad b = \begin{pmatrix} s \\ 1 \\ \vdots \\ 1 \end{pmatrix} \qquad k = \sqrt[p]{n}$$

Then the following theorem holds:

**Theorem 4.** *The above mapping is a reduction from the Subset Sum problem to the CVP in* p*-norm.*

This implies that the CVP in $p$-norm is an NP-hard problem. The outline of the proof is given in Sect. 3 after Theorem 1. The important difference to the infinity norm is that the bound $k$ scales with the dimension $n$ of the lattice.
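
For small inputs, the equivalence stated in Theorem 4 can be observed by exhaustive search. The following Python sketch is an illustration under stated assumptions, not the formal proof: the bound is taken as $k = \sqrt[p]{n}$ and compared in the form $\lVert Bx - b \rVert\_p^p \le n$ to stay within integer arithmetic, only integer $p$ are tried, and coefficients are searched in a small box (function names and the `box` parameter are hypothetical).

```python
from itertools import product

def subset_sum_yes(a, s):
    """Brute-force Subset Sum with 0/1 coefficients."""
    return any(sum(ai * xi for ai, xi in zip(a, x)) == s
               for x in product((0, 1), repeat=len(a)))

def cvp_p_yes(a, s, p, box=2):
    """Is there an integer x in [-box, box]^n with ||Bx - b||_p^p <= n?"""
    n = len(a)
    for x in product(range(-box, box + 1), repeat=n):
        top = abs(sum(ai * xi for ai, xi in zip(a, x)) - s) ** p
        rest = sum(abs(2 * xi - 1) ** p for xi in x)   # each term is at least 1
        if top + rest <= n:
            return True
    return False

# Compare both decision procedures on all instances with a_i in {1, 2, 3}, n = 3.
agree = all(subset_sum_yes(a, s) == cvp_p_yes(a, s, p)
            for a in product(range(1, 4), repeat=3)
            for s in range(0, 10)
            for p in (1, 2, 3))
print(agree)   # True
```

Since every term $|2x\_i - 1|^p$ contributes at least 1, the budget $n$ is already used up by the lower $n$ entries, which leaves no slack for the first entry.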

For the SVP, there is no known deterministic NP-hardness result in the Euclidean norm, or even any p-norm. However, Ajtai [1,2] found an interesting alternative which is quite useful for the application in cryptography, namely randomized reductions using polynomial-time probabilistic reduction functions. In cryptography, these results guarantee the hardness of "average" cases. That is, given an average instance according to a probability distribution, it will most likely be intractable.

### **7 Time Complexity**

As stated in Sect. 2, time complexity of the above reduction functions has not been formalized. However, we give a short explanation why all reduction functions are indeed in polynomial time.

**Subset Sum to CVP:** The reduction function as given in Eq. (1) creates $(n + 2)(n + 1) + 1$ values using only memory access or one addition. Therefore, the time complexity in this case is $O(n^2)$.

**Partition to BHLE:** In this case, the reduction function maps the input $a$ of length $n$ to $b$ as defined in Eq. (7). The value $k = 1$ is fixed. Then $a$ is mapped to a vector of length $5n - 1$. When calculating the $b\_i$, we need to calculate the value of $M$ as in (4). As we sum over all input values, this lies in $O(n)$. Each $b\_i$ can then be calculated in $O(n)$ since it only contains a constant number of additions of the input with fixed cofactors (see (5)–(6)). Putting the construction of the list and the calculation of the $b\_i$ together, we find that the whole reduction function is in $O(n^2)$.

**BHLE to the SVP:** Consider the reduction function as given in Eq. (5) using the value $k\_2''$ as in (9). Calculating $k\_2''$ requires $n + 2$ memory accesses which are processed in $n + 4$ arithmetic operations, thus having a time complexity of $O(n)$. Every other entry in the matrix is calculated in $O(1)$, since they contain at most two memory accesses and at most two arithmetic operations. The input generates $(n + 1)^2 + 1$ values, of which $(n + 1)(n + 1)$ are in $O(1)$ (namely all the zeros and ones, the vector $(k + 1) \cdot b$ and the constraint $k$) and one is calculated in $O(n)$ (namely $k\_2''$). Thus, the whole reduction function lies in $O(n^2)$.

### **8 Outlook**

With this paper, we now have a formal proof for NP-hardness of the CVP and SVP in the infinity norm, as well as a formal proof of NP-hardness of the CVP in $p$-norm (for $1 \le p < \infty$). In the formalization process, many gaps and imprecisions in the pen-and-paper proofs were fixed. The changes to the original proofs have been elaborated with explanations and examples. Unfortunately, giving a deterministic reduction proof of the SVP in $p$-norm for $p < \infty$ is still an open problem. Under probabilistic assumptions, Ajtai showed NP-hardness of the SVP in Euclidean norm in [2].

An interesting topic for future work is to develop a framework for probabilistic reductions such as in [2]. This will give the foundation to extend formalization of hardness proofs to other problems in lattice theory, especially those used in lattice-based cryptography, such as the Learning with Errors (LWE) Problem, Ring-LWE and Module-LWE. This will underline the security of many lattice-based cryptosystems. Another topic for future work is to formalize the hardness proofs for approximate versions of the CVP and SVP.

**Acknowledgements.** We thank Manuel Eberl for continuous support and fruitful discussions. The first author gratefully acknowledges the financial support of this work by the research training group ConVeY funded by the German Research Foundation under grant GRK 2428.

### **References**



# **Buy One Get 14 Free: Evaluating Local Reductions for Modal Logic**

Cláudia Nalon<sup>1</sup>, Ullrich Hustadt<sup>2(✉)</sup>, Fabio Papacchini<sup>3</sup>, and Clare Dixon<sup>4</sup>

<sup>1</sup> Department of Computer Science, University of Brasília, Brasília, Brazil nalon@unb.br

<sup>2</sup> Department of Computer Science, University of Liverpool, Liverpool, UK U.Hustadt@liverpool.ac.uk

<sup>3</sup> School of Computing and Communications, Lancaster University in Leipzig, Leipzig, Germany f.papacchini@lancaster.ac.uk

<sup>4</sup> Department of Computer Science, University of Manchester, Manchester, UK clare.dixon@manchester.ac.uk

**Abstract.** We are interested in widening the reasoning support for propositional modal logics in the so-called modal cube. The modal cube consists of extensions of the basic modal logic K with an arbitrary combination of the modal axioms B, D, T, 4 and 5. We revisit recently developed local reductions from all logics in the modal cube to a normal form comprising sets of clausal formulae with associated modal levels. We extend these reductions further to the basic modal logic K, called *definitional reductions*. This enables any prover for K to be used to solve the satisfiability problem for all logics in the modal cube. We also present alternative, *axiomatic*, reductions based on ideas originally proposed by Kracht, providing new theoretical results and improved bounds on the size of the reductions. We compare both sets of reductions combined with state-of-the-art provers for K on a large set of parametric benchmarks for all logics in the modal cube. The results show that the provers perform better with reductions based on the clausal normal form than the axiomatic reductions.

### **1 Introduction**

Following [4], modal logics can be seen as simple but expressive languages for talking about relational structures that provide an internal and local perspective on those structures. The most intensively studied modal logics are the basic modal logic K and its extensions with one or more of the axioms B (symmetry), D (seriality), T (reflexivity), 4 (transitivity) and 5 (Euclideaness), that form the so-called *modal cube*. There are numerous reasons for this. To name just three: (i) relations which are serial, symmetric, transitive, etc. are very common; (ii) the logics in the modal cube can be used to represent and reason about idealised mental attitudes such as knowledge, belief, desire and intention; (iii) mathematical techniques, algorithms, calculi, as well as implemented reasoning tools for these logics provide building blocks for the study and application of more complex modal logics.

C. Nalon was partially supported by FAPDF 11/2021, DPG/UnB 004/2022. C. Dixon was partially supported by the EPSRC funded RAI Hubs FAIR-SPACE (EP/R026092/1) and RAIN (EP/R026084/1), and the EPSRC funded programme Grant S4 (EP/N007565/1).

In [27], we have presented a reduction from each of the 15 distinct logics in the modal cube to Separated Normal Form with Sets of Modal Levels, SNFsml, a clausal normal form for basic modal logic in which clauses are labelled with possibly infinite sets of modal levels, and to Separated Normal Form with Modal Levels, SNFml, where each clause is given a natural number label. The latter reduction then allowed us to use the modal-layered clausal resolution (MLR) calculus [22], implemented in the modal logic theorem prover KSP [19,26] to reason in these logics. We evaluated this approach on a new collection of benchmark formulae for all 15 logics and compared its performance with that of the global modal resolution (GMR) calculus also implemented in KSP and with Leo-III, an automated theorem prover for polymorphic higher-order logic [32]. The GMR calculus has specific rules for each logic while Leo-III reasons about modal logics using a translation approach and has translations for each of the 15 logics built in. The evaluation showed that the approach performs better than Leo-III but not as well as the GMR calculus in KSP. We identified the reduction from SNFsml to SNFml as the main contributing factor, in particular, on satisfiable formulae where the MLR calculus has to fully saturate the corresponding set of SNFml clauses up to redundancy before it can conclude that the original formula is satisfiable.

In this paper, we investigate and evaluate an alternative use of our reductions from logics in the modal cube to SNFml. A finite set of clauses in SNFml can straightforwardly be transformed into a formula in the basic modal logic K. Such a transformation then allows the use of any existing approach to solving the satisfiability problem in K to the satisfiability problem in all logics in the modal cube. An advantage of the use of this transformation over a translation from each of the 15 logics to first-order (or higher-order) logic [1,5,9,14] is the availability of implemented decision procedures for basic modal logic. In contrast, while many decidable fragments of first-order logics are known, including decidable fragments that are suitable targets of translations of modal logic formulae, implemented decision procedures for these fragments are rare. See also related discussions in [27,30].

The original motivation for our work on reductions to SNFsml and SNFml were Kracht's reductions of the normal modal logics KB, KD, KT, and K4 to K [17,18]. Extending our reduction from SNFml to K to obtain a reduction from the modal cube to K raises first the question whether one can devise a reduction based on the same idea as Kracht's for the remaining logics of the modal cube. We will call such a reduction *axiomatic* as the idea is to use certain instances of axiom schemata embedded into modal contexts of nested ✷-operators up to a certain depth bound. We answer this question positively by providing the reductions missing in Kracht's work. The second question then raised is how well provers for K perform on our reduction compared to an axiomatic reduction. Our empirical evaluation indicates that the definitional reduction appears to result in better performance overall when combined with state-of-the-art K provers.

The structure of the paper is as follows. In Sect. 2 we recall common concepts of propositional modal logics and the definition of our normal form SNFml. Section 3 recalls our reduction from logics in the modal cube to SNFml, defines the transformation of a finite set of SNFml clauses to basic modal logic, and introduces the axiomatic reduction for the logics in the modal cube. In Sect. 4 we compare the performance of the reductions defined in Sect. 3 when combined with provers for basic modal logic, as well as with the global resolution calculus for logics in the modal cube implemented in KSP.

### **2 Preliminaries**

The language of modal logic is an extension of the language of propositional logic with unary modal operators ✷ and ✸. More precisely, given a denumerable set of *propositional symbols*, P = {p, p0, q, q0, t, t0, ...} as well as propositional *constants* **true** and **false**, *modal formulae* are inductively defined as follows: constants and propositional symbols are modal formulae. If ϕ and ψ are modal formulae, then so are ¬ϕ, (ϕ ∧ ψ), (ϕ ∨ ψ), (ϕ → ψ), ✷ϕ, and ✸ϕ. We also assume that ∧ and ∨ are associative and commutative operators and consider, e.g., (p ∨ (q ∨ r)) and (r ∨ (q ∨ p)) to be identical formulae. We often omit parentheses if this does not cause confusion. The size of ϕ is the number of occurrences of propositional constants, propositional symbols, boolean operators and modal operators in ϕ. By var(ϕ) we denote the set of all propositional symbols occurring in ϕ. This function easily extends to finite sets of modal formulae. A *modal axiom (schema)* is a modal formula ψ representing the set of all instances of ψ.

A *literal* is either a propositional symbol or its negation; the set of literals is denoted by L<sub>P</sub>. By ¬l we denote the *complement* of the literal l ∈ L<sub>P</sub>, that is, if l is the propositional symbol p then ¬l denotes ¬p, and if l is the literal ¬p then ¬l denotes p. By |l| for l ∈ L<sub>P</sub> we denote p if l = p or l = ¬p. A *modal literal* is either ✷l or ✸l, where l ∈ L<sub>P</sub>.

An occurrence of a subformula has *positive polarity* if it is inside the scope of an even number of (explicit or implicit) negations, and it has *negative polarity* if it is inside the scope of an odd number of negations. A literal is *pure* if all its occurrences have either a positive or a negative polarity.

The modal logic K is given by the smallest set of modal formulae which includes all propositional tautologies and the axiom schema ✷(ϕ → ψ) → (✷ϕ → ✷ψ), and is closed under modus ponens and the rule of necessitation (if ϕ ∈ K then ✷ϕ ∈ K). Given a modal logic L and a set of axioms Σ, the smallest modal logic L′ ⊃ L ∪ Σ is an *extension* of L and we denote L′ by LΣ.

The standard semantics of modal logics is the *Kripke semantics* or *possible world semantics*. A *Kripke frame* F is an ordered pair ⟨W, R⟩ where W is a nonempty set of *worlds* and R is a binary (accessibility) relation over W. A *Kripke structure* M over P is an ordered pair ⟨F, V⟩ where F is a Kripke frame and the *valuation* V is a function mapping each propositional symbol in P to a subset V(p) of W. A *rooted Kripke structure* is an ordered pair ⟨M, w<sub>0</sub>⟩ with w<sub>0</sub> ∈ W.

Satisfaction (or truth) of a formula at a world w of a Kripke structure M = ⟨W, R, V⟩ is inductively defined by:

> ⟨M, w⟩ ⊨ **true**; ⟨M, w⟩ ⊭ **false**;
>
> ⟨M, w⟩ ⊨ p iff w ∈ V(p), where p ∈ P;
>
> ⟨M, w⟩ ⊨ ¬ϕ iff ⟨M, w⟩ ⊭ ϕ;
>
> ⟨M, w⟩ ⊨ (ϕ ∧ ψ) iff ⟨M, w⟩ ⊨ ϕ and ⟨M, w⟩ ⊨ ψ;
>
> ⟨M, w⟩ ⊨ (ϕ ∨ ψ) iff ⟨M, w⟩ ⊨ ϕ or ⟨M, w⟩ ⊨ ψ;
>
> ⟨M, w⟩ ⊨ (ϕ → ψ) iff ⟨M, w⟩ ⊨ ¬ϕ or ⟨M, w⟩ ⊨ ψ;
>
> ⟨M, w⟩ ⊨ ✷ϕ iff for every v, wRv implies ⟨M, v⟩ ⊨ ϕ;
>
> ⟨M, w⟩ ⊨ ✸ϕ iff there is v, wRv and ⟨M, v⟩ ⊨ ϕ.

If $\langle M, w \rangle \models \varphi$ then we say that $\varphi$ is true at $w$ in $M$. A rooted Kripke structure $\mathcal{M} = \langle M, w\_0 \rangle$ is a *model* of a modal formula $\varphi$ iff $\langle M, w\_0 \rangle \models \varphi$; in that case we also say that $\mathcal{M}$ satisfies $\varphi$. A modal formula $\varphi$ is *satisfiable* iff there exists a Kripke structure $M$ and a world $w \in M$ such that $\langle M, w \rangle \models \varphi$. A rooted Kripke structure $\mathcal{M} = \langle W, R, V, w\_0 \rangle$ is a *rooted tree Kripke structure* iff $R$ is a tree, that is, a directed acyclic connected graph where each node has at most one predecessor, with *root* $w\_0$.
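The satisfaction relation above can be turned into a small model checker for finite Kripke structures. The following is only a sketch: the tuple encoding of formulae and the function name `holds` are illustrative assumptions and are not taken from any of the provers discussed in this paper.

```python
# Hypothetical encoding of modal formulae as nested tuples:
#   ("true",), ("false",), ("var", "p"), ("not", f),
#   ("and", f, g), ("or", f, g), ("imp", f, g), ("box", f), ("dia", f)

def holds(worlds, rel, val, w, f):
    """Return True iff <M, w> |= f for M = <worlds, rel, val>.

    worlds: finite set of worlds; rel: set of pairs (u, v) meaning u R v;
    val: dict mapping a propositional symbol to the set of worlds where it is true.
    """
    op = f[0]
    if op == "true":
        return True
    if op == "false":
        return False
    if op == "var":
        return w in val.get(f[1], set())
    if op == "not":
        return not holds(worlds, rel, val, w, f[1])
    if op == "and":
        return holds(worlds, rel, val, w, f[1]) and holds(worlds, rel, val, w, f[2])
    if op == "or":
        return holds(worlds, rel, val, w, f[1]) or holds(worlds, rel, val, w, f[2])
    if op == "imp":
        return (not holds(worlds, rel, val, w, f[1])) or holds(worlds, rel, val, w, f[2])
    if op == "box":  # every R-successor of w satisfies the argument
        return all(holds(worlds, rel, val, v, f[1]) for v in worlds if (w, v) in rel)
    if op == "dia":  # some R-successor of w satisfies the argument
        return any(holds(worlds, rel, val, v, f[1]) for v in worlds if (w, v) in rel)
    raise ValueError(f"unknown operator: {op}")


# A three-world example: world 0 sees worlds 1 and 2.
W = {0, 1, 2}
R = {(0, 1), (0, 2)}
V = {"p": {1, 2}, "q": {1}}
print(holds(W, R, V, 0, ("box", ("var", "p"))))  # p holds in both successors
print(holds(W, R, V, 0, ("box", ("var", "q"))))  # q fails in world 2
```

Note that a world without successors, such as world 1 in the example, satisfies every box formula and no diamond formula.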

A *path from* $w'\_0$ *to* $w'\_k$ *of length* $k$, $k \ge 0$, in a frame $F = \langle W, R \rangle$ is a sequence $(w'\_0, w'\_1, \ldots, w'\_k)$ where $w'\_i \mathrel{R} w'\_{i+1}$ for every $i$, $0 \le i \le k - 1$. A path $(w'\_0)$ of length 0 is identified with its root $w'\_0$. In a rooted tree Kripke structure $M$ with root $w\_0$, for every world $w\_k \in W$ there is exactly one path connecting $w\_0$ and $w\_k$; the *modal level* of $w\_k$ (in $M$), denoted by $\mathit{ml}\_M(w\_k)$, is given by the length of the path from $w\_0$ to $w\_k$. More generally, for a rooted Kripke structure $M$ with root $w\_0$, the *depth* of a world $w\_k$ (in $M$), denoted by $\mathit{depth}\_M(w\_k)$, is the length of the shortest path from $w\_0$ to $w\_k$. The *depth* of $M$ is the maximal depth of a world in $M$. The *outdegree* of a world $w$ in $F$ is given by $|\{w' \mid w \mathrel{R} w'\}|$.
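Since the depth of a world is defined through shortest paths, it can be computed by breadth-first search from the root. The sketch below assumes a finite accessibility relation given as a set of pairs; the names `depths` and `outdegree` are illustrative only.

```python
from collections import deque


def depths(rel, root):
    """Map every world reachable from root to the length of a shortest path from root."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        w = queue.popleft()
        for (u, v) in rel:
            if u == w and v not in dist:
                dist[v] = dist[w] + 1
                queue.append(v)
    return dist


def outdegree(rel, w):
    """Number of distinct worlds accessible from w."""
    return len({v for (u, v) in rel if u == w})


# World 2 is reachable both directly and via world 1; its depth is the shorter length.
rel = {(0, 1), (1, 2), (0, 2), (2, 3)}
d = depths(rel, 0)
print(d[2], max(d.values()), outdegree(rel, 0))
```

In a rooted tree Kripke structure the value computed by `depths` coincides with the modal level, because there is exactly one path from the root to each world.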

The 15 logics in the modal cube consist of K itself and its extensions with one or more of the modal axioms shown in Table 1. Each of these axioms defines a class of Kripke frames where the accessibility relation $R$ satisfies the first-order property stated in the table. Combinations $\Sigma$ of axioms then define a class $\mathcal{F}\_\Sigma$ of Kripke frames where the accessibility relation satisfies the combination of their corresponding properties. Given a logic $L = \mathrm{K}\Sigma$, a modal formula $\varphi$ is $L$*-satisfiable* iff there exists a frame $F \in \mathcal{F}\_\Sigma$, a valuation $V$ and a world $w \in F$ such that $M = \langle F, V, w \rangle \models \varphi$, and we call $M$ an $L$-model of $\varphi$.

**Table 1.** Modal axioms and relational frame properties
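For finite frames, membership in a frame class can be checked directly. The sketch below relies on the standard textbook correspondences (D: serial, T: reflexive, B: symmetric, 4: transitive, 5: Euclidean); the function names and the dictionary `PROPERTY` are illustrative assumptions.

```python
def is_serial(W, R):
    return all(any((w, v) in R for v in W) for w in W)


def is_reflexive(W, R):
    return all((w, w) in R for w in W)


def is_symmetric(W, R):
    return all((v, u) in R for (u, v) in R)


def is_transitive(W, R):
    # u R v and v R x imply u R x
    return all((u, x) in R for (u, v) in R for (y, x) in R if v == y)


def is_euclidean(W, R):
    # u R v and u R x imply v R x
    return all((v, x) in R for (u, v) in R for (y, x) in R if u == y)


PROPERTY = {"D": is_serial, "T": is_reflexive, "B": is_symmetric,
            "4": is_transitive, "5": is_euclidean}


def in_frame_class(W, R, sigma):
    """True iff the finite frame <W, R> has the property of every axiom in sigma."""
    return all(PROPERTY[axiom](W, R) for axiom in sigma)


W = {0, 1}
total = {(0, 0), (0, 1), (1, 0), (1, 1)}   # satisfies all five properties
single = {(0, 1)}                          # transitive only
print(in_frame_class(W, total, "DTB45"), in_frame_class(W, single, "4"))
```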

A modal formula is in *simplified NNF* (denoted by nnf(ϕ)), if it has been simplified by exhaustively applying the rewrite rules in Table 2, and it is in Negation Normal Form (NNF), that is, a formula where only propositional symbols are allowed in the scope of negations.

The reductions given in the next section produce formulae in a clausal normal form, called *Separated Normal Form with Sets of Modal Levels* SNFsml, given in [29]. The language of SNFsml extends that of the basic modal logic K with sets of modal levels as labels. Clauses in SNFsml have one of the following forms:

$$\begin{array}{ll}
S : \bigvee\_{i=1}^{n} l\_i & \text{(literal clause)} \\
S : l' \to \Box l & \text{(positive modal clause)} \\
S : l' \to \lozenge l & \text{(negative modal clause)}
\end{array}$$

where $S \subseteq \mathbb{N}$ and $l$, $l'$, $l\_i$ are propositional literals with $1 \le i \le n$, $n \in \mathbb{N}$. We write $\star : \varphi$ instead of $\mathbb{N} : \varphi$ and such clauses are called *global clauses*. Positive and negative modal clauses are together known as *modal clauses*.

Given a rooted tree Kripke structure $M$ and a set $S$ of natural numbers, by $M[S]$ we denote the set of worlds that are at a modal level in $S$, that is, $M[S] = \{w \in W \mid \mathit{ml}\_M(w) \in S\}$. Then

$$M \models S : \varphi \text{ iff } \langle M, w \rangle \models \varphi \text{ for every world } w \in M[S].$$

The use of sets as labels allows a concise representation of clauses that might hold in a possibly infinite number of levels.

If <sup>M</sup> <sup>|</sup><sup>=</sup> <sup>S</sup> : <sup>ϕ</sup>, then we say that <sup>S</sup> : <sup>ϕ</sup> *holds in* <sup>M</sup> or *is true in* <sup>M</sup>. For a set <sup>Φ</sup> of labelled formulae, <sup>M</sup> <sup>|</sup><sup>=</sup> <sup>Φ</sup> iff <sup>M</sup> <sup>|</sup><sup>=</sup> <sup>S</sup> : <sup>ϕ</sup> for every <sup>S</sup> : <sup>ϕ</sup> in <sup>Φ</sup>, and we say <sup>Φ</sup> is *K-satisfiable*.

We introduce some notation that will be used in the following. For $m, n \in \mathbb{N}$, $m \le n$, let $[m..n] = \{m, \ldots, n\} \subseteq \mathbb{N}$. Let $S^{+} = \{l + 1 \in \mathbb{N} \mid l \in S\}$, $S^{-} = \{l - 1 \in \mathbb{N} \mid l \in S\}$, and $S^{\ge} = \{n \in \mathbb{N} \mid n \ge \min(S)\}$, where $\min(S)$ is the least element in $S$. Note that the restriction of the elements to $\mathbb{N}$ implies that $S^{-}$ cannot contain negative numbers.
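The label operations can be sketched on finite sets as follows. The function names are illustrative, and the last operation is read here as "all levels from min(S) upwards"; since that set is infinite, the sketch truncates it at an explicit `bound`, which is an assumption made only to keep the example finite.

```python
def interval(m, n):
    """[m..n] = {m, ..., n} for m <= n."""
    return set(range(m, n + 1))


def plus(S):
    """S+ : shift every level up by one."""
    return {l + 1 for l in S}


def minus(S):
    """S- : shift every level down by one, dropping anything that would leave N."""
    return {l - 1 for l in S if l >= 1}


def up_from(S, bound):
    """Finite approximation of S>= : all levels n with min(S) <= n <= bound."""
    return set(range(min(S), bound + 1)) if S else set()


print(plus({0, 2}), minus({0, 2}), up_from({3, 5}, 6))
```

The example shows that level 0 has no counterpart in `minus`, matching the remark that the result cannot contain negative numbers.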

A formula is in *Separated Normal Form with Modal Levels* (SNFml) [22,23] if it is a conjunction of clauses in one of the following forms:

$$\begin{array}{ll}
ml : \bigvee\_{i=1}^{n} l\_i & \text{(literal clause)} \\
ml : l' \to \Box l & \text{(positive modal clause)} \\
ml : l' \to \lozenge l & \text{(negative modal clause)}
\end{array}$$

where $ml \in \mathbb{N} \cup \{\star\}$ and $l$, $l'$, $l\_i$ are propositional literals with $1 \le i \le n$, $n \in \mathbb{N}$. Effectively, this normal form corresponds to a restriction of SNFsml where the sets are singletons or $\star$, representing all levels.

### **3 Reductions**

#### **3.1 Definitional Reduction**

In [27] we introduced a reduction $\rho^{sml}\_L(\varphi)$ that, for any modal logic $L = \mathrm{K}\Sigma$ with $\Sigma \subseteq \{\mathsf{B}, \mathsf{D}, \mathsf{T}, \mathsf{4}, \mathsf{5}\}$, transforms a modal formula $\varphi$ in simplified NNF to a finite set $\Phi^{sml}\_L$ of clauses in SNFsml such that $\varphi$ is $L$-satisfiable iff $\Phi^{sml}\_L$ is K-satisfiable. For K4, K5 and their extensions by further axioms, $\rho^{sml}\_L$ produces sets of clauses where the labelling sets $S$ are potentially infinite. However, depending on syntactic properties of $\varphi$ it is possible to impose upper bounds on the maximal modal level that occurs in those sets so that the reduction remains satisfiability preserving. Table 3 shows such a bound for each logic in the modal cube. In the table and in the following, for a modal formula $\varphi$ in simplified NNF, (i) $d^{\varphi}\_{m}$ is the modal depth of $\varphi$, (ii) $d^{\varphi}\_{\lozenge}$ is the maximal nesting of $\lozenge$-operators not in the scope of any $\Box$-operators in $\varphi$, (iii) $n^{\varphi}\_{\Box}$ is the number of $\Box$-subformulae in $\varphi$, and (iv) $n^{\varphi}\_{\lozenge}$ is the number of $\lozenge$-subformulae below $\Box$-operators in $\varphi$. Using these bounds it is then possible to define a function $\rho^{ml}\_L$ that transforms a modal formula $\varphi$ in simplified NNF to a finite set $\Phi^{ml}\_L$ of clauses in SNFml such that $\varphi$ is $L$-satisfiable iff $\Phi^{ml}\_L$ is K-satisfiable.

Table 4 shows the definitions of modified reductions $\bar{\rho}^{sml}\_L$ and $\bar{\rho}^{ml}\_L$ to SNFsml and SNFml, respectively. In contrast to $\rho^{sml}\_L$, $\bar{\rho}^{sml}\_L$ already uses the bounds in Table 3 to ensure that all labelling sets $S$ occurring in the reduction of a modal formula remain finite. The function $\bar{\rho}^{ml}\_L$ then does not enforce further restrictions, but straightforwardly transforms a finite set of SNFsml clauses with finite labelling sets into a finite set of SNFml clauses. This presentation of the reduction of modal formulae to a finite set of clauses in SNFml is closer to the implementation of the process in the prover KSP.

Given a finite set $\Phi$ of clauses in SNFml we can use a function $\tau^{\mathfrak{f}}$ to obtain an equivalent modal formula as follows:

$$\tau^{\mathfrak{f}}(\Phi) = \bigwedge \{ \Box^{ml} C \mid ml: C \in \Phi \}.$$

where $\Box^{0}\psi = \psi$ and $\Box^{n+1}\psi = \Box\Box^{n}\psi$.

**Table 3.** Bounds on the maximal modal level in SNFsml clauses


$$\bar{\rho}\_L^{ml}(\varphi) = \{ml : \psi \mid S : \psi \in \bar{\rho}\_L^{sml}(\varphi) \text{ and } ml \in S\}$$

$$\bar{\rho}\_L^{sml}(\varphi) = \{\{0\} : t\_\varphi\} \cup \rho\_L^d(\{0\} : t\_\varphi \to \varphi)$$

where $d = d\_L^{ml}(\varphi)$ as per Table 3 and $\rho\_L^d$ is defined as follows:

$$\rho\_L^d(S : t \to \mathbf{true}) = \emptyset$$

$$\rho\_L^d(S : t \to \mathbf{false}) = \{S : \neg t\}$$

$$\rho\_L^d(S : t \to (\psi\_1 \land \psi\_2)) = \{S : \neg t \lor \eta(\psi\_1), S : \neg t \lor \eta(\psi\_2)\} \cup \delta\_L^d(S, \psi\_1) \cup \delta\_L^d(S, \psi\_2)$$

$$\rho\_L^d(S : t \to \psi) = \{S : \neg t \lor \psi\}$$

$$\text{if } \psi \text{ is a disjunction of literals}$$

$$\rho\_L^d(S : t \to (\psi\_1 \lor \psi\_2)) = \{S : \neg t \lor \eta(\psi\_1) \lor \eta(\psi\_2)\} \cup \delta\_L^d(S, \psi\_1) \cup \delta\_L^d(S, \psi\_2)$$

$$\text{if } \psi\_1 \lor \psi\_2 \text{ is not a disjunction of literals}$$

$$\rho\_L^d(S : t \to \lozenge \psi) = \{S : t \to \lozenge \eta(\psi)\} \cup \delta\_L^d(S^+, \psi)$$

$$\rho\_L^d(S : t \to \Box \psi) = P\_L^d(S : t \to \Box \psi) \cup \Delta\_L^d(S : t \to \Box \psi)$$

where η and δ<sup>d</sup> <sup>L</sup> are defined as follows:

$$\eta(\psi) = \begin{cases} \psi, & \text{if } \psi \text{ is a literal} \\ t\_{\psi}, & \text{otherwise} \end{cases} \quad \delta\_L^d(S, \psi) = \begin{cases} \emptyset, & \text{if } \psi \text{ is a literal} \\ \rho\_L^d(S: t\_{\psi} \to \psi), & \text{otherwise} \end{cases}$$

and functions P<sup>d</sup> <sup>L</sup>, lP<sup>d</sup> <sup>L</sup> and lδ<sup>d</sup> <sup>L</sup> are defined as follows:


**Table 4.** $\bar{\rho}^{sml}\_L$- and $\bar{\rho}^{ml}\_L$-reductions of modal formulae to SNFsml and SNFml, respectively, $\Sigma \subseteq \{\mathsf{B}, \mathsf{4}, \mathsf{5}\}$.

A smaller equivalent formula can be constructed as follows. For a finite set $\Phi$ of clauses in SNFml let $\Phi[ml] = \{C \mid ml : C \in \Phi\}$ and $ml\_{\max} = \max\{ml \mid ml : C \in \Phi\}$. Then

$$\tau^{\mathfrak{n}}(\Phi) = \bigwedge \Phi[0] \wedge \Box(\bigwedge \Phi[1] \wedge \Box(\bigwedge \Phi[2] \wedge \cdots \wedge \Box(\bigwedge \Phi[ml\_{\max}]) \cdots)).\tag{1}$$
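The difference between the flat and the nested translation can be illustrated on strings. The sketch below assumes that every clause carries a numeric label (no global clauses), represents a clause as a pair of its level and a clause string, and writes `[]` for the box operator; the names `tau_f` and `tau_n` and the textual syntax are illustrative assumptions.

```python
def tau_f(clauses):
    """Flat translation: prefix each clause with as many boxes as its modal level."""
    return " & ".join("[]" * ml + "(" + c + ")" for ml, c in sorted(clauses))


def tau_n(clauses):
    """Nested translation: one box per level, shared by all clauses below it."""
    ml_max = max(ml for ml, _ in clauses)
    by_level = {ml: [] for ml in range(ml_max + 1)}
    for ml, c in sorted(clauses):
        by_level[ml].append("(" + c + ")")
    formula = None
    for ml in range(ml_max, -1, -1):
        parts = by_level[ml] or ["true"]      # an empty conjunction is true
        if formula is not None:
            parts = parts + ["[](" + formula + ")"]
        formula = " & ".join(parts)
    return formula


phi = [(0, "t0"), (0, "~t0 | t1"), (1, "~t1 | p"), (2, "q"), (2, "~q | r")]
print(tau_f(phi))
print(tau_n(phi))
print(tau_f(phi).count("[]"), tau_n(phi).count("[]"))
```

On this input the flat translation uses five box operators while the nested one uses two, one per non-zero level, which is where the size saving comes from.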

Combining $\bar{\rho}^{ml}\_L$ and $\tau^{\mathfrak{n}}$ we can define a reduction $\rho^{def}\_L$ as

$$
\rho\_L^{def}(\varphi) = \tau^\mathfrak{n}(\bar{\rho}\_L^{ml}(\varphi)).
$$

which we call the *definitional reduction* of ϕ for the modal logic L.

**Theorem 1 (**[30]**).** *Let* $L = \mathrm{K}\Sigma$ *with* $\Sigma \subseteq \{\mathsf{B}, \mathsf{D}, \mathsf{T}, \mathsf{4}, \mathsf{5}\}$ *and* $\varphi$ *be a modal formula in simplified NNF. Then* $\varphi$ *is* $L$*-satisfiable iff* $\rho^{def}\_L(\varphi)$ *is K-satisfiable.*

This reduction allows us to use any reasoner for the basic modal logic K as a reasoner for all the logics in the modal cube.

#### **3.2 Axiomatic Reduction**

The reductions $\rho^{sml}\_L$ and $\rho^{ml}\_L$ in [27] were developed as an alternative to and improvement on reductions from the modal logics KB, KD, KT, and K4 to K introduced by Kracht [18]. In contrast to $\rho^{sml}\_L$ and $\rho^{ml}\_L$, which require modal formulae to be in NNF and treat the modal operators $\Box$ and $\lozenge$ differently, Kracht's reductions assume that (i) modal formulae are not necessarily in NNF and (ii) the only modal operator occurring in modal formulae is $\Box$ and no distinction is made between positive and negative occurrences of this operator. In the following we extend Kracht's reduction to all logics in the modal cube while adhering to those two assumptions.

Let $\boxplus^{\le 0}\psi = \psi$ and $\boxplus^{\le n+1}\psi = (\psi \land \Box\boxplus^{\le n}\psi)$. We can then define a reduction $\rho^{ax}\_L$ for all modal logics $L$ in the modal cube as follows:

$$\rho\_L^{ax}(\varphi) = \varphi \land \boxplus^{\le b\_L^{ax}(\varphi)} \bigwedge P\_L^{ax}(\varphi) \tag{2}$$



where $b^{ax}\_L(\varphi)$ and $P^{ax}\_L(\varphi)$ are as defined in Table 5. We call $\rho^{ax}\_L(\varphi)$ the *axiomatic reduction* of $\varphi$ for the modal logic $L$.
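To see how the bounded box operator in Eq. 2 behaves, it may help to unfold its recursive definition for small bounds. Writing $\boxplus^{\le n}$ for that operator, the first three cases are:

$$\begin{array}{lcl}
\boxplus^{\le 0}\psi & = & \psi \\
\boxplus^{\le 1}\psi & = & \psi \land \Box\psi \\
\boxplus^{\le 2}\psi & = & \psi \land \Box(\psi \land \Box\psi)
\end{array}$$

Since $\Box$ distributes over conjunction in K, the last formula is equivalent to $\psi \land \Box\psi \land \Box\Box\psi$. In general the operator asserts $\psi$ at every modal level from 0 to $n$, and it raises the modal depth of its argument by $n$.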

**Theorem 2** *Let* $L = \mathrm{K}\Sigma$ *with* $\Sigma \subseteq \{\mathsf{B}, \mathsf{D}, \mathsf{T}, \mathsf{4}, \mathsf{5}\}$ *and* $\varphi$ *be a modal formula in simplified NNF. Then* $\varphi$ *is* $L$*-satisfiable iff* $\rho^{ax}\_L(\varphi)$ *is K-satisfiable.*

Just as the definitional reduction, the axiomatic reduction allows us to use any reasoner for basic modal logic as a reasoner for all the logics in the modal cube.

#### **3.3 Discussion**

There are five main differences between the definitional reduction and the axiomatic reduction, and between the axiomatic reduction and the work in [18]:

1. The axiomatic reduction for all logics except the logics KB, KD, KT, K4 is new. Kracht [18] did define a reduction from K5 to K4, but since K5 is not a subset of K4, this reduction is not correct. Our definition of the axiomatic reduction corrects that mistake while remaining close to Kracht's original idea by adding instances of 4 at modal levels greater than 0.

The bounds for KB, KD, and KT given in Table 5 are the same as Kracht's [18]. However, for K4 he used a bound given by the number of distinct subformulae of the formula ϕ under consideration. We are able to show that a bound given by the number of distinct ✷-subformulae is sufficient. For the remaining logics, the bounds are new.


For example, consider the formula ✷✷p in KB. Then with the axiomatic reduction we obtain the formula

$$(\Box \Box p \land \boxplus^{\leq 2}((\neg p \to \Box \neg \Box p) \land (\neg \Box p \to \Box \neg \Box \Box p)))$$

which itself is a formula of modal depth 5. With the definitional reduction, we obtain

$$(t\_{\square \square p} \wedge (t\_{\square \square p} \to \square t\_{\square p}) \wedge \square (t\_{\square p} \to \square p) \wedge (p \lor t\_{\square \neg t\_{\square p}}) \wedge (t\_{\square \neg t\_{\square p}} \to \square \neg t\_{\square p}))$$

which is a formula of modal depth 2.
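The two modal depths in this example can be recomputed mechanically. The sketch below uses a hypothetical tuple encoding of formulae and reads the KB instances in the axiomatic reduction as ¬p → ✷¬✷p and ¬✷p → ✷¬✷✷p, prefixed by the bounded box operator with bound 2; the surrogate names `t_bbp`, `t_bp` and `t_bnt` stand for the three surrogate symbols of the definitional reduction and are illustrative only.

```python
def depth(f):
    """Modal depth: maximal nesting of box/diamond operators."""
    op = f[0]
    if op in ("var", "true", "false"):
        return 0
    if op in ("box", "dia"):
        return 1 + depth(f[1])
    return max(depth(g) for g in f[1:])   # not, and, or, imp


def var(name): return ("var", name)
def box(f): return ("box", f)
def neg(f): return ("not", f)
def imp(f, g): return ("imp", f, g)
def disj(f, g): return ("or", f, g)
def conj(*fs): return ("and",) + fs


def boxes_up_to(n, f):
    """Bounded box operator: f holds at every modal level from 0 to n."""
    return f if n == 0 else conj(f, box(boxes_up_to(n - 1, f)))


p = var("p")
instances = conj(imp(neg(p), box(neg(box(p)))),
                 imp(neg(box(p)), box(neg(box(box(p))))))
axiomatic = conj(box(box(p)), boxes_up_to(2, instances))

t_bbp, t_bp, t_bnt = var("t_bbp"), var("t_bp"), var("t_bnt")
definitional = conj(t_bbp,
                    imp(t_bbp, box(t_bp)),
                    box(imp(t_bp, box(p))),
                    disj(p, t_bnt),
                    imp(t_bnt, box(neg(t_bp))))

print(depth(axiomatic), depth(definitional))
```

The instances have modal depth 3 and the bounded box adds 2, giving 5 for the axiomatic reduction, while the deepest conjunct of the definitional reduction nests only two boxes.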

Taking this into account we can see that for $L = \mathrm{K}\Sigma$ where $\Sigma \subseteq \{\mathsf{B}, \mathsf{D}, \mathsf{T}\}$ we can expect the modal depth of $\rho^{def}\_L(\varphi)$ to be less than or equal to that of $\rho^{ax}\_L(\varphi)$, while for the remaining logics of the modal cube it depends on the individual formula which reduction will produce a formula of greater modal depth. Nevertheless, for logics such as K4 we expect that the modal depth of $\rho^{ax}\_L(\varphi)$ will often be drastically lower than that of $\rho^{def}\_L(\varphi)$.


We will revisit the effect that Points 2, 3, and 4 have on the size and modal depths of formulae, on the performance of provers, and the models they may produce in the next section.

#### **4 Evaluation**

In our evaluation we compare the effect of using the definitional reduction and the axiomatic reduction as input for three provers for K: CEGARBox [10], Spartacus [13], and KSP [24,30]. Spartacus and CEGARBox were included as they showed the best performance in recent evaluations [10,24–26,29,30] when compared with several other provers with built-in support for modal logics: BDDTab [12], FaCT++ [34], InKreSAT [16], SPASS [33], and Leo-III+**E** [8,31].

We have included two more approaches in the comparison: (i) the *global modal resolution* (GMR) calculi [21] that include specific inference rules for each of the logics in the modal cube, implemented in KSP; (ii) the *modal layered resolution* (MLR) calculi [22] together with the reductions given in Table 4, again implemented in KSP. The first is an example of 'native' reasoning in the logics concerned, while the inclusion of the latter allows us to investigate the effect of 'internalising' the reduction and having inference rules that operate on modal clauses. Both calculi support several refinements of resolution. We report only results for the ordered refinement (cord) as it was the best performing overall.

The two reductions combined with CEGARBox, Spartacus, and KSP and the GMR and MLR calculi in KSP give us a total of eight different approaches.

We have used the benchmarks introduced in [27], which comprise<sup>1</sup> (i) 100 unsatisfiable formulae for each of the logics being considered; these are based on 20 formulae each from 5 classes of the LWB benchmark collection [3] modified so that the formulae for logic L are only unsatisfiable in L and its extensions; and also (ii) 100 formulae that are S5-satisfiable, that is, formulae that are satisfiable in all 15 logics; these consist of 20 formulae each from 5 classes of the LWB benchmark collection.

We have supplied all reductions and provers with preprocessed formulae extracted from KSP. The simplified negation normal form for a formula <sup>ϕ</sup>, nnf(ϕ), is generated by KSP as follows. First, the formula is rewritten into box normal form [28], a normal form similar to the negation normal form, but where the operator ✸ is rewritten as ¬✷¬. To the resulting formula, we apply prenexing [20], that is, moving the modal operators outwards as much as possible. The simplification rules given in Table 2 are then applied together with pure literal elimination (i.e. replacing occurrences of pure literals by **true**) and constant propagation. Table 6 shows the effect of all these preprocessing steps on average size, average modal depth, and average number of boxes in our benchmark formulae, separately for unsatisfiable (U) and satisfiable (S) formulae. Over all formulae we get a 20% reduction in size and a 66% reduction in the number of ✷-operators. The modal depth remains unchanged which is an indication of the robustness of the benchmarks.

For the axiomatic reduction, the resulting formula is then extracted from KSP and the reduction according to Eq. 2 and Table 5 is applied externally. For the definitional reduction, the formula is not extracted but transformed by KSP into SNFml according to Tables 3 and 4. During the transformation into the normal form, complex subformulae are replaced by the same symbol in all positions they might occur. After transformation into SNFml, the kept clauses are extracted from KSP and used to produce the modal formula for the definitional reduction according to Eq. 1.

**Table 6.** Effect of preprocessing on benchmark formulae

<sup>1</sup> Input files for the provers used here and the source for KSP are available at http://nalon.org/#software.

Table 7 shows experimental results comparing the performance of the eight approaches. The first three columns of the table show the logic, the satisfiability status of the formulae in our benchmark collection used for this logic ('U' for 'unsatisfiable', 'S' for 'satisfiable'), and their number. In total we have 30 *sets of benchmark formulae*. The next eight columns then show how many of those formulae were solved by each of the eight approaches. A time limit of 100 CPU seconds was set for each formula and, where a reduction is used, the time taken includes the computation of the reduction. The highest number or numbers in each row are highlighted in bold. The last six columns show the results for $\rho^{def}\_L$ and $\rho^{ax}\_L$ combined with CEGARBox, Spartacus, and KSP. Here, for each logic $L$ and each satisfiability status we have indicated with italics which reduction resulted in better performance for each of the three provers. In the following we call each such pair a *comparison point*. Benchmarking was performed on a PC with an AMD Ryzen 5 5600X CPU @ 4.60 GHz max and 64 GB main memory using Fedora release 37 as operating system.

For both satisfiable and unsatisfiable benchmark formulae, the combination of the definitional reduction with CEGARBox performs best. Overall, it solves 25% more formulae than the second best approach, the GMR calculi in KSP. CEGARBox with the definitional reduction also outperforms CEGARBox with the axiomatic reduction on both satisfiable and unsatisfiable benchmark formulae. The same is true for the MLR calculus in KSP when combined with one of the two reductions and for Spartacus on satisfiable benchmark formulae when combined with one of the two reductions.

We can see that the internal transformation to SNFml together with the MLR calculus in KSP performs better than first computing the definitional reduction ρ*def* <sup>L</sup> and then handing the resulting formula to KSP. The former approach performs better on 26 out of 30 sets of benchmark formulae. This is not surprising since in the latter case KSP does apply the transformation into SNFml again. This implies that new propositional symbols are introduced when applying renaming and new clauses are added defining those symbols. Also, for the ordered resolution refinement we use, all literals in the scope of modal operators will be renamed in order to retain completeness [22]. Again, for each renamed literal there will be an additional clause. Overall, KSP will perform inferences with a larger set of SNFml clauses over a larger set of propositional symbols. This is bound to degrade performance in most cases.

Looking at individual logics, a more varied picture is evident. Consider both satisfiable and unsatisfiable benchmark formulae for the logics K5, KD5, K4B (which is the same logic as K5B), K45, KD45, and S5 and the behaviour of Spartacus and KSP with one of the two reductions on these. Of these 24 comparison points, the axiomatic reduction results in better performance on 21 and the definitional reduction only on 3. In particular, Spartacus with the axiomatic reduction consistently shows better performance for these logics than with the definitional reduction. In stark contrast, CEGARBox with the definitional reduction still performs better on 10 out of 12 comparison points. Interestingly, this advantage of the axiomatic reduction does not carry over to K4 and its extensions KD4 and S4. Here, with the exception of 3 out of 18 comparison points, the definitional reduction with one of CEGARBox, KSP, and Spartacus leads to better performance than the axiomatic reduction.

**Table 7.** Performance of KΣ provers, $\rho^{def}\_L$ combined with K provers and $\rho^{ax}\_L$ combined with K provers


**Table 8.** Comparison of axiomatic and definitional reduction combined with Spartacus on satisfiable benchmark formulae.

We can gain additional insight by looking in more detail at the behaviour of provers. While this would be most beneficial for CEGARBox, this tool currently only outputs the satisfiability status of formulae but neither models nor proofs. Instead we turn to Spartacus which can output models for satisfiable formulae. Table 8 shows information on the input formulae that were given to Spartacus, resulting from one of our reductions, and the models that Spartacus produced. The first four columns show the logic, the reduction that was used, how many satisfiable benchmark formulae (out of 100) Spartacus was able to solve, and how many formulae it was able to solve with both reductions. The number in the fourth column is not necessarily the minimum of the two numbers in the third column for a particular logic. The next two columns contain the average size and average modal depth of $\rho^{def}\_L$ and $\rho^{ax}\_L$ for the formulae that Spartacus solved with both reductions. Finally, the last three columns contain the average number of worlds, number of edges, and depth of the models for these formulae. Spartacus uses blocking, even for the modal logic K, and the models it produces are not trees but general graphs. A fine-grained analysis on the level of individual formulae shows that, with the exception of the logic KB4, it is generally the case that the reduction that produces smaller formulae leads Spartacus to produce smaller models, and thereby also leads to more formulae being solved. Only for KB4 are there more instances where a larger formula resulting from a reduction, namely the definitional reduction, leads to smaller models. However, it is still the case that the axiomatic reduction then allows more formulae to be solved for KB4.

**Table 9.** Comparison of axiomatic and definitional reduction combined with KSP on unsatisfiable benchmark formulae.

For unsatisfiable formulae we consider KSP. Table 9 shows information on the input formulae that were given to KSP, resulting from one of our reductions, and the proof search conducted by KSP. The first six columns correspond to those in Table 8. The final three columns contain the average number of inference steps KSP requires to find a proof, the average size of those proofs, and the average maximal modal level of a clause in those proofs. Again we see that the reduction that produces smaller formulae, with few exceptions, also leads KSP to find proofs in fewer inference steps and allows it to solve more formulae.

### **5 Conclusions**

The axiomatic and the definitional reductions from logics in the modal cube to basic modal logic that we have presented in this paper allow any decision procedure for basic modal logic to be used to solve the satisfiability problem in all 15 logics of the modal cube. This is of particular interest as over the last 25 years, a range of decision procedures for basic modal logic have been implemented and improved [2,6,7,10–13,15,34] but only a few implemented decision procedures for all logics of the modal cube exist. Our empirical results also indicate that such reductions are not only a theoretical possibility but are effective and efficient: the combination of the definitional reduction with CEGARBox is currently the best performing approach on our collection of benchmark formulae for the modal cube. There are a number of other contributing factors to the efficiency of the approach that are also beneficial outside the context of reductions. Preprocessing techniques such as simplification and prenexing can reduce the size and, in the context of modal logics, the number of modal operators in a modal formula. The use of surrogate propositional symbols and of a clausal normal form again allows the size and structural complexity of formulae to be reduced.

Despite the positive empirical results, we nevertheless hope that more provers that natively support all the logics of the modal cube will be implemented. At the moment our comparison is limited to our own resolution-based prover KSP. Support for modal logics except K in other provers is often limited to KD, KT, and S4. A wider range of provers for all logics in the modal cube would allow us to establish the robustness of our empirical results and possibly enable us to identify strengths and weaknesses relative to native provers. It would be beneficial if such support for native reasoning in the logics of the modal cube also included the provision of proofs for unsatisfiable formulae and models for satisfiable formulae as well as some abstract measure of the computational effort expended in finding those. This is paramount for our ability to explain the behaviour of provers on our benchmarks.

Finally, our collection of benchmark formulae requires further refinement. Some of the satisfiable formulae in that collection seem to allow rather small models and overall do not appear to be sufficiently challenging across all the logics. We will need to investigate whether this can be remedied simply by moving to higher parameter values for these parameterised classes of formulae or whether completely new classes of formulae are required.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Left-Linear Completion with AC Axioms

Johannes Niederhauser<sup>1</sup>, Nao Hirokawa<sup>2</sup>, and Aart Middeldorp<sup>1</sup>

<sup>1</sup> Department of Computer Science, Universität Innsbruck, Innsbruck, Austria, johannes.niederhauser@student.uibk.ac.at, aart.middeldorp@uibk.ac.at

<sup>2</sup> School of Information Science, JAIST, Nomi, Japan, hirokawa@jaist.ac.jp

Abstract. We revisit AC completion for left-linear term rewrite systems where AC unification is avoided and the normal rewrite relation can be used in order to decide validity questions. To that end, we give a new correctness proof for finite runs and establish a simulation result between the two inference systems known from the literature. Furthermore, we show how left-linear AC completion can be simulated by general AC completion. In particular, this result allows us to switch from the former to the latter at any point during a completion process. Finally, we present experimental results for our implementation of left-linear AC completion in the tool accompll.

Keywords: Completion · AC axioms · Term rewriting

### 1 Introduction

Completion has been extensively studied since its introduction in the seminal paper by Knuth and Bendix [10]. One of the main limitations of the original formulation is its inability to deal with equations which cannot be oriented into a terminating rule such as the commutativity axiom. This shortcoming can be resolved by completion modulo an equational theory $E$. In the literature, there are two different approaches to achieving this. The general approach [3,6] requires $E$-unification and allows us to decide validity problems using the rewrite relation $\to\_{R/E}$ which is defined as $\leftrightarrow^{\ast}\_{E} \cdot \to\_{R} \cdot \leftrightarrow^{\ast}\_{E}$. For left-linear term rewrite systems, however, there is Huet's approach [5] which avoids $E$-unification and allows us to decide validity problems with the normal rewrite relation $\to\_{R}$ and a single check for $E$-equivalence of the computed normal forms. In their respective books, Avenhaus [1] and Bachmair [3] present inference systems for left-linear completion modulo an equational theory. In this paper, we revisit slightly modified versions (A and B) of these inference systems for finite runs. In addition to a new correctness proof for A in the spirit of [4] which does not rely on proof orderings (Sect. 3), we reduce correctness of B to the correctness of A by establishing a simulation result between finite runs in these systems (Sect. 4). For the concrete equational theory of associative and commutative (AC) function symbols, we also show the connection between the inference system A and general AC completion by means of another simulation result (Sect. 5). Finally, we present experimental results obtained from our implementation of A for AC in the tool accompll which show that the avoidance of AC unification can result in significant performance improvements over general AC completion (Sects. 6 and 7).

Supported by JSPS KAKENHI Grant Number JP22K11900.

© The Author(s) 2023

B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 401–418, 2023. https://doi.org/10.1007/978-3-031-38499-8\_23

### 2 Preliminaries

We assume familiarity with term rewriting and completion as described e.g. in [2] but recall some central notions. We consider term rewriting systems (TRSs) which operate on terms over a given signature $\mathcal{F}$. Terms which do not contain the same variable more than once are referred to as *linear* terms. We say that a TRS $\mathcal{R}$ is *left-linear* if $\ell$ is a linear term for every rule $\ell \to r \in \mathcal{R}$. A TRS $\mathcal{R}$ is *terminating* if the associated rewrite relation $\to_{\mathcal{R}}$ is well-founded. In that case, we write $s \to^{!}_{\mathcal{R}} t$ if $t$ is a normal form of $s$. A TRS $\mathcal{R}$ is *confluent* if different computation paths can always be joined, i.e., ${}^{*}_{\mathcal{R}}{\leftarrow} \cdot \to^{*}_{\mathcal{R}} \;\subseteq\; \to^{*}_{\mathcal{R}} \cdot {}^{*}_{\mathcal{R}}{\leftarrow}$. An important sufficient criterion for confluence is the well-known critical pair lemma which states that a terminating TRS is confluent if all non-trivial overlaps between left-hand sides of rules (*critical pairs*) are joinable. Furthermore, there is the notion of *prime critical pairs* [8] which further restricts the considered critical peaks $t \mathrel{{}^{p}_{\mathcal{R}}{\leftarrow}} s \to^{\epsilon}_{\mathcal{R}} u$ to the ones where all proper subterms of $s|_p$ are irreducible. In particular, terminating TRSs whose prime critical pairs are joinable are also confluent. The set of (prime) critical pairs is denoted by $\mathsf{CP}(\mathcal{R})$ ($\mathsf{PCP}(\mathcal{R})$). We define $\mathsf{CP}(\mathcal{R}_1, \mathcal{R}_2)$ as the set of all critical pairs stemming from local peaks of the form $t \mathrel{{}^{p}_{\mathcal{R}_1}{\leftarrow}} s \to^{\epsilon}_{\mathcal{R}_2} u$ and $\mathsf{CP}^{\pm}(\mathcal{R}_1, \mathcal{R}_2) = \mathsf{CP}(\mathcal{R}_1, \mathcal{R}_2) \cup \mathsf{CP}(\mathcal{R}_2, \mathcal{R}_1)$. A TRS is *complete* if it is terminating and confluent. Hence, a complete presentation $\mathcal{R}$ of an equational system (ES) $\mathcal{E}$ can be used to decide the validity problem for $\mathcal{E}$: $s \leftrightarrow^{*}_{\mathcal{E}} t$ if and only if $s \to^{!}_{\mathcal{R}} \cdot {}^{!}_{\mathcal{R}}{\leftarrow} t$.
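The notions of linear terms and left-linear TRSs can be illustrated by a minimal sketch. The term encoding used here (variables as strings, function applications as tuples) and all function names are assumptions made for illustration; they are not taken from the paper or from any tool.

```python
from collections import Counter

# Hypothetical term encoding: a variable is a plain string, a function
# application is a tuple (symbol, arg1, ..., argn); constants have no arguments.

def variables(term):
    """Count the occurrences of every variable in a term."""
    if isinstance(term, str):
        return Counter([term])
    counts = Counter()
    for arg in term[1:]:
        counts += variables(arg)
    return counts

def is_linear(term):
    """A term is linear if no variable occurs more than once."""
    return all(n == 1 for n in variables(term).values())

def is_left_linear(rules):
    """A TRS (list of (lhs, rhs) pairs) is left-linear if every lhs is linear."""
    return all(is_linear(lhs) for lhs, _rhs in rules)

unit_rule = (("+", "x", ("0",)), "x")   # x + 0 -> x
idem_rule = (("f", "x", "x"), "x")      # f(x, x) -> x

print(is_left_linear([unit_rule]))      # lhs x + 0 is linear
print(is_left_linear([idem_rule]))      # lhs f(x, x) repeats x
```

Only left-hand sides are inspected, so a rule such as $x + \mathsf{0} \to x$ is accepted while $\mathsf{f}(x, x) \to x$ is rejected.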

We now turn our attention to rewriting modulo AC function symbols. To that end, we start by giving general definitions for abstract rewrite systems (ARSs). Let $\mathcal{A} = \langle A, \to \rangle$ be an ARS and $\sim$ an equivalence relation on $A$. We write $\Leftrightarrow$ for ${\leftarrow} \cup {\to} \cup {\sim}$, $\to/{\sim}$ for ${\sim} \cdot {\to} \cdot {\sim}$ and $\downarrow^{\sim}$ for ${\to^{*}} \cdot {\sim} \cdot {}^{*}{\leftarrow}$. Given $\mathcal{A}$, we denote $\langle A, \to/{\sim} \rangle$ by $\mathcal{A}/{\sim}$. The ARS $\mathcal{A}$ is *terminating modulo* $\sim$ if there are no infinite rewrite sequences with $\to/{\sim}$ and *Church–Rosser modulo* $\sim$ if ${\Leftrightarrow^{*}} \subseteq {\downarrow^{\sim}}$. The ARS $\mathcal{A}$ is *complete modulo* $\sim$ if it is terminating modulo $\sim$ and Church–Rosser modulo $\sim$. While there is no distinction for termination modulo $\sim$ between $\mathcal{A}$ and $\mathcal{A}/{\sim}$ (${\sim} \cdot {\sim} = {\sim}$ by transitivity), it makes a considerable difference whether we talk about the Church–Rosser modulo $\sim$ property and therefore completeness modulo $\sim$ of $\mathcal{A}$ or $\mathcal{A}/{\sim}$. The following lemma is taken from [1, Lemma 4.1.12]. It establishes an important connection between the Church–Rosser modulo $\sim$ property of an ARS $\mathcal{A}$ and $\mathcal{A}/{\sim}$.

Lemma 1. *Let* A = A,→ *and* A = A, *be ARSs and* ∼ *an equivalence relation on* A *such that* → ⊆ ⊆ →/∼*. If* A *is Church–Rosser modulo* ∼ *then* A/∼ *is Church–Rosser modulo* ∼*.*

The definitions and results for ARSs carry over to TRSs by replacing the equivalence relation $\sim$ by the equational theory ${\leftrightarrow}^{*}_{\mathcal{B}}$ of an ES $\mathcal{B}$. Most theoretical results of this paper are not specific to AC but hold for an arbitrary base theory $\mathcal{B}$ of which we only demand that $\mathcal{V}\mathsf{ar}(\ell) = \mathcal{V}\mathsf{ar}(r)$ for all $\ell \approx r \in \mathcal{B}$. We abbreviate ${\leftrightarrow}^{*}_{\mathcal{B}}$ by $\sim_{\mathcal{B}}$ and the rewrite relation $\to_{\mathcal{R}/\mathcal{B}}$ is defined as ${\sim_{\mathcal{B}}} \cdot {\to_{\mathcal{R}}} \cdot {\sim_{\mathcal{B}}}$. Furthermore, we write $\downarrow^{\sim}_{\mathcal{R}}$ for the relation ${\to^{*}_{\mathcal{R}}} \cdot {\sim_{\mathcal{B}}} \cdot {}^{*}_{\mathcal{R}}{\leftarrow}$. Termination modulo $\mathcal{B}$ is shown by $\mathcal{B}$*-compatible reduction orders* $>$, i.e., $>$ is well-founded, closed under contexts and substitutions and ${\sim_{\mathcal{B}}} \cdot {>} \cdot {\sim_{\mathcal{B}}} \subseteq {>}$. This paper deals with a completion procedure which produces TRSs $\mathcal{R}$ such that $\mathcal{R}$ (rather than $\mathcal{R}/\mathcal{B}$) is complete modulo $\mathcal{B}$. In particular, the completion procedure uses the joinability with respect to $\downarrow^{\sim}_{\mathcal{R}}$ of $\mathsf{CP}(\mathcal{R}) \cup \mathsf{CP}^{\pm}(\mathcal{R}, \mathcal{B}^{\pm})$ where $\mathcal{B}^{\pm}$ denotes $\mathcal{B} \cup \{r \approx \ell \mid \ell \approx r \in \mathcal{B}\}$ as a sufficient and necessary criterion for the Church–Rosser modulo $\mathcal{B}$ property of a $\mathcal{B}$-terminating TRS $\mathcal{R}$. Note that this criterion works with standard critical pairs and therefore does not need unification modulo $\mathcal{B}$. However, the criterion is not valid for non-left-linear TRSs as the following example shows.

*Example 1.* Consider the TRS $\mathcal{R}$ consisting of the single rule $\mathsf{f}(x, x) \to x$ with $+$ as an additional AC function symbol. There are no critical pairs in $\mathcal{R}$ and between $\mathcal{R}$ and $\mathsf{AC}$, so $\mathsf{CP}(\mathcal{R}) = \mathsf{CP}^{\pm}(\mathcal{R}, \mathsf{AC}^{\pm}) = \varnothing$. Now consider the conversion $\mathsf{f}(x + y, y + x) \sim_{\mathsf{AC}} \mathsf{f}(x + y, x + y) \to_{\mathcal{R}} x + y$. According to the criterion, $\mathsf{f}(x + y, y + x) \downarrow^{\sim}_{\mathcal{R}} x + y$ should hold, but this is clearly not the case.
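The failure in Example 1 can be replayed with a small sketch. It assumes a hypothetical tuple encoding of terms and decides AC-equivalence by flattening and sorting the arguments of AC symbols; the helper names are illustrative only, and the variables `x` and `y` are treated as fixed symbols of the terms under consideration.

```python
AC_SYMBOLS = {"+"}  # assumption: only + is an AC symbol, as in Example 1

def ac_canon(term):
    """Canonical representative modulo AC: flatten nested AC symbols, sort arguments."""
    if isinstance(term, str):
        return term
    head, args = term[0], [ac_canon(a) for a in term[1:]]
    if head in AC_SYMBOLS:
        flat = []
        for a in args:
            if isinstance(a, tuple) and a[0] == head:
                flat.extend(a[1:])      # a is already flattened
            else:
                flat.append(a)
        args = sorted(flat, key=repr)
    return (head, *args)

def ac_equivalent(s, t):
    return ac_canon(s) == ac_canon(t)

def idem_applies_at_root(term):
    """f(x, x) -> x matches at the root only if both arguments are syntactically equal."""
    return (isinstance(term, tuple) and term[0] == "f"
            and len(term) == 3 and term[1] == term[2])

s = ("f", ("+", "x", "y"), ("+", "y", "x"))   # f(x + y, y + x)
t = ("f", ("+", "x", "y"), ("+", "x", "y"))   # f(x + y, x + y)

print(ac_equivalent(s, t))          # the two terms are AC-equivalent
print(idem_applies_at_root(s))      # but the rule does not match s itself
print(idem_applies_at_root(t))      # it matches only the rearranged term
```

Since `f` occurs only at the root of `s`, the term is a normal form with respect to $\to_{\mathcal{R}}$, which is why plain rewriting followed by an AC-equivalence check cannot join it with $x + y$.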

### 3 Avenhaus' Inference System

The idea of completion modulo an equational theory B for left-linear systems where the normal rewrite relation can be used to decide validity problems has been put forward by Huet [5]. To the best of our knowledge, inference systems for this approach are only presented in the books by Avenhaus [1] and Bachmair [3]. This section presents a new correctness proof of a version of Avenhaus' inference system for finite runs in the spirit of [4] which does not rely on proof orderings. Correctness of Bachmair's system is established by a simulation result in Sect. 4.

#### 3.1 Inference System

Definition 1. *The inference system* $\mathsf{A}$ *is parameterized by a fixed* $\mathcal{B}$*-compatible reduction order* $>$ *on terms. It transforms pairs consisting of an ES* $\mathcal{E}$ *and a TRS* $\mathcal{R}$ *over the common signature* $\mathcal{F}$ *according to the following inference rules where* $s \mathrel{\dot\approx} t$ *denotes either* $s \approx t$ *or* $t \approx s$*:*

$$\begin{array}{llllll}
\mathsf{deduce} & \dfrac{\mathcal{E}, \mathcal{R}}{\mathcal{E} \cup \{s \approx t\}, \mathcal{R}} & \text{if } s \mathrel{{}_{\mathcal{R}}{\leftarrow}} \cdot \to_{\mathcal{R}} t & \mathsf{orient} & \dfrac{\mathcal{E} \uplus \{s \mathrel{\dot\approx} t\}, \mathcal{R}}{\mathcal{E}, \mathcal{R} \cup \{s \to t\}} & \text{if } s > t \\[2ex]
\mathsf{deduce} & \dfrac{\mathcal{E}, \mathcal{R}}{\mathcal{E}, \mathcal{R} \cup \{t \to s\}} & \text{if } s \mathrel{{}_{\mathcal{R}}{\leftarrow}} \cdot \leftrightarrow_{\mathcal{B}} t & \mathsf{delete} & \dfrac{\mathcal{E} \uplus \{s \approx t\}, \mathcal{R}}{\mathcal{E}, \mathcal{R}} & \text{if } s \sim_{\mathcal{B}} t \\[2ex]
\mathsf{simplify} & \dfrac{\mathcal{E} \uplus \{s \mathrel{\dot\approx} t\}, \mathcal{R}}{\mathcal{E} \cup \{u \approx t\}, \mathcal{R}} & \text{if } s \to_{\mathcal{R}/\mathcal{B}} u & \mathsf{collapse} & \dfrac{\mathcal{E}, \mathcal{R} \uplus \{t \to s\}}{\mathcal{E} \cup \{u \approx s\}, \mathcal{R}} & \text{if } t \to_{\mathcal{R}} u \\[2ex]
\mathsf{compose} & \dfrac{\mathcal{E}, \mathcal{R} \uplus \{s \to t\}}{\mathcal{E}, \mathcal{R} \cup \{s \to u\}} & \text{if } t \to_{\mathcal{R}/\mathcal{B}} u
\end{array}$$

A step in an inference system $\mathsf{I}$ from an ES $\mathcal{E}$ and a TRS $\mathcal{R}$ to an ES $\mathcal{E}'$ and a TRS $\mathcal{R}'$ is denoted by $(\mathcal{E}, \mathcal{R}) \vdash_{\mathsf{I}} (\mathcal{E}', \mathcal{R}')$. The parentheses of the pairs are only used when the expression is surrounded by text in order to increase readability. In the following, $\mathsf{PCP}^{\pm}(\mathcal{R}, \mathcal{B}^{\pm})$ denotes the restriction of $\mathsf{CP}^{\pm}(\mathcal{R}, \mathcal{B}^{\pm})$ to prime critical pairs but where irreducibility is always checked with respect to $\mathcal{R}$, i.e., the critical peaks $t \mathrel{{}^{p}_{\mathcal{R}}{\leftarrow}} s \leftrightarrow^{\epsilon}_{\mathcal{B}} u$ and $t \leftrightarrow^{p}_{\mathcal{B}} s \to^{\epsilon}_{\mathcal{R}} u$ are both prime if all proper subterms of $s|_p$ are irreducible with respect to $\mathcal{R}$.

Definition 2. *Let* E *be an ES. A finite sequence*

$$\mathcal{E}\_0, \mathcal{R}\_0 \vdash\_{\mathsf{A}} \mathcal{E}\_1, \mathcal{R}\_1 \vdash\_{\mathsf{A}} \dots \vdash\_{\mathsf{A}} \mathcal{E}\_n, \mathcal{R}\_n$$

*with* $\mathcal{E}_0 = \mathcal{E}$ *and* $\mathcal{R}_0 = \varnothing$ *is a* run *for* $\mathcal{E}$*. If* $\mathcal{E}_n \neq \varnothing$*, the run* fails*. The run is* fair *if* $\mathcal{R}_n$ *is left-linear and the following inclusions hold:*

$$\mathsf{PCP}(\mathcal{R}_n) \subseteq {\downarrow}^{\sim}_{\mathcal{R}_n} \cup \bigcup_{i=0}^{n} {\leftrightarrow}_{\mathcal{E}_i \cup \mathcal{R}_i} \qquad \mathsf{PCP}^{\pm}(\mathcal{R}_n, \mathcal{B}^{\pm}) \subseteq {\downarrow}^{\sim}_{\mathcal{R}_n} \cup \bigcup_{i=0}^{n} {\leftrightarrow}_{\mathcal{R}_i}$$

Intuitively, fair and non-failing runs yield a $\mathcal{B}$-complete presentation $\mathcal{R}_n$ of the initial set of equations $\mathcal{E}$, i.e., ${\leftrightarrow}^{*}_{\mathcal{E} \cup \mathcal{B}} = {\leftrightarrow}^{*}_{\mathcal{R}_n \cup \mathcal{B}} \subseteq {\downarrow}^{\sim}_{\mathcal{R}_n}$. In particular, the inference rules are designed to preserve the equational theory augmented by $\mathcal{B}$. The following example shows that deducing *local cliffs* (${}_{\mathcal{R}}{\leftarrow} \cdot \leftrightarrow_{\mathcal{B}}$) as rules as well as the restriction to $\to_{\mathcal{R}}$ in the collapse rule are crucial properties of the inference system.

*Example 2.* Consider the ES $\mathcal{E}$ consisting of the single equation $x + \mathbf{0} \approx x$ where $+$ is an AC function symbol. We clearly have $\mathbf{0} + x \leftrightarrow^{*}_{\mathcal{E} \cup \mathsf{AC}} x$, so an AC complete system $\mathcal{C}$ representing $\mathcal{E}$ has to satisfy $\mathbf{0} + x \downarrow^{\sim}_{\mathcal{C}} x$. There is just one way to orient the only equation in $\mathcal{E}$, which results in the rule $x + \mathbf{0} \to x$. Since we want our run to be fair, we add the rules stemming from the prime critical pairs between $x + \mathbf{0} \to x$ and $\mathsf{AC}^{\pm}$:

$$\mathbf{0} + x \rightarrow x \quad x + (\mathbf{0} + y) \rightarrow x + y \quad x + (y + \mathbf{0}) \rightarrow x + y \quad (x + y) + \mathbf{0} \rightarrow x + y$$

If collapsing with $\to_{\mathcal{R}/\mathsf{AC}}$ is allowed, all these rules become trivial equations and can therefore be deleted. Thus, the modified inference system allows for a fair run which is not complete as $\mathbf{0} + x \downarrow^{\sim}_{\mathcal{R}} x$ does not hold for $\mathcal{R} = \{x + \mathbf{0} \to x\}$. Furthermore, if we add pairs of terms stemming from local cliffs as equations, we get the same result by applications of simplify.

The inference system presented in Definition 1 is almost the same as the one presented by Avenhaus in [1]. However, since we only consider finite runs, the encompassment condition for the collapse rule has been removed in the spirit of [13]. The following example shows that this can lead to smaller B-complete systems.

*Example 3.* Consider the ES $\mathcal{E} = \{\mathbf{f}(x + y) \approx \mathbf{f}(x) + \mathbf{f}(y)\}$ where $+$ is an AC symbol. The inference system presented in [1] produces the AC complete system

$$\mathbf{f}(x+y) \to \mathbf{f}(x) + \mathbf{f}(y) \qquad \qquad \qquad \mathbf{f}(y+x) \to \mathbf{f}(x) + \mathbf{f}(y)$$

in which either of the rules could be collapsed if it was allowed to collapse with the other rule. In [1] this is prevented by an encompassment condition which essentially forbids collapsing at the root position with a rewrite rule whose left-hand side is a variant of the left-hand side of the rule which should be collapsed. However, this is possible with the system presented in this paper, so for an AC complete representation just one of the two rules suffices.

#### 3.2 Confluence Criterion

The confluence criterion used in the correctness proof of $\mathsf{A}$ is an extended version of the one used in [4] which we dub *peak-and-cliff decreasingness*. In the following, we assume that equivalence relations $\sim$ are defined as the reflexive and transitive closure of a symmetric relation $\mathrel{\vdash\!\dashv}$, so ${\sim} = {\mathrel{\vdash\!\dashv}}^{*}$. Furthermore, we assume that steps are labeled with labels from a set $I$, so let $\mathcal{A} = \langle A, \{\to_{\alpha}\}_{\alpha \in I} \rangle$ be an ARS and ${\sim} = (\bigcup_{\alpha \in I} {\mathrel{\vdash\!\dashv}}_{\alpha})^{*}$ an equivalence relation on $A$.

Definition 3. *The ARS* A *is* peak-and-cliff decreasing *if there is a well-founded order* > *on* I *such that for all* α, β ∈ I *the inclusions*

$$
{}_{\alpha}{\leftarrow} \cdot \to_{\beta} \;\subseteq\; {\Leftrightarrow}^{*}_{\vee\alpha\beta} \qquad\qquad {}_{\alpha}{\leftarrow} \cdot {\mathrel{\vdash\!\dashv}}_{\beta} \;\subseteq\; {\Leftrightarrow}^{*}_{\vee\alpha} \cdot {}^{=}_{\beta}{\leftarrow}
$$

*hold. Here* $\vee\alpha\beta$ *denotes the set* $\{\gamma \in I \mid \alpha > \gamma \text{ or } \beta > \gamma\}$ *and if* $J \subseteq I$ *then* $\to_{J}$ *denotes* $\bigcup_{\gamma \in J} \to_{\gamma}$*. We simplify* $\vee\alpha\alpha$ *to* $\vee\alpha$*.*

Lemma 2. *Every conversion modulo* ∼ *is either a valley modulo* ∼ *or contains a local peak or cliff:*

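$${\Leftrightarrow}^{*} \;\subseteq\; {\downarrow}^{\sim} \;\cup\; {\Leftrightarrow}^{*} \cdot {\leftarrow} \cdot {\to} \cdot {\Leftrightarrow}^{*} \;\cup\; {\Leftrightarrow}^{*} \cdot {\mathrel{\vdash\!\dashv}} \cdot {\to} \cdot {\Leftrightarrow}^{*} \;\cup\; {\Leftrightarrow}^{*} \cdot {\leftarrow} \cdot {\mathrel{\vdash\!\dashv}} \cdot {\Leftrightarrow}^{*}$$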

The proof of the following theorem is based on a well-founded order on multisets. We denote the multiset extension of an order > by >mul. It is well-known that the multiset extension of a well-founded order is also well-founded.
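As an illustration of the multiset extension, the following sketch compares finite multisets using the standard characterization: $M >_{\mathsf{mul}} N$ holds if $M \neq N$ and every element of $N - M$ is dominated by some element of $M - N$. The function name `mul_greater` and the use of integers as labels are assumptions made for this example only.

```python
from collections import Counter

def mul_greater(m, n, greater):
    """Decide M >mul N for finite multisets (iterables) and a strict order `greater`."""
    m, n = Counter(m), Counter(n)
    only_m, only_n = m - n, n - m            # multiset differences M - N and N - M
    if not only_m and not only_n:
        return False                         # M = N
    # every element removed from N's side must be dominated by one from M's side
    return all(any(greater(a, b) for a in only_m) for b in only_n)

gt = lambda a, b: a > b                      # labels ordered like the integers

# Replacing the labels {5, 4} of a local cliff by labels below 5 plus one 4:
print(mul_greater([5, 4], [3, 3, 4], gt))
# A single larger label is not dominated by smaller ones:
print(mul_greater([2, 2], [3], gt))
```

The first call mirrors the shape of the replacement used for local cliffs, where the new conversion may keep one step with the label of the equivalence step while all other labels are strictly smaller than the label of the rewrite step.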

Theorem 1. *If* A *is peak-and-cliff decreasing then* A *is Church–Rosser modulo* ∼*.*

*Proof.* With every conversion $C$ we associate a multiset $M_C$ consisting of labels of its rewrite and equivalence relation steps. Since $\mathcal{A}$ is peak-and-cliff decreasing, there is a well-founded order $>$ on $I$ which allows us to replace conversions $C$ of the forms ${}_{\alpha}{\leftarrow} \cdot \to_{\beta}$, ${}_{\alpha}{\leftarrow} \cdot {\mathrel{\vdash\!\dashv}}_{\beta}$ and ${\mathrel{\vdash\!\dashv}}_{\beta} \cdot \to_{\alpha}$ by conversions $C'$ where $M_C >_{\mathsf{mul}} M_{C'}$. Hence, we prove that $\mathcal{A}$ is Church–Rosser modulo $\sim$, i.e., ${\Leftrightarrow}^{*} \subseteq {\downarrow}^{\sim}$, by well-founded induction on $>_{\mathsf{mul}}$. Consider a conversion $a \Leftrightarrow^{*} b$ which we call $C$. By Lemma 2 we either have $a \downarrow^{\sim} b$ (which includes the case that $C$ is empty) or one of the following cases holds:

$$a \mathrel{{\Leftrightarrow}^{*} \cdot {\leftarrow} \cdot {\to} \cdot {\Leftrightarrow}^{*}} b \qquad a \mathrel{{\Leftrightarrow}^{*} \cdot {\leftarrow} \cdot {\mathrel{\vdash\!\dashv}} \cdot {\Leftrightarrow}^{*}} b \qquad a \mathrel{{\Leftrightarrow}^{*} \cdot {\mathrel{\vdash\!\dashv}} \cdot {\to} \cdot {\Leftrightarrow}^{*}} b$$

If $a \downarrow^{\sim} b$ we are immediately done. In the remaining cases, we have a local peak or cliff with concrete labels $\alpha$ and $\beta$, so $M_C = \Gamma_1 \uplus \{\alpha, \beta\} \uplus \Gamma_2$. Since $\mathcal{A}$ is peak-and-cliff decreasing, there is a conversion $C'$ with $M_{C'} = \Gamma_1 \uplus \Gamma \uplus \Gamma_2$ where $\{\alpha, \beta\} >_{\mathsf{mul}} \Gamma$. Hence, $M_C >_{\mathsf{mul}} M_{C'}$ and we finish the proof by applying the induction hypothesis.

In the following, we connect the joinability of local peaks and cliffs to the joinability of prime critical pairs which allows us to apply peak-and-cliff decreasingness in the correctness proof of A.

Definition 4. *Given a TRS* $\mathcal{R}$ *and terms* $s$*,* $t$ *and* $u$*, we write* $t \mathrel{\triangledown_s} u$ *if* $s \to^{+}_{\mathcal{R}} t$*,* $s \to^{+}_{\mathcal{R}} u$*, and* $t \downarrow_{\mathcal{R}} u$ *or* $t \leftrightarrow_{\mathsf{PCP}(\mathcal{R})} u$*. We write* $t \mathrel{\triangledown^{\sim}_s} u$ *if* $s \to^{+}_{\mathcal{R}} t$*,* $s \sim u$ *and* $t \downarrow^{\sim}_{\mathcal{R}} u$ *or* $t \leftrightarrow_{\mathsf{PCP}^{\pm}(\mathcal{R}, \mathcal{B}^{\pm})} u$*. Furthermore,* $\triangle^{\sim}_s = \{(u, t) \mid t \mathrel{\triangledown^{\sim}_s} u\}$*.*

Lemma 3. *Let* R *be a left-linear TRS. The following two properties hold:*

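*1. If* $t \mathrel{{}_{\mathcal{R}}{\leftarrow}} s \to_{\mathcal{R}} u$ *then* $t \mathrel{\triangledown^{2}_s} u$*.*

*2. If* $t \mathrel{{}_{\mathcal{R}}{\leftarrow}} s \leftrightarrow_{\mathcal{B}} u$ *then* $t \mathrel{\triangledown_s \cdot \triangledown^{\sim}_s} u$*.*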

#### 3.3 Correctness Proof

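We show that every fair and non-failing finite run results in a $\mathcal{B}$-complete presentation. To this end, we first verify that inference steps in $\mathsf{A}$ preserve convertibility. We abbreviate $\mathcal{E} \cup \mathcal{R} \cup \mathcal{B}$ to $\mathcal{ERB}$ and $\mathcal{E}' \cup \mathcal{R}' \cup \mathcal{B}$ to $\mathcal{ERB}'$.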

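Lemma 4. *If* $(\mathcal{E}, \mathcal{R}) \vdash_{\mathsf{A}} (\mathcal{E}', \mathcal{R}')$ *then the following inclusions hold:*

$$\xrightarrow[\mathcal{ERB}]{} \;\subseteq\; \xrightarrow[\mathcal{R}'/\mathcal{B}]{=} \cdot \Bigl( \xrightarrow[\mathcal{E}' \cup \mathcal{R}']{=} \cup \xleftrightarrow[\mathcal{B}]{*} \Bigr) \cdot \xleftarrow[\mathcal{R}'/\mathcal{B}]{=} \qquad\qquad \xrightarrow[\mathcal{ERB}']{} \;\subseteq\; \xleftrightarrow[\mathcal{ERB}]{*}$$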

Corollary 1. *If* $(\mathcal{E}, \mathcal{R}) \vdash^{*}_{\mathsf{A}} (\mathcal{E}', \mathcal{R}')$ *then* ${\leftrightarrow}^{*}_{\mathcal{ERB}} = {\leftrightarrow}^{*}_{\mathcal{ERB}'}$*.*

Lemma 5. *If* $(\mathcal{E}, \mathcal{R}) \vdash^{*}_{\mathsf{A}} (\mathcal{E}', \mathcal{R}')$ *and* $\mathcal{R} \subseteq {>}$ *then* $\mathcal{R}' \subseteq {>}$*.*

Definition 5. *Let* $\leftrightarrow$ *be a rewrite relation or equivalence relation,* $M$ *a finite multiset of terms and* $>$ *a* $\mathcal{B}$*-compatible reduction order. We write* $s \xleftrightarrow{M} t$ *if* $s \leftrightarrow t$ *and there exist terms* $s', t' \in M$ *such that* $s' \gtrsim s$ *and* $t' \gtrsim t$ *for* ${\gtrsim} = {>} \cup {\sim_{\mathcal{B}}}$*.*

We follow the convention that if a conversion is labeled with M, all single steps can be labeled with M.

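Lemma 6. *Let* $(\mathcal{E}, \mathcal{R}) \vdash_{\mathsf{A}} (\mathcal{E}', \mathcal{R}')$ *and* $\mathcal{R} \subseteq {>}$*.*

*1. For any finite multiset* $M$ *we have* $\xleftrightarrow[\mathcal{ERB}]{M}{}^{*} \;\subseteq\; \xleftrightarrow[\mathcal{ERB}']{M}{}^{*}$*.*

*2. If* $s \xrightarrow[\mathcal{R}]{M} t$ *then* $s \mathrel{\xrightarrow[\mathcal{R}']{M}{}^{=} \cdot \xleftrightarrow[\mathcal{ERB}']{N}{}^{*}} t$ *with* $\{s\} >_{\mathsf{mul}} N$*.*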

Finally, we are able to prove the correctness result for A, i.e., all finite fair and non-failing runs produce a B-complete TRS which represents the original set of equations. In contrast to [1] and [3], the proof shows that it suffices to consider prime critical pairs.

Theorem 2. *Let* E *be an ES. For every fair and non-failing run*

$$\mathcal{E}\_0, \mathcal{R}\_0 \vdash\_{\mathsf{A}} \mathcal{E}\_1, \mathcal{R}\_1 \vdash\_{\mathsf{A}} \dots \vdash\_{\mathsf{A}} \mathcal{E}\_n, \mathcal{R}\_n$$

*for* $\mathcal{E}$*, the TRS* $\mathcal{R}_n$ *is a* $\mathcal{B}$*-complete representation of* $\mathcal{E}$*.*

*Proof.* Let $>$ be the $\mathcal{B}$-compatible reduction order used in the run. Since the run is non-failing and fair, we obtain $\mathcal{E}_n = \varnothing$ as well as the fact that $\mathcal{R}_n$ is left-linear. Corollary 1 establishes ${\leftrightarrow}^{*}_{\mathcal{E} \cup \mathcal{B}} = {\leftrightarrow}^{*}_{\mathcal{R}_n \cup \mathcal{B}}$ and termination modulo $\mathcal{B}$ of $\mathcal{R}_n$ follows from Lemma 5. It remains to prove that $\mathcal{R}_n$ is Church–Rosser modulo $\mathcal{B}$ which we do by showing peak-and-cliff decreasingness. So consider a labeled local peak $t \xleftarrow[\mathcal{R}_n]{M_1} s \xrightarrow[\mathcal{R}_n]{M_2} u$. Lemma 3(1) yields $t \mathrel{\triangledown^{2}_s} u$. Let $v \mathrel{\triangledown_s} w$ appear in this sequence (so $v = t$ or $w = u$). By definition, $v \downarrow_{\mathcal{R}_n} w$ or $v \leftrightarrow_{\mathsf{PCP}(\mathcal{R}_n)} w$. Together with fairness, the fact that $\sim_{\mathcal{B}}$ is reflexive as well as closure of rewriting under contexts and substitutions we obtain $v \downarrow^{\sim}_{\mathcal{R}_n} w$ or $(v, w) \in \bigcup_{i=0}^{n} {\leftrightarrow}_{\mathcal{E}_i \cup \mathcal{R}_i}$. In both cases, it is possible to label all steps between $v$ and $w$ with $\{v, w\}$. Since $s > v$ and $s > w$ we have $M_1 >_{\mathsf{mul}} \{v, w\}$ and $M_2 >_{\mathsf{mul}} \{v, w\}$. Repeated applications of Lemma 6(1) therefore yield a conversion in $\mathcal{R}_n \cup \mathcal{B}$ between $v$ and $w$ where every step is labeled with a multiset that is smaller than both $M_1$ and $M_2$. Hence, the corresponding condition required by peak-and-cliff decreasingness is fulfilled.

Next consider a labeled local cliff $t \xleftarrow[\mathcal{R}_n]{M_1} s \xleftrightarrow[\mathcal{B}]{M_2} u$. From Lemma 3(2) we obtain a term $v$ such that $t \mathrel{\triangledown_s} v \mathrel{\triangledown^{\sim}_s} u$. As in the case for local peaks we obtain a conversion between $t$ and $v$ where each step can be labeled with $\{t, v\} <_{\mathsf{mul}} M_1$. Together with fairness, $v \mathrel{\triangledown^{\sim}_s} u$ yields $v \downarrow^{\sim}_{\mathcal{R}_n} u$ or $(v, u) \in \bigcup_{i=0}^{n} {\leftrightarrow}_{\mathcal{R}_i}$. In the former case there exists a $k$ such that $v \mathrel{{\to}^{*}_{\mathcal{R}_n} \cdot {\sim_{\mathcal{B}}} \cdot {}^{k}_{\mathcal{R}_n}{\leftarrow}} u$. If $k = 0$ we can label all steps with $\{v\}$. If $k > 0$ the conversion is of the form $v \mathrel{{\to}^{*}_{\mathcal{R}_n} \cdot {\sim_{\mathcal{B}}} \cdot {}^{k-1}_{\mathcal{R}_n}{\leftarrow}} w \mathrel{{}_{\mathcal{R}_n}{\leftarrow}} u$. We can label the rightmost step with $M_2$ and the remaining steps with $\{v, w\}$. Note that $s > v$. Since $>$ is a $\mathcal{B}$-compatible reduction order we also have $s > w$. Thus, $M_1 >_{\mathsf{mul}} \{v, w\}$ which establishes the corresponding condition required by peak-and-cliff decreasingness for all $k$. In the remaining case we have $(v, u) \in \bigcup_{i=0}^{n} {\leftrightarrow}_{\mathcal{R}_i}$, so there is some $i \leqslant n$ such that $v \leftrightarrow_{\mathcal{R}_i} u$. Actually, we know that $u \xrightarrow[\mathcal{R}_i]{M_2} v$ since otherwise we would have both $s > v$ and $v > s$ by the $\mathcal{B}$-compatibility of $>$. Repeated applications of Lemma 6(1,2) therefore yield a conversion between $u$ and $v$ of the form

$$u \mathrel{\xrightarrow[\mathcal{R}_n]{M_2}{}^{=} \cdot \xleftrightarrow[\mathcal{R}_n \cup \mathcal{B}]{N}{}^{*}} v$$

where $\{u\} >_{\mathsf{mul}} N$. By definition, $s' \gtrsim u$ for some $s' \in M_1$ and therefore $M_1 >_{\mathsf{mul}} N$, which means that the corresponding condition required by peak-and-cliff decreasingness is fulfilled. Overall, it follows that $\mathcal{R}_n$ is peak-and-cliff decreasing and therefore Church–Rosser modulo $\mathcal{B}$.

Note that the proofs of the previous theorem and Theorem 1 do not require multiset orders induced by quasi-orders but use multiset extensions of proper $\mathcal{B}$-compatible reduction orders which are easier to work with. This could be achieved by defining peak-and-cliff decreasingness in such a way that well-founded orders suffice for the abstract setting. However, the usage of multiset orders based on $\mathcal{B}$-compatible reduction orders as well as a notion of labeled rewriting which allows us to label steps with $\mathcal{B}$-equivalent terms are crucial in order to establish peak-and-cliff decreasingness for TRSs.

### 4 Bachmair's Inference System

As already mentioned, the inference system proposed by Avenhaus [1] is essentially the same as $\mathsf{A}$. The only other inference system for $\mathcal{B}$-completion for left-linear TRSs is due to Bachmair [3]. We investigate a slightly modified version of this inference system, which we call $\mathsf{B}$: arbitrary local peaks are deducible and, since we only consider finite runs, the encompassment condition is removed from the collapse rule.

The main difference between $\mathsf{A}$ and $\mathsf{B}$ is that in $\mathsf{B}$ one may only use the standard rewrite relation $\to_{\mathcal{R}}$ for simplifying equations and composing rules. This allows us to deduce local cliffs as equations. The goal of this section is to establish correctness of $\mathsf{B}$ via a simulation by $\mathsf{A}$.

Definition 6. *The inference system* $\mathsf{B}$ *is the same as* $\mathsf{A}$ *but with rewriting in compose and simplify restricted to* $\to_{\mathcal{R}}$ *and the following rule which replaces the two deduction rules of* $\mathsf{A}$*:*

$$\mathsf{deduce} \quad \dfrac{\mathcal{E}, \mathcal{R}}{\mathcal{E} \cup \{s \approx t\}, \mathcal{R}} \quad \text{if } s \mathrel{{}_{\mathcal{R}}{\leftarrow} \cdot \to_{\mathcal{R} \cup \mathcal{B}^{\pm}}} t$$

Definition 7. *Let* E *be an ES. A finite sequence*

$$\mathcal{E}\_0, \mathcal{R}\_0 \vdash\_{\mathsf{B}} \mathcal{E}\_1, \mathcal{R}\_1 \vdash\_{\mathsf{B}} \dots \vdash\_{\mathsf{B}} \mathcal{E}\_n, \mathcal{R}\_n$$

*with* $\mathcal{E}_0 = \mathcal{E}$ *and* $\mathcal{R}_0 = \varnothing$ *is a* run *for* $\mathcal{E}$*. If* $\mathcal{E}_n \neq \varnothing$*, the run* fails*. The run is* fair *if* $\mathcal{R}_n$ *is left-linear and the following inclusion holds:*

$$\mathsf{PCP}(\mathcal{R}\_n) \cup \mathsf{PCP}^{\pm}(\mathcal{R}\_n, \mathcal{B}^{\pm}) \subseteq \downarrow\_{\mathcal{R}\_n}^{\sim} \cup \bigcup\_{i=0}^n \leftrightarrow\_{\mathcal{E}\_i}$$

In contrast to Definition 2, the fairness condition is the same for all prime critical pairs since the inference rule deduce of B never produces rewrite rules.

In the following, $\vdash^{o}_{\mathsf{I}}$ denotes an application of the rule orient in an inference system $\mathsf{I}$. In order to prove that fair and non-failing runs in $\mathsf{B}$ can be simulated in $\mathsf{A}$, we start with the following technical lemma.

Lemma 7. *If* $(\mathcal{E}_1, \mathcal{R}_1) \vdash_{\mathsf{B}} (\mathcal{E}_2, \mathcal{R}_2)$ *and* $(\mathcal{E}_1, \mathcal{R}_1) \vdash^{o\,*}_{\mathsf{B}} (\mathcal{E}'_1, \mathcal{R}'_1)$ *then* $(\mathcal{E}'_1, \mathcal{R}'_1) \vdash^{=}_{\mathsf{A}} (\mathcal{E}'_2, \mathcal{R}'_2)$ *where* $(\mathcal{E}_2, \mathcal{R}_2) \vdash^{o\,*}_{\mathsf{B}} (\mathcal{E}'_2, \mathcal{R}'_2)$*. In a picture:*

$$\begin{array}{ccc}
\mathcal{E}_1, \mathcal{R}_1 & \vdash_{\mathsf{B}} & \mathcal{E}_2, \mathcal{R}_2 \\[1ex]
\big\downarrow {\scriptstyle \vdash^{o\,*}_{\mathsf{B}}} & & \big\downarrow {\scriptstyle \vdash^{o\,*}_{\mathsf{B}}} \\[1ex]
\mathcal{E}'_1, \mathcal{R}'_1 & \vdash^{=}_{\mathsf{A}} & \mathcal{E}'_2, \mathcal{R}'_2
\end{array}$$

For the proof of the simulation result, we need a slightly different form of the previous lemma. Analogous to the notation for rewrite relations, the relation $\vdash^{o\,!}_{\mathsf{I}}$ denotes the exhaustive application of the inference rule orient.

Corollary 2. *If* $(\mathcal{E}_1, \mathcal{R}_1) \vdash_{\mathsf{B}} (\mathcal{E}_2, \mathcal{R}_2)$ *and* $(\mathcal{E}_1, \mathcal{R}_1) \vdash^{o\,!}_{\mathsf{B}} (\mathcal{E}'_1, \mathcal{R}'_1)$ *then* $(\mathcal{E}'_1, \mathcal{R}'_1) \vdash^{*}_{\mathsf{A}} (\mathcal{E}'_2, \mathcal{R}'_2)$ *where* $(\mathcal{E}_2, \mathcal{R}_2) \vdash^{o\,!}_{\mathsf{B}} (\mathcal{E}'_2, \mathcal{R}'_2)$*.*

Theorem 3. *For every fair run* $(\mathcal{E}, \varnothing) \vdash^{*}_{\mathsf{B}} (\varnothing, \mathcal{R})$ *there exists a fair run* $(\mathcal{E}, \varnothing) \vdash^{*}_{\mathsf{A}} (\varnothing, \mathcal{R})$*.*

*Proof.* Assume $(\mathcal{E}_0, \mathcal{R}_0) \vdash^{n}_{\mathsf{B}} (\mathcal{E}_n, \mathcal{R}_n)$ where $\mathcal{R}_0 = \mathcal{E}_n = \varnothing$. By $n$ applications of Corollary 2 we arrive at the following situation:

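$$\begin{array}{ccccccccc}
& & \mathcal{E}_0, \mathcal{R}_0 & \vdash_{\mathsf{B}} & \mathcal{E}_1, \mathcal{R}_1 & \vdash_{\mathsf{B}} & \cdots & \vdash_{\mathsf{B}} & \mathcal{E}_n, \mathcal{R}_n \\[1ex]
& & \big\downarrow {\scriptstyle \vdash^{o\,!}_{\mathsf{B}}} & & \big\downarrow {\scriptstyle \vdash^{o\,!}_{\mathsf{B}}} & & & & \big\downarrow {\scriptstyle \vdash^{o\,!}_{\mathsf{B}}} \\[1ex]
\mathcal{E}_0, \mathcal{R}_0 & \vdash^{o\,!}_{\mathsf{A}} & \mathcal{E}'_0, \mathcal{R}'_0 & \vdash^{*}_{\mathsf{A}} & \mathcal{E}'_1, \mathcal{R}'_1 & \vdash^{*}_{\mathsf{A}} & \cdots & \vdash^{*}_{\mathsf{A}} & \mathcal{E}'_n, \mathcal{R}'_n
\end{array}$$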

The following two statements hold:


Statement (1) is immediate from the simulation relation $\vdash^{o\,!}_{\mathsf{B}}$ and statement (2) follows from $\mathcal{B}$-compatibility of the used reduction order together with the fact that every (prime) critical pair is connected by one $\mathcal{R}_n$-step and one $\mathcal{B}$-step. Furthermore, $\mathcal{E}_n = \varnothing$ implies $\mathcal{E}'_n = \varnothing$ as well as $\mathcal{R}_n = \mathcal{R}'_n$. Hence, we obtain fairness of the run in $\mathsf{A}$ by showing the following inclusions:

$$\mathsf{PCP}(\mathcal{R}'_n) \subseteq {\downarrow}^{\sim}_{\mathcal{R}'_n} \cup \bigcup_{i=0}^{n} {\leftrightarrow}_{\mathcal{E}'_i \cup \mathcal{R}'_i} \qquad \mathsf{PCP}^{\pm}(\mathcal{R}'_n, \mathcal{B}^{\pm}) \subseteq {\downarrow}^{\sim}_{\mathcal{R}'_n} \cup \bigcup_{i=0}^{n} {\leftrightarrow}_{\mathcal{R}'_i}$$

Let $s \approx t \in \mathsf{PCP}(\mathcal{R}'_n)$. By fairness of the run in $\mathsf{B}$ we obtain $s \downarrow^{\sim}_{\mathcal{R}'_n} t$ or $s \leftrightarrow_{\mathcal{E}_k} t$ for some $k \leqslant n$. In the former case, we are immediately done. In the latter case we obtain $s \leftrightarrow_{\mathcal{E}'_k \cup \mathcal{R}'_k} t$ from (1) as desired. Now, let $s \approx t \in \mathsf{PCP}^{\pm}(\mathcal{R}'_n, \mathcal{B}^{\pm})$. By fairness of the run in $\mathsf{B}$ we obtain $s \downarrow^{\sim}_{\mathcal{R}'_n} t$ or $s \leftrightarrow_{\mathcal{E}_k} t$ for some $k \leqslant n$. Again, we are immediately done in the former case. In the latter case we have $s \leftrightarrow_{\mathcal{R}'_k} t$ because of (1) and (2). Therefore, the run in $\mathsf{A}$ is fair.

The previous theorem is an important simulation result which justifies the emphasis on A in this paper. Moreover, together with Theorem 2 the correctness of the inference system B is an easy consequence.

Corollary 3. *Every fair and non-failing run for* $\mathcal{E}$ *in* $\mathsf{B}$ *produces a* $\mathcal{B}$*-complete presentation of* $\mathcal{E}$*.*

### 5 AC Completion

So far, the theoretical results have been generalized by using the equational theory B as a placeholder. In practice, however, this paper is concerned with the particular theory AC. The results of this section allow us to assess the effectiveness of the inference system A in the setting of AC completion.

#### 5.1 Limitations of Left-Linear AC Completion

In addition to the restriction to left-linear rewrite rules, the following example demonstrates another severe limitation of the inference system A previously unmentioned in the literature.

*Example 4.* Consider the ES E consisting of the equations

$$\mathsf{and}(\mathsf{0},\mathsf{0}) \approx \mathsf{0} \qquad\qquad\mathsf{and}(\mathsf{1},\mathsf{1}) \approx \mathsf{1} \qquad\qquad\qquad\mathsf{and}(\mathsf{0},\mathsf{1}) \approx \mathsf{0}$$

where and is an AC function symbol. There is only one way to orient each equation. Furthermore, there are no critical pairs between the resulting rewrite rules. Hence, using the inference system A we arrive at the intermediate TRS

$$\mathsf{and}(\mathsf{0},\mathsf{0}) \to \mathsf{0} \qquad\qquad\mathsf{and}(\mathsf{1},\mathsf{1}) \to \mathsf{1} \qquad\qquad\mathsf{and}(\mathsf{0},\mathsf{1}) \to \mathsf{0}$$

where the only possible next step is to deduce local cliffs. We will now show that this has to be done infinitely many times. Note that an AC-complete presentation R of E has to be able to rewrite any AC-equivalent term of a redex: Consider the infinite family of terms

$$s_0 = \mathsf{and}(\mathsf{0}, \mathsf{1}) \qquad s_1 = \mathsf{and}(\mathsf{and}(\mathsf{0}, x_1), \mathsf{1}) \qquad s_2 = \mathsf{and}(\mathsf{and}(\mathsf{and}(\mathsf{0}, x_1), x_2), \mathsf{1}) \qquad \cdots$$

as well as

$$t_0 = \mathsf{0} \qquad t_1 = \mathsf{and}(\mathsf{0}, x_1) \qquad t_2 = \mathsf{and}(\mathsf{and}(\mathsf{0}, x_1), x_2) \qquad \cdots$$

Clearly, $s_n \leftrightarrow^{*}_{\mathcal{E} \cup \mathsf{AC}} t_n$ for all $n \in \mathbb{N}$ and therefore also $s_n \downarrow^{\sim}_{\mathcal{R}} t_n$ for all $n \in \mathbb{N}$, but this demands infinitely many rules in $\mathcal{R}$: For each $s_n$ there is an AC-equivalent term such that the constants $\mathsf{0}$ and $\mathsf{1}$ are next to each other which allows us to rewrite it using the rule $\mathsf{and}(\mathsf{0}, \mathsf{1}) \to \mathsf{0}$. However, the number of variables between these constants increases with $n$, which requires $\mathcal{R}$ to have infinitely many rules since rewrite rules can only be applied before the representation modulo AC is changed.
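The family from Example 4 can be explored with a short sketch. The tuple encoding of terms and the helper names (`s`, `t`, `rearranged`, `leaves`, `reducible`) are assumptions made for illustration. Because all three left-hand sides are ground, matching reduces to a syntactic equality test on subterms.

```python
ZERO, ONE = ("0",), ("1",)

# The three oriented rules of Example 4; only the ground left-hand sides matter here.
LHS = [("and", ZERO, ZERO), ("and", ONE, ONE), ("and", ZERO, ONE)]

def t(n):
    """t_n = and(...and(and(0, x1), x2)..., xn)."""
    term = ZERO
    for i in range(1, n + 1):
        term = ("and", term, f"x{i}")
    return term

def s(n):
    """s_n = and(t_n, 1)."""
    return ("and", t(n), ONE)

def rearranged(n):
    """An AC-equivalent variant of s_n in which 0 and 1 are adjacent."""
    term = ("and", ZERO, ONE)
    for i in range(1, n + 1):
        term = ("and", term, f"x{i}")
    return term

def leaves(term):
    """Sorted arguments of a flattened `and` term (a canonical form modulo AC)."""
    if isinstance(term, tuple) and term[0] == "and":
        return sorted((l for a in term[1:] for l in leaves(a)), key=repr)
    return [term]

def subterms(term):
    yield term
    if isinstance(term, tuple):
        for arg in term[1:]:
            yield from subterms(arg)

def reducible(term):
    """Is some subterm syntactically equal to one of the ground left-hand sides?"""
    return any(u in LHS for u in subterms(term))

for n in range(4):
    print(n, reducible(s(n)), leaves(s(n)) == leaves(rearranged(n)), reducible(rearranged(n)))
```

For every $n \geqslant 1$ the term $s_n$ is irreducible with respect to the three rules although an AC-equivalent variant contains the redex $\mathsf{and}(\mathsf{0}, \mathsf{1})$, which is the reason why each $n$ calls for a further rule in the left-linear setting.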

Note that there is nothing special about this example except the fact that it contains at least one equation which can only be oriented such that the left-hand side contains an AC function symbol where both arguments have "structure", i.e., both arguments represent more complicated terms than a variable. As a consequence, the necessity of infinitely many rules applies to all equational systems which have this property. Needless to say, this means that for a large class of equational systems the corresponding AC-canonical presentation (in the left-linear sense) is infinite if it exists. This observation is in stark contrast to the properties of general AC completion as presented in the next section which can complete the ES $\mathcal{E}$ from Example 4 into a finite AC-canonical TRS by simply orienting all rules from left to right.

#### 5.2 General AC Completion

Inference systems for completion modulo an equational theory which are not restricted to the left-linear case usually need more inference rules than the ones already covered in this paper. For general AC completion, however, there exists a particularly simple inference system which constitutes a special case of normalized completion [12] and can be found in Sarah Winkler's PhD thesis [16, p. 109].

Definition 8. *The inference system* KBAC *is the same as* A *for the fixed theory AC but with a modified collapse rule which allows us to rewrite with* →R/AC *and the following rule which replaces the two deduction rules of* A*:*

$$\text{deduce} \quad \frac{\mathcal{E}, \mathcal{R}}{\mathcal{E} \cup \{s \approx t\}, \mathcal{R}} \quad \text{if } s \mathrel{{}\_{\mathcal{R}}{\leftarrow}} \cdot \sim\_{\mathsf{AC}} \cdot \to\_{\mathcal{R}} t$$

The purpose of this section is to show how A can be simulated by KBAC in the case of B = AC. Since local cliffs cannot be deduced in KBAC, the simulation has to work with a potentially smaller set of rewrite rules. Furthermore, during a run, the variants of rules stemming from local cliffs may be in different states with respect to inter-reduction (collapse and compose). Given an intermediate TRS R of a run in A as well as an intermediate TRS R′ of a run in KBAC, the invariant R ⊆ →<sup>+</sup><sub>R′/AC</sub> resolves both of the aforementioned problems. The main motivation behind this invariant is the avoidance of compose and collapse in the KBAC run.

Lemma 8. *If* (E<sub>1</sub>, R<sub>1</sub>) ⊢<sub>A</sub> (E<sub>2</sub>, R<sub>2</sub>) *and* R<sub>1</sub> ⊆ →<sup>+</sup><sub>R′<sub>1</sub>/AC</sub> *then there exists a TRS* R′<sub>2</sub> *such that* (E<sub>1</sub>, R′<sub>1</sub>) ⊢<sup>∗</sup><sub>KBAC</sub> (E<sub>2</sub>, R′<sub>2</sub>) *and* R<sub>2</sub> ⊆ →<sup>+</sup><sub>R′<sub>2</sub>/AC</sub>*.*

*Proof.* Let > be a fixed AC-compatible reduction order which is used in both A and KBAC. Suppose (E<sub>1</sub>, R<sub>1</sub>) ⊢<sub>A</sub> (E<sub>2</sub>, R<sub>2</sub>) and R<sub>1</sub> ⊆ →<sup>+</sup><sub>R′<sub>1</sub>/AC</sub>. We proceed by a case analysis on the rule applied in the inference step (E<sub>1</sub>, R<sub>1</sub>) ⊢<sub>A</sub> (E<sub>2</sub>, R<sub>2</sub>). The only interesting cases are when deduce, simplify, compose, or collapse is applied.

– If deduce is applied, we further distinguish whether it was applied to a local peak or cliff. In the case of a local cliff, we have E<sub>1</sub> = E<sub>2</sub> and R<sub>2</sub> = R<sub>1</sub> ∪ {ℓ → r} with ℓ →<sub>R<sub>1</sub>/AC</sub> r. From ℓ →<sub>R<sub>1</sub>/AC</sub> r and R<sub>1</sub> ⊆ →<sup>+</sup><sub>R′<sub>1</sub>/AC</sub> we obtain ℓ →<sup>+</sup><sub>R′<sub>1</sub>/AC</sub> r. Thus, R<sub>2</sub> ⊆ →<sup>+</sup><sub>R′<sub>1</sub>/AC</sub> holds. As (E<sub>1</sub>, R′<sub>1</sub>) ⊢<sup>0</sup><sub>KBAC</sub> (E<sub>2</sub>, R′<sub>1</sub>) is trivial, the claim follows. In the case of a local peak, we have R<sub>1</sub> = R<sub>2</sub> and E<sub>2</sub> = E<sub>1</sub> ∪ {t ≈ u} with t <sub>R<sub>1</sub></sub>← s →<sub>R<sub>1</sub></sub> u. Since R<sub>1</sub> ⊆ →<sup>+</sup><sub>R′<sub>1</sub>/AC</sub> holds, we have t <sub>R′<sub>1</sub>/AC</sub><sup>∗</sup>← v <sub>R′<sub>1</sub></sub>← · ∼<sub>AC</sub> s ∼<sub>AC</sub> · →<sub>R′<sub>1</sub></sub> w →<sup>∗</sup><sub>R′<sub>1</sub>/AC</sub> u for some v and w. By performing deduce and simplify steps

$$(\mathcal{E}\_1, \mathcal{R}\_1') \vdash\_{\mathsf{KB}\_{\mathsf{AC}}} (\mathcal{E}\_1 \cup \{v \approx w\}, \mathcal{R}\_1') \vdash\_{\mathsf{KB}\_{\mathsf{AC}}}^{\*} (\mathcal{E}\_1 \cup \{t \approx u\}, \mathcal{R}\_1') = (\mathcal{E}\_2, \mathcal{R}\_1')$$

is obtained. As R<sub>1</sub> = R<sub>2</sub>, the inclusion R<sub>2</sub> ⊆ →<sup>+</sup><sub>R′<sub>1</sub>/AC</sub> is trivial. Hence, the claim holds.


$$\ell' \mathrel{{}\_{\mathcal{R}'\_1/\mathsf{AC}}{\stackrel{\*}{\leftarrow}}} t \mathrel{{}\_{\mathcal{R}'\_1}{\leftarrow}} \cdot \sim\_{\mathsf{AC}} \ell \sim\_{\mathsf{AC}} \cdot \to\_{\mathcal{R}'\_1} u \to^\*\_{\mathcal{R}'\_1/\mathsf{AC}} r$$

for some t and u. Performing deduce and simplify, we obtain:

$$(\mathcal{E}\_1, \mathcal{R}\_1') \vdash\_{\mathsf{KB}\_{\mathsf{AC}}} (\mathcal{E}\_1 \cup \{t \approx u\}, \mathcal{R}\_1') \vdash\_{\mathsf{KB}\_{\mathsf{AC}}}^\* (\mathcal{E}\_1 \cup \{\ell' \approx r\}, \mathcal{R}\_1') = (\mathcal{E}\_2, \mathcal{R}\_1')$$

By R<sub>2</sub> ⊆ R<sub>1</sub> ⊆ →<sup>+</sup><sub>R′<sub>1</sub>/AC</sub> the claim follows.

Theorem 4. *For every fair run* (E, ∅) ⊢<sup>∗</sup><sub>A</sub> (∅, R) *there exists a run* (E, ∅) ⊢<sup>∗</sup><sub>KBAC</sub> (∅, R′) *such that* R′/AC *is an AC-complete presentation of* E*.*

*Proof.* With a straightforward induction argument, we obtain the run (E, ∅) ⊢<sup>∗</sup><sub>KBAC</sub> (∅, R′) as well as R ⊆ →<sup>+</sup><sub>R′/AC</sub> (∗) from Lemma 8. Furthermore, AC termination of R′ and ↔<sup>∗</sup><sub>E∪AC</sub> = ↔<sup>∗</sup><sub>R′∪AC</sub> (∗∗) are easy consequences of the definition of KBAC. AC-completeness of R follows from fairness of the run in A and Theorem 2. For the Church–Rosser modulo AC property of R′/AC, consider a conversion s ↔<sup>∗</sup><sub>R′∪AC</sub> t. From (∗∗) we obtain s ↔<sup>∗</sup><sub>E∪AC</sub> t and therefore s →<sup>∗</sup><sub>R</sub> · ∼<sub>AC</sub> · <sub>R</sub><sup>∗</sup>← t by the fact that R is an AC-complete presentation of E. Finally, (∗) yields s →<sup>∗</sup><sub>R′/AC</sub> · ∼<sub>AC</sub> · <sub>R′/AC</sub><sup>∗</sup>← t as desired. Thus, R′/AC is an AC-complete presentation of E.

In addition to the result of the previous theorem, the proof of Lemma 8 provides a procedure to construct a KBAC run which "corresponds" to a given A run. In particular, this means that it is possible to switch from A to KBAC at any point while performing AC completion. This is of practical relevance: Assume that AC completion is started with A in order to avoid AC unification. If A gets stuck due to simplified equations which are not orientable into a left-linear rule or it seems to be the case that the procedure diverges due to the problem described in Example 4, starting from scratch with KBAC is not necessary. We conclude the section by illustrating the practical relevance of the simulation result with an example.

*Example 5.* Consider the ES E for abelian groups consisting of the equations

$$\mathbf{e} \cdot x \approx x \qquad\qquad\qquad x^- \cdot x \approx \mathbf{e}$$

where · is an AC symbol. Note that the well-known completion run for nonabelian group theory is also a run in A: Critical pairs with respect to the associativity axiom are deducible via local cliffs, non-left-linear intermediate rules are allowed and all (intermediate) rules are orientable with e.g. AC-KBO. Hence, we obtain the TRS R consisting of the rules


and switch to KBAC where we can collapse the redundant rules 4, 6, 7 and 9. A final joinability check of all AC critical pairs reveals that the resulting TRS R is an AC-complete presentation of abelian groups. Hence, the simulation result allows us to make progress with A even when it is doomed to fail. In particular, critical pairs between rules whose left-hand sides do not contain AC symbols do not need to be recomputed.

### 6 Implementation

To the best of our knowledge, our tool accompll is the first implementation of left-linear AC completion. It is written in Haskell and available on its website<sup>1</sup>. Instead of expecting explicit AC-compatible reduction orders as input, accompll performs completion with termination tools [15]. In principle, completion with termination tools has to consider all combinations of possible orientations of equations in order to find a complete system. However, traversing the whole search space is rather inefficient. The state of the art for solving this problem efficiently is *multi-completion with termination tools* due to Winkler et al. [20]. Since the implementation of this method is a major effort, accompll adopts a simple but incomplete strategy presented in [14]: Instead of traversing the whole search space, accompll runs two threads in parallel, where one thread prefers to orient equations from left to right and the other prefers to orient them from right to left. If one of the threads finishes successfully, the corresponding result is reported. Completion fails if both threads fail.
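The two-thread strategy can be sketched as follows. This is a toy Python illustration and not the Haskell code of accompll: `toy_orient` is a hypothetical stand-in for a completion attempt, and the string-length comparison merely takes the place of a call to a termination tool. Only the control flow (two preferences, first success wins, failure if both fail) reflects the description above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def first_success(strategies):
    """Run all strategies in parallel; return the first non-None result, else None."""
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        futures = [pool.submit(strategy) for strategy in strategies]
        for future in as_completed(futures):
            result = future.result()
            if result is not None:
                return result  # leaving the 'with' block still waits for the other thread
    return None

def toy_orient(equations, prefer_left_to_right):
    """Hypothetical completion attempt: orient every equation or fail with None."""
    rules = []
    for lhs, rhs in equations:
        candidates = [(lhs, rhs), (rhs, lhs)]
        if not prefer_left_to_right:
            candidates.reverse()
        for l, r in candidates:
            if len(l) > len(r):  # stand-in for asking a termination tool
                rules.append((l, r))
                break
        else:
            return None          # neither orientation accepted: this thread fails
    return rules

def complete_with_two_threads(equations):
    return first_success([
        lambda: toy_orient(equations, True),
        lambda: toy_orient(equations, False),
    ])
```

In this sketch a commutativity-like equation such as `("a+b", "b+a")` makes both threads fail, so the overall result is `None`.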

As input, the tool expects a file in the WST<sup>2</sup> format describing the equational theory on which left-linear AC completion should be performed. The user can choose whether →<sub>R</sub> or →<sub>R/AC</sub> is used for rewriting in the inference rules simplify and compose. Furthermore, the generation of critical pairs can be restricted by the primality criterion.

Another feature is the validity problem solving mode which solves a given instance of the validity problem for an equational theory E upon successful completion of E. This mode can be triggered by supplying a concrete equation s ≈ t as a command line argument in addition to the file describing E.

In the tool accompll, external termination tools do much of the heavy lifting. In particular, the user can supply the executable of an arbitrary termination tool as long as the output starts with YES, MAYBE, NO or TIMEOUT (all other cases are treated as an error). The input format for the termination tool can be set by a command line argument. The available options are the WST format as well as the XML format of the Nagoya Termination Tool [21]<sup>3</sup>.
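The output convention for termination tools can be illustrated by a short Python sketch. The names `Answer` and `classify` are hypothetical and do not come from accompll; the sketch only encodes the stated rule that the output must start with one of the four keywords and that anything else is an error. Whether leading whitespace is tolerated is not specified above, so the sketch does not strip it.

```python
from enum import Enum

class Answer(Enum):
    YES = "YES"
    MAYBE = "MAYBE"
    NO = "NO"
    TIMEOUT = "TIMEOUT"

def classify(output: str) -> Answer:
    """Map the raw output of a termination tool to one of the four answers."""
    for answer in Answer:
        if output.startswith(answer.value):
            return answer
    # all other outputs are treated as an error
    raise ValueError(f"unexpected termination tool output: {output[:20]!r}")
```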

Since starting a new process for every call of the termination tool causes a lot of operating system overhead, the tool supports an interactive mode which allows it to communicate with a single process of the termination tool in a dialogue style. Here, the only constraint for the termination tool is that it accepts a sequence of termination problems separated by the keyword (RUN). This is currently only implemented in an experimental version of Tyrolean Termination Tool 2 (TTT2) [11], but we hope that more termination tools will follow as this approach has a positive effect on the runtime of completion with termination tools while demanding comparatively little implementation effort.

### 7 Experimental Results

The problem set used for the experimental results consists of 50 ESs. It is based on the one used in [18] and has been extended by further examples from the literature as well as handcrafted examples. The experiments were performed on an Intel Core i7-7500U running at a clock rate of 2.7 GHz with 15.5 GiB of main memory. Our tool accompll was used with the termination tool TTT2 as well as an experimental version (denoted by TTT2e) which allows our tool to communicate a sequence of termination problems without having to start a new process all the time, as described in the preceding section.

<sup>1</sup> https://github.com/niedjoh/accompll.

<sup>2</sup> https://www.lri.fr/~marche/tpdb/format.html.

<sup>3</sup> https://www.trs.cm.is.nagoya-u.ac.jp/NaTT/natt-xml.html.

Table 1. Experimental results on 50 problems (excerpt)

<sup>a</sup> mkbTT does not output the completed system for unknown reasons.

Table 1 shows some interesting results and compares the two configurations of accompll with the normalized completion [12] mode of mkbTT [19] and the AC completion mode of MædMax [17]. The tool mkbTT is the original implementation of multi-completion with termination tools [20]. MædMax, on the other hand, implements *maximal completion* [9] which makes use of MaxSAT/MaxSMT solvers instead of termination tools in order to avoid using concrete reduction orders as input. To the best of our knowledge, there is no comparable completion tool which supports AC axioms. Since normalized completion subsumes general AC completion, a comparison with the aforementioned modes of both systems allows us to assess the effectiveness of accompll with respect to the state of the art in AC completion. Note that normalized completion uses AC unification.

In Table 1, columns (1) show the execution time in seconds where ∞ denotes that the timeout of 60 s has been reached and ⊥ denotes failure of completion. Columns (2) state the number of rules of the completed TRS. The first two problems show that the avoidance of AC unification can indeed have a positive effect on the execution time. However, the third problem indicates that there may also be an opposite effect on small problems. The last two problems show the two main limitations of left-linear AC completion: Abelian groups do not have an AC-complete presentation which is left-linear, and Example 2 from [7] is a ground ES which causes left-linear AC completion to suffer from the problem described in Example 4 by definition. The severity of these limitations is reflected in the total number of solved problems. In particular, the problem set does not contain an ES which is completed only by accompll. However, given Theorem 4, this is not unexpected. Another noteworthy but unsurprising fact is that complete systems produced by accompll tend to have more rules since every rule needs different versions of left-hand sides to facilitate rewriting without AC-matching. It would also be interesting to compare the execution times for typical queries of the form E ⊨ s ≈ t, as the resulting systems of left-linear AC completion allow for more efficient joinability checks using →<sub>R</sub> instead of →<sub>R/AC</sub>. We leave this for future work.

The complete results are available on the tool's website<sup>4</sup>. We conclude with some additional notes on the results.


### 8 Conclusion

In this paper, we consolidated the existing literature for left-linear AC completion in the case of finite runs and gave new insight into its merits compared to general AC completion. Furthermore, our implementation accompll allowed us to run practical experiments. An extended version of this paper with full proof details and an appendix which describes the original inference systems of Avenhaus and Bachmair is available on the website of accompll (see Footnote 4). We conclude by giving some pointers for future work. First of all, the merits of our novel simulation result for general AC completion could be evaluated experimentally by providing an implementation. Another interesting research direction is normalized completion for the left-linear case. If successful, this would facilitate the treatment of important cases such as abelian groups despite the restriction to left-linear TRSs. Furthermore, a formalization of the established theoretical results is desirable. To that end, the existing Isabelle/HOL formalization from [4] is a perfect starting point as some results of this paper are extensions of the results for standard rewriting presented there.

Acknowledgments. We thank Jonas Schöpf and Fabian Mitterwallner for providing the experimental version of TTT2 as well as the anonymous reviewers for their valuable suggestions.

<sup>4</sup> http://cl-informatik.uibk.ac.at/software/accompll/.

### References



# On *P*-Interpolation in Local Theory Extensions and Applications to the Study of Interpolation in the Description Logics *EL*, *EL*<sup>+</sup>

Dennis Peuter, Viorica Sofronie-Stokkermans, and Sebastian Thunert

University of Koblenz, Koblenz, Germany {dpeuter,sofronie}@uni-koblenz.de, S.Thunert@gmx.de

Abstract. We study the *P*-interpolation property for certain local theory extensions, and use these results for proving ≤-interpolation in classes of semilattices with monotone operators. For computing the ≤-interpolating terms, we use a hierarchic approach. We use these results for the study of ⊑-interpolation in the description logics EL and EL<sup>+</sup>.

### 1 Introduction

In this paper we study the problem of P-interpolation, a problem strongly related to interpolation w.r.t. logical theories. The problem can be formulated as follows:

Let T be a theory, A and B be conjunctions of ground literals in the signature of T, possibly with additional constants, P a predicate symbol in the signature of T, a a constant occurring in A and b a constant occurring in B. Assume that A ∧ B |=<sub>T</sub> a P b. Can we find a ground term t containing only constants and function symbols "shared" by A and B, such that A ∧ B |=<sub>T</sub> a P t ∧ t P b?
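As a minimal illustration of the question, consider the very simple special case in which T is the theory of partial orders without function symbols, P is ≤, and A and B consist of positive atoms c ≤ d only. These restrictions are assumptions of the sketch below and not the setting studied in the paper. In this case A ∧ B entails a ≤ b exactly when b is reachable from a via the given atoms, and an interpolating term is a shared constant on such a path.

```python
from collections import deque

def reachable(atoms, start):
    """Constants c such that start <= c follows by reflexivity and transitivity."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for lo, hi in atoms:
            if lo == node and hi not in seen:
                seen.add(hi)
                queue.append(hi)
    return seen

def constants(atoms):
    return {c for atom in atoms for c in atom}

def le_interpolant(A, B, a, b):
    """Return a shared constant t with a <= t and t <= b entailed by A and B, or None."""
    shared = constants(A) & constants(B)
    above_a = reachable(A + B, a)
    if b not in above_a:
        return None                      # a <= b is not entailed at all
    for t in sorted(shared):             # sorted only to make the choice deterministic
        if t in above_a and b in reachable(A + B, t):
            return t
    return None
```

For A = {a ≤ c, c ≤ d} and B = {d ≤ b}, encoded as lists of pairs, the only shared constant is d, and `le_interpolant` returns it.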

Interpolation has been studied in classical and non-classical logics and in extensions and combinations of theories, and is very important in program verification and also in the area of description logics. The first algorithms for interpolant generation in program verification required explicit constructions and "separations" of proofs [14,16]. In [13] interpolants are computed using variants of resolution. For certain theories, the "separation" of proofs relied on the possibility of "separating" atoms, i.e. on P-interpolation. Equality interpolation is used in [34] for devising an interpolation method in combinations of theories with disjoint signatures. In [22,24] and [19], for instance, we consider interpolation problems in certain classes of extensions T<sub>0</sub> ∪ K of a base theory T<sub>0</sub> and use a hierarchical approach to compute interpolants. The method relies on the P-interpolation property of the base theory T<sub>0</sub>. In most of the applications we considered, P is the equality predicate ≈ or a predicate ≤ with the property that in all models of T<sub>0</sub>, the interpretation of ≤ is a partial ordering. Since at that time our main interest was the study of *interpolation problems*, in [22,24] and [19] P-interpolation is only used in order to help in giving methods for interpolation and not as a goal in itself. However, in several papers in the area of description logics (cf. e.g. [8,31]) when defining the notion of interpolation in description logics the authors define in fact a notion of ⊑-interpolation. In [8] (Thm. 4) it is proved that EL<sup>+</sup> allows interpolation (in fact, the notion of ⊑-interpolation mentioned above) for *safe* role inclusions; this is related to the notion of "sharing" considered in [24], cf. also Sect. 4. The proof technique in [8] uses simulations.

In this paper, we analyze the property of P-interpolation in theory extensions, propose a method for solving it based on hierarchical reasoning and satisfiability modulo theories, and formulate the ⊑-interpolation problem for EL and EL<sup>+</sup> as a ≤-interpolation problem in a theory of semilattices with operators. We first studied ≤-interpolation in [17] in the context of description logics; the ⊑-interpolating concept descriptions were regarded as a form of "high-level" explanations. In this paper we further extend the work in [17]. The general approach we propose opens the possibility of applying similar methods to more general classes of non-classical logics (including e.g. substructural logics or the logics with monotone operators studied in [27,28]) or in verification (to consider more general theory extensions than those with uninterpreted function symbols analyzed in [19]). The main results can be summarized as follows:


*Structure of the Paper:* In Sect. 2 and 3 basic notions are introduced, and some results needed later are proved. In Sect. 4 we identify classes of local theory extensions allowing P-interpolation and propose a hierarchical method of computing P-interpolants. This is used in Sect. 5 to study the existence of ≤-interpolation in classes of semilattices with monotone operators. In Sect. 6 we use the links between the theory of semilattices with operators and the description logics EL and EL<sup>+</sup>, and show how the results can be used in the study of these logics. The details of the proofs and additional examples can be found in [18].

### 2 Theories, Convexity, *P* -Interpolation, Beth Definability

We assume known standard definitions from first-order logic such as Π-structures, models, homomorphisms, logical entailment, satisfiability, unsatisfiability.

We consider signatures of the form Π = (Σ, Pred), where Σ is a family of function symbols and Pred a family of predicate symbols. In this paper, a theory T is described by a set of closed formulae (the axioms of the theory). We call a theory axiomatized by a set of (universally quantified) equations an *equational theory*. We denote by Mod(T) the set of all models of T. We denote "falsum" with ⊥. If F and G are formulae we write F |= G (resp. F |=<sub>T</sub> G) to express the fact that every model of F (resp. every model of F which is also a model of T) is a model of G. The definitions can be extended in a natural way to the case when F is a set of formulae; in this case, F |=<sub>T</sub> G if and only if T ∪ F |= G. F |= ⊥ means that F is unsatisfiable; F |=<sub>T</sub> ⊥ means that there is no model of T which is also a model of F. If there is a model of T which is also a model of F we say that F is T-consistent. If C is a fixed countable set of fresh constants, we denote by Π<sup>C</sup> the extension of Π with constants in C.

Convexity and P-Convexity. We can define a notion of convexity w.r.t. a subset P of the set of predicates.

Definition 1. *A theory* T *with signature* Π = (Σ, Pred) *is* convex *with respect to a subset* P *of* Pred *(which may include also equality* ≈*) if for all conjunctions* Γ *of ground* Π<sup>C</sup>*-atoms (with additional constants in a set* C*), relations* R<sub>1</sub>,...,R<sub>m</sub> ∈ P *and tuples of* Π<sup>C</sup>*-terms of corresponding arity* t<sub>1</sub>,...,t<sub>m</sub> *such that* Γ |=<sub>T</sub> ⋁<sup>m</sup><sub>i=1</sub> R<sub>i</sub>(t<sub>i</sub>) *there exists* i<sub>0</sub> ∈ {1,...,m} *such that* Γ |=<sub>T</sub> R<sub>i<sub>0</sub></sub>(t<sub>i<sub>0</sub></sub>)*.*

We will call a theory T *convex* if it is Pred ∪ {≈}-convex. The following result is well-known (cf. e.g. [5,10,32]):

Theorem 1. *Let* T *be a theory and let* Mod(T) *be the class of models of* T*.*


Corollary 2. *Let* T<sub>1</sub>*,* T<sub>2</sub> *be two theories with signatures* Π<sub>1</sub>, Π<sub>2</sub>*. If* Mod(T<sub>1</sub>) *and* Mod(T<sub>2</sub>) *are closed under direct products, then* T<sub>1</sub> ∪ T<sub>2</sub> *is convex.*

*Proof:* Follows from the fact that if Mod(T<sub>1</sub>) and Mod(T<sub>2</sub>) are closed under direct products then so is Mod(T<sub>1</sub> ∪ T<sub>2</sub>), and from Theorem 1. □

From Theorem 1 and Corollary 2 it immediately follows that if T<sub>1</sub> and T<sub>2</sub> are universal theories and convex then T<sub>1</sub> ∪ T<sub>2</sub> is convex. In particular, every extension of a convex universal theory T<sub>0</sub> with a set of new function symbols axiomatized by a set K of Horn clauses is convex.

Equality Interpolation, R-Interpolation. We say that a convex theory T has the equality interpolation property if for all conjunctions of ground Π<sup>C</sup>-literals A(c, a<sub>1</sub>, a) and B(c, b<sub>1</sub>, b), if A ∧ B |=<sub>T</sub> a ≈ b then there exists a term t(c) containing only the shared constants c such that A ∧ B |=<sub>T</sub> a ≈ t(c) ∧ t(c) ≈ b.

Sometimes, the theories and theory extensions we study contain interpreted symbols in a set Π<sub>0</sub> = (Σ<sub>0</sub>, Pred) and non-interpreted function symbols in a set Σ<sub>1</sub>. The classical definition for equality interpolation for a theory T mentioned above allows the term t(c) to contain all function symbols in the signature of T; these symbols are in this case all seen as being interpreted. If we distinguish between interpreted and uninterpreted functions we might require that the intermediate term t(c) contains only "shared" uninterpreted functions and common constants.

If Σ<sub>A</sub> and Σ<sub>B</sub> are the uninterpreted function symbols occurring in A resp. B, and Θ is a closure operator, by "shared" uninterpreted functions we can mean:


*Example 1.* Let T = T<sub>0</sub> ∪ K be the extension of a theory T<sub>0</sub> with set of interpreted function symbols Σ<sub>0</sub> with a set K of clauses containing new uninterpreted function symbols in a set Σ<sub>1</sub>. If A and B are sets of atoms in the signature of T containing additional constants in a set C and uninterpreted function symbols Σ<sub>A</sub>, Σ<sub>B</sub> then the *intersection-shared* uninterpreted function symbols of A and B are Σ<sub>A</sub> ∩ Σ<sub>B</sub>. Let Θ<sub>K</sub> be defined for every Σ ⊆ Σ<sub>1</sub> by Θ<sub>K</sub>(Σ) = ⋃<sub>f∈Σ</sub> {g ∈ Σ<sub>1</sub> | g ∼<sup>∗</sup><sub>K</sub> f}, where ∼<sup>∗</sup><sub>K</sub> is the equivalence relation induced by the relation f ∼<sub>K</sub> g iff there exists a clause C ∈ K s.t. f, g both occur in C.

Then the Θ<sub>K</sub>-shared symbols are Θ<sub>K</sub>(Σ<sub>A</sub>) ∩ Θ<sub>K</sub>(Σ<sub>B</sub>). In particular, if A contains a function symbol f and B contains a symbol g such that f, g occur both in a clause in K, then f and g are considered to be Θ<sub>K</sub>-shared by A, B. □
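The closure operator of Example 1 is easy to compute. The following Python sketch assumes, for illustration only, that each clause of K is represented just by the set of extension function symbols occurring in it; the function name `theta` is hypothetical. It saturates a set of symbols under "occurs in a common clause", which yields the union of the ∼<sup>∗</sup><sub>K</sub>-classes of its elements.

```python
def theta(clauses, sigma):
    """Theta_K(sigma): symbols linked to some f in sigma via chains of common clauses."""
    closure = set(sigma)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            # a clause touching the closure contributes all of its symbols
            if closure & clause and not clause <= closure:
                closure |= clause
                changed = True
    return closure

def theta_shared(clauses, sigma_a, sigma_b):
    """Theta_K-shared symbols of A and B."""
    return theta(clauses, sigma_a) & theta(clauses, sigma_b)
```

With K given by the symbol sets {f, g}, {g, h} and {k}, and Σ<sub>A</sub> = {f}, Σ<sub>B</sub> = {h}, no symbol is intersection-shared, while f, g and h are all Θ<sub>K</sub>-shared.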

We also might be interested in similar properties for other binary relations. We define an R-interpolation property, where R is a binary predicate symbol in Π.

Definition 2. *Let* R ∈ Pred ∪ {≈} *be a binary predicate symbol. An* {R}*-convex theory* T *with uninterpreted symbols* Σ<sub>1</sub> *has the* R-interpolation property *if for all conjunctions of ground atoms* A(c, a<sub>1</sub>, a) *and* B(c, b<sub>1</sub>, b)*, if* A ∧ B |=<sub>T</sub> a R b *then there exists a term* t(c) *containing only common constants* c *and only "shared" uninterpreted symbols in* Σ<sub>1</sub> *such that* A ∧ B |=<sub>T</sub> a R t(c) ∧ t(c) R b*.*

If P ⊆ Pred, we say that a theory has the P*-interpolation property* if it has the R-interpolation property for every R ∈ P. In Sect. 5 we give examples of theories with this property and show that a theory may not have the R-interpolation property for a predicate symbol R if we use the notion of *intersection-shared symbols*, but has the R-interpolation property if we consider the less restrictive notion of Θ*-shared symbols* for a suitably defined closure operator Θ.

Beth Definability. Let T be a theory with signature Π = (Σ<sub>0</sub> ∪ Σ<sub>1</sub>, Pred), where the function symbols in Σ<sub>0</sub> are interpreted function symbols and the function symbols in Σ<sub>1</sub> are regarded as uninterpreted function symbols, and let C be a set of additional constants. We define a notion of Beth definability relative to a subset Σ<sub>S</sub> ⊆ Σ<sub>1</sub> ∪ C of non-interpreted function symbols and constants similar to the one introduced in [31], which we refer to as Σ<sub>S</sub>-Beth definability.

Let Σ<sub>S</sub> ⊆ Σ<sub>1</sub> ∪ C, let Σ<sub>r</sub> = Σ<sub>1</sub>\Σ<sub>S</sub>, and let Π′ = (Σ<sub>0</sub> ∪ (Σ<sub>S</sub> ∩ Σ<sub>1</sub>) ∪ Σ′<sub>r</sub>, Pred), where Σ′<sub>r</sub> = {f′ | f ∈ Σ<sub>1</sub>\Σ<sub>S</sub>}, be the signature obtained by replacing all uninterpreted function symbols in Σ<sub>1</sub> which are not in Σ<sub>S</sub> with new primed copies. If φ is a Π<sup>C</sup>-formula, we will denote by φ′ the formula obtained from φ by replacing all uninterpreted function symbols in Σ<sub>1</sub>\Σ<sub>S</sub> and all constants in C\Σ<sub>S</sub> with distinct, primed versions. The interpreted function symbols and the uninterpreted function symbols and constants in Σ<sub>S</sub> are not changed. We regard the theory T as a set of formulae; let T′ := {φ′ | φ ∈ T}.<sup>1</sup>

Let A be a conjunction of ground Π<sup>C</sup>-literals, and a ∈ C. We say that a is *implicitly defined* by A w.r.t. Σ<sub>S</sub> and T if, with the notations introduced before,

$$A \wedge A' \models\_{\mathcal{T} \cup \mathcal{T}'} a \approx a'.$$

We say that a is *explicitly defined* by A w.r.t. Σ<sub>S</sub> and T if there exists a term t containing only symbols in Σ<sub>0</sub>, Pred and Σ<sub>S</sub> such that A |=<sub>T</sub> a ≈ t.

Definition 3. *Let* T *be a theory with uninterpreted function symbols in a set* Σ<sub>1</sub>*. Let* Σ<sub>S</sub> ⊆ Σ<sub>1</sub> ∪ C*.* T *has the* Beth definability property w.r.t. Σ<sub>S</sub> *(*Σ<sub>S</sub>*-Beth definability), if for every conjunction of literals* A *and every* a ∈ C*, if* A *implicitly defines* a *w.r.t.* Σ<sub>S</sub> *and* T *then* A *explicitly defines* a *w.r.t.* Σ<sub>S</sub> *and* T*.*

In [4,6] it was proved that if a convex theory has the ≈-interpolation property, then it has the Beth definability property. We give an analogous implication between ≈-interpolation and Beth definability w.r.t. a subsignature.

Theorem 3. *Let* T *be a convex theory with signature* Π = (Σ<sub>0</sub> ∪ Σ<sub>1</sub>, Pred)*,* C *a set of constants, and* Σ<sub>S</sub> ⊆ Σ<sub>1</sub> ∪ C*. Let* T′ *be as defined above.*


*Proof (Idea):* (i) Assume a is implicitly definable w.r.t. Σ<sub>S</sub>, i.e. there exists a conjunction A of literals such that if A′ is obtained by renaming as explained before, then A ∧ A′ |=<sub>T∪T′</sub> a ≈ a′. Since T ∪ T′ has ≈-interpolation, there exists a term t using only the function and predicate symbols common to A and A′ (i.e. the symbols in Σ<sub>0</sub> ∪ Σ<sub>S</sub>) such that A ∧ A′ |=<sub>T∪T′</sub> a ≈ t ∧ t ≈ a′. It can be shown that then A |=<sub>T</sub> a ≈ t.

(ii) Assume a is implicitly definable w.r.t. Θ<sub>K</sub>(Σ<sub>S</sub>), i.e. there exists a conjunction A of literals such that if A′ is obtained by renaming as explained before then A ∧ A′ |=<sub>T∪T′</sub> a ≈ a′. The symbols shared by A and A′ are the symbols in Σ<sub>0</sub> ∪ Σ<sub>S</sub> ∪ Θ<sub>K∪K′</sub>(Σ<sub>S</sub>), where Θ<sub>K∪K′</sub>(Σ<sub>S</sub>) = ⋃<sub>f∈Σ<sub>S</sub>∩Σ<sub>1</sub></sub> {g ∈ Σ<sub>1</sub> ∪ Σ′<sub>1</sub> | f ∼<sup>∗</sup><sub>K∪K′</sub> g}. It is easy to see that for every f ∈ Σ<sub>1</sub>\Σ<sub>S</sub>, f ∈ Θ<sub>K</sub>(Σ<sub>S</sub>) iff f′ ∈ Θ<sub>K′</sub>(Σ<sub>S</sub>), and Θ<sub>K∪K′</sub>(Σ<sub>S</sub>) = Θ<sub>K</sub>(Σ<sub>S</sub>) ∪ Θ<sub>K′</sub>(Σ<sub>S</sub>). Since we assumed that T ∪ T′ has the ≈-interpolation property with the notion of Θ<sub>K∪K′</sub>-sharing, there exists a term t over the signature Σ<sub>0</sub> ∪ Θ<sub>K∪K′</sub>(Σ<sub>S</sub>) such that A ∧ A′ |=<sub>T∪T′</sub> a ≈ t ∧ t ≈ a′. The term t might contain primed versions of function symbols. We can show that we can find a term t containing only symbols in Θ<sub>K</sub>(Σ<sub>S</sub>) such that A |=<sub>T</sub> a ≈ t. □

<sup>1</sup> A similar definition can be given if theories are defined as classes of models.

### 3 Local Theory Extensions

Let Π<sub>0</sub> = (Σ<sub>0</sub>, Pred) be a signature, and T<sub>0</sub> be a "base" theory with signature Π<sub>0</sub>. We consider extensions T := T<sub>0</sub> ∪ K of T<sub>0</sub> with new function symbols Σ<sub>1</sub> (*extension functions*) whose properties are axiomatized using a set K of (universally closed) clauses in the extended signature Π = (Σ<sub>0</sub> ∪ Σ<sub>1</sub>, Pred), which contain function symbols in Σ<sub>1</sub>. If G is a finite set of ground Π<sup>C</sup>-clauses, where C is an additional set of constants, and K a set of Π-clauses, we will denote by st(K, G) (resp. est(K, G)) the set of all ground terms (resp. extension ground terms, i.e. terms starting with a function in Σ<sub>1</sub>) which occur in G or K. In this paper we regard every finite set G of ground clauses as the ground formula ⋀<sub>C∈G</sub> C. If T is a set of ground terms in the signature Π<sup>C</sup>, we denote by K[T] the set of all instances of K in which the terms starting with a function symbol in Σ<sub>1</sub> are in T. Let Ψ be a map associating with every finite set T of ground terms a finite set Ψ(T) of ground terms containing T. For any set G of ground Π<sup>C</sup>-clauses we write K[Ψ<sub>K</sub>(G)] for K[Ψ(est(K, G))]. We define:

(Loc<sup>Ψ</sup><sub>f</sub>) For every finite set G of ground clauses in Π<sup>C</sup> it holds that T<sub>0</sub> ∪ K ∪ G |= ⊥ if and only if T<sub>0</sub> ∪ K[Ψ<sub>K</sub>(G)] ∪ G is unsatisfiable.

Extensions satisfying condition (Loc<sup>Ψ</sup><sub>f</sub>) are called Ψ*-local*. If Ψ is the identity we obtain the notion of *local theory extensions* [21]; if in addition T<sub>0</sub> is the theory of pure equality we obtain the notion of *local theories* [9,15].

Hierarchical Reasoning. Consider a Ψ-local theory extension T<sub>0</sub> ⊆ T<sub>0</sub> ∪ K. Condition (Loc<sub>f</sub><sup>Ψ</sup>) requires that for every finite set G of ground Π<sup>C</sup>-clauses, T<sub>0</sub> ∪ K ∪ G |= ⊥ iff T<sub>0</sub> ∪ K[Ψ<sub>K</sub>(G)] ∪ G |= ⊥. In all clauses in K[Ψ<sub>K</sub>(G)] ∪ G the function symbols in Σ<sub>1</sub> only have ground terms as arguments, so K[Ψ<sub>K</sub>(G)] ∪ G can be flattened and purified by introducing, in a bottom-up manner, new constants c<sub>t</sub> ∈ C for subterms t = f(c<sub>1</sub>,...,c<sub>n</sub>) where f ∈ Σ<sub>1</sub> and the c<sub>i</sub> are constants, together with definitions c<sub>t</sub> = f(c<sub>1</sub>,...,c<sub>n</sub>). We thus obtain a set of clauses K<sub>0</sub> ∪ G<sub>0</sub> ∪ Def, where K<sub>0</sub> and G<sub>0</sub> do not contain Σ<sub>1</sub>-function symbols and Def contains clauses of the form c = f(c<sub>1</sub>,...,c<sub>n</sub>), where f ∈ Σ<sub>1</sub> and c, c<sub>1</sub>,...,c<sub>n</sub> are constants.
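The bottom-up purification step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: constants are encoded as strings and compound terms as tuples `(head, arg1, ..., argn)`; the names `purify`, `sigma1`, `defs` and the generated constant names `c1, c2, ...` are hypothetical; and base-theory subterms that occur below extension functions are left in place instead of being named by constants.

```python
from itertools import count

def purify(term, sigma1, defs, fresh):
    """Replace extension subterms bottom-up by constants, recording definitions in defs."""
    if isinstance(term, str):              # constants are plain strings
        return term
    head, *args = term                     # compound terms are tuples (head, arg1, ..., argn)
    flat = (head, *(purify(a, sigma1, defs, fresh) for a in args))
    if head not in sigma1:                 # base symbol: keep it, with purified arguments
        return flat
    if flat not in defs:                   # one constant per distinct flat extension term
        defs[flat] = f"c{next(fresh)}"
    return defs[flat]

sigma1 = {"f", "g"}                        # hypothetical extension symbols
defs, fresh = {}, count(1)
# Atoms s <= t given as pairs (s, t); the last one contains the nested term f(g(a)).
atoms = [("d", ("g", "a")), (("g", "c"), "a"), ("b", ("f", "b")), (("f", ("g", "a")), "d")]
purified = [(purify(s, sigma1, defs, fresh), purify(t, sigma1, defs, fresh))
            for s, t in atoms]
```

After running the sketch, `purified` contains only atoms between constants, and `defs` plays the role of Def: each key is a flat extension term and each value is the constant that names it.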

Theorem 4 ([11,12,21]). *Let* K *be a set of clauses. Assume that* T<sub>0</sub> ⊆ T<sub>0</sub> ∪ K *is a* Ψ*-local theory extension. For any finite set* G *of flat ground clauses (with no nestings of extension functions), let* K<sub>0</sub> ∪ G<sub>0</sub> ∪ Def *be obtained from* K[Ψ<sub>K</sub>(G)] ∪ G *by flattening and purification, as explained above. Then the following are equivalent to* T<sub>0</sub> ∪ K ∪ G |= ⊥*:*

$$\begin{array}{l} (i) \ \mathcal{T}\_{0} \cup \mathcal{K}[\Psi\_{\mathcal{K}}(G)] \cup G \models \perp. \\ (ii) \ \mathcal{T}\_{0} \cup \mathcal{K}\_{0} \cup G\_{0} \cup \mathsf{Con}\_{0} \models \perp, \text{ where} \\ \mathsf{Con}\_{0} = \left\{ \bigwedge\_{i=1}^{n} c\_{i} \approx d\_{i} \to c \approx d \mid \begin{array}{l} f(c\_{1}, \ldots, c\_{n}) \approx c \in \mathsf{Def} \\ f(d\_{1}, \ldots, d\_{n}) \approx d \in \mathsf{Def} \end{array} \right\}. \end{array}$$
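The set Con<sub>0</sub> can be generated mechanically from Def. The following Python sketch assumes that Def is given as a dictionary mapping flat extension terms `(f, c1, ..., cn)` to the constants naming them; the function name `congruence_instances` and the clause encoding `(premises, conclusion)` are hypothetical. Since ≈ is symmetric, the sketch emits one clause per unordered pair of definitions with the same function symbol, which is an assumption made for brevity.

```python
from itertools import combinations

def congruence_instances(defs):
    """Return Horn clauses (premises, conclusion) encoding functionality of extension symbols.

    defs maps a flat term (f, c1, ..., cn) to the constant c with f(c1, ..., cn) ≈ c.
    Each clause stands for c1 ≈ d1 ∧ ... ∧ cn ≈ dn → c ≈ d.
    """
    clauses = []
    for (t1, c), (t2, d) in combinations(sorted(defs.items()), 2):
        if t1[0] == t2[0] and len(t1) == len(t2):      # same function symbol and arity
            premises = list(zip(t1[1:], t2[1:]))        # pairs (ci, di)
            clauses.append((premises, (c, d)))
    return clauses

# Hypothetical definitions: a1 ≈ g(a), c1 ≈ g(c), b1 ≈ f(b).
example_defs = {("g", "a"): "a1", ("g", "c"): "c1", ("f", "b"): "b1"}
con0 = congruence_instances(example_defs)
```

For these definitions the only instance relates the two g-terms: a ≈ c → a<sub>1</sub> ≈ c<sub>1</sub>.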

In [12] we showed that for extensions with sets of flat and linear clauses, Ψ-locality can be checked by checking whether an embeddability condition of partial into total models holds. In [26] we mention (without proof) that the proof in [12] can be extended to situations in which the clauses in K are not linear. The result is presented below. A full proof is given in the extended version of this paper [18].

Theorem 5. *Let* K *be a set of* Σ<sub>1</sub>*-flat clauses, and* Ψ<sub>K</sub> *be a term closure operator such that for every set* T *of ground terms and for every clause* D *in* K*, if a variable occurs in two terms in* D *then either the two terms are identical, or the variable occurs below two different unary function symbols* f *and* g *and, for every constant* c*,* f(c) *is in* Ψ(T) *iff* g(c) *is in* Ψ(T)*. If all partial models* A *of* T<sub>0</sub> ∪ K *with totally defined* Σ<sub>0</sub>*-functions, and for which the set of terms* {f(a<sub>1</sub>,...,a<sub>n</sub>) | f ∈ Σ<sub>1</sub> *and* f<sub>A</sub>(a<sub>1</sub>,...,a<sub>n</sub>) *is defined*} *is finite and closed under* Ψ*, embed into total models of* T<sub>0</sub> ∪ K*, then the extension* T<sub>0</sub> ∪ K *satisfies* (Loc<sub>f</sub><sup>Ψ</sup>)*.*

### 4 *R*-interpolation in Local Theory Extensions

In [24] we considered convex and P-interpolating theories T<sub>0</sub> with signature Π<sub>0</sub> = (Σ<sub>0</sub>, Pred) (where P ⊆ Pred). We studied Ψ-local extensions T = T<sub>0</sub> ∪ K of T<sub>0</sub> with new function symbols in a set Σ<sub>1</sub> axiomatized by a set K of clauses, with the property that all clauses in K are of the form:

$$\begin{cases} x\_1 \, R\_1 \, s\_1 \wedge \dots \wedge x\_n \, R\_n \, s\_n \to f(x\_1, \dots, x\_n) \, R \, g(y\_1, \dots, y\_n) \\ x\_1 \, R\_1 \, y\_1 \wedge \dots \wedge x\_n \, R\_n \, y\_n \to f(x\_1, \dots, x\_n) \, R \, f(y\_1, \dots, y\_n) \end{cases} \tag{1}$$

where n ≥ 1, x<sub>1</sub>,...,x<sub>n</sub>, y<sub>1</sub>,...,y<sub>n</sub> are variables, f, g ∈ Σ<sub>1</sub>, R<sub>1</sub>,...,R<sub>n</sub>, R are binary relations with R<sub>1</sub>,...,R<sub>n</sub> ∈ P and R transitive, and each s<sub>i</sub> is either a variable among the arguments of g, or a term of the form f<sub>i</sub>(z<sub>1</sub>,...,z<sub>k</sub>), where f<sub>i</sub> ∈ Σ<sub>1</sub> and all the arguments of f<sub>i</sub> are variables occurring among the arguments of g.

*Example 2.* A set K of axioms containing clauses of the form:

$$\begin{cases} x\_1 \le h(y\_1) \to f(x\_1) \le g(y\_1) \\\ x\_1 \le y\_1 \to f(x\_1) \le f(y\_1) \end{cases}$$

satisfies the conditions above: n = 1, R<sub>1</sub> = R = ≤, s<sub>1</sub> = h(y<sub>1</sub>), and f, g, h ∈ Σ<sub>1</sub>.

In [24], we proved that if T<sub>0</sub> allows ground interpolation, then T allows ground interpolation, and that the interpolants can be computed in a hierarchical way, using a method for ground interpolation in T<sub>0</sub>. We now show that under the conditions above, the property of P*-interpolation* can be transferred from the theory T<sub>0</sub> to the extension T = T<sub>0</sub> ∪ K of T<sub>0</sub>. The function symbols in the signature of T<sub>0</sub> are considered to be interpreted, and will always be considered to be shared. For the function symbols in the signature Σ<sub>1</sub>, considered to be "quasi"-interpreted, we use the notion of Θ<sub>K</sub>-sharing introduced in Sect. 2.

In order to show that T has the P-interpolation property, we need to prove that if A, B are conjunctions of atoms and A(c, a<sub>1</sub>, a) ∧ B(c, b<sub>1</sub>, b) |=<sub>T</sub> aRb, where R ∈ P, then there exists a term t containing only the constants common to A and B and only function symbols which are Θ<sub>K</sub>*-shared* by A and B, such that A(c, a<sub>1</sub>, a) ∧ B(c, b<sub>1</sub>, b) |=<sub>T</sub> aRt ∧ tRb.

A(c, a<sub>1</sub>, a) ∧ B(c, b<sub>1</sub>, b) |=<sub>T</sub> aRb iff A(c, a<sub>1</sub>, a) ∧ B(c, b<sub>1</sub>, b) ∧ ¬(aRb) |=<sub>T</sub> ⊥. By Theorem 4 we can purify and flatten this conjunction and obtain a conjunction of unit clauses A<sub>0</sub> ∧ B<sub>0</sub> ∧ Def ∧ ¬(aRb), where Def is a set of definitions of newly introduced constants. Let T be the set of extension terms in Def. We introduce new constants and definitions also for all extension terms in Ψ(T). This new set of definitions can be written as a conjunction D<sub>A</sub> ∧ D<sub>B</sub> of its A-part and its B-part. By the Ψ-locality of the extension T<sub>0</sub> ⊆ T<sub>0</sub> ∪ K and Theorem 4,

$$A\_0 \land B\_0 \land \mathsf{Def} \land \neg(aRb) \models\_{\mathcal{T}} \bot \quad \text{iff} \quad \mathcal{K}\_0 \land A\_0 \land B\_0 \land \mathsf{Con}[D\_A \land D\_B]\_0 \land \neg(aRb) \models\_{\mathcal{T}\_0} \bot,$$

where K<sub>0</sub> is obtained from K[D<sub>A</sub> ∧ D<sub>B</sub>] by replacing the Σ<sub>1</sub>-terms with the corresponding constants contained in the definitions D<sub>A</sub> ∧ D<sub>B</sub> and

$$\mathsf{Con}[D\_A \wedge D\_B]\_0 = \bigwedge \left\{ \bigwedge\_{i=1}^n c\_i \approx d\_i \to c \approx d \mid \begin{array}{l} f(c\_1, \dots, c\_n) \approx c \in D\_A \cup D\_B, \\ f(d\_1, \dots, d\_n) \approx d \in D\_A \cup D\_B \end{array} \right\}.$$

In general, Con[D<sub>A</sub> ∧ D<sub>B</sub>]<sub>0</sub> = Con<sup>A</sup><sub>0</sub> ∧ Con<sup>B</sup><sub>0</sub> ∧ Con<sub>mix</sub> and K<sub>0</sub> = K<sup>A</sup><sub>0</sub> ∧ K<sup>B</sup><sub>0</sub> ∧ K<sub>mix</sub>, where Con<sup>A</sup><sub>0</sub>, K<sup>A</sup><sub>0</sub> only contain extension functions and constants which occur in A, Con<sup>B</sup><sub>0</sub>, K<sup>B</sup><sub>0</sub> only contain extension functions and constants which occur in B, and Con<sub>mix</sub>, K<sub>mix</sub> contain mixed clauses with constants occurring in both A and B. Our goal is to separate Con<sub>mix</sub> and K<sub>mix</sub> into an A-part and a B-part, which would allow us to use the P-interpolation property of the theory T<sub>0</sub>.

Proposition 6. *Assume that* T<sub>0</sub> *is convex and* P*-interpolating. Let* H *be a set of Horn clauses* (⋀<sup>n</sup><sub>i=1</sub> c<sub>i</sub>R<sub>i</sub>d<sub>i</sub>) → cR<sub>0</sub>d *in the signature* Π<sup>C</sup><sub>0</sub> *(with* R<sub>0</sub> *transitive and* R<sub>i</sub> ∈ P*) which are instances of flattened and purified clauses of type (1) and of congruence axioms. Let* H<sub>mix</sub> *be the set of mixed clauses in* H*:*

$$\begin{array}{rl} \mathcal{H}\_{\mathsf{mix}} = & \{ \bigwedge\_{i=1}^{n} c\_i R\_i d\_i \to c R\_0 d \in \mathcal{H} \mid c\_i, c \text{ constants in } A,\ d\_i, d \text{ constants in } B \} \ \cup \\ & \{ \bigwedge\_{i=1}^{n} c\_i R\_i d\_i \to c R\_0 d \in \mathcal{H} \mid c\_i, c \text{ constants in } B,\ d\_i, d \text{ constants in } A \} \end{array}$$

*Let* A<sub>0</sub> *and* B<sub>0</sub> *be conjunctions of ground literals in the signature* Π<sup>C</sup><sub>0</sub> *such that* A<sub>0</sub> ∧ B<sub>0</sub> ∧ H ∧ ¬(aRb) |=<sub>T<sub>0</sub></sub> ⊥*. Then* H *can be separated into an* A*- and a* B*-part by replacing the set* H<sub>mix</sub> *of mixed clauses with a separated set of formulae* H<sub>sep</sub>*:*

*(i) There exists a set* T *of* (Σ<sub>0</sub> ∪ C)*-terms containing only constants common to* A<sub>0</sub> *and* B<sub>0</sub> *such that* A<sub>0</sub> ∧ B<sub>0</sub> ∧ (H\H<sub>mix</sub>) ∧ H<sub>sep</sub> ∧ ¬(aRb) |=<sub>T<sub>0</sub></sub> ⊥*, where*

$$\begin{array}{rl} \mathcal{H}\_{\mathsf{sep}} = & \{ (\bigwedge\_{i=1}^{n} c\_i R\_i t\_i \to c R c\_{f(t\_1,\ldots,t\_n)}) \wedge (\bigwedge\_{i=1}^{n} t\_i R\_i d\_i \to c\_{f(t\_1,\ldots,t\_n)} R d) \mid \bigwedge\_{i=1}^{n} c\_i R\_i d\_i \to cRd \in \mathcal{H}\_{\mathsf{mix}}, \\ & \ \ d\_i \approx s\_i(e\_1,\ldots,e\_n),\ d \approx g(e\_1,\ldots,e\_n) \in D\_B,\ c \approx f(c\_1,\ldots,c\_n) \in D\_A \text{ or vice versa} \} \ = \ \mathcal{H}^A\_{\mathsf{sep}} \wedge \mathcal{H}^B\_{\mathsf{sep}} \end{array}$$

*and* c<sub>f(t<sub>1</sub>,...,t<sub>n</sub>)</sub> *are new constants in* Σ<sub>c</sub> *(considered to be common) introduced for the corresponding terms* f(t<sub>1</sub>,...,t<sub>n</sub>)*, where for* i ∈ {1,...,n}*,* t<sub>i</sub> *separates the atom* c<sub>i</sub>R<sub>i</sub>d<sub>i</sub>*, which is entailed by the already deduced atoms.*

*(ii)* A<sub>0</sub> ∧ B<sub>0</sub> ∧ (H\H<sub>mix</sub>) ∧ H<sub>sep</sub> ∧ ¬(aRb) *is logically equivalent with respect to* T<sub>0</sub> *with the following separated conjunction of ground literals:* Ā<sub>0</sub> ∧ B̄<sub>0</sub> ∧ ¬(aRb) = A<sub>0</sub> ∧ B<sub>0</sub> ∧ ¬(aRb) ∧ ⋀{cRd | Γ → cRd ∈ H\H<sub>mix</sub>} ∧ ⋀{cRc<sub>f(t)</sub> ∧ c<sub>f(t)</sub>Rd | (Γ → cRc<sub>f(t)</sub>) ∧ (Γ → c<sub>f(t)</sub>Rd) ∈ H<sub>sep</sub>}.

*Proof (Idea).* The proof is similar to that of Prop. 5.7 in [24]. (i) and (ii) are proved simultaneously by induction on the number of clauses in H. If H = ∅, it is already separated. Otherwise, one can prove that either A<sub>0</sub> ∧ B<sub>0</sub> |= aRb, in which case we are done, or A<sub>0</sub> ∧ B<sub>0</sub> entails all the premises of some clause C in H. If C contains only constants in A<sub>0</sub> or only constants in B<sub>0</sub>, we can remove it from H, add its conclusion to A<sub>0</sub> ∧ B<sub>0</sub> and repeat the procedure with the new A<sub>0</sub> ∧ B<sub>0</sub> and H. If the clause is mixed, we can compute terms t<sub>i</sub> which separate the premises in C, separate C into an instance C<sub>1</sub> of monotonicity and an instance C<sub>2</sub> of a clause in K, remove C from H, add to A<sub>0</sub> ∧ B<sub>0</sub> the conclusions of the clauses C<sub>1</sub>, C<sub>2</sub>, and repeat the procedure with the new A<sub>0</sub> ∧ B<sub>0</sub> and H. □

Theorem 7. *Assume that* T<sub>0</sub> *is convex and* P*-interpolating with respect to* P ⊆ Pred*, and that* T = T<sub>0</sub> ∪ K *is a local extension of* T<sub>0</sub> *with a set of clauses* K *which only contains combinations of clauses of type (1). Then* T *is also* P*-interpolating.*

*Proof (Idea).* We prove that if A, B are conjunctions of literals and A(c, a<sub>1</sub>, a) ∧ B(c, b<sub>1</sub>, b) |=<sub>T</sub> aRb where R ∈ P, then there exists a term t containing only the constants common to A and B and only function symbols which are shared by A and B, such that A(c, a<sub>1</sub>, a) ∧ B(c, b<sub>1</sub>, b) |=<sub>T</sub> aRt ∧ tRb. We can restrict w.l.o.g. to a purified and flattened conjunction of unit clauses A<sub>0</sub> ∧ B<sub>0</sub> ∧ Def ∧ ¬(aRb). With the notation introduced before Proposition 6, by Theorem 4 we have:

A<sub>0</sub> ∧ B<sub>0</sub> ∧ Def ∧ ¬(aRb) |=<sub>T</sub> ⊥ iff K<sub>0</sub> ∧ A<sub>0</sub> ∧ B<sub>0</sub> ∧ Con[D<sub>A</sub> ∧ D<sub>B</sub>]<sub>0</sub> ∧ ¬(aRb) |=<sub>T<sub>0</sub></sub> ⊥. By Proposition 6 (ii), there exists a set T of (Σ<sub>0</sub> ∪ C)-terms containing only constants common to A<sub>0</sub> and B<sub>0</sub> such that H = K<sub>0</sub> ∧ Con[D<sub>A</sub> ∧ D<sub>B</sub>]<sub>0</sub> can be separated as described in Proposition 6, and A<sub>0</sub> ∧ B<sub>0</sub> ∧ (H\H<sub>mix</sub>) ∧ H<sub>sep</sub> ∧ ¬(aRb) is logically equivalent w.r.t. T<sub>0</sub> with a separated conjunction of ground literals Ā<sub>0</sub> ∧ B̄<sub>0</sub> ∧ ¬(aRb), which is therefore unsatisfiable, so Ā<sub>0</sub> ∧ B̄<sub>0</sub> |=<sub>T<sub>0</sub></sub> aRb. From the P-interpolation property in T<sub>0</sub>, there exists a term t containing only the shared constants such that Ā<sub>0</sub> ∧ B̄<sub>0</sub> |=<sub>T<sub>0</sub></sub> aRt ∧ tRb. If we now replace all constants c<sub>f(t<sub>1</sub>,...,t<sub>n</sub>)</sub> introduced in the purification process or in the separation process with the terms they denote, we obtain A ∧ B |=<sub>T</sub> aRt ∧ tRb. □

We obtain the following procedure for P-interpolation if A ∧ B |=<sub>T</sub> aRb:

Step 1: Preprocess. Using locality, flattening and purification we obtain a set H ∧ A<sub>0</sub> ∧ B<sub>0</sub> of formulae in the base theory, where H is as in Proposition 6.

Step 2: Δ := ⊤. Repeat as long as A<sub>0</sub> ∧ B<sub>0</sub> ∧ Δ ⊭ aRb:

Let C ∈ H be a clause whose premise is entailed by A<sub>0</sub> ∧ B<sub>0</sub> ∧ Δ.

If C is not mixed, move C to H<sub>sep</sub> and add its conclusion to Δ.

If C is mixed, compute terms t<sub>i</sub> which separate the premises in C, and separate the clause into an instance C<sub>1</sub> of monotonicity and an instance C<sub>2</sub> of a clause in K as in the proof of Proposition 6. Remove C from H, and add C<sub>1</sub>, C<sub>2</sub> to H<sub>sep</sub> and their conclusions to Δ.

Step 3: Compute separating term. Compute a separating term for A<sub>0</sub> ∧ B<sub>0</sub> ∧ Δ |= aRb in T<sub>0</sub>, and construct an interpolant for the extension as explained in the proof of Theorem 7.
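Steps 2 and 3 can be sketched in Python for a deliberately small fragment. The sketch is not the authors' implementation and rests on several assumptions: R is ≤; A<sub>0</sub>, B<sub>0</sub> and Δ contain only atoms c ≤ d between constants, so that entailment reduces to reachability and a separating term can always be chosen as a single shared constant; and each clause of H is encoded as a hypothetical triple `(premises, conclusion, f)`, where `f` is the extension symbol of the term named by the left-hand constant of the conclusion. All function names are illustrative.

```python
def reach(atoms, src):
    """All constants e such that src <= e follows by reflexivity and transitivity."""
    seen, todo = {src}, [src]
    while todo:
        x = todo.pop()
        for left, right in atoms:
            if left == x and right not in seen:
                seen.add(right)
                todo.append(right)
    return seen

def entails(atoms, atom):
    left, right = atom
    return right in reach(atoms, left)

def separator(atoms, shared, atom):
    """A shared constant t with left <= t <= right, or None if there is none."""
    left, right = atom
    for t in sorted(shared):
        if entails(atoms, (left, t)) and entails(atoms, (t, right)):
            return t
    return None

def interpolate(A0, B0, H, goal):
    consts_a = {x for atom in A0 for x in atom}
    consts_b = {x for atom in B0 for x in atom}
    shared = consts_a & consts_b
    atoms = set(A0) | set(B0)              # A0 ∧ B0 ∧ Δ, kept as one set of atoms
    pending = list(H)
    while not entails(atoms, goal):        # Step 2
        clause = next((cl for cl in pending
                       if all(entails(atoms, p) for p in cl[0])), None)
        if clause is None:
            raise ValueError("goal is not entailed")
        pending.remove(clause)
        premises, (c, d), f = clause
        used = {x for p in premises for x in p} | {c, d}
        if used <= consts_a or used <= consts_b:
            atoms.add((c, d))              # not mixed: add the conclusion to Δ
        else:
            # mixed: separate every premise and name f(t1, ..., tn) by a new shared constant
            ts = [separator(atoms, shared, p) for p in premises]
            m = f"{f}({','.join(ts)})"
            shared.add(m)
            atoms |= {(c, m), (m, d)}      # conclusions of C1 (monotonicity) and C2 (instance of K)
    return separator(atoms, shared, goal)  # Step 3

# Purified data in the style of Example 4: a1 = g(a), c1 = g(c), b1 = f(b).
A0 = [("d", "a1"), ("a", "c"), ("c1", "a")]
B0 = [("b", "d"), ("b", "b1")]
H = [([("a", "c")], ("a1", "c1"), "g"), ([("c", "a")], ("c1", "a1"), "g"),
     ([("b", "a1")], ("b1", "a1"), "f"), ([("b", "c1")], ("b1", "c1"), "f")]
t_first = interpolate(A0, B0, H, ("b", "a"))
t_second = interpolate(A0, B0, H, ("b1", "a"))
```

With the goal b ≤ a the loop stops after one non-mixed monotonicity instance and the sketch returns the shared constant `d`. The second goal, f(b) ≤ a, is a hypothetical variant chosen so that the mixed branch is exercised; there the sketch returns `f(d)`.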

### 5 Example: Semilattices with Monotone Operators

We will now analyze ≤-interpolation properties for theories of semilattices with monotone operators. A semilattice (S, ⊓) is a set S with a binary operation ⊓ which is associative, commutative and idempotent. One can equivalently regard semilattices as partially ordered sets (S, ≤) in which infima of finite non-empty subsets exist; then a ≤ b iff a ⊓ b = a.

The theory SLat of semilattices can be axiomatized by equations (associativity, commutativity and idempotence of ⊓), hence it is clearly ≈-convex. Convexity w.r.t. ≤ follows from the fact that x ≤ y iff (x ⊓ y) ≈ x. The theory SLat is ≤-interpolating, therefore also ≈-interpolating (cf. also [17]; we present the idea of the proof since it indicates how the intermediate terms can be computed):

Lemma 8. *The theory* SLat *of semilattices is* ≤*-interpolating.*

*Proof (Idea):* This is a constructive proof based on the fact that every semilattice is isomorphic to a subsemilattice of a power of S<sub>2</sub>, where S<sub>2</sub> is the 2-element semilattice (or, alternatively, that every semilattice is isomorphic to a semilattice of sets). We prove that if A and B are two conjunctions of literals and A ∧ B |=<sub>SLat</sub> a ≤ b, where a is a constant occurring in A and b a constant occurring in B, then there exists a term t containing only common constants in A and B such that A ∧ B |=<sub>SLat</sub> a ≤ t and A ∧ B |=<sub>SLat</sub> t ≤ b. We can assume without loss of generality that A and B consist only of atoms (for details cf. [17]). A ∧ B |=<sub>SLat</sub> a ≤ b if and only if the following conjunction of literals in propositional logic is unsatisfiable:

$$\begin{array}{ll} N\_A: & \left\{ \begin{array}{ll} P\_{e\_1 \sqcap e\_2} \leftrightarrow P\_{e\_1} \wedge P\_{e\_2} & \\ P\_{e\_1} \leftrightarrow P\_{e\_2} & e\_1 \approx e\_2 \in A \\ P\_{e\_1} \to P\_{e\_2} & e\_1 \le e\_2 \in A \end{array} \right. \quad \text{for all } e\_1, e\_2 \text{ subterms in } A \\ N\_B: & \left\{ \begin{array}{ll} P\_{g\_1 \sqcap g\_2} \leftrightarrow P\_{g\_1} \wedge P\_{g\_2} & \\ P\_{g\_1} \leftrightarrow P\_{g\_2} & g\_1 \approx g\_2 \in B \\ P\_{g\_1} \to P\_{g\_2} & g\_1 \le g\_2 \in B \end{array} \right. \quad \text{for all } g\_1, g\_2 \text{ subterms in } B \\ & P\_a \\ & \neg P\_b \end{array}$$

We obtain an unsatisfiable set of clauses (N<sub>A</sub> ∧ P<sub>a</sub>) ∧ (N<sub>B</sub> ∧ ¬P<sub>b</sub>) |= ⊥, where N<sub>A</sub> and N<sub>B</sub> are sets of Horn clauses in which each clause contains a positive literal. We show that if A ∧ B |=<sub>SLat</sub> a ≤ b holds, then for the term

$$t := \bigcap \{ e \mid A \models\_{\mathsf{SLat}} a \le e,\ e \text{ common subterm of } A \text{ and } B \} $$

we have (i) A |=<sub>SLat</sub> a ≤ t, and (ii) A ∧ B |=<sub>SLat</sub> t ≤ b.

Clearly, A |=<sub>SLat</sub> a ≤ t, thus (i) holds. For proving (ii), we analyze the set of clauses obtained by saturating N<sub>A</sub> ∧ P<sub>a</sub> under ordered resolution in which all propositional variables occurring in A but not in B are larger than the common symbols. It is proved that for deriving the contradiction only the unit clauses P<sub>e</sub>, where e is a common subterm of A and B and A |= a ≤ e, and certain resolvents of N<sub>A</sub> ∧ P<sub>a</sub> are needed. The full proof is given in [17] and also in [18]. □

We illustrate the computation of intermediate terms on an example.

*Example 3.* Let A = {a<sub>1</sub> ≤ c<sub>1</sub>, c<sub>2</sub> ≤ a<sub>2</sub>, a<sub>2</sub> ≤ c<sub>3</sub>} and B = {c<sub>1</sub> ≤ b<sub>1</sub>, b<sub>1</sub> ≤ c<sub>2</sub>, c<sub>3</sub> ≤ b<sub>2</sub>}. It is easy to see that A ∧ B |= a<sub>1</sub> ≤ b<sub>2</sub>. We can find an intermediate term by using the methods described in the proof of Lemma 8: We saturate the set of clauses

N<sub>A</sub> ∧ P<sub>a<sub>1</sub></sub> = (P<sub>a<sub>1</sub></sub> → P<sub>c<sub>1</sub></sub>) ∧ (P<sub>c<sub>2</sub></sub> → P<sub>a<sub>2</sub></sub>) ∧ (P<sub>a<sub>2</sub></sub> → P<sub>c<sub>3</sub></sub>) ∧ P<sub>a<sub>1</sub></sub>

under ordered resolution, in which the propositional variables P<sub>a<sub>1</sub></sub>, P<sub>a<sub>2</sub></sub> are larger than P<sub>c<sub>1</sub></sub>, P<sub>c<sub>2</sub></sub>, P<sub>c<sub>3</sub></sub>. This yields the clauses P<sub>c<sub>1</sub></sub> and P<sub>c<sub>2</sub></sub> → P<sub>c<sub>3</sub></sub> containing shared propositional variables. (N<sub>A</sub> ∧ P<sub>a<sub>1</sub></sub>) ∧ (N<sub>B</sub> ∧ ¬P<sub>b<sub>2</sub></sub>) is unsatisfiable iff N<sub>B</sub> ∧ ¬P<sub>b<sub>2</sub></sub> ∧ P<sub>c<sub>1</sub></sub> ∧ (P<sub>c<sub>2</sub></sub> → P<sub>c<sub>3</sub></sub>) is unsatisfiable. Indeed t = c<sub>1</sub> is an intermediate term, as A |= a<sub>1</sub> ≤ c<sub>1</sub> and A ∧ B |= c<sub>1</sub> ≤ b<sub>2</sub>. Note that N<sub>B</sub> ∧ ¬P<sub>b<sub>2</sub></sub> ∧ P<sub>c<sub>1</sub></sub> is satisfiable, so B ⊭ c<sub>1</sub> ≤ b<sub>2</sub>. Moreover, we only need P<sub>c<sub>2</sub></sub> → P<sub>c<sub>3</sub></sub> in addition to N<sub>B</sub> ∪ ¬P<sub>b<sub>2</sub></sub> to derive ⊥, thus A ∧ B |= c<sub>1</sub> ≤ b<sub>2</sub> and the clause P<sub>c<sub>2</sub></sub> → P<sub>c<sub>3</sub></sub> obtained from N<sub>A</sub> is really needed for this. □
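The computation of the intermediate term can be sketched in Python. Instead of ordered resolution, the sketch uses plain forward chaining on the Horn encoding from the proof of Lemma 8, which derives the same unit clauses P<sub>e</sub>; this substitution, the function names, and the term encoding (strings for constants, tuples `("meet", e1, e2)` for e<sub>1</sub> ⊓ e<sub>2</sub>, pairs `(s, t)` for atoms s ≤ t) are assumptions made for illustration. Equations would have to be supplied as two inequalities.

```python
def subterms(t):
    """Yield t and, for a meet ("meet", e1, e2), all subterms of e1 and e2."""
    yield t
    if isinstance(t, tuple):
        yield from subterms(t[1])
        yield from subterms(t[2])

def derived(atoms, start):
    """Terms e such that P_e follows from the Horn encoding of atoms together with P_start."""
    terms = {s for l, r in atoms for t in (l, r) for s in subterms(t)} | set(subterms(start))
    true, changed = {start}, True
    while changed:
        changed = False
        new = {r for l, r in atoms if l in true}       # P_l -> P_r for each atom l <= r
        for t in terms:
            if isinstance(t, tuple):
                if t in true:
                    new |= {t[1], t[2]}                # P_{e1 ⊓ e2} -> P_e1 and P_e2
                elif t[1] in true and t[2] in true:
                    new.add(t)                         # P_e1 ∧ P_e2 -> P_{e1 ⊓ e2}
        if not new <= true:
            true |= new
            changed = True
    return true

def common_upper_terms(A, B, a):
    """Common subterms e of A and B with P_e derivable from N_A ∧ P_a; their meet is t."""
    sub_a = {s for l, r in A for t in (l, r) for s in subterms(t)}
    sub_b = {s for l, r in B for t in (l, r) for s in subterms(t)}
    return sorted(sub_a & sub_b & derived(A, a), key=repr)

# Example 3
A = [("a1", "c1"), ("c2", "a2"), ("a2", "c3")]
B = [("c1", "b1"), ("b1", "c2"), ("c3", "b2")]
meet_args = common_upper_terms(A, B, "a1")
```

For Example 3 the only common subterm above a<sub>1</sub> is c<sub>1</sub>, so the meet degenerates to t = c<sub>1</sub>. Calling `derived(A + B, "c1")` confirms that b<sub>2</sub> is reached from c<sub>1</sub> with both parts, while `derived(B, "c1")` does not contain b<sub>2</sub>.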

Semilattices with operators. Let Σ be a set of unary<sup>2</sup> function symbols. We consider the extension SLat<sub>Σ</sub> = SLat ∪ Mon(Σ) of SLat with new function symbols in Σ satisfying the monotonicity axioms Mon<sub>Σ</sub> = ⋀<sub>f∈Σ</sub> Mon(f), where:

Mon(f)  ∀x, y (x ≤ y → f(x) ≤ f(y))

and also extensions SLat ∪ Mon(Σ) ∪ K, where K is a set of axioms of the form:

$$\forall x \quad f(x) \le g(x) \tag{2}$$

$$\forall x, y \quad y \le g(x) \to f(y) \le h(x) \tag{3}$$

where f, g, h ∈ Σ, not necessarily all different.

Lemma 9. *The following extensions satisfy a locality property:*


$$\Psi(G) = \bigcup\_{i \ge 0} \Psi^i(G), \text{ with } \Psi^0(G) = \textsf{est}(G) \text{ (the set of ground terms in } G \text{ starting with extension functions), and}$$

$$\begin{split} \Psi^{i+1}(G) &= \{ h(c) \mid \forall x (g(x) \le h(x)) \in \mathcal{K} \text{ and } g(c) \in \Psi^i(G) \} \cup \\ &\quad \{ g(c) \mid \forall x (g(x) \le h(x)) \in \mathcal{K} \text{ and } h(c) \in \Psi^i(G) \} \cup \\ &\quad \{ h(c) \mid \forall x, y (y \le g(x) \to f(y) \le h(x)) \in \mathcal{K} \text{ and } g(c) \in \Psi^i(G) \} \cup \\ &\quad \{ g(c) \mid \forall x, y (y \le g(x) \to f(y) \le h(x)) \in \mathcal{K} \text{ and } h(c) \in \Psi^i(G) \}. \end{split}$$

*Proof:* (i) follows from a result on the locality of lattices by Skolem [20], or by results in [9], since every partial semilattice weakly embeds into a total one. (ii) follows from results in [27,28]. (iii) Since the axioms in K are not always linear, we use the locality criterion for non-linear sets of clauses mentioned in Theorem 5, and the fact that every semilattice P = (S, ⊓, {f}<sub>f∈Σ</sub>) with partially defined monotone operators satisfying the axioms K, and with the property that if a variable occurs in two terms g(x), h(x) in a clause in K, then for every s ∈ S, g(s) is defined iff h(s) is defined, weakly embeds into a semilattice with totally defined operators satisfying K, which was proved in Lemma 4.5 from [26]. □

<sup>2</sup> We assume that the function symbols are unary to simplify the presentation, and because in the applications to description logics we need only unary function symbols. All the results can be extended to function symbols of higher arity.
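The closure operator Ψ used in Lemma 9 is a simple fixpoint and can be sketched in Python. The encoding is an assumption made for illustration: ground extension terms are pairs `(function, constant)`, an axiom ∀x (g(x) ≤ h(x)) of type (2) is a pair `(g, h)`, and an axiom ∀x, y (y ≤ g(x) → f(y) ≤ h(x)) of type (3) is a triple `(g, f, h)`. The function name `psi_closure` is hypothetical.

```python
def psi_closure(est, type2, type3):
    """Least set containing est and closed under the rules defining Ψ^{i+1}.

    Whenever g(c) is in the set and an axiom links g(x) with h(x), h(c) is added,
    and vice versa. For axioms of type (3) only g and h share the variable x.
    """
    links = set(type2) | {(g, h) for g, _f, h in type3}
    closed = set(est)
    todo = list(closed)
    while todo:
        fn, c = todo.pop()
        for g, h in links:
            for src, dst in ((g, h), (h, g)):
                if fn == src and (dst, c) not in closed:
                    closed.add((dst, c))
                    todo.append((dst, c))
    return closed

# Extension terms and axiom of Example 4: K = {y <= g(x) -> f(y) <= g(x)}.
est_example4 = {("g", "a"), ("g", "c"), ("f", "b")}
psi_example4 = psi_closure(est_example4, [], [("g", "f", "g")])
# A hypothetical type (2) axiom f(x) <= g(x) adds g(a) as soon as f(a) is present.
psi_type2 = psi_closure({("f", "a")}, [("f", "g")], [])
```

For Example 4 the closure adds nothing, in agreement with Ψ(G) = est(G) = {g(a), g(c), f(b)} there.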

Given two conjunctions A and B of ground literals over the signature of semilattices with operators, we consider the semilattice operation ⊓ to be interpreted and the function symbols in Σ to be uninterpreted. Let Σ<sub>A</sub> be the function symbols in Σ occurring in A and Σ<sub>B</sub> those occurring in B. We consider the following variants for "shared uninterpreted function symbols":


Theorem 10. *For every set* K *containing clauses of the form (2) and (3) above, the theory* SLat ∪ Mon<sub>Σ</sub> ∪ K *of semilattices with monotone operators satisfying axioms* K *is* ≤*-interpolating with the notion of* Θ<sub>K</sub>*-sharing for uninterpreted function symbols.*

*Proof:* The clauses of type (2) and (3) satisfy the conditions in the statement of Proposition 6 and Theorem 7. The result is therefore a consequence of the fact that SLat is convex and {≈, ≤}-interpolating, and of Proposition 6 and Theorem 7. □

We illustrate below how Theorem 4, Proposition 6, Theorem 7 and the algorithm in Sect. 4 can be used for computing intermediate terms:

*Example 4.* Consider the extension SLO = SLat ∪ Mon<sub>f</sub> ∪ Mon<sub>g</sub> ∪ K of SLat with two monotone functions f, g satisfying K = {y ≤ g(x) → f(y) ≤ g(x)}. Consider the following conjunctions of atoms: A := d ≤ g(a) ∧ a ≤ c ∧ g(c) ≤ a and B := b ≤ d ∧ b ≤ f(b). It can be checked that A ∧ B |= b ≤ a.

To obtain a separating term we proceed as follows: By the definition of SLO, A ∧ B |=<sub>SLO</sub> b ≤ a iff SLat ∧ Mon<sub>f</sub> ∧ Mon<sub>g</sub> ∧ K ∧ A ∧ B ∧ ¬(b ≤ a) |= ⊥. By Theorem 4, this is the case iff SLat ∧ (Mon<sub>f</sub> ∧ Mon<sub>g</sub> ∧ K)[Ψ(G)] ∧ G |= ⊥, where G = A ∧ B ∧ ¬(b ≤ a), est(G) = {g(a), g(c), f(b)} and Ψ(G) = {g(a), g(c), f(b)}.


Step 1: We purify (Mon<sub>f</sub> ∧ Mon<sub>g</sub> ∧ K)[Ψ(G)] ∧ G by introducing constants a<sub>1</sub> for g(a), c<sub>1</sub> for g(c) and b<sub>1</sub> for f(b), and obtain the formula Def ∧ A<sub>0</sub> ∧ B<sub>0</sub> ∧ Mon<sub>0</sub> ∧ K<sub>0</sub>:


Step 2: Δ := ⊤. Find clauses in Mon<sub>0</sub> ∧ K<sub>0</sub> with premises entailed by A<sub>0</sub> ∧ B<sub>0</sub> ∧ Δ.

C = a ≤ c → a<sub>1</sub> ≤ c<sub>1</sub>: C is not mixed. Since A<sub>0</sub> ∧ B<sub>0</sub> |=<sub>SLat</sub> a ≤ c, A<sub>0</sub> ∧ B<sub>0</sub> ∧ (a ≤ c → a<sub>1</sub> ≤ c<sub>1</sub>) is equivalent to A<sub>0</sub> ∧ B<sub>0</sub> ∧ a<sub>1</sub> ≤ c<sub>1</sub>. Let Δ := {a<sub>1</sub> ≤ c<sub>1</sub>}.

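C = b ≤ a<sub>1</sub> → b<sub>1</sub> ≤ a<sub>1</sub>: C is mixed. Since A<sub>0</sub> ∧ B<sub>0</sub> |=<sub>SLat</sub> b ≤ d ∧ d ≤ a<sub>1</sub>, the premise of C is separated by the shared constant d. With a new constant d<sub>1</sub> for f(d), C is replaced by the clauses

(1) b ≤ d → b<sub>1</sub> ≤ d<sub>1</sub> and

(2) d ≤ a<sub>1</sub> → d<sub>1</sub> ≤ a<sub>1</sub>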

((1) is an instance of a monotonicity axiom, (2) is another instance of K), and A<sub>0</sub> ∧ B<sub>0</sub> ∧ a<sub>1</sub> ≤ c<sub>1</sub> ∧ (b ≤ d → b<sub>1</sub> ≤ d<sub>1</sub>) ∧ (d ≤ a<sub>1</sub> → d<sub>1</sub> ≤ a<sub>1</sub>) is equivalent to A<sub>0</sub> ∧ B<sub>0</sub> ∧ a<sub>1</sub> ≤ c<sub>1</sub> ∧ b<sub>1</sub> ≤ d<sub>1</sub> ∧ d<sub>1</sub> ≤ a<sub>1</sub>. Let Δ := Δ ∧ b<sub>1</sub> ≤ d<sub>1</sub> ∧ d<sub>1</sub> ≤ a<sub>1</sub>.

Step 3: The last conjunction entails b ≤ a. To compute a separating term, we again use Lemma 8. We consider the encoding N<sub>B</sub> ∧ P<sub>b</sub> := (P<sub>b</sub> → P<sub>d</sub>) ∧ (P<sub>b</sub> → P<sub>b<sub>1</sub></sub>) ∧ (P<sub>b<sub>1</sub></sub> → P<sub>d<sub>1</sub></sub>) ∧ P<sub>b</sub> of the B-part of the conjunction, B<sub>0</sub> ∧ b<sub>1</sub> ≤ d<sub>1</sub>. Using ordered resolution with an ordering in which P<sub>b</sub>, P<sub>b<sub>1</sub></sub> ≻ P<sub>d</sub>, P<sub>d<sub>1</sub></sub> we derive the unit clauses P<sub>d</sub>, P<sub>b<sub>1</sub></sub> and P<sub>d<sub>1</sub></sub>. Since d, d<sub>1</sub> are the shared constants, t = d ⊓ d<sub>1</sub> is the separating term. (It can be seen that already d is a separating term.) □

If K contains axioms of type (3) then the theory of semilattices with operators is not ≤-interpolating when sharing is regarded as *intersection-sharing*. Indeed, assume that for every K containing axioms of type (3), SLat<sub>Σ</sub>(K) is ≤-interpolating w.r.t. intersection-sharing. Then it would also be ≈-interpolating w.r.t. intersection-sharing. This cannot be the case, as can be seen from the following example.

*Example 5.* Consider the theory SLat<sub>Σ</sub>(K) of semilattices with monotone operators f, g satisfying the axioms K = {x ≤ g(y) → f(x) ≤ g(y)}, and let C be a set of constants containing constants a, b, d, e. We show that this theory does not have the Σ<sub>S</sub>-Beth definability property, where Σ<sub>S</sub> = {g, e}.

Consider the conjunction of literals A = (a ≤ f(e)) ∧ (e ≤ g(b)) ∧ (g(b) ≤ a). One can prove that a is implicitly definable w.r.t. {g, e} by proving, using the hierarchical reduction for local theory extensions in Theorem 4, that:

(a ≤ f(e)) ∧ (e ≤ g(b)) ∧ (g(b) ≤ a) ∧ (a′ ≤ f′(e)) ∧ (e ≤ g(b′)) ∧ (g(b′) ≤ a′) |=<sub>SLat<sub>Σ</sub>(K∪K′)</sub> a ≈ a′.

We show that a is not explicitly definable w.r.t. {g, e}. If there exists a term t containing only g and e such that (a ≤ f(e)) ∧ (e ≤ g(b)) ∧ (g(b) ≤ a) |=<sub>SLat<sub>Σ</sub>(K)</sub> a ≈ t, then the interpretations of a and t are equal in every model of SLat<sub>Σ</sub>(K) which is a model of A. We show that this is not the case. Let S = ({a, e, b, d}, ⊓, f, g) be the semilattice where d ≤ e ≤ a, d ≤ b and a ⊓ b = e ⊓ b = d, and f(a) = f(e) = a, f(b) = f(d) = d, g(a) = g(e) = g(d) = d and g(b) = a. Then S satisfies A, f and g are monotone, and S is a model of K: Assume that x ≤ g(y). If y ∈ {a, e, d} then g(y) = d so x = d, and f(d) = d ≤ g(y). If y = b then g(b) = a, so x can be a, e or d, and f(a) = f(e) = a, f(d) = d, so f(x) ≤ g(b) = a. A term t containing only g and e can be e or can contain occurrences of g. If t = e then the interpretation of t in S is not a. If t contains occurrences of g it can be proven that the interpretation of t in S is d, i.e. is again different from a. Thus T = SLat<sub>Σ</sub>(K) does not have the Beth definability property w.r.t. Σ<sub>S</sub>, hence, by Theorem 3, T ∪ T′ = SLat<sub>f,g</sub>(K) ∪ SLat<sub>f′,g</sub>(K′) = SLat<sub>f,f′,g</sub>(K ∪ K′), where K′ = {y ≤ g(x) → f′(y) ≤ g(x)}, does not have the ≈-interpolation property w.r.t. intersection-sharing, hence it does not have the ≤-interpolation property w.r.t. intersection-sharing. (By Theorem 10 and Theorem 3, T has the Θ<sub>K</sub>(Σ<sub>S</sub>)-Beth definability property, where Θ<sub>K</sub>(Σ<sub>S</sub>) = {f, g, e}. Indeed, then A |= a ≈ f(e).) □

### 6 Applications to *EL* and *EL*<sup>+</sup>-Subsumption

We now explain how these results can be used in the study of the description logics EL and EL<sup>+</sup>. In any description logic a set N<sub>C</sub> of *concept names* and a set N<sub>R</sub> of *roles* is assumed to be given. *Concept descriptions* can be defined with the help of a set of *concept constructors*. The available constructors determine the expressive power of a description logic. If we only allow intersection and existential restriction as concept constructors, we obtain the description logic EL [1], a logic used in terminological reasoning in medicine [29,30]. The table below shows the constructor names used in EL and their semantics.


The semantics is given by interpretations I = (Δ, ·<sup>I</sup>), where C<sup>I</sup> ⊆ Δ and r<sup>I</sup> ⊆ Δ<sup>2</sup> for every C ∈ N<sub>C</sub>, r ∈ N<sub>R</sub>. The extension of ·<sup>I</sup> to concept descriptions is inductively defined using the semantics of the constructors. In [2,3], the extension EL<sup>+</sup> of EL with role inclusion axioms is studied.

A TBox (or terminology) is a finite set consisting of *general concept inclusions* (GCI) of the form C ⊑ D, where C and D are concept descriptions. A CBox consists of a TBox and a set of role inclusions of the form r<sub>1</sub> ◦···◦ r<sub>n</sub> ⊑ s, so we view CBoxes as unions GCI ∪ R of a set GCI of general concept inclusions and a set R of role inclusions of the form r<sub>1</sub> ◦···◦ r<sub>n</sub> ⊑ s, with n ≥ 1.<sup>3</sup> An interpretation I is a *model of the CBox* C = GCI ∪ R if it is a model of GCI, i.e., C<sup>I</sup> ⊆ D<sup>I</sup> for every C ⊑ D ∈ GCI, and satisfies all role inclusions in C, i.e., r<sub>1</sub><sup>I</sup> ◦···◦ r<sub>n</sub><sup>I</sup> ⊆ s<sup>I</sup> for all r<sub>1</sub> ◦···◦ r<sub>n</sub> ⊑ s ∈ R. If C is a CBox and C<sub>1</sub>, C<sub>2</sub> are concept descriptions, then C |= C<sub>1</sub> ⊑ C<sub>2</sub> if and only if C<sub>1</sub><sup>I</sup> ⊆ C<sub>2</sub><sup>I</sup> for every model I of C.

In [23] we studied the link between TBox subsumption in EL and uniform word problems in the corresponding classes of semilattices with monotone functions. In [25], we showed that these results naturally extend to CBoxes and to the description logic EL<sup>+</sup>. When defining the semantics of EL or EL<sup>+</sup> with role names N<sub>R</sub> we use a class of ⊓-semilattices with monotone operators of the form SLat<sub>Σ</sub>, where Σ = {f<sub>r</sub> | r ∈ N<sub>R</sub>}. Every concept description C can be represented as a term C̄; the encoding is inductively defined: every concept name C ∈ N<sub>C</sub> is regarded as a constant C̄ = C, the encoding of C<sub>1</sub> ⊓ C<sub>2</sub> is C̄<sub>1</sub> ⊓ C̄<sub>2</sub>, and the encoding of ∃r.C is f<sub>r</sub>(C̄). If R is a set of role inclusions of the form r ⊑ s and r<sub>1</sub> ◦ r<sub>2</sub> ⊑ s, let K be the set of all axioms of the form:

∀x (f<sub>r</sub>(x) ≤ f<sub>s</sub>(x)) for all r ⊑ s ∈ R

∀x (f<sub>r<sub>1</sub></sub>(f<sub>r<sub>2</sub></sub>(x)) ≤ f<sub>s</sub>(x)) for all r<sub>1</sub> ◦ r<sub>2</sub> ⊑ s ∈ R
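The translation of concept descriptions and role inclusions can be sketched in Python. The data encoding is an assumption made for illustration: a concept is a string (concept name), a tuple `("and", C1, C2)` for C<sub>1</sub> ⊓ C<sub>2</sub>, or a tuple `("exists", r, C)` for ∃r.C; a role inclusion is a pair `(roles, s)` with one or two roles on the left. The function names and the textual form of the generated axioms are hypothetical.

```python
def encode(concept):
    """Map a concept description to the corresponding semilattice term."""
    if isinstance(concept, str):               # concept name: regarded as a constant
        return concept
    if concept[0] == "and":                    # C1 ⊓ C2 becomes the meet of the encodings
        return ("meet", encode(concept[1]), encode(concept[2]))
    if concept[0] == "exists":                 # ∃r.C becomes f_r applied to the encoding of C
        return (f"f_{concept[1]}", encode(concept[2]))
    raise ValueError(f"unsupported constructor: {concept[0]!r}")

def role_axioms(inclusions):
    """Translate role inclusions r ⊑ s and r1 ◦ r2 ⊑ s into axioms on the operators f_r."""
    axioms = []
    for roles, s in inclusions:
        if len(roles) == 1:
            axioms.append(f"forall x. f_{roles[0]}(x) <= f_{s}(x)")
        elif len(roles) == 2:
            axioms.append(f"forall x. f_{roles[0]}(f_{roles[1]}(x)) <= f_{s}(x)")
        else:
            raise ValueError("only inclusions r ⊑ s and r1 ◦ r2 ⊑ s are handled")
    return axioms

term = encode(("exists", "r", ("and", "C", "D")))
axioms = role_axioms([(("r",), "s"), (("r1", "r2"), "s")])
```

Here `term` is the encoding of ∃r.(C ⊓ D), and `axioms` lists one axiom per role inclusion in the two shapes shown above.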

Theorem 11 ([25]). *Assume that the only concept constructors are intersection and existential restriction. Then for all concept descriptions* D<sub>1</sub>, D<sub>2</sub> *and every* EL<sup>+</sup> *CBox* C = GCI ∪ R *(where* R *consists of role inclusions of the form* r ⊑ s *and* r<sub>1</sub> ◦ r<sub>2</sub> ⊑ s*) with concept names* N<sub>C</sub> = {C<sub>1</sub>,...,C<sub>n</sub>} *and set of roles* N<sub>R</sub>*:*

$$\mathcal{C} \models D\_1 \sqsubseteq D\_2 \quad \text{iff} \quad \Big(\bigwedge\_{C \sqsubseteq D \in GCI} \overline{C} \le \overline{D}\Big) \models\_{\mathsf{SLat}\_{\Sigma}(\mathcal{K})} \overline{D\_1} \le \overline{D\_2},$$

*where* Σ *is associated with* N<sub>R</sub> *and* K *with* R *as described above.*

In [8,31] the following notion of interpolation, which we call $\sqsubseteq$-interpolation, is defined: a description logic has the $\sqsubseteq$-interpolation property if for any CBoxes $\mathcal{C}_A = GCI_A \cup R_A$, $\mathcal{C}_B = GCI_B \cup R_B$ and any concept descriptions $C, D$ such that $\mathcal{C}_A \cup \mathcal{C}_B \models C \sqsubseteq D$ there exists a concept description $T$ containing only concept and role symbols "shared" by $\{\mathcal{C}_A, C\}$ and $\{\mathcal{C}_B, D\}$ such that $\mathcal{C}_A \cup \mathcal{C}_B \models C \sqsubseteq T$ and $\mathcal{C}_A \cup \mathcal{C}_B \models T \sqsubseteq D$. By Theorem 11, $\mathcal{C}_A \cup \mathcal{C}_B \models C \sqsubseteq D$ iff $A \wedge B \models_{\mathsf{SLat}_\Sigma(\mathcal{K})} \overline{C} \le \overline{D}$, where $A = \bigwedge_{C_1 \sqsubseteq C_2 \in GCI_A} \overline{C_1} \le \overline{C_2}$, $B = \bigwedge_{C_1 \sqsubseteq C_2 \in GCI_B} \overline{C_1} \le \overline{C_2}$, and $\mathcal{K} = \mathcal{K}_A \cup \mathcal{K}_B$ is the union of the axioms associated with the sets of role inclusions $R_A$ resp. $R_B$. By Theorem 10, there exists a term $t$ containing only constants and function symbols $\Theta_{\mathcal{K}_A \cup \mathcal{K}_B}$-*shared* by $A$ and $B$ such that $A \wedge B \models_{\mathsf{SLat}_\Sigma(\mathcal{K}_A \cup \mathcal{K}_B)} \overline{C} \le t \wedge t \le \overline{D}$. From $t$ we can construct a concept description $T$ containing only concept names and roles *shared* by $\mathcal{C}_A$ and $\mathcal{C}_B$, and by Theorem 11, $\mathcal{C}_A \cup \mathcal{C}_B \models C \sqsubseteq T \wedge T \sqsubseteq D$. Therefore, the $\sqsubseteq$-interpolation problem studied for description logics in [8,31] can be expressed in the case of EL and EL<sup>+</sup> as a ≤-interpolation problem in the class of semilattices with operators, and the hierarchical method for ≤-interpolation can be used in this case. We distinguish between intersection-sharing and $\Theta_R$-sharing, where $\Theta_R$ is the analogue of $\Theta_{\mathcal{K}}$ where $\mathcal{K}$ is the translation of $R$.

<sup>3</sup> It can be shown that it is sufficient to consider role inclusions of the form $r \sqsubseteq s$ or $r_1 \circ r_2 \sqsubseteq s$, where $r, s, r_1, r_2$ are role names [3].

Corollary 12. EL *and* EL<sup>+</sup> *have the* $\sqsubseteq$*-interpolation property w.r.t.* $\Theta_R$*-sharing.* EL<sup>+</sup> *with role inclusions of the form* $r_1 \circ r_2 \sqsubseteq s$ *does not have interpolation w.r.t. intersection-sharing.*

### 7 Conclusions and Future Work

In this paper we gave a hierarchical method for P-interpolation in certain classes of local theory extensions $\mathcal{T}_0 \subseteq \mathcal{T}_0 \cup \mathcal{K}$. We used these results for proving ≤-interpolation in classes of semilattices with monotone operators satisfying additional clauses $\mathcal{K}$, with a suitable notion of $\Theta_{\mathcal{K}}$-sharing that we defined. We defined a form of Beth definability w.r.t. a subsignature $\Sigma_S$ and used it to show that the class of semilattices with operators under consideration does not have the ≤-interpolation property if only the common function symbols and constants are considered to be "shared". We discussed how these results can be used for the study of interpolation in EL and EL<sup>+</sup>.

The ideas were implemented in a prototype<sup>4</sup> for the theory of semilattices with operators satisfying axioms of type (1) considered in this paper. The program is written in Python and uses Z3 [7] and SPASS [33] as external provers. The program implements Steps 1–3 of the algorithm presented at the end of Sect. 4 with the following optimization: in Step 1, after instantiation and purification, an unsatisfiable core is computed with Z3 in order to reduce the size of the set of instances of axioms to be considered. The program separates the mixed instances by computing intermediate terms for their premises using Theorem 8 and Proposition 6; SPASS is used for applying ordered resolution. In Step 3, the intermediate term T for C ≤ D is computed using the method described in Theorem 8, again using SPASS.

To use the method for interpolation in EL and EL<sup>+</sup>, the CBoxes $\mathcal{C}_A$ and $\mathcal{C}_B$ and the subsumption $C \sqsubseteq D$ are given as input. A minimal subset of $\mathcal{C}_A \cup \mathcal{C}_B$ from which $C \sqsubseteq D$ can be derived is then computed. (The user can choose between a precise translation to SPASS or a propositional translation to Z3 which is not always precise, but turned out to be a good approximation. Standard implementations available for computing justifications of entailments from description logic ontologies could be used as well.) The problem is then translated into a problem for ≤-interpolation in semilattices with operators. After computing the interpolating term, the result is expressed in the syntax of description logics.
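One common way to compute such a minimal subset is deletion-based minimization: drop each axiom in turn and keep the deletion whenever the goal is still entailed. The following is only a sketch of that generic technique, not the authors' implementation. The function `entails` is a hypothetical toy oracle that handles only atomic inclusions by reachability; in the setting described above it would be replaced by a call to an external prover.

```python
from typing import Callable, List, Sequence, Tuple

Axiom = Tuple[str, str]  # (A, B) stands for the atomic inclusion "A is subsumed by B"


def entails(axioms: Sequence[Axiom], goal: Axiom) -> bool:
    """Toy oracle (assumption): entailment as reachability over atomic inclusions."""
    source, target = goal
    seen = {source}
    frontier = [source]
    while frontier:
        current = frontier.pop()
        if current == target:
            return True
        for lhs, rhs in axioms:
            if lhs == current and rhs not in seen:
                seen.add(rhs)
                frontier.append(rhs)
    return False


def minimal_subset(
    axioms: Sequence[Axiom],
    goal: Axiom,
    oracle: Callable[[Sequence[Axiom], Axiom], bool] = entails,
) -> List[Axiom]:
    """Return a subset of `axioms` that entails `goal` and has no redundant axiom."""
    kept = list(axioms)
    if not oracle(kept, goal):
        raise ValueError("the goal does not follow from the given axioms")
    for axiom in list(kept):
        trial = [a for a in kept if a != axiom]
        if oracle(trial, goal):  # axiom is not needed: drop it for good
            kept = trial
    return kept
```

With the axioms `[("A", "B"), ("B", "C"), ("A", "D")]` and the goal `("A", "C")`, the sketch drops `("A", "D")` and keeps the other two. The result is minimal with respect to set inclusion, but which minimal subset is found depends on the order in which axioms are tried.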

In future work we will explore other application areas of these results, both to classes of non-classical logics and to theories relevant in verification. We plan to extend the implementation with options for choosing the base theory and the methods for P-interpolation in the base theory. We will further investigate the links with Beth definability and the possibilities of using Beth definability for computing explicit definitions for implicitly definable terms, and analyze the applicability of such results in description logics and in verification.

<sup>4</sup> The implementation and some tests can be found here: https://userpages.unikoblenz.de/~sofronie/p-interpolation-and-el/.

Acknowledgments. We thank the reviewers for their helpful comments. The research reported here was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Projektnummer 465447331.

### References



# **Theorem Proving in Dependently-Typed Higher-Order Logic**

Colin Rothgang<sup>1(✉)</sup>, Florian Rabe<sup>2</sup>, and Christoph Benzmüller<sup>1,3</sup>

<sup>1</sup> Mathematics and Computer Science, FU, Berlin, Germany colin.rothgang@gmx.de

<sup>2</sup> Computer Science, University Erlangen-Nürnberg, Erlangen, Germany

<sup>3</sup> AI Systems Engineering, University Bamberg, Bamberg, Germany

**Abstract.** Higher-order logic HOL offers a very simple syntax and semantics for representing and reasoning about typed data structures. But its type system lacks advanced features where types may depend on terms. Dependent type theory offers such a rich type system, but has rather substantial conceptual differences to HOL, as well as comparatively poor proof automation support.

We introduce a dependently-typed extension DHOL of HOL that retains the style and conceptual framework of HOL. Moreover, we build a translation from DHOL to HOL and implement it as a preprocessor to a HOL theorem prover, thereby obtaining a theorem prover for DHOL.

### **1 Introduction and Related Work**

Theorem proving in higher-order logic (HOL) [5,11] has been a long-running research strand producing multiple mature interactive provers [10,13,17] and automated provers [2,4,23]. Similarly, many, mostly interactive, theorem provers are available for various versions of dependent type theory (DTT) [7,9,15,18]. However, it is (maybe surprisingly) difficult to develop theorem provers for dependently-typed higher-order logic (DHOL).

In this paper, we use HOL to refer to a version of Church's *simply*-typed λ-calculus with a base type bool for Booleans, simple function types →, and equality $=_A : A \to A \to \mathsf{bool}$. This already suffices to define the usual logical quantifiers and connectives.<sup>1</sup> Intuitively, it is straightforward to develop DHOL accordingly on top of the *dependently*-typed λ-calculus, which uses a dependent function type $\Pi x{:}A.\ B$ instead of →. However, several subtleties arise that seem deceptively minor at first but end up presenting fundamental theoretical issues. They come up already in the elementary expression $x =_A y \Rightarrow f(x) =_{B(x)} f(y)$ for some dependent function $f : \Pi x{:}A.\ B(x)$.

**Firstly**, the equality $f(x) =_{B(x)} f(y)$ is not even well-typed because the terms $f(x) : B(x)$ and $f(y) : B(y)$ do not have the same type. Intuitively, it is obvious that the type system can (and maybe should) be adjusted so that the equality $x =_A y$ between terms carries over to an equality $B(x) \equiv B(y)$ between types.<sup>2</sup> However, this means that the undecidability of equality leaks into the equality of types and thus into type-checking.

<sup>1</sup> We do not assume a choice operator or the axiom of infinity.

© The Author(s) 2023. B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 438–455, 2023. https://doi.org/10.1007/978-3-031-38499-8\_25

While some interactive provers successfully use undecidable type systems [6,16], most formal systems for DTT commit to keeping type-checking decidable. The typical approach goes back to Martin-Löf type theory [14] and the calculus of constructions [8] and uses two separate equality relations, a decidable meta-level equality for use in the type-checker and a stronger undecidable one subject to theorem proving. Moreover, it favors the propositions-as-types representation and deemphasizes or omits a type of classical Booleans. This approach has been studied extensively [7,9,15] and is not the subject of this paper.

Instead, our motivation is to retain a single equality relation and classical Booleans. This is arguably more intuitive to users, especially to those outside the DTT community such as typical HOL users or mathematicians, and it is certainly much closer to the logics of the strongest available ATP systems. This means we have to pay the price of undecidable type-checking. The current paper was prompted by the observation that this price may be acceptable for two reasons:


**Secondly**, even if we add a rule like "if $x =_A y$, then $B(x) \equiv B(y)$" to our type system, the above expression is still not well-typed: the equality $x =_A y$ on the left of ⇒ is needed to show the well-typedness of the equality $f(x) =_{B(x)} f(y)$ on the right. This intertwines theorem proving and type-checking even further. Concretely, we need a *dependent implication*, where the first argument is assumed to hold while checking the well-typedness of the second one. Formally, this means that to show $F \Rightarrow G : \mathsf{bool}$, we require $F : \mathsf{bool}$ and $F \vdash G : \mathsf{bool}$. Similarly, we need a dependent conjunction. And if we are classical, we may also opt to add a dependent disjunction $F \vee G$, where $\neg F$ is assumed in $G$. Naturally, dependent conjunction and disjunction are not commutative anymore. This may feel disruptive, but similar behavior of connectives is well-known from short-circuit evaluation in programming languages.
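The analogy with short-circuit evaluation can be made concrete with a small Python sketch. The two function names are hypothetical and serve only as illustration; the analogy is informal, since Python checks well-definedness at run time while DHOL checks well-typedness statically.

```python
def reciprocal_is_positive(x: float) -> bool:
    # The left conjunct guards the right one: `1 / x` is only evaluated
    # when `x != 0` is already known to hold.
    return bool(x != 0 and 1 / x > 0)


def swapped_reciprocal_is_positive(x: float) -> bool:
    # Swapping the conjuncts removes the guard, so the conjunction is
    # not commutative: for x == 0 the division is attempted and fails.
    return bool(1 / x > 0 and x != 0)
```

Calling `reciprocal_is_positive(0.0)` returns `False`, whereas `swapped_reciprocal_is_positive(0.0)` raises `ZeroDivisionError`. A dependent conjunction behaves similarly: the second argument only needs to be well-typed under the assumption that the first one holds.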

The meta-logical properties of dependent connectives are straightforward. However, interestingly, these connectives can no longer be defined from just equality. At least one of them (we will choose dependent implication) must be taken as an additional primitive in DHOL along with =*A*.

**Finally**, the above generalizations require a notion of DHOL-contexts that is more complex than for HOL. HOL-contexts can be stratified into (a) a set of variable declarations $x_i : A_i$, and (b) a set of logical assumptions $F$ possibly using the variables $x_i$. Moreover, the former are often not explicitly listed at all and instead inferred from the remainder of the sequent. But in DHOL, the well-formedness of an $A_i$ may now depend on previous logical assumptions. To linearize this inter-dependency, DHOL contexts must consist of a single list alternating between variable declarations and assumptions.

<sup>2</sup> Note that while term equality $=_A$ is a bool-valued connective, type equality ≡ is not. Instead, in HOL, ≡ is a judgment at the same level as the typing judgment $t : A$.

*Contribution.* Our contribution is twofold. Firstly, we introduce a new logic DHOL designed along the lines described above. Moreover, we further extend DHOL with predicate subtypes $A|_p$ for a predicate $p : A \to \mathsf{bool}$ on the type $A$. Besides dependent types, these constitute a second important source of terms occurring in types. Because they also make typing undecidable, they are often avoided. The most prominent exception is PVS [16], whose kernel essentially arises by adding predicate subtypes to HOL. In current HOL ITPs going back to [10], their use is usually restricted to the subtype definition principle: here a definition $b := A|_p$ may occur on toplevel and is elaborated into a fresh type $b$ that is axiomatized to mimic the subtype $A|_p$. Because we are committed to undecidable typing anyway, predicate subtypes fit naturally into our approach.

Secondly, we develop and implement a sound and complete translation of DHOL into HOL. This setup allows the use of DHOL as the expressive user-facing language and HOL as the internal theorem-proving language. We position our implementation close to an existing HOL ATP, namely the LEO-III system. From the LEO-III perspective, DHOL serves as an additional input language that is translated into HOL by an external logic embedding tool [21,22] in the LEO-III ecosystem. Because LEO-III already supports such embeddings and because the TPTP syntax [24] foresees the use of dependent types in ATPs and provides syntax for them (albeit without a normative semantics), we were able to implement the translation with no disruptions to existing workflows.

The general idea of our translation of dependent into simple type theory is not new [3]. In that work, Martin-Löf-style dependent type theory is translated into Gordon's HOL ITP [10]. This work differs critically from ours because it uses DTT in propositions-as-types style. Our work builds DHOL with classical Booleans and an equality predicate, which makes the task of proving the translation sound and complete very different. Moreover, their work targets an interactive prover while ours targets automated ones.

*Overview.* In Sect. 2 we recap the HOL logic. In Sect. 3 we extend it to DHOL and define our translation from DHOL to HOL. In Sect. 4 we add subtyping and predicate subtypes. In Sect. 5 we prove the soundness and completeness of the translation. In Sect. 6 we describe how to use our translation and a HOL ATP to implement a theorem prover for DHOL.

### **2 Preliminaries: Higher-Order Logic**

We introduce the syntax and rules of HOL. Our definitions are standard except that we tweak a few details in order to later present the extension to DHOL more succinctly. We use the following grammar for HOL:


A theory *T* is a list of base type declarations *a* : tp, typed constant declarations *c* : *A*, and named axioms *c*: *F* asserting the formula *F*. A context Γ has the same form except that no type variables are allowed. It is not strictly necessary to use named axioms and assumptions, but it makes our extensions to DHOL later on simpler. We write ◦ and *.* for the empty theory and context, respectively. At this point, it is possible to normalize contexts into a set of variable declarations followed by a set of assumptions because the well-formedness of a type *A* can never depend on a variable or an assumption. But that property will change when going to DHOL, which is why we allow Γ to alternate between variables and assumptions.

Types $A$ are either user-declared types $a$, the built-in base type bool, or function types $A \to B$. Terms are constants $c$, variables $x$, λ-abstractions $\lambda x{:}A.\ t$, function applications $f\ t$, or obtained from the built-in bool-valued connectives $=_A$ or ⇒. As usual [1], this suffices to define all the usual quantifiers and connectives true, false, ¬, ∧, ∨, ∀ and ∃. This includes ⇒, but we make it a primitive here because we will change it in DHOL. As usual, $E[x/t]$ denotes the capture-avoiding substitution of the variable $x$ with the term $t$ within expression $E$.
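As an illustration of the term language and of capture-avoiding substitution $E[x/t]$, the following sketch models HOL terms as Python dataclasses. All class and function names are hypothetical. Types are kept as plain strings because HOL types contain no terms; in DHOL, types would have to be traversed as well.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    ty: str  # HOL types contain no terms, so a string suffices here
    body: "Term"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Eq:
    ty: str
    lhs: "Term"
    rhs: "Term"


@dataclass(frozen=True)
class Imp:
    lhs: "Term"
    rhs: "Term"


Term = Union[Const, Var, Lam, App, Eq, Imp]


def free_vars(e: Term) -> Set[str]:
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Const):
        return set()
    if isinstance(e, Lam):
        return free_vars(e.body) - {e.var}
    if isinstance(e, App):
        return free_vars(e.fun) | free_vars(e.arg)
    return free_vars(e.lhs) | free_vars(e.rhs)  # Eq and Imp


def fresh(base: str, avoid: Set[str]) -> str:
    """Append primes to `base` until the name is not in `avoid`."""
    name = base
    while name in avoid:
        name += "'"
    return name


def subst(e: Term, x: str, t: Term) -> Term:
    """Capture-avoiding substitution e[x/t]."""
    if isinstance(e, Var):
        return t if e.name == x else e
    if isinstance(e, Const):
        return e
    if isinstance(e, App):
        return App(subst(e.fun, x, t), subst(e.arg, x, t))
    if isinstance(e, Eq):
        return Eq(e.ty, subst(e.lhs, x, t), subst(e.rhs, x, t))
    if isinstance(e, Imp):
        return Imp(subst(e.lhs, x, t), subst(e.rhs, x, t))
    if e.var == x:  # x is shadowed by the binder: nothing to replace
        return e
    var, body = e.var, e.body
    if var in free_vars(t):  # rename the binder to avoid capturing it
        new = fresh(var, free_vars(t) | free_vars(body) | {x})
        body = subst(body, var, Var(new))
        var = new
    return Lam(var, e.ty, subst(body, x, t))
```

For example, substituting `Var("y")` for `x` in `Lam("y", "a", App(Var("f"), Var("x")))` renames the bound variable to `y'` before replacing `x`, so the substituted `y` remains free.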

The type and proof system uses the judgments given below. Note that we need a metalevel judgment for the equality of types because ≡ is not a bool-valued connective. On the contrary, the equality of terms *s* =*<sup>A</sup> t* is a special case of the validity judgment *F*. In HOL, ≡ is trivial, and the judgment is redundant. But we include it here already because it will become non-trivial in DHOL.


The rules are given in Fig. 1. We assume that all names in a theory or a context are unique without making that explicit in the rules. Following common practice, we further assume that HOL types are non-empty.

$$\dfrac{}{\vdash \circ\ \mathsf{Thy}} \qquad \dfrac{\vdash T\ \mathsf{Thy}}{\vdash T,\ a : \mathsf{tp}\ \ \mathsf{Thy}} \qquad \dfrac{\vdash_T A\ \mathsf{tp}}{\vdash T,\ c : A\ \ \mathsf{Thy}} \qquad \dfrac{\vdash_T F : \mathsf{bool}}{\vdash T,\ c : F\ \ \mathsf{Thy}}$$

$$\dfrac{\vdash T\ \mathsf{Thy}}{\vdash_T .\ \mathsf{Ctx}} \qquad \dfrac{\Gamma \vdash_T A\ \mathsf{tp}}{\vdash_T \Gamma,\ x : A\ \ \mathsf{Ctx}} \qquad \dfrac{\Gamma \vdash_T F : \mathsf{bool}}{\vdash_T \Gamma,\ x : F\ \ \mathsf{Ctx}}$$

$$\dfrac{a : \mathsf{tp}\ \text{in}\ T \quad \vdash_T \Gamma\ \mathsf{Ctx}}{\Gamma \vdash_T a\ \mathsf{tp}} \qquad \dfrac{c : A'\ \text{in}\ T \quad \Gamma \vdash_T A' \equiv A}{\Gamma \vdash_T c : A} \qquad \dfrac{c : F\ \text{in}\ T \quad \vdash_T \Gamma\ \mathsf{Ctx}}{\Gamma \vdash_T F}$$

$$\dfrac{x : A'\ \text{in}\ \Gamma \quad \Gamma \vdash_T A' \equiv A}{\Gamma \vdash_T x : A} \qquad \dfrac{x : F\ \text{in}\ \Gamma \quad \vdash_T \Gamma\ \mathsf{Ctx}}{\Gamma \vdash_T F}$$

$$\dfrac{\vdash_T \Gamma\ \mathsf{Ctx}}{\Gamma \vdash_T \mathsf{bool}\ \mathsf{tp}} \qquad \dfrac{\Gamma \vdash_T A\ \mathsf{tp} \quad \Gamma \vdash_T B\ \mathsf{tp}}{\Gamma \vdash_T A \to B\ \mathsf{tp}} \qquad \dfrac{\Gamma \vdash_T A\ \mathsf{tp}}{\Gamma \vdash_T A \equiv A} \qquad \dfrac{\Gamma \vdash_T A \equiv A' \quad \Gamma \vdash_T B \equiv B'}{\Gamma \vdash_T A \to B \equiv A' \to B'}$$

$$\dfrac{\Gamma,\ x : A \vdash_T t : B}{\Gamma \vdash_T (\lambda x{:}A.\ t) : A \to B} \qquad \dfrac{\Gamma \vdash_T f : A \to B \quad \Gamma \vdash_T t : A}{\Gamma \vdash_T f\ t : B} \qquad \dfrac{\Gamma \vdash_T s : A \quad \Gamma \vdash_T t : A}{\Gamma \vdash_T s =_A t : \mathsf{bool}}$$

$$\dfrac{\Gamma \vdash_T A \equiv A' \quad \Gamma,\ x : A \vdash_T t =_B t'}{\Gamma \vdash_T \lambda x{:}A.\ t =_{A \to B} \lambda x{:}A'.\ t'} \qquad \dfrac{\Gamma \vdash_T t =_A t' \quad \Gamma \vdash_T f =_{A \to B} f'}{\Gamma \vdash_T f\ t =_B f'\ t'}$$

$$\dfrac{\Gamma \vdash_T (\lambda x{:}A.\ s)\ t : B}{\Gamma \vdash_T (\lambda x{:}A.\ s)\ t =_B s[x/t]} \qquad \dfrac{\Gamma \vdash_T t : A \to B}{\Gamma \vdash_T t =_{A \to B} \lambda x{:}A.\ t\ x}$$

$$\dfrac{\Gamma \vdash_T F : \mathsf{bool} \quad \Gamma \vdash_T G : \mathsf{bool}}{\Gamma \vdash_T F \Rightarrow G : \mathsf{bool}} \qquad \dfrac{\Gamma \vdash_T F : \mathsf{bool} \quad \Gamma,\ x : F \vdash_T G}{\Gamma \vdash_T F \Rightarrow G} \qquad \dfrac{\Gamma \vdash_T F \Rightarrow G \quad \Gamma \vdash_T F}{\Gamma \vdash_T G}$$

$$\dfrac{\Gamma \vdash_T F =_{\mathsf{bool}} F' \quad \Gamma \vdash_T F'}{\Gamma \vdash_T F} \qquad \dfrac{\Gamma \vdash_T p\ \mathsf{true} \quad \Gamma \vdash_T p\ \mathsf{false}}{\Gamma,\ x : \mathsf{bool} \vdash_T p\ x} \qquad \dfrac{\Gamma \vdash_T F : \mathsf{bool} \quad \Gamma,\ x : A \vdash_T F}{\Gamma \vdash_T F}$$

**Fig. 1.** HOL Rules

### **3 Dependent Function Types**

#### **3.1 Language**

We have carefully defined HOL in such a way that only a few surgical changes are needed to define DHOL. A consolidated summary of DHOL is given in Appendix A.2 in the extended preprint [20]. The **grammar** is as follows with unchanged parts shaded out:


Concretely, base types *a* may now take term arguments and simple function types *A* → *B* are replaced with dependent function types Π*x* : *A. B*. As usual we will retain the notation *A* → *B* for the latter if *x* does not occur free in *B*. DHOL is a conservative extension of HOL, and we recover HOL as the fragment of DHOL in which all base types *a* have arity 0.

*Example 1 (Category Theory).* As a running example, we formalize the theory of a category in DHOL. It declares the base type obj for objects and the dependent base type mor $a\ b$ for morphisms. Further it declares the constants id and comp for identity and composition, and the axioms for neutrality. We omit the associativity axiom for brevity.

$$\begin{aligned} \mathsf{obj} &: \mathsf{tp} \\ \mathsf{mor} &: \Pi x, y : \mathsf{obj}.\ \mathsf{tp} \\ \mathsf{id} &: \Pi a : \mathsf{obj}.\ \mathsf{mor}\ a\ a \\ \mathsf{comp} &: \Pi a, b, c : \mathsf{obj}.\ \mathsf{mor}\ a\ b \to \mathsf{mor}\ b\ c \to \mathsf{mor}\ a\ c \\ \mathsf{neutL} &: \forall x, y : \mathsf{obj}.\ \forall m : \mathsf{mor}\ x\ y.\ m \circ \mathsf{id}_x =_{\mathsf{mor}\ x\ y} m \\ \mathsf{neutR} &: \forall x, y : \mathsf{obj}.\ \forall m : \mathsf{mor}\ x\ y.\ \mathsf{id}_y \circ m =_{\mathsf{mor}\ x\ y} m \end{aligned}$$

Here we use a few intuitive notational simplifications such as writing $\Pi x, y : \mathsf{obj}.$ for binding two variables of the same type. We also use the notations $\mathsf{id}_x$ for $\mathsf{id}\ x$ and $h \circ g$ for $\mathsf{comp}\ \_\ \_\ \_\ g\ h$, where the $\_$ denote inferable arguments of type obj.

The **judgments** stay the same and we only make minor changes to the **rules**, which we explain in the sequel. Firstly we replace all rules for → with the ones for Π:

$$\dfrac{\Gamma \vdash_T A\ \mathsf{tp} \quad \Gamma,\ x : A \vdash_T B\ \mathsf{tp}}{\Gamma \vdash_T \Pi x{:}A.\ B\ \mathsf{tp}} \qquad \dfrac{\Gamma \vdash_T A \equiv A' \quad \Gamma,\ x : A \vdash_T B \equiv B'}{\Gamma \vdash_T \Pi x{:}A.\ B \equiv \Pi x{:}A'.\ B'}$$

$$\dfrac{\Gamma,\ x : A \vdash_T t : B}{\Gamma \vdash_T (\lambda x{:}A.\ t) : \Pi x{:}A.\ B} \qquad \dfrac{\Gamma \vdash_T f : \Pi x{:}A.\ B \quad \Gamma \vdash_T t : A}{\Gamma \vdash_T f\ t : B[x/t]}$$

$$\dfrac{\Gamma \vdash_T t : \Pi x{:}A.\ B}{\Gamma \vdash_T t =_{\Pi x{:}A.\ B} \lambda x{:}A.\ t\ x}$$

Then we replace the rules for declaring, using, and equating base types with the ones where base types are applied to arguments:

$$\dfrac{\vdash_T x_1 : A_1,\ \ldots,\ x_n : A_n\ \ \mathsf{Ctx}}{\vdash T,\ a : \Pi x_1{:}A_1.\ \ldots\ \Pi x_n{:}A_n.\ \mathsf{tp}\ \ \mathsf{Thy}}$$

$$\dfrac{\vdash_T \Gamma\ \mathsf{Ctx} \quad (a : \Pi x_1{:}A_1.\ \ldots\ \Pi x_n{:}A_n.\ \mathsf{tp})\ \text{in}\ T \quad \Gamma \vdash_T t_1 : A_1 \ \ \ldots\ \ \Gamma \vdash_T t_n : A_n[x_1/t_1] \ldots [x_{n-1}/t_{n-1}]}{\Gamma \vdash_T a\ t_1 \ldots t_n\ \mathsf{tp}}$$

$$\dfrac{\vdash_T \Gamma\ \mathsf{Ctx} \quad (a : \Pi x_1{:}A_1.\ \ldots\ \Pi x_n{:}A_n.\ \mathsf{tp})\ \text{in}\ T \quad \Gamma \vdash_T s_1 =_{A_1} t_1 \ \ \ldots\ \ \Gamma \vdash_T s_n =_{A_n[x_1/t_1] \ldots [x_{n-1}/t_{n-1}]} t_n}{\Gamma \vdash_T a\ s_1 \ldots s_n \equiv a\ t_1 \ldots t_n}$$

The last of these is the critical rule via which term equality leaks into type equality. Thus, typing of expressions may now depend on equality assumptions and thus typing becomes undecidable.

*Example 2 (Undecidability of Typing).* Continuing Example 1, consider terms $f : \mathsf{mor}\ u\ v$ and $g : \mathsf{mor}\ v'\ w$ for terms $u, v, v', w : \mathsf{obj}$. Then $g \circ f : \mathsf{mor}\ u\ w$ holds iff $f : \mathsf{mor}\ u\ v'$, which holds iff $v =_{\mathsf{obj}} v'$. Depending on the axioms present, this may be arbitrarily difficult to prove.
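The way type equality reduces to term equality can be sketched as a function that compares two dependent base types and returns the term equalities left to be proved. The representation (a type is a name plus a tuple of argument terms, with terms as strings) and the function name are assumptions made for illustration only. Syntactically identical arguments are discharged here by reflexivity, and everything else becomes a proof obligation for a theorem prover.

```python
from typing import List, Optional, Tuple

DepType = Tuple[str, Tuple[str, ...]]  # (base type name, term arguments)


def type_equality_obligations(a: DepType, b: DepType) -> Optional[List[Tuple[str, str]]]:
    """Term equalities that suffice for a == b as types, or None if the heads differ."""
    name_a, args_a = a
    name_b, args_b = b
    if name_a != name_b or len(args_a) != len(args_b):
        return None  # different base types can never be equated by this rule
    # Identical arguments hold by reflexivity; the rest must be proved.
    return [(s, t) for s, t in zip(args_a, args_b) if s != t]
```

For the types from Example 2, `type_equality_obligations(("mor", ("u", "v")), ("mor", ("u", "v'")))` returns `[("v", "v'")]`, i.e., the single obligation $v =_{\mathsf{obj}} v'$.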

Finally, we modify the rule for the non-emptiness of types: we allow the existence of empty dependent types and only require that for each HOL type in the image of the translation there exists one non-empty DHOL type translated to it (rather than requiring all dependent types translated to it to be non-empty). And we replace the typing rule for implication with the dependent one. The proof rules for implications are unchanged.

$$\dfrac{\Gamma \vdash_T F : \mathsf{bool} \quad \Gamma,\ x : F \vdash_T G : \mathsf{bool}}{\Gamma \vdash_T F \Rightarrow G : \mathsf{bool}}$$

*Example 3 (Dependent Implication).* Continuing Example 1, consider the formula

$$x : \mathsf{obj},\ y : \mathsf{obj} \vdash x =_{\mathsf{obj}} y \Rightarrow \mathsf{id}_x =_{\mathsf{mor}\ x\ x} \mathsf{id}_y : \mathsf{bool}$$

which expresses that equal objects have equal identity morphisms. It is easy to prove. But it is only well-typed because the typing rule for dependent implication allows using $x =_{\mathsf{obj}} y$ while type-checking $\mathsf{id}_x =_{\mathsf{mor}\ x\ x} \mathsf{id}_y : \mathsf{bool}$, which requires deriving $\mathsf{id}_y : \mathsf{mor}\ x\ x$ and thus $\mathsf{mor}\ y\ y \equiv \mathsf{mor}\ x\ x$.

All the usual connectives and quantifiers can be defined in any of the usual ways now. However, the details matter for the dependent versions of the connectives. In particular, we choose *F* ∧*G* := ¬(*F* ⇒ ¬*G*) and *F* ∨*G* := ¬*F* ⇒ *G* in order to obtain the dependent versions of conjunction and disjunction, in which the well-formedness of *G* may depend on the truth or falsity of *F*, respectively.
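As a small worked step, unfolding these two definitions with the typing rule for dependent implication gives the following derived typing rules. This is a sketch under the assumption that $\neg H$ is well-typed whenever $H$ is, which holds for the usual definitions of negation:

$$\dfrac{\Gamma \vdash_T F : \mathsf{bool} \quad \Gamma,\ x : F \vdash_T G : \mathsf{bool}}{\Gamma \vdash_T F \wedge G : \mathsf{bool}} \qquad \dfrac{\Gamma \vdash_T F : \mathsf{bool} \quad \Gamma,\ x : \neg F \vdash_T G : \mathsf{bool}}{\Gamma \vdash_T F \vee G : \mathsf{bool}}$$

In the first rule, $F \Rightarrow \neg G$ is well-typed because $\neg G$ is checked under the assumption $F$. In the second, $\neg F \Rightarrow G$ is well-typed because $G$ is checked under the assumption $\neg F$.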

#### **3.2 Translation**

We define a translation function $X \mapsto \overline{X}$ that maps any DHOL-syntax $X$ to HOL-syntax. Its intuition is to erase type dependencies by translating all types $a\ t_1 \ldots t_n$ to $a$ and replacing every Π with →. To recover the information of the erased dependencies, we additionally define a partial equivalence relation (PER) $A^*$ on $\overline{A}$ for every DHOL-type $A$.

In general, a PER $r$ on type $U$ is a symmetric and transitive relation on $U$. This is equivalent to $r$ being an equivalence relation on a subtype of $U$. The intuitive meaning of our translation is that the DHOL-type $A$ corresponds in HOL to the quotient of the appropriate subtype of $\overline{A}$ by the equivalence $A^*$. In particular, the predicate $A^*\ t\ t$ captures whether $t$ represents a term of type $A$. More formally, the correspondence is:


**Definition 1 (Translation).** *We translate DHOL-syntax by induction on the grammar. Theories and contexts are translated declaration-wise:*

$$
\overline{\circ} := \circ \quad \overline{T, D} := \overline{T}, \overline{D} \quad \text{.} \; := \text{ .} \quad \overline{T, D} := \overline{T}, \overline{D}
$$

*where D is a list of declarations.*

*The translation a* : Π*x*<sup>1</sup> :*A*1*. ...*Π*xn* :*An.* tp *of a base type declaration is given by*

*a* : tp*, a*<sup>∗</sup> : *A*<sup>1</sup> → *...* → *An* → *a* → *a* → bool

*aPER* : ∀*x*<sup>1</sup> :*A*1*. ...*∀*xn* :*An.* ∀*u,v*:*a. a*<sup>∗</sup> *x*<sup>1</sup> *... xn u v* ⇒ *u* =*<sup>a</sup> v*

*Thus, a is translated to a base type of the same name without arguments and a trivial PER for every argument tuple. Intuitively, a*<sup>∗</sup> *t*<sup>1</sup> *... tn u u defines the subtype of the HOL-type a corresponding to the DHOL-type a t*<sup>1</sup> *... tn.*

*Constant and variable declarations are translated by adding the assumptions that they are in the PER of their type, and axioms and assumptions are translated straightforwardly:*

$$\overline{c:A} := c:\overline{A},\ c^\*:A^\*\ c\ c \quad \overline{x:A} := \overline{x}:\overline{A},\ x^\*:A^\*\ x\ x$$

$$
\overline{c:F} := c:\overline{F} \quad \overline{x:F} := x:\overline{F}
$$

*The cases of A and A*∗ *for types A are:*

*a t*<sup>1</sup> *... tn* := *a* (*a t*<sup>1</sup> *... tn*) <sup>∗</sup> *s t* := *a*<sup>∗</sup> *t*<sup>1</sup> *... tn s t* Π*x* :*A. B* := *A* → *B* (Π*x* :*A. B*) <sup>∗</sup> *f g* := ∀*x,y*:*A. A*<sup>∗</sup> *x y* ⇒ *B*<sup>∗</sup> (*f x*) (*g y*) bool := bool bool<sup>∗</sup> *s t* := *s* =bool *t*

*Finally, the cases for terms are straightforward except for, crucially, translating equality to the respective PER:*

$$
\overline{c} := c \quad \overline{x} := x \quad \overline{\lambda x{:}A.\ t} := \lambda x{:}\overline{A}.\ \overline{t} \quad \overline{f\ t} := \overline{f}\ \overline{t}
$$

$$
\overline{F \Rightarrow G} := \overline{F} \Rightarrow \overline{G} \quad \overline{s =_A t} := A^*\ \overline{s}\ \overline{t}
$$
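The type cases of Definition 1 can be read as two recursive functions: one that erases dependencies and one that builds the PER formula. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class names `Bool`, `Base`, `Pi`, the functions `erase` and `per`, and the ASCII output syntax are all hypothetical, argument terms are assumed to be already translated strings, and the second bound variable is assumed fresh when formed by appending a prime.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Bool:
    pass


@dataclass(frozen=True)
class Base:
    name: str                   # base type symbol a
    args: Tuple[str, ...] = ()  # argument terms t1 ... tn (assumed already translated)


@dataclass(frozen=True)
class Pi:
    var: str    # bound variable x
    dom: "Ty"   # domain A
    cod: "Ty"   # codomain B, which may mention x


Ty = Union[Bool, Base, Pi]


def erase(ty: Ty) -> str:
    """HOL type of a DHOL type: all term arguments are dropped."""
    if isinstance(ty, Bool):
        return "bool"
    if isinstance(ty, Base):
        return ty.name
    return f"({erase(ty.dom)} -> {erase(ty.cod)})"


def per(ty: Ty, s: str, t: str) -> str:
    """HOL formula A* s t for the DHOL type A."""
    if isinstance(ty, Bool):
        return f"({s} = {t})"
    if isinstance(ty, Base):
        return " ".join([ty.name + "*", *ty.args, s, t])
    x = ty.var
    y = ty.var + "'"  # assumption: the primed name does not clash with other names
    premise = per(ty.dom, x, y)
    conclusion = per(ty.cod, f"({s} {x})", f"({t} {y})")
    return f"(forall {x} {y} : {erase(ty.dom)}. {premise} => {conclusion})"


mor_uu = Base("mor", ("u", "u"))
endo = Pi("f", mor_uu, mor_uu)
print(erase(endo))
print(per(endo, "g", "h"))
```

Keeping `erase` and `per` separate mirrors the split between the erased type and its PER in the definition: the first forgets the indices, the second reintroduces them as a relation.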

*Example 4 (Translating Derived Connectives).* If we define true, false, ¬ as usual in HOL and use the definition for dependent conjunction from above, it is straightforward to show that all DHOL-connectives are translated to their HOL-counterparts. For example, we have (up to logical equivalence in HOL) that $\overline{F \wedge G} = \overline{F} \wedge \overline{G}$.

We also define the quantifiers in the usual way, e.g., using ∀*x* : *A.F*(*x*) := λ*x* : *A. F*(*x*) =*A*→bool λ*x* :*A.* true. Then applying our translation yields

$$\overline{\forall x{:}A.\ F(x)} = (A \to \mathsf{bool})^*\ \overline{\lambda x{:}A.\ F(x)}\ \ \overline{\lambda x{:}A.\ \mathsf{true}}$$

$$= \forall x,y{:}\overline{A}.\ A^*\ x\ y \Rightarrow \mathsf{bool}^*\ \overline{F(x)}\ \mathsf{true}$$

This looks clunky, but (because *A*<sup>∗</sup> is a PER as shown in Theorem 1) is equivalent to $\forall x{:}\overline{A}.\ A^*\ x\ x \Rightarrow \overline{F(x)}$. Thus, DHOL-∀ is translated to HOL-∀ relativized using *A*<sup>∗</sup> *x x*. The corresponding rule $\overline{\exists x{:}A.\ F(x)} = \exists x{:}\overline{A}.\ A^*\ x\ x \wedge \overline{F(x)}$ can be shown accordingly.
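The claimed equivalence between the two-variable form and the relativized form relies only on symmetry and transitivity of the relation. As a sanity check (a brute-force sketch over a small finite carrier, not a proof), the following enumerates every PER on a three-element set and every Boolean-valued predicate, and compares both formulas. All function names are hypothetical.

```python
from itertools import product


def is_per(rel):
    """A PER is a symmetric and transitive relation (given as a set of pairs)."""
    symmetric = all((y, x) in rel for (x, y) in rel)
    transitive = all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z)
    return symmetric and transitive


def all_pers(dom):
    """Enumerate all PERs on the finite list dom by filtering all binary relations."""
    pairs = [(x, y) for x in dom for y in dom]
    for bits in product([False, True], repeat=len(pairs)):
        rel = {p for p, keep in zip(pairs, bits) if keep}
        if is_per(rel):
            yield rel


def two_variable_form(rel, F, dom):
    # forall x, y. A* x y => (F x = true)
    return all(F[x] for x in dom for y in dom if (x, y) in rel)


def relativized_form(rel, F, dom):
    # forall x. A* x x => F x
    return all(F[x] for x in dom if (x, x) in rel)


def forms_agree(dom):
    """True iff both forms agree for every PER and every predicate on dom."""
    for rel in all_pers(dom):
        for values in product([False, True], repeat=len(dom)):
            F = dict(zip(dom, values))
            if two_variable_form(rel, F, dom) != relativized_form(rel, F, dom):
                return False
    return True


print(forms_agree([0, 1, 2]))
```

The check passes because any pair related by a PER has both components in the carrier: from *A*<sup>∗</sup> *x y*, symmetry and transitivity give *A*<sup>∗</sup> *x x*.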

*Example 5 (Categories in HOL).* We give a fragment of the translation of Example 1:

$$
\begin{array}{l}
\mathsf{obj} : \mathsf{tp} \qquad \mathsf{obj}^* : \mathsf{obj} \to \mathsf{obj} \to \mathsf{bool} \\
\mathsf{mor} : \mathsf{tp} \qquad \mathsf{mor}^* : \mathsf{obj} \to \mathsf{obj} \to \mathsf{mor} \to \mathsf{mor} \to \mathsf{bool} \\
\mathsf{id} : \mathsf{obj} \to \mathsf{mor} \qquad \mathsf{id}^* : \forall x,y{:}\mathsf{obj}.\ \mathsf{obj}^*\ x\ y \Rightarrow \mathsf{mor}^*\ x\ x\ (\mathsf{id}\ x)\ (\mathsf{id}\ y) \\
\mathsf{comp} : \mathsf{obj} \to \mathsf{obj} \to \mathsf{obj} \to \mathsf{mor} \to \mathsf{mor} \to \mathsf{mor} \\
\mathsf{neutL} : \forall x{:}\mathsf{obj}.\ \mathsf{obj}^*\ x\ x \Rightarrow \forall y{:}\mathsf{obj}.\ \mathsf{obj}^*\ y\ y \Rightarrow \forall m{:}\mathsf{mor}.\ \mathsf{mor}^*\ x\ y\ m\ m \Rightarrow \mathsf{mor}^*\ x\ y\ (\mathsf{comp}\ x\ x\ y\ (\mathsf{id}\ x)\ m)\ m
\end{array}
$$

Here, for brevity, we have omitted obj*PER*, mor*PER*, and comp<sup>∗</sup> and have already used the translation rule for ∀ from Example 4. The result is structurally close to what a native formalization of categories in HOL would look like, but somewhat clunkier.

$$\frac{\Gamma \vdash_T p : \Pi x{:}A.\ \mathsf{bool}}{\Gamma \vdash_T A|_p\ \mathsf{tp}} \qquad \frac{\Gamma \vdash_T t : A \quad \Gamma \vdash_T p\ t}{\Gamma \vdash_T t : A|_p} \qquad \frac{\Gamma \vdash_T t : A|_p}{\Gamma \vdash_T p\ t}$$

$$\frac{\Gamma \vdash_T A \equiv A' \quad \Gamma \vdash_T p =_{\Pi x{:}A.\ \mathsf{bool}} p'}{\Gamma \vdash_T A|_p \equiv A'|_{p'}} \qquad \frac{\Gamma \vdash_T A <: A' \quad \Gamma,\ x{:}A \vdash_T p\ x \Rightarrow p'\ x}{\Gamma \vdash_T A|_p <: A'|_{p'}}$$

$$\frac{\Gamma \vdash_T A <: A'}{\Gamma \vdash_T A|_p <: A'} \qquad \frac{\Gamma \vdash_T A\ \mathsf{tp}}{\Gamma \vdash_T A \equiv A|_{\lambda x{:}A.\ \mathsf{true}}} \qquad \frac{\Gamma \vdash_T A\ \mathsf{tp}}{\Gamma \vdash_T A|_{\lambda x{:}A.\ \mathsf{true}} \equiv A}$$

$$\frac{\Gamma \vdash_T A \equiv A'}{\Gamma \vdash_T A <: A'} \qquad \frac{\Gamma \vdash_T A' <: A \quad \Gamma,\ x{:}A' \vdash_T B <: B'}{\Gamma \vdash_T \Pi x{:}A.\ B <: \Pi x{:}A'.\ B'}$$

$$\frac{\Gamma \vdash_T A\ \mathsf{tp} \quad \Gamma,\ x{:}A \vdash_T B\ \mathsf{tp} \quad \Gamma,\ x{:}A \vdash_T p : \Pi y{:}B.\ \mathsf{bool}}{\Gamma \vdash_T \Pi x{:}A.\ (B|_p) \equiv (\Pi x{:}A.\ B)|_{\lambda f.\ \forall x{:}A.\ p\ (f\ x)}}$$

$$\frac{\Gamma \vdash_T A\ \mathsf{tp} \quad \Gamma \vdash_T p : \Pi x{:}A.\ \mathsf{bool} \quad \Gamma \vdash_T q : \Pi x{:}(A|_p).\ \mathsf{bool}}{\Gamma \vdash_T A|_p|_q \equiv A|_{\lambda x{:}A.\ p\ x \wedge q\ x}}$$

**Fig. 2.** Additional Rules for Predicate Subtypes

#### **4 Predicate Subtypes**

To add predicate subtypes, we extend the **grammar** with the production *A* ::= *A*|*<sup>F</sup>* . No new productions for terms are needed because the inhabitants of *A*|*<sup>F</sup>* use the same syntax as those of *A*.

*Example 6 (Isomorphisms).* We continue Example 1 and use predicate subtypes to write the type isomorphisms *u* of automorphisms on *u* as a subtype of mor *u u*. We can define isomorphisms *<sup>u</sup>* := (mor *u u*)|*<sup>p</sup>* where the predicate *<sup>p</sup>* is given by

$$\lambda m{:}\mathsf{mor}\ u\ u.\ \exists i{:}\mathsf{mor}\ u\ u.\ (i \circ m =_{\mathsf{mor}\ u\ u} \mathsf{id}_u) \wedge (m \circ i =_{\mathsf{mor}\ u\ u} \mathsf{id}_u)$$

Adding subtyping requires a few extensions to our type system. First we add a **judgment** $\Gamma \vdash_T A <: B$ and replace the lookup rules for variables and constants with their subtyping-aware variants:

$$\frac{c:A' \text{ in } T \qquad \Gamma \vdash_T A' <: A}{\Gamma \vdash_T c:A} \qquad \frac{x:A' \text{ in } \Gamma \qquad \Gamma \vdash_T A' <: A}{\Gamma \vdash_T x:A}$$

Then we add the **rules** given in Fig. 2. These induce an algorithm for deciding subtyping relative to an oracle for the undecidable validity judgment. The latter enters the algorithm when two predicate subtypes are compared. Note that the type-equality rule for *A*|*<sup>p</sup>* |*<sup>q</sup>* uses a dependent conjunction.

The resulting system is a conservative extension of the variants of HOL and DHOL without subtyping: we recover these systems as the fragments that do not use *A*|*<sup>p</sup>* . In particular, in that case *A <*: *B* is trivial and holds iff *A* ≡ *B* holds.

Finally, we extend our **translation** by adding the cases for predicate subtypes:

**Definition 2 (Translation).** *We extend Definition 1 with*

$$\overline{A|_p} := \overline{A} \quad (A|_p)^*\ s\ t := A^*\ s\ t \land \overline{p}\ s \land \overline{p}\ t$$
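On a finite carrier, the PER of a predicate subtype is simply the original PER restricted to pairs whose components both satisfy the predicate. The sketch below (with hypothetical helper names, and relations modelled as sets of pairs) illustrates that this restriction is again a PER and that its carrier is the part of the original carrier on which the predicate holds.

```python
def is_per(rel):
    """Symmetric and transitive relation, given as a set of pairs."""
    symmetric = all((t, s) in rel for (s, t) in rel)
    transitive = all((s, u) in rel for (s, t) in rel for (t2, u) in rel if t == t2)
    return symmetric and transitive


def carrier(rel):
    """Elements related to themselves, i.e. the subtype on which rel is an equivalence."""
    return {s for (s, t) in rel if s == t}


def restrict(rel, p):
    """(A|p)* s t := A* s t and p s and p t."""
    return {(s, t) for (s, t) in rel if p(s) and p(t)}


def per_from_classes(classes):
    """Build the PER whose equivalence classes are the given disjoint sets."""
    return {(s, t) for cls in classes for s in cls for t in cls}


a_star = per_from_classes([{0, 1}, {2, 3}])   # 4 lies outside the carrier if the type has it
sub_star = restrict(a_star, lambda x: x < 2)  # predicate that respects the classes
print(is_per(sub_star), sorted(carrier(sub_star)))
```

Symmetry and transitivity survive the restriction for any predicate; a well-typed predicate additionally respects the classes of *A*<sup>∗</sup>, so the restricted PER keeps or drops each class as a whole.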

### **5 Soundness and Completeness**

Now we establish that our translation is faithful, i.e., sound and complete. We will use the terms *sound* and *complete* from the perspective of using a HOL-ATP for theorem proving in DHOL, e.g., *sound* means that if $\overline{F}$ is a HOL-theorem, then *F* is a DHOL-theorem, and *complete* is the dual.<sup>3</sup>

The completeness theorem states that our translation preserves all DHOL-judgments. Moreover, the theorem statement clarifies the intuition behind the translation's invariants:

#### **Theorem 1 (Completeness).** *We have*


*Additionally the substitution lemma holds, i.e.,*

$\Gamma,\ x{:}A \vdash_T t : B$ *and* $\Gamma \vdash_T u : A$ *implies* $\overline{\Gamma} \vdash_{\overline{T}} \overline{t[x/u]} =_{\overline{B}} \overline{t}[x/\overline{u}]$

*Proof.* The proof proceeds by induction and can be found in Appendix B of the extended preprint [20].

<sup>3</sup> If, however, we think of our translation as an interpretation function that maps syntax to semantics, we could also justify swapping the names of the theorems.

The reverse direction is much trickier. To understand why, we look at two canaries in the coal mine that we have used to reject multiple intuitive but untrue conjectures:

*Example 7 (Non-Injectivity of the Translation).* Continuing Example 1, assume terms *u,v* : obj and consider the identity functions $I_u := \lambda f{:}\mathsf{mor}\ u\ u.\ f$ and $I_v := \lambda f{:}\mathsf{mor}\ v\ v.\ f$. Both are translated to the same HOL-term $\overline{I_u} = \overline{I_v} = \lambda f{:}\mathsf{mor}.\ f$ (because $I_u$ and $I_v$ only differ in the type indices, which are erased by our translation).

Consequently, the ill-typed DHOL-Boolean $b := I_u =_{\mathsf{mor}\ u\ u \to \mathsf{mor}\ u\ u} I_v$ is translated to the HOL-Boolean $(\lambda f{:}\mathsf{mor}.\ f) =_{\mathsf{mor} \to \mathsf{mor}} (\lambda f{:}\mathsf{mor}.\ f)$, which is not only well-typed but even a theorem.

To better understand the underlying issue we introduce the notion of *spurious* terms. The well-typed translation $\overline{t}$ of a DHOL-term *t* is called **spurious** if *t* is ill-typed (otherwise it is called *proper*). Intuitively, we should be able to use the PERs *A*<sup>∗</sup> to deal with spurious terms: to type-check *t* : *A* in DHOL, we want to use *A*<sup>∗</sup> $\overline{t}$ $\overline{t}$ in HOL. But even that is tricky:
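The collapse described in Example 7 can be reproduced with a few lines of type erasure on a toy term representation. This is only an illustrative sketch: the tuple encoding and the names `erase_type` and `erase_term` are hypothetical, and PER assumptions are ignored because only the erased terms matter here.

```python
def erase_type(ty):
    """ty is ("bool",), ("base", name, args) or ("pi", var, dom, cod)."""
    if ty[0] == "bool":
        return ("bool",)
    if ty[0] == "base":
        return ("base", ty[1])  # the type indices (args) are dropped
    return ("arrow", erase_type(ty[2]), erase_type(ty[3]))


def erase_term(tm):
    """tm is ("var", x), ("const", c), ("lam", x, type, body) or ("app", f, arg)."""
    if tm[0] in ("var", "const"):
        return tm
    if tm[0] == "lam":
        return ("lam", tm[1], erase_type(tm[2]), erase_term(tm[3]))
    return ("app", erase_term(tm[1]), erase_term(tm[2]))


# Identity functions on mor u u and on mor v v
I_u = ("lam", "f", ("base", "mor", ("u", "u")), ("var", "f"))
I_v = ("lam", "f", ("base", "mor", ("v", "v")), ("var", "f"))

print(I_u == I_v)                          # distinct DHOL terms
print(erase_term(I_u) == erase_term(I_v))  # identical after erasure
```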

*Example 8 (Trivial PERs for Built-In Base Types).* Consider the property bool∗ *x x*. Our translation guarantees bool∗ true true and bool∗ false false. Thus, we can use Boolean extensionality to prove in HOL that ∀*x* : bool*.*bool<sup>∗</sup> *x x*, making the property trivial. In particular, we can prove bool∗ *b b* for the spurious Boolean *b* from Example 7. Even worse, the property (Π*x* :*A. B*) ∗ *x x* is trivial in this way whenever it is for *B* and thus for all *n*-ary bool-valued function types.

More generally, this degeneration effect occurs for every base type that is built into both DHOL and HOL and that is translated to itself. bool is the simplest example of that kind, and the only one in the setting described here. But reasonable language extensions like built-in base types *a* for numbers, strings, etc. would suffer from the same issue. This is because all of these types would come with built-in induction principles that derive a universal property from its ground instances, at which point *a*∗ *x x* becomes trivial.

Note, however, that the degeneration effect does *not* occur for *user-declared* base types. For example, consider a theory that declares a base type *N* for the natural numbers and an induction axiom for it. *N* would not be translated to itself but to a fresh HOL-type in whose induction axiom the quantifier ∀ is relativized by *N*<sup>∗</sup> *x x*. Consequently, *N*<sup>∗</sup> *x x* is not trivial and can be used to reject spurious terms.

These examples show that we cannot expect the reverse directions of the statements in Theorem 1 to hold in general. However, we can show the following property that is sufficient to make our translation well-behaved:

**Theorem 2 (Soundness).** *Assume a well-formed DHOL-theory* $\vdash T\ \mathsf{Thy}$*.*

*If* $\Gamma \vdash_T F : \mathsf{bool}$ *and* $\overline{\Gamma} \vdash_{\overline{T}} \overline{F}$*, then* $\Gamma \vdash_T F$*.*

*In particular, if* $\Gamma \vdash_T s : A$ *and* $\Gamma \vdash_T t : A$ *and* $\overline{\Gamma} \vdash_{\overline{T}} A^*\ \overline{s}\ \overline{t}$*, then* $\Gamma \vdash_T s =_A t$*.*

*Proof.* The key idea is to transform a HOL-proof of $\overline{F}$ into one that is in the image of the translation, at which point we can read off a DHOL-proof of *F*. The full proof is given in Appendix B of the extended preprint [20].

Intuitively, the reverse directions of Theorem 1 hold once we establish that all involved expressions are well-typed in DHOL. Thus, we *can* use a HOL-ATP to prove DHOL-conjectures if we validate independently that the conjecture is well-typed all along. In the remainder of the section, we develop the necessary type-checking algorithm for DHOL.

*Type-Checking.* Inspecting the rules of DHOL, we observe that all DHOL-judgments would be decidable if we had an oracle for the validity judgment $\Gamma \vdash_T F$. Indeed, our DHOL-rules are already written in a way that essentially allows reading off a bidirectional type-checking algorithm. It only remains to split the typing judgment $\Gamma \vdash_T t : A$ into two algorithms for type-inference (which computes *A* from *t*) and type-checking (which takes *t* and *A* and returns yes or no) and to aggregate the rules for subtyping into an appropriate pattern-match.

The construction is routine, and we have implemented the resulting algorithm in our MMT/LF logical framework [12,19].<sup>4</sup> The oracle for the validity judgment is provided by our translation and a theorem prover for HOL (see Sect. 6). It remains to show that whenever the algorithm calls the oracle for $\Gamma \vdash_T F$, we do in fact have that $\Gamma \vdash_T F : \mathsf{bool}$ so that Theorem 2 is applicable. Formally, we show the following:

**Theorem 3.** *Relative to an oracle for* $\Gamma \vdash_T F$*, consider a derivation of some DHOL-judgment, in which the children of each node are ordered according to the left-to-right order of the assumptions in the statement of the applied rule.*

*If the oracle calls are made in depth-first order, then each such call satisfies* $\Gamma \vdash_T F : \mathsf{bool}$*.*


*Proof.* We actually prove, by induction on derivations, a more general statement, which requires that each rule preserves the following preconditions:

<sup>4</sup> The formalization of DHOL in MMT is available at https://gl.mathhub.info/MMT/LATIN2/-/blob/devel/source/logic/hol\_like/dhol.mmt. The example theories given throughout this paper and a few example conjectures are available at https://gl.mathhub.info/MMT/LATIN2/-/blob/devel/source/casestudies/2023-cade.

Note that rules whose conclusion is a validity judgment can be ignored because they are replaced by the oracle anyway.

The most interesting case is the rule for $\Gamma \vdash_T a\ s_1 \ldots s_n \equiv a\ t_1 \ldots t_n$. Here, the left-to-right order of assumptions is critical because $\Gamma \vdash_T s_1 =_{A_1} t_1$ may be needed to show, e.g., $\Gamma \vdash_T s_2 =_{A_2[x_1/t_1]} t_2 : \mathsf{bool}$.

### **6 Theorem Prover Implementation**

We have integrated our translation as a preprocessor to the HOL ATP LEO-III [23]. We chose this ATP because its existing preprocessor infrastructure already includes a powerful logic embedding tool [21,22]. However, with a little more effort, other HOL ATPs work as well.

Furthermore, we developed a bridge between the MMT logical framework [19] and LEO-III (both of which are written in the same programming language). This allows us to use our MMT-based type-checker for DHOL with our LEO-III-based theorem prover to obtain a full-fledged implementation of DHOL. Moreover, this system can immediately use MMT's logic-independent frontend features like IDE and module system.

Alternatively, we can use LEO-III as a general purpose DHOL-ATP that accepts input in TPTP. Even though TPTP does not officially sanction DHOL as a logic, it anticipates dependent function types and already provides syntax for them (although—to our knowledge—no ATP system has made use of it so far). Concretely, TPTP represents the type Π*x* : *A. B* as !>[X:A]:B and a base type *a t*<sup>1</sup> *... tn* as a @ t1 ... @ tn. TPTP does not yet provide syntax for predicate subtypes, i.e., this approach is currently limited to the no-subtyping fragment of DHOL. But extending the TPTP syntax with predicate subtypes would be straightforward, e.g., by using A ?| p to represent the type *A*|*<sup>p</sup>* .
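To make the concrete syntax above tangible, here is a small sketch that prints dependent function types and applied base types in the two TPTP forms mentioned in the text. The tuple encoding, the function name `tptp_type`, and the placement of parentheses are assumptions made for illustration; the sketch does not claim to reproduce the exact output of the logic embedding tool.

```python
def tptp_type(ty):
    """Render a DHOL type as a TPTP-style string.

    ty is ("base", name, [arg, ...]) for a base type applied to argument terms
    (given as strings), or ("pi", var, dom, cod) for a dependent function type.
    """
    if ty[0] == "base":
        name, args = ty[1], ty[2]
        if not args:
            return name
        # a t1 ... tn is written a @ t1 ... @ tn; the outer parentheses are an assumption
        return "(" + " @ ".join([name, *args]) + ")"
    if ty[0] == "pi":
        var, dom, cod = ty[1], ty[2], ty[3]
        # Pi x:A. B is written !>[X:A]:B
        return f"!>[{var}:{tptp_type(dom)}]:{tptp_type(cod)}"
    raise ValueError(f"unknown type constructor: {ty[0]}")


obj = ("base", "obj", [])
id_type = ("pi", "X", obj, ("base", "mor", ["X", "X"]))
print(tptp_type(id_type))
```

Bound variables are written in upper case here because TPTP reserves upper-case identifiers for variables.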

The encoding of the conjecture given in Example 3 using the theory from Example 1 is given at https://gl.mathhub.info/MMT/LATIN2/-/blob/devel/source/casestudies/2023-cade/CategoryTheory/category-theory-lemmas-dhol.p (which also includes further example conjectures relative to the same theory). Running the logic embedding tool translates it into the TPTP TH0 problem given at https://gl.mathhub.info/MMT/LATIN2/-/blob/devel/source/casestudies/2023-cade/CategoryTheory/category-theory-lemmas-hol.p. Unsurprisingly, LEO-III can prove this simple theorem easily.

*Practical Evaluation.* In order to evaluate the practical usefulness of the translation we studied various example conjectures about function composition in set theory and category theory. We considered 5 further lemmas based on the theory in Example 1 which are written directly in TPTP and can all be proven by E, Vampire and cvc5. We also studied various harder lemmas about function composition and category theory. Those examples are written in MMT and take advantage of advanced MMT features to improve readability, such as definitions, user-defined notations, and implicit arguments that are inferred by the prover.

The examples can be found at https://gl.mathhub.info/MMT/LATIN2/-/blob/devel/source/casestudies/2023-cade. The MMT prover successfully type-checks all problems and translates them into TPTP problems to be solved by HOL ATPs.

Since LEO-III can solve none of the 6 function composition examples, we also tested other HOL ATPs on the generated TPTP problems. Running all HOL ATPs supported at https://www.tptp.org/cgi-bin/SystemOnTPTP on the function composition problems shows that many provers can solve 3 of the problems, Vampire can solve 4 of them, and 5 out of the 6 conjectures can be solved by at least one HOL ATP.

We also studied 6 more difficult theorems about limits in category theory including the uniqueness, commutativity, and associativity of some limits. To better evaluate the usefulness of our translation, we also formalized these lemmas in native HOL (in MMT) and compared the results. Naturally, the DHOL formalization is significantly more readable and benefits from the more expressive type system that can help spot mistakes in the formalization. Running the HOL ATPs from https://www.tptp.org/cgi-bin/SystemOnTPTP on the generated TPTP problems (with 60 s timeout) yields the results in the table below (where we omit provers that proved none of the theorems in either formalization).



Overall more problems generated from the native HOL formalization can be solved by some HOL ATP (5/6 compared to 3/6 for the DHOL formalization). The HOL ATPs found 25 successful proofs for the native HOL problems and 20 for the DHOL problems. This suggests that current HOL ATPs can prove native HOL problems somewhat better than their translated DHOL counterparts, but not much better. In 8 cases a prover can prove the DHOL conjecture but not the native HOL analogue, indicating that the two formalizations have different advantages.

Furthermore, our translation has so far been engineered for generality and soundness/completeness and not for ATP efficiency. Indeed, future work has multiple options to boost the ATP performance on translated DHOL, e.g., by


Thus, we consider the test results to be very promising. In particular, the translation could serve as a useful basis for type-checkers and hammer tools for DHOL ITPs.

### **7 Conclusion and Future Work**

We have combined two features of standard languages, higher-order logic HOL and dependent type theory DTT, thereby obtaining the new dependently-typed higher-order logic DHOL. Contrary to HOL, DHOL allows for *dependent* function types. Contrary to DTT, DHOL retains the simplicity of classical Booleans and standard equality.

On the downside, we have to accept that DHOL, unlike both HOL and DTT, has an undecidable type system. Further work will show how heavily this disadvantage weighs in practical theorem proving applications. But we anticipate that the drawback is manageable, especially if, as in our case, an implementation of DHOL is coupled tightly with a strong ATP system. We accomplish this with a sound and complete translation from DHOL into HOL that enables using existing HOL ATPs to discharge the proof obligations that come up during type-checking. We have implemented our novel translation as a TPTP-to-TPTP preprocessor for HOL ATP systems and outlined the implementation of a type-checker and hammer tool for DHOL based on the resulting prover.

Moreover, once this design is in place, it opens up the possibility to add certain type constructors to DHOL that are often requested by users but difficult to provide for system developers because they automatically make typing undecidable. We have shown an extension of DHOL with predicate subtypes as an example. Quotients, partial functions, or fixed-length lists are other examples that can be supported in future work.

We expect that our translation remains sound and complete if DHOL is extended with other features underlying common HOL systems such as built-in types for numbers, the axiom of infinity, or the subtype definition principle. How to extend DHOL with a choice operator remains a question for future work; if solved, this would allow extending existing HOL ITPs to DHOL.

**Acknowledgment.** Chad Brown and Alexander Steen provided valuable feedback on earlier versions of this paper.

### **References**



# Towards Fast Nominal Anti-unification of Letrec-Expressions

Manfred Schmidt-Schauß<sup>1</sup> and Daniele Nantes-Sobrinho<sup>2</sup>

<sup>1</sup> Goethe-University Frankfurt, Frankfurt, Germany schauss@em.uni-frankfurt.de <sup>2</sup> Imperial College London, London, UK dnantess@ic.ac.uk

Abstract. This paper describes anti-unification algorithms for computing least general generalizations of two expressions in a functional programming language with recursive let. First, by exploring a semantic approach to the problem, we argue for an improvement of the technique used in previous papers which avoids infinite chains of properly descending generalizations. Second, we present a (non-deterministic) nominal general anti-unification algorithm applicable to general expressions, which is complete, terminating and requires polynomial time. Third, we propose a specialized anti-unification algorithm applicable to two or more garbage-free ground expressions that produces a single least general generalization in polynomial time, and which can also exploit further semantically correct equivalences. Our results have potential applications in finding clones in functional programs.

Keywords: Anti-Unification · Nominal Techniques · Generalization · Functional Programming · Recursive Let

### 1 Introduction

Anti-unification problems (a.k.a. *generalization* problems) consist in finding a least general generalization (lgg) of two or more given expressions. This problem has interesting applications in computer science and software engineering, such as symbolic mathematical computing [21], proof generalization [10], and clone detection [8], among others; an overview is given in [6]. Early proposals to apply generalization for analyzing and improving programs by syntactic manipulations were given by Plotkin [12] and Reynolds [13].

We are interested in the anti-unification problem for languages with binders, such as the lambda-calculus, the pi-calculus, or the more general nominal language [11]. For instance, λx.Z is a generalization of the lambda-expressions λa.app(a, a), λa.λb.a, and λc.c. In fact, from λx.Z one can retrieve any of the three expressions in the set by considering the appropriate instance of Z (where capturing is permitted), modulo renaming of bound variables: Z ↦ app(x, x), Z ↦ λb.x and Z ↦ x, respectively.

Research reported in this paper is partially funded by the EPSRC Fellowship 'VeTSpec: Verified Trustworthy Software Specification' (EP/R034567/1).

© The Author(s) 2023. B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 456–473, 2023. https://doi.org/10.1007/978-3-031-38499-8\_26

In the context of languages with recursive let (letrec), techniques for solving anti-unification problems would allow one, for instance, to identify the program scheme letr b.(λx.N); a.(λx.M) in b(y) as a generalization of the program [1]

$$\begin{aligned}
&\mathtt{letr}\ \mathit{even}.(\lambda x.\ \mathtt{if\text{-}else}\ (x = 0)\ (\mathtt{true})\ (\mathit{odd}(x - 1))); \\
&\phantom{\mathtt{letr}\ }\mathit{odd}.(\lambda x.\ \mathtt{if\text{-}else}\ (x = 0)\ (\mathtt{false})\ (\mathit{even}(x - 1))) \\
&\mathtt{in}\ (\mathit{even}\ y)
\end{aligned}$$

or even identify both fragments of programs as possible clones [8].

In general, and as illustrated above, reasoning and automated deduction in higher-order languages often require, as a very basic operation, identifying expressions up to α-equivalence. This means expressions are identified if they are syntactically equal up to a renaming of bound variables (which represent the binding structure). In addition, one has to keep in mind that the letrec construct also satisfies laws like commutativity and associativity of its environment (e.g. we could permute the environment b.(λx.N); a.(λx.M) as a.(λx.M); b.(λx.N) above), which work in combination with the binding primitives (i.e., one may also rename the bindings within the environment, obtaining, e.g., c.(λx.M′); d.(λx.N′)), and that letrec constructs may also occur nested.

Checking expressions for α-equivalence is an operation that is often performed on large and complex expressions. Ad-hoc algorithms for checking α-equivalence of such expressions are worst-case exponential due to searching for all possible permutations and renamings. An approach to handle α-equivalence in deduction systems is to use nominal techniques [5,11], where the focus is to ease formula specification and deduction rather than speeding up α-equivalence checking. In general, checking α-equivalence with the language extended with letrec using nominal techniques is a GI-hard problem [18]. Here, we follow the nominal approach to handle binding of names and their renaming.

In [17] we have proposed a semantic approach to anti-unification based on nominal techniques which uses atom-variables, and significantly improves an existing approach [4] to anti-unification for languages with binders, since it provides a finitary set of least general generalizations. In this work we propose a simplification of this semantic approach to a nominal language extended by the letrec construct, which we call NLLX.

*Our Results.* We provide a nominal anti-unification algorithm (AntiUnifLetr) for NLLX which preserves the good properties of our semantic approach: it is terminating and sound, computes an exponential number of generalizations (Theorem 1), and is weakly complete (Theorem 2). Completeness is achieved after further specialization of the computed generalization (Theorem 3).

The observation that *garbage* might be present in letrec expressions (for example, useless bindings in environments), and that it can be avoided by a semantically correct garbage collection algorithm, allows us to apply the results and methods in [18], which shows that α-equivalence and further algorithms could be considerably improved for garbage-free expressions. This leads to the design of AntiUnifNoGarbage, an anti-unification algorithm for *ground* garbage-free expressions, that is terminating, runs in polynomial time and produces one least general generalization, i.e. it is unitary (Theorem 4).

### 2 Preliminaries

We consider a countably infinite set of atoms A of (concrete) symbols a, b which we usually denote in a meta-fashion; so we can use symbols a, b also with indices (the variables in lambda-calculus). We also consider a set F of function symbols with arity *ar*(·), and a countably infinite set of expression-variables *Var* ranged over by X, Y. We will use mappings on atoms from A: a *swapping* (a b) is a bijective function that maps atom a to atom b, atom b to a, and is the identity on other atoms. We will also use finite permutations π on atoms from A, which consist of a composition of swappings: in fact, every finite permutation π can be represented by a composition of at most (|*dom*(π)| − 1) swappings, where *dom*(π) = {a ∈ A | π(a) ≠ a}. The identity permutation is denoted Id. Composition π<sub>1</sub> ◦ π<sub>2</sub> and the inverse π<sup>−1</sup> can be immediately computed, where the complexity is polynomial in the size of *dom*(π).
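The operations on finite permutations mentioned above can be sketched with dictionaries that store only the non-identity entries. This is an illustrative sketch with hypothetical function names; the cycle-based decomposition in `to_swappings` is one standard way to obtain at most |*dom*(π)| − 1 swappings and is not taken from the paper.

```python
def apply_perm(pi, a):
    """Apply a permutation (dict of non-identity entries) to an atom."""
    return pi.get(a, a)


def normalize(pi):
    """Drop identity entries so that the keys are exactly dom(pi)."""
    return {a: b for a, b in pi.items() if a != b}


def swap(a, b):
    """The swapping (a b)."""
    return normalize({a: b, b: a})


def compose(p1, p2):
    """p1 o p2, i.e. p2 is applied first."""
    atoms = set(p1) | set(p2)
    return normalize({a: apply_perm(p1, apply_perm(p2, a)) for a in atoms})


def inverse(pi):
    return {b: a for a, b in pi.items()}


def dom(pi):
    return {a for a, b in pi.items() if a != b}


def to_swappings(pi):
    """Decompose pi into a list [s1, ..., sm] with pi = s1 o ... o sm.

    A cycle (a1 a2 ... ak) is written as (a1 a2) o (a2 a3) o ... o (a(k-1) ak).
    """
    pi = normalize(pi)
    seen, result = set(), []
    for start in sorted(pi):
        if start in seen:
            continue
        cycle, nxt = [start], pi[start]
        seen.add(start)
        while nxt != start:
            cycle.append(nxt)
            seen.add(nxt)
            nxt = pi[nxt]
        result.extend((cycle[i], cycle[i + 1]) for i in range(len(cycle) - 1))
    return result


def from_swappings(swaps):
    """Compose a list of swappings [s1, ..., sm] into s1 o ... o sm."""
    pi = {}
    for a, b in reversed(swaps):
        pi = compose(swap(a, b), pi)
    return pi


pi = {"a": "b", "b": "c", "c": "a", "d": "e", "e": "d"}
print(to_swappings(pi))
```

Each cycle of length k contributes k − 1 swappings, so the bound |*dom*(π)| − 1 is reached exactly when π is a single cycle.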

*Ground Expressions.* The syntax of expressions ē of the (ground) language NLL with recursive let is:

$$
\bar{e} ::= a \mid \lambda a. \bar{e} \mid (f \; \bar{e}_1 \; \dots \; \bar{e}_{ar(f)}) \mid (\mathtt{letr} \; a_1. \bar{e}_1; \dots; a_n. \bar{e}_n \; \mathtt{in} \; \bar{e}).
$$

Ground expressions are either atoms, abstractions of an atom in an expression, function applications, or letrec expressions. We assume that binding atoms a<sub>1</sub>,...,a<sub>n</sub> in a letrec-expression (letr a<sub>1</sub>.ē<sub>1</sub>; ... ; a<sub>n</sub>.ē<sub>n</sub> in ē) are pairwise distinct. Sequences of bindings a<sub>1</sub>.ē<sub>1</sub>; ... ; a<sub>n</sub>.ē<sub>n</sub> may be abbreviated as *env* (environment). The *scope* of atom a in λa.ē is standard: a has scope ē. The letr-construct has a special scoping rule: in (letr a<sub>1</sub>.ē<sub>1</sub>; ... ; a<sub>n</sub>.ē<sub>n</sub> in ē), every atom a<sub>i</sub> that is free in some ē<sub>j</sub> or ē is bound by the environment a<sub>1</sub>.ē<sub>1</sub>; ... ; a<sub>n</sub>.ē<sub>n</sub>. This defines in NLL the notion of free atoms *FA*(ē), bound atoms *BA*(ē) in expression ē, and all atoms *AT*(ē) that occur in ē. For an environment *env* = {a<sub>1</sub>.ē<sub>1</sub>,...,a<sub>n</sub>.ē<sub>n</sub>}, we define the set of letrec-atoms as *LA*(*env*) = {a<sub>1</sub>,...,a<sub>n</sub>}. We say a *is fresh for* ē iff a ∉ *FA*(ē), denoted as a#ē.
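The scoping rule of letr can be made concrete by computing free atoms on a small tuple encoding of NLL expressions. The encoding and the names `free_atoms` and `is_fresh` are hypothetical and serve only as a sketch of the definitions above.

```python
def free_atoms(e):
    """Free atoms FA(e) of a ground NLL expression.

    e is ("atom", a), ("lam", a, body), ("app", f, (e1, ..., ek)),
    or ("letr", ((a1, e1), ..., (an, en)), body).
    """
    tag = e[0]
    if tag == "atom":
        return {e[1]}
    if tag == "lam":
        return free_atoms(e[2]) - {e[1]}
    if tag == "app":
        return set().union(*(free_atoms(arg) for arg in e[2]))
    if tag == "letr":
        bound = {a for a, _ in e[1]}  # LA(env)
        inside = free_atoms(e[2]).union(*(free_atoms(rhs) for _, rhs in e[1]))
        # every letrec-atom is bound in all right-hand sides and in the body
        return inside - bound
    raise ValueError(f"unknown expression tag: {tag}")


def is_fresh(a, e):
    """a # e iff a is not a free atom of e."""
    return a not in free_atoms(e)


# letr a. cons c b; b. cons d a in a, with atoms c and d standing in for two subexpressions
cyclic_list = (
    "letr",
    (
        ("a", ("app", "cons", (("atom", "c"), ("atom", "b")))),
        ("b", ("app", "cons", (("atom", "d"), ("atom", "a")))),
    ),
    ("atom", "a"),
)
print(sorted(free_atoms(cyclic_list)))
```

In contrast to a non-recursive let, the bound set is subtracted from the right-hand sides as well as from the body, which is what allows the two bindings above to refer to each other.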

*Remark 1.* The base language NLL is a lambda calculus extended with function constants and a recursive let constructor letr, and can also be interpreted as an untyped fragment of Haskell [7]. The function application operator in functional languages (implicit in some languages) can be encoded by a binary function app, and the case-construct in its plain form can be encoded as an application.

*Example 1.* The letrec-expression (letr a.cons ē<sub>1</sub> b; b.cons ē<sub>2</sub> a in a) represents an infinite list (cons ē<sub>1</sub> (cons ē<sub>2</sub> (cons ē<sub>1</sub> (cons ē<sub>2</sub> ...)))), where ē<sub>1</sub>, ē<sub>2</sub> are expressions and cons is the usual list constructor taken as a function symbol.

Syntactic α-equivalence on NLL is defined, following [16], as an extension of usual α-equivalence, where in addition the expressions (letr a<sub>1</sub>.ē<sub>1</sub>; ... ; a<sub>n</sub>.ē<sub>n</sub> in ē) and (letr a′<sub>1</sub>.ē′<sub>1</sub>; ... ; a′<sub>n</sub>.ē′<sub>n</sub> in ē′) are α-equivalent iff the expressions can be made equal by correctly renaming them, possibly reordering the environment.

Definition 1. *The α-equivalence* ∼<sub>α</sub> *on* ē ∈ NLL *is defined as follows:*


$$\frac{\forall i. \ \pi(a'\_i) = a\_i \quad \pi \cdot \bar{e}'\_i \sim\_\alpha e\_i \quad \pi \cdot \bar{e}' \sim\_\alpha \bar{e} \quad a\_i \#(\mathtt{1} \bullet \mathtt{r} \ a'\_1, \bar{e}'\_1; \ldots; a'\_n, \bar{e}'\_n \text{ in } \bar{e}')}{(\mathtt{1} \bullet \mathtt{r} \ a\_1.\bar{e}\_1; \ldots; a\_n.\bar{e}\_n \text{ in } \bar{e}) \sim\_\alpha (\mathtt{1} \bullet \mathtt{r} \ a'\_1.\bar{e}'\_1; \ldots; a'\_n.\bar{e}'\_n \text{ in } \bar{e}')}$$

*where, for* <sup>i</sup> = 1,...,n*:* <sup>a</sup>i*'s are pairwise distinct, and* <sup>a</sup> i*'s are pairwise distinct.*

Permutations operate on NLL-expressions by recursing on their structure. For example, π·(letr a1.e¯1; ... ; an.e¯n in e¯) = (letr π·a1.π·e¯1; ... ; π·an.π·e¯n in π·e¯).

*General Expressions.* The syntax of the nominal higher-order language NLLX with letrec and variables is:

$$\begin{array}{lcl} e,s,t & ::= & a \mid \pi \cdot X \mid \lambda a.e \mid (f \; e\_1 \; \dots \; e\_{ar(f)}) \mid (\mathtt{letr} \; a\_1.e\_1; \dots; a\_n.e\_n \; \text{in } e) \\ \pi & ::= & \emptyset \mid (a \; b) \cdot \pi \end{array}$$

General expressions extend NLL with *suspensions*, i.e., expressions of the form π · X, which denote a variable X (also called a generalization variable) in which a permutation is suspended: π is waiting for some instantiation of X before its action. The basic properties and functions of NLL such as *FA*(e), *BA*(e), scope, fresh, etc., extend to NLLX as expected. In particular, *AT*(e) is extended to suspensions as *AT*(π · X) = {a | a ∈ *dom*(π)}. The suspension *Id*·X is written simply as X. We define *Head*(s) as the top symbol of s in {a, f, λ, letr} if s is not a suspension, and *Head*(π · X) as X. More generally, for a nonvariable expression e, the expression π·e means an operation, which is performed by shifting π into the expression, using the additional simplification π1·(π2·e) → (π1 ◦ π2)·e, where after the shift, π only remains in suspensions. For instance, (a c) · (letr a.(λb.X) in f(a)) denotes a renaming of a to c and vice-versa, which is equal to (letr c.(λb.(a c) · X) in f(c)).
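The shifting of a permutation into an expression can be sketched as follows. The encoding is an assumption of this sketch: a permutation is a list of swappings read as a composition (the rightmost swapping is applied first), and a suspension π · X is the tuple `("susp", pi, "X")`. The names `perm_atom` and `act` are hypothetical.

```python
# Illustrative encoding (an assumption of this sketch):
#   ("atom", a) | ("susp", pi, X) | ("lam", a, body) | ("fn", f, [args])
#   | ("letr", [(a, rhs), ...], body), with pi a list of swappings (a, b)

def perm_atom(pi, x):
    """Apply the composition of swappings pi to the atom x (rightmost first)."""
    for a, b in reversed(pi):
        if x == a:
            x = b
        elif x == b:
            x = a
    return x

def act(pi, e):
    """Shift pi into e; afterwards pi only remains inside suspensions."""
    tag = e[0]
    if tag == "atom":
        return ("atom", perm_atom(pi, e[1]))
    if tag == "susp":
        # pi1 . (pi2 . X) simplifies to (pi1 o pi2) . X
        return ("susp", pi + e[1], e[2])
    if tag == "lam":
        return ("lam", perm_atom(pi, e[1]), act(pi, e[2]))
    if tag == "fn":
        return ("fn", e[1], [act(pi, s) for s in e[2]])
    if tag == "letr":
        env = [(perm_atom(pi, a), act(pi, rhs)) for a, rhs in e[1]]
        return ("letr", env, act(pi, e[2]))
    raise ValueError("unknown expression: %r" % (e,))

# (a c) . (letr a.(lambda b.X) in f(a))
expr = ("letr", [("a", ("lam", "b", ("susp", [], "X")))],
        ("fn", "f", [("atom", "a")]))
renamed = act([("a", "c")], expr)
```

Under these assumptions `renamed` encodes (letr c.(λb.(a c) · X) in f(c)), matching the instance given in the text.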

An NLLX-*freshness constraint* is an expression of the form a#e, expressing that a is not free in (or is fresh for) e, where e is an NLLX-expression. A conjunction (or set) of freshness constraints is called a *freshness context*, written using the notation ∇, Δ. Every NLLX-freshness context can be transformed into a simpler one (flattened form) by applying the rules in Fig. 1 exhaustively until it consists only of constraints of the form a#X or ⊥ (fail), which are called *atomic*. An NLLX-freshness context ∇ is *consistent* if its flattened form does not contain ⊥. The definition of α-equivalence extends to NLLX as expected. In the following, [s]α denotes the equivalence class of the expression s induced by the equivalence relation ∼α.

Fig. 1. Simplification of freshness constraints in NLL<sup>X</sup>

Lemma 1. *Simplification using rules of Fig. 1 constitutes a polynomial decision algorithm for satisfiability of* ∇*: If* ⊥ *is in the result, then unsatisfiable; otherwise, satisfiable.*
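A possible shape of such a simplification procedure is sketched below. The rules of Fig. 1 are not reproduced in the text, so the sketch assumes the usual nominal freshness rules, with the letr case following the scoping rule of NLL (a binding atom is fresh for the whole letr-expression). The tuple encoding and the name `simplify` are assumptions as well.

```python
# Illustrative encoding (an assumption of this sketch):
#   ("atom", a) | ("susp", pi, X) | ("lam", a, body) | ("fn", f, [args])
#   | ("letr", [(a, rhs), ...], body), with pi a list of swappings (a, b)

def perm_atom(pi, x):
    """Apply the composition of swappings pi to the atom x (rightmost first)."""
    for a, b in reversed(pi):
        if x == a:
            x = b
        elif x == b:
            x = a
    return x

def simplify(constraints):
    """Flatten constraints (a, e), read as a # e.

    Returns a set of atomic constraints (a, X), or None for failure (bottom).
    """
    atomic = set()
    todo = list(constraints)
    while todo:
        a, e = todo.pop()
        tag = e[0]
        if tag == "atom":
            if e[1] == a:              # a # a fails
                return None
        elif tag == "susp":            # a # pi.X becomes pi^-1(a) # X
            inverse = list(reversed(e[1]))
            atomic.add((perm_atom(inverse, a), e[2]))
        elif tag == "lam":
            if e[1] != a:              # a # lambda a.s holds trivially
                todo.append((a, e[2]))
        elif tag == "fn":
            todo.extend((a, s) for s in e[2])
        elif tag == "letr":
            if a not in {b for b, _ in e[1]}:   # binders are not free
                todo.extend((a, rhs) for _, rhs in e[1])
                todo.append((a, e[2]))
        else:
            raise ValueError("unknown expression: %r" % (e,))
    return atomic
```

Each constraint is decomposed at most once per subexpression, which is in line with the polynomial bound stated in Lemma 1.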

An NLLX-substitution ρ is a finite mapping from generalization variables to NLLX-expressions. Substitutions act on expressions homomorphically and this action extends to freshness constraints and contexts as follows: (a#X)ρ iff a#Xρ and ∇ρ = {a#eρ | a#e ∈ ∇}. We will denote the domain of substitutions by *dom*(·). A substitution is *ground* if it maps (generalization) variables to NLL-expressions. For a ground substitution ρ: ∇ρ is called *valid* iff ∇ρ is consistent.

*Permutations and Cycles.* A *cycle* τ in A is a permutation represented by a sequence of different atoms a1, a2,...,an, such that τ(ai) = a_{i+1} for i = 1,...,n − 1 and τ(an) = a1. As standard, such a cycle will be denoted as τ = (a1 a2 ... an). Every permutation π has a representation τ1τ2...τn (which abbreviates τ1 ◦ τ2 ◦ ... ◦ τn) where the τi are disjoint (primitive) cycles.

The disjoint cycles can be permuted. For instance, the permutation (a b)(b d)(c e) has the cycle presentation (a b d)(c e) which is the same as (c e)(a b d).
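The cycle presentation can be computed mechanically. The following sketch assumes that a permutation is given as a list of cycles (tuples of atoms) composed as functions, so the rightmost cycle acts first; the function names are illustrative. Each resulting cycle is rotated to start at its smallest atom, and the result is a set because disjoint cycles commute.

```python
def apply_cycle(cycle, x):
    """Image of the atom x under a single cycle (a1 a2 ... an)."""
    if x in cycle:
        return cycle[(cycle.index(x) + 1) % len(cycle)]
    return x

def apply_perm(cycles, x):
    """Image of x under the composition of the cycles (rightmost acts first)."""
    for cycle in reversed(cycles):
        x = apply_cycle(cycle, x)
    return x

def disjoint_cycles(cycles):
    """Return the disjoint cycle presentation as a set of tuples."""
    atoms = sorted({x for cycle in cycles for x in cycle})
    seen, result = set(), set()
    for start in atoms:
        if start in seen:
            continue
        orbit = [start]
        seen.add(start)
        x = apply_perm(cycles, start)
        while x != start:
            orbit.append(x)
            seen.add(x)
            x = apply_perm(cycles, x)
        if len(orbit) > 1:          # drop fixed points
            result.add(tuple(orbit))
    return result

presentation = disjoint_cycles([("a", "b"), ("b", "d"), ("c", "e")])
```

For the permutation (a b)(b d)(c e) this yields the two cycles (a b d) and (c e), independent of the order in which the disjoint cycles are listed.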

#### 2.1 Data-Structures of Anti-unification Algorithms

Anti-unification algorithms will produce as a result expressions that are restricted by a freshness context. These are called *expressions-in-context* and denoted as (∇, s), where ∇ is a freshness context and s is an NLLX-expression.

The semantics of expressions-in-context follow the idea that syntactically used names of atoms in expressions are fixed, and atoms occurring in ∇, but not in s are viewed as existentially quantified: these are treated as arbitrary names of atoms.

Definition 2. *An* expression-in-context *is a pair* (∇, e)*, where* e *is an expression and* ∇ *is a (consistent) freshness context. The* semantics *of* (∇, e) *is the set of ground instances of* e *that satisfy* ∇*, i.e.,*


*where* ρˆ *is a mapping from Var* ∪ A *to ground expressions such that* ρˆ|A *is a bijection on atoms.*

The existential quantification on valid instances of expressions gives additional power to the semantics of expressions-in-context: by considering a as existentially quantified, we obtain that ⟦({a#X}, X)⟧ is the same as ⟦(∅, X)⟧.

*Example 2.* Consider the expression-in-context ({a#X}, f(X)). We will argue that ⟦({a#X}, f(X))⟧ = ⟦(∅, f(X))⟧. First, notice that a does not occur syntactically in f(X) and therefore we can take ρˆ mapping a to an arbitrary atom that does not break validity of ∇. In fact:


Our semantics for ({a#X}, X) differs from the one in Baumgartner et al. [3], where ⟦({a#X}, X)⟧B is the set of all ground instances of X in which a is not permitted to occur free. This induces the negative effect of properly infinite descending chains<sup>1</sup> of expressions-in-context such as ... ≺B ({a#X, b#X}, f(X)) ≺B ({a#X}, f(X)) ≺B (∅, f(X)), which is eliminated in our approach since all these expressions-in-context have the same semantics.

Next we define an order relation on expressions-in-context which establishes when one expression-in-context is more general or more specific than another.

#### Definition 3 (Ordering, Generalization).


<sup>1</sup> -·<sup>B</sup> and ≺<sup>B</sup> denote the semantics and order relation in [4], resp.

*– A generalization* (Δ , r ) *of* (∇, s) *and* (∇ , t) *is the* most specific *(the least general) one, if for all generalizations* (Δ, r) *of* (∇, s) *and* (∇ , t)*, we have* (Δ, r) (Δ , r )*.*

For instance, the expression-in-context (∅, λe.*app*(e, X)) is a generalization of (∅, λa.app(a, c)) and (∅, λb.app(b, Z)), for a new atom e. It is easy to verify that (∅, λe.app(e, X)) (∅, λa.app(a, c)) and (∅, λe.app(e, X)) (∅, λb.app(b, Z)).

### 3 The Anti-unification Problem for NLL*<sup>X</sup>*

We are interested in *the anti-unification problem for* NLLX:

Given two expressions-in-context (∇, s) and (∇, t),

Find a *least general generalization*, i.e., another expression-in-context (Δ, r) that satisfies (Δ, r) (∇, s) and (Δ, r) (∇, t).

The challenge in treating letrec-expressions in anti-unification algorithms is, on the one hand, their unusual scoping and, on the other hand, the multiple possibilities to formulate the same problem in several syntactically different ways.

*Remark 2* [Permutations in the generalization of suspensions]. Generalization of suspensions, say (∅, π1·Z) and (∅, π2·Z), needs some preparation based on properties of permutations. First, we decompose π1 and π2 into their cycle presentations, say π1 = μ1...μn and π2 = μ′1...μ′m. Second, we generalize (∅, μ1...μn·Z) and (∅, μ′1...μ′m·Z) as follows: let π3 be a permutation obtained from the set of common cycles of π1 and π2, say π1 = π3π′1 and π2 = π3π′2. Then, π3 · X is a generalization for (∅, π1 · Z) and (∅, π2 · Z). In the following we will denote the common cycles of permutations π1 and π2 as π1 ∩ π2. This will be addressed in detail with the specific rule for suspensions in Fig. 2.
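The common cycles π1 ∩ π2 of the remark can be sketched as follows. The sketch assumes that a permutation is given as a dictionary from atoms to atoms (a bijection on its keys); the names `cycles_of` and `common_cycles` are hypothetical. Cycles are normalized to start at their smallest atom so that equal cycles compare equal.

```python
def cycles_of(perm):
    """Disjoint cycles of a permutation given as a dict atom -> atom."""
    seen, cycles = set(), set()
    for start in sorted(perm):
        if start in seen or perm[start] == start:
            continue
        orbit = [start]
        seen.add(start)
        x = perm[start]
        while x != start:
            orbit.append(x)
            seen.add(x)
            x = perm[x]
        cycles.add(tuple(orbit))
    return cycles

def common_cycles(perm1, perm2):
    """The cycles shared by both permutations, i.e. perm1 'intersect' perm2."""
    return cycles_of(perm1) & cycles_of(perm2)

# pi1 = (a b)(c d e) and pi2 = (a b)(c e d)
pi1 = {"a": "b", "b": "a", "c": "d", "d": "e", "e": "c"}
pi2 = {"a": "b", "b": "a", "c": "e", "e": "d", "d": "c"}
pi3 = common_cycles(pi1, pi2)
```

In this instance only the swapping (a b) is shared, so under the reading of the remark the suspension (a b) · X would be the candidate generalization of π1 · Z and π2 · Z.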

#### 3.1 The Algorithm AntiUnifLetr and Its Rules

We first define the nominal generalization algorithm AntiUnifLetr that (nondeterministically) computes a single generalization of the input expressions, where the generalization can also be nonlinear in the generalization variables due to merging. We will argue that the algorithm is sound and weakly complete, and one run can be performed in polynomial time.

The data structure of the algorithm AntiUnifLetr is (Γ, M, ∇, L) where:


We call such a tuple a *state*. The rules of the algorithm AntiUnifLetr, given in Fig. 2, operate on states, where ∪· denotes disjoint union. Given two NLL expressions s and t, and a freshness context Δ (possibly empty), to compute generalizations for (Δ, s) and (Δ, t), we start with the *initial state* ({X : s ≜ t}; ∅; Δ; []) (sometimes abbreviated to (Δ, {X : s ≜ t})), where X is a fresh generalization variable. We apply the rules from Fig. 2 and Fig. 4 until no more rule applications are possible and we reach a *final state*, which has the form (∅; M; ∇; L), where M must be completely merged. We denote the computation from an initial to a final state by (Γ; ∅; Δ; []) ⟹* (∅; M; ∇; L).

The output is an expression-in-context obtained from the generated substitution L and the final freshness constraint ∇, i.e. the output is (∇, X ◦ L), also called the *result computed* by the AntiUnifLetr algorithm. We say it is *complete* if every least general generalization (lgg) is found and it is *weakly complete* if every lgg is found up to some set of freshness constraints.


Fig. 2. Rules of the algorithm AntiUnifLetr

Rules in Fig. 2 are similar to the ones in [3] without the parameter for the set of atoms occurring in the initial state and throughout the computation, and deal with abstractions, function application, and suspensions. The subalgorithm Eqvm, defined by the rules in Fig. 3, computes a matching permutation, say π, of

$$\frac{\Psi \mathbin{\dot\cup} \{f(s\_1, \ldots, s\_n) \preceq f(s'\_1, \ldots, s'\_n)\}}{\Psi \mathbin{\dot\cup} \{s\_1 \preceq s'\_1, \ldots, s\_n \preceq s'\_n\}} \qquad \frac{\Psi \mathbin{\dot\cup} \{\lambda a.s \preceq \lambda a.t\}}{\Psi \mathbin{\dot\cup} \{s \preceq t\}} \qquad \frac{\Psi \mathbin{\dot\cup} \{\lambda a.s \preceq \lambda b.t\}}{\Psi \mathbin{\dot\cup} \{s \preceq (a \; b) \cdot t\}} \;\; a \# \lambda b.t$$


#### Fig. 3. The permutation matching (sub-)algorithm Eqvm

$$\frac{\{X : \mathtt{letr} \ a\_{1}.s\_{1}; \ldots; a\_{n}.s\_{n} \text{ in } s \triangleq \mathtt{letr} \ a\_{1}.t\_{1}; \ldots; a\_{n}.t\_{n} \text{ in } t\} \cup \Gamma, M, \nabla, L}{\Gamma \cup \{X\_{1} : s\_{1} \triangleq t\_{1}, \ldots, X\_{n} : s\_{n} \triangleq t\_{n}, Y : s \triangleq t\}, M, \nabla, L \cup \{X \mapsto \mathtt{letr} \ a\_{1}.X\_{1}; \ldots; a\_{n}.X\_{n} \text{ in } Y\}}$$

$$\{X : \mathtt{letr} \ a\_1.s\_1; \dots; a\_n.s\_n \text{ in } s \triangleq \mathtt{letr} \ b\_{\rho(1)}.t\_{\rho(1)}; \dots; b\_{\rho(n)}.t\_{\rho(n)} \text{ in } t\} \cup \Gamma, M, \nabla, L$$

Fig. 4. Rules for letrec of the algorithm AntiUnifLetr

two expressions-in-context (say s ⪯ t in Ψ with context ∇), where EqvBiEx(Π) checks whether the set of swappings is injective and then adds a minimal set of mappings such that the result is a bijection, i.e., a permutation (on atoms). Rules in Fig. 4 are new and will be described in detail:


The latter rule exploits the following idea: if λa.s and λb.t are α-equivalent, then one can rename a and b with the same fresh name c and propagate the renaming within s and t and still obtain α-equivalent expressions.

*Example 3.* A generalization for the expressions-in-context (∅, letr a.a; b.c in f(a, b)) and (∅, letr b.a; c.c in f(a, b)) is computed as follows:

1. We cannot apply rule (Letraa) since the binding atoms in the environments do not correspond to each other. We may rearrange the bindings using (Letperm). Then we apply rule (Letrab) for renaming: we choose d, e as fresh atoms and use the renamings (a d)(b e) and (c d)(b e), which leads to the check ∇ = {d, e#(letr a.a; b.c in f(a, b))} ∪ {d, e#(letr c.c; b.a in f(a, b))} = ∅, which holds and evaluates to ∅, since the terms are ground. This is followed by an application of (Letraa), which decomposes the letrec environments:

$$\begin{array}{l} ( \{ X : \mathtt{letr} \ a.a ; b.c \text{ in } f(a,b) \triangleq \mathtt{letr} \ b.a ; c.c \text{ in } f(a,b) \}, \emptyset, \emptyset, [\,] ) \\ \hline ( \{ X : \mathtt{letr} \ a.a ; b.c \text{ in } f(a,b) \triangleq \mathtt{letr} \ c.c ; b.a \text{ in } f(a,b) \}, \emptyset, \emptyset, [\,] ) \\ \hline ( \{ X : \mathtt{letr} \ d.d ; e.c \text{ in } f(d,e) \triangleq \mathtt{letr} \ d.d ; e.a \text{ in } f(a,e) \}, \emptyset, \emptyset, [\,] ) \\ \hline ( \{ X\_1 : d \triangleq d, X\_2 : c \triangleq a, Y : f(d,e) \triangleq f(a,e) \}, \emptyset, \emptyset, \{ X \mapsto \mathtt{letr} \ d.X\_1 ; e.X\_2 \text{ in } Y \} ) \end{array}$$

2. After three applications of (Dec), one (Solve) and one (Mer) we obtain (∅, {X2 : c ≜ a}, ∅, {X ↦ letr d.d; e.X2 in f((c d) · X2, e)}). The output generalization is (∅, letr d.d; e.X2 in f((c d) · X2, e)).

Another solution: from (X : letr a.a; b.c in f(a, b) ≜ letr b.a; c.c in f(a, b)) we could have immediately applied the rule (Letrab) using π1 = (a d)(b e) for the left and π2 = (b d)(c e) for the right expression. This finally leads to a generalization of the form letr d.X1; e.X2 in f(X3, X4), which is "weaker" (too general) than the one above.

Note that the environment of one of the expressions to be generalized contains *garbage*: the binding c.c is not used in f(a, b).

Theorem 1. *The algorithm* AntiUnifLetr *is terminating and sound. A single run requires polynomial time. The overall computation requires exponential time and may compute an exponential number of generalizations.*

*Proof.* Soundness and termination can be easily checked by inspection of the rules of Figs. 2, 3 and 4. The number of nondeterministic alternatives is exponential in the worst case, and it is induced by the rule (Letperm). A single run (one branch) can be performed in polynomial time.

Notice that except for rule (Letrab), all the rules in AntiUnifLetr algorithm preserve the context ∇. This differs from the approach taken in [3] which might add new freshness constraints with a rule similar to our rule (SolveYY), based on a set A of all atoms appearing throughout the computation of a generalization. We show in the next example that this choice of initially preserving the freshness context leads to a weak completeness result, but completeness is regained with a specialization algorithm that will be presented next.

*Example 4 (Weak Completeness).* The expressions-in-context (∅, f(c1, a)) and (∅, f(c2, a)) have the generalization (∅, f(X1, a)) computed by the rules of Fig. 2. However, this is not the lgg since ({a#X1}, f(X1, a)) is a more specific generalization. In fact, f(a, a) ∈ ⟦(∅, f(X1, a))⟧, but f(a, a) ∉ ⟦({a#X1}, f(X1, a))⟧.

Theorem 2 (Weak Completeness). *Given* NLLX *expressions* <sup>e</sup> *and* <sup>e</sup> *, and a freshness context* Δ*. If* (∇ , r) *is a generalization of* (Δ, e) *and* (Δ, e )*, then there exists a* ∇ *and a derivation* ({X : e e }, ∅, Δ, []) =⇒<sup>∗</sup> (∅,M, ∇, σ) *such that* (∇∪∇,Xσ) *is a generalization of* (Δ, e) *and* (Δ, e ) *and* (∇∪∇,Xσ) (∇ , r)*.*

*Proof.* The proof is by induction on the structure of r.

*Example 5 (Cont. Example* 4*).* We remark another behaviour that can be seen from the execution of AntiUnifLetr: ({X : f(c1, a) ≜ f(c2, a)}, ∅, ∅, []) reduces to (∅, {X1 : c1 ≜ c2}, ∅, {X ↦ f(X1, a)}). Notice that (i) f(a, a) is clearly not an element of ⟦(∅, f(c1, a))⟧ nor of ⟦(∅, f(c2, a))⟧; (ii) the information that c1 and c2 were free names in the input problem was "forgotten" by the generalization f(X1, a), but it can be retrieved from the solved triple in the final state; (iii) a#c1 and a#c2 hold trivially.

#### 3.2 From Weak Completeness to Completeness

Given a result (∇, s) of a run of the algorithm AntiUnifLetr, the result is in general only weakly complete, since the expressivity of the language may permit a better generalization. The true most specific generalization may have additional freshness constraints, as it was shown in Example 4. The problem of specializing the generalizer output by AntiUnifLetr is subtle: a different but related behaviour can be seen with the next example.

*Example 6.* Consider the expressions-in-context (∅, f(g(c1, a), a)) and (∅, f(c2, a)) as input for AntiUnifLetr. The output generalization is (∅, f(X1, a)), and this is the lgg. In fact, a run of the algorithm would terminate with the final state (∅, {X1 : g(c1, a) ≜ c2}, ∅, {X ↦ f(X1, a)}).

We can use the information in the solved part of the final state to build the substitutions σ1 = {X1 ↦ g(c1, a)} and σ2 = {X1 ↦ c2} that instantiate the generalization f(X1, a) back to the input terms. Notice that a#X1σ1 is equal to a#g(c1, a) and does not hold. Thus, we cannot add {a#X1} as a constraint to the generalization, since ({a#X1}, f(X1, a)) cannot be instantiated to f(g(c1, a), a).

Let γ = (∅; M; ∇; L) be a final state. We define ATf(γ) as the set of unbound atoms that occur in M, ∇ or codom(L). We say that a generalization variable X occurs in γ when it occurs in ∇, or as a subterm in M, or in codom(L).

**Algorithm 1** AntiUnifLetr - Phase 2

1. Input: (Δ, s) and (Δ, t)
2. ({X : s ≜ t}; ∅; Δ; []) ⟹* γ = (∅; M; ∇; L)
3. Let (∇, r) = (∇, X ◦ L) be the resulting generalization.
4. Let X be a generalization variable occurring in r. (Repeat for each X)
5. if a ∈ AT(r)\BA(r) and a ∉ RelAtomsγ(X) then (Repeat for each a ∈ AT(t))
6. &nbsp;&nbsp;&nbsp;&nbsp;∇ := ∇ ∪ {a#X}
7. end if

Definition 4 (Relevant Atoms). *Let* γ = (∅; M; ∇;L) *be a final state in a run of* AntiUnifLetr*. Let* <sup>X</sup> *be a generalization variable occurring in* <sup>γ</sup>*. The set of* relevant atoms for <sup>X</sup>*, denoted* RelAtomsγ(X)*, is defined recursively:*

- *RelAtoms*γ(a) = {a}*, RelAtoms*γ(f s1 ... sn) = ⋃i *RelAtoms*γ(si)*;*
- *RelAtoms*γ(π·s) = π·*RelAtoms*γ(s)*;*
- *RelAtoms*γ(λa.s) = *RelAtoms*γ(s)\{a}*; and*
- *RelAtoms*γ(letr a1.s1; ... ; an.sn in r) = *RelAtoms*γ(s1,...,sn, r)\{a1,...,an}*.*

For example, if we take M = {X : f(a, b) ≜ g((a c)·Y), Y : f(c, d) ≜ g(e)} and ∇ = {a#Y}, then the set of relevant atoms for Y is {c, d, e}, and for X it is {a, b} ∪ (a c){c, d, e} = {a, b, d, e}, where it is noteworthy that atom c is missing.
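The computation in this example can be replayed with a short sketch. Two points are assumptions rather than statements of the paper: the case for a generalization variable X with X : s ≜ t in M is taken to be the union of the relevant atoms of s and t (this is what the worked example suggests), and a variable without an entry in M contributes no atoms. The store M is assumed to be acyclic, and the encoding and names are illustrative.

```python
# Illustrative encoding (an assumption of this sketch):
#   ("atom", a) | ("susp", pi, X) | ("lam", a, body) | ("fn", f, [args])
#   | ("letr", [(a, rhs), ...], body); M maps a variable X to the pair (s, t)

def perm_atom(pi, x):
    """Apply the composition of swappings pi to the atom x (rightmost first)."""
    for a, b in reversed(pi):
        if x == a:
            x = b
        elif x == b:
            x = a
    return x

def rel_atoms(e, M):
    """Relevant atoms of e with respect to the store M of a final state."""
    tag = e[0]
    if tag == "atom":
        return {e[1]}
    if tag == "fn":
        return set().union(*(rel_atoms(s, M) for s in e[2]))
    if tag == "lam":
        return rel_atoms(e[2], M) - {e[1]}
    if tag == "letr":
        atoms = rel_atoms(e[2], M)
        for _, rhs in e[1]:
            atoms |= rel_atoms(rhs, M)
        return atoms - {a for a, _ in e[1]}
    if tag == "susp":
        pi, var = e[1], e[2]
        base = set()
        if var in M:                      # assumed case: X : s = t in M
            s, t = M[var]
            base = rel_atoms(s, M) | rel_atoms(t, M)
        return {perm_atom(pi, a) for a in base}
    raise ValueError("unknown expression: %r" % (e,))

def atom(a):
    return ("atom", a)

M = {
    "X": (("fn", "f", [atom("a"), atom("b")]),
          ("fn", "g", [("susp", [("a", "c")], "Y")])),
    "Y": (("fn", "f", [atom("c"), atom("d")]),
          ("fn", "g", [atom("e")])),
}
rel_Y = rel_atoms(("susp", [], "Y"), M)
rel_X = rel_atoms(("susp", [], "X"), M)
```

With this reading, `rel_Y` and `rel_X` coincide with the sets {c, d, e} and {a, b, d, e} stated above; in particular c is swapped away by (a c) and does not appear for X.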

We formulate a postprocessing algorithm (Algorithm 1) for AntiUnifLetr which is able to compute least general generalizations.

Theorem 3. *Adding (Algorithm 1) makes* AntiUnifLetr *complete.*

Note, however, that due to the non-determinism, it may be possible that one of the runs generates a generalization that is strictly less specific than the result in another run, see Example 3.

*Example 7.* This example shows the result of generalizing more complex expressions. Consider the generalization problem, and the sequence of generalization steps, where the last step abbreviates several steps.

$$\frac{\left(\{X\_1 : \lambda a.f(a,a,c) \triangleq \lambda b.f(b,d,c)\}, \emptyset, [\,]\right)}{\left(\{X\_1 : \lambda e.f(e,e,c) \triangleq \lambda e.f(e,d,c)\}, \emptyset, [\,]\right)}$$

$$\frac{\left(\{X\_2 : f(e,e,c) \triangleq f(e,d,c)\}, \emptyset, \{X\_1 \mapsto \lambda e.X\_2\}\right)}{\left(\emptyset, \{X\_3 : e \triangleq d\}, \{X\_1 \mapsto \lambda e.X\_2, X\_2 \mapsto f(e,X\_3,c)\}\right)}$$

Now the resulting lgg can be computed by adding only one freshness constraint: ({c#X3}, λe.f(e, X3, c)). The atom d does not occur in the freshness context, since d ∈ *RelAtoms*γ(X3). Notice that c#X3 is added as a freshness constraint since c occurs in the generalization expression, but c ∉ *RelAtoms*γ(X3).
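The post-processing step can be sketched on the final states of Examples 4, 6 and 7. The sketch follows the condition in line 5 of Algorithm 1 (a ∈ AT(r)\BA(r) and a ∉ RelAtomsγ(X)); as an assumption, the relevant atoms of a variable X with X : s ≜ t in M are the union of those of s and t, and only ground solved pairs are handled. All names and the tuple encoding are illustrative.

```python
# Illustrative encoding (an assumption of this sketch):
#   ("atom", a) | ("susp", pi, X) | ("lam", a, body) | ("fn", f, [args])
#   | ("letr", [(a, rhs), ...], body); M maps a variable X to a ground pair (s, t)

def walk(e):
    """Yield e and all of its subexpressions."""
    yield e
    if e[0] == "lam":
        yield from walk(e[2])
    elif e[0] == "fn":
        for s in e[2]:
            yield from walk(s)
    elif e[0] == "letr":
        for _, rhs in e[1]:
            yield from walk(rhs)
        yield from walk(e[2])

def bound_atoms(e):
    """BA(e): atoms bound by an abstraction or a letr environment."""
    out = set()
    for s in walk(e):
        if s[0] == "lam":
            out.add(s[1])
        elif s[0] == "letr":
            out |= {a for a, _ in s[1]}
    return out

def all_atoms(e):
    """AT(e): every atom occurring in e, including those in suspended permutations."""
    out = bound_atoms(e)
    for s in walk(e):
        if s[0] == "atom":
            out.add(s[1])
        elif s[0] == "susp":
            out |= {x for swap in s[1] for x in swap}
    return out

def free_atoms(e):
    """Free atoms of a ground expression."""
    if e[0] == "atom":
        return {e[1]}
    if e[0] == "lam":
        return free_atoms(e[2]) - {e[1]}
    if e[0] == "fn":
        return set().union(*(free_atoms(s) for s in e[2]))
    atoms = free_atoms(e[2])
    for _, rhs in e[1]:
        atoms |= free_atoms(rhs)
    return atoms - {a for a, _ in e[1]}

def phase2(nabla, r, M):
    """Add a # X for each candidate atom a that is not relevant for X."""
    candidates = all_atoms(r) - bound_atoms(r)
    result = set(nabla)
    for X in {s[2] for s in walk(r) if s[0] == "susp"}:
        s, t = M[X]
        relevant = free_atoms(s) | free_atoms(t)   # assumed reading of RelAtoms(X)
        result |= {(a, X) for a in candidates if a not in relevant}
    return result

def atom(a):
    return ("atom", a)

# Example 7: r = lambda e. f(e, X3, c) with X3 : e = d solved
r7 = ("lam", "e", ("fn", "f", [atom("e"), ("susp", [], "X3"), atom("c")]))
ctx7 = phase2(set(), r7, {"X3": (atom("e"), atom("d"))})
```

Under these assumptions the sketch adds exactly the constraint c#X3 for Example 7, adds a#X1 for the final state of Example 4, and adds nothing for Example 6, where a is relevant for X1.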

### 4 Generalization Algorithm Under Semantic Equalities

We use semantic equivalences to specialize and extend our anti-unification algorithm to *ground expressions*. In particular, we exploit the fact that removal of *garbage* is semantically correct: it does not alter the meaning of the program. First, we develop a standardization algorithm for garbage-free expressions that helps in comparing the letrec-expressions and computing generalizations in polynomial time. Second, we propose a variation of our anti-unification algorithm called AntiUnifNoGarbage.

NLL-expressions may contain irrelevant bindings in the letrec environment: for instance, in (letr a.Nil; b.b in f(a, a)), the binding b.b is useless for the expression, and will be considered as *garbage*. The garbage bindings do not contribute to the meaning of the functional expressions. It is shown in [18] that α-equivalence of garbage-free letrec-expressions can be checked in polynomial time, and that, in general, this problem is graph-isomorphism-complete [2,20].

Definition 5. *Let* e¯ *be an* NLL*-expression. We say that* e¯ *contains* garbage *iff there is a subexpression* (letr a1.e¯1,...,an.e¯n in e¯′) *in* e¯ *such that the environment* a1.e¯1,...,an.e¯n *can be split into two nonempty sub-environments* $a\_{i\_1}.\bar{e}\_{i\_1},\ldots,a\_{i\_k}.\bar{e}\_{i\_k}$ *and* $a\_{j\_1}.\bar{e}\_{j\_1},\ldots,a\_{j\_{k'}}.\bar{e}\_{j\_{k'}}$*, and the binding atoms* $a\_{i\_h}$, $h = 1,\ldots,k$, *do not occur free in* $\mathtt{letr}\ a\_{j\_1}.\bar{e}\_{j\_1},\ldots,a\_{j\_{k'}}.\bar{e}\_{j\_{k'}}\ \text{in}\ \bar{e}'$*. We say that* e¯ *is* garbage-free (or garbage-collected) *iff it does not contain garbage.*

Making an expression garbage-free may require an iterated removal of garbage, using the garbage removal rewriting rules below:

$$\begin{aligned} \text{(gr1)} \quad & \mathtt{letr} \ a\_1.e\_1; \ldots; a\_n.e\_n; b\_1.e'\_1; \ldots; b\_m.e'\_m \text{ in } e'\_{m+1} \longrightarrow \\ & \mathtt{letr} \ b\_1.e'\_1; \ldots; b\_m.e'\_m \text{ in } e'\_{m+1}, \text{ if } \bigcup FA(e'\_i) \cap \{a\_1, \ldots, a\_n\} = \emptyset \\ \text{(gr2)} \quad & \mathtt{letr} \ a\_1.e\_1; \ldots; a\_n.e\_n \text{ in } e \longrightarrow e, \text{ if } FA(e) \cap \{a\_1, \ldots, a\_n\} = \emptyset \end{aligned}$$
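One way to apply (gr1) and (gr2) exhaustively is a reachability computation: a binding is kept only if its atom is reachable from the free atoms of the body through the right-hand sides of kept bindings. The sketch below works on ground expressions in an illustrative tuple encoding; it is one possible strategy under these assumptions, and `remove_garbage` is a hypothetical name.

```python
# Illustrative encoding (an assumption of this sketch):
#   ("atom", a) | ("lam", a, body) | ("fn", f, [args]) | ("letr", [(a, rhs), ...], body)

def free_atoms(e):
    """Free atoms of a ground expression."""
    if e[0] == "atom":
        return {e[1]}
    if e[0] == "lam":
        return free_atoms(e[2]) - {e[1]}
    if e[0] == "fn":
        return set().union(*(free_atoms(s) for s in e[2]))
    atoms = free_atoms(e[2])
    for _, rhs in e[1]:
        atoms |= free_atoms(rhs)
    return atoms - {a for a, _ in e[1]}

def remove_garbage(e):
    """Return a garbage-free variant of e by exhaustive use of (gr1) and (gr2)."""
    if e[0] == "atom":
        return e
    if e[0] == "lam":
        return ("lam", e[1], remove_garbage(e[2]))
    if e[0] == "fn":
        return ("fn", e[1], [remove_garbage(s) for s in e[2]])
    # letr: clean the subexpressions first, then the top environment
    env = [(a, remove_garbage(rhs)) for a, rhs in e[1]]
    body = remove_garbage(e[2])
    rhs_of = dict(env)
    needed, frontier = set(), free_atoms(body)
    while frontier:
        a = frontier.pop()
        if a in rhs_of and a not in needed:
            needed.add(a)
            frontier |= free_atoms(rhs_of[a])
    kept = [(a, rhs) for a, rhs in env if a in needed]
    if not kept:
        return body                    # (gr2): the whole environment is garbage
    return ("letr", kept, body)        # (gr1): unreachable bindings are dropped

def atom(a):
    return ("atom", a)

# letr a.Nil; b.b in f(a, a): the binding b.b is garbage
example = ("letr", [("a", ("fn", "Nil", [])), ("b", atom("b"))],
           ("fn", "f", [atom("a"), atom("a")]))
cleaned = remove_garbage(example)
```

Since the reachable set is computed once per letr-environment, a pass of this kind stays polynomial in the size of the expression.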

We illustrate our ideas for the generalization of garbage-free expressions. Note that the used equality of expressions makes a notable difference for the results as well as for the algorithmic steps.

*Example 8.* Let s¯ = let c.a in f(g(c)) and t¯ = let d.b in f(h(d)) be two garbage-free ground expressions. A generalization of s¯ and t¯ w.r.t. ∼α is s ¯ = let c.X1 in f(X2), which is also an lgg. If we allowed more equalities on the expressions, like ∼gc as a part of the equality, or even an equality ∼α,gc,letcp that also allows copying let-bindings, then s¯ would be equivalent to f(g(a)) and t¯ equivalent to f(h(b)), which have f(X) as a generalization. The generalization algorithm, however, would be much more complex.

Fig. 5. Different lengths of letrec-environments in AntiUnifLetr

The next step is to standardize the sequence of bindings in garbage-collected expressions, which greatly supports further operations.

*Standardization Algorithm.* Let (let a1.e¯1; ... ; an.e¯n in e¯) be a garbage-free NLL-expression. Then, rearrange the bindings as follows:


These steps are to be used iteratively: apply them to the smallest subexpression e¯′ of e¯ which is not yet correctly arranged. The result is a *gc-standardized* expression tgcst of t.

*Example 9.* Consider the garbage-free expression let a.app(b, λc.c); b.λd.d in a, where app is a binary function symbol for denoting the usual application of the lambda calculus. The standardization algorithm returns the gc-standardized expression let b.λd.d; a.app(b, λc.c) in a.

Proposition 1. *For every garbage-free* NLL*-expression* e¯*, the gc-standardized expression* e¯′ *of* e¯ *with* e¯′ ∼α e¯*, has a sequence of bindings in all letrec environments that is unique and has a fixed ordering. The computation can be done in polynomial time.*

*Proof.* Garbage collection is polynomial: after every step the expression will be smaller, and a single step of detecting a set of redundant bindings is also polynomial. The rearrangement also can be done first for subexpressions of smaller size, and a single rearrangement of the top binding takes polynomial time.

#### 4.1 Anti-unification of Garbage-Free Expressions

In this and the next subsection on generalization we will use a syntactically fixed ordering of bindings in let environments, and denote this as letf.

AntiUnifLetr is adapted to the *ground* situation in several aspects: (i) There are no freshness constraints; (ii) expressions are first gc-standardized; (iii) we permit that n ≥ 2 expressions are to be generalized in one step; (iv) in a set of expressions to be generalized, we make all top-level letrec environments to be of the same (minimal) length by adding bindings a.a with fresh atoms a; and (v) we fix the sequence of bindings in a let indicated by letf.

We remark that an iterated generalization of pairs (i.e., to generalize s1, s2 and s3 one first generalizes s1 and s2, and then repeats the generalization process with the result, say r, and s3) has a disadvantage: from the second step on, after the first application of a rule, there are generalization variables, and the semantic properties get lost. This means that, e.g., the standardization is no longer usable, and so the method no longer works properly in the next generalization steps.

Therefore, for generalizing more than 2 expressions, the data structure is adapted: the generalized state is ({X : s1 ≜ ... ≜ sn}; M; ∇; L), and we use generalization tuples of the form {X : s1 ≜ ... ≜ sn} to denote that X is a variable generalizing the expressions s1,...,sn. Examples of the modified rules are:

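$$\text{(Decm)} \quad \frac{\{X : f(s\_{1,1},\ldots,s\_{1,n}) \triangleq \ldots \triangleq f(s\_{m,1},\ldots,s\_{m,n})\} \mathbin{\dot\cup} \Gamma, M, L}{\Gamma \mathbin{\dot\cup} \{X\_1 : s\_{1,1} \triangleq \ldots \triangleq s\_{m,1}, \ldots, X\_n : s\_{1,n} \triangleq \ldots \triangleq s\_{m,n}\}, M, L \cup \{X \mapsto f(X\_1,\ldots,X\_n)\}} \quad \begin{array}{l} X\_i \text{ are fresh variables} \\ n = 0 \text{ is permitted} \end{array}$$

$$\text{(Absaam)} \quad \frac{\{X : \lambda a.s\_1 \triangleq \ldots \triangleq \lambda a.s\_n\} \mathbin{\dot\cup} \Gamma, M, L}{\Gamma \mathbin{\dot\cup} \{Y : s\_1 \triangleq \ldots \triangleq s\_n\}, M, L \cup \{X \mapsto \lambda a.Y\}}$$

$$\text{(Merm)} \quad \frac{\Gamma, \{X : s\_1 \triangleq \ldots \triangleq s\_n, Y : t\_1 \triangleq \ldots \triangleq t\_n\} \mathbin{\dot\cup} M, L \qquad \mathrm{Eqvm}(\{(s\_1,\ldots,s\_n) \preceq (t\_1,\ldots,t\_n)\}) = \pi}{\Gamma, M \cup \{X\_1 : s\_1 \triangleq t\_1\}, L \cup \{X \mapsto \pi \cdot Y\}}$$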

Thus, we adapt the rules of AntiUnifLetr: it accepts n ≥ 2 ground expressions; the permutation rule (Letperm) is inactive due to fixing the ordering of bindings; merging is supported, and the subalgorithms Eqvm and EqvBiEx are almost trivial and applied to larger tuples. Also the sequence of bindings in lets is fixed. All these adaptations can be done within the polynomial complexity.

These explanations suggest the algorithm AntiUnifNoGarbage, for <sup>n</sup> <sup>≥</sup> 2 (ground) arguments, operating on a triple: (Γ,M, L). It is defined nondeterministically, but only one run will be done.
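The flavour of a single deterministic run on n ≥ 2 ground expressions can be sketched as follows. This is a strong simplification and not the algorithm AntiUnifNoGarbage itself: only the decomposition of function applications and of abstractions with the same bound atom is covered, letf-expressions are omitted, and merging is restricted to syntactically identical tuples (the case where the matching permutation is the identity). The encoding and the name `anti_unify` are assumptions.

```python
import itertools

# Illustrative encoding with hashable tuples (an assumption of this sketch):
#   ("atom", a) | ("fn", f, (args...)) | ("lam", a, body) | ("var", X)

def anti_unify(*terms):
    """Generalize n >= 2 ground terms; returns (generalization, solved store)."""
    store = {}                       # tuple of terms -> generalization variable
    fresh = itertools.count(1)

    def go(ts):
        head = ts[0]
        if all(t == head for t in ts):
            return head              # identical inputs generalize to themselves
        if head[0] == "fn" and all(
            t[0] == "fn" and t[1] == head[1] and len(t[2]) == len(head[2])
            for t in ts
        ):
            # decomposition of a common function symbol, argument by argument
            args = tuple(go(tuple(t[2][i] for t in ts))
                         for i in range(len(head[2])))
            return ("fn", head[1], args)
        if head[0] == "lam" and all(t[0] == "lam" and t[1] == head[1] for t in ts):
            # abstraction over the same atom
            return ("lam", head[1], go(tuple(t[2] for t in ts)))
        # solve: introduce a variable, reusing it for an identical tuple (merge)
        if ts not in store:
            store[ts] = "X%d" % next(fresh)
        return ("var", store[ts])

    return go(tuple(terms)), store

def atom(a):
    return ("atom", a)

# f(g(c1, a), a) and f(c2, a), the input of Example 6
left = ("fn", "f", (("fn", "g", (atom("c1"), atom("a"))), atom("a")))
right = ("fn", "f", (atom("c2"), atom("a")))
generalization, solved = anti_unify(left, right)
```

On the input of Example 6 the sketch produces f(X1, a) together with the solved tuple for X1, and repeated disagreement tuples such as in f(b, b) and f(c, c) share one variable.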

*Example 10 (Fixed letr bindings).* Generalizing the garbage-collected expressions let a′.a; b′.b; c′.c in f(g(a′, b′, c′)) and let a′.b; b′.c; c′.a in f(h(a′, b′, c′)) produces let a′.a; b′.b; c′.c in f(X), since bindings can be rearranged; trying the rearrangements requires exponential complexity. If we fix the sequence of bindings, the algorithm requires only polynomial time in this step: for letf a′.a; b′.b; c′.c in f(g(a′, b′, c′)) and letf a′.b; b′.c; c′.a in f(h(a′, b′, c′)), we obtain letf a′.X1; b′.X2; c′.X3 in f(X).

Theorem 4. *Algorithm* AntiUnifNoGarbage *is sound, terminating and complete. It will compute a single least general generalization in polynomial time.*

*Proof (Sketch)*. The main argument is that if no rule applies, then the result is already a generalization. Second, every applied rule keeps the semantics, i.e., does not lose information. The complexity has two components: one is the preparation of the input, which is polynomial. The second part is the test and computation of every rule, which is polynomial since there are no ∇-sets, and the execution of every rule requires polynomial time in the input size. Moreover, the size of the problem is decreased in every step.

#### 4.2 Exploiting Semantic Equalities

Since we focus on applications of the algorithms in (functional) higher-order programming languages, it makes sense to take more semantic equations and properties into account to recognize semantic equality of syntactically different expressions, which improves the power of generalization algorithms.

Since there are various approaches and definitions to semantics, like variants of contextual equivalences or bisimulations [9,14,15,19] and we want to be consistent with most of them, we only investigate the equalities that are correct in a majority of the cases. By "cases" we mean different programming languages permitting letr, but with different operational and equational semantics.

The following semantically correct equalities, expressed as rewrite rules, in languages with letrec could also be used for further standardization of expressions, where we assume that there are no conflicts with variable names.

1. x.f(s1,...,sn) → x.f(y1,...,yn); y1.s1; ... ; yn.sn


Note that these equalities, if used to standardize expressions, keep the polynomial complexity of generalizations of ground expressions.

### 5 Conclusion and Future Work

We formulated an anti-unification algorithm for expressions in a functional higher-order language with a let constructor that has mutually recursive bindings. We constructed a weakly complete anti-unification algorithm that in the general case is finitary, and that is made complete by a post-processing step. In the worst case, the time for the computation as well as the number of generalizations are exponential.

If the expressions are restricted to be ground and garbage-free, then the problem becomes unitary and the computation is polynomial. These properties make the method more friendly to applications. We also considered modifications of the generalization algorithm for functions in functional programming languages with letrec that have a wider coverage by abstracting from the syntactical details and by observing semantic equalities.

Further work is to generalize algorithms to other patterns and to experiment with the generalization method in practice.

### References



# Confluence Criteria for Logically Constrained Rewrite Systems

Jonas Schöpf(B) and Aart Middeldorp

Department of Computer Science, Universität Innsbruck, Innsbruck, Austria {jonas.schoepf,aart.middeldorp}@uibk.ac.at

Abstract. Numerous confluence criteria for plain term rewrite systems are known. For logically constrained rewrite systems, an attractive extension of term rewriting in which rules are equipped with logical constraints, much less is known. In this paper we extend the strongly-closed and (almost) parallel-closed critical pair criteria of Huet and Toyama to the logically constrained setting. We discuss the challenges for automation and present crest, a new tool for logically constrained rewriting in which the confluence criteria are implemented, together with experimental data.

Keywords: Confluence · Term Rewriting · Constraints · Automation

#### 1 Introduction

Logically constrained rewrite systems constitute a general rewrite formalism with native support for constraints that are handled by SMT solvers. They are useful for program analysis, as illustrated in numerous papers [2,3,5,13]. Several results from term rewriting have been lifted to constrained rewriting. We mention termination analysis [6,7,12], rewriting induction [3], completion [12] as well as runtime complexity analysis [13].

In this paper we are concerned with confluence analysis of logically constrained rewrite systems (LCTRSs for short). Only two sufficient conditions for confluence of LCTRSs are known. Kop and Nishida considered (weak) orthogonality in [8]. Orthogonality is the combination of left-linearity and the absence of critical pairs; in a weakly orthogonal system trivial critical pairs are allowed. Completion of LCTRSs is the topic of [12], and the underlying confluence condition of completion is the combination of termination and joinability of critical pairs. In this paper we add two further confluence criteria. Both of these extend known conditions for standard term rewriting to the constrained setting. The first is the combination of linearity and strong closedness of critical pairs, introduced by Huet [4]. The second, also due to [4], is the combination of left-linearity and parallel closedness of critical pairs. We also consider an extension of the latter, due to Toyama [11].

This research is supported by FWF (Austrian Science Fund) Project I 5943-N.

© The Author(s) 2023

B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 474–490, 2023. https://doi.org/10.1007/978-3-031-38499-8\_27

*Overview.* The remainder of this paper is organized as follows. In the next section we summarize the relevant background. Section 3 recalls the existing confluence criteria for LCTRSs and some of the underlying results. The new confluence criteria for LCTRSs are reported in Sect. 4. In Sect. 5 the automation challenges we faced are described and we present our prototype implementation crest. Experimental results are reported in Sect. 6, before we conclude in Sect. 7.

### 2 Preliminaries

We assume familiarity with the basic notions of term rewrite systems (TRSs) [1], but briefly recapitulate terminology and notation that we use in the remainder. In particular, we recall the notion of logically constrained rewriting as defined in [3,8].

We assume a many-sorted signature $\mathcal{F}$ and a set $\mathcal{V}$ of (many-sorted) variables disjoint from $\mathcal{F}$. The signature $\mathcal{F}$ is split into term symbols from $\mathcal{F}_{\mathsf{te}}$ and theory symbols from $\mathcal{F}_{\mathsf{th}}$. The set $\mathcal{T}(\mathcal{F}, \mathcal{V})$ contains the well-sorted terms over this signature and $\mathcal{T}(\mathcal{F}_{\mathsf{th}})$ denotes the set of well-sorted ground terms that consist entirely of theory symbols. We assume a mapping $\mathcal{I}$ which assigns to every sort $\iota$ occurring in $\mathcal{F}_{\mathsf{th}}$ a carrier set $\mathcal{I}(\iota)$, and an interpretation $\mathcal{J}$ that assigns to every symbol $f \in \mathcal{F}_{\mathsf{th}}$ with sort declaration $\iota_1 \times \cdots \times \iota_n \to \kappa$ a function $f_{\mathcal{J}} \colon \mathcal{I}(\iota_1) \times \cdots \times \mathcal{I}(\iota_n) \to \mathcal{I}(\kappa)$. Moreover, for every sort $\iota$ occurring in $\mathcal{F}_{\mathsf{th}}$ we assume a set $\mathcal{V}\mathsf{al}_\iota \subseteq \mathcal{F}_{\mathsf{th}}$ of value symbols, such that all $c \in \mathcal{V}\mathsf{al}_\iota$ are constants of sort $\iota$ and $\mathcal{J}$ constitutes a bijective mapping between $\mathcal{V}\mathsf{al}_\iota$ and $\mathcal{I}(\iota)$. Thus there exists a constant symbol in $\mathcal{F}_{\mathsf{th}}$ for every value in the carrier set. The interpretation $\mathcal{J}$ naturally extends to a mapping $[\![\cdot]\!]$ from ground terms in $\mathcal{T}(\mathcal{F}_{\mathsf{th}})$ to values in $\mathcal{V}\mathsf{al} = \bigcup_{\iota \in \mathcal{D}\mathsf{om}(\mathcal{I})} \mathcal{V}\mathsf{al}_\iota$: $[\![f(t_1,\ldots,t_n)]\!] = f_{\mathcal{J}}([\![t_1]\!],\ldots,[\![t_n]\!])$ for all $f(t_1,\ldots,t_n) \in \mathcal{T}(\mathcal{F}_{\mathsf{th}})$. So every ground term in $\mathcal{T}(\mathcal{F}_{\mathsf{th}})$ has a unique value. We demand that theory symbols and term symbols overlap only on values, i.e., $\mathcal{F}_{\mathsf{te}} \cap \mathcal{F}_{\mathsf{th}} \subseteq \mathcal{V}\mathsf{al}$. A term in $\mathcal{T}(\mathcal{F}_{\mathsf{th}}, \mathcal{V})$ is called a *logical* term.

Positions are strings of positive natural numbers used to address subterms. The empty string is denoted by $\epsilon$. We write $q \leqslant p$ and say that $p$ is below $q$ if $qq' = p$ for some position $q'$, in which case $p \backslash q$ is defined to be $q'$. Furthermore, $q < p$ if $q \leqslant p$ and $q \neq p$. Finally, positions $q$ and $p$ are parallel, written as $q \parallel p$, if neither $q \leqslant p$ nor $p < q$. The set of positions of a term $t$ is defined as $\mathcal{P}\mathsf{os}(t) = \{\epsilon\}$ if $t$ is a variable or a constant, and as $\mathcal{P}\mathsf{os}(t) = \{\epsilon\} \cup \{iq \mid 1 \leqslant i \leqslant n \text{ and } q \in \mathcal{P}\mathsf{os}(t_i)\}$ if $t = f(t_1,\ldots,t_n)$ with $n \geqslant 1$. The subterm of $t$ at position $p \in \mathcal{P}\mathsf{os}(t)$ is defined as $t|_p = t$ if $p = \epsilon$ and as $t|_p = t_i|_q$ if $p = iq$ and $t = f(t_1,\ldots,t_n)$. We write $s[t]_p$ for the result of replacing the subterm at position $p$ of $s$ with $t$. We write $\mathcal{P}\mathsf{os}_{\mathcal{V}}(t)$ for $\{p \in \mathcal{P}\mathsf{os}(t) \mid t|_p \in \mathcal{V}\}$ and $\mathcal{P}\mathsf{os}_{\mathcal{F}}(t)$ for $\mathcal{P}\mathsf{os}(t) \setminus \mathcal{P}\mathsf{os}_{\mathcal{V}}(t)$. The set of variables occurring in the term $t$ is denoted by $\mathcal{V}\mathsf{ar}(t)$. A term $t$ is linear if every variable occurs at most once in it. A substitution is a mapping $\sigma$ from $\mathcal{V}$ to $\mathcal{T}(\mathcal{F}, \mathcal{V})$ such that its domain $\{x \in \mathcal{V} \mid \sigma(x) \neq x\}$ is finite. We write $t\sigma$ for the result of applying $\sigma$ to the term $t$.

We assume the existence of a sort $\mathsf{bool}$ such that $\mathcal{I}(\mathsf{bool}) = \mathbb{B} = \{\top, \bot\}$, $\mathcal{V}\mathsf{al}_{\mathsf{bool}} = \{\mathsf{true}, \mathsf{false}\}$, $[\![\mathsf{true}]\!] = \top$, and $[\![\mathsf{false}]\!] = \bot$ hold. Logical terms of sort $\mathsf{bool}$ are called *constraints*. A constraint $\varphi$ is *valid* if $[\![\varphi\gamma]\!] = \top$ for all substitutions $\gamma$ such that $\gamma(x) \in \mathcal{V}\mathsf{al}$ for all $x \in \mathcal{V}\mathsf{ar}(\varphi)$.
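The notions of position, subterm and replacement can be made concrete with a small sketch. The encoding below is an assumption chosen for illustration only (variables as strings, applications as tuples, positions as tuples of positive integers); the function names are hypothetical and not taken from any tool.

```python
# Illustrative encoding (an assumption, not the paper's):
#   variable            -> a Python str, e.g. "x"
#   f(t1, ..., tn)      -> the tuple ("f", t1, ..., tn); a constant c is ("c",)
#   position            -> a tuple of positive integers; () is the root position

def positions(t):
    """Return the set Pos(t) of all positions of the term t."""
    if isinstance(t, str) or len(t) == 1:       # variable or constant
        return {()}
    pos = {()}
    for i, arg in enumerate(t[1:], start=1):
        pos |= {(i,) + q for q in positions(arg)}
    return pos

def subterm(t, p):
    """Return the subterm t|_p."""
    for i in p:
        t = t[i]                                # t[0] is the root symbol
    return t

def replace(s, p, t):
    """Return s[t]_p, i.e. s with the subterm at position p replaced by t."""
    if not p:
        return t
    i = p[0]
    return s[:i] + (replace(s[i], p[1:], t),) + s[i + 1:]

def below(p, q):
    """True if p is below q (q <= p), i.e. q is a prefix of p."""
    return p[:len(q)] == q

def parallel(p, q):
    """True if neither position is a prefix of the other."""
    return not below(p, q) and not below(q, p)

def var_positions(t):
    """Return Pos_V(t), the positions of t that carry a variable."""
    return {p for p in positions(t) if isinstance(subterm(t, p), str)}

def is_linear(t):
    """True if every variable occurs at most once in t."""
    occurrences = [subterm(t, p) for p in var_positions(t)]
    return len(occurrences) == len(set(occurrences))
```

For instance, with `t = ("max", ("+", "x", "y"), ("6",))` the call `subterm(t, (1, 2))` yields the variable `"y"`, and `replace(t, (1,), "z")` yields the encoding of max(z, 6).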

A *constrained rewrite rule* is a triple $\ell \to r\ [\varphi]$ where $\ell, r \in \mathcal{T}(\mathcal{F}, \mathcal{V})$ are terms of the same sort such that $\mathsf{root}(\ell) \in \mathcal{F}_{\mathsf{te}} \setminus \mathcal{F}_{\mathsf{th}}$ and $\varphi$ is a logical term of sort $\mathsf{bool}$. If $\varphi = \mathsf{true}$ then the constraint is often omitted, and the rule is denoted as $\ell \to r$. We denote the set $\mathcal{V}\mathsf{ar}(\varphi) \cup (\mathcal{V}\mathsf{ar}(r) \setminus \mathcal{V}\mathsf{ar}(\ell))$ of *logical* variables in $\rho \colon \ell \to r\ [\varphi]$ by $\mathcal{LV}\mathsf{ar}(\rho)$. We write $\mathcal{EV}\mathsf{ar}(\rho)$ for the set $\mathcal{V}\mathsf{ar}(r) \setminus (\mathcal{V}\mathsf{ar}(\ell) \cup \mathcal{V}\mathsf{ar}(\varphi))$ of variables that appear only in the right-hand side of $\rho$. Note that extra variables in right-hand sides are allowed, but they may only be instantiated by values. This is useful to model user input or random choice [3]. A set of constrained rewrite rules is called a *logically constrained rewrite system* (LCTRS for short).

The LCTRS R introduced in the example below computes the maximum of two integers.

*Example 1.* Before giving the rules, we need to define the term and theory symbols, the carrier sets and interpretation functions:

$$\begin{aligned} \mathcal{F}_{\mathsf{te}} &= \{\mathsf{max} \colon \mathsf{int} \times \mathsf{int} \Rightarrow \mathsf{int}\} \cup \{\mathsf{0}, \mathsf{1}, \ldots \colon \mathsf{int}\} \qquad \mathcal{I}_{\mathsf{bool}} = \mathbb{B} \qquad \mathcal{I}_{\mathsf{int}} = \mathbb{Z} \\ \mathcal{F}_{\mathsf{th}} &= \{\mathsf{0}, \mathsf{1}, \ldots \colon \mathsf{int}\} \cup \{\mathsf{true}, \mathsf{false} \colon \mathsf{bool}\} \cup \{\neg \colon \mathsf{bool} \Rightarrow \mathsf{bool}\} \\ & \quad \cup \{- \colon \mathsf{int} \Rightarrow \mathsf{int}\} \cup \{\wedge \colon \mathsf{bool} \times \mathsf{bool} \Rightarrow \mathsf{bool}\} \\ & \quad \cup \{+, - \colon \mathsf{int} \times \mathsf{int} \Rightarrow \mathsf{int}\} \cup \{\leq, \geq, <, >, = \colon \mathsf{int} \times \mathsf{int} \Rightarrow \mathsf{bool}\} \end{aligned}$$

The interpretations for theory symbols follow the usual semantics given in the SMT-LIB theory Ints<sup>1</sup> used by the SMT-LIB logic QF\_LIA. The LCTRS $\mathcal{R}$ consists of the following constrained rewrite rules:

$$\mathsf{max}(x,y) \to x \; [x \ge y] \qquad \mathsf{max}(x,y) \to y \; [y \ge x] \qquad \mathsf{max}(x,y) \to \mathsf{max}(y,x)$$

In later examples we refrain from spelling out the signature and interpretations of the theory Ints. We now define rewriting using constrained rewrite rules. LCTRSs admit two kinds of rewrite steps. Rewrite rules give rise to *rule* steps, provided the constraint of the rule is satisfied. In addition, theory calls of the form $f(v_1,\ldots,v_n)$ with $f \in \mathcal{F}_{\mathsf{th}} \setminus \mathcal{V}\mathsf{al}$ and values $v_1,\ldots,v_n$ can be evaluated in a *calculation* step. In the definition below, a substitution $\sigma$ is said to *respect* a rule $\rho \colon \ell \to r\ [\varphi]$, denoted by $\sigma \vDash \rho$, if $\mathcal{D}\mathsf{om}(\sigma) = \mathcal{V}\mathsf{ar}(\ell) \cup \mathcal{V}\mathsf{ar}(r) \cup \mathcal{V}\mathsf{ar}(\varphi)$, $\sigma(x) \in \mathcal{V}\mathsf{al}$ for all $x \in \mathcal{LV}\mathsf{ar}(\rho)$, and $\varphi\sigma$ is valid. Moreover, a constraint $\varphi$ is respected by $\sigma$, denoted by $\sigma \vDash \varphi$, if $\sigma(x) \in \mathcal{V}\mathsf{al}$ for all $x \in \mathcal{V}\mathsf{ar}(\varphi)$ and $\varphi\sigma$ is valid.

Definition 1. *Let* $\mathcal{R}$ *be an LCTRS. A* rule step $s \to_{\mathsf{ru}} t$ *satisfies* $s|_p = \ell\sigma$ *and* $t = s[r\sigma]_p$ *for some position* $p$ *and constrained rewrite rule* $\ell \to r\ [\varphi]$ *that is respected by the substitution* $\sigma$*. A* calculation step $s \to_{\mathsf{ca}} t$ *satisfies* $s|_p = f(v_1,\ldots,v_n)$ *and* $t = s[v]_p$ *for some* $f \in \mathcal{F}_{\mathsf{th}} \setminus \mathcal{V}\mathsf{al}$*,* $v_1,\ldots,v_n \in \mathcal{V}\mathsf{al}$ *with* $v = [\![f(v_1,\ldots,v_n)]\!]$*. In this case* $f(x_1,\ldots,x_n) \to y\ [y = f(x_1,\ldots,x_n)]$ *with a fresh variable* $y$ *is a* calculation rule*. The set of all calculation rules is denoted by* $\mathcal{R}_{\mathsf{ca}}$*. The relation* $\to_{\mathcal{R}}$ *associated with* $\mathcal{R}$ *is the union* $\to_{\mathsf{ru}} \cup \to_{\mathsf{ca}}$*.*

<sup>1</sup> http://smtlib.cs.uiowa.edu/Theories/Ints.smt2.

We sometimes write $\to_{p \mid \rho \mid \sigma}$ to indicate that the rewrite step takes place at position $p$, using the constrained rewrite rule $\rho$ with substitution $\sigma$.

*Example 2.* We have $\mathsf{max}(\mathsf{1} + \mathsf{2}, \mathsf{4}) \to_{\mathcal{R}} \mathsf{max}(\mathsf{3}, \mathsf{4}) \to_{\mathcal{R}} \mathsf{max}(\mathsf{4}, \mathsf{3}) \to_{\mathcal{R}} \mathsf{4}$ in the LCTRS of Example 1. The first step is a calculation step. In the third step we apply the rule $\mathsf{max}(x, y) \to x\ [x \ge y]$ with substitution $\sigma = \{x \mapsto \mathsf{4}, y \mapsto \mathsf{3}\}$.
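The rewrite sequence of Example 2 can be replayed with a small ground-term rewriter. This is only a sketch under simplifying assumptions: terms are ground, all symbols are binary, integers are encoded as Python `int`s, and the three rules of Example 1 are hard-coded. The names `root_steps` and `steps` are hypothetical.

```python
# Ground terms (illustrative encoding): a Python int is a value,
# a tuple (symbol, left, right) is an application of "max" or "+".

def root_steps(t):
    """All one-step reducts of t obtained by rewriting at the root position."""
    if not isinstance(t, tuple):
        return []                               # values are normal forms
    f, a, b = t
    both_values = isinstance(a, int) and isinstance(b, int)
    if f == "+":
        # calculation step: only allowed when both arguments are values
        return [a + b] if both_values else []
    out = []
    if both_values:
        # x and y occur in the constraints of rules 1 and 2, so they are
        # logical variables and may only be instantiated by values
        if a >= b:
            out.append(a)                       # max(x, y) -> x [x >= y]
        if b >= a:
            out.append(b)                       # max(x, y) -> y [y >= x]
    out.append(("max", b, a))                   # max(x, y) -> max(y, x), unconstrained
    return out

def steps(t):
    """All one-step reducts of t, at the root or inside an argument."""
    out = list(root_steps(t))
    if isinstance(t, tuple):
        f, a, b = t
        out += [(f, a2, b) for a2 in steps(a)]
        out += [(f, a, b2) for b2 in steps(b)]
    return out
```

Starting from `("max", ("+", 1, 2), 4)`, the term `("max", 3, 4)` appears among the reducts returned by `steps`, then `("max", 4, 3)`, and finally the value `4`, mirroring the three steps of the example.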

### 3 Confluence

In this paper we are concerned with the confluence of LCTRSs. An LCTRS $\mathcal{R}$ is *confluent* if $t \to^*_{\mathcal{R}} \cdot \mathrel{{}^*_{\mathcal{R}}{\leftarrow}} u$ for all terms $s$, $t$ and $u$ such that $t \mathrel{{}^*_{\mathcal{R}}{\leftarrow}} s \to^*_{\mathcal{R}} u$. Confluence criteria for TRSs are based on critical pairs. Critical pairs for LCTRSs were introduced in [8]. The difference with the definition below is that we add dummy constraints for *extra* variables in right-hand sides of rewrite rules.

Definition 2. *An* overlap *of an LCTRS* $\mathcal{R}$ *is a triple* $\langle \rho_1, p, \rho_2 \rangle$ *with rules* $\rho_1 \colon \ell_1 \to r_1\ [\varphi_1]$ *and* $\rho_2 \colon \ell_2 \to r_2\ [\varphi_2]$*, satisfying the following conditions:*


*In this case we call* $\ell_2\sigma[r_1\sigma]_p \approx r_2\sigma\ [\varphi_1\sigma \wedge \varphi_2\sigma \wedge \psi\sigma]$ *a* constrained critical pair *obtained from the overlap* $\langle \rho_1, p, \rho_2 \rangle$*. Here*

$$\psi = \bigwedge \{ x = x \mid x \in \mathcal{E} \mathcal{V} \text{ar}(\rho\_1) \cup \mathcal{E} \mathcal{V} \text{ar}(\rho\_2) \} $$

*The set of all constrained critical pairs of* $\mathcal{R}$ *is denoted by* $\mathsf{CCP}(\mathcal{R})$*.*

In the following we drop "constrained" and speak of critical pairs. The condition $\mathcal{V}\mathsf{ar}(r_1) \nsubseteq \mathcal{V}\mathsf{ar}(\ell_1)$ in the fifth condition is essential to correctly deal with extra variables in rewrite rules. The equations ($\psi$) added to the constraint of a critical pair record which variables in a critical pair were introduced by variables occurring only in the right-hand side of a rewrite rule and therefore should *only* be instantiated by values. Critical pairs as defined in [8,12] lack this information. The proof of Theorem 2 in the next section makes clear why those trivial equations are essential for our confluence criteria, see also Example 9.

*Example 3.* Consider the LCTRS consisting of the rule

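$$\rho \colon \mathbf{f}(x) \to z \; [x = z^2]$$

The variable $z$ does not occur in the left-hand side and the condition $\mathcal{V}\mathsf{ar}(r_1) \nsubseteq \mathcal{V}\mathsf{ar}(\ell_1)$ ensures that $\rho$ overlaps with (a variant of) itself at the root position. Note that $\mathcal{R}$ is not confluent due to the non-joinable local peak $-\mathsf{4} \leftarrow \mathsf{f}(\mathsf{16}) \to \mathsf{4}$.

*Example 4.* The LCTRS $\mathcal{R}$ of Example 1 admits the following critical pairs:

$$\begin{aligned} x &\approx y \; [x \ge y \wedge y \ge x] & &\langle 1, \epsilon, 2 \rangle \\ x &\approx \mathsf{max}(y, x) \; [x \ge y] & &\langle 1, \epsilon, 3 \rangle \\ y &\approx \mathsf{max}(y, x) \; [y \ge x] & &\langle 2, \epsilon, 3 \rangle \end{aligned}$$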
The originating overlap is given on the right, where we number the rewrite rules from left to right in Example 1.

Actually, there are three more overlaps since the position of overlap ($\epsilon$) is the root position. Such overlaps are called *overlays* and always come in pairs. For instance, $\mathsf{max}(y, x) \approx x\ [x \ge y]$ is the critical pair originating from $\langle 3, \epsilon, 1 \rangle$. For confluence criteria based on symmetric joinability conditions of critical pairs (like weak orthogonality and joinability of critical pairs for terminating systems) we need to consider just one critical pair, but this is not true for the criteria presented in the next section.

Logically constrained rewriting aims to rewrite (unconstrained) terms with constrained rules. However, for the sake of analysis, rewriting *constrained terms* is useful. In particular, since critical pairs in LCTRSs come with a constraint, confluence criteria need to consider constrained terms. The relevant notions defined below originate from [3,8].

Definition 3. *A* constrained term *is a pair* $s\ [\varphi]$ *of a term* $s$ *and a constraint* $\varphi$*. Two constrained terms* $s\ [\varphi]$ *and* $t\ [\psi]$ *are* equivalent*, denoted by* $s\ [\varphi] \sim t\ [\psi]$*, if for every substitution* $\gamma$ *respecting* $\varphi$ *there is some substitution* $\delta$ *that respects* $\psi$ *such that* $s\gamma = t\delta$*, and vice versa. Let* $\mathcal{R}$ *be an LCTRS and* $s\ [\varphi]$ *a constrained term. If* $s|_p = \ell\sigma$ *for some constrained rewrite rule* $\rho \colon \ell \to r\ [\psi]$*, position* $p$*, and substitution* $\sigma$ *such that* $\sigma(x) \in \mathcal{V}\mathsf{al} \cup \mathcal{V}\mathsf{ar}(\varphi)$ *for all* $x \in \mathcal{LV}\mathsf{ar}(\rho)$*,* $\varphi$ *is satisfiable and* $\varphi \Rightarrow \psi\sigma$ *is valid then*

$$s\ [\varphi] \to_{\mathsf{ru}} s[r\sigma]_p \ [\varphi]$$

*is a* rule step*. If* $s|_p = f(s_1,\ldots,s_n)$ *with* $f \in \mathcal{F}_{\mathsf{th}} \setminus \mathcal{F}_{\mathsf{te}}$ *and* $s_1,\ldots,s_n \in \mathcal{V}\mathsf{al} \cup \mathcal{V}\mathsf{ar}(\varphi)$ *then*

$$s\ [\varphi] \to_{\mathsf{ca}} s[x]_p \ [\varphi \wedge x = f(s_1,\ldots,s_n)]$$

*is a* calculation step*. Here* $x$ *is a fresh variable. We write* $\to_{\mathcal{R}}$ *for* $\to_{\mathsf{ru}} \cup \to_{\mathsf{ca}}$ *and the rewrite relation* $\stackrel{\sim}{\to}_{\mathcal{R}}$ *on constrained terms is defined as* $\sim \cdot \to_{\mathcal{R}} \cdot \sim$*.*

Positions in connection with $\stackrel{\sim}{\to}_{\mathcal{R}}$ steps always refer to the underlying steps in $\to_{\mathcal{R}}$. We give an example of constrained rewriting.

*Example 5.* Consider again the LCTRS R of Example 1. We have

$$\begin{aligned} \mathsf{max}(x+y, \mathsf{6}) \left[ x \ge \mathsf{2} \land y \ge \mathsf{4} \right] \to\_{\mathsf{R}} \mathsf{max}(z, \mathsf{6}) \left[ x \ge \mathsf{2} \land y \ge \mathsf{4} \land z = x + y \right] \\ \to\_{\mathsf{R}} z \left[ x \ge \mathsf{2} \land y \ge \mathsf{4} \land z = x + y \right] \end{aligned}$$

The first step is a calculation step. The second step is a rule step using the rule $\mathsf{max}(x, y) \to x\ [x \ge y]$ with the substitution $\sigma = \{x \mapsto z, y \mapsto \mathsf{6}\}$. Note that the constraint $(x \ge \mathsf{2} \wedge y \ge \mathsf{4} \wedge z = x + y) \Rightarrow z \ge \mathsf{6}$ is valid.

Definition 4. *A critical pair* $s \approx t\ [\varphi]$ *is* trivial *if* $s\sigma = t\sigma$ *for every substitution* $\sigma$ *with* $\sigma \vDash \varphi$*.* <sup>2</sup> *A left-linear LCTRS having only trivial critical pairs is called* weakly orthogonal*. A left-linear LCTRS without critical pairs is called* orthogonal*.*
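The validity claim at the end of Example 5 can be verified by hand. The derivation below is a short worked check that assumes only the usual ordering on the integers: for all integer values of $x$, $y$ and $z$ that satisfy $x \ge \mathsf{2} \wedge y \ge \mathsf{4} \wedge z = x + y$ we have

$$z = x + y \ge \mathsf{2} + \mathsf{4} = \mathsf{6}$$

and $z \ge \mathsf{6}$ is exactly the rule constraint $x \ge y$ instantiated by $\sigma$.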

The following result is from [8].

Theorem 1. *Weakly orthogonal LCTRSs are confluent.*

*Example 6.* The following left-linear LCTRS computes the Ackermann function using term symbols from $\mathcal{F}_{\mathsf{te}} = \{\mathsf{ack} \colon \mathsf{int} \times \mathsf{int} \Rightarrow \mathsf{int}\} \cup \{\mathsf{0}, \mathsf{1}, \ldots \colon \mathsf{int}\}$ and the same theory symbols, carrier sets and interpretations as in Example 1:

$$\begin{aligned} \mathsf{ack}(\mathbf{0},n) &\to n+1 \; [n \ge \mathbf{0}] \\ \mathsf{ack}(m,\mathbf{0}) &\to \mathsf{ack}(m-1,1) \; [m>\mathbf{0}] \\ \mathsf{ack}(m,n) &\to \mathsf{ack}(m-1,\mathsf{ack}(m,n-1)) \; [m>\mathbf{0} \land n>\mathbf{0}] \\ \mathsf{ack}(m,n) &\to \mathbf{0} \; [m<\mathbf{0} \lor n<\mathbf{0}] \end{aligned}$$

Since the conjunction of any two constraints is unsatisfiable, R lacks critical pairs. Hence R is confluent by Theorem 1.
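The rules of Example 6 can be read as a guarded recursive function. The sketch below is an illustration only: it encodes each rule's left-hand side pattern together with its constraint as a Python predicate on integer arguments, evaluates `ack` with an innermost strategy, and performs a bounded brute-force check that no two rules apply to the same pair of values. The bounded check is a sanity test over a finite range, not a proof of unsatisfiability; the names `GUARDS`, `applicable` and `guards_disjoint` are hypothetical.

```python
from itertools import product

# One predicate per rule: pattern match on values combined with the constraint.
GUARDS = [
    lambda m, n: m == 0 and n >= 0,   # ack(0, n) -> n + 1                      [n >= 0]
    lambda m, n: n == 0 and m > 0,    # ack(m, 0) -> ack(m - 1, 1)              [m > 0]
    lambda m, n: m > 0 and n > 0,     # ack(m, n) -> ack(m - 1, ack(m, n - 1))  [m > 0 and n > 0]
    lambda m, n: m < 0 or n < 0,      # ack(m, n) -> 0                          [m < 0 or n < 0]
]

def applicable(m, n):
    """Return the (1-based) numbers of the rules applicable to ack(m, n)."""
    return [i for i, guard in enumerate(GUARDS, start=1) if guard(m, n)]

def ack(m, n):
    """Evaluate ack(m, n) by applying the unique applicable rule."""
    (rule,) = applicable(m, n)        # fails if zero or several rules apply
    if rule == 1:
        return n + 1
    if rule == 2:
        return ack(m - 1, 1)
    if rule == 3:
        return ack(m - 1, ack(m, n - 1))
    return 0

def guards_disjoint(bound=6):
    """Bounded check: at most one rule applies to every value pair in range."""
    rng = range(-bound, bound + 1)
    return all(len(applicable(m, n)) <= 1 for m, n in product(rng, repeat=2))
```

A tool would discharge the unsatisfiability of the combined constraints with an SMT solver instead of enumerating values; the enumeration here only illustrates why the root overlaps yield no critical pairs.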

The following result is proved in [12] and forms the basis of completion of LCTRSs.

Lemma 1. *Let* $\mathcal{R}$ *be an LCTRS. If* $t \mathrel{{}_{\mathcal{R}}{\leftarrow}} s \to_{\mathcal{R}} u$ *then* $t \downarrow_{\mathcal{R}} u$ *or* $t \leftrightarrow_{\mathsf{CCP}(\mathcal{R})} u$*.*

In combination with Newman's Lemma, the following confluence criterion is obtained.

Corollary 1. *A terminating LCTRS is confluent if all critical pairs are joinable.*

This is less obvious than it seems. Joinability of a critical pair $s \approx t\ [\varphi]$ cannot simply be defined as $s\ [\varphi] \stackrel{\sim}{\to}{}^*_{\mathcal{R}} \cdot \mathrel{{}^*_{\mathcal{R}}{\stackrel{\sim}{\leftarrow}}} t\ [\varphi]$, as the following example shows.

*Example 7.* Consider the terminating LCTRS R consisting of the rewrite rules

$$\mathbf{f}(x,y) \to \mathbf{g}(x, \mathbf{1} + \mathbf{1}) \qquad \qquad \qquad \mathsf{h}(\mathbf{f}(x,y)) \to \mathsf{h}(\mathbf{g}(y, \mathbf{1} + \mathbf{1})) $$

The single critical pair $\mathsf{h}(\mathsf{g}(x, \mathsf{1} + \mathsf{1})) \approx \mathsf{h}(\mathsf{g}(y, \mathsf{1} + \mathsf{1}))$ should not be joinable because $\mathcal{R}$ is not confluent, but we do have

$$\begin{aligned} \mathsf{h}(\mathsf{g}(x,1+1)) \to\_{\mathsf{ca}} \mathsf{h}(\mathsf{g}(x,z)) \ [z=1+1] &\sim \mathsf{h}(\mathsf{g}(y,v)) \ [v=1+1] \\ \mathsf{h}(\mathsf{g}(y,1+1)) \to\_{\mathsf{ca}} \mathsf{h}(\mathsf{g}(y,v)) \ [v=1+1] \end{aligned}$$

due to the equivalence relation ∼ on constrained terms; since x and y do not appear in the constraints, there is no demand that they must be instantiated with values.

<sup>2</sup> The triviality condition in [8] is wrong. Here we use the corrected version in an update of [8] announced on Cynthia Kop's website (accessible at https://www.cs.ru.nl/~cynthiakop/frocos13.pdf).

The solution is not to treat the two sides of a critical pair in isolation but to define joinability based on rewriting constrained term pairs. So we view the symbol $\approx$ in a constrained equation $s \approx t\ [\varphi]$ as a binary constructor symbol such that the constrained equation can be viewed as a constrained term. Steps in $s$ take place at positions $\geqslant 1$ whereas steps in $t$ use positions $\geqslant 2$. The same is done in completion of LCTRSs [12].

Definition 5. *We call a constrained equation* $s \approx t\ [\varphi]$ trivial *if* $s\sigma = t\sigma$ *for any substitution* $\sigma$ *with* $\sigma \vDash \varphi$*. A critical pair* $s \approx t\ [\varphi]$ *is* joinable *if* $s \approx t\ [\varphi] \stackrel{\sim}{\to}{}^*_{\mathcal{R}}\ u \approx v\ [\psi]$ *and* $u \approx v\ [\psi]$ *is trivial.*

We revisit Example 7.

*Example 8.* For the critical pair in Example 7 we obtain

$$\begin{aligned} \mathsf{h}(\mathsf{g}(x,\mathsf{1}+\mathsf{1})) &\approx \mathsf{h}(\mathsf{g}(y,\mathsf{1}+\mathsf{1})) \\ &\to_{\mathsf{ca}} \mathsf{h}(\mathsf{g}(x,v)) \approx \mathsf{h}(\mathsf{g}(y,\mathsf{1}+\mathsf{1})) \ [v=\mathsf{1}+\mathsf{1}] \\ &\to_{\mathsf{ca}} \mathsf{h}(\mathsf{g}(x,v)) \approx \mathsf{h}(\mathsf{g}(y,z)) \ [v=\mathsf{1}+\mathsf{1}\wedge z=\mathsf{1}+\mathsf{1}] \end{aligned}$$

The substitution $\sigma = \{v \mapsto \mathsf{2}, z \mapsto \mathsf{2}\}$ respects the constraint $v = \mathsf{1} + \mathsf{1} \wedge z = \mathsf{1} + \mathsf{1}$ but does not equate $\mathsf{h}(\mathsf{g}(x, v))$ and $\mathsf{h}(\mathsf{g}(y, z))$.

The converse of Corollary 1 also holds, but note that in contrast to TRSs, joinability of critical pairs is not a decidable criterion for terminating LCTRSs, due to the undecidable triviality condition. Moreover, for the converse to hold, it is essential that critical pairs contain the trivial equations ψ in Definition 2.

*Example 9.* Consider the LCTRS R consisting of the rules

$$\mathbf{f}(x) \to \mathbf{g}(y) \qquad\qquad \mathbf{g}(y) \to \mathbf{a} \; [y = y]$$

which admits the critical pair $\mathsf{g}(y) \approx \mathsf{g}(y')\ [y = y \wedge y' = y']$ originating from the overlap $\langle \mathsf{f}(x) \to \mathsf{g}(y), \epsilon, \mathsf{f}(x') \to \mathsf{g}(y') \rangle$. This critical pair is joinable as $y$ and $y'$ are restricted to values and thus both sides rewrite to $\mathsf{a}$ using the second rule. As $\mathcal{R}$ is also terminating, it is confluent by Corollary 1. If we were to drop $\psi$ in Definition 2, we would obtain the non-joinable critical pair $\mathsf{g}(y) \approx \mathsf{g}(y')$ instead and wrongly conclude non-confluence.

### 4 Main Results

We start with extending a confluence result of Huet [4] for linear TRSs. Below we write $\to_{\geqslant p}$ to indicate that the position of the contracted redex in the step is below position $p$.

Definition 6. *A critical pair* $s \approx t\ [\varphi]$ *is* strongly closed *if*

$$\begin{array}{l} \text{1. } s \approx t \ [\varphi] \stackrel{\sim}{\to}{}^{*}_{\geqslant 1} \cdot \stackrel{\sim}{\to}{}^{=}_{\geqslant 2} \ u \approx v \ [\psi] \text{ for some trivial } u \approx v \ [\psi], \text{ and} \\ \text{2. } s \approx t \ [\varphi] \stackrel{\sim}{\to}{}^{*}_{\geqslant 2} \cdot \stackrel{\sim}{\to}{}^{=}_{\geqslant 1} \ u \approx v \ [\psi] \text{ for some trivial } u \approx v \ [\psi]. \end{array}$$

A binary relation $\to$ on terms is *strongly confluent* if $t \to^{*} \cdot \mathrel{{}^{=}{\leftarrow}} u$ for all terms $s$, $t$ and $u$ with $t \leftarrow s \to u$. (By symmetry, also $t \to^{=} \cdot \mathrel{{}^{*}{\leftarrow}} u$ is required.) Strong confluence is a well-known sufficient condition for confluence. Huet [4] proved that linear TRSs are strongly confluent if all critical pairs are strongly closed. Below we extend this result to LCTRSs, using the above definition of strongly closed constrained critical pairs.
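On a finite abstract rewrite relation, strong confluence can be tested directly from its definition. The following sketch assumes the relation is given as a dictionary mapping each element to the set of its one-step successors; this representation and the function names are illustrative assumptions and unrelated to the implementation in crest.

```python
def reachable(rel, a):
    """Return all b with a ->* b for a finite relation rel (dict: element -> successors)."""
    seen, todo = {a}, [a]
    while todo:
        x = todo.pop()
        for y in rel.get(x, ()):
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen

def strongly_confluent(rel):
    """Check t ->* w and u ->= w for some w, for every local peak t <- s -> u.

    All ordered pairs (t, u) of successors of s are inspected, so the
    symmetric requirement is covered as well.
    """
    for succs in rel.values():
        for t in succs:
            for u in succs:
                zero_or_one_step = {u} | set(rel.get(u, ()))   # all w with u ->= w
                if not (reachable(rel, t) & zero_or_one_step):
                    return False
    return True
```

For example, the relation `{"a": {"b", "c"}, "c": {"d"}, "d": {"b"}}` is confluent but not strongly confluent: the peak with `t = "b"` and `u = "c"` needs two steps from `"c"` to reach `"b"`, so `strongly_confluent` returns `False` for it.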

Theorem 2. *A linear LCTRS is strongly confluent if all its critical pairs are strongly closed.*

We give full proof details in order to illustrate the complications caused by constrained rewrite rules. The following result from [12] plays an important role.

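Lemma 2. *Suppose* $s \approx t\ [\varphi] \stackrel{\sim}{\to}_{\geqslant p}\ u \approx v\ [\psi]$ *and* $\gamma \vDash \varphi$*. If* $p \geqslant 1$ *then* $s\gamma \to u\delta$ *and* $t\gamma = v\delta$ *for some substitution* $\delta$ *with* $\delta \vDash \psi$*. If* $p \geqslant 2$ *then* $s\gamma = u\delta$ *and* $t\gamma \to v\delta$ *for some substitution* $\delta$ *with* $\delta \vDash \psi$*.*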

*Proof (of Theorem* 2*).* Consider an arbitrary local peak

$$t \leftarrow\_{p\_1 \mid \rho\_1 \mid \sigma\_1} s \rightarrow\_{p\_2 \mid \rho\_2 \mid \sigma\_2} u$$

with rewrite rules $\rho_1 \colon \ell_1 \to r_1\ [\varphi_1]$ and $\rho_2 \colon \ell_2 \to r_2\ [\varphi_2]$ from $\mathcal{R} \cup \mathcal{R}_{\mathsf{ca}}$. We may assume that $\rho_1$ and $\rho_2$ have no variables in common, and consequently $\mathcal{D}\mathsf{om}(\sigma_1) \cap \mathcal{D}\mathsf{om}(\sigma_2) = \varnothing$. We have $s|_{p_1} = \ell_1\sigma_1$, $t = s[r_1\sigma_1]_{p_1}$ and $\sigma_1 \vDash \varphi_1$. Likewise, $s|_{p_2} = \ell_2\sigma_2$, $u = s[r_2\sigma_2]_{p_2}$ and $\sigma_2 \vDash \varphi_2$. If $p_1 \parallel p_2$ then

$$t \to\_{p\_2 \mid \rho\_2 \mid \sigma\_2} t[r\_2 \sigma\_2]\_{p\_2} = u[r\_1 \sigma\_1]\_{p\_1} \leftarrow\_{p\_1 \mid \rho\_1 \mid \sigma\_1} u$$

Hence both $t \to^{*} \cdot \mathrel{{}^{=}{\leftarrow}} u$ and $t \to^{=} \cdot \mathrel{{}^{*}{\leftarrow}} u$. If $p_1$ and $p_2$ are not parallel then $p_1 \leqslant p_2$ or $p_2 < p_1$. Without loss of generality, we consider $p_1 \leqslant p_2$. Let $q = p_2 \backslash p_1$. We do a case analysis on whether or not $q \in \mathcal{P}\mathsf{os}_{\mathcal{F}}(\ell_1)$.

– First suppose $q \notin \mathcal{P}\mathsf{os}_{\mathcal{F}}(\ell_1)$. Let $q = q_1 q_2$ such that $q_1 \in \mathcal{P}\mathsf{os}_{\mathcal{V}}(\ell_1)$ and let $x$ be the variable in $\ell_1$ at position $q_1$. We have $\ell_2\sigma_2 = x\sigma_1|_{q_2}$ and thus $\sigma_1(x) \notin \mathcal{V}\mathsf{al}$. Define the substitution $\sigma_1'$ as follows:

$$
\sigma\_1'(y) = \begin{cases}
x \sigma\_1 [r\_2 \sigma\_2]\_{q\_2} & \text{if } y = x \\
\sigma\_1(y) & \text{otherwise}
\end{cases}
$$

We show $t \to^{=} s[r_1\sigma_1']_{p_1} \leftarrow u$, which yields $t \to^{*} \cdot \mathrel{{}^{=}{\leftarrow}} u$ and $t \to^{=} \cdot \mathrel{{}^{*}{\leftarrow}} u$. Since $\mathcal{R}$ is left-linear, $\ell_1\sigma_1' = \ell_1\sigma_1[x\sigma_1']_{q_1} = \ell_1\sigma_1[x\sigma_1[r_2\sigma_2]_{q_2}]_{q_1} = \ell_1\sigma_1[r_2\sigma_2]_q$ and thus $u = s[r_2\sigma_2]_{p_2} = s[\ell_1\sigma_1[r_2\sigma_2]_q]_{p_1} = s[\ell_1\sigma_1']_{p_1}$. If we can show $\sigma_1' \vDash \rho_1$ then $u \to s[r_1\sigma_1']_{p_1}$. Consider an arbitrary variable $y \in \mathcal{LV}\mathsf{ar}(\rho_1)$. If $y \neq x$ then $\sigma_1'(y) = \sigma_1(y) \in \mathcal{V}\mathsf{al}$ since $\sigma_1 \vDash \rho_1$. If $y = x$ then $x \in \mathcal{V}\mathsf{ar}(\varphi_1)$ since $x \in \mathcal{V}\mathsf{ar}(\ell_1)$. However, this contradicts $\sigma_1 \vDash \rho_1$ as $\sigma_1(x) \notin \mathcal{V}\mathsf{al}$. So $\sigma_1'(y) = \sigma_1(y)$ for all $y \in \mathcal{LV}\mathsf{ar}(\rho_1)$ and thus $\sigma_1' \vDash \rho_1$ is an immediate consequence of $\sigma_1 \vDash \rho_1$. It remains to show $t \to^{=} s[r_1\sigma_1']_{p_1}$. If $x \notin \mathcal{V}\mathsf{ar}(r_1)$ then $r_1\sigma_1' = r_1\sigma_1$ and thus $t = s[r_1\sigma_1']_{p_1}$. If $x \in \mathcal{V}\mathsf{ar}(r_1)$ then there exists a unique position $q' \in \mathcal{P}\mathsf{os}_{\mathcal{V}}(r_1)$ such that $r_1|_{q'} = x$, due to the right-linearity of $\mathcal{R}$. Hence $r_1\sigma_1' = r_1\sigma_1[x\sigma_1[r_2\sigma_2]_{q_2}]_{q'} = r_1\sigma_1[r_2\sigma_2]_{q'q_2}$. Since $r_1\sigma_1|_{q'q_2} = \ell_2\sigma_2$ we obtain $t = s[r_1\sigma_1]_{p_1} \to_{p_1 q' q_2 \mid \rho_2 \mid \sigma_2} s[r_1\sigma_1']_{p_1}$ as desired.

– Next suppose $q \in \mathcal{P}\mathsf{os}_{\mathcal{F}}(\ell_1)$. The substitution $\sigma' = \sigma_1 \cup \sigma_2$ satisfies $\ell_1|_q\sigma' = \ell_1|_q\sigma_1 = \ell_2\sigma_2 = \ell_2\sigma'$ and thus is a unifier of $\ell_1|_q$ and $\ell_2$. Since $\sigma_1 \vDash \rho_1$ and $\sigma_2 \vDash \rho_2$, $\sigma'(x) \in \mathcal{V}\mathsf{al}$ for all $x \in \mathcal{LV}\mathsf{ar}(\rho_1) \cup \mathcal{LV}\mathsf{ar}(\rho_2)$. Let $\sigma$ be an mgu of $\ell_1|_q$ and $\ell_2$. Since $\sigma$ is at least as general as $\sigma'$, $\sigma(x) \in \mathcal{V}\mathsf{al} \cup \mathcal{V}$ for all $x \in \mathcal{LV}\mathsf{ar}(\rho_1) \cup \mathcal{LV}\mathsf{ar}(\rho_2)$. Since $\varphi_1\sigma' = \varphi_1\sigma_1$ and $\varphi_2\sigma' = \varphi_2\sigma_2$ are valid, $\varphi_1\sigma \wedge \varphi_2\sigma$ is satisfiable. Hence conditions 1, 2, 3 and 4 in Definition 2 hold for the triple $\langle \rho_2, q, \rho_1 \rangle$. If condition 5 is *not* fulfilled then $q = \epsilon$ (and thus $p_1 = p_2$), $\rho_2$ and $\rho_1$ are variants, and $\mathcal{V}\mathsf{ar}(r_2) \subseteq \mathcal{V}\mathsf{ar}(\ell_2)$ (and thus also $\mathcal{V}\mathsf{ar}(r_1) \subseteq \mathcal{V}\mathsf{ar}(\ell_1)$). Hence $\ell_1\sigma_1 = \ell_2\sigma_2$ and $r_1\sigma_1 = r_2\sigma_2$, and thus $t = u$. In the remaining case condition 5 holds and hence $\langle \rho_2, q, \rho_1 \rangle$ is an overlap. By definition, $\ell_1\sigma[r_2\sigma]_q \approx r_1\sigma\ [\varphi_2\sigma \wedge \varphi_1\sigma \wedge \psi\sigma]$ with

$$\psi = \bigwedge \{ x = x \mid x \in \mathcal{EVar}(\rho\_1) \cup \mathcal{EVar}(\rho\_2) \} $$

is a critical pair. To simplify the notation, we abbreviate $\ell_1\sigma[r_2\sigma]_q$ to $s'$, $r_1\sigma$ to $t'$, and $\varphi_2\sigma \wedge \varphi_1\sigma \wedge \psi\sigma$ to $\varphi'$. Critical pairs are strongly closed by assumption, and thus both

1. $s' \approx t'\ [\varphi'] \stackrel{\sim}{\to}{}^{*}_{\geqslant 1} \cdot \stackrel{\sim}{\to}{}^{=}_{\geqslant 2}\ u \approx v\ [\psi']$ for some trivial $u \approx v\ [\psi']$, and
2. $s' \approx t'\ [\varphi'] \stackrel{\sim}{\to}{}^{*}_{\geqslant 2} \cdot \stackrel{\sim}{\to}{}^{=}_{\geqslant 1}\ u \approx v\ [\psi']$ for some trivial $u \approx v\ [\psi']$.

Let $\gamma$ be the substitution such that $\sigma\gamma = \sigma'$. We claim that $\gamma$ respects $\varphi'$. So let $x \in \mathcal{V}\mathsf{ar}(\varphi') = \mathcal{V}\mathsf{ar}(\varphi_2\sigma \wedge \varphi_1\sigma \wedge \psi\sigma)$. We have

$$
\mathcal{L}\mathcal{V}\mathsf{ar}(\rho\_1) = \mathcal{V}\mathsf{ar}(\varphi\_1) \cup \mathcal{E}\mathcal{V}\mathsf{ar}(\rho\_1) \qquad \mathcal{L}\mathcal{V}\mathsf{ar}(\rho\_2) = \mathcal{V}\mathsf{ar}(\varphi\_2) \cup \mathcal{E}\mathcal{V}\mathsf{ar}(\rho\_2)
$$

Together with $\mathcal{V}\mathsf{ar}(\psi) = \mathcal{EV}\mathsf{ar}(\rho_1) \cup \mathcal{EV}\mathsf{ar}(\rho_2)$ we obtain

$$\mathcal{L}\mathcal{V}\text{ar}(\rho\_1) \cup \mathcal{L}\mathcal{V}\text{ar}(\rho\_2) = \mathcal{V}\text{ar}(\varphi\_1) \cup \mathcal{V}\text{ar}(\varphi\_2) \cup \mathcal{V}\text{ar}(\psi)$$

Since $\sigma'(x) \in \mathcal{V}\mathsf{al}$ for all $x \in \mathcal{LV}\mathsf{ar}(\rho_1) \cup \mathcal{LV}\mathsf{ar}(\rho_2)$, we obtain $\gamma(x) \in \mathcal{V}\mathsf{al}$ for all $x \in \mathcal{V}\mathsf{ar}(\varphi')$ and thus $\gamma \vDash \varphi'$. At this point repeated applications of Lemma 2 to the constrained rewrite sequence in item 1 yield a substitution $\delta$ respecting $\psi'$ such that $s'\gamma \to^{*} u\delta$ and $t'\gamma \to^{=} v\delta$. Since $u \approx v\ [\psi']$ is trivial, $u\delta = v\delta$ and hence $s'\gamma \to^{*} \cdot \mathrel{{}^{=}{\leftarrow}} t'\gamma$. Likewise, $s'\gamma \to^{=} \cdot \mathrel{{}^{*}{\leftarrow}} t'\gamma$ is obtained from item 2. We have

$$s'\gamma = (\ell\_1 \sigma [r\_2 \sigma]\_q)\gamma = \ell\_1 \sigma' [r\_2 \sigma']\_q = \ell\_1 \sigma\_1 [r\_2 \sigma\_2]\_q \qquad t'\gamma = r\_1 \sigma' = r\_1 \sigma\_1$$

Moreover, t = s[r₁σ₁]<sub>p₁</sub> = s[t′γ]<sub>p₁</sub> and u = s[ℓ₁σ₁[r₂σ₂]<sub>q</sub>]<sub>p₁</sub> = s[s′γ]<sub>p₁</sub>. Since rewriting is closed under contexts, we obtain u →<sup>∗</sup> · <sup>=</sup>← t and u →<sup>=</sup> · <sup>∗</sup>← t. This completes the proof.

*Example 10.* Consider the LCTRS R of Example 1 and its critical pairs in Example 4. The critical pair

$$x \approx \mathsf{max}(y, x) \; [x \ge y]$$

is not trivial, so Theorem 1 is not applicable and the rule max(x, y) → max(y, x) precludes the use of Corollary 1 to infer confluence. We do have

$$x \approx \mathsf{max}(y, x) \; [x \ge y] \xrightarrow{\geqslant 2} x \approx x \; [x \ge y]$$

by applying the rule max(x, y) → y [y ≥ x] and the resulting constrained equation x ≈ x [x ≥ y] is obviously trivial. The same reasoning applies to the critical pair y ≈ max(y, x) [y ≥ x]. The first critical pair x ≈ y [x ≥ y ∧ y ≥ x] in Example 4 is trivial since any (value) substitution satisfying its constraint x ≥ y ∧ y ≥ x equates x and y. By symmetry, all critical pairs of R are strongly closed. Since R is linear, confluence follows from Theorem 2.

The second main result is the extension of Huet's parallel closedness condition on critical pairs in left-linear TRSs [4] to LCTRSs. To this end, we first define parallel rewriting for LCTRSs.

Definition 7. *Let* R *be an LCTRS. The relation* $\twoheadrightarrow\_{\mathcal{R}}$ *is defined on terms inductively as follows:*


We write $\twoheadrightarrow\_{\geqslant p}$ to indicate that all positions of contracted redexes in the parallel step are below p. In the next definition we add constraints to parallel rewriting.

Definition 8. *Let* R *be an LCTRS. The relation* $\twoheadrightarrow\_{\mathcal{R}}$ *is defined on constrained terms inductively as follows:*


*Here we assume that different applications of case 4 result in different fresh variables. The constraint* ψ *in case 2 collects the assignments introduced in earlier applications of case 4. (If there are none,* ψ = true *is omitted.) The same holds for* ψ₁, ..., ψₙ*. We write* $\stackrel{\sim}{\twoheadrightarrow}$ *for the relation* $\sim \cdot \twoheadrightarrow\_{\mathcal{R}} \cdot \sim$*.*

In light of the earlier developments, the following definition is the obvious adaptation of parallel closedness for LCTRSs.

Definition 9. *A critical pair* s ≈ t [ϕ] *is* parallel closed *if*

$$s \approx t\ [\varphi] \;\stackrel{\sim}{\twoheadrightarrow}\_{\geqslant 1}\; u \approx v\ [\psi]$$

*for some trivial* u ≈ v [ψ]*.*

Note that the right-hand side t of the constrained equation s ≈ t [ϕ] may change due to the equivalence relation ∼, cf. the statement of Lemma 2.

Theorem 3. *A left-linear LCTRS is confluent if its critical pairs are parallel closed.*

To prove this result, we adapted the formalized proof presented in [10] to the constrained setting. The required changes are very similar to the ones in the proof of Theorem 2.

*Example 11.* Consider the LCTRS R with rules

$$\begin{aligned} \mathsf{f}(x,y) &\to \mathsf{g}(\mathsf{a},y+y) \ [y \ge x \land y=1] \end{aligned} \qquad \begin{aligned} \mathsf{a}\to\mathsf{b} \\ \mathsf{g}\to\mathsf{b} \\ \end{aligned}$$

The single critical pair h(g(a, y + y)) ≈ h(g(b, 2)) [y ≥ x ∧ y = 1 ∧ x ≥ y] is parallel closed:

$$\begin{aligned} \mathsf{h}(\mathsf{g}(\mathsf{a}, y+y)) &\approx \mathsf{h}(\mathsf{g}(\mathsf{b}, \mathsf{2})) \ [y \geq x \land y = 1 \land x \geq y] \\ &\twoheadrightarrow\_{\geqslant 1} \mathsf{h}(\mathsf{g}(\mathsf{b}, z)) \approx \mathsf{h}(\mathsf{g}(\mathsf{b}, \mathsf{2})) \ [y \geq x \land y = 1 \land x \geq y \land z = y + y] \end{aligned}$$

and the obtained equation is trivial. Hence R is confluent by Theorem 3. Note that the earlier confluence criteria do not apply.

We also consider the extension of Huet's result by Toyama [11], which has a less restricted joinability condition on critical pairs stemming from overlapping rules at the root position. Such critical pairs are called *overlays* whereas critical pairs originating from overlaps ⟨ρ₁, p, ρ₂⟩ with p > ε are called *inner* critical pairs.

Definition 10. *An LCTRS* R *is almost parallel-closed if every inner critical pair is parallel closed and every overlay* s ≈ t [ϕ] *satisfies*

$$s \approx t\ [\varphi] \;\stackrel{\sim}{\twoheadrightarrow}\_{\geqslant 1} \cdot \stackrel{\sim}{\to}{}^{\*}\_{\geqslant 2}\; u \approx v\ [\psi]$$

*for some trivial* u ≈ v [ψ]*.*

Theorem 4. *Left-linear almost parallel-closed LCTRSs are confluent.*

Again, the formalized proof of the corresponding result for plain TRSs in [10] can be adapted to the constrained setting.

*Example 12.* Consider the following variation of the LCTRS R in Example 11:

$$\begin{aligned} \mathsf{f}(x,y) &\to \mathsf{g}(\mathsf{a}, y+y) \ [y \ge x \land y=1] \end{aligned} \qquad \begin{aligned} \mathsf{a} &\to \mathsf{b} \\ \mathsf{g}(x,y) &\to \mathsf{g}(\mathsf{b}, 2) \end{aligned}$$

The overlay g(b, 2) ≈ g(a, y + y) [x ≥ y ∧ y ≥ x ∧ y = 1] is not parallel closed but one readily confirms that the condition in Definition 10 applies.

### 5 Automation

As it is very inconvenient and tedious to test by hand if an LCTRS satisfies one of the confluence criteria presented in the preceding sections, we provide an implementation. The natural choice would be to extend the existing tool Ctrl [9] because it is currently the only tool capable of analyzing confluence of LCTRSs. However, Ctrl is not actively maintained and not very well documented, so we decided to develop a new tool for the analysis of LCTRSs. Our tool is called crest (constrained rewriting software). It is written in Haskell, based on the Haskell term-rewriting<sup>3</sup> library and allows the logics QF\_LIA, QF\_NIA, QF\_LRA.

The input format of crest is described on its website.<sup>4</sup> After parsing the input, crest checks that the resulting LCTRS is well-typed. Missing sort information is inferred. Next it is checked concurrently whether one of the implemented confluence criteria applies. crest supports (weak) orthogonality, strong closedness and (almost) parallel closedness. The tool outputs the computed critical pairs and a "proof" describing how these are closed, based on the first criterion that reports a YES result. Below we describe some of the challenges that one faces when automating the confluence criteria presented in the preceding sections.

First of all, how can we determine whether a constrained critical pair or more generally a constrained equation s ≈ t [ϕ] is trivial? The following result explains how this can be solved by an SMT solver.

Definition 11. *Given a constrained equation* s ≈ t [ϕ]*, the formula* T(s, t, ϕ) *is inductively defined as follows:*

$$T(s, t, \varphi) = \begin{cases} \text{true} & \text{if } s = t \\ s = t & \text{if } s, t \in \mathcal{V}\mathsf{al} \cup \mathcal{V}\mathsf{ar}(\varphi) \\ \bigwedge\_{i=1}^{n} T(s\_i, t\_i, \varphi) & \text{if } s = f(s\_1, \dots, s\_n) \text{ and } t = f(t\_1, \dots, t\_n) \\ \text{false} & \text{otherwise} \end{cases}$$

Lemma 3. *A constrained equation* s ≈ t [ϕ] *is trivial if and only if the formula* ϕ =⇒ T(s, t, ϕ) *is valid.*

*Proof.* First suppose ϕ =⇒ T(s, t, ϕ) is valid. Let σ be a substitution with σ ⊨ ϕ. Since σ(x) ∈ Val for all x ∈ Var(ϕ), we can apply σ to the formula ϕ =⇒ T(s, t, ϕ). We obtain [[ϕσ]] = ⊤ from σ ⊨ ϕ. Hence also [[T(s, t, ϕ)σ]] = ⊤. Since T(s, t, ϕ) is a conjunction, the final case in the definition of T(s, t, ϕ) is not used. Hence Pos(s) = Pos(t), s(p) = t(p) for all internal positions p in s and t, and s|<sub>p</sub>σ = t|<sub>p</sub>σ for all leaf positions p in s and t. Consequently, sσ = tσ. This concludes the triviality proof of s ≈ t [ϕ].

For the only if direction, suppose s ≈ t [ϕ] is trivial. Note that the variables appearing in the formula ϕ =⇒ T(s, t, ϕ) are those of ϕ. Let σ be an arbitrary assignment such that [[ϕσ]] = ⊤. We need to show [[T(s, t, ϕ)σ]] = ⊤. We can view σ as a substitution with σ(x) ∈ Val for all x ∈ Var(ϕ). We have σ ⊨ ϕ and thus sσ = tσ by the triviality of s ≈ t [ϕ]. Hence T(s, t, ϕ) is a conjunction of equations between values and variables in ϕ, which are turned into identities by σ. Hence [[T(s, t, ϕ)σ]] = ⊤ as desired.

<sup>3</sup> https://hackage.haskell.org/package/term-rewriting-0.4.0.2.

<sup>4</sup> http://cl-informatik.uibk.ac.at/software/crest/.
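The construction of Definition 11 and the validity test of Lemma 3 can be sketched as follows. This is only an illustration in Python, not the implementation used in crest: the term encoding (strings for variables, integers for values, tuples for function applications) and the names `T` and `trivial` are assumptions made here, and the brute-force loop over a small integer range merely stands in for the SMT query, so a positive answer is evidence rather than a proof.

```python
from itertools import product

def is_leaf(t, cvars):
    """Values (ints) and variables of the constraint are compared with '='."""
    return isinstance(t, int) or (isinstance(t, str) and t in cvars)

def T(s, t, cvars):
    """T(s, t, phi) as a list of equations (a conjunction); None encodes 'false'."""
    if s == t:
        return []                          # 'true' is the empty conjunction
    if is_leaf(s, cvars) and is_leaf(t, cvars):
        return [(s, t)]                    # the single equation s = t
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        eqs = []
        for si, ti in zip(s[1:], t[1:]):
            sub = T(si, ti, cvars)
            if sub is None:
                return None
            eqs.extend(sub)
        return eqs
    return None                            # 'false'

def trivial(s, t, cvars, phi, domain=range(-3, 4)):
    """Check phi => T(s, t, phi) on a finite domain (a stand-in for an SMT solver)."""
    eqs = T(s, t, cvars)
    names = sorted(cvars)
    for values in product(domain, repeat=len(names)):
        env = dict(zip(names, values))
        if not phi(env):
            continue                       # implication holds vacuously here
        if eqs is None:
            return False
        val = lambda a: a if isinstance(a, int) else env[a]
        if any(val(a) != val(b) for a, b in eqs):
            return False
    return True

# Critical pairs of Example 10: x ~ y [x >= y and y >= x] and x ~ max(y, x) [x >= y].
cp1 = trivial("x", "y", {"x", "y"}, lambda e: e["x"] >= e["y"] and e["y"] >= e["x"])
cp2 = trivial("x", ("max", "y", "x"), {"x", "y"}, lambda e: e["x"] >= e["y"])
```

On these inputs the first critical pair is reported as trivial and the second is not, in line with the discussion in Example 10.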

The second challenge is how to implement rewriting on constrained equations, in particular how to deal with the equivalence relation ∼ defined in Definition 3.

*Example 13.* The LCTRS R

$$\mathsf{f}(x) \to z\ [z = \mathsf{3}] \qquad \mathsf{g}(\mathsf{f}(x)) \to \mathsf{a} \qquad \mathsf{g}(\mathsf{3}) \to \mathsf{a}$$

over the integers admits two critical pairs:

$$z \approx z'\text{ [}z = \mathbf{3} \land z' = \mathbf{3}\text{]} \qquad \qquad \qquad \mathbf{g}(z) \approx \mathbf{a}\text{ [}z = \mathbf{3}\text{]}$$

The first one is trivial, but to join the second one, an initial equivalence step is required:

$$\mathbf{g}(z) \approx \mathbf{a}\ [z=\mathbf{3}] \sim \mathbf{g}(\mathbf{3}) \approx \mathbf{a}\ [z=\mathbf{3}] \to \mathbf{a} \approx \mathbf{a}\ [z=\mathbf{3}]$$

The transformation introduced below avoids having to look for an initial equivalence step before a rule becomes applicable.

Definition 12. *Let* R *be an LCTRS. Given a term* t ∈ T (F, V)*, we replace values in* t *by fresh variables and return the modified term together with the constraint that collects the bindings:*

$$\mathbf{tf}(t) = \begin{cases} (t, \mathtt{true}) & \text{if } t \in \mathcal{V} \\ (z, z = t) & \text{if } t \in \mathcal{V} \mathtt{al} \text{ and } z \text{ is a fresh variable} \\ (f(s\_1, \ldots, s\_n), \varphi\_1 \wedge \cdots \wedge \varphi\_n) & \text{if } t = f(t\_1, \ldots, t\_n) \text{and } \mathtt{tf}(t\_i) = (s\_i, \varphi\_i) \end{cases}$$

*Applying the transformation* tf *to the left-hand sides of the rules in* R *produces*

$$\mathsf{tf}(\mathcal{R}) = \{ \ell' \to r\ [\varphi \wedge \psi] \mid \ell \to r\ [\varphi] \in \mathcal{R} \text{ and } \mathsf{tf}(\ell) = (\ell', \psi) \}$$

*Example 14.* Applying the transformation tf to the LCTRS R of Example 13 produces the rules

$$\mathsf{f}(x) \to z\ [z = \mathsf{3}] \qquad \mathsf{g}(\mathsf{f}(x)) \to \mathsf{a} \qquad \mathsf{g}(z) \to \mathsf{a}\ [z = \mathsf{3}]$$

The critical pair g(z) ≈ a [z = 3] can now be joined by an application of the modified third rule. Note that the modified rule does not overlap with the second rule because z may not be instantiated with f(x). Hence the modified LCTRS tf(R) is strongly closed and, because it is linear, also confluent.
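A minimal sketch of the transformation tf of Definition 12, applied to the rules of Example 13, may help to see how the bindings are collected. The encoding is an assumption made for illustration (strings for variables, integers for values, tuples for function applications, constraints as lists of conjuncts), as are the names `tf`, `tf_rule` and the fresh-name scheme `z1, z2, ...`, which is assumed not to clash with variables already in the rules.

```python
from itertools import count

def tf(t, fresh):
    """Replace values in t by fresh variables; return (new term, list of bindings)."""
    if isinstance(t, str):                 # variable: unchanged, constraint 'true'
        return t, []
    if isinstance(t, int):                 # value: fresh variable z with binding z = t
        z = next(fresh)
        return z, [(z, t)]
    f, *args = t                           # function application f(t1, ..., tn)
    new_args, bindings = [], []
    for a in args:
        s, phi = tf(a, fresh)
        new_args.append(s)
        bindings.extend(phi)
    return (f, *new_args), bindings

def tf_rule(lhs, rhs, constraint, fresh):
    """Transform only the left-hand side and extend the constraint with the bindings."""
    new_lhs, bindings = tf(lhs, fresh)
    return new_lhs, rhs, constraint + bindings

fresh = (f"z{i}" for i in count(1))        # hypothetical supply of fresh names
rules = [
    (("f", "x"), "z", [("z", 3)]),         # f(x) -> z [z = 3]
    (("g", ("f", "x")), ("a",), []),       # g(f(x)) -> a
    (("g", 3), ("a",), []),                # g(3) -> a
]
transformed = [tf_rule(l, r, c, fresh) for l, r, c in rules]
```

Only the third rule changes: its left-hand side becomes g(z1) and its constraint becomes z1 = 3, which corresponds to the rule g(z) → a [z = 3] of Example 14 up to the name of the fresh variable.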

In the following we show the correctness of the transformation. In particular we prove that the initial rewrite relation is preserved.


Table 1. Specific experimental results.

Lemma 4. *The relations* →<sub>R</sub> *and* →<sub>tf(R)</sub> *coincide on unconstrained terms.*

*Proof.* Consider s, t ∈ T(F, V). Since the transformation tf does not affect calculation steps, it suffices to consider rule steps. First assume s = C[ℓσ] →<sub>ru</sub> C[rσ] = t by applying the rule ℓ → r [ϕ] ∈ R and let ℓ′ → r [ϕ′] ∈ tf(R) be its transformation. So tf(ℓ) = (ℓ′, ψ) and ϕ′ = ϕ ∧ ψ. Define the substitution

$$\sigma' = \{ \ell' |\_p \mapsto \ell |\_p \mid (\ell', \psi) = \mathtt{tf}(\ell), p \in \mathcal{Pos}(\ell) \text{ and } \ell |\_p \in \mathcal{Val} \}$$

and let τ = σ ∪ σ′. Since Dom(σ) ∩ Dom(σ′) = ∅ by construction, τ is well-defined. From σ ⊨ ℓ → r [ϕ] and σ′ ⊨ ψ we immediately obtain τ ⊨ ℓ′ → r [ϕ′], which yields s = C[ℓ′τ] →<sub>ru</sub> C[rτ] = t in tf(R).

For the other direction consider s = C[ℓ′σ] →<sub>ru</sub> C[rσ] = t by applying the rule ℓ′ → r [ϕ′] ∈ tf(R). The difference between ℓ′ and its originating left-hand side ℓ in R is that value positions in ℓ are occupied by fresh variables in ℓ′. Because σ respects ϕ′ = ϕ ∧ ψ, σ substitutes the required values at these positions in ℓ′. As σ ⊨ ℓ′ → r [ϕ′], there exists a rule ℓ → r [ϕ] which is respected by σ and thus s = C[ℓσ] →<sub>ru</sub> C[rσ] = t in R.

As the transformation is used in the implementation and rewriting on constrained terms plays a key role, the following result is needed. The proof is similar to the first half of the proof of Lemma 4 and omitted.

Lemma 5. *The inclusion* →<sub>R</sub> ⊆ →<sub>tf(R)</sub> *holds on constrained terms.*

### 6 Experimental Results

In order to evaluate our tool we performed some experiments. As there is no official database of interesting confluence problems for LCTRSs, we collected several LCTRSs from the literature and the repository of Ctrl. The problem files in the latter that contain an equivalence problem of two functions for rewriting induction were split into two separate files. The experiments were performed on an AMD Ryzen 7 PRO 4750U CPU with a base clock speed of 1.7 GHz, 8 cores and 32 GB of RAM. The full set of benchmarks consists of 127 problems of which crest can prove 90 confluent, 11 result in MAYBE and 26 in a timeout. With a timeout of 5 s crest needs 141.09 s to analyze the set of benchmarks. We have tested the implementation with 3 well-known SMT solvers: Z3, Yices and CVC5. Among those Z3 gives the best performance regarding time and the handling of non-linear arithmetic. Hence we use Z3 as the default SMT solver in our implementation. In Table 1 we list some interesting systems from this paper and the relevant literature. Full details are available from the website of crest. We choose 5 as the maximum number of steps in the →<sup>∗</sup> parts of the strongly closed and almost parallel closed criteria.

Table 2. Comparison between confluence criteria implemented in crest.

From Table 2 the relative power of each implemented confluence criterion on our benchmark can be inferred, i.e., it depicts how many of the 127 problems both methods can prove confluent. This illustrates that the relative applicability in theory (e.g., weakly orthogonal LCTRSs are parallel closed), is preserved in our implementation. We conclude this section with an interesting observation discovered by crest when testing [12, Example 23].

We also tested the applicability of Corollary 1, using the tool Ctrl as a black box for proving termination. Of the 127 problems, Ctrl claims 102 to be terminating and 67 of those can be shown locally confluent by crest, where we limit the number of steps in the joining sequence to 100. It is interesting to note that all of these problems are orthogonal, and so proving termination and finding a joining sequence is not necessary to conclude confluence, on the current set of problems. Of the remaining 35 problems, crest can show confluence of 5 of these by almost parallel closedness.

*Example 15.* The LCTRS R is obtained by completing a system consisting of four constrained equations:

$$\begin{array}{ll} 1. & \mathsf{f}(x, y) \to \mathsf{f}(z, y) + \mathsf{1}\ [x \ge \mathsf{1} \land z = x - \mathsf{1}] \\ 2. & \mathsf{f}(x, \mathsf{0}) \to \mathsf{g}(\mathsf{1}, x)\ [x \le \mathsf{1}] \end{array}$$


Calling crest on R results in a timeout. As a matter of fact, the LCTRS is not confluent because the critical pair

$$\mathbf{g}(1,x) + \mathbf{1} \approx \mathbf{f}(x-1, \mathbf{0}) + \mathbf{2} \left[ x \le \mathbf{1} \land x \ge \mathbf{1} \right]$$

between rules 5 and 6 is not joinable. Inspecting the steps in [12, Example 23] reveals some incorrect applications of the inference rules of constrained completion, which causes rule 6 to be wrong. Replacing it with the correct rule

$$6'.\quad \mathsf{h}(x) \rightarrow (\mathsf{f}(z, \mathsf{0}) + 1) + 1 \ [x > 1 \land z = x - 1]$$

causes crest to report confluence by strong closedness.

### 7 Concluding Remarks

In this paper we presented new confluence criteria for LCTRSs as well as a new tool in which these criteria have been implemented. We clarified the subtleties that arise when analyzing joinability of critical pairs in LCTRSs and reported experimental results.

For plain rewrite systems many more confluence criteria are known and implemented in powerful tools that compete in the yearly Confluence Competition (CoCo).<sup>5</sup> In the near future we will investigate which of these can be lifted to LCTRSs. We will also advance the creation of a competition category on confluence of LCTRSs in CoCo.

Our tool crest currently has no support for termination. Implementing termination techniques in crest is of clear interest. The starting point here is the set of methods reported in [6,7,12]. Many LCTRSs coming from applications are actually non-confluent.<sup>6</sup> So developing more powerful techniques for LCTRSs is on our agenda as well.

Acknowledgments. We thank Fabian Mitterwallner for valuable discussions on the presented topics and our Haskell implementation. The detailed comments by the reviewers improved the presentation. Cynthia Kop and Deivid Vale kindly provided us with instructions and a working implementation of Ctrl.

### References


<sup>5</sup> http://project-coco.uibk.ac.at/.

<sup>6</sup> Naoki Nishida, personal communication (February 2023).



# **Towards a Verified Tableau Prover for a Quantifier-Free Fragment of Set Theory**

Lukas Stevens

Technical University of Munich, Boltzmannstr. 3, 85748 Garching, Germany lukas.stevens@in.tum.de

**Abstract.** Using Isabelle/HOL, we verify the state-of-the-art decision procedure for multi-level syllogistic with singleton (**MLSS** for short), which is a quantifier-free fragment of set theory. We formalise its syntax and semantics as well as a sound and complete tableau calculus for it. We also provide an executable specification of a decision procedure that exhaustively applies the rules of the calculus and prove its termination. Furthermore, we extend the calculus with a lightweight type system that paves the way for an integration of the procedure into Isabelle/HOL.

**Keywords:** Decision procedures · Semantic tableaux · Interactive theorem proving · Set theory

### **1 Introduction**

In Isabelle/HOL, there are specialised procedures for dealing with e.g. natural numbers, linear arithmetic, and metric spaces. Some of these procedures have been verified in Isabelle/HOL, such as a procedure for Presburger arithmetic [12] that was later extended to mixed real-integer arithmetic [11]. This procedure, though, uses reflection to work on goals in Isabelle/HOL, which, during execution, either sacrifices speed by going through the simplifier or requires trusting the code generator. More recently, Stevens and Nipkow [25] presented a verified decision procedure for orders that produces certificates. This approach offers efficient execution by using generated code as well as soundness because the certificates are replayed through Isabelle's inference kernel.

This paper focuses on another ubiquitous structure in mathematics, namely sets. To the best of our knowledge, we present the first formally verified decision procedure for (a fragment of) set theory. In particular, we consider a quantifier-free fragment which Cantone and Zarba [9] call multi-level syllogistic with singleton (**MLSS**). The fragment includes the usual set operations of union, intersection, difference, membership, equality and, in addition, it allows the construction of singleton sets.

Since **MLSS** admits a tableau calculus, generating certificates will be straightforward. Like with the aforementioned order solver, this paves the way for an integration of the decision procedure into Isabelle, adding to its growing body of verified decision procedures.

#### **1.1 Contributions**

We present a formalisation in Isabelle/HOL of a tableau calculus for **MLSS** due to Cantone and Zarba [9] [7, Chapter 14]. We prove soundness and completeness of the calculus and give an abstract specification of a decision procedure that exhaustively applies the rules of the calculus. To obtain total correctness of the procedure, we prove its termination. Additionally, we naively refine the abstract specification to an executable specification from which we can generate code. The formalisation initially follows the paper but offers a more thorough account of some important details:


In the context of Isabelle/HOL, there is one crucial aspect that requires us to modify the calculus in the paper: the calculus works under the assumption that every variable is a set; however, this is not the case in Isabelle/HOL, e.g. consider the expression n ∈ A where n is a natural number. We call these variables urelements. To deal with them, we extend the calculus with a lightweight type system and a verified inference algorithm that identifies the urelements.

The modification of the calculus required non-trivial changes to the completeness proof. Here, the formalisation was instrumental because Isabelle immediately revealed which proofs had been broken. This illustrates the usefulness of ITPs for developing logic calculi: they allow us to confidently make modifications without compromising correctness.

All in all, the formalisation amounts to over 6000 lines of theory. It is part of the *Archive of Formal Proofs* (AFP) [24]. The entry provides an overview theory MLSS\_Proc\_All.thy that highlights the (mostly syntactic) differences between paper and formalisation and references the constants and theorems that are introduced in this paper.

#### **1.2 Related Work**

Since the literature on decidable fragments of set theory is vast, we only focus on **MLSS** here. Ferro et al. [14] were the first to show the decidability of the fragment. Subsequent work [6] found the decision problem to be **NP**-complete. To obtain a practical decision procedure, Cantone [4] proposed a tableau calculus, which was later improved by Beckert and Hartmer [1]. Both of these procedures construct a model during execution that guides the proof search. Beckert and Hartmer also cover an extension of the calculus with uninterpreted functions, which Cantone and Zarba [10] later revisited while avoiding the construction of a model during execution. In this paper, we consider a version of the latter procedure due to Cantone and Zarba [9] that is specialised to **MLSS** and where the branching rules of the calculus are set up to guarantee the mutual exclusivity of the branches. Later extensions of the calculus added certain interpreted functions, such as monotone functions [8] and the inverse of a function [5]. The latter extension notably includes the Cartesian product. Those extensions, though, did not improve upon the tableau calculus for **MLSS**.

There is a large body of work at the intersection of ITPs and tableau methods, but to keep with this paper's theme we only consider formalisations of correctness here. For first-order logic, there are abstract completeness proofs using the *Beth-Hintikka style* of possibly infinite derivation trees [3] as well as the *Henkin style* of maximally consistent sets [17]. Both are abstract enough to be instantiated with a wide range of concrete calculi. A more concrete formalisation [19] verifies a sequent calculus for first-order logic whose completeness proof is via a translation to semantic tableau.

Beyond completeness, we target decidability, which is more attainable for propositional logic. There is a verified tableau calculus for the modal logic S5 [2] in Lean and one for hybrid logic [18] in Isabelle/HOL. Both of these do not prove termination but there is a formalisation of a tableau calculus for the temporal logic CTL in Coq [13] that does.

#### **1.3 Notation**

Isabelle/HOL [21] conforms to everyday mathematical notation for the most part. We establish notation and in particular some essential data types together with their primitive operations that are specific to Isabelle/HOL.

We write t :: 'a to specify that the term t has the type 'a and 'a ⇒ 'b for the space of total functions from type 'a to type 'b.

Sets with elements of type 'a have the type 'a set. The cardinality of a set A is denoted by |A| and the image of A under f by f'A.

We use 'a list to describe the type of lists, which are constructed using the empty list [] constructor or the infix cons constructor #, and are appended with the infix operator @. The function set converts a list into a set.

We remark that ←→ is equivalent to = on the type of Booleans bool and ≡ is definitional equality of the meta-logic of Isabelle/HOL, which is called Isabelle/Pure. Meta-implication is denoted by =⇒ and a chain of implications A₁ =⇒ ··· =⇒ A<sub>k</sub> =⇒ C can be abbreviated by ⟦A₁; ...; A<sub>k</sub>⟧ =⇒ C.

#### **2 Syntax and Semantics of MLSS**

#### **2.1 Syntax**

At the heart of **MLSS**, we have the type of set terms, which is the disjoint union of the empty set and variables as well as the operations union, intersection, difference, and the singleton set represented by the constructor Single. We keep the type of variables abstract by making it a parameter of the set term data type. The only restriction on the type of variables is that it needs to be infinite. Isabelle/HOL's data type package automatically defines a function that gives us the set of variables in a set term, which we name vars. In what follows, we will overload the function vars to also work on set atoms, formulas, and branches.

```
datatype (vars: 'a) pset_term =
    ∅ | Var 'a | Single ('a pset_term)
  | 'a pset_term ⊔s 'a pset_term
  | 'a pset_term ⊓s 'a pset_term
  | 'a pset_term −s 'a pset_term
```
We can combine two set terms to form a set atom by using the membership or the equality operator.

```
datatype (vars: 'a) pset_atom =
    'a pset_term ∈s 'a pset_term
  | 'a pset_term =s 'a pset_term
```
With the above operators we can also represent the subset operator ⊑s and enumerate finite sets: s ⊑s t is equivalent to s ⊔s t =s t and a finite set of elements {t₁, ..., t<sub>k</sub>} can be expressed by Single t₁ ⊔s ... ⊔s Single t<sub>k</sub>.

We use the propositional fragment of formulas due to Nipkow [20] with set atoms as propositional atoms to form the quantifier-free fragment **MLSS** of set theory.

```
datatype (atoms: 'a) fm =
    A 'a
  | ¬ ('a fm)
  | 'a fm ∧ 'a fm
  | 'a fm ∨ 'a fm
```

```
type_synonym 'a pset_fm = 'a pset_atom fm
```
We will often drop the atom constructor A to reduce clutter. Additionally, we use s ∉s t and s ≠s t to denote ¬ A (s ∈s t) and ¬ A (s =s t), respectively.

Similarly to vars, we get the function atoms :: 'a fm ⇒ 'a set for free that retrieves all set atoms in a formula. We combine these functions to extract all the variables occurring in a set formula.

definition vars φ ≡ ⋃(vars ' atoms φ)

Likewise, we fix the constant subterms :: 'b ⇒ 'a pset\_term set that is polymorphic in its argument type 'b. We overload this constant to return the set terms that are subterms of a set term, set atom, or formula, respectively. Lastly, we introduce the function subfms :: 'a fm ⇒ 'a fm set that computes the subformulas of a formula. The functions subterms and subfms are implemented in the expected way.
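To make the "expected way" concrete, the following Python sketch mirrors the functions vars, atoms, subterms and subfms on a small tuple encoding of set terms, atoms and formulas. The encoding and all function names are assumptions chosen for illustration; the actual definitions are the Isabelle/HOL ones in the formalisation.

```python
# Hypothetical encoding:
#   set terms : ("empty",), ("var", x), ("single", t), ("union"|"inter"|"diff", s, t)
#   set atoms : ("mem", s, t), ("eq", s, t)
#   formulas  : ("atom", a), ("not", f), ("and", f, g), ("or", f, g)

def subterms_term(t):
    """All set terms occurring in the set term t, including t itself."""
    result = {t}
    if t[0] in ("single", "union", "inter", "diff"):
        for arg in t[1:]:
            result |= subterms_term(arg)
    return result

def atoms(phi):
    """All set atoms occurring in the formula phi."""
    if phi[0] == "atom":
        return {phi[1]}
    result = set()
    for sub in phi[1:]:
        result |= atoms(sub)
    return result

def subterms_fm(phi):
    """Set terms that are subterms of some atom of phi."""
    result = set()
    for _, s, t in atoms(phi):
        result |= subterms_term(s) | subterms_term(t)
    return result

def vars_fm(phi):
    """Variables of phi: the union of the variables of its atoms."""
    return {t[1] for t in subterms_fm(phi) if t[0] == "var"}

def subfms(phi):
    """All subformulas of phi, including phi itself."""
    result = {phi}
    if phi[0] != "atom":
        for sub in phi[1:]:
            result |= subfms(sub)
    return result

x, y = ("var", "x"), ("var", "y")
phi = ("and",
       ("atom", ("mem", x, ("union", x, y))),
       ("not", ("atom", ("eq", y, ("empty",)))))
```

For this `phi`, the variables are x and y, the subterms are x, y, the union of x and y, and the empty set, and there are four subformulas.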

#### **2.2 Semantics**

The original paper [9] bases the semantics of **MLSS** on the von Neumann hierarchy of sets V. We instead use the hierarchy of *hereditarily finite sets* (HF sets) which fulfil all the same axioms as V – that is, the axioms of ZF – except for the axiom of infinity. In particular, the membership relation is well-founded. The HF sets, as we will see, are sufficient to construct a model for any satisfiable **MLSS** formula. In contrast to V, the HF sets are directly representable in Isabelle/HOL, and indeed, an AFP entry [23] formalises them. The entry defines a type hf that comes with the following functionality:


Equipped with the above, we define the interpretation functions

– I<sub>st</sub> :: ('a ⇒ hf) ⇒ 'a pset\_term ⇒ hf and
– I<sub>sa</sub> :: ('a ⇒ hf) ⇒ 'a pset\_atom ⇒ bool

in the standard way, i.e. by mapping each syntactic construct to the corresponding operation on HF sets and interpreting variables with respect to a given valuation function M :: 'a ⇒ hf. For the concrete definition we refer to the formalisation.

We write M |= φ for the judgement that the formula φ holds under the valuation function M. The implementation of |= coincides with the interpretation function of Nipkow [20]. As usual, we call a formula φ *satisfiable* if there exists a model M with M |= φ. Otherwise, we say that φ is *unsatisfiable*.
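The shape of these interpretation functions can be illustrated with a small Python sketch in which nested `frozenset` values play the role of hereditarily finite sets. This is an assumption made for illustration only: the tuple encoding and the names `interp_term`, `interp_atom` and `holds` are hypothetical, and the authoritative definitions are those of the formalisation.

```python
BINOPS = {
    "union": frozenset.union,
    "inter": frozenset.intersection,
    "diff": frozenset.difference,
}

def interp_term(M, t):
    """Interpret a set term under the valuation M (a dict from names to frozensets)."""
    tag = t[0]
    if tag == "empty":
        return frozenset()
    if tag == "var":
        return M[t[1]]
    if tag == "single":
        return frozenset([interp_term(M, t[1])])
    return BINOPS[tag](interp_term(M, t[1]), interp_term(M, t[2]))

def interp_atom(M, a):
    """Interpret a set atom as a truth value."""
    tag, s, t = a
    lhs, rhs = interp_term(M, s), interp_term(M, t)
    return lhs in rhs if tag == "mem" else lhs == rhs

def holds(M, phi):
    """Propositional satisfaction with set atoms as propositional atoms."""
    tag = phi[0]
    if tag == "atom":
        return interp_atom(M, phi[1])
    if tag == "not":
        return not holds(M, phi[1])
    if tag == "and":
        return holds(M, phi[1]) and holds(M, phi[2])
    return holds(M, phi[1]) or holds(M, phi[2])    # remaining case: "or"

x, y = ("var", "x"), ("var", "y")
M = {"x": frozenset(), "y": frozenset([frozenset()])}
x_subset_y = ("atom", ("eq", ("union", x, y), y))  # subset encoded via union and equality
pair_xy = ("union", ("single", x), ("single", y))  # the finite set {x, y}
```

Under this valuation, `x_subset_y` holds because x is interpreted as the empty set, and `pair_xy` evaluates to the two-element set containing the interpretations of x and y.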

### **3 A Tableau Calculus for MLSS**

We formalise the tableau calculus for **MLSS** as described by Cantone and Zarba [9]. Inspired by the formalisation of a tableau calculus for hybrid logic by From [16], we use lists to represent the branches of the tableau tree. Note that we add formulas to the front of the list during branch expansion, so last b for a branch b is always the formula we are trying to disprove with the tableau. We sometimes call this formula the *initial formula*.

type\_synonym 'a branch = 'a pset\_fm list

We lift the functions vars and subterms to branches in the expected way.

In the standard tableau calculus for propositional logic as Fitting [15] describes it, a branch is called *closed* if it contains both the negation of a formula and the formula itself; conversely, it is called *open* if it is not closed. For **MLSS**, we extend the notion of closedness with three additional rules; the first two are straightforward while the last one states that a branch is closed when the branch contains a membership cycle t₀ ∈s t₁, t₁ ∈s t₂, ..., t<sub>k</sub> ∈s t₀.


**Table 1.** Linear expansion rules. All rules except the double negation rule coincide with the original paper [9]. For brevity, we omit the rules for ⊓s and −s.

```
inductive bclosed :: 'a branch ⇒ bool where
  ⟦φ ∈ set b; ¬ φ ∈ set b⟧ =⇒ bclosed b
| ⟦member_cycle cs; set cs ⊆ set b⟧ =⇒ bclosed b
```

```
abbreviation bopen b ≡ ¬ bclosed b
```
A tableau is called *closed* if all of its branches are closed.
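The closedness check can be illustrated with a small Python sketch. It is not the Isabelle definition: formulas are encoded as hypothetical tuples (`("mem", s, t)`, `("eq", s, t)`, `("not", φ)`), and the two rules elided in the listing are assumed to be membership in the empty set and an inequality of a term with itself, as the soundness argument in Sect. 6 suggests.

```python
EMPTY = ("empty",)  # hypothetical encoding of the empty-set term


def neg(phi):
    return ("not", phi)


def has_member_cycle(branch):
    """True if the positive membership literals s ∈ t on the branch form a cycle."""
    edges = {}
    for f in branch:
        if f[0] == "mem":
            edges.setdefault(f[1], set()).add(f[2])
    state = {}  # absent = unvisited, 1 = on the DFS stack, 2 = finished

    def visit(v):
        state[v] = 1
        for w in edges.get(v, ()):
            if state.get(w) == 1 or (state.get(w) is None and visit(w)):
                return True
        state[v] = 2
        return False

    return any(state.get(v) is None and visit(v) for v in list(edges))


def bclosed(branch):
    """Closedness: contradiction, s ∈ ∅, ¬(t = t), or a membership cycle."""
    fs = set(branch)
    for f in fs:
        if neg(f) in fs:
            return True
        if f[0] == "mem" and f[2] == EMPTY:
            return True
        if f[0] == "not" and f[1][0] == "eq" and f[1][1] == f[1][2]:
            return True
    return has_member_cycle(fs)


print(bclosed([("mem", "x", "y"), ("mem", "y", "x")]))
print(bclosed([("mem", "x", "y"), ("mem", "y", "z")]))
```

The cycle test is a plain depth-first search; any cycle detection on the graph of membership literals would serve equally well.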

#### **3.1 Linear Expansion Rules**

The calculus considers two kinds of branch expansion rules: *linear* and *branching* rules. As the name suggests, branching rules lead to the creation of new branches in the tableau while linear rules only extend a branch b with new formulas b' = [ψ1,...,ψn], which we denote by b' b. Table 1 shows the linear expansion rules. Note that in the first two rules for =s, l is a literal occurring in the branch. Furthermore, the term-for-term substitution l{s/t} is restricted to the top-level set terms of l, i.e. the set terms that occur directly under one of the atom constructors ∈s or =s; for example, given the literal

l = ¬ ((s ⊔s u) −s s =s s ⊔s u)

we have

(¬ ((s ⊔s u) −s s =s s ⊔s u)){t/s ⊔s u} = ¬ ((s ⊔s u) −s s =s t).

**Table 2.** Branching expansion rules. We write φ for last b here. All rules coincide with the original paper [9] so we only show an illustrative subset.
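The restriction of the substitution to top-level set terms can be made concrete with a short Python sketch. The tuple encoding of literals and terms, including the constructor names `"union"` and `"diff"`, is an assumption made for illustration only.

```python
def subst_top(literal, old, new):
    """Replace `old` by `new` only where it is a direct argument of the atom."""
    negated = literal[0] == "not"
    kind, lhs, rhs = literal[1] if negated else literal
    atom = (kind, new if lhs == old else lhs, new if rhs == old else rhs)
    return ("not", atom) if negated else atom


# The literal from the text: the compound term occurs once nested, once top-level.
s_u = ("union", "s", "u")
lit = ("not", ("eq", ("diff", s_u, "s"), s_u))
print(subst_top(lit, s_u, "t"))
```

Only the right-hand side of the equality is rewritten; the occurrence nested under the set difference is left alone because it is not a direct argument of the atom.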

A more crucial restriction of the linear rules is that no new subterm may be created by their application; for instance, the second rule for ⊔s is

s ∈s t1 ⟹ s ∈s t1 ⊔s t2,

which formally represents

(s ∈s t1) ∈ set b ⟹ [s ∈s t1 ⊔s t2] b,

and may only be used under the condition t1 ⊔s t2 ∈ subterms (last b). The purpose of this restriction is to prevent unbounded expansion of the branch. In fact, we give an explicit upper bound for the number of formulas in a branch in Sect. 7.

Due to boundedness, repeated expansion with linear rules eventually results in a *linearly saturated* branch, i.e. a branch where no application of linear rules would produce new formulas.

definition lin\_sat b ≡ ∀b'. b' b ⟶ set b' ⊆ set b

Finally, we remark that the original paper [9] is missing the last propositional rule dealing with double negation. This rule is required for completeness, though: the branch [¬¬¬ p, p, ¬¬¬ p ∧ p] is saturated (neither linear nor branching rules apply) and open, but there clearly is no model for the initial formula ¬¬¬ p ∧ p.

#### **3.2 Branching Rules**

After running out of linear rules to apply, only the branching rules shown in Table 2 remain. A rule is applicable if its *precondition* is met and, to prevent unnecessary branching, if it is not subsumed as indicated by the *subsumption condition*. These rules create multiple branches in the tableau, so we represent the different possibilities bs' to expand a branch b as a set and write bs' b. Accordingly, we get a new branch b' @ b in the tableau for each b' ∈ bs'.

A linearly saturated branch where no further branching is possible is called a *saturated* branch.

definition sat b ≡ lin\_sat b ∧ (∄bs'. bs' b)

Note that even branching rules are defined such that they never create new subterms, except for the last rule that adds a new variable to the branch. These variables serve to manifest an inequality; hence, we call them *witnesses*.

definition wits b ≡ vars b - vars (last b)

### **4 A Decision Procedure for MLSS**

The mechanics of the decision procedure are typical for a procedure based on a tableau calculus: it decides the satisfiability of a given formula φ by determining whether the formula has a closed tableau. More specifically, it initialises the tableau with the singleton branch [φ] and checks whether this branch can be expanded to a closed tableau.

We only discuss the abstract specification here and refer the reader to the formalisation for the executable specification. The implementation uses a couple of features of Isabelle/HOL's function package: instead of defining the function via pattern matching, we specify the equations of the function as conditional rewrite rules. This requires us to prove that the assumptions of the equations are non-overlapping, which is done by automation. The other concern is that Isabelle/HOL requires functions to be total, so a recursive function needs to terminate for it to be well-defined; nevertheless, the termination proof is separated from the definition of the function for modularity. The function package maintains the soundness of the definition by introducing a so-called domain predicate mlss\_proc\_branch\_dom which characterises the arguments for which the function terminates. Each equation of the function is guarded by an assumption that the predicate holds for the argument. In Sect. 7, we will show that the domain predicate holds for the context in which the function mlss\_proc\_branch is called. Before we go into more detail on how the termination is proved, we discuss the definition of the function, as shown below.

```
function mlss_proc_branch :: 'a branch ⇒ bool where
  ¬ lin_sat b ⟹ mlss_proc_branch b =
    mlss_proc_branch ((SOME b'. b' b ∧
                                set b ⊂ set (b' @ b)) @ b)
| ⟦lin_sat b; bclosed b⟧ ⟹ mlss_proc_branch b = True
| ⟦¬ sat b; bopen b; lin_sat b⟧ ⟹ mlss_proc_branch b =
    (∀b' ∈ (SOME bs. bs b). mlss_proc_branch (b' @ b))
| ⟦lin_sat b; sat b⟧ ⟹ mlss_proc_branch b = bclosed b

definition mlss_proc :: 'a pset_fm ⇒ bool where
  mlss_proc φ ≡ mlss_proc_branch [φ]
```
The purpose of the function is to determine whether we can expand a given branch to a closed tableau. As stated before, we first use linear expansion rules in order to prevent premature branching; to this end, we recursively expand the branch with linear rules until the branch is linearly saturated. Note that we use Hilbert's ε-operator in the form of SOME<sup>1</sup> to choose some rule that actually adds new formulas to the branch. As soon as the branch is linearly saturated, we terminate if the branch is closed as the second equation shows. Otherwise, we choose an applicable branching rule and recursively check whether all newly created branches can be closed. The final equation applies once no further branch expansion is possible, in which case we just test for closedness of the branch.

The procedure mlss\_proc then calls mlss\_proc\_branch with a singleton branch [φ] to determine the satisfiability of a given formula φ.
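The control flow of the procedure can be sketched in Python. To keep the example self-contained, the sketch instantiates the recursion with a toy propositional calculus rather than the MLSS rules; the tuple encoding of formulas and the names `lin_step`, `branch_step`, `proc_branch`, and `proc` are hypothetical and only mirror the shape of the four equations above.

```python
def neg(f):
    return ("not", f)


def is_op(f, op):
    return isinstance(f, tuple) and f[0] == op


def lin_step(branch):
    """Return new formulas from one linear rule, or None if linearly saturated."""
    present = set(branch)
    for f in branch:
        if is_op(f, "and"):
            new = [f[1], f[2]]
        elif is_op(f, "not") and is_op(f[1], "not"):   # double negation rule
            new = [f[1][1]]
        elif is_op(f, "not") and is_op(f[1], "or"):
            new = [neg(f[1][1]), neg(f[1][2])]
        else:
            continue
        new = [g for g in dict.fromkeys(new) if g not in present]
        if new:                      # only apply rules that add something
            return new
    return None


def branch_step(branch):
    """Return the alternative extensions of one branching rule, or None."""
    present = set(branch)
    for f in branch:
        if is_op(f, "or"):
            alts = [f[1], f[2]]
        elif is_op(f, "not") and is_op(f[1], "and"):
            alts = [neg(f[1][1]), neg(f[1][2])]
        else:
            continue
        if not any(a in present for a in alts):   # subsumption condition
            return [[a] for a in alts]
    return None


def closed(branch):
    present = set(branch)
    return any(neg(f) in present for f in present)


def proc_branch(branch):
    """True iff the branch can be expanded to a closed tableau."""
    while (new := lin_step(branch)) is not None:   # first equation
        branch = new + branch                      # new formulas go to the front
    if closed(branch):                             # second equation
        return True
    alternatives = branch_step(branch)
    if alternatives is None:                       # saturated and open
        return False
    return all(proc_branch(ext + branch) for ext in alternatives)


def proc(phi):
    return proc_branch([phi])


print(proc(("and", neg(neg(neg("p"))), "p")))   # triple negation example
print(proc(("or", "p", "q")))
```

Where the Isabelle definition picks a rule with `SOME`, the sketch simply takes the first applicable rule in branch order; this corresponds to fixing one particular choice function.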

Thus, we use mlss\_proc\_branch only on branches that result from applying the expansion rules. We call this kind of branch *well-formed*. In the definition below, the expression b' <sup>∗</sup> b denotes that b' is one of the branches that results from applying (potentially zero) expansion rules to b.

```
definition wf_branch b ≡ ∃φ. b ∗ [φ]
```
We use this notion in Sect. 7 to state an upper bound for the cardinality of well-formed branches. The upper bound justifies the termination of the decision procedure. Before we come to that, though, we prove soundness and completeness in Sect. 6 and 5, respectively. In Sect. 7, we also show that both properties easily transfer to mlss\_proc, which, together with termination, establishes that it is a decision procedure.

### **5 Completeness of the Calculus**

For completeness of the calculus, we need to show that every unsatisfiable formula has a closed tableau or, conversely, that the formula is satisfiable if there is a saturated and open branch in the tableau. To facilitate inductive reasoning, we show a stronger statement by constructing a model M such that M |= φ for all φ ∈ set b. At the core of the model, there is a *realisation* function that maps set terms to sets of type hf. A subset of the witnesses, which we call *pure* witnesses, receives special treatment from the realisation function for reasons that will become apparent in Sect. 5.1. The collection of set terms of a branch can thus be partitioned into two collections, as defined below.

<sup>1</sup> In the formalisation, the function mlss\_proc\_branch is actually parametrised by choice functions to allow for refinement.

```
definition pwits :: 'a branch ⇒ 'a set where
  pwits b ≡ {c ∈ wits b. ∀t ∈ subterms (last b).
              AT (Var c =s t) ∉ set b ∧ AT (t =s Var c) ∉ set b}
definition subterms' :: 'a branch ⇒ 'a pset_term set where
  subterms' b ≡ subterms (last b) ∪ Var ` (wits b - pwits b)
```
We aim to construct a syntactic model that we derive from the membership literals s ∈s t in the branch. To this end, we construct a graph whose vertices are the disjoint union of the sets above and there is an edge from s to t in the graph if, and only if, s ∈s t is in b. Note that we use Noschinski's graph library [22] which represents a graph as a record of vertices, arcs (directed edges), and two functions tail and head that map an arc to its source and target vertex, respectively.

```
definition bgraph b ≡ let vs = Var ` pwits b ∪ subterms' b
  in ⦇ verts = vs, arcs = {(s, t). (s ∈s t) ∈ set b},
       tail = fst, head = snd ⦈
```
The realisation function is defined relative to this graph. As mentioned before, the realisation function treats the pure witnesses differently from the rest of the set terms. The function evaluates terms in the latter set in accordance with the structure of the graph, i.e. the realisation of a vertex is defined as the set of the realisations of its parent vertices. For the former set, we choose a function I that assigns the pure witnesses pairwise distinct sets with cardinality greater than that of the vertices. We can always choose such a function since we assume an infinite universe of variables. Then, we return the singleton set HF {I x}, which, together with the cardinality constraint, guarantees that realisations are distinct between pure witnesses themselves as well as between pure witnesses and set terms. The notation u →G s in the definition below indicates that there is an edge from u to s in the graph G.

```
abbreviation parents G s ≡ {u. u →G s}
```

```
function realise :: 'a pset_term ⇒ V where
  x ∈ Var ` pwits b ⟹ realise x = HF {I x}
| t ∈ subterms' b ⟹ realise t = HF (realise ` parents (bgraph b) t)
```
Again, we need to ensure that the assumptions of the equations are non-overlapping and that the function terminates. The former is taken care of by automation, leaving us to prove termination. The assumption that b is open implies that there are no membership cycles, thus bgraph b is acyclic. Furthermore, the graph is finite by definition. Thus, we can use the cardinality of the set of ancestors as a measure that decreases in each recursive call.
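A rough Python sketch of the realisation may help to see how the two cases interact. Hereditarily finite sets are modelled as nested `frozenset`s, and the function I is instantiated, as one possible choice and not necessarily the one in the formalisation, with von Neumann ordinals that are larger than the number of vertices. The graph is assumed to be acyclic and all arc endpoints are assumed to be vertices.

```python
def ordinal(k):
    """Von Neumann ordinal k as a nested frozenset; it has exactly k elements."""
    o = frozenset()
    for _ in range(k):
        o = o | {o}
    return o


def realisation(verts, arcs, pure_wits):
    """Map every vertex to a hereditarily finite set (a nested frozenset)."""
    parents = {v: set() for v in verts}
    for s, t in arcs:                 # arc (s, t) stands for the literal s ∈ t
        parents[t].add(s)
    n = len(verts)
    # Hypothetical choice of I: pairwise distinct sets of cardinality > n.
    I = {x: ordinal(n + 1 + i) for i, x in enumerate(sorted(pure_wits))}
    cache = {}

    def realise(t):
        if t not in cache:
            if t in I:                # pure witness: singleton {I t}
                cache[t] = frozenset({I[t]})
            else:                     # other terms: realisations of the parents
                cache[t] = frozenset(realise(u) for u in parents[t])
        return cache[t]

    return {v: realise(v) for v in verts}


r = realisation(
    verts={"x", "y", "s", "t", "u"},
    arcs={("x", "s"), ("y", "t")},
    pure_wits={"x", "y"},
)
print(r["x"] != r["y"], r["x"] in r["s"], r["u"] == frozenset())
```

The recursion terminates for the same reason as in the text: every recursive call moves to a parent in a finite acyclic graph.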

Before we prove that the realisation function constitutes a model in Sect. 5.2, we will first explain the significance of the pure witnesses.

#### **5.1 Characterisation of the Pure Witnesses**

Recall that the pure witnesses of a branch b are those witnesses that are not related to other subterms in last b by equality. In the context of a well-formed branch, we can strengthen this characterisation to any set term and, in addition, we also get that there is no membership literal where a pure witness is on the right-hand side. Intuitively speaking, the realisation of a pure witness does not depend on the realisation of any other set term.

```
lemma lemma_2:
  assumes wf_branch b and c ∈ pwits b
  shows (Var c =s t) ∉ set b and (t =s Var c) ∉ set b
    and (t ∈s Var c) ∉ set b
```
So why are pure witnesses treated differently? Were they not treated separately, the definition of realise would evaluate the pure witnesses to the empty set 0 :: hf. To see that this is a problem, consider the branch b = [Var s ≠s Var t, Var t ≠s Var u], which expands to several open and saturated branches, one of which is

```
[Var x ≠s Var y, Var x ∈s Var s, Var x ∉s Var t,
                 Var y ∈s Var t, Var y ∉s Var u] @ b
```
for some fresh x and y. Assigning both Var x and Var y a value of 0 would contradict the literal Var x ≠s Var y. To prevent this, we assign the pure witnesses pairwise different values.

The proof of lemma\_2 is more technical than interesting so we refer the reader to the formalisation.

#### **5.2 Realisation of an Open Branch**

Remember that for completeness, we need to show that the realisation function for an open and saturated branch b actually constitutes a model for all formulas in the branch. We start by verifying that the realisation function models all literals in the branch; more formally, the following propositions hold:

(1) We have realise s ∈ realise t if s ∈s t is in b.

(2) We have realise s = realise t if s =s t is in b.

(3) We have realise s ≠ realise t if s ≠s t is in b.

(4) We have realise s ∉ realise t if s ∉s t is in b.

To illustrate the usefulness of lemma\_2, we prove Proposition (2). The proofs of all propositions translate well into Isabelle, so we refer to the original paper [9] for the remaining proofs.

*Proof (of Proposition (2)).* Assume that s =s t is in b. If there exists a c ∈ pwits b where s = Var c or t = Var c, we arrive at a contradiction due to lemma\_2. Therefore, both s ∈ subterms' b and t ∈ subterms' b must hold. Now, assume for contradiction that realise s ≠ realise t. Without loss of generality (the other case is symmetric), we obtain an e such that e ∈ realise s and e ∉ realise t. Considering that s ∈ subterms' b and the definition of realise, we obtain a d with e = realise d and d →bgraph b s. This, in turn, yields that d ∈s s must be in b. Together with the assumption (s =s t) ∈ set b and the saturation of b, it follows that d ∈s t must also be in b. But then we have realise d ∈ realise t ⟷ e ∈ realise t using Proposition (1), which contradicts the assumption e ∉ realise t.

We now lower the results on literals to set terms. All of the proofs are straightforward so we refer the reader to the formalisation.

(a) It holds that realise ∅ = 0.

(b) Let ⊙s ∈ {⊔s, ⊓s, −s} and let ⊙ be the corresponding operation on hf sets. If the term s ⊙s t occurs in subterms b, then realise (s ⊙s t) = realise s ⊙ realise t.

(c) If Single t ∈ subterms b, then realise (Single t) = HF {realise t}.

The final step for obtaining a proper model is to connect the realisation function to the semantics as defined in Sect. 2. For set terms, we can use the Propositions (a)–(c) to prove the lemma below by induction on t.

```
lemma assumes t ∈ subterms b
      shows Ist (λx. realise (Var x)) t = realise t
```
Lifting the above result to formulas yields the coherence of b, as the original paper [9] calls it. The proof is a tedious but straightforward induction on the size of the formulas.

lemma coherence: assumes φ ∈ set b shows (λx. realise (Var x)) |= φ

The coherence property finishes the proof of completeness of the calculus as it gives us a model for every formula in an open and saturated branch.

### **6 Soundness of the Calculus**

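A tableau calculus is sound if the corresponding formula is unsatisfiable for any closed tableau. We prove the following two properties to establish soundness:

(1) A closed branch cannot be satisfied by any model.

(2) The linear and branching expansion rules preserve satisfiability.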

We formalise the first property in Isabelle below.

```
lemma bclosed_sound:
  assumes bclosed b shows ∃φ ∈ set b. ¬ M |= φ
```
*Proof.* It is clear that, for any s, M models neither s ∈s ∅ nor s ≠s s. Furthermore, no model can satisfy both φ and ¬φ at the same time. Lastly, a membership cycle is impossible since the membership relation of hf is well-founded.

We are left with showing that both linear and branching expansion rules preserve satisfiability. As for the linear rules, a straightforward proof by case analysis on b' b suffices to obtain the lemma below.

lemma lexpands\_sound: assumes b' b and φ ∈ set b' and ⋀ψ. ψ ∈ set b ⟹ M |= ψ shows M |= φ

A similar argument would work for the branching rules if it were not for the last rule adding new variables. Those variables need to be assigned specific values; hence, we modify the model as shown in the proof below.

lemma bexpands\_sound: assumes bs' b and ⋀ψ. ψ ∈ set b ⟹ M |= ψ shows ∃M'. ∃b' ∈ bs'. ∀ψ ∈ set (b' @ b). M' |= ψ

*Proof.* We only consider the case where bs' b was proved by applying the last branching expansion rule to s ≠s t for some s and t. We have

bs' = {[Var x ∈s s, Var x ∉s t], [Var x ∈s t, Var x ∉s s]}

for some fresh variable x. Since s ≠s t is in b, we have that Ist M s ≠ Ist M t because M is a model. Without loss of generality, this inequality manifests itself through some y with y ∈ Ist M s and y ∉ Ist M t. We update M to map x to y to obtain the assignment M'. Note that M' is still a model for formulas in b because x is fresh with respect to b. Furthermore, it is also a model for the first branch in bs', which finishes the proof.

### **7 Total Correctness of the Decision Procedure**

We first demonstrate the termination of the procedure for well-formed branches, i.e. every well-formed branch is in the domain of mlss\_proc\_branch. To this end, we derive an upper bound for the number of distinct formulas in a branch whose proof we omit here for brevity. We should point out that this bound is not to be construed as the complexity of the procedure as it may create exponentially many branches in general.

```
lemma card_wf_branch_ub:
  assumes wf_branch b
  shows |set b| ≤ 2 * |subfms (last b)| + 16 * |subterms (last b)|^4
```
Remember that mlss\_proc\_branch only applies a linear expansion rule to a branch if the application results in new formulas. Moreover, the subsumption conditions of the branching expansion rules ensure that each of the newly created branches contains new formulas. Ultimately, we conclude that the procedure must terminate for well-formed branches because the number of formulas increases in each step but is also bounded.
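To get a feeling for the size of the bound, it can be evaluated for concrete numbers. The sketch below reads the trailing 4 in the lemma as an exponent; the function name and the sample sizes are hypothetical.

```python
def branch_size_bound(num_subfms: int, num_subterms: int) -> int:
    """Upper bound on the number of distinct formulas in a well-formed branch."""
    return 2 * num_subfms + 16 * num_subterms ** 4


# A small initial formula with 3 subformulas and 4 subterms.
print(branch_size_bound(3, 4))
```

The bound is polynomial in the size of the initial formula. It limits the length of a single branch only, so it says nothing about the number of branches, which may grow exponentially.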

lemma assumes wf\_branch b shows mlss\_proc\_branch\_dom b

The above lemma allows us to utilise the computation induction rule of mlss\_proc\_branch on well-formed branches, which we use to prove soundness and completeness. As both proofs are essentially an application of soundness, respectively completeness, of the calculus, we refer the reader to the formalisation.

```
lemma mlss_proc_branch_complete:
  fixes b :: 'a branch
  assumes wf_branch b and ¬ mlss_proc_branch b
  assumes infinite (UNIV :: 'a set)
  shows ∃M. M |= last b
lemma mlss_proc_branch_sound:
  assumes wf_branch b and ∀ψ ∈ set b. M |= ψ
  shows ¬ mlss_proc_branch b
```
To finish the proof of total correctness, note that every singleton branch is trivially well-formed; thus, termination, completeness, and soundness easily transfer to mlss\_proc.

```
theorem mlss_proc_complete:
  fixes φ :: 'a pset_fm
  assumes ¬ mlss_proc φ and infinite (UNIV :: 'a set)
  shows ∃M. M |= φ
theorem mlss_proc_sound:
  assumes M |= φ shows ¬ mlss_proc φ
```
### **8 Dealing with Urelements**

In the introduction, we stated the goal of integrating mlss\_proc as a tactic into Isabelle. For this to work, we must map every branch expansion rule to a corresponding theorem in Isabelle/HOL. This is straightforward for all expansion rules except for the last branching expansion rule. To illustrate, suppose that we are to disprove a statement of the form

s ≠ (t :: 'a) ∧ s ∈ (A :: 'a set) ∪ B ∧ ...

in Isabelle/HOL. By way of reification, we convert this to a formula of the shape

s' ≠s t' ∧ s' ∈s A' ⊔s B' ∧ ...

in our set syntax for some s', t', A', and B'. When we apply the decision procedure to this formula, it might return a tableau proof that contains an application of the last branching rule to (s' ≠s t') ∈ set b. This results in two branches, one of which is [Var x ∈s s', Var x ∉s t'] @ b; however, there is no matching rule in Isabelle/HOL since s and t are not sets.

To deal with this problem, we formalise a lightweight type system as displayed in Fig. 1. The type of a set term in this system is just a natural number which we call level. Intuitively speaking, the level l means that the corresponding term t in Isabelle/HOL has type

'a set ... set (with set repeated l times)

for some 'a. Note that the constructor ∅ now receives an additional argument indicating the level of each instance of ∅.

Moreover, the typing judgement extends to set atoms by matching up the levels of its component set terms.

Ultimately, we define Γ ⊢ φ ≡ ∀a ∈ atoms φ. Γ ⊢ a in order to type formulas.

We can now define the urelements with respect to a formula. An urelement is a set term whose corresponding type in Isabelle/HOL might not be a set.

definition urelem :: 'a pset\_fm ⇒ 'a pset\_term ⇒ bool where urelem φ t ≡ ∃Γ. Γ ⊢ φ ∧ Γ ⊢ t : 0

Using this definition, we make two changes to the specification of the calculus: (1) First and foremost, we require that neither s nor t is an urelement in the precondition of the last branching expansion rule. (2) As mentioned above, we add an argument to the ∅ constructor. This argument is only used for the typing judgement; it has no impact on the semantics.

Soundness, of course, is not affected by these changes but we have to make a few amendments to maintain completeness: (1) The first equation of realise now also must account for the urelements. In particular, it has to ensure that urelements receive pairwise different values unless they are related through equality atoms. This does not affect pure witnesses since they cannot be related through equality atoms due to lemma\_2. (2) We must adjust the completeness proof in those places where it directly refers to the definition of realise to account for the case where a given term is an urelement. (3) The completeness theorem receives the additional assumption that Γ ⊢ φ holds for the initial formula φ. (4) For the completeness proof, we must show that the typing judgement is invariant under branch expansion.

**Fig. 1.** The type system for set terms and atoms.

The modifications above ensure that the proof can be replayed through Isabelle/HOL. To actually use the calculus, we must determine the urelements of the initial formula φ, though. In other words, we have to implement an inference algorithm for our lightweight type system. The algorithm is, in essence, a simplified version of Hindley-Milner type inference so it has the same two phases: it generates constraints using syntax directed rules and then passes them to a constraint solver.

Since we are only interested in the level of a term, we can encode all constraints into the theory of 0, the successor function S, and equality (but no disequality). Note that constraints of the form l ≠ 0 can be replaced by l = S i with i being a fresh variable. A solver for this theory is straightforward to implement and verify; nevertheless, we have to be careful that it computes the minimum assignment Γ from variables to levels that fulfils the constraints. This guarantees that a set term t is not an urelement if, and only if, Γ t > 0. Conversely, all terms s with Γ s = 0 are urelements.
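One way to compute such a minimum assignment is sketched below in Python. This is not the verified solver from the formalisation: the constraint format `(v, w, k)`, meaning level(v) = level(w) + k, the sentinel `ZERO`, and the example constraints (which guess at the typing rules of Fig. 1) are all assumptions made for illustration.

```python
from collections import defaultdict, deque

ZERO = "<zero>"  # hypothetical sentinel for the constant 0


def solve_levels(variables, constraints):
    """Return the minimum assignment of levels, or None if unsatisfiable.

    Each constraint (v, w, k) states level(v) = level(w) + k, i.e. v = S^k w.
    """
    adj = defaultdict(list)
    for v, w, k in constraints:
        adj[v].append((w, -k))   # level(w) = level(v) - k
        adj[w].append((v, k))    # level(v) = level(w) + k
    level = {}
    for start in [ZERO, *variables]:
        if start in level:
            continue
        rel = {start: 0}         # levels relative to the start node
        queue = deque([start])
        while queue:
            a = queue.popleft()
            for b, d in adj[a]:
                if b not in rel:
                    rel[b] = rel[a] + d
                    queue.append(b)
                elif rel[b] != rel[a] + d:
                    return None  # contradictory equalities
        lowest = min(rel.values())
        if start == ZERO:
            if lowest < 0:
                return None      # some level would be negative
            shift = 0            # component is pinned by the constant 0
        else:
            shift = -lowest      # minimality: the lowest member gets level 0
        level.update({x: r + shift for x, r in rel.items()})
    level.pop(ZERO)
    return level


# Assumed constraints for: s ≠ t ∧ s ∈ A ∪ B
constraints = [("s", "t", 0), ("AuB", "s", 1), ("A", "AuB", 0), ("B", "AuB", 0)]
print(solve_levels(["s", "t", "A", "B", "AuB"], constraints))
```

In this example s and t receive level 0 and are therefore urelements, while A and B receive level 1. A constraint l ≠ 0 would be passed in as `("l", "i", 1)` with a fresh variable `i`, following the replacement described above.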

### **9 Conclusion and Future Work**

We developed a formalisation of a tableau calculus for a quantifier-free fragment of set theory called **MLSS** based on a paper by Cantone and Zarba [9]. The formalisation includes an abstract description of a decision procedure that builds on the calculus. To make the decision procedure compatible with Isabelle/HOL, we extended the calculus with a lightweight type system while maintaining completeness. We also refined the abstract specification to an executable specification from which code can be generated.

In future work, we plan to implement an efficient executable specification in the style of a worklist algorithm. This specification should also generate certificates that can be replayed through Isabelle's inference kernel to facilitate the integration of the procedure into Isabelle.

**Acknowledgements.** The author thanks Kevin Kappelmann and Tobias Nipkow for their comments on a draft version of this paper and the anonymous referees for their thorough reviews.

### **References**



# **An Experimental Pipeline for Automated Reasoning in Natural Language (Short Paper)**

Tanel Tammet<sup>1</sup>, Priit Järv<sup>1</sup>, Martin Verrev<sup>1</sup>, and Dirk Draheim<sup>2</sup>

<sup>1</sup> Applied Artificial Intelligence Group, Tallinn University of Technology, Tallinn, Estonia, {tanel.tammet,priit.jarv1,martin.verrev}@taltech.ee

<sup>2</sup> Information Systems Group, Tallinn University of Technology, Tallinn, Estonia, dirk.draheim@taltech.ee

**Abstract.** We describe an experimental implementation of a logic-based end-to-end pipeline of performing inference and giving explained answers to questions posed in natural language. The main components of the pipeline are semantic parsing, integration with large knowledge bases, automated reasoning using extended first order logic, and finally the translation of proofs back to natural language. While able to answer relatively simple questions on its own, the implementation is targeting research into building hybrid neurosymbolic systems for gaining trustworthiness and explainability. The end goal is to combine machine learning and large language models with the components of the implementation and to use the automated reasoner as an interface between natural language and external tools like database systems and scientific calculations.

### **1 Introduction**

Question answering and inference using natural language is a classic A.I. area, with a long history of little success using symbolic methods, able to solve only small problems with a limited structure. The recent machine learning (ML) systems, in particular, the Large Language Model (LLM) implementations of the BERT and GPT families are, in contrast, often able to give satisfactory answers to nontrivial questions.

However, the current LLMs are neither trustworthy nor explainable. They have a well-known tendency of "hallucinating", i.e. giving wrong answers and inventing actually nonexistent entities and facts. The problems of explicitly controlling the output and giving explanations for the solutions appear to be very hard for LLMs. An optimistic view of LLMs suggests that end-to-end learning can be improved to overcome these issues, while a more pessimistic view suggests that the problems are inherent and stem from the lack of an internal world model. The proponents of the latter view propose to build hybrid neurosymbolic systems, combining machine learning and symbolic methods of various kinds. Indeed, the research in the field of neurosymbolic systems has become quite active. The recent survey [14] points to a wider interest in connecting natural language systems to external software like databases and scientific calculations.

Using logic for natural language inference (NLI) in combination with ML may potentially alleviate the problems with LLMs and provide a glue to connect external systems to natural language interfaces. However, using logic directly for processing natural language is hard, for a number of reasons:


The motivation behind the research described in the paper is the following hypothesis: all the main problems described above can be alleviated by using ML techniques tailored separately for each particular problem. The current paper does not introduce any ML techniques for the problems above. The goal of our system is to serve as a backbone for research into combining the symbolic methods with ML. Our hypothesis is that by gradual improvement and combination of the existing symbolic subsystems with ML techniques it is possible to eventually build a question answering system which has enough power, trustworthiness and explainability to be practically useful in various application areas.

In other words, the envisioned end goal of this research is neither to replace LLMs nor to verify their output, but to develop systems combining LLMs and symbolic reasoning for specific areas where it is feasible to build sets of domain-specific rules and factual databases.

### **2 Related Work**

Here we will only consider projects building a full NLP inference system. The performance of older pure symbolic or logic-based methods like LogAnswer [7] remained at the level of specific toy examples and never achieved capabilities required for wider applicability. The long-running CYC project [22], although having several successes, did not succeed with its original stated goals, which is often used as an argument against symbolic systems.

A popular area for language processing is converting human queries to SQL or SPARQL queries. These systems typically do not handle rules expressed in natural language. The projects closest to ours use reasoners with a relatively limited capacity, like BRAID [12], which uses an extended SLD+ reasoner with probabilistic rules and fuzzy unification, CASPR [18], which uses an ASP reasoner incorporating default logic, and NatPro [1,2], which uses a Natural Logic prover. The latter is the only such project we know to be publicly available: https://github.com/kovvalsky/prove_SICK_NL.

The majority of research in neurosymbolic reasoning for natural language combines ML with weak forms of symbolic systems, typically taxonomies and triple graph knowledge bases like ConceptNet [25]. We approach the problem from the less common direction: starting from the symbolic/reasoning side and moving towards ML. There are already a few research projects combining ML with reasoning in quantified first order logic, although we are not aware of any such systems being publicly available. Noteworthy projects involving quantified logic are SQuARE [4], BRAID [12] and STAR [21]. Recent work that uses large language models (LLMs) to map informal proofs to formal Isabelle [17] proof sketches guiding an automated prover [34], and that uses LLMs directly to generate Isabelle code [11], shows clear promise in combining LLMs with provers.

### **3 Natural Language Inference and Question Answering**

The described pipeline is able to handle both the natural language inference (NLI) tasks (given a premise, determine whether a given hypothesis is true, false or indeterminate) and the closely related question answering tasks of finding a specific object matching a given criterion.

We will use a few simple examples throughout the paper. The expected answer to the first example *"If an animal likes honey, then it is probably a bear. Most bears are big, although young bears are not big. John is an animal who likes honey. Mike is a young bear. Who is big?"* is *"Likely John"*. The expected answer to the second example *"The length of the red car is 4 m. The length of the black car is 5 m. The length of the red car is less than 5 m?"* is *"True"*.

It is worth noting that these examples are solved correctly by the current (May 2023) versions of GPT: ChatGPT using the text-davinci-002 model and the API using the gpt-3.5-turbo and gpt-4 models; moreover, they are able to give a satisfactory explanation of the reasoning behind the answers. However, if we insert additional irrelevant information into the first example, our system still finds the expected answer, while none of the GPT models above give a correct answer: *"If an animal likes honey, then it is probably a bear. Most bears are big, although young bears are not big. John is an animal who likes honey. Mike is a young bear. Mike can eat a lot. Penguins are birds who cannot fly. John took the block from the colored table. The table was really nice. The robot arm lifted a blue block from the table. Who is big?"*.

Similarly, when we modify the second example by using meaningless words and adding irrelevant text, our system finds the expected answer, while all the referred GPT models give confusing answers: *"The length of the barner is 200000000 m. The length of the red foozer is 312435 m. Most barners are 1000000 m long. Sun is larger than the moon. John saw the sun rising over an enormous foozer. A huge robot filled the sky. The length of the red foozer is less than 312546 m?"* However, the answers given by GPT versions may vary over time, i.e. experiments with GPT are not reproducible.

### **4 The Question Answering Pipeline**

Our system is publicly available at http://github.com/tammet/nlpsolver. It requires Linux and should be easy to install. The implementation consists of four main software systems. The pipeline driver calls the external Stanza parser [20] from Stanford, giving a Universal Dependencies (UD, see [5]) graph, then runs the semantic parser on the UD graph, calls the reasoner, and finally builds a natural language answer along with the explanation built from the proofs given by the reasoner. The pipeline driver, parser and answer construction components consist of over 400 Kbytes of Python code. Before running the solver, a small Python server component has to be started, to initialize the external UD parser Stanza and read a commonsense knowledge base into shared memory. For reasoning the pipeline calls our commonsense reasoner GK, written in C: this is the largest and the most complex part of the pipeline. There is a separate Python program for regression tests, along with several Python files containing sub-tests, currently over 1600 separate NLI tasks. The pipeline driver is called from a command line, with a natural language text and question as a command line argument, plus a number of optional arguments to control the behaviors like the amount of output.

#### **4.1 Semantic Parsing**

The parser takes English strings of natural language text as input and outputs extended clausified first-order logic formulas encoded in JSON as proposed in JSON-LD-LOGIC [29]. The main extension is adding numerical confidence to clauses and implementing default logic [23] by including special literals to encode exceptions, as presented in our papers [28] and [27].

Parsing consists of a number of phases, each adding new structural details to the results of the previous phases. For the most part, the phases are implemented procedurally, without using explicit transformation rules: we found that the more complex aspects of translation cannot be easily expressed with the help of simple transformation rules. In particular, the correct interpretation of a sentence depends heavily on previous sentences and a collected database of objects which have been talked about.

**Conversion to Universal Dependencies (UD) Format.** We use the external Stanza parser to get the UD format dependency graphs from input sentences. Stanza itself uses pretrained neural models. We first preprocess English strings to avoid several typical mistakes of the Stanza conversion, and then use Stanza to get the UD graph. The graph is then fed to our small set of simplifying transformations returning a simplified text, which is again fed to Stanza to get the final UD graph. The simplification phase reduces the amount of complexities and edge case handling necessary in the UD-to-logic converter, and is a prime candidate for experimenting with using LLMs for simplifications.

**Converting UD to Logic.** One of the strengths of UD representation given by Stanza is a high level of detail. The first subphase of conversion is restructuring the UD graph to a semi-logical representation explicating the outward logical structure around the subject/verb, object/verb or subject/verb/object tuples. The following subphases attach different kinds of properties to words. For example, the outmost structure constructed for the sentence "Most bears are big, although young bears are not big." is

[and, svo[bear,be,big], svo[bear,be,big]] which is then extended to [and, svo[bear,be,big], svo[[props,young,bear],be,big]].

The words in these structures are key-value objects containing both the initial UD information and additional details added during the phases.

The next subphase results in the extended logic in a non-clausified form, i.e. using explicit quantifiers. The conversion uses the previous structure recursively, taking into account the details of the original UD structure to find additional critical information like articles, negation, different kinds of quantifiers etc. We follow the approach of Davidsonian semantics, introducing event identification variables, while not taking the neo-Davidsonian path of splitting all relations into their minimal components (see [33]).

For the coreference resolution we calculate the weighted heuristic scores for all candidate words, using also taxonomies of Wordnet. Another inherently complex task is determining whether a noun stands for a concrete object or should be quantified over. Importantly, any object detected is stored in a special data structure with new information about the object possibly added as the parsing process proceeds.

Let us consider an example sentence "John is a nice animal who likes honey." It would be first converted to a conjunction of three formulas

```
isa(animal, c1_John)
prop(nice, c1_John, generic, generic, ctxt(Pres, 1))
def0(c1_John)
∀S (def0(c1_John) ↔
   ∃X isa(honey, X)&(∃A do2(like, c1_John, X, A, ctxt(Pres, S))))
```
The system determined that in this sentence "John" refers to a concrete object and immediately created a Skolem constant c1_John, storing it for possible later use and extension. Here it also created a new definition def0 for encoding the complex property of "John": liking honey. Object properties like the one given in the second formula above also encode the intensity of the property (slightly/very) and the comparative class: for example, saying "John is a very large animal ..." would create prop(large, c1_John, 3, animal, ctxt(Pres, 1)). The constant generic indicates that intensity is not known or that the property is not comparative, i.e. does not relate to a specific class. The term ctxt(Pres, 1) encodes contextual aspects: the present tense and a concrete situation number in a possible sequence of situations created by different actions. The variable A in the last formula is an identifier of an action, which can be given additional properties, like place, time or assistive objects of an action, in the Davidsonian style.

In the representations above we have omitted the information about confidence and the possibility of exceptions. Indeed, the sentence we looked at is considered to be certain and without exceptions. However, the first part of the sentence "Most bears are big, although young bears are not big" attaches confidence 0.85 to the formula and includes a *blocker literal* encoding an exception in the sense of default logic, along with the comparative priority of the blocker:

```
0.85 : ∀X isa(bear, X) →
 (prop(big, X, generic, generic, ctxt(Pres, 1)))∨
  block(h(bear, 1), neg(prop(big, X, generic, generic, ctxt(Pres, 1))))
```
The blocker literals are used by the GK prover to recursively check the proof candidates found, with diminishing time limits: GK uses a part of a given time limit to attempt to prove each blocker literal in the proof. Whenever a blocker is proved, the candidate proof containing the blocker is considered invalid and thus discarded; see [27] for details.
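The blocker check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GK's actual implementation (GK is written in C): the `prove` callback, the dictionary shape of a proof, the `share` fraction of time given to blockers, and the `max_depth` cut-off are all hypothetical names introduced here.

```python
from typing import Callable, Optional

# Hypothetical shapes: a proof is a dict with a "blockers" list; prove() stands in
# for a call to the reasoner and returns a proof or None within the time limit.
Proof = dict
Prover = Callable[[object, float], Optional[Proof]]


def accept_proof(candidate: Proof, prove: Prover, time_limit: float,
                 share: float = 0.5, depth: int = 0, max_depth: int = 3) -> bool:
    """Return True if no blocker of the candidate can itself be validly proved."""
    blockers = candidate.get("blockers", [])
    if not blockers or depth >= max_depth:
        return True
    # Spend only part of the remaining time on blockers, split evenly among them,
    # so that nested checks run with diminishing limits.
    per_blocker = time_limit * share / len(blockers)
    for blocker in blockers:
        blocker_proof = prove(blocker, per_blocker)
        # A blocker proof counts only if it survives its own blocker check.
        if blocker_proof is not None and accept_proof(
                blocker_proof, prove, per_blocker, share, depth + 1, max_depth):
            return False  # a blocker was proved: discard the candidate
    return True
```

The even split of the time budget is only one possible policy; the source states that a part of the given limit is used for blockers, not how that part is divided.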

The system is also able to handle simpler questions involving sizes of sets, like "An animal had two strong legs. The animal had a strong leg?", "John has three big nice cars. John has two big cars?", and measures, like "The length of the red car is 4 m. The length of the black car is 5 m. The length of the red car is less than 5 m?". We use terms encoding the sets and measures: for example, the first sentence of the last question is translated to a formula containing a standard equality predicate, an integer and several properties involving the measure term, including the main statement 4 = count(measure1(length, c1_car, meter, ctxt(Pres, 1))).

**Instance Generation.** In order to answer questions without indicating concrete objects, like "Adult bears are large animals. Cats are small animals. Who is a large animal?" we need constants representing an anonymous instance of a class, essentially a "default adult bear", a "default bear" and a "default cat". For each such object the system generates a constant along with the formulas indicating its class and properties, enabling the system to produce an answer "An adult bear".

**Question Handling.** Actual questions like "Who is big?" or "The length of the red car is less than 5 m?" require special handling. The automated reasoner GK used in the pipeline employs the well-known *answer predicate* technique to construct and output the required substitution term. All the variables in the question formula will be instantiated and output, potentially resulting in a large combination of different answers. The "Who is big?" question will be first translated to ∃X, Y, Z prop(big, X, generic, Y, Z), indicating that we are not restricting the "bigness" or context in the question. However, we do not want to enumerate different "bigness" values or contexts in the answer, thus we wrap the formula into a definition (say, def2) over a single variable X, and search for different substitutions into def2(X) only. Asking questions about location and time is implemented by constructing a number of questions over relations "near", "on", "at", etc.
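The effect of wrapping the question into a definition over a single variable can be illustrated by a small projection over substitutions. This is only a sketch of the outcome: in the pipeline the restriction is achieved logically, through the definition and the answer predicate, rather than by post-filtering. The function name and the sample substitutions are hypothetical.

```python
def project_answers(substitutions, answer_vars=("X",)):
    """Keep only the answer variables of each substitution and drop duplicates."""
    seen = set()
    projected = []
    for subst in substitutions:
        key = tuple(subst[v] for v in answer_vars)
        if key not in seen:
            seen.add(key)
            projected.append(dict(zip(answer_vars, key)))
    return projected


# Hypothetical substitutions for "Who is big?" over X (who), Y (comparison class)
# and Z (context); without the wrapping, each of these would be a separate answer.
raw = [
    {"X": "c1_John", "Y": "generic", "Z": "ctxt(Pres,1)"},
    {"X": "c1_John", "Y": "animal", "Z": "ctxt(Pres,1)"},
    {"X": "some_bear", "Y": "generic", "Z": "ctxt(Pres,1)"},
]
answers = project_answers(raw)
```

Here the two substitutions for c1_John collapse into one answer, which is the behavior the def2(X) wrapping is meant to guarantee.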

**Clausification and Simplification.** The system contains a clausifier skolemizing the formulas and converting these to a conjunctive normal form. The clausification phase also performs several simplifications, some of which are possible due to the known properties of the constructed formulas. Since nontrivial formulas may be converted into several clauses, the clausifier decides how to spread the numeric confidence of the formula and the exception literals in the formula into the clauses.

#### **4.2 Integration with Knowledge Bases**

The knowledge base provides the world model of our reasoning system. To answer the query "Tweety is a bird. Can Tweety fly?", the system needs to have the background knowledge that birds can fly. We construct the knowledge base (KB) using default logic rules augmented with numeric confidences. A small part of the knowledge base forms a core world model and is built by hand, while the bulk of the knowledge is integrated automatically from existing common sense knowledge (CSK) sources as described in [10].

We have integrated eight published knowledge graphs: ConceptNet [25], WebChild [30], Aristo TupleKB [15], Quasimodo [24], Ascent++ [16], UnCommonSense [3], ATOMIC<sup>20</sup><sub>20</sub> [9] and ATOMIC<sup>10x</sup> [32]. These CSK sources are collections of relation triples. The majority of the sources contain natural language clauses or fragments in the triple elements. We have built a specialized pattern matching semantic parser to convert the relations to first order logic rules with the default logic extensions and estimated numeric confidence. The full knowledge base contains 18.5 million rules, of which over 15 million are related to taxonomy: inferring a property or an event from the class of an entity.

#### **4.3 Automated Reasoning**

We use our automated reasoner GK to solve the problems generated by the semantic parser. The reasoner uses both the parser output and a selected subset of the world knowledge to solve the questions. Wordnet taxonomies are used to solve the precedence problem of exceptions. Large datasets are parsed, indexed and kept in shared memory for quick re-use. GK is built on top of a conventional high-performance resolution-based reasoner GKC [26] for conventional first order logic. Thus GK inherits most of the capabilities and algorithms of GKC. The main additional features of GK are the following:


– Performing reasoning by analogy via employing known similarity scores of words along with exceptions.

The first four features are covered in our previous paper [28] and the following two are covered in [27]. The word similarity handling is currently in an experimental phase: the initial experiments show that a naive implementation creates an unmanageable search space explosion, and thus a layered approach is necessary.

As a simple example of the basic features, consider sentences "John is nice. John is not nice. Mike is nice. Steve is not nice." GK output to the parsed versions of the following questions will directly lead to these answers: "John is nice?": "Unknown", "Mike is nice?": "True", "Mike is not nice?": "False", "Who is nice?": "Mike", "Who is not nice?": "Steve". For a slightly more complex example, consider the earlier "If an animal likes honey, then it is probably a bear. Most bears are big, although young bears are not big. John is an animal who likes honey. Mike is a young bear. Who is big?". GK will output the following proof in JSON, where we have removed quotation marks and a number of steps:

```
{result:answer found,
answers:[
{
answer:[[$ans,some_bear]],
blockers:
  [[$block,[$,bear,1],[$not,[prop,big,some_bear,$generic,$generic,[$ctxt,Pres,1]]]]],
confidence:0.85,
positive proof:
[
...,
[7,[mp,[5,1],6,fromgoal,0.85],
  [[$block,[$,bear,1],[$not,[prop,big,some_bear,$generic,$generic,[$ctxt,Pres,1]]]],
  [$ans,some_bear]]]
]},
{
answer:[[$ans,c1_John]],
blockers:[[$block,[$,bear,1],[$not,[prop,big,c1_John,$generic,$generic,[$ctxt,Pres,1]]]],
          [$block,[$,animal,3],[$not,[isa,bear,c1_John]]]],
confidence:0.765,
positive proof:
[
[1,[in,frm_10,axiom,0.85],
   [[$block,[$,bear,1],[$not,[prop,big,?:X,$generic,$generic,[$ctxt,Pres,1]]]],
    [prop,big,?:X,$generic,$generic,[$ctxt,Pres,1]],
    [-isa,bear,?:X]]],
[2,[in,frm_9,axiom,0.9],
   [[$block,[$,animal,3],[$not,[isa,bear,?:X]]],
    [-do2,like,?:X,?:Y,?:Z,[$ctxt,Pres,1]],
    [-isa,honey,?:Y],[-isa,animal,?:X],[isa,bear,?:X]]],
...,
[18,[mp,[1,2],[17,1],fromaxiom,0.765],
  [[$block,[$,bear,1],[$not,[prop,big,c1_John,$generic,$generic,[$ctxt,Pres,1]]]],
  [$block,[$,animal,3],[$not,[isa,bear,c1_John]]],
  [prop,big,c1_John,$generic,$generic,[$ctxt,Pres,1]]]],
...,
[21,[in,frm_30,goal,1],[[-$def2,?:X],[$ans,?:X]]],
[22,[mp,[20,2],21,fromgoal,0.765],
  [[$block,[$,bear,1],[$not,[prop,big,c1_John,$generic,$generic,[$ctxt,Pres,1]]]],
   [$block,[$,animal,3],[$not,[isa,bear,c1_John]]],
   [$ans,c1_John]]]
]}
]}
```
Observe that we get two answers. The following NLP pipeline step removes the generic [[\$ans,some_bear]], since the more informative [[\$ans,c1_John]] is available. Here both proofs contain only positive parts, although in the general case we may find both a positive and a negative proof, each with their own confidences. GK will throw away both the clauses produced during search and the final answers which have a summary confidence below a configurable threshold. GK will also throw away proofs which do not contain a goal clause. The confidences stemming from input sentences like "Most bears are big ..." are taken from our ad-hoc mapping of words like "most" to numeric values. By default, "normal" rule sentences are given a confidence below one and include a blocker literal for allowing exceptions.
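The post-processing of answers can be sketched as below. Several points are assumptions made for illustration: the product rule for confidences is merely consistent with the proof above (axioms with confidence 0.85 and 0.9 yield 0.765) and is not claimed to be GK's general rule; the threshold value and the `some_` prefix test for generic constants are hypothetical.

```python
from math import prod


def combined_confidence(axiom_confidences):
    """Multiply the confidences of the uncertain axioms used in a proof (assumption)."""
    return prod(axiom_confidences)


def select_answers(answers, threshold=0.1, generic_prefix="some_"):
    """Drop low-confidence answers, then drop generic ones if a concrete one remains."""
    kept = [a for a in answers if a["confidence"] >= threshold]
    concrete = [a for a in kept if not a["answer"].startswith(generic_prefix)]
    return concrete if concrete else kept


# The two answers from the proof above, with the second confidence recomputed.
answers = [
    {"answer": "some_bear", "confidence": 0.85},
    {"answer": "c1_John", "confidence": combined_confidence([0.85, 0.9])},
]
best = select_answers(answers)
```

With these inputs only the c1_John answer survives; if no concrete answer passes the threshold, the generic one is kept, which matches the "Likely a bear." outcome discussed for the blocked case.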

The answers contain blocker literals, which have been recursively checked by separate proof searches before the final proof is accepted by GK. The details of these failed searches are not shown in the final proof. Had we included the sentence "John is not big" in our example, then the proof of the first blocker of the main answer would have been found, thus disqualifying the proof and leaving us with the final answer "Likely a bear.".

#### **4.4 Answers and Explanations in Natural Language**

Answers and explanations are generated from the proof, with additional details taken from the database of objects along with their properties as detected during semantic parsing. While some of the principles were described in the previous section, there are two major tasks to perform: give a suitably detailed representation of objects in a proof (say, select between "a car", "a red car", "the red car", "Mike's car" etc.) and create a grammatically correct and easy-to-understand textual representation of clauses. The system translates clauses in a proof one-to-one to English sentences, as exemplified by the explanation generated from the previously presented proof:

```
Likely john:
Confidence 76%.
Sentences used:
(1) If an animal likes honey, then it is probably a bear.
(2) Most bears are big, although young bears are not big.
(3) John is an animal who likes honey.
(4) Who is big?
Statements inferred:
(1) If X is a bear, then X is big. Confidence 85%. Why: sentence 2.
(2) If X does like Y and Y is a honey and X is an animal, then X is a bear.
    Confidence 90%. Why: sentence 1.
(4) If John has a property def1, then John does like cs4. Why: sentence 3.
...
(18) John is big. Confidence 76%. Why: statements 1, 17.
...
(21) If X matches the query, then X is an answer. Why: the question.
(22) John is an answer. Confidence 76%. Why: statements 20, 21.
```
### **5 Performance and the Test Set**

The system has miserable performance on most well-known natural language inference or question answering benchmarks, the majority of which are oriented towards machine learning. As an exception, the performance on the anti-machine-learning question set HANS [13] is ca 95%, in contrast to the ca 60% performance of LLM systems before the GPT-3 family (random choice would give 50% performance). The loss of 5% on HANS is due to the wrong UD parses chosen by Stanza.

However, the system is able to solve almost all of the demonstration examples of the Allen AI ProofWriter system https://proofwriter.apps.allenai.org/ and is able to solve inference problems the current LLM systems cannot, like the examples presented in the introduction. For regression testing we have built a set of ca 1600 simple questions with answers, structured over different types of capabilities. This test set may be of use for people working towards similar goals.

The runtime for the small examples presented in the paper is ca 0.5 s on a Linux laptop with a graphics card usable by Stanza. Of this time, Stanza UD parsing takes ca 0.17 s, UD to logic takes ca 0.04 s, and the rest is spent by the reasoner. For more complex examples the reasoner may spend unlimited time, i.e. the question is rather how complex questions can be solved in a preconfigured time window. In case the size of the input problem is relatively small and a tiny world model suffices for the solution, the correct answer is found in ca 1–2 s. However, in case the system is given a large knowledge base (KB) with a size of roughly one gigabyte, and the answer actually depends on the KB, then the search space may explode and the system may fail to find an answer in a reasonable time. Efficiently handling a very large knowledge base clearly requires suitable heuristics based on the semantics and interdependence of rules/facts in the KB.

### **6 Towards a Hybrid Neurosymbolic System**

Although the scope of the sentences successfully parsed and questions answered could be improved by adding more and more specialized cases to the current system, the cost/benefit ratio of this work would rapidly decrease. We'll describe the most promising avenues of extending the system with ML hybridization as we currently see them.

*Semantic Parsing.* The two main approaches would be (a) end-to-end learning from sentences directly to extended logic as exemplified in [31], and (b) using existing LLMs or training specialized LLMs to perform simplification of sentences to the level where a hand-made semantic parser is able to convert the sentence to logic. Our initial experiments with the GPT models have shown that using a suitable prompt causes the LLMs to successfully split and simplify complex sentences.

*Automated Reasoning.* Despite being optimized for large knowledge bases and performing well in reasoning competitions on such problems, our system often fails to find nontrivial proofs in reasonable time in case a large knowledge base is used. The main approaches here would be (a) learning to find a proof, based on the experience of previous proofs (see [19] for an example), (b) using machine learning along with measures of semantic relatedness of formulas to the assumption and the question (see [6] for an example), (c) using LLMs to predict intermediate results or relevant facts and rules. A significant boost in terms of usability could be achieved by integrating external systems like databases and scientific computing with the automated reasoners.

*The Knowledge Base.* Publicly available knowledge bases do not focus on formalizing a basic world model, arguably critical for common-sense reasoning. It is possible that a core part needs to be built by hand. On the other hand, the existing knowledge bases along with large text corpuses can be extended by creating crucial new uncertain rules using both simpler statistical methods and more complex ML techniques: see [8] for a review.

### **7 Summary and Future Work**

We have described an implementation of a full natural language inference and question answering pipeline built around an extended first order reasoner. The system is capable of understanding relatively simple sentences and giving reasonable answers to questions, including the types currently out of scope of the capabilities of LLMs. We plan to enhance the capabilities of the system by incorporating machine learning techniques into the components of the pipeline, while keeping the overall architecture, including the semantic parser, word knowledge and a reasoner. At the time of this writing we are experimenting with using off-the-shelf LLMs without finetuning, but with a suitable prompt, to split and simplify complex sentences to a degree where our semantic parser is able to properly convert the meaning of the resulting sentences to logic.

### **References**


et al. (eds.) Proceedings of CIKM 2019 - the 28th ACM International Conference on Information and Knowledge Management, pp. 1411–1420. ACM (2019)



# Combining Combination Properties: An Analysis of Stable Infiniteness, Convexity, and Politeness

Guilherme V. Toledo<sup>1(B)</sup>, Yoni Zohar<sup>1</sup>, and Clark Barrett<sup>2</sup>

<sup>1</sup> Bar-Ilan University, Ramat Gan, Israel guivtoledo@gmail.com, yoni.zohar@cs.tau.ac.il <sup>2</sup> Stanford University, Stanford, USA barrett@cs.stanford.edu

Abstract. We make two contributions to the study of theory combination in satisfiability modulo theories. The first is a table of examples for the combinations of the most common model-theoretic properties in theory combination, namely stable infiniteness, smoothness, convexity, finite witnessability, and strong finite witnessability (and therefore politeness and strong politeness as well). All of our examples are sharp, in the sense that we also offer proofs that no theories are available within simpler signatures. This table significantly progresses the current understanding of the various properties and their interactions. The most remarkable example in this table is of a theory over a single sort that is polite but not strongly polite (the existence of such a theory was only known until now for two-sorted signatures). The second contribution is a new combination theorem showing that in order to apply polite theory combination, it is sufficient for one theory to be stably infinite and strongly finitely witnessable, thus showing that smoothness is not a critical property in this combination method. This result has the potential to greatly simplify the process of showing which theories can be used in polite combination, as showing stable infiniteness is considerably simpler than showing smoothness.

Keywords: Satisfiability modulo theories · Theory combination · Theory politeness

### 1 Introduction

Theory combination focuses on the following problem: given procedures for determining the satisfiability of formulas over individual theories, can we find a procedure for the combined theory? One of the foundational results in this field is in Nelson and Oppen's paper [9], where the authors show how to combine theories with disjoint signatures as long as they are both stably infinite, i.e., for every quantifier-free formula that is satisfied in the theory, there is an infinite interpretation of the theory that satisfies it.

With the introduction of stable infiniteness was born the notion of identifying model-theoretic properties that enable theory combination. It soon became clear, however, that this first step was insufficient, since some important theories with real-world applications (like the theories of bit-vectors and finite datatypes) turned out not to be stably infinite. Early attempts to find alternatives for stable infiniteness in theory combination included the introduction of gentle [5], shiny [12], and flexible [7] theories. We focus here on the notion of *politeness*, which forms the basis for theory combination in the state-of-the-art SMT solver cvc5 [1].

First considered in [10], polite theories were originally defined as those theories that are both smooth and finitely witnessable. Both notions are much harder to test for than stable infiniteness, but once a theory is known to be polite, it can be combined with any other theory, even non-stably-infinite ones.

A small problem in the proof of the main result of the paper was corrected in later work [6]. This paper introduces a slightly different, more strict, definition of politeness, together with a correct proof showing that polite theories can be combined with arbitrary theories. Following [4], we refer to theories satisfying the new definition as *strongly* polite, which is defined as being both smooth and *strongly finitely witnessable*; with that in mind, we call theories satisfying the earlier definition simply *polite*.

For some time, it was not known whether there exists a theory that is polite but not strongly polite. Then, in 2021 Sheng et al. [11] provided an example. This suggests the need for a more thorough analysis of properties such as stable infiniteness, smoothness, finite witnessability, and strong finite witnessability, as they appear to interact with each other in sometimes surprising or unforeseeable ways. We add to this list *convexity*, which was shown to be closely related to stable infiniteness in [2].

In this paper, we provide an exhaustive analysis, with examples whenever possible, of whether and how these properties can coexist. Some combinations are obviously impossible, such as a strongly finitely witnessable theory that is not finitely witnessable; the feasibility of other combinations is more elusive; for instance, it is initially unclear whether there can be a one-sorted, non-stably-infinite theory that is also not finitely witnessable (we show that this is also impossible). A main result is a comprehensive table describing what is known about all possible combinations of these properties.

During the course of filling the table, we were also able to improve polite combination: by making the involved proof slightly more difficult, we can simplify the main polite theory combination result: we show that in order to combine theories, it is enough for one theory to be stably infinite and strongly finitely witnessable; there is no need for smoothness. This result simplifies the process of qualifying a theory for polite combination, as showing stable infiniteness is considerably simpler than showing smoothness.

The paper is organized as follows. Section 2 defines the basic notions we will make use of throughout the paper. Section 3 proves several theorems showing the unfeasibility of certain combinations of properties. Section 4 describes the example theories that populate the feasible entries of the table. Section 5 offers a new combination theorem. And finally, Sect. 6 gives concluding remarks and directions for future work.<sup>1</sup>

<sup>1</sup> Due to space limitations, proofs are included in an appendix to [13].

$$\psi^{\sigma}_{\geq n} = \exists \overrightarrow{x}. \bigwedge_{1 \leq i < j \leq n} \neg(x_i = x_j), \quad \psi^{\sigma}_{\leq n} = \exists \overrightarrow{x}. \forall y. \bigvee_{i=1}^{n} y = x_i, \quad \psi^{\sigma}_{= n} = \psi^{\sigma}_{\geq n} \land \psi^{\sigma}_{\leq n}$$

Fig. 1. Cardinality Formulas. $\overrightarrow{x}$ stands for $x_1, \ldots, x_n$.

### 2 Preliminary Notions

#### 2.1 First-Order Signatures and Structures

A many-sorted signature $\Sigma$ is a triple formed by a countable set $\mathcal{S}_\Sigma$ of *sorts*, a countable set of function symbols $\mathcal{F}_\Sigma$, and a countable set of predicate symbols $\mathcal{P}_\Sigma$ which contains, for every sort $\sigma \in \mathcal{S}_\Sigma$, an equality symbol $=_\sigma$ (often denoted by $=$); each function symbol has an arity $\sigma_1 \times \cdots \times \sigma_n \rightarrow \sigma$ and each predicate symbol an arity $\sigma_1 \times \cdots \times \sigma_n$, where $\sigma_1, \ldots, \sigma_n, \sigma \in \mathcal{S}_\Sigma$ and $n \in \mathbb{N}$. Each equality symbol $=_\sigma$ has arity $\sigma \times \sigma$. A signature with no function or predicate symbols other than equalities is called *empty*.

A many-sorted signature $\Sigma$ is *one-sorted* if $\mathcal{S}_\Sigma$ has one element; we may refer to many-sorted signatures simply as signatures. Two signatures are said to be *disjoint* if they share only sorts and equality symbols.
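As an informal illustration of these definitions, a signature can be modelled as a small data structure. This is a sketch under simplifying assumptions: arities are omitted, symbols are identified by name, and the equality symbol of each sort is left implicit, so it is never stored among the predicates. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    sorts: frozenset
    functions: frozenset = frozenset()   # names of function symbols
    predicates: frozenset = frozenset()  # names of non-equality predicate symbols

    def is_empty(self) -> bool:
        """No function or predicate symbols other than the implicit equalities."""
        return not self.functions and not self.predicates

    def is_one_sorted(self) -> bool:
        return len(self.sorts) == 1


def disjoint(s1: Signature, s2: Signature) -> bool:
    """Signatures are disjoint if they share at most sorts and equality symbols."""
    return not (s1.functions & s2.functions) and not (s1.predicates & s2.predicates)


ints = Signature(frozenset({"int"}), frozenset({"+"}), frozenset({"<"}))
lists = Signature(frozenset({"int", "list"}), frozenset({"cons"}))
```

In this toy setting `ints` and `lists` share the sort `int` but no function or predicate symbols, so they count as disjoint.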

We assume for each sort in $\mathcal{S}_\Sigma$ a distinct countably infinite set of variables, and define terms, literals, and formulas (atomic or not) in the usual way. If $s$ is a function symbol of arity $\sigma \rightarrow \sigma$ and $x$ is a variable of sort $\sigma$, we define recursively the term $s^k(x)$, for $k \in \mathbb{N}$, as follows: $s^0(x) = x$, and $s^{k+1}(x) = s(s^k(x))$. We denote the set of free variables of sort $\sigma$ in a formula $\varphi$ by $\mathit{vars}_\sigma(\varphi)$, and given $S \subseteq \mathcal{S}_\Sigma$, $\mathit{vars}_S(\varphi) = \bigcup_{\sigma \in S} \mathit{vars}_\sigma(\varphi)$ (we use $\mathit{vars}(\varphi)$ as shorthand for $\mathit{vars}_{\mathcal{S}_\Sigma}(\varphi)$).

A Σ*-structure* A is composed of sets σ<sup>A</sup> for each sort σ ∈ S<sub>Σ</sub>, called the *domain of* σ, equipped with interpretations f<sup>A</sup> and P<sup>A</sup> of the function and predicate symbols, in a way that respects their arities. Furthermore, =<sub>σ</sub><sup>A</sup> must be the identity on σ<sup>A</sup>.

A Σ*-interpretation* A is an extension of a Σ-structure that also interprets variables, with the value of a variable x of sort σ being an element x<sup>A</sup> of σ<sup>A</sup>; we will sometimes say that an interpretation B is an interpretation on a structure A (over the same signature) to mean that B has A as its underlying structure. We write α<sup>A</sup> for the interpretation of the term α under A; if Γ is a set of terms, we define Γ<sup>A</sup> = {α<sup>A</sup> : α ∈ Γ}. We write A ⊨ ϕ if A satisfies ϕ. A formula ϕ is called *satisfiable* if it is satisfied by some interpretation A.

We shall make use of standard cardinality formulas, given in Fig. 1. ψ<sup>σ</sup><sub>≥n</sub> is only satisfied by a structure A if |σ<sup>A</sup>| is at least n, ψ<sup>σ</sup><sub>≤n</sub> is only satisfied by A if |σ<sup>A</sup>| is at most n, and ψ<sup>σ</sup><sub>=n</sub> is only satisfied by A if |σ<sup>A</sup>| is exactly n. In one-sorted signatures, we may drop σ from the formulas, giving us ψ<sub>≥n</sub>, ψ<sub>≤n</sub> and ψ<sub>=n</sub>.
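To make the reading of the cardinality formulas concrete, the following sketch evaluates them by brute force over a finite domain, following the quantifier structure of Fig. 1 literally. The function names are hypothetical, and the check is only meant for small finite domains and n ≥ 1.

```python
from itertools import product


def sat_at_least(n, domain):
    """Some tuple x1..xn of domain elements is pairwise distinct."""
    return any(len(set(xs)) == n for xs in product(domain, repeat=n))


def sat_at_most(n, domain):
    """Some tuple x1..xn is such that every y equals one of the xi."""
    return any(all(y in xs for y in domain) for xs in product(domain, repeat=n))


def sat_exactly(n, domain):
    """Conjunction of the two formulas above."""
    return sat_at_least(n, domain) and sat_at_most(n, domain)
```

On a domain with m elements, these three functions return the truth values of m ≥ n, m ≤ n, and m = n, respectively, which matches the intended meaning of the formulas.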

The following lemmas are generalizations of the standard compactness and downward Löwenheim-Skolem theorems of first-order logic to the many-sorted case. They are proved in [8].

Lemma 1 ([8]). *A set of formulas is satisfiable iff each of its finite subsets is satisfiable.*

Lemma 2 ([8]). *If a set of formulas is satisfiable, there exists an interpretation* A *which satisfies it and where* σ<sup>A</sup> *is countable whenever it is infinite, for every sort* σ*.*

A *theory* T is the class of all Σ-structures that satisfy some set of closed formulas (formulas without free variables), called the *axiomatization* of T, which we denote as *Ax*(T); such structures will be called the *models* of T, a model being called *trivial* when σ<sup>A</sup> is a singleton for some sort σ in S<sub>Σ</sub>. A Σ-interpretation A whose underlying structure is in T is called a T-interpretation. A formula is said to be T*-satisfiable* if there is a T-interpretation that satisfies it; a set of formulas is T-satisfiable if there is a T-interpretation that satisfies each of its elements. Two formulas are T*-equivalent* when every T-interpretation satisfies one if and only if it satisfies the other. We write ⊨<sub>T</sub> ϕ and say that ϕ is T-valid if A ⊨ ϕ for every T-interpretation A.

Let Σ<sub>1</sub> and Σ<sub>2</sub> be disjoint signatures; by Σ = Σ<sub>1</sub> ∪ Σ<sub>2</sub>, we mean the signature with the union of the sorts, function symbols, and predicate symbols of Σ<sub>1</sub> and Σ<sub>2</sub>, all arities preserved. Given a Σ<sub>1</sub>-theory T<sub>1</sub> and a Σ<sub>2</sub>-theory T<sub>2</sub>, the Σ<sub>1</sub> ∪ Σ<sub>2</sub>-theory T = T<sub>1</sub> ⊕ T<sub>2</sub> is the theory axiomatized by the union of the axiomatizations of T<sub>1</sub> and T<sub>2</sub>.

#### 2.2 Model-Theoretic Properties

Let Σ be a signature. A Σ-theory T is said to be *stably infinite* w.r.t. S ⊆ S<sub>Σ</sub> if, for every T-satisfiable quantifier-free formula φ, there exists a T-interpretation A satisfying φ such that, for each σ ∈ S, σ<sup>A</sup> is infinite. T is *smooth* w.r.t. S ⊆ S<sub>Σ</sub> when, for every quantifier-free formula φ, T-interpretation A satisfying φ, and function κ from S to the class of cardinals such that κ(σ) ≥ |σ<sup>A</sup>| for every σ ∈ S, there exists a T-interpretation B satisfying φ with |σ<sup>B</sup>| = κ(σ), for every σ ∈ S.

Theorem 1. *Let* Σ *be a signature,* S ⊆ S<sup>Σ</sup>*, and* T *a* Σ*-theory. If* T *is smooth w.r.t.* S*, then it is also stably infinite w.r.t.* S*.*

For a finite set of sorts S, finite sets of variables V<sub>σ</sub> of sort σ for each σ ∈ S, and equivalence relations E<sub>σ</sub> on V<sub>σ</sub>, the arrangement on V = ⋃<sub>σ∈S</sub> V<sub>σ</sub> induced by E = ⋃<sub>σ∈S</sub> E<sub>σ</sub>, denoted by δ<sub>V</sub> or δ<sup>E</sup><sub>V</sub>, is the quantifier-free formula given by $\delta\_V = \bigwedge\_{\sigma \in S} \big[ \bigwedge\_{x E\_\sigma y} (x = y) \wedge \bigwedge\_{x \overline{E\_\sigma} y} \neg(x = y) \big]$, where $\overline{E\_\sigma}$ denotes the complement of the equivalence relation E<sub>σ</sub>.
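The following sketch shows how an arrangement can be written out for a single sort, given the equivalence relation as a partition of the variables. The helper name `arrangement` and the textual output format are assumptions for illustration. To keep the output short, the sketch emits one literal per unordered pair of distinct variables, which is logically equivalent to ranging over all related pairs as in the definition.

```python
from itertools import combinations


def arrangement(partition):
    """Build the arrangement induced by a partition of variable names.

    partition: list of blocks, each block a list of variable names that
    are related by the equivalence relation.
    Returns a conjunction (as a string) with an equality for every pair
    in the same block and a disequality for every pair in different blocks.
    """
    block_of = {v: i for i, block in enumerate(partition) for v in block}
    literals = []
    for x, y in combinations(sorted(block_of), 2):
        if block_of[x] == block_of[y]:
            literals.append(f"{x} = {y}")
        else:
            literals.append(f"{x} != {y}")
    # With fewer than two variables there is nothing to constrain.
    return " & ".join(literals) if literals else "true"
```

For the partition `[["x", "y"], ["z"]]`, the result states that x and y are equal and that z differs from both.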

A theory T is said to be *finitely witnessable* w.r.t. the set of sorts S ⊆ S<sub>Σ</sub> when there exists a function *wit*, called a *witness*, from the quantifier-free formulas into themselves that is computable and satisfies for every quantifier-free formula φ: (i) φ and ∃$\overrightarrow{w}$. *wit*(φ) are T-equivalent, where $\overrightarrow{w}$ = *vars*(*wit*(φ)) \ *vars*(φ); and (ii) if *wit*(φ) is T-satisfiable, then there exists a T-interpretation A satisfying *wit*(φ) such that σ<sup>A</sup> = *vars*<sub>σ</sub>(*wit*(φ))<sup>A</sup> for each σ ∈ S. T is said to be *strongly finitely witnessable* if it has a strong witness *wit*, which has the properties of a witness with the exception of (ii), satisfying instead: (ii′) given a finite set of variables V and an arrangement δ<sub>V</sub> on V, if *wit*(φ) ∧ δ<sub>V</sub> is T-satisfiable, then there exists a T-interpretation A satisfying *wit*(φ) ∧ δ<sub>V</sub> such that σ<sup>A</sup> = *vars*<sub>σ</sub>(*wit*(φ) ∧ δ<sub>V</sub>)<sup>A</sup> for all σ ∈ S.

From the definitions, the following theorem directly follows:

Theorem 2. *Let* Σ *be a signature,* S ⊆ S<sup>Σ</sup>*, and* T *a* Σ*-theory. If* T *is strongly finitely witnessable w.r.t.* S *then it is also finitely witnessable w.r.t.* S*.*

A theory that is both smooth and finitely witnessable w.r.t. (a set of sorts) S is said to be *polite* w.r.t. S; a theory that is both smooth and strongly finitely witnessable w.r.t. S is called *strongly polite* w.r.t. S. For theories over one-sorted empty signatures, we have the following theorem from [11]:

Theorem 3 ([11]). *Every one-sorted theory over the empty signature that is polite w.r.t. its only sort is strongly polite w.r.t. that sort.*

A one-sorted theory T is said to be *convex* if, for any conjunction of literals φ and any finite set of variables {u<sub>1</sub>, v<sub>1</sub>, ..., u<sub>n</sub>, v<sub>n</sub>}, ⊨<sub>T</sub> φ → ⋁<sup>n</sup><sub>i=1</sub> u<sub>i</sub> = v<sub>i</sub> implies ⊨<sub>T</sub> φ → u<sub>i</sub> = v<sub>i</sub>, for some i ∈ [1, n].

Given a one-sorted theory T, its *mincard* function takes a quantifier-free formula φ and returns the countable cardinal min{|σ<sup>A</sup>| : A is a T-interpretation that satisfies φ}.<sup>2</sup>

Throughout this paper, we will use SI for stably infinite, SM for smooth, FW for finitely witnessable, SW for strongly finitely witnessable, and CV for convex.

### 3 Negative Results

If it were possible, we would present examples of every combination of properties using only the one-sorted empty signature, which is the simplest signature imaginable.

Of course, this is not always possible: smooth theories are necessarily stably infinite, and strongly finitely witnessable theories are obligatorily finitely witnessable. But there are several other connections we now proceed to show, which further restrict the combinations of properties that are possible.

In Sect. 3.1, we show that, under reasonable conditions, a convex theory must be stably infinite, while the converse also holds over the empty signature. In Sect. 3.2, we show that over the empty one-sorted signature, theories that are not stably infinite are necessarily finitely witnessable (a somewhat counter-intuitive result, since we usually look for theories that are, simultaneously, smooth and strongly finitely witnessable) and, more importantly, that stably infinite and strongly finitely witnessable one-sorted theories are also strongly polite.

<sup>2</sup> Note that this definition was generalized in two different ways to the many-sorted case in [4] and [10]. However, for our investigation, the single-sorted case is enough.

#### 3.1 Stable-Infiniteness and Convexity

Convexity is typically defined over one-sorted signatures. Here we offer the following generalization to arbitrary signatures.

Definition 1. *A theory* T *is said to be convex w.r.t. a set of sorts* S ⊆ S<sub>Σ</sub> *if, for any conjunction of literals* φ *and any finite set of variables* {u<sub>1</sub>, v<sub>1</sub>, ..., u<sub>n</sub>, v<sub>n</sub>} *with sorts in* S*, if* ⊨<sub>T</sub> φ → ⋁<sup>n</sup><sub>i=1</sub> u<sub>i</sub> = v<sub>i</sub> *then* ⊨<sub>T</sub> φ → u<sub>i</sub> = v<sub>i</sub>*, for some* i ∈ [1, n]*.*

If we assume, as it is often natural to, that our theories have no trivial models, then convexity implies stable infiniteness. This is true for the one-sorted case, as proved in [2], but also for the many-sorted case as we show here. The proof is similar, though here we need to account for several sorts at once. In particular, the proof relies on Lemma 1.

Theorem 4. *If a* Σ*-theory* T *is convex w.r.t. some set* S *of sorts and, for each* σ ∈ S*,* ⊨<sub>T</sub> ψ<sup>σ</sup><sub>≥2</sub>*, then* T *is stably infinite w.r.t.* S*.*

Conversely, we may also obtain convexity from stable infiniteness, but only over empty signatures.

Theorem 5. *Any theory over an empty signature that is stably infinite w.r.t. the set of all of its sorts is convex w.r.t. any set of sorts.*

As we shall see in Sect. 4, this result is tight: there are theories over non-empty signatures that are stably infinite but not convex.

#### 3.2 More Connections

We next present more connections between the properties. First, over the one-sorted empty signature, a theory must be either stably infinite or finitely witnessable.

Theorem 6. *Every one-sorted, non-stably-infinite theory* T *with an empty signature is finitely witnessable w.r.t. its only sort.*

The following theorem shows that for one-sorted theories, strong politeness is a corollary of strong finite witnessability and stable infiniteness (rather than smoothness).

Theorem 7. *Every one-sorted theory that is stably infinite and strongly finitely witnessable w.r.t. its only sort is smooth, and therefore strongly polite w.r.t. that sort.*

Generalizing this theorem to the case of many-sorted signatures is left for future work.

Finally, by combining previous results, we can also get the following theorem, which relates stable infiniteness, strong finite witnessability, and convexity.

Fig. 2. A diagram of combinations over a one-sorted, empty signature: gray regions are empty.

Theorem 8. *A one-sorted theory* T *with an empty signature that is neither strongly finitely witnessable nor stably infinite w.r.t. its only sort cannot be convex.*

To summarize, while Theorem 4 is restricted to structures with no domains of cardinality 1, the remaining theorems of this section are not restricted to such structures. Theorem 5 applies to empty signatures, Theorem 7 applies to one-sorted signatures, and Theorems 6 and 8 apply to signatures that are both empty and one-sorted. Put together, we see that many combinations of properties for theories over a one-sorted empty signature are actually impossible. This is depicted in Fig. 2, in which all areas but the white ones are empty. For example, Theorem 6 shows that the area outside the SI and FW circles (representing theories that are neither stably infinite nor finitely witnessable) is empty, as every theory (over an empty one-sorted signature) must have one of these properties. Similarly, Theorem 8 further shows that within the CV (convex) circle, even more is empty, namely anything outside the SI and SW circles.

### 4 Positive Results

We now proceed to systematically address all possible combinations of stable infiniteness, smoothness, finite witnessability, strong finite witnessability, and convexity.

The results are summarized in Table 1. Each row corresponds to a possible combination of properties, as determined by the truth values in the first five columns. For example, in the first row, the entries in the first five columns are all true, indicating that in this row, all theory examples must be stably-infinite, smooth, finitely witnessable, strongly finitely witnessable, and convex. The rest of the columns correspond to different possibilities for the theory signatures: either empty or non-empty, and either one-sorted or many-sorted. Again, looking at the first row, we see four different theories listed, one for each of the signature possibilities.

Some entries in the table list theorems instead of providing example theories. The listed theorems tell us that there do not exist any example theories for these entries. For example, lines 3 and 4 cannot provide examples over a one-sorted empty signature because of Theorem 3.

When an example is available, its name is given in the corresponding cell of the table. The theories themselves are defined in Sects. 4.1 to 4.4. The examples on lines 25, 27 and 31 must have at least one structure with a trivial domain (i.e., a domain with exactly one element) because of Theorem 4.

Lines 9, 10, 13, and 14 cover theories that are stably infinite and strongly finitely witnessable but not smooth. We call these *unicorn theories* because we could not find any such theories, nor do we believe they exist, but (ignoring the obvious cases ruled out by Theorems 2, 5 and 7) we have no proof that they do not exist.

Definition 2. *A* unicorn theory *is stably infinite and strongly finitely witnessable but not smooth.*

Theorem 7 shows that there are no one-sorted unicorn theories. We believe it may be possible to provide a generalization of the upwards Löwenheim-Skolem theorem to many-sorted logic in such a way that it would prove the non-existence of unicorn theories, which leads to the following conjecture:

Conjecture 1. There are no unicorn theories.

Before defining the theories of Table 1, we introduce the following signatures.

Definition 3. Σ<sub>1</sub> *is the empty one-sorted signature with sort* σ*,* Σ<sub>2</sub> *is the empty two-sorted signature with sorts* σ *and* σ<sub>2</sub>*, and* Σ<sub>s</sub> *is the one-sorted signature with a single unary function symbol* s*.*

We now describe the theories: Sect. 4.1 describes the theories that are over the empty one-sorted signature; Sect. 4.2 then continues to the next column, describing theories over many-sorted empty signatures. Some build on the theories of the previous column, but some are also new. Section 4.3 describes the next column, one-sorted theories over a non-empty signature. Here, we use two constructions to generate new theories from previously introduced ones. One construction adds a function symbol to an empty signature (in a way that preserves all properties), and the second preserves all properties but convexity, making it possible to construct non-convex examples in a uniform way. We also present new theories when the constructions are not sufficient. Finally, Sect. 4.4 describes theories over non-empty many-sorted signatures.<sup>3</sup>

<sup>3</sup> Proofs that each theory has the claimed properties can be found in the appendix to [13].

Table 1. Summary of all possible combinations of theory properties. Shaded cells represent impossible combinations. In line 26: n > 1; in line 28: m > 1, n > 1 and |m − n| > 1.


#### 4.1 Theories over the One-Sorted Empty Signature


Table 3. Σ<sub>2</sub>-theories


The axiomatizations for theories over the one-sorted empty signature Σ<sup>1</sup> are given in Table 2. We briefly describe them here.

For each n > 0, T<sub>≥n</sub> includes all structures with domains of cardinality at least n; T<sub>∞</sub> is the theory including all structures whose domains are infinite; T<sup>∞</sup><sub>even</sub> has structures with either an even or an infinite number of elements in their domains and was defined in [11], where it was proved to be finitely witnessable, but neither smooth nor strongly finitely witnessable. The proofs justifying Table 1 show additionally that it is stably infinite and convex. T<sub>n,∞</sub> contains those structures whose domains have either exactly n or an infinite number of elements; T<sub>≤n</sub> includes all structures with at most n elements in their domains; and for positive integers m and n, T<sub>m,n</sub> has structures whose domains have either precisely m elements, or precisely n elements. This completes the first column of theory examples.

*Example 1.* The theory T<sub>≥n</sub> admits all considered properties, while T<sub>m,n</sub> admits only finite witnessability.
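As a rough illustration of how these theories differ, the sketch below represents each theory over the one-sorted empty signature by the domain sizes it allows. This representation is an assumption made here: it relies on the observation that a quantifier-free formula over an empty signature only constrains which variables are equal, and it identifies all infinite cardinalities with a single value `math.inf`. The function names and the search bound are hypothetical, and the smoothness check is a bounded sanity check rather than a proof.

```python
import math

# Each theory is modeled as a predicate on a cardinality:
# a positive int, or math.inf standing for any infinite size.
def t_geq(n):      return lambda c: c >= n
def t_inf():       return lambda c: c == math.inf
def t_even_inf():  return lambda c: c == math.inf or c % 2 == 0
def t_n_inf(n):    return lambda c: c == n or c == math.inf
def t_leq(n):      return lambda c: c <= n
def t_m_n(m, n):   return lambda c: c in (m, n)


def stably_infinite(allows):
    """Assuming the theory has some model: any satisfiable quantifier-free
    formula can be moved to an infinite model iff infinite domains are allowed."""
    return allows(math.inf)


def smooth(allows, bound=50):
    """Bounded check that allowed sizes are closed under growing the domain."""
    sizes = list(range(1, bound + 1)) + [math.inf]
    return all(allows(k) for c in sizes if allows(c) for k in sizes if k >= c)
```

Under these assumptions, the check reports the theory of domains with at least n elements as both stably infinite and smooth, and the theory with exactly m or n elements as neither, in line with Example 1.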

#### 4.2 Theories over the Two-Sorted Empty Signature

We next introduce the theories over empty two-sorted signatures. For many cases, we can simply add a trivial sort to one of the theories defined in Sect. 4.1. When this is not possible, we introduce new theories.

Adding a Sort to a Theory. Any Σ<sub>1</sub>-theory can be used to generate a Σ<sub>2</sub>-theory simply by adding the sort σ<sub>2</sub> to the signature (without changing the axiomatization). This is formalized as follows:

Definition 4. *Let* T *be a* Σ<sub>1</sub>*-theory.* (T)<sup>2</sup> *is the* Σ<sub>2</sub>*-theory axiomatized by Ax*(T)*.*

Lemma 3. *A* Σ<sub>1</sub>*-theory* T *is stably infinite, smooth, finitely witnessable, strongly finitely witnessable, or convex w.r.t.* {σ} *if and only if* (T)<sup>2</sup> *is, respectively, stably infinite, smooth, finitely witnessable, strongly finitely witnessable, or convex w.r.t.* {σ, σ<sub>2</sub>}*.*

Using Definition 4 and Lemma 3, we can populate many lines in the second column of examples by extending the corresponding theory from the previous column.

*Example 2.* (T<sub>≥n</sub>)<sup>2</sup> is a theory over two sorts, σ and σ<sub>2</sub>, whose structures must have at least n elements in the domain of σ (but have no restrictions on the size of the domain of σ<sub>2</sub>). As seen in the first line of Table 1, T<sub>≥n</sub> admits all the considered properties. By Lemma 3, so does (T<sub>≥n</sub>)<sup>2</sup>.

Additional Theories over Σ<sub>2</sub>. On some lines, e.g., line 3, there is no Σ<sub>1</sub>-theory to extend. In such cases, we cannot use Definition 4 to construct a many-sorted variant.

We introduce the theories shown in Table 3 to cover these cases. The theory T<sub>2,3</sub> contains two kinds of structures: (i) structures whose domains both have at least 3 elements; and (ii) structures with exactly two elements in the domain of σ and an infinite number of elements in the domain of σ<sub>2</sub>. The theory T<sup>odd</sup><sub>1</sub> has structures with exactly one element in the domain of σ and either an odd or an infinite number of elements in the domain of σ<sub>2</sub>. The theory T<sup>∞</sup><sub>1</sub> is similar: it has structures with exactly one element in the domain of σ and an infinite number of elements in the domain of σ<sub>2</sub>. Finally, T<sup>∞</sup><sub>2</sub> is similar to T<sup>∞</sup><sub>1</sub> except that its structures have exactly 2 elements in the domain of σ.
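The prose descriptions above can be restated as membership tests on pairs of domain sizes, one size per sort. This is only a sketch: the function names are hypothetical, all infinite cardinalities are identified with `math.inf`, and `closed_under_growth` is a bounded check of whether enlarging either domain stays inside the theory, not a proof of smoothness.

```python
import math

INF = math.inf  # stands for any infinite cardinality (a simplification)


def in_t_2_3(a, b):
    """a, b: sizes of the domains of the first and second sort."""
    return (a >= 3 and b >= 3) or (a == 2 and b == INF)


def in_t_odd_1(a, b):
    return a == 1 and (b == INF or b % 2 == 1)


def in_t_inf_1(a, b):
    return a == 1 and b == INF


def in_t_inf_2(a, b):
    return a == 2 and b == INF


def closed_under_growth(member, bound=12):
    """Every allowed pair of sizes stays allowed when either size grows."""
    sizes = list(range(1, bound + 1)) + [INF]
    return all(
        member(a2, b2)
        for a in sizes for b in sizes if member(a, b)
        for a2 in sizes if a2 >= a
        for b2 in sizes if b2 >= b
    )
```

Within the bound, the first theory passes the growth check: enlarging the two-element domain in case (ii) lands in case (i), because the second domain is already infinite. The other three theories fail it, since their first domain has a fixed size.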

*Example 3.* The theory T<sub>2,3</sub> was first defined in [4] and later used in [11], where it was proved to be polite (and therefore smooth, stably infinite, and finitely witnessable) without being strongly polite (and therefore not strongly finitely witnessable). The justification proofs for Table 1 show that T<sub>2,3</sub> is convex as well.<sup>4</sup>

#### 4.3 Theories over a One-Sorted Non-empty Signature

We continue to the next column, with one-sorted non-empty signatures. We first show how to construct theories over a non-empty signature from theories over the empty signature, while preserving all their properties. We then provide a similar construction which generates non-convex theories from the theories in the first column of examples. Finally, we introduce additional theories not captured by the above constructions; two of these theories are described in more detail at the end of this section.

Extending a Theory with a Unary Function Symbol While Preserving Properties. Whenever we have a theory over an empty signature, we can construct a variant of it over a non-empty signature by introducing a function symbol and interpreting it as the identity function. This extension preserves all the properties that we consider. This is formalized as follows.

Definition 5. *Let* Σ<sub>n</sub> *be an empty signature with sorts* S = {σ<sub>1</sub>, ..., σ<sub>n</sub>}*, and let* T *be a* Σ<sub>n</sub>*-theory. The signature* Σ<sup>n</sup><sub>s</sub> *has sorts* S *and a single unary function symbol* s *of arity* σ<sub>1</sub> → σ<sub>1</sub>*, and* (T)<sup>s</sup> *is the* Σ<sup>n</sup><sub>s</sub>*-theory axiomatized by Ax*(T) ∪ {∀x. [s(x) = x]}*, where* x *is a variable of sort* σ<sub>1</sub>*.*

Lemma 4. *For every theory* T *over an empty signature* Σ<sub>n</sub> *with sorts* S = {σ<sub>1</sub>, ..., σ<sub>n</sub>}*:* T *is stably infinite, smooth, finitely witnessable, strongly finitely witnessable, or convex w.r.t.* S *if and only if* (T)<sup>s</sup> *is, respectively, stably infinite, smooth, finitely witnessable, strongly finitely witnessable, or convex w.r.t.* S*.*

We use the operator (·)<sup>s</sup> in various places in Table 1 in order to obtain examples in non-empty signatures from existing examples over Σ<sup>1</sup> and Σ2.

*Example 4.* (T<sub>≥n</sub>)<sup>s</sup> is a one-sorted theory, whose structures have at least n elements and interpret the function symbol s as the identity. As seen above, T<sub>≥n</sub> admits all the considered properties. By Lemma 4, so does (T<sub>≥n</sub>)<sup>s</sup>.

<sup>4</sup> We thank Oded Padon for raising the question of whether there exists a theory that is polite and convex, but not strongly polite.

Making a Theory Non-convex. The last general construction that we present aims at taking a theory and creating a non-convex variant of it while preserving the other properties we consider. This can be done with the addition of a single unary function symbol s. To define such a theory, we make use of the formula ψ<sub>∨</sub> from Fig. 3. Intuitively, ψ<sub>∨</sub> states that in an interpretation A in which it holds, s<sup>A</sup>(s<sup>A</sup>(a)) must equal either s<sup>A</sup>(a) or a itself; in other words, either a = s<sup>A</sup>(a) = s<sup>A</sup>(s<sup>A</sup>(a)), a = s<sup>A</sup>(s<sup>A</sup>(a)) ≠ s<sup>A</sup>(a), or a ≠ s<sup>A</sup>(a) = s<sup>A</sup>(s<sup>A</sup>(a)), as shown in Fig. 4.

$$\psi\_{\vee} = \forall x. \; \left[ \left( s^2(x) = x \right) \vee \left( s^2(x) = s(x) \right) \right]$$

Fig. 3. The formula ψ<sup>∨</sup> for non-convex theories.

This is especially useful for defining non-convex theories, since (s<sup>2</sup>(x) = x) ∨ (s<sup>2</sup>(x) = s(x)) is valid in the theory, but neither s<sup>2</sup>(x) = x nor s<sup>2</sup>(x) = s(x) is. Notice, of course, that non-convexity is only possible when there are at least two elements available in the domain; otherwise, all equalities are satisfied.

Fig. 4. Possible scenarios when ψ<sup>∨</sup> holds.
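The non-convexity argument can be checked by brute force on a two-element domain. The sketch below enumerates every interpretation of s on that domain that satisfies the formula of Fig. 3 and tests whether the disjunction, and each disjunct on its own, holds in all of them. The helper names are hypothetical, and restricting attention to a single two-element domain is a simplification that presupposes the underlying theory has a model of that size.

```python
from itertools import product


def models_of_psi_or(domain):
    """Yield every function s on the domain (as a dict) such that,
    for all x, s(s(x)) equals x or s(s(x)) equals s(x)."""
    for values in product(domain, repeat=len(domain)):
        s = dict(zip(domain, values))
        if all(s[s[x]] == x or s[s[x]] == s[x] for x in domain):
            yield s


def valid(models, domain, prop):
    """prop(s, x) holds in every model s for every value x of the variable."""
    return all(prop(s, x) for s in models for x in domain)


domain = [0, 1]
models = list(models_of_psi_or(domain))

disjunction_valid = valid(models, domain,
                          lambda s, x: s[s[x]] == x or s[s[x]] == s[x])
left_valid = valid(models, domain, lambda s, x: s[s[x]] == x)
right_valid = valid(models, domain, lambda s, x: s[s[x]] == s[x])
```

Here `disjunction_valid` is true while `left_valid` and `right_valid` are both false: a constant function refutes the first disjunct, and the function swapping the two elements refutes the second.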

Definition 6. *Let* T *be a theory over an empty signature with sorts* S = {σ<sub>1</sub>, ..., σ<sub>n</sub>}*. Then* (T)<sup>∨</sup> *is the* Σ<sup>n</sup><sub>s</sub>*-theory axiomatized by Ax*(T) ∪ {ψ<sub>∨</sub>}*.*

Lemma 5. *Let* T *be a theory over an empty signature* Σ<sub>n</sub> *with sorts* S = {σ<sub>1</sub>, ..., σ<sub>n</sub>}*. Then:* (T)<sup>∨</sup> *is stably infinite, smooth, finitely witnessable, or strongly finitely witnessable w.r.t.* S *if and only if* T *is, respectively, stably infinite, smooth, finitely witnessable, or strongly finitely witnessable w.r.t.* S*. In addition, if* T *has a model* A *with* |σ<sub>1</sub><sup>A</sup>| ≥ 2*,* (T)<sup>∨</sup> *is not convex with respect to* S*.*

*Example 5.* The theory (T<sub>≥n</sub>)<sup>∨</sup> is one-sorted, and its structures have at least n elements; they interpret the symbol s in a way that satisfies ψ<sub>∨</sub>. In particular, for each element a of the domain, one of the scenarios from Fig. 4 holds. According to Lemma 5, since T<sub>≥n</sub> admits all properties, (T<sub>≥n</sub>)<sup>∨</sup> admits all properties but convexity.

Additional Theories over Σ<sub>s</sub>. Whenever there is a Σ<sub>1</sub>-theory with some properties, we can obtain a Σ<sub>s</sub>-theory with the same properties using one of the techniques above. To cover cases for which there is no corresponding Σ<sub>1</sub>-theory, we use the theories presented in Table 4 and described below.


Table 4. Σ<sub>s</sub>-theories

We start with T<sup>≠</sup><sub>odd</sub>, T<sup>≠</sup><sub>1,∞</sub>, and T<sup>≠</sup><sub>2,∞</sub>, deferring the discussion on T<sub>f</sub> and T<sup>s</sup><sub>f</sub> to the end of this section. The theory T<sup>≠</sup><sub>odd</sub> has structures A with either an infinite or an odd number of elements and with the property that if A is not trivial, then s<sup>A</sup>(a) ≠ a for all a ∈ σ<sup>A</sup>. The theory T<sup>≠</sup><sub>1,∞</sub> has all structures A that either: (i) are trivial; or (ii) have infinitely many elements and for which s<sup>A</sup>(a) ≠ a for each a ∈ σ<sup>A</sup>. Similarly, T<sup>≠</sup><sub>2,∞</sub> has structures A that either: (i) have exactly two elements and interpret s as the identity; or (ii) have infinitely many elements and interpret s in such a way that s<sup>A</sup>(a) ≠ a for all a ∈ σ<sup>A</sup>.

On the Theories T<sub>f</sub> and T<sup>s</sup><sub>f</sub>. We now introduce the theories T<sub>f</sub> and T<sup>s</sup><sub>f</sub>. The importance of these theories is that both of them are *one-sorted* theories that are polite but not strongly polite (the first is also convex and the second is not). Their existence improves on the result of [11], which introduced a *two-sorted* theory that is polite but not strongly polite (namely T<sub>2,3</sub>).

For their axiomatizations, we use the formulas from Fig. 5, in which s is a unary function symbol. ψ<sup>=</sup><sub>≥n</sub> (ψ<sup>=</sup><sub>=n</sub>) states that a structure A has at least (exactly) n elements a satisfying s<sup>A</sup>(a) = a; similarly, ψ<sup>≠</sup><sub>≥n</sub> (ψ<sup>≠</sup><sub>=n</sub>) states that a structure A has at least (exactly) n elements a satisfying s<sup>A</sup>(a) ≠ a.

Further, the axiomatization requires a function f from positive integers to {0, 1} that is not computable with the property that for k > 0, f maps half of the numbers in the interval [1, 2<sup>k</sup>] to 1 and the other half to 0. The existence of such a function is formalized below. We start by defining counting functions f<sup>0</sup> and f1.

Definition 7. *Let* f : ℕ \ {0} → {0, 1}*. For* i ∈ {0, 1} *and* n ∈ ℕ*,* f<sub>i</sub>(n) *is defined by:* f<sub>i</sub>(n) = |f<sup>−1</sup>(i) ∩ [1, n]|*.*

Intuitively, f<sub>0</sub>(n) counts how many numbers between 1 and n (inclusive) are mapped by f to 0 and f<sub>1</sub>(n) counts how many are mapped to 1. Because f(n) always equals 0 or 1, it is easy to see that for every n > 0, n = f<sub>1</sub>(n) + f<sub>0</sub>(n).

$$\begin{split} \psi\_{\geq n}^{=} &= \exists \overrightarrow{x}. [\bigwedge\_{i=1}^{n} p(x\_{i}) \wedge \delta\_{n}], \qquad \psi\_{\geq n}^{\neq} = \exists \overrightarrow{x}. [\bigwedge\_{i=1}^{n} \neg p(x\_{i}) \wedge \delta\_{n}], \\ \psi\_{= n}^{=} &= \exists \overrightarrow{x}. [\delta\_{n} \wedge \bigwedge\_{i=1}^{n} p(x\_{i}) \wedge \forall x. \ [p(x) \rightarrow \bigvee\_{i=1}^{n} x = x\_{i}]], \\ \psi\_{= n}^{\neq} &= \exists \overrightarrow{x}. [\delta\_{n} \wedge \bigwedge\_{i=1}^{n} \neg p(x\_{i}) \wedge \forall x. \ [\neg p(x) \rightarrow \bigvee\_{i=1}^{n} x = x\_{i}]]. \end{split}$$

Fig. 5. Cardinality formulas for signatures with a unary function symbol s. $\overrightarrow{x}$ stands for x<sub>1</sub>, ..., x<sub>n</sub>, p(x) for s(x) = x, and δ<sub>n</sub> for $\bigwedge\_{1 \leq i < j \leq n} \neg(x\_i = x\_j)$.

Lemma 6. *There exists a function* f : ℕ \ {0} → {0, 1} *such that* f(1) = 1 *with the properties that:* f *is not computable; and, for every* k ∈ ℕ \ {0}*,* f<sub>0</sub>(2<sup>k</sup>) = f<sub>1</sub>(2<sup>k</sup>)*.*

*Example 6.* The constant function that assigns 0 to all positive integers satisfies neither the first nor the second condition of Lemma 6. The function that assigns 0 to even numbers and 1 to odd numbers satisfies the second condition, but not the first. Of course, any non-computable function satisfies the first condition. An example is the function that returns 1 if the Turing machine encoded by the given number halts and 0 otherwise, under some encoding. Finding a function that satisfies both conditions is more challenging.
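The counting functions of Definition 7 and the balance condition of Lemma 6 can be tried out on the two computable functions mentioned in Example 6. The sketch below is only an illustration: the helper names are hypothetical, and both sample functions are computable, so neither is the function whose existence Lemma 6 asserts.

```python
def count(f, i, n):
    """Number of m in [1, n] that f maps to i (the counting function of Definition 7)."""
    return sum(1 for m in range(1, n + 1) if f(m) == i)


def balanced_on_powers_of_two(f, max_k):
    """Check the second condition of Lemma 6 for k = 1, ..., max_k only."""
    return all(count(f, 0, 2 ** k) == count(f, 1, 2 ** k)
               for k in range(1, max_k + 1))


def parity(m):
    """Computable stand-in: 1 on odd numbers, 0 on even numbers."""
    return m % 2


def constant_zero(m):
    """Computable stand-in: 0 everywhere."""
    return 0
```

Within the tested range, `parity` is balanced on every interval ending at a power of two while `constant_zero` is not, and for both functions the two counts add up to n.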

Let f be some function with the properties listed in Lemma 6. We can now define T<sub>f</sub> over Σ<sub>s</sub> (note that f itself is not a part of the signature, but is rather used to help define the axioms of T<sub>f</sub>). T<sub>f</sub> consists of those structures A that either (i) have a finite cardinality n, with f<sub>1</sub>(n) elements satisfying s<sup>A</sup>(a) = a, and f<sub>0</sub>(n) elements satisfying s<sup>A</sup>(a) ≠ a (and thus A satisfies $\psi^{=}\_{\geq f\_1(k)} \wedge \psi^{\neq}\_{\geq f\_0(k)}$ for k ≤ n, and $\psi^{=}\_{= f\_1(n)} \wedge \psi^{\neq}\_{= f\_0(n)}$ and hence $\bigvee\_{i=1}^{k} [\psi^{=}\_{= f\_1(i)} \wedge \psi^{\neq}\_{= f\_0(i)}]$ for all k ≥ n); or (ii) have infinitely many elements, with infinitely many elements satisfying each condition, s<sup>A</sup>(a) = a and s<sup>A</sup>(a) ≠ a (and thus A satisfies $\psi^{=}\_{\geq f\_1(k)} \wedge \psi^{\neq}\_{\geq f\_0(k)}$ for all k ∈ ℕ). Note that the description is well-defined because an element must always satisfy either s<sup>A</sup>(a) = a or s<sup>A</sup>(a) ≠ a, but never both or neither of these. The theory T<sup>s</sup><sub>f</sub> is similar to T<sub>f</sub>, but in addition to *Ax*(T<sub>f</sub>) its structures must also satisfy ψ<sub>∨</sub>.

*Remark 1.* The construction of T<sup>s</sup><sub>f</sub> from T<sub>f</sub> is very similar to the general construction of Definition 6. However, the corresponding result, Lemma 5, according to which all properties but convexity are preserved by this operation, is only shown for cases where the original signature is empty, which is not the case for T<sub>f</sub>. Obtaining T<sup>s</sup><sub>f</sub> from T<sub>f</sub> is not done by adding a function symbol, but rather by changing the axiomatization of the already existing function symbol. While we do prove that T<sup>s</sup><sub>f</sub> has the required properties, a general result in the style of Lemma 5 for arbitrary signatures, with the ability to preserve an existing function symbol instead of adding a new one, is left for future work.

*Example 7.* Let A<sub>n</sub> be a Σ<sub>s</sub>-model with domain {a<sub>1</sub>, ..., a<sub>n</sub>} such that s<sup>A<sub>n</sub></sup>(a<sub>i</sub>) equals a<sub>i</sub> if 1 ≤ i ≤ f<sub>1</sub>(n), and a<sub>1</sub> if f<sub>1</sub>(n) < i ≤ n (the second condition may be void if n = 1). Then A<sub>n</sub> is a model of both T<sub>f</sub> and T<sup>s</sup><sub>f</sub>.

If κ is an infinite cardinal, let A<sub>κ</sub> be a Σ<sub>s</sub>-model with domain A ∪ {a<sub>n</sub> : n ∈ ℕ \ {0}} (where A is a set of cardinality κ disjoint from {a<sub>n</sub> : n ∈ ℕ \ {0}}) such that s<sup>A<sub>κ</sub></sup>(a<sub>i</sub>) = a<sub>i</sub> for each i ∈ ℕ \ {0}, and s<sup>A<sub>κ</sub></sup>(a) = a<sub>1</sub> for each a ∈ A. Then A<sub>κ</sub> is a model of both T<sub>f</sub> and T<sup>s</sup><sub>f</sub>.
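The finite models of Example 7 can be built explicitly once some function f with f(1) = 1 is fixed. The sketch below uses the computable parity function as a stand-in, which is an assumption made purely for illustration (the f of Lemma 6 is not computable); the index i plays the role of the i-th domain element, and the helper names are hypothetical.

```python
def parity(m):
    """Computable stand-in for f with parity(1) == 1 (not the f of Lemma 6)."""
    return m % 2


def f1(f, n):
    """Number of m in [1, n] mapped to 1 by f."""
    return sum(1 for m in range(1, n + 1) if f(m) == 1)


def build_finite_model(f, n):
    """Interpretation of s on {1, ..., n}: the first f1(n) elements are
    fixed points, and every remaining element is sent to element 1."""
    fixed = f1(f, n)
    return {i: (i if i <= fixed else 1) for i in range(1, n + 1)}


def fixed_points(s):
    return sum(1 for x in s if s[x] == x)


def satisfies_psi_or(s):
    """For every x, s(s(x)) equals x or s(x)."""
    return all(s[s[x]] == x or s[s[x]] == s[x] for x in s)
```

Because f(1) = 1, element 1 is always a fixed point, so every element outside the first block is moved to a fixed point. This gives exactly the required number of fixed and non-fixed elements and also makes the formula of Fig. 3 hold.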

To show that T<sub>f</sub> is smooth and finitely witnessable, we construct, given a T<sub>f</sub>-interpretation, another T<sub>f</sub>-interpretation by (possibly) adding two disjoint sets of elements to the interpretation, one whose elements will satisfy s(a) = a, and one whose elements will satisfy s(a) ≠ a.

To show that it is not strongly finitely witnessable, we use the following lemmas, which are interesting in their own right. According to the first, the *mincard* function of T<sub>f</sub> is not computable.

### Lemma 7. *The mincard function of* T<sub>f</sub> *is not computable.*

The second lemma that is needed in order to prove that T<sub>f</sub> is not strongly finitely witnessable is quite surprising. As it turns out, for quantifier-free formulas, the set of T<sub>f</sub>-satisfiable formulas coincides with the set of satisfiable formulas. That is, even though the definition of T<sub>f</sub> is very complex, it induces the same satisfiability relation, over quantifier-free formulas, as the simplest theory possible: the theory axiomatized by the empty set (or, equivalently, all valid first-order sentences).

### Lemma 8. *Every quantifier-free* Σ<sub>s</sub>*-formula that is satisfiable is* T<sub>f</sub>*-satisfiable.*

Note that Lemma 8 does not hold for quantified formulas in general. For example, the formula ∀x. s(x) ≠ x is satisfiable but not T<sub>f</sub>-satisfiable: because f(1) = 1, every T<sub>f</sub>-interpretation A must have at least one element a with s<sup>A</sup>(a) = a.

Using Lemmas 7 and 8, it is possible to show that T<sub>f</sub> is not strongly finitely witnessable:

### Lemma 9. T<sub>f</sub> *is not strongly finitely witnessable.*

The idea of the proof of Lemma 9 goes as follows: assume for contradiction that there is a strong witness *wit*. The *mincard* function for T<sub>f</sub> can then be defined as

$$\mathit{mincard}(\phi) = \min \{ |V/E| : E \in \mathit{eq} \text{ and } \mathit{wit}(\phi) \land \delta_V^E \text{ is } T_f\text{-satisfiable} \}, \tag{1}$$

where eq is the set of all equivalence relations E on V = *vars*(*wit*(φ)), and δ<sub>V</sub><sup>E</sup> denotes the arrangement corresponding to E. Clearly, the sets V and eq can be effectively computed. Also, by Lemma 8, testing for the T<sub>f</sub>-satisfiability of quantifier-free formulas is decidable. Together with our assumption that *wit* is computable, we get that the *mincard* function of T<sub>f</sub> is computable, which contradicts Lemma 7.
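The computation described by Eq. (1) can be sketched as a brute-force search over equivalence relations. In this illustrative Python sketch, an equivalence relation E on V is represented as a partition into blocks, so |V/E| is the number of blocks. The callable `sat_with_arrangement` is a hypothetical oracle standing in for the test "*wit*(φ) ∧ δ<sub>V</sub><sup>E</sup> is T<sub>f</sub>-satisfiable"; neither this name nor the function names below come from the paper.

```python
from typing import Callable, Hashable, Iterator, List, Optional, Sequence

Partition = List[List[Hashable]]


def partitions(items: Sequence[Hashable]) -> Iterator[Partition]:
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller


def mincard_from_witness(
    variables: Sequence[Hashable],
    sat_with_arrangement: Callable[[Partition], bool],
) -> Optional[int]:
    """Smallest number of blocks among partitions accepted by the oracle.

    Returns None when the oracle rejects every arrangement.
    """
    sizes = [len(p) for p in partitions(list(variables)) if sat_with_arrangement(p)]
    return min(sizes) if sizes else None


# Toy oracle: behaves as if the witness formula were just "x != y".
def x_differs_from_y(partition: Partition) -> bool:
    return not any("x" in block and "y" in block for block in partition)


print(mincard_from_witness(["x", "y", "z"], x_differs_from_y))
```

The search is exponential in the number of variables, which is irrelevant for the argument: only the existence of a terminating procedure matters for the contradiction with Lemma 7.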

The arguments for T<sub>f</sub><sup>s</sup> are very similar, and require only minor changes in the corresponding proofs for T<sub>f</sub>.

*Remark 2.* We remark on the connection between the results regarding T<sub>f</sub> and T<sub>f</sub><sup>s</sup>, and those of [3]. What we show here is that T<sub>f</sub> (T<sub>f</sub><sup>s</sup>) is polite but not strongly polite. Figure 1 of [3] summarizes the relations between these two properties for the one-sorted case. It shows that polite theories that are axiomatized by a universal set of axioms, and whose quantifier-free satisfiability problem is decidable, are strongly polite. While T<sub>f</sub> is decidable for quantifier-free formulas (this is a corollary of Lemma 8), its presentation here is definitely not as a universal theory. On the other hand, [3] also shows that decidable polite theories for which checking if a finite interpretation belongs to the theory is decidable are also strongly polite. However, it is undecidable, given an interpretation, to check whether it belongs to T<sub>f</sub> (and T<sub>f</sub><sup>s</sup>): such an algorithm would lead to an algorithm to compute f as well. Thus, the theories T<sub>f</sub> and T<sub>f</sub><sup>s</sup> are polite, but do not meet the criteria for strong politeness from [3]. And indeed, they are not strongly polite.

#### 4.4 Theories over Many-Sorted Non-empty Signatures

For the last column of Table 1, all possible theories can be obtained from theories that were already defined, using a combination of Definitions 4 to 6, and so there is no need to present additional theories specifically for many-sorted non-empty signatures.

*Example 8.* Line 1 includes the theory ((T<sub>≥n</sub>)<sup>2</sup>)<sub>s</sub>, obtained from (T<sub>≥n</sub>)<sup>2</sup> using Definition 5, where the latter theory is obtained from T<sub>≥n</sub> using Definition 4. This theory admits all properties, including convexity. To obtain a non-convex variant, the theory ((T<sub>≥n</sub>)<sup>2</sup>)<sub>∨</sub> is constructed in a similar fashion, using Definition 6 instead of Definition 5.

With many-sorted non-empty signatures, we can always find an example for each combination of properties, except for those that are trivially impossible due to Theorems 1 and 2 (i.e., theories that are strongly finitely witnessable but not finitely witnessable and theories that are smooth but not stably infinite). This is nicely depicted by Fig. 6. Theorems 1 and 2 are represented in this figure by the location of the circles: the circle for smooth theories is entirely inside the circle for stably infinite theories, and similarly for strongly finitely witnessable and finitely witnessable theories. Then, for every region in this figure, the rightmost column of Table 1 has an example, the sole exception being the region that represents unicorn theories.

*Remark 3.* For non-empty signatures, we chose to include functions rather than predicates. This is not essential as we can replace function symbols by predicate symbols by including the sort of the result of the function as the last component of the arity of the predicate, and then adding an axiom that forces the predicate to be a function.

### 5 Polite Combination Without Smoothness

Polite combination of theories was introduced in [10]. There, it was claimed that in order to combine a theory T with any other theory using polite combination, it suffices for T to be smooth and finitely witnessable (that is, polite). Later, in [6], this condition was corrected, and it was shown that in fact a stronger requirement is needed from T : it has to be smooth and strongly finitely witnessable (that is, strongly polite) to be applicable for the combination method.

Given that weakening strong finite witnessability to finite witnessability results in a condition that does not suffice, it is natural to ask whether there is any other way to weaken the required conditions for polite combination. Rather than weakening strong finite witnessability to finite witnessability, here we consider another option: weakening the smoothness condition to stable infiniteness. Thus, the main result of this section is that polite combination can be done for theories that are stably infinite and strongly finitely witnessable, even if they are not smooth.

Fig. 6. A diagram of the various notions studied in this paper. (Color figure online)

Our contribution can be understood by viewing Fig. 6, ignoring the circle that represents convexity (a property unrelated to the current section). [6] shows that polite combination can be done for the purple region, which represents smooth and strongly finitely witnessable theories. [6] also presented an example showing that expanding the same combination method to the blue region, which represents smooth and finitely witnessable theories, results in an error. Here we instead expand polite combination to the red region, which represents stably infinite and strongly finitely witnessable theories. Now, the red region, if not empty, is only populated by unicorn theories (see Sect. 4). If such theories do not exist, the result follows immediately. Until this is settled, however, we provide a direct proof, regardless of the existence of unicorn theories.

The next theorem shows that polite theory combination can be done for theories that are not necessarily strongly polite (smooth and strongly finitely witnessable), but rather that are simply stably infinite and strongly finitely witnessable.

Theorem 9. *Let* Σ<sub>1</sub> *and* Σ<sub>2</sub> *be disjoint signatures with sorts* S<sub>1</sub> *and* S<sub>2</sub>*; let* T<sub>1</sub> *be a* Σ<sub>1</sub>*-theory,* T<sub>2</sub> *be a* Σ<sub>2</sub>*-theory, and* T = T<sub>1</sub> ⊕ T<sub>2</sub>*; and let* φ<sub>1</sub> *be a quantifier-free* Σ<sub>1</sub>*-formula and* φ<sub>2</sub> *a quantifier-free* Σ<sub>2</sub>*-formula.*

*Assume that* T<sub>2</sub> *is stably infinite and strongly finitely witnessable w.r.t.* S = S<sub>1</sub> ∩ S<sub>2</sub>*, with strong witness wit. Let* ψ = *wit*(φ<sub>2</sub>)*,* V<sub>σ</sub> = *vars*<sub>σ</sub>(ψ) *for every* σ ∈ S*, and* V = ⋃<sub>σ∈S</sub> *vars*<sub>σ</sub>(ψ)*. Then the following are equivalent:*


The proof of Theorem 9 relies heavily on the following lemma, which shows that stable infiniteness and strong finite witnessability imply a weaker notion of smoothness. In this weaker notion, uncountable domains in the original structure A are reduced to countable ones, and the function κ, which dictates the cardinalities of models, is assumed never to assign an uncountable cardinal to any of the sorts.

Lemma 10. *Let* Σ *be a signature with* S ⊆ S<sup>Σ</sup>*, and* T *a theory over* Σ*. If* T *is a stably-infinite and strongly finitely witnessable theory, both w.r.t. the set of sorts* S*, then: for every quantifier-free* Σ*-formula* φ*;* T *-interpretation* A *that satisfies* φ*; and function* κ *from* S<sup>A</sup> <sup>ω</sup> = {σ ∈ S : |σ<sup>A</sup>| ≤ ω} *to the class of cardinals such that* |σ<sup>A</sup>| ≤ κ(σ) ≤ ω *for every* σ ∈ S<sup>A</sup> <sup>ω</sup> *, there exists a* T *-interpretation* B *that satisfies* φ *with* |σ<sup>B</sup>| = κ(σ) *for every* σ ∈ S<sup>A</sup> <sup>ω</sup> *, and* |σ<sup>B</sup>| = ω *for every* σ ∈ S\S<sup>A</sup> <sup>ω</sup> *.*

The proof of Theorem 9 goes as follows: first, we make the infinite domains corresponding to shared sorts of a model A of φ<sup>1</sup> ∧ δ<sup>V</sup> at most countable, by applying Lemma 2. We then proceed similarly to the proof of the polite combination method in [6]: decrease a model B of ψ ∧ δ<sup>B</sup> by using *wit* as a strong witness; and then make the cardinalities of the shared sorts in B equal those of A (which are at most countable), by using Lemma 10.

This result greatly improves the state-of-the-art in polite theory combination, which requires proving that one of the theories is both smooth and strongly finitely witnessable. Thanks to this theorem, proving smoothness can be replaced by proving stable infiniteness, which is typically a much easier task.

#### 6 Conclusion

As mentioned, there are two main contributions offered in this paper, both associated with the theme of theory combination. In Sect. 4, we provide a table with examples for almost all the combinations of stable infiniteness, smoothness, convexity, finite witnessability, and strong finite witnessability known not to be impossible. Section 3 provides theorems proving the sharpness of the examples provided. The second contribution is a new combination theorem, according to which polite theory combination can be done without smoothness, provided we have instead stable infiniteness.

Many ideas for future work arise from the studies presented here. A first direction would be to settle the question of whether unicorn theories exist: if they do not, a proof would probably involve an interesting generalization of the upward Löwenheim-Skolem theorem for many-sorted logic and would imply that strongly polite theories are simply the stably infinite and strongly finitely witnessable theories, thus greatly simplifying the proof of Theorem 9; if unicorn theories do exist, one wonders if they can be combined in some meaningful way. Another direction of future work involves considering other model-theoretic properties in our table, such as shininess, gentleness, flexibility, and so on, as well as the effect of taking proper subsets of sorts for signatures containing more than one sort.

Acknowledgments. This work was funded in part by NSF-BSF grant numbers 2110397 (NSF) and 2020704 (BSF) and ISF grant number 619/21.

### References


LNCS (LNAI), vol. 3717, pp. 48–64. Springer, Heidelberg (2005). https://doi.org/10.1007/11559306_3. https://hal.inria.fr/inria-00000570



# Decidability of Difference Logic over the Reals with Uninterpreted Unary Predicates

Bernard Boigelot, Pascal Fontaine, and Baptiste Vergain

Montefiore Institute, B28, Université de Liège, Liège, Belgium {Bernard.Boigelot,Pascal.Fontaine,BVergain}@uliege.be

Abstract. First-order logic fragments mixing quantifiers, arithmetic, and uninterpreted predicates are often undecidable, as is, for instance, Presburger arithmetic extended with a single uninterpreted unary predicate. In the SMT world, difference logic is a quite popular fragment of linear arithmetic which is less expressive than Presburger arithmetic. Difference logic on integers with uninterpreted unary predicates is known to be decidable, even in the presence of quantifiers. We here show that (quantified) difference logic on real numbers with a single uninterpreted unary predicate is undecidable, quite surprisingly. Moreover, we prove that difference logic on integers, together with order on reals, combined with uninterpreted unary predicates, remains decidable.

Keywords: First-order logic · Decidability · SMT · Arithmetic · Uninterpreted predicates

### 1 Introduction

The success of satisfiability modulo theories (SMT) solvers in verification can be attributed to several things, but one of them is indisputably the omnipresence, in the combination of theories, of arithmetic reasoners. As SMT solvers get stronger in quantified reasoning, it becomes more interesting to get a clear picture of decidability frontiers when arithmetic is used in a quantified SMT context. Some pure arithmetic theories are already undecidable, even in their quantifier-free fragment, e.g., Peano arithmetic [12], i.e., a first-order theory of the natural numbers with addition and multiplication. However, Presburger arithmetic, somehow the linear restriction of Peano arithmetic, is decidable even in the quantified case [10], but augmenting Presburger arithmetic with a single unary uninterpreted predicate already yields undecidability [7,11,19]. To obtain a decidable fragment mixing arithmetic and uninterpreted predicates, one must further restrict the expressiveness.

In the SMT world, difference logic used to be a popular fragment of arithmetic, because of its low complexity in the quantifier-free case. In this fragment, arithmetic is limited to difference constraints of the form x − y ⋈ c, where x and y are variables, c is an integer constant, and ⋈ belongs to {<, ≤, =, ≥, >}. Difference constraints can, e.g., express conditions on the distance between two variables: the atomic formula x − y = 2 states that the distance between the values of x and y must be exactly 2. Notice that since difference constraints involve only two variables (c is an integer constant), those constraints are strictly less expressive than linear constraints in Presburger arithmetic. The decidability of the logic mixing difference constraints and unary uninterpreted predicates, when interpreted over N (or similarly Z), reduces to the decidability of the monadic second-order theory of one successor, usually referred to as S1S. The decidability of S1S has been established thanks to the concept of infinite-word automaton [4].

On the real domain, it is well known that the first-order theory of real-closed fields, which is in a sense the real counterpart of Peano arithmetic, is decidable [20] even in the presence of quantifiers. Whereas this might give the impression that decidability is more often obtained on the reals than on the integers, we here prove that the logic mixing difference constraints and unary uninterpreted predicates, when interpreted over R, is undecidable.

Further restricting the arithmetic language, and considering order on the real domain only, it is known that the monadic second-order theory of order is undecidable [9,17], but its universal fragment is decidable [5]. In this work, we establish that the fragment mixing unary uninterpreted predicates, difference constraints over integer variables, and order constraints over real variables is decidable.

Section 2 provides some prerequisites and the precise definition of the studied fragments. In Sect. 3, we prove the decidability of the fragment mixing unary uninterpreted predicates, difference constraints over integer variables, and order constraints over real variables. This was already the subject of a work-in-progress workshop paper [1]. In Sect. 4, we prove that the fragment of quantified difference constraints over real variables extended with a single unary uninterpreted predicate is undecidable.

### 2 Preliminaries

We refer to, e.g., [8] for a general introduction to first-order logic with equality, and assume that the reader is familiar with the notions of signature, term, variable, and formula. We use the usual logical connectives (∨, ∧, ¬, ⇒, ⇔) and first-order quantification ∃x. ϕ and ∀x. ϕ, respectively equivalent to writing ∃x (ϕ) and ∀x (ϕ), i.e., the dot stands for an opening parenthesis that is closed at the end of the formula. Variable symbols are denoted by x, y, z, . . . and are meant to be interpreted as real numbers.

Our signature contains the interpreted arithmetic symbols 0, 1, +, −, <, ≤, ≥, >, =, and other constants in N that stand for terms 1 + 1 + ··· + 1. We furthermore use a monadic (i.e., unary) interpreted predicate x ∈ Z to denote that x has an integer value. The signature also contains uninterpreted predicate symbols P, Q, . . . In the whole article, we only consider unary predicate symbols. Indeed, including binary uninterpreted predicates without restriction on first-order quantification directly yields undecidability. Our language is the set of all well-formed formulas, in the usual sense, built using symbols from the signature. Further specific restrictions will be introduced later.

An interpretation specifies a domain (i.e., a set of elements), assigns a value in the domain to each free variable, and assigns relations of appropriate arity on the domain to predicate symbols in the signature. Throughout the article, the interpretation domain is always R. The arithmetic symbols 0, 1, +, −, <, ≤, ≥, >, = are interpreted as expected on R, and x ∈ Z is true if and only if x has an integer value<sup>1</sup>. An interpretation assigns an arbitrary subset of the domain R to each unary predicate. By extension, an interpretation assigns a value in R to every term, and a truth value to every formula. We denote the interpretation I of a variable x by I[x], and the interpretation of a predicate P by I[P]. A model of a formula is an interpretation that assigns true to this formula. A formula is satisfiable on a domain (here R) if it has a model on that domain.

#### 2.1 Difference Arithmetic with Unary Predicates

We consider several fragments where the language is restricted, in particular in the way that the arithmetic relations can be used. A fragment is decidable if there exists a procedure to check whether a given formula in this fragment is satisfiable.

In the various fragments introduced below, all arithmetic atoms are either *order constraints* of the form x ⋈ y, or *difference constraints* of the form x − y ⋈ c, where x and y are variables, c is a constant in Z, and ⋈ ∈ {<, ≤, =, ≥, >}. As a reminder, the language of our formulas only contains *unary predicates*. The only atoms besides the arithmetic ones are of the form P(x), where P is an uninterpreted predicate symbol and x is a variable, and x ∈ Z, where x is a variable. Note that the addition of constraints of the form x ⋈ c, where x is a variable and c is an integer constant, to fragments that already admit difference constraints does not increase their expressive power: constraints x ⋈ c can be replaced by difference constraints x − v<sub>0</sub> ⋈ c, where v<sub>0</sub> is a particular variable in Z intended to be interpreted as zero. Indeed, shifting an interpretation by a fixed integer j (i.e., the new interpretation of any variable x is the old value of x plus j, and the new value of any predicate P for a real number d + j is the old value of P for d) preserves the assigned value of formulas in our fragments. Therefore any model where v<sub>0</sub> is an arbitrary integer can be shifted into a model where v<sub>0</sub> is zero.

As syntactic sugar, conjunctions of order constraints will be merged to improve readability, i.e., we will often write x < y < z rather than x < y ∧ y < z. Finally, we use the shorthand P(x + c) instead of ∃y. y − x = c ∧ P(y), where x is a free variable and c ∈ Z.
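The shift argument can be checked on a concrete valuation. The following Python sketch is illustrative only: the helper names (`diff_holds`, `bound_holds`, `shift_valuation`, `shift_predicate`) are hypothetical, and exact rationals from `fractions.Fraction` stand in for real values. It shifts a valuation so that the reference variable `v0` becomes zero and confirms that atoms keep their truth value.

```python
import operator
from fractions import Fraction

OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}


def diff_holds(val, x, y, op, c):
    """Truth value of the difference constraint x - y op c under `val`."""
    return OPS[op](val[x] - val[y], c)


def bound_holds(val, x, op, c):
    """Truth value of the bound constraint x op c under `val`."""
    return OPS[op](val[x], c)


def shift_valuation(val, j):
    """New value of every variable is its old value plus j."""
    return {name: d + j for name, d in val.items()}


def shift_predicate(pred, j):
    """New predicate at d + j equals the old predicate at d."""
    return lambda d: pred(d - j)


val = {"x": Fraction(11, 2), "y": Fraction(7, 2), "v0": Fraction(3)}
is_marked = lambda d: d == Fraction(11, 2)  # a sample unary predicate P

j = -val["v0"]  # integer shift that sends v0 to zero
shifted = shift_valuation(val, j)
shifted_marked = shift_predicate(is_marked, j)

# x - v0 <= 3 before the shift agrees with x <= 3 after the shift.
print(diff_holds(val, "x", "v0", "<=", 3), bound_holds(shifted, "x", "<=", 3))
```

Because both variables move by the same amount, every difference x − y is unchanged, which is why difference and order atoms are insensitive to the shift.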

We now introduce our fragments of interest. Their names are inspired from the SMT-LIB nomenclature, where acronyms stand for the theories that appear in the combinations:

<sup>1</sup> In the current context, this choice of notation for mixed integer-real arithmetic is simpler than using a multi-sorted logic.


uf<sup>1</sup>·ro. The fragment uf<sup>1</sup>·ro is the fragment with unary uninterpreted predicates and order constraints between variables interpreted over R. Difference logic constraints and atoms of the form x ∈ Z are not allowed.

*Example:* The formula ∀x ∃y, z. y < x < z ∧ ∀t. (y < t < z ∧ P(t)) ⇒ t = x describes a predicate P that is true only on isolated real numbers.

uf<sup>1</sup>·iro. The fragment uf<sup>1</sup>·iro is the extension of uf<sup>1</sup>·ro where atoms of the form x ∈ Z are allowed. This fragment can express order relations between real and integer variables.

*Example:* The formula ∀x, y. (x < y ∧ x ∈ Z ∧ y ∈ Z) ⇒ ∃v. x < v < y ∧ P(v) describes a predicate P that is true for at least one value located between any two integers.

uf<sup>1</sup>·idl·iro. The fragment uf<sup>1</sup>·idl·iro is an extension of the fragment uf<sup>1</sup>·iro (and therefore of uf<sup>1</sup>·ro). It is also interpreted over R. Order constraints between variables and atoms of the form x ∈ Z are allowed. Additionally, difference logic constraints are allowed, but they can only involve *integer-guarded* variables.

In order to enforce this integer-guard restriction on difference logic constraints, uf<sup>1</sup>·idl·iro formulas must be *well-guarded*, i.e., difference logic constraints can only appear in the two following contexts:

– x ∈ Z ∧ y ∈ Z ∧ x − y ⋈ c,

– (x ∈ Z ∧ y ∈ Z) ⇒ x − y ⋈ c,

where x and y are variables, c ∈ Z is a constant, and ⋈ ∈ {<, ≤, =, ≥, >}.

*Example:* The following formula describes a predicate that is either true on all odd numbers and false on all even numbers, or the opposite, as well as true on all non-integer numbers:

$$\begin{array}{l} \left[ \forall x, y.\, \left( x \in \mathbb{Z} \land y \in \mathbb{Z} \land y - x = 2 \right) \Rightarrow \left( P(x) \Leftrightarrow P(y) \right) \right] \\ \land \left[ \exists x, y.\, x \in \mathbb{Z} \land y \in \mathbb{Z} \land P(x) \land \neg P(y) \right] \land \left[ \forall z.\, \neg (z \in \mathbb{Z}) \Rightarrow P(z) \right] \end{array}$$
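As a sanity check of this example, the three conjuncts can be tested on a finite window of values. The Python sketch below is not a decision procedure: it only samples integers in `[lo, hi]` and the half-integers just above them, and the function and predicate names are hypothetical. Values are `fractions.Fraction` objects so that integrality can be tested exactly.

```python
from fractions import Fraction


def check_on_window(P, lo, hi):
    """Test the three conjuncts of the example on a finite sample of values."""
    ints = [Fraction(k) for k in range(lo, hi + 1)]
    # Conjunct 1: P(x) <=> P(y) whenever y - x = 2 (both inside the window).
    periodic = all(P(x) == P(x + 2) for x in ints if x + 2 <= hi)
    # Conjunct 2: some integer satisfies P and some integer does not.
    mixed = any(P(x) for x in ints) and any(not P(y) for y in ints)
    # Conjunct 3, sampled only at half-integers: P holds on non-integers.
    halves_ok = all(P(x + Fraction(1, 2)) for x in ints)
    return periodic and mixed and halves_ok


def odd_or_non_integer(d):
    return d.denominator != 1 or d.numerator % 2 == 1


def even_or_non_integer(d):
    return d.denominator != 1 or d.numerator % 2 == 0


def always_true(d):
    return True


print(check_on_window(odd_or_non_integer, -4, 4),
      check_on_window(even_or_non_integer, -4, 4),
      check_on_window(always_true, -4, 4))
```

The two parity-based predicates pass the sampled check, while the constantly true predicate fails the second conjunct, in line with the description given before the formula.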

uf<sup>1</sup>·rdl. The fragment uf<sup>1</sup>·rdl is the fragment interpreted over R, where order constraints, difference logic constraints and unary predicate atoms are allowed without any restriction. The use of atoms of the form x ∈ Z is forbidden. Since order constraints are a special case of difference logic constraints, the name of the fragment only refers to rdl and not ro.

*Example:* The formula ∀x ∃y. 0 < y − x < 3 ∧ P(y) describes a predicate P such that any subinterval of R of length greater than or equal to 3 contains a value for which P is true.

*Note:* It might appear to the reader that a missing logic in this nomenclature is uf<sup>1</sup>·irdl, with difference logic constraints on both real and integer variables. We will later show that uf<sup>1</sup>·rdl is already undecidable, so it makes little sense to introduce any extension of it.

### 3 Decidability of uf<sup>1</sup>·idl·iro

The fragment uf<sup>1</sup>·ro is actually a restriction of the universal fragment of the monadic second-order theory of the real order R, i.e., uf<sup>1</sup>·ro augmented with universal quantification of predicate variables. It has been established in [5] that the universal fragment of the monadic second-order theory of the real order R is decidable, which trivially implies the decidability of uf<sup>1</sup>·ro. We show here that its extension uf<sup>1</sup>·idl·iro (and therefore uf<sup>1</sup>·iro) is also decidable, by a reduction to uf<sup>1</sup>·ro.

Theorem 1. uf<sup>1</sup>·idl·iro *and* uf<sup>1</sup>·iro *are decidable.*

Note that the decidability of uf<sup>1</sup>·iro is a direct consequence of the decidability of uf<sup>1</sup>·idl·iro, since uf<sup>1</sup>·idl·iro is an extension of uf<sup>1</sup>·iro. The remainder of this section is thus dedicated to proving that uf<sup>1</sup>·idl·iro is decidable.

### 3.1 Recognizing Integer Values

We first show how to define in uf<sup>1</sup>·ro a predicate P*int* over R that is <-isomorphic to Z, i.e., such that there exists a bijection between the sets described by P*int* and Z that preserves the order relation over their elements. Integer guards in uf<sup>1</sup>·idl·iro will later be translated using P*int*. Intuitively, an integer-guarded variable in a uf<sup>1</sup>·idl·iro formula will correspond to a variable taking its value in the set described by P*int* in the translated uf<sup>1</sup>·ro formula.

We axiomatize P*int* in uf<sup>1</sup>·ro as follows:


The set of all integers is a model for P*int*, so the above axiomatization is consistent. The set of elements satisfying P*int* is necessarily infinite and does not admit a maximal or a minimal element. This is a direct consequence of the successor and predecessor axioms. More interestingly, this set is also necessarily countable. Indeed, since each point is isolated, there exists a mapping that sends the elements satisfying P*int* to pairwise disjoint open intervals. Any set of disjoint intervals in R with non-zero length is necessarily countable [18], since each of them contains a rational value that does not belong to the others.

It is now possible to define a successor relation on the real numbers satisfying P*int* with the formula *Succ*(x, y) = P*int*(x) ∧ P*int*(y) ∧ y < x ∧ ∀z. y < z < x ⇒ ¬P*int*(z), i.e., x is the successor of y, or equivalently, y is the predecessor of x.
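The definition of *Succ* can be read operationally on a finite set of points. In the Python sketch below, `p_int` is a hypothetical finite stand-in for the set described by P*int* (a genuine model is infinite, so this is only a fragment), and the quantifier ∀z ranges over that finite set, which suffices because the body only constrains points satisfying P*int*.

```python
from fractions import Fraction


def succ(p_int, x, y):
    """True iff x is the successor of y among the points of `p_int`.

    Mirrors Succ(x, y): both points are in the set, y < x, and no point
    of the set lies strictly between them.
    """
    return (x in p_int and y in p_int and y < x
            and not any(y < z < x for z in p_int))


# A finite fragment of a possible interpretation of P_int (assumed values).
points = {Fraction(-7, 2), Fraction(0), Fraction(1, 3), Fraction(5)}

print(succ(points, Fraction(1, 3), Fraction(0)),  # adjacent points
      succ(points, Fraction(5), Fraction(0)))     # 1/3 lies in between
```

Note that the points need not be integers: only their relative order matters, which is what allows P*int* to be defined in a fragment with order constraints alone.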

The axiomatization of P*int* is, in fact, precise enough to have the following lemma.

Lemma 1. *For any model* M *of* P*int*, *the set* M[P*int*] *is* <*-isomorphic to* Z*.*

For convenience in the proof, we define 0*int* as an arbitrary existentially quantified value that belongs to the set described by P*int*.

*Proof.* Given a model M of the axiomatization of P*int*, we need to define a bijection between the set M[P*int*] and Z that preserves order.

Let us define a mapping f from M[P*int*] to Z. We set f(0*int*) = 0, and then define recursively:

– f(y) = f(x) + 1 for each x, y ∈ M[P*int*] such that y > 0*int* and *Succ*(y, x),

– f(y) = f(x) − 1 for each x, y ∈ M[P*int*] such that y < 0*int* and *Succ*(x, y).

Thanks to the fact that every element of M[P*int*] has a unique predecessor and successor, it follows that f ranges over the whole set Z, proving that f is surjective. Since it is clear that f preserves order, it follows that f is strictly increasing, and therefore injective. It remains to show that f is defined for every element of M[P*int*].

If there exists some element y ∈ M[P*int*] for which f is not defined, then either there exists an element y > 0*int* such that the interval [0*int*, y] contains an infinite number of elements satisfying P*int*, or there exists an element y < 0*int* such that the interval [y, 0*int*] contains an infinite number of elements satisfying P*int*. Since both cases are symmetric, we only address the former. There must exist a strictly increasing infinite sequence of elements in M[P*int*] bounded by y. Let us consider its limit z ∈ R. Because there must exist an element of M[P*int*] smaller than z and arbitrarily close to z, it follows that z cannot have a predecessor, which contradicts an axiom. Therefore f is well-defined, and every element of M[P*int*] is associated with an integer. The mapping f is therefore a bijection. □

#### 3.2 Translating Formulas

We are now able to describe the satisfiability-preserving translation of formulas from uf<sup>1</sup>·idl·iro to uf<sup>1</sup>·ro. Consider a uf<sup>1</sup>·idl·iro formula ϕ. Without loss of generality, we assume that P*int* does not appear in ϕ. The translation of ϕ is defined as

> *AXIOMS*<sub>int</sub>(P*int*) ∧ ⟦ϕ⟧

where *AXIOMS*<sub>int</sub>(P*int*) is the conjunction of the axioms of P*int*, and ⟦·⟧ is a translation operator. This translation operator ⟦·⟧ distributes over all Boolean operators and quantifiers, and corresponds to the identity transformation for most considered atoms, except in the following cases:


*Example:* <sup>x</sup> <sup>−</sup> <sup>y</sup> <sup>≤</sup> <sup>2</sup> <sup>=</sup> <sup>∃</sup>z0, z1, z2. y <sup>=</sup> <sup>z</sup><sup>0</sup> <sup>∧</sup> *Succ*(z1, z0) <sup>∧</sup> *Succ*(z2, z1) <sup>∧</sup> <sup>x</sup> <sup>≤</sup> <sup>z</sup>2. Notice that we only deal with the case <sup>c</sup>∈<sup>N</sup> since every atom of the form <sup>x</sup>−y c with <sup>c</sup> <sup>∈</sup> <sup>Z</sup>\<sup>N</sup> and - ∈ {<, <sup>≤</sup>, =, <sup>≥</sup>, >} can be rewritten as <sup>y</sup> <sup>−</sup>x - −c with the following correspondences: (-, - ) ∈ {(=, =),(<, >),(>, <),(≥, <sup>≤</sup>),(≤, <sup>≥</sup>)}.
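The example suggests a uniform scheme for atoms of the form x − y ⋈ c: introduce c + 1 fresh variables, chain them with *Succ* starting from y, and compare x with the last one. The following Python sketch builds the translated atom as a string under that assumption. The function name, the ASCII rendering of the operators, and the generalization from ≤ to the other comparison operators are illustrative assumptions, not part of the formal definition.

```python
def translate_difference_atom(x: str, y: str, op: str, c: int) -> str:
    """Return an order-only rendering of the atom `x - y op c` as a string."""
    # Mirrored operators, used when the constant is negative.
    mirrored = {"=": "=", "<": ">", ">": "<", "<=": ">=", ">=": "<="}
    if c < 0:
        # x - y op c  is rewritten as  y - x op' -c.
        x, y, op, c = y, x, mirrored[op], -c
    zs = [f"z{i}" for i in range(c + 1)]      # fresh intermediate variables
    parts = [f"{y} = {zs[0]}"]                # the chain starts at y
    parts += [f"Succ({zs[i + 1]}, {zs[i]})" for i in range(c)]
    parts.append(f"{x} {op} {zs[c]}")         # compare x with the c-th successor of y
    return f"exists {', '.join(zs)}. " + " & ".join(parts)


print(translate_difference_atom("x", "y", "<=", 2))
```

For the atom x − y ≤ 2 this reproduces the shape of the example: three fresh variables and two *Succ* links. The number of introduced variables grows linearly with the constant c.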

#### 3.3 Establishing Equisatisfiability

Given a uf<sup>1</sup>·idl·iro formula <sup>ϕ</sup>, the translation that we have introduced generates a corresponding uf<sup>1</sup>·ro formula <sup>ψ</sup>. To establish that they are equisatisfiable, we need to prove that if ϕ admits a model, then ψ also admits one, and reciprocally.

Lemma 2. *Given a* uf<sup>1</sup>·idl·iro *formula* ϕ*, consider its translation into* uf<sup>1</sup>·ro*,* ψ = *AXIOMS*<sub>int</sub>(P<sub>int</sub>) ∧ ⟦ϕ⟧*. The formulas* ϕ *and* ψ *are equisatisfiable.*

*Proof.* If ϕ is satisfiable, let M be one of its models. Since ψ has the same free variables and predicates as ϕ, with the only addition of P<sub>int</sub>, we can directly construct a model M′ of ψ that agrees with M on the shared variables and predicates, and that interprets P<sub>int</sub> so that P<sub>int</sub>(x) holds whenever x ∈ ℤ. This is always possible since the only constraints on P<sub>int</sub> generated by the construction of ψ are the axioms stated above.

If ψ is satisfiable, then there exists a model M of ψ. Let us construct a model M′ of ϕ. Let 0<sub>int</sub> ∈ ℝ be an arbitrary element of M[P<sub>int</sub>]. We define an automorphism g of ℝ such that g(0<sub>int</sub>) = 0, and recursively g(y) = g(x) + 1 for x, y ∈ M[P<sub>int</sub>], y > 0<sub>int</sub> and *Succ*(y, x), and g(y) = g(x) − 1 for x, y ∈ M[P<sub>int</sub>], y < 0<sub>int</sub> and *Succ*(x, y). The automorphism g maps each open interval between the k-th and (k + 1)-th successors (resp. predecessors) of 0<sub>int</sub> in M[P<sub>int</sub>] onto the open interval (k, k + 1) (resp. (−(k + 1), −k)) while preserving order.

M′ is defined by M′[x] = g(M[x]) for each free variable x of the formula ϕ, and M′[P] = {g(x) | x ∈ M[P]} for each uninterpreted predicate P of ϕ. No unary predicate atom can be violated by M′ by definition. Furthermore, no order constraint can be violated by M′ either, since g preserves order. Regarding the difference logic constraints, the intermediate variables z<sub>i</sub> introduced in the translation are necessarily mapped to values in M[P<sub>int</sub>], since the *Succ* relation enforces this property. Hence, for each such variable, we have g(M[z<sub>i</sub>]) ∈ ℤ. Intuitively, this ensures that in M′ the difference between the values taken by the integer variables is consistent with the difference logic constraints. It follows that M′ is a model of ϕ.

# 4 Undecidability of uf<sup>1</sup>·rdl

The result presented in the previous section establishes a lower bound for the decidability of our family of fragments. A natural follow-up problem is to establish a corresponding upper bound, i.e., to find an extension of this logic that yields undecidability. We show here that, when combined with uninterpreted unary predicates, as soon as difference logic constraints on reals are allowed, the logic becomes undecidable.

We actually show a stronger result which is that a single unary predicate symbol is enough to yield undecidability. More precisely, we establish the undecidability of the restriction of uf<sup>1</sup>·rdl where only one predicate symbol is allowed, by reducing the halting problem of a Turing machine to the satisfiability problem over this restriction of uf<sup>1</sup>·rdl.

Theorem 2. *Satisfiability is undecidable for* uf<sup>1</sup>·rdl *with a single predicate.*

Corollary 1. *Satisfiability is undecidable for* uf<sup>1</sup>·rdl*.*

The remainder of this section is dedicated to proving Theorem 2. We consider w.l.o.g. Turing machines defined over an alphabet with only two symbols and no explicit blank symbol [16]. This choice leads to a simpler proof.

#### 4.1 Definitions

The proof is by reduction from the halting problem for a Turing machine with a single bi-infinite tape, starting from a blank tape (i.e., a tape filled with the symbol 0). Consider a Turing machine M = (Q, Σ, q<sub>I</sub>, q<sub>F</sub>, Δ), where


A *configuration* C of such a Turing machine is a triple containing the current state q, the content of the tape t ∈ {0, 1}<sup>ℤ</sup>, and the position of the head h ∈ ℤ. Since the machine starts from a blank tape, the initial configuration is C<sub>0</sub> = (q<sub>I</sub>, 0<sup>ℤ</sup>, 0).

A *run* ρ of length n ∈ ℕ (resp. n = +∞) of such a Turing machine is a finite (resp. infinite) sequence of configurations (C<sub>i</sub>)<sub>i∈[0,n]</sub> (resp. (C<sub>i</sub>)<sub>i∈ℕ</sub>), such that for any two consecutive configurations C<sub>i</sub> = (q<sub>i</sub>, t<sub>i</sub>, h<sub>i</sub>) and C<sub>i+1</sub> = (q<sub>i+1</sub>, t<sub>i+1</sub>, h<sub>i+1</sub>) there exists a transition (q, α, q′, α′, λ) ∈ Δ such that:


A *halting run* is a finite run such that the state of its last configuration is the halting state q<sub>F</sub>.
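To make the notions of configuration, run, and halting run concrete, here is a small Python sketch that simulates such a machine from a blank bi-infinite tape. It assumes a deterministic transition table given as a dictionary, and it reads a transition (q, α, q′, α′, λ) as "in state q reading α, write α′, move in direction λ, and enter q′", which matches how the transitions are used in the encoding of Sect. 4.5. The function name and the example machine are hypothetical.

```python
from collections import defaultdict


def simulate(delta, q_init, q_final, max_steps=1000):
    """Run a deterministic two-symbol machine from a blank bi-infinite tape.

    delta maps (state, read symbol) to (next state, written symbol, move),
    with move in {"L", "R"}. Returns the list of configurations visited,
    each as (state, frozenset of cells holding 1, head position).
    """
    tape = defaultdict(int)  # unvisited cells read as 0, so the tape starts blank
    state, head = q_init, 0

    def snapshot():
        return (state, frozenset(c for c, v in tape.items() if v == 1), head)

    run = [snapshot()]
    while state != q_final and len(run) <= max_steps:
        key = (state, tape[head])
        if key not in delta:
            break  # no applicable transition: the run stops without halting
        state, written, move = delta[key]
        tape[head] = written
        head += 1 if move == "R" else -1
        run.append(snapshot())
    return run


# Hypothetical two-step machine: write 1, move right, write 1, move left, halt.
example_delta = {("A", 0): ("B", 1, "R"), ("B", 0): ("H", 1, "L")}
example_run = simulate(example_delta, "A", "H")
is_halting = example_run[-1][0] == "H"
```

The returned list is a run in the sense above, and it is a halting run exactly when the state of its last configuration is the halting state.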

#### 4.2 Encoding Runs

Our goal is to encode a run of a Turing machine (as described before), i.e., encode the state, the tape content, and the position of the head for each configuration of such a run. Starting from the initial configuration, we must also ensure the coherence of the run w.r.t. the Turing machine transition relation, by connecting every two consecutive configurations. Our idea is to define an infinite sequence of intervals on the real line, such that each interval contains the encoding of its corresponding configuration (i.e., the first interval will contain the first configuration of the run, and so on). Difference constraints can then be used to connect consecutive configurations.

Let N = ⌈log<sub>2</sub>(|Q|)⌉. Each state q ∈ Q of M can therefore be uniquely encoded with N Boolean values b<sup>q</sup><sub>1</sub>, …, b<sup>q</sup><sub>N</sub>. We want to encode consecutive configurations of the Turing machine using a single predicate P over ℝ. In order to do so, we first need to describe a subset of ℝ that will act as a grid supporting the encoding of the state, the tape content, and the head position of the current configuration.
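As an illustration of the state encoding, the following Python sketch assigns to each state a distinct tuple of N Boolean values. The particular assignment (the binary expansion of the index of the state in a list) is an assumption made for the example; any injective assignment would serve equally well.

```python
def encode_states(states):
    """Map each state to a distinct tuple of N Booleans, N = ceil(log2(|Q|))."""
    n_bits = (len(states) - 1).bit_length()  # equals ceil(log2(len(states))) for len >= 1
    return {
        q: tuple(bool((index >> k) & 1) for k in range(n_bits))
        for index, q in enumerate(states)
    }


# Hypothetical state set with five states, hence N = 3.
encoding = encode_states(["q_I", "q_1", "q_2", "q_3", "q_F"])
```

Using `bit_length` avoids floating-point rounding in the logarithm while giving the same value of N.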

We use the concept of linear ordering [15] to describe the shape of the grid. A *linear ordering* J is a totally ordered set, i.e., a set equipped with a binary relation < which is irreflexive (for all j in J, j ≮ j), asymmetric (for all j, k in J, if j < k, then k ≮ j), transitive (for all i, j, k in J, if i < j and j < k, then i < k), and complete (for all j, k ∈ J, either j = k, j < k, or k < j). The *order type* of a linear ordering J is the class of all linear orderings <-isomorphic to J. The order types of a singleton, of the set composed of the first N natural numbers, of ℕ, and of ℤ are respectively denoted by 1, N, ω, and ζ. The concatenation of two linear orderings J and K (whose associated order relations are respectively &lt;<sub>J</sub> and &lt;<sub>K</sub>) is denoted by J + K. It corresponds to the linear ordering composed of the set of pairs {(j, 1) | j ∈ J} ∪ {(k, 2) | k ∈ K}, equipped with the order relation < defined by (j<sub>1</sub>, 1) < (j<sub>2</sub>, 1) if j<sub>1</sub> &lt;<sub>J</sub> j<sub>2</sub>, (k<sub>1</sub>, 2) < (k<sub>2</sub>, 2) if k<sub>1</sub> &lt;<sub>K</sub> k<sub>2</sub>, and (j, 1) < (k, 2) for every j ∈ J and k ∈ K.

More generally, given two linear orderings J and K, the linear ordering (J)<sup>K</sup> is the set of pairs (j, k) with j ∈ J and k ∈ K, with the order relation < such that (j<sub>1</sub>, k<sub>1</sub>) < (j<sub>2</sub>, k<sub>2</sub>) if either k<sub>1</sub> &lt;<sub>K</sub> k<sub>2</sub>, or k<sub>1</sub> = k<sub>2</sub> and j<sub>1</sub> &lt;<sub>J</sub> j<sub>2</sub>. These operators extend naturally to order types. For instance, the order type (ω)<sup>ω</sup> is the class of all linear orderings <-isomorphic to ℕ<sup>2</sup>.
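The two operators can be tried out on finite orderings. The sketch below represents a finite linear ordering as a list of comparable Python values and returns the elements of J + K and (J)<sup>K</sup> in increasing order. It is only meant to illustrate the two definitions; the function names are ours, and the infinite orderings used in the proof are of course not representable this way.

```python
def concat(J, K):
    """J + K: tag elements of J with 1 and elements of K with 2, order by (tag, value)."""
    tagged = [(j, 1) for j in J] + [(k, 2) for k in K]
    return sorted(tagged, key=lambda p: (p[1], p[0]))


def power(J, K):
    """(J)^K: pairs (j, k) ordered by k first, then by j."""
    return sorted(((j, k) for j in J for k in K), key=lambda p: (p[1], p[0]))


# Every element of J precedes every element of K in J + K.
print(concat([2, 1], [0]))
# (J)^K lays out one copy of J per element of K, in the order given by K.
print(power(["a", "b"], [0, 1]))
```

The second call shows why an ω exponent describes an infinite repetition of a pattern: each element of the exponent ordering contributes one full copy of the base ordering.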

The grid we consider is a linear ordering that is a subset of ℝ, of order type (N + ζ + 1 + ζ)<sup>ω</sup>. An ordering of order type N + ζ + 1 + ζ within the interval [0, 3) is depicted in Fig. 1. Each dot corresponds to a natural number and each vertical line corresponds to an element of the linear ordering. The first N points will support the encoding of a state. The first subordering that is <-isomorphic to ℤ (i.e., of order type ζ) will be used to encode the position of the head, while the second one will support the encoding of the tape content. The whole grid is composed of an infinite repetition of the subordering N + ζ + 1 + ζ (i.e., it is repeated on the intervals [3k, 3k + 3) for all k ∈ ℕ), hence the ω exponent.

Fig. 1. A visual representation of a linear ordering of order type N + ζ +1+ ζ.

#### 4.3 Defining the Support of the Encoding

Let us first define concretely the support of the encoding of the Turing machine configurations. The difficulty lies in describing the grid using a single predicate P, without interfering with the actual encoding of the configurations afterwards. Our solution is to characterize the points that belong to the grid by enforcing that such a point is surrounded by an open interval where P is uniformly *true* on the left, and by an open interval where P is uniformly *false* on the right, as depicted in Fig. 2. We do not specify yet how P behaves on x, as this is how the configurations will actually be encoded later.

Fig. 2. The real number x belongs to the grid, since it is surrounded by a *true* (black) open interval on the left, and a *false* (white) open interval on the right.

Such a characterization is easy to express in our restriction of uf<sup>1</sup>·rdl:

$$Support(x) = \left( \exists y.\ y < x \land \forall z.\ y < z < x \Rightarrow P(z) \right) \land \left( \exists y.\ x < y \land \forall z.\ x < z < y \Rightarrow \neg P(z) \right)$$

Let us now partially axiomatize the predicate P such that the set of *supporting* points constitutes a linear ordering of order type (N + ζ + 1 + ζ)<sup>ω</sup>:


(d) There are exactly N − 2 *supporting* points within the interval (**0**, **1**):

$$Axiom\_4 = \exists x\_1, x\_2, \dots, x\_N.\ x\_1 = \mathbf{0} \land x\_N = \mathbf{1} \land \bigwedge\_{1 \le i < N} \left( \mathbf{0} \le x\_i < \mathbf{1} \land SuccSupp(x\_{i+1}, x\_i) \right)$$

where *SuccSupp*(x, y) is a formula stating that x is the first *supporting* real value that is strictly greater than y, i.e., x is the successor of y on the grid. It is defined as follows:

$$SuccSupp(x, y) = y < x \land Support(x) \land Support(y) \land \forall z.\ y < z < x \Rightarrow \neg Support(z)$$

We also define an analogous formula to express that x is the predecessor of y: *PredSupp*(x, y) = *SuccSupp*(y, x).

(e) The set of *supporting* points within (**1**, **2**) is <-isomorphic to ℤ. This is done similarly to the axiomatization of P<sub>int</sub> (cf. Section 3.1). But because **1** (resp. **2**) is a *supporting* point, there must exist a uniformly false (resp. true) interval of P at its right (resp. left) where no other *supporting* points can appear. All the *supporting* points will therefore be constrained to appear within a smaller interval (b<sub>1</sub>, b<sub>2</sub>) with **1** < b<sub>1</sub> < b<sub>2</sub> < **2**, as illustrated in Fig. 3.

$$\begin{aligned} Axiom\_5 = {} & \exists b\_1, b\_2.\ \mathbf{1} < b\_1 < b\_2 < \mathbf{2} && (1) \\ & \land \left[ \forall x.\ (b\_1 < x < b\_2) \Rightarrow \exists y.\ x < y < b\_2 \land Support(y) \land \forall z.\ x < z < y \Rightarrow \neg Support(z) \right] && (2) \\ & \land \left[ \forall x.\ (b\_1 < x < b\_2) \Rightarrow \exists y.\ b\_1 < y < x \land Support(y) \land \forall z.\ y < z < x \Rightarrow \neg Support(z) \right] && (3) \\ & \land \left[ \forall x.\ (\mathbf{1} < x < \mathbf{2} \land Support(x)) \Rightarrow b\_1 < x < b\_2 \right] && (4) \end{aligned}$$

This axiom can be broken down into these elementary pieces:


$$Axiom\_6 = \forall x.\ \mathbf{1} < x < \mathbf{2} \Rightarrow \left( Support(x) \Leftrightarrow Support(x + 1) \right)$$

(g) The pattern of *supporting* points within [**0**, **3**) is repeated onto every interval [**3k**, **3k** + **3**) for k ∈ ℕ:

$$Axiom\_7 = \forall x.\ x \ge \mathbf{0} \Rightarrow \left( Support(x) \Leftrightarrow Support(x + 3) \right)$$

Notice that for *Axiom*<sub>7</sub>, it is not enough that a similar pattern appears within each interval [**3k**, **3k** + **3**): there must be an exact offset of 3 with the previous interval. This is mandatory to connect two consecutive configurations and ensure that they are coherent with the transition relation of the Turing machine, as defined later. The same goes for *Axiom*<sub>6</sub>, where the exact offset of 1 allows us to connect the position of the head to the tape content within a single configuration. The formula *AXIOMS*<sub>Supp</sub> = ⋀<sub>1≤k≤7</sub> *Axiom*<sub>k</sub> axiomatizes the predicate P.

Fig. 3. The points of the grid surrounded by open *true* (black) and *false* (white) intervals within (**1**, **2**).

Fig. 4. A model for the axiomatization of P over the interval (−∞, 1).

Lemma 3. *The formula AXIOMS<sub>Supp</sub> is consistent.*

The proof sketch below provides the key ideas to construct a model of *AXIOMS*<sub>Supp</sub>. The complete construction is described in [2].

*Proof.* Let us construct a subset S of ℝ that is a model of *AXIOMS*<sub>Supp</sub>. First, we make every negative number belong to S, which ensures that there are no negative supporting points. The interval [0, 1] is then cut into 2N − 2 intervals of equal length, which alternate between being included in S and being disjoint from S. This ensures the existence of exactly N − 1 supporting points within the interval (−∞, 1), 0 being the first; 1 will be considered later. These N − 1 supporting points are referred to as s<sub>1</sub>, s<sub>2</sub>, …, s<sub>N−1</sub> and are depicted in Fig. 4. Recall that the supporting points are exactly those surrounded by an interval of S (i.e., black on the figure) on the left, and an interval disjoint from S (i.e., white) on the right.

In order to make the real value 1 the N-th supporting point, it is enough to make an interval on its right disjoint from S, e.g., the interval (1, 1 + 1/4). Symmetrically, we make the interval (2 − 1/4, 2) included in S, satisfying the left part of the requirement for the real value 2 to be a supporting point.

We further characterize S such that the set of supporting points within the interval (1 + 1/4, 2 − 1/4) is <-isomorphic to ℤ. This can be done by partitioning the open interval (1 + 1/4, 2 − 1/4) into a bi-infinite sequence of open intervals alternating between being included in and disjoint from S, as depicted in Fig. 5.

Fig. 5. A model for the axiomatization of P over the interval (1, 2).

The whole pattern described on the interval (1, 2) can be directly transposed onto the interval (2, 3) with an exact offset of +1. Similarly, the distribution of S over the interval (0, 3) can be transposed onto every interval (3k, 3k + 3) with an offset of +3k, for k > 0. The only real values whose relation with S we do not describe are the points surrounded by an interval included in S on one side, and an interval disjoint from S on the other side. These points never conflict with the axiomatization *AXIOMS*<sub>Supp</sub>, which only deals with non-empty open intervals.

By construction, S satisfies each axiom of the formula *AXIOMS*<sub>Supp</sub>, and is therefore a model of this formula.
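The first step of this construction can be checked mechanically. The Python sketch below builds the pieces of S over (−∞, 1] for a given N and lists the points that have a piece included in S on their left and a piece disjoint from S on their right. The representation of pieces as (left, right, included) triples and the choice that the first piece of [0, 1] is disjoint from S are assumptions of the sketch; the latter is what makes 0 a supporting point.

```python
from fractions import Fraction


def pieces_up_to_one(n):
    """Pieces of S over (-inf, 1]: the negative half-line, then 2n - 2 equal pieces.

    Each piece is (left, right, included), with left = None for -infinity.
    Pieces of [0, 1] alternate, starting with one disjoint from S.
    """
    count = 2 * n - 2
    pieces = [(None, Fraction(0), True)]  # every negative number belongs to S
    for i in range(count):
        pieces.append((Fraction(i, count), Fraction(i + 1, count), i % 2 == 1))
    return pieces


def supporting_points(pieces):
    """Boundaries with a piece inside S on the left and a piece outside S on the right."""
    return [
        right
        for (_, right, left_in), (_, _, right_in) in zip(pieces, pieces[1:])
        if left_in and not right_in
    ]


points = supporting_points(pieces_up_to_one(5))  # N = 5
```

For N = 5 the list contains N − 1 = 4 points, the first of which is 0, in line with the count stated in the proof. The point 1 is not reported because the piece on its right is only fixed in the next step of the construction.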

#### 4.4 Encoding a Configuration of the Turing Machine

Now that the supporting grid has been properly defined, the actual encoding of a given configuration can be addressed. That is, the state, the tape content, and the head position of the (k + 1)-th configuration of a run are encoded on the supporting points contained within the interval [3k, 3k + 3).

Encoding the State. Encoding the state of a given configuration is rather direct, since we defined the grid to contain N consecutive supporting points within every interval [3k, 3k + 1] for k ∈ ℕ, which can support the encoding of a state. We only need to indicate that we start reading the encoding on a multiple of 3. However, the logic uf<sup>1</sup>·rdl does not allow periodicity constraints on variables to be expressed. Nevertheless, thanks to our axiomatization, 0 and every other positive multiple of 3 are the only points that simultaneously have no supporting predecessor, while admitting a supporting successor. These properties are expressible as follows:

$$NoPredSupp(x) = \forall z.\ (z < x \land Support(z)) \Rightarrow \exists y.\ z < y < x \land Support(y)$$

$$HasSuccSupp(x) = \exists z.\ x < z \land Support(z) \land \forall y.\ x < y < z \Rightarrow \neg Support(y)$$

For convenience, we introduce the formula *EncodingBegins* to characterize a real value x on which the encoding of a state starts:

$$EncodingBegins(x) = Support(x) \land NoPredSupp(x) \land HasSuccSupp(x)$$

Furthermore, the formula *State*<sub>q</sub> expresses that a state q ∈ Q is encoded on a given real number x and its N − 1 supporting successors:

$$State\_q(x) = EncodingBegins(x) \land \exists y\_1, \dots, y\_N.\ x = y\_1 \land \bigwedge\_{1 \le i < N} SuccSupp(y\_{i+1}, y\_i) \land \bigwedge\_{1 \le i \le N} P(y\_i) = b\_i^q$$

where P(y<sub>i</sub>) = b<sup>q</sup><sub>i</sub> is a shorthand for P(y<sub>i</sub>) if b<sup>q</sup><sub>i</sub> = ⊤, and ¬P(y<sub>i</sub>) if b<sup>q</sup><sub>i</sub> = ⊥.

Encoding the Head Position. The position of the head is encoded in the second part of the grid, that is, in the interval (3k + 1, 3k + 2) for the (k + 1)-th configuration (cf. Fig. 1). The grid on this interval is <-isomorphic to ℤ. Each element of this subordering will correspond to a position of the tape. When the predicate P is true at such a point, it means that the head points towards that cell. Since the Turing machines that we consider here have a single read/write head, it must point towards a unique cell for each configuration. Therefore P must be true only for a single element of that subordering.

Encoding the Tape Content. Similarly, the tape content is encoded in the third part of the grid, that is, in the interval (3k + 2, 3k + 3) for the (k + 1)-th configuration (cf. Fig. 1). Again, the grid on this interval is <-isomorphic to ℤ. And again, each element x of this subordering will correspond to a cell of the tape, matching the cell that corresponds to x − 1 in the head position interval. Figure 6 illustrates the connections between the suborderings, within a single configuration and with the next one. The idea of the encoding is to simply set the value of P to true on the elements of the subordering that correspond to cells containing a 1, and to false for cells containing a 0.

Fig. 6. The first two consecutive configuration encodings.

#### 4.5 Enforcing a Valid Run

Let us now formally define the formulas characterizing an accepting run of M. We will decompose the global formula into three main parts: the initial conditions *START*<sub>M</sub>, the conditions on the transitions *STEP*<sub>M</sub>, and the halting condition *END*<sub>M</sub>. For the sake of clarity, we use capital letters for these higher-level formulas.

The initial conditions of M are that the state encoded on **0** and its N − 1 supporting successors is the initial state q<sub>I</sub>, that the head points towards a unique, unspecified initial cell of the tape, and finally that the tape is initially filled with 0's. These conditions are expressed by the following formula:

$$\begin{aligned} START\_{\mathcal{M}} = {} & State\_{q\_I}(\mathbf{0}) \land \left[ \exists y.\ \mathbf{1} < y < \mathbf{2} \land Support(y) \land P(y) \right. \\ & \qquad \left. \land\ \forall x.\ \left( \mathbf{1} < x < \mathbf{2} \land Support(x) \land P(x) \right) \Rightarrow x = y \right] \\ & \land \left[ \forall y.\ \left( \mathbf{2} < y < \mathbf{3} \land Support(y) \right) \Rightarrow \neg P(y) \right] \end{aligned}$$

The requirements on the transition are more complex. Intuitively, if we have not yet encountered the halting state q<sub>F</sub> before reaching the step i ∈ ℕ, then we must ensure that the configuration at step i can be obtained from the configuration at the previous step i − 1 by following a transition (q, α, q′, α′, λ) ∈ Δ. The overall formula for this condition is the following:

$$\begin{aligned} STEP\_{\mathcal{M}} &= \forall y.\ \left( y > 0 \land EncodingBegins(y) \land NotEnded\_{\mathcal{M}}(y) \right) \\ &\qquad \Rightarrow \exists x.\ y = x + 3 \land Transition\_{\mathcal{M}}(x, y) \end{aligned}$$

The subformula *NotEnded*<sub>M</sub>(y) expresses that no valid real value prior to y (i.e., a positive multiple of 3 strictly smaller than y) encodes the halting state. This formula is defined by:

$$NotEnded\_{\mathcal{M}}(y) = \forall x.\ \left( x < y \land EncodingBegins(x) \right) \Rightarrow \neg State\_{q\_F}(x)$$

The subformula *Transition*<sub>M</sub>(x, y) expresses that there exists a transition (q, α, q′, α′, λ) ∈ Δ that allows moving in one step from the configuration encoded at x (i.e., the encoding of the configuration starts exactly on x) to the configuration corresponding to y. To improve readability, we decompose the condition on the transition relation as follows:

$$\begin{aligned} \textit{Transition}\_{\mathcal{M}}(x, y) &= \\ \bigvee\_{(q, \alpha, q', \alpha', \lambda) \in \Delta} \left[ \textit{State}\_{q}(x) \wedge \textit{State}\_{q'}(y) \wedge \textit{Tape}\_{\alpha, \alpha'}(x, y) \wedge \textit{Head}\_{\lambda}(x, y) \right] \end{aligned}$$

For a given transition (q, α, q′, α′, λ) ∈ Δ, the conditions on the states, tape and head are expressed as follows:


$$\begin{aligned} \operatorname{Tape}\_{\alpha,\alpha'}(x,y) = {} & \left[ \forall z.\ \left( x+1 < z < x+2 \land \operatorname{Support}(z) \land P(z) \right) \Rightarrow P(z+1) = \alpha \land P(z+4) = \alpha' \right] \\ & \land \left[ \forall z.\ \left( x+1 < z < x+2 \land \operatorname{Support}(z) \land \neg P(z) \right) \Rightarrow \left( P(z+1) \Leftrightarrow P(z+4) \right) \right] \end{aligned}$$

where P(z + k) = α is a shorthand for ∃u. u = z + k ∧ P(u) if α = 1, and ∃u. u = z + k ∧ ¬P(u) if α = 0. The "+ 1" operator allows us to connect the encoding of the head position with the encoding of the tape content within the same configuration. The "+ 4" operator does the same while jumping to the next configuration (cf. Fig. 6). Notice that this formula does not involve y; it assumes (rightfully, given the formula *STEP*<sub>M</sub>) that the equality y = x + 3 holds.

– The head is moved in the direction specified by λ ∈ {L, R}, i.e., left for L and right for R. This can be expressed by exploiting the predecessor and successor relations defined for supporting real values.

$$\begin{aligned} \operatorname{Head}\_{\lambda}(x, y) &= \forall z. \left( x + 1 < z < x + 2 \land \operatorname{Support}(z) \land P(z) \right) \\ &\Rightarrow \exists v. \, f\_{\lambda}(v, z + 3) \land P(v) \land \neg P(z) \end{aligned}$$

where f<sub>R</sub> = *SuccSupp* and f<sub>L</sub> = *PredSupp*. Since in the initial configuration of the Turing machine the head points towards a single cell, the formula *Head*<sub>λ</sub> ensures that this remains the case throughout every run of the Turing machine.
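The role of the offsets +1, +3 and +4 in the *Tape* and *Head* formulas can be illustrated with a little arithmetic. The Python sketch below places the grid point for a tape cell at a concrete position inside (3k + 1 + 1/4, 3k + 2 − 1/4) and reports in which configuration and in which part of the grid a shifted point lands. The particular embedding of ℤ used by `head_point` is a hypothetical choice made for the example; the construction only requires the supporting points of that interval to be <-isomorphic to ℤ.

```python
from fractions import Fraction


def head_point(cell, k=0):
    """Hypothetical grid point for tape cell `cell` in the head part of configuration k."""
    # cell / (4 * (1 + |cell|)) is strictly increasing and stays within (-1/4, 1/4).
    return 3 * k + Fraction(3, 2) + Fraction(cell, 4 * (1 + abs(cell)))


def locate(z):
    """Return (configuration index, part) for a non-negative, non-integer real z."""
    k, offset = divmod(z, 3)
    part = ("state", "head", "tape")[int(offset)]
    return int(k), part


z = head_point(-2)             # head part of the first configuration
same_config_tape = locate(z + 1)
next_config_head = locate(z + 3)
next_config_tape = locate(z + 4)
```

Shifting by 1 stays within the same configuration and moves from the head part to the tape part, while shifting by 3 or 4 reaches the head part or the tape part of the next configuration. This is why *Axiom*<sub>6</sub> and *Axiom*<sub>7</sub> require exact offsets rather than merely similar patterns.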

Finally, the existence of a halting run is expressed by the formula:

$$END\_{\mathcal{M}} = \exists x.State\_{q\_F}(x).$$

The global formula that expresses that the Turing machine M halts on some run encoded by the value of the predicate P is the following:

$$HALT\_{\mathcal{M}} = START\_{\mathcal{M}} \land STEP\_{\mathcal{M}} \land END\_{\mathcal{M}} \land AXIOMS\_{Supp}$$

where *AXIOMS*<sub>Supp</sub> is the axiomatization of the supporting points as described in Sect. 4.3.

By construction, satisfiability of the global formula *HALT*<sub>M</sub> is equivalent to the existence of a halting run for the Turing machine M. It follows that the satisfiability problem for uf<sup>1</sup>·rdl is undecidable, which proves Theorem 2.

### 5 Conclusion

This work provides a lower and an upper bound for the decidability of first-order fragments with quantifiers mixing uninterpreted unary predicates and weak forms of real arithmetic. This draws a precise picture of the frontier of decidability in fragments mixing real arithmetic and uninterpreted predicates.

We proved the decidability of the fragment uf<sup>1</sup>·idl·iro, where uninterpreted unary predicates, order constraints between real and integer variables, and difference logic constraints between integer variables are allowed. This result is a consequence of the already established decidability of its restriction uf<sup>1</sup>·ro, where only uninterpreted unary predicates and order constraints between real values are allowed. To the best of our knowledge, there does not exist yet a practical decision procedure for uf<sup>1</sup>·ro.

There exist fragments of arithmetic that are more expressive than difference logic, but still weaker than full Presburger arithmetic. It would be interesting to investigate whether decidability is preserved for these in the presence of uninterpreted unary predicates. Note however that our proof of decidability strongly relies on the translation of the constraints into the first-order theory of order over ℝ, with unary predicates. This translation is not suitable for, e.g., constraints of the form x + y ⋈ 0, where x and y are variables and ⋈ ∈ {<, ≤, =, ≥, >}.

In another result, we established the undecidability of the fragment uf<sup>1</sup>·rdl, where uninterpreted unary predicates and difference logic constraints between real variables are allowed. It is worth mentioning that this result can be adapted straightforwardly to the same logic interpreted over the domain Q.

Our long term goal is to design an effective decision procedure for the decidable fragment. Complexity results have been established [6,13,14] for the temporal logic counterpart of the theory of order, to which we reduce the decidability of our fragment of interest. We are currently designing a decision procedure relying on the concept of automata on linear orderings introduced in [3]. We hope that the insight we obtained through this decision procedure will eventually guide the design of new powerful instantiation techniques for SMT in a more expressive context, and that these techniques will happen to be complete in particular for this decidable fragment.

Acknowledgments. We are thankful to Tanja Schindler and the reviewers of this paper and of our previous work-in-progress workshop paper for their comments.

### References



# **Incremental Rewriting Modulo SMT**

Gerald Whitters<sup>1(✉)</sup>, Vivek Nigam<sup>3</sup>, and Carolyn Talcott<sup>2</sup>

<sup>1</sup> UPENN, Philadelphia, USA
whitters@seas.upenn.edu
<sup>2</sup> SRI International, Menlo Park, USA
<sup>3</sup> Federal University of Paraíba, João Pessoa, Brazil

**Abstract.** Rewriting Modulo SMT combines two powerful automated deduction techniques (1) rewriting and (2) SMT-solving. Rewriting enables the specification of behavior of systems using rewriting rules, while SMT theories specify system properties. Rewriting Modulo SMT is enabled by combining existing tools, such as Maude and SMT solvers. Search algorithms used for carrying out Rewriting Modulo SMT, however, cannot exploit the incremental solving features available in SMT solvers as they are based on breadth-first search. This paper addresses this limitation by proposing Incremental Rewriting Modulo SMT Theories, which is a syntactical restriction to rewriting rules. This restriction turns out to naturally be used in several applications of Rewriting Modulo SMT, including the verification of algorithms, cyber-physical systems, and security protocols. Moreover, we propose a Hybrid-Search algorithm for Incremental Rewriting Modulo SMT Theories that combines breadth-first search and depth-first search, thus enabling incremental SMT-solving. We demonstrate through a collection of existing benchmarks that the Hybrid-Search algorithm can achieve a 10 times performance improvement in verification times.

### **1 Introduction**

Rewriting modulo SMT [14] is the result of the combination of two powerful automated deduction methods: rewriting logic and SMT-solving. It is supported by the integration [11] of powerful tools, such as Maude [6] and Z3 [8]. During rewriting, a set of constraints on the symbols appearing in a term is generated. These constraints can be, for example, non-linear arithmetic constraints that specify possible values that can be assumed by the configuration parameters. Demonstrating properties of such specifications amounts to searching with these rewrite rules and checking the satisfiability of the accumulated constraints using SMT solvers. Rewriting modulo SMT has been successfully applied in case studies from several domains, including safety of cyber-physical systems (CPSes) [13], verification of algorithms [2], and network security analysis [16].

One important aspect that has not been addressed until now is how to exploit an SMT solver's capability of incrementally solving problems. In this solving method, instead of checking the satisfiability of a formula from scratch, the solver reuses data computed by prior checks. For example, if the satisfiability of a formula b has been checked, the check on b ∧ b<sub>I</sub> may reuse the intermediate results obtained while checking the satisfiability of b. It has been shown that incremental solving can improve performance by a factor of 2–5 [10].<sup>1</sup>

The search algorithms used to implement rewriting modulo SMT are similar to those implemented in the Maude search engine [6]. They use a breadth-first search (BFS) algorithm with memoization techniques in order to improve performance. This type of search seems incompatible with incremental solving as constraints appearing in different branches of the search tree are generated under different conditions. Thus, it is hard to define what the increment (b<sub>I</sub> mentioned above) would be.

This paper's goal is to enable rewriting modulo SMT that can exploit incremental solving. To achieve this, we make the following contributions:


We carried out a collection of experiments (the case studies mentioned above) on algorithm verification, cyber-physical systems verification, and network security analysis. The experiments show that in all these benchmarks, the hybrid search algorithm outperforms current BFS techniques, in some experiments achieving a tenfold performance improvement.

Section 2 illustrates the problems of existing BFS methods for Rewriting Modulo SMT and proposes Incremental Rewrite Theories, which formalize the notion of increments. Section 3 describes the proposed Hybrid algorithm and illustrates how it enables incremental SMT solving. Section 4 describes experiments that compare different search mechanisms (BFS, DFS, and Hybrid) on existing benchmarks from the literature. Finally, we conclude by discussing Related Work in Sect. 5 and Future Work in Sect. 6.

### **2 Incremental Rewriting Modulo SMT**

Rewriting logic [12] is a logical formalism that is based on two ideas: states of a system are represented as elements of an algebraic data type, specified in an equational theory, and the behavior of a system is given by local transitions between states described by rewrite rules. A rewrite rule has the form t → t′ if b, where t and t′ are terms possibly containing variables and b is a condition (a boolean term). Such a rule applies to a system in state s (a ground term) if t can be matched to a part of s by supplying the right values for the variables, and if the condition b holds when supplied with those values. In this case, the rule can be applied by replacing the part of s matching t by t′ using the matching values for the variables in t′.

<sup>1</sup> Incremental solving can, however, also reduce performance depending on the theories that are used.

Maude is a language and tool based on rewriting logic [6]. Maude provides a high performance rewriting engine featuring matching modulo associativity, commutativity, and identity axioms; and search and model-checking capabilities. Thus, given a specification S of a concurrent system, one can execute S to find one possible behavior; use search to see if a state meeting a given condition can be reached; or model-check S to see if a temporal property is satisfied, and if not, to see a computation that is a counterexample.

Symbolic rewriting modulo SMT [13,14] allows rewriting symbolic states (t, b), where t is a term possibly containing variables and b a boolean term constraining the allowed values of variables of t. The symbolic state (t, b) represents the set of (concrete) states that are instances of t such that the instantiating substitution satisfies b. Thus a rewrite to a symbolic state (t′, b′) such that b′ is not satisfiable represents the empty set of concrete rewrites, and satisfiability can be checked at each step to avoid useless work. This is independent of checking that a goal is satisfied by a symbolic state. To implement symbolic rewriting in Maude, variables are replaced by symbols, treated as constants by Maude, and translated as variables when using an SMT solver to check satisfiability of the constraint. Symbolic rewriting allows us to reason about open systems, and to reason about all (possibly infinitely many) instances of a configuration.

Verification problems are expressed as reachability problems, stated in the form

$$\mathsf{search}(\mathsf{t}\_{0}, \mathsf{b}\_{0}) \Rightarrow (\mathsf{t}', \mathsf{b}') \text{ such that } \mathsf{goalCond}(\mathsf{t}', \mathsf{b}')$$

where (t′, b′) is a pattern and goalCond is a boolean function that checks whether a state satisfies some condition. Typically, goalCond(t′, b′) also makes calls to the SMT solver to check whether some constraints derived from b′ are satisfiable.

As illustrated by Fig. 1, Rewriting Modulo SMT implementations [11] traverse the search tree derived from the rewrite rules using BFS-based algorithms. At each step, e.g., (t<sub>0</sub>, b<sub>0</sub>) → (t<sub>1</sub>, b<sub>1</sub>), the engine checks the satisfiability of the condition b<sub>1</sub>. If the check fails, then the search backtracks following the BFS strategy. Otherwise, if the check succeeds, then the engine checks (1) whether (t<sub>1</sub>, b<sub>1</sub>) matches the pattern (t′, b′) and (2) if this is the case, it checks the condition goalCond(t<sub>1</sub>, b<sub>1</sub>), which may make further calls to the SMT solver, written as SMT(goalCond(t<sub>1</sub>, b<sub>1</sub>)). If goalCond returns true, then a solution for the reachability problem is found. Otherwise, the algorithm continues the search following BFS.

From the sequence of calls to the SMT solver, one can observe the following difficulties of exploiting incremental SMT solving when using BFS based search strategy:

**Fig. 1.** Illustration of the search tree and SMT-calls when using Rewriting Modulo SMT following a BFS algorithm. The sequence of SMT-calls of a BFS algorithm is depicted to the left, where SMT(goalCond(t<sub>i</sub>, b<sub>i</sub>)) denotes possible SMT-calls required by the goal condition goalCond. The numbers inside the circles specify the order in which nodes are traversed.


To address this problem, we introduce a special class of rewrite theories, called *Incremental Rewrite Theories*.

**Definition 1.** *An incremental rewrite theory is a rewrite theory specification* Σ, E, R *where* Σ *is a typed alphabet;* E *is an equational theory; and* R *is a set of rewrite rules of the forms:*

(t, b) → (t<sub>1</sub>, b ∧ b<sub>I</sub>) *and* (t, b) → (t<sub>1</sub>, b ∧ b<sub>I</sub>) *if cond*

*where* t, t<sub>1</sub> *are well-formed terms;* b, b<sub>I</sub> *are boolean formulas (in a given theory); and cond is a conjunction of equations.* <sup>2</sup>

<sup>2</sup> The rule on the left is an unconditional rewrite rule that can be applied whenever it matches a subterm of the current state. The rule on the right is conditional: cond specifies the conditions under which the rule can be applied. The condition is checked using the equational theory to determine if the equations are satisfied by a candidate matching substitution. The term (t, b) represents a set of values, namely all instances for which the constraint b is true. A constraint solver is used to determine if b is satisfiable, that is, if the set of values is non-empty. In brief, the difference between b and cond is how they are used in reasoning.

The verification problem for incremental rewrite theories is a specialized reachability problem, as defined below.

**Definition 2.** *Let* T *be an incremental rewrite theory. An incremental reachability problem over* T *is of the form:*

*search*(t<sub>0</sub>, b<sub>0</sub>) ⇒ (t′, b′) *such that goalTerm*(t′) *and* SMT(b′ ∧ b<sub>I</sub>)

*where goalTerm is a function that takes a term and returns a boolean value and* b<sub>I</sub> = goal(t′) *is a boolean formula constructed from* t′*.*

The following three examples illustrate how incremental theories can model different types of systems. These examples are based on specifications from the literature [2,13,16]. For ease of exposition, we simplify the rules in the description below. In Sect. 4, the full specifications from the literature are used in our experiments.

*Example 1.* This example is based on the work [2] for verification of the CASH scheduling algorithm [4]. In this algorithm, each task has a worst-case execution time. Whenever a task is completed before its deadline, the unused processing time is added to a global queue of unused budget, which can then be used by other tasks. Rewriting modulo SMT has been used to verify whether it is possible for a task to miss its deadline [2]. In particular, constraints keep track of the processing times and the available time budgets.

It turns out that the specification of this algorithm as rewrite rules and the verification problem are an incremental rewrite theory and an incremental reachability problem, respectively. For example, the following rule specifies when a deadline is missed:

(id<sub>1</sub> : global | deadlineMiss : missStat, Ats, id<sub>0</sub> : server | state : st, usedBudget : t, timeDeadline : t<sub>1</sub>, maxBudget : n rest, b) → (id<sub>1</sub> : global | deadlineMiss : true, Ats, id<sub>0</sub> : server | state : st, usedBudget : t, timeDeadline : t<sub>1</sub>, maxBudget : n rest, b ∧ b<sub>I</sub>) if (st = waiting ∨ st = executing)

where rest is the specification of the remaining tasks, Ats are other attributes of the server, and b<sub>I</sub> is the set of constraints t ≥ 0 ∧ t<sub>1</sub> ≥ 0 ∧ n > 0 ∧ (n − t) > t<sub>1</sub>. This rule specifies that the deadline is missed if there is a task id<sub>0</sub> that is not finished, i.e., either waiting or executing, such that the time to finish (t<sub>1</sub>) cannot be met by the available time budget n − t required by the task.

The verification problem of checking whether, for some given configuration (t<sub>0</sub>, b<sub>0</sub>) of server and tasks, a task can miss its deadline is specified by the following search command, which is an incremental reachability problem:

search(t<sub>0</sub>, b<sub>0</sub>) ⇒ (id<sub>1</sub> : global | deadlineMiss : true, Ats rest, b′) such that SMT(b′)

*Example 2.* Rewriting Modulo SMT has been used for verifying whether resource bounded intruders can slowly deny access to webservers [16]. This type of attack was inspired by application layer DDoS attacks such as Slowloris [7] where the attacker attempts to exhaust all the resources of a webserver by periodically sending bursts of multiple requests. When receiving such bursts of requests, the webserver has to allocate resources for at least some period of time, called timeout. As the webserver has limited resources, the attacker is capable of denying service to legitimate users by sending enough bursts.

Constraints were used in previous work [16] to keep track of (1) the number of resources available to the webservers, and (2) the timeout period of bursts. While we refer to the previous work [16] for the complete formalization, we illustrate the incrementality of such specifications with a simplified version of the protocol initialization rule from reference [16].

([iid | pxs | ri | Trec] [sid | pxs | rs], b) → ([iid | px(num, rp) pxs | ri<sup>ν</sup> | Trec] [sid | px(num, rp) pxs | rs<sup>ν</sup>], b ∧ b<sub>I</sub>)

This rule specifies that the intruder iid with ri resources creates a new burst of protocol session instances px(num, rp) with num instances each using rp resources, where num is a symbol. These instance requests are received by the server sid, which has rs resources. The resources of the intruder, ri, and the resources of the server, rs, are updated to the fresh symbols ri<sup>ν</sup> and rs<sup>ν</sup>. These symbols are constrained by the boolean increment b<sub>I</sub> defined as ri<sup>ν</sup> = (ri − num × rp) ∧ rs<sup>ν</sup> = (rs − num × rp) ∧ num > 0 ∧ ri<sup>ν</sup> ≥ 0. Similar rules specify when the protocol sessions time out and are cleaned up by the server, thus releasing resources.

The verification property is to check whether a bounded intruder with some limited number of resources ri can deny service by consuming the server sid's resources. This can be expressed by an incremental reachability property as follows, where (t<sub>0</sub>, b<sub>0</sub>) specifies the initial condition when all intruder and server resources are free:

```
search(t0, b0) ⇒ ([iid | pxs | ri | Trec] [sid | pxs | rs], b′) such that SMT(b′ ∧ bI)
```
where b<sub>I</sub> is the constraint rs ≤ 0 specifying that the resources of the server sid are depleted.

*Example 3.* This example of verification of cyber-physical systems (CPSes) is based on reference [13]. A CPS is represented by a set of agents (ag1,..., agn) that interact with the environment (env) to achieve some goal while not violating properties, such as the minimum distance to other objects.

Constraints are used to specify an agent's physical attributes, such as the position at(ag, (x, y)), speed spd(ag, v), acceleration acc(ag, acc), and direction dir(ag, dir) of an agent ag. The evolution of a system with one agent can be specified by the following incremental rule when assuming, for simplicity, that the agent's direction is on the x-axis.

$$\left( \left[ \texttt{env} \mid \texttt{at}(\texttt{ag}, (x, y)), \texttt{spd}(\texttt{ag}, v), \texttt{acc}(\texttt{ag}, acc), \texttt{dir}(\texttt{ag}, dir), \texttt{kb} \right] \texttt{conf}, \texttt{b} \right) \rightarrow \left( \left[ \texttt{env} \mid \texttt{at}(\texttt{ag}, (x\_1, y\_1)), \texttt{spd}(\texttt{ag}, v\_1), \texttt{acc}(\texttt{ag}, acc), \texttt{dir}(\texttt{ag}, dir), \texttt{kb} \right] \texttt{conf}, \texttt{b} \wedge \texttt{b}\_I \right)$$

Here kb is the set of other knowledge-base elements, conf contains the agent's internal representation, x<sub>1</sub>, y<sub>1</sub>, v<sub>1</sub> are fresh symbols, and b<sub>I</sub> is the set of constraints x<sub>1</sub> = (x + (v + v<sub>1</sub>) × dt/2) ∧ y<sub>1</sub> = y ∧ v<sub>1</sub> = v + acc × dt. These constraints specify the agent's new position and speed using classical physics equations.
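As a worked instance of this increment, take the purely illustrative values x = 0, v = 2, acc = 1 and dt = 2 (assumptions chosen for the arithmetic, not values from the benchmarks). Computing the speed update first and then substituting it into the position update gives:

$$v\_1 = v + acc \times dt = 2 + 1 \times 2 = 4, \qquad x\_1 = x + (v + v\_1) \times dt / 2 = 0 + (2 + 4) \times 2 / 2 = 6, \qquad y\_1 = y$$

Since every conjunct of the increment is an equality defining a fresh symbol, the new symbols are fully determined once x, y, v, acc and dt are fixed; when those remain symbolic, the solver instead carries the equalities forward as part of the accumulated constraint.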

The verification property bad, where an agent is too close to an obstacle such as a pedestrian, is specified by the search command:

search(t<sub>0</sub>, b<sub>0</sub>) ⇒ ([env | at(ag<sub>1</sub>, (x<sub>1</sub>, y<sub>1</sub>)), at(ag<sub>2</sub>, (x<sub>2</sub>, y<sub>2</sub>)), kb] conf, b′) such that SMT(b′ ∧ b<sub>I</sub>)

where b<sub>I</sub> is the set of constraints x<sub>1</sub> = x<sub>2</sub> ∧ y<sub>1</sub> = y<sub>2</sub>, specifying that two agents ag<sub>1</sub> and ag<sub>2</sub> are in the same position, i.e., colliding.

### **3 Hybrid BFS-DFS Algorithm**

The definition of Incremental Rewrite Theories addresses the problem of the **Definition of Increments** discussed above. The second problem (**Not possible to chain incremental calls**) still needs to be addressed. Indeed, BFS procedures do not enable the chaining of incremental calls. To illustrate this, consider again the search tree and BFS execution in Fig. 1. Assume that b<sub>1</sub> = b<sub>0</sub> ∧ b<sub>0,1</sub>, b<sub>2</sub> = b<sub>0</sub> ∧ b<sub>0,2</sub> and that goalCond(t, b) has the form b ∧ b<sub>I</sub> as one would expect when using incremental rewrite theories. It is possible to call the SMT solver incrementally during the sequence of calls SMT(b<sub>1</sub>) and SMT(goalCond(t<sub>1</sub>, b<sub>1</sub>)), but not to chain the call SMT(b<sub>2</sub>) incrementally. This is because it is not possible to define an increment between b<sub>1</sub> and b<sub>2</sub> as they lie in different branches of the search tree.

The first obvious alternative is using Depth-First Search (DFS) instead of BFS. This would indeed lead to an execution that could chain incremental calls to the SMT solver. For example, in the tree depicted in Fig. 1, the sequence of calls would be

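SMT(b<sub>0</sub>); SMT(goalCond(t<sub>0</sub>, b<sub>0</sub>)); SMT(b<sub>1</sub>); SMT(goalCond(t<sub>1</sub>, b<sub>1</sub>)); SMT(b<sub>3</sub>); SMT(goalCond(t<sub>3</sub>, b<sub>3</sub>)); ...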

Since b<sub>3</sub> is of the form b<sub>0</sub> ∧ b<sub>0,1</sub> ∧ b<sub>1,3</sub>, we know the increment is b<sub>1,3</sub>. There are, however, two problems with DFS. The first problem is that DFS may fail to find a solution that BFS would find, because it can descend forever into an infinite branch. The second problem is that the calls for goalCond(t, b) appear in between the increments, e.g., SMT(b<sub>0</sub>); SMT(goalCond(t<sub>0</sub>, b<sub>0</sub>)); SMT(b<sub>1</sub>).

We propose the algorithm hybrid search described in Fig. 2 that addresses these two problems of DFS by combining BFS and DFS and using the PUSH and POP features of SMT solvers for incremental solving. These features enable the creation of backtracking scopes of learned clauses. By default, sequential calls to SMT will attempt to use incremental solving based on the constraints solved in previous calls. A call to PUSH will add to the solver stack any learned clauses from calls to SMT while a call to POP will remove any learned clauses since the last PUSH.

**Fig. 2.** Pseudo-code of the Hybrid Search Algorithm hybrid search.

The hybrid search algorithm takes as input the search tree T<sup>3</sup>, a non-negative natural number d, and a goal condition g. Intuitively, the parameter d specifies the depth to which the algorithm shall perform DFS before switching to BFS.

We start with an empty *Queue* and a *Solver*. hybrid search starts at line 4, with the next few lines initializing *found* to NULL and pushing the root of T onto *Queue*. The while loop starts at line 7 and continues while *Queue* is non-empty and no solution has been found. It pops the next node off the *Queue* on line 8, then on the next line calls dfs bounded, which starts on line 12, using this node as the root. dfs bounded is a modified depth-bounded depth-first search. It starts by creating a backtracking scope on *Solver* by calling PUSH and storing the result of SMT(b), where b is the boolean constraint of the current node.

<sup>3</sup> Notice that in practice, there is a mechanism that constructs the tree on the fly.

**Fig. 3.** Illustration of a hybrid search algorithm execution using the goal condition g and depth two. The POP surrounded by a box indicates the points when the algorithm backtracks in the search tree. The numbers inside the circles specify the order in which nodes are traversed.

Subsequently, in line 16, it checks whether SMT(b) returned UNSAT; if so, it calls POP and returns immediately without exploring any children of this node. Any descendant node would have a boolean constraint of the form b ∧ b<sub>I</sub> for some b<sub>I</sub>, and since SMT(b) is UNSAT, b ∧ b<sub>I</sub> must also be UNSAT. Otherwise, it continues by checking on line 20 whether goal(node) is true, and if so sets *found* to this node and terminates dfs bounded and hybrid search. If *found* is not set, then line 24 checks whether the current depth is equal to the depth parameter d. If it is, all of the children nodes, i.e., all the nodes that are at depth d + 1 from the initial root node passed in on line 9, are added to *Queue*, and no nodes deeper in the tree are visited for now. After all such nodes are added, the execution returns to line 7 to start another dfs bounded from the next element in *Queue*. Until then, it continues traversing the tree in a DFS-like manner on line 30, ensuring that POP is called for each node when dfs bounded backtracks, so that *Solver* can properly unlearn clauses that it no longer needs.
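The control flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation and not a transcription of Fig. 2 (its line numbers are not mirrored here). The `Node` class, the `IntervalSolver` class and the demo tree are hypothetical: `IntervalSolver` is a toy stand-in for an incremental SMT solver whose constraints are interval bounds `(var, lo, hi)`, chosen only so that the sketch runs with the standard library. The function names `hybrid_search` and `dfs_bounded` follow the names used in the text.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    increment: list                      # constraints added on the edge into this node
    children: list = field(default_factory=list)


class IntervalSolver:
    """Toy stand-in for an incremental SMT solver (illustrative assumption).

    A constraint is a triple (var, lo, hi) meaning lo <= var <= hi.
    """

    def __init__(self):
        self.scopes = [[]]               # one constraint list per backtracking scope
        self.calls = []                  # trace of PUSH / POP / SMT operations

    def push(self):
        self.scopes.append([])
        self.calls.append("PUSH")

    def pop(self):
        self.scopes.pop()
        self.calls.append("POP")

    def add(self, constraints):
        self.scopes[-1].extend(constraints)

    def check(self):
        """Return True iff all asserted intervals have a common solution."""
        self.calls.append("SMT")
        bounds = {}
        for scope in self.scopes:
            for var, lo, hi in scope:
                cur_lo, cur_hi = bounds.get(var, (lo, hi))
                bounds[var] = (max(cur_lo, lo), min(cur_hi, hi))
        return all(lo <= hi for lo, hi in bounds.values())


def dfs_bounded(node, path, depth, d, goal, solver, queue, order):
    solver.push()                        # backtracking scope for this node
    solver.add(node.increment)
    try:
        order.append(node.name)
        if not solver.check():           # b is UNSAT, so every extension of b is too
            return None
        extra = goal(node)               # None when the term does not match the goal
        if extra is not None:
            solver.push()                # keep goal constraints out of descendants
            solver.add(extra)
            hit = solver.check()
            solver.pop()
            if hit:
                return node
        path = path + node.increment
        for child in node.children:
            if depth == d:
                queue.append((child, path))   # resume from this child later, FIFO
            else:
                found = dfs_bounded(child, path, depth + 1, d, goal, solver, queue, order)
                if found is not None:
                    return found
        return None
    finally:
        solver.pop()                     # always undo this node's scope on the way out


def hybrid_search(root, d, goal, solver):
    order, queue = [], deque([(root, [])])
    while queue:
        node, path = queue.popleft()
        solver.push()
        solver.add(path)                 # ancestors' constraints, re-asserted from scratch
        found = dfs_bounded(node, path, 0, d, goal, solver, queue, order)
        solver.pop()
        if found is not None:
            return found, order
    return None, order


def demo_tree():
    n1 = Node("n1", [("x", 2, 9)], [Node("n3", [("x", 5, 9)]), Node("n4", [("x", 0, 1)])])
    n2 = Node("n2", [("x", 0, 4)], [Node("n5", [("x", 3, 3)]), Node("n6", [("x", 4, 8)])])
    return Node("n0", [("x", 0, 10)], [n1, n2])


def goal(node):
    return [("x", 4, 4)] if node.name == "n6" else None


for d in (0, 2):
    found, order = hybrid_search(demo_tree(), d, goal, IntervalSolver())
    print(d, found.name, order)
```

On this demo tree, depth 0 visits the nodes level by level, while depth 2 visits them in depth-first order; in both runs node `n4` is pruned because its accumulated constraint is unsatisfiable. With a real SMT solver in place of the toy class, the nodes taken from the queue are the ones that lose the benefit of earlier incremental work, since their ancestors' constraints have to be asserted afresh.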

We illustrate the execution of hybrid search with the tree shown in Fig. 3, which also shows the sequence of calls to PUSH, POP and SMT due to the initial call to dfs bounded. The sequence of calls illustrates the chaining of incremental calls to the SMT solver. For example, the data structures constructed in the call SMT(b<sub>1</sub>) are used in the SMT calls for b<sub>3</sub> and b<sub>4</sub>, including the calls goal(b<sub>3</sub>) and goal(b<sub>4</sub>). This makes sense as b<sub>1</sub> is a subformula of b<sub>3</sub>, b<sub>4</sub>, goal(b<sub>3</sub>) and goal(b<sub>4</sub>). However, the data structures constructed in the SMT call for goal(b<sub>1</sub>) are not stored due to the subsequent POP call, as goal(b<sub>1</sub>) is not necessarily a subformula of b<sub>3</sub>, b<sub>4</sub>, goal(b<sub>3</sub>) and goal(b<sub>4</sub>). The second observation is the combination of DFS and BFS. While the subtree of depth d = 2 is traversed, the algorithm removes the data structures constructed during the call of SMT(b<sub>1</sub>), indicated by the 2 × POP in Fig. 3, as b<sub>1</sub> is not necessarily a subformula of b<sub>2</sub>.

Notice that the depth parameter d determines how much incremental solving one is willing to use, at the risk of spending longer in a branch of the search tree that may not contain a solution. For example, in the tree and execution shown in Fig. 3, the algorithm will traverse the node (t<sub>7</sub>, b<sub>7</sub>) and will call SMT(b<sub>7</sub>), but without using the data structures constructed previously for b<sub>3</sub>, that is, it will not solve it incrementally.

The following results relate hybrid search with BFS and with DFS.

**Proposition 1.** *Let* T *be a tree and* g *be a decidable goal condition. Then,* hybrid search(T, 0, g) *will traverse* T *in the same order as BFS.*

**Proof Sketch.** A DFS search bounded by depth 0 will only traverse a single node, the node it starts at. Then, it adds nodes to a FIFO queue in the same manner as BFS. Hence, hybrid search(T, 0, g) will traverse T in the same order as BFS. **QED**.

**Proposition 2.** *Let* T *be a tree and* g *be a decidable goal condition. Suppose the depth of* T *is* d*. Then, for any* k ≥ d*,* hybrid search(T, k, g) *will traverse* T *in the same order as DFS.*

**Proof Sketch.** If k is greater than or equal to the depth of T, then a k depth-bounded DFS from the root node would traverse all of T. Hence, hybrid search(T, k, g) traverses T in the same order as DFS. **QED**.

The following statement provides coverage guarantees.

**Proposition 3.** *Let* d > 0*,* T *be a tree of finite branching, and* g *be a decidable goal condition. Then,* hybrid search(T, d, g) *finds a solution in finite time, i.e., some node* n *in* T *such that* g(n) *is true, if such a solution exists.*

**Proof.** Let B<sub>i</sub> be the number of nodes in T at depth i. Suppose that the solution node n exists at depth r and no solutions exist at a lower depth. Let 0 ≤ r ≤ qd for some q. The first depth-bounded DFS will traverse all nodes up to depth d. This then adds B<sub>d+1</sub> nodes to Queue. Running the depth-bounded DFS on these nodes will traverse all the nodes up to depth 2d. Traversing all nodes up to depth qd would take 1 + B<sub>d+1</sub> + B<sub>d+2</sub> + ... + B<sub>qd</sub> iterations of depth-bounded depth-first search. Since n exists at depth r ≤ qd and each B<sub>i</sub> is finite since T has finite branching, n would be found in finite time. **QED**.

To address the fact that search trees may have infinite depth, one often uses bounded search, which searches the tree only up to some given depth d. The following proposition states that in these cases it is best to deploy hybrid search with depth d to search through all nodes of the sub-tree, provided incremental SMT calls are more efficient than SMT calls from scratch.

**Proposition 4.** *Let* T *be a tree of finite branching with branching factor* b *and* g *be a decidable goal condition. Let* T(d) *be the sub-tree of* T *of depth* d *with* d > 0*. Assume that incremental SMT calls, i.e., using* PUSH*, take less time than calls from scratch, i.e., without using* PUSH*. Then for any* d′ ≥ 0 *such that* d′ ≠ d*, the time required by* hybrid search(T, d, g) *to traverse all nodes in* T(d) *is less than the time of* hybrid search(T, d′, g) *to traverse all nodes in* T(d)*.*

**Proof.** Let 0 < r < 1 be the average performance benefit from incremental SMT calls and t be the time it takes for non-incremental SMT calls. Let B<sub>i</sub> be the number of nodes at depth i. Since b is finite, each B<sub>i</sub> is finite. The time required by hybrid search(T, d, g) to traverse all nodes in T(d) is t + rtB<sub>1</sub> + rtB<sub>2</sub> + ... + rtB<sub>d</sub>. Suppose that 0 < d′ < d. Let pd′ < d ≤ (p + 1)d′ for some p. For hybrid search(T, d′, g) to traverse all nodes in T(d), it must traverse all nodes in T((p + 1)d′): because each dfs bounded must travel exactly d′ depth, hybrid search(T, d′, g) will traverse only depths that are multiples of d′. Then, the time required for hybrid search(T, d′, g) is t + rtB<sub>1</sub> + ... + rtB<sub>d′</sub> + tB<sub>d′+1</sub> + rtB<sub>d′+2</sub> + ... + rtB<sub>2d′</sub> + ... + tB<sub>pd′</sub> + rtB<sub>pd′+1</sub> + ... + rtB<sub>(p+1)d′</sub>. There are p + 1 terms that do not get the benefit from incremental SMT calls for hybrid search(T, d′, g), while there is 1 term that does not get this benefit for hybrid search(T, d, g). Hence, the time required for hybrid search(T, d, g) to traverse all nodes in T(d) is less than the time required for hybrid search(T, d′, g) to traverse all nodes in T(d). Now, suppose that d′ > d. Then, for hybrid search(T, d′, g) to traverse all nodes in T(d), it must traverse all nodes in T(d′). The time required for hybrid search(T, d′, g) is t + rtB<sub>1</sub> + rtB<sub>2</sub> + ... + rtB<sub>d′</sub>. But, because d′ > d and each rtB<sub>i</sub> > 0, the time required for hybrid search(T, d, g) is less than that for hybrid search(T, d′, g). Hence, the time required for hybrid search(T, d, g) to traverse all nodes in T(d) is less than the time required for hybrid search(T, d′, g) to traverse all nodes in T(d). Therefore, for any d′ ≠ d the time required for hybrid search(T, d, g) to traverse all nodes in T(d) is less than the time required for hybrid search(T, d′, g) to traverse all nodes in T(d). **QED**.
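As a sanity check of the cost model used in the proof, consider a hypothetical complete binary tree, so that B<sub>i</sub> = 2<sup>i</sup>, with d = 4 and d′ = 2 (values assumed purely for illustration; they give p = 1). Instantiating the two cost expressions from the proof:

$$\mathrm{time}(d = 4) = t + rt\,(B\_1 + B\_2 + B\_3 + B\_4) = t + rt\,(2 + 4 + 8 + 16) = t + 30\,rt$$

$$\mathrm{time}(d' = 2) = t + rtB\_1 + rtB\_2 + tB\_3 + rtB\_4 = t + 6\,rt + 8\,t + 16\,rt = 9\,t + 22\,rt$$

$$\mathrm{time}(d' = 2) - \mathrm{time}(d = 4) = 8\,t - 8\,rt = 8\,t\,(1 - r) > 0 \quad \text{since } 0 < r < 1$$

In this instance the gap is exactly the cost of the 8 nodes at depth d′ + 1 = 3, which are solved from scratch at cost t each rather than incrementally at cost rt each.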

### **4 Implementation and Experiments**

Our implementation is based on Python with the Z3 SMT solver and Maude integrated using Python bindings [15] as depicted in Fig. 4. The Z3 Solver is responsible for checking the incremental satisfiability of constraints using SMT, PUSH and POP, while Maude is responsible for executing rewriting rules. The Maude bindings allow for loading Maude files into the Python implementation of hybrid search. The search is done with a Python function that repeatedly calls the Maude search with one step (Search1) so that the traversal of the search space can be controlled. The original Maude specifications were modified to replace calls to SMT with calls to functions defined using the Maude hook mechanism for attaching external code to function symbols. This mechanism is exposed by the Maude Python bindings. There are two types of function, one that checks satisfiability while keeping any learned clauses from the check, and one that just checks without adding any learned clauses. The functions keep track of the SMT solver state using appropriate calls to PUSH and POP. The implementation is available at [17].

**Fig. 4.** Overview of the implementation used for the experiments using hybrid search, the SMT solver Z3 and the rewriting tool Maude.

Figures 5, 6 and 7 summarize the experiments carried out using implementations available in the literature [3,13,16] for the verification of the systems described in Examples 1, 2, and 3. All experiments were run on a Windows 10 machine, Intel Core i7-10700J, 16 GB of RAM, on Python 3.10.2, using Maude Python bindings 1.1.2 and Z3 4.11.2.0. We measure the runtime for these three applications of rewriting modulo SMT to determine the performance gain from using hybrid search at various depth parameters compared to BFS and DFS. Each table shows the initial configuration for the system, then statistics for searches using BFS, DFS, and hybrid search at various depths, terminating when a single goal node is found. The statistics have the form n/m/p, which specifies the time n in seconds to perform verification, the number of states m traversed, and the percentage p of verification time required by SMT-solving. DNF indicates that no solution was found within 30 min. For example, in the first row, for cashOK<sub>1</sub> using BFS, the execution time was 6.9 seconds, requiring 91 state traversals while spending 77% of the execution time in Z3.

For our experiments, we used the same subsets of the verification problems used in references [3,13,16]:


**Fig. 5.** CASH Verification Experiments. cashOK<sub>1</sub> = cashOK(I<sub>0</sub>, I<sub>1</sub>, I<sub>2</sub>, I<sub>3</sub>, true), cashOK<sub>2</sub> = cashOK(I<sub>0</sub>, I<sub>1</sub>, I<sub>2</sub>, I<sub>3</sub>, I<sub>0</sub> + I<sub>3</sub> > I<sub>1</sub> + I<sub>2</sub>), and cashOK<sub>3</sub> = cashOK(I<sub>0</sub>, I<sub>1</sub>, I<sub>2</sub>, I<sub>1</sub>, I<sub>0</sub> + I<sub>2</sub> > I<sub>1</sub>), and *mutatis mutandis* for cashBad<sub>1</sub>, cashBad<sub>2</sub> and cashBad<sub>3</sub>.

**Fig. 6.** Slowloris Experiments. Slow<sub>1</sub> = Slowloris(1, 0, 24), Slow<sub>2</sub> = Slowloris(1, 0, 36), Slow<sub>3</sub> = Slowloris(1, 1, 12), Slow<sub>4</sub> = Slowloris(1, 1, 24), Slow<sub>5</sub> = Slowloris(1, 1, 36).

**Fig. 7.** Cyber-Physical System Verification Experiments, where cps<sub>1</sub> = pedestrian(3, 3, 2, 1), cps<sub>2</sub> = pedestrian(4, 3, 2, 1), cps<sub>3</sub> = pedestrian(5, 3, 2, 1), cps<sub>4</sub> = pedestrian(3, 4, 2, 1), cps<sub>5</sub> = pedestrian(4, 4, 2, 1), cps<sub>6</sub> = pedestrian(5, 4, 2, 1). The bounds *t*, 2 × *t* and 3 × *t* are determined according to the *t* parameter of the scenario.

The results for the CASH verification experiments show that hybrid search finishes up to about 10 times faster than BFS and terminates in all cases, as opposed to DFS, which does not finish within 30 min in two cases. The overhead of Z3 is reduced from about 70–80% under BFS to 6–25% under hybrid search. This indicates the effectiveness of incremental SMT solving for the types of constraints used in this example.

Similarly, in the Slowloris examples, hybrid search finishes up to 10 times faster than BFS and always terminates, while two of the DFS cases do not finish within 30 min. In these cases the overhead of Z3 is about 80–90% under BFS and about 30–60% under hybrid search, demonstrating the effectiveness of incremental solving. Interestingly, even when a much larger number of states is traversed, e.g., in case Slow<sub>2</sub> with HYBRID d = 4, where 3314 states are traversed as opposed to 775 by BFS, the verification time drops to one third, from about 40 s to 13 s. This indicates that the main overhead of BFS is indeed SMT solving.

For the Cyber-Physical System (CPS) Verification experiments, hybrid search completes up to about 5 times faster than BFS. The overhead of Z3 does not change significantly in these experiments, which indicates that incremental solving is not as effective as in the other two examples (CASH and Slowloris). The reason for this may be the non-linear nature of the constraints for CPSes, which contrasts with the former two examples that use linear arithmetic constraints. Despite this, hybrid search and DFS still outperform BFS because they need to traverse fewer nodes before finding a goal node.

### **5 Related Work**

We consider three related areas of work in optimizing symbolic execution modulo SMT: hybrid search strategies, incremental constraint solving methods, and tradeoffs between search space and constraint complexity.

*Hybrid Search Strategies.* Others have previously explored techniques for combining BFS and DFS so as to take advantage of the benefits of both while reducing the drawbacks of each.

Reference [5] proposes a hybrid algorithm for Binary Decision Diagrams (BDDs). BDDs are often used to represent and manipulate boolean functions symbolically. Traditionally, depth-first approaches were used in the construction of BDDs as they had relatively low memory overhead. However, it was later discovered that a breadth-first approach performs better due to better memory access locality, at the cost of larger memory overhead. To improve upon both approaches, a hybrid of the two is used. Essentially, the algorithm switches between the two techniques based on its memory overhead: when the memory overhead is computed to be low, a breadth-first search is used, and when it is high, a depth-first search is used.

Reference [1] constructs a "breadth-first, depth-next" algorithm for building Random Forest (RF) models. An RF model is a machine learning model that uses decision trees. Both DFS and BFS approaches are used in machine learning frameworks. The authors observe that BFS has memory-efficient access patterns at lower depths. As the depth increases it loses this benefit and accesses memory virtually at random. At this point, DFS performs better. As a result, their algorithm starts with a breadth-first approach and, once it is computed that BFS no longer has an efficient access pattern, switches to a depth-first approach.

Reference [9] introduces "depth-first iterative-deepening (DFID)." One of the issues with BFS is that it has exponential memory complexity. DFS can circumvent this drawback as its memory complexity is linear, but it comes with its own problems. It generally requires some depth bound and a check for repeated nodes, otherwise the search may not terminate. The actual depth bound needed may not be knowable at runtime, and choosing a bound that is too low may result in the search ending without finding the solution. To counteract the downsides of BFS and DFS, DFID is used. DFID starts with DFS bounded by depth one, then performs a DFS bounded by depth two, and continues this process with incrementally larger depth bounds until a solution is found. It must visit the same nodes multiple times, but it is shown that the runtime complexity is not affected by this.
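The DFID scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch: the toy graph `GRAPH`, the `successors` callback, and the `max_depth` cutoff are assumptions made for the example and are not taken from [9].

```python
def depth_limited(node, is_goal, successors, limit, path):
    """DFS from `node` that descends at most `limit` further edges."""
    if is_goal(node):
        return path
    if limit == 0:
        return None
    for child in successors(node):
        if child in path:              # skip nodes already on the current path
            continue
        found = depth_limited(child, is_goal, successors, limit - 1, path + [child])
        if found is not None:
            return found
    return None


def dfid(start, is_goal, successors, max_depth=50):
    """Repeat depth-limited DFS with bounds 1, 2, 3, ... until a goal is found."""
    for limit in range(1, max_depth + 1):
        found = depth_limited(start, is_goal, successors, limit, [start])
        if found is not None:
            return found
    return None                        # no goal within the assumed cutoff


# Hypothetical toy graph, used only for illustration.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": ["f"], "f": []}
path_to_f = dfid("a", lambda n: n == "f", lambda n: GRAPH[n])
```

Each iteration re-explores the shallow part of the tree, which is exactly the repeated work that makes the scheme unattractive when every visited node triggers an SMT call.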

Unfortunately, none of these algorithms seem particularly helpful with respect to rewriting modulo SMT. For example, prior algorithms [1,5] attempt to take advantage of memory locality as much as possible. In our case, this would not give much of a performance increase. Reference [9] requires nodes to be visited multiple times. This would lead to duplicate calls to the SMT solver, only worsening the bottleneck.

*Incremental Solving.* In reference [10] the authors compare cache-based and stack-based incremental constraint solving methods in the context of symbolic execution for test generation. Cache-based incrementality works outside the solver to cache results and attempt to reuse them. Stack-based incrementality uses a solver's ability to reuse information learned when solving a subproblem, together with the associated push/pop interface. Implementations of the two methods and a baseline (no incrementality) were compared on a large benchmark set of C programs and on randomly generated programs. The space of symbolic execution paths was searched using bounded depth-first search. The authors found that caching generally increased average solving time over the baseline (by a factor of 2–5 depending on code size), while stack-based methods decreased average solving time by roughly a factor of 20. This is consistent with our observations even though the source of the search tree is different and the class of constraints is different.

*Trading Search Space for Constraint Complexity.* A notion of guarded term is introduced in reference [2] as a method to reduce the search state space in symbolic rewriting modulo SMT by replacing non-determinism by disjunction. The effect of using guarded terms is demonstrated in a study of the CASH algorithm for task scheduling. Many properties that could not be checked using symbolic execution modulo SMT (due to size of search space and timeout) became tractable using guarded terms.

A study of the tradeoff between search space size and constraint size using symbolic execution modulo SMT in the context of analyzing safety of autonomous systems, such as platooning scenarios, is presented in reference [13]. The results in that paper suggest that not only the size of the state space matters for automation, but also the size of the constraints that are sent to the SMT solver: many searches fail to terminate due to non-termination of constraint solving when constraints get large, while the same searches terminate when disjunctions are turned into branching in the search space.

None of these approaches, however, investigate the use of incremental SMT solving for improving performance of Rewriting Modulo SMT.

### **6 Conclusions and Future Work**

This paper proposes Incremental Rewrite Theories that enable incremental SMT solving for rewriting modulo SMT. This is accomplished by the search procedure hybrid search, which combines BFS and DFS. The effectiveness of hybrid search is demonstrated using a collection of verification problems taken from the literature, including algorithm verification, network security analysis, and cyber-physical systems safety verification. In all examples, the time taken to verify by hybrid search improved by a factor of 5 to 10 when compared to traditional BFS approaches, showing the great benefits of using incremental solving.

The current notion of incremental rewrite theory is essentially a syntactic notion, although equational theories are used to reduce terms and matching may be modulo axioms, such as associativity and commutativity. This makes identifying the boolean increment efficient and thus well suited for the hybrid algorithm. An interesting direction for future work is to investigate less restrictive notions of *incremental* and identify more general classes of rewrite theories where incremental solving is effective. Another direction of future work we are investigating is the trade-off between incremental solving and the shape of constraints, e.g., using disjunctions to reduce the search space versus splitting disjunctions to reduce SMT solving time. We are also investigating the incorporation of incremental solving algorithms in tool implementations such as Maude.

**Acknowledgments.** This research is funded in part by the Ashton Fellowship at the University of Pennsylvania. Talcott was partially supported by the U.S. Office of Naval Research under award number N00014-20-1-2644, and NRL grant N0017317-1-G002. We also thank the anonymous reviewers for their careful reading and insightful comments.

### **References**



# **Iscalc: An Interactive Symbolic Computation Framework (System Description)**

Bohua Zhan<sup>1,2</sup>(✉), Yuheng Fan<sup>3</sup>, Weiqiang Xiong<sup>2</sup>, and Runqing Xu<sup>1</sup>

<sup>1</sup> State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China <sup>2</sup> University of Chinese Academy of Sciences, Beijing, China

bzhan@ios.ac.cn

<sup>3</sup> National Computer System Engineering Research Institute of China, Beijing, China

**Abstract.** The need to verify symbolic computation arises in diverse application areas. In this paper, based on earlier work on verifying computation of definite integrals in HolPy, we present a tool Iscalc for performing a variety of symbolic computations interactively, taking a middle ground in terms of ease of use and rigor between computer algebra systems and interactive theorem provers. The tool supports user-level definitions and dependency among computations, allowing construction and reuse of custom theories. Side conditions are checked on a best-effort basis. The tool is applied to highly non-trivial computations from the textbook Inside Interesting Integrals.

**Keywords:** Symbolic computation · User interface · Computer algebra

### **1 Introduction**

Symbolic computations arise in many mathematical proofs as well as in science and engineering. The use of computers to ensure their correctness is hence an important problem. Interactive theorem provers and computer algebra systems provide two alternative approaches. Most interactive theorem provers have extensive libraries in analysis [6], based upon which one can verify correctness of computations with a very high level of confidence. However, the learning curve for using such libraries is quite steep. On the other hand, computer algebra systems, such as Mathematica, Maple, etc., aim to perform computations automatically. However, it is difficult to guide the computation if the automatic procedure fails, and the correctness is not fully guaranteed. Indeed, there have been examples of mistakes made by such computer algebra systems in the past [11].

Previous work [18] introduces a system for performing and verifying symbolic computation as an extension to the HolPy interactive theorem prover [19]. The user can perform calculation of definite integrals step-by-step, using rules such as substitution, integration by parts, etc. Each step has a relatively simple implementation, and proofs in higher-order logic can be constructed automatically from the sequence of steps, which in turn can be checked by the HolPy kernel. This provides a user experience which can be seen as a mix between the two approaches discussed above, combining the more intuitive feel of computer algebra systems with a higher level of confidence in the results.

In this paper, we present a significant extension to the work in [18], forming an independent tool named Iscalc (**I**nteractive **s**ymbolic **calc**ulations). In particular, we make the following extensions aimed at greater safety, extensibility, and ability to handle a wider range of examples.


One of our main aims and yardstick for measuring progress is verifying computations from the textbook *Inside Interesting Integrals* [17]. This book contains many computations of integrals using a variety of techniques, including differentiating under the integral sign, series expansions, and so on. Many computations are quite involved (the longest example we did, Ahmed's Integral, is 4 pages long in the book). We also carry over and complete some of the case studies in [18].

Our aim is to provide a user interface that is more intuitive and accessible to mathematicians and engineers. In particular, computations are displayed in LaTeX form, and whenever there is tension between conventional mathematical language and the more precise formal language, we prefer the former. We take the best-effort approach to correctness, providing systematic checks for the usual mistakes, such as cancelling expressions that may be zero, or exchanging sums that are not absolutely convergent. However, full correctness guarantees in the sense of interactive theorem proving are not achieved without proof reconstruction, which we leave to future work. In this respect, our approach is more similar to SMT solvers and program verification tools based on them, which sacrifice some correctness guarantees for more efficiency and speed of development.

We now give an outline for the rest of this paper<sup>1</sup>. Section 2 describes the overall architecture of Iscalc. Section 3 shows results of case studies, and gives some interesting examples. Section 4 discusses some lessons we took from this work, especially for user interface design. Section 4.1 discusses related work and Sect. 5 concludes the paper.

<sup>1</sup> Source code and examples are available at https://github.com/bzhan/iscalc.

### **2 Architecture**

Iscalc has a layered architecture consisting of several modules, as shown in Fig. 1. In this section, we begin with some preliminary definitions, then describe the functionality of each module in turn.

**Fig. 1.** Overall Architecture

#### **2.1 Preliminaries**

The term language of Iscalc inherits from that in [18], but with extensions for limits, summation, and indefinite integrals. The full syntax is as follows.

$$\begin{aligned} e &:= v \mid c \mid e_1 \ \mathit{op} \ e_2 \mid f(e) \mid \mathsf{Deriv}(e, v) \mid \mathsf{Integral}(e, v, a, b) \mid \\ &\quad \mathsf{Limit}(e, v, a, \mathit{dir}) \mid \mathsf{Sum}(e, i, a, b) \mid \mathsf{IndefiniteIntegral}(e, v, \mathit{deps}) \mid \mathsf{Skolem}(n, \mathit{deps}) \end{aligned}$$

Constructors on the first line stand for variables, constants, operators, function applications, derivatives, and definite integrals, respectively. Constants are extended to include positive and negative infinities. Constructors on the second line are new, and we explain them in more detail.

Limit(e, v, a, *dir*) represents the limit of expression e as variable v goes to expression a, where *dir* represents the direction of the limit. That is, we distinguish between $\lim_{x \to 0^+} f(x)$ and $\lim_{x \to 0^-} f(x)$, etc. Sum(e, i, a, b) represents summation of expression e as the integer index i goes from a to b (inclusive, except when b = ∞). IndefiniteIntegral(e, v, *deps*) and Skolem(n, *deps*) are used together for computing with indefinite integrals. The former represents the indefinite integral of e with respect to v. When this is evaluated to an expression plus "C", this C is represented by a Skolem term. Here *deps* represents the additional variables that C may depend on, which come from the list of dependent variables *deps* of the indefinite integral. The use of dependent variables in evaluating indefinite integrals is illustrated by an example in Sect. 3.1.

Another extension compared to [18] is the addition of formulas. These are used to specify goals, wellformedness conditions on terms, as well as assumptions on goals and definitions. Currently we support the following constructors for formulas:<sup>2</sup>

$$f := e_1 \ \mathit{op} \ e_2 \mid \mathsf{isInt}(e) \mid \mathsf{notInt}(e) \mid \mathsf{converges}(e)$$

where the binary operator *op* is one of =, ≠, <, ≤, >, ≥. isInt(e) and notInt(e) state that e is, respectively is not, an integer. converges(e) states that e is convergent, where e is a series whose upper limit is ∞.
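To make the grammar above more tangible, the following Python sketch models a subset of the term and formula constructors as immutable dataclasses. All class and field names are hypothetical and chosen for this illustration only; they are not claimed to match the data structures in the Iscalc source code, and Deriv and Integral are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: object            # a number, or a marker such as "pi" or "inf"


@dataclass(frozen=True)
class Op:
    op: str                  # e.g. "+", "*", "/", "^"
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Fun:
    name: str
    arg: "Expr"


@dataclass(frozen=True)
class Limit:
    body: "Expr"
    var: str
    point: "Expr"
    direction: str           # "+", "-", or "" for a two-sided limit


@dataclass(frozen=True)
class Sum:
    body: "Expr"
    index: str
    lower: "Expr"
    upper: "Expr"


@dataclass(frozen=True)
class IndefiniteIntegral:
    body: "Expr"
    var: str
    deps: Tuple[str, ...]    # variables the constant C may depend on


@dataclass(frozen=True)
class Skolem:
    name: str
    deps: Tuple[str, ...]


Expr = Union[Var, Const, Op, Fun, Limit, Sum, IndefiniteIntegral, Skolem]


@dataclass(frozen=True)
class Compare:               # formula  e1 op e2
    op: str                  # one of "=", "!=", "<", "<=", ">", ">="
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Converges:             # formula  converges(e)
    series: Sum


# pi*log(a)/2 + C(b): an expression plus a Skolem constant depending on b
half_pi_log_a = Op("/", Op("*", Const("pi"), Fun("log", Var("a"))), Const(2))
answer = Op("+", half_pi_log_a, Skolem("C", ("b",)))

# converges(Sum(1/(n+1)^2, n, 0, inf))
term = Op("/", Const(1), Op("^", Op("+", Var("n"), Const(1)), Const(2)))
goal = Converges(Sum(term, "n", Const(0), Const("inf")))
```

Frozen dataclasses give structural equality and hashing for free, which is convenient if expressions are later used as dictionary keys.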

#### **2.2 Context**

In [18], each computation is independent from each other, and all available definitions and identities are built into the kernel. In contrast, Iscalc develops a system of user-level definitions and dependency between computations similar to usual interactive theorem provers. This is achieved by a hierarchy of *books*, *files*, *definitions* and *goals*. Each book consists of an ordered list of axioms, definitions, and files, and may depend on other books. Each file contains a list of goals, whose computation may depend on previous items in the book. Each definition specifies a new function along with assumptions on the arguments of that function. Each axiom or goal specifies a single expression to be proved under a set of premises. It may be marked with *attributes* to specify its type or how it is to be used (e.g. whether it can be used during simplification).

In the implementation, a Context object maintains the list of definitions, identities, and inequality rules available at the current file. It also contains the premises and inductive hypothesis for the current computation (these are modified when performing a case analysis or induction, as described in Sect. 2.5).

#### **2.3 Algorithms**

Iscalc implements several basic algorithms in computer algebra, for checking inequalities, simplification and normalization of expressions, computing limits, and solving equations. All of these take a Context object as input, and depend on the context information.

<sup>2</sup> Currently we do not use logical operators, as negation is unnecessary for the current list of formulas, and conjunction and disjunction are represented using internal data structures. This may change as new needs arise in the future.

*Inequality Checking.* Unlike in the previous paper, condition checking is implemented entirely from scratch rather than relying on SymPy. It is well-known that checking inequalities involving transcendental functions is undecidable. Our goal is to perform simple rule-based reasoning automatically, leaving more involved inequalities to be proved with user guidance. The overall approach is saturation: we maintain a dictionary mapping expressions to conditions on them. Given an expression for which we wish to derive some conditions, saturation works recursively on each subexpression, matching it against the main argument of each rule (left side of inequalities, or the last argument of predicates). For each match, it looks in the dictionary for existing facts that justify the assumptions of the rule. Special reasoning is performed on numerical constants (e.g. $x < c_1$ can be used to justify $x < c_2$ if $c_1 \le c_2$). Comparisons between numerical constants are currently done with floating-point approximation.

The approach described here is relatively simple, and it is not difficult to ensure termination, as we only get conditions on expressions that already appear. However, in practice it can be quite powerful when combined with user-guided rewriting, as shown by the example in Sect. 3.2.
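The flavour of this saturation procedure can be conveyed by a minimal Python sketch. It is a simplification under stated assumptions: expressions are nested tuples, only three sign conditions (`">0"`, `">=0"`, `"!=0"`) are tracked, and the handful of rules for `sq`, `mul` and `add` are hypothetical stand-ins for the rule set in the context.

```python
def saturate(expr, facts):
    """Return the sign conditions derivable for `expr`.

    `expr` is a string (variable), a number, or a tuple (head, arg, ...).
    `facts` maps expressions to conditions that are already known.
    """
    known = set(facts.get(expr, ()))
    if isinstance(expr, (int, float)):           # special reasoning on constants
        if expr > 0:
            known.add(">0")
        elif expr == 0:
            known.add(">=0")
        else:
            known.add("!=0")
    elif isinstance(expr, tuple):
        head, *args = expr
        sub = [saturate(arg, facts) for arg in args]   # recurse on subexpressions
        if head == "sq":                         # e^2 >= 0, and > 0 if e != 0
            known.add(">=0")
            if "!=0" in sub[0]:
                known.add(">0")
        elif head == "mul":
            for cond in (">0", ">=0", "!=0"):    # each condition is closed under products
                if all(cond in s for s in sub):
                    known.add(cond)
        elif head == "add":
            if all(">=0" in s for s in sub):
                known.add(">=0")
                if any(">0" in s for s in sub):
                    known.add(">0")
    if ">0" in known:                            # > 0 implies >= 0 and != 0
        known |= {">=0", "!=0"}
    return known


# 4 x^2 cos^2(a) + (x^2 - 1)^2, with cos(a) treated as an opaque term
cos_a = ("cos", "a")
expr = ("add",
        ("mul", 4, ("mul", ("sq", "x"), ("sq", cos_a))),
        ("sq", ("add", ("sq", "x"), -1)))

with_facts = saturate(expr, {"x": {"!=0"}, cos_a: {"!=0"}})
without_x = saturate(expr, {cos_a: {"!=0"}})
```

Because the recursion only descends into subexpressions that already appear, termination is immediate, mirroring the observation above.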

*Simplification.* Simplification of expressions works in mostly the same way as [18], and we restate the main ideas. We normalize with respect to AC-property of addition and multiplication, and combine equal terms. When trying to combine $t^a t^b$ into $t^{a+b}$, we check using the current context that either t is nonzero and a, b are integers, or t is nonnegative. This prevents cancellation of e.g. t/t into 1 when t may be zero.

Moreover, we apply identities in the context that are marked with the simplify attribute. These cover evaluation of functions at special values, as well as issues like removal of absolute value sign (e.g. |x| = x if x ≥ 0).

*Normalization.* There are situations where different forms of an expression are desirable for different purposes, e.g. factorized vs. expanded form of a polynomial, single quotient vs. a sum of quotients, etc. We designed the simplifier to not make a choice in such situations. Instead, if the user wishes to convert an expression to a different form, she can specify the rewriting explicitly. Iscalc then normalizes both old and new expressions and checks whether they are equal. Normalization expands polynomials and combines quotients (e.g. for checking partial fraction decomposition), and performs (among others) rewriting of logarithms and exponentials.
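As an illustration of checking a user-specified rewrite by comparing normal forms, the sketch below handles only univariate rational functions with exact rational coefficients. The representation (a polynomial as a dictionary from exponent to coefficient, a quotient as a numerator/denominator pair) and all helper names are assumptions made for this example; the normalizer in Iscalc covers a much wider class of expressions.

```python
from fractions import Fraction


def poly(*coeffs):
    """Polynomial from coefficients, lowest degree first; zero terms dropped."""
    return {i: Fraction(c) for i, c in enumerate(coeffs) if c != 0}


def p_add(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}


def p_mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}


def r_add(r, s):
    """Combine two quotients into a single quotient."""
    (n1, d1), (n2, d2) = r, s
    return (p_add(p_mul(n1, d2), p_mul(n2, d1)), p_mul(d1, d2))


def same_normal_form(r, s):
    """Two quotients agree iff the expanded cross products coincide."""
    (n1, d1), (n2, d2) = r, s
    return p_mul(n1, d2) == p_mul(n2, d1)


half = Fraction(1, 2)
old = (poly(1), poly(-1, 0, 1))                      # 1 / (x^2 - 1)
new = r_add((poly(half), poly(-1, 1)),               # (1/2) / (x - 1)
            (poly(-half), poly(1, 1)))               # - (1/2) / (x + 1)
wrong = r_add((poly(1), poly(-1, 1)), (poly(-1), poly(1, 1)))

accepted = same_normal_form(old, new)
rejected = same_normal_form(old, wrong)
```

The user supplies the target form (here the partial fraction decomposition), and the tool only has to verify it, which is much easier than discovering the decomposition automatically.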

*Computing Limits.* For limit computations, we implement a simplified version of the approach by Gruntz [10]. To compute $\lim_{x \to \infty} e$, we evaluate recursively the limit of each subexpression in e, as well as the asymptotics of approaching that limit. Possible asymptotics include powers of polynomials and logarithms, as well as exponentials. Finding the limit as x approaches other values is converted to computing the limit at infinity.

As with other algorithms, the aim is not to achieve a high level of automation, but to perform the simpler limits, leaving more complex cases to human guidance (e.g. using L'Hôpital's rule or with rewriting). On the other hand, using the complete algorithm of Gruntz, or the algorithm implemented by Eberl in Isabelle [8], would certainly increase automation and range of applications.
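As a small worked illustration of converting a one-sided limit at a finite point into a limit at infinity, consider the following. The substitution $y = 1/x$ is one standard way to perform this conversion and is assumed here for illustration; it is not claimed to be the exact transformation used in the implementation.

$$\lim_{x \to 0^+} x \log x = \lim_{y \to \infty} \frac{1}{y} \log \frac{1}{y} = \lim_{y \to \infty} \left(-\frac{\log y}{y}\right) = 0,$$

where the last step uses that the logarithm grows more slowly than any positive power of $y$, which is the kind of asymptotic information tracked during the recursive evaluation.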

*Solving Equations.* We implement simple equation solving, including isolating the expression to be solved, and solving linear equations. This is used when performing substitutions and in transforming/applying an existing equality.

#### **2.4 Rules**

Based upon the collection of algorithms in the previous section, Iscalc implements a set of rules for transforming the current expression in a computation. Currently 37 rules are available. We give some representative examples below.

*Integration Rules.* The list of integration rules is mostly inherited from [18]. They include Substitution, IntegrationByParts, etc. Integration identities can be applied by lookup from the context. There are also rules for more advanced techniques such as differentiating under the integral sign (illustrated in Sect. 3.1), and exchange of integral and sum (illustrated in Sect. 3.3).

*Rewriting Rules.* The most basic rewriting rule is FullSimplify, which applies simplification to the current expression. ApplyIdentity applies an identity from the context. This generalizes the use of Fu's rules for trigonometric identities [9]. The rule Equation supports rewriting to another form of an expression with equal normal form. Series expansion and evaluation of series are available as two different rules (again looking up identities from the context).

*Equality Transformation Rules.* These rules transform one equality into another. IntegralEquation transforms an equation of the form Deriv(e, x) = g(x) into e = IndefiniteIntegral(g, x, *fvars*), where *fvars* is the list of free variables in Deriv(e, x). Another very flexible rule is SolveEquation, which solves for some expression e in an equality s = t to give another equality e = e′. Other examples include taking the limit on both sides, applying a function to both sides, and so on.

*Other Rules.* Besides the above three major categories, other rules include L'Hôpital's rule for computing limits, and rules for series manipulations.

#### **2.5 Proof Methods**

In [18], the only way to perform a computation is starting from a single expression, and applying rules to transform that expression. More complex applications necessitate more structures in the computation. We describe those supported by Iscalc briefly, as they are all familiar from other theorem provers.

*Proof by Computation.* To show an equality a = b, perform computation on both sides until they become identical. Likewise, for inequalities, perform computation on both sides until the inequality can be shown automatically.

*Proof by Transformation.* Starting from a known equality a = b, apply the equality transformation rules in Sect. 2.4 to obtain new equalities, until the desired one is obtained.

*Case Analysis.* To show a goal, divide into cases either by whether some comparison formula is true, or according to whether some expression is less than, equal to, or greater than 0. We show an example with inequality goals in Sect. 3.2.

*Induction.* Some integrals involve an integer parameter n ≥ 0, and may be proved by induction on n. We support such inductive reasoning in Iscalc. The rule ApplyInductHyp can be used to apply the inductive hypothesis at any time in the inductive branch of the proof.

#### **2.6 Top-Level Computation, Automation, and User Interface**

Based on the above rules and proof methods, Iscalc supports performing a variety of symbolic computations, including showing inequalities, checking convergence, evaluating limits, and performing indefinite and definite integrals. It is also possible to build higher-level automation on top of the rules. An implementation of Slagle's method is inherited from [18]. It performs best-first search using algorithmic and heuristic steps for performing an integral. If the search succeeds, it outputs a sequence of rules to apply, which can then be replayed in Iscalc.

The user interface of Iscalc is mostly inherited from [18]. The primary goal is to provide a visual interface that feels similar to that of a computer algebra system, and which allows mostly point-and-click based interactions. In particular, computation steps are performed by selecting rules to apply from the menu. For certain rules, the user may need to select a subexpression of the current expression to apply the rule on, and/or choose from suggestions given by the computer (e.g. when rewriting using identities).

Additional features in the current work, such as book and file hierarchy, and proof methods, are also supported in the user interface. This includes display and navigation of book and file contents. To begin the proof of an equation, the user selects from the menu one of the proof methods in Sect. 2.5. The structured computation is then displayed in a reader-friendly format. An example showing display of file contents and a computation is given in Fig. 2.

### **3 Examples**

We applied Iscalc to computations of limits, indefinite integrals, and definite integrals from a variety of sources. Three sources are inherited from [18]: an exam preparation book (Tongji), online problem lists by D. Kouba [13], and the MIT Integration Bee [1]. The range of applicability is greater on these problem sets. For example, we can now perform all examples in the exponentials and trigonometric categories from D. Kouba's problem lists, while the previous work can perform only 7/12 and 22/27 examples respectively, due to limitations of SymPy as well as other unsupported features.

**Fig. 2.** Screenshot of the user interface, showing part of the example given in Sect. 3.1. The menu groups related rules into categories. The *Proof* category contains general actions such as proof by calculation and induction. The remaining five menu categories contain rewriting rules. The left side of the main window shows division of the computation into several parts, and the right side shows the selected part as a series of computation steps. At the bottom (not shown) is space for users to enter additional information for a computation step.

The main additional benchmark comes from the textbook *Inside Interesting Integrals* [17]. 71 integral calculations are performed in Iscalc, covering about half the content of the book, including early results about Gamma and zeta functions. Many of the remaining examples involve complex numbers and contour integration, which are not supported by the current version of the tool.

Next, we illustrate some special functionality of Iscalc using examples. With these examples, we wish to emphasize how the different algorithms and rules described in Sects. 2.3 and 2.4 interact with each other, enabling a computation process that is very close to human writing.

#### **3.1 Working with Indefinite Integrals and** *C*

The goal is to evaluate Frullani's integral (Sect. 3.3 of [17]).

$$I(a,b) = \int_0^\infty \frac{\tan^{-1}(ax) - \tan^{-1}(bx)}{x} \, dx$$

under the condition a > 0, b > 0. The computation starts by computing $\frac{d}{da} I(a,b) = \frac{\pi}{2a}$, which follows by exchanging derivative and integral, then using the formula for the definite integral $\int_0^\infty \frac{1}{u^2+1} \, du$. The key step is integrating both sides of $\frac{d}{da} I(a,b) = \frac{\pi}{2a}$ using rule IntegralEquation to obtain $I(a,b) = \int \frac{\pi}{2a} \, da$, which evaluates to

$$I(a,b) = \frac{\pi \log a}{2} + C(b)$$

Here it is important to keep track of the dependency of the constant in $\int \frac{\pi}{2a} \, da$ on the variable b, which is kept in the argument *deps* of the expression. This variable is then shown explicitly as an argument to the Skolem term C when the indefinite integral is evaluated.

Next, substitute b by a in the above equation, and from I(a, a) = 0 obtain $C(a) = -\frac{\pi \log a}{2}$. Substituting back in the above equation gives the final answer

$$I(a,b) = \frac{\pi \log a}{2} - \frac{\pi \log b}{2}.$$

The entire computation can be carried out in Iscalc much as described above, consisting of one definition and four goals, and using 17 rule applications.
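For readers who wish to check the key steps by hand, the following sketch spells them out. It assumes a > 0 and b > 0 as in the text, and the intermediate substitution $u = ax$ is our own choice of notation rather than a transcript of the Iscalc computation.

$$\begin{aligned}
\frac{d}{da} I(a,b) &= \int_0^\infty \frac{\partial}{\partial a} \, \frac{\tan^{-1}(ax) - \tan^{-1}(bx)}{x} \, dx = \int_0^\infty \frac{1}{1 + a^2 x^2} \, dx \\
&= \frac{1}{a} \int_0^\infty \frac{1}{u^2 + 1} \, du = \frac{1}{a} \cdot \frac{\pi}{2} = \frac{\pi}{2a}, \\
I(a,b) &= \int \frac{\pi}{2a} \, da = \frac{\pi \log a}{2} + C(b), \\
0 = I(b,b) &= \frac{\pi \log b}{2} + C(b) \quad\Longrightarrow\quad C(b) = -\frac{\pi \log b}{2}.
\end{aligned}$$

The last line is the same argument as substituting b by a in the text, written with the roles of the two variables exchanged.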

#### **3.2 Wellformedness Checks**

An example from Sect. 2.3 in [17], illustrating partial fraction decomposition, involves computing the following integral:

$$I(a) = \int_0^\infty \frac{1}{x^4 + 2x^2 \cos(2a) + 1} \, dx$$

under the condition cos(a) ≠ 0. One particularly tricky point is that it is not obvious why the denominator is always nonzero. This cannot be shown automatically by Iscalc. However, we can state a separate goal showing this fact by case analysis. One of the steps during the computation involves an integral with the same denominator, but with bounds (−∞, ∞), so we perform the check without any assumption on x.

We perform case analysis on whether x is equal to 0. If x = 0 then the goal simply reduces to 1 ≠ 0. If x ≠ 0, we rewrite the goal as follows (the name of the rule applied is shown at right):

$$\begin{aligned}
& x^4 + 2x^2 \cos(2a) + 1 \\
&= (x^2 - 1)^2 + 2x^2(1 + \cos(2a)) && \text{(Equation)} \\
&= (x^2 - 1)^2 + 2x^2(1 + (2\cos^2(a) - 1)) && \text{(ApplyIdentity)} \\
&= 4x^2 \cos^2(a) + (x^2 - 1)^2 && \text{(FullSimplify)}
\end{aligned}$$

Now, from $x \ne 0$ and $\cos(a) \ne 0$ we get $4x^2 \cos^2(a) > 0$. Also $(x^2 - 1)^2 \ge 0$, so the whole expression is greater than zero (and hence nonzero). The inequality checking algorithm in Sect. 2.3 is able to perform this reasoning automatically, hence showing the expression in the integral is well-defined. Interestingly, the answer $\frac{\pi}{4\cos(a)}$ given in the book is not fully correct. It only holds when $\cos(a) > 0$. If $\cos(a) < 0$ the correct answer is $-\frac{\pi}{4\cos(a)}$ (we can easily check there is a mistake since the integrand is always positive).
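Both observations can be sanity-checked numerically with a few lines of Python. This is an independent floating-point check, not part of Iscalc: the function names are hypothetical, and the integral is approximated by a midpoint rule after the substitution x = tan(θ), which is our own choice for mapping the half-line to a finite interval.

```python
import math


def denominator(x, a):
    return x**4 + 2 * x**2 * math.cos(2 * a) + 1


def rewritten(x, a):
    # Form obtained after the Equation / ApplyIdentity / FullSimplify steps
    return 4 * x**2 * math.cos(a)**2 + (x**2 - 1)**2


def integral(a, n=2000):
    """Approximate I(a) by the midpoint rule in theta, where x = tan(theta).

    After the substitution the integrand becomes
    cos^2(theta) / (1 - (1 - cos(2a)) * sin^2(2 theta) / 2) on (0, pi/2).
    """
    h = (math.pi / 2) / n
    k = 1 - math.cos(2 * a)
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        total += math.cos(theta)**2 / (1 - 0.5 * k * math.sin(2 * theta)**2)
    return total * h


a_neg = 2.0                                   # cos(2.0) < 0
numeric = integral(a_neg)
book_formula = math.pi / (4 * math.cos(a_neg))        # negative, cannot be right
corrected = -math.pi / (4 * math.cos(a_neg))          # positive

identity_gap = max(abs(denominator(x, a_neg) - rewritten(x, a_neg))
                   for x in (-3.0, -0.5, 0.0, 0.7, 1.0, 2.5))
```

With `a_neg = 2.0` the numerical value is positive and agrees with `corrected`, while `book_formula` is negative, which is consistent with the remark that the integrand is always positive.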

#### **3.3 Convergence Checks**

For the final example, we illustrate integration using series, as well as checking convergence. The example comes from Sect. 5 of [17]. The goal is to evaluate

$$\int_0^1 \frac{\log(1+x)}{x} \, dx$$

The technique used is to expand the Taylor series for log(1 + x) (using rule SeriesExpansionIdentity), then exchange integration and summation. During the exchange the body of the sum and integral is $\frac{(-1)^n x^n}{n+1}$. As the body changes sign for different values of n, there is potential danger that the sum is not *absolutely convergent*, and the exchange of sum and integral is incorrect even if the final answer is finite. To exclude this possibility, Iscalc requires the user to first show the convergence of $\sum_{n=0}^{\infty} \int_0^1 \frac{x^n}{n+1} \, dx$. This is checked after the computation

$$\sum_{n=0}^{\infty} \int_0^1 \frac{x^n}{n+1} \, dx = \sum_{n=0}^{\infty} \frac{1}{n+1} \int_0^1 x^n \, dx = \sum_{n=0}^{\infty} \frac{1}{(n+1)^2}$$

which is convergent by the p-series test implemented within Iscalc. This shows the exchange of sum and integral is indeed safe. The final result of the integral is $\frac{\pi^2}{12}$, which can be computed in Iscalc using 10 rule applications (including 3 for showing convergence), assuming the values of some standard infinite series are already known.
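The two series involved can be checked numerically with a short Python sketch. It is a floating-point illustration only: the helper `partial_sum` and the cutoff of 5000 terms are assumptions made for this example, and the check plays no role in the Iscalc computation itself.

```python
import math


def partial_sum(terms, signed=True):
    """Partial sum of (+/-1)^n / (n+1)^2 for n = 0 .. terms-1."""
    total = 0.0
    for n in range(terms):
        sign = (-1) ** n if signed else 1
        total += sign / (n + 1) ** 2
    return total


# Term-by-term integration of (-1)^n x^n / (n+1) over [0, 1] gives
# (-1)^n / (n+1)^2, whose sum approximates the value of the integral.
alternating = partial_sum(5000)

# The series of absolute values is the p-series with p = 2; its partial
# sums increase and stay below pi^2 / 6.
absolute = partial_sum(5000, signed=False)
```

The alternating partial sum approaches π²/12 (the error of an alternating series is bounded by the first omitted term), while the absolute partial sums stay bounded, which is the property that justifies exchanging the sum and the integral.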

### **4 Discussion**

While there has been a long line of research on visual user-interfaces for interactive theorem proving, one persistent issue is that they are mostly limited to simple examples or narrow application areas. For large scale formalizations, the number of actions the user can perform steadily increases, so it becomes more and more difficult to organize them in the user interface. Our work can be seen as an exploration of how far we can go in the limited, but still wide area of symbolic computation. We believe the results are positive. In particular, the following design decisions contribute to controlling complexity:


The end result is that the user does not need to recall the name of any existing identity (in fact, no names are assigned at all). Instead, all results are either applied automatically or selected, after matching, from a list of suggested choices.

#### **4.1 Related Work**

There is a large body of work combining theorem proving and symbolic computation, and on user interface design for theorem provers. Some earlier works include Harrison and Théry's "skeptic's" approach to invoking computer algebra systems from a theorem prover [12], and Bauer et al.'s Analytica [5], which implements automatic theorem proving for elementary analysis within Mathematica. We leave a detailed review to [18,19]. More recently, Lewis and Wu [14] implemented a bi-directional interface between Lean [16] and Mathematica. Donato et al. designed an interface for constructing proofs using drag-and-drop actions [7].

There are also many implementations of proof procedures related to computer algebra. Examples include the tool MetiTarski for proving inequalities by Akbarpour and Paulson [2], and the heuristic-based prover Polya by Avigad et al. [4]. For computation of limits, Eberl implemented verified computation of asymptotics with generated proofs in Isabelle [8]. We do not claim that our procedures are more effective than the ones listed above. Instead, we focus on combining them with user guidance to allow more complex symbolic computations to be performed.

### **5 Conclusion**

In this paper, we introduced Iscalc for performing symbolic computation interactively, as a significant extension to the system described in [18]. This results in a more extensible tool with a greater range of applicability. In particular, it is able to check difficult computations from the textbook [17], and it found some mistakes in the process.

In future work, we wish to extend the functionality of Iscalc to handle complex numbers, multiple integrals, and vector calculus. One particularly interesting question is how to support evaluation of contour integrals (the formalization of which has been done in Isabelle by Li and Paulson [15]). On the applications side, we intend to explore verification of control systems [3].

Finally, more work would be required to extend the proof reconstruction in [18] to the larger set of functionality available, as well as to link with a library of theorems in analysis. The custom language of expressions defined here is independent of the particular choice of logical foundation, hence proof reconstruction should be possible in any interactive theorem prover.

**Acknowledgements.** This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 62002351 and 62032024. Part of this work was presented at the IPAM workshop *Machine Assisted Proofs*. We thank audience members at the talk for valuable feedback that helped to improve this work.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Author Index**

#### **B**

Barrett, Clark 522 Benzmüller, Christoph 438 Berg, Jeremias 1 Bhayat, Ahmed 23 Bjørner, Nikolaj 41 Blanchette, Jasmin 61 Bogaerts, Bart 1 Boigelot, Bernard 542 Bonacina, Maria Paola 78 Brieger, Marvin 96 Bromberger, Martin 116, 134 Bruse, Florian 153

#### **C**

Chen, Yu-Fang 170 Coutelier, Robin 190

#### **D**

Desharnais, Martin 116 Dixon, Clare 382 Draheim, Dirk 509

#### **F**

Fan, Yuheng 577 Fazekas, Katalin 41 Fiedor, Tomáš 286 Fleury, Mathias 207 Fontaine, Pascal 542 Frohn, Florian 220

#### **G**

Giesl, Jürgen 220, 266, 344 Görlitz, Oliver 234 Graham-Lengrand, Stéphane 78

#### **H**

Hausmann, Daniel 234 Henkel, Elisabeth 248 Hensel, Jera 266 Hirokawa, Nao 401 Hoenicke, Jochen 248 Holík, Lukáš 286 Hozzová, Petra 307 Hruška, Martin 286 Humml, Merlin 234 Hustadt, Ullrich 382

#### **I**

Indrzejczak, Andrzej 325

#### **J**

Jain, Chaahat 134 Järv, Priit 509

#### **K**

Kassing, Jan-Christoph 344 Kovács, Laura 190, 307 Kreuzer, Katharina 365

#### **L**

Lammich, Peter 207 Lange, Martin 153

#### **M**

Middeldorp, Aart 401, 474 Mitsch, Stefan 96 Möller, Sören 153

#### **N**

Nalon, Cláudia 382 Nantes-Sobrinho, Daniele 456 Niederhauser, Johannes 401 Nigam, Vivek 560 Nipkow, Tobias 365 Nordström, Jakob 1 Norman, Chase 307

#### **O**

Oertel, Andy 1

#### **P**

Papacchini, Fabio 382 Pattinson, Dirk 234 Petrukhin, Yaroslav 325 Peuter, Dennis 419 Platzer, André 96 Prucker, Simon 234

#### **Q**

Qiu, Qi 61

#### **R**

Rabe, Florian 438 Rath, Jakob 190 Rawson, Michael 23, 190 Rogalewicz, Adam 286 Rothgang, Colin 438 Rümmer, Philipp 170

#### **S**

Schindler, Tanja 248 Schmidt-Schauß, Manfred 456 Schoisswohl, Johannes 23 Schöpf, Jonas 474 Schröder, Lutz 234 Síč, Juraj 286 Sofronie-Stokkermans, Viorica 419 Stevens, Lukas 491

#### **T**

Talcott, Carolyn 560 Tammet, Tanel 509 Thunert, Sebastian 419 Toledo, Guilherme V. 522 Tourret, Sophie 61 Tsai, Wei-Lun 170

#### **V**

Vandesande, Dieter 1 Vargovčík, Pavol 286 Vauthier, Christophe 78 Vergain, Baptiste 542 Verrev, Martin 509 Voronkov, Andrei 307

#### **W**

Weidenbach, Christoph 116, 134 Whitters, Gerald 560

#### **X**

Xiong, Weiqiang 577 Xu, Runqing 577

#### **Z**

Zhan, Bohua 577 Zohar, Yoni 522