**Uli Sattler Martin Suda (Eds.)**

# LNAI 14279

# **Frontiers of Combining Systems**

**14th International Symposium, FroCoS 2023 Prague, Czech Republic, September 20–22, 2023 Proceedings**

# Lecture Notes in Computer Science

# **Lecture Notes in Artificial Intelligence 14279**

Founding Editor

Jörg Siekmann

Series Editors

Randy Goebel, *University of Alberta, Edmonton, Canada*
Wolfgang Wahlster, *DFKI, Berlin, Germany*
Zhi-Hua Zhou, *Nanjing University, Nanjing, China*

The series Lecture Notes in Artificial Intelligence (LNAI) was established in 1988 as a topical subseries of LNCS devoted to artificial intelligence.

The series publishes state-of-the-art research results at a high level. As with the LNCS mother series, the mission of the series is to serve the international R & D community by providing an invaluable service, mainly focused on the publication of conference and workshop proceedings and postproceedings.


*Editors*

Uli Sattler
University of Manchester
Manchester, UK

Martin Suda
Czech Technical University in Prague
Prague, Czech Republic

ISSN 0302-9743, ISSN 1611-3349 (electronic)
Lecture Notes in Artificial Intelligence
ISBN 978-3-031-43368-9, ISBN 978-3-031-43369-6 (eBook)
https://doi.org/10.1007/978-3-031-43369-6

LNCS Sublibrary: SL7 – Artificial Intelligence

© The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access publication.

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Paper in this product is recyclable.

# **Preface**

These proceedings contain the papers selected for presentation at the 14th *International Symposium on Frontiers of Combining Systems* (FroCoS 2023). The symposium was held during September 20–22, 2023 at Czech Technical University in Prague (CTU), Czech Republic. It was co-located with the 32nd *International Conference on Automated Reasoning with Analytic Tableaux and Related Methods* (TABLEAUX 2023).

FroCoS is the main international event for research on the development of techniques and methods for the combination and integration of formal systems, their modularization and analysis. Previous FroCoS meetings have been organized across the world since 1996; see Figures 1 and 2 for a global and a European view of the locations of past and present meetings.

**Fig. 1.** A global map showing locations of past and current FroCoS meetings

FroCoS 2023 received 22 high-quality paper submissions. The members of the Program Committee did a great job of thoroughly evaluating these submissions with regard to their technical and presentational quality, and of providing helpful feedback to the authors. Reviewing was single-blind and each paper received at least three reviews, followed by sometimes extensive discussions within the Program Committee and, in three cases, a second round of reviewing. In the end, 14 papers were selected for presentation at the symposium and for publication. We have grouped them in this volume according to the following topic classification: (1) analysis of programs and equations, (2) unification, (3) decidable fragments, (4) frameworks, and (5) higher-order theorem proving.

Together with the Program Committee, we considered suitable candidates to give an invited talk, and were delighted to have found five outstanding invited speakers:

**Fig. 2.** A Europe-centric map showing locations of past and current FroCoS meetings in Europe and Asia


We would like to thank all the people who contributed to making FroCoS 2023 a success. In particular, we thank the members of the Program Committee and the external reviewers for their excellent, timely work and for providing the authors with insightful feedback. Of course we thank the authors for submitting high-quality papers, taking the reviewers' feedback into account, and presenting their work in a way that is accessible to the broad FroCoS audience. Next, we thank the invited speakers for their inspiring talks. Moreover, we thank the local organisers and the Czech Technical University in Prague for organising and supporting FroCoS. Finally, we gratefully acknowledge financial support from Springer.

July 2023 Uli Sattler Martin Suda

# **Organization**

#### **Program Committee Chairs**


### **Steering Committee**


#### **Program Committee**

- Carlos Areces, Universidad Nacional de Córdoba, Argentina
- Alessandro Artale, Free University of Bolzano-Bozen, Italy
- Franz Baader, TU Dresden, Germany
- Haniel Barbosa, Universidade Federal de Minas Gerais, Brazil
- Peter Baumgartner, CSIRO Canberra, Australia
- Clare Dixon, University of Manchester, UK
- Mathias Fleury, University of Freiburg, Germany
- Didier Galmiche, LORIA, Université de Lorraine, France
- Silvio Ghilardi, Università degli Studi di Milano, Italy
- Jürgen Giesl, RWTH Aachen University, Germany
- Andreas Herzig, IRIT at Université Paul Sabatier, France
- Roman Kontchakov, Birkbeck, University of London, UK
- Paliath Narendran, University at Albany - SUNY, USA
- Aina Niemetz, Stanford University, USA
- Naoki Nishida, Nagoya University, Japan
- Giles Reger, Amazon Web Services, USA & University of Manchester, UK
- Andrew Reynolds, University of Iowa, USA
- Christophe Ringeissen, LORIA, Université de Lorraine, France
- Philipp Rümmer, University of Regensburg, Germany
- Renate A. Schmidt, University of Manchester, UK
- Roberto Sebastiani, University of Trento, Italy
- Viorica Sofronie-Stokkermans, University of Koblenz, Germany
- K. Subramani, West Virginia University, USA
- Dmitriy Traytel, University of Copenhagen, Denmark
- Christoph Weidenbach, Max Planck Institute for Informatics, Germany
- Piotr Wojciechowski, West Virginia University, USA
- Akihisa Yamada, AIST, Japan

# **Local Organisers**


## **Additional Reviewers**

- Karel Chvalovský, Czech Technical University in Prague, Czech Republic
- Daniel Cloerkes, RWTH Aachen University, Germany
- Madalina Erascu, West University of Timisoara, Romania
- Nao Hirokawa, JAIST, Japan
- Jan-Christoph Kassing, RWTH Aachen University, Germany
- Boris Konev, Liverpool University, UK
- Hans-Jörg Schurr, University of Iowa, USA

# **Abstracts of Invited Talks**

# **Incremental Reasoning in Embedded SAT Solvers**

Katalin Fazekas

#### TU Wien, Austria

**Abstract.** Embedding SAT solvers as sub-reasoning engines into more complex tools is a common practice in various application domains. For instance, SAT-based model checkers exploit modern solvers as black-box oracles, while solvers for Satisfiability Modulo Theories (SMT), Maximum Satisfiability (MaxSAT) or other combinatorial problems combine SAT solvers with various reasoning or optimization engines. Such embedded SAT solvers are used incrementally in most cases, i.e., the exact same SAT solver instance is reused to solve multiple related SAT queries. The goal of incremental reasoning is to exploit the shared constraints between consecutive SAT queries and thereby avoid repeated work and reduce solving time.

In this talk, first we briefly survey the functionalities supported by IPASIR, the standard API of incremental SAT solvers, which integrates solvers as black-boxes into larger systems. Then, we present our recently proposed extension to that interface which allows us to modify and refine SAT queries already during solving and thereby to benefit from incremental reasoning even more. The proposed extension, as we demonstrate by our experiments, captures the most essential functionalities that are sufficient to simplify and improve use cases where a more fine-grained interaction between the SAT solver and the rest of the system is required. We will present our experiments where we extended CaDiCaL, a state-of-the-art incremental SAT solver, with our proposed interface and evaluated it on two representative use cases: enumerating graphs within the SAT modulo Symmetries framework (SMS), and embedding it as the main CDCL(T) SAT engine in the SMT solver cvc5. Following that, we overview the key open challenges in such use cases to efficiently combine some complex crucial features of modern SAT solvers, such as inprocessing and proof production, with incremental reasoning. At the end, we briefly present possible ways to address some of these challenges.

This is joint work with Aina Niemetz, Mathias Preiner, Markus Kirchweger, Stefan Szeider, and Armin Biere.

# **On Datatypes, Synergies, and Unicorns: Recent Developments in Theory Combination**

Yoni Zohar

Bar-Ilan University, Israel

**Abstract.** A Satisfiability Modulo Theories (SMT) solver is a tool that takes as input a first-order formula, and determines its T-satisfiability, that is, the existence of a first-order structure that satisfies it, as well as the axioms of some first-order theory T. Some theories are considered primitive, such as the theories of integers, reals, arrays, and lists. Other theories are considered combined, as they are obtained by the combination of existing theories. Examples include the theory of arrays of integers, or of lists of reals.

Now, assume that you have an SMT solver that supports two theories. How hard would it be to extend it so that it supports their combination? The classical answer to this question was given by Nelson and Oppen. They designed a decision procedure for a given combined theory by first purifying the input formula into two parts, one for each theory; then guessing equalities and disequalities between the shared variables of the two parts; and finally calling the two decision procedures for the separate theories on the part of the purified formula that is relevant to them, plus the guessed set of (dis)equalities.
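To illustrate the combinatorics behind the guessing step (our own sketch, not part of the talk): the guessed sets of (dis)equalities correspond to *arrangements*, i.e., set partitions of the shared variables into equivalence classes, so their number is the Bell number of the number of shared variables.

```python
def partitions(items):
    """Yield all set partitions of a list (each partition is a list of classes)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # put `first` into one of the existing classes ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a fresh class of its own
        yield [[first]] + part

shared = ["x", "y", "z"]
arrangements = list(partitions(shared))
assert len(arrangements) == 5   # Bell(3) = 5 arrangements to try
```

The rapid (Bell-number) growth of this enumeration is one reason practical implementations propagate entailed equalities instead of guessing blindly.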

The correctness of this combination method requires the two combined theories to be stably infinite, a model theoretic property related to the existence of infinite models. However, not all theories of interest are stably infinite. (For example, the theory of fixed-size bit-vectors is not.)

This state of affairs led to the development of various other combination methods that rely on various model theoretic notions, such as shiny, gentle, and polite theories. For each combination method, the corresponding properties of the theories need to be proven in order to be used with that method. And indeed, various theories have been shown to admit such properties.

In this talk I will survey recent results in the field of theory combination. First, I will sketch a proof that theories of datatypes (e.g., lists, trees) can be combined with any other theory, using the polite combination method. Next, I will show how the original Nelson-Oppen method


can be integrated together with the polite combination method in a synergetic way that reduces the number of guesses one needs to make. Finally, a taxonomy of various model theoretic properties from theory combination will be presented, where the properties will be analyzed and compared. This will include the description of open problems which relate to a certain kind of theories (that are called "unicorns").

# **Contents**

#### **Analysis of Programs and Equations**


#### **Decidable Fragments**

Logic of Communication Interpretation: How to Not Get Lost in Translation . . . 119 *Giorgio Cignarale, Roman Kuznets, Hugo Rincon Galeana, and Ulrich Schmid*

Symbolic Model Construction for Saturated Constrained Horn Clauses . . . . . . . . 137 *Martin Bromberger, Lorenz Leutgeb, and Christoph Weidenbach*

#### **Frameworks**

Combining Finite Combination Properties: Finite Models and Busy Beavers . . . 159 *Guilherme V. Toledo, Yoni Zohar, and Clark Barrett*



**Analysis of Programs and Equations**

# **Targeting Completeness: Using Closed Forms for Size Bounds of Integer Programs**

Nils Lommen and Jürgen Giesl

LuFG Informatik 2, RWTH Aachen University, Aachen, Germany lommen@cs.rwth-aachen.de, giesl@informatik.rwth-aachen.de

**Abstract.** We present a new procedure to infer *size bounds* for integer programs automatically. Size bounds are important for the deduction of bounds on the runtime complexity or in general, for the resource analysis of programs. We show that our technique is *complete* (i.e., it always computes finite size bounds) for a subclass of loops, possibly with nonlinear arithmetic. Moreover, we present a novel approach to combine and integrate this complete technique into an incomplete approach to infer size and runtime bounds of general integer programs. We prove completeness of our integration for an important subclass of integer programs. We implemented our new algorithm in the automated complexity analysis tool KoAT to evaluate its power, in particular on programs with non-linear arithmetic.

#### **1 Introduction**

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 235950644 (Project GI 274/6-2).

© The Author(s) 2023
U. Sattler and M. Suda (Eds.): FroCoS 2023, LNAI 14279, pp. 3–22, 2023. https://doi.org/10.1007/978-3-031-43369-6_1

There are numerous incomplete approaches for automatic resource analysis of programs, e.g., [1,2,5,8,10,15,19,21,29,33]. However, many complete techniques to decide termination, analyze runtime complexity, or study memory consumption for certain classes of programs have also been developed, e.g., [3,4,6,7,16,17,20,22,27,34,36]. In this paper, we present a procedure to compute *size bounds* which indicate how large the absolute value of an integer variable may become. In contrast to other complete procedures for the inference of size bounds which are based on fixpoint computations [3,6], our technique can also handle (possibly negative) constants and exponential size bounds. Similar to our earlier paper [27], we embed a procedure which is *complete* for a subclass of loops (i.e., it computes finite size bounds for all loops from this subclass) into an incomplete approach for general integer programs [8,19]. In this way, the power of the incomplete approach is increased significantly, in particular for programs with non-linear arithmetic. However, in the current paper we tackle a completely different problem than in [27] (and thus, the actual new contributions are also completely different): in [27] we embedded a complete technique in order to infer runtime bounds, whereas now we integrate a novel technique in order to infer size bounds. As an example, we want to determine bounds on the absolute values of the variables during (and after) the execution of the following loop.

$$\textbf{while } (x_3 > 0) \textbf{ do } (x_1, x_2, x_3, x_4) \leftarrow (3 \cdot x_1 + 2 \cdot x_2,\; -5 \cdot x_1 - 3 \cdot x_2,\; x_3 - 1,\; x_4 + x_3^2) \qquad (1)$$

We introduce a technique to compute size bounds for loops which admit a closed form, i.e., an expression which corresponds to applying the loop's update $n$ times. Then we over-approximate the closed form to obtain a non-negative, weakly monotonically increasing function. For instance, a closed form for $x_3$ in our example is $x_3 - n$, since the value of $x_3$ is decreased by $n$ after $n$ iterations. The (absolute value of this) closed form can be over-approximated by $x_3 + n$, which is monotonically increasing in all variables. Finally, each occurrence of $n$ is substituted by a runtime bound for the loop. Clearly, (1) terminates after at most $x_3$ iterations. So if we substitute $n$ by the runtime bound $x_3$ in the over-approximated closed form $x_3 + n$, then we infer the linear bound $2 \cdot x_3$ on the size of $x_3$. Due to the restriction to weakly monotonically increasing over-approximations, we can plug in any over-approximation of the runtime and do not necessarily need exact bounds.
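The claims above can be checked empirically. The following sketch (our own, not the paper's implementation in KoAT) simulates loop (1) and verifies that it runs for exactly $\max(0, x_3)$ iterations and that $|x_3|$ never exceeds the bound $2 \cdot x_3$ obtained from the over-approximated closed form:

```python
def run_loop(x1, x2, x3, x4, max_steps=10_000):
    """Iterate loop (1), returning the step count and the largest |x3| seen."""
    seen_x3 = [abs(x3)]
    steps = 0
    while x3 > 0 and steps < max_steps:
        # simultaneous update, exactly as in loop (1)
        x1, x2, x3, x4 = 3*x1 + 2*x2, -5*x1 - 3*x2, x3 - 1, x4 + x3**2
        seen_x3.append(abs(x3))
        steps += 1
    return steps, max(seen_x3)

for init_x3 in [0, 1, 5, 17]:
    steps, max_abs_x3 = run_loop(x1=2, x2=-3, x3=init_x3, x4=0)
    assert steps == max(0, init_x3)            # the runtime bound x3 is even exact here
    assert max_abs_x3 <= 2 * max(init_x3, 0)   # size bound 2*x3 from x3 + n
```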

*Structure.* We introduce our technique to compute size bounds by closed forms in Sect. 2 and show that it is complete for a subclass of loops in Sect. 3. Afterwards in Sect. 4, we incorporate our novel technique into the incomplete setting of general integer programs. In Sect. 5 we demonstrate how size bounds are used in automatic complexity analysis and study completeness for classes of general programs. In Sect. 6, we conclude with an experimental evaluation of our implementation in the tool KoAT and discuss related work. All proofs can be found in [28].

#### **2 Size Bounds by Closed Forms**

In this section, we present our novel technique to compute size bounds for loops by closed forms in Theorem 7. We start by introducing the required preliminaries. Let $\mathcal{V} = \{x_1, \ldots, x_d\}$ be a set of variables. $\mathcal{F}(\mathcal{V})$ is the set of all *formulas* built from inequations $p > 0$ for polynomials $p \in \mathbb{Q}[\mathcal{V}]$, $\wedge$, and $\vee$. A *loop* $(\varphi, \eta)$ consists of a guard $\varphi \in \mathcal{F}(\mathcal{V})$ and an update $\eta : \mathcal{V} \to \mathbb{Z}[\mathcal{V}]$ mapping variables to polynomials. A *closed form* $\mathrm{cl}_{x_i}$ (formally defined in Definition 1 below) is an expression in $n$ and in the (initial values of the) variables $x_1, \ldots, x_d$ which corresponds to the value of $x_i$ after iterating the loop $n$ times. For our purpose we only need closed forms which hold for all $n \geq n_0$ for some fixed $n_0 \in \mathbb{N}$. Moreover, we restrict ourselves to closed forms which are so-called normalized poly-exponential expressions [16]. Nonetheless, our procedure works for any closed form expression with a finite number of arithmetic operations (i.e., the number of operations must be independent of $n$). We extend the application of functions like $\eta : \mathcal{V} \to \mathbb{Z}[\mathcal{V}]$ also to polynomials, vectors, and formulas, etc., by replacing each variable $v$ in the expression by $\eta(v)$. So in particular, $(\eta_2 \circ \eta_1)(x) = \eta_2(\eta_1(x))$ stands for the polynomial $\eta_1(x)$ in which every variable $v$ is replaced by $\eta_2(v)$. Moreover, $\eta^n$ denotes the $n$-fold application of $\eta$.

We call a function $\sigma : \mathcal{V} \to \mathbb{Z}$ a *state*. By $\sigma(\mathit{exp})$ or $\sigma(\varphi)$ we denote the number resp. Boolean value which results from replacing every variable $v$ by the number $\sigma(v)$ in the arithmetic expression $\mathit{exp}$ or the formula $\varphi$.

**Definition 1 (Closed Forms).** *For a loop* $(\varphi, \eta)$*, an arithmetic expression* $\mathrm{cl}_{x_i}$ *is a* closed form *for* $x_i$ *with* start value $n_0 \in \mathbb{N}$ *if* $\mathrm{cl}_{x_i} = \sum_{1 \leq j \leq \ell} \alpha_j \cdot n^{a_j} \cdot b_j^n$ *with* $\ell, a_j \in \mathbb{N}$*,* $b_j \in \mathbb{A}$*,*<sup>1</sup> $\alpha_j \in \mathbb{A}[\mathcal{V}]$*, and for all* $\sigma : \mathcal{V} \cup \{n\} \to \mathbb{Z}$ *with* $\sigma(n) \geq n_0$ *we have* $\sigma(\mathrm{cl}_{x_i}) = \sigma(\eta^n(x_i))$*. Similarly, we call* $\mathrm{cl} = (\mathrm{cl}_{x_1}, \ldots, \mathrm{cl}_{x_d})$ *a* closed form *of the update* $\eta$ *(resp. for the loop* $(\varphi, \eta)$*) with start value* $n_0$ *if for all* $1 \leq i \leq d$*,* $\mathrm{cl}_{x_i}$ *are closed forms for* $x_i$ *with start value* $n_0$*.*

*Example 2.* In Sect. 3 we will show that for the loop (1), a closed form for $x_1$ (with start value 0) is $\mathrm{cl}_{x_1} = \frac{1}{2} \cdot \alpha \cdot (-\mathrm{i})^n + \frac{1}{2} \cdot \overline{\alpha} \cdot \mathrm{i}^n$ where $\alpha = (1 + 3\mathrm{i}) \cdot x_1 + 2\mathrm{i} \cdot x_2$. Here, $\overline{\alpha}$ denotes the complex conjugate of $\alpha$, i.e., the sign of those monomials is flipped where the coefficient is a multiple of the imaginary unit $\mathrm{i}$. A closed form for $x_4$ (also with start value 0) is $\mathrm{cl}_{x_4} = x_4 + n \cdot (\frac{1}{6} + x_3 + x_3^2 - x_3 \cdot n - \frac{n}{2} + \frac{n^2}{3})$.
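These closed forms can be validated numerically. The sketch below (our own check, not part of the paper) iterates the update of loop (1) and compares the results against $\mathrm{cl}_{x_1}$ and $\mathrm{cl}_{x_4}$, using exact rationals for the latter:

```python
from fractions import Fraction

def eta(x1, x2, x3, x4):
    """One simultaneous application of the update of loop (1)."""
    return 3*x1 + 2*x2, -5*x1 - 3*x2, x3 - 1, x4 + x3**2

def cl_x1(n, x1, x2):
    """Closed form for x1 from Example 2 (complex arithmetic)."""
    alpha = (1 + 3j)*x1 + 2j*x2
    return (alpha * (-1j)**n + alpha.conjugate() * (1j)**n) / 2

def cl_x4(n, x3, x4):
    """Closed form for x4 from Example 2 (exact rational arithmetic)."""
    return x4 + n * (Fraction(1, 6) + x3 + x3**2 - x3*n
                     - Fraction(n, 2) + Fraction(n**2, 3))

start = (2, -1, 4, 0)
v = start
for n in range(8):
    assert abs(cl_x1(n, start[0], start[1]) - v[0]) < 1e-6
    assert cl_x4(n, start[2], start[3]) == v[3]
    v = eta(*v)
```

Note that $\mathrm{cl}_{x_4}$ always evaluates to an integer on integer inputs, even though it contains the rational coefficients $\frac{1}{6}$, $\frac{1}{2}$, and $\frac{1}{3}$.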

Our aim is to compute *bounds* on the sizes of variables and on the runtime. As in [8,19], we only consider bounds which are weakly monotonically increasing in all occurring variables. Their advantage is that we can compose them easily (i.e., if f and g increase monotonically, then so does f ◦ g).

**Definition 3 (Bounds).** *The set of* bounds $\mathcal{B}$ *is the smallest set with* $\overline{\mathbb{N}} = \mathbb{N} \cup \{\omega\} \subseteq \mathcal{B}$*,* $\mathcal{V} \subseteq \mathcal{B}$*, and* $\{b_1 + b_2,\, b_1 \cdot b_2,\, k^{b_1}\} \subseteq \mathcal{B}$ *for all* $k \in \mathbb{N}$ *and* $b_1, b_2 \in \mathcal{B}$*.*

Size bounds should be bounds on the values of variables up to the point where the loop guard is not satisfied anymore for the first time. To define size bounds, we introduce the *runtime complexity* of a loop (whereas we considered the runtime complexity of arbitrary integer programs in [8,19,27]). Let $\Sigma$ denote the set of all states $\sigma : \mathcal{V} \to \mathbb{Z}$ and let $|\sigma|$ be the state with $|\sigma|(x) = |\sigma(x)|$ for all $x \in \mathcal{V}$.

**Definition 4 (Runtime Complexity for Loops).** *The* runtime complexity *of a loop* $(\varphi, \eta)$ *is* $\mathrm{rc} : \Sigma \to \overline{\mathbb{N}}$ *with* $\mathrm{rc}(\sigma) = \inf\{n \in \mathbb{N} \mid \sigma(\eta^n(\neg\varphi))\}$*, where* $\inf \emptyset = \omega$*. An expression* $r \in \mathcal{B}$ *is a* runtime bound *if* $|\sigma|(r) \geq \mathrm{rc}(\sigma)$ *for all* $\sigma \in \Sigma$*.*

*Example 5.* The runtime complexity of the loop (1) is $\mathrm{rc}(\sigma) = \max(0, \sigma(x_3))$. For example, $x_3$ is a runtime bound, as $|\sigma|(x_3) \geq \max(0, \sigma(x_3))$ for all states $\sigma \in \Sigma$.

A *size bound* on a variable x is a bound on the absolute value of x after n iterations of the update η, where n is bounded by the runtime complexity. In contrast to the definition of size bounds for transitions in integer programs from [8], Definition 6 requires that size bounds also hold *before* evaluating the loop.

<sup>1</sup> $\mathbb{A}$ is the set of algebraic numbers, i.e., the field of all roots of polynomials in $\mathbb{Z}[x]$.

**Definition 6 (Size Bounds for Loops).** $\mathrm{SB} : \mathcal{V} \to \mathcal{B}$ *is a* size bound *for* $(\varphi, \eta)$ *if for all* $x \in \mathcal{V}$ *and all* $\sigma \in \Sigma$*, we have* $|\sigma|(\mathrm{SB}(x)) \geq \sup\{|\sigma(\eta^n(x))| \mid n \leq \mathrm{rc}(\sigma)\}$*.*

For any algebraic number $c \in \mathbb{A}$, as usual $\lceil c \rceil$ denotes the smallest natural number which is greater than or equal to $c$'s absolute value. Similarly, for any poly-exponential expression $p = \sum_j (\sum_i c_{i,j} \cdot \beta_{i,j}) \cdot n^{a_j} \cdot b_j^n$ where $c_{i,j} \in \mathbb{A}$ and the $\beta_{i,j}$ are normalized monomials of the form $x_1^{e_1} \cdot \ldots \cdot x_d^{e_d}$, $\lceil p \rceil$ denotes $\sum_j (\sum_i \lceil c_{i,j} \rceil \cdot \beta_{i,j}) \cdot n^{a_j} \cdot \lceil b_j \rceil^n$.

We now determine size bounds by over-approximating the closed form $\mathrm{cl}_x$ by the non-negative, weakly monotonically increasing function $\lceil \mathrm{cl}_x \rceil$. Then we substitute $n$ by a runtime bound $r$ (denoted by "$[n/r]$"). Due to the monotonicity, this results in a bound on the size of $x$ not only at the end of the loop, but also during the iterations of the loop. Since the closed form is only valid for $n$ iterations with $n \geq n_0$, we ensure that our size bound is also correct for fewer than $n_0$ iterations by symbolically evaluating the update, where we over-approximate maxima by sums. As mentioned, see [28] for the proofs of all new results.

**Theorem 7 (Size Bounds for Loops with Closed Forms).** *Let* $\mathrm{cl}$ *be a closed form for the loop* $(\varphi, \eta)$ *with start value* $n_0$ *and let* $r \in \mathcal{B}$ *be a runtime bound. Then the (absolute) size of* $x \in \mathcal{V}$ *is bounded by* $\mathrm{sb}_x = \lceil \mathrm{cl}_x \rceil[n/r] + \sum_{0 \leq i < n_0} \lceil \eta^i(x) \rceil$*. Hence, the function* $\mathrm{SB}$ *with* $\mathrm{SB}(x) = \mathrm{sb}_x$ *for all* $x \in \mathcal{V}$ *is a size bound for* $(\varphi, \eta)$*.*

*Example 8.* As mentioned, for the loop (1), a closed form for $x_1$ with start value 0 is $\mathrm{cl}_{x_1} = \frac{1}{2} \cdot \alpha \cdot (-\mathrm{i})^n + \frac{1}{2} \cdot \overline{\alpha} \cdot \mathrm{i}^n$ where $\alpha = (1 + 3\mathrm{i}) \cdot x_1 + 2\mathrm{i} \cdot x_2$. Hence, $\lceil \mathrm{cl}_{x_1} \rceil = \lceil \frac{1}{2} \cdot \alpha \cdot (-\mathrm{i})^n + \frac{1}{2} \cdot \overline{\alpha} \cdot \mathrm{i}^n \rceil = (\lceil \frac{1+3\mathrm{i}}{2} \rceil \cdot x_1 + \lceil \mathrm{i} \rceil \cdot x_2) \cdot \lceil -\mathrm{i} \rceil^n + (\lceil \frac{1-3\mathrm{i}}{2} \rceil \cdot x_1 + \lceil -\mathrm{i} \rceil \cdot x_2) \cdot \lceil \mathrm{i} \rceil^n = 4 \cdot x_1 + 2 \cdot x_2$, as $\lceil \frac{1+3\mathrm{i}}{2} \rceil = \lceil \frac{1-3\mathrm{i}}{2} \rceil = \lceil \frac{\sqrt{10}}{2} \rceil = 2$ and $\lceil \mathrm{i} \rceil = \lceil -\mathrm{i} \rceil = 1$. So our approach infers *linear* size bounds for $x_1$ and $x_2$ (the similar computations for $x_2$ are omitted) while [8] only infers exponential size bounds.

As this over-approximation does not depend on $n$, it directly yields a size bound, i.e., $\mathrm{sb}_{x_1} = \lceil \mathrm{cl}_{x_1} \rceil$. In contrast, in the over-approximation $\lceil \mathrm{cl}_{x_4} \rceil = x_4 + n \cdot (1 + x_3 + x_3^2 + x_3 \cdot n + n + n^2)$, we have to replace $n$ by a runtime bound like $x_3$. Thus, we obtain the overall size bound $\mathrm{sb}_{x_4} = x_4 + 3 \cdot x_3^3 + 2 \cdot x_3^2 + x_3$.
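As a sanity check (our own, not from the paper), the following snippet iterates loop (1) from a few initial states and confirms that $|x_1|$ and $|x_4|$ stay below $\mathrm{sb}_{x_1}$ and $\mathrm{sb}_{x_4}$ instantiated with the absolute values of the initial state, for all $n \leq \mathrm{rc}(\sigma)$:

```python
def eta(x1, x2, x3, x4):
    """One simultaneous application of the update of loop (1)."""
    return 3*x1 + 2*x2, -5*x1 - 3*x2, x3 - 1, x4 + x3**2

for x1, x2, x3, x4 in [(2, -1, 4, 0), (-7, 3, 6, 5), (0, 0, 3, -2)]:
    sb_x1 = 4*abs(x1) + 2*abs(x2)                             # from Example 8
    sb_x4 = abs(x4) + 3*abs(x3)**3 + 2*abs(x3)**2 + abs(x3)   # from Example 8
    v = (x1, x2, x3, x4)
    for _ in range(max(0, x3) + 1):   # all n <= rc(sigma) = max(0, x3)
        assert abs(v[0]) <= sb_x1 and abs(v[3]) <= sb_x4
        v = eta(*v)
```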

Although this section focused on closed forms which are poly-exponential expressions, our technique is applicable to all loops where we can compute over-approximating bounds for the closed form and the runtime complexity. For example, the update $\eta(x) = x^2$ has the closed form $x^{(2^n)}$, but it does not admit a poly-exponential closed form due to $x$'s super-exponential growth. However, by instantiating $n$ by a runtime bound, we can still compute a size bound for this update. The reason for focusing on poly-exponential expressions is that we can compute such a closed form for all so-called *solvable loops* automatically, see Sect. 3.
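A minimal check of the squaring example (ours): iterating $x \leftarrow x^2$ indeed yields $x^{(2^n)}$ after $n$ steps.

```python
x0 = 3
x = x0
for n in range(5):
    assert x == x0 ** (2 ** n)   # value after n iterations of x <- x^2
    x = x * x
```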

#### **3 Size and Runtime Bounds for Solvable Loops**

In this section, we present a class of loops where our technique of Theorem 7 is "complete". The technique relies on the computation of suitable closed forms and of runtime bounds. In Sect. 3.1, we show that poly-exponential closed forms can be computed for all *solvable loops* [17,23,25,26,32,36]. Then we prove in Sect. 3.2 that finite runtime bounds are computable for all terminating solvable loops with only periodic rational eigenvalues.

A loop $(\varphi, \eta)$ is *solvable* if $\eta$ is a *solvable update* (see Definition 9 below for a formal definition), which partitions $\mathcal{V}$ into blocks $S_1, \ldots, S_m$ (and loop guards $\varphi$ are not relevant for closed forms). Each block allows updates with *cyclic dependencies* between its variables and *non-linear* dependencies on variables in blocks with lower indices.

**Definition 9 (Solvable Update** [17,23,25,26,32,36]**).** *An update* $\eta : \mathcal{V} \to \mathbb{Z}[\mathcal{V}]$ *is* solvable *if there exists a partition* $S_1, \ldots, S_m$ *of* $\{x_1, \ldots, x_d\}$ *such that for all* $1 \leq i \leq m$ *we have* $\vec{\eta}_{S_i} = A_{S_i} \cdot \vec{x}_{S_i} + \vec{p}_{S_i}$ *for an* $A_{S_i} \in \mathbb{Z}^{|S_i| \times |S_i|}$ *and a* $\vec{p}_{S_i} \in \mathbb{Z}[\bigcup_{j < i} S_j]^{|S_i|}$*, where* $\vec{\eta}_{S_i}$ *is the vector of all* $\eta(x_j)$ *and* $\vec{x}_{S_i}$ *is the vector of all* $x_j$ *with* $x_j \in S_i$*. The eigenvalues of a solvable loop are defined as the union of the eigenvalues of all matrices* $A_{S_i}$*. The loop is* homogeneous *if* $\vec{p}_{S_i} = \vec{0}$ *for all* $1 \leq i \leq m$*.*

*Example 10.* The loop (1) is an example of a solvable loop using the partition $S_1 = \{x_1, x_2\}$, $S_2 = \{x_3\}$, and $S_3 = \{x_4\}$.
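The block condition of Definition 9 can be checked mechanically once the read-dependencies of each update are known. The sketch below (our own abstraction; it only checks the dependency structure across blocks, not linearity within a block) confirms the partition of Example 10 and shows that a finer partition fails due to the cyclic dependency between $x_1$ and $x_2$:

```python
# variables read by the update of each variable in loop (1)
reads = {
    "x1": {"x1", "x2"},
    "x2": {"x1", "x2"},
    "x3": {"x3"},
    "x4": {"x4", "x3"},
}
blocks = [{"x1", "x2"}, {"x3"}, {"x4"}]   # candidate partition S1, S2, S3

def respects_blocks(reads, blocks):
    """Each block may only read its own variables and blocks with lower indices."""
    allowed = set()
    for block in blocks:
        allowed |= block
        if any(not reads[v] <= allowed for v in block):
            return False
    return True

assert respects_blocks(reads, blocks)
# splitting x1 and x2 apart fails: x1's update reads x2 from a later block
assert not respects_blocks(reads, [{"x1"}, {"x2"}, {"x3"}, {"x4"}])
```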

The crucial idea for our results in Sect. 3.1 and 3.2 is to reduce the problem of finding closed forms and runtime bounds from solvable loops to *triangular weakly non-linear* loops (*twn-loops*) [16,17,20]. A *twn-update* is a solvable update where each block $S_j$ has cardinality one. Thus, a twn-update is *triangular*, i.e., the update of a variable does not depend on variables with higher indices. Furthermore, the update is *weakly non-linear*, i.e., a variable does not occur non-linearly in its own update. We are mainly interested in loops over $\mathbb{Z}$, but to handle solvable updates, we will transform them into twn-updates with coefficients from $\mathbb{A}$.

**Definition 11 (TWN-Update** [16,17,20]**).** *An update* $\eta : \mathcal{V} \to \mathbb{A}[\mathcal{V}]$ *is* twn *if for all* $1 \leq i \leq d$ *we have* $\eta(x_i) = c_i \cdot x_i + p_i$ *for some* $c_i \in \mathbb{A}$ *and some polynomial* $p_i \in \mathbb{A}[x_1, \ldots, x_{i-1}]$*. A loop with a twn-update is called a* twn-loop*.*

Clearly, (1) is not a twn-loop due to the cyclic dependency between $x_1$ and $x_2$.

#### **3.1 Closed Forms for Solvable Loops**

Lemma 12 (which extends [17, Thm. 16] from solvable updates with real eigenvalues to arbitrary solvable updates) illustrates that one can transform any solvable update $\eta_s$ into a twn-update $\eta_t$ by an automorphism $\vartheta$. Here, $\vartheta$ is induced by the change-of-basis matrix of the Jordan normal form of each block of $\eta_s$. Note that the Jordan normal form is always computable in polynomial time (see [9]).

**Lemma 12 (Transforming Solvable Updates (see** [17]**, Thm. 16)).** *Let* $\eta_s$ *be a solvable update. Then* $\vartheta : \mathcal{V} \to \mathbb{A}[\mathcal{V}]$ *is an automorphism, where* $\vartheta$ *is defined by* $\vartheta(\vec{x}_S) = P \cdot \vec{x}_S$ *for each block* $S$*, where* $J(A_S) = P \cdot A_S \cdot P^{-1}$ *is the Jordan normal form of* $A_S$*. Furthermore,* $\eta_t = \vartheta^{-1} \circ \eta_s \circ \vartheta$ *is a twn-update.*

*Example 13.* To illustrate Lemma 12, we transform the solvable update $\eta_s$ of (1) into a twn-update $\eta_t$. As the blocks $S_2 = \{x_3\}$ and $S_3 = \{x_4\}$ have cardinality one, we only have to consider $S_1 = \{x_1, x_2\}$. The restriction of $\eta_s$ to $S_1$ is $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \leftarrow A_{S_1} \cdot \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ with $A_{S_1} = \begin{pmatrix} 3 & 2 \\ -5 & -3 \end{pmatrix}$. So we get the Jordan normal form $J(A_{S_1}) = P \cdot A_{S_1} \cdot P^{-1} = \begin{pmatrix} -\mathrm{i} & 0 \\ 0 & \mathrm{i} \end{pmatrix}$ where $P = \begin{pmatrix} -\frac{5}{2}\mathrm{i} & \frac{1}{2}(1 - 3\mathrm{i}) \\ \frac{5}{2}\mathrm{i} & \frac{1}{2}(1 + 3\mathrm{i}) \end{pmatrix}$ and $P^{-1} = \begin{pmatrix} \frac{1}{5}(\mathrm{i} - 3) & -\frac{1}{5}(\mathrm{i} + 3) \\ 1 & 1 \end{pmatrix}$. Thus, we have the following automorphism $\vartheta$ and its inverse $\vartheta^{-1}$:

$$\begin{aligned}
\vartheta\begin{pmatrix}x_{1}\\x_{2}\end{pmatrix} &= P\cdot\begin{pmatrix}x_{1}\\x_{2}\end{pmatrix} = \begin{pmatrix}-\frac{5}{2}\mathbf{i}\cdot x_{1}+\frac{1}{2}(1-3\mathbf{i})\cdot x_{2}\\\frac{5}{2}\mathbf{i}\cdot x_{1}+\frac{1}{2}(1+3\mathbf{i})\cdot x_{2}\end{pmatrix}, & \vartheta\begin{pmatrix}x_{3}\\x_{4}\end{pmatrix} &= \begin{pmatrix}x_{3}\\x_{4}\end{pmatrix}\\
\vartheta^{-1}\begin{pmatrix}x_{1}\\x_{2}\end{pmatrix} &= P^{-1}\cdot\begin{pmatrix}x_{1}\\x_{2}\end{pmatrix} = \begin{pmatrix}\frac{1}{5}(\mathbf{i}-3)\cdot x_{1}-\frac{1}{5}(\mathbf{i}+3)\cdot x_{2}\\x_{1}+x_{2}\end{pmatrix}, & \vartheta^{-1}\begin{pmatrix}x_{3}\\x_{4}\end{pmatrix} &= \begin{pmatrix}x_{3}\\x_{4}\end{pmatrix}
\end{aligned}$$

Hence, η_t = ϑ⁻¹ ∘ η_s ∘ ϑ is the following twn-update:

$$\eta_{t}(x_{1}) = -\mathbf{i}\cdot x_{1},\qquad \eta_{t}(x_{2}) = \mathbf{i}\cdot x_{2},\qquad \eta_{t}(x_{3}) = x_{3} - 1,\qquad \eta_{t}(x_{4}) = x_{4} + x_{3}^{2}$$
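The transformation in Example 13 can be checked numerically. The following Python sketch (our illustration, not part of the paper's formal development) verifies that P⁻¹ is indeed the inverse of P and that conjugating A_{S_1} with P yields the Jordan normal form diag(−i, i), i.e., exactly the coefficients of η_t on the block S_1:

```python
# Sanity check for Example 13: P * A_S1 * P^-1 must equal diag(-i, i).

A = [[3, 2], [-5, -3]]
P = [[-2.5j, 0.5 * (1 - 3j)], [2.5j, 0.5 * (1 + 3j)]]
P_inv = [[(1j - 3) / 5, -(1j + 3) / 5], [1, 1]]

def matmul(X, Y):
    """Product of two 2x2 complex matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# P_inv is indeed the inverse of P ...
I2 = matmul(P, P_inv)
assert all(abs(I2[i][j] - (1 if i == j else 0)) < 1e-9
           for i in range(2) for j in range(2))

# ... and conjugation with P diagonalizes A_S1: J(A_S1) = diag(-i, i),
# matching eta_t(x1) = -i*x1 and eta_t(x2) = i*x2.
J = matmul(matmul(P, A), P_inv)
assert abs(J[0][0] - (-1j)) < 1e-9 and abs(J[1][1] - 1j) < 1e-9
assert abs(J[0][1]) < 1e-9 and abs(J[1][0]) < 1e-9
```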

The reason for transforming solvable updates into twn-updates is that for the latter, we can re-use our previous algorithm from [16] to compute poly-exponential closed forms. While [16] only considered updates with linear arithmetic over Z, it can directly be extended to twn-updates over A.

**Lemma 14 (Closed Forms for TWN-Updates (see [16])).** *Let η be a twn-update. Then a (poly-exponential) closed form is computable for η.*

*Example 15.* For η_t from Example 13, we obtain the following closed form (with start value 0):

$$\mathrm{cl}_{t} = \left((-\mathbf{i})^{n}\cdot x_{1},\; \mathbf{i}^{n}\cdot x_{2},\; x_{3} - n,\; x_{4} + n\cdot\left(\tfrac{1}{6} + x_{3} + x_{3}^{2} - x_{3}\cdot n - \tfrac{n}{2} + \tfrac{n^{2}}{3}\right)\right).$$
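Such a closed form can be validated against explicit iteration of η_t. The following Python sketch (our illustration, using exact rationals for x3 and x4 and the start value 0 from Example 15) compares the two on the first few values of n:

```python
from fractions import Fraction

def iterate(state, n):
    """Apply eta_t n times: (x1,x2,x3,x4) -> (-i*x1, i*x2, x3-1, x4+x3^2)."""
    x1, x2, x3, x4 = state
    for _ in range(n):
        x1, x2, x3, x4 = -1j * x1, 1j * x2, x3 - 1, x4 + x3 ** 2
    return x1, x2, x3, x4

def closed_form(state, n):
    """cl_t from Example 15, evaluated at n."""
    x1, x2, x3, x4 = state
    return ((-1j) ** n * x1, 1j ** n * x2, x3 - n,
            x4 + n * (Fraction(1, 6) + x3 + x3 ** 2 - x3 * n
                      - Fraction(n, 2) + Fraction(n ** 2, 3)))

state = (1 + 0j, 1 + 0j, Fraction(7), Fraction(0))
for n in range(10):
    it, cf = iterate(state, n), closed_form(state, n)
    assert it[2] == cf[2] and it[3] == cf[3]             # exact match on x3, x4
    assert abs(it[0] - cf[0]) < 1e-9 and abs(it[1] - cf[1]) < 1e-9
```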

So to obtain a closed form of a solvable update η_s, we first transform it into a twn-update η_t via Lemma 12, and then compute the closed form cl_t of η_t (Lemma 14). We now show how to obtain a closed form for η_s from cl_t.

**Theorem 16 (Closed Forms for Solvable Updates).** *Let η_s be a solvable update and ϑ be an automorphism as in Lemma 12 such that η_t = ϑ⁻¹ ∘ η_s ∘ ϑ is a twn-update. If cl_t is a closed form of η_t with start value n_0, then cl_s = ϑ ∘ cl_t ∘ ϑ⁻¹ is a closed form of η_s with start value n_0.*

*Example 17.* In Example 13 we transformed η_s into the twn-update η_t via an automorphism ϑ, and in Example 15 we gave a closed form cl_t of η_t. Thus, by Theorem 16, we can infer a closed form cl_s = ϑ ∘ cl_t ∘ ϑ⁻¹ of η_s. For example, we compute a closed form for x1 with start value 0 (cl_s^{x2} can be inferred in a similar way):

$$\begin{aligned}
\mathrm{cl}_{s}^{x_{1}} &= \left(\tfrac{1}{5}(\mathbf{i}-3)\cdot x_{1} - \tfrac{1}{5}(\mathbf{i}+3)\cdot x_{2}\right)\left[v/\mathrm{cl}_{t}^{v} \mid v \in \mathcal{V}\right]\left[v/\vartheta(v) \mid v \in \mathcal{V}\right]\\
&= \left(\tfrac{1}{5}(\mathbf{i}-3)\cdot(-\mathbf{i})^{n}\cdot x_{1} - \tfrac{1}{5}(\mathbf{i}+3)\cdot\mathbf{i}^{n}\cdot x_{2}\right)\left[v/\vartheta(v) \mid v \in \mathcal{V}\right]\\
&= \tfrac{1}{2}(\underbrace{(1+3\mathbf{i})\cdot x_{1} + 2\mathbf{i}\cdot x_{2}}_{\alpha})\cdot(-\mathbf{i})^{n} + \tfrac{1}{2}(\underbrace{(1-3\mathbf{i})\cdot x_{1} - 2\mathbf{i}\cdot x_{2}}_{\overline{\alpha}})\cdot\mathbf{i}^{n}.
\end{aligned}$$

#### **3.2 Periodic Rational Solvable Loops**

In Sect. 3.1, we discussed how to compute closed forms for solvable updates (by transforming them into twn-updates). However, to compute size bounds, we have to instantiate the variable n in the closed forms by runtime bounds (Theorem 7). In [20], it was shown that (polynomial) runtime bounds can always be computed for terminating twn-loops over the integers. However, in general, transforming solvable loops via Lemma 12 yields twn-updates which may contain algebraic (complex) numbers. We now show that for the subclass of terminating *periodic rational* solvable loops, our approach is "complete" (i.e., finite runtime bounds and thus also finite size bounds are always computable).

**Definition 18 (Periodic Rational [25]).** *A number λ ∈ A is* periodic rational *if λ^p ∈ Q for some p ∈ N with p > 0. The* period *of λ is the smallest such p with λ^p ∈ Q. A solvable loop is* periodic rational *(i.e., it is a* prs loop*) with period p if all its eigenvalues λ are periodic rational and p is the least common multiple of all their periods. A prs loop is a* unit *prs loop if |λ| ≤ 1 for all its eigenvalues λ.*

So i, −i, and √2·i are periodic rational with period 2, while √2 + i is not periodic rational. The following lemma from [25] gives a bound on the period of prs loops and thus yields an algorithm to detect prs loops and to compute their period.

**Lemma 19 (Bound on the Period [25]).** *Let A ∈ Z^{n×n}. If λ is a periodic rational eigenvalue of A with period p, then p ≤ n³.*
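Definition 18 and Lemma 19 together suggest a direct period check: raise λ to successive powers up to the bound and test for rationality. The following Python sketch (our illustration) does this for the special case of Gaussian-rational eigenvalues λ = a + b·i with a, b ∈ Q; eigenvalues like √2·i would need a richer representation of algebraic numbers. The `bound` parameter plays the role of the n³ bound from Lemma 19:

```python
from fractions import Fraction

class GaussianRational:
    """lam = re + im*i with exact rational components."""
    def __init__(self, re, im):
        self.re, self.im = Fraction(re), Fraction(im)
    def __mul__(self, other):
        return GaussianRational(self.re * other.re - self.im * other.im,
                                self.re * other.im + self.im * other.re)
    def is_rational(self):
        return self.im == 0

def period(lam, bound):
    """Smallest p <= bound with lam^p in Q, or None if no such p exists."""
    power = lam
    for p in range(1, bound + 1):
        if power.is_rational():
            return p
        power = power * lam
    return None

i = GaussianRational(0, 1)
assert period(i, 8) == 2                           # i^2 = -1 is rational
assert period(GaussianRational(0, -1), 8) == 2     # likewise for -i
assert period(GaussianRational(1, 1), 8) == 4      # (1+i)^4 = -4 is rational
assert period(GaussianRational(1, 2), 8) is None   # 1+2i is not periodic rational
```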

Now we show that by *chaining* (i.e., by performing p iterations of a prs loop with period p in a single step), one can transform any prs loop into a solvable loop with only integer eigenvalues. Then, our previous results on twn-loops [17,20] can be used to infer runtime bounds for these loops.

**Definition 20 (Chaining Loops).** *Let L = (ϕ, η) be a loop and p ∈ N \ {0}. Then L_p = (ϕ_p, η_p) results from iterating L p times, i.e., ϕ_p = ϕ ∧ η(ϕ) ∧ η(η(ϕ)) ∧ ... ∧ η^{p−1}(ϕ) and η_p(v) = η^p(v) for all v ∈ V.*

*Example 21.* The eigenvalues ±i of (1) have period 2. Chaining yields (ϕ ∧ η(ϕ), η²):

$$\text{while } (x\_3 > 0 \land x\_3 > 1) \text{ do } (x\_1, x\_2, x\_3, x\_4) \leftarrow (-x\_1, -x\_2, x\_3 - 2, x\_4 + (x\_3 - 1)^2 + x\_3^2) \tag{2}$$
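Chaining can be checked numerically: two applications of the update η of loop (1) must coincide with the update of the chained loop (2), and the chained guard ϕ ∧ η(ϕ) must coincide with x3 > 0 ∧ x3 > 1. A small Python sketch (our illustration):

```python
def eta(s):
    """Update of loop (1): (x1,x2,x3,x4) <- (3x1+2x2, -5x1-3x2, x3-1, x4+x3^2)."""
    x1, x2, x3, x4 = s
    return (3 * x1 + 2 * x2, -5 * x1 - 3 * x2, x3 - 1, x4 + x3 ** 2)

def eta_chained(s):
    """Update of the chained loop (2)."""
    x1, x2, x3, x4 = s
    return (-x1, -x2, x3 - 2, x4 + (x3 - 1) ** 2 + x3 ** 2)

def guard(s):
    return s[2] > 0          # phi: x3 > 0

for s in [(1, 2, 3, 4), (-6, -8, 2, 1), (0, 5, -7, 9)]:
    assert eta(eta(s)) == eta_chained(s)                            # eta_2 = eta o eta
    assert (guard(s) and guard(eta(s))) == (s[2] > 0 and s[2] > 1)  # phi and eta(phi)
```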

Due to Lemma 12, we can transform every solvable update into a twn-update by a (linear) automorphism ϑ. For prs loops, ϑ's range can be restricted to Q[V], i.e., one does not need algebraic numbers. So we first chain the prs loop L and then compute a Q-automorphism ϑ transforming the chained loop L_p into a twn-loop L_t via Lemma 12. Then we can infer a runtime bound for L_t as in [20]. The reason is that all factors c_i in the update of L_t are integers and thus, we can compute a closed form Σ_j α_j · n^{a_j} · b_j^n such that α_j ∈ Q[V] and b_j ∈ Z. Afterwards, the runtime bound for L_t can be lifted to a runtime bound for the original loop by reconsidering the automorphism ϑ. Similarly, in order to prove termination of the prs loop L, we analyze termination of L_t on ϑ(Z^d) = {ϑ(x) | x ∈ Z^d}.²

**Lemma 22 (Runtime Bounds for PRS Loops).** *Let L be a prs loop with period p and let L_p = (ϕ_p, η_p) result from chaining as in Definition 20. From η_p, one can compute a linear automorphism ϑ : V → Q[V] as in Lemma 12, such that:*

- *L terminates on Z^d iff*
- *L_p terminates on Z^d iff*
- *L_t terminates on ϑ(Z^d) = {ϑ(x) | x ∈ Z^d}.*

**Fig. 1.** Illustration of Runtime and Size Bound Computations

Since we can detect prs loops and their periods by Lemma 19, Lemma 22 allows us to compute runtime bounds for all terminating prs loops. This is illustrated in Fig. 1: For runtime bounds, L is transformed to L_p by chaining, and L_p is transformed further to L_t by an automorphism ϑ. The runtime bound r for L_t can then be transformed into a runtime bound for L_p and further into a runtime bound for L. For size bounds, L is directly transformed to a twn-loop L′_t by an automorphism ϑ′. The closed form cl_t obtained for L′_t is transformed via the automorphism ϑ′ into a closed form cl_s for L. Then the runtime bound for L is inserted into this closed form to yield a size bound for L. So in Fig. 1, standard arrows denote transformations of loops and wavy arrows denote transformations of runtime bounds or closed forms.

³ More precisely, |σ|(r) ≥ inf{n ∈ N | σ(η_t^n(¬ϕ_t))} must hold for all σ : V → ϑ(Z^d).

² By [17], termination of L_t on ϑ(Z^d) is reducible to invalidity of a formula ∃x ∈ Q^d. ψ_{ϑ(Z^d)} ∧ ξ_{L_t}. Here, ψ_{ϑ(Z^d)} holds iff x ∈ ϑ(Z^d) and ξ_{L_t} holds iff L_t does not terminate on x. As shown in [17], non-termination of linear twn-loops with integer eigenvalues is NP-complete and it is semi-decidable for twn-loops with non-linear arithmetic.

**Theorem 23 (Completeness of Size and Runtime Bound Computation for Terminating PRS Loops).** *For all terminating prs loops, polynomial runtime bounds and finite size bounds are computable. For terminating unit prs loops, all these size bounds are polynomial as well.*

*Example 24.* For the loop L from (1), we computed L_p for p = 2 in (2), see Example 21. As L_p is already a twn-loop, we can use the technique of [20] (implemented in our tool KoAT) to obtain the runtime bound x3 for L_p. Lemma 22 yields the runtime bound 2·x3 + 1 for the original loop (1). Of course, here one could also use (incomplete) approaches based on linear ranking functions (also implemented in KoAT, see, e.g., [8, 19]) to directly infer the tighter runtime bound x3 for the loop (1).
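The soundness of the lifted bound can be illustrated by simulation: the number of iterations of loop (1) is governed by x3 alone (guard x3 > 0, update x3 ← x3 − 1), and it never exceeds 2·x3 + 1, nor the tighter direct bound x3. A Python sketch (our illustration):

```python
def runtime_loop1(x3):
    """Number of iterations of loop (1); only x3 influences the guard."""
    n = 0
    while x3 > 0:
        x3, n = x3 - 1, n + 1
    return n

for x3 in range(50):
    rt = runtime_loop1(x3)
    assert rt <= 2 * x3 + 1      # bound lifted from the chained loop via Lemma 22
    assert rt <= max(x3, 0)      # the tighter direct bound x3
```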

#### **4 Size Bounds for Integer Programs**

Up to now, we focused on *isolated* loops. In the following, we incorporate our complete approach from Sects. 2 and 3 into the setting of general *integer programs*, where most questions regarding termination or complexity are undecidable. Formally, an integer program is a tuple (V, L, ℓ_0, T) with a finite set of variables V, a finite set of locations L, a fixed initial location ℓ_0 ∈ L, and a finite set of transitions T. A *transition* is a 4-tuple (ℓ, ϕ, η, ℓ′) with a *start location* ℓ ∈ L, *target location* ℓ′ ∈ L\{ℓ_0}, *guard* ϕ ∈ F(V), and *update* η : V → Z[V]. To simplify the presentation, we do not consider "temporary" variables (whose update is non-deterministic), but the approach can easily be extended accordingly. Transitions (ℓ_0, ·, ·, ·) are called *initial* and T_0 denotes the set of all initial transitions.

**Fig. 2.** An Integer Program with Non-Linear Size Bounds

*Example 25.* In the integer program of Fig. 2, we omitted identity updates η(v) = v and guards ϕ that are true. Here, V = {x1, ..., x5} and L = {ℓ_0, ℓ_1, ℓ_2}, where ℓ_0 is the initial location. Note that the loop in (1) *corresponds* to transition t1.

**Definition 26 (Correspondence between Loops and Transitions).** *Let t = (ℓ, ϕ, η, ℓ′) be a transition with ϕ ∈ F(V′) for some variables V′ ⊆ V such that η(x) = x for all x ∈ V\V′ and η(x) ∈ Z[V′] for all x ∈ V′. A loop (ϕ′, η′) with ϕ′ ∈ F({x1, ..., xd}) and η′ : {x1, ..., xd} → Z[{x1, ..., xd}]* corresponds *to the transition t via the variable renaming π : {x1, ..., xd} → V′ if ϕ is π(ϕ′) and for all 1 ≤ i ≤ d we have η(π(xi)) = π(η′(xi)).*

To define the semantics of integer programs, an evaluation step moves from one configuration (ℓ, σ) ∈ L×Σ to another configuration (ℓ′, σ′) via a transition (ℓ, ϕ, η, ℓ′) where σ(ϕ) holds. Here, σ′ is obtained by applying the update η to σ. From now on, we fix an integer program P = (V, L, ℓ_0, T).

**Definition 27 (Evaluation of Programs).** *For configurations (ℓ, σ), (ℓ′, σ′) and t = (ℓ_t, ϕ, η, ℓ′_t) ∈ T, (ℓ, σ) →_t (ℓ′, σ′) is an* evaluation *step if ℓ = ℓ_t, ℓ′ = ℓ′_t, σ(ϕ) = true, and σ(η(v)) = σ′(v) for all v ∈ V. Let →_T = ⋃_{t∈T} →_t, where we also write → instead of →_t or →_T. Let (ℓ_0, σ_0) →^k (ℓ_k, σ_k) abbreviate (ℓ_0, σ_0) → ... → (ℓ_k, σ_k) and let (ℓ, σ) →^∗ (ℓ′, σ′) if (ℓ, σ) →^k (ℓ′, σ′) for some k ≥ 0.*

*Example 28.* If we encode states as tuples (σ(x1), ..., σ(x5)) ∈ Z^5, then (−6, −8, 2, 1, 1) →_{t0} (−6, −8, 2, 1, 1) →²_{t1} (6, 8, 0, 6, 1) →_{t2} (6, 8, 0, 6, 1) →⁶_{t4} (0, 8, 0, 6, 1).

Now we define size bounds for variables v after evaluating a transition t: SB(t, v) is a *size bound* for v w.r.t. t if for any run starting in σ_0 ∈ Σ, |σ_0|(SB(t, v)) is greater than or equal to the largest absolute value of v after evaluating t.

**Definition 29 (Size Bounds [8, 19]).** *A function SB : (T×V) → B is a* (global) size bound *for the program P if for all (t, x) ∈ T×V and all states σ_0 ∈ Σ we have |σ_0|(SB(t, x)) ≥ sup{|σ′(x)| | ∃ℓ′ ∈ L. (ℓ_0, σ_0) (→^∗ ∘ →_t) (ℓ′, σ′)}.*

Later in Lemma 35, we will compare the notion of size bounds for transitions in a program from Definition 29 to our earlier notion of size bounds for loops from Definition 6.

*Example 30.* As an example, we give size bounds for the transitions t0 and t3 in Fig. 2. Since t0 does not change any variables, a size bound is SB(t0, xi) = xi for all 1 ≤ i ≤ 5. Note that the value of x5 is never increased and is bounded from below by 0 in any run through the program. Thus, SB(t3, x3) = x5 = SB(t3, x5). Similarly, we have SB(t3, x1) = 2·x5, SB(t3, x2) = 3·x5, and SB(t3, x4) = x3.

To infer size bounds for transitions as in Definition 29 automatically, we lift *local* size bounds (i.e., size bounds which only hold for a subprogram with transitions T′ ⊆ T\T_0) to global size bounds for the *complete* program. For the subprogram, one considers runs which start after evaluating an *entry transition* of T′.

**Definition 31 (Entry Transitions [8]).** *Let ∅ ≠ T′ ⊆ T\T_0. The* entry transitions *of T′ are E_{T′} = {t | t = (·, ·, ·, ℓ) ∈ T\T′ and there is a (ℓ, ·, ·, ·) ∈ T′}.*

*Example 32.* For the program in Fig. 2, we have E_{{t1}} = {t0, t3} and E_{{t4}} = {t2}.

**Definition 33 (Local Size Bounds).** *Let ∅ ≠ T′ ⊆ T\T_0 and t′ ∈ T′. SB_{t′} : V → B is a* local size bound *for t′ w.r.t. T′ if for all x ∈ V and all σ ∈ Σ:*⁴ *|σ|(SB_{t′}(x)) ≥ sup{|σ′(x)| | ∃ℓ′ ∈ L, (·, ·, ·, ℓ) ∈ E_{T′}. (ℓ, σ) (→^∗_{T′} ∘ →_{t′}) (ℓ′, σ′)}.*

Theorem 34 below yields a novel *modular* procedure to infer (global) size bounds from previously computed local size bounds. A local size bound for a transition t′ w.r.t. a subprogram T′ ⊆ T\T_0 is lifted by inserting size bounds for all entry transitions. Again, this is possible because we only use weakly monotonically increasing functions as bounds. Here, "b [v/p_v | v ∈ V]" denotes the bound which results from replacing every variable v by p_v in the bound b.

**Theorem 34 (Lifting Local Size Bounds).** *Let ∅ ≠ T′ ⊆ T\T_0, let SB_{t′} be a local size bound for a transition t′ w.r.t. T′, and let SB : (T×V) → B be a size bound for P. Let SB′(t′, x) = Σ_{r∈E_{T′}} SB_{t′}(x) [v/SB(r, v) | v ∈ V] and SB′(t, x) = SB(t, x) for all t ≠ t′. Then SB′ is also a size bound for P.*

To obtain local size bounds which can then be lifted via Theorem 34, we look for transitions t_L that correspond to a loop L, and then we compute a size bound for L as in Sects. 2 and 3. The following lemma shows that size bounds for loops as in Definition 6 indeed yield local size bounds for the corresponding transitions.⁵

**Lemma 35 (Local Size Bounds via Loops).** *Let SB_L be a size bound for a loop L (as in Definition 6) which corresponds to a transition t_L via a variable renaming π. Then π ∘ SB_L ∘ π⁻¹ is a local size bound for t_L w.r.t. {t_L} (as in Definition 33).*

*Example 36.* SB_L(x4) = x4 + 3·x3³ + 2·x3² + x3 is a size bound for x4 in the loop (1), see Example 8. This loop corresponds to transition t1 in the program of Fig. 2. Since E_{{t1}} = {t0, t3} by Example 32, Theorem 34 yields the following (non-linear) size bound for x4 in the full program of Fig. 2 (see Example 30 for SB(t0, v) and SB(t3, v)):

$$\begin{aligned}
\mathcal{SB}(t_{1}, x_{4}) &= \mathcal{SB}_{L}(x_{4})\left[v/\mathcal{SB}(t_{0}, v) \mid v \in \mathcal{V}\right] + \mathcal{SB}_{L}(x_{4})\left[v/\mathcal{SB}(t_{3}, v) \mid v \in \mathcal{V}\right]\\
&= \left(x_{4} + 3\cdot x_{3}^{3} + 2\cdot x_{3}^{2} + x_{3}\right) + \left(x_{3} + 3\cdot x_{5}^{3} + 2\cdot x_{5}^{2} + x_{5}\right)\\
&= 2\cdot x_{3} + 2\cdot x_{3}^{2} + 3\cdot x_{3}^{3} + x_{4} + x_{5} + 2\cdot x_{5}^{2} + 3\cdot x_{5}^{3}
\end{aligned}$$

Analogously, we infer the remaining size bounds SB(t1, xi), e.g., SB(t1, x1) = (4·x1 + 2·x2) [v/SB(t0, v) | v ∈ V] + (4·x1 + 2·x2) [v/SB(t3, v) | v ∈ V] = 4·x1 + 2·x2 + 14·x5.
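When bounds are evaluated pointwise, the substitution in Theorem 34 becomes plain function composition. The following Python sketch (our illustration, using the bounds from Examples 30 and 36) checks the computation of SB(t1, x4) on a grid of sample states:

```python
import itertools

# Bounds as Python functions over a state dict; b[v/SB(r,v)] is composition.
SB_L_x4 = lambda s: s["x4"] + 3 * s["x3"] ** 3 + 2 * s["x3"] ** 2 + s["x3"]

SB_t0 = lambda s: dict(s)                       # t0 changes nothing (Example 30)
SB_t3 = lambda s: {"x1": 2 * s["x5"], "x2": 3 * s["x5"],
                   "x3": s["x5"], "x4": s["x3"], "x5": s["x5"]}

def lifted(s):
    """Theorem 34: sum over the entry transitions E_{t1} = {t0, t3}."""
    return SB_L_x4(SB_t0(s)) + SB_L_x4(SB_t3(s))

def expected(s):
    """The polynomial computed in Example 36."""
    x3, x4, x5 = s["x3"], s["x4"], s["x5"]
    return 2*x3 + 2*x3**2 + 3*x3**3 + x4 + x5 + 2*x5**2 + 3*x5**3

for x3, x4, x5 in itertools.product(range(4), repeat=3):
    s = {"x1": 0, "x2": 0, "x3": x3, "x4": x4, "x5": x5}
    assert lifted(s) == expected(s)
```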

⁴ To simplify the formalism, in this definition we consider every possible configuration (ℓ, σ) and not only configurations which are reachable from the initial location ℓ_0.

⁵ Local or global size bounds for transitions only have to hold if the transition is indeed taken. In contrast, size bounds for loops also have to hold if there is no loop iteration. This will be needed in Theorem 38 to compute local size bounds for simple cycles.

Our approach alternates between improving size and runtime bounds for individual transitions. We start with SB(t0, x) = |η(x)| for initial transitions t0 ∈ T_0, where η is t0's update, and SB(t, x) = ω for t ∈ T\T_0. Here, similar to the notion |p| in Sect. 2, for every polynomial p = Σ_j c_j · β_j with normalized monomials β_j, |p| is the polynomial Σ_j |c_j| · β_j. To improve the size bounds of transitions that correspond to (possibly non-linear) solvable loops, we can use closed forms (Theorem 7) and the lifting via Theorem 34. Otherwise, we use an existing incomplete technique [8] to improve size bounds (where [8] essentially only succeeds for updates without non-linear arithmetic). In this way, we can automatically compute polynomial size bounds for all remaining transitions and variables in the program of Fig. 2 (e.g., we obtain SB(t2, x1) = SB(t1, x1) = 4·x1 + 2·x2 + 14·x5).

Both the technique from [8] and our approach from Theorem 7 rely on runtime bounds to compute size bounds. On the other hand, as shown in [8, 19, 27], size bounds for "previous" transitions are needed to infer (global) runtime bounds for transitions in a program. For that reason, the alternating computation and improvement of global size and runtime bounds for the transitions is repeated until all bounds are finite. We will illustrate this in more detail in Sect. 5.

In Definition 26 and Lemma 35 we considered transitions with the same start and target location that directly correspond to loops. To increase the applicability of our approach, as in [27] we now consider so-called *simple cycles*, where iterations through the cycle can only be done in a unique way. So the cycle must not have subcycles and there must not be any non-determinism concerning the next transition to be taken. Formally, C = {t1, ..., tn} ⊆ T is a simple cycle if there are pairwise different locations ℓ_1, ..., ℓ_n such that t_i = (ℓ_i, ·, ·, ℓ_{i+1}) for 1 ≤ i ≤ n−1 and t_n = (ℓ_n, ·, ·, ℓ_1). To handle simple cycles, we *chain* transitions.⁶

**Definition 37 (Chaining (see, e.g., [27])).** *Let t1, ..., tn ∈ T where t_i = (ℓ_i, ϕ_i, η_i, ℓ_{i+1}) for all 1 ≤ i ≤ n−1. Then the transition t1 ⋆ ... ⋆ tn = (ℓ_1, ϕ, η, ℓ_{n+1}) results from* chaining *t1, ..., tn where*

$$\begin{aligned} \varphi &= \varphi\_1 \land \eta\_1(\varphi\_2) \land \eta\_2(\eta\_1(\varphi\_3)) \land \dots \land \eta\_{n-1}(\dots \eta\_1(\varphi\_n) \dots) \\ \eta(v) &= \eta\_n(\dots \eta\_1(v) \dots) \text{ for all } v \in \mathcal{V}, \; i.e., \; \eta = \eta\_n \circ \dots \circ \eta\_1. \end{aligned}$$

Now we want to compute a *local* size bound for the transition t_n w.r.t. a simple cycle C = {t1, ..., tn} where a loop L corresponds to t1 ⋆ ... ⋆ tn via π. Then a size bound SB_L for the loop L yields the size bound π ∘ SB_L ∘ π⁻¹ for t_n regarding runs through C starting in t1. However, to obtain a local size bound SB_{t_n} w.r.t. C, we have to consider runs starting after any entry transition (·, ·, ·, ℓ_i) ∈ E_C. Hence, we use |η_n(... η_i(π(SB_L(π⁻¹(x)))) ...)| for any (·, ·, ·, ℓ_i) ∈ E_C. In this way, we also capture evaluations starting in ℓ_i, i.e., without evaluating the complete cycle.

⁶ The chaining of a loop L in Definition 20 corresponds to p − 1 chaining steps of a transition t_L via Definition 37, i.e., to t_L ⋆ ... ⋆ t_L.

**Theorem 38 (Local Size Bounds for Simple Cycles).** *Let C = {t1, ..., tn} ⊆ T be a simple cycle and let SB_L be a size bound for a loop L which corresponds to t1 ⋆ ... ⋆ tn via a variable renaming π. Then a* local size bound SB_{t_n} *for t_n w.r.t. C is SB_{t_n}(x) = Σ_{1≤i≤n, (·,·,·,ℓ_i)∈E_C} |η_n(... η_i(π(SB_L(π⁻¹(x)))) ...)|.*

*Example 39.* As an example, in the program of Fig. 2 we replace t1 = (ℓ_1, x3 > 0, η1, ℓ_1) by t1a = (ℓ_1, true, η1a, ℓ′_1) and t1b = (ℓ′_1, x3 > 0, η1b, ℓ_1) with a new location ℓ′_1, where η1a(v) = η1(v) for v ∈ {x1, x2}, η1b(v) = η1(v) for v ∈ {x3, x4}, and η1a resp. η1b are the identity on the remaining variables. Then {t1a, t1b} forms a simple cycle and Theorem 38 allows us to compute local size bounds SB_{t1b} and SB_{t1a} w.r.t. {t1a, t1b}, because the chained transitions t1a ⋆ t1b = t1 and t1b ⋆ t1a both correspond to the loop (1). They can then be lifted to global size bounds as in Example 36 using size bounds for the entry transitions E_{{t1a, t1b}} = {t0, t3}.

This shows how we choose t′ and T′ when lifting local size bounds to global ones with Theorem 34: For a transition t′, we search for a simple cycle T′ such that chaining the cycle results in a twn-loop or a suitable solvable loop and the size bounds of E_{T′} are finite. For all other transitions, we compute size bounds as in [8].

#### **5 Completeness of Size and Runtime Analysis for Programs**

For individual loops, we showed in Theorem 23 that polynomial runtime bounds and finite size bounds are computable for all terminating prs loops. In this section, we discuss completeness of the size bound technique from the previous section and of termination and runtime complexity analysis for general integer programs. We show that for a large class of programs consisting of consecutive prs loops, in case of termination we can always infer finite runtime and size bounds.

To this end, we briefly recapitulate how size bounds are used to compute runtime bounds for general integer programs, and show that our new technique to infer size bounds also results in better runtime bounds. We call RB : T → B a *(global) runtime bound* if for every transition t ∈ T and state σ_0 ∈ Σ, |σ_0|(RB(t)) over-approximates the number of evaluations of t in any run starting in (ℓ_0, σ_0).

**Definition 40 (Runtime Bound [8, 19]).** *A function RB : T → B is a* (global) runtime bound *if for all t ∈ T and all states σ_0 ∈ Σ, we have |σ_0|(RB(t)) ≥ sup{n ∈ N | ∃(ℓ′, σ′). (ℓ_0, σ_0) (→^∗_T ∘ →_t)^n (ℓ′, σ′)}.*

For our example in Fig. 2, a global runtime bound for t0, t2, and t3 is RB(t0) = 1 and RB(t2) = RB(t3) = x5, as x5 is bounded from below by t3's guard x5 > 1, the value of x5 decreases by 1 in t3, and no transition increases x5.

To infer global runtime bounds automatically, similar as for size bounds, we first consider a smaller subprogram T′ ⊆ T and compute *local runtime bounds* for non-empty subsets T′_> ⊆ T′. A local runtime bound measures how often a transition t ∈ T′_> can occur in a run through T′ that starts after an entry transition r ∈ E_{T′}. Thus, local runtime bounds do not consider how many T′-runs take place in a global run and they do not consider the sizes of the variables before starting a T′-run. We lift these local bounds to global runtime bounds for the complete program afterwards.

**Definition 41 (Local Runtime Bound [27]).** *Let ∅ ≠ T′_> ⊆ T′ ⊆ T. RB_{T′_>} ∈ B is a* local runtime bound *for T′_> w.r.t. T′ if for all t ∈ T′_>, all r ∈ E_{T′} with r = (·, ·, ·, ℓ), and all σ ∈ Σ, we have |σ|(RB_{T′_>}) ≥ sup{n ∈ N | ∃σ_0, (ℓ′, σ′). (ℓ_0, σ_0) →^∗_T ∘ →_r (ℓ, σ) (→^∗_{T′} ∘ →_t)^n (ℓ′, σ′)}.*

*Example 42.* In Fig. 2, local runtime bounds for T′_> = T′ = {t1} and for T′_> = T′ = {t4} are RB_{{t1}} = x3 and RB_{{t4}} = x1. Local runtime bounds can often be inferred automatically by approaches based on ranking functions (see, e.g., [8]) or by the complete technique for terminating prs loops (see Theorem 23).

If we have a local runtime bound RB_{T′_>} w.r.t. T′, then setting RB(t) to Σ_{r∈E_{T′}} RB(r)·(RB_{T′_>} [v/SB(r, v) | v ∈ V]) for all t ∈ T′_> yields a global runtime bound [27]. Here, we over-approximate the number of local T′-runs which are started by an entry transition r ∈ E_{T′} by an already computed global runtime bound RB(r). Moreover, we instantiate each v ∈ V by a size bound SB(r, v) to consider the size of v before a local T′-run is started. So as mentioned in Sect. 4, we need runtime bounds to infer size bounds (see Theorem 7 and the inference of global size bounds in [8]), and on the other hand we need size bounds to compute runtime bounds. Thus, our implementation alternates between size bound and runtime bound computations (see [8, 27] for a more detailed description of this alternation).

*Example 43.* Based on the local runtime bounds in Example 42, we can compute the remaining global runtime bounds for our example. We obtain RB(t1) = RB(t0)·(x3 [v/SB(t0, v) | v ∈ V]) + RB(t3)·(x3 [v/SB(t3, v) | v ∈ V]) = x3 + x5² and RB(t4) = RB(t2)·(x1 [v/SB(t2, v) | v ∈ V]) = x5·(4·x1 + 2·x2 + 14·x5). Thus, overall we have a quadratic runtime bound Σ_{0≤i≤4} RB(t_i). Note that it is due to our new size bound technique from Sects. 2–4 that we obtain polynomial runtime bounds in this example. In contrast, to the best of our knowledge, all other state-of-the-art tools fail to infer polynomial size or runtime bounds for this example. Similarly, if one modifies t4 such that x4 instead of x1 is decreased as long as x4 > 0 holds, then our approach again yields a polynomial runtime bound, whereas none of the other tools can infer finite runtime bounds.
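The computation in Example 43 can again be mimicked pointwise: substituting size bounds into a local runtime bound corresponds to composing the bound functions. A Python sketch (our illustration, with the bounds from Examples 30 and 42 and the size bound SB(t2, x1) from Sect. 4):

```python
# Global runtime bounds via RB(t) = sum over entry transitions r of
# RB(r) * (RB_local [v / SB(r, v)]), evaluated on a sample state.

RB_t0 = lambda s: 1
RB_t2 = RB_t3 = lambda s: s["x5"]
local_t1 = lambda s: s["x3"]          # local runtime bound for {t1}
local_t4 = lambda s: s["x1"]          # local runtime bound for {t4}

SB_t0 = lambda s: dict(s)                                          # t0 changes nothing
SB_t3 = lambda s: {**s, "x3": s["x5"]}                             # only x3 matters here
SB_t2 = lambda s: {**s, "x1": 4*s["x1"] + 2*s["x2"] + 14*s["x5"]}

RB_t1 = lambda s: RB_t0(s) * local_t1(SB_t0(s)) + RB_t3(s) * local_t1(SB_t3(s))
RB_t4 = lambda s: RB_t2(s) * local_t4(SB_t2(s))

s = {"x1": 2, "x2": 3, "x3": 5, "x5": 7}
assert RB_t1(s) == s["x3"] + s["x5"] ** 2                          # x3 + x5^2
assert RB_t4(s) == s["x5"] * (4*s["x1"] + 2*s["x2"] + 14*s["x5"])
```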

Finally, we state our completeness results for integer programs. For a set C ⊆ T and ℓ, ℓ′ ∈ L, let ℓ →_C ℓ′ hold iff there is a transition (ℓ, ·, ·, ℓ′) ∈ C. We say that C is a *component* if we have ℓ →⁺_C ℓ′ for all locations ℓ, ℓ′ occurring in C, where →⁺_C is the transitive closure of →_C. So in particular, we must also have ℓ →⁺_C ℓ for all locations ℓ in the transitions of C. We call an integer program *simple* if every component is a simple cycle that is "reachable" from any initial state.

**Definition 44 (Simple Integer Program).** *An integer program (V, L, ℓ_0, T) is* simple *if every component C ⊆ T is a simple cycle, and for every entry transition (·, ·, ·, ℓ) ∈ E_C and every σ_0 ∈ Σ, there is an evaluation (ℓ_0, σ_0) →^∗_T (ℓ, σ_0).*

In Fig. 2, T\{t0} is a component that is not a simple cycle. However, if we remove t3 and replace t0's guard by true, then the resulting program P′ is simple (but not linear). A simple program terminates iff each of its isolated simple cycles terminates. Thus, if we can prove termination for every simple cycle, then the overall program terminates. Hence, if after chaining every simple cycle corresponds to a linear, unit prs loop, then we can decide termination and infer polynomial runtime and size bounds for the overall integer program. For terminating, non-unit prs loops, runtime bounds are still polynomial, but size bounds can be exponential. Hence, the global runtime bounds can then be exponential as well. Note that in the example program P′ above, the eigenvalues of the update matrices of t1 and t4 have absolute value 1, i.e., t1 and t4 correspond to unit prs loops. Hence, by Theorem 45 we obtain polynomial runtime and size bounds for P′.

#### **Theorem 45 (Completeness Results for Integer Programs)**


In the definition of simple integer programs (Definition 44), we required that for every component C and every entry transition (·, ·, ·, ℓ) ∈ E_C, there is an evaluation (ℓ_0, σ_0) →^∗_T (ℓ, σ_0) for every σ_0 ∈ Σ. If one strengthens this by requiring that ℓ can be reached from ℓ_0 using only transitions whose guard is true and whose update is the identity, then the class of programs in Theorem 45 (a) is decidable (there are only n ways to chain a simple cycle with n transitions, and checking whether a loop is a prs loop is decidable by Lemma 19).

#### **6 Conclusion and Evaluation**

*Conclusion.* In this paper, we developed techniques to infer size bounds automatically and to use them in order to obtain bounds on the runtime complexity of programs. This yields a complete procedure to prove termination and to infer

runtime and size bounds for a large class of integer programs. Moreover, we showed how to integrate the complete technique into an (incomplete) modular technique for general integer programs. To sum up, we presented the following new contributions in this paper:


To infer local runtime bounds as in Definition 41, KoAT first applies multiphase linear ranking functions (see [5,19]), which can be done very efficiently. For twn-loops where no finite bound was found, it then uses the computability of runtime bounds for terminating twn-loops (see [17,20,27]). When computing size bounds, KoAT first applies the technique of [8] for reasons of efficiency, and in case of exponential or infinite size bounds, it tries to compute size bounds via closed forms as in the current paper. Here, SymPy [30] is used to compute Jordan normal forms for the transformation to twn-loops. Moreover, KoAT applies a local control-flow refinement technique [19] (using the tool iRankFinder [13]) and preprocesses the program at the start, e.g., by extending the guards of transitions with invariants inferred by Apron [24]. For all SMT problems, KoAT uses Z3 [31]. In the future, we plan to extend the runtime bound inference of KoAT to prs loops and to extend our size bound computations to suitable non-linear non-twn-loops.
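The Jordan normal form computation that KoAT delegates to SymPy can be reproduced directly via `Matrix.jordan_form`. A small self-contained example with an arbitrary illustrative matrix, not taken from the paper's benchmarks:

```python
import sympy as sp

# A defective matrix: single eigenvalue 2 with a 2x2 Jordan block
A = sp.Matrix([[3, 1],
               [-1, 1]])
P, J = A.jordan_form()          # A = P * J * P**(-1)
print(J)                        # Matrix([[2, 1], [0, 2]])
assert P * J * P.inv() == A     # the decomposition is exact over the rationals
```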

*Evaluation.* To evaluate our new technique, we tested KoAT on the 504 benchmarks for *Complexity of* C *Integer Programs* (CINT) from the *Termination Problems Data Base* [35], which is used in the annual *Termination and Complexity Competition (TermComp)* [18]. Here, all variables are interpreted as integers over Z (i.e., without overflows). To distinguish the original version of KoAT [8] from our re-implementation, we refer to them as KoAT1 and KoAT2, respectively. We used the following configurations of KoAT2, which apply different techniques to infer size bounds.


<sup>7</sup> For a homogeneous solvable loop, the closed form of the twn-loop over A that results from its transformation is particularly easy to compute.

The CINT collection contains almost only examples with linear arithmetic, and the existing tools can already solve most of its benchmarks which are not known to be non-terminating.<sup>8</sup> While most complexity analyzers are essentially restricted to programs with linear arithmetic, our new approach also succeeds on programs with *non-linear* arithmetic. Some programs with non-linear arithmetic could already be handled by KoAT due to our integration of the complete technique for the inference of local runtime bounds in [27]. But the approach from the current paper increases KoAT's power substantially for programs (possibly with non-linear arithmetic) where the values of variables computed in "earlier" loops influence the runtime of "later" loops (e.g., the modification of our example from Fig. 2 where $t_4$ decreases $x_4$ instead of $x_1$, see the end of Example 43).


**Table 1.** Evaluation on the Collection CINT<sup>+</sup>

Therefore, we extended CINT by 15 new typical benchmarks including the programs in (1), Fig. 2, and the modification of Fig. 2 discussed above, as well as several benchmarks from the literature (e.g., [3,6]), resulting in the collection CINT<sup>+</sup>. For KoAT2 and KoAT1, we used Clang [11] and llvm2kittel [14] to transform C into integer programs as in Sect. 4. We compare KoAT2 with KoAT1 [8] and the tools CoFloCo [15], MaxCore [2] with CoFloCo in the backend, and Loopus [33]. These tools also rely on variants of size bounds: CoFloCo uses a set of constraints to measure the size of variables w.r.t. their initial and final values, MaxCore's size bound computations build upon [12], and Loopus considers suitable bounding invariants to infer size bounds.

Table 1 gives the results of our evaluation, where as in *TermComp*, we used a timeout of 5 min per example. The first entry in every cell denotes the number of benchmarks from CINT<sup>+</sup> for which the tool inferred the respective bound. The number in brackets only considers the 15 new examples. The runtime bounds computed by the tools are compared asymptotically as functions which depend on the largest initial absolute value $n$ of all program variables. So for example, KoAT2+SIZE proved a linear runtime bound for 231 + 2 = 233 benchmarks, i.e., $rc(\sigma) \in \mathcal{O}(n)$ holds for all initial states $\sigma$ where $|\sigma(v)| \leq n$ for all $v \in \mathcal{V}$.

<sup>8</sup> iRankFinder [13] proves non-termination for 119 programs in CINT. KoAT2orig already infers finite runtimes for 343 of the remaining 504 − 119 = 386 examples in CINT.

Overall, this configuration succeeds on 358 examples, i.e., "< ω" is the number of examples where a finite bound on the runtime complexity could be computed by the tool within the time limit. "AVG<sup>+</sup>(s)" denotes the average runtime of successful runs in seconds, whereas "AVG(s)" is the average runtime of all runs.

Already on the original benchmarks CINT, integrating our novel technique for the inference of size bounds leads to the most powerful approach for runtime complexity analysis. The effect of the new size bound technique becomes even clearer when also considering our new examples which contain non-linear arithmetic and loops whose runtime depends on the results of earlier loops in the program. Thus, the new contributions of the paper are crucial in order to extend automated complexity analysis to larger programs with non-linear arithmetic.

KoAT's source code, a binary, and a Docker image are available at https:// koat.verify.rwth-aachen.de/size. This website also has details on our experiments, a list and description of the new examples, and *web interfaces* to run KoAT's configurations directly online.

#### **References**



# **Recurrence-Driven Summations in Automated Deduction**

Visa Nummelin<sup>1</sup> , Jasmin Blanchette1,2(B) , and Sander R. Dahmen<sup>1</sup>

<sup>1</sup> Vrije Universiteit Amsterdam, Amsterdam, The Netherlands {visa.nummelin,s.r.dahmen,j.c.blanchette}@vu.nl <sup>2</sup> Ludwig-Maximilians-Universität München, Munich, Germany

jasmin.blanchette@lmu.de

**Abstract.** Many problems in mathematics and computer science involve summations. We present a procedure that automatically proves equations involving finite summations, inspired by the theory of holonomic sequences. The procedure is designed to be interleaved with the activities of a higher-order automatic theorem prover. It performs an induction and automatically solves the induction step, leaving the base cases to the theorem prover.

## **1 Introduction**

Finite summations—that is, summations $\sum_{i=m}^{n} t_i$ over finitely many terms $t_i$—are ubiquitous in mathematics and computer science, but they are poorly supported by automatic theorem provers. One reason is that summations are higher-order, whereas most theorem provers are first-order.

In recent years, we have seen the rise of higher-order provers [2,3,16–18]. With these provers, $\sum_{i=m}^{n} t_i$ can be represented as $\mathsf{sum}\ m\ n\ (\lambda i.\ t_i)$; the traditional syntax can be seen as syntactic sugar. But despite the use of heuristics [17, Sect. 4], higher-order provers are ill-equipped to reason inductively. A simple problem such as $\sum_{i=0}^{n} i = n(n+1)/2$ is a formidable challenge for them, even if we include axioms for $+$, $\cdot$, $/$, and $\sum$ together with an induction principle.

In this paper, we introduce a procedure for proving such equations in a higher-order prover. The procedure is triggered by a proof goal of the form $\sum^{k} s + t = u$, possibly with some conditions (Sect. 2). In a refutational prover, the equation would be negated, as $\sum^{k} s + t \neq u$, and would correspond to the negated conjecture, a problem axiom, or some clause derived by the prover.

Our procedure translates facts about summations to linear recurrences. These recurrences have almost the same form as multivariate holonomic sequences [20], which, while not being a prerequisite for reading this paper, strongly inspired our work. Each recurrence is associated with a multivariate sequence—a sequence with one or more indices. In this paper, the word "sequence" generally means "multivariate sequence."

The procedure has three steps.


Propagation and induction apply holonomic-style techniques almost as a black box. Initialization connects them to the overall proof search.

For example, to prove $\sum_{i=0}^{n} i = n(n+1)/2$, the procedure would transform the equation into recurrences and find out that the difference $\sum_{i=0}^{n} i - n(n+1)/2$ remains constant as $n$ increases, thereby establishing the induction step. If that difference is constantly 0, we get $\sum_{i=0}^{n} i = n(n+1)/2$; in general, it suffices to prove a number of base cases, which are left to the prover. This example is very simple, but the procedure scales up to more sophisticated problems (Sect. 6). An implementation is under way in the Zipperposition prover [17].
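The induction step for this introductory example can be replayed symbolically: the difference $d(n) = \sum_{i=0}^{n} i - n(n+1)/2$ satisfies $d(n+1) - d(n) = 0$, so it is constant. A hedged sketch using SymPy (the variable names are ours):

```python
import sympy as sp

n = sp.symbols('n', integer=True, nonnegative=True)

closed_form = n * (n + 1) / 2          # conjectured value of sum_{i=0}^{n} i
# One application of the sum recurrence adds the term n+1, so the
# difference d(n) = sum - closed_form changes by this amount per step:
step = (closed_form.subs(n, n + 1) - closed_form) - (n + 1)
assert sp.simplify(step) == 0          # induction step: d(n+1) - d(n) = 0

assert closed_form.subs(n, 0) == 0     # base case d(0) = 0 (sum_{i=0}^{0} i = 0)
```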

The procedure treats $\sum$ as an interpreted (built-in) symbol. The summation expression evaluates to a value in a commutative group, or a ring if ring multiplication is present. The commutative group or ring gives us $+$, $\cdot$, and $-$. These are also interpreted, as are numerals. Integers, including indices, can multiply group elements. Based on the interpretation, we use the forms $t = u$ and $t - u = 0$ interchangeably.

Compared with Wilf–Zeilberger pairs [19] and other methods (Sect. 7), the main benefit of our procedure is that it goes beyond holonomic sequences and supports both uninterpreted functions and an infinite number of base cases. Our procedure is widely applicable and may help prove not only difficult summations in a restrictive form but also easier summations in a more general form, which is useful in a general-purpose theorem prover. At the heart of our work is the novel combination of techniques from superposition and holonomic sequences, which is visible both in the prover integration (Sect. 2) and in the computation of so-called excess terms (Sect. 4). We refer to our technical report [14] for more details.

#### **2 Inference Rule**

Our procedure can be integrated into a theorem prover, where it takes the form of an inference rule that complements the prover's existing rules. Our technical report discusses an integration with satisfiability modulo theories (SMT) and tableaux; here, we present a rule for superposition:

$$\frac{C\_1 \quad \cdots \quad C\_l \quad C' \lor t[\vec{s}\,] \neq 0}{D \lor C' \lor \bigvee\_{\vec{b} \in B} t[\vec{b}\,] \neq 0} \textsc{ Summation}$$

These side conditions apply:


The intuition behind the rule is that the conclusion should be easier to refute than the rightmost premise. As for the premises $C_1, \ldots, C_l$, they can contain useful information about $\vec{s}$, often about bounds.

## **3 Initialization**

The first step of our procedure is to recognize the structure of recurrences. Variables on which we can perform induction appear as Skolem constants in the negated goal. Further opportunities for induction can be created by generalizing complex terms. Also as part of this step, we must choose which terms represent (multivariate) sequences and which clauses represent their recurrences.

**Theory Detection.** We require the necessary theory of summation to be predefined. Specifically, this refers to the inductive theory of integers, axioms for commutative groups (including multiplication by integers), and the definition of summation from 0 by $\sum_{n=0}^{-1} f_n = 0$ and $\sum_{n=0}^{m+1} f_n = \sum_{n=0}^{m} f_n + f_{m+1}$ even for negative $m \in \mathbb{Z}$. Other finite intervals than $[0, m]$ are expressible as differences.
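These two defining equations determine the sum for negative upper bounds as well, since the step equation can be read backwards. A direct recursive transcription (the helper name `sum_from_0` is ours):

```python
def sum_from_0(f, m):
    """Sum f(0) + ... + f(m), defined by sum_{n=0}^{-1} f = 0 and
    sum_{n=0}^{m+1} f = sum_{n=0}^{m} f + f(m+1), the latter read
    forwards for m >= 0 and backwards for m < -1."""
    if m == -1:
        return 0
    if m >= 0:
        return sum_from_0(f, m - 1) + f(m)
    # m < -1: inverted step equation, so intervals below 0 come out negated
    return sum_from_0(f, m + 1) - f(m + 1)

print(sum_from_0(lambda i: i, 5))    # 0 + 1 + ... + 5 = 15
print(sum_from_0(lambda i: i, -3))   # -(f(-1) + f(-2)) = -(-1 + -2) = 3
```

The second call illustrates how other finite intervals arise as differences: $\sum_{n=0}^{-3} f_n = -(f_{-2} + f_{-1})$.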

Ring multiplication may be absent, so we do not take it as predefined. Instead, we search candidate binary operators from the negated goal. For each candidate, we can try to prove left and right distributivity by syntactically looking for that axiom or by running another instance of the prover. Distributivity is the only necessary property to apply the procedure, but associativity, commutativity, and the unit element can also be used in simplifications.

**Term Generalization.** Term generalization transforms Skolem constants or complex terms into variables and then performs an induction on the variables. We propose a straightforward heuristic: For each nonnumeral subterm s of type Z occurring in the negated goal, generalize s if s stays variable-free even after recursively applying this heuristic on the proper subterms of s itself. For example, in the following variable-free integer terms, the underlined subterms would be generalized: a, 123, f 0 2, 2f (g (−1)) (−3a), f 1 (g (a + 1)), f (g a) 7a.

Let $\vec{s} = (s_1, \ldots, s_d)$ be the subterms chosen for generalization. Then, based on the negated goal $C' \lor t[\vec{s}\,] \neq 0$ (as in the Summation rule), generalization sets up the goal $\forall \vec{n} \in N.\; t[\vec{n}] = 0$ where $N \subseteq \mathbb{Z}^d$ collects the bounds of $\vec{s}$ (often $N = \mathbb{N}^d$). We try to prove this goal up to base cases and other mild conditions.

The generalization makes it possible to use induction to prove that the goal sequence term $t[\vec{n}]$—a function of $\vec{n}$—equals zero on $N$. We try to prove the generalized goal assuming $\neg C'$ and some extra conditions $E$ such as the base cases of the induction. Then, instantiating $\vec{n} := \vec{s}$, we conclude $C' \lor \neg E \lor t[\vec{s}\,] = 0$. This, together with the negated goal $C' \lor t[\vec{s}\,] \neq 0$, implies a conclusion of the form $C' \lor \neg E$ for the Summation rule. Note that $C'$ is not generalized.

The set $N$ embodies knowledge about $\vec{s}$ that we find among existing clauses $C_1, \ldots, C_l$ and the condition $\neg C'$. The free variables of $\neg C'$ are interpreted as constants, and they can also occur in $\vec{s}$. For example, assume that $\vec{s} = f\,s$ and $\vec{n} = n$ and that the generalized goal contains the factorial $n!$. Its recurrence must be in a conditional clause—e.g., $(m+1)! = (m+1)\,m! \lor 0 \not\leq m$. To use this recurrence for $n!$, we need $n \geq 0$, which we can ensure using $N$ if we find a bounding clause $f\,s \geq 0$ or its generalization such as $f\,m \geq 0$ where $m$ is a free variable. The more we know about $\vec{s}$, the more recurrences we can get. At the same time, $N$ must allow induction, so we keep it convex by considering only coordinatewise bounds of $\vec{s}$.

**Form of Sequence Terms.** Sequence terms are terms of the underlying higherorder logic that our procedure can work with. From their structure, we distinguish (pointwise) addition and multiplication, summation, and affine substitution. This gives a first-order grammar to express the sequence terms.

**Definition 1.** *Sequence terms* on a ring $A$ are inductively defined as follows. The logic's terms of type $A$ with distinguished integer variables $\vec{n}$ are sequence terms. If $f_{\vec{n}}$ and $g_{\vec{n}}$ are sequence terms with $d$ variables $\vec{n}$, then so are $f_{\vec{n}} + g_{\vec{n}}$, $f_{\vec{n}} \cdot g_{\vec{n}}$, $\sum_{i=0}^{\vec{c} \bullet \vec{n} + a} f\{n_j \mapsto i\}_{\vec{n}}$, and $\sigma f_{\vec{n}} = f_{\sigma\vec{n}}$ where $\vec{c}$ is a vector, $a$ is an integer, $\vec{c} \bullet \vec{n} = c_1 n_1 + \cdots + c_d n_d$, and $\sigma$ is an affine substitution (meaning $\sigma\,\vec{m} = q\,\vec{m} + \vec{b}$ for a matrix $q$ and a vector $\vec{b}$); $a$, the entries of $\vec{c}$, and the entries of $\sigma$ (meaning the entries of $q$ and $\vec{b}$) must be numerals.

**Remark 2.** In Definition 1 and in the sequel, a commutative group can be used instead of a ring if ring multiplication is absent. In this case, all formulas involving ring multiplication (e.g., f<sup>n</sup> · g<sup>n</sup>) should be ignored.

We view sequence terms as functions $\mathbb{Z}^d \to A$. We then write the sequence terms from the definition compactly as $f + g$, $f \cdot g$, $\sum_j^a f$, and $\sigma f$, and call $a = \vec{c} \bullet \vec{n} + d$ an affine variable sum. Moreover, since $\cdot$, $\sum_j^a$, and $\sigma$ all distribute over $+$, we can write any sequence term as $c_1 f^1 + \cdots + c_k f^k$ where the coefficients $c_j$ are numerals and the sequence terms $f^j$ are distinct and do not contain $+$. Finally, we forbid variable shadowing: $\sum_{n_j=0}^{a}$ binds $n_j$, and while $\sum_j^a \sum_j^b g$ and $\sum_j^{n_j} g$ and other references to $n_j$ outside $\sum_j^a$ are syntactically valid, we avoid such forms by renaming them during encoding and never reintroducing them.

**Choice of Initial Recurrences.** Semantically, the recurrences we look for are multivariate heterogeneous linear finite-fixed-step equations with polynomial coefficients. An archetypical example is

$$\left(n^2 + 1\right)f\_{n+2,m+1} + mf\_{n+1,m} = nmf\_{n,m+1} - 2h\_{n,m} + \left(m - n\right)h\_{n,m+1} + 1\tag{1}$$

Here, the sequences $f$, $h$, $1$ are bivariate, and the sequence indices are all of the form $n + k$ or $m + k$ for numerals $k \in \mathbb{Z}$, amounting to finite fixed steps.

The general form is $0 = P_1 g^1 + \cdots + P_k g^k = \vec{P} \bullet \vec{g}$ where $\vec{g} = g^1, \ldots, g^k$ is a tuple of sequence terms and $\vec{P}$ is a tuple of operator polynomials as defined below. If $k = 1$, we have a homogeneous recurrence of $g^1$; otherwise, it is heterogeneous.

**Definition 3.** *Operator polynomials* are a Z-algebra with composition as product (meaning closed under addition, composition, and integer multiplication) spanned by the multiplier and shift operators:


With $d$ index variables, the operator polynomials look like ordinary polynomials $\mathbb{Z}[M_1, \ldots, M_d, S_1, \ldots, S_d]$, but the composition product is noncommutative since $S_i M_i = M_i S_i + S_i$ for all $i = 1, \ldots, d$ (a derivation of which is given in the next section directly above equation (2)). As an example of expressing recurrences in terms of operator polynomials, consider the archetypical recurrence (1). Taking $n$ as the first and $m$ as the second variable, the recurrence reads

$$\left( (M\_1^2 + 1)S\_1^2 S\_2 + M\_2 S\_1 - M\_1 M\_2 S\_2 \right) \cdot f + \left( 2 - (M\_2 - M\_1) S\_2 \right) \cdot h + (-1) \cdot 1 = 0$$

**Remark 4.** The expression $\vec{P} \bullet \vec{g}$ identifying a recurrence is itself a sequence term. It suffices to observe that if $f$ is a sequence term, then so are the substitution $S_j f$ and the product $M_j f = (\vec{n} \mapsto n_j) \cdot f$ with the projection sequence term $\vec{n} \mapsto n_j$.

As sketched in Sect. 1, we must select some of the problem axioms as initial recurrences for the procedure. This is accomplished as follows. Let there be an edge between two axioms of the form C ∨s = t (where C may be empty) if they both contain a top-level occurrence of the same sequence g, i.e., an occurrence of g that is not nested inside an uninterpreted function symbol. The axioms then form a graph. We take as initial recurrences the connected component of the generalized goal.

By a sequence $g$, we mean the $f\,\vec{a}$ part of a term of the form $f\,\vec{a}\,\vec{n}$ where $f$ is an uninterpreted function symbol, $\vec{a}$ is a tuple of variable-free terms, and $\vec{n}$ is a nonempty tuple of integer variables or affine (i.e., linear term + constant term) combinations of them. The tuples $\vec{a}$ and $\vec{n}$ may in general be interleaved.

In other contexts, an analogous step is known as lemma filtering or premise selection [4, Sect. 2]. Clutter from irrelevant facts is less of an issue in the context of our procedure because it can use only linear recurrences. Beyond this, our simple heuristic does nothing to avoid clutter.

What should we do about conditions such as $C$ in $C \lor f\,\vec{a}\,\vec{n} = t$? We could forbid them and work only with unit equations such as $f\,\vec{a}\,\vec{n} = t$. We could collect them and put them in the $D$ component of the Summation rule's conclusion. Or we could attempt to prove them when the initial recurrences are selected. In our ongoing implementation, we chose the first option, but what the best option is remains an open question.

## **4 Propagation**

*Holonomic sequences* can be defined by homogeneous recurrences with polynomial coefficients and finitely many base cases. They are closed under the four operations that build sequence terms ($+$, $\cdot$, $\sum_j^a$, $\sigma$), which especially makes their equality decidable [20]. The closure is realized by four procedures to derive recurrences of a sequence term from the recurrences of its immediate subterms, which we call *propagation*. We can propagate independently of the base cases and hence work on nonholonomic sequence terms [6]. Although we expect the holonomic subcase to be decidable in our setting, in general decidable equality is lost. Additionally, unlike in the holonomic setting, we allow heterogeneous recurrences. We build this into our noncommutative Gröbner basis setup, which is used in the propagation procedures.

**Gröbner Bases of Recurrence Operators.** A (generalized) Gröbner basis is a certain well-behaved generating set of a left ideal of (possibly noncommutative) polynomials. Equivalently, we will view it as a system of polynomial equations that is complete for rewriting. Given a polynomial equation $P = 0$, for every monomial $M$ we get a rewrite rule as follows. Decompose $MP$ as $MP = L + R$ where $L$ is the leading monomial of $MP$ w.r.t. a fixed monomial ordering, times its coefficient. Then $L = -R$ gives rise to a rewrite rule $L \to -R$. A system of equations is complete for rewriting if every one of its consequences can be proved via rewriting by these rules.

**Example 5.** The system $ab^2 = a + b$, $a^2b = a + 1$ does not prove its consequence $a^2 = b$ by rewriting. (We can see that $a^2 = b$ is a consequence by multiplying the first equation by $a$ and the second equation by $b$ and then subtracting the two equations.) In the other direction, the system's Gröbner basis $a^2 = b$, $b^2 = a + 1$ does give rewrite proofs $ab^2 \xrightarrow{b^2 = a + 1} a^2 + a \xrightarrow{a^2 = b} b + a$ and $a^2b \xrightarrow{a^2 = b} b^2 \xrightarrow{b^2 = a + 1} a + 1$.
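Example 5 involves only two indeterminates that may be treated as commuting, so the claimed basis can be reproduced with SymPy's commutative `groebner` (the general operator-polynomial case of this section needs noncommutative machinery instead):

```python
import sympy as sp

a, b = sp.symbols('a b')
# The system ab^2 = a + b, a^2 b = a + 1 from Example 5
system = [a*b**2 - (a + b), a**2*b - (a + 1)]
G = sp.groebner(system, a, b, order='grevlex')
print(list(G))  # the reduced basis contains a**2 - b and b**2 - a - 1

# With the basis, the original equations follow by rewriting:
# reduction modulo G leaves remainder 0
_, remainder = sp.reduced(a*b**2 - (a + b), list(G), a, b, order='grevlex')
assert remainder == 0
```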

A theory of Gröbner bases exists for various polynomial algebras [10]. In our setting, a sufficient requirement is that all indeterminates $X, Y$ commute up to lower-order terms: $XY - YX \in \mathbb{Z}X + \mathbb{Z}Y + \mathbb{Z}$. The operator polynomials of Definition 3 fall into this category with the natural choice of taking all multiplier and shift operators as indeterminates. Indeed, for any sequence term $f$, we have the noncommutation relations

$$\left(S\_j M\_j f\right)\_{\vec{n}} = \left(S\_j \left(\vec{n} \mapsto n\_j f\_{\vec{n}}\right)\right)\_{\vec{n}} = \left(n\_j + 1\right)\left(S\_j f\right)\_{\vec{n}} = \left(\left(M\_j S\_j + S\_j\right)f\right)\_{\vec{n}}$$

and all other pairs of multipliers and shifts commute exactly. That is:

$$S\_i M\_j = M\_j S\_i + \delta\_{i,j} S\_i \qquad S\_i S\_j = S\_j S\_i \qquad \qquad M\_i M\_j = M\_j M\_i \tag{2}$$

for all $i$ and $j$, where $\delta_{i,j}$ equals 1 if $i = j$ and 0 otherwise. When we consider a formal polynomial algebra (necessary to perform Gröbner basis computations), we will usually mean polynomials with integer coefficients and indeterminates $M_1, M_2, \ldots, S_1, S_2, \ldots$ satisfying (2). Exceptionally, when we use propagation to substitution, we will consider compositions of shifts formally as further individual indeterminates, as explained above Procedure 12. Apart from this exceptional setting, we fix a choice of monomials as follows.

**Definition 6.** In our setting, a *monomial* is a polynomial of the form $M_1^{x_1} \cdots M_d^{x_d} S_1^{y_1} \cdots S_d^{y_d}$ where the exponents $x_j, y_j \in \mathbb{N}$ are numerals.

Due to the (non)commutation relations (2), polynomials can be written as sums of monomials times their integer coefficients. This makes working with these noncommutative polynomials similar to working with commutative ones. A major difference is that monomials are not closed under product, as illustrated by $S_1 \cdot M_1 = M_1 S_1 + S_1$. This complicates the definition of monomial order below, which in turn defines how to interpret a polynomial equation as a rewrite rule.
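The relation $S_1 \cdot M_1 = M_1 S_1 + S_1$ can be checked by letting the operators act on concrete sequences. A small sketch in which sequences are modeled as Python functions on the integers (our own encoding, not part of the paper):

```python
def M(f):
    """Multiplier operator: (M f)(n) = n * f(n)."""
    return lambda n: n * f(n)

def S(f):
    """Shift operator: (S f)(n) = f(n + 1)."""
    return lambda n: f(n + 1)

f = lambda n: n**2 + 3  # an arbitrary test sequence

for n in range(-5, 6):
    # (S M f)(n) = (n + 1) * f(n + 1) = (M S f)(n) + (S f)(n)
    assert S(M(f))(n) == M(S(f))(n) + S(f)(n)
```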

**Definition 7.** A *monomial order* is a well-founded total order $\prec$ on monomials such that for all monomials $A, B, C$, if $A \prec B$, then the leading monomial of $CA$ is $\prec$-smaller than the leading monomial of $CB$; here, the *leading monomial* of a nonzero polynomial $P$ means the $\prec$-largest monomial occurring in $P$.

Buchberger's algorithm to compute Gröbner bases (also in a noncommutative context) is similar to saturation-based theorem proving. It repeatedly derives from polynomial equations $P = 0$ and $R = 0$ new equations $AP - BR = 0$ where coefficient–monomial products $A, B$ make the leading monomials of $AP$ and $BR$ cancel. It suffices to take $A, B$ with smallest total degree and coprime coefficients. $A$ and $B$ play a similar role to the most general unifier in superposition. Since $S_j$ is semantically bijective, we can and always do cancel it, replacing $S_j R = 0$ by $R = 0$. This modified completion into a Gröbner basis always terminates. The standard termination proof reduces to applying noetherianity of commutative polynomials over $\mathbb{Z}$ or Dickson's lemma [10].

A single operator polynomial $P_1$ perfectly encodes a linear homogeneous recurrence $0 = P_1 g$ of a sequence term $g$. However, we allow any heterogeneous recurrence of the form $0 = \vec{P} \bullet \vec{f} = P_1 f^1 + \cdots + P_k f^k$ where $\vec{f} = f^1, \ldots, f^k$ is an arbitrary tuple of different sequence terms. We can encode this by a single operator polynomial for the duration of one Gröbner basis computation as follows. Let $\vec{f}$ enumerate exactly once all the sequence terms needed to express the current recurrences with the help of operator polynomials. Let $\vec{f}$ depend on $d$ variables. For each $f^j$, we consider a shift $F_j := S_{d+j}$ w.r.t. a so far unused variable. Then the operator polynomial $\vec{P} \bullet \vec{F}$ encodes $0 = \vec{P} \bullet \vec{f}$.

This encoding does not respect the semantics of operator polynomials; to recover it, we must apply the substitution $\{\vec{F} \mapsto \vec{f}\}$. However, products such as $F_1 F_2$ remain uninterpretable even with ring-valued sequences because the operator product—function composition—is different from multiplication of the $f^j$'s. Hence, we will simply discard uninterpretable polynomials after the Gröbner basis computation. Moreover, from now on, we will freely write $f^j$ for $F_j$.

**Definition 8.** Let $X_1, \ldots, X_n$ be an enumeration of all multiplier and shift indeterminates. An $(X_1, \ldots, X_k)$*-elimination order* is a monomial order such that $X_j \succ X_{k+1}^{a_{k+1}} \cdots X_n^{a_n}$ for all indices $j \leq k$ and all exponents $a_{k+1}, \ldots, a_n \in \mathbb{N}$.

Our default choice for the order is to compare total degrees in $X_1, \ldots, X_k$ and break ties using the total degree reverse lexicographic order [7, Chapter 2 §2].

**Procedure 9.** *Eliminating* indeterminates $X_1, \ldots, X_k$ from a finite system of equations $E$ means computing a Gröbner basis $G$ of $E$ w.r.t. an $(X_1, \ldots, X_k)$-elimination order and then discarding all polynomials from $G$ that contain any of $X_1, \ldots, X_k$ or that are not linear in the indeterminates encoding sequence terms. (As mentioned above, during the Gröbner basis computation, whenever we derive a polynomial $S_j R$, we replace it by $R$.)

While in principle any Gröbner basis would suffice for elimination, our default choice is to compute the reduced Gröbner basis (i.e., the fully simplified one). The nonlinear polynomials can be discarded as soon as they are derived during the Gröbner basis computation instead of only at the end. Recurrence equations produced by elimination are logical consequences of the input equations, as we explain in our technical report.
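Procedure 9 has a familiar commutative analogue: a Gröbner basis w.r.t. an elimination order contains, among its elements, generators of all consequences in the non-eliminated indeterminates. A quick illustration with SymPy's commutative `groebner` and a lexicographic order (our own toy system):

```python
import sympy as sp

x, y = sp.symbols('x y')
# Eliminate x from {x*y = 1, x + y = 3}: the lex order with x > y is an
# (x)-elimination order, so x-free consequences appear in the basis
G = sp.groebner([x*y - 1, x + y - 3], x, y, order='lex')
x_free = [g for g in G if x not in g.free_symbols]
print(x_free)  # [y**2 - 3*y + 1], the eliminated consequence
```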

Despite the formally equivalent roles of all sequence terms $f^i$ in the recurrence $0 = \vec{P} \bullet \vec{f}$, we associate with every recurrence a sequence term $f^j$. It is often convenient to write such a recurrence of $f^j$ as $P_j f^j + e = 0$ where the *excess terms* $e = \vec{P} \bullet \vec{f} - P_j f^j$ contain all sequence terms $f^i$ except $f^j$. The choice of $f^j$ among $\vec{f}$ will be determined by the definition of excess terms (Definition 18). However, this choice remains irrelevant for the individual propagation steps, described below. We adapt these steps from the four closure properties of holonomic sequences by carrying excess terms along.

**Propagation to Addition.** Let us start with addition of sequence terms.

**Procedure 10.** Let $f$ and $g$ be sequence terms, and let $h$ be the formal name of their addition $f + g$. The associated recurrences $F$ of $f$ and $G$ of $g$ are propagated to those of $h$ by eliminating $f$ and $g$ from $F \cup G \cup \{h = f + g\}$. (By Procedure 9, this involves computing a Gröbner basis for these equations and then discarding the equations containing $f$ or $g$ as well as the corresponding nonlinear terms.)

Actually, the same propagation technique works if $f + g$ is replaced by any expression in the general recurrence format $\vec{P} \bullet \vec{l}$ (a dot product of operator polynomials $\vec{P}$ and sequence terms $\vec{l}$). The key is that the defining equation $h = \vec{P} \bullet \vec{l}$ is again a linear recurrence. Such propagations could also be done by iterating more primitive propagations.

**Example 11.** Consider the goal $\sum_{j=0}^{n} a_j = g_n + a_0$ given $g_0 = 0$ and $g_{n+2} = g_n + a_{n+1} + a_{n+2}$ for all $n \in \mathbb{N}$. The defining recurrence of $g$ can be written using the operator polynomials as $S_1^2 g = g + S_1 a + S_1^2 a$. The defining recurrence of the sum $f_n := \sum_{j=0}^{n} a_j$ is $S_1 f = f + S_1 a$. We must prove that $h_n := g_n + a_0 - f_n$

is 0. To achieve this, we propagate recurrences to $h$ using the elimination procedure described above (Procedure 9) and the total-degree-based $(f, g)$-elimination order with $f \prec g$. Leading monomials are shown in bold:

$$\begin{array}{rll}
& 0 = \mathbf{S_1^2 g} - g - S_1 a - S_1^2 a & \text{recurrence of } g\\
-S_1^2\,\cdot & 0 = g + a_0 - f - h\\
\hline
& 0 = -g - S_1 a - S_1^2 a - a_0 + \mathbf{S_1^2 f} + S_1^2 h\\
-S_1\,\cdot & 0 = \mathbf{S_1 f} - f - S_1 a & \text{recurrence of } f\\
\hline
& 0 = -g - S_1 a - a_0 + S_1^2 h + \mathbf{S_1 f}\\
& 0 = \mathbf{S_1 f} - f - S_1 a\\
\hline
& 0 = -g - a_0 + S_1^2 h + f\\
& 0 = \mathbf{g} + a_0 - f - h\\
\hline
& 0 = \mathbf{S_1^2 h} - h
\end{array}$$

In this example, $h_{n+2} - h_n = 0$ is the only recurrence that does not contain $f$ and $g$, so we discard the rest of the Gröbner basis calculation. Since $h_{n+2} - h_n = 0$ contains only the sequence $h$, we can use it to prove the induction step (of size 2) of a proof of $\forall n.\; h_n = 0$. We are then left with the two base cases $h_0 = 0$ and $h_1 = 0$, which the Summation inference would include in its conclusion without auxiliary symbols ($f$ and $h$) as $\sum_{j=0}^{0} a_j \neq g_0 + a_0 \vee \sum_{j=0}^{1} a_j \neq g_1 + a_0$.
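The outcome of this elimination can be sanity-checked numerically. The sketch below (our own illustration, not part of the calculus) builds sequences satisfying the given recurrences and confirms the derived recurrence $h_{n+2} - h_n = 0$, even for arbitrary values of the unconstrained $g_1$:

```python
import random

random.seed(1)
N = 40
a = [random.randint(-9, 9) for _ in range(N)]

# g is constrained only by g_0 = 0 and g_{n+2} = g_n + a_{n+1} + a_{n+2};
# g_1 is left arbitrary to stress that the derived recurrence for h holds
# independently of the base cases.
g = [0, 7]
for n in range(N - 2):
    g.append(g[n] + a[n + 1] + a[n + 2])

f = [sum(a[: n + 1]) for n in range(N)]      # f_n = sum_{j=0}^{n} a_j
h = [g[n] + a[0] - f[n] for n in range(N)]   # h_n = g_n + a_0 - f_n

# the recurrence obtained by eliminating f and g: h_{n+2} - h_n = 0
assert all(h[n + 2] - h[n] == 0 for n in range(N - 2))
```

The first assertion holds regardless of the chosen $g_1$; only the base cases $h_0 = h_1 = 0$ depend on it.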

**Propagation to Substitution.** Consider a numeral matrix $a = [a_{kj}]_{kj} \in \mathbb{Z}^{d \times D}$ and a vector $b \in \mathbb{Z}^{d}$. They characterize an affine substitution $\sigma = \{\vec{n} \mapsto a\vec{n} + b\} = \{n_k \mapsto \sum_{j=1}^{D} a_{kj} n_j + b_k \mid 1 \le k \le d\}$. As an operator on sequences, $\sigma$ performs an affine change of variables: $(\sigma f)_{\vec{n}} = f_{a\vec{n}+b}$.

Clearly, any recurrence $Pf = 0$ of $f$ implies $\sigma P f = 0$. Moreover, if $\sigma P = \mathbb{P}\sigma$, then $\mathbb{P}\sigma f = 0$ gives a recurrence of $\sigma f$. Finding such a $\mathbb{P}$ for a general $P$ can be reduced to finding an operator polynomial $\mathbb{P}^X$ satisfying $\sigma X = \mathbb{P}^X \sigma$ for every indeterminate $X$. This amounts to pushing all indeterminates $X$ leftwards. For multipliers, we have $\sigma (M_1, \ldots, M_d) = (a(M_1, \ldots, M_D) + b)\,\sigma$. In contrast, shifts are easily pushed only rightwards, namely $S_j \sigma = \sigma S_1^{a_{1j}} \cdots S_d^{a_{dj}}$. Consequently, the recurrences of $f$ must first be expressed in terms of the composite shifts $\mathbb{S}_j := S_1^{a_{1j}} \cdots S_d^{a_{dj}}$. As operators, these satisfy the (non)commutation relations

$$\mathbb{S}\_j M\_k = (M\_k + a\_{kj}) \mathbb{S}\_j \qquad \mathbb{S}\_i \mathbb{S}\_j = \mathbb{S}\_j \mathbb{S}\_i \qquad \mathbb{S}\_i S\_j = S\_j \mathbb{S}\_i \tag{3}$$

This makes the $\mathbb{S}_j$'s suitable as indeterminates in Gröbner basis computations.
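The relations (3) can be checked directly on concrete sequences. The following sketch (our own illustration; the matrix $a$ and the test sequence are arbitrary choices) verifies $\mathbb{S}_j M_k = (M_k + a_{kj})\,\mathbb{S}_j$ for $d = D = 2$:

```python
# Numeric sanity check of S_j M_k = (M_k + a_kj) S_j on a sample sequence.
a = [[2, -1],
     [3, 4]]                        # a[k][j] holds a_{(k+1)(j+1)} (0-based here)

def shift(f, d1, d2):               # S_1^d1 S_2^d2 acting on a sequence f
    return lambda n1, n2: f(n1 + d1, n2 + d2)

def composite(j, f):                # composite shift S_j := S_1^{a_1j} S_2^{a_2j}
    return shift(f, a[0][j], a[1][j])

f = lambda n1, n2: n1 ** 2 - 3 * n1 * n2 + n2   # arbitrary test sequence

for j in (0, 1):
    for k in (0, 1):
        Mk_f = lambda n1, n2: (n1 if k == 0 else n2) * f(n1, n2)
        lhs = composite(j, Mk_f)                 # S_j M_k f
        rhs = lambda n1, n2: ((n1 if k == 0 else n2) + a[k][j]) \
            * composite(j, f)(n1, n2)            # (M_k + a_kj) S_j f
        assert all(lhs(x, y) == rhs(x, y)
                   for x in range(-3, 4) for y in range(-3, 4))
```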

Accordingly, for propagation to substitution, we enlarge our formal polynomial algebra to also contain the indeterminates $\mathbb{S}_1, \mathbb{S}_2, \ldots$ satisfying the relations (3), while also keeping (2). We note that, as operators, the indeterminates further satisfy (essentially by definition) the relations

$$\mathbb{S}\_j \prod\_{k:\ a\_{kj} < 0} S\_k^{|a\_{kj}|} = \prod\_{k:\ a\_{kj} > 0} S\_k^{a\_{kj}} \quad \text{for } j \in \{1, \ldots, D\} \tag{4}$$

We add these new relations to the system of recurrence equations of which we compute the Gröbner basis. Finally, we extend our notion of *monomial* from Definition 6 to mean any polynomial of the form $M_1^{x_1} \cdots M_d^{x_d}\, S_1^{y_1} \cdots S_d^{y_d}\, \mathbb{S}_1^{z_1} \cdots \mathbb{S}_D^{z_D}$ where the exponents $x_j, y_j, z_j \in \mathbb{N}$ are numerals.

**Procedure 12.** Recurrences of a sequence term $f$ are propagated to its affine substitution $(\sigma f)_{\vec{n}} = f_{a\vec{n}+b}$ as follows. Eliminate each $S_k$ from the system of polynomial equations containing both the recurrences of $f$ and the relations (4). Every resulting recurrence $P(\vec{M}, \vec{\mathbb{S}})\,f + e = 0$ implies a recurrence $P(a\vec{M}+b, \vec{S})\,\sigma f + \sigma e = 0$ of $\sigma f$, where we have collected the indeterminates into vectors and where $e$ are excess terms that do not contain $f$.

**Example 13.** Consider $\sum_{n_1=0}^{n_2} \binom{n_1}{n_2-n_1} = F_{n_2+1}$ where the Fibonacci numbers are defined by $F_1 = F_2 = 1$ and $(S_1^2 - S_1 - 1)\,F = 0$. For the binomial coefficient $\binom{\cdot}{\cdot}_{n_1,n_2} = \binom{n_1}{n_2} = \frac{n_1!}{n_2!\,(n_1-n_2)!}$, the recurrence from Pascal's triangle reads $(S_1S_2 - S_2 - 1)\,\binom{\cdot}{\cdot} = 0$ and extends $\binom{n_1}{n_2}$ from $0 \le n_2 \le n_1$ to all $n_2 \in \mathbb{Z}$ and $n_1 \in \mathbb{N}$. Moreover, we have $\binom{n_1}{n_2} = \frac{n_1}{n_2}\binom{n_1-1}{n_2-1}$, i.e., $((M_2 + 1)\,S_1S_2 - M_1 - 1)\,\binom{\cdot}{\cdot} = 0$. We want to propagate these recurrences to the substitution $\sigma = \{n_1 \mapsto n_1,\; n_2 \mapsto n_2 - n_1\}$. We have $S_1\sigma = \sigma S_1S_2^{-1}$ and $S_2\sigma = \sigma S_2$. So we introduce for $S_1S_2^{-1}$ and $S_2$ the indeterminates $\mathbb{S}_1$ and $\mathbb{S}_2$, whose characterizing recurrences (4) read

$$(\mathbb{S}_1 S_2 - S_1)\,\binom{\cdot}{\cdot} = 0 \quad \text{(i)} \qquad\qquad (\mathbb{S}_2 - S_2)\,\binom{\cdot}{\cdot} = 0 \quad \text{(ii)}$$

Next, we eliminate $S_1, S_2$ in favor of $\mathbb{S}_1, \mathbb{S}_2$. Here, (ii) immediately rewrites every $S_2$ to $\mathbb{S}_2$, and then (i) becomes $(-S_1 + \mathbb{S}_1\mathbb{S}_2)\,\binom{\cdot}{\cdot} = 0$, which rewrites every $S_1$. The remaining steps to complete a Gröbner basis w.r.t. some total-degree order are irrelevant for what we want to illustrate. We factor the result for readability:

$$\begin{aligned}
(-S_1 + \mathbb{S}_1\mathbb{S}_2)\,\tbinom{\cdot}{\cdot} &= 0 &\qquad (-S_2 + \mathbb{S}_2)\,\tbinom{\cdot}{\cdot} &= 0\\
(\mathbb{S}_1\mathbb{S}_2^2 - \mathbb{S}_2 - 1)\,\tbinom{\cdot}{\cdot} &= 0 &\qquad ((M_2 + 1)\,\mathbb{S}_2 - M_1 + M_2)\,\tbinom{\cdot}{\cdot} &= 0\\
((M_1 + 1)\,\mathbb{S}_1\mathbb{S}_2 - (M_1 - M_2 + 2)\,\mathbb{S}_1 - M_1 - 1)\,\tbinom{\cdot}{\cdot} &= 0\\
((M_1 - M_2 + 1)(M_1 - M_2 + 2)\,\mathbb{S}_1 - M_1M_2 - M_2)\,\tbinom{\cdot}{\cdot} &= 0
\end{aligned}$$

Now $\sigma$ maps the lowest four recurrences to recurrences of $f_{n_1,n_2} = \binom{n_1}{n_2-n_1}$ below:

$$\begin{aligned}
(S_1S_2^2 - S_2 - 1)\,f &= 0\\
((M_2 - M_1 + 1)\,S_2 - 2M_1 + M_2)\,f &= 0\\
((M_1 + 1)\,S_1S_2 - (2M_1 - M_2 + 2)\,S_1 - M_1 - 1)\,f &= 0\\
((2M_1 - M_2 + 1)(2M_1 - M_2 + 2)\,S_1 - (M_1 + 1)(M_2 - M_1))\,f &= 0
\end{aligned}$$

The next step is to propagate to the summation. We postpone it to Example 16.
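Before that, the propagated recurrences of $f_{n_1,n_2} = \binom{n_1}{n_2-n_1}$ can be verified numerically. In the sketch below (our own illustration), the binomial coefficient is extended by 0 outside $0 \le k \le n$, matching the extension in Example 13:

```python
import math

def f(n1, n2):
    # f_{n1,n2} = C(n1, n2 - n1), with C(n, k) = 0 outside 0 <= k <= n
    k = n2 - n1
    return math.comb(n1, k) if 0 <= k <= n1 else 0

for n1 in range(0, 12):
    for n2 in range(-4, 16):
        # (S1 S2^2 - S2 - 1) f = 0
        assert f(n1 + 1, n2 + 2) - f(n1, n2 + 1) - f(n1, n2) == 0
        # ((M2 - M1 + 1) S2 - 2 M1 + M2) f = 0
        assert (n2 - n1 + 1) * f(n1, n2 + 1) - (2 * n1 - n2) * f(n1, n2) == 0
        # ((M1 + 1) S1 S2 - (2 M1 - M2 + 2) S1 - M1 - 1) f = 0
        assert ((n1 + 1) * f(n1 + 1, n2 + 1)
                - (2 * n1 - n2 + 2) * f(n1 + 1, n2)
                - (n1 + 1) * f(n1, n2)) == 0
        # ((2 M1 - M2 + 1)(2 M1 - M2 + 2) S1 - (M1 + 1)(M2 - M1)) f = 0
        assert ((2 * n1 - n2 + 1) * (2 * n1 - n2 + 2) * f(n1 + 1, n2)
                - (n1 + 1) * (n2 - n1) * f(n1, n2)) == 0
```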

**Propagation to Product.** Let · be ring multiplication or more generally a group bihomomorphism. If the sequence terms f and g depend on disjoint sets of variables, recurrences of fg = f · g are essentially a union of recurrences of f and g. Namely, let P f + e = 0 be any recurrence of f where P is an operator polynomial on the variables of f and the excess terms e do not contain f. Then P (fg) + eg = 0 because g is effectively a constant to P, and similarly for recurrences of g. With the help of this special case, propagation to product can be reduced to propagation to substitution, as explained below.
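The variable-disjoint special case is easy to illustrate: if $f$ satisfies $Pf = 0$ in its own variable and $g$ depends on a disjoint variable, then $P$ annihilates the product as well. A sketch of ours with $P = S^2 - S - 1$:

```python
# f depends only on n and satisfies f_{n+2} = f_{n+1} + f_n (P = S^2 - S - 1);
# g depends only on a disjoint variable m, so g is effectively a constant to P
# and P annihilates the product f_n * g_m as well.
f = [1, 1]
for _ in range(18):
    f.append(f[-1] + f[-2])
g = [m ** 2 + 3 for m in range(15)]

assert all(f[n + 2] * g[m] - f[n + 1] * g[m] - f[n] * g[m] == 0
           for n in range(18) for m in range(15))
```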

**Procedure 14.** Let $f$ and $g$ be sequence terms parameterized by the variables $\vec{n} = (n_j)_{j=1}^{d}$. Let $\vec{m} = (n_{j+d})_{j=1}^{d}$ be a tuple of fresh variables. The recurrences of $f$ and $g$ are propagated to their pointwise product $fg$ in two steps. First, the recurrences of the variable-disjoint product $f_{\vec{n}}\,g_{\vec{m}}$ are the union of the recurrences of $f_{\vec{n}}$ multiplied on the right by $g_{\vec{m}}$ and of those of $g_{\vec{m}}$ multiplied on the left by $f_{\vec{n}}$. Then the recurrences of $f_{\vec{n}}\,g_{\vec{n}} = \{\vec{m} \mapsto \vec{n}\}\,(f_{\vec{n}}\,g_{\vec{m}})$ are found by propagating to substitution using Procedure 12.

**Propagation to Summation.** We finally consider the summations $\sum_{n_1=0}^{n_2} f_{\vec{n}}$. We can assume that the variables are numbered so that the sum acts on the first two. Similarly to above, we consider the consequence $\sum_{n_1=0}^{n_2} P f_{\vec{n}} + \sum_{n_1=0}^{n_2} e_{\vec{n}} = 0$ of a recurrence $Pf + e = 0$ of the sequence term $f$, where $P$ is an operator polynomial and $e$ are excess terms. We want to find an operator polynomial $\mathbb{P}$ such that $\sum_{n_1=0}^{n_2} P$ becomes $\mathbb{P} \sum_{n_1=0}^{n_2}$ up to excess terms. Like for substitutions, finding such a $\mathbb{P}$ for $P$ can be reduced to finding an operator polynomial $\mathbb{P}^X$ satisfying $\sum_{n_1=0}^{n_2} X = \mathbb{P}^X \sum_{n_1=0}^{n_2}$ up to excess terms for every indeterminate $X$. The result will be a recurrence $\mathbb{P} \sum_{n_1=0}^{n_2} f_{\vec{n}} + e = 0$ of $\sum_{n_1=0}^{n_2} f_{\vec{n}}$.


$$\begin{aligned} \sum\_{n\_1=0}^{n\_2} S\_1 g\_{\vec{n}} &= \sum\_{n\_1=0}^{n\_2} g\_{\vec{n}} + \left\{ n\_1 \mapsto n\_2 + 1 \right\} g\_{\vec{n}} - \left\{ n\_1 \mapsto 0 \right\} g\_{\vec{n}} \\ \sum\_{n\_1=0}^{n\_2} S\_2 g\_{\vec{n}} &= S\_2 \sum\_{n\_1=0}^{n\_2} g\_{\vec{n}} - S\_2 \left\{ n\_1 \mapsto n\_2 \right\} g\_{\vec{n}} \end{aligned}$$

**Example 16.** Let us continue the proof of $\sum_{n_1=0}^{n_2} \binom{n_1}{n_2-n_1} = F_{n_2+1}$ from Example 13. There we found for the summand $f_{n_1,n_2} = \binom{n_1}{n_2-n_1}$ a recurrence $(S_1S_2^2 - S_2 - 1)\,f = 0$. It is actually the only recurrence after eliminating $M_1$ as a first step of propagation to summation. Next, we set $S_1$ to 1 using a telescoping identity:

$$\sum\_{n\_1=0}^{n\_2} S\_1 S\_2^2 f = \sum\_{n\_1=0}^{n\_2} S\_2^2 f + \{n\_1 \mapsto n\_2\} \, S\_1 S\_2^2 f - \{n\_1 \mapsto 0\} \, S\_2^2 f$$

Then we push the remaining shifts S<sup>2</sup> leftwards:

$$\begin{array}{l} \sum\_{n\_1=0}^{n\_2} \left( S\_2^2 - S\_2 \right) f = S\_2 \sum\_{n\_1=0}^{n\_2} \left( S\_2 - 1 \right) f - S\_2 \left\{ n\_1 \mapsto n\_2 \right\} \left( S\_2 - 1 \right) f \\ = \left( S\_2^2 - S\_2 \right) \sum\_{n\_1=0}^{n\_2} f - S\_2^2 \left\{ n\_1 \mapsto n\_2 \right\} f - S\_2 \left\{ n\_1 \mapsto n\_2 \right\} \left( S\_2 - 1 \right) f \end{array}$$

Hence, in total we have

$$\begin{aligned}
&\sum_{n_1=0}^{n_2} \left(S_1S_2^2 - S_2 - 1\right) f - \left(S_2^2 - S_2 - 1\right) \sum_{n_1=0}^{n_2} f\\
&\quad= \{n_1 \mapsto n_2\}\, S_1S_2^2 f - \{n_1 \mapsto 0\}\, S_2^2 f - S_2^2 \{n_1 \mapsto n_2\}\, f - S_2 \{n_1 \mapsto n_2\} (S_2 - 1) f\\
&\quad= \binom{n_2+1}{1} - \binom{0}{n_2+2} - \binom{n_2+2}{0} - \binom{n_2+1}{1} + \binom{n_2+1}{0}\\
&\quad= (n_2 + 1) - 0 - 1 - (n_2 + 1) + 1 = 0
\end{aligned}$$

Since $(S_1S_2^2 - S_2 - 1)\,f = 0$, we have $(S_2^2 - S_2 - 1) \sum_{n_1=0}^{n_2} f = 0$. Now this is the same recurrence that $F_{n_2+1}$ satisfies, and hence the final propagation to difference gives $(S_2^2 - S_2 - 1)\left(\sum_{n_1=0}^{n_2} f - F_{n_2+1}\right) = 0$. This proves an induction step of size 2 and leaves two base cases that can be discharged by a theorem prover.
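The conclusion of Examples 13 and 16 can be spot-checked numerically. The sketch below (ours) confirms that the sum satisfies the Fibonacci recurrence derived by propagation and agrees with $F_{n_2+1}$:

```python
import math

def comb0(n, k):
    # binomial coefficient extended by 0 outside 0 <= k <= n
    return math.comb(n, k) if 0 <= k <= n else 0

fib = {1: 1, 2: 1}
for i in range(3, 28):
    fib[i] = fib[i - 1] + fib[i - 2]

s = [sum(comb0(n1, n2 - n1) for n1 in range(n2 + 1)) for n2 in range(25)]

# the sum satisfies the propagated recurrence S^2 - S - 1 in n2 ...
assert all(s[n2 + 2] - s[n2 + 1] - s[n2] == 0 for n2 in range(23))
# ... and agrees with F_{n2+1}
assert all(s[n2] == fib[n2 + 1] for n2 in range(25))
```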

**Iteration on Excess Terms.** Let $g$ be the term from the negated goal to be proved to be 0. After propagating along the structure of $g$, we end up with recurrences of the form $Pg = e$ where $P$ is an operator polynomial and the excess terms $e$ do not contain $g$. In the holonomic case, $e$ will be syntactically 0. We have also observed that $e$ is often 0 in the nonholonomic case as well. But if $e$ is not syntactically 0, then $Pg = e$ cannot immediately be used for a proof by induction. A solution is to iterate a full series of propagations with $e$ in place of $g$ to find $P_2e = e_2$ and conclude $P_2Pg = P_2e = e_2$, then repeat as long as necessary. This process will always terminate, although it might fail to find recurrences.

We will impose an order on the sequence terms to accomplish three things. First, we get a proper definition of which terms in a recurrence are excess. Second, well-foundedness of the order will guarantee termination of the iteration of full propagations to excess terms. Third, the iterations can be interleaved with basic normalizations such as $\{n_1 \mapsto 2n_1\}\,M_1\,\{n_1 \mapsto 3n_1 + 1\}\,f \;\to\; 2M_1\,\{n_1 \mapsto 6n_1 + 1\}\,f$.
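Such normalizations can be checked against the definition $(\sigma f)_{\vec{n}} = f_{a\vec{n}+b}$: composing the outer $\{n_1 \mapsto 2n_1\}$ with the inner $\{n_1 \mapsto 3n_1 + 1\}$ yields the index map $n_1 \mapsto 6n_1 + 1$, and the outer substitution turns $M_1$ into $2M_1$. A small sketch of ours:

```python
# Affine one-variable substitutions represented as pairs (a, b) for n -> a*n + b,
# applied to sequences via (sigma f)(n) = f(a*n + b).
def apply_subst(ab, f):
    a, b = ab
    return lambda n: f(a * n + b)

def M1(f):
    return lambda n: n * f(n)

f = lambda n: n ** 3 - 5 * n + 2               # arbitrary test sequence

lhs = apply_subst((2, 0), M1(apply_subst((3, 1), f)))  # {n -> 2n} M1 {n -> 3n+1} f
rhs = lambda n: 2 * n * f(6 * n + 1)            # 2 M1 {n -> 6n+1} f
assert all(lhs(n) == rhs(n) for n in range(-10, 11))
```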

**Definition 17.** The *spine* of a sequence term $f$ without addition, denoted by $\operatorname{spine} f$, is the sequence term obtained intuitively by erasing operator polynomials from $f$. Precisely, this means fully reducing $f$ by the rewrite rules $at \to t$, $M_j t \to t$, $\{\vec{n} \mapsto b\vec{n} + c\}\,t \to \{\vec{n} \mapsto b\vec{n}\}\,t$, and $\sum_{j}^{\vec{c}\cdot\vec{n}+a} t \to \sum_{j}^{\vec{c}\cdot\vec{n}} t$ where $a$, $\vec{c}$, and the matrix $b$ are all numeric.

Shift indeterminates mix with other substitutions, which explains the last two rules. For example, $\operatorname{spine} \{n_1 \mapsto 2n_1\}\,M_1\,\{n_1 \mapsto 3n_1 + 1\}\,f = \{n_1 \mapsto 2n_1\}\{n_1 \mapsto 3n_1\}\,f$. If we have a sequence term $c_1g_1 + \cdots + c_kg_k$ with addition, it contains multiple spines, one for each $g_j$. The significance of spines is that when we derive a more complex consequence from a recurrence (during elimination, by applying an operator polynomial to it), its spines do not become more complex.

We can easily describe how each propagation step changes the spines of the involved sequence terms. Propagation to the addition $f + g$ produces only spines $e_f$ and $e_g$ in the resulting recurrences, where $e_f$ denotes a spine of a term from a recurrence of $f$, and analogously for $e_g$. Moreover, propagation to the substitution $\sigma f$ produces $\sigma e_f$; propagation to the product $fg$ produces $(\operatorname{spine} f)\,e_g$ and $e_f\,(\operatorname{spine} g)$; and propagation to the summation $\sum_{n_1=0}^{n_2} f$ produces $\{n_1 \mapsto 0\}(\operatorname{spine} f)$, $\{n_1 \mapsto n_2\}(\operatorname{spine} f)$, and $\sum_{n_1=0}^{n_2} e_f$, where $e_f$ and $e_g$ are as above.

We want propagations to preserve the invariant that excess terms are small. Given how spines change under propagation, a term order on spines offers a way to define smallness. We choose an order that also orients simplifications.

**Definition 18.** Fix a Knuth–Bendix order with argument coefficients [12] with exactly three weights $W_{\sum_{n=0}^{a}} > W_{(\cdot)} > 3W_{\sigma} > 0$ and all argument coefficients set to 2. Moreover, projection sequence terms corresponding to the $M_j$'s (Remark 4) must have equal weights, and substitutions with fewer bindings must have lower precedence. The *excess* (*partial*) *order* on addition-free sequence terms is obtained by comparing the spines of terms using this fixed order. *Excess terms* of a recurrence are all its nonmaximal sequence terms w.r.t. the excess order.

The weights for the excess order are arranged to be compatible with normalization, which pushes substitutions to the leaf nodes of the term tree and pulls summations towards the root. The resulting normal form is simply the typical way of writing terms without explicit substitutions. It is also the normal form of the rewrite system consisting of the applicable associativity and/or commutativity rules of · as well as the following rules:

$$\begin{aligned} s \cdot \sum\_{j}^{a} t &\to \sum\_{j}^{a} st & \quad & 1t, t1, \{\} \ t \to t \\ \left(\sum\_{j}^{a} s\right) \cdot t &\to \sum\_{j}^{a} st & \quad & \left(\sigma \cup \{n\_{j} \mapsto a\}\right) u \to \sigma u \\ \sigma \sum\_{j}^{a} t &\to \sum\_{j}^{\sigma a} \sigma t & \quad & \left(\sigma \cup \{n\_{j} \mapsto a\}\right) M\_{j} \to a \\ \sum\_{j}^{c} t &\to \left\{n\_{j} \mapsto 0\right\} t + \dots + \left\{n\_{j} \mapsto c\right\} t & \quad & \sigma\left(ts\right) \to \sigma t \cdot \sigma s \\ \sum\_{j}^{-c} t &\to -\left\{n\_{j} \mapsto -1\right\} t - \dots - \left\{n\_{j} \mapsto 1 - c\right\} t & \quad & \sigma \sigma' t \to \left(\sigma \circ \sigma'\right) t \end{aligned}$$

where $s$, $t$, $u$ are sequence terms, $u$ does not contain the variable $n_j$, $a$ is an affine variable sum, $M_j = \vec{n} \mapsto n_j$ is a projection sequence term, $\sigma, \sigma'$ are affine substitutions, and the numeral $c$ is nonnegative.

These rules produce additions, which must be interpreted as follows. For any rule above of the general form $t_0 \to c_1t_1 + \cdots + c_kt_k$, the actual rewrite on the level of entire recurrences is $f[t_0] + R = 0 \;\to\; c_1f[t_1] + \cdots + c_kf[t_k] + R = 0$ where the $c_j$ are numerals, the sequence terms $f[t_j]$ are equal except for the distinguished subterm $t_j$, and $R$ is the sum of the remaining terms in the recurrence.

To conclude termination, it suffices to prove that $t_0$ dominates each of $t_1, \ldots, t_k$ individually. The proof is in our technical report. It makes apparent our choices of weights and argument coefficients for the transfinite Knuth–Bendix order.

#### **5 Induction**

After propagation, we consider all recurrences P g = 0 of the goal sequence term g to be proved to be 0. In exceptionally fortunate cases, the operator polynomial P is ±1 and we are unconditionally done because, for any group, the multiplication-by-±1 map is invertible. This happens when the objective is to prove a recurrence that this method derives as a substep anyway. Otherwise, we apply induction and leave as conditions the base cases as well as invertibility of the multiplication maps associated with the leading monomials' coefficients.

A common case is that variables range over natural numbers and we have a final recurrence with leading shift $S_1^{b_1} \cdots S_d^{b_d}$ w.r.t. any monomial order. Then the values $\bigcup_{j=1}^{d} \{\vec{n} \in \mathbb{N}^d \mid n_j < b_j\}$ suffice for the base cases; this union of stacked hyperplanes is infinite unless $d \le 1$, but it corresponds to only $\sum_{j=1}^{d} b_j$ one-variable substitutions $\{n_j \mapsto a\}$ for $1 \le j \le d$ and $0 \le a < b_j$. If our eager generalization produced variables that do not participate in their induction (i.e., their $b_j$'s are 0), they are replaced by their original values.

If there is more than one applicable final recurrence, we take the intersection of their base value sets w.r.t. the same monomial order. To see that this works, consider any point outside the intersection: it is a nonbase point w.r.t. some final recurrence, and hence the induction step can be taken via that recurrence.

To represent the intersection as substitutions, we distribute it over the hyperplane-stack unions. This results in a union of hyperline stacks of the form $N(J, b) := \{\vec{n} \in \mathbb{N}^d \mid n_j < b_j \text{ for all } j \in J\}$ where $J \subseteq \{1, \ldots, d\}$ and $b$ vary. One such stack is represented by $\prod_{j \in J} b_j$ substitutions $\{n_j \mapsto a_j \mid j \in J\}$ where the $a_j$'s are chosen arbitrarily such that $0 \le a_j < b_j$. Unfortunately, distribution duplicates some base cases. To compensate, if $I \subseteq J$ and $b \ge c$ pointwise, then $N(I, b) \supseteq N(J, c)$, so that $N(J, c)$ can be removed in favor of $N(I, b)$.
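A small sketch (our own data structures, not the paper's implementation) of the subsumption check between hyperline stacks and of the enumeration of their base-case substitutions:

```python
from itertools import product

def subsumes(stack1, stack2):
    # N(I, b) contains N(J, c) if I is a subset of J and b_j >= c_j for j in I
    I, b = stack1
    J, c = stack2
    return I <= J and all(b[j] >= c[j] for j in I)

def substitutions(stack):
    # the prod_{j in J} b_j base-case substitutions {n_j -> a_j | j in J}
    J, b = stack
    Js = sorted(J)
    return [dict(zip(Js, vals)) for vals in product(*(range(b[j]) for j in Js))]

stacks = [(frozenset({0}), (2, 0, 0)),        # 0-based variable indices, d = 3
          (frozenset({0, 1}), (1, 3, 0)),
          (frozenset({0, 1, 2}), (1, 2, 5))]

# drop stacks subsumed by another one to avoid duplicated base cases
kept = [s for s in stacks if not any(subsumes(t, s) for t in stacks if t is not s)]
assert kept == [(frozenset({0}), (2, 0, 0))]
assert len(substitutions(stacks[1])) == 1 * 3
```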

If a variable $n \in \mathbb{Z}$ is unbounded, we perform two inductions on the rays: on $0 \le n$ and on $n < b$ if $b$ base cases are needed. The backward induction on $n < b$ can be transformed into an induction over $\mathbb{N}$ by the change of variables $n \mapsto b - 1 - n$.

#### **6 Examples**

Our procedure can prove the induction step of holonomic sequence formulas such as Example 13, the binomial formula, and Vandermonde's identity:

$$\left(a+b\right)^{h} = \sum_{n=0}^{h} \binom{h}{n} a^{n} b^{h-n} \qquad\qquad \binom{a+b}{h} = \sum_{n=0}^{h} \binom{a}{n} \binom{b}{h-n}$$

Heterogeneous recurrences, which go beyond the holonomic fragment, enable proving elementary formulas about general sequences, such as Example 11 and the following:

$$\sum\_{n=0}^{h} f\_{h-n} = \sum\_{n=0}^{h} f\_n \qquad\qquad \sum\_{h=0}^{k} \sum\_{n=0}^{h} f\_{h,n} = \sum\_{n=0}^{k} \sum\_{h=n}^{k} f\_{h,n}$$
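Both formulas are finite reindexings, so they are easy to spot-check numerically (a sketch of ours, with random sequences standing in for the general $f$):

```python
import random

random.seed(0)
N = 8
f1 = [random.randint(-9, 9) for _ in range(N + 1)]
# sum_{n=0}^h f_{h-n} = sum_{n=0}^h f_n
assert all(sum(f1[h - n] for n in range(h + 1)) == sum(f1[: h + 1])
           for h in range(N + 1))

f2 = [[random.randint(-9, 9) for _ in range(N + 1)] for _ in range(N + 1)]
# sum_{h=0}^k sum_{n=0}^h f_{h,n} = sum_{n=0}^k sum_{h=n}^k f_{h,n}
for k in range(N + 1):
    lhs = sum(f2[h][n] for h in range(k + 1) for n in range(h + 1))
    rhs = sum(f2[h][n] for n in range(k + 1) for h in range(n, k + 1))
    assert lhs == rhs
```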

If we ignore the holonomic base case requirements, we can for example prove the induction steps of Abel's binomial formula and of some Stirling number identities:

$$\left(a+b\right)^{h} = \sum_{n=0}^{h} \binom{h}{n} a \left(a-n\right)^{n-1} \left(b+n\right)^{h-n} \qquad h^{k}/h! = \sum_{n=0}^{h} \left\{{k \atop n}\right\} / \left(h-n\right)!$$

Here, the Stirling numbers of the second kind $\left\{{k \atop n}\right\}$ are one of many special nonholonomic sequences that frequently arise in combinatorics. They count the number of partitions of a $k$-element set into $n$ subsets.

As further demonstration, we apply our procedure to the last equation. For convenience, we will use the name of a variable also to denote its multiplier operator. Moreover, we will use the uppercase version of the name of a variable to denote its shift operator. The defining recurrence of the Stirling numbers then reads $(KN - (n+1)N - 1)\left\{{k \atop n}\right\} = 0$ for $k, n \ge 0$, where $K$ and $N$ denote the shift operators for the variables $k$ and $n$, the first $n$ denotes the multiplier for the variable $n$, and the second $n$ is the variable itself. This recurrence is complemented by the initial values $\left\{{0 \atop 0}\right\} = 1$ and $\left\{{n \atop 0}\right\} = \left\{{0 \atop n}\right\} = 0$ if $n \neq 0$.
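The defining recurrence and initial values determine the Stirling numbers completely, and the goal identity (multiplied through by $h!$ to stay within the integers) can be checked numerically; the sketch below is our own illustration:

```python
import math

K = 10
stirling = [[0] * (K + 2) for _ in range(K + 2)]   # stirling[k][n] = {k brace n}
stirling[0][0] = 1
for k in range(K + 1):
    for n in range(K + 1):
        # the defining recurrence (KN - (n+1)N - 1){k brace n} = 0, i.e.
        # {k+1 brace n+1} = (n+1){k brace n+1} + {k brace n}
        stirling[k + 1][n + 1] = (n + 1) * stirling[k][n + 1] + stirling[k][n]

# initial values: {n brace 0} = {0 brace n} = 0 for n != 0
assert all(stirling[n][0] == 0 and stirling[0][n] == 0 for n in range(1, K + 1))

# the goal identity times h!:  h^k = sum_n {k brace n} * h!/(h-n)!
for h in range(K + 1):
    for k in range(K + 1):
        rhs = sum(stirling[k][n] * math.factorial(h) // math.factorial(h - n)
                  for n in range(h + 1))
        assert h ** k == rhs
```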

Starting from the right, the inverse $m!^{-1}$ of the factorial satisfies the recurrence $(mM + M - 1)\,m!^{-1} = 0$, which holds for all $m \in \mathbb{Z}$ by extension. This must be found in the initialization step because there is no propagation to division. Propagation to the substitution $\{m \mapsto h - n\}$ then gives the following recurrences, factored for clarity:

$$\left(\left(h-n+1\right)H-1\right)\left(h-n\right)!^{-1}=0\qquad\left(N-h+n\right)\left(h-n\right)!^{-1}=0$$

To propagate to product, we consider $\left\{{k_1 \atop n_1}\right\}$ and $(h_2 - n_2)!^{-1}$ with variables renamed apart. We must propagate to the substitution $\{n_j \mapsto n,\; h_j \mapsto h,\; k_j \mapsto k \mid j \in \{1, 2\}\}$ the recurrences of $\left\{{k_1 \atop n_1}\right\}(h_2 - n_2)!^{-1}$ given by the following five operator polynomials:

$$\begin{gathered}
K_1N_1 - (n_1 + 1)\,N_1 - 1 \quad \text{and} \quad H_1 - 1 \qquad \text{from } \left\{{k_1 \atop n_1}\right\}\\
\text{and} \quad (h_2 - n_2 + 1)\,H_2 - 1, \quad N_2 - h_2 + n_2, \quad \text{and} \quad K_2 - 1 \qquad \text{from } (h_2 - n_2)!^{-1}
\end{gathered}$$

We added here the trivial recurrences given by $H_1 - 1$ and $K_2 - 1$, implied by the independence from $h_1$ and $k_2$. Among the defining recurrences (4) of the compound shift indeterminates $N$, $H$, $K$, the recurrence $H_1H_2 - H$ simplifies to $H_2 - H$ by $H_1 - 1$, and $K_1K_2 - K$ to $K_1 - K$ by $K_2 - 1$. (In other words, the factorwise renaming of already disjoint variables $h$ and $k$ amounts to renaming in the entire product.) The third compound shift recurrence, $N_1N_2 - N$, simplifies to $(h_2 - n_2)\,N_1 - N$ by $N_2 - h_2 + n_2$. The part of the Gröbner basis with only compound shifts is then straightforwardly finished with the result $\{KN - (n_1 + 1)N - h_2 + n_2,\; (h_2 - n_2 + 1)H - 1\}$. Hence this propagation step yields

$$\left(KN - (n+1)N - h + n\right)\frac{\left\{{k \atop n}\right\}}{(h-n)!} = 0 \qquad \left((h-n+1)H - 1\right)\frac{\left\{{k \atop n}\right\}}{(h-n)!} = 0$$

To sum over $n$, we first eliminate $n$ from the previous two recurrences and conclude $(H(K - h) + (N - 1)(KH - (h+1)H + 1))\left(\left\{{k \atop n}\right\}/(h-n)!\right) = 0$. The sum has natural boundaries, meaning that the summand vanishes outside them. This guarantees that there will be no excess terms, which we also tediously discover when pulling out the indeterminates:

$$\begin{aligned}
\sum_{n=0}^{h} (N-1)\overbrace{(KH - (h+1)H + 1)}^{P}\,\frac{\left\{{k \atop n}\right\}}{(h-n)!}
&= P\left(\frac{\left\{{k \atop h+1}\right\}}{(-1)!} - \frac{\left\{{k \atop 0}\right\}}{h!}\right)\\
&= -\frac{\left\{{k+1 \atop 0}\right\}}{(h+1)!} + \left\{{k \atop 0}\right\}\left((h+1)H - 1\right)h!^{-1} = 0\\
\sum_{n=0}^{h} H(K-h)\,\frac{\left\{{k \atop n}\right\}}{(h-n)!} - H(K-h)\sum_{n=0}^{h}\frac{\left\{{k \atop n}\right\}}{(h-n)!}
&= -(K-h)\,\frac{\left\{{k \atop h+1}\right\}}{(-1)!} = 0
\end{aligned}$$

Here, by the recurrence of the inverse of the factorial, we get $(-1)!^{-1} = 0$. So we obtain a recurrence $H(K-h) \sum_{n=0}^{h} \left\{{k \atop n}\right\}/(h-n)! = 0$ for the left-hand side of our goal. For the right-hand side, we unproblematically obtain $(K-h)\,h^k/h! = 0$. Hence $H(K-h)$ zeros out the difference $h^k/h! - \sum_{n=0}^{h} \left\{{k \atop n}\right\}/(h-n)!$. The largest shift $HK$ of the operator $H(K-h)$ determines that the two sets of base cases $h = 0$ and $k = 0$ are sufficient for induction.

#### **7 Related Work**

Holonomic sequences [20] are closely related to our work. Unlike our approach, which allows infinitely many base cases as long as they are finitely representable (Sect. 5), they are limited to a finite number of base cases. Relaxing this limitation yields approximately the homogeneous version of our propagation procedure (i.e., without excess terms), whose theory Chyzak, Kauers, and Salvy laid out [6]. Heterogeneity amounts to module Gröbner bases [5,8,13]. Its integration into propagations makes elementary identities about general sequences automatically provable, which may be of interest for general-purpose theorem provers.

In practice, hypergeometric sums are common holonomic sequences that have much faster algorithms available. Gosper's indefinite summation [9] can be applied to compute Wilf–Zeilberger pairs [19], which offer compact proof certificates for definite sum identities. These fast methods admit generalizations to the full holonomic setting. See Koutschan's thesis [11] for an overview.

Finding a closed form instead of only checking it for a summation is a different but related task. A common approach is to perform a recurrence solving phase after recurrence computation, as in the Mathematica package Sigma [1,15].

## **8 Conclusion**

We presented a procedure for proving equations involving summations within an automatic higher-order theorem prover. The procedure is inspired by holonomic sequences and partly generalizes them. It expresses the problem as recurrences and derives new recurrences from existing ones. In case of success, it shows the induction step of a proof by induction, leaving the base cases to the prover.

As future work, we want to continue implementing the procedure in Zipperposition [17]. We hope that the subsequent practical experiments help us to settle how side conditions of initial recurrences ought to be handled.

**Acknowledgment.** We thank Pascal Fontaine for his ideas on how to integrate our procedure into SMT and tableaux. We also thank Anne Baanen, Pascal Fontaine, Mark Summerfield, and the anonymous reviewers for suggesting improvements.

Nummelin and Blanchette's research has received funding from the Netherlands Organization for Scientific Research (NWO) under the Vidi program (project No. 016.Vidi.189.037, Lean Forward). Dahmen's research has received funding from the NWO under the Vidi program (project No. 639.032.613, New Diophantine Directions).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Formal Verification of Bit-Vector Invertibility Conditions in Coq**

Burak Ekici<sup>1</sup> , Arjun Viswanathan<sup>2</sup> , Yoni Zohar3(B) , Cesare Tinelli<sup>2</sup> , and Clark Barrett<sup>4</sup>

> <sup>1</sup> Muğla Sıtkı Koçman University, Muğla, Turkey
> <sup>2</sup> The University of Iowa, Iowa City, USA
> <sup>3</sup> Bar-Ilan University, Ramat Gan, Israel
> yoni.zohar@biu.ac.il
> <sup>4</sup> Stanford University, Stanford, USA

**Abstract.** We prove the correctness of invertibility conditions for the theory of fixed-width bit-vectors—used to solve quantified bit-vector formulas in the Satisfiability Modulo Theories (SMT) solver cvc5— in the Coq proof assistant. Previous work proved many of these in a completely automatic fashion for arbitrary bit-width; however, some were only proved for bit-widths up to 65, even though they are being used to solve formulas over larger bit-widths. In this paper we describe the process of proving a representative subset of these invertibility conditions in Coq. In particular, we describe the BVList library for bit-vectors in Coq, our extensions to it, and proofs of the invertibility conditions.

#### **1 Introduction**

Many applications in hardware and software verification rely on bit-precise reasoning, which can be modeled using the SMT-LIB 2 theory of fixed-width bit-vectors [3]. While Satisfiability Modulo Theories (SMT) solvers are able to reason about bit-vectors of fixed width, they currently require all widths to be expressed concretely (by a numeral) in their input formulas. For this reason, they cannot be used to prove properties of bit-vector operators that are parametric in the bit-width, such as the associativity of bit-vector concatenation. Proof assistants such as Coq [25], which have direct support for dependent types, are better suited for such tasks.

Bit-vector formulas that are parametric in the bit-width arise in the verification of parametric Boolean functions and circuits (see, e.g., [13]). In our case, we are mainly interested in parametric lemmas that are relevant to internal techniques of SMT solvers for the theory of fixed-width bit-vectors. These include, for example, rewrite rules, refinement schemes, and preprocessing passes. Such techniques are developed a priori for every possible bit-width. Meta-reasoning about the correctness of such solvers then requires bit-width independent reasoning.

In this paper, we focus on parametric lemmas that originate from a quantifier-instantiation technique implemented in the SMT solver cvc5 [2]. This technique is based on *invertibility conditions* [15]. For a trivial case of an invertibility condition, consider the equation x + s = t, where x, s and t are variables of the same bit-vector sort. In the terminology of Niemetz et al. [15], this equation is "invertible for x." A general inverse, or "solution," is given by the term t − s. Since there is always such an inverse, the invertibility condition for x + s = t is simply the universally true formula ⊤. The formula stating this fact, referred to here as an *invertibility equivalence*, is ⊤ ⇔ ∃x. x + s = t, which is valid in the theory of fixed-width bit-vectors, for any bit-width. In contrast, the equation x · s = t is not always invertible for x. A necessary and sufficient condition for invertibility in this case was found in [15] to be (−s | s) & t = t. So, the invertibility equivalence (−s | s) & t = t ⇔ ∃x. x · s = t is valid for any bit-width. Notice that the invertibility condition does not contain x. Hence, invertibility conditions can be seen as a technique for quantifier elimination.
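Although such equivalences must ultimately be proved for every bit-width, they can at least be sanity-checked exhaustively at small concrete widths. The following Python sketch (ours, not part of the paper's development; bit-vectors of width w are modeled as integers modulo 2**w) does this for the multiplication condition above:

```python
# Exhaustive check, for small concrete widths only, of the equivalence
#   (-s | s) & t = t  <=>  exists x. x * s = t
def mul_equivalence_holds(width):
    size = 1 << width
    mask = size - 1
    for s in range(size):
        for t in range(size):
            ic = (((-s | s) & mask) & t) == t           # invertibility condition
            solvable = any((x * s) & mask == t for x in range(size))
            if ic != solvable:
                return False
    return True

# the equivalence holds at widths 1 through 4, checked exhaustively
assert all(mul_equivalence_holds(w) for w in range(1, 5))
```

Such a check is, of course, no substitute for a bit-width independent proof; it only rules out typos in a candidate condition.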

In [15], a total of 160 invertibility conditions were provided. However, they were verified only for bit-widths up to 65, due to the reasoning limitations of SMT solvers mentioned earlier. Recent work [16,17] addresses this challenge by translating the invertibility equivalences to the combined theory of non-linear integer arithmetic and uninterpreted functions. This approach was partially successful, but failed to verify over a quarter of the equivalences.

We verify invertibility equivalences proposed in [15] by proving them interactively in Coq. From a representative subset of the invertibility equivalences, we prove 19 equivalences, 12 of which were not proven in [16,17]. For the remaining 7, which were already proved there, our Coq proofs provide additional confidence. Our results offer evidence that proof assistants can support automated theorem provers in meta-verification tasks. To facilitate the verification of invertibility equivalences, we use a rich Coq library for bit-vectors, which is a part of the SMTCoq project [10]. This Coq library models the theory of fixed-width bit-vectors adopted by the SMT-LIB 2 standard [3]. For this work, we extended the library with the arithmetic right-shift operation and the unsigned weak less-than


**Table 1.** The signatures Σ1 and Σ0 with SMT-LIB 2 syntax. Σ1 consists of the operators in the entire table. Σ0 consists of the operators in the upper part.

and greater-than predicates. To summarize, the contributions of this paper are as follows: (i) a description of the SMTCoq bit-vector library; (ii) extensions to the signature and proofs of the library; and (iii) formal proofs in Coq of invertibility equivalences. These contributions, while important in their own right, have the potential to go beyond the verification of invertibility equivalences. For (i) and (ii), we envision that the library, as well as its extension, will be useful for the formalization of other bit-precise reasoning mechanisms, especially related to SMT, such as rewriting rules, lemma schemas, interactive verification, and more. For (iii), invertibility conditions are primarily used for quantifier instantiation (see, e.g., [15]). We hope that the increased confidence in their correctness will encourage their usage in other contexts and in more solvers. Further, the formal proofs can serve as guiding examples for other proofs related to bit-precise reasoning.

The remainder of this paper is organized as follows. After technical preliminaries in Sect. 2, we formalize invertibility conditions in Sect. 3 and discuss previous attempts at verifying them. In Sect. 4, we describe the Coq library and our extensions to it. In Sect. 5, we discuss our Coq proofs. We conclude in Sect. 6 with directions for future work. A preliminary version of this work was presented as an extended abstract in the proceedings of the PxTP 2019 workshop [11]. The current version is more detailed and complete. In particular, the one Coq proof that was missing in [11] is now completed.

#### **2 Preliminaries**

#### **2.1 Theory of Bit-Vectors**

We assume the usual terminology of many-sorted first-order logic with equality (see, e.g., [12]). We denote equality by =, and use x ≠ y as an abbreviation for ¬(x = y). The signature Σ*BV* of the SMT-LIB 2 theory of fixed-width bit-vectors defines a unique sort for each positive integer n, which we denote by σ[*n*]. For every positive integer n and bit-vector of width n, the signature contains a constant symbol of sort σ[*n*], representing that bit-vector, which we denote as a binary string of length n. The function and predicate symbols of Σ*BV* are as described in the SMT-LIB 2 standard. Formulas of Σ*BV* are built from variables, bit-vector constants, and the function and predicate symbols of Σ*BV*, along with the usual logical connectives and quantifiers. We write ψ[x₁,...,xₙ] to represent a formula whose free variables are from the set {x₁,...,xₙ}.

The semantics of Σ*BV* -formulas is given by interpretations where the domain of σ[*n*] is the set of bit-vectors of width n, and the function and predicate symbols are interpreted as specified by the SMT-LIB 2 standard. A Σ*BV* -formula is *valid* in the theory of fixed-width bit-vectors if it is satisfied by every such interpretation.

Table 1 contains the operators from Σ*BV* for which invertibility conditions were defined in [15]. We define Σ1 to be the signature that contains only these symbols. Σ0 is the sub-signature obtained by only taking the operators from the upper part of the table. We use the (overloaded) constant 0 to represent the bit-vectors composed of all 0-bits.

#### **2.2 Coq**

The Coq proof assistant is based on the calculus of inductive constructions (CIC) [20]. It implements properties as types, and proofs as terms, reducing proof-checking to type-checking. Coq has a rich type system that allows highly expressive propositions to be stated and proved in this manner. One feature of particular interest is that of *dependent types* — types that can depend on values — through which one can express correctness properties within types. We refer to non-dependent types as *simple types*.

The Coq module system — in addition to allowing for principled separations of large developments — allows the abstraction of complex types along with operations over them as *modules*. A *module signature* or *module type* acts as an interface to a module, specifying the type it encapsulates along with the signatures of the associated operators. A *functor* is a module-to-module function.

#### **3 Invertibility Conditions and Their Verification**

In [15], a technique to solve quantified bit-vector formulas is presented, which is based on *invertibility conditions*.

**Definition 1.** *An invertibility condition for a variable* x *in a* Σ*BV**-literal* ℓ[x, s, t] *is a formula* IC[s, t] *such that* ∀s.∀t. IC[s, t] ⇔ ∃x. ℓ[x, s, t] *is valid in the theory of fixed-width bit-vectors.*

*Example 1.* The invertibility condition for x in x & s = t is t & s = t.
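Definition 1 can be tested by brute force at a fixed small width. Below is a hypothetical Python checker (names are ours; bit-vectors are integers modulo 2**width), instantiated with the condition from Example 1:

```python
# Brute-force test of ic(s, t) <=> exists x. lit(x, s, t) at one concrete width.
def equivalence_holds(width, lit, ic):
    size = 1 << width
    return all(
        ic(s, t) == any(lit(x, s, t) for x in range(size))
        for s in range(size)
        for t in range(size)
    )

# Example 1: the invertibility condition for x & s = t is t & s = t
assert all(
    equivalence_holds(w, lambda x, s, t: x & s == t, lambda s, t: t & s == t)
    for w in range(1, 5)
)
```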

In [15], invertibility conditions are defined for a representative set of literals over the bit-vector operators of Σ1, having a single occurrence of x. The soundness of the technique proposed in that work relies on the correctness of the invertibility conditions. Every literal ℓ[x, s, t] and its corresponding invertibility condition IC[s, t] induce an *invertibility equivalence*.

**Definition 2.** *The* invertibility equivalence *associated with the literal* ℓ[x, s, t] *and its invertibility condition* IC[s, t] *is the formula*

$$IC[s, t] \Leftrightarrow \exists x. \; \ell[x, s, t] \tag{1}$$

The correctness of invertibility equivalences should be verified for all possible sorts for the variables x, s, t for which the condition is well sorted. Concretely, one needs to prove the validity of the following formula:

$$\forall n: \mathbb{N}. \; n > 0 \Rightarrow \forall s: \sigma\_{[n]}. \forall t: \sigma\_{[n]}. \; IC[s, t] \Leftrightarrow \exists x: \sigma\_{[n]}. \; \ell[x, s, t] \tag{2}$$

This was done in [15], but only for concrete values of n from 1 to 65, using solvers for the theory of fixed-width bit-vectors. In contrast, Eq. (2) cannot even be expressed in this theory. To overcome this limitation, later work suggested a translation from bit-vector formulas over *parametric* bit-widths to the theory of non-linear integer arithmetic with uninterpreted functions [16,17]. Thanks to this translation, the authors were able to verify the correctness of 110 out of 160 invertibility equivalences. For the remaining 50 equivalences, it seems appropriate to use a proof assistant, which allows for more intervention by the user, who can provide crucial intermediate steps. Even for the 110 invertibility equivalences that were proved, proving them in a proof assistant achieves a higher level of confidence than automatic verification by an SMT solver, due to the smaller trusted code base of proof assistants compared to that of automated theorem provers such as SMT solvers.

**Fig. 1.** The level of confidence achieved by the different approaches.

Figure 1 depicts the level of confidence achieved by the various approaches to verifying invertibility equivalences. The smallest circle, labeled *auto-65*, represents the approach taken by [15], where invertibility equivalences were verified automatically up to 65 bits. While a step in the right direction, this approach is insufficient, because invertibility conditions are used for arbitrary bit-widths. The next circle, labeled *auto-ind*, depicts the approach of [17], which addresses the restrictions of auto-65 by providing bit-width independent proofs of the invertibility equivalences. However, both auto-65 and auto-ind provide proofs by SMT solvers, which are less trusted than interactive theorem provers (ITPs). The largest circle (*Coq*) corresponds to the work presented in the current paper which, while addressing the limitations of auto-65 via bit-width independent proofs, also provides stronger verification guarantees by proving the equivalences in an interactive theorem prover. Moreover, with this approach, we were able to prove equivalences that could not be fully verified (for arbitrary bit-widths) by either auto-65 or auto-ind.

#### **4 The** BVList **Library**

In this section, we describe the Coq library we use and the extensions we developed with the goal of formalizing and proving invertibility equivalences. Various formalizations of bit-vectors in Coq exist. The internal Coq library of bit-vectors [9] is one, but it has only definitions and no lemmas. The Bedrock Bit Vectors Library [6] treats bit-vectors as words (machine integers). The SSRBit Library [5] represents bit-vectors as finite bit-sets in Coq and extracts them to OCaml machine integers. Our library is better suited to the SMT-LIB 2 bit-vectors, and includes operators that are not fully covered by any of the previously mentioned libraries. More recently, Shi et al. [22] developed a library called CoqQFBV that represents a bit-vector type as a sequence of Booleans, defines operators over it, and proves the correctness of these operations with respect to a (machine integer) semantics. They use this library to define a bit-blasting algorithm in Coq, which is extracted into an OCaml program to perform certified bit-blasting. Since CoqQFBV covers the entire SMT-LIB 2 bit-vector signature, it would be a good alternative to ours for formalizing and proving invertibility conditions. Our library offers a rich set of lemmas over bit-vector operations that makes it suitable for proofs of invertibility conditions and other bit-vector properties. Bit-vectors have also been formalized in other proof assistants. Within the Isabelle/HOL framework, one can use the library developed by Beeren et al. [4] to align with SMT-LIB 2 bit-vector operations. Furthermore, Harrison [1] presents a formalization of finite-dimensional Euclidean space within HOL Light, accompanied by an implementation of vectors.

#### **4.1** BVList **Without Extensions**

BVList was developed for SMTCoq [10], a Coq plugin that enables Coq to dispatch proofs to external proof-producing solvers. While the library was only briefly mentioned in [10], here we provide more details.

The library adopts the little-endian notation for bit-vectors, following the internal representation of bit-vectors in SMT solvers such as cvc5, and corresponding to lists in Coq. This makes arithmetic operations easier to perform since the least significant bit of a bit-vector is the head of the Boolean list that represents it.
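For illustration (a Python analogue, not the library's code), ripple-carry addition over this encoding consumes both lists head first, so the carry propagates in list order, starting from the least significant bit:

```python
# Ripple-carry addition over little-endian Boolean lists (LSB first).
def bv_add(a, b):
    assert len(a) == len(b)
    out, carry = [], False
    for x, y in zip(a, b):
        out.append(x ^ y ^ carry)
        carry = (x and y) or (carry and (x or y))
    return out  # the final carry is discarded: addition is modulo 2**n

def to_nat(bv):  # little-endian Boolean list -> natural number
    return sum(1 << i for i, b in enumerate(bv) if b)

# 3 + 6 = 9 at width 4
assert to_nat(bv_add([True, True, False, False], [False, True, True, False])) == 9
```

With a big-endian encoding, the same computation would first have to reverse both lists (or recurse to their ends) before the carry could be threaded through.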

Another choice is how to formalize the bit-vector type. A dependently-typed definition is natural, since then the type of a bit-vector is parameterized by its length. However, such a representation leads to some difficulties in proofs. Dependent pattern-matching or case-analysis with dependent types is cumbersome and unduly complex (see, e.g., [23]), because of the complications brought by unification in Coq (which is inherently undecidable [24]). A simply-typed definition, on the other hand, does not pose such obstacles for proofs, but is less natural, as the length becomes external to the type. For convenience, the BVList library defines both the dependently typed and the simply typed version of bit-vectors. It uses the Coq module system to separate them, and a functor that connects them, avoiding redundancy. The relationship between the two definitions is depicted in Fig. 2.

In BVList, a dependently-typed bit-vector is a record parameterized by its size n and consisting of two fields: a Boolean list and a condition ensuring that the list has length n. This type, and the corresponding lemmas and properties over it, are encapsulated by the BITVECTOR_LIST module of type BITVECTOR. A simply-typed or *raw* bit-vector representation is simply a Boolean list which, along with its associated operators and lemmas, is specified by the module signature RAWBITVECTOR and implemented in the module RAWBITVECTOR_LIST. In other words, the interface of BVList offers dependently-typed bit-vectors, while the underlying operators are defined and proofs are performed using raw bit-vectors.

**Fig. 2.** Modular separation of BVList

A functor called RAW2BITVECTOR derives the corresponding definitions and proofs over dependently-typed bit-vectors when it is applied to RAWBITVECTOR_LIST. The functor establishes a correspondence between the two theories, so that one can first prove a bit-vector property in the context of the simply-typed theory and then map it to its dependently-typed counterpart via the functor module. Otherwise put, users of the library can encode theorem statements more naturally, in a more expressive environment employing dependent types. For proofs, one can unlift the statements (by the functor) to equivalent encodings with simple types, and prove them there.

#### **4.2 Extending** BVList

Out of the 13 bit-vector functions and 10 predicates contained in Σ1, BVList had direct support for 10 functions and 6 predicates. The predicate symbols that were not directly supported were the weak inequalities ≤*u*, ≥*u*, ≤*s*, ≥*s*, and the unsupported function symbols were >>*a*, ÷, and mod. We extended BVList with the operator >>*a* and the predicates ≤*u* and ≥*u* in order to support the corresponding invertibility conditions. Additionally, we redefined << and >> in order to simplify the proofs of invertibility conditions over them.<sup>1</sup>

We focused on invertibility conditions for literals of the form x ⋄ s ▷ t and s ⋄ x ▷ t, where ⋄ and ▷ are respectively function and predicate symbols in Σ0. Σ0 was chosen as a representative set because it is both expressive enough (in the sense that other operators can be easily translated to this fragment), and

<sup>1</sup> Both the extended library and the proofs of invertibility equivalences can be found at https://github.com/ekiciburak/bitvector/tree/frocos23.

feasible for proofs in Coq using the library. In particular, it was chosen as one that would require a minimal amount of changes to BVList. As a result, such literals, as well as their invertibility conditions, contain only operators supported by BVList (after its extension with >>*a*, ≤*u*, and ≥*u*). Supporting the full set of operators in Σ1, both in the library and in the proofs, is left for future work.

```
Fixpoint ule_list_big_endian (x y : list bool) :=
  match x, y with
  | [ ], [ ] ⇒ true
  | [ ], _ ⇒ false
  | _, [ ] ⇒ false
  | xi :: x', yi :: y' ⇒ ((eqb xi yi) && (ule_list_big_endian x' y'))
                         || ((negb xi) && yi)
  end.

Definition ule_list (x y : list bool) :=
  (ule_list_big_endian (rev x) (rev y)).

Definition bv_ule (a b : bitvector) :=
  if @size a =? @size b then
    ule_list a b
  else
    false.

Definition bv_ule n (bv1 bv2 : bitvector n) : bool := M.bv_ule bv1 bv2.
```
**Fig. 3.** Definitions of ≤*<sup>u</sup>* in Coq.

In what follows, we describe our extensions to BVList with weak unsigned inequalities, alternative definitions for logical shifts, and the arithmetic right shift operator.

**Weak Unsigned Inequalities.** We added both weak inequalities for unsigned bit-vectors, ≤*u* and ≥*u*. We illustrate this extension via that of the ≤*u* operator (the extension for ≥*u* is similar). The relevant Coq definitions are provided in Fig. 3. The top three definitions (including the fixpoint) cover the simply-typed representation, and the fourth, bv_ule, is the dependently-typed representation that invokes the definition with the same name from module M of type RAWBITVECTOR. Like most other operators, ≤*u* (over raw bit-vectors) is defined over a few *layers*. The function bv_ule, at the highest layer, ensures that comparisons are between bit-vectors of the same size and then calls ule_list. Since we want to compare bit-vectors starting from their most significant bits and the input lists start instead with the least significant bits, ule_list first reverses the two lists. Then it calls ule_list_big_endian, which we consider to be at the lowest layer of the definition. This function does a lexicographic comparison of the two lists, starting from the most significant bits.
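The layered definition can be mirrored outside Coq. The following Python sketch (our transliteration of Fig. 3, not part of the library; bit-vectors are plain Boolean lists, least significant bit first) checks that the lexicographic comparison agrees with unsigned integer comparison at width 4:

```python
# Lexicographic <=, most significant bit first (mirrors ule_list_big_endian).
def ule_list_big_endian(x, y):
    if not x and not y:
        return True
    if not x or not y:
        return False
    xi, yi = x[0], y[0]
    return ((xi == yi) and ule_list_big_endian(x[1:], y[1:])) or ((not xi) and yi)

def bv_ule(a, b):
    # highest layer: sizes must agree; reverse to compare MSB first
    return len(a) == len(b) and ule_list_big_endian(a[::-1], b[::-1])

def to_nat(bv):  # little-endian Boolean list -> natural number
    return sum(1 << i for i, b in enumerate(bv) if b)

# agreement with unsigned integer comparison on all pairs at width 4
bvs = [[(i >> k) & 1 == 1 for k in range(4)] for i in range(16)]
assert all(bv_ule(a, b) == (to_nat(a) <= to_nat(b)) for a in bvs for b in bvs)
```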

To see why the addition of ≤*u* to the library is useful, consider, for example, the following parametric lemma, stating that ∼0 is the largest unsigned bit-vector of its type:

$$\forall x: \sigma\_{[n]}. \ x \leq\_u \sim 0 \tag{3}$$

Without an operator for the weak inequality, we would write it as:

$$\forall x: \sigma\_{[n]}. \ x <\_u \sim 0 \lor x = \sim 0 \tag{4}$$

```
1 Definition shl_one_bit (a: list bool) :=
2   match a with
3   | [ ] ⇒ [ ]
4   | _ ⇒ false :: removelast a
5   end.
6
7 Fixpoint shl_n_bits (a: list bool) (n: nat) :=
8   match n with
9   | O ⇒ a
10  | S n' ⇒ shl_n_bits (shl_one_bit a) n'
11  end.
12
13 Definition shl_n_bits_a (a: list bool) (n: nat) :=
14   if (n <? length a)%nat then
15     mk_list_false n ++ firstn (length a - n) a
16   else
17     mk_list_false (length a).
18
19 Theorem bv_shl_eq: forall (a b : bitvector), bv_shl a b = bv_shl_a a b.
```
**Fig. 4.** Various definitions of <<.

In such cases, since the definitions of <*<sup>u</sup>* and = have a similar structure to that of ≤*u*, we strip down the layers of <*<sup>u</sup>* and = separately, whereas using ≤*u*, we only do this once.

**Left and Right Logical Shifts.** We have redefined the shift operators << and >> in BVList. Figure 4 shows both the original and the new definition of <<. Those of >> are similar. Originally, << was defined using the functions shl_one_bit and shl_n_bits. The function shl_one_bit shifts the bit-vector to the left by one bit and is called by shl_n_bits as many times as necessary. The new definition shl_n_bits_a uses mk_list_false, which constructs the necessary list of 0 bits, and appends (++ in Coq) to it the bits to be shifted from the original bit-vector, which are retrieved using the firstn function from the Coq standard library for lists. The nat type used in Fig. 4 is the Coq representation of Peano natural numbers, which has O and S as its two constructors — as depicted in the cases rendered by pattern matching on n (lines 9-10). The theorem at the bottom of Fig. 4 asserts the equivalence of the two representations, allowing us to switch between them when needed. In the extended library, bv_shl defines the left shift operation using shl_n_bits, whereas bv_shl_a does it using shl_n_bits_a. This new representation was useful in proving some of the invertibility equivalences over shift operators (see, e.g., Example 4 below).
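Both definitions are easy to model over Python lists. The sketch below (our transliteration, under the library's little-endian convention; names mirror Fig. 4) also checks the agreement asserted by Theorem bv_shl_eq on all lists of length 5:

```python
import itertools

def shl_one_bit(a):
    # shifting left by one = prepend a 0 at the LSB end, drop the MSB
    return [] if not a else [False] + a[:-1]

def shl_n_bits(a, n):  # original definition: shift one bit at a time
    for _ in range(n):
        a = shl_one_bit(a)
    return a

def shl_n_bits_a(a, n):  # new definition: n zero bits, then the surviving prefix
    if n < len(a):
        return [False] * n + a[:len(a) - n]
    return [False] * len(a)

# the two definitions agree on all lists of length 5 and all shift amounts
for bits in itertools.product([False, True], repeat=5):
    for n in range(8):
        assert shl_n_bits(list(bits), n) == shl_n_bits_a(list(bits), n)
```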

**Arithmetic Right Shift.** Unlike the logical shifts, which were already defined in BVList and for which we added alternative definitions, the arithmetic right shift was not defined at all. We provided two alternative definitions for it, very similar to the definitions of the logical shifts — bv_ashr and bv_ashr_a. Both definitions are conditioned on the sign of the bit-vector (its most significant bit). Apart from this detail, the definitions take the same approach as shl_n_bits and shl_n_bits_a from Fig. 4. Operator bv_ashr repeats a single-bit shift as many times as necessary, and bv_ashr_a uses either mk_list_false or mk_list_true to append the necessary number of sign bits to the shifted bits.
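A Python model of the direct definition (our sketch of the behavior described for bv_ashr_a, not the library's code) makes the sign-bit padding explicit:

```python
# Arithmetic right shift over little-endian Boolean lists: the vacated
# high positions are filled with copies of the sign bit (the last element).
def ashr_a(a, n):
    if not a:
        return a
    sign = a[-1]
    if n < len(a):
        return a[n:] + [sign] * n  # drop n low bits, pad with sign bits
    return [sign] * len(a)         # shifting by >= width leaves only sign bits

# width 4: 1100 >>a 1 = 1110 in MSB-first notation, i.e. -4 >>a 1 = -2 signed
assert ashr_a([False, False, True, True], 1) == [False, True, True, True]
```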

#### **5 Proving Invertibility Equivalences in Coq**

In this section we provide specific details about proving invertibility equivalences in Coq. We start by outlining the general approach for proving invertibility equivalences in Sect. 5.1. Then, Sect. 5.2 presents detailed examples of such proofs. Section 5.3 summarizes the results and impact of these proofs.

#### **5.1 General Approach**

The natural representation of bit-vectors in Coq is the dependently-typed representation, and therefore the invertibility equivalences are formulated using this representation. In keeping with the modular approach described in Sect. 4, however, proofs in this representation are composed of proofs over simply-typed bit-vectors, which are easier to reason about. Most of the work is on proving an equivalence over raw bit-vectors. Then, we derive the proof of the corresponding equivalence over dependently-typed bit-vectors using a smaller, boilerplate set of tactics. Since this derivation process is mostly the same across many equivalences, these tactics are a good candidate for automation in the future.

When proving an invertibility equivalence IC[s, t] ⇔ ∃x. ℓ[x, s, t], we first split it into two sub-goals: the left-to-right and the right-to-left implication. For the left-to-right implication, since Coq implements a constructive logic, the only way to prove an existentially quantified formula is to construct a witness for it. Thus, in addition to proving the equivalence, a positive side-effect of our proofs is that they produce actual inverses for x in literals of the form ℓ[x, s, t]. In Niemetz et al. [16], these are called *conditional inverses*, as the fact that they are inverses is conditional on the correctness of the invertibility condition. There, such inverses were synthesized automatically for a subset of the literals. In each of our Coq proofs, such an inverse is found, even when the proof is done by case-splitting. This provides a more general solution than the one in [16], which did not consider case-splitting.

*Example 2.* Consider the literal s >>*<sup>a</sup>* x ≥*<sup>u</sup>* t. Its invertibility condition is (s ≥*<sup>u</sup>* ∼s) ∨ (s ≥*<sup>u</sup>* t). The left-to-right implication of the invertibility equivalence is:

$$\forall s, t: \sigma\_{[n]}. \ (s \ge\_u \sim s) \lor (s \ge\_u t) \Rightarrow \exists x: \sigma\_{[n]}. \ s \gg\_a x \ge\_u t$$

Here, case splitting is done on the disjunction in the invertibility condition. When s ≥*u* ∼s is true, the inverse for x is the bit-vector constant that corresponds to the length of s, namely n; when s ≥*u* t is true, the inverse is 0.
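These witnesses can be confirmed by brute force at small widths. In the Python sketch below (ours; `ashr` models >>*a* on the integer encoding, with shift amounts of at least the width producing all sign bits), the first disjunct is handled by the witness n and the second by 0:

```python
# Integer model of arithmetic right shift at a given width.
def ashr(s, x, width):
    sign = (s >> (width - 1)) & 1
    mask = (1 << width) - 1
    if x >= width:
        return mask if sign else 0
    # sign-extend s to a Python integer of the right sign, then shift
    return (((s | (~0 << width)) if sign else s) >> x) & mask

def witnesses_ok(width):
    mask = (1 << width) - 1
    for s in range(mask + 1):
        for t in range(mask + 1):
            if s >= (~s & mask):               # s >=_u ~s: witness x = n
                assert ashr(s, width, width) >= t
            elif s >= t:                       # s >=_u t: witness x = 0
                assert ashr(s, 0, width) >= t
    return True

assert all(witnesses_ok(w) for w in range(1, 5))
```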

In addition to BVList, several proofs of invertibility equivalences benefited from CoqHammer [7], a plug-in that aims at extending the level of automation in Coq by combining machine learning and automated reasoning techniques, in a fashion similar to Sledgehammer [21] in Isabelle/HOL [18]. CoqHammer, when triggered on some Coq goal, (i) submits the goal together with potentially useful terms to external solvers/automated provers, (ii) attempts to reconstruct the returned proofs (if any) directly in the Coq tactic language Ltac [8], and (iii) outputs the set of tactics closing the goal in case of success. As we directly embed these tactics inside BVList, one does not need to install CoqHammer in order to build the library, although it would be beneficial for further extensions.

#### **5.2 Detailed Examples**

In this section we provide specific examples for proofs of invertibility equivalences. The first example illustrates the two-theories approach of the library.

*Example 3.* Consider the literal s >>*a* x <*u* t. Its invertibility condition is (s <*u* t ∨ ¬(s <*s* 0)) ∧ t ≠ 0. Figure 5 shows the proof of the following direction of the corresponding invertibility equivalence:

$$\forall s, t: \sigma\_{[n]}. \ (\exists x: \sigma\_{[n]}. \ s >>\_a x <\_u t) \Rightarrow ((s <\_u t \lor \neg(s <\_s 0)) \land t \neq 0)$$

In the proof, lines 8–11 transform the dependent bit-vectors from the goal and the hypotheses into simply-typed bit-vectors. Then, lines 12–14 invoke the corresponding lemma for simply-typed bit-vectors (called InvCond.bvashr_ult2_rtl), along with some simplifications.

Most of the effort in this project went into proving equivalences over raw bit-vectors, as the following example illustrates.

*Example 4.* Consider the literal x << s >*u* t. Its invertibility condition is t <*u* (∼0 << s). The corresponding invertibility equivalence is:

$$\forall s, t: \sigma\_{[n]}. \ (t <\_u \sim 0 << s) \Leftrightarrow (\exists x: \sigma\_{[n]}. \ x << s >\_u t) \tag{5}$$

The left-to-right implication is easy to prove using ∼0 itself as the witness of the existential proof goal and considering the symmetry between >*<sup>u</sup>* and <*u*. The proof of the right-to-left implication relies on the following lemma:

$$\forall x, s: \sigma\_{[n]}. \ (x << s) \leq\_u (\sim 0 << s) \tag{6}$$

From the right side of the equivalence in Eq. (5), we get some Skolem constant x for which x << s >*u* t holds. Flipping the inequality, we have that t <*u* x << s; using this, and transitivity over <*u* and ≤*u*, the lemma given by Eq. (6) gives us the left side of the equivalence in Eq. (5).

As mentioned in Sect. 4, we have redefined the shift operators << and >> in the library. This was instrumental, for example, in the proof of Eq. (6).
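The lemma of Eq. (6) itself is easy to confirm exhaustively at a small width; below is a Python check over the integer encoding (ours, width 4 for illustration):

```python
# (x << s) <=_u (~0 << s) for all bit-vectors x and s of width 4.
WIDTH = 4
MASK = (1 << WIDTH) - 1          # the all-ones vector ~0 at width 4
for s in range(MASK + 1):
    top = (MASK << s) & MASK     # ~0 << s
    for x in range(MASK + 1):
        # x << s can only set bit positions that are also set in ~0 << s
        assert (x << s) & MASK <= top
```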

```
1  Theorem bvashr_ult2_rtl :
2    forall (n : N), forall (s t : bitvector n),
3      (exists (x : bitvector n), (bv_ult (bv_ashr_a s x) t = true)) ->
4      (((bv_ult s t = true) ∨ (bv_slt s (zeros n)) = false) ∧
5       (bv_eq t (zeros n)) = false).
6  Proof.
7    intros n s t H.
8    destruct H as ((x, Hx), H).
9    destruct s as (s, Hs).
10   destruct t as (t, Ht).
11   unfold bv_ult, bv_slt, bv_ashr_a, bv_eq, bv in ∗. cbn in ∗.
12   specialize (InvCond.bvashr_ult2_rtl n s t Hs Ht); intro STIC.
13   rewrite Hs, Ht in STIC. apply STIC.
14   now exists x.
15 Qed.
```
**Fig. 5.** A proof of one direction of the invertibility equivalence for >>*<sup>a</sup>* and <*<sup>u</sup>* using dependent types.

The new definition uses firstn and ++, over which many useful properties are already proven in the standard library. This benefits us in manual proofs, and in calls to CoqHammer, since the latter is able to use lemmas from the imported libraries to prove the goals that are given to it. Using this representation, proving Eq. (6) reduces to proving the lemmas bv_ule_1_firstn and bv_ule_pre_append, shown in Fig. 6. The proof of bv_ule_pre_append benefited from the property app_comm_cons from the standard list library of Coq, whereas firstn_length_le was useful in reducing the goal of bv_ule_1_firstn to the Coq equivalent of Eq. (3). The statements of the properties mentioned from the standard library are also shown in Fig. 6.

Finally, we examine what was considered a challenge problem in the previous version of this work [11]. The next example details how we completed the proof.

*Example 5.* Consider the literal (x >> s) >*u* t. Its invertibility condition is t <*u* (∼s >> s). Now consider the following direction of the corresponding invertibility equivalence:

$$\forall s, t: \sigma\_{[n]}. \ t <\_u (\sim s >> s) \Rightarrow \exists x: \sigma\_{[n]}. \ (x >> s) >\_u t \tag{7}$$

Figure 7 contains the theorem stating the equivalence, and some lemmas used within its proof. A crucial step in the proof of the implication is to rewrite the definition of the right shift operator bv_shr into its alternative definition bv_shr_a (see Sect. 4.2). Unfolding the alternative definition leads to a case analysis on the following condition:

$$\mathtt{toNat}(s) < \mathtt{len}(x)$$

where toNat casts a bit-vector to its natural number representation, and len returns the length of a bit-vector as a natural number.

```
Lemma bv_ule_1_firstn : forall (n : nat) (x : bitvector),
  (n < length x)%nat ->
  bv_ule (firstn n x) (firstn n (mk_list_true (length x))) = true.

Lemma bv_ule_pre_append : forall (x y z : bitvector),
  bv_ule x y = true -> bv_ule (z ++ x) (z ++ y) = true.

Theorem app_comm_cons : forall (x y : list A) (a : A),
  a :: (x ++ y) = (a :: x) ++ y

Lemma firstn_length_le : forall l : list A, forall n : nat,
  n <= length l -> length (firstn n l) = n.
```
**Fig. 6.** Lemmas used in the proof of Eq. (6), with relevant properties from the Coq standard library.
The challenge in the proof arises in the positive case of the condition, which reduces to a proof of first_bits_zero (see Fig. 7). Lemma first_bits_zero says that, given toNat(s) < len(s), the most significant len(s) − toNat(s) bits of s are 0. As seen in Fig. 4, the second argument to the top-most layer of the shift (called from bv_shl_eq) is a bit-vector that specifies the number of times to shift the bit-vector in the first argument. This second argument is converted to a natural number by the abstract toNat function invoked above, whose concrete definitions are specified in Fig. 7 as list2nat_be_a and list2N. At the same level of abstraction, we use rev for the list reversal function corresponding to the Coq function of the same name, and firstn also for its Coq namesake (firstn n l returns the n most significant bits of l), so that first_bits_zero can be specified as follows:

$$\mathtt{toNat}(s) < \mathtt{len}(s) \Rightarrow \mathtt{firstn}\ (\mathtt{len}(s) - \mathtt{toNat}(s))\ (\mathtt{rev}(s)) = 0$$

The intuition behind its validity is that if the most significant len(s) − toNat(s) bits were not 0, then they would contribute to the value of toNat(s), making it greater than or equal to len(s) and thus falsifying the condition. However, it is challenging to convert this intuition into a proof using induction over lists, as explained in what follows.

To prove first_bits_zero, we redefined list2N as a tail-recursive function list2NTR. This step was proven sound by a lemma of equivalence between the two definitions (list2N_eq). Since list2N is not tail-recursive, it only begins computation at the end of the input list representing a bit-vector. Such a definition further complicates the proof of first_bits_zero when it is based on the typical induction principle over the structure of the Boolean list underlying the bit-vector s, because it does not easily reduce (via ι-reduction for inductive definitions [19]) into a useful expression in the step case of the intended induction.
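To make the two styles of recursion concrete, here is a Python transliteration of the definitions (an illustrative sketch of ours mirroring the Coq functions in Fig. 7), together with a check of the equivalence stated by list2N_eq:

```python
def list2N(a):
    # structural recursion: the value of the tail is needed before the head
    # bit can contribute, so computation effectively starts at the list's end
    if not a:
        return 0
    head, tail = a[0], a[1:]
    value = list2N(tail)
    return 2 * value + 1 if head else 2 * value

def list2NR(a, n):
    # tail-recursive variant with accumulator n, consuming the most
    # significant bit first
    if not a:
        return n
    head, tail = a[0], a[1:]
    return list2NR(tail, 2 * n + 1 if head else 2 * n)

def list2NTR(a):
    return list2NR(a, 0)

# list2N_eq: list2NTR applied to the reversed list agrees with list2N
s = [True, False, True, True]          # little-endian 1101 in binary = 13
assert list2N(s) == 13
assert list2NTR(list(reversed(s))) == list2N(s)
```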

The advantage of tail recursion in this context is best illustrated by Fig. 8, where x is a Boolean variable and xs represents an arbitrary Boolean list. The

```
Theorem bvshr_ugt_ltr : forall (n : N), forall (s t : bitvector n),
  (bv_ult t (bv_shr (bv_not s) s) = true) ->
  (exists (x : bitvector n), bv_ugt (bv_shr x s) t = true).

Lemma first_bits_zero : forall (s : bitvector),
  (N.to_nat (list2N s) < length s)%nat ->
  firstn (length s - N.to_nat (list2N s)) (rev s) =
  mk_list_false (length s - N.to_nat (list2N s)).

Lemma first_bits_zeroA : forall (s : bitvector),
  (length s >= (list2NTR s))%nat ->
  firstn (length s - (list2NTR s)) s =
  mk_list_false (length s - (list2NTR s)).

Fixpoint list2N (a : list bool) :=
  match a with
  | [] => 0
  | x :: xs => if x then N.succ_double (list2N xs) else
               N.double (list2N xs)
  end.

Definition list2nat_be_a (a : list bool) := N.to_nat (list2N a).

Fixpoint list2NR (a : list bool) (n : nat) :=
  match a with
  | [] => n
  | x :: xs => if x then list2NR xs (2 * n + 1) else
               list2NR xs (2 * n)
  end.

Definition list2NTR (a : list bool) := list2NR a 0.

Lemma list2N_eq : forall (s : bitvector),
  list2NTR (rev s) = N.to_nat (list2N s).
```
**Fig. 7.** Invertibility equivalence for >> and ><sup>u</sup> and some lemmas used by its proof.


**Fig. 8.** Sub-goals generated in the proof of first bits zero. Note that 0 is a bit-vector constant of the appropriate length (list of falses).

derivation of the goal from the inductive hypothesis (IH) in derivation (8) from Fig. 8 is complicated in Coq because the functions firstn and rev are not well matched with list2N, if not incompatible. For instance, observe that in the inductive step (Goal), as the first argument to firstn increases, the number of bits fetched from the list increases towards the *right*. However, due to the little-endian notation of bit-vectors and the fact that the list cons function (::) can be seen as extending its argument list to its *left*, the rev function must be used to correct the direction of increase of the second argument to firstn. Even with this correction, an induction over s must deal with two structurally different lists.

In contrast, the tail-recursive definition of list2NTR hides the rev function. This is illustrated in derivation (9) in Fig. 8, where toNatTR corresponds to list2NTR. Furthermore, such an induction over lists, using append (++) to the right rather than cons to the left, is possible thanks to the *reverse induction principle*<sup>2</sup>. Closing such a goal allowed us to prove the list2NTR-variant of first_bits_zero, specified as first_bits_zeroA in Fig. 7, and the proof of equivalence between the two definitions (list2N_eq) allowed us to use it in closing the original goal (7).

#### **5.3 Results**

Table 2 summarizes the results of proving invertibility equivalences for invertibility conditions in the signature Σ0. In the table, ✓ means that the invertibility equivalence was successfully verified in Coq but not in Niemetz et al. [17], and ✗ means the opposite; - means that the invertibility equivalence was verified using both approaches. We successfully proved all invertibility equivalences over = that are expressible in Σ0, including 4 that were not proved in [17]. For the rest of the predicates, we focused only on the 8 invertibility equivalences that were not proved in [17], and succeeded in proving all of them.

Our work thus complements [17] in verifying all invertibility conditions in Σ<sup>0</sup> for arbitrary bit-widths, by proving all 12 equivalences that were previously unverified and corroborating 7 others that were verified by SMT solvers. It also complements [15], which verified all invertibility conditions in Σ1, but only up to a bit-width of 65.

<sup>2</sup> see rev ind in https://coq.inria.fr/library/Coq.Lists.List.html.

**Table 2.** Proved invertibility equivalences in Σ<sup>0</sup>, where the relation ranges over the given predicate symbols. ✓ means that the invertibility equivalence was successfully verified in Coq but not in [17], whereas ✗ means the opposite; - means that the invertibility equivalence was verified using both approaches.


### **6 Conclusion and Future Work**

We have described our work on verifying bit-vector invertibility conditions in the Coq proof assistant, which required extending the BVList library in Coq. In addition to describing the library and our extensions to it, this paper presented details about the Coq proofs of the invertibility equivalences. These were done on a representative subset of the operators from the theory of bit-vectors that is well-supported by the extended library. We were able to prove in Coq all the equivalences that were left unproven in previous attempts for all bit-widths, and also to prove in Coq some equivalences that were proven automatically before, thus increasing confidence in their correctness.

The most immediate direction for future work is proving more of the invertibility equivalences supported by the bit-vector library. In addition, we plan to extend the library so that it supports the full syntax in which invertibility conditions are expressed, namely Σ1. This will also increase the potential usage of the library for other applications. Another direction for future work is to extend the proofs for invertibility conditions where some of the bits are known. Such invertibility conditions were introduced by Niemetz and Preiner [14]. However, their formal verification for every bit-width is yet to be done.

**Acknowledgements.** This work was funded in part by NSF-BSF grant numbers 2110397 (NSF) and 2020704 (BSF), and ISF grant number 619/21.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Unification**

# **Weighted Path Orders Are Semantic Path Orders**

Teppei Saito(B) and Nao Hirokawa

JAIST, Nomi, Japan *{*saito,hirokawa*}*@jaist.ac.jp

**Abstract.** We explore the relationship between weighted path orders and (monotonic) semantic path orders. Our findings reveal that weighted path orders can be considered instances of a variant of semantic path orders that comprise order pairs. This observation leads to a generalization of weighted path orders that does not impose simplicity on their underlying algebras. As a result, the generalized version is capable of proving termination of term rewrite systems beyond the realm of simple termination. In order to assess practicality we provide experimental data comparing generalized weighted path orders with the original ones as well as other well-known classes of reduction orders.

**Keywords:** Term Rewriting · Termination · Weighted Path Order · Semantic Path Order

## **1 Introduction**

Reduction orders are a fundamental tool in termination analysis of term rewrite systems, and they also underlie completion-based automated theorem proving. *Weighted path orders* (WPOs) [27] are known as a versatile class of reduction orders; WPOs can simulate (generalized) Knuth–Bendix orders [7,13,16] and lexicographic path orders [12], depending on the choice of parameters, namely *simple monotone algebras* and precedences. In fact, weighted path orders are so powerful that they characterize simple termination of term rewrite systems [20, Definition 6.3.7], that is, a term rewrite system is simply terminating if and only if it admits a compatible WPO. Besides automated termination analysis [14,26], WPOs are used in reachability analysis [25], and automated theorem proving [11,18].

Another well-known class of reduction orders is the class of *monotonic semantic path orders* (MSPOs) [4,5], which are a monotonic version of semantic path orders (SPOs) [12]. MSPOs take triples of orders (called reduction triples) as parameters, and provide a complete characterization of terminating term rewrite

T. Saito—supported by JST SPRING, Grant Number JPMJSP2102. N. Hirokawa supported by JSPS KAKENHI Grant Number JP22K11900.

c The Author(s) 2023 U. Sattler and M. Suda (Eds.): FroCoS 2023, LNAI 14279, pp. 63–80, 2023. https://doi.org/10.1007/978-3-031-43369-6\_4

systems: A term rewrite system is terminating if and only if it admits a compatible MSPO. However, the relationship between WPOs and MSPOs has remained unknown [24].

In this paper, we give a solution to this open problem, demonstrating an effective construction of an MSPO from the algebra and the precedence of a given WPO. The key to the proof lies in finding a suitable new *variant* of MSPOs, described as follows: First, the variant uses lexicographic comparison [4, Definition 4.5.1], as the original WPOs [27, Definition 5] are based on this comparison strategy. Second, the variant employs reduction triples [4, Definition 4.1.19], because an example shows that a variant based on (quasi-)reduction pairs [5, Definition 4] leads to an invalid construction.

The obtained simulation result leads to a generalization of WPOs that does not impose simplicity on their underlying algebras. The generalization can show termination of term rewrite systems that are not simply terminating, in sharp contrast to the termination proving power of WPOs. In addition, upgrading WPOs to generalized WPOs (GWPOs) can be done with little implementation effort, so we anticipate that tools which employ WPOs as reduction orders (e.g. [11,14,18,23,25,26]) may benefit from the power of GWPOs.

The remaining part of the paper is organized as follows. After recalling notions and notations for term rewriting and WPOs in Sect. 2, we introduce a slightly modified version of semantic path orders that employs order pairs in Sect. 3. In Sect. 4 we show that weighted path orders are instances of semantic path orders. Using this fact, we introduce a generalization of WPOs in Sect. 5. In Sect. 6 experimental data for (generalized) weighted path orders are reported. As in the case of MSPOs [5, Section 5.2], GWPOs are capable of simulating a basic version of the dependency pair method [1]. This is discussed in Sect. 7. The paper is concluded by stating related work in Sect. 8.

# **2 Preliminaries**

Throughout the paper, we assume familiarity with term rewriting [3,20]. First we briefly recall basic notions for term rewriting and reduction orders, and then introduce weighted path orders.

#### **2.1 Term Rewriting**

Let F be a signature and V a countable set of variables with F ∩ V = ∅. The set of all *terms* built from F and V is referred to as T (F, V), or just as T when F and V are clear from the context. When we need to indicate the arity n of a function symbol f, we write f(n) for f. Quasi-orders on the signature are called *(quasi-)precedences*. A quasi-precedence is called *well-founded* if its strict part is well-founded. The *size* |t| of a term t is the number of function symbols and variables occurring in t. Let □ be a constant with □ ∉ F. *Contexts* are terms over F ∪ {□} that contain exactly one □. The term resulting from replacing □ in a context C by a term t is denoted by C[t]. We write s ⊵ t if there is a context C

with s = C[t]. The strict part of ⊵ is denoted by ▷. A *substitution* is a mapping σ from variables to terms such that {x ∈ V | σ(x) ≠ x} is finite. The application tσ of a substitution σ to a term t is inductively defined as follows: tσ = σ(t) if t is a variable, and tσ = f(t1σ, . . . , tnσ) if t = f(t1,...,tn).

A pair (ℓ, r) of terms is said to be a *rewrite rule* if ℓ is not a variable and every variable in r occurs in ℓ. Rewrite rules (ℓ, r) are written as ℓ → r. A set of rewrite rules is called a *term rewrite system* (TRS). Let R be a TRS. We write D<sup>R</sup> for the set of *defined symbols* {f | f(ℓ1,...,ℓn) → r ∈ R}. The relation →<sup>R</sup> is defined on terms as follows: s →<sup>R</sup> t if there exist a rewrite rule ℓ → r ∈ R, a context C, and a substitution σ such that s = C[ℓσ] and t = C[rσ] hold. The TRS R is said to be *terminating* if there is no infinite sequence t<sup>1</sup> →<sup>R</sup> t<sup>2</sup> →<sup>R</sup> ··· . A relation ⊐ on terms is *closed under contexts* if C[s] ⊐ C[t] holds whenever s ⊐ t and C is a context, and it is called *closed under substitutions* or just *stable* if sσ ⊐ tσ holds whenever s ⊐ t and σ is a substitution. We say ⊐ has the *subterm property* if s ⊐ t for all terms s, t satisfying s ▷ t. Relations closed under contexts and substitutions are called *rewrite relations*.
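To make these definitions concrete, the following Python sketch (ours, not from the paper) represents variables as strings and function applications as tuples, and implements a single rewrite step by first trying the root position and then recursing into the arguments:

```python
def match(pat, term, sub):
    # syntactic matching: variables are strings, function applications are
    # tuples (f, arg1, ..., argn); sub accumulates the substitution
    if isinstance(pat, str):
        if pat in sub:
            return sub[pat] == term
        sub[pat] = term
        return True
    return (isinstance(term, tuple) and pat[0] == term[0]
            and len(pat) == len(term)
            and all(match(p, t, sub) for p, t in zip(pat[1:], term[1:])))

def substitute(sub, term):
    if isinstance(term, str):
        return sub.get(term, term)
    return (term[0],) + tuple(substitute(sub, t) for t in term[1:])

def rewrite_once(rules, term):
    # one rewrite step: try the root position first, then the arguments
    for lhs, rhs in rules:
        sub = {}
        if match(lhs, term, sub):
            return substitute(sub, rhs)
    if isinstance(term, tuple):
        for i, arg in enumerate(term[1:], start=1):
            reduct = rewrite_once(rules, arg)
            if reduct is not None:
                return term[:i] + (reduct,) + term[i + 1:]
    return None

# the rule f(g(x)) -> g(f(f(x))) applied to f(g(g(a))):
R = [(("f", ("g", "x")), ("g", ("f", ("f", "x"))))]
s = ("f", ("g", ("g", ("a",))))
assert rewrite_once(R, s) == ("g", ("f", ("f", ("g", ("a",)))))
```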

Termination is often shown by using orders. We say that a rewrite relation is a *rewrite preorder* or *reduction order* if it is a preorder or a well-founded order, respectively. A TRS R is *compatible* with a strict order > if R ⊆ >.

#### **Proposition 1.** *A TRS* R *is terminating if* R *is compatible with some reduction order* >*.*

An *ordered* F*-algebra* is a triple (A, {fA}<sup>f</sup>∈F , >), where A is a set called a *carrier*, f<sup>A</sup> is an n-ary function on A (called an *interpretation function*) associated with each <sup>f</sup>(n) ∈ F, and <sup>&</sup>gt; is a strict order on <sup>A</sup>. Let <sup>A</sup> = (A, {fA}<sup>f</sup>∈F , >) be an ordered algebra. A mapping from V to A is called an *assignment* for A. The interpretation [α]A(t) of a term t under an assignment α is inductively defined as follows: [α]A(t) = α(t) if t is a variable, and [α]A(t) = fA([α]A(t1),..., [α]A(tn)) if t = f(t1,...,tn). We write s ><sup>A</sup> t if [α]A(s) > [α]A(t) for all assignments α. The relation ><sup>A</sup> is a strict order. Similarly we write s ≥<sup>A</sup> t if [α]A(s) ≥ [α]A(t) holds for all assignments α, where ≥ stands for the reflexive closure of >. The relation ≥<sup>A</sup> is a quasi-order, and satisfies ≥<sup>A</sup> · ><sup>A</sup> · ≥<sup>A</sup> ⊆ ><sup>A</sup>. We say that the ordered algebra A is *monotone* if every interpretation function fA is strictly monotone in all its arguments, *weakly monotone* if every fA is weakly monotone in all its arguments, and *simple* if fA(a1,...,an) ≥ a<sup>i</sup> holds for every f(n) ∈ F, all a1,...,an ∈ A, and all 1 ≤ i ≤ n.

If > is well-founded, so is ><sup>A</sup>. If A is a weakly monotone algebra, ≥<sup>A</sup> is a rewrite preorder. If in addition A is simple, ≥<sup>A</sup> has the subterm property ▷ ⊆ ≥<sup>A</sup>.

#### **2.2 Weighted Path Orders**

Weighted path orders (WPOs) are reduction orders introduced by Yamada et al. [27]. The definition of WPOs is based on the pair of an ordered algebra A and a precedence ≿. A WPO compares terms s, t as a generalized KBO does: First the terms are compared by s ><sup>A</sup> t. If only the weak inequality s ≥<sup>A</sup> t holds, then their root symbols, say f and g, are compared by the precedence. If again only the weak inequality f ≿ g holds, the arguments are compared *lexicographically*.

Lexicographic comparison is formalized as follows. Let > be a strict order on a set A and let A<sup>∗</sup> denote the set of all strings (tuples) over A. The *lexicographic extension* >lex of > is defined on A<sup>∗</sup> as follows: (a1,...,an) >lex (b1,...,bm) if there is a natural number k < n such that

– a<sup>j</sup> = b<sup>j</sup> for all 1 ≤ j ≤ k, and

– either k = m, or k < m and a<sup>k+1</sup> > b<sup>k+1</sup>.

It is known that >lex is a strict order on A∗.
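The definition admits a direct transcription; the sketch below (ours, for illustration) implements >lex generically in Python, with the base strict order gt passed as a parameter:

```python
def lex_gt(gt, a, b):
    # (a1,...,an) >lex (b1,...,bm): the first k components agree, and
    # either b is a proper prefix of a (k = m), or a_{k+1} > b_{k+1}
    n, m = len(a), len(b)
    for k in range(n):
        if k == m:
            return True
        if a[k] != b[k]:
            return gt(a[k], b[k])
    return False

gt_nat = lambda x, y: x > y
assert lex_gt(gt_nat, (3, 1), (2, 9, 9))   # 3 > 2 decides immediately
assert lex_gt(gt_nat, (1, 2), (1,))        # (1,) is a proper prefix of (1, 2)
assert not lex_gt(gt_nat, (1,), (1, 2))    # a string never beats its extensions
```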

**Definition 1 (**[27]**).** *Let* A *be an ordered* F*-algebra and* ≿ *a precedence. The* weighted path order >wpo *is defined on terms over* F *as follows:* s >wpo t *if*

*1.* s ><sup>A</sup> t*, or*
*2.* s ≥<sup>A</sup> t*,* s = f(s1,...,sm)*, and one of the following conditions holds:*
*a.* s<sup>i</sup> ≥wpo t *for some* 1 ≤ i ≤ m*.*
*b.* t = g(t1,...,tn) *and* s >wpo t<sup>j</sup> *for all* 1 ≤ j ≤ n*, and moreover (i)* f ≻ g*, or (ii)* f ≿ g *and* (s1,...,sm) >lex wpo (t1,...,tn)*.*

*Here* ≥wpo *denotes the reflexive closure of* >wpo*.*

**Theorem 1 (**[27]**).** *Suppose that the signature is finite. For every simple monotone well-founded algebra and well-founded precedence, the induced relation* >wpo *is a reduction order with the subterm property.*

*Example 1.* Consider the following TRS R taken from [27, Example 9]:

$$\mathsf{f}(\mathsf{g}(x)) \to \mathsf{g}(\mathsf{f}(\mathsf{f}(x))) \qquad\qquad \mathsf{f}(\mathsf{h}(x)) \to \mathsf{h}(\mathsf{h}(\mathsf{f}(x)))$$

Let A be the simple monotone algebra on N with fA(x) = hA(x) = x and gA(x) = x + 1. Take a precedence ≿ with f ≻ g ≻ h. The relation f(g(x)) >wpo g(f(f(x))) is verified by the following derivation:

$$\dfrac{\mathsf{f}(\mathsf{g}(x)) \geq_{\mathcal{A}} \mathsf{g}(\mathsf{f}(\mathsf{f}(x))) \qquad \mathsf{f} \succ \mathsf{g} \qquad \dfrac{\mathsf{f}(\mathsf{g}(x)) >_{\mathcal{A}} \mathsf{f}(\mathsf{f}(x))}{\mathsf{f}(\mathsf{g}(x)) >_{\mathsf{wpo}} \mathsf{f}(\mathsf{f}(x))}\ \text{WPO 1}}{\mathsf{f}(\mathsf{g}(x)) >_{\mathsf{wpo}} \mathsf{g}(\mathsf{f}(\mathsf{f}(x)))}\ \text{WPO 2b(i)}$$

Here WPO 1 and WPO 2b(i) indicate the corresponding conditions in Definition 1. Similarly, one can verify f(h(x)) >wpo h(h(f(x))). Therefore, R ⊆ >wpo follows. Hence, we conclude that R is terminating.
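The derivation can also be replayed mechanically. The Python sketch below (ours) implements Definition 1 for the unary signature of this example, exploiting that under the given algebra every term denotes x + c, where c is the number of g symbols; the numeric encoding of the precedence via PREC (placing f above g and h) is an assumption for illustration:

```python
def offset(t):
    # interpretation of this example: f(x) = h(x) = x, g(x) = x + 1, so a
    # term denotes x + (number of g symbols it contains)
    if isinstance(t, str):
        return 0
    return offset(t[1]) + (1 if t[0] == "g" else 0)

PREC = {"f": 1, "g": 0, "h": 0}   # assumed precedence: f above g and h

def lex(gt, a, b):
    # lexicographic extension of a strict order gt to tuples
    for k in range(len(a)):
        if k == len(b):
            return True
        if a[k] != b[k]:
            return gt(a[k], b[k])
    return False

def wpo(s, t):
    if offset(s) > offset(t):                      # WPO 1: s >A t
        return True
    if isinstance(s, str) or offset(s) < offset(t):
        return False
    _, *ss = s                                     # here s >=A t and s = f(...)
    if any(x == t or wpo(x, t) for x in ss):       # WPO 2a
        return True
    if isinstance(t, str):
        return False
    root_t, *ts = t
    if not all(wpo(s, u) for u in ts):
        return False
    if PREC[s[0]] > PREC[root_t]:                  # WPO 2b(i)
        return True
    return PREC[s[0]] == PREC[root_t] and lex(wpo, ss, ts)   # WPO 2b(ii)

# both rules of the example are oriented:
assert wpo(("f", ("g", "x")), ("g", ("f", ("f", "x"))))
assert wpo(("f", ("h", "x")), ("h", ("h", ("f", "x"))))
```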

The following example shows that the simplicity condition cannot be dropped from Theorem 1.

*Example 2.* Any WPO >wpo induced by the weakly monotone but non-simple algebra A on N with a<sup>A</sup> = 1 and fA(x) = 0 lacks well-foundedness as it admits the cyclic sequence f(a) >wpo f(f(a)) >wpo f(a).
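The cycle is easy to observe concretely. The following Python fragment (ours) implements only the two cases of Definition 1 needed here, WPO 1 and WPO 2a, for the ground terms of this example:

```python
def interp(t):
    # this example's algebra: a is interpreted as 1, f(x) as 0 (not simple)
    return 1 if t == ("a",) else 0

def wpo(s, t):
    # fragment of Definition 1: only WPO 1 and WPO 2a
    if interp(s) > interp(t):                          # WPO 1
        return True
    if interp(s) < interp(t):
        return False
    return any(x == t or wpo(x, t) for x in s[1:])     # WPO 2a

a, fa, ffa = ("a",), ("f", ("a",)), ("f", ("f", ("a",)))
assert wpo(fa, ffa)   # a is an argument of f(a) and interp(a) = 1 > 0
assert wpo(ffa, fa)   # f(a) is an argument of f(f(a))
```

Since this fragment only underapproximates >wpo, the two assertions already exhibit the cyclic sequence and hence the failure of well-foundedness.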

#### **3 Semantic Path Orders Based on Order Pairs**

Borralleras [4, Definition 4.1.19] introduced a variant of SPO that employs a pair of a quasi-order and a strict order. This variant compares arguments of terms by a multiset order. In order to simulate WPOs which compare arguments in a lexicographic manner, we introduce another variant of SPO.

We say that the pair (⊒, ⊐) of a quasi-order ⊒ and a strict order ⊐ is an *order pair* if ⊒ · ⊐ · ⊒ ⊆ ⊐. The inclusion is referred to as *compatibility*. We say that an order pair (⊒, ⊐) on terms is *stable* if both ⊒ and ⊐ are stable.

**Definition 2.** *Let* (⊒, ⊐) *be a stable order pair on* T \V*.* <sup>1</sup> *The* semantic path order >spo (SPO) *is defined on terms as follows:* s >spo t *if* s = f(s1,...,sm) *and one of the following conditions holds:*

*1.* s<sup>i</sup> ≥spo t *for some* 1 ≤ i ≤ m*.*
*2.* t = g(t1,...,tn) *and* s >spo t<sup>j</sup> *for all* 1 ≤ j ≤ n*, and moreover a.* s ⊐ t*, or b.* s ⊒ t *and* (s1,...,sm) >lex spo (t1,...,tn)*.*

*Here* ≥spo *denotes the reflexive closure of* >spo*.*

*Remark 1.* The standard definitions of SPOs ([12] and [4, Definition 4.1.19]) use the multiset extension of >spo in SPO 2b instead of the lexicographic extension. The lexicographic version of SPOs, introduced by Borralleras [4, Definition 4.5.1], can be obtained by setting ⊐ to the strict part of ⊒ in Definition 2.

*Example 3.* Lexicographic path orders (LPOs) are special instances of SPOs. Let ≿ be a precedence. Define f(s1,...,sm) ⊒ g(t1,...,tn) by f ≿ g, and let ⊐ be the strict part of ⊒. The semantic path order induced by (⊒, ⊐) is the lexicographic path order induced by ≿.
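Definition 2 can be prototyped directly. In the Python sketch below (ours), make_spo builds >spo from an order pair given as two predicates; instantiating the pair from a hypothetical two-symbol precedence as in this example yields an LPO:

```python
def lex_ext(gt, a, b):
    # lexicographic extension of a strict order gt to tuples
    for k in range(len(a)):
        if k == len(b):
            return True
        if a[k] != b[k]:
            return gt(a[k], b[k])
    return False

def make_spo(geq, gt):
    # >spo from Definition 2 for the stable order pair (geq, gt);
    # variables are strings, non-variable terms are tuples (f, arg1, ...)
    def spo(s, t):
        if isinstance(s, str):
            return False
        _, *ss = s
        if any(x == t or spo(x, t) for x in ss):                 # SPO 1
            return True
        if isinstance(t, str):
            return False
        _, *ts = t
        if not all(spo(s, u) for u in ts):
            return False
        return gt(s, t) or (geq(s, t) and lex_ext(spo, ss, ts))  # SPO 2a / 2b
    return spo

PREC = {"f": 1, "g": 0}                       # precedence: f above g
geq = lambda s, t: PREC[s[0]] >= PREC[t[0]]   # the quasi-order of the pair
gt  = lambda s, t: PREC[s[0]] >  PREC[t[0]]   # its strict part
lpo = make_spo(geq, gt)

assert lpo(("f", "x"), ("g", ("g", "x")))     # f(x) >lpo g(g(x)) since f > g
assert not lpo(("g", "x"), ("f", "x"))
```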

Let (⊒, ⊐) be a stable order pair on T \V and let >spo be the semantic path order induced by (⊒, ⊐). The transitivity, irreflexivity, and stability of >spo are straightforward. A small remark is that the compatibility ⊒ · ⊐ · ⊒ ⊆ ⊐ is used in the proof of transitivity.

**Lemma 1.** *The SPO* >spo *is a stable strict order.*

When the signature is infinite, the lexicographic version of SPOs is not well-founded in general, even if ⊐ is well-founded. This is in contrast to the multiset versions of SPOs mentioned in Remark 1.

*Example 4.* Consider the signature consisting of the constants a and b and a symbol f<sup>i</sup> of arity i for every i ∈ N. Let ≿ be a well-founded precedence satisfying a ≻ b and f<sup>i</sup> ≈ f<sup>j</sup> for all i, j ∈ N. The pair (⊒, ⊐) defined as in Example 3 is an order pair with ⊐ well-founded, but the SPO >spo induced from (⊒, ⊐) admits the infinite chain:

f1(a) >spo f2(b, a) >spo f3(b, b, a) >spo ···

See [22, Section 3] and [19, Section 3] for related discussions.

<sup>1</sup> The restriction to *T \V* is not essential but meant to be a minimum requirement. Observe that Definition 2 uses the order pair only when *s* and *t* are not variables.

Well-foundedness of >spo is restored by assuming existence of an upper bound of arities. We refer to this property as *boundedness* of the signature. Needless to say, a signature is bounded whenever it is finite.

Hereafter we assume that ⊐ is well-founded and F is bounded. For showing that >spo is well-founded, we adopt Buchholz's method [6]. One can find a similar proof in [27, Lemma 8]. We write SN(>spo) for the set of all terms t such that there is no infinite descending sequence t >spo t<sup>1</sup> >spo t<sup>2</sup> >spo ··· starting from t. <sup>2</sup> The following properties are immediate:


Buchholz's method proves well-foundedness by well-founded induction. To express our well-founded order for induction, we recall the notion of the lexicographic product of order pairs. Let (≥1, >1),...,(≥n, >n) be n order pairs on sets A1,...,An, respectively. The *lexicographic product* (≥1, >1) ⊗ ··· ⊗ (≥n, >n) is the strict order > defined on A1 × ··· × A<sup>n</sup> as follows: (a1,...,an) > (b1,...,bn) if there exists an index k ∈ {1,...,n} such that a<sup>k</sup> ><sup>k</sup> b<sup>k</sup> and a<sup>j</sup> ≥<sup>j</sup> b<sup>j</sup> for all 1 ≤ j < k. Note that the lexicographic product > is well-founded if every ><sup>i</sup> is well-founded.
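The definition translates directly into code; the following sketch (ours) builds the lexicographic product of a list of order pairs, each given as a (weak, strict) pair of predicates:

```python
def lex_product(pairs):
    # pairs: list of (geq_i, gt_i); the product compares componentwise:
    # a > b iff some a_k >_k b_k while a_j >=_j b_j for all earlier j
    def gt(a, b):
        for (geq_i, gt_i), x, y in zip(pairs, a, b):
            if gt_i(x, y):
                return True
            if not geq_i(x, y):
                return False
        return False
    return gt

nat = (lambda x, y: x >= y, lambda x, y: x > y)
gt3 = lex_product([nat, nat, nat])
assert gt3((1, 5, 0), (1, 3, 9))      # decided in the second component
assert gt3((2, 0, 0), (1, 9, 9))      # decided in the first component
assert not gt3((1, 3, 0), (1, 3, 0))  # the product is irreflexive
```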

Given a set A, we write A<sup>≤k</sup> for the union of A<sup>i</sup> for all i ≤ k. If a strict order > on A is well-founded, then the restriction of >lex to A<sup>≤k</sup> is also well-founded; see [19, Section 3]. Thus, the lexicographic product given by

$$({\sqsupseteq}, {\sqsupset}) \otimes ({\geq^{\mathsf{lex}}_{\mathsf{spo}}}, {>^{\mathsf{lex}}_{\mathsf{spo}}}) \otimes ({\trianglerighteq}, {\triangleright})$$

is a well-founded order on (T \V) × T<sup>≤M</sup> × T. Here M stands for the maximum arity in the signature F, and ≥lex spo for the reflexive closure of >lex spo.

**Lemma 2.** *The term* u *belongs to* SN(>spo) *whenever* t = f(t1,...,tn) >spo u *and* t1,...,t<sup>n</sup> ∈ SN(>spo)*.*

*Proof.* We show the claim by well-founded induction on (t,(t1,...,tn), u) with respect to the lexicographic product displayed above, denoted here by ≻. We proceed by analyzing the derivation of t >spo u. If t >spo u is derived from SPO 1, then t<sup>i</sup> ≥spo u for some i ∈ {1,...,n}. In this case u ∈ SN(>spo) trivially follows from t<sup>i</sup> ∈ SN(>spo). If t >spo u is derived from SPO 2a or SPO 2b, then u is of the form g(u1,...,um) and t >spo u<sup>j</sup> for all j ∈ {1,...,m}. From u ▷ u<sup>j</sup> we have (t,(t1,...,tn), u) ≻ (t,(t1,...,tn), u<sup>j</sup>). So from the induction hypothesis u<sup>j</sup> ∈ SN(>spo) for each j. For showing our goal u ∈ SN(>spo), fix an arbitrary term v with u >spo v. We further distinguish the case of SPO 2a and that of SPO 2b.

a. If t >spo u is derived from SPO 2a, then t ⊐ u. Thus, (t,(t1,...,tn), u) ≻ (u,(u1,...,um), v), and the induction hypothesis yields v ∈ SN(>spo).

<sup>2</sup> SN stands for strong normalization, which is another name of termination.

b. If t >spo u is derived from SPO 2b, then we additionally have t ⊒ u and (t1,...,tn) >lex spo (u1,...,um). Thus, (t,(t1,...,tn), u) ≻ (u,(u1,...,um), v) holds. So from the induction hypothesis we obtain v ∈ SN(>spo).

In either case v ∈ SN(>spo). So we conclude u ∈ SN(>spo). 

**Lemma 3.** *The relation* >spo *is well-founded.*

*Proof.* We show that t ∈ SN(>spo) by induction on |t|. If t is a variable trivially t ∈ SN(>spo). Otherwise, Lemma 2 applies. 

**Theorem 2.** *Every semantic path order is a stable well-founded order, provided that the signature is bounded.* 

In general, semantic path orders are not closed under contexts. For a remedy, Borralleras et al. [5] propose the use of another preorder with the *harmony* property. This results in monotonic semantic path orders.

**Definition 3 (**[4, Definition 4.1.20]**).** *A triple* (≳, ⊒, ⊐) *is a* reduction triple *if* ≳ *is a rewrite preorder on terms,* (⊒, ⊐) *is a stable order pair on* T \V *with* ⊐ *well-founded, and* ≳ *and* ⊒ *have the* harmony *property, meaning that for every* f(n) ∈ F *the implication*

$$s_i \gtrsim t \implies f(s_1, \dots, s_i, \dots, s_n) \sqsupseteq f(s_1, \dots, t, \dots, s_n)$$

*holds for all terms* s1,...,sn, t *and argument positions* 1 ≤ i ≤ n*.*

**Definition 4.** *Let* (≳, ⊒, ⊐) *be a reduction triple, and let* >spo *be the semantic path order induced from* (⊒, ⊐)*. The* monotonic semantic path order >mspo (MSPO) *is defined as follows:* s >mspo t *if* s ≳ t *and* s >spo t*.*

**Theorem 3.** *Every monotonic semantic path order is a reduction order, provided that the signature is bounded.*

*Proof.* The proof due to Borralleras et al. [5, Theorem 2] goes through. 

#### **4 Simulating WPOs by SPOs**

We show that WPOs are instances of SPOs by constructing a suitable order pair (⊒, ⊐) from a weakly monotone well-founded algebra A and a well-founded precedence ≿. For terms s = f(s1,...,sm), t = g(t1,...,tn) we write s ⊒ t if s ><sup>A</sup> t, or both s ≥<sup>A</sup> t and f ≿ g. Similarly, we define s ⊐ t if s ><sup>A</sup> t, or both s ≥<sup>A</sup> t and f ≻ g. It is worth noting that the proof of [27, Lemma 8] also combines the interpretation order and the precedence in a lexicographic manner.

**Lemma 4.** *The pair* (⊒, ⊐) *is a stable order pair with* ⊐ *well-founded.*

In the remaining part of the section we consider the WPO >wpo induced by A and ≿, and the SPO >spo induced by the corresponding order pair (⊒, ⊐). Note that ⊐ is not the strict part of ⊒ in general, as ><sup>A</sup> is not necessarily the strict part of ≥<sup>A</sup>. This is why we decoupled ⊐ from ⊒ in Definition 2; see also Remark 1.

*Example 5.* Let the signature be F = {f(1)}. Consider the trivial precedence with f ≈ f and the algebra A over the carrier N with the interpretation fA(x) = 2x. On the one hand, we have f(f(x)) ⊒ f(x) from f(f(x)) ≥<sup>A</sup> f(x), but not f(x) ⊒ f(f(x)), as f(x) ≥<sup>A</sup> f(f(x)) fails. On the other hand, f(f(x)) ⊐ f(x) does not hold.
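The two relations of this example can be checked by a small computation. Since f<sup>k</sup>(x) is interpreted as 2<sup>k</sup> · x, both ><sup>A</sup> and ≥<sup>A</sup> reduce to conditions on the number of f symbols; the following Python sketch (names are ours) makes the failure of the strict relation explicit:

```python
# f^k(x) denotes 2^k * x over the naturals, so for terms s = f^a(x), t = f^b(x):
#   s >=A t iff 2^a * v >= 2^b * v for all v in N, i.e. a >= b;
#   s >A  t demands 2^a * v > 2^b * v for all v, which fails at v = 0.

def geq_A(a, b):
    return a >= b

def gt_A(a, b):
    return False        # never holds because of the assignment v = 0

def sq_geq(a, b):       # the quasi-order: s >A t, or s >=A t and f above-or-equal f
    return gt_A(a, b) or geq_A(a, b)

def sq_gt(a, b):        # the strict order: s >A t, or s >=A t and f strictly above f
    return gt_A(a, b)   # the precedence part never fires for a single symbol

# f(f(x)) is above f(x) in the quasi-order while the converse fails, so the
# strict part of the quasi-order relates the two terms; nevertheless the
# strict order of the pair does not:
assert sq_geq(2, 1) and not sq_geq(1, 2)
assert not sq_gt(2, 1)
```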

We illustrate how the derivation of >wpo in Example 1 is simulated by the semantic path order.

*Example 6 (continued from Example 1).* From f(g(x)) ≥<sup>A</sup> g(f(f(x))) and f ≻ g the inequality f(g(x)) ⊐ g(f(f(x))) is obtained. Moreover, we have f(g(x)) ><sup>A</sup> f(f(x)). Since ≥<sup>A</sup> has the subterm property, the subterm f(x) of f(f(x)) also satisfies f(g(x)) ><sup>A</sup> f(x). Thus we obtain f(g(x)) ⊐ f(f(x)) and f(g(x)) ⊐ f(x). Therefore, f(g(x)) >spo g(f(f(x))) is verified as follows:

$$\dfrac{\mathsf{f}(\mathsf{g}(x)) \sqsupset \mathsf{g}(\mathsf{f}(\mathsf{f}(x))) \quad \dfrac{\mathsf{f}(\mathsf{g}(x)) \sqsupset \mathsf{f}(\mathsf{f}(x)) \quad \dfrac{\mathsf{f}(\mathsf{g}(x)) \sqsupset \mathsf{f}(x) \quad \dfrac{\dfrac{x \geq_{\mathsf{spo}} x}{\mathsf{g}(x) \geq_{\mathsf{spo}} x}\ \text{SPO 1}}{\mathsf{f}(\mathsf{g}(x)) >_{\mathsf{spo}} x}\ \text{SPO 1}}{\mathsf{f}(\mathsf{g}(x)) >_{\mathsf{spo}} \mathsf{f}(x)}\ \text{SPO 2a}}{\mathsf{f}(\mathsf{g}(x)) >_{\mathsf{spo}} \mathsf{f}(\mathsf{f}(x))}\ \text{SPO 2a}}{\mathsf{f}(\mathsf{g}(x)) >_{\mathsf{spo}} \mathsf{g}(\mathsf{f}(\mathsf{f}(x)))}\ \text{SPO 2a}$$

Similarly, f(h(x)) >spo h(h(f(x))) can be verified. Hence, the inclusion R ⊆ >spo holds. Observe that the use of WPO 1 in Example 1 is replaced by successive application of SPO 1 and SPO 2a.

As shown in the example, the subterm property of ≥<sup>A</sup> is the key to filling the gap between >spo and >wpo.

**Lemma 5.** *Suppose that* A *is simple. If* s >wpo t *then* s >spo t*.*

*Proof.* We prove the claim by induction on |s|+|t|. Let s = f(s1,...,sm) >wpo t. Depending on the derivation of s >wpo t, we distinguish five cases.


– Suppose that s >wpo t is derived from WPO 1, i.e., s ><sup>A</sup> t, where t = g(t1,...,tn). Then s ⊐ t, and the subterm property of ≥<sup>A</sup> together with compatibility yields s ><sup>A</sup> t<sup>j</sup> for all j. Thus:

$$\dfrac{s \sqsupset t \qquad \dfrac{\dfrac{\forall j.\ s >_{\mathcal{A}} t_j}{\forall j.\ s >_{\mathsf{wpo}} t_j}\ \text{WPO 1}}{\forall j.\ s >_{\mathsf{spo}} t_j}\ \text{I.H.}}{s >_{\mathsf{spo}} g(t_1, \dots, t_n) = t}\ \text{SPO 2a}$$

– Suppose that s >wpo t is derived as follows:

$$\frac{s \geq_{\mathcal{A}} t \qquad s_i \geq_{\mathsf{wpo}} t}{s = f(s_1, \dots, s_n) >_{\mathsf{wpo}} t}\ \text{WPO 2a}$$

By the induction hypothesis we have s<sup>i</sup> ≥spo t for some i, and thus s >spo t by SPO 1. – Suppose that s >wpo t is derived as follows:

$$\frac{s \geq_{\mathcal{A}} t \quad f \succ g \quad \forall j.\ s >_{\mathsf{wpo}} t_j}{s = f(s_1, \dots, s_n) >_{\mathsf{wpo}} g(t_1, \dots, t_m) = t}\ \text{WPO 2b(i)}$$

From s ≥<sup>A</sup> t and f ≻ g we obtain s ⊐ t. Thus, we have:

$$\dfrac{s \sqsupset t \qquad \dfrac{\forall j.\ s >_{\mathsf{wpo}} t_j}{\forall j.\ s >_{\mathsf{spo}} t_j}\ \text{I.H.}}{s = f(s_1, \dots, s_n) >_{\mathsf{spo}} g(t_1, \dots, t_m) = t}\ \text{SPO 2a}$$

– Suppose that s >wpo t is derived as follows:

$$\frac{s \geq_{\mathcal{A}} t \quad f \succsim g \quad \forall j.\ s >_{\mathsf{wpo}} t_j \quad (s_1, \dots, s_n) >^{\mathsf{lex}}_{\mathsf{wpo}} (t_1, \dots, t_m)}{s = f(s_1, \dots, s_n) >_{\mathsf{wpo}} g(t_1, \dots, t_m) = t}\ \text{WPO 2b(ii)}$$

From s ≥<sup>A</sup> t and f ≿ g we obtain s ⊒ t. Thus, we have:

$$\dfrac{s \sqsupseteq t \quad \dfrac{\forall j.\ s >_{\mathsf{wpo}} t_j}{\forall j.\ s >_{\mathsf{spo}} t_j}\ \text{I.H.} \quad \dfrac{(s_1, \dots, s_n) >^{\mathsf{lex}}_{\mathsf{wpo}} (t_1, \dots, t_m)}{(s_1, \dots, s_n) >^{\mathsf{lex}}_{\mathsf{spo}} (t_1, \dots, t_m)}\ \text{I.H.}}{s = f(s_1, \dots, s_n) >_{\mathsf{spo}} g(t_1, \dots, t_m) = t}\ \text{SPO 2b}$$

In any case we have s >spo t.

Next we prove the converse direction of Lemma 5. The next lemma is a basic property of WPOs.

**Lemma 6.** *If* s >wpo t *then* s ≥<sup>A</sup> t*.*

**Lemma 7.** *Suppose that* A *is simple. If* s >spo t *then* s >wpo t*.*

*Proof.* We prove the claim by induction on |s| + |t|. We distinguish three cases, depending on the derivation of s >spo t.

– Suppose that s >spo t is derived as follows:

$$\frac{s_i \geq_{\mathsf{spo}} t}{s = f(s_1, \dots, s_n) >_{\mathsf{spo}} t}\ \text{SPO 1}$$

The induction hypothesis yields s<sup>i</sup> ≥wpo t for some i. By Lemma 6 and the subterm property of ≥<sup>A</sup> we have s ≥<sup>A</sup> t. Thus, we obtain the following derivation of s >wpo t:

$$\frac{s \geq_{\mathcal{A}} t \qquad s_i \geq_{\mathsf{wpo}} t}{s = f(s_1, \dots, s_n) >_{\mathsf{wpo}} t}\ \text{WPO 2a}$$

– Suppose that s >spo t is derived as follows:

$$\frac{s \sqsupset t \quad \forall j.\ s >_{\mathsf{spo}} t_j}{s = f(s_1, \dots, s_n) >_{\mathsf{spo}} g(t_1, \dots, t_n) = t}\ \text{SPO 2a}$$

According to the definition of s ⊐ t, we further distinguish two subcases. If s >A t then s >wpo t is immediate. Otherwise, s ≥A t and f ≻ g hold. In this case we derive s >wpo t as follows:

$$\frac{s \geq_{\mathcal{A}} t \quad f \succ g \quad \dfrac{\forall j.\ s >_{\mathsf{spo}} t_j}{\forall j.\ s >_{\mathsf{wpo}} t_j}\ \text{I.H.}}{s = f(s_1, \ldots, s_n) >_{\mathsf{wpo}} g(t_1, \ldots, t_m) = t}\ \text{WPO 2b(i)}$$

– Suppose that s >spo t is derived as follows:

$$\frac{s \sim t \quad \forall j.\ s >_{\mathsf{spo}} t_j \quad (s_1, \ldots, s_n) >_{\mathsf{spo}}^{\mathsf{lex}} (t_1, \ldots, t_m)}{s = f(s_1, \ldots, s_n) >_{\mathsf{spo}} g(t_1, \ldots, t_m) = t}\ \text{SPO 2b}$$

Because of s ∼ t, we have s >A t or both s ≥A t and f ∼ g. In the former case s >wpo t is immediate. In the latter case s >wpo t is derived by WPO 2b(ii) as follows:

$$\frac{s \geq_{\mathcal{A}} t \quad f \sim g \quad \dfrac{\forall j.\ s >_{\mathsf{spo}} t_j}{\forall j.\ s >_{\mathsf{wpo}} t_j}\ \text{I.H.} \quad \dfrac{(s_1, \ldots, s_n) >_{\mathsf{spo}}^{\mathsf{lex}} (t_1, \ldots, t_m)}{(s_1, \ldots, s_n) >_{\mathsf{wpo}}^{\mathsf{lex}} (t_1, \ldots, t_m)}\ \text{I.H.}}{s = f(s_1, \ldots, s_n) >_{\mathsf{wpo}} g(t_1, \ldots, t_m) = t}\ \text{WPO 2b(ii)}$$

In any case we have s >wpo t.

As a consequence, >wpo and >spo coincide, provided that A is simple. This result can be extended to monotonic semantic path orders.

**Lemma 8.** *The triple* (≥A, ∼, ⊐) *is a reduction triple.* 

Let >mspo denote the monotonic semantic path order induced from ≥A and >spo. Since s >wpo t implies s ≥A t (Lemma 6), s >mspo t is equivalent to s >spo t. By using this equivalence together with Lemmata 5 and 7, we obtain the following result.

**Theorem 4.** *The three orders* >wpo*,* >spo*, and* >mspo *coincide, provided that* A *is simple.* 

#### **5 Generalized Weighted Path Orders**

According to Theorem 4, weighted path orders can be defined as monotonic semantic path orders. Moreover, Lemma 8 reveals that the construction of reduction triples is valid even for non-simple algebras. This observation suggests a generalization of weighted path orders which does not impose simplicity on algebras. In addition, we exploit the fact that stable order pairs need not be closed under contexts by marking the root symbols of function applications; see [1] and [5, Definition 5].

Let F be a signature. With each f ∈ F we associate a marked function symbol f♯ ∉ F of the same arity. The set {f♯ | f ∈ F} is denoted by F♯. For each term t = f(t1,...,tn) ∈ T(F, V) we denote f♯(t1,...,tn) by t♯. Let A be a weakly monotone well-founded (F ∪ F♯)-algebra and ≻ a well-founded precedence on F. The pair (∼♯, ⊐♯) of relations on T(F, V) \ V is defined as follows: Let s = f(s1,...,sn) and t = g(t1,...,tm). We write s ∼♯ t if s♯ >A t♯, or s♯ ≥A t♯ and f ∼ g. Similarly, we write s ⊐♯ t if s♯ >A t♯, or s♯ ≥A t♯ and f ≻ g. The relation ≽ is defined as the restriction of ≥A to T(F, V).

**Proposition 2.** *The triple* (≽, ∼♯, ⊐♯) *is a reduction triple on* T(F, V)*.* 

**Definition 5.** *The* generalized weighted path order (GWPO) >gwpo *induced from* A *and* ≻ *is the monotonic semantic path order induced from* (≽, ∼♯, ⊐♯)*.*

**Corollary 1.** *Every generalized weighted path order is a reduction order, provided that the signature is bounded.* 

For convenience, we reformulate the definition of >gwpo in the style of Definition 1.

**Definition 6.** *The relation* >wpo♯ *is defined on terms as follows:* s >wpo♯ t *if* s = f(s1,...,sn) *and one of the following conditions holds.*


**Proposition 3.** *The SPO* >spo *induced from* (∼♯, ⊐♯) *coincides with* >wpo♯*. For all terms* s *and* t *the relation* s >gwpo t *is equivalent to* s ≥A t *and* s >wpo♯ t*.* 

**Corollary 2.** *The relations* >gwpo *and* >wpo *coincide, provided that* A *is simple and* fA(x1,...,xn) = f♯A(x1,...,xn) *for all* f(n) ∈ F*.* 

Since polynomial interpretation orders [15] and Knuth–Bendix orders [13] as well as LPOs are simulated by WPOs [27], they are also subsumed by GWPOs. We demonstrate termination proofs by GWPOs with a few examples. None of these examples can be handled by WPOs.

*Example 7.* Consider the TRS R for round-up division:

$$\begin{aligned} \mathfrak{p}(\mathfrak{0}) &\to \mathfrak{0} & \quad x - \mathfrak{0} \to x & \quad \mathfrak{0} \div \mathfrak{s}(y) \to \mathfrak{0} \\ \mathfrak{p}(\mathfrak{s}(x)) &\to x & \quad x - \mathfrak{s}(y) \to \mathfrak{p}(x) - y & \quad \mathfrak{s}(x) \div \mathfrak{s}(y) \to \mathfrak{s}((x - y) \div \mathfrak{s}(y)) \end{aligned}$$

Let A be the weakly monotone algebra on N with the interpretations

$$\begin{aligned} \mathsf{0}_{\mathcal{A}} &= 0 & \mathsf{s}_{\mathcal{A}}(x) &= x + 1 & \mathsf{p}_{\mathcal{A}}(x) &= x & x -_{\mathcal{A}} y &= x & x \div_{\mathcal{A}} y &= x \\ \mathsf{0}^{\sharp}_{\mathcal{A}} &= 0 & \mathsf{s}^{\sharp}_{\mathcal{A}}(x) &= 0 & \mathsf{p}^{\sharp}_{\mathcal{A}}(x) &= 0 & x -^{\sharp}_{\mathcal{A}} y &= y & x \div^{\sharp}_{\mathcal{A}} y &= x + y \end{aligned}$$

and let ≻ be an arbitrary precedence. The GWPO induced from A and ≻ orients all rules in R. In particular, x − s(y) >wpo♯ p(x) − y is derived from the inequalities x −♯ s(y) >A p(x) −♯ y and x −♯ s(y) >A p♯(x).
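The inequalities of Example 7 can be checked numerically. The following sketch is our own script (the lambda names are ours); sampling small naturals merely illustrates, rather than proves, the inequalities over all of N.

```python
# Interpretations of Example 7 as Python functions, checked on the
# representative rule x - s(y) -> p(x) - y.
s_    = lambda x: x + 1   # s_A
p     = lambda x: x       # p_A
p_m   = lambda x: 0       # p#_A (marked)
sub   = lambda x, y: x    # x -_A y
sub_m = lambda x, y: y    # x -#_A y (marked)

for x in range(10):
    for y in range(10):
        assert sub(x, s_(y)) >= sub(p(x), y)     # weak decrease in A
        assert sub_m(x, s_(y)) > sub_m(p(x), y)  # x -# s(y) >_A p(x) -# y
        assert sub_m(x, s_(y)) > p_m(x)          # x -# s(y) >_A p#(x)
```

The strict decrease in the marked interpretation hinges on x −♯A y = y, which is not a simple interpretation.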

*Example 8.* Consider the TRS R taken from [2, Example 4.28], which computes the bit length of a natural number:

$$\begin{array}{lll} \mathsf{half}(\mathsf{0}) \to \mathsf{0} & \mathsf{half}(\mathsf{s}(\mathsf{0})) \to \mathsf{0} & \mathsf{half}(\mathsf{s}(\mathsf{s}(x))) \to \mathsf{s}(\mathsf{half}(x)) \\ \mathsf{bits}(\mathsf{0}) \to \mathsf{0} & \mathsf{bits}(\mathsf{s}(x)) \to \mathsf{s}(\mathsf{bits}(\mathsf{half}(\mathsf{s}(x)))) \end{array}$$

Let A be the weakly monotone algebra on N with:

$$\begin{aligned} \mathsf{0}_{\mathcal{A}} &= 0 & \mathsf{s}_{\mathcal{A}}(x) &= x + 1 & \mathsf{half}_{\mathcal{A}}(x) &= \max\{0, x - 1\} & \mathsf{bits}_{\mathcal{A}}(x) &= x \\ \mathsf{0}^{\sharp}_{\mathcal{A}} &= 0 & \mathsf{s}^{\sharp}_{\mathcal{A}}(x) &= x + 1 & \mathsf{half}^{\sharp}_{\mathcal{A}}(x) &= \max\{0, x - 1\} & \mathsf{bits}^{\sharp}_{\mathcal{A}}(x) &= x \end{aligned}$$

The GWPO >gwpo induced by A and a precedence ≻ with half, bits ≻ s satisfies R ⊆ >gwpo as ℓ ≥A r and ℓ >wpo♯ r for all rules ℓ → r ∈ R. In particular, bits(s(x)) >wpo♯ s(bits(half(s(x)))) is derived as follows. The inequality bits(s(x)) >wpo♯ bits(half(s(x))) is derived by repeated application of WPO 2a:

$$\frac{\mathsf{bits}^{\sharp}(\mathsf{s}(x)) >_{\mathcal{A}} \mathsf{bits}^{\sharp}(\mathsf{half}(\mathsf{s}(x))) \quad \dfrac{\mathsf{bits}^{\sharp}(\mathsf{s}(x)) >_{\mathcal{A}} \mathsf{half}^{\sharp}(\mathsf{s}(x)) \quad \dfrac{\mathsf{s}(x) \gtrsim_{\mathsf{wpo}^{\sharp}} \mathsf{s}(x)}{\mathsf{bits}(\mathsf{s}(x)) >_{\mathsf{wpo}^{\sharp}} \mathsf{s}(x)}}{\mathsf{bits}(\mathsf{s}(x)) >_{\mathsf{wpo}^{\sharp}} \mathsf{half}(\mathsf{s}(x))}}{\mathsf{bits}(\mathsf{s}(x)) >_{\mathsf{wpo}^{\sharp}} \mathsf{bits}(\mathsf{half}(\mathsf{s}(x)))}$$

Thus, bits(s(x)) >wpo♯ s(bits(half(s(x)))) follows from WPO 2b with bits ≻ s and bits♯(s(x)) ≥A s♯(bits(half(s(x)))).
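As for Example 7, the key inequalities of Example 8 can be replayed numerically; this is our own sanity-check script, and sampling small naturals stands in for reasoning over all of N.

```python
# Interpretations of Example 8 as Python functions.
s      = lambda x: x + 1          # s_A = s#_A
half   = lambda x: max(0, x - 1)  # half_A = half#_A
bits   = lambda x: x              # bits_A
bits_m = lambda x: x              # bits#_A

for x in range(50):
    assert bits_m(s(x)) > bits_m(half(s(x)))    # bits#(s(x)) >_A bits#(half(s(x)))
    assert bits_m(s(x)) > half(s(x))            # bits#(s(x)) >_A half#(s(x))
    assert bits_m(s(x)) >= s(bits(half(s(x))))  # bits#(s(x)) >=_A s#(bits(half(s(x))))
```

Note that half_A(x) = max{0, x − 1} is not simple, which is exactly why this example lies outside the reach of WPOs.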

#### **6 Experimental Results**

In order to evaluate GWPOs in termination analysis we implemented a prototype termination tool based on Proposition 1 and Corollary 1. Following the automation techniques of WPO [27], we search for a suitable weakly monotone well-founded algebra in two classes of algebras over N: *linear interpretations* and *max/plus interpretations*. Since simplicity of algebras is not required for GWPOs, we may use more general forms of interpretations.

*Linear Interpretations.* Algebras A of this class use linear polynomials over N as in Example 7. For each f(n) ∈ F ∪ F♯ its interpretation is of the form fA(x1,...,xn) = c0 + c1x1 + ··· + cnxn where c0 ∈ N and c1,...,cn ∈ {0, 1}. Simple monotone algebras for WPOs<sup>3</sup> are obtained by setting c1 = ··· = cn = 1 and fA = f♯A for all f(n) ∈ F, and those for Knuth–Bendix orders (KBOs) are obtained by further restrictions for admissibility, see [27]. Comparison of linear polynomials is reduced to comparison of coefficients by the following trivial fact:

**Proposition 4.** *Let* f(x1,...,xn) = c<sup>0</sup> + c1x<sup>1</sup> + ··· + cnx<sup>n</sup> *and* g(x1,...,xn) = d<sup>0</sup> + d1x<sup>1</sup> + ··· + dnx<sup>n</sup> *be linear polynomials over* N*. The next statements hold.*

*–* f ≥ g *if and only if* c0 ≥ d0 *and* ci ≥ di *for all* 1 ≤ i ≤ n*.*
*–* f > g *if and only if* c0 > d0 *and* ci ≥ di *for all* 1 ≤ i ≤ n*.*

*Here* f ≥ g *(*f > g*) means that* f(a1,...,an) ≥ g(a1,...,an) *(*f(a1,...,an) > g(a1,...,an)*) for all* a1,...,an ∈ N*.*
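Proposition 4 turns a pointwise comparison over N into a finite coefficient test. A sketch in Python (our own encoding: a linear polynomial c0 + c1x1 + ··· + cnxn is the coefficient tuple (c0, c1, ..., cn)):

```python
from itertools import product

def poly_geq(c, d):
    """f >= g on all of N^n iff every coefficient dominates."""
    return all(ci >= di for ci, di in zip(c, d))

def poly_gt(c, d):
    """f > g on all of N^n iff c0 > d0 and the remaining coefficients dominate."""
    return c[0] > d[0] and poly_geq(c[1:], d[1:])

def evaluate(c, point):
    """Evaluate the polynomial with coefficients c at the given point."""
    return c[0] + sum(ci * xi for ci, xi in zip(c[1:], point))

# cross-check the coefficient test against evaluation on sample points
f, g = (3, 1, 0), (2, 1, 0)
assert poly_gt(f, g)
assert all(evaluate(f, pt) > evaluate(g, pt) for pt in product(range(4), repeat=2))
```

Necessity of c0 > d0 in the strict case follows by evaluating both polynomials at the zero point.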

*Max/plus Interpretations.* Algebras A of this class use a combination of + and max as in Example 8. For each f(n) ∈ F ∪ F♯ its interpretation is of the form fA(x1,...,xn) = max{c0, c1 + c′1x1, ··· , cn + c′nxn} where c0 ∈ N, c1,...,cn ∈ Z and c′1,...,c′n ∈ {0, 1}. Simple monotone algebras for WPOs are obtained by imposing c1,...,cn ∈ N, c′1 = ··· = c′n = 1, and fA = f♯A for all f(n) ∈ F, and algebras for lexicographic path orders (LPOs) are obtained by additionally setting c0 = c1 = ··· = cn = 0 for all f(n) ∈ F as in [27]. The restriction c1,...,cn ∈ N is necessary for WPOs because allowing c1,...,cn < 0 results in non-simple interpretations such as max{0, x − 1}. Under this form of algebras, the interpretation of a term can be flattened to the form max{g1,...,gm} where g1,...,gm are linear polynomials over N. So comparison of max/plus interpretations is reduced to comparison of coefficients, using the following trivial fact and Proposition 4 in turn:

**Proposition 5.** *Let* G *and* H *be non-empty sets of linear polynomials over* N*. The next statements hold.*


Since precedence constraints can be regarded as inequalities on natural numbers [28], searching for a suitable combination of a precedence and an interpretation amounts to solving linear arithmetic constraints (with if-then-else expressions).

The problem set for experiments consists of 1511 term rewrite systems from version 11.3 of the Termination Problem Database (TPDB) [21]. The reference implementation uses the SMT solver Z3 [17] as an external tool for solving linear constraints. The experiments were run on a PC with Intel Core i7-1065G7 CPU (1.30 GHz) and 16 GB memory.

<sup>3</sup> Our WPOs based on linear interpretations correspond to WPO(*Sum*) by Yamada et al. [27] but without status functions.


**Table 1.** Experiments on 1511 TRSs from TPDB 11.3.

Now let us discuss the experimental results.<sup>4</sup> Table 1 shows that, as a whole, the use of non-simple algebras substantially improves termination analysis, at the small cost of extra running time. In particular, in the case of linear interpretations, GWPOs significantly outperform WPOs. As a matter of fact, linear WPOs are unable to orient variable-duplicating rules ℓ → r such as f(x) → g(x, x), since ℓ ≥A r cannot be satisfied, but this does not apply to GWPOs based on linear interpretations with {0, 1}-coefficients. In the case of max/plus interpretations there are two TRSs (with over 100 rules) that are proved terminating by WPOs, but not by GWPOs due to the time limit. This indicates that using non-simple algebras for max/plus interpretations can result in an increased search space. This is not the case for linear interpretations.

#### **7 Simulating Dependency Pairs by GWPOs**

The power of GWPOs revealed in Sect. 6 can partly be explained by the fact that GWPOs are capable of simulating a basic result of the dependency pair method [1]. To show this, we recall the dependency pair method. The set DP(R) of *dependency pairs* of a TRS R is defined as follows:

$$\mathsf{DP}(\mathcal{R}) = \{ \ell^{\sharp} \to g^{\sharp}(t_1, \dots, t_n) \mid \ell \to r \in \mathcal{R}, \, r \unrhd g(t_1, \dots, t_n), \,\text{and } g \in \mathcal{D}_{\mathcal{R}} \}$$

An order pair (≿, ≻) on terms is a *reduction pair* if ≿ is a rewrite preorder and ≻ is a well-founded stable order. The following theorem states a basic result of the dependency pair method.

**Theorem 5 (**[1]**).** *A TRS* R *is terminating if* R ⊆ ≿ *and* DP(R) ⊆ ≻ *for some reduction pair* (≿, ≻)*.*

We illustrate Theorem 5, using the fact that every weakly monotone algebra A on N induces the reduction pair (≥A, >A).

*Example 9.* Consider the TRS R = {f(f(x)) → f(g(f(x))), f(x) → g(x)}. We show the termination of R using Theorem 5. The set DP(R) consists of the two dependency pairs:

$$\mathsf{f}^{\sharp}(\mathsf{f}(x)) \to \mathsf{f}^{\sharp}(\mathsf{g}(\mathsf{f}(x))) \qquad \qquad \qquad \mathsf{f}^{\sharp}(\mathsf{f}(x)) \to \mathsf{f}^{\sharp}(x)$$

<sup>4</sup> The implementation and the detailed experimental data are available at: https://www.jaist.ac.jp/project/maxcomp/23frocos/

By taking the {f, g, f♯, g♯}-algebra A with the interpretations

$$\mathsf{f}\_{\mathcal{A}}(x) = x + 1 \qquad \qquad \mathsf{g}\_{\mathcal{A}}(x) = 0 \qquad \qquad \mathsf{f}\_{\mathcal{A}}^{\sharp}(x) = x \qquad \qquad \mathsf{g}\_{\mathcal{A}}^{\sharp}(x) = 1$$

the inclusions R ⊆ ≥A and DP(R) ⊆ >A hold. Hence, R is terminating.
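The DP(R) construction can be sketched in code (our own encoding: terms are nested tuples, variables are strings, and a marked symbol f♯ is spelled "f#"); applied to the TRS of Example 9 it yields exactly the two dependency pairs listed above.

```python
def subterms(t):
    """All subterms of t, including t itself."""
    yield t
    if not isinstance(t, str):       # variables (strings) have no arguments
        for arg in t[1:]:
            yield from subterms(arg)

def mark(t):
    """Replace the root symbol f of t by its marked version f#."""
    return (t[0] + "#",) + t[1:]

def dependency_pairs(rules):
    defined = {lhs[0] for lhs, _ in rules}   # D_R: root symbols of left-hand sides
    return {(mark(l), mark(u))
            for l, r in rules
            for u in subterms(r)
            if not isinstance(u, str) and u[0] in defined}

# the TRS {f(f(x)) -> f(g(f(x))), f(x) -> g(x)} has two dependency pairs
R = [(("f", ("f", "x")), ("f", ("g", ("f", "x")))),
     (("f", "x"), ("g", "x"))]
assert dependency_pairs(R) == {
    (("f#", ("f", "x")), ("f#", ("g", ("f", "x")))),
    (("f#", ("f", "x")), ("f#", "x")),
}
```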

We show that every termination proof by Theorem 5 with a weakly monotone algebra on N can be simulated by a GWPO. This class of algebras includes the linear polynomial interpretations and max/plus interpretations described in Sect. 6. Let R be a TRS and A a weakly monotone (F ∪ F♯)-algebra on N satisfying R ⊆ ≥A and DP(R) ⊆ >A. Define the (F ∪ F♯)-algebra B on N by

$$\begin{aligned} f\_{\mathcal{B}}(a\_1, \dots, a\_n) &= f\_{\mathcal{A}}(a\_1, \dots, a\_n) \\ f\_{\mathcal{B}}^\sharp(a\_1, \dots, a\_n) &= \begin{cases} f\_{\mathcal{A}}^\sharp(a\_1, \dots, a\_n) + 1 & \text{if } f \in \mathcal{D}\_{\mathcal{R}} \\ 0 & \text{otherwise} \end{cases} \end{aligned}$$

for each f(n) ∈ F. Let >gwpo and >wpo♯ denote the orders induced from B and an arbitrary but fixed precedence. First, let us see that R ⊆ >gwpo holds for the last example.

*Example 10 (continued from Example 9).* The corresponding algebra B is:

$$\mathfrak{f}\_{\mathfrak{B}}(x) = x + 1 \qquad \qquad \mathfrak{g}\_{\mathfrak{B}}(x) = 0 \qquad \qquad \mathfrak{f}\_{\mathfrak{B}}^{\sharp}(x) = x + 1 \qquad \qquad \mathfrak{g}\_{\mathfrak{B}}^{\sharp}(x) = 0$$

We have R ⊆ ≥B by construction. The inequality f(f(x)) >wpo♯ f(g(f(x))) is derived by successive application of WPO 2a as follows:

$$\frac{\mathsf{f}^{\sharp}(\mathsf{f}(x)) >_{\mathcal{B}} \mathsf{f}^{\sharp}(\mathsf{g}(\mathsf{f}(x))) \quad \dfrac{\mathsf{f}^{\sharp}(\mathsf{f}(x)) >_{\mathcal{B}} \mathsf{g}^{\sharp}(\mathsf{f}(x)) \quad \dfrac{\mathsf{f}^{\sharp}(\mathsf{f}(x)) >_{\mathcal{B}} \mathsf{f}^{\sharp}(x) \quad \mathsf{f}(x) \gtrsim_{\mathsf{wpo}^{\sharp}} \mathsf{f}(x)}{\mathsf{f}(\mathsf{f}(x)) >_{\mathsf{wpo}^{\sharp}} \mathsf{f}(x)}}{\mathsf{f}(\mathsf{f}(x)) >_{\mathsf{wpo}^{\sharp}} \mathsf{g}(\mathsf{f}(x))}}{\mathsf{f}(\mathsf{f}(x)) >_{\mathsf{wpo}^{\sharp}} \mathsf{f}(\mathsf{g}(\mathsf{f}(x)))}$$

The inequality f(x) >wpo♯ g(x) follows from f♯(x) >B g♯(x). Hence R ⊆ >gwpo. Note that neither f♯(f(x)) >A g♯(f(x)) nor f♯(x) >A g♯(x) holds.
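The inequalities used in Example 10 can be replayed numerically. This is our own sanity-check script; the interpretations of A and B are ordinary Python functions, and sampling small naturals stands in for reasoning over all of N.

```python
f_A  = lambda x: x + 1   # f_A = f_B
g_A  = lambda x: 0       # g_A = g_B
fs_A = lambda x: x       # f#_A
gs_A = lambda x: 1       # g#_A

# B adds 1 to f#_A because f is defined in R; g is not defined, so g#_B = 0
fs_B = lambda x: fs_A(x) + 1
gs_B = lambda x: 0

for x in range(20):
    # R is included in >=_B (the unmarked interpretations are unchanged)
    assert f_A(f_A(x)) >= f_A(g_A(f_A(x)))
    assert f_A(x) >= g_A(x)
    # the inequalities used in the derivations of Example 10
    assert fs_B(f_A(x)) > fs_B(g_A(f_A(x)))   # f#(f(x)) >_B f#(g(f(x)))
    assert fs_B(f_A(x)) > fs_B(x)             # f#(f(x)) >_B f#(x)
    assert fs_B(x) > gs_B(x)                  # f#(x) >_B g#(x)

# by contrast, f#(x) >_A g#(x) fails in A (e.g. at x = 0)
assert not all(fs_A(x) > gs_A(x) for x in range(20))
```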

Now we verify that R ⊆ >gwpo holds in general. By construction R ⊆ ≥B is immediate from R ⊆ ≥A. So it remains to show R ⊆ >wpo♯. We prove the following stronger property.

**Lemma 9.** *Let* ℓ → r ∈ R*. For every subterm* t *of* r *the relation* ℓ >wpo♯ t *holds.*

*Proof.* We use structural induction on t. If t is a variable x, then x must be a subterm of ℓ, and thus ℓ >wpo♯ t. Otherwise, t is of the form g(t1,...,tn). The induction hypothesis yields ℓ >wpo♯ tj for all 1 ≤ j ≤ n. We claim ℓ♯ >B t♯, from which the desired inequality ℓ >wpo♯ t follows by WPO 2a. To show the claim, consider an arbitrary assignment α for B. Depending on g, we distinguish two cases.


In either case [α]B(ℓ♯) > [α]B(t♯) is obtained. Hence, ℓ♯ >B t♯ holds.

**Theorem 6.** *The inclusion* R ⊆ >gwpo *holds.*

#### **8 Conclusion**

We have shown that weighted path orders can be simulated by a suitable variant of SPOs based on order pairs, and introduced a generalization of WPOs whose termination proving power goes beyond the realm of simple termination. To conclude the paper, we discuss related work and future work.

*Simulating KBOs by SPOs.* A key observation for simulating WPOs by SPOs is that weight comparison can be simulated by successive application of SPO 1 and SPO 2a, as observed in Example 6. Another observation is that the SPOs are already reduction orders without the help of harmonious rewrite preorders. These two observations are due to Geser's work [9, Theorem 5], where it is shown that extended KBOs [7, Sect. 5] can be simulated by SPOs. Unifying our result and Geser's result is future work.

*General Path Orders.* In this paper the lexicographic versions of path orders were investigated. However, it is very likely that the same result can be obtained even if we adopt multiset comparison or status functions. General path orders (GPOs) [8,10] are a unifying framework for such extensions, parameterizing the way to compare arguments. It is worth investigating simulation results between GPOs and WPOs by extending the parameters of GPOs so as to take order pairs.

*Reduction Pairs Based on WPOs.* In order to build reduction pairs from WPOs, Yamada et al. [27, Sect. 4] extended the definition of WPOs by the notion of *partial status function* π. The extension allows us to specify the argument positions π(f) = [i1,...,im] compared in WPO 2b and WPO 2b(ii) for each function symbol f(n) ∈ F ∪ F♯. We anticipate that partial status functions can also be integrated into GWPOs and that the resulting version characterizes the reduction pair version of WPOs.

**Acknowledgements.** We are grateful to Vincent van Oostrom for his valuable questions and comments on our preliminary work. We also thank Alfons Geser for his support on literature. The suggestions by the anonymous referees greatly helped to improve the presentation of the paper.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# KBO Constraint Solving Revisited

Yasmine Briefs1,2(B) , Hendrik Leidinger1,2 , and Christoph Weidenbach<sup>1</sup>

<sup>1</sup> Max Planck Institute for Informatics, Saarbrücken, Germany {ybriefs,hleiding,weidenbach}@mpi-inf.mpg.de <sup>2</sup> Graduate School of Computer Science, Saarland Informatics Campus, Saarbrücken, Germany

Abstract. KBO constraint solving is well known to be an NP-complete problem. Motivated by the needs of the family of SCL calculi, we consider the particular case where all terms occurring in a constraint are bound by a (single) ground term. We show that this problem and variants thereof remain NP-complete even if the form of atoms in the constraint is further restricted. In addition, constraint solving remains NP-complete for a non-strict, partial term ordering based solely on symbol counting. Nevertheless, we provide a new simple algorithm testing KBO constraint solvability that performs well on benchmark examples.

Keywords: KBO Constraint Solving · NP-complete problem · Weight Ordering Constraint Solving

# 1 Introduction

The family of SCL calculi (Clause Learning from Simple Models) [2,5,13] perform reasoning on a set of first-order clauses. They develop a trail of ground literals with respect to a ground term (atom) bound β and an ordering ≺. All ground literals on the trail are ≺ (or ⪯) smaller than the ground term (atom) β, and ≺ should in particular have the property that for any term t there are only finitely many literals s such that s ≺ t. In case SCL does not detect a conflict with respect to a finite, exhaustive trail of ground literals, they constitute a model candidate for the clause set [4]. If SCL detects a conflict it learns a new first-order non-ground clause. It is derived by resolution and factoring with guidance from the trail. A natural choice for the ordering ≺ is the Knuth-Bendix ordering (KBO) [9]. For the ground case, a KBO relation can be efficiently computed [14]. All SCL calculi propagate literals from clauses with respect to the trail. For example, given a trail [P(a)] and a clause ¬P(x) ∨ R(x, y), the literal R(a, y) could be propagated. The SCL theory only permits ground literals on the trail; however, in practice it is not feasible to put all groundings of R(a, y) on the trail that are ≺ smaller than β. Therefore, we already considered trail literals with variables when we developed a two-watched-literal scheme for SCL [3]. Recall that this propagation situation is not exceptional, as typically not all literals in a clause carry all occurring variables. The consequence of this extension is that for SCL we now need to decide solvability of conjunctions of inequations t_i ≺ β where the t_i may contain (shared) variables, i.e., we have to decide solvability of a particular form of KBO constraints if ≺ is the KBO.

For the SCL(EQ) calculus [13] the requirements on constraint solving get more sophisticated. Now the trail is a sequence of unit (in)equalities, and propagation and conflicting clauses are decided with respect to the resulting congruence. For an extended congruence closure algorithm [6,8,15,16] we now need, in addition to inequations t_i ≺ β, to consider inequalities t_i ≠ s_i in order to separate congruence classes. In its simplest form, constraints consist of inequations t_i ≺ β and inequalities t_i ≠ s_i where β and the s_i are ground, so called *simple right-ground* constraints, Definition 4. In a more general setting, the s_i carry variables and then a quantifier alternation on variables occurring in s_i but not in t_i needs to be considered. Such constraints are called *alternating*, Definition 27.

In this paper we investigate the complexity of all these variants with respect to a KBO ≺ (Definitions 3, 4, 25, 27), but also a weaker non-strict ordering based on pure symbol counting (Definition 22). Except for constraints bound by a single ground term (Proposition 26), all problems are NP-hard (Propositions 5, 21, 24, 28).

Korovin and Voronkov developed a decision procedure [10] for KBO constraints consisting of inequations s_j < t_j only and refined it to an NP algorithm [11]. According to Löchner [14], these results are "of more theoretical interest" because they are "too involved to be implemented with reasonable effort". In fact, to the best of our knowledge we present the first implemented algorithm for KBO constraint solving in this paper. Later, Korovin and Voronkov [12] showed that checking satisfiability of a KBO constraint consisting of a single inequation s < t can be done in polynomial time. For the special case of a right-ground constraint consisting of a single inequation s < t, what their algorithm essentially does is to assign the minimal constant to every variable.

To the best of our knowledge the problem of simple right-ground KBO constraints has never been studied before. We are also not aware of any implementation of a KBO constraint solving algorithm. The paper is organized as follows: In Sect. 3 we prove the NP-completeness of this problem and present an algorithm to solve it. In Sect. 4 we study the complexity of variants of this problem, including alternating constraints. We also consider a non-strict, partial ordering that is based on symbol counting and weaker than a KBO. The algorithm for right-ground constraints is extended to alternating constraints. In Sect. 5 we put the algorithms developed in Sects. 3 and 4 into practice, and we end the paper with a discussion of the obtained results in Sect. 6.

#### 2 Preliminaries

In the following let Σ be a *signature*, i.e., a finite set of function symbols. Every function symbol f has an associated *arity* which we denote by arity(f). Function symbols c with arity(c) = 0 are called *constants*. We denote the set of all *terms* by T(Σ, X) where X is an infinite set of variables. *Vars*(t) denotes the set of variables occurring in the term t. A term t is called *ground* if it contains no variables, i.e., *Vars*(t) = ∅. The set of all *ground terms* is denoted by T(Σ). We assume that Σ contains at least one non-constant function symbol and at least one constant, i.e., that T(Σ) is infinite, for otherwise constraint solving becomes trivial. A *substitution* is a mapping σ : X → T(Σ, X) such that σ(x) ≠ x for only finitely many x ∈ X. The application tσ of a substitution σ to a term t ∈ T(Σ, X) is defined in the usual way. We call a substitution *grounding* for some term t ∈ T(Σ, X) if tσ is ground. A substitution σ is a *matcher* from s to t if sσ = t. We consider the following version of the Knuth-Bendix ordering (KBO) on ground terms:
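These notions can be made concrete in code. A minimal sketch (our own encoding, not from the paper): a variable is a plain string like "x", a function application f(t1,...,tn) is the tuple ("f", t1, ..., tn), and a constant is the 1-tuple ("a",).

```python
def is_var(t):
    return isinstance(t, str)

def variables(t):
    """Vars(t): the set of variables occurring in term t."""
    if is_var(t):
        return {t}
    return set().union(set(), *(variables(s) for s in t[1:]))

def apply_subst(t, sigma):
    """Apply the substitution sigma (a dict; identity elsewhere) to t."""
    if is_var(t):
        return sigma.get(t, t)
    return (t[0],) + tuple(apply_subst(s, sigma) for s in t[1:])

def matcher(s, t, sigma=None):
    """Return a matcher sigma with apply_subst(s, sigma) == t, or None."""
    sigma = {} if sigma is None else sigma
    if is_var(s):
        if s in sigma and sigma[s] != t:
            return None                       # conflicting binding
        sigma[s] = t
        return sigma
    if is_var(t) or s[0] != t[0] or len(s) != len(t):
        return None                           # symbol or arity clash
    for si, ti in zip(s[1:], t[1:]):
        if matcher(si, ti, sigma) is None:
            return None
    return sigma
```

For instance, matching f(x, a) against f(g(a), a) binds x to g(a), while f(x, x) does not match f(a, g(a)) because the two occurrences of x would need different bindings.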

Definition 1 (KBO on Ground Terms [9]). *Let* ≻ *be a strict total ordering (a precedence) on* Σ*, and* w : Σ → N⁺ *a weight function. The function* w *is extended to terms recursively by* w(f(t1,...,tn)) = w(f) + w(t1) + ··· + w(tn)*. The Knuth-Bendix ordering* >KBO *induced by* ≻ *and* w *is defined by* s >KBO t *iff*

$$\begin{aligned} &1.\ w(s) > w(t), \text{ or} \\ &2.\ w(s) = w(t), \text{ and} \\ &\quad (a)\ s = f(s_1, \ldots, s_m),\ t = g(t_1, \ldots, t_n) \text{ and } f \succ g, \text{ or} \\ &\quad (b)\ s = f(s_1, \ldots, s_m),\ t = f(t_1, \ldots, t_m) \text{ and } (s_1, \ldots, s_m) >_{KBO}^{lex} (t_1, \ldots, t_m). \end{aligned}$$

In particular, the precedence is strict and total, no unary function f with w(f)=0 is allowed and all weights are natural numbers. It can be shown that >KBO is a strict, total and well-founded ordering on ground terms. In the following, we simply write > for >KBO.
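Definition 1 can be implemented directly. A sketch (our own encoding: ground terms as nested tuples, weights as a dict, and the strict total precedence as an integer rank, higher rank meaning bigger):

```python
def weight(t, w):
    """w(f(t1,...,tn)) = w(f) + w(t1) + ... + w(tn)."""
    return w[t[0]] + sum(weight(s, w) for s in t[1:])

def kbo_greater(s, t, w, rank):
    """s >_KBO t on ground terms."""
    ws, wt = weight(s, w), weight(t, w)
    if ws != wt:
        return ws > wt                     # case 1: weight comparison
    if rank[s[0]] != rank[t[0]]:
        return rank[s[0]] > rank[t[0]]     # case 2(a): precedence
    for si, ti in zip(s[1:], t[1:]):       # case 2(b): lexicographic
        if si != ti:
            return kbo_greater(si, ti, w, rank)
    return False                           # s equals t
```

With all weights 1 and f ≻ g ≻ a this realizes, in particular, the KBO instance used in the proof of Proposition 5 below.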

Definition 2. *A KBO constraint* C *is a finite set of atoms* t # s *where* t, s ∈ T(Σ, X) *and* # ∈ {<, >, =, ≤, ≥, ≠}*. We say that* C = {t1 #1 s1,...,tn #n sn} *is satisfiable if there exists a substitution* σ *that is grounding for all* tj, sj *such that*

$$\bigwedge_{j=1}^{n} t_j\sigma \mathrel{\#_j} s_j\sigma.$$

*Such a grounding substitution* σ *is called a solution.*

Definition 3. *A right-ground KBO constraint* C *is a KBO constraint where* s1,...,s*<sup>n</sup>* ∈ T(Σ)*, i.e., only the* t*<sup>j</sup> may contain variables.*

Definition 4. *A simple right-ground KBO constraint* C *is a right-ground KBO constraint where* # ∈ {<, ≠}*.*

*For simple right-ground KBO constraints, we prefer more explicit notation: We now assume* t1,...,t*n*, l1,...,l*<sup>m</sup>* ∈ T(Σ, X )*,* s1,...,s*n*, r1,...,r*<sup>m</sup>* ∈ T(Σ) *and call* C *satisfiable if there exists a substitution* σ *that is grounding for all* t*<sup>j</sup>* , l*<sup>j</sup> such that*

$$\left(\bigwedge\_{j=1}^n t\_j \sigma < s\_j\right) \land \left(\bigwedge\_{j=1}^m l\_j \sigma \neq r\_j\right).$$

# 3 Simple, Right-Ground KBO Constraints

We start by investigating the complexity of simple, right-ground KBO constraint solving.

Proposition 5. *Checking satisfiability for simple right-ground KBO constraints is NP-hard.*

*Proof.* We reduce from MONOTONE 3SAT, which is NP-complete by [7]. Let N ∪ M be a set of clauses where N consists of the clauses with only positive literals and M consists of the clauses with only negative literals. We consider a signature with a constant a, a ternary function f and a unary function g. We use a KBO instance where all weights are 1 and f ≻ g ≻ a. For every propositional variable P occurring in N ∪ M, we introduce a variable x_P. Then the equation x_P = a stands for P is *true* and x_P ≠ a stands for P is *false*.

Now every positive clause (P ∨ Q ∨ R) ∈ N is encoded as an inequation f(x*<sup>P</sup>* , x*Q*, x*R*) < f(g(a), g(a), g(a)). Obviously, this inequation can only be satisfied by a grounding that maps at least one of these variables to a, i.e., that sets at least one of P, Q, R to *true*.

Every negative clause (¬P ∨ ¬Q ∨ ¬R) ∈ M is encoded as an inequality f(x_P, x_Q, x_R) ≠ f(a, a, a). Obviously, this can only be satisfied if not all of these variables are mapped to a, i.e., if at least one of P, Q, R is *false*.

Now the clause set has a solution iff there is a solution to the constructed simple right-ground KBO constraint. Assume N ∪ M is satisfiable by a valuation β. Then for every propositional variable P map x_P to a if β(P) = 1 and to g(a) otherwise. As explained above, this grounding will satisfy the constraint. Now let σ be a solution to the constraint. Then the valuation β where β(P) = 1 if σ(x_P) = a and β(P) = 0 otherwise satisfies N ∪ M.

We have added |M| inequalities and |N| inequations which can be constructed in polynomial time, so the reduction works in polynomial time.
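The reduction can be sketched in code. This is our own script: since all weights are 1 and every variable is mapped to a (weight 1) or g(a) (weight 2), the inequation for a positive clause holds over this grounding range exactly when the left-hand side has smaller weight, so symbol counting suffices here.

```python
def encode_and_check(positive, negative, valuation):
    """positive/negative: clauses as 3-tuples of variable names;
    valuation: dict mapping each propositional variable to True/False.
    Returns True iff the induced grounding solves the constraint."""
    # x_P is mapped to a (weight 1) if P is true, else to g(a) (weight 2)
    weight = {p: (1 if valuation[p] else 2)
              for clause in positive + negative for p in clause}
    # f(x_P, x_Q, x_R) < f(g(a), g(a), g(a)): weight 1 + w_P + w_Q + w_R < 7
    pos_ok = all(1 + weight[p] + weight[q] + weight[r] < 7
                 for (p, q, r) in positive)
    # f(x_P, x_Q, x_R) != f(a, a, a): not all three variables mapped to a
    neg_ok = all((weight[p], weight[q], weight[r]) != (1, 1, 1)
                 for (p, q, r) in negative)
    return pos_ok and neg_ok
```

For N = {P ∨ Q ∨ R} and M = {¬P ∨ ¬Q ∨ ¬R}, any valuation that makes one literal of each clause true yields a solution of the constructed constraint.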

Proposition 6. *Checking satisfiability for simple right-ground KBO constraints is in NP.*

*Proof.* Let C = {t1 < s1,...,tn < sn, l1 ≠ r1,...,lm ≠ rm} be a constraint. If for some inequality lj ≠ rj there is no matcher from lj to rj, we can ignore this inequality since it is true for every grounding. If for some inequality lj ≠ rj it actually holds that lj = rj, then this inequality is impossible to satisfy, so we are done. After sorting out these two cases, as rj is ground, every inequality lj ≠ rj has a unique matcher τj, which has linear size with respect to rj. In the following, we say that the term τj(x) is restricted by the inequality lj ≠ rj. The inequality lj ≠ rj then signifies

$$\bigvee\_{x \in Vars(l\_j)} \sigma(x) \neq \tau\_j(x).$$

For the inequations t_j < s_j, it is obviously optimal to assign the smallest possible term to every variable. Larger terms only have to be considered due to the inequalities l_j ≠ r_j: if there is a grounding σ that satisfies t_j < s_j, then any grounding σ′ with σ′(x) ≤ σ(x) for all variables x also satisfies t_j < s_j. Hence, if there exists a solution, then there also exists a solution that only uses the m + 1 smallest terms for every variable. This is because every inequality l_j ≠ r_j restricts at most one term for every variable, so for every variable the m + 1 smallest terms contain the smallest term that is not restricted for that variable.

As we only have to consider the m + 1 smallest terms for every variable, the size of the groundings we have to consider is polynomially bounded by the input size. Let f be the function with the maximal arity and let p = arity(f). Let a be the smallest constant. We claim that each of the m + 1 smallest terms has at most mp + 1 symbols. Proof by contradiction: Assume t_0 is one of the m + 1 smallest terms with #t_0 > mp + 1. Perform the following m times: Obtain t_{i+1} by replacing any subterm g(s_1, ..., s_n), where the s_i are constants, by a. As none of the t_i is a constant, such a subterm always exists. Each step decreases the number of symbols by at most p, so #t_i > (m − i)p + 1. After m steps, we obtain terms t_0 > t_1 > ··· > t_m with #t_m > (m − m)p + 1 = 1, i.e., t_m is not a constant, so t_m > a. This contradicts the fact that t_0 was one of the m + 1 smallest terms, since at least m + 1 terms (t_1, ..., t_m and a) are smaller than t_0. Thus, we can guess a grounding and check in polynomial time whether it is a solution.

Next we propose an algorithm for testing satisfiability of simple right-ground KBO constraints. Of course, by Proposition 6, there already exists an algorithm, but we expect that the following algorithm performs better in practice. Let C be a simple right-ground KBO constraint with n inequations t_j < s_j and m inequalities l_j ≠ r_j.

Assume that *Vars*({t_j | 1 ≤ j ≤ n} ∪ {l_j | 1 ≤ j ≤ m}) = {x_1, ..., x_k}. As explained in the proof of Proposition 6, we only have to consider the m + 1 smallest terms for the grounding, so to begin, we generate an ordered list S of the m + 1 smallest terms. This way, a grounding substitution σ corresponds to a vector v ∈ N^k where v_i < m + 1 is the index of the term σ(x_i) in S, i.e., S[v_i] = σ(x_i). Let σ(v) with σ(v)(x_i) := S[v_i] denote the grounding corresponding to the vector v. Later on, we give a dynamic programming algorithm to compute the k smallest terms for some number k. Actually, we do not directly generate the m + 1 smallest terms, but start with a constant number of terms and generate more terms as needed.

The algorithm is given by three inference rules that are represented by an abstract rewrite system. They operate on a state which is either ⊥ or a four-tuple (T; v; F; C) where T is a sequence of variables, the *trace*; v ∈ N^k is a grounding substitution in vector notation, the *current grounding*; F is a set of *forbidden* groundings; and C is a simple right-ground KBO constraint. The initial state for a constraint C is (ε; (0, ..., 0); ∅; C), i.e., the trace is empty, every variable is mapped to the smallest constant, and there are no forbidden groundings.

We use the following partial ordering ≤_F on groundings: v ≤_F u iff for all i ∈ {1, ..., k} we have v_i ≤ u_i. By *inc*(v, i) we denote the grounding v′ with v′_i = v_i + 1 and v′_l = v_l for all l ∈ {1, ..., k} with l ≠ i, i.e., the grounding where we increase the term for the variable x_i by one. Analogously, we define *dec*(v, i), where we instead decrease the term for the variable x_i by one, i.e., v′_i = v_i − 1. The two operations *inc* and *dec* are only used when they are well-defined, i.e., they yield a grounding v′ ∈ N^k with v′_i < m + 1. The operation *inc* is only used when an inequality l_j ≠ r_j is not satisfied, and this can happen at most m times without intermediate Backtrack steps. The operation *dec*(v, i) is only used for Backtrack, and by Lemma 15, in this case v_i > 0.

The role of F is that we want to keep the algorithm from considering wrong groundings again. For all u ∈ F, we do not visit states with grounding v if v ≥*<sup>F</sup>* u. When we Backtrack, we insert the current grounding into F. The trace T records the last updated variables so Backtrack is able to undo the last Increase operation. As will be proven in Theorem 18, the algorithm terminates in ⊥ iff there exists no solution, and if there exists a solution, then it terminates in a state where the current grounding v is a solution.

Increase (T; v; F; C) ⇒_KCS (T x_i; v′; F; C)

provided v′ = *inc*(v, i), l_jσ(v) = r_j for some l_j ≠ r_j ∈ C, l_jσ(v′) ≠ r_j, and there is no u ∈ F with v′ ≥_F u

Backtrack (T x_i; v; F; C) ⇒_KCS (T; v′; F ∪ {v}; C)

provided v′ = *dec*(v, i) and either

1. l_jσ(v) = r_j for some l_j ≠ r_j ∈ C, but for all l ∈ {1, ..., k}, we have that l_jσ(*inc*(v, l)) ≠ r_j implies that there is a u ∈ F with *inc*(v, l) ≥_F u, or
2. t_jσ(v) ≥ s_j for some t_j < s_j ∈ C

Fail (ε; v; F; C) ⇒_KCS ⊥

provided either

1. l_jσ(v) = r_j for some l_j ≠ r_j ∈ C, but for all l ∈ {1, ..., k}, we have that l_jσ(*inc*(v, l)) ≠ r_j implies that there is a u ∈ F with *inc*(v, l) ≥_F u, or
2. t_jσ(v) ≥ s_j for some t_j < s_j ∈ C

Informally, Increase is applicable if some inequality l_j ≠ r_j is not fulfilled and we can fix this with the new grounding *inc*(v, i), which is not forbidden by F. Backtrack undoes an operation and is applicable if either some inequality l_j ≠ r_j is not fulfilled but Increase is not applicable, or if some inequation t_j < s_j is not fulfilled. Fail is applicable if Backtrack would be applicable on an empty trace, i.e., there is no operation to undo.

Obviously, there is no state on which we can apply both Backtrack and Fail.

Definition 7. *A reasonable strategy is a strategy that prefers Backtrack and Fail over Increase.*
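Under a reasonable strategy, the three rules can be rendered as a small search loop. The following sketch is our own Python illustration (not the SPASS implementation) and abstracts terms away: variables range over indices 0..m into the sorted list S of smallest terms, each inequality l_j ≠ r_j is represented by its matcher τ_j as a partial map from variable positions to forbidden indices, and each inequation t_j < s_j as a predicate on index vectors.

```python
def solve_kcs(k, m, inequalities, inequations):
    """KCS sketch: k variables, term indices 0..m.  `inequalities` is a list
    of matchers {var_position: forbidden_index}; `inequations` is a list of
    predicates on index vectors.  Returns a solution vector or None (= ⊥)."""
    v = [0] * k          # initial grounding: every variable -> smallest constant
    trace = []           # trace T: variables changed by Increase
    forbidden = []       # F: forbidden groundings

    def is_forbidden(u):
        return any(all(a >= b for a, b in zip(u, f)) for f in forbidden)

    def violated_inequality(u):
        # l_j sigma(u) = r_j iff u agrees with the matcher tau_j everywhere
        return next((q for q in inequalities
                     if all(u[i] == t for i, t in q.items())), None)

    while True:
        q = violated_inequality(v)
        bad_lt = any(not check(v) for check in inequations)
        if q is None and not bad_lt:
            return v                      # no rule applicable: v is a solution
        step = None
        if q is not None and not bad_lt:  # Increase: repair l_j != r_j by
            for i in sorted(q):           # bumping a variable of l_j
                u = v[:i] + [v[i] + 1] + v[i + 1:]
                if v[i] + 1 <= m and not is_forbidden(u):
                    step = (i, u)
                    break
        if step is not None:              # Increase
            trace.append(step[0]); v = step[1]
        elif trace:                       # Backtrack: undo the last Increase
            i = trace.pop()
            forbidden.append(v)
            v = v[:i] + [v[i] - 1] + v[i + 1:]
        else:                             # Fail
            return None
```

On the data of Example 8 below (encoded over indices), the loop reproduces the derivation shown there and ends in ⊥; on Example 9 it finds the index vector [0, 0, 1], i.e., the grounding (a, a, b).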

*Example 8.* Consider a signature with constants a, b, c and a binary function f. We set w(a) = 1; w(b) = w(c) = 2; w(f)=3 and a ≺ b ≺ c ≺ f. We consider the constraint

$$C = \{x\_1 \neq a, f(x\_1, x\_2) < f(a, c)\}.$$

The m + 1 smallest terms, where m = 1, are a and b. The following is the unique execution of the algorithm; to increase readability, we write terms instead of indices in v.

$$\begin{array}{ll} & (\varepsilon; (a, a); \emptyset; C) \\ \Rightarrow^{\text{Increase}}_{\text{KCS}} & (x_1; (b, a); \emptyset; C) \\ \Rightarrow^{\text{Backtrack}}_{\text{KCS}} & (\varepsilon; (a, a); \{(b, a)\}; C) \\ \Rightarrow^{\text{Fail}}_{\text{KCS}} & \bot \end{array}$$

The algorithm terminates in ⊥, so there is no solution.

*Example 9.* Consider a signature with constants a, b, a binary function g and a ternary function f. Let w(a)=1, w(b) = w(f) = w(g)=2 and a ≺ b ≺ g ≺ f. The constraint is

$$C = \{x\_1 < b, g(x\_2, a) < g(b, b), f(x\_1, x\_2, x\_3) \neq f(a, a, a), g(x\_1, x\_2) \neq g(a, b)\}.$$

The m + 1 smallest terms, where m = 2, are a, b, g(a, a).

$$\begin{array}{ll} & (\varepsilon; (a, a, a); \emptyset; C) \\ \Rightarrow^{\text{Increase}}_{\text{KCS}} & (x_1; (b, a, a); \emptyset; C) \\ \Rightarrow^{\text{Backtrack}}_{\text{KCS}} & (\varepsilon; (a, a, a); \{(b, a, a)\}; C) \\ \Rightarrow^{\text{Increase}}_{\text{KCS}} & (x_2; (a, b, a); \{(b, a, a)\}; C) \\ \Rightarrow^{\text{Increase}}_{\text{KCS}} & (x_2 x_2; (a, g(a, a), a); \{(b, a, a)\}; C) \\ \Rightarrow^{\text{Backtrack}}_{\text{KCS}} & (x_2; (a, b, a); \{(b, a, a), (a, g(a, a), a)\}; C) \\ \Rightarrow^{\text{Backtrack}}_{\text{KCS}} & (\varepsilon; (a, a, a); \{(b, a, a), (a, g(a, a), a), (a, b, a)\}; C) \\ \Rightarrow^{\text{Increase}}_{\text{KCS}} & (x_3; (a, a, b); \{(b, a, a), (a, g(a, a), a), (a, b, a)\}; C) \end{array}$$

The algorithm has found a solution, so no rule is applicable and it terminates. Note that after the third and fifth operation, we cannot increase x_1 because (b, b, a) ≥_F (b, a, a) ∈ F.

Next we prove the correctness of the algorithm.

Lemma 10. *If* (ε; (0, ..., 0); ∅; C) ⇒^l_KCS (T; v; F; C)*, then there is no* u ∈ F *with* v ≥_F u*.*

*Proof.* We prove this by induction on l. For l = 0, this holds since F = ∅. For l > 0, the last applied rule must have been either Increase or Backtrack. If the last applied rule was Increase, then there cannot be such a u because Increase does not modify F and because this is part of the condition of the Increase rule. Now assume the last applied rule was Backtrack, so the previous state was (T x_i; v; F′; C) with v′ = *dec*(v, i) and F = F′ ∪ {v}. If there were some u ∈ F such that v′ ≥_F u, then, since v′ <_F v, we would have v >_F u. Hence, by the induction hypothesis, u ∉ F′, so as F = F′ ∪ {v}, it must hold that u = v, a contradiction to v >_F u.

Lemma 11. *If* (ε; (0, ..., 0); ∅; C) ⇒^i_KCS (T; u; F; C) ⇒^l_KCS (T′; u′; F′; C) *for* l > 0*, then* u ≠ u′ *or* F ≠ F′*.*

*Proof.* If all l rule applications are applications of the Increase rule, then clearly u <_F u′, so in particular u ≠ u′. There is no rule that removes elements from F, so F ⊆ F′. If there is at least one application of the Backtrack rule among the l rule applications, the current grounding v is added to F, and by Lemma 10, v ∉ F, so F is modified and F ≠ F′.

Proposition 12. ⇒*KCS is well-founded, i.e., the algorithm always terminates.*

*Proof.* By Lemma 11, we can reach every combination of v and F at most once. For v, there are (m + 1)^k possibilities. We only add occurring groundings to F, so the number of possibilities for F is bounded above by the number of subsets of all possible groundings, which is 2^((m+1)^k). Thus, the number of reached states is finite (it is at most (m + 1)^k · 2^((m+1)^k)), so the algorithm terminates.

Of course, the upper bounds in the proof of Proposition 12 are far too high and the algorithm will run much faster in practice.

Lemma 13. *If* (ε; (0, ..., 0); ∅; C) ⇒^l_KCS (T; v; F; C) *and* u ∈ F*, then for all* u′ ≥_F u *it holds that* u′ *cannot be a solution.*

*Proof.* The proof is by induction on l. For l = 0, we have F = ∅, so the statement holds. For l > 0, if the last applied rule was Increase, the statement follows by the induction hypothesis since F is not modified. Now assume that the last applied rule was Backtrack. Let (T′; v′; F′; C) be the previous state. We only have to show that no u′ ≥_F v′ can be a solution; for all other elements of F = F′ ∪ {v′}, this follows by the induction hypothesis. First assume that Backtrack was applicable because of condition (1). Then v′ cannot be a solution since l_jσ(v′) = r_j. For u′ >_F v′, if l_jσ(u′) = r_j, then u′ clearly cannot be a solution. Otherwise, there is a variable x_i such that u′ ≥_F *inc*(v′, i) and l_jσ(*inc*(v′, i)) ≠ r_j. However, it is part of condition (1) that then there is an element u′′ ∈ F′ with u′′ ≤_F *inc*(v′, i) ≤_F u′, so by the induction hypothesis, u′ cannot be a solution. If Backtrack was applicable because of condition (2), then t_iσ(v′) ≥ s_i for some i ∈ {1, ..., n}. Clearly, if u′ ≥_F v′, then also t_iσ(u′) ≥ s_i, so u′ cannot be a solution.

Corollary 14. *If* (ε; (0, ..., 0); ∅; C) ⇒^l_KCS (T; v; F; C) *and condition (1) or condition (2) of Fail is fulfilled for* (T; v; F)*, then for all* u ≥_F v*,* u *cannot be a solution.*

*Proof.* The conditions for Fail are the same as the conditions for Backtrack, so this follows by the proof of Lemma 13.

Lemma 15. *If* (ε; (0, ..., 0); ∅; C) ⇒^l_KCS (T; v; F; C)*, then for all* i ∈ {1, ..., k} *the number of occurrences of* x_i *on the trace* T *equals* v_i*.*

*Proof.* In the following, we denote the number of occurrences of x_i in T by C(T, x_i). The proof is by induction on l. If l = 0, the statement trivially holds. For l > 0 let (T′; v′; F′; C) be the previous state. If the last applied rule was Increase, then T = T′ x_i and v = *inc*(v′, i), so

$$C(T, x_i) = C(T', x_i) + 1 \overset{\text{IH}}{=} v'_i + 1 = v_i.$$

For j ≠ i, C(T, x_j) = C(T′, x_j) and v_j = v′_j, so the statement follows by the induction hypothesis. If the last applied rule was Backtrack, then T = T′ x_i and v = *dec*(v′, i), so

$$C(T, x_i) = C(T', x_i) - 1 \overset{\text{IH}}{=} v'_i - 1 = v_i.$$

Again, for j ≠ i, C(T, x_j) = C(T′, x_j) and v_j = v′_j, so the statement follows by the induction hypothesis.

Lemma 16. *If* (ε; (0, ..., 0); ∅; C) ⇒^l_KCS (T; v; F; C) ⇒^Fail_KCS ⊥*, then there exists no solution.*

*Proof.* Since Fail is applicable on (T; v; F; C), we have T = ε, so by Lemma 15, v = (0, ..., 0). Hence, by Corollary 14, no u ≥_F (0, ..., 0) can be a solution; since every grounding u satisfies u ≥_F (0, ..., 0), there exists no solution.

Lemma 17. *If* (ε; (0, ..., 0); ∅; C) ⇒^l_KCS (T; v; F; C) *and no rule is applicable on* (T; v; F; C)*, then* v *is a solution.*

*Proof.* Assume that for some j ∈ {1, ..., n} we had t_jσ(v) ≥ s_j. Then either Backtrack or Fail would be applicable. Now assume that for some j ∈ {1, ..., m} we had l_jσ(v) = r_j. Then either Increase or Backtrack would be applicable.

Theorem 18. *The algorithm is correct: If there exists a solution, then starting from* (ε; (0,..., 0); ∅; C)*, the algorithm terminates in a state* (T; v; F; C) *where* v *is a solution. If there is no solution, the algorithm terminates in* ⊥*.*

*Proof.* Follows by Proposition 12, Lemma 16 and Lemma 17.

We have implemented the above algorithm in the context of the SPASS reasoning workbench. The efficiency of the algorithm depends on the variables we choose for Increase. If there exists a solution, then there exists an execution using only the rule Increase. The following criteria might be useful to select the best variable for Increase:


It is possible to calculate and maintain some score for every variable here and decide based on this score. The exact selection criteria still need to be further explored.

A remaining problem from the presentation of the algorithm is how to compute the k smallest terms. If the occurring weights are rather small, the following dynamic programming algorithm might be useful in practice. The idea is to compute all terms of a specific weight, for increasing weights, until we have generated at least k terms. Unfortunately, there may be exponentially many terms of a specific weight, where the exponent is the maximal arity of a function and the base is the number of terms of smaller weight. However, k is bounded above by the number of inequalities m, the number of terms with smaller weights is bounded above by k, and the maximal arity is typically small, so we expect this not to be a big problem.

As it is probably hard to find the next possible weight, we simply always increase the weight by 1, starting from the weight of the smallest constant. Our DP array is two-dimensional, one dimension indexed by the weight and the other by the size of the tuple, from 1 to *max\_arity*. Actually, it is four-dimensional since every entry is a list of tuples of terms and every tuple is a list of its entries. A tuple of size 1 is just a term of the specific weight. The tuples of larger size are needed for the DP transitions, where they serve as argument tuples for the functions. We maintain an array *smallest\_terms* that will in the end contain at least the k smallest terms.

We iterate over the weights starting at the weight of the minimal constant. Let *curweight* denote the current weight. The idea is to compute all terms of weight *curweight*, sort them, add them to *smallest\_terms*, and proceed with weight *curweight* + 1 if |*smallest\_terms*| is still smaller than k. To do so, if *curweight* is not the smallest weight, we first compute the tuples of size 2 to *max\_arity* for the previous weight. This is done via DP: For tuple size i, we iterate over the terms s ∈ *smallest\_terms*. Then we iterate over the tuples t of size i − 1 and weight *curweight* − 1 − w(s) using the DP array and add (s, t) to the current DP entry. Afterwards, we calculate all terms of weight *curweight* by iterating over all symbols f and all tuples t of size arity(f) and weight *curweight* − w(f) using the DP array. Then the term f(t) has weight *curweight*.
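The enumeration by increasing weight can be sketched compactly in Python (our own simplified rendering, with assumed weights ≥ 1): instead of the explicit two-dimensional tuple array, we memoize the terms of each already processed weight and enumerate argument tuples recursively over that table. The sort key encodes ground KBO directly: weight first, then head precedence, then arguments lexicographically.

```python
def smallest_terms(signature, precedence, k):
    """Generate at least the k smallest ground terms in ascending KBO order.
    signature: list of (name, arity, weight), all weights >= 1;
    precedence: dict name -> rank; terms are nested tuples, e.g.
    ("g", ("a",), ("a",)).  A simplified sketch of the DP described above."""
    weights = {name: w for name, _, w in signature}
    min_w = min(w for _, a, w in signature if a == 0)

    def weight(t):
        return weights[t[0]] + sum(weight(s) for s in t[1:])

    def kbo_key(t):
        # for ground terms, KBO is exactly this lexicographic key
        return (weight(t), precedence[t[0]], tuple(kbo_key(s) for s in t[1:]))

    by_weight = {}        # weight -> all terms of exactly that weight
    result, cur = [], min_w

    def tuples_of_weight(n, total):
        # all n-tuples of already generated terms whose weights sum to total
        if n == 0:
            return [()] if total == 0 else []
        return [(t,) + rest
                for w, terms in by_weight.items() if w <= total
                for t in terms
                for rest in tuples_of_weight(n - 1, total - w)]

    while len(result) < k:
        fresh = [(name,) + args
                 for name, arity, w in signature
                 for args in tuples_of_weight(arity, cur - w)]
        by_weight[cur] = fresh
        result.extend(sorted(fresh, key=kbo_key))
        cur += 1
    return result
```

On the signature of Example 9 (constants a, b, binary g, ternary f with w(a) = 1 and all other weights 2), the first three generated terms are a, b, g(a, a), matching the list given there.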

We finish this section with a discussion of potential heuristics, i.e., sufficient conditions for a simple right-ground KBO constraint to have a solution. As explained before, every inequality l_j ≠ r_j rules out any grounding that agrees with τ_j, the matcher from l_j to r_j. Now assume we have m inequalities and know that there are more than m solutions to the inequation t < s. One might then think that there is a grounding that solves all inequalities l_j ≠ r_j and the inequation t < s. However, this is not true.

*Example 19.* Consider a signature with constants a, b and c and a binary function f. The weights are w(a) = 1; w(b) = w(c) = 2; w(f)=3 and we use a ≺ b ≺ c ≺ f as a precedence. Now consider the constraint

$$C = \{x \neq a, f(x, y) < f(a, c)\}.$$

The inequation has two solutions, namely {x ↦ a, y ↦ a} and {x ↦ a, y ↦ b}. However, it has no solution where x is not mapped to a, so the overall problem has no solution.

So the above sufficient condition needs to be refined in order to be correct. However, calculating the number of solutions is again NP-hard.

Proposition 20. *Calculating the number of solutions* σ *for some right-ground inequation* t<s *is NP-hard.*

*Proof.* We reduce from the Unbounded Subset Sum Problem (USSP), which is NP-complete by [7]. Let s_1, ..., s_n, T ∈ N^+. We have to decide whether there are x_1, ..., x_n ∈ N such that ∑_{i=1}^{n} x_i s_i = T, i.e., whether there is a multiset of values from {s_1, ..., s_n} that sums up to T. Assume we had an oracle that could compute the number of solutions for any inequation l < r where r is ground. We will use this oracle twice.

For both uses, we use a signature with constants c and d and unary functions f_1, ..., f_n. We have w(c) = 1, w(f_i) = s_i for i ∈ {1, ..., n} and d ≺ c ≺ f_1 ≺ ··· ≺ f_n. For the first case, set w(d) = T + 2. Using the oracle with the inequation x < d, we get the number of terms smaller than d. Since d is the smallest term of weight T + 2, this is exactly the number of terms with weight ≤ T + 1. For the second case, set w(d) = T + 1. Again, using the oracle with the inequation x < d, we get the number of terms smaller than d. This time, this is the number of terms with weight ≤ T. If we now subtract those values, we get the number of terms with weight exactly T + 1.

Now the USSP has a solution iff the number of terms with weight exactly T + 1 is not 0. Every term t of weight T + 1 must have the constant c as a subterm since the weight of d is too large. The rest of t must consist of the unary functions. Hence, the weights of the unary functions used sum up to T + 1 − 1 = T. Since the weights of the unary functions correspond to the numbers from the USSP, this yields a solution for the USSP. Conversely, given a solution to the USSP, we can construct a term of weight T + 1 analogously.

The problem with the aforementioned insufficient condition is that an inequality l_j ≠ r_j does not necessarily rule out only one grounding, but possibly infinitely many groundings. This happens if there are variables that are not restricted by the matcher τ_j of l_j and r_j. However, the criterion can be refined into a correct sufficient condition. If we restrict ourselves to the m + 1 smallest terms again, each inequality l_j ≠ r_j rules out only a finite number of groundings. If we sum up these numbers over all inequalities, we obtain an upper bound on the total number of ruled-out groundings. For the inequation t < s, the same problem with variables that do not occur arises (there may be infinitely many solutions), so here we restrict ourselves to the m + 1 smallest terms as well. If the number of solutions for t < s is now larger than the upper bound on the total number of ruled-out groundings, we can actually be sure that there is a solution. However, this correct sufficient condition is hard to compute and therefore seems not very useful in practice.

# 4 Further Constraint Variants and Ordering Relaxation

In this section we study further variants of constraint problems and eventually extend the algorithm of Sect. 3 to alternating KBO constraints.

Proposition 21. *Checking satisfiability for right-ground KBO constraints restricted to strict inequations is NP-hard.*

*Proof.* The proof strategy is the same as in the proof of Proposition 5. The encoding of positive clauses stays the same, as < is still allowed. A negative clause ¬P ∨ ¬Q ∨ ¬R is encoded as f(x_P, x_Q, x_R) > f(a, a, a). This inequation can only be satisfied by a grounding that does not map all of these variables to a, and it is trivially satisfied by any such grounding.

In particular, we have seen that having only constraints of the form t_i < s_i and t_i > s_i suffices to make the problem NP-hard. Next we turn to a weaker term ordering ≤_sym that is solely based on symbol counting. Even for this ordering, constraint solving remains NP-hard.

Definition 22. *For ground terms* t, s ∈ T(Σ)*, we define* t ≤_sym s :⇐⇒ |*sym*(t)| ≤ |*sym*(s)|*, i.e.,* t *does not contain more symbols than* s*.*
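With ground terms encoded as nested tuples (our own illustrative encoding, e.g. f(g(a), a) as `("f", ("g", ("a",)), ("a",))`), this comparison reduces to counting nodes:

```python
def num_symbols(t):
    # |sym(t)|: one symbol for the head plus the symbols of all arguments
    return 1 + sum(num_symbols(s) for s in t[1:])

def leq_sym(t, s):
    # t <=_sym s iff t does not contain more symbols than s (Definition 22)
    return num_symbols(t) <= num_symbols(s)
```

For example, f(g(a), a) has four symbols, and a ≤_sym g(a) holds while g(a) ≤_sym a does not.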

Definition 23. *A right-ground symbol constraint* C *is a finite set of atoms* t # s *with* t ∈ T(Σ, X)*,* s ∈ T(Σ) *and* # ∈ {≤_sym, ≠}*. Satisfiability is defined analogously to the satisfiability of KBO constraints.*

Proposition 24. *Checking satisfiability for right-ground symbol constraints is NP-hard.*

*Proof.* The proof strategy is the same as in the proof of Proposition 5. We encode positive clauses P ∨ Q ∨ R as f(x_P, x_Q, x_R) ≤_sym f(g(a), g(a), a). The only way to satisfy this inequation is to map at least one of these variables to a. Negative clauses ¬P ∨ ¬Q ∨ ¬R are encoded as f(x_P, x_Q, x_R) ≠ f(a, a, a).

In particular, the NP-hardness of these problems is not caused by the complicated structure of the KBO since the problem is already NP-hard for a comparison as simple as counting the number of symbols.

Our next variants are motivated by the definition of congruence classes with respect to terms with variables. For the first variant, all instances of the defining term t have to be smaller than a single ground term β and different from ground terms s1,...,s*n*.

Definition 25. *A simple, single right-ground KBO constraint* C *consists of terms* t ∈ T(Σ, X) *and* s_1, ..., s_n, β ∈ T(Σ)*. We say that* C *is satisfiable if there exists a substitution* σ *that is grounding for* t *such that*

$$\left(\bigwedge_{j=1}^n t\sigma \neq s_j\right) \land t\sigma < \beta.$$

Proposition 26. *Assuming that we are given the* n+1 *smallest terms, checking satisfiability of simple, single right-ground KBO constraints is in P.*

*Proof.* Actually, for this problem, if a reasonable strategy (Definition 7) is used, the algorithm from Sect. 3 runs in polynomial time. The key difference to the other problems is that here every variable occurs in every inequality, so every inequality rules out at most one grounding. We first show that we can only reach polynomially many states. First, consider states (T; v; F; C) where tσ(v) < β. If v does not violate any inequality t ≠ s_i, then the algorithm terminates, so there is at most one such state. For every inequality t ≠ s_i, there is at most one grounding v that violates it. We claim that we reach at most k + 1 states with current grounding v, where k is the number of variables. The grounding v is reached at most once using Increase because otherwise there must be an intermediate Backtrack, so v would have been inserted into F. If we reach v using Backtrack for some variable x_i, then *inc*(v, i) was inserted into F, so for every variable x_i, we can reach v using Backtrack at most once. Hence, v is reached at most k + 1 times.

Now consider states (T; v; F; C) where tσ(v) ≥ β. Since a reasonable strategy is used, we must have reached such a state from a state that does not violate t < β, so by the argumentation before, at most k such states can be reached for every inequality, hence at most n · k in total, where n is the number of inequalities.

Hence, in total, there are at most (k + 1) · n + n · k + 1 states. The state transitions can be done in polynomial time because we only need to iterate over all inequalities and inequations and over all entries in F. Since there are only polynomially many states and every rule application inserts at most one element into F, F has polynomially many entries.

Definition 27 (Alternating KBO Constraint). *An alternating KBO constraint* C *consists of terms* t, s_1, ..., s_n ∈ T(Σ, X) *and* β ∈ T(Σ)*. We say that* C *is satisfiable if there exists a substitution* σ *that is grounding for* t *such that for all substitutions* τ *that are grounding for all* s_j *we have*

$$\left(\bigwedge_{j=1}^n t\sigma \neq s_j\tau\right) \land t\sigma < \beta.$$

Proposition 28. *Checking satisfiability for alternating KBO constraints is NP-hard.*

*Proof.* We reduce from SAT. Let N be a set of clauses and X_1, ..., X_k be the variables occurring in N. We use a signature with a k-ary function f, two constants a and b, and variables x_1, ..., x_k and y_1, ..., y_k. Set t = f(x_1, ..., x_k). Now for every clause C_j ∈ N, we introduce an inequality f(x_1, ..., x_k) ≠ f(s_{j,1}, ..., s_{j,k}) where we set s_{j,i} = b if X_i occurs positively in C_j, s_{j,i} = a if X_i occurs negatively in C_j, and s_{j,i} = y_i if X_i does not occur in C_j. The idea is that x_i = a stands for X_i being set to ⊤ and x_i = b stands for X_i being set to ⊥. Requiring x_iσ ≠ y_iτ for all τ is obviously impossible to satisfy, so the inequality must be made true by setting some positively occurring variable to a or some negatively occurring variable to b.

To ensure that the x_i are only mapped to a or b, we do the following: We first introduce a new constant c and set β = c. Then we set w(f) = w(a) = w(b) = 1, w(c) = k + 2 and c ≺ a ≺ b ≺ f. If all variables x_i are mapped to a or b, we have w(tσ) = k + 1, i.e., tσ < β. Any grounding where some x_i is not mapped to a or b results in tσ ≥ β.

Now there is a solution σ iff there is a satisfying valuation for N.

If a reasonable strategy is used, satisfiability of alternating KBO constraints can be checked using the algorithm from Sect. 3. Any solution σ must be such that tσ < β, so we only have to consider instances s_jτ with s_jτ < β. What we can now do is to calculate, for every s_j, all groundings τ with s_jτ < β and add the inequality t ≠ s_jτ to the constraint. There are only finitely many such groundings because we did not allow unary functions f with w(f) = 0. This way, we obtain a simple right-ground KBO constraint, so we can apply the algorithm. A more efficient possibility is to add the groundings of the s_j implicitly, i.e., to change the condition of Increase (and the first case of Backtrack and Fail) to whether there exists a matcher τ such that l_jσ(v) = r_jτ. Also, the condition on the next grounding for Increase changes: it is no longer that we fix the inequality, but that we change a variable that occurs on the left side of the inequality.
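The matcher test in this modified Increase condition is plain first-order matching of s_j against the ground term tσ(v). A hypothetical helper of ours (variables as strings, function terms as nested tuples) might look as follows:

```python
def match(pattern, ground, tau=None):
    """Return a substitution tau with pattern*tau == ground, or None if no
    matcher exists.  Repeated variables must be mapped consistently."""
    tau = {} if tau is None else tau
    if isinstance(pattern, str):              # a variable of s_j
        if pattern in tau:
            return tau if tau[pattern] == ground else None
        tau[pattern] = ground
        return tau
    if pattern[0] != ground[0] or len(pattern) != len(ground):
        return None                           # head symbol clash
    for p, g in zip(pattern[1:], ground[1:]):
        tau = match(p, g, tau)
        if tau is None:
            return None
    return tau
```

For instance, with s_1 = f(g(y_1), y_2) from Example 29 below, matching against the ground term f(g(a), a) yields τ = {y_1 ↦ a, y_2 ↦ a}, while matching against f(a, a) fails.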

*Example 29.* Consider the signature <sup>Σ</sup> <sup>=</sup> {f(2), g(1), a(0)}, where the superscript numbers denote the function arities, together with the following alternating KBO constraint C:

$$\begin{aligned} t &= f(x_1, x_2) & s_1 &= f(g(y_1), y_2) \\ \beta &= f(f(a, a), a) & s_2 &= f(a, a) \end{aligned}$$

We set w(a) = w(g) = w(f)=1 and a ≺ g ≺ f. The few smallest terms are

$$a, g(a), g(g(a)), f(a, a).$$

Note that for alternating KBO constraints, it no longer suffices to consider only the n + 1 smallest terms, since an inequality may rule out more than one term for a variable. However, as mentioned in Sect. 3, we calculate the smallest terms as needed, so this is not a problem. For shorter notation, we omit from F any grounding u for which there is a grounding v ∈ F with v <_F u. A possible run of the algorithm looks as follows:

$$\begin{array}{lll} & (\varepsilon; (a, a); \emptyset; C) & \\ \Rightarrow^{\text{Increase}}_{\text{KCS}} & (x_1; (g(a), a); \emptyset; C) & s_2, \tau = \{\} \\ \Rightarrow^{\text{Increase}}_{\text{KCS}} & (x_1 x_1; (g(g(a)), a); \emptyset; C) & s_1, \tau = \{y_1 \mapsto a, y_2 \mapsto a\} \\ \Rightarrow^{\text{Increase}}_{\text{KCS}} & (x_1 x_1 x_1; (f(a, a), a); \emptyset; C) & s_1, \tau = \{y_1 \mapsto g(a), y_2 \mapsto a\} \\ \Rightarrow^{\text{Backtrack}}_{\text{KCS}} & (x_1 x_1; (g(g(a)), a); \{(f(a, a), a)\}; C) & \beta \\ \Rightarrow^{\text{Backtrack}}_{\text{KCS}} & (x_1; (g(a), a); \{(g(g(a)), a)\}; C) & s_1 \\ \Rightarrow^{\text{Backtrack}}_{\text{KCS}} & (\varepsilon; (a, a); \{(g(a), a)\}; C) & s_1 \\ \Rightarrow^{\text{Increase}}_{\text{KCS}} & (x_2; (a, g(a)); \{(g(a), a)\}; C) & s_2, \tau = \{\} \end{array}$$

#### 5 Experiments

We implemented the algorithm of Sect. 3 and its extension to constraints with right hand side variables, Definition 27, and tested it in the context of an extended congruence closure (CC) algorithm with variables [6,8,15,16]. We implemented a rather naive variant of [8] with the only goal to generate KBO constraints in order to test our new algorithm on KBO constraints. In contrast to [8] our algorithm considers a finite signature, as usual for first-order logic problems. All experiments were carried out on a Debian Linux server equipped with AMD EPYC 7702 64-Core CPUs running at 3.35GHz and an overall memory of 2TB. The result of all runs as well as all input files and binaries can be found at https://nextcloud.mpi-klsb.mpg.de/index.php/s/BAwd99cxFpSJmSp.

As a first test case we considered all eligible UEQ problems from CASC-J11 [17]. We consider equations and all inequalities for the congruence closure algorithm. The equations generate the congruence, and for the inequalities we compute the congruence classes for the respective right and left side terms of the inequality. For each example, the KBO function weight was always set to one, and the precedence is generated with respect to the occurrence of symbols in the input file in ascending order. For β we chose a fixed nesting depth of 4 and built for each input file a nested term of exactly this depth using function symbols in the order of occurrence in the input, starting with a non-constant function symbol. Out of all eligible problems, our CC algorithm terminated on 186 problems within a time limit of 30 min. Please note that although our CC implementation is rather naive, in contrast to the classical ground CC algorithm it does not need a complete grounding; for the examples where our naive algorithm runs out of time, a complete grounding is not affordable. The below table shows some typical runs on the UEQ domain. All timings are presented in hundredths of a second, and if a run takes less than one hundredth of a second, we write zero. The table shows the problem name, the number of ground terms smaller than β indicating the solution space for the constraint, the summed-up time of all calls to the KBO constraint solver during the CC run, the number of calls to the KBO constraint solver, and the results of these calls. The three selected examples are typical: most of the problems are satisfiable and the constraint solving algorithm needs almost no time. Note that for the first example, all 8014 calls to the constraint solver needed 3 hundredths of a second in sum. LAT143-1 is the example showing the worst constraint solving performance, i.e., still less than a hundredth of a second per call.


For the SMT-LIB examples of the UF domain [1], we expanded let operators, removed the typing, coded predicates as equations, performed a CNF transformation, and then took the first literal of each clause as input for the CC algorithm. The nesting depth was set to 2; the rest was done as for the UEQ examples. Removing types means that the number of smaller terms increases, i.e., the problems get potentially more difficult for the constraint solver, in particular for unsatisfiable constraints. The table below again shows some typical results. 1112 examples were completed by the CC algorithm within 30 min. The UF domain contains larger examples than the UEQ domain, but the characteristics remain: constraint solving itself takes almost no time. Again, all timings are given in hundredths of a second.


Here uf.555113 is the worst example with respect to constraint solving time, with 1.34 s for 5120 calls. Although alternating KBO constraint solving is NP-hard, in practice there are typically only a few inequalities, meaning that out of the overall number of terms smaller than β, only a few need to be considered.

# 6 Discussion

We have studied a number of specific KBO constraint solving problems motivated by the SCL calculus and established their complexity. Except for simple, single right-ground KBO constraints, all studied problems are proven NP-hard. We proposed an algorithm that also handles alternating KBO constraints, which involve a quantifier alternation. The algorithm performs well on benchmark problems. Our next step is to turn our naive CC implementation with variables into a robust algorithm.

Acknowledgments. We thank our reviewers for their constructive comments that helped us improve the paper.

#### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# A Critical Pair Criterion for Level-Commutation of Conditional Term Rewriting Systems

Ryota Haga, Yuki Kagaya, and Takahito Aoto(B)

Niigata University, Niigata, Japan {r-haga,kagaya}@nue.ie.niigata-u.ac.jp, aoto@ie.niigata-u.ac.jp

Abstract. The rewrite relation of a conditional term rewriting system (CTRS) can be divided into a hierarchy of rewrite relations of term rewriting systems (TRSs) by the depth of the recursive use of the rewrite relation in conditions; a CTRS is said to be level-confluent if each of these TRSs is confluent, and level-confluence implies confluence. We introduce level-commutation of CTRSs, which extends the notion of level-confluence in a way similar to extending confluence to commutation, and give a critical pair criterion for level-commutation of oriented CTRSs with extra variables (3-CTRSs). Our result generalizes a criterion for commutation of TRSs of (Toyama, 1987), and properly extends a criterion for level-confluence of orthogonal oriented 3-CTRSs (Suzuki et al., 1995). We also present criteria for level-confluence and commutation of join and semi-equational 3-CTRSs that may have overlaps.

Keywords: Level-commutation · Level-confluence · Commutation · Confluence · Critical pair · Conditional term rewriting systems

# 1 Introduction

Confluence, which guarantees unique results of computations, is an important property of term rewriting systems (TRSs). Commutation between two TRSs is a natural generalization of confluence in the sense that self-commutation coincides with confluence. It also allows one to infer confluence of TRSs in a modular way: the union of two confluent TRSs is confluent if they commute.

Conditional term rewriting systems (CTRSs) are extensions of TRSs in which each rewrite rule can be equipped with conditions, where these conditions are evaluated recursively using the underlying CTRS itself. Some types of CTRSs are known as models of functional (and logic) programs. The underlying logic of TRSs is equational logic, whereas that of CTRSs is

Parts of this research were done while the first and second authors were students at Niigata University. Partial results of this paper have appeared in the workshops PPL 2020, PPL 2022, and IWC 2022.

called quasi-equational logic, constituting an important class of systems for reasoning about a wider class of algebras.

From the computational point of view, the rewrite relation of a CTRS can be divided into a hierarchy of rewrite relations of TRSs by the depth of the recursive use of the rewrite relation in conditions; a CTRS is said to be level-confluent if each of these TRSs is confluent. Suzuki et al. showed a criterion for orthogonal (i.e. left-linear non-overlapping) oriented CTRSs to be level-confluent [14]. Level-confluence implies confluence, and their result can be thought of as a generalization of the confluence of orthogonal TRSs. More crucially, since far fewer criteria have been obtained for CTRSs compared to TRSs, level-confluence can be seen as an important approach to obtaining confluence proofs of CTRSs. For TRSs, many extensions of the orthogonality criterion to confluence of left-linear (possibly overlapping) TRSs have been explored (e.g., [4,8,11,16]); similar extensions for CTRSs are not known. Likewise, several criteria ensuring commutation of left-linear TRSs are known (e.g., [16,19]), but again, similar criteria for left-linear CTRSs are not known. In this paper, we give a criterion for a class of (possibly overlapping) left-linear oriented CTRSs under which we prove level-commutation of such CTRSs. Our result is a generalization of the one given for TRSs in [16] and properly extends the result of [14] mentioned above. We also present criteria for level-confluence and commutation of left-linear join and semi-equational CTRSs that may have overlaps.

The rest of the paper is organized as follows. In the next section, we fix some notions and notations used in this paper and explain two results that form the starting points of our work. In Sect. 3, we present our main theorem on level-commutation of *oriented* CTRSs together with its proof in detail, and explain its relation to previous results. We then give some results on *join* CTRSs and *semi-equational* CTRSs in Sect. 4. Section 5 concludes.

#### 2 Preliminaries

We basically follow standard notions and notations (e.g., [3,10]). Below, we explain some key notions and fix the notations used in this paper, while omitting most definitions of standard notions.

We consider a set $\mathcal{F}$ of function symbols. The set of variables is denoted by $\mathcal{V}$ and the set of terms over $\mathcal{F}$ and $\mathcal{V}$ by $T(\mathcal{F}, \mathcal{V})$. We sometimes specify a set $\mathcal{C} \subseteq \mathcal{F}$ of *constructors* to give the set of constructor terms $T(\mathcal{C}, \mathcal{V})$, i.e. terms over $\mathcal{C}$ and $\mathcal{V}$. The set of variables in a term $t$ is denoted by $\mathcal{V}(t)$. A term $t$ is *linear* if each variable occurs in $t$ at most once; $t$ is *ground* if no variable occurs in $t$. The size of a term $t$ is denoted by $|t|$. The set of positions in a term $t$ is denoted by $\mathrm{Pos}(t)$; the *root* position is written as $\epsilon$. The symbol at a position $p \in \mathrm{Pos}(t)$ in a term $t$ is written as $t(p)$. We put $\mathrm{Pos}_{\mathcal{F}}(t) = \{p \in \mathrm{Pos}(t) \mid t(p) \in \mathcal{F}\}$.

If $t = C[u]_p$ for a context $C$, we say $u$ is a *subterm* of $t$ (at a position $p \in \mathrm{Pos}(t)$). The subterm of $t$ at a position $p \in \mathrm{Pos}(t)$ is written as $t|_p$. For terms $t = C[u]_p$ and $s$, the term $C[s]_p$ is denoted by $t[s]_p$. We speak of subterm *occurrences* when we consider subterms together with their respective positions; see e.g. [15] for a precise formalization of subterm occurrences. We will use capital letters $A, B, \ldots$ for subterm occurrences. For simplicity, a subterm occurrence $A$ in a term is also treated as a term $A$ (for example, we might write $\mathcal{V}(A)$). Suppose $A, B$ are subterm occurrences in a term $t$. If $t = C[A]_p$ and $t = C'[B]_q$ with $p \leq q$ ($p < q$), we say that $B$ is a (proper) subterm occurrence in a subterm occurrence $A$ and write $B \subseteq A$ ($B \subset A$, respectively). Overlaps of subterm occurrences will be used to give a notion of weight on which our induction proof works.

A *term rewriting system* (*TRS*, for short) $\mathcal{R}$ is a set of *rewrite rules*, where each rewrite rule $l \to r$ satisfies the conditions $l \notin \mathcal{V}$ and $\mathcal{V}(r) \subseteq \mathcal{V}(l)$. Rewrite rules are identified modulo renaming. A TRS $\mathcal{R}$ is *left-linear* if $l$ is linear for each $l \to r \in \mathcal{R}$. We write $s \to^{p}_{\mathcal{R}} t$ if $s|_p$ is the redex of this rewrite step; we also write $s \to^{A}_{\mathcal{R}} t$ to indicate the redex occurrence $A$ of this rewrite step. The relation $\to_{\mathcal{R}}$ over terms is called the *rewrite relation* of $\mathcal{R}$, and its reflexive transitive closure is denoted by $\overset{*}{\to}_{\mathcal{R}}$. A *reduction* is a sequence of successive rewrite steps $t_0 \to_{\mathcal{R}} t_1 \to_{\mathcal{R}} \cdots \to_{\mathcal{R}} t_n$, where $n$ is the *length* of this reduction. When no confusion arises, a reduction $s \to_{\mathcal{R}} \cdots \to_{\mathcal{R}} t$ is written as $s \overset{*}{\to}_{\mathcal{R}} t$ for brevity, whose length is denoted by $|s \overset{*}{\to}_{\mathcal{R}} t|$. We have a *parallel rewrite step* $s \mathrel{\|\!\to}_{\mathcal{R}} t$ if $s = C[A_1,\ldots,A_n]$ and $t = C[B_1,\ldots,B_n]$ ($n \geq 0$) for some context $C$ and subterm occurrences $A_i, B_i$ such that $A_i \to^{\epsilon}_{\mathcal{R}} B_i$ for all $i = 1,\ldots,n$; this rewrite step is written as $s \mathrel{\|\!\to}^{A_1,\ldots,A_n}_{\mathcal{R}} t$ to indicate the redex occurrences $A_1,\ldots,A_n$.

A relation $\to$ is *confluent* if $\overset{*}{\leftarrow} \circ \overset{*}{\to} \subseteq \overset{*}{\to} \circ \overset{*}{\leftarrow}$; a TRS $\mathcal{R}$ is confluent if so is its rewrite relation $\to_{\mathcal{R}}$. Relations $\to$ and $\leadsto$ *commute* (or are *commutative*) if $\overset{*}{\leftarrow} \circ \overset{*}{\leadsto} \subseteq \overset{*}{\leadsto} \circ \overset{*}{\leftarrow}$; TRSs $\mathcal{R}$ and $\mathcal{S}$ commute if so do their rewrite relations $\to_{\mathcal{R}}$ and $\to_{\mathcal{S}}$. Clearly, self-commutation coincides with confluence, and from a sufficient criterion for commutation one for confluence naturally arises.
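Since both definitions are pure relation inclusions, they can be checked mechanically on a finite abstract rewrite relation. The following sketch, with toy data of our own choosing, computes both sides of the confluence inclusion $\overset{*}{\leftarrow} \circ \overset{*}{\to} \subseteq \overset{*}{\to} \circ \overset{*}{\leftarrow}$ directly:

```python
# Finite sanity check of the confluence definition: represent -> as a set of
# pairs, build the reflexive transitive closure, and test the inclusion
# <-* ∘ ->*  ⊆  ->* ∘ <-* (left-to-right relation composition).

def star(rel, elems):
    """Reflexive transitive closure of rel, as a set of pairs."""
    closure = {(x, x) for x in elems} | set(rel)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def compose(r1, r2):
    """Relation composition: x (r1 ∘ r2) z iff ∃y. x r1 y and y r2 z."""
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}

def confluent(rel, elems):
    s = star(rel, elems)
    inv = {(b, a) for (a, b) in s}      # the converse <-*
    return compose(inv, s) <= compose(s, inv)

elems = {'a', 'b', 'c', 'd'}
# The peak b <- a -> c is joinable at d: confluent.
assert confluent({('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd')}, elems)
# b and c are distinct normal forms of a: not confluent.
assert not confluent({('a', 'b'), ('a', 'c')}, elems)
```

The commutation check is the same computation with two relations in place of one.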

Let $l_1 \to r_1$ and $l_2 \to r_2$ be rewrite rules whose sets of variables are renamed to be disjoint. If a non-variable subterm $l_2|_p$ of $l_2$ satisfies $l_2|_p\sigma = l_1\sigma$ for some substitution $\sigma$, we say that $l_1 \to r_1$ *overlaps on* $l_2 \to r_2$ (at $p$), provided that $p \neq \epsilon$ in the case that $l_1 \to r_1$ and $l_2 \to r_2$ are identical. Suppose $l_1 \to r_1$ overlaps on $l_2 \to r_2$ at $p$ and $\sigma$ is an mgu of $l_2|_p$ and $l_1$. Then the pair $\langle l_2[r_1]_p\sigma, r_2\sigma \rangle$ is called a *critical pair* (obtained from that overlap); the pair is called *outer* if $p = \epsilon$ and *inner* if $p > \epsilon$. The set of critical pairs from overlaps of rules of $\mathcal{R}$ is denoted by $\mathit{CP}(\mathcal{R})$; the sets of outer (inner) critical pairs are denoted by $\mathit{CP_{out}}(\mathcal{R})$ (resp. $\mathit{CP_{in}}(\mathcal{R})$). Let $\mathcal{R}, \mathcal{S}$ be TRSs. The set of critical pairs obtained from overlaps of $l_1 \to r_1 \in \mathcal{R}$ on $l_2 \to r_2 \in \mathcal{S}$ is denoted by $\mathit{CP}(\mathcal{R}, \mathcal{S})$. The sets $\mathit{CP_{out}}(\mathcal{R}, \mathcal{S})$ and $\mathit{CP_{in}}(\mathcal{R}, \mathcal{S})$ are defined similarly. We are now ready to state a sufficient criterion for commutation of TRSs.
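The critical-pair construction above is directly computable: enumerate the non-variable positions of $l_2$, unify each such subterm with $l_1$, and instantiate. The following sketch (our own toy encoding: variables are strings, applications are tuples; occurs check omitted) illustrates this on the classic associativity rule overlapping a renamed copy of itself:

```python
# Critical pairs of (l1 -> r1) overlapping on (l2 -> r2): for each
# non-variable position p of l2, unify l2|p with l1 (variables of the two
# rules assumed already disjoint) and build <l2[r1]p σ, r2 σ>.

def subst(t, s):
    if isinstance(t, str):
        return subst(s[t], s) if t in s else t
    return (t[0],) + tuple(subst(a, s) for a in t[1:])

def unify(t1, t2):
    """Most general unifier as a dict, or None (no occurs check: sketch only)."""
    s, stack = {}, [(t1, t2)]
    while stack:
        a, b = stack.pop()
        a, b = subst(a, s), subst(b, s)
        if a == b:
            continue
        if isinstance(a, str):
            s[a] = b
        elif isinstance(b, str):
            s[b] = a
        elif a[0] == b[0] and len(a) == len(b):
            stack.extend(zip(a[1:], b[1:]))
        else:
            return None
    return s

def positions(t):
    """Non-variable positions of t as index paths; () is the root ε."""
    if isinstance(t, str):
        return []
    return [()] + [(i,) + p for i, a in enumerate(t[1:], 1) for p in positions(a)]

def at(t, p):
    return t if not p else at(t[p[0]], p[1:])

def replace(t, p, u):
    return u if not p else t[:p[0]] + (replace(t[p[0]], p[1:], u),) + t[p[0] + 1:]

def critical_pairs(rule1, rule2):
    (l1, r1), (l2, r2) = rule1, rule2
    for p in positions(l2):
        sigma = unify(at(l2, p), l1)
        if sigma is not None:
            yield p, (subst(replace(l2, p, r1), sigma), subst(r2, sigma))

# f(f(x,y),z) -> f(x,f(y,z)) overlapping a renamed variant of itself.
r1 = (('f', ('f', 'x', 'y'), 'z'), ('f', 'x', ('f', 'y', 'z')))
r2 = (('f', ('f', 'u', 'v'), 'w'), ('f', 'u', ('f', 'v', 'w')))
cps = dict(critical_pairs(r1, r2))
# Per the definition, the root overlap of a rule with its own renamed copy is
# excluded (rules are identified modulo renaming); the interesting pair is the
# inner one at position (1,): <f(f(x,f(y,z)),w), f(f(x,y),f(z,w))>.
assert cps[(1,)] == (('f', ('f', 'x', ('f', 'y', 'z')), 'w'),
                     ('f', ('f', 'x', 'y'), ('f', 'z', 'w')))
```

This recovers the well-known associativity critical pair; a full implementation would additionally filter the trivial variant-root overlap, which the sketch leaves in `cps`.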

Proposition 1 ([16]). *Let $\mathcal{R}$ and $\mathcal{S}$ be left-linear TRSs. If both of the following conditions are satisfied, then $\mathcal{R}$ and $\mathcal{S}$ commute:*

*1. for any $\langle p, q \rangle \in \mathit{CP}(\mathcal{R}, \mathcal{S})$, $p \mathrel{\|\!\to}_{\mathcal{S}} \circ \overset{*}{\leftarrow}_{\mathcal{R}} q$, and*
*2. for any $\langle q, p \rangle \in \mathit{CP_{in}}(\mathcal{S}, \mathcal{R})$, $q \mathrel{\|\!\to}_{\mathcal{R}} p$ holds.*

The above criterion for commutation yields a criterion for confluence: a left-linear TRS $\mathcal{R}$ is confluent if (1) for any $\langle p, q \rangle \in \mathit{CP_{out}}(\mathcal{R})$, $p \mathrel{\|\!\to}_{\mathcal{R}} \circ \overset{*}{\leftarrow}_{\mathcal{R}} q$, and (2) for any $\langle q, p \rangle \in \mathit{CP_{in}}(\mathcal{R})$, $q \mathrel{\|\!\to}_{\mathcal{R}} p$ holds. Note that in condition (1) it suffices to consider $\langle p, q \rangle \in \mathit{CP_{out}}(\mathcal{R})$ instead of $\langle p, q \rangle \in \mathit{CP}(\mathcal{R})$, because of the presence of condition (2).

A (directed) equation is an ordered pair $\langle u, v \rangle$ of terms, written e.g. $u \approx v$. A *conditional rewrite rule* has the form $l \to r \Leftarrow u_1 \approx v_1, \ldots, u_k \approx v_k$ where $l \notin \mathcal{V}$; here $u_1 \approx v_1, \ldots, u_k \approx v_k$ is a sequence of (directed) equations, called the *conditional part* of the rule. Often we will use a meta-variable, say $c$, to denote the conditional part of a rule. Let $c = u_1 \approx v_1, \ldots, u_k \approx v_k$. Then, for any substitution $\sigma$, we put $c\sigma = u_1\sigma \approx v_1\sigma, \ldots, u_k\sigma \approx v_k\sigma$. Also, we write e.g. $\mathcal{V}(l, c)$ to denote the set of variables occurring in $l$ and $c$. We often also treat $c$ as a set $\{u_1 \approx v_1, \ldots, u_k \approx v_k\}$ so as to write $u \approx v \in c$, $c\sigma \subseteq \leadsto$, etc., whose meaning should be apparent. The empty sequence is also written as $\emptyset$, and $l \to r \Leftarrow \emptyset$ is abbreviated as $l \to r$.

A *conditional term rewriting system* (CTRS, for short) is a set of conditional rewrite rules. In the literature, CTRSs are categorized into several types according to the way the conditions of the rules are interpreted in the definition of their rewrite steps. A *rewrite step* of an *oriented* CTRS $\mathcal{R}$ is defined via the following TRSs $\mathcal{R}_n$ ($n \in \mathbb{N}$), which are given inductively as follows: $\mathcal{R}_0 = \emptyset$, $\mathcal{R}_{n+1} = \{l\sigma \to r\sigma \mid l \to r \Leftarrow c \in \mathcal{R},\ c\sigma \subseteq \overset{*}{\to}_{\mathcal{R}_n}\}$. A rewrite step $s \to_{\mathcal{R}} t$ of the CTRS $\mathcal{R}$ is given by: $s \to_{\mathcal{R}} t$ iff $s \to_{\mathcal{R}_n} t$ for some $n$. Note that $m \leq n$ implies $\to_{\mathcal{R}_m} \subseteq \to_{\mathcal{R}_n}$. The smallest $n$ such that $s \to_{\mathcal{R}_n} t$ is called the *level* of the rewrite step $s \to_{\mathcal{R}} t$. We also use the notation $\to_{\mathcal{R}_{<n}} = \bigcup_{i<n} \to_{\mathcal{R}_i}$. We will also write $\mathcal{R}_n \vdash c\sigma$ to denote $c\sigma \subseteq \overset{*}{\to}_{\mathcal{R}_n}$. Except in Sect. 4, we only consider oriented CTRSs in this paper, and thus postpone the discussion of join and semi-equational CTRSs until Sect. 4. A CTRS $\mathcal{R}$ is *level-confluent* if the TRSs $\mathcal{R}_n$ are confluent for all $n \geq 0$. One can naturally extend the notion of level-confluence in a way similar to extending confluence to commutation.
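The inductive hierarchy $\mathcal{R}_0 \subseteq \mathcal{R}_1 \subseteq \cdots$ can be made concrete on a small example. The following sketch (our own toy system and encoding, assuming terminating rewriting so that condition checking by search halts) evaluates one-step rewriting under $\mathcal{R}_n$, checking each condition $u\sigma \overset{*}{\to}_{\mathcal{R}_{n-1}} v\sigma$ recursively at the previous level:

```python
# Level hierarchy of an oriented CTRS on ground terms. Variables are strings;
# applications are tuples. A rule is (l, r, conds) with conds a list of
# pairs (u, v) read as the oriented condition u ->* v.

def match(pat, t, s=None):
    """Match a pattern against a ground term; substitution dict or None."""
    s = dict(s or {})
    if isinstance(pat, str):                                  # variable
        if pat in s and s[pat] != t:
            return None
        s[pat] = t
        return s
    if isinstance(t, str) or pat[0] != t[0] or len(pat) != len(t):
        return None
    for pa, ta in zip(pat[1:], t[1:]):
        s = match(pa, ta, s)
        if s is None:
            return None
    return s

def apply_subst(t, s):
    return s[t] if isinstance(t, str) else (t[0],) + tuple(apply_subst(a, s) for a in t[1:])

def step_Rn(t, rules, n):
    """One-step successors of ground term t under R_n; R_0 is empty."""
    if n == 0:
        return set()
    out = set()
    for (l, r, conds) in rules:                               # root steps
        s = match(l, t)
        if s is not None and all(
                reaches(apply_subst(u, s), apply_subst(v, s), rules, n - 1)
                for (u, v) in conds):                          # check at level n-1
            out.add(apply_subst(r, s))
    for i in range(1, len(t)):                                # steps below the root
        for sub in step_Rn(t[i], rules, n):
            out.add(t[:i] + (sub,) + t[i + 1:])
    return out

def reaches(src, tgt, rules, n):
    """src ->*_{R_n} tgt by exhaustive search (assumes termination)."""
    seen, todo = {src}, [src]
    while todo:
        u = todo.pop()
        if u == tgt:
            return True
        for v in step_Rn(u, rules, n):
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return False

a, b, c = ('a',), ('b',), ('c',)
rules = [(a, b, []), (b, c, []),                 # unconditional: present in R_1
         (('f', 'x'), ('g', 'x'), [('x', c)])]  # needs x ->* c at the lower level
assert step_Rn(('f', a), rules, 1) == {('f', b)}  # condition a ->* c fails in R_0
assert ('g', a) in step_Rn(('f', a), rules, 2)    # level of f(a) -> g(a) is 2
```

The toy shows the "level" of a step concretely: $f(a) \to g(a)$ first appears in $\mathcal{R}_2$, because establishing its condition $a \overset{*}{\to} c$ requires the two unconditional $\mathcal{R}_1$-steps.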

Definition 1 (Level-commutation). *CTRSs $\mathcal{R}$ and $\mathcal{S}$ are* level-commutative *if for any $m, n \geq 0$, $\overset{*}{\leftarrow}_{\mathcal{R}_m} \circ \overset{*}{\to}_{\mathcal{S}_n} \subseteq \overset{*}{\to}_{\mathcal{S}_n} \circ \overset{*}{\leftarrow}_{\mathcal{R}_m}$.*

Clearly, level-commutativity (level-confluence) implies commutativity (resp. confluence), and self-level-commutativity implies level-confluence.

A conditional rewrite rule $l \to r \Leftarrow c$ has *type 1* if $\mathcal{V}(r, c) \subseteq \mathcal{V}(l)$, *type 2* if $\mathcal{V}(r) \subseteq \mathcal{V}(l)$, *type 3* if $\mathcal{V}(r) \subseteq \mathcal{V}(l, c)$, and *type 4* unconditionally. A CTRS $\mathcal{R}$ has type $n$ if all its rules have type $n$; CTRSs of type $n$ are also referred to as $n$*-CTRSs*. We will mainly deal with 3-CTRSs below. Variables occurring in $r$ or $c$ that are not contained in $\mathcal{V}(l)$ are called *extra* variables.

We now explain some notions necessary to give a sufficient criterion for level-confluence [14]. A CTRS $\mathcal{R}$ is *properly oriented* if, for any $l \to r \Leftarrow u_1 \approx v_1, \ldots, u_k \approx v_k \in \mathcal{R}$, $\mathcal{V}(r) \not\subseteq \mathcal{V}(l)$ implies $\mathcal{V}(u_i) \subseteq \mathcal{V}(l) \cup \bigcup_{j=1}^{i-1} \mathcal{V}(v_j)$ for all $1 \leq i \leq k$. A CTRS $\mathcal{R}$ is *right-stable* if, for all $l \to r \Leftarrow u_1 \approx v_1, \ldots, u_k \approx v_k \in \mathcal{R}$, (1) $(\mathcal{V}(l) \cup (\bigcup_{j=1}^{i-1} \mathcal{V}(u_j, v_j)) \cup \mathcal{V}(u_i)) \cap \mathcal{V}(v_i) = \emptyset$ for all $1 \leq i \leq k$, and (2) for any $1 \leq i \leq k$, $v_i$ is either a linear constructor term or a ground $\mathcal{R}_u$-normal form, where the constructors are given by $\mathcal{C} = \mathcal{F} \setminus \{l(\epsilon) \mid l \to r \Leftarrow c \in \mathcal{R}\}$ and the (extended) TRS $\mathcal{R}_u$ is given by $\mathcal{R}_u = \{l \to r \mid l \to r \Leftarrow c \in \mathcal{R}\}$. A CTRS $\mathcal{R}$ is *left-linear* if $l$ is linear for all $l \to r \Leftarrow c \in \mathcal{R}$. Let $l_1 \to r_1 \Leftarrow c_1$ and $l_2 \to r_2 \Leftarrow c_2$ be conditional rewrite rules whose sets of variables are renamed to be disjoint. We say $l_1 \to r_1 \Leftarrow c_1$ *overlaps on* $l_2 \to r_2 \Leftarrow c_2$ (at $p$) if a non-variable subterm $l_2|_p$ of $l_2$ satisfies $l_2|_p\sigma = l_1\sigma$ for some substitution $\sigma$, provided that $p \neq \epsilon$ in the case that $l_1 \to r_1 \Leftarrow c_1$ and $l_2 \to r_2 \Leftarrow c_2$ are identical. A CTRS $\mathcal{R}$ is *non-overlapping* if there is no overlap between rules of $\mathcal{R}$; a CTRS $\mathcal{R}$ is *orthogonal* if it is left-linear and non-overlapping.

Proposition 2 ([14]). *Let $\mathcal{R}$ be an orthogonal, properly oriented, right-stable 3-CTRS. Then $\overset{*}{\leftarrow}_{\mathcal{R}_m} \circ \overset{*}{\to}_{\mathcal{R}_n} \subseteq \overset{*}{\to}_{\mathcal{R}_n} \circ \overset{*}{\leftarrow}_{\mathcal{R}_m}$ for any $m, n \geq 0$. In particular, $\mathcal{R}$ is level-confluent.*

#### 3 Level-Commutation of Oriented CTRSs

Proposition 1 only deals with TRSs, but its scope is not limited to orthogonal ones. On the other hand, Proposition 2 can deal with CTRSs (not only TRSs), but is limited to the orthogonal case. Also, Proposition 2 only claims (level-)confluence, whereas Proposition 1 claims commutation. A natural question is whether, and how, these two propositions can be unified; we focus on this question in this section.

Our basic idea is to unify the proofs of [16, Theorem 3.1] and [14, Theorem 4.6]. The basic scenario of the former proof is to show that $\mathrel{\|\!\leftarrow}_{\mathcal{R}} \circ \mathrel{\|\!\to}_{\mathcal{S}} \subseteq \mathrel{\|\!\to}_{\mathcal{S}} \circ \overset{*}{\|\!\leftarrow}_{\mathcal{R}}$. In the latter, an extended parallel rewrite relation $\mathrel{\|\!\to}_{\mathcal{R}_n}$ extending $\mathrel{\|\!\to}_{\mathcal{R}}$ was introduced, and it was shown that $\mathrel{\|\!\leftarrow}_{\mathcal{R}_m} \circ \mathrel{\|\!\to}_{\mathcal{R}_n} \subseteq \mathrel{\|\!\to}_{\mathcal{R}_n} \circ \mathrel{\|\!\leftarrow}_{\mathcal{R}_m}$. Naturally, our first attempt was to prove $\mathrel{\|\!\leftarrow}_{\mathcal{R}_m} \circ \mathrel{\|\!\to}_{\mathcal{S}_n} \subseteq \mathrel{\|\!\to}_{\mathcal{S}_n} \circ \overset{*}{\|\!\leftarrow}_{\mathcal{R}_m}$. Examining the details, however, it turned out that this scenario does not work (the induction does not go through). Thus, our first key ingredient is to modify the proof scenario to show:

$$\mathrel{\|\!\leftarrow}_{\mathcal{R}_m} \circ \mathrel{\|\!\to}_{\mathcal{S}_n} \;\subseteq\; \mathrel{\|\!\to}_{\mathcal{S}_n} \circ \overset{*}{\|\!\to}_{\mathcal{S}_{<n}} \circ \overset{*}{\leftarrow}_{\mathcal{R}_m} \tag{$*$}$$

We now explain, in an abstract setting, why this scenario is sound.

Let $(\to_n)_{n\in\mathbb{N}}$ be an $\mathbb{N}$-indexed family of relations on a set $X$. We put $\to_{<n} = \bigcup_{i<n} \to_i$. We say $(\to_n)_{n\in\mathbb{N}}$ is *up-simulated* if $\overset{*}{\to}_{<n} \subseteq \to_n$ for any $n \in \mathbb{N}$.

Lemma 1. *Let $(\to_n)_{n\in\mathbb{N}}$, $(\leadsto_n)_{n\in\mathbb{N}}$ be up-simulated families of relations on a set $X$. Suppose that*<sup>1</sup> *for any $m, n \in \mathbb{N}$, $\leftarrow_m \circ \leadsto_n \subseteq \leadsto_n \circ \overset{*}{\leadsto}_{<n} \circ \overset{*}{\leftarrow}_m$. Then, for any $m, n \in \mathbb{N}$, we have (1) $\overset{*}{\leftarrow}_m \circ \leadsto_n \subseteq \leadsto_n \circ \overset{*}{\leadsto}_{<n} \circ \overset{*}{\leftarrow}_m$, (2) $\overset{*}{\leftarrow}_m \circ \leadsto_n \subseteq \overset{*}{\leadsto}_n \circ \overset{*}{\leftarrow}_m$, and (3) $\overset{*}{\leftarrow}_m \circ \overset{*}{\leadsto}_n \subseteq \overset{*}{\leadsto}_n \circ \overset{*}{\leftarrow}_m$.*

*Proof.* Use induction. Use (1) to show (2), and then (2) to (3).

<sup>1</sup> The criterion has some similarity with the *decreasing diagrams*; however, because multiple →*m*-steps are allowed, it is not at all apparent (currently, to the authors) whether the criterion can be obtained via the decreasing diagrams.

Now let us apply our abstract framework to CTRSs. Let $\mathcal{R}$ be a CTRS. The notion of extended parallel rewriting [14] is given as follows: we write $s \mathrel{\|\!\to}_{\mathcal{R}_n} t$ if $s = C[A_1,\ldots,A_p]$ and $t = C[B_1,\ldots,B_p]$ ($p \geq 0$) for some context $C$ and subterm occurrences $A_i, B_i$ such that either $A_i \to^{\epsilon}_{\mathcal{R}_n} B_i$ or $A_i \overset{*}{\to}_{\mathcal{R}_{<n}} B_i$ for all $i = 1,\ldots,p$. We put $\mathrel{\|\!\to}_{\mathcal{R}} = \bigcup_{n \geq 0} \mathrel{\|\!\to}_{\mathcal{R}_n}$, which is called the *extended parallel rewrite step* of $\mathcal{R}$. We will also write $s \mathrel{\|\!\to}^{A_1,\ldots,A_p}_{\mathcal{R}} t$ to indicate the subterm occurrences $A_1,\ldots,A_p$.

Then, from Lemma 1, the following easily follows:

Lemma 2. *Let $\mathcal{R}, \mathcal{S}$ be CTRSs. Suppose $\mathrel{\|\!\leftarrow}_{\mathcal{R}_m} \circ \mathrel{\|\!\to}_{\mathcal{S}_n} \subseteq \mathrel{\|\!\to}_{\mathcal{S}_n} \circ \overset{*}{\|\!\to}_{\mathcal{S}_{<n}} \circ \overset{*}{\leftarrow}_{\mathcal{R}_m}$ for any $m, n \geq 0$. Then, for any $m, n$, we have $\overset{*}{\|\!\leftarrow}_{\mathcal{R}_m} \circ \overset{*}{\|\!\to}_{\mathcal{S}_n} \subseteq \overset{*}{\|\!\to}_{\mathcal{S}_n} \circ \overset{*}{\|\!\leftarrow}_{\mathcal{R}_m}$. Hence, for any $m, n$, we have $\overset{*}{\leftarrow}_{\mathcal{R}_m} \circ \overset{*}{\to}_{\mathcal{S}_n} \subseteq \overset{*}{\to}_{\mathcal{S}_n} \circ \overset{*}{\leftarrow}_{\mathcal{R}_m}$.*

*Proof.* Suppose $t_1 \overset{*}{\leftarrow}_{\mathcal{R}_m} t \overset{*}{\to}_{\mathcal{S}_n} t_2$. As $\to_{\mathcal{R}_k} \subseteq \mathrel{\|\!\to}_{\mathcal{R}_k}$ for each $k$ (and similarly for $\mathcal{S}$), we have $t_1 \overset{*}{\|\!\leftarrow}_{\mathcal{R}_m} t \overset{*}{\|\!\to}_{\mathcal{S}_n} t_2$. From the fact that $\to_{\mathcal{R}_m} \subseteq \to_{\mathcal{R}_n}$ for $m < n$, it immediately follows that $(\mathrel{\|\!\to}_{\mathcal{R}_n})_{n\in\mathbb{N}}$ is up-simulated (again, similarly for $\mathcal{S}$). Thus, by Lemma 1 and our hypothesis, there exists $t'$ such that $t_1 \overset{*}{\|\!\to}_{\mathcal{S}_n} t' \overset{*}{\|\!\leftarrow}_{\mathcal{R}_m} t_2$. Because $\overset{*}{\|\!\to}_{\mathcal{R}_k} \subseteq \overset{*}{\to}_{\mathcal{R}_k}$ for each $k$ (and similarly for $\mathcal{S}$), we obtain $t_1 \overset{*}{\to}_{\mathcal{S}_n} t' \overset{*}{\leftarrow}_{\mathcal{R}_m} t_2$.

From this lemma we conclude that our proof scenario ($*$) indeed suffices to obtain level-commutation.

For our proof below, we need the induction hypothesis in a form more general than the above. The following lemma serves this purpose.

Lemma 3. *Let $\mathcal{R}, \mathcal{S}$ be CTRSs and $k \in \mathbb{N}$. Suppose $\mathrel{\|\!\leftarrow}_{\mathcal{R}_m} \circ \mathrel{\|\!\to}_{\mathcal{S}_n} \subseteq \mathrel{\|\!\to}_{\mathcal{S}_n} \circ \overset{*}{\|\!\to}_{\mathcal{S}_{<n}} \circ \overset{*}{\leftarrow}_{\mathcal{R}_m}$ for any $m, n$ such that $m + n < k$. Then, for any $m, n$ such that $m + n < k$, we have (1) $\overset{*}{\|\!\leftarrow}_{\mathcal{R}_m} \circ \mathrel{\|\!\to}_{\mathcal{S}_n} \subseteq \mathrel{\|\!\to}_{\mathcal{S}_n} \circ \overset{*}{\|\!\to}_{\mathcal{S}_{<n}} \circ \overset{*}{\|\!\leftarrow}_{\mathcal{R}_m}$, (2) $\overset{*}{\|\!\leftarrow}_{\mathcal{R}_m} \circ \mathrel{\|\!\to}_{\mathcal{S}_n} \subseteq \overset{*}{\|\!\to}_{\mathcal{S}_n} \circ \overset{*}{\|\!\leftarrow}_{\mathcal{R}_m}$, and (3) $\overset{*}{\|\!\leftarrow}_{\mathcal{R}_m} \circ \overset{*}{\|\!\to}_{\mathcal{S}_n} \subseteq \overset{*}{\|\!\to}_{\mathcal{S}_n} \circ \overset{*}{\|\!\leftarrow}_{\mathcal{R}_m}$.*

*Proof.* Use an abstract version of the lemma, which can be proved in a way similar to Lemma 1.

Our second key ingredient is the following alternative definition of conditional critical pairs.

Definition 2 (Condition-separated CCP). *Suppose $l_1 \to r_1 \Leftarrow c_1$ overlaps on $l_2 \to r_2 \Leftarrow c_2$ at $p$ and $\sigma$ is an mgu of $l_2|_p$ and $l_1$. Then the quadruple $\langle l_2[r_1]_p\sigma, r_2\sigma \rangle \Leftarrow c_1\sigma,\ c_2\sigma$ is called a* (condition-separated) conditional critical pair (CCP, for short) *(obtained from that overlap); when $p = \epsilon$ the pair is called* outer*, and when $p > \epsilon$ the pair is called* inner*. The set of (outer, inner) critical pairs obtained from overlaps of $l_1 \to r_1 \Leftarrow c_1 \in \mathcal{R}$ on $l_2 \to r_2 \Leftarrow c_2 \in \mathcal{S}$ is denoted by CCP$(\mathcal{R}, \mathcal{S})$ (resp. CCP$_{out}(\mathcal{R}, \mathcal{S})$, CCP$_{in}(\mathcal{R}, \mathcal{S})$). The set of (outer, inner) critical pairs from overlaps of rules of $\mathcal{R}$ is denoted by CCP$(\mathcal{R})$ (resp. CCP$_{out}(\mathcal{R})$, CCP$_{in}(\mathcal{R})$).*

In most of the literature, instead of distinguishing the two sequences $c_1\sigma$ and $c_2\sigma$, their combined sequence is employed in the definition of CCPs. But in our setting, where the CTRSs $\mathcal{R}$ and $\mathcal{S}$ may be different, this distinction is important for stating the precise conditions of our theorem.

We now give one more preparation: the following lemma is used several times in the proof of our main theorem; when it is used there, the assumption (†) of the lemma can be inferred from the induction hypothesis (of the proof of the main theorem) using Lemma 3.

Lemma 4. *Let $\mathcal{R}$ and $\mathcal{S}$ be 3-CTRSs and suppose that $\mathcal{R}$ is left-linear and right-stable. Suppose that $M = l\sigma$, $N = r\sigma$, and $\mathcal{R}_{m-1} \vdash c\sigma$ with $l \to r \Leftarrow c \in \mathcal{R}$. Assume moreover that $M \mathrel{\|\!\to}^{P_1,\ldots,P_p}_{\mathcal{S}_n} P$ and $P_1,\ldots,P_p$ occur in the substitution $\sigma$. Assume that (†) $\overset{*}{\|\!\leftarrow}_{\mathcal{R}_i} \circ \overset{*}{\|\!\to}_{\mathcal{S}_j} \subseteq \overset{*}{\|\!\to}_{\mathcal{S}_j} \circ \overset{*}{\|\!\leftarrow}_{\mathcal{R}_i}$ for any $i, j$ such that $i + j < m + n$. Then there exists $Q$ such that $N \mathrel{\|\!\to}_{\mathcal{S}_n} Q$ and $P \to_{\mathcal{R}_m} Q$.*

Now we present our critical pair criterion for level-commutation.

Theorem 1. *Let $\mathcal{R}$ and $\mathcal{S}$ be left-linear, properly oriented, right-stable 3-CTRSs. If the following conditions are satisfied, then $\mathcal{R}$ and $\mathcal{S}$ are level-commutative:*


*Proof.* Let $M \mathrel{\|\!\to}^{A_1,\ldots,A_{\bar{m}}}_{\mathcal{R}_m} N$ and $M \mathrel{\|\!\to}^{B_1,\ldots,B_{\bar{n}}}_{\mathcal{S}_n} P$. We show $N \mathrel{\|\!\to}_{\mathcal{S}_n} \circ \overset{*}{\|\!\to}_{\mathcal{S}_{<n}} Q$ and $P \overset{*}{\|\!\to}_{\mathcal{R}_m} Q$ for some $Q$. For the rewrite steps used in the critical pair conditions above, note that $\mathrel{\|\!\to}_{\ell} \circ \overset{*}{\to}_{<\ell} = \mathrel{\|\!\to}_{\ell} \circ \overset{*}{\|\!\to}_{<\ell}$ as well as $\overset{*}{\to}_{\ell} = \overset{*}{\|\!\to}_{\ell}$ for any level $\ell$. Let $\Gamma$ and $\Delta$ be the sets of subterm occurrences in the term $M$ given as follows:

$$\begin{array}{l} \Gamma = \{A_i \mid \exists B_j.\; A_i \subset B_j\} \cup \{B_i \mid \exists A_j.\; B_i \subseteq A_j\} \\ \Delta = \{A_i \mid \forall B_j.\; A_i \not\subset B_j\} \cup \{B_i \mid \forall A_j.\; B_i \not\subseteq A_j\} \end{array}$$

Thus, Γ consists of the subterm occurrences A_i that are proper subterm occurrences of some B_j, together with the subterm occurrences B_i that are subterm occurrences of some A_j; Δ consists of the subterm occurrences A_i and B_j not covered this way. Clearly, for any 1 ≤ i ≤ m̄, either A_i ∈ Γ or A_i ∈ Δ holds, and for any 1 ≤ j ≤ n̄, either B_j ∈ Γ or B_j ∈ Δ holds. In the boundary case where A_i and B_j are the same subterm occurrence, we put A_i into Δ and B_j into Γ.

Δ denotes the set of maximal redex occurrences in the following sense. Let Δ = {M_1,…,M_p̄}. Then we have M = C[M_1,…,M_p̄] for some context C. Furthermore, we have N = C[N_1,…,N_p̄] and P = C[P_1,…,P_p̄] for some N_1,…,N_p̄, P_1,…,P_p̄ such that M_i ⇉_{R_m} N_i and M_i ⇉_{S_n} P_i (i = 1,…,p̄). Thus, it suffices to show that for each M_i there exists Q_i such that N_i ⇉_{S_n} ∘ ⇉∗_{S_{<n}} Q_i and P_i ⇉∗_{R_m} Q_i. On the other hand, Γ is used to count the size of the overlaps and provides the induction weight. Let |Γ| = ∑_{D ∈ Γ} |D|. Our proof proceeds by induction on the lexicographic combination of ⟨m + n, |Γ|⟩.

The cases m = 0 or n = 0 are easy; thus we consider the case m > 0 and n > 0. We distinguish two cases:

1. Case M_i ∈ {A_1,…,A_m̄}. Let {B′_1,…,B′_q̄} = {B_j | 1 ≤ j ≤ n̄, B_j ⊆ M_i}, so that M_i ⇉_{S_n} P_i via B′_1,…,B′_q̄. By definition, M_i ⇉_{R_m} N_i is either of the form M_i →∗_{R_{m−1}} N_i or M_i →_{R_m} N_i contracting the redex M_i itself. We distinguish these cases.
(a) Case M_i →∗_{R_{m−1}} N_i. Since →∗_{R_{m−1}} ⊆ ⇉∗_{R_{m−1}}, we have M_i ⇉∗_{R_{m−1}} N_i. Thus, the desired Q_i is obtained by the induction hypothesis and Lemma 3.
(b) Case M_i →_{R_m} N_i contracting the redex M_i itself. Then M_i = lθ, N_i = rθ and ⊢_{R_{m−1}} cθ for some l → r ⇐ c ∈ R and θ. If all redex occurrences B′_j in M_i are contained in the substitution θ, then the desired Q_i exists by Lemmas 3 and 4 and the induction hypothesis. Suppose otherwise, i.e. there exists some B′_j that is not contained in θ. Let X = {B′_j | 1 ≤ j ≤ q̄, B′_j is not contained in θ} and Y = {B′_j | 1 ≤ j ≤ q̄, B′_j is contained in θ}. For each B′_j ∈ X, either B′_j →_{S_n} B̃′_j contracting B′_j itself, or B′_j →∗_{S_{<n}} B̃′_j. We distinguish two cases.
i. Case that there exists B′_j ∈ X such that B′_j →_{S_n} B̃′_j contracting B′_j itself. W.l.o.g. suppose j = 1, i.e. B′_1 ∈ X and B′_1 →_{S_n} B̃′_1 contracting B′_1 itself. Let M_i →_{S_n} M̃_i by this step at B′_1, and note that M̃_i ⇉_{S_n} P_i via B′_2,…,B′_q̄. The proof of this case is illustrated in Fig. 1. Let l′ → r′ ⇐ c′ ∈ S, B′_1 = l′θ′ and ⊢_{S_{n−1}} c′θ′. Then, since B′_1 is not contained in θ, the rules l → r ⇐ c ∈ R and l′ → r′ ⇐ c′ ∈ S overlap. Furthermore, as B′_1 ⊂ M_i, we have ⟨v, u⟩ ⇐ ⟨c′, c⟩ ∈ *CCP*in(S, R), and there exists a substitution θ″ such that M̃_i = vθ″ and N_i = uθ″. By our critical pair condition (2), we obtain M̃_i ⇉_{R_m} Q̃_i ⇉∗_{R_{<m}} N_i; let M̃_i ⇉_{R_m} Q̃_i via C_1,…,C_r̄. Let Γ′ = {C_i | ∃B′_j (j ≠ 1). C_i ⊂ B′_j} ∪ {B′_i | i ≠ 1, ∃C_j. B′_i ⊆ C_j}. Occurrences in Γ′ are distinct, and for any B̃ ∈ Γ′ there exists B′_j (2 ≤ j ≤ q̄) such that B̃ ⊆ B′_j. Thus, |Γ′| ≤ ∑_{j=2}^{q̄} |B′_j| holds.
Hence, we obtain |Γ′| ≤ ∑_{j=2}^{q̄} |B′_j| < ∑_{j=1}^{q̄} |B′_j| ≤ |Γ|. Thus, one can apply the induction hypothesis to Q̃_i ⇇_{R_m} M̃_i ⇉_{S_n} P_i (via C_1,…,C_r̄ and B′_2,…,B′_q̄, respectively) so as to obtain Q̃′_i, P̃_i such that Q̃_i ⇉_{S_n} Q̃′_i ⇉∗_{S_{<n}} P̃_i and P_i ⇉∗_{R_m} P̃_i. Since we have N_i ⇇∗_{R_{<m}} Q̃_i ⇉_{S_n} Q̃′_i, by applying the induction hypothesis and Lemma 3 it follows that there exists Ñ_i such that N_i ⇉_{S_n} ∘ ⇉∗_{S_{<n}} Ñ_i and Q̃′_i ⇉∗_{R_{<m}} Ñ_i. Then, by the induction hypothesis and Lemma 3, it follows that there exists Q_i such that Ñ_i ⇉∗_{S_{<n}} Q_i and P̃_i ⇉∗_{R_{<m}} Q_i.
ii. Case that B′_j →∗_{S_{n−1}} B̃′_j holds for every B′_j ∈ X. As M_i ⇉_{S_n} P_i via B′_1,…,B′_q̄ and B′_1,…,B′_q̄ are parallel, we can first rewrite all B′_j ∈ Y (1 ≤

Fig. 1. Case 1.(b).i

Fig. 2. Case 1.(b).ii

j ≤ q̄). Namely, let Y = {B″_1,…,B″_r̄}; then M_i ⇉_{S_n} M̃_i →∗_{S_{n−1}} P_i, where the parallel step contracts B″_1,…,B″_r̄. The proof of this case is illustrated in Fig. 2. Here, since each B″_j is contained in the substitution θ, one can use Lemma 4 to obtain Q̃ such that N_i ⇉_{S_n} Q̃ and M̃_i →_{R_m} Q̃. Now, since →_{R_m} ⊆ ⇉_{R_m} and →∗_{S_{n−1}} ⊆ ⇉∗_{S_{n−1}}, we have Q̃ ⇇_{R_m} M̃_i ⇉∗_{S_{n−1}} P_i. Then, using the induction hypothesis and Lemma 3, we can obtain Q_i such that Q̃ ⇉∗_{S_{n−1}} Q_i and P_i ⇉∗_{R_m} Q_i. As a side remark, we mention that our first key ingredient becomes necessary to solve this case.

2. Case M_i ∈ {B_1,…,B_n̄}. Let {A′_1,…,A′_q̄} = {A_j | 1 ≤ j ≤ m̄, A_j ⊆ M_i}. Then one can put M_i = C_i[A′_1,…,A′_q̄] and N_i = C_i[Ã′_1,…,Ã′_q̄], so that M_i ⇉_{R_m} N_i via A′_1,…,A′_q̄ and M_i ⇉_{S_n} P_i contracting M_i itself. By definition, M_i ⇉_{S_n} P_i is either of the form M_i →∗_{S_{n−1}} P_i or M_i →_{S_n} P_i contracting the redex M_i itself.

Suppose M_i →∗_{S_{n−1}} P_i. Then we have M_i ⇉∗_{S_{n−1}} P_i, and thus the desired Q_i exists by the induction hypothesis and Lemma 3.

Thus, it remains to consider the case M_i →_{S_n} P_i contracting the redex M_i itself. Then there exist l′ → r′ ⇐ c′ ∈ S and θ′ such that M_i = l′θ′, P_i = r′θ′ and ⊢_{S_{n−1}} c′θ′. We distinguish whether or not all redex occurrences A′_j in M_i are contained in θ′. If all redex occurrences A′_j in M_i are contained in θ′, then using →_{S_n} ⊆ ⇉_{S_n} ∘ ⇉∗_{S_{<n}} and ⇉_{R_m} ⊆ →∗_{R_m}, one obtains the desired Q_i by Lemma 4.

So, let us consider the case that there exists some A′_j not contained in θ′. Let X′ = {A′_j | 1 ≤ j ≤ q̄, A′_j is not contained in θ′} and Y′ = {A′_j | 1 ≤ j ≤ q̄, A′_j is contained in θ′}. Then, for each A′_j ∈ X′, we have either (α) A′_j →_{R_m} Ã′_j contracting A′_j itself, or (β) A′_j →∗_{R_{<m}} Ã′_j. We distinguish two cases.

i. Case (α). Then we have M_i = A′_1 →_{R_m} Ã′_1 = N_i contracting A′_1 itself, and M_i →_{S_n} P_i contracting M_i itself. By lθ = M_i = lθ′ and the left-linearity of l, we have xθ = xθ′ for any x ∈ V(l). We also have ⊢_{R_{m−1}} cθ and ⊢_{S_{n−1}} cθ′. Thus, if V(r) ⊆ V(l), then rθ = rθ′, and it suffices to take rθ as Q_i. Suppose otherwise, i.e. V(r) ⊄ V(l). Below, let c = s_1 ≈ t_1,…,s_j ≈ t_j and c_k = s_1 ≈ t_1,…,s_k ≈ t_k (1 ≤ k ≤ j). We now show that there are substitutions ρ_k (k ∈ {0,…,j}) satisfying the following properties (a)–(c), by induction on k.
(a) ρ_k = θ = θ′ [V(l)].
(b) *dom*(ρ_k) ⊆ V(l) ∪ V(c_k).
(c) For any x ∈ V(l) ∪ V(c_k), we have xθ ⇉∗_{S_{n−1}} xρ_k and xθ′ ⇉∗_{R_{m−1}} xρ_k.

If k = 0, then take ρ_0 = θ|_{V(l)}, and (a)–(c) follow. Suppose k > 0. Since r contains an extra variable and R (or S) is properly oriented, we have V(s_k) ⊆ V(l) ∪ V(c_{k−1}). Thus, by the induction hypothesis on (c), we have s_kθ ⇉∗_{S_{n−1}} s_kρ_{k−1} and s_kθ′ ⇉∗_{R_{m−1}} s_kρ_{k−1}. Furthermore, we have s_kθ →∗_{R_{m−1}} t_kθ and s_kθ′ →∗_{S_{n−1}} t_kθ′ by ⊢_{R_{m−1}} cθ and ⊢_{S_{n−1}} cθ′, respectively. Hence, s_kρ_{k−1} ⇇∗_{S_{n−1}} s_kθ ⇉∗_{R_{m−1}} t_kθ and t_kθ′ ⇇∗_{S_{n−1}} s_kθ′ ⇉∗_{R_{m−1}} s_kρ_{k−1}. Then, by applying the induction hypothesis and Lemma 3, we obtain q′, r′ such that s_kρ_{k−1} ⇉∗_{R_{m−1}} q′ ⇇∗_{S_{n−1}} t_kθ and t_kθ′ ⇉∗_{R_{m−1}} r′ ⇇∗_{S_{n−1}} s_kρ_{k−1}. Thus, one obtains r′ ⇇∗_{S_{n−1}} s_kρ_{k−1} ⇉∗_{R_{m−1}} q′. Again, by applying the induction hypothesis and Lemma 3, we obtain s′ such that r′ ⇉∗_{R_{m−1}} s′ ⇇∗_{S_{n−1}} q′. Thus, we have t_kθ ⇉∗_{S_{n−1}} s′ and t_kθ′ ⇉∗_{R_{m−1}} s′.

We know that t_k is either a ground R_u-normal form or a linear constructor term (w.r.t. R) by the right-stability of R, and that t_k is either a ground S_u-normal form or a linear constructor term (w.r.t. S) by the right-stability of S. Suppose t_k is a ground R_u-normal form or a ground S_u-normal form. Then t_kθ = t_kθ′ = t_k by V(t_k) = ∅, and thus t_k = s′ by t_kθ′ ⇉∗_{R_{m−1}} s′ (resp. t_kθ ⇉∗_{S_{n−1}} s′). Furthermore, as we are assuming V(r) ⊄ V(l), we know V(s_i) ⊆ V(l) ∪ V(c_{i−1}) from the proper-orientedness of R (or S). Thus, V(l) ∪ V(c_k) = V(l) ∪ V(c_{k−1}). Hence, ρ_k := ρ_{k−1} satisfies (a)–(c). Suppose otherwise. Then t_k is linear and is a constructor term w.r.t. both R and S. Then, by t_kθ ⇉∗_{S_{n−1}} s′, there exists a substitution ρ with s′ = t_kρ and *dom*(ρ) ⊆ V(t_k) such that xθ ⇉∗_{S_{n−1}} xρ for any x ∈ V(t_k). Furthermore, by t_kθ′ ⇉∗_{R_{m−1}} s′, there exists a substitution ρ′ with s′ = t_kρ′ and *dom*(ρ′) ⊆ V(t_k) such that xθ′ ⇉∗_{R_{m−1}} xρ′ for any x ∈ V(t_k). Now, because t_kρ = s′ = t_kρ′, we know xρ = xρ′ for any x ∈ V(t_k), and thus ρ = ρ′ from *dom*(ρ), *dom*(ρ′) ⊆ V(t_k).
We also have V(t_k) ∩ (V(l) ∪ V(c_{k−1})) = ∅ by the right-stability of R (or S), and thus *dom*(ρ) ∩ *dom*(ρ_{k−1}) = ∅. Hence, ρ_k := ρ_{k−1} ∪ ρ is a substitution, and ρ_k satisfies (a)–(c). This completes the induction proof of the existence of substitutions ρ_k satisfying (a)–(c) (1 ≤ k ≤ j). Now consider the substitution ρ_j. Since R (and S) is a 3-CTRS, we have V(r) ⊆ V(l) ∪ V(c_j). Thus, by condition (c), N_i = rθ ⇉∗_{S_{n−1}} rρ_j and P_i = rθ′ ⇉∗_{R_{m−1}} rρ_j hold. Thus, taking Q_i := rρ_j, we have N_i ⇉∗_{S_{n−1}} Q_i and P_i ⇉∗_{R_{m−1}} Q_i.

ii. Case (β). Let M_i →_{R_m} M̃_i contracting A′_1, and M̃_i ⇉_{R_m} N_i via A′_2,…,A′_q̄. The proof of this case is illustrated in Fig. 3 (left). Because there exists an overlap between l → r ⇐ c ∈ R and l′ → r′ ⇐ c′ ∈ S, there are a substitution θ and a position p ∈ Pos_F(l′) such that M_i = l′θ = (l′θ)[lθ]_p = (l′θ)[A′_1]_p. Then M̃_i = (l′[r]_p)θ, P_i = r′θ, ⊢_{R_{m−1}} cθ and ⊢_{S_{n−1}} c′θ. Then there exists a CCP ⟨u, v⟩ ⇐ ⟨d, d′⟩ ∈ *CCP*(R, S), where u = (l′[r]_p)σ, v = r′σ, d = cσ and d′ = c′σ for the mgu σ of l′|_p and l. Then, as (l′θ)|_p = lθ, we have θ = ρ ∘ σ for some ρ. Thus P_i = r′θ = (r′σ)ρ = vρ, M̃_i = (l′[r]_p)θ = ((l′[r]_p)σ)ρ = uρ, ⊢_{R_{m−1}} dρ, and ⊢_{S_{n−1}} d′ρ. Hence, by our critical pair condition (1), uρ ⇉_{S_n} ∘ ⇉∗_{S_{<n}} s and vρ ⇉∗_{R_m} s for some s; thus, taking P̃_i := s, we have M̃_i ⇉_{S_n} P̃′_i ⇉∗_{S_{<n}} P̃_i and P_i ⇉∗_{R_m} P̃_i for some P̃′_i. Suppose M̃_i ⇉_{S_n} P̃′_i via C_1,…,C_r̄.
Let Γ′ = {A′_i | ∃C_j. A′_i ⊂ C_j} ∪ {C_i | ∃A′_j. C_i ⊆ A′_j}. Occurrences in Γ′ are distinct, and for any C̃ ∈ Γ′ there exists A′_j (2 ≤ j ≤ q̄) such that C̃ ⊆ A′_j. Hence, |Γ′| ≤


Fig. 3. Case 2.(a).ii (left) and Case 2.(b) (right)

∑_{j=2}^{q̄} |A′_j| < ∑_{j=1}^{q̄} |A′_j| ≤ |Γ|. Thus, one can apply the induction hypothesis to obtain Q̃_i such that N_i ⇉_{S_n} ∘ ⇉∗_{S_{<n}} Q̃_i and P̃′_i ⇉∗_{R_m} Q̃_i. By applying the induction hypothesis and Lemma 3 once again, we know that there exists Q_i such that Q̃_i ⇉∗_{S_{<n}} Q_i ⇇∗_{R_m} P̃_i.

(b) Case that A′_j →∗_{R_{m−1}} Ã′_j for every A′_j ∈ X′. Since M_i ⇉_{R_m} N_i via A′_1,…,A′_q̄ and A′_1,…,A′_q̄ are parallel, one can rewrite the occurrences in Y′ first. That is, M_i ⇉_{R_m} M̃_i →∗_{R_{m−1}} N_i, where the parallel step contracts A″_1,…,A″_r̄ and Y′ = {A″_1,…,A″_r̄}. The proof of this case is illustrated in Fig. 3 (right). Then, as each A″_j is contained in θ′, by Lemma 4 there exists Q̃ such that M̃_i →_{S_n} Q̃ and P_i ⇉_{R_m} Q̃. Furthermore, as →_{S_n} ⊆ ⇉_{S_n} and →∗_{R_{m−1}} ⊆ ⇉∗_{R_{m−1}}, one can apply the induction hypothesis and Lemma 3 to N_i ⇇∗_{R_{m−1}} M̃_i →_{S_n} Q̃ to obtain Q_i such that N_i ⇉_{S_n} ∘ ⇉∗_{S_{<n}} Q_i and Q̃ ⇉∗_{R_m} Q_i.

Finally, from Lemma 2 we conclude that R and S are level-commutative.

A level-confluence criterion is obtained by taking R = S. Note that, in contrast to the commutativity criterion, one can use *CCPout* instead of *CCP* in the first condition, since the second condition subsumes the *CCPin*(R) part of it.

Corollary 1. *Let* R *be a left-linear, properly oriented, right-stable 3-CTRS. If the following conditions are satisfied, then* R *is level-confluent:*


*Example 1.* Let R and S be the following CTRSs:

$$\mathcal{R} = \left\{ \begin{aligned} \mathsf{p}(x) &\to \mathsf{q}(x) \\ \mathsf{r}(x) &\to \mathsf{s}(\mathsf{p}(x)) \\ \mathsf{s}(x) &\to \mathsf{f}(y) \Leftarrow \mathsf{p}(x) \approx y \end{aligned} \right\} \quad \mathcal{S} = \left\{ \begin{aligned} \mathsf{p}(x) &\to \mathsf{r}(x) \\ \mathsf{q}(x) &\to \mathsf{s}(\mathsf{p}(x)) \\ \mathsf{s}(x) &\to \mathsf{f}(y) \Leftarrow \mathsf{p}(x) \approx y \end{aligned} \right\}$$

We have *CCP*(R, S) = {⟨q(x), r(x)⟩ ⇐ ⟨∅, ∅⟩} and *CCPin*(S, R) = ∅. Note that the overlap of s(x) → f(y) ⇐ p(x) ≈ y ∈ R and s(x) → f(y) ⇐ p(x) ≈ y ∈ S is not considered, as these rules are identical; case 2.(a).i of the proof above treats this situation. Now, because we have q(x) →_{S_n} s(p(x)) and r(x) →_{R_m} s(p(x)) (n, m ≥ 1), condition (1) of Theorem 1 is satisfied. The other conditions of the theorem are also satisfied. Thus, R and S are level-commutative. Similarly, one can show that R ∪ S is level-confluent.
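To see the closing steps of this CCP concretely, here is a minimal executable sketch (our own encoding, not part of the paper): since every function symbol in Example 1 is unary, a term such as s(p(c)) over an assumed fresh constant c can be written as the word `spc`, and the unconditional rules of R and S become rules rewriting a single letter into a word.

```python
def one_step(rules, word):
    """All one-step reducts of `word` under monadic rules (symbol -> word)."""
    out = set()
    for i, ch in enumerate(word):
        for lhs, rhs in rules:
            if ch == lhs:
                out.add(word[:i] + rhs + word[i + 1:])
    return out

# Unconditional rules of Example 1, letter-encoded:
R_rules = [('p', 'q'), ('r', 'sp')]   # p(x) -> q(x),  r(x) -> s(p(x))
S_rules = [('p', 'r'), ('q', 'sp')]   # p(x) -> r(x),  q(x) -> s(p(x))

# The CCP <q(x), r(x)> is closed by the common reduct s(p(x)):
assert 'spc' in one_step(S_rules, 'qc')   # q(c) ->_S s(p(c))
assert 'spc' in one_step(R_rules, 'rc')   # r(c) ->_R s(p(c))
```

The conditional rule s(x) → f(y) ⇐ p(x) ≈ y is deliberately omitted here: only the unconditional steps are needed to close this CCP.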

*Example 2.* Take CTRSs R = R′ ∪ R<sup>f</sup> and S = S′ ∪ R<sup>f</sup> such that

$$\mathcal{R}' = \left\{ \begin{aligned} \mathfrak{p}(x, y) &\to \mathfrak{r}(x, y) \ \Leftarrow x \approx \mathtt{a} \\ \mathfrak{q}(x, y) &\to \mathfrak{p}(x, y) \ \Leftarrow x \approx \mathtt{a} \end{aligned} \right\} \quad \mathcal{S}' = \left\{ \begin{aligned} \mathfrak{p}(x, y) &\to \mathfrak{q}(x, y) \Leftarrow y \approx \mathtt{b} \\ \mathfrak{r}(x, y) &\to \mathfrak{p}(x, y) \Leftarrow y \approx \mathtt{b} \end{aligned} \right\}$$

and R<sup>f</sup> = {f(0) → a, f(s(x)) → b ⇐ f(x) ≈ a, f(s(x)) → a ⇐ f(x) ≈ b}. We have *CCP*(R, S) = { (a): ⟨r(x, y), q(x, y)⟩ ⇐ ⟨{x ≈ a}, {y ≈ b}⟩, (b): ⟨a, b⟩ ⇐ ⟨{f(x) ≈ b}, {f(x) ≈ a}⟩, (c): ⟨b, a⟩ ⇐ ⟨{f(x) ≈ a}, {f(x) ≈ b}⟩ }, and *CCPin*(S, R) = ∅. For the CCP (a), let m, n ≥ 1 and ρ be any substitution, and suppose that ρ(x) →∗_{R_{m−1}} a and ρ(y) →∗_{S_{n−1}} b. Then we have r(ρ(x), ρ(y)) →_{S_n} p(ρ(x), ρ(y)) and q(ρ(x), ρ(y)) →_{R_m} p(ρ(x), ρ(y)). Also, note that there is no term t such that t →∗_R b and t →∗_S a (nor t →∗_R a and t →∗_S b). Thus, condition (1) of Theorem 1 holds for the CCPs (a)–(c). The other conditions of the theorem are also satisfied. Thus, R and S are level-commutative. Similarly, one can show that R ∪ S is level-confluent.
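The level bookkeeping in R<sup>f</sup> can be made concrete with a small sketch (our own illustration; encoding the term f(s^k(0)) by the natural number k is an assumption of this sketch, not notation from the paper): the unconditional rule f(0) → a enters at level 1, and each conditional rule instance first appears one level above the level at which its condition is discharged.

```python
def eval_f(k):
    """Normal form of f(s^k(0)) under R_f, together with the level of the
    rewrite step that contracts it."""
    if k == 0:
        return 'a', 1                  # f(0) -> a is unconditional: level 1
    v, lvl = eval_f(k - 1)             # discharge the condition f(s^(k-1)(0)) ~ v
    # f(s(x)) -> b <= f(x) ~ a  and  f(s(x)) -> a <= f(x) ~ b flip the value;
    # the rule instance enters one level above its condition:
    return ('b' if v == 'a' else 'a'), lvl + 1

assert eval_f(0) == ('a', 1)
assert eval_f(1) == ('b', 2)           # f(s(0)) rewrites to b at level 2
assert eval_f(2) == ('a', 3)           # f(s(s(0))) rewrites to a at level 3
```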

Since TRSs can be regarded as CTRSs with no conditions, and such CTRSs are trivially properly oriented, right-stable, and of type 3, Theorem 1 covers Proposition 1. However, this does not mean that our theorem broadens the class of TRSs that can be shown to commute: because rewrite steps of TRSs are level-1 rewrite steps in CTRSs, our condition reduces to that of Proposition 1 for TRSs. Thus, when restricted to TRSs, Theorem 1 coincides with Proposition 1.

On the other hand, Corollary 1 properly extends Proposition 2, as witnessed by R ∪ S in Examples 1 and 2.

#### 4 Critical Pair Criteria for Join and Semi-Equational CTRSs

In this section, we explore critical pair criteria for join and semi-equational CTRSs, following our approach in the previous section.

First, let us fix additional notions and notation that will be used in this section. A rewrite step of a *join* CTRS R is defined via the following TRSs R_n (n ∈ ℕ), which are inductively given as follows: R_0 = ∅ and R_{n+1} = {lσ → rσ | l → r ⇐ c ∈ R, cσ ⊆ →∗_{R_n} ∘ ∗←_{R_n}}. For a *semi-equational* CTRS R, we modify the second clause to R_{n+1} = {lσ → rσ | l → r ⇐ c ∈ R, cσ ⊆ ↔∗_{R_n}}. Similarly to the oriented case, a rewrite step s →_R t of R is given by s →_R t iff s →_{R_n} t for some n, and the smallest n such that s →_{R_n} t is called the *level* of the rewrite step s →_R t. We write ↓_{R_n} (↓_R) for the relation →∗_{R_n} ∘ ∗←_{R_n} (resp. →∗_R ∘ ∗←_R).
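The three ways of discharging an instantiated condition s ≈ t can be contrasted on an abstract rewrite graph. The following sketch (our own; the level structure is elided and the rewrite relation is given as an explicit finite graph) checks s →∗ t for oriented, s ↓ t for join, and s ↔∗ t for semi-equational CTRSs:

```python
def reachable(step, s):
    """{u | s ->* u} in the rewrite graph `step` (dict: term -> set of reducts)."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in step.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def holds(kind, step, s, t):
    """Check the condition s ~ t under the three CTRS semantics."""
    if kind == 'oriented':            # s ->* t
        return t in reachable(step, s)
    if kind == 'join':                # s ->* u *<- t for some u
        return bool(reachable(step, s) & reachable(step, t))
    if kind == 'semi-equational':     # s <->* t: reachability in the symmetric graph
        sym = {}
        for u, vs in step.items():
            for v in vs:
                sym.setdefault(u, set()).add(v)
                sym.setdefault(v, set()).add(u)
        return t in reachable(sym, s)
    raise ValueError(kind)

graph = {'b': {'a'}, 'c': {'a'}}      # b -> a <- c
assert not holds('oriented', graph, 'b', 'c')
assert holds('join', graph, 'b', 'c')
assert holds('semi-equational', graph, 'b', 'c')
```

On the graph b → a ← c, the oriented reading of b ≈ c fails while the join and semi-equational readings succeed; this increasing permissiveness is what Lemma 5 below captures at each level.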

In this section (except Subsect. 4.2), in order to distinguish the three types of CTRSs, we write R^o for an oriented CTRS, R^j for a join CTRS, and R^s for a semi-equational CTRS. Similarly, the notations R^o_n, R^j_n, … are employed. The notation ⊢_{R^o_n} cσ (⊢_{R^j_n} cσ, ⊢_{R^s_n} cσ) stands for cσ ⊆ →∗_{R^o_n} (resp. cσ ⊆ ↓_{R^j_n}, cσ ⊆ ↔∗_{R^s_n}).

The following basic relations between the rewrite relations of the three types of CTRSs at each level are essentially proved in [18, Lemmas 1 and 2].

Lemma 5. Let R be a CTRS. Then →_{R^o_n} ⊆ →_{R^j_n} ⊆ →_{R^s_n} for each n.

The notions of orthogonality, proper-orientedness and right-stability are syntax-oriented, and their definitions remain the same for the other types of CTRSs. Note that even under proper-orientedness and right-stability, →_{R^o_n} = →_{R^j_n} does not hold in general.

#### 4.1 Level-Confluence of Join and Semi-Equational 3-CTRSs

In [14, Corollary 5.3], Proposition 2 is applied to show that the corresponding class of join CTRSs is level-confluent:

Proposition 3 ([14]). Let R be an orthogonal, properly oriented, right-stable 3-CTRS. Then R^j is level-confluent.

Given our Theorem 1, a natural question is whether a similar extension is possible for our theorem. In this subsection, we give a partially positive answer to this question: we generalize the result above to the level-confluence part (Corollary 1) of our theorem, even though a similar extension does not work for level-commutation. Indeed, we show that the above proposition can be extended to the more general setting of CTRSs in which the orthogonality requirement is replaced by level-confluence of R^o. Furthermore, the generalization holds not only for join CTRSs but also for semi-equational CTRSs.

The next two lemmas are abstractions of the ones in [14, Lemmas 5.1 and 5.2], and the proofs remain almost the same.

Lemma 6. Let R be a properly oriented, right-stable 3-CTRS such that R^o is level-confluent. Let l → r ⇐ s_1 ≈ t_1,…,s_j ≈ t_j ∈ R. If s_iσ ↓_{R^o_{n−1}} t_iσ for any 1 ≤ i ≤ j, then lσ ↓_{R^o_n} rσ.

Lemma 7. Let R be a properly oriented, right-stable 3-CTRS such that R^o is level-confluent. If s →_{R^s_n} t, then s ↓_{R^o_n} t.

Now we present the claimed result:

Theorem 2. Let R be a properly oriented, right-stable 3-CTRS. If R^o is level-confluent, then R^j and R^s are level-confluent.

*Proof.* Let R be a properly oriented, right-stable 3-CTRS such that R^o is level-confluent. Suppose t_1 ∗←_{R^j_n} s →∗_{R^j_n} t_2 (resp. t_1 ∗←_{R^s_n} s →∗_{R^s_n} t_2). Then t_1 ∗←_{R^s_n} s →∗_{R^s_n} t_2 by Lemma 5. Thus, by Lemma 7, t_1 ↔∗_{R^o_n} t_2. Hence, t_1 ↓_{R^o_n} t_2 follows by the level-confluence of R^o. Using Lemma 5 again, this implies t_1 ↓_{R^j_n} t_2 (resp. t_1 ↓_{R^s_n} t_2).

Thus, Corollary 1 can be applied to show the level-confluence of join and semi-equational CTRSs. Note here that the conditions of Corollary 1 are stated in terms of $\to_{\mathcal{R}_o}$, not in terms of $\to_{\mathcal{R}_j}$ or $\to_{\mathcal{R}_s}$.

#### 4.2 Commutation of Semi-Equational 3-CTRSs

A fundamental ingredient of the proof presented above (inherited from [14]) is induction on the level of the rewrite relation. Applying this approach to join and semi-equational CTRSs, however, appears to involve a fundamental difficulty. Without induction on the level, what can be done within the parallel-closed approach? In this subsection, we exhibit an alternative approach for semi-equational CTRSs.

In [1], it is reported that left-linear parallel-closed semi-equational 1-CTRSs are confluent. By examining the details of its proof, we can extend it to commutation of 3-CTRSs as follows. Below, saying that $\mathcal{R}$ satisfies $c\sigma$ (etc.) means $c\sigma \subseteq \leftrightarrow^{*}_{\mathcal{R}}$.

Theorem 3. *Let* <sup>R</sup>, <sup>S</sup> *be semi-equational left-linear 3-CTRSs. Suppose the following conditions are satisfied:*


*Furthermore, assume* ${\parallel\!\!\to}_{\mathcal{S}} \subseteq \leftrightarrow^{*}_{\mathcal{R}}$, ${\parallel\!\!\to}_{\mathcal{R}} \subseteq \leftrightarrow^{*}_{\mathcal{S}}$ *and that* $\mathcal{R}\cap\mathcal{S}$ *is a 2-CTRS. Then,* $\mathcal{R}$ *and* $\mathcal{S}$ *commute.*

We remark that the conditions ${\parallel\!\!\to}_{\mathcal{S}} \subseteq \leftrightarrow^{*}_{\mathcal{R}}$ and ${\parallel\!\!\to}_{\mathcal{R}} \subseteq \leftrightarrow^{*}_{\mathcal{S}}$ are used to close nested peaks, and that the condition that $\mathcal{R}\cap\mathcal{S}$ is a 2-CTRS is required to resolve peaks obtained by the same rule.

*Example 3.* Let R and S be the following left-linear semi-equational 3-CTRSs:

$$\begin{array}{l} \mathcal{R} = \{ \mathfrak{q}(x, y) \to \mathfrak{p}(y, x), \ \mathfrak{p}(x, y) \to \mathfrak{q}(x', y') \Leftarrow x \approx x', y \approx y' \}, \\ \mathcal{S} = \{ \mathfrak{p}(x, y) \to \mathfrak{q}(y, x), \ \mathfrak{q}(x, y) \to \mathfrak{p}(x', y') \Leftarrow x \approx x', y \approx y' \}. \end{array}$$

By induction on the level $n$, one can show $\to_{\mathcal{S},n} \subseteq \to^{*}_{\mathcal{R},n}$ and $\to_{\mathcal{R},n} \subseteq \to^{*}_{\mathcal{S},n}$. Thus, the conditions ${\parallel\!\!\to}_{\mathcal{S}} \subseteq \leftrightarrow^{*}_{\mathcal{R}}$ and ${\parallel\!\!\to}_{\mathcal{R}} \subseteq \leftrightarrow^{*}_{\mathcal{S}}$ are satisfied. Clearly, $\mathcal{R}\cap\mathcal{S} = \emptyset$ is a 2-CTRS. We have $\mathit{CCP}(\mathcal{R},\mathcal{S}) = \{\langle \mathsf{q}(x',y'),\, \mathsf{q}(y,x) \Leftarrow \{x \approx x', y \approx y'\},\, \emptyset \rangle,\ \langle \mathsf{p}(y,x),\, \mathsf{p}(x',y') \Leftarrow \emptyset,\, \{x \approx x', y \approx y'\} \rangle\}$ and $\mathit{CCP}_{in}(\mathcal{S},\mathcal{R}) = \emptyset$. Clearly, $\rho(x) \leftrightarrow^{*}_{\mathcal{R}} \rho(x')$ and $\rho(y) \leftrightarrow^{*}_{\mathcal{R}} \rho(y')$ imply $\mathsf{p}(\rho(x'),\rho(y')) \to_{\mathcal{R}} \mathsf{q}(\rho(y),\rho(x))$, and $\rho(x) \leftrightarrow^{*}_{\mathcal{S}} \rho(x')$ and $\rho(y) \leftrightarrow^{*}_{\mathcal{S}} \rho(y')$ imply $\mathsf{q}(\rho(x),\rho(y)) \leftarrow_{\mathcal{S}} \mathsf{p}(\rho(y),\rho(x))$. Thus, all conditions of Theorem 3 are satisfied, and hence $\mathcal{R}$ and $\mathcal{S}$ commute.

Note that the conditions ${\parallel\!\!\to}_{\mathcal{S}} \subseteq \leftrightarrow^{*}_{\mathcal{R}}$ and ${\parallel\!\!\to}_{\mathcal{R}} \subseteq \leftrightarrow^{*}_{\mathcal{S}}$ of Theorem 3 imply $\leftrightarrow^{*}_{\mathcal{R}} = \leftrightarrow^{*}_{\mathcal{S}}$, i.e., $\mathcal{R}$ and $\mathcal{S}$ have the same underlying logic.

#### 5 Conclusion

We have given a critical pair criterion for ensuring level-commutativity of left-linear, properly oriented, right-stable oriented 3-CTRSs. Our result generalizes a sufficient criterion for commutativity of left-linear TRSs by Toyama [16]. It also properly extends the level-confluence result for orthogonal, properly oriented, right-stable oriented 3-CTRSs of Suzuki et al. [14]. We then showed that this result can be applied to obtain a criterion for level-confluence of left-linear, properly oriented, right-stable join and semi-equational 3-CTRSs, generalizing a result of [14]. We have also explored a similar but different approach of Aoto and Toyama [1] to obtain a criterion for the commutation of semi-equational 3-CTRSs.

Wirth [17] also gave a criterion of level-confluence for possibly non-orthogonal CTRSs that generalizes the sufficient criterion for confluence of left-linear TRSs of [16]. He adapted the approach of [16] to a framework of join CTRSs; it also incorporates some ideas of [14] so as to give the notions of (weak-)quasi-normal CTRSs, etc. A key difference from the usual conditional rewriting employed in our paper, however, is that the validity of conditions must be established under a kind of constructor discipline. This restriction considerably simplifies the proof arguments dealing with conditional parts, at the price of departing from the standard framework. Despite these sharp differences between the underlying framework of [17] and ours, interestingly, the critical pair criterion of Theorem 3 and Wirth's critical pair criterion [17, Definition 28] closely resemble each other.

Over various formalisms of rewriting, considerable effort has been spent on automating confluence checks in recent years. A yearly competition<sup>2</sup> of confluence tools started in 2012; a category for CTRSs was introduced in 2014. In recent competitions, confluence of *oriented 3-CTRSs*, with which our main theorem deals, has been the focus of the CTRS category. Known confluence tools for CTRSs include CONFident [6], ConCon [13], CO3 [9], and ACP [2]. We note here that all these tools fail to show confluence of $\mathcal{R}\cup\mathcal{S}$ of Example 2<sup>3</sup>. Among these tools, (at least) ConCon and ACP incorporate checking of the confluence criterion of [14]. We have been working on the automation of our results, but it is still under

<sup>2</sup> http://project-coco.uibk.ac.at/.

<sup>3</sup> Experimented for CoCo 2022 participants ACP, CO3, CONFident and a CoCo 2020 participant ConCon, via CoCoWeb [7].

development. Recent advances in confluence tools for CTRSs include the automation of infeasibility checking [5]; we believe some approaches for automating infeasibility checking can be adapted to automate our criterion.

Formalization by interactive theorem provers such as Isabelle/HOL, Coq, PVS<sup>4</sup>, etc. has been of great interest in recent years. Formalization is also indispensable for the certification of results obtained by confluence tools. Regarding the results of [14], a formalization in Isabelle/HOL has been reported by Sternagel and Sternagel [12]. On the other hand, the formalization of our results remains entirely future work.

Acknowledgements. Thanks are due to the anonymous reviewers (including those of all previous versions of the paper) for their valuable comments. This work was partially supported by JSPS KAKENHI Grant No. 21K11750.

#### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Decidable Fragments**

# **Logic of Communication Interpretation: How to Not Get Lost in Translation**

Giorgio Cignarale , Roman Kuznets , Hugo Rincon Galeana(B) , and Ulrich Schmid

TU Wien, Vienna, Austria *{*giorgio.cignarale,roman.kuznets,hugo.galeana*}*@tuwien.ac.at, s@ecs.tuwien.ac.at

**Abstract.** Byzantine fault-tolerant distributed systems are designed to provide resiliency despite arbitrary faults, i.e., even in the presence of agents who do not follow the common protocol and/or despite compromised communication. It is, therefore, common to focus on the perspective of correct agents, to the point that the epistemic state of byzantine agents is completely ignored. Since this view relies on the assumption that faulty agents may behave arbitrarily adversarially, it is overly conservative in many cases. In blockchain settings, for example, dishonest players are usually not malicious, but rather selfish, and thus just follow some "hidden" protocol that is different from the protocol of the honest players. Similarly, in high-availability large-scale distributed systems, software updates cannot be globally instantaneous, but are rather performed node-by-node. Consequently, updated and non-updated nodes may simultaneously be involved in a protocol for solving a distributed task like consensus or transaction commit. Clearly, the usual assumption of common knowledge of the protocol is inappropriate in such a setting. On the other hand, joint protocol execution and, sometimes, even basic communication becomes problematic without this assumption: How are agents supposed to interpret each other's messages without knowing their mutual communication protocols? We propose a novel epistemic modality *creed* for epistemic reasoning in heterogeneous distributed systems with agents that are uncertain of the actual communication protocol used by their peers. We show that the resulting logic is quite closely related to modal logic S5, the standard logic of epistemic reasoning in distributed systems. We demonstrate the utility of our approach by several examples.

### **1 Introduction**

A *distributed system* is a system with multiple processes, or agents, located on different machines that communicate and coordinate actions, via *message*

c The Author(s) 2023

G. Cignarale and R. Kuznets—Supported by the Austrian Science Fund (FWF) projects ByzDEL (P33600).

H.R. Galeana—Supported by the Doctoral College Resilient Embedded Systems, which is run jointly by the TU Wien's Faculty of Informatics and the UAS Technikum Wien.

U. Sattler and M. Suda (Eds.): FroCoS 2023, LNAI 14279, pp. 119–136, 2023. https://doi.org/10.1007/978-3-031-43369-6\_7

*passing* or *shared memory*, in order to accomplish some task [8,21]. This common task is achieved by means of agent protocols instructing agents how to exchange information and act. Designing distributed systems is difficult due to the inherent uncertainty agents have about the global state of the system, caused, e.g., by different computation speeds and message delays.

Knowledge [15] is a powerful conceptual way of reasoning about this uncertainty [13,14]. Indeed, knowledge is at the core of the agents' ability to act according to the protocol: According to the Knowledge of Preconditions principle [22], a protocol instruction to act based on a precondition ϕ can only be followed if the agent knows ϕ to hold. While trivial for preconditions based on the local state of the acting agent itself, this observation comes to the fore for global preconditions, also involving other agents, as is common for coordination problems such as consensus.

One of the standard ways of modeling agents' knowledge is via the possible world semantics that takes into account all the possible global states the agents can be in and which of these possible worlds a particular agent can distinguish based on its local information. In this view, agent i knows a proposition ϕ, written K*i*ϕ, in a global state s iff this proposition holds in all global states s′ that are indistinguishable from s for i. The primary means of obtaining new knowledge — and the only way of increasing knowledge about the local states of other agents — in a distributed system is by means of communication.

Fault-tolerant systems add another layer of complexity, in particular, when processes may not only stop operating or drop messages but can be (or become) *byzantine* [19], i.e., may behave arbitrarily erroneously and, in particular, can communicate in an erratic, arbitrary, or deceptive manner. Malicious faulty agents may have a "hidden agenda", in which case, instead of following the original commonly known protocol, a faulty agent (or a group of faulty agents) can execute actions (possibly in concert with each other) that jeopardize the original goals of the system.

Although these hidden agendas are typically not transparent for correct agents, some assumptions must be made to restrict the types and numbers of protocol-defying actions and messages. Without such restrictions, provably correct solutions for a distributed task do not exist. These assumptions must usually be commonly known by all agents, like the basic communication mechanism, the protocol of all correct agents, the data encoding used in its messages, etc. In [7], the whole corpus of these common assumptions is referred to as *a priori knowledge*. <sup>1</sup> For the possible world semantics, this translates into the assumption of common knowledge of the model [3], which enables the agents to compute epistemic states of other agents, a task necessary for a typical coordination problem like consensus [6].

<sup>1</sup> The focus of [7] is on a priori assumptions that can be erroneous and may require later updates, hence, the term *a priori beliefs* there. In this paper, we generally assume these assumptions to be factive, hence, we use a priori knowledge instead.

Since correct agents generally cannot distinguish a simple malfunction from malintent, erroneous messages, i.e., messages sent in contravention of the commonly known joint protocol, are usually left uninterpreted. For instance, in the epistemic modeling and analysis framework [11,16–18] for byzantine agents, message ϕ received from agent i is interpreted by means of the hope modality

$$H\_i\varphi := correct\_i \to B\_i\varphi,$$

where B*i*ϕ represents belief of agent i and is understood in the spirit of belief as defeasible knowledge [24], where

$$B\_i \varphi := K\_i(correct\_i \to \varphi).$$

This hope modality H*i*ϕ is equivalent to a disjunction

$$¬correct\_i ∨(correct\_i ∧B\_iφ),$$

suggesting that a message ϕ from i is interpreted as the uncertainty between agent i being faulty or the epistemic state of i confirming ϕ in case i is a correct agent. Note that in the former case, the message carries no meaning whatsoever. Indeed, the axiomatization of hope in [10] takes H*i*⊥ to be the definition of faulty agents because only a faulty agent can send contradictory messages. Given that in normal modal logic H*i*⊥ → H*i*ϕ holds for any ϕ, the consequence is that a faulty agent can send any message independent of its epistemic state. In other words, no conclusions about the epistemic state of a faulty agent can be drawn from its messages, as reflected in the hope modality.
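The equivalence between the hope modality and this disjunction is purely propositional once $correct_i$ and $B_i\varphi$ are treated as atoms; a minimal sanity check of this step (the encoding as two booleans is our own illustration):

```python
from itertools import product

# Treat correct_i and B_i(phi) as boolean atoms and check, over all four
# valuations, that H_i(phi) := correct_i -> B_i(phi) coincides with the
# disjunction ~correct_i \/ (correct_i /\ B_i(phi)).
for correct, belief in product([True, False], repeat=2):
    hope = (not correct) or belief                # correct_i -> B_i(phi)
    disj = (not correct) or (correct and belief)  # the disjunction above
    assert hope == disj

print("hope and the disjunction agree on all four valuations")
```
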

However, not all systems exhibit such a stark dichotomy between commonly known and fully transparent *us* (correct processes) and the mysterious and uninterpretable *them* (faulty processes). *Rational agents* in blockchain settings [12], for instance, do not necessarily have the same goal as the rest of the system. Nevertheless, neither their actions nor their communication are arbitrary, not to speak of adversarial. Consequently, game theoretic modeling, based on a model of their beliefs and goals, can be applied for the analysis of such systems [2].

In this paper, we extend this finer-grained view to the epistemic modeling of distributed systems and consider *heterogeneous distributed systems*, where different processes may run different protocols and where the assumption that all protocols are commonly known is dropped. In such systems, we assume that processes are partitioned into types (or roles, or classes) of agents, so that within one type the protocols are commonly known to the agents of that type. While such a strong assumption is not made for agents of different types, we do not assume them to have zero knowledge of each other's protocol either. In particular, we assume that each class is equipped with an *interpretation function* that encodes the knowledge agents have about the communication preconditions of agents of a different type.

Since having no preconditions for sending a message is an allowed instance, this setting generalizes the byzantine setting described earlier, where there are two types — correct and faulty agents — and only messages of correct agents have a non-trivial interpretation. These interpretation functions are formalized by means of the new *creed* modality $\mathbb{C}_p^{A\backslash B}\varphi$ introduced in this paper, which generalizes the hope modality of the byzantine case and represents the information an agent of type $A$ can infer upon receiving message $\varphi$ from agent $p$ of type $B$.

We illustrate the communication scenarios where this creed modality may be useful by means of some examples:

*Example 1 ("The Murders in the Rue Morgue").* This famous story by Edgar Allan Poe describes a murder mystery. Several witnesses heard the murderer (agent m) but nobody saw m. The problem in interpreting their testimony is that they seem to contradict each other: for instance, a French witness f thinks m spoke Italian and is certain m was not French, whereas a Dutch witness d thought m was French, etc. Importantly, none of the witnesses could understand what was being said (f does not speak Italian, while d does not speak French, etc.). The standard byzantine framework considers the possibility of a faulty agent sending different messages to different agents to confuse them, but provides no means to describe one uncorrupted message being treated so differently by correct agents. Standard epistemic methods either accept all incoming information as being of equal value or make a priori preferential judgements. However, in the story, Monsieur C. Auguste Dupin correctly surmises that m spoke neither of the languages. Dupin neither dismisses witness accounts completely as lies nor accepts them completely. Instead he chooses some of the witness statements over others without prejudging them.

*Example 2 (Knights and Knaves puzzles).* There is a series of logical puzzles, popularized by Smullyan [26], about an island, all inhabitants of which are either *knights* who always tell the truth or *knaves* who always lie. One of the simplest ones [26, Puzzle 28] is as follows:

There are only two people, p and q, each of whom is either a knight or a knave. p makes the following statement: "At least one of us is a knave." What are p and q?

Our goal is to incorporate the uncertainty about the mode of communication (knaves lie/knights tell the truth) into the logic. Fault-tolerant systems do not provide a satisfactory model, since there information from faulty agents is either accepted (in case of benign faults) or ignored as completely unreliable (in case of byzantine faults). Instead, enough information is collected from correct agents (and they must constitute an overwhelming majority for most problems to be solvable). By contrast, knights and knaves puzzles are typically solvable even if all agents involved are knaves. The answer to the puzzle above, for instance, is that p is a knight and q is a knave. We would like to derive this answer fully within the logic.
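Outside the logic, the puzzle's answer can be recovered by exhaustive search, which also shows that the solution is unique; a small brute-force sketch (the boolean encoding is our own, independent of the epistemic machinery developed below):

```python
from itertools import product

# Each of p and q is a knight (True) or a knave (False).
# p states: "At least one of us is a knave."
# Knights utter only true statements and knaves only false ones,
# so p's statement must be true exactly when p is a knight.
solutions = [
    (p, q)
    for p, q in product([True, False], repeat=2)
    if ((not p) or (not q)) == p
]

assert solutions == [(True, False)]
print("p is a knight, q is a knave")
```
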

*Example 3 (Software Updates).* In a highly available large scale distributed system like an ATM network, it is impossible to simultaneously update the software executed by the processes. Rather, processes are usually updated more or less sequentially during normal operation of the system, at unpredictable times. As a consequence, the joint protocol executed in the system while a software update is in progress might mix both old and new protocol instances. Existing solutions like [1,25], which aim at updating complex protocols/software, typically provide "consistent update" environments that prevent such mixing.

Thanks to our creed modality, however, mixed joint protocols could be allowed, by explicitly considering those in the development of the new protocol instance: Indeed, when implementing a bug fix or feature update, the developer obviously knows the previous implementation. A message received at some process p from some process q in the new implementation just needs to be interpreted differently, depending on whether q runs an old or a new protocol instance. Note that backward compatibility typically rules out incorporating a version number into the messages of the (new) protocol here, in which case p would be uncertain about the actual status of q, despite having received a message from it.

For light-weight low-level protocols, this approach might indeed constitute an attractive alternative to complex consistent update mechanisms.

After introducing our framework, we explain in Sect. 6 how these examples could be formalized.

*Related Work.* Our logical framework generalizes the *hope* modality [10] introduced to reason about byzantine agents in distributed systems. We extend the standard formulation by considering the byzantine case as a special agent-type. Agent-types in the field of epistemic logic are formulated in [5], where *names* are used as abstract roles for groups of agents, depending on their characteristics. From the dynamic epistemic logic [9] perspective, a public announcement logic with agent types is presented in [20], providing a dynamic framework to reason about uncertainty of agent-types that is used to formalize the knights and knaves puzzle. Due to their different motivations, [5] and [20], while treating a closely related set of problems, make different and at times incomparable choices regarding the postulates underlying their systems. For instance, a precondition for an announcement for an agent in [20] need not entail the agent knowing this precondition, which contradicts the fundamental Knowledge of Preconditions principle for distributed systems [22]. On the other hand, all agents in [20] possess the same knowledge about each of the existing agent types; in particular, all agents share one common interpretation of messages from a particular type, an assumption in line with the rather centralized nature of updates in dynamic epistemic logic but less sensible for distributed systems.

*Paper Organization.* In Sect. 2, we introduce the basic preliminary definitions and lemmas for describing heterogeneous distributed systems where agents are grouped into types, each characterized by a different protocol. In Sect. 3, we provide an epistemic logic for representing heterogeneous distributed settings by introducing the *creed* modality; we prove soundness and completeness in Sect. 4. We derive the properties of creed in Sect. 5. Having done that, in Sect. 6 we show how to apply this framework to the motivating examples. Finally, some conclusions are provided in Sect. 7.

## **2 Heterogeneous Distributed Systems**

In this paper, we focus on *heterogeneous distributed systems* where agents are of different types characterized by different protocols. All agents are assumed to be at most benign faulty,<sup>2</sup> in the sense that they do not take actions not specified by their protocol, cannot communicate wrong information, and have perfect recall. At any time, however, agents may change their type, i.e., change their protocol.

These different protocols partition the set of processes into different types, which are identified with the names of the protocols. The set of all existing types is commonly known to all the agents. All agents of the same type, which typically work towards the same goal, use the same protocol, which is commonly known to all agents of this type. What is not generally known to an agent is the distribution of agents into types and the actual protocol of a type different from its own. In other words, agent a generally knows neither the type nor the protocol of agent b.

Communication in the system is governed by the protocols. Whereas all protocols must use the same basic communication mechanism and a common layering structure [23], i.e., (possibly non-synchronous) communication rounds, agents of different types generally communicate according to different protocol rules, data formats, encodings, etc. Communication actions are triggered by preconditions that depend on the protocol of the agent's type. Consequently, the interpretation of each message depends on:


More formally, we consider a finite set of processes $\Pi = \{p_1,\ldots,p_n\}$ that communicate with each other by using a joint communication mechanism, such as, e.g., shared memory objects or point-to-point messages. Each process executes some protocol with a name (= type) taken from a commonly known set of names $\mathcal{A}$. However, no assumption is made about the types and the actual protocols of distinct agents $i$ and $j$ being identical or mutually known. All protocols are organized in a common, possibly non-synchronous communication round structure. We also require that the system has a common notion of time, represented by a directed set $T$. Common choices for $T$ are the set of natural numbers $\mathbb{N}$ or even the set of real numbers $\mathbb{R}$. It should be noted that in Definitions 4–5, we assume that concepts such as configuration and protocol match the standard notions in the distributed computing literature [4,21].

**Definition 4 (Heterogeneous distributed system).** *We say that a tuple* Π, A,P, C, T *is a* heterogeneous distributed system *iff*


<sup>2</sup> Adding byzantine faults to the picture will be left for future research.


*The joint protocol of* Π, A,P, C, T *is the protocol formed by the protocols of all the agents.*

In this setting, given multiple possibly non-cooperating teams of agents, we need to re-define the notion of tasks and solvability. In particular, we generally cannot impose restrictions on the output of processes in other partitions.

**Definition 5 (Partial task).** *We say that a tuple* S, I, O, Δ *is a* partial task relative to S ⊆ Π *iff* I *is a set of input configurations for* Π*;* O *is a set of output configurations for* S*; and* Δ *is a validity correspondence that maps valid initial configurations of the system to a subset of valid output configurations for* S*.*

**Definition 6 (Solvability).** *Let* Π, A,P, C, T *be a heterogeneous distributed system. We say that agents of type* A*<sup>i</sup>* ∈ A can solve a partial task T = S, I, O, Δ *iff for any input configuration* σ ∈ I*, the execution of the joint protocol of* Π, A,P, C, T *leads to an output configuration* ρ|*<sup>S</sup>* ∈ Δ(σ)*.*

Note that traditional distributed systems with *benign failures* fall into the particular case where $\mathcal{A} = \{\Pi\}$ and there is one unique protocol executed by all processes. Similarly, distributed systems with send-restricted byzantine faults (no false perceptions of received messages, but arbitrary message sending) could be modeled as an instance with two types $\mathcal{A}_B = \{\mathsf{Correct}, \mathsf{Faulty}\}$, where all agents of type Correct follow the intended protocol, whereas agents of type Faulty can arbitrarily deviate from it.

## **3 Epistemic Logic for Heterogeneous Distributed Systems**

We consider a heterogeneous distributed system Π, A,P, C, T according to Definition 4, where processes are partitioned into different types according to their protocol. Agents of the same type share a common protocol, which also includes information on how to interpret messages from agents of various types. Recall that we assume that each process knows its own protocol/type, and, therefore, the protocol of all other agents of the same type, but not necessarily which agents are of this type. In particular, an agent may be unsure whether another agent belongs to its own type or not.

Agents interpret received messages by means of an interpretation function:

**Definition 7 (Interpretation function).** *Let* $F$ *be the set of well-defined formulas used by agents to communicate. An* interpretation function *for type* $A \in \mathcal{A}$ *with respect to type* $E \in \mathcal{A}$ *messages is any function* $f_{AE} : F \to F$*.*

Intuitively, f*AE*(ϕ) corresponds to the knowledge that type A agents (or simply A agents) have about the preconditions for E agents to send message ϕ. We assume that function f*AE*, for every type E, is a priori known by every A agent, as part of its protocol.

*Example 8.* Interpretation function $f_{AE}(\varphi) := \top$ for all $\varphi \in F$ corresponds to the case when $A$ agents have no knowledge about the communication protocol of $E$ agents. For instance, byzantine agents who can send any message at any time (send-unrestricted byzantine agents) can be captured by choosing $f_{\mathsf{Correct},\mathsf{Faulty}}(\varphi) = \top$ for the partition $\mathcal{A}_E = \{\mathsf{Correct}, \mathsf{Faulty}\}$. The minimal requirement that all correct agents tell the truth translates into $f_{\mathsf{Correct},\mathsf{Correct}}(\varphi) = \varphi$.
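The two interpretation functions of this example can be written down directly. In the sketch below, `TOP` standing for the trivially true formula and the representation of messages as strings are our own illustrative choices:

```python
# f_AE maps a received message to what an A agent may conclude about the
# precondition under which an E agent sent it (in the sense of Definition 7).
TOP = "⊤"  # stands for the trivially true formula

def f_correct_faulty(phi: str) -> str:
    # Send-unrestricted byzantine senders may send anything at any time,
    # so nothing is learned from any message.
    return TOP

def f_correct_correct(phi: str) -> str:
    # Correct senders tell the truth: the message itself is the precondition.
    return phi

assert f_correct_faulty("x > 0") == TOP
assert f_correct_correct("x > 0") == "x > 0"
```
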

Since we want to be able to express partition membership in our language and formulas, we need to define partition membership atoms.

**Definition 9 (Propositional variables and partition atoms).** *We consider, for each process* p*<sup>i</sup>* ∈ Π*, a finite set Prop<sup>i</sup> of* propositional variables*. In addition, for each agent type* A ∈ A*, we consider the set* Π*<sup>A</sup>* := {A*<sup>p</sup>* | p ∈ Π} *of* partition atoms*. The set of all* atomic propositions *is defined as*

$$Prop := \bigcup\_{i=1}^{n} Prop\_i \cup \bigcup\_{A \in \mathcal{A}} \Pi\_A.$$

Since $\mathcal{A}$ is a partition, every agent belongs to one and only one type. For convenience, we denote the type of agent $p$ by $\bar{p}$. Furthermore, we will assume that each agent knows its own type, i.e., $K_p(\bar{p}_p)$.

Now that we have established the basics of our heterogeneous distributed systems, we can proceed to define the language.

**Definition 10 (Language of** EHL**).** *The language* L *of the epistemic heterogeneous logic extends the standard (multi-modal) epistemic language by a new family of modalities called* creed *and is given by the grammar:*

$$\varphi ::= r \mid \neg \varphi \mid (\varphi \land \varphi) \mid K\_p \varphi,\tag{1}$$

*where* $r \in Prop$ *is an atomic proposition (i.e., propositional variable or partition atom),* $p \in \Pi$ *is an agent, and* $A, E \in \mathcal{A}$ *are agent types. Other boolean connectives, as well as the boolean constants* $\top$ *and* $\bot$*, are defined in the usual way. We use the following derived modalities:* $\hat{K}_p\varphi := \neg K_p\neg\varphi$ *and creed defined as*

$$\mathbb{C}\_p^{A \backslash E} \varphi := E\_p \to K\_p f\_{AE}(\varphi) \tag{2}$$

*for any agent* p ∈ Π *and agent types* A, E ∈ A*.*

Creed $\mathbb{C}_p^{A\backslash E}\varphi$ represents the amount of information an $A$ agent can extract from a message $\varphi$ received from agent $p$ under the assumption that $p$ belongs to type $E$ of the partition. It is based on the a priori knowledge $A$ agents possess of the preconditions for an $E$ agent to send message $\varphi$, as encoded in the interpretation function $f_{AE}$ from Definition 7, which is external to the language. This precondition already takes into account the Knowledge of Preconditions principle [22] by assuming that the sender must know that the preconditions hold. We use the standard Kripke model semantics with additional restrictions for partition atoms:

**Definition 11 (Semantics).** *Let* Π, A,P, C, T *be a heterogeneous distributed system and* {f*AE* | A, E ∈ A} *be the collection of interpretation functions for it. An* (epistemic) Kripke frame F = (W, ∼) *is a pair of a non-empty set* W *of worlds (or states) and a function* ∼: Π → P(W × W) *that assigns to each agent* p ∈ Π *an equivalence relation* ∼*p*⊆ W × W *on* W*. A* Kripke model M = (W, ∼, V ) *is a triple where* (W, ∼) *is an epistemic Kripke frame and* V : W → P(*Prop*) *is a* valuation function *for atomic propositions. The truth relation* |= *between Kripke models and formulas is defined as follows:* M,s |= r *iff* r ∈ V (s) *for any* r ∈ *Prop; cases for the boolean connectives are standard;* M,s |= K*p*ϕ *iff* M, t |= ϕ *for all* t ∈ W *such that* s ∼*<sup>p</sup>* t*. As usual, validity in a model, denoted* M |= ϕ*, means* M,s |= ϕ *for all* s ∈ W*.*

*A Kripke model* M = (W, ∼, V ) *is called an* EHL model *iff the following two conditions hold:*

*1. For any state* s ∈ W *and any agent* p ∈ Π*,*

$$\left| V(s) \cap \{ A_p \mid A \in \mathcal{A} \} \right| = 1, \tag{3}$$

*i.e., exactly one of the partition atoms* $A_p$ *involving agent* p *is true at state* s*.*

*2. For any agent* p*, any agent type* A*, and any pair of states* s *and* t*,*

$$s \sim_p t \quad \implies \quad \left( A_p \in V(s) \;\Leftrightarrow\; A_p \in V(t) \right), \tag{4}$$

*i.e.,* p *can distinguish worlds where it is of different types.*

*General validity, denoted* |= ϕ*, means* M |= ϕ *for all* EHL *models.*

*Example 12.* For the interpretation functions from Example 8 for send-unrestricted byzantine agents, $\mathbb{C}_p^{\mathsf{Correct}\backslash\mathsf{Faulty}}\varphi = \mathsf{Faulty}_p \to K_p\top$. For epistemic models, it is logically equivalent to ⊤, meaning that no information can be gleaned from a message under the assumption that it is sent by a fully byzantine agent without perception flaws. At the same time, for truth-telling correct agents

$$\mathbb{C}_p^{\mathsf{Correct}\backslash\mathsf{Correct}}\varphi = \mathsf{Correct}_p \to K_p\varphi,$$

which closely matches the hope modality

$$H_p\varphi = \mathsf{Correct}_p \to K_p(\mathsf{Correct}_p \to \varphi)$$

from [10]. Indeed, since we assume agents to know their own type, it is the case that $\mathsf{Correct}_p \to K_p\mathsf{Correct}_p$ holds, making $H_p\varphi$ equivalent to $\mathbb{C}_p^{\mathsf{Correct}\backslash\mathsf{Correct}}\varphi$.

*Example 13.* Apart from helping to understand messages, an interpretation function can be used to gain knowledge about the type of the sender. For instance, if A agents know enough about the way E agents communicate to conclude that a particular message ϕ can never be sent by an E agent, which corresponds to $f_{AE}(\varphi) = \bot$, then $\mathbb{C}_q^{A\backslash E}\varphi = E_q \to K_q\bot$. For epistemic models, such $\mathbb{C}_q^{A\backslash E}\varphi$ is logically equivalent to $\neg E_q$. In other words, having received ϕ from agent q, an A agent p learns at least $K_p\neg E_q$.

*Remark 14 (Information from message passing).* Let p, q ∈ Π be agents and A be a partition of Π. The knowledge gained by agent p upon receiving a message ϕ from agent q can be described by $K_p\mathbb{C}_q^p\varphi$, where

$$\mathbb{C}_q^p \varphi := \bigwedge_{E \in \mathcal{A}} \mathbb{C}_q^{\bar{p} \backslash E} \varphi \tag{5}$$

In other words, knowing its own type, p considers all possible types for the sender q and for each type considers the respective interpretation of the message; the conjunction combined with the implications within creed makes sure that the appropriate type is chosen. Note that the presence of send-unrestricted agents from Example 12 adds a conjunct to (5) that is equivalent to ⊤. Hence, send-unrestricted agents can be safely ignored in determining the message meaning. By the same token, some conjuncts in (5) can rule out a particular type for agent q, as in Example 13. Finally, if p has already ruled out some type E, then $K_p\neg E_q$ logically implies $K_p(E_q \to K_q f_{AE}(\varphi))$ independent of the interpretation function. In this case, the E-conjunct of (5) becomes redundant.

*Example 15.* In the system from Example 8 with send-unrestricted byzantine agents, upon receiving message ϕ from agent q, agent p can ignore the possibility of the sender being Faulty and conclude $\mathsf{Correct}_q \to K_q\varphi$, i.e., hope $H_q\varphi$ for the case of factive beliefs, in full accordance with [10]. Note also that p may infer $K_q\varphi$ from this message if p is sure that q is correct.
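Assembling the combined creed (5) can be sketched mechanically. The following Python fragment is an illustrative encoding (formulas as strings, interpretation functions as a table) and not part of the paper's formalism; it shows how trivial ⊤-conjuncts, such as the one contributed by send-unrestricted byzantine agents, drop out of the conjunction.

```python
# Assemble the combined creed (5) as a conjunction over all candidate sender
# types E. A conjunct whose interpretation is ⊤ (encoded "T") carries no
# information and is dropped, mirroring the send-unrestricted case.

def creed_conjunct(E, f_AE_phi):
    if f_AE_phi == "T":                      # trivial interpretation: conjunct is ⊤
        return None                          # safely ignored (Example 12)
    return f"({E}_q -> K_q {f_AE_phi})"

def combined_creed(interpretations):
    conjuncts = (creed_conjunct(E, f) for E, f in interpretations.items())
    return " & ".join(c for c in conjuncts if c is not None)

# Send-unrestricted byzantine agents: the Faulty interpretation of phi is ⊤,
# so only the Correct conjunct survives.
print(combined_creed({"Correct": "phi", "Faulty": "T"}))  # (Correct_q -> K_q phi)
```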

Having established the basic definitions and semantics of the logic, we now provide an axiomatization that we prove sound and complete in the next section.

**Definition 16 (Logic** EHL**).** *Let* Π, A, P, C, T *be a heterogeneous distributed system and* $\{f_{AE} \mid A, E \in \mathcal{A}\}$ *be the collection of interpretation functions for it. Logic* EHL *is obtained by adding to the standard axiomatization of the modal logic of knowledge* S5 *the partition axioms* P1*–*P3*. The resulting axiom system is as follows: for all* p ∈ Π*, all* A ∈ A*, and all* E ∈ A *such that* E ≠ A*,*

Taut *All propositional tautologies in the language of* EHL*;*

k $K_p(\varphi \to \psi) \to (K_p\varphi \to K_p\psi)$*;*
4 $K_p\varphi \to K_pK_p\varphi$*;*
t $K_p\varphi \to \varphi$*;*
5 $\neg K_p\varphi \to K_p\neg K_p\varphi$*;*
(MP) *rule inferring* ψ *from* ϕ → ψ *and* ϕ*;*
(Nec) *rule inferring* $K_p\varphi$ *from* ϕ*;*

$$\mathsf{P1} \quad \bigvee_{A \in \mathcal{A}} A_p; \qquad \mathsf{P2} \ A_p \to \neg E_p; \qquad \mathsf{P3} \ A_p \to K_p A_p. \tag{6}$$

Partition axiom P1 states that each agent belongs to at least one of the types. Partition axiom P2 postulates that each agent belongs to at most one of the types. Together they imply that agent types partition the set of agents. Partition axiom P3 expresses that every process knows its own type.

## **4 Soundness and Completeness of** EHL

Since EHL is an extension of S5 with partition axioms governing the behavior of partition atoms while EHL models are instances of epistemic models, the soundness and completeness for EHL follows the standard proof for S5 (see, e.g., [9]), where additionally it is necessary to establish that the partition axioms are sound and that the canonical model satisfies the additional restrictions.

**Theorem 17 (Soundness).** *Logic* EHL *is sound with respect to* EHL *models, i.e.,* EHL ⊢ ϕ *implies* |= ϕ*.*

*Proof.* We only establish the validity of partition axioms. Axioms P1 and P2 hold due to condition (3). Similarly, P3 holds because of (4).

Completeness is proved by the standard canonical model construction, which requires several definitions. We omit the proofs of the following lemmas if completely standard and only treat new cases otherwise.

**Definition 18 (Maximal consistent sets).** *A set* $\Gamma \subseteq \mathcal{F}$ *of formulas is called* consistent *iff* $\mathsf{EHL} \nvdash \neg\bigwedge\Gamma_0$ *for any finite subset* $\Gamma_0 \subseteq \Gamma$*. A set* Γ *is called* maximal consistent *iff* Γ *is consistent but no proper superset* $\Delta \supsetneq \Gamma$ *is consistent.*

**Lemma 19 (Lindenbaum Lemma).** *Any consistent set* Γ *can be extended to a maximal consistent set* Δ ⊇ Γ*.*

**Definition 20 (Canonical model).** *The canonical model* $M^C = (S^C, \sim^C, V^C)$ *is defined as follows:*

*–* $S^C$ *is the collection of all maximal consistent sets;*
*–* $\Gamma \sim_p \Delta$ *iff* $\{K_p\varphi \mid K_p\varphi \in \Gamma\} = \{K_p\varphi \mid K_p\varphi \in \Delta\}$*;*
*–* $V^C(\Gamma) := \{r \in \textit{Prop} \mid r \in \Gamma\}$*.*

**Lemma 21 (Truth Lemma).** *For any* $\varphi \in \mathcal{F}$ *and any* $\Gamma \in S^C$*,*

$$\varphi \in \Gamma \iff M^C, \Gamma \models \varphi.$$

**Lemma 22 (Correctness).** *The canonical model is an* EHL *model.*

*Proof.* That $S^C \ne \emptyset$ and that $\sim_p$ is an equivalence relation for each p ∈ Π is proved the same way as for S5. It remains to show that (3) and (4) hold.

(3) Consider any maximal consistent set $\Gamma \in S^C$ and any agent p ∈ Π. By the standard properties of maximal consistent sets, all theorems of EHL belong to each maximal consistent set; in particular, $\bigvee_{A\in\mathcal{A}} A_p \in \Gamma$ because of axiom P1. A disjunction belongs to a maximal consistent set iff one of the disjuncts does. Hence, there exists at least one type A such that $A_p \in \Gamma$. At the same time, for any other type E, we have $(A_p \to \neg E_p) \in \Gamma$ because of axiom P2. Hence, $E_p \notin \Gamma$ because maximal consistent sets are consistent and closed with respect to (MP). It follows that there is exactly one partition atom of the form $A_p$ in Γ. Hence, by the definition of $V^C$,

$$\left| V^C(\Gamma) \cap \{ A_p \mid A \in \mathcal{A} \} \right| = 1.$$

(4) Consider two maximal consistent sets $\Gamma \sim_p \Delta$. Let $A_p \in \Gamma$. By P3, also $K_pA_p \in \Gamma$. Hence, $K_pA_p \in \Delta$ by the definition of $\sim_p$. Finally, $A_p \in \Delta$ by axiom t. We proved that $A_p \in \Gamma$ implies $A_p \in \Delta$. The converse implication is analogous.

**Theorem 23 (Completeness).** *Logic* EHL *is complete with respect to* EHL *models, i.e.,* EHL ⊢ ϕ *whenever* |= ϕ*.*

*Proof.* We prove the contrapositive. Assume EHL ⊬ ϕ. That means that {¬ϕ} is consistent. By the Lindenbaum Lemma 19, there exists a maximal consistent set Γ ⊇ {¬ϕ}. Hence, this $\Gamma \in S^C$ for the canonical model $M^C$ defined in Definition 20, which is an EHL model by Lemma 22. By the Truth Lemma 21, it follows that $M^C, \Gamma \models \neg\varphi$. Since $M^C, \Gamma \not\models \varphi$ for some EHL model, ϕ is not valid, i.e., $\not\models \varphi$.

## **5 Properties of Creed**

In this section, we derive several useful properties of creed modalities.

The explicit assumption P3 that each agent knows which type it belongs to implies complete knowledge of its own type, i.e., each agent p knows, for any type A, whether it belongs to A:

**Theorem 24.** EHL ⊢ $\neg A_p \to K_p\neg A_p$ *for all* p ∈ Π*,* A ∈ A*, i.e., agents know which type they do not belong to.*

*Proof.* By P1, agent p must belong to one of the types. Hence, if not type A, it must be one of the remaining types, i.e., $\neg A_p \to \bigvee_{E \ne A} E_p$. Therefore, we have $\neg A_p \to \bigvee_{E \ne A} K_pE_p$ due to P3. Given that $E_p \to \neg A_p$ for each E ≠ A by P2, also $K_pE_p \to K_p\neg A_p$ for each E ≠ A by standard modal reasoning. Hence, $\neg A_p \to K_p\neg A_p$.

**Corollary 25.** EHL ⊢ $K_pA_p \lor K_p\neg A_p$ *for all* p ∈ Π*,* A ∈ A*.*

*Proof.* It follows directly from P3 and Theorem 24 by propositional reasoning.
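Corollary 25 can also be checked semantically on a concrete EHL model. The following is a minimal sketch with two worlds, one agent p, and two hypothetical types A and B; the model is an illustrative assumption satisfying conditions (3) and (4), and the check confirms $K_pA_p \lor K_p\neg A_p$ at every world.

```python
# A toy semantic check of Corollary 25 on a hand-built two-world EHL model.
worlds = ["w0", "w1"]
val = {"w0": {"A_p"}, "w1": {"B_p"}}    # condition (3): exactly one type atom per world
classes = {"w0": {"w0"}, "w1": {"w1"}}  # condition (4): ~_p separates worlds of different types

def K_p(w, prop):
    """K_p prop holds at w iff prop holds at every world p considers possible from w."""
    return all(prop(v) for v in classes[w])

for w in worlds:
    knows_type = K_p(w, lambda v: "A_p" in val[v])
    knows_not_type = K_p(w, lambda v: "A_p" not in val[v])
    assert knows_type or knows_not_type
print("K_p A_p or K_p not-A_p holds at every world")
```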

The creed modality amounts to K45-belief:

**Theorem 26.** *Creed satisfies the normality, positive introspection, and negative introspection axioms if applied to statements already translated by an interpretation function. Formally, let* $[\varphi]_{AE}$ *stand for any formula* ξ *such that* $f_{AE}(\xi) = \varphi$*. Then the following formulas are derivable in* EHL*:*

$$\begin{split}
\mathsf{k}_{\mathbb{C}} &\vdash \mathbb{C}_{p}^{A \backslash E}[\varphi \to \psi]_{AE} \to \left( \mathbb{C}_{p}^{A \backslash E}[\varphi]_{AE} \to \mathbb{C}_{p}^{A \backslash E}[\psi]_{AE} \right) \\
\mathsf{4}_{\mathbb{C}} &\vdash \mathbb{C}_{p}^{A \backslash E}[\varphi]_{AE} \to \mathbb{C}_{p}^{A \backslash E}\left[ \mathbb{C}_{p}^{A \backslash E}[\varphi]_{AE} \right]_{AE} \\
\mathsf{5}_{\mathbb{C}} &\vdash \neg\mathbb{C}_{p}^{A \backslash E}[\varphi]_{AE} \to \mathbb{C}_{p}^{A \backslash E}\left[ \neg\mathbb{C}_{p}^{A \backslash E}[\varphi]_{AE} \right]_{AE}
\end{split}$$

*Proof.* We start by deriving $\mathsf{k}_{\mathbb{C}}$:

$$\begin{array}{lll}
1. & \vdash \mathbb{C}_p^{A\backslash E}[\varphi \to \psi]_{AE} = E_p \to K_p(\varphi \to \psi) & \text{definition of creed} \\
2. & \vdash K_p(\varphi \to \psi) \to (K_p\varphi \to K_p\psi) & \text{axiom k} \\
3. & \vdash \mathbb{C}_p^{A\backslash E}[\varphi \to \psi]_{AE} \to \bigl(E_p \to (K_p\varphi \to K_p\psi)\bigr) & \text{prop. reasoning from 1., 2.} \\
4. & \vdash \mathbb{C}_p^{A\backslash E}[\varphi]_{AE} = E_p \to K_p\varphi & \text{definition of creed} \\
5. & \vdash \mathbb{C}_p^{A\backslash E}[\varphi \to \psi]_{AE} \to \bigl(\mathbb{C}_p^{A\backslash E}[\varphi]_{AE} \to (E_p \to K_p\psi)\bigr) & \text{prop. reasoning from 3., 4.} \\
6. & E_p \to K_p\psi = \mathbb{C}_p^{A\backslash E}[\psi]_{AE} & \text{definition of creed} \\
7. & \vdash \mathbb{C}_p^{A\backslash E}[\varphi \to \psi]_{AE} \to \bigl(\mathbb{C}_p^{A\backslash E}[\varphi]_{AE} \to \mathbb{C}_p^{A\backslash E}[\psi]_{AE}\bigr) & \text{rewriting of 5. using 6.}
\end{array}$$

The following is a derivation of $\mathsf{4}_{\mathbb{C}}$:

$$\begin{array}{lll}
1. & \vdash \mathbb{C}_p^{A\backslash E}[\varphi]_{AE} = E_p \to K_p\varphi & \text{definition of creed} \\
2. & \vdash K_p\varphi \to K_pK_p\varphi & \text{axiom 4} \\
3. & \vdash \mathbb{C}_p^{A\backslash E}[\varphi]_{AE} \to (E_p \to K_pK_p\varphi) & \text{prop. reasoning from 1., 2.} \\
4. & \vdash K_p\varphi \to (E_p \to K_p\varphi) & \text{prop. tautology} \\
5. & \vdash K_pK_p\varphi \to K_p(E_p \to K_p\varphi) & \text{normal modal reasoning from 4.} \\
6. & \vdash \mathbb{C}_p^{A\backslash E}[\varphi]_{AE} \to \bigl(E_p \to K_p\mathbb{C}_p^{A\backslash E}[\varphi]_{AE}\bigr) & \text{prop. reasoning from 3., 5. using 1.} \\
7. & E_p \to K_p\mathbb{C}_p^{A\backslash E}[\varphi]_{AE} = \mathbb{C}_p^{A\backslash E}\bigl[\mathbb{C}_p^{A\backslash E}[\varphi]_{AE}\bigr]_{AE} & \text{definition of creed} \\
8. & \vdash \mathbb{C}_p^{A\backslash E}[\varphi]_{AE} \to \mathbb{C}_p^{A\backslash E}\bigl[\mathbb{C}_p^{A\backslash E}[\varphi]_{AE}\bigr]_{AE} & \text{rewriting of 6. using 7.}
\end{array}$$

The following is a derivation of $\mathsf{5}_{\mathbb{C}}$:

$$\begin{array}{lll}
1. & \vdash \neg\mathbb{C}_p^{A\backslash E}[\varphi]_{AE} \leftrightarrow E_p \land \neg K_p\varphi & \text{prop. reasoning from the definition of creed} \\
2. & \vdash E_p \to K_pE_p & \text{axiom P3} \\
3. & \vdash \neg K_p\varphi \to K_p\neg K_p\varphi & \text{axiom 5} \\
4. & \vdash \neg\mathbb{C}_p^{A\backslash E}[\varphi]_{AE} \to K_p(E_p \land \neg K_p\varphi) & \text{normal modal reasoning from 1.–3.} \\
5. & \vdash \neg\mathbb{C}_p^{A\backslash E}[\varphi]_{AE} \to K_p\neg\mathbb{C}_p^{A\backslash E}[\varphi]_{AE} & \text{normal modal reasoning from 1., 4.} \\
6. & \vdash \neg\mathbb{C}_p^{A\backslash E}[\varphi]_{AE} \to \bigl(E_p \to K_p\neg\mathbb{C}_p^{A\backslash E}[\varphi]_{AE}\bigr) & \text{prop. reasoning from 5.} \\
7. & \vdash \neg\mathbb{C}_p^{A\backslash E}[\varphi]_{AE} \to \mathbb{C}_p^{A\backslash E}\bigl[\neg\mathbb{C}_p^{A\backslash E}[\varphi]_{AE}\bigr]_{AE} & \text{rewriting of 6.} \qquad \Box
\end{array}$$

In addition, this creed belief is factive whenever the speaker type is correctly identified (cf. a similar conditional factivity for hope in [10]):

**Theorem 27** ($\mathsf{t}^*_{\mathbb{C}}$)**.** EHL ⊢ $E_p \to \left( \mathbb{C}_p^{A\backslash E}[\varphi]_{AE} \to \varphi \right)$*.*

*Proof.*
$$\begin{array}{lll}
1. & \vdash \mathbb{C}_p^{A\backslash E}[\varphi]_{AE} = (E_p \to K_p\varphi) & \text{definition of creed} \\
2. & \vdash K_p\varphi \to \varphi & \text{axiom t} \\
3. & \vdash E_p \to \left( \mathbb{C}_p^{A\backslash E}[\varphi]_{AE} \to \varphi \right) & \text{prop. reasoning from 1., 2.}
\end{array}$$

On the other hand, misidentifying the speaker's type may easily destroy factivity. Let p ∉ E. Given that $\mathbb{C}_p^{A\backslash E}[\varphi]_{AE} \to \varphi = (E_p \to K_p\varphi) \to \varphi$, we have $E_p \to K_p\varphi$ true simply because $E_p$ is false. Accordingly, there is no reason why ϕ must hold.

This provides a formal model of how a true statement can lead to false beliefs due to misinterpretation. Moreover, as Theorem 26 shows, such false beliefs cannot be detected by introspection.

## **6 Applications**

### **6.1 Formalizing "The Murders in the Rue Morgue"**

Example 1 describes a situation where honest witnesses provide contradictory information that is, nevertheless, successfully filtered by Dupin. We show how his reasoning can be formalized and explained using the creed modality. Dupin reads all witness accounts from a paper. We assume no misinterpretation of what the witnesses said. In addition, the paper mentions the exact type of each witness (French not speaking Italian, Dutch not speaking French, etc.), which again is assumed to be factive. Hence, we use only one creed modality with the identity interpretation function per witness account read by Dupin. In other words, Dupin reasons about the available information without the need to interpret it. The crucial question is: Why does Dupin ignore some but not all of the information provided by each witness? The answer becomes clear if we view each witness account as one or several creed modalities regarding what this witness heard from m. Ignoring slight variations in details, all witness statements can be divided into two types: (a) m did not speak the language I speak; (b) m spoke a language I do not speak. Dupin accepts statements (a) but ignores statements (b). Even when statement (b) of a witness contradicts statement (a) of another witness, Dupin accepts statements (a) from both witnesses. Here is how these statements of, say, the French witness f ∈ F regarding the utterance ϕ of m can be represented via the creed modality:

$$\begin{aligned}
(a) \quad \mathbb{C}_m^{F \backslash F} \varphi &= F_m \to K_m f_{FF}(\varphi) = F_m \to K_m \bot; \\
(b) \quad \mathbb{C}_m^{F \backslash I} \varphi &= I_m \to K_m f_{FI}(\varphi) = I_m \to K_m \top.
\end{aligned}$$

Indeed, for (a), since the interpretation function from French to French is meaningful (in the simplest case, the identity function), the fact that f could not understand what m was saying in this case means that $f_{FF}(\varphi) = \bot$. On the other hand, for (b), since f does not know Italian, he has $f_{FI}(\psi) = \top$ for all ψ. As discussed in Example 13, (a) yields $\neg F_m$. Similarly, (b) yields ⊤ as per Example 12. This rightfully leads Dupin to the conclusion $\neg F_m$, i.e., m ∉ F. In other words, statements (b) are ignored because they are trivial, not because they are false. One might say that, for f, a stronger precondition of m saying something in Italian is m ∈ I. But using $I_m \to K_mI_m$ in place of (b) would yield axiom P3, still a logically trivial statement.
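The contrast between statements (a) and (b) can be verified by a small truth-table computation, assuming only that $K_m\bot$ is false and $K_m\top$ is true in any reflexive epistemic model:

```python
# Truth-table check: in reflexive epistemic models K_m ⊥ is false and K_m ⊤
# is true, so (a) F_m -> K_m ⊥ collapses to ¬F_m while (b) I_m -> K_m ⊤ is ⊤.

def implies(a, b):
    return (not a) or b

for F_m in (True, False):
    assert implies(F_m, False) == (not F_m)  # (a) is equivalent to ¬F_m
for I_m in (True, False):
    assert implies(I_m, True)                # (b) is trivially true
print("statement (a) carries information, statement (b) is trivial")
```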

In the story, m was an orangutan (Ourang-Outang in Poe's spelling), thus fulfilling m ∉ A for any language A discussed.

### **6.2 Solution to Knights and Knaves**

Clearly, the partition of the island from Example 2 involves two types: I for knIghts and A for knAves. Let s be the reasoner and L be his type. The puzzle postulates that $f_{LI}(\varphi) = \varphi$ and $f_{LA}(\varphi) = \neg\varphi$ for any formula ϕ. Accordingly, the full information agent s receives from agent p's statement that ϕ is

$$\mathbb{C}_p^s \varphi = \mathbb{C}_p^{L \backslash I} \varphi \wedge \mathbb{C}_p^{L \backslash A} \varphi = (I_p \to K_p \varphi) \wedge (A_p \to K_p \neg \varphi).$$

In the puzzle in question, p states that at least one of p and q is a knave, $A_p \lor A_q$ in formulas. Hence, agent s learns

$$\mathbb{C}_p^s(A_p \lor A_q) = \left(I_p \to K_p(A_p \lor A_q)\right) \land \left(A_p \to K_p \neg (A_p \lor A_q)\right). \tag{7}$$

Here is how to derive in EHL that p is a knight and q is a knave, i.e., $I_p \land A_q$:

$$\begin{array}{lll}
1. & A_p \to K_p\neg(A_p \lor A_q) & \text{prop. reasoning from (7)} \\
2. & K_p\neg(A_p \lor A_q) \to \neg A_p & \text{t and prop. reasoning} \\
3. & \neg A_p & \text{prop. reasoning since } A_p \to \neg A_p \text{ follows from 1. and 2.} \\
4. & \neg A_p \to I_p & \text{P1 and prop. reasoning} \\
5. & I_p & \text{(MP) from 3. and 4.} \\
6. & I_p \to K_p(A_p \lor A_q) & \text{prop. reasoning from (7)} \\
7. & I_p \to A_p \lor A_q & \text{t and prop. reasoning from 6.} \\
8. & I_p \to A_q & \text{prop. reasoning from 7. since } I_p \to \neg A_p \text{ by P2} \\
9. & I_p \land A_q & \text{prop. reasoning from 5. and 8.}
\end{array}$$

Hence, EHL ⊢ $\mathbb{C}_p^s(A_p \lor A_q) \to I_p \land A_q$.
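The same conclusion can be confirmed semantically by brute force over the four possible type assignments. The boolean encoding below is an illustrative sketch, not the paper's derivation: a knight's statement must be true and a knave's false, mirroring the interpretation functions $f_{LI}$ and $f_{LA}$.

```python
from itertools import product

# Each agent is exactly one of knIght or knAve (partition axioms P1, P2), so
# a single boolean per agent suffices. p states: A_p ∨ A_q, i.e., at least
# one of p and q is a knave.

def solve():
    solutions = []
    for p_is_knight, q_is_knight in product([True, False], repeat=2):
        statement = (not p_is_knight) or (not q_is_knight)  # A_p ∨ A_q
        # A knight's statement is true, a knave's is false:
        if statement == p_is_knight:
            solutions.append((p_is_knight, q_is_knight))
    return solutions

print(solve())  # [(True, False)]: p is a knight, q is a knave
```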

### **6.3 Modelling of Software Updates**

Consider a heterogeneous distributed system with two agent types, U for the updated agents running the most recent software and O for the agents running the old protocol, which is designed with the possibility of future updates in mind. Since the new protocols are designed by taking into account the existence of processes running the old protocol, the interpretation functions can be built asymmetrically. Each type interprets information from its own type directly: $f_{UU}(\varphi) = \varphi$ and $f_{OO}(\varphi) = \varphi$. U agents can interpret messages from O agents using backward compatibility: $f_{UO}(\varphi) = g\bigl(f_{OO}(\varphi)\bigr)$, where g translates into the updated system language.

The opposite is not always possible, as O agents have no knowledge of the new protocols. Accordingly, messages ϕ compatible with the old protocol will be processed as before, i.e., using $f_{OO}(\varphi)$. But if ϕ is unknown to the old protocol, i.e., $f_{OO}(\varphi) = \bot$, the creed under the assumption that the sender s ∈ O would yield $\mathbb{C}_s^{O\backslash O}\varphi \leftrightarrow \neg O_s$. In this case, the receiver r can conclude that the sender process s does not conform to the old protocol. Since this error flagging disappears once r is also updated, it may well be the case that this does not violate the fault-resilience properties of the old protocol, in particular when not too many processes are updated simultaneously. In this case, r could be guaranteed to always compute a correct result.
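The asymmetric interpretation functions of this scenario can be sketched as follows. The message vocabulary, the translation g, and the use of None to encode ⊥ are illustrative assumptions, not part of the paper:

```python
# Hypothetical sketch of the asymmetric interpretation functions for the
# software-update scenario.

OLD_VOCAB = {"ping", "commit"}

def g(msg):
    """Backward-compatibility translation into the updated protocol's language."""
    return {"ping": "ping_v2", "commit": "commit_v2"}.get(msg, msg)

def f_OO(msg):
    """Old agents understand only old-protocol messages; otherwise ⊥ (None)."""
    return msg if msg in OLD_VOCAB else None

def f_UO(msg):
    """Updated agents interpret old-protocol messages via g(f_OO(msg))."""
    interpreted = f_OO(msg)
    return None if interpreted is None else g(interpreted)

print(f_UO("ping"))     # 'ping_v2': an updated receiver understands old messages
print(f_OO("ping_v3"))  # None: an old receiver flags the unknown new message as ⊥
```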

### **6.4 Comparison to Related Work**

The interpretation functions in the knights and knaves puzzles depend on the speaker only, which made it possible to formalize them in [20] by means of public announcements. In the other two examples (Rue Morgue and software update), there is an additional difficulty: even knowing the sender's type, agents interpret messages differently based on the varying levels of knowledge about the sender's protocol. This important degree of freedom of our method compared to [20] is especially central to the software update example.

## **7 Conclusion and Future Work**

This paper provides a sound and complete axiomatization for a logic for heterogeneous distributed systems that generalizes the logic of fault-tolerant distributed systems and enables us to explicitly model the interpretation of messages sent by agents that execute different protocols (identified by types). It revolves around a new (derived) modality called creed, a generalization of the hope modality for byzantine agents, that satisfies positive and negative introspection after message interpretation and enjoys factivity whenever the sender's type is correctly identified. We demonstrated the explanatory power of our approach by applying it to three representative examples from areas ranging from detective reasoning to logic puzzles to distributed systems. The current formalization assumes that agents' knowledge is factive even if this factivity does not affect how they communicate. Relaxing this assumption and working with agents whose beliefs may be compromised, e.g., due to sensor errors or memory failures, is a natural next step. Another natural extension is to allow for on-the-fly updates to the interpretation functions based on received information.

**Acknowledgments.** The authors would like to thank Stephan Felber, Krisztina Fruzsa, Rojo Randrianomentsoa, and Thomas Schlögl as well as the participants of Dagstuhl Seminar 23272 "Epistemic and Topological Reasoning in Distributed Systems" for discussions and suggestions. We also thank the anonymous reviewers for their useful comments.

## **References**


France, 17–19 July 2019. Electronic Proceedings in Theoretical Computer Science, vol. 297, pp. 293–312. Open Publishing Association (2019). https://doi.org/10. 4204/EPTCS.297.19


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Symbolic Model Construction for Saturated Constrained Horn Clauses**

Martin Bromberger<sup>1</sup>, Lorenz Leutgeb<sup>1,2</sup>(B), and Christoph Weidenbach<sup>1</sup>

<sup>1</sup> Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany

{mbromber,lorenz,weidenb}@mpi-inf.mpg.de

<sup>2</sup> Graduate School of Computer Science, Saarland Informatics Campus, Saarbrücken, Germany

**Abstract.** Clause sets saturated by hierarchic ordered resolution do not offer a model representation that can be effectively queried, in general. They only offer the guarantee of the existence of a model. We present an effective symbolic model construction for saturated constrained Horn clauses. Constraints are in linear arithmetic, the first-order part is restricted to a function-free language. The model is constructed in finite time, and non-ground clauses can be effectively evaluated with respect to the model. Furthermore, we prove that our model construction produces the least model.

**Keywords:** Bernays-Schönfinkel Fragment · Linear Arithmetic · Horn Clauses · Superposition · Model Construction

## **1 Introduction**

Constrained Horn Clauses (CHCs) combine logical formulas with constraints over various domains, e.g., linear real arithmetic, linear integer arithmetic, or equalities of uninterpreted functions [15]. This formalism has gained widespread attention in recent years due to its applications in a variety of fields, including program analysis and verification: safety, liveness, and termination [17,38], complexity and resource analysis [33], intermediate representation [22], and software testing [35]. Technical controls, so-called *Supervisors*, like an electronic engine control unit or a lane change assistant in a car [8,9], can be modelled, run, and proven safe. Moreover, there exist many different approaches for reasoning in CHCs and associated first-order logic fragments extended with theories [2,5,7,10,15,23–25,28,29,34,37]. Thus, CHCs are a powerful tool for reasoning about complex systems that involve logical constraints, and they have been used to solve a wide range of problems.

A failed proof attempt of some conjecture or an undesired run points to a bug. In this case, investigating the cause of the unexpected result or behavior is crucial. Building a model of the situation that can then be effectively queried is an important means towards a repair. However, some algorithms for CHCs, e.g., hierarchic superposition, which boils down to hierarchic ordered resolution in the context of CHCs, do not in general return a model that can be effectively queried if a proof attempt fails. Even when they do, queries are restricted to ground clauses [4].

The contribution of our paper can be seen as an extension for these saturation-based algorithms that produces models and not just saturated clause sets. In fact, we show how to build symbolic models out of any saturated CHC clause set over linear arithmetic. This fragment is equivalent to Horn clause sets of linear arithmetic combined with the Bernays-Schönfinkel fragment. Recall that although satisfiability in this fragment is undecidable [16,26], in general, for a finitely saturated set we can construct such a representation in finite time.

Our models fulfill all important properties postulated in the literature for automated model building in first-order logic [13,20]. First, they can be *effectively constructed*, i.e., each model is represented by one linear arithmetic formula of finite size for each of its predicates and it can be constructed in finite time. Second, they are *unique*, i.e., the model representation specifies exactly one interpretation; in our case the least model. Third, they can be *effectively queried*, i.e., we provide decision procedures that evaluate whether an atom, clause, or formula is entailed/satisfied by the model. Fourth, it is possible to *test* the *equivalence* of two models. The approach we present does not exploit features of linear arithmetic beyond equality, the existence of a well-founded order for the theories' universe, and decidability of the theory. The results may therefore be adapted to other constraint domains. Model representations that can be effectively constructed and queried, like ours, are also called *effective model representations*. Moreover, our method is the first effective model construction approach for ordered resolution (or its extension to superposition) that is based on saturation, goes beyond ground clauses, and includes theory constraints. In the future, we plan to use this approach as the basis for a more general model construction approach that also works on more expressive fragments of first-order logic modulo theories.

Our model construction is inspired by the model construction operator used in the proof for refutational completeness of hierarchic superposition [3,6,30]. The main difference is that the model construction operator from the refutational completeness proof is restricted to ground clauses and executed on the potentially infinite ground instances of the saturated clause set (in addition to an infinite axiomatization of the background theory as ground clauses). As a result, the model construction operator from the refutational completeness proof cannot effectively construct a model because iterating over a potentially infinite set means it may diverge. Moreover, in contrast to our model construction, the original model operator cannot effectively evaluate non-ground atoms, clauses, or formulas. It is, however, sufficient to show the existence of a model if the clause set is saturated and does not contain the empty clause [3,6,30]. In our version of the model construction operator, we managed to lift the restriction to ground clause sets by restricting the input logic to the Horn Bernays-Schönfinkel fragment instead of full first-order logic. This enables us to define a strict propagation/production order for our non-ground clauses instead of just for ground clauses. As a result, we can construct the model one clause at a time.

The paper is organized as follows. In Sect. 2 we clarify notation and preliminaries. The main contribution is presented in Sect. 3. At the end of this section, we also explain how our models satisfy the postulates (see [13, Section 5.1, p. 234]) by Fermüller and Leitsch for automated model building. We conclude in Sect. 4. Proofs were elided in favor of explanations and examples. An extended version, which includes proofs, can be found at [12].

## **2 Preliminaries and Notation**

We briefly recall the basic logical formalisms and notations we build upon [9]. Our starting point is a standard first-order language with *variables* (denoted x, y, z), *predicates* (denoted P, Q) of some fixed *arity*, and *terms* (denoted t, s). An *atom* (denoted A) is an expression $P(t_1,\dots,t_n)$ for a predicate P of arity n = arity(P). When the terms $t_1,\dots,t_n$ in $P(t_1,\dots,t_n)$ are not relevant in some context, we also write P(∗). A *positive literal* is an atom A and a *negative literal* is a negated atom ¬A. We define comp(A) = ¬A, comp(¬A) = A, |A| = A and |¬A| = A. Literals are usually denoted L, K. We sometimes write literals as [¬]P(∗), meaning that the sign of the literal is arbitrary, often followed by a case distinction. Formulas are defined in the usual way using quantifiers ∀, ∃ and the boolean connectives (in order of decreasing binding strength) ¬, ∨, ∧, →, and ↔. The logic we consider does not feature a first-order equality predicate.

A *clause* (denoted C, D) is a universally closed disjunction of literals A1 ∨ ··· ∨ An ∨ ¬B1 ∨ ··· ∨ ¬Bm. We may equivalently write B1 ∧ ··· ∧ Bm → A1 ∨ ··· ∨ An. A clause is *Horn* if it contains at most one positive literal, i.e. n ≤ 1. In Sect. 3, all clauses considered are Horn clauses. If Y is a term, formula, or a set thereof, vars(Y) denotes the set of all variables in Y, and Y is *ground* if vars(Y) = ∅. Analogously, Π(Y) is the set of predicate symbols occurring in Y.

The *Bernays-Schönfinkel Clause Fragment* (BS) in first-order logic consists of first-order clauses where all terms are either variables or constants. The *Horn Bernays-Schönfinkel Clause Fragment* (HBS) is further restricted to Horn clauses.

A *substitution* σ is a function from variables to terms with a finite domain and codomain. We denote substitutions by σ, τ. The application of substitutions is often written postfix, as in xσ, and is homomorphically extended to terms, atoms, literals, clauses, and quantifier-free formulas. A substitution is *ground* if its codomain is ground. Let Y denote some term, literal, clause, or clause set. A substitution σ is a *grounding* for Y if Yσ is ground, and Yσ is a *ground instance* of Y in this case. We denote by gnd(Y) the set of all ground instances of Y. The *most general unifier* mgu(Z1, Z2) of two terms/atoms/literals Z1 and Z2 is defined as usual, and we assume that it does not introduce fresh variables and is idempotent.
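Since BS terms are flat (variables or constants), both substitution application and most general unification are easy to sketch in code. The encoding below is our own, not from the paper: variables are strings prefixed with `?`, atoms are `(predicate, args)` pairs.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def apply_subst(sigma, atom):
    """Apply a substitution to an atom (pred, args), postfix-style Aσ."""
    pred, args = atom
    return (pred, tuple(sigma.get(a, a) if is_var(a) else a for a in args))

def mgu(a1, a2):
    """Idempotent most general unifier of two BS atoms, or None if none exists."""
    (p1, args1), (p2, args2) = a1, a2
    if p1 != p2 or len(args1) != len(args2):
        return None
    sigma = {}
    def walk(t):                      # follow bindings to a representative
        while is_var(t) and t in sigma:
            t = sigma[t]
        return t
    for s, t in zip(args1, args2):
        s, t = walk(s), walk(t)
        if s == t:
            continue
        if is_var(s):
            sigma[s] = t
        elif is_var(t):
            sigma[t] = s
        else:
            return None               # clash of two distinct constants
    return {v: walk(v) for v in sigma}  # resolve chains: idempotent

sigma = mgu(("P", ("?x", "a")), ("P", ("b", "?y")))
print(sigma)                                   # {'?x': 'b', '?y': 'a'}
print(apply_subst(sigma, ("P", ("?x", "a"))))  # ('P', ('b', 'a'))
```

No occurs-check is needed here precisely because BS terms are flat; full first-order unification would require one.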

#### **2.1 Horn Bernays-Schönfinkel with Linear Arithmetic**

The class HBS(LRA) is the extension of the Horn Bernays-Schönfinkel fragment with linear real arithmetic (LRA). Analogously, the classes HBS(LQA) and HBS(LIA) are the extensions of the Horn Bernays-Schönfinkel fragment with linear rational arithmetic (LQA) and linear integer arithmetic (LIA), respectively. The only differences between the three classes are the sort LA over which their variables and terms range and the universe U over which their interpretations range. As the names imply, LA = LRA and U = R for HBS(LRA), LA = LQA and U = Q for HBS(LQA), and LA = LIA and U = Z for HBS(LIA). The results presented in this paper hold for all three classes, and by HBS(LA) we denote an arbitrary one of them.

Linear arithmetic terms are constructed from a set X of *variables*, the set of constants c ∈ Q (if in HBS(LRA) or HBS(LQA)) or c ∈ Z (if in HBS(LIA)), and the binary function symbols + and − (written infix). Additionally, we allow multiplication · if one of the factors is a constant. Multiplication only serves as syntactic sugar to abbreviate other arithmetic terms, e.g., x + x + x is abbreviated to 3 · x. Atoms in HBS(LA) are either *first-order atoms* (e.g., P(13, x)) or *(linear) arithmetic atoms* (e.g., x < 42). Arithmetic atoms are denoted by λ and may use the predicates ≤, <, ≈, ≉, >, ≥, which are written infix and have the expected fixed interpretation. We use ≈ instead of = to avoid confusion between equality in LA and equality on the meta level. While we do not permit quantifiers in the syntax of clauses, the notion of symbolic interpretations that we will develop does require them; they are denoted as usual. By atoms(Y)/quants(Y) we denote the linear arithmetic atoms/quantifiers in a formula or set of formulas Y. *First-order literals* and related notation are defined as before. *Arithmetic literals* coincide with arithmetic atoms, since the arithmetic predicates are closed under negation, e.g., ¬(x ≥ 42) is equivalent to x < 42.
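The closure of the arithmetic predicates under negation can be made concrete with a small lookup table. This is a sketch with our own atom representation (`lhs`, operator string, `rhs`); `~` stands for ≈ and `!~` for ≉.

```python
# Each operator has a complement, so negating an arithmetic atom yields
# another arithmetic atom -- no explicit negation symbol is ever needed.
COMPLEMENT = {"<": ">=", "<=": ">", ">": "<=", ">=": "<", "~": "!~", "!~": "~"}

def negate(atom):
    lhs, op, rhs = atom
    return (lhs, COMPLEMENT[op], rhs)

print(negate(("x", ">=", 42)))  # ('x', '<', 42): ¬(x ≥ 42) is x < 42
```

Double negation is the identity, matching the involutive complement pairs above.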

HBS(LA) clauses are defined as for HBS but using HBS(LA) atoms. We often write clauses in the form Λ ∥ C where C is a clause solely built of free first-order literals and Λ is a multiset of LA atoms called the *constraint* of the clause. A clause of the form Λ ∥ C is therefore also called a *constrained clause*. Since the interpretation of linear arithmetic relations is fixed, we set Π(Λ ∥ C) := Π(C).

The fragment we consider in Sect. 3 is restricted even further to *abstracted* clauses: For any clause Λ ∥ C, all terms in C must be variables. Put differently, we disallow any arithmetic function symbols, including numerical constants, in C. Variable abstraction, e.g. rewriting x ≥ 3 ∥ P(x, 1) to x ≥ 3, y ≈ 1 ∥ P(x, y), is always possible. Hence, the restriction to abstracted clauses is not a theoretical limitation, but allows us to formulate our model construction operator in a more concise way. We assume abstracted clauses for the theory development, but we prefer non-abstracted clauses in examples for readability; e.g., a unit clause P(3, 5) is considered in the development of the theory as the clause x ≈ 3, y ≈ 5 ∥ P(x, y).

In contrast to other works, e.g. [11], we do not permit first-order constants, and consequently also no variables that range over the induced Herbrand universe. All variables are arithmetic in the sense that they are interpreted by U. Since we allow equalities in the arithmetic constraint, it is possible to simulate variables over first-order constants, e.g. by numbering the constants, i.e. defining a bijection between N and the constant symbols. So this, again, is not a theoretical limitation.

The semantics of Λ ∥ C is as follows:

$$\Lambda \parallel C \quad \text{iff} \quad \Bigl(\bigwedge_{\lambda \in \Lambda} \lambda\Bigr) \to C \quad \text{iff} \quad \Bigl(\bigvee_{\lambda \in \Lambda} \neg\lambda\Bigr) \lor C$$

For example, the clause x > 1 ∨ y ≉ 5 ∨ ¬Q(x) ∨ R(x, y) is also written x ≤ 1, y ≈ 5 ∥ ¬Q(x) ∨ R(x, y). The negation ¬(Λ ∥ C) of a constrained clause Λ ∥ C where C = A1 ∨ ··· ∨ An ∨ ¬B1 ∨ ··· ∨ ¬Bm is thus equivalent to (⋀_{λ∈Λ} λ) ∧ ¬A1 ∧ ··· ∧ ¬An ∧ B1 ∧ ··· ∧ Bm. Note that since the neutral element of conjunction is ⊤, an empty constraint is valid, i.e. equivalent to true. In analogy to the empty clause in settings without constraints, we write □ to mean any and all clauses Λ ∥ ⊥ where Λ is satisfiable, which are all unsatisfiable.

An *assignment* for a constraint Λ is a substitution (denoted β) that maps all variables in vars(Λ) to values in U. An assignment is a *solution* for a constraint Λ if all atoms λ ∈ (Λβ) evaluate to true. A constraint Λ is *satisfiable* if there exists a solution for Λ. Otherwise it is *unsatisfiable*.
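A solution check for constraints is straightforward to sketch in code. The representation is ours, not the paper's: an atom is a coefficient map, a comparison operator, and a bound, and `Fraction` keeps LQA arithmetic exact.

```python
from fractions import Fraction
import operator

OPS = {"<=": operator.le, "<": operator.lt, ">=": operator.ge,
       ">": operator.gt, "~": operator.eq, "!~": operator.ne}  # "~" is ≈

def is_solution(beta, constraint):
    """beta maps variables to values in U; constraint is a list of atoms
    (coeffs, op, bound) encoding  sum(coeffs[v] * v)  op  bound."""
    return all(OPS[op](sum(Fraction(c) * beta[v] for v, c in coeffs.items()),
                       Fraction(bound))
               for coeffs, op, bound in constraint)

# Constraint {x <= 1, y ≈ 5} under the assignment β = {x ↦ 0, y ↦ 5}:
beta = {"x": Fraction(0), "y": Fraction(5)}
constraint = [({"x": 1}, "<=", 1), ({"y": 1}, "~", 5)]
print(is_solution(beta, constraint))  # True: β is a solution
```

An unsatisfiable constraint is simply one for which every assignment fails this check.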

We assume *pure* input clause sets because otherwise satisfiability is undecidable for impure HBS(LA) [21]. This means the only constants of our sort LA are concrete rational numbers. Irrational numbers are not allowed by the standard definition of the theory. Fractions are not allowed if LA = LIA. Satisfiability of pure HBS(LA) clause sets is semi-decidable, e.g., using *hierarchic superposition* [3] or *SCL(T)* [10]. Note that pure HBS(LA) clauses correspond to *constrained Horn clauses (CHCs)* with LA as background theory.

All arithmetic predicates and functions are interpreted in the usual way, denoted by the interpretation A_LA. An interpretation of HBS(LA) coincides with A_LA on arithmetic predicates and functions, and freely interprets non-arithmetic predicates. For pure clause sets this is well-defined [3]. Logical satisfaction and entailment are defined as usual, using similar notation as for HBS.

*Example 1.* The clause y ≥ 5, x′ ≈ x + 1 ∥ S0(x, y) → S1(x′, 0) is part of a timed automaton with two clocks x and y modeled in HBS(LA). It represents a transition from state S0 to state S1 that can be traversed only if clock y is at least 5, and that resets y to 0 and increases x by 1.

#### **2.2 Ordering Literals and Clauses**

In order to define redundancy for constrained clauses, we need an *order* ≺: Let ≺Π be a total, well-founded, strict ordering on predicate symbols and let ≺U be a total, well-founded, strict ordering on the universe U. (Note that ≺U cannot be the standard ordering < because < is not well-founded on Z, Q, or R. In the case of R, the existence of such an order even depends on whether we assume the axiom of choice [18].) We extend these orders step by step. First, to atoms, i.e., P(ā) ≺ Q(b̄) if P ≺Π Q, or P = Q and ā ≺lex b̄, where ā, b̄ ∈ U^|ā| and ≺lex is the lexicographic extension of ≺U. Next, we extend the order to literals with a strict precedence on the predicate and the polarity, i.e.,

$$P(\vec{t}) \prec \neg P(\vec{s}) \prec Q(\vec{u}) \qquad \text{if } P \prec Q$$

independent of the arguments of the literals. Then, take the multiset extension to order clauses. To handle constrained clauses, extend the relation such that constraint literals (in our case arithmetic literals) are always smaller than first-order literals. We conflate the notation of all extensions into the symbol ≺ and define ⪯ as the reflexive closure of ≺. Note that ≺ is only total on ground atoms/literals/clauses, which is sufficient for a hierarchic superposition order [6].
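As a sketch (our own encoding, with an assumed precedence P ≺Π Q), the literal comparison above can be expressed by lexicographic keys: predicate precedence first, then polarity, with arguments ignored.

```python
PREC = {"P": 0, "Q": 1}  # assumed predicate precedence: P ≺ Q

def lit_key(literal):
    """literal = (negative?, predicate, args). Arguments are ignored,
    matching the rule  P(t) ≺ ¬P(s) ≺ Q(u)  whenever P ≺ Q."""
    negative, pred, _args = literal
    return (PREC[pred], 1 if negative else 0)

def lit_less(l1, l2):
    return lit_key(l1) < lit_key(l2)

def max_literal(clause):
    """A ≺-maximal literal of a clause (given as a list of literals)."""
    return max(clause, key=lit_key)

clause = [(False, "P", ("x",)), (True, "P", ("y",)), (False, "Q", ("x", "y"))]
print(max_literal(clause))  # (False, 'Q', ('x', 'y')): Q-literals dominate
```

The full order on ground literals would break ties on the arguments via the lexicographic extension of ≺U; the key-based skeleton stays the same.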

**Definition 2 (**≺**-maximal Literal).** *A literal* L *is called* ≺-maximal *in a clause* C *if there exists a grounding substitution* σ *for* C*, such that there is no different* L′ ∈ C *for which* Lσ ≺ L′σ*. The literal* L *is called* strictly ≺-maximal *if there is no different* L′ ∈ C *for which* Lσ ⪯ L′σ*.*

**Proposition 3.** *If* ≺ *is a predicate-based ordering,* C *is a Horn clause,* C *has a positive literal* L*, and* L *is* ≺*-maximal in* C*, then* L *is strictly* ≺*-maximal in* C*.*

**Definition 4 (**≺**-maximal Predicate in Clause).** *A predicate symbol* P *is called* (strictly) ≺-maximal *in a clause* C *if there is a literal* [¬]P(∗) ∈ C *that is (strictly)* ≺*-maximal in* C*.*

**Definition 5.** *Let* N *be a set of clauses,* ≺ *a clause ordering,* C *a clause, and* P *a predicate symbol. Then* N^{≺C} := {C′ ∈ N | C′ ≺ C} *and* N^{⪯P} := {C′ ∈ N | Q *is* ≺*-maximal in* C′ *and* Q ⪯ P}*.*

#### **2.3 Hierarchic Superposition, Redundancy and Saturation**

For pure HBS(LA) most rules of the (hierarchic) superposition calculus become obsolete or can be simplified. In fact, in the HBS(LA) case (hierarchic) superposition boils down to (hierarchic) ordered resolution. For a full definition of (hierarchic) superposition calculus in the context of linear arithmetic, consider SUP(LA) [1]. Here, we will only define its simplified version in the form of the hierarchic resolution rule.

**Definition 6 (Hierarchic** ≺**-Resolution).** *Let* ≺ *be an order on literals and* Λ1 ∥ L1 ∨ C1*,* Λ2 ∥ L2 ∨ C2 *be constrained clauses. The inference rule of hierarchic* ≺*-resolution is:*

$$\frac{\Lambda_1 \parallel L_1 \lor C_1 \qquad \Lambda_2 \parallel L_2 \lor C_2 \qquad \sigma = \mathrm{mgu}(L_1, \mathrm{comp}(L_2))}{(\Lambda_1, \Lambda_2 \parallel C_1 \lor C_2)\sigma}$$

*where* L1 *is* ≺*-maximal in* C1 *and* L2 *is* ≺*-maximal in* C2*.*

Note that in the resolution rule we do not enforce explicitly that the positive literal is strictly maximal. This is possible because in the Horn case any positive literal is strictly maximal if it is maximal in the clause.
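A minimal executable sketch of the rule, in our own flat-term encoding (variables prefixed `?`; constraint atoms as (lhs-terms, op, bound); the ≺-maximality side conditions are omitted for brevity):

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def mgu(args1, args2):
    """Unify two flat argument tuples (BS terms: variables or constants)."""
    sigma = {}
    def walk(t):
        while is_var(t) and t in sigma:
            t = sigma[t]
        return t
    for s, t in zip(args1, args2):
        s, t = walk(s), walk(t)
        if s == t:
            continue
        if is_var(s):
            sigma[s] = t
        elif is_var(t):
            sigma[t] = s
        else:
            return None  # clash of two distinct constants
    return {v: walk(v) for v in sigma}

def apply(sigma, clause):
    constraint, lits = clause
    sub = lambda a: sigma.get(a, a) if is_var(a) else a
    return ([(tuple(map(sub, lhs)), op, b) for lhs, op, b in constraint],
            [(neg, p, tuple(map(sub, args))) for neg, p, args in lits])

def resolve(c1, i, c2, j):
    """Resolve literal i of c1 against literal j of c2: the constraints are
    merged, the complementary literals are removed, and the unifier is
    applied to the result. Clauses should be renamed apart beforehand."""
    (l1, p1, a1), (l2, p2, a2) = c1[1][i], c2[1][j]
    if p1 != p2 or l1 == l2:
        return None  # need the same predicate with opposite polarity
    sigma = mgu(a1, a2)
    if sigma is None:
        return None
    rest = c1[1][:i] + c1[1][i + 1:] + c2[1][:j] + c2[1][j + 1:]
    return apply(sigma, (c1[0] + c2[0], rest))

# C3 = y < 1 || Q(y) and C4 = y <= 0 || ¬Q(y) ∨ P(y); sharing "?y" is
# harmless here because the two occurrences unify trivially.
c3 = ([(("?y",), "<", 1)], [(False, "Q", ("?y",))])
c4 = ([(("?y",), "<=", 0)], [(True, "Q", ("?y",)), (False, "P", ("?y",))])
print(resolve(c3, 0, c4, 0))
```

On this input the resolvent is y < 1, y ≤ 0 ∥ P(y), mirroring the inference discussed for Example 21.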

For saturation, we need a termination condition that defines when the calculus under consideration cannot make any further progress. In the case of superposition, this notion is that any new inferences are *redundant*.

**Definition 7 (Clause Redundancy).** *A ground clause* Λ ∥ C ∈ N *is* redundant *with respect to a set* N *of ground clauses and order* ≺ *if* N^{≺ Λ∥C} ⊨ Λ ∥ C*. A potentially non-ground clause* Λ ∥ C ∈ N *is* redundant *with respect to a potentially non-ground clause set* N *and order* ≺ *if for all* Λ′ ∥ C′ ∈ gnd(Λ ∥ C) *the clause* Λ′ ∥ C′ *is redundant with respect to* gnd(N)*.*

If a clause Λ ∥ C ∈ N is redundant with respect to the clause set N, then it can be removed from N without changing its semantics. If Λ ∥ C is newly inferred, then we also call it redundant if it is already part of N. The same cannot be said for clauses in N, or all clauses in N would be redundant. Determining clause redundancy is an undecidable problem [10,40]. However, there are special cases of redundant clauses that can be easily checked, e.g., tautologies and subsumed clauses. Redundancy also means that I ⊨ N^{≺ Λ∥C} implies I ⊨ Λ ∥ C if Λ ∥ C is redundant w.r.t. N. We will exploit this fact in the model construction.

**Definition 8 (Saturation).** *A set of clauses* N *is* saturated up to redundancy *with respect to some set of inference rules, if application of any rules to clauses in* N *yields a clause that is redundant with respect to* N *or is contained in* N*.*

#### **2.4 Interpretations**

In our context, models are interpretations that satisfy (sets of) clauses. The standard notion of an interpretation is fairly opaque and interprets a predicate P as the potentially infinite set of ground arguments that satisfy P.

**Definition 9 (Interpretation).** *Let* P *be a predicate symbol with* arity(P) = n*. Then,* P^I *denotes the subset of* U^n *for which the* interpretation I *maps the predicate symbol* P *to* true*.*

Since our model construction approach manipulates interpretations directly, we need a notion of interpretations that always has a finite representation and for which it is possible to decide (in finite time) whether a clause is satisfied by the interpretation. Therefore, we rely on the notion of symbolic interpretations:

**Definition 10 (Symbolic Interpretation).** *Let* x1, x2, ... *be an infinite sequence of distinct variables, i.e.* xi ≠ xj *for all* 1 ≤ i < j*. (We assume the same sequence for all symbolic interpretations in order to prevent conflicts when we later combine multiple symbolic interpretations into one.) A* symbolic interpretation S *is a function that maps every predicate symbol* P *with* arity(P) = n *to a formula, denoted* P^S(x⃗)*, of finite size, constructed using the usual boolean connectives over* LA *atoms, where the only free variables appear in* x⃗ = (x1, ..., xn)*. The interpretation* I_S *corresponding to* S *is defined by* P^{I_S} = {(x⃗)β | β ⊨ P^S(x⃗)} *and maps the predicate symbol* P *to* true *for the subset of* U^n *which corresponds to the solutions of* P^S(x⃗)*.*

*Example 11.* Let N be a clause set consisting of the clauses 0 ≤ x ≤ 2, 0 ≤ y ≤ 2 ∥ P(x, y) and xQ ≥ xP + 1, yQ ≥ yP + 1 ∥ ¬P(xP, yP) ∨ Q(xQ, yQ). An example of a symbolic interpretation S that satisfies N would be the function that maps P to P^S(x1, x2) = 0 ≤ x1 ≤ 2 ∧ 0 ≤ x2 ≤ 2 and Q to Q^S(x1, x2) = 1 ≤ x1 ∧ 1 ≤ x2. It corresponds to the interpretation I_S where P^{I_S} = {(a1, a2) ∈ U^2 | 0 ≤ a1 ≤ 2 ∧ 0 ≤ a2 ≤ 2} and Q^{I_S} = {(a1, a2) ∈ U^2 | 1 ≤ a1 ∧ 1 ≤ a2}.
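The claim that S satisfies N can be sanity-checked by sampling. This is only a finite-grid check in Python, not a decision procedure; the actual evaluation in the paper goes through the clause evaluation function of Definition 13.

```python
# Symbolic interpretation from Example 11, rendered as Python predicates.
P = lambda x1, x2: 0 <= x1 <= 2 and 0 <= x2 <= 2
Q = lambda x1, x2: x1 >= 1 and x2 >= 1

# Clause 1:  0 <= x <= 2, 0 <= y <= 2  ||  P(x, y)
def clause1(x, y):
    return not (0 <= x <= 2 and 0 <= y <= 2) or P(x, y)

# Clause 2:  xQ >= xP + 1, yQ >= yP + 1  ||  ¬P(xP, yP) ∨ Q(xQ, yQ)
def clause2(xp, yp, xq, yq):
    return not (xq >= xp + 1 and yq >= yp + 1) or not P(xp, yp) or Q(xq, yq)

grid = [i / 2 for i in range(-2, 9)]  # sample points from -1.0 to 4.0
ok = all(clause1(x, y) for x in grid for y in grid) and \
     all(clause2(a, b, c, d)
         for a in grid for b in grid for c in grid for d in grid)
print(ok)  # True
```

The check passes because whenever P(xP, yP) holds we have xP, yP ≥ 0, so the constraint of the second clause forces xQ, yQ ≥ 1, which is exactly Q^S.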

The notion of symbolic interpretations is closely related to *A-definable models* [7, Definition 7] and *constrained atomic representations* [13, Definition 5.1, pp. 236–237]. Each symbolic interpretation S is equivalent to a constrained atomic representation that consists of one constrained atom [[P(x⃗) : P^S(x⃗)]] (written in the notation from [13]) for every predicate P. Note that in this context the constraint is not just a quantifier-free conjunction of linear arithmetic atoms, but a linear arithmetic formula potentially containing quantifiers (although those can be eliminated with quantifier elimination techniques).

Since each symbolic interpretation consists of a finite set of formulas of finite size, symbolic interpretations can be considered finite representations. In contrast, the standard representation of an interpretation as a potentially infinite set of ground atoms is not a finite representation. However, this also means that there are some interpretations for which no corresponding symbolic interpretation exists; for instance, the interpretation assigning to P the set of prime numbers satisfies y ≈ 2 ∥ P(y), but is not expressible as a symbolic interpretation (in LA). As we will see later, any saturated set of HBS(LA) clauses either is unsatisfiable or has a symbolic interpretation that satisfies it (Theorem 29).

The *top interpretation*, denoted I_⊤, is defined as P^{I_⊤} := U^n for all predicate symbols P with arity(P) = n and corresponds to the *top symbolic interpretation*, denoted S_⊤, defined as P^{S_⊤} := ⊤ for all predicate symbols P. The *bottom interpretation* (or *empty interpretation*), denoted I_⊥, and the *bottom symbolic interpretation* (or *empty symbolic interpretation*), denoted S_⊥, are defined analogously. The interpretation of P under I ∪ J is defined as P^{I∪J} := P^I ∪ P^J for every predicate P. In the symbolic case, S ∪ R is defined as P^{S∪R}(x⃗) := P^S(x⃗) ∨ P^R(x⃗) for every predicate P. We write I ⊆ J, or I is *included in* J (resp. I ⊂ J, or I is *strictly included in* J), if P^I ⊆ P^J (resp. P^I ⊂ P^J) for all predicate symbols P.

**Definition 12 (Entailment of Literal).** *Let* I *be an interpretation. Given a ground literal* P(a1, ..., an)*, where* ai ∈ U*, we write* I ⊨ P(a1, ..., an) *if* (a1, ..., an) ∈ P^I*. Conversely, we write* I ⊭ P(a1, ..., an) *if* (a1, ..., an) ∉ P^I*. For a non-ground literal* L*, we write* I ⊨ L *if for all grounding substitutions* σ *for* L*, we have* I ⊨ Lσ*. Conversely, we write* I ⊭ L *if there exists a grounding substitution* σ *for* L *such that* I ⊭ Lσ*.*

We overload ⊨ for symbolic interpretations, i.e. we write S ⊨ L and mean I_S ⊨ L. The following function encodes a clause as an LA formula for evaluation under a given symbolic interpretation.

**Definition 13 (Clause Evaluation Function).** *Let* Λ ∥ C *be a constrained clause where* C = L1 ∨ ··· ∨ Lm*,* Li = [¬]Pi(yi,1, ..., yi,ni)*, and let* S *be a symbolic interpretation. Then the clause evaluation function* (Λ ∥ C)^S *is defined as follows, based on the definitions for* σi *and* φi *(for* 1 ≤ i ≤ m*):*

$$\sigma_i := \{x_j \mapsto y_{i,j} \mid 1 \le j \le n_i\} \qquad \phi_i := \begin{cases} P_i^{\mathcal{S}} & L_i \text{ is positive} \\ \neg P_i^{\mathcal{S}} & L_i \text{ is negative (otherwise)} \end{cases}$$

$$(\Lambda \parallel C)^{\mathcal{S}} := \Bigl(\bigwedge_{\lambda \in \Lambda} \lambda\Bigr) \to \Bigl(\bigvee_{i=1}^{m} \phi_i \sigma_i\Bigr).$$

Note that the free variables of (Λ ∥ C)^S are exactly the free variables of Λ ∥ C. Moreover, the substitutions σi are necessary in the above definition in order to map the variables of the symbolic interpretations P_i^S to the variables that appear as arguments in the literals Pi(yi,1, ..., yi,ni).

**Proposition 14.** *Given a constrained clause* Λ ∥ C *with grounding* β*, we have*

$$\models (\Lambda \parallel C)^{\mathcal{S}} \beta \qquad \text{if and only if} \qquad \mathcal{S} \models (\Lambda \parallel C)\beta$$

As a corollary of the previous proposition, the entailment S ⊨ Λ ∥ C holds if and only if the universal closure of the formula (Λ ∥ C)^S is valid. This means that for a symbolic interpretation S it is always computable whether a clause is entailed by S, because there are decision procedures for quantified LRA, LQA, and LIA formulas of finite size.

We require two functions that manipulate LA formulas directly to express our model construction (cf. Definition 17), i.e. to map the solutions for a clause, defined by a formula φ, to one atom inside the clause. This requires us to project away all variables in φ that appear in the clause but not in the atom.

**Definition 15 (Projection).** *Let* V *be a set of variables and* φ *an* LA*-formula. The projection function* π *is defined as follows:*

π(V,φ) := ∃x<sup>1</sup> ... ∃xn. φ *where* {x1,...,xn} = vars(φ) \ V

π(V, φ) is a standard projection function that binds all variables of the formula φ outside the set V with existential quantifiers. Note that we also know that π(V, φ) is equivalent to a quantifier-free LA formula just over the variables in V because there exist quantifier elimination algorithms for LRA, LQA, and LIA [14,32].
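For LRA/LQA and conjunctions of non-strict inequalities, one quantifier-elimination step can be sketched with Fourier–Motzkin elimination. The representation is ours: an atom `(coeffs, b)` encodes Σ coeffs[v]·v ≤ b; strict inequalities and the divisibility constraints needed for LIA are omitted.

```python
from fractions import Fraction

def eliminate(var, atoms):
    """Fourier-Motzkin: eliminate `var` from a conjunction of non-strict
    linear inequalities. Each atom (coeffs, b) encodes sum(coeffs[v]*v) <= b.
    Returns an equivalent conjunction over the remaining variables."""
    lower, upper, rest = [], [], []
    for coeffs, b in atoms:
        c = Fraction(coeffs.get(var, 0))
        if c == 0:
            rest.append((dict(coeffs), Fraction(b)))
            continue
        # normalize to: var <= u + t(x) (if c > 0) or var >= u + t(x) (if c < 0)
        t = {v: Fraction(-a) / c for v, a in coeffs.items() if v != var}
        u = Fraction(b) / c
        (upper if c > 0 else lower).append((t, u))
    for lt, lb in lower:
        for ut, ub in upper:
            # lb + lt(x) <= var <= ub + ut(x)  implies  (lt - ut)(x) <= ub - lb
            coeffs = dict(lt)
            for v, a in ut.items():
                coeffs[v] = coeffs.get(v, Fraction(0)) - a
            rest.append(({v: a for v, a in coeffs.items() if a != 0}, ub - lb))
    return rest

# ∃z. x1 >= z + 1 ∧ 0 <= z ∧ z <= 2  (cf. the projection in Example 20)
atoms = [({"z": 1, "x1": -1}, -1),   # z - x1 <= -1, i.e. x1 >= z + 1
         ({"z": -1}, 0),             # -z <= 0,      i.e. z >= 0
         ({"z": 1}, 2)]              # z <= 2
result = eliminate("z", atoms)
print(result)  # -x1 <= -1 (i.e. x1 >= 1) plus the trivial 0 <= 2
```

Pairing every lower bound with every upper bound is what makes the quantifier-free result potentially quadratically larger per eliminated variable, which foreshadows the size bounds of Proposition 22.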

A further function is needed when we encounter literals of the form P(x, x, ...), i.e., where one variable is shared among two or more argument positions. In this case, we use the sharing function of Definition 16 to express in our symbolic interpretation that the corresponding argument positions must also be equal in our interpretation.

**Definition 16 (Sharing).** *Let* (y1, ..., yn) *and* (x1, ..., xn) *be tuples of variables of the same length. The sharing function, which encodes variable sharing across different argument positions, is defined as follows:*

$$\vee\bigl((y_1,\ldots,y_n),(x_1,\ldots,x_n)\bigr) := \bigwedge_{\substack{1 \le i < j \le n, \\ y_i = y_j}} x_i \approx x_j$$
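In code, the sharing function is a one-liner over index pairs (a sketch with our own tuple representation; each returned pair stands for one equality xi ≈ xj):

```python
def sharing(ys, xs):
    """Sharing function of Definition 16: one equality (x_i, x_j) for every
    pair of positions i < j at which the same variable occurs in ys."""
    n = len(ys)
    return [(xs[i], xs[j]) for i in range(n) for j in range(i + 1, n)
            if ys[i] == ys[j]]

# P(x, x, z) shares its first two argument positions:
print(sharing(("x", "x", "z"), ("x1", "x2", "x3")))  # [('x1', 'x2')]
```

If no variable repeats, the conjunction is empty and hence equivalent to true.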

#### **2.5 Consequence and Least Model**

The notion of a *least model* is common in logic programming. Horn logic programs admit a least model, which is the intersection of all models of the program (see [31, § 6, p. 36]). In our context, the least model of a set of clauses N is the intersection of all models of N. An alternative characterization of the least model of N is through the least fixed point of the one-step consequence operator, which we define as T_N for the context of LA constraints, analogously to [27, Section 4]. The one-step consequence operator T_N takes a set of clauses N and an interpretation I as input and returns an interpretation:

$$P^{T_N(\mathcal{I})} := \left\{ (\vec{y})\beta \;\middle|\; \begin{array}{l} \Lambda \parallel \neg P_1(\vec{y}_1) \lor \dots \lor \neg P_n(\vec{y}_n) \lor P(\vec{y}) \in N, \\ \models \Lambda\beta, \text{ and } \mathcal{I} \models P_i(\vec{y}_i)\beta \text{ for } 1 \le i \le n \end{array} \right\}$$

The least fixed point of this operator exists by Tarski's Fixed Point Theorem [39]: Interpretations form a complete lattice under inclusion (supremum given by union, infimum given by intersection), and T<sup>N</sup> is monotone.
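For a finite set of ground Horn clauses, the least fixed point of T_N can be computed by naive iteration from the bottom interpretation (a sketch; the representation of ground atoms as tuples is ours):

```python
def t_n(clauses, interp):
    """One-step consequence operator T_N for ground Horn clauses.
    A clause is (body, head): body a tuple of ground atoms, head a ground
    atom; interp is a set of ground atoms (the current interpretation)."""
    return {head for body, head in clauses
            if all(atom in interp for atom in body)}

def least_model(clauses):
    """Iterate T_N from the empty interpretation until a fixed point.
    Monotonicity makes the chain  ∅ ⊆ T_N(∅) ⊆ T_N(T_N(∅)) ⊆ ...  increase,
    so on a finite clause set this terminates in the least fixed point."""
    interp = set()
    while True:
        nxt = t_n(clauses, interp)
        if nxt == interp:
            return interp
        interp = nxt

# Ground instances of:  P(0);  P(0) -> Q(0);  R(1) -> Q(1)
N = [((), ("P", 0)), ((("P", 0),), ("Q", 0)), ((("R", 1),), ("Q", 1))]
print(sorted(least_model(N)))  # [('P', 0), ('Q', 0)] -- Q(1) is not derivable
```

With LA constraints the same iteration runs over the (generally infinite) ground instances, which is exactly why the paper's symbolic construction works on non-ground clauses instead.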

#### **3 Model Construction**

In this section we address the construction of models for HBS(LA). Throughout this section, we consider a set of constrained Horn clauses N and an order ≺ to be given. Our aim is to define an interpretation I_N such that

$$
\mathcal{I}_N \models N \qquad \text{if } N \text{ is saturated and } \Box \notin N
$$

Towards that goal, we define the operator δ(S, Λ ∥ C ∨ P(y⃗)). It takes a symbolic interpretation S and a Horn clause with maximal literal P(y⃗). It results in a symbolic interpretation that accounts for Λ ∥ C ∨ P(y⃗).

**Definition 17 (Production Operator).** *Let* Λ ∥ C *be a constrained Horn clause, where* C = C′ ∨ P(y⃗)*,* C′ ≺ P(y⃗)*, and* C′ = ¬P1(y1,1, ..., y1,n1) ∨ ··· ∨ ¬Pm(ym,1, ..., ym,nm)*. Let* S *be a symbolic interpretation, where the free variables of* P^S *are* x⃗ *and the free variables of* P_i^S *are* x⃗i *(for* 1 ≤ i ≤ m*). Note that* n = |y⃗| = |x⃗| = arity(P)*.*

*The* production operator δ(S, Λ ∥ C) *results in a new symbolic interpretation:*

$$\begin{aligned} P^{\delta(\mathcal{S},\Lambda \parallel C)}(\vec{x}) &:= \pi\Bigl(\{y_1,\ldots,y_n\},\ \bigwedge_{\lambda \in \Lambda} \lambda \wedge \bigwedge_{i=1}^m (P_i^{\mathcal{S}})\sigma_i\Bigr)\sigma \wedge \vee(\vec{y},\vec{x}) \\ Q^{\delta(\mathcal{S},\Lambda \parallel C)}(\vec{z}) &:= \bot \qquad \text{for all } Q \neq P \text{ where } |\vec{z}| = \operatorname{arity}(Q) \end{aligned}$$

*where, to map variables from literal arguments to the variables appearing in the symbolic interpretation* S *and back, we have the substitutions*

$$\begin{aligned} \sigma &:= \{ y' \quad \mapsto x\_j \mid y' \in \{ y\_1, \dots, y\_n \} \text{ and } j \text{ is the smallest index } s.t. \ y\_j = y' \} \\ \sigma\_i &:= \{ x\_{i,j} \mapsto y\_{i,j} \mid 1 \le j \le n\_i \} \qquad \text{for } 1 \le i \le m \end{aligned}$$

The goal of the operator δ(S, Λ ∥ C) is to define an extension of the symbolic interpretation S such that S ∪ δ(S, Λ ∥ C) satisfies Λ ∥ C. Note that δ only extends the interpretation of the strictly maximal predicate P. Moreover, due to our predicate order, it only needs to consider the interpretation S for predicates Q with Q ≺ P. δ also satisfies the following two symmetrical properties: On the one hand, every grounding τ of Λ ∥ C′ ∨ P(y⃗) that is not yet satisfied by S must correspond to a solution β of P^{δ(S, Λ∥C′∨P(y⃗))} that satisfies P(y⃗)τ. On the other hand, every solution β of P^{δ(S, Λ∥C′∨P(y⃗))} must correspond to a grounding of Λ ∥ C′ ∨ P(y⃗) that is not yet satisfied by S. The first property is needed so that S ∪ δ(S, Λ ∥ C′ ∨ P(y⃗)) satisfies Λ ∥ C′ ∨ P(y⃗). The second property is needed so that we do not accidentally extend our interpretation by any solutions not needed to satisfy Λ ∥ C′ ∨ P(y⃗).

Note that in the above statements β and τ are generally not the same because the variables x⃗ used to define P^S are not necessarily the same as the variables appearing in the clause Λ ∥ C and the literal P(y⃗). There are three reasons for this, which are handled by three different methods in our model construction:


The parts of P^{δ(S, Λ∥C)} that we have not yet discussed are based on the fact that any constrained Horn clause Λ ∥ C′ ∨ P(y⃗) can also be written as an implication of the form φ → P(y⃗), where φ := Λ ∧ P1(y1,1, ..., y1,n1) ∧ ··· ∧ Pm(ym,1, ..., ym,nm), and S ⊭ (Λ ∥ C)τ if and only if S ⊨ φτ. This means the groundings τ of Λ ∥ C not satisfied by S are also the groundings of φ satisfied by S. It is straightforward to express these groundings with a conjunctive formula based on Λ and the P_i^S. The only challenge is the reverse of the problem from before, i.e. mapping the variables of P_i^S to the variables in the literals Pi(yi,1, ..., yi,ni). This mapping is done in δ by the substitutions σi.

Now, based on the production operator δ for one clause, we can use an inductive definition over the order ≺ to define an interpretation S_N for all clauses in N. We distinguish the following auxiliary symbolic interpretations: S≺P, which captures progress up to but excluding the predicate P; ΔP, which captures how P should be interpreted considering S≺P; and S⪯P, which captures progress up to and including the predicate P. The symbolic interpretation ΔP^{Λ∥C} is the extension of S≺P w.r.t. the single clause Λ ∥ C.

**Definition 18 (Model Construction).** *Let* N *be a finite set of constrained Horn clauses. We define symbolic interpretations* S≺P*,* S⪯P*, and* ΔP *for all predicates* P ∈ Π(N) *by mutual induction over* ≺*:*

$$\mathcal{S}_{\preceq P} := \mathcal{S}_{\prec P} \cup \Delta_P \qquad \mathcal{S}_{\prec P} := \bigcup_{Q \prec P} \Delta_Q \qquad \Delta_P := \bigcup_{\Lambda \parallel C' \vee P(*) \in N} \Delta_P^{\Lambda \parallel C' \vee P(*)}$$

$$\Delta_P^{\Lambda \parallel C} := \begin{cases} \delta(\mathcal{S}_{\prec P}, \Lambda \parallel C) & \text{if } P(\vec{y}) \text{ is maximal in } C \text{ and } \mathcal{S}_{\prec P} \not\models \Lambda \parallel C \\ \mathcal{S}_{\perp} & \text{otherwise} \end{cases}$$

Finally, based on the above inductive definition of S⪯P for every predicate symbol P ∈ Π(N), we arrive at an overall interpretation for N.

**Definition 19 (Candidate Interpretation).** *The* candidate interpretation *for* N *(w.r.t.* ≺*), denoted* I_N*, is the interpretation associated with the symbolic interpretation* S_N = ⋃_{P ∈ Π(N)} ΔP*, where* P *ranges over all predicate symbols occurring in* N*.*

Note that S_N = S⪯P where P is ≺-maximal in Π(N). Obviously, we intend that S_N ⊨ N if N is saturated (Theorem 29). Otherwise, i.e. if S_N ⊭ N, we can use our construction to find a non-redundant inference (Corollary 30). Consider the following two examples, demonstrating how δ sits at the core of the aforementioned inductive definitions of symbolic interpretations.

*Example 20 (Dependent Interpretation).* Assume P ≺ Q and consider the following set of clauses:

$$N := \left\{ \begin{array}{llll} 0 \le y_1 \le 2,\ 0 \le y_2 \le 2 & \parallel & \underline{P(y_1, y_2)} & (C_1) \\ y_3 \ge y_1 + 1,\ y_4 \ge y_2 + 1 & \parallel & P(y_1, y_2) \to \underline{Q(y_3, y_4)} & (C_2) \end{array} \right\}$$

Maximal literals are underlined. Since the maximal literals of C1 and C2 are both positive, ordered resolution cannot be applied; the set is saturated. Since P is the ≺-smallest predicate, we have S≺P = S⊥. Applying the δ operator yields the following interpretation for P:

$$P^{\mathcal{S}_{\preceq P}}(x_1, x_2) = P^{\delta(\mathcal{S}_{\prec P}, C_1)}(x_1, x_2) = 0 \le x_1 \le 2 \land 0 \le x_2 \le 2$$

Then, Q is interpreted relative to P. Consider the clause C2: For all solutions of its constraint y3 ≥ y1 + 1, y4 ≥ y2 + 1, our model must also satisfy its logical part P(y1, y2) → Q(y3, y4). The intuition that Q depends on P arises from the implication in the logical part. Whenever the constraint of C2 and P(y1, y2) are satisfied, Q(y3, y4) must be satisfied. These are exactly the points defined through δ(S≺Q, C2), based on S≺Q = S⪯P = δ(S≺P, C1):

$$\begin{aligned} Q^{\delta(\mathcal{S}_{\prec Q}, C_2)}(x_1, x_2) &= \exists z_1, z_2.\ x_1 \ge z_1 + 1 \land x_2 \ge z_2 + 1 \land 0 \le z_1 \le 2 \land 0 \le z_2 \le 2 \\ &= x_1 \ge 1 \land x_2 \ge 1 \end{aligned}$$

Whenever the conjuncts 0 ≤ y1 ≤ 2 and 0 ≤ y2 ≤ 2 are satisfied, the premise of the implication is true; thus the interpretation of Q must contain a solution, which additionally abides by the constraint of the clause. Since Q is ≺-maximal in N, we arrive at S_N = S⪯Q = S⪯P ∪ δ(S≺Q, C2) = δ(S⊥, C1) ∪ δ(S⪯P, C2). See Fig. 1a for a visual representation of S_N.
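The quantifier-elimination step in Example 20 can be cross-checked numerically. This is only a finite-sample sanity check in Python: existential witnesses z1, z2 are drawn from a grid, which suffices here because z1 = z2 = 0 is always a valid witness when one exists.

```python
def q_projected(x1, x2):
    """∃z1, z2. x1 >= z1+1 ∧ x2 >= z2+1 ∧ 0 <= z1 <= 2 ∧ 0 <= z2 <= 2,
    with the existential witnesses sampled from a finite grid."""
    zs = [i / 4 for i in range(9)]  # 0.0, 0.25, ..., 2.0
    return any(x1 >= z1 + 1 and x2 >= z2 + 1 for z1 in zs for z2 in zs)

def q_eliminated(x1, x2):
    """The quantifier-free result computed in Example 20."""
    return x1 >= 1 and x2 >= 1

points = [i / 2 for i in range(-2, 9)]  # sample points from -1.0 to 4.0
agree = all(q_projected(x1, x2) == q_eliminated(x1, x2)
            for x1 in points for x2 in points)
print(agree)  # True
```

Both directions are easy to see: x1, x2 ≥ 1 makes z1 = z2 = 0 a witness, and conversely any witness z1, z2 ≥ 0 forces x1, x2 ≥ 1.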

*Example 21 (Unsaturated Clause Set).* Assume P ≺ Q and consider the following set of clauses:

$$N := \left\{ \begin{array}{llll} y_1 < 0 \parallel \underline{P(y_1)} & (C_1), & \quad y_1 < 1 \parallel \underline{Q(y_1)} & (C_3), \\ y_1 > 0 \parallel \underline{P(y_1)} & (C_2), & \quad y_1 \le 0 \parallel \underline{Q(y_1)} \to P(y_1) & (C_4) \end{array} \right\}$$

Maximal literals are underlined. Note that a resolution inference is possible, since the maximal literals of C3 and C4 have opposite polarity, use the same predicate symbol, and are trivially unifiable. Thus, in this example we consider the effect of applying our model construction to a clause set that is *not* saturated. Since P is ≺-minimal, we start with the following steps:

$$\begin{aligned} \mathcal{S}\_{\prec P} = \mathcal{S}\_{\perp} \qquad &P^{\delta(\mathcal{S}\_{\prec P}, C\_1)}(x\_1) = x\_1 < 0 \\ &P^{\delta(\mathcal{S}\_{\prec P}, C\_2)}(x\_1) = x\_1 > 0 \qquad &P^{\mathcal{S}\_{\preceq P}}(x\_1) = x\_1 < 0 \lor x\_1 > 0 \end{aligned}$$

Next, we obtain the following results for Q:

$$\begin{aligned} \mathcal{S}\_{\prec Q} = \mathcal{S}\_{\preceq P} \quad & Q^{\delta(\mathcal{S}\_{\prec Q}, C\_3)}(x\_1) = x\_1 < 1 \\ & Q^{\delta(\mathcal{S}\_{\prec Q}, C\_4)}(x\_1) = \bot \qquad & Q^{\mathcal{S}\_{\preceq Q}}(x\_1) = x\_1 < 1 \lor \bot = x\_1 < 1 \end{aligned}$$

See Fig. 1b for a visual representation of S<sub>N</sub> = S<sub>⪯Q</sub>. Note that S<sub>N</sub> ⊭ C4, since we have S<sub>N</sub> ⊨ Q(0) but S<sub>N</sub> ⊭ P(0). Thus, by using the constructed model, we can pinpoint clauses that contradict the assumption that N is saturated. Applying resolution to C3 and C4 leads to the clause y1 ≤ 0 ∥ P(y1), labelled C5. If we then add C5 to N, we instead get P<sup>S<sub>⪯P</sub></sup>(x1) = x1 < 0 ∨ x1 > 0 ∨ x1 ≤ 0 = ⊤.
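The failed clause instance can be replayed directly. A minimal sketch, with Python closures standing in for the symbolic interpretations of Example 21 (the representation is ours):

```python
from fractions import Fraction as F

# Interpretations read off from the construction in Example 21.
P = lambda x: x < 0 or x > 0   # P interpreted as x1 < 0 \/ x1 > 0
Q = lambda x: x < 1            # Q interpreted as x1 < 1

# Clause C4:  y1 <= 0 || Q(y1) -> P(y1).  Check the ground instance at y1 = 0.
y1 = F(0)
c4_instance = (not (y1 <= 0 and Q(y1))) or P(y1)
assert Q(y1) and not P(y1)   # the model satisfies Q(0) but not P(0)
assert not c4_instance       # hence C4 is falsified: N was not saturated
```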

In the following, we clarify some properties of the construction. We provide an upper bound on the number of LA atoms and quantifiers in the symbolic model for LRA and LQA. Although we do not state it explicitly, the estimate for LIA works in a similar way, but due to the higher complexity of LIA quantifier elimination, the size of the symbolic model grows triply exponentially [36].

**Proposition 22.** *If* N *is a finite set of* LRA*/*LQA *constrained Horn clauses, and* S′<sub>N</sub> *is the result of applying quantifier elimination to* S<sub>N</sub>*, then, for every predicate symbol* P ∈ Π(N)*, the number of* LA *atoms in* P<sup>S′<sub>N</sub></sup> *is in* O(m<sup>2·qp−1</sup> · n<sup>2·qp−1</sup> · (l + a<sup>2</sup>)<sup>qp</sup>)*, where* n *is the max. number of clauses with the same max. predicate,* m *is the max. number of non-arithmetic literals in a clause,* l *is the max. number of arithmetic literals in a clause,* a *is the max. arity of any predicate,* p = |Π(N)|*, and* q *is the max. difference of variables between any clause and its positive maximal literal.*

**Fig. 1.** Visual representation of the models resulting from Examples 20 and 21.

**Corollary 23 (Effective Construction).** *If* N *is a finite set of constrained Horn clauses then for every predicate* P ∈ Π(N)*,* P <sup>S</sup><sup>N</sup> *is a linear arithmetic formula of finite size, and can be computed in a finite number of steps.*

We show that all points in P<sup>I<sub>N</sub></sup> are necessary and justified in some sense, that I<sub>N</sub> is indeed a model of N, and that I<sub>N</sub> is also the least model of N if N is saturated. The notion of a productive clause captures whether a clause contributes something to the symbolic interpretation.

**Definition 24 (Productive Clause).** *Let* P *be a predicate symbol with* arity(P) = n*. We say that* Λ ∥ C produces P(a1,...,an) *if* (a1,...,an) ∈ P<sup>δ(S<sub>≺P</sub>, Λ ∥ C)</sup>*.*

Next, we want to formally express that every element of the resulting interpretation is justified. Firstly, we express that the operator δ produces points such that every clause is satisfied whenever necessary, i.e., whenever the maximal literal of the clause is P(∗) and the clause is not already satisfied by S<sub>≺P</sub>.

**Proposition 25.** *Let* Λ<sub>C</sub> ∥ C *where* C = C′ ∨ P(y) *and* C′ ≺ P(y)*. Let* τ *be a grounding substitution for* Λ<sub>C</sub> ∥ C*. If* S<sub>≺P</sub> ⊭ (Λ<sub>C</sub> ∥ C)τ*, then the constraint* Λ<sub>C</sub>τ *holds and* S<sub>⪯P</sub> ⊨ P(y)τ*, thus* S<sub>⪯P</sub> ⊨ (Λ<sub>C</sub> ∥ C)τ*.*

Secondly, we express that every point in P<sup>I<sub>N</sub></sup> is justified in the sense that there is a clause that produced the point, i.e., this clause would otherwise not be satisfied by the resulting interpretation.

**Proposition 26.** *If* S<sub>⪯P</sub> ⊨ P(a)*, then there exists a clause* Λ<sub>C</sub> ∥ C *where* C = C′ ∨ P(y) *and* C′ ≺ P(y)*, and there exists a grounding* τ *for* Λ<sub>C</sub> ∥ C*, such that* P(a) = P(y)τ *and* S<sub>≺P</sub> ⊭ (Λ<sub>C</sub> ∥ C)τ*.*

Also, observe that once the maximal predicate P of a given clause is interpreted by S<sub>⪯P</sub>, the interpretation of the clause does not change for S<sub>⪯Q</sub> where Q ≻ P.

**Corollary 27.** *Let* P ≺ Q ⪯ R*, and let* P *be maximal in clause* C*. If* S<sub>⪯P</sub> ⊨ Λ<sub>C</sub> ∥ C *or* S<sub>≺Q</sub> ⊨ Λ<sub>C</sub> ∥ C*, then* S<sub>≺R</sub> ⊨ Λ<sub>C</sub> ∥ C *and* S<sub>⪯R</sub> ⊨ Λ<sub>C</sub> ∥ C*.*

As a result, we know that the full model satisfies N, i.e., I<sub>N</sub> ⊨ N, if every clause is satisfied at the point of the construction where the interpretation of its maximal predicate P becomes fixed.

**Proposition 28.** *If, for every clause* Λ<sub>C</sub> ∥ C ∈ N *with maximal predicate* P*, we have* S<sub>⪯P</sub> ⊨ Λ<sub>C</sub> ∥ C*, then* I<sub>N</sub> ⊨ N*.*

With the above propositions (and some auxiliary properties that can be found in [12]) we show that indeed I<sub>N</sub> ⊨ N if N is saturated and does not contain the empty clause.

**Theorem 29.** *Let* ≺ *be a clause ordering and* N *a set of constrained Horn clauses. If (1.)* N *is saturated w.r.t.* ≺*-resolution, and (2.)* ⊥ ∉ N*, then* I<sub>N</sub> ⊨ N*.*

For clauses with a positive maximal literal, the fact that they are satisfied by I<sub>N</sub> follows from Proposition 25. For clauses with maximal literal ¬P(∗), we prove this theorem by contradiction: we assume there is a minimal clause Λ<sub>C</sub> ∥ C such that S<sub>N</sub> ⊭ Λ<sub>C</sub> ∥ C. We can then exploit Proposition 26 to find the smallest clause Λ<sub>D</sub> ∥ D that produced the respective instance P(a). Applying hierarchic ≺-resolution to Λ<sub>C</sub> ∥ C and Λ<sub>D</sub> ∥ D then yields a non-redundant clause. This idea leads to the following corollary.

**Corollary 30.** *Let* ≺ *be a clause ordering and* N *a set of constrained Horn clauses. If (1.)* I<sub>N</sub> ⊭ N *and (2.)* ⊥ ∉ N*, then there exist two clauses* Λ<sub>C</sub> ∥ C, Λ<sub>D</sub> ∥ D ∈ N *such that: (1.)* Λ<sub>C</sub> ∥ C *is the smallest clause not satisfied by* I<sub>N</sub>*, i.e., there exists a grounding* τ *such that* I<sub>N</sub> ⊭ (Λ<sub>C</sub> ∥ C)τ*, but there does not exist a clause* Λ<sub>C′</sub> ∥ C′ ∈ N *with grounding* τ′ *such that* I<sub>N</sub> ⊭ (Λ<sub>C′</sub> ∥ C′)τ′ *and* (Λ<sub>C′</sub> ∥ C′)τ′ ≺ (Λ<sub>C</sub> ∥ C)τ*; (2.)* ¬P(a) *is the maximal literal of* (Λ<sub>C</sub> ∥ C)τ*; (3.)* Λ<sub>D</sub> ∥ D *is the minimal clause that produces* P(a)*; (4.)* ≺*-resolution is applicable to* Λ<sub>C</sub> ∥ C *and* Λ<sub>D</sub> ∥ D*; and (5.) the resolvent of* Λ<sub>C</sub> ∥ C *and* Λ<sub>D</sub> ∥ D *is not redundant w.r.t.* N*.*

Additionally, we show that I<sub>N</sub> is the least model of N, establishing a connection between our approach and the literature on constrained Horn clauses (see [27, Section 4] and [15, Section 2.4.1]) and logic programming (see [31, § 6, p. 37]).

**Theorem 31.** I<sub>N</sub> *is the least model of* N*.*

Fermüller and Leitsch define four postulates (see [19] as cited in [13, Section 5.1, p. 234]) regarding *automated model building*. In the following, we instantiate the postulates for our setting. By S(N) we denote the set of all symbolic interpretations of the set of constrained Horn clauses N. We argue that our approach satisfies the postulates, one by one:

**Uniqueness.** *Each element of* S(N) *specifies a single interpretation of* N. We have shown (cf. Theorem 31) that I<sub>N</sub>, the model represented by S<sub>N</sub>, is the least model of N, which is unique.

**Atom Test.** *There exists a fast procedure to evaluate arbitrary ground atoms over* Π(N) *in the interpretation defined by an* S *in* S(N).

This is a special case of clause evaluation (cf. Proposition 14): a ground atom P(t) is true in S if and only if ⊨ P<sup>S</sup>(x){x<sub>i</sub> → t<sub>i</sub> | 1 ≤ i ≤ |x| = |t|}. Fulfillment of this property thus hinges on the meaning of "fast". We consider methods for evaluating formulas of LA against points to be fast.
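The atom test amounts to substituting the ground arguments into the quantifier-free formula interpreting the predicate and evaluating it over the arithmetic domain. A minimal sketch, with the interpretations of Example 21 as stand-ins (the dictionary-based representation and helper name are ours):

```python
# Symbolic model: each predicate maps to a quantifier-free LA formula,
# here encoded as a Python closure over the predicate's arguments.
symbolic_model = {
    "P": lambda x1: x1 < 0 or x1 > 0,   # P interpreted as x1 < 0 \/ x1 > 0
    "Q": lambda x1: x1 < 1,             # Q interpreted as x1 < 1
}

def eval_atom(model, pred, args):
    """Atom test: P(t) is true in S iff the formula for P, with the
    variables replaced by t, evaluates to true."""
    return model[pred](*args)

assert eval_atom(symbolic_model, "Q", (0,)) is True
assert eval_atom(symbolic_model, "P", (0,)) is False
```

Evaluating a fixed linear arithmetic formula at a point takes time linear in the formula's size, which is what makes the test "fast".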


## **4 Conclusion**

We have presented the first model construction approach for Horn clauses with linear arithmetic constraints based on hierarchic ordered resolution (cf. Definition 19). The linear arithmetic constraints may range over the reals, rationals, or integers. The computed model is the canonical least model of the saturated Horn clause set (cf. Theorem 31). Clauses can be effectively evaluated with respect to the model (cf. Proposition 14). This offers a way to explore the properties of a saturated clause set, e.g., when the set represents a failed refutation attempt.

*Future Work.* It is straightforward to see that any symbolic LQA model is also a symbolic LRA model. (This holds due to convexity of conjunctions of ground LQA atoms.) So even if the axiom of choice is not assumed, there is an alternative way to obtain a model for a HBS(LRA) clause set: Simply treat it as an HBS(LQA) clause set, saturate it and construct its model based on HBS(LQA).

In this work, we restrict ourselves to only one sort LA per set of clauses. An extension to a many-sorted setup, e.g., one including first-order variables of a sort F, is possible. This can even be simulated by encoding first-order constants as concrete natural numbers via a bijection to N, since N ⊂ U. By not placing any arithmetic constraints on the variables used for the encoding, the encoding can be read off and mapped back from the resulting model.
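The suggested simulation can be sketched as follows (a hypothetical helper, not from the paper): each first-order constant is assigned a distinct natural number, and since no arithmetic constraints mention these numbers, the assignment can be inverted on the resulting model.

```python
def make_encoding(constants):
    """Hypothetical sketch: bijectively map first-order constants of a free
    sort F to natural numbers, which already live in the universe U."""
    enc = {c: i for i, c in enumerate(sorted(constants))}
    dec = {i: c for c, i in enc.items()}  # inverse mapping, to read the model back
    return enc, dec

enc, dec = make_encoding({"a", "b", "c"})
assert all(dec[enc[c]] == c for c in enc)   # bijective on the constants
assert len(set(enc.values())) == len(enc)   # injective into N
```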

One obvious challenge is relaxing the restriction to Horn clauses. With respect to saturation by ordered resolution there is typically no difference, in the sense that if a Horn fragment can always be finitely saturated, then so can the corresponding non-Horn fragment. However, our proposed ordering for the model construction, at the granularity of predicate symbols, will not suffice in this general case; the key to overcoming this challenge seems to be an appropriate treatment of clauses whose maximal literals share the same predicate. Backtracking on the selection of literals might also be sufficient.

The approach we presented does not exploit features of linear arithmetic beyond equality and the existence of a well-founded order for the underlying universe U. The results may therefore be adapted to other constraint domains such as non-linear arithmetic.

**Acknowledgements.** We thank our reviewers for their constructive comments.

#### **References**



# **Frameworks**

# **Combining Finite Combination Properties: Finite Models and Busy Beavers**

Guilherme V. Toledo1(B) , Yoni Zohar<sup>1</sup> , and Clark Barrett<sup>2</sup>

<sup>1</sup> Bar-Ilan University, Ramat Gan, Israel guivtoledo@gmail.com <sup>2</sup> Stanford University, Stanford, USA

**Abstract.** This work is a part of an ongoing effort to understand the relationships between properties used in theory combination. We here focus on including two properties that are related to shiny theories: the finite model property and stable finiteness. For any combination of properties, we consider the question of whether there exists a theory that exhibits it. When there is, we provide an example with the simplest possible signature. One particular class of interest includes theories with the finite model property that are not finitely witnessable. To construct such theories, we utilize the Busy Beaver function.

**Keywords:** satisfiability modulo theories · theory combination · theory politeness · theory shininess

# **1 Introduction**

The story of this paper begins with [7], where it was shown that the theory of algebraic datatypes, useful for modeling data structures like lists and trees, can be combined with any other theory, using the polite combination method [6]. This combination method offers a way to combine decision procedures of two theories into a decision procedure for the combined theory, under different assumptions than those of the earlier Nelson-Oppen approach [4]. In particular, it was proven that the theory admits a technical property concerning cardinalities of models, called *strong politeness* [2]. It was noted in [7] that proving strong politeness for this theory seemed much harder than proving *politeness*, a similar but simpler property. Therefore, the proof was split into three steps: (i) a class of theories was identified in which politeness and strong politeness coincide; (ii) the theory of algebraic datatypes was shown to be in this class; and (iii) this theory was proven to be polite. This proof technique raised the following question: **does politeness imply strong politeness?** An affirmative answer to this question would simplify strong politeness proofs that follow such steps, as only the last step would be needed. Unfortunately, the answer to this question was shown in [8] to be negative, in its most general form. However, an affirmative answer was given for theories over one-sorted empty signatures, where politeness and strong politeness do coincide.

Seeing that relationships between model-theoretic properties of theories (like politeness and strong politeness) are non-trivial, and can have a big impact on proofs in the field of theory combination, we have recently initiated a more general research plan: to systematically determine the relationships between model-theoretic properties that relate to theory combination. An analysis of such properties can, for example, simplify proofs, in cases where a property follows from a combination of other properties.

In the first stage of this plan [10], we studied the relationships between all properties that relate to either polite or Nelson-Oppen combination, namely: stable infiniteness, smoothness, finite witnessability, strong finite witnessability, and convexity. The first two properties relate to the ability to enlarge cardinalities of models, while the next two require a computable *witness* function that restricts the models of a formula based on its variables. The last property relies on the ability to deduce an equality from a disjunction of equalities. The result of [10] was a comprehensive table: nearly every combination of these properties (e.g., theories that are smooth and stably infinite but do not admit the other properties) was either proved to be infeasible, or an example for it was given.

In this paper we continue with this plan by adding two properties: the finite model property and stable finiteness, both related to shiny theories [9]. The former requires finite models for satisfiable formulas, and the latter enforces bounds on them.

Of course, the theories from [10] can be reused. For these, one only needs to determine if they admit the finite model property and/or stable finiteness. The results and examples from [10] are, however, not enough. Given that the number of considered combinations is doubled with the addition of each property, new theories need to be introduced in order to exemplify the new possibilities, and new impossible combinations can be found. Hence, in this paper we provide several impossibility results for the aforementioned properties, as well as examples of theories for possible combinations. The overall result is a new table which extends that of [10] with two new columns corresponding to the finite model property and stable finiteness.1

The most interesting combinations that we study are theories that admit the finite model property but not finite witnessability. While both properties deal with finite models, the latter has a computable element to it, namely the witness function. In separating these properties, we found it useful to define theories that are based on the *Busy Beaver* function, a well known function from computability theory, that is not only non-computable, but also grows eventually faster than any computable function.
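For intuition, the small values of the Busy Beaver (shift) function are known even though the function itself is non-computable: the 2-state, 2-symbol champion machine halts on the empty tape after 6 steps, having written 4 ones. The following sketch simulates that standard machine; the encoding of the transition table and the helper names are ours:

```python
# (state, read symbol) -> (write symbol, head move, next state)
RULES = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"), ("B", 1): (1, +1, "H"),  # "H" = halting state
}

def run(rules, start="A", halt="H", limit=10_000):
    """Simulate a Turing machine on an initially blank tape; return the
    number of steps taken and the number of ones written."""
    tape, pos, state, steps = {}, 0, start, 0
    while state != halt and steps < limit:
        write, move, state = rules[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1
    return steps, sum(tape.values())

steps, ones = run(RULES)
assert (steps, ones) == (6, 4)
```

Already for 5 states the function's value outruns any practically simulable bound, which is exactly the growth behaviour exploited in the separation results.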

**Outline:** Sect. 2 reviews many-sorted logics and theory combination properties. Section 3 identifies combinations that are contradictory; Sect. 4 constructs the extended table of combinations, and describes the newly introduced theories. Section 5 gives final remarks and future directions this work can take. The proofs for the results in this paper may be found in an appendix to a preprint version of this work, available as [11].

#### **2 Preliminary Notions**

#### **2.1 Many-Sorted Logic**

A *many-sorted signature* Σ is a triple (S<sub>Σ</sub>, F<sub>Σ</sub>, P<sub>Σ</sub>) where: S<sub>Σ</sub> is a countable set of *sorts*; F<sub>Σ</sub> is a countable set of function symbols; and P<sub>Σ</sub> is a countable set of predicate

<sup>1</sup> While we use several results from [10], we do not assume here any familiarity with that paper. All required results are mentioned here explicitly.

symbols containing, for each σ ∈ S<sub>Σ</sub>, an equality =<sub>σ</sub>. When σ is clear from the context, we write =. Every function symbol has an *arity* of the form σ1 × ··· × σn → σ, and every predicate symbol one of the form σ1 × ··· × σn, where σ1,...,σn, σ ∈ S<sub>Σ</sub>; equalities =<sub>σ</sub> have arity σ × σ.

A signature that has no function symbols and no predicates other than the equalities is called *empty*. Many-sorted signatures Σ where S<sub>Σ</sub> has only one element are called *one-sorted*.

For each sort in S<sub>Σ</sub> we assume a countably infinite set of variables, where distinct sorts have disjoint sets of variables; we then define first-order terms, formulas, and literals in the usual way. The set of free variables of sort σ in a formula ϕ is denoted by *vars*<sub>σ</sub>(ϕ), while *vars*(ϕ) denotes ⋃<sub>σ∈S<sub>Σ</sub></sub> *vars*<sub>σ</sub>(ϕ).

Σ-structures A are defined as usual, by interpreting sorts (denoted by σ<sup>A</sup>), function symbols (f<sup>A</sup>) and predicate symbols (P<sup>A</sup>), with the restriction that equality symbols are interpreted as identities. A Σ-interpretation is an extension of a Σ-structure with interpretations for the variables. If A is the underlying Σ-structure of a Σ-interpretation A′, we say that A′ is an interpretation on A. For simplicity, and because the use of structures is sparse in this paper, we will usually denote both structures and interpretations by the same font, A, B and so on. α<sup>A</sup> is the value taken by a Σ-term α in a Σ-interpretation A, and if Γ is a set of terms, we simply write Γ<sup>A</sup> for {α<sup>A</sup> : α ∈ Γ}.

We write A ⊨ ϕ if the Σ-interpretation A satisfies the Σ-formula ϕ; ϕ is then said to be *satisfiable* if it is satisfied by some interpretation A. The formulas found in Fig. 1 will be useful in the sequel. A Σ-interpretation A satisfies ψ<sup>σ</sup><sub>≥n</sub> iff |σ<sup>A</sup>| ≥ n; satisfies ψ<sup>σ</sup><sub>≤n</sub> iff |σ<sup>A</sup>| ≤ n; and satisfies ψ<sup>σ</sup><sub>=n</sub> iff |σ<sup>A</sup>| = n. For simplicity, when dealing with one-sorted signatures, we may drop the sort σ from the cardinality formulas.

$$\psi\_{\geq n}^{\sigma} = \exists \vec{x}. \bigwedge\_{1 \leq i < j \leq n} \neg(x\_i = x\_j) \quad \psi\_{\leq n}^{\sigma} = \exists \vec{x}.\, \forall y. \bigvee\_{i=1}^n y = x\_i \quad \psi\_{= n}^{\sigma} = \psi\_{\geq n}^{\sigma} \land \psi\_{\leq n}^{\sigma}$$

**Fig. 1.** Cardinality Formulas. $\vec{x}$ stands for x1,...,xn, all variables of sort σ.
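For instance, instantiating the schemas of Fig. 1 for n = 2 (one-sorted, so the sort annotation is dropped) gives:

$$\psi\_{\geq 2} = \exists x\_1, x\_2.\ \neg(x\_1 = x\_2) \qquad \psi\_{\leq 2} = \exists x\_1, x\_2.\ \forall y.\ (y = x\_1 \lor y = x\_2)$$

An interpretation satisfies ψ<sub>=2</sub> = ψ<sub>≥2</sub> ∧ ψ<sub>≤2</sub> exactly when its domain has two elements.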

A Σ-*theory* T is the class of all Σ-interpretations (called T-interpretations) that satisfy some set *Ax*(T) of closed formulas, called the *axiomatization* of T; the structures underlying these interpretations will be called the *models* of T.

A formula is T*-satisfiable* if it is satisfied by some T-interpretation and, analogously, a set of formulas is T-satisfiable if there is a T-interpretation that satisfies all of them simultaneously. Two formulas are T*-equivalent* when a T-interpretation satisfies the first iff it satisfies the second. We write ⊨<sub>T</sub> ϕ, and say that ϕ is T*-valid*, if A ⊨ ϕ for all T-interpretations A.

#### **2.2 Theory Combination Properties**

Let Σ be a signature, T a Σ-theory, and S ⊆ S<sub>Σ</sub>. We define several properties that T may have with respect to S.

**Convexity, Stable Infiniteness, and Smoothness** T is *convex* with respect to S if for any conjunction of Σ-literals φ and any finite set of variables {u1, v1,...,un, vn} of sorts in S with ⊨<sub>T</sub> φ → ⋁<sup>n</sup><sub>i=1</sub> ui = vi, one has ⊨<sub>T</sub> φ → ui = vi for some i. T is *stably infinite* with respect to S if for every T-satisfiable quantifier-free Σ-formula there is a T-interpretation A satisfying it such that |σ<sup>A</sup>| is infinite for each σ ∈ S. T is *smooth* with respect to S if for every quantifier-free formula, T-interpretation A that satisfies it, and function κ from S to the class of cardinals such that κ(σ) ≥ |σ<sup>A</sup>| for each σ ∈ S, there is a T-interpretation B that satisfies it with |σ<sup>B</sup>| = κ(σ) for each σ ∈ S.
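A classical illustration of non-convexity, not taken from this paper: over the integers, for φ = (1 ≤ u ∧ u ≤ 2 ∧ v<sub>1</sub> = 1 ∧ v<sub>2</sub> = 2) we have

$$\models\_{\mathbb{Z}} \varphi \rightarrow (u = v\_1 \lor u = v\_2),$$

yet neither ⊨ φ → u = v<sub>1</sub> nor ⊨ φ → u = v<sub>2</sub> holds on its own, so integer arithmetic is not convex; over the rationals no such conjunction of literals exists, and convexity holds.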

**(Strong) Finite Witnessability** For finite sets of variables V<sub>σ</sub> of sort σ for each σ ∈ S, and equivalence relations E<sub>σ</sub> on V<sub>σ</sub>, the arrangement on V = ⋃<sub>σ∈S</sub> V<sub>σ</sub> induced by E = ⋃<sub>σ∈S</sub> E<sub>σ</sub>, denoted by δ<sub>V</sub> or δ<sup>E</sup><sub>V</sub>, is the formula δ<sub>V</sub> = ⋀<sub>σ∈S</sub> (⋀<sub>xE<sub>σ</sub>y</sub> (x = y) ∧ ⋀<sub>xĒ<sub>σ</sub>y</sub> ¬(x = y)), where Ē<sub>σ</sub> denotes the complement of the equivalence relation E<sub>σ</sub>.
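Since the arrangements on V correspond exactly to the partitions of V into equivalence classes, they can be enumerated mechanically. A small single-sort sketch (function names are ours):

```python
def partitions(xs):
    """All partitions of a list into nonempty blocks, i.e. all
    equivalence relations on the underlying set."""
    if not xs:
        yield []
        return
    first, rest = xs[0], xs[1:]
    for part in partitions(rest):
        for i in range(len(part)):             # put `first` into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part                 # or make it a singleton block

def arrangement(part, variables):
    """Render delta_V for the equivalence relation induced by a partition."""
    block_of = {x: i for i, block in enumerate(part) for x in block}
    lits = []
    for i, x in enumerate(variables):
        for y in variables[i + 1:]:
            lits.append(f"{x} = {y}" if block_of[x] == block_of[y]
                        else f"not({x} = {y})")
    return " and ".join(lits)

V = ["x", "y"]
deltas = [arrangement(p, V) for p in partitions(V)]
assert sorted(deltas) == ["not(x = y)", "x = y"]
```

The number of arrangements on n variables is the n-th Bell number (e.g., 5 for three variables), which is why combination procedures that guess an arrangement only do so over finite variable sets.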

T is *finitely witnessable* with respect to S when there exists a computable function *wit*, called a *witness*, from the quantifier-free Σ-formulas to themselves that satisfies, for every φ: (i) φ and ∃$\vec{w}$. *wit*(φ) are T-equivalent, where $\vec{w}$ = *vars*(*wit*(φ)) \ *vars*(φ); and (ii) if *wit*(φ) is T-satisfiable, there exists a T-interpretation A satisfying *wit*(φ) such that σ<sup>A</sup> = *vars*<sub>σ</sub>(*wit*(φ))<sup>A</sup> for each σ ∈ S.

*Strong finite witnessability* is defined similarly to finite witnessability, replacing (ii) by: (ii′) given a finite set of variables V and an arrangement δ<sub>V</sub> on V, if *wit*(φ) ∧ δ<sub>V</sub> is T-satisfiable, there exists a T-interpretation A that satisfies *wit*(φ) ∧ δ<sub>V</sub> with σ<sup>A</sup> = *vars*<sub>σ</sub>(*wit*(φ) ∧ δ<sub>V</sub>)<sup>A</sup> for all σ ∈ S. If T is smooth and (strongly) finitely witnessable with respect to S, then it is *(strongly) polite* with respect to S.

**Finite Model Property and Stable Finiteness** T has the *finite model property* with respect to S if for every quantifier-free T-satisfiable Σ-formula, there exists a T-interpretation A that satisfies it with |σ<sup>A</sup>| finite for each σ ∈ S. T is *stably finite* with respect to S if, for every quantifier-free Σ-formula and T-interpretation A that satisfies it, there exists a T-interpretation B that satisfies it with |σ<sup>B</sup>| finite and |σ<sup>B</sup>| ≤ |σ<sup>A</sup>| for each σ ∈ S. Clearly, stable finiteness implies the finite model property:

**Theorem 1.** *If* T *is stably finite w.r.t.* S*, then it has the finite model property w.r.t.* S*.*

We shall write **SI** for stably infinite; **SM** for smooth; **FW** (**SW**) for (strong) finitely witnessable; **CV** for convex; **FM** for the finite model property; and **SF** for stably finite.

## **3 Relationships Between Model-Theoretic Properties**

In this section we study the connections between finiteness properties related to theory combination: the finite model property, stable finiteness, finite witnessability, and strong finite witnessability. We show how these properties are related to one another. In Sect. 3.1, we provide general results that hold for all signatures. Then, in Sect. 3.2, we focus on empty signatures, in which we are able to find more connections.

#### **3.1 General Signatures**

Finite witnessability, as well as its strong variant, were introduced in the context of polite theory combination. In contrast, the study of shiny theories utilizes the notions of the finite model property, as well as stable finiteness. It was shown in [1] that for theories with a decidable quantifier-free satisfiability problem, shiny theories and strongly polite theories are one and the same. This already showed some connections between the aforementioned finiteness properties. However, that analysis also relied on smoothness, the decidability of the quantifier-free satisfiability problem of the studied theories, as well as the computability of the *mincard* function, the function that computes the minimal sizes of domains in models of a given formula in these theories.

Here we focus purely on the finiteness properties, and show that even without any other assumptions, they are closely related. Considering finite witnessability and the finite model property, notice that any witness ensures that some formulas always have finite models. Using the equivalence of the existential closure of such formulas to the formulas that are given to the witness, one gets the following result, according to which finite witnessability implies the finite model property.

**Theorem 2.** *Any* Σ*-theory* T *that is finitely witnessable with respect to* S ⊆ S<sub>Σ</sub> *also has the finite model property with respect to* S*.*

Strong finite witnessability is a stronger property than finite witnessability, obtained by requiring finite models in the presence of arrangements. This requirement allows one to conclude stable finiteness for it, as the finer control on cardinalities that is required for stable finiteness can be achieved with the aid of arrangements. The following result is proved in Lemma 3.6 of [1], although under the assumption that the theory is smooth, something that is not actually used in their proof.

**Theorem 3.** *Any* Σ*-theory* T *that is strongly finitely witnessable with respect to* S ⊆ S<sub>Σ</sub> *is also stably finite with respect to* S*.*

Clearly, stable finiteness implies the finite model property (Theorem 1). The converse does not generally hold, as we will see in Sect. 4. However, when these properties are considered with respect to a single sort, they actually coincide:

**Theorem 4.** *If a* Σ*-theory* T *has the finite model property with respect to a set of sorts* S *with* |S| = 1*, then* T *is also stably finite with respect to* S*.*

Theorems 2 and 3 are visualized in the Venn diagram of Fig. 2, where, for example, theories that are strongly finitely witnessable are clearly inside the intersection of finitely witnessable theories and stably finite theories.

When only one sort is considered, the picture is much simpler, and is described in Fig. 3. There, the finite model property and stable finiteness populate the same region, as ensured by Theorem 4. Notice that the results depicted in Fig. 3 hold for both one-sorted and many-sorted signatures; the key point is that all properties are taken with respect to a single sort.

#### **3.2 Empty Signatures**

Figures 2 and 3 show a complete picture of the relationships between the properties studied in this section, for arbitrary signatures. However, when this generality is relaxed,

**Fig. 2.** Finiteness properties: general case.

several other connections appear. For this section, we require that the signatures are empty and have a finite set of sorts. We further require that the properties in question hold for the entire set of sorts, not just a subset of it.

Table 1 defines the 5 signatures that will be used in the examples of Sect. 4, and that will also appear in some of the results shown below: the empty signatures Σ1, Σ2 and Σ3, with sets of sorts {σ}, {σ, σ2} and {σ, σ2, σ3}, respectively; and the signatures Σ<sub>s</sub> and Σ<sup>2</sup><sub>s</sub>, each with one function s of arity σ → σ, and with sets of sorts {σ} and {σ, σ2}, respectively. Notice that these are the simplest possible signatures if we order signatures as follows: first, a signature with fewer sorts is simpler; and second, if two signatures have the same number of sorts, the one with fewer function symbols is simpler. We are free not to consider predicates, as functions are at least as expressive; furthermore, we do not consider the problem of deciding which of two signatures with the same numbers of sorts and function symbols is simpler, choosing rather to add only functions from a sort to itself.

**Table 1.** Signatures that will be used throughout the paper.


First, in such a setting, we have that the finite model property implies finite witnessability, in the presence of smoothness.

**Theorem 5.** *If* Σ *is an empty signature with a finite set of sorts* SΣ*, and the* Σ*-theory* T *has the finite model property and is smooth with respect to* SΣ*, then* T *is also finitely witnessable with respect to* SΣ*.*

Next, we show that stable finiteness and smoothness together, imply strong finite witnessability.

**Fig. 4.** Interplay between **SM**, **FW** (**SW**) and **FM** (**SF**) w.r.t. S<sup>Σ</sup> in an empty signature.

**Theorem 6.** *If* Σ *is an empty signature with a finite set of sorts* SΣ*, and the* Σ*-theory* T *is stably finite and smooth with respect to* SΣ*, then* T *is also strongly finitely witnessable with respect to* SΣ*.*

While Theorem 2 and Theorem 3 establish certain unconditional relations between finite witnessability and the finite model property, and between strong finite witnessability and stable finiteness, the converses shown in Theorem 5 and Theorem 6 demand smoothness and that the properties hold with respect to the entire set of sorts. In that case, the situation can be represented by the diagram in Fig. 4, which shows clearly that a smooth theory that also has the finite model property (respectively, is stably finite) must also be finitely witnessable (respectively, strongly finitely witnessable).

Lastly, regarding the empty signatures Σ1, Σ<sup>2</sup> and Σ3, the following theorem shows that Σ<sup>3</sup> is sometimes necessary.

**Theorem 7.** *There are no* Σ1*- or* Σ2*-theories* T *that are neither stably infinite nor stably finite, yet are convex and have the finite model property, with respect to the entire set of their sorts.*

Hence, to exhibit such theories, one has to consider three-sorted theories.

#### **4 A Taxonomy of Examples**

In [10], we created a table in which, for every possible combination of properties from {**SI**, **SM**, **FW**, **SW**, **CV**}, we either gave an example of a theory with this combination, or proved a theorem showing that there is no such example, with the exception of theories that are stably infinite and strongly finitely witnessable but not smooth. Such theories, referred to in [10] as *unicorn theories* (due to our conjecture that they do not exist), were left for future work, and remain so, as the focus of the current paper is the integration of the finiteness properties, namely **FM** and **SF**, into the table.

Indeed, the goal of this section is to add two columns to the table from [10]: one for the finite model property and one for stable finiteness. The extended table is Table 2. We do not assume familiarity with [10], and describe the entire resulting table (though focusing on the new results).

**Table 2.** Summary of all possible combinations of theory properties. Red cells represent impossible combinations. In lines 26 and 34, n > 1; in lines 29, 30 and 35, m > 1, n > 1 and |m − n| > 1.


This section is structured as follows: In Sect. 4.1 we describe the structure of Table 2. In Sects. 4.2 to 4.4 we provide details about the axiomatizations of theories that populate it. Finally, in Sect. 4.5, we reuse operators from [10], prove that they preserve the finite model property and stable finiteness, and show how they are used in order to generate more theories for Table 2.

#### **4.1 The Table**

The columns to the left of the vertical double line of Table 2 correspond to possible combinations of properties. In them, T means that the property holds, while F means that it does not. The first 5 columns correspond to properties already studied in [10], and the next two columns correspond to **FM** and **SF**. The columns to the right of the vertical double line correspond to possible signatures: empty or non-empty, and one-sorted or many-sorted. White cells correspond to cases where a theory with the combination of properties induced by the row exists in a signature induced by the column. In such a case, the name of the theory is written; the theories themselves are defined axiomatically in Figs. 5, 7 and 8. Shaded cells correspond to the cases where there is no such theory. In such a case, the theorem that excludes this possibility is written. If that theorem is from [10], we simply write [10].

*Example 1.* Line 1 of Table 2 corresponds to theories that admit all studied properties. We see that there is such a theory in each of the studied types of signatures (e.g., for the empty one-sorted signature, the theory T<sub>≥n</sub> exhibits all properties). In contrast, line 3 corresponds to theories that admit all properties but strong finite witnessability. We see that such theories exist in non-empty signatures, but not in empty signatures, thanks to Theorem 6.

Section 3, as well as results from [10], makes some potential rows of Table 2 completely shaded. To allow the table to fit on a single page, we chose to erase such rows. For example, by Theorem 1, there are no theories, in any signature, that are stably finite but do not have the finite model property. Thus, no rows representing such theories appear in the table.

In the remainder of this section, we describe the various theories that populate the cells of the table. Fortunately, all theories from [10] can be reused to also exhibit the new properties **SF** and **FM**, or their negations. These are described in Sect. 4.2. However, the theories from [10] alone are not enough, and hence we introduce several new theories in Sects. 4.3 and 4.4. Some of them are relatively simple, and are described in Sect. 4.3. Most of them, however, are more complex, and rely on the Busy Beaver function from theoretical computer science. We discuss these theories in Sect. 4.4.

#### **4.2 Theories from [10]**

For completeness, we include in Fig. 5 the axiomatizations of all theories from [10] that are used in Table 2 (Fig. 6 includes the definitions of formulas that are abbreviated in Fig. 5, such as ψ<sup>=</sup><sub>≥n</sub> from the definition of T<sub>f</sub>). For lack of space, however, we refrain from elaborating on these theories, and refer the reader to their detailed description in [10]. For the theories of Fig. 5, whether they admit the properties from {**SI**, **SM**, **FW**, **SW**, **CV**} was already established in [10]. For each of them, we here also check and prove whether they admit the new properties **FM** and **SF**.



**Fig. 5.** Theories for Table 2 that were studied in [10]; p(x) stands for s(x) = x. In T<sub>f</sub>, f is any non-computable function from the positive integers to {0, 1} such that, for every k ≥ 0, f maps half of the numbers between 1 and 2<sup>k</sup> to 1, and the other half to 0. In [10], such a function was proven to exist.

$$\psi_{\geq n}^{=} = \exists\, \overline{x}.\, \bigwedge_{i=1}^{n} p(x_i) \wedge \delta_n \qquad \psi_{=n}^{=} = \exists\, \overline{x}.\, [\bigwedge_{i=1}^{n} p(x_i) \wedge \delta_n \wedge \forall\, x.\, [p(x) \rightarrow \bigvee_{i=1}^{n} x = x_i]]$$

$$\psi_{\geq n}^{\neq} = \exists\, \overline{x}.\, \bigwedge_{i=1}^{n} \neg p(x_i) \wedge \delta_n \qquad \psi_{=n}^{\neq} = \exists\, \overline{x}.\, [\bigwedge_{i=1}^{n} \neg p(x_i) \wedge \delta_n \wedge \forall\, x.\, [\neg p(x) \rightarrow \bigvee_{i=1}^{n} x = x_i]]$$

$$\psi_{\vee} = \forall\, x.\, [(s(s(x)) = x) \vee (s(s(x)) = s(x))]$$

**Fig. 6.** Formulas for Σ<sub>s</sub>-theories. x̄ stands for x<sub>1</sub>,...,x<sub>n</sub>; δ<sub>n</sub> stands for ⋀<sub>1≤i<j≤n</sub> ¬(x<sub>i</sub> = x<sub>j</sub>), and p(x) stands for s(x) = x.

For example, for each n, T<sub>≥n</sub> consists of all Σ₁-structures that have at least n elements. This theory was shown in [10] to be strongly finitely witnessable, and so by Theorem 3 it is also stably finite. Then, by Theorem 1, it also admits the finite model property.

It is worth mentioning that T<sub>2,3</sub> was first introduced in [1], in the context of shiny theories, where it was shown to have the finite model property while not being stably finite. An alternative proof of this fact goes as follows: it was proven in [8] that T<sub>2,3</sub> is (i) finitely witnessable; (ii) not strongly finitely witnessable; and (iii) smooth. By


**Fig. 7.** Simple theories for Table 2. diag<sub>σ,σ₂</sub>(k + 2), for any k ∈ ℕ, stands for the formula (ψ<sup>σ</sup><sub>≥k+2</sub> ∧ ψ<sup>σ₂</sup><sub>≥k+2</sub>) ∨ ⋁<sup>k+2</sup><sub>i=2</sub> (ψ<sup>σ</sup><sub>=i</sub> ∧ ψ<sup>σ₂</sup><sub>=i</sub>), and p(x) stands for s(x) = x.

Theorem 2 and (i), it also has the finite model property. But since it is over an empty signature, by (ii), (iii) and Theorem 6, it cannot be stably finite.

#### **4.3 New Theories: The Simple Cases**

While the theories from Fig. 5 suffice to populate many cells of Table 2, they are not enough. Hence we describe new theories, not taken from [10]. The simplest theories that we have added can be found in Fig. 7, and are described below.

T<sup>∞</sup> is a theory with three distinct groups of models: its first group consists of models A that have |σ<sup>A</sup>| = 1 and σ₂<sup>A</sup> infinite; its second group, of models A where both σ<sup>A</sup> and σ₂<sup>A</sup> are infinite; and its third group, of models A where |σ<sup>A</sup>| = |σ₂<sup>A</sup>| is any value k ≥ 2. In its axiomatization, one finds the formula diag<sub>σ,σ₂</sub>(k + 2), equal to (ψ<sup>σ</sup><sub>≥k+2</sub> ∧ ψ<sup>σ₂</sup><sub>≥k+2</sub>) ∨ ⋁<sup>k+2</sup><sub>i=2</sub> (ψ<sup>σ</sup><sub>=i</sub> ∧ ψ<sup>σ₂</sup><sub>=i</sub>) for k ∈ ℕ: that formula characterizes the models A of T<sup>∞</sup> that lie on the diagonal, that is, where |σ<sup>A</sup>| = |σ₂<sup>A</sup>| (and this value is greater than 1), or where both domains are infinite.
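
To make the three groups of models concrete, the cardinality spectrum of T<sup>∞</sup> can be sketched as a small predicate. This is our own illustration, not part of the paper's formal development; `INF` stands in for any infinite cardinality.

```python
import math

INF = math.inf  # stands in for "any infinite cardinality" in this sketch

def in_T_infty_spectrum(c1, c2):
    """True iff (|sigma^A|, |sigma_2^A|) = (c1, c2) matches one of the
    three groups of models of T^infty described above."""
    first = (c1 == 1 and c2 == INF)       # |sigma^A| = 1, sigma_2^A infinite
    second = (c1 == INF and c2 == INF)    # both domains infinite
    third = (c1 == c2 and 2 <= c1 < INF)  # on the diagonal, with value >= 2
    return first or second or third

assert in_T_infty_spectrum(1, INF)
assert in_T_infty_spectrum(7, 7)
assert not in_T_infty_spectrum(1, 5) and not in_T_infty_spectrum(3, 4)
```

Note how diag<sub>σ,σ₂</sub>(k + 2) corresponds exactly to the `second` and `third` cases, i.e., the models on (or above) the diagonal.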

T<sup>∞</sup><sub>m,n</sub> is a theory that depends on two distinct positive integers m and n; supposing without loss of generality that m > n, the theory has two types of models A: in the first, |σ<sup>A</sup>| equals m, while σ₂<sup>A</sup> can be anything; in the second, |σ<sup>A</sup>| equals n, and then σ₂<sup>A</sup> must be infinite.

The models A of the Σ<sub>s</sub><sup>2</sup>-theory T<sup>∞</sup><sub>=</sub> have either: |σ<sup>A</sup>| = 1, |σ₂<sup>A</sup>| ≥ ω and s<sup>A</sup> the identity function; both σ<sup>A</sup> and σ₂<sup>A</sup> infinite, and s<sup>A</sup> with no fixed points; or |σ<sup>A</sup>| = |σ₂<sup>A</sup>| equal to any number in ℕ \ {0, 1}, and again s<sup>A</sup> with no fixed points.

Finally, T<sup>3</sup><sub>2,3</sub> is made up of just the models A of T<sub>2,3</sub> (see Fig. 5), with an extra domain associated to the new sort σ₃ such that |σ₃<sup>A</sup>| = 1.

#### **4.4 New Theories: The Busy Beaver**

So far we have seen that the theories from [10], together with a small set of simple new theories, already get us quite far in filling Table 2. However, for several combinations, it seems that more complex theories are needed. For this purpose, we utilize the well-known Busy Beaver function, and define various theories based on it. In this section, we describe these theories. First, in Sect. 4.4.1, we review the Busy Beaver function, and explain why it is useful in our context. Then, in Sects. 4.4.2 to 4.4.6, we describe the theories that make use of it, separated according to their signatures.

**Fig. 8.** Busy Beaver Theories for Table 2. diag<sub>σ,σ₂</sub><sup>ς</sup>(k + 2) stands, for each k ∈ ℕ, for (ψ<sup>σ</sup><sub>≥ς(k+2)</sub> ∧ ψ<sup>σ₂</sup><sub>≥ς(k+2)</sub>) ∨ ⋁<sup>k+2</sup><sub>i=2</sub> (ψ<sup>σ</sup><sub>=ς(i)</sub> ∧ ψ<sup>σ₂</sup><sub>=ς(i)</sub>), and p(x) for s(x) = x; in T<sup>ς</sup><sub>m,n</sub>, we assume w.l.o.g. that m ≥ n.

**4.4.1 On the Busy Beaver Function** The Busy Beaver function, here denoted ς, is an old acquaintance of theoretical computer scientists: essentially, given any n ∈ ℕ, ς(n) is the maximum number of 1's that a Turing machine with at most n states can write to its tape before halting, when the tape is initialized to all 0's. Somewhat confusingly, any Turing machine that achieves this number is also called a Busy Beaver.

It is possible to prove that ς(n) ∈ ℕ for any n ∈ ℕ (see [5]), and so we may write ς : ℕ → ℕ; furthermore, ς is increasing. The crucial property of ς, however, is not merely that it is increasing, but that it increases very rapidly.

More formally, Radó proved, in the seminal paper [5], that ς grows asymptotically faster than any computable function (and is, therefore, non-computable). That is, for every computable function f : ℕ → ℕ, there exists N ∈ ℕ such that ς(n) > f(n) for all n ≥ N. Despite that, the Busy Beaver starts somewhat slowly: ς(0) = 0, ς(1) = 1, ς(2) = 4, ς(3) = 6 and ς(4) = 13; the exact value of ς(5) (and indeed of ς(n) for any n ≥ 5) is not known, but it is at least 4098 [3].

The fact that ς eventually grows faster than any computable function is a great property to have when constructing theories that admit the finite model property while not being finitely witnessable. Roughly speaking, if the cardinalities of models of a theory are related to ς, the theory is guaranteed to have models of sufficiently large finite size, while not being finitely witnessable, since its models grow too fast: by carefully choosing formulas φ<sub>n</sub> that hold only in the "n-th model" of the theory (when ordered by cardinality), the number of variables of *wit*(φ<sub>n</sub>) offers an upper bound on ς(n) and is therefore not computable, contradicting the fact that *wit* is supposed to be computable. Notice that, despite the dependency of our theories on the Busy Beaver, the function itself is not part of their signatures.

Now we present the theories that are based on ς. These theories are axiomatized in Fig. 8.

**4.4.2 A** *Σ*₁**-Theory** The most basic Busy Beaver theory is T<sub>ς</sub>. This is the Σ₁-theory whose models either have cardinality ς(k), for some k ≥ 2, or are infinite: that is, T<sub>ς</sub> has models with 4, 6 and 13 elements, and so on. This theory forms the basis for all other theories of this section, which are designed to admit various combinations of properties from Table 2.

By itself, T<sub>ς</sub> has the finite model property while not being (strongly) finitely witnessable; it was in fact constructed precisely to exhibit this. As it turns out, it is also not smooth, but it does satisfy all the other properties. To populate rows of the table that correspond to other combinations of properties, more theories are needed, over richer signatures.

**4.4.3** *Σ*₂**-Theories** To fill the rows that correspond to other combinations, we introduce several Σ₂-theories.

The Σ₂-theory T<sup>∞</sup><sub>ς</sub> is more complex. It has, essentially, three classes of models: the first is made up of structures A where |σ<sup>A</sup>| = 1 and σ₂<sup>A</sup> is infinite; the second, of structures where both σ<sup>A</sup> and σ₂<sup>A</sup> are infinite; and the third, of structures where |σ<sup>A</sup>| = |σ₂<sup>A</sup>| is a finite value equal to ς(k), for some k ≥ 2. The formula diag<sub>σ,σ₂</sub><sup>ς</sup>(k + 2), for k ≥ 2, in the axiomatization equals (ψ<sup>σ</sup><sub>≥ς(k+2)</sub> ∧ ψ<sup>σ₂</sup><sub>≥ς(k+2)</sub>) ∨ ⋁<sup>k+2</sup><sub>i=2</sub> (ψ<sup>σ</sup><sub>=ς(i)</sub> ∧ ψ<sup>σ₂</sup><sub>=ς(i)</sub>), and is similar to diag<sub>σ,σ₂</sub>(k + 2) from T<sup>∞</sup>, characterizing the models A where either |σ<sup>A</sup>| = |σ₂<sup>A</sup>| = ς(k + 2), or both σ<sup>A</sup> and σ₂<sup>A</sup> are infinite.
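
As a sanity check on this description, the following sketch tests membership in the cardinality spectrum of T<sup>∞</sup><sub>ς</sub>, restricted to the known prefix of ς. The names `KNOWN_SIGMA` and `DIAG` are our own; diagonal values beyond ς(4) = 13 exist in the theory but are currently unknown, so the sketch cannot decide them.

```python
import math

INF = math.inf
KNOWN_SIGMA = {0: 0, 1: 1, 2: 4, 3: 6, 4: 13}  # known Busy Beaver values
DIAG = {KNOWN_SIGMA[k] for k in (2, 3, 4)}     # {4, 6, 13}: varsigma(k), k >= 2

def in_T_infty_varsigma_spectrum(c1, c2):
    """Cardinality pairs allowed by T^infty_varsigma, restricted to the
    known prefix of varsigma."""
    return (c1 == 1 and c2 == INF) or (c1 == INF and c2 == INF) \
        or (c1 == c2 and c1 in DIAG)

assert in_T_infty_varsigma_spectrum(4, 4) and in_T_infty_varsigma_spectrum(13, 13)
assert in_T_infty_varsigma_spectrum(1, INF)
assert not in_T_infty_varsigma_spectrum(5, 5)  # 5 is not a Busy Beaver value
```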

For each n > 0, T<sup>ς</sup><sub>n</sub> has as interpretations those A with |σ<sup>A</sup>| = n, and |σ₂<sup>A</sup>| either infinite or equal to ς(k) for some k ≥ 2 (so (|σ<sup>A</sup>|, |σ₂<sup>A</sup>|) may equal (n, 4), (n, 6), (n, 13), and so on).

T<sup>ς</sup><sub>m,n</sub> is a Σ₂-theory that can be seen as a combination of T<sup>∞</sup><sub>m,n</sub> and T<sup>ς</sup><sub>n</sub>, dependent on two distinct positive integers m and n. Consider the case where m is the greater of the two (the other cases are similar). Then we may divide its interpretations A into three classes: those with |σ<sup>A</sup>| = n and σ₂<sup>A</sup> infinite; those with |σ<sup>A</sup>| = m and σ₂<sup>A</sup> infinite; and those with |σ<sup>A</sup>| = m and |σ₂<sup>A</sup>| equal to some ς(k), for k ≥ 2.

**4.4.4** *Σ*<sub>s</sub>**-Theories** For some lines of Table 2, e.g., line 7, empty signatures do not suffice for presenting examples. Hence we also introduce Σ<sub>s</sub>-theories.

We start with T<sup>s</sup><sub>ς</sub>, which is, arguably, the most intricate theory we define here: we are forced to appeal not only to the special cardinality formulas found in Fig. 6, but also to the function ς<sup>−1</sup>, a left inverse of ς. More formally, ς<sup>−1</sup> : ℕ → ℕ is the function defined by ς<sup>−1</sup>(k) = min{l : ς(l + 1) > k}: so ς<sup>−1</sup>(0) = 0, ς<sup>−1</sup>(1) = ς<sup>−1</sup>(2) = ς<sup>−1</sup>(3) = 1, ς<sup>−1</sup>(4) = ς<sup>−1</sup>(5) = 2, ς<sup>−1</sup>(6) = ··· = ς<sup>−1</sup>(12) = 3, ς<sup>−1</sup>(13) = ··· = ς<sup>−1</sup>(4097) = 4, and further values of ς<sup>−1</sup> are currently unknown. From the definition of ς<sup>−1</sup>, we have that ς(ς<sup>−1</sup>(k)) ≤ k and ς<sup>−1</sup>(ς(k)) = k. Moreover, ς<sup>−1</sup> is not computable: since ς<sup>−1</sup>(k) = min{l : ς(l + 1) > k} by definition, ς<sup>−1</sup>(k + 1) = ς<sup>−1</sup>(k) + 1 iff k + 1 is a value of ς; so an algorithm computing ς<sup>−1</sup> would yield an algorithm computing the values of ς, by simply computing the values of ς<sup>−1</sup> and checking where they change.
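
Since the first values of ς are known, the corresponding prefix of ς<sup>−1</sup> can actually be tabulated. The sketch below is our own illustration (the names `KNOWN_SIGMA` and `sigma_inv` are assumptions); it is only correct for k < 4098, because ς(5) ≥ 4098 guarantees the minimum is found within the known table on that range. It implements ς<sup>−1</sup>(k) = min{l : ς(l + 1) > k} and checks the identities just stated.

```python
KNOWN_SIGMA = {0: 0, 1: 1, 2: 4, 3: 6, 4: 13}  # known Busy Beaver values

def sigma_inv(k):
    """min { l : varsigma(l+1) > k }, correct for k < 4098 since
    varsigma(5) >= 4098."""
    l = 0
    while l + 1 in KNOWN_SIGMA and KNOWN_SIGMA[l + 1] <= k:
        l += 1
    return l

# the tabulated values from the text
assert [sigma_inv(k) for k in (0, 1, 3, 4, 5, 6, 12, 13)] == [0, 1, 1, 2, 2, 3, 3, 4]
# left-inverse identity: sigma_inv(sigma(k)) = k
assert all(sigma_inv(KNOWN_SIGMA[n]) == n for n in KNOWN_SIGMA)
# the other identity: sigma(sigma_inv(k)) <= k
assert all(KNOWN_SIGMA[sigma_inv(k)] <= k for k in range(14))
```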

T<sup>s</sup><sub>ς</sub> is then the Σ<sub>s</sub>-theory whose models A have any cardinality k + 1 ≥ 1 and are such that s<sup>A</sup>(a) = a holds for precisely ς<sup>−1</sup>(k + 1) elements a of A, and thus s<sup>A</sup>(a) ≠ a holds for k + 1 − ς<sup>−1</sup>(k + 1) elements; the function k ↦ k + 1 − ς<sup>−1</sup>(k + 1) is itself non-decreasing, given that ς<sup>−1</sup>(k + 1) can equal either ς<sup>−1</sup>(k) or ς<sup>−1</sup>(k) + 1.

*Example 2.* We mention some T<sup>s</sup><sub>ς</sub>-structures as examples: a structure A with |σ<sup>A</sup>| = 1 and s<sup>A</sup> the identity; a structure B with |σ<sup>B</sup>| = 2 and s<sup>B</sup> a constant function; a structure C with |σ<sup>C</sup>| = 3 (say σ<sup>C</sup> = {a, b, c}) and s<sup>C</sup> fixing only one of these elements (e.g., s<sup>C</sup> can be a constant function, but there are further possibilities, such as s<sup>C</sup>(a) = s<sup>C</sup>(b) = a and s<sup>C</sup>(c) = b); and a structure D with |σ<sup>D</sup>| = 4 (say σ<sup>D</sup> = {a, b, c, d}) and s<sup>D</sup> fixing only two of these elements (e.g., s<sup>D</sup>(a) = s<sup>D</sup>(b) = s<sup>D</sup>(c) = a and s<sup>D</sup>(d) = d).
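
The fixed-point counts in Example 2 can be checked mechanically. The sketch below is our own encoding (domains renamed to {0, ..., n − 1}, and ς<sup>−1</sup> computed from the known prefix of ς): each structure has exactly ς<sup>−1</sup>(n) elements fixed by s, as T<sup>s</sup><sub>ς</sub> requires.

```python
KNOWN_SIGMA = {0: 0, 1: 1, 2: 4, 3: 6, 4: 13}  # known Busy Beaver values

def sigma_inv(k):  # min { l : varsigma(l+1) > k }, valid for k < 4098
    l = 0
    while l + 1 in KNOWN_SIGMA and KNOWN_SIGMA[l + 1] <= k:
        l += 1
    return l

# s^A, s^B, s^C, s^D from Example 2, over domains {0, ..., n-1}
structures = {
    1: {0: 0},                    # identity on a singleton
    2: {0: 0, 1: 0},              # constant function on two elements
    3: {0: 0, 1: 0, 2: 1},        # s(a) = s(b) = a, s(c) = b
    4: {0: 0, 1: 0, 2: 0, 3: 3},  # s(a) = s(b) = s(c) = a, s(d) = d
}
for n, s in structures.items():
    fixed_points = sum(1 for a in s if s[a] == a)
    assert fixed_points == sigma_inv(n)  # as required by T^s_varsigma
```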

Next, we continue to describe the other Σ<sub>s</sub>-theories.

T<sup>=</sup><sub>ς</sub> has essentially two classes of models A: those with |σ<sup>A</sup>| = 2 and s<sup>A</sup> never the identity; and those with |σ<sup>A</sup>| either equal to ς(k), for some k ≥ 2, or infinite, and s<sup>A</sup> the identity.

T<sup>=</sup><sub>ς,1</sub> is very similar to T<sup>=</sup><sub>ς</sub>: the difference lies in where s is the identity. While in T<sup>=</sup><sub>ς</sub> the function s is the identity for all interpretations A with |σ<sup>A</sup>| > 2, in T<sup>=</sup><sub>ς,1</sub> it is the identity only for the interpretations A with |σ<sup>A</sup>| = 1. So, in T<sup>=</sup><sub>ς,1</sub>, we have a model A with |σ<sup>A</sup>| = 1 and s<sup>A</sup> the identity, and then models A with |σ<sup>A</sup>| either equal to ς(k), for some k ≥ 2, or infinite, and s<sup>A</sup>(a) anything but a.

The Σ<sub>s</sub>-theory T<sup>∨</sup><sub>ς</sub> is then just T<sup>s</sup><sub>ς</sub> satisfying, in addition, the formula ψ<sub>∨</sub> (see Fig. 6). It has models A of any finite cardinality k + 1, as long as ς<sup>−1</sup>(k + 1) of their elements a satisfy s<sup>A</sup>(a) = a, as well as models of infinite cardinality, as long as the number of elements a satisfying s<sup>A</sup>(a) = a is infinite; additionally, s<sup>A</sup>(s<sup>A</sup>(a)) must always equal either s<sup>A</sup>(a) or a itself.

#### **4.4.5** *Σ*<sub>s</sub><sup>2</sup>**-Theories** Now for theories in a many-sorted non-empty signature.

The Σ<sub>s</sub><sup>2</sup>-theory T<sup>=</sup><sub>ς</sub> appears simple, but is actually quite tricky. Starting with the easy case: if σ<sup>A</sup> has infinitely many elements a satisfying s<sup>A</sup>(a) = a, then σ₂<sup>A</sup> is also infinite. If, however, the number of elements a ∈ σ<sup>A</sup> satisfying s<sup>A</sup>(a) = a is finite (notice that, even then, σ<sup>A</sup> may still be infinite) and equal to some k + 2, then σ₂<sup>A</sup> has at least ς(k + 2) elements. To give a concrete example, suppose σ<sup>A</sup> has 2 elements satisfying s<sup>A</sup>(a) = a: then σ₂<sup>A</sup> has at least ς(2) = 4 elements, but may have any cardinality above that, including infinite ones; notice that in this example σ<sup>A</sup> may be infinite as well, as long as only two of its elements satisfy s<sup>A</sup>(a) = a.

T<sup>2</sup><sub>ς</sub> is the same as T<sup>=</sup><sub>ς</sub>, but with extra models A where |σ<sup>A</sup>| = 1 and |σ₂<sup>A</sup>| ≥ ω (in which case, of course, s<sup>A</sup> is the identity).

T<sup>=</sup><sub>ς∨</sub> is then the same as T<sup>2</sup><sub>ς</sub>, with the added validity of the formula ψ<sub>∨</sub>; so the models of T<sup>=</sup><sub>ς∨</sub> are just the models of T<sup>2</sup><sub>ς</sub> satisfying that s<sup>A</sup>(s<sup>A</sup>(a)) always equals either s<sup>A</sup>(a) or a itself.

T<sup>∞</sup><sub>ς=</sub> is just the Σ₂-theory T<sup>∞</sup><sub>ς</sub> with an added function s such that, if |σ<sup>A</sup>| = 1, then s<sup>A</sup> is the identity; and if |σ<sup>A</sup>| > 1, then s<sup>A</sup>(a) is anything but a.

**4.4.6 A** *Σ*₃**-Theory** Finally, T<sup>∞,3</sup><sub>ς</sub> is obtained by adding a sort with a single element to the Σ₂-theory T<sup>∞</sup><sub>ς</sub>, similarly to the definition of T<sup>3</sup><sub>2,3</sub>, which was based on the Σ₂-theory T<sub>2,3</sub> (see Sect. 4.3).

#### **4.5 Theory Operators**

There are two types of theories in Table 2: the first consists of base theories, such as T<sub>≥n</sub>, that are axiomatized in Figs. 5, 7 and 8; the second is obtained from the first by applying several operators on theories. For example, the theories (T<sub>≥n</sub>)<sup>2</sup>, (T<sub>≥n</sub>)<sup>s</sup> and ((T<sub>≥n</sub>)<sup>2</sup>)<sup>s</sup> are all obtained from the base theory T<sub>≥n</sub>. So far we have only described theories of the first type. In this section we explain the theories of the second type.

The operators that are used in Table 2 were defined in [10], in order to systematically generate examples in various signatures. For example, if T is a Σ₁-theory, then (T)<sup>2</sup> is a Σ₂-theory with the same axiomatization as T; that is, the second sort is completely free and is not axiomatized in any way. For completeness' sake, we include the definitions of these operators here:

#### **Definition 1 (Theory Operators from** [10]**)**


It was proven in [10] that these operators preserve the properties **SI**, **SM**, **FW**, **SW**, **CV**, and the lack of them. Here we prove that the same holds for **FM** and **SF** as well.

**Theorem 8.** *Let* T *be a* Σ₁*-theory. Then:* T *is FM, or SF, w.r.t.* {σ} *if and only if* (T)<sup>2</sup> *is, respectively, FM, or SF, w.r.t.* {σ, σ₂}*.*

**Theorem 9.** *If* T *is a theory over an empty signature* Σ<sub>n</sub> *with sorts* S = {σ<sub>1</sub>,...,σ<sub>n</sub>}*, then:* T *is FM, or SF, w.r.t.* S *if and only if* (T)<sup>s</sup> *is, respectively, FM, or SF, w.r.t.* S*.*

**Theorem 10.** *If* T *is a theory over an empty signature* Σ<sub>n</sub> *with sorts* S = {σ<sub>1</sub>,...,σ<sub>n</sub>}*, then:* T *is FM, or SF, w.r.t.* S *if and only if* (T)<sup>∨</sup> *is, respectively, FM, or SF, w.r.t.* S*.*

Thus, in various cases, theories need not be invented from scratch, but can be generated from other theories. For example, the theory T<sub>≥n</sub> exhibits all studied properties, but is defined in a one-sorted signature. Using the operators, we obtain variants of this theory in all signature types, namely (T<sub>≥n</sub>)<sup>2</sup> for empty many-sorted signatures, (T<sub>≥n</sub>)<sup>s</sup> for non-empty one-sorted signatures, and ((T<sub>≥n</sub>)<sup>2</sup>)<sup>s</sup> for non-empty many-sorted signatures. The properties of the theories generated using these operators are guaranteed by Theorems 8 and 9, as well as the corresponding results from [10].

For two of the theories defined using the Busy Beaver function, T<sup>∨</sup><sub>ς</sub> and T<sup>=</sup><sub>ς∨</sub>, we cannot rely on Theorem 10 to obtain them from, respectively, T<sup>s</sup><sub>ς</sub> and T<sup>=</sup><sub>ς</sub>, since the signatures of the latter theories are not empty. Curiously, adding ψ<sub>∨</sub> to their axiomatizations still has the desirable outcome, but we prove this separately, without relying on Theorem 10. Extending Theorem 10 to non-empty signatures is left for future work.

The number of combinations of properties that we consider, multiplied by the possible types of signatures, adds up to 2<sup>9</sup> = 512. Our negative results from Sect. 3 guarantee that only ∼15% of the full table can be filled with examples; the remaining ∼85% are either shaded or excluded from the table for space considerations. As for the examples that can be given, the table contains a total of 78 theories. Thanks to the theory operators of Definition 1, however, only 33 of them (∼42%) had to be concretely axiomatized in Figs. 5, 7 and 8; the remaining 45 theories were defined using the operators.

#### **5 Conclusion**

We examined, in addition to all the properties considered in [10], the finite model property and stable finiteness. Interesting restrictions on the combinations involving these properties were established. We also found interesting theories to fill in our table of combinations, most prominently theories involving the Busy Beaver function and its left inverse.

One possible direction for this research is reasonably clear: considering the computability of the *mincard* function, which will most probably double the number of theories to be taken into consideration. Further interesting properties that could be considered include the decidability of the theory's axiomatization, or even its finiteness, and the decidability of the theory's satisfiability problem with respect to quantifier-free formulas.

Second, some of the negative results in [10] and in the present paper only hold with respect to the entire set of sorts S<sub>Σ</sub>. We plan to study whether they also hold with respect to proper subsets of sorts and, if they do not, to provide counterexamples to those generalizations.

**Acknowledgments.** Funded in part by NSF-BSF grant numbers 2110397 (NSF) and 2020704 (BSF), and ISF grant number 619/21.

### **References**

1. Casal, F., Rasga, J.: Many-sorted equivalence of shiny and strongly polite theories. J. Autom. Reason. **60**(2), 221–236 (2018)


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Formal Reasoning Using Distributed Assertions**

Farah Al Wardani, Kaustuv Chaudhuri(B), and Dale Miller

Inria Saclay and LIX, Institut Polytechnique Paris, Palaiseau, France *{*farah.al-wardani,kaustuv.chaudhuri,dale.miller*}*@inria.fr

**Abstract.** When a proof system checks a formal proof, we can say that its kernel *asserts* that the formula is a theorem in a particular logic. We describe a general framework in which such assertions can be made globally available so that any other proof assistant willing to trust the assertion's creator can use that assertion without rechecking any associated formal proof. This framework, called DAMF, is heterogeneous and allows each participant to decide which tools and operators they are willing to trust in order to accept external assertions. This framework can also be integrated into existing proof systems by making minor changes to the input and output subsystems of the prover. DAMF achieves a high level of distributivity using such off-the-shelf technologies as IPFS, IPLD, and public key cryptography. We illustrate the framework by describing an implemented tool for validating and publishing assertion objects and a modified version of the Abella theorem prover that can use and publish such assertions.

### **1 Introduction**

In order to communicate a result from one formal reasoning system to another, a common technique is to transfer a formal proof certificate from the source system to the target system. This technique is usually required when the target system is *autarkic*, <sup>1</sup> wherein the system only trusts its own components, of which a particularly trusted component is an implementation of a proof checking *kernel*. To transfer a formal proof to an autarkic target system, either (a) the proof has to be *translated* from the source system, or (b) the verifier for the proof must be re-implemented as a *certified* procedure in the target system [6,25]. Both kinds of transferal are complicated for a variety of reasons: (1) The source and target system may not be syntactically, semantically, or foundationally compatible. (2) The source-proof language can have complex operational semantics that is cumbersome to encode in the target system. (Note that no universal standard has yet emerged for encoding the formal semantics of arbitrary proof languages; cf. Sect. 5.) (3) As systems change and mature, older versions of proof certificates can become stale and unmaintained. (4) Perhaps most importantly, many

<sup>1</sup> In [12], the adjective *autarkic* was applied to computational components of a proof checker but not to an entire proof checker.

U. Sattler and M. Suda (Eds.): FroCoS 2023, LNAI 14279, pp. 176–194, 2023. https://doi.org/10.1007/978-3-031-43369-6\_10

popular reasoning systems do not produce proof certificates at all. Prominent examples of the latter are SMT solvers, which are not certifying when memory size and execution time are critical [32], and the specification tool Twelf [42] when using non-certifying procedures (e.g., totality checking).

Formal reasoning systems that are *non-autarkic* have an additional way to interact with external provers that addresses many of the above issues. In such systems, a host system is designed to build *proof obligations* that are then dispatched to external systems to solve. While these external systems may produce proofs, the host system usually does not check the proofs and instead *trusts the executions* of the external systems. This system architecture is most commonly used in program verification tools such as Dafny [28], Why3 [24], and TLAPS [16]. One issue not addressed with this enlarged view of trust is that the external dependencies tend to have unclear descriptions, especially from a third-party perspective. To illustrate, Dafny may declare that it trusts "Z3 v.4.12.1", but what does this mean? Is this external dependency to be interpreted by name, in which case any tool called "Z3 v.4.12.1" can be used, or is it precisely identified by, e.g., (a cryptographic hash of) the source code (or better, an executable binary) of a particular tool called "Z3 v.4.12.1"? Even with a precise identification, an external executable dependency may not be practical to incorporate. For example, the HOL Light system [27] re-checks its entire standard library every time it is started, taking on the order of minutes. If a development involves many calls to an external HOL Light-based solver, how are the calls to be orchestrated?

In addition to these two bases of trust—autarkic based on proof certificates, and non-autarkic based on executions of external tools—there is at least one other basis of trust in any heterogeneous development: the *agents* that write and assemble the developments and execute the formal tools as required (checkers, solvers, etc.). An example of an agent is a user, although one individual user can have many *agent profiles* (see Sect. 3.2). Entities such as a trustworthy central database can also correspond to an agent. Trusted agents have been largely neglected in the formal reasoning world, but they are common in other high-reliability settings, such as security. Nevertheless, agents are at least implicitly present in any formal development: to claim that a result has been formally achieved is tantamount to saying that some trustworthy agent (e.g., peer reviewers) has *correctly and successfully* executed a specific collection of formal tools to convince themselves of that formal result. Furthermore, if one agent A *trusts* another agent B, there is no need for A to re-check B's proof scripts and re-execute any tools that B used to construct the result.

In this paper, we propose a framework where a *distributed* collection of agents can exchange formal results (called *assertions*), where the results have an unimpeachable *provenance*, and where each agent is in full control of their trust parameters. This *Distributed Assertion Management Framework* (DAMF) is:

– *Decentralized*: a global notion of *truth* is not imposed on every participant by means of a privileged logic, language, system, or software. This linguistic independence makes DAMF different from formalisms such as the *evidential tool bus* [20,38] that have been proposed for integrating external reasoning agents into a unified formal system. Participants in DAMF are free to combine assertions from different sources if they believe the combination to be meaningful. Any participant can *retrieve* and use any assertion they understand, and this external import will be explicitly marked as a *dependency* if they choose to *publish* assertions they build with such external imports.


Concretely, DAMF provides JSON-based representations of a small number of concepts such as formulas, assertions, dependencies, etc. *without* any upfront commitment to a formal syntax or any particular semantics. These objects are then added to a *global store* in terms of the *InterPlanetary File System* (IPFS) [13] using linked data in the *InterPlanetary Linked Data* (IPLD) format. An object in IPFS/IPLD is denoted by a *canonical* content identifier (cid), a cryptographic hash of its content. Knowing the cid is sufficient to retrieve the object by any participant of the IPFS network. Furthermore, the cids are the only externally visible *names* in DAMF, and links between objects are made using these cids by IPLD. Features specific to a particular language or system, such as constants, variables, definitions, and notations, are kept localized to particular *formula objects*. Assertions are built using (the cids of) formula objects and *signed* by their creator agents using public key cryptography. IPFS is used to distribute DAMF objects transparently using various technologies whose precise details are irrelevant to this paper.
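To make the content-addressing idea concrete, the following Python sketch models cids as hashes of a canonical serialization of an object's content. This is an illustration only: real IPFS/IPLD cids use multihash and multicodec encodings, and all names here (the `cid` helper, the sample objects) are hypothetical.

```python
import hashlib
import json

def cid(obj) -> str:
    """Illustrative content identifier: hash of a canonical JSON encoding.
    (Real IPLD cids use multihash/multicodec encodings; this is a stand-in.)"""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return "cid:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# A language object and a formula object that links to it by identifier.
language = {"format": "language", "content": "Coq 8.16.1"}
formula = {
    "format": "formula",
    "language": cid(language),  # link by cid, not by value
    "content": "forall n:nat, 8 <= n -> lincomb n 3 5",
    "context": ["Definition lincomb (n j k : nat) := "
                "exists x y, n = x * j + y * k."],
}

# The identifier is derived from the content: any change yields a new name,
# and structural comparison at the DAMF level reduces to comparing cids.
assert cid(formula) != cid({**formula, "content": formula["content"] + " "})
assert cid(formula) == cid(json.loads(json.dumps(formula)))
```

Because the identifier is a pure function of the content, any participant who knows a cid can independently verify that a retrieved object is the one that was named.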

This paper is accompanied by two concrete implementations that illustrate DAMF. First, we provide a tool called Dispatch that can be used by users and systems to both produce and consume DAMF assertions. Dispatch is not a privileged tool in DAMF: users and systems can interact directly with DAMF objects in IPFS if they so choose. Dispatch is simply one *interface* to the DAMF *global store*, making the integration of producers and consumers minimally demanding. It performs tasks such as schematically validating the concrete JSON objects added to or retrieved from the global store. Dispatch also helps to analyze and modify the trust parameters for (compositions of) assertions.

Second, we implement a version of the Abella interactive theorem prover [10] that can produce and consume assertions in DAMF, mediated by Dispatch. As an example of its use, we show how Abella can use a lemma that was stated and proved using the automated linear arithmetic reasoning tactics of Coq (v. 8.16.1); this lemma is manually translated from the Coq to the Abella language, with an explicit dependency on its Coq development, and added to the global store by the present authors. A user can accept this heterogeneous development as long as they trust Coq, Abella, and our translation of the Coq lemma to Abella. Moreover, this assertion, which contains explicit links to the externally sourced DAMF imports, can be published back to DAMF for use by others.

Since dependencies are explicitly tracked in DAMF assertions, any user can analyze how a given assertion was composed from other assertions. Such analysis can form the basis of various kinds of *investigations*: for example, if a formula is found to be a non-theorem, an investigator can explore the compositions of the DAMF assertions that yield that formula in order to find the agents whose trust parameters may need to be modified. The Dispatch tool mentioned above comes with a command called *lookup* that explores combinations of known assertions that ultimately yield a desired result; for each such composition, the analysis extracts the collection of agents (and tools) that could be *trusted* in order to accept that composition.

In the next section, we describe the abstract design of DAMF and its underlying logic of assertions which form the basis of the abovementioned investigations. Section 3 describes our concrete implementation of DAMF, Sect. 4 discusses some of the design choices in DAMF, and Sect. 5 discusses some related work. The specific software tools (Dispatch and Abella-DAMF) accompanying this paper are fully documented at https://distributed-assertions.github.io/.

# **2 Design of DAMF**

#### **2.1 Languages, Contexts, and Formulas**

To transfer a theorem from a source proof system to a target proof system, we must be able to transfer the statement of the theorem, which we represent as a *formula* object in DAMF. To be as general as possible, we represent the content of such a formula as a *string*, i.e., in a format suitable as an input to a parser of the source proof system. In order to determine that the input is well-formed, the source proof system may need further information about the *features*—symbols, predicates, functions, types, notations, hints, etc.—used in the formula. Such additional information is the *context* of the formula, which we represent as a document fragment in the language of the source proof system.

For example, take the following theorem written in Coq 8.16.1:

```
1 Definition lincomb (n j k : nat) := exists x y, n = x * j + y * k.
2 Theorem ex_coq : forall n:nat, 8 <= n -> lincomb n 3 5.
```
The formula corresponding to the theorem ex\_coq is the literal string "forall n:nat, ··· lincomb n 3 5". The symbols 8, <=, etc. are part of the standard prelude of this language, and the symbol lincomb is defined in line 1, so a context sufficient for Coq 8.16.1 to parse and type-check the theorem statement is the text of line 1, which is also written in the Coq 8.16.1 language.

Abstractly, a *formula object* in DAMF is a triple (L, Σ, F) where L denotes a *language*, Σ denotes a *context*, and F denotes a *formula*, all of which may conceptually be thought of as strings. We will use the schematic variable N to range over such formula objects. The language L is a canonical identifier (specifically, the cid of a DAMF language object) which may optionally represent information about a suitable loader for the language that will make sense of the strings Σ and F; DAMF compares languages just by their identifiers. Moreover, L is interpreted as defining all the globally available features; for instance, the symbol nat is part of the standard prelude of this version of Coq and should therefore be understood as being defined in the language Coq 8.16.1. The context Σ introduces any user-defined features such as the definition lincomb above that is not part of Coq's standard prelude.

Note that DAMF formula objects are considered to be *closed*, i.e., every symbol used in the formula is defined in the language or the context. From the perspective of DAMF, a formula object is an atomic entity. Additionally, DAMF does not need to be aware of any reasoning principles of the language or context components. For instance, no mechanism in DAMF would allow the substitution of a declared symbol in the context with a concrete definition. The purpose of dividing a formula object into three parts is purely pragmatic: the language part will in most cases be a well-known object used by many agents, and the context part may potentially be shared between multiple assertions. DAMF consumers may be able to use this sharing of information to consolidate tasks such as context processing.

#### **2.2 Sequents and Assertions**

A *sequent* in DAMF is abstractly of the form $N_1, \ldots, N_k \vdash N_0$ where each $N_i$ is a DAMF formula object as defined in the previous subsection. We will use the schematic variable $\Gamma$ to range over ordered lists of formula objects, and $S$ to range over sequents. In a sequent $\Gamma \vdash N$, we say that $N$ is the *conclusion* and $\Gamma$ are the *dependencies*. Such sequent objects may be produced whenever a formal proof has been checked in a proof checker: the conclusion represents the statement of the theorem, and the dependencies are external lemmas that were used during that proof. As an example, suppose the Coq 8.16.1 theorem in Sect. 2.1 has a proof that appeals to the lemma lem : forall m n, m <= n -> S m <= n \/ m = n. The sequent that is produced is conceptually of the form $\text{lem} \vdash \text{ex\_coq}$, though concretely we would have to build DAMF formula objects by packaging the language and contexts.
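A sequent of this shape can be sketched as a plain record whose fields hold the cids of its formula objects. The following Python fragment is a minimal illustration; the cid values are hypothetical placeholders, not real identifiers.

```python
# Hypothetical placeholder cids for the formula objects of lem and ex_coq.
CID_LEM = "cid:lem-formula-object"
CID_EX_COQ = "cid:ex_coq-formula-object"

# A sequent pairs a conclusion with an ordered list of dependencies,
# all referred to by the cids of their DAMF formula objects.
sequent = {
    "format": "sequent",
    "dependencies": [CID_LEM],   # external lemmas used in the proof
    "conclusion": CID_EX_COQ,    # statement of the proved theorem
}
```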

An *agent* is a globally unique name. We use the schematic variable $K$ to range over agents. We define a simple multi-sorted first-order logic where agents and sequents are primitive sorts and where the infix predicate *says* is the sole predicate; the atomic formula $K$ *says* $S$, where $K$ is an agent and $S$ a sequent, is an *assertion*. The *says* predicate is implemented in DAMF using public-key cryptography. In a DAMF-aware proof system, when an appeal is made—say as part of the proof of some other theorem—to an assertion $K$ *says* $(N_1, \ldots, N_k \vdash N_0)$, the appeal is interpreted as follows:


#### **2.3 Adapters**

Because every formula object packages the formula together with its context and language identifier, every formula object is independent of every other formula object. Thus, in a sequent $N_1 \vdash N_0$, there is no requirement that the conclusion $N_0$ and the dependency $N_1$ be in the same language or have a common context. When working within a single autarkic system (e.g., a proof checker using a single logic), the sequents that are generated for every theorem will probably place the conclusion and dependencies in the same language and context; however, in the wider non-autarkic world, we can use multilingual sequents as first-class entities that are documented and tracked in the same way as any other kind of sequent.

An important class of multilingual sequents comes from *adapters*. In order for a theorem written in the Coq 8.16.1 language to be used by a different system with a different language, say Abella 2.0.9, we will need to transform the formula objects in the former language to those in the latter language. This kind of translation is an example of a *language adapter*, which falls into the general class of *adapters*, and which creates a sequent by translating between languages or modifying the logical context by standard logical operations such as weakening (adding extra symbols), instantiation (replacing a symbol by a term), or unfolding (replacing a defined symbol by its definition).

As an example, the Coq 8.16.1 example above can be translated to the Abella 2.0.9 language as follows, where the function symbols + and \* are replaced by relations in Abella.<sup>2</sup>

```
1 Import "nats". % some natural numbers library
2 Define lincomb : nat -> nat -> nat -> prop by
3 lincomb N J K := exists X Y U V,
4 times X J U /\ times Y K V /\ plus U V N.
5 Theorem ex_ab : forall n, nat n -> le 8 n -> lincomb n 3 5.
```
Lines 1–4 determine the context $\Sigma_{\text{ex\_ab}}$ for the formula ex\_ab on line 5.

The sequent that represents this translation therefore has the form

$$\left(\text{Coq 8.16.1}, \Sigma_{\text{ex\_coq}}, \text{ex\_coq}\right) \vdash \left(\text{Abella 2.0.9}, \Sigma_{\text{ex\_ab}}, \text{ex\_ab}\right).$$
Suppose agent $K_1$ signs this translation and that agent $K_2$ signs the sequent $\vdash \left(\text{Coq 8.16.1}, \Sigma_{\text{ex\_coq}}, \text{ex\_coq}\right)$. As long as $K_1$ and $K_2$ are trusted by the user of Abella 2.0.9, then the formula object $\left(\text{Abella 2.0.9}, \Sigma_{\text{ex\_ab}}, \text{ex\_ab}\right)$ can also be treated as a theorem by that user thanks to *composition*, discussed next.

<sup>2</sup> This encoding of functions using relations is the usual one: see [17] for details.

#### **2.4 Composing Assertions, Trust**

Assertions are composed by means of a single inference rule, Compose, which implements a cut-like rule for sequents.

$$\frac{K \text{ says } (\Gamma_1 \vdash M) \qquad K \text{ says } (M, \Gamma_2 \vdash N)}{K \text{ says } (\Gamma_1, \Gamma_2 \vdash N)} \text{ Compose}$$

A consequence of this rule is that the *says* predicate does not correspond one-to-one with cryptographic signatures. The conclusion of the Compose rule may, in particular, not be a sequent that has been explicitly signed by the agent $K$ even if both premises are. Rather, the rule states that whenever $K$ can be said to reliably claim, *either* by a cryptographic signature *or* by a Compose derivation tree, both $\Gamma_1 \vdash M$ and $M, \Gamma_2 \vdash N$, then $K$ must also reliably claim $\Gamma_1, \Gamma_2 \vdash N$.

There are many variations of *access control logics* in the literature. For example, some such logics use inference rules such as:

$$\frac{\Gamma \vdash N}{K \text{ says } (\Gamma \vdash N)} \quad \text{or} \quad \frac{K \text{ says } (\Gamma \vdash N)}{K \text{ says } (K \text{ says } (\Gamma \vdash N))}.$$

Such rules are either not syntactically well-formed in our setting or not desirable for our purposes; we use here a very weak access control logic (see [1] for a survey of such logics). Checking the validity of a given derivation using Compose is computationally trivial: each instance of the rule must eliminate exactly the leftmost dependency of the second premise, which is a DAMF formula object compared by cid.
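This check can be sketched in Python as follows. The sketch is a simplified model (not part of Dispatch): sequents are represented as an ordered tuple of dependency cids together with a conclusion cid, and one Compose instance eliminates exactly the leftmost dependency of the second premise after comparing it, by cid, with the conclusion of the first premise.

```python
from typing import NamedTuple

class Sequent(NamedTuple):
    deps: tuple       # ordered dependency cids
    conclusion: str   # conclusion cid

def compose(p1: Sequent, p2: Sequent) -> Sequent:
    """One instance of Compose: eliminate exactly the leftmost dependency
    of the second premise, which must equal (by cid) p1's conclusion."""
    if not p2.deps or p2.deps[0] != p1.conclusion:
        raise ValueError("leftmost dependency of second premise does not match")
    return Sequent(p1.deps + p2.deps[1:], p2.conclusion)

# K says (G1 |- M)  and  K says (M, G2 |- N)  yield  K says (G1, G2 |- N):
s1 = Sequent(("cidA",), "cidM")         # G1 = [A], conclusion M
s2 = Sequent(("cidM", "cidB"), "cidN")  # deps [M, B], conclusion N
assert compose(s1, s2) == Sequent(("cidA", "cidB"), "cidN")
```

Since the comparison is a string equality on cids, validating a whole Compose derivation is linear in its size.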

Observe that the agent $K$ does not participate in a meaningful way in a derivation that is built with the Compose rule. Thus, for a given end sequent of the form $K$ *says* $(\vdash N)$, a Compose derivation can be seen as a *proof outline* for the desired theorem $N$, with the leaves of the derivation being the assertions that need to be sourced from an assertion database (such as the DAMF global store). We say that an assertion ($K$ *says* $S$) is *published* if it can be retrieved from such a database. The inference system is then enlarged with the following rule, which can be used to complete the open leaves of the Compose derivation using assertions made by different agents.

$$\frac{(K_1 \text{ says } S) \text{ is published}}{K_2 \text{ says } S} \text{ Trust } [K_1 \mapsto K_2]$$

This rule is parameterized by a pair of agents, $K_1$ and $K_2$, and is understood to be applicable only when $K_1$ is in the user-specified *allow list* of $K_2$ (i.e., $K_1$ *speaks for* $K_2$, which we write as $[K_1 \mapsto K_2]$).

We do not assume that agents have any additional closure properties beyond Compose and Trust. For example, suppose $N_A$, $N_{A \to B}$, and $N_B$ are the formula objects that correspond to the formulas $A$, $A \to B$, and $B$, respectively, in some language. We do not assume that the following rule is admissible:

$$\frac{K \text{ says } (\Gamma \vdash N_{A \to B}) \qquad K \text{ says } (\Gamma \vdash N_A)}{K \text{ says } (\Gamma \vdash N_B)} \text{ MP}$$

That is, we do not assume that the formulas asserted by agent $K$ are closed under modus ponens. Similarly, we do not assume that what agents assert is closed under substitution or instantiation of any symbols that are defined in the contexts of the formula objects. While a particular agent may not be closed under modus ponens, substitution, or instantiation, it is possible to employ other agents that look for opportunities to apply such inference rules to the results of trusted agents. In particular, if we want the query engine to be able to use the MP rule, then the engine must construct an agent $K_{\mathrm{mp}}$ whose sole function is to generate assertions such as $K_{\mathrm{mp}}$ *says* $(N_{A \to B}, N_A \vdash N_B)$ that correspond to applications of the MP rule. Of course, $K_{\mathrm{mp}}$ will need to be in the *allow list* of any agent wanting to use this agent.

#### **2.5 Producing Assertions, Formal Reasoning Tools**

Conceptually, an agent constructs a DAMF sequent as a consequence of running formal reasoning tools such as proof checkers or theorem provers. DAMF includes *tool objects*, which are unconstrained JSON objects that can be used to describe such tools. A *tool object* does not necessarily describe an implemented tool; it might describe part of one or, for instance, give an abstract description of the logical system in which the sequent is asserted. As with languages in Sect. 2.1, we compare tools for equality by means of the cids of these tool objects. It is also possible for an agent to build a DAMF sequent manually, without running any tool. The agent may do this for a number of reasons: e.g., the assertion may be a *conjecture* (i.e., a proof may be provided at some other time but is currently missing) or a manually produced *adapter*.

A DAMF *production* is a sequent that is annotated with a *mode* that describes how the sequent was produced; this mode can be the cid of a tool object as described above, or it can be *null*, expressing an *unproven* sequent. We use the schematic variable $T$ for modes, and write a production of the sequent $\Gamma \vdash N$ with mode $T$ as $\Gamma \vdash_T N$. Published DAMF assertions will be of the form $K$ *says* $(\Gamma \vdash_T N)$, and we modify the Trust rule to the following:

$$\frac{\left(K_1 \text{ says } (\Gamma \vdash_T N)\right) \text{ is published}}{K_2 \text{ says } (\Gamma \vdash N)} \text{ Trust } [K_1/T \mapsto K_2]$$

where the side condition $[K_1/T \mapsto K_2]$ means that $K_2$ allows $K_1$'s assertions in mode $T$. It may be tempting to think of $K_1/T$ as an agent by itself, but, as we shall see in Sect. 3.1, agents are implemented in DAMF using key-pairs, so if $K_1/T_1$ and $K_1/T_2$ were separate agents then there would be no verifiable way to link them both to $K_1$. This use of modes makes it possible, for example, to trust an agent $K$ using any version of Coq while not trusting $K$ when using other proof systems.
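The side condition can be modeled as a lookup in a per-consumer allow list of (agent, mode) pairs. The following Python sketch uses hypothetical agent names and mode identifiers; it is an illustration of the check, not the Dispatch implementation.

```python
# Hypothetical allow list: K2 accepts a published assertion by K1 produced
# in mode T only if the pair (K1, T) is listed, i.e. [K1/T -> K2] holds.
allow_list = {
    "K2": {("K1", "coq-8.16.1"), ("K1", "coq-8.17.0")},  # trust K1's Coq runs
}

def trust(consumer: str, producer: str, mode: str) -> bool:
    """True iff `consumer` allows `producer`'s assertions in mode `mode`."""
    return (producer, mode) in allow_list.get(consumer, set())

assert trust("K2", "K1", "coq-8.16.1")
assert not trust("K2", "K1", "abella-2.0.9")  # K1 not trusted with Abella
```

Keying the list by pairs rather than by a synthetic agent "K1/T" keeps the producer's identity tied to a single key-pair, as the surrounding text requires.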

#### **2.6 Logical Consistency of Heterogeneous Combinations**

DAMF imposes no constraints on the composition of assertions, which can at first glance appear risky. For example, the assertions may come from incompatible logics, say an assertion proved in classical logic used during the proof of an intuitionistic theorem. Without exceptional care, the result of a Compose will be only classically, not intuitionistically, true. Similar problems arise if the imported assertion requires additional axioms that are incompatible with the user's setting (e.g., extensionality or UIP in a univalent setting).

This issue highlights the fact that DAMF *does not* guarantee logical compatibility of assertions; rather, DAMF is more accurately seen as a *record* of compositions that have been made. To trust an agent's assertion is just to say that we trust that the agent indeed had good reasons (such as a proof) to make that assertion, *not* that the assertion may be arbitrarily composed. Moreover, DAMF assertions are intended to be read as *hypothetical statements* from dependencies to conclusions (where "*hypothetical*" is understood in the informal language of discourse rather than as a formal implication or entailment). If the dependencies cannot be met, the assertion is useless. To illustrate, if an agent $K$ wants to use an assertion $\Gamma \vdash M$ in their proof of $N$, the assertion they will publish is $K$ *says* $(M \vdash N)$, which is acceptable in isolation; if $M$ is incompatible with the logic of $N$, then the assertion $K$ *says* $(M \vdash N)$ is vacuous.

## **3 Implementation: Information, Processes, and Tools**

#### **3.1 The Structures of the Global Store**

A crucial design criterion of DAMF is that the assertions and their constituent objects are a globally shared commodity, existing independently of the tools that produce or consume them. To this end, DAMF requires well-defined basic structures that producers would produce and consumers would expect and know how to address.

The use of a content-addressing scheme is an essential part of seeing these structures as global. Each structure is identified and addressed by a unique global identifier in a common namespace in an independently verifiable and trusted way: the identifier is derived from the content itself, and every alteration of the content produces a new identifier. At the DAMF level, *the content is the name/address*, and comparing two objects structurally reduces to comparing their cids as strings. One way to handle differences in cids between different forms of conceptually the same DAMF object is through curation and normalization of such structures at the level of producers or potentially other DAMF actors.

The structures we may want to specify in DAMF are built by composing several elements; for instance, a *sequent* contains *formula* structures, which themselves contain *context* structures. In DAMF, we make the design choice to treat all such structures as *first-class* objects stored in a distributed network through IPFS, and to use the linked data representation of IPLD to represent an object as being composed of other objects.

The core DAMF structures we define are *context*, *formula*, *sequent*, *production*, and *assertion*. Concretely, these structures are represented as JSON objects with a varying "format" property whose value is the type of the structure. These structures are described as follows (full definitions in [4, Appendix A]):


Given these schemata, the aspects of tracking and trusting become natural: a formula present as a dependency in some assertion could be matched with the same formula present as the conclusion of a different assertion.

It is also useful to annotate these core DAMF objects with additional metadata such as external names, proof objects, timestamps, etc. In DAMF, we have chosen to give the core objects a cid independent of the metadata; instead, for every core object, we define an *annotated* object that is composed of a link to the core object and a link to any additional metadata. DAMF follows the design principle that objects are considered equal at the DAMF level if they have the same cid: the content of the objects is not examined, and no IPLD links are followed for such comparisons. Generally speaking, therefore, DAMF core objects will not link to annotated objects, since the annotations would factor into the cids and force disequality when undesired, such as when building compositions (Sect. 2.4). The sole exception to this rule of thumb is assertion objects, which can use annotated production objects as their claims. Note that every assertion object is globally unique when produced: it has a different cid each time its claim is signed, even if signed by the same agent, because cryptographic signatures always include a nonce.

Another layer of structures that can aggregate global object references are *collections*. We currently define one generic *collection* format in our implementation: many other non-generic collection formats can easily be considered.

#### **3.2 Processes in DAMF, and Dispatch as an Intermediary Tool**

The two obvious processes in DAMF are the *production* and *consumption* of DAMF objects. In a *production* process, DAMF objects are constructed starting from local information, published, and then stored across the distributed network. The *consumption* process runs in the opposite direction: locally consumable information is constructed from DAMF objects. The important point is that these DAMF objects are common and well understood (as DAMF formats) by all consumers, and each consumer decides what to consume and how to consume it. For example, a consumer might choose to read only formulas in some specific language, and then decide how to process their internal structures based on its own criteria. Beyond these two, other processes operate on published DAMF objects to combine, curate, and analyze them. The first such process we consider in our implementation is *lookup*, which is discussed further below.

Individual producers and consumers, such as theorem provers, can choose to implement some or several of these DAMF processes. However, many aspects of dealing with linked data and IPFS are common to such tools, so we describe an intermediary tool called Dispatch that simplifies the interactions between these producers and consumers and the DAMF global store. Of course, Dispatch would then be part of the *trusted code base*, along with IPFS and any utilities used to manipulate JSON data and cryptographic signatures. If this is problematic, Dispatch can be forgone entirely in favor of native implementations.

The Dispatch tool is distributed as an executable dispatch with three subcommands: publish, get, and lookup. The dispatch publish command operates on one of a collection of standard input formats that contains local information corresponding to DAMF types. After syntactically validating this input, the publish command constructs and publishes the global objects. Dispatch can also optionally interact with a specific storage service in order to make an object widely discoverable in the IPFS network. As an example, consider the following input for an *assertion* object, where newly created formulas and contexts are placed in the same file and are referred to by local names such as plus\_comm, while previously existing objects are referred to by their cids using the damf: prefix, such as the first value of "dependencies" (line 10), which refers to a *formula* object cid, and the "language" and "mode" values, which refer to existing *language* and *tool* objects, respectively.

```
1 { "format": "assertion",
2 "agent": "localAgent",
3 "claim": {
4 "format": "annotated-production",
5 "annotation": ...,
6 "production": {
7 "mode": "damf:bafyreihnx2...",
8 "sequent": {
9 "conclusion": "plus_comm",
10 "dependencies": [ "damf:bafyreihw6g...", "plus_succ" ] } } },
11 "formulas": {
12 "plus_comm": {
13 "language": "damf:bafyreidyts...",
14 "content": ": forall M N K, nat K -> ...",
15 "context": ["plus"] },
16 "plus_succ": {
17 "language": "damf:bafyreidyts.....",
18 "content": ": forall M N K, ...",
19 "context": ["plus"] } },
20 "contexts": {
21 "plus": {
22 "language": "damf:bafyreidyts.....",
23 "content": [
24 "Kind nat type.", "Type z nat.", "Type s nat -> nat.",
25 "Define plus : nat -> nat -> prop by ...." ]}}}
```
This example is based on an output from our Abella-DAMF prover described below. A prover using the Dispatch tool only needs to be able to produce and consume JSON objects with this structure, without needing to interface with IPFS directly. The value of "agent" (line 2) refers to an *agent profile* in Dispatch; each profile maps a user-readable name to a cryptographic key-pair, created separately using the dispatch create-agent command.

The dispatch get command takes a cid as an argument, fetches the IPLD DAG (the full JSON object) referenced by it from the global store, validates the types of all constituent IPLD-linked objects, verifies any signatures, and finally outputs a JSON object that is similar in structure to that accepted by dispatch publish. The consumer thus has access to all the necessary DAMF objects referenced by the root cid without needing to interact with the global store or to structurally validate any objects. The only difference between the output of dispatch get and the input of dispatch publish is that the local names that appeared in the input are replaced by cids (i.e., *global names*) in the output. Input and output formats corresponding to other global types are described further at the site mentioned in the introduction.<sup>3</sup>

The dispatch lookup command, as mentioned earlier, is the first process we consider in our implementation for the combination and analysis of DAMF assertions. Given a formula cid and a collection of assertion cids, the output of this command is a list of potential sets of (agent, mode/tool) pairs that correspond to combinations of assertions that would yield the target formula. Any remaining unmatched dependencies are also output along with the (agent, mode/tool) pairs. In our current implementation, Dispatch exhaustively generates all possible ways of constructing the target formula. A direct improvement would be to make this aspect of the tool more interactive, allowing incremental exploration of such dependencies. In addition, filtering through allow lists would reduce the number of assertion combinations generated by this command.
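The exhaustive generation performed by lookup can be modeled as a backward search from the target formula: every assertion concluding the current goal is tried, and each goal may alternatively be left as an unmatched dependency. The Python sketch below is a simplified model of this behavior (not the actual Dispatch code); agent names, modes, and cids are hypothetical.

```python
def lookup(target, assertions):
    """Enumerate combinations of `assertions` yielding `target` (a formula
    cid). Each assertion is a dict with 'agent', 'mode', 'deps', 'conclusion'.
    Results are pairs (frozenset of (agent, mode), frozenset of unmatched deps)."""
    results = set()

    def derive(goal, seen):
        # Ways to establish `goal`: leave it as an open dependency, or use an
        # assertion concluding it and recursively derive its dependencies.
        options = [(frozenset(), frozenset([goal]))]
        for a in assertions:
            if a["conclusion"] == goal and goal not in seen:
                combos = [(frozenset([(a["agent"], a["mode"])]), frozenset())]
                for dep in a["deps"]:
                    combos = [
                        (pairs | p2, un | u2)
                        for (pairs, un) in combos
                        for (p2, u2) in derive(dep, seen | {goal})
                    ]
                options.extend(combos)
        return options

    for pairs, unmatched in derive(target, frozenset()):
        if pairs:  # keep only combinations that use at least one assertion
            results.add((pairs, unmatched))
    return results

# Hypothetical assertions: K1 proved "lem" in Coq; K2 adapted "lem" to "thm".
A = [
    {"agent": "K1", "mode": "coq", "deps": [], "conclusion": "lem"},
    {"agent": "K2", "mode": "adapter", "deps": ["lem"], "conclusion": "thm"},
]
outs = lookup("thm", A)
# One combination uses both assertions and leaves nothing unmatched:
assert (frozenset({("K1", "coq"), ("K2", "adapter")}), frozenset()) in outs
# Another uses only K2's assertion, leaving "lem" as an unmatched dependency:
assert (frozenset({("K2", "adapter")}), frozenset({"lem"})) in outs
```

Each result thus identifies a set of (agent, mode) pairs that a consumer would have to trust, plus any dependencies still left open, matching the command's described output.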

#### **3.3 Edge Systems Example: Abella**

We have implemented a DAMF-aware branch of Abella [10] as an example of a system that interacts with assertions in DAMF with the help of Dispatch as a mediator. Abella was originally designed to test a particular approach to meta-theoretic reasoning using a new, proof-theoretically motivated mechanism for reasoning directly with bound variables (in particular, the ∇-quantifier [30] and a treatment of equality based on equivariant higher-order unification [26]). While the current implementation of Abella has succeeded with those meta-theoretic tasks [22,41], the prover has not grown much beyond that domain. Indeed, Abella has some (mis)features that make it a good test case for DAMF: (1) it has no awareness of the file system, so it is easy to replace the backing store from local files to objects stored in IPFS; (2) it has a feature-poor proof language with nearly no support for proof automation and hence underdeveloped formal mathematical libraries; and (3) it uses *relational* specifications as opposed to the more common *functional programming* specifications. Furthermore, the area of meta-theory that Abella treats declaratively is one with which many conventional proof systems do not deal well, in part because of the need to encode and manipulate bindings [9,23]. Such conventional systems might be willing to delegate such meta-theoretic reasoning to Abella.

<sup>3</sup> https://distributed-assertions.github.io/.

Ordinary Abella developments (in .thm files) support a kind of *import* mechanism which loads in marshaled results from a different run of Abella. We extend *import* with a new kind of statement: Import "damf:bafyr..." that refers to a collection of DAMF assertions (i.e., a DAMF collection object whose elements are assertions). Dispatch is used to fetch all the referenced objects from IPFS as explained in the previous subsection.

To appeal to an assertion, the elements of the context of the conclusion of the assertion are *merged* using their internal names with the ambient context of Abella where the assertion is appealed to. An Abella declaration in the context is *mergeable* if it has both the same internal name and an identical (up to λ-equivalence) definition; thus, type and term constants are merged if they have the same kinds or types (respectively), and (co-)definitions are merged if they have the same definitional clauses. This is done to keep the implementation simple and mostly unchanged from the standard (non-DAMF) Abella, which also only allows an Import declaration when the imported objects can be merged.
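The merge criterion can be sketched as follows, with declarations modeled as a map from internal names to definition texts. For simplicity the sketch compares definitions by string equality, whereas Abella-DAMF compares them up to λ-equivalence; the function and data below are hypothetical.

```python
def merge_contexts(ambient: dict, imported: dict) -> dict:
    """Merge imported declarations (name -> definition text) into the ambient
    context; identical duplicates merge, conflicting definitions are rejected."""
    merged = dict(ambient)
    for name, defn in imported.items():
        if name in merged and merged[name] != defn:
            raise ValueError(f"cannot merge declaration {name!r}: definitions differ")
        merged[name] = defn
    return merged

ambient = {"nat": "Kind nat type.", "z": "Type z nat."}
imported = {"nat": "Kind nat type.", "s": "Type s nat -> nat."}
assert merge_contexts(ambient, imported) == {
    "nat": "Kind nat type.", "z": "Type z nat.", "s": "Type s nat -> nat."}
```

An assertion whose context fails this check simply cannot be imported, mirroring the all-or-nothing behavior of Abella's Import declaration.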

When the proof of a theorem is completed in Abella, a sequent object is constructed with the dependencies being all the DAMF lemmas appealed to in the proof, and the conclusion being the statement of the theorem (the formula) in the context of all its necessary declarations, computed using a dependency analysis. We use only the necessary declarations to allow such DAMF sequents to have the widest possible uses, since a DAMF assertion can only be used in Abella if the *entire* context of the conclusion can be merged.

A full example of an Abella development that makes use of imported assertions from Abella, Coq, and λProlog can be found in [4, Appendix B]. In this example, Coq and λProlog are not modified at all, and Abella is only minimally modified to use Dispatch to interact with DAMF assertions. The total amount of modifications to Abella to interface with Dispatch amounts to about 100 lines of code, most of which deals with (un)marshalling JSON. We expect that making tools DAMF-aware would require negligible effort.

#### **4 Discussion: Design Choices and Alternatives**

#### **4.1 The Role of Formal Proofs**

Autarkic theorem provers often exploit the existence of proofs for several reasons. Obviously, the ability to check a fully detailed proof object in their own kernel, following the *De Bruijn criterion* [11], is central. But proofs can also be used for various other roles. For example, they sometimes contain constructive content that can be extracted as executable programs, and they can be used as guides during the development and maintenance of other proofs. Given their central role in many proof assistants, a great deal of effort has gone into the formalization, manipulation, and transformation of formal proof objects; see, for example, MMT [35], Logipedia [21], and foundational proof certificates [18]. As a concrete matter, proof objects can be included in the annotations of annotated productions in the global store of DAMF. Sequents are linked in productions by their cids, so it is possible for the same sequent to have multiple proof objects contributed by different agents in separate assertions.
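The linking of sequents and assertions by content id can be illustrated with a toy scheme. Real DAMF objects are addressed by IPFS cids (multiformat hashes of dag-cbor blocks); the sha256-over-canonical-JSON below is only a simplified stand-in:

```python
import hashlib
import json

def content_id(obj):
    """Toy content id: sha256 of a canonical JSON encoding.
    (Stand-in for real IPFS cids, which hash dag-cbor blocks.)"""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return "toy:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two assertions by different agents can attach different proof objects
# to the same sequent: both reference the sequent's single cid.
# (Field and agent names here are illustrative, not the DAMF schema.)
sequent = {"dependencies": [], "conclusion": "example formula"}
assertion_a = {"sequent": content_id(sequent), "agent": "alice", "proof": "p1"}
assertion_b = {"sequent": content_id(sequent), "agent": "bob", "proof": "p2"}
```

Because the id is a function of the content alone, structurally identical sequents produced by different tools collapse to the same object, while the two assertions remain distinct objects with distinct ids.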

#### **4.2 Potential Benefits to Mainstream Systems**

The fact that proof objects are not central to DAMF and the example presented in Sect. 3.3 might lead the reader to believe that the only beneficiaries of DAMF are new systems that want to leverage existing developments in mainstream systems. This belief is not necessarily true for two reasons. First, there are certain logical systems and formalization styles that are inordinately complicated or impossible to do in mainstream systems. Good examples are nominal sets [34], λ-tree syntax (a.k.a. *higher-order abstract syntax* ) [2,23], generic judgments [30], and nominal abstraction [26]. It is conceivable that a mainstream prover can use DAMF to import a formalization such as the proof of soundness of Howe's method done in the setting of higher-order abstract syntax and contextual modal type theory [31], which is at present not available in a mainstream proof system such as Coq or Agda.

A second benefit to mainstream systems is to enable more trustworthy refactoring of their existing implementations. For example, modern autarkic provers routinely recheck large collections of proofs, often after every invocation of a new instance of the proof checker and certainly after every change in the version of the prover. Because such proofs must be rechecked so often, implementers of proof checkers tend to optimize their kernels for efficiency. However, such optimizations can add greater complexity to a kernel, making errors in the kernel more likely to occur. With DAMF, once a trustworthy but slow kernel—e.g., a certified implementation of a kernel [39]—checks a proof, it rarely needs to be rechecked. This can even lower the pressure for kernel implementations to chase performance with increasing, error-prone complexity. Furthermore, the immutable nature of IPFS objects makes DAMF assertions resistant to malicious subversion of the proper execution of a tool – see, for example, the discussion in [5] concerning attacks on Coq's .vo object files.

#### **4.3 Other Use Cases**

While it is common to view tools that perform pure computations (such as functional program execution or proof search à la λProlog) as producing assertions without proofs, various well-known reasoning systems have seen extensive use without being either certified or certifying: for example, Twelf [33]. DAMF would enable Twelf-based assertions to be exported to agents willing to trust its type and totality checkers.

The relationship of DAMF to the following topics is discussed in greater detail in the technical report [3]: libraries as curation on top of the DAMF model of global objects; attacks in the adversarial environment of the web; and possible uses of this framework in settings (such as journalism) where the lack of formal proof means increasing the need to explicitly track trust.

#### **5 Related Work**

The *semantic web* [14,15] was proposed to enrich the web with aspects of trust and would rely on concepts and technologies such as cryptography, taxonomies, ontologies, and inference rules. While the semantic web and DAMF both use cryptographic signatures and low-level web-based technologies, DAMF differs from the semantic web by focusing on objects rather than documents and using richer notions of logic and compositional reasoning.

Dedukti [8] is a dependently typed λ-calculus augmented with rewriting. Dedukti can be used to produce adapters (Sect. 2.3): in particular, proofs in a source system can be transformed to Dedukti proofs and then transformed back into formal proofs in a different system. For example, the Logipedia documentation mentions that "some proofs expressed in some Dedukti theories can be translated to other proof systems, such as HOL Light, HOL 4, Isabelle/HOL, Coq, Matita, Lean, PVS, ..." [29]. As a by-product, Dedukti can be used to build correctness-preserving translations of assertions for DAMF.

TPTP [40] provides a number of standards for the concrete syntax of first-order and higher-order logic along with tools for parsing and printing files that adhere to such standards. Deploying those tools for the production of the kind of multilingual adapters that we have described in Sect. 2.3 is a natural next step for tool development within DAMF.

The recognition that some aspects of proof environments can be distributed goes back at least to the systems described by Sacerdoti Coen et al. [7,19]. In such systems, integration was meant to work between "near-peer" systems: that is, between systems that are both based on rich logics such as higher-order logic or on typed λ-calculi based on the Curry-Howard correspondence. A prerequisite for successful integration in such systems is the ability to connect the semantics of formulas, types, universes, proofs, etc. The widespread use of such integration approaches has been delayed since it has only been in recent years that efforts such as Dedukti [8] and MMT [36,37] are making it possible to form the necessary deep and sophisticated ties between the semantics of these objects arising from different implementations.

In contrast, DAMF allows the composition of different assertions without an a priori assumption that there is a formal semantics that relates them. Of course, correctness is a concern in many (most) situations: in those cases, Dedukti and MMT encodings can be used to translate assertions between two provers with precise correctness assurances. Often, however, the integration is of a more asymmetric kind. For example, when integrating a system that only performs integer operations or reasons only with integer inequalities (operations that are available in SMT systems) with a system based on higher-order logic, producing adapters based on sophisticated encodings might be completely unnecessary. The DAMF system similarly allows such integration.

## **6 Conclusion**

We have described a Distributed Assertion Management Framework (DAMF) designed to share assertions between agents while tracking dependencies with canonical content ids (cids). This framework endows assertions with reliable provenance using public key cryptography and distributes them globally using the IPFS network. We have given an example of using DAMF to import a Coq lemma into Abella. The biggest challenge for future work is to adapt existing work on language translation and proof translation (in, e.g., Dedukti) to create or derive adapters automatically. Another important matter for future consideration is whether to persist compositions (i.e., Compose-derivations, cf. Sect. 2.4) to DAMF, which can serve as hints for post hoc investigations.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# An Abstract CNF-to-d-DNNF Compiler Based on Chronological CDCL

Sibylle Möhle(B)

Max Planck Institute for Informatics, Saarland Informatics Campus E1 4, 66123 Saarbrücken, Germany smoehle@mpi-inf.mpg.de

Abstract. We present Abstract CNF2dDNNF, a calculus describing an approach for compiling a formula in conjunctive normal form (CNF) into deterministic negation normal form (d-DNNF). It combines component-based reasoning with a model enumeration approach based on conflict-driven clause learning (CDCL) with chronological backtracking. Its properties, such as soundness and termination, carry over to implementations which can be modeled by it. We provide a correctness proof and a detailed example. The main conceptual differences to currently available tools targeting d-DNNF compilation are discussed and future research directions presented. The aim of this work is to lay the theoretical foundation for a novel method for d-DNNF compilation. To the best of our knowledge, our approach is the first knowledge compilation method using CDCL with chronological backtracking.

Keywords: Knowledge compilation · d-DNNF · Chronological CDCL

## 1 Introduction

In real-world applications, constraints may be modeled in conjunctive normal form (CNF), but many tasks relevant in AI and reasoning, such as checks for consistency, validity, clausal entailment, and implicants, can not be executed efficiently on them [9]. Tackling these and other computationally expensive problems is the aim of the knowledge compilation paradigm [13]. The idea is to translate a formula into a language in which the task of interest can be executed efficiently [22]. The knowledge compilation map [22] contains an in-depth discussion of such languages and their properties, and other (families of) languages have been introduced since its publication [21,25,29]. The focus in this work is on the language deterministic decomposable negation normal form (d-DNNF) [19]. It has been applied in planning [2,39], Bayesian reasoning [15], diagnosis [3,43], and machine learning [28] as well as in functional E-MAJSAT [40], to mention a few, and was also studied from a theoretical perspective [7,8,10]. Several d-DNNF compilers are available [20,30,37,48], as well as a d-DNNF reasoner<sup>1</sup>.

<sup>1</sup> http://www.cril.univ-artois.fr/kc/d-DNNF-reasoner.html.

© The Author(s) 2023

U. Sattler and M. Suda (Eds.): FroCoS 2023, LNAI 14279, pp. 195–213, 2023. https://doi.org/10.1007/978-3-031-43369-6\_11

Translating a formula from CNF to d-DNNF requires processing the search space exhaustively. The number of variable assignments which need to be checked is exponential in the number of variables occurring in the formula, and testing them one by one is out of the question from a computational complexity point of view. However, if the formula can be partitioned into subformulae defined over pairwise disjoint sets of variables, these subformulae can be processed independently and the results combined [4]. This may reduce the amount of work per computation significantly. Consider F = (a ∨ b) ∧ (c ∨ d) defined over the set of variables V = {a, b, c, d}. Its search space consists of 2⁴ = 16 variable assignments. The formula F can be partitioned into F₁ = (a ∨ b) and F₂ = (c ∨ d) defined over the sets of variables V₁ = {a, b} and V₂ = {c, d}, respectively, and such that F = F₁ ∧ F₂. Due to V₁ ∩ V₂ = ∅, d-DNNF representations of F₁ and F₂ can be computed independently and conjoined, obtaining a d-DNNF representation of F. Moreover, in each computation we only need to check 2² = 4 assignments. The subformulae F₁ and F₂ are called *components* due to the original motivation originating in graph theory, and the partitioning process is referred to as *decomposition* or *component analysis*. This approach, also called *component-based reasoning*, is realized in various exact #SAT solvers [1,4,11,12,41,42,47], and its success suggests that formulae stemming from real-world applications decompose well enough to generate a substantial amount of work saving.
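The decomposition step can be sketched as a small routine that merges clauses into variable-connected groups (a sketch, not the cited solvers' implementation; literals are DIMACS-style integers, so variable a ↦ 1, ¬a ↦ -1):

```python
def decompose(clauses):
    """Partition a clause set (clauses as sets of int literals) into
    components defined over pairwise disjoint sets of variables."""
    comps = []  # list of (clause_list, variable_set) pairs
    for c in clauses:
        vs = {abs(l) for l in c}
        # merge with every existing group that shares a variable
        touching = [g for g in comps if g[1] & vs]
        for g in touching:
            comps.remove(g)
        comps.append(([c] + [cl for g in touching for cl in g[0]],
                      vs.union(*(g[1] for g in touching))))
    return comps

# F = (a ∨ b) ∧ (c ∨ d) with a,b,c,d ↦ 1,2,3,4 splits into two components:
parts = decompose([{1, 2}, {3, 4}])
```

Each component can then be compiled over its own 2² assignments instead of the joint 2⁴.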

The formula F in our example satisfies *decomposability* [22], i.e., for each conjunction, the conjuncts are defined over pairwise disjoint sets of variables. We call such a formula *decomposable*. Negations occur only in front of literals, hence it is in decomposable negation normal form (DNNF) [17,18]. A formula in which for each disjunction its disjuncts are pairwise logically contradictory satisfies *determinism* [22], i.e., for each disjunction C₁ ∨ ... ∨ Cₙ it holds that Cᵢ ∧ Cⱼ ≡ ⊥ for i, j ∈ {1,...,n} and i ≠ j. A deterministic DNNF formula is said to be in d-DNNF. Determinism is also met by the language disjoint sum of products (DSOP), which is a disjunction of pairwise contradictory conjunctions of literals, and which is relevant in circuit design [5]. In previous work [34], we introduced an approach for translating a CNF formula into DSOP based on CDCL with chronological backtracking. The motivation for using chronological backtracking is twofold. First, it has been shown not to significantly harm solver performance [33,38]. Second, pairwise disjoint models are detected without the need for blocking clauses commonly used in model enumeration based on CDCL with non-chronological backtracking. Blocking clauses rule out already found models, but they also slow down the solver, and avoiding their usage in model enumeration by means of CDCL with chronological backtracking has empirically been shown to be effective [46]. Enhancing our former approach [34] with component-based reasoning enables us to compute a d-DNNF representation of a CNF formula. Reconsider our previous example, and suppose we obtained dsop(F₁) = a ∨ (¬a ∧ b) and dsop(F₂) = c ∨ (¬c ∧ d). Now F ≡ F₁ ∧ F₂, hence F ≡ dsop(F₁) ∧ dsop(F₂) = (a ∨ (¬a ∧ b)) ∧ (c ∨ (¬c ∧ d)), which is in d-DNNF.
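For disjunctions of cubes, determinism is a purely syntactic test: two cubes are logically contradictory exactly when they contain a complementary pair of literals. A small sketch (literals as DIMACS-style integers):

```python
from itertools import combinations

def contradictory(c1, c2):
    """Two cubes (sets of int literals) are contradictory iff they
    contain a complementary pair of literals."""
    return any(-l in c2 for l in c1)

def is_dsop(cubes):
    """Determinism for a disjunction of cubes: all pairs contradictory."""
    return all(contradictory(a, b) for a, b in combinations(cubes, 2))

# dsop(F1) = a ∨ (¬a ∧ b), with a,b ↦ 1,2:
dsop_f1 = [{1}, {-1, 2}]
```

Here `is_dsop(dsop_f1)` holds, whereas the disjunction a ∨ b, i.e. `[{1}, {2}]`, is not deterministic: both disjuncts are satisfied by a = b = ⊤.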

Our Contributions. We present Abstract CNF2dDNNF, ACD for short, a declarative formal framework describing the compilation of CNF into d-DNNF, together with a proof of its correctness. This abstract presentation allows for a thorough understanding of our method, and of its correctness, at a conceptual level. Since our framework is sound, every implementation which can be modeled by it is sound as well. This encompasses optimizations and implementation details, such as caches. ACD combines component-based reasoning and CNF-to-DSOP compilation based on conflict-driven clause learning (CDCL) with chronological backtracking. Disjunctions with pairwise contradictory disjuncts are introduced by taking decisions and subsequently flipping their values upon backtracking, while conjunctions whose conjuncts share no variable are introduced by unit propagation and decomposition. For the sake of simplicity, in our calculus formulae are partitioned into two subformulae. However, lifting it to an arbitrary number of subcomponents is straightforward, and a corresponding generalization is presented.

#### 2 Preliminaries

Let V be a set of propositional variables over the set of Boolean constants B = {⊥, ⊤}, with ⊥ denoting false and ⊤ denoting true. A *literal* is either a variable v ∈ V or its negation ¬v. We refer to the variable of a literal ℓ by var(ℓ) and extend this notation to sets and sequences of literals and formulae. We consider formulae in *conjunctive normal form (CNF)*, which are conjunctions of *clauses*, which are disjunctions of literals. A formula in *disjoint sum of products (DSOP)* is a disjunction of pairwise contradictory *cubes*, which are conjunctions of literals. Our target language is *deterministic decomposable negation normal form (d-DNNF)*, whose formulae are built of literals, conjunctions sharing no variables, and disjunctions whose disjuncts are pairwise contradictory. We might interpret formulae as sets of clauses or cubes, and clauses and cubes as sets of literals, writing C ∈ F and ℓ ∈ C to refer to a clause C in a formula F and a literal ℓ contained in a clause or cube C, respectively. The empty CNF formula and the empty cube are denoted by ⊤, and the empty DSOP formula and the empty clause by ⊥.

A *total variable assignment* is a mapping σ : V → B, and a *trail* I = ℓ₁ … ℓₙ is a non-contradictory sequence of literals which might also be interpreted as a (possibly partial) assignment, such that I(ℓ) = ⊤ iff ℓ ∈ I. Similarly, I(C) and I(F) are defined. We might interpret a trail I as a set of literals and write ℓ ∈ I to refer to the literal ℓ on I. The empty trail is denoted by ε and the set of variables of the literals on I by var(I). Trails and literals can be concatenated, written I J and I ℓ, given var(I) ∩ var(J) = ∅ and var(I) ∩ var(ℓ) = ∅. The position of ℓ on the trail I is denoted by τ(I, ℓ). The decision literals on I are annotated by a superscript, e.g., ℓᵈ, denoting open "left" branches in the sense of the Davis-Putnam-Logemann-Loveland (DPLL) algorithm [23,24]. *Flipping the value of a decision literal* can be seen as closing the corresponding left branch and starting a "right" branch, where the decision literal ℓᵈ becomes a *flipped literal* ¬ℓ.

The *residual* of F under I, written F|_I, is obtained by assigning the variables in F their truth value and by propagating truth values through Boolean connectives. The notion of residual is extended to clauses and literals. A *unit clause* is a clause {ℓ} containing one single literal ℓ. By units(F) (units(F|_I)) we denote the set of unit literals in F (F|_I). Similarly, decs(I) denotes the set of decision literals on I. By writing ℓ ∈ decs(I) (ℓ ∈ units(F), ℓ ∈ units(F|_I)), we refer to a decision literal on I (a unit literal in F, F|_I). A trail I *falsifies* F if I(F) ≡ ⊥, i.e., F|_I = ⊥. It *satisfies* F, written I |= F, if I(F) ≡ ⊤, i.e., F|_I = ⊤, and is then called a *model* of F. If var(I) = V, then I is a *total model*; otherwise, it is a *partial model*.

The trail is partitioned into *decision levels*, each starting with a decision literal and extending until the literal preceding the next decision. The *decision level function* δ : V → ℕ ∪ {∞} returns the decision level of a variable v ∈ V. If v is unassigned, δ(v) = ∞, and δ is updated whenever a variable is assigned or unassigned, e.g., δ[v → d] if v is assigned at decision level d. We define δ(ℓ) = δ(var(ℓ)), δ(C) = max{δ(ℓ) | ℓ ∈ C} for C ≠ ⊥, and δ(I) = max{δ(ℓ) | ℓ ∈ I} for I ≠ ε, extending this notation to sets of literals. Finally, we define δ(⊥) = δ(ε) = ∞. By writing δ[I → ∞], all literals on the trail I are unassigned. The decision level function is left-associative, i.e., δ[I → ∞][ℓ → d] expresses that first all literals on I are unassigned and then literal ℓ is assigned to decision level d.

Unlike in CDCL with non-chronological backtracking [36,44,45], in *chronological CDCL* [33,38] literals may not be ordered on the trail in ascending order with respect to their decision level. We write I_{≤n} (I_{<n}, I_{=n}) for the subsequence of I containing all literals ℓ with δ(ℓ) ≤ n (δ(ℓ) < n, δ(ℓ) = n). The *pending search space* of I is given by the assignments not yet tested [34], i.e., I and its open right branches R(I), and is defined as O(I) = I ∨ R(I), where R(I) = ⋁_{ℓ ∈ decs(I)} R_{=δ(ℓ)}(I) and R_{=δ(ℓ)}(I) = ¬ℓ ∧ I_{<δ(ℓ)} for ℓ ∈ decs(I). As an example, for I = a bᵈ c d eᵈ f, we have O(I) = (a ∧ b ∧ c ∧ d ∧ e ∧ f) ∨ (¬b ∧ a) ∨ (¬e ∧ a ∧ b ∧ c ∧ d). Similarly, the *pending models* of F are the satisfying assignments of F not yet detected, which are given by F ∧ O(I).
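The pending search space can be computed directly from a trail annotated with decision levels and decision markers; a sketch reproducing the example above (literals as DIMACS-style integers):

```python
def pending_search_space(trail):
    """O(I) as a list of cubes (lists of int literals).
    The trail is a list of (literal, decision_level, is_decision)
    triples; in chronological CDCL the literals need not be ordered
    by decision level."""
    cubes = [[lit for lit, _, _ in trail]]  # the trail I itself
    for lit, lvl, is_dec in trail:
        if is_dec:
            # open right branch of lit: negate it, keep literals below its level
            cubes.append([-lit] + [l for l, d, _ in trail if d < lvl])
    return cubes

# I = a b^d c d e^d f with a..f ↦ 1..6
# (a at level 0; b, c, d at level 1; e, f at level 2):
trail = [(1, 0, False), (2, 1, True), (3, 1, False),
         (4, 1, False), (5, 2, True), (6, 2, False)]
```

On this trail the sketch yields the three cubes of the worked example: I itself, ¬b ∧ a, and ¬e ∧ a ∧ b ∧ c ∧ d.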

# 3 Chronological CDCL for CNF-to-d-DNNF Compilation

In *static component analysis* the component structure is computed once, typically as a preprocessing step, and not altered during the further execution. In contrast, in our approach the component structure is computed iteratively adopting *dynamic component analysis*. Algorithm 1 provides a general schema in pseudo-code. It is formulated recursively, capturing the recursive nature of dynamic component analysis. Lines 1–7 and 11 describe model enumeration based on chronological CDCL [34], while lines 8–10 capture component analysis.

Now assume unit propagation has been carried out until completion, no conflict has occurred, and there are still unassigned variables (line 8). If F|_I can be decomposed into two formulae G and H, we call CNF2dDNNF recursively on G and H, conjoin the outcomes of these computations with I, and add the result to M (line 9). If I contains no decisions, the search space has been explored exhaustively; otherwise, chronological backtracking occurs (line 10). We illustrate the working of our approach with an example.
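As a rough illustration of this recursive schema, the following is a minimal DPLL-style compiler sketch: plain recursion with unit propagation, dynamic component analysis, and branching, rather than the chronological CDCL machinery of ACD, and without caching or clause learning. Literals are DIMACS-style integers:

```python
def reduce_clauses(clauses, lit):
    """Residual of a clause set under literal lit; None signals a conflict."""
    out = []
    for c in clauses:
        if lit in c:
            continue                # clause satisfied, drop it
        c2 = c - {-lit}             # remove the falsified literal
        if not c2:
            return None             # empty clause: conflict
        out.append(c2)
    return out

def compile_ddnnf(clauses):
    """d-DNNF term: True | False | ('lit', l) | ('and', ...) | ('or', ...)."""
    units = []
    while True:                     # unit propagation until completion
        u = next((next(iter(c)) for c in clauses if len(c) == 1), None)
        if u is None:
            break
        units.append(('lit', u))
        clauses = reduce_clauses(clauses, u)
        if clauses is None:
            return False
    if not clauses:
        return ('and', *units) if units else True
    comps = []                      # dynamic component analysis
    for c in clauses:
        vs = {abs(l) for l in c}
        touching = [g for g in comps if g[1] & vs]
        for g in touching:
            comps.remove(g)
        comps.append(([c] + [cl for g in touching for cl in g[0]],
                      vs.union(*(g[1] for g in touching))))
    if len(comps) > 1:              # decomposable: conjoin component results
        return ('and', *units, *(compile_ddnnf(cl) for cl, _ in comps))
    v = abs(next(iter(clauses[0])))  # not decomposable: take a decision
    def branch(lit):                 # the two branches contradict on v,
        rest = reduce_clauses(clauses, lit)  # so the 'or' is deterministic
        sub = False if rest is None else compile_ddnnf(rest)
        return ('and', ('lit', lit), sub)
    node = ('or', branch(v), branch(-v))
    return ('and', *units, node) if units else node

def holds(node, assignment):
    """Evaluate a d-DNNF term under a total assignment {var: bool}."""
    if isinstance(node, bool):
        return node
    if node[0] == 'lit':
        return assignment[abs(node[1])] == (node[1] > 0)
    sub = (holds(n, assignment) for n in node[1:])
    return all(sub) if node[0] == 'and' else any(sub)
```

The output is d-DNNF by construction: 'and' conjuncts never share variables (units and the decision variable are removed from the residual, components are variable-disjoint), and 'or' disjuncts contradict on the decision variable.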

*Example 1.* Let V = {a, b, c, d, e, f, g, h} be a set of propositional variables and F = (a) ∧ (¬a ∨ ¬b ∨ c ∨ d) ∧ (¬a ∨ ¬b ∨ e ∨ f) ∧ (b ∨ ¬c ∨ e) ∧ (b ∨ d ∨ f) ∧ (g ∨ h) be a formula defined over V. The execution is depicted as a tree in Fig. 1. For the sake of readability, we show only the formula on which a rule is executed, represented by a box annotated with its component level. Black arrows correspond to "downward" rule applications, while violet (gray) arrows represent "upward" rule applications and are annotated with the formula returned by the computation of a component. Ignore the rule names for now; they are intended to clarify the working of our calculus, which is presented in Sect. 4. We see that, first, a is propagated, denoted by the black vertical arrow annotated with a and the name of the applied rule (Unit). The residual of F under a is F|_a = (¬b ∨ c ∨ d) ∧ (¬b ∨ e ∨ f) ∧ (b ∨ ¬c ∨ e) ∧ (b ∨ d ∨ f) ∧ (g ∨ h) (not shown). It contains no unit clause but can be decomposed into (¬b ∨ c ∨ d) ∧ (¬b ∨ e ∨ f) ∧ (b ∨ ¬c ∨ e) ∧ (b ∨ d ∨ f) and (g ∨ h). Two new (sub)components are created (by applying rule Decompose) with component levels 01 and 02, respectively, represented by the shadowed boxes.

Since (g ∨ h) can not be decomposed further, model enumeration with chronological CDCL is executed on it (not shown) by deciding g (rule Decide), satisfying (g ∨ h), followed by backtracking chronologically (BackTrue), which amounts to negating the value of the most recent decision g, and propagating h (Unit). The processing of (g ∨ h) terminates with g ∨ ¬g ∧ h (CompTrue, not shown). But before this result can be used further, the subcomponent at component level 01 needs to be processed. Its formula is G = (¬b ∨ c ∨ d) ∧ (¬b ∨ e ∨ f) ∧ (b ∨ ¬c ∨ e) ∧ (b ∨ d ∨ f). It neither contains a unit nor can it be decomposed, hence we take a decision, let's say bᵈ. Now G|_b = (c ∨ d) ∧ (e ∨ f), which is decomposed into two components with one clause each and component levels 011 and 012, respectively (Decompose). These formulae can not be decomposed further, and they are processed independently, similarly to (g ∨ h). Before G was decomposed, a decision was taken, and we backtrack combining the results of its subcomponents (ComposeBack). We have G|_¬b = (¬c ∨ e) ∧ (d ∨ f), resulting in two components with component levels 011 and 012, respectively. They are processed and their results combined, after which the results of the subcomponents of the root component are conjoined with a. There is no decision on the trail, and the process terminates with M = a ∧ (g ∨ ¬g ∧ h) ∧ (b ∧ (c ∨ ¬c ∧ d) ∧ (e ∨ ¬e ∧ f) ∨ ¬b ∧ (c ∧ e ∨ ¬c) ∧ (d ∨ ¬d ∧ f)) (ComposeEnd). Notice that although component levels can occur multiple times throughout the computation, they are unique at any point in time.

Fig. 1. Component structure of F created by ACD.

# 4 Calculus

Due to its recursive nature, combining the results computed for subcomponents in CNF2dDNNF is straightforward. For its formalization, however, a non-recursive approach turned out to be better suited. Consequently, a method is needed for matching subcomponents and their parent. For this purpose, a *component level* is associated with each component. It is defined as a string of numbers in ℕ as follows. Suppose a component C is assigned level "d" and assume its formula is decomposed into two subformulae. The corresponding subcomponents C_G and C_H are assigned component levels "d · 1" and "d · 2", respectively, with "·" denoting string composition. Accordingly, the component level of their parent C is given by the substring consisting of all but the last element of their level, i.e., "d".<sup>2</sup> The *root component* holds the input formula; it has no parent, and its component level is zero. A component is *closed* if no rule can be applied to it, and *decomposed* if either at least one of its subcomponents is not closed or both its subcomponents are closed but their results are not yet combined. Components which are neither closed nor decomposed are *open*.<sup>3</sup> Closed components may be discarded as soon

<sup>2</sup> From now on, we omit the quotes for the sake of readability.

<sup>3</sup> The differentiation between open and decomposed components is purely technical and needed for the termination proof in Sect. 5.

as their results are combined, and the computation stops as soon as the root component is closed. With these remarks, we are ready to present our calculus.

We describe our algorithm in terms of a state transition system Abstract CNF2dDNNF, ACD for short, over a set of global states S, a transition relation ❀ ⊆ S × S and an initial global state S₀. A *global state* is a set of components. A *component* C is described as a seven-tuple (F, V, d, e, I, M, δ)ˢ, where s denotes its *component state*. It is c if C is closed, f if its formula F is decomposed, and o if C is open. The first two elements F and V refer to a formula and its set of variables, respectively. The third element d denotes the component level of C. If d ≠ 0, then d ∈ {d′ · 1, d′ · 2}, where d′ is the component level of the parent component of C, as explained above. In this manner, the component level keeps track of the decomposition structure of F and is used to match parent components and their subcomponents. The number of subcomponents of C is given by e, while I and δ refer to a trail ranging over variables in V and a decision level function with domain V, respectively. Finally, M is a formula in d-DNNF representing the models of F found so far. A component is initialized by (F, V, d, 0, ε, ⊥, ∞)ᵒ and closed after its computation has terminated, i.e., (F, V, d, 0, I, M, δ)ᶜ. Notice that in these cases e = 0. The *initial global state* S₀ = {C₀} consists of the *root component* C₀ = (F, V, 0, 0, ε, ⊥, ∞)ᵒ with F denoting the input formula and V = var(F), while the *final global state* is given by Sₙ = {(F, V, 0, 0, I, M, δ)ᶜ} where M ≡ F is in d-DNNF. The transition relation ❀ is defined as the union of transition relations ❀_R, where R is either Unit, Decide, BackTrue, BackFalse, CompTrue, CompFalse, Decompose, ComposeBack or ComposeEnd.
Our calculus contains three types of rules, which can abstractly be described as follows:

$$
\begin{aligned}
\alpha &\colon \mathcal{S} \uplus \{\mathcal{C}\} \leadsto_{\mathrm{R}} \mathcal{S} \uplus \{\mathcal{C}'\};\\
\beta &\colon \mathcal{S} \uplus \{\mathcal{C}\} \leadsto_{\mathrm{R}} \mathcal{S} \uplus \{\mathcal{C}', \mathcal{C}_1, \mathcal{C}_2\};\\
\gamma &\colon \mathcal{S} \uplus \{\mathcal{C}, \mathcal{C}_1, \mathcal{C}_2\} \leadsto_{\mathrm{R}} \mathcal{S} \uplus \{\mathcal{C}'\}
\end{aligned}
$$

In this description, S refers to the subset of the current global state consisting of all components which are not touched by rule R, with ⊎ denoting the disjoint set union; e.g., in α, C, C′ ∉ S. An α rule affects a component C, turning it into C′. The rules Unit, Decide, BackTrue, BackFalse, CompTrue, and CompFalse are α rules. A β rule modifies C, obtaining C′, and creates two new components C₁ and C₂. Rule Decompose is the only β rule. Finally, a γ rule removes the two components C₁ and C₂ from the global state and modifies their parent C. Rules ComposeBack and ComposeEnd are γ rules. The rules are listed in Fig. 2.

Model Computation. Rules Unit, Decide, BackTrue, BackFalse, CompTrue, and CompFalse execute model enumeration with chronological CDCL [34] and are applicable exclusively to open components. Unit literals are assigned the decision level of their reason, which might be lower than the current decision level (rule Unit). Decisions can be taken only if the processed formula is not decomposable (Decide). Backtracking occurs chronologically, i.e., to the second highest decision level on the trail, after finding a model (BackTrue) and to the decision level preceding the conflict level after conflict analysis (BackFalse), respectively. In the latter case, the propagated literal is assigned the lowest level at which the learned clause becomes unit, which is the level to which a SAT solver implementing CDCL with non-chronological backtracking would backtrack. Since the literals might not be ordered on the trail in ascending order with respect to their decision level, a non-contiguous part of it is discarded. Finally, a component is closed if its trail contains no decisions and either satisfies its formula (CompTrue) or a conflict occurs at decision level zero, i.e., the conflicting clause has decision level zero (CompFalse). In the former case, the newly found model is recorded.

Fig. 2. ACD transition rules.

Component Analysis. Rules Decompose, ComposeBack, and ComposeEnd capture the decomposition of a formula and the combination of the models of its subformulae and thus affect multiple components.

Decompose. The state of the parent component C with formula F is o (open). The trail I neither satisfies nor falsifies F, and F|I contains no unit clause but can be partitioned into two formulae G and H defined over disjoint sets of variables. Subcomponents for G and H are created, the number of subcomponents of C is set to two, and its state is changed to f (decomposed). Notice that C can only be processed further after its subcomponents are closed.
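The precondition of Decompose, partitioning F|I into variable-disjoint formulae, amounts to computing connected components of the variable-interaction graph of the clauses. A minimal sketch, assuming clauses are frozensets of signed integer literals; the helper name `decompose` is hypothetical:

```python
# Sketch: group clauses into variable-disjoint partitions with union-find.
# Two clauses end up in the same partition iff they are connected by a
# chain of shared variables.

def decompose(clauses):
    """Split clauses into variable-disjoint groups (hypothetical helper)."""
    parent = {}

    def find(v):
        while parent.setdefault(v, v) != v:
            v = parent[v]
        return v

    for clause in clauses:
        vs = sorted({abs(l) for l in clause})
        for v in vs[1:]:
            parent[find(v)] = find(vs[0])  # union all variables of the clause

    groups = {}
    for clause in clauses:
        root = find(abs(next(iter(clause))))
        groups.setdefault(root, []).append(clause)
    return list(groups.values())

# (c | d) shares no variable with (e | f), so the formula splits in two;
# adding a bridging clause merges the partitions again.
assert len(decompose([frozenset({3, 4}), frozenset({5, 6})])) == 2
assert len(decompose([frozenset({3, 4}), frozenset({4, 5})])) == 1
```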

ComposeBack. The state of the component C with formula F is f (decomposed). Its subcomponents CG and CH with formulae G and H, respectively, have state c (closed). Furthermore, N ≡ G and O ≡ H, hence F ∧ I ≡ I ∧ N ∧ O, which is added to M. This corresponds to enumerating multiple models of F in one step, as can be seen by applying the distributive laws to I ∧ N ∧ O, which gives a DSOP formula whose disjuncts are satisfying assignments of F|I. The search space has not yet been processed exhaustively (δ(I) > 0), backtracking to the second highest decision level occurs, and the state of C is changed back to o (open). Finally, CG and CH are removed from the global state. If I can not be extended to a model of F, we have N = ⊥ or O = ⊥, and I ∧ N ∧ O = ⊥. Otherwise, I ∧ N ∧ O ≠ ⊥. Both cases are captured by rule ComposeBack.
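The claim that distributing I over N and O yields a DSOP can be sketched by representing a DSOP as a list of cubes; since I, N, and O range over pairwise disjoint variables, the resulting cubes are pairwise contradictory. The names and the cube representation below are illustrative assumptions:

```python
# Sketch: conjoining the trail I with DSOPs N and O over pairwise disjoint
# variable sets and applying distributivity yields one cube per pair of
# disjuncts, i.e. several models of F restricted under I at once.

from itertools import product

def conjoin_dsop(trail, n, o):
    """Cubes of I and N and O after distributing; a cube is a literal set."""
    return [frozenset(trail) | cn | co for cn, co in product(n, o)]

trail = {"a"}
n = [frozenset({"g"}), frozenset({"-g", "h"})]  # a DSOP of (g | h)
o = [frozenset({"c"})]
cubes = conjoin_dsop(trail, n, o)
assert len(cubes) == len(n) * len(o)
assert frozenset({"a", "g", "c"}) in cubes
```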

ComposeEnd. The state of the parent component C with formula F is f (decomposed). Its subcomponents CG and CH with formulae G and H, respectively, are closed. Furthermore, N ≡ G and O ≡ H, hence F ∧ I ≡ I ∧ N ∧ O, which is added to M. The search space has been processed exhaustively (decs(I) = ∅), and the state of C is set to c (closed). Finally, CG and CH are removed from the global state. As in rule ComposeBack, either I ∧ N ∧ O = ⊥ or I ∧ N ∧ O ≠ ⊥.

*Example 2.* Reconsider Example 1 with variables V = {a, b, c, d, e, f, g, h} and F = (a) ∧ (¬a∨¬b∨c∨d) ∧ (¬a∨¬b∨e∨f) ∧ (b∨¬c∨e) ∧ (b∨d∨f) ∧ (g∨h) defined over V. The execution trace of ACD is shown in Fig. 3. Unaffected components are depicted in gray, and model enumeration by means of chronological CDCL is shown only once in full detail. The execution starts with the root component CF containing F. In step (1), the unit literal a is propagated, upon which F|a is decomposed into (g∨h) and G, creating components C(g∨h) and CG shown in (2). Steps (3) to (6) capture model enumeration by chronological CDCL of (g∨h), i.e., the computation of a DSOP representation of (g∨h), after which C(g∨h) is closed. Next, the formula G is processed by deciding b in step (7) and decomposing G|b into (c∨d) and (e∨f), creating components C(c∨d) and C(e∨f), respectively, in step (8). The processing of C(c∨d) and C(e∨f) occurs analogously to steps (3) to (6), resulting in the state shown in (9). The results are conjoined with b, which is the trail of CG and under which G|b was decomposed. Since b is a decision, it is flipped in (10) to explore its right branch ¬b. The formula G|¬b is decomposed into (¬c∨e) and (d∨f), and components C(¬c∨e) and C(d∨f) are created, as in (11). Their processing, which is not shown, results in the state depicted in (12), and the results are conjoined with the trail of CG. Since its trail contains no decision, CG is closed, see (13). The global state now contains the root component and its two subcomponents, which are closed, hence the rule ComposeEnd is executed, and the computation terminates with the closed root component and M = a ∧ (g ∨ ¬g∧h) ∧ (b ∧ (c ∨ ¬c∧d) ∧ (e ∨ ¬e∧f) ∨ ¬b ∧ (c∧e ∨ ¬c) ∧ (d ∨ ¬d∧f)), where M ≡ F, as shown in (14).


Fig. 3. Execution trace of ACD for Example 1.

# 5 Proofs

For proving correctness, we first show that our calculus is sound by identifying invariants which need to hold in a sound global state and showing that they are preserved by the execution of any rule. Then we prove that M ≡ F holds for any closed component, and that ACD cannot get stuck and terminates in a correct state. Showing termination concludes our proof.

Definition 1 (Sound Global State). *A global state* S *is* sound *if for all its components* C = (F, V, d, e, I, M, δ)^s *the following invariants hold:*


Invariants (1)–(5) correspond to the ones in our previous work [34]. They say that decisions are ordered in ascending order with respect to their decision level and that every decision level contains a decision literal. They further ensure that literals propagated after backtracking upon finding a model are indeed implied, that no model is enumerated multiple times, and that all models are found. Invariant (3) is only useful for open or decomposed components, since I remains unaltered when a component is closed. Invariant (4) only holds for closed components if I(F) = ⊥. Invariants (6) and (7) are concerned with the properties of a parent component and its subcomponents (for the case e = 2), such as the definition of the component level. Since, given a trail I, F|I is decomposed into formulae G and H, we also have that F|I ≡ N ∧ O, where N ≡ G and O ≡ H. Finally, Invariant (8) says that the trail of a closed component contains no decision.

Lemma 1 (Soundness of the Initial Global State). *The initial global state* S0 = {(F, V, 0, 0, ε, ⊥, ∞)^o} *is sound.*

*Proof.* Due to I = ε and e = 0 and since the (root) component is open, all invariants in Definition 1 are trivially met.

Theorem 1 (Soundness of ACD Rules). *The rules of* ACD *preserve soundness, i.e., they transform a sound global state into another sound global state.*

*Proof.* The proof is carried out by induction over the rule applications. We assume that prior to the application of a rule the invariants in Definition 1 are met and show that they also hold in the target state. The (parent) component in the original state is denoted by C = (F, V, d, e, I, M, δ)^s and in the target state by C′ = (F, V, d, e′, I′, M′, δ′)^s′. Its subcomponents, if there are any, are written CG = (G, var(G), d·1, eG, J, N, δG)^s and CH = (H, var(H), d·2, eH, K, O, δH)^s. Unit, Decide, BackTrue, and BackFalse: Apart from the additional elements V, d, e and the component state s, the rules are defined as in the former calculus [34]. The arguments given in the proof there apply here as well, and after applying rules Unit, Decide, BackTrue, or BackFalse, Inv. (1)–(5) hold. Notice that in the proof of Inv. (4), it suffices to replace "DSOP" by "d-DNNF", since the relevant property here is determinism. Since e = 0, Inv. (6) and (7) do not apply. An open state is mapped to an open state, hence Inv. (8) holds.

CompTrue and CompFalse: Invariants (1) and (2) hold, since I remains unaffected. Since C′ is closed, Inv. (3) and (4) are met. The proof that Inv. (5) holds is carried out similarly to the proof of Proposition 1 in our previous work [34] for rules EndTrue and EndFalse, respectively. Since e = 0 and I′ = I, Inv. (6)–(8) hold.

Decompose: The parent component C remains unaltered except for e′ = 2 and for its state, which becomes f. Both its subcomponents CG and CH are open, and we have JG = JH = ε and eG = eH = 0. Therefore, Inv. (1)–(5) hold. Invariant (6) is satisfied by the definition of rule Decompose. Since C′ is decomposed and CG and CH are open by definition, Inv. (7) and (8) hold as well.

ComposeBack: It suffices to show that the validity of the invariants for C is preserved, since CG and CH do not occur in the target state. The most recent decision literal ℓ is flipped, similar to rule BackTrue. The same argument as the one given there applies, and Inv. (1) and (2) are satisfied. We need to show that F ∧ ¬(M ∨ (I ∧ N ∧ O)) ∧ decs_n(PK¬ℓ) |= (PK¬ℓ)_n holds for all n. The decision levels of the literals in PK do not change, except for the one of ¬ℓ, which is decremented from e+1 to e. The literal ¬ℓ also stops being a decision literal. Since δ(PK¬ℓ) = e, we can assume n ≤ e. Furthermore, F ∧ ¬(M ∨ (I ∧ N ∧ O)) ∧ decs_n(PK¬ℓ) ≡ (¬I ∧ (F ∧ ¬M ∧ decs_n(I))) ∨ (F ∧ ¬M ∧ ¬(N ∧ O) ∧ decs_n(I)), since ¬ℓ is not a decision literal in PK¬ℓ and I_e = PK and thus I_n = (PK)_n by definition. By applying the induction hypothesis, we get ¬I ∧ F ∧ ¬M ∧ decs_n(PK¬ℓ) |= (PK)_n, and hence F ∧ ¬(M ∨ (I ∧ N ∧ O)) ∧ decs_n(PK¬ℓ) |= (PK)_n. We still need to show that F ∧ ¬(M ∨ (I ∧ N ∧ O)) ∧ decs_e(PK¬ℓ) |= ¬ℓ, as δ(¬ℓ) = e in PK¬ℓ after applying ComposeBack, and ¬ℓ thus disappears from the proof obligation for n < e. Notice that F ∧ ¬D |= I, using again the induction hypothesis for n = e+1. This gives us F ∧ ¬decs_e(PK) ∧ ℓ |= I and thus F ∧ ¬decs_e(PK) ∧ ¬I |= ¬ℓ by conditional contraposition, and Inv. (3) holds.

For proving that Inv. (4) holds, we consider two cases: (A) I ∧ N ∧ O ≠ ⊥, i.e., there exists an extension of I which satisfies F, and (B) I ∧ N ∧ O = ⊥, i.e., all extensions of I falsify F. For both cases, we know that M ∨ O(I) is a d-DNNF.

(A) We need to show that M ∨ (I ∧ N ∧ O) ∨ O(PK¬ℓ) is a d-DNNF. Due to δ(I) = e+1, we have O(I) = I ∨ R_{≤e+1}(I) = I ∨ R_{≤e}(I) ∨ R_{=e+1}(I). The pending search space of PK¬ℓ is given by O(PK¬ℓ) = PK¬ℓ ∨ R_{≤e}(PK¬ℓ). But PK = I_e and PK¬ℓ = I_e¬ℓ = R_{=e+1}(I), since ℓ ∈ decs(I) and δ(ℓ) = e+1. Furthermore, R_{≤e}(PK¬ℓ) = R_{≤e}(PK), since ¬ℓ ∉ decs(PK¬ℓ) and δ(¬ℓ) = e, hence R_{≤e}(PK¬ℓ) = R_{≤e}(I). We have O(PK¬ℓ) = R_{=e+1}(I) ∨ R_{≤e}(I), hence O(PK¬ℓ) ∨ I = O(I) and (M ∨ I) ∨ O(PK¬ℓ) = M ∨ O(I), which is a DSOP and hence a d-DNNF. Now I, N, and O are defined over pairwise disjoint sets of variables by construction, i.e., I ∧ N ∧ O is decomposable, and M ∨ (I ∧ N ∧ O) ∨ O(PK¬ℓ) is a d-DNNF.

(B) We need to show that M ∨ O(PK¬ℓ) is a d-DNNF. As just shown, O(PK¬ℓ) ∨ I = O(I). Now M ∨ O(PK¬ℓ) = M ∨ R_{≤e+1}(I). Recalling that R_{≤e+1}(I) is equal to O(I) without I and that M ∨ O(I) is a d-DNNF by the premise, M ∨ O(PK¬ℓ) is a d-DNNF as well. Therefore, Inv. (4) holds.

For the proof of the validity of Inv. (5), given M ∨ (F ∧ O(I)) ≡ F, the same two cases are relevant: (A) I ∧ N ∧ O ≠ ⊥ and (B) I ∧ N ∧ O = ⊥.

(A) We have to show that M ∨ (I ∧ N ∧ O) ∨ (F ∧ O(PK¬ℓ)) ≡ F. From O(PK¬ℓ) ∨ I = O(I) we get M ∨ (F ∧ O(I)) = M ∨ (F ∧ (O(PK¬ℓ) ∨ I)) = M ∨ (F ∧ O(PK¬ℓ)) ∨ (F ∧ I) ≡ F. But F ∧ I ≡ I ∧ N ∧ O. Therefore M ∨ (F ∧ O(I)) ≡ M ∨ (F ∧ O(PK¬ℓ)) ∨ (I ∧ N ∧ O) = M ∨ (I ∧ N ∧ O) ∨ (F ∧ O(PK¬ℓ)) ≡ F.

(B) We must show that M ∨ (F ∧ O(PK¬ℓ)) ≡ F. Similarly to (A), we have M ∨ (F ∧ O(I)) ≡ M ∨ (F ∧ O(PK¬ℓ)) ∨ (F ∧ I) ≡ M ∨ (F ∧ O(PK¬ℓ)) ≡ F, due to F ∧ I ≡ ⊥. Therefore, Inv. (5) holds after applying rule ComposeBack. We have e′ = 0, and C′ is open, hence Inv. (6)–(8) trivially hold.

ComposeEnd: It suffices to show that after applying rule ComposeEnd the invariants are met by C′, since its subcomponents CG and CH do not occur in the target state anymore. Due to I′ = I and decs(I) = ∅ and since C′ is closed, Inv. (1)–(4) trivially hold.

For proving that invariant (5) holds after applying rule ComposeEnd, i.e., that M ∨ (I ∧ N ∧ O) ∨ (F ∧ O(I)) ≡ F, the same two cases need to be distinguished: (A) I ∧ N ∧ O ≠ ⊥ and (B) I ∧ N ∧ O = ⊥.

(A) From decs(I) = ∅, we get O(I) = I and F ∧ O(I) = F ∧ I. Recalling that F ∧ I ≡ I ∧ N ∧ O, we obtain M ∨ (I ∧ N ∧ O) ∨ (F ∧ O(I)) ≡ M ∨ (F ∧ O(I)) ≡ F by the premise.

(B) We have M ∨ (I ∧ N ∧ O) ∨ (F ∧ O(I)) = M ∨ (F ∧ O(I)) ≡ F by the premise, and Inv. (5) holds after executing rule ComposeEnd. Invariants (6)–(8) trivially hold, due to e′ = 0 and I′ = I and hence decs(I′) = ∅.

Corollary 1 (Soundness of ACD Run). ACD *starting with an initial global state is sound.*

*Proof.* The initial state is sound by Lemma 1, and all rule applications lead to a sound state according to Theorem 1.

Lemma 2 (Correctness of Closed Component State). *For any closed component* (F, V, d, 0, I, M, δ)^c *it holds that* M ≡ F*.*

*Proof.* Follows from Theorem 1, specifically the proofs of Inv. (5) for rules CompTrue, CompFalse, and ComposeEnd, which are the only rules closing a component.

Theorem 2 (Correctness of Final Global State). *In the final global state* Sn = {(F, V, d, 0, I, M, δ)^c} *of* ACD*,* M ≡ F *holds.*

*Proof.* Correctness of the closed root component follows from Lemma 2. We need to show that the final global state contains exactly the closed root component. The initial global state consists of the open root component. Additional components are created exclusively by rule Decompose, and a parent component state can only be closed by rule ComposeEnd, which also removes its subcomponents from the global state. Hence the root component can only be closed if it has no subcomponents. But since the initial global state contains exclusively the root component, the final global state contains only the closed root component.

Theorem 3 (Progress). ACD *always makes progress.*

Fig. 4. Rule applications lead to smaller global states.

*Proof.* The proof is conducted by induction over the rules. We show that as long as the root component is not closed, a rule is applicable. For the case S ⊎ {C}, where C = (F, V, d, 0, I, M, δ)^o has no subcomponents, the proof is identical to the one showing progress in our previous work [34], replacing EndTrue with CompTrue and EndFalse with CompFalse, and checking whether the preconditions for rule Decompose are met if rule Unit is not applicable and before taking a decision. Now let the global state be given by S ⊎ {C}, where C = (F, V, d, 2, I, M, δ)^f is decomposed. Due to Inv. (6), S contains CG = (G, var(G), d·1, eG, JG, N, δG)^s and CH = (H, var(H), d·2, eH, JH, O, δH)^s such that F|I = G ∧ H and var(G) ∩ var(H) = ∅. Assume s = c for both CG and CH. If decs(I) = ∅, rule ComposeEnd is applicable. Otherwise, similarly to rule BackTrue, we can show that all preconditions of rule ComposeBack are met. If instead s ∈ {f, o} for at least one of CG and CH, the non-closed component(s) are processed further, and as soon as both CG and CH are closed, rule ComposeEnd or ComposeBack can be applied. This proves that ACD always makes progress.

Theorem 4 (Termination). ACD *always terminates.*

*Proof.* We need to show that no infinite sequence of rule applications can happen. To this end, we define a strict, well-founded ordering ≻_ACD on the global states and show that S ❀_R T implies S ≻_ACD T for all global states S, T and rules R in ACD. Global states are sets of components, and ≻_ACD is the multiset extension of a component ordering ≻_c = (≻_cl, ≻_tr, ≻_cs), where ≻_cl, ≻_tr, and ≻_cs are orderings on component levels, trails, and component states, respectively. We want to compare trails defined over the same set of variables V, and to this end we represent them


Fig. 5. Generalized transition rules.

as lists over {0, 1, 2}. A trail I = ℓ1 ... ℓk defined over V, where k ≤ |V|, is represented as [l1, ..., lk, 2, ..., 2], where li = 0 if ℓi is a propagation literal and li = 1 if ℓi is a decision literal. The last |V| − k positions with value 2 represent the unassigned variables. Trails defined over the same variable set are encoded into lists of the same length. This representation induces a lexicographic order >lex on trails, and we define ≻_tr as the restriction of >lex to {[l1, ..., l|V|] | li ∈ {0, 1, 2} for 1 ≤ i ≤ |V|}, i.e., we have t1 ≻_tr t2 if t1 >lex t2. The ordering ≻_tr is well-founded; its minimal element is [0, ..., 0]. The component state takes values in {o, f, c}, and we define ≻_cs as >lex, i.e., s1 ≻_cs s2 if s1 >lex s2. The minimal element of ≻_cs is c, hence ≻_cs is well-founded. Given two component levels d1 and d2, we define d1 ≻_cl d2 if length(d1) < length(d2). This may seem counterintuitive but is needed to ensure that the execution of rule Decompose results in a smaller state, since both the component state and the trail of the new subcomponents are of higher order than those of their parent. To see that ≻_cl is well-founded, recall that we consider finite variable sets. Their size provides an upper limit on the length of the component level representation and thus a minimal element of ≻_cl.
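A sketch of this trail encoding, assuming the order propagation &lt; decision &lt; unassigned is coded as 0 &lt; 1 &lt; 2 and Python's built-in lexicographic list comparison plays the role of >lex; the helper `encode` and the 'p'/'d' flags are illustrative assumptions:

```python
# Sketch of the termination ordering on trails: a trail over a fixed
# variable set becomes a list over {0, 1, 2} (propagation, decision,
# unassigned), compared lexicographically; [0, ..., 0] is minimal.

def encode(trail, num_vars):
    """trail: list of 'p' (propagation) / 'd' (decision) flags, in order."""
    codes = [0 if kind == "p" else 1 for kind in trail]
    return codes + [2] * (num_vars - len(trail))

# Flipping a decision into a propagation strictly decreases the encoding.
before = encode(["p", "d"], 3)  # [0, 1, 2]
after = encode(["p", "p"], 3)   # [0, 0, 2]
assert before > after
```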

Now we define the component ordering ≻_c = (≻_cl, ≻_tr, ≻_cs). Let two components be C1 = (d1, t1, s1) and C2 = (d2, t2, s2). We have C1 ≻_c C2 if C1 ≠ C2 and either d1 ≻_cl d2, or d1 = d2 and either t1 ≻_tr t2 or t1 = t2 and s1 ≻_cs s2. Clearly ≻_c is well-founded, since ≻_tr, ≻_cs, and ≻_cl are well-founded. For two global states S and T, we have S ≻_ACD T if S ≠ T and for each component C′ that is larger in T than in S with respect to ≻_c, S contains a component C that is larger in S than in T. Since ≻_c is well-founded, ≻_ACD is also well-founded. Figure 4 shows that each rule application leads to a smaller global state, concluding our proof.
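The lexicographic combination of the three orderings can be sketched as a comparison key; the numeric state ranks, the string representation of component levels, and the helper name `comp_key` are illustrative assumptions:

```python
# Sketch of the component ordering: compare by component-level length
# (a SHORTER level is larger), then by trail encoding, then by state,
# with states ordered o > f > c.

STATE_RANK = {"o": 2, "f": 1, "c": 0}

def comp_key(level, trail_code, state):
    """Key such that a larger key means a larger component."""
    return (-len(level), trail_code, STATE_RANK[state])

# Decompose makes the parent smaller (its state drops from o to f), and a
# closed component is smaller than an open one with the same level and trail.
assert comp_key("1", [0, 2, 2], "o") > comp_key("1", [0, 2, 2], "f")
assert comp_key("1", [0, 2, 2], "c") < comp_key("1", [0, 2, 2], "o")
```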

# 6 Generalization

The generalized rules are listed in Fig. 5. In our generalized framework, we have F|I = G1 ∧ ... ∧ Gn with var(Gi) ∩ var(Gj) = ∅ for i, j ∈ {1, ..., n} and i ≠ j (rule DecomposeG). Similarly to their equivalents in ACD, rules ComposeBackG and ComposeEndG are applicable if all subcomponents are closed.

# 7 Discussion

We have presented Abstract CNF2dDNNF, or ACD for short, a formal framework for compiling a formula in CNF into d-DNNF, combining CDCL-based model enumeration with chronological backtracking [34] and dynamic component analysis [4]. Conflict-driven clause learning enables our framework to escape regions without solution early, and chronological backtracking prevents enumerating a model multiple times without the need to remember already found models using blocking clauses, which slow down unit propagation. However, the absence of blocking clauses also prevents the use of restarts. If exclusively the rules Unit, Decide, BackTrue, BackFalse, CompTrue, and CompFalse are used, a DSOP representation of F is computed. Unit propagation is prioritized due to its potential to reduce the number of decisions and thus of right branches to be explored. Favoring decompositions over decisions may also prune a larger part of the search space. Our framework lays the theoretical foundation for practical All-SAT and #SAT solving based on chronological CDCL. Any implementation which can be modeled by ACD exhibits its properties, in particular its correctness, which has been established in a formal proof.

Comparison with Available Tools. There exist other knowledge compilers targeting d-DNNF. We want to mention c2d [20], Dsharp [37], and D4 [30], which also execute an exhaustive search and conflict analysis. However, our approach differs conceptually from these tools in several ways. The most prominent ones are the use of CDCL with chronological backtracking [33,38] instead of CDCL with non-chronological backtracking and the way the d-DNNF is created. Our method generates DSOP representations of formulae which cannot be decomposed further by an exhaustive (partial) model enumeration and then combines the results, while the tools mentioned above generate the d-DNNF by recording the execution trace as a graph [26,27]. As ACD, both D4 and Dsharp adopt a dynamic decomposition strategy, while c2d constructs a decomposition tree which it then uses for component analysis.

Future Research Directions. We plan to implement a proof of concept of our calculus in order to compare the size of the returned d-DNNF with the ones obtained by c2d, D4, and Dsharp. For dynamic component analysis, one could follow the algorithm implemented in COMPSAT [6], while dual reasoning [32] and logical entailment [35] enable the detection of short partial models. This is particularly interesting in tasks where the length of the d-DNNF is crucial. Dual reasoning has been shown to be almost competitive on CNFs if the search space is small; we therefore expect component analysis to boost its performance. The major challenge posed by the second approach lies in an efficient implementation of the oracle calls required by the entailment checks. It would be interesting to investigate the impact of dynamic component analysis on a recent implementation [46] of model enumeration by chronological CDCL [34]. Cache structures, being an inherent part of modern knowledge compilers and #SAT solvers [11,16,19,20,30,31,37,41,42,47,49] due to their positive impact on solver efficiency [1], should be added to any implementation of our framework. Finally, an important research topic is that of optimizing the encoding of a formula to make best use of component analysis [14]. Related to this question is whether formulae stemming from practical applications are decomposable in general.

Acknowledgements. My thanks go to Armin Biere for a fruitful discussion when I got stuck in a first, very raw version of the proof, and to Martin Bromberger for his input enhancing it.

#### References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Higher-Order Theorem Proving**

# **Hammering Floating-Point Arithmetic**

Olle Torstensson<sup>1</sup> and Tjark Weber<sup>2</sup>(B)

<sup>1</sup> Linköping University, Linköping, Sweden olle.torstensson@liu.se <sup>2</sup> Uppsala University, Uppsala, Sweden tjark.weber@it.uu.se

**Abstract.** Sledgehammer, a component of the interactive proof assistant Isabelle/HOL, aims to increase proof automation by automatically discharging proof goals with the help of external provers. Among these provers are a group of satisfiability modulo theories (SMT) solvers with support for the SMT-LIB input language. Despite existing formalizations of IEEE floating-point arithmetic in both Isabelle/HOL and SMT-LIB, Sledgehammer employs an abstract translation of floating-point types and constants, depriving the SMT solvers of the opportunity to make use of their dedicated decision procedures for floating-point arithmetic.

We show that, by extending Sledgehammer's translation from the language of Isabelle/HOL into SMT-LIB with an interpretation of floating-point types and constants, floating-point reasoning in SMT solvers can be made available to Isabelle/HOL. Our main contribution is a description and implementation of such an extension. An evaluation of the extended translation shows a significant increase of Sledgehammer's success rate on proof goals involving floating-point arithmetic.

### **1 Introduction**

Interactive theorem proving is one of the more flexible and powerful formal verification techniques available. However, finding a proof outline with intermediate proof steps just simple enough for a proof assistant to be able to discharge automatically may require a considerable amount of time and effort, even from a seasoned user. As an example, the seL4 micro-kernel, the product of about two person-years and 9000 lines of code, took a total of about 20 person-years and 200,000 lines of proof development to formally verify [29]. For this reason, increasing proof automation in interactive proof assistants is crucial to further broaden their applicability.

As a way of tackling this issue, many interactive proof assistants have the ability to transfer the proof burden of some of the intermediate steps onto *automated* reasoning systems with automatic proof methods better suited for the task. This approach has proven to be quite successful in bringing the number of required user interactions down for many types of problems, thus increasing productivity.

Among these proof assistants, we find Isabelle/HOL [34] and its powerful proof-delegation tool Sledgehammer [36], which acts as an interface between Isabelle/HOL and a number of external provers. In addition to traditional (resolution-based) first-order automated theorem provers (ATPs) such as E [40], SPASS [45] and Vampire [38] and the higher-order ATP Zipperposition [9], these external provers include satisfiability modulo theories (SMT) solvers such as CVC4 [7], veriT [15] and Z3 [31]. SMT solvers are highly specialized for reasoning within certain logical theories (e.g., integers, real numbers, and bit vectors), and often implement decision procedures more efficient than those found in the automatic proof methods of Isabelle/HOL.

Whether an external prover succeeds in solving a delegated proof obligation depends, among other factors, on how the proof obligation is encoded in the language of the prover. SMT solvers support the SMT-LIB input language [6], which offers both uninterpreted (free) type and function symbols that are declared by the user, as well as theory-specific *interpreted* types and operations that have a fixed semantics. Dedicated inference rules and decision procedures for specific theories that are available in SMT solvers are typically employed only when the types and operations that appear in the delegated proof obligation are interpreted. An abstract translation that leaves types and operations uninterpreted will deprive external solvers of the opportunity to make use of their dedicated decision procedures for specific background theories, and will instead have to rely on a sufficient set of facts being passed to the solver along with the proof obligation.

One of the more recent additions to the growing set of theories supported by major SMT solvers is that of floating-point arithmetic [16]. A formalization of IEEE floating-point arithmetic in Isabelle/HOL has been available in the Archive of Formal Proofs for nearly a decade [46]. However, Sledgehammer has not yet caught up to this development; its SMT component does not implement an interpretation of floating-point types and operations. Our aim is to provide such an interpretation, with the purpose of increasing the success rate for floating-point proof obligations delegated to SMT solvers, and thereby to increase the degree of automation in the interactive proof process.

As an example, let us consider the commutativity of floating-point addition. SMT solvers that support floating-point arithmetic typically have no trouble proving that x + y = y + x when they can assume that x and y denote floating-point numbers and that + denotes IEEE floating-point addition (i.e., when + is translated as fp.add). However, if this formula is translated in an uninterpreted fashion, the problem becomes much harder: it now requires showing commutativity of a user-declared function over a user-declared type. Whether the SMT solver will succeed in this case depends on many factors, including which additional facts (definitions and lemmas) are passed along from the interactive proof assistant together with the proof obligation itself.
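As a hedged illustration (not part of the paper), Python's `float` type is IEEE 754 binary64 on virtually all platforms, so the commutativity of floating-point addition can be observed directly, including for special values; NaN is the usual caveat, since NaN ≠ NaN under IEEE equality.

```python
# Commutativity of IEEE floating-point addition, checked on Python's
# binary64 floats, including infinities and signed zeros.
import math

pairs = [(0.1, 0.2), (1e308, 1e308), (math.inf, -1.0), (0.0, -0.0)]
for x, y in pairs:
    assert x + y == y + x

# NaN breaks a literal equality check: NaN != NaN by definition, even
# though x + y and y + x are both NaN.
assert math.isnan(math.nan + 1.0)
```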

*Contributions.* We define a formal model of floating-point arithmetic in Isabelle/ HOL that implements the SMT-LIB floating-point theory (Sect. 3).

We then extend the SMT solver integration in Isabelle/HOL by adding support for floating-point arithmetic, i.e., by treating floating-point types and operations as interpreted in the translation from the language of Isabelle/HOL to the SMT-LIB input format. In addition to describing this extension in detail (Sect. 4), we provide an implementation (in the Archive of Formal Proofs [46]) that supports Sledgehammer. To the best of our knowledge, this makes Isabelle/HOL the first interactive proof assistant to employ an interpreted translation for floating-point arithmetic in its integration of automated theorem provers.

An evaluation (Sect. 5), performed on a representative set of floating-point proof obligations from interactive proof, confirms the expectation that our translation extension significantly increases Sledgehammer's success rate on proof goals involving floating-point arithmetic, albeit at the cost of lower success rates for proof reconstruction—at this stage, our integration typically requires the external SMT solvers to be trusted as *oracles*.

#### **2 Background**

In this section, we cover additional background information regarding Sledgehammer and floating-point arithmetic.

#### **2.1 The Sledgehammer Proof Process**

When trying to prove a conjecture in Isabelle, a user may, via a simple call to Sledgehammer, pass along the proof obligation to several external provers, which will then work on the problem in parallel. The statement to be proven is used by a relevance filter [30] to select additional facts (axioms and previously proven statements) that may help in finding a proof. All of these statements are then translated and compiled into a file in the input format of the external prover (in the case of SMT solvers, an SMT-LIB input file), as illustrated in Fig. 1.

After working on the problem, the external prover (if it does not time out) returns to Isabelle with its findings. At this point, if a prover reported the conjecture to be true, the user can either choose to view the prover as an *oracle* and accept the conjecture as a theorem (the dashed path in Fig. 1), or make Isabelle try to automatically reconstruct the proof internally, based on the additional facts sent with the conjecture and any proof details the prover may provide. Theorems that are only proved externally are marked with an *oracle* tag, meant to convey a certain amount of skepticism—reconstructed proofs are generally preferred, as they remove the consideration of possible bugs in the external prover, or in the translation between formats.

In Sledgehammer's translation module, types and constants are generally declared with a unique (freshly generated) identifier that has no inherent meaning to the external prover. A few Isabelle theories (e.g., those for integer arithmetic, real arithmetic, and bit vectors) define types and constants that are treated as interpreted by the translation into SMT-LIB [11], in which case they are mapped directly to their counterpart in the target logic—thereby allowing the SMT solvers to use their built-in decision procedures designed specifically to reason within the theories in question.

**Fig. 1.** A conjecture's journey to become a theorem via Sledgehammer

#### **2.2 IEEE 754 Binary Floating-Point Arithmetic**

The most common way to approximate the real numbers to a suitable finite set of numbers in modern hardware is via *floating-points*. Simulating real arithmetic using floating-points is not a straightforward task; the definitions of arithmetic operations are not always obvious, and should ideally not vary between implementations. To this end, the IEEE developed the technical standard IEEE 754 [26], aiming to provide clear specifications and recommendations on all aspects of floating-point arithmetic. To meet the needs of different applications, the standard specifies several floating-point *formats*, each defining a unique set of numbers.

A binary floating-point format is characterized by its exponent width *w* ∈ ℕ and its precision *p* ∈ ℕ. A binary floating-point number *x* may then be represented in this format by a triple (*s*, *e*, *f*) of bit vectors of length 1, *w*, and *p* − 1, respectively, such that (for finite *x*)

$$x = \begin{cases} (-1)^s \cdot 2^{1-\text{bias}(w)} \cdot (0 + \frac{f}{2^{p-1}}) & \text{if } e = 0\\ (-1)^s \cdot 2^{e-\text{bias}(w)} \cdot (1 + \frac{f}{2^{p-1}}) & \text{otherwise}, \end{cases} \tag{1}$$

where bias(*w*) = 2<sup>*w*−1</sup> − 1. The standard also specifies two signed infinities, +∞ and −∞, denoting values that are too great in magnitude for the format. These are represented by the triples (0, 1 *...* 1, 0 *...* 0) and (1, 1 *...* 1, 0 *...* 0), respectively. Together, the sign *s*, the (biased) exponent *e*, and the fraction *f* constitute a unique representation of any finite or infinite floating-point number; in particular, the two numbers +0, represented by (0, 0 *...* 0, 0 *...* 0), and −0, represented by (1, 0 *...* 0, 0 *...* 0), are considered distinct. To represent the result of invalid operations, such as 0*/*0, the standard defines a special *Not-a-Number* (NaN) value, represented by any triple (*s*, 1 *...* 1, *f*) such that *f* ≠ 0 *...* 0.<sup>1</sup>
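As a concrete illustration (not part of the Isabelle/HOL development), Eq. (1) together with the special cases can be sketched as a small decoder in Python; the function name and interface are our own:

```python
from fractions import Fraction

def decode(s: int, e: int, f: int, w: int, p: int):
    """Decode a binary floating-point triple (s, e, f) per Eq. (1).

    s: sign bit, e: biased exponent (w bits), f: fraction (p - 1 bits).
    Returns an exact Fraction for finite values, or the Python floats
    +/-inf and nan for the special values.
    """
    bias = 2 ** (w - 1) - 1
    if e == 2 ** w - 1:                      # all-ones exponent field
        if f == 0:
            return float('-inf') if s else float('inf')
        return float('nan')                  # any nonzero fraction is NaN
    sign = -1 if s else 1
    frac = Fraction(f, 2 ** (p - 1))
    if e == 0:                               # zero or subnormal: hidden bit is 0
        return sign * Fraction(2) ** (1 - bias) * frac
    return sign * Fraction(2) ** (e - bias) * (1 + frac)
```

For the half-precision format (*w* = 5, *p* = 11), for instance, `decode(0, 15, 0, 5, 11)` yields 1 and `decode(1, 15, 512, 5, 11)` yields −3/2.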

Additionally, IEEE 754 specifies various arithmetic operations on floating-point numbers. Conceptually, floating-point arithmetic is carried out by converting floating-point numbers to more precise values, performing the corresponding arithmetic operation, and converting the result back to the original floating-point format, in an emulation of a rounded infinitely precise calculation. In an environment like Isabelle/HOL, where theories of real arithmetic are available, the task of carrying out calculations with infinite precision falls upon these theories, whereas the floating-point operations handle the rounding and the special cases (e.g., an argument being NaN or infinite). IEEE 754 specifies precisely how this handling should be performed.
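This "compute exactly, then round" view can be demonstrated outside Isabelle with Python's exact rationals. The function name `fp_add` is ours; we rely on the fact that converting a `Fraction` to a `float` rounds to nearest, ties to even, which matches the IEEE default rounding mode:

```python
from fractions import Fraction

def fp_add(x: float, y: float) -> float:
    """Emulate IEEE binary64 addition of two finite floats: perform the
    exact (infinitely precise) sum over the rationals, then round the
    result back into the binary64 format."""
    exact = Fraction(x) + Fraction(y)  # exact real-arithmetic step
    return float(exact)                # rounding step (nearest, ties to even)
```

The emulation agrees with the hardware operation; e.g., `fp_add(0.1, 0.2)` equals `0.1 + 0.2` bit for bit.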

## **3 An Implementation of SMT-LIB Floating-Point Arithmetic in Isabelle/HOL**

Formalizations of floating-point arithmetic are readily available for many proof assistants. For Isabelle/HOL, a formalization originally developed by Lei Yu is available from the Archive of Formal Proofs [46]. It defines a (polymorphic) type of floating-point numbers, whose instances correspond to IEEE floating-point formats with specific width and precision, and various arithmetic operations over this type.

However, although both are based on the IEEE standard, there are important semantic differences between this model and the SMT-LIB floating-point theory [16]. These differences would have rendered a direct interpretation of Lei Yu's model in the SMT-LIB floating-point theory unsound.

First, the SMT-LIB theory offers five rounding modes. The mode roundNearestTiesToAway (which is optional according to IEEE 754) was not available in the Isabelle/HOL model. Therefore, the enumerated type of rounding modes in Isabelle/HOL did not correspond to the RoundingMode sort in SMT-LIB. We resolved this difference by adding support for roundNearestTiesToAway to Lei Yu's model. Although rounding is pervasive in IEEE 754—it is performed by most arithmetic operations—it is factored out into only two functions in the Isabelle/HOL model (round and intround), making this a relatively minor, local change.

Second, the formalization by Lei Yu emphasizes the bit representation of floating-point values (corresponding to specification level 4 in IEEE 754), while the SMT-LIB floating-point theory takes a more abstract view (corresponding to specification level 2 in IEEE 754). Specifically, in Lei Yu's formalization, each floating-point format contains multiple NaN values (with different bit representations), while the corresponding floating-point format in SMT-LIB only

<sup>1</sup> The IEEE 754 standard defines a *quiet* and a *signalling* NaN. This distinction is not present in the SMT-LIB floating-point theory, which is based on a higher level of abstraction.

contains a single (abstract) NaN value. To resolve this fundamental difference, we have constructed a new model of floating-point arithmetic in Isabelle/HOL. Our starting point is a quotient construction over the type ('e,'f) float of floating-point numbers offered by Lei Yu's model. We first define an equivalence relation is_nan_equivalent on this type that relates all NaN values:

**definition** is_nan_equivalent :: ('e,'f) float ⇒ ('e,'f) float ⇒ bool **where** is_nan_equivalent *a b* ≡ *a* = *b* ∨ (is_nan *a* ∧ is_nan *b*)

We then define a new type ('e,'f) floatSingleNaN that contains the equivalence classes of ('e,'f) float with respect to the relation is_nan_equivalent:

**quotient_type** (overloaded) ('e,'f) floatSingleNaN = ('e,'f) float / is_nan_equivalent

The resulting type ('e,'f) floatSingleNaN contains a single (abstract) NaN value. The (type) arguments 'e and 'f indicate the bit widths of the exponent and fraction, respectively. A similar construction, but limited to the double-precision (64-bit) format, was used in [8] to facilitate OCaml code generation for floating-point numbers. Flocq [14], a Coq library of floating-point arithmetic, defines a type with similar semantics inductively, rather than using a quotient construction.

Most floating-point operations can then be lifted [25] in a straightforward manner from ('e,'f) float to ('e,'f) floatSingleNaN. We have additionally defined various operations that are supported in SMT-LIB but that were not available in Lei Yu's model, such as conversion functions between floating-point numbers and bit vectors. Our model now covers all operations that are available in the SMT-LIB floating-point theory.
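The effect of the quotient construction can be mimicked outside Isabelle. The following Python sketch (class and function names are ours) wraps binary64 values, compares non-NaN values by their bit representation (so that +0 and −0 stay distinct), and identifies all NaN values, mirroring the equivalence relation defined above:

```python
import math
import struct

def bits(x: float) -> int:
    """The binary64 bit pattern of x, so that +0.0 and -0.0 differ."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]

class FloatSingleNaN:
    """All NaN payloads are identified; other values are compared by
    their bit representation."""

    def __init__(self, v: float):
        self.v = float(v)

    def __eq__(self, other):
        if not isinstance(other, FloatSingleNaN):
            return NotImplemented
        a, b = self.v, other.v
        if math.isnan(a) or math.isnan(b):
            return math.isnan(a) and math.isnan(b)
        return bits(a) == bits(b)

    # Operations are "lifted" pointwise from the underlying floats.
    def __add__(self, other):
        return FloatSingleNaN(self.v + other.v)
```

With this wrapper, `FloatSingleNaN(float('nan')) == FloatSingleNaN(float('nan'))` holds, whereas `FloatSingleNaN(0.0) == FloatSingleNaN(-0.0)` does not.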

Some (rather subtle) semantic differences between our model and the SMT-LIB floating-point theory remain. In SMT-LIB, the result of certain operations, such as converting NaN or infinities to a real number, is unspecified. Isabelle/HOL does not support partial specifications; therefore, the result of these operations is defined<sup>2</sup> in our model. Technically, the Isabelle/HOL model is an implementation of the SMT-LIB specification. This does not affect the soundness of interpreting the model in SMT-LIB: any theorem provable under SMT-LIB semantics also holds for the Isabelle/HOL model.

An error in the remainder function float rem as defined in Isabelle/HOL was discovered during implementation and has been patched: the remainder of a finite floating-point value *x* and ±∞ shall be *x* [26, §5.3.1].

## **4 Interpreting Isabelle/HOL Floating-Point Arithmetic in SMT-LIB**

This section describes an interpreted translation of floating-point types and operations from Isabelle/HOL to SMT-LIB. Our translation extends a preexisting general translation [11] targeting SMT solvers that is part of Sledgehammer, which treats floating-point arithmetic as uninterpreted. It supports the formal

<sup>2</sup> For instance, in terms of a special constant called undefined.

model of IEEE floating-point arithmetic in Isabelle/HOL that was described in the previous section. We aim to be comprehensive but restrict attention to those floating-point concepts that are defined in both Isabelle/HOL and SMT-LIB.

#### **4.1 SMT-LIB Logic**

The first task of our translation module is to select an SMT-LIB logic within which the SMT solver is to reason when deciding the satisfiability of the formula. For performance reasons, it is generally a good idea to select a logic that is as specific as allowed by the contents and structure of the formula. However, FP, the logic for floating-point arithmetic, is too restrictive for many of Isabelle's proof obligations, which may freely combine floating-point operations with other types and constants. When translated, these will require support for symbols that are either free (uninterpreted) or defined in other SMT-LIB theories.

Sledgehammer's SMT integration relies on callback functions to analyze the proof obligation and determine the problem's logic. However, only one of these functions may select a logic. In the absence of a framework allowing for a more modular approach (e.g., incrementally generalizing the logic as little as necessary, based on the types and constants that appear in the proof obligation), we need to select a logic that covers all operations that appear in the proof obligation. To achieve this, whenever a supported floating-point type is detected in the formula to be translated, our callback function returns the (pseudo-)logic ALL. Available since version 2.5 of the SMT-LIB standard, this provides a convenient way to select the most general logic that the respective SMT solver supports.
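A minimal sketch of such a callback (the function name and the string representation of types are our own simplifications, not Sledgehammer's actual code):

```python
def select_logic(types_in_obligation):
    """Return the SMT-LIB logic to use, or None to let another callback
    decide. Whenever a supported floating-point type occurs, fall back
    to the most general (pseudo-)logic ALL, since the obligation may
    freely mix floating-point operations with free symbols and symbols
    from other theories."""
    if any('floatSingleNaN' in t for t in types_in_obligation):
        return 'ALL'
    return None
```

For example, `select_logic({'(8,23) floatSingleNaN', 'bool'})` yields `'ALL'`.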

#### **4.2 Types**

Both Isabelle/HOL and SMT-LIB define binary floating-point formats with arbitrary widths of the exponent and fraction fields. In Isabelle/HOL, (m,n) floatSingleNaN is the type of floating-point numbers with an exponent field of width m and a fraction field of width n (and thus with precision n+1). In SMT-LIB, the hidden bit of the significand (the bit preceding the fraction) is included in the format specification, making (_ FloatingPoint m n+1) the corresponding sort. The SMT-LIB sorts are only defined for formats with m *>* 1 and n *>* 0, whereas m and n are merely required to be positive in Isabelle/HOL. Thus, any type (1,n) floatSingleNaN lacks a corresponding sort in SMT-LIB and is left uninterpreted by the translation.
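The type mapping can be summarized in a few lines of Python (an illustrative sketch under the conventions above, not Sledgehammer's actual code):

```python
def smt_sort(m: int, n: int):
    """SMT-LIB sort corresponding to the Isabelle/HOL type
    (m,n) floatSingleNaN, or None when the format has no SMT-LIB
    counterpart. The SMT-LIB format counts the hidden bit of the
    significand, hence n + 1."""
    if m > 1 and n > 0:
        return f'(_ FloatingPoint {m} {n + 1})'
    return None
```

For instance, `smt_sort(8, 23)` (single precision) yields `'(_ FloatingPoint 8 24)'`, while `smt_sort(1, 10)` stays uninterpreted.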

In Isabelle/HOL, all floating-point formats (m,n) floatSingleNaN are instances of a polymorphic type ('e,'f) floatSingleNaN. Here, 'e and 'f are type variables that may be instantiated with concrete (type) arguments, or left uninstantiated to express generic properties that hold for all floating-point formats. Due to the current lack of support for polymorphism in SMT-LIB, (m,n) floatSingleNaN is interpreted only when m and n are (type) arguments encoding fixed numeric values; polymorphic types are left uninterpreted.

In addition to the types for floating-point formats, Isabelle/HOL defines an enumerated type roundmode for the rounding modes used by the arithmetic operations. SMT-LIB provides a corresponding type; roundmode is interpreted as RoundingMode in SMT-LIB.

#### **4.3 Constants**

For the sake of brevity, we focus here on some of the more interesting aspects of the translation of constants. (In HOL, constants are not limited to arity 0, but may have a function type.) An exhaustive enumeration of the mapping is provided in Table 1.

*Polymorphism.* The issue regarding polymorphism described in the previous section affects the translation of constants as well. A constant can only be interpreted if its type is not polymorphic. Since Isabelle's automatic type inference assigns constants the most general type possible with respect to the context, variables and constants with a floating-point type will in many cases need to be annotated with explicit type constraints in order to trigger the interpretation.

*Direct Correspondence.* For many floating-point related constants in Isabelle, there is a direct semantic-preserving mapping to a function in SMT-LIB. Among these we find, e.g., the rounding modes and comparison operations together with many arithmetic operations and classification predicates. The translation of these does not involve much more than simply replacing their name with the corresponding identifier in SMT-LIB.

*Format Parameter Extraction.* A few SMT-LIB functions targeted by our translation are technically elements of an infinite family of functions generated by an index over all floating-point formats. This holds, e.g., for the conversion operation from reals to floating-point numbers, and for the (nullary) functions denoting the special floating-point values ±0, ±∞, and NaN. Their behavior depends on the result sort, which is not necessarily derivable from context and must be indicated explicitly in SMT-LIB. In these cases, we extract the type arguments of the (result) type of the constant to be translated, and add them explicitly as arguments to the corresponding function symbol in SMT-LIB. For instance, the Isabelle/HOL function round of type roundmode ⇒ real ⇒ ('e,'f) floatSingleNaN, which converts a real number into a floating-point number (rounding as necessary), is interpreted as (_ to_fp m n+1) whenever its result type is of the form (m,n) floatSingleNaN, where m and n encode fixed numeric values.
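The translation of round can be sketched as follows (the function name `translate_round` is ours; the emitted term uses the standard SMT-LIB indexed symbol to_fp):

```python
def translate_round(m: int, n: int, mode: str, real_term: str) -> str:
    """Translate an application of round at result type
    (m,n) floatSingleNaN: the format parameters extracted from the
    result type become indices of the SMT-LIB symbol to_fp."""
    return f'((_ to_fp {m} {n + 1}) {mode} {real_term})'
```

For example, rounding the real 0.1 to double precision with the default mode is rendered as `((_ to_fp 11 53) RNE 0.1)`.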

*Term Translation.* Isabelle/HOL supports the definition of advanced concepts on top of the types and constants that are provided by the model of floating-point arithmetic. Our translation does not interpret such derived concepts directly. Instead, these can be handled by unfolding their definitions in Isabelle when desired, or by relying on Sledgehammer's relevance filter, which can make their definitions and other relevant facts available to external provers automatically.

**Table 1.** Types and constants in Isabelle/HOL covered by the translation, together with sorts and functions in SMT-LIB. m *>* 1 and n *>* 0 indicate the floating-point format. Square brackets denote syntactic sugar, which is also interpreted.


#### **5 Evaluation**

To investigate the difference in the performance of Sledgehammer brought on by the interpreted translation, and to get a clear overview of the comparative performance of the SMT solvers, we conducted an experimental evaluation on a set of proof obligations that involve floating-point operations. Freely available Isabelle formalizations of floating-point properties are scarce; only a few properties are included with the formal IEEE model in the Archive of Formal Proofs. We complemented these with our own formalizations of floating-point properties taken from the IEEE 754 standard and the *Handbook of Floating-point Arithmetic* [32], resulting in a set of 124 formulas. The formulas in the evaluation set exhibit difficulties ranging from nearly trivial to levels on par with Sterbenz's lemma [42].

All formulas in the evaluation set are polymorphic over a single floating-point type ('e,'f) floatSingleNaN. This type was instantiated to different fixed-size floating-point formats: half (16-bit), single (32-bit), double (64-bit), and quadruple (128-bit) precision formats, as specified by IEEE 754. The interpreted translation was evaluated on each of these fixed-size formats. For comparison, the abstract (uninterpreted) translation that was previously employed by Sledgehammer was additionally evaluated on the original (polymorphic) evaluation set. This gives rise to nine different models—technically, Isabelle theories with different type annotations—for measuring Sledgehammer's performance on the evaluation set, defined for *x* ∈ {(5,10)*,* (8,23)*,* (11,52)*,* (15,112)} as:


We used the Mirabelle [17] tool with default settings—including a 30 s time limit per formula—to apply Sledgehammer to each proof obligation. The default external provers invoked by Sledgehammer in Isabelle2022 are the ATPs E (version 2.6-1), SPASS (version 3.8ds-2), Vampire (version 4.6), and Zipperposition (version 2.1-1), along with the SMT solvers CVC4 (version 1.8), veriT (version 2021.06.2-rmx), and Z3 (version 4.4.1). Since the floating-point solver in this version of Z3 suffers from a soundness bug, we evaluated Z3 version 4.12.2 instead. We did not evaluate newer versions of the other solvers, such as cvc5 [3], as they are not yet integrated with Isabelle.

Out of the three SMT solvers, only CVC4 and Z3 support the floating-point theory of SMT-LIB. For each of the nine models, we evaluated four different prover configurations: CVC4 only, Z3 only, CVC4+Z3, and Sledgehammer's default prover configuration, which includes all of the ATPs and SMT solvers listed above. For the I<sub>*x*</sub> models, where interpretation is enabled, the default prover configuration uses both interpreted and uninterpreted translations (depending on the prover). For CVC4, we enabled its experimental floating-point solver (option --fp-exp) to obtain support for floating-point formats beyond single and double precision.

Sledgehammer's relevance filter had access to a large collection of theorems from the Isabelle/HOL library, including the definitions of all types and operations, and (for later formulas in the evaluation set) to all formulas that were evaluated earlier. This mimics realistic use in interactive proof, where users can rely on proven statements and employ them as lemmas in subsequent proofs. To avoid later runs being affected by earlier runs, the status of the machine learning selection of facts (stored in the Isabelle configuration file mash_state) was reset before each Mirabelle run.

The experiments were conducted under Debian GNU/Linux 6.1.0-10-amd64, running on an i9-9980HK CPU at 2.4 GHz with 16 processor threads and 32 GB of main memory.

#### **5.1 Results**

Table 2 shows Sledgehammer's success rates for the four different prover configurations when run on the evaluation set in the models described above. For convenience, the four fixed formats are abbreviated by their total bit length (16, 32, 64, and 128, respectively) in the model name. Sledgehammer succeeds when at least one of the external provers reports that it found a proof within the time limit of 30 s.


**Table 2.** Sledgehammer's success rates for the four prover configurations on proof goals from the evaluation set, by model.

When an external prover succeeds, Sledgehammer attempts to reconstruct the external proof in Isabelle using a collection of automated proof methods (as discussed in Sect. 2.1). The success rates for this process, again as a percentage of the total number (124) of proof obligations, are shown in Table 3.

For each floating-point format (and also for the polymorphic model), the largest success rate across prover configurations, with or without interpretation enabled, is indicated in boldface.


**Table 3.** Success rates of proof reconstruction for the four prover configurations on proof goals from the evaluation set, by model.

#### **5.2 Discussion**

Based on the results of our evaluation, we put forward the following observations:


6. *Interpretation leads to (much) lower proof reconstruction rates for all prover configurations and fixed-size floating-point formats.* Although interpretation allows external provers to find more proofs, these proofs are rarely reconstructed successfully in Isabelle. This is to be expected: Isabelle currently does not offer built-in automated proof procedures for floating-point reasoning that could be used to reconstruct such proofs.

Many formulas from the evaluation set were previously proven with 10–20 lines of interactively developed Isabelle proof script, and can now (after interpretation) be proven completely automatically by CVC4 or Z3. The interpreted translation can save significant amounts of human labor in formal proof developments that involve floating-point arithmetic. However, due to the lower proof reconstruction rate, interpretation of floating-point arithmetic is currently primarily of interest to users who are willing to accept CVC4 and Z3 as oracles (cf. Sect. 2.1).

#### **6 Related Work**

The practice of employing automatic provers as back-ends in interactive theorem provers is not unique to Isabelle. Generic proof-delegation tools similar to Sledgehammer have also been developed for other proof assistants, e.g., MizAR [43] for Mizar [2], and HOL(y)Hammer [27] for HOL Light [22] and HOL4 [41]. There are also proof-delegation tools aimed specifically toward SMT solvers, e.g., Smtlink [37] for ACL2 [28] and SMTCoq [1] for Coq [10].

Single integrations of SMT solvers have perhaps been more common than these larger-scale tools. The interactive theorem prover PVS [35] is tightly connected with the SMT solver Yices [18] (and its predecessor ICS), which has long been available as a decision procedure. An oracle integration of Yices in Isabelle by Erkök and Matthews [20] makes use of its dedicated decision procedures, but refrains from translating into SMT-LIB, instead targeting the native input format of Yices due to its expressiveness. Weber [44] proposes a similar oracle integration of Yices into HOL4, but extends it with support for additional SMT solvers via the SMT-LIB format. This integration has since been supplemented with proof reconstruction and has become part of HOL(y)Hammer [13].

The work presented here is based on the original integration of SMT solvers in Isabelle's Sledgehammer by Blanchette et al. [11]. It is dependent on various aspects of their translation into SMT-LIB, including the interpretation of bit-vector types and constants. In this sense, it also bears resemblance to how SMTCoq was recently extended with dedicated support for the theory of bit vectors [19].

Formalizations of IEEE 754 floating-point arithmetic are readily available in interactive proof assistants, e.g., in HOL Light [23], ACL2 [39], and Coq [14], and have been used extensively to verify floating-point related properties. However, to the best of our knowledge, no existing integration of SMT solvers in an interactive proof assistant takes advantage of the dedicated decision procedures for floating-point arithmetic available in these solvers.

Superficially, the work perhaps most similar to ours is a Why3 [12] formalization of floating-point arithmetic and its mapping to the SMT-LIB floating-point theory [21]. Why3, however, is not a prover itself, but a stand-alone proof-delegation tool relying completely on external provers. Thus, greater automation in interactive proof assistants is not a shared objective.

# **7 Conclusions**

In the years since its introduction in Isabelle, Sledgehammer has seen a number of improvements. To varying degrees, they have gradually brought us closer to the ultimate goal of powerful proof automation in interactive proof assistants. By defining a formal model of floating-point arithmetic in Isabelle/HOL that implements SMT-LIB semantics, and by enhancing the translation from Isabelle to SMT-LIB with an interpretation of floating-point types and constants, we have taken another step in this direction. Sledgehammer enjoys a significant increase in success rates (before proof reconstruction) for proof obligations that involve floating-point arithmetic.

Many proof obligations that were previously out of reach for any automated prover can now be solved automatically. For users who are willing to trust the external SMT solvers, enhancing Sledgehammer's translation with a floating-point interpretation increases proof automation and reduces the manual effort required to construct proofs in this important application domain.

Our translation does not require formulas to be fully interpretable in the SMT-LIB floating-point theory. The SMT solvers are instructed to reason in a more general logic, where interpreted and uninterpreted sorts and functions can be combined freely.

There are two notable limitations, which we propose to address in future work. First, the interpretation of floating-point arithmetic is restricted to fixed-size formats. In many situations, this is not a severe limitation—fixed-size reasoning is sufficient, for instance, when one wants to verify a specific hardware architecture, or a software implementation that uses a specific floating-point type such as binary64. However, floating-point properties that hold for all formats are most naturally stated polymorphically in Isabelle/HOL. Such properties cannot be interpreted in the floating-point theory of SMT-LIB, which (in its current version 2.6) lacks support for polymorphism: although it offers a sort (_ FloatingPoint m n) for any sufficiently large m and n, it does not offer a polymorphic sort (_ FloatingPoint *m n*) where *m* and *n* are variables that may be instantiated.

Supporting polymorphism in SMT solvers is no small feat. Fortunately, there is ongoing work to obtain a tighter integration of automatic provers, including SMT solvers, with proof assistants. One of the means by which to achieve this is support for higher-order logic in these provers [5]. Most likely, SMT-LIB 3, the next major update to SMT-LIB, will facilitate these changes by supporting polymorphism [4]. When such support becomes available in SMT solvers that support floating-point arithmetic, an interpreted translation can be employed also for polymorphic floating-point properties. There has already been work on supporting parametric bit-vector formulas in SMT solvers by encoding them as formulas over non-linear integer arithmetic, uninterpreted functions, and universal quantifiers (the UFNIA logic in SMT-LIB) [33]. This approach could in principle be extended to floating-point numbers.

Second, interpretation of floating-point arithmetic allows SMT solvers to find more proofs, but reduces proof reconstruction rates in Isabelle. There is a mismatch between the reasoning capabilities of SMT solvers that support floating-point arithmetic and Isabelle's built-in automated proof procedures, which are used to reconstruct proofs. The latter currently do not offer dedicated support for floating-point reasoning, but need to rely on explicit lemmas to reason about concepts for which the SMT solver, when interpretation is enabled, can employ specialized decision procedures. Users may opt to bypass proof reconstruction and use external SMT solvers as oracles; however, this reduces trust in the resulting theorems, as errors in the SMT solver, in the translation from Isabelle/HOL to SMT-LIB, or in the Isabelle/HOL model of floating-point arithmetic could lead to unsound results. The approach preferred by the interactive theorem proving community is that of a skeptic [24]—external proofs should be reconstructed internally. If successful, this approach combines the speed of the SMT solver with the reliability of the proof assistant.

Efficient reconstruction of proofs has previously been achieved for other SMT-LIB logics [11], and is likely possible also for floating-point reasoning, through improving on the proof information provided by SMT solvers and translating theory-specific inferences. An automated proof procedure for floating-point arithmetic implemented on top of Isabelle's inference kernel would both facilitate the reconstruction of external proofs and increase the built-in automation for floating-point reasoning available in Isabelle/HOL. The implementation of such a proof procedure will require substantial work, but the evaluation results in this paper—in particular, the difference between Tables 2 and 3—clearly indicate that the effort would not be wasted.

**Acknowledgments.** This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

## **References**



# **Learning Proof Transformations and Its Applications in Interactive Theorem Proving**

Liao Zhang<sup>1,2(B)</sup>, Lasse Blaauwbroek<sup>3</sup>, Cezary Kaliszyk<sup>1,4</sup>, and Josef Urban<sup>2</sup>

<sup>1</sup> University of Innsbruck, Innsbruck, Austria
zhangliao714@gmail.com
<sup>2</sup> Czech Technical University in Prague, Prague, Czech Republic
<sup>3</sup> Institut des Hautes Études Scientifiques, Paris, France
<sup>4</sup> International Neurodegenerative Disorders Research Center, Prague, Czech Republic

**Abstract.** Interactive theorem provers are today increasingly used to certify mathematical theories. To formally prove a theorem, reasoning procedures called tactics are invoked successively on the proof states starting with the initial theorem statement, transforming them into subsequent intermediate goals, and ultimately discharging all proof obligations. In this work, we develop and experimentally evaluate approaches that predict the most likely tactics that will achieve particular desired transformations of proof states. First, we design several characterizations to efficiently capture the semantics of the proof transformations. Then we use them to create large datasets on which we train state-of-the-art random forests and language models. The trained models are evaluated experimentally, and we show that our best model is able to guess the right tactic for a given proof transformation in 74% of the cases. Finally, we use the trained methods in two applications: proof shortening and tactic suggesting. To the best of our knowledge, this is the first time that tactic synthesis is trained on proof transformations and assists interactive theorem proving in these ways.

**Keywords:** Interactive theorem proving · Machine learning · Neural networks

## **1 Introduction**

Interactive theorem provers (ITPs) [15] are sophisticated systems used for constructing machine-verified proofs. Various proof assistants, such as HOL4 [31], HOL Light [14], Lean [23], Isabelle/HOL [24], and Mizar [3], are used by formalizers. Coq [33] is one of the most popular proof assistant systems. Coq formalizers invoke reasoning procedures called *tactics* that transform proof states into simpler proof states, eventually discharging all proof obligations and thus proving the initial proof state.

```
Theorem rev_length : ∀ l : list nat, length (rev l) = length l.
Proof.
  intros l. induction l as [| n l' IHl'].
  - reflexivity.
  - simpl. rewrite → app_length. simpl. rewrite → IHl'.
    rewrite add_comm. reflexivity.
Qed.
```
**Fig. 1.** A formal Coq proof, showing the equality property of the lengths of a list and its reverse

To give a simple example, we show a Coq proof of the equality of the lengths of a list and its reverse (Fig. 1). To complete the proof, one can perform induction on the list l (with the help of the tactic induction l as [| n l' IHl']), splitting the proof state into a case where l is empty and a case where l is nonempty. In the first case, the goal reduces to length (rev []) = length [], which is easily discharged using simple computation. In the second case, we obtain the induction hypothesis IHl' that states length (rev l') = length l' and need to prove that the equation still holds when the original list has a natural number n prepended to it. After some simplification, we transform the length of the concatenation of two lists into the summation of their individual lengths. Then, with the help of the induction hypothesis, we simplify the goal. Finally, we rewrite the goal by the commutative property of addition and obtain a simple equation to prove.

A Coq proof state consists of a list of hypotheses and a goal that needs to be proven. Given a proof state before the tactic application, the tactic may either transform the *before state* to several *after states* or finish the proof. The *semantics* of a tactic is captured by the (usually infinite) set of proof state transformations that can potentially be generated by that tactic. In this work, we approximate that infinite set with a finite dataset of transformations that occur in real proofs written by Coq users. We then use machine learning models to gain an understanding of tactics using their approximated semantics.

As an example, Fig. 2 presents the before and after states of the tactic rewrite add\_comm at its position in Fig. 1. In this particular case, the hypotheses remain unchanged, but in the goal, the two sides of the addition are swapped.

**Fig. 2.** The before and after states of rewrite add\_comm in Fig. 1, with hypotheses above the dashed line and the required goal below it.

In this paper, we consider the machine learning task of predicting a tactic capable of generating a given proof state transformation and investigate the applications of this task. Formally, given a before state *ps* and *n* after states *ps'*<sub>1</sub>*, ..., ps'<sub>n</sub>*, we attempt to predict a tactic *t* that transforms *ps* to after states *ps''*<sub>1</sub>*, ..., ps''<sub>n</sub>* such that *ps''<sub>i</sub>* is equal to *ps'<sub>i</sub>* modulo *α*-equivalence for every *i*.
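The equality check modulo *α*-equivalence can be illustrated with a small sketch (our own toy term representation, not the system's implementation): bound variables are renamed to canonical indices before the terms are compared.

```python
# A minimal sketch of alpha-equivalence: two terms are equal modulo
# renaming of bound variables.
# Terms: ("var", name) | ("app", f, a) | ("lam", name, body)

def canon(term, env=None, depth=0):
    """Rewrite bound variables to canonical de Bruijn-style indices."""
    env = env or {}
    kind = term[0]
    if kind == "var":
        # bound variables become indices; free variables keep their name
        return ("var", env.get(term[1], term[1]))
    if kind == "app":
        return ("app", canon(term[1], env, depth), canon(term[2], env, depth))
    if kind == "lam":
        new_env = dict(env)
        new_env[term[1]] = depth          # bind the name to the current depth
        return ("lam", canon(term[2], new_env, depth + 1))
    raise ValueError(kind)

def alpha_eq(s, t):
    return canon(s) == canon(t)

# \x. f x and \y. f y are alpha-equivalent; \x. f z is not equal to \x. f x
s = ("lam", "x", ("app", ("var", "f"), ("var", "x")))
t = ("lam", "y", ("app", ("var", "f"), ("var", "y")))
u = ("lam", "x", ("app", ("var", "f"), ("var", "z")))
print(alpha_eq(s, t))  # True
print(alpha_eq(s, u))  # False
```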

# **1.1 Motivation**

Tactic prediction methods have so far relied solely on before states, typically to guide automated tactical proof search in systems like Tactician [6]. We are interested in synthesizing tactics based on both the before and after states for a number of reasons.

First, there are multiple interesting applications of this task. For example, formalizers may want to arrive at a particular proof state, given a particular initial proof state. Or, given particular before and after states that were generated with a sequence of tactics, we may want to find a *single* tactic capturing the transformation, thus shortening and simplifying the proof, and teaching the formalizer how to use the available tactics.

Second, our work is a first step toward designing a novel human-like proof search strategy. When mathematicians write pen-and-paper proofs, they often first imagine some intermediate goals and then sequentially fill in the gaps. This provides another motivation: our trained predictors can recommend the tactics that bridge the gaps between such intermediate human-designed proof goals.

Third, the task can be of particular importance for ITPs that support constructing proofs in a declarative proof style, such as Isabelle, Mizar, and Lean. In declarative-style proofs, the after states are often specified manually by the user. The Mizar Mathematical Library [2], a large formal library, is developed in a declarative style, and the Isabelle Archive of Formal Proofs (one of the most developed libraries today) is also predominantly written declaratively. Our approach can be directly applied to predict tactics able to fill the gap between two subsequent declarative statements.

Finally, the learned tactic embeddings could be used to perform MuZero-style [30] reinforcement learning, i.e., obtaining the after states by combining the embeddings of the before states and of the tactics without actually running the ITP. This could be particularly useful when some tactic applications require large computational resources.

# **1.2 Contributions**

The main contributions of our paper can be summarized as follows.


Besides the above-mentioned contributions, Sect. 3 introduces the preliminaries of the learning technology used in this paper. We discuss two related research fields in Sect. 6. The conclusions and future work are presented in Sect. 7.

#### **2 Proof State Characterizations**

To train the machine learning models, we need to provide characterizations of the before and after states. Apart from directly using the unprocessed textual representation of proof states, we design three characterizations: feature difference, anti-unification, and tree difference.

# **2.1 Feature Difference**

To characterize the proof states, we start with the features used in [42], where machine learning was applied to predict tactics for proof states. For example, GOAL-\$l' and HYPS-Coq.Lists.List.rev-\$l' are two features extracted from the before state in Fig. 2. The prefixes GOAL and HYPS denote whether a feature belongs to the goal or the hypotheses. The symbol \$l' denotes a node that occurs in the abstract syntax tree (AST) of the proof state, where the prefix \$ indicates that l' is a named variable. We additionally consider nodes connected in the AST: for example, the feature Coq.Lists.List.rev-\$l' means that the identifier of the list-reversal operation and the list l' are connected in the AST.

For the current work, we additionally consider feature difference. From the before state *ps* and the after states *ps'*<sub>1</sub>*, ..., ps'<sub>n</sub>*, we extract features *f* and *f'*<sub>1</sub>*, ..., f'<sub>n</sub>*, respectively, using the procedure discussed above. We define *f'* as the union of *f'*<sub>1</sub>*, ..., f'<sub>n</sub>*. By set difference, we compute the *disappeared features f* − *f'* and the *appearing features f'* − *f*. Together, the disappeared and appearing features form the feature-difference characterization of the tactic.
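The feature-difference computation itself amounts to a few set operations. A minimal sketch, with made-up feature strings:

```python
# Feature difference: disappeared features f - f' and appearing
# features f' - f, where f' is the union of the after-state features.

def feature_difference(before_features, after_features_per_state):
    f = set(before_features)
    f_prime = set().union(*after_features_per_state)  # union over after states
    disappeared = f - f_prime
    appearing = f_prime - f
    return disappeared, appearing

# Illustrative (made-up) feature strings in the style of Sect. 2.1:
before = {"GOAL-$l'", "GOAL-plus", "HYPS-Coq.Lists.List.rev-$l'"}
after = [{"GOAL-$l'", "GOAL-plus-swapped", "HYPS-Coq.Lists.List.rev-$l'"}]
gone, new = feature_difference(before, after)
print(sorted(gone))  # ['GOAL-plus']
print(sorted(new))   # ['GOAL-plus-swapped']
```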

# **2.2 Anti-unification**

Anti-unification, first proposed by Plotkin [27] and Reynolds [29], aims to calculate generalizations of the given objects. Since Coq is based on the Calculus of Inductive Constructions (CIC) [25], an appropriate anti-unification algorithm for Coq should be higher-order. However, higher-order anti-unification is undecidable [26]. Therefore, we first convert Coq terms to first-order terms so that we can execute a decidable and efficient first-order anti-unification algorithm.

To encode Coq terms into first-order logic, we transform them recursively following the AST. First-order applications and constants are encoded directly,

**Fig. 3.** The least general generalization of the before and after states in Fig. 2

other applications use the apply functor app, and all other cases use special first-order function symbols (e.g., a dependent product is encoded with a first-order function prod). The goal of the before state in Fig. 2 is thus converted to the first-order term =(+(*length*(*l'*)*, S*(*O*))*, S*(*length*(*l'*))). The non-leaves =, +, *length*, *S* denote function symbols; the leaves *l'* and *O* denote constants.

*Terms* in first-order anti-unification are defined as *t* ::= *x* | *a* | *f*(*t*<sub>1</sub>*, ..., t<sub>n</sub>*) where *x* is a variable, *a* is a constant, *f* is an *n*-ary function symbol, and each *t<sub>i</sub>* is a term. In this paper, the letters *s, t, u* denote terms, *f, g, h* denote function symbols, *a, b* denote constants, and *x, y* denote variables. *Substitutions* map variables to terms and are usually written as sets: a substitution *σ* is represented as the set {*x* ↦ *σ*(*x*) | *x* ≠ *σ*(*x*)}, where *σ*(*x*) is the term to which *x* is mapped. The application of a substitution *σ* to a term *t* is written *tσ*. If *t* is a variable, then *tσ* = *σ*(*t*). If *t* = *f*(*t*<sub>1</sub>*, ..., t<sub>n</sub>*), then *tσ* = *f*(*t*<sub>1</sub>*σ, ..., t<sub>n</sub>σ*). A term *u* is called a *generalization* of a term *t* if there exists a substitution *σ* such that *uσ* = *t*. For instance, the term *f*(*g*(*x*)*, y*) is a generalization of the term *f*(*g*(*a*)*, h*(*a, b*)) under the substitution *σ* = {*x* ↦ *a, y* ↦ *h*(*a, b*)}, since *f*(*g*(*x*)*, y*)*σ* = *f*(*g*(*a*)*, h*(*a, b*)).

Anti-unification aims to obtain the *least general generalization (lgg)* of two terms *s* and *t*. A term *u* is called a generalization of *s* and *t* if there exist substitutions *σ*<sub>1</sub> and *σ*<sub>2</sub> such that *uσ*<sub>1</sub> = *s* and *uσ*<sub>2</sub> = *t*. A generalization *u* of *s* and *t* is called the lgg if, for any generalization *u'* of *s* and *t*, there is a substitution *σ* such that *u'σ* = *u*. Assuming *φ* is a bijective function from pairs of terms to variables, given two terms *s* and *t*, the anti-unification algorithm *AU* calculates the lgg using the two rules below.

– *AU*(*s, t*) = *f*(*AU*(*s*<sub>1</sub>*, t*<sub>1</sub>)*, ..., AU*(*s<sub>n</sub>, t<sub>n</sub>*)) if *s* = *f*(*s*<sub>1</sub>*, ..., s<sub>n</sub>*) and *t* = *f*(*t*<sub>1</sub>*, ..., t<sub>n</sub>*)
– *AU*(*s, t*) = *φ*(*s, t*) if the preceding rule does not match.
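The two rules translate directly into a short program. Below is a minimal sketch (our own illustration, not the paper's implementation) that encodes terms as nested tuples and implements *φ* as a dictionary mapping each pair of disagreeing subterms to a fresh shared variable; applied to the encoded goals of the before and after states of Fig. 2, it reproduces the lgg of Fig. 3.

```python
# First-order anti-unification (lgg).
# Terms: ("f", t1, ..., tn) for applications, plain strings for
# constants and variables. phi maps pairs of disagreeing subterms to
# fresh shared variables (the same pair always gets the same variable).

def anti_unify(s, t, phi):
    if s == t:
        return s                                   # identical subterms are kept
    same_head = (isinstance(s, tuple) and isinstance(t, tuple)
                 and s[0] == t[0] and len(s) == len(t))
    if same_head:                                  # rule 1: recurse on arguments
        return (s[0],) + tuple(anti_unify(a, b, phi)
                               for a, b in zip(s[1:], t[1:]))
    if (s, t) not in phi:                          # rule 2: introduce a variable
        phi[(s, t)] = f"Var{len(phi)}"
    return phi[(s, t)]

# Goals of the before/after states of Fig. 2 encoded as first-order terms:
#   length l' + 1 = S (length l')   and   1 + length l' = S (length l')
before = ("=", ("+", ("length", "l'"), ("S", "O")), ("S", ("length", "l'")))
after  = ("=", ("+", ("S", "O"), ("length", "l'")), ("S", ("length", "l'")))
phi = {}
print(anti_unify(before, after, phi))
# ('=', ('+', 'Var0', 'Var1'), ('S', ('length', "l'")))
```

The substitutions can be read off from phi: each entry (s, t) ↦ Var contributes Var ↦ s to the first substitution and Var ↦ t to the second.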

Figure 3 presents the lgg of the before and after states considered in Fig. 2. Compared to the before state, most of the nodes in the lgg remain the same. The differences lie on the left side of the equality in the goal: length l' is substituted with Var0, and the natural number 1 is substituted with Var1. To obtain the before and after states, we apply to the lgg the substitutions {Var0 ↦ *length l'*, Var1 ↦ 1} and {Var0 ↦ 1, Var1 ↦ *length l'*}, respectively.

We compute the lggs of the goals and the hypotheses separately. We can directly anti-unify the goals of the before and after states. However, the number of hypotheses may be changed by the tactic application. For instance, the tactic intros introduces new hypotheses, while the tactic clear H removes the hypothesis H. Suppose we are anti-unifying the hypotheses *hyps*(*h*<sub>1</sub>*, ..., h<sub>n</sub>*) and *hyps*(*h*<sub>1</sub>*, ..., h<sub>n</sub>, h<sub>n+1</sub>*). The first rule of anti-unification immediately fails, and the second rule generates a single variable that corresponds to all hypotheses in the before state and all hypotheses in the after state. Anti-unifying all hypotheses together thus prevents us from obtaining a compact characterization. To calculate the lggs of hypotheses, we instead first match the hypotheses with the same names and then compute an lgg for each pair. We refer to the hypotheses that occur only in the before state as *deleted hypotheses* and to those that occur only in the after state as *inserted hypotheses*. Unlike the matched pairs, the deleted and inserted hypotheses are not anti-unified; they remain unchanged.
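The name-based matching step can be sketched as follows (with hypothetical hypothesis names; anti-unification would then be applied to each matched pair):

```python
# Match hypotheses by name before anti-unifying them pairwise.
# Hypotheses are dicts from name to term; unmatched ones are reported
# as deleted (only in the before state) or inserted (only in the after state).

def match_hypotheses(before_hyps, after_hyps):
    matched = {n: (before_hyps[n], after_hyps[n])
               for n in before_hyps if n in after_hyps}
    deleted = {n: t for n, t in before_hyps.items() if n not in after_hyps}
    inserted = {n: t for n, t in after_hyps.items() if n not in before_hyps}
    return matched, deleted, inserted

before = {"H": "1 <= m", "l": "list nat"}
after = {"l": "list nat", "H0": "m <= n"}
matched, deleted, inserted = match_hypotheses(before, after)
print(sorted(matched))   # ['l']
print(sorted(deleted))   # ['H']
print(sorted(inserted))  # ['H0']
```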

We choose anti-unification because it generates a more compact representation than directly using the before and after states. Consider Fig. 2: we would need one Coq string for the before state and another for the after state to characterize the transformation, even though many parts of the before state are unchanged by the tactic application. Representing these unchanged parts twice is redundant. Anti-unification instead characterizes the transformation by a single lgg plus the substitutions, so the unchanged parts of the before and after states are shared in the lgg. Moreover, previous research has demonstrated that features based on generalization are very helpful for theorem proving [19].

# **2.3 Tree Difference**

In addition to anti-unification, we propose a characterization based on a tree difference algorithm [21]. Compared to anti-unification, tree difference is better at generalizing the differences between the before and after states. Tree difference extends the standard Unix diff [16] algorithm with the ability to compute differences according to tree structure. Since proof states are trees, such tree differences can be used to characterize the transformations.

Take the before and after states in Fig. 2 for demonstration. First, for the hypotheses that are the same in the before and after states, we keep them unchanged. Therefore, the hypotheses n, l', and IHl' remain the same.

The next step is to extract common subtrees from the original trees (except for the unchanged hypotheses) to obtain more compact characterizations. We focus on the ASTs of Coq terms. Assuming there is an oracle that judges whether the current subtree is a common subtree, we traverse a tree from the root; the calculation of the oracle is explained in the original paper [21]. If the current subtree is a common subtree and not a leaf node, we substitute it with a hole. We do not substitute leaves with holes because, in practice, substituting leaves leads to many unexpected holes. Identical common subtrees are always substituted with the same hole. The results of applying the substitutions to the before and after states are called the *deletion context* and the *insertion context*, respectively; they are shown in Fig. 4.

Afterward, we calculate the *greatest common prefix (gcp)* of the deletion and insertion contexts and obtain a *patch*. According to the original algorithm, if the two trees have the same non-hole node, we keep the node unchanged and execute the algorithm on their children. Otherwise, we denote them as a *change*.

**Fig. 4.** The deletion and insertion contexts of the before and after states in Fig. 2. Hole0, Hole1, and Hole2 denote length l', 1, and S(length l'), respectively.

**Fig. 5.** The patch of the before and after states in Fig. 2

Similar to anti-unification, due to the deletion, insertion, and reordering of hypotheses, we need to adjust the gcp algorithm for proof states. We match hypotheses by their names and obtain the deleted hypotheses, inserted hypotheses, and matched hypotheses as in Sect. 2.2. We calculate gcps only on the matched hypotheses; the deleted and inserted hypotheses are represented as a change. Executing gcp on proof states returns a patch of the form *state*(*hyps*\_*patch, goal*\_*patch*), where *hyps*\_*patch* is constructed as *hyps*(*h*<sub>1</sub>*, ..., h<sub>n</sub>, change*(*del*\_*hyps, ins*\_*hyps*)) and each *h<sub>i</sub>* is the patch of two matched hypotheses. Figure 5 depicts the patch of the before and after states in Fig. 2.
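A minimal sketch of the gcp step on deletion and insertion contexts (our own encoding, with holes as ("Hole", k) tuples); on the goal contexts of Fig. 4 it produces exactly the changes discussed above:

```python
# Greatest common prefix (gcp) of a deletion and an insertion context:
# equal non-hole nodes are kept and recursed into; disagreeing subtrees
# (including holes) become Change(deletion, insertion) nodes.

def gcp(d, i):
    hole = lambda t: isinstance(t, tuple) and t[0] == "Hole"
    if not hole(d) and not hole(i):
        if d == i and not isinstance(d, tuple):
            return d                      # identical leaf: keep it
        if (isinstance(d, tuple) and isinstance(i, tuple)
                and d[0] == i[0] and len(d) == len(i)):
            # same non-hole node: keep it and recurse on the children
            return (d[0],) + tuple(gcp(a, b) for a, b in zip(d[1:], i[1:]))
    return ("Change", d, i)

# Goal contexts in the spirit of Fig. 4:
#   deletion:  =(+(Hole0, Hole1), Hole2)
#   insertion: =(+(Hole1, Hole0), Hole2)
H0, H1, H2 = ("Hole", 0), ("Hole", 1), ("Hole", 2)
deletion = ("=", ("+", H0, H1), H2)
insertion = ("=", ("+", H1, H0), H2)
patch = gcp(deletion, insertion)
print(patch)
# the goal patch:
#   =(+(Change(Hole0, Hole1), Change(Hole1, Hole0)), Change(Hole2, Hole2))
```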

**Fig. 6.** The result of applying the closure function to the patch in Fig. 5

Subsequently, we need to calculate the *closure* of a patch. The intention is to ensure that every change is *closed*: its left and right sides contain the same holes. Notice that the patch in Fig. 5 contains two unclosed changes, Change(Hole0, Hole1) and Change(Hole1, Hole0). The closure function ascends to the subtree whose root is the parent node of the unclosed change and restores that subtree to the deletion and insertion contexts it had before gcp was executed. The procedure repeats until all changes are closed. Since the gcp function on proof states also returns a patch in a tree structure, we can run the closure function on it. If any patch of matched hypotheses *h<sub>i</sub>* or *change*(*del*\_*hyps, ins*\_*hyps*) is not closed, we restore the *hyps*\_*patch* to the original deletion and insertion contexts of the hypotheses. Then, if the *goal*\_*patch* or the deletion and insertion contexts of the hypotheses are not closed, we restore the patch of the proof states to the entire deletion and insertion contexts of the two proof states. Figure 6 depicts the patch after the execution of the closure function.
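The closedness condition itself is a simple check: collect the holes occurring on each side of a change and compare the two sets. A sketch, assuming changes are encoded as ("Change", deletion, insertion) tuples with ("Hole", k) leaves:

```python
# A change is closed when its deletion and insertion sides mention
# exactly the same set of holes.

def holes(t):
    if isinstance(t, tuple):
        if t[0] == "Hole":
            return {t[1]}
        return set().union(*(holes(c) for c in t[1:]), set())
    return set()

def change_closed(change):
    _, deletion, insertion = change
    return holes(deletion) == holes(insertion)

print(change_closed(("Change", ("Hole", 0), ("Hole", 1))))  # False
print(change_closed(("Change", ("Hole", 2), ("Hole", 2))))  # True
```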

The final step is to replace identical changes with their original term. The original algorithm may produce identical changes, such as Change(Hole2, Hole2) in Fig. 6. Since we want a compact characterization, such changes are unnecessary.

Tree difference is better at generalizing the differences than anti-unification. Consider again the example in Fig. 2. The lgg in Fig. 3 merely shows that the proof state changes at the positions of the variables; the substitutions may differ if we execute rewrite add\_comm on different proof states. In the patch generated by tree difference in Fig. 6, however, the changes are generalized: because common subterms are substituted with holes, the patch will be the same even if we execute rewrite add\_comm on different proof states.

# **2.4 Input Formats**

During training, the language model receives the string <Characterization> Tactic: <Tactic> as input. <Characterization> has four variations:

```
– Before:<Before State>
```

A proof state is represented as a sequent <Hyps> |- <Goal>. The plain text (like Tactic:) serves as a prompt, while the placeholders (such as <Before State> and <Tactic>) are substituted according to the proof context. [] denotes a list. During prediction, the language model receives <Characterization> Tactic: as input and outputs the predicted tactics.
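Assembling the model input can be sketched as follows (the exact formatting details of the real system may differ):

```python
# Build the language-model input: the characterization string followed
# by the "Tactic:" prompt; during training the target tactic is appended.

def lm_input(characterization, tactic=None):
    prompt = f"{characterization} Tactic:"
    return prompt if tactic is None else f"{prompt} {tactic}"

before = "n:nat, l':list nat |- length (rev l') = length l'"
print(lm_input(f"Before:{before}"))                      # prediction input
print(lm_input(f"Before:{before}", "rewrite add_comm"))  # training input
```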

Random forests are fed discrete features as input. For feature difference, the disappeared features and appearing features (as introduced in Sect. 2.1) are kept in distinct feature spaces. To utilize anti-unification, we convert the lgg and the terms in the substitutions used to obtain the before and after states into features in three disjoint spaces; we additionally distinguish the features of deleted and inserted hypotheses from the others. For tree difference, we separate the gcp of the proof states, the origins of changes, the destinations of changes, and the common subterms into four spaces.

## **3 Learning Models**

We consider two machine learning models for the task. The models will be compared experimentally in the next section.

The first model is a random forest classifier [7]. Random forests are based on decision trees. In decision trees, leaves represent labels (tactics in our case), and internal nodes correspond to features. A rule is a path from the root to a leaf and represents the conjunction of all features on the path. Rules are determined by maximizing the *information gain* over the examples. For instance, given examples with labels {*b, b, b, a, a*}, we want a split that passes all examples with the label *a* to its left child and all examples with the label *b* to its right child. A forest makes predictions by voting over a large number of decision trees. Random forests contain several sub-forests, each built on a random subset of the entire dataset. We choose a random forest implementation that has previously been used to predict tactics for Coq [42].
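The split-selection criterion can be illustrated with a toy example (our own illustration, not the cited implementation): choose the feature whose test minimizes the weighted Gini impurity of the resulting children.

```python
# Toy decision-tree split selection: pick the feature whose presence
# test best separates the tactic labels, measured by Gini impurity.

def gini(labels):
    if not labels:
        return 0.0
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_impurity(examples, feature):
    # weighted impurity of the two children induced by testing `feature`
    with_f = [lbl for feats, lbl in examples if feature in feats]
    without = [lbl for feats, lbl in examples if feature not in feats]
    n = len(examples)
    return (len(with_f) / n) * gini(with_f) + (len(without) / n) * gini(without)

# Illustrative (made-up) examples of feature sets paired with tactic labels:
examples = [({"GOAL-plus"}, "rewrite add_comm"),
            ({"GOAL-plus"}, "rewrite add_comm"),
            ({"GOAL-rev"}, "reflexivity"),
            ({"GOAL-rev"}, "reflexivity")]
best = min({"GOAL-plus", "GOAL-rev"}, key=lambda f: split_impurity(examples, f))
print(split_impurity(examples, best))  # 0.0 -- a perfect split
```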

The other machine learning technique we use is the pre-trained language model GPT-2 [28]. GPT-2 is based on neural networks, which consist of many artificial neurons that learn from training data. The self-attention [35] technique is used intensively in GPT-2 to differentially weigh every part of the input data. As a language model, GPT-2 predicts the probability distribution of the next word given a sequence of words as input. The concept of pre-training imitates the learning process of humans: when humans encounter a new task, they do not need to learn it from scratch but transfer and reuse their existing knowledge. Similarly, GPT-2 is pre-trained on a large natural language dataset, BooksCorpus [43], and can reuse the knowledge of natural language learned during pre-training to solve new tasks. To adapt GPT-2 to a new task, we fine-tune it on a relatively small dataset, slightly modifying the weights learned from pre-training. We decided on GPT-2 because pre-trained language models have recently demonstrated outstanding achievements in natural language processing (NLP) [8] and formal mathematics [34,39].

#### **4 Experiments**

We perform the experiments on a dataset extracted from the Coq standard library. The dataset consists of 158,494 proof states extracted from 11,372 lemmas. We randomly split the dataset into subsets for training, validation, and testing in an 80-10-10% ratio. For the random forests, we first use 100 trees by default and optimize the Gini impurity [22], a metric of the information gain. After this optimization, we fix the Gini impurity at its best value and try various numbers of trees to obtain the optimized tree count. Finally, the best combination of Gini impurity and number of trees is determined for each characterization. The experiments with GPT-2 are based on the Hugging Face library [38]; in particular, we employ the smallest GPT-2. The hyper-parameters are: learning rate *eta* = 3*e* − 4, *num*\_*beams* = 3, *batch*\_*size* = 32. During training, we apply a linear schedule with the first 20% of training steps used for warm-up; the remaining parameters are left at their default values. At most 50 tokens are predicted for a single tactic. We truncate the input on the left side if it is longer than the maximum input length of GPT-2 (1024 tokens). Language models impose such length limits for efficiency: the memory usage of the attention mechanism grows quadratically with the number of tokens. Every model is trained for 25 epochs on an NVIDIA V100 GPU, and the snapshot with the highest accuracy on the validation dataset is selected for testing.
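Left-side truncation keeps the most recent tokens when the input exceeds the model's limit; a one-line sketch over token-id lists:

```python
# Truncate from the left: drop the oldest tokens when the input is
# longer than the model's maximum length (1024 tokens for GPT-2).

def truncate_left(tokens, max_len=1024):
    return tokens[-max_len:] if len(tokens) > max_len else tokens

tokens = list(range(1500))   # stand-in for 1500 token ids
kept = truncate_left(tokens)
print(len(kept), kept[0])    # 1024 476
```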

Table 1 depicts the results of our experiments. The accuracies of the combinations of before states with after states are significantly better than only relying


**Table 1.** Results on the test dataset, showing how often the prediction makes the same transformation as the tactic in the library. The transformations are considered modulo *α*-equivalence.

on the before states, for both random forests and GPT-2. Thus, we conclude that taking after states into consideration is very helpful for learning the semantics of tactics. The accuracies of GPT-2 are significantly higher than those of random forests, indicating that the pre-trained language model is a considerably stronger learner for this task. For random forests, feature difference, anti-unification, and tree difference all perform better than the unprocessed before and after states, which indicates that our characterizations extract more precise features for random forests. We do not apply GPT-2 to feature difference: GPT-2 relies on natural language, and while it would in principle be possible to give it feature differences directly as input, features bear very little similarity to natural language, so the knowledge gained in pre-training would be of little use for understanding them. Although feature difference is slightly better than anti-unification and tree difference, their results are quite similar. A probable explanation is that random forests are not good at learning from sophisticated features: with all three characterizations they mostly learn to make correct predictions only for the simple tactics. Similarly, with GPT-2, anti-unification and tree difference provide more accurate predictions than the unprocessed before and after states. We suppose this is because the characterizations appropriately shorten the input while keeping the important information about the proof transformation; shortening the input is beneficial for GPT-2 because of its maximum input length. Table 2 compares the percentages of inputs that are longer than this maximum length.
The statistics show that our characterizations significantly reduce the probability that the input exceeds the maximum length. Tree difference provides more accurate predictions than anti-unification with both random forests and GPT-2. This may be attributed to the generalization made by tree difference being easier for machine learning models to learn.


**Table 2.** The ratios of how many inputs exceed the maximal length limitation

#### **5 Applications**

In this section, we propose two promising applications of the task. We evaluate only the most accurate method from Sect. 4 (GPT-2) on the two applications.

The first, more direct application is making tactic suggestions. Given a before state, it is common for an ITP user to have an intuition about the intermediate proof states needed to complete the proof, yet sometimes the user cannot guess the appropriate tactic for making those transformations. Given the before state and the imagined intermediate states, our model predicts the tactics likely to perform the transformations, from which the user can obtain a complete proposed proof.

The other application is shortening existing Coq proofs. Specifically, for a transformation *ps*<sub>0</sub> ⇒<sub>*t*<sub>0</sub></sub> *ps*<sub>1</sub> ⇒<sub>*t*<sub>1</sub></sub> *ps*<sub>2</sub> *...* ⇒<sub>*t*<sub>n</sub></sub> *ps*<sub>n+1</sub>, where each *ps<sub>i</sub>* is a proof state and each *t<sub>i</sub>* is a tactic, we want to predict a tactic *t* such that *ps*<sub>0</sub> ⇒<sub>*t*</sub> *ps'* where *ps'* and *ps*<sub>n+1</sub> are equal under *α*-equivalence. Thus, we can replace the tactic sequence with a single tactic and decrease the length of the Coq proof. A restriction for this task is that *ps*<sub>n+1</sub> should not be a finishing state, because we are only interested in exploring shorter paths between proof states.
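The resulting search can be sketched as follows, with a stubbed predictor and executor standing in for the trained model and Coq:

```python
# Proof-shortening search sketch: for each tactic subsequence of length
# 2 or 3, ask the model for a single tactic realizing the same
# before -> after transformation, and verify candidates by execution.

def shorten(states, tactics, predict, execute, seq_lens=(2, 3)):
    """states[i] --tactics[i]--> states[i+1]; returns found shortenings."""
    found = []
    for k in seq_lens:
        for i in range(len(tactics) - k + 1):
            before, after = states[i], states[i + k]
            for t in predict(before, after):     # candidate single tactics
                if execute(before, t) == after:  # verified in the ITP
                    found.append((i, k, t))
                    break
    return found

# Toy instance (made-up names): "t02" alone maps s0 to s2.
states = ["s0", "s1", "s2"]
tactics = ["t01", "t12"]
predict = lambda b, a: ["t02"] if (b, a) == ("s0", "s2") else []
execute = lambda s, t: {("s0", "t02"): "s2"}.get((s, t))
print(shorten(states, tactics, predict, execute))  # [(0, 2, 't02')]
```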



# **5.1 Tactic Suggestion**

We view the experiments in Sect. 4 as the evaluation of tactic suggestions. The before and after states extracted from the Coq standard library are considered as the states that are presented in the Coq editor and those in users' minds, respectively. The results show that taking the after states into consideration, together with the more compact characterization, is essential for correctly suggesting tactics.

The following is an actual tactic suggestion question taken from the Coq Discourse Forum<sup>1</sup>. The question can be summarized as finding a tactic that transforms the following before state to the after state. The goal of the before state is to prove that the element indexed by *m* − 0 in a list equals the element indexed by *m*.

– Before state: l : list nat, x : nat, m : nat, H0 : 1 <= m |- nth (m - 0) l 0 = nth m l 0
– After state: l : list nat, x : nat, m : nat, H0 : 1 <= m |- nth m l 0 = nth m l 0

Table 3 shows the first five tactics predicted by each model. Considering only the before state, the correct prediction appears in third place; with anti-unification, tree difference, and the unprocessed before and after states, the first two synthesized tactics are appropriate. The tactics not displayed in bold fail to perform the expected transformation for various reasons. Some tactics, such as trivial, simpl, and auto, do not change the proof state. The tactics rewrite <- plus\_n\_O and apply sub\_0\_r are not applicable and cause errors. The lemma minus\_n\_0 used in rewrite <- minus\_n\_0 does not exist in the Coq standard library. Although rewrite <- sub\_0\_r does not cause an error, it leads to an unexpected after state l : list nat, x:nat, m : nat, H0 : 1 <= m |- nth (m - 0) l 0 = nth m l 0 - 0. Since the operations executed by trivial, simpl, and auto are quite complicated and may depend on the context, we assume it is difficult for the model to understand them comprehensively; their occurrence among the first five predictions may mainly be due to their high frequency in the training data. The results confirm that the combination of before and after states is beneficial for suggesting suitable tactics.

# **5.2 Shortening Proofs**

The results presented in Sect. 4 focused on decomposed tactics: compound tactic expressions that perform several steps at once were decomposed into individual tactic invocations using the technique developed in [5]. Here, we utilize the same models but focus on the original human-written tactics and try to shorten these (shortening expanded tactics would be unfair). For all tactic sequences of lengths two and three in the training dataset, we input their before and after states into the model. We can only consider states in the training dataset, since our model is trained on all present tactics; compared to the validation and test datasets, it should therefore give better proof-shortening predictions on the training dataset. The training dataset contains 56,788 original tactics. The model synthesizes 10 tactics for each sequence, and we execute them in Coq to verify that they perform the same transformation as the sequence modulo *α*-equivalence.

<sup>1</sup> https://coq.discourse.group/t/how-to-avoid-awkward-assertions/1153/2.


**Table 4.** The shortening ratios and amounts of redundant tactics with different characterizations and sequence lengths.

The results are presented in Table 4. We define the number of *redundant tactics* of *ps*<sub>0</sub> ⇒<sub>*t*<sub>0</sub></sub> *ps*<sub>1</sub> ⇒<sub>*t*<sub>1</sub></sub> *ps*<sub>2</sub> *...* ⇒<sub>*t*<sub>n</sub></sub> *ps*<sub>n+1</sub> as *n*. The *shortening ratio* is defined as the number of all discovered redundant tactics divided by the total number of occurrences of tactics in the training dataset. In this section, our method applies only to tactic sequences in which every tactic except the last produces a single after state, whereas the experiments in Sect. 4 also covered tactic applications that may produce several after states. The reason is that it is difficult to calculate the number of redundant tactics if intermediate tactics produce several after states: the tactic sequence then becomes a tree of tactics, with each path consisting of a sequence of tactics. We initially expected the shortening ratios to be low for the selected dataset: the Coq standard library is written by Coq experts and has been edited and improved for decades, so we expected little room for improvement. However, given the size of the dataset, the proposed technique finds a considerable number of redundant tactics, which lets us conclude that taking the after states into consideration is useful for proof shortening.
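A worked example of the shortening-ratio definition, with illustrative counts rather than the paper's results:

```python
# Shortening ratio: redundant tactics found, divided by the total number
# of tactic occurrences in the training dataset.

def shortening_ratio(redundant_counts, total_tactics):
    # each shortened sequence ps_0 => ... => ps_{n+1} contributes n
    return sum(redundant_counts) / total_tactics

# e.g. two shortened length-2 sequences and one length-3 sequence
# contribute 1 + 1 + 2 redundant tactics (illustrative numbers)
print(shortening_ratio([1, 1, 2], 56788))  # roughly 7.04e-05
```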

We discovered many interesting cases where proofs can be optimized. We present two examples of such proofs in Table 5. The first is about the Riemann integral, where *ring* and *field* denote algebraic structures. The Coq user first substituted a subterm in the proof state, rewrote the goal by several lemmas, and finally applied a lemma about rings. However, our model discovers that this nontrivial transformation on *ring* can be completed with a single transformation in *field*.

In the second example, the Coq library authors first applied the lemma Qle\_lteq to transform the goal into a disjunction. Later, they selected the left side of the disjunction to continue the proof. Our model is able to figure out that the operation is redundant. Indeed it finds another lemma Qlt\_le\_weak that is able to immediately transform the goal to the left part of the disjunction.

In addition to such more impressive examples of simpler, shorter proofs, our model is also able to find a few abbreviations. Such abbreviations make the proof shorter but do not necessarily improve their readability. For instance, our model sometimes combines unfold Un\_growing and intro into intros x y P H n. It uses the implicit mechanism of intros to unfold Un\_growing. However, a Coq user will not be able to understand what operation intros x y P H n conducts without actually executing the Coq script.

**Table 5.** Two examples of shortening of proofs using the prediction. In both of the presented cases, a single tactic provides an equivalent transformation as a sequence of tactics. Since the hypotheses are not changed in any of the presented examples, we omit them and only present the goals for simplicity.


#### **6 Related Work**

Several problems originating in formal mathematics and theorem proving have been considered from the machine learning point of view. One of the most explored is premise selection [1], whose goal is to find the lemmas in a large library that are most likely to prove a given conjecture. For premise selection, the meaning of dependency in formal mathematics has been explored both by approaches that try to explicitly define the logical semantics [19] and by approaches that use deep learning [36]. Next, it is possible to apply machine learning to guide inference-based theorem provers; in this task, the meanings of provability and step usefulness are implicitly derived by the learning methods. This has been explored in the two top-performing first-order theorem provers [17,32] as well as in higher-order logic automated theorem proving [10]. Similarly, the meaning of the usefulness of a proof step has been considered, for example in HOLStep [18], where various machine learning methods try to predict whether particular inferences are needed in a proof. All these tasks are different from the task that we propose in the current paper.

Various proof automation systems have emerged to construct proofs by tactic prediction and proof search. SEPIA infers tactics for Coq from tactic traces using automata [13]. TacticToe [12] and Tactician [5,42] apply classical statistical learning techniques such as *k*-nearest neighbors [9] and random forests [7] to generate tactic predictions based on the before states. Several systems use neural networks for the same task, e.g. HOList [4], CoqGym [41], and Lime [40]. These are all different from the current work, which considers the after states as well.

Autoformalization [20] is a machine translation task applied to formal mathematical proofs. The accuracy of the best methods applied to the task is still very weak in comparison with human formalization [37]; however, the neural methods already show some minimal understanding of the meaning of formalization, for example by finding equivalent formulations. Again, this is a different task from the one considered in the current work.

#### **7 Conclusion**

In this paper, we propose a new machine learning task, with which we aim to capture the semantics of tactics in formal mathematics. Based on a dataset of almost 160 thousand proof states we consider synthesizing a tactic that transforms a before state to the expected after states. We implement three novel characterizations to describe the transformation: feature difference, anti-unification, and tree difference. The results of the experiments confirm the effectiveness of our characterizations. Two applications of the task are discussed: tactic suggestion for declarative proofs and proof shortening.

In the future, we will investigate if tactic embeddings can be used directly. We can also try to estimate the after states by calculating the embeddings of the before state and the tactic or align tactics between systems in a similar way to how concepts are already aligned between systems [11].

**Acknowledgements.** This work was partially supported by the ERC Starting Grant *SMART* no. 714034, the ERC Consolidator grant *AI4REASON* no. 649043, the European Regional Development Fund under the Czech project AI&Reasoning no. CZ.02.1.01/0.0/0.0/15\_003/0000466, the Cost action CA20111 EuroProofNet, the ERC-CZ project *POSTMAN* no. LL1902, Amazon Research Awards, and the EU ICT-48 2020 project TAILOR no. 952215.

#### **References**


2015. LNCS (LNAI), vol. 9195, pp. 378–388. Springer, Cham (2015). https://doi. org/10.1007/978-3-319-21401-6\_26


254 L. Zhang et al.

43. Zhu, Y., et al.: Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 19–27 (2015)

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Translating SUMO-K to Higher-Order Set Theory

Chad E. Brown<sup>1</sup>, Adam Pease<sup>1,2(B)</sup>, and Josef Urban<sup>1</sup>

<sup>1</sup> Czech Technical University in Prague, Prague, Czech Republic apease@articulatesoftware.com

<sup>2</sup> Parallax Advanced Research, Beavercreek, OH, USA

Abstract. We describe a translation from a fragment of SUMO (SUMO-K) into higher-order set theory. The translation provides a formal semantics for portions of SUMO which are beyond first-order and which have previously only had an informal interpretation. It also for the first time embeds a large common-sense ontology into an interactive theorem proving system. We further extend our previous work in finding contradictions in SUMO from first-order constructs to include a portion of SUMO's higher-order constructs. Finally, using the translation, we can create problems that can be proven using higher-order interactive and automated theorem provers. This is tested in several systems and used to form a corpus of higher-order common-sense reasoning problems.

Keywords: ontology · theorem proving · Megalodon · automated theorem proving · automated reasoning · SUMO

# 1 Introduction and Motivation

The Suggested Upper Merged Ontology (SUMO) [15,16] is a comprehensive ontology of around 20,000 concepts and 80,000 hand-authored logical statements in a higher-order logic. It has an associated integrated development environment called Sigma [19] <sup>1</sup> that interfaces to theorem provers such as E [22] and Vampire [12]. In previous work on translating SUMO to the TPTP [25] THF (Typed Higher-order Form) [1] format, a syntactic translation to THF was created but did not resolve many aspects of the intended higher-order semantics of SUMO.

In this work, we lay the groundwork for a new translation to a language for higher-order automated theorem provers based on expressing SUMO in higher-order set theory. We believe this will attach to SUMO a stronger set-theoretical interpretation that will allow deciding more queries and provide better intuition for avoiding contradictory formalizations. Once this is done, our plan is to train ENIGMA-style [5–8] query answering and contradiction-finding [23] AITP systems on such SUMO problems and develop autoformalization [9–11,28] methods targeting common-sense reasoning based on SUMO. We believe that this is the most viable path towards common-sense reasoning that is not only trainable but also explainable and verifiable, providing an alternative to language models, which come with no formal guarantees.

<sup>1</sup> https://www.ontologyportal.org.

c The Author(s) 2023

U. Sattler and M. Suda (Eds.): FroCoS 2023, LNAI 14279, pp. 255–274, 2023. https://doi.org/10.1007/978-3-031-43369-6\_14

#### 1.1 Related Work and Contributions

In earlier work, we described [19] how to translate SUMO to the strictly first-order language of TPTP-FOF [20] and TF0 [17,18,26]. SUMO has an extensive type structure and all relations have type restrictions on their arguments. Translation to TPTP FOF involved implementing a sorted (typed) logic axiomatically in TPTP by altering all implications in SUMO to contain type restrictions on any variables that appear.

In [21] 35 SUMO queries were converted into challenge problems for first-order automated theorem provers. In many cases, first-order ATPs can prove the corresponding problem. However, some of the queries involve aspects of SUMO that go beyond first-order representation. For example, one of the queries involves a term-level binder (κ).<sup>2</sup> Several of the queries also involve *row variables* (also called *sequence variables*), i.e., variables that should be instantiated with a list of terms. We discuss here several such examples to motivate the translation to higher-order set theory. We then embed SUMO into the Megalodon system, providing, to our knowledge, the first representation of a large common-sense ontology within an interactive theorem prover (ITP). We then consider the higher-order problems obtained via the translation. This provides a set of challenge problems for higher-order theorem provers that come from a different source than formalized mathematics or program verification.

The rest of the paper is organized as follows. In Sect. 2 we introduce the SUMO-K fragment of SUMO, an extension of the first-order fragment of SUMO. We also show there examples in SUMO that motivate the extensions. Section 3 describes a translation from SUMO-K into a higher-order set theory. We have constructed interactive proofs of the translated form of 23 SUMO-K queries. We describe several of the proofs in Sect. 4. From the interactive proofs we obtain 4880 ATP problems and we measure the performance of higher-order automated theorem provers on this problem set in Sect. 5. Section 6 describes the planned extensions and Sect. 7 concludes. Our code and problem set are available online.<sup>3</sup>

#### 2 The SUMO-K Fragment

We define a fragment of SUMO we call SUMO-K. This extends the first-order fragment of SUMO with support for row variables, variable arity functions and relations, and the κ class formation term binder.<sup>4</sup> Elements of SUMO not included in SUMO-K are temporal, modal and probabilistic operations.

We start by defining SUMO-K *terms*, *spines* (lists of terms) and *formulas*. Formally, we have *standard variables* (x), *row variables* (ρ) and *constants* (c).

<sup>2</sup> Note that by "term-level binder" we mean a binder that yields a term. By way of contrast, ∀ and ∃ are formula-level binders. κ is used to form classes in SUMO. Informally, one can think of κx.ψ as the class {x | ψ}.

<sup>3</sup> http://grid01.ciirc.cvut.cz/~chad/sumo2set-0.9.tgz.

<sup>4</sup> SUMO classes should not be confused with set-theoretic classes. Our use of "class" in this paper will always refer to SUMO classes.

We will also have *signed rationals* (q) represented by a decimal expression with finitely many digits (i.e., those rationals expressible in such a way) as terms. We define by mutual recursion the sets of SUMO-K terms t, SUMO-K spines s and SUMO-K formulas ψ as follows:

$$\begin{array}{rl}
t ::= & x \mid c \mid q \mid (x\ s) \mid (c\ s) \mid (\kappa x.\psi) \mid \mathsf{Real} \mid \mathsf{Neg} \mid \mathsf{Nonneg} \mid (t+t) \mid (t-t) \mid (t*t) \mid (t/t)\\
s ::= & t\ s \mid \cdot \mid \rho \mid \rho\ t \cdots t\\
\psi ::= & \bot \mid \top \mid (\neg\psi) \mid (\psi\rightarrow\psi) \mid (\psi\wedge\psi) \mid (\psi\vee\psi) \mid (\psi\leftrightarrow\psi) \mid (\forall x.\psi) \mid (\exists x.\psi) \mid (\forall\rho.\psi) \mid (\exists\rho.\psi)\\
& \mid (t=t) \mid (\mathsf{instance}\ t\ t) \mid (\mathsf{subclass}\ t\ t) \mid (t\leq t) \mid (t<t) \mid (c\ s)
\end{array}$$

The definition is mutually recursive since the term κx.ψ depends on the formula ψ. Of course, κ, ∀ and ∃ are binders. In practice, most occurrences of ρ are at the end of the spine. In some cases, however, extra arguments t₁,...,tₙ occur after the ρ. The idea is that ρ will be a list of arguments and t₁,...,tₙ will be appended to the end of that list. Note that at most one row variable can occur in a spine.
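To make the role of row variables concrete, the spine discipline above can be sketched as a small AST with row-variable splicing. This is an illustration only; the class and function names are ours, not part of SUMO-K.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical AST for a fragment of SUMO-K (names are ours, not the paper's).
# A spine is a list of terms containing at most one row variable, whose
# instantiation is spliced into the argument list, with any terms written
# after the row variable appended at the end.

@dataclass
class Var:            # standard variable x
    name: str

@dataclass
class RowVar:         # row variable rho (stands for a list of terms)
    name: str

@dataclass
class Const:          # constant c
    name: str

@dataclass
class Apply:          # (c s) or (x s): a head applied to a spine
    head: Union[Var, Const]
    spine: list

def splice(spine, env):
    """Instantiate the (at most one) row variable in a spine with a term list."""
    out = []
    for arg in spine:
        if isinstance(arg, RowVar):
            out.extend(env[arg.name])   # splice the instantiating list in place
        else:
            out.append(arg)
    return out

# (partition ?ROW) with ?ROW |-> [Word, Noun] yields a two-argument application
args = splice([RowVar("ROW")], {"ROW": [Const("Word"), Const("Noun")]})
print([a.name for a in args])   # ['Word', 'Noun']
```

Terms written after the row variable in the spine are handled by the same loop: they are simply appended after the spliced list, matching the `ρ t ··· t` production.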

#### 2.1 Implicit Type Guards

Properly parsing SUMO terms and formulas requires mechanisms for inferring implicit type guards for variables (interpreted conjunctively for κ and ∃ and via implication for ∀). Free variables in SUMO assertions are implicitly universally quantified and are restricted by inferred type guards, as described in [19]. In previous translations targeting first-order logic, relation and function variables are instantiated during the translation (treating the general statement quantifying over relations and functions as a macro to be expanded). Since the current translation will leave these as variables, we must also deal with type guards that are not known until the relation or function is instantiated.

#### 2.2 Variable Arity Relations and Functions

Consider the SUMO relation partition, declared as follows:

```
(instance partition Predicate)
(instance partition VariableArityRelation)
(domain partition 1 Class)
(domain partition 2 Class)
```
The last three items indicate that partition has variable arity with at least 2 arguments, both of which are intended to be classes. If there are more than 2 arguments, the remaining arguments are also intended to be classes. In general, the extra optional arguments of a variable arity relation or function are intended to have the same domain as the last required argument. We will translate partition to a set that encodes not only when the relation should hold, but also its domain information, its minimum arity and whether or not it is variable arity.

Two other variable arity relations (with the same arity and type information as partition) are exhaustiveDecomposition and disjointDecomposition. The following is an example of a SUMO-K assertion relating these concepts:

∀ρ.partition ρ → exhaustiveDecomposition ρ ∧ disjointDecomposition ρ.

Previous translations to first-order logic expanded this assertion into several facts for different possible arities (using different predicates partition3, partition4, etc.), up to some limit. The following is an example of a partition occurring in Merge.kif<sup>5</sup> with 6 arguments:

(partition Word Noun Verb Adjective Adverb ParticleWord)

From this one should be able to infer the following query:

*Example 1 (*wordex*).*

(query (exhaustiveDecomposition Word Noun Verb Adjective Adverb ParticleWord))

However, the corresponding first-order problem will not be provable unless the limit on the generated arity is at least 6. Our translation into set theory will free us from the need to know such limits in advance.
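The arity-limit problem can be sketched as follows. This is a toy reconstruction of the macro-expansion scheme described above; the function name and the `partitionN` naming convention are our own illustration.

```python
# Toy reconstruction of the macro-expansion scheme: a variable-arity
# relation is expanded into fixed-arity predicates partition2, ...,
# partitionN up to a chosen limit N. An application whose arity exceeds
# the limit has no corresponding predicate and cannot be expressed.

def expand(relation, args, limit):
    """Return the fixed-arity atom for this application, or None if the
    arity exceeds the expansion limit (no such predicate was generated)."""
    n = len(args)
    if n > limit:
        return None
    return f"{relation}{n}(" + ", ".join(args) + ")"

args6 = ["Word", "Noun", "Verb", "Adjective", "Adverb", "ParticleWord"]
print(expand("partition", args6, limit=5))  # None: arity 6 exceeds the limit
print(expand("partition", args6, limit=6))  # the 6-ary atom partition6(...)
```

The set-theoretic translation avoids this failure mode entirely, since a single set encodes the relation at every arity.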

#### 2.3 Quantification over Relations

Merge.kif includes assertions that quantify over relations. The following is an example of such an assertion:

```
(=>
  (and
    (subrelation ?REL1 ?REL2)
    (instance ?REL1 Predicate)
    (instance ?REL2 Predicate)
    (?REL1 @ROW))
  (?REL2 @ROW))
```
In previous first-order translations such assertions are instantiated with all R and R′ where (subrelation R R′) is asserted. One of the 35 problems from [21] (TQG22) makes use of the SUMO assertion that son is a subrelation of parent, and the macro expansion style of first-order translation is sufficient to handle this example. However, the macro expansion approach is insufficient to handle hypothetical subrelation assertions. The following is an example of a query creating a hypothetical subrelation assertion:

<sup>5</sup> Merge.kif is the main SUMO ontology file. While Merge.kif evolves over time, we work with a fixed version of the file from January 2023. The latest versions of it and all the other files that make up SUMO are available at https://github.com/ontologyportal/sumo.

```
Example 2 (TQG22alt4).
(query (=> (exists (?X) (employs ?X ?X))
           (not (subrelation employs uses))))
```
During the process of answering this query we will assume employs is a subrelation of uses and then must instantiate the general assertion about subrelations with employs and uses. Our translation to set theory will permit this.

#### 2.4 Kappa Binders

One of the 35 queries from [21] (TQG27) has the following local assumption making use of a κ-binder.

*Example 3.* The example TQG27 includes three assertions:

(A1) instance Planet Class, (A2) subclass Planet AstronomicalBody, and (A3), the one with a κ-binder, instance o (κp. instance p Planet ∧ attribute p Earthlike). Informally, one can read (A3) as o ∈ {p | p is an Earthlike planet}. The query is (Q) instance o Planet.

The query should easily follow by eliminating the κ-abstraction. The first-order problem generated in [21] drops the assumption with the κ-abstraction (A3), making the problem unlikely to be provable (at least not for the intended reason). Our translation to set theory will handle κ-binders and the translation of this problem will be provable in the set theory.
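The κ-elimination argument can be illustrated with a toy model, reading the κ-class as bounded separation over a universe of discourse. Python frozensets stand in for the set theory, and all concrete sets below are our own invention, not SUMO's.

```python
# Toy model of kappa-binder elimination in TQG27 (sets modeled as Python
# frozensets; all concrete sets here are ours). kappa p. psi is read as
# bounded separation over a universe of discourse.

Univ1 = frozenset({"o", "moon", "rock"})
Planet = frozenset({"o", "moon"})      # toy: instances of Planet
Earthlike = frozenset({"o"})           # toy: objects with attribute Earthlike

# (A3): o is in the class {p | p is an Earthlike planet}
kappa_class = frozenset(p for p in Univ1 if p in Planet and p in Earthlike)
assert "o" in kappa_class

# Eliminating the kappa-abstraction: membership in the separation set
# immediately yields membership in Planet, which is the query (Q).
assert "o" in Planet
print(sorted(kappa_class))   # ['o']
```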

#### 2.5 Real Arithmetic

Six of the 35 examples from [21] involve some real arithmetic. Two simple example queries are the following:

*Example 4 (*TQG3*).*

```
(instance Number3-1 NonnegativeRealNumber)
(query (not (instance Number3-1 NegativeRealNumber)))
```

```
Example 5 (TQG11). (query (equal 12 (MultiplicationFn 3 4)))
```
For the sake of brevity we represent the first problem as having one local constant n, one local assumption instance n Nonneg and the query (conjecture) ¬(instance n Neg). We will translate signed rationals with a finite decimal expansion to real numbers represented as sets.<sup>6</sup> We will also translate Real to be equal to the set of reals ℝ. Furthermore we translate the operations +, −, ∗, /, < and ≤ to have the appropriate meaning when applied to two reals.<sup>7</sup> We then translate

<sup>6</sup> We use a fixed construction of the reals, but the details of this are not relevant here.

<sup>7</sup> To be more precise, we are using a specific set of reals constructed in the higher-order set theory, and operations (e.g., multiplication) are the expected set-theoretic operations on that set of reals. For simplicity, our set-theoretic division is a total function returning 0 when the denominator is 0.

Neg to {x ∈ ℝ | x < 0} and Nonneg to {x ∈ ℝ | 0 ≤ x}.

In addition to direct uses of arithmetic as in the examples above, arithmetic is also often used to check type guard information. This is due to the fact that a spine like t₁ t₂ ρ will use subtraction to determine that, under some constraints, the i-th element of the corresponding list will be the (i − 2)-nd element of the list interpreting ρ.

### 3 Translation of SUMO-K to Set Theory

#### 3.1 High Level Overview: Sets, Terms, Spines and Formulas

Our translation maps terms t to sets. The particular set theory we use is *higher-order Tarski-Grothendieck* as described in [4].<sup>8</sup> The details of this set theory are not important here. We only note that we have ∈ and ⊆ (which will be used to interpret SUMO's instance and subclass) and that we have the ability to λ-abstract variables to form terms at higher types. The main types of interest are ι (the base type of sets), o (the type of propositions), ι → ι (the type of functions from sets to sets) and ι → o (the type of predicates over sets).

Terms: When we say SUMO terms t are *translated to sets*, we mean they are translated to terms of type ι in the higher-order set theory.

Spines: Spines s are essentially lists of sets (of varying length). We translate them as functions that encode finite sequences. These functions are formally of the general type ι → ι. However, we only use them when restricted to natural numbers, i.e., arguments n ∈ ω (where ω is the set of finite ordinals). We also maintain the invariant that the function returns the empty set on all but finitely many n ∈ ω. An auxiliary function listset : (ι → ι) → ι gives a set-theoretic representation of the list by restricting its domain to ω.<sup>9</sup>

Tagging, Untagging, Length: To avoid confusion with the empty set being on a list, we tag elements of lists to ensure they are nonempty. Let I : ι → ι be such a *tagging function* (injective on the universe of sets) and U : ι → ι be an *untagging function*. We then define nil : ι → ι to be constantly ∅ and cons : ι → (ι → ι) → ι → ι to take a set x and a list l to the function mapping 0 to I x and i + 1 to l i for i ∈ ω. We also define a function len : (ι → ι) → ι by λl.{i ∈ ω | l i ≠ ∅},<sup>10</sup> giving us the length of the list (assuming it is a list). Informally, a spine t₀ ··· tₙ₋₁ is thus a function taking i to I(t′ᵢ) for each i ∈ {0,...,n − 1}, where t′ᵢ is the set-theoretic value of tᵢ and I the tagging function.

<sup>8</sup> Tarski-Grothendieck is a set theory in which there are universes modeling ZFC set theory. These set-theoretic universes should not be confused with the universe of discourse Univ1 introduced below.

<sup>9</sup> We include listset since sometimes a list needs to be considered as a set.

<sup>10</sup> Note that by design this set is the finite ordinal giving the length of the list.

Formulas: The translation of a SUMO formula ψ can be thought of either as a set (which should be one of the sets 0 or 1) or as a proposition. We also sometimes coerce between type ι and o by considering the sets 0 and 1 to be sets corresponding to false and true. Let P : ι → o be λX.∅ ∈ X and let B : o → ι be λp.if p then 1 else 0. We use these functions as coercions between ι and o.

#### 3.2 Motivating Examples

Before describing the translation in more detail, we give a few more simple examples to explain various aspects of the translation and motivate our choices.

Univ1 and Kappa: Let Univ1 be a set. This set is intended to be a *universe of discourse* in which most (but not all) targets of interpretation for t will live. Specifically, we will map the SUMO-type Class to the set ℘ Univ1 (the power set of the universe). We take all SUMO-types except the four special cases Class, SetOrClass, Abstract and Entity to be sets in ℘ Univ1. Consequently, if a SUMO object is an instance of some class other than Class, SetOrClass, Abstract and Entity, we will know that the object is a member of Univ1. Due to this we choose to translate κ-binders using simple separation bounded by Univ1. Reconsidering TQG27 discussed in Sect. 2.4, we translate instance o (κp.instance p Planet ∧ attribute p Earthlike) to a set-theoretic proposition<sup>11</sup> of the form o ∈ {p ∈ Univ1 | ··· p ∈ PLANET ∧ ···} (only partially specified at the moment).<sup>12</sup> From this set-theoretic proposition we can easily derive o ∈ PLANET to solve the set-theoretic version of TQG27.

Variable Arity and Type Guards: As mentioned above, partition is a variable arity relation of at least arity 2 where every argument must be of SUMO-type Class. We will translate partition to a set PA containing multiple pieces of information. The behavior of PA as a relation is captured by the results one obtains by applying it to a set encoding a list of sets (via a set-theoretic operation ap : ι → ι → ι). We can apply an abstract function arity : ι → ι to obtain the minimum arity of PA. We can apply an abstract predicate vararity : ι → o to encode that PA has variable arity. Likewise we can apply an abstract domseq : ι → ι → ι to PA and an i ∈ ω to recover the intended domain of argument i of PA. These extra pieces of information are important to determine type guards in the presence of function and relation arguments.

In the specific case of partition the translation yields a set PA such that arity PA = 2, vararity PA is true and for i ∈ {0, 1, 2}, domseq PA i = ℘ Univ1. The value of domseq PA 2 determines the intended domain of all remaining (optional) arguments of the relation. (Note that SUMO indexes the first argument by 1 while in the set theory the first argument is indexed by 0.) The SUMO assertion

(partition Word Noun Verb Adjective Adverb ParticleWord)

<sup>11</sup> A set-theoretic proposition is a closed formula in the language of higher-order set theory [4].

<sup>12</sup> Note that the SUMO constant is Planet while its translated set-theoretic counterpart is PLANET.

translates to the set-theoretic statement<sup>13</sup>

```
P (ap PA (listset (cons Word (cons Noun (cons Verb (cons Adjective
            (cons Adverb (cons ParticleWord nil)))))))).
```
Recall the SUMO-K assertion

∀ρ.partition ρ → exhaustiveDecomposition ρ ∧ disjointDecomposition ρ.

In this case the translation also generates type guards for the row variable ρ. Let PA, ED and DD be the sets corresponding to the SUMO constants partition, exhaustiveDecomposition and disjointDecomposition. Essentially, the assertion should only apply to ρ when ρ has at least length 2 and every entry is a (tagged) class. The translated set-theoretic statement (with type guards) is

```
∀ρ : ι → ι. dom_of (vararity PA) (arity PA) (domseq PA) ρ
  → dom_of (vararity ED) (arity ED) (domseq ED) ρ
  → dom_of (vararity DD) (arity DD) (domseq DD) ρ
  → P (ap PA ρ) → P (ap ED ρ) ∧ P (ap DD ρ)
```
The statement above makes use of a new definition: dom\_of : o → ι → (ι → ι) → (ι → ι) → o. The first argument of dom\_of is a proposition encoding whether or not the function or relation is variable arity. In this case, all three of the relations are variable arity (with the same typing information for all three). In the variable arity case dom\_of n D ρ is defined to be dom\_of\_varar n D ρ, where dom\_of\_varar : ι → (ι → ι) → (ι → ι) → o, n is the minimum arity, D is the list of domain information and ρ is the list we are requiring to satisfy the guard. dom\_of\_varar n D ρ is defined to hold if the following three conditions hold:

1. n ⊆ len ρ, i.e., ρ has at least the minimum number of entries;
2. ∀i ∈ n, U (ρ i) ∈ D i, i.e., each required argument lies in its declared domain;
3. ∀i ∈ len ρ, n ⊆ i → U (ρ i) ∈ D n.

For fixed arity, dom\_of is defined via a simpler dom\_of\_fixedar condition. Another SUMO assertion about partitions is

(=> (partition ?SUPER ?SUB1 ?SUB2) (partition ?SUPER ?SUB2 ?SUB1))

In this case there are three standard (nonrow) variables needing type guards in the translation. Roughly speaking, domseq PA has the information we need, but in general we must modify it to be appropriate for variable arity relations. For this reason domseqm : ι → ι → ι is defined to be

λr i. if vararity r then domseq r (if i ∈ arity r then i else arity r) else domseq r i.
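A toy executable reading of domseqm and the variable-arity guard may help. The dictionary encoding of a relation, the constant `typeof`, and the `guard_ok` helper are our own reconstruction and only approximate the role of dom\_of\_varar.

```python
# Executable sketch of domseqm (following the definition above) and a
# variable-arity domain check in the spirit of dom_of_varar. The record
# encoding of a relation and the guard check are our reconstruction.

def domseqm(rel, i):
    """Domain of argument i, clamping to the domain of the last required
    argument for the optional arguments of a variable-arity relation."""
    if rel["vararity"]:
        return rel["domseq"](i if i < rel["arity"] else rel["arity"])
    return rel["domseq"](i)

PA = {                          # partition: variable arity, minimum arity 2,
    "vararity": True,           # every argument of SUMO-type Class
    "arity": 2,                 # (modeled here as the string "Class")
    "domseq": lambda i: "Class",
}

def guard_ok(rel, args, typeof):
    """All arguments, including the optional ones, lie in their domain,
    and the minimum arity is met."""
    return (len(args) >= rel["arity"] and
            all(typeof(a) == domseqm(rel, i) for i, a in enumerate(args)))

typeof = lambda a: "Class"      # toy typing: everything is a Class
print(guard_ok(PA, ["Word", "Noun", "Verb"], typeof))   # True
print(guard_ok(PA, ["Word"], typeof))                   # False: below minimum arity
```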

<sup>13</sup> Note that we omit parentheses via the usual convention that implication is right associative, i.e., φ → ψ → ξ means φ → (ψ → ξ). Note also this is logically equivalent to φ ∧ ψ → ξ.

The translated statement is

$$\begin{array}{c}
\forall XYZ.\ X \in \mathsf{domseqm}\ \mathsf{PA}\ 0 \to Y \in \mathsf{domseqm}\ \mathsf{PA}\ 1 \to Z \in \mathsf{domseqm}\ \mathsf{PA}\ 2\\
\to \mathsf{P}\ (\mathsf{ap}\ \mathsf{PA}\ (\mathsf{cons}\ X\ (\mathsf{cons}\ Y\ (\mathsf{cons}\ Z\ \mathsf{nil}))))\\
\to \mathsf{P}\ (\mathsf{ap}\ \mathsf{PA}\ (\mathsf{cons}\ X\ (\mathsf{cons}\ Z\ (\mathsf{cons}\ Y\ \mathsf{nil})))).
\end{array}$$

A simpler translation for handling type guards in this example could avoid the use of dom\_of and domseqm and instead look up the arity and typing information for partition, etc. This translation would not work in general since SUMO assertions quantify over relations, in which case the particular type guards are not known until the relation variables are instantiated. Consider the SUMO-K formula

∀R₁R₂.∀ρ. subrelation R₁ R₂ ∧ instance R₁ Predicate ∧ instance R₂ Predicate ∧ R₁ ρ → R₂ ρ.

This translates to the set-theoretic proposition

$$\begin{array}{c}
\forall R\_1 R\_2 : \iota.\ \forall \rho : \iota \to \iota.\ R\_1 \in \mathsf{domseq}\ \mathsf{SR}\ 0 \to R\_2 \in \mathsf{domseq}\ \mathsf{SR}\ 1\\
\to R\_1 \in \mathsf{E} \to R\_2 \in \mathsf{E} \to \mathsf{dom\_of}\ (\mathsf{vararity}\ R\_1)\ (\mathsf{arity}\ R\_1)\ (\mathsf{domseq}\ R\_1)\ \rho\\
\to \mathsf{dom\_of}\ (\mathsf{vararity}\ R\_2)\ (\mathsf{arity}\ R\_2)\ (\mathsf{domseq}\ R\_2)\ \rho\\
\to \mathsf{P}\ (\mathsf{ap}\ \mathsf{SR}\ (\mathsf{cons}\ R\_1\ (\mathsf{cons}\ R\_2\ \mathsf{nil}))) \to R\_1 \in \mathsf{PR} \to R\_2 \in \mathsf{PR}\\
\to \mathsf{P}\ (\mathsf{ap}\ R\_1\ \rho) \to \mathsf{P}\ (\mathsf{ap}\ R\_2\ \rho)
\end{array}$$

where E, SR and PR are the sets corresponding to the SUMO constants Entity, subrelation and Predicate. Here the type guards on ρ depend on R₁ and R₂. Two special cases are the type guards Rᵢ ∈ E, which are derived from the use of Rᵢ as the first argument of instance.

#### 3.3 The Translation

We now describe the translation itself. A first pass through the SUMO files given records the typing information from domain, range, domainsubclass, rangesubclass and subrelation assertions. A finite number of secondary passes determines which names will have variable arity (either due to a direct assertion or due to being inferred to be in a variable arity class).<sup>14</sup>

The final pass translates the assertions, and this is our focus here. Each SUMO-K assertion is a SUMO-K formula ϕ which may have free variables in it. Thus if we translate the SUMO-K formula ϕ into the set-theoretic proposition ϕ′, then the translated assertion will be

$$
\forall x\_1 \cdots x\_n.\ G\_1 \to \cdots \to G\_m \to \varphi'
$$

where x₁,...,xₙ are the free variables in ϕ and G₁,...,Gₘ are the type guards for these free variables. Note that some of these free variables may be for spine

<sup>14</sup> In practice with the current Merge.kif file, a single secondary pass suffices, but in general one might need an extra pass to climb the class hierarchy.

variables (i.e., row variables) and may have type ι → ι. Such variables may also have type guards.

SUMO-K variables x translate to themselves, where after translation x is a variable of type ι (ranging over sets). For SUMO-K constants c we choose a name c′ and declare it as having type ι. Rational numbers q with a finite decimal expansion are translated to the set calculating the quotient of the base-ten numerator divided by the appropriate power of 10. For example, 11.2 would be translated to the term 1 ∗ 10² + 1 ∗ 10 + 2 divided by 10 (where 1, 2 and 10 are the usual finite ordinals and exponentiation by finite ordinals is defined by recursion). When a variable or constant is applied to a spine we translate the spine and use ap. As mentioned in Sect. 2.5, Real is translated to the set ℝ and Neg is translated to {x ∈ ℝ | x < 0}. For the arithmetic operations the translation adds propositions fixing their behavior on reals, such as

$$\forall xy \in \mathbb{R}.\ \mathsf{ap}\ \mathsf{ADD}\ (\mathsf{cons}\ x\ (\mathsf{cons}\ y\ \mathsf{nil})) = x + y,$$

∀xy ∈ ℝ. ap MULT (cons x (cons y nil)) = x · y

and

∀xy ∈ ℝ. P (ap LESSTHAN (cons x (cons y nil))) = (x < y).
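The translation of decimal literals to quotients described above can be sketched as follows; the helper name is ours, and Python's `Fraction` stands in for the set-theoretic quotient of finite ordinals.

```python
from fractions import Fraction

# Sketch of the rational translation: a signed decimal literal with
# finitely many digits becomes (base-ten numerator) / 10^k, where k is
# the number of digits after the decimal point.

def decimal_to_quotient(s: str) -> Fraction:
    sign = -1 if s.startswith("-") else 1
    s = s.lstrip("+-")
    intpart, _, fracpart = s.partition(".")
    digits = intpart + fracpart            # base-ten numerator, e.g. "112"
    return sign * Fraction(int(digits), 10 ** len(fracpart))

print(decimal_to_quotient("11.2"))   # 56/5, i.e. 112/10 reduced
print(decimal_to_quotient("-3"))     # -3
```

For 11.2 this reproduces the quotient from the text: numerator 1·10² + 1·10 + 2 = 112, denominator 10.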


The only remaining case for terms is that of κ-binder terms.

– We translate (κx.ψ) to

$$\{x \in \mathsf{Univ1} \mid G\_1 \wedge \cdots \wedge G\_m \wedge \psi'\}$$

where G₁,...,Gₘ are generated type guards for x and ψ′ is the result of translating the SUMO-K formula ψ to a set-theoretic proposition. Note that x ranges over Univ1.

The translation of spines is relatively straightforward, but a few points are worth mentioning.


We consider each case of a SUMO-K formula. The usual logical operators are translated as the corresponding operators:


We use set membership and inclusion to interpret instance and subclass.


#### 4 Interactive Proofs of Translated SUMO Queries

The motivating set of examples was the 35 example queries from [21], now expanded.<sup>15</sup> Six of the original examples involve temporal reasoning. We omit

<sup>15</sup> https://github.com/ontologyportal/sumo/tree/master/tests.

these for the moment, leaving a future translation to handle temporal and modal reasoning. Nine questions involve too many arguments for the existing first-order translation with macro expansion to work, but are handled by our new translation. Among the remaining problems, 5 require some arithmetical reasoning and use preexisting translations to standard first-order logic (FOF) and to an extension of first-order logic with arithmetic (TFF). Of the remaining problems, at least 5 were still not provable by the ATPs Vampire or E within a 600 s timeout.

We carefully looked at the set-theoretic translation of 13 of the problems that were too difficult for first-order provers (for any of the above reasons other than the use of temporal or modal reasoning). We either did an interactive proof or found slight modifications of the problem that could be interactively proven. The interactive proofs were done in Megalodon (the successor to the Egal system [4]). One advantage of having such a translation is the ability to attempt interactive proofs and recognize what may be missing from Merge.kif or the original query. We also did interactive proofs of 4 problems that the first-order provers could prove. We additionally included the 6 problems dealing with variable arity and row variables (e.g., Example 1). In total we have 23 SUMO-K queries translated to set-theoretic statements that have been interactively proven. We briefly describe some of the interactive proofs here.

An example with a particularly simple proof is TQG27 (Example 3), the example with a κ-binder. The assertion with the κ-binder translates to the set-theoretic proposition

$$o \in \{p \in \mathsf{Univ1} \mid p \in \mathsf{E} \land p \in \mathsf{domseqm}\ \mathsf{attribute}\ 0 \land p \in \mathsf{Planet} \land \mathsf{P}\ (\mathsf{ap}\ \mathsf{attribute}\ (\mathsf{listset}\ (\mathsf{cons}\ p\ (\mathsf{cons}\ \mathsf{Earthlike}\ \mathsf{nil}))))\}.$$

The query translates simply to o ∈ Planet.

When interactively proving the translated query in Megalodon, we are free to use statements coming from three sources: set-theoretic propositions already proven in Megalodon (or that are axioms of Tarski-Grothendieck set theory), propositions resulting from the translation of formulas in Merge.kif, and propositions resulting from translating formulas local to the example. In this case we only need two propositions: the translated formula local to the example given above and one known set-theoretic proposition of the form:

$$\forall X: \iota. \forall P: \iota \to o. \forall x: \iota. x \in \{x \in X | P \: x\} \to x \in X \land P \: x.$$

From the two propositions we easily obtain the conjunction

o ∈ Univ1 ∧ o ∈ E ∧ o ∈ domseqm attribute 0 ∧ o ∈ Planet ∧ P (ap attribute (listset (cons o (cons Earthlike nil)))).

After this first step, a series of steps eliminate the conjunctions until we have the desired conjunct o ∈ Planet.

Another relatively simple example is TQG11 (Example 5), in which we must essentially prove that 12 is 3 · 4. To be more precise, we must prove

1 · 10 + 2 = ap MULT (listset (cons 3 (cons 4 nil))).

As mentioned in Sect. 3.3 the translation adds the proposition

$$\forall x\, y \in \mathbb{R}.\ \mathsf{ap}\ \mathsf{MULT}\ (\mathsf{listset}\ (\mathsf{cons}\ x\ (\mathsf{cons}\ y\ \mathsf{nil}))) = x \cdot y$$

which will be useful here. In the interactive proof, we first prove the claim that every natural number (finite ordinal) is a real number (i.e., ω ⊆ ℝ, which is true for the representation of the reals being used). This claim is then used to prove 3 ∈ ℝ and 4 ∈ ℝ. This allows us to reduce the main goal to proving 1 · 10 + 2 = 3 · 4. This goal is then proven by an unsurprising sequence of rewrites using equations defining the behavior of + and · on finite ordinals. (Many details are elided here, such as the fact that there are actually two different operations +, one on the reals and one only on finite ordinals, and that they provably agree on finite ordinals.)
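The finite-ordinal arithmetic behind this rewriting step can be illustrated concretely. The following Python sketch is a toy model of our own (the names `ZERO`, `ordsucc`, `pred`, `nat`, `add`, `mul` are hypothetical stand-ins, not Megalodon's definitions): it represents natural numbers as von Neumann ordinals, with ordsucc n = n ∪ {n}, and checks that 1 · 10 + 2 and 3 · 4 denote the same set.

```python
# Toy model of von Neumann finite ordinals: 0 is the empty set and the
# successor of n is n ∪ {n}.  Illustrative only; not Megalodon's definitions.

ZERO = frozenset()

def ordsucc(n):
    """Successor ordinal n ∪ {n}."""
    return n | {n}

def pred(n):
    """For a successor ordinal n = m ∪ {m}, the union of n's elements is m."""
    return frozenset().union(*n)

def nat(k):
    """Embed the Python integer k as a von Neumann ordinal."""
    n = ZERO
    for _ in range(k):
        n = ordsucc(n)
    return n

def add(a, b):
    """a + 0 = a ; a + succ(b) = succ(a + b)."""
    return a if b == ZERO else ordsucc(add(a, pred(b)))

def mul(a, b):
    """a · 0 = 0 ; a · succ(b) = a · b + a."""
    return ZERO if b == ZERO else add(mul(a, pred(b)), a)

# 1 · 10 + 2 and 3 · 4 evaluate to the same set, the ordinal 12:
assert add(mul(nat(1), nat(10)), nat(2)) == mul(nat(3), nat(4)) == nat(12)
```

The `pred` function, recovering m from m ∪ {m} by taking the union of the elements, is what makes the recursive equations for + and · on finite ordinals executable in this sketch.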

We next consider the proof of the translation of Example 2. The set-theoretic proposition resulting from translating the query is

$$\begin{array}{l} (\exists x.\ x \in \mathsf{domseqm}\ \mathsf{employs}\ 0 \land x \in \mathsf{domseqm}\ \mathsf{employs}\ 1 \\ \quad\quad \land\ \mathsf{P}\ (\mathsf{ap}\ \mathsf{employs}\ (\mathsf{listset}\ (\mathsf{cons}\ x\ (\mathsf{cons}\ x\ \mathsf{nil}))))) \\ \to \neg \mathsf{P}\ (\mathsf{SR}\ (\mathsf{listset}\ (\mathsf{cons}\ \mathsf{employs}\ (\mathsf{cons}\ \mathsf{uses}\ \mathsf{nil})))). \end{array}$$

We begin the interactive proof by proving the following sequence of claims:

1. len nil = 0.
2. ∀X. ∀R : ι → ι. ∀n. nat\_p n → len R = n → len (cons X R) = ordsucc n.
3. ∀y. ¬vararity y → ∀i. domseqm y i = domseq y i.
4. ∀y. ¬vararity y → ∀x i. x ∈ domseq y i → x ∈ domseqm y i.
5. ∀X. ∀R : ι → ι. cons X R 0 = I X.
6. ∀n. nat\_p n → ∀X. ∀R : ι → ι. cons X R (ordsucc n) = R n.
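Claims 1, 2, 5 and 6 treat rows as functions from indices to tagged values. A small Python sketch of this reading may help; it is purely illustrative, with `I`, `U`, `nil`, `cons` and `length` as our own stand-ins for the set-theoretic operations (the tagging via pairs is an invention of the sketch, not the paper's encoding).

```python
# Rows are modelled as functions from an index to a value.
# I tags a value on the way into a row; U untags it, so U(I(x)) == x.

def I(x):
    return ("tag", x)

def U(t):
    assert t[0] == "tag"
    return t[1]

def nil(i):
    # len nil = 0: the empty row has no components (claim 1).
    raise IndexError("nil has no components")

def cons(x, row):
    # cons X R 0 = I X ; cons X R (n+1) = R n  (claims 5 and 6).
    def new_row(i):
        return I(x) if i == 0 else row(i - 1)
    return new_row

def length(row):
    # len (cons X R) = ordsucc (len R)  (claim 2): probe until the row ends.
    n = 0
    while True:
        try:
            row(n)
        except IndexError:
            return n
        n += 1

# The facts about ROW used later in the proof of Example 2:
x = "someAgent"
ROW = cons(x, cons(x, nil))
assert ROW(0) == I(x) and ROW(1) == I(x)
assert U(ROW(0)) == x and U(ROW(1)) == x
assert length(ROW) == 2
```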

We can then rewrite domseqm employs into domseq employs. Starting the main body of the proof, we assume we have an x such that x ∈ domseq employs 0, x ∈ domseq employs 1 and P (ap employs (listset (cons x (cons x nil)))). We further assume P (SR (listset (cons employs (cons uses nil)))) and prove a contradiction. Using the translated Merge.kif type information for employs, we can infer that x is an autonomous agent and an object. Likewise we can infer that employs is a predicate and a relation, and the same for uses. The contradiction follows from two claims: P (ap uses (listset (cons x (cons x nil)))) and ¬P (ap uses (listset (cons x (cons x nil)))).

We first prove P (ap uses (listset (cons x (cons x nil)))). We locally let ROW be cons x (cons x nil) and use the claims above to prove ROW 0 = I x, ROW 1 = I x, U (ROW 0) = x, U (ROW 1) = x and len ROW = 2. We can then essentially complete the subproof using the local assumptions

P (ap employs (listset (cons x (cons x nil))))

and

P (SR (listset (cons employs (cons uses nil))))

along with the translation of the following Merge.kif formula:

```
(=>
  (and
    (subrelation ?REL1 ?REL2)
    (instance ?REL1 Predicate)
    (instance ?REL2 Predicate)
    (?REL1 @ROW))
  (?REL2 @ROW))
```
To complete the contradiction we prove ¬P (ap uses (listset (cons x (cons x nil)))). The three most significant Merge.kif formulas whose translated propositions are used in the subproof are:

```
(instance uses AsymmetricRelation)
(subclass AsymmetricRelation IrreflexiveRelation)
(=>
  (instance ?REL IrreflexiveRelation)
  (forall (?INST)
    (not
      (?REL ?INST ?INST))))
```
That is, Merge.kif declares that uses is an asymmetric relation, that every asymmetric relation is an irreflexive relation, and that irreflexive relations have the expected property of irreflexivity.
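Since instance and subclass are interpreted as membership and inclusion, the role these three formulas play in the contradiction can be sketched in Python. This is a toy extensional model with hypothetical names, not the actual translation: classes are sets of relation names, subclass becomes inclusion of extensions, and the irreflexivity axiom is modelled as a membership test.

```python
# Toy extensional reading of the three Merge.kif formulas above:
# instance becomes set membership, subclass becomes inclusion.
# All names and the dictionary-based modelling are hypothetical.

classes = {
    "AsymmetricRelation": {"uses"},   # (instance uses AsymmetricRelation)
    "IrreflexiveRelation": set(),
}

# (subclass AsymmetricRelation IrreflexiveRelation): inclusion of extensions.
classes["IrreflexiveRelation"] |= classes["AsymmetricRelation"]

def irreflexivity_applies(rel):
    """(instance ?REL IrreflexiveRelation) => (not (?REL ?INST ?INST))."""
    return rel in classes["IrreflexiveRelation"]

# uses is a member of AsymmetricRelation, hence of IrreflexiveRelation by
# inclusion, so (uses x x) is refuted for every x: this is the second claim
# of the contradiction.
assert "uses" in classes["AsymmetricRelation"]
assert irreflexivity_applies("uses")
```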

# 5 ATP Problem Set

After interactively proving the 23 problems, we created TH0<sup>16</sup> problems restricted to the axioms used in the proof. This removes the need for the higher-order ATP to do premise selection. Additionally, we used Megalodon to analyze each interactive proof and create a number of subgoal problems for ATPs, ranging from the full problem (the initial goal to be proven) to the smallest subgoals (completed by a single tactic). For example, the interactive proofs of Examples 1, 2 and 5 generate 415, 322 and 100 TH0 problems, respectively. In total, analysis of the interactive proofs yields 4880 (premise-minimized) TH0 problems for ATPs. In Table 1 we give the results for several higher-order automated theorem provers (Leo-III [24], Vampire [13], Lash [3], Zipperposition [27], E [22]), given a 60 s timeout.

Table 1. Number of Subgoals Proven Automatically in 60 s

<sup>16</sup> TH0 was introduced as THF0 in [2] as a core language for representing typed higher-order formulas (in the sense of Church's simple type theory) for automated theorem provers.

# 6 Future Work

The primary plan for extending the translation is to include temporal and modal operators. SUMO includes many modal operators, including necessity and possibility, deontic operators (obligation and permission) and modalities for knowledge, beliefs and desires. Each modality can be modelled using Kripke-style semantics [14] (possible worlds with an accessibility relation).

The following is an example of a SUMO formula in Merge.kif using modalities<sup>17</sup>:

```
(=>
  (modalAttribute ?FORMULA Necessity)
  (modalAttribute ?FORMULA Possibility))
```
<sup>17</sup> Note that SUMO embeds several different modalities that have different axiomatizations. Rather than assuming one particular modal logic axiomatization (S4, S5, etc.), we hope, by embedding different modal logics in higher-order logic, to determine whether we can create a coherent system of axiomatizations while avoiding known paradoxes such as the gentle murderer paradox.

The current translation simply skips these formulas as they are not in the SUMO-K fragment. If we only wanted to extend the translation to include necessity and possibility, we could change the translation to make the dependence on worlds explicit. The SUMO formula above could translate to the proposition

$$\forall w \in W.\ \forall \varphi: \iota \to \iota.\ (\forall v \in W.\ R\ w\ v \to \mathsf{P}\ (\varphi\ v)) \to (\exists v \in W.\ R\ w\ v \land \mathsf{P}\ (\varphi\ v)).$$

Here W is a set of worlds and R is an accessibility relation on W. Note that the translated formula variable has type ι → ι instead of type ι to make the dependence of the formula on the world explicit. In general, terms, spines and formulas would depend on a world w and in an asserted formula the world w would be universally quantified (ranging over W) as above.

If we took the approach above to model necessity and possibility, then adding deontic modalities later would require a second set of worlds and a second accessibility relation. The translation of terms would then have type ι → ι → ι to account for the dependence on both kinds of worlds. To avoid adding a new dependency for every modality, our plan is to combine the sets of worlds and accessibility relations in an extensible way. Terms will thus translate to have type ι → ι, giving dependence on a single set encoding a sequence of worlds (where we are open ended about the length of the sequence). Using this idea, the SUMO formula above would translate to something like

$$\begin{array}{c} \forall w \in (\varPi x \in X.\ W\ x).\ \forall \varphi: \iota \to \iota.\ (\forall v \in (\varPi x \in X.\ W\ x).\ R\ m\ w\ v \to \mathsf{P}\ (\varphi\ v)) \\ \to (\exists v \in (\varPi x \in X.\ W\ x).\ R\ m\ w\ v \land \mathsf{P}\ (\varphi\ v)) \end{array}$$

where X is an index set (each x ∈ X corresponds to a modality being interpreted), m ∈ X is the specific index for necessity and possibility, W x is the set of worlds for x, and R x is a relation between w, v ∈ Πx ∈ X. W x that holds if the x components satisfy the accessibility relation over W x and the other components of w and v do not change. This allows us to model an arbitrary number of modalities using Kripke semantics while carrying only one world argument. Another advantage is that it minimizes the change to the translation of formulas in the SUMO-K fragment (without modalities). The only required change is to add a single dependence on w via a new argument and to universally quantify over w if the formula is asserted.
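A small Python sketch may help make the product construction concrete. It is purely illustrative: the index set, world sets, accessibility data and all names are hypothetical. Worlds in Πx ∈ X. W x are modelled as dictionaries with one component per modality, and R m relates two worlds exactly when component m steps along the accessibility relation for m while all other components stay fixed.

```python
from itertools import product

# Toy index set of modalities and, per modality, its set of worlds.
X = ["necessity", "deontic"]
W = {"necessity": ["w0", "w1"], "deontic": ["p0", "p1"]}

# Per-modality accessibility relations over W[x] (toy data).
access = {
    "necessity": {("w0", "w0"), ("w0", "w1"), ("w1", "w1")},
    "deontic": {("p0", "p1"), ("p1", "p0")},
}

def worlds():
    """Enumerate Πx ∈ X. W x: one component per modality."""
    for combo in product(*(W[x] for x in X)):
        yield dict(zip(X, combo))

def R(m, w, v):
    """R m w v: component m steps along access[m]; all others are unchanged."""
    return ((w[m], v[m]) in access[m]
            and all(w[x] == v[x] for x in X if x != m))

def box(m, phi, w):
    """Necessity for modality m: phi holds at every m-accessible world."""
    return all(phi(v) for v in worlds() if R(m, w, v))

def diamond(m, phi, w):
    """Possibility for modality m: phi holds at some m-accessible world."""
    return any(phi(v) for v in worlds() if R(m, w, v))
```

With this reading, the translated formula above states the box-to-diamond implication for the single modality index m, while the deontic component of every world is carried along untouched.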

We have already done some experiments with this approach, and it shows promise. These experiments need to be extended to include changes made while developing the SUMO-K translation described in the present paper. Once this is done, we must ensure that translated examples with modalities, as well as the examples in this paper without modalities, are provable interactively. We also plan to test automated theorem provers on the subgoals obtained from the interactive proofs. Doing so with the 23 examples in this paper will give an indication of how much more difficult the translated problems become when the Kripke infrastructure for handling modalities is included.

Another aspect of SUMO is modalities involving likelihood and probability. These cannot be modelled by Kripke semantics (as these modalities are not normal). We are experimenting with neighborhood semantics to include them.

# 7 Conclusion

We have described a translation from the SUMO-K fragment of SUMO into higher-order set theory. We have considered a number of examples that use aspects of SUMO-K that go beyond traditional first-order logic, namely variable arity functions and relations, row variables, term-level κ-binders and arithmetic. We have described a number of interactive proofs of translated queries and tested higher-order automated theorem provers on problems obtained by doing premise selection using the corresponding interactive proofs. This gives a set of problems for automated theorem provers that come from the area of "common sense reasoning," an area quite different from the more common sources of formalized mathematics and program verification. On most of the examples, higher-order automated theorem provers cannot fully automatically prove the query, but they perform reasonably well on subgoal problems extracted from the interactive proofs. This gives an indication that the full problems (assuming premise selection) are not too far out of reach for current state-of-the-art higher-order automated theorem provers.

Acknowledgments. This work was partially supported by the ERC-CZ project POSTMAN no. LL1902, Amazon Research Awards, EU ICT-48 2020 project TAILOR no. 952215 and the European Regional Development Fund under the Czech project AI&Reasoning with identifier CZ.02.1.01/0.0/0.0/15\_003/0000466.

# References



# **Author Index**

#### **A**

Al Wardani, Farah 176
Aoto, Takahito 99

#### **B**

Barrett, Clark 41, 159
Blaauwbroek, Lasse 236
Blanchette, Jasmin 23
Briefs, Yasmine 81
Bromberger, Martin 137
Brown, Chad E. 255

#### **C**

Chaudhuri, Kaustuv 176
Cignarale, Giorgio 119

#### **D**

Dahmen, Sander R. 23

#### **E**

Ekici, Burak 41

#### **G**

Giesl, Jürgen 3

#### **H**

Haga, Ryota 99
Hirokawa, Nao 63

#### **K**

Kagaya, Yuki 99
Kaliszyk, Cezary 236
Kuznets, Roman 119

#### **L**

Leidinger, Hendrik 81
Leutgeb, Lorenz 137
Lommen, Nils 3

#### **M**

Miller, Dale 176
Möhle, Sibylle 195

#### **N**

Nummelin, Visa 23

#### **P**

Pease, Adam 255

#### **R**

Rincon Galeana, Hugo 119

#### **S**

Saito, Teppei 63
Schmid, Ulrich 119

#### **T**

Tinelli, Cesare 41
Toledo, Guilherme V. 159
Torstensson, Olle 217

#### **U**

Urban, Josef 236, 255

#### **V**

Viswanathan, Arjun 41

#### **W**

Weber, Tjark 217
Weidenbach, Christoph 81, 137

#### **Z**

Zhang, Liao 236
Zohar, Yoni 41, 159

© The Editor(s) (if applicable) and The Author(s) 2023 U. Sattler and M. Suda (Eds.): FroCoS 2023, LNAI 14279, p. 275, 2023. https://doi.org/10.1007/978-3-031-43369-6